* [PATCH -v2 1/5] sched: Fix ttwu() race
2020-06-22 10:01 [PATCH -v2 0/5] sched: TTWU, IPI and stuff Peter Zijlstra
@ 2020-06-22 10:01 ` Peter Zijlstra
2020-06-22 12:56 ` Peter Zijlstra
2020-07-21 10:49 ` [PATCH -v2 1/5] sched: " Chris Wilson
2020-06-22 10:01 ` [PATCH -v2 2/5] sched: s/WF_ON_RQ/WQ_ON_CPU/ Peter Zijlstra
` (3 subsequent siblings)
4 siblings, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2020-06-22 10:01 UTC (permalink / raw)
To: mingo, tglx
Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, paulmck, frederic, peterz, torvalds,
hch
Paul reported rcutorture occasionally hitting a NULL deref:
sched_ttwu_pending()
ttwu_do_wakeup()
check_preempt_curr() := check_preempt_wakeup()
find_matching_se()
is_same_group()
if (se->cfs_rq == pse->cfs_rq) <-- *BOOM*
Debugging showed that this only appears to happen when we take the new
code-path from commit:
2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
and only when @cpu == smp_processor_id(). Something which should not
be possible, because p->on_cpu can only be true for remote tasks.
Similarly, without the new code-path from commit:
c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
this would've unconditionally hit:
smp_cond_load_acquire(&p->on_cpu, !VAL);
and if: 'cpu == smp_processor_id() && p->on_cpu' is possible, this
would result in an instant live-lock (with IRQs disabled), something
that hasn't been reported.
The NULL deref can be explained however if the task_cpu(p) load at the
beginning of try_to_wake_up() returns an old value, and this old value
happens to be smp_processor_id(). Further assume that the p->on_cpu
load accurately returns 1: it really is still running, just not here.
Then, when we enqueue the task locally, we can crash in exactly the
observed manner because p->se.cfs_rq != rq->cfs_rq: p's cfs_rq is from
the wrong CPU, therefore we'll iterate into the non-existent parents
and NULL deref.
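For reference, the helpers involved look roughly like this (condensed
paraphrase of kernel/sched/fair.c with CONFIG_FAIR_GROUP_SCHED; not part
of this patch). With @se and @pse rooted in different CPUs' cfs_rq
hierarchies the loop can never find a common group:

	static inline struct cfs_rq *
	is_same_group(struct sched_entity *se, struct sched_entity *pse)
	{
		if (se->cfs_rq == pse->cfs_rq)		/* <-- the faulting compare */
			return se->cfs_rq;
		return NULL;
	}

	static void
	find_matching_se(struct sched_entity **se, struct sched_entity **pse)
	{
		/* ... first walk both entities up to the same depth ... */

		while (!is_same_group(*se, *pse)) {
			/*
			 * If *se sits in this rq's hierarchy but *pse belongs
			 * to another CPU's, this never matches; at the root
			 * parent_entity() returns NULL and the next
			 * is_same_group() call dereferences it.
			 */
			*se = parent_entity(*se);
			*pse = parent_entity(*pse);
		}
	}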
The closest semi-plausible scenario I've managed to contrive is
somewhat elaborate (then again, actual reproduction takes many CPU
hours of rcutorture, so it can't be anything obvious):
X->cpu = 1
rq(1)->curr = X

CPU0                            CPU1                            CPU2

                                // switch away from X
                                LOCK rq(1)->lock
                                smp_mb__after_spinlock
                                dequeue_task(X)
                                X->on_rq = 0
                                switch_to(Z)
                                X->on_cpu = 0
                                UNLOCK rq(1)->lock

                                                                // migrate X to cpu 0
                                                                LOCK rq(1)->lock
                                                                dequeue_task(X)
                                                                set_task_cpu(X, 0)
                                                                X->cpu = 0
                                                                UNLOCK rq(1)->lock
                                                                LOCK rq(0)->lock
                                                                enqueue_task(X)
                                                                X->on_rq = 1
                                                                UNLOCK rq(0)->lock

// switch to X
LOCK rq(0)->lock
smp_mb__after_spinlock
switch_to(X)
X->on_cpu = 1
UNLOCK rq(0)->lock

// X goes sleep
X->state = TASK_UNINTERRUPTIBLE
smp_mb();                       // wake X
                                ttwu()
                                LOCK X->pi_lock
                                smp_mb__after_spinlock
                                if (p->state)
                                  cpu = X->cpu; // =? 1
                                smp_rmb()

// X calls schedule()
LOCK rq(0)->lock
smp_mb__after_spinlock
dequeue_task(X)
X->on_rq = 0
                                if (p->on_rq)
                                  smp_rmb();
                                if (p->on_cpu && ttwu_queue_wakelist(..)) [*]
                                  smp_cond_load_acquire(&p->on_cpu, !VAL)
                                cpu = select_task_rq(X, X->wake_cpu, ...)
                                if (X->cpu != cpu)
switch_to(Y)
X->on_cpu = 0
UNLOCK rq(0)->lock
However I'm having trouble convincing myself that's actually possible
on x86_64 -- after all, every LOCK implies an smp_mb there, so if ttwu
observes ->state != RUNNING, it must also observe ->cpu != 1.
(Most of the previous ttwu() races were found on very large PowerPC)
Nevertheless, this fully explains the observed failure case.
Fix it by ordering the task_cpu(p) load after the p->on_cpu load,
which is easy since nothing actually uses @cpu before this.
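In code terms the change boils down to (abridged from the diff below):

	/* before: @cpu loaded early, potentially stale by the time it's used */
	cpu = task_cpu(p);
	...
	if (READ_ONCE(p->on_cpu) && ttwu_queue_wakelist(p, cpu, wake_flags | WF_ON_RQ))

	/* after: only read task_cpu(p) once p->on_cpu has been observed */
	if (smp_load_acquire(&p->on_cpu) &&
	    ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_RQ))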
Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
---
kernel/sched/core.c | 30 +++++++++++++++++++++++++++---
1 file changed, 27 insertions(+), 3 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2293,8 +2293,15 @@ void sched_ttwu_pending(void *arg)
rq_lock_irqsave(rq, &rf);
update_rq_clock(rq);
- llist_for_each_entry_safe(p, t, llist, wake_entry)
+ llist_for_each_entry_safe(p, t, llist, wake_entry) {
+ if (WARN_ON_ONCE(p->on_cpu))
+ smp_cond_load_acquire(&p->on_cpu, !VAL);
+
+ if (WARN_ON_ONCE(task_cpu(p) != cpu_of(rq)))
+ set_task_cpu(p, cpu_of(rq));
+
ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
+ }
rq_unlock_irqrestore(rq, &rf);
}
@@ -2378,6 +2385,9 @@ static inline bool ttwu_queue_cond(int c
static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
{
if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
+ if (WARN_ON_ONCE(cpu == smp_processor_id()))
+ return false;
+
sched_clock_cpu(cpu); /* Sync clocks across CPUs */
__ttwu_queue_wakelist(p, cpu, wake_flags);
return true;
@@ -2550,7 +2560,6 @@ try_to_wake_up(struct task_struct *p, un
/* We're going to change ->state: */
success = 1;
- cpu = task_cpu(p);
/*
* Ensure we load p->on_rq _after_ p->state, otherwise it would
@@ -2614,8 +2623,21 @@ try_to_wake_up(struct task_struct *p, un
* which potentially sends an IPI instead of spinning on p->on_cpu to
* let the waker make forward progress. This is safe because IRQs are
* disabled and the IPI will deliver after on_cpu is cleared.
+ *
+ * Ensure we load task_cpu(p) after p->on_cpu:
+ *
+ * set_task_cpu(p, cpu);
+ * STORE p->cpu = @cpu
+ * __schedule() (switch to task 'p')
+ * LOCK rq->lock
+ * smp_mb__after_spin_lock() smp_cond_load_acquire(&p->on_cpu)
+ * STORE p->on_cpu = 1 LOAD p->cpu
+ *
+ * to ensure we observe the correct CPU on which the task is currently
+ * scheduling.
*/
- if (READ_ONCE(p->on_cpu) && ttwu_queue_wakelist(p, cpu, wake_flags | WF_ON_RQ))
+ if (smp_load_acquire(&p->on_cpu) &&
+ ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_RQ))
goto unlock;
/*
@@ -2635,6 +2657,8 @@ try_to_wake_up(struct task_struct *p, un
psi_ttwu_dequeue(p);
set_task_cpu(p, cpu);
}
+#else
+ cpu = task_cpu(p);
#endif /* CONFIG_SMP */
ttwu_queue(p, cpu, wake_flags);
* Re: [PATCH -v2 1/5] sched: Fix ttwu() race
2020-06-22 10:01 ` [PATCH -v2 1/5] sched: Fix ttwu() race Peter Zijlstra
@ 2020-06-22 12:56 ` Peter Zijlstra
2020-06-23 7:19 ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
2020-06-23 8:48 ` [tip: sched/urgent] sched/core: " tip-bot2 for Peter Zijlstra
2020-07-21 10:49 ` [PATCH -v2 1/5] sched: " Chris Wilson
1 sibling, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2020-06-22 12:56 UTC (permalink / raw)
To: mingo, tglx
Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, paulmck, frederic, torvalds, hch
*sigh*, this one should actually build, and I got a smatch report that
there's an uninitialized usage of @cpu, so I shuffled that around a bit.
---
Subject: sched: Fix ttwu() race
From: Peter Zijlstra <peterz@infradead.org>
Date: Mon, 22 Jun 2020 12:01:23 +0200
Paul reported rcutorture occasionally hitting a NULL deref:
sched_ttwu_pending()
ttwu_do_wakeup()
check_preempt_curr() := check_preempt_wakeup()
find_matching_se()
is_same_group()
if (se->cfs_rq == pse->cfs_rq) <-- *BOOM*
Debugging showed that this only appears to happen when we take the new
code-path from commit:
2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
and only when @cpu == smp_processor_id(). Something which should not
be possible, because p->on_cpu can only be true for remote tasks.
Similarly, without the new code-path from commit:
c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
this would've unconditionally hit:
smp_cond_load_acquire(&p->on_cpu, !VAL);
and if: 'cpu == smp_processor_id() && p->on_cpu' is possible, this
would result in an instant live-lock (with IRQs disabled), something
that hasn't been reported.
The NULL deref can be explained however if the task_cpu(p) load at the
beginning of try_to_wake_up() returns an old value, and this old value
happens to be smp_processor_id(). Further assume that the p->on_cpu
load accurately returns 1: it really is still running, just not here.
Then, when we enqueue the task locally, we can crash in exactly the
observed manner because p->se.cfs_rq != rq->cfs_rq: p's cfs_rq is from
the wrong CPU, therefore we'll iterate into the non-existent parents
and NULL deref.
The closest semi-plausible scenario I've managed to contrive is
somewhat elaborate (then again, actual reproduction takes many CPU
hours of rcutorture, so it can't be anything obvious):
X->cpu = 1
rq(1)->curr = X

CPU0                            CPU1                            CPU2

                                // switch away from X
                                LOCK rq(1)->lock
                                smp_mb__after_spinlock
                                dequeue_task(X)
                                X->on_rq = 0
                                switch_to(Z)
                                X->on_cpu = 0
                                UNLOCK rq(1)->lock

                                                                // migrate X to cpu 0
                                                                LOCK rq(1)->lock
                                                                dequeue_task(X)
                                                                set_task_cpu(X, 0)
                                                                X->cpu = 0
                                                                UNLOCK rq(1)->lock
                                                                LOCK rq(0)->lock
                                                                enqueue_task(X)
                                                                X->on_rq = 1
                                                                UNLOCK rq(0)->lock

// switch to X
LOCK rq(0)->lock
smp_mb__after_spinlock
switch_to(X)
X->on_cpu = 1
UNLOCK rq(0)->lock

// X goes sleep
X->state = TASK_UNINTERRUPTIBLE
smp_mb();                       // wake X
                                ttwu()
                                LOCK X->pi_lock
                                smp_mb__after_spinlock
                                if (p->state)
                                  cpu = X->cpu; // =? 1
                                smp_rmb()

// X calls schedule()
LOCK rq(0)->lock
smp_mb__after_spinlock
dequeue_task(X)
X->on_rq = 0
                                if (p->on_rq)
                                  smp_rmb();
                                if (p->on_cpu && ttwu_queue_wakelist(..)) [*]
                                  smp_cond_load_acquire(&p->on_cpu, !VAL)
                                cpu = select_task_rq(X, X->wake_cpu, ...)
                                if (X->cpu != cpu)
switch_to(Y)
X->on_cpu = 0
UNLOCK rq(0)->lock
However I'm having trouble convincing myself that's actually possible
on x86_64 -- after all, every LOCK implies an smp_mb there, so if ttwu
observes ->state != RUNNING, it must also observe ->cpu != 1.
(Most of the previous ttwu() races were found on very large PowerPC)
Nevertheless, this fully explains the observed failure case.
Fix it by ordering the task_cpu(p) load after the p->on_cpu load,
which is easy since nothing actually uses @cpu before this.
Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
---
kernel/sched/core.c | 33 ++++++++++++++++++++++++++++-----
1 file changed, 28 insertions(+), 5 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2293,8 +2293,15 @@ void sched_ttwu_pending(void *arg)
rq_lock_irqsave(rq, &rf);
update_rq_clock(rq);
- llist_for_each_entry_safe(p, t, llist, wake_entry)
+ llist_for_each_entry_safe(p, t, llist, wake_entry) {
+ if (WARN_ON_ONCE(p->on_cpu))
+ smp_cond_load_acquire(&p->on_cpu, !VAL);
+
+ if (WARN_ON_ONCE(task_cpu(p) != cpu_of(rq)))
+ set_task_cpu(p, cpu_of(rq));
+
ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
+ }
rq_unlock_irqrestore(rq, &rf);
}
@@ -2378,6 +2385,9 @@ static inline bool ttwu_queue_cond(int c
static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
{
if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
+ if (WARN_ON_ONCE(cpu == smp_processor_id()))
+ return false;
+
sched_clock_cpu(cpu); /* Sync clocks across CPUs */
__ttwu_queue_wakelist(p, cpu, wake_flags);
return true;
@@ -2528,7 +2538,6 @@ try_to_wake_up(struct task_struct *p, un
goto out;
success = 1;
- cpu = task_cpu(p);
trace_sched_waking(p);
p->state = TASK_RUNNING;
trace_sched_wakeup(p);
@@ -2550,7 +2559,6 @@ try_to_wake_up(struct task_struct *p, un
/* We're going to change ->state: */
success = 1;
- cpu = task_cpu(p);
/*
* Ensure we load p->on_rq _after_ p->state, otherwise it would
@@ -2614,8 +2622,21 @@ try_to_wake_up(struct task_struct *p, un
* which potentially sends an IPI instead of spinning on p->on_cpu to
* let the waker make forward progress. This is safe because IRQs are
* disabled and the IPI will deliver after on_cpu is cleared.
+ *
+ * Ensure we load task_cpu(p) after p->on_cpu:
+ *
+ * set_task_cpu(p, cpu);
+ * STORE p->cpu = @cpu
+ * __schedule() (switch to task 'p')
+ * LOCK rq->lock
+ * smp_mb__after_spin_lock() smp_cond_load_acquire(&p->on_cpu)
+ * STORE p->on_cpu = 1 LOAD p->cpu
+ *
+ * to ensure we observe the correct CPU on which the task is currently
+ * scheduling.
*/
- if (READ_ONCE(p->on_cpu) && ttwu_queue_wakelist(p, cpu, wake_flags | WF_ON_RQ))
+ if (smp_load_acquire(&p->on_cpu) &&
+ ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_RQ))
goto unlock;
/*
@@ -2635,6 +2656,8 @@ try_to_wake_up(struct task_struct *p, un
psi_ttwu_dequeue(p);
set_task_cpu(p, cpu);
}
+#else
+ cpu = task_cpu(p);
#endif /* CONFIG_SMP */
ttwu_queue(p, cpu, wake_flags);
@@ -2642,7 +2665,7 @@ try_to_wake_up(struct task_struct *p, un
raw_spin_unlock_irqrestore(&p->pi_lock, flags);
out:
if (success)
- ttwu_stat(p, cpu, wake_flags);
+ ttwu_stat(p, task_cpu(p), wake_flags);
preempt_enable();
return success;
* [tip: sched/urgent] sched: Fix ttwu() race
2020-06-22 12:56 ` Peter Zijlstra
@ 2020-06-23 7:19 ` tip-bot2 for Peter Zijlstra
2020-06-23 8:48 ` [tip: sched/urgent] sched/core: " tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 20+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2020-06-23 7:19 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Paul E. McKenney, Peter Zijlstra (Intel), x86, LKML
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: dcb623b51fb9d424471da9acbd2cfa618ecf9a09
Gitweb: https://git.kernel.org/tip/dcb623b51fb9d424471da9acbd2cfa618ecf9a09
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 22 Jun 2020 12:01:23 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 22 Jun 2020 20:51:06 +02:00
sched: Fix ttwu() race
Paul reported rcutorture occasionally hitting a NULL deref:
sched_ttwu_pending()
ttwu_do_wakeup()
check_preempt_curr() := check_preempt_wakeup()
find_matching_se()
is_same_group()
if (se->cfs_rq == pse->cfs_rq) <-- *BOOM*
Debugging showed that this only appears to happen when we take the new
code-path from commit:
2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
and only when @cpu == smp_processor_id(). Something which should not
be possible, because p->on_cpu can only be true for remote tasks.
Similarly, without the new code-path from commit:
c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
this would've unconditionally hit:
smp_cond_load_acquire(&p->on_cpu, !VAL);
and if: 'cpu == smp_processor_id() && p->on_cpu' is possible, this
would result in an instant live-lock (with IRQs disabled), something
that hasn't been reported.
The NULL deref can be explained however if the task_cpu(p) load at the
beginning of try_to_wake_up() returns an old value, and this old value
happens to be smp_processor_id(). Further assume that the p->on_cpu
load accurately returns 1: it really is still running, just not here.
Then, when we enqueue the task locally, we can crash in exactly the
observed manner because p->se.cfs_rq != rq->cfs_rq: p's cfs_rq is from
the wrong CPU, therefore we'll iterate into the non-existent parents
and NULL deref.
The closest semi-plausible scenario I've managed to contrive is
somewhat elaborate (then again, actual reproduction takes many CPU
hours of rcutorture, so it can't be anything obvious):
X->cpu = 1
rq(1)->curr = X

CPU0                            CPU1                            CPU2

                                // switch away from X
                                LOCK rq(1)->lock
                                smp_mb__after_spinlock
                                dequeue_task(X)
                                X->on_rq = 0
                                switch_to(Z)
                                X->on_cpu = 0
                                UNLOCK rq(1)->lock

                                                                // migrate X to cpu 0
                                                                LOCK rq(1)->lock
                                                                dequeue_task(X)
                                                                set_task_cpu(X, 0)
                                                                X->cpu = 0
                                                                UNLOCK rq(1)->lock
                                                                LOCK rq(0)->lock
                                                                enqueue_task(X)
                                                                X->on_rq = 1
                                                                UNLOCK rq(0)->lock

// switch to X
LOCK rq(0)->lock
smp_mb__after_spinlock
switch_to(X)
X->on_cpu = 1
UNLOCK rq(0)->lock

// X goes sleep
X->state = TASK_UNINTERRUPTIBLE
smp_mb();                       // wake X
                                ttwu()
                                LOCK X->pi_lock
                                smp_mb__after_spinlock
                                if (p->state)
                                  cpu = X->cpu; // =? 1
                                smp_rmb()

// X calls schedule()
LOCK rq(0)->lock
smp_mb__after_spinlock
dequeue_task(X)
X->on_rq = 0
                                if (p->on_rq)
                                  smp_rmb();
                                if (p->on_cpu && ttwu_queue_wakelist(..)) [*]
                                  smp_cond_load_acquire(&p->on_cpu, !VAL)
                                cpu = select_task_rq(X, X->wake_cpu, ...)
                                if (X->cpu != cpu)
switch_to(Y)
X->on_cpu = 0
UNLOCK rq(0)->lock
However I'm having trouble convincing myself that's actually possible
on x86_64 -- after all, every LOCK implies an smp_mb there, so if ttwu
observes ->state != RUNNING, it must also observe ->cpu != 1.
(Most of the previous ttwu() races were found on very large PowerPC)
Nevertheless, this fully explains the observed failure case.
Fix it by ordering the task_cpu(p) load after the p->on_cpu load,
which is easy since nothing actually uses @cpu before this.
Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lkml.kernel.org/r/20200622125649.GC576871@hirez.programming.kicks-ass.net
---
kernel/sched/core.c | 33 ++++++++++++++++++++++++++++-----
1 file changed, 28 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1f79d76..3328c29 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2293,8 +2293,15 @@ void sched_ttwu_pending(void *arg)
rq_lock_irqsave(rq, &rf);
update_rq_clock(rq);
- llist_for_each_entry_safe(p, t, llist, wake_entry)
+ llist_for_each_entry_safe(p, t, llist, wake_entry) {
+ if (WARN_ON_ONCE(p->on_cpu))
+ smp_cond_load_acquire(&p->on_cpu, !VAL);
+
+ if (WARN_ON_ONCE(task_cpu(p) != cpu_of(rq)))
+ set_task_cpu(p, cpu_of(rq));
+
ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
+ }
rq_unlock_irqrestore(rq, &rf);
}
@@ -2378,6 +2385,9 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)
static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
{
if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
+ if (WARN_ON_ONCE(cpu == smp_processor_id()))
+ return false;
+
sched_clock_cpu(cpu); /* Sync clocks across CPUs */
__ttwu_queue_wakelist(p, cpu, wake_flags);
return true;
@@ -2528,7 +2538,6 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
goto out;
success = 1;
- cpu = task_cpu(p);
trace_sched_waking(p);
p->state = TASK_RUNNING;
trace_sched_wakeup(p);
@@ -2550,7 +2559,6 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
/* We're going to change ->state: */
success = 1;
- cpu = task_cpu(p);
/*
* Ensure we load p->on_rq _after_ p->state, otherwise it would
@@ -2614,8 +2622,21 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
* which potentially sends an IPI instead of spinning on p->on_cpu to
* let the waker make forward progress. This is safe because IRQs are
* disabled and the IPI will deliver after on_cpu is cleared.
+ *
+ * Ensure we load task_cpu(p) after p->on_cpu:
+ *
+ * set_task_cpu(p, cpu);
+ * STORE p->cpu = @cpu
+ * __schedule() (switch to task 'p')
+ * LOCK rq->lock
+ * smp_mb__after_spin_lock() smp_cond_load_acquire(&p->on_cpu)
+ * STORE p->on_cpu = 1 LOAD p->cpu
+ *
+ * to ensure we observe the correct CPU on which the task is currently
+ * scheduling.
*/
- if (READ_ONCE(p->on_cpu) && ttwu_queue_wakelist(p, cpu, wake_flags | WF_ON_RQ))
+ if (smp_load_acquire(&p->on_cpu) &&
+ ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_RQ))
goto unlock;
/*
@@ -2635,6 +2656,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
psi_ttwu_dequeue(p);
set_task_cpu(p, cpu);
}
+#else
+ cpu = task_cpu(p);
#endif /* CONFIG_SMP */
ttwu_queue(p, cpu, wake_flags);
@@ -2642,7 +2665,7 @@ unlock:
raw_spin_unlock_irqrestore(&p->pi_lock, flags);
out:
if (success)
- ttwu_stat(p, cpu, wake_flags);
+ ttwu_stat(p, task_cpu(p), wake_flags);
preempt_enable();
return success;
* [tip: sched/urgent] sched/core: Fix ttwu() race
2020-06-22 12:56 ` Peter Zijlstra
2020-06-23 7:19 ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
@ 2020-06-23 8:48 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 20+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2020-06-23 8:48 UTC (permalink / raw)
To: linux-tip-commits
Cc: Paul E. McKenney, Peter Zijlstra (Intel), Ingo Molnar, x86, LKML
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 964ed98b075263faabe416eeebac99a9bef3f06c
Gitweb: https://git.kernel.org/tip/964ed98b075263faabe416eeebac99a9bef3f06c
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 22 Jun 2020 12:01:23 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 23 Jun 2020 10:42:39 +02:00
sched/core: Fix ttwu() race
Paul reported rcutorture occasionally hitting a NULL deref:
sched_ttwu_pending()
ttwu_do_wakeup()
check_preempt_curr() := check_preempt_wakeup()
find_matching_se()
is_same_group()
if (se->cfs_rq == pse->cfs_rq) <-- *BOOM*
Debugging showed that this only appears to happen when we take the new
code-path from commit:
2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
and only when @cpu == smp_processor_id(). Something which should not
be possible, because p->on_cpu can only be true for remote tasks.
Similarly, without the new code-path from commit:
c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
this would've unconditionally hit:
smp_cond_load_acquire(&p->on_cpu, !VAL);
and if: 'cpu == smp_processor_id() && p->on_cpu' is possible, this
would result in an instant live-lock (with IRQs disabled), something
that hasn't been reported.
The NULL deref can be explained however if the task_cpu(p) load at the
beginning of try_to_wake_up() returns an old value, and this old value
happens to be smp_processor_id(). Further assume that the p->on_cpu
load accurately returns 1: it really is still running, just not here.
Then, when we enqueue the task locally, we can crash in exactly the
observed manner because p->se.cfs_rq != rq->cfs_rq: p's cfs_rq is from
the wrong CPU, therefore we'll iterate into the non-existent parents
and NULL deref.
The closest semi-plausible scenario I've managed to contrive is
somewhat elaborate (then again, actual reproduction takes many CPU
hours of rcutorture, so it can't be anything obvious):
X->cpu = 1
rq(1)->curr = X

CPU0                            CPU1                            CPU2

                                // switch away from X
                                LOCK rq(1)->lock
                                smp_mb__after_spinlock
                                dequeue_task(X)
                                X->on_rq = 0
                                switch_to(Z)
                                X->on_cpu = 0
                                UNLOCK rq(1)->lock

                                                                // migrate X to cpu 0
                                                                LOCK rq(1)->lock
                                                                dequeue_task(X)
                                                                set_task_cpu(X, 0)
                                                                X->cpu = 0
                                                                UNLOCK rq(1)->lock
                                                                LOCK rq(0)->lock
                                                                enqueue_task(X)
                                                                X->on_rq = 1
                                                                UNLOCK rq(0)->lock

// switch to X
LOCK rq(0)->lock
smp_mb__after_spinlock
switch_to(X)
X->on_cpu = 1
UNLOCK rq(0)->lock

// X goes sleep
X->state = TASK_UNINTERRUPTIBLE
smp_mb();                       // wake X
                                ttwu()
                                LOCK X->pi_lock
                                smp_mb__after_spinlock
                                if (p->state)
                                  cpu = X->cpu; // =? 1
                                smp_rmb()

// X calls schedule()
LOCK rq(0)->lock
smp_mb__after_spinlock
dequeue_task(X)
X->on_rq = 0
                                if (p->on_rq)
                                  smp_rmb();
                                if (p->on_cpu && ttwu_queue_wakelist(..)) [*]
                                  smp_cond_load_acquire(&p->on_cpu, !VAL)
                                cpu = select_task_rq(X, X->wake_cpu, ...)
                                if (X->cpu != cpu)
switch_to(Y)
X->on_cpu = 0
UNLOCK rq(0)->lock
However I'm having trouble convincing myself that's actually possible
on x86_64 -- after all, every LOCK implies an smp_mb() there, so if ttwu
observes ->state != RUNNING, it must also observe ->cpu != 1.
(Most of the previous ttwu() races were found on very large PowerPC)
Nevertheless, this fully explains the observed failure case.
Fix it by ordering the task_cpu(p) load after the p->on_cpu load,
which is easy since nothing actually uses @cpu before this.
Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200622125649.GC576871@hirez.programming.kicks-ass.net
---
kernel/sched/core.c | 33 ++++++++++++++++++++++++++++-----
1 file changed, 28 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c1ba2e5..60791b9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2293,8 +2293,15 @@ void sched_ttwu_pending(void *arg)
rq_lock_irqsave(rq, &rf);
update_rq_clock(rq);
- llist_for_each_entry_safe(p, t, llist, wake_entry)
+ llist_for_each_entry_safe(p, t, llist, wake_entry) {
+ if (WARN_ON_ONCE(p->on_cpu))
+ smp_cond_load_acquire(&p->on_cpu, !VAL);
+
+ if (WARN_ON_ONCE(task_cpu(p) != cpu_of(rq)))
+ set_task_cpu(p, cpu_of(rq));
+
ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
+ }
rq_unlock_irqrestore(rq, &rf);
}
@@ -2378,6 +2385,9 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)
static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
{
if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
+ if (WARN_ON_ONCE(cpu == smp_processor_id()))
+ return false;
+
sched_clock_cpu(cpu); /* Sync clocks across CPUs */
__ttwu_queue_wakelist(p, cpu, wake_flags);
return true;
@@ -2528,7 +2538,6 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
goto out;
success = 1;
- cpu = task_cpu(p);
trace_sched_waking(p);
p->state = TASK_RUNNING;
trace_sched_wakeup(p);
@@ -2550,7 +2559,6 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
/* We're going to change ->state: */
success = 1;
- cpu = task_cpu(p);
/*
* Ensure we load p->on_rq _after_ p->state, otherwise it would
@@ -2614,8 +2622,21 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
* which potentially sends an IPI instead of spinning on p->on_cpu to
* let the waker make forward progress. This is safe because IRQs are
* disabled and the IPI will deliver after on_cpu is cleared.
+ *
+ * Ensure we load task_cpu(p) after p->on_cpu:
+ *
+ * set_task_cpu(p, cpu);
+ * STORE p->cpu = @cpu
+ * __schedule() (switch to task 'p')
+ * LOCK rq->lock
+ * smp_mb__after_spin_lock() smp_cond_load_acquire(&p->on_cpu)
+ * STORE p->on_cpu = 1 LOAD p->cpu
+ *
+ * to ensure we observe the correct CPU on which the task is currently
+ * scheduling.
*/
- if (READ_ONCE(p->on_cpu) && ttwu_queue_wakelist(p, cpu, wake_flags | WF_ON_RQ))
+ if (smp_load_acquire(&p->on_cpu) &&
+ ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_RQ))
goto unlock;
/*
@@ -2635,6 +2656,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
psi_ttwu_dequeue(p);
set_task_cpu(p, cpu);
}
+#else
+ cpu = task_cpu(p);
#endif /* CONFIG_SMP */
ttwu_queue(p, cpu, wake_flags);
@@ -2642,7 +2665,7 @@ unlock:
raw_spin_unlock_irqrestore(&p->pi_lock, flags);
out:
if (success)
- ttwu_stat(p, cpu, wake_flags);
+ ttwu_stat(p, task_cpu(p), wake_flags);
preempt_enable();
return success;
* Re: [PATCH -v2 1/5] sched: Fix ttwu() race
2020-06-22 10:01 ` [PATCH -v2 1/5] sched: Fix ttwu() race Peter Zijlstra
2020-06-22 12:56 ` Peter Zijlstra
@ 2020-07-21 10:49 ` Chris Wilson
2020-07-21 11:37 ` peterz
1 sibling, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2020-07-21 10:49 UTC (permalink / raw)
To: Peter Zijlstra, mingo, tglx
Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, paulmck, frederic, peterz, torvalds,
hch
Quoting Peter Zijlstra (2020-06-22 11:01:23)
> @@ -2378,6 +2385,9 @@ static inline bool ttwu_queue_cond(int c
> static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
> {
> if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
> + if (WARN_ON_ONCE(cpu == smp_processor_id()))
> + return false;
> +
> sched_clock_cpu(cpu); /* Sync clocks across CPUs */
> __ttwu_queue_wakelist(p, cpu, wake_flags);
> return true;
We've been hitting this warning frequently, but have never seen the
rcu-torture-esque oops ourselves.
<4> [181.766705] RIP: 0010:ttwu_queue_wakelist+0xbc/0xd0
<4> [181.766710] Code: 00 00 00 5b 5d 41 5c 41 5d c3 31 c0 5b 5d 41 5c 41 5d c3 31 c0 f6 c3 08 74 f2 48 c7 c2 00 ad 03 00 83 7c 11 40 01 77 e4 eb 80 <0f> 0b 31 c0 eb dc 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 bf 17
<4> [181.766726] RSP: 0018:ffffc90000003e08 EFLAGS: 00010046
<4> [181.766733] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: ffff888276a00000
<4> [181.766740] RDX: 000000000003ad00 RSI: ffffffff8232045b RDI: ffffffff8233103e
<4> [181.766747] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
<4> [181.766754] R10: 00000000d3fa25c3 R11: 0000000053712267 R12: ffff88825b912940
<4> [181.766761] R13: 0000000000000000 R14: 0000000000000087 R15: 000000000003ad00
<4> [181.766769] FS: 0000000000000000(0000) GS:ffff888276a00000(0000) knlGS:0000000000000000
<4> [181.766777] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [181.766783] CR2: 000055b8245814e0 CR3: 0000000005610003 CR4: 00000000003606f0
<4> [181.766790] Call Trace:
<4> [181.766794] <IRQ>
<4> [181.766798] try_to_wake_up+0x21b/0x690
<4> [181.766805] autoremove_wake_function+0xc/0x50
<4> [181.766858] __i915_sw_fence_complete+0x1ee/0x250 [i915]
<4> [181.766912] dma_i915_sw_fence_wake+0x2d/0x40 [i915]
We are seeing this on the ttwu_queue() path, so with p->on_cpu=0, and the
warning is cleared up by
- if (WARN_ON_ONCE(cpu == smp_processor_id()))
+ if (WARN_ON_ONCE(p->on_cpu && cpu == smp_processor_id()))
which would appear to restore the old behaviour for ttwu_queue() and
seem to be consistent with the intent of this patch. Hopefully this
helps identify the problem correctly.
-Chris
* Re: [PATCH -v2 1/5] sched: Fix ttwu() race
2020-07-21 10:49 ` [PATCH -v2 1/5] sched: " Chris Wilson
@ 2020-07-21 11:37 ` peterz
2020-07-22 9:57 ` Chris Wilson
0 siblings, 1 reply; 20+ messages in thread
From: peterz @ 2020-07-21 11:37 UTC (permalink / raw)
To: Chris Wilson
Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, paulmck, frederic,
torvalds, hch
On Tue, Jul 21, 2020 at 11:49:05AM +0100, Chris Wilson wrote:
> Quoting Peter Zijlstra (2020-06-22 11:01:23)
> > @@ -2378,6 +2385,9 @@ static inline bool ttwu_queue_cond(int c
> > static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
> > {
> > if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
> > + if (WARN_ON_ONCE(cpu == smp_processor_id()))
> > + return false;
> > +
> > sched_clock_cpu(cpu); /* Sync clocks across CPUs */
> > __ttwu_queue_wakelist(p, cpu, wake_flags);
> > return true;
>
> We've been hitting this warning frequently, but have never seen the
> rcu-torture-esque oops ourselves.
How easy is it to hit this? What, if anything, can I do to make my own
computer go bang?
> <4> [181.766705] RIP: 0010:ttwu_queue_wakelist+0xbc/0xd0
> <4> [181.766710] Code: 00 00 00 5b 5d 41 5c 41 5d c3 31 c0 5b 5d 41 5c 41 5d c3 31 c0 f6 c3 08 74 f2 48 c7 c2 00 ad 03 00 83 7c 11 40 01 77 e4 eb 80 <0f> 0b 31 c0 eb dc 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 bf 17
> <4> [181.766726] RSP: 0018:ffffc90000003e08 EFLAGS: 00010046
> <4> [181.766733] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: ffff888276a00000
> <4> [181.766740] RDX: 000000000003ad00 RSI: ffffffff8232045b RDI: ffffffff8233103e
> <4> [181.766747] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
> <4> [181.766754] R10: 00000000d3fa25c3 R11: 0000000053712267 R12: ffff88825b912940
> <4> [181.766761] R13: 0000000000000000 R14: 0000000000000087 R15: 000000000003ad00
> <4> [181.766769] FS: 0000000000000000(0000) GS:ffff888276a00000(0000) knlGS:0000000000000000
> <4> [181.766777] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [181.766783] CR2: 000055b8245814e0 CR3: 0000000005610003 CR4: 00000000003606f0
> <4> [181.766790] Call Trace:
> <4> [181.766794] <IRQ>
> <4> [181.766798] try_to_wake_up+0x21b/0x690
> <4> [181.766805] autoremove_wake_function+0xc/0x50
> <4> [181.766858] __i915_sw_fence_complete+0x1ee/0x250 [i915]
> <4> [181.766912] dma_i915_sw_fence_wake+0x2d/0x40 [i915]
Please, don't trim oopses..
> We are seeing this on the ttwu_queue() path, so with p->on_cpu=0, and the
> warning is cleared up by
>
> - if (WARN_ON_ONCE(cpu == smp_processor_id()))
> + if (WARN_ON_ONCE(p->on_cpu && cpu == smp_processor_id()))
>
> which would appear to restore the old behaviour for ttwu_queue() and
> seem to be consistent with the intent of this patch. Hopefully this
> helps identify the problem correctly.
Hurmph, that's actively wrong. We should never queue to self, as that
would result in self-IPI, which is not possible on a bunch of archs. It
works for you because x86 can in fact do that.
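(For context, the wakelist path boils down to roughly the following --
condensed from kernel/sched/core.c and kernel/smp.c as of this series,
not a verbatim copy; queueing to the local CPU would mean asking the
architecture to IPI itself:)

	static void __ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
	{
		p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);
		WRITE_ONCE(cpu_rq(cpu)->ttwu_pending, 1);
		__smp_call_single_queue(cpu, &p->wake_entry);
	}

	void __smp_call_single_queue(int cpu, struct llist_node *node)
	{
		/* first entry on the target's queue => kick that CPU */
		if (llist_add(node, &per_cpu(call_single_queue, cpu)))
			send_call_function_single_ipi(cpu);	/* self-IPI if cpu == this CPU */
	}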
So ttwu_queue_cond() will only return true when:

 - target-cpu and current-cpu do not share cache; so it cannot be this
   condition, because you _always_ share cache with yourself.

 - when WF_ON_CPU and target-cpu has nr_running <= 1; which means
   p->on_cpu == true.
So now you have cpu == smp_processor_id() && p->on_cpu == 1, however
your modified WARN contradicts that.
*puzzle*
* Re: [PATCH -v2 1/5] sched: Fix ttwu() race
2020-07-21 11:37 ` peterz
@ 2020-07-22 9:57 ` Chris Wilson
2020-07-23 18:28 ` Peter Zijlstra
0 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2020-07-22 9:57 UTC (permalink / raw)
To: peterz
Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, paulmck, frederic,
torvalds, hch
Quoting peterz@infradead.org (2020-07-21 12:37:19)
> On Tue, Jul 21, 2020 at 11:49:05AM +0100, Chris Wilson wrote:
> > Quoting Peter Zijlstra (2020-06-22 11:01:23)
> > > @@ -2378,6 +2385,9 @@ static inline bool ttwu_queue_cond(int c
> > > static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
> > > {
> > > if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
> > > + if (WARN_ON_ONCE(cpu == smp_processor_id()))
> > > + return false;
> > > +
> > > sched_clock_cpu(cpu); /* Sync clocks across CPUs */
> > > __ttwu_queue_wakelist(p, cpu, wake_flags);
> > > return true;
> >
> > We've been hitting this warning frequently, but have never seen the
> > rcu-torture-esque oops ourselves.
>
> How easy is it to hit this? What, if anything, can I do to make my own
> computer go bang?
I tried reproducing it in a mockup, hrtimer + irq_work + waitqueue, but
it remains elusive. It pops up in an obscure HW test where we are
exercising timeout handling for rogue HW.
>
> > <4> [181.766705] RIP: 0010:ttwu_queue_wakelist+0xbc/0xd0
> > <4> [181.766710] Code: 00 00 00 5b 5d 41 5c 41 5d c3 31 c0 5b 5d 41 5c 41 5d c3 31 c0 f6 c3 08 74 f2 48 c7 c2 00 ad 03 00 83 7c 11 40 01 77 e4 eb 80 <0f> 0b 31 c0 eb dc 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 bf 17
> > <4> [181.766726] RSP: 0018:ffffc90000003e08 EFLAGS: 00010046
> > <4> [181.766733] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: ffff888276a00000
> > <4> [181.766740] RDX: 000000000003ad00 RSI: ffffffff8232045b RDI: ffffffff8233103e
> > <4> [181.766747] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
> > <4> [181.766754] R10: 00000000d3fa25c3 R11: 0000000053712267 R12: ffff88825b912940
> > <4> [181.766761] R13: 0000000000000000 R14: 0000000000000087 R15: 000000000003ad00
> > <4> [181.766769] FS: 0000000000000000(0000) GS:ffff888276a00000(0000) knlGS:0000000000000000
> > <4> [181.766777] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4> [181.766783] CR2: 000055b8245814e0 CR3: 0000000005610003 CR4: 00000000003606f0
> > <4> [181.766790] Call Trace:
> > <4> [181.766794] <IRQ>
> > <4> [181.766798] try_to_wake_up+0x21b/0x690
> > <4> [181.766805] autoremove_wake_function+0xc/0x50
> > <4> [181.766858] __i915_sw_fence_complete+0x1ee/0x250 [i915]
> > <4> [181.766912] dma_i915_sw_fence_wake+0x2d/0x40 [i915]
>
> Please, don't trim oopses..
>
> > We are seeing this on the ttwu_queue() path, so with p->on_cpu=0, and the
> > warning is cleared up by
> >
> > - if (WARN_ON_ONCE(cpu == smp_processor_id()))
> > + if (WARN_ON_ONCE(p->on_cpu && cpu == smp_processor_id()))
> >
> > which would appear to restore the old behaviour for ttwu_queue() and
> > seem to be consistent with the intent of this patch. Hopefully this
> > helps identify the problem correctly.
>
> Hurmph, that's actively wrong. We should never queue to self, as that
> would result in self-IPI, which is not possible on a bunch of archs. It
> works for you because x86 can in fact do that.
>
> So ttwu_queue_cond() will only return true when:
>
> - target-cpu and current-cpu do not share cache;
> so it cannot be this condition, because you _always_
> share cache with yourself.
>
> - when WF_ON_CPU and target-cpu has nr_running <= 1;
> which means p->on_cpu == true.
>
> So now you have cpu == smp_processor_id() && p->on_cpu == 1, however
> your modified WARN contradicts that.
>
> *puzzle*
Perhaps more damning is that I can replace WF_ON_CPU with p->on_cpu to
suppress the warning:
-static inline bool ttwu_queue_cond(int cpu, int wake_flags)
+static inline bool ttwu_queue_cond(struct task_struct *p, int cpu, int wake_flags)
{
/*
* If the CPU does not share cache, then queue the task on the
@@ -2370,7 +2370,7 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)
* the soon-to-be-idle CPU as the current CPU is likely busy.
* nr_running is checked to avoid unnecessary task stacking.
*/
- if ((wake_flags & WF_ON_CPU) && cpu_rq(cpu)->nr_running <= 1)
+ if (p->on_cpu && cpu_rq(cpu)->nr_running <= 1)
return true;
return false;
@@ -2378,7 +2378,7 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)
static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
{
- if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
+ if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(p, cpu, wake_flags)) {
if (WARN_ON_ONCE(cpu == smp_processor_id()))
return false;
* Re: [PATCH -v2 1/5] sched: Fix ttwu() race
2020-07-22 9:57 ` Chris Wilson
@ 2020-07-23 18:28 ` Peter Zijlstra
2020-07-23 19:41 ` Chris Wilson
0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2020-07-23 18:28 UTC (permalink / raw)
To: Chris Wilson
Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, paulmck, frederic,
torvalds, hch
On Wed, Jul 22, 2020 at 10:57:56AM +0100, Chris Wilson wrote:
> Perhaps more damning is that I can replace WF_ON_CPU with p->on_cpu to
> suppress the warning:
*argh*, I'm starting to go mad...
Chris, could you please try the below patch?
Can you also confirm that if you do:
$ echo NO_TTWU_QUEUE_ON_CPU > /debug/sched_features
or wherever else system-doofus mounts debugfs these days,
the issue no longer manifests? Because if I don't get a handle on this
soon we might have to disable this thing for now :/
---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a2a244af9a537..8218779734288 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2430,13 +2430,15 @@ bool cpus_share_cache(int this_cpu, int that_cpu)
return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
}
-static inline bool ttwu_queue_cond(int cpu, int wake_flags)
+static inline bool ttwu_queue_cond(struct task_struct *p, int cpu, int wake_flags)
{
+ int this_cpu = smp_processor_id();
+
/*
* If the CPU does not share cache, then queue the task on the
* remote rqs wakelist to avoid accessing remote data.
*/
- if (!cpus_share_cache(smp_processor_id(), cpu))
+ if (!cpus_share_cache(this_cpu, cpu))
return true;
/*
@@ -2445,15 +2447,30 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)
* the soon-to-be-idle CPU as the current CPU is likely busy.
* nr_running is checked to avoid unnecessary task stacking.
*/
- if ((wake_flags & WF_ON_CPU) && cpu_rq(cpu)->nr_running <= 1)
- return true;
+ if (wake_flags & WF_ON_CPU) {
+
+ if (unlikely(cpu == this_cpu)) {
+ int on_cpu = READ_ONCE(p->on_cpu);
+ int cpu1 = task_cpu(p);
+
+ smp_rmb();
+ smp_cond_load_acquire(&p->on_cpu, !VAL);
+
+ pr_alert("ttwu-IPI-self: %d==%d, p->on_cpu=%d;0, task_cpu(p)=%d;%d\n",
+ cpu, this_cpu, on_cpu, cpu1, task_cpu(p));
+
+ return false;
+ }
+
+ return cpu_rq(cpu)->nr_running <= 1;
+ }
return false;
}
static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
{
- if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
+ if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(p, cpu, wake_flags)) {
if (WARN_ON_ONCE(cpu == smp_processor_id()))
return false;
@@ -2713,7 +2730,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
* to ensure we observe the correct CPU on which the task is currently
* scheduling.
*/
- if (smp_load_acquire(&p->on_cpu) &&
+ if (sched_feat(TTWU_QUEUE_ON_CPU) && smp_load_acquire(&p->on_cpu) &&
ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_CPU))
goto unlock;
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 7481cd96f3915..b231a840c3eba 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -50,6 +50,7 @@ SCHED_FEAT(NONTASK_CAPACITY, true)
* using the scheduler IPI. Reduces rq->lock contention/bounces.
*/
SCHED_FEAT(TTWU_QUEUE, true)
+SCHED_FEAT(TTWU_QUEUE_ON_CPU, true)
/*
* When doing wakeups, attempt to limit superfluous scans of the LLC domain.
* Re: [PATCH -v2 1/5] sched: Fix ttwu() race
2020-07-23 18:28 ` Peter Zijlstra
@ 2020-07-23 19:41 ` Chris Wilson
2020-07-23 20:11 ` Peter Zijlstra
0 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2020-07-23 19:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, paulmck, frederic,
torvalds, hch
Quoting Peter Zijlstra (2020-07-23 19:28:41)
> On Wed, Jul 22, 2020 at 10:57:56AM +0100, Chris Wilson wrote:
>
> > Perhaps more damning is that I can replace WF_ON_CPU with p->on_cpu to
> > suppress the warning:
>
> *argh*, I'm starting to go mad...
>
> Chris, could you please try the below patch?
ttwu-IPI-self: 1==1, p->on_cpu=0;0, task_cpu(p)=1;1
ttwu-IPI-self: 1==1, p->on_cpu=0;0, task_cpu(p)=1;1
ttwu-IPI-self: 0==0, p->on_cpu=0;0, task_cpu(p)=0;0
ttwu-IPI-self: 3==3, p->on_cpu=0;0, task_cpu(p)=3;3
ttwu-IPI-self: 2==2, p->on_cpu=0;0, task_cpu(p)=2;2
ttwu-IPI-self: 1==1, p->on_cpu=0;0, task_cpu(p)=1;1
ttwu-IPI-self: 2==2, p->on_cpu=0;0, task_cpu(p)=2;2
ttwu-IPI-self: 2==2, p->on_cpu=0;0, task_cpu(p)=2;2
ttwu-IPI-self: 2==2, p->on_cpu=0;0, task_cpu(p)=2;2
> Can you also confirm that if you do:
>
> $ echo NO_TTWU_QUEUE_ON_CPU > /debug/sched_features
With,

	sched_feat_disable(10):TTWU_QUEUE_ON_CPU

the pr_alert is still being hit

	ttwu-IPI-self: 3==3, p->on_cpu=0;0, task_cpu(p)=3;3
At which point, it dawns on me. Mea culpa: stray bits being passed into
default_wake_function.
I am very sorry for the wild goose chase.
-Chris
* Re: [PATCH -v2 1/5] sched: Fix ttwu() race
2020-07-23 19:41 ` Chris Wilson
@ 2020-07-23 20:11 ` Peter Zijlstra
2020-07-24 17:55 ` Paul E. McKenney
0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2020-07-23 20:11 UTC (permalink / raw)
To: Chris Wilson
Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, paulmck, frederic,
torvalds, hch
On Thu, Jul 23, 2020 at 08:41:03PM +0100, Chris Wilson wrote:
> I am very sorry for the wild goose chase.
*phew*... all good then. I was starting to go a little ga-ga trying to
make sense of things.
Arguably we should probably do something like:
@@ -4555,7 +4572,7 @@ asmlinkage __visible void __sched preempt_schedule_irq(void)
int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int wake_flags,
void *key)
{
- return try_to_wake_up(curr->private, mode, wake_flags);
+ return try_to_wake_up(curr->private, mode, wake_flags & WF_SYNC);
}
EXPORT_SYMBOL(default_wake_function);
Since I don't think anybody uses anything other than WF_SYNC, ever, and
the rest of the WF_ flags are used internally.
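For reference, the flag values that make the aliasing possible (the
defines are quoted from kernel/sched/sched.h as shown in patch 2/5);
any stray 0x08 arriving through a wait-queue callback's wake_flags
argument is indistinguishable from the internal WF_ON_CPU, which is
exactly what the masking above prevents:

	#define WF_SYNC		0x01	/* Waker goes to sleep after wakeup */
	#define WF_FORK		0x02	/* Child wakeup after fork */
	#define WF_MIGRATED	0x04	/* Internal use, task got migrated */
	#define WF_ON_CPU	0x08	/* Wakee is on_cpu */

	/*
	 * default_wake_function() forwards wake_flags straight into
	 * try_to_wake_up(); masking with WF_SYNC keeps only the one bit an
	 * external caller may legitimately pass and drops anything that
	 * would alias the internal flags.
	 */
	return try_to_wake_up(curr->private, mode, wake_flags & WF_SYNC);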
Thanks Chris!
* Re: [PATCH -v2 1/5] sched: Fix ttwu() race
2020-07-23 20:11 ` Peter Zijlstra
@ 2020-07-24 17:55 ` Paul E. McKenney
0 siblings, 0 replies; 20+ messages in thread
From: Paul E. McKenney @ 2020-07-24 17:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Chris Wilson, mingo, tglx, linux-kernel, juri.lelli,
vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
frederic, torvalds, hch
On Thu, Jul 23, 2020 at 10:11:28PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 23, 2020 at 08:41:03PM +0100, Chris Wilson wrote:
>
> > I am very sorry for the wild goose chase.
>
> *phew*... all good then. I was starting to go a little ga-ga trying to
> make sense of things.
>
> Arguably we should probably do something like:
>
>
> @@ -4555,7 +4572,7 @@ asmlinkage __visible void __sched preempt_schedule_irq(void)
> int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int wake_flags,
> void *key)
> {
> - return try_to_wake_up(curr->private, mode, wake_flags);
> + return try_to_wake_up(curr->private, mode, wake_flags & WF_SYNC);
> }
> EXPORT_SYMBOL(default_wake_function);
If you do:
Tested-by: Paul E. McKenney <paulmck@kernel.org>
This was about nine hours of each of the default rcutorture scenarios.
Thanx, Paul
> Since I don't think anybody uses anything other than WF_SYNC, ever. And
> the rest of the WF_flags are used internally.
>
> Thanks Chris!
* [PATCH -v2 2/5] sched: s/WF_ON_RQ/WQ_ON_CPU/
2020-06-22 10:01 [PATCH -v2 0/5] sched: TTWU, IPI and stuff Peter Zijlstra
2020-06-22 10:01 ` [PATCH -v2 1/5] sched: Fix ttwu() race Peter Zijlstra
@ 2020-06-22 10:01 ` Peter Zijlstra
2020-06-23 7:19 ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
2020-06-23 8:48 ` [tip: sched/urgent] sched/core: s/WF_ON_RQ/WQ_ON_CPU/ tip-bot2 for Peter Zijlstra
2020-06-22 10:01 ` [PATCH -v2 3/5] smp, irq_work: Continue smp_call_function*() and irq_work*() integration Peter Zijlstra
` (2 subsequent siblings)
4 siblings, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2020-06-22 10:01 UTC (permalink / raw)
To: mingo, tglx
Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, paulmck, frederic, peterz, torvalds,
hch, Mel Gorman
Avoids confusion...
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@suse.de>
---
kernel/sched/core.c | 4 ++--
kernel/sched/sched.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2376,7 +2376,7 @@ static inline bool ttwu_queue_cond(int c
* the soon-to-be-idle CPU as the current CPU is likely busy.
* nr_running is checked to avoid unnecessary task stacking.
*/
- if ((wake_flags & WF_ON_RQ) && cpu_rq(cpu)->nr_running <= 1)
+ if ((wake_flags & WF_ON_CPU) && cpu_rq(cpu)->nr_running <= 1)
return true;
return false;
@@ -2637,7 +2637,7 @@ try_to_wake_up(struct task_struct *p, un
* scheduling.
*/
if (smp_load_acquire(&p->on_cpu) &&
- ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_RQ))
+ ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_CPU))
goto unlock;
/*
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1682,7 +1682,7 @@ static inline int task_on_rq_migrating(s
#define WF_SYNC 0x01 /* Waker goes to sleep after wakeup */
#define WF_FORK 0x02 /* Child wakeup after fork */
#define WF_MIGRATED 0x04 /* Internal use, task got migrated */
-#define WF_ON_RQ 0x08 /* Wakee is on_rq */
+#define WF_ON_CPU 0x08 /* Wakee is on_cpu */
/*
* To aid in avoiding the subversion of "niceness" due to uneven distribution
* [tip: sched/urgent] sched: s/WF_ON_RQ/WQ_ON_CPU/
2020-06-22 10:01 ` [PATCH -v2 2/5] sched: s/WF_ON_RQ/WQ_ON_CPU/ Peter Zijlstra
@ 2020-06-23 7:19 ` tip-bot2 for Peter Zijlstra
2020-06-23 8:48 ` [tip: sched/urgent] sched/core: s/WF_ON_RQ/WQ_ON_CPU/ tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 20+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2020-06-23 7:19 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Mel Gorman, x86, LKML
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 38d8705bb63d19747eb6259c22c54d18cc47e4f7
Gitweb: https://git.kernel.org/tip/38d8705bb63d19747eb6259c22c54d18cc47e4f7
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 22 Jun 2020 12:01:24 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 22 Jun 2020 20:51:06 +02:00
sched: s/WF_ON_RQ/WQ_ON_CPU/
Avoids confusion...
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20200622100825.785115830@infradead.org
---
kernel/sched/core.c | 4 ++--
kernel/sched/sched.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3328c29..019db7a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2376,7 +2376,7 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)
* the soon-to-be-idle CPU as the current CPU is likely busy.
* nr_running is checked to avoid unnecessary task stacking.
*/
- if ((wake_flags & WF_ON_RQ) && cpu_rq(cpu)->nr_running <= 1)
+ if ((wake_flags & WF_ON_CPU) && cpu_rq(cpu)->nr_running <= 1)
return true;
return false;
@@ -2636,7 +2636,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
* scheduling.
*/
if (smp_load_acquire(&p->on_cpu) &&
- ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_RQ))
+ ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_CPU))
goto unlock;
/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1d4e94c..877fb08 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1682,7 +1682,7 @@ static inline int task_on_rq_migrating(struct task_struct *p)
#define WF_SYNC 0x01 /* Waker goes to sleep after wakeup */
#define WF_FORK 0x02 /* Child wakeup after fork */
#define WF_MIGRATED 0x04 /* Internal use, task got migrated */
-#define WF_ON_RQ 0x08 /* Wakee is on_rq */
+#define WF_ON_CPU 0x08 /* Wakee is on_cpu */
/*
* To aid in avoiding the subversion of "niceness" due to uneven distribution
* [tip: sched/urgent] sched/core: s/WF_ON_RQ/WQ_ON_CPU/
2020-06-22 10:01 ` [PATCH -v2 2/5] sched: s/WF_ON_RQ/WQ_ON_CPU/ Peter Zijlstra
2020-06-23 7:19 ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
@ 2020-06-23 8:48 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 20+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2020-06-23 8:48 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Mel Gorman, x86, LKML
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 5311fd9f4c49ccc597021d901136fc807c01d678
Gitweb: https://git.kernel.org/tip/5311fd9f4c49ccc597021d901136fc807c01d678
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 22 Jun 2020 12:01:24 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 23 Jun 2020 10:42:39 +02:00
sched/core: s/WF_ON_RQ/WQ_ON_CPU/
Use a better name for this poorly named flag, to avoid confusion...
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20200622100825.785115830@infradead.org
---
kernel/sched/core.c | 4 ++--
kernel/sched/sched.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 60791b9..f778067 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2376,7 +2376,7 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)
* the soon-to-be-idle CPU as the current CPU is likely busy.
* nr_running is checked to avoid unnecessary task stacking.
*/
- if ((wake_flags & WF_ON_RQ) && cpu_rq(cpu)->nr_running <= 1)
+ if ((wake_flags & WF_ON_CPU) && cpu_rq(cpu)->nr_running <= 1)
return true;
return false;
@@ -2636,7 +2636,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
* scheduling.
*/
if (smp_load_acquire(&p->on_cpu) &&
- ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_RQ))
+ ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_CPU))
goto unlock;
/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1d4e94c..877fb08 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1682,7 +1682,7 @@ static inline int task_on_rq_migrating(struct task_struct *p)
#define WF_SYNC 0x01 /* Waker goes to sleep after wakeup */
#define WF_FORK 0x02 /* Child wakeup after fork */
#define WF_MIGRATED 0x04 /* Internal use, task got migrated */
-#define WF_ON_RQ 0x08 /* Wakee is on_rq */
+#define WF_ON_CPU 0x08 /* Wakee is on_cpu */
/*
* To aid in avoiding the subversion of "niceness" due to uneven distribution
* [PATCH -v2 3/5] smp, irq_work: Continue smp_call_function*() and irq_work*() integration
2020-06-22 10:01 [PATCH -v2 0/5] sched: TTWU, IPI and stuff Peter Zijlstra
2020-06-22 10:01 ` [PATCH -v2 1/5] sched: Fix ttwu() race Peter Zijlstra
2020-06-22 10:01 ` [PATCH -v2 2/5] sched: s/WF_ON_RQ/WQ_ON_CPU/ Peter Zijlstra
@ 2020-06-22 10:01 ` Peter Zijlstra
2020-06-23 7:19 ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
2020-06-23 8:48 ` tip-bot2 for Peter Zijlstra
2020-06-22 10:01 ` [PATCH -v2 4/5] irq_work: Cleanup Peter Zijlstra
2020-06-22 10:01 ` [PATCH -v2 5/5] smp: Cleanup smp_call_function*() Peter Zijlstra
4 siblings, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2020-06-22 10:01 UTC (permalink / raw)
To: mingo, tglx
Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, paulmck, frederic, peterz, torvalds,
hch
Instead of relying on BUG_ON() to ensure the various data structures
line up, use a bunch of horrible unions.
Much of the union magic is to ensure irq_work and smp_call_function do
not (yet) see the members of their respective data structures change
name.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
---
include/linux/irq_work.h | 26 +++++-------------
include/linux/sched.h | 5 ---
include/linux/smp.h | 23 +++++-----------
include/linux/smp_types.h | 66 ++++++++++++++++++++++++++++++++++++++++++++++
kernel/sched/core.c | 6 ++--
kernel/smp.c | 18 ------------
6 files changed, 86 insertions(+), 58 deletions(-)
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -2,7 +2,7 @@
#ifndef _LINUX_IRQ_WORK_H
#define _LINUX_IRQ_WORK_H
-#include <linux/llist.h>
+#include <linux/smp_types.h>
/*
* An entry can be in one of four states:
@@ -13,24 +13,14 @@
* busy NULL, 2 -> {free, claimed} : callback in progress, can be claimed
*/
-/* flags share CSD_FLAG_ space */
-
-#define IRQ_WORK_PENDING BIT(0)
-#define IRQ_WORK_BUSY BIT(1)
-
-/* Doesn't want IPI, wait for tick: */
-#define IRQ_WORK_LAZY BIT(2)
-/* Run hard IRQ context, even on RT */
-#define IRQ_WORK_HARD_IRQ BIT(3)
-
-#define IRQ_WORK_CLAIMED (IRQ_WORK_PENDING | IRQ_WORK_BUSY)
-
-/*
- * structure shares layout with single_call_data_t.
- */
struct irq_work {
- struct llist_node llnode;
- atomic_t flags;
+ union {
+ struct __call_single_node node;
+ struct {
+ struct llist_node llnode;
+ atomic_t flags;
+ };
+ };
void (*func)(struct irq_work *);
};
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -654,11 +654,8 @@ struct task_struct {
unsigned int ptrace;
#ifdef CONFIG_SMP
- struct {
- struct llist_node wake_entry;
- unsigned int wake_entry_type;
- };
int on_cpu;
+ struct __call_single_node wake_entry;
#ifdef CONFIG_THREAD_INFO_IN_TASK
/* Current CPU: */
unsigned int cpu;
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -12,29 +12,22 @@
#include <linux/list.h>
#include <linux/cpumask.h>
#include <linux/init.h>
-#include <linux/llist.h>
+#include <linux/smp_types.h>
typedef void (*smp_call_func_t)(void *info);
typedef bool (*smp_cond_func_t)(int cpu, void *info);
-enum {
- CSD_FLAG_LOCK = 0x01,
-
- /* IRQ_WORK_flags */
-
- CSD_TYPE_ASYNC = 0x00,
- CSD_TYPE_SYNC = 0x10,
- CSD_TYPE_IRQ_WORK = 0x20,
- CSD_TYPE_TTWU = 0x30,
- CSD_FLAG_TYPE_MASK = 0xF0,
-};
-
/*
* structure shares (partial) layout with struct irq_work
*/
struct __call_single_data {
- struct llist_node llist;
- unsigned int flags;
+ union {
+ struct __call_single_node node;
+ struct {
+ struct llist_node llist;
+ unsigned int flags;
+ };
+ };
smp_call_func_t func;
void *info;
};
--- /dev/null
+++ b/include/linux/smp_types.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_SMP_TYPES_H
+#define __LINUX_SMP_TYPES_H
+
+#include <linux/llist.h>
+
+enum {
+ CSD_FLAG_LOCK = 0x01,
+
+ IRQ_WORK_PENDING = 0x01,
+ IRQ_WORK_BUSY = 0x02,
+ IRQ_WORK_LAZY = 0x04, /* No IPI, wait for tick */
+ IRQ_WORK_HARD_IRQ = 0x08, /* IRQ context on PREEMPT_RT */
+
+ IRQ_WORK_CLAIMED = (IRQ_WORK_PENDING | IRQ_WORK_BUSY),
+
+ CSD_TYPE_ASYNC = 0x00,
+ CSD_TYPE_SYNC = 0x10,
+ CSD_TYPE_IRQ_WORK = 0x20,
+ CSD_TYPE_TTWU = 0x30,
+
+ CSD_FLAG_TYPE_MASK = 0xF0,
+};
+
+/*
+ * struct __call_single_node is the primary type on
+ * smp.c:call_single_queue.
+ *
+ * flush_smp_call_function_queue() only reads the type from
+ * __call_single_node::u_flags as a regular load, the above
+ * (anonymous) enum defines all the bits of this word.
+ *
+ * Other bits are not modified until the type is known.
+ *
+ * CSD_TYPE_SYNC/ASYNC:
+ * struct {
+ * struct llist_node node;
+ * unsigned int flags;
+ * smp_call_func_t func;
+ * void *info;
+ * };
+ *
+ * CSD_TYPE_IRQ_WORK:
+ * struct {
+ * struct llist_node node;
+ * atomic_t flags;
+ * void (*func)(struct irq_work *);
+ * };
+ *
+ * CSD_TYPE_TTWU:
+ * struct {
+ * struct llist_node node;
+ * unsigned int flags;
+ * };
+ *
+ */
+
+struct __call_single_node {
+ struct llist_node llist;
+ union {
+ unsigned int u_flags;
+ atomic_t a_flags;
+ };
+};
+
+#endif /* __LINUX_SMP_TYPES_H */
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2293,7 +2293,7 @@ void sched_ttwu_pending(void *arg)
rq_lock_irqsave(rq, &rf);
update_rq_clock(rq);
- llist_for_each_entry_safe(p, t, llist, wake_entry) {
+ llist_for_each_entry_safe(p, t, llist, wake_entry.llist) {
if (WARN_ON_ONCE(p->on_cpu))
smp_cond_load_acquire(&p->on_cpu, !VAL);
@@ -2329,7 +2329,7 @@ static void __ttwu_queue_wakelist(struct
p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);
WRITE_ONCE(rq->ttwu_pending, 1);
- __smp_call_single_queue(cpu, &p->wake_entry);
+ __smp_call_single_queue(cpu, &p->wake_entry.llist);
}
void wake_up_if_idle(int cpu)
@@ -2787,7 +2787,7 @@ static void __sched_fork(unsigned long c
#endif
init_numa_balancing(clone_flags, p);
#ifdef CONFIG_SMP
- p->wake_entry_type = CSD_TYPE_TTWU;
+ p->wake_entry.u_flags = CSD_TYPE_TTWU;
#endif
}
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -669,24 +669,6 @@ void __init smp_init(void)
{
int num_nodes, num_cpus;
- /*
- * Ensure struct irq_work layout matches so that
- * flush_smp_call_function_queue() can do horrible things.
- */
- BUILD_BUG_ON(offsetof(struct irq_work, llnode) !=
- offsetof(struct __call_single_data, llist));
- BUILD_BUG_ON(offsetof(struct irq_work, func) !=
- offsetof(struct __call_single_data, func));
- BUILD_BUG_ON(offsetof(struct irq_work, flags) !=
- offsetof(struct __call_single_data, flags));
-
- /*
- * Assert the CSD_TYPE_TTWU layout is similar enough
- * for task_struct to be on the @call_single_queue.
- */
- BUILD_BUG_ON(offsetof(struct task_struct, wake_entry_type) - offsetof(struct task_struct, wake_entry) !=
- offsetof(struct __call_single_data, flags) - offsetof(struct __call_single_data, llist));
-
idle_threads_init();
cpuhp_threads_init();
^ permalink raw reply [flat|nested] 20+ messages in thread
* [tip: sched/urgent] smp, irq_work: Continue smp_call_function*() and irq_work*() integration
2020-06-22 10:01 ` [PATCH -v2 3/5] smp, irq_work: Continue smp_call_function*() and irq_work*() integration Peter Zijlstra
@ 2020-06-23 7:19 ` tip-bot2 for Peter Zijlstra
2020-06-23 8:48 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 20+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2020-06-23 7:19 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Frederic Weisbecker, x86, LKML
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: ad6567b439016f9ffd01bbb25276ee363d9a106a
Gitweb: https://git.kernel.org/tip/ad6567b439016f9ffd01bbb25276ee363d9a106a
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 22 Jun 2020 12:01:25 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 22 Jun 2020 20:51:06 +02:00
smp, irq_work: Continue smp_call_function*() and irq_work*() integration
Instead of relying on BUG_ON() to ensure the various data structures
line up, use a bunch of horrible unions.
Much of the union magic is to ensure irq_work and smp_call_function do
not (yet) see the members of their respective data structures change
name.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/20200622100825.844455025@infradead.org
---
include/linux/irq_work.h | 26 ++++-----------
include/linux/sched.h | 5 +---
include/linux/smp.h | 23 ++++---------
include/linux/smp_types.h | 66 ++++++++++++++++++++++++++++++++++++++-
kernel/sched/core.c | 6 +--
kernel/smp.c | 18 +----------
6 files changed, 86 insertions(+), 58 deletions(-)
create mode 100644 include/linux/smp_types.h
diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 2735da5..3082378 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -2,7 +2,7 @@
#ifndef _LINUX_IRQ_WORK_H
#define _LINUX_IRQ_WORK_H
-#include <linux/llist.h>
+#include <linux/smp_types.h>
/*
* An entry can be in one of four states:
@@ -13,24 +13,14 @@
* busy NULL, 2 -> {free, claimed} : callback in progress, can be claimed
*/
-/* flags share CSD_FLAG_ space */
-
-#define IRQ_WORK_PENDING BIT(0)
-#define IRQ_WORK_BUSY BIT(1)
-
-/* Doesn't want IPI, wait for tick: */
-#define IRQ_WORK_LAZY BIT(2)
-/* Run hard IRQ context, even on RT */
-#define IRQ_WORK_HARD_IRQ BIT(3)
-
-#define IRQ_WORK_CLAIMED (IRQ_WORK_PENDING | IRQ_WORK_BUSY)
-
-/*
- * structure shares layout with single_call_data_t.
- */
struct irq_work {
- struct llist_node llnode;
- atomic_t flags;
+ union {
+ struct __call_single_node node;
+ struct {
+ struct llist_node llnode;
+ atomic_t flags;
+ };
+ };
void (*func)(struct irq_work *);
};
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 59caeb9..0376588 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -653,11 +653,8 @@ struct task_struct {
unsigned int ptrace;
#ifdef CONFIG_SMP
- struct {
- struct llist_node wake_entry;
- unsigned int wake_entry_type;
- };
int on_cpu;
+ struct __call_single_node wake_entry;
#ifdef CONFIG_THREAD_INFO_IN_TASK
/* Current CPU: */
unsigned int cpu;
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 7ee202a..80d557e 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -12,29 +12,22 @@
#include <linux/list.h>
#include <linux/cpumask.h>
#include <linux/init.h>
-#include <linux/llist.h>
+#include <linux/smp_types.h>
typedef void (*smp_call_func_t)(void *info);
typedef bool (*smp_cond_func_t)(int cpu, void *info);
-enum {
- CSD_FLAG_LOCK = 0x01,
-
- /* IRQ_WORK_flags */
-
- CSD_TYPE_ASYNC = 0x00,
- CSD_TYPE_SYNC = 0x10,
- CSD_TYPE_IRQ_WORK = 0x20,
- CSD_TYPE_TTWU = 0x30,
- CSD_FLAG_TYPE_MASK = 0xF0,
-};
-
/*
* structure shares (partial) layout with struct irq_work
*/
struct __call_single_data {
- struct llist_node llist;
- unsigned int flags;
+ union {
+ struct __call_single_node node;
+ struct {
+ struct llist_node llist;
+ unsigned int flags;
+ };
+ };
smp_call_func_t func;
void *info;
};
diff --git a/include/linux/smp_types.h b/include/linux/smp_types.h
new file mode 100644
index 0000000..364b3ae
--- /dev/null
+++ b/include/linux/smp_types.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_SMP_TYPES_H
+#define __LINUX_SMP_TYPES_H
+
+#include <linux/llist.h>
+
+enum {
+ CSD_FLAG_LOCK = 0x01,
+
+ IRQ_WORK_PENDING = 0x01,
+ IRQ_WORK_BUSY = 0x02,
+ IRQ_WORK_LAZY = 0x04, /* No IPI, wait for tick */
+ IRQ_WORK_HARD_IRQ = 0x08, /* IRQ context on PREEMPT_RT */
+
+ IRQ_WORK_CLAIMED = (IRQ_WORK_PENDING | IRQ_WORK_BUSY),
+
+ CSD_TYPE_ASYNC = 0x00,
+ CSD_TYPE_SYNC = 0x10,
+ CSD_TYPE_IRQ_WORK = 0x20,
+ CSD_TYPE_TTWU = 0x30,
+
+ CSD_FLAG_TYPE_MASK = 0xF0,
+};
+
+/*
+ * struct __call_single_node is the primary type on
+ * smp.c:call_single_queue.
+ *
+ * flush_smp_call_function_queue() only reads the type from
+ * __call_single_node::u_flags as a regular load, the above
+ * (anonymous) enum defines all the bits of this word.
+ *
+ * Other bits are not modified until the type is known.
+ *
+ * CSD_TYPE_SYNC/ASYNC:
+ * struct {
+ * struct llist_node node;
+ * unsigned int flags;
+ * smp_call_func_t func;
+ * void *info;
+ * };
+ *
+ * CSD_TYPE_IRQ_WORK:
+ * struct {
+ * struct llist_node node;
+ * atomic_t flags;
+ * void (*func)(struct irq_work *);
+ * };
+ *
+ * CSD_TYPE_TTWU:
+ * struct {
+ * struct llist_node node;
+ * unsigned int flags;
+ * };
+ *
+ */
+
+struct __call_single_node {
+ struct llist_node llist;
+ union {
+ unsigned int u_flags;
+ atomic_t a_flags;
+ };
+};
+
+#endif /* __LINUX_SMP_TYPES_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 019db7a..1339440 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2293,7 +2293,7 @@ void sched_ttwu_pending(void *arg)
rq_lock_irqsave(rq, &rf);
update_rq_clock(rq);
- llist_for_each_entry_safe(p, t, llist, wake_entry) {
+ llist_for_each_entry_safe(p, t, llist, wake_entry.llist) {
if (WARN_ON_ONCE(p->on_cpu))
smp_cond_load_acquire(&p->on_cpu, !VAL);
@@ -2329,7 +2329,7 @@ static void __ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags
p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);
WRITE_ONCE(rq->ttwu_pending, 1);
- __smp_call_single_queue(cpu, &p->wake_entry);
+ __smp_call_single_queue(cpu, &p->wake_entry.llist);
}
void wake_up_if_idle(int cpu)
@@ -2786,7 +2786,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
#endif
init_numa_balancing(clone_flags, p);
#ifdef CONFIG_SMP
- p->wake_entry_type = CSD_TYPE_TTWU;
+ p->wake_entry.u_flags = CSD_TYPE_TTWU;
#endif
}
diff --git a/kernel/smp.c b/kernel/smp.c
index 472c2b2..aa17eed 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -669,24 +669,6 @@ void __init smp_init(void)
{
int num_nodes, num_cpus;
- /*
- * Ensure struct irq_work layout matches so that
- * flush_smp_call_function_queue() can do horrible things.
- */
- BUILD_BUG_ON(offsetof(struct irq_work, llnode) !=
- offsetof(struct __call_single_data, llist));
- BUILD_BUG_ON(offsetof(struct irq_work, func) !=
- offsetof(struct __call_single_data, func));
- BUILD_BUG_ON(offsetof(struct irq_work, flags) !=
- offsetof(struct __call_single_data, flags));
-
- /*
- * Assert the CSD_TYPE_TTWU layout is similar enough
- * for task_struct to be on the @call_single_queue.
- */
- BUILD_BUG_ON(offsetof(struct task_struct, wake_entry_type) - offsetof(struct task_struct, wake_entry) !=
- offsetof(struct __call_single_data, flags) - offsetof(struct __call_single_data, llist));
-
idle_threads_init();
cpuhp_threads_init();
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [tip: sched/urgent] smp, irq_work: Continue smp_call_function*() and irq_work*() integration
2020-06-22 10:01 ` [PATCH -v2 3/5] smp, irq_work: Continue smp_call_function*() and irq_work*() integration Peter Zijlstra
2020-06-23 7:19 ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
@ 2020-06-23 8:48 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 20+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2020-06-23 8:48 UTC (permalink / raw)
To: linux-tip-commits
Cc: Linus Torvalds, Peter Zijlstra (Intel),
Ingo Molnar, Frederic Weisbecker, x86, LKML
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 380dc20ce84341bb376371fd5ed5fe6a93d4f4cf
Gitweb: https://git.kernel.org/tip/380dc20ce84341bb376371fd5ed5fe6a93d4f4cf
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 22 Jun 2020 12:01:25 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 23 Jun 2020 10:42:39 +02:00
smp, irq_work: Continue smp_call_function*() and irq_work*() integration
Instead of relying on BUG_ON() to ensure the various data structures
line up, use a bunch of horrible unions to make it all automatic.
Much of the union magic is to ensure irq_work and smp_call_function do
not (yet) see the members of their respective data structures change
name.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/20200622100825.844455025@infradead.org
---
include/linux/irq_work.h | 26 ++++-----------
include/linux/sched.h | 5 +---
include/linux/smp.h | 23 ++++---------
include/linux/smp_types.h | 66 ++++++++++++++++++++++++++++++++++++++-
kernel/sched/core.c | 6 +--
kernel/smp.c | 18 +----------
6 files changed, 86 insertions(+), 58 deletions(-)
create mode 100644 include/linux/smp_types.h
diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 2735da5..3082378 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -2,7 +2,7 @@
#ifndef _LINUX_IRQ_WORK_H
#define _LINUX_IRQ_WORK_H
-#include <linux/llist.h>
+#include <linux/smp_types.h>
/*
* An entry can be in one of four states:
@@ -13,24 +13,14 @@
* busy NULL, 2 -> {free, claimed} : callback in progress, can be claimed
*/
-/* flags share CSD_FLAG_ space */
-
-#define IRQ_WORK_PENDING BIT(0)
-#define IRQ_WORK_BUSY BIT(1)
-
-/* Doesn't want IPI, wait for tick: */
-#define IRQ_WORK_LAZY BIT(2)
-/* Run hard IRQ context, even on RT */
-#define IRQ_WORK_HARD_IRQ BIT(3)
-
-#define IRQ_WORK_CLAIMED (IRQ_WORK_PENDING | IRQ_WORK_BUSY)
-
-/*
- * structure shares layout with single_call_data_t.
- */
struct irq_work {
- struct llist_node llnode;
- atomic_t flags;
+ union {
+ struct __call_single_node node;
+ struct {
+ struct llist_node llnode;
+ atomic_t flags;
+ };
+ };
void (*func)(struct irq_work *);
};
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 224b5de..692e327 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -654,11 +654,8 @@ struct task_struct {
unsigned int ptrace;
#ifdef CONFIG_SMP
- struct {
- struct llist_node wake_entry;
- unsigned int wake_entry_type;
- };
int on_cpu;
+ struct __call_single_node wake_entry;
#ifdef CONFIG_THREAD_INFO_IN_TASK
/* Current CPU: */
unsigned int cpu;
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 7ee202a..80d557e 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -12,29 +12,22 @@
#include <linux/list.h>
#include <linux/cpumask.h>
#include <linux/init.h>
-#include <linux/llist.h>
+#include <linux/smp_types.h>
typedef void (*smp_call_func_t)(void *info);
typedef bool (*smp_cond_func_t)(int cpu, void *info);
-enum {
- CSD_FLAG_LOCK = 0x01,
-
- /* IRQ_WORK_flags */
-
- CSD_TYPE_ASYNC = 0x00,
- CSD_TYPE_SYNC = 0x10,
- CSD_TYPE_IRQ_WORK = 0x20,
- CSD_TYPE_TTWU = 0x30,
- CSD_FLAG_TYPE_MASK = 0xF0,
-};
-
/*
* structure shares (partial) layout with struct irq_work
*/
struct __call_single_data {
- struct llist_node llist;
- unsigned int flags;
+ union {
+ struct __call_single_node node;
+ struct {
+ struct llist_node llist;
+ unsigned int flags;
+ };
+ };
smp_call_func_t func;
void *info;
};
diff --git a/include/linux/smp_types.h b/include/linux/smp_types.h
new file mode 100644
index 0000000..364b3ae
--- /dev/null
+++ b/include/linux/smp_types.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_SMP_TYPES_H
+#define __LINUX_SMP_TYPES_H
+
+#include <linux/llist.h>
+
+enum {
+ CSD_FLAG_LOCK = 0x01,
+
+ IRQ_WORK_PENDING = 0x01,
+ IRQ_WORK_BUSY = 0x02,
+ IRQ_WORK_LAZY = 0x04, /* No IPI, wait for tick */
+ IRQ_WORK_HARD_IRQ = 0x08, /* IRQ context on PREEMPT_RT */
+
+ IRQ_WORK_CLAIMED = (IRQ_WORK_PENDING | IRQ_WORK_BUSY),
+
+ CSD_TYPE_ASYNC = 0x00,
+ CSD_TYPE_SYNC = 0x10,
+ CSD_TYPE_IRQ_WORK = 0x20,
+ CSD_TYPE_TTWU = 0x30,
+
+ CSD_FLAG_TYPE_MASK = 0xF0,
+};
+
+/*
+ * struct __call_single_node is the primary type on
+ * smp.c:call_single_queue.
+ *
+ * flush_smp_call_function_queue() only reads the type from
+ * __call_single_node::u_flags as a regular load, the above
+ * (anonymous) enum defines all the bits of this word.
+ *
+ * Other bits are not modified until the type is known.
+ *
+ * CSD_TYPE_SYNC/ASYNC:
+ * struct {
+ * struct llist_node node;
+ * unsigned int flags;
+ * smp_call_func_t func;
+ * void *info;
+ * };
+ *
+ * CSD_TYPE_IRQ_WORK:
+ * struct {
+ * struct llist_node node;
+ * atomic_t flags;
+ * void (*func)(struct irq_work *);
+ * };
+ *
+ * CSD_TYPE_TTWU:
+ * struct {
+ * struct llist_node node;
+ * unsigned int flags;
+ * };
+ *
+ */
+
+struct __call_single_node {
+ struct llist_node llist;
+ union {
+ unsigned int u_flags;
+ atomic_t a_flags;
+ };
+};
+
+#endif /* __LINUX_SMP_TYPES_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f778067..ca5db40 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2293,7 +2293,7 @@ void sched_ttwu_pending(void *arg)
rq_lock_irqsave(rq, &rf);
update_rq_clock(rq);
- llist_for_each_entry_safe(p, t, llist, wake_entry) {
+ llist_for_each_entry_safe(p, t, llist, wake_entry.llist) {
if (WARN_ON_ONCE(p->on_cpu))
smp_cond_load_acquire(&p->on_cpu, !VAL);
@@ -2329,7 +2329,7 @@ static void __ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags
p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);
WRITE_ONCE(rq->ttwu_pending, 1);
- __smp_call_single_queue(cpu, &p->wake_entry);
+ __smp_call_single_queue(cpu, &p->wake_entry.llist);
}
void wake_up_if_idle(int cpu)
@@ -2786,7 +2786,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
#endif
init_numa_balancing(clone_flags, p);
#ifdef CONFIG_SMP
- p->wake_entry_type = CSD_TYPE_TTWU;
+ p->wake_entry.u_flags = CSD_TYPE_TTWU;
#endif
}
diff --git a/kernel/smp.c b/kernel/smp.c
index 472c2b2..aa17eed 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -669,24 +669,6 @@ void __init smp_init(void)
{
int num_nodes, num_cpus;
- /*
- * Ensure struct irq_work layout matches so that
- * flush_smp_call_function_queue() can do horrible things.
- */
- BUILD_BUG_ON(offsetof(struct irq_work, llnode) !=
- offsetof(struct __call_single_data, llist));
- BUILD_BUG_ON(offsetof(struct irq_work, func) !=
- offsetof(struct __call_single_data, func));
- BUILD_BUG_ON(offsetof(struct irq_work, flags) !=
- offsetof(struct __call_single_data, flags));
-
- /*
- * Assert the CSD_TYPE_TTWU layout is similar enough
- * for task_struct to be on the @call_single_queue.
- */
- BUILD_BUG_ON(offsetof(struct task_struct, wake_entry_type) - offsetof(struct task_struct, wake_entry) !=
- offsetof(struct __call_single_data, flags) - offsetof(struct __call_single_data, llist));
-
idle_threads_init();
cpuhp_threads_init();
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH -v2 4/5] irq_work: Cleanup
2020-06-22 10:01 [PATCH -v2 0/5] sched: TTWU, IPI and stuff Peter Zijlstra
` (2 preceding siblings ...)
2020-06-22 10:01 ` [PATCH -v2 3/5] smp, irq_work: Continue smp_call_function*() and irq_work*() integration Peter Zijlstra
@ 2020-06-22 10:01 ` Peter Zijlstra
2020-06-22 10:01 ` [PATCH -v2 5/5] smp: Cleanup smp_call_function*() Peter Zijlstra
4 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2020-06-22 10:01 UTC (permalink / raw)
To: mingo, tglx
Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, paulmck, frederic, peterz, torvalds,
hch
Get rid of the __call_single_node union and clean up the API a little
to avoid external code relying on the structure layout as much.
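As a usage sketch (my_handler(), my_work and my_kick() are hypothetical; the
initializer macros and accessors are the ones introduced below), the change
for external users looks roughly like this:

#include <linux/irq_work.h>

/* Hypothetical handler; before this patch a static definition had to spell
 * out the layout, e.g. ".flags = ATOMIC_INIT(IRQ_WORK_HARD_IRQ), .func = ...". */
static void my_handler(struct irq_work *work)
{
	/* work to run in hard IRQ context */
}

/* After this patch the layout is hidden behind the initializer macro ... */
static struct irq_work my_work = IRQ_WORK_INIT_HARD(my_handler);

static void my_kick(void)
{
	/* ... and behind accessors instead of atomic_read(&work->flags) tests */
	if (!irq_work_is_busy(&my_work))
		irq_work_queue(&my_work);
}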
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/irq_work.h | 33 +++++++++++++++++++++------------
include/linux/irqflags.h | 4 ++--
kernel/bpf/stackmap.c | 2 +-
kernel/irq_work.c | 18 +++++++++---------
kernel/printk/printk.c | 6 ++----
kernel/rcu/tree.c | 3 +--
kernel/time/tick-sched.c | 6 ++----
kernel/trace/bpf_trace.c | 2 +-
8 files changed, 39 insertions(+), 35 deletions(-)
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -14,28 +14,37 @@
*/
struct irq_work {
- union {
- struct __call_single_node node;
- struct {
- struct llist_node llnode;
- atomic_t flags;
- };
- };
+ struct __call_single_node node;
void (*func)(struct irq_work *);
};
+#define __IRQ_WORK_INIT(_func, _flags) (struct irq_work){ \
+ .node = { .u_flags = (_flags), }, \
+ .func = (_func), \
+}
+
+#define IRQ_WORK_INIT(_func) __IRQ_WORK_INIT(_func, 0)
+#define IRQ_WORK_INIT_LAZY(_func) __IRQ_WORK_INIT(_func, IRQ_WORK_LAZY)
+#define IRQ_WORK_INIT_HARD(_func) __IRQ_WORK_INIT(_func, IRQ_WORK_HARD_IRQ)
+
+#define DEFINE_IRQ_WORK(name, _f) \
+ struct irq_work name = IRQ_WORK_INIT(_f)
+
static inline
void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
{
- atomic_set(&work->flags, 0);
- work->func = func;
+ *work = IRQ_WORK_INIT(func);
}
-#define DEFINE_IRQ_WORK(name, _f) struct irq_work name = { \
- .flags = ATOMIC_INIT(0), \
- .func = (_f) \
+static inline bool irq_work_is_pending(struct irq_work *work)
+{
+ return atomic_read(&work->node.a_flags) & IRQ_WORK_PENDING;
}
+static inline bool irq_work_is_busy(struct irq_work *work)
+{
+ return atomic_read(&work->node.a_flags) & IRQ_WORK_BUSY;
+}
bool irq_work_queue(struct irq_work *work);
bool irq_work_queue_on(struct irq_work *work, int cpu);
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -90,12 +90,12 @@ do { \
# define lockdep_irq_work_enter(__work) \
do { \
- if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
+ if (!(atomic_read(&__work->node.a_flags) & IRQ_WORK_HARD_IRQ))\
current->irq_config = 1; \
} while (0)
# define lockdep_irq_work_exit(__work) \
do { \
- if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
+ if (!(atomic_read(&__work->node.a_flags) & IRQ_WORK_HARD_IRQ))\
current->irq_config = 0; \
} while (0)
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -292,7 +292,7 @@ static void stack_map_get_build_id_offse
if (irqs_disabled()) {
if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
work = this_cpu_ptr(&up_read_work);
- if (atomic_read(&work->irq_work.flags) & IRQ_WORK_BUSY) {
+ if (irq_work_is_busy(&work->irq_work)) {
/* cannot queue more up_read, fallback */
irq_work_busy = true;
}
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -31,7 +31,7 @@ static bool irq_work_claim(struct irq_wo
{
int oflags;
- oflags = atomic_fetch_or(IRQ_WORK_CLAIMED | CSD_TYPE_IRQ_WORK, &work->flags);
+ oflags = atomic_fetch_or(IRQ_WORK_CLAIMED | CSD_TYPE_IRQ_WORK, &work->node.a_flags);
/*
* If the work is already pending, no need to raise the IPI.
* The pairing atomic_fetch_andnot() in irq_work_run() makes sure
@@ -53,12 +53,12 @@ void __weak arch_irq_work_raise(void)
static void __irq_work_queue_local(struct irq_work *work)
{
/* If the work is "lazy", handle it from next tick if any */
- if (atomic_read(&work->flags) & IRQ_WORK_LAZY) {
- if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
+ if (atomic_read(&work->node.a_flags) & IRQ_WORK_LAZY) {
+ if (llist_add(&work->node.llist, this_cpu_ptr(&lazy_list)) &&
tick_nohz_tick_stopped())
arch_irq_work_raise();
} else {
- if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
+ if (llist_add(&work->node.llist, this_cpu_ptr(&raised_list)))
arch_irq_work_raise();
}
}
@@ -102,7 +102,7 @@ bool irq_work_queue_on(struct irq_work *
if (cpu != smp_processor_id()) {
/* Arch remote IPI send/receive backend aren't NMI safe */
WARN_ON_ONCE(in_nmi());
- __smp_call_single_queue(cpu, &work->llnode);
+ __smp_call_single_queue(cpu, &work->node.llist);
} else {
__irq_work_queue_local(work);
}
@@ -142,7 +142,7 @@ void irq_work_single(void *arg)
* to claim that work don't rely on us to handle their data
* while we are in the middle of the func.
*/
- flags = atomic_fetch_andnot(IRQ_WORK_PENDING, &work->flags);
+ flags = atomic_fetch_andnot(IRQ_WORK_PENDING, &work->node.a_flags);
lockdep_irq_work_enter(work);
work->func(work);
@@ -152,7 +152,7 @@ void irq_work_single(void *arg)
* no-one else claimed it meanwhile.
*/
flags &= ~IRQ_WORK_PENDING;
- (void)atomic_cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
+ (void)atomic_cmpxchg(&work->node.a_flags, flags, flags & ~IRQ_WORK_BUSY);
}
static void irq_work_run_list(struct llist_head *list)
@@ -166,7 +166,7 @@ static void irq_work_run_list(struct lli
return;
llnode = llist_del_all(list);
- llist_for_each_entry_safe(work, tmp, llnode, llnode)
+ llist_for_each_entry_safe(work, tmp, llnode, node.llist)
irq_work_single(work);
}
@@ -198,7 +198,7 @@ void irq_work_sync(struct irq_work *work
{
lockdep_assert_irqs_enabled();
- while (atomic_read(&work->flags) & IRQ_WORK_BUSY)
+ while (irq_work_is_busy(work))
cpu_relax();
}
EXPORT_SYMBOL_GPL(irq_work_sync);
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3031,10 +3031,8 @@ static void wake_up_klogd_work_func(stru
wake_up_interruptible(&log_wait);
}
-static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
- .func = wake_up_klogd_work_func,
- .flags = ATOMIC_INIT(IRQ_WORK_LAZY),
-};
+static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) =
+ IRQ_WORK_INIT_LAZY(wake_up_klogd_work_func);
void wake_up_klogd(void)
{
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1273,8 +1273,6 @@ static int rcu_implicit_dynticks_qs(stru
if (IS_ENABLED(CONFIG_IRQ_WORK) &&
!rdp->rcu_iw_pending && rdp->rcu_iw_gp_seq != rnp->gp_seq &&
(rnp->ffmask & rdp->grpmask)) {
- init_irq_work(&rdp->rcu_iw, rcu_iw_handler);
- atomic_set(&rdp->rcu_iw.flags, IRQ_WORK_HARD_IRQ);
rdp->rcu_iw_pending = true;
rdp->rcu_iw_gp_seq = rnp->gp_seq;
irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
@@ -3740,6 +3738,7 @@ int rcutree_prepare_cpu(unsigned int cpu
rdp->cpu_no_qs.b.norm = true;
rdp->core_needs_qs = false;
rdp->rcu_iw_pending = false;
+ rdp->rcu_iw = IRQ_WORK_INIT_HARD(rcu_iw_handler);
rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -243,10 +243,8 @@ static void nohz_full_kick_func(struct i
/* Empty, the tick restart happens on tick_nohz_irq_exit() */
}
-static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
- .func = nohz_full_kick_func,
- .flags = ATOMIC_INIT(IRQ_WORK_HARD_IRQ),
-};
+static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) =
+ IRQ_WORK_INIT_HARD(nohz_full_kick_func);
/*
* Kick this CPU if it's full dynticks in order to force it to
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1013,7 +1013,7 @@ static int bpf_send_signal_common(u32 si
return -EINVAL;
work = this_cpu_ptr(&send_signal_work);
- if (atomic_read(&work->irq_work.flags) & IRQ_WORK_BUSY)
+ if (irq_work_is_busy(&work->irq_work))
return -EBUSY;
/* Add the current task, which is the target of sending signal,
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH -v2 5/5] smp: Cleanup smp_call_function*()
2020-06-22 10:01 [PATCH -v2 0/5] sched: TTWU, IPI and stuff Peter Zijlstra
` (3 preceding siblings ...)
2020-06-22 10:01 ` [PATCH -v2 4/5] irq_work: Cleanup Peter Zijlstra
@ 2020-06-22 10:01 ` Peter Zijlstra
4 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2020-06-22 10:01 UTC (permalink / raw)
To: mingo, tglx
Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, paulmck, frederic, peterz, torvalds,
hch
Get rid of the __call_single_node union and clean up the API a little
to avoid external code relying on the structure layout as much.
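A usage sketch under the same caveat (my_csd_func(), my_csd and kick_cpu()
are hypothetical; CSD_INIT()/INIT_CSD() are the macros added below):

#include <linux/smp.h>
#include <linux/percpu.h>

static void my_csd_func(void *info)
{
	/* runs on the target CPU from the call_single_queue */
}

/* Static per-CPU definition, as the converted MIPS/kgdb users now do: */
static DEFINE_PER_CPU(call_single_data_t, my_csd) = CSD_INIT(my_csd_func, NULL);

/* Dynamic (re)initialization, replacing open-coded ->func/->info/->flags stores: */
static int kick_cpu(int cpu, void *info)
{
	call_single_data_t *csd = &per_cpu(my_csd, cpu);

	INIT_CSD(csd, my_csd_func, info);
	return smp_call_function_single_async(cpu, csd);
}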
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/mips/kernel/process.c | 5 +--
arch/mips/kernel/smp.c | 25 +++------------
arch/s390/pci/pci_irq.c | 4 --
arch/x86/kernel/cpuid.c | 7 +---
arch/x86/lib/msr-smp.c | 7 +---
block/blk-mq.c | 4 --
block/blk-softirq.c | 9 +----
drivers/cpuidle/coupled.c | 3 -
drivers/net/ethernet/cavium/liquidio/lio_core.c | 9 +----
include/linux/smp.h | 16 +++++----
kernel/debug/debug_core.c | 6 +--
kernel/sched/core.c | 12 +------
kernel/smp.c | 40 ++++++++++++------------
net/core/dev.c | 3 -
14 files changed, 57 insertions(+), 93 deletions(-)
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -687,7 +687,6 @@ unsigned long arch_align_stack(unsigned
return sp & ALMASK;
}
-static DEFINE_PER_CPU(call_single_data_t, backtrace_csd);
static struct cpumask backtrace_csd_busy;
static void handle_backtrace(void *info)
@@ -696,6 +695,9 @@ static void handle_backtrace(void *info)
cpumask_clear_cpu(smp_processor_id(), &backtrace_csd_busy);
}
+static DEFINE_PER_CPU(call_single_data_t, backtrace_csd) =
+ CSD_INIT(handle_backtrace, NULL);
+
static void raise_backtrace(cpumask_t *mask)
{
call_single_data_t *csd;
@@ -715,7 +717,6 @@ static void raise_backtrace(cpumask_t *m
}
csd = &per_cpu(backtrace_csd, cpu);
- csd->func = handle_backtrace;
smp_call_function_single_async(cpu, csd);
}
}
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -687,36 +687,23 @@ EXPORT_SYMBOL(flush_tlb_one);
#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
-static DEFINE_PER_CPU(call_single_data_t, tick_broadcast_csd);
-
-void tick_broadcast(const struct cpumask *mask)
-{
- call_single_data_t *csd;
- int cpu;
-
- for_each_cpu(cpu, mask) {
- csd = &per_cpu(tick_broadcast_csd, cpu);
- smp_call_function_single_async(cpu, csd);
- }
-}
-
static void tick_broadcast_callee(void *info)
{
tick_receive_broadcast();
}
-static int __init tick_broadcast_init(void)
+static DEFINE_PER_CPU(call_single_data_t, tick_broadcast_csd) =
+ CSD_INIT(tick_broadcast_callee, NULL);
+
+void tick_broadcast(const struct cpumask *mask)
{
call_single_data_t *csd;
int cpu;
- for (cpu = 0; cpu < NR_CPUS; cpu++) {
+ for_each_cpu(cpu, mask) {
csd = &per_cpu(tick_broadcast_csd, cpu);
- csd->func = tick_broadcast_callee;
+ smp_call_function_single_async(cpu, csd);
}
-
- return 0;
}
-early_initcall(tick_broadcast_init);
#endif /* CONFIG_GENERIC_CLOCKEVENTS_BROADCAST */
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -178,9 +178,7 @@ static void zpci_handle_fallback_irq(voi
if (atomic_inc_return(&cpu_data->scheduled) > 1)
continue;
- cpu_data->csd.func = zpci_handle_remote_irq;
- cpu_data->csd.info = &cpu_data->scheduled;
- cpu_data->csd.flags = 0;
+ INIT_CSD(&cpu_data->csd, zpci_handle_remote_irq, &cpu_data->scheduled);
smp_call_function_single_async(cpu, &cpu_data->csd);
}
}
--- a/arch/x86/kernel/cpuid.c
+++ b/arch/x86/kernel/cpuid.c
@@ -74,10 +74,9 @@ static ssize_t cpuid_read(struct file *f
init_completion(&cmd.done);
for (; count; count -= 16) {
- call_single_data_t csd = {
- .func = cpuid_smp_cpuid,
- .info = &cmd,
- };
+ call_single_data_t csd;
+
+ INIT_CSD(&csd, cpuid_smp_cpuid, &cmd);
cmd.regs.eax = pos;
cmd.regs.ecx = pos >> 32;
--- a/arch/x86/lib/msr-smp.c
+++ b/arch/x86/lib/msr-smp.c
@@ -169,12 +169,11 @@ static void __wrmsr_safe_on_cpu(void *in
int rdmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h)
{
struct msr_info_completion rv;
- call_single_data_t csd = {
- .func = __rdmsr_safe_on_cpu,
- .info = &rv,
- };
+ call_single_data_t csd;
int err;
+ INIT_CSD(&csd, __rdmsr_safe_on_cpu, &rv);
+
memset(&rv, 0, sizeof(rv));
init_completion(&rv.done);
rv.msr.msr_no = msr_no;
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -629,9 +629,7 @@ void blk_mq_force_complete_rq(struct req
shared = cpus_share_cache(cpu, ctx->cpu);
if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
- rq->csd.func = __blk_mq_complete_request_remote;
- rq->csd.info = rq;
- rq->csd.flags = 0;
+ INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq);
smp_call_function_single_async(ctx->cpu, &rq->csd);
} else {
q->mq_ops->complete(rq);
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -57,13 +57,8 @@ static void trigger_softirq(void *data)
static int raise_blk_irq(int cpu, struct request *rq)
{
if (cpu_online(cpu)) {
- call_single_data_t *data = &rq->csd;
-
- data->func = trigger_softirq;
- data->info = rq;
- data->flags = 0;
-
- smp_call_function_single_async(cpu, data);
+ INIT_CSD(&rq->csd, trigger_softirq, rq);
+ smp_call_function_single_async(cpu, &rq->csd);
return 0;
}
--- a/drivers/cpuidle/coupled.c
+++ b/drivers/cpuidle/coupled.c
@@ -674,8 +674,7 @@ int cpuidle_coupled_register_device(stru
coupled->refcnt++;
csd = &per_cpu(cpuidle_coupled_poke_cb, dev->cpu);
- csd->func = cpuidle_coupled_handle_poke;
- csd->info = (void *)(unsigned long)dev->cpu;
+ INIT_CSD(csd, cpuidle_coupled_handle_poke, (void *)(unsigned long)dev->cpu);
return 0;
}
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -726,13 +726,8 @@ static void liquidio_napi_drv_callback(v
droq->cpu_id == this_cpu) {
napi_schedule_irqoff(&droq->napi);
} else {
- call_single_data_t *csd = &droq->csd;
-
- csd->func = napi_schedule_wrapper;
- csd->info = &droq->napi;
- csd->flags = 0;
-
- smp_call_function_single_async(droq->cpu_id, csd);
+ INIT_CSD(&droq->csd, napi_schedule_wrapper, &droq->napi);
+ smp_call_function_single_async(droq->cpu_id, &droq->csd);
}
}
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -21,21 +21,23 @@ typedef bool (*smp_cond_func_t)(int cpu,
* structure shares (partial) layout with struct irq_work
*/
struct __call_single_data {
- union {
- struct __call_single_node node;
- struct {
- struct llist_node llist;
- unsigned int flags;
- };
- };
+ struct __call_single_node node;
smp_call_func_t func;
void *info;
};
+#define CSD_INIT(_func, _info) \
+ (struct __call_single_data){ .func = (_func), .info = (_info), }
+
/* Use __aligned() to avoid to use 2 cache lines for 1 csd */
typedef struct __call_single_data call_single_data_t
__aligned(sizeof(struct __call_single_data));
+#define INIT_CSD(_csd, _func, _info) \
+do { \
+ *(_csd) = CSD_INIT((_func), (_info)); \
+} while (0)
+
/*
* Enqueue a llist_node on the call_single_queue; be very careful, read
* flush_smp_call_function_queue() in detail.
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -225,8 +225,6 @@ int __weak kgdb_skipexception(int except
* Default (weak) implementation for kgdb_roundup_cpus
*/
-static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd);
-
void __weak kgdb_call_nmi_hook(void *ignored)
{
/*
@@ -240,6 +238,9 @@ void __weak kgdb_call_nmi_hook(void *ign
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
}
+static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd) =
+ CSD_INIT(kgdb_call_nmi_hook, NULL);
+
void __weak kgdb_roundup_cpus(void)
{
call_single_data_t *csd;
@@ -266,7 +267,6 @@ void __weak kgdb_roundup_cpus(void)
continue;
kgdb_info[cpu].rounding_up = true;
- csd->func = kgdb_call_nmi_hook;
ret = smp_call_function_single_async(cpu, csd);
if (ret)
kgdb_info[cpu].rounding_up = false;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -221,14 +221,6 @@ void update_rq_clock(struct rq *rq)
update_rq_clock_task(rq, delta);
}
-static inline void
-rq_csd_init(struct rq *rq, call_single_data_t *csd, smp_call_func_t func)
-{
- csd->flags = 0;
- csd->func = func;
- csd->info = rq;
-}
-
#ifdef CONFIG_SCHED_HRTICK
/*
* Use HR-timers to deliver accurate preemption points.
@@ -329,7 +321,7 @@ void hrtick_start(struct rq *rq, u64 del
static void hrtick_rq_init(struct rq *rq)
{
#ifdef CONFIG_SMP
- rq_csd_init(rq, &rq->hrtick_csd, __hrtick_start);
+ INIT_CSD(&rq->hrtick_csd, __hrtick_start, rq);
#endif
hrtimer_init(&rq->hrtick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
rq->hrtick_timer.function = hrtick;
@@ -6778,7 +6770,7 @@ void __init sched_init(void)
rq->last_blocked_load_update_tick = jiffies;
atomic_set(&rq->nohz_flags, 0);
- rq_csd_init(rq, &rq->nohz_csd, nohz_csd_func);
+ INIT_CSD(&rq->nohz_csd, nohz_csd_func, rq);
#endif
#endif /* CONFIG_SMP */
hrtick_rq_init(rq);
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -24,7 +24,7 @@
#include "smpboot.h"
#include "sched/smp.h"
-#define CSD_TYPE(_csd) ((_csd)->flags & CSD_FLAG_TYPE_MASK)
+#define CSD_TYPE(_csd) ((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)
struct call_function_data {
call_single_data_t __percpu *csd;
@@ -105,13 +105,13 @@ void __init call_function_init(void)
*/
static __always_inline void csd_lock_wait(call_single_data_t *csd)
{
- smp_cond_load_acquire(&csd->flags, !(VAL & CSD_FLAG_LOCK));
+ smp_cond_load_acquire(&csd->node.u_flags, !(VAL & CSD_FLAG_LOCK));
}
static __always_inline void csd_lock(call_single_data_t *csd)
{
csd_lock_wait(csd);
- csd->flags |= CSD_FLAG_LOCK;
+ csd->node.u_flags |= CSD_FLAG_LOCK;
/*
* prevent CPU from reordering the above assignment
@@ -123,12 +123,12 @@ static __always_inline void csd_lock(cal
static __always_inline void csd_unlock(call_single_data_t *csd)
{
- WARN_ON(!(csd->flags & CSD_FLAG_LOCK));
+ WARN_ON(!(csd->node.u_flags & CSD_FLAG_LOCK));
/*
* ensure we're all done before releasing data:
*/
- smp_store_release(&csd->flags, 0);
+ smp_store_release(&csd->node.u_flags, 0);
}
static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
@@ -180,7 +180,7 @@ static int generic_exec_single(int cpu,
return -ENXIO;
}
- __smp_call_single_queue(cpu, &csd->llist);
+ __smp_call_single_queue(cpu, &csd->node.llist);
return 0;
}
@@ -233,7 +233,7 @@ static void flush_smp_call_function_queu
* We don't have to use the _safe() variant here
* because we are not invoking the IPI handlers yet.
*/
- llist_for_each_entry(csd, entry, llist) {
+ llist_for_each_entry(csd, entry, node.llist) {
switch (CSD_TYPE(csd)) {
case CSD_TYPE_ASYNC:
case CSD_TYPE_SYNC:
@@ -258,22 +258,22 @@ static void flush_smp_call_function_queu
* First; run all SYNC callbacks, people are waiting for us.
*/
prev = NULL;
- llist_for_each_entry_safe(csd, csd_next, entry, llist) {
+ llist_for_each_entry_safe(csd, csd_next, entry, node.llist) {
/* Do we wait until *after* callback? */
if (CSD_TYPE(csd) == CSD_TYPE_SYNC) {
smp_call_func_t func = csd->func;
void *info = csd->info;
if (prev) {
- prev->next = &csd_next->llist;
+ prev->next = &csd_next->node.llist;
} else {
- entry = &csd_next->llist;
+ entry = &csd_next->node.llist;
}
func(info);
csd_unlock(csd);
} else {
- prev = &csd->llist;
+ prev = &csd->node.llist;
}
}
@@ -284,14 +284,14 @@ static void flush_smp_call_function_queu
* Second; run all !SYNC callbacks.
*/
prev = NULL;
- llist_for_each_entry_safe(csd, csd_next, entry, llist) {
+ llist_for_each_entry_safe(csd, csd_next, entry, node.llist) {
int type = CSD_TYPE(csd);
if (type != CSD_TYPE_TTWU) {
if (prev) {
- prev->next = &csd_next->llist;
+ prev->next = &csd_next->node.llist;
} else {
- entry = &csd_next->llist;
+ entry = &csd_next->node.llist;
}
if (type == CSD_TYPE_ASYNC) {
@@ -305,7 +305,7 @@ static void flush_smp_call_function_queu
}
} else {
- prev = &csd->llist;
+ prev = &csd->node.llist;
}
}
@@ -341,7 +341,7 @@ int smp_call_function_single(int cpu, sm
{
call_single_data_t *csd;
call_single_data_t csd_stack = {
- .flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC,
+ .node = { .u_flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC, },
};
int this_cpu;
int err;
@@ -416,12 +416,12 @@ int smp_call_function_single_async(int c
preempt_disable();
- if (csd->flags & CSD_FLAG_LOCK) {
+ if (csd->node.u_flags & CSD_FLAG_LOCK) {
err = -EBUSY;
goto out;
}
- csd->flags = CSD_FLAG_LOCK;
+ csd->node.u_flags = CSD_FLAG_LOCK;
smp_wmb();
err = generic_exec_single(cpu, csd);
@@ -539,10 +539,10 @@ static void smp_call_function_many_cond(
csd_lock(csd);
if (wait)
- csd->flags |= CSD_TYPE_SYNC;
+ csd->node.u_flags |= CSD_TYPE_SYNC;
csd->func = func;
csd->info = info;
- if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
+ if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu)))
__cpumask_set_cpu(cpu, cfd->cpumask_ipi);
}
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10645,8 +10645,7 @@ static int __init net_dev_init(void)
INIT_LIST_HEAD(&sd->poll_list);
sd->output_queue_tailp = &sd->output_queue;
#ifdef CONFIG_RPS
- sd->csd.func = rps_trigger_softirq;
- sd->csd.info = sd;
+ INIT_CSD(&sd->csd, rps_trigger_softirq, sd);
sd->cpu = i;
#endif
^ permalink raw reply [flat|nested] 20+ messages in thread