linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
@ 2020-06-15 12:56 Peter Zijlstra
  2020-06-15 12:56 ` [PATCH 1/6] sched: Fix ttwu_queue_cond() Peter Zijlstra
                   ` (6 more replies)
  0 siblings, 7 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-15 12:56 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, paulmck, frederic, peterz

Hi,

So Paul reported rcutorture hitting a NULL dereference, and patch #1 fixes it.

Now, patch #1 is obviously correct, but I can't explain how exactly it leads to
the observed NULL pointer dereference. The NULL pointer deref happens in
find_matching_se()'s last while() loop when is_same_group() fails even though
both parents are NULL.

The only explanation I have for that is that we just did an activate_task()
while: 'task_cpu(p) != cpu_of(rq)', because then 'p->se.cfs_rq' doesn't match.
However, I can't see how the lack of #1 would lead to that. Never-the-less,
patch #2 adds assertions to warn us of this case.

Patch #3 is a trivial rename that ought to eradicate some confusion.

The last 3 patches is what I ended up with for cleaning up the whole
smp_call_function/irq_work/ttwu thing more.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/6] sched: Fix ttwu_queue_cond()
  2020-06-15 12:56 [PATCH 0/6] sched: TTWU, IPI, and assorted stuff Peter Zijlstra
@ 2020-06-15 12:56 ` Peter Zijlstra
  2020-06-15 13:34   ` Peter Zijlstra
  2020-06-22  9:11   ` Mel Gorman
  2020-06-15 12:56 ` [PATCH 2/6] sched: Verify some SMP assumptions Peter Zijlstra
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-15 12:56 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, paulmck, frederic, peterz

Where the condition:

  !cpus_share_cache(smp_processor_id(), cpu)

already implies 'cpu != smp_processor_id()', because a CPU always
shares cache with itself, the secondary condition added in commit:

  2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")

voids that implication, resulting in attempting to do local wake-ups
through the queue mechanism.

Fixes: 2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/sched/core.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2356,11 +2356,22 @@ bool cpus_share_cache(int this_cpu, int
 
 static inline bool ttwu_queue_cond(int cpu, int wake_flags)
 {
+	int this_cpu = smp_processor_id();
+
+	/*
+	 * Only ever queue for remote wakeups. The on_cpu case can only ever
+	 * happen remotely, and for the normal case it makes no sense to
+	 * involve IPIs here, and would be broken, as many architectures cannot
+	 * trivially IPI self in any case.
+	 */
+	if (cpu == this_cpu)
+		return false;
+
 	/*
 	 * If the CPU does not share cache, then queue the task on the
 	 * remote rqs wakelist to avoid accessing remote data.
 	 */
-	if (!cpus_share_cache(smp_processor_id(), cpu))
+	if (!cpus_share_cache(this_cpu, cpu))
 		return true;
 
 	/*



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 2/6] sched: Verify some SMP assumptions
  2020-06-15 12:56 [PATCH 0/6] sched: TTWU, IPI, and assorted stuff Peter Zijlstra
  2020-06-15 12:56 ` [PATCH 1/6] sched: Fix ttwu_queue_cond() Peter Zijlstra
@ 2020-06-15 12:56 ` Peter Zijlstra
  2020-06-15 12:56 ` [PATCH 3/6] sched: s/WF_ON_RQ/WQ_ON_CPU/ Peter Zijlstra
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-15 12:56 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, paulmck, frederic, peterz

Validate that:
 - __smp_call_single_queue() is only used on remote CPUs
 - task and rq CPUs match on activate_task()

(and always use activate_task() where we should)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |    8 ++++----
 kernel/smp.c        |    2 ++
 2 files changed, 6 insertions(+), 4 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1311,6 +1311,8 @@ static inline void dequeue_task(struct r
 
 void activate_task(struct rq *rq, struct task_struct *p, int flags)
 {
+	SCHED_WARN_ON(task_cpu(p) != cpu_of(rq));
+
 	if (task_contributes_to_load(p))
 		rq->nr_uninterruptible--;
 
@@ -1474,8 +1476,7 @@ static struct rq *move_queued_task(struc
 {
 	lockdep_assert_held(&rq->lock);
 
-	WRITE_ONCE(p->on_rq, TASK_ON_RQ_MIGRATING);
-	dequeue_task(rq, p, DEQUEUE_NOCLOCK);
+	deactivate_task(rq, p, DEQUEUE_NOCLOCK);
 	set_task_cpu(p, new_cpu);
 	rq_unlock(rq, rf);
 
@@ -1483,8 +1484,7 @@ static struct rq *move_queued_task(struc
 
 	rq_lock(rq, rf);
 	BUG_ON(task_cpu(p) != new_cpu);
-	enqueue_task(rq, p, 0);
-	p->on_rq = TASK_ON_RQ_QUEUED;
+	activate_task(rq, p, 0);
 	check_preempt_curr(rq, p, 0);
 
 	return rq;
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -135,6 +135,8 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(cal
 
 void __smp_call_single_queue(int cpu, struct llist_node *node)
 {
+	WARN_ON_ONCE(cpu == smp_processor_id());
+
 	/*
 	 * The list addition should be visible before sending the IPI
 	 * handler locks the list to pull the entry off it because of



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 3/6] sched: s/WF_ON_RQ/WQ_ON_CPU/
  2020-06-15 12:56 [PATCH 0/6] sched: TTWU, IPI, and assorted stuff Peter Zijlstra
  2020-06-15 12:56 ` [PATCH 1/6] sched: Fix ttwu_queue_cond() Peter Zijlstra
  2020-06-15 12:56 ` [PATCH 2/6] sched: Verify some SMP assumptions Peter Zijlstra
@ 2020-06-15 12:56 ` Peter Zijlstra
  2020-06-22  9:13   ` Mel Gorman
  2020-06-15 12:56 ` [PATCH 4/6] smp, irq_work: Continue smp_call_function*() and irq_work*() integration Peter Zijlstra
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-15 12:56 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, paulmck, frederic, peterz

Avoids confusion...

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c  |    4 ++--
 kernel/sched/sched.h |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2380,7 +2380,7 @@ static inline bool ttwu_queue_cond(int c
 	 * the soon-to-be-idle CPU as the current CPU is likely busy.
 	 * nr_running is checked to avoid unnecessary task stacking.
 	 */
-	if ((wake_flags & WF_ON_RQ) && cpu_rq(cpu)->nr_running <= 1)
+	if ((wake_flags & WF_ON_CPU) && cpu_rq(cpu)->nr_running <= 1)
 		return true;
 
 	return false;
@@ -2626,7 +2626,7 @@ try_to_wake_up(struct task_struct *p, un
 	 * let the waker make forward progress. This is safe because IRQs are
 	 * disabled and the IPI will deliver after on_cpu is cleared.
 	 */
-	if (READ_ONCE(p->on_cpu) && ttwu_queue_wakelist(p, cpu, wake_flags | WF_ON_RQ))
+	if (READ_ONCE(p->on_cpu) && ttwu_queue_wakelist(p, cpu, wake_flags | WF_ON_CPU))
 		goto unlock;
 
 	/*
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1682,7 +1682,7 @@ static inline int task_on_rq_migrating(s
 #define WF_SYNC			0x01		/* Waker goes to sleep after wakeup */
 #define WF_FORK			0x02		/* Child wakeup after fork */
 #define WF_MIGRATED		0x04		/* Internal use, task got migrated */
-#define WF_ON_RQ		0x08		/* Wakee is on_rq */
+#define WF_ON_CPU		0x08		/* Wakee is on_cpu */
 
 /*
  * To aid in avoiding the subversion of "niceness" due to uneven distribution



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 4/6] smp, irq_work: Continue smp_call_function*() and irq_work*() integration
  2020-06-15 12:56 [PATCH 0/6] sched: TTWU, IPI, and assorted stuff Peter Zijlstra
                   ` (2 preceding siblings ...)
  2020-06-15 12:56 ` [PATCH 3/6] sched: s/WF_ON_RQ/WQ_ON_CPU/ Peter Zijlstra
@ 2020-06-15 12:56 ` Peter Zijlstra
  2020-06-15 12:56 ` [PATCH 5/6] irq_work: Cleanup Peter Zijlstra
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-15 12:56 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, paulmck, frederic, peterz

Instead of relying on BUG_ON() to ensure the various data structures
line up, use a bunch of horrible unions.

Much of the union magic is to ensure irq_work and smp_call_function do
not (yet) see the members of their respective data structures change
name.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/irq_work.h  |   26 +++++-------------
 include/linux/sched.h     |    5 ---
 include/linux/smp.h       |   23 +++++-----------
 include/linux/smp_types.h |   66 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/core.c       |    6 ++--
 kernel/smp.c              |   18 ------------
 6 files changed, 86 insertions(+), 58 deletions(-)

--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -2,7 +2,7 @@
 #ifndef _LINUX_IRQ_WORK_H
 #define _LINUX_IRQ_WORK_H
 
-#include <linux/llist.h>
+#include <linux/smp_types.h>
 
 /*
  * An entry can be in one of four states:
@@ -13,24 +13,14 @@
  * busy      NULL, 2 -> {free, claimed} : callback in progress, can be claimed
  */
 
-/* flags share CSD_FLAG_ space */
-
-#define IRQ_WORK_PENDING	BIT(0)
-#define IRQ_WORK_BUSY		BIT(1)
-
-/* Doesn't want IPI, wait for tick: */
-#define IRQ_WORK_LAZY		BIT(2)
-/* Run hard IRQ context, even on RT */
-#define IRQ_WORK_HARD_IRQ	BIT(3)
-
-#define IRQ_WORK_CLAIMED	(IRQ_WORK_PENDING | IRQ_WORK_BUSY)
-
-/*
- * structure shares layout with single_call_data_t.
- */
 struct irq_work {
-	struct llist_node llnode;
-	atomic_t flags;
+	union {
+		struct __call_single_node node;
+		struct {
+			struct llist_node llnode;
+			atomic_t flags;
+		};
+	};
 	void (*func)(struct irq_work *);
 };
 
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -653,11 +653,8 @@ struct task_struct {
 	unsigned int			ptrace;
 
 #ifdef CONFIG_SMP
-	struct {
-		struct llist_node		wake_entry;
-		unsigned int			wake_entry_type;
-	};
 	int				on_cpu;
+	struct __call_single_node	wake_entry;
 #ifdef CONFIG_THREAD_INFO_IN_TASK
 	/* Current CPU: */
 	unsigned int			cpu;
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -12,29 +12,22 @@
 #include <linux/list.h>
 #include <linux/cpumask.h>
 #include <linux/init.h>
-#include <linux/llist.h>
+#include <linux/smp_types.h>
 
 typedef void (*smp_call_func_t)(void *info);
 typedef bool (*smp_cond_func_t)(int cpu, void *info);
 
-enum {
-	CSD_FLAG_LOCK		= 0x01,
-
-	/* IRQ_WORK_flags */
-
-	CSD_TYPE_ASYNC		= 0x00,
-	CSD_TYPE_SYNC		= 0x10,
-	CSD_TYPE_IRQ_WORK	= 0x20,
-	CSD_TYPE_TTWU		= 0x30,
-	CSD_FLAG_TYPE_MASK	= 0xF0,
-};
-
 /*
  * structure shares (partial) layout with struct irq_work
  */
 struct __call_single_data {
-	struct llist_node llist;
-	unsigned int flags;
+	union {
+		struct __call_single_node node;
+		struct {
+			struct llist_node llist;
+			unsigned int flags;
+		};
+	};
 	smp_call_func_t func;
 	void *info;
 };
--- /dev/null
+++ b/include/linux/smp_types.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_SMP_TYPES_H
+#define __LINUX_SMP_TYPES_H
+
+#include <linux/llist.h>
+
+enum {
+	CSD_FLAG_LOCK		= 0x01,
+
+	IRQ_WORK_PENDING	= 0x01,
+	IRQ_WORK_BUSY		= 0x02,
+	IRQ_WORK_LAZY		= 0x04, /* No IPI, wait for tick */
+	IRQ_WORK_HARD_IRQ	= 0x08, /* IRQ context on PREEMPT_RT */
+
+	IRQ_WORK_CLAIMED	= (IRQ_WORK_PENDING | IRQ_WORK_BUSY),
+
+	CSD_TYPE_ASYNC		= 0x00,
+	CSD_TYPE_SYNC		= 0x10,
+	CSD_TYPE_IRQ_WORK	= 0x20,
+	CSD_TYPE_TTWU		= 0x30,
+
+	CSD_FLAG_TYPE_MASK	= 0xF0,
+};
+
+/*
+ * struct __call_single_node is the primary type on
+ * smp.c:call_single_queue.
+ *
+ * flush_smp_call_function_queue() only reads the type from
+ * __call_single_node::u_flags as a regular load, the above
+ * (anonymous) enum defines all the bits of this word.
+ *
+ * Other bits are not modified until the type is known.
+ *
+ * CSD_TYPE_SYNC/ASYNC:
+ *	struct {
+ *		struct llist_node node;
+ *		unsigned int flags;
+ *		smp_call_func_t func;
+ *		void *info;
+ *	};
+ *
+ * CSD_TYPE_IRQ_WORK:
+ *	struct {
+ *		struct llist_node node;
+ *		atomic_t flags;
+ *		void (*func)(struct irq_work *);
+ *	};
+ *
+ * CSD_TYPE_TTWU:
+ *	struct {
+ *		struct llist_node node;
+ *		unsigned int flags;
+ *	};
+ *
+ */
+
+struct __call_single_node {
+	struct llist_node	llist;
+	union {
+		unsigned int	u_flags;
+		atomic_t	a_flags;
+	};
+};
+
+#endif /* __LINUX_SMP_TYPES_H */
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2293,7 +2293,7 @@ void sched_ttwu_pending(void *arg)
 	rq_lock_irqsave(rq, &rf);
 	update_rq_clock(rq);
 
-	llist_for_each_entry_safe(p, t, llist, wake_entry)
+	llist_for_each_entry_safe(p, t, llist, wake_entry.llist)
 		ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
 
 	rq_unlock_irqrestore(rq, &rf);
@@ -2322,7 +2322,7 @@ static void __ttwu_queue_wakelist(struct
 	p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);
 
 	WRITE_ONCE(rq->ttwu_pending, 1);
-	__smp_call_single_queue(cpu, &p->wake_entry);
+	__smp_call_single_queue(cpu, &p->wake_entry.llist);
 }
 
 void wake_up_if_idle(int cpu)
@@ -2774,7 +2774,7 @@ static void __sched_fork(unsigned long c
 #endif
 	init_numa_balancing(clone_flags, p);
 #ifdef CONFIG_SMP
-	p->wake_entry_type = CSD_TYPE_TTWU;
+	p->wake_entry.u_flags = CSD_TYPE_TTWU;
 #endif
 }
 
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -671,24 +671,6 @@ void __init smp_init(void)
 {
 	int num_nodes, num_cpus;
 
-	/*
-	 * Ensure struct irq_work layout matches so that
-	 * flush_smp_call_function_queue() can do horrible things.
-	 */
-	BUILD_BUG_ON(offsetof(struct irq_work, llnode) !=
-		     offsetof(struct __call_single_data, llist));
-	BUILD_BUG_ON(offsetof(struct irq_work, func) !=
-		     offsetof(struct __call_single_data, func));
-	BUILD_BUG_ON(offsetof(struct irq_work, flags) !=
-		     offsetof(struct __call_single_data, flags));
-
-	/*
-	 * Assert the CSD_TYPE_TTWU layout is similar enough
-	 * for task_struct to be on the @call_single_queue.
-	 */
-	BUILD_BUG_ON(offsetof(struct task_struct, wake_entry_type) - offsetof(struct task_struct, wake_entry) !=
-		     offsetof(struct __call_single_data, flags) - offsetof(struct __call_single_data, llist));
-
 	idle_threads_init();
 	cpuhp_threads_init();
 



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 5/6] irq_work: Cleanup
  2020-06-15 12:56 [PATCH 0/6] sched: TTWU, IPI, and assorted stuff Peter Zijlstra
                   ` (3 preceding siblings ...)
  2020-06-15 12:56 ` [PATCH 4/6] smp, irq_work: Continue smp_call_function*() and irq_work*() integration Peter Zijlstra
@ 2020-06-15 12:56 ` Peter Zijlstra
  2020-06-16 15:16   ` Petr Mladek
  2020-06-15 12:57 ` [PATCH 6/6] smp: Cleanup smp_call_function*() Peter Zijlstra
  2020-06-15 16:23 ` [PATCH 0/6] sched: TTWU, IPI, and assorted stuff Paul E. McKenney
  6 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-15 12:56 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, paulmck, frederic, peterz, ast,
	daniel, pmladek, sergey.senozhatsky

Get rid of the __call_single_node union and clean up the API a little
to avoid external code relying on the structure layout as much.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/irq_work.h |   33 +++++++++++++++++++++------------
 include/linux/irqflags.h |    4 ++--
 kernel/bpf/stackmap.c    |    2 +-
 kernel/irq_work.c        |   18 +++++++++---------
 kernel/printk/printk.c   |    6 ++----
 kernel/rcu/tree.c        |    3 +--
 kernel/time/tick-sched.c |    6 ++----
 kernel/trace/bpf_trace.c |    2 +-
 8 files changed, 39 insertions(+), 35 deletions(-)

--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -14,28 +14,37 @@
  */
 
 struct irq_work {
-	union {
-		struct __call_single_node node;
-		struct {
-			struct llist_node llnode;
-			atomic_t flags;
-		};
-	};
+	struct __call_single_node node;
 	void (*func)(struct irq_work *);
 };
 
+#define __IRQ_WORK_INIT(_func, _flags) (struct irq_work){	\
+	.node = { .u_flags = (_flags), },			\
+	.func = (_func),					\
+}
+
+#define IRQ_WORK_INIT(_func) __IRQ_WORK_INIT(_func, 0)
+#define IRQ_WORK_INIT_LAZY(_func) __IRQ_WORK_INIT(_func, IRQ_WORK_LAZY)
+#define IRQ_WORK_INIT_HARD(_func) __IRQ_WORK_INIT(_func, IRQ_WORK_HARD_IRQ)
+
+#define DEFINE_IRQ_WORK(name, _f)				\
+	struct irq_work name = IRQ_WORK_INIT(_f)
+
 static inline
 void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
 {
-	atomic_set(&work->flags, 0);
-	work->func = func;
+	*work = IRQ_WORK_INIT(func);
 }
 
-#define DEFINE_IRQ_WORK(name, _f) struct irq_work name = {	\
-		.flags = ATOMIC_INIT(0),			\
-		.func  = (_f)					\
+static inline bool irq_work_pending(struct irq_work *work)
+{
+	return atomic_read(&work->node.a_flags) & IRQ_WORK_PENDING;
 }
 
+static inline bool irq_work_busy(struct irq_work *work)
+{
+	return atomic_read(&work->node.a_flags) & IRQ_WORK_BUSY;
+}
 
 bool irq_work_queue(struct irq_work *work);
 bool irq_work_queue_on(struct irq_work *work, int cpu);
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -90,12 +90,12 @@ do {						\
 
 # define lockdep_irq_work_enter(__work)					\
 	  do {								\
-		  if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
+		  if (!(atomic_read(&__work->node.a_flags) & IRQ_WORK_HARD_IRQ))\
 			current->irq_config = 1;			\
 	  } while (0)
 # define lockdep_irq_work_exit(__work)					\
 	  do {								\
-		  if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
+		  if (!(atomic_read(&__work->node.a_flags) & IRQ_WORK_HARD_IRQ))\
 			current->irq_config = 0;			\
 	  } while (0)
 
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -292,7 +292,7 @@ static void stack_map_get_build_id_offse
 	if (irqs_disabled()) {
 		if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
 			work = this_cpu_ptr(&up_read_work);
-			if (atomic_read(&work->irq_work.flags) & IRQ_WORK_BUSY) {
+			if (irq_work_busy(&work->irq_work)) {
 				/* cannot queue more up_read, fallback */
 				irq_work_busy = true;
 			}
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -31,7 +31,7 @@ static bool irq_work_claim(struct irq_wo
 {
 	int oflags;
 
-	oflags = atomic_fetch_or(IRQ_WORK_CLAIMED | CSD_TYPE_IRQ_WORK, &work->flags);
+	oflags = atomic_fetch_or(IRQ_WORK_CLAIMED | CSD_TYPE_IRQ_WORK, &work->node.a_flags);
 	/*
 	 * If the work is already pending, no need to raise the IPI.
 	 * The pairing atomic_fetch_andnot() in irq_work_run() makes sure
@@ -53,12 +53,12 @@ void __weak arch_irq_work_raise(void)
 static void __irq_work_queue_local(struct irq_work *work)
 {
 	/* If the work is "lazy", handle it from next tick if any */
-	if (atomic_read(&work->flags) & IRQ_WORK_LAZY) {
-		if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
+	if (atomic_read(&work->node.a_flags) & IRQ_WORK_LAZY) {
+		if (llist_add(&work->node.llist, this_cpu_ptr(&lazy_list)) &&
 		    tick_nohz_tick_stopped())
 			arch_irq_work_raise();
 	} else {
-		if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
+		if (llist_add(&work->node.llist, this_cpu_ptr(&raised_list)))
 			arch_irq_work_raise();
 	}
 }
@@ -102,7 +102,7 @@ bool irq_work_queue_on(struct irq_work *
 	if (cpu != smp_processor_id()) {
 		/* Arch remote IPI send/receive backend aren't NMI safe */
 		WARN_ON_ONCE(in_nmi());
-		__smp_call_single_queue(cpu, &work->llnode);
+		__smp_call_single_queue(cpu, &work->node.llist);
 	} else {
 		__irq_work_queue_local(work);
 	}
@@ -142,7 +142,7 @@ void irq_work_single(void *arg)
 	 * to claim that work don't rely on us to handle their data
 	 * while we are in the middle of the func.
 	 */
-	flags = atomic_fetch_andnot(IRQ_WORK_PENDING, &work->flags);
+	flags = atomic_fetch_andnot(IRQ_WORK_PENDING, &work->node.a_flags);
 
 	lockdep_irq_work_enter(work);
 	work->func(work);
@@ -152,7 +152,7 @@ void irq_work_single(void *arg)
 	 * no-one else claimed it meanwhile.
 	 */
 	flags &= ~IRQ_WORK_PENDING;
-	(void)atomic_cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
+	(void)atomic_cmpxchg(&work->node.a_flags, flags, flags & ~IRQ_WORK_BUSY);
 }
 
 static void irq_work_run_list(struct llist_head *list)
@@ -166,7 +166,7 @@ static void irq_work_run_list(struct lli
 		return;
 
 	llnode = llist_del_all(list);
-	llist_for_each_entry_safe(work, tmp, llnode, llnode)
+	llist_for_each_entry_safe(work, tmp, llnode, node.llist)
 		irq_work_single(work);
 }
 
@@ -198,7 +198,7 @@ void irq_work_sync(struct irq_work *work
 {
 	lockdep_assert_irqs_enabled();
 
-	while (atomic_read(&work->flags) & IRQ_WORK_BUSY)
+	while (irq_work_busy(work))
 		cpu_relax();
 }
 EXPORT_SYMBOL_GPL(irq_work_sync);
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3031,10 +3031,8 @@ static void wake_up_klogd_work_func(stru
 		wake_up_interruptible(&log_wait);
 }
 
-static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
-	.func = wake_up_klogd_work_func,
-	.flags = ATOMIC_INIT(IRQ_WORK_LAZY),
-};
+static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) =
+	IRQ_WORK_INIT_LAZY(wake_up_klogd_work_func);
 
 void wake_up_klogd(void)
 {
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1273,8 +1273,6 @@ static int rcu_implicit_dynticks_qs(stru
 		if (IS_ENABLED(CONFIG_IRQ_WORK) &&
 		    !rdp->rcu_iw_pending && rdp->rcu_iw_gp_seq != rnp->gp_seq &&
 		    (rnp->ffmask & rdp->grpmask)) {
-			init_irq_work(&rdp->rcu_iw, rcu_iw_handler);
-			atomic_set(&rdp->rcu_iw.flags, IRQ_WORK_HARD_IRQ);
 			rdp->rcu_iw_pending = true;
 			rdp->rcu_iw_gp_seq = rnp->gp_seq;
 			irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
@@ -3740,6 +3738,7 @@ int rcutree_prepare_cpu(unsigned int cpu
 	rdp->cpu_no_qs.b.norm = true;
 	rdp->core_needs_qs = false;
 	rdp->rcu_iw_pending = false;
+	rdp->rcu_iw = IRQ_WORK_INIT_HARD(rcu_iw_handler);
 	rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
 	trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -243,10 +243,8 @@ static void nohz_full_kick_func(struct i
 	/* Empty, the tick restart happens on tick_nohz_irq_exit() */
 }
 
-static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
-	.func = nohz_full_kick_func,
-	.flags = ATOMIC_INIT(IRQ_WORK_HARD_IRQ),
-};
+static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) =
+	IRQ_WORK_INIT_HARD(nohz_full_kick_func);
 
 /*
  * Kick this CPU if it's full dynticks in order to force it to
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1013,7 +1013,7 @@ static int bpf_send_signal_common(u32 si
 			return -EINVAL;
 
 		work = this_cpu_ptr(&send_signal_work);
-		if (atomic_read(&work->irq_work.flags) & IRQ_WORK_BUSY)
+		if (irq_work_busy(&work->irq_work))
 			return -EBUSY;
 
 		/* Add the current task, which is the target of sending signal,



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 6/6] smp: Cleanup smp_call_function*()
  2020-06-15 12:56 [PATCH 0/6] sched: TTWU, IPI, and assorted stuff Peter Zijlstra
                   ` (4 preceding siblings ...)
  2020-06-15 12:56 ` [PATCH 5/6] irq_work: Cleanup Peter Zijlstra
@ 2020-06-15 12:57 ` Peter Zijlstra
  2020-06-15 14:34   ` Jens Axboe
                     ` (2 more replies)
  2020-06-15 16:23 ` [PATCH 0/6] sched: TTWU, IPI, and assorted stuff Paul E. McKenney
  6 siblings, 3 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-15 12:57 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, paulmck, frederic, peterz, tsbogend,
	axboe, rjw, daniel.lezcano, dchickles, davem, kuba,
	daniel.thompson, gerald.schaefer

Get rid of the __call_single_node union and cleanup the API a little
to avoid external code relying on the structure layout as much.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/mips/kernel/process.c                      |    3 -
 arch/mips/kernel/smp.c                          |   24 +++-----------
 arch/s390/pci/pci_irq.c                         |    4 --
 arch/x86/kernel/cpuid.c                         |    5 ---
 arch/x86/lib/msr-smp.c                          |    5 ---
 block/blk-mq.c                                  |    4 --
 block/blk-softirq.c                             |    9 +----
 drivers/cpuidle/coupled.c                       |    3 -
 drivers/net/ethernet/cavium/liquidio/lio_core.c |    9 +----
 include/linux/smp.h                             |   11 ++----
 kernel/debug/debug_core.c                       |    5 +--
 kernel/sched/core.c                             |   12 +------
 kernel/smp.c                                    |   40 ++++++++++++------------
 net/core/dev.c                                  |    3 -
 14 files changed, 44 insertions(+), 93 deletions(-)

--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -686,7 +686,7 @@ unsigned long arch_align_stack(unsigned
 	return sp & ALMASK;
 }
 
-static DEFINE_PER_CPU(call_single_data_t, backtrace_csd);
+static DEFINE_PER_CPU(call_single_data_t, backtrace_csd) = CSD_INIT(handle_backtrace, NULL);
 static struct cpumask backtrace_csd_busy;
 
 static void handle_backtrace(void *info)
@@ -714,7 +714,6 @@ static void raise_backtrace(cpumask_t *m
 		}
 
 		csd = &per_cpu(backtrace_csd, cpu);
-		csd->func = handle_backtrace;
 		smp_call_function_single_async(cpu, csd);
 	}
 }
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -687,36 +687,22 @@ EXPORT_SYMBOL(flush_tlb_one);
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 
-static DEFINE_PER_CPU(call_single_data_t, tick_broadcast_csd);
-
-void tick_broadcast(const struct cpumask *mask)
-{
-	call_single_data_t *csd;
-	int cpu;
-
-	for_each_cpu(cpu, mask) {
-		csd = &per_cpu(tick_broadcast_csd, cpu);
-		smp_call_function_single_async(cpu, csd);
-	}
-}
-
 static void tick_broadcast_callee(void *info)
 {
 	tick_receive_broadcast();
 }
 
-static int __init tick_broadcast_init(void)
+static DEFINE_PER_CPU(call_single_data_t, tick_broadcast_csd) = CSD_INIT(tick_broadcast_callee, NULL);
+
+void tick_broadcast(const struct cpumask *mask)
 {
 	call_single_data_t *csd;
 	int cpu;
 
-	for (cpu = 0; cpu < NR_CPUS; cpu++) {
+	for_each_cpu(cpu, mask) {
 		csd = &per_cpu(tick_broadcast_csd, cpu);
-		csd->func = tick_broadcast_callee;
+		smp_call_function_single_async(cpu, csd);
 	}
-
-	return 0;
 }
-early_initcall(tick_broadcast_init);
 
 #endif /* CONFIG_GENERIC_CLOCKEVENTS_BROADCAST */
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -178,9 +178,7 @@ static void zpci_handle_fallback_irq(voi
 		if (atomic_inc_return(&cpu_data->scheduled) > 1)
 			continue;
 
-		cpu_data->csd.func = zpci_handle_remote_irq;
-		cpu_data->csd.info = &cpu_data->scheduled;
-		cpu_data->csd.flags = 0;
+		cpu_data->csd = CSD_INIT(zpci_handle_remote_irq, &cpu_data->scheduled);
 		smp_call_function_single_async(cpu, &cpu_data->csd);
 	}
 }
--- a/arch/x86/kernel/cpuid.c
+++ b/arch/x86/kernel/cpuid.c
@@ -74,10 +74,7 @@ static ssize_t cpuid_read(struct file *f
 
 	init_completion(&cmd.done);
 	for (; count; count -= 16) {
-		call_single_data_t csd = {
-			.func = cpuid_smp_cpuid,
-			.info = &cmd,
-		};
+		call_single_data_t csd = CSD_INIT(cpuid_smp_cpuid, &cmd);
 
 		cmd.regs.eax = pos;
 		cmd.regs.ecx = pos >> 32;
--- a/arch/x86/lib/msr-smp.c
+++ b/arch/x86/lib/msr-smp.c
@@ -169,10 +169,7 @@ static void __wrmsr_safe_on_cpu(void *in
 int rdmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h)
 {
 	struct msr_info_completion rv;
-	call_single_data_t csd = {
-		.func	= __rdmsr_safe_on_cpu,
-		.info	= &rv,
-	};
+	call_single_data_t csd = CSD_INIT(__rdmsr_safe_on_cpu, &rv);
 	int err;
 
 	memset(&rv, 0, sizeof(rv));
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -629,9 +629,7 @@ void blk_mq_force_complete_rq(struct req
 		shared = cpus_share_cache(cpu, ctx->cpu);
 
 	if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
-		rq->csd.func = __blk_mq_complete_request_remote;
-		rq->csd.info = rq;
-		rq->csd.flags = 0;
+		rq->csd = CSD_INIT(__blk_mq_complete_request_remote, rq);
 		smp_call_function_single_async(ctx->cpu, &rq->csd);
 	} else {
 		q->mq_ops->complete(rq);
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -57,13 +57,8 @@ static void trigger_softirq(void *data)
 static int raise_blk_irq(int cpu, struct request *rq)
 {
 	if (cpu_online(cpu)) {
-		call_single_data_t *data = &rq->csd;
-
-		data->func = trigger_softirq;
-		data->info = rq;
-		data->flags = 0;
-
-		smp_call_function_single_async(cpu, data);
+		rq->csd = CSD_INIT(trigger_softirq, rq);
+		smp_call_function_single_async(cpu, &rq->csd);
 		return 0;
 	}
 
--- a/drivers/cpuidle/coupled.c
+++ b/drivers/cpuidle/coupled.c
@@ -674,8 +674,7 @@ int cpuidle_coupled_register_device(stru
 	coupled->refcnt++;
 
 	csd = &per_cpu(cpuidle_coupled_poke_cb, dev->cpu);
-	csd->func = cpuidle_coupled_handle_poke;
-	csd->info = (void *)(unsigned long)dev->cpu;
+	*csd = CSD_INIT(cpuidle_coupled_handle_poke, (void *)(unsigned long)dev->cpu);
 
 	return 0;
 }
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -726,13 +726,8 @@ static void liquidio_napi_drv_callback(v
 	    droq->cpu_id == this_cpu) {
 		napi_schedule_irqoff(&droq->napi);
 	} else {
-		call_single_data_t *csd = &droq->csd;
-
-		csd->func = napi_schedule_wrapper;
-		csd->info = &droq->napi;
-		csd->flags = 0;
-
-		smp_call_function_single_async(droq->cpu_id, csd);
+		droq->csd = CSD_INIT(napi_schedule_wrapper, &droq->napi);
+		smp_call_function_single_async(droq->cpu_id, &droq->csd);
 	}
 }
 
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -21,17 +21,14 @@ typedef bool (*smp_cond_func_t)(int cpu,
  * structure shares (partial) layout with struct irq_work
  */
 struct __call_single_data {
-	union {
-		struct __call_single_node node;
-		struct {
-			struct llist_node llist;
-			unsigned int flags;
-		};
-	};
+	struct __call_single_node node;
 	smp_call_func_t func;
 	void *info;
 };
 
+#define CSD_INIT(_func, _info) \
+	(struct __call_single_data){ .func = (_func), .info = (_info), }
+
 /* Use __aligned() to avoid to use 2 cache lines for 1 csd */
 typedef struct __call_single_data call_single_data_t
 	__aligned(sizeof(struct __call_single_data));
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -225,8 +225,6 @@ int __weak kgdb_skipexception(int except
  * Default (weak) implementation for kgdb_roundup_cpus
  */
 
-static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd);
-
 void __weak kgdb_call_nmi_hook(void *ignored)
 {
 	/*
@@ -240,6 +238,8 @@ void __weak kgdb_call_nmi_hook(void *ign
 	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
+static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd) = CSD_INIT(kgdb_call_nmi_hook, NULL);
+
 void __weak kgdb_roundup_cpus(void)
 {
 	call_single_data_t *csd;
@@ -266,7 +266,6 @@ void __weak kgdb_roundup_cpus(void)
 			continue;
 		kgdb_info[cpu].rounding_up = true;
 
-		csd->func = kgdb_call_nmi_hook;
 		ret = smp_call_function_single_async(cpu, csd);
 		if (ret)
 			kgdb_info[cpu].rounding_up = false;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -221,14 +221,6 @@ void update_rq_clock(struct rq *rq)
 	update_rq_clock_task(rq, delta);
 }
 
-static inline void
-rq_csd_init(struct rq *rq, call_single_data_t *csd, smp_call_func_t func)
-{
-	csd->flags = 0;
-	csd->func = func;
-	csd->info = rq;
-}
-
 #ifdef CONFIG_SCHED_HRTICK
 /*
  * Use HR-timers to deliver accurate preemption points.
@@ -329,7 +321,7 @@ void hrtick_start(struct rq *rq, u64 del
 static void hrtick_rq_init(struct rq *rq)
 {
 #ifdef CONFIG_SMP
-	rq_csd_init(rq, &rq->hrtick_csd, __hrtick_start);
+	rq->hrtick_csd = CSD_INIT(__hrtick_start, rq);
 #endif
 	hrtimer_init(&rq->hrtick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
 	rq->hrtick_timer.function = hrtick;
@@ -6776,7 +6768,7 @@ void __init sched_init(void)
 		rq->last_blocked_load_update_tick = jiffies;
 		atomic_set(&rq->nohz_flags, 0);
 
-		rq_csd_init(rq, &rq->nohz_csd, nohz_csd_func);
+		rq->nohz_csd = CSD_INIT(nohz_csd_func, rq);
 #endif
 #endif /* CONFIG_SMP */
 		hrtick_rq_init(rq);
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -24,7 +24,7 @@
 #include "smpboot.h"
 #include "sched/smp.h"
 
-#define CSD_TYPE(_csd)	((_csd)->flags & CSD_FLAG_TYPE_MASK)
+#define CSD_TYPE(_csd)	((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)
 
 struct call_function_data {
 	call_single_data_t	__percpu *csd;
@@ -105,13 +105,13 @@ void __init call_function_init(void)
  */
 static __always_inline void csd_lock_wait(call_single_data_t *csd)
 {
-	smp_cond_load_acquire(&csd->flags, !(VAL & CSD_FLAG_LOCK));
+	smp_cond_load_acquire(&csd->node.u_flags, !(VAL & CSD_FLAG_LOCK));
 }
 
 static __always_inline void csd_lock(call_single_data_t *csd)
 {
 	csd_lock_wait(csd);
-	csd->flags |= CSD_FLAG_LOCK;
+	csd->node.u_flags |= CSD_FLAG_LOCK;
 
 	/*
 	 * prevent CPU from reordering the above assignment
@@ -123,12 +123,12 @@ static __always_inline void csd_lock(cal
 
 static __always_inline void csd_unlock(call_single_data_t *csd)
 {
-	WARN_ON(!(csd->flags & CSD_FLAG_LOCK));
+	WARN_ON(!(csd->node.u_flags & CSD_FLAG_LOCK));
 
 	/*
 	 * ensure we're all done before releasing data:
 	 */
-	smp_store_release(&csd->flags, 0);
+	smp_store_release(&csd->node.u_flags, 0);
 }
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
@@ -180,7 +180,7 @@ static int generic_exec_single(int cpu,
 		return -ENXIO;
 	}
 
-	__smp_call_single_queue(cpu, &csd->llist);
+	__smp_call_single_queue(cpu, &csd->node.llist);
 
 	return 0;
 }
@@ -233,7 +233,7 @@ static void flush_smp_call_function_queu
 		 * We don't have to use the _safe() variant here
 		 * because we are not invoking the IPI handlers yet.
 		 */
-		llist_for_each_entry(csd, entry, llist) {
+		llist_for_each_entry(csd, entry, node.llist) {
 			switch (CSD_TYPE(csd)) {
 			case CSD_TYPE_ASYNC:
 			case CSD_TYPE_SYNC:
@@ -258,22 +258,22 @@ static void flush_smp_call_function_queu
 	 * First; run all SYNC callbacks, people are waiting for us.
 	 */
 	prev = NULL;
-	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
+	llist_for_each_entry_safe(csd, csd_next, entry, node.llist) {
 		/* Do we wait until *after* callback? */
 		if (CSD_TYPE(csd) == CSD_TYPE_SYNC) {
 			smp_call_func_t func = csd->func;
 			void *info = csd->info;
 
 			if (prev) {
-				prev->next = &csd_next->llist;
+				prev->next = &csd_next->node.llist;
 			} else {
-				entry = &csd_next->llist;
+				entry = &csd_next->node.llist;
 			}
 
 			func(info);
 			csd_unlock(csd);
 		} else {
-			prev = &csd->llist;
+			prev = &csd->node.llist;
 		}
 	}
 
@@ -284,14 +284,14 @@ static void flush_smp_call_function_queu
 	 * Second; run all !SYNC callbacks.
 	 */
 	prev = NULL;
-	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
+	llist_for_each_entry_safe(csd, csd_next, entry, node.llist) {
 		int type = CSD_TYPE(csd);
 
 		if (type != CSD_TYPE_TTWU) {
 			if (prev) {
-				prev->next = &csd_next->llist;
+				prev->next = &csd_next->node.llist;
 			} else {
-				entry = &csd_next->llist;
+				entry = &csd_next->node.llist;
 			}
 
 			if (type == CSD_TYPE_ASYNC) {
@@ -305,7 +305,7 @@ static void flush_smp_call_function_queu
 			}
 
 		} else {
-			prev = &csd->llist;
+			prev = &csd->node.llist;
 		}
 	}
 
@@ -341,7 +341,7 @@ int smp_call_function_single(int cpu, sm
 {
 	call_single_data_t *csd;
 	call_single_data_t csd_stack = {
-		.flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC,
+		.node = { .u_flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC, },
 	};
 	int this_cpu;
 	int err;
@@ -416,12 +416,12 @@ int smp_call_function_single_async(int c
 
 	preempt_disable();
 
-	if (csd->flags & CSD_FLAG_LOCK) {
+	if (csd->node.u_flags & CSD_FLAG_LOCK) {
 		err = -EBUSY;
 		goto out;
 	}
 
-	csd->flags = CSD_FLAG_LOCK;
+	csd->node.u_flags = CSD_FLAG_LOCK;
 	smp_wmb();
 
 	err = generic_exec_single(cpu, csd);
@@ -539,10 +539,10 @@ static void smp_call_function_many_cond(
 
 		csd_lock(csd);
 		if (wait)
-			csd->flags |= CSD_TYPE_SYNC;
+			csd->node.u_flags |= CSD_TYPE_SYNC;
 		csd->func = func;
 		csd->info = info;
-		if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
+		if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu)))
 			__cpumask_set_cpu(cpu, cfd->cpumask_ipi);
 	}
 
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10643,8 +10643,7 @@ static int __init net_dev_init(void)
 		INIT_LIST_HEAD(&sd->poll_list);
 		sd->output_queue_tailp = &sd->output_queue;
 #ifdef CONFIG_RPS
-		sd->csd.func = rps_trigger_softirq;
-		sd->csd.info = sd;
+		sd->csd = CSD_INIT(rps_trigger_softirq, sd);
 		sd->cpu = i;
 #endif
 



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/6] sched: Fix ttwu_queue_cond()
  2020-06-15 12:56 ` [PATCH 1/6] sched: Fix ttwu_queue_cond() Peter Zijlstra
@ 2020-06-15 13:34   ` Peter Zijlstra
  2020-06-15 16:45     ` Paul E. McKenney
  2020-06-22  9:11   ` Mel Gorman
  1 sibling, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-15 13:34 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, paulmck, frederic

On Mon, Jun 15, 2020 at 02:56:55PM +0200, Peter Zijlstra wrote:
> Where the condition:
> 
>   !cpus_share_cache(smp_processor_id(), cpu)
> 
> already implies 'cpu != smp_processor_id()', because a CPU always
> shares cache with itself, the secondary condition added in commit:
> 
>   2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
> 
> voids that implication, resulting in attempting to do local wake-ups
> through the queue mechanism.
> 
> Fixes: 2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
> Reported-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Tested-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  kernel/sched/core.c |   13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2356,11 +2356,22 @@ bool cpus_share_cache(int this_cpu, int
>  
>  static inline bool ttwu_queue_cond(int cpu, int wake_flags)
>  {
> +	int this_cpu = smp_processor_id();
> +
> +	/*
> +	 * Only ever queue for remote wakeups. The on_cpu case can only ever
> +	 * happen remotely, and for the normal case it makes no sense to

The 'funny' thing here is, that this must be false for this patch to
make any difference.. I just cannot see how.

Also, if this is false, and p->on_cpu == 1 and p->cpu == this_cpu, then
p _should_ be current, in which case we should never get here either,
due to the 'p == current' special case in try_to_wake_up().

The only other option is that 'p == next', but then we'd be doing
wakeups from the middle of __schedule() and seems 'unlikely' too, esp.
so since none of the actual stack-traces we have shows that.

So colour me terribly confused.

> +	 * involve IPIs here, and would be broken, as many architectures cannot
> +	 * trivially IPI self in any case.
> +	 */
> +	if (cpu == this_cpu)
> +		return false;

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/6] smp: Cleanup smp_call_function*()
  2020-06-15 12:57 ` [PATCH 6/6] smp: Cleanup smp_call_function*() Peter Zijlstra
@ 2020-06-15 14:34   ` Jens Axboe
  2020-06-15 16:04   ` Daniel Thompson
  2020-06-17  8:23   ` Christoph Hellwig
  2 siblings, 0 replies; 37+ messages in thread
From: Jens Axboe @ 2020-06-15 14:34 UTC (permalink / raw)
  To: Peter Zijlstra, mingo, tglx
  Cc: linux-kernel, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, paulmck, frederic, tsbogend, rjw,
	daniel.lezcano, dchickles, davem, kuba, daniel.thompson,
	gerald.schaefer

On 6/15/20 6:57 AM, Peter Zijlstra wrote:
> Get rid of the __call_single_node union and cleanup the API a little
> to avoid external code relying on the structure layout as much.

core and block bits look good to me.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/6] smp: Cleanup smp_call_function*()
  2020-06-15 12:57 ` [PATCH 6/6] smp: Cleanup smp_call_function*() Peter Zijlstra
  2020-06-15 14:34   ` Jens Axboe
@ 2020-06-15 16:04   ` Daniel Thompson
  2020-06-17  8:23   ` Christoph Hellwig
  2 siblings, 0 replies; 37+ messages in thread
From: Daniel Thompson @ 2020-06-15 16:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, paulmck, frederic,
	tsbogend, axboe, rjw, daniel.lezcano, dchickles, davem, kuba,
	gerald.schaefer

On Mon, Jun 15, 2020 at 02:57:00PM +0200, Peter Zijlstra wrote:
> Get rid of the __call_single_node union and cleanup the API a little
> to avoid external code relying on the structure layout as much.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

For kgdb,
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>


Daniel.


> ---
>  arch/mips/kernel/process.c                      |    3 -
>  arch/mips/kernel/smp.c                          |   24 +++-----------
>  arch/s390/pci/pci_irq.c                         |    4 --
>  arch/x86/kernel/cpuid.c                         |    5 ---
>  arch/x86/lib/msr-smp.c                          |    5 ---
>  block/blk-mq.c                                  |    4 --
>  block/blk-softirq.c                             |    9 +----
>  drivers/cpuidle/coupled.c                       |    3 -
>  drivers/net/ethernet/cavium/liquidio/lio_core.c |    9 +----
>  include/linux/smp.h                             |   11 ++----
>  kernel/debug/debug_core.c                       |    5 +--
>  kernel/sched/core.c                             |   12 +------
>  kernel/smp.c                                    |   40 ++++++++++++------------
>  net/core/dev.c                                  |    3 -
>  14 files changed, 44 insertions(+), 93 deletions(-)
> 
> --- a/arch/mips/kernel/process.c
> +++ b/arch/mips/kernel/process.c
> @@ -686,7 +686,7 @@ unsigned long arch_align_stack(unsigned
>  	return sp & ALMASK;
>  }
>  
> -static DEFINE_PER_CPU(call_single_data_t, backtrace_csd);
> +static DEFINE_PER_CPU(call_single_data_t, backtrace_csd) = CSD_INIT(handle_backtrace, NULL);
>  static struct cpumask backtrace_csd_busy;
>  
>  static void handle_backtrace(void *info)
> @@ -714,7 +714,6 @@ static void raise_backtrace(cpumask_t *m
>  		}
>  
>  		csd = &per_cpu(backtrace_csd, cpu);
> -		csd->func = handle_backtrace;
>  		smp_call_function_single_async(cpu, csd);
>  	}
>  }
> --- a/arch/mips/kernel/smp.c
> +++ b/arch/mips/kernel/smp.c
> @@ -687,36 +687,22 @@ EXPORT_SYMBOL(flush_tlb_one);
>  
>  #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
>  
> -static DEFINE_PER_CPU(call_single_data_t, tick_broadcast_csd);
> -
> -void tick_broadcast(const struct cpumask *mask)
> -{
> -	call_single_data_t *csd;
> -	int cpu;
> -
> -	for_each_cpu(cpu, mask) {
> -		csd = &per_cpu(tick_broadcast_csd, cpu);
> -		smp_call_function_single_async(cpu, csd);
> -	}
> -}
> -
>  static void tick_broadcast_callee(void *info)
>  {
>  	tick_receive_broadcast();
>  }
>  
> -static int __init tick_broadcast_init(void)
> +static DEFINE_PER_CPU(call_single_data_t, tick_broadcast_csd) = CSD_INIT(tick_broadcast_callee, NULL);
> +
> +void tick_broadcast(const struct cpumask *mask)
>  {
>  	call_single_data_t *csd;
>  	int cpu;
>  
> -	for (cpu = 0; cpu < NR_CPUS; cpu++) {
> +	for_each_cpu(cpu, mask) {
>  		csd = &per_cpu(tick_broadcast_csd, cpu);
> -		csd->func = tick_broadcast_callee;
> +		smp_call_function_single_async(cpu, csd);
>  	}
> -
> -	return 0;
>  }
> -early_initcall(tick_broadcast_init);
>  
>  #endif /* CONFIG_GENERIC_CLOCKEVENTS_BROADCAST */
> --- a/arch/s390/pci/pci_irq.c
> +++ b/arch/s390/pci/pci_irq.c
> @@ -178,9 +178,7 @@ static void zpci_handle_fallback_irq(voi
>  		if (atomic_inc_return(&cpu_data->scheduled) > 1)
>  			continue;
>  
> -		cpu_data->csd.func = zpci_handle_remote_irq;
> -		cpu_data->csd.info = &cpu_data->scheduled;
> -		cpu_data->csd.flags = 0;
> +		cpu_data->csd = CSD_INIT(zpci_handle_remote_irq, &cpu_data->scheduled);
>  		smp_call_function_single_async(cpu, &cpu_data->csd);
>  	}
>  }
> --- a/arch/x86/kernel/cpuid.c
> +++ b/arch/x86/kernel/cpuid.c
> @@ -74,10 +74,7 @@ static ssize_t cpuid_read(struct file *f
>  
>  	init_completion(&cmd.done);
>  	for (; count; count -= 16) {
> -		call_single_data_t csd = {
> -			.func = cpuid_smp_cpuid,
> -			.info = &cmd,
> -		};
> +		call_single_data_t csd = CSD_INIT(cpuid_smp_cpuid, &cmd);
>  
>  		cmd.regs.eax = pos;
>  		cmd.regs.ecx = pos >> 32;
> --- a/arch/x86/lib/msr-smp.c
> +++ b/arch/x86/lib/msr-smp.c
> @@ -169,10 +169,7 @@ static void __wrmsr_safe_on_cpu(void *in
>  int rdmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h)
>  {
>  	struct msr_info_completion rv;
> -	call_single_data_t csd = {
> -		.func	= __rdmsr_safe_on_cpu,
> -		.info	= &rv,
> -	};
> +	call_single_data_t csd = CSD_INIT(__rdmsr_safe_on_cpu, &rv);
>  	int err;
>  
>  	memset(&rv, 0, sizeof(rv));
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -629,9 +629,7 @@ void blk_mq_force_complete_rq(struct req
>  		shared = cpus_share_cache(cpu, ctx->cpu);
>  
>  	if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
> -		rq->csd.func = __blk_mq_complete_request_remote;
> -		rq->csd.info = rq;
> -		rq->csd.flags = 0;
> +		rq->csd = CSD_INIT(__blk_mq_complete_request_remote, rq);
>  		smp_call_function_single_async(ctx->cpu, &rq->csd);
>  	} else {
>  		q->mq_ops->complete(rq);
> --- a/block/blk-softirq.c
> +++ b/block/blk-softirq.c
> @@ -57,13 +57,8 @@ static void trigger_softirq(void *data)
>  static int raise_blk_irq(int cpu, struct request *rq)
>  {
>  	if (cpu_online(cpu)) {
> -		call_single_data_t *data = &rq->csd;
> -
> -		data->func = trigger_softirq;
> -		data->info = rq;
> -		data->flags = 0;
> -
> -		smp_call_function_single_async(cpu, data);
> +		rq->csd = CSD_INIT(trigger_softirq, rq);
> +		smp_call_function_single_async(cpu, &rq->csd);
>  		return 0;
>  	}
>  
> --- a/drivers/cpuidle/coupled.c
> +++ b/drivers/cpuidle/coupled.c
> @@ -674,8 +674,7 @@ int cpuidle_coupled_register_device(stru
>  	coupled->refcnt++;
>  
>  	csd = &per_cpu(cpuidle_coupled_poke_cb, dev->cpu);
> -	csd->func = cpuidle_coupled_handle_poke;
> -	csd->info = (void *)(unsigned long)dev->cpu;
> +	*csd = CSD_INIT(cpuidle_coupled_handle_poke, (void *)(unsigned long)dev->cpu);
>  
>  	return 0;
>  }
> --- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
> +++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
> @@ -726,13 +726,8 @@ static void liquidio_napi_drv_callback(v
>  	    droq->cpu_id == this_cpu) {
>  		napi_schedule_irqoff(&droq->napi);
>  	} else {
> -		call_single_data_t *csd = &droq->csd;
> -
> -		csd->func = napi_schedule_wrapper;
> -		csd->info = &droq->napi;
> -		csd->flags = 0;
> -
> -		smp_call_function_single_async(droq->cpu_id, csd);
> +		droq->csd = CSD_INIT(napi_schedule_wrapper, &droq->napi);
> +		smp_call_function_single_async(droq->cpu_id, &droq->csd);
>  	}
>  }
>  
> --- a/include/linux/smp.h
> +++ b/include/linux/smp.h
> @@ -21,17 +21,14 @@ typedef bool (*smp_cond_func_t)(int cpu,
>   * structure shares (partial) layout with struct irq_work
>   */
>  struct __call_single_data {
> -	union {
> -		struct __call_single_node node;
> -		struct {
> -			struct llist_node llist;
> -			unsigned int flags;
> -		};
> -	};
> +	struct __call_single_node node;
>  	smp_call_func_t func;
>  	void *info;
>  };
>  
> +#define CSD_INIT(_func, _info) \
> +	(struct __call_single_data){ .func = (_func), .info = (_info), }
> +
>  /* Use __aligned() to avoid to use 2 cache lines for 1 csd */
>  typedef struct __call_single_data call_single_data_t
>  	__aligned(sizeof(struct __call_single_data));
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -225,8 +225,6 @@ int __weak kgdb_skipexception(int except
>   * Default (weak) implementation for kgdb_roundup_cpus
>   */
>  
> -static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd);
> -
>  void __weak kgdb_call_nmi_hook(void *ignored)
>  {
>  	/*
> @@ -240,6 +238,8 @@ void __weak kgdb_call_nmi_hook(void *ign
>  	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> +static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd) = CSD_INIT(kgdb_call_nmi_hook, NULL);
> +
>  void __weak kgdb_roundup_cpus(void)
>  {
>  	call_single_data_t *csd;
> @@ -266,7 +266,6 @@ void __weak kgdb_roundup_cpus(void)
>  			continue;
>  		kgdb_info[cpu].rounding_up = true;
>  
> -		csd->func = kgdb_call_nmi_hook;
>  		ret = smp_call_function_single_async(cpu, csd);
>  		if (ret)
>  			kgdb_info[cpu].rounding_up = false;
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -221,14 +221,6 @@ void update_rq_clock(struct rq *rq)
>  	update_rq_clock_task(rq, delta);
>  }
>  
> -static inline void
> -rq_csd_init(struct rq *rq, call_single_data_t *csd, smp_call_func_t func)
> -{
> -	csd->flags = 0;
> -	csd->func = func;
> -	csd->info = rq;
> -}
> -
>  #ifdef CONFIG_SCHED_HRTICK
>  /*
>   * Use HR-timers to deliver accurate preemption points.
> @@ -329,7 +321,7 @@ void hrtick_start(struct rq *rq, u64 del
>  static void hrtick_rq_init(struct rq *rq)
>  {
>  #ifdef CONFIG_SMP
> -	rq_csd_init(rq, &rq->hrtick_csd, __hrtick_start);
> +	rq->hrtick_csd = CSD_INIT(__hrtick_start, rq);
>  #endif
>  	hrtimer_init(&rq->hrtick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
>  	rq->hrtick_timer.function = hrtick;
> @@ -6776,7 +6768,7 @@ void __init sched_init(void)
>  		rq->last_blocked_load_update_tick = jiffies;
>  		atomic_set(&rq->nohz_flags, 0);
>  
> -		rq_csd_init(rq, &rq->nohz_csd, nohz_csd_func);
> +		rq->nohz_csd = CSD_INIT(nohz_csd_func, rq);
>  #endif
>  #endif /* CONFIG_SMP */
>  		hrtick_rq_init(rq);
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -24,7 +24,7 @@
>  #include "smpboot.h"
>  #include "sched/smp.h"
>  
> -#define CSD_TYPE(_csd)	((_csd)->flags & CSD_FLAG_TYPE_MASK)
> +#define CSD_TYPE(_csd)	((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)
>  
>  struct call_function_data {
>  	call_single_data_t	__percpu *csd;
> @@ -105,13 +105,13 @@ void __init call_function_init(void)
>   */
>  static __always_inline void csd_lock_wait(call_single_data_t *csd)
>  {
> -	smp_cond_load_acquire(&csd->flags, !(VAL & CSD_FLAG_LOCK));
> +	smp_cond_load_acquire(&csd->node.u_flags, !(VAL & CSD_FLAG_LOCK));
>  }
>  
>  static __always_inline void csd_lock(call_single_data_t *csd)
>  {
>  	csd_lock_wait(csd);
> -	csd->flags |= CSD_FLAG_LOCK;
> +	csd->node.u_flags |= CSD_FLAG_LOCK;
>  
>  	/*
>  	 * prevent CPU from reordering the above assignment
> @@ -123,12 +123,12 @@ static __always_inline void csd_lock(cal
>  
>  static __always_inline void csd_unlock(call_single_data_t *csd)
>  {
> -	WARN_ON(!(csd->flags & CSD_FLAG_LOCK));
> +	WARN_ON(!(csd->node.u_flags & CSD_FLAG_LOCK));
>  
>  	/*
>  	 * ensure we're all done before releasing data:
>  	 */
> -	smp_store_release(&csd->flags, 0);
> +	smp_store_release(&csd->node.u_flags, 0);
>  }
>  
>  static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
> @@ -180,7 +180,7 @@ static int generic_exec_single(int cpu,
>  		return -ENXIO;
>  	}
>  
> -	__smp_call_single_queue(cpu, &csd->llist);
> +	__smp_call_single_queue(cpu, &csd->node.llist);
>  
>  	return 0;
>  }
> @@ -233,7 +233,7 @@ static void flush_smp_call_function_queu
>  		 * We don't have to use the _safe() variant here
>  		 * because we are not invoking the IPI handlers yet.
>  		 */
> -		llist_for_each_entry(csd, entry, llist) {
> +		llist_for_each_entry(csd, entry, node.llist) {
>  			switch (CSD_TYPE(csd)) {
>  			case CSD_TYPE_ASYNC:
>  			case CSD_TYPE_SYNC:
> @@ -258,22 +258,22 @@ static void flush_smp_call_function_queu
>  	 * First; run all SYNC callbacks, people are waiting for us.
>  	 */
>  	prev = NULL;
> -	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
> +	llist_for_each_entry_safe(csd, csd_next, entry, node.llist) {
>  		/* Do we wait until *after* callback? */
>  		if (CSD_TYPE(csd) == CSD_TYPE_SYNC) {
>  			smp_call_func_t func = csd->func;
>  			void *info = csd->info;
>  
>  			if (prev) {
> -				prev->next = &csd_next->llist;
> +				prev->next = &csd_next->node.llist;
>  			} else {
> -				entry = &csd_next->llist;
> +				entry = &csd_next->node.llist;
>  			}
>  
>  			func(info);
>  			csd_unlock(csd);
>  		} else {
> -			prev = &csd->llist;
> +			prev = &csd->node.llist;
>  		}
>  	}
>  
> @@ -284,14 +284,14 @@ static void flush_smp_call_function_queu
>  	 * Second; run all !SYNC callbacks.
>  	 */
>  	prev = NULL;
> -	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
> +	llist_for_each_entry_safe(csd, csd_next, entry, node.llist) {
>  		int type = CSD_TYPE(csd);
>  
>  		if (type != CSD_TYPE_TTWU) {
>  			if (prev) {
> -				prev->next = &csd_next->llist;
> +				prev->next = &csd_next->node.llist;
>  			} else {
> -				entry = &csd_next->llist;
> +				entry = &csd_next->node.llist;
>  			}
>  
>  			if (type == CSD_TYPE_ASYNC) {
> @@ -305,7 +305,7 @@ static void flush_smp_call_function_queu
>  			}
>  
>  		} else {
> -			prev = &csd->llist;
> +			prev = &csd->node.llist;
>  		}
>  	}
>  
> @@ -341,7 +341,7 @@ int smp_call_function_single(int cpu, sm
>  {
>  	call_single_data_t *csd;
>  	call_single_data_t csd_stack = {
> -		.flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC,
> +		.node = { .u_flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC, },
>  	};
>  	int this_cpu;
>  	int err;
> @@ -416,12 +416,12 @@ int smp_call_function_single_async(int c
>  
>  	preempt_disable();
>  
> -	if (csd->flags & CSD_FLAG_LOCK) {
> +	if (csd->node.u_flags & CSD_FLAG_LOCK) {
>  		err = -EBUSY;
>  		goto out;
>  	}
>  
> -	csd->flags = CSD_FLAG_LOCK;
> +	csd->node.u_flags = CSD_FLAG_LOCK;
>  	smp_wmb();
>  
>  	err = generic_exec_single(cpu, csd);
> @@ -539,10 +539,10 @@ static void smp_call_function_many_cond(
>  
>  		csd_lock(csd);
>  		if (wait)
> -			csd->flags |= CSD_TYPE_SYNC;
> +			csd->node.u_flags |= CSD_TYPE_SYNC;
>  		csd->func = func;
>  		csd->info = info;
> -		if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
> +		if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu)))
>  			__cpumask_set_cpu(cpu, cfd->cpumask_ipi);
>  	}
>  
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -10643,8 +10643,7 @@ static int __init net_dev_init(void)
>  		INIT_LIST_HEAD(&sd->poll_list);
>  		sd->output_queue_tailp = &sd->output_queue;
>  #ifdef CONFIG_RPS
> -		sd->csd.func = rps_trigger_softirq;
> -		sd->csd.info = sd;
> +		sd->csd = CSD_INIT(rps_trigger_softirq, sd);
>  		sd->cpu = i;
>  #endif
>  
> 
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
  2020-06-15 12:56 [PATCH 0/6] sched: TTWU, IPI, and assorted stuff Peter Zijlstra
                   ` (5 preceding siblings ...)
  2020-06-15 12:57 ` [PATCH 6/6] smp: Cleanup smp_call_function*() Peter Zijlstra
@ 2020-06-15 16:23 ` Paul E. McKenney
  2020-06-15 16:40   ` Peter Zijlstra
  6 siblings, 1 reply; 37+ messages in thread
From: Paul E. McKenney @ 2020-06-15 16:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic

On Mon, Jun 15, 2020 at 02:56:54PM +0200, Peter Zijlstra wrote:
> Hi,
> 
> So Paul reported rcutorture hitting a NULL dereference, and patch #1 fixes it.
> 
> Now, patch #1 is obviously correct, but I can't explain how exactly it leads to
> the observed NULL pointer dereference. The NULL pointer deref happens in
> find_matching_se()'s last while() loop when is_same_group() fails even though
> both parents are NULL.

My bisection of yet another bug sometimes hits the scheduler NULL pointer
dereference on older commits.  I will try out patch #2.

Whether this is reassuring or depressing, I have no idea.  :-/

> The only explanation I have for that is that we just did an activate_task()
> while: 'task_cpu(p) != cpu_of(rq)', because then 'p->se.cfs_rq' doesn't match.
> However, I can't see how the lack of #1 would lead to that. Never-the-less,
> patch #2 adds assertions to warn us of this case.
> 
> Patch #3 is a trivial rename that ought to eradicate some confusion.
> 
> The last 3 patches is what I ended up with for cleaning up the whole
> smp_call_function/irq_work/ttwu thing more.

Would it be possible to allow a target CPU # on those instances of
__call_single_data?  This is extremely helpful for debugging lost
smp_call_function*() calls.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
  2020-06-15 16:23 ` [PATCH 0/6] sched: TTWU, IPI, and assorted stuff Paul E. McKenney
@ 2020-06-15 16:40   ` Peter Zijlstra
  2020-06-15 17:21     ` Paul E. McKenney
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-15 16:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic

On Mon, Jun 15, 2020 at 09:23:30AM -0700, Paul E. McKenney wrote:
> On Mon, Jun 15, 2020 at 02:56:54PM +0200, Peter Zijlstra wrote:
> > Hi,
> > 
> > So Paul reported rcutorture hitting a NULL dereference, and patch #1 fixes it.
> > 
> > Now, patch #1 is obviously correct, but I can't explain how exactly it leads to
> > the observed NULL pointer dereference. The NULL pointer deref happens in
> > find_matching_se()'s last while() loop when is_same_group() fails even though
> > both parents are NULL.
> 
> My bisection of yet another bug sometimes hits the scheduler NULL pointer
> dereference on older commits.  I will try out patch #2.

Thanks! I've got 16*TREE03 running since this morning, so far so nothing :/
(FWIW that's 16/9 times overcommit, idle time fluctuates around 10%).

> Whether this is reassuring or depressing, I have no idea.  :-/

Worrysome at least, I don't trust stuff I can't explain.

> > The only explanation I have for that is that we just did an activate_task()
> > while: 'task_cpu(p) != cpu_of(rq)', because then 'p->se.cfs_rq' doesn't match.
> > However, I can't see how the lack of #1 would lead to that. Never-the-less,
> > patch #2 adds assertions to warn us of this case.
> > 
> > Patch #3 is a trivial rename that ought to eradicate some confusion.
> > 
> > The last 3 patches is what I ended up with for cleaning up the whole
> > smp_call_function/irq_work/ttwu thing more.
> 
> Would it be possible to allow a target CPU # on those instances of
> __call_single_data?  This is extremely helpful for debugging lost
> smp_call_function*() calls.

target or source ? Either would be possible, perhaps even both. We have
a spare u32 in __call_single_node.

Something like the below on top of 1-4. If we want to keep this, we
should probably stick it under some CONFIG_DBUG thing or other.

--- a/include/linux/smp_types.h
+++ b/include/linux/smp_types.h
@@ -61,6 +61,7 @@ struct __call_single_node {
 		unsigned int	u_flags;
 		atomic_t	a_flags;
 	};
+	u16 src, dst;
 };
 
 #endif /* __LINUX_SMP_TYPES_H */
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -135,8 +135,14 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(cal
 
 void __smp_call_single_queue(int cpu, struct llist_node *node)
 {
+	struct __call_single_node *n =
+		container_of(node, struct __call_single_node, llist);
+
 	WARN_ON_ONCE(cpu == smp_processor_id());
 
+	n->src = smp_processor_id();
+	n->dst = cpu;
+
 	/*
 	 * The list addition should be visible before sending the IPI
 	 * handler locks the list to pull the entry off it because of


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/6] sched: Fix ttwu_queue_cond()
  2020-06-15 13:34   ` Peter Zijlstra
@ 2020-06-15 16:45     ` Paul E. McKenney
  2020-06-15 22:58       ` Paul E. McKenney
  0 siblings, 1 reply; 37+ messages in thread
From: Paul E. McKenney @ 2020-06-15 16:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic

On Mon, Jun 15, 2020 at 03:34:09PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 15, 2020 at 02:56:55PM +0200, Peter Zijlstra wrote:
> > Where the condition:
> > 
> >   !cpus_share_cache(smp_processor_id(), cpu)
> > 
> > already implies 'cpu != smp_processor_id()', because a CPU always
> > shares cache with itself, the secondary condition added in commit:
> > 
> >   2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
> > 
> > voids that implication, resulting in attempting to do local wake-ups
> > through the queue mechanism.
> > 
> > Fixes: 2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
> > Reported-by: Paul E. McKenney <paulmck@kernel.org>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Tested-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >  kernel/sched/core.c |   13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2356,11 +2356,22 @@ bool cpus_share_cache(int this_cpu, int
> >  
> >  static inline bool ttwu_queue_cond(int cpu, int wake_flags)
> >  {
> > +	int this_cpu = smp_processor_id();
> > +
> > +	/*
> > +	 * Only ever queue for remote wakeups. The on_cpu case can only ever
> > +	 * happen remotely, and for the normal case it makes no sense to
> 
> The 'funny' thing here is, that this must be false for this patch to
> make any difference.. I just cannot see how.
> 
> Also, if this is false, and p->on_cpu == 1 and p->cpu == this_cpu, then
> p _should_ be current, in which case we should never get here either,
> due to the 'p == current' special case in try_to_wake_up().
> 
> The only other option is that 'p == next', but then we'd be doing
> wakeups from the middle of __schedule() and seems 'unlikely' too, esp.
> so since none of the actual stack-traces we have shows that.
> 
> So colour me terribly confused.

I am rerunning with your patch 2 on the last bisection point that
resulted in scheduler NULL dereferences despite having your patch.
Hopefully some illumination will result...

						Thanx, Paul

> > +	 * involve IPIs here, and would be broken, as many architectures cannot
> > +	 * trivially IPI self in any case.
> > +	 */
> > +	if (cpu == this_cpu)
> > +		return false;

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
  2020-06-15 16:40   ` Peter Zijlstra
@ 2020-06-15 17:21     ` Paul E. McKenney
  2020-06-15 19:11       ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Paul E. McKenney @ 2020-06-15 17:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic

On Mon, Jun 15, 2020 at 06:40:48PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 15, 2020 at 09:23:30AM -0700, Paul E. McKenney wrote:
> > On Mon, Jun 15, 2020 at 02:56:54PM +0200, Peter Zijlstra wrote:
> > > Hi,
> > > 
> > > So Paul reported rcutorture hitting a NULL dereference, and patch #1 fixes it.
> > > 
> > > Now, patch #1 is obviously correct, but I can't explain how exactly it leads to
> > > the observed NULL pointer dereference. The NULL pointer deref happens in
> > > find_matching_se()'s last while() loop when is_same_group() fails even though
> > > both parents are NULL.
> > 
> > My bisection of yet another bug sometimes hits the scheduler NULL pointer
> > dereference on older commits.  I will try out patch #2.
> 
> Thanks! I've got 16*TREE03 running since this morning, so far so nothing :/
> (FWIW that's 16/9 times overcommit, idle time fluctuates around 10%).

My large system as large remote memory latencies, as in the better part
of a microsecond.  My small system is old (Haswell).  So, just to grasp
at the obvious straw, do you have access to a multisocket Haswell system?

Or maybe Thomas can reproduce this one as well?

Right now I am running your earlier fix patch against mainline commit
d2d5439df22f ("Merge tag 'for-linus-5.8b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip").

> > Whether this is reassuring or depressing, I have no idea.  :-/
> 
> Worrysome at least, I don't trust stuff I can't explain.

I know that feeling!

> > > The only explanation I have for that is that we just did an activate_task()
> > > while: 'task_cpu(p) != cpu_of(rq)', because then 'p->se.cfs_rq' doesn't match.
> > > However, I can't see how the lack of #1 would lead to that. Never-the-less,
> > > patch #2 adds assertions to warn us of this case.
> > > 
> > > Patch #3 is a trivial rename that ought to eradicate some confusion.
> > > 
> > > The last 3 patches is what I ended up with for cleaning up the whole
> > > smp_call_function/irq_work/ttwu thing more.
> > 
> > Would it be possible to allow a target CPU # on those instances of
> > __call_single_data?  This is extremely helpful for debugging lost
> > smp_call_function*() calls.
> 
> target or source ? Either would be possible, perhaps even both. We have
> a spare u32 in __call_single_node.

The target CPU is the immediate concern, but I could easily imagine
cases where the source CPU might also be needed.

> Something like the below on top of 1-4. If we want to keep this, we
> should probably stick it under some CONFIG_DBUG thing or other.

Looks plausible to me, and CONFIG_DEBUG or whatever makes a lot of
sense.

How would you like to proceed?  I can take this one and then port
the existing debug code on top of it, for example.

							Thanx, Paul

> --- a/include/linux/smp_types.h
> +++ b/include/linux/smp_types.h
> @@ -61,6 +61,7 @@ struct __call_single_node {
>  		unsigned int	u_flags;
>  		atomic_t	a_flags;
>  	};
> +	u16 src, dst;
>  };
>  
>  #endif /* __LINUX_SMP_TYPES_H */
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -135,8 +135,14 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(cal
>  
>  void __smp_call_single_queue(int cpu, struct llist_node *node)
>  {
> +	struct __call_single_node *n =
> +		container_of(node, struct __call_single_node, llist);
> +
>  	WARN_ON_ONCE(cpu == smp_processor_id());
>  
> +	n->src = smp_processor_id();
> +	n->dst = cpu;
> +
>  	/*
>  	 * The list addition should be visible before sending the IPI
>  	 * handler locks the list to pull the entry off it because of
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
  2020-06-15 17:21     ` Paul E. McKenney
@ 2020-06-15 19:11       ` Peter Zijlstra
  2020-06-15 19:55         ` Paul E. McKenney
  2020-06-16 17:04         ` Peter Zijlstra
  0 siblings, 2 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-15 19:11 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic

On Mon, Jun 15, 2020 at 10:21:49AM -0700, Paul E. McKenney wrote:
> On Mon, Jun 15, 2020 at 06:40:48PM +0200, Peter Zijlstra wrote:

> > Thanks! I've got 16*TREE03 running since this morning, so far so nothing :/
> > (FWIW that's 16/9 times overcommit, idle time fluctuates around 10%).
> 
> My large system as large remote memory latencies, as in the better part
> of a microsecond.  My small system is old (Haswell).  So, just to grasp
> at the obvious straw, do you have access to a multisocket Haswell system?

I've been running this on a 4 socket haswell ex.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
  2020-06-15 19:11       ` Peter Zijlstra
@ 2020-06-15 19:55         ` Paul E. McKenney
  2020-06-16 16:31           ` Paul E. McKenney
  2020-06-16 17:04         ` Peter Zijlstra
  1 sibling, 1 reply; 37+ messages in thread
From: Paul E. McKenney @ 2020-06-15 19:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic

On Mon, Jun 15, 2020 at 09:11:58PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 15, 2020 at 10:21:49AM -0700, Paul E. McKenney wrote:
> > On Mon, Jun 15, 2020 at 06:40:48PM +0200, Peter Zijlstra wrote:
> 
> > > Thanks! I've got 16*TREE03 running since this morning, so far so nothing :/
> > > (FWIW that's 16/9 times overcommit, idle time fluctuates around 10%).
> > 
> > My large system as large remote memory latencies, as in the better part
> > of a microsecond.  My small system is old (Haswell).  So, just to grasp
> > at the obvious straw, do you have access to a multisocket Haswell system?
> 
> I've been running this on a 4 socket haswell ex.

OK, I officially have no idea, then!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/6] sched: Fix ttwu_queue_cond()
  2020-06-15 16:45     ` Paul E. McKenney
@ 2020-06-15 22:58       ` Paul E. McKenney
  0 siblings, 0 replies; 37+ messages in thread
From: Paul E. McKenney @ 2020-06-15 22:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic

On Mon, Jun 15, 2020 at 09:45:41AM -0700, Paul E. McKenney wrote:
> On Mon, Jun 15, 2020 at 03:34:09PM +0200, Peter Zijlstra wrote:
> > On Mon, Jun 15, 2020 at 02:56:55PM +0200, Peter Zijlstra wrote:
> > > Where the condition:
> > > 
> > >   !cpus_share_cache(smp_processor_id(), cpu)
> > > 
> > > already implies 'cpu != smp_processor_id()', because a CPU always
> > > shares cache with itself, the secondary condition added in commit:
> > > 
> > >   2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
> > > 
> > > voids that implication, resulting in attempting to do local wake-ups
> > > through the queue mechanism.
> > > 
> > > Fixes: 2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
> > > Reported-by: Paul E. McKenney <paulmck@kernel.org>
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > Tested-by: Paul E. McKenney <paulmck@kernel.org>
> > > ---
> > >  kernel/sched/core.c |   13 ++++++++++++-
> > >  1 file changed, 12 insertions(+), 1 deletion(-)
> > > 
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -2356,11 +2356,22 @@ bool cpus_share_cache(int this_cpu, int
> > >  
> > >  static inline bool ttwu_queue_cond(int cpu, int wake_flags)
> > >  {
> > > +	int this_cpu = smp_processor_id();
> > > +
> > > +	/*
> > > +	 * Only ever queue for remote wakeups. The on_cpu case can only ever
> > > +	 * happen remotely, and for the normal case it makes no sense to
> > 
> > The 'funny' thing here is, that this must be false for this patch to
> > make any difference.. I just cannot see how.
> > 
> > Also, if this is false, and p->on_cpu == 1 and p->cpu == this_cpu, then
> > p _should_ be current, in which case we should never get here either,
> > due to the 'p == current' special case in try_to_wake_up().
> > 
> > The only other option is that 'p == next', but then we'd be doing
> > wakeups from the middle of __schedule() and seems 'unlikely' too, esp.
> > so since none of the actual stack-traces we have shows that.
> > 
> > So colour me terribly confused.
> 
> I am rerunning with your patch 2 on the last bisection point that
> resulted in scheduler NULL dereferences despite having your patch.
> Hopefully some illumination will result...

No, Mr. Murphy is out in force.  I saw only the NULL pointer
dereferences without any WARN()s.  :-/

						Thanx, Paul

> > > +	 * involve IPIs here, and would be broken, as many architectures cannot
> > > +	 * trivially IPI self in any case.
> > > +	 */
> > > +	if (cpu == this_cpu)
> > > +		return false;

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 5/6] irq_work: Cleanup
  2020-06-15 12:56 ` [PATCH 5/6] irq_work: Cleanup Peter Zijlstra
@ 2020-06-16 15:16   ` Petr Mladek
  0 siblings, 0 replies; 37+ messages in thread
From: Petr Mladek @ 2020-06-16 15:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, paulmck, frederic,
	ast, daniel, sergey.senozhatsky

On Mon 2020-06-15 14:56:59, Peter Zijlstra wrote:
> Get rid of the __call_single_node union and clean up the API a little
> to avoid external code relying on the structure layout as much.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Revieved-by: Petr Mladek <pmladek@suse.com>

I focused on the printk part. But the rest looks fine as well.
It is a nice cleanup.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
  2020-06-15 19:55         ` Paul E. McKenney
@ 2020-06-16 16:31           ` Paul E. McKenney
  0 siblings, 0 replies; 37+ messages in thread
From: Paul E. McKenney @ 2020-06-16 16:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic

On Mon, Jun 15, 2020 at 12:55:22PM -0700, Paul E. McKenney wrote:
> On Mon, Jun 15, 2020 at 09:11:58PM +0200, Peter Zijlstra wrote:
> > On Mon, Jun 15, 2020 at 10:21:49AM -0700, Paul E. McKenney wrote:
> > > On Mon, Jun 15, 2020 at 06:40:48PM +0200, Peter Zijlstra wrote:
> > 
> > > > Thanks! I've got 16*TREE03 running since this morning, so far so nothing :/
> > > > (FWIW that's 16/9 times overcommit, idle time fluctuates around 10%).
> > > 
> > > My large system as large remote memory latencies, as in the better part
> > > of a microsecond.  My small system is old (Haswell).  So, just to grasp
> > > at the obvious straw, do you have access to a multisocket Haswell system?
> > 
> > I've been running this on a 4 socket haswell ex.
> 
> OK, I officially have no idea, then!

Ah, are you overcommitting?  None of my Haswell runs did any overcommit.

Also, any failures in the meantime?

Any debug/tracing/whatever that I could add to my runs?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
  2020-06-15 19:11       ` Peter Zijlstra
  2020-06-15 19:55         ` Paul E. McKenney
@ 2020-06-16 17:04         ` Peter Zijlstra
  2020-06-16 17:17           ` Peter Zijlstra
  2020-06-16 17:51           ` Paul E. McKenney
  1 sibling, 2 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-16 17:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic

On Mon, Jun 15, 2020 at 09:11:58PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 15, 2020 at 10:21:49AM -0700, Paul E. McKenney wrote:
> > On Mon, Jun 15, 2020 at 06:40:48PM +0200, Peter Zijlstra wrote:
> 
> > > Thanks! I've got 16*TREE03 running since this morning, so far so nothing :/
> > > (FWIW that's 16/9 times overcommit, idle time fluctuates around 10%).
> > 
> > My large system as large remote memory latencies, as in the better part
> > of a microsecond.  My small system is old (Haswell).  So, just to grasp
> > at the obvious straw, do you have access to a multisocket Haswell system?
> 
> I've been running this on a 4 socket haswell ex.

Today, with patch 1 commented out, after ~5h20 I finally managed to trigger a splat.

Let me go stare at it, see if it wants to yield it sekrets

[19324.742887] BUG: kernel NULL pointer dereference, address: 0000000000000150
[19324.744075] #PF: supervisor read access in kernel mode
[19324.744919] #PF: error_code(0x0000) - not-present page
[19324.745786] PGD 0 P4D 0
[19324.746215] Oops: 0000 [#1] PREEMPT SMP PTI
[19324.746948] CPU: 10 PID: 76 Comm: ksoftirqd/10 Tainted: G        W         5.8.0-rc1+ #8
[19324.748080] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
[19324.749372] RIP: 0010:check_preempt_wakeup+0xad/0x1a0
[19324.750218] Code: d0 39 d0 7d 2c 83 ea 01 48 8b 9b 48 01 00 00 39 d0 75 f2 48 39 bb 50 01 00 00 74 1e 48 8b ad 48 01 00 00 48 8b 9b 48 01 00 00 <48> 8b bd 50 01 00 00 48 39 bb 50 01 00 00 75 e2 48 85 ff 74 dd e8
[19324.753364] RSP: 0000:ffffb3cb40320f50 EFLAGS: 00010087
[19324.754255] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffb400bce0
[19324.755465] RDX: 0000000000000000 RSI: ffff93c1dbed5b00 RDI: ffff93c1df4a8380
[19324.756682] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff93c1df2e83b0
[19324.757848] R10: 0000000000000001 R11: 0000000000000335 R12: 0000000000000001
[19324.758453] smpboot: CPU 11 is now offline
[19324.759099] R13: ffff93c1dcf48000 R14: ffff93c1df4a8340 R15: ffff93c1df4a8340
[19324.761167] FS:  0000000000000000(0000) GS:ffff93c1df480000(0000) knlGS:0000000000000000
[19324.762559] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19324.763527] CR2: 0000000000000150 CR3: 000000001e40a000 CR4: 00000000000006e0
[19324.764726] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19324.765929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[19324.767100] Call Trace:
[19324.767516]  <IRQ>
[19324.767875]  check_preempt_curr+0x62/0x90
[19324.768586]  ttwu_do_wakeup.constprop.0+0xf/0x100
[19324.769407]  sched_ttwu_pending+0xa9/0xe0
[19324.770077]  __sysvec_call_function_single+0x28/0xe0
[19324.770926]  asm_call_on_stack+0x12/0x20
[19324.771594]  </IRQ>
[19324.771951]  sysvec_call_function_single+0x94/0xd0
[19324.772596]  asm_sysvec_call_function_single+0x12/0x20
[19324.773254] RIP: 0010:_raw_spin_unlock_irqrestore+0x5/0x30
[19324.774169] Code: e4 49 ff c3 90 c6 07 00 bf 01 00 00 00 e8 23 2d 53 ff 65 8b 05 cc 32 4b 4c 85 c0 74 01 c3 e8 b9 e4 49 ff c3 90 c6 07 00 56 9d <bf> 01 00 00 00 e8 01 2d 53 ff 65 8b 05 aa 32 4b 4c 85 c0 74 01 c3
[19324.777267] RSP: 0000:ffffb3cb4030bd58 EFLAGS: 00000287
[19324.777956] RAX: 0000000000000001 RBX: ffff93c1dbed5b00 RCX: ffff93c1dcd63400
[19324.779015] RDX: 0000000000000000 RSI: 0000000000000287 RDI: ffff93c1dbed6284
[19324.780067] RBP: 000000000000000a R08: 00001193646cd91c R09: ffff93c1df49c008
[19324.781192] R10: ffffb3cb4030bdf8 R11: 000000000000032e R12: 0000000000000000
[19324.782386] R13: 0000000000000287 R14: ffff93c1dbed6284 R15: ffff93c1df2e8340
[19324.783565]  try_to_wake_up+0x232/0x530
[19324.784057]  ? trace_raw_output_hrtimer_start+0x70/0x70
[19324.784977]  call_timer_fn+0x28/0x150
[19324.785606]  ? trace_raw_output_hrtimer_start+0x70/0x70
[19324.786486]  run_timer_softirq+0x182/0x250
[19324.787191]  ? set_next_entity+0x8b/0x1a0
[19324.787867]  ? _raw_spin_unlock_irq+0xe/0x20
[19324.788597]  ? finish_task_switch+0x7b/0x230
[19324.789338]  __do_softirq+0xfc/0x32b
[19324.789961]  ? smpboot_register_percpu_thread+0xd0/0xd0
[19324.790904]  run_ksoftirqd+0x21/0x30
[19324.791510]  smpboot_thread_fn+0x195/0x230
[19324.792203]  kthread+0x13d/0x160
[19324.792731]  ? kthread_create_worker_on_cpu+0x60/0x60
[19324.793576]  ret_from_fork+0x22/0x30
[19324.794186] Modules linked in:
[19324.794729] CR2: 0000000000000150
[19324.795303] ------------[ cut here ]------------
[19324.795304] WARNING: CPU: 10 PID: 76 at kernel/smp.c:138 __smp_call_single_queue+0x40/0x50
[19324.795305] Modules linked in:
[19324.795306] CPU: 10 PID: 76 Comm: ksoftirqd/10 Not tainted 5.8.0-rc1+ #8
[19324.795307] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
[19324.795307] RIP: 0010:__smp_call_single_queue+0x40/0x50
[19324.795308] Code: c2 40 91 02 00 4c 89 e6 4c 89 e7 48 03 14 c5 e0 56 2d b4 e8 b2 3a 2f 00 84 c0 75 04 5d 41 5c c3 89 ef 5d 41 5c e9 40 af f9 ff <0f> 0b eb cd 66 66 2e 0f 1f 84 00 00 00 00 00 90 41 54 49 89 f4 55
[19324.795309] RSP: 0000:ffffb3cb4030bd18 EFLAGS: 00010046
[19324.795310] RAX: 000000000000000a RBX: 0000000000000000 RCX: 00000000ffffffff
[19324.795310] RDX: 00000000000090aa RSI: ffffffffb420bc3f RDI: ffffffffb4232e3e
[19324.795311] RBP: 000000000000000a R08: 00001193646cd91c R09: ffff93c1df49c008
[19324.795312] R10: ffffb3cb4030bdf8 R11: 000000000000032e R12: ffff93c1dbed5b30
[19324.795312] R13: ffff93c1df4a8340 R14: 000000000000000a R15: ffff93c1df2e8340
[19324.795313] FS:  0000000000000000(0000) GS:ffff93c1df480000(0000) knlGS:0000000000000000
[19324.795313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19324.795314] CR2: 00000000ffffffff CR3: 000000001e40a000 CR4: 00000000000006e0
[19324.795315] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19324.795315] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[19324.795316] Call Trace:
[19324.795316]  ttwu_queue_wakelist+0xa4/0xc0
[19324.795316]  try_to_wake_up+0x432/0x530
[19324.795317]  ? trace_raw_output_hrtimer_start+0x70/0x70
[19324.795317]  call_timer_fn+0x28/0x150
[19324.795318]  ? trace_raw_output_hrtimer_start+0x70/0x70
[19324.795318]  run_timer_softirq+0x182/0x250
[19324.795319]  ? set_next_entity+0x8b/0x1a0
[19324.795319]  ? _raw_spin_unlock_irq+0xe/0x20
[19324.795319]  ? finish_task_switch+0x7b/0x230
[19324.795320]  __do_softirq+0xfc/0x32b
[19324.795320]  ? smpboot_register_percpu_thread+0xd0/0xd0
[19324.795321]  run_ksoftirqd+0x21/0x30
[19324.795321]  smpboot_thread_fn+0x195/0x230
[19324.795321]  kthread+0x13d/0x160
[19324.795322]  ? kthread_create_worker_on_cpu+0x60/0x60
[19324.795322]  ret_from_fork+0x22/0x30
[19324.795323] ---[ end trace 851fe1f1f7a85d8b ]---
[19324.828475] ---[ end trace 851fe1f1f7a85d8c ]---
[19324.829250] RIP: 0010:check_preempt_wakeup+0xad/0x1a0
[19324.830107] Code: d0 39 d0 7d 2c 83 ea 01 48 8b 9b 48 01 00 00 39 d0 75 f2 48 39 bb 50 01 00 00 74 1e 48 8b ad 48 01 00 00 48 8b 9b 48 01 00 00 <48> 8b bd 50 01 00 00 48 39 bb 50 01 00 00 75 e2 48 85 ff 74 dd e8
[19324.833208] RSP: 0000:ffffb3cb40320f50 EFLAGS: 00010087
[19324.834098] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffb400bce0
[19324.835272] RDX: 0000000000000000 RSI: ffff93c1dbed5b00 RDI: ffff93c1df4a8380
[19324.836466] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff93c1df2e83b0
[19324.837669] R10: 0000000000000001 R11: 0000000000000335 R12: 0000000000000001
[19324.838867] R13: ffff93c1dcf48000 R14: ffff93c1df4a8340 R15: ffff93c1df4a8340
[19324.840019] FS:  0000000000000000(0000) GS:ffff93c1df480000(0000) knlGS:0000000000000000
[19324.841316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19324.842242] CR2: 0000000000000150 CR3: 000000001e40a000 CR4: 00000000000006e0
[19324.843406] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19324.844568] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[19324.845710] Kernel panic - not syncing: Fatal exception in interrupt
[19324.846998] Kernel Offset: 0x32000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[19324.848713] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
  2020-06-16 17:04         ` Peter Zijlstra
@ 2020-06-16 17:17           ` Peter Zijlstra
  2020-06-16 17:53             ` Paul E. McKenney
  2020-06-19 13:44             ` Peter Zijlstra
  2020-06-16 17:51           ` Paul E. McKenney
  1 sibling, 2 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-16 17:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic

On Tue, Jun 16, 2020 at 07:04:10PM +0200, Peter Zijlstra wrote:
> [19324.742887] BUG: kernel NULL pointer dereference, address: 0000000000000150
> [19324.744075] #PF: supervisor read access in kernel mode
> [19324.744919] #PF: error_code(0x0000) - not-present page
> [19324.745786] PGD 0 P4D 0
> [19324.746215] Oops: 0000 [#1] PREEMPT SMP PTI
> [19324.746948] CPU: 10 PID: 76 Comm: ksoftirqd/10 Tainted: G        W         5.8.0-rc1+ #8
> [19324.748080] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
> [19324.749372] RIP: 0010:check_preempt_wakeup+0xad/0x1a0
> [19324.750218] Code: d0 39 d0 7d 2c 83 ea 01 48 8b 9b 48 01 00 00 39 d0 75 f2 48 39 bb 50 01 00 00 74 1e 48 8b ad 48 01 00 00 48 8b 9b 48 01 00 00 <48> 8b bd 50 01 00 00 48 39 bb 50 01 00 00 75 e2 48 85 ff 74 dd e8
> [19324.753364] RSP: 0000:ffffb3cb40320f50 EFLAGS: 00010087
> [19324.754255] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffb400bce0
> [19324.755465] RDX: 0000000000000000 RSI: ffff93c1dbed5b00 RDI: ffff93c1df4a8380
> [19324.756682] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff93c1df2e83b0
> [19324.757848] R10: 0000000000000001 R11: 0000000000000335 R12: 0000000000000001
> [19324.758453] smpboot: CPU 11 is now offline
> [19324.759099] R13: ffff93c1dcf48000 R14: ffff93c1df4a8340 R15: ffff93c1df4a8340
> [19324.761167] FS:  0000000000000000(0000) GS:ffff93c1df480000(0000) knlGS:0000000000000000
> [19324.762559] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [19324.763527] CR2: 0000000000000150 CR3: 000000001e40a000 CR4: 00000000000006e0
> [19324.764726] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [19324.765929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [19324.767100] Call Trace:
> [19324.767516]  <IRQ>
> [19324.767875]  check_preempt_curr+0x62/0x90
> [19324.768586]  ttwu_do_wakeup.constprop.0+0xf/0x100
> [19324.769407]  sched_ttwu_pending+0xa9/0xe0
> [19324.770077]  __sysvec_call_function_single+0x28/0xe0
> [19324.770926]  asm_call_on_stack+0x12/0x20
> [19324.771594]  </IRQ>
> [19324.771951]  sysvec_call_function_single+0x94/0xd0
> [19324.772596]  asm_sysvec_call_function_single+0x12/0x20
> [19324.773254] RIP: 0010:_raw_spin_unlock_irqrestore+0x5/0x30
> [19324.774169] Code: e4 49 ff c3 90 c6 07 00 bf 01 00 00 00 e8 23 2d 53 ff 65 8b 05 cc 32 4b 4c 85 c0 74 01 c3 e8 b9 e4 49 ff c3 90 c6 07 00 56 9d <bf> 01 00 00 00 e8 01 2d 53 ff 65 8b 05 aa 32 4b 4c 85 c0 74 01 c3
> [19324.777267] RSP: 0000:ffffb3cb4030bd58 EFLAGS: 00000287
> [19324.777956] RAX: 0000000000000001 RBX: ffff93c1dbed5b00 RCX: ffff93c1dcd63400
> [19324.779015] RDX: 0000000000000000 RSI: 0000000000000287 RDI: ffff93c1dbed6284
> [19324.780067] RBP: 000000000000000a R08: 00001193646cd91c R09: ffff93c1df49c008
> [19324.781192] R10: ffffb3cb4030bdf8 R11: 000000000000032e R12: 0000000000000000
> [19324.782386] R13: 0000000000000287 R14: ffff93c1dbed6284 R15: ffff93c1df2e8340
> [19324.783565]  try_to_wake_up+0x232/0x530
> [19324.784057]  ? trace_raw_output_hrtimer_start+0x70/0x70
> [19324.784977]  call_timer_fn+0x28/0x150
> [19324.785606]  ? trace_raw_output_hrtimer_start+0x70/0x70
> [19324.786486]  run_timer_softirq+0x182/0x250
> [19324.787191]  ? set_next_entity+0x8b/0x1a0
> [19324.787867]  ? _raw_spin_unlock_irq+0xe/0x20
> [19324.788597]  ? finish_task_switch+0x7b/0x230
> [19324.789338]  __do_softirq+0xfc/0x32b
> [19324.789961]  ? smpboot_register_percpu_thread+0xd0/0xd0
> [19324.790904]  run_ksoftirqd+0x21/0x30
> [19324.791510]  smpboot_thread_fn+0x195/0x230
> [19324.792203]  kthread+0x13d/0x160
> [19324.792731]  ? kthread_create_worker_on_cpu+0x60/0x60
> [19324.793576]  ret_from_fork+0x22/0x30
> [19324.794186] Modules linked in:
> [19324.794729] CR2: 0000000000000150

OK, so above we run the IPI handler, while below we trigger the IPI
handler on the local CPU (something that'll make, for instance Power,
explode but -just-works- on x86). Both on CPU 10.

The question is, what actual order did this happen in, I expect the WARN
happened first, and then the NULL deref. Seeing how a NULL deref is
fatal and can't very well continue to produce a WARN. I expect they got
inverted by the magic gunk that is printk.

> [19324.795303] ------------[ cut here ]------------
> [19324.795304] WARNING: CPU: 10 PID: 76 at kernel/smp.c:138 __smp_call_single_queue+0x40/0x50
> [19324.795305] Modules linked in:
> [19324.795306] CPU: 10 PID: 76 Comm: ksoftirqd/10 Not tainted 5.8.0-rc1+ #8
> [19324.795307] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
> [19324.795307] RIP: 0010:__smp_call_single_queue+0x40/0x50
> [19324.795308] Code: c2 40 91 02 00 4c 89 e6 4c 89 e7 48 03 14 c5 e0 56 2d b4 e8 b2 3a 2f 00 84 c0 75 04 5d 41 5c c3 89 ef 5d 41 5c e9 40 af f9 ff <0f> 0b eb cd 66 66 2e 0f 1f 84 00 00 00 00 00 90 41 54 49 89 f4 55
> [19324.795309] RSP: 0000:ffffb3cb4030bd18 EFLAGS: 00010046
> [19324.795310] RAX: 000000000000000a RBX: 0000000000000000 RCX: 00000000ffffffff
> [19324.795310] RDX: 00000000000090aa RSI: ffffffffb420bc3f RDI: ffffffffb4232e3e
> [19324.795311] RBP: 000000000000000a R08: 00001193646cd91c R09: ffff93c1df49c008
> [19324.795312] R10: ffffb3cb4030bdf8 R11: 000000000000032e R12: ffff93c1dbed5b30
> [19324.795312] R13: ffff93c1df4a8340 R14: 000000000000000a R15: ffff93c1df2e8340
> [19324.795313] FS:  0000000000000000(0000) GS:ffff93c1df480000(0000) knlGS:0000000000000000
> [19324.795313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [19324.795314] CR2: 00000000ffffffff CR3: 000000001e40a000 CR4: 00000000000006e0
> [19324.795315] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [19324.795315] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [19324.795316] Call Trace:
> [19324.795316]  ttwu_queue_wakelist+0xa4/0xc0
> [19324.795316]  try_to_wake_up+0x432/0x530

This is indeed WF_ON_CPU... it had to be, but how ?!

> [19324.795317]  ? trace_raw_output_hrtimer_start+0x70/0x70
> [19324.795317]  call_timer_fn+0x28/0x150
> [19324.795318]  ? trace_raw_output_hrtimer_start+0x70/0x70
> [19324.795318]  run_timer_softirq+0x182/0x250
> [19324.795319]  ? set_next_entity+0x8b/0x1a0
> [19324.795319]  ? _raw_spin_unlock_irq+0xe/0x20
> [19324.795319]  ? finish_task_switch+0x7b/0x230
> [19324.795320]  __do_softirq+0xfc/0x32b
> [19324.795320]  ? smpboot_register_percpu_thread+0xd0/0xd0
> [19324.795321]  run_ksoftirqd+0x21/0x30
> [19324.795321]  smpboot_thread_fn+0x195/0x230
> [19324.795321]  kthread+0x13d/0x160
> [19324.795322]  ? kthread_create_worker_on_cpu+0x60/0x60
> [19324.795322]  ret_from_fork+0x22/0x30
> [19324.795323] ---[ end trace 851fe1f1f7a85d8b ]---


> [19324.828475] ---[ end trace 851fe1f1f7a85d8c ]---
> [19324.829250] RIP: 0010:check_preempt_wakeup+0xad/0x1a0
> [19324.830107] Code: d0 39 d0 7d 2c 83 ea 01 48 8b 9b 48 01 00 00 39 d0 75 f2 48 39 bb 50 01 00 00 74 1e 48 8b ad 48 01 00 00 48 8b 9b 48 01 00 00 <48> 8b bd 50 01 00 00 48 39 bb 50 01 00 00 75 e2 48 85 ff 74 dd e8
> [19324.833208] RSP: 0000:ffffb3cb40320f50 EFLAGS: 00010087
> [19324.834098] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffb400bce0
> [19324.835272] RDX: 0000000000000000 RSI: ffff93c1dbed5b00 RDI: ffff93c1df4a8380
> [19324.836466] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff93c1df2e83b0
> [19324.837669] R10: 0000000000000001 R11: 0000000000000335 R12: 0000000000000001
> [19324.838867] R13: ffff93c1dcf48000 R14: ffff93c1df4a8340 R15: ffff93c1df4a8340
> [19324.840019] FS:  0000000000000000(0000) GS:ffff93c1df480000(0000) knlGS:0000000000000000
> [19324.841316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [19324.842242] CR2: 0000000000000150 CR3: 000000001e40a000 CR4: 00000000000006e0
> [19324.843406] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [19324.844568] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [19324.845710] Kernel panic - not syncing: Fatal exception in interrupt
> [19324.846998] Kernel Offset: 0x32000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [19324.848713] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
  2020-06-16 17:04         ` Peter Zijlstra
  2020-06-16 17:17           ` Peter Zijlstra
@ 2020-06-16 17:51           ` Paul E. McKenney
  1 sibling, 0 replies; 37+ messages in thread
From: Paul E. McKenney @ 2020-06-16 17:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic

On Tue, Jun 16, 2020 at 07:04:10PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 15, 2020 at 09:11:58PM +0200, Peter Zijlstra wrote:
> > On Mon, Jun 15, 2020 at 10:21:49AM -0700, Paul E. McKenney wrote:
> > > On Mon, Jun 15, 2020 at 06:40:48PM +0200, Peter Zijlstra wrote:
> > 
> > > > Thanks! I've got 16*TREE03 running since this morning, so far so nothing :/
> > > > (FWIW that's 16/9 times overcommit, idle time fluctuates around 10%).
> > > 
> > > My large system as large remote memory latencies, as in the better part
> > > of a microsecond.  My small system is old (Haswell).  So, just to grasp
> > > at the obvious straw, do you have access to a multisocket Haswell system?
> > 
> > I've been running this on a 4 socket haswell ex.
> 
> Today, with patch 1 commented out, after ~5h20 I finally managed to trigger a splat.
> 
> Let me go stare at it, see if it wants to yield it sekrets

This splat looks very very familiar, so good show!  ;-)

							Thanx, Paul

> [19324.742887] BUG: kernel NULL pointer dereference, address: 0000000000000150
> [19324.744075] #PF: supervisor read access in kernel mode
> [19324.744919] #PF: error_code(0x0000) - not-present page
> [19324.745786] PGD 0 P4D 0
> [19324.746215] Oops: 0000 [#1] PREEMPT SMP PTI
> [19324.746948] CPU: 10 PID: 76 Comm: ksoftirqd/10 Tainted: G        W         5.8.0-rc1+ #8
> [19324.748080] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
> [19324.749372] RIP: 0010:check_preempt_wakeup+0xad/0x1a0
> [19324.750218] Code: d0 39 d0 7d 2c 83 ea 01 48 8b 9b 48 01 00 00 39 d0 75 f2 48 39 bb 50 01 00 00 74 1e 48 8b ad 48 01 00 00 48 8b 9b 48 01 00 00 <48> 8b bd 50 01 00 00 48 39 bb 50 01 00 00 75 e2 48 85 ff 74 dd e8
> [19324.753364] RSP: 0000:ffffb3cb40320f50 EFLAGS: 00010087
> [19324.754255] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffb400bce0
> [19324.755465] RDX: 0000000000000000 RSI: ffff93c1dbed5b00 RDI: ffff93c1df4a8380
> [19324.756682] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff93c1df2e83b0
> [19324.757848] R10: 0000000000000001 R11: 0000000000000335 R12: 0000000000000001
> [19324.758453] smpboot: CPU 11 is now offline
> [19324.759099] R13: ffff93c1dcf48000 R14: ffff93c1df4a8340 R15: ffff93c1df4a8340
> [19324.761167] FS:  0000000000000000(0000) GS:ffff93c1df480000(0000) knlGS:0000000000000000
> [19324.762559] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [19324.763527] CR2: 0000000000000150 CR3: 000000001e40a000 CR4: 00000000000006e0
> [19324.764726] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [19324.765929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [19324.767100] Call Trace:
> [19324.767516]  <IRQ>
> [19324.767875]  check_preempt_curr+0x62/0x90
> [19324.768586]  ttwu_do_wakeup.constprop.0+0xf/0x100
> [19324.769407]  sched_ttwu_pending+0xa9/0xe0
> [19324.770077]  __sysvec_call_function_single+0x28/0xe0
> [19324.770926]  asm_call_on_stack+0x12/0x20
> [19324.771594]  </IRQ>
> [19324.771951]  sysvec_call_function_single+0x94/0xd0
> [19324.772596]  asm_sysvec_call_function_single+0x12/0x20
> [19324.773254] RIP: 0010:_raw_spin_unlock_irqrestore+0x5/0x30
> [19324.774169] Code: e4 49 ff c3 90 c6 07 00 bf 01 00 00 00 e8 23 2d 53 ff 65 8b 05 cc 32 4b 4c 85 c0 74 01 c3 e8 b9 e4 49 ff c3 90 c6 07 00 56 9d <bf> 01 00 00 00 e8 01 2d 53 ff 65 8b 05 aa 32 4b 4c 85 c0 74 01 c3
> [19324.777267] RSP: 0000:ffffb3cb4030bd58 EFLAGS: 00000287
> [19324.777956] RAX: 0000000000000001 RBX: ffff93c1dbed5b00 RCX: ffff93c1dcd63400
> [19324.779015] RDX: 0000000000000000 RSI: 0000000000000287 RDI: ffff93c1dbed6284
> [19324.780067] RBP: 000000000000000a R08: 00001193646cd91c R09: ffff93c1df49c008
> [19324.781192] R10: ffffb3cb4030bdf8 R11: 000000000000032e R12: 0000000000000000
> [19324.782386] R13: 0000000000000287 R14: ffff93c1dbed6284 R15: ffff93c1df2e8340
> [19324.783565]  try_to_wake_up+0x232/0x530
> [19324.784057]  ? trace_raw_output_hrtimer_start+0x70/0x70
> [19324.784977]  call_timer_fn+0x28/0x150
> [19324.785606]  ? trace_raw_output_hrtimer_start+0x70/0x70
> [19324.786486]  run_timer_softirq+0x182/0x250
> [19324.787191]  ? set_next_entity+0x8b/0x1a0
> [19324.787867]  ? _raw_spin_unlock_irq+0xe/0x20
> [19324.788597]  ? finish_task_switch+0x7b/0x230
> [19324.789338]  __do_softirq+0xfc/0x32b
> [19324.789961]  ? smpboot_register_percpu_thread+0xd0/0xd0
> [19324.790904]  run_ksoftirqd+0x21/0x30
> [19324.791510]  smpboot_thread_fn+0x195/0x230
> [19324.792203]  kthread+0x13d/0x160
> [19324.792731]  ? kthread_create_worker_on_cpu+0x60/0x60
> [19324.793576]  ret_from_fork+0x22/0x30
> [19324.794186] Modules linked in:
> [19324.794729] CR2: 0000000000000150
> [19324.795303] ------------[ cut here ]------------
> [19324.795304] WARNING: CPU: 10 PID: 76 at kernel/smp.c:138 __smp_call_single_queue+0x40/0x50
> [19324.795305] Modules linked in:
> [19324.795306] CPU: 10 PID: 76 Comm: ksoftirqd/10 Not tainted 5.8.0-rc1+ #8
> [19324.795307] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
> [19324.795307] RIP: 0010:__smp_call_single_queue+0x40/0x50
> [19324.795308] Code: c2 40 91 02 00 4c 89 e6 4c 89 e7 48 03 14 c5 e0 56 2d b4 e8 b2 3a 2f 00 84 c0 75 04 5d 41 5c c3 89 ef 5d 41 5c e9 40 af f9 ff <0f> 0b eb cd 66 66 2e 0f 1f 84 00 00 00 00 00 90 41 54 49 89 f4 55
> [19324.795309] RSP: 0000:ffffb3cb4030bd18 EFLAGS: 00010046
> [19324.795310] RAX: 000000000000000a RBX: 0000000000000000 RCX: 00000000ffffffff
> [19324.795310] RDX: 00000000000090aa RSI: ffffffffb420bc3f RDI: ffffffffb4232e3e
> [19324.795311] RBP: 000000000000000a R08: 00001193646cd91c R09: ffff93c1df49c008
> [19324.795312] R10: ffffb3cb4030bdf8 R11: 000000000000032e R12: ffff93c1dbed5b30
> [19324.795312] R13: ffff93c1df4a8340 R14: 000000000000000a R15: ffff93c1df2e8340
> [19324.795313] FS:  0000000000000000(0000) GS:ffff93c1df480000(0000) knlGS:0000000000000000
> [19324.795313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [19324.795314] CR2: 00000000ffffffff CR3: 000000001e40a000 CR4: 00000000000006e0
> [19324.795315] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [19324.795315] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [19324.795316] Call Trace:
> [19324.795316]  ttwu_queue_wakelist+0xa4/0xc0
> [19324.795316]  try_to_wake_up+0x432/0x530
> [19324.795317]  ? trace_raw_output_hrtimer_start+0x70/0x70
> [19324.795317]  call_timer_fn+0x28/0x150
> [19324.795318]  ? trace_raw_output_hrtimer_start+0x70/0x70
> [19324.795318]  run_timer_softirq+0x182/0x250
> [19324.795319]  ? set_next_entity+0x8b/0x1a0
> [19324.795319]  ? _raw_spin_unlock_irq+0xe/0x20
> [19324.795319]  ? finish_task_switch+0x7b/0x230
> [19324.795320]  __do_softirq+0xfc/0x32b
> [19324.795320]  ? smpboot_register_percpu_thread+0xd0/0xd0
> [19324.795321]  run_ksoftirqd+0x21/0x30
> [19324.795321]  smpboot_thread_fn+0x195/0x230
> [19324.795321]  kthread+0x13d/0x160
> [19324.795322]  ? kthread_create_worker_on_cpu+0x60/0x60
> [19324.795322]  ret_from_fork+0x22/0x30
> [19324.795323] ---[ end trace 851fe1f1f7a85d8b ]---
> [19324.828475] ---[ end trace 851fe1f1f7a85d8c ]---
> [19324.829250] RIP: 0010:check_preempt_wakeup+0xad/0x1a0
> [19324.830107] Code: d0 39 d0 7d 2c 83 ea 01 48 8b 9b 48 01 00 00 39 d0 75 f2 48 39 bb 50 01 00 00 74 1e 48 8b ad 48 01 00 00 48 8b 9b 48 01 00 00 <48> 8b bd 50 01 00 00 48 39 bb 50 01 00 00 75 e2 48 85 ff 74 dd e8
> [19324.833208] RSP: 0000:ffffb3cb40320f50 EFLAGS: 00010087
> [19324.834098] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffb400bce0
> [19324.835272] RDX: 0000000000000000 RSI: ffff93c1dbed5b00 RDI: ffff93c1df4a8380
> [19324.836466] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff93c1df2e83b0
> [19324.837669] R10: 0000000000000001 R11: 0000000000000335 R12: 0000000000000001
> [19324.838867] R13: ffff93c1dcf48000 R14: ffff93c1df4a8340 R15: ffff93c1df4a8340
> [19324.840019] FS:  0000000000000000(0000) GS:ffff93c1df480000(0000) knlGS:0000000000000000
> [19324.841316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [19324.842242] CR2: 0000000000000150 CR3: 000000001e40a000 CR4: 00000000000006e0
> [19324.843406] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [19324.844568] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [19324.845710] Kernel panic - not syncing: Fatal exception in interrupt
> [19324.846998] Kernel Offset: 0x32000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [19324.848713] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
  2020-06-16 17:17           ` Peter Zijlstra
@ 2020-06-16 17:53             ` Paul E. McKenney
  2020-06-19 13:44             ` Peter Zijlstra
  1 sibling, 0 replies; 37+ messages in thread
From: Paul E. McKenney @ 2020-06-16 17:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic

On Tue, Jun 16, 2020 at 07:17:21PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 16, 2020 at 07:04:10PM +0200, Peter Zijlstra wrote:
> > [19324.742887] BUG: kernel NULL pointer dereference, address: 0000000000000150
> > [19324.744075] #PF: supervisor read access in kernel mode
> > [19324.744919] #PF: error_code(0x0000) - not-present page
> > [19324.745786] PGD 0 P4D 0
> > [19324.746215] Oops: 0000 [#1] PREEMPT SMP PTI
> > [19324.746948] CPU: 10 PID: 76 Comm: ksoftirqd/10 Tainted: G        W         5.8.0-rc1+ #8
> > [19324.748080] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
> > [19324.749372] RIP: 0010:check_preempt_wakeup+0xad/0x1a0
> > [19324.750218] Code: d0 39 d0 7d 2c 83 ea 01 48 8b 9b 48 01 00 00 39 d0 75 f2 48 39 bb 50 01 00 00 74 1e 48 8b ad 48 01 00 00 48 8b 9b 48 01 00 00 <48> 8b bd 50 01 00 00 48 39 bb 50 01 00 00 75 e2 48 85 ff 74 dd e8
> > [19324.753364] RSP: 0000:ffffb3cb40320f50 EFLAGS: 00010087
> > [19324.754255] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffb400bce0
> > [19324.755465] RDX: 0000000000000000 RSI: ffff93c1dbed5b00 RDI: ffff93c1df4a8380
> > [19324.756682] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff93c1df2e83b0
> > [19324.757848] R10: 0000000000000001 R11: 0000000000000335 R12: 0000000000000001
> > [19324.758453] smpboot: CPU 11 is now offline
> > [19324.759099] R13: ffff93c1dcf48000 R14: ffff93c1df4a8340 R15: ffff93c1df4a8340
> > [19324.761167] FS:  0000000000000000(0000) GS:ffff93c1df480000(0000) knlGS:0000000000000000
> > [19324.762559] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [19324.763527] CR2: 0000000000000150 CR3: 000000001e40a000 CR4: 00000000000006e0
> > [19324.764726] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [19324.765929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [19324.767100] Call Trace:
> > [19324.767516]  <IRQ>
> > [19324.767875]  check_preempt_curr+0x62/0x90
> > [19324.768586]  ttwu_do_wakeup.constprop.0+0xf/0x100
> > [19324.769407]  sched_ttwu_pending+0xa9/0xe0
> > [19324.770077]  __sysvec_call_function_single+0x28/0xe0
> > [19324.770926]  asm_call_on_stack+0x12/0x20
> > [19324.771594]  </IRQ>
> > [19324.771951]  sysvec_call_function_single+0x94/0xd0
> > [19324.772596]  asm_sysvec_call_function_single+0x12/0x20
> > [19324.773254] RIP: 0010:_raw_spin_unlock_irqrestore+0x5/0x30
> > [19324.774169] Code: e4 49 ff c3 90 c6 07 00 bf 01 00 00 00 e8 23 2d 53 ff 65 8b 05 cc 32 4b 4c 85 c0 74 01 c3 e8 b9 e4 49 ff c3 90 c6 07 00 56 9d <bf> 01 00 00 00 e8 01 2d 53 ff 65 8b 05 aa 32 4b 4c 85 c0 74 01 c3
> > [19324.777267] RSP: 0000:ffffb3cb4030bd58 EFLAGS: 00000287
> > [19324.777956] RAX: 0000000000000001 RBX: ffff93c1dbed5b00 RCX: ffff93c1dcd63400
> > [19324.779015] RDX: 0000000000000000 RSI: 0000000000000287 RDI: ffff93c1dbed6284
> > [19324.780067] RBP: 000000000000000a R08: 00001193646cd91c R09: ffff93c1df49c008
> > [19324.781192] R10: ffffb3cb4030bdf8 R11: 000000000000032e R12: 0000000000000000
> > [19324.782386] R13: 0000000000000287 R14: ffff93c1dbed6284 R15: ffff93c1df2e8340
> > [19324.783565]  try_to_wake_up+0x232/0x530
> > [19324.784057]  ? trace_raw_output_hrtimer_start+0x70/0x70
> > [19324.784977]  call_timer_fn+0x28/0x150
> > [19324.785606]  ? trace_raw_output_hrtimer_start+0x70/0x70
> > [19324.786486]  run_timer_softirq+0x182/0x250
> > [19324.787191]  ? set_next_entity+0x8b/0x1a0
> > [19324.787867]  ? _raw_spin_unlock_irq+0xe/0x20
> > [19324.788597]  ? finish_task_switch+0x7b/0x230
> > [19324.789338]  __do_softirq+0xfc/0x32b
> > [19324.789961]  ? smpboot_register_percpu_thread+0xd0/0xd0
> > [19324.790904]  run_ksoftirqd+0x21/0x30
> > [19324.791510]  smpboot_thread_fn+0x195/0x230
> > [19324.792203]  kthread+0x13d/0x160
> > [19324.792731]  ? kthread_create_worker_on_cpu+0x60/0x60
> > [19324.793576]  ret_from_fork+0x22/0x30
> > [19324.794186] Modules linked in:
> > [19324.794729] CR2: 0000000000000150
> 
> OK, so above we run the IPI handler, while below we trigger the IPI
> handler on the local CPU (something that'll make, for instance Power,
> explode but -just-works- on x86). Both on CPU 10.
> 
> The question is, what actual order did this happen in, I expect the WARN
> happened first, and then the NULL deref. Seeing how a NULL deref is
> fatal and can't very well continue to produce a WARN. I expect they got
> inverted by the magic gunk that is printk.
> 
> > [19324.795303] ------------[ cut here ]------------
> > [19324.795304] WARNING: CPU: 10 PID: 76 at kernel/smp.c:138 __smp_call_single_queue+0x40/0x50
> > [19324.795305] Modules linked in:
> > [19324.795306] CPU: 10 PID: 76 Comm: ksoftirqd/10 Not tainted 5.8.0-rc1+ #8
> > [19324.795307] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
> > [19324.795307] RIP: 0010:__smp_call_single_queue+0x40/0x50
> > [19324.795308] Code: c2 40 91 02 00 4c 89 e6 4c 89 e7 48 03 14 c5 e0 56 2d b4 e8 b2 3a 2f 00 84 c0 75 04 5d 41 5c c3 89 ef 5d 41 5c e9 40 af f9 ff <0f> 0b eb cd 66 66 2e 0f 1f 84 00 00 00 00 00 90 41 54 49 89 f4 55
> > [19324.795309] RSP: 0000:ffffb3cb4030bd18 EFLAGS: 00010046
> > [19324.795310] RAX: 000000000000000a RBX: 0000000000000000 RCX: 00000000ffffffff
> > [19324.795310] RDX: 00000000000090aa RSI: ffffffffb420bc3f RDI: ffffffffb4232e3e
> > [19324.795311] RBP: 000000000000000a R08: 00001193646cd91c R09: ffff93c1df49c008
> > [19324.795312] R10: ffffb3cb4030bdf8 R11: 000000000000032e R12: ffff93c1dbed5b30
> > [19324.795312] R13: ffff93c1df4a8340 R14: 000000000000000a R15: ffff93c1df2e8340
> > [19324.795313] FS:  0000000000000000(0000) GS:ffff93c1df480000(0000) knlGS:0000000000000000
> > [19324.795313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [19324.795314] CR2: 00000000ffffffff CR3: 000000001e40a000 CR4: 00000000000006e0
> > [19324.795315] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [19324.795315] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [19324.795316] Call Trace:
> > [19324.795316]  ttwu_queue_wakelist+0xa4/0xc0
> > [19324.795316]  try_to_wake_up+0x432/0x530
> 
> This is indeed WF_ON_CPU... it had to be, but how ?!

I confess to being similarly mystified.  :-/

							Thanx, Paul

> > [19324.795317]  ? trace_raw_output_hrtimer_start+0x70/0x70
> > [19324.795317]  call_timer_fn+0x28/0x150
> > [19324.795318]  ? trace_raw_output_hrtimer_start+0x70/0x70
> > [19324.795318]  run_timer_softirq+0x182/0x250
> > [19324.795319]  ? set_next_entity+0x8b/0x1a0
> > [19324.795319]  ? _raw_spin_unlock_irq+0xe/0x20
> > [19324.795319]  ? finish_task_switch+0x7b/0x230
> > [19324.795320]  __do_softirq+0xfc/0x32b
> > [19324.795320]  ? smpboot_register_percpu_thread+0xd0/0xd0
> > [19324.795321]  run_ksoftirqd+0x21/0x30
> > [19324.795321]  smpboot_thread_fn+0x195/0x230
> > [19324.795321]  kthread+0x13d/0x160
> > [19324.795322]  ? kthread_create_worker_on_cpu+0x60/0x60
> > [19324.795322]  ret_from_fork+0x22/0x30
> > [19324.795323] ---[ end trace 851fe1f1f7a85d8b ]---
> 
> 
> > [19324.828475] ---[ end trace 851fe1f1f7a85d8c ]---
> > [19324.829250] RIP: 0010:check_preempt_wakeup+0xad/0x1a0
> > [19324.830107] Code: d0 39 d0 7d 2c 83 ea 01 48 8b 9b 48 01 00 00 39 d0 75 f2 48 39 bb 50 01 00 00 74 1e 48 8b ad 48 01 00 00 48 8b 9b 48 01 00 00 <48> 8b bd 50 01 00 00 48 39 bb 50 01 00 00 75 e2 48 85 ff 74 dd e8
> > [19324.833208] RSP: 0000:ffffb3cb40320f50 EFLAGS: 00010087
> > [19324.834098] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffb400bce0
> > [19324.835272] RDX: 0000000000000000 RSI: ffff93c1dbed5b00 RDI: ffff93c1df4a8380
> > [19324.836466] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff93c1df2e83b0
> > [19324.837669] R10: 0000000000000001 R11: 0000000000000335 R12: 0000000000000001
> > [19324.838867] R13: ffff93c1dcf48000 R14: ffff93c1df4a8340 R15: ffff93c1df4a8340
> > [19324.840019] FS:  0000000000000000(0000) GS:ffff93c1df480000(0000) knlGS:0000000000000000
> > [19324.841316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [19324.842242] CR2: 0000000000000150 CR3: 000000001e40a000 CR4: 00000000000006e0
> > [19324.843406] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [19324.844568] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [19324.845710] Kernel panic - not syncing: Fatal exception in interrupt
> > [19324.846998] Kernel Offset: 0x32000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [19324.848713] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/6] smp: Cleanup smp_call_function*()
  2020-06-15 12:57 ` [PATCH 6/6] smp: Cleanup smp_call_function*() Peter Zijlstra
  2020-06-15 14:34   ` Jens Axboe
  2020-06-15 16:04   ` Daniel Thompson
@ 2020-06-17  8:23   ` Christoph Hellwig
  2020-06-17  9:00     ` Peter Zijlstra
  2020-06-17 11:04     ` Peter Zijlstra
  2 siblings, 2 replies; 37+ messages in thread
From: Christoph Hellwig @ 2020-06-17  8:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, paulmck, frederic,
	tsbogend, axboe, rjw, daniel.lezcano, dchickles, davem, kuba,
	daniel.thompson, gerald.schaefer

> -static DEFINE_PER_CPU(call_single_data_t, backtrace_csd);
> +static DEFINE_PER_CPU(call_single_data_t, backtrace_csd) = CSD_INIT(handle_backtrace, NULL);
>  static struct cpumask backtrace_csd_busy;

Besides the crazy long line: does assigning to a DEFINE_PER_CPU
really work and initialize all the members?

> @@ -178,9 +178,7 @@ static void zpci_handle_fallback_irq(voi
>  		if (atomic_inc_return(&cpu_data->scheduled) > 1)
>  			continue;
>  
> -		cpu_data->csd.func = zpci_handle_remote_irq;
> -		cpu_data->csd.info = &cpu_data->scheduled;
> -		cpu_data->csd.flags = 0;
> +		cpu_data->csd = CSD_INIT(zpci_handle_remote_irq, &cpu_data->scheduled);

This looks weird.  I'd much rather see an initialization ala INIT_WORK:

		INIT_CSD(&cpu_data->csd, zpci_handle_remote_irq,
			 &cpu_data->scheduled);

Also for many smp_call_function_* users it would be trivial and actually
lead to nicer code if the data argument went away and we'd just use
container_of to get to the containing structure.  For the remaining
ones we can trivially general a container strucuture that has the
extra data pointer.

> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -629,9 +629,7 @@ void blk_mq_force_complete_rq(struct req
>  		shared = cpus_share_cache(cpu, ctx->cpu);
>  
>  	if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
> -		rq->csd.func = __blk_mq_complete_request_remote;
> -		rq->csd.info = rq;
> -		rq->csd.flags = 0;
> +		rq->csd = CSD_INIT(__blk_mq_complete_request_remote, rq);
>  		smp_call_function_single_async(ctx->cpu, &rq->csd);
>  	} else {
>  		q->mq_ops->complete(rq);
> --- a/block/blk-softirq.c
> +++ b/block/blk-softirq.c
> @@ -57,13 +57,8 @@ static void trigger_softirq(void *data)
>  static int raise_blk_irq(int cpu, struct request *rq)
>  {
>  	if (cpu_online(cpu)) {
> -		call_single_data_t *data = &rq->csd;
> -
> -		data->func = trigger_softirq;
> -		data->info = rq;
> -		data->flags = 0;
> -
> -		smp_call_function_single_async(cpu, data);
> +		rq->csd = CSD_INIT(trigger_softirq, rq);
> +		smp_call_function_single_async(cpu, &rq->csd);
>  		return 0;
>  	}

FYI, I rewrote much of the blk code in this series:

https://lore.kernel.org/linux-block/20200611064452.12353-1-hch@lst.de/T/#t

that you also were Cced on.

>  struct __call_single_data {
> -	union {
> -		struct __call_single_node node;
> -		struct {
> -			struct llist_node llist;
> -			unsigned int flags;
> -		};
> -	};
> +	struct __call_single_node node;
>  	smp_call_func_t func;
>  	void *info;
>  };

Can we rename this to struct call_single_data without the __prefix
and switch all the users you touch anyway away from the typedef?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/6] smp: Cleanup smp_call_function*()
  2020-06-17  8:23   ` Christoph Hellwig
@ 2020-06-17  9:00     ` Peter Zijlstra
  2020-06-17 11:04     ` Peter Zijlstra
  1 sibling, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-17  9:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, paulmck, frederic,
	tsbogend, axboe, rjw, daniel.lezcano, dchickles, davem, kuba,
	daniel.thompson, gerald.schaefer

On Wed, Jun 17, 2020 at 01:23:49AM -0700, Christoph Hellwig wrote:
> > -static DEFINE_PER_CPU(call_single_data_t, backtrace_csd);
> > +static DEFINE_PER_CPU(call_single_data_t, backtrace_csd) = CSD_INIT(handle_backtrace, NULL);
> >  static struct cpumask backtrace_csd_busy;
> 
> Besides the crazy long line: does assigning to a DEFINE_PER_CPU
> really work and initialize all the members?

Yes. The way it works is that it initializes the variable that ends up
in the .data..percpu section and that's copied when we create the
actual per-cpu things.

> > @@ -178,9 +178,7 @@ static void zpci_handle_fallback_irq(voi
> >  		if (atomic_inc_return(&cpu_data->scheduled) > 1)
> >  			continue;
> >  
> > -		cpu_data->csd.func = zpci_handle_remote_irq;
> > -		cpu_data->csd.info = &cpu_data->scheduled;
> > -		cpu_data->csd.flags = 0;
> > +		cpu_data->csd = CSD_INIT(zpci_handle_remote_irq, &cpu_data->scheduled);
> 
> This looks weird.  I'd much rather see an initialization ala INIT_WORK:
> 
> 		INIT_CSD(&cpu_data->csd, zpci_handle_remote_irq,
> 			 &cpu_data->scheduled);
> 
> Also for many smp_call_function_* users it would be trivial and actually
> lead to nicer code if the data argument went away and we'd just use
> container_of to get to the containing structure.  For the remaining
> ones we can trivially general a container strucuture that has the
> extra data pointer.

Agreed, except that won't work for things like cfd_data, csd_data and
csd_stack in smp.c. It might be possible to rework some of that, but
that's going to be further surgery.

> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -629,9 +629,7 @@ void blk_mq_force_complete_rq(struct req
> >  		shared = cpus_share_cache(cpu, ctx->cpu);
> >  
> >  	if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
> > -		rq->csd.func = __blk_mq_complete_request_remote;
> > -		rq->csd.info = rq;
> > -		rq->csd.flags = 0;
> > +		rq->csd = CSD_INIT(__blk_mq_complete_request_remote, rq);
> >  		smp_call_function_single_async(ctx->cpu, &rq->csd);
> >  	} else {
> >  		q->mq_ops->complete(rq);
> > --- a/block/blk-softirq.c
> > +++ b/block/blk-softirq.c
> > @@ -57,13 +57,8 @@ static void trigger_softirq(void *data)
> >  static int raise_blk_irq(int cpu, struct request *rq)
> >  {
> >  	if (cpu_online(cpu)) {
> > -		call_single_data_t *data = &rq->csd;
> > -
> > -		data->func = trigger_softirq;
> > -		data->info = rq;
> > -		data->flags = 0;
> > -
> > -		smp_call_function_single_async(cpu, data);
> > +		rq->csd = CSD_INIT(trigger_softirq, rq);
> > +		smp_call_function_single_async(cpu, &rq->csd);
> >  		return 0;
> >  	}
> 
> FYI, I rewrote much of the blk code in this series:
> 
> https://lore.kernel.org/linux-block/20200611064452.12353-1-hch@lst.de/T/#t
> 
> that you also were Cced on.

Yes, I know. The merge shouldn't be too difficult, but if that's landed
in a git tree meanwhile, I can try and pull that in.

> >  struct __call_single_data {
> > -	union {
> > -		struct __call_single_node node;
> > -		struct {
> > -			struct llist_node llist;
> > -			unsigned int flags;
> > -		};
> > -	};
> > +	struct __call_single_node node;
> >  	smp_call_func_t func;
> >  	void *info;
> >  };
> 
> Can we rename this to struct call_single_data without the __prefix
> and switch all the users you touch anyway away from the typedef?

That mess exists because of the alignment thing. IIRC you can't use the
sizeof() of a struct you're still declaring.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/6] smp: Cleanup smp_call_function*()
  2020-06-17  8:23   ` Christoph Hellwig
  2020-06-17  9:00     ` Peter Zijlstra
@ 2020-06-17 11:04     ` Peter Zijlstra
  2020-06-18  6:51       ` Christoph Hellwig
  1 sibling, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-17 11:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, paulmck, frederic,
	tsbogend, axboe, rjw, daniel.lezcano, dchickles, davem, kuba,
	daniel.thompson, gerald.schaefer

On Wed, Jun 17, 2020 at 01:23:49AM -0700, Christoph Hellwig wrote:

> > @@ -178,9 +178,7 @@ static void zpci_handle_fallback_irq(voi
> >  		if (atomic_inc_return(&cpu_data->scheduled) > 1)
> >  			continue;
> >  
> > -		cpu_data->csd.func = zpci_handle_remote_irq;
> > -		cpu_data->csd.info = &cpu_data->scheduled;
> > -		cpu_data->csd.flags = 0;
> > +		cpu_data->csd = CSD_INIT(zpci_handle_remote_irq, &cpu_data->scheduled);
> 
> This looks weird.  I'd much rather see an initialization ala INIT_WORK:
> 
> 		INIT_CSD(&cpu_data->csd, zpci_handle_remote_irq,
> 			 &cpu_data->scheduled);


like so then?

---
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -687,7 +687,6 @@ unsigned long arch_align_stack(unsigned
 	return sp & ALMASK;
 }
 
-static DEFINE_PER_CPU(call_single_data_t, backtrace_csd);
 static struct cpumask backtrace_csd_busy;
 
 static void handle_backtrace(void *info)
@@ -696,6 +695,9 @@ static void handle_backtrace(void *info)
 	cpumask_clear_cpu(smp_processor_id(), &backtrace_csd_busy);
 }
 
+static DEFINE_PER_CPU(call_single_data_t, backtrace_csd) =
+	CSD_INIT(handle_backtrace, NULL);
+
 static void raise_backtrace(cpumask_t *mask)
 {
 	call_single_data_t *csd;
@@ -715,7 +717,6 @@ static void raise_backtrace(cpumask_t *m
 		}
 
 		csd = &per_cpu(backtrace_csd, cpu);
-		csd->func = handle_backtrace;
 		smp_call_function_single_async(cpu, csd);
 	}
 }
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -687,36 +687,23 @@ EXPORT_SYMBOL(flush_tlb_one);
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 
-static DEFINE_PER_CPU(call_single_data_t, tick_broadcast_csd);
-
-void tick_broadcast(const struct cpumask *mask)
-{
-	call_single_data_t *csd;
-	int cpu;
-
-	for_each_cpu(cpu, mask) {
-		csd = &per_cpu(tick_broadcast_csd, cpu);
-		smp_call_function_single_async(cpu, csd);
-	}
-}
-
 static void tick_broadcast_callee(void *info)
 {
 	tick_receive_broadcast();
 }
 
-static int __init tick_broadcast_init(void)
+static DEFINE_PER_CPU(call_single_data_t, tick_broadcast_csd) =
+	CSD_INIT(tick_broadcast_callee, NULL);
+
+void tick_broadcast(const struct cpumask *mask)
 {
 	call_single_data_t *csd;
 	int cpu;
 
-	for (cpu = 0; cpu < NR_CPUS; cpu++) {
+	for_each_cpu(cpu, mask) {
 		csd = &per_cpu(tick_broadcast_csd, cpu);
-		csd->func = tick_broadcast_callee;
+		smp_call_function_single_async(cpu, csd);
 	}
-
-	return 0;
 }
-early_initcall(tick_broadcast_init);
 
 #endif /* CONFIG_GENERIC_CLOCKEVENTS_BROADCAST */
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -178,9 +178,7 @@ static void zpci_handle_fallback_irq(voi
 		if (atomic_inc_return(&cpu_data->scheduled) > 1)
 			continue;
 
-		cpu_data->csd.func = zpci_handle_remote_irq;
-		cpu_data->csd.info = &cpu_data->scheduled;
-		cpu_data->csd.flags = 0;
+		INIT_CSD(&cpu_data->csd, zpci_handle_remote_irq, &cpu_data->scheduled);
 		smp_call_function_single_async(cpu, &cpu_data->csd);
 	}
 }
--- a/arch/x86/kernel/cpuid.c
+++ b/arch/x86/kernel/cpuid.c
@@ -74,10 +74,9 @@ static ssize_t cpuid_read(struct file *f
 
 	init_completion(&cmd.done);
 	for (; count; count -= 16) {
-		call_single_data_t csd = {
-			.func = cpuid_smp_cpuid,
-			.info = &cmd,
-		};
+		call_single_data_t csd;
+
+		INIT_CSD(&csd, cpuid_smp_cpuid, &cmd);
 
 		cmd.regs.eax = pos;
 		cmd.regs.ecx = pos >> 32;
--- a/arch/x86/lib/msr-smp.c
+++ b/arch/x86/lib/msr-smp.c
@@ -169,12 +169,11 @@ static void __wrmsr_safe_on_cpu(void *in
 int rdmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h)
 {
 	struct msr_info_completion rv;
-	call_single_data_t csd = {
-		.func	= __rdmsr_safe_on_cpu,
-		.info	= &rv,
-	};
+	call_single_data_t csd;
 	int err;
 
+	INIT_CSD(&csd, __rdmsr_safe_on_cpu, &rv);
+
 	memset(&rv, 0, sizeof(rv));
 	init_completion(&rv.done);
 	rv.msr.msr_no = msr_no;
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -629,9 +629,7 @@ void blk_mq_force_complete_rq(struct req
 		shared = cpus_share_cache(cpu, ctx->cpu);
 
 	if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
-		rq->csd.func = __blk_mq_complete_request_remote;
-		rq->csd.info = rq;
-		rq->csd.flags = 0;
+		INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq);
 		smp_call_function_single_async(ctx->cpu, &rq->csd);
 	} else {
 		q->mq_ops->complete(rq);
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -57,13 +57,8 @@ static void trigger_softirq(void *data)
 static int raise_blk_irq(int cpu, struct request *rq)
 {
 	if (cpu_online(cpu)) {
-		call_single_data_t *data = &rq->csd;
-
-		data->func = trigger_softirq;
-		data->info = rq;
-		data->flags = 0;
-
-		smp_call_function_single_async(cpu, data);
+		INIT_CSD(&rq->csd, trigger_softirq, rq);
+		smp_call_function_single_async(cpu, &rq->csd);
 		return 0;
 	}
 
--- a/drivers/cpuidle/coupled.c
+++ b/drivers/cpuidle/coupled.c
@@ -674,8 +674,7 @@ int cpuidle_coupled_register_device(stru
 	coupled->refcnt++;
 
 	csd = &per_cpu(cpuidle_coupled_poke_cb, dev->cpu);
-	csd->func = cpuidle_coupled_handle_poke;
-	csd->info = (void *)(unsigned long)dev->cpu;
+	INIT_CSD(csd, cpuidle_coupled_handle_poke, (void *)(unsigned long)dev->cpu);
 
 	return 0;
 }
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -726,13 +726,8 @@ static void liquidio_napi_drv_callback(v
 	    droq->cpu_id == this_cpu) {
 		napi_schedule_irqoff(&droq->napi);
 	} else {
-		call_single_data_t *csd = &droq->csd;
-
-		csd->func = napi_schedule_wrapper;
-		csd->info = &droq->napi;
-		csd->flags = 0;
-
-		smp_call_function_single_async(droq->cpu_id, csd);
+		INIT_CSD(&droq->csd, napi_schedule_wrapper, &droq->napi);
+		smp_call_function_single_async(droq->cpu_id, &droq->csd);
 	}
 }
 
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -21,21 +21,23 @@ typedef bool (*smp_cond_func_t)(int cpu,
  * structure shares (partial) layout with struct irq_work
  */
 struct __call_single_data {
-	union {
-		struct __call_single_node node;
-		struct {
-			struct llist_node llist;
-			unsigned int flags;
-		};
-	};
+	struct __call_single_node node;
 	smp_call_func_t func;
 	void *info;
 };
 
+#define CSD_INIT(_func, _info) \
+	(struct __call_single_data){ .func = (_func), .info = (_info), }
+
 /* Use __aligned() to avoid to use 2 cache lines for 1 csd */
 typedef struct __call_single_data call_single_data_t
 	__aligned(sizeof(struct __call_single_data));
 
+#define INIT_CSD(_csd, _func, _info)		\
+do {						\
+	*(_csd) = CSD_INIT((_func), (_info));	\
+while (0)
+
 /*
  * Enqueue a llist_node on the call_single_queue; be very careful, read
  * flush_smp_call_function_queue() in detail.
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -225,8 +225,6 @@ int __weak kgdb_skipexception(int except
  * Default (weak) implementation for kgdb_roundup_cpus
  */
 
-static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd);
-
 void __weak kgdb_call_nmi_hook(void *ignored)
 {
 	/*
@@ -240,6 +238,9 @@ void __weak kgdb_call_nmi_hook(void *ign
 	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
+static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd) =
+	CSD_INIT(kgdb_call_nmi_hook, NULL);
+
 void __weak kgdb_roundup_cpus(void)
 {
 	call_single_data_t *csd;
@@ -266,7 +267,6 @@ void __weak kgdb_roundup_cpus(void)
 			continue;
 		kgdb_info[cpu].rounding_up = true;
 
-		csd->func = kgdb_call_nmi_hook;
 		ret = smp_call_function_single_async(cpu, csd);
 		if (ret)
 			kgdb_info[cpu].rounding_up = false;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -221,14 +221,6 @@ void update_rq_clock(struct rq *rq)
 	update_rq_clock_task(rq, delta);
 }
 
-static inline void
-rq_csd_init(struct rq *rq, call_single_data_t *csd, smp_call_func_t func)
-{
-	csd->flags = 0;
-	csd->func = func;
-	csd->info = rq;
-}
-
 #ifdef CONFIG_SCHED_HRTICK
 /*
  * Use HR-timers to deliver accurate preemption points.
@@ -329,7 +321,7 @@ void hrtick_start(struct rq *rq, u64 del
 static void hrtick_rq_init(struct rq *rq)
 {
 #ifdef CONFIG_SMP
-	rq_csd_init(rq, &rq->hrtick_csd, __hrtick_start);
+	INIT_CSD(&rq->hrtick_csd, __hrtick_start, rq);
 #endif
 	hrtimer_init(&rq->hrtick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
 	rq->hrtick_timer.function = hrtick;
@@ -6778,7 +6770,7 @@ void __init sched_init(void)
 		rq->last_blocked_load_update_tick = jiffies;
 		atomic_set(&rq->nohz_flags, 0);
 
-		rq_csd_init(rq, &rq->nohz_csd, nohz_csd_func);
+		INIT_CSD(&rq->nohz_csd, nohz_csd_func, rq);
 #endif
 #endif /* CONFIG_SMP */
 		hrtick_rq_init(rq);
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -24,7 +24,7 @@
 #include "smpboot.h"
 #include "sched/smp.h"
 
-#define CSD_TYPE(_csd)	((_csd)->flags & CSD_FLAG_TYPE_MASK)
+#define CSD_TYPE(_csd)	((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)
 
 struct call_function_data {
 	call_single_data_t	__percpu *csd;
@@ -105,13 +105,13 @@ void __init call_function_init(void)
  */
 static __always_inline void csd_lock_wait(call_single_data_t *csd)
 {
-	smp_cond_load_acquire(&csd->flags, !(VAL & CSD_FLAG_LOCK));
+	smp_cond_load_acquire(&csd->node.u_flags, !(VAL & CSD_FLAG_LOCK));
 }
 
 static __always_inline void csd_lock(call_single_data_t *csd)
 {
 	csd_lock_wait(csd);
-	csd->flags |= CSD_FLAG_LOCK;
+	csd->node.u_flags |= CSD_FLAG_LOCK;
 
 	/*
 	 * prevent CPU from reordering the above assignment
@@ -123,12 +123,12 @@ static __always_inline void csd_lock(cal
 
 static __always_inline void csd_unlock(call_single_data_t *csd)
 {
-	WARN_ON(!(csd->flags & CSD_FLAG_LOCK));
+	WARN_ON(!(csd->node.u_flags & CSD_FLAG_LOCK));
 
 	/*
 	 * ensure we're all done before releasing data:
 	 */
-	smp_store_release(&csd->flags, 0);
+	smp_store_release(&csd->node.u_flags, 0);
 }
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
@@ -180,7 +180,7 @@ static int generic_exec_single(int cpu,
 		return -ENXIO;
 	}
 
-	__smp_call_single_queue(cpu, &csd->llist);
+	__smp_call_single_queue(cpu, &csd->node.llist);
 
 	return 0;
 }
@@ -233,7 +233,7 @@ static void flush_smp_call_function_queu
 		 * We don't have to use the _safe() variant here
 		 * because we are not invoking the IPI handlers yet.
 		 */
-		llist_for_each_entry(csd, entry, llist) {
+		llist_for_each_entry(csd, entry, node.llist) {
 			switch (CSD_TYPE(csd)) {
 			case CSD_TYPE_ASYNC:
 			case CSD_TYPE_SYNC:
@@ -258,22 +258,22 @@ static void flush_smp_call_function_queu
 	 * First; run all SYNC callbacks, people are waiting for us.
 	 */
 	prev = NULL;
-	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
+	llist_for_each_entry_safe(csd, csd_next, entry, node.llist) {
 		/* Do we wait until *after* callback? */
 		if (CSD_TYPE(csd) == CSD_TYPE_SYNC) {
 			smp_call_func_t func = csd->func;
 			void *info = csd->info;
 
 			if (prev) {
-				prev->next = &csd_next->llist;
+				prev->next = &csd_next->node.llist;
 			} else {
-				entry = &csd_next->llist;
+				entry = &csd_next->node.llist;
 			}
 
 			func(info);
 			csd_unlock(csd);
 		} else {
-			prev = &csd->llist;
+			prev = &csd->node.llist;
 		}
 	}
 
@@ -284,14 +284,14 @@ static void flush_smp_call_function_queu
 	 * Second; run all !SYNC callbacks.
 	 */
 	prev = NULL;
-	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
+	llist_for_each_entry_safe(csd, csd_next, entry, node.llist) {
 		int type = CSD_TYPE(csd);
 
 		if (type != CSD_TYPE_TTWU) {
 			if (prev) {
-				prev->next = &csd_next->llist;
+				prev->next = &csd_next->node.llist;
 			} else {
-				entry = &csd_next->llist;
+				entry = &csd_next->node.llist;
 			}
 
 			if (type == CSD_TYPE_ASYNC) {
@@ -305,7 +305,7 @@ static void flush_smp_call_function_queu
 			}
 
 		} else {
-			prev = &csd->llist;
+			prev = &csd->node.llist;
 		}
 	}
 
@@ -341,7 +341,7 @@ int smp_call_function_single(int cpu, sm
 {
 	call_single_data_t *csd;
 	call_single_data_t csd_stack = {
-		.flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC,
+		.node = { .u_flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC, },
 	};
 	int this_cpu;
 	int err;
@@ -416,12 +416,12 @@ int smp_call_function_single_async(int c
 
 	preempt_disable();
 
-	if (csd->flags & CSD_FLAG_LOCK) {
+	if (csd->node.u_flags & CSD_FLAG_LOCK) {
 		err = -EBUSY;
 		goto out;
 	}
 
-	csd->flags = CSD_FLAG_LOCK;
+	csd->node.u_flags = CSD_FLAG_LOCK;
 	smp_wmb();
 
 	err = generic_exec_single(cpu, csd);
@@ -539,10 +539,10 @@ static void smp_call_function_many_cond(
 
 		csd_lock(csd);
 		if (wait)
-			csd->flags |= CSD_TYPE_SYNC;
+			csd->node.u_flags |= CSD_TYPE_SYNC;
 		csd->func = func;
 		csd->info = info;
-		if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
+		if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu)))
 			__cpumask_set_cpu(cpu, cfd->cpumask_ipi);
 	}
 
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10645,8 +10645,7 @@ static int __init net_dev_init(void)
 		INIT_LIST_HEAD(&sd->poll_list);
 		sd->output_queue_tailp = &sd->output_queue;
 #ifdef CONFIG_RPS
-		sd->csd.func = rps_trigger_softirq;
-		sd->csd.info = sd;
+		INIT_CSD(&sd->csd, rps_trigger_softirq, sd);
 		sd->cpu = i;
 #endif
 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/6] smp: Cleanup smp_call_function*()
  2020-06-17 11:04     ` Peter Zijlstra
@ 2020-06-18  6:51       ` Christoph Hellwig
  2020-06-18 16:25         ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2020-06-18  6:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, mingo, tglx, linux-kernel, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	paulmck, frederic, tsbogend, axboe, rjw, daniel.lezcano,
	dchickles, davem, kuba, daniel.thompson, gerald.schaefer

On Wed, Jun 17, 2020 at 01:04:01PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 17, 2020 at 01:23:49AM -0700, Christoph Hellwig wrote:
> 
> > > @@ -178,9 +178,7 @@ static void zpci_handle_fallback_irq(voi
> > >  		if (atomic_inc_return(&cpu_data->scheduled) > 1)
> > >  			continue;
> > >  
> > > -		cpu_data->csd.func = zpci_handle_remote_irq;
> > > -		cpu_data->csd.info = &cpu_data->scheduled;
> > > -		cpu_data->csd.flags = 0;
> > > +		cpu_data->csd = CSD_INIT(zpci_handle_remote_irq, &cpu_data->scheduled);
> > 
> > This looks weird.  I'd much rather see an initialization ala INIT_WORK:
> > 
> > 		INIT_CSD(&cpu_data->csd, zpci_handle_remote_irq,
> > 			 &cpu_data->scheduled);
> 
> 
> like so then?

Much better.  Although if we touch all the callers we might as well
pass the csd as the argument to the callback, as with that we can
pretty trivially remove the private data field later.

Btw, it seems the callers that don't have the CSD embedded into the
containing structure seems to be of these two kinds:

 - reimplementing on_each_cpumask (mostly because they can be called
   from irq context)
 - reimplenenting smp_call_function_single because they want
   to sleep instead of busy wait

I wonder if those would be useful primitives for smp.c..

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/6] smp: Cleanup smp_call_function*()
  2020-06-18  6:51       ` Christoph Hellwig
@ 2020-06-18 16:25         ` Peter Zijlstra
  0 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-18 16:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, paulmck, frederic,
	tsbogend, axboe, rjw, daniel.lezcano, dchickles, davem, kuba,
	daniel.thompson, gerald.schaefer

On Wed, Jun 17, 2020 at 11:51:07PM -0700, Christoph Hellwig wrote:
> Much better.  Although if we touch all the callers we might as well
> pass the csd as the argument to the callback, as with that we can
> pretty trivially remove the private data field later.

My plan was to introduce a new function and type and convert
smp_call_function_async() callers over to that. The csd as it exists is
useful for the regular smp_call_function*() API.

> Btw, it seems the callers that don't have the CSD embedded into the
> containing structure seems to be of these two kinds:
> 
>  - reimplementing on_each_cpumask (mostly because they can be called
>    from irq context)

These are fairly special purpose constructs; and they come at the cost
of extra per-cpu storage and they have the limitiation that they must
wait for completion of the first before they can be used again.

>  - reimplenenting smp_call_function_single because they want
>    to sleep instead of busy wait

These are atrocious pieces of crap (the x86/msr ones), the reason it was
done is because virt :/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
  2020-06-16 17:17           ` Peter Zijlstra
  2020-06-16 17:53             ` Paul E. McKenney
@ 2020-06-19 13:44             ` Peter Zijlstra
  2020-06-19 17:20               ` Paul E. McKenney
  2020-06-20 18:46               ` Paul E. McKenney
  1 sibling, 2 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-19 13:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic,
	Will Deacon, Mathieu Desnoyers, npiggin

On Tue, Jun 16, 2020 at 07:17:21PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 16, 2020 at 07:04:10PM +0200, Peter Zijlstra wrote:
> > [19324.795303] ------------[ cut here ]------------
> > [19324.795304] WARNING: CPU: 10 PID: 76 at kernel/smp.c:138 __smp_call_single_queue+0x40/0x50
> > [19324.795305] Modules linked in:
> > [19324.795306] CPU: 10 PID: 76 Comm: ksoftirqd/10 Not tainted 5.8.0-rc1+ #8
> > [19324.795307] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
> > [19324.795307] RIP: 0010:__smp_call_single_queue+0x40/0x50
> > [19324.795308] Code: c2 40 91 02 00 4c 89 e6 4c 89 e7 48 03 14 c5 e0 56 2d b4 e8 b2 3a 2f 00 84 c0 75 04 5d 41 5c c3 89 ef 5d 41 5c e9 40 af f9 ff <0f> 0b eb cd 66 66 2e 0f 1f 84 00 00 00 00 00 90 41 54 49 89 f4 55
> > [19324.795309] RSP: 0000:ffffb3cb4030bd18 EFLAGS: 00010046
> > [19324.795310] RAX: 000000000000000a RBX: 0000000000000000 RCX: 00000000ffffffff
> > [19324.795310] RDX: 00000000000090aa RSI: ffffffffb420bc3f RDI: ffffffffb4232e3e
> > [19324.795311] RBP: 000000000000000a R08: 00001193646cd91c R09: ffff93c1df49c008
> > [19324.795312] R10: ffffb3cb4030bdf8 R11: 000000000000032e R12: ffff93c1dbed5b30
> > [19324.795312] R13: ffff93c1df4a8340 R14: 000000000000000a R15: ffff93c1df2e8340
> > [19324.795313] FS:  0000000000000000(0000) GS:ffff93c1df480000(0000) knlGS:0000000000000000
> > [19324.795313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [19324.795314] CR2: 00000000ffffffff CR3: 000000001e40a000 CR4: 00000000000006e0
> > [19324.795315] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [19324.795315] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [19324.795316] Call Trace:
> > [19324.795316]  ttwu_queue_wakelist+0xa4/0xc0
> > [19324.795316]  try_to_wake_up+0x432/0x530
> 
> This is indeed WF_ON_CPU... it had to be, but how ?!

So my latest theory is that we have a memory ordering problem. It would
fully explain the thing, but it would also render my patch #1
insufficient.

If we suppose the: task_cpu(p) load at the beginning of try_to_wake_up()
returns an old value, and this old value happens to be this_cpu. Further
assume that the p->on_cpu load accurately returns 1, it really is still
running, just not here.

Then, when we issue a local wakeup, we can crash in exactly the observed
manner because p->se.cfs_rq != rq->cfs_rq, because p's cfs_rq is from
the wrong CPU, therefore we'll iterate into the non-existant parents and
NULL deref.

The scenario is somewhat elaborate:


					X->cpu = 1
					rq(1)->curr = X


	CPU0				CPU1				CPU2

					// switch away from X
					LOCK rq(1)->lock
					smp_mb__after_spinlock
					dequeue_task(X)
					  X->on_rq = 9
					switch_to(Z)
					  X->on_cpu = 0
					UNLOCK rq(1)->lock


									// migrate X to cpu 0
									LOCK rq(1)->lock
									dequeue_task(X)
									set_task_cpu(X, 0)
									  X->cpu = 0
									UNLOCK rq(1)->lock

									LOCK rq(0)->lock
									enqueue_task(X)
									  X->on_rq = 1
									UNLOCK rq(0)->lock

	// switch to X
	LOCK rq(0)->lock
	smp_mb__after_spinlock
	switch_to(X)
	  X->on_cpu = 1
	UNLOCK rq(0)->lock

	// X goes sleep
	X->state = TASK_UNINTERRUPTIBLE
	smp_mb();			// wake X
					ttwu()
					  LOCK X->pi_lock
					  smp_mb__after_spinlock

					  if (p->state)

					  cpu = X->cpu; // =? 1

					  smp_rmb()

	// X calls schedule()
	LOCK rq(0)->lock
	smp_mb__after_spinlock
	dequeue_task(X)
	  X->on_rq = 0

					  if (p->on_rq)

					  smp_rmb();

					  if (p->on_cpu && ttwu_queue_wakelist(..)) [*]

					  smp_cond_load_acquire(&p->on_cpu, !VAL)

					  cpu = select_task_rq(X, X->wake_cpu, ...)
					  if (X->cpu != cpu)
	switch_to(Y)
	  X->on_cpu = 0
	UNLOCK rq(0)->lock


Furthermore, without the fancy new path [*] we would have hit
smp_cond_load_acquire(), and if we _really_ would have had ->on_cpu==1
and cpu==this_cpu there, that'd have been a deadlock, but no such
deadlocks have ever been observed.

Also, note how the rest of the code never actually uses the @cpu value
loaded earlier, all that is re-loaded after the load_aquire of
X->on_cpu.

I'm having trouble convincing myself that's actually possible on
x86_64 -- after all, every LOCK implies an smp_mb there, so if ttwu
observes ->state != RUNNING, it must also observe ->cpu != 1.

Most of the previous ttwu() races were found on very large PowerPC
machines which are far more 'interesting'. I suppose I should go write
me litmus tests...

Anyway, IFF any of this holds true; then I suppose a patch like the below
ought to cure things.

If not, I'm, once again, defeated by this...

---
 kernel/sched/core.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8298b2c240ce..5534eb1ab79a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2378,6 +2378,9 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)
 static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
 {
 	if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
+		if (WARN_ON(cpu == smp_processor_id()))
+			return false;
+
 		sched_clock_cpu(cpu); /* Sync clocks across CPUs */
 		__ttwu_queue_wakelist(p, cpu, wake_flags);
 		return true;
@@ -2550,7 +2553,6 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 
 	/* We're going to change ->state: */
 	success = 1;
-	cpu = task_cpu(p);
 
 	/*
 	 * Ensure we load p->on_rq _after_ p->state, otherwise it would
@@ -2615,7 +2617,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	 * let the waker make forward progress. This is safe because IRQs are
 	 * disabled and the IPI will deliver after on_cpu is cleared.
 	 */
-	if (READ_ONCE(p->on_cpu) && ttwu_queue_wakelist(p, cpu, wake_flags | WF_ON_RQ))
+	if (smp_load_acquire(&p->on_cpu) &&
+	    ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_RQ))
 		goto unlock;
 
 	/*
@@ -2635,6 +2638,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		psi_ttwu_dequeue(p);
 		set_task_cpu(p, cpu);
 	}
+#else
+	cpu = task_cpu(p);
 #endif /* CONFIG_SMP */
 
 	ttwu_queue(p, cpu, wake_flags);


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
  2020-06-19 13:44             ` Peter Zijlstra
@ 2020-06-19 17:20               ` Paul E. McKenney
  2020-06-19 17:48                 ` Paul E. McKenney
  2020-06-20 18:46               ` Paul E. McKenney
  1 sibling, 1 reply; 37+ messages in thread
From: Paul E. McKenney @ 2020-06-19 17:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic,
	Will Deacon, Mathieu Desnoyers, npiggin

On Fri, Jun 19, 2020 at 03:44:23PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 16, 2020 at 07:17:21PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 16, 2020 at 07:04:10PM +0200, Peter Zijlstra wrote:
> > > [19324.795303] ------------[ cut here ]------------
> > > [19324.795304] WARNING: CPU: 10 PID: 76 at kernel/smp.c:138 __smp_call_single_queue+0x40/0x50
> > > [19324.795305] Modules linked in:
> > > [19324.795306] CPU: 10 PID: 76 Comm: ksoftirqd/10 Not tainted 5.8.0-rc1+ #8
> > > [19324.795307] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
> > > [19324.795307] RIP: 0010:__smp_call_single_queue+0x40/0x50
> > > [19324.795308] Code: c2 40 91 02 00 4c 89 e6 4c 89 e7 48 03 14 c5 e0 56 2d b4 e8 b2 3a 2f 00 84 c0 75 04 5d 41 5c c3 89 ef 5d 41 5c e9 40 af f9 ff <0f> 0b eb cd 66 66 2e 0f 1f 84 00 00 00 00 00 90 41 54 49 89 f4 55
> > > [19324.795309] RSP: 0000:ffffb3cb4030bd18 EFLAGS: 00010046
> > > [19324.795310] RAX: 000000000000000a RBX: 0000000000000000 RCX: 00000000ffffffff
> > > [19324.795310] RDX: 00000000000090aa RSI: ffffffffb420bc3f RDI: ffffffffb4232e3e
> > > [19324.795311] RBP: 000000000000000a R08: 00001193646cd91c R09: ffff93c1df49c008
> > > [19324.795312] R10: ffffb3cb4030bdf8 R11: 000000000000032e R12: ffff93c1dbed5b30
> > > [19324.795312] R13: ffff93c1df4a8340 R14: 000000000000000a R15: ffff93c1df2e8340
> > > [19324.795313] FS:  0000000000000000(0000) GS:ffff93c1df480000(0000) knlGS:0000000000000000
> > > [19324.795313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [19324.795314] CR2: 00000000ffffffff CR3: 000000001e40a000 CR4: 00000000000006e0
> > > [19324.795315] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [19324.795315] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [19324.795316] Call Trace:
> > > [19324.795316]  ttwu_queue_wakelist+0xa4/0xc0
> > > [19324.795316]  try_to_wake_up+0x432/0x530
> > 
> > This is indeed WF_ON_CPU... it had to be, but how ?!
> 
> So my latest theory is that we have a memory ordering problem. It would
> fully explain the thing, but it would also render my patch #1
> insufficient.
> 
> If we suppose the: task_cpu(p) load at the beginning of try_to_wake_up()
> returns an old value, and this old value happens to be this_cpu. Further
> assume that the p->on_cpu load accurately returns 1, it really is still
> running, just not here.
> 
> Then, when we issue a local wakeup, we can crash in exactly the observed
> manner because p->se.cfs_rq != rq->cfs_rq, because p's cfs_rq is from
> the wrong CPU, therefore we'll iterate into the non-existant parents and
> NULL deref.
> 
> The scenario is somewhat elaborate:
> 
> 
> 					X->cpu = 1
> 					rq(1)->curr = X
> 
> 
> 	CPU0				CPU1				CPU2
> 
> 					// switch away from X
> 					LOCK rq(1)->lock
> 					smp_mb__after_spinlock
> 					dequeue_task(X)
> 					  X->on_rq = 9
> 					switch_to(Z)
> 					  X->on_cpu = 0
> 					UNLOCK rq(1)->lock
> 
> 
> 									// migrate X to cpu 0
> 									LOCK rq(1)->lock
> 									dequeue_task(X)
> 									set_task_cpu(X, 0)
> 									  X->cpu = 0
> 									UNLOCK rq(1)->lock
> 
> 									LOCK rq(0)->lock
> 									enqueue_task(X)
> 									  X->on_rq = 1
> 									UNLOCK rq(0)->lock
> 
> 	// switch to X
> 	LOCK rq(0)->lock
> 	smp_mb__after_spinlock
> 	switch_to(X)
> 	  X->on_cpu = 1
> 	UNLOCK rq(0)->lock
> 
> 	// X goes sleep
> 	X->state = TASK_UNINTERRUPTIBLE
> 	smp_mb();			// wake X
> 					ttwu()
> 					  LOCK X->pi_lock
> 					  smp_mb__after_spinlock
> 
> 					  if (p->state)
> 
> 					  cpu = X->cpu; // =? 1
> 
> 					  smp_rmb()
> 
> 	// X calls schedule()
> 	LOCK rq(0)->lock
> 	smp_mb__after_spinlock
> 	dequeue_task(X)
> 	  X->on_rq = 0
> 
> 					  if (p->on_rq)
> 
> 					  smp_rmb();
> 
> 					  if (p->on_cpu && ttwu_queue_wakelist(..)) [*]
> 
> 					  smp_cond_load_acquire(&p->on_cpu, !VAL)
> 
> 					  cpu = select_task_rq(X, X->wake_cpu, ...)
> 					  if (X->cpu != cpu)
> 	switch_to(Y)
> 	  X->on_cpu = 0
> 	UNLOCK rq(0)->lock
> 
> 
> Furthermore, without the fancy new path [*] we would have hit
> smp_cond_load_acquire(), and if we _really_ would have had ->on_cpu==1
> and cpu==this_cpu there, that'd have been a deadlock, but no such
> deadlocks have ever been observed.
> 
> Also, note how the rest of the code never actually uses the @cpu value
> loaded earlier, all that is re-loaded after the load_aquire of
> X->on_cpu.
> 
> I'm having trouble convincing myself that's actually possible on
> x86_64 -- after all, every LOCK implies an smp_mb there, so if ttwu
> observes ->state != RUNNING, it must also observe ->cpu != 1.
> 
> Most of the previous ttwu() races were found on very large PowerPC
> machines which are far more 'interesting'. I suppose I should go write
> me litmus tests...

You should be able to get access to a modest PowerPC system here:

https://osuosl.org/services/powerdev/request_hosting/

> Anyway, IFF any of this holds true; then I suppose a patch like the below
> ought to cure things.

I have started an initial 140-hour test which should complete by about
4PM PDT.

Last night's all-scenarios run on current -rcu (without any scheduler
patch) saw two failures in three 10-hour runs of TREE03.  And I might as
well use the failure times to get tighter bounds on the probability of
occurrence, though with only two failures it is pretty dicey -- getting
decent bounds would require something like 30 failures.  But we must
work with what we have.

The failures occurred at 18680 and 20018 seconds, respectively, and
the failure-free run was 36000 seconds long.  Using maxima to help:

	load(distrib);
	bfloat(18680/3600+20018/3600+10);
		2.074944444444444b1
	1-bfloat(cdf_poisson(0,140/(2.074944444444444b1/2)));
		9.999986212554204b-1

So a 140-hour failure-free run gives a 99.9999% confidence of some degree
of improvement.  But "some degree of improvement" might be a very small
improvement.  So let's look at an order of magnitude:

	1-bfloat(cdf_poisson(0,0.1*140/(2.074944444444444b1/2)));
		7.406128951482777b-1

So a 140-hour failure-free run gives a 74% confidence that the
patch gave an order-of-magnitude improvement.  So if that passes, an
additional 420-hour failure-free run would get to 99.5% confidence of
an order-of-magnitude improvement.

Statistics is rather unforgiving.  :-/

> If not, I'm, once again, defeated by this...

Here is hoping that this patch cures things!

							Thanx, Paul

> ---
>  kernel/sched/core.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 8298b2c240ce..5534eb1ab79a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2378,6 +2378,9 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)
>  static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
>  {
>  	if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
> +		if (WARN_ON(cpu == smp_processor_id()))
> +			return false;
> +
>  		sched_clock_cpu(cpu); /* Sync clocks across CPUs */
>  		__ttwu_queue_wakelist(p, cpu, wake_flags);
>  		return true;
> @@ -2550,7 +2553,6 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  
>  	/* We're going to change ->state: */
>  	success = 1;
> -	cpu = task_cpu(p);
>  
>  	/*
>  	 * Ensure we load p->on_rq _after_ p->state, otherwise it would
> @@ -2615,7 +2617,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  	 * let the waker make forward progress. This is safe because IRQs are
>  	 * disabled and the IPI will deliver after on_cpu is cleared.
>  	 */
> -	if (READ_ONCE(p->on_cpu) && ttwu_queue_wakelist(p, cpu, wake_flags | WF_ON_RQ))
> +	if (smp_load_acquire(&p->on_cpu) &&
> +	    ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_RQ))
>  		goto unlock;
>  
>  	/*
> @@ -2635,6 +2638,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  		psi_ttwu_dequeue(p);
>  		set_task_cpu(p, cpu);
>  	}
> +#else
> +	cpu = task_cpu(p);
>  #endif /* CONFIG_SMP */
>  
>  	ttwu_queue(p, cpu, wake_flags);
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
  2020-06-19 17:20               ` Paul E. McKenney
@ 2020-06-19 17:48                 ` Paul E. McKenney
  2020-06-19 18:11                   ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Paul E. McKenney @ 2020-06-19 17:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic,
	Will Deacon, Mathieu Desnoyers, npiggin

On Fri, Jun 19, 2020 at 10:20:47AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 19, 2020 at 03:44:23PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 16, 2020 at 07:17:21PM +0200, Peter Zijlstra wrote:

[ . . . ]

> > If not, I'm, once again, defeated by this...
> 
> Here is hoping that this patch cures things!
> 
> 							Thanx, Paul
> 
> > ---
> >  kernel/sched/core.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 8298b2c240ce..5534eb1ab79a 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2378,6 +2378,9 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)
> >  static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
> >  {
> >  	if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
> > +		if (WARN_ON(cpu == smp_processor_id()))
> > +			return false;
> > +
> >  		sched_clock_cpu(cpu); /* Sync clocks across CPUs */
> >  		__ttwu_queue_wakelist(p, cpu, wake_flags);
> >  		return true;
> > @@ -2550,7 +2553,6 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> >  
> >  	/* We're going to change ->state: */
> >  	success = 1;
> > -	cpu = task_cpu(p);
> >  
> >  	/*
> >  	 * Ensure we load p->on_rq _after_ p->state, otherwise it would
> > @@ -2615,7 +2617,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> >  	 * let the waker make forward progress. This is safe because IRQs are
> >  	 * disabled and the IPI will deliver after on_cpu is cleared.
> >  	 */
> > -	if (READ_ONCE(p->on_cpu) && ttwu_queue_wakelist(p, cpu, wake_flags | WF_ON_RQ))
> > +	if (smp_load_acquire(&p->on_cpu) &&

Given the x86 memory model, this only protects against the compiler
reordering accesses in ttwu_queue_wakelist() against the fetch of
p->on_cpu, correct?

Don't get me wrong, I do see some potential compiler misorderings,
including with cpu_rq(cpu)->nr_running.  Just curious.

						Thanx, Paul

> > +	    ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_RQ))
> >  		goto unlock;
> >  
> >  	/*
> > @@ -2635,6 +2638,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> >  		psi_ttwu_dequeue(p);
> >  		set_task_cpu(p, cpu);
> >  	}
> > +#else
> > +	cpu = task_cpu(p);
> >  #endif /* CONFIG_SMP */
> >  
> >  	ttwu_queue(p, cpu, wake_flags);
> > 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
  2020-06-19 17:48                 ` Paul E. McKenney
@ 2020-06-19 18:11                   ` Peter Zijlstra
  2020-06-19 18:46                     ` Paul E. McKenney
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-19 18:11 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic,
	Will Deacon, Mathieu Desnoyers, npiggin

On Fri, Jun 19, 2020 at 10:48:02AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 19, 2020 at 10:20:47AM -0700, Paul E. McKenney wrote:

> > > @@ -2615,7 +2617,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> > >  	 * let the waker make forward progress. This is safe because IRQs are
> > >  	 * disabled and the IPI will deliver after on_cpu is cleared.
> > >  	 */
> > > -	if (READ_ONCE(p->on_cpu) && ttwu_queue_wakelist(p, cpu, wake_flags | WF_ON_RQ))
> > > +	if (smp_load_acquire(&p->on_cpu) &&
> 
> Given the x86 memory model, this only protects against the compiler
> reordering accesses in ttwu_queue_wakelist() against the fetch of
> p->on_cpu, correct?

Yes.

> Don't get me wrong, I do see some potential compiler misorderings,
> including with cpu_rq(cpu)->nr_running.  Just curious.

Given this is arch independent code, I'd better write generic code, and
there I really think this wants to be acquire. I'll also try and write a
comment for next time.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
  2020-06-19 18:11                   ` Peter Zijlstra
@ 2020-06-19 18:46                     ` Paul E. McKenney
  0 siblings, 0 replies; 37+ messages in thread
From: Paul E. McKenney @ 2020-06-19 18:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic,
	Will Deacon, Mathieu Desnoyers, npiggin

On Fri, Jun 19, 2020 at 08:11:28PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 19, 2020 at 10:48:02AM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 19, 2020 at 10:20:47AM -0700, Paul E. McKenney wrote:
> 
> > > > @@ -2615,7 +2617,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> > > >  	 * let the waker make forward progress. This is safe because IRQs are
> > > >  	 * disabled and the IPI will deliver after on_cpu is cleared.
> > > >  	 */
> > > > -	if (READ_ONCE(p->on_cpu) && ttwu_queue_wakelist(p, cpu, wake_flags | WF_ON_RQ))
> > > > +	if (smp_load_acquire(&p->on_cpu) &&
> > 
> > Given the x86 memory model, this only protects against the compiler
> > reordering accesses in ttwu_queue_wakelist() against the fetch of
> > p->on_cpu, correct?
> 
> Yes.
> 
> > Don't get me wrong, I do see some potential compiler misorderings,
> > including with cpu_rq(cpu)->nr_running.  Just curious.
> 
> Given this is arch independent code, I'd better write generic code, and
> there I really think this wants to be acquire. I'll also try and write a
> comment for next time.

I completely understand and agree.  Just trying to work out why my
systems hit this more than an order of magnitude more often than do
yours.  Compiler version might be important?  As noted on IRC, I am
using gcc 8.2.1.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/6] sched: TTWU, IPI, and assorted stuff
  2020-06-19 13:44             ` Peter Zijlstra
  2020-06-19 17:20               ` Paul E. McKenney
@ 2020-06-20 18:46               ` Paul E. McKenney
  1 sibling, 0 replies; 37+ messages in thread
From: Paul E. McKenney @ 2020-06-20 18:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, frederic,
	Will Deacon, Mathieu Desnoyers, npiggin

On Fri, Jun 19, 2020 at 03:44:23PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 16, 2020 at 07:17:21PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 16, 2020 at 07:04:10PM +0200, Peter Zijlstra wrote:
> > > [19324.795303] ------------[ cut here ]------------
> > > [19324.795304] WARNING: CPU: 10 PID: 76 at kernel/smp.c:138 __smp_call_single_queue+0x40/0x50
> > > [19324.795305] Modules linked in:
> > > [19324.795306] CPU: 10 PID: 76 Comm: ksoftirqd/10 Not tainted 5.8.0-rc1+ #8
> > > [19324.795307] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
> > > [19324.795307] RIP: 0010:__smp_call_single_queue+0x40/0x50
> > > [19324.795308] Code: c2 40 91 02 00 4c 89 e6 4c 89 e7 48 03 14 c5 e0 56 2d b4 e8 b2 3a 2f 00 84 c0 75 04 5d 41 5c c3 89 ef 5d 41 5c e9 40 af f9 ff <0f> 0b eb cd 66 66 2e 0f 1f 84 00 00 00 00 00 90 41 54 49 89 f4 55
> > > [19324.795309] RSP: 0000:ffffb3cb4030bd18 EFLAGS: 00010046
> > > [19324.795310] RAX: 000000000000000a RBX: 0000000000000000 RCX: 00000000ffffffff
> > > [19324.795310] RDX: 00000000000090aa RSI: ffffffffb420bc3f RDI: ffffffffb4232e3e
> > > [19324.795311] RBP: 000000000000000a R08: 00001193646cd91c R09: ffff93c1df49c008
> > > [19324.795312] R10: ffffb3cb4030bdf8 R11: 000000000000032e R12: ffff93c1dbed5b30
> > > [19324.795312] R13: ffff93c1df4a8340 R14: 000000000000000a R15: ffff93c1df2e8340
> > > [19324.795313] FS:  0000000000000000(0000) GS:ffff93c1df480000(0000) knlGS:0000000000000000
> > > [19324.795313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [19324.795314] CR2: 00000000ffffffff CR3: 000000001e40a000 CR4: 00000000000006e0
> > > [19324.795315] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [19324.795315] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [19324.795316] Call Trace:
> > > [19324.795316]  ttwu_queue_wakelist+0xa4/0xc0
> > > [19324.795316]  try_to_wake_up+0x432/0x530
> > 
> > This is indeed WF_ON_CPU... it had to be, but how ?!
> 
> So my latest theory is that we have a memory ordering problem. It would
> fully explain the thing, but it would also render my patch #1
> insufficient.
> 
> If we suppose the: task_cpu(p) load at the beginning of try_to_wake_up()
> returns an old value, and this old value happens to be this_cpu. Further
> assume that the p->on_cpu load accurately returns 1, it really is still
> running, just not here.
> 
> Then, when we issue a local wakeup, we can crash in exactly the observed
> manner because p->se.cfs_rq != rq->cfs_rq, because p's cfs_rq is from
> the wrong CPU, therefore we'll iterate into the non-existant parents and
> NULL deref.
> 
> The scenario is somewhat elaborate:
> 
> 
> 					X->cpu = 1
> 					rq(1)->curr = X
> 
> 
> 	CPU0				CPU1				CPU2
> 
> 					// switch away from X
> 					LOCK rq(1)->lock
> 					smp_mb__after_spinlock
> 					dequeue_task(X)
> 					  X->on_rq = 9
> 					switch_to(Z)
> 					  X->on_cpu = 0
> 					UNLOCK rq(1)->lock
> 
> 
> 									// migrate X to cpu 0
> 									LOCK rq(1)->lock
> 									dequeue_task(X)
> 									set_task_cpu(X, 0)
> 									  X->cpu = 0
> 									UNLOCK rq(1)->lock
> 
> 									LOCK rq(0)->lock
> 									enqueue_task(X)
> 									  X->on_rq = 1
> 									UNLOCK rq(0)->lock
> 
> 	// switch to X
> 	LOCK rq(0)->lock
> 	smp_mb__after_spinlock
> 	switch_to(X)
> 	  X->on_cpu = 1
> 	UNLOCK rq(0)->lock
> 
> 	// X goes sleep
> 	X->state = TASK_UNINTERRUPTIBLE
> 	smp_mb();			// wake X
> 					ttwu()
> 					  LOCK X->pi_lock
> 					  smp_mb__after_spinlock
> 
> 					  if (p->state)
> 
> 					  cpu = X->cpu; // =? 1
> 
> 					  smp_rmb()
> 
> 	// X calls schedule()
> 	LOCK rq(0)->lock
> 	smp_mb__after_spinlock
> 	dequeue_task(X)
> 	  X->on_rq = 0
> 
> 					  if (p->on_rq)
> 
> 					  smp_rmb();
> 
> 					  if (p->on_cpu && ttwu_queue_wakelist(..)) [*]
> 
> 					  smp_cond_load_acquire(&p->on_cpu, !VAL)
> 
> 					  cpu = select_task_rq(X, X->wake_cpu, ...)
> 					  if (X->cpu != cpu)
> 	switch_to(Y)
> 	  X->on_cpu = 0
> 	UNLOCK rq(0)->lock
> 
> 
> Furthermore, without the fancy new path [*] we would have hit
> smp_cond_load_acquire(), and if we _really_ would have had ->on_cpu==1
> and cpu==this_cpu there, that'd have been a deadlock, but no such
> deadlocks have ever been observed.
> 
> Also, note how the rest of the code never actually uses the @cpu value
> loaded earlier, all that is re-loaded after the load_aquire of
> X->on_cpu.
> 
> I'm having trouble convincing myself that's actually possible on
> x86_64 -- after all, every LOCK implies an smp_mb there, so if ttwu
> observes ->state != RUNNING, it must also observe ->cpu != 1.
> 
> Most of the previous ttwu() races were found on very large PowerPC
> machines which are far more 'interesting'. I suppose I should go write
> me litmus tests...
> 
> Anyway, IFF any of this holds true; then I suppose a patch like the below
> ought to cure things.
> 
> If not, I'm, once again, defeated by this...

700 hours of TREE03 with no drama whatsoever, so it does appear that
defeat has finally been deterred!  ;-)

Tested-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  kernel/sched/core.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 8298b2c240ce..5534eb1ab79a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2378,6 +2378,9 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags)
>  static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
>  {
>  	if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(cpu, wake_flags)) {
> +		if (WARN_ON(cpu == smp_processor_id()))
> +			return false;
> +
>  		sched_clock_cpu(cpu); /* Sync clocks across CPUs */
>  		__ttwu_queue_wakelist(p, cpu, wake_flags);
>  		return true;
> @@ -2550,7 +2553,6 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  
>  	/* We're going to change ->state: */
>  	success = 1;
> -	cpu = task_cpu(p);
>  
>  	/*
>  	 * Ensure we load p->on_rq _after_ p->state, otherwise it would
> @@ -2615,7 +2617,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  	 * let the waker make forward progress. This is safe because IRQs are
>  	 * disabled and the IPI will deliver after on_cpu is cleared.
>  	 */
> -	if (READ_ONCE(p->on_cpu) && ttwu_queue_wakelist(p, cpu, wake_flags | WF_ON_RQ))
> +	if (smp_load_acquire(&p->on_cpu) &&
> +	    ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_RQ))
>  		goto unlock;
>  
>  	/*
> @@ -2635,6 +2638,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  		psi_ttwu_dequeue(p);
>  		set_task_cpu(p, cpu);
>  	}
> +#else
> +	cpu = task_cpu(p);
>  #endif /* CONFIG_SMP */
>  
>  	ttwu_queue(p, cpu, wake_flags);
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/6] sched: Fix ttwu_queue_cond()
  2020-06-15 12:56 ` [PATCH 1/6] sched: Fix ttwu_queue_cond() Peter Zijlstra
  2020-06-15 13:34   ` Peter Zijlstra
@ 2020-06-22  9:11   ` Mel Gorman
  2020-06-22  9:41     ` Peter Zijlstra
  1 sibling, 1 reply; 37+ messages in thread
From: Mel Gorman @ 2020-06-22  9:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, paulmck, frederic

On Mon, Jun 15, 2020 at 02:56:55PM +0200, Peter Zijlstra wrote:
> Where the condition:
> 
>   !cpus_share_cache(smp_processor_id(), cpu)
> 
> already implies 'cpu != smp_processor_id()', because a CPU always
> shares cache with itself, the secondary condition added in commit:
> 
>   2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
> 
> voids that implication, resulting in attempting to do local wake-ups
> through the queue mechanism.
> 
> Fixes: 2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
> Reported-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Tested-by: Paul E. McKenney <paulmck@kernel.org>

Yep, I mistakenly though this would be covered by the self-wakeup check
early in try_to_wake_up() but it is not

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/6] sched: s/WF_ON_RQ/WQ_ON_CPU/
  2020-06-15 12:56 ` [PATCH 3/6] sched: s/WF_ON_RQ/WQ_ON_CPU/ Peter Zijlstra
@ 2020-06-22  9:13   ` Mel Gorman
  0 siblings, 0 replies; 37+ messages in thread
From: Mel Gorman @ 2020-06-22  9:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, paulmck, frederic

On Mon, Jun 15, 2020 at 02:56:57PM +0200, Peter Zijlstra wrote:
> Avoids confusion...
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Mel Gorman <mgorman@susee.de>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/6] sched: Fix ttwu_queue_cond()
  2020-06-22  9:11   ` Mel Gorman
@ 2020-06-22  9:41     ` Peter Zijlstra
  0 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-06-22  9:41 UTC (permalink / raw)
  To: Mel Gorman
  Cc: mingo, tglx, linux-kernel, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, paulmck, frederic

On Mon, Jun 22, 2020 at 10:11:22AM +0100, Mel Gorman wrote:
> On Mon, Jun 15, 2020 at 02:56:55PM +0200, Peter Zijlstra wrote:
> > Where the condition:
> > 
> >   !cpus_share_cache(smp_processor_id(), cpu)
> > 
> > already implies 'cpu != smp_processor_id()', because a CPU always
> > shares cache with itself, the secondary condition added in commit:
> > 
> >   2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
> > 
> > voids that implication, resulting in attempting to do local wake-ups
> > through the queue mechanism.
> > 
> > Fixes: 2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")
> > Reported-by: Paul E. McKenney <paulmck@kernel.org>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Tested-by: Paul E. McKenney <paulmck@kernel.org>
> 
> Yep, I mistakenly though this would be covered by the self-wakeup check
> early in try_to_wake_up() but it is not

It is, one should not be able to observe 'p->on_cpu && task_cpu(cpu) ==
smp_processor_id()); I've since found the actual problem, find here:

  https://lkml.kernel.org/r/20200620184622.GA19696@paulmck-ThinkPad-P72

I'm currently polishing the changelog a little, and will shortly post a
new version of that.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2020-06-22  9:42 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-15 12:56 [PATCH 0/6] sched: TTWU, IPI, and assorted stuff Peter Zijlstra
2020-06-15 12:56 ` [PATCH 1/6] sched: Fix ttwu_queue_cond() Peter Zijlstra
2020-06-15 13:34   ` Peter Zijlstra
2020-06-15 16:45     ` Paul E. McKenney
2020-06-15 22:58       ` Paul E. McKenney
2020-06-22  9:11   ` Mel Gorman
2020-06-22  9:41     ` Peter Zijlstra
2020-06-15 12:56 ` [PATCH 2/6] sched: Verify some SMP assumptions Peter Zijlstra
2020-06-15 12:56 ` [PATCH 3/6] sched: s/WF_ON_RQ/WQ_ON_CPU/ Peter Zijlstra
2020-06-22  9:13   ` Mel Gorman
2020-06-15 12:56 ` [PATCH 4/6] smp, irq_work: Continue smp_call_function*() and irq_work*() integration Peter Zijlstra
2020-06-15 12:56 ` [PATCH 5/6] irq_work: Cleanup Peter Zijlstra
2020-06-16 15:16   ` Petr Mladek
2020-06-15 12:57 ` [PATCH 6/6] smp: Cleanup smp_call_function*() Peter Zijlstra
2020-06-15 14:34   ` Jens Axboe
2020-06-15 16:04   ` Daniel Thompson
2020-06-17  8:23   ` Christoph Hellwig
2020-06-17  9:00     ` Peter Zijlstra
2020-06-17 11:04     ` Peter Zijlstra
2020-06-18  6:51       ` Christoph Hellwig
2020-06-18 16:25         ` Peter Zijlstra
2020-06-15 16:23 ` [PATCH 0/6] sched: TTWU, IPI, and assorted stuff Paul E. McKenney
2020-06-15 16:40   ` Peter Zijlstra
2020-06-15 17:21     ` Paul E. McKenney
2020-06-15 19:11       ` Peter Zijlstra
2020-06-15 19:55         ` Paul E. McKenney
2020-06-16 16:31           ` Paul E. McKenney
2020-06-16 17:04         ` Peter Zijlstra
2020-06-16 17:17           ` Peter Zijlstra
2020-06-16 17:53             ` Paul E. McKenney
2020-06-19 13:44             ` Peter Zijlstra
2020-06-19 17:20               ` Paul E. McKenney
2020-06-19 17:48                 ` Paul E. McKenney
2020-06-19 18:11                   ` Peter Zijlstra
2020-06-19 18:46                     ` Paul E. McKenney
2020-06-20 18:46               ` Paul E. McKenney
2020-06-16 17:51           ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).