All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch 0/3] tick: Annotate and document the intentionaly racy tick_do_timer_cpu
@ 2020-12-06 21:12 Thomas Gleixner
  2020-12-06 21:12 ` [patch 1/3] tick: Remove pointless cpu valid check in hotplug code Thomas Gleixner
                   ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Thomas Gleixner @ 2020-12-06 21:12 UTC (permalink / raw)
  To: LKML
  Cc: Marco Elver, kasan-dev, Peter Zijlstra, Paul E. McKenney,
	Ingo Molnar, Frederic Weisbecker, Will Deacon, Naresh Kamboju

There have been several reports about KCSAN complaints vs. the racy access
to tick_do_timer_cpu. The syzbot moderation queue has three different
patterns all related to this. There are a few more...

As I know that this is intentional and safe, I did not pay much attention
to it, but Marco actually made me feel bad a few days ago as he explained
that these intentional races generate too much noise to get to the
dangerous ones.

There was an earlier attempt to just silence KCSAN by slapping READ/WRITE
once all over the place without even the faintiest attempt of reasoning,
which is definitely the wrong thing to do.

The bad thing about tick_do_timer_cpu is that its only barely documented
why it is safe and works at all, which makes it extremly hard for someone
not really familiar with the code to come up with reasoning.

So Marco made me fast forward that item in my todo list and I have to admit
that it would have been damned helpful if that Gleixner dude would have
added proper comments in the first place. Would have spared a lot of brain
twisting. :)

Staring at all usage sites unearthed a few silly things which are cleaned
up upfront. The actual annotation uses data_race() with proper comments as
READ/WRITE_ONCE() does not really buy anything under the assumption that
the compiler does not play silly buggers and tears the 32bit stores/loads
into byte wise ones. But even that would cause just potentially shorter
idle sleeps in the worst case and not a complete malfunction.

Thanks,

	tglx
----
 tick-common.c |   55 +++++++++++++++++++++++++++------
 tick-sched.c  |   96 ++++++++++++++++++++++++++++++++++++++++++----------------
 2 files changed, 117 insertions(+), 34 deletions(-)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch 1/3] tick: Remove pointless cpu valid check in hotplug code
  2020-12-06 21:12 [patch 0/3] tick: Annotate and document the intentionaly racy tick_do_timer_cpu Thomas Gleixner
@ 2020-12-06 21:12 ` Thomas Gleixner
  2020-12-07 11:59   ` Peter Zijlstra
                     ` (3 more replies)
  2020-12-06 21:12 ` [patch 2/3] tick/sched: Remove bogus boot "safety" check Thomas Gleixner
                   ` (2 subsequent siblings)
  3 siblings, 4 replies; 32+ messages in thread
From: Thomas Gleixner @ 2020-12-06 21:12 UTC (permalink / raw)
  To: LKML
  Cc: Marco Elver, kasan-dev, Peter Zijlstra, Paul E. McKenney,
	Ingo Molnar, Frederic Weisbecker, Will Deacon, Naresh Kamboju

tick_handover_do_timer() which is invoked when a CPU is unplugged has a
check for cpumask_first(cpu_online_mask) when it tries to hand over the
tick update duty.

Checking the result of cpumask_first() there is pointless because if the
online mask is empty at this point, then this would be the last CPU in the
system going offline, which is impossible. There is always at least one CPU
remaining. If online mask would be really empty then the timer duty would
be the least of the resulting problems.

Remove the well meant check simply because it is pointless and confusing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-common.c |   10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -407,17 +407,13 @@ EXPORT_SYMBOL_GPL(tick_broadcast_oneshot
 /*
  * Transfer the do_timer job away from a dying cpu.
  *
- * Called with interrupts disabled. Not locking required. If
+ * Called with interrupts disabled. No locking required. If
  * tick_do_timer_cpu is owned by this cpu, nothing can change it.
  */
 void tick_handover_do_timer(void)
 {
-	if (tick_do_timer_cpu == smp_processor_id()) {
-		int cpu = cpumask_first(cpu_online_mask);
-
-		tick_do_timer_cpu = (cpu < nr_cpu_ids) ? cpu :
-			TICK_DO_TIMER_NONE;
-	}
+	if (tick_do_timer_cpu == smp_processor_id())
+		tick_do_timer_cpu = cpumask_first(cpu_online_mask);
 }
 
 /*


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch 2/3] tick/sched: Remove bogus boot "safety" check
  2020-12-06 21:12 [patch 0/3] tick: Annotate and document the intentionaly racy tick_do_timer_cpu Thomas Gleixner
  2020-12-06 21:12 ` [patch 1/3] tick: Remove pointless cpu valid check in hotplug code Thomas Gleixner
@ 2020-12-06 21:12 ` Thomas Gleixner
  2020-12-11 22:41   ` Frederic Weisbecker
  2020-12-16 10:50   ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner
  2020-12-06 21:12 ` [patch 3/3] tick: Annotate tick_do_timer_cpu data races Thomas Gleixner
  2020-12-07 11:05 ` [patch 0/3] tick: Annotate and document the intentionaly racy tick_do_timer_cpu Marco Elver
  3 siblings, 2 replies; 32+ messages in thread
From: Thomas Gleixner @ 2020-12-06 21:12 UTC (permalink / raw)
  To: LKML
  Cc: Marco Elver, kasan-dev, Peter Zijlstra, Paul E. McKenney,
	Ingo Molnar, Frederic Weisbecker, Will Deacon, Naresh Kamboju

can_stop_idle_tick() checks whether the do_timer() duty has been taken over
by a CPU on boot. That's silly because the boot CPU always takes over with
the initial clockevent device.

But even if no CPU would have installed a clockevent and taken over the
duty then the question whether the tick on the current CPU can be stopped
or not is moot. In that case the current CPU would have no clockevent
either, so there would be nothing to keep ticking.

Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-sched.c |    7 -------
 1 file changed, 7 deletions(-)
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -941,13 +941,6 @@ static bool can_stop_idle_tick(int cpu,
 		 */
 		if (tick_do_timer_cpu == cpu)
 			return false;
-		/*
-		 * Boot safety: make sure the timekeeping duty has been
-		 * assigned before entering dyntick-idle mode,
-		 * tick_do_timer_cpu is TICK_DO_TIMER_BOOT
-		 */
-		if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_BOOT))
-			return false;
 
 		/* Should not happen for nohz-full */
 		if (WARN_ON_ONCE(tick_do_timer_cpu == TICK_DO_TIMER_NONE))


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-06 21:12 [patch 0/3] tick: Annotate and document the intentionaly racy tick_do_timer_cpu Thomas Gleixner
  2020-12-06 21:12 ` [patch 1/3] tick: Remove pointless cpu valid check in hotplug code Thomas Gleixner
  2020-12-06 21:12 ` [patch 2/3] tick/sched: Remove bogus boot "safety" check Thomas Gleixner
@ 2020-12-06 21:12 ` Thomas Gleixner
  2020-12-07 12:09   ` Peter Zijlstra
  2020-12-07 11:05 ` [patch 0/3] tick: Annotate and document the intentionaly racy tick_do_timer_cpu Marco Elver
  3 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2020-12-06 21:12 UTC (permalink / raw)
  To: LKML
  Cc: Marco Elver, kasan-dev, Peter Zijlstra, Paul E. McKenney,
	Ingo Molnar, Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

tick_do_timer_cpu is intentionally racy as it would require a global lock
to protect which is a non-starter.

The variable is used to prevent thundering herd problems of many CPUs
trying to update jiffies and timekeeping on the tick. It assigns the duty
of doing so to one CPU. On NOHZ=n systems this stays on the boot CPU
forever and is never updated except when the boot CPU is unplugged. On NOHZ
enabled systems this is quite different because a CPU can give up the
assignment to go for a long idle sleep. The duty is then picked up by
another online CPU on the next tick or interrupt from idle. This handover
is carefully designed so that even competing writes or temporary
assumptions cannot cause havoc.

The mechanism is unfortunately barely documented and not annotated which
triggers KCSAN reports all over the place.

Annotate the racy reads with data_race() and document the various places
which are affected.

Reported-by: syzbot+23a256029191772c2f02@syzkaller.appspotmail.com
Reported-by: syzbot+56078ac0b9071335a745@syzkaller.appspotmail.com
Reported-by: syzbot+867130cb240c41f15164@syzkaller.appspotmail.com
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-common.c |   45 ++++++++++++++++++++++-
 kernel/time/tick-sched.c  |   89 ++++++++++++++++++++++++++++++++++++----------
 2 files changed, 114 insertions(+), 20 deletions(-)

--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -45,6 +45,27 @@ ktime_t tick_period;
  *    TICK_DO_TIMER_NONE, i.e. a non existing CPU. So the next cpu which looks
  *    at it will take over and keep the time keeping alive.  The handover
  *    procedure also covers cpu hotplug.
+ *
+ * The variable is neither atomic nor accessed with locks. The handover and
+ * checks are intentionaly racy but safe. The basic rules for the update are:
+ *
+ * All reads and writes happen with interrupts disabled. The execution
+ * context is either task or hard interrupt.
+ *
+ * If tick_do_timer_cpu contains a CPU number then the only possible
+ * transition is that the holding CPU writes it to TICK_DO_TIMER_NONE. This
+ * only happens on NOHZ systems. If NOHZ is off then the duty stays with
+ * the CPU which picked it up, the boot CPU. The only exception is CPU hot
+ * unplug where the outgoing CPU transfers it, but that's safe because all
+ * other CPUs are stuck in stomp_machine().
+ *
+ * If tick_do_timer_cpu contains TICK_DO_TIMER_NONE then any CPU observing
+ * this can overwrite it with it's own CPU number and take on the tick
+ * duty. As this is lockless several CPUs can observe TICK_DO_TIMER_NONE
+ * concurrently and write their own CPU number to it, but at the end only
+ * one will win. Even if one of the writers assumes temporarily that it
+ * owns the duty there is no harm because the tick update is serialized
+ * with jiffies_lock. Other side effects are shorter sleeps for one round.
  */
 int tick_do_timer_cpu __read_mostly = TICK_DO_TIMER_BOOT;
 #ifdef CONFIG_NO_HZ_FULL
@@ -83,6 +104,13 @@ int tick_is_oneshot_available(void)
  */
 static void tick_periodic(int cpu)
 {
+	/*
+	 * Raceless access in periodic tick mode. The variable can only
+	 * change when the CPU holding tick_do_timer_cpu goes offline which
+	 * requires that all other CPUs are inside stomp_machine() with
+	 * interrupts disabled and cannot do this read here which is always
+	 * executed from the timer interrupt.
+	 */
 	if (tick_do_timer_cpu == cpu) {
 		raw_spin_lock(&jiffies_lock);
 		write_seqcount_begin(&jiffies_seq);
@@ -185,6 +213,11 @@ static void giveup_do_timer(void *info)
 
 	WARN_ON(tick_do_timer_cpu != smp_processor_id());
 
+	/*
+	 * Exception to the rule that the holding CPU only can
+	 * write TICK_DO_TIMER_NONE. The new CPU is waiting for
+	 * this function call to complete and asked for this.
+	 */
 	tick_do_timer_cpu = cpu;
 }
 
@@ -214,9 +247,14 @@ static void tick_setup_device(struct tic
 	if (!td->evtdev) {
 		/*
 		 * If no cpu took the do_timer update, assign it to
-		 * this cpu:
+		 * this cpu.
+		 *
+		 * Intentional data race. The boot CPU takes it over which
+		 * is obviously not racy. CPUs coming up later cannot
+		 * observe TICK_DO_TIMER_BOOT even if there is a concurrent
+		 * hand over.
 		 */
-		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
+		if (data_race(tick_do_timer_cpu) == TICK_DO_TIMER_BOOT) {
 			tick_do_timer_cpu = cpu;
 
 			tick_next_period = ktime_get();
@@ -409,6 +447,9 @@ EXPORT_SYMBOL_GPL(tick_broadcast_oneshot
  *
  * Called with interrupts disabled. No locking required. If
  * tick_do_timer_cpu is owned by this cpu, nothing can change it.
+ *
+ * Not a data race because all other CPUs are hanging out in
+ * stomp_machine() and cannot change that assignment.
  */
 void tick_handover_do_timer(void)
 {
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -125,25 +125,40 @@ static void tick_sched_do_timer(struct t
 
 #ifdef CONFIG_NO_HZ_COMMON
 	/*
-	 * Check if the do_timer duty was dropped. We don't care about
-	 * concurrency: This happens only when the CPU in charge went
-	 * into a long sleep. If two CPUs happen to assign themselves to
-	 * this duty, then the jiffies update is still serialized by
-	 * jiffies_lock.
-	 *
-	 * If nohz_full is enabled, this should not happen because the
-	 * tick_do_timer_cpu never relinquishes.
+	 * Check if the do_timer duty was dropped. This is an intentional
+	 * data race and we don't care about concurrency: This happens only
+	 * when the CPU in charge went into a long sleep. If two CPUs
+	 * happen to assign themselves to this duty it does not matter
+	 * which one ends up holding it. Both will try to update jiffies
+	 * but the jiffies update is still serialized by jiffies_lock.
 	 */
-	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) {
+	if (unlikely(data_race(tick_do_timer_cpu) == TICK_DO_TIMER_NONE)) {
 #ifdef CONFIG_NO_HZ_FULL
+		/*
+		 *
+		 * If nohz_full is enabled, this should not happen because
+		 * the tick_do_timer_cpu never relinquishes. Warn and try
+		 * to keep it alive.
+		 */
 		WARN_ON(tick_nohz_full_running);
 #endif
 		tick_do_timer_cpu = cpu;
 	}
 #endif
 
-	/* Check, if the jiffies need an update */
-	if (tick_do_timer_cpu == cpu)
+	/*
+	 * Check if jiffies need an update. Intentional data race on NOHZ:
+	 *
+	 * - There is some other CPU holding it
+	 * - This CPU took over above but raced with another CPU. So both
+	 *   invoke tick_do_update_jiffies64() and contend on jiffies lock
+	 *   eventually.
+	 * - The other CPU which holds it is about to give it up which does
+	 *   not cause harm because the current or some other CPU will
+	 *   observe that state either on the next interrupt or when trying
+	 *   to go back to idle and act upon it.
+	 */
+	if (data_race(tick_do_timer_cpu) == cpu)
 		tick_do_update_jiffies64(now);
 
 	if (ts->inidle)
@@ -433,6 +448,9 @@ static int tick_nohz_cpu_down(unsigned i
 	 * The tick_do_timer_cpu CPU handles housekeeping duty (unbound
 	 * timers, workqueues, timekeeping, ...) on behalf of full dynticks
 	 * CPUs. It must remain online when nohz full is enabled.
+	 *
+	 * There is no data race here. If nohz_full is enabled, this can
+	 * not change because the tick_do_timer_cpu never relinquishes.
 	 */
 	if (tick_nohz_full_running && tick_do_timer_cpu == cpu)
 		return -EBUSY;
@@ -687,6 +705,7 @@ static ktime_t tick_nohz_next_event(stru
 	u64 basemono, next_tick, next_tmr, next_rcu, delta, expires;
 	unsigned long basejiff;
 	unsigned int seq;
+	int duty;
 
 	/* Read jiffies and the time when jiffies were updated last */
 	do {
@@ -751,8 +770,15 @@ static ktime_t tick_nohz_next_event(stru
 	 * Otherwise we can sleep as long as we want.
 	 */
 	delta = timekeeping_max_deferment();
-	if (cpu != tick_do_timer_cpu &&
-	    (tick_do_timer_cpu != TICK_DO_TIMER_NONE || !ts->do_timer_last))
+	/*
+	 * Intentional data race: If the current CPU holds the do_timer()
+	 * duty then nothing can change it. It's always the holder which
+	 * gives it up.  If it's not held by any CPU then this CPU might be
+	 * the one which held it last. If that is true and another CPU
+	 * takes over in parallel then the only "damage" is a short sleep.
+	 */
+	duty = data_race(tick_do_timer_cpu);
+	if (cpu != duty && (duty != TICK_DO_TIMER_NONE || !ts->do_timer_last))
 		delta = KTIME_MAX;
 
 	/* Calculate the next expiry time */
@@ -773,6 +799,7 @@ static void tick_nohz_stop_tick(struct t
 	u64 basemono = ts->timer_expires_base;
 	u64 expires = ts->timer_expires;
 	ktime_t tick = expires;
+	int duty;
 
 	/* Make sure we won't be trying to stop it twice in a row. */
 	ts->timer_expires_base = 0;
@@ -784,11 +811,19 @@ static void tick_nohz_stop_tick(struct t
 	 * don't drop this here the jiffies might be stale and
 	 * do_timer() never invoked. Keep track of the fact that it
 	 * was the one which had the do_timer() duty last.
+	 *
+	 * Another intentional data race: If the current CPU holds the
+	 * do_timer() duty, then no other CPU can change it. If no CPU
+	 * holds it and on read another CPU is taking it on concurrently
+	 * then the only damage is that ts->do_timer_last might not be
+	 * cleared right now which just prevents the CPU from going into
+	 * sleep forever mode for another round.
 	 */
-	if (cpu == tick_do_timer_cpu) {
+	duty = data_race(tick_do_timer_cpu);
+	if (cpu == duty) {
 		tick_do_timer_cpu = TICK_DO_TIMER_NONE;
 		ts->do_timer_last = 1;
-	} else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
+	} else if (duty != TICK_DO_TIMER_NONE) {
 		ts->do_timer_last = 0;
 	}
 
@@ -906,7 +941,15 @@ static bool can_stop_idle_tick(int cpu,
 	 * invoked.
 	 */
 	if (unlikely(!cpu_online(cpu))) {
-		if (cpu == tick_do_timer_cpu)
+		/*
+		 * If the current CPU holds it then the access is not racy
+		 * as no other CPU can change it. If it does not hold it
+		 * then it's irrelevant whether there is a concurrent
+		 * update which either sets it to TICK_DO_TIMER_NONE or
+		 * takes over from TICK_DO_TIMER_NONE, but no other CPU can
+		 * write the current CPU number into it.
+		 */
+		if (cpu == data_race(tick_do_timer_cpu))
 			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
 		/*
 		 * Make sure the CPU doesn't get fooled by obsolete tick
@@ -935,15 +978,25 @@ static bool can_stop_idle_tick(int cpu,
 	}
 
 	if (tick_nohz_full_enabled()) {
+		int duty;
+
+		/*
+		 * Data race only possible during boot while a nohz_full
+		 * CPU holds the do_timer() duty. So this read might race
+		 * with an upcoming housekeeping CPU taking over. If the
+		 * current CPU holds it then this cannot race.
+		 */
+		duty = data_race(tick_do_timer_cpu);
+
 		/*
 		 * Keep the tick alive to guarantee timekeeping progression
 		 * if there are full dynticks CPUs around
 		 */
-		if (tick_do_timer_cpu == cpu)
+		if (duty == cpu)
 			return false;
 
 		/* Should not happen for nohz-full */
-		if (WARN_ON_ONCE(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
+		if (WARN_ON_ONCE(duty == TICK_DO_TIMER_NONE))
 			return false;
 	}
 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 0/3] tick: Annotate and document the intentionaly racy tick_do_timer_cpu
  2020-12-06 21:12 [patch 0/3] tick: Annotate and document the intentionaly racy tick_do_timer_cpu Thomas Gleixner
                   ` (2 preceding siblings ...)
  2020-12-06 21:12 ` [patch 3/3] tick: Annotate tick_do_timer_cpu data races Thomas Gleixner
@ 2020-12-07 11:05 ` Marco Elver
  3 siblings, 0 replies; 32+ messages in thread
From: Marco Elver @ 2020-12-07 11:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, kasan-dev, Peter Zijlstra, Paul E. McKenney, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju, Dmitry Vyukov

On Sun, 6 Dec 2020 at 22:21, Thomas Gleixner <tglx@linutronix.de> wrote:
> There have been several reports about KCSAN complaints vs. the racy access
> to tick_do_timer_cpu. The syzbot moderation queue has three different
> patterns all related to this. There are a few more...
>
> As I know that this is intentional and safe, I did not pay much attention
> to it, but Marco actually made me feel bad a few days ago as he explained
> that these intentional races generate too much noise to get to the
> dangerous ones.

My strategy so far was to inspect random data races and decide which
ones might be more interesting and send those, but I haven't had time
to chase data races the past few months. Thus, getting rid of the
intentional boring ones will definitely scale better -- relying on a
human to do filtering really is suboptimal. :-)

> There was an earlier attempt to just silence KCSAN by slapping READ/WRITE
> once all over the place without even the faintiest attempt of reasoning,
> which is definitely the wrong thing to do.
>
> The bad thing about tick_do_timer_cpu is that its only barely documented
> why it is safe and works at all, which makes it extremly hard for someone
> not really familiar with the code to come up with reasoning.
>
> So Marco made me fast forward that item in my todo list and I have to admit
> that it would have been damned helpful if that Gleixner dude would have
> added proper comments in the first place. Would have spared a lot of brain
> twisting. :)
>
> Staring at all usage sites unearthed a few silly things which are cleaned
> up upfront. The actual annotation uses data_race() with proper comments as
> READ/WRITE_ONCE() does not really buy anything under the assumption that
> the compiler does not play silly buggers and tears the 32bit stores/loads
> into byte wise ones. But even that would cause just potentially shorter
> idle sleeps in the worst case and not a complete malfunction.

Ack -- thanks for marking the accesses!

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 1/3] tick: Remove pointless cpu valid check in hotplug code
  2020-12-06 21:12 ` [patch 1/3] tick: Remove pointless cpu valid check in hotplug code Thomas Gleixner
@ 2020-12-07 11:59   ` Peter Zijlstra
  2020-12-07 17:44     ` Thomas Gleixner
  2020-12-11 22:21   ` Frederic Weisbecker
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2020-12-07 11:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Marco Elver, kasan-dev, Paul E. McKenney, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju

On Sun, Dec 06, 2020 at 10:12:54PM +0100, Thomas Gleixner wrote:

>  void tick_handover_do_timer(void)
>  {
> +	if (tick_do_timer_cpu == smp_processor_id())
> +		tick_do_timer_cpu = cpumask_first(cpu_online_mask);

For the paranoid amongst us, would it make sense to add something like:

	/*
	 * There must always be at least one online CPU.
	 */
	WARN_ON_ONCE(tick_do_timer_cpu >= nr_cpu_ids);

>  }

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-06 21:12 ` [patch 3/3] tick: Annotate tick_do_timer_cpu data races Thomas Gleixner
@ 2020-12-07 12:09   ` Peter Zijlstra
  2020-12-07 17:46     ` Thomas Gleixner
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2020-12-07 12:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Marco Elver, kasan-dev, Paul E. McKenney, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Sun, Dec 06, 2020 at 10:12:56PM +0100, Thomas Gleixner wrote:
> +		if (data_race(tick_do_timer_cpu) == TICK_DO_TIMER_BOOT) {

I prefer the form:

	if (data_race(tick_do_timer_cpu == TICK_DO_TIMER_BOOT)) {

But there doesn't yet seem to be sufficient data_race() usage in the
kernel to see which of the forms is preferred. Do we want to bike-shed
this now and document the outcome somewhere?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 1/3] tick: Remove pointless cpu valid check in hotplug code
  2020-12-07 11:59   ` Peter Zijlstra
@ 2020-12-07 17:44     ` Thomas Gleixner
  0 siblings, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2020-12-07 17:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Marco Elver, kasan-dev, Paul E. McKenney, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju

On Mon, Dec 07 2020 at 12:59, Peter Zijlstra wrote:
> On Sun, Dec 06, 2020 at 10:12:54PM +0100, Thomas Gleixner wrote:
>
>>  void tick_handover_do_timer(void)
>>  {
>> +	if (tick_do_timer_cpu == smp_processor_id())
>> +		tick_do_timer_cpu = cpumask_first(cpu_online_mask);
>
> For the paranoid amongst us, would it make sense to add something like:
>
> 	/*
> 	 * There must always be at least one online CPU.
> 	 */
> 	WARN_ON_ONCE(tick_do_timer_cpu >= nr_cpu_ids);

And add that to all places which look at online mask during hotplug.

If we really care we can add it somewhere central in the hotplug
code. If that ever triggers then the wreckaged tick duty is just
uninteresting.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-07 12:09   ` Peter Zijlstra
@ 2020-12-07 17:46     ` Thomas Gleixner
  2020-12-07 18:19       ` Marco Elver
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2020-12-07 17:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Marco Elver, kasan-dev, Paul E. McKenney, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Mon, Dec 07 2020 at 13:09, Peter Zijlstra wrote:
> On Sun, Dec 06, 2020 at 10:12:56PM +0100, Thomas Gleixner wrote:
>> +		if (data_race(tick_do_timer_cpu) == TICK_DO_TIMER_BOOT) {
>
> I prefer the form:
>
> 	if (data_race(tick_do_timer_cpu == TICK_DO_TIMER_BOOT)) {
>
> But there doesn't yet seem to be sufficient data_race() usage in the
> kernel to see which of the forms is preferred. Do we want to bike-shed
> this now and document the outcome somewhere?

Yes please before we get a gazillion of patches changing half of them
half a year from now.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-07 17:46     ` Thomas Gleixner
@ 2020-12-07 18:19       ` Marco Elver
  2020-12-07 19:43         ` Thomas Gleixner
                           ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Marco Elver @ 2020-12-07 18:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, LKML, kasan-dev, Paul E. McKenney, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Mon, 7 Dec 2020 at 18:46, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, Dec 07 2020 at 13:09, Peter Zijlstra wrote:
> > On Sun, Dec 06, 2020 at 10:12:56PM +0100, Thomas Gleixner wrote:
> >> +            if (data_race(tick_do_timer_cpu) == TICK_DO_TIMER_BOOT) {
> >
> > I prefer the form:
> >
> >       if (data_race(tick_do_timer_cpu == TICK_DO_TIMER_BOOT)) {
> >
> > But there doesn't yet seem to be sufficient data_race() usage in the
> > kernel to see which of the forms is preferred. Do we want to bike-shed
> > this now and document the outcome somewhere?
>
> Yes please before we get a gazillion of patches changing half of them
> half a year from now.

That rule should be as simple as possible. The simplest would be:
"Only enclose the smallest required expression in data_race(); keep
the number of required data_race() expressions to a minimum." (=> want
least amount of code inside data_race() with the least number of
data_race()s).

In the case here, that'd be the "if (data_race(tick_do_timer_cpu) ==
..." variant.

Otherwise there's the possibility that we'll end up with accesses
inside data_race() that we hadn't planned for. For example, somebody
refactors some code replacing constants with variables.

I currently don't know what the rule for Peter's preferred variant
would be, without running the risk of some accidentally data_race()'d
accesses.

Thoughts?

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-07 18:19       ` Marco Elver
@ 2020-12-07 19:43         ` Thomas Gleixner
  2020-12-07 19:44         ` Paul E. McKenney
  2020-12-08  8:01         ` Peter Zijlstra
  2 siblings, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2020-12-07 19:43 UTC (permalink / raw)
  To: Marco Elver
  Cc: Peter Zijlstra, LKML, kasan-dev, Paul E. McKenney, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Mon, Dec 07 2020 at 19:19, Marco Elver wrote:
> On Mon, 7 Dec 2020 at 18:46, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Mon, Dec 07 2020 at 13:09, Peter Zijlstra wrote:
>> > I prefer the form:
>> >
>> >       if (data_race(tick_do_timer_cpu == TICK_DO_TIMER_BOOT)) {
>> >
>> > But there doesn't yet seem to be sufficient data_race() usage in the
>> > kernel to see which of the forms is preferred. Do we want to bike-shed
>> > this now and document the outcome somewhere?
>>
>> Yes please before we get a gazillion of patches changing half of them
>> half a year from now.
>
> That rule should be as simple as possible. The simplest would be:
> "Only enclose the smallest required expression in data_race(); keep
> the number of required data_race() expressions to a minimum." (=> want
> least amount of code inside data_race() with the least number of
> data_race()s).
>
> In the case here, that'd be the "if (data_race(tick_do_timer_cpu) ==
> ..." variant.
>
> Otherwise there's the possibility that we'll end up with accesses
> inside data_race() that we hadn't planned for. For example, somebody
> refactors some code replacing constants with variables.
>
> I currently don't know what the rule for Peter's preferred variant
> would be, without running the risk of some accidentally data_race()'d
> accesses.

I agree. Lets keep it simple and have the data_race() only covering the
actual access to the racy variable, struct member.

The worst case we could end up with would be

    if (data_race(A) == data_race(B))

which would still be clearly isolated. The racy part is not the
comparison, it's the accesses which can cause random results for the
comparison.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-07 18:19       ` Marco Elver
  2020-12-07 19:43         ` Thomas Gleixner
@ 2020-12-07 19:44         ` Paul E. McKenney
  2020-12-07 21:46           ` Thomas Gleixner
  2020-12-08  8:11           ` Peter Zijlstra
  2020-12-08  8:01         ` Peter Zijlstra
  2 siblings, 2 replies; 32+ messages in thread
From: Paul E. McKenney @ 2020-12-07 19:44 UTC (permalink / raw)
  To: Marco Elver
  Cc: Thomas Gleixner, Peter Zijlstra, LKML, kasan-dev, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Mon, Dec 07, 2020 at 07:19:51PM +0100, Marco Elver wrote:
> On Mon, 7 Dec 2020 at 18:46, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Mon, Dec 07 2020 at 13:09, Peter Zijlstra wrote:
> > > On Sun, Dec 06, 2020 at 10:12:56PM +0100, Thomas Gleixner wrote:
> > >> +            if (data_race(tick_do_timer_cpu) == TICK_DO_TIMER_BOOT) {
> > >
> > > I prefer the form:
> > >
> > >       if (data_race(tick_do_timer_cpu == TICK_DO_TIMER_BOOT)) {
> > >
> > > But there doesn't yet seem to be sufficient data_race() usage in the
> > > kernel to see which of the forms is preferred. Do we want to bike-shed
> > > this now and document the outcome somewhere?
> >
> > Yes please before we get a gazillion of patches changing half of them
> > half a year from now.
> 
> That rule should be as simple as possible. The simplest would be:
> "Only enclose the smallest required expression in data_race(); keep
> the number of required data_race() expressions to a minimum." (=> want
> least amount of code inside data_race() with the least number of
> data_race()s).
> 
> In the case here, that'd be the "if (data_race(tick_do_timer_cpu) ==
> ..." variant.
> 
> Otherwise there's the possibility that we'll end up with accesses
> inside data_race() that we hadn't planned for. For example, somebody
> refactors some code replacing constants with variables.
> 
> I currently don't know what the rule for Peter's preferred variant
> would be, without running the risk of some accidentally data_race()'d
> accesses.
> 
> Thoughts?

I am also concerned about inadvertently covering code with data_race().

Also, in this particular case, why data_race() rather than READ_ONCE()?
Do we really expect the compiler to be able to optimize this case
significantly without READ_ONCE()?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-07 19:44         ` Paul E. McKenney
@ 2020-12-07 21:46           ` Thomas Gleixner
  2020-12-07 22:38             ` Paul E. McKenney
  2020-12-08  8:11           ` Peter Zijlstra
  1 sibling, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2020-12-07 21:46 UTC (permalink / raw)
  To: paulmck, Marco Elver
  Cc: Peter Zijlstra, LKML, kasan-dev, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Mon, Dec 07 2020 at 11:44, Paul E. McKenney wrote:
> On Mon, Dec 07, 2020 at 07:19:51PM +0100, Marco Elver wrote:
>> On Mon, 7 Dec 2020 at 18:46, Thomas Gleixner <tglx@linutronix.de> wrote:
>> I currently don't know what the rule for Peter's preferred variant
>> would be, without running the risk of some accidentally data_race()'d
>> accesses.
>> 
>> Thoughts?
>
> I am also concerned about inadvertently covering code with data_race().
>
> Also, in this particular case, why data_race() rather than READ_ONCE()?
> Do we really expect the compiler to be able to optimize this case
> significantly without READ_ONCE()?

That was your suggestion a week or so ago :)

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-07 21:46           ` Thomas Gleixner
@ 2020-12-07 22:38             ` Paul E. McKenney
  2020-12-07 22:46               ` Thomas Gleixner
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2020-12-07 22:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Marco Elver, Peter Zijlstra, LKML, kasan-dev, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Mon, Dec 07, 2020 at 10:46:33PM +0100, Thomas Gleixner wrote:
> On Mon, Dec 07 2020 at 11:44, Paul E. McKenney wrote:
> > On Mon, Dec 07, 2020 at 07:19:51PM +0100, Marco Elver wrote:
> >> On Mon, 7 Dec 2020 at 18:46, Thomas Gleixner <tglx@linutronix.de> wrote:
> >> I currently don't know what the rule for Peter's preferred variant
> >> would be, without running the risk of some accidentally data_race()'d
> >> accesses.
> >> 
> >> Thoughts?
> >
> > I am also concerned about inadvertently covering code with data_race().
> >
> > Also, in this particular case, why data_race() rather than READ_ONCE()?
> > Do we really expect the compiler to be able to optimize this case
> > significantly without READ_ONCE()?
> 
> That was your suggestion a week or so ago :)

You expected my suggestion to change?  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-07 22:38             ` Paul E. McKenney
@ 2020-12-07 22:46               ` Thomas Gleixner
  2020-12-07 22:55                 ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2020-12-07 22:46 UTC (permalink / raw)
  To: paulmck
  Cc: Marco Elver, Peter Zijlstra, LKML, kasan-dev, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Mon, Dec 07 2020 at 14:38, Paul E. McKenney wrote:

> On Mon, Dec 07, 2020 at 10:46:33PM +0100, Thomas Gleixner wrote:
>> On Mon, Dec 07 2020 at 11:44, Paul E. McKenney wrote:
>> > On Mon, Dec 07, 2020 at 07:19:51PM +0100, Marco Elver wrote:
>> >> On Mon, 7 Dec 2020 at 18:46, Thomas Gleixner <tglx@linutronix.de> wrote:
>> >> I currently don't know what the rule for Peter's preferred variant
>> >> would be, without running the risk of some accidentally data_race()'d
>> >> accesses.
>> >> 
>> >> Thoughts?
>> >
>> > I am also concerned about inadvertently covering code with data_race().
>> >
>> > Also, in this particular case, why data_race() rather than READ_ONCE()?
>> > Do we really expect the compiler to be able to optimize this case
>> > significantly without READ_ONCE()?
>> 
>> That was your suggestion a week or so ago :)
>
> You expected my suggestion to change?  ;-)

Your suggestion was data_race() IIRC but I might have lost track in that
conversation.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-07 22:46               ` Thomas Gleixner
@ 2020-12-07 22:55                 ` Paul E. McKenney
  0 siblings, 0 replies; 32+ messages in thread
From: Paul E. McKenney @ 2020-12-07 22:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Marco Elver, Peter Zijlstra, LKML, kasan-dev, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Mon, Dec 07, 2020 at 11:46:48PM +0100, Thomas Gleixner wrote:
> On Mon, Dec 07 2020 at 14:38, Paul E. McKenney wrote:
> 
> > On Mon, Dec 07, 2020 at 10:46:33PM +0100, Thomas Gleixner wrote:
> >> On Mon, Dec 07 2020 at 11:44, Paul E. McKenney wrote:
> >> > On Mon, Dec 07, 2020 at 07:19:51PM +0100, Marco Elver wrote:
> >> >> On Mon, 7 Dec 2020 at 18:46, Thomas Gleixner <tglx@linutronix.de> wrote:
> >> >> I currently don't know what the rule for Peter's preferred variant
> >> >> would be, without running the risk of some accidentally data_race()'d
> >> >> accesses.
> >> >> 
> >> >> Thoughts?
> >> >
> >> > I am also concerned about inadvertently covering code with data_race().
> >> >
> >> > Also, in this particular case, why data_race() rather than READ_ONCE()?
> >> > Do we really expect the compiler to be able to optimize this case
> >> > significantly without READ_ONCE()?
> >> 
> >> That was your suggestion a week or so ago :)
> >
> > You expected my suggestion to change?  ;-)
> 
> Your suggestion was data_race() IIRC but I might have lost track in that
> conversation.

OK, I am inconsistent after all.  I would have suggested READ_ONCE() given
no difference between them, so it is probably best to assume that there is
(or at least was) a good reason for data_race() instead of READ_ONCE().
Couldn't tell you what it might be, though.  :-/

							Thanx, Paul

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-07 18:19       ` Marco Elver
  2020-12-07 19:43         ` Thomas Gleixner
  2020-12-07 19:44         ` Paul E. McKenney
@ 2020-12-08  8:01         ` Peter Zijlstra
  2 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2020-12-08  8:01 UTC (permalink / raw)
  To: Marco Elver
  Cc: Thomas Gleixner, LKML, kasan-dev, Paul E. McKenney, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Mon, Dec 07, 2020 at 07:19:51PM +0100, Marco Elver wrote:
> On Mon, 7 Dec 2020 at 18:46, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Mon, Dec 07 2020 at 13:09, Peter Zijlstra wrote:
> > > On Sun, Dec 06, 2020 at 10:12:56PM +0100, Thomas Gleixner wrote:
> > >> +            if (data_race(tick_do_timer_cpu) == TICK_DO_TIMER_BOOT) {
> > >
> > > I prefer the form:
> > >
> > >       if (data_race(tick_do_timer_cpu == TICK_DO_TIMER_BOOT)) {
> > >
> > > But there doesn't yet seem to be sufficient data_race() usage in the
> > > kernel to see which of the forms is preferred. Do we want to bike-shed
> > > this now and document the outcome somewhere?
> >
> > Yes please before we get a gazillion of patches changing half of them
> > half a year from now.
> 
> That rule should be as simple as possible. The simplest would be:
> "Only enclose the smallest required expression in data_race(); keep
> the number of required data_race() expressions to a minimum." (=> want
> least amount of code inside data_race() with the least number of
> data_race()s).
> 
> In the case here, that'd be the "if (data_race(tick_do_timer_cpu) ==
> ..." variant.

So I was worried that data_race(var) == const, would not allow the
compiler to emit

	cmpq $CONST, ();

but would instead force a separate load. But I checked and it does
generate the right code.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-07 19:44         ` Paul E. McKenney
  2020-12-07 21:46           ` Thomas Gleixner
@ 2020-12-08  8:11           ` Peter Zijlstra
  2020-12-08 15:03             ` Paul E. McKenney
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2020-12-08  8:11 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Marco Elver, Thomas Gleixner, LKML, kasan-dev, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Mon, Dec 07, 2020 at 11:44:06AM -0800, Paul E. McKenney wrote:

> Also, in this particular case, why data_race() rather than READ_ONCE()?
> Do we really expect the compiler to be able to optimize this case
> significantly without READ_ONCE()?

It's about intent and how the code reads. READ_ONCE() is something
completely different from data_race(). data_race() is correct here.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-08  8:11           ` Peter Zijlstra
@ 2020-12-08 15:03             ` Paul E. McKenney
  2020-12-16  0:27               ` Thomas Gleixner
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2020-12-08 15:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marco Elver, Thomas Gleixner, LKML, kasan-dev, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Tue, Dec 08, 2020 at 09:11:29AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 07, 2020 at 11:44:06AM -0800, Paul E. McKenney wrote:
> 
> > Also, in this particular case, why data_race() rather than READ_ONCE()?
> > Do we really expect the compiler to be able to optimize this case
> > significantly without READ_ONCE()?
> 
> It's about intent and how the code reads. READ_ONCE() is something
> completely different from data_race(). data_race() is correct here.

Why?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 1/3] tick: Remove pointless cpu valid check in hotplug code
  2020-12-06 21:12 ` [patch 1/3] tick: Remove pointless cpu valid check in hotplug code Thomas Gleixner
  2020-12-07 11:59   ` Peter Zijlstra
@ 2020-12-11 22:21   ` Frederic Weisbecker
  2020-12-12  0:16     ` Thomas Gleixner
  2020-12-11 22:31   ` Frederic Weisbecker
  2020-12-16 10:50   ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner
  3 siblings, 1 reply; 32+ messages in thread
From: Frederic Weisbecker @ 2020-12-11 22:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Marco Elver, kasan-dev, Peter Zijlstra, Paul E. McKenney,
	Ingo Molnar, Will Deacon, Naresh Kamboju

On Sun, Dec 06, 2020 at 10:12:54PM +0100, Thomas Gleixner wrote:
> tick_handover_do_timer() which is invoked when a CPU is unplugged has a
> check for cpumask_first(cpu_online_mask) when it tries to hand over the
> tick update duty.
> 
> Checking the result of cpumask_first() there is pointless because if the
> online mask is empty at this point, then this would be the last CPU in the
> system going offline, which is impossible. There is always at least one CPU
> remaining. If online mask would be really empty then the timer duty would
> be the least of the resulting problems.
> 
> Remove the well meant check simply because it is pointless and confusing.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/time/tick-common.c |   10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -407,17 +407,13 @@ EXPORT_SYMBOL_GPL(tick_broadcast_oneshot
>  /*
>   * Transfer the do_timer job away from a dying cpu.
>   *
> - * Called with interrupts disabled. Not locking required. If
> + * Called with interrupts disabled. No locking required. If
>   * tick_do_timer_cpu is owned by this cpu, nothing can change it.
>   */
>  void tick_handover_do_timer(void)
>  {
> -	if (tick_do_timer_cpu == smp_processor_id()) {
> -		int cpu = cpumask_first(cpu_online_mask);
> -
> -		tick_do_timer_cpu = (cpu < nr_cpu_ids) ? cpu :
> -			TICK_DO_TIMER_NONE;
> -	}
> +	if (tick_do_timer_cpu == smp_processor_id())
> +		tick_do_timer_cpu = cpumask_first(cpu_online_mask);

I was about to whine that this randomly chosen CPU may be idle and leave
the timekeeping stale until I realized that stop_machine() is running at that
time. Might be worth adding a comment about that.

Also why not just setting it to TICK_DO_TIMER_NONE and be done with it? Perhaps
to avoid that all the CPUs to compete and contend on jiffies update after stop
machine?

If so:

   Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Thanks.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 1/3] tick: Remove pointless cpu valid check in hotplug code
  2020-12-06 21:12 ` [patch 1/3] tick: Remove pointless cpu valid check in hotplug code Thomas Gleixner
  2020-12-07 11:59   ` Peter Zijlstra
  2020-12-11 22:21   ` Frederic Weisbecker
@ 2020-12-11 22:31   ` Frederic Weisbecker
  2020-12-16 10:50   ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner
  3 siblings, 0 replies; 32+ messages in thread
From: Frederic Weisbecker @ 2020-12-11 22:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Marco Elver, kasan-dev, Peter Zijlstra, Paul E. McKenney,
	Ingo Molnar, Will Deacon, Naresh Kamboju

On Sun, Dec 06, 2020 at 10:12:54PM +0100, Thomas Gleixner wrote:
> tick_handover_do_timer() which is invoked when a CPU is unplugged has a
> check for cpumask_first(cpu_online_mask) when it tries to hand over the
> tick update duty.
> 
> Checking the result of cpumask_first() there is pointless because if the
> online mask is empty at this point, then this would be the last CPU in the
> system going offline, which is impossible. There is always at least one CPU
> remaining. If online mask would be really empty then the timer duty would
> be the least of the resulting problems.
> 
> Remove the well meant check simply because it is pointless and confusing.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/time/tick-common.c |   10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -407,17 +407,13 @@ EXPORT_SYMBOL_GPL(tick_broadcast_oneshot
>  /*
>   * Transfer the do_timer job away from a dying cpu.
>   *
> - * Called with interrupts disabled. Not locking required. If
> + * Called with interrupts disabled. No locking required. If
>   * tick_do_timer_cpu is owned by this cpu, nothing can change it.
>   */
>  void tick_handover_do_timer(void)
>  {
> -	if (tick_do_timer_cpu == smp_processor_id()) {
> -		int cpu = cpumask_first(cpu_online_mask);
> -
> -		tick_do_timer_cpu = (cpu < nr_cpu_ids) ? cpu :
> -			TICK_DO_TIMER_NONE;
> -	}
> +	if (tick_do_timer_cpu == smp_processor_id())
> +		tick_do_timer_cpu = cpumask_first(cpu_online_mask);
>  }

BTW since we have that, why do we need:

static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
{
	/*
	 * If this CPU is offline and it is the one which updates
	 * jiffies, then give up the assignment and let it be taken by
	 * the CPU which runs the tick timer next. If we don't drop
	 * this here the jiffies might be stale and do_timer() never
	 * invoked.
	 */
	if (unlikely(!cpu_online(cpu))) {
		if (cpu == tick_do_timer_cpu)
			tick_do_timer_cpu = TICK_DO_TIMER_NONE;


We should only enter idle with an offline CPU after calling
tick_handover_do_timer() so (cpu == tick_do_timer_cpu) shouldn't be possible.

Or am I missing something?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 2/3] tick/sched: Remove bogus boot "safety" check
  2020-12-06 21:12 ` [patch 2/3] tick/sched: Remove bogus boot "safety" check Thomas Gleixner
@ 2020-12-11 22:41   ` Frederic Weisbecker
  2020-12-16 10:50   ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 32+ messages in thread
From: Frederic Weisbecker @ 2020-12-11 22:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Marco Elver, kasan-dev, Peter Zijlstra, Paul E. McKenney,
	Ingo Molnar, Will Deacon, Naresh Kamboju

On Sun, Dec 06, 2020 at 10:12:55PM +0100, Thomas Gleixner wrote:
> can_stop_idle_tick() checks whether the do_timer() duty has been taken over
> by a CPU on boot. That's silly because the boot CPU always takes over with
> the initial clockevent device.
> 
> But even if no CPU would have installed a clockevent and taken over the
> duty then the question whether the tick on the current CPU can be stopped
> or not is moot. In that case the current CPU would have no clockevent
> either, so there would be nothing to keep ticking.
> 
> Remove it.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Acked-by: Frederic Weisbecker <frederic@kernel.org>

Thanks!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 1/3] tick: Remove pointless cpu valid check in hotplug code
  2020-12-11 22:21   ` Frederic Weisbecker
@ 2020-12-12  0:16     ` Thomas Gleixner
  2020-12-12  1:20       ` Frederic Weisbecker
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2020-12-12  0:16 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Marco Elver, kasan-dev, Peter Zijlstra, Paul E. McKenney,
	Ingo Molnar, Will Deacon, Naresh Kamboju

On Fri, Dec 11 2020 at 23:21, Frederic Weisbecker wrote:
> On Sun, Dec 06, 2020 at 10:12:54PM +0100, Thomas Gleixner wrote:
>> tick_handover_do_timer() which is invoked when a CPU is unplugged has a
>> @@ -407,17 +407,13 @@ EXPORT_SYMBOL_GPL(tick_broadcast_oneshot
>>  /*
>>   * Transfer the do_timer job away from a dying cpu.
>>   *
>> - * Called with interrupts disabled. Not locking required. If
>> + * Called with interrupts disabled. No locking required. If
>>   * tick_do_timer_cpu is owned by this cpu, nothing can change it.
>>   */
>>  void tick_handover_do_timer(void)
>>  {
>> -	if (tick_do_timer_cpu == smp_processor_id()) {
>> -		int cpu = cpumask_first(cpu_online_mask);
>> -
>> -		tick_do_timer_cpu = (cpu < nr_cpu_ids) ? cpu :
>> -			TICK_DO_TIMER_NONE;
>> -	}
>> +	if (tick_do_timer_cpu == smp_processor_id())
>> +		tick_do_timer_cpu = cpumask_first(cpu_online_mask);
>
> I was about to whine that this randomly chosen CPU may be idle and leave
> the timekeeping stale until I realized that stop_machine() is running at that
> time. Might be worth adding a comment about that.
>
> Also why not just setting it to TICK_DO_TIMER_NONE and be done with it? Perhaps
> to avoid that all the CPUs to compete and contend on jiffies update after stop
> machine?

No. Because we'd need to add the NONE magic to NOHZ=n kernels which does
not make sense.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 1/3] tick: Remove pointless cpu valid check in hotplug code
  2020-12-12  0:16     ` Thomas Gleixner
@ 2020-12-12  1:20       ` Frederic Weisbecker
  0 siblings, 0 replies; 32+ messages in thread
From: Frederic Weisbecker @ 2020-12-12  1:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Marco Elver, kasan-dev, Peter Zijlstra, Paul E. McKenney,
	Ingo Molnar, Will Deacon, Naresh Kamboju

On Sat, Dec 12, 2020 at 01:16:12AM +0100, Thomas Gleixner wrote:
> On Fri, Dec 11 2020 at 23:21, Frederic Weisbecker wrote:
> > On Sun, Dec 06, 2020 at 10:12:54PM +0100, Thomas Gleixner wrote:
> >> tick_handover_do_timer() which is invoked when a CPU is unplugged has a
> >> @@ -407,17 +407,13 @@ EXPORT_SYMBOL_GPL(tick_broadcast_oneshot
> >>  /*
> >>   * Transfer the do_timer job away from a dying cpu.
> >>   *
> >> - * Called with interrupts disabled. Not locking required. If
> >> + * Called with interrupts disabled. No locking required. If
> >>   * tick_do_timer_cpu is owned by this cpu, nothing can change it.
> >>   */
> >>  void tick_handover_do_timer(void)
> >>  {
> >> -	if (tick_do_timer_cpu == smp_processor_id()) {
> >> -		int cpu = cpumask_first(cpu_online_mask);
> >> -
> >> -		tick_do_timer_cpu = (cpu < nr_cpu_ids) ? cpu :
> >> -			TICK_DO_TIMER_NONE;
> >> -	}
> >> +	if (tick_do_timer_cpu == smp_processor_id())
> >> +		tick_do_timer_cpu = cpumask_first(cpu_online_mask);
> >
> > I was about to whine that this randomly chosen CPU may be idle and leave
> > the timekeeping stale until I realized that stop_machine() is running at that
> > time. Might be worth adding a comment about that.
> >
> > Also why not just setting it to TICK_DO_TIMER_NONE and be done with it? Perhaps
> > to avoid that all the CPUs to compete and contend on jiffies update after stop
> > machine?
> 
> No. Because we'd need to add the NONE magic to NOHZ=n kernels which does
> not make sense.

I forgot about that other half of the world.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-08 15:03             ` Paul E. McKenney
@ 2020-12-16  0:27               ` Thomas Gleixner
  2020-12-16 21:19                 ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2020-12-16  0:27 UTC (permalink / raw)
  To: paulmck, Peter Zijlstra
  Cc: Marco Elver, LKML, kasan-dev, Ingo Molnar, Frederic Weisbecker,
	Will Deacon, Naresh Kamboju, syzbot+23a256029191772c2f02,
	syzbot+56078ac0b9071335a745, syzbot+867130cb240c41f15164

On Tue, Dec 08 2020 at 07:03, Paul E. McKenney wrote:

> On Tue, Dec 08, 2020 at 09:11:29AM +0100, Peter Zijlstra wrote:
>> On Mon, Dec 07, 2020 at 11:44:06AM -0800, Paul E. McKenney wrote:
>> 
>> > Also, in this particular case, why data_race() rather than READ_ONCE()?
>> > Do we really expect the compiler to be able to optimize this case
>> > significantly without READ_ONCE()?

There is probably not much optimization potential for the compiler if
data_race() is used vs. READ/WRITE_ONCE() in this code.

>> It's about intent and how the code reads. READ_ONCE() is something
>> completely different from data_race(). data_race() is correct here.
>
> Why?

Lemme answer that to the extent why _I_ chose data_race() - aside of my
likely confusion over our IRC conversation.

The code does not really care about the compiler trying to be clever or
not as it is designed to be tolerant of all sorts of concurrency
including competing writes. It does not care about multiple reloads
either.  It neither cares about invented stores as long as these
invented stores are not storing phantasy values.

The only thing it cares about is store/load tearing, but there is no
'clever' way to use that because of the only valid transitions of
'cpunr' which comes from smp_processor_id() to TICK_DO_TIMER_NONE which
is the only constant involved or the other way round (which is
intentionally subject to competing stores).

If the compiler is free to store the 32bit value as 4 seperate bytes or
does invented stores with phantasy values, then there is surely a reason
to switch to READ/WRITE_ONCE(), but that'd be a really daft reason.

So my intent was to document that this code does not care about anything
else than what I'd consider to be plain compiler bugs.

My conclusion might be wrong as usual :)

Thanks,

        tglx





^ permalink raw reply	[flat|nested] 32+ messages in thread

* [tip: timers/urgent] tick/sched: Remove bogus boot "safety" check
  2020-12-06 21:12 ` [patch 2/3] tick/sched: Remove bogus boot "safety" check Thomas Gleixner
  2020-12-11 22:41   ` Frederic Weisbecker
@ 2020-12-16 10:50   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 32+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-12-16 10:50 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, Frederic Weisbecker, x86, linux-kernel

The following commit has been merged into the timers/urgent branch of tip:

Commit-ID:     ba8ea8e7dd6e1662e34e730eadfc52aa6816f9dd
Gitweb:        https://git.kernel.org/tip/ba8ea8e7dd6e1662e34e730eadfc52aa6816f9dd
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sun, 06 Dec 2020 22:12:55 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 16 Dec 2020 11:26:27 +01:00

tick/sched: Remove bogus boot "safety" check

can_stop_idle_tick() checks whether the do_timer() duty has been taken over
by a CPU on boot. That's silly because the boot CPU always takes over with
the initial clockevent device.

But even if no CPU would have installed a clockevent and taken over the
duty then the question whether the tick on the current CPU can be stopped
or not is moot. In that case the current CPU would have no clockevent
either, so there would be nothing to keep ticking.

Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20201206212002.725238293@linutronix.de

---
 kernel/time/tick-sched.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a9e6893..5fbc748 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -991,13 +991,6 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 		 */
 		if (tick_do_timer_cpu == cpu)
 			return false;
-		/*
-		 * Boot safety: make sure the timekeeping duty has been
-		 * assigned before entering dyntick-idle mode,
-		 * tick_do_timer_cpu is TICK_DO_TIMER_BOOT
-		 */
-		if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_BOOT))
-			return false;
 
 		/* Should not happen for nohz-full */
 		if (WARN_ON_ONCE(tick_do_timer_cpu == TICK_DO_TIMER_NONE))

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [tip: timers/urgent] tick: Remove pointless cpu valid check in hotplug code
  2020-12-06 21:12 ` [patch 1/3] tick: Remove pointless cpu valid check in hotplug code Thomas Gleixner
                     ` (2 preceding siblings ...)
  2020-12-11 22:31   ` Frederic Weisbecker
@ 2020-12-16 10:50   ` tip-bot2 for Thomas Gleixner
  3 siblings, 0 replies; 32+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-12-16 10:50 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, Frederic Weisbecker, x86, linux-kernel

The following commit has been merged into the timers/urgent branch of tip:

Commit-ID:     f12ad423c4af877b2e4b5a80928b95195fccab04
Gitweb:        https://git.kernel.org/tip/f12ad423c4af877b2e4b5a80928b95195fccab04
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sun, 06 Dec 2020 22:12:54 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 16 Dec 2020 11:26:27 +01:00

tick: Remove pointless cpu valid check in hotplug code

tick_handover_do_timer() which is invoked when a CPU is unplugged has a
check for cpumask_first(cpu_online_mask) when it tries to hand over the
tick update duty.

Checking the result of cpumask_first() there is pointless because if the
online mask is empty at this point, then this would be the last CPU in the
system going offline, which is impossible. There is always at least one CPU
remaining. If online mask would be really empty then the timer duty would
be the least of the resulting problems.

Remove the well meant check simply because it is pointless and confusing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20201206212002.582579516@linutronix.de

---
 kernel/time/tick-common.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index a03764d..9d3a225 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -407,17 +407,13 @@ EXPORT_SYMBOL_GPL(tick_broadcast_oneshot_control);
 /*
  * Transfer the do_timer job away from a dying cpu.
  *
- * Called with interrupts disabled. Not locking required. If
+ * Called with interrupts disabled. No locking required. If
  * tick_do_timer_cpu is owned by this cpu, nothing can change it.
  */
 void tick_handover_do_timer(void)
 {
-	if (tick_do_timer_cpu == smp_processor_id()) {
-		int cpu = cpumask_first(cpu_online_mask);
-
-		tick_do_timer_cpu = (cpu < nr_cpu_ids) ? cpu :
-			TICK_DO_TIMER_NONE;
-	}
+	if (tick_do_timer_cpu == smp_processor_id())
+		tick_do_timer_cpu = cpumask_first(cpu_online_mask);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-16  0:27               ` Thomas Gleixner
@ 2020-12-16 21:19                 ` Paul E. McKenney
  2020-12-16 21:23                   ` Thomas Gleixner
  2020-12-17 10:48                   ` Peter Zijlstra
  0 siblings, 2 replies; 32+ messages in thread
From: Paul E. McKenney @ 2020-12-16 21:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Marco Elver, LKML, kasan-dev, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Wed, Dec 16, 2020 at 01:27:43AM +0100, Thomas Gleixner wrote:
> On Tue, Dec 08 2020 at 07:03, Paul E. McKenney wrote:
> 
> > On Tue, Dec 08, 2020 at 09:11:29AM +0100, Peter Zijlstra wrote:
> >> On Mon, Dec 07, 2020 at 11:44:06AM -0800, Paul E. McKenney wrote:
> >> 
> >> > Also, in this particular case, why data_race() rather than READ_ONCE()?
> >> > Do we really expect the compiler to be able to optimize this case
> >> > significantly without READ_ONCE()?
> 
> There is probably not much optimization potential for the compiler if
> data_race() is used vs. READ/WRITE_ONCE() in this code.

OK, got it.

> >> It's about intent and how the code reads. READ_ONCE() is something
> >> completely different from data_race(). data_race() is correct here.
> >
> > Why?
> 
> Lemme answer that to the extent why _I_ chose data_race() - aside of my
> likely confusion over our IRC conversation.
> 
> The code does not really care about the compiler trying to be clever or
> not as it is designed to be tolerant of all sorts of concurrency
> including competing writes. It does not care about multiple reloads
> either.  It neither cares about invented stores as long as these
> invented stores are not storing phantasy values.
> 
> The only thing it cares about is store/load tearing, but there is no
> 'clever' way to use that because of the only valid transitions of
> 'cpunr' which comes from smp_processor_id() to TICK_DO_TIMER_NONE which
> is the only constant involved or the other way round (which is
> intentionally subject to competing stores).
> 
> If the compiler is free to store the 32bit value as 4 seperate bytes or
> does invented stores with phantasy values, then there is surely a reason
> to switch to READ/WRITE_ONCE(), but that'd be a really daft reason.
> 
> So my intent was to document that this code does not care about anything
> else than what I'd consider to be plain compiler bugs.
> 
> My conclusion might be wrong as usual :)

Given that there is no optimization potential, then the main reason to use
data_race() instead of *_ONCE() is to prevent KCSAN from considering the
accesses when looking for data races.  But that is mostly for debugging
accesses, in cases when these accesses are not really part of the
concurrent algorithm.

So if I understand the situation correctly, I would be using *ONCE().

							Thanx, Paul

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-16 21:19                 ` Paul E. McKenney
@ 2020-12-16 21:23                   ` Thomas Gleixner
  2020-12-16 21:32                     ` Paul E. McKenney
  2020-12-17 10:48                   ` Peter Zijlstra
  1 sibling, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2020-12-16 21:23 UTC (permalink / raw)
  To: paulmck
  Cc: Peter Zijlstra, Marco Elver, LKML, kasan-dev, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Wed, Dec 16 2020 at 13:19, Paul E. McKenney wrote:
> On Wed, Dec 16, 2020 at 01:27:43AM +0100, Thomas Gleixner wrote:
>> So my intent was to document that this code does not care about anything
>> else than what I'd consider to be plain compiler bugs.
>> 
>> My conclusion might be wrong as usual :)
>
> Given that there is no optimization potential, then the main reason to use
> data_race() instead of *_ONCE() is to prevent KCSAN from considering the
> accesses when looking for data races.  But that is mostly for debugging
> accesses, in cases when these accesses are not really part of the
> concurrent algorithm.
>
> So if I understand the situation correctly, I would be using *ONCE().

Could this be spelled out somewhere in Documentation/ please?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-16 21:23                   ` Thomas Gleixner
@ 2020-12-16 21:32                     ` Paul E. McKenney
  0 siblings, 0 replies; 32+ messages in thread
From: Paul E. McKenney @ 2020-12-16 21:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Marco Elver, LKML, kasan-dev, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Wed, Dec 16, 2020 at 10:23:57PM +0100, Thomas Gleixner wrote:
> On Wed, Dec 16 2020 at 13:19, Paul E. McKenney wrote:
> > On Wed, Dec 16, 2020 at 01:27:43AM +0100, Thomas Gleixner wrote:
> >> So my intent was to document that this code does not care about anything
> >> else than what I'd consider to be plain compiler bugs.
> >> 
> >> My conclusion might be wrong as usual :)
> >
> > Given that there is no optimization potential, then the main reason to use
> > data_race() instead of *_ONCE() is to prevent KCSAN from considering the
> > accesses when looking for data races.  But that is mostly for debugging
> > accesses, in cases when these accesses are not really part of the
> > concurrent algorithm.
> >
> > So if I understand the situation correctly, I would be using *ONCE().
> 
> Could this be spelled out somewhere in Documentation/ please?

Good point!  I will put a patch together.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-16 21:19                 ` Paul E. McKenney
  2020-12-16 21:23                   ` Thomas Gleixner
@ 2020-12-17 10:48                   ` Peter Zijlstra
  2020-12-17 14:59                     ` Paul E. McKenney
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2020-12-17 10:48 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Thomas Gleixner, Marco Elver, LKML, kasan-dev, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Wed, Dec 16, 2020 at 01:19:31PM -0800, Paul E. McKenney wrote:
> Given that there is no optimization potential, then the main reason to use
> data_race() instead of *_ONCE() is to prevent KCSAN from considering the
> accesses when looking for data races.  But that is mostly for debugging
> accesses, in cases when these accesses are not really part of the
> concurrent algorithm.
> 
> So if I understand the situation correctly, I would be using *ONCE().

Huh, what, why?

The code doesn't need READ_ONCE(), it merely wants to tell kasan that
the race it observes is fine and as to please shut up.

IOW data_race() is accurate and right.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch 3/3] tick: Annotate tick_do_timer_cpu data races
  2020-12-17 10:48                   ` Peter Zijlstra
@ 2020-12-17 14:59                     ` Paul E. McKenney
  0 siblings, 0 replies; 32+ messages in thread
From: Paul E. McKenney @ 2020-12-17 14:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Marco Elver, LKML, kasan-dev, Ingo Molnar,
	Frederic Weisbecker, Will Deacon, Naresh Kamboju,
	syzbot+23a256029191772c2f02, syzbot+56078ac0b9071335a745,
	syzbot+867130cb240c41f15164

On Thu, Dec 17, 2020 at 11:48:23AM +0100, Peter Zijlstra wrote:
> On Wed, Dec 16, 2020 at 01:19:31PM -0800, Paul E. McKenney wrote:
> > Given that there is no optimization potential, then the main reason to use
> > data_race() instead of *_ONCE() is to prevent KCSAN from considering the
> > accesses when looking for data races.  But that is mostly for debugging
> > accesses, in cases when these accesses are not really part of the
> > concurrent algorithm.
> > 
> > So if I understand the situation correctly, I would be using *ONCE().
> 
> Huh, what, why?
> 
> The code doesn't need READ_ONCE(), it merely wants to tell kasan that
> the race it observes is fine and as to please shut up.
> 
> IOW data_race() is accurate and right.

Other way around.

The code does not need any of the compiler optimizations that might
someday be enabled by data_race().  So why do you need data_race() here?
What do exactly is lost by instead using READ_ONCE()?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2020-12-17 15:00 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-06 21:12 [patch 0/3] tick: Annotate and document the intentionaly racy tick_do_timer_cpu Thomas Gleixner
2020-12-06 21:12 ` [patch 1/3] tick: Remove pointless cpu valid check in hotplug code Thomas Gleixner
2020-12-07 11:59   ` Peter Zijlstra
2020-12-07 17:44     ` Thomas Gleixner
2020-12-11 22:21   ` Frederic Weisbecker
2020-12-12  0:16     ` Thomas Gleixner
2020-12-12  1:20       ` Frederic Weisbecker
2020-12-11 22:31   ` Frederic Weisbecker
2020-12-16 10:50   ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner
2020-12-06 21:12 ` [patch 2/3] tick/sched: Remove bogus boot "safety" check Thomas Gleixner
2020-12-11 22:41   ` Frederic Weisbecker
2020-12-16 10:50   ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner
2020-12-06 21:12 ` [patch 3/3] tick: Annotate tick_do_timer_cpu data races Thomas Gleixner
2020-12-07 12:09   ` Peter Zijlstra
2020-12-07 17:46     ` Thomas Gleixner
2020-12-07 18:19       ` Marco Elver
2020-12-07 19:43         ` Thomas Gleixner
2020-12-07 19:44         ` Paul E. McKenney
2020-12-07 21:46           ` Thomas Gleixner
2020-12-07 22:38             ` Paul E. McKenney
2020-12-07 22:46               ` Thomas Gleixner
2020-12-07 22:55                 ` Paul E. McKenney
2020-12-08  8:11           ` Peter Zijlstra
2020-12-08 15:03             ` Paul E. McKenney
2020-12-16  0:27               ` Thomas Gleixner
2020-12-16 21:19                 ` Paul E. McKenney
2020-12-16 21:23                   ` Thomas Gleixner
2020-12-16 21:32                     ` Paul E. McKenney
2020-12-17 10:48                   ` Peter Zijlstra
2020-12-17 14:59                     ` Paul E. McKenney
2020-12-08  8:01         ` Peter Zijlstra
2020-12-07 11:05 ` [patch 0/3] tick: Annotate and document the intentionaly racy tick_do_timer_cpu Marco Elver

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.