linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] tick/sched: fix data races at tick_do_timer_cpu
@ 2020-02-25  3:08 Qian Cai
  2020-03-04  9:39 ` Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: Qian Cai @ 2020-02-25  3:08 UTC (permalink / raw)
  To: tglx, fweisbec, mingo; +Cc: elver, linux-kernel, Qian Cai

tick_do_timer_cpu could be accessed concurrently where both plain writes
and plain reads are not protected by a lock. Thus, it could result in
data races. Fix them by adding pairs of READ|WRITE_ONCE(). The data
races were reported by KCSAN,

 write to 0xffffffffb2dc94ac of 4 bytes by interrupt on cpu 98:
  tick_sched_do_timer+0x77/0x90
  tick_sched_do_timer at kernel/time/tick-sched.c:136
  tick_sched_timer+0x35/0xc0
  __hrtimer_run_queues+0x217/0x7c0
  hrtimer_interrupt+0x1d4/0x3e0
  smp_apic_timer_interrupt+0x107/0x460
  apic_timer_interrupt+0xf/0x20
  cpuidle_enter_state+0x15e/0x980
  cpuidle_enter+0x69/0xc0
  call_cpuidle+0x23/0x40
  do_idle+0x248/0x280
  cpu_startup_entry+0x1d/0x1f
  start_secondary+0x1b2/0x230
  secondary_startup_64+0xb6/0xc0

 read to 0xffffffffb2dc94ac of 4 bytes by interrupt on cpu 67:
  tick_sched_do_timer+0x31/0x90
  tick_sched_do_timer at kernel/time/tick-sched.c:132
  tick_sched_timer+0x35/0xc0
  __hrtimer_run_queues+0x217/0x7c0
  hrtimer_interrupt+0x1d4/0x3e0
  smp_apic_timer_interrupt+0x107/0x460
  apic_timer_interrupt+0xf/0x20
  cpuidle_enter_state+0x15e/0x980
  cpuidle_enter+0x69/0xc0
  call_cpuidle+0x23/0x40
  do_idle+0x248/0x280
  cpu_startup_entry+0x1d/0x1f
  start_secondary+0x1b2/0x230
  secondary_startup_64+0xb6/0xc0

 read to 0xffffffffb2dc94ac of 4 bytes by task 0 on cpu 107:
  tick_nohz_idle_stop_tick+0x149/0x5f0
  tick_nohz_stop_tick at kernel/time/tick-sched.c:774
  (inlined by) __tick_nohz_idle_stop_tick at kernel/time/tick-sched.c:967
  (inlined by) tick_nohz_idle_stop_tick at kernel/time/tick-sched.c:988
  do_idle+0x235/0x280
  cpu_startup_entry+0x1d/0x1f
  start_secondary+0x1b2/0x230
  secondary_startup_64+0xb6/0xc0

 read to 0xffffffffb2dc94ac of 4 bytes by task 0 on cpu 21:
  tick_nohz_next_event+0x19b/0x2c0
  tick_nohz_next_event at kernel/time/tick-sched.c:740
  tick_nohz_get_sleep_length+0xae/0xe0
  menu_select+0x8b/0xc29
  cpuidle_select+0x50/0x70
  do_idle+0x214/0x280
  cpu_startup_entry+0x1d/0x1f
  start_secondary+0x1b2/0x230
  secondary_startup_64+0xb6/0xc0

Signed-off-by: Qian Cai <cai@lca.pw>
---
 kernel/time/tick-sched.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a792d21cac64..54bcf4eff238 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -129,16 +129,16 @@ static void tick_sched_do_timer(struct tick_sched *ts, ktime_t now)
 	 * If nohz_full is enabled, this should not happen because the
 	 * tick_do_timer_cpu never relinquishes.
 	 */
-	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) {
+	if (unlikely(READ_ONCE(tick_do_timer_cpu) == TICK_DO_TIMER_NONE)) {
 #ifdef CONFIG_NO_HZ_FULL
 		WARN_ON(tick_nohz_full_running);
 #endif
-		tick_do_timer_cpu = cpu;
+		WRITE_ONCE(tick_do_timer_cpu, cpu);
 	}
 #endif
 
 	/* Check, if the jiffies need an update */
-	if (tick_do_timer_cpu == cpu)
+	if (READ_ONCE(tick_do_timer_cpu) == cpu)
 		tick_do_update_jiffies64(now);
 
 	if (ts->inidle)
@@ -737,8 +737,9 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
 	 * Otherwise we can sleep as long as we want.
 	 */
 	delta = timekeeping_max_deferment();
-	if (cpu != tick_do_timer_cpu &&
-	    (tick_do_timer_cpu != TICK_DO_TIMER_NONE || !ts->do_timer_last))
+	if (cpu != READ_ONCE(tick_do_timer_cpu) &&
+	    (READ_ONCE(tick_do_timer_cpu) != TICK_DO_TIMER_NONE ||
+	     !ts->do_timer_last))
 		delta = KTIME_MAX;
 
 	/* Calculate the next expiry time */
@@ -771,10 +772,10 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
 	 * do_timer() never invoked. Keep track of the fact that it
 	 * was the one which had the do_timer() duty last.
 	 */
-	if (cpu == tick_do_timer_cpu) {
-		tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+	if (cpu == READ_ONCE(tick_do_timer_cpu)) {
+		WRITE_ONCE(tick_do_timer_cpu, TICK_DO_TIMER_NONE);
 		ts->do_timer_last = 1;
-	} else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
+	} else if (READ_ONCE(tick_do_timer_cpu) != TICK_DO_TIMER_NONE) {
 		ts->do_timer_last = 0;
 	}
 
-- 
2.21.0 (Apple Git-122.2)


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] tick/sched: fix data races at tick_do_timer_cpu
  2020-02-25  3:08 [PATCH] tick/sched: fix data races at tick_do_timer_cpu Qian Cai
@ 2020-03-04  9:39 ` Thomas Gleixner
  2020-03-04  9:56   ` Peter Zijlstra
  2020-03-04 11:20   ` Qian Cai
  0 siblings, 2 replies; 7+ messages in thread
From: Thomas Gleixner @ 2020-03-04  9:39 UTC (permalink / raw)
  To: Qian Cai, fweisbec, mingo; +Cc: elver, linux-kernel, Qian Cai, Peter Zijlstra

Qian,

Qian Cai <cai@lca.pw> writes:
> tick_do_timer_cpu could be accessed concurrently where both plain writes
> and plain reads are not protected by a lock. Thus, it could result in
> data races. Fix them by adding pairs of READ|WRITE_ONCE(). The data
> races were reported by KCSAN,

They are reported, but are they actually a real problem?

This completely lacks analysis why these 8 places need the
READ/WRITE_ONCE() treatment at all and if so why the other 14 places
accessing tick_do_timer_cpu are safe without it.

I definitely appreciate the work done with KCSAN, but just making the
tool shut up does not cut it. 

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] tick/sched: fix data races at tick_do_timer_cpu
  2020-03-04  9:39 ` Thomas Gleixner
@ 2020-03-04  9:56   ` Peter Zijlstra
  2020-03-04 11:24     ` Qian Cai
  2020-03-04 11:20   ` Qian Cai
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2020-03-04  9:56 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Qian Cai, fweisbec, mingo, elver, linux-kernel

On Wed, Mar 04, 2020 at 10:39:41AM +0100, Thomas Gleixner wrote:
> Qian,
> 
> Qian Cai <cai@lca.pw> writes:
> > tick_do_timer_cpu could be accessed concurrently where both plain writes
> > and plain reads are not protected by a lock. Thus, it could result in
> > data races. Fix them by adding pairs of READ|WRITE_ONCE(). The data
> > races were reported by KCSAN,
> 
> They are reported, but are they actually a real problem?
> 
> This completely lacks analysis why these 8 places need the
> READ/WRITE_ONCE() treatment at all and if so why the other 14 places
> accessing tick_do_timer_cpu are safe without it.
> 
> I definitely appreciate the work done with KCSAN, but just making the
> tool shut up does not cut it. 

Worse:

+	if (cpu != READ_ONCE(tick_do_timer_cpu) &&
+	    (READ_ONCE(tick_do_timer_cpu) != TICK_DO_TIMER_NONE ||

Doing that read twice is just utterly stupid. And the patch is full of
that :/

Please stop this non-thinking 'fix' generation!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] tick/sched: fix data races at tick_do_timer_cpu
  2020-03-04  9:39 ` Thomas Gleixner
  2020-03-04  9:56   ` Peter Zijlstra
@ 2020-03-04 11:20   ` Qian Cai
  2020-03-06 15:02     ` Qian Cai
  1 sibling, 1 reply; 7+ messages in thread
From: Qian Cai @ 2020-03-04 11:20 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: fweisbec, mingo, elver, linux-kernel, Peter Zijlstra



> On Mar 4, 2020, at 4:39 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> They are reported, but are they actually a real problem?
> 
> This completely lacks analysis why these 8 places need the
> READ/WRITE_ONCE() treatment at all and if so why the other 14 places
> accessing tick_do_timer_cpu are safe without it.
> 
> I definitely appreciate the work done with KCSAN, but just making the
> tool shut up does not cut it.

Looks at tick_sched_do_timer(), for example,

if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) {
#ifdef CONFIG_NO_HZ_FULL
		WARN_ON(tick_nohz_full_running);
#endif
		tick_do_timer_cpu = cpu;
	}
#endif

/* Check, if the jiffies need an update */
if (tick_do_timer_cpu == cpu)
	tick_do_update_jiffies64(now);

Could we rule out all compilers and archs will not optimize it if !CONFIG_NO_HZ_FULL to,

if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE) || tick_do_timer_cpu == cpu)
         tick_do_update_jiffies64(now);

So it could save a branch or/and realized that tick_do_timer_cpu is not used later in this cpu, so it could discard the store?

I am not all that familiar with all other 14 places if it is possible to happen concurrently as well, so I took a pragmatic approach that for now only deal with the places that KCSAN confirmed, and then look forward for an incremental approach if there are more places needs treatments later once confirmed.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] tick/sched: fix data races at tick_do_timer_cpu
  2020-03-04  9:56   ` Peter Zijlstra
@ 2020-03-04 11:24     ` Qian Cai
  0 siblings, 0 replies; 7+ messages in thread
From: Qian Cai @ 2020-03-04 11:24 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Thomas Gleixner, fweisbec, mingo, elver, linux-kernel



> On Mar 4, 2020, at 4:56 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> Worse:
> 
> +    if (cpu != READ_ONCE(tick_do_timer_cpu) &&
> +        (READ_ONCE(tick_do_timer_cpu) != TICK_DO_TIMER_NONE ||
> 
> Doing that read twice is just utterly stupid. And the patch is full of
> that :/
> 
> Please stop this non-thinking 'fix' generation!

That is an oversight that could easily be fixed. However, blaming about “non-thinking” is rather harsh. Do you know I had spent weeks reading about memory-model.txt and alike just to get better at it?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] tick/sched: fix data races at tick_do_timer_cpu
  2020-03-04 11:20   ` Qian Cai
@ 2020-03-06 15:02     ` Qian Cai
  2020-03-07  0:45       ` Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: Qian Cai @ 2020-03-06 15:02 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: fweisbec, mingo, elver, linux-kernel, Peter Zijlstra

On Wed, 2020-03-04 at 06:20 -0500, Qian Cai wrote:
> > On Mar 4, 2020, at 4:39 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > 
> > They are reported, but are they actually a real problem?
> > 
> > This completely lacks analysis why these 8 places need the
> > READ/WRITE_ONCE() treatment at all and if so why the other 14 places
> > accessing tick_do_timer_cpu are safe without it.
> > 
> > I definitely appreciate the work done with KCSAN, but just making the
> > tool shut up does not cut it.
> 
> Looks at tick_sched_do_timer(), for example,
> 
> if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) {
> #ifdef CONFIG_NO_HZ_FULL
> 		WARN_ON(tick_nohz_full_running);
> #endif
> 		tick_do_timer_cpu = cpu;
> 	}
> #endif
> 
> /* Check, if the jiffies need an update */
> if (tick_do_timer_cpu == cpu)
> 	tick_do_update_jiffies64(now);
> 
> Could we rule out all compilers and archs will not optimize it if !CONFIG_NO_HZ_FULL to,
> 
> if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE) || tick_do_timer_cpu == cpu)
>          tick_do_update_jiffies64(now);
> 
> So it could save a branch or/and realized that tick_do_timer_cpu is not used later in this cpu, so it could discard the store?
> 
> I am not all that familiar with all other 14 places if it is possible to happen concurrently as well, so I took a pragmatic approach that for now only deal with the places that KCSAN confirmed, and then look forward for an incremental approach if there are more places needs treatments later once confirmed.

If you don't think that will be happening and dislike using READ/WRITE_ONCE()
for documentation purpose, we could annotate those using the data_race() macro.
Is that more acceptable?

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a792d21cac64..08ce4088da87 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -129,7 +129,7 @@ static void tick_sched_do_timer(struct tick_sched *ts,
ktime_t now)
 	 * If nohz_full is enabled, this should not happen because the
 	 * tick_do_timer_cpu never relinquishes.
 	 */
-	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) {
+	if (unlikely(data_race(tick_do_timer_cpu == TICK_DO_TIMER_NONE))) {
 #ifdef CONFIG_NO_HZ_FULL
 		WARN_ON(tick_nohz_full_running);
 #endif
@@ -138,7 +138,7 @@ static void tick_sched_do_timer(struct tick_sched *ts,
ktime_t now)
 #endif
 
 	/* Check, if the jiffies need an update */
-	if (tick_do_timer_cpu == cpu)
+	if (data_race(tick_do_timer_cpu == cpu))
 		tick_do_update_jiffies64(now);
 
 	if (ts->inidle)
@@ -737,8 +737,9 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts,
int cpu)
 	 * Otherwise we can sleep as long as we want.
 	 */
 	delta = timekeeping_max_deferment();
-	if (cpu != tick_do_timer_cpu &&
-	    (tick_do_timer_cpu != TICK_DO_TIMER_NONE || !ts->do_timer_last))
+	if (data_race(cpu != tick_do_timer_cpu) &&
+	    (data_race(tick_do_timer_cpu != TICK_DO_TIMER_NONE) ||
+	    !ts->do_timer_last))
 		delta = KTIME_MAX;
 
 	/* Calculate the next expiry time */
@@ -771,10 +772,10 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int
cpu)
 	 * do_timer() never invoked. Keep track of the fact that it
 	 * was the one which had the do_timer() duty last.
 	 */
-	if (cpu == tick_do_timer_cpu) {
+	if (data_race(cpu == tick_do_timer_cpu)) {
 		tick_do_timer_cpu = TICK_DO_TIMER_NONE;
 		ts->do_timer_last = 1;
-	} else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
+	} else if (data_race(tick_do_timer_cpu != TICK_DO_TIMER_NONE)) {
 		ts->do_timer_last = 0;
 	}
 

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] tick/sched: fix data races at tick_do_timer_cpu
  2020-03-06 15:02     ` Qian Cai
@ 2020-03-07  0:45       ` Thomas Gleixner
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Gleixner @ 2020-03-07  0:45 UTC (permalink / raw)
  To: Qian Cai; +Cc: fweisbec, mingo, elver, linux-kernel, Peter Zijlstra

Qian Cai <cai@lca.pw> writes:
> On Wed, 2020-03-04 at 06:20 -0500, Qian Cai wrote:
>> Looks at tick_sched_do_timer(), for example,
>> 
>> if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) {
>> #ifdef CONFIG_NO_HZ_FULL
>> 		WARN_ON(tick_nohz_full_running);
>> #endif
>> 		tick_do_timer_cpu = cpu;
>> 	}
>> #endif
>> 
>> /* Check, if the jiffies need an update */
>> if (tick_do_timer_cpu == cpu)
>> 	tick_do_update_jiffies64(now);
>> 
>> Could we rule out all compilers and archs will not optimize it if !CONFIG_NO_HZ_FULL to,
>> 
>> if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE) || tick_do_timer_cpu == cpu)
>>          tick_do_update_jiffies64(now);
>> 
>> So it could save a branch or/and realized that tick_do_timer_cpu is
>> not used later in this cpu, so it could discard the store?

Sorry, I can't make any sense of this.

If a compiler optimizes this in the way you are describing then it is
seriously broken. If that happens then lots of code will just fall apart
in colourful ways.

>> I am not all that familiar with all other 14 places if it is possible
>> to happen concurrently as well, so I took a pragmatic approach that
>> for now only deal with the places that KCSAN confirmed, and then look
>> forward for an incremental approach if there are more places needs
>> treatments later once confirmed.

Stop this please. I just wasted days to understand why code was
sprinkled with random rcu*enter/exit() invocations until I figured out
that the reason was exactly what you are proposing:

     Curing the symptoms

That's the worst engineering principle ever, really.

Just shutting up the tool without doing a proper analysis is a
guaranteed recipe for disaster because it is going to paper over real or
latent bugs. Read the git log history carefully to find prime examples
for this.

So no, neither shutting up just the places KCSAN complained about and
ignoring the rest nor your new proposal
 
> If you don't think that will be happening and dislike using READ/WRITE_ONCE()
> for documentation purpose, we could annotate those using the data_race() macro.
> Is that more acceptable?

is leading anywhere. It's both based on cargo cult programming and lacks
proper root cause analysis.

I'm surely not ignoring the output of KCSAN, but please understand that
any attempt to silence its output purely based on mechanical duct tape
is not going anywhere.

I wasted actually months in the past to decode a subtle wreckage which
was caused by the back then 'silence uninitilized variable warnings'
frenzy. One of these blindly applied annotations papered over a real
bug and made it horribly hard to find.

As I'm busy with other urgent nightmares right now, this has to wait
until I get some spare cycles or you come up with some proper analysis.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-03-07  0:45 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-25  3:08 [PATCH] tick/sched: fix data races at tick_do_timer_cpu Qian Cai
2020-03-04  9:39 ` Thomas Gleixner
2020-03-04  9:56   ` Peter Zijlstra
2020-03-04 11:24     ` Qian Cai
2020-03-04 11:20   ` Qian Cai
2020-03-06 15:02     ` Qian Cai
2020-03-07  0:45       ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).