linux-kernel.vger.kernel.org archive mirror
* WARNING in tracepoint_probe_register_prio (3)
@ 2018-05-18 10:29 syzbot
  2019-08-16  0:11 ` syzbot
  2019-08-16 12:32 ` WARNING in tracepoint_probe_register_prio (3) syzbot
  0 siblings, 2 replies; 60+ messages in thread
From: syzbot @ 2018-05-18 10:29 UTC (permalink / raw)
  To: linux-kernel, mathieu.desnoyers, paulmck, rostedt, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    a78622932c27 bpf: sockmap, fix double-free
git tree:       bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1305ba77800000
kernel config:  https://syzkaller.appspot.com/x/.config?x=b632d8e2c2ab2c1
dashboard link: https://syzkaller.appspot.com/bug?extid=774fddf07b7ab29a1e55
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+774fddf07b7ab29a1e55@syzkaller.appspotmail.com

WARNING: CPU: 0 PID: 11734 at kernel/tracepoint.c:210 tracepoint_add_func kernel/tracepoint.c:210 [inline]
WARNING: CPU: 0 PID: 11734 at kernel/tracepoint.c:210 tracepoint_probe_register_prio+0x3b4/0xa50 kernel/tracepoint.c:282
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 11734 Comm: syz-executor1 Not tainted 4.17.0-rc4+ #13
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  panic+0x22f/0x4de kernel/panic.c:184
  __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
  report_bug+0x252/0x2d0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:178 [inline]
  do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
RIP: 0010:tracepoint_add_func kernel/tracepoint.c:210 [inline]
RIP: 0010:tracepoint_probe_register_prio+0x3b4/0xa50 kernel/tracepoint.c:282
RSP: 0018:ffff8801c7977438 EFLAGS: 00010216
RAX: 0000000000040000 RBX: ffff8801c7977518 RCX: ffffc900080f9000
RDX: 00000000000012a1 RSI: ffffffff817a9d34 RDI: ffff8801c29ddab0
RBP: ffff8801c7977540 R08: ffff880197448700 R09: fffffbfff11b803c
R10: ffff8801c7977428 R11: ffffffff88dc01e7 R12: 00000000ffffffef
R13: 00000000ffffffff R14: ffffffff81516c90 R15: 0000000000000001
  tracepoint_probe_register+0x2a/0x40 kernel/tracepoint.c:303
  trace_event_reg+0x19a/0x350 kernel/trace/trace_events.c:305
  perf_trace_event_reg kernel/trace/trace_event_perf.c:123 [inline]
  perf_trace_event_init+0x4fe/0x990 kernel/trace/trace_event_perf.c:198
  perf_trace_init+0x186/0x250 kernel/trace/trace_event_perf.c:222
  perf_tp_event_init+0xa6/0x120 kernel/events/core.c:8337
  perf_try_init_event+0x137/0x2f0 kernel/events/core.c:9734
  perf_init_event kernel/events/core.c:9772 [inline]
  perf_event_alloc.part.91+0x1248/0x3090 kernel/events/core.c:10038
  perf_event_alloc kernel/events/core.c:10394 [inline]
  __do_sys_perf_event_open+0xa8a/0x2fa0 kernel/events/core.c:10495
  __se_sys_perf_event_open kernel/events/core.c:10384 [inline]
  __x64_sys_perf_event_open+0xbe/0x150 kernel/events/core.c:10384
  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455a09
RSP: 002b:00007f136b4f5c68 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
RAX: ffffffffffffffda RBX: 00007f136b4f66d4 RCX: 0000000000455a09
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000002001d000
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000014
R13: 000000000000050c R14: 00000000006fb9c0 R15: 0000000000000003
Dumping ftrace buffer:
    (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.


* Re: WARNING in tracepoint_probe_register_prio (3)
  2018-05-18 10:29 WARNING in tracepoint_probe_register_prio (3) syzbot
@ 2019-08-16  0:11 ` syzbot
  2019-08-16 14:26   ` [PATCH 1/1] Fix: trace sched switch start/stop racy updates Mathieu Desnoyers
  2019-08-16 12:32 ` WARNING in tracepoint_probe_register_prio (3) syzbot
  1 sibling, 1 reply; 60+ messages in thread
From: syzbot @ 2019-08-16  0:11 UTC (permalink / raw)
  To: ard.biesheuvel, gregkh, gustavo, jeyu, linux-kernel,
	mathieu.desnoyers, mingo, netdev, paulmck, paulmck, rostedt,
	syzkaller-bugs, tglx

syzbot has found a reproducer for the following crash on:

HEAD commit:    ecb9f80d net/mvpp2: Replace tasklet with softirq hrtimer
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=115730ac600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=d4cf1ffb87d590d7
dashboard link: https://syzkaller.appspot.com/bug?extid=774fddf07b7ab29a1e55
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=11b02a22600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+774fddf07b7ab29a1e55@syzkaller.appspotmail.com

WARNING: CPU: 0 PID: 8824 at kernel/tracepoint.c:243 tracepoint_add_func kernel/tracepoint.c:243 [inline]
WARNING: CPU: 0 PID: 8824 at kernel/tracepoint.c:243 tracepoint_probe_register_prio+0x216/0x790 kernel/tracepoint.c:315
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 8824 Comm: syz-executor.4 Not tainted 5.3.0-rc3+ #133
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  panic+0x2dc/0x755 kernel/panic.c:219
  __warn.cold+0x20/0x4c kernel/panic.c:576
  report_bug+0x263/0x2b0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:179 [inline]
  fixup_bug arch/x86/kernel/traps.c:174 [inline]
  do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272
  do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291
  invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1028
RIP: 0010:tracepoint_add_func kernel/tracepoint.c:243 [inline]
RIP: 0010:tracepoint_probe_register_prio+0x216/0x790 kernel/tracepoint.c:315
Code: 48 89 f8 48 c1 e8 03 80 3c 08 00 0f 85 bf 04 00 00 48 8b 45 b8 49 3b 45 08 0f 85 21 ff ff ff 41 bd ef ff ff ff e8 aa 8c fe ff <0f> 0b e8 a3 8c fe ff 48 c7 c7 20 44 de 88 e8 d7 1d ca 05 44 89 e8
RSP: 0018:ffff88807b5a7498 EFLAGS: 00010293
RAX: ffff8880a87a85c0 RBX: ffffffff89a1eb00 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffffffff8173fcd6 RDI: ffff88808afc4698
RBP: ffff88807b5a74f0 R08: ffff8880a87a85c0 R09: fffffbfff11bc885
R10: ffff88807b5a7488 R11: ffffffff88de4427 R12: ffff88808afc4690
R13: 00000000ffffffef R14: 00000000ffffffff R15: ffffffff8177f710
  tracepoint_probe_register+0x2b/0x40 kernel/tracepoint.c:335
  register_trace_sched_wakeup include/trace/events/sched.h:96 [inline]
  tracing_sched_register kernel/trace/trace_sched_switch.c:54 [inline]
  tracing_start_sched_switch+0xa8/0x190 kernel/trace/trace_sched_switch.c:106
  tracing_start_cmdline_record+0x13/0x20 kernel/trace/trace_sched_switch.c:131
  trace_printk_init_buffers kernel/trace/trace.c:3050 [inline]
  trace_printk_init_buffers.cold+0xdf/0xe9 kernel/trace/trace.c:3013
  bpf_get_trace_printk_proto+0xe/0x20 kernel/trace/bpf_trace.c:338
  cgroup_base_func_proto kernel/bpf/cgroup.c:799 [inline]
  cgroup_base_func_proto.isra.0+0x10e/0x120 kernel/bpf/cgroup.c:776
  cg_sockopt_func_proto+0x53/0x70 kernel/bpf/cgroup.c:1411
  check_helper_call+0x141/0x32f0 kernel/bpf/verifier.c:3950
  do_check+0x6213/0x89f0 kernel/bpf/verifier.c:7707
  bpf_check+0x6f99/0x9948 kernel/bpf/verifier.c:9294
  bpf_prog_load+0xe68/0x1670 kernel/bpf/syscall.c:1698
  __do_sys_bpf+0xc43/0x3460 kernel/bpf/syscall.c:2849
  __se_sys_bpf kernel/bpf/syscall.c:2808 [inline]
  __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:2808
  do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459829
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fc4bf1dec78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000459829
RDX: 0000000000000070 RSI: 0000000020000180 RDI: 0000000000000005
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc4bf1df6d4
R13: 00000000004bfc7c R14: 00000000004d1938 R15: 00000000ffffffff
Kernel Offset: disabled
Rebooting in 86400 seconds..



* Re: WARNING in tracepoint_probe_register_prio (3)
  2018-05-18 10:29 WARNING in tracepoint_probe_register_prio (3) syzbot
  2019-08-16  0:11 ` syzbot
@ 2019-08-16 12:32 ` syzbot
  2019-08-16 12:41   ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 60+ messages in thread
From: syzbot @ 2019-08-16 12:32 UTC (permalink / raw)
  To: antoine.tenart, ard.biesheuvel, baruch, bigeasy, davem, gregkh,
	gustavo, jeyu, linux-kernel, mathieu.desnoyers,
	maxime.chevallier, mingo, netdev, paulmck, paulmck, rmk+kernel,
	rostedt, syzkaller-bugs, tglx

syzbot has bisected this bug to:

commit ecb9f80db23a7ab09b46b298b404e41dd7aff6e6
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Aug 13 08:00:25 2019 +0000

     net/mvpp2: Replace tasklet with softirq hrtimer

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13ffb9ee600000
start commit:   ecb9f80d net/mvpp2: Replace tasklet with softirq hrtimer
git tree:       net-next
final crash:    https://syzkaller.appspot.com/x/report.txt?x=100079ee600000
console output: https://syzkaller.appspot.com/x/log.txt?x=17ffb9ee600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=d4cf1ffb87d590d7
dashboard link: https://syzkaller.appspot.com/bug?extid=774fddf07b7ab29a1e55
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=11b02a22600000

Reported-by: syzbot+774fddf07b7ab29a1e55@syzkaller.appspotmail.com
Fixes: ecb9f80db23a ("net/mvpp2: Replace tasklet with softirq hrtimer")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


* Re: WARNING in tracepoint_probe_register_prio (3)
  2019-08-16 12:32 ` WARNING in tracepoint_probe_register_prio (3) syzbot
@ 2019-08-16 12:41   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 60+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-08-16 12:41 UTC (permalink / raw)
  To: syzbot
  Cc: antoine.tenart, ard.biesheuvel, baruch, davem, gregkh, gustavo,
	jeyu, linux-kernel, mathieu.desnoyers, maxime.chevallier, mingo,
	netdev, paulmck, paulmck, rmk+kernel, rostedt, syzkaller-bugs,
	tglx

On 2019-08-16 05:32:00 [-0700], syzbot wrote:
> syzbot has bisected this bug to:
> 
> commit ecb9f80db23a7ab09b46b298b404e41dd7aff6e6
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date:   Tue Aug 13 08:00:25 2019 +0000
> 
>     net/mvpp2: Replace tasklet with softirq hrtimer
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13ffb9ee600000

That bisection is wrong. The warning triggered once, and this commit was
simply the topmost one in net-next at the time…

Sebastian


* [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16  0:11 ` syzbot
@ 2019-08-16 14:26   ` Mathieu Desnoyers
  2019-08-16 16:25     ` Steven Rostedt
  0 siblings, 1 reply; 60+ messages in thread
From: Mathieu Desnoyers @ 2019-08-16 14:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Mathieu Desnoyers, Joel Fernandes, Peter Zijlstra,
	Thomas Gleixner, Paul E. McKenney

Reading the sched_cmdline_ref and sched_tgid_ref initial state within
tracing_start_sched_switch without holding the sched_register_mutex is
racy against concurrent updates, which can lead to tracepoint probes
being registered more than once (and thus trigger warnings within
tracepoint.c).

Also, write and read to/from those variables should be done with
WRITE_ONCE() and READ_ONCE(), given that those are read within tracing
probes without holding the sched_register_mutex.

[ Compile-tested only. I suspect it might fix the following syzbot
  report:

  syzbot+774fddf07b7ab29a1e55@syzkaller.appspotmail.com ]

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Joel Fernandes (Google) <joel@joelfernandes.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Steven Rostedt (VMware) <rostedt@goodmis.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul E. McKenney <paulmck@linux.ibm.com>
---
 kernel/trace/trace_sched_switch.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
index e288168661e1..902e8bf59aeb 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -26,8 +26,8 @@ probe_sched_switch(void *ignore, bool preempt,
 {
 	int flags;
 
-	flags = (RECORD_TGID * !!sched_tgid_ref) +
-		(RECORD_CMDLINE * !!sched_cmdline_ref);
+	flags = (RECORD_TGID * !!READ_ONCE(sched_tgid_ref)) +
+		(RECORD_CMDLINE * !!READ_ONCE(sched_cmdline_ref));
 
 	if (!flags)
 		return;
@@ -39,8 +39,8 @@ probe_sched_wakeup(void *ignore, struct task_struct *wakee)
 {
 	int flags;
 
-	flags = (RECORD_TGID * !!sched_tgid_ref) +
-		(RECORD_CMDLINE * !!sched_cmdline_ref);
+	flags = (RECORD_TGID * !!READ_ONCE(sched_tgid_ref)) +
+		(RECORD_CMDLINE * !!READ_ONCE(sched_cmdline_ref));
 
 	if (!flags)
 		return;
@@ -89,21 +89,28 @@ static void tracing_sched_unregister(void)
 
 static void tracing_start_sched_switch(int ops)
 {
-	bool sched_register = (!sched_cmdline_ref && !sched_tgid_ref);
+	bool sched_register;
+
 	mutex_lock(&sched_register_mutex);
+	sched_register = (!sched_cmdline_ref && !sched_tgid_ref);
 
 	switch (ops) {
 	case RECORD_CMDLINE:
-		sched_cmdline_ref++;
+		WRITE_ONCE(sched_cmdline_ref, sched_cmdline_ref + 1);
 		break;
 
 	case RECORD_TGID:
-		sched_tgid_ref++;
+		WRITE_ONCE(sched_tgid_ref, sched_tgid_ref + 1);
 		break;
+
+	default:
+		WARN_ONCE(1, "Unsupported tracing op: %d", ops);
+		goto end;
 	}
 
-	if (sched_register && (sched_cmdline_ref || sched_tgid_ref))
+	if (sched_register)
 		tracing_sched_register();
+end:
 	mutex_unlock(&sched_register_mutex);
 }
 
@@ -113,16 +120,21 @@ static void tracing_stop_sched_switch(int ops)
 
 	switch (ops) {
 	case RECORD_CMDLINE:
-		sched_cmdline_ref--;
+		WRITE_ONCE(sched_cmdline_ref, sched_cmdline_ref - 1);
 		break;
 
 	case RECORD_TGID:
-		sched_tgid_ref--;
+		WRITE_ONCE(sched_tgid_ref, sched_tgid_ref - 1);
 		break;
+
+	default:
+		WARN_ONCE(1, "Unsupported tracing op: %d", ops);
+		goto end;
 	}
 
 	if (!sched_cmdline_ref && !sched_tgid_ref)
 		tracing_sched_unregister();
+end:
 	mutex_unlock(&sched_register_mutex);
 }
 
-- 
2.11.0



* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 14:26   ` [PATCH 1/1] Fix: trace sched switch start/stop racy updates Mathieu Desnoyers
@ 2019-08-16 16:25     ` Steven Rostedt
  2019-08-16 16:48       ` Valentin Schneider
  2019-08-16 17:19       ` Mathieu Desnoyers
  0 siblings, 2 replies; 60+ messages in thread
From: Steven Rostedt @ 2019-08-16 16:25 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Joel Fernandes, Peter Zijlstra, Thomas Gleixner,
	Paul E. McKenney

On Fri, 16 Aug 2019 10:26:43 -0400
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> Reading the sched_cmdline_ref and sched_tgid_ref initial state within
> tracing_start_sched_switch without holding the sched_register_mutex is
> racy against concurrent updates, which can lead to tracepoint probes
> being registered more than once (and thus trigger warnings within
> tracepoint.c).
> 
> Also, write and read to/from those variables should be done with
> WRITE_ONCE() and READ_ONCE(), given that those are read within tracing
> probes without holding the sched_register_mutex.
> 

I understand the READ_ONCE() but is the WRITE_ONCE() truly necessary?
It's done while holding the mutex. It's not that critical of a path,
and makes the code look ugly.

-- Steve



> [ Compile-tested only. I suspect it might fix the following syzbot
>   report:
> 
>   syzbot+774fddf07b7ab29a1e55@syzkaller.appspotmail.com ]
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Joel Fernandes (Google) <joel@joelfernandes.org>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Steven Rostedt (VMware) <rostedt@goodmis.org>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Paul E. McKenney <paulmck@linux.ibm.com>
> ---
>  kernel/trace/trace_sched_switch.c | 32 ++++++++++++++++++++++----------
>  1 file changed, 22 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
> index e288168661e1..902e8bf59aeb 100644
> --- a/kernel/trace/trace_sched_switch.c
> +++ b/kernel/trace/trace_sched_switch.c
> @@ -26,8 +26,8 @@ probe_sched_switch(void *ignore, bool preempt,
>  {
>  	int flags;
>  
> -	flags = (RECORD_TGID * !!sched_tgid_ref) +
> -		(RECORD_CMDLINE * !!sched_cmdline_ref);
> +	flags = (RECORD_TGID * !!READ_ONCE(sched_tgid_ref)) +
> +		(RECORD_CMDLINE * !!READ_ONCE(sched_cmdline_ref));
>  
>  	if (!flags)
>  		return;
> @@ -39,8 +39,8 @@ probe_sched_wakeup(void *ignore, struct task_struct *wakee)
>  {
>  	int flags;
>  
> -	flags = (RECORD_TGID * !!sched_tgid_ref) +
> -		(RECORD_CMDLINE * !!sched_cmdline_ref);
> +	flags = (RECORD_TGID * !!READ_ONCE(sched_tgid_ref)) +
> +		(RECORD_CMDLINE * !!READ_ONCE(sched_cmdline_ref));
>  
>  	if (!flags)
>  		return;
> @@ -89,21 +89,28 @@ static void tracing_sched_unregister(void)
>  
>  static void tracing_start_sched_switch(int ops)
>  {
> -	bool sched_register = (!sched_cmdline_ref && !sched_tgid_ref);
> +	bool sched_register;
> +
>  	mutex_lock(&sched_register_mutex);
> +	sched_register = (!sched_cmdline_ref && !sched_tgid_ref);
>  
>  	switch (ops) {
>  	case RECORD_CMDLINE:
> -		sched_cmdline_ref++;
> +		WRITE_ONCE(sched_cmdline_ref, sched_cmdline_ref + 1);
>  		break;
>  
>  	case RECORD_TGID:
> -		sched_tgid_ref++;
> +		WRITE_ONCE(sched_tgid_ref, sched_tgid_ref + 1);
>  		break;
> +
> +	default:
> +		WARN_ONCE(1, "Unsupported tracing op: %d", ops);
> +		goto end;
>  	}
>  
> -	if (sched_register && (sched_cmdline_ref || sched_tgid_ref))
> +	if (sched_register)
>  		tracing_sched_register();
> +end:
>  	mutex_unlock(&sched_register_mutex);
>  }
>  
> @@ -113,16 +120,21 @@ static void tracing_stop_sched_switch(int ops)
>  
>  	switch (ops) {
>  	case RECORD_CMDLINE:
> -		sched_cmdline_ref--;
> +		WRITE_ONCE(sched_cmdline_ref, sched_cmdline_ref - 1);
>  		break;
>  
>  	case RECORD_TGID:
> -		sched_tgid_ref--;
> +		WRITE_ONCE(sched_tgid_ref, sched_tgid_ref - 1);
>  		break;
> +
> +	default:
> +		WARN_ONCE(1, "Unsupported tracing op: %d", ops);
> +		goto end;
>  	}
>  
>  	if (!sched_cmdline_ref && !sched_tgid_ref)
>  		tracing_sched_unregister();
> +end:
>  	mutex_unlock(&sched_register_mutex);
>  }
>  



* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 16:25     ` Steven Rostedt
@ 2019-08-16 16:48       ` Valentin Schneider
  2019-08-16 17:04         ` Steven Rostedt
  2019-08-16 17:19       ` Mathieu Desnoyers
  1 sibling, 1 reply; 60+ messages in thread
From: Valentin Schneider @ 2019-08-16 16:48 UTC (permalink / raw)
  To: Steven Rostedt, Mathieu Desnoyers
  Cc: linux-kernel, Joel Fernandes, Peter Zijlstra, Thomas Gleixner,
	Paul E. McKenney

On 16/08/2019 17:25, Steven Rostedt wrote:
>> Also, write and read to/from those variables should be done with
>> WRITE_ONCE() and READ_ONCE(), given that those are read within tracing
>> probes without holding the sched_register_mutex.
>>
> 
> I understand the READ_ONCE() but is the WRITE_ONCE() truly necessary?
> It's done while holding the mutex. It's not that critical of a path,
> and makes the code look ugly.
> 

I seem to recall something like locking primitives don't protect you from
store tearing / invented stores, so if you can have concurrent readers
using READ_ONCE(), there should be a WRITE_ONCE() on the writer side, even
if it's done in a critical section.
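
To make the pattern concrete, here is a minimal sketch (hypothetical
names; assumes the kernel's mutex API and READ_ONCE()/WRITE_ONCE(),
not code from the patch):

static DEFINE_MUTEX(lock);
static int ref;			/* stands in for sched_cmdline_ref / sched_tgid_ref */

static void writer(void)	/* cf. tracing_start_sched_switch() */
{
	mutex_lock(&lock);
	/*
	 * Serialized against other writers, but lockless readers may
	 * run concurrently, so the store itself is still annotated to
	 * rule out store tearing / invented stores.
	 */
	WRITE_ONCE(ref, ref + 1);
	mutex_unlock(&lock);
}

static int reader(void)		/* cf. the tracepoint probes */
{
	/* No lock held here: READ_ONCE() forces a single, untorn load. */
	return !!READ_ONCE(ref);
}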



* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 16:48       ` Valentin Schneider
@ 2019-08-16 17:04         ` Steven Rostedt
  2019-08-16 17:41           ` Mathieu Desnoyers
  0 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2019-08-16 17:04 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Mathieu Desnoyers, linux-kernel, Joel Fernandes, Peter Zijlstra,
	Thomas Gleixner, Paul E. McKenney

On Fri, 16 Aug 2019 17:48:59 +0100
Valentin Schneider <valentin.schneider@arm.com> wrote:

> On 16/08/2019 17:25, Steven Rostedt wrote:
> >> Also, write and read to/from those variables should be done with
> >> WRITE_ONCE() and READ_ONCE(), given that those are read within tracing
> >> probes without holding the sched_register_mutex.
> >>  
> > 
> > I understand the READ_ONCE() but is the WRITE_ONCE() truly necessary?
> > It's done while holding the mutex. It's not that critical of a path,
> > and makes the code look ugly.
> >   
> 
> I seem to recall something like locking primitives don't protect you from
> store tearing / invented stores, so if you can have concurrent readers
> using READ_ONCE(), there should be a WRITE_ONCE() on the writer side, even
> if it's done in a critical section.

But for this, it really doesn't matter. The READ_ONCE() is for going
from 0->1 or 1->0; any other change is the same as 1.

When we enable trace events, we start recording the tasks' comms such
that we can possibly map them to the pids. When we disable trace
events, we stop recording the comms so that we don't overwrite the
cache when not needed. Note, if more than the max cache of tasks are
recorded during a session, we are likely to miss comms anyway.

Thinking about this more, the READ_ONCE() and WRITE_ONCE() are not even
needed, because this is just a best effort anyway.

The only real fix was to move the check into the mutex-protected area,
because the race there can cause a real bug.

 {
-	bool sched_register = (!sched_cmdline_ref && !sched_tgid_ref);
+	bool sched_register;
+
 	mutex_lock(&sched_register_mutex);
+	sched_register = (!sched_cmdline_ref && !sched_tgid_ref);

Thus, I'd like to see a v2 of this patch without the READ_ONCE() or
WRITE_ONCE() added.

-- Steve


* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 16:25     ` Steven Rostedt
  2019-08-16 16:48       ` Valentin Schneider
@ 2019-08-16 17:19       ` Mathieu Desnoyers
  2019-08-16 19:15         ` Steven Rostedt
  1 sibling, 1 reply; 60+ messages in thread
From: Mathieu Desnoyers @ 2019-08-16 17:19 UTC (permalink / raw)
  To: rostedt
  Cc: linux-kernel, Joel Fernandes (Google), Peter Zijlstra,
	Thomas Gleixner, paulmck, Boqun Feng, Will Deacon, David Howells,
	Alan Stern, Linus Torvalds

----- On Aug 16, 2019, at 12:25 PM, rostedt <rostedt@goodmis.org> wrote:

> On Fri, 16 Aug 2019 10:26:43 -0400 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
[...]
>> 
>> Also, write and read to/from those variables should be done with
>> WRITE_ONCE() and READ_ONCE(), given that those are read within tracing
>> probes without holding the sched_register_mutex.
>> 
> 
> I understand the READ_ONCE() but is the WRITE_ONCE() truly necessary?
> It's done while holding the mutex. It's not that critical of a path,
> and makes the code look ugly.

The update is done while holding the mutex, but the read-side does not
hold that mutex, so it can observe the intermediate state caused by
store-tearing or invented stores which can be generated by the compiler
on the update-side.

Please refer to the following LWN article:

https://lwn.net/Articles/793253/

Sections:
- "Store tearing"
- "Invented stores"

Arguably, based on that article, store tearing is only observed in the
wild for constants (which is not the case here), and invented stores
seem to require specific code patterns. But I wonder why we would ever
want to pair a fragile non-volatile store with a READ_ONCE()? Considering
the pain of reproducing and hunting down this kind of issue in the wild,
I would be tempted to enforce that any READ_ONCE() operating on a variable
be paired either with WRITE_ONCE() or with atomic operations, so those can
eventually be validated by static code checkers and code sanitizers.

If coding style is your only concern here, we may want to consider
introducing new macros in compiler.h:

WRITE_ONCE_INC(v) /* v++ */
WRITE_ONCE_DEC(v) /* v-- */
WRITE_ONCE_ADD(v, count) /* v += count */
WRITE_ONCE_SUB(v, count) /* v -= count */
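
A minimal sketch of what these hypothetical helpers could expand to
(they do not exist in compiler.h today; the plain read of v inside is
fine as long as updates are serialized by a lock):

#define WRITE_ONCE_INC(v)		WRITE_ONCE((v), (v) + 1)
#define WRITE_ONCE_DEC(v)		WRITE_ONCE((v), (v) - 1)
#define WRITE_ONCE_ADD(v, count)	WRITE_ONCE((v), (v) + (count))
#define WRITE_ONCE_SUB(v, count)	WRITE_ONCE((v), (v) - (count))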

Thanks,

Mathieu

> 
> -- Steve
> 
> 
> 
>> [ Compile-tested only. I suspect it might fix the following syzbot
>>   report:
>> 
>>   syzbot+774fddf07b7ab29a1e55@syzkaller.appspotmail.com ]
>> 
>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> CC: Joel Fernandes (Google) <joel@joelfernandes.org>
>> CC: Peter Zijlstra <peterz@infradead.org>
>> CC: Steven Rostedt (VMware) <rostedt@goodmis.org>
>> CC: Thomas Gleixner <tglx@linutronix.de>
>> CC: Paul E. McKenney <paulmck@linux.ibm.com>
>> ---
>>  kernel/trace/trace_sched_switch.c | 32 ++++++++++++++++++++++----------
>>  1 file changed, 22 insertions(+), 10 deletions(-)
>> 
>> diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
>> index e288168661e1..902e8bf59aeb 100644
>> --- a/kernel/trace/trace_sched_switch.c
>> +++ b/kernel/trace/trace_sched_switch.c
>> @@ -26,8 +26,8 @@ probe_sched_switch(void *ignore, bool preempt,
>>  {
>>  	int flags;
>>  
>> -	flags = (RECORD_TGID * !!sched_tgid_ref) +
>> -		(RECORD_CMDLINE * !!sched_cmdline_ref);
>> +	flags = (RECORD_TGID * !!READ_ONCE(sched_tgid_ref)) +
>> +		(RECORD_CMDLINE * !!READ_ONCE(sched_cmdline_ref));
>>  
>>  	if (!flags)
>>  		return;
>> @@ -39,8 +39,8 @@ probe_sched_wakeup(void *ignore, struct task_struct *wakee)
>>  {
>>  	int flags;
>>  
>> -	flags = (RECORD_TGID * !!sched_tgid_ref) +
>> -		(RECORD_CMDLINE * !!sched_cmdline_ref);
>> +	flags = (RECORD_TGID * !!READ_ONCE(sched_tgid_ref)) +
>> +		(RECORD_CMDLINE * !!READ_ONCE(sched_cmdline_ref));
>>  
>>  	if (!flags)
>>  		return;
>> @@ -89,21 +89,28 @@ static void tracing_sched_unregister(void)
>>  
>>  static void tracing_start_sched_switch(int ops)
>>  {
>> -	bool sched_register = (!sched_cmdline_ref && !sched_tgid_ref);
>> +	bool sched_register;
>> +
>>  	mutex_lock(&sched_register_mutex);
>> +	sched_register = (!sched_cmdline_ref && !sched_tgid_ref);
>>  
>>  	switch (ops) {
>>  	case RECORD_CMDLINE:
>> -		sched_cmdline_ref++;
>> +		WRITE_ONCE(sched_cmdline_ref, sched_cmdline_ref + 1);
>>  		break;
>>  
>>  	case RECORD_TGID:
>> -		sched_tgid_ref++;
>> +		WRITE_ONCE(sched_tgid_ref, sched_tgid_ref + 1);
>>  		break;
>> +
>> +	default:
>> +		WARN_ONCE(1, "Unsupported tracing op: %d", ops);
>> +		goto end;
>>  	}
>>  
>> -	if (sched_register && (sched_cmdline_ref || sched_tgid_ref))
>> +	if (sched_register)
>>  		tracing_sched_register();
>> +end:
>>  	mutex_unlock(&sched_register_mutex);
>>  }
>>  
>> @@ -113,16 +120,21 @@ static void tracing_stop_sched_switch(int ops)
>>  
>>  	switch (ops) {
>>  	case RECORD_CMDLINE:
>> -		sched_cmdline_ref--;
>> +		WRITE_ONCE(sched_cmdline_ref, sched_cmdline_ref - 1);
>>  		break;
>>  
>>  	case RECORD_TGID:
>> -		sched_tgid_ref--;
>> +		WRITE_ONCE(sched_tgid_ref, sched_tgid_ref - 1);
>>  		break;
>> +
>> +	default:
>> +		WARN_ONCE(1, "Unsupported tracing op: %d", ops);
>> +		goto end;
>>  	}
>>  
>>  	if (!sched_cmdline_ref && !sched_tgid_ref)
>>  		tracing_sched_unregister();
>> +end:
>>  	mutex_unlock(&sched_register_mutex);
>>  }

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 17:04         ` Steven Rostedt
@ 2019-08-16 17:41           ` Mathieu Desnoyers
  2019-08-16 19:18             ` Steven Rostedt
  2019-08-16 19:19             ` Alan Stern
  0 siblings, 2 replies; 60+ messages in thread
From: Mathieu Desnoyers @ 2019-08-16 17:41 UTC (permalink / raw)
  To: rostedt
  Cc: Valentin Schneider, linux-kernel, Joel Fernandes (Google),
	Peter Zijlstra, Thomas Gleixner, paulmck, Boqun Feng,
	Will Deacon, David Howells, Alan Stern, Linus Torvalds

----- On Aug 16, 2019, at 1:04 PM, rostedt <rostedt@goodmis.org> wrote:

> On Fri, 16 Aug 2019 17:48:59 +0100
> Valentin Schneider <valentin.schneider@arm.com> wrote:
> 
>> On 16/08/2019 17:25, Steven Rostedt wrote:
>> >> Also, write and read to/from those variables should be done with
>> >> WRITE_ONCE() and READ_ONCE(), given that those are read within tracing
>> >> probes without holding the sched_register_mutex.
>> >>  
>> > 
>> > I understand the READ_ONCE() but is the WRITE_ONCE() truly necessary?
>> > It's done while holding the mutex. It's not that critical of a path,
>> > and makes the code look ugly.
>> >   
>> 
>> I seem to recall something like locking primitives don't protect you from
>> store tearing / invented stores, so if you can have concurrent readers
>> using READ_ONCE(), there should be a WRITE_ONCE() on the writer side, even
>> if it's done in a critical section.
> 
> But for this, it really doesn't matter. The READ_ONCE() is for going
> from 0->1 or 1->0; any other change is the same as 1.

Let's consider this "invented store" scenario (even though, as I noted in
my earlier email, this particular instance of the code does not appear to
fit the requirements to generate this in the wild with current compilers):

initial state:

sched_tgid_ref = 10;

Thread A                                Thread B

registration                            tracepoint probe
sched_tgid_ref++
  - compiler decides to invent a
    store: sched_tgid_ref = 0
                                        READ_ONCE(sched_tgid_ref) -> observes 0,
                                        but should really see either 10 or 11.
  - compiler stores 11 into
    sched_tgid_ref

This kind of scenario could cause spurious missing data in the middle of a
trace caused by another user trying to increment the reference count, which
is really unexpected.

A similar scenario can happen for "store tearing" if the compiler decides
to break the store into many stores close to the 16-bit overflow value when
updating a 32-bit reference count. Spurious 1 -> 0 -> 1 transitions could be
observed by readers.
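
For illustration, assuming a little-endian target and purely
hypothetical code generation, the torn store could look like this:

/* Source (32-bit counter crossing 0xffff -> 0x10000): */
sched_tgid_ref = 0x10000;

/*
 * A plain store may legally be emitted as two 16-bit halves:
 *
 *	*(u16 *)&sched_tgid_ref       = 0x0000;		// low half first
 *	*((u16 *)&sched_tgid_ref + 1) = 0x0001;		// high half second
 *
 * A lockless reader running between the two half-stores sees 0, so
 * !!sched_tgid_ref spuriously goes 1 -> 0 -> 1.  WRITE_ONCE() makes
 * the store a single volatile access, which rules tearing out.
 */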

> When we enable trace events, we start recording the tasks' comms such
> that we can possibly map them to the pids. When we disable trace
> events, we stop recording the comms so that we don't overwrite the
> cache when not needed. Note, if more than the max cache of tasks are
> recorded during a session, we are likely to miss comms anyway.
> 
> Thinking about this more, the READ_ONCE() and WRITE_ONCE() are not even
> needed, because this is just a best effort anyway.

If you choose not to use READ_ONCE(), then the "load tearing" issue can
cause similar spurious 1 -> 0 -> 1 transitions near 16-bit counter
overflow as described above. The "Invented load" also becomes an issue,
because the compiler could use the loaded value for a branch, and re-load
that value between two branches which are expected to use the same value,
effectively generating a corrupted state.
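
Concretely, in a probe like probe_sched_switch(), the worry is the
following (hypothetical code generation; use() is a stand-in for the
rest of the probe, not a real function):

int flags;

flags = (RECORD_TGID * !!sched_tgid_ref) +
	(RECORD_CMDLINE * !!sched_cmdline_ref);
if (!flags)
	return;
use(flags);

/*
 * Without READ_ONCE(), the compiler may discard the local and re-load
 * the globals at each use:
 *
 *	if (!((RECORD_TGID * !!sched_tgid_ref) + ...))	// loads #1
 *		return;
 *	use((RECORD_TGID * !!sched_tgid_ref) + ...);	// loads #2
 *
 * The two sets of loads can observe different values if a concurrent
 * update lands in between, so the probe can pass the !flags check and
 * then act on an inconsistent flags value.
 */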

I think we need a statement about whether READ_ONCE/WRITE_ONCE should
be used in this kind of situation, or if we are fine dealing with the
awkward compiler side-effects when they occur.

Thanks,

Mathieu

> 
> The only real fix was to move the check into the mutex-protected area,
> because the race there can cause a real bug.
> 
> {
> -	bool sched_register = (!sched_cmdline_ref && !sched_tgid_ref);
> +	bool sched_register;
> +
> 	mutex_lock(&sched_register_mutex);
> +	sched_register = (!sched_cmdline_ref && !sched_tgid_ref);
> 
> Thus, I'd like to see a v2 of this patch without the READ_ONCE() or
> WRITE_ONCE() added.
> 
> -- Steve

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 17:19       ` Mathieu Desnoyers
@ 2019-08-16 19:15         ` Steven Rostedt
  2019-08-17 14:27           ` Mathieu Desnoyers
  0 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2019-08-16 19:15 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Joel Fernandes (Google), Peter Zijlstra,
	Thomas Gleixner, paulmck, Boqun Feng, Will Deacon, David Howells,
	Alan Stern, Linus Torvalds

On Fri, 16 Aug 2019 13:19:20 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- On Aug 16, 2019, at 12:25 PM, rostedt <rostedt@goodmis.org> wrote:
> 
> > On Fri, 16 Aug 2019 10:26:43 -0400 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> >   
> [...]
> >> 
> >> Also, write and read to/from those variables should be done with
> >> WRITE_ONCE() and READ_ONCE(), given that those are read within tracing
> >> probes without holding the sched_register_mutex.
> >>   
> > 
> > I understand the READ_ONCE() but is the WRITE_ONCE() truly necessary?
> > It's done while holding the mutex. It's not that critical of a path,
> > and makes the code look ugly.  
> 
> The update is done while holding the mutex, but the read-side does not
> hold that mutex, so it can observe the intermediate state caused by
> store-tearing or invented stores which can be generated by the compiler
> on the update-side.
> 
> Please refer to the following LWN article:
> 
> https://lwn.net/Articles/793253/
> 
> Sections:
> - "Store tearing"
> - "Invented stores"
> 
> Arguably, based on that article, store tearing is only observed in the
> wild for constants (which is not the case here), and invented stores
> seem to require specific code patterns. But I wonder why we would ever
> want to pair a fragile non-volatile store with a READ_ONCE()? Considering
> the pain of reproducing and hunting down this kind of issue in the wild,
> I would be tempted to enforce that any READ_ONCE() operating on a variable
> be paired either with WRITE_ONCE() or with atomic operations, so those can
> eventually be validated by static code checkers and code sanitizers.

My issue is that this is just a case to decide if we should cache a
comm or not. It's a helper, nothing more. There's no guarantee that
something will get cached.

-- Steve


> 
> If coding style is your only concern here, we may want to consider
> introducing new macros in compiler.h:
> 
> WRITE_ONCE_INC(v) /* v++ */
> WRITE_ONCE_DEC(v) /* v-- */
> WRITE_ONCE_ADD(v, count) /* v += count */
> WRITE_ONCE_SUB(v, count) /* v -= count */
> 


* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 17:41           ` Mathieu Desnoyers
@ 2019-08-16 19:18             ` Steven Rostedt
  2019-08-16 19:19             ` Alan Stern
  1 sibling, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2019-08-16 19:18 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Valentin Schneider, linux-kernel, Joel Fernandes (Google),
	Peter Zijlstra, Thomas Gleixner, paulmck, Boqun Feng,
	Will Deacon, David Howells, Alan Stern, Linus Torvalds

On Fri, 16 Aug 2019 13:41:59 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- On Aug 16, 2019, at 1:04 PM, rostedt <rostedt@goodmis.org> wrote:
> 
> > On Fri, 16 Aug 2019 17:48:59 +0100
> > Valentin Schneider <valentin.schneider@arm.com> wrote:
> >   
> >> On 16/08/2019 17:25, Steven Rostedt wrote:  
> >> >> Also, write and read to/from those variables should be done with
> >> >> WRITE_ONCE() and READ_ONCE(), given that those are read within tracing
> >> >> probes without holding the sched_register_mutex.
> >> >>    
> >> > 
> >> > I understand the READ_ONCE() but is the WRITE_ONCE() truly necessary?
> >> > It's done while holding the mutex. It's not that critical of a path,
> >> > and makes the code look ugly.
> >> >     
> >> 
> >> I seem to recall something like locking primitives don't protect you from
> >> store tearing / invented stores, so if you can have concurrent readers
> >> using READ_ONCE(), there should be a WRITE_ONCE() on the writer side, even
> >> if it's done in a critical section.  
> > 
> > But for this, it really doesn't matter. The READ_ONCE() is for going
> > from 0->1 or 1->0; any other change is the same as 1.
> 
> Let's consider this "invented store" scenario (even though, as I noted in
> my earlier email, this particular instance of the code does not appear to
> fit the requirements to generate this in the wild with current compilers):
> 
> initial state:
> 
> sched_tgid_ref = 10;
> 
> Thread A                                Thread B
> 
> registration                            tracepoint probe
> sched_tgid_ref++
>   - compiler decides to invent a
>     store: sched_tgid_ref = 0

It looks to me like this would cause more issues in other parts of
the code if a compiler ever decided to do this.

But I still don't care for this case. It's a cache, nothing more. No
guarantee that anything will be recorded.


>                                         READ_ONCE(sched_tgid_ref) -> observes 0,
>                                         but should really see either 10 or 11.
>   - compiler stores 11 into
>     sched_tgid_ref
> 
> This kind of scenario could cause spurious missing data in the middle
> of a trace caused by another user trying to increment the reference
> count, which is really unexpected.
> 
> A similar scenario can happen for "store tearing" if the compiler
> decides to break the store into many stores close to the 16-bit
> overflow value when updating a 32-bit reference count. Spurious 1 ->
> 0 -> 1 transitions could be observed by readers.
> 
> > When we enable trace events, we start recording the tasks' comms such
> > that we can possibly map them to the pids. When we disable trace
> > events, we stop recording the comms so that we don't overwrite the
> > cache when not needed. Note, if more than the max cache of tasks are
> > recorded during a session, we are likely to miss comms anyway.
> > 
> > Thinking about this more, the READ_ONCE() and WRITE_ONCE() are not
> > even needed, because this is just a best effort anyway.
> 
> If you choose not to use READ_ONCE(), then the "load tearing" issue
> can cause similar spurious 1 -> 0 -> 1 transitions near 16-bit counter
> overflow as described above. The "Invented load" also becomes an
> issue, because the compiler could use the loaded value for a branch,
> and re-load that value between two branches which are expected to use
> the same value, effectively generating a corrupted state.
> 
> I think we need a statement about whether READ_ONCE/WRITE_ONCE should
> be used in this kind of situation, or if we are fine dealing with the
> awkward compiler side-effects when they occur.
> 

Let me put it this way. My biggest issue with this is that the
READ/WRITE_ONCE() has *nothing* to do with the bug you are trying to
fix.

That bug is that we made the decision of starting the probes outside the
mutex. That is fixed by moving the decision into the mutex. The
READ/WRITE_ONCE() is just added noise.

-- Steve


* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 17:41           ` Mathieu Desnoyers
  2019-08-16 19:18             ` Steven Rostedt
@ 2019-08-16 19:19             ` Alan Stern
  2019-08-16 20:44               ` Joel Fernandes
  1 sibling, 1 reply; 60+ messages in thread
From: Alan Stern @ 2019-08-16 19:19 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: rostedt, Valentin Schneider, linux-kernel, Joel Fernandes (Google),
	Peter Zijlstra, Thomas Gleixner, paulmck, Boqun Feng,
	Will Deacon, David Howells, Linus Torvalds

On Fri, 16 Aug 2019, Mathieu Desnoyers wrote:

> If you choose not to use READ_ONCE(), then the "load tearing" issue can
> cause similar spurious 1 -> 0 -> 1 transitions near 16-bit counter
> overflow as described above. The "Invented load" also becomes an issue,
> because the compiler could use the loaded value for a branch, and re-load
> that value between two branches which are expected to use the same value,
> effectively generating a corrupted state.
> 
> I think we need a statement about whether READ_ONCE/WRITE_ONCE should
> be used in this kind of situation, or if we are fine dealing with the
> awkward compiler side-effects when they occur.

The only real downside (apart from readability) of READ_ONCE and
WRITE_ONCE is that they prevent the compiler from optimizing accesses
to the location being read or written.  But if you're just doing a
single access in each place, not multiple accesses, then there's
nothing to optimize anyway.  So there's no real reason not to use
READ_ONCE or WRITE_ONCE.
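
A small example of the optimization being given up (a sketch; assumes
the kernel's READ_ONCE() and some shared int x):

extern int x;

int plain_reads(void)
{
	/*
	 * The compiler may fuse these into a single load: the C
	 * abstract machine sees nothing changing x in between.
	 */
	int a = x;
	int b = x;

	return a + b;
}

int once_reads(void)
{
	/* Volatile accesses: two distinct loads must be emitted. */
	int a = READ_ONCE(x);
	int b = READ_ONCE(x);

	return a + b;
}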

Alan Stern




* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 19:19             ` Alan Stern
@ 2019-08-16 20:44               ` Joel Fernandes
  2019-08-16 20:49                 ` Thomas Gleixner
  2019-08-16 20:49                 ` Steven Rostedt
  0 siblings, 2 replies; 60+ messages in thread
From: Joel Fernandes @ 2019-08-16 20:44 UTC (permalink / raw)
  To: Alan Stern
  Cc: Mathieu Desnoyers, rostedt, Valentin Schneider, linux-kernel,
	Peter Zijlstra, Thomas Gleixner, paulmck, Boqun Feng,
	Will Deacon, David Howells, Linus Torvalds

On Fri, Aug 16, 2019 at 3:19 PM Alan Stern <stern@rowland.harvard.edu> wrote:
>
> On Fri, 16 Aug 2019, Mathieu Desnoyers wrote:
>
> > If you choose not to use READ_ONCE(), then the "load tearing" issue can
> > cause similar spurious 1 -> 0 -> 1 transitions near 16-bit counter
> > overflow as described above. The "Invented load" also becomes an issue,
> > because the compiler could use the loaded value for a branch, and re-load
> > that value between two branches which are expected to use the same value,
> > effectively generating a corrupted state.
> >
> > I think we need a statement about whether READ_ONCE/WRITE_ONCE should
> > be used in this kind of situation, or if we are fine dealing with the
> > awkward compiler side-effects when they occur.
>
> The only real downside (apart from readability) of READ_ONCE and
> WRITE_ONCE is that they prevent the compiler from optimizing accesses
> to the location being read or written.  But if you're just doing a
> single access in each place, not multiple accesses, then there's
> nothing to optimize anyway.  So there's no real reason not to use
> READ_ONCE or WRITE_ONCE.

I am also more on the side of using *_ONCE. To me, on principle, I
would be willing to convert any concurrent plain access to use _ONCE,
just so we don't have to worry about it now or in the future, and it
also documents the access.

Perhaps the commit message can be reworded to mention that the _ONCE
is an additional cleanup for safe access.

thanks,

 - Joel


* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 20:44               ` Joel Fernandes
@ 2019-08-16 20:49                 ` Thomas Gleixner
  2019-08-16 20:57                   ` Joel Fernandes
  2019-08-16 21:04                   ` Linus Torvalds
  2019-08-16 20:49                 ` Steven Rostedt
  1 sibling, 2 replies; 60+ messages in thread
From: Thomas Gleixner @ 2019-08-16 20:49 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Alan Stern, Mathieu Desnoyers, rostedt, Valentin Schneider,
	linux-kernel, Peter Zijlstra, paulmck, Boqun Feng, Will Deacon,
	David Howells, Linus Torvalds

On Fri, 16 Aug 2019, Joel Fernandes wrote:
> On Fri, Aug 16, 2019 at 3:19 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Fri, 16 Aug 2019, Mathieu Desnoyers wrote:
> >
> > > If you choose not to use READ_ONCE(), then the "load tearing" issue can
> > > cause similar spurious 1 -> 0 -> 1 transitions near 16-bit counter
> > > overflow as described above. The "Invented load" also becomes an issue,
> > > because the compiler could use the loaded value for a branch, and re-load
> > > that value between two branches which are expected to use the same value,
> > > effectively generating a corrupted state.
> > >
> > > I think we need a statement about whether READ_ONCE/WRITE_ONCE should
> > > be used in this kind of situation, or if we are fine dealing with the
> > > awkward compiler side-effects when they occur.
> >
> > The only real downside (apart from readability) of READ_ONCE and
> > WRITE_ONCE is that they prevent the compiler from optimizing accesses
> > to the location being read or written.  But if you're just doing a
> > single access in each place, not multiple accesses, then there's
> > nothing to optimize anyway.  So there's no real reason not to use
> > READ_ONCE or WRITE_ONCE.
> 
> I am also more on the side of using *_ONCE. To me, on principle, I
> would be willing to convert any concurrent plain access to use _ONCE,
> just so we don't have to worry about it now or in the future, and it
> also documents the access.

By that argumentation we need to plaster half of the kernel with _ONCE()
and I'm so not looking forward to the insane amount of script-kiddie
patches to do that.

Can we finally put a foot down and tell compiler and standard committee
people to stop this insanity?

Thanks,

	tglx


* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 20:44               ` Joel Fernandes
  2019-08-16 20:49                 ` Thomas Gleixner
@ 2019-08-16 20:49                 ` Steven Rostedt
  2019-08-16 20:59                   ` Joel Fernandes
                                     ` (2 more replies)
  1 sibling, 3 replies; 60+ messages in thread
From: Steven Rostedt @ 2019-08-16 20:49 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Alan Stern, Mathieu Desnoyers, Valentin Schneider, linux-kernel,
	Peter Zijlstra, Thomas Gleixner, paulmck, Boqun Feng,
	Will Deacon, David Howells, Linus Torvalds

On Fri, 16 Aug 2019 16:44:10 -0400
Joel Fernandes <joel@joelfernandes.org> wrote:


> I am also more on the side of using *_ONCE. To me, on principle, I
> would be willing to convert any concurrent plain access to use _ONCE,
> just so we don't have to worry about it now or in the future, and it
> also documents the access.
> 
> Perhaps the commit message can be reworded to mention that the _ONCE
> is an additional cleanup for safe access.

The most I'll take is two separate patches. One is going to be marked
for stable as it fixes a real bug. The other is more for cosmetic or
theoretical issues, that I will state clearly "NOT FOR STABLE", such
that the autosel doesn't take them.

-- Steve


* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 20:49                 ` Thomas Gleixner
@ 2019-08-16 20:57                   ` Joel Fernandes
  2019-08-16 22:27                     ` Valentin Schneider
  2019-08-16 21:04                   ` Linus Torvalds
  1 sibling, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2019-08-16 20:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Alan Stern, Mathieu Desnoyers, rostedt, Valentin Schneider,
	linux-kernel, Peter Zijlstra, paulmck, Boqun Feng, Will Deacon,
	David Howells, Linus Torvalds

On Fri, Aug 16, 2019 at 10:49:04PM +0200, Thomas Gleixner wrote:
> On Fri, 16 Aug 2019, Joel Fernandes wrote:
> > On Fri, Aug 16, 2019 at 3:19 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> > > On Fri, 16 Aug 2019, Mathieu Desnoyers wrote:
> > >
> > > > If you choose not to use READ_ONCE(), then the "load tearing" issue can
> > > > cause similar spurious 1 -> 0 -> 1 transitions near 16-bit counter
> > > > overflow as described above. The "Invented load" also becomes an issue,
> > > > because the compiler could use the loaded value for a branch, and re-load
> > > > that value between two branches which are expected to use the same value,
> > > > effectively generating a corrupted state.
> > > >
> > > > I think we need a statement about whether READ_ONCE/WRITE_ONCE should
> > > > be used in this kind of situation, or if we are fine dealing with the
> > > > awkward compiler side-effects when they occur.
> > >
> > > The only real downside (apart from readability) of READ_ONCE and
> > > WRITE_ONCE is that they prevent the compiler from optimizing accesses
> > > to the location being read or written.  But if you're just doing a
> > > single access in each place, not multiple accesses, then there's
> > > nothing to optimize anyway.  So there's no real reason not to use
> > > READ_ONCE or WRITE_ONCE.
> > 
> > I am also more on the side of using *_ONCE. To me, on principle, I
> > would be willing to convert any concurrent plain access to use _ONCE,
> > just so we don't have to worry about it now or in the future, and it
> > also documents the access.
> 
> By that argumentation we need to plaster half of the kernel with _ONCE()
> and I'm so not looking forward to the insane amount of script-kiddie
> patches to do that.

Really? It is quite scary that you are saying half of the kernel has issues
with concurrent access or compiler optimizations. If that is the case, it
scares me that the compiler can tear a store or load and existing code can
just fail.

> Can we finally put a foot down and tell compiler and standard committee
> people to stop this insanity?

Sure, or could the compilers provide flags which prevent such optimization
similar to -O* flags?

thanks,

 - Joel



* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 20:49                 ` Steven Rostedt
@ 2019-08-16 20:59                   ` Joel Fernandes
  2019-08-17  1:25                   ` Mathieu Desnoyers
  2019-08-18  9:15                   ` stable markup was " Pavel Machek
  2 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2019-08-16 20:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alan Stern, Mathieu Desnoyers, Valentin Schneider, linux-kernel,
	Peter Zijlstra, Thomas Gleixner, paulmck, Boqun Feng,
	Will Deacon, David Howells, Linus Torvalds

On Fri, Aug 16, 2019 at 04:49:12PM -0400, Steven Rostedt wrote:
> On Fri, 16 Aug 2019 16:44:10 -0400
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> 
> > I am also more on the side of using *_ONCE. To me, on principle, I
> > would be willing to convert any concurrent plain access to use _ONCE,
> > just so we don't have to worry about it now or in the future, and it
> > also documents the access.
> > 
> > Perhaps the commit message can be reworded to mention that the _ONCE
> > is an additional cleanup for safe access.
> 
> The most I'll take is two separate patches. One is going to be marked
> for stable as it fixes a real bug. The other is more for cosmetic or
> theoretical issues, that I will state clearly "NOT FOR STABLE", such
> that the autosel doesn't take them.

Makes sense to me!

thanks,

 - Joel




* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 20:49                 ` Thomas Gleixner
  2019-08-16 20:57                   ` Joel Fernandes
@ 2019-08-16 21:04                   ` Linus Torvalds
  2019-08-17  1:36                     ` Mathieu Desnoyers
  1 sibling, 1 reply; 60+ messages in thread
From: Linus Torvalds @ 2019-08-16 21:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Joel Fernandes, Alan Stern, Mathieu Desnoyers, rostedt,
	Valentin Schneider, linux-kernel, Peter Zijlstra, paulmck,
	Boqun Feng, Will Deacon, David Howells

On Fri, Aug 16, 2019 at 1:49 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Can we finally put a foot down and tell compiler and standard committee
> people to stop this insanity?

It's already effectively done.

Yes, values can be read from memory multiple times if they need
reloading. So "READ_ONCE()" when the value can change is a damn good
idea.

But it should only be used if the value *can* change. Inside a locked
region it is actively pointless and misleading.

Similarly, WRITE_ONCE() should only be used if you have a _reason_ for
using it (notably if you're not holding a lock).

If people use READ_ONCE/WRITE_ONCE when there are locks that prevent
the values from changing, they are only making the code illegible.
Don't do it.

But in the *absence* of locking, READ_ONCE/WRITE_ONCE is usually a
good thing.  The READ_ONCE actually tends to matter, because even if
the value is used only once at a source level, the compiler *could*
decide to do something else.

The WRITE_ONCE() may or may not matter (afaik, thanks to concurrency,
modern C standard does not allow optimistic writes anyway, and we
wouldn't really accept such a compiler option if it did).

But if the write is done without locking, it's good practice just to
show you are aware of the whole "done without locking" part.

              Linus


* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 20:57                   ` Joel Fernandes
@ 2019-08-16 22:27                     ` Valentin Schneider
  2019-08-16 22:57                       ` Linus Torvalds
  0 siblings, 1 reply; 60+ messages in thread
From: Valentin Schneider @ 2019-08-16 22:27 UTC (permalink / raw)
  To: Joel Fernandes, Thomas Gleixner
  Cc: Alan Stern, Mathieu Desnoyers, rostedt, linux-kernel,
	Peter Zijlstra, paulmck, Boqun Feng, Will Deacon, David Howells,
	Linus Torvalds

On 16/08/2019 21:57, Joel Fernandes wrote:
>> Can we finally put a foot down and tell compiler and standard committee
>> people to stop this insanity?
> 
> Sure, or could the compilers provide flags which prevent such optimization
> similar to -O* flags?
> 

How would you differentiate optimizations you want from those you don't with
just a flag? There's a reason we use volatile casts instead of declaring
everything volatile: we actually *want* those optimizations. It just so
happens that we don't want them *in some places*, and we have tools to tag
them as such.
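
The tag in question is just a volatile cast; simplified (the real
definitions in compiler.h carry extra machinery), the accessors amount
to:

#define READ_ONCE(x)	(*(const volatile typeof(x) *)&(x))
#define WRITE_ONCE(x, val) \
	do { *(volatile typeof(x) *)&(x) = (val); } while (0)

/*
 * The object itself stays non-volatile, so every untagged access
 * remains fully optimizable; only the cast access is pinned down.
 */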

The alternative is having a compiler that can magically correlate e.g. locked
writes with lock-free reads and properly handle them, but I don't think
there's a foolproof way of doing that.

> thanks,
> 
>  - Joel
> 


* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 22:27                     ` Valentin Schneider
@ 2019-08-16 22:57                       ` Linus Torvalds
  2019-08-17  1:41                         ` Mathieu Desnoyers
  2019-08-17  4:52                         ` Paul E. McKenney
  0 siblings, 2 replies; 60+ messages in thread
From: Linus Torvalds @ 2019-08-16 22:57 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Joel Fernandes, Thomas Gleixner, Alan Stern, Mathieu Desnoyers,
	rostedt, linux-kernel, Peter Zijlstra, paulmck, Boqun Feng,
	Will Deacon, David Howells

On Fri, Aug 16, 2019 at 3:27 PM Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> How would you differentiate optimizations you want from those you don't with
> just a flag? There's a reason we use volatile casts instead of declaring
> everything volatile: we actually *want* those optimizations. It just so
> happens that we don't want them *in some places*, and we have tools to tag
> them as such.

We actually disable lots of "valid" (read: the standard allows them,
but they are completely wrong for the kernel) optimizations because
they are wrong.

The whole type-based alias thing is just wrong. The C standards body
was incompetent to allow that garbage. So we disable it.

If you can *prove* that no aliasing exists, go ahead and re-order
accesses. But no guesses based on random types.

Similarly, if some compiler decides that it's ok to make speculative
writes (knowing it will over-write it with the right data later) to
data that is possibly visible to other threads, then such an
"optimization" needs to just be disabled. It might help some
benchmark, and if you read the standard just the right way it might be
allowed - but that doesn't make it valid.

We already had situations like that, where compiler people thought it
would be ok (for example) to turn a narrow write into a wider
read-modify-write because it had already done the wider read for other
reasons.

Again, the original C standard "allows" that in theory, because the
original C standard doesn't take threading into account. In fact, the
alpha architecture made actively bad design decisions based on that
(incorrect) assumption.

It turns out that in that case, even non-kernel people rebelled, and
it's apparently thankfully not allowed in newer versions of the
standard, exactly because threading has become a thing. You can't
magically write back unrelated variables just because they might be
next-door neighbors and share a word.
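
A sketch of the kind of code that breaks under that assumption (struct
layout invented for illustration):

   struct shared {
           char a;         /* only ever written by thread A */
           char b;         /* only ever written by thread B */
   } *s;

   /* if thread A's store is compiled as a word-sized
    * read-modify-write of the memory containing both fields, it
    * can silently undo thread B's concurrent update of s->b */
   s->a = 1;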

So no, we do *not* in general just say that we want any random
optimizations. A compiler that turns a single write into something
else is almost certainly something that shouldn't be allowed near the
kernel.

We add READ_ONCE and WRITE_ONCE annotations when they make sense. Not
because of some theoretical "compiler is free to do garbage"
arguments. If such garbage happens, we need to fix the compiler, the
same way we already do with

  -fno-strict-aliasing
  -fno-delete-null-pointer-checks
  -fno-strict-overflow

because all those "optimizations" are just fundamentally unsafe and wrong.

I really wish the compiler would never take advantage of "I can prove
this is undefined behavior" kind of things when it comes to the kernel
(or any other projects I am involved with, for that matter). If you
can prove that, then you shouldn't decide to generate random code
without a big warning. But that's what those optimizations that we
disable effectively all do.

I'd love to have a flag that says "all undefined behavior is treated
as implementation-defined". There's a somewhat subtle - but very
important - difference there.

And that's what some hypothetical speculative write optimizations do
too. I do not believe they are valid for the kernel. If the code says

   if (a)
      global_var = 1
   else
      global_var = 0

then the compiler had better not turn that into

     global_var = 0
     if (a)
         global_var = 1

even if there isn't a volatile there. But yes, we've had compiler
writers that say "if you read the specs, that's ok".

No, it's not ok. Because reality trumps any weasel-spec-reading.

And happily, I don't think we've ever really seen a compiler that we
use that actually does the above kind of speculative write thing (but
doing it for your own local variables that can't be seen by other
threads of execution - go wild).

So in general, we very much expect the compiler to do sane code
generation, and not (for example) do store tearing on normal
word-sized things or add writes that weren't there originally etc.

And yes, reads are different from writes. Reads don't have the same
kind of "other threads of execution can see them" effects, so a
compiler turning a single read into multiple reads is much more
realistic and not the same kind of "we need to expect a certain kind
of sanity from the compiler" issue.

              Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 20:49                 ` Steven Rostedt
  2019-08-16 20:59                   ` Joel Fernandes
@ 2019-08-17  1:25                   ` Mathieu Desnoyers
  2019-08-18  9:15                   ` stable markup was " Pavel Machek
  2 siblings, 0 replies; 60+ messages in thread
From: Mathieu Desnoyers @ 2019-08-17  1:25 UTC (permalink / raw)
  To: rostedt
  Cc: Joel Fernandes (Google), Alan Stern, Valentin Schneider,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, paulmck,
	Boqun Feng, Will Deacon, David Howells, Linus Torvalds

----- On Aug 16, 2019, at 4:49 PM, rostedt rostedt@goodmis.org wrote:

> On Fri, 16 Aug 2019 16:44:10 -0400
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> 
>> I am also more on the side of using *_ONCE. To me, on principle, I
>> would be willing to convert any concurrent plain access using _ONCE,
>> just so we don't have to worry about it now or in the future and also
>> documents the access.
>> 
>> Perhaps the commit message can be reworded to mention that the _ONCE
>> is an additional clean up for safe access.
> 
> The most I'll take is two separate patches. One is going to be marked
> for stable as it fixes a real bug. The other is more for cosmetic or
> theoretical issues, that I will state clearly "NOT FOR STABLE", such
> that the autosel doesn't take them.

Splitting this into two separate patches makes perfect sense.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 21:04                   ` Linus Torvalds
@ 2019-08-17  1:36                     ` Mathieu Desnoyers
  2019-08-17  2:13                       ` Steven Rostedt
  2019-08-17  8:08                       ` Linus Torvalds
  0 siblings, 2 replies; 60+ messages in thread
From: Mathieu Desnoyers @ 2019-08-17  1:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Joel Fernandes (Google), Alan Stern, rostedt,
	Valentin Schneider, linux-kernel, Peter Zijlstra, paulmck,
	Boqun Feng, Will Deacon, David Howells

----- On Aug 16, 2019, at 5:04 PM, Linus Torvalds torvalds@linux-foundation.org wrote:

> On Fri, Aug 16, 2019 at 1:49 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> Can we finally put a foot down and tell compiler and standard committee
>> people to stop this insanity?
> 
> It's already effectively done.
> 
> Yes, values can be read from memory multiple times if they need
> reloading. So "READ_ONCE()" when the value can change is a damn good
> idea.
> 
> But it should only be used if the value *can* change. Inside a locked
> region it is actively pointless and misleading.
> 
> Similarly, WRITE_ONCE() should only be used if you have a _reason_ for
> using it (notably if you're not holding a lock).
> 
> If people use READ_ONCE/WRITE_ONCE when there are locks that prevent
> the values from changing, they are only making the code illegible.
> Don't do it.

I agree with your argument in the case where both read-side and write-side
are protected by the same lock: READ/WRITE_ONCE are useless then. However,
in the scenario we have here, only the write-side is protected by the lock
against concurrent updates, but reads don't take any lock.

If WRITE_ONCE has any use at all (protecting against store tearing and
invented stores), it should be used even with a lock held in this
scenario, because the lock does not prevent READ_ONCE() from observing
transient states caused by lack of WRITE_ONCE() for the update.

So why does WRITE_ONCE exist in the first place ? Is it for documentation
purposes only or are there actual situations where omitting it can cause
bugs with real-life compilers ?

In terms of code change, should we favor only introducing WRITE_ONCE
in new code, or should existing code matching those conditions be
moved to WRITE_ONCE as bug fixes ?

Thanks,

Mathieu

> 
> But in the *absence* of locking, READ_ONCE/WRITE_ONCE is usually a
> good thing.  The READ_ONCE actually tends to matter, because even if
> the value is used only once at a source level, the compiler *could*
> decide to do something else.
> 
> The WRITE_ONCE() may or may not matter (afaik, thanks to concurrency,
> modern C standard does not allow optimistic writes anyway, and we
> wouldn't really accept such a compiler option if it did).
> 
> But if the write is done without locking, it's good practice just to
> show you are aware of the whole "done without locking" part.
> 
>               Linus

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 22:57                       ` Linus Torvalds
@ 2019-08-17  1:41                         ` Mathieu Desnoyers
  2019-08-17  4:52                         ` Paul E. McKenney
  1 sibling, 0 replies; 60+ messages in thread
From: Mathieu Desnoyers @ 2019-08-17  1:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Valentin Schneider, Joel Fernandes (Google), Thomas Gleixner,
	Alan Stern, rostedt, linux-kernel, Peter Zijlstra, paulmck,
	Boqun Feng, Will Deacon, David Howells

----- On Aug 16, 2019, at 6:57 PM, Linus Torvalds torvalds@linux-foundation.org wrote:

> So in general, we very much expect the compiler to do sane code
> generation, and not (for example) do store tearing on normal
> word-sized things or add writes that weren't there originally etc.

My understanding of https://lwn.net/Articles/793253/ section "Store tearing"
which points at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56028 seems
to contradict your expectation at least when writing constants to a
64-bit word without a volatile access.
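
A sketch of the reported pattern, on a 64-bit kernel (the variable name
is invented for illustration):

   u64 *p = &some_shared_u64;

   /* gcc was observed emitting this as two 32-bit immediate stores,
    * so a lockless reader could see a half-updated value: */
   *p = 0x0123456789abcdefULL;

   /* the volatile access forces a single 64-bit store: */
   WRITE_ONCE(*p, 0x0123456789abcdefULL);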

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17  1:36                     ` Mathieu Desnoyers
@ 2019-08-17  2:13                       ` Steven Rostedt
  2019-08-17 14:40                         ` Mathieu Desnoyers
  2019-08-17  8:08                       ` Linus Torvalds
  1 sibling, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2019-08-17  2:13 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, Thomas Gleixner, Joel Fernandes (Google),
	Alan Stern, Valentin Schneider, linux-kernel, Peter Zijlstra,
	paulmck, Boqun Feng, Will Deacon, David Howells

On Fri, 16 Aug 2019 21:36:49 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- On Aug 16, 2019, at 5:04 PM, Linus Torvalds torvalds@linux-foundation.org wrote:
> 
> > On Fri, Aug 16, 2019 at 1:49 PM Thomas Gleixner <tglx@linutronix.de> wrote:  
> >>
> >> Can we finally put a foot down and tell compiler and standard committee
> >> people to stop this insanity?  
> > 
> > It's already effectively done.
> > 
> > Yes, values can be read from memory multiple times if they need
> > reloading. So "READ_ONCE()" when the value can change is a damn good
> > idea.
> > 
> > But it should only be used if the value *can* change. Inside a locked
> > region it is actively pointless and misleading.
> > 
> > Similarly, WRITE_ONCE() should only be used if you have a _reason_ for
> > using it (notably if you're not holding a lock).
> > 
> > If people use READ_ONCE/WRITE_ONCE when there are locks that prevent
> > the values from changing, they are only making the code illegible.
> > Don't do it.  
> 
> I agree with your argument in the case where both read-side and write-side
> are protected by the same lock: READ/WRITE_ONCE are useless then. However,
> in the scenario we have here, only the write-side is protected by the lock
> against concurrent updates, but reads don't take any lock.

And because reads are not protected by any lock or memory barrier,
using READ_ONCE() is pointless. The CPU could be doing a lot of hidden
manipulation of that variable too.

Again, this is just to enable caching of the comm. Nothing more. It's a
"best effort" algorithm. We get it, we then can map a pid to a name. If
not, we just have a pid and we map "<...>".

Next you'll be asking for the memory barriers to guarantee a real hit.
And honestly, this information is not worth any overhead.

And most often we enable this before we enable the tracepoint we want
this information from, which requires modification of the text area and
will do a bunch of syncs across CPUs. That alone will most likely keep
any race from happening.

The only real bug is that the check to know if we should add the probe
or not was done outside the lock, and when we hit the race, we could add a
probe twice, causing the kernel to spit out a warning. You fixed that,
and that's all that needs to be done. I'm now even more against adding
the READ_ONCE() or WRITE_ONCE().


-- Steve



> 
> If WRITE_ONCE has any use at all (protecting against store tearing and
> invented stores), it should be used even with a lock held in this
> scenario, because the lock does not prevent READ_ONCE() from observing
> transient states caused by lack of WRITE_ONCE() for the update.
> 
> So why does WRITE_ONCE exist in the first place ? Is it for documentation
> purposes only or are there actual situations where omitting it can cause
> bugs with real-life compilers ?
> 
> In terms of code change, should we favor only introducing WRITE_ONCE
> in new code, or should existing code matching those conditions be
> moved to WRITE_ONCE as bug fixes ?
> 
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 22:57                       ` Linus Torvalds
  2019-08-17  1:41                         ` Mathieu Desnoyers
@ 2019-08-17  4:52                         ` Paul E. McKenney
  2019-08-17  8:28                           ` Linus Torvalds
  2019-08-20 14:01                           ` Peter Zijlstra
  1 sibling, 2 replies; 60+ messages in thread
From: Paul E. McKenney @ 2019-08-17  4:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Valentin Schneider, Joel Fernandes, Thomas Gleixner, Alan Stern,
	Mathieu Desnoyers, rostedt, linux-kernel, Peter Zijlstra,
	Boqun Feng, Will Deacon, David Howells

On Fri, Aug 16, 2019 at 03:57:43PM -0700, Linus Torvalds wrote:

[ . . . ]

> We add READ_ONCE and WRITE_ONCE annotations when they make sense. Not
> because of some theoretical "compiler is free to do garbage"
> arguments. If such garbage happens, we need to fix the compiler, the
> same way we already do with
> 
>   -fno-strict-aliasing

Yeah, the compete-with-FORTRAN stuff.  :-/

There is some work going on in the C committee on this, where the
theorists would like to restrict strict-alias based optimizations to
enable better analysis tooling.  And no, although the theorists are
pushing in the direction we would like them to, as far as I can see
they are not pushing as far as we would like.  But it might be that
-fno-strict-aliasing needs some upgrades as well.  I expect to learn
more at the next meeting in a few months.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf

>   -fno-delete-null-pointer-checks
>   -fno-strict-overflow
> 
> because all those "optimizations" are just fundamentally unsafe and wrong.

I was hoping that -fno-strict-overflow might go into the C++ (not C)
standard, and even thought that it had at one point, but what went into
the standard was that signed integers are twos complement, not that
overflowing them is well defined.

We are pushing to end, or at least to restrict, the current
pointer lifetime-end zap behavior in both C and C++, which is similar
to the pointer provenance/alias analysis that -fno-strict-aliasing at
least partially suppresses.  The zapping invalidates loading, storing,
comparing, or doing arithmetic on a pointer to an object whose lifetime
has ended.  (The WG14 PDF linked to below gives a non-exhaustive list
of problems that this causes.)

The Linux kernel used to avoid this by refusing to tell the compiler about
kmalloc() and friends, but that changed a few years ago.  This zapping
rules out a non-trivial class of concurrent algorithms, but for once
RCU is unaffected.  Some committee members complained that zapping has
been part of the standard since about 1990, but Maged Michael found a
writeup of one of the algorithms dating back to 1973.  ;-)

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2369.pdf
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1726r0.pdf

> I really wish the compiler would never take advantage of "I can prove
> this is undefined behavior" kind of things when it comes to the kernel
> (or any other projects I am involved with, for that matter). If you
> can prove that, then you shouldn't decide to generate random code
> without a big warning. But that's what those optimizations that we
> disable effectively all do.
> 
> I'd love to have a flag that says "all undefined behavior is treated
> as implementation-defined". There's a somewhat subtle - but very
> important - difference there.

It would also be nice to downgrade some of the undefined behavior in
the standard itself.  Compiler writers usually hate this because it
would require them to document what their compiler does in each case.
So they would prefer "unspecified" if they could not have "undefined".

> And that's what some hypothetical speculative write optimizations do
> too. I do not believe they are valid for the kernel. If the code says
> 
>    if (a)
>       global_var = 1
>    else
>       global_var = 0
> 
> then the compiler had better not turn that into
> 
>      global_var = 0
>      if (a)
>          global_var = 1
> 
> even if there isn't a volatile there. But yes, we've had compiler
> writers that say "if you read the specs, that's ok".
> 
> No, it's not ok. Because reality trumps any weasel-spec-reading.
> 
> And happily, I don't think we've ever really seen a compiler that we
> use that actually does the above kind of speculative write thing (but
> doing it for your own local variables that can't be seen by other
> threads of execution - go wild).

Me, I would still want to use WRITE_ONCE() in this case.

> So in general, we very much expect the compiler to do sane code
> generation, and not (for example) do store tearing on normal
> word-sized things or add writes that weren't there originally etc.
> 
> And yes, reads are different from writes. Reads don't have the same
> kind of "other threads of execution can see them" effects, so a
> compiler turning a single read into multiple reads is much more
> realistic and not the same kind of "we need to expect a certain kind
> of sanity from the compiler" issue.

Though each of those compiler-generated multiple reads might return a
different value, right?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17  1:36                     ` Mathieu Desnoyers
  2019-08-17  2:13                       ` Steven Rostedt
@ 2019-08-17  8:08                       ` Linus Torvalds
  2019-08-20 13:56                         ` Peter Zijlstra
  1 sibling, 1 reply; 60+ messages in thread
From: Linus Torvalds @ 2019-08-17  8:08 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Thomas Gleixner, Joel Fernandes, Alan Stern, rostedt,
	Valentin Schneider, linux-kernel, Peter Zijlstra, paulmck,
	Boqun Feng, Will Deacon, David Howells

On Fri, Aug 16, 2019, 18:36 Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> If WRITE_ONCE has any use at all (protecting against store tearing and
> invented stores), it should be used even with a lock held in this
> scenario, because the lock does not prevent READ_ONCE() from observing
> transient states caused by lack of WRITE_ONCE() for the update.

The thing is, we really haven't required WRITE_ONCE() to protect
against store tearing.

We have lots of traditional code that does stuff along the lines of

   .. set up data structure ..
   smp_wmb();
   *ptr = newdata;

and we simply *depend* on the compiler doing the "*ptr" as a single
write. We've simply always done that.  Even on UP we've had the
"interrupts will see old value or new value but not some random
half-way value", going all the way back to the original kernel
sources.
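
For completeness, a sketch of the lockless read side of that idiom:

   /* reader, no lock held */
   p = READ_ONCE(*ptr);    /* observes the old or the new pointer */
   if (p)
           use(p->field);  /* the address dependency (plus the Alpha
                            * barrier folded into READ_ONCE()) orders
                            * this load after the pointer fetch */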

The data tearing issue is almost a non-issue. We're not going to add
WRITE_ONCE() to these kinds of places for no good reason.

> So why does WRITE_ONCE exist in the first place ? Is it for documentation
> purposes only or are there actual situations where omitting it can cause
> bugs with real-life compilers ?

WRITE_ONCE should be seen mainly as (a) documentation and (b) for new code.

Although I suspect often you'd be better off using smp_load_acquire()
and smp_store_release() when you have code sequences where you have
unlocked READ_ONCE/WRITE_ONCE and memory ordering.

WRITE_ONCE() doesn't even protect against data tearing. If you do a
"WRITE_ONCE()" on a type larger than 8 bytes, it will fall back to
__builtin_memcpy().

So honestly, WRITE_ONCE() is often more documentation than protecting
against something, but overdoing it doesn't help document anything, it
just obscures the point.

Yeah, yeah, it will use a "volatile" access for the proper normal
types, but even then that won't protect against data tearing on 64-bit
writes on a 32-bit machine, for example. It doesn't even give ordering
guarantees for the sub-words.
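
A sketch of that 32-bit case:

   u64 seq;        /* shared, on a 32-bit kernel */

   /* still compiled as two 32-bit volatile stores; a lockless
    * reader can observe one half updated and the other not: */
   WRITE_ONCE(seq, 0x100000001ULL);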

So you still end up with almost the same basic rule: a natural
byte/word write will be atomic. But really, it's going to be so even
without the WRITE_ONCE(), so...

It does ostensibly protect against the compiler re-ordering the write
against other writes (note: *compiler*, not CPU), and it does make
sure the write only happens once. But it's really hard to see a valid
compiler optimization that wouldn't do that anyway.

Because of the compiler ordering of WRITE_ONCE against other
WRITE_ONCE cases, it could be used for irq-safe ordering on the local
cpu, for example. If that matters. Although I suspect any such case is
practically going to use per-cpu variables anyway.

End result: it's *mostly* about documentation.

Just do a grep for WRITE_ONCE() vs READ_ONCE(). You'll find a lot more
users of the latter. And quite frankly, I looked at some of the
WRITE_ONCE() users and a lot of them were kind of pointless.

Somebody tell me just exactly how they expect the WRITE_ONCE() cases
in net/tls/tls_sw.c to matter, for example (and that was literally a
random case I happened to look at). It's not clear what they do, since
they certainly don't imply any memory ordering. They do imply a
certain amount of compiler re-ordering due to the volatile, but that's
pretty limited too. Only wrt _other_ things with side effects, not the
normal code around them anyway.

In contrast, READ_ONCE() has very practical examples of mattering,
because unlike writes, compilers really can reasonably split reads.
For example, if you're testing multiple bits in the same word, and you
want to make sure they are coherent, you should very much do

   val = READ_ONCE(mem);
   .. test different bits in val ..

because without the READ_ONCE(), the compiler could end up just doing
the read multiple times.

Similarly, even if you only use a value once, this is actually
something a compiler can do:

    if (a) {
         lock();
         B();
         unlock();
    } else
         B();

and a compiler might end up generating that as

   if (a) lock();
   B();
   if (a) unlock();

instead. So doing "if (READ_ONCE(a))" actually makes a difference - it
guarantees that 'a' is only read once. Even if that value was
_literally_ only used once at a source level, the compiler really could
have decided "let's read it twice".

See how duplicating a read is fundamentally different from duplicating
a write? Duplicating or splitting a read is not visible to other
threads. Notice how nothing like the above is reasonable for a write.

That said, the most common case really is none of the above half-way
subtle cases. It's literally the whole "write pointer once". Making
existing code that does that use WRITE_ONCE() is completely pointless.
It's not going to really change or fix anything at all.

                 Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17  4:52                         ` Paul E. McKenney
@ 2019-08-17  8:28                           ` Linus Torvalds
  2019-08-17  8:44                             ` Linus Torvalds
  2019-08-17 22:28                             ` Paul E. McKenney
  2019-08-20 14:01                           ` Peter Zijlstra
  1 sibling, 2 replies; 60+ messages in thread
From: Linus Torvalds @ 2019-08-17  8:28 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Valentin Schneider, Joel Fernandes, Thomas Gleixner, Alan Stern,
	Mathieu Desnoyers, rostedt, linux-kernel, Peter Zijlstra,
	Boqun Feng, Will Deacon, David Howells

On Fri, Aug 16, 2019 at 9:52 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote:
>
> > I'd love to have a flag that says "all undefined behavior is treated
> > as implementation-defined". There's a somewhat subtle - but very
> > important - difference there.
>
> It would also be nice to downgrade some of the undefined behavior in
> the standard itself.  Compiler writers usually hate this because it
> would require them to document what their compiler does in each case.
> So they would prefer "unspecified" if the could not have "undefined".

That certainly would sound very good to me.

It's not the "documented behavior" that is important to me (since it
is still going to be potentially different across different
platforms), it's the "at least it's *some* consistent behavior for
that platform".

That's just _so_ much better than "undefined behavior" which basically
says the compiler writer can do absolutely anything, whether it makes
sense or not, and whether it might open a small bug up to absolutely
catastrophic end results.

> >    if (a)
> >       global_var = 1
> >    else
> >       global_var = 0
>
> Me, I would still want to use WRITE_ONCE() in this case.

I actually suspect that if we had this kind of code, and a real reason
why readers would see it locklessly, then yes, WRITE_ONCE() makes
sense.

But most of the cases where people add WRITE_ONCE() aren't even the
above kind of half-questionable cases. They are just literally

        WRITE_ONCE(flag, value);

and since there is no real memory ordering implied by this, what's the
actual value of that statement? What problem does it really solve wrt
just writing it as

        flag = value;

which is generally a lot easier to read.

If the flag has semantic behavior wrt other things, and the write can
race with a read (whether it's the read or the write that is unlocked
is irrelevant), it still doesn't tend to make a lot of real
difference.

In the end, the most common reason for a WRITE_ONCE() is mostly just
"to visually pair up with the non-synchronized read that uses
READ_ONCE()"

Put another way: a WRITE_ONCE() without a paired READ_ONCE() is almost
certainly pointless.

But the reverse is not really true. All a READ_ONCE() says is "I want
either the old or the new value", and it can get that _without_ being
paired with a WRITE_ONCE().

See? They just aren't equally important.

> > And yes, reads are different from writes. Reads don't have the same
> > kind of "other threads of execution can see them" effects, so a
> > compiler turning a single read into multiple reads is much more
> > realistic and not the same kind of "we need to expect a certain kind
> > of sanity from the compiler" issue.
>
> Though each of those compiler-generated multiple reads might return a
> different value, right?

Right. See the examples I have in the email to Mathieu:

   unsigned int bits = some_global_value;
   ...test different bits in 'bits' ...

can easily cause multiple reads (particularly on a CPU that has a
"test bits in memory" instruction and a lack of registers.

So then doing it as

   unsigned int bits = READ_ONCE(some_global_value);
   .. test different bits in 'bits'...

makes a real and obvious semantic difference. In ways that changing a one-time

   ptr->flag = true;

to

   WRITE_ONCE(ptr->flag, true);

does _not_ make any obvious semantic difference what-so-ever.

Caching reads is also something that makes sense and is common, in
ways that caching writes does not. So doing

    while (in_progress_global) /* twiddle your thumbs */;

obviously trivially translates to an infinite loop with a single
conditional in front of it, in ways that

   while (READ_ONCE(in_progress_global)) /* twiddle */;

does not.

So there are often _obvious_ reasons to use READ_ONCE().

I really do not find the same to be true of WRITE_ONCE(). There are
valid uses, but they are definitely much less common, and much less
obvious.

                Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17  8:28                           ` Linus Torvalds
@ 2019-08-17  8:44                             ` Linus Torvalds
  2019-08-17 15:02                               ` Mathieu Desnoyers
  2019-08-17 22:28                             ` Paul E. McKenney
  1 sibling, 1 reply; 60+ messages in thread
From: Linus Torvalds @ 2019-08-17  8:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Valentin Schneider, Joel Fernandes, Thomas Gleixner, Alan Stern,
	Mathieu Desnoyers, rostedt, linux-kernel, Peter Zijlstra,
	Boqun Feng, Will Deacon, David Howells

On Sat, Aug 17, 2019 at 1:28 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>    unsigned int bits = some_global_value;
>    ...test different bits in 'bits' ...
>
> can easily cause multiple reads (particularly on a CPU that has a
> "test bits in memory" instruction and a lack of registers.
>
> So then doing it as
>
>    unsigned int bits = READ_ONCE(some_global_value);
>    .. test different bits in 'bits'...

Side note: this is likely the best example of actual WRITE_ONCE() use
too: if you have that global value with multiple bits that actually
have some interdependencies, then doing

    some_global_value = some_complex_expression();

might be reasonably compiled to do several rmw instructions to update
'some_global_value'

So then

   WRITE_ONCE(some_global_value, some_complex_expression());

really can be a good thing - it clearly just writes things once, and
it also documents the whole "write one or the other" value, not some
mid-way one, when you then look at the READ_ONCE() thing.

But I'm seeing a lot of WRITE_ONCE(x, constantvalue) kind of things
and don't seem to find a lot of reason to think that they are any
inherently better than "x = constantvalue".

(In contrast, using "smp_store_release(&flag, true)" has inherent
value, because it actually implies a memory barrier wrt previous
writes, in ways that WRITE_ONCE() or a direct assignment does not.)
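
A sketch of that contrast (struct and names invented):

   /* publisher: everything written before the release is visible
    * to whoever observes ready == true via the acquire below */
   setup_payload(&obj->payload);
   smp_store_release(&obj->ready, true);

   /* observer */
   if (smp_load_acquire(&obj->ready))
           use_payload(&obj->payload);

   /* WRITE_ONCE(obj->ready, true) would promise no such ordering
    * of the payload writes against the flag store */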

Ok, enough blathering. I think I've made my point.

              Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 19:15         ` Steven Rostedt
@ 2019-08-17 14:27           ` Mathieu Desnoyers
  2019-08-17 15:42             ` Steven Rostedt
  0 siblings, 1 reply; 60+ messages in thread
From: Mathieu Desnoyers @ 2019-08-17 14:27 UTC (permalink / raw)
  To: rostedt
  Cc: linux-kernel, Joel Fernandes (Google), Peter Zijlstra,
	Thomas Gleixner, paulmck, Boqun Feng, Will Deacon, David Howells,
	Alan Stern, Linus Torvalds

----- On Aug 16, 2019, at 3:15 PM, rostedt rostedt@goodmis.org wrote:

> On Fri, 16 Aug 2019 13:19:20 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> ----- On Aug 16, 2019, at 12:25 PM, rostedt rostedt@goodmis.org wrote:
>> 
>> > On Fri, 16 Aug 2019 10:26:43 -0400 Mathieu Desnoyers
>> > <mathieu.desnoyers@efficios.com> wrote:
>> >   
>> [...]
>> >> 
>> >> Also, write and read to/from those variables should be done with
>> >> WRITE_ONCE() and READ_ONCE(), given that those are read within tracing
>> >> probes without holding the sched_register_mutex.
>> >>   
>> > 
>> > I understand the READ_ONCE() but is the WRITE_ONCE() truly necessary?
>> > It's done while holding the mutex. It's not that critical of a path,
>> > and makes the code look ugly.
>> 
>> The update is done while holding the mutex, but the read-side does not
>> hold that mutex, so it can observe the intermediate state caused by
>> store-tearing or invented stores which can be generated by the compiler
>> on the update-side.
>> 
>> Please refer to the following LWN article:
>> 
>> https://lwn.net/Articles/793253/
>> 
>> Sections:
>> - "Store tearing"
>> - "Invented stores"
>> 
>> Arguably, based on that article, store tearing is only observed in the
>> wild for constants (which is not the case here), and invented stores
>> seem to require specific code patterns. But I wonder why we would ever want to
>> pair a fragile non-volatile store with a READ_ONCE() ? Considering the pain
>> associated to reproduce and hunt down this kind of issue in the wild, I would
>> be tempted to enforce that any READ_ONCE() operating on a variable would either
>> need to be paired with WRITE_ONCE() or with atomic operations, so those can
>> eventually be validated by static code checkers and code sanitizers.
> 
> My issue is that this is just a case to decide if we should cache a
> comm or not. It's a helper, nothing more. There's no guarantee that
> something will get cached.

I get your point wrt WRITE_ONCE(): since it's a cache it should not have
user-visible effects if a temporary incorrect value is observed. Well in
reality, it's not a cache: if the lookup fails, it returns "<...>" instead,
so cache lookup failure ends up not providing any useful data in the trace.
Let's assume this is a known and documented tracer limitation.

However, wrt READ_ONCE(), things are different. The variable read ends up
being used to control various branches in the code, and the compiler could
decide to re-fetch the variable (with a different state), and therefore
cause _some_ of the branches to be inconsistent. See
tracing_record_taskinfo_sched_switch() and tracing_record_taskinfo() @flags
parameter.

AFAIU the current code should not generate any out-of-bound writes in case of
re-fetch, but no comment in there documents how fragile this is.
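
A sketch of the concern (flag names invented; the real functions test
bits of @flags much like this):

   int flags = taskinfo_flags;     /* plain load: may be re-fetched */

   if (flags & RECORD_CMDLINE)     /* could test one value here ... */
           save_cmdline(task);
   if (flags & RECORD_TGID)        /* ... and a re-fetched one here */
           save_tgid(task);

   /* whereas READ_ONCE(taskinfo_flags) pins a single snapshot
    * for both tests */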

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17  2:13                       ` Steven Rostedt
@ 2019-08-17 14:40                         ` Mathieu Desnoyers
  2019-08-17 15:26                           ` Steven Rostedt
  0 siblings, 1 reply; 60+ messages in thread
From: Mathieu Desnoyers @ 2019-08-17 14:40 UTC (permalink / raw)
  To: rostedt
  Cc: Linus Torvalds, Thomas Gleixner, Joel Fernandes (Google),
	Alan Stern, Valentin Schneider, linux-kernel, Peter Zijlstra,
	paulmck, Boqun Feng, Will Deacon, David Howells

----- On Aug 16, 2019, at 10:13 PM, rostedt rostedt@goodmis.org wrote:

> On Fri, 16 Aug 2019 21:36:49 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> ----- On Aug 16, 2019, at 5:04 PM, Linus Torvalds torvalds@linux-foundation.org
>> wrote:
>> 
>> > On Fri, Aug 16, 2019 at 1:49 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>> >>
>> >> Can we finally put a foot down and tell compiler and standard committee
>> >> people to stop this insanity?
>> > 
>> > It's already effectively done.
>> > 
>> > Yes, values can be read from memory multiple times if they need
>> > reloading. So "READ_ONCE()" when the value can change is a damn good
>> > idea.
>> > 
>> > But it should only be used if the value *can* change. Inside a locked
>> > region it is actively pointless and misleading.
>> > 
>> > Similarly, WRITE_ONCE() should only be used if you have a _reason_ for
>> > using it (notably if you're not holding a lock).
>> > 
>> > If people use READ_ONCE/WRITE_ONCE when there are locks that prevent
>> > the values from changing, they are only making the code illegible.
>> > Don't do it.
>> 
>> I agree with your argument in the case where both read-side and write-side
>> are protected by the same lock: READ/WRITE_ONCE are useless then. However,
>> in the scenario we have here, only the write-side is protected by the lock
>> against concurrent updates, but reads don't take any lock.
> 
> And because reads are not protected by any lock or memory barrier,
> using READ_ONCE() is pointless. The CPU could be doing a lot of hidden
> manipulation of that variable too.

Please enlighten me by providing some details on what the CPU could do to
this word-aligned, word-sized variable in the absence of lock and barrier
that is relevant to this specific use-case ?

I suspect most of the barriers you refer to here are taken care of by the
tracepoint code which uses RCU to synchronize probe registration wrt
probe callback execution.

> 
> Again, this is just to enable caching of the comm. Nothing more. It's a
> "best effort" algorithm. We get it, we then can map a pid to a name. If
> not, we just have a pid and we map "<...>".
> 
> Next you'll be asking for the memory barriers to guarantee a real hit.
> And honestly, this information is not worth any overhead.

No, my intent is not to add overhead to guarantee trace data
availability near the beginning and end of a trace. However, considering that
READ_ONCE() and WRITE_ONCE() can provide additional data availability
guarantees in the middle of traces at no runtime cost, it seems like a
good trade off.

It's easier for an analysis to disregard missing information at the
beginning and end of a trace without generating false-positive reports
than when it happens spuriously in the middle of traces.

> 
> And most often we enable this before we enable the tracepoint we want
> this information from, which requires modification of the text area and
> will do a bunch of syncs across CPUs. That alone will most likely keep
> any race from happening.

Indeed the tracepoint use of RCU to synchronize registration vs probes
should take care of those barriers.

> 
> The only real bug is that the check to know if we should add the probe
> or not was done outside the lock, and when we hit the race, we could add a
> probe twice, causing the kernel to spit out a warning. You fixed that,
> and that's all that needs to be done.

I just sent that fix in a separate patch.

> I'm now even more against adding the READ_ONCE() or WRITE_ONCE().

I'm not convinced by your arguments.

Thanks,

Mathieu

> 
> 
> -- Steve
> 
> 
> 
>> 
>> If WRITE_ONCE has any use at all (protecting against store tearing and
>> invented stores), it should be used even with a lock held in this
>> scenario, because the lock does not prevent READ_ONCE() from observing
>> transient states caused by lack of WRITE_ONCE() for the update.
>> 
>> So why does WRITE_ONCE exist in the first place ? Is it for documentation
>> purposes only or are there actual situations where omitting it can cause
>> bugs with real-life compilers ?
>> 
>> In terms of code change, should we favor only introducing WRITE_ONCE
>> in new code, or should existing code matching those conditions be
>> moved to WRITE_ONCE as bug fixes ?
>> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17  8:44                             ` Linus Torvalds
@ 2019-08-17 15:02                               ` Mathieu Desnoyers
  2019-08-17 20:03                                 ` Valentin Schneider
  0 siblings, 1 reply; 60+ messages in thread
From: Mathieu Desnoyers @ 2019-08-17 15:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: paulmck, Valentin Schneider, Joel Fernandes (Google),
	Thomas Gleixner, Alan Stern, rostedt, linux-kernel,
	Peter Zijlstra, Boqun Feng, Will Deacon, David Howells

----- On Aug 17, 2019, at 4:44 AM, Linus Torvalds torvalds@linux-foundation.org wrote:

> 
> But I'm seeing a lot of WRITE_ONCE(x, constantvalue) kind of things
> and don't seem to find a lot of reason to think that they are any
> inherently better than "x = constantvalue".

If the only states that "x" can take are 1 or 0, then indeed there seems
to be no point in using a WRITE_ONCE() when paired with a READ_ONCE()
other than for documentation purposes.

However, if the state of "x" can be any pointer value, or a reference
count value, then not using "WRITE_ONCE()" to store a constant leaves
the compiler free to perform that store in more than one memory access.
Based on [1], section "Store tearing", there are situations where this
happens on x86 in the wild today when storing 64-bit constants: the
compiler is then free to decide to use two 32-bit immediate store
instructions.

Thanks,

Mathieu

[1] https://lwn.net/Articles/793253/

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17 14:40                         ` Mathieu Desnoyers
@ 2019-08-17 15:26                           ` Steven Rostedt
  2019-08-17 15:55                             ` Mathieu Desnoyers
  0 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2019-08-17 15:26 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, Thomas Gleixner, Joel Fernandes (Google),
	Alan Stern, Valentin Schneider, linux-kernel, Peter Zijlstra,
	paulmck, Boqun Feng, Will Deacon, David Howells

On Sat, 17 Aug 2019 10:40:31 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> > I'm now even more against adding the READ_ONCE() or WRITE_ONCE().  
> 
> I'm not convinced by your arguments.

Prove to me that there's an issue here beyond theoretical analysis,
then I'll consider that patch.

Show me a compiler used to compile the kernel that zeros out the
increment. Show me where the race actually occurs.

I think the READ/WRITE_ONCE() is more confusing than helpful. And
unneeded churn to the code. And really not needed for something that's
not critical to execution.

-- Steve

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17 14:27           ` Mathieu Desnoyers
@ 2019-08-17 15:42             ` Steven Rostedt
  2019-08-17 15:53               ` Mathieu Desnoyers
  0 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2019-08-17 15:42 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Joel Fernandes (Google), Peter Zijlstra,
	Thomas Gleixner, paulmck, Boqun Feng, Will Deacon, David Howells,
	Alan Stern, Linus Torvalds

On Sat, 17 Aug 2019 10:27:39 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> I get your point wrt WRITE_ONCE(): since it's a cache it should not have
> user-visible effects if a temporary incorrect value is observed. Well in
> reality, it's not a cache: if the lookup fails, it returns "<...>" instead,
> so cache lookup failure ends up not providing any useful data in the trace.
> Let's assume this is a known and documented tracer limitation.

Note, this is done at every sched switch, for both next and prev tasks.
And the update is only done at the enabling of a tracepoint (a very rare
occurrence). If it misses it when scheduling in, it has a really good chance
of getting it while scheduling out.

And in 99.999% of the tracing that I do, the tasks scheduling in when
enabling a tracepoint is not what I even care about, as I enable
tracing then start what I want to trace.


> 
> However, wrt READ_ONCE(), things are different. The variable read ends up
> being used to control various branches in the code, and the compiler could
> decide to re-fetch the variable (with a different state), and therefore
> cause _some_ of the branches to be inconsistent. See
> tracing_record_taskinfo_sched_switch() and tracing_record_taskinfo() @flags
> parameter.

I'm more OK with using a READ_ONCE() on the flags so it is consistent.
But the WRITE_ONCE() is going a bit overboard.

> 
> AFAIU the current code should not generate any out-of-bound writes in case of
> re-fetch, but no comment in there documents how fragile this is.

Which part of the code are you talking about here?

-- Steve

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17 15:42             ` Steven Rostedt
@ 2019-08-17 15:53               ` Mathieu Desnoyers
  2019-08-17 16:43                 ` Steven Rostedt
  0 siblings, 1 reply; 60+ messages in thread
From: Mathieu Desnoyers @ 2019-08-17 15:53 UTC (permalink / raw)
  To: rostedt
  Cc: linux-kernel, Joel Fernandes (Google), Peter Zijlstra,
	Thomas Gleixner, paulmck, Boqun Feng, Will Deacon, David Howells,
	Alan Stern, Linus Torvalds

----- On Aug 17, 2019, at 11:42 AM, rostedt rostedt@goodmis.org wrote:

> On Sat, 17 Aug 2019 10:27:39 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> I get your point wrt WRITE_ONCE(): since it's a cache it should not have
>> user-visible effects if a temporary incorrect value is observed. Well in
>> reality, it's not a cache: if the lookup fails, it returns "<...>" instead,
>> so cache lookup failure ends up not providing any useful data in the trace.
>> Let's assume this is a known and documented tracer limitation.
> 
> Note, this is done at every sched switch, for both next and prev tasks.
> And the update is only done at the enabling of a tracepoint (very rare
> occurrence) If it missed it scheduling in, it has a really good chance
> of getting it while scheduling out.
> 
> And 99.999% of my tracing that I do, the tasks scheduling in when
> enabling a tracepoint is not what I even care about, as I enable
> tracing then start what I want to trace.

Since it's refcount based, my concern is about the side-effect of
incrementing or decrementing that reference count without WRITE_ONCE,
which could lead to a transient corrupted value observed by _another_
active tracing user.

For your use-case, it would lead to a missing comm when you are actively
tracing what you want to trace, caused by another user of that refcount
incrementing or decrementing it.

I agree with you that missing tracing data at the beginning or end of a
trace is not important.

>> 
>> However, wrt READ_ONCE(), things are different. The variable read ends up
>> being used to control various branches in the code, and the compiler could
>> decide to re-fetch the variable (with a different state), and therefore
>> cause _some_ of the branches to be inconsistent. See
>> tracing_record_taskinfo_sched_switch() and tracing_record_taskinfo() @flags
>> parameter.
> 
> I'm more OK with using a READ_ONCE() on the flags so it is consistent.
> But the WRITE_ONCE() is going a bit overboard.

Hence my request for additional guidance on the usefulness of WRITE_ONCE(),
whether it's mainly there for documentation purposes, or if we should consider
that it takes care of real-life problems introduced by compiler optimizations
in the wild. The LWN article seems to imply that it's not just a theoretical
issue, but I'll have to let the article authors justify their conclusions,
because I have limited time to investigate this myself.

> 
>> 
>> AFAIU the current code should not generate any out-of-bound writes in case of
>> re-fetch, but no comment in there documents how fragile this is.
> 
> Which part of the code are you talking about here?

kernel/trace/trace.c:tracing_record_taskinfo_sched_switch()
kernel/trace/trace.c:tracing_record_taskinfo()

where @flags is used to control a few branches. I don't think any of those
would end up causing corruption if the flags is re-fetched between two
branches, but it seems rather fragile.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17 15:26                           ` Steven Rostedt
@ 2019-08-17 15:55                             ` Mathieu Desnoyers
  2019-08-17 16:40                               ` Steven Rostedt
  0 siblings, 1 reply; 60+ messages in thread
From: Mathieu Desnoyers @ 2019-08-17 15:55 UTC (permalink / raw)
  To: rostedt
  Cc: Linus Torvalds, Thomas Gleixner, Joel Fernandes (Google),
	Alan Stern, Valentin Schneider, linux-kernel, Peter Zijlstra,
	paulmck, Boqun Feng, Will Deacon, David Howells

----- On Aug 17, 2019, at 11:26 AM, rostedt rostedt@goodmis.org wrote:

> On Sat, 17 Aug 2019 10:40:31 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> > I'm now even more against adding the READ_ONCE() or WRITE_ONCE().
>> 
>> I'm not convinced by your arguments.
> 
> Prove to me that there's an issue here beyond theoretical analysis,
> then I'll consider that patch.
> 
> Show me a compiler used to compile the kernel that zeros out the
> increment. Show me were the race actually occurs.
> 
> I think the READ/WRITE_ONCE() is more confusing than helpful. And
> unneeded churn to the code. And really not needed for something that's
> not critical to execution.

I'll have to let the authors of the LWN article speak up on this, because
I have limited time to replicate this investigation myself.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17 15:55                             ` Mathieu Desnoyers
@ 2019-08-17 16:40                               ` Steven Rostedt
  2019-08-17 22:06                                 ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2019-08-17 16:40 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, Thomas Gleixner, Joel Fernandes (Google),
	Alan Stern, Valentin Schneider, linux-kernel, Peter Zijlstra,
	paulmck, Boqun Feng, Will Deacon, David Howells

On Sat, 17 Aug 2019 11:55:17 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- On Aug 17, 2019, at 11:26 AM, rostedt rostedt@goodmis.org wrote:
> 
> > On Sat, 17 Aug 2019 10:40:31 -0400 (EDT)
> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> >   
> >> > I'm now even more against adding the READ_ONCE() or WRITE_ONCE().  
> >> 
> >> I'm not convinced by your arguments.  
> > 
> > Prove to me that there's an issue here beyond theoretical analysis,
> > then I'll consider that patch.
> > 
> > Show me a compiler used to compile the kernel that zeros out the
> > increment. Show me where the race actually occurs.
> > 
> > I think the READ/WRITE_ONCE() is more confusing than helpful. And
> > unneeded churn to the code. And really not needed for something that's
> > not critical to execution.  
> 
> I'll have to let the authors of the LWN article speak up on this, because
> I have limited time to replicate this investigation myself.

I'll let Paul McKenney convince me then, if he has any spare cycles ;-)

The one instance in that article is from a 2013 bug, which talks about
storing a 64 bit value on a 32 bit machine. But the ref count is an int
(32 bit), and I highly doubt any compiler will split it into 16 bit
stores for a simple increment. And I don't believe Linux even supports
any architecture that requires 16 bit stores anymore.

-- Steve

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17 15:53               ` Mathieu Desnoyers
@ 2019-08-17 16:43                 ` Steven Rostedt
  0 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2019-08-17 16:43 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Joel Fernandes (Google), Peter Zijlstra,
	Thomas Gleixner, paulmck, Boqun Feng, Will Deacon, David Howells,
	Alan Stern, Linus Torvalds

On Sat, 17 Aug 2019 11:53:41 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> kernel/trace/trace.c:tracing_record_taskinfo_sched_switch()
> kernel/trace/trace.c:tracing_record_taskinfo()
> 
> where @flags is used to control a few branches. I don't think any of those
> would end up causing corruption if the flags is re-fetched between two
> branches, but it seems rather fragile.

There shouldn't be any corruption, as each flag test is not dependent
on any of the other tests. But I do agree, a READ_ONCE() would be
appropriate for just making a consistent state among the tests, even
though they are independent.

-- Steve

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17 15:02                               ` Mathieu Desnoyers
@ 2019-08-17 20:03                                 ` Valentin Schneider
  2019-08-17 23:00                                   ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Valentin Schneider @ 2019-08-17 20:03 UTC (permalink / raw)
  To: Mathieu Desnoyers, Linus Torvalds
  Cc: paulmck, Joel Fernandes (Google), Thomas Gleixner, Alan Stern,
	rostedt, linux-kernel, Peter Zijlstra, Boqun Feng, Will Deacon,
	David Howells

Apologies to Steve for continuing this thread when all he wanted was moving
an operation inside a mutex...

On 17/08/2019 16:02, Mathieu Desnoyers wrote:
[...]
> However, if the state of "x" can be any pointer value, or a reference
> count value, then not using "WRITE_ONCE()" to store a constant leaves
> the compiler free to perform that store in more than one memory access.
> Based on [1], section "Store tearing", there are situations where this
> happens on x86 in the wild today when storing 64-bit constants: the
> compiler is then free to decide to use two 32-bit immediate store
> instructions.
> 

That's also how I understand things, and it's also one of the points raised
in the compiler barrier section of memory-barriers.txt.

Taking this store tearing, or the invented stores - e.g. the branch
optimization pointed out by Linus:

>    if (a)
>       global_var = 1
>    else
>       global_var = 0
> 
> then the compiler had better not turn that into
> 
>      global_var = 0
>      if (a)
>          global_var = 1

AFAICT nothing prevents this from happening inside a critical section (where
the locking primitives provide the right barriers, but that's it). That's
all fine when data is never accessed locklessly, but in the case of locked
writes vs lockless reads, couldn't there be "leaks" of these transient
states? In those cases we would want WRITE_ONCE() for the writes.
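
A sketch of how such a transient state would leak (names invented):

   spin_lock(&update_lock);
   if (a)
           shared = 1;
   else
           shared = 0;
   /* if the compiler rewrites the above as "shared = 0; if (a)
    * shared = 1;", the transient 0 is observable below, because
    * the lock orders nothing for the lockless reader */
   spin_unlock(&update_lock);

   /* elsewhere, no lock taken */
   r = READ_ONCE(shared);          /* may see the invented store */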

So going back to:

> But the reverse is not really true. All a READ_ONCE() says is "I want
> either the old or the new value", and it can get that _without_ being
> paired with a WRITE_ONCE().

AFAIU it's not always the case, since a lone READ_ONCE() could get transient
values.

I'll be honest, it's not 100% clear to me when those optimizations can
actually be done (maybe the branch thingy but the others are dubious), and
it's even less clear when compilers *actually* do it - only that they have
been reported to do it (so it's not made up).

> Thanks,
> 
> Mathieu
> 
> [1] https://lwn.net/Articles/793253/
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17 16:40                               ` Steven Rostedt
@ 2019-08-17 22:06                                 ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2019-08-17 22:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Linus Torvalds, Thomas Gleixner,
	Joel Fernandes (Google), Alan Stern, Valentin Schneider,
	linux-kernel, Peter Zijlstra, Boqun Feng, Will Deacon,
	David Howells

On Sat, Aug 17, 2019 at 12:40:40PM -0400, Steven Rostedt wrote:
> On Sat, 17 Aug 2019 11:55:17 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
> > ----- On Aug 17, 2019, at 11:26 AM, rostedt rostedt@goodmis.org wrote:
> > 
> > > On Sat, 17 Aug 2019 10:40:31 -0400 (EDT)
> > > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> > >   
> > >> > I'm now even more against adding the READ_ONCE() or WRITE_ONCE().  
> > >> 
> > >> I'm not convinced by your arguments.  
> > > 
> > > Prove to me that there's an issue here beyond theoretical analysis,
> > > then I'll consider that patch.
> > > 
> > > Show me a compiler used to compile the kernel that zeros out the
> > > increment. Show me where the race actually occurs.
> > > 
> > > I think the READ/WRITE_ONCE() is more confusing than helpful. And
> > > unneeded churn to the code. And really not needed for something that's
> > > not critical to execution.  
> > 
> > I'll have to let the authors of the LWN article speak up on this, because
> > I have limited time to replicate this investigation myself.
> 
> I'll let Paul McKenney convince me then, if he has any spare cycles ;-)

You guys do manage to time these things sometimes.  ;-)

> The one instance in that article is from a 2013 bug, which talks about
> storing a 64 bit value on a 32 bit machine. But the ref count is an int
> (32 bit), and I highly doubt any compiler will split it into 16 bit
> stores for a simple increment. And I don't believe Linux even supports
> any architecture that requires 16 bit stores anymore.

For a machine-sized and aligned increment, it is indeed hard to imagine,
even for me.  I would be more worried about stores of constants with
lots of zero bits between non-zero bits on systems with small-sized
store-immediate instructions.
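
To pick a (made-up) example of the kind of constant I mean:

void set_mark(unsigned long *p)
{
	*p = 0x0000000100000001UL;	/* plain store: tearing permitted */
}

On a target lacking a 64-bit store-immediate instruction, the compiler
might be tempted to emit two 32-bit immediate stores instead of building
the constant in a register, so a concurrent reader could observe
0x0000000100000000 or 0x0000000000000001 in passing.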

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17  8:28                           ` Linus Torvalds
  2019-08-17  8:44                             ` Linus Torvalds
@ 2019-08-17 22:28                             ` Paul E. McKenney
  1 sibling, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2019-08-17 22:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Valentin Schneider, Joel Fernandes, Thomas Gleixner, Alan Stern,
	Mathieu Desnoyers, rostedt, linux-kernel, Peter Zijlstra,
	Boqun Feng, Will Deacon, David Howells

On Sat, Aug 17, 2019 at 01:28:48AM -0700, Linus Torvalds wrote:

[ . . . ]

> Put another way: a WRITE_ONCE() without a paired READ_ONCE() is almost
> certainly pointless.

"Your honor, I have no further questions at this time, but I reserve
the right to recall this witness."

Outside of things like MMIO (where one could argue that the corresponding
READ_ONCE() is in the device firmware), the use cases I can imagine
for WRITE_ONCE() with no READ_ONCE() are quite strange.  For example,
doing the WRITE_ONCE()s while read-holding a given lock and doing plain
reads while write-holding that same lock.  While at the same time being
worried about store tearing and similar.

Perhaps I am suffering a failure of imagination, but I am not seeing
a reasonable use for such things at the moment.
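
For concreteness, the sort of scheme I mean would look something like
this entirely hypothetical sketch:

static DEFINE_RWLOCK(lock);
static unsigned long dirty;

void mark_dirty(void)		/* many CPUs, concurrently */
{
	read_lock(&lock);
	WRITE_ONCE(dirty, 1);	/* races only with other mark_dirty() calls */
	read_unlock(&lock);
}

unsigned long test_and_clear_dirty(void)
{
	unsigned long was;

	write_lock(&lock);	/* excludes all of the WRITE_ONCE() updaters, */
	was = dirty;		/* so plain accesses suffice on this side */
	dirty = 0;
	write_unlock(&lock);
	return was;
}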

> But the reverse is not really true. All a READ_ONCE() says is "I want
> either the old or the new value", and it can get that _without_ being
> paired with a WRITE_ONCE().
> 
> See? They just aren't equally important.
> 
> > > And yes, reads are different from writes. Reads don't have the same
> > > kind of "other threads of execution can see them" effects, so a
> > > compiler turning a single read into multiple reads is much more
> > > realistic and not the same kind of "we need to expect a certain kind
> > > of sanity from the compiler" issue.
> >
> > Though each of those compiler-generated multiple reads might return a
> > different value, right?
> 
> Right. See the examples I have in the email to Mathieu:
> 
>    unsigned int bits = some_global_value;
>    ...test different bits in 'bits' ...
> 
> can easily cause multiple reads (particularly on a CPU that has a
> "test bits in memory" instruction and a lack of registers.
> 
> So then doing it as
> 
>    unsigned int bits = READ_ONCE(some_global_value);
>    .. test different bits in 'bits'...
> 
> makes a real and obvious semantic difference. In ways that changing a one-time
> 
>    ptr->flag = true;
> 
> to
> 
>    WRITE_ONCE(ptr->flag, true);
> 
> does _not_ make any obvious semantic difference what-so-ever.

Agreed, especially given that only one bit of ->flag is most likely
ever changing.

> Caching reads is also something that makes sense and is common, in
> ways that caching writes does not. So doing
> 
>     while (in_progress_global) /* twiddle your thumbs */;
> 
> obviously trivially translates to an infinite loop with a single
> conditional in front of it, in ways that
> 
>    while (READ_ONCE(in_progress_global)) /* twiddle */;
> 
> does not.
> 
> So there are often _obvious_ reasons to use READ_ONCE().
> 
> I really do not find the same to be true of WRITE_ONCE(). There are
> valid uses, but they are definitely much less common, and much less
> obvious.

Agreed, and I expect READ_ONCE() to continue to be used more heavily than
is WRITE_ONCE(), even including the documentation-only WRITE_ONCE() usage.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17 20:03                                 ` Valentin Schneider
@ 2019-08-17 23:00                                   ` Paul E. McKenney
  2019-08-19 10:34                                     ` Valentin Schneider
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2019-08-17 23:00 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Mathieu Desnoyers, Linus Torvalds, Joel Fernandes, Google,
	Thomas Gleixner, Alan Stern, rostedt, linux-kernel,
	Peter Zijlstra, Boqun Feng, Will Deacon, David Howells

On Sat, Aug 17, 2019 at 09:03:30PM +0100, Valentin Schneider wrote:
> Apologies to Steve for continuing this thread when all he wanted was moving
> an operation inside a mutex...
> 
> On 17/08/2019 16:02, Mathieu Desnoyers wrote:
> [...]
> > However, if the state of "x" can be any pointer value, or a reference
> > count value, then not using "WRITE_ONCE()" to store a constant leaves
> > the compiler free to perform that store in more than one memory access.
> > Based on [1], section "Store tearing", there are situations where this
> > happens on x86 in the wild today when storing 64-bit constants: the
> > compiler is then free to decide to use two 32-bit immediate store
> > instructions.
> > 
> 
> That's also how I understand things, and it's also one of the points raised
> in the compiler barrier section of memory-barriers.txt
> 
> Consider this store tearing, or the invented stores - e.g. the branch
> optimization pointed out by Linus:
> 
> >    if (a)
> >       global_var = 1
> >    else
> >       global_var = 0
> > 
> > then the compiler had better not turn that into
> > 
> >      global_var = 0
> >      if (a)
> >          global_var = 1
> 
> AFAICT nothing prevents this from happening inside a critical section (where
> the locking primitives provide the right barriers, but that's it). That's
> all fine when data is never accessed locklessly, but in the case of locked
> writes vs lockless reads, couldn't there be "leaks" of these transient
> states? In those cases we would want WRITE_ONCE() for the writes.
> 
> So going back to:
> 
> > But the reverse is not really true. All a READ_ONCE() says is "I want
> > either the old or the new value", and it can get that _without_ being
> > paired with a WRITE_ONCE().
> 
> AFAIU it's not always the case, since a lone READ_ONCE() could get transient
> values.

Linus noted that he believes that compilers for architectures supporting
Linux can be trusted to avoid store-to-load transformations, invented
stores, and unnecessary store tearing.  Should these appear, Linus would
report a bug against the compiler and expect it to be fixed.

> I'll be honest, it's not 100% clear to me when those optimizations can
> actually be done (maybe the branch thingy but the others are dubious), and
> it's even less clear when compilers *actually* do it - only that they have
> been reported to do it (so it's not made up).

There is significant unclarity inherent in the situation.  The standard
says one thing, different compilers do other things, and developers
often expect yet a third thing.  And sometimes things change over time,
for example, the ca. 2011 dictum against compilers inventing data races.

Hey, they didn't teach me this aspect of software development in school,
either.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* stable markup was Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-16 20:49                 ` Steven Rostedt
  2019-08-16 20:59                   ` Joel Fernandes
  2019-08-17  1:25                   ` Mathieu Desnoyers
@ 2019-08-18  9:15                   ` Pavel Machek
  2 siblings, 0 replies; 60+ messages in thread
From: Pavel Machek @ 2019-08-18  9:15 UTC (permalink / raw)
  To: Steven Rostedt, Greg KH, stable
  Cc: Joel Fernandes, Alan Stern, Mathieu Desnoyers,
	Valentin Schneider, linux-kernel, Peter Zijlstra,
	Thomas Gleixner, paulmck, Boqun Feng, Will Deacon, David Howells,
	Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 840 bytes --]

Hi!

> The most I'll take is two separate patches. One is going to be marked
> for stable as it fixes a real bug. The other is more for cosmetic or
> theoretical issues, that I will state clearly "NOT FOR STABLE", such
> that the autosel doesn't take them.

Do we have a standardized way to mark "this is not for stable"? Because
I often think "I'd really hate to see this in stable"...

On a related note, it would be nice to have a standardized way of
marking the corresponding upstream commit. (Currently three methods are
in use.)

Upstream: XXXX

in the signoff area would be nice, clearly marking who touched the
patch before mainline and who after.
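
For example (hypothetical commit, made-up hash and names):

	Upstream: 1a2b3c4d5e6f ("subsystem: fix the foo")
	Signed-off-by: Original Author <author@example.org>
	[ backport: adjusted context for 4.19 ]
	Signed-off-by: Stable Maintainer <stable@example.org>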

Best regards,

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17 23:00                                   ` Paul E. McKenney
@ 2019-08-19 10:34                                     ` Valentin Schneider
  0 siblings, 0 replies; 60+ messages in thread
From: Valentin Schneider @ 2019-08-19 10:34 UTC (permalink / raw)
  To: paulmck
  Cc: Mathieu Desnoyers, Linus Torvalds, Joel Fernandes, Google,
	Thomas Gleixner, Alan Stern, rostedt, linux-kernel,
	Peter Zijlstra, Boqun Feng, Will Deacon, David Howells



On 18/08/2019 00:00, Paul E. McKenney wrote:
[...]
> Linus noted that he believes that compilers for architectures supporting
> Linux can be trusted to avoid store-to-load transformations, invented
> stores, and unnecessary store tearing.  Should these appear, Linus would
> report a bug against the compiler and expect it to be fixed.
> 
>> I'll be honest, it's not 100% clear to me when those optimizations can
>> actually be done (maybe the branch thingy but the others are dubious), and
>> it's even less clear when compilers *actually* do it - only that they have
>> been reported to do it (so it's not made up).
> 
> There is significant unclarity inherent in the situation.  The standard
> says one thing, different compilers do other things, and developers
> often expect yet a third thing.  And sometimes things change over time,
> for example, the ca. 2011 dictum against compilers inventing data races.
> 
> Hey, they didn't teach me this aspect of software development in school,
> either.  ;-)
> 

Gotta keep things "interesting" somehow, eh...

Thanks for the clarifications.

> 							Thanx, Paul
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17  8:08                       ` Linus Torvalds
@ 2019-08-20 13:56                         ` Peter Zijlstra
  2019-08-20 20:29                           ` Paul E. McKenney
  2019-09-09  6:21                           ` Herbert Xu
  0 siblings, 2 replies; 60+ messages in thread
From: Peter Zijlstra @ 2019-08-20 13:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mathieu Desnoyers, Thomas Gleixner, Joel Fernandes, Alan Stern,
	rostedt, Valentin Schneider, linux-kernel, paulmck, Boqun Feng,
	Will Deacon, David Howells

On Sat, Aug 17, 2019 at 01:08:02AM -0700, Linus Torvalds wrote:

> The data tearing issue is almost a non-issue. We're not going to add
> WRITE_ONCE() to these kinds of places for no good reason.

Paulmck actually has an example of that somewhere; ISTR that particular
case actually got fixed by GCC, but I'd really _love_ for some compiler
people (both GCC and LLVM) to state that their respective compilers will
not do load/store tearing for machine word sized load/stores.

Without this written guarantee (which supposedly was in older GCC
manuals but has since gone missing), I'm loath to rely on it.

Yes, it is very rare, but it is a massive royal pain to debug if/when it
does happen.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-17  4:52                         ` Paul E. McKenney
  2019-08-17  8:28                           ` Linus Torvalds
@ 2019-08-20 14:01                           ` Peter Zijlstra
  2019-08-20 20:31                             ` Paul E. McKenney
  1 sibling, 1 reply; 60+ messages in thread
From: Peter Zijlstra @ 2019-08-20 14:01 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Valentin Schneider, Joel Fernandes,
	Thomas Gleixner, Alan Stern, Mathieu Desnoyers, rostedt,
	linux-kernel, Boqun Feng, Will Deacon, David Howells

On Fri, Aug 16, 2019 at 09:52:17PM -0700, Paul E. McKenney wrote:
> On Fri, Aug 16, 2019 at 03:57:43PM -0700, Linus Torvalds wrote:
> 
> [ . . . ]
> 
> > We add READ_ONCE and WRITE_ONCE annotations when they make sense. Not
> > because of some theoretical "compiler is free to do garbage"
> > arguments. If such garbage happens, we need to fix the compiler, the
> > same way we already do with
> > 
> >   -fno-strict-aliasing
> 
> Yeah, the compete-with-FORTRAN stuff.  :-/
> 
> There is some work going on in the C committee on this, where the
> theorists would like to restrict strict-alias based optimizations to
> enable better analysis tooling.  And no, although the theorists are
> pushing in the direction we would like them to, as far as I can see
> they are not pushing as far as we would like.  But it might be that
> -fno-strict-aliasing needs some upgrades as well.  I expect to learn
> more at the next meeting in a few months.
> 
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf

We really should get the compiler folks to give us a
-fno-pointer-provenance. Waiting on the standards committee to get their
act together seems futile, esp. given that some people actually seem
to _want_ this nonsense :/
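
For anyone wondering what is being bought with all this, the classic
example from the provenance papers goes roughly as follows (behaviour
depends on the compiler and on the two objects ending up adjacent):

#include <stdio.h>
#include <string.h>

int x = 1, y = 2;

int main(void)
{
	int *p = &x + 1;	/* one past the end of x */
	int *q = &y;

	if (memcmp(&p, &q, sizeof(p)) == 0) {
		*p = 11;		/* same address as q, but the
					 * provenance of p is x, not y... */
		printf("%d\n", y);	/* ...so GCC may still print 2 */
	}
	return 0;
}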

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-20 13:56                         ` Peter Zijlstra
@ 2019-08-20 20:29                           ` Paul E. McKenney
  2019-08-21 10:32                             ` Will Deacon
  2019-09-09  6:21                           ` Herbert Xu
  1 sibling, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2019-08-20 20:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Mathieu Desnoyers, Thomas Gleixner,
	Joel Fernandes, Alan Stern, rostedt, Valentin Schneider,
	linux-kernel, Boqun Feng, Will Deacon, David Howells

On Tue, Aug 20, 2019 at 03:56:12PM +0200, Peter Zijlstra wrote:
> On Sat, Aug 17, 2019 at 01:08:02AM -0700, Linus Torvalds wrote:
> 
> > The data tearing issue is almost a non-issue. We're not going to add
> > WRITE_ONCE() to these kinds of places for no good reason.
> 
> Paulmck actually has an example of that somewhere; ISTR that particular
> case actually got fixed by GCC, but I'd really _love_ for some compiler
> people (both GCC and LLVM) to state that their respective compilers will
> not do load/store tearing for machine word sized load/stores.

I do very much recall such an example, but I am now unable to either
find it or reproduce it.  :-/

If I cannot turn it up in a few days, I will ask the LWN editors to
make appropriate changes to the "Who is afraid" article.

> Without this written guarantee (which supposedly was in older GCC
> manuals but has since gone missing), I'm loath to rely on it.
> 
> Yes, it is very rare, but it is a massive royal pain to debug if/when it
> does happen.

But from what I can see, Linus is OK with use of WRITE_ONCE() for data
races on any variable for which there is at least one READ_ONCE().
So we can still use WRITE_ONCE() as we would like in our own code.
Yes, you or I might be hit by someone else's omission of WRITE_ONCE(),
but it is better than the proverbial kick in the teeth.

Of course, if anyone knows of a compiler/architecture combination that
really does tear stores of 32-bit constants, please do not keep it
a secret!  After all, it would be good to get that addressed easily
starting now rather than after a difficult and painful series of
debugging sessions.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-20 14:01                           ` Peter Zijlstra
@ 2019-08-20 20:31                             ` Paul E. McKenney
  2019-08-20 20:39                               ` Peter Zijlstra
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2019-08-20 20:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Valentin Schneider, Joel Fernandes,
	Thomas Gleixner, Alan Stern, Mathieu Desnoyers, rostedt,
	linux-kernel, Boqun Feng, Will Deacon, David Howells

On Tue, Aug 20, 2019 at 04:01:16PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 16, 2019 at 09:52:17PM -0700, Paul E. McKenney wrote:
> > On Fri, Aug 16, 2019 at 03:57:43PM -0700, Linus Torvalds wrote:
> > 
> > [ . . . ]
> > 
> > > We add READ_ONCE and WRITE_ONCE annotations when they make sense. Not
> > > because of some theoretical "compiler is free to do garbage"
> > > arguments. If such garbage happens, we need to fix the compiler, the
> > > same way we already do with
> > > 
> > >   -fno-strict-aliasing
> > 
> > Yeah, the compete-with-FORTRAN stuff.  :-/
> > 
> > There is some work going on in the C committee on this, where the
> > theorists would like to restrict strict-alias based optimizations to
> > enable better analysis tooling.  And no, although the theorists are
> > pushing in the direction we would like them to, as far as I can see
> > they are not pushing as far as we would like.  But it might be that
> > -fno-strict-aliasing needs some upgrades as well.  I expect to learn
> > more at the next meeting in a few months.
> > 
> > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf
> > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf
> > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf
> 
> We really should get the compiler folks to give us a
> -fno-pointer-provenance. Waiting on the standards committee to get their
> act together seems futile, esp. given that some people actually seem
> to _want_ this nonsense :/

The reason that they want it is to enable some significant optimizations
in numerical code on the one hand and in heavily templated C++ code on
the other.  Neither of which has much bearing on kernel code.

Interested in coming to the next C standards committee meeting in October
to help me push for this?  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-20 20:31                             ` Paul E. McKenney
@ 2019-08-20 20:39                               ` Peter Zijlstra
  2019-08-20 20:52                                 ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Peter Zijlstra @ 2019-08-20 20:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Valentin Schneider, Joel Fernandes,
	Thomas Gleixner, Alan Stern, Mathieu Desnoyers, rostedt,
	linux-kernel, Boqun Feng, Will Deacon, David Howells

On Tue, Aug 20, 2019 at 01:31:35PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 20, 2019 at 04:01:16PM +0200, Peter Zijlstra wrote:

> > We really should get the compiler folks to give us a
> > -fno-pointer-provenance. Waiting on the standards committee to get their
> > act together seems futile, esp. given that some people actually seem
> > to _want_ this nonsense :/
> 
> The reason that they want it is to enable some significant optimizations
> in numerical code on the one hand and in heavily templated C++ code on
> the other.  Neither of which has much bearing on kernel code.
> 
> Interested in coming to the next C standards committee meeting in October
> to help me push for this?  ;-)

How about we try and get some compiler folks together at plumbers and
bribe them with beer? Once we have our compiler knob, we happy :-)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-20 20:39                               ` Peter Zijlstra
@ 2019-08-20 20:52                                 ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2019-08-20 20:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Valentin Schneider, Joel Fernandes,
	Thomas Gleixner, Alan Stern, Mathieu Desnoyers, rostedt,
	linux-kernel, Boqun Feng, Will Deacon, David Howells

On Tue, Aug 20, 2019 at 10:39:39PM +0200, Peter Zijlstra wrote:
> On Tue, Aug 20, 2019 at 01:31:35PM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 20, 2019 at 04:01:16PM +0200, Peter Zijlstra wrote:
> 
> > > We really should get the compiler folks to give us a
> > > -fno-pointer-provenance. Waiting on the standards committee to get their
> > > act together seems futile, esp. given that some people actually seem
> > > to _want_ this nonsense :/
> > 
> > The reason that they want it is to enable some significant optimizations
> > in numerical code on the one hand and in heavily templated C++ code on
> > the other.  Neither of which has much bearing on kernel code.
> > 
> > Interested in coming to the next C standards committee meeting in October
> > to help me push for this?  ;-)
> 
> How about we try and get some compiler folks together at plumbers and
> bribe them with beer? Once we have our compiler knob, we happy :-)

C'mon, Peter!  Where is your sense of self-destruction???  ;-)

But yes, if nothing else there is a Toolchains MC [1], which happens to
have a topic entitled "Potential impact/benefit/detriment of recently
developed GCC optimizations on the kernel", now that you mention it.
Looking forward to seeing you in Lisbon!

						Thanx, Paul

[1] https://linuxplumbersconf.org/event/4/sessions/45/#20190909

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-20 20:29                           ` Paul E. McKenney
@ 2019-08-21 10:32                             ` Will Deacon
  2019-08-21 13:23                               ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Will Deacon @ 2019-08-21 10:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Linus Torvalds, Mathieu Desnoyers,
	Thomas Gleixner, Joel Fernandes, Alan Stern, rostedt,
	Valentin Schneider, linux-kernel, Boqun Feng, Will Deacon,
	David Howells

On Tue, Aug 20, 2019 at 01:29:32PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 20, 2019 at 03:56:12PM +0200, Peter Zijlstra wrote:
> > On Sat, Aug 17, 2019 at 01:08:02AM -0700, Linus Torvalds wrote:
> > 
> > > The data tearing issue is almost a non-issue. We're not going to add
> > > WRITE_ONCE() to these kinds of places for no good reason.
> > 
> > Paulmck actually has an example of that somewhere; ISTR that particular
> > case actually got fixed by GCC, but I'd really _love_ for some compiler
> > people (both GCC and LLVM) to state that their respective compilers will
> > not do load/store tearing for machine word sized load/stores.
> 
> I do very much recall such an example, but I am now unable to either
> find it or reproduce it.  :-/
> 
> If I cannot turn it up in a few days, I will ask the LWN editors to
> make appropriate changes to the "Who is afraid" article.
> 
> > Without this written guarantee (which supposedly was in older GCC
> > manuals but has since gone missing), I'm loath to rely on it.
> > 
> > Yes, it is very rare, but it is a massive royal pain to debug if/when it
> > does happen.
> 
> But from what I can see, Linus is OK with use of WRITE_ONCE() for data
> races on any variable for which there is at least one READ_ONCE().
> So we can still use WRITE_ONCE() as we would like in our own code.
> Yes, you or I might be hit by someone else's omission of WRITE_ONCE(),
> but it is better than the proverbial kick in the teeth.
> 
> Of course, if anyone knows of a compiler/architecture combination that
> really does tear stores of 32-bit constants, please do not keep it
> a secret!  After all, it would be good to get that addressed easily
> starting now rather than after a difficult and painful series of
> debugging sessions.

It's not quite what you asked for, but if you look at the following
silly code:

typedef unsigned long long u64;

struct data {
	u64 arr[1023];
	u64 flag;
};

void foo(struct data *x)
{
	int i;

	for (i = 0; i < 1023; ++i)
		x->arr[i] = 0;

	x->flag = 0;
}

void bar(u64 *x)
{
	*x = 0xabcdef10abcdef10;
}

Then arm64 clang (-O2) generates the following for foo:

foo:                                    // @foo
	stp	x29, x30, [sp, #-16]!   // 16-byte Folded Spill
	orr	w2, wzr, #0x2000
	mov	w1, wzr
	mov	x29, sp
	bl	memset
	ldp	x29, x30, [sp], #16     // 16-byte Folded Reload
	ret

and so the store to 'flag' has become part of the memset, which could
easily be bytewise in terms of atomicity (and this isn't unlikely given
we have a DC ZVA instruction which only guarantees bytewise atomicity).

GCC (also -O2) generates the following for bar:

bar:
	mov	w1, 61200
	movk	w1, 0xabcd, lsl 16
	stp	w1, w1, [x0]
	ret

and so it is using a store-pair instruction to reduce the complexity in
the immediate generation. Thus, the 64-bit store will only have 32-bit
atomicity. In fact, this is scary because if I change bar to:

void bar(u64 *x)
{
	*(volatile u64 *)x = 0xabcdef10abcdef10;
}

then I get:

bar:
	mov	w1, 61200
	movk	w1, 0xabcd, lsl 16
	str	w1, [x0]
	str	w1, [x0, 4]
	ret

so I'm not sure that WRITE_ONCE would even help :/
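
For context, WRITE_ONCE() ultimately boils down to just such a volatile
store. A simplified sketch of include/linux/compiler.h (the real thing
dispatches on sizeof):

#define WRITE_ONCE(x, val)	(*(volatile typeof(x) *)&(x) = (val))

so WRITE_ONCE() would generate exactly the str/str sequence above.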

It's worth noting that:

#include <stdatomic.h>

void baz(atomic_long *x)
{
	atomic_store_explicit(x, 0xabcdef10abcdef10, memory_order_relaxed);
}

does the right thing:

baz:
	mov	x1, 61200
	movk	x1, 0xabcd, lsl 16
	movk	x1, 0xef10, lsl 32
	movk	x1, 0xabcd, lsl 48
	str	x1, [x0]
	ret

Whilst these examples may be contrived, I do think they illustrate that
we can't simply say "stores to aligned, word-sized pointers are atomic".

Will

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-21 10:32                             ` Will Deacon
@ 2019-08-21 13:23                               ` Paul E. McKenney
  2019-08-21 13:32                                 ` Will Deacon
  2019-08-21 15:33                                 ` Peter Zijlstra
  0 siblings, 2 replies; 60+ messages in thread
From: Paul E. McKenney @ 2019-08-21 13:23 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Linus Torvalds, Mathieu Desnoyers,
	Thomas Gleixner, Joel Fernandes, Alan Stern, rostedt,
	Valentin Schneider, linux-kernel, Boqun Feng, Will Deacon,
	David Howells

On Wed, Aug 21, 2019 at 11:32:01AM +0100, Will Deacon wrote:
> On Tue, Aug 20, 2019 at 01:29:32PM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 20, 2019 at 03:56:12PM +0200, Peter Zijlstra wrote:
> > > On Sat, Aug 17, 2019 at 01:08:02AM -0700, Linus Torvalds wrote:
> > > 
> > > > The data tearing issue is almost a non-issue. We're not going to add
> > > > WRITE_ONCE() to these kinds of places for no good reason.
> > > 
> > > Paulmck actually has an example of that somewhere; ISTR that particular
> > > case actually got fixed by GCC, but I'd really _love_ for some compiler
> > > people (both GCC and LLVM) to state that their respective compilers will
> > > not do load/store tearing for machine word sized load/stores.
> > 
> > I do very much recall such an example, but I am now unable to either
> > find it or reproduce it.  :-/
> > 
> > If I cannot turn it up in a few days, I will ask the LWN editors to
> > make appropriate changes to the "Who is afraid" article.
> > 
> > > Without this written guarantee (which supposedly was in older GCC
> > > manuals but has since gone missing), I'm loath to rely on it.
> > > 
> > > Yes, it is very rare, but it is a massive royal pain to debug if/when it
> > > does happen.
> > 
> > But from what I can see, Linus is OK with use of WRITE_ONCE() for data
> > races on any variable for which there is at least one READ_ONCE().
> > So we can still use WRITE_ONCE() as we would like in our own code.
> > Yes, you or I might be hit by someone else's omission of WRITE_ONCE(),
> > but it is better than the proverbial kick in the teeth.
> > 
> > Of course, if anyone knows of a compiler/architecture combination that
> > really does tear stores of 32-bit constants, please do not keep it
> > a secret!  After all, it would be good to get that addressed easily
> > starting now rather than after a difficult and painful series of
> > debugging sessions.
> 
> It's not quite what you asked for, but if you look at the following
> silly code:
> 
> typedef unsigned long long u64;
> 
> struct data {
> 	u64 arr[1023];
> 	u64 flag;
> };
> 
> void foo(struct data *x)
> {
> 	int i;
> 
> 	for (i = 0; i < 1023; ++i)
> 		x->arr[i] = 0;
> 
> 	x->flag = 0;
> }
> 
> void bar(u64 *x)
> {
> 	*x = 0xabcdef10abcdef10;
> }
> 
> Then arm64 clang (-O2) generates the following for foo:
> 
> foo:                                    // @foo
> 	stp	x29, x30, [sp, #-16]!   // 16-byte Folded Spill
> 	orr	w2, wzr, #0x2000
> 	mov	w1, wzr
> 	mov	x29, sp
> 	bl	memset
> 	ldp	x29, x30, [sp], #16     // 16-byte Folded Reload
> 	ret
> 
> and so the store to 'flag' has become part of the memset, which could
> easily be bytewise in terms of atomicity (and this isn't unlikely given
> we have a DC ZVA instruction which only guarantees bytewise atomicity).
> 
> GCC (also -O2) generates the following for bar:
> 
> bar:
> 	mov	w1, 61200
> 	movk	w1, 0xabcd, lsl 16
> 	stp	w1, w1, [x0]
> 	ret
> 
> and so it is using a store-pair instruction to reduce the complexity in
> the immediate generation. Thus, the 64-bit store will only have 32-bit
> atomicity. In fact, this is scary because if I change bar to:
> 
> void bar(u64 *x)
> {
> 	*(volatile u64 *)x = 0xabcdef10abcdef10;
> }
> 
> then I get:
> 
> bar:
> 	mov	w1, 61200
> 	movk	w1, 0xabcd, lsl 16
> 	str	w1, [x0]
> 	str	w1, [x0, 4]
> 	ret
> 
> so I'm not sure that WRITE_ONCE would even help :/

Well, I can have the LWN article cite your email, then.  So thank you
very much!

Is generation of this code for a 64-bit volatile store considered a bug?
Or does ARMv8 exclude the possibility of 64-bit MMIO registers?  And I
would guess that Thomas and Linus would ask a similar bugginess question
for normal stores.  ;-)

> It's worth noting that:
> 
> void baz(atomic_long *x)
> {
> 	atomic_store_explicit(x, 0xabcdef10abcdef10, memory_order_relaxed);
> }
> 
> does the right thing:
> 
> baz:
> 	mov	x1, 61200
> 	movk	x1, 0xabcd, lsl 16
> 	movk	x1, 0xef10, lsl 32
> 	movk	x1, 0xabcd, lsl 48
> 	str	x1, [x0]
> 	ret

OK, the C11 and C++11 guys should be happy with this.

> Whilst these examples may be contrived, I do think they illustrate that
> we can't simply say "stores to aligned, word-sized pointers are atomic".

And thank you again!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-21 13:23                               ` Paul E. McKenney
@ 2019-08-21 13:32                                 ` Will Deacon
  2019-08-21 13:56                                   ` Paul E. McKenney
  2019-08-21 15:33                                 ` Peter Zijlstra
  1 sibling, 1 reply; 60+ messages in thread
From: Will Deacon @ 2019-08-21 13:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Linus Torvalds, Mathieu Desnoyers,
	Thomas Gleixner, Joel Fernandes, Alan Stern, rostedt,
	Valentin Schneider, linux-kernel, Boqun Feng, Will Deacon,
	David Howells

On Wed, Aug 21, 2019 at 06:23:10AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 21, 2019 at 11:32:01AM +0100, Will Deacon wrote:
> > void bar(u64 *x)
> > {
> > 	*(volatile u64 *)x = 0xabcdef10abcdef10;
> > }
> > 
> > then I get:
> > 
> > bar:
> > 	mov	w1, 61200
> > 	movk	w1, 0xabcd, lsl 16
> > 	str	w1, [x0]
> > 	str	w1, [x0, 4]
> > 	ret
> > 
> > so I'm not sure that WRITE_ONCE would even help :/
> 
> Well, I can have the LWN article cite your email, then.  So thank you
> very much!
> 
> Is generation of this code for a 64-bit volatile store considered a bug?

I consider it a bug for the volatile case, and the one compiler person I've
spoken to also seems to reckon it's a bug, so hopefully it will be fixed.
I'm led to believe it's an optimisation in the AArch64 backend of GCC.

> Or does ARMv8 exclude the possibility of 64-bit MMIO registers?  And I
> would guess that Thomas and Linus would ask a similar bugginess question
> for normal stores.  ;-)

We use inline asm for MMIO, fwiw.
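
e.g. something like this simplified sketch of arch/arm64/include/asm/io.h:

static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
{
	asm volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr));
}

which pins down both the access size and the addressing mode.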

Will

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-21 13:32                                 ` Will Deacon
@ 2019-08-21 13:56                                   ` Paul E. McKenney
  2019-08-21 16:22                                     ` Will Deacon
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2019-08-21 13:56 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Linus Torvalds, Mathieu Desnoyers,
	Thomas Gleixner, Joel Fernandes, Alan Stern, rostedt,
	Valentin Schneider, linux-kernel, Boqun Feng, Will Deacon,
	David Howells

On Wed, Aug 21, 2019 at 02:32:48PM +0100, Will Deacon wrote:
> On Wed, Aug 21, 2019 at 06:23:10AM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 21, 2019 at 11:32:01AM +0100, Will Deacon wrote:
> > > void bar(u64 *x)
> > > {
> > > 	*(volatile u64 *)x = 0xabcdef10abcdef10;
> > > }
> > > 
> > > then I get:
> > > 
> > > bar:
> > > 	mov	w1, 61200
> > > 	movk	w1, 0xabcd, lsl 16
> > > 	str	w1, [x0]
> > > 	str	w1, [x0, 4]
> > > 	ret
> > > 
> > > so I'm not sure that WRITE_ONCE would even help :/
> > 
> > Well, I can have the LWN article cite your email, then.  So thank you
> > very much!
> > 
> > Is generation of this code for a 64-bit volatile store considered a bug?
> 
> I consider it a bug for the volatile case, and the one compiler person I've
> spoken to also seems to reckon it's a bug, so hopefully it will be fixed.
> I'm led to believe it's an optimisation in the AArch64 backend of GCC.

Here is hoping for the fix!

> > Or does ARMv8 exclude the possibility of 64-bit MMIO registers?  And I
> > would guess that Thomas and Linus would ask a similar bugginess question
> > for normal stores.  ;-)
> 
> We use inline asm for MMIO, fwiw.

I should have remembered that, shouldn't I have?  ;-)

Is that also common practice across other embedded kernels these days?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-21 13:23                               ` Paul E. McKenney
  2019-08-21 13:32                                 ` Will Deacon
@ 2019-08-21 15:33                                 ` Peter Zijlstra
  2019-08-21 15:48                                   ` Mathieu Desnoyers
  1 sibling, 1 reply; 60+ messages in thread
From: Peter Zijlstra @ 2019-08-21 15:33 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Will Deacon, Linus Torvalds, Mathieu Desnoyers, Thomas Gleixner,
	Joel Fernandes, Alan Stern, rostedt, Valentin Schneider,
	linux-kernel, Boqun Feng, Will Deacon, David Howells

On Wed, Aug 21, 2019 at 06:23:10AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 21, 2019 at 11:32:01AM +0100, Will Deacon wrote:

> > and so it is using a store-pair instruction to reduce the complexity in
> > the immediate generation. Thus, the 64-bit store will only have 32-bit
> > atomicity. In fact, this is scary because if I change bar to:
> > 
> > void bar(u64 *x)
> > {
> > 	*(volatile u64 *)x = 0xabcdef10abcdef10;
> > }
> > 
> > then I get:
> > 
> > bar:
> > 	mov	w1, 61200
> > 	movk	w1, 0xabcd, lsl 16
> > 	str	w1, [x0]
> > 	str	w1, [x0, 4]
> > 	ret
> > 
> > so I'm not sure that WRITE_ONCE would even help :/
> 
> Well, I can have the LWN article cite your email, then.  So thank you
> very much!
> 
> Is generation of this code for a 64-bit volatile store considered a bug?
> Or does ARMv8 exclude the possibility of 64-bit MMIO registers?  And I
> would guess that Thomas and Linus would ask a similar bugginess question
> for normal stores.  ;-)

I'm calling this a compiler bug; the way I understand volatile, this is
very much against the intended use case. That is, this is buggy even
on UP vs signals or MMIO.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-21 15:33                                 ` Peter Zijlstra
@ 2019-08-21 15:48                                   ` Mathieu Desnoyers
  2019-08-21 16:14                                     ` Paul E. McKenney
  2019-08-21 19:03                                     ` Joel Fernandes
  0 siblings, 2 replies; 60+ messages in thread
From: Mathieu Desnoyers @ 2019-08-21 15:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulmck, Will Deacon, Linus Torvalds, Thomas Gleixner,
	Joel Fernandes, Google, Alan Stern, rostedt, Valentin Schneider,
	linux-kernel, Boqun Feng, Will Deacon, David Howells

----- On Aug 21, 2019, at 8:33 AM, Peter Zijlstra peterz@infradead.org wrote:

> On Wed, Aug 21, 2019 at 06:23:10AM -0700, Paul E. McKenney wrote:
>> On Wed, Aug 21, 2019 at 11:32:01AM +0100, Will Deacon wrote:
> 
>> > and so it is using a store-pair instruction to reduce the complexity in
>> > the immediate generation. Thus, the 64-bit store will only have 32-bit
>> > atomicity. In fact, this is scary because if I change bar to:
>> > 
>> > void bar(u64 *x)
>> > {
>> > 	*(volatile u64 *)x = 0xabcdef10abcdef10;
>> > }
>> > 
>> > then I get:
>> > 
>> > bar:
>> > 	mov	w1, 61200
>> > 	movk	w1, 0xabcd, lsl 16
>> > 	str	w1, [x0]
>> > 	str	w1, [x0, 4]
>> > 	ret
>> > 
>> > so I'm not sure that WRITE_ONCE would even help :/
>> 
>> Well, I can have the LWN article cite your email, then.  So thank you
>> very much!
>> 
>> Is generation of this code for a 64-bit volatile store considered a bug?
>> Or does ARMv8 exclude the possibility of 64-bit MMIO registers?  And I
>> would guess that Thomas and Linus would ask a similar bugginess question
>> for normal stores.  ;-)
> 
> I'm calling this a compiler bug; the way I understand volatile, this is
> very much against the intended use case. That is, this is buggy even
> on UP vs signals or MMIO.

And here is a simpler reproducer on my gcc-8.3.0 (aarch64) compiled with -O2:

volatile unsigned long a;
 
void fct(void)
{
        a = 0x1234567812345678ULL;
}

void fct(void)
{
        a = 0x1234567812345678ULL;
   0:   90000000        adrp    x0, 8 <fct+0x8>
   4:   528acf01        mov     w1, #0x5678                     // #22136
   8:   72a24681        movk    w1, #0x1234, lsl #16
   c:   f9400000        ldr     x0, [x0]
  10:   b9000001        str     w1, [x0]
  14:   b9000401        str     w1, [x0, #4]
}
  18:   d65f03c0        ret

And the non-volatile case uses stp (is it a single store to memory?):

unsigned long a;
  
void fct(void)
{
        a = 0x1234567812345678ULL;
}

void fct(void)
{
        a = 0x1234567812345678ULL;
   0:   90000000        adrp    x0, 8 <fct+0x8>
   4:   528acf01        mov     w1, #0x5678                     // #22136
   8:   72a24681        movk    w1, #0x1234, lsl #16
   c:   f9400000        ldr     x0, [x0]
  10:   29000401        stp     w1, w1, [x0]
}
  14:   d65f03c0        ret

It would probably be a good idea to audit other architectures, since this
is done by the compiler backend.
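
A test file along these lines, built with each cross-compiler at -O2 and
fed through objdump -d, would probably suffice (sketch):

volatile unsigned long vol_dst;
unsigned long plain_dst;

void store_volatile(void)
{
	vol_dst = 0x1234567812345678ULL;
}

void store_plain(void)
{
	plain_dst = 0x1234567812345678ULL;
}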

Thanks,

Mathieu








-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-21 15:48                                   ` Mathieu Desnoyers
@ 2019-08-21 16:14                                     ` Paul E. McKenney
  2019-08-21 19:03                                     ` Joel Fernandes
  1 sibling, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2019-08-21 16:14 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Will Deacon, Linus Torvalds, Thomas Gleixner,
	Joel Fernandes, Google, Alan Stern, rostedt, Valentin Schneider,
	linux-kernel, Boqun Feng, Will Deacon, David Howells

On Wed, Aug 21, 2019 at 11:48:43AM -0400, Mathieu Desnoyers wrote:
> ----- On Aug 21, 2019, at 8:33 AM, Peter Zijlstra peterz@infradead.org wrote:
> 
> > On Wed, Aug 21, 2019 at 06:23:10AM -0700, Paul E. McKenney wrote:
> >> On Wed, Aug 21, 2019 at 11:32:01AM +0100, Will Deacon wrote:
> > 
> >> > and so it is using a store-pair instruction to reduce the complexity in
> >> > the immediate generation. Thus, the 64-bit store will only have 32-bit
> >> > atomicity. In fact, this is scary because if I change bar to:
> >> > 
> >> > void bar(u64 *x)
> >> > {
> >> > 	*(volatile u64 *)x = 0xabcdef10abcdef10;
> >> > }
> >> > 
> >> > then I get:
> >> > 
> >> > bar:
> >> > 	mov	w1, 61200
> >> > 	movk	w1, 0xabcd, lsl 16
> >> > 	str	w1, [x0]
> >> > 	str	w1, [x0, 4]
> >> > 	ret
> >> > 
> >> > so I'm not sure that WRITE_ONCE would even help :/
> >> 
> >> Well, I can have the LWN article cite your email, then.  So thank you
> >> very much!
> >> 
> >> Is generation of this code for a 64-bit volatile store considered a bug?
> >> Or does ARMv8 exclude the possibility of 64-bit MMIO registers?  And I
> >> would guess that Thomas and Linus would ask a similar bugginess question
> >> for normal stores.  ;-)
> > 
> > I'm calling this a compiler bug; the way I understand volatile, this is
> > very much against the intended use case. That is, this is buggy even
> > on UP vs signals or MMIO.
> 
> And here is a simpler reproducer on my gcc-8.3.0 (aarch64) compiled with -O2:
> 
> volatile unsigned long a;
>  
> void fct(void)
> {
>         a = 0x1234567812345678ULL;
> }
> 
> void fct(void)
> {
>         a = 0x1234567812345678ULL;
>    0:   90000000        adrp    x0, 8 <fct+0x8>
>    4:   528acf01        mov     w1, #0x5678                     // #22136
>    8:   72a24681        movk    w1, #0x1234, lsl #16
>    c:   f9400000        ldr     x0, [x0]
>   10:   b9000001        str     w1, [x0]
>   14:   b9000401        str     w1, [x0, #4]
> }
>   18:   d65f03c0        ret
> 
> And the non-volatile case uses stp (is it a single store to memory?):
> 
> unsigned long a;
>   
> void fct(void)
> {
>         a = 0x1234567812345678ULL;
> }
> 
> void fct(void)
> {
>         a = 0x1234567812345678ULL;
>    0:   90000000        adrp    x0, 8 <fct+0x8>
>    4:   528acf01        mov     w1, #0x5678                     // #22136
>    8:   72a24681        movk    w1, #0x1234, lsl #16
>    c:   f9400000        ldr     x0, [x0]
>   10:   29000401        stp     w1, w1, [x0]
> }
>   14:   d65f03c0        ret
> 
> It would probably be a good idea to audit other architectures, since this
> is done by the compiler backend.

That does seem like a very good idea!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-21 13:56                                   ` Paul E. McKenney
@ 2019-08-21 16:22                                     ` Will Deacon
  0 siblings, 0 replies; 60+ messages in thread
From: Will Deacon @ 2019-08-21 16:22 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Linus Torvalds, Mathieu Desnoyers,
	Thomas Gleixner, Joel Fernandes, Alan Stern, rostedt,
	Valentin Schneider, linux-kernel, Boqun Feng, Will Deacon,
	David Howells

On Wed, Aug 21, 2019 at 06:56:10AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 21, 2019 at 02:32:48PM +0100, Will Deacon wrote:
> > On Wed, Aug 21, 2019 at 06:23:10AM -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 21, 2019 at 11:32:01AM +0100, Will Deacon wrote:
> > > > void bar(u64 *x)
> > > > {
> > > > 	*(volatile u64 *)x = 0xabcdef10abcdef10;
> > > > }
> > > > 
> > > > then I get:
> > > > 
> > > > bar:
> > > > 	mov	w1, 61200
> > > > 	movk	w1, 0xabcd, lsl 16
> > > > 	str	w1, [x0]
> > > > 	str	w1, [x0, 4]
> > > > 	ret
> > > > 
> > > > so I'm not sure that WRITE_ONCE would even help :/
> > > 
> > > Well, I can have the LWN article cite your email, then.  So thank you
> > > very much!
> > > 
> > > Is generation of this code for a 64-bit volatile store considered a bug?
> > 
> > I consider it a bug for the volatile case, and the one compiler person I've
> > spoken to also seems to reckon it's a bug, so hopefully it will be fixed.
> > I'm led to believe it's an optimisation in the AArch64 backend of GCC.
> 
> Here is hoping for the fix!
> 
> > > Or does ARMv8 exclude the possibility of 64-bit MMIO registers?  And I
> > > would guess that Thomas and Linus would ask a similar bugginess question
> > > for normal stores.  ;-)
> > 
> > We use inline asm for MMIO, fwiw.
> 
> I should have remembered that, shouldn't I have?  ;-)
> 
> Is that also common practice across other embedded kernels these days?

I think so. Sometimes you care about things like the addressing mode being
used, so it's easier to roll it by hand.

Will

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-21 15:48                                   ` Mathieu Desnoyers
  2019-08-21 16:14                                     ` Paul E. McKenney
@ 2019-08-21 19:03                                     ` Joel Fernandes
  1 sibling, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2019-08-21 19:03 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, paulmck, Will Deacon, Linus Torvalds,
	Thomas Gleixner, Alan Stern, rostedt, Valentin Schneider,
	linux-kernel, Boqun Feng, Will Deacon, David Howells

On Wed, Aug 21, 2019 at 11:48:43AM -0400, Mathieu Desnoyers wrote:
> ----- On Aug 21, 2019, at 8:33 AM, Peter Zijlstra peterz@infradead.org wrote:
> 
> > On Wed, Aug 21, 2019 at 06:23:10AM -0700, Paul E. McKenney wrote:
> >> On Wed, Aug 21, 2019 at 11:32:01AM +0100, Will Deacon wrote:
> > 
> >> > and so it is using a store-pair instruction to reduce the complexity in
> >> > the immediate generation. Thus, the 64-bit store will only have 32-bit
> >> > atomicity. In fact, this is scary because if I change bar to:
> >> > 
> >> > void bar(u64 *x)
> >> > {
> >> > 	*(volatile u64 *)x = 0xabcdef10abcdef10;
> >> > }
> >> > 
> >> > then I get:
> >> > 
> >> > bar:
> >> > 	mov	w1, 61200
> >> > 	movk	w1, 0xabcd, lsl 16
> >> > 	str	w1, [x0]
> >> > 	str	w1, [x0, 4]
> >> > 	ret
> >> > 
> >> > so I'm not sure that WRITE_ONCE would even help :/
> >> 
> >> Well, I can have the LWN article cite your email, then.  So thank you
> >> very much!
> >> 
> >> Is generation of this code for a 64-bit volatile store considered a bug?
> >> Or does ARMv8 exclude the possibility of 64-bit MMIO registers?  And I
> >> would guess that Thomas and Linus would ask a similar bugginess question
> >> for normal stores.  ;-)
> > 
> > I'm calling this a compiler bug; the way I understand volatile, this is
> > very much against the intended use case. That is, this is buggy even
> > on UP vs signals or MMIO.
> 
> And here is a simpler reproducer on my gcc-8.3.0 (aarch64) compiled with -O2:
> 
> volatile unsigned long a;
>  
> void fct(void)
> {
>         a = 0x1234567812345678ULL;
> }
> 
> void fct(void)
> {
>         a = 0x1234567812345678ULL;
>    0:   90000000        adrp    x0, 8 <fct+0x8>
>    4:   528acf01        mov     w1, #0x5678                     // #22136
>    8:   72a24681        movk    w1, #0x1234, lsl #16
>    c:   f9400000        ldr     x0, [x0]
>   10:   b9000001        str     w1, [x0]
>   14:   b9000401        str     w1, [x0, #4]
> }
>   18:   d65f03c0        ret

Fwiw, and interestingly, on clang v7.0.1-8 (aarch64), I get a proper 64-bit
str with the above example (even when not using volatile):

0000000000000000 <nonvol>:
   0:	d28acf08 	mov	x8, #0x5678                	// #22136
   4:	f2a24688 	movk	x8, #0x1234, lsl #16
   8:	f2cacf08 	movk	x8, #0x5678, lsl #32
   c:	f2e24688 	movk	x8, #0x1234, lsl #48
  10:	90000009 	adrp	x9, 8 <nonvol+0x8>
  14:	91000129 	add	x9, x9, #0x0
  18:	f9000128 	str	x8, [x9]
  1c:	d65f03c0 	ret

test1.o:     file format elf64-littleaarch64


And even with -O2 it is a single store:

Disassembly of section .text:

0000000000000000 <nonvol>:
   0:	d28acf09 	mov	x9, #0x5678                	// #22136
   4:	f2a24689 	movk	x9, #0x1234, lsl #16
   8:	f2cacf09 	movk	x9, #0x5678, lsl #32
   c:	90000008 	adrp	x8, 8 <nonvol+0x8>
  10:	f2e24689 	movk	x9, #0x1234, lsl #48
  14:	f9000109 	str	x9, [x8]
  18:	d65f03c0 	ret

thanks,

 - Joel

[...]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
  2019-08-20 13:56                         ` Peter Zijlstra
  2019-08-20 20:29                           ` Paul E. McKenney
@ 2019-09-09  6:21                           ` Herbert Xu
  1 sibling, 0 replies; 60+ messages in thread
From: Herbert Xu @ 2019-09-09  6:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, mathieu.desnoyers, tglx, joel, stern, rostedt,
	valentin.schneider, linux-kernel, paulmck, boqun.feng,
	will.deacon, dhowells

Peter Zijlstra <peterz@infradead.org> wrote:
>
> Paulmck actually has an example of that somewhere; ISTR that particular
> case actually got fixed by GCC, but I'd really _love_ for some compiler
> people (both GCC and LLVM) to state that their respective compilers will
> not do load/store tearing for machine word sized load/stores.
> 
> Without this written guarantee (which supposedly was in older GCC
> manuals but has since gone missing), I'm loath to rely on it.

IIRC in that case gcc actually broke atomic writes even with a
volatile keyword.  So even WRITE_ONCE wouldn't have saved us.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2019-09-09  6:22 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-18 10:29 WARNING in tracepoint_probe_register_prio (3) syzbot
2019-08-16  0:11 ` syzbot
2019-08-16 14:26   ` [PATCH 1/1] Fix: trace sched switch start/stop racy updates Mathieu Desnoyers
2019-08-16 16:25     ` Steven Rostedt
2019-08-16 16:48       ` Valentin Schneider
2019-08-16 17:04         ` Steven Rostedt
2019-08-16 17:41           ` Mathieu Desnoyers
2019-08-16 19:18             ` Steven Rostedt
2019-08-16 19:19             ` Alan Stern
2019-08-16 20:44               ` Joel Fernandes
2019-08-16 20:49                 ` Thomas Gleixner
2019-08-16 20:57                   ` Joel Fernandes
2019-08-16 22:27                     ` Valentin Schneider
2019-08-16 22:57                       ` Linus Torvalds
2019-08-17  1:41                         ` Mathieu Desnoyers
2019-08-17  4:52                         ` Paul E. McKenney
2019-08-17  8:28                           ` Linus Torvalds
2019-08-17  8:44                             ` Linus Torvalds
2019-08-17 15:02                               ` Mathieu Desnoyers
2019-08-17 20:03                                 ` Valentin Schneider
2019-08-17 23:00                                   ` Paul E. McKenney
2019-08-19 10:34                                     ` Valentin Schneider
2019-08-17 22:28                             ` Paul E. McKenney
2019-08-20 14:01                           ` Peter Zijlstra
2019-08-20 20:31                             ` Paul E. McKenney
2019-08-20 20:39                               ` Peter Zijlstra
2019-08-20 20:52                                 ` Paul E. McKenney
2019-08-16 21:04                   ` Linus Torvalds
2019-08-17  1:36                     ` Mathieu Desnoyers
2019-08-17  2:13                       ` Steven Rostedt
2019-08-17 14:40                         ` Mathieu Desnoyers
2019-08-17 15:26                           ` Steven Rostedt
2019-08-17 15:55                             ` Mathieu Desnoyers
2019-08-17 16:40                               ` Steven Rostedt
2019-08-17 22:06                                 ` Paul E. McKenney
2019-08-17  8:08                       ` Linus Torvalds
2019-08-20 13:56                         ` Peter Zijlstra
2019-08-20 20:29                           ` Paul E. McKenney
2019-08-21 10:32                             ` Will Deacon
2019-08-21 13:23                               ` Paul E. McKenney
2019-08-21 13:32                                 ` Will Deacon
2019-08-21 13:56                                   ` Paul E. McKenney
2019-08-21 16:22                                     ` Will Deacon
2019-08-21 15:33                                 ` Peter Zijlstra
2019-08-21 15:48                                   ` Mathieu Desnoyers
2019-08-21 16:14                                     ` Paul E. McKenney
2019-08-21 19:03                                     ` Joel Fernandes
2019-09-09  6:21                           ` Herbert Xu
2019-08-16 20:49                 ` Steven Rostedt
2019-08-16 20:59                   ` Joel Fernandes
2019-08-17  1:25                   ` Mathieu Desnoyers
2019-08-18  9:15                   ` stable markup was " Pavel Machek
2019-08-16 17:19       ` Mathieu Desnoyers
2019-08-16 19:15         ` Steven Rostedt
2019-08-17 14:27           ` Mathieu Desnoyers
2019-08-17 15:42             ` Steven Rostedt
2019-08-17 15:53               ` Mathieu Desnoyers
2019-08-17 16:43                 ` Steven Rostedt
2019-08-16 12:32 ` WARNING in tracepoint_probe_register_prio (3) syzbot
2019-08-16 12:41   ` Sebastian Andrzej Siewior
