From: Willy Tarreau <w@1wt.eu>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Florian Weimer <fweimer@redhat.com>,
linux-toolchains@vger.kernel.org, Will Deacon <will@kernel.org>,
Paul McKenney <paulmck@kernel.org>,
linux-kernel@vger.kernel.org, stern@rowland.harvard.edu,
parri.andrea@gmail.com, boqun.feng@gmail.com, npiggin@gmail.com,
dhowells@redhat.com, j.alglave@ucl.ac.uk, luc.maranget@inria.fr,
akiyks@gmail.com, dlustig@nvidia.com, joel@joelfernandes.org,
torvalds@linux-foundation.org
Subject: Re: Control Dependencies vs C Compilers
Date: Wed, 7 Oct 2020 12:30:33 +0200
Message-ID: <20201007103033.GB6550@1wt.eu>
In-Reply-To: <20201007093243.GB2628@hirez.programming.kicks-ass.net>
On Wed, Oct 07, 2020 at 11:32:43AM +0200, Peter Zijlstra wrote:
> A branch that cannot be optimized away and prohibits lifting stores
> over. One possible suggestion would be allowing the volatile keyword as
> a qualifier to if.
>
>	x = *foo;
>	volatile if (x > 42)
>		*bar = 1;
>
> This would tell the compiler that the condition is special in that it
> must emit a conditional branch instruction and that it must not lift
> stores (or sequence points) over it.
This test is interesting because, if foo and bar are of the same type,
nothing prevents them from aliasing, so the compiler cannot make any
assumptions about them (they may be plain memory as well as
memory-mapped registers).
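
To make the aliasing point concrete, here is a minimal sketch built on the
quoted snippet, using plain accesses. The names cond_store, demo_distinct,
demo_aliased and self_check are hypothetical and only serve the
illustration; the final re-read of *foo is an addition so that the effect
of aliasing becomes visible to the caller:

```c
#include <assert.h>

/* Hypothetical helper: the quoted snippet with plain accesses, plus a
 * final re-read of *foo so that aliasing is observable by the caller. */
long cond_store(long *foo, long *bar)
{
	long x = *foo;

	if (x > 42)
		*bar = 1;
	/* foo and bar have the same type, so the store above may have
	 * modified *foo; the value has to be loaded again here. */
	return *foo;
}

/* Two distinct objects: the store to b leaves a untouched. */
long demo_distinct(void)
{
	long a = 100, b = 0;

	return cond_store(&a, &b);
}

/* The same object passed twice: the conditional store overwrites *foo. */
long demo_aliased(void)
{
	long a = 100;

	return cond_store(&a, &a);
}

/* Small self-check that can be called from a test program's main(). */
void self_check(void)
{
	assert(demo_distinct() == 100);
	assert(demo_aliased() == 1);
}
```

Since both calls are valid C, the compiler cannot reuse x for the final
read unless it can prove that the two pointers differ.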
Extending it like this shows a difference between the use of volatile
and __atomic_{load,store}_n. Both are correct in that each access is
actually performed, but for a reason I don't understand, in the atomic
case the compiler duplicated the second load and test on each path of
the first branch, which inflates the code (0x3b bytes instead of 0x2b):
$ gcc -v
gcc version 9.3.0 (GCC)
$ cat foo-volatile.c
long foobar(long *foo, long *bar)
{
	*(volatile long *)bar = 10;
	if (*(volatile long *)foo <= 42)
		*(volatile long *)bar = 64;
	if (*(volatile long *)foo > 42)
		*(volatile long *)bar = 0;
	return *(volatile long *)bar;
}
$ gcc -c -O2 foo-volatile.c
$ objdump -dr foo-volatile.o
0000000000000000 <foobar>:
   0:   48 c7 06 0a 00 00 00    movq   $0xa,(%rsi)
   7:   48 8b 07                mov    (%rdi),%rax
   a:   48 83 f8 2a             cmp    $0x2a,%rax
   e:   7f 07                   jg     17 <foobar+0x17>
  10:   48 c7 06 40 00 00 00    movq   $0x40,(%rsi)
  17:   48 8b 07                mov    (%rdi),%rax
  1a:   48 83 f8 2a             cmp    $0x2a,%rax
  1e:   7e 07                   jle    27 <foobar+0x27>
  20:   48 c7 06 00 00 00 00    movq   $0x0,(%rsi)
  27:   48 8b 06                mov    (%rsi),%rax
  2a:   c3                      retq
$ cat foo-atomic.c
long foobar(long *foo, long *bar)
{
	__atomic_store_n(bar, 10, __ATOMIC_RELAXED);
	if (__atomic_load_n(foo, __ATOMIC_RELAXED) <= 42)
		__atomic_store_n(bar, 64, __ATOMIC_RELAXED);
	if (__atomic_load_n(foo, __ATOMIC_RELAXED) > 42)
		__atomic_store_n(bar, 0, __ATOMIC_RELAXED);
	return __atomic_load_n(bar, __ATOMIC_RELAXED);
}
$ objdump -dr foo-atomic.o
0000000000000000 <foobar>:
   0:   48 c7 06 0a 00 00 00    movq   $0xa,(%rsi)
   7:   48 8b 07                mov    (%rdi),%rax
   a:   48 83 f8 2a             cmp    $0x2a,%rax
   e:   7e 10                   jle    20 <foobar+0x20>
  10:   48 8b 07                mov    (%rdi),%rax
  13:   48 83 f8 2a             cmp    $0x2a,%rax
  17:   7f 17                   jg     30 <foobar+0x30>
  19:   48 8b 06                mov    (%rsi),%rax
  1c:   c3                      retq
  1d:   0f 1f 00                nopl   (%rax)
  20:   48 c7 06 40 00 00 00    movq   $0x40,(%rsi)
  27:   48 8b 07                mov    (%rdi),%rax
  2a:   48 83 f8 2a             cmp    $0x2a,%rax
  2e:   7e e9                   jle    19 <foobar+0x19>
  30:   48 c7 06 00 00 00 00    movq   $0x0,(%rsi)
  37:   48 8b 06                mov    (%rsi),%rax
  3a:   c3                      retq
When building at -Os, both produce the same code as the volatile version
above. It *seems* to me that the volatile version always produces better
code, but is it always correct? This is just an illustration of how
tricky this can currently be, and how confusing it can sometimes be for
the developer to make sure the desired code is emitted in a few special
cases. For that reason alone, having the compiler support more
predictable constructs would be a nice improvement.
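
As a rough sketch of one way to make the volatile variant above less
error-prone to write, the casts can be hidden behind small macros. The
names LOAD_ONCE, STORE_ONCE, foobar_once and foobar_once_check are
hypothetical; the macros are loosely modelled on the idea behind the
kernel's READ_ONCE()/WRITE_ONCE() but are not their actual definitions,
and __typeof__ is a GNU extension:

```c
#include <assert.h>

/* Hypothetical macros: force a single volatile access to an lvalue. */
#define LOAD_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define STORE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Same logic as the foo-volatile.c example, with the casts hidden. */
long foobar_once(long *foo, long *bar)
{
	STORE_ONCE(*bar, 10);
	if (LOAD_ONCE(*foo) <= 42)
		STORE_ONCE(*bar, 64);
	if (LOAD_ONCE(*foo) > 42)
		STORE_ONCE(*bar, 0);
	return LOAD_ONCE(*bar);
}

/* Exercises the distinct and the aliased cases. */
void foobar_once_check(void)
{
	long f, b;

	f = 5; b = -1;
	assert(foobar_once(&f, &b) == 64);	/* only the first test passes */

	f = 100; b = -1;
	assert(foobar_once(&f, &b) == 0);	/* only the second test passes */

	f = 5;
	/* Aliased: the object goes 10, then 64, then 0, so both stores run. */
	assert(foobar_once(&f, &f) == 0);
}
```

This only changes how the source reads. Whether the emitted code keeps the
two conditional branches remains the compiler's decision, which is the
problem being discussed in this thread.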
Willy