From: Peter Zijlstra <peterz@infradead.org>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	dja@axtens.net, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, christophe.leroy@c-s.fr,
	linux-arch@vger.kernel.org, Will Deacon <will@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Christian Borntraeger <borntraeger@de.ibm.com>
Subject: Re: READ_ONCE() + STACKPROTECTOR_STRONG == :/ (was Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.5-2 tag (topic/kasan-bitops))
Date: Thu, 12 Dec 2019 09:01:05 +0100	[thread overview]
Message-ID: <20191212080105.GV2844@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <875zimp0ay.fsf@mpe.ellerman.id.au>

On Thu, Dec 12, 2019 at 04:42:13PM +1100, Michael Ellerman wrote:
> [ trimmed CC a bit ]
> 
> Peter Zijlstra <peterz@infradead.org> writes:
> > On Fri, Dec 06, 2019 at 11:46:11PM +1100, Michael Ellerman wrote:
> ...
> > you write:
> >
> >   "Currently bitops-instrumented.h assumes that the architecture provides
> > atomic, non-atomic and locking bitops (e.g. both set_bit and __set_bit).
> > This is true on x86 and s390, but is not always true: there is a
> > generic bitops/non-atomic.h header that provides generic non-atomic
> > operations, and also a generic bitops/lock.h for locking operations."
> >
> > Is there any actual benefit for PPC in using its own atomic bitops
> > over bitops/lock.h? I'm thinking that the generic code is fairly
> > optimal for most LL/SC architectures.
> 
> Yes and no :)
> 
> Some of the generic versions don't generate good code compared to our
> versions, but that's because READ_ONCE() is triggering stack protector
> to be enabled.

Bah, there's never anything simple, is there :/
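
(For the record: the READ_ONCE() of this vintage bounces every load
through a local union containing a char array, and
-fstack-protector-strong instruments any function with a local array or
an address-taken local. Very roughly -- a simplified sketch, not the
exact compiler.h macro:

	#define READ_ONCE(x)						\
	({								\
		union { typeof(x) __val; char __c[1]; } __u;		\
		/* size-switched volatile copy of x into __u.__c */	\
		__read_once_size(&(x), __u.__c, sizeof(x));		\
		__u.__val;						\
	})

so every user grows a stack slot for __u and, with
STACKPROTECTOR_STRONG, the canary setup and check around it.)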

> For example, comparing an out-of-line copy of the generic and ppc
> versions of test_and_set_bit_lock():
> 
>    1 <generic_test_and_set_bit_lock>:           1 <ppc_test_and_set_bit_lock>:
>    2         addis   r2,r12,361
>    3         addi    r2,r2,-4240
>    4         stdu    r1,-48(r1)
>    5         rlwinm  r8,r3,29,3,28
>    6         clrlwi  r10,r3,26                   2         rldicl  r10,r3,58,6
>    7         ld      r9,3320(r13)
>    8         std     r9,40(r1)
>    9         li      r9,0
>   10         li      r9,1                        3         li      r9,1
>                                                  4         clrlwi  r3,r3,26
>                                                  5         rldicr  r10,r10,3,60
>   11         sld     r9,r9,r10                   6         sld     r3,r9,r3
>   12         add     r10,r4,r8                   7         add     r4,r4,r10
>   13         ldx     r8,r4,r8
>   14         and.    r8,r9,r8
>   15         bne     34f
>   16         ldarx   r7,0,r10                    8         ldarx   r9,0,r4,1
>   17         or      r8,r9,r7                    9         or      r10,r9,r3
>   18         stdcx.  r8,0,r10                   10         stdcx.  r10,0,r4
>   19         bne-    16b                        11         bne-    8b
>   20         isync                              12         isync
>   21         and     r9,r7,r9                   13         and     r3,r3,r9
>   22         addic   r7,r9,-1                   14         addic   r9,r3,-1
>   23         subfe   r7,r7,r9                   15         subfe   r3,r9,r3
>   24         ld      r9,40(r1)
>   25         ld      r10,3320(r13)
>   26         xor.    r9,r9,r10
>   27         li      r10,0
>   28         mr      r3,r7
>   29         bne     36f
>   30         addi    r1,r1,48
>   31         blr                                16         blr
>   32         nop
>   33         nop
>   34         li      r7,1
>   35         b       24b
>   36         mflr    r0
>   37         std     r0,64(r1)
>   38         bl      <__stack_chk_fail+0x8>
> 
> 
> If you squint, the generated code for the actual logic is pretty similar, but
> the stack protector gunk makes a big mess. It's particularly bad here
> because the ppc version doesn't even need a stack frame.
> 
> I've also confirmed that even when test_and_set_bit_lock() is inlined
> into an actual call site the stack protector logic still triggers.

> If I change the READ_ONCE() in test_and_set_bit_lock():
> 
> 	if (READ_ONCE(*p) & mask)
> 		return 1;
> 
> to a regular pointer access:
> 
> 	if (*p & mask)
> 		return 1;
> 
> Then the generated code looks more or less the same, except for the extra early
> return in the generic version of test_and_set_bit_lock(), and different handling
> of the return code by the compiler.

So given that the function signature is:

static inline int test_and_set_bit_lock(unsigned int nr,
					volatile unsigned long *p)

@p already carries the required volatile qualifier, so READ_ONCE() does
not add anything here (except for easier-to-read code and poor code
generation).
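
That is, a plain dereference of @p is already a volatile access; the
volatile comes from the pointee type, not from the macro. Trivial
illustration only -- bit_is_set() is a made-up name, not an existing
helper:

	static inline int bit_is_set(unsigned int nr,
				     volatile unsigned long *p)
	{
		/*
		 * *p is a volatile lvalue, so the compiler has to
		 * perform this access exactly once, as written.
		 */
		return (p[BIT_WORD(nr)] & BIT_MASK(nr)) != 0;
	}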

So your proposed change _should_ be fine. Will, I'm assuming you never
saw this on your ARGH64 builds when you wrote this code?

---
diff --git a/include/asm-generic/bitops/atomic.h b/include/asm-generic/bitops/atomic.h
index dd90c9792909..10264e8808f8 100644
--- a/include/asm-generic/bitops/atomic.h
+++ b/include/asm-generic/bitops/atomic.h
@@ -35,7 +35,7 @@ static inline int test_and_set_bit(unsigned int nr, volatile unsigned long *p)
 	unsigned long mask = BIT_MASK(nr);
 
 	p += BIT_WORD(nr);
-	if (READ_ONCE(*p) & mask)
+	if (*p & mask)
 		return 1;
 
 	old = atomic_long_fetch_or(mask, (atomic_long_t *)p);
@@ -48,7 +48,7 @@ static inline int test_and_clear_bit(unsigned int nr, volatile unsigned long *p)
 	unsigned long mask = BIT_MASK(nr);
 
 	p += BIT_WORD(nr);
-	if (!(READ_ONCE(*p) & mask))
+	if (!(*p & mask))
 		return 0;
 
 	old = atomic_long_fetch_andnot(mask, (atomic_long_t *)p);
diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h
index 3ae021368f48..9baf0a0055b8 100644
--- a/include/asm-generic/bitops/lock.h
+++ b/include/asm-generic/bitops/lock.h
@@ -22,7 +22,7 @@ static inline int test_and_set_bit_lock(unsigned int nr,
 	unsigned long mask = BIT_MASK(nr);
 
 	p += BIT_WORD(nr);
-	if (READ_ONCE(*p) & mask)
+	if (*p & mask)
 		return 1;
 
 	old = atomic_long_fetch_or_acquire(mask, (atomic_long_t *)p);
@@ -60,7 +60,7 @@ static inline void __clear_bit_unlock(unsigned int nr,
 	unsigned long old;
 
 	p += BIT_WORD(nr);
-	old = READ_ONCE(*p);
+	old = *p;
 	old &= ~BIT_MASK(nr);
 	atomic_long_set_release((atomic_long_t *)p, old);
 }
