All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mark Rutland <mark.rutland@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: will@kernel.org, boqun.feng@gmail.com,
	linux-kernel@vger.kernel.org, x86@kernel.org, elver@google.com,
	keescook@chromium.org, hch@infradead.org,
	torvalds@linux-foundation.org, axboe@kernel.dk
Subject: Re: [PATCH v2 4/9] refcount: Use atomic_*_overflow()
Date: Mon, 13 Dec 2021 10:35:00 +0000	[thread overview]
Message-ID: <Ybch1AKtthvI7Obm@FVFF77S0Q05N> (raw)
In-Reply-To: <20211210162313.534215085@infradead.org>

On Fri, Dec 10, 2021 at 05:16:22PM +0100, Peter Zijlstra wrote:
> Use the shiny new atomic_*_overflow() functions in order to have better
> code-gen.
> 
> Strictly speaking, these ops operate on [0, INT_MIN] rather than
> [0, INT_MAX] but this is harmless. At worst another op finds the
> refcount is negative and goes saturate, which is always a possibility
> anyway.
> 
> Notably refcount_inc() case no longer distinguishes between
> inc-from-zero and inc-negative in the fast path, this improves
> code-gen:
> 
>     4b88:       b8 01 00 00 00          mov    $0x1,%eax
>     4b8d:       f0 0f c1 43 28          lock xadd %eax,0x28(%rbx)
>     4b92:       85 c0                   test   %eax,%eax
>     4b94:       74 1b                   je     4bb1 <alloc_perf_context+0xf1>
>     4b96:       8d 50 01                lea    0x1(%rax),%edx
>     4b99:       09 c2                   or     %eax,%edx
>     4b9b:       78 20                   js     4bbd <alloc_perf_context+0xfd>
> 
> to:
> 
>     4768:       b8 01 00 00 00          mov    $0x1,%eax
>     476d:       f0 0f c1 43 28          lock xadd %eax,0x28(%rbx)
>     4772:       85 c0                   test   %eax,%eax
>     4774:       7e 14                   jle    478a <alloc_perf_context+0xea>

With GCC 11.1.0 (and my LSE cleanup), arm64 gets similar:

    40e8:       52800024        mov     w4, #0x1                        // #1
    ...
    4118:       b8e40001        ldaddal w4, w1, [x0]
    411c:       7100003f        cmp     w1, #0x0
    4120:       5400018d        b.le    4150 <alloc_perf_context+0xdc>

> without sacrificing on functionality; the only thing that suffers is
> the reported error condition, which might now 'spuriously' report
> 'saturated' instead of 'inc-from-zero'.

Hmm... maybe it's worth a message to cover both cases, since:

	"refcount_t: saturated; leaking memory"

... may not look as severe an issue as:

	"refcount_t: addition on 0; use-after-free"

Perhaps just:

	"refcount_t: leaking memory or use-after-free"

... and don't bother with the racy read when reporting?


> refcount_dec_and_test() is also improved:
> 
>     aa40:       b8 ff ff ff ff          mov    $0xffffffff,%eax
>     aa45:       f0 0f c1 07             lock xadd %eax,(%rdi)
>     aa49:       83 f8 01                cmp    $0x1,%eax
>     aa4c:       74 05                   je     aa53 <ring_buffer_put+0x13>
>     aa4e:       85 c0                   test   %eax,%eax
>     aa50:       7e 1e                   jle    aa70 <ring_buffer_put+0x30>
>     aa52:       c3                      ret
> 
> to:
> 
>     a980:       b8 ff ff ff ff          mov    $0xffffffff,%eax
>     a985:       f0 0f c1 07             lock xadd %eax,(%rdi)
>     a989:       83 e8 01                sub    $0x1,%eax
>     a98c:       78 20                   js     a9ae <ring_buffer_put+0x2e>
>     a98e:       74 01                   je     a991 <ring_buffer_put+0x11>
>     a990:       c3                      ret

Likewise (with my LSE cleanup) for arm64:

    c3ac:       12800001        mov     w1, #0xffffffff                 // #-1
    c3b0:       b8e10001        ldaddal w1, w1, [x0]
    c3b4:       51000421        sub     w1, w1, #0x1
    c3b8:       7100003f        cmp     w1, #0x0
    c3bc:       5400012b        b.lt    c3e0 <ring_buffer_put+0x4c>  // b.tstop
    c3c0:       540001a0        b.eq    c3f4 <ring_buffer_put+0x60>  // b.none

AFAICT the compiler *should* be able to use a single `MOVS` in place of that
`SUB; CMP`, and I think that's either a missed optimization or an artifact of
something in my LSE patches.

> XXX atomic_dec_and_test_overflow() is strictly stronger ordered than
> refcount_dec_and_test() is defined -- Power and Arrghh64 ?

I'll leave the ordering to Will.

As this stands (or with the warning rewrite):

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/refcount.h |   17 +++++++++++++----
>  lib/refcount.c           |    5 +++++
>  2 files changed, 18 insertions(+), 4 deletions(-)
> 
> --- a/include/linux/refcount.h
> +++ b/include/linux/refcount.h
> @@ -264,7 +264,10 @@ static inline void __refcount_inc(refcou
>   */
>  static inline void refcount_inc(refcount_t *r)
>  {
> -	__refcount_inc(r, NULL);
> +	atomic_inc_overflow(&r->refs, Eoverflow);
> +	return;
> +Eoverflow:
> +	refcount_warn_saturate(r, REFCOUNT_ADD_OVF);
>  }
>  
>  static inline __must_check bool __refcount_sub_and_test(int i, refcount_t *r, int *oldp)
> @@ -330,7 +333,10 @@ static inline __must_check bool __refcou
>   */
>  static inline __must_check bool refcount_dec_and_test(refcount_t *r)
>  {
> -	return __refcount_dec_and_test(r, NULL);
> +	return atomic_dec_and_test_overflow(&r->refs, Eoverflow);
> +Eoverflow:
> +	refcount_warn_saturate(r, REFCOUNT_SUB_UAF);
> +	return false;
>  }
>  
>  static inline void __refcount_dec(refcount_t *r, int *oldp)
> @@ -340,7 +346,7 @@ static inline void __refcount_dec(refcou
>  	if (oldp)
>  		*oldp = old;
>  
> -	if (unlikely(old <= 1))
> +	if (unlikely(old - 1 <= 0))
>  		refcount_warn_saturate(r, REFCOUNT_DEC_LEAK);
>  }
>  
> @@ -356,7 +362,10 @@ static inline void __refcount_dec(refcou
>   */
>  static inline void refcount_dec(refcount_t *r)
>  {
> -	__refcount_dec(r, NULL);
> +	atomic_dec_overflow(&r->refs, Eoverflow);
> +	return;
> +Eoverflow:
> +	refcount_warn_saturate(r, REFCOUNT_DEC_LEAK);
>  }
>  
>  extern __must_check bool refcount_dec_if_one(refcount_t *r);
> --- a/lib/refcount.c
> +++ b/lib/refcount.c
> @@ -12,8 +12,13 @@
>  
>  void refcount_warn_saturate(refcount_t *r, enum refcount_saturation_type t)
>  {
> +	int old = refcount_read(r);
>  	refcount_set(r, REFCOUNT_SATURATED);
>  
> +	/* racy; who cares */
> +	if (old == 1 && t == REFCOUNT_ADD_OVF)
> +		t = REFCOUNT_ADD_UAF;
> +
>  	switch (t) {
>  	case REFCOUNT_ADD_NOT_ZERO_OVF:
>  		REFCOUNT_WARN("saturated; leaking memory");
> 
> 

  reply	other threads:[~2021-12-13 10:35 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-10 16:16 [PATCH v2 0/9] refcount: Improve code-gen Peter Zijlstra
2021-12-10 16:16 ` [PATCH v2 1/9] atomic: Prepare scripts for macro ops Peter Zijlstra
2021-12-10 17:27   ` Mark Rutland
2021-12-10 17:43   ` Marco Elver
2021-12-10 16:16 ` [PATCH v2 2/9] atomic: Add xchg.tbl Peter Zijlstra
2021-12-13  9:50   ` Mark Rutland
2021-12-10 16:16 ` [PATCH v2 3/9] atomic: Introduce atomic_{inc,dec,dec_and_test}_overflow() Peter Zijlstra
2021-12-13 10:06   ` Mark Rutland
2021-12-13 10:57     ` Peter Zijlstra
2021-12-13 10:59     ` Peter Zijlstra
2021-12-13 11:09       ` Mark Rutland
2021-12-10 16:16 ` [PATCH v2 4/9] refcount: Use atomic_*_overflow() Peter Zijlstra
2021-12-13 10:35   ` Mark Rutland [this message]
2021-12-10 16:16 ` [PATCH v2 5/9] atomic,x86: Implement atomic_dec_and_test_overflow() Peter Zijlstra
2021-12-13 11:04   ` Mark Rutland
2021-12-10 16:16 ` [PATCH v2 6/9] refcount: Fix refcount_dec_not_one() Peter Zijlstra
2021-12-10 16:16 ` [PATCH v2 7/9] refcount: Prepare for atomic_*_overflow() offsets Peter Zijlstra
2021-12-10 16:16 ` [PATCH v2 8/9] atomic,x86: Alternative atomic_*_overflow() scheme Peter Zijlstra
2021-12-10 16:53   ` Linus Torvalds
2021-12-10 17:27     ` Linus Torvalds
2021-12-17  3:38     ` Herbert Xu
2021-12-13 16:43   ` Peter Zijlstra
2021-12-13 17:29     ` Marco Elver
2021-12-13 18:11     ` Linus Torvalds
2021-12-13 18:18       ` Marco Elver
2021-12-13 18:24         ` Linus Torvalds
2021-12-13 19:35           ` Marco Elver
2021-12-13 18:21       ` Linus Torvalds
2021-12-10 16:16 ` [PATCH v2 9/9] refcount: Optimize __refcount_add_not_zero(.i=1) Peter Zijlstra
2021-12-10 19:37 ` [PATCH v2 0/9] refcount: Improve code-gen Peter Zijlstra
2021-12-13 12:15 ` [PATCH v2 10/9] atomic: Document the atomic_{}_overflow() functions Peter Zijlstra
2021-12-13 12:20 ` [PATCH v2 0/9] refcount: Improve code-gen Peter Zijlstra
2021-12-13 14:42   ` Marco Elver
2021-12-13 16:11     ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Ybch1AKtthvI7Obm@FVFF77S0Q05N \
    --to=mark.rutland@arm.com \
    --cc=axboe@kernel.dk \
    --cc=boqun.feng@gmail.com \
    --cc=elver@google.com \
    --cc=hch@infradead.org \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.