All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Hellwig <hch@infradead.org>,
	Jens Axboe <axboe@kernel.dk>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Will Deacon <will@kernel.org>
Subject: Re: [PATCH] block: switch to atomic_t for request references
Date: Fri, 10 Dec 2021 13:38:07 +0100	[thread overview]
Message-ID: <YbNKLwMm+hv14WZs@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <CAHk-=wiFLbv2M9gRkh6_Zkwiza17QP0gJLAL7AgDqDArGBGpSQ@mail.gmail.com>

On Wed, Dec 08, 2021 at 10:00:04AM -0800, Linus Torvalds wrote:

> It's also why for "page->_mapcount" we have the "free" value being -1,
> not 0, and the refcount is "off by one". It makes the special cases of
> "increment from zero" and "decrement to zero" be very easy and
> straightforward to test for.
> 
> That might be an option for an "atomic_ref" type - with our existing
> "page_mapcount()" code being the thing we'd convert first, and make be
> the example for it.
> 
> I think it should also make the error cases be very easy to check for
> without extra tests. If you make "decrement from zero" be the "ok, now
> it's free", then that shows in the carry flag. But otherwise, if SF or
> OF is set, it's an error.  That means we can use the regular atomics
> and flags (although not "dec" and "inc", since we'd care about CF).
> 
> So on x86, I think "atomic_dec_ref()" could be
> 
>         lock subl $1,ptr
>         jc now_its_free
>         jl this_is_an_error
> 
> if we end up having that "off by one" model.
> 
> And importantly, "atomic_inc_ref()" would be just
> 
>         lock incl ptr
>         jle this_is_an_error
> 
> and this avoids us having to have the value in a register and test it
> separately.
> 
> So your suggestion is _close_, but note how you can't do the "inc_ofl"
> without that "off-by-one" model.
> 
> And again - I might have gotten the exact flag test instructions
> wrong. That's what you get for not actually doing serious assembly
> language for a couple of decades.


add(         -3):       CF=0 PF=0 AF=0 ZF=0 SF=1 ... OF=0       | sub(         -3):     CF=0 PF=1 AF=0 ZF=0 SF=1 ... OF=0
add(         -2):       CF=0 PF=1 AF=0 ZF=0 SF=1 ... OF=0       | sub(         -2):     CF=0 PF=0 AF=0 ZF=0 SF=1 ... OF=0
add(         -1):       CF=1 PF=1 AF=1 ZF=1 SF=0 ... OF=0       | sub(         -1):     CF=0 PF=0 AF=0 ZF=0 SF=1 ... OF=0
add(          0):       CF=0 PF=0 AF=0 ZF=0 SF=0 ... OF=0       | sub(          0):     CF=1 PF=1 AF=1 ZF=0 SF=1 ... OF=0
add(          1):       CF=0 PF=0 AF=0 ZF=0 SF=0 ... OF=0       | sub(          1):     CF=0 PF=1 AF=0 ZF=1 SF=0 ... OF=0
add(          2):       CF=0 PF=1 AF=0 ZF=0 SF=0 ... OF=0       | sub(          2):     CF=0 PF=0 AF=0 ZF=0 SF=0 ... OF=0
add(          3):       CF=0 PF=0 AF=0 ZF=0 SF=0 ... OF=0       | sub(          3):     CF=0 PF=0 AF=0 ZF=0 SF=0 ... OF=0
               :                                                |               :
add( 2147483645):       CF=0 PF=0 AF=0 ZF=0 SF=0 ... OF=0       | sub( 2147483645):     CF=0 PF=1 AF=0 ZF=0 SF=0 ... OF=0
add( 2147483646):       CF=0 PF=1 AF=0 ZF=0 SF=0 ... OF=0       | sub( 2147483646):     CF=0 PF=0 AF=0 ZF=0 SF=0 ... OF=0
add( 2147483647):       CF=0 PF=1 AF=1 ZF=0 SF=1 ... OF=1       | sub( 2147483647):     CF=0 PF=0 AF=0 ZF=0 SF=0 ... OF=0
add(-2147483648):       CF=0 PF=0 AF=0 ZF=0 SF=1 ... OF=0       | sub(-2147483648):     CF=0 PF=1 AF=1 ZF=0 SF=0 ... OF=1
add(-2147483647):       CF=0 PF=0 AF=0 ZF=0 SF=1 ... OF=0       | sub(-2147483647):     CF=0 PF=1 AF=0 ZF=0 SF=1 ... OF=0
add(-2147483646):       CF=0 PF=1 AF=0 ZF=0 SF=1 ... OF=0       | sub(-2147483646):     CF=0 PF=0 AF=0 ZF=0 SF=1 ... OF=0
add(-2147483645):       CF=0 PF=0 AF=0 ZF=0 SF=1 ... OF=0       | sub(-2147483645):     CF=0 PF=0 AF=0 ZF=0 SF=1 ... OF=0

So:

e := z
l := s!=o

inc()						inc()
	lock inc %[var]                     		mov       $-1, %[reg]
	jle	error-zero-or-negative          	lock xadd %[reg], %[var]
                                                	test      %[reg], %[reg]
                                                	jle	  error-zero-or-negative
                                                
dec()                                           dec()
	lock sub $1, %[var]                     	lock dec %[var]
	jc	error-to-zero                   	jle	error-zero-or-negative
	jl	error-from-negative             
                                                
dec_and_test()                                  dec_and_test()
	lock sub $1, %[var]                     	lock dec %[var]
	jc	do-free                         	jl	error-from-negative
	jl	error-from-negative             	je	do-free


Should work I suppose, and gives [-1, INT_MIN] as operating range. It
adds a single branch instruction (which should be default predicted
not-taken due to being a forward jump IIRC) but makes inc a lot smaller.


Except I've no sane idea how to make it work with the rest of
refcount_t. The best I can seem to come up with is something like:

#define ATOMIC_OFL_OFFSET	1

static inline int refcount_read(const refcount_t *r)
{
	return atomic_read(&r->refs) + ATOMIC_OFL_OFFSET;
}

static inline void refcount_set(refcount_t *r, int n)
{
	atomic_set(&r->refs, n - ATOMIC_OFL_OFFSET);
}

static inline __must_check bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
{
	int old = atomic_read(&r->refs);

	do {
		if (old == -ATOMIC_OFL)
			break;
	} while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));

	old += ATOMIC_OFL_OFFSET;

	if (oldp)
		*oldp = old;

	if (unlikely(old < 0 || (i > 1 && old + i < 0)))
		refcount_warn_saturate(r, REFCOUNT_ADD_NOT_ZERO_OVF);

	return old;
}

static inline void __refcount_add(int i, refcount_t *, int *oldp)
{
	int old = atomic_fetch_add_relaxed(i, &r->refs) + ATOMIC_OFL_OFFSET;

	if (oldp)
		*oldp = old;

	if (unlikely(!old))
		refcount_warn_saturate(r, REFCOUNT_ADD_UAF);
	if (unlikely(old < 0 || old + i < 0)
		refcount_warn_saturate(r, REFCOUNT_ADD_OVF);
}

And have the generic code have: ATOMIC_OFL_OFFSET == 0.

Do we *really* want to do that ?

With the above, __refcount_add_not_zero(), for the common case of: @i=1,
@oldp=NULL we get:

    a8f7:       41 8b 04 24             mov    (%r12),%eax
    a8fb:       83 f8 ff                cmp    $0xffffffff,%eax
    a8fe:       74 1a                   je     a91a <ring_buffer_get+0x3a>
    a900:       8d 50 01                lea    0x1(%rax),%edx
    a903:       f0 41 0f b1 14 24       lock cmpxchg %edx,(%r12)
    a909:       75 f0                   jne    a8fb <ring_buffer_get+0x1b>
    a90b:       85 d2                   test   %edx,%edx
    a90d:       78 19                   js     a928 <ring_buffer_get+0x48>

Which is actually really nice because i == ATOMIC_OFL_OFFSET.

Anybody? For now I think I'll drop the documentation patch and do this
scheme as the last patch in the series for v2.

Also, Mark suggested I rename the new primitives to:
atomic_*_overflow().

      parent reply	other threads:[~2021-12-10 12:38 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-03 15:35 [PATCH] block: switch to atomic_t for request references Jens Axboe
2021-12-03 15:56 ` Keith Busch
2021-12-06  6:53 ` Christoph Hellwig
2021-12-06  8:31   ` Peter Zijlstra
2021-12-06 16:32     ` Jens Axboe
2021-12-06 17:19       ` Peter Zijlstra
2021-12-06 17:35     ` Linus Torvalds
2021-12-06 18:13       ` Jens Axboe
2021-12-06 20:51         ` Kees Cook
2021-12-06 21:17           ` Linus Torvalds
2021-12-06 23:28             ` Kees Cook
2021-12-07  0:13               ` Linus Torvalds
2021-12-07  4:56                 ` Kees Cook
2021-12-07  9:34                 ` Peter Zijlstra
2021-12-07 16:03                   ` Linus Torvalds
2021-12-07 10:30                 ` Peter Zijlstra
2021-12-07 16:10                   ` Linus Torvalds
2021-12-07 16:23                     ` Peter Zijlstra
2021-12-06 16:31   ` Jens Axboe
2021-12-07 11:26   ` Peter Zijlstra
2021-12-07 13:28     ` Peter Zijlstra
2021-12-07 15:51       ` Peter Zijlstra
2021-12-07 16:13       ` Linus Torvalds
2021-12-07 16:52         ` Peter Zijlstra
2021-12-07 17:41           ` Peter Zijlstra
2021-12-07 17:43           ` Linus Torvalds
2021-12-07 17:45             ` Linus Torvalds
2021-12-07 20:28       ` Peter Zijlstra
2021-12-07 23:23         ` Linus Torvalds
2021-12-08 17:07           ` Peter Zijlstra
2021-12-08 18:00             ` Linus Torvalds
2021-12-08 18:44               ` Peter Zijlstra
2021-12-08 18:50                 ` Linus Torvalds
2021-12-08 20:32                   ` Peter Zijlstra
2021-12-10 10:57                   ` Peter Zijlstra
2021-12-10 12:38               ` Peter Zijlstra [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YbNKLwMm+hv14WZs@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=axboe@kernel.dk \
    --cc=hch@infradead.org \
    --cc=keescook@chromium.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.