From: Kent Overstreet <koverstreet@google.com>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-aio@kvack.org, akpm@linux-foundation.org,
	Zach Brown <zab@redhat.com>, Felipe Balbi <balbi@ti.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Mark Fasheh <mfasheh@suse.com>, Joel Becker <jlbec@evilplan.org>,
	Jens Axboe <axboe@kernel.dk>,
	Asai Thambi S P <asamymuthupa@micron.com>,
	Selvan Mani <smani@micron.com>,
	Sam Bradshaw <sbradshaw@micron.com>,
	Jeff Moyer <jmoyer@redhat.com>, Al Viro <viro@zeniv.linux.org.uk>,
	Benjamin LaHaise <bcrl@kvack.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH 04/21] Generic percpu refcounting
Date: Fri, 31 May 2013 13:12:59 -0700
Message-ID: <20130531201259.GH2291@google.com>
In-Reply-To: <87hahmmldf.fsf@rustcorp.com.au>

On Wed, May 29, 2013 at 02:29:56PM +0930, Rusty Russell wrote:
> Kent Overstreet <koverstreet@google.com> writes:
> > I'm not sure I know of any good way of explaining it intuitively, but
> > here's this at least...
> >
> >  * (More precisely: because modular arithmetic is commutative the sum of all the
> >  * pcpu_count vars will be equal to what it would have been if all the gets and
> >  * puts were done to a single integer, even if some of the percpu integers
> >  * overflow or underflow).
> 
> This seems intuitively obvious, so I wouldn't sweat it too much.  What
> goes up, has to come down somewhere.

I agree, but it seems there's a fair amount of disagreement over what's
intuitive :)

> Yes.  We should note the 31 bit limit somewhere.  We could WARN_ON() if
> count is >= BIAS in percpu_ref_kill(), perhaps.

I'd be hesitant about that - that WARN_ON() would work for this version
(I think) but it'd be incorrect for dynamic percpu refcounting, for
reasons that are almost accidental. And that WARN_ON() isn't going to
fire in anything but the most extreme torture testing.

Besides that, it's hard to imagine a situation where a range of 1 << 32
would be ok but a range of 1 << 31 wouldn't... if we need a WARN_ON()
here we need one for regular atomic_t too, but I don't see either buying
us much.

Also, if/when this is used for something where the range does matter
I'll just switch it to unsigned long (been debating doing that now, but
the aio code was using an atomic_t so I don't really care yet).

It should be documented though - I'll do that.

> >> I probably should have made it clearer.  Sorry about that.  tryget()
> >> is fine.  I was curious about count(), as a query interface which is
> >> racy and can return something unexpected, like a false zero or an
> >> underflowed refcnt, is always a bit dangerous.
> >
> > Yeah, it is, it was intended just for the module code where it's only
> > used for the value lsmod shows.
> 
> Open code it there?

Maybe justified for this, but I'm not a fan of open coding anything that
could be considered library/utility code... better to just document it
with ALL CAPS WARNINGS about being dangerous if used incorrectly.

But we can revisit that if/when the module refcount conversion is done.

> >> Let's just have percpu_ref_kill(ref, release) which puts the base ref
> >> and invokes release whenever it's done.
> >
> > Release has to be stored in struct percpu_ref so it can be invoked
> > after a call_rcu() (percpu_ref_kill -> call_rcu() ->
> > percpu_ref_kill_rcu() -> percpu_ref_put()) so I'm passing it to
> > percpu_ref_init(), but yeah.
> 
> Or hand it to percpu_ref_put(), too, as per kref_put().  I hate indirect
> magic.

The indirect magic is unfortunately necessary because percpu_ref_kill()
has to do a put after a call_rcu().

If the indirect magic wasn't needed I'd prefer to not pass a release
function to anything and just have percpu_ref_put() return bool, but
Tejun disagrees and it's a moot point anyways.


