All of lore.kernel.org
 help / color / mirror / Atom feed
* aio: bump i_count instead of using igrab
@ 2010-08-23 14:47 Chris Mason
  2010-08-23 14:50 ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Mason @ 2010-08-23 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Jeff Moyer

The aio batching code is using igrab to get an extra reference on the
inode so it can safely batch.  igrab will go ahead and take the global
inode spinlock, which can be a bottleneck on large machines doing lots
of AIO.

In this case, igrab isn't required because we already have a reference
on the file handle.  It is safe to just bump the i_count directly
on the inode.

Benchmarking shows this patch brings IOP/s on tons of flash up by about
2.5X.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

diff --git a/fs/aio.c b/fs/aio.c
index cf0bef4..0be69fa 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1558,7 +1558,20 @@ static void aio_batch_add(struct address_space *mapping,
 	}
 
 	abe = mempool_alloc(abe_pool, GFP_KERNEL);
-	BUG_ON(!igrab(mapping->host));
+
+	/*
+	 * we should be using igrab here, but
+	 * we don't want to hammer on the global
+	 * inode spinlock just to take an extra
+	 * reference on a file that we must already
+	 * have a reference to.
+	 *
+	 * When we're called, we always have a reference
+	 * on the file, so we must always have a reference
+	 * on the inode, so igrab must always just
+	 * bump the count and move on.
+	 */
+	atomic_inc(&mapping->host->i_count);
 	abe->mapping = mapping;
 	hlist_add_head(&abe->list, &batch_hash[bucket]);
 	return;

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: aio: bump i_count instead of using igrab
  2010-08-23 14:47 aio: bump i_count instead of using igrab Chris Mason
@ 2010-08-23 14:50 ` Christoph Hellwig
  2010-08-23 15:00   ` Chris Mason
  2010-08-23 21:18   ` Jeff Moyer
  0 siblings, 2 replies; 6+ messages in thread
From: Christoph Hellwig @ 2010-08-23 14:50 UTC (permalink / raw)
  To: Chris Mason, linux-kernel, linux-fsdevel, Jeff Moyer

On Mon, Aug 23, 2010 at 10:47:55AM -0400, Chris Mason wrote:
> The aio batching code is using igrab to get an extra reference on the
> inode so it can safely batch.  igrab will go ahead and take the global
> inode spinlock, which can be a bottleneck on large machines doing lots
> of AIO.
> 
> In this case, igrab isn't required because we already have a reference
> on the file handle.  It is safe to just bump the i_count directly
> on the inode.
> 
> Benchmarking shows this patch brings IOP/s on tons of flash up by about
> 2.5X.

There's some places in XFS where we do the same, and it showed up as a
bottle neck before.  Instead of open coding the increment we have
a wrapper that includes and assert that the numbers is always positive.

I think we really want a proper helper for general use instead of
completly opencoding it.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: aio: bump i_count instead of using igrab
  2010-08-23 14:50 ` Christoph Hellwig
@ 2010-08-23 15:00   ` Chris Mason
  2010-08-23 17:26     ` Jeff Moyer
  2010-08-24  7:33     ` Nick Piggin
  2010-08-23 21:18   ` Jeff Moyer
  1 sibling, 2 replies; 6+ messages in thread
From: Chris Mason @ 2010-08-23 15:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, linux-fsdevel, Jeff Moyer, npiggin

On Mon, Aug 23, 2010 at 10:50:31AM -0400, Christoph Hellwig wrote:
> On Mon, Aug 23, 2010 at 10:47:55AM -0400, Chris Mason wrote:
> > The aio batching code is using igrab to get an extra reference on the
> > inode so it can safely batch.  igrab will go ahead and take the global
> > inode spinlock, which can be a bottleneck on large machines doing lots
> > of AIO.
> > 
> > In this case, igrab isn't required because we already have a reference
> > on the file handle.  It is safe to just bump the i_count directly
> > on the inode.
> > 
> > Benchmarking shows this patch brings IOP/s on tons of flash up by about
> > 2.5X.
> 
> There's some places in XFS where we do the same, and it showed up as a
> bottle neck before.  Instead of open coding the increment we have
> a wrapper that includes and assert that the numbers is always positive.
> 
> I think we really want a proper helper for general use instead of
> completly opencoding it.
> 

Nick, this is about a 1 liner to fs/aio.c replacing igrab with
atomic_inc directly on the inode reference count.

I know your scalability tree gets rid of the global, but in this case I
think it still makes sense to avoid the locking completely when the
caller knows it is safe.  Do you already have something similar hiding
in the scalability tree?

-chris

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: aio: bump i_count instead of using igrab
  2010-08-23 15:00   ` Chris Mason
@ 2010-08-23 17:26     ` Jeff Moyer
  2010-08-24  7:33     ` Nick Piggin
  1 sibling, 0 replies; 6+ messages in thread
From: Jeff Moyer @ 2010-08-23 17:26 UTC (permalink / raw)
  To: Chris Mason; +Cc: Christoph Hellwig, linux-kernel, linux-fsdevel, npiggin

Chris Mason <chris.mason@oracle.com> writes:

> On Mon, Aug 23, 2010 at 10:50:31AM -0400, Christoph Hellwig wrote:
>> On Mon, Aug 23, 2010 at 10:47:55AM -0400, Chris Mason wrote:
>> > The aio batching code is using igrab to get an extra reference on the
>> > inode so it can safely batch.  igrab will go ahead and take the global
>> > inode spinlock, which can be a bottleneck on large machines doing lots
>> > of AIO.
>> > 
>> > In this case, igrab isn't required because we already have a reference
>> > on the file handle.  It is safe to just bump the i_count directly
>> > on the inode.
>> > 
>> > Benchmarking shows this patch brings IOP/s on tons of flash up by about
>> > 2.5X.
>> 
>> There's some places in XFS where we do the same, and it showed up as a
>> bottle neck before.  Instead of open coding the increment we have
>> a wrapper that includes and assert that the numbers is always positive.
>> 
>> I think we really want a proper helper for general use instead of
>> completly opencoding it.
>> 
>
> Nick, this is about a 1 liner to fs/aio.c replacing igrab with
> atomic_inc directly on the inode reference count.
>
> I know your scalability tree gets rid of the global, but in this case I
> think it still makes sense to avoid the locking completely when the
> caller knows it is safe.  Do you already have something similar hiding
> in the scalability tree?

I opted for the safe route, initially, as I was not too familiar with
the locking.  If it's deemed safe to just do the increment, that works
for me.

Thanks for tracking this down, Chris!

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: aio: bump i_count instead of using igrab
  2010-08-23 14:50 ` Christoph Hellwig
  2010-08-23 15:00   ` Chris Mason
@ 2010-08-23 21:18   ` Jeff Moyer
  1 sibling, 0 replies; 6+ messages in thread
From: Jeff Moyer @ 2010-08-23 21:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, linux-kernel, linux-fsdevel

Christoph Hellwig <hch@infradead.org> writes:

> On Mon, Aug 23, 2010 at 10:47:55AM -0400, Chris Mason wrote:
>> The aio batching code is using igrab to get an extra reference on the
>> inode so it can safely batch.  igrab will go ahead and take the global
>> inode spinlock, which can be a bottleneck on large machines doing lots
>> of AIO.
>> 
>> In this case, igrab isn't required because we already have a reference
>> on the file handle.  It is safe to just bump the i_count directly
>> on the inode.
>> 
>> Benchmarking shows this patch brings IOP/s on tons of flash up by about
>> 2.5X.
>
> There's some places in XFS where we do the same, and it showed up as a
> bottle neck before.  Instead of open coding the increment we have
> a wrapper that includes and assert that the numbers is always positive.
>
> I think we really want a proper helper for general use instead of
> completly opencoding it.

Well, it would make detecting races or invalid assumptions a little
easier.  If Chris wants to code that up, that's fine with me.  Honestly,
though, I don't think it's necessary.

I've gone through the alloc/free paths for the inode and I'm convinced
this is safe.  I'm happy with this version of the patch.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: aio: bump i_count instead of using igrab
  2010-08-23 15:00   ` Chris Mason
  2010-08-23 17:26     ` Jeff Moyer
@ 2010-08-24  7:33     ` Nick Piggin
  1 sibling, 0 replies; 6+ messages in thread
From: Nick Piggin @ 2010-08-24  7:33 UTC (permalink / raw)
  To: Chris Mason, Christoph Hellwig, linux-kernel, linux-fsdevel,
	Jeff Moyer, npiggin

On Mon, Aug 23, 2010 at 11:00:23AM -0400, Chris Mason wrote:
> On Mon, Aug 23, 2010 at 10:50:31AM -0400, Christoph Hellwig wrote:
> > On Mon, Aug 23, 2010 at 10:47:55AM -0400, Chris Mason wrote:
> > > The aio batching code is using igrab to get an extra reference on the
> > > inode so it can safely batch.  igrab will go ahead and take the global
> > > inode spinlock, which can be a bottleneck on large machines doing lots
> > > of AIO.
> > > 
> > > In this case, igrab isn't required because we already have a reference
> > > on the file handle.  It is safe to just bump the i_count directly
> > > on the inode.
> > > 
> > > Benchmarking shows this patch brings IOP/s on tons of flash up by about
> > > 2.5X.
> > 
> > There's some places in XFS where we do the same, and it showed up as a
> > bottle neck before.  Instead of open coding the increment we have
> > a wrapper that includes and assert that the numbers is always positive.
> > 
> > I think we really want a proper helper for general use instead of
> > completly opencoding it.

Yeah igrab isn't nice. Several places are calling it without checking
return code too, which means they are either buggy or should be using
something else.


> Nick, this is about a 1 liner to fs/aio.c replacing igrab with
> atomic_inc directly on the inode reference count.
> 
> I know your scalability tree gets rid of the global, but in this case I
> think it still makes sense to avoid the locking completely when the
> caller knows it is safe.  Do you already have something similar hiding
> in the scalability tree?

Yes of course :) It has just simple refcounting increment helper that
can do some of the assertions.

I am hoping to be able to post the inode stuff soonish and hopefully
get it into Al's tree. It will pop up on fsdevel.



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-08-24  7:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-23 14:47 aio: bump i_count instead of using igrab Chris Mason
2010-08-23 14:50 ` Christoph Hellwig
2010-08-23 15:00   ` Chris Mason
2010-08-23 17:26     ` Jeff Moyer
2010-08-24  7:33     ` Nick Piggin
2010-08-23 21:18   ` Jeff Moyer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.