* [PATCH] AIO: Don't plug the I/O queue in do_io_submit()
@ 2011-12-13 21:44 Dave Kleikamp
  2011-12-13 22:18 ` Jeff Moyer
  2011-12-15  1:09 ` Shaohua Li
  0 siblings, 2 replies; 8+ messages in thread
From: Dave Kleikamp @ 2011-12-13 21:44 UTC (permalink / raw)
  To: linux-aio; +Cc: linux-kernel, Chris Mason, Jens Axboe, Andi Kleen, Jeff Moyer

Asynchronous I/O latency to a solid-state disk greatly increased
between the 2.6.32 and 3.0 kernels. By removing the plug from
do_io_submit(), we observed a 34% improvement in the I/O latency.

Unfortunately, at this level, we don't know if the request is to
a rotating disk or not.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: linux-aio@kvack.org
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>

diff --git a/fs/aio.c b/fs/aio.c
index 78c514c..d131a2c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1696,7 +1696,6 @@ long do_io_submit(aio_context_t ctx_id, long nr,
 	struct kioctx *ctx;
 	long ret = 0;
 	int i = 0;
-	struct blk_plug plug;
 	struct kiocb_batch batch;
 
 	if (unlikely(nr < 0))
@@ -1716,8 +1715,6 @@ long do_io_submit(aio_context_t ctx_id, long nr,
 
 	kiocb_batch_init(&batch, nr);
 
-	blk_start_plug(&plug);
-
 	/*
 	 * AKPM: should this return a partial result if some of the IOs were
 	 * successfully submitted?
@@ -1740,7 +1737,6 @@ long do_io_submit(aio_context_t ctx_id, long nr,
 		if (ret)
 			break;
 	}
-	blk_finish_plug(&plug);
 
 	kiocb_batch_free(&batch);
 	put_ioctx(ctx);
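
For context, a minimal sketch of the on-stack plugging pattern being
removed here, using the 3.0-era block layer API (the bio handling is
illustrative, not code from fs/aio.c):

#include <linux/blkdev.h>
#include <linux/bio.h>

/*
 * Sketch only: while the plug is active, requests submitted by this
 * task sit on a per-task list and are only dispatched to the driver
 * when blk_finish_plug() runs (or the task schedules away). That
 * deferral enables merging on rotating disks but can add latency on
 * a fast SSD.
 */
static void submit_batch_plugged(struct bio **bios, int nr)
{
	struct blk_plug plug;
	int i;

	blk_start_plug(&plug);
	for (i = 0; i < nr; i++)
		submit_bio(READ, bios[i]);	/* queued behind the plug */
	blk_finish_plug(&plug);			/* requests reach the device here */
}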


* Re: [PATCH] AIO: Don't plug the I/O queue in do_io_submit()
  2011-12-13 21:44 [PATCH] AIO: Don't plug the I/O queue in do_io_submit() Dave Kleikamp
@ 2011-12-13 22:18 ` Jeff Moyer
  2011-12-13 23:26   ` Dave Kleikamp
  2011-12-15  1:09 ` Shaohua Li
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff Moyer @ 2011-12-13 22:18 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: linux-aio, linux-kernel, Chris Mason, Jens Axboe, Andi Kleen

Dave Kleikamp <dave.kleikamp@oracle.com> writes:

> Asynchronous I/O latency to a solid-state disk greatly increased
> between the 2.6.32 and 3.0 kernels. By removing the plug from
> do_io_submit(), we observed a 34% improvement in the I/O latency.
>
> Unfortunately, at this level, we don't know if the request is to
> a rotating disk or not.

I'm guessing I know the answer to this, but what workload were you
testing, and can you provide more concrete evidence than "latency
greatly increased?"  Also, have you tested the effects this has when
using traditional storage for whatever your workload is?  I don't feel
comfortable essentially reverting a performance patch without knowing
the entire picture.  I will certainly do some testing on my end, too.

Cheers,
Jeff


* Re: [PATCH] AIO: Don't plug the I/O queue in do_io_submit()
  2011-12-13 22:18 ` Jeff Moyer
@ 2011-12-13 23:26   ` Dave Kleikamp
  2011-12-14 20:58     ` Chris Mason
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Kleikamp @ 2011-12-13 23:26 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: linux-aio, linux-kernel, Chris Mason, Jens Axboe, Andi Kleen

On 12/13/2011 04:18 PM, Jeff Moyer wrote:
> Dave Kleikamp <dave.kleikamp@oracle.com> writes:
> 
>> Asynchronous I/O latency to a solid-state disk greatly increased
>> between the 2.6.32 and 3.0 kernels. By removing the plug from
>> do_io_submit(), we observed a 34% improvement in the I/O latency.
>>
>> Unfortunately, at this level, we don't know if the request is to
>> a rotating disk or not.
> 
> I'm guessing I know the answer to this, but what workload were you
> testing, and can you provide more concrete evidence than "latency
> greatly increased?"

It is a piece of a larger industry-standard benchmark and you're
probably guessing correctly. The "greatly increased" latency was
actually slightly higher than the improvement I get with this patch. So
the patch brought the latency nearly back down to where it was before.

I will try a microbenchmark to see if I get similar behavior, but I
wanted to throw this out here to get input.

I also failed to mention that the earlier kernel was a vendor kernel
(similar results on both Red Hat and Oracle kernels). The 3.0 kernel is
much closer to mainline, but I haven't played with mainline kernels yet.
I expect similar results, but I can verify that.

> Also, have you tested the effects this has when
> using traditional storage for whatever your workload is?

That may be difficult, but hopefully, I can demonstrate it with a
simpler benchmark which I could test on traditional storage.

> I don't feel
> comfortable essentially reverting a performance patch without knowing
> the entire picture.  I will certainly do some testing on my end, too.

Understood. Thanks,

Shaggy

> Cheers,
> Jeff


* Re: [PATCH] AIO: Don't plug the I/O queue in do_io_submit()
  2011-12-13 23:26   ` Dave Kleikamp
@ 2011-12-14 20:58     ` Chris Mason
  2011-12-16 14:45       ` Jeff Moyer
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Mason @ 2011-12-14 20:58 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: Jeff Moyer, linux-aio, linux-kernel, Jens Axboe, Andi Kleen

On Tue, Dec 13, 2011 at 05:26:07PM -0600, Dave Kleikamp wrote:
> On 12/13/2011 04:18 PM, Jeff Moyer wrote:
> > Dave Kleikamp <dave.kleikamp@oracle.com> writes:
> > 
> >> Asynchronous I/O latency to a solid-state disk greatly increased
> >> between the 2.6.32 and 3.0 kernels. By removing the plug from
> >> do_io_submit(), we observed a 34% improvement in the I/O latency.
> >>
> >> Unfortunately, at this level, we don't know if the request is to
> >> a rotating disk or not.
> > 
> > I'm guessing I know the answer to this, but what workload were you
> > testing, and can you provide more concrete evidence than "latency
> > greatly increased?"
> 
> It is a piece of a larger industry-standard benchmark and you're
> probably guessing correctly. The "greatly increased" latency was
> actually slightly higher than the improvement I get with this patch. So
> the patch brought the latency nearly back down to where it was before.
> 
>  I will try a microbenchmark to see if I get similar behavior, but I
> wanted to throw this out here to get input.

The better IO latency did bump the overall benchmark score by 3%, and it
did end up bringing our latencies on par with Solaris runs on similar
hardware.

We didn't find this one through exhaustive tracing...instead we used a more
traditional approach involving a list of Jens' commits and a dart board.
So, we don't have a lot of data yet on exactly why the plug is hurting.

But, I'm starting to wonder if the plug makes sense here at all.  We're
queueing up IO in the main submit loop, and the aio submit might be
spanning any number of devices on a large variety of filesystems.  The
actual direct IO call may be pretty expensive.

My guess for why this helps is contention on the aio context lock
between the submission code and the end_io softirq code.  We bash on
that lock a number of times during the IO submit, and the whole time
we're holding on to our list of plugged IOs instead of giving the
hardware the chance to process them.

-chris



* Re: [PATCH] AIO: Don't plug the I/O queue in do_io_submit()
  2011-12-13 21:44 [PATCH] AIO: Don't plug the I/O queue in do_io_submit() Dave Kleikamp
  2011-12-13 22:18 ` Jeff Moyer
@ 2011-12-15  1:09 ` Shaohua Li
  2011-12-15 16:15   ` Jens Axboe
  1 sibling, 1 reply; 8+ messages in thread
From: Shaohua Li @ 2011-12-15  1:09 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: linux-aio, linux-kernel, Chris Mason, Jens Axboe, Andi Kleen, Jeff Moyer

2011/12/14 Dave Kleikamp <dave.kleikamp@oracle.com>:
> Asynchronous I/O latency to a solid-state disk greatly increased
> between the 2.6.32 and 3.0 kernels. By removing the plug from
> do_io_submit(), we observed a 34% improvement in the I/O latency.
>
> Unfortunately, at this level, we don't know if the request is to
> a rotating disk or not.
>
> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
> Cc: linux-aio@kvack.org
> Cc: Chris Mason <chris.mason@oracle.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Jeff Moyer <jmoyer@redhat.com>
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 78c514c..d131a2c 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1696,7 +1696,6 @@ long do_io_submit(aio_context_t ctx_id, long nr,
>        struct kioctx *ctx;
>        long ret = 0;
>        int i = 0;
> -       struct blk_plug plug;
>        struct kiocb_batch batch;
>
>        if (unlikely(nr < 0))
> @@ -1716,8 +1715,6 @@ long do_io_submit(aio_context_t ctx_id, long nr,
>
>        kiocb_batch_init(&batch, nr);
>
> -       blk_start_plug(&plug);
> -
>        /*
>         * AKPM: should this return a partial result if some of the IOs were
>         * successfully submitted?
> @@ -1740,7 +1737,6 @@ long do_io_submit(aio_context_t ctx_id, long nr,
>                if (ret)
>                        break;
>        }
> -       blk_finish_plug(&plug);
>
>        kiocb_batch_free(&batch);
>        put_ioctx(ctx);
Can you explain why this can help? Note that in the 3.1 kernel we now
force-flush the plug list if it gets too long, which removes a lot of
the latency.
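
Roughly, the 3.1 behaviour looks like this (sketch only; the threshold
name and value are placeholders, not the kernel's actual constant):

#include <linux/blkdev.h>

#define PLUG_FLUSH_THRESHOLD	16	/* illustrative value */

/*
 * When adding a request to the plug list, flush the whole list once it
 * grows past a threshold instead of waiting for blk_finish_plug(), so
 * a long submission can't keep requests away from the device forever.
 */
static void plug_add_request(struct blk_plug *plug, struct request *rq)
{
	list_add_tail(&rq->queuelist, &plug->list);
	if (++plug->count >= PLUG_FLUSH_THRESHOLD) {
		blk_flush_plug_list(plug, false);
		plug->count = 0;
	}
}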


* Re: [PATCH] AIO: Don't plug the I/O queue in do_io_submit()
  2011-12-15  1:09 ` Shaohua Li
@ 2011-12-15 16:15   ` Jens Axboe
  2011-12-15 16:40     ` Chris Mason
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2011-12-15 16:15 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Dave Kleikamp, linux-aio, linux-kernel, Chris Mason, Andi Kleen,
	Jeff Moyer

On 2011-12-15 02:09, Shaohua Li wrote:
> 2011/12/14 Dave Kleikamp <dave.kleikamp@oracle.com>:
>> Asynchronous I/O latency to a solid-state disk greatly increased
>> between the 2.6.32 and 3.0 kernels. By removing the plug from
>> do_io_submit(), we observed a 34% improvement in the I/O latency.
>>
>> Unfortunately, at this level, we don't know if the request is to
>> a rotating disk or not.
>>
>> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
>> Cc: linux-aio@kvack.org
>> Cc: Chris Mason <chris.mason@oracle.com>
>> Cc: Jens Axboe <axboe@kernel.dk>
>> Cc: Andi Kleen <ak@linux.intel.com>
>> Cc: Jeff Moyer <jmoyer@redhat.com>
>>
>> diff --git a/fs/aio.c b/fs/aio.c
>> index 78c514c..d131a2c 100644
>> --- a/fs/aio.c
>> +++ b/fs/aio.c
>> @@ -1696,7 +1696,6 @@ long do_io_submit(aio_context_t ctx_id, long nr,
>>        struct kioctx *ctx;
>>        long ret = 0;
>>        int i = 0;
>> -       struct blk_plug plug;
>>        struct kiocb_batch batch;
>>
>>        if (unlikely(nr < 0))
>> @@ -1716,8 +1715,6 @@ long do_io_submit(aio_context_t ctx_id, long nr,
>>
>>        kiocb_batch_init(&batch, nr);
>>
>> -       blk_start_plug(&plug);
>> -
>>        /*
>>         * AKPM: should this return a partial result if some of the IOs were
>>         * successfully submitted?
>> @@ -1740,7 +1737,6 @@ long do_io_submit(aio_context_t ctx_id, long nr,
>>                if (ret)
>>                        break;
>>        }
>> -       blk_finish_plug(&plug);
>>
>>        kiocb_batch_free(&batch);
>>        put_ioctx(ctx);
> Can you explain why this can help? Note that in the 3.1 kernel we now
> force-flush the plug list if it gets too long, which removes a lot of
> the latency.

I think that would indeed be an interesting addition to test on top of
the 3.0 kernel being used.

This is a bit of a sticky situation. We want the plugging and merging on
rotational storage, and on SSDs we want the batch addition to the queue
to avoid hammering on the queue lock. At this level, we have no idea.
But we don't want to introduce longer latencies. So the question is: are
these latencies due to long queues (and hence would be helped by the
auto-replug on 3.1 and newer), or are they due to the submissions
running for too long? If the latter, then we can either look into
reducing the time spent between submitting the individual pieces, or at
least avoid holding the plug for too long.
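
To picture that last option (purely illustrative, not a concrete
proposal; AIO_PLUG_BATCH is a made-up value):

#define AIO_PLUG_BATCH	16	/* made-up batch size */

	blk_start_plug(&plug);
	for (i = 0; i < nr; i++) {
		/* ... copy the iocb from userspace, submit it ... */
		if ((i + 1) % AIO_PLUG_BATCH == 0) {
			/* don't hold requests back for the whole syscall */
			blk_finish_plug(&plug);
			blk_start_plug(&plug);
		}
	}
	blk_finish_plug(&plug);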

-- 
Jens Axboe



* Re: [PATCH] AIO: Don't plug the I/O queue in do_io_submit()
  2011-12-15 16:15   ` Jens Axboe
@ 2011-12-15 16:40     ` Chris Mason
  0 siblings, 0 replies; 8+ messages in thread
From: Chris Mason @ 2011-12-15 16:40 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Shaohua Li, Dave Kleikamp, linux-aio, linux-kernel, Andi Kleen,
	Jeff Moyer

On Thu, Dec 15, 2011 at 05:15:26PM +0100, Jens Axboe wrote:
> On 2011-12-15 02:09, Shaohua Li wrote:
> > 2011/12/14 Dave Kleikamp <dave.kleikamp@oracle.com>:
> >> Asynchronous I/O latency to a solid-state disk greatly increased
> >> between the 2.6.32 and 3.0 kernels. By removing the plug from
> >> do_io_submit(), we observed a 34% improvement in the I/O latency.
> >>
> >> Unfortunately, at this level, we don't know if the request is to
> >> a rotating disk or not.
> >>
> >> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
> >> Cc: linux-aio@kvack.org
> >> Cc: Chris Mason <chris.mason@oracle.com>
> >> Cc: Jens Axboe <axboe@kernel.dk>
> >> Cc: Andi Kleen <ak@linux.intel.com>
> >> Cc: Jeff Moyer <jmoyer@redhat.com>
> >>
> >> diff --git a/fs/aio.c b/fs/aio.c
> >> index 78c514c..d131a2c 100644
> >> --- a/fs/aio.c
> >> +++ b/fs/aio.c
> >> @@ -1696,7 +1696,6 @@ long do_io_submit(aio_context_t ctx_id, long nr,
> >>        struct kioctx *ctx;
> >>        long ret = 0;
> >>        int i = 0;
> >> -       struct blk_plug plug;
> >>        struct kiocb_batch batch;
> >>
> >>        if (unlikely(nr < 0))
> >> @@ -1716,8 +1715,6 @@ long do_io_submit(aio_context_t ctx_id, long nr,
> >>
> >>        kiocb_batch_init(&batch, nr);
> >>
> >> -       blk_start_plug(&plug);
> >> -
> >>        /*
> >>         * AKPM: should this return a partial result if some of the IOs were
> >>         * successfully submitted?
> >> @@ -1740,7 +1737,6 @@ long do_io_submit(aio_context_t ctx_id, long nr,
> >>                if (ret)
> >>                        break;
> >>        }
> >> -       blk_finish_plug(&plug);
> >>
> >>        kiocb_batch_free(&batch);
> >>        put_ioctx(ctx);
> > Can you explain why this can help? Note that in the 3.1 kernel we now
> > force-flush the plug list if it gets too long, which removes a lot of
> > the latency.
> 
> I think that would indeed be an interesting addition to test on top of
> the 3.0 kernel being used.
> 
> This is a bit of a sticky situation. We want the plugging and merging on
> rotational storage, and on SSDs we want the batch addition to the queue
> to avoid hammering on the queue lock. At this level, we have no idea.
> But we don't want to introduce longer latencies. So the question is: are
> these latencies due to long queues (and hence would be helped by the
> auto-replug on 3.1 and newer), or are they due to the submissions
> running for too long? If the latter, then we can either look into
> reducing the time spent between submitting the individual pieces, or at
> least avoid holding the plug for too long.

Each io_submit call is sending down about 34K of IO to two different devices.
The latencies were measured just on the process writing the redo
logs, so it is a very specific subset of the overall benchmark.

The patched kernel only does 4x more iops for the redo logs than the
unpatched kernel, so we're talking ~8K ios here.

-chris



* Re: [PATCH] AIO: Don't plug the I/O queue in do_io_submit()
  2011-12-14 20:58     ` Chris Mason
@ 2011-12-16 14:45       ` Jeff Moyer
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Moyer @ 2011-12-16 14:45 UTC (permalink / raw)
  To: Chris Mason
  Cc: Dave Kleikamp, linux-aio, linux-kernel, Jens Axboe, Andi Kleen

Chris Mason <chris.mason@oracle.com> writes:

> On Tue, Dec 13, 2011 at 05:26:07PM -0600, Dave Kleikamp wrote:
>> On 12/13/2011 04:18 PM, Jeff Moyer wrote:
>> > Dave Kleikamp <dave.kleikamp@oracle.com> writes:
>> > 
>> >> Asynchronous I/O latency to a solid-state disk greatly increased
>> >> between the 2.6.32 and 3.0 kernels. By removing the plug from
>> >> do_io_submit(), we observed a 34% improvement in the I/O latency.
>> >>
>> >> Unfortunately, at this level, we don't know if the request is to
>> >> a rotating disk or not.
>> > 
>> > I'm guessing I know the answer to this, but what workload were you
>> > testing, and can you provide more concrete evidence than "latency
>> > greatly increased?"
>> 
>> It is a piece of a larger industry-standard benchmark and you're
>> probably guessing correctly. The "greatly increased" latency was
>> actually slightly higher than the improvement I get with this patch. So
>> the patch brought the latency nearly back down to where it was before.
>> 
>>  I will try a microbenchmark to see if I get similar behavior, but I
>> wanted to throw this out here to get input.
>
> The better IO latency did bump the overall benchmark score by 3%, and it
> did end up bringing our latencies on par with Solaris runs on similar
> hardware.
>
> We didn't find this one through exhaustive tracing...instead we used a more
> traditional approach involving a list of Jens' commits and a dart board.
> So, we don't have a lot of data yet on exactly why the plug is hurting.
>
> But, I'm starting to wonder if the plug makes sense here at all.  We're
> queueing up IO in the main submit loop, and the aio submit might be
> spanning any number of devices on a large variety of filesystems.  The
> actual direct IO call may be pretty expensive.

I believe the original plugging here was done on a per fd basis.  So, I
concede that the behaviour may have changed a bit since the initial
patch for this was merged.

> My guess for why this helps is contention on the aio context lock
> between the submission code and the end_io softirq code.  We bash on
> that lock a number of times during the IO submit, and the whole time
> we're holding on to our list of plugged IOs instead of giving the
> hardware the chance to process them.

I have a patch slated for 3.2 that should help that.  It batches the
allocation of the aio requests, which showed a good improvement in
microbenchmarks there.

commit 080d676de095a14ecba14c0b9a91acb5bbb634df
Author: Jeff Moyer <jmoyer@redhat.com>
Date:   Wed Nov 2 13:40:10 2011 -0700

    aio: allocate kiocbs in batches
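
Loosely, the idea is this (sketch only; kiocb_batch_init() matches the
call visible in the diff above, but the struct layout, the
kiocb_batch_refill() signature and the batch_get_kiocb() helper are
approximations, not the commit's actual code):

struct kiocb_batch {
	struct list_head head;	/* pre-allocated, not-yet-used kiocbs */
	long count;		/* how many more this io_submit still needs */
};

static void kiocb_batch_init(struct kiocb_batch *batch, long total)
{
	INIT_LIST_HEAD(&batch->head);
	batch->count = total;
}

static struct kiocb *batch_get_kiocb(struct kioctx *ctx,
				     struct kiocb_batch *batch)
{
	if (list_empty(&batch->head))
		kiocb_batch_refill(ctx, batch);	/* one locked pass fills several */
	if (list_empty(&batch->head))
		return NULL;			/* out of memory or ring full */
	return list_first_entry(&batch->head, struct kiocb, ki_batch);
}

The point is that the context lock and ring accounting are touched once
per refill rather than once per kiocb.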

Cheers,
Jeff

