* [RFC]cfq-iosched: no dispatch limit for single queue
@ 2009-12-03  3:53 Shaohua Li
  2009-12-03 11:57 ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Shaohua Li @ 2009-12-03  3:53 UTC (permalink / raw)
  To: linux-kernel; +Cc: jens.axboe, akpm

Since commit 2f5cb7381b737e24c8046fd4aeab571fb71315f5, a queue can dispatch
up to 4 * 4 requests if it is the only queue. I wonder why we have such a limit.
Devices that support tagging can take more requests; AHCI, for example, can
take 31. A test (direct aio randread) shows the limit reduces disk throughput
by about 4%.
On the other hand, since we dispatch one request at a time, if another queue
shows up while the current queue is dispatching more than cfq_quantum requests,
the current queue will stop dispatching soon after one request, so it sounds
like there is no big latency cost.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index aa1e953..e05650f 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1298,9 +1298,9 @@ static bool cfq_may_dispatch(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 			return false;
 
 		/*
-		 * Sole queue user, allow bigger slice
+		 * Sole queue user, no limit
 		 */
-		max_dispatch *= 4;
+		max_dispatch = -1;
 	}
 
 	/*
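
Note on the one-line change: assigning -1 acts as "no limit" only if
max_dispatch is an unsigned integer, so that -1 converts to the largest
representable value and the final "dispatched < max_dispatch" comparison
holds for any realistic queue depth. The sketch below is a simplified
user-space model of that logic, not the kernel function. The struct, the
name model_may_dispatch and its parameters are made up for illustration,
and the unsigned type is an assumption about code the diff context does
not show.

```c
#include <stdbool.h>

/* Simplified stand-in for per-queue state; not the kernel's struct cfq_queue. */
struct model_queue {
	unsigned int dispatched;	/* requests currently in flight from this queue */
};

/*
 * Model of the dispatch check after the patch.
 * quantum stands in for cfq_quantum, busy_queues for the number of
 * queues that currently have pending I/O.
 */
bool model_may_dispatch(const struct model_queue *q,
			unsigned int quantum,
			unsigned int busy_queues)
{
	unsigned int max_dispatch = quantum;

	if (q->dispatched >= max_dispatch) {
		/* Other queues are waiting: stop at the quantum. */
		if (busy_queues > 1)
			return false;

		/* Sole queue user: -1 converts to UINT_MAX, i.e. no limit. */
		max_dispatch = -1;
	}

	return q->dispatched < max_dispatch;
}
```

With quantum = 4, a sole queue holding 31 requests in flight may still
dispatch, while the same queue is refused as soon as a second busy queue
exists. That is the behaviour the changelog relies on when it argues
latency stays bounded.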

* Re: [RFC]cfq-iosched: no dispatch limit for single queue
  2009-12-03  3:53 [RFC]cfq-iosched: no dispatch limit for single queue Shaohua Li
@ 2009-12-03 11:57 ` Jens Axboe
  2009-12-04 18:34   ` Corrado Zoccolo
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2009-12-03 11:57 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-kernel, akpm

On Thu, Dec 03 2009, Shaohua Li wrote:
> Since commit 2f5cb7381b737e24c8046fd4aeab571fb71315f5, a queue can dispatch
> up to 4 * 4 requests if it is the only queue. I wonder why we have such a limit.
> Devices that support tagging can take more requests; AHCI, for example, can
> take 31. A test (direct aio randread) shows the limit reduces disk throughput
> by about 4%.
> On the other hand, since we dispatch one request at a time, if another queue
> shows up while the current queue is dispatching more than cfq_quantum requests,
> the current queue will stop dispatching soon after one request, so it sounds
> like there is no big latency cost.

As you mention, we do dispatches in bites of 1. In reality, there's
going to be little difference when we get this far in the depth process,
so I think the patch looks good. I have applied it, thanks.

-- 
Jens Axboe


* Re: [RFC]cfq-iosched: no dispatch limit for single queue
  2009-12-03 11:57 ` Jens Axboe
@ 2009-12-04 18:34   ` Corrado Zoccolo
  2009-12-05  8:50     ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Corrado Zoccolo @ 2009-12-04 18:34 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Shaohua Li, linux-kernel, akpm

Hi Shaohua, Jens,
On Thu, Dec 3, 2009 at 12:57 PM, Jens Axboe <jens.axboe@oracle.com> wrote:
> As you mention, we do dispatches in bites of 1. In reality, there's
> going to be little difference when we get this far in the depth process,
> so I think the patch looks good. I have applied it, thanks.

I think the limit should be removed only for sync queues.
For async queues, if cfq_latency is not set, removing the limit here can
cause very high latencies to sync queues (almost 100% increase),
without a noticeable throughput gain.
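
To make the suggestion concrete, here is a minimal sketch of the ceiling
I have in mind for a queue that is the only busy one. This is a
hypothetical helper, not code from the patch or from cfq-iosched.c: the
function name and parameters are invented for illustration, and it
assumes the ceiling is held in an unsigned int.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper: dispatch ceiling for a queue that has hit its
 * quantum while being the only busy queue.
 */
unsigned int sole_queue_limit(bool is_sync, unsigned int quantum)
{
	/* Sync queue: no limit, as in the applied patch. */
	if (is_sync)
		return UINT_MAX;

	/* Async queue: keep the pre-patch ceiling of 4 * quantum. */
	return quantum * 4;
}
```

With the default quantum of 4 implied by the "4 * 4" figure above, an
async queue would stay capped at 16 requests while a sync queue would be
unbounded.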

Thanks,
Corrado

* Re: [RFC]cfq-iosched: no dispatch limit for single queue
  2009-12-04 18:34   ` Corrado Zoccolo
@ 2009-12-05  8:50     ` Jens Axboe
  2009-12-05 10:48       ` Corrado Zoccolo
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2009-12-05  8:50 UTC (permalink / raw)
  To: Corrado Zoccolo; +Cc: Shaohua Li, linux-kernel, akpm

On Fri, Dec 04 2009, Corrado Zoccolo wrote:
> Hi Shaohua, Jens,
> On Thu, Dec 3, 2009 at 12:57 PM, Jens Axboe <jens.axboe@oracle.com> wrote:
> > As you mention, we do dispatches in bites of 1. In reality, there's
> > going to be little difference when we get this far in the depth process,
> > so I think the patch looks good. I have applied it, thanks.
> 
> I think the limit should be removed only for sync queues.
> For async queues, if cfq_latency is not set, removing the limit here can
> cause very high latencies to sync queues (almost 100% increase),
> without a noticeable throughput gain.

It's always problematic to say 'without a noticeable throughput gain', as
on some workloads/storage, the difference between 16 and e.g. 32 in depth
WILL be noticeable. 16 is already high enough that if we hit that limit,
it will cause a latency hit. The hope here is that larger won't make it
much worse, but we'll see.

-- 
Jens Axboe


* Re: [RFC]cfq-iosched: no dispatch limit for single queue
  2009-12-05  8:50     ` Jens Axboe
@ 2009-12-05 10:48       ` Corrado Zoccolo
  2009-12-05 18:31         ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Corrado Zoccolo @ 2009-12-05 10:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Shaohua Li, linux-kernel, akpm

On Sat, Dec 5, 2009 at 9:50 AM, Jens Axboe <jens.axboe@oracle.com> wrote:
> On Fri, Dec 04 2009, Corrado Zoccolo wrote:
>> I think the limit should be removed only for sync queues.
>> For async queues, if cfq_latency is not set, removing the limit here can
>> cause very high latencies to sync queues (almost 100% increase),
>> without a noticeable throughput gain.
>
> It's always problematic to say 'without a noticeable throughput gain', as
> on some workloads/storage, the difference between 16 and e.g. 32 in depth
> WILL be noticeable.
For async writes, I think that the hardware that could benefit from 32
parallel requests (e.g. RAIDs with > 8 disks) already has a big write
cache, so 16 or 32 doesn't really matter for it. It matters, instead,
on a single SATA disk with NCQ, where having 31 pending requests instead
of 16 will increase the latency of subsequent reads by 120ms in the
worst case.
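
The 120ms figure corresponds to 15 additional requests queued ahead of a
read at roughly 8ms each. The 8ms per-request service time is inferred
from those two numbers as a plausible cost for a random access on a
rotating disk; it is an assumption, not a measurement. A small sketch of
the arithmetic, with a made-up function name:

```c
/*
 * Extra worst-case latency, in milliseconds, seen by a read that queues
 * behind a deeper backlog of async writes. per_request_ms is an assumed
 * average service time per request.
 */
unsigned int extra_read_latency_ms(unsigned int old_depth,
				   unsigned int new_depth,
				   unsigned int per_request_ms)
{
	if (new_depth <= old_depth)
		return 0;

	return (new_depth - old_depth) * per_request_ms;
}
```

For old_depth = 16, new_depth = 31 and per_request_ms = 8 this gives
(31 - 16) * 8 = 120.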

> 16 is already high enough that if we hit that limit,
> it will cause a latency hit. The hope here is that larger won't make it
> much worse, but we'll see.
Ok. Maybe, when one sets low_latency = 0, having the highest write
throughput is also desired, so the additional latency will not be a
problem.

Thanks,
Corrado

* Re: [RFC]cfq-iosched: no dispatch limit for single queue
  2009-12-05 10:48       ` Corrado Zoccolo
@ 2009-12-05 18:31         ` Jens Axboe
  0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2009-12-05 18:31 UTC (permalink / raw)
  To: Corrado Zoccolo; +Cc: Shaohua Li, linux-kernel, akpm

On Sat, Dec 05 2009, Corrado Zoccolo wrote:
> On Sat, Dec 5, 2009 at 9:50 AM, Jens Axboe <jens.axboe@oracle.com> wrote:
> > On Fri, Dec 04 2009, Corrado Zoccolo wrote:
> >> I think the limit should be removed only for sync queues.
> >> For async queues, if cfq_latency is not set, removing the limit here can
> >> cause very high latencies to sync queues (almost 100% increase),
> >> without a noticeable throughput gain.
> >
> > It's always problematic to say 'without a noticeable throughput gain', as
> > on some workloads/storage, the difference between 16 and e.g. 32 in depth
> > WILL be noticeable.
> For async writes, I think that the hardware that could benefit from 32
> parallel requests (e.g. RAIDs with > 8 disks) already has a big write
> cache, so 16 or 32 doesn't really matter for it. It matters, instead,
> on a single SATA disk with NCQ, where having 31 pending requests instead
> of 16 will increase the latency of subsequent reads by 120ms in the
> worst case.

That depends completely on whether that cache is write back or write
through. If it's write through caching, queue depth is the primary
factor in performance for writes. For write back caching, queue depth is
a lot less relevant.

> > 16 is already high enough that if we hit that limit,
> > it will cause a latency hit. The hope here is that larger won't make it
> > much worse, but we'll see.
> Ok. Maybe, when one sets low_latency = 0, having the highest write
> throughput is also desired, so the additional latency will not be a
> problem.

That would be an option, though I'd prefer not putting too much logic
into that latency knob.

-- 
Jens Axboe


Thread overview: 6+ messages
2009-12-03  3:53 [RFC]cfq-iosched: no dispatch limit for single queue Shaohua Li
2009-12-03 11:57 ` Jens Axboe
2009-12-04 18:34   ` Corrado Zoccolo
2009-12-05  8:50     ` Jens Axboe
2009-12-05 10:48       ` Corrado Zoccolo
2009-12-05 18:31         ` Jens Axboe
