* [RFD] I/O scheduling in blk-mq
@ 2016-08-08 14:09 Paolo
  2016-08-08 15:26 ` Bart Van Assche
  2016-08-08 20:09 ` Omar Sandoval
  0 siblings, 2 replies; 10+ messages in thread
From: Paolo @ 2016-08-08 14:09 UTC
  To: Jens Axboe, Tejun Heo, Christoph Hellwig, linux-block,
	linux-kernel, Ulf Hansson, Linus Walleij, broonie

Hi Jens, Tejun, Christoph, all,
AFAIK blk-mq does not yet feature I/O schedulers. In particular, there
is no scheduler providing strong guarantees in terms of
responsiveness, latency for time-sensitive applications and bandwidth
distribution.

For this reason, I'm trying to port BFQ to blk-mq, or to develop
something simpler if even a reduced version of BFQ proves to be too
heavy (this project is supported by Linaro). If you are willing to
provide some feedback in this respect, I would like to ask for
opinions/suggestions on the following two matters, and possibly to
open a more general discussion on I/O scheduling in blk-mq.

1) My idea is to have an independent instance of BFQ, or in general of
the I/O scheduler, executed for each software queue. Then there would
be no global scheduling. The drawback of no global scheduling is that
no process can get more than 1/M of the total throughput of the
device, where M is the number of software queues. But, if I'm not
mistaken, it is in any case infeasible to give a process more than 1/M
of the total throughput without lowering the throughput itself. In
fact, giving a process more than 1/M of the total throughput implies
serving its software queue, say Q, more than the others. The only way
to do that is to periodically stop the service of the other software
queues and dispatch only the requests in Q. But this would reduce
parallelism, which is the main way blk-mq achieves very high
throughput. Are these considerations, and in particular the idea of
one independent I/O scheduler per software queue, sensible?

2) To provide per-process service guarantees, an I/O scheduler must
create per-process internal queues. BFQ and CFQ use I/O contexts to
achieve this goal. Is something like that (or exactly the same) also
available in blk-mq? If so, do you have any suggestions, or links to
documentation/code, on how to use what is available in blk-mq?
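
The throughput argument in point 1) can be checked with a little
arithmetic. The sketch below is a toy model, not kernel code, and it
rests on two assumptions that go beyond what is stated above: the device
reaches its peak throughput T only while all M software queues dispatch
in parallel, and a single queue served alone sustains T/M. The function
names are made up for illustration.

```c
#include <assert.h>

/*
 * Toy model, not kernel code. Assumptions: the device delivers peak
 * throughput T only while all m software queues dispatch in parallel,
 * and a single queue served alone sustains T/m.
 */

/* Fraction of time the other queues must be stopped so that queue Q
 * receives share f (1/m <= f <= 1) of all the service delivered. */
double exclusive_time_fraction(int m, double f)
{
    assert(m >= 2 && f * m >= 1.0 && f <= 1.0);
    return (m - 1.0 / f) / (m - 1.0);
}

/* Resulting aggregate throughput, as a fraction of the peak T. */
double relative_throughput(int m, double f)
{
    double x = exclusive_time_fraction(m, f);

    /* x of the time only Q runs (T/m); the rest, all queues run (T). */
    return x / m + (1.0 - x);
}
```

Under these assumptions the aggregate throughput works out to T/(f*M):
with M = 4, raising the share of Q from 1/4 to 1/2 means stopping the
other queues two thirds of the time, and the total throughput drops to
about half of the peak.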

Thanks,
Paolo

* Re: [RFD] I/O scheduling in blk-mq
  2016-08-08 14:09 [RFD] I/O scheduling in blk-mq Paolo
@ 2016-08-08 15:26 ` Bart Van Assche
  2016-08-08 20:09 ` Omar Sandoval
  1 sibling, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2016-08-08 15:26 UTC
  To: Paolo, Jens Axboe, Tejun Heo, Christoph Hellwig, linux-block,
	linux-kernel, Ulf Hansson, Linus Walleij, broonie

On 08/08/16 07:09, Paolo wrote:
> 2) To provide per-process service guarantees, an I/O scheduler must
> create per-process internal queues. BFQ and CFQ use I/O contexts to
> achieve this goal. Is something like that (or exactly the same)
> available also in blk-mq? If so, do you have any suggestion, or link to
> documentation/code on how to use what is available in blk-mq?

Hello Paolo,

I/O contexts are, by definition, data structures that are shared by 
multiple I/O queues. blk-mq reaches high performance by keeping each 
per-CPU queue independent. This means that using I/O contexts in a 
blk-mq I/O scheduler would introduce a contention point and probably 
also a performance bottleneck. So I would appreciate it if multiqueue 
schedulers would avoid constructs similar to I/O contexts.
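
To make the concern concrete, here is a toy sketch of the kind of
per-process table an I/O-context-style scheduler needs. It is a
userspace model, not the kernel's I/O context code, and every name in it
is hypothetical. In a multiqueue setting each per-CPU queue would have
to look up and update the same entries, which is where the contention
would come from (a real implementation would also need locking, omitted
here).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PROCS 8

struct toy_proc_queue {
    int pid;        /* owner process; 0 marks a free slot */
    int nr_queued;  /* requests waiting in this per-process queue */
};

struct toy_sched {
    struct toy_proc_queue queues[TOY_MAX_PROCS];
};

/* Return the queue for pid, creating it on first use; NULL if full. */
struct toy_proc_queue *toy_get_queue(struct toy_sched *s, int pid)
{
    struct toy_proc_queue *free_slot = NULL;

    assert(pid > 0);
    for (size_t i = 0; i < TOY_MAX_PROCS; i++) {
        if (s->queues[i].pid == pid)
            return &s->queues[i];
        if (s->queues[i].pid == 0 && free_slot == NULL)
            free_slot = &s->queues[i];
    }
    if (free_slot != NULL)
        free_slot->pid = pid;
    return free_slot;
}

/* Add one request issued by pid; returns the new depth, or -1. */
int toy_add_request(struct toy_sched *s, int pid)
{
    struct toy_proc_queue *q = toy_get_queue(s, pid);

    return q ? ++q->nr_queued : -1;
}
```

Two requests from the same process land in the same entry no matter
which CPU submitted them, so the table is shared state by construction.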

Thanks,

Bart.

* Re: [RFD] I/O scheduling in blk-mq
  2016-08-08 14:09 [RFD] I/O scheduling in blk-mq Paolo
  2016-08-08 15:26 ` Bart Van Assche
@ 2016-08-08 20:09 ` Omar Sandoval
  2016-08-31 15:20     ` Paolo Valente
  1 sibling, 1 reply; 10+ messages in thread
From: Omar Sandoval @ 2016-08-08 20:09 UTC
  To: Paolo
  Cc: Jens Axboe, Tejun Heo, Christoph Hellwig, linux-block,
	linux-kernel, Ulf Hansson, Linus Walleij, broonie

On Mon, Aug 08, 2016 at 04:09:56PM +0200, Paolo wrote:
> Hi Jens, Tejun, Christoph, all,
> AFAIK blk-mq does not yet feature I/O schedulers. In particular, there
> is no scheduler providing strong guarantees in terms of
> responsiveness, latency for time-sensitive applications and bandwidth
> distribution.
> 
> For this reason, I'm trying to port BFQ to blk-mq, or to develop
> something simpler if even a reduced version of BFQ proves to be too
> heavy (this project is supported by Linaro). If you are willing to
> provide some feedback in this respect, I would like to ask for
> opinions/suggestions on the following two matters, and possibly to
> open a more general discussion on I/O scheduling in blk-mq.
> 
> 1) My idea is to have an independent instance of BFQ, or in general of
> the I/O scheduler, executed for each software queue. Then there would
> be no global scheduling. The drawback of no global scheduling is that
> each process cannot get more than 1/M of the total throughput of the
> device, if M is the number of software queues. But, if I'm not
> mistaken, it is however unfeasible to give a process more than 1/M of
> the total throughput, without lowering the throughput itself. In fact,
> giving a process more than 1/M of the total throughput implies serving
> its software queue, say Q, more than the others.  The only way to do
> it is periodically stopping the service of the other software queues
> and dispatching only the requests in Q. But this would reduce
> parallelism, which is the main way how blk-mq achieves a very high
> throughput. Are these considerations, and, in particular, one
> independent I/O scheduler per software queue, sensible?
> 
> 2) To provide per-process service guarantees, an I/O scheduler must
> create per-process internal queues. BFQ and CFQ use I/O contexts to
> achieve this goal. Is something like that (or exactly the same)
> available also in blk-mq? If so, do you have any suggestion, or link to
> documentation/code on how to use what is available in blk-mq?
> 
> Thanks,
> Paolo

Hi, Paolo,

I've been working on I/O scheduling for blk-mq with Jens for the past
few months (splitting time with other small projects), and we're making
good progress. Like you noticed, the hard part isn't really grafting a
scheduler interface onto blk-mq, it's maintaining good scalability while
providing adequate fairness.

We're working towards a scheduler more like deadline and getting the
architectural issues worked out. The goal is some sort of fairness
across all queues. The scheduler-per-software-queue model won't hold up
so well if we have a slower device with an I/O-hungry process on one CPU
and an interactive process on another CPU.

The issue I'm working through now is that on blk-mq, we only have as
many `struct request`s as the hardware has tags, so on a device with a
limited queue depth, it's really hard to do any sort of intelligent
scheduling. The solution for that is switching over to working with
`struct bio`s in the software queues instead, which abstracts away the
hardware capabilities. I have some work in progress at
https://github.com/osandov/linux/tree/blk-mq-iosched, but it's not yet
at feature-parity.
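
A toy illustration of why a small tag space limits scheduling (not
kernel code; the function and its parameters are hypothetical). It
assumes the scheduler can only choose among the pending I/Os it can
see: with `struct request`s that is at most the number of hardware
tags, while with `struct bio`s queued in software it can be every
pending I/O.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model, not kernel code: deadline[i] is the deadline of the i-th
 * pending I/O in arrival order. Only the first `visible` entries are
 * known to the scheduler. Returns the index of the earliest deadline.
 */
size_t toy_pick_earliest(const unsigned *deadline, size_t n, size_t visible)
{
    size_t best = 0;

    assert(n > 0 && visible > 0);
    if (visible > n)
        visible = n;
    for (size_t i = 1; i < visible; i++)
        if (deadline[i] < deadline[best])
            best = i;
    return best;
}
```

With deadlines {50, 40, 30, 10} and a queue depth of 2, the best visible
choice is the second I/O (deadline 40); only when all four are visible
can the most urgent one (deadline 10) be picked.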

After that, I'll be back to working on the scheduling itself. The vague
idea is to amortize global scheduling decisions, but I don't have much
concrete code behind that yet.

Thanks!
-- 
Omar

* Re: [RFD] I/O scheduling in blk-mq
  2016-08-08 20:09 ` Omar Sandoval
@ 2016-08-31 15:20     ` Paolo Valente
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Valente @ 2016-08-31 15:20 UTC
  To: Omar Sandoval
  Cc: Jens Axboe, Tejun Heo, Christoph Hellwig, linux-block,
	linux-kernel, Ulf Hansson, Linus Walleij, broonie


On 08 Aug 2016, at 22:09, Omar Sandoval <osandov@osandov.com> wrote:

> On Mon, Aug 08, 2016 at 04:09:56PM +0200, Paolo wrote:
>> Hi Jens, Tejun, Christoph, all,
>> AFAIK blk-mq does not yet feature I/O schedulers. In particular, there
>> is no scheduler providing strong guarantees in terms of
>> responsiveness, latency for time-sensitive applications and bandwidth
>> distribution.
>> 
>> For this reason, I'm trying to port BFQ to blk-mq, or to develop
>> something simpler if even a reduced version of BFQ proves to be too
>> heavy (this project is supported by Linaro). If you are willing to
>> provide some feedback in this respect, I would like to ask for
>> opinions/suggestions on the following two matters, and possibly to
>> open a more general discussion on I/O scheduling in blk-mq.
>> 
>> 1) My idea is to have an independent instance of BFQ, or in general of
>> the I/O scheduler, executed for each software queue. Then there would
>> be no global scheduling. The drawback of no global scheduling is that
>> each process cannot get more than 1/M of the total throughput of the
>> device, if M is the number of software queues. But, if I'm not
>> mistaken, it is however unfeasible to give a process more than 1/M of
>> the total throughput, without lowering the throughput itself. In fact,
>> giving a process more than 1/M of the total throughput implies serving
>> its software queue, say Q, more than the others.  The only way to do
>> it is periodically stopping the service of the other software queues
>> and dispatching only the requests in Q. But this would reduce
>> parallelism, which is the main way how blk-mq achieves a very high
>> throughput. Are these considerations, and, in particular, one
>> independent I/O scheduler per software queue, sensible?
>> 
>> 2) To provide per-process service guarantees, an I/O scheduler must
>> create per-process internal queues. BFQ and CFQ use I/O contexts to
>> achieve this goal. Is something like that (or exactly the same)
>> available also in blk-mq? If so, do you have any suggestion, or link to
>> documentation/code on how to use what is available in blk-mq?
>> 
>> Thanks,
>> Paolo
> 
> Hi, Paolo,
> 
> I've been working on I/O scheduling for blk-mq with Jens for the past
> few months (splitting time with other small projects), and we're making
> good progress. Like you noticed, the hard part isn't really grafting a
> scheduler interface onto blk-mq, it's maintaining good scalability while
> providing adequate fairness.
> 
> We're working towards a scheduler more like deadline and getting the
> architectural issues worked out. The goal is some sort of fairness
> across all queues.

If I'm not mistaken, the requests of a process (the bios after your
patch) end up in a given software queue basically by chance, i.e.,
because the process happens to be executing on the core that the
queue is associated with. If this is true, then the scheduler cannot
control which queue a request is sent to. So, how exactly do you
imagine the scheduler controlling the global request service order?
By stopping the service of some queues and letting only the
head-of-line request(s) of some other queue(s) be dispatched?

In this respect, I guess that, as of now, it is again chance that
determines which software queue the next request to dispatch is
picked from, i.e., it depends on which core the dispatch functions
happen to be executed on. Is that correct?
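
The "by chance" placement can be pictured with a toy mapping (not
kernel code; both functions are hypothetical). It assumes one software
queue per CPU, and it uses a plain modulo as a stand-in for the
CPU-to-hardware-queue map, which the kernel builds in its own way.

```c
#include <assert.h>

/* Toy model, not kernel code: one software queue per CPU. */
unsigned toy_swq_for_cpu(unsigned cpu)
{
    return cpu;
}

/* Hypothetical stand-in for the kernel's CPU-to-hardware-queue map. */
unsigned toy_hwq_for_cpu(unsigned cpu, unsigned nr_hw_queues)
{
    assert(nr_hw_queues > 0);
    return cpu % nr_hw_queues;
}
```

In this model a process that migrates from CPU 1 to CPU 3 starts
feeding software queue 3 instead of queue 1, without the scheduler
having any say; on a device with a single hardware queue both still map
to hardware queue 0.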

> The scheduler-per-software-queue model won't hold up
> so well if we have a slower device with an I/O-hungry process on one CPU
> and an interactive process on another CPU.
> 

So, the problem would be that the hungry process eats all the
bandwidth, and the interactive one never gets served.

What about the case where both processes are on the same CPU, i.e.,
where the requests of both processes are on the same software queue?
How does the scheduler you envisage guarantee good latency to the
interactive process in this case? By properly reordering requests
inside the software queue?

I'm sorry if my questions are quite silly, or do not make much sense.

Thanks,
Paolo


> The issue I'm working through now is that on blk-mq, we only have as
> many `struct request`s as the hardware has tags, so on a device with a
> limited queue depth, it's really hard to do any sort of intelligent
> scheduling. The solution for that is switching over to working with
> `struct bio`s in the software queues instead, which abstracts away the
> hardware capabilities. I have some work in progress at
> https://github.com/osandov/linux/tree/blk-mq-iosched, but it's not yet
> at feature-parity.
> 
> After that, I'll be back to working on the scheduling itself. The vague
> idea is to amortize global scheduling decisions, but I don't have much
> concrete code behind that yet.
> 
> Thanks!
> -- 
> Omar

* Re: [RFD] I/O scheduling in blk-mq
  2016-08-31 15:20     ` Paolo Valente
@ 2016-09-30  6:18       ` Paolo Valente
  -1 siblings, 0 replies; 10+ messages in thread
From: Paolo Valente @ 2016-09-30  6:18 UTC
  To: Paolo Valente
  Cc: Omar Sandoval, Jens Axboe, Tejun Heo, Christoph Hellwig,
	linux-block, linux-kernel, Ulf Hansson, Linus Walleij, broonie

Hi Omar,
have you had a chance to look at these last questions of mine?

Thanks,
Paolo

> On 31 Aug 2016, at 17:20, Paolo Valente <paolo.valente@linaro.org> wrote:
> 
> 
> On 08 Aug 2016, at 22:09, Omar Sandoval <osandov@osandov.com> wrote:
> 
>> On Mon, Aug 08, 2016 at 04:09:56PM +0200, Paolo wrote:
>>> Hi Jens, Tejun, Christoph, all,
>>> AFAIK blk-mq does not yet feature I/O schedulers. In particular, there
>>> is no scheduler providing strong guarantees in terms of
>>> responsiveness, latency for time-sensitive applications and bandwidth
>>> distribution.
>>> 
>>> For this reason, I'm trying to port BFQ to blk-mq, or to develop
>>> something simpler if even a reduced version of BFQ proves to be too
>>> heavy (this project is supported by Linaro). If you are willing to
>>> provide some feedback in this respect, I would like to ask for
>>> opinions/suggestions on the following two matters, and possibly to
>>> open a more general discussion on I/O scheduling in blk-mq.
>>> 
>>> 1) My idea is to have an independent instance of BFQ, or in general of
>>> the I/O scheduler, executed for each software queue. Then there would
>>> be no global scheduling. The drawback of no global scheduling is that
>>> each process cannot get more than 1/M of the total throughput of the
>>> device, if M is the number of software queues. But, if I'm not
>>> mistaken, it is however unfeasible to give a process more than 1/M of
>>> the total throughput, without lowering the throughput itself. In fact,
>>> giving a process more than 1/M of the total throughput implies serving
>>> its software queue, say Q, more than the others.  The only way to do
>>> it is periodically stopping the service of the other software queues
>>> and dispatching only the requests in Q. But this would reduce
>>> parallelism, which is the main way how blk-mq achieves a very high
>>> throughput. Are these considerations, and, in particular, one
>>> independent I/O scheduler per software queue, sensible?
>>> 
>>> 2) To provide per-process service guarantees, an I/O scheduler must
>>> create per-process internal queues. BFQ and CFQ use I/O contexts to
>>> achieve this goal. Is something like that (or exactly the same)
>>> available also in blk-mq? If so, do you have any suggestion, or link to
>>> documentation/code on how to use what is available in blk-mq?
>>> 
>>> Thanks,
>>> Paolo
>> 
>> Hi, Paolo,
>> 
>> I've been working on I/O scheduling for blk-mq with Jens for the past
>> few months (splitting time with other small projects), and we're making
>> good progress. Like you noticed, the hard part isn't really grafting a
>> scheduler interface onto blk-mq, it's maintaining good scalability while
>> providing adequate fairness.
>> 
>> We're working towards a scheduler more like deadline and getting the
>> architectural issues worked out. The goal is some sort of fairness
>> across all queues.
> 
> If I'm not mistaken, the requests of a process (the bios after your
> patch) end up in a given software queue basically by chance, i.e.,
> because the process happens to be executed on the core which that
> queue is associated with. If this is true, then the scheduler cannot
> control in which queue a request is sent. So, how do you imagine the
> scheduler to control the global request service order exactly? By
> stopping the service of some queues and letting only the head-of-line
> request(s) of some other queue(s) be dispatched?
> 
> In this respect, I guess that, as of now, it is again chance that
> determines from which software queue the next request to dispatch is
> picked, i.e., it depends on which core the dispatch functions happen
> to be executed. Is it correct?
> 
>> The scheduler-per-software-queue model won't hold up
>> so well if we have a slower device with an I/O-hungry process on one CPU
>> and an interactive process on another CPU.
>> 
> 
> So, the problem would be that the hungry process eats all the
> bandwidth, and the interactive one never gets served.
> 
> What about the case where both processes are on the same CPU, i.e.,
> where the requests of both processes are on the same software queue?
> How does the scheduler you envisage guarantees a good latency to the
> interactive process in this case? By properly reordering requests
> inside the software queue?
> 
> I'm sorry if my questions are quite silly, or do not make much sense.
> 
> Thanks,
> Paolo
> 
> 
>> The issue I'm working through now is that on blk-mq, we only have as
>> many `struct request`s as the hardware has tags, so on a device with a
>> limited queue depth, it's really hard to do any sort of intelligent
>> scheduling. The solution for that is switching over to working with
>> `struct bio`s in the software queues instead, which abstracts away the
>> hardware capabilities. I have some work in progress at
>> https://github.com/osandov/linux/tree/blk-mq-iosched, but it's not yet
>> at feature-parity.
>> 
>> After that, I'll be back to working on the scheduling itself. The vague
>> idea is to amortize global scheduling decisions, but I don't have much
>> concrete code behind that yet.
>> 
>> Thanks!
>> -- 
>> Omar

* Re: [RFD] I/O scheduling in blk-mq
  2016-08-31 15:20     ` Paolo Valente
@ 2016-10-05 17:46     ` Omar Sandoval
  2016-10-05 20:16         ` Paolo Valente
  -1 siblings, 1 reply; 10+ messages in thread
From: Omar Sandoval @ 2016-10-05 17:46 UTC
  To: Paolo Valente
  Cc: Jens Axboe, Tejun Heo, Christoph Hellwig, linux-block,
	linux-kernel, Ulf Hansson, Linus Walleij, broonie

Hey, Paolo,

On Wed, Aug 31, 2016 at 05:20:10PM +0200, Paolo Valente wrote:
[snip]
> > Hi, Paolo,
> > 
> > I've been working on I/O scheduling for blk-mq with Jens for the past
> > few months (splitting time with other small projects), and we're making
> > good progress. Like you noticed, the hard part isn't really grafting a
> > scheduler interface onto blk-mq, it's maintaining good scalability while
> > providing adequate fairness.
> > 
> > We're working towards a scheduler more like deadline and getting the
> > architectural issues worked out. The goal is some sort of fairness
> > across all queues.
> 
> If I'm not mistaken, the requests of a process (the bios after your
> patch) end up in a given software queue basically by chance, i.e.,
> because the process happens to be executed on the core which that
> queue is associated with.

Yeah, pretty much.

> If this is true, then the scheduler cannot
> control in which queue a request is sent. So, how do you imagine the
> scheduler to control the global request service order exactly? By
> stopping the service of some queues and letting only the head-of-line
> request(s) of some other queue(s) be dispatched?

For single-queue devices (HDDs, non-NVMe SSDs), all of these software
queues feed into one hardware queue, which is where we can control
global service order. For multi-queue devices, we don't really want to
enforce a strict global service order, since that would undermine the
purpose of having multiple queues.
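
A toy sketch of the single-queue case (not kernel code; the types and
the function are hypothetical, and ordering by deadline is only an
assumed policy). Because every software queue drains into the same
hardware queue, that is the one place where a global order can be
imposed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not kernel code: a software queue of request deadlines. */
struct toy_swq {
    const unsigned *deadline; /* requests in submission order */
    size_t len;
    size_t head;              /* next request not yet dispatched */
};

/* Move requests from nq software queues into the single hardware
 * queue `out`, always taking the head with the earliest deadline.
 * Returns the number of requests moved (at most cap). */
size_t toy_fill_hw_queue(struct toy_swq *q, size_t nq,
                         unsigned *out, size_t cap)
{
    size_t n = 0;

    assert(q != NULL && out != NULL);
    while (n < cap) {
        struct toy_swq *best = NULL;

        for (size_t i = 0; i < nq; i++) {
            if (q[i].head == q[i].len)
                continue; /* this software queue is empty */
            if (best == NULL ||
                q[i].deadline[q[i].head] < best->deadline[best->head])
                best = &q[i];
        }
        if (best == NULL)
            break; /* nothing left to dispatch */
        out[n++] = best->deadline[best->head++];
    }
    return n;
}
```

Two software queues holding deadlines {10, 40} and {20, 30} come out of
the hardware queue as 10, 20, 30, 40, regardless of which CPU submitted
which request.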

> In this respect, I guess that, as of now, it is again chance that
> determines from which software queue the next request to dispatch is
> picked, i.e., it depends on which core the dispatch functions happen
> to be executed. Is it correct?

blk-mq has a push model of request dispatch rather than a pull model.
That is, in the old block layer the device driver would ask the elevator
for the next request to dispatch. In blk-mq, either the thread
submitting a request or a worker thread will invoke the driver's
dispatch function with the next request.
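
A toy Python model of the two directions of control may help (again,
not kernel code; every class and method name below is made up for
illustration). In the pull version the driver loops and asks for work;
in the push version the submitter calls into the driver.

```python
from collections import deque


class Elevator:
    """Holds pending requests for the pull model."""

    def __init__(self, requests):
        self.pending = deque(requests)

    def next_request(self):
        # Return the next pending request, or None if there is none.
        return self.pending.popleft() if self.pending else None


class PullDriver:
    """Pull model: the driver asks the elevator for the next request."""

    def __init__(self, elevator):
        self.elevator = elevator
        self.handled = []

    def run(self):
        while True:
            rq = self.elevator.next_request()
            if rq is None:
                break
            self.handled.append(rq)


class PushDriver:
    """Push model: someone else hands requests to the driver."""

    def __init__(self):
        self.handled = []

    def queue_request(self, rq):
        self.handled.append(rq)


def submit(driver, rq):
    # The submitting thread (or a worker) invokes the driver directly.
    driver.queue_request(rq)


pull = PullDriver(Elevator(["r1", "r2"]))
pull.run()

push = PushDriver()
for rq in ["r1", "r2"]:
    submit(push, rq)
```

Both drivers end up handling the same requests; what differs is which
side makes the call.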

> > The scheduler-per-software-queue model won't hold up
> > so well if we have a slower device with an I/O-hungry process on one CPU
> > and an interactive process on another CPU.
> > 
> 
> So, the problem would be that the hungry process eats all the
> bandwidth, and the interactive one never gets served.
> 
> What about the case where both processes are on the same CPU, i.e.,
> where the requests of both processes are on the same software queue?
> How does the scheduler you envisage guarantee a good latency to the
> interactive process in this case? By properly reordering requests
> inside the software queue?

We need a combination of controlling the order in which we queue in the
software queues, the order in which we move requests from the software
queues to the hardware queues, and the order in which we dispatch
requests from the hardware queues to the driver.
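
As a rough sketch of those three control points, here is a toy Python
model (not kernel code). The policies are placeholders chosen only to
make the example run: interactive requests go to the head of their
software queue, software queues are served round-robin, and the hardware
queue is dispatched FIFO. None of these choices comes from an actual
design, and all names are hypothetical.

```python
from collections import deque


def enqueue(sw_queue, rq, interactive=False):
    # Control point 1: ordering inside a software queue.
    # Placeholder policy: interactive requests go to the head.
    if interactive:
        sw_queue.appendleft(rq)
    else:
        sw_queue.append(rq)


def fill_hw_queue(sw_queues, hw_queue, budget):
    # Control point 2: which software queue feeds the hardware queue next.
    # Placeholder policy: round-robin, one request per queue per pass,
    # moving at most `budget` requests in total.
    moved = 0
    while moved < budget and any(sw_queues):
        for q in sw_queues:
            if q and moved < budget:
                hw_queue.append(q.popleft())
                moved += 1


def dispatch(hw_queue, driver_log):
    # Control point 3: order of hand-off to the driver (FIFO here).
    while hw_queue:
        driver_log.append(hw_queue.popleft())


sw0, sw1 = deque(), deque()
for rq in ["h1", "h2", "h3"]:          # I/O-hungry process on CPU 0
    enqueue(sw0, rq)
enqueue(sw0, "i1", interactive=True)   # interactive process, same CPU
enqueue(sw1, "x1")                     # another process on CPU 1

hw_queue, driver_log = deque(), []
fill_hw_queue([sw0, sw1], hw_queue, budget=3)
dispatch(hw_queue, driver_log)
print(driver_log)
```

Here the interactive request on the shared software queue reaches the
driver ahead of the hungry process's backlog, which stays queued until
the next fill.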

> I'm sorry if my questions are quite silly, or do not make much sense.

Hope this helps, and sorry for the delay in my response.

> Thanks,
> Paolo
-- 
Omar


* Re: [RFD] I/O scheduling in blk-mq
@ 2016-10-05 20:16         ` Paolo Valente
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Valente @ 2016-10-05 20:16 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: Jens Axboe, Tejun Heo, Christoph Hellwig, linux-block,
	linux-kernel, Ulf Hansson, Linus Walleij, broonie


> On 5 Oct 2016, at 19:46, Omar Sandoval <osandov@osandov.com> wrote:
> 
> Hey, Paolo,
> 
> On Wed, Aug 31, 2016 at 05:20:10PM +0200, Paolo Valente wrote:
> [snip]
>>> Hi, Paolo,
>>> 
>>> I've been working on I/O scheduling for blk-mq with Jens for the past
>>> few months (splitting time with other small projects), and we're making
>>> good progress. Like you noticed, the hard part isn't really grafting a
>>> scheduler interface onto blk-mq, it's maintaining good scalability while
>>> providing adequate fairness.
>>> 
>>> We're working towards a scheduler more like deadline and getting the
>>> architectural issues worked out. The goal is some sort of fairness
>>> across all queues.
>> 
>> If I'm not mistaken, the requests of a process (the bios after your
>> patch) end up in a given software queue basically by chance, i.e.,
>> because the process happens to be executed on the core which that
>> queue is associated with.
> 
> Yeah, pretty much.
> 
>> If this is true, then the scheduler cannot
>> control in which queue a request is sent. So, how do you imagine the
>> scheduler to control the global request service order exactly? By
>> stopping the service of some queues and letting only the head-of-line
>> request(s) of some other queue(s) be dispatched?
> 
> For single-queue devices (HDDs, non-NVMe SSDs), all of these software
> queues feed into one hardware queue, which is where we can control
> global service order. For multi-queue devices, we don't really want to
> enforce a strict global service order, since that would undermine the
> purpose of having multiple queues.
> 

If I understood correctly, this general scheme may be effective.  Any
progress with the code?  As I already said, if I can help, I will be
glad to.

>> In this respect, I guess that, as of now, it is again chance that
>> determines from which software queue the next request to dispatch is
>> picked, i.e., it depends on which core the dispatch functions happen
>> to be executed. Is it correct?
> 
> blk-mq has a push model of request dispatch rather than a pull model.
> That is, in the old block layer the device driver would ask the elevator
> for the next request to dispatch. In blk-mq, either the thread
> submitting a request or a worker thread will invoke the driver's
> dispatch function with the next request.
> 

Thank you very much for this explanation.  So, in this push model,
what guarantees that the device does not receive more requests per
second than it can handle?

>>> The scheduler-per-software-queue model won't hold up
>>> so well if we have a slower device with an I/O-hungry process on one CPU
>>> and an interactive process on another CPU.
>>> 
>> 
>> So, the problem would be that the hungry process eats all the
>> bandwidth, and the interactive one never gets served.
>> 
>> What about the case where both processes are on the same CPU, i.e.,
>> where the requests of both processes are on the same software queue?
>> How does the scheduler you envisage guarantee a good latency to the
>> interactive process in this case? By properly reordering requests
>> inside the software queue?
> 
> We need a combination of controlling the order in which we queue in the
> software queues, the order in which we move requests from the software
> queues to the hardware queues, and the order in which we dispatch
> requests from the hardware queues to the driver.
> 

It doesn't sound simple to provide service guarantees across all these
control points, but I guess that only a prototype can give sound
answers.

>> I'm sorry if my questions are quite silly, or do not make much sense.
> 
> Hope this helps, and sorry for the delay in my response.

It did help!

Thank you,
Paolo

> 
>> Thanks,
>> Paolo
> -- 
> Omar


Thread overview: 10+ messages
2016-08-08 14:09 [RFD] I/O scheduling in blk-mq Paolo
2016-08-08 15:26 ` Bart Van Assche
2016-08-08 20:09 ` Omar Sandoval
2016-08-31 15:20   ` Paolo Valente
2016-08-31 15:20     ` Paolo Valente
2016-09-30  6:18     ` Paolo Valente
2016-09-30  6:18       ` Paolo Valente
2016-10-05 17:46     ` Omar Sandoval
2016-10-05 20:16       ` Paolo Valente
2016-10-05 20:16         ` Paolo Valente
