linux-block.vger.kernel.org archive mirror
* Re: io.latency controller apparently not working
       [not found] <22878C62-54B8-41BA-B90C-1C5414F3060F@linaro.org>
@ 2019-08-16 13:21 ` Josef Bacik
  2019-08-16 17:52   ` Paolo Valente
  0 siblings, 1 reply; 7+ messages in thread
From: Josef Bacik @ 2019-08-16 13:21 UTC (permalink / raw)
  To: Paolo Valente
  Cc: linux-block, linux-kernel, Josef Bacik, Jens Axboe,
	Vincent Guittot, noreply-spamdigest via bfq-iosched, Tejun Heo

On Fri, Aug 16, 2019 at 12:57:41PM +0200, Paolo Valente wrote:
> Hi,
> I happened to test the io.latency controller, to make a comparison
> between this controller and BFQ.  But io.latency seems not to work,
> i.e., not to reduce latency compared with what happens with no I/O
> control at all.  Here is a summary of the results for one of the
> workloads I tested, on three different devices (latencies in ms):
> 
>              no I/O control        io.latency         BFQ
> NVMe SSD     1.9                   1.9                0.07
> SATA SSD     39                    56                 0.7
> HDD          4500                  4500               11
> 
> I have put all details on hardware, OS, scenarios and results in the
> attached pdf.  For your convenience, I'm pasting the source file too.
> 

Do you have the fio jobs you use for this?  I just tested on Jens's most recent
tree and io.latency appears to be doing what it's supposed to be doing.  We've
also started testing 5.2 in production and it's still working there as
well.  The only thing I've touched recently was around wakeups, and that
shouldn't have broken anything.  I'm not sure why it's not working for you,
but a fio script will help me narrow down what's going on.  Thanks,

Josef

* Re: io.latency controller apparently not working
  2019-08-16 13:21 ` io.latency controller apparently not working Josef Bacik
@ 2019-08-16 17:52   ` Paolo Valente
  2019-08-16 17:59     ` Josef Bacik
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Valente @ 2019-08-16 17:52 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-block, linux-kernel, Jens Axboe, Vincent Guittot,
	noreply-spamdigest via bfq-iosched, Tejun Heo



> Il giorno 16 ago 2019, alle ore 15:21, Josef Bacik <josef@toxicpanda.com> ha scritto:
> 
> On Fri, Aug 16, 2019 at 12:57:41PM +0200, Paolo Valente wrote:
>> Hi,
>> I happened to test the io.latency controller, to make a comparison
>> between this controller and BFQ.  But io.latency seems not to work,
>> i.e., not to reduce latency compared with what happens with no I/O
>> control at all.  Here is a summary of the results for one of the
>> workloads I tested, on three different devices (latencies in ms):
>> 
>>             no I/O control        io.latency         BFQ
>> NVMe SSD     1.9                   1.9                0.07
>> SATA SSD     39                    56                 0.7
>> HDD          4500                  4500               11
>> 
>> I have put all details on hardware, OS, scenarios and results in the
>> attached pdf.  For your convenience, I'm pasting the source file too.
>> 
> 
> Do you have the fio jobs you use for this?

The script mentioned in the draft (executed with the command line
reported in the draft) runs one fio instance for the target process
and one fio instance for each interferer.  I couldn't use just one fio
instance executing all the jobs, because the weight parameter doesn't
work in fio jobfiles for some reason, and because the ioprio class
cannot be set for individual jobs.

In particular, the script generates a job with the following
parameters for the target process:

 ioengine=sync
 loops=10000
 direct=0
 readwrite=randread
 fdatasync=0
 bs=4k
 thread=0
 filename=/mnt/scsi_debug/largefile_interfered0
 iodepth=1
 numjobs=1
 invalidate=1

and a job with the following parameters for each of the interferers,
in the case, e.g., of a workload made of reads:

 ioengine=sync
 direct=0
 readwrite=read
 fdatasync=0
 bs=4k
 filename=/mnt/scsi_debug/largefileX
 invalidate=1

Should you fail to reproduce this issue by creating groups, setting
latencies and starting fio jobs manually, could you try just executing
my script?  Maybe that would help us spot the culprit more quickly.
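
Just to make the manual alternative concrete, this is roughly the
cgroup-v2 side I have in mind (assuming cgroup v2 is mounted on
/sys/fs/cgroup; the group names, the 8:16 device numbers, the 10 ms
target and the jobfile names are only placeholders, not what my script
actually uses):

 # one group for the interfered (target) process, one for the interferers
 mkdir /sys/fs/cgroup/interfered /sys/fs/cgroup/interferers
 # make the io controller available to the child groups
 echo +io > /sys/fs/cgroup/cgroup.subtree_control
 # set a latency target only on the interfered group (MAJ:MIN of the drive)
 echo "8:16 target=10" > /sys/fs/cgroup/interfered/io.latency
 # start the interferers in the unprotected group ...
 echo $$ > /sys/fs/cgroup/interferers/cgroup.procs
 fio interferer0.fio &
 fio interferer1.fio &
 # ... and the target job in the protected one (jobfile names are made up)
 echo $$ > /sys/fs/cgroup/interfered/cgroup.procs
 fio interfered.fio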

>  I just tested on Jens's most recent
> tree and io.latency appears to be doing what its supposed to be doing.  We've
> also started testing 5.2 in production and it's still working in production as
> well.

I tested 5.2 too, same negative outcome.

Thanks,
Paolo

>  The only thing I've touched recently was around wakeups and shouldn't
> have broken everything.  I'm not sure why it's not working for you, but a fio
> script will help me narrow down what's going on.  Thanks,
> 
> Josef


* Re: io.latency controller apparently not working
  2019-08-16 17:52   ` Paolo Valente
@ 2019-08-16 17:59     ` Josef Bacik
  2019-08-16 18:17       ` Paolo Valente
  0 siblings, 1 reply; 7+ messages in thread
From: Josef Bacik @ 2019-08-16 17:59 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Josef Bacik, linux-block, linux-kernel, Jens Axboe,
	Vincent Guittot, noreply-spamdigest via bfq-iosched, Tejun Heo

On Fri, Aug 16, 2019 at 07:52:40PM +0200, Paolo Valente wrote:
> 
> 
> > Il giorno 16 ago 2019, alle ore 15:21, Josef Bacik <josef@toxicpanda.com> ha scritto:
> > 
> > On Fri, Aug 16, 2019 at 12:57:41PM +0200, Paolo Valente wrote:
> >> Hi,
> >> I happened to test the io.latency controller, to make a comparison
> >> between this controller and BFQ.  But io.latency seems not to work,
> >> i.e., not to reduce latency compared with what happens with no I/O
> >> control at all.  Here is a summary of the results for one of the
> >> workloads I tested, on three different devices (latencies in ms):
> >> 
> >>             no I/O control        io.latency         BFQ
> >> NVMe SSD     1.9                   1.9                0.07
> >> SATA SSD     39                    56                 0.7
> >> HDD          4500                  4500               11
> >> 
> >> I have put all details on hardware, OS, scenarios and results in the
> >> attached pdf.  For your convenience, I'm pasting the source file too.
> >> 
> > 
> > Do you have the fio jobs you use for this?
> 
> The script mentioned in the draft (executed with the command line
> reported in the draft), executes one fio instance for the target
> process, and one fio instance for each interferer.  I couldn't do with
> just one fio instance executing all jobs, because the weight parameter
> doesn't work in fio jobfiles for some reason, and because the ioprio
> class cannot be set for individual jobs.
> 
> In particular, the script generates a job with the following
> parameters for the target process:
> 
>  ioengine=sync
>  loops=10000
>  direct=0
>  readwrite=randread
>  fdatasync=0
>  bs=4k
>  thread=0
>  filename=/mnt/scsi_debug/largefile_interfered0
>  iodepth=1
>  numjobs=1
>  invalidate=1
> 
> and a job with the following parameters for each of the interferers,
> in case, e.g., of a workload made of reads:
> 
>  ioengine=sync
>  direct=0
>  readwrite=read
>  fdatasync=0
>  bs=4k
>  filename=/mnt/scsi_debug/largefileX
>  invalidate=1
> 
> Should you fail to reproduce this issue by creating groups, setting
> latencies and starting fio jobs manually, what if you try by just
> executing my script?  Maybe this could help us spot the culprit more
> quickly.

Ah ok, you are doing it on a mountpoint.  Are you using btrfs?  Cause otherwise
you are going to have a sad time.  The other thing is you are using buffered,
which may or may not hit the disk.  This is what I use to test io.latency:

https://patchwork.kernel.org/patch/10714425/
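
(For concreteness, a run of that shape from the plain fio command line
looks roughly like the following; the device name, block size and
runtime are placeholders, and putting the two instances into their
respective cgroups is left out entirely:)

 # interferer: sequential reads straight to the raw device, no page cache
 fio --name=interferer --filename=/dev/nvme0n1 --direct=1 --ioengine=sync \
     --rw=read --bs=4k --time_based --runtime=60 &
 # interfered job: O_DIRECT random reads, so every completion reflects device latency
 fio --name=interfered --filename=/dev/nvme0n1 --direct=1 --ioengine=sync \
     --rw=randread --bs=4k --iodepth=1 --time_based --runtime=60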

I had to massage that patch since it didn't apply directly, but running this
against the actual block device, with O_DIRECT so I'm sure to be measuring the
actual impact of the controller, it all works out fine.  Thanks,

Josef

* Re: io.latency controller apparently not working
  2019-08-16 17:59     ` Josef Bacik
@ 2019-08-16 18:17       ` Paolo Valente
  2019-08-19 16:41         ` Paolo Valente
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Valente @ 2019-08-16 18:17 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-block, linux-kernel, Jens Axboe, Vincent Guittot,
	noreply-spamdigest via bfq-iosched, Tejun Heo



> Il giorno 16 ago 2019, alle ore 19:59, Josef Bacik <josef@toxicpanda.com> ha scritto:
> 
> On Fri, Aug 16, 2019 at 07:52:40PM +0200, Paolo Valente wrote:
>> 
>> 
>>> Il giorno 16 ago 2019, alle ore 15:21, Josef Bacik <josef@toxicpanda.com> ha scritto:
>>> 
>>> On Fri, Aug 16, 2019 at 12:57:41PM +0200, Paolo Valente wrote:
>>>> Hi,
>>>> I happened to test the io.latency controller, to make a comparison
>>>> between this controller and BFQ.  But io.latency seems not to work,
>>>> i.e., not to reduce latency compared with what happens with no I/O
>>>> control at all.  Here is a summary of the results for one of the
>>>> workloads I tested, on three different devices (latencies in ms):
>>>> 
>>>>            no I/O control        io.latency         BFQ
>>>> NVMe SSD     1.9                   1.9                0.07
>>>> SATA SSD     39                    56                 0.7
>>>> HDD          4500                  4500               11
>>>> 
>>>> I have put all details on hardware, OS, scenarios and results in the
>>>> attached pdf.  For your convenience, I'm pasting the source file too.
>>>> 
>>> 
>>> Do you have the fio jobs you use for this?
>> 
>> The script mentioned in the draft (executed with the command line
>> reported in the draft), executes one fio instance for the target
>> process, and one fio instance for each interferer.  I couldn't do with
>> just one fio instance executing all jobs, because the weight parameter
>> doesn't work in fio jobfiles for some reason, and because the ioprio
>> class cannot be set for individual jobs.
>> 
>> In particular, the script generates a job with the following
>> parameters for the target process:
>> 
>> ioengine=sync
>> loops=10000
>> direct=0
>> readwrite=randread
>> fdatasync=0
>> bs=4k
>> thread=0
>> filename=/mnt/scsi_debug/largefile_interfered0
>> iodepth=1
>> numjobs=1
>> invalidate=1
>> 
>> and a job with the following parameters for each of the interferers,
>> in case, e.g., of a workload made of reads:
>> 
>> ioengine=sync
>> direct=0
>> readwrite=read
>> fdatasync=0
>> bs=4k
>> filename=/mnt/scsi_debug/largefileX
>> invalidate=1
>> 
>> Should you fail to reproduce this issue by creating groups, setting
>> latencies and starting fio jobs manually, what if you try by just
>> executing my script?  Maybe this could help us spot the culprit more
>> quickly.
> 
> Ah ok, you are doing it on a mountpoint.

Yep

>  Are you using btrfs?

ext4

>  Cause otherwise
> you are going to have a sad time.

Could you elaborate more on this?  I/O seems to be controllable on ext4.

>  The other thing is you are using buffered,

Actually, the problem shows up with sync random reads, which always
hit the disk in this test.
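
(This is easy to double-check while the test runs, for instance by
watching the drive with iostat; the device name here is just an
example:)

 # reads that actually reach the device show up in the r/s column
 iostat -xd 1 sdb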

> which may or may not hit the disk.  This is what I use to test io.latency
> 
> https://patchwork.kernel.org/patch/10714425/
> 
> I had to massage it since it didn't apply directly, but running this against the
> actual block device, with O_DIRECT so I'm sure to be measure the actual impact
> of the controller, it all works out fine.

I'm not getting why non-direct sync reads, or buffered writes, should
be uncontrollable.  As a trivial example, BFQ in these tests controls
I/O as expected, and keeps latency extremely low.

What am I missing?

Thanks,
Paolo

>  Thanks,
> 
> Josef


* Re: io.latency controller apparently not working
  2019-08-16 18:17       ` Paolo Valente
@ 2019-08-19 16:41         ` Paolo Valente
  2019-08-19 17:00           ` Paolo Valente
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Valente @ 2019-08-19 16:41 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-block, linux-kernel, Jens Axboe, Vincent Guittot,
	noreply-spamdigest via bfq-iosched, Tejun Heo



> Il giorno 16 ago 2019, alle ore 20:17, Paolo Valente <paolo.valente@linaro.org> ha scritto:
> 
> 
> 
>> Il giorno 16 ago 2019, alle ore 19:59, Josef Bacik <josef@toxicpanda.com> ha scritto:
>> 
>> On Fri, Aug 16, 2019 at 07:52:40PM +0200, Paolo Valente wrote:
>>> 
>>> 
>>>> Il giorno 16 ago 2019, alle ore 15:21, Josef Bacik <josef@toxicpanda.com> ha scritto:
>>>> 
>>>> On Fri, Aug 16, 2019 at 12:57:41PM +0200, Paolo Valente wrote:
>>>>> Hi,
>>>>> I happened to test the io.latency controller, to make a comparison
>>>>> between this controller and BFQ.  But io.latency seems not to work,
>>>>> i.e., not to reduce latency compared with what happens with no I/O
>>>>> control at all.  Here is a summary of the results for one of the
>>>>> workloads I tested, on three different devices (latencies in ms):
>>>>> 
>>>>>           no I/O control        io.latency         BFQ
>>>>> NVMe SSD     1.9                   1.9                0.07
>>>>> SATA SSD     39                    56                 0.7
>>>>> HDD          4500                  4500               11
>>>>> 
>>>>> I have put all details on hardware, OS, scenarios and results in the
>>>>> attached pdf.  For your convenience, I'm pasting the source file too.
>>>>> 
>>>> 
>>>> Do you have the fio jobs you use for this?
>>> 
>>> The script mentioned in the draft (executed with the command line
>>> reported in the draft), executes one fio instance for the target
>>> process, and one fio instance for each interferer.  I couldn't do with
>>> just one fio instance executing all jobs, because the weight parameter
>>> doesn't work in fio jobfiles for some reason, and because the ioprio
>>> class cannot be set for individual jobs.
>>> 
>>> In particular, the script generates a job with the following
>>> parameters for the target process:
>>> 
>>> ioengine=sync
>>> loops=10000
>>> direct=0
>>> readwrite=randread
>>> fdatasync=0
>>> bs=4k
>>> thread=0
>>> filename=/mnt/scsi_debug/largefile_interfered0
>>> iodepth=1
>>> numjobs=1
>>> invalidate=1
>>> 
>>> and a job with the following parameters for each of the interferers,
>>> in case, e.g., of a workload made of reads:
>>> 
>>> ioengine=sync
>>> direct=0
>>> readwrite=read
>>> fdatasync=0
>>> bs=4k
>>> filename=/mnt/scsi_debug/largefileX
>>> invalidate=1
>>> 
>>> Should you fail to reproduce this issue by creating groups, setting
>>> latencies and starting fio jobs manually, what if you try by just
>>> executing my script?  Maybe this could help us spot the culprit more
>>> quickly.
>> 
>> Ah ok, you are doing it on a mountpoint.
> 
> Yep
> 
>> Are you using btrfs?
> 
> ext4
> 
>> Cause otherwise
>> you are going to have a sad time.
> 
> Could you elaborate more on this?  I/O seems to be controllable on ext4.
> 
>> The other thing is you are using buffered,
> 
> Actually, the problem is suffered by sync random reads, which always
> hit the disk in this test.
> 
>> which may or may not hit the disk.  This is what I use to test io.latency
>> 
>> https://patchwork.kernel.org/patch/10714425/
>> 
>> I had to massage it since it didn't apply directly, but running this against the
>> actual block device, with O_DIRECT so I'm sure to be measure the actual impact
>> of the controller, it all works out fine.
> 
> I'm not getting why non-direct sync reads, or buffered writes, should
> be uncontrollable.  As a trivial example, BFQ in this tests controls
> I/O as expected, and keeps latency extremely low.
> 
> What am I missing?
> 

While waiting for your answer, I've also added the direct-I/O case to
my test.  This new case too can now be reproduced with the command
line reported in the draft.

Even with direct I/O, nothing changes with writers as interferers,
apart from the HDD, where latency is now at least equal to the
no-I/O-control case.  Summing up, with writers as interferers (latency
in ms):

            no I/O control        io.latency         BFQ
NVMe SSD     3                     3                 0.2
SATA SSD     3                     3                 0.2
HDD          56                    56                13

In contrast, there are significant improvements with the SSDs in the
case of readers as interferers.  This is the new situation (latency
still in ms):

            no I/O control        io.latency         BFQ
NVMe SSD     1.9                   0.08              0.07
SATA SSD     39                    0.2               0.7
HDD          4500                  118               11

Thanks,
Paolo

> Thanks,
> Paolo
> 
>> Thanks,
>> 
>> Josef


* Re: io.latency controller apparently not working
  2019-08-19 16:41         ` Paolo Valente
@ 2019-08-19 17:00           ` Paolo Valente
  2019-08-19 17:49             ` Josef Bacik
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Valente @ 2019-08-19 17:00 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-block, linux-kernel, Jens Axboe, Vincent Guittot,
	noreply-spamdigest via bfq-iosched, Tejun Heo



> Il giorno 19 ago 2019, alle ore 18:41, Paolo Valente <paolo.valente@linaro.org> ha scritto:
> 
> 
> 
>> Il giorno 16 ago 2019, alle ore 20:17, Paolo Valente <paolo.valente@linaro.org> ha scritto:
>> 
>> 
>> 
>>> Il giorno 16 ago 2019, alle ore 19:59, Josef Bacik <josef@toxicpanda.com> ha scritto:
>>> 
>>> On Fri, Aug 16, 2019 at 07:52:40PM +0200, Paolo Valente wrote:
>>>> 
>>>> 
>>>>> Il giorno 16 ago 2019, alle ore 15:21, Josef Bacik <josef@toxicpanda.com> ha scritto:
>>>>> 
>>>>> On Fri, Aug 16, 2019 at 12:57:41PM +0200, Paolo Valente wrote:
>>>>>> Hi,
>>>>>> I happened to test the io.latency controller, to make a comparison
>>>>>> between this controller and BFQ.  But io.latency seems not to work,
>>>>>> i.e., not to reduce latency compared with what happens with no I/O
>>>>>> control at all.  Here is a summary of the results for one of the
>>>>>> workloads I tested, on three different devices (latencies in ms):
>>>>>> 
>>>>>>          no I/O control        io.latency         BFQ
>>>>>> NVMe SSD     1.9                   1.9                0.07
>>>>>> SATA SSD     39                    56                 0.7
>>>>>> HDD          4500                  4500               11
>>>>>> 
>>>>>> I have put all details on hardware, OS, scenarios and results in the
>>>>>> attached pdf.  For your convenience, I'm pasting the source file too.
>>>>>> 
>>>>> 
>>>>> Do you have the fio jobs you use for this?
>>>> 
>>>> The script mentioned in the draft (executed with the command line
>>>> reported in the draft), executes one fio instance for the target
>>>> process, and one fio instance for each interferer.  I couldn't do with
>>>> just one fio instance executing all jobs, because the weight parameter
>>>> doesn't work in fio jobfiles for some reason, and because the ioprio
>>>> class cannot be set for individual jobs.
>>>> 
>>>> In particular, the script generates a job with the following
>>>> parameters for the target process:
>>>> 
>>>> ioengine=sync
>>>> loops=10000
>>>> direct=0
>>>> readwrite=randread
>>>> fdatasync=0
>>>> bs=4k
>>>> thread=0
>>>> filename=/mnt/scsi_debug/largefile_interfered0
>>>> iodepth=1
>>>> numjobs=1
>>>> invalidate=1
>>>> 
>>>> and a job with the following parameters for each of the interferers,
>>>> in case, e.g., of a workload made of reads:
>>>> 
>>>> ioengine=sync
>>>> direct=0
>>>> readwrite=read
>>>> fdatasync=0
>>>> bs=4k
>>>> filename=/mnt/scsi_debug/largefileX
>>>> invalidate=1
>>>> 
>>>> Should you fail to reproduce this issue by creating groups, setting
>>>> latencies and starting fio jobs manually, what if you try by just
>>>> executing my script?  Maybe this could help us spot the culprit more
>>>> quickly.
>>> 
>>> Ah ok, you are doing it on a mountpoint.
>> 
>> Yep
>> 
>>> Are you using btrfs?
>> 
>> ext4
>> 
>>> Cause otherwise
>>> you are going to have a sad time.
>> 
>> Could you elaborate more on this?  I/O seems to be controllable on ext4.
>> 
>>> The other thing is you are using buffered,
>> 
>> Actually, the problem is suffered by sync random reads, which always
>> hit the disk in this test.
>> 
>>> which may or may not hit the disk.  This is what I use to test io.latency
>>> 
>>> https://patchwork.kernel.org/patch/10714425/
>>> 
>>> I had to massage it since it didn't apply directly, but running this against the
>>> actual block device, with O_DIRECT so I'm sure to be measure the actual impact
>>> of the controller, it all works out fine.
>> 
>> I'm not getting why non-direct sync reads, or buffered writes, should
>> be uncontrollable.  As a trivial example, BFQ in this tests controls
>> I/O as expected, and keeps latency extremely low.
>> 
>> What am I missing?
>> 
> 
> While waiting for your answer, I've added also the direct-I/O case to
> my test.  Now we have also this new case reproduced by the command
> line reported in the draft.
> 
> Even with direct I/O, nothing changes with writers as interferers,
> apart from latency becoming at least equal to the case of no I/O
> control for the HDD.  Summing up, with writers as interferers (latency
> in ms):
> 
>            no I/O control        io.latency         BFQ
> NVMe SSD     3                     3                 0.2
> SATA SSD     3                     3                 0.2
> HDD          56                    56                13
> 
> In contrast, there are important improvements with the SSDs, in case
> of readers as interferers.  This is the new situation (latency still
> in ms):
> 
>            no I/O control        io.latency         BFQ
> NVMe SSD     1.9                   0.08              0.07
> SATA SSD     39                    0.2               0.7
> HDD          4500                  118               11
> 

I'm sorry, I hadn't repeated the direct-I/O tests for BFQ too.  Results
change for BFQ as well in the case of readers as interferers.  Here
are all the corrected figures for readers as interferers (latency in ms):

           no I/O control        io.latency         BFQ
NVMe SSD     1.9                   0.08              0.07
SATA SSD     39                    0.2               0.2
HDD          4500                  118               10

Thanks,
Paolo


> Thanks,
> Paolo
> 
>> Thanks,
>> Paolo
>> 
>>> Thanks,
>>> 
>>> Josef


* Re: io.latency controller apparently not working
  2019-08-19 17:00           ` Paolo Valente
@ 2019-08-19 17:49             ` Josef Bacik
  0 siblings, 0 replies; 7+ messages in thread
From: Josef Bacik @ 2019-08-19 17:49 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Josef Bacik, linux-block, linux-kernel, Jens Axboe,
	Vincent Guittot, noreply-spamdigest via bfq-iosched, Tejun Heo

On Mon, Aug 19, 2019 at 07:00:56PM +0200, Paolo Valente wrote:
> 
> 
> > Il giorno 19 ago 2019, alle ore 18:41, Paolo Valente <paolo.valente@linaro.org> ha scritto:
> > 
> > 
> > 
> >> Il giorno 16 ago 2019, alle ore 20:17, Paolo Valente <paolo.valente@linaro.org> ha scritto:
> >> 
> >> 
> >> 
> >>> Il giorno 16 ago 2019, alle ore 19:59, Josef Bacik <josef@toxicpanda.com> ha scritto:
> >>> 
> >>> On Fri, Aug 16, 2019 at 07:52:40PM +0200, Paolo Valente wrote:
> >>>> 
> >>>> 
> >>>>> Il giorno 16 ago 2019, alle ore 15:21, Josef Bacik <josef@toxicpanda.com> ha scritto:
> >>>>> 
> >>>>> On Fri, Aug 16, 2019 at 12:57:41PM +0200, Paolo Valente wrote:
> >>>>>> Hi,
> >>>>>> I happened to test the io.latency controller, to make a comparison
> >>>>>> between this controller and BFQ.  But io.latency seems not to work,
> >>>>>> i.e., not to reduce latency compared with what happens with no I/O
> >>>>>> control at all.  Here is a summary of the results for one of the
> >>>>>> workloads I tested, on three different devices (latencies in ms):
> >>>>>> 
> >>>>>>          no I/O control        io.latency         BFQ
> >>>>>> NVMe SSD     1.9                   1.9                0.07
> >>>>>> SATA SSD     39                    56                 0.7
> >>>>>> HDD          4500                  4500               11
> >>>>>> 
> >>>>>> I have put all details on hardware, OS, scenarios and results in the
> >>>>>> attached pdf.  For your convenience, I'm pasting the source file too.
> >>>>>> 
> >>>>> 
> >>>>> Do you have the fio jobs you use for this?
> >>>> 
> >>>> The script mentioned in the draft (executed with the command line
> >>>> reported in the draft), executes one fio instance for the target
> >>>> process, and one fio instance for each interferer.  I couldn't do with
> >>>> just one fio instance executing all jobs, because the weight parameter
> >>>> doesn't work in fio jobfiles for some reason, and because the ioprio
> >>>> class cannot be set for individual jobs.
> >>>> 
> >>>> In particular, the script generates a job with the following
> >>>> parameters for the target process:
> >>>> 
> >>>> ioengine=sync
> >>>> loops=10000
> >>>> direct=0
> >>>> readwrite=randread
> >>>> fdatasync=0
> >>>> bs=4k
> >>>> thread=0
> >>>> filename=/mnt/scsi_debug/largefile_interfered0
> >>>> iodepth=1
> >>>> numjobs=1
> >>>> invalidate=1
> >>>> 
> >>>> and a job with the following parameters for each of the interferers,
> >>>> in case, e.g., of a workload made of reads:
> >>>> 
> >>>> ioengine=sync
> >>>> direct=0
> >>>> readwrite=read
> >>>> fdatasync=0
> >>>> bs=4k
> >>>> filename=/mnt/scsi_debug/largefileX
> >>>> invalidate=1
> >>>> 
> >>>> Should you fail to reproduce this issue by creating groups, setting
> >>>> latencies and starting fio jobs manually, what if you try by just
> >>>> executing my script?  Maybe this could help us spot the culprit more
> >>>> quickly.
> >>> 
> >>> Ah ok, you are doing it on a mountpoint.
> >> 
> >> Yep
> >> 
> >>> Are you using btrfs?
> >> 
> >> ext4
> >> 
> >>> Cause otherwise
> >>> you are going to have a sad time.
> >> 
> >> Could you elaborate more on this?  I/O seems to be controllable on ext4.
> >> 
> >>> The other thing is you are using buffered,
> >> 
> >> Actually, the problem is suffered by sync random reads, which always
> >> hit the disk in this test.
> >> 
> >>> which may or may not hit the disk.  This is what I use to test io.latency
> >>> 
> >>> https://patchwork.kernel.org/patch/10714425/
> >>> 
> >>> I had to massage it since it didn't apply directly, but running this against the
> >>> actual block device, with O_DIRECT so I'm sure to be measure the actual impact
> >>> of the controller, it all works out fine.
> >> 
> >> I'm not getting why non-direct sync reads, or buffered writes, should
> >> be uncontrollable.  As a trivial example, BFQ in this tests controls
> >> I/O as expected, and keeps latency extremely low.
> >> 
> >> What am I missing?
> >> 
> > 
> > While waiting for your answer, I've added also the direct-I/O case to
> > my test.  Now we have also this new case reproduced by the command
> > line reported in the draft.

Sorry, something caught fire and I got distracted.

> > 
> > Even with direct I/O, nothing changes with writers as interferers,
> > apart from latency becoming at least equal to the case of no I/O
> > control for the HDD.  Summing up, with writers as interferers (latency
> > in ms):
> > 
> >            no I/O control        io.latency         BFQ
> > NVMe SSD     3                     3                 0.2
> > SATA SSD     3                     3                 0.2
> > HDD          56                    56                13
> > 
> > In contrast, there are important improvements with the SSDs, in case
> > of readers as interferers.  This is the new situation (latency still
> > in ms):
> > 
> >            no I/O control        io.latency         BFQ
> > NVMe SSD     1.9                   0.08              0.07
> > SATA SSD     39                    0.2               0.7
> > HDD          4500                  118               11
> > 
> 
> I'm sorry, I didn't repeat tests with direct I/O for BFQ too.  And
> results change for BFQ too in case of readers as interferes.  Here
> are all correct figures for readers as interferers (latency in ms):
> 
>            no I/O control        io.latency         BFQ
> NVMe SSD     1.9                   0.08              0.07
> SATA SSD     39                    0.2               0.2
> HDD          4500                  118               10

Alright, so it's working right with reads and O_DIRECT.  I'm not entirely sure
what's happening on the buffered side; I will investigate.

The writes I don't expect to work on ext4: there's a priority inversion in ext4
that I haven't bothered to work around, since we use btrfs anywhere we want io
control.  That being said, it shouldn't be behaving this poorly.  Once I've
wrapped up what I'm currently working on, I'll run through this and figure out
the weirdness in these two cases.  Thanks,

Josef

