* coalescing in polling mode in 4.9
@ 2018-02-02  8:10 Alex Nln
  2018-02-05  0:15 ` Alex Nln
  2018-02-05 15:02 ` Keith Busch
  0 siblings, 2 replies; 12+ messages in thread
From: Alex Nln @ 2018-02-02  8:10 UTC (permalink / raw)


Hello,

Enabling interrupt coalescing on an nvme device under kernel 4.9
significantly reduces performance when using polling mode.
When I enable coalescing, IOPS drop from 100K to 35K and
latency jumps from 7 usec to 25 usec.

Shouldn't we expect a performance boost in polling mode when
interrupt coalescing is enabled?


Device is Intel DC P3600
Coalescing enabled:  nvme set-feature /dev/nvme0 -f 8 -v 0x00ffff
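
For reference: if I read the NVMe spec's Interrupt Coalescing feature (FID 08h)
correctly, the low byte of the value is the aggregation threshold (0's based)
and the next byte is the aggregation time in 100 usec units, so 0x00ffff asks
the device to hold an interrupt until 256 CQ entries have accumulated or
25.5 msec have passed. A minimal sketch of that encoding (my reading of the
spec, not code from this setup):

#include <stdint.h>
#include <stdio.h>

/* Pack the Feature 08h dword: aggregation threshold in bits 7:0
 * (0's based, so 0xff = 256 entries), aggregation time in bits 15:8
 * (100 usec units, so 0xff = 25.5 msec). */
static uint32_t nvme_coalescing_value(uint8_t thr, uint8_t time_100us)
{
	return (uint32_t)thr | ((uint32_t)time_100us << 8);
}

int main(void)
{
	/* prints "-v 0x00ffff", the value used with nvme set-feature above */
	printf("-v 0x%06x\n", nvme_coalescing_value(0xff, 0xff));
	return 0;
}
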
fio-2.16 file:
[global]
iodepth=1
direct=1
ioengine=pvsync2
hipri
group_reporting
time_based
blocksize=4k
norandommap=1

[job1]
rw=read
filename=/dev/nvme0n1
name=raw=sequential-read
numjobs=1
runtime=60



Thanks


* coalescing in polling mode in 4.9
  2018-02-02  8:10 coalescing in polling mode in 4.9 Alex Nln
@ 2018-02-05  0:15 ` Alex Nln
  2018-02-05 14:40   ` Sagi Grimberg
  2018-02-05 15:02 ` Keith Busch
  1 sibling, 1 reply; 12+ messages in thread
From: Alex Nln @ 2018-02-05  0:15 UTC (permalink / raw)



I'd appreciate it if someone could help me understand why we need interrupts
at all in polling mode, and why performance drops when interrupt coalescing
is enabled.

Thanks

On Fri, 2 Feb 2018 00:10:28 -0800
Alex Nln <alex.nlnnfn@gmail.com> wrote:

> Hello,
> 
> Enabling interrupt coalescing on an nvme device under kernel 4.9
> significantly reduces performance when using polling mode.
> When I enable coalescing, IOPS drop from 100K to 35K and
> latency jumps from 7 usec to 25 usec.
> 
> Shouldn't we expect a performance boost in polling mode when
> interrupt coalescing is enabled?
> 
> 
> Device is Intel DC P3600
> Coalescing enabled:  nvme set-feature /dev/nvme0 -f 8 -v 0x00ffff
> fio-2.16 file:
> [global]
> iodepth=1
> direct=1
> ioengine=pvsync2
> hipri
> group_reporting
> time_based
> blocksize=4k
> norandommap=1
> 
> [job1]
> rw=read
> filename=/dev/nvme0n1
> name=raw=sequential-read
> numjobs=1
> runtime=60
> 
> 
> 
> Thanks


* coalescing in polling mode in 4.9
  2018-02-05  0:15 ` Alex Nln
@ 2018-02-05 14:40   ` Sagi Grimberg
  0 siblings, 0 replies; 12+ messages in thread
From: Sagi Grimberg @ 2018-02-05 14:40 UTC (permalink / raw)



> I'd appreciate it if someone could help me understand why we need interrupts
> at all in polling mode, and why performance drops when interrupt coalescing
> is enabled.

I didn't try with this device. I did try on p3700 in the past and this
did not happen.


* coalescing in polling mode in 4.9
  2018-02-02  8:10 coalescing in polling mode in 4.9 Alex Nln
  2018-02-05  0:15 ` Alex Nln
@ 2018-02-05 15:02 ` Keith Busch
  2018-02-05 15:42   ` Nitesh
  2018-02-06  3:55   ` Alex Nln
  1 sibling, 2 replies; 12+ messages in thread
From: Keith Busch @ 2018-02-05 15:02 UTC (permalink / raw)


On Fri, Feb 02, 2018 at 12:10:28AM -0800, Alex Nln wrote:
> Enabling interrupt coalescing on an nvme device under kernel 4.9
> significantly reduces performance when using polling mode.
> When I enable coalescing, IOPS drop from 100K to 35K and
> latency jumps from 7 usec to 25 usec.
> 
> Shouldn't we expect a performance boost in polling mode when
> interrupt coalescing is enabled?
> 
> 
> Device is Intel DC P3600

That's a pretty low latency for this device. Are you running reads to
unmapped blocks?

I've seen the phenomenon occur where higher coalescing settings worsen
performance. That usually means the polling thread exceeded its time slice:
need_resched() returns true, so the task schedules out and relies on
interrupts, which have been throttled. Maybe hybrid polling would be
better for this.
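
The loop in question in __blkdev_direct_IO_simple() is roughly this
(a paraphrased sketch, not verbatim 4.9 source):

	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (!READ_ONCE(bio.bi_private))	/* I/O completed */
			break;
		if (!(iocb->ki_flags & IOCB_HIPRI) ||
		    !blk_poll(bdev_get_queue(bdev), qc))
			/* blk_poll() also bails out when need_resched() is
			 * true, so a poller that used up its time slice falls
			 * through to sleep here and ends up waiting on an
			 * interrupt that coalescing may delay for a long
			 * time. */
			io_schedule();
	}
	__set_current_state(TASK_RUNNING);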


* coalescing in polling mode in 4.9
  2018-02-05 15:02 ` Keith Busch
@ 2018-02-05 15:42   ` Nitesh
  2018-02-06  5:39     ` Alex Nln
  2018-02-06 16:29     ` Keith Busch
  2018-02-06  3:55   ` Alex Nln
  1 sibling, 2 replies; 12+ messages in thread
From: Nitesh @ 2018-02-05 15:42 UTC (permalink / raw)


Hi Alex,
I ran into a similar problem not long ago. With coalescing enabled, some 
I/Os took very long. Every time need_resched() returns true, 
io_schedule() puts the task to sleep because its state was previously set 
to non-interruptible. I handled this by setting the task state back to 
running and releasing the CPU. The diff is attached below; you may give 
it a try.

Hi Keith,
Shouldn't the below be the default way to handle the classic-polling case, 
i.e. not rely on interrupts at all? For sleeping during the polling path 
(which reduces CPU utilization) we already have the hybrid polling path. 
If this approach seems acceptable, I can refactor the code further and 
submit a proper patch. I look forward to your suggestions.


diff --git a/fs/block_dev.c b/fs/block_dev.c
index 4a181fc..d2eeedf 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -236,9 +236,13 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
                 set_current_state(TASK_UNINTERRUPTIBLE);
                 if (!READ_ONCE(bio.bi_private))
                         break;
-               if (!(iocb->ki_flags & IOCB_HIPRI) ||
-                   !blk_poll(bdev_get_queue(bdev), qc))
+               if (!(iocb->ki_flags & IOCB_HIPRI))
                         io_schedule();
+               else if (!blk_poll(bdev_get_queue(bdev), qc)) {
+                       if (need_resched())
+                               set_current_state(TASK_RUNNING);
+                       io_schedule();
+               }
         }
         __set_current_state(TASK_RUNNING);
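
Note the trick: with the task state set back to TASK_RUNNING, io_schedule()
degrades into a plain yield -- the scheduler gets to run whatever
need_resched() wanted, but the polling task stays runnable and comes back
to poll again instead of sleeping until the (throttled) interrupt arrives.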


On Monday 05 February 2018 08:32 PM, Keith Busch wrote:
> On Fri, Feb 02, 2018 at 12:10:28AM -0800, Alex Nln wrote:
>> Enabling interrupt coalescing on an nvme device under kernel 4.9
>> significantly reduces performance when using polling mode.
>> When I enable coalescing, IOPS drop from 100K to 35K and
>> latency jumps from 7 usec to 25 usec.
>>
>> Shouldn't we expect a performance boost in polling mode when
>> interrupt coalescing is enabled?
>>
>>
>> Device is Intel DC P3600
> 
> That's a pretty low latency for this device. Are you running reads to
> unmapped blocks?
> 
> I've seen the phenomenon occur where higher coalescing settings worsen
> performance. That usually means the polling thread exceeded its time slice:
> need_resched() returns true, so the task schedules out and relies on
> interrupts, which have been throttled. Maybe hybrid polling would be
> better for this.


* coalescing in polling mode in 4.9
  2018-02-05 15:02 ` Keith Busch
  2018-02-05 15:42   ` Nitesh
@ 2018-02-06  3:55   ` Alex Nln
  1 sibling, 0 replies; 12+ messages in thread
From: Alex Nln @ 2018-02-06  3:55 UTC (permalink / raw)


On Mon, 5 Feb 2018 08:02:53 -0700
Keith Busch <keith.busch@intel.com> wrote:

> On Fri, Feb 02, 2018 at 12:10:28AM -0800, Alex Nln wrote:
> > Enabling interrupt coalescing on an nvme device under kernel 4.9
> > significantly reduces performance when using polling mode.
> > When I enable coalescing, IOPS drop from 100K to 35K and
> > latency jumps from 7 usec to 25 usec.
> > 
> > Shouldn't we expect a performance boost in polling mode when
> > interrupt coalescing is enabled?
> > 
> > 
> > Device is Intel DC P3600
> 
> That's a pretty low latency for this device. Are you running reads to
> unmapped blocks?

These are synchronous sequential reads; latency for random reads is higher.

> 
> I've seen the phenomenon occur where higher coalescing settings worsen
> performance. That usually means the polling thread exceeded its time slice:
> need_resched() returns true, so the task schedules out and relies on
> interrupts, which have been throttled. Maybe hybrid polling would be
> better for this.
> 

It looks like the eventual call to need_resched() is a result of the low
performance rather than its cause. When I set coalescing to 0x00FFFF there
are about 160 nvme interrupts/sec but ~35K IOPS, so clearly only a fraction
of the CQ entries are processed in interrupt context; most of the CQ entries
are still processed by polling. Somehow the device delays posting completion
entries when coalescing is enabled. Is that possible?

Thanks for help


* coalescing in polling mode in 4.9
  2018-02-05 15:42   ` Nitesh
@ 2018-02-06  5:39     ` Alex Nln
  2018-02-09 22:37       ` Nitesh Shetty
  2018-02-06 16:29     ` Keith Busch
  1 sibling, 1 reply; 12+ messages in thread
From: Alex Nln @ 2018-02-06  5:39 UTC (permalink / raw)


Hi Nitesh,

On Mon, 05 Feb 2018 21:12:12 +0530
Nitesh <nj.shetty@samsung.com> wrote:

> Hi Alex,
> I ran into a similar problem not long ago. With coalescing enabled, some 
> I/Os took very long. Every time need_resched() returns true, 
> io_schedule() puts the task to sleep because its state was previously set 
> to non-interruptible. I handled this by setting the task state back to 
> running and releasing the CPU. The diff is attached below; you may give 
> it a try.
> 

I tested this patch on kernel 4.9 and it solves the problem.
Thanks a lot


> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 4a181fc..d2eeedf 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -236,9 +236,13 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
>                  set_current_state(TASK_UNINTERRUPTIBLE);
>                  if (!READ_ONCE(bio.bi_private))
>                          break;
> -               if (!(iocb->ki_flags & IOCB_HIPRI) ||
> -                   !blk_poll(bdev_get_queue(bdev), qc))
> +               if (!(iocb->ki_flags & IOCB_HIPRI))
>                          io_schedule();
> +               else if (!blk_poll(bdev_get_queue(bdev), qc)) {
> +                       if (need_resched())
> +                               set_current_state(TASK_RUNNING);
> +                       io_schedule();
> +               }
>          }
>          __set_current_state(TASK_RUNNING);
> 
> 


* coalescing in polling mode in 4.9
  2018-02-05 15:42   ` Nitesh
  2018-02-06  5:39     ` Alex Nln
@ 2018-02-06 16:29     ` Keith Busch
  2018-02-07  8:27       ` Nitesh
  1 sibling, 1 reply; 12+ messages in thread
From: Keith Busch @ 2018-02-06 16:29 UTC (permalink / raw)


On Mon, Feb 05, 2018 at 09:12:12PM +0530, Nitesh wrote:
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 4a181fc..d2eeedf 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -236,9 +236,13 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
>                 set_current_state(TASK_UNINTERRUPTIBLE);
>                 if (!READ_ONCE(bio.bi_private))
>                         break;
> -               if (!(iocb->ki_flags & IOCB_HIPRI) ||
> -                   !blk_poll(bdev_get_queue(bdev), qc))
> +               if (!(iocb->ki_flags & IOCB_HIPRI))
>                         io_schedule();
> +               else if (!blk_poll(bdev_get_queue(bdev), qc)) {
> +                       if (need_resched())
> +                               set_current_state(TASK_RUNNING);
> +                       io_schedule();
> +               }
>         }
>         __set_current_state(TASK_RUNNING);

Yah, I think this looks good. Do you want to send this as a proper patch
to the linux-block mailing list for consideration?


* coalescing in polling mode in 4.9
  2018-02-06 16:29     ` Keith Busch
@ 2018-02-07  8:27       ` Nitesh
  0 siblings, 0 replies; 12+ messages in thread
From: Nitesh @ 2018-02-07  8:27 UTC (permalink / raw)


Hi Keith,

I will send a proper patch.

On Tuesday 06 February 2018 09:59 PM, Keith Busch wrote:
> On Mon, Feb 05, 2018 at 09:12:12PM +0530, Nitesh wrote:
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 4a181fc..d2eeedf 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -236,9 +236,13 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
>>                  set_current_state(TASK_UNINTERRUPTIBLE);
>>                  if (!READ_ONCE(bio.bi_private))
>>                          break;
>> -               if (!(iocb->ki_flags & IOCB_HIPRI) ||
>> -                   !blk_poll(bdev_get_queue(bdev), qc))
>> +               if (!(iocb->ki_flags & IOCB_HIPRI))
>>                          io_schedule();
>> +               else if (!blk_poll(bdev_get_queue(bdev), qc)) {
>> +                       if (need_resched())
>> +                               set_current_state(TASK_RUNNING);
>> +                       io_schedule();
>> +               }
>>          }
>>          __set_current_state(TASK_RUNNING);
> 
> Yah, I think this looks good. Do you want to send this as a proper patch
> to linux-block mailing list for consideration?
> 
> 
> 


* coalescing in polling mode in 4.9
  2018-02-06  5:39     ` Alex Nln
@ 2018-02-09 22:37       ` Nitesh Shetty
  2018-02-10 21:08         ` Alex Nln
  0 siblings, 1 reply; 12+ messages in thread
From: Nitesh Shetty @ 2018-02-09 22:37 UTC (permalink / raw)


Hi Alex,

Did you observe any benefit in polling performance, i.e. with 
interrupt coalescing vs. without interrupt coalescing? I'd appreciate it if 
you could share the performance/latency data.

Thanks,
Nitesh

On Tuesday 06 February 2018 11:09 AM, Alex Nln wrote:
> Hi Nitesh,
> 
> On Mon, 05 Feb 2018 21:12:12 +0530
> Nitesh <nj.shetty@samsung.com> wrote:
> 
>> Hi Alex,
>> I ran into a similar problem not long ago. With coalescing enabled, some
>> I/Os took very long. Every time need_resched() returns true,
>> io_schedule() puts the task to sleep because its state was previously set
>> to non-interruptible. I handled this by setting the task state back to
>> running and releasing the CPU. The diff is attached below; you may give
>> it a try.
>>
> 
> I tested this patch on kernel 4.9 and it solves the problem.
> Thanks a lot
> 
> 
>>
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 4a181fc..d2eeedf 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -236,9 +236,13 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
>>                   set_current_state(TASK_UNINTERRUPTIBLE);
>>                   if (!READ_ONCE(bio.bi_private))
>>                           break;
>> -               if (!(iocb->ki_flags & IOCB_HIPRI) ||
>> -                   !blk_poll(bdev_get_queue(bdev), qc))
>> +               if (!(iocb->ki_flags & IOCB_HIPRI))
>>                           io_schedule();
>> +               else if (!blk_poll(bdev_get_queue(bdev), qc)) {
>> +                       if (need_resched())
>> +                               set_current_state(TASK_RUNNING);
>> +                       io_schedule();
>> +               }
>>           }
>>           __set_current_state(TASK_RUNNING);
>>
>>
> 


* coalescing in polling mode in 4.9
  2018-02-09 22:37       ` Nitesh Shetty
@ 2018-02-10 21:08         ` Alex Nln
  2018-02-12 14:53           ` Keith Busch
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Nln @ 2018-02-10 21:08 UTC (permalink / raw)


Hi Nitesh,

Yes, the benefits are clear in terms of both IOPS and latency. See below.

coalescing    IOPS  clat(usec)  intr/s
    0x0000  108710        8.65  108711
    0xff01  113081        8.28   56541
    0xff03  114955        8.14   28738
    0xff07  116318        8.04   14539
    0xff0f  116470        8.02    7279
    0xff1f  116908        8.00    3653
    0xff3f  116833        8.00    1825
    0xff7f  116718        8.00     911
    0xffff  116962        8.00     456
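
The intr/s column also matches my reading of the feature encoding: if the
low byte is the aggregation threshold (0's based), 0xffff means up to 256
entries per interrupt, and 116962 IOPS / 256 ~= 457, right at the measured
456; each doubling of the threshold roughly halves intr/s, as the table shows.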

It is quite a strange model of IO processing: polling with interrupts. 
Shouldn't these two be mutually exclusive? Why not disable
interrupts entirely on a queue while polling?

Thanks,
Alex

Device: Intel DC P3600, kernel 4.9.76 + Nitesh's patch that fixes polling, fio-2.21:

[global]
iodepth=1
direct=1
ioengine=pvsync2
hipri
group_reporting
time_based
norandommap=1

[job1]
rw=read
filename=/dev/nvme0n1
name=raw=sequential-read
numjobs=1
runtime=60



On Sat, 10 Feb 2018 04:07:08 +0530
Nitesh Shetty <nj.shetty@samsung.com> wrote:

> Hi Alex,
> 
> Did you observe any benefit in polling performance, i.e. with 
> interrupt coalescing vs. without interrupt coalescing? I'd appreciate it if 
> you could share the performance/latency data.
> 
> Thanks,
> Nitesh
> 
> On Tuesday 06 February 2018 11:09 AM, Alex Nln wrote:
> > Hi Nitesh,
> > 
> > On Mon, 05 Feb 2018 21:12:12 +0530
> > Nitesh <nj.shetty@samsung.com> wrote:
> > 
> >> Hi Alex,
> >> I ran into a similar problem not long ago. With coalescing enabled, some
> >> I/Os took very long. Every time need_resched() returns true,
> >> io_schedule() puts the task to sleep because its state was previously set
> >> to non-interruptible. I handled this by setting the task state back to
> >> running and releasing the CPU. The diff is attached below; you may give
> >> it a try.
> >>
> > 
> > I tested this patch on kernel 4.9 and it solves the problem.
> > Thanks a lot
> > 
> > 
> >>
> >> diff --git a/fs/block_dev.c b/fs/block_dev.c
> >> index 4a181fc..d2eeedf 100644
> >> --- a/fs/block_dev.c
> >> +++ b/fs/block_dev.c
> >> @@ -236,9 +236,13 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
> >>                   set_current_state(TASK_UNINTERRUPTIBLE);
> >>                   if (!READ_ONCE(bio.bi_private))
> >>                           break;
> >> -               if (!(iocb->ki_flags & IOCB_HIPRI) ||
> >> -                   !blk_poll(bdev_get_queue(bdev), qc))
> >> +               if (!(iocb->ki_flags & IOCB_HIPRI))
> >>                           io_schedule();
> >> +               else if (!blk_poll(bdev_get_queue(bdev), qc)) {
> >> +                       if (need_resched())
> >> +                               set_current_state(TASK_RUNNING);
> >> +                       io_schedule();
> >> +               }
> >>           }
> >>           __set_current_state(TASK_RUNNING);
> >>
> >>
> > 


* coalescing in polling mode in 4.9
  2018-02-10 21:08         ` Alex Nln
@ 2018-02-12 14:53           ` Keith Busch
  0 siblings, 0 replies; 12+ messages in thread
From: Keith Busch @ 2018-02-12 14:53 UTC (permalink / raw)


On Sat, Feb 10, 2018 at 01:08:17PM -0800, Alex Nln wrote:
> It is quite a strange model of IO processing: polling with interrupts. 
> Shouldn't these two be mutually exclusive? Why not disable
> interrupts entirely on a queue while polling?

That's a valid point. There are some limitations preventing this at the
moment. The nvme driver currently has no way to know that a command it is
submitting is intended to be polled, and if it did, we would still need to
settle on a policy for creating a special queue (or queues) with interrupts
disabled and for registering these with the block layer.

