linux-kernel.vger.kernel.org archive mirror
* [RFC] NVMe Configuration using sysctl
@ 2017-05-15  8:34 Oza Oza
  2017-05-15  8:39 ` Oza Oza
  0 siblings, 1 reply; 5+ messages in thread
From: Oza Oza @ 2017-05-15  8:34 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel

Hi,

We are configuring interrupt coalescing for NVMe, but right now it is
controlled by a module parameter, so the same interrupt coalescing
settings are applied to all NVMe devices, even when they are connected
to different RCs.

Ideally this should be configurable through sysctl. For example, sysctl
could provide an interface to change per-CPU I/O queue pairs, interrupt
coalescing settings, etc.

Could we have (or implement) a sysctl module for NVMe? Please suggest.

Regards,
Oza.


* Re: [RFC] NVMe Configuration using sysctl
  2017-05-15  8:34 [RFC] NVMe Configuration using sysctl Oza Oza
@ 2017-05-15  8:39 ` Oza Oza
  2017-05-15  9:15   ` Sagi Grimberg
  0 siblings, 1 reply; 5+ messages in thread
From: Oza Oza @ 2017-05-15  8:39 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, linux-kernel

On Mon, May 15, 2017 at 2:04 PM, Oza Oza <oza.oza@broadcom.com> wrote:
> Hi,
>
> we are configuring interrupt coalesce for NVMe, but right now, it uses
> module param.
> so the same interrupt coalesce settings get applied for all the NVMEs
> connected to different RCs.
>
> ideally it should be with sysctl.
> for e.g.
> sysctl should provide interface to change
> Per-CPU IO queue pairs, interrupt coalesce settings etc..
>
> please suggest if we could have/implement sysctl module for NVMe ?
>
> Regards,
> Oza.

+ linux-nvme@lists.infradead.org


* Re: [RFC] NVMe Configuration using sysctl
  2017-05-15  8:39 ` Oza Oza
@ 2017-05-15  9:15   ` Sagi Grimberg
  2017-05-15 10:59     ` Oza Oza
  2017-05-15 14:44     ` Keith Busch
  0 siblings, 2 replies; 5+ messages in thread
From: Sagi Grimberg @ 2017-05-15  9:15 UTC (permalink / raw)
  To: Oza Oza, Keith Busch, Jens Axboe, linux-nvme, linux-kernel


>> Hi,

Hi Oza,

>> we are configuring interrupt coalesce for NVMe, but right now, it uses
>> module param.
>> so the same interrupt coalesce settings get applied for all the NVMEs
>> connected to different RCs.
>>
>> ideally it should be with sysctl.

If at all, I would place this in nvme-cli (via ioctl) instead of
sysctl.

>> for e.g.
>> sysctl should provide interface to change
>> Per-CPU IO queue pairs, interrupt coalesce settings etc..

My personal feeling is that percpu granularity is a lot to take in for
the user, and also can yield some unexpected performance
characteristics. But I might be wrong here..

>> please suggest if we could have/implement sysctl module for NVMe ?

I have asked this before, but interrupt coalescing has very little
merit without being able to be adaptive. net drivers maintain online
stats and schedule interrupt coalescing modifications.

Should work in theory, but having said that, interrupt coalescing as a
whole is essentially unusable in nvme since the coalescing time limit
is in units of 100us increments...
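
For reference, a minimal sketch of how the 100us granularity shows up in
the feature value. It assumes the NVMe Interrupt Coalescing feature
(Feature ID 0x08), where CDW11 bits 7:0 hold the aggregation threshold
(a 0's based entry count) and bits 15:8 hold the aggregation time in
100us units. The function name and its parameters are hypothetical and
only illustrate the packing; this is not driver or nvme-cli code.

```python
def encode_interrupt_coalescing(min_entries: int, max_delay_us: int) -> int:
    """Pack CDW11 for Set Features, Interrupt Coalescing (FID 0x08).

    min_entries  -- completions to aggregate per interrupt (1..256)
    max_delay_us -- requested maximum delay in microseconds (0..25500)
    """
    if not 1 <= min_entries <= 256:
        raise ValueError("min_entries must be in 1..256")
    if not 0 <= max_delay_us <= 255 * 100:
        raise ValueError("max_delay_us must be in 0..25500")

    thr = min_entries - 1               # threshold field is 0's based
    time = -(-max_delay_us // 100)      # round up to whole 100us units
    return (time << 8) | thr


# A request for 50us cannot be expressed; it rounds up to one 100us unit.
value = encode_interrupt_coalescing(min_entries=8, max_delay_us=50)
print(hex(value))
```

Any non-zero delay below 100us ends up as a full 100us unit, which is
the coarseness referred to above. The resulting value is what a Set
Features command for this feature would carry in CDW11, for example when
issued from user space through nvme-cli.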


* Re: [RFC] NVMe Configuration using sysctl
  2017-05-15  9:15   ` Sagi Grimberg
@ 2017-05-15 10:59     ` Oza Oza
  2017-05-15 14:44     ` Keith Busch
  1 sibling, 0 replies; 5+ messages in thread
From: Oza Oza @ 2017-05-15 10:59 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Keith Busch, Jens Axboe, linux-nvme, linux-kernel

On Mon, May 15, 2017 at 2:45 PM, Sagi Grimberg <sagi@grimberg.me> wrote:
>
>>> Hi,
>
>
> Hi Oza,
>
>>> we are configuring interrupt coalesce for NVMe, but right now, it uses
>>> module param.
>>> so the same interrupt coalesce settings get applied for all the NVMEs
>>> connected to different RCs.
>>>
>>> ideally it should be with sysctl.
>
>
> If at all, I would place this in nvme-cli (via ioctl) instead of
> sysctl.
>
>>> for e.g.
>>> sysctl should provide interface to change
>>> Per-CPU IO queue pairs, interrupt coalesce settings etc..
>
>
> My personal feeling is that percpu granularity is a lot to take in for
> the user, and also can yield some unexpected performance
> characteristics. But I might be wrong here..
>

I thought of nvme_ioctl, but was not sure whether to use sysctl or ioctl.
For now we are only interested in introducing interrupt coalescing,
because that brings improvements.

>>> please suggest if we could have/implement sysctl module for NVMe ?
>
>
> I have asked this before, but interrupt coalescing has very little
> merit without being able to be adaptive. net drivers maintain online
> stats and schedule interrupt coalescing modifications.
>
> Should work in theory, but having said that, interrupt coalescing as a
> whole is essentially unusable in nvme since the coalescing time limit
> is in units of 100us increments...

Surprisingly, it brings a 20% improvement in CPU utilization for us,
which frees up a lot of CPU cycles for other work.
The value has to be tuned, but that is all there is to it,
so we are keen on having a way to tune it.

So your suggestion is to use ioctl instead of sysctl, right?
As of now we are only interested in interrupt coalescing.

Regards,
Oza.


* Re: [RFC] NVMe Configuration using sysctl
  2017-05-15  9:15   ` Sagi Grimberg
  2017-05-15 10:59     ` Oza Oza
@ 2017-05-15 14:44     ` Keith Busch
  1 sibling, 0 replies; 5+ messages in thread
From: Keith Busch @ 2017-05-15 14:44 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Oza Oza, Jens Axboe, linux-nvme, linux-kernel

On Mon, May 15, 2017 at 12:15:28PM +0300, Sagi Grimberg wrote:
> 
> > > Hi,
> 
> Hi Oza,
> 
> > > we are configuring interrupt coalesce for NVMe, but right now, it uses
> > > module param.
> > > so the same interrupt coalesce settings get applied for all the NVMEs
> > > connected to different RCs.
> > > 
> > > ideally it should be with sysctl.
> 
> If at all, I would place this in nvme-cli (via ioctl) instead of
> sysctl.

That's also how I currently recommend testing this feature out. A problem
with that, though, is the feature isn't persistent across controller
resets, so the setting could be reverted without the user knowing.


> > > for e.g.
> > > sysctl should provide interface to change
> > > Per-CPU IO queue pairs, interrupt coalesce settings etc..
> 
> My personal feeling is that percpu granularity is a lot to take in for
> the user, and also can yield some unexpected performance
> characteristics. But I might be wrong here..

We currently use the IRQ affinity spread to get good default pairings.
It's possible to decouple that, but let's hear what about the default
setting isn't optimal before exposing additional knobs. More user tunables
just means one of us will get to frequently re-explain how to use it!


> > > please suggest if we could have/implement sysctl module for NVMe ?
> 
> I have asked this before, but interrupt coalescing has very little
> merit without being able to be adaptive. net drivers maintain online
> stats and schedule interrupt coalescing modifications.
> 
> Should work in theory, but having said that, interrupt coalescing as a
> whole is essentially unusable in nvme since the coalescing time limit
> is in units of 100us increments...

Yeah, as it is defined, latency for low queue-depth workloads does
suffer quite a bit. If the user only cares about IOPS, though, we find
that coalescing is necessary for some workloads to hit peak capabilities.
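
The low queue-depth cost can be put into rough numbers. The sketch
below is a deliberately simplified model, not a measurement: it assumes
a single outstanding command (queue depth 1), that the aggregation
threshold is never reached, and that the controller therefore delays
each completion interrupt by the full aggregation time. The 20us device
latency is a made-up illustrative figure, and the function name is
hypothetical.

```python
def qd1_worst_case_latency_us(device_latency_us: int,
                              time_units: int,
                              min_entries: int) -> int:
    """Worst-case per-command latency at queue depth 1 (simplified model).

    time_units  -- aggregation time in 100us units (0 means no delay)
    min_entries -- completions aggregated per interrupt (1 means none)
    """
    if time_units == 0 or min_entries <= 1:
        # No timer delay, or a threshold of one entry: nothing is held back.
        return device_latency_us
    # With one command in flight the threshold is never met, so the
    # interrupt may be held for the whole aggregation time.
    return device_latency_us + time_units * 100


base = qd1_worst_case_latency_us(20, time_units=0, min_entries=8)
coalesced = qd1_worst_case_latency_us(20, time_units=1, min_entries=8)
print(base, coalesced)                         # latency in microseconds
print(1_000_000 // base, 1_000_000 // coalesced)  # commands/s at QD1
```

Under these assumptions even the smallest non-zero aggregation time
dominates the per-command latency at queue depth 1, while a workload
that keeps many commands in flight would reach the threshold before the
timer expires and mostly see fewer interrupts.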

