* [LSF/MM TOPIC] block-mq issues with FC
@ 2016-04-08 11:29 Hannes Reinecke
  2016-04-08 15:11 ` James Bottomley
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Hannes Reinecke @ 2016-04-08 11:29 UTC (permalink / raw)
  To: lsf; +Cc: SCSI Mailing List, linux-block, Christoph Hellwig, Jens Axboe

Hi all,

I'd like to propose a topic on block-mq issues with FC.
During my performance testing using block/scsi-mq with FC I've hit
several issues I'd like to discuss:

- timeout handling:
Out of necessity the status of any timed out command is undefined.
So to be absolutely safe HBAs will be using extended timeouts here
(eg 70secs for lpfc). During that time we _could_ signal I/O timeout
to the upper layers, but then the tag will be reused, despite the
HBA still having a reference to it.
I'd like to discuss how this could be solved best with blk-mq.

- Adaption on other HBAs to multiqueue:
The current block-mq design assumes symmetric send and receive
queues (in effect queue pairs). Any hardware _not_ providing this
(like qla2xxx) can not be easily converted to scsi-mq. I'd like to
discuss how one could approach converting these drivers.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

* Re: [LSF/MM TOPIC] block-mq issues with FC
  2016-04-08 11:29 [LSF/MM TOPIC] block-mq issues with FC Hannes Reinecke
@ 2016-04-08 15:11 ` James Bottomley
  2016-04-08 15:51   ` [Lsf] " Ewan D. Milne
  2016-04-08 17:26 ` Bart Van Assche
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: James Bottomley @ 2016-04-08 15:11 UTC (permalink / raw)
  To: Hannes Reinecke, lsf
  Cc: SCSI Mailing List, linux-block, Christoph Hellwig, Jens Axboe

On Fri, 2016-04-08 at 13:29 +0200, Hannes Reinecke wrote:
> Hi all,
> 
> I'd like to propose a topic on block-mq issues with FC.
> During my performance testing using block/scsi-mq with FC I've hit
> several issues I'd like to discuss:
> 
> - timeout handling:
> Out of necessity the status of any timed out command is undefined.
> So to be absolutely safe HBAs will be using extended timeouts here
> (eg 70secs for lpfc). During that time we _could_ signal I/O timeout
> to the upper layers, but then the tag will be reused, despite the
> HBA still having a reference to it.
> I'd like to discuss how this could be solved best with blk-mq.

What's wrong with the obvious answer: the tag shouldn't be re-used
until after at least the TMF abort.  If we need to escalate that then
it looks like the controller lost the tag and requires a bigger hammer.

However, when I look at what we do, it seems the running abort handler
is triggered from the block timeout function, so where's the problem?
... surely mq can't free the tag until that returns, because it might
extend the time. 

James


> - Adaption on other HBAs to multiqueue:
> The current block-mq design assumes symmetric send and receive
> queues (in effect queue pairs). Any hardware _not_ providing this
> (like qla2xxx) can not be easily converted to scsi-mq. I'd like to
> discuss how one could approach converting these drivers.
> 
> Cheers,
> 
> Hannes



* Re: [Lsf] [LSF/MM TOPIC] block-mq issues with FC
  2016-04-08 15:11 ` James Bottomley
@ 2016-04-08 15:51   ` Ewan D. Milne
  2016-04-08 16:06     ` James Bottomley
  0 siblings, 1 reply; 16+ messages in thread
From: Ewan D. Milne @ 2016-04-08 15:51 UTC (permalink / raw)
  To: James Bottomley
  Cc: Hannes Reinecke, lsf, linux-block, Jens Axboe, Christoph Hellwig,
	SCSI Mailing List

On Fri, 2016-04-08 at 08:11 -0700, James Bottomley wrote:
> On Fri, 2016-04-08 at 13:29 +0200, Hannes Reinecke wrote:
> > Hi all,
> > 
> > I'd like to propose a topic on block-mq issues with FC.
> > During my performance testing using block/scsi-mq with FC I've hit
> > several issues I'd like to discuss:
> > 
> > - timeout handling:
> > Out of necessity the status of any timed out command is undefined.
> > So to be absolutely safe HBAs will be using extended timeouts here
> > (eg 70secs for lpfc). During that time we _could_ signal I/O timeout
> > to the upper layers, but then the tag will be reused, despite the
> > HBA still having a reference to it.
> > I'd like to discuss how this could be solved best with blk-mq.
> 
> What's wrong with the obvious answer: the tag shouldn't be re-used
> until after at least the TMF abort.  If we need to escalate that then
> it looks like the controller lost the tag and requires a bigger hammer.
> 
> However, when I look at what we do, it seems the running abort handler
> is triggered from the block timeout function, so where's the problem?
> ... surely mq can't free the tag until that returns, because it might
> extend the time. 
> 
> James

There was some discussion a while back about whether we could decouple
the SCSI EH's recovery of the device from using the failed scmds, so
that once the disposition of the original I/O was determined (i.e. they
had succeeded, failed or timed out & aborted), the scmds could be
returned to a higher layer while the EH attempted to recover the
device.  That way, in a multipath environment, we could submit the I/O
on working paths and avoid lengthy delays while we went through all the
resets.

We still need a successful abort after a timeout, but at least in the
above scenario we shouldn't be reusing the tags until the device is
recovered, as further I/O should be blocked while EH is running.

-Ewan




* Re: [Lsf] [LSF/MM TOPIC] block-mq issues with FC
  2016-04-08 15:51   ` [Lsf] " Ewan D. Milne
@ 2016-04-08 16:06     ` James Bottomley
  0 siblings, 0 replies; 16+ messages in thread
From: James Bottomley @ 2016-04-08 16:06 UTC (permalink / raw)
  To: emilne
  Cc: Hannes Reinecke, lsf, linux-block, Jens Axboe, Christoph Hellwig,
	SCSI Mailing List

On Fri, 2016-04-08 at 11:51 -0400, Ewan D. Milne wrote:
> On Fri, 2016-04-08 at 08:11 -0700, James Bottomley wrote:
> > On Fri, 2016-04-08 at 13:29 +0200, Hannes Reinecke wrote:
> > > Hi all,
> > > 
> > > I'd like to propose a topic on block-mq issues with FC.
> > > During my performance testing using block/scsi-mq with FC I've 
> > > hit several issues I'd like to discuss:
> > > 
> > > - timeout handling:
> > > Out of necessity the status of any timed out command is 
> > > undefined. So to be absolutely safe HBAs will be using extended 
> > > timeouts here (eg 70secs for lpfc). During that time we _could_ 
> > > signal I/O timeout to the upper layers, but then the tag will be 
> > > reused, despite the HBA still having a reference to it. I'd like
> > > to discuss how this could be solved best with blk-mq.
> > 
> > What's wrong with the obvious answer: the tag shouldn't be re-used
> > until after at least the TMF abort.  If we need to escalate that 
> > then it looks like the controller lost the tag and requires a 
> > bigger hammer.
> > 
> > However, when I look at what we do, it seems the running abort 
> > handler is triggered from the block timeout function, so where's 
> > the problem? ... surely mq can't free the tag until that returns, 
> > > because it might extend the time.
> > 
> > James
> 
> There was some discussion a while back about whether we could 
> decouple the SCSI EH's recovery of the device from using the failed 
> scmds, so that once the disposition of the original I/O was 
> determined (i.e. they had succeeded, failed or timed out & aborted), 
> the scmds could be returned to a higher layer while the EH attempted 
> to recover the device.

OK, so is the problem the tag or the request pointed to by the scmd?  I
think in the tag case, as long as it's not recovered until after the
abort is processed (i.e. until a disposition is returned from
scsi_times_out) then we're fine.  If the abort fails, we quiesce the
host anyway, so the block layer can happily queue commands with re-used
tags and the device will never see the duplication.
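
To make the tag lifetime rule above concrete, here is a minimal sketch in
Python. It is a toy model only, not blk-mq or SCSI midlayer code; the names
TagPool, time_out and abort_done are hypothetical. A timed-out tag stays out
of the free pool until the abort disposition is known, so it cannot be handed
to a new command while the HBA may still hold a reference to it.

```python
from enum import Enum, auto
from typing import List, Optional


class TagState(Enum):
    FREE = auto()
    IN_FLIGHT = auto()
    ABORTING = auto()  # timed out; abort issued, HBA may still reference it


class TagPool:
    """Toy tag allocator (hypothetical): only FREE tags are ever handed out."""

    def __init__(self, depth: int) -> None:
        self.state: List[TagState] = [TagState.FREE] * depth

    def alloc(self) -> Optional[int]:
        """Return the lowest FREE tag, or None if none is available."""
        for tag, state in enumerate(self.state):
            if state is TagState.FREE:
                self.state[tag] = TagState.IN_FLIGHT
                return tag
        return None

    def _move(self, tag: int, expected: TagState, new: TagState) -> None:
        # Reject transitions that skip a step, e.g. freeing a tag twice.
        if self.state[tag] is not expected:
            raise ValueError(f"tag {tag} is {self.state[tag].name}, "
                             f"expected {expected.name}")
        self.state[tag] = new

    def complete(self, tag: int) -> None:
        """Normal completion: the hardware returned the tag."""
        self._move(tag, TagState.IN_FLIGHT, TagState.FREE)

    def time_out(self, tag: int) -> None:
        """Timeout fired: keep the tag reserved while the abort runs."""
        self._move(tag, TagState.IN_FLIGHT, TagState.ABORTING)

    def abort_done(self, tag: int) -> None:
        """Abort disposition known: only now may the tag be reused."""
        self._move(tag, TagState.ABORTING, TagState.FREE)
```

With a pool of depth 2 and both tags allocated, calling time_out() on one of
them leaves alloc() returning None until abort_done() is called for that tag.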

I can't see how there can be a problem with the requests, because we
hold a reference to them in the scmd, so while it might be nicer to
release them earlier, it shouldn't be a problem today.

James


>   That way, in a multipath environment, we could submit the I/O on
> working paths and avoid lengthy delays while we went through all the
> resets.
> 
> We still need a successful abort after a timeout, but at least in the
> above scenario we shouldn't be reusing the tags until the device is
> recovered, as further I/O should be blocked while EH is running.
> 
> -Ewan
> 
> 

* Re: [Lsf] [LSF/MM TOPIC] block-mq issues with FC
  2016-04-08 11:29 [LSF/MM TOPIC] block-mq issues with FC Hannes Reinecke
  2016-04-08 15:11 ` James Bottomley
@ 2016-04-08 17:26 ` Bart Van Assche
  2016-04-08 17:40 ` Matthew Wilcox
  2016-04-08 18:13 ` Christoph Hellwig
  3 siblings, 0 replies; 16+ messages in thread
From: Bart Van Assche @ 2016-04-08 17:26 UTC (permalink / raw)
  To: Hannes Reinecke, lsf
  Cc: linux-block, Jens Axboe, Christoph Hellwig, SCSI Mailing List

On 04/08/2016 04:29 AM, Hannes Reinecke wrote:
> I'd like to propose a topic on block-mq issues with FC.
> During my performance testing using block/scsi-mq with FC I've hit
> several issues I'd like to discuss:
>
> - timeout handling:
> Out of necessity the status of any timed out command is undefined.
> So to be absolutely safe HBAs will be using extended timeouts here
> (eg 70secs for lpfc). During that time we _could_ signal I/O timeout
> to the upper layers, but then the tag will be reused, despite the
> HBA still having a reference to it.
> I'd like to discuss how this could be solved best with blk-mq.
>
> - Adaption on other HBAs to multiqueue:
> The current block-mq design assumes symmetric send and receive
> queues (in effect queue pairs). Any hardware _not_ providing this
> (like qla2xxx) can not be easily converted to scsi-mq. I'd like to
> discuss how one could approach converting these drivers.

Hello Hannes,

Without commenting on the specifics of the above proposal, I'm 
interested in a further discussion of how to improve multiqueue support 
for FC drivers.

Bart.



* Re: [Lsf] [LSF/MM TOPIC] block-mq issues with FC
  2016-04-08 11:29 [LSF/MM TOPIC] block-mq issues with FC Hannes Reinecke
  2016-04-08 15:11 ` James Bottomley
  2016-04-08 17:26 ` Bart Van Assche
@ 2016-04-08 17:40 ` Matthew Wilcox
  2016-04-08 18:00   ` James Bottomley
                     ` (4 more replies)
  2016-04-08 18:13 ` Christoph Hellwig
  3 siblings, 5 replies; 16+ messages in thread
From: Matthew Wilcox @ 2016-04-08 17:40 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: lsf, linux-block, Jens Axboe, Christoph Hellwig, SCSI Mailing List

On Fri, Apr 08, 2016 at 01:29:26PM +0200, Hannes Reinecke wrote:
> I'd like to propose a topic on block-mq issues with FC.
> During my performance testing using block/scsi-mq with FC I've hit
> several issues I'd like to discuss:

If there's a general block-mq bitching session, I have some ideas :-)

 - Inability to use all queues supported by a device.  Intel's P3700
   supports 31 queues, but block-mq insists on assigning an even multiple
   of CPUs to each queue.  So if you have 48 CPUs, it will use 24 queues.
   If you have 128 CPUs, it will only use 16 of the queues.

 - Interrupt steering needs to be controlled by block-mq instead of
   the driver.  It's pointless to have each driver implement its own
   policies on interrupt steering, irqbalanced remains a source of
   end-user frustration, and block-mq can change the queue<->cpu mapping
   without the driver's knowledge.

(thanks to Keith for his input on the first and suggestion of the second).
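
One way to read the numbers in the first item: if every queue must serve the
same number of CPUs, the queue count has to divide the CPU count evenly. The
sketch below models only that constraint; it is not the kernel's mapping code,
and usable_queues is a hypothetical name.

```python
def usable_queues(nr_cpus: int, hw_queues: int) -> int:
    """Largest queue count <= hw_queues that divides nr_cpus evenly."""
    for n in range(min(nr_cpus, hw_queues), 0, -1):
        if nr_cpus % n == 0:
            return n
    return 1  # only reached for nr_cpus < 1 or hw_queues < 1


print(usable_queues(48, 31))   # 24
print(usable_queues(128, 31))  # 16
```

Under this model a 31-queue device leaves 7 queues idle on a 48-CPU machine
and 15 idle on a 128-CPU machine.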


* Re: [Lsf] [LSF/MM TOPIC] block-mq issues with FC
  2016-04-08 17:40 ` Matthew Wilcox
@ 2016-04-08 18:00   ` James Bottomley
  2016-04-08 18:08     ` Christoph Hellwig
  2016-04-08 18:06   ` Keith Busch
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: James Bottomley @ 2016-04-08 18:00 UTC (permalink / raw)
  To: Matthew Wilcox, Hannes Reinecke
  Cc: lsf, linux-block, Jens Axboe, Christoph Hellwig, SCSI Mailing List

On Fri, 2016-04-08 at 13:40 -0400, Matthew Wilcox wrote:
> On Fri, Apr 08, 2016 at 01:29:26PM +0200, Hannes Reinecke wrote:
> > I'd like to propose a topic on block-mq issues with FC.
> > During my performance testing using block/scsi-mq with FC I've hit
> > several issues I'd like to discuss:
> 
> If there's a general block-mq bitching session, I have some ideas :-)

"Block mq bitching session" is going to look a bit bad on the public
schedule, what about "Block MQ implementor feedback"?

>  - Inability to use all queues supported by a device.  Intel's P3700
>    supports 31 queues, but block-mq insists on assigning an even multiple
>    of CPUs to each queue.  So if you have 48 CPUs, it will use 24 queues.
>    If you have 128 CPUs, it will only use 16 of the queues.
> 
>  - Interrupt steering needs to be controlled by block-mq instead of
>    the driver.  It's pointless to have each driver implement its own
>    policies on interrupt steering, irqbalanced remains a source of
>    end-user frustration, and block-mq can change the queue<->cpu mapping
>    without the driver's knowledge.
> 
> (thanks to Keith for his input on the first and suggestion of the second).

OK, what about two sessions, one for general bitching (the feedback
sessions) and one for concrete proposals for improvements ... so rather
than just complaining about the problem, if you have concrete ideas
about fixing it, that would go into the second session.

James




* Re: [Lsf] [LSF/MM TOPIC] block-mq issues with FC
  2016-04-08 17:40 ` Matthew Wilcox
  2016-04-08 18:00   ` James Bottomley
@ 2016-04-08 18:06   ` Keith Busch
  2016-04-12 19:16     ` Jens Axboe
  2016-04-08 18:14   ` Bart Van Assche
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2016-04-08 18:06 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Hannes Reinecke, lsf, linux-block, Jens Axboe, SCSI Mailing List,
	Christoph Hellwig

On Fri, Apr 08, 2016 at 01:40:06PM -0400, Matthew Wilcox wrote:
>  - Inability to use all queues supported by a device.  Intel's P3700
>    supports 31 queues, but block-mq insists on assigning an even multiple
>    of CPUs to each queue.  So if you have 48 CPUs, it will use 24 queues.
>    If you have 128 CPUs, it will only use 16 of the queues.

While it'd be better to use all the available h/w resources, that's
actually not the worst part.

The real problems occur when there are more physical/unique CPUs than
h/w queues since blk-mq does not consider CPU topology beyond thread
siblings. With 128 CPUs, blk-mq may use all 31 queues P3700 supports,
but many CPU groups won't share a last-level-cache.
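
As an illustration of what a topology-aware assignment could look like, here
is a rough Python sketch. It is not the code referred to in this message and
not what blk-mq does; map_cpus_by_llc and its inputs are hypothetical. It
assumes the caller already knows which last-level-cache domain each CPU
belongs to, and that there are at least as many queues as domains. No queue
ever spans two cache domains.

```python
from collections import defaultdict
from typing import Dict, List


def map_cpus_by_llc(cpu_llc: List[int], nr_queues: int) -> List[int]:
    """Return a cpu -> queue map in which each queue stays in one LLC domain.

    cpu_llc[cpu] is the (hypothetical) last-level-cache domain id of that CPU.
    """
    domains: Dict[int, List[int]] = defaultdict(list)
    for cpu, llc in enumerate(cpu_llc):
        domains[llc].append(cpu)
    if nr_queues < len(domains):
        raise ValueError("sketch assumes at least one queue per LLC domain")

    # Start with one queue per domain, then give each spare queue to the
    # domain that currently has the most CPUs per queue.
    share = {llc: 1 for llc in domains}
    for _ in range(nr_queues - len(domains)):
        llc = max(domains, key=lambda d: len(domains[d]) / share[d])
        if share[llc] >= len(domains[llc]):
            break  # every CPU already has its own queue
        share[llc] += 1

    # Spread each domain's CPUs evenly over that domain's queues.
    mapping = [0] * len(cpu_llc)
    base = 0
    for llc in sorted(domains):
        cpus = domains[llc]
        for i, cpu in enumerate(cpus):
            mapping[cpu] = base + i * share[llc] // len(cpus)
        base += share[llc]
    return mapping
```

The design choice here is to treat the cache domain as a hard boundary and
only balance queues within it; a real implementation would also have to weigh
NUMA nodes and thread siblings.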

Smarter assignment would reclaim some untapped performance, and we can
share such code prior to the session.


* Re: [Lsf] [LSF/MM TOPIC] block-mq issues with FC
  2016-04-08 18:00   ` James Bottomley
@ 2016-04-08 18:08     ` Christoph Hellwig
  2016-04-08 18:24       ` James Bottomley
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2016-04-08 18:08 UTC (permalink / raw)
  To: James Bottomley
  Cc: Matthew Wilcox, Hannes Reinecke, lsf, linux-block, Jens Axboe,
	Christoph Hellwig, SCSI Mailing List

On Fri, Apr 08, 2016 at 11:00:51AM -0700, James Bottomley wrote:
> >  - Inability to use all queues supported by a device.  Intel's P3700
> >    supports 31 queues, but block-mq insists on assigning an even multiple
> >    of CPUs to each queue.  So if you have 48 CPUs, it will use 24 queues.
> >    If you have 128 CPUs, it will only use 16 of the queues.
> > 
> >  - Interrupt steering needs to be controlled by block-mq instead of
> >    the driver.  It's pointless to have each driver implement its own
> >    policies on interrupt steering, irqbalanced remains a source of
> >    end-user frustration, and block-mq can change the queue<->cpu mapping
> >    without the driver's knowledge.
> > 
> > (thanks to Keith for his input on the first and suggestion of the second).
> 
> OK, what about two sessions, one for general bitching (the feedback
> sessions) and one for concrete proposals for improvements ... so rather
> than just complaining about the problem, if you have concrete ideas
> about fixing it, that would go into the second session.

We already have the blk-mq interrupt assignment session on the schedule,
which is about willy's item.  And my work in progress code to address
the issue also mostly addresses his item number 1, so I think we can
just keep the schedule mostly as is and just rename "multiqueue interrupt
assignment" into "multiqueue interrupt and queue assignment".

No need to blow it up into three slots.


* Re: [LSF/MM TOPIC] block-mq issues with FC
  2016-04-08 11:29 [LSF/MM TOPIC] block-mq issues with FC Hannes Reinecke
                   ` (2 preceding siblings ...)
  2016-04-08 17:40 ` Matthew Wilcox
@ 2016-04-08 18:13 ` Christoph Hellwig
  3 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2016-04-08 18:13 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: lsf, SCSI Mailing List, linux-block, Christoph Hellwig, Jens Axboe

First: what is actually FC specific here?

> - timeout handling:
> Out of necessity the status of any timed out command is undefined.
> So to be absolutely safe HBAs will be using extended timeouts here
> (eg 70secs for lpfc). During that time we _could_ signal I/O timeout
> to the upper layers, but then the tag will be reused, despite the
> HBA still having a reference to it.
> I'd like to discuss how this could be solved best with blk-mq.

Reusing a tag that the hardware hasn't returned is simply unsafe,
nothing really blk-mq specific here.

> - Adaption on other HBAs to multiqueue:
> The current block-mq design assumes symmetric send and receive
> queues (in effect queue pairs). Any hardware _not_ providing this
> (like qla2xxx) can not be easily converted to scsi-mq. I'd like to
> discuss how one could approach converting these drivers.

Why do you think blk-mq assumes this?


* Re: [Lsf] [LSF/MM TOPIC] block-mq issues with FC
  2016-04-08 17:40 ` Matthew Wilcox
  2016-04-08 18:00   ` James Bottomley
  2016-04-08 18:06   ` Keith Busch
@ 2016-04-08 18:14   ` Bart Van Assche
  2016-04-08 19:22   ` Waskiewicz, PJ
  2016-04-10 19:02   ` Sagi Grimberg
  4 siblings, 0 replies; 16+ messages in thread
From: Bart Van Assche @ 2016-04-08 18:14 UTC (permalink / raw)
  To: Matthew Wilcox, Hannes Reinecke
  Cc: lsf, linux-block, Jens Axboe, SCSI Mailing List, Christoph Hellwig

On 04/08/2016 10:40 AM, Matthew Wilcox wrote:
>   - Interrupt steering needs to be controlled by block-mq instead of
>     the driver.  It's pointless to have each driver implement its own
>     policies on interrupt steering, irqbalanced remains a source of
>     end-user frustration, and block-mq can change the queue<->cpu mapping
>     without the driver's knowledge.

I'm looking forward to the day that I will be able to drop my script for 
spreading interrupts manually (see also the fifth attachment of 
http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/21312/focus=98409).

Bart.



* Re: [Lsf] [LSF/MM TOPIC] block-mq issues with FC
  2016-04-08 18:08     ` Christoph Hellwig
@ 2016-04-08 18:24       ` James Bottomley
  0 siblings, 0 replies; 16+ messages in thread
From: James Bottomley @ 2016-04-08 18:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, SCSI Mailing List, lsf, Hannes Reinecke,
	Matthew Wilcox

On Fri, 2016-04-08 at 20:08 +0200, Christoph Hellwig wrote:
> On Fri, Apr 08, 2016 at 11:00:51AM -0700, James Bottomley wrote:
> > >  - Inability to use all queues supported by a device.  Intel's P3700
> > >    supports 31 queues, but block-mq insists on assigning an even multiple
> > >    of CPUs to each queue.  So if you have 48 CPUs, it will use 24 queues.
> > >    If you have 128 CPUs, it will only use 16 of the queues.
> > > 
> > >  - Interrupt steering needs to be controlled by block-mq instead of
> > >    the driver.  It's pointless to have each driver implement its own
> > >    policies on interrupt steering, irqbalanced remains a source of
> > >    end-user frustration, and block-mq can change the queue<->cpu mapping
> > >    without the driver's knowledge.
> > > 
> > > (thanks to Keith for his input on the first and suggestion of the second).
> > 
> > OK, what about two sessions, one for general bitching (the feedback
> > sessions) and one for concrete proposals for improvements ... so rather
> > than just complaining about the problem, if you have concrete ideas
> > about fixing it, that would go into the second session.
> 
> We already have the blk-mq interrupt assignment session on the schedule,
> which is about willy's item.  And my work in progress code to address
> the issue also mostly addresses his item number 1, so I think we can
> just keep the schedule mostly as is and just rename "multiqueue interrupt
> assignment" into "multiqueue interrupt and queue assignment".
> 
> No need to blow it up into three slots.

Agreed; I made the adjustments.

James




* Re: [Lsf] [LSF/MM TOPIC] block-mq issues with FC
  2016-04-08 17:40 ` Matthew Wilcox
                     ` (2 preceding siblings ...)
  2016-04-08 18:14   ` Bart Van Assche
@ 2016-04-08 19:22   ` Waskiewicz, PJ
  2016-04-10 19:02   ` Sagi Grimberg
  4 siblings, 0 replies; 16+ messages in thread
From: Waskiewicz, PJ @ 2016-04-08 19:22 UTC (permalink / raw)
  To: willy, hare; +Cc: lsf, linux-scsi, hch, linux-block, axboe

On Fri, 2016-04-08 at 13:40 -0400, Matthew Wilcox wrote:
> On Fri, Apr 08, 2016 at 01:29:26PM +0200, Hannes Reinecke wrote:
>  - Interrupt steering needs to be controlled by block-mq instead of
>    the driver.  It's pointless to have each driver implement its own
>    policies on interrupt steering, irqbalanced remains a source of
>    end-user frustration, and block-mq can change the queue<->cpu mapping
>    without the driver's knowledge.

This is the same problem in the networking space as well.  When I added
affinity_hint to the irq_desc, and then that support into irqbalance,
my original approach was to allow the driver to assign affinities.
This was shot down because a driver was influencing policy, versus
allowing userspace to do so.  Meh.

If there's something actionable out of this discussion that makes
interrupt steering better, I'd like to see us drive it into the
networking world as well.  That would also let me rip out the
affinity_hint stuff overall from irqbalance...

-PJ

-- 
PJ Waskiewicz
Principal Engineer, NetApp
e: pj.waskiewicz@netapp.com
d: 503.961.3705


* Re: [Lsf] [LSF/MM TOPIC] block-mq issues with FC
  2016-04-08 17:40 ` Matthew Wilcox
                     ` (3 preceding siblings ...)
  2016-04-08 19:22   ` Waskiewicz, PJ
@ 2016-04-10 19:02   ` Sagi Grimberg
  2016-04-12 19:04     ` Quinn Tran
  4 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2016-04-10 19:02 UTC (permalink / raw)
  To: Matthew Wilcox, Hannes Reinecke
  Cc: lsf, linux-block, Jens Axboe, SCSI Mailing List, Christoph Hellwig

Hey Willy,

>   - Interrupt steering needs to be controlled by block-mq instead of
>     the driver.  It's pointless to have each driver implement its own
>     policies on interrupt steering, irqbalanced remains a source of
>     end-user frustration, and block-mq can change the queue<->cpu mapping
>     without the driver's knowledge.

I honestly don't think that block-mq is the right place to
*assign* interrupt steering. Not all HW devices are dedicated
to storage, take RDMA for example, a RNIC is shared by block
storage, networking and even user-space workloads so obviously
block-mq can't understand how a user wants to steer interrupts.

I think that block-mq needs to ask the device driver:
"what is the optimal queue index for cpu X?" and use it
while *someone* will be responsible for optimum interrupt
steering (can be the driver itself or user-space).
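
The query described above could be sketched like this in Python. Everything
here is hypothetical and only illustrates the shape of the interface: the
block layer builds its cpu -> queue map by asking a driver-supplied callback,
and the driver answers based on where its interrupt vectors are currently
affine. Who sets that affinity is left outside the sketch.

```python
from typing import Callable, List, Sequence, Set


def build_queue_map(nr_cpus: int, nr_queues: int,
                    queue_for_cpu: Callable[[int], int]) -> List[int]:
    """Block-layer side: ask the driver for the best queue of every CPU."""
    mapping = []
    for cpu in range(nr_cpus):
        queue = queue_for_cpu(cpu)
        if not 0 <= queue < nr_queues:
            raise ValueError(f"driver returned invalid queue {queue}")
        mapping.append(queue)
    return mapping


def make_affinity_lookup(vector_cpus: Sequence[Set[int]]) -> Callable[[int], int]:
    """Driver side: vector_cpus[q] is the set of CPUs that queue q's
    interrupt vector is affine to (set by the driver or by user space)."""
    def queue_for_cpu(cpu: int) -> int:
        for queue, cpus in enumerate(vector_cpus):
            if cpu in cpus:
                return queue
        return 0  # no vector is affine to this CPU; fall back to queue 0
    return queue_for_cpu
```

For example, with two vectors affine to CPUs {0, 1} and {2, 3},
build_queue_map(4, 2, make_affinity_lookup([{0, 1}, {2, 3}])) yields
[0, 0, 1, 1].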

From some discussions I had with HCH I think he intends to
use the cpu reverse-mapping API to try and do what's described
above (if I'm not mistaken).


* Re: [Lsf] [LSF/MM TOPIC] block-mq issues with FC
  2016-04-10 19:02   ` Sagi Grimberg
@ 2016-04-12 19:04     ` Quinn Tran
  0 siblings, 0 replies; 16+ messages in thread
From: Quinn Tran @ 2016-04-12 19:04 UTC (permalink / raw)
  To: Sagi Grimberg, Matthew Wilcox, Hannes Reinecke
  Cc: lsf, linux-block, Jens Axboe, linux-scsi, Christoph Hellwig


>Hey Willy,
>
>>   - Interrupt steering needs to be controlled by block-mq instead of
>>     the driver.  It's pointless to have each driver implement its own
>>     policies on interrupt steering, irqbalanced remains a source of
>>     end-user frustration, and block-mq can change the queue<->cpu mapping
>>     without the driver's knowledge.
>
>I honestly don't think that block-mq is the right place to
>*assign* interrupt steering. Not all HW devices are dedicated
>to storage, take RDMA for example, a RNIC is shared by block
>storage, networking and even user-space workloads so obviously
>block-mq can't understand how a user wants to steer interrupts.
>
>I think that block-mq needs to ask the device driver:
>"what is the optimal queue index for cpu X?" and use it
>while *someone* will be responsible for optimum interrupt
>steering (can be the driver itself or user-space).

+0.5 on block-mq asking the lower layer where to place the queue.
However, I think it is better for the lower layer to push the data up
rather than have block-mq ask for it, because the user can change, or
irqbalance can relocate, the interrupt vector(s) at runtime.

A QLogic adapter can act in both initiator and target modes at the same
time.  Certain target vendors might not want the initiator side to hold
this knob.
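
A rough Python sketch of the push model suggested above, for comparison with
a query interface. All names are hypothetical and this is not an existing
kernel API: the block layer keeps a cpu -> queue map, and the lower layer
calls push_update() whenever an interrupt vector is moved at runtime.

```python
from typing import Iterable, List


class PushedQueueMap:
    """cpu -> queue map that the lower layer updates when vectors move."""

    def __init__(self, nr_cpus: int, nr_queues: int) -> None:
        self.nr_queues = nr_queues
        # Default layout until the lower layer pushes something better.
        self._map: List[int] = [cpu % nr_queues for cpu in range(nr_cpus)]

    def lookup(self, cpu: int) -> int:
        """Block-layer side: which queue should this CPU submit to?"""
        return self._map[cpu]

    def push_update(self, queue: int, cpus: Iterable[int]) -> None:
        """Lower-layer side: queue's vector is now affine to these CPUs."""
        if not 0 <= queue < self.nr_queues:
            raise ValueError(f"invalid queue {queue}")
        for cpu in cpus:
            self._map[cpu] = queue
```

The point of the push direction is that the map stays current without the
block layer having to poll after every affinity change.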




>
> From some discussions I had with HCH I think he intends to
>use the cpu reverse-mapping API to try and do what's described
>above (if I'm not mistaken).


* Re: [Lsf] [LSF/MM TOPIC] block-mq issues with FC
  2016-04-08 18:06   ` Keith Busch
@ 2016-04-12 19:16     ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2016-04-12 19:16 UTC (permalink / raw)
  To: Keith Busch, Matthew Wilcox
  Cc: Hannes Reinecke, lsf, linux-block, SCSI Mailing List, Christoph Hellwig

On 04/08/2016 12:06 PM, Keith Busch wrote:
> On Fri, Apr 08, 2016 at 01:40:06PM -0400, Matthew Wilcox wrote:
>>   - Inability to use all queues supported by a device.  Intel's P3700
>>     supports 31 queues, but block-mq insists on assigning an even multiple
>>     of CPUs to each queue.  So if you have 48 CPUs, it will use 24 queues.
>>     If you have 128 CPUs, it will only use 16 of the queues.
>
> While it'd be better to use all the available h/w resources, that's
> actually not the worst part.
>
> The real problems occur when there are more physical/unique CPUs than
> h/w queues since blk-mq does not consider CPU topology beyond thread
> siblings. With 128 CPUs, blk-mq may use all 31 queues P3700 supports,
> but many CPU groups won't share a last-level-cache.
>
> Smarter assignment would reclaim some untapped performance, and we can
> share such code prior to the session.

There's definitely room for improvement in the cpu mapping code.

However, on the original complaint, it's by design (or, working as 
intended) - this was done to keep the layout symmetrical. It's been 
discussed on the mailing lists before. We can have a discussion whether 
we should change this or not, of course.

-- 
Jens Axboe

