All of lore.kernel.org
 help / color / mirror / Atom feed
* [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
@ 2015-01-07 16:25 Sagi Grimberg
       [not found] ` <54AD5DDD.2090808-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-01-07 16:58 ` Nicholas A. Bellinger
  0 siblings, 2 replies; 37+ messages in thread
From: Sagi Grimberg @ 2015-01-07 16:25 UTC (permalink / raw)
  To: lsf-pc; +Cc: linux-scsi, target-devel, open-iscsi

Hi everyone,

Now that scsi-mq is fully included, we need an iSCSI initiator that
would use it to achieve scalable performance. The need is even greater
for iSCSI offload devices and transports that support multiple HW
queues. As iSER maintainer I'd like to discuss the way we would choose
to implement that in iSCSI.

My measurements show that iSER initiator can scale up to ~2.1M IOPs
with multiple sessions but only ~630K IOPs with a single session where
the most significant bottleneck the (single) core processing
completions.

In the existing single connection per session model, given that command
ordering must be preserved session-wide, we end up in a serial command
execution over a single connection which is basically a single queue
model. The best fit seems to be plugging iSCSI MCS as a multi-queued
scsi LLDD. In this model, a hardware context will have a 1x1 mapping
with an iSCSI connection (TCP socket or a HW queue).

iSCSI MCS and it's role in the presence of dm-multipath layer was
discussed several times in the past decade(s). The basic need for MCS is
implementing a multi-queue data path, so perhaps we may want to avoid
doing any type link aggregation or load balancing to not overlap
dm-multipath. For example we can implement ERL=0 (which is basically the
scsi-mq ERL) and/or restrict a session to a single portal.

As I see it, the todo's are:
1. Getting MCS to work (kernel + user-space) with ERL=0 and a
    round-robin connection selection (per scsi command execution).
2. Plug into scsi-mq - exposing num_connections as nr_hw_queues and
    using blk-mq based queue (conn) selection.
3. Rework iSCSI core locking scheme to avoid session-wide locking
    as much as possible.
4. Use blk-mq pre-allocation and tagging facilities.

I've recently started looking into this. I would like the community to
agree (or debate) on this scheme and also talk about implementation
with anyone who is also interested in this.

Cheers,
Sagi.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found] ` <54AD5DDD.2090808-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-01-07 16:57   ` Hannes Reinecke
       [not found]     ` <54AD6563.4040603-l3A5Bk7waGM@public.gmane.org>
  2015-01-07 17:22   ` Lee Duncan
  1 sibling, 1 reply; 37+ messages in thread
From: Hannes Reinecke @ 2015-01-07 16:57 UTC (permalink / raw)
  To: Sagi Grimberg, lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: linux-scsi, target-devel, open-iscsi-/JYPxA39Uh5TLH3MbocFFw,
	Mike Christie

On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
> Hi everyone,
> 
> Now that scsi-mq is fully included, we need an iSCSI initiator that
> would use it to achieve scalable performance. The need is even greater
> for iSCSI offload devices and transports that support multiple HW
> queues. As iSER maintainer I'd like to discuss the way we would choose
> to implement that in iSCSI.
> 
> My measurements show that iSER initiator can scale up to ~2.1M IOPs
> with multiple sessions but only ~630K IOPs with a single session where
> the most significant bottleneck the (single) core processing
> completions.
> 
> In the existing single connection per session model, given that command
> ordering must be preserved session-wide, we end up in a serial command
> execution over a single connection which is basically a single queue
> model. The best fit seems to be plugging iSCSI MCS as a multi-queued
> scsi LLDD. In this model, a hardware context will have a 1x1 mapping
> with an iSCSI connection (TCP socket or a HW queue).
> 
> iSCSI MCS and it's role in the presence of dm-multipath layer was
> discussed several times in the past decade(s). The basic need for MCS is
> implementing a multi-queue data path, so perhaps we may want to avoid
> doing any type link aggregation or load balancing to not overlap
> dm-multipath. For example we can implement ERL=0 (which is basically the
> scsi-mq ERL) and/or restrict a session to a single portal.
> 
> As I see it, the todo's are:
> 1. Getting MCS to work (kernel + user-space) with ERL=0 and a
>    round-robin connection selection (per scsi command execution).
> 2. Plug into scsi-mq - exposing num_connections as nr_hw_queues and
>    using blk-mq based queue (conn) selection.
> 3. Rework iSCSI core locking scheme to avoid session-wide locking
>    as much as possible.
> 4. Use blk-mq pre-allocation and tagging facilities.
> 
> I've recently started looking into this. I would like the community to
> agree (or debate) on this scheme and also talk about implementation
> with anyone who is also interested in this.
> 
Yes, that's a really good topic.

I've pondered implementing MC/S for iscsi/TCP but then I've figured my
network implementation knowledge doesn't spread that far.
So yeah, a discussion here would be good.

Mike? Any comments?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare-l3A5Bk7waGM@public.gmane.org			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-07 16:25 [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion Sagi Grimberg
       [not found] ` <54AD5DDD.2090808-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-01-07 16:58 ` Nicholas A. Bellinger
  1 sibling, 0 replies; 37+ messages in thread
From: Nicholas A. Bellinger @ 2015-01-07 16:58 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: lsf-pc, linux-scsi, target-devel, open-iscsi, Mike Christie

On Wed, 2015-01-07 at 18:25 +0200, Sagi Grimberg wrote:
> Hi everyone,
> 
> Now that scsi-mq is fully included, we need an iSCSI initiator that
> would use it to achieve scalable performance. The need is even greater
> for iSCSI offload devices and transports that support multiple HW
> queues. As iSER maintainer I'd like to discuss the way we would choose
> to implement that in iSCSI.
> 
> My measurements show that iSER initiator can scale up to ~2.1M IOPs
> with multiple sessions but only ~630K IOPs with a single session where
> the most significant bottleneck the (single) core processing
> completions.
> 
> In the existing single connection per session model, given that command
> ordering must be preserved session-wide, we end up in a serial command
> execution over a single connection which is basically a single queue
> model. The best fit seems to be plugging iSCSI MCS as a multi-queued
> scsi LLDD. In this model, a hardware context will have a 1x1 mapping
> with an iSCSI connection (TCP socket or a HW queue).
> 
> iSCSI MCS and it's role in the presence of dm-multipath layer was
> discussed several times in the past decade(s). The basic need for MCS is
> implementing a multi-queue data path, so perhaps we may want to avoid
> doing any type link aggregation or load balancing to not overlap
> dm-multipath. For example we can implement ERL=0 (which is basically the
> scsi-mq ERL) and/or restrict a session to a single portal.
> 
> As I see it, the todo's are:
> 1. Getting MCS to work (kernel + user-space) with ERL=0 and a
>     round-robin connection selection (per scsi command execution).
> 2. Plug into scsi-mq - exposing num_connections as nr_hw_queues and
>     using blk-mq based queue (conn) selection.
> 3. Rework iSCSI core locking scheme to avoid session-wide locking
>     as much as possible.
> 4. Use blk-mq pre-allocation and tagging facilities.
> 
> I've recently started looking into this. I would like the community to
> agree (or debate) on this scheme and also talk about implementation
> with anyone who is also interested in this.
> 

(Adding CC' for MNC)

+1.  I'd be up for this discussion to finally get MC/S support into
open-iscsi, and determining the best method for mapping connection
conects to scsi-mq hw queues.

--nab

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found] ` <54AD5DDD.2090808-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-01-07 16:57   ` Hannes Reinecke
@ 2015-01-07 17:22   ` Lee Duncan
  2015-01-07 19:11     ` [Lsf-pc] " Jan Kara
  1 sibling, 1 reply; 37+ messages in thread
From: Lee Duncan @ 2015-01-07 17:22 UTC (permalink / raw)
  To: Sagi Grimberg, lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: linux-scsi, target-devel, open-iscsi-/JYPxA39Uh5TLH3MbocFFw

On 01/07/2015 08:25 AM, Sagi Grimberg wrote:
> Hi everyone,
> 
> Now that scsi-mq is fully included, we need an iSCSI initiator that
> would use it to achieve scalable performance. The need is even greater
> for iSCSI offload devices and transports that support multiple HW
> queues. As iSER maintainer I'd like to discuss the way we would choose
> to implement that in iSCSI.
> 
> My measurements show that iSER initiator can scale up to ~2.1M IOPs
> with multiple sessions but only ~630K IOPs with a single session where
> the most significant bottleneck the (single) core processing
> completions.
> 
> In the existing single connection per session model, given that command
> ordering must be preserved session-wide, we end up in a serial command
> execution over a single connection which is basically a single queue
> model. The best fit seems to be plugging iSCSI MCS as a multi-queued
> scsi LLDD. In this model, a hardware context will have a 1x1 mapping
> with an iSCSI connection (TCP socket or a HW queue).
> 
> iSCSI MCS and it's role in the presence of dm-multipath layer was
> discussed several times in the past decade(s). The basic need for MCS is
> implementing a multi-queue data path, so perhaps we may want to avoid
> doing any type link aggregation or load balancing to not overlap
> dm-multipath. For example we can implement ERL=0 (which is basically the
> scsi-mq ERL) and/or restrict a session to a single portal.
> 
> As I see it, the todo's are:
> 1. Getting MCS to work (kernel + user-space) with ERL=0 and a
>    round-robin connection selection (per scsi command execution).
> 2. Plug into scsi-mq - exposing num_connections as nr_hw_queues and
>    using blk-mq based queue (conn) selection.
> 3. Rework iSCSI core locking scheme to avoid session-wide locking
>    as much as possible.
> 4. Use blk-mq pre-allocation and tagging facilities.
> 
> I've recently started looking into this. I would like the community to
> agree (or debate) on this scheme and also talk about implementation
> with anyone who is also interested in this.
> 
> Cheers,
> Sagi.

I started looking at this last year (and Hannes' suggestion), and would
love to join the discussion.

Please add me to the list of those that wish to attend.
-- 
Lee Duncan
SUSE Labs

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-07 17:22   ` Lee Duncan
@ 2015-01-07 19:11     ` Jan Kara
  0 siblings, 0 replies; 37+ messages in thread
From: Jan Kara @ 2015-01-07 19:11 UTC (permalink / raw)
  To: Lee Duncan; +Cc: Sagi Grimberg, lsf-pc, open-iscsi, target-devel, linux-scsi

On Wed 07-01-15 09:22:13, Lee Duncan wrote:
> On 01/07/2015 08:25 AM, Sagi Grimberg wrote:
> > Hi everyone,
> > 
> > Now that scsi-mq is fully included, we need an iSCSI initiator that
> > would use it to achieve scalable performance. The need is even greater
> > for iSCSI offload devices and transports that support multiple HW
> > queues. As iSER maintainer I'd like to discuss the way we would choose
> > to implement that in iSCSI.
> > 
> > My measurements show that iSER initiator can scale up to ~2.1M IOPs
> > with multiple sessions but only ~630K IOPs with a single session where
> > the most significant bottleneck the (single) core processing
> > completions.
> > 
> > In the existing single connection per session model, given that command
> > ordering must be preserved session-wide, we end up in a serial command
> > execution over a single connection which is basically a single queue
> > model. The best fit seems to be plugging iSCSI MCS as a multi-queued
> > scsi LLDD. In this model, a hardware context will have a 1x1 mapping
> > with an iSCSI connection (TCP socket or a HW queue).
> > 
> > iSCSI MCS and it's role in the presence of dm-multipath layer was
> > discussed several times in the past decade(s). The basic need for MCS is
> > implementing a multi-queue data path, so perhaps we may want to avoid
> > doing any type link aggregation or load balancing to not overlap
> > dm-multipath. For example we can implement ERL=0 (which is basically the
> > scsi-mq ERL) and/or restrict a session to a single portal.
> > 
> > As I see it, the todo's are:
> > 1. Getting MCS to work (kernel + user-space) with ERL=0 and a
> >    round-robin connection selection (per scsi command execution).
> > 2. Plug into scsi-mq - exposing num_connections as nr_hw_queues and
> >    using blk-mq based queue (conn) selection.
> > 3. Rework iSCSI core locking scheme to avoid session-wide locking
> >    as much as possible.
> > 4. Use blk-mq pre-allocation and tagging facilities.
> > 
> > I've recently started looking into this. I would like the community to
> > agree (or debate) on this scheme and also talk about implementation
> > with anyone who is also interested in this.
> > 
> > Cheers,
> > Sagi.
> 
> I started looking at this last year (and Hannes' suggestion), and would
> love to join the discussion.
> 
> Please add me to the list of those that wish to attend.
  For that please send a separate email with attend request as described in
the call for proposals. Thanks!

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found]     ` <54AD6563.4040603-l3A5Bk7waGM@public.gmane.org>
@ 2015-01-07 21:39       ` Mike Christie
  2015-01-08  7:50         ` Bart Van Assche
                           ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Mike Christie @ 2015-01-07 21:39 UTC (permalink / raw)
  To: Hannes Reinecke, Sagi Grimberg,
	lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: linux-scsi, target-devel, open-iscsi-/JYPxA39Uh5TLH3MbocFFw

On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
> On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
>> Hi everyone,
>>
>> Now that scsi-mq is fully included, we need an iSCSI initiator that
>> would use it to achieve scalable performance. The need is even greater
>> for iSCSI offload devices and transports that support multiple HW
>> queues. As iSER maintainer I'd like to discuss the way we would choose
>> to implement that in iSCSI.
>>
>> My measurements show that iSER initiator can scale up to ~2.1M IOPs
>> with multiple sessions but only ~630K IOPs with a single session where
>> the most significant bottleneck the (single) core processing
>> completions.
>>
>> In the existing single connection per session model, given that command
>> ordering must be preserved session-wide, we end up in a serial command
>> execution over a single connection which is basically a single queue
>> model. The best fit seems to be plugging iSCSI MCS as a multi-queued
>> scsi LLDD. In this model, a hardware context will have a 1x1 mapping
>> with an iSCSI connection (TCP socket or a HW queue).
>>
>> iSCSI MCS and it's role in the presence of dm-multipath layer was
>> discussed several times in the past decade(s). The basic need for MCS is
>> implementing a multi-queue data path, so perhaps we may want to avoid
>> doing any type link aggregation or load balancing to not overlap
>> dm-multipath. For example we can implement ERL=0 (which is basically the
>> scsi-mq ERL) and/or restrict a session to a single portal.
>>
>> As I see it, the todo's are:
>> 1. Getting MCS to work (kernel + user-space) with ERL=0 and a
>>    round-robin connection selection (per scsi command execution).
>> 2. Plug into scsi-mq - exposing num_connections as nr_hw_queues and
>>    using blk-mq based queue (conn) selection.
>> 3. Rework iSCSI core locking scheme to avoid session-wide locking
>>    as much as possible.
>> 4. Use blk-mq pre-allocation and tagging facilities.
>>
>> I've recently started looking into this. I would like the community to
>> agree (or debate) on this scheme and also talk about implementation
>> with anyone who is also interested in this.
>>
> Yes, that's a really good topic.
> 
> I've pondered implementing MC/S for iscsi/TCP but then I've figured my
> network implementation knowledge doesn't spread that far.
> So yeah, a discussion here would be good.
> 
> Mike? Any comments?

I have been working under the assumption that people would be ok with
MCS upstream if we are only using it to handle the issue where we want
to do something like have a tcp/iscsi connection per CPU then map the
connection to a blk_mq_hw_ctx. In this more limited MCS implementation
there would be no iscsi layer code to do something like load balance
across ports or transport paths like how dm-multipath does, so there
would be no feature/code duplication. For balancing across hctxs, then
the iscsi layer would also leave that up to whatever we end up with in
upper layers, so again no feature/code duplication with upper layers.

So pretty non controversial I hope :)

If people want to add something like round robin connection selection in
the iscsi layer, then I think we want to leave that for after the
initial merge, so people can argue about that separately.

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-07 21:39       ` Mike Christie
@ 2015-01-08  7:50         ` Bart Van Assche
  2015-01-08 13:45           ` Sagi Grimberg
                             ` (2 more replies)
  2015-01-08 14:50         ` James Bottomley
       [not found]         ` <54ADA777.6090801-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
  2 siblings, 3 replies; 37+ messages in thread
From: Bart Van Assche @ 2015-01-08  7:50 UTC (permalink / raw)
  To: open-iscsi, Hannes Reinecke, Sagi Grimberg, lsf-pc
  Cc: linux-scsi, target-devel

On 01/07/15 22:39, Mike Christie wrote:
> On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
>> On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
>>> Hi everyone,
>>>
>>> Now that scsi-mq is fully included, we need an iSCSI initiator that
>>> would use it to achieve scalable performance. The need is even greater
>>> for iSCSI offload devices and transports that support multiple HW
>>> queues. As iSER maintainer I'd like to discuss the way we would choose
>>> to implement that in iSCSI.
>>>
>>> My measurements show that iSER initiator can scale up to ~2.1M IOPs
>>> with multiple sessions but only ~630K IOPs with a single session where
>>> the most significant bottleneck the (single) core processing
>>> completions.
>>>
>>> In the existing single connection per session model, given that command
>>> ordering must be preserved session-wide, we end up in a serial command
>>> execution over a single connection which is basically a single queue
>>> model. The best fit seems to be plugging iSCSI MCS as a multi-queued
>>> scsi LLDD. In this model, a hardware context will have a 1x1 mapping
>>> with an iSCSI connection (TCP socket or a HW queue).
>>>
>>> iSCSI MCS and it's role in the presence of dm-multipath layer was
>>> discussed several times in the past decade(s). The basic need for MCS is
>>> implementing a multi-queue data path, so perhaps we may want to avoid
>>> doing any type link aggregation or load balancing to not overlap
>>> dm-multipath. For example we can implement ERL=0 (which is basically the
>>> scsi-mq ERL) and/or restrict a session to a single portal.
>>>
>>> As I see it, the todo's are:
>>> 1. Getting MCS to work (kernel + user-space) with ERL=0 and a
>>>     round-robin connection selection (per scsi command execution).
>>> 2. Plug into scsi-mq - exposing num_connections as nr_hw_queues and
>>>     using blk-mq based queue (conn) selection.
>>> 3. Rework iSCSI core locking scheme to avoid session-wide locking
>>>     as much as possible.
>>> 4. Use blk-mq pre-allocation and tagging facilities.
>>>
>>> I've recently started looking into this. I would like the community to
>>> agree (or debate) on this scheme and also talk about implementation
>>> with anyone who is also interested in this.
>>>
>> Yes, that's a really good topic.
>>
>> I've pondered implementing MC/S for iscsi/TCP but then I've figured my
>> network implementation knowledge doesn't spread that far.
>> So yeah, a discussion here would be good.
>>
>> Mike? Any comments?
>
> I have been working under the assumption that people would be ok with
> MCS upstream if we are only using it to handle the issue where we want
> to do something like have a tcp/iscsi connection per CPU then map the
> connection to a blk_mq_hw_ctx. In this more limited MCS implementation
> there would be no iscsi layer code to do something like load balance
> across ports or transport paths like how dm-multipath does, so there
> would be no feature/code duplication. For balancing across hctxs, then
> the iscsi layer would also leave that up to whatever we end up with in
> upper layers, so again no feature/code duplication with upper layers.
>
> So pretty non controversial I hope :)
>
> If people want to add something like round robin connection selection in
> the iscsi layer, then I think we want to leave that for after the
> initial merge, so people can argue about that separately.

Hello Sagi and Mike,

I agree with Sagi that adding scsi-mq support in the iSER initiator 
would help iSER users because that would allow these users to configure 
a single iSER target and use the multiqueue feature instead of having to 
configure multiple iSER targets to spread the workload over multiple 
cpus at the target side.

And I agree with Mike that implementing scsi-mq support in the iSER 
initiator as multiple independent connections probably is a better 
choice than MC/S. RFC 3720 namely requires that iSCSI numbering is 
session-wide. This means maintaining a single counter for all MC/S 
sessions. Such a counter would be a contention point. I'm afraid that 
because of that counter performance on a multi-socket initiator system 
with a scsi-mq implementation based on MC/S could be worse than with the 
approach with multiple iSER targets. Hence my preference for an approach 
based on multiple independent iSER connections instead of MC/S.

Bart.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-08  7:50         ` Bart Van Assche
@ 2015-01-08 13:45           ` Sagi Grimberg
       [not found]             ` <54AE8A02.1030100-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-01-14  4:16             ` Vladislav Bolkhovitin
  2015-01-08 22:16           ` Nicholas A. Bellinger
  2015-01-08 23:01           ` Mike Christie
  2 siblings, 2 replies; 37+ messages in thread
From: Sagi Grimberg @ 2015-01-08 13:45 UTC (permalink / raw)
  To: Bart Van Assche, open-iscsi, Hannes Reinecke, lsf-pc
  Cc: linux-scsi, target-devel, Oren Duer, Or Gerlitz

On 1/8/2015 9:50 AM, Bart Van Assche wrote:
> On 01/07/15 22:39, Mike Christie wrote:
>> On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
>>> On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
>>>> Hi everyone,
>>>>
>>>> Now that scsi-mq is fully included, we need an iSCSI initiator that
>>>> would use it to achieve scalable performance. The need is even greater
>>>> for iSCSI offload devices and transports that support multiple HW
>>>> queues. As iSER maintainer I'd like to discuss the way we would choose
>>>> to implement that in iSCSI.
>>>>
>>>> My measurements show that iSER initiator can scale up to ~2.1M IOPs
>>>> with multiple sessions but only ~630K IOPs with a single session where
>>>> the most significant bottleneck the (single) core processing
>>>> completions.
>>>>
>>>> In the existing single connection per session model, given that command
>>>> ordering must be preserved session-wide, we end up in a serial command
>>>> execution over a single connection which is basically a single queue
>>>> model. The best fit seems to be plugging iSCSI MCS as a multi-queued
>>>> scsi LLDD. In this model, a hardware context will have a 1x1 mapping
>>>> with an iSCSI connection (TCP socket or a HW queue).
>>>>
>>>> iSCSI MCS and it's role in the presence of dm-multipath layer was
>>>> discussed several times in the past decade(s). The basic need for
>>>> MCS is
>>>> implementing a multi-queue data path, so perhaps we may want to avoid
>>>> doing any type link aggregation or load balancing to not overlap
>>>> dm-multipath. For example we can implement ERL=0 (which is basically
>>>> the
>>>> scsi-mq ERL) and/or restrict a session to a single portal.
>>>>
>>>> As I see it, the todo's are:
>>>> 1. Getting MCS to work (kernel + user-space) with ERL=0 and a
>>>>     round-robin connection selection (per scsi command execution).
>>>> 2. Plug into scsi-mq - exposing num_connections as nr_hw_queues and
>>>>     using blk-mq based queue (conn) selection.
>>>> 3. Rework iSCSI core locking scheme to avoid session-wide locking
>>>>     as much as possible.
>>>> 4. Use blk-mq pre-allocation and tagging facilities.
>>>>
>>>> I've recently started looking into this. I would like the community to
>>>> agree (or debate) on this scheme and also talk about implementation
>>>> with anyone who is also interested in this.
>>>>
>>> Yes, that's a really good topic.
>>>
>>> I've pondered implementing MC/S for iscsi/TCP but then I've figured my
>>> network implementation knowledge doesn't spread that far.
>>> So yeah, a discussion here would be good.
>>>
>>> Mike? Any comments?
>>
>> I have been working under the assumption that people would be ok with
>> MCS upstream if we are only using it to handle the issue where we want
>> to do something like have a tcp/iscsi connection per CPU then map the
>> connection to a blk_mq_hw_ctx. In this more limited MCS implementation
>> there would be no iscsi layer code to do something like load balance
>> across ports or transport paths like how dm-multipath does, so there
>> would be no feature/code duplication. For balancing across hctxs, then
>> the iscsi layer would also leave that up to whatever we end up with in
>> upper layers, so again no feature/code duplication with upper layers.
>>
>> So pretty non controversial I hope :)
>>
>> If people want to add something like round robin connection selection in
>> the iscsi layer, then I think we want to leave that for after the
>> initial merge, so people can argue about that separately.
>
> Hello Sagi and Mike,
>
> I agree with Sagi that adding scsi-mq support in the iSER initiator
> would help iSER users because that would allow these users to configure
> a single iSER target and use the multiqueue feature instead of having to
> configure multiple iSER targets to spread the workload over multiple
> cpus at the target side.

Hey Bart,

IMHO, iSER is an iSCSI extension, so I think the discussion should
focus the solving this in iSCSI level in a way that would apply both
for TCP and RDMA (and offload devices).

>
> And I agree with Mike that implementing scsi-mq support in the iSER
> initiator as multiple independent connections probably is a better
> choice than MC/S.

Actually I started with that approach, but the independent connections
under a single session (I-T-Nexus) violates the command ordering 
requirement. Plus, such a solution is specific to iSER...

> RFC 3720 namely requires that iSCSI numbering is
> session-wide. This means maintaining a single counter for all MC/S
> sessions. Such a counter would be a contention point. I'm afraid that
> because of that counter performance on a multi-socket initiator system
> with a scsi-mq implementation based on MC/S could be worse than with the
> approach with multiple iSER targets. Hence my preference for an approach
> based on multiple independent iSER connections instead of MC/S.

So this comment is spot on the pros/cons of the discussion (we might 
want to leave something for LSF ;)).
MCS would not allow a completely lockless data-path due to command
ordering. On the other hand implementing some kind of multiple sessions
solution feels somewhat like a mis-fit (at least in my view).

One of my thoughts about how to overcome the contention on commands
sequence numbering was to suggest some kind of negotiable "relaxed
ordering" mode but of course I don't have anything figured out yet.

I had a short discussion on this with Mallikarjun Chadalapaka at SDC-14.
He said that if I show some numbers to back up such a proposal it can
be considered.

Sagi.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found]             ` <54AE8A02.1030100-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-01-08 14:11               ` Bart Van Assche
       [not found]                 ` <54AE9010.5080609-HInyCGIudOg@public.gmane.org>
  2015-01-09 11:39                 ` Sagi Grimberg
  0 siblings, 2 replies; 37+ messages in thread
From: Bart Van Assche @ 2015-01-08 14:11 UTC (permalink / raw)
  To: Sagi Grimberg, open-iscsi-/JYPxA39Uh5TLH3MbocFFw,
	Hannes Reinecke,
	lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: linux-scsi, target-devel, Oren Duer, Or Gerlitz

On 01/08/15 14:45, Sagi Grimberg wrote:
> Actually I started with that approach, but the independent connections
> under a single session (I-T-Nexus) violates the command ordering
> requirement. Plus, such a solution is specific to iSER...

Hello Sagi,

Which command ordering requirement are you referring to ? The Linux 
storage stack does not guarantee that block layer or SCSI commands will 
be processed in the same order as these commands have been submitted.

However, it might be interesting to have a look at virtscsi_pick_vq(). I 
think the purpose of that function is to keep queueing to the same hwq 
as long as any commands are being executed. This approach avoids that if 
an application is migrated by the scheduler from one CPU to another that 
commands get reordered due to have been submitted to different hwq's. I 
don't think we already have something similar in blk-mq but this is 
something that could be discussed further.

> (we might want to leave something for LSF ;)).

Agreed :-)

Bart.

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-07 21:39       ` Mike Christie
  2015-01-08  7:50         ` Bart Van Assche
@ 2015-01-08 14:50         ` James Bottomley
  2015-01-08 17:25           ` Sagi Grimberg
       [not found]         ` <54ADA777.6090801-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
  2 siblings, 1 reply; 37+ messages in thread
From: James Bottomley @ 2015-01-08 14:50 UTC (permalink / raw)
  To: Mike Christie
  Cc: Hannes Reinecke, Sagi Grimberg, lsf-pc, linux-scsi, target-devel,
	open-iscsi

On Wed, 2015-01-07 at 15:39 -0600, Mike Christie wrote:
> On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
> > On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
> >> Hi everyone,
> >>
> >> Now that scsi-mq is fully included, we need an iSCSI initiator that
> >> would use it to achieve scalable performance. The need is even greater
> >> for iSCSI offload devices and transports that support multiple HW
> >> queues. As iSER maintainer I'd like to discuss the way we would choose
> >> to implement that in iSCSI.
> >>
> >> My measurements show that iSER initiator can scale up to ~2.1M IOPs
> >> with multiple sessions but only ~630K IOPs with a single session where
> >> the most significant bottleneck the (single) core processing
> >> completions.
> >>
> >> In the existing single connection per session model, given that command
> >> ordering must be preserved session-wide, we end up in a serial command
> >> execution over a single connection which is basically a single queue
> >> model. The best fit seems to be plugging iSCSI MCS as a multi-queued
> >> scsi LLDD. In this model, a hardware context will have a 1x1 mapping
> >> with an iSCSI connection (TCP socket or a HW queue).
> >>
> >> iSCSI MCS and it's role in the presence of dm-multipath layer was
> >> discussed several times in the past decade(s). The basic need for MCS is
> >> implementing a multi-queue data path, so perhaps we may want to avoid
> >> doing any type link aggregation or load balancing to not overlap
> >> dm-multipath. For example we can implement ERL=0 (which is basically the
> >> scsi-mq ERL) and/or restrict a session to a single portal.
> >>
> >> As I see it, the todo's are:
> >> 1. Getting MCS to work (kernel + user-space) with ERL=0 and a
> >>    round-robin connection selection (per scsi command execution).
> >> 2. Plug into scsi-mq - exposing num_connections as nr_hw_queues and
> >>    using blk-mq based queue (conn) selection.
> >> 3. Rework iSCSI core locking scheme to avoid session-wide locking
> >>    as much as possible.
> >> 4. Use blk-mq pre-allocation and tagging facilities.
> >>
> >> I've recently started looking into this. I would like the community to
> >> agree (or debate) on this scheme and also talk about implementation
> >> with anyone who is also interested in this.
> >>
> > Yes, that's a really good topic.
> > 
> > I've pondered implementing MC/S for iscsi/TCP but then I've figured my
> > network implementation knowledge doesn't spread that far.
> > So yeah, a discussion here would be good.
> > 
> > Mike? Any comments?
> 
> I have been working under the assumption that people would be ok with
> MCS upstream if we are only using it to handle the issue where we want
> to do something like have a tcp/iscsi connection per CPU then map the
> connection to a blk_mq_hw_ctx. In this more limited MCS implementation
> there would be no iscsi layer code to do something like load balance
> across ports or transport paths like how dm-multipath does, so there
> would be no feature/code duplication. For balancing across hctxs, then
> the iscsi layer would also leave that up to whatever we end up with in
> upper layers, so again no feature/code duplication with upper layers.
> 
> So pretty non controversial I hope :)

If you can make that work, so we expose MCS in a way that allows upper
layers to use it, I'd say it was pretty much perfect.  The main
objection I've had over the years to multiple connections per session is
that it required a duplication of the multi-path code within the iscsi
initiator (and that was after several long fights to get multi path out
of other fabric initiators), so something that doesn't require the
duplication overcomes that objection.

> If people want to add something like round robin connection selection in
> the iscsi layer, then I think we want to leave that for after the
> initial merge, so people can argue about that separately.

Well, you're right, we can argue about it later, but if it's just round
robin, why would it be better done in the initator rather than dm?

James



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found]                 ` <54AE9010.5080609-HInyCGIudOg@public.gmane.org>
@ 2015-01-08 15:57                   ` Paul Koning
  0 siblings, 0 replies; 37+ messages in thread
From: Paul Koning @ 2015-01-08 15:57 UTC (permalink / raw)
  To: open-iscsi-/JYPxA39Uh5TLH3MbocFFw
  Cc: Sagi Grimberg, Hannes Reinecke,
	lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-scsi,
	target-devel, Oren Duer, Or Gerlitz


> On Jan 8, 2015, at 9:11 AM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
> 
> On 01/08/15 14:45, Sagi Grimberg wrote:
>> Actually I started with that approach, but the independent connections
>> under a single session (I-T-Nexus) violates the command ordering
>> requirement. Plus, such a solution is specific to iSER…

The iSCSI standard specifies an ordering requirement for the case of multiple connections under a single session.  That requirement is in fact a reason why some iSCSI targets have declined to implement multiple connections.

On the other hand, there are lots of “MPIO” implementations in many different operating systems that use multiple sessions, so there is no ordering at the iSCSI level, and whatever ordering is required (if any) is instead implemented at higher layers in the requesting OS.

> 
> Hello Sagi,
> 
> Which command ordering requirement are you referring to ? The Linux storage stack does not guarantee that block layer or SCSI commands will be processed in the same order as these commands have been submitted.

Neither does SCSI, in fact.  The ordering rules of the SCSI standard are worth studying.  They are a lot weaker than most people expect.  A particularly interesting case is multiple concurrent writes with overlapping block ranges.

	paul

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-08 14:50         ` James Bottomley
@ 2015-01-08 17:25           ` Sagi Grimberg
  0 siblings, 0 replies; 37+ messages in thread
From: Sagi Grimberg @ 2015-01-08 17:25 UTC (permalink / raw)
  To: James Bottomley, Mike Christie
  Cc: Hannes Reinecke, lsf-pc, linux-scsi, target-devel, open-iscsi

On 1/8/2015 4:50 PM, James Bottomley wrote:
<SNIP>
>> If people want to add something like round robin connection selection in
>> the iscsi layer, then I think we want to leave that for after the
>> initial merge, so people can argue about that separately.
>
> Well, you're right, we can argue about it later, but if it's just round
> robin, why would it be better done in the initator rather than dm?

I agree,

My assumption was that a round-robin conn selection would only be a
temporal stage until we get full integration with scsi-mq. Not something
that will be actually merged.

Sagi.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-08  7:50         ` Bart Van Assche
  2015-01-08 13:45           ` Sagi Grimberg
@ 2015-01-08 22:16           ` Nicholas A. Bellinger
  2015-01-08 22:29             ` James Bottomley
  2015-01-08 23:01           ` Mike Christie
  2 siblings, 1 reply; 37+ messages in thread
From: Nicholas A. Bellinger @ 2015-01-08 22:16 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: open-iscsi, Hannes Reinecke, Sagi Grimberg, lsf-pc, linux-scsi,
	target-devel

On Thu, 2015-01-08 at 08:50 +0100, Bart Van Assche wrote:
> On 01/07/15 22:39, Mike Christie wrote:
> > On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
> >> On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
> >>> Hi everyone,
> >>>
> >>> Now that scsi-mq is fully included, we need an iSCSI initiator that
> >>> would use it to achieve scalable performance. The need is even greater
> >>> for iSCSI offload devices and transports that support multiple HW
> >>> queues. As iSER maintainer I'd like to discuss the way we would choose
> >>> to implement that in iSCSI.
> >>>
> >>> My measurements show that iSER initiator can scale up to ~2.1M IOPs
> >>> with multiple sessions but only ~630K IOPs with a single session where
> >>> the most significant bottleneck the (single) core processing
> >>> completions.
> >>>
> >>> In the existing single connection per session model, given that command
> >>> ordering must be preserved session-wide, we end up in a serial command
> >>> execution over a single connection which is basically a single queue
> >>> model. The best fit seems to be plugging iSCSI MCS as a multi-queued
> >>> scsi LLDD. In this model, a hardware context will have a 1x1 mapping
> >>> with an iSCSI connection (TCP socket or a HW queue).
> >>>
> >>> iSCSI MCS and it's role in the presence of dm-multipath layer was
> >>> discussed several times in the past decade(s). The basic need for MCS is
> >>> implementing a multi-queue data path, so perhaps we may want to avoid
> >>> doing any type link aggregation or load balancing to not overlap
> >>> dm-multipath. For example we can implement ERL=0 (which is basically the
> >>> scsi-mq ERL) and/or restrict a session to a single portal.
> >>>
> >>> As I see it, the todo's are:
> >>> 1. Getting MCS to work (kernel + user-space) with ERL=0 and a
> >>>     round-robin connection selection (per scsi command execution).
> >>> 2. Plug into scsi-mq - exposing num_connections as nr_hw_queues and
> >>>     using blk-mq based queue (conn) selection.
> >>> 3. Rework iSCSI core locking scheme to avoid session-wide locking
> >>>     as much as possible.
> >>> 4. Use blk-mq pre-allocation and tagging facilities.
> >>>
> >>> I've recently started looking into this. I would like the community to
> >>> agree (or debate) on this scheme and also talk about implementation
> >>> with anyone who is also interested in this.
> >>>
> >> Yes, that's a really good topic.
> >>
> >> I've pondered implementing MC/S for iscsi/TCP but then I've figured my
> >> network implementation knowledge doesn't spread that far.
> >> So yeah, a discussion here would be good.
> >>
> >> Mike? Any comments?
> >
> > I have been working under the assumption that people would be ok with
> > MCS upstream if we are only using it to handle the issue where we want
> > to do something like have a tcp/iscsi connection per CPU then map the
> > connection to a blk_mq_hw_ctx. In this more limited MCS implementation
> > there would be no iscsi layer code to do something like load balance
> > across ports or transport paths like how dm-multipath does, so there
> > would be no feature/code duplication. For balancing across hctxs, then
> > the iscsi layer would also leave that up to whatever we end up with in
> > upper layers, so again no feature/code duplication with upper layers.
> >
> > So pretty non controversial I hope :)
> >
> > If people want to add something like round robin connection selection in
> > the iscsi layer, then I think we want to leave that for after the
> > initial merge, so people can argue about that separately.
> 
> Hello Sagi and Mike,
> 
> I agree with Sagi that adding scsi-mq support in the iSER initiator 
> would help iSER users because that would allow these users to configure 
> a single iSER target and use the multiqueue feature instead of having to 
> configure multiple iSER targets to spread the workload over multiple 
> cpus at the target side.
> 
> And I agree with Mike that implementing scsi-mq support in the iSER 
> initiator as multiple independent connections probably is a better 
> choice than MC/S. RFC 3720 namely requires that iSCSI numbering is 
> session-wide. This means maintaining a single counter for all MC/S 
> sessions. Such a counter would be a contention point. I'm afraid that 
> because of that counter performance on a multi-socket initiator system 
> with a scsi-mq implementation based on MC/S could be worse than with the 
> approach with multiple iSER targets. Hence my preference for an approach 
> based on multiple independent iSER connections instead of MC/S.
> 

The idea that a simple session wide counter for command sequence number
assignment adds such a degree of contention that it renders MC/S at a
performance disadvantage vs. multi-session configurations with all of
the extra multipath logic overhead on top is at best, a naive
proposition.

On the initiator side for MC/S, literally the only thing that needs to
be serialized is the assignment of the command sequence number to
individual non-immediate PDUs.  The sending of the outgoing PDUs +
immediate data by the initiator can happen out-of-order, and it's up to
the target to ensure that the submission of the commands to the device
server is in command sequence number order.

All of the actual immediate data + R2T -> data-out processing by the
target can also be done out-of-order as well.

--nab


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-08 22:16           ` Nicholas A. Bellinger
@ 2015-01-08 22:29             ` James Bottomley
  2015-01-08 22:57               ` Nicholas A. Bellinger
  0 siblings, 1 reply; 37+ messages in thread
From: James Bottomley @ 2015-01-08 22:29 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Bart Van Assche, open-iscsi, Hannes Reinecke, Sagi Grimberg,
	lsf-pc, linux-scsi, target-devel

On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
> On Thu, 2015-01-08 at 08:50 +0100, Bart Van Assche wrote:
> > On 01/07/15 22:39, Mike Christie wrote:
> > > On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
> > >> On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
> > >>> Hi everyone,
> > >>>
> > >>> Now that scsi-mq is fully included, we need an iSCSI initiator that
> > >>> would use it to achieve scalable performance. The need is even greater
> > >>> for iSCSI offload devices and transports that support multiple HW
> > >>> queues. As iSER maintainer I'd like to discuss the way we would choose
> > >>> to implement that in iSCSI.
> > >>>
> > >>> My measurements show that iSER initiator can scale up to ~2.1M IOPs
> > >>> with multiple sessions but only ~630K IOPs with a single session where
> > >>> the most significant bottleneck the (single) core processing
> > >>> completions.
> > >>>
> > >>> In the existing single connection per session model, given that command
> > >>> ordering must be preserved session-wide, we end up in a serial command
> > >>> execution over a single connection which is basically a single queue
> > >>> model. The best fit seems to be plugging iSCSI MCS as a multi-queued
> > >>> scsi LLDD. In this model, a hardware context will have a 1x1 mapping
> > >>> with an iSCSI connection (TCP socket or a HW queue).
> > >>>
> > >>> iSCSI MCS and it's role in the presence of dm-multipath layer was
> > >>> discussed several times in the past decade(s). The basic need for MCS is
> > >>> implementing a multi-queue data path, so perhaps we may want to avoid
> > >>> doing any type link aggregation or load balancing to not overlap
> > >>> dm-multipath. For example we can implement ERL=0 (which is basically the
> > >>> scsi-mq ERL) and/or restrict a session to a single portal.
> > >>>
> > >>> As I see it, the todo's are:
> > >>> 1. Getting MCS to work (kernel + user-space) with ERL=0 and a
> > >>>     round-robin connection selection (per scsi command execution).
> > >>> 2. Plug into scsi-mq - exposing num_connections as nr_hw_queues and
> > >>>     using blk-mq based queue (conn) selection.
> > >>> 3. Rework iSCSI core locking scheme to avoid session-wide locking
> > >>>     as much as possible.
> > >>> 4. Use blk-mq pre-allocation and tagging facilities.
> > >>>
> > >>> I've recently started looking into this. I would like the community to
> > >>> agree (or debate) on this scheme and also talk about implementation
> > >>> with anyone who is also interested in this.
> > >>>
> > >> Yes, that's a really good topic.
> > >>
> > >> I've pondered implementing MC/S for iscsi/TCP but then I've figured my
> > >> network implementation knowledge doesn't spread that far.
> > >> So yeah, a discussion here would be good.
> > >>
> > >> Mike? Any comments?
> > >
> > > I have been working under the assumption that people would be ok with
> > > MCS upstream if we are only using it to handle the issue where we want
> > > to do something like have a tcp/iscsi connection per CPU then map the
> > > connection to a blk_mq_hw_ctx. In this more limited MCS implementation
> > > there would be no iscsi layer code to do something like load balance
> > > across ports or transport paths like how dm-multipath does, so there
> > > would be no feature/code duplication. For balancing across hctxs, then
> > > the iscsi layer would also leave that up to whatever we end up with in
> > > upper layers, so again no feature/code duplication with upper layers.
> > >
> > > So pretty non controversial I hope :)
> > >
> > > If people want to add something like round robin connection selection in
> > > the iscsi layer, then I think we want to leave that for after the
> > > initial merge, so people can argue about that separately.
> > 
> > Hello Sagi and Mike,
> > 
> > I agree with Sagi that adding scsi-mq support in the iSER initiator 
> > would help iSER users because that would allow these users to configure 
> > a single iSER target and use the multiqueue feature instead of having to 
> > configure multiple iSER targets to spread the workload over multiple 
> > cpus at the target side.
> > 
> > And I agree with Mike that implementing scsi-mq support in the iSER 
> > initiator as multiple independent connections probably is a better 
> > choice than MC/S. RFC 3720 namely requires that iSCSI numbering is 
> > session-wide. This means maintaining a single counter for all MC/S 
> > sessions. Such a counter would be a contention point. I'm afraid that 
> > because of that counter performance on a multi-socket initiator system 
> > with a scsi-mq implementation based on MC/S could be worse than with the 
> > approach with multiple iSER targets. Hence my preference for an approach 
> > based on multiple independent iSER connections instead of MC/S.
> > 
> 
> The idea that a simple session wide counter for command sequence number
> assignment adds such a degree of contention that it renders MC/S at a
> performance disadvantage vs. multi-session configurations with all of
> the extra multipath logic overhead on top is at best, a naive
> proposition.
> 
> On the initiator side for MC/S, literally the only thing that needs to
> be serialized is the assignment of the command sequence number to
> individual non-immediate PDUs.  The sending of the outgoing PDUs +
> immediate data by the initiator can happen out-of-order, and it's up to
> the target to ensure that the submission of the commands to the device
> server is in command sequence number order.
> 
> All of the actual immediate data + R2T -> data-out processing by the
> target can also be done out-of-order as well.

Right, but what he's saying is that we've taken great pains in the MQ
situation to free our issue queues of all entanglements and cross queue
locking so they can fly as fast as possible.  If we have to assign an
in-order sequence number across all the queues, this becomes both a
cross CPU bus lock point to ensure atomicity and a sync point to ensure
sequencing.  Naïvely that does look to be a bottleneck which wouldn't
necessarily be mitigated simply by allowing everything to proceed out of
order after this point.

James

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-08 22:29             ` James Bottomley
@ 2015-01-08 22:57               ` Nicholas A. Bellinger
       [not found]                 ` <1420757822.2842.39.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
  2015-01-08 23:26                 ` Mike Christie
  0 siblings, 2 replies; 37+ messages in thread
From: Nicholas A. Bellinger @ 2015-01-08 22:57 UTC (permalink / raw)
  To: James Bottomley
  Cc: Bart Van Assche, open-iscsi, Hannes Reinecke, Sagi Grimberg,
	lsf-pc, linux-scsi, target-devel

On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
> On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
> > On Thu, 2015-01-08 at 08:50 +0100, Bart Van Assche wrote:
> > > On 01/07/15 22:39, Mike Christie wrote:
> > > > On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
> > > >> On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
> > > >>> Hi everyone,
> > > >>>
> > > >>> Now that scsi-mq is fully included, we need an iSCSI initiator that
> > > >>> would use it to achieve scalable performance. The need is even greater
> > > >>> for iSCSI offload devices and transports that support multiple HW
> > > >>> queues. As iSER maintainer I'd like to discuss the way we would choose
> > > >>> to implement that in iSCSI.
> > > >>>
> > > >>> My measurements show that iSER initiator can scale up to ~2.1M IOPs
> > > >>> with multiple sessions but only ~630K IOPs with a single session where
> > > >>> the most significant bottleneck the (single) core processing
> > > >>> completions.
> > > >>>
> > > >>> In the existing single connection per session model, given that command
> > > >>> ordering must be preserved session-wide, we end up in a serial command
> > > >>> execution over a single connection which is basically a single queue
> > > >>> model. The best fit seems to be plugging iSCSI MCS as a multi-queued
> > > >>> scsi LLDD. In this model, a hardware context will have a 1x1 mapping
> > > >>> with an iSCSI connection (TCP socket or a HW queue).
> > > >>>
> > > >>> iSCSI MCS and it's role in the presence of dm-multipath layer was
> > > >>> discussed several times in the past decade(s). The basic need for MCS is
> > > >>> implementing a multi-queue data path, so perhaps we may want to avoid
> > > >>> doing any type link aggregation or load balancing to not overlap
> > > >>> dm-multipath. For example we can implement ERL=0 (which is basically the
> > > >>> scsi-mq ERL) and/or restrict a session to a single portal.
> > > >>>
> > > >>> As I see it, the todo's are:
> > > >>> 1. Getting MCS to work (kernel + user-space) with ERL=0 and a
> > > >>>     round-robin connection selection (per scsi command execution).
> > > >>> 2. Plug into scsi-mq - exposing num_connections as nr_hw_queues and
> > > >>>     using blk-mq based queue (conn) selection.
> > > >>> 3. Rework iSCSI core locking scheme to avoid session-wide locking
> > > >>>     as much as possible.
> > > >>> 4. Use blk-mq pre-allocation and tagging facilities.
> > > >>>
> > > >>> I've recently started looking into this. I would like the community to
> > > >>> agree (or debate) on this scheme and also talk about implementation
> > > >>> with anyone who is also interested in this.
> > > >>>
> > > >> Yes, that's a really good topic.
> > > >>
> > > >> I've pondered implementing MC/S for iscsi/TCP but then I've figured my
> > > >> network implementation knowledge doesn't spread that far.
> > > >> So yeah, a discussion here would be good.
> > > >>
> > > >> Mike? Any comments?
> > > >
> > > > I have been working under the assumption that people would be ok with
> > > > MCS upstream if we are only using it to handle the issue where we want
> > > > to do something like have a tcp/iscsi connection per CPU then map the
> > > > connection to a blk_mq_hw_ctx. In this more limited MCS implementation
> > > > there would be no iscsi layer code to do something like load balance
> > > > across ports or transport paths like how dm-multipath does, so there
> > > > would be no feature/code duplication. For balancing across hctxs, then
> > > > the iscsi layer would also leave that up to whatever we end up with in
> > > > upper layers, so again no feature/code duplication with upper layers.
> > > >
> > > > So pretty non controversial I hope :)
> > > >
> > > > If people want to add something like round robin connection selection in
> > > > the iscsi layer, then I think we want to leave that for after the
> > > > initial merge, so people can argue about that separately.
> > > 
> > > Hello Sagi and Mike,
> > > 
> > > I agree with Sagi that adding scsi-mq support in the iSER initiator 
> > > would help iSER users because that would allow these users to configure 
> > > a single iSER target and use the multiqueue feature instead of having to 
> > > configure multiple iSER targets to spread the workload over multiple 
> > > cpus at the target side.
> > > 
> > > And I agree with Mike that implementing scsi-mq support in the iSER 
> > > initiator as multiple independent connections probably is a better 
> > > choice than MC/S. RFC 3720 namely requires that iSCSI numbering is 
> > > session-wide. This means maintaining a single counter for all MC/S 
> > > sessions. Such a counter would be a contention point. I'm afraid that 
> > > because of that counter performance on a multi-socket initiator system 
> > > with a scsi-mq implementation based on MC/S could be worse than with the 
> > > approach with multiple iSER targets. Hence my preference for an approach 
> > > based on multiple independent iSER connections instead of MC/S.
> > > 
> > 
> > The idea that a simple session wide counter for command sequence number
> > assignment adds such a degree of contention that it renders MC/S at a
> > performance disadvantage vs. multi-session configurations with all of
> > the extra multipath logic overhead on top is at best, a naive
> > proposition.
> > 
> > On the initiator side for MC/S, literally the only thing that needs to
> > be serialized is the assignment of the command sequence number to
> > individual non-immediate PDUs.  The sending of the outgoing PDUs +
> > immediate data by the initiator can happen out-of-order, and it's up to
> > the target to ensure that the submission of the commands to the device
> > server is in command sequence number order.
> > 
> > All of the actual immediate data + R2T -> data-out processing by the
> > target can also be done out-of-order as well.
> 
> Right, but what he's saying is that we've taken great pains in the MQ
> situation to free our issue queues of all entanglements and cross queue
> locking so they can fly as fast as possible.  If we have to assign an
> in-order sequence number across all the queues, this becomes both a
> cross CPU bus lock point to ensure atomicity and a sync point to ensure
> sequencing.  Naïvely that does look to be a bottleneck which wouldn't
> necessarily be mitigated simply by allowing everything to proceed out of
> order after this point.
> 

The point is that a simple session wide counter for command sequence
number assignment is significantly less overhead than all of the
overhead associated with running a full multipath stack atop multiple
sessions.

Not to mention that our iSCSI/iSER initiator is already taking a session
wide lock when sending outgoing PDUs, so adding a session wide counter
isn't adding any additional synchronization overhead vs. what's already
in place.

--nab

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-08  7:50         ` Bart Van Assche
  2015-01-08 13:45           ` Sagi Grimberg
  2015-01-08 22:16           ` Nicholas A. Bellinger
@ 2015-01-08 23:01           ` Mike Christie
  2 siblings, 0 replies; 37+ messages in thread
From: Mike Christie @ 2015-01-08 23:01 UTC (permalink / raw)
  To: open-iscsi
  Cc: Bart Van Assche, Hannes Reinecke, Sagi Grimberg, lsf-pc,
	linux-scsi, target-devel

On 1/8/15, 1:50 AM, Bart Van Assche wrote:
> On 01/07/15 22:39, Mike Christie wrote:
>> On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
>>> On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
>>>> Hi everyone,
>>>>
>>>> Now that scsi-mq is fully included, we need an iSCSI initiator that
>>>> would use it to achieve scalable performance. The need is even greater
>>>> for iSCSI offload devices and transports that support multiple HW
>>>> queues. As iSER maintainer I'd like to discuss the way we would choose
>>>> to implement that in iSCSI.
>>>>
>>>> My measurements show that iSER initiator can scale up to ~2.1M IOPs
>>>> with multiple sessions but only ~630K IOPs with a single session where
>>>> the most significant bottleneck the (single) core processing
>>>> completions.
>>>>
>>>> In the existing single connection per session model, given that command
>>>> ordering must be preserved session-wide, we end up in a serial command
>>>> execution over a single connection which is basically a single queue
>>>> model. The best fit seems to be plugging iSCSI MCS as a multi-queued
>>>> scsi LLDD. In this model, a hardware context will have a 1x1 mapping
>>>> with an iSCSI connection (TCP socket or a HW queue).
>>>>
>>>> iSCSI MCS and it's role in the presence of dm-multipath layer was
>>>> discussed several times in the past decade(s). The basic need for
>>>> MCS is
>>>> implementing a multi-queue data path, so perhaps we may want to avoid
>>>> doing any type link aggregation or load balancing to not overlap
>>>> dm-multipath. For example we can implement ERL=0 (which is basically
>>>> the
>>>> scsi-mq ERL) and/or restrict a session to a single portal.
>>>>
>>>> As I see it, the todo's are:
>>>> 1. Getting MCS to work (kernel + user-space) with ERL=0 and a
>>>>     round-robin connection selection (per scsi command execution).
>>>> 2. Plug into scsi-mq - exposing num_connections as nr_hw_queues and
>>>>     using blk-mq based queue (conn) selection.
>>>> 3. Rework iSCSI core locking scheme to avoid session-wide locking
>>>>     as much as possible.
>>>> 4. Use blk-mq pre-allocation and tagging facilities.
>>>>
>>>> I've recently started looking into this. I would like the community to
>>>> agree (or debate) on this scheme and also talk about implementation
>>>> with anyone who is also interested in this.
>>>>
>>> Yes, that's a really good topic.
>>>
>>> I've pondered implementing MC/S for iscsi/TCP but then I've figured my
>>> network implementation knowledge doesn't spread that far.
>>> So yeah, a discussion here would be good.
>>>
>>> Mike? Any comments?
>>
>> I have been working under the assumption that people would be ok with
>> MCS upstream if we are only using it to handle the issue where we want
>> to do something like have a tcp/iscsi connection per CPU then map the
>> connection to a blk_mq_hw_ctx. In this more limited MCS implementation
>> there would be no iscsi layer code to do something like load balance
>> across ports or transport paths like how dm-multipath does, so there
>> would be no feature/code duplication. For balancing across hctxs, then
>> the iscsi layer would also leave that up to whatever we end up with in
>> upper layers, so again no feature/code duplication with upper layers.
>>
>> So pretty non controversial I hope :)
>>
>> If people want to add something like round robin connection selection in
>> the iscsi layer, then I think we want to leave that for after the
>> initial merge, so people can argue about that separately.
>
> Hello Sagi and Mike,
>
> I agree with Sagi that adding scsi-mq support in the iSER initiator
> would help iSER users because that would allow these users to configure
> a single iSER target and use the multiqueue feature instead of having to
> configure multiple iSER targets to spread the workload over multiple
> cpus at the target side.
>
> And I agree with Mike that implementing scsi-mq support in the iSER
> initiator as multiple independent connections probably is a better
> choice than MC/S. RFC 3720 namely requires that iSCSI numbering is
> session-wide. This means maintaining a single counter for all MC/S
> sessions. Such a counter would be a contention point. I'm afraid that
> because of that counter performance on a multi-socket initiator system
> with a scsi-mq implementation based on MC/S could be worse than with the
> approach with multiple iSER targets. Hence my preference for an approach
> based on multiple independent iSER connections instead of MC/S.
>

Above I was actually saying we should do a limited MCS. Originally, I 
tried something like you are suggesting for the non MCS case, but I hit 
some snags. While I was rethinking it today, I think I figured out where 
I messed up though. It was just in how I was doing the 
device/kobject/sysfs compat stuff.

Sagi, instead of reviewing that patch you sent me offlist the other day, 
let me try to update my non MCS patch (I originally did it before you 
guys did the locking changes so I need to fix it up) and send it 
tomorrow. We then do not have to worry about MCS support and also issues 
like the session wide sequence number tracking.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found]                 ` <1420757822.2842.39.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
@ 2015-01-08 23:22                   ` James Bottomley
  2015-01-09  5:03                     ` Nicholas A. Bellinger
  0 siblings, 1 reply; 37+ messages in thread
From: James Bottomley @ 2015-01-08 23:22 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Bart Van Assche, linux-scsi, Sagi Grimberg, target-devel,
	Hannes Reinecke, open-iscsi-/JYPxA39Uh5TLH3MbocFFw

On Thu, 2015-01-08 at 14:57 -0800, Nicholas A. Bellinger wrote:
> On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
> > On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
> > > On Thu, 2015-01-08 at 08:50 +0100, Bart Van Assche wrote:
> > > > On 01/07/15 22:39, Mike Christie wrote:
> > > > > On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
> > > > >> On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
> > > > >>> Hi everyone,
> > > > >>>
> > > > >>> Now that scsi-mq is fully included, we need an iSCSI initiator that
> > > > >>> would use it to achieve scalable performance. The need is even greater
> > > > >>> for iSCSI offload devices and transports that support multiple HW
> > > > >>> queues. As iSER maintainer I'd like to discuss the way we would choose
> > > > >>> to implement that in iSCSI.
> > > > >>>
> > > > >>> My measurements show that iSER initiator can scale up to ~2.1M IOPs
> > > > >>> with multiple sessions but only ~630K IOPs with a single session where
> > > > >>> the most significant bottleneck the (single) core processing
> > > > >>> completions.
> > > > >>>
> > > > >>> In the existing single connection per session model, given that command
> > > > >>> ordering must be preserved session-wide, we end up in a serial command
> > > > >>> execution over a single connection which is basically a single queue
> > > > >>> model. The best fit seems to be plugging iSCSI MCS as a multi-queued
> > > > >>> scsi LLDD. In this model, a hardware context will have a 1x1 mapping
> > > > >>> with an iSCSI connection (TCP socket or a HW queue).
> > > > >>>
> > > > >>> iSCSI MCS and it's role in the presence of dm-multipath layer was
> > > > >>> discussed several times in the past decade(s). The basic need for MCS is
> > > > >>> implementing a multi-queue data path, so perhaps we may want to avoid
> > > > >>> doing any type link aggregation or load balancing to not overlap
> > > > >>> dm-multipath. For example we can implement ERL=0 (which is basically the
> > > > >>> scsi-mq ERL) and/or restrict a session to a single portal.
> > > > >>>
> > > > >>> As I see it, the todo's are:
> > > > >>> 1. Getting MCS to work (kernel + user-space) with ERL=0 and a
> > > > >>>     round-robin connection selection (per scsi command execution).
> > > > >>> 2. Plug into scsi-mq - exposing num_connections as nr_hw_queues and
> > > > >>>     using blk-mq based queue (conn) selection.
> > > > >>> 3. Rework iSCSI core locking scheme to avoid session-wide locking
> > > > >>>     as much as possible.
> > > > >>> 4. Use blk-mq pre-allocation and tagging facilities.
> > > > >>>
> > > > >>> I've recently started looking into this. I would like the community to
> > > > >>> agree (or debate) on this scheme and also talk about implementation
> > > > >>> with anyone who is also interested in this.
> > > > >>>
> > > > >> Yes, that's a really good topic.
> > > > >>
> > > > >> I've pondered implementing MC/S for iscsi/TCP but then I've figured my
> > > > >> network implementation knowledge doesn't spread that far.
> > > > >> So yeah, a discussion here would be good.
> > > > >>
> > > > >> Mike? Any comments?
> > > > >
> > > > > I have been working under the assumption that people would be ok with
> > > > > MCS upstream if we are only using it to handle the issue where we want
> > > > > to do something like have a tcp/iscsi connection per CPU then map the
> > > > > connection to a blk_mq_hw_ctx. In this more limited MCS implementation
> > > > > there would be no iscsi layer code to do something like load balance
> > > > > across ports or transport paths like how dm-multipath does, so there
> > > > > would be no feature/code duplication. For balancing across hctxs, then
> > > > > the iscsi layer would also leave that up to whatever we end up with in
> > > > > upper layers, so again no feature/code duplication with upper layers.
> > > > >
> > > > > So pretty non controversial I hope :)
> > > > >
> > > > > If people want to add something like round robin connection selection in
> > > > > the iscsi layer, then I think we want to leave that for after the
> > > > > initial merge, so people can argue about that separately.
> > > > 
> > > > Hello Sagi and Mike,
> > > > 
> > > > I agree with Sagi that adding scsi-mq support in the iSER initiator 
> > > > would help iSER users because that would allow these users to configure 
> > > > a single iSER target and use the multiqueue feature instead of having to 
> > > > configure multiple iSER targets to spread the workload over multiple 
> > > > cpus at the target side.
> > > > 
> > > > And I agree with Mike that implementing scsi-mq support in the iSER 
> > > > initiator as multiple independent connections probably is a better 
> > > > choice than MC/S. RFC 3720 namely requires that iSCSI numbering is 
> > > > session-wide. This means maintaining a single counter for all MC/S 
> > > > sessions. Such a counter would be a contention point. I'm afraid that 
> > > > because of that counter performance on a multi-socket initiator system 
> > > > with a scsi-mq implementation based on MC/S could be worse than with the 
> > > > approach with multiple iSER targets. Hence my preference for an approach 
> > > > based on multiple independent iSER connections instead of MC/S.
> > > > 
> > > 
> > > The idea that a simple session wide counter for command sequence number
> > > assignment adds such a degree of contention that it renders MC/S at a
> > > performance disadvantage vs. multi-session configurations with all of
> > > the extra multipath logic overhead on top is at best, a naive
> > > proposition.
> > > 
> > > On the initiator side for MC/S, literally the only thing that needs to
> > > be serialized is the assignment of the command sequence number to
> > > individual non-immediate PDUs.  The sending of the outgoing PDUs +
> > > immediate data by the initiator can happen out-of-order, and it's up to
> > > the target to ensure that the submission of the commands to the device
> > > server is in command sequence number order.
> > > 
> > > All of the actual immediate data + R2T -> data-out processing by the
> > > target can also be done out-of-order as well.
> > 
> > Right, but what he's saying is that we've taken great pains in the MQ
> > situation to free our issue queues of all entanglements and cross queue
> > locking so they can fly as fast as possible.  If we have to assign an
> > in-order sequence number across all the queues, this becomes both a
> > cross CPU bus lock point to ensure atomicity and a sync point to ensure
> > sequencing.  Naïvely that does look to be a bottleneck which wouldn't
> > necessarily be mitigated simply by allowing everything to proceed out of
> > order after this point.
> > 
> 
> The point is that a simple session wide counter for command sequence
> number assignment is significantly less overhead than all of the
> overhead associated with running a full multipath stack atop multiple
> sessions.

I don't see how that's relevant to issue speed, which was the measure we
were using: The layers above are just a hopper.  As long as they're
loaded, the MQ lower layer can issue at full speed.  So as long as the
multipath hopper is efficient enough to keep the queues loaded there's
no speed degradation.

The problem with a sequence point inside the MQ issue layer is that it
can cause a stall that reduces the issue speed. so the counter sequence
point causes a degraded issue speed over the multipath hopper approach
above even if the multipath approach has a higher CPU overhead.

Now, if the system is close to 100% cpu already, *then* the multipath
overhead will try to take CPU power we don't have and cause a stall, but
it's only in the flat out CPU case.

> Not to mention that our iSCSI/iSER initiator is already taking a session
> wide lock when sending outgoing PDUs, so adding a session wide counter
> isn't adding any additional synchronization overhead vs. what's already
> in place.

I'll leave it up to the iSER people to decide whether they're redoing
this as part of the MQ work.

James


-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-08 22:57               ` Nicholas A. Bellinger
       [not found]                 ` <1420757822.2842.39.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
@ 2015-01-08 23:26                 ` Mike Christie
       [not found]                   ` <54AF122C.9070703-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
  1 sibling, 1 reply; 37+ messages in thread
From: Mike Christie @ 2015-01-08 23:26 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: James Bottomley, Bart Van Assche, open-iscsi, Hannes Reinecke,
	Sagi Grimberg, lsf-pc, linux-scsi, target-devel

On 1/8/15, 4:57 PM, Nicholas A. Bellinger wrote:
> On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
>> On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
>>> On Thu, 2015-01-08 at 08:50 +0100, Bart Van Assche wrote:
>>>> On 01/07/15 22:39, Mike Christie wrote:
>>>>> On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
>>>>>> On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> Now that scsi-mq is fully included, we need an iSCSI initiator that
>>>>>>> would use it to achieve scalable performance. The need is even greater
>>>>>>> for iSCSI offload devices and transports that support multiple HW
>>>>>>> queues. As iSER maintainer I'd like to discuss the way we would choose
>>>>>>> to implement that in iSCSI.
>>>>>>>
>>>>>>> My measurements show that iSER initiator can scale up to ~2.1M IOPs
>>>>>>> with multiple sessions but only ~630K IOPs with a single session where
>>>>>>> the most significant bottleneck the (single) core processing
>>>>>>> completions.
>>>>>>>
>>>>>>> In the existing single connection per session model, given that command
>>>>>>> ordering must be preserved session-wide, we end up in a serial command
>>>>>>> execution over a single connection which is basically a single queue
>>>>>>> model. The best fit seems to be plugging iSCSI MCS as a multi-queued
>>>>>>> scsi LLDD. In this model, a hardware context will have a 1x1 mapping
>>>>>>> with an iSCSI connection (TCP socket or a HW queue).
>>>>>>>
>>>>>>> iSCSI MCS and it's role in the presence of dm-multipath layer was
>>>>>>> discussed several times in the past decade(s). The basic need for MCS is
>>>>>>> implementing a multi-queue data path, so perhaps we may want to avoid
>>>>>>> doing any type link aggregation or load balancing to not overlap
>>>>>>> dm-multipath. For example we can implement ERL=0 (which is basically the
>>>>>>> scsi-mq ERL) and/or restrict a session to a single portal.
>>>>>>>
>>>>>>> As I see it, the todo's are:
>>>>>>> 1. Getting MCS to work (kernel + user-space) with ERL=0 and a
>>>>>>>      round-robin connection selection (per scsi command execution).
>>>>>>> 2. Plug into scsi-mq - exposing num_connections as nr_hw_queues and
>>>>>>>      using blk-mq based queue (conn) selection.
>>>>>>> 3. Rework iSCSI core locking scheme to avoid session-wide locking
>>>>>>>      as much as possible.
>>>>>>> 4. Use blk-mq pre-allocation and tagging facilities.
>>>>>>>
>>>>>>> I've recently started looking into this. I would like the community to
>>>>>>> agree (or debate) on this scheme and also talk about implementation
>>>>>>> with anyone who is also interested in this.
>>>>>>>
>>>>>> Yes, that's a really good topic.
>>>>>>
>>>>>> I've pondered implementing MC/S for iscsi/TCP but then I've figured my
>>>>>> network implementation knowledge doesn't spread that far.
>>>>>> So yeah, a discussion here would be good.
>>>>>>
>>>>>> Mike? Any comments?
>>>>>
>>>>> I have been working under the assumption that people would be ok with
>>>>> MCS upstream if we are only using it to handle the issue where we want
>>>>> to do something like have a tcp/iscsi connection per CPU then map the
>>>>> connection to a blk_mq_hw_ctx. In this more limited MCS implementation
>>>>> there would be no iscsi layer code to do something like load balance
>>>>> across ports or transport paths like how dm-multipath does, so there
>>>>> would be no feature/code duplication. For balancing across hctxs, then
>>>>> the iscsi layer would also leave that up to whatever we end up with in
>>>>> upper layers, so again no feature/code duplication with upper layers.
>>>>>
>>>>> So pretty non controversial I hope :)
>>>>>
>>>>> If people want to add something like round robin connection selection in
>>>>> the iscsi layer, then I think we want to leave that for after the
>>>>> initial merge, so people can argue about that separately.
>>>>
>>>> Hello Sagi and Mike,
>>>>
>>>> I agree with Sagi that adding scsi-mq support in the iSER initiator
>>>> would help iSER users because that would allow these users to configure
>>>> a single iSER target and use the multiqueue feature instead of having to
>>>> configure multiple iSER targets to spread the workload over multiple
>>>> cpus at the target side.
>>>>
>>>> And I agree with Mike that implementing scsi-mq support in the iSER
>>>> initiator as multiple independent connections probably is a better
>>>> choice than MC/S. RFC 3720 namely requires that iSCSI numbering is
>>>> session-wide. This means maintaining a single counter for all MC/S
>>>> sessions. Such a counter would be a contention point. I'm afraid that
>>>> because of that counter performance on a multi-socket initiator system
>>>> with a scsi-mq implementation based on MC/S could be worse than with the
>>>> approach with multiple iSER targets. Hence my preference for an approach
>>>> based on multiple independent iSER connections instead of MC/S.
>>>>
>>>
>>> The idea that a simple session wide counter for command sequence number
>>> assignment adds such a degree of contention that it renders MC/S at a
>>> performance disadvantage vs. multi-session configurations with all of
>>> the extra multipath logic overhead on top is at best, a naive
>>> proposition.
>>>
>>> On the initiator side for MC/S, literally the only thing that needs to
>>> be serialized is the assignment of the command sequence number to
>>> individual non-immediate PDUs.  The sending of the outgoing PDUs +
>>> immediate data by the initiator can happen out-of-order, and it's up to
>>> the target to ensure that the submission of the commands to the device
>>> server is in command sequence number order.
>>>
>>> All of the actual immediate data + R2T -> data-out processing by the
>>> target can also be done out-of-order as well.
>>
>> Right, but what he's saying is that we've taken great pains in the MQ
>> situation to free our issue queues of all entanglements and cross queue
>> locking so they can fly as fast as possible.  If we have to assign an
>> in-order sequence number across all the queues, this becomes both a
>> cross CPU bus lock point to ensure atomicity and a sync point to ensure
>> sequencing.  Naïvely that does look to be a bottleneck which wouldn't
>> necessarily be mitigated simply by allowing everything to proceed out of
>> order after this point.
>>
>
> The point is that a simple session wide counter for command sequence
> number assignment is significantly less overhead than all of the
> overhead associated with running a full multipath stack atop multiple
> sessions.

I think we are still going to want to use dm multipath on top of iscsi 
for devices that do failover across some sort group of paths like with 
ALUA, so we have to solve dm multipath problems either way.

There is greater memory overhead, but how bad is it? With lots of CPUs 
and lots of transport paths, I can see where it could get crazy. I have 
no idea how bad it will be though.

Are you also seeing a perf issue that is caused by dm?

Hannes had an idea where we could merge the lower and upper levels 
somehow and that might solve some of the issues you are thinking about.

>
> Not to mention that our iSCSI/iSER initiator is already taking a session
> wide lock when sending outgoing PDUs, so adding a session wide counter
> isn't adding any additional synchronization overhead vs. what's already
> in place.

I am not sure if we want this to be a deciding factor. I think the 
session wide lock is something that can be removed in the main IO paths.

A lot of what it is used for now is cmd/task related handling like list 
accesses. When we have the scsi layer alloc/free/manage that, then we 
can simplify that a lot for iser/bnx2i/cxgb*i since there send path is 
less complicated than software iscsi.

It is also used for the state check but I think that is overkill.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found]         ` <54ADA777.6090801-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
@ 2015-01-08 23:40           ` Mike Christie
  0 siblings, 0 replies; 37+ messages in thread
From: Mike Christie @ 2015-01-08 23:40 UTC (permalink / raw)
  To: open-iscsi-/JYPxA39Uh5TLH3MbocFFw
  Cc: Hannes Reinecke, Sagi Grimberg,
	lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-scsi,
	target-devel

On 1/7/15, 3:39 PM, Mike Christie wrote:
> So pretty non controversial I hope

Ok, maybe a little controversial. Let me work with Sagi on his MCS (tcp 
connection per CPU approach) patch and update my session per CPU patch 
and we can do some benchmarking and tracing and see what is up.

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-08 23:22                   ` [Lsf-pc] " James Bottomley
@ 2015-01-09  5:03                     ` Nicholas A. Bellinger
  2015-01-09  6:25                       ` James Bottomley
       [not found]                       ` <1420779808.21830.21.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
  0 siblings, 2 replies; 37+ messages in thread
From: Nicholas A. Bellinger @ 2015-01-09  5:03 UTC (permalink / raw)
  To: James Bottomley
  Cc: lsf-pc, Bart Van Assche, linux-scsi, Sagi Grimberg, target-devel,
	Hannes Reinecke, open-iscsi

On Thu, 2015-01-08 at 15:22 -0800, James Bottomley wrote:
> On Thu, 2015-01-08 at 14:57 -0800, Nicholas A. Bellinger wrote:
> > On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
> > > On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:

<SNIP>

> > The point is that a simple session wide counter for command sequence
> > number assignment is significantly less overhead than all of the
> > overhead associated with running a full multipath stack atop multiple
> > sessions.
> 
> I don't see how that's relevant to issue speed, which was the measure we
> were using: The layers above are just a hopper.  As long as they're
> loaded, the MQ lower layer can issue at full speed.  So as long as the
> multipath hopper is efficient enough to keep the queues loaded there's
> no speed degradation.
> 
> The problem with a sequence point inside the MQ issue layer is that it
> can cause a stall that reduces the issue speed. so the counter sequence
> point causes a degraded issue speed over the multipath hopper approach
> above even if the multipath approach has a higher CPU overhead.
> 
> Now, if the system is close to 100% cpu already, *then* the multipath
> overhead will try to take CPU power we don't have and cause a stall, but
> it's only in the flat out CPU case.
> 
> > Not to mention that our iSCSI/iSER initiator is already taking a session
> > wide lock when sending outgoing PDUs, so adding a session wide counter
> > isn't adding any additional synchronization overhead vs. what's already
> > in place.
> 
> I'll leave it up to the iSER people to decide whether they're redoing
> this as part of the MQ work.
> 

Session wide command sequence number synchronization isn't something to
be removed as part of the MQ work.  It's a iSCSI/iSER protocol
requirement.

That is, the expected + maximum sequence numbers are returned as part of
every response PDU, which the initiator uses to determine when the
command sequence number window is open so new non-immediate commands may
be sent to the target.

So, given some manner of session wide synchronization is required
between different contexts for the existing single connection case to
update the command sequence number and check when the window opens, it's
a fallacy to claim MC/S adds some type of new initiator specific
synchronization overhead vs. single connection code.

--nab

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-09  5:03                     ` Nicholas A. Bellinger
@ 2015-01-09  6:25                       ` James Bottomley
       [not found]                       ` <1420779808.21830.21.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
  1 sibling, 0 replies; 37+ messages in thread
From: James Bottomley @ 2015-01-09  6:25 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: open-iscsi, Bart Van Assche, linux-scsi, Sagi Grimberg,
	target-devel, Hannes Reinecke, lsf-pc

On Thu, 2015-01-08 at 21:03 -0800, Nicholas A. Bellinger wrote:
> On Thu, 2015-01-08 at 15:22 -0800, James Bottomley wrote:
> > On Thu, 2015-01-08 at 14:57 -0800, Nicholas A. Bellinger wrote:
> > > On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
> > > > On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
> 
> <SNIP>
> 
> > > The point is that a simple session wide counter for command sequence
> > > number assignment is significantly less overhead than all of the
> > > overhead associated with running a full multipath stack atop multiple
> > > sessions.
> > 
> > I don't see how that's relevant to issue speed, which was the measure we
> > were using: The layers above are just a hopper.  As long as they're
> > loaded, the MQ lower layer can issue at full speed.  So as long as the
> > multipath hopper is efficient enough to keep the queues loaded there's
> > no speed degradation.
> > 
> > The problem with a sequence point inside the MQ issue layer is that it
> > can cause a stall that reduces the issue speed. so the counter sequence
> > point causes a degraded issue speed over the multipath hopper approach
> > above even if the multipath approach has a higher CPU overhead.
> > 
> > Now, if the system is close to 100% cpu already, *then* the multipath
> > overhead will try to take CPU power we don't have and cause a stall, but
> > it's only in the flat out CPU case.
> > 
> > > Not to mention that our iSCSI/iSER initiator is already taking a session
> > > wide lock when sending outgoing PDUs, so adding a session wide counter
> > > isn't adding any additional synchronization overhead vs. what's already
> > > in place.
> > 
> > I'll leave it up to the iSER people to decide whether they're redoing
> > this as part of the MQ work.
> > 
> 
> Session wide command sequence number synchronization isn't something to
> be removed as part of the MQ work.  It's a iSCSI/iSER protocol
> requirement.

The sequence number is a requirement of the session.  Multiple separate
sessions means no SN correlation between the different connections, so
no global requirement for a SN counter across the queues ... that's what
Mike was saying about implementing multipath not using MCS.  With MCS we
have a single session for all the queues and thus have to correlate the
sequence number across all the connections and hence all the queues;
without it we don't.  That's why the sequence number becomes a potential
stall point in MQ implementation of MCS which can be obviated if we use
a separate session per queue.

James



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found]                   ` <54AF122C.9070703-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
@ 2015-01-09 11:17                     ` Sagi Grimberg
  0 siblings, 0 replies; 37+ messages in thread
From: Sagi Grimberg @ 2015-01-09 11:17 UTC (permalink / raw)
  To: Mike Christie, Nicholas A. Bellinger
  Cc: James Bottomley, Bart Van Assche,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw, Hannes Reinecke,
	lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-scsi,
	target-devel

On 1/9/2015 1:26 AM, Mike Christie wrote:
<SNIP>
> I am not sure if we want this to be a deciding factor. I think the
> session wide lock is something that can be removed in the main IO paths.
>
> A lot of what it is used for now is cmd/task related handling like list
> accesses. When we have the scsi layer alloc/free/manage that, then we
> can simplify that a lot for iser/bnx2i/cxgb*i since there send path is
> less complicated than software iscsi.
>

Completely agree. We should assume that other than session-wide command
sequence numbers nothing is synced across connections.

> It is also used for the state check but I think that is overkill.
>

I have a piped patch to remove this completely redundant spin_lock that
protects a check on a state that can change right after the spin_unlock.

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-08 14:11               ` Bart Van Assche
       [not found]                 ` <54AE9010.5080609-HInyCGIudOg@public.gmane.org>
@ 2015-01-09 11:39                 ` Sagi Grimberg
  2015-01-09 13:31                   ` Bart Van Assche
  1 sibling, 1 reply; 37+ messages in thread
From: Sagi Grimberg @ 2015-01-09 11:39 UTC (permalink / raw)
  To: Bart Van Assche, open-iscsi, Hannes Reinecke, lsf-pc
  Cc: linux-scsi, target-devel, Oren Duer, Or Gerlitz

On 1/8/2015 4:11 PM, Bart Van Assche wrote:
> On 01/08/15 14:45, Sagi Grimberg wrote:
>> Actually I started with that approach, but the independent connections
>> under a single session (I-T-Nexus) violates the command ordering
>> requirement. Plus, such a solution is specific to iSER...
>
> Hello Sagi,
>
> Which command ordering requirement are you referring to ? The Linux
> storage stack does not guarantee that block layer or SCSI commands will
> be processed in the same order as these commands have been submitted.
>

I was referring to the iSCSI session requirement. I initially thought of
an approach to maintain multiple iSER connections under a single session
but pretty soon I realized that preserving commands ordering this way
is not feasible. So independent iSER connections means independent
iSCSI sessions (each with a single connection). This is indeed another
choice, which we are clearly debating on...

I'm just wandering if we are not trying to force-fit this model. How
would this model look like? We will need to define another entity to
track and maintain the sessions and to allocate the scsi_host. Will that
be communicated to user-space? How will error recovery look like?

Sagi.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-09 11:39                 ` Sagi Grimberg
@ 2015-01-09 13:31                   ` Bart Van Assche
       [not found]                     ` <5EE87F5E6631894E80EB1A63198F964D040A6A8F-cXZ6iGhjG0hm/BozF5lIdDJ2aSJ780jGSxCzGc5ayCJWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2015-01-09 13:31 UTC (permalink / raw)
  To: open-iscsi, Hannes Reinecke, lsf-pc
  Cc: linux-scsi, target-devel, Oren Duer, Or Gerlitz

On 01/09/15 12:39, Sagi Grimberg wrote:
> On 1/8/2015 4:11 PM, Bart Van Assche wrote:
>> On 01/08/15 14:45, Sagi Grimberg wrote:
>>> Actually I started with that approach, but the independent connections
>>> under a single session (I-T-Nexus) violates the command ordering
>>> requirement. Plus, such a solution is specific to iSER...
>>
>> Which command ordering requirement are you referring to ? The Linux
>> storage stack does not guarantee that block layer or SCSI commands will
>> be processed in the same order as these commands have been submitted.
>
> I was referring to the iSCSI session requirement. I initially thought of
> an approach to maintain multiple iSER connections under a single session
> but pretty soon I realized that preserving commands ordering this way
> is not feasible. So independent iSER connections means independent
> iSCSI sessions (each with a single connection). This is indeed another
> choice, which we are clearly debating on...
>
> I'm just wandering if we are not trying to force-fit this model. How
> would this model look like? We will need to define another entity to
> track and maintain the sessions and to allocate the scsi_host. Will that
> be communicated to user-space? How will error recovery look like?

Hello Sagi,

As you probably remember scsi-mq support was added in the SRP initiator
by changing the 1:1 relationship between scsi_host and RDMA connection
into a 1:n relationship. I don't know how much work it would take to
implement a similar transformation in the SCSI initiator. Maybe we
should wait until Mike's workday starts such that Mike has a chance to
comment on this.

Bart.


________________________________

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found]                       ` <1420779808.21830.21.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
@ 2015-01-09 18:00                         ` Michael Christie
  2015-01-09 18:28                           ` Hannes Reinecke
       [not found]                           ` <38CE4ECA-D155-4BF9-9D6D-E1A01ADA05E4-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
  0 siblings, 2 replies; 37+ messages in thread
From: Michael Christie @ 2015-01-09 18:00 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: James Bottomley,
	lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Bart Van Assche, linux-scsi, Sagi Grimberg, target-devel,
	Hannes Reinecke, open-iscsi-/JYPxA39Uh5TLH3MbocFFw


On Jan 8, 2015, at 11:03 PM, Nicholas A. Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org> wrote:

> On Thu, 2015-01-08 at 15:22 -0800, James Bottomley wrote:
>> On Thu, 2015-01-08 at 14:57 -0800, Nicholas A. Bellinger wrote:
>>> On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
>>>> On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
> 
> <SNIP>
> 
>>> The point is that a simple session wide counter for command sequence
>>> number assignment is significantly less overhead than all of the
>>> overhead associated with running a full multipath stack atop multiple
>>> sessions.
>> 
>> I don't see how that's relevant to issue speed, which was the measure we
>> were using: The layers above are just a hopper.  As long as they're
>> loaded, the MQ lower layer can issue at full speed.  So as long as the
>> multipath hopper is efficient enough to keep the queues loaded there's
>> no speed degradation.
>> 
>> The problem with a sequence point inside the MQ issue layer is that it
>> can cause a stall that reduces the issue speed. so the counter sequence
>> point causes a degraded issue speed over the multipath hopper approach
>> above even if the multipath approach has a higher CPU overhead.
>> 
>> Now, if the system is close to 100% cpu already, *then* the multipath
>> overhead will try to take CPU power we don't have and cause a stall, but
>> it's only in the flat out CPU case.
>> 
>>> Not to mention that our iSCSI/iSER initiator is already taking a session
>>> wide lock when sending outgoing PDUs, so adding a session wide counter
>>> isn't adding any additional synchronization overhead vs. what's already
>>> in place.
>> 
>> I'll leave it up to the iSER people to decide whether they're redoing
>> this as part of the MQ work.
>> 
> 
> Session wide command sequence number synchronization isn't something to
> be removed as part of the MQ work.  It's a iSCSI/iSER protocol
> requirement.
> 
> That is, the expected + maximum sequence numbers are returned as part of
> every response PDU, which the initiator uses to determine when the
> command sequence number window is open so new non-immediate commands may
> be sent to the target.
> 
> So, given some manner of session wide synchronization is required
> between different contexts for the existing single connection case to
> update the command sequence number and check when the window opens, it's
> a fallacy to claim MC/S adds some type of new initiator specific
> synchronization overhead vs. single connection code.

I think you are assuming we are leaving the iscsi code as it is today.

For the non-MCS mq session per CPU design, we would be allocating and binding the session and its resources to specific CPUs. They would only be accessed by the threads on that one CPU, so we get our serialization/synchronization from that. That is why we are saying we do not need something like atomic_t/spin_locks for the sequence number handling for this type of implementation.

If we just tried to do this with the old code where the session could be accessed on multiple CPUs then you are right, we need locks/atomics like how we do in the MCS case.

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-09 18:00                         ` Michael Christie
@ 2015-01-09 18:28                           ` Hannes Reinecke
       [not found]                             ` <54B01DBD.5020707-l3A5Bk7waGM@public.gmane.org>
       [not found]                           ` <38CE4ECA-D155-4BF9-9D6D-E1A01ADA05E4-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
  1 sibling, 1 reply; 37+ messages in thread
From: Hannes Reinecke @ 2015-01-09 18:28 UTC (permalink / raw)
  To: Michael Christie, Nicholas A. Bellinger
  Cc: James Bottomley, lsf-pc, Bart Van Assche, linux-scsi,
	Sagi Grimberg, target-devel, open-iscsi

On 01/09/2015 07:00 PM, Michael Christie wrote:
> 
> On Jan 8, 2015, at 11:03 PM, Nicholas A. Bellinger <nab@linux-iscsi.org> wrote:
> 
>> On Thu, 2015-01-08 at 15:22 -0800, James Bottomley wrote:
>>> On Thu, 2015-01-08 at 14:57 -0800, Nicholas A. Bellinger wrote:
>>>> On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
>>>>> On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
>>
>> <SNIP>
>>
>>>> The point is that a simple session wide counter for command sequence
>>>> number assignment is significantly less overhead than all of the
>>>> overhead associated with running a full multipath stack atop multiple
>>>> sessions.
>>>
>>> I don't see how that's relevant to issue speed, which was the measure we
>>> were using: The layers above are just a hopper.  As long as they're
>>> loaded, the MQ lower layer can issue at full speed.  So as long as the
>>> multipath hopper is efficient enough to keep the queues loaded there's
>>> no speed degradation.
>>>
>>> The problem with a sequence point inside the MQ issue layer is that it
>>> can cause a stall that reduces the issue speed. so the counter sequence
>>> point causes a degraded issue speed over the multipath hopper approach
>>> above even if the multipath approach has a higher CPU overhead.
>>>
>>> Now, if the system is close to 100% cpu already, *then* the multipath
>>> overhead will try to take CPU power we don't have and cause a stall, but
>>> it's only in the flat out CPU case.
>>>
>>>> Not to mention that our iSCSI/iSER initiator is already taking a session
>>>> wide lock when sending outgoing PDUs, so adding a session wide counter
>>>> isn't adding any additional synchronization overhead vs. what's already
>>>> in place.
>>>
>>> I'll leave it up to the iSER people to decide whether they're redoing
>>> this as part of the MQ work.
>>>
>>
>> Session wide command sequence number synchronization isn't something to
>> be removed as part of the MQ work.  It's a iSCSI/iSER protocol
>> requirement.
>>
>> That is, the expected + maximum sequence numbers are returned as part of
>> every response PDU, which the initiator uses to determine when the
>> command sequence number window is open so new non-immediate commands may
>> be sent to the target.
>>
>> So, given some manner of session wide synchronization is required
>> between different contexts for the existing single connection case to
>> update the command sequence number and check when the window opens, it's
>> a fallacy to claim MC/S adds some type of new initiator specific
>> synchronization overhead vs. single connection code.
> 
> I think you are assuming we are leaving the iscsi code as it is today.
> 
> For the non-MCS mq session per CPU design, we would be allocating and
> binding the session and its resources to specific CPUs. They would only
> be accessed by the threads on that one CPU, so we get our
> serialization/synchronization from that. That is why we are saying we
> do not need something like atomic_t/spin_locks for the sequence number
> handling for this type of implementation.
> 
Wouldn't that need to be coordinated with the networking layer?
Doesn't it do the same thing, matching TX/RX queues to CPUs?
If so, wouldn't we decrease bandwidth by restricting things to one CPU?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found]                             ` <54B01DBD.5020707-l3A5Bk7waGM@public.gmane.org>
@ 2015-01-09 18:34                               ` James Bottomley
  2015-01-09 20:19                               ` Mike Christie
  1 sibling, 0 replies; 37+ messages in thread
From: James Bottomley @ 2015-01-09 18:34 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Michael Christie, Nicholas A. Bellinger,
	lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Bart Van Assche, linux-scsi, Sagi Grimberg, target-devel,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw

On Fri, 2015-01-09 at 19:28 +0100, Hannes Reinecke wrote:
[...]
> > I think you are assuming we are leaving the iscsi code as it is today.
> > 
> > For the non-MCS mq session per CPU design, we would be allocating and
> > binding the session and its resources to specific CPUs. They would only
> > be accessed by the threads on that one CPU, so we get our
> > serialization/synchronization from that. That is why we are saying we
> > do not need something like atomic_t/spin_locks for the sequence number
> > handling for this type of implementation.
> > 
> Wouldn't that need to be coordinated with the networking layer?
> Doesn't it do the same thing, matching TX/RX queues to CPUs?
> If so, wouldn't we decrease bandwidth by restricting things to one CPU?

So this is actually one of the fascinating questions on multi-queue.
Long ago, when I worked for the NCR OS group and we were bringing up the
first SMP systems, we actually found that the SCSI stack went faster
when bound to a single CPU.  The problem in those days was lock
granularity and contention, so single CPU binding eliminated that
overhead.  However, nowadays with modern multi-tiered caching and huge
latencies for cache line bouncing, we're approaching the point where the
fineness of our lock granularity is hurting performance, so it's worth
re-asking the question of whether just dumping all the lock latency by
single CPU binding is a worthwhile exercise.

James

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found]                             ` <54B01DBD.5020707-l3A5Bk7waGM@public.gmane.org>
  2015-01-09 18:34                               ` James Bottomley
@ 2015-01-09 20:19                               ` Mike Christie
       [not found]                                 ` <54B037BF.1010903-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
  1 sibling, 1 reply; 37+ messages in thread
From: Mike Christie @ 2015-01-09 20:19 UTC (permalink / raw)
  To: open-iscsi-/JYPxA39Uh5TLH3MbocFFw, Nicholas A. Bellinger
  Cc: James Bottomley,
	lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Bart Van Assche, linux-scsi, Sagi Grimberg, target-devel

On 01/09/2015 12:28 PM, Hannes Reinecke wrote:
> On 01/09/2015 07:00 PM, Michael Christie wrote:
>>
>> On Jan 8, 2015, at 11:03 PM, Nicholas A. Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org> wrote:
>>
>>> On Thu, 2015-01-08 at 15:22 -0800, James Bottomley wrote:
>>>> On Thu, 2015-01-08 at 14:57 -0800, Nicholas A. Bellinger wrote:
>>>>> On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
>>>>>> On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
>>>
>>> <SNIP>
>>>
>>>>> The point is that a simple session wide counter for command sequence
>>>>> number assignment is significantly less overhead than all of the
>>>>> overhead associated with running a full multipath stack atop multiple
>>>>> sessions.
>>>>
>>>> I don't see how that's relevant to issue speed, which was the measure we
>>>> were using: The layers above are just a hopper.  As long as they're
>>>> loaded, the MQ lower layer can issue at full speed.  So as long as the
>>>> multipath hopper is efficient enough to keep the queues loaded there's
>>>> no speed degradation.
>>>>
>>>> The problem with a sequence point inside the MQ issue layer is that it
>>>> can cause a stall that reduces the issue speed. so the counter sequence
>>>> point causes a degraded issue speed over the multipath hopper approach
>>>> above even if the multipath approach has a higher CPU overhead.
>>>>
>>>> Now, if the system is close to 100% cpu already, *then* the multipath
>>>> overhead will try to take CPU power we don't have and cause a stall, but
>>>> it's only in the flat out CPU case.
>>>>
>>>>> Not to mention that our iSCSI/iSER initiator is already taking a session
>>>>> wide lock when sending outgoing PDUs, so adding a session wide counter
>>>>> isn't adding any additional synchronization overhead vs. what's already
>>>>> in place.
>>>>
>>>> I'll leave it up to the iSER people to decide whether they're redoing
>>>> this as part of the MQ work.
>>>>
>>>
>>> Session wide command sequence number synchronization isn't something to
>>> be removed as part of the MQ work.  It's a iSCSI/iSER protocol
>>> requirement.
>>>
>>> That is, the expected + maximum sequence numbers are returned as part of
>>> every response PDU, which the initiator uses to determine when the
>>> command sequence number window is open so new non-immediate commands may
>>> be sent to the target.
>>>
>>> So, given some manner of session wide synchronization is required
>>> between different contexts for the existing single connection case to
>>> update the command sequence number and check when the window opens, it's
>>> a fallacy to claim MC/S adds some type of new initiator specific
>>> synchronization overhead vs. single connection code.
>>
>> I think you are assuming we are leaving the iscsi code as it is today.
>>
>> For the non-MCS mq session per CPU design, we would be allocating and
>> binding the session and its resources to specific CPUs. They would only
>> be accessed by the threads on that one CPU, so we get our
>> serialization/synchronization from that. That is why we are saying we
>> do not need something like atomic_t/spin_locks for the sequence number
>> handling for this type of implementation.
>>
> Wouldn't that need to be coordinated with the networking layer?

Yes.

> Doesn't it do the same thing, matching TX/RX queues to CPUs?

Yes.

> If so, wouldn't we decrease bandwidth by restricting things to one CPU?

We have a session or connection per CPU though, so we end up hitting the
same problem you talked about last year where one hctx (iscsi session or
connection's socket or nic hw queue) could get overloaded. This is what
I meant in my original mail where iscsi would rely on whatever blk/mq
load balancers we end up implementing at that layer to balance requests
across hctxs.

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found]                           ` <38CE4ECA-D155-4BF9-9D6D-E1A01ADA05E4-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
@ 2015-01-11  9:23                             ` Sagi Grimberg
       [not found]                               ` <54B24117.7050204-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Sagi Grimberg @ 2015-01-11  9:23 UTC (permalink / raw)
  To: Michael Christie, Nicholas A. Bellinger
  Cc: James Bottomley,
	lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Bart Van Assche, linux-scsi, target-devel, Hannes Reinecke,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw

On 1/9/2015 8:00 PM, Michael Christie wrote:
<SNIP>
>>>
>>
>> Session wide command sequence number synchronization isn't something to
>> be removed as part of the MQ work.  It's a iSCSI/iSER protocol
>> requirement.
>>
>> That is, the expected + maximum sequence numbers are returned as part of
>> every response PDU, which the initiator uses to determine when the
>> command sequence number window is open so new non-immediate commands may
>> be sent to the target.
>>
>> So, given some manner of session wide synchronization is required
>> between different contexts for the existing single connection case to
>> update the command sequence number and check when the window opens, it's
>> a fallacy to claim MC/S adds some type of new initiator specific
>> synchronization overhead vs. single connection code.
>
> I think you are assuming we are leaving the iscsi code as it is today.
>
> For the non-MCS mq session per CPU design, we would be allocating and binding the session and its resources to specific CPUs. They would only be accessed by the threads on that one CPU, so we get our serialization/synchronization from that. That is why we are saying we do not need something like atomic_t/spin_locks for the sequence number handling for this type of implementation.
>
> If we just tried to do this with the old code where the session could be accessed on multiple CPUs then you are right, we need locks/atomics like how we do in the MCS case.
>

I don't think we will want to restrict session per CPU. There is a
tradeoff question of system resources. We might want to allow a user to
configure multiple HW queues but still not to use too much of the system
resources. So the session locks would still be used but definitely less
congested...

Sagi.

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found]                                 ` <54B037BF.1010903-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
@ 2015-01-11  9:40                                   ` Sagi Grimberg
  2015-01-12 12:56                                     ` Bart Van Assche
       [not found]                                     ` <54B24501.7090801-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 2 replies; 37+ messages in thread
From: Sagi Grimberg @ 2015-01-11  9:40 UTC (permalink / raw)
  To: Mike Christie, open-iscsi-/JYPxA39Uh5TLH3MbocFFw, Nicholas A. Bellinger
  Cc: James Bottomley,
	lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Bart Van Assche, linux-scsi, target-devel

On 1/9/2015 10:19 PM, Mike Christie wrote:
> On 01/09/2015 12:28 PM, Hannes Reinecke wrote:
>> On 01/09/2015 07:00 PM, Michael Christie wrote:
>>>
>>> On Jan 8, 2015, at 11:03 PM, Nicholas A. Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org> wrote:
>>>
>>>> On Thu, 2015-01-08 at 15:22 -0800, James Bottomley wrote:
>>>>> On Thu, 2015-01-08 at 14:57 -0800, Nicholas A. Bellinger wrote:
>>>>>> On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
>>>>>>> On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
>>>>
>>>> <SNIP>
>>>>
>>>>>> The point is that a simple session wide counter for command sequence
>>>>>> number assignment is significantly less overhead than all of the
>>>>>> overhead associated with running a full multipath stack atop multiple
>>>>>> sessions.
>>>>>
>>>>> I don't see how that's relevant to issue speed, which was the measure we
>>>>> were using: The layers above are just a hopper.  As long as they're
>>>>> loaded, the MQ lower layer can issue at full speed.  So as long as the
>>>>> multipath hopper is efficient enough to keep the queues loaded there's
>>>>> no speed degradation.
>>>>>
>>>>> The problem with a sequence point inside the MQ issue layer is that it
>>>>> can cause a stall that reduces the issue speed. so the counter sequence
>>>>> point causes a degraded issue speed over the multipath hopper approach
>>>>> above even if the multipath approach has a higher CPU overhead.
>>>>>
>>>>> Now, if the system is close to 100% cpu already, *then* the multipath
>>>>> overhead will try to take CPU power we don't have and cause a stall, but
>>>>> it's only in the flat out CPU case.
>>>>>
>>>>>> Not to mention that our iSCSI/iSER initiator is already taking a session
>>>>>> wide lock when sending outgoing PDUs, so adding a session wide counter
>>>>>> isn't adding any additional synchronization overhead vs. what's already
>>>>>> in place.
>>>>>
>>>>> I'll leave it up to the iSER people to decide whether they're redoing
>>>>> this as part of the MQ work.
>>>>>
>>>>
>>>> Session wide command sequence number synchronization isn't something to
>>>> be removed as part of the MQ work.  It's a iSCSI/iSER protocol
>>>> requirement.
>>>>
>>>> That is, the expected + maximum sequence numbers are returned as part of
>>>> every response PDU, which the initiator uses to determine when the
>>>> command sequence number window is open so new non-immediate commands may
>>>> be sent to the target.
>>>>
>>>> So, given some manner of session wide synchronization is required
>>>> between different contexts for the existing single connection case to
>>>> update the command sequence number and check when the window opens, it's
>>>> a fallacy to claim MC/S adds some type of new initiator specific
>>>> synchronization overhead vs. single connection code.
>>>
>>> I think you are assuming we are leaving the iscsi code as it is today.
>>>
>>> For the non-MCS mq session per CPU design, we would be allocating and
>>> binding the session and its resources to specific CPUs. They would only
>>> be accessed by the threads on that one CPU, so we get our
>>> serialization/synchronization from that. That is why we are saying we
>>> do not need something like atomic_t/spin_locks for the sequence number
>>> handling for this type of implementation.
>>>
>> Wouldn't that need to be coordinated with the networking layer?
>
> Yes.
>
>> Doesn't it do the same thing, matching TX/RX queues to CPUs?
>
> Yes.
>

Hey Hannes, Mike,

I would say there is no need for specific coordination from iSCSI PoV.
This is exactly what flow steering is designed for. As I see it, in
order to get the TX/RX to match rings, the user can attach 5-tuple rules
(using standard ethtool) to steer packets to the right rings.

Sagi.

>> If so, wouldn't we decrease bandwidth by restricting things to one CPU?
>
> We have a session or connection per CPU though, so we end up hitting the
> same problem you talked about last year where one hctx (iscsi session or
> connection's socket or nic hw queue) could get overloaded. This is what
> I meant in my original mail where iscsi would rely on whatever blk/mq
> load balancers we end up implementing at that layer to balance requests
> across hctxs.
>

I'm not sure I understand,

The submission flow is CPU bound. In the current single queue model
both CPU X and CPU Y will end up using a single socket. In the
multi-queue solution, CPU X will go to socket X and CPU Y will go to
socket Y. This is equal to what we have today (if only CPU X is active)
or better (if more CPUs are active).

Am I missing something?

Sagi.

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found]                     ` <5EE87F5E6631894E80EB1A63198F964D040A6A8F-cXZ6iGhjG0hm/BozF5lIdDJ2aSJ780jGSxCzGc5ayCJWk0Htik3J/w@public.gmane.org>
@ 2015-01-11  9:52                       ` Sagi Grimberg
  0 siblings, 0 replies; 37+ messages in thread
From: Sagi Grimberg @ 2015-01-11  9:52 UTC (permalink / raw)
  To: Bart Van Assche, open-iscsi-/JYPxA39Uh5TLH3MbocFFw,
	Hannes Reinecke,
	lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: linux-scsi, target-devel, Oren Duer, Or Gerlitz

On 1
9/2015 3:31 PM, Bart Van Assche wrote:
> On 01/09/15 12:39, Sagi Grimberg wrote:
>> On 1/8/2015 4:11 PM, Bart Van Assche wrote:
>>> On 01/08/15 14:45, Sagi Grimberg wrote:
>>>> Actually I started with that approach, but the independent connections
>>>> under a single session (I-T-Nexus) violates the command ordering
>>>> requirement. Plus, such a solution is specific to iSER...
>>>
>>> Which command ordering requirement are you referring to ? The Linux
>>> storage stack does not guarantee that block layer or SCSI commands will
>>> be processed in the same order as these commands have been submitted.
>>
>> I was referring to the iSCSI session requirement. I initially thought of
>> an approach to maintain multiple iSER connections under a single session
>> but pretty soon I realized that preserving commands ordering this way
>> is not feasible. So independent iSER connections means independent
>> iSCSI sessions (each with a single connection). This is indeed another
>> choice, which we are clearly debating on...
>>
>> I'm just wandering if we are not trying to force-fit this model. How
>> would this model look like? We will need to define another entity to
>> track and maintain the sessions and to allocate the scsi_host. Will that
>> be communicated to user-space? How will error recovery look like?
>
> Hello Sagi,

Hey Bart,

>
> As you probably remember scsi-mq support was added in the SRP initiator
> by changing the 1:1 relationship between scsi_host and RDMA connection
> into a 1:n relationship.

Of course I remember ;)

> I don't know how much work it would take to
> implement a similar transformation in the SCSI initiator. Maybe we
> should wait until Mike's workday starts such that Mike has a chance to
> comment on this.

So the question of the amount of work is not the main concern here.
After all I still think that MCS is a better fit to scsi-mq than coming
up with some other of abstraction layer to manage multiple sessions.
After all, MCS is specified and well defined.

There is the question of cmdsn as a potential synchronization point, but
shouldn't we discuss that instead of trying to get around it? If we
agree that iSCSI session-wide ordering requirements are too restrictive
in most use-cases, why not suggesting some form of relaxed ordering
mode that is would leave ordering to upper layers?

If this kind of mode would appear in the RFC 3720 family, was there
even a question of what we should do?

Sagi.

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-11  9:40                                   ` Sagi Grimberg
@ 2015-01-12 12:56                                     ` Bart Van Assche
       [not found]                                       ` <54B3C47E.6010109-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
       [not found]                                     ` <54B24501.7090801-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  1 sibling, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2015-01-12 12:56 UTC (permalink / raw)
  To: open-iscsi, Mike Christie, Nicholas A. Bellinger
  Cc: James Bottomley, lsf-pc, linux-scsi, target-devel

On 01/11/15 10:40, Sagi Grimberg wrote:
> I would say there is no need for specific coordination from iSCSI PoV.
> This is exactly what flow steering is designed for. As I see it, in
> order to get the TX/RX to match rings, the user can attach 5-tuple rules
> (using standard ethtool) to steer packets to the right rings.

Hello Sagi,

Can the 5-tuple rules be chosen such that it is guaranteed that the
sockets used to implement per-CPU queues are spread evenly over MSI-X
completion vectors ? If not, would it help to add a socket option to the
Linux network stack that allows to select the TX ring explicitly, just
like ib_create_cq() in the Linux RDMA stack allows to select a
completion vector explicitly ? My concerns are as follows:
- If the number of queues exceeds the number of MSI-X vectors then I
  expect that it will be much easier to guarantee even spreading by
  selecting tx queues explicitly instead of relying on a hashing scheme.
- On multi-socket systems it is important to process completion
  interrupts on the CPU socket from where the I/O was initiated. I'm
  not sure it is possible to guarantee this when using a hashing
  algorithm to select the TX ring.

Bart.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found]                               ` <54B24117.7050204-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-01-12 20:05                                 ` Mike Christie
       [not found]                                   ` <54B428F2.2010507-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Mike Christie @ 2015-01-12 20:05 UTC (permalink / raw)
  To: Sagi Grimberg, Nicholas A. Bellinger
  Cc: James Bottomley,
	lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Bart Van Assche, linux-scsi, target-devel, Hannes Reinecke,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw

On 01/11/2015 03:23 AM, Sagi Grimberg wrote:
> On 1/9/2015 8:00 PM, Michael Christie wrote:
> <SNIP>
>>>>
>>>
>>> Session wide command sequence number synchronization isn't something to
>>> be removed as part of the MQ work.  It's a iSCSI/iSER protocol
>>> requirement.
>>>
>>> That is, the expected + maximum sequence numbers are returned as part of
>>> every response PDU, which the initiator uses to determine when the
>>> command sequence number window is open so new non-immediate commands may
>>> be sent to the target.
>>>
>>> So, given some manner of session wide synchronization is required
>>> between different contexts for the existing single connection case to
>>> update the command sequence number and check when the window opens, it's
>>> a fallacy to claim MC/S adds some type of new initiator specific
>>> synchronization overhead vs. single connection code.
>>
>> I think you are assuming we are leaving the iscsi code as it is today.
>>
>> For the non-MCS mq session per CPU design, we would be allocating and
>> binding the session and its resources to specific CPUs. They would
>> only be accessed by the threads on that one CPU, so we get our
>> serialization/synchronization from that. That is why we are saying we
>> do not need something like atomic_t/spin_locks for the sequence number
>> handling for this type of implementation.
>>
>> If we just tried to do this with the old code where the session could
>> be accessed on multiple CPUs then you are right, we need locks/atomics
>> like how we do in the MCS case.
>>
> 
> I don't think we will want to restrict session per CPU. There is a
> tradeoff question of system resources. We might want to allow a user to
> configure multiple HW queues but still not to use too much of the system
> resources. So the session locks would still be used but definitely less
> congested...

Are you talking about specifically the session per CPU or also MCS and
doing a connection per CPU?

Based on the srp work, how bad do you think it will be to do a
session/connection per CPU? What are you thinking will be more common?
Session per 4 CPU? 2 CPUs? 8?

There is also multipath to take into account here. We could do a mq/MCS
session/connection per CPU (or group of CPS) then also one of those per
transport path. We could also do a mq/MCS session/connection per
transport path, then bind those to specific CPUs. Or something in between.

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found]                                     ` <54B24501.7090801-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-01-12 20:14                                       ` Mike Christie
  0 siblings, 0 replies; 37+ messages in thread
From: Mike Christie @ 2015-01-12 20:14 UTC (permalink / raw)
  To: open-iscsi-/JYPxA39Uh5TLH3MbocFFw, Nicholas A. Bellinger
  Cc: James Bottomley,
	lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Bart Van Assche, linux-scsi, target-devel

On 01/11/2015 03:40 AM, Sagi Grimberg wrote:
> On 1/9/2015 10:19 PM, Mike Christie wrote:
>> On 01/09/2015 12:28 PM, Hannes Reinecke wrote:
>>> On 01/09/2015 07:00 PM, Michael Christie wrote:
>>>>
>>>> On Jan 8, 2015, at 11:03 PM, Nicholas A. Bellinger
>>>> <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org> wrote:
>>>>
>>>>> On Thu, 2015-01-08 at 15:22 -0800, James Bottomley wrote:
>>>>>> On Thu, 2015-01-08 at 14:57 -0800, Nicholas A. Bellinger wrote:
>>>>>>> On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
>>>>>>>> On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
>>>>>
>>>>> <SNIP>
>>>>>
>>>>>>> The point is that a simple session wide counter for command sequence
>>>>>>> number assignment is significantly less overhead than all of the
>>>>>>> overhead associated with running a full multipath stack atop
>>>>>>> multiple
>>>>>>> sessions.
>>>>>>
>>>>>> I don't see how that's relevant to issue speed, which was the
>>>>>> measure we
>>>>>> were using: The layers above are just a hopper.  As long as they're
>>>>>> loaded, the MQ lower layer can issue at full speed.  So as long as
>>>>>> the
>>>>>> multipath hopper is efficient enough to keep the queues loaded
>>>>>> there's
>>>>>> no speed degradation.
>>>>>>
>>>>>> The problem with a sequence point inside the MQ issue layer is
>>>>>> that it
>>>>>> can cause a stall that reduces the issue speed. so the counter
>>>>>> sequence
>>>>>> point causes a degraded issue speed over the multipath hopper
>>>>>> approach
>>>>>> above even if the multipath approach has a higher CPU overhead.
>>>>>>
>>>>>> Now, if the system is close to 100% cpu already, *then* the multipath
>>>>>> overhead will try to take CPU power we don't have and cause a
>>>>>> stall, but
>>>>>> it's only in the flat out CPU case.
>>>>>>
>>>>>>> Not to mention that our iSCSI/iSER initiator is already taking a
>>>>>>> session
>>>>>>> wide lock when sending outgoing PDUs, so adding a session wide
>>>>>>> counter
>>>>>>> isn't adding any additional synchronization overhead vs. what's
>>>>>>> already
>>>>>>> in place.
>>>>>>
>>>>>> I'll leave it up to the iSER people to decide whether they're redoing
>>>>>> this as part of the MQ work.
>>>>>>
>>>>>
>>>>> Session wide command sequence number synchronization isn't
>>>>> something to
>>>>> be removed as part of the MQ work.  It's a iSCSI/iSER protocol
>>>>> requirement.
>>>>>
>>>>> That is, the expected + maximum sequence numbers are returned as
>>>>> part of
>>>>> every response PDU, which the initiator uses to determine when the
>>>>> command sequence number window is open so new non-immediate
>>>>> commands may
>>>>> be sent to the target.
>>>>>
>>>>> So, given some manner of session wide synchronization is required
>>>>> between different contexts for the existing single connection case to
>>>>> update the command sequence number and check when the window opens,
>>>>> it's
>>>>> a fallacy to claim MC/S adds some type of new initiator specific
>>>>> synchronization overhead vs. single connection code.
>>>>
>>>> I think you are assuming we are leaving the iscsi code as it is today.
>>>>
>>>> For the non-MCS mq session per CPU design, we would be allocating and
>>>> binding the session and its resources to specific CPUs. They would only
>>>> be accessed by the threads on that one CPU, so we get our
>>>> serialization/synchronization from that. That is why we are saying we
>>>> do not need something like atomic_t/spin_locks for the sequence number
>>>> handling for this type of implementation.
>>>>
>>> Wouldn't that need to be coordinated with the networking layer?
>>
>> Yes.
>>
>>> Doesn't it do the same thing, matching TX/RX queues to CPUs?
>>
>> Yes.
>>
> 
> Hey Hannes, Mike,
> 
> I would say there is no need for specific coordination from iSCSI PoV.
> This is exactly what flow steering is designed for. As I see it, in
> order to get the TX/RX to match rings, the user can attach 5-tuple rules
> (using standard ethtool) to steer packets to the right rings.
> 
> Sagi.
> 
>>> If so, wouldn't we decrease bandwidth by restricting things to one CPU?
>>
>> We have a session or connection per CPU though, so we end up hitting the
>> same problem you talked about last year where one hctx (iscsi session or
>> connection's socket or nic hw queue) could get overloaded. This is what
>> I meant in my original mail where iscsi would rely on whatever blk/mq
>> load balancers we end up implementing at that layer to balance requests
>> across hctxs.
>>
> 
> I'm not sure I understand,
> 
> The submission flow is CPU bound. In the current single queue model
> both CPU X and CPU Y will end up using a single socket. In the
> multi-queue solution, CPU X will go to socket X and CPU Y will go to
> socket Y. This is equal to what we have today (if only CPU X is active)
> or better (if more CPUs are active).
> 
> Am I missing something?

I did not take Hannes's comment as comparing what we have today vs the
proposal. I thought he was referring to the problem he was talking about
at LSF last year and saying there could be cases where we want to spread
IO across CPUs/queues and some cases where we would want to execute on
the CPU we were originally submitted on. I was just saying the iscsi
layer would not control that and would rely on the blk/mq layer to
handle this or tell us what to do similar to what we do for the
rq_affinity setting.

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found]                                       ` <54B3C47E.6010109-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-01-13  9:46                                         ` Sagi Grimberg
  0 siblings, 0 replies; 37+ messages in thread
From: Sagi Grimberg @ 2015-01-13  9:46 UTC (permalink / raw)
  To: Bart Van Assche, open-iscsi-/JYPxA39Uh5TLH3MbocFFw,
	Mike Christie, Nicholas A. Bellinger
  Cc: James Bottomley,
	lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-scsi,
	target-devel, Oren Duer, Or Gerlitz

On 1/12/2015 2:56 PM, Bart Van Assche wrote:
> On 01/11/15 10:40, Sagi Grimberg wrote:
>> I would say there is no need for specific coordination from iSCSI PoV.
>> This is exactly what flow steering is designed for. As I see it, in
>> order to get the TX/RX to match rings, the user can attach 5-tuple rules
>> (using standard ethtool) to steer packets to the right rings.
>
> Hello Sagi,
>
> Can the 5-tuple rules be chosen such that it is guaranteed that the
> sockets used to implement per-CPU queues are spread evenly over MSI-X
> completion vectors ? If not, would it help to add a socket option to the
> Linux network stack that allows to select the TX ring explicitly, just
> like ib_create_cq() in the Linux RDMA stack allows to select a
> completion vector explicitly ? My concerns are as follows:
> - If the number of queues exceeds the number of MSI-X vectors then I
>    expect that it will be much easier to guarantee even spreading by
>    selecting tx queues explicitly instead of relying on a hashing scheme.
> - On multi-socket systems it is important to process completion
>    interrupts on the CPU socket from where the I/O was initiated. I'm
>    not sure it is possible to guarantee this when using a hashing
>    algorithm to select the TX ring.
>

Hey Bart,

Your concerns are correct. Flow steering rules will guarantee that each
socket will have a different TX/RX ring, but not necessarily the
"correct" TX/RX ring. These issues have been addressed in the
Networking subsystem.

Thinking more on this out loud,

There is the TX challenge, getting the HW queue selection to match the
TX ring selection (which might not be the same according to flow hash), 
First thing that comes to mind is XPS (Transmit Packet Steering).

 From Documentation/networking/scaling.txt:
"Transmit Packet Steering is a mechanism for intelligently selecting
which transmit queue to use when transmitting a packet on a multi-queue
device. To accomplish this, a mapping from CPU to hardware queue(s) is
recorded. The goal of this mapping is usually to assign queues
exclusively to a subset of CPUs, where the transmit completions for
these queues are processed on a CPU within this set."

About the RX challenge, I think RFS (Receive Flow Steering) will
probably be the best fit here since RX packets will be steered to the
CPU where the application is running.

 From Documentation/networking/scaling.txt:
"The goal of RFS is to increase datacache hitrate by steering
kernel processing of packets to the CPU where the application thread
consuming the packet is running. RFS relies on the same RPS mechanisms
to enqueue packets onto the backlog of another CPU and to wake up that
CPU. In RFS, packets are not forwarded directly by the value of their
hash, but the hash is used as index into a flow lookup table. This
table maps flows to the CPUs where those flows are being processed."

This definitely needs some more thinking. CC'ing Or Gerlitz which has
a lot of experience in the Networking stack...

Sagi.

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
       [not found]                                   ` <54B428F2.2010507-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
@ 2015-01-13  9:55                                     ` Sagi Grimberg
  0 siblings, 0 replies; 37+ messages in thread
From: Sagi Grimberg @ 2015-01-13  9:55 UTC (permalink / raw)
  To: Mike Christie, Nicholas A. Bellinger
  Cc: James Bottomley,
	lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Bart Van Assche, linux-scsi, target-devel, Hannes Reinecke,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw

On 1/12/2015 10:05 PM, Mike Christie wrote:
> On 01/11/2015 03:23 AM, Sagi Grimberg wrote:
>> On 1/9/2015 8:00 PM, Michael Christie wrote:
>> <SNIP>
>>>>>
>>>>
>>>> Session wide command sequence number synchronization isn't something to
>>>> be removed as part of the MQ work.  It's a iSCSI/iSER protocol
>>>> requirement.
>>>>
>>>> That is, the expected + maximum sequence numbers are returned as part of
>>>> every response PDU, which the initiator uses to determine when the
>>>> command sequence number window is open so new non-immediate commands may
>>>> be sent to the target.
>>>>
>>>> So, given some manner of session wide synchronization is required
>>>> between different contexts for the existing single connection case to
>>>> update the command sequence number and check when the window opens, it's
>>>> a fallacy to claim MC/S adds some type of new initiator specific
>>>> synchronization overhead vs. single connection code.
>>>
>>> I think you are assuming we are leaving the iscsi code as it is today.
>>>
>>> For the non-MCS mq session per CPU design, we would be allocating and
>>> binding the session and its resources to specific CPUs. They would
>>> only be accessed by the threads on that one CPU, so we get our
>>> serialization/synchronization from that. That is why we are saying we
>>> do not need something like atomic_t/spin_locks for the sequence number
>>> handling for this type of implementation.
>>>
>>> If we just tried to do this with the old code where the session could
>>> be accessed on multiple CPUs then you are right, we need locks/atomics
>>> like how we do in the MCS case.
>>>
>>
>> I don't think we will want to restrict session per CPU. There is a
>> tradeoff question of system resources. We might want to allow a user to
>> configure multiple HW queues but still not to use too much of the system
>> resources. So the session locks would still be used but definitely less
>> congested...
>
> Are you talking about specifically the session per CPU or also MCS and
> doing a connection per CPU?

This applies to both.

>
> Based on the srp work, how bad do you think it will be to do a
> session/connection per CPU? What are you thinking will be more common?
> Session per 4 CPU? 2 CPUs? 8?

This is a level of degree which demonstrates why we need to let the
user choose. I don't think there is a magic number here, there is a
tradeoff between performance and memory footprint.

>
> There is also multipath to take into account here. We could do a mq/MCS
> session/connection per CPU (or group of CPS) then also one of those per
> transport path. We could also do a mq/MCS session/connection per
> transport path, then bind those to specific CPUs. Or something in between.
>

Is it a good idea to tie iSCSI implementation in multipath? I've seen
deployments where multipath was not used for HA (NIC bonding was used
for that).

The srp implementation allowed the user to choose the number of
channels per target and the default was chosen by empirical results
(Bart, please correct me if I'm wrong here).

Sagi.

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
  2015-01-08 13:45           ` Sagi Grimberg
       [not found]             ` <54AE8A02.1030100-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-01-14  4:16             ` Vladislav Bolkhovitin
  1 sibling, 0 replies; 37+ messages in thread
From: Vladislav Bolkhovitin @ 2015-01-14  4:16 UTC (permalink / raw)
  To: Sagi Grimberg, Bart Van Assche, open-iscsi, Hannes Reinecke, lsf-pc
  Cc: linux-scsi, target-devel, Oren Duer, Or Gerlitz

Sagi Grimberg wrote on 01/08/2015 05:45 AM:
>> RFC 3720 namely requires that iSCSI numbering is
>> session-wide. This means maintaining a single counter for all MC/S
>> sessions. Such a counter would be a contention point. I'm afraid that
>> because of that counter performance on a multi-socket initiator system
>> with a scsi-mq implementation based on MC/S could be worse than with the
>> approach with multiple iSER targets. Hence my preference for an approach
>> based on multiple independent iSER connections instead of MC/S.
> 
> So this comment is spot on the pros/cons of the discussion (we might want to leave
> something for LSF ;)).
> MCS would not allow a completely lockless data-path due to command
> ordering. On the other hand implementing some kind of multiple sessions
> solution feels somewhat like a mis-fit (at least in my view).
> 
> One of my thoughts about how to overcome the contention on commands
> sequence numbering was to suggest some kind of negotiable "relaxed
> ordering" mode but of course I don't have anything figured out yet.

Linux SCSI/block stack neither uses, nor guarantees any commands order. Applications
requiring commands order enforce it by queue draining (i.e. wait until all previous
commands finished). Hence, MC/S enforced commands order is an overkill, which
additionally coming with some non-zero performance cost.

Don't do MC/S, do independent connections. You know the KISS principle. Memory overhead
to setup the extra iSCSI sessions should be negligible.

Vlad

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2015-01-14  4:16 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-07 16:25 [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion Sagi Grimberg
     [not found] ` <54AD5DDD.2090808-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-01-07 16:57   ` Hannes Reinecke
     [not found]     ` <54AD6563.4040603-l3A5Bk7waGM@public.gmane.org>
2015-01-07 21:39       ` Mike Christie
2015-01-08  7:50         ` Bart Van Assche
2015-01-08 13:45           ` Sagi Grimberg
     [not found]             ` <54AE8A02.1030100-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-01-08 14:11               ` Bart Van Assche
     [not found]                 ` <54AE9010.5080609-HInyCGIudOg@public.gmane.org>
2015-01-08 15:57                   ` Paul Koning
2015-01-09 11:39                 ` Sagi Grimberg
2015-01-09 13:31                   ` Bart Van Assche
     [not found]                     ` <5EE87F5E6631894E80EB1A63198F964D040A6A8F-cXZ6iGhjG0hm/BozF5lIdDJ2aSJ780jGSxCzGc5ayCJWk0Htik3J/w@public.gmane.org>
2015-01-11  9:52                       ` Sagi Grimberg
2015-01-14  4:16             ` Vladislav Bolkhovitin
2015-01-08 22:16           ` Nicholas A. Bellinger
2015-01-08 22:29             ` James Bottomley
2015-01-08 22:57               ` Nicholas A. Bellinger
     [not found]                 ` <1420757822.2842.39.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
2015-01-08 23:22                   ` [Lsf-pc] " James Bottomley
2015-01-09  5:03                     ` Nicholas A. Bellinger
2015-01-09  6:25                       ` James Bottomley
     [not found]                       ` <1420779808.21830.21.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
2015-01-09 18:00                         ` Michael Christie
2015-01-09 18:28                           ` Hannes Reinecke
     [not found]                             ` <54B01DBD.5020707-l3A5Bk7waGM@public.gmane.org>
2015-01-09 18:34                               ` James Bottomley
2015-01-09 20:19                               ` Mike Christie
     [not found]                                 ` <54B037BF.1010903-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
2015-01-11  9:40                                   ` Sagi Grimberg
2015-01-12 12:56                                     ` Bart Van Assche
     [not found]                                       ` <54B3C47E.6010109-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-01-13  9:46                                         ` Sagi Grimberg
     [not found]                                     ` <54B24501.7090801-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-01-12 20:14                                       ` Mike Christie
     [not found]                           ` <38CE4ECA-D155-4BF9-9D6D-E1A01ADA05E4-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
2015-01-11  9:23                             ` Sagi Grimberg
     [not found]                               ` <54B24117.7050204-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-01-12 20:05                                 ` Mike Christie
     [not found]                                   ` <54B428F2.2010507-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
2015-01-13  9:55                                     ` Sagi Grimberg
2015-01-08 23:26                 ` Mike Christie
     [not found]                   ` <54AF122C.9070703-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
2015-01-09 11:17                     ` Sagi Grimberg
2015-01-08 23:01           ` Mike Christie
2015-01-08 14:50         ` James Bottomley
2015-01-08 17:25           ` Sagi Grimberg
     [not found]         ` <54ADA777.6090801-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
2015-01-08 23:40           ` Mike Christie
2015-01-07 17:22   ` Lee Duncan
2015-01-07 19:11     ` [Lsf-pc] " Jan Kara
2015-01-07 16:58 ` Nicholas A. Bellinger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.