From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sagi Grimberg
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
Date: Sun, 11 Jan 2015 11:40:17 +0200
Message-ID: <54B24501.7090801@dev.mellanox.co.il>
References: <54AD5DDD.2090808@dev.mellanox.co.il>
 <54AD6563.4040603@suse.de>
 <54ADA777.6090801@cs.wisc.edu>
 <54AE36CE.8020509@acm.org>
 <1420755361.2842.16.camel@haakon3.risingtidesystems.com>
 <1420756142.11310.9.camel@HansenPartnership.com>
 <1420757822.2842.39.camel@haakon3.risingtidesystems.com>
 <1420759360.11310.13.camel@HansenPartnership.com>
 <1420779808.21830.21.camel@haakon3.risingtidesystems.com>
 <38CE4ECA-D155-4BF9-9D6D-E1A01ADA05E4@cs.wisc.edu>
 <54B01DBD.5020707@suse.de>
 <54B037BF.1010903@cs.wisc.edu>
Reply-To: open-iscsi-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
In-Reply-To: <54B037BF.1010903-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
To: Mike Christie, open-iscsi-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org,
 "Nicholas A. Bellinger"
Cc: James Bottomley,
 lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
 Bart Van Assche, linux-scsi, target-devel
List-Id: linux-scsi@vger.kernel.org

On 1/9/2015 10:19 PM, Mike Christie wrote:
> On 01/09/2015 12:28 PM, Hannes Reinecke wrote:
>> On 01/09/2015 07:00 PM, Michael Christie wrote:
>>>
>>> On Jan 8, 2015, at 11:03 PM, Nicholas A. Bellinger wrote:
>>>
>>>> On Thu, 2015-01-08 at 15:22 -0800, James Bottomley wrote:
>>>>> On Thu, 2015-01-08 at 14:57 -0800, Nicholas A. Bellinger wrote:
>>>>>> On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
>>>>>>> On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
>>>>
>>>>
>>>>
>>>>>> The point is that a simple session-wide counter for command sequence
>>>>>> number assignment is significantly less overhead than all of the
>>>>>> overhead associated with running a full multipath stack atop multiple
>>>>>> sessions.
>>>>>
>>>>> I don't see how that's relevant to issue speed, which was the measure
>>>>> we were using: the layers above are just a hopper. As long as they're
>>>>> loaded, the MQ lower layer can issue at full speed. So as long as the
>>>>> multipath hopper is efficient enough to keep the queues loaded,
>>>>> there's no speed degradation.
>>>>>
>>>>> The problem with a sequence point inside the MQ issue layer is that it
>>>>> can cause a stall that reduces the issue speed. So the counter
>>>>> sequence point causes a degraded issue speed over the multipath hopper
>>>>> approach above, even if the multipath approach has a higher CPU
>>>>> overhead.
>>>>>
>>>>> Now, if the system is close to 100% CPU already, *then* the multipath
>>>>> overhead will try to take CPU power we don't have and cause a stall,
>>>>> but that's only in the flat-out CPU case.
>>>>>
>>>>>> Not to mention that our iSCSI/iSER initiator is already taking a
>>>>>> session-wide lock when sending outgoing PDUs, so adding a
>>>>>> session-wide counter isn't adding any additional synchronization
>>>>>> overhead vs. what's already in place.
>>>>>
>>>>> I'll leave it up to the iSER people to decide whether they're redoing
>>>>> this as part of the MQ work.
>>>>>
>>>>
>>>> Session-wide command sequence number synchronization isn't something to
>>>> be removed as part of the MQ work. It's an iSCSI/iSER protocol
>>>> requirement.
>>>>
>>>> That is, the expected and maximum sequence numbers are returned as
>>>> part of every response PDU, which the initiator uses to determine when
>>>> the command sequence number window is open so new non-immediate
>>>> commands may be sent to the target.
>>>>
>>>> So, given that some manner of session-wide synchronization is required
>>>> between different contexts for the existing single connection case to
>>>> update the command sequence number and check when the window opens,
>>>> it's a fallacy to claim MC/S adds some type of new initiator-specific
>>>> synchronization overhead vs. single connection code.
>>>
>>> I think you are assuming we are leaving the iSCSI code as it is today.
>>>
>>> For the non-MC/S MQ session-per-CPU design, we would be allocating and
>>> binding the session and its resources to specific CPUs. They would only
>>> be accessed by the threads on that one CPU, so we get our
>>> serialization/synchronization from that. That is why we are saying we
>>> do not need something like atomic_t/spin_locks for the sequence number
>>> handling for this type of implementation.
>>>
>> Wouldn't that need to be coordinated with the networking layer?
>
> Yes.
>
>> Doesn't it do the same thing, matching TX/RX queues to CPUs?
>
> Yes.
>

Hey Hannes, Mike,

I would say there is no need for specific coordination from the iSCSI
PoV. This is exactly what flow steering is designed for. As I see it,
in order to get TX/RX to match rings, the user can attach 5-tuple rules
(using standard ethtool) to steer each connection's packets to the
right rings; see the example rule at the end of this mail.

Sagi.

>> If so, wouldn't we decrease bandwidth by restricting things to one CPU?
>
> We have a session or connection per CPU though, so we end up hitting
> the same problem you talked about last year, where one hctx (iSCSI
> session or connection's socket or NIC hw queue) could get overloaded.
> This is what I meant in my original mail, where iSCSI would rely on
> whatever blk-mq load balancers we end up implementing at that layer to
> balance requests across hctxs.
>

I'm not sure I understand. The submission flow is bound to a CPU. In
the current single-queue model, both CPU X and CPU Y end up using a
single socket. In the multi-queue solution, CPU X will go to socket X
and CPU Y will go to socket Y. This is equal to what we have today (if
only CPU X is active) or better (if more CPUs are active).

Am I missing something?

Sagi.
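
P.S. To make the CmdSN window mechanics quoted above concrete, here is
a minimal, untested sketch (not the in-tree code; the names are made
up for illustration) of the check every submission context has to
perform. Per RFC 3720, the window is open while CmdSN <= MaxCmdSN in
32-bit serial number arithmetic (RFC 1982), with MaxCmdSN refreshed
from every response PDU:

#include <stdbool.h>
#include <stdint.h>

/* Serial-number "less than or equal" over a wrapping 32-bit space,
 * which is how iSCSI sequence numbers compare (RFC 1982). */
static inline bool sna_lte(uint32_t n1, uint32_t n2)
{
        return n1 == n2 || (int32_t)(n1 - n2) < 0;
}

/* A new non-immediate command may be sent only while the window is
 * open; whatever context assigns the next CmdSN has to observe this
 * check, which is the session-wide synchronization point being
 * debated in this thread. */
static inline bool cmdsn_window_open(uint32_t next_cmdsn,
                                     uint32_t max_cmdsn)
{
        return sna_lte(next_cmdsn, max_cmdsn);
}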
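
P.P.S. As a usage illustration of the flow steering mentioned above
(the interface name, IP, ports, and queue number are all made up), a
rule that steers the RX traffic of one iSCSI connection, matched by
its 5-tuple with the target listening on port 3260, to RX ring 2 could
look something like:

  ethtool -N eth2 flow-type tcp4 src-ip 10.0.0.1 src-port 3260 \
          dst-port 33001 action 2

One such rule per connection, with each ring's interrupt affined to
the CPU that owns the corresponding session, keeps a connection's RX
completions on the same CPU that does its submissions.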