From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 203B4C31E40 for ; Fri, 9 Aug 2019 12:32:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id EA1262166E for ; Fri, 9 Aug 2019 12:32:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2406798AbfHIMcg (ORCPT ); Fri, 9 Aug 2019 08:32:36 -0400 Received: from p3plsmtpa11-07.prod.phx3.secureserver.net ([68.178.252.108]:51202 "EHLO p3plsmtpa11-07.prod.phx3.secureserver.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2406823AbfHIMcg (ORCPT ); Fri, 9 Aug 2019 08:32:36 -0400 X-Greylist: delayed 317 seconds by postgrey-1.27 at vger.kernel.org; Fri, 09 Aug 2019 08:32:35 EDT Received: from [192.168.0.56] ([24.218.182.144]) by :SMTPAUTH: with ESMTPSA id w44QhNivaEs2pw44QhG2ZY; Fri, 09 Aug 2019 05:32:34 -0700 Subject: Re: [PATCH for-rc] siw: MPA Reply handler tries to read beyond MPA message To: Krishnamraju Eraparaju , Bernard Metzler Cc: "jgg@ziepe.ca" , "linux-rdma@vger.kernel.org" , Potnuri Bharat Teja , Nirranjan Kirubaharan References: <20190805172605.GA5549@chelsio.com> <8499b96a-48dd-1286-ea0f-e66be34afffa@talpey.com> <20190731103310.23199-1-krishna2@chelsio.com> <518b1734-5d72-e32d-376b-0fec1cbce8f5@talpey.com> <20190808164638.GA7795@chelsio.com> From: Tom Talpey Message-ID: <14285740-dfe9-1a6c-8964-bf2590400f51@talpey.com> Date: Fri, 9 Aug 2019 08:32:33 -0400 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 MIME-Version: 1.0 In-Reply-To: <20190808164638.GA7795@chelsio.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-CMAE-Envelope: MS4wfA6srflIWh7zIq5L2HW3MKmyxSTYWm/gZipJ1ngdb4msLSC3MwydlRk0IjWaHa58u9OXkYeY/1EKNKZoeJjiYDv8c4EhF+A92KPwyyITSbCNSiRkRIpW ZkZtXcMXCaan2XlLy955nMlZ9o8B6PTC8t1GkmMMZdB0+L4X15boeOTBO3j7IjxTAjee6lg1odKGgRaYaAGuqeGql3UTXtxN8Tk0DFsbvx1WfxnNQFXq3588 mXFCnvBFITJZaOas700J1uoRT4IVXVXvUHaOU62YSyzgCULz7R6NtEtA8ajimo39k3cRA2z+/pPD8jhoOTmZaw== Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org On 8/8/2019 12:46 PM, Krishnamraju Eraparaju wrote: > On Thursday, August 08/08/19, 2019 at 15:05:12 +0000, Bernard Metzler wrote: >> -----"Krishnamraju Eraparaju" wrote: ----- >> >>> To: "Tom Talpey" , >>> From: "Krishnamraju Eraparaju" >>> Date: 08/05/2019 07:26PM >>> Cc: "jgg@ziepe.ca" , "linux-rdma@vger.kernel.org" >>> , "Potnuri Bharat Teja" >>> , "Nirranjan Kirubaharan" >>> Subject: [EXTERNAL] Re: [PATCH for-rc] siw: MPA Reply handler tries >>> to read beyond MPA message >>> >>> On Friday, August 08/02/19, 2019 at 18:17:37 +0530, Tom Talpey wrote: >>>> On 8/2/2019 7:18 AM, Bernard Metzler wrote: >>>>> -----"Tom Talpey" wrote: ----- >>>>> >>>>>> To: "Bernard Metzler" , "Krishnamraju >>> Eraparaju" >>>>>> >>>>>> From: "Tom Talpey" >>>>>> Date: 08/01/2019 08:54PM >>>>>> Cc: jgg@ziepe.ca, linux-rdma@vger.kernel.org, >>> bharat@chelsio.com, >>>>>> nirranjan@chelsio.com, krishn2@chelsio.com >>>>>> Subject: [EXTERNAL] Re: [PATCH for-rc] siw: MPA Reply handler >>> tries >>>>>> to read beyond MPA message >>>>>> >>>>>> On 8/1/2019 6:56 AM, Bernard Metzler wrote: >>>>>>> -----"Krishnamraju Eraparaju" wrote: >>> ----- >>>>>>> >>>>>>>> To: jgg@ziepe.ca, bmt@zurich.ibm.com >>>>>>>> From: "Krishnamraju Eraparaju" >>>>>>>> Date: 07/31/2019 12:34PM >>>>>>>> Cc: linux-rdma@vger.kernel.org, bharat@chelsio.com, >>>>>>>> nirranjan@chelsio.com, krishn2@chelsio.com, "Krishnamraju >>>>>> Eraparaju" >>>>>>>> >>>>>>>> Subject: [EXTERNAL] [PATCH for-rc] siw: MPA Reply handler >>> tries to >>>>>>>> read beyond MPA message >>>>>>>> >>>>>>>> while processing MPA Reply, SIW driver is trying to read extra >>> 4 >>>>>>>> bytes >>>>>>>> than what peer has advertised as private data length. >>>>>>>> >>>>>>>> If a FPDU data is received before even siw_recv_mpa_rr() >>> completed >>>>>>>> reading MPA reply, then ksock_recv() in siw_recv_mpa_rr() >>> could >>>>>> also >>>>>>>> read FPDU, if "size" is larger than advertised MPA reply >>> length. >>>>>>>> >>>>>>>> 501 static int siw_recv_mpa_rr(struct siw_cep *cep) >>>>>>>> 502 { >>>>>>>> ............. >>>>>>>> 572 >>>>>>>> 573 if (rcvd > to_rcv) >>>>>>>> 574 return -EPROTO; <----- Failure here >>>>>>>> >>>>>>>> Looks like the intention here is to throw an ERROR if the >>> received >>>>>>>> data >>>>>>>> is more than the total private data length advertised by the >>> peer. >>>>>>>> But >>>>>>>> reading beyond MPA message causes siw_cm to generate >>>>>>>> RDMA_CM_EVENT_CONNECT_ERROR event when TCP socket recv buffer >>> is >>>>>>>> already >>>>>>>> queued with FPDU messages. >>>>>>>> >>>>>>>> Hence, this function should only read upto private data >>> length. >>>>>>>> >>>>>>>> Signed-off-by: Krishnamraju Eraparaju >>>>>>>> --- >>>>>>>> drivers/infiniband/sw/siw/siw_cm.c | 4 ++-- >>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>>>>>> >>>>>>>> diff --git a/drivers/infiniband/sw/siw/siw_cm.c >>>>>>>> b/drivers/infiniband/sw/siw/siw_cm.c >>>>>>>> index a7cde98e73e8..8dc8cea2566c 100644 >>>>>>>> --- a/drivers/infiniband/sw/siw/siw_cm.c >>>>>>>> +++ b/drivers/infiniband/sw/siw/siw_cm.c >>>>>>>> @@ -559,13 +559,13 @@ static int siw_recv_mpa_rr(struct >>> siw_cep >>>>>> *cep) >>>>>>>> * A private data buffer gets allocated if hdr->params.pd_len >>> != >>>>>> 0. >>>>>>>> */ >>>>>>>> if (!cep->mpa.pdata) { >>>>>>>> - cep->mpa.pdata = kmalloc(pd_len + 4, GFP_KERNEL); >>>>>>>> + cep->mpa.pdata = kmalloc(pd_len, GFP_KERNEL); >>>>>>>> if (!cep->mpa.pdata) >>>>>>>> return -ENOMEM; >>>>>>>> } >>>>>>>> rcvd = ksock_recv( >>>>>>>> s, cep->mpa.pdata + cep->mpa.bytes_rcvd - sizeof(struct >>> mpa_rr), >>>>>>>> - to_rcv + 4, MSG_DONTWAIT); >>>>>>>> + to_rcv, MSG_DONTWAIT); >>>>>>>> >>>>>>>> if (rcvd < 0) >>>>>>>> return rcvd; >>>>>>>> -- >>>>>>>> 2.23.0.rc0 >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> The intention of this code is to make sure the >>>>>>> peer does not violates the MPA handshake rules. >>>>>>> The initiator MUST NOT send extra data after its >>>>>>> MPA request and before receiving the MPA reply. >>>>>> >>>>>> I think this is true only for MPA v2. With MPA v1, the >>>>>> initiator can begin sending immediately (before receiving >>>>>> the MPA reply), because there is no actual negotiation at >>>>>> the MPA layer. >>>>>> >>>>>> With MPA v2, the negotiation exchange is required because >>>>>> the type of the following message is predicated by the >>>>>> "Enhanced mode" a|b|c|d flags present in the first 32 bits >>>>>> of the private data buffer. >>>>>> >>>>>> So, it seems to me that additional logic is needed here to >>>>>> determine the MPA version, before sniffing additional octets >>>>> >from the incoming stream? >>>>>> >>>>>> Tom. >>>>>> >>>>> >>>>> There still is the marker negotiation taking place. >>>>> RFC 5044 says in section 7.1.2: >>>>> >>>>> "Note: Since the receiver's ability to deal with Markers is >>>>> unknown until the Request and Reply Frames have been >>>>> received, sending FPDUs before this occurs is not possible." >>>>> >>>>> This section further discusses the responder's behavior, >>>>> where it MUST receive a first FPDU from the initiator >>>>> before sending its first FPDU: >>>>> >>>>> "4. MPA Responder mode implementations MUST receive and validate >>> at >>>>> least one FPDU before sending any FPDUs or Markers." >>>>> >>>>> So it appears with MPA version 1, the current siw >>>>> code is correct. The initiator is entering FPDU mode >>>>> first, and only after receiving the MPA reply frame. >>>>> Only after the initiator sent a first FPDU, the responder >>>>> can start using the connection in FPDU mode. >>>>> Because of this somehow broken connection establishment >>>>> procedure (only the initiator can start sending data), a >>>>> later MPA version makes this first FPDU exchange explicit >>>>> and selectable (zero length READ/WRITE/Send). >>>> >>>> Yeah, I guess so. Because nobody ever actually implemented markers, >>>> I think that they may more or less passively ignore this. But >>> you're >>>> currect that it's invalid protocol behavior. >>>> >>>> If your testing didn't uncover any issues with existing >>> implementations >>>> failing to connect with your strict checking, I'm ok with it. >>> Tom & Bernard, >>> Thanks for the insight on MPA negotiation. >>> >>> Could the below patch be considered as a proper fix? >>> >>> diff --git a/drivers/infiniband/sw/siw/siw_cm.c >>> b/drivers/infiniband/sw/siw/siw_cm.c >>> index 9ce8a1b925d2..0aec1b5212f9 100644 >>> --- a/drivers/infiniband/sw/siw/siw_cm.c >>> +++ b/drivers/infiniband/sw/siw/siw_cm.c >>> @@ -503,6 +503,7 @@ static int siw_recv_mpa_rr(struct siw_cep *cep) >>> struct socket *s = cep->sock; >>> u16 pd_len; >>> int rcvd, to_rcv; >>> + int extra_data_check = 4; /* 4Bytes, for MPA rules violation >>> checking */ >>> >>> if (cep->mpa.bytes_rcvd < sizeof(struct mpa_rr)) { >>> rcvd = ksock_recv(s, (char *)hdr + >>> cep->mpa.bytes_rcvd, >>> @@ -553,23 +554,37 @@ static int siw_recv_mpa_rr(struct siw_cep *cep) >>> return -EPROTO; >>> } >>> >>> + /* >>> + * Peer must not send any extra data other than MPA messages >>> until MPA >>> + * negotiation is completed, an exception is MPA V2 >>> client-server Mode, >>> + * IE, in this mode the peer after sending MPA Reply can >>> immediately >>> + * start sending data in RDMA mode. >>> + */ >> >> This is unfortunately not true. The responder NEVER sends >> an FPDU without having seen an FPDU from the initiator. >> I just checked RFC 6581 again. The RTR (ready-to-receive) >> indication from the initiator is still needed, but now >> provided by the protocol and not the application: w/o >> 'enhanced MPA setup', the initiator sends the first >> FPDU as an application message. With 'enhanced MPA setup', >> the initiator protocol entity sends (w/o application >> interaction) a zero length READ/WRITE/Send as a first FPDU, >> as previously negotiated with the responder. Again: only >> after the responder has seen the first FPDU, it can >> start sending in FPDU mode. > If I understand your statements correctly: in MPA v2 clint-server mode, > the responder application should have some logic to wait(after ESTABLISH > event) until the first FPDU message is received from the initiator? Krishna, this is not the application's responsibility. This is MPA, and the iWARP provider's domain. Tom. > > Thanks, > Krishna. >> >> Sorry for the confusion. But the current >> siw code appears to be just correct. >> >> Thanks >> Bernard >> >> >> >>> + if ((__mpa_rr_revision(cep->mpa.hdr.params.bits) == >>> MPA_REVISION_2) && >>> + (cep->state == SIW_EPSTATE_AWAIT_MPAREP)) { >>> + int mpa_p2p_mode = cep->mpa.v2_ctrl_req.ord & >>> + (MPA_V2_RDMA_WRITE_RTR | >>> MPA_V2_RDMA_READ_RTR); >>> + if (!mpa_p2p_mode) >>> + extra_data_check = 0; >>> + } >>> + >>> /* >>> * At this point, we must have hdr->params.pd_len != 0. >>> * A private data buffer gets allocated if hdr->params.pd_len >>> != >>> * 0. >>> */ >>> if (!cep->mpa.pdata) { >>> - cep->mpa.pdata = kmalloc(pd_len + 4, GFP_KERNEL); >>> + cep->mpa.pdata = kmalloc(pd_len + extra_data_check, >>> GFP_KERNEL); >>> if (!cep->mpa.pdata) >>> return -ENOMEM; >>> } >>> rcvd = ksock_recv( >>> s, cep->mpa.pdata + cep->mpa.bytes_rcvd - >>> sizeof(struct >>> mpa_rr), >>> - to_rcv + 4, MSG_DONTWAIT); >>> + to_rcv + extra_data_check, MSG_DONTWAIT); >>> >>> if (rcvd < 0) >>> return rcvd; >>> >>> - if (rcvd > to_rcv) >>> + if (extra_data_check && (rcvd > to_rcv)) >>> return -EPROTO; >>> >>> cep->mpa.bytes_rcvd += rcvd; >>> >>> -Krishna. >>>> >>>> Tom. >>>> >>>> >>>> >>>>> >>>>> Bernard. >>>>> >>>>>> >>>>>>> So, for the MPA request case, this code is needed >>>>>>> to check for protocol correctness. >>>>>>> You are right for the MPA reply case - if we are >>>>>>> _not_ in peer2peer mode, the peer can immediately >>>>>>> start sending data in RDMA mode after its MPA Reply. >>>>>>> So we shall add appropriate code to be more specific >>>>>>> For an error, we are (1) processing an MPA Request, >>>>>>> OR (2) processing an MPA Reply AND we are not expected >>>>>>> to send an initial READ/WRITE/Send as negotiated with >>>>>>> the peer (peer2peer mode MPA handshake). >>>>>>> >>>>>>> Just removing this check would make siw more permissive, >>>>>>> but to a point where peer MPA protocol errors are >>>>>>> tolerated. I am not in favor of that level of >>>>>>> forgiveness. >>>>>>> >>>>>>> If possible, please provide an appropriate patch >>>>>>> or (if it causes current issues with another peer >>>>>>> iWarp implementation) just run in MPA peer2peer mode, >>>>>>> where the current check is appropriate. >>>>>>> Otherwise, I would provide an appropriate fix by Monday >>>>>>> (I am still out of office this week). >>>>>>> >>>>>>> >>>>>>> Many thanks and best regards, >>>>>>> Bernard. >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>> >>> >> > >