linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tom Talpey <tom@talpey.com>
To: Krishnamraju Eraparaju <krishna2@chelsio.com>,
	Bernard Metzler <BMT@zurich.ibm.com>
Cc: "jgg@ziepe.ca" <jgg@ziepe.ca>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	Potnuri Bharat Teja <bharat@chelsio.com>,
	Nirranjan Kirubaharan <nirranjan@chelsio.com>
Subject: Re: [PATCH for-rc] siw: MPA Reply handler tries to read beyond MPA message
Date: Fri, 9 Aug 2019 08:32:33 -0400	[thread overview]
Message-ID: <14285740-dfe9-1a6c-8964-bf2590400f51@talpey.com> (raw)
In-Reply-To: <20190808164638.GA7795@chelsio.com>

On 8/8/2019 12:46 PM, Krishnamraju Eraparaju wrote:
> On Thursday, August 08/08/19, 2019 at 15:05:12 +0000, Bernard Metzler wrote:
>> -----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote: -----
>>
>>> To: "Tom Talpey" <tom@talpey.com>, <BMT@zurich.ibm.com>
>>> From: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
>>> Date: 08/05/2019 07:26PM
>>> Cc: "jgg@ziepe.ca" <jgg@ziepe.ca>, "linux-rdma@vger.kernel.org"
>>> <linux-rdma@vger.kernel.org>, "Potnuri Bharat Teja"
>>> <bharat@chelsio.com>, "Nirranjan Kirubaharan" <nirranjan@chelsio.com>
>>> Subject: [EXTERNAL] Re: [PATCH for-rc] siw: MPA Reply handler tries
>>> to read beyond MPA message
>>>
>>> On Friday, August 08/02/19, 2019 at 18:17:37 +0530, Tom Talpey wrote:
>>>> On 8/2/2019 7:18 AM, Bernard Metzler wrote:
>>>>> -----"Tom Talpey" <tom@talpey.com> wrote: -----
>>>>>
>>>>>> To: "Bernard Metzler" <BMT@zurich.ibm.com>, "Krishnamraju
>>> Eraparaju"
>>>>>> <krishna2@chelsio.com>
>>>>>> From: "Tom Talpey" <tom@talpey.com>
>>>>>> Date: 08/01/2019 08:54PM
>>>>>> Cc: jgg@ziepe.ca, linux-rdma@vger.kernel.org,
>>> bharat@chelsio.com,
>>>>>> nirranjan@chelsio.com, krishn2@chelsio.com
>>>>>> Subject: [EXTERNAL] Re: [PATCH for-rc] siw: MPA Reply handler
>>> tries
>>>>>> to read beyond MPA message
>>>>>>
>>>>>> On 8/1/2019 6:56 AM, Bernard Metzler wrote:
>>>>>>> -----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote:
>>> -----
>>>>>>>
>>>>>>>> To: jgg@ziepe.ca, bmt@zurich.ibm.com
>>>>>>>> From: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
>>>>>>>> Date: 07/31/2019 12:34PM
>>>>>>>> Cc: linux-rdma@vger.kernel.org, bharat@chelsio.com,
>>>>>>>> nirranjan@chelsio.com, krishn2@chelsio.com, "Krishnamraju
>>>>>> Eraparaju"
>>>>>>>> <krishna2@chelsio.com>
>>>>>>>> Subject: [EXTERNAL] [PATCH for-rc] siw: MPA Reply handler
>>> tries to
>>>>>>>> read beyond MPA message
>>>>>>>>
>>>>>>>> while processing MPA Reply, SIW driver is trying to read extra
>>> 4
>>>>>>>> bytes
>>>>>>>> than what peer has advertised as private data length.
>>>>>>>>
>>>>>>>> If a FPDU data is received before even siw_recv_mpa_rr()
>>> completed
>>>>>>>> reading MPA reply, then ksock_recv() in siw_recv_mpa_rr()
>>> could
>>>>>> also
>>>>>>>> read FPDU, if "size" is larger than advertised MPA reply
>>> length.
>>>>>>>>
>>>>>>>> 501 static int siw_recv_mpa_rr(struct siw_cep *cep)
>>>>>>>> 502 {
>>>>>>>>             .............
>>>>>>>> 572
>>>>>>>> 573         if (rcvd > to_rcv)
>>>>>>>> 574                 return -EPROTO;   <----- Failure here
>>>>>>>>
>>>>>>>> Looks like the intention here is to throw an ERROR if the
>>> received
>>>>>>>> data
>>>>>>>> is more than the total private data length advertised by the
>>> peer.
>>>>>>>> But
>>>>>>>> reading beyond MPA message causes siw_cm to generate
>>>>>>>> RDMA_CM_EVENT_CONNECT_ERROR event when TCP socket recv buffer
>>> is
>>>>>>>> already
>>>>>>>> queued with FPDU messages.
>>>>>>>>
>>>>>>>> Hence, this function should only read upto private data
>>> length.
>>>>>>>>
>>>>>>>> Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
>>>>>>>> ---
>>>>>>>> drivers/infiniband/sw/siw/siw_cm.c | 4 ++--
>>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/infiniband/sw/siw/siw_cm.c
>>>>>>>> b/drivers/infiniband/sw/siw/siw_cm.c
>>>>>>>> index a7cde98e73e8..8dc8cea2566c 100644
>>>>>>>> --- a/drivers/infiniband/sw/siw/siw_cm.c
>>>>>>>> +++ b/drivers/infiniband/sw/siw/siw_cm.c
>>>>>>>> @@ -559,13 +559,13 @@ static int siw_recv_mpa_rr(struct
>>> siw_cep
>>>>>> *cep)
>>>>>>>> 	 * A private data buffer gets allocated if hdr->params.pd_len
>>> !=
>>>>>> 0.
>>>>>>>> 	 */
>>>>>>>> 	if (!cep->mpa.pdata) {
>>>>>>>> -		cep->mpa.pdata = kmalloc(pd_len + 4, GFP_KERNEL);
>>>>>>>> +		cep->mpa.pdata = kmalloc(pd_len, GFP_KERNEL);
>>>>>>>> 		if (!cep->mpa.pdata)
>>>>>>>> 			return -ENOMEM;
>>>>>>>> 	}
>>>>>>>> 	rcvd = ksock_recv(
>>>>>>>> 		s, cep->mpa.pdata + cep->mpa.bytes_rcvd - sizeof(struct
>>> mpa_rr),
>>>>>>>> -		to_rcv + 4, MSG_DONTWAIT);
>>>>>>>> +		to_rcv, MSG_DONTWAIT);
>>>>>>>>
>>>>>>>> 	if (rcvd < 0)
>>>>>>>> 		return rcvd;
>>>>>>>> -- 
>>>>>>>> 2.23.0.rc0
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> The intention of this code is to make sure the
>>>>>>> peer does not violates the MPA handshake rules.
>>>>>>> The initiator MUST NOT send extra data after its
>>>>>>> MPA request and before receiving the MPA reply.
>>>>>>
>>>>>> I think this is true only for MPA v2. With MPA v1, the
>>>>>> initiator can begin sending immediately (before receiving
>>>>>> the MPA reply), because there is no actual negotiation at
>>>>>> the MPA layer.
>>>>>>
>>>>>> With MPA v2, the negotiation exchange is required because
>>>>>> the type of the following message is predicated by the
>>>>>> "Enhanced mode" a|b|c|d flags present in the first 32 bits
>>>>>> of the private data buffer.
>>>>>>
>>>>>> So, it seems to me that additional logic is needed here to
>>>>>> determine the MPA version, before sniffing additional octets
>>>>> >from the incoming stream?
>>>>>>
>>>>>> Tom.
>>>>>>
>>>>>
>>>>> There still is the marker negotiation taking place.
>>>>> RFC 5044 says in section 7.1.2:
>>>>>
>>>>> "Note: Since the receiver's ability to deal with Markers is
>>>>>    unknown until the Request and Reply Frames have been
>>>>>    received, sending FPDUs before this occurs is not possible."
>>>>>
>>>>> This section further discusses the responder's behavior,
>>>>> where it MUST receive a first FPDU from the initiator
>>>>> before sending its first FPDU:
>>>>>
>>>>> "4.  MPA Responder mode implementations MUST receive and validate
>>> at
>>>>>          least one FPDU before sending any FPDUs or Markers."
>>>>>
>>>>> So it appears with MPA version 1, the current siw
>>>>> code is correct. The initiator is entering FPDU mode
>>>>> first, and only after receiving the MPA reply frame.
>>>>> Only after the initiator sent a first FPDU, the responder
>>>>> can start using the connection in FPDU mode.
>>>>> Because of this somehow broken connection establishment
>>>>> procedure (only the initiator can start sending data), a
>>>>> later MPA version makes this first FPDU exchange explicit
>>>>> and selectable (zero length READ/WRITE/Send).
>>>>
>>>> Yeah, I guess so. Because nobody ever actually implemented markers,
>>>> I think that they may more or less passively ignore this. But
>>> you're
>>>> currect that it's invalid protocol behavior.
>>>>
>>>> If your testing didn't uncover any issues with existing
>>> implementations
>>>> failing to connect with your strict checking, I'm ok with it.
>>> Tom & Bernard,
>>> Thanks for the insight on MPA negotiation.
>>>
>>> Could the below patch be considered as a proper fix?
>>>
>>> diff --git a/drivers/infiniband/sw/siw/siw_cm.c
>>> b/drivers/infiniband/sw/siw/siw_cm.c
>>> index 9ce8a1b925d2..0aec1b5212f9 100644
>>> --- a/drivers/infiniband/sw/siw/siw_cm.c
>>> +++ b/drivers/infiniband/sw/siw/siw_cm.c
>>> @@ -503,6 +503,7 @@ static int siw_recv_mpa_rr(struct siw_cep *cep)
>>>         struct socket *s = cep->sock;
>>>         u16 pd_len;
>>>         int rcvd, to_rcv;
>>> +       int extra_data_check = 4; /* 4Bytes, for MPA rules violation
>>> checking */
>>>
>>>         if (cep->mpa.bytes_rcvd < sizeof(struct mpa_rr)) {
>>>                 rcvd = ksock_recv(s, (char *)hdr +
>>> cep->mpa.bytes_rcvd,
>>> @@ -553,23 +554,37 @@ static int siw_recv_mpa_rr(struct siw_cep *cep)
>>>                 return -EPROTO;
>>>         }
>>>
>>> +       /*
>>> +        * Peer must not send any extra data other than MPA messages
>>> until MPA
>>> +        * negotiation is completed, an exception is MPA V2
>>> client-server Mode,
>>> +        * IE, in this mode the peer after sending MPA Reply can
>>> immediately
>>> +        * start sending data in RDMA mode.
>>> +        */
>>
>> This is unfortunately not true. The responder NEVER sends
>> an FPDU without having seen an FPDU from the initiator.
>> I just checked RFC 6581 again. The RTR (ready-to-receive)
>> indication from the initiator is still needed, but now
>> provided by the protocol and not the application: w/o
>> 'enhanced MPA setup', the initiator sends the first
>> FPDU as an application message. With 'enhanced MPA setup',
>> the initiator protocol entity sends (w/o application
>> interaction) a zero length READ/WRITE/Send as a first FPDU,
>> as previously negotiated with the responder. Again: only
>> after the responder has seen the first FPDU, it can
>> start sending in FPDU mode.
> If I understand your statements correctly: in MPA v2 clint-server mode,
> the responder application should have some logic to wait(after ESTABLISH
> event) until the first FPDU message is received from the initiator?

Krishna, this is not the application's responsibility. This is MPA,
and the iWARP provider's domain.

Tom.

> 
> Thanks,
> Krishna.
>>
>> Sorry for the confusion. But the current
>> siw code appears to be just correct.
>>
>> Thanks
>> Bernard
>>
>>
>>
>>> +       if ((__mpa_rr_revision(cep->mpa.hdr.params.bits) ==
>>> MPA_REVISION_2) &&
>>> +               (cep->state == SIW_EPSTATE_AWAIT_MPAREP)) {
>>> +               int mpa_p2p_mode = cep->mpa.v2_ctrl_req.ord &
>>> +                               (MPA_V2_RDMA_WRITE_RTR |
>>> MPA_V2_RDMA_READ_RTR);
>>> +               if (!mpa_p2p_mode)
>>> +                       extra_data_check = 0;
>>> +       }
>>> +
>>>         /*
>>>          * At this point, we must have hdr->params.pd_len != 0.
>>>          * A private data buffer gets allocated if hdr->params.pd_len
>>> !=
>>>          * 0.
>>>          */
>>>         if (!cep->mpa.pdata) {
>>> -               cep->mpa.pdata = kmalloc(pd_len + 4, GFP_KERNEL);
>>> +               cep->mpa.pdata = kmalloc(pd_len + extra_data_check,
>>> GFP_KERNEL);
>>>                 if (!cep->mpa.pdata)
>>>                         return -ENOMEM;
>>>         }
>>>         rcvd = ksock_recv(
>>>                 s, cep->mpa.pdata + cep->mpa.bytes_rcvd -
>>> sizeof(struct
>>> mpa_rr),
>>> -               to_rcv + 4, MSG_DONTWAIT);
>>> +               to_rcv + extra_data_check, MSG_DONTWAIT);
>>>
>>>         if (rcvd < 0)
>>>                 return rcvd;
>>>
>>> -       if (rcvd > to_rcv)
>>> +       if (extra_data_check && (rcvd > to_rcv))
>>>                 return -EPROTO;
>>>
>>>         cep->mpa.bytes_rcvd += rcvd;
>>>
>>> -Krishna.
>>>>
>>>> Tom.
>>>>
>>>>
>>>>
>>>>>
>>>>> Bernard.
>>>>>
>>>>>>
>>>>>>> So, for the MPA request case, this code is needed
>>>>>>> to check for protocol correctness.
>>>>>>> You are right for the MPA reply case - if we are
>>>>>>> _not_ in peer2peer mode, the peer can immediately
>>>>>>> start sending data in RDMA mode after its MPA Reply.
>>>>>>> So we shall add appropriate code to be more specific
>>>>>>> For an error, we are (1) processing an MPA Request,
>>>>>>> OR (2) processing an MPA Reply AND we are not expected
>>>>>>> to send an initial READ/WRITE/Send as negotiated with
>>>>>>> the peer (peer2peer mode MPA handshake).
>>>>>>>
>>>>>>> Just removing this check would make siw more permissive,
>>>>>>> but to a point where peer MPA protocol errors are
>>>>>>> tolerated. I am not in favor of that level of
>>>>>>> forgiveness.
>>>>>>>
>>>>>>> If possible, please provide an appropriate patch
>>>>>>> or (if it causes current issues with another peer
>>>>>>> iWarp implementation) just run in MPA peer2peer mode,
>>>>>>> where the current check is appropriate.
>>>>>>> Otherwise, I would provide an appropriate fix by Monday
>>>>>>> (I am still out of office this week).
>>>>>>>
>>>>>>>
>>>>>>> Many thanks and best regards,
>>>>>>> Bernard.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>
> 
> 

  reply	other threads:[~2019-08-09 12:32 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-31 10:33 [PATCH for-rc] siw: MPA Reply handler tries to read beyond MPA message Krishnamraju Eraparaju
2019-07-31 19:17 ` Doug Ledford
2019-08-01 11:34   ` Krishnamraju Eraparaju
2019-08-01 10:56 ` Bernard Metzler
2019-08-01 15:36   ` Doug Ledford
2019-08-01 18:53   ` Tom Talpey
2019-08-02 11:18   ` Bernard Metzler
2019-08-02 12:47     ` Tom Talpey
2019-08-08 15:05       ` Bernard Metzler
2019-08-08 16:46         ` Krishnamraju Eraparaju
2019-08-09 12:32           ` Tom Talpey [this message]
2019-08-09 10:29         ` Bernard Metzler
2019-08-09 12:27           ` Tom Talpey
2019-08-09 13:52           ` Bernard Metzler
2019-08-09 20:35             ` Tom Talpey
2019-08-12  9:58               ` Krishnamraju Eraparaju
2019-08-12 12:56               ` Bernard Metzler
2019-08-13  8:05                 ` Krishnamraju Eraparaju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=14285740-dfe9-1a6c-8964-bf2590400f51@talpey.com \
    --to=tom@talpey.com \
    --cc=BMT@zurich.ibm.com \
    --cc=bharat@chelsio.com \
    --cc=jgg@ziepe.ca \
    --cc=krishna2@chelsio.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=nirranjan@chelsio.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).