From: Tom Talpey <tom@talpey.com>
To: Krishnamraju Eraparaju <krishna2@chelsio.com>,
Bernard Metzler <BMT@zurich.ibm.com>
Cc: "jgg@ziepe.ca" <jgg@ziepe.ca>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
Potnuri Bharat Teja <bharat@chelsio.com>,
Nirranjan Kirubaharan <nirranjan@chelsio.com>
Subject: Re: [PATCH for-rc] siw: MPA Reply handler tries to read beyond MPA message
Date: Fri, 9 Aug 2019 08:32:33 -0400 [thread overview]
Message-ID: <14285740-dfe9-1a6c-8964-bf2590400f51@talpey.com> (raw)
In-Reply-To: <20190808164638.GA7795@chelsio.com>
On 8/8/2019 12:46 PM, Krishnamraju Eraparaju wrote:
> On Thursday, August 08/08/19, 2019 at 15:05:12 +0000, Bernard Metzler wrote:
>> -----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote: -----
>>
>>> To: "Tom Talpey" <tom@talpey.com>, <BMT@zurich.ibm.com>
>>> From: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
>>> Date: 08/05/2019 07:26PM
>>> Cc: "jgg@ziepe.ca" <jgg@ziepe.ca>, "linux-rdma@vger.kernel.org"
>>> <linux-rdma@vger.kernel.org>, "Potnuri Bharat Teja"
>>> <bharat@chelsio.com>, "Nirranjan Kirubaharan" <nirranjan@chelsio.com>
>>> Subject: [EXTERNAL] Re: [PATCH for-rc] siw: MPA Reply handler tries
>>> to read beyond MPA message
>>>
>>> On Friday, August 08/02/19, 2019 at 18:17:37 +0530, Tom Talpey wrote:
>>>> On 8/2/2019 7:18 AM, Bernard Metzler wrote:
>>>>> -----"Tom Talpey" <tom@talpey.com> wrote: -----
>>>>>
>>>>>> To: "Bernard Metzler" <BMT@zurich.ibm.com>, "Krishnamraju
>>> Eraparaju"
>>>>>> <krishna2@chelsio.com>
>>>>>> From: "Tom Talpey" <tom@talpey.com>
>>>>>> Date: 08/01/2019 08:54PM
>>>>>> Cc: jgg@ziepe.ca, linux-rdma@vger.kernel.org,
>>> bharat@chelsio.com,
>>>>>> nirranjan@chelsio.com, krishn2@chelsio.com
>>>>>> Subject: [EXTERNAL] Re: [PATCH for-rc] siw: MPA Reply handler
>>> tries
>>>>>> to read beyond MPA message
>>>>>>
>>>>>> On 8/1/2019 6:56 AM, Bernard Metzler wrote:
>>>>>>> -----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote:
>>> -----
>>>>>>>
>>>>>>>> To: jgg@ziepe.ca, bmt@zurich.ibm.com
>>>>>>>> From: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
>>>>>>>> Date: 07/31/2019 12:34PM
>>>>>>>> Cc: linux-rdma@vger.kernel.org, bharat@chelsio.com,
>>>>>>>> nirranjan@chelsio.com, krishn2@chelsio.com, "Krishnamraju
>>>>>> Eraparaju"
>>>>>>>> <krishna2@chelsio.com>
>>>>>>>> Subject: [EXTERNAL] [PATCH for-rc] siw: MPA Reply handler
>>> tries to
>>>>>>>> read beyond MPA message
>>>>>>>>
>>>>>>>> while processing MPA Reply, SIW driver is trying to read extra
>>> 4
>>>>>>>> bytes
>>>>>>>> than what peer has advertised as private data length.
>>>>>>>>
>>>>>>>> If a FPDU data is received before even siw_recv_mpa_rr()
>>> completed
>>>>>>>> reading MPA reply, then ksock_recv() in siw_recv_mpa_rr()
>>> could
>>>>>> also
>>>>>>>> read FPDU, if "size" is larger than advertised MPA reply
>>> length.
>>>>>>>>
>>>>>>>> 501 static int siw_recv_mpa_rr(struct siw_cep *cep)
>>>>>>>> 502 {
>>>>>>>> .............
>>>>>>>> 572
>>>>>>>> 573 if (rcvd > to_rcv)
>>>>>>>> 574 return -EPROTO; <----- Failure here
>>>>>>>>
>>>>>>>> Looks like the intention here is to throw an ERROR if the
>>> received
>>>>>>>> data
>>>>>>>> is more than the total private data length advertised by the
>>> peer.
>>>>>>>> But
>>>>>>>> reading beyond MPA message causes siw_cm to generate
>>>>>>>> RDMA_CM_EVENT_CONNECT_ERROR event when TCP socket recv buffer
>>> is
>>>>>>>> already
>>>>>>>> queued with FPDU messages.
>>>>>>>>
>>>>>>>> Hence, this function should only read upto private data
>>> length.
>>>>>>>>
>>>>>>>> Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
>>>>>>>> ---
>>>>>>>> drivers/infiniband/sw/siw/siw_cm.c | 4 ++--
>>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/infiniband/sw/siw/siw_cm.c
>>>>>>>> b/drivers/infiniband/sw/siw/siw_cm.c
>>>>>>>> index a7cde98e73e8..8dc8cea2566c 100644
>>>>>>>> --- a/drivers/infiniband/sw/siw/siw_cm.c
>>>>>>>> +++ b/drivers/infiniband/sw/siw/siw_cm.c
>>>>>>>> @@ -559,13 +559,13 @@ static int siw_recv_mpa_rr(struct
>>> siw_cep
>>>>>> *cep)
>>>>>>>> * A private data buffer gets allocated if hdr->params.pd_len
>>> !=
>>>>>> 0.
>>>>>>>> */
>>>>>>>> if (!cep->mpa.pdata) {
>>>>>>>> - cep->mpa.pdata = kmalloc(pd_len + 4, GFP_KERNEL);
>>>>>>>> + cep->mpa.pdata = kmalloc(pd_len, GFP_KERNEL);
>>>>>>>> if (!cep->mpa.pdata)
>>>>>>>> return -ENOMEM;
>>>>>>>> }
>>>>>>>> rcvd = ksock_recv(
>>>>>>>> s, cep->mpa.pdata + cep->mpa.bytes_rcvd - sizeof(struct
>>> mpa_rr),
>>>>>>>> - to_rcv + 4, MSG_DONTWAIT);
>>>>>>>> + to_rcv, MSG_DONTWAIT);
>>>>>>>>
>>>>>>>> if (rcvd < 0)
>>>>>>>> return rcvd;
>>>>>>>> --
>>>>>>>> 2.23.0.rc0
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> The intention of this code is to make sure the
>>>>>>> peer does not violates the MPA handshake rules.
>>>>>>> The initiator MUST NOT send extra data after its
>>>>>>> MPA request and before receiving the MPA reply.
>>>>>>
>>>>>> I think this is true only for MPA v2. With MPA v1, the
>>>>>> initiator can begin sending immediately (before receiving
>>>>>> the MPA reply), because there is no actual negotiation at
>>>>>> the MPA layer.
>>>>>>
>>>>>> With MPA v2, the negotiation exchange is required because
>>>>>> the type of the following message is predicated by the
>>>>>> "Enhanced mode" a|b|c|d flags present in the first 32 bits
>>>>>> of the private data buffer.
>>>>>>
>>>>>> So, it seems to me that additional logic is needed here to
>>>>>> determine the MPA version, before sniffing additional octets
>>>>> >from the incoming stream?
>>>>>>
>>>>>> Tom.
>>>>>>
>>>>>
>>>>> There still is the marker negotiation taking place.
>>>>> RFC 5044 says in section 7.1.2:
>>>>>
>>>>> "Note: Since the receiver's ability to deal with Markers is
>>>>> unknown until the Request and Reply Frames have been
>>>>> received, sending FPDUs before this occurs is not possible."
>>>>>
>>>>> This section further discusses the responder's behavior,
>>>>> where it MUST receive a first FPDU from the initiator
>>>>> before sending its first FPDU:
>>>>>
>>>>> "4. MPA Responder mode implementations MUST receive and validate
>>> at
>>>>> least one FPDU before sending any FPDUs or Markers."
>>>>>
>>>>> So it appears with MPA version 1, the current siw
>>>>> code is correct. The initiator is entering FPDU mode
>>>>> first, and only after receiving the MPA reply frame.
>>>>> Only after the initiator sent a first FPDU, the responder
>>>>> can start using the connection in FPDU mode.
>>>>> Because of this somehow broken connection establishment
>>>>> procedure (only the initiator can start sending data), a
>>>>> later MPA version makes this first FPDU exchange explicit
>>>>> and selectable (zero length READ/WRITE/Send).
>>>>
>>>> Yeah, I guess so. Because nobody ever actually implemented markers,
>>>> I think that they may more or less passively ignore this. But
>>> you're
>>>> currect that it's invalid protocol behavior.
>>>>
>>>> If your testing didn't uncover any issues with existing
>>> implementations
>>>> failing to connect with your strict checking, I'm ok with it.
>>> Tom & Bernard,
>>> Thanks for the insight on MPA negotiation.
>>>
>>> Could the below patch be considered as a proper fix?
>>>
>>> diff --git a/drivers/infiniband/sw/siw/siw_cm.c
>>> b/drivers/infiniband/sw/siw/siw_cm.c
>>> index 9ce8a1b925d2..0aec1b5212f9 100644
>>> --- a/drivers/infiniband/sw/siw/siw_cm.c
>>> +++ b/drivers/infiniband/sw/siw/siw_cm.c
>>> @@ -503,6 +503,7 @@ static int siw_recv_mpa_rr(struct siw_cep *cep)
>>> struct socket *s = cep->sock;
>>> u16 pd_len;
>>> int rcvd, to_rcv;
>>> + int extra_data_check = 4; /* 4Bytes, for MPA rules violation
>>> checking */
>>>
>>> if (cep->mpa.bytes_rcvd < sizeof(struct mpa_rr)) {
>>> rcvd = ksock_recv(s, (char *)hdr +
>>> cep->mpa.bytes_rcvd,
>>> @@ -553,23 +554,37 @@ static int siw_recv_mpa_rr(struct siw_cep *cep)
>>> return -EPROTO;
>>> }
>>>
>>> + /*
>>> + * Peer must not send any extra data other than MPA messages
>>> until MPA
>>> + * negotiation is completed, an exception is MPA V2
>>> client-server Mode,
>>> + * IE, in this mode the peer after sending MPA Reply can
>>> immediately
>>> + * start sending data in RDMA mode.
>>> + */
>>
>> This is unfortunately not true. The responder NEVER sends
>> an FPDU without having seen an FPDU from the initiator.
>> I just checked RFC 6581 again. The RTR (ready-to-receive)
>> indication from the initiator is still needed, but now
>> provided by the protocol and not the application: w/o
>> 'enhanced MPA setup', the initiator sends the first
>> FPDU as an application message. With 'enhanced MPA setup',
>> the initiator protocol entity sends (w/o application
>> interaction) a zero length READ/WRITE/Send as a first FPDU,
>> as previously negotiated with the responder. Again: only
>> after the responder has seen the first FPDU, it can
>> start sending in FPDU mode.
> If I understand your statements correctly: in MPA v2 clint-server mode,
> the responder application should have some logic to wait(after ESTABLISH
> event) until the first FPDU message is received from the initiator?
Krishna, this is not the application's responsibility. This is MPA,
and the iWARP provider's domain.
Tom.
>
> Thanks,
> Krishna.
>>
>> Sorry for the confusion. But the current
>> siw code appears to be just correct.
>>
>> Thanks
>> Bernard
>>
>>
>>
>>> + if ((__mpa_rr_revision(cep->mpa.hdr.params.bits) ==
>>> MPA_REVISION_2) &&
>>> + (cep->state == SIW_EPSTATE_AWAIT_MPAREP)) {
>>> + int mpa_p2p_mode = cep->mpa.v2_ctrl_req.ord &
>>> + (MPA_V2_RDMA_WRITE_RTR |
>>> MPA_V2_RDMA_READ_RTR);
>>> + if (!mpa_p2p_mode)
>>> + extra_data_check = 0;
>>> + }
>>> +
>>> /*
>>> * At this point, we must have hdr->params.pd_len != 0.
>>> * A private data buffer gets allocated if hdr->params.pd_len
>>> !=
>>> * 0.
>>> */
>>> if (!cep->mpa.pdata) {
>>> - cep->mpa.pdata = kmalloc(pd_len + 4, GFP_KERNEL);
>>> + cep->mpa.pdata = kmalloc(pd_len + extra_data_check,
>>> GFP_KERNEL);
>>> if (!cep->mpa.pdata)
>>> return -ENOMEM;
>>> }
>>> rcvd = ksock_recv(
>>> s, cep->mpa.pdata + cep->mpa.bytes_rcvd -
>>> sizeof(struct
>>> mpa_rr),
>>> - to_rcv + 4, MSG_DONTWAIT);
>>> + to_rcv + extra_data_check, MSG_DONTWAIT);
>>>
>>> if (rcvd < 0)
>>> return rcvd;
>>>
>>> - if (rcvd > to_rcv)
>>> + if (extra_data_check && (rcvd > to_rcv))
>>> return -EPROTO;
>>>
>>> cep->mpa.bytes_rcvd += rcvd;
>>>
>>> -Krishna.
>>>>
>>>> Tom.
>>>>
>>>>
>>>>
>>>>>
>>>>> Bernard.
>>>>>
>>>>>>
>>>>>>> So, for the MPA request case, this code is needed
>>>>>>> to check for protocol correctness.
>>>>>>> You are right for the MPA reply case - if we are
>>>>>>> _not_ in peer2peer mode, the peer can immediately
>>>>>>> start sending data in RDMA mode after its MPA Reply.
>>>>>>> So we shall add appropriate code to be more specific
>>>>>>> For an error, we are (1) processing an MPA Request,
>>>>>>> OR (2) processing an MPA Reply AND we are not expected
>>>>>>> to send an initial READ/WRITE/Send as negotiated with
>>>>>>> the peer (peer2peer mode MPA handshake).
>>>>>>>
>>>>>>> Just removing this check would make siw more permissive,
>>>>>>> but to a point where peer MPA protocol errors are
>>>>>>> tolerated. I am not in favor of that level of
>>>>>>> forgiveness.
>>>>>>>
>>>>>>> If possible, please provide an appropriate patch
>>>>>>> or (if it causes current issues with another peer
>>>>>>> iWarp implementation) just run in MPA peer2peer mode,
>>>>>>> where the current check is appropriate.
>>>>>>> Otherwise, I would provide an appropriate fix by Monday
>>>>>>> (I am still out of office this week).
>>>>>>>
>>>>>>>
>>>>>>> Many thanks and best regards,
>>>>>>> Bernard.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>
>
>
next prev parent reply other threads:[~2019-08-09 12:32 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-31 10:33 [PATCH for-rc] siw: MPA Reply handler tries to read beyond MPA message Krishnamraju Eraparaju
2019-07-31 19:17 ` Doug Ledford
2019-08-01 11:34 ` Krishnamraju Eraparaju
2019-08-01 10:56 ` Bernard Metzler
2019-08-01 15:36 ` Doug Ledford
2019-08-01 18:53 ` Tom Talpey
2019-08-02 11:18 ` Bernard Metzler
2019-08-02 12:47 ` Tom Talpey
2019-08-08 15:05 ` Bernard Metzler
2019-08-08 16:46 ` Krishnamraju Eraparaju
2019-08-09 12:32 ` Tom Talpey [this message]
2019-08-09 10:29 ` Bernard Metzler
2019-08-09 12:27 ` Tom Talpey
2019-08-09 13:52 ` Bernard Metzler
2019-08-09 20:35 ` Tom Talpey
2019-08-12 9:58 ` Krishnamraju Eraparaju
2019-08-12 12:56 ` Bernard Metzler
2019-08-13 8:05 ` Krishnamraju Eraparaju
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=14285740-dfe9-1a6c-8964-bf2590400f51@talpey.com \
--to=tom@talpey.com \
--cc=BMT@zurich.ibm.com \
--cc=bharat@chelsio.com \
--cc=jgg@ziepe.ca \
--cc=krishna2@chelsio.com \
--cc=linux-rdma@vger.kernel.org \
--cc=nirranjan@chelsio.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).