* [PATCH for-rc] RDMA/core: Fix corrupted SL on passive side
@ 2021-03-22 13:35 Håkon Bugge
  2021-03-23 19:46 ` Jason Gunthorpe
  0 siblings, 1 reply; 6+ messages in thread
From: Håkon Bugge @ 2021-03-22 13:35 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe, linux-rdma

On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
Subnet Local is zero.

In cm_req_handler(), the cm_process_routed_req() function is
called. Since the Primary Subnet Local value is zero in the request,
and since this is RoCE (Primary Local LID is permissive), the
following statement will be executed:

      IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);

This corrupts SL in req_msg if it was different from zero. In other
words, a request to set up a connection using an SL != zero will not
be honored, and a connection using SL zero will be created instead.

Fixed by not calling cm_process_routed_req() on RoCE systems.

Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
---
 drivers/infiniband/core/cm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 3d194bb..6adbaea 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
 		goto destroy;
 	}
 
-	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
+	if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
+		cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
 
 	memset(&work->path[0], 0, sizeof(work->path[0]));
 	if (cm_req_has_alt_path(req_msg))
-- 
1.8.3.1
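
The overwrite described in the commit message above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: the `model_*` structs and functions are hypothetical stand-ins for the kernel's `req_msg`, `struct ib_wc` and the `IBA_GET`/`IBA_SET` accessors, only the primary-path SL logic is modelled, and the received work completion is assumed to report SL 0 on RoCE, as the commit message implies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: the permissive LID value (all ones). */
#define LID_PERMISSIVE 0xFFFF

/* Hypothetical, reduced view of the primary-path fields of a CM REQ. */
struct model_req {
	bool     subnet_local; /* Primary Subnet Local */
	uint16_t local_lid;    /* Primary Local Port LID */
	uint8_t  sl;           /* Primary SL requested by the active side */
};

/* Hypothetical, reduced view of the receive work completion. */
struct model_wc {
	uint16_t slid;
	uint8_t  sl;
};

/*
 * Models the primary-path part of cm_process_routed_req(): for a
 * non-subnet-local request with a permissive local LID, the LID and SL
 * are taken from the work completion, replacing what the REQ carried.
 */
static void model_process_routed_req(struct model_req *req,
				     const struct model_wc *wc)
{
	if (!req->subnet_local && req->local_lid == LID_PERMISSIVE) {
		req->local_lid = wc->slid;
		req->sl = wc->sl;
	}
}

/*
 * Models the relevant step of cm_req_handler() on a copy of the REQ.
 * With patched == false the routed-REQ fixup always runs (old
 * behaviour); with patched == true it is skipped on RoCE.
 * Returns the SL that the passive side would end up using.
 */
static uint8_t model_req_handler(struct model_req req,
				 const struct model_wc *wc,
				 bool is_roce, bool patched)
{
	if (!patched || !is_roce)
		model_process_routed_req(&req, wc);
	return req.sl;
}
```

In this model, a RoCE REQ that asks for SL 5 comes out of the unpatched handler with the SL from the work completion, while the patched handler leaves the requested value alone. Non-RoCE requests take the same path before and after the patch.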



* Re: [PATCH for-rc] RDMA/core: Fix corrupted SL on passive side
  2021-03-22 13:35 [PATCH for-rc] RDMA/core: Fix corrupted SL on passive side Håkon Bugge
@ 2021-03-23 19:46 ` Jason Gunthorpe
  2021-03-24 14:34   ` Håkon Bugge
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2021-03-23 19:46 UTC (permalink / raw)
  To: Håkon Bugge; +Cc: Doug Ledford, linux-rdma

On Mon, Mar 22, 2021 at 02:35:32PM +0100, Håkon Bugge wrote:
> On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
> Subnet Local is zero.
> 
> In cm_req_handler(), the cm_process_routed_req() function is
> called. Since the Primary Subnet Local value is zero in the request,
> and since this is RoCE (Primary Local LID is permissive), the
> following statement will be executed:
> 
>       IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
> 
> This corrupts SL in req_msg if it was different from zero. In other
> words, a request to setup a connection using an SL != zero, will not
> be honored, and a connection using SL zero will be created instead.
> 
> Fixed by not calling cm_process_routed_req() on RoCE systems.
> 
> Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>  drivers/infiniband/core/cm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 3d194bb..6adbaea 100644
> +++ b/drivers/infiniband/core/cm.c
> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
>  		goto destroy;
>  	}
>  
> -	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
> +	if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
> +		cm_process_routed_req(req_msg, work->mad_recv_wc->wc);

why use ah_attr.type when a few lines below we have:

	if (gid_attr &&
	    rdma_protocol_roce(work->port->cm_dev->ib_device,
			       work->port->port_num)) {

?

I suspect you can just move this into the else?

Jason


* Re: [PATCH for-rc] RDMA/core: Fix corrupted SL on passive side
  2021-03-23 19:46 ` Jason Gunthorpe
@ 2021-03-24 14:34   ` Håkon Bugge
  2021-03-31 15:41     ` Haakon Bugge
  2021-04-01 15:04     ` Jason Gunthorpe
  0 siblings, 2 replies; 6+ messages in thread
From: Håkon Bugge @ 2021-03-24 14:34 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, OFED mailing list



> On 23 Mar 2021, at 20:46, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> On Mon, Mar 22, 2021 at 02:35:32PM +0100, Håkon Bugge wrote:
>> On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
>> Subnet Local is zero.
>> 
>> In cm_req_handler(), the cm_process_routed_req() function is
>> called. Since the Primary Subnet Local value is zero in the request,
>> and since this is RoCE (Primary Local LID is permissive), the
>> following statement will be executed:
>> 
>>      IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
>> 
>> This corrupts SL in req_msg if it was different from zero. In other
>> words, a request to setup a connection using an SL != zero, will not
>> be honored, and a connection using SL zero will be created instead.
>> 
>> Fixed by not calling cm_process_routed_req() on RoCE systems.
>> 
>> Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>> drivers/infiniband/core/cm.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>> index 3d194bb..6adbaea 100644
>> +++ b/drivers/infiniband/core/cm.c
>> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
>> 		goto destroy;
>> 	}
>> 
>> -	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>> +	if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
>> +		cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
> 
> why use ah_attr.type when a few lines below we have:
> 
> 	if (gid_attr &&
> 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
> 			       work->port->port_num)) {
> 
> ?
> 
> I suspect you can just move this into the else?

I can counter that by saying ah_attr.type is used ~10 lines further down in the conditional call to sa_path_set_dmac() ;-)


Further, in

> 	if (gid_attr &&
> 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
> 			       work->port->port_num)) {

I cannot really see how gid_attr could be NULL. If ib_init_ah_attr_from_wc() succeeds, it is set after the call to cm_init_av_for_response() above. Maybe use ah_attr.type in this test instead, for uniformity and readability?

I have no strong opinion.

Let me know your preference.


Thxs, Håkon



* Re: [PATCH for-rc] RDMA/core: Fix corrupted SL on passive side
  2021-03-24 14:34   ` Håkon Bugge
@ 2021-03-31 15:41     ` Haakon Bugge
  2021-04-01 15:04     ` Jason Gunthorpe
  1 sibling, 0 replies; 6+ messages in thread
From: Haakon Bugge @ 2021-03-31 15:41 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, OFED mailing list

> On 24 Mar 2021, at 15:34, Håkon Bugge <haakon.bugge@oracle.com> wrote:
> 
> 
> 
>> On 23 Mar 2021, at 20:46, Jason Gunthorpe <jgg@nvidia.com> wrote:
>> 
>> On Mon, Mar 22, 2021 at 02:35:32PM +0100, Håkon Bugge wrote:
>>> On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
>>> Subnet Local is zero.
>>> 
>>> In cm_req_handler(), the cm_process_routed_req() function is
>>> called. Since the Primary Subnet Local value is zero in the request,
>>> and since this is RoCE (Primary Local LID is permissive), the
>>> following statement will be executed:
>>> 
>>>     IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
>>> 
>>> This corrupts SL in req_msg if it was different from zero. In other
>>> words, a request to setup a connection using an SL != zero, will not
>>> be honored, and a connection using SL zero will be created instead.
>>> 
>>> Fixed by not calling cm_process_routed_req() on RoCE systems.
>>> 
>>> Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>> drivers/infiniband/core/cm.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>> index 3d194bb..6adbaea 100644
>>> +++ b/drivers/infiniband/core/cm.c
>>> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
>>> 		goto destroy;
>>> 	}
>>> 
>>> -	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>>> +	if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
>>> +		cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>> 
>> why use ah_attr.type when a few lines below we have:
>> 
>> 	if (gid_attr &&
>> 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
>> 			       work->port->port_num)) {
>> 
>> ?
>> 
>> I suspect you can just move this into the else?
> 
> I can counter that by saying ah_attr.type is used ~10 lines further down in the conditional call to sa_path_set_dmac() ;-)
> 
> 
> Further, in
> 
>> 	if (gid_attr &&
>> 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
>> 			       work->port->port_num)) {
> 
> I cannot really see how gid_attr could be null. If ib_init_ah_attr_from_wc() succeeds, it is set after the call to cm_init_av_for_response() above. May be using ah_attr.type in this test instead, for uniformity and readability?
> 
> I have no strong opinion.
> 
> Let me know your preference.

A gentle ping.

Thxs, Håkon


> 
> 
> Thxs, Håkon



* Re: [PATCH for-rc] RDMA/core: Fix corrupted SL on passive side
  2021-03-24 14:34   ` Håkon Bugge
  2021-03-31 15:41     ` Haakon Bugge
@ 2021-04-01 15:04     ` Jason Gunthorpe
  2021-04-06  7:41       ` Haakon Bugge
  1 sibling, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2021-04-01 15:04 UTC (permalink / raw)
  To: Håkon Bugge; +Cc: Doug Ledford, OFED mailing list

On Wed, Mar 24, 2021 at 02:34:13PM +0000, Håkon Bugge wrote:
> 
> 
> > On 23 Mar 2021, at 20:46, Jason Gunthorpe <jgg@nvidia.com> wrote:
> > 
> > On Mon, Mar 22, 2021 at 02:35:32PM +0100, Håkon Bugge wrote:
> >> On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
> >> Subnet Local is zero.
> >> 
> >> In cm_req_handler(), the cm_process_routed_req() function is
> >> called. Since the Primary Subnet Local value is zero in the request,
> >> and since this is RoCE (Primary Local LID is permissive), the
> >> following statement will be executed:
> >> 
> >>      IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
> >> 
> >> This corrupts SL in req_msg if it was different from zero. In other
> >> words, a request to setup a connection using an SL != zero, will not
> >> be honored, and a connection using SL zero will be created instead.
> >> 
> >> Fixed by not calling cm_process_routed_req() on RoCE systems.
> >> 
> >> Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
> >> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> >> drivers/infiniband/core/cm.c | 3 ++-
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> >> index 3d194bb..6adbaea 100644
> >> +++ b/drivers/infiniband/core/cm.c
> >> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
> >> 		goto destroy;
> >> 	}
> >> 
> >> -	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
> >> +	if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
> >> +		cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
> > 
> > why use ah_attr.type when a few lines below we have:
> > 
> > 	if (gid_attr &&
> > 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
> > 			       work->port->port_num)) {
> > 
> > ?
> > 
> > I suspect you can just move this into the else?
> 
> I can counter that by saying ah_attr.type is used ~10 lines further
> down in the conditional call to sa_path_set_dmac() ;-)

Hum, OK. Please send an additional patch to unify everything around
av.ah_attr.type

> > 	if (gid_attr &&
> > 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
> > 			       work->port->port_num)) {
> 
> I cannot really see how gid_attr could be null. If
> ib_init_ah_attr_from_wc() succeeds, it is set after the call to
> cm_init_av_for_response() above. May be using ah_attr.type in this
> test instead, for uniformity and readability?

The GRH is optional, ib_init_ah_attr_from_wc() only sets it
conditionally.

Applied to for-next

Thanks,
Jason


* Re: [PATCH for-rc] RDMA/core: Fix corrupted SL on passive side
  2021-04-01 15:04     ` Jason Gunthorpe
@ 2021-04-06  7:41       ` Haakon Bugge
  0 siblings, 0 replies; 6+ messages in thread
From: Haakon Bugge @ 2021-04-06  7:41 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, OFED mailing list



> On 1 Apr 2021, at 17:04, Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> On Wed, Mar 24, 2021 at 02:34:13PM +0000, Håkon Bugge wrote:
>> 
>> 
>>> On 23 Mar 2021, at 20:46, Jason Gunthorpe <jgg@nvidia.com> wrote:
>>> 
>>> On Mon, Mar 22, 2021 at 02:35:32PM +0100, Håkon Bugge wrote:
>>>> On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
>>>> Subnet Local is zero.
>>>> 
>>>> In cm_req_handler(), the cm_process_routed_req() function is
>>>> called. Since the Primary Subnet Local value is zero in the request,
>>>> and since this is RoCE (Primary Local LID is permissive), the
>>>> following statement will be executed:
>>>> 
>>>>     IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
>>>> 
>>>> This corrupts SL in req_msg if it was different from zero. In other
>>>> words, a request to setup a connection using an SL != zero, will not
>>>> be honored, and a connection using SL zero will be created instead.
>>>> 
>>>> Fixed by not calling cm_process_routed_req() on RoCE systems.
>>>> 
>>>> Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
>>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>> drivers/infiniband/core/cm.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>> 
>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>>> index 3d194bb..6adbaea 100644
>>>> +++ b/drivers/infiniband/core/cm.c
>>>> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
>>>> 		goto destroy;
>>>> 	}
>>>> 
>>>> -	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>>>> +	if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
>>>> +		cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>>> 
>>> why use ah_attr.type when a few lines below we have:
>>> 
>>> 	if (gid_attr &&
>>> 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
>>> 			       work->port->port_num)) {
>>> 
>>> ?
>>> 
>>> I suspect you can just move this into the else?
>> 
>> I can counter that by saying ah_attr.type is used ~10 lines further
>> down in the conditional call to sa_path_set_dmac() ;-)
> 
> Hum, OK. Please send an additional patch to unify everything around
> av.ah_attr.type

Will do.

>>> 	if (gid_attr &&
>>> 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
>>> 			       work->port->port_num)) {
>> 
>> I cannot really see how gid_attr could be null. If
>> ib_init_ah_attr_from_wc() succeeds, it is set after the call to
>> cm_init_av_for_response() above. May be using ah_attr.type in this
>> test instead, for uniformity and readability?
> 
> The GRH is optional, ib_init_ah_attr_from_wc() only sets it
> conditionally.

True. But one of the conditions to set sgid_attr is rdma_protocol_roce(). Hence the first term in:

if (gid_attr && rdma_protocol_roce())

is superfluous. This is because gid_attr cannot be NULL on RoCE systems, since it is dereferenced in:

cm_init_av_for_response()
    ib_init_ah_attr_from_wc()
        rdma_move_grh_sgid_attr()


I'll send the patch with the gid_attr term and let you decide.


Thxs, Håkon





> 
> Applied to for-next
> 
> Thanks,
> Jason


