* [PATCH for-rc] RDMA/core: Fix corrupted SL on passive side
@ 2021-03-22 13:35 Håkon Bugge
2021-03-23 19:46 ` Jason Gunthorpe
0 siblings, 1 reply; 6+ messages in thread
From: Håkon Bugge @ 2021-03-22 13:35 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe, linux-rdma
On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
Subnet Local is zero.
In cm_req_handler(), the cm_process_routed_req() function is
called. Since the Primary Subnet Local value is zero in the request,
and since this is RoCE (Primary Local LID is permissive), the
following statement will be executed:
IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
This corrupts SL in req_msg if it was different from zero. In other
words, a request to set up a connection using an SL != zero will not
be honored, and a connection using SL zero will be created instead.
Fixed by not calling cm_process_routed_req() on RoCE systems.
Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
---
drivers/infiniband/core/cm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 3d194bb..6adbaea 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
 		goto destroy;
 	}
 
-	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
+	if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
+		cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
 
 	memset(&work->path[0], 0, sizeof(work->path[0]));
 	if (cm_req_has_alt_path(req_msg))
--
1.8.3.1
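The overwrite described in the commit message can be modelled in user space. The following is only a minimal sketch, not the kernel code: struct toy_req, struct toy_wc, toy_process_routed_req(), toy_passive_sl() and LID_PERMISSIVE are hypothetical stand-ins for the CM REQ fields, the work completion, cm_process_routed_req() and the permissive LID. It assumes the only relevant side effect is the SL assignment quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stands in for the permissive LID value seen on RoCE. */
#define LID_PERMISSIVE 0xFFFFu

/* Hypothetical, flattened view of the primary-path fields of a CM REQ. */
struct toy_req {
	bool subnet_local;   /* Primary Subnet Local */
	uint16_t local_lid;  /* Primary Local LID */
	uint8_t sl;          /* Primary SL requested by the active side */
};

/* Hypothetical view of the work completion; only the SL matters here. */
struct toy_wc {
	uint8_t sl;
};

/*
 * Models the branch from the commit message: when Subnet Local is zero
 * and the Local LID is permissive, the SL in the request is replaced by
 * the SL from the work completion.
 */
static void toy_process_routed_req(struct toy_req *req, const struct toy_wc *wc)
{
	if (!req->subnet_local && req->local_lid == LID_PERMISSIVE)
		req->sl = wc->sl;
}

/*
 * Returns the SL the passive side ends up using. With patched == true,
 * the routed-REQ processing is skipped on RoCE, as in the diff above.
 * The request is passed by value so the caller's copy is not modified.
 */
static uint8_t toy_passive_sl(struct toy_req req, const struct toy_wc *wc,
			      bool is_roce, bool patched)
{
	if (!patched || !is_roce)
		toy_process_routed_req(&req, wc);
	return req.sl;
}
```

With a request carrying SL 3, a permissive Local LID, Subnet Local zero and a work completion SL of zero, the unpatched model returns SL 0 on RoCE, while the patched model keeps SL 3. The non-RoCE path behaves the same with or without the patch.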
* Re: [PATCH for-rc] RDMA/core: Fix corrupted SL on passive side
2021-03-22 13:35 [PATCH for-rc] RDMA/core: Fix corrupted SL on passive side Håkon Bugge
@ 2021-03-23 19:46 ` Jason Gunthorpe
2021-03-24 14:34 ` Håkon Bugge
0 siblings, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2021-03-23 19:46 UTC (permalink / raw)
To: Håkon Bugge; +Cc: Doug Ledford, linux-rdma
On Mon, Mar 22, 2021 at 02:35:32PM +0100, Håkon Bugge wrote:
> On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
> Subnet Local is zero.
>
> In cm_req_handler(), the cm_process_routed_req() function is
> called. Since the Primary Subnet Local value is zero in the request,
> and since this is RoCE (Primary Local LID is permissive), the
> following statement will be executed:
>
> IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
>
> This corrupts SL in req_msg if it was different from zero. In other
> words, a request to setup a connection using an SL != zero, will not
> be honored, and a connection using SL zero will be created instead.
>
> Fixed by not calling cm_process_routed_req() on RoCE systems.
>
> Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> drivers/infiniband/core/cm.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 3d194bb..6adbaea 100644
> +++ b/drivers/infiniband/core/cm.c
> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
> goto destroy;
> }
>
> - cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
> + if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
> + cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
Why use ah_attr.type when a few lines below we have:
if (gid_attr &&
rdma_protocol_roce(work->port->cm_dev->ib_device,
work->port->port_num)) {
?
I suspect you can just move this into the else?
Jason
* Re: [PATCH for-rc] RDMA/core: Fix corrupted SL on passive side
2021-03-23 19:46 ` Jason Gunthorpe
@ 2021-03-24 14:34 ` Håkon Bugge
2021-03-31 15:41 ` Haakon Bugge
2021-04-01 15:04 ` Jason Gunthorpe
0 siblings, 2 replies; 6+ messages in thread
From: Håkon Bugge @ 2021-03-24 14:34 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, OFED mailing list
> On 23 Mar 2021, at 20:46, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Mon, Mar 22, 2021 at 02:35:32PM +0100, Håkon Bugge wrote:
>> On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
>> Subnet Local is zero.
>>
>> In cm_req_handler(), the cm_process_routed_req() function is
>> called. Since the Primary Subnet Local value is zero in the request,
>> and since this is RoCE (Primary Local LID is permissive), the
>> following statement will be executed:
>>
>> IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
>>
>> This corrupts SL in req_msg if it was different from zero. In other
>> words, a request to setup a connection using an SL != zero, will not
>> be honored, and a connection using SL zero will be created instead.
>>
>> Fixed by not calling cm_process_routed_req() on RoCE systems.
>>
>> Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>> drivers/infiniband/core/cm.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>> index 3d194bb..6adbaea 100644
>> +++ b/drivers/infiniband/core/cm.c
>> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
>> goto destroy;
>> }
>>
>> - cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>> + if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
>> + cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>
> why use ah_attr.type when a few lines below we have:
>
> if (gid_attr &&
> rdma_protocol_roce(work->port->cm_dev->ib_device,
> work->port->port_num)) {
>
> ?
>
> I suspect you can just move this into the else?
I can counter that by saying ah_attr.type is used ~10 lines further down in the conditional call to sa_path_set_dmac() ;-)
Further, in
> if (gid_attr &&
> rdma_protocol_roce(work->port->cm_dev->ib_device,
> work->port->port_num)) {
I cannot really see how gid_attr could be null. If ib_init_ah_attr_from_wc() succeeds, it is set after the call to cm_init_av_for_response() above. Maybe use ah_attr.type in this test instead, for uniformity and readability?
I have no strong opinion.
Let me know your preference.
Thxs, Håkon
* Re: [PATCH for-rc] RDMA/core: Fix corrupted SL on passive side
2021-03-24 14:34 ` Håkon Bugge
@ 2021-03-31 15:41 ` Haakon Bugge
2021-04-01 15:04 ` Jason Gunthorpe
1 sibling, 0 replies; 6+ messages in thread
From: Haakon Bugge @ 2021-03-31 15:41 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, OFED mailing list
> On 24 Mar 2021, at 15:34, Håkon Bugge <haakon.bugge@oracle.com> wrote:
>
>
>
>> On 23 Mar 2021, at 20:46, Jason Gunthorpe <jgg@nvidia.com> wrote:
>>
>> On Mon, Mar 22, 2021 at 02:35:32PM +0100, Håkon Bugge wrote:
>>> On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
>>> Subnet Local is zero.
>>>
>>> In cm_req_handler(), the cm_process_routed_req() function is
>>> called. Since the Primary Subnet Local value is zero in the request,
>>> and since this is RoCE (Primary Local LID is permissive), the
>>> following statement will be executed:
>>>
>>> IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
>>>
>>> This corrupts SL in req_msg if it was different from zero. In other
>>> words, a request to setup a connection using an SL != zero, will not
>>> be honored, and a connection using SL zero will be created instead.
>>>
>>> Fixed by not calling cm_process_routed_req() on RoCE systems.
>>>
>>> Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>> drivers/infiniband/core/cm.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>> index 3d194bb..6adbaea 100644
>>> +++ b/drivers/infiniband/core/cm.c
>>> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
>>> goto destroy;
>>> }
>>>
>>> - cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>>> + if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
>>> + cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>>
>> why use ah_attr.type when a few lines below we have:
>>
>> if (gid_attr &&
>> rdma_protocol_roce(work->port->cm_dev->ib_device,
>> work->port->port_num)) {
>>
>> ?
>>
>> I suspect you can just move this into the else?
>
> I can counter that by saying ah_attr.type is used ~10 lines further down in the conditional call to sa_path_set_dmac() ;-)
>
>
> Further, in
>
>> if (gid_attr &&
>> rdma_protocol_roce(work->port->cm_dev->ib_device,
>> work->port->port_num)) {
>
> I cannot really see how gid_attr could be null. If ib_init_ah_attr_from_wc() succeeds, it is set after the call to cm_init_av_for_response() above. May be using ah_attr.type in this test instead, for uniformity and readability?
>
> I have no strong opinion.
>
> Let me know your preference.
A gentle ping.
Thxs, Håkon
>
>
> Thxs, Håkon
* Re: [PATCH for-rc] RDMA/core: Fix corrupted SL on passive side
2021-03-24 14:34 ` Håkon Bugge
2021-03-31 15:41 ` Haakon Bugge
@ 2021-04-01 15:04 ` Jason Gunthorpe
2021-04-06 7:41 ` Haakon Bugge
1 sibling, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2021-04-01 15:04 UTC (permalink / raw)
To: Håkon Bugge; +Cc: Doug Ledford, OFED mailing list
On Wed, Mar 24, 2021 at 02:34:13PM +0000, Håkon Bugge wrote:
>
>
> > On 23 Mar 2021, at 20:46, Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Mon, Mar 22, 2021 at 02:35:32PM +0100, Håkon Bugge wrote:
> >> On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
> >> Subnet Local is zero.
> >>
> >> In cm_req_handler(), the cm_process_routed_req() function is
> >> called. Since the Primary Subnet Local value is zero in the request,
> >> and since this is RoCE (Primary Local LID is permissive), the
> >> following statement will be executed:
> >>
> >> IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
> >>
> >> This corrupts SL in req_msg if it was different from zero. In other
> >> words, a request to setup a connection using an SL != zero, will not
> >> be honored, and a connection using SL zero will be created instead.
> >>
> >> Fixed by not calling cm_process_routed_req() on RoCE systems.
> >>
> >> Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
> >> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> >> drivers/infiniband/core/cm.c | 3 ++-
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> >> index 3d194bb..6adbaea 100644
> >> +++ b/drivers/infiniband/core/cm.c
> >> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
> >> goto destroy;
> >> }
> >>
> >> - cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
> >> + if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
> >> + cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
> >
> > why use ah_attr.type when a few lines below we have:
> >
> > if (gid_attr &&
> > rdma_protocol_roce(work->port->cm_dev->ib_device,
> > work->port->port_num)) {
> >
> > ?
> >
> > I suspect you can just move this into the else?
>
> I can counter that by saying ah_attr.type is used ~10 lines further
> down in the conditional call to sa_path_set_dmac() ;-)
Hum, OK. Please send an additional patch to unify everything around
av.ah_attr.type
> > if (gid_attr &&
> > rdma_protocol_roce(work->port->cm_dev->ib_device,
> > work->port->port_num)) {
>
> I cannot really see how gid_attr could be null. If
> ib_init_ah_attr_from_wc() succeeds, it is set after the call to
> cm_init_av_for_response() above. May be using ah_attr.type in this
> test instead, for uniformity and readability?
The GRH is optional, ib_init_ah_attr_from_wc() only sets it
conditionally.
Applied to for-next
Thanks,
Jason
* Re: [PATCH for-rc] RDMA/core: Fix corrupted SL on passive side
2021-04-01 15:04 ` Jason Gunthorpe
@ 2021-04-06 7:41 ` Haakon Bugge
0 siblings, 0 replies; 6+ messages in thread
From: Haakon Bugge @ 2021-04-06 7:41 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, OFED mailing list
> On 1 Apr 2021, at 17:04, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Wed, Mar 24, 2021 at 02:34:13PM +0000, Håkon Bugge wrote:
>>
>>
>>> On 23 Mar 2021, at 20:46, Jason Gunthorpe <jgg@nvidia.com> wrote:
>>>
>>> On Mon, Mar 22, 2021 at 02:35:32PM +0100, Håkon Bugge wrote:
>>>> On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
>>>> Subnet Local is zero.
>>>>
>>>> In cm_req_handler(), the cm_process_routed_req() function is
>>>> called. Since the Primary Subnet Local value is zero in the request,
>>>> and since this is RoCE (Primary Local LID is permissive), the
>>>> following statement will be executed:
>>>>
>>>> IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
>>>>
>>>> This corrupts SL in req_msg if it was different from zero. In other
>>>> words, a request to setup a connection using an SL != zero, will not
>>>> be honored, and a connection using SL zero will be created instead.
>>>>
>>>> Fixed by not calling cm_process_routed_req() on RoCE systems.
>>>>
>>>> Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
>>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>> drivers/infiniband/core/cm.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>>> index 3d194bb..6adbaea 100644
>>>> +++ b/drivers/infiniband/core/cm.c
>>>> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
>>>> goto destroy;
>>>> }
>>>>
>>>> - cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>>>> + if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
>>>> + cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>>>
>>> why use ah_attr.type when a few lines below we have:
>>>
>>> if (gid_attr &&
>>> rdma_protocol_roce(work->port->cm_dev->ib_device,
>>> work->port->port_num)) {
>>>
>>> ?
>>>
>>> I suspect you can just move this into the else?
>>
>> I can counter that by saying ah_attr.type is used ~10 lines further
>> down in the conditional call to sa_path_set_dmac() ;-)
>
> Hum, OK. Please send an additional patch to unify everything around
> av.ah_attr.type
Will do.
>>> if (gid_attr &&
>>> rdma_protocol_roce(work->port->cm_dev->ib_device,
>>> work->port->port_num)) {
>>
>> I cannot really see how gid_attr could be null. If
>> ib_init_ah_attr_from_wc() succeeds, it is set after the call to
>> cm_init_av_for_response() above. May be using ah_attr.type in this
>> test instead, for uniformity and readability?
>
> The GRH is optional, ib_init_ah_attr_from_wc() only sets it
> conditionally.
True. But one of the conditions for setting sgid_attr is rdma_protocol_roce(). Hence the first term in:
  if (gid_attr && rdma_protocol_roce())
is superfluous. This is because it cannot be NULL on RoCE systems, since it is dereferenced in:
  cm_init_av_for_response()
    ib_init_ah_attr_from_wc()
      rdma_move_grh_sgid_attr()
I'll send the patch with the gid_attr term and let you decide.
Thxs, Håkon
>
> Applied to for-next
>
> Thanks,
> Jason