* cm_process_routed_req() does not resonate well with RoCE systems
@ 2021-03-18 19:21 Haakon Bugge
2021-03-19 15:07 ` Haakon Bugge
0 siblings, 1 reply; 4+ messages in thread
From: Haakon Bugge @ 2021-03-18 19:21 UTC (permalink / raw)
To: OFED mailing list, Jason Gunthorpe, Leon Romanovsky
With the introduction of RoCE systems, a CM REQ message will contain (pasted from Wireshark):
Primary Hop Limit: 0x40
.... 0... = Primary Subnet Local: 0x0
This because cma_resolve_iboe_route() has:
if (((struct sockaddr *)&id_priv->id.route.addr.dst_addr)->sa_family != AF_IB)
/* TODO: get the hoplimit from the inet/inet6 device */
route->path_rec->hop_limit = addr->dev_addr.hoplimit;
else
route->path_rec->hop_limit = 1;
The addr->dev_addr.hoplimit is coming from addr4_resolve(), which has:
addr->hoplimit = ip4_dst_hoplimit(&rt->dst);
ip4_dst_hoplimit() returns the value of the sysctl net.ipv4.ip_default_ttl in this case (64).
For the purpose of this case, consider the CM REQ to have the Primary SL != 0.
When this REQ gets processed by cm_req_handler(), the cm_process_routed_req() function is called.
Since the Primary Subnet Local value is zero in the request, and since this is RoCE (Primary Local LID is permissive), the following statement will be executed:
IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
At least on the system I ran on, which was equipped with a Mellanox CX-5 HCA, the wc->sl is zero. Hence, the request to setup a connection using an SL != zero, will not be honoured, and a connection using SL zero will be created instead.
As a side note, in cm_process_routed_req(), we have:
IBA_SET(CM_REQ_PRIMARY_REMOTE_PORT_LID, req_msg, wc->dlid_path_bits);
which is strange, since a LID is 16 bits, whereas dlid_path_bits is only eight.
I am uncertain about the correct fix here. Any input appreciated.
Thxs, Håkon
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: cm_process_routed_req() does not resonate well with RoCE systems
2021-03-18 19:21 cm_process_routed_req() does not resonate well with RoCE systems Haakon Bugge
@ 2021-03-19 15:07 ` Haakon Bugge
2021-03-23 18:08 ` Jason Gunthorpe
0 siblings, 1 reply; 4+ messages in thread
From: Haakon Bugge @ 2021-03-19 15:07 UTC (permalink / raw)
To: OFED mailing list, Jason Gunthorpe, Leon Romanovsky
> On 18 Mar 2021, at 20:21, Haakon Bugge <haakon.bugge@oracle.com> wrote:
>
> With the introduction of RoCE systems, a CM REQ message will contain (pasted from Wireshark):
>
> Primary Hop Limit: 0x40
> .... 0... = Primary Subnet Local: 0x0
>
> This because cma_resolve_iboe_route() has:
>
> if (((struct sockaddr *)&id_priv->id.route.addr.dst_addr)->sa_family != AF_IB)
> /* TODO: get the hoplimit from the inet/inet6 device */
> route->path_rec->hop_limit = addr->dev_addr.hoplimit;
> else
> route->path_rec->hop_limit = 1;
>
> The addr->dev_addr.hoplimit is coming from addr4_resolve(), which has:
>
> addr->hoplimit = ip4_dst_hoplimit(&rt->dst);
>
> ip4_dst_hoplimit() returns the value of the sysctl net.ipv4.ip_default_ttl in this case (64).
>
> For the purpose of this case, consider the CM REQ to have the Primary SL != 0.
>
> When this REQ gets processed by cm_req_handler(), the cm_process_routed_req() function is called.
>
> Since the Primary Subnet Local value is zero in the request, and since this is RoCE (Primary Local LID is permissive), the following statement will be executed:
>
> IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
>
> At least on the system I ran on, which was equipped with a Mellanox CX-5 HCA, the wc->sl is zero. Hence, the request to setup a connection using an SL != zero, will not be honoured, and a connection using SL zero will be created instead.
>
> As a side note, in cm_process_routed_req(), we have:
>
> IBA_SET(CM_REQ_PRIMARY_REMOTE_PORT_LID, req_msg, wc->dlid_path_bits);
>
> which is strange, since a LID is 16 bits, whereas dlid_path_bits is only eight.
>
> I am uncertain about the correct fix here. Any input appreciated.
I intend to send a patch doing:
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
goto destroy;
}
- cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
+ if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
+ cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
memset(&work->path[0], 0, sizeof(work->path[0]));
if (cm_req_has_alt_path(req_msg))
if I do not get a push back.
Thxs, Håkon
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: cm_process_routed_req() does not resonate well with RoCE systems
2021-03-19 15:07 ` Haakon Bugge
@ 2021-03-23 18:08 ` Jason Gunthorpe
2021-03-23 18:38 ` Haakon Bugge
0 siblings, 1 reply; 4+ messages in thread
From: Jason Gunthorpe @ 2021-03-23 18:08 UTC (permalink / raw)
To: Haakon Bugge; +Cc: OFED mailing list, Leon Romanovsky
On Fri, Mar 19, 2021 at 03:07:07PM +0000, Haakon Bugge wrote:
>
>
> > On 18 Mar 2021, at 20:21, Haakon Bugge <haakon.bugge@oracle.com> wrote:
> >
> > With the introduction of RoCE systems, a CM REQ message will contain (pasted from Wireshark):
> >
> > Primary Hop Limit: 0x40
> > .... 0... = Primary Subnet Local: 0x0
> >
> > This because cma_resolve_iboe_route() has:
> >
> > if (((struct sockaddr *)&id_priv->id.route.addr.dst_addr)->sa_family != AF_IB)
> > /* TODO: get the hoplimit from the inet/inet6 device */
> > route->path_rec->hop_limit = addr->dev_addr.hoplimit;
> > else
> > route->path_rec->hop_limit = 1;
> >
> > The addr->dev_addr.hoplimit is coming from addr4_resolve(), which has:
> >
> > addr->hoplimit = ip4_dst_hoplimit(&rt->dst);
> >
> > ip4_dst_hoplimit() returns the value of the sysctl net.ipv4.ip_default_ttl in this case (64).
> >
> > For the purpose of this case, consider the CM REQ to have the Primary SL != 0.
> >
> > When this REQ gets processed by cm_req_handler(), the cm_process_routed_req() function is called.
> >
> > Since the Primary Subnet Local value is zero in the request, and since this is RoCE (Primary Local LID is permissive), the following statement will be executed:
> >
> > IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
> >
> > At least on the system I ran on, which was equipped with a
> > Mellanox CX-5 HCA, the wc->sl is zero. Hence, the request to setup
> > a connection using an SL != zero, will not be honoured, and a
> > connection using SL zero will be created instead.
> >
> > As a side note, in cm_process_routed_req(), we have:
> >
> > IBA_SET(CM_REQ_PRIMARY_REMOTE_PORT_LID, req_msg, wc->dlid_path_bits);
> >
> > which is strange, since a LID is 16 bits, whereas dlid_path_bits is only eight.
> >
> > I am uncertain about the correct fix here. Any input appreciated.
>
> I intend to send a patch doing:
>
> +++ b/drivers/infiniband/core/cm.c
> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
> goto destroy;
> }
>
> - cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
> + if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
> + cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>
> memset(&work->path[0], 0, sizeof(work->path[0]));
> if (cm_req_has_alt_path(req_msg))
> >
> if I do not get a push back.
This does seem reasonable, but I don't understand the underlying
issue, why is anything in roce land touching the SL? Are you trying to
use the SL as a proxy for the TOS bits?
Jason
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: cm_process_routed_req() does not resonate well with RoCE systems
2021-03-23 18:08 ` Jason Gunthorpe
@ 2021-03-23 18:38 ` Haakon Bugge
0 siblings, 0 replies; 4+ messages in thread
From: Haakon Bugge @ 2021-03-23 18:38 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: OFED mailing list, Leon Romanovsky
> On 23 Mar 2021, at 19:08, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Fri, Mar 19, 2021 at 03:07:07PM +0000, Haakon Bugge wrote:
>>
>>
>>> On 18 Mar 2021, at 20:21, Haakon Bugge <haakon.bugge@oracle.com> wrote:
>>>
>>> With the introduction of RoCE systems, a CM REQ message will contain (pasted from Wireshark):
>>>
>>> Primary Hop Limit: 0x40
>>> .... 0... = Primary Subnet Local: 0x0
>>>
>>> This because cma_resolve_iboe_route() has:
>>>
>>> if (((struct sockaddr *)&id_priv->id.route.addr.dst_addr)->sa_family != AF_IB)
>>> /* TODO: get the hoplimit from the inet/inet6 device */
>>> route->path_rec->hop_limit = addr->dev_addr.hoplimit;
>>> else
>>> route->path_rec->hop_limit = 1;
>>>
>>> The addr->dev_addr.hoplimit is coming from addr4_resolve(), which has:
>>>
>>> addr->hoplimit = ip4_dst_hoplimit(&rt->dst);
>>>
>>> ip4_dst_hoplimit() returns the value of the sysctl net.ipv4.ip_default_ttl in this case (64).
>>>
>>> For the purpose of this case, consider the CM REQ to have the Primary SL != 0.
>>>
>>> When this REQ gets processed by cm_req_handler(), the cm_process_routed_req() function is called.
>>>
>>> Since the Primary Subnet Local value is zero in the request, and since this is RoCE (Primary Local LID is permissive), the following statement will be executed:
>>>
>>> IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
>>>
>>> At least on the system I ran on, which was equipped with a
>>> Mellanox CX-5 HCA, the wc->sl is zero. Hence, the request to setup
>>> a connection using an SL != zero, will not be honoured, and a
>>> connection using SL zero will be created instead.
>>>
>>> As a side note, in cm_process_routed_req(), we have:
>>>
>>> IBA_SET(CM_REQ_PRIMARY_REMOTE_PORT_LID, req_msg, wc->dlid_path_bits);
>>>
>>> which is strange, since a LID is 16 bits, whereas dlid_path_bits is only eight.
>>>
>>> I am uncertain about the correct fix here. Any input appreciated.
>>
>> I intend to send a patch doing:
>>
>> +++ b/drivers/infiniband/core/cm.c
>> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
>> goto destroy;
>> }
>>
>> - cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>> + if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
>> + cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>>
>> memset(&work->path[0], 0, sizeof(work->path[0]));
>> if (cm_req_has_alt_path(req_msg))
>>>
>> if I do not get a push back.
>
> This does seem reasonable, but I don't understand the underlying
> issue, why is anything in roce land touching the SL? Are you trying to
> use the SL as a proxy for the TOS bits?
We want to control the DSCP in the encapsulating IP packet to select different TCs. As per the RoCE Annex:
<quote>
The SL component in the Address Vector is used to determine the Ethernet Priority of generated RoCEv2 packets. SL 0-7 are mapped directly to Priorities 0-7, respectively. SL 8-15 are reserved.
</quote>
Quite similar to how IB Link-layer translates the SL to an VL.
Thxs, Håkon
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2021-03-23 18:39 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-18 19:21 cm_process_routed_req() does not resonate well with RoCE systems Haakon Bugge
2021-03-19 15:07 ` Haakon Bugge
2021-03-23 18:08 ` Jason Gunthorpe
2021-03-23 18:38 ` Haakon Bugge
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.