* [PATCH IB/core 0/2] Do not form IB connections between limited partition members @ 2018-05-09 9:30 Håkon Bugge 2018-05-09 9:30 ` [PATCH IB/core 1/2] IB/core: A full pkey is required to match a limited one Håkon Bugge 2018-05-09 9:30 ` [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys Håkon Bugge 0 siblings, 2 replies; 24+ messages in thread From: Håkon Bugge @ 2018-05-09 9:30 UTC (permalink / raw) To: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty Cc: linux-rdma, linux-kernel, Håkon Bugge Systems using IB partitions might be exposed to excessive pkey violation traps which are sent to the OpenSM. This can be close to a DoS attack, and in addition, the OpenSM logs are flooded with these messages, hiding potential other log messages deemed important in order to investigate important issues. This series prohibit RDMA CM to establish connections between two limited partition members. This avoids pkey violation traps stemming from unicast messages to be sent to the OpenSM. [If this patch series get accepted by the community, I ask if the maintainer can update the reference to the first commit in the second commit message with a correct 12 chars SHA] Håkon Bugge (2): IB/core: A full pkey is required to match a limited one IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys drivers/infiniband/core/cache.c | 32 +++++++++++++++++++++++++++----- drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++------- include/rdma/ib_cache.h | 18 ++++++++++++++++++ include/rdma/ib_cm.h | 4 +++- 4 files changed, 80 insertions(+), 13 deletions(-) -- 2.13.6 ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH IB/core 1/2] IB/core: A full pkey is required to match a limited one 2018-05-09 9:30 [PATCH IB/core 0/2] Do not form IB connections between limited partition members Håkon Bugge @ 2018-05-09 9:30 ` Håkon Bugge 2018-05-09 9:30 ` [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys Håkon Bugge 1 sibling, 0 replies; 24+ messages in thread From: Håkon Bugge @ 2018-05-09 9:30 UTC (permalink / raw) To: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty Cc: linux-rdma, linux-kernel, Håkon Bugge In ib_find_cached_pkey(), the preference is to return the pkey-index of the full pkey, if both a limited and the full exists in the cache. However, if both pkeys are limited, the function return success. To detect if two pkeys are eligible for communication, ib_find_matched_cached_pkey() is introduced, which requires at least one of the pkeys to be a full member, in order to return success. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> --- drivers/infiniband/core/cache.c | 32 +++++++++++++++++++++++++++----- include/rdma/ib_cache.h | 18 ++++++++++++++++++ 2 files changed, 45 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c index fb2d347f760f..cb7e9818a226 100644 --- a/drivers/infiniband/core/cache.c +++ b/drivers/infiniband/core/cache.c @@ -983,10 +983,11 @@ int ib_get_cached_subnet_prefix(struct ib_device *device, } EXPORT_SYMBOL(ib_get_cached_subnet_prefix); -int ib_find_cached_pkey(struct ib_device *device, - u8 port_num, - u16 pkey, - u16 *index) +static int __ib_find_cached_pkey(struct ib_device *device, + u8 port_num, + u16 pkey, + u16 *index, + bool allow_lim_to_lim_match) { struct ib_pkey_cache *cache; unsigned long flags; @@ -1013,7 +1014,11 @@ int ib_find_cached_pkey(struct ib_device *device, partial_ix = i; } - if (ret && partial_ix >= 0) { + /* + * If table content is limited, pkey must be full to match, + * unless allow_lim_to_lim_match is set + */ + if (ret && partial_ix >= 0 && (pkey & 0x8000 || allow_lim_to_lim_match)) { *index = partial_ix; ret = 0; } @@ -1022,6 +1027,23 @@ int ib_find_cached_pkey(struct ib_device *device, return ret; } + +int ib_find_matched_cached_pkey(struct ib_device *device, + u8 port_num, + u16 pkey, + u16 *index) +{ + return __ib_find_cached_pkey(device, port_num, pkey, index, false); +} +EXPORT_SYMBOL(ib_find_matched_cached_pkey); + +int ib_find_cached_pkey(struct ib_device *device, + u8 port_num, + u16 pkey, + u16 *index) +{ + return __ib_find_cached_pkey(device, port_num, pkey, index, true); +} EXPORT_SYMBOL(ib_find_cached_pkey); int ib_find_exact_cached_pkey(struct ib_device *device, diff --git a/include/rdma/ib_cache.h b/include/rdma/ib_cache.h index eb49cc8d1f95..748bd27559b6 100644 --- a/include/rdma/ib_cache.h +++ b/include/rdma/ib_cache.h @@ -92,6 +92,24 @@ int ib_get_cached_pkey(struct ib_device *device_handle, u16 *pkey); /** + * ib_find_matched_cached_pkey - Returns the PKey table index where a + * specified PKey value occurs. Success requires at least one of the + * PKeys to be a full-member. + * @device: The device to query. + * @port_num: The port number of the device to search for the PKey. + * @pkey: The PKey value to search for. + * @index: The index into the cached PKey table where the PKey was found. + * + * ib_find_matched_cached_pkey() searches the specified PKey table in + * the local software cache and return success unless both PKeys are + * limited. + */ +int ib_find_matched_cached_pkey(struct ib_device *device, + u8 port_num, + u16 pkey, + u16 *index); + +/** * ib_find_cached_pkey - Returns the PKey table index where a specified * PKey value occurs. * @device: The device to query. -- 2.13.6 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-09 9:30 [PATCH IB/core 0/2] Do not form IB connections between limited partition members Håkon Bugge 2018-05-09 9:30 ` [PATCH IB/core 1/2] IB/core: A full pkey is required to match a limited one Håkon Bugge @ 2018-05-09 9:30 ` Håkon Bugge 2018-05-09 11:28 ` Hal Rosenstock 2018-05-09 18:11 ` kbuild test robot 1 sibling, 2 replies; 24+ messages in thread From: Håkon Bugge @ 2018-05-09 9:30 UTC (permalink / raw) To: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty Cc: linux-rdma, linux-kernel, Håkon Bugge There is no point in using RDMA CM to establish a connection between two QPs that cannot possible communicate. Particularly, if both the active and passive side use limited pkeys, they are not able to communicate. In order to detect this situation, the authentic pkey is used in the CM REQ message. The authentic pkey is the one that the HCA inserts into the BTH in the IB packets. When the passive side receives the REQ, commit ("ib_core: A full pkey is required to match a limited one") ensures that ib_find_matched_cached_pkey() fails unless at least one of the pkeys compared has the full-member bit. In the limited-to-limited case, this will prohibit the connection to be formed, and thus, Pkey Violation Traps will not be sent to the SM. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> --- drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++------- include/rdma/ib_cm.h | 4 +++- 2 files changed, 35 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index a92e1a5c202b..52ed51d5bd2a 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3,6 +3,7 @@ * Copyright (c) 2004 Topspin Corporation. All rights reserved. * Copyright (c) 2004, 2005 Voltaire Corporation. All rights reserved. * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = { [IB_CM_REJ_INVALID_CLASS_VERSION] = "invalid class version", [IB_CM_REJ_INVALID_FLOW_LABEL] = "invalid flow label", [IB_CM_REJ_INVALID_ALT_FLOW_LABEL] = "invalid alt flow label", + [IB_CM_REJ_INVALID_PKEY] = "invalid PKey", }; const char *__attribute_const__ ibcm_reject_msg(int reason) @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av, return -EINVAL; cm_dev = port->cm_dev; - ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num, - be16_to_cpu(path->pkey), &av->pkey_index); + ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num, + be16_to_cpu(path->pkey), &av->pkey_index); if (ret) return ret; @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg, cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn)); cm_req_set_local_resp_timeout(req_msg, param->local_cm_response_timeout); - req_msg->pkey = param->primary_path->pkey; + req_msg->pkey = cpu_to_be16(cm_id_priv->pkey); cm_req_set_path_mtu(req_msg, param->primary_path->mtu); cm_req_set_max_cm_retries(req_msg, param->max_cm_retries); @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id, cm_id_priv->responder_resources = param->responder_resources; cm_id_priv->retry_count = param->retry_count; cm_id_priv->path_mtu = param->primary_path->mtu; - cm_id_priv->pkey = param->primary_path->pkey; + + /* + * We want to send the pkey used in the BTH in packets + * sent. This, in order for the passive side to determine if + * communication is permitted by the respective pkeys. + * + * The pkey in the paths are derived from the MGID, which has + * the full membership bit set. Hence, we retrieve the pkey by + * using the address vector's pkey_index. + */ + ret = ib_get_cached_pkey(cm_id_priv->id.device, + cm_id_priv->av.port->port_num, + cm_id_priv->av.pkey_index, + &cm_id_priv->pkey); + if (ret) + goto error1; + cm_id_priv->qp_type = param->qp_type; ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg); @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work) cm_id_priv); if (ret) { int err; + int rej_reason = (ret == -ENOENT ? + IB_CM_REJ_INVALID_PKEY : + IB_CM_REJ_INVALID_GID); err = ib_get_cached_gid(work->port->cm_dev->ib_device, work->port->port_num, 0, &work->path[0].sgid, NULL); if (err) - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, + ib_send_cm_rej(cm_id, rej_reason, NULL, 0, NULL, 0); else - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, + ib_send_cm_rej(cm_id, rej_reason, &work->path[0].sgid, sizeof(work->path[0].sgid), NULL, 0); @@ -1975,7 +1996,11 @@ static int cm_req_handler(struct cm_work *work) ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av, cm_id_priv); if (ret) { - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID, + int rej_reason = (ret == -ENOENT ? + IB_CM_REJ_INVALID_PKEY : + IB_CM_REJ_INVALID_ALT_GID); + + ib_send_cm_rej(cm_id, rej_reason, &work->path[0].sgid, sizeof(work->path[0].sgid), NULL, 0); goto rejected; diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h index 7979cb04f529..56b62303946a 100644 --- a/include/rdma/ib_cm.h +++ b/include/rdma/ib_cm.h @@ -3,6 +3,7 @@ * Copyright (c) 2004 Topspin Corporation. All rights reserved. * Copyright (c) 2004 Voltaire Corporation. All rights reserved. * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -183,7 +184,8 @@ enum ib_cm_rej_reason { IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID = 30, IB_CM_REJ_INVALID_CLASS_VERSION = 31, IB_CM_REJ_INVALID_FLOW_LABEL = 32, - IB_CM_REJ_INVALID_ALT_FLOW_LABEL = 33 + IB_CM_REJ_INVALID_ALT_FLOW_LABEL = 33, + IB_CM_REJ_INVALID_PKEY = 34, }; struct ib_cm_rej_event_param { -- 2.13.6 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-09 9:30 ` [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys Håkon Bugge @ 2018-05-09 11:28 ` Hal Rosenstock 2018-05-10 9:16 ` Håkon Bugge 2018-05-09 18:11 ` kbuild test robot 1 sibling, 1 reply; 24+ messages in thread From: Hal Rosenstock @ 2018-05-09 11:28 UTC (permalink / raw) To: Håkon Bugge, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty Cc: linux-rdma, linux-kernel On 5/9/2018 5:30 AM, Håkon Bugge wrote: > There is no point in using RDMA CM to establish a connection between > two QPs that cannot possible communicate. Particularly, if both the > active and passive side use limited pkeys, they are not able to > communicate. > > In order to detect this situation, the authentic pkey is used in the > CM REQ message. The authentic pkey is the one that the HCA inserts > into the BTH in the IB packets. > > When the passive side receives the REQ, commit ("ib_core: A full pkey > is required to match a limited one") ensures that > ib_find_matched_cached_pkey() fails unless at least one of the pkeys > compared has the full-member bit. > > In the limited-to-limited case, this will prohibit the connection to > be formed, and thus, Pkey Violation Traps will not be sent to the SM. > > Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> > --- > drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++------- > include/rdma/ib_cm.h | 4 +++- > 2 files changed, 35 insertions(+), 8 deletions(-) > > diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c > index a92e1a5c202b..52ed51d5bd2a 100644 > --- a/drivers/infiniband/core/cm.c > +++ b/drivers/infiniband/core/cm.c > @@ -3,6 +3,7 @@ > * Copyright (c) 2004 Topspin Corporation. All rights reserved. > * Copyright (c) 2004, 2005 Voltaire Corporation. All rights reserved. > * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. > + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = { > [IB_CM_REJ_INVALID_CLASS_VERSION] = "invalid class version", > [IB_CM_REJ_INVALID_FLOW_LABEL] = "invalid flow label", > [IB_CM_REJ_INVALID_ALT_FLOW_LABEL] = "invalid alt flow label", > + [IB_CM_REJ_INVALID_PKEY] = "invalid PKey", If this patch goes ahead, IBA spec for CM should be updated to include this. > }; > > const char *__attribute_const__ ibcm_reject_msg(int reason) > @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av, > return -EINVAL; > cm_dev = port->cm_dev; > > - ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num, > - be16_to_cpu(path->pkey), &av->pkey_index); > + ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num, > + be16_to_cpu(path->pkey), &av->pkey_index); > if (ret) > return ret; > > @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg, > cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn)); > cm_req_set_local_resp_timeout(req_msg, > param->local_cm_response_timeout); > - req_msg->pkey = param->primary_path->pkey; > + req_msg->pkey = cpu_to_be16(cm_id_priv->pkey); > cm_req_set_path_mtu(req_msg, param->primary_path->mtu); > cm_req_set_max_cm_retries(req_msg, param->max_cm_retries); > > @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id, > cm_id_priv->responder_resources = param->responder_resources; > cm_id_priv->retry_count = param->retry_count; > cm_id_priv->path_mtu = param->primary_path->mtu; > - cm_id_priv->pkey = param->primary_path->pkey; > + > + /* > + * We want to send the pkey used in the BTH in packets > + * sent. This, in order for the passive side to determine if > + * communication is permitted by the respective pkeys. > + * > + * The pkey in the paths are derived from the MGID, which has > + * the full membership bit set. Hence, we retrieve the pkey by > + * using the address vector's pkey_index. The paths usually come from the SM and I don't expect SM to provide path between ports of only limited members of partition. Default ACM provider forms path from multicast group parameters including pkey. Is that the scenario of concern ? If so, I still don't fully understand the scenario because limited members are not supposed to be part of a multicast group. There was some work started to extend this for client/server model but it was never completed. However, there may be hole(s) in various components of implementation which open(s) this door. -- Hal > + */ > + ret = ib_get_cached_pkey(cm_id_priv->id.device, > + cm_id_priv->av.port->port_num, > + cm_id_priv->av.pkey_index, > + &cm_id_priv->pkey); > + if (ret) > + goto error1; > + > cm_id_priv->qp_type = param->qp_type; > > ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg); > @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work) > cm_id_priv); > if (ret) { > int err; > + int rej_reason = (ret == -ENOENT ? > + IB_CM_REJ_INVALID_PKEY : > + IB_CM_REJ_INVALID_GID); > > err = ib_get_cached_gid(work->port->cm_dev->ib_device, > work->port->port_num, 0, > &work->path[0].sgid, > NULL); > if (err) > - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, > + ib_send_cm_rej(cm_id, rej_reason, > NULL, 0, NULL, 0); > else > - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, > + ib_send_cm_rej(cm_id, rej_reason, > &work->path[0].sgid, > sizeof(work->path[0].sgid), > NULL, 0); > @@ -1975,7 +1996,11 @@ static int cm_req_handler(struct cm_work *work) > ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av, > cm_id_priv); > if (ret) { > - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID, > + int rej_reason = (ret == -ENOENT ? > + IB_CM_REJ_INVALID_PKEY : > + IB_CM_REJ_INVALID_ALT_GID); > + > + ib_send_cm_rej(cm_id, rej_reason, > &work->path[0].sgid, > sizeof(work->path[0].sgid), NULL, 0); > goto rejected; > diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h > index 7979cb04f529..56b62303946a 100644 > --- a/include/rdma/ib_cm.h > +++ b/include/rdma/ib_cm.h > @@ -3,6 +3,7 @@ > * Copyright (c) 2004 Topspin Corporation. All rights reserved. > * Copyright (c) 2004 Voltaire Corporation. All rights reserved. > * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. > + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > @@ -183,7 +184,8 @@ enum ib_cm_rej_reason { > IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID = 30, > IB_CM_REJ_INVALID_CLASS_VERSION = 31, > IB_CM_REJ_INVALID_FLOW_LABEL = 32, > - IB_CM_REJ_INVALID_ALT_FLOW_LABEL = 33 > + IB_CM_REJ_INVALID_ALT_FLOW_LABEL = 33, > + IB_CM_REJ_INVALID_PKEY = 34, > }; > > struct ib_cm_rej_event_param { > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-09 11:28 ` Hal Rosenstock @ 2018-05-10 9:16 ` Håkon Bugge 2018-05-10 14:01 ` Hal Rosenstock 0 siblings, 1 reply; 24+ messages in thread From: Håkon Bugge @ 2018-05-10 9:16 UTC (permalink / raw) To: Hal Rosenstock Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel > On 9 May 2018, at 13:28, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: > > On 5/9/2018 5:30 AM, Håkon Bugge wrote: >> There is no point in using RDMA CM to establish a connection between >> two QPs that cannot possible communicate. Particularly, if both the >> active and passive side use limited pkeys, they are not able to >> communicate. >> >> In order to detect this situation, the authentic pkey is used in the >> CM REQ message. The authentic pkey is the one that the HCA inserts >> into the BTH in the IB packets. >> >> When the passive side receives the REQ, commit ("ib_core: A full pkey >> is required to match a limited one") ensures that >> ib_find_matched_cached_pkey() fails unless at least one of the pkeys >> compared has the full-member bit. >> >> In the limited-to-limited case, this will prohibit the connection to >> be formed, and thus, Pkey Violation Traps will not be sent to the SM. >> >> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> >> --- >> drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++------- >> include/rdma/ib_cm.h | 4 +++- >> 2 files changed, 35 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c >> index a92e1a5c202b..52ed51d5bd2a 100644 >> --- a/drivers/infiniband/core/cm.c >> +++ b/drivers/infiniband/core/cm.c >> @@ -3,6 +3,7 @@ >> * Copyright (c) 2004 Topspin Corporation. All rights reserved. >> * Copyright (c) 2004, 2005 Voltaire Corporation. All rights reserved. >> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. >> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved. >> * >> * This software is available to you under a choice of one of two >> * licenses. You may choose to be licensed under the terms of the GNU >> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = { >> [IB_CM_REJ_INVALID_CLASS_VERSION] = "invalid class version", >> [IB_CM_REJ_INVALID_FLOW_LABEL] = "invalid flow label", >> [IB_CM_REJ_INVALID_ALT_FLOW_LABEL] = "invalid alt flow label", >> + [IB_CM_REJ_INVALID_PKEY] = "invalid PKey", > > If this patch goes ahead, IBA spec for CM should be updated to include this. Sure, I see: 33 Invalid Alternate Flow Label as the latest in the spec. > >> }; >> >> const char *__attribute_const__ ibcm_reject_msg(int reason) >> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av, >> return -EINVAL; >> cm_dev = port->cm_dev; >> >> - ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num, >> - be16_to_cpu(path->pkey), &av->pkey_index); >> + ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num, >> + be16_to_cpu(path->pkey), &av->pkey_index); >> if (ret) >> return ret; >> >> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg, >> cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn)); >> cm_req_set_local_resp_timeout(req_msg, >> param->local_cm_response_timeout); >> - req_msg->pkey = param->primary_path->pkey; >> + req_msg->pkey = cpu_to_be16(cm_id_priv->pkey); >> cm_req_set_path_mtu(req_msg, param->primary_path->mtu); >> cm_req_set_max_cm_retries(req_msg, param->max_cm_retries); >> >> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id, >> cm_id_priv->responder_resources = param->responder_resources; >> cm_id_priv->retry_count = param->retry_count; >> cm_id_priv->path_mtu = param->primary_path->mtu; >> - cm_id_priv->pkey = param->primary_path->pkey; >> + >> + /* >> + * We want to send the pkey used in the BTH in packets >> + * sent. This, in order for the passive side to determine if >> + * communication is permitted by the respective pkeys. >> + * >> + * The pkey in the paths are derived from the MGID, which has >> + * the full membership bit set. Hence, we retrieve the pkey by >> + * using the address vector's pkey_index. > > The paths usually come from the SM and I don't expect SM to provide path > between ports of only limited members of partition. In my case, it does. > Default ACM provider > forms path from multicast group parameters including pkey. Is that the > scenario of concern ? Also RDMA CM does that. Do an ibdump of a CM REQ message sent from a limited port, and you will see the PKey is the full member in the CM REQ msg. > If so, I still don't fully understand the scenario > because limited members are not supposed to be part of a multicast > group. There was some work started to extend this for client/server > model but it was never completed. However, there may be hole(s) in > various components of implementation which open(s) this door. I view OpenSM not returning a valid path between two limited members an orthogonal issue, as OpenSM is another component. I think the CM REQ message should contain the correct PKey (fixed by this patch series). And in the event the passive side being a limited member and receives a CM REQ with a limited PKey, that connection should not be formed (fixed by this patch series). Read me correct, I am also in favour of fixing the OpenSM to not return a valid (but useless) path record in this case. Thxs, Håkon > > -- Hal > >> + */ >> + ret = ib_get_cached_pkey(cm_id_priv->id.device, >> + cm_id_priv->av.port->port_num, >> + cm_id_priv->av.pkey_index, >> + &cm_id_priv->pkey); >> + if (ret) >> + goto error1; >> + >> cm_id_priv->qp_type = param->qp_type; >> >> ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg); >> @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work) >> cm_id_priv); >> if (ret) { >> int err; >> + int rej_reason = (ret == -ENOENT ? >> + IB_CM_REJ_INVALID_PKEY : >> + IB_CM_REJ_INVALID_GID); >> >> err = ib_get_cached_gid(work->port->cm_dev->ib_device, >> work->port->port_num, 0, >> &work->path[0].sgid, >> NULL); >> if (err) >> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, >> + ib_send_cm_rej(cm_id, rej_reason, >> NULL, 0, NULL, 0); >> else >> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, >> + ib_send_cm_rej(cm_id, rej_reason, >> &work->path[0].sgid, >> sizeof(work->path[0].sgid), >> NULL, 0); >> @@ -1975,7 +1996,11 @@ static int cm_req_handler(struct cm_work *work) >> ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av, >> cm_id_priv); >> if (ret) { >> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID, >> + int rej_reason = (ret == -ENOENT ? >> + IB_CM_REJ_INVALID_PKEY : >> + IB_CM_REJ_INVALID_ALT_GID); >> + >> + ib_send_cm_rej(cm_id, rej_reason, >> &work->path[0].sgid, >> sizeof(work->path[0].sgid), NULL, 0); >> goto rejected; >> diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h >> index 7979cb04f529..56b62303946a 100644 >> --- a/include/rdma/ib_cm.h >> +++ b/include/rdma/ib_cm.h >> @@ -3,6 +3,7 @@ >> * Copyright (c) 2004 Topspin Corporation. All rights reserved. >> * Copyright (c) 2004 Voltaire Corporation. All rights reserved. >> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. >> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved. >> * >> * This software is available to you under a choice of one of two >> * licenses. You may choose to be licensed under the terms of the GNU >> @@ -183,7 +184,8 @@ enum ib_cm_rej_reason { >> IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID = 30, >> IB_CM_REJ_INVALID_CLASS_VERSION = 31, >> IB_CM_REJ_INVALID_FLOW_LABEL = 32, >> - IB_CM_REJ_INVALID_ALT_FLOW_LABEL = 33 >> + IB_CM_REJ_INVALID_ALT_FLOW_LABEL = 33, >> + IB_CM_REJ_INVALID_PKEY = 34, >> }; >> >> struct ib_cm_rej_event_param { >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-10 9:16 ` Håkon Bugge @ 2018-05-10 14:01 ` Hal Rosenstock 2018-05-10 15:16 ` Håkon Bugge 0 siblings, 1 reply; 24+ messages in thread From: Hal Rosenstock @ 2018-05-10 14:01 UTC (permalink / raw) To: Håkon Bugge Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel On 5/10/2018 5:16 AM, Håkon Bugge wrote: > > >> On 9 May 2018, at 13:28, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: >> >> On 5/9/2018 5:30 AM, Håkon Bugge wrote: >>> There is no point in using RDMA CM to establish a connection between >>> two QPs that cannot possible communicate. Particularly, if both the >>> active and passive side use limited pkeys, they are not able to >>> communicate. >>> >>> In order to detect this situation, the authentic pkey is used in the >>> CM REQ message. The authentic pkey is the one that the HCA inserts >>> into the BTH in the IB packets. >>> >>> When the passive side receives the REQ, commit ("ib_core: A full pkey >>> is required to match a limited one") ensures that >>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys >>> compared has the full-member bit. >>> >>> In the limited-to-limited case, this will prohibit the connection to >>> be formed, and thus, Pkey Violation Traps will not be sent to the SM. >>> >>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> >>> --- >>> drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++------- >>> include/rdma/ib_cm.h | 4 +++- >>> 2 files changed, 35 insertions(+), 8 deletions(-) >>> >>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c >>> index a92e1a5c202b..52ed51d5bd2a 100644 >>> --- a/drivers/infiniband/core/cm.c >>> +++ b/drivers/infiniband/core/cm.c >>> @@ -3,6 +3,7 @@ >>> * Copyright (c) 2004 Topspin Corporation. All rights reserved. >>> * Copyright (c) 2004, 2005 Voltaire Corporation. All rights reserved. >>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. >>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved. >>> * >>> * This software is available to you under a choice of one of two >>> * licenses. You may choose to be licensed under the terms of the GNU >>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = { >>> [IB_CM_REJ_INVALID_CLASS_VERSION] = "invalid class version", >>> [IB_CM_REJ_INVALID_FLOW_LABEL] = "invalid flow label", >>> [IB_CM_REJ_INVALID_ALT_FLOW_LABEL] = "invalid alt flow label", >>> + [IB_CM_REJ_INVALID_PKEY] = "invalid PKey", >> >> If this patch goes ahead, IBA spec for CM should be updated to include this. > > Sure, I see: > > 33 Invalid Alternate Flow Label > > as the latest in the spec. This appears to be an implementation rather than architectural issue. If there is limited <-> limited CM, then the MAD would never get up to the CM layer as it would be filtered by end port pkey enforcement. This is only needed to protect against current code which can send on the full rather than limited pkey (in BTH), right ? >> >>> }; >>> >>> const char *__attribute_const__ ibcm_reject_msg(int reason) >>> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av, >>> return -EINVAL; >>> cm_dev = port->cm_dev; >>> >>> - ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num, >>> - be16_to_cpu(path->pkey), &av->pkey_index); >>> + ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num, >>> + be16_to_cpu(path->pkey), &av->pkey_index); >>> if (ret) >>> return ret; >>> >>> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg, >>> cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn)); >>> cm_req_set_local_resp_timeout(req_msg, >>> param->local_cm_response_timeout); >>> - req_msg->pkey = param->primary_path->pkey; >>> + req_msg->pkey = cpu_to_be16(cm_id_priv->pkey); >>> cm_req_set_path_mtu(req_msg, param->primary_path->mtu); >>> cm_req_set_max_cm_retries(req_msg, param->max_cm_retries); >>> >>> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id, >>> cm_id_priv->responder_resources = param->responder_resources; >>> cm_id_priv->retry_count = param->retry_count; >>> cm_id_priv->path_mtu = param->primary_path->mtu; >>> - cm_id_priv->pkey = param->primary_path->pkey; >>> + >>> + /* >>> + * We want to send the pkey used in the BTH in packets >>> + * sent. This, in order for the passive side to determine if >>> + * communication is permitted by the respective pkeys. >>> + * >>> + * The pkey in the paths are derived from the MGID, which has >>> + * the full membership bit set. Hence, we retrieve the pkey by >>> + * using the address vector's pkey_index. >> >> The paths usually come from the SM and I don't expect SM to provide path >> between ports of only limited members of partition. > > In my case, it does. Which OpenSM flavor are you talking about (upstream, MLNX, or other) ? What is OpenSM version string ? Is allow_both_pkeys TRUE or FALSE ? >> Default ACM provider >> forms path from multicast group parameters including pkey. Is that the >> scenario of concern ? > > Also RDMA CM does that. Do an ibdump of a CM REQ message sent from a limited port, and you will see the PKey is the full member in the CM REQ msg. Not exactly sure what you mean by RDMA CM does this as there are a number of different ways to use RDMA CM. Are you referring to using IPoIB for address resolution ? Another way is for a path record to be passed in from user space and there is no control over it's contents. At a minimum, I think the comment mentioning MGID should be clarified. >> If so, I still don't fully understand the scenario >> because limited members are not supposed to be part of a multicast >> group. There was some work started to extend this for client/server >> model but it was never completed. However, there may be hole(s) in >> various components of implementation which open(s) this door. > > I view OpenSM not returning a valid path between two limited members an orthogonal issue, as OpenSM is another component. Agreed. I think that there are other components too... > I think the CM REQ message should contain the correct PKey (fixed by this patch series). > > And in the event the passive side being a limited member and receives a CM REQ with a limited PKey, that connection should not be formed (fixed by this patch series). This is the backward compatibility for the current behavior on the active side so that REQ gets rejected rather than accepted, right ? In general, the approach used when there is pkey mismatch is to silently drop packet rather than respond which causes a timeout on the requester side. In this specific case, I'm not sure which approach is better. Also, is there a similar issue with SIDR_REQ ? > Read me correct, I am also in favour of fixing the OpenSM to not return a valid (but useless) path record in this case. Understood and agreed. -- Hal > > Thxs, Håkon > > >> >> -- Hal >> >>> + */ >>> + ret = ib_get_cached_pkey(cm_id_priv->id.device, >>> + cm_id_priv->av.port->port_num, >>> + cm_id_priv->av.pkey_index, >>> + &cm_id_priv->pkey); >>> + if (ret) >>> + goto error1; >>> + >>> cm_id_priv->qp_type = param->qp_type; >>> >>> ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg); >>> @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work) >>> cm_id_priv); >>> if (ret) { >>> int err; >>> + int rej_reason = (ret == -ENOENT ? >>> + IB_CM_REJ_INVALID_PKEY : >>> + IB_CM_REJ_INVALID_GID); >>> >>> err = ib_get_cached_gid(work->port->cm_dev->ib_device, >>> work->port->port_num, 0, >>> &work->path[0].sgid, >>> NULL); >>> if (err) >>> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, >>> + ib_send_cm_rej(cm_id, rej_reason, >>> NULL, 0, NULL, 0); >>> else >>> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, >>> + ib_send_cm_rej(cm_id, rej_reason, >>> &work->path[0].sgid, >>> sizeof(work->path[0].sgid), >>> NULL, 0); >>> @@ -1975,7 +1996,11 @@ static int cm_req_handler(struct cm_work *work) >>> ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av, >>> cm_id_priv); >>> if (ret) { >>> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID, >>> + int rej_reason = (ret == -ENOENT ? >>> + IB_CM_REJ_INVALID_PKEY : >>> + IB_CM_REJ_INVALID_ALT_GID); >>> + >>> + ib_send_cm_rej(cm_id, rej_reason, >>> &work->path[0].sgid, >>> sizeof(work->path[0].sgid), NULL, 0); >>> goto rejected; >>> diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h >>> index 7979cb04f529..56b62303946a 100644 >>> --- a/include/rdma/ib_cm.h >>> +++ b/include/rdma/ib_cm.h >>> @@ -3,6 +3,7 @@ >>> * Copyright (c) 2004 Topspin Corporation. All rights reserved. >>> * Copyright (c) 2004 Voltaire Corporation. All rights reserved. >>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. >>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved. >>> * >>> * This software is available to you under a choice of one of two >>> * licenses. You may choose to be licensed under the terms of the GNU >>> @@ -183,7 +184,8 @@ enum ib_cm_rej_reason { >>> IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID = 30, >>> IB_CM_REJ_INVALID_CLASS_VERSION = 31, >>> IB_CM_REJ_INVALID_FLOW_LABEL = 32, >>> - IB_CM_REJ_INVALID_ALT_FLOW_LABEL = 33 >>> + IB_CM_REJ_INVALID_ALT_FLOW_LABEL = 33, >>> + IB_CM_REJ_INVALID_PKEY = 34, >>> }; >>> >>> struct ib_cm_rej_event_param { >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-10 14:01 ` Hal Rosenstock @ 2018-05-10 15:16 ` Håkon Bugge 2018-05-10 16:54 ` Hal Rosenstock 2018-05-14 21:02 ` Jason Gunthorpe 0 siblings, 2 replies; 24+ messages in thread From: Håkon Bugge @ 2018-05-10 15:16 UTC (permalink / raw) To: Hal Rosenstock Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel > On 10 May 2018, at 16:01, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: > > On 5/10/2018 5:16 AM, Håkon Bugge wrote: >> >> >>> On 9 May 2018, at 13:28, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: >>> >>> On 5/9/2018 5:30 AM, Håkon Bugge wrote: >>>> There is no point in using RDMA CM to establish a connection between >>>> two QPs that cannot possible communicate. Particularly, if both the >>>> active and passive side use limited pkeys, they are not able to >>>> communicate. >>>> >>>> In order to detect this situation, the authentic pkey is used in the >>>> CM REQ message. The authentic pkey is the one that the HCA inserts >>>> into the BTH in the IB packets. >>>> >>>> When the passive side receives the REQ, commit ("ib_core: A full pkey >>>> is required to match a limited one") ensures that >>>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys >>>> compared has the full-member bit. >>>> >>>> In the limited-to-limited case, this will prohibit the connection to >>>> be formed, and thus, Pkey Violation Traps will not be sent to the SM. >>>> >>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> >>>> --- >>>> drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++------- >>>> include/rdma/ib_cm.h | 4 +++- >>>> 2 files changed, 35 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c >>>> index a92e1a5c202b..52ed51d5bd2a 100644 >>>> --- a/drivers/infiniband/core/cm.c >>>> +++ b/drivers/infiniband/core/cm.c >>>> @@ -3,6 +3,7 @@ >>>> * Copyright (c) 2004 Topspin Corporation. All rights reserved. >>>> * Copyright (c) 2004, 2005 Voltaire Corporation. All rights reserved. >>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. >>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved. >>>> * >>>> * This software is available to you under a choice of one of two >>>> * licenses. You may choose to be licensed under the terms of the GNU >>>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = { >>>> [IB_CM_REJ_INVALID_CLASS_VERSION] = "invalid class version", >>>> [IB_CM_REJ_INVALID_FLOW_LABEL] = "invalid flow label", >>>> [IB_CM_REJ_INVALID_ALT_FLOW_LABEL] = "invalid alt flow label", >>>> + [IB_CM_REJ_INVALID_PKEY] = "invalid PKey", >>> >>> If this patch goes ahead, IBA spec for CM should be updated to include this. >> >> Sure, I see: >> >> 33 Invalid Alternate Flow Label >> >> as the latest in the spec. > > This appears to be an implementation rather than architectural issue. If > there is limited <-> limited CM, then the MAD would never get up to the > CM layer as it would be filtered by end port pkey enforcement. This is > only needed to protect against current code which can send on the full > rather than limited pkey (in BTH), right ? We are talking about two things here. The PKey in the BTH and the PKey in the CM REQ payload. They differ. I am out of office, but if my memory serves me correct, the PKey in the BTH in the MAD packet will be the default PKey. Further, we have per IBTA: C10-132: Packets sent to the General Service Interface QP (QP1) shall be accepted if the P_Key in the packet matches any valid P_Key in the P_Key Table of the port on which the packet arrived. Matching is defined in 10.9.3 Partition Key Matching. So, by architecture, the MAD arrives. > > >>> >>>> }; >>>> >>>> const char *__attribute_const__ ibcm_reject_msg(int reason) >>>> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av, >>>> return -EINVAL; >>>> cm_dev = port->cm_dev; >>>> >>>> - ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num, >>>> - be16_to_cpu(path->pkey), &av->pkey_index); >>>> + ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num, >>>> + be16_to_cpu(path->pkey), &av->pkey_index); >>>> if (ret) >>>> return ret; >>>> >>>> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg, >>>> cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn)); >>>> cm_req_set_local_resp_timeout(req_msg, >>>> param->local_cm_response_timeout); >>>> - req_msg->pkey = param->primary_path->pkey; >>>> + req_msg->pkey = cpu_to_be16(cm_id_priv->pkey); >>>> cm_req_set_path_mtu(req_msg, param->primary_path->mtu); >>>> cm_req_set_max_cm_retries(req_msg, param->max_cm_retries); >>>> >>>> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id, >>>> cm_id_priv->responder_resources = param->responder_resources; >>>> cm_id_priv->retry_count = param->retry_count; >>>> cm_id_priv->path_mtu = param->primary_path->mtu; >>>> - cm_id_priv->pkey = param->primary_path->pkey; >>>> + >>>> + /* >>>> + * We want to send the pkey used in the BTH in packets >>>> + * sent. This, in order for the passive side to determine if >>>> + * communication is permitted by the respective pkeys. >>>> + * >>>> + * The pkey in the paths are derived from the MGID, which has >>>> + * the full membership bit set. Hence, we retrieve the pkey by >>>> + * using the address vector's pkey_index. >>> >>> The paths usually come from the SM and I don't expect SM to provide path >>> between ports of only limited members of partition. >> >> In my case, it does. > > Which OpenSM flavor are you talking about (upstream, MLNX, or other) ? I guess other is the right term, as its Oracle’s flavour, 3.2.6_20170609. > What is OpenSM version string ? Is allow_both_pkeys TRUE or FALSE ? The ports do have both limited and full memberships, and the partitions.conf file contains both “limited”, “both”, and the “full” keywords. >>> Default ACM provider >>> forms path from multicast group parameters including pkey. Is that the >>> scenario of concern ? >> >> Also RDMA CM does that. Do an ibdump of a CM REQ message sent from a limited port, and you will see the PKey is the full member in the CM REQ msg. > > Not exactly sure what you mean by RDMA CM does this as there are a > number of different ways to use RDMA CM. I am talking specifically about the CM REQ message. For example, if you run an IB-UV test program and specifies that RDMA CM should be used for connection management. Or it could be kernel ULPs. My point is that the pkey field in the CM REQ message (i.e. not the one in the BTH) is always the full one, even though the port uses the limited one. > Are you referring to using > IPoIB for address resolution ? Another way is for a path record to be > passed in from user space and there is no control over it’s contents. I did not bring up the path record or address resolution issue ;-) But I guess you are alluding to the limited-to-limited case should 1) not have been able to perform address resolution and/or 2) the path record should not have been returned. I agree to both 1) and 2). But if both 1) and 2) succeeds, this patch series is a last resort to stop the connection to be established and avoid pkey trap violation traps being sent. > At a minimum, I think the comment mentioning MGID should be clarified. Is the question if an MC group attached by a limited port will include other limited ports or only full member ports? With the OpenSM I am running, the MC group contains limited members. >>> If so, I still don't fully understand the scenario >>> because limited members are not supposed to be part of a multicast >>> group. There was some work started to extend this for client/server >>> model but it was never completed. However, there may be hole(s) in >>> various components of implementation which open(s) this door. >> >> I view OpenSM not returning a valid path between two limited members an orthogonal issue, as OpenSM is another component. > > Agreed. I think that there are other components too... > >> I think the CM REQ message should contain the correct PKey (fixed by this patch series). >> >> And in the event the passive side being a limited member and receives a CM REQ with a limited PKey, that connection should not be formed (fixed by this patch series). > This is the backward compatibility for the current behavior on the > active side so that REQ gets rejected rather than accepted, right ? Not sure about the backwards compatibility part, but I certainly agree to the rest. Put it another way, when the MAD payload’s pkey is limited in the CM REQ, and there is only a limited matching key in the pkey cache, a REJ is sent back. > In general, the approach used when there is pkey mismatch is to silently > drop packet rather than respond which causes a timeout on the requester > side. In this specific case, I’m not sure which approach is better. It is not silently dropped when the port supports pkey trap messages. Then a trap message is sent to the OpenSM. > Also, is there a similar issue with SIDR_REQ ? Good point, most probaly! > Read me correct, I am also in favour of fixing the OpenSM to not return a valid (but useless) path record in this case. > > Understood and agreed. Thxs, Håkon > > -- Hal > >> >> Thxs, Håkon >> >> >>> >>> -- Hal >>> >>>> + */ >>>> + ret = ib_get_cached_pkey(cm_id_priv->id.device, >>>> + cm_id_priv->av.port->port_num, >>>> + cm_id_priv->av.pkey_index, >>>> + &cm_id_priv->pkey); >>>> + if (ret) >>>> + goto error1; >>>> + >>>> cm_id_priv->qp_type = param->qp_type; >>>> >>>> ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg); >>>> @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work) >>>> cm_id_priv); >>>> if (ret) { >>>> int err; >>>> + int rej_reason = (ret == -ENOENT ? >>>> + IB_CM_REJ_INVALID_PKEY : >>>> + IB_CM_REJ_INVALID_GID); >>>> >>>> err = ib_get_cached_gid(work->port->cm_dev->ib_device, >>>> work->port->port_num, 0, >>>> &work->path[0].sgid, >>>> NULL); >>>> if (err) >>>> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, >>>> + ib_send_cm_rej(cm_id, rej_reason, >>>> NULL, 0, NULL, 0); >>>> else >>>> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, >>>> + ib_send_cm_rej(cm_id, rej_reason, >>>> &work->path[0].sgid, >>>> sizeof(work->path[0].sgid), >>>> NULL, 0); >>>> @@ -1975,7 +1996,11 @@ static int cm_req_handler(struct cm_work *work) >>>> ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av, >>>> cm_id_priv); >>>> if (ret) { >>>> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID, >>>> + int rej_reason = (ret == -ENOENT ? >>>> + IB_CM_REJ_INVALID_PKEY : >>>> + IB_CM_REJ_INVALID_ALT_GID); >>>> + >>>> + ib_send_cm_rej(cm_id, rej_reason, >>>> &work->path[0].sgid, >>>> sizeof(work->path[0].sgid), NULL, 0); >>>> goto rejected; >>>> diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h >>>> index 7979cb04f529..56b62303946a 100644 >>>> --- a/include/rdma/ib_cm.h >>>> +++ b/include/rdma/ib_cm.h >>>> @@ -3,6 +3,7 @@ >>>> * Copyright (c) 2004 Topspin Corporation. All rights reserved. >>>> * Copyright (c) 2004 Voltaire Corporation. All rights reserved. >>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. >>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved. >>>> * >>>> * This software is available to you under a choice of one of two >>>> * licenses. You may choose to be licensed under the terms of the GNU >>>> @@ -183,7 +184,8 @@ enum ib_cm_rej_reason { >>>> IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID = 30, >>>> IB_CM_REJ_INVALID_CLASS_VERSION = 31, >>>> IB_CM_REJ_INVALID_FLOW_LABEL = 32, >>>> - IB_CM_REJ_INVALID_ALT_FLOW_LABEL = 33 >>>> + IB_CM_REJ_INVALID_ALT_FLOW_LABEL = 33, >>>> + IB_CM_REJ_INVALID_PKEY = 34, >>>> }; >>>> >>>> struct ib_cm_rej_event_param { >>>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-10 15:16 ` Håkon Bugge @ 2018-05-10 16:54 ` Hal Rosenstock 2018-05-11 10:55 ` Håkon Bugge 2018-05-14 21:02 ` Jason Gunthorpe 1 sibling, 1 reply; 24+ messages in thread From: Hal Rosenstock @ 2018-05-10 16:54 UTC (permalink / raw) To: Håkon Bugge Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel On 5/10/2018 11:16 AM, Håkon Bugge wrote: > > >> On 10 May 2018, at 16:01, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: >> >> On 5/10/2018 5:16 AM, Håkon Bugge wrote: >>> >>> >>>> On 9 May 2018, at 13:28, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: >>>> >>>> On 5/9/2018 5:30 AM, Håkon Bugge wrote: >>>>> There is no point in using RDMA CM to establish a connection between >>>>> two QPs that cannot possible communicate. Particularly, if both the >>>>> active and passive side use limited pkeys, they are not able to >>>>> communicate. >>>>> >>>>> In order to detect this situation, the authentic pkey is used in the >>>>> CM REQ message. The authentic pkey is the one that the HCA inserts >>>>> into the BTH in the IB packets. >>>>> >>>>> When the passive side receives the REQ, commit ("ib_core: A full pkey >>>>> is required to match a limited one") ensures that >>>>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys >>>>> compared has the full-member bit. >>>>> >>>>> In the limited-to-limited case, this will prohibit the connection to >>>>> be formed, and thus, Pkey Violation Traps will not be sent to the SM. >>>>> >>>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> >>>>> --- >>>>> drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++------- >>>>> include/rdma/ib_cm.h | 4 +++- >>>>> 2 files changed, 35 insertions(+), 8 deletions(-) >>>>> >>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c >>>>> index a92e1a5c202b..52ed51d5bd2a 100644 >>>>> --- a/drivers/infiniband/core/cm.c >>>>> +++ b/drivers/infiniband/core/cm.c >>>>> @@ -3,6 +3,7 @@ >>>>> * Copyright (c) 2004 Topspin Corporation. All rights reserved. >>>>> * Copyright (c) 2004, 2005 Voltaire Corporation. All rights reserved. >>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. >>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved. >>>>> * >>>>> * This software is available to you under a choice of one of two >>>>> * licenses. You may choose to be licensed under the terms of the GNU >>>>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = { >>>>> [IB_CM_REJ_INVALID_CLASS_VERSION] = "invalid class version", >>>>> [IB_CM_REJ_INVALID_FLOW_LABEL] = "invalid flow label", >>>>> [IB_CM_REJ_INVALID_ALT_FLOW_LABEL] = "invalid alt flow label", >>>>> + [IB_CM_REJ_INVALID_PKEY] = "invalid PKey", >>>> >>>> If this patch goes ahead, IBA spec for CM should be updated to include this. >>> >>> Sure, I see: >>> >>> 33 Invalid Alternate Flow Label >>> >>> as the latest in the spec. >> >> This appears to be an implementation rather than architectural issue. If >> there is limited <-> limited CM, then the MAD would never get up to the >> CM layer as it would be filtered by end port pkey enforcement. This is >> only needed to protect against current code which can send on the full >> rather than limited pkey (in BTH), right ? > > We are talking about two things here. The PKey in the BTH and the PKey in the CM REQ payload. They differ. Yes, I understand the difference between the 2 potentially different PKeys. > I am out of office, but if my memory serves me correct, the PKey in the BTH in the MAD packet will be the default PKey. MADs are sent on reversible paths (13.5.4.1) where the response can be constructed from only header info in the request. BTH:pkey doesn't __have to be__ the default pkey. It just needs to be a common/shared pkey between the source/destination pair. For example, there are some configurations where the port is only a limited member of the default partition (and full member of other non default partition(s)). There are also configurations where the port is both full and limited member on one or more partition(s). This is the allow_both_pkeys TRUE case for OpenSM which also uses the 'both' syntax in the partitions config file. This appears to be the case to which you are referring. CM MAD could be sent on default partition but that may not work if both source and dest are only limited members of the default partition. CM MAD could be sent on any common/shared pkey between the source/destination ports. > Further, we have per IBTA: > > C10-132: Packets sent to the General Service Interface QP (QP1) shall be accepted if the P_Key in the packet matches any valid P_Key in the P_Key Table of the port on which the packet arrived. Matching is defined in 10.9.3 Partition Key Matching. > > So, by architecture, the MAD arrives In the case of port being both full and limited member of the same partition on which CM MAD comes in on (as defined by pkey in BTH). I was talking about limited <-> limited CM and didn't realize you were talking about the 'both' case. >> >> >>>> >>>>> }; >>>>> >>>>> const char *__attribute_const__ ibcm_reject_msg(int reason) >>>>> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av, >>>>> return -EINVAL; >>>>> cm_dev = port->cm_dev; >>>>> >>>>> - ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num, >>>>> - be16_to_cpu(path->pkey), &av->pkey_index); >>>>> + ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num, >>>>> + be16_to_cpu(path->pkey), &av->pkey_index); >>>>> if (ret) >>>>> return ret; >>>>> >>>>> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg, >>>>> cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn)); >>>>> cm_req_set_local_resp_timeout(req_msg, >>>>> param->local_cm_response_timeout); >>>>> - req_msg->pkey = param->primary_path->pkey; >>>>> + req_msg->pkey = cpu_to_be16(cm_id_priv->pkey); >>>>> cm_req_set_path_mtu(req_msg, param->primary_path->mtu); >>>>> cm_req_set_max_cm_retries(req_msg, param->max_cm_retries); >>>>> >>>>> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id, >>>>> cm_id_priv->responder_resources = param->responder_resources; >>>>> cm_id_priv->retry_count = param->retry_count; >>>>> cm_id_priv->path_mtu = param->primary_path->mtu; >>>>> - cm_id_priv->pkey = param->primary_path->pkey; >>>>> + >>>>> + /* >>>>> + * We want to send the pkey used in the BTH in packets >>>>> + * sent. This, in order for the passive side to determine if >>>>> + * communication is permitted by the respective pkeys. >>>>> + * >>>>> + * The pkey in the paths are derived from the MGID, which has >>>>> + * the full membership bit set. Hence, we retrieve the pkey by >>>>> + * using the address vector's pkey_index. >>>> >>>> The paths usually come from the SM and I don't expect SM to provide path >>>> between ports of only limited members of partition. >>> >>> In my case, it does. >> >> Which OpenSM flavor are you talking about (upstream, MLNX, or other) ? > > I guess other is the right term, as its Oracle’s flavour, 3.2.6_20170609. I'll check upstream OpenSM when I get a chance to see if same problem exists. Oracle OpenSM forked from upstream long time ago and not sure how similar the path record code is. There were various impacts for the shared port virtualization (for CX-3) in terms of adding the ability to allow both pkeys. >> What is OpenSM version string ? Is allow_both_pkeys TRUE or FALSE ? > > The ports do have both limited and full memberships, and the partitions.conf file contains both “limited”, “both”, and the “full” keywords. > >>>> Default ACM provider >>>> forms path from multicast group parameters including pkey. Is that the >>>> scenario of concern ? >>> >>> Also RDMA CM does that. Do an ibdump of a CM REQ message sent from a limited port, and you will see the PKey is the full member in the CM REQ msg. >> >> Not exactly sure what you mean by RDMA CM does this as there are a >> number of different ways to use RDMA CM. > > I am talking specifically about the CM REQ message. For example, if you run an IB-UV test program and specifies that RDMA CM should be used for connection management. Or it could be kernel ULPs. My point is that the pkey field in the CM REQ message (i.e. not the one in the BTH) is always the full one, even though the port uses the limited one. When you write "port uses limited one", you mean BTH:pkey is limited member but REQ:pkey is full member, right ? > >> Are you referring to using >> IPoIB for address resolution ? Another way is for a path record to be >> passed in from user space and there is no control over it’s contents. > > I did not bring up the path record or address resolution issue ;-) You mentioned that "The pkey in the paths are derived from the MGID" and brought this issue up in my thinking about this issue. > But I guess you are alluding to the limited-to-limited case should 1) not have been able to perform address resolution and/or 2) the path record should not have been returned. I agree to both 1) and 2). > > But if both 1) and 2) succeeds, this patch series is a last resort to stop the connection to be established and avoid pkey trap violation traps being sent. > >> At a minimum, I think the comment mentioning MGID should be clarified. > > Is the question if an MC group attached by a limited port will include other limited ports or only full member ports? > > With the OpenSM I am running, the MC group contains limited members. That is problematic as any sending to MC group by limited members will cause pkey violation traps to SM. >>>> If so, I still don't fully understand the scenario >>>> because limited members are not supposed to be part of a multicast >>>> group. There was some work started to extend this for client/server >>>> model but it was never completed. However, there may be hole(s) in >>>> various components of implementation which open(s) this door. >>> >>> I view OpenSM not returning a valid path between two limited members an orthogonal issue, as OpenSM is another component. >> >> Agreed. I think that there are other components too... >> >>> I think the CM REQ message should contain the correct PKey (fixed by this patch series). >>> >>> And in the event the passive side being a limited member and receives a CM REQ with a limited PKey, that connection should not be formed (fixed by this patch series). >> This is the backward compatibility for the current behavior on the >> active side so that REQ gets rejected rather than accepted, right ? > > Not sure about the backwards compatibility part, but I certainly agree to the rest. Put it another way, when the MAD payload’s pkey is limited in the CM REQ, and there is only a limited matching key in the pkey cache, a REJ is sent back. Understood. >> In general, the approach used when there is pkey mismatch is to silently >> drop packet rather than respond which causes a timeout on the requester >> side. In this specific case, I’m not sure which approach is better. > > It is not silently dropped when the port supports pkey trap messages. Then a trap message is sent to the OpenSM. I was only talking about the requester-responder part. Yes, pkey security trap would go to SM. As I wrote, I'm not sure which is preferable: to explictly reject as this patch proposes or just silently drop and penalize requester. -- Hal >> Also, is there a similar issue with SIDR_REQ ? > > Good point, most probaly! > >> Read me correct, I am also in favour of fixing the OpenSM to not return a valid (but useless) path record in this case. >> >> Understood and agreed. > > > Thxs, Håkon > >> >> -- Hal >> >>> >>> Thxs, Håkon >>> >>> >>>> >>>> -- Hal >>>> >>>>> + */ >>>>> + ret = ib_get_cached_pkey(cm_id_priv->id.device, >>>>> + cm_id_priv->av.port->port_num, >>>>> + cm_id_priv->av.pkey_index, >>>>> + &cm_id_priv->pkey); >>>>> + if (ret) >>>>> + goto error1; >>>>> + >>>>> cm_id_priv->qp_type = param->qp_type; >>>>> >>>>> ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg); >>>>> @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work) >>>>> cm_id_priv); >>>>> if (ret) { >>>>> int err; >>>>> + int rej_reason = (ret == -ENOENT ? >>>>> + IB_CM_REJ_INVALID_PKEY : >>>>> + IB_CM_REJ_INVALID_GID); >>>>> >>>>> err = ib_get_cached_gid(work->port->cm_dev->ib_device, >>>>> work->port->port_num, 0, >>>>> &work->path[0].sgid, >>>>> NULL); >>>>> if (err) >>>>> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, >>>>> + ib_send_cm_rej(cm_id, rej_reason, >>>>> NULL, 0, NULL, 0); >>>>> else >>>>> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, >>>>> + ib_send_cm_rej(cm_id, rej_reason, >>>>> &work->path[0].sgid, >>>>> sizeof(work->path[0].sgid), >>>>> NULL, 0); >>>>> @@ -1975,7 +1996,11 @@ static int cm_req_handler(struct cm_work *work) >>>>> ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av, >>>>> cm_id_priv); >>>>> if (ret) { >>>>> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID, >>>>> + int rej_reason = (ret == -ENOENT ? >>>>> + IB_CM_REJ_INVALID_PKEY : >>>>> + IB_CM_REJ_INVALID_ALT_GID); >>>>> + >>>>> + ib_send_cm_rej(cm_id, rej_reason, >>>>> &work->path[0].sgid, >>>>> sizeof(work->path[0].sgid), NULL, 0); >>>>> goto rejected; >>>>> diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h >>>>> index 7979cb04f529..56b62303946a 100644 >>>>> --- a/include/rdma/ib_cm.h >>>>> +++ b/include/rdma/ib_cm.h >>>>> @@ -3,6 +3,7 @@ >>>>> * Copyright (c) 2004 Topspin Corporation. All rights reserved. >>>>> * Copyright (c) 2004 Voltaire Corporation. All rights reserved. >>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. >>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved. >>>>> * >>>>> * This software is available to you under a choice of one of two >>>>> * licenses. You may choose to be licensed under the terms of the GNU >>>>> @@ -183,7 +184,8 @@ enum ib_cm_rej_reason { >>>>> IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID = 30, >>>>> IB_CM_REJ_INVALID_CLASS_VERSION = 31, >>>>> IB_CM_REJ_INVALID_FLOW_LABEL = 32, >>>>> - IB_CM_REJ_INVALID_ALT_FLOW_LABEL = 33 >>>>> + IB_CM_REJ_INVALID_ALT_FLOW_LABEL = 33, >>>>> + IB_CM_REJ_INVALID_PKEY = 34, >>>>> }; >>>>> >>>>> struct ib_cm_rej_event_param { >>>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-10 16:54 ` Hal Rosenstock @ 2018-05-11 10:55 ` Håkon Bugge 2018-05-11 12:51 ` Hal Rosenstock 2018-05-14 21:04 ` Jason Gunthorpe 0 siblings, 2 replies; 24+ messages in thread From: Håkon Bugge @ 2018-05-11 10:55 UTC (permalink / raw) To: Hal Rosenstock Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel > On 10 May 2018, at 18:54, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: > > On 5/10/2018 11:16 AM, Håkon Bugge wrote: >> >> >>> On 10 May 2018, at 16:01, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: >>> >>> On 5/10/2018 5:16 AM, Håkon Bugge wrote: >>>> >>>> >>>>> On 9 May 2018, at 13:28, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: >>>>> >>>>> On 5/9/2018 5:30 AM, Håkon Bugge wrote: >>>>>> There is no point in using RDMA CM to establish a connection between >>>>>> two QPs that cannot possible communicate. Particularly, if both the >>>>>> active and passive side use limited pkeys, they are not able to >>>>>> communicate. >>>>>> >>>>>> In order to detect this situation, the authentic pkey is used in the >>>>>> CM REQ message. The authentic pkey is the one that the HCA inserts >>>>>> into the BTH in the IB packets. >>>>>> >>>>>> When the passive side receives the REQ, commit ("ib_core: A full pkey >>>>>> is required to match a limited one") ensures that >>>>>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys >>>>>> compared has the full-member bit. >>>>>> >>>>>> In the limited-to-limited case, this will prohibit the connection to >>>>>> be formed, and thus, Pkey Violation Traps will not be sent to the SM. >>>>>> >>>>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> >>>>>> --- >>>>>> drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++------- >>>>>> include/rdma/ib_cm.h | 4 +++- >>>>>> 2 files changed, 35 insertions(+), 8 deletions(-) >>>>>> >>>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c >>>>>> index a92e1a5c202b..52ed51d5bd2a 100644 >>>>>> --- a/drivers/infiniband/core/cm.c >>>>>> +++ b/drivers/infiniband/core/cm.c >>>>>> @@ -3,6 +3,7 @@ >>>>>> * Copyright (c) 2004 Topspin Corporation. All rights reserved. >>>>>> * Copyright (c) 2004, 2005 Voltaire Corporation. All rights reserved. >>>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. >>>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved. >>>>>> * >>>>>> * This software is available to you under a choice of one of two >>>>>> * licenses. You may choose to be licensed under the terms of the GNU >>>>>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = { >>>>>> [IB_CM_REJ_INVALID_CLASS_VERSION] = "invalid class version", >>>>>> [IB_CM_REJ_INVALID_FLOW_LABEL] = "invalid flow label", >>>>>> [IB_CM_REJ_INVALID_ALT_FLOW_LABEL] = "invalid alt flow label", >>>>>> + [IB_CM_REJ_INVALID_PKEY] = "invalid PKey", >>>>> >>>>> If this patch goes ahead, IBA spec for CM should be updated to include this. >>>> >>>> Sure, I see: >>>> >>>> 33 Invalid Alternate Flow Label >>>> >>>> as the latest in the spec. >>> >>> This appears to be an implementation rather than architectural issue. If >>> there is limited <-> limited CM, then the MAD would never get up to the >>> CM layer as it would be filtered by end port pkey enforcement. This is >>> only needed to protect against current code which can send on the full >>> rather than limited pkey (in BTH), right ? >> >> We are talking about two things here. The PKey in the BTH and the PKey in the CM REQ payload. They differ. > > Yes, I understand the difference between the 2 potentially different PKeys. > >> I am out of office, but if my memory serves me correct, the PKey in the BTH in the MAD packet will be the default PKey. > > MADs are sent on reversible paths (13.5.4.1) where the response can be > constructed from only header info in the request. BTH:pkey doesn't > __have to be__ the default pkey. It just needs to be a common/shared > pkey between the source/destination pair. > > For example, there are some configurations where the port is only a > limited member of the default partition (and full member of other non > default partition(s)). > > There are also configurations where the port is both full and limited > member on one or more partition(s). This is the allow_both_pkeys TRUE > case for OpenSM which also uses the 'both' syntax in the partitions > config file. This appears to be the case to which you are referring. > > CM MAD could be sent on default partition but that may not work if both > source and dest are only limited members of the default partition. > > CM MAD could be sent on any common/shared pkey between the > source/destination ports. > >> Further, we have per IBTA: >> >> C10-132: Packets sent to the General Service Interface QP (QP1) shall be accepted if the P_Key in the packet matches any valid P_Key in the P_Key Table of the port on which the packet arrived. Matching is defined in 10.9.3 Partition Key Matching. >> >> So, by architecture, the MAD arrives > In the case of port being both full and limited member of the same > partition on which CM MAD comes in on (as defined by pkey in BTH). > > I was talking about limited <-> limited CM and didn't realize you were > talking about the ‘both' case. OK. So we agree that for the most typical cases, where the BTH.PKey is the default one, the MAD will be received and not trapped by the PKey check. >>> >>> >>>>> >>>>>> }; >>>>>> >>>>>> const char *__attribute_const__ ibcm_reject_msg(int reason) >>>>>> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av, >>>>>> return -EINVAL; >>>>>> cm_dev = port->cm_dev; >>>>>> >>>>>> - ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num, >>>>>> - be16_to_cpu(path->pkey), &av->pkey_index); >>>>>> + ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num, >>>>>> + be16_to_cpu(path->pkey), &av->pkey_index); >>>>>> if (ret) >>>>>> return ret; >>>>>> >>>>>> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg, >>>>>> cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn)); >>>>>> cm_req_set_local_resp_timeout(req_msg, >>>>>> param->local_cm_response_timeout); >>>>>> - req_msg->pkey = param->primary_path->pkey; >>>>>> + req_msg->pkey = cpu_to_be16(cm_id_priv->pkey); >>>>>> cm_req_set_path_mtu(req_msg, param->primary_path->mtu); >>>>>> cm_req_set_max_cm_retries(req_msg, param->max_cm_retries); >>>>>> >>>>>> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id, >>>>>> cm_id_priv->responder_resources = param->responder_resources; >>>>>> cm_id_priv->retry_count = param->retry_count; >>>>>> cm_id_priv->path_mtu = param->primary_path->mtu; >>>>>> - cm_id_priv->pkey = param->primary_path->pkey; >>>>>> + >>>>>> + /* >>>>>> + * We want to send the pkey used in the BTH in packets >>>>>> + * sent. This, in order for the passive side to determine if >>>>>> + * communication is permitted by the respective pkeys. >>>>>> + * >>>>>> + * The pkey in the paths are derived from the MGID, which has >>>>>> + * the full membership bit set. Hence, we retrieve the pkey by >>>>>> + * using the address vector's pkey_index. >>>>> >>>>> The paths usually come from the SM and I don't expect SM to provide path >>>>> between ports of only limited members of partition. >>>> >>>> In my case, it does. >>> >>> Which OpenSM flavor are you talking about (upstream, MLNX, or other) ? >> >> I guess other is the right term, as its Oracle’s flavour, 3.2.6_20170609. > > I'll check upstream OpenSM when I get a chance to see if same problem > exists. Oracle OpenSM forked from upstream long time ago and not sure > how similar the path record code is. There were various impacts for the > shared port virtualization (for CX-3) in terms of adding the ability to > allow both pkeys. I just had a chat with Line H. here in Oracle. She confirms that Oracle’s SM and the upstream one act similar in this matter. Further, using shared port virtualization, OpenSM will adhere to the physical ports only, and doesn’t even know the PKey assignments to the VFs assigned the VMs. >>> What is OpenSM version string ? Is allow_both_pkeys TRUE or FALSE ? >> >> The ports do have both limited and full memberships, and the partitions.conf file contains both “limited”, “both”, and the “full” keywords. >> >>>>> Default ACM provider >>>>> forms path from multicast group parameters including pkey. Is that the >>>>> scenario of concern ? >>>> >>>> Also RDMA CM does that. Do an ibdump of a CM REQ message sent from a limited port, and you will see the PKey is the full member in the CM REQ msg. >>> >>> Not exactly sure what you mean by RDMA CM does this as there are a >>> number of different ways to use RDMA CM. >> >> I am talking specifically about the CM REQ message. For example, if you run an IB-UV test program and specifies that RDMA CM should be used for connection management. Or it could be kernel ULPs. My point is that the pkey field in the CM REQ message (i.e. not the one in the BTH) is always the full one, even though the port uses the limited one. > > When you write "port uses limited one", you mean BTH:pkey is limited > member but REQ:pkey is full member, right ? Yes, but the BTH:pkey is, as stated above most often the default pkey (limited or full), whilst REQ:pkey is a designated, limited pkey. >>> Are you referring to using >>> IPoIB for address resolution ? Another way is for a path record to be >>> passed in from user space and there is no control over it’s contents. >> >> I did not bring up the path record or address resolution issue ;-) > > You mentioned that "The pkey in the paths are derived from the MGID" and > brought this issue up in my thinking about this issue. Yes, just to inform why it is full - when it shouldn’t be. >> But I guess you are alluding to the limited-to-limited case should 1) not have been able to perform address resolution and/or 2) the path record should not have been returned. I agree to both 1) and 2). >> >> But if both 1) and 2) succeeds, this patch series is a last resort to stop the connection to be established and avoid pkey trap violation traps being sent. >> >>> At a minimum, I think the comment mentioning MGID should be clarified. >> >> Is the question if an MC group attached by a limited port will include other limited ports or only full member ports? >> >> With the OpenSM I am running, the MC group contains limited members. > > That is problematic as any sending to MC group by limited members will > cause pkey violation traps to SM. Sure, agree, but a separate issue from this series which handles the unicast case. >>>>> If so, I still don't fully understand the scenario >>>>> because limited members are not supposed to be part of a multicast >>>>> group. There was some work started to extend this for client/server >>>>> model but it was never completed. However, there may be hole(s) in >>>>> various components of implementation which open(s) this door. >>>> >>>> I view OpenSM not returning a valid path between two limited members an orthogonal issue, as OpenSM is another component. >>> >>> Agreed. I think that there are other components too... >>> >>>> I think the CM REQ message should contain the correct PKey (fixed by this patch series). >>>> >>>> And in the event the passive side being a limited member and receives a CM REQ with a limited PKey, that connection should not be formed (fixed by this patch series). >>> This is the backward compatibility for the current behavior on the >>> active side so that REQ gets rejected rather than accepted, right ? >> >> Not sure about the backwards compatibility part, but I certainly agree to the rest. Put it another way, when the MAD payload’s pkey is limited in the CM REQ, and there is only a limited matching key in the pkey cache, a REJ is sent back. > > Understood. > >>> In general, the approach used when there is pkey mismatch is to silently >>> drop packet rather than respond which causes a timeout on the requester >>> side. In this specific case, I’m not sure which approach is better. >> >> It is not silently dropped when the port supports pkey trap messages. Then a trap message is sent to the OpenSM. > > I was only talking about the requester-responder part. Yes, pkey > security trap would go to SM. > > As I wrote, I'm not sure which is preferable: to explictly reject as > this patch proposes or just silently drop and penalize requester. Since the infrastructure is in place, i.e., the REJ reasons, my view is that an explicit REJ with a distinct reason is the most obvious. Thxs, Håkon > > -- Hal > >>> Also, is there a similar issue with SIDR_REQ ? >> >> Good point, most probaly! >> >>> Read me correct, I am also in favour of fixing the OpenSM to not return a valid (but useless) path record in this case. >>> >>> Understood and agreed. >> >> >> Thxs, Håkon >> >>> >>> -- Hal >>> >>>> >>>> Thxs, Håkon >>>> >>>> >>>>> >>>>> -- Hal >>>>> >>>>>> + */ >>>>>> + ret = ib_get_cached_pkey(cm_id_priv->id.device, >>>>>> + cm_id_priv->av.port->port_num, >>>>>> + cm_id_priv->av.pkey_index, >>>>>> + &cm_id_priv->pkey); >>>>>> + if (ret) >>>>>> + goto error1; >>>>>> + >>>>>> cm_id_priv->qp_type = param->qp_type; >>>>>> >>>>>> ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg); >>>>>> @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work) >>>>>> cm_id_priv); >>>>>> if (ret) { >>>>>> int err; >>>>>> + int rej_reason = (ret == -ENOENT ? >>>>>> + IB_CM_REJ_INVALID_PKEY : >>>>>> + IB_CM_REJ_INVALID_GID); >>>>>> >>>>>> err = ib_get_cached_gid(work->port->cm_dev->ib_device, >>>>>> work->port->port_num, 0, >>>>>> &work->path[0].sgid, >>>>>> NULL); >>>>>> if (err) >>>>>> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, >>>>>> + ib_send_cm_rej(cm_id, rej_reason, >>>>>> NULL, 0, NULL, 0); >>>>>> else >>>>>> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, >>>>>> + ib_send_cm_rej(cm_id, rej_reason, >>>>>> &work->path[0].sgid, >>>>>> sizeof(work->path[0].sgid), >>>>>> NULL, 0); >>>>>> @@ -1975,7 +1996,11 @@ static int cm_req_handler(struct cm_work *work) >>>>>> ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av, >>>>>> cm_id_priv); >>>>>> if (ret) { >>>>>> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID, >>>>>> + int rej_reason = (ret == -ENOENT ? >>>>>> + IB_CM_REJ_INVALID_PKEY : >>>>>> + IB_CM_REJ_INVALID_ALT_GID); >>>>>> + >>>>>> + ib_send_cm_rej(cm_id, rej_reason, >>>>>> &work->path[0].sgid, >>>>>> sizeof(work->path[0].sgid), NULL, 0); >>>>>> goto rejected; >>>>>> diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h >>>>>> index 7979cb04f529..56b62303946a 100644 >>>>>> --- a/include/rdma/ib_cm.h >>>>>> +++ b/include/rdma/ib_cm.h >>>>>> @@ -3,6 +3,7 @@ >>>>>> * Copyright (c) 2004 Topspin Corporation. All rights reserved. >>>>>> * Copyright (c) 2004 Voltaire Corporation. All rights reserved. >>>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. >>>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved. >>>>>> * >>>>>> * This software is available to you under a choice of one of two >>>>>> * licenses. You may choose to be licensed under the terms of the GNU >>>>>> @@ -183,7 +184,8 @@ enum ib_cm_rej_reason { >>>>>> IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID = 30, >>>>>> IB_CM_REJ_INVALID_CLASS_VERSION = 31, >>>>>> IB_CM_REJ_INVALID_FLOW_LABEL = 32, >>>>>> - IB_CM_REJ_INVALID_ALT_FLOW_LABEL = 33 >>>>>> + IB_CM_REJ_INVALID_ALT_FLOW_LABEL = 33, >>>>>> + IB_CM_REJ_INVALID_PKEY = 34, >>>>>> }; >>>>>> >>>>>> struct ib_cm_rej_event_param { >>>>>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-11 10:55 ` Håkon Bugge @ 2018-05-11 12:51 ` Hal Rosenstock 2018-05-14 21:04 ` Jason Gunthorpe 1 sibling, 0 replies; 24+ messages in thread From: Hal Rosenstock @ 2018-05-11 12:51 UTC (permalink / raw) To: Håkon Bugge Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel On 5/11/2018 6:55 AM, Håkon Bugge wrote: > > >> On 10 May 2018, at 18:54, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: >> >> On 5/10/2018 11:16 AM, Håkon Bugge wrote: >>> >>> >>>> On 10 May 2018, at 16:01, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: >>>> >>>> On 5/10/2018 5:16 AM, Håkon Bugge wrote: >>>>> >>>>> >>>>>> On 9 May 2018, at 13:28, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: >>>>>> >>>>>> On 5/9/2018 5:30 AM, Håkon Bugge wrote: >>>>>>> There is no point in using RDMA CM to establish a connection between >>>>>>> two QPs that cannot possible communicate. Particularly, if both the >>>>>>> active and passive side use limited pkeys, they are not able to >>>>>>> communicate. >>>>>>> >>>>>>> In order to detect this situation, the authentic pkey is used in the >>>>>>> CM REQ message. The authentic pkey is the one that the HCA inserts >>>>>>> into the BTH in the IB packets. >>>>>>> >>>>>>> When the passive side receives the REQ, commit ("ib_core: A full pkey >>>>>>> is required to match a limited one") ensures that >>>>>>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys >>>>>>> compared has the full-member bit. >>>>>>> >>>>>>> In the limited-to-limited case, this will prohibit the connection to >>>>>>> be formed, and thus, Pkey Violation Traps will not be sent to the SM. >>>>>>> >>>>>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> >>>>>>> --- >>>>>>> drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++------- >>>>>>> include/rdma/ib_cm.h | 4 +++- >>>>>>> 2 files changed, 35 insertions(+), 8 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c >>>>>>> index a92e1a5c202b..52ed51d5bd2a 100644 >>>>>>> --- a/drivers/infiniband/core/cm.c >>>>>>> +++ b/drivers/infiniband/core/cm.c >>>>>>> @@ -3,6 +3,7 @@ >>>>>>> * Copyright (c) 2004 Topspin Corporation. All rights reserved. >>>>>>> * Copyright (c) 2004, 2005 Voltaire Corporation. All rights reserved. >>>>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. >>>>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved. >>>>>>> * >>>>>>> * This software is available to you under a choice of one of two >>>>>>> * licenses. You may choose to be licensed under the terms of the GNU >>>>>>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = { >>>>>>> [IB_CM_REJ_INVALID_CLASS_VERSION] = "invalid class version", >>>>>>> [IB_CM_REJ_INVALID_FLOW_LABEL] = "invalid flow label", >>>>>>> [IB_CM_REJ_INVALID_ALT_FLOW_LABEL] = "invalid alt flow label", >>>>>>> + [IB_CM_REJ_INVALID_PKEY] = "invalid PKey", >>>>>> >>>>>> If this patch goes ahead, IBA spec for CM should be updated to include this. >>>>> >>>>> Sure, I see: >>>>> >>>>> 33 Invalid Alternate Flow Label >>>>> >>>>> as the latest in the spec. >>>> >>>> This appears to be an implementation rather than architectural issue. If >>>> there is limited <-> limited CM, then the MAD would never get up to the >>>> CM layer as it would be filtered by end port pkey enforcement. This is >>>> only needed to protect against current code which can send on the full >>>> rather than limited pkey (in BTH), right ? >>> >>> We are talking about two things here. The PKey in the BTH and the PKey in the CM REQ payload. They differ. >> >> Yes, I understand the difference between the 2 potentially different PKeys. >> >>> I am out of office, but if my memory serves me correct, the PKey in the BTH in the MAD packet will be the default PKey. >> >> MADs are sent on reversible paths (13.5.4.1) where the response can be >> constructed from only header info in the request. BTH:pkey doesn't >> __have to be__ the default pkey. It just needs to be a common/shared >> pkey between the source/destination pair. >> >> For example, there are some configurations where the port is only a >> limited member of the default partition (and full member of other non >> default partition(s)). >> >> There are also configurations where the port is both full and limited >> member on one or more partition(s). This is the allow_both_pkeys TRUE >> case for OpenSM which also uses the 'both' syntax in the partitions >> config file. This appears to be the case to which you are referring. >> >> CM MAD could be sent on default partition but that may not work if both >> source and dest are only limited members of the default partition. >> >> CM MAD could be sent on any common/shared pkey between the >> source/destination ports. >> >>> Further, we have per IBTA: >>> >>> C10-132: Packets sent to the General Service Interface QP (QP1) shall be accepted if the P_Key in the packet matches any valid P_Key in the P_Key Table of the port on which the packet arrived. Matching is defined in 10.9.3 Partition Key Matching. >>> >>> So, by architecture, the MAD arrives >> In the case of port being both full and limited member of the same >> partition on which CM MAD comes in on (as defined by pkey in BTH). >> >> I was talking about limited <-> limited CM and didn't realize you were >> talking about the ‘both' case. > > OK. So we agree that for the most typical cases, where the BTH.PKey is the default one, the MAD will be received and not trapped by the PKey check. > >>>> >>>> >>>>>> >>>>>>> }; >>>>>>> >>>>>>> const char *__attribute_const__ ibcm_reject_msg(int reason) >>>>>>> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av, >>>>>>> return -EINVAL; >>>>>>> cm_dev = port->cm_dev; >>>>>>> >>>>>>> - ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num, >>>>>>> - be16_to_cpu(path->pkey), &av->pkey_index); >>>>>>> + ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num, >>>>>>> + be16_to_cpu(path->pkey), &av->pkey_index); This routine is shared by LAP send/rcvd and SIDR send. Is this change the correct thing to do there as well ? Were LAP and SIDR tested ? >>>>>>> if (ret) >>>>>>> return ret; >>>>>>> >>>>>>> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg, >>>>>>> cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn)); >>>>>>> cm_req_set_local_resp_timeout(req_msg, >>>>>>> param->local_cm_response_timeout); >>>>>>> - req_msg->pkey = param->primary_path->pkey; >>>>>>> + req_msg->pkey = cpu_to_be16(cm_id_priv->pkey); >>>>>>> cm_req_set_path_mtu(req_msg, param->primary_path->mtu); >>>>>>> cm_req_set_max_cm_retries(req_msg, param->max_cm_retries); >>>>>>> >>>>>>> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id, >>>>>>> cm_id_priv->responder_resources = param->responder_resources; >>>>>>> cm_id_priv->retry_count = param->retry_count; >>>>>>> cm_id_priv->path_mtu = param->primary_path->mtu; >>>>>>> - cm_id_priv->pkey = param->primary_path->pkey; >>>>>>> + >>>>>>> + /* >>>>>>> + * We want to send the pkey used in the BTH in packets >>>>>>> + * sent. This, in order for the passive side to determine if >>>>>>> + * communication is permitted by the respective pkeys. >>>>>>> + * >>>>>>> + * The pkey in the paths are derived from the MGID, which has >>>>>>> + * the full membership bit set. Hence, we retrieve the pkey by >>>>>>> + * using the address vector's pkey_index. >>>>>> >>>>>> The paths usually come from the SM and I don't expect SM to provide path >>>>>> between ports of only limited members of partition. >>>>> >>>>> In my case, it does. >>>> >>>> Which OpenSM flavor are you talking about (upstream, MLNX, or other) ? >>> >>> I guess other is the right term, as its Oracle’s flavour, 3.2.6_20170609. >> >> I'll check upstream OpenSM when I get a chance to see if same problem >> exists. Oracle OpenSM forked from upstream long time ago and not sure >> how similar the path record code is. There were various impacts for the >> shared port virtualization (for CX-3) in terms of adding the ability to >> allow both pkeys. > > I just had a chat with Line H. here in Oracle. She confirms that Oracle’s SM and the upstream one act similar in this matter. To be sure, I will double check this with upstream when I get a few cycles to do this. > Further, using shared port virtualization, OpenSM will adhere to the physical ports only, and doesn’t even know the PKey assignments to the VFs assigned the VMs. Right; with the shared port virtualization, the SM only knows about the pport (physical port) partitioning whereas with the newer virtualization annex there is also the vport (VF) partitioning in addition to the pport partitioning. This has an impact on the paths that can be provided. > >>>> What is OpenSM version string ? Is allow_both_pkeys TRUE or FALSE ? >>> >>> The ports do have both limited and full memberships, and the partitions.conf file contains both “limited”, “both”, and the “full” keywords. >>> >>>>>> Default ACM provider >>>>>> forms path from multicast group parameters including pkey. Is that the >>>>>> scenario of concern ? >>>>> >>>>> Also RDMA CM does that. Do an ibdump of a CM REQ message sent from a limited port, and you will see the PKey is the full member in the CM REQ msg. >>>> >>>> Not exactly sure what you mean by RDMA CM does this as there are a >>>> number of different ways to use RDMA CM. >>> >>> I am talking specifically about the CM REQ message. For example, if you run an IB-UV test program and specifies that RDMA CM should be used for connection management. Or it could be kernel ULPs. My point is that the pkey field in the CM REQ message (i.e. not the one in the BTH) is always the full one, even though the port uses the limited one. >> >> When you write "port uses limited one", you mean BTH:pkey is limited >> member but REQ:pkey is full member, right ? > > Yes, but the BTH:pkey is, as stated above most often the default pkey (limited or full), whilst REQ:pkey is a designated, limited pkey. FWIW I think the comment "The pkey in the paths are derived from the MGID" can be improved as this is one use case and is not the most common one. >>>> Are you referring to using >>>> IPoIB for address resolution ? Another way is for a path record to be >>>> passed in from user space and there is no control over it’s contents. >>> >>> I did not bring up the path record or address resolution issue ;-) >> >> You mentioned that "The pkey in the paths are derived from the MGID" and >> brought this issue up in my thinking about this issue. > > Yes, just to inform why it is full - when it shouldn’t be. > >>> But I guess you are alluding to the limited-to-limited case should 1) not have been able to perform address resolution and/or 2) the path record should not have been returned. I agree to both 1) and 2). >>> >>> But if both 1) and 2) succeeds, this patch series is a last resort to stop the connection to be established and avoid pkey trap violation traps being sent. >>> >>>> At a minimum, I think the comment mentioning MGID should be clarified. >>> >>> Is the question if an MC group attached by a limited port will include other limited ports or only full member ports? >>> >>> With the OpenSM I am running, the MC group contains limited members. >> >> That is problematic as any sending to MC group by limited members will >> cause pkey violation traps to SM. > > Sure, agree, but a separate issue from this series which handles the unicast case. > >>>>>> If so, I still don't fully understand the scenario >>>>>> because limited members are not supposed to be part of a multicast >>>>>> group. There was some work started to extend this for client/server >>>>>> model but it was never completed. However, there may be hole(s) in >>>>>> various components of implementation which open(s) this door. >>>>> >>>>> I view OpenSM not returning a valid path between two limited members an orthogonal issue, as OpenSM is another component. >>>> >>>> Agreed. I think that there are other components too... >>>> >>>>> I think the CM REQ message should contain the correct PKey (fixed by this patch series). >>>>> >>>>> And in the event the passive side being a limited member and receives a CM REQ with a limited PKey, that connection should not be formed (fixed by this patch series). >>>> This is the backward compatibility for the current behavior on the >>>> active side so that REQ gets rejected rather than accepted, right ? >>> >>> Not sure about the backwards compatibility part, but I certainly agree to the rest. Put it another way, when the MAD payload’s pkey is limited in the CM REQ, and there is only a limited matching key in the pkey cache, a REJ is sent back. >> >> Understood. >> >>>> In general, the approach used when there is pkey mismatch is to silently >>>> drop packet rather than respond which causes a timeout on the requester >>>> side. In this specific case, I’m not sure which approach is better. >>> >>> It is not silently dropped when the port supports pkey trap messages. Then a trap message is sent to the OpenSM. >> >> I was only talking about the requester-responder part. Yes, pkey >> security trap would go to SM. >> >> As I wrote, I'm not sure which is preferable: to explictly reject as >> this patch proposes or just silently drop and penalize requester. > > Since the infrastructure is in place, i.e., the REJ reasons, my view is that an explicit REJ with a distinct reason is the most obvious. The downsides are that it reveals info that perhaps should not be revealed and allows the requestor to rerequest sooner. In some scenarios, that could cause DoS attack on the responder (server). -- Hal > > Thxs, Håkon > >> >> -- Hal >> >>>> Also, is there a similar issue with SIDR_REQ ? >>> >>> Good point, most probaly! >>> >>>> Read me correct, I am also in favour of fixing the OpenSM to not return a valid (but useless) path record in this case. >>>> >>>> Understood and agreed. >>> >>> >>> Thxs, Håkon >>> >>>> >>>> -- Hal >>>> >>>>> >>>>> Thxs, Håkon >>>>> >>>>> >>>>>> >>>>>> -- Hal >>>>>> >>>>>>> + */ >>>>>>> + ret = ib_get_cached_pkey(cm_id_priv->id.device, >>>>>>> + cm_id_priv->av.port->port_num, >>>>>>> + cm_id_priv->av.pkey_index, >>>>>>> + &cm_id_priv->pkey); >>>>>>> + if (ret) >>>>>>> + goto error1; >>>>>>> + >>>>>>> cm_id_priv->qp_type = param->qp_type; >>>>>>> >>>>>>> ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg); >>>>>>> @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work) >>>>>>> cm_id_priv); >>>>>>> if (ret) { >>>>>>> int err; >>>>>>> + int rej_reason = (ret == -ENOENT ? >>>>>>> + IB_CM_REJ_INVALID_PKEY : >>>>>>> + IB_CM_REJ_INVALID_GID); >>>>>>> >>>>>>> err = ib_get_cached_gid(work->port->cm_dev->ib_device, >>>>>>> work->port->port_num, 0, >>>>>>> &work->path[0].sgid, >>>>>>> NULL); >>>>>>> if (err) >>>>>>> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, >>>>>>> + ib_send_cm_rej(cm_id, rej_reason, >>>>>>> NULL, 0, NULL, 0); >>>>>>> else >>>>>>> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, >>>>>>> + ib_send_cm_rej(cm_id, rej_reason, >>>>>>> &work->path[0].sgid, >>>>>>> sizeof(work->path[0].sgid), >>>>>>> NULL, 0); >>>>>>> @@ -1975,7 +1996,11 @@ static int cm_req_handler(struct cm_work *work) >>>>>>> ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av, >>>>>>> cm_id_priv); >>>>>>> if (ret) { >>>>>>> - ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID, >>>>>>> + int rej_reason = (ret == -ENOENT ? >>>>>>> + IB_CM_REJ_INVALID_PKEY : >>>>>>> + IB_CM_REJ_INVALID_ALT_GID); >>>>>>> + >>>>>>> + ib_send_cm_rej(cm_id, rej_reason, >>>>>>> &work->path[0].sgid, >>>>>>> sizeof(work->path[0].sgid), NULL, 0); >>>>>>> goto rejected; >>>>>>> diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h >>>>>>> index 7979cb04f529..56b62303946a 100644 >>>>>>> --- a/include/rdma/ib_cm.h >>>>>>> +++ b/include/rdma/ib_cm.h >>>>>>> @@ -3,6 +3,7 @@ >>>>>>> * Copyright (c) 2004 Topspin Corporation. All rights reserved. >>>>>>> * Copyright (c) 2004 Voltaire Corporation. All rights reserved. >>>>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. >>>>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved. >>>>>>> * >>>>>>> * This software is available to you under a choice of one of two >>>>>>> * licenses. You may choose to be licensed under the terms of the GNU >>>>>>> @@ -183,7 +184,8 @@ enum ib_cm_rej_reason { >>>>>>> IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID = 30, >>>>>>> IB_CM_REJ_INVALID_CLASS_VERSION = 31, >>>>>>> IB_CM_REJ_INVALID_FLOW_LABEL = 32, >>>>>>> - IB_CM_REJ_INVALID_ALT_FLOW_LABEL = 33 >>>>>>> + IB_CM_REJ_INVALID_ALT_FLOW_LABEL = 33, >>>>>>> + IB_CM_REJ_INVALID_PKEY = 34, >>>>>>> }; >>>>>>> >>>>>>> struct ib_cm_rej_event_param { >>>>>>> >>>>>> -- >>>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >>>>>> the body of a message to majordomo@vger.kernel.org >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-11 10:55 ` Håkon Bugge 2018-05-11 12:51 ` Hal Rosenstock @ 2018-05-14 21:04 ` Jason Gunthorpe 1 sibling, 0 replies; 24+ messages in thread From: Jason Gunthorpe @ 2018-05-14 21:04 UTC (permalink / raw) To: Håkon Bugge Cc: Hal Rosenstock, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel On Fri, May 11, 2018 at 12:55:00PM +0200, Håkon Bugge wrote: > > I'll check upstream OpenSM when I get a chance to see if same problem > > exists. Oracle OpenSM forked from upstream long time ago and not sure > > how similar the path record code is. There were various impacts for the > > shared port virtualization (for CX-3) in terms of adding the ability to > > allow both pkeys. > > I just had a chat with Line H. here in Oracle. She confirms that Oracle’s SM and the upstream one act similar in this matter. > > Further, using shared port virtualization, OpenSM will adhere to the > physical ports only, and doesn’t even know the PKey assignments to > the VFs assigned the VMs. Yes, but that isn't the point. If the VM receives a PR with a pkey that is not found in the VM's table then it should not even send a CM REQ in the first place. Jason ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-10 15:16 ` Håkon Bugge 2018-05-10 16:54 ` Hal Rosenstock @ 2018-05-14 21:02 ` Jason Gunthorpe 2018-05-15 0:38 ` Hal Rosenstock 1 sibling, 1 reply; 24+ messages in thread From: Jason Gunthorpe @ 2018-05-14 21:02 UTC (permalink / raw) To: Håkon Bugge Cc: Hal Rosenstock, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel On Thu, May 10, 2018 at 05:16:28PM +0200, Håkon Bugge wrote: > We are talking about two things here. The PKey in the BTH and the > PKey in the CM REQ payload. They differ. > > I am out of office, but if my memory serves me correct, the PKey in > the BTH in the MAD packet will be the default PKey. Further, we have > per IBTA: This sounds like a Linux bug. Linux does not do a PR to get a reversible path dedicated to the GMP, so it always uses the data flow path, thus the GMP path paramenters and those in the REQ should always exactly match. Where is Linux setting the BTH.PKey and how did it choose to use the default pkey? Lets fix that at least for sure. Once that is fixed the rest of the series makes no sense since a REQ with invalid PKey will never arrive. However... This series seems inconsistent with the spec. IIRC the spec doesn't say if a full or limited pkey should be placed in the REQ (Hal?). It is designed so that the requestor can get a single reversible path and put that results into the REQ without additional processing, however the PR returns only one PKey and again, itis not really specified if it should be the full or limited pkey (Hal?). Basically this means that any pkey in the REQ could randomly be the full or limited value, and that in-of-itself has not bearing on the connection. So it is quite wrong to insist that the pkey be limited or full when processing the REQ. The end port is expected to match against the local table. The real answer to your trap problem is to fix the SM to not create paths that are non-functional, that is just flat out broken SM behavior. Jason ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-14 21:02 ` Jason Gunthorpe @ 2018-05-15 0:38 ` Hal Rosenstock 2018-05-15 18:11 ` Håkon Bugge 0 siblings, 1 reply; 24+ messages in thread From: Hal Rosenstock @ 2018-05-15 0:38 UTC (permalink / raw) To: Jason Gunthorpe, Håkon Bugge Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel On 5/14/2018 5:02 PM, Jason Gunthorpe wrote: > On Thu, May 10, 2018 at 05:16:28PM +0200, Håkon Bugge wrote: > >> We are talking about two things here. The PKey in the BTH and the >> PKey in the CM REQ payload. They differ. >> >> I am out of office, but if my memory serves me correct, the PKey in >> the BTH in the MAD packet will be the default PKey. Further, we have >> per IBTA: > > This sounds like a Linux bug. > > Linux does not do a PR to get a reversible path dedicated to the GMP> so it always uses the data flow path, thus the GMP path paramenters > and those in the REQ should always exactly match. > > Where is Linux setting the BTH.PKey and how did it choose to use the > default pkey? Lets fix that at least for sure. > > Once that is fixed the rest of the series makes no sense since a REQ > with invalid PKey will never arrive. > > However... > > This series seems inconsistent with the spec. > > IIRC the spec doesn't say if a full or limited pkey should be placed > in the REQ (Hal?). CM spec for REQ just says partition key without indicating whether this means P_Key or just the partition (15 bits) so my read is that either full or limited pkey is allowed in REQ. > It is designed so that the requestor can get a > single reversible path and put that results into the REQ without > additional processing, however the PR returns only one PKey and again, > it is not really specified if it should be the full or limited pkey > (Hal?). Correct; it's not specified. > Basically this means that any pkey in the REQ could randomly be the > full or limited value, and that in-of-itself has not bearing on the > connection. > > So it is quite wrong to insist that the pkey be limited or full when > processing the REQ. The end port is expected to match against the > local table. Note that there is thorny issue with shared (physical) port virtualization. In shared port virtualization, the VF pkey assignment is a local matter. Only thing SM knows is the physical port pkeys where both full and limited membership of same partition is possible. It is conceivable that CM REQ contains limited partition key between 2 limited VFs and for that a new REJ reason code appears to be needed. -- Hal > The real answer to your trap problem is to fix the SM to not create > paths that are non-functional, that is just flat out broken SM > behavior. > > Jason > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-15 0:38 ` Hal Rosenstock @ 2018-05-15 18:11 ` Håkon Bugge 2018-05-15 19:04 ` Jason Gunthorpe 0 siblings, 1 reply; 24+ messages in thread From: Håkon Bugge @ 2018-05-15 18:11 UTC (permalink / raw) To: Hal Rosenstock Cc: Jason Gunthorpe, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel > On 15 May 2018, at 02:38, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: > > On 5/14/2018 5:02 PM, Jason Gunthorpe wrote: >> On Thu, May 10, 2018 at 05:16:28PM +0200, Håkon Bugge wrote: >> >>> We are talking about two things here. The PKey in the BTH and the >>> PKey in the CM REQ payload. They differ. >>> >>> I am out of office, but if my memory serves me correct, the PKey in >>> the BTH in the MAD packet will be the default PKey. Further, we have >>> per IBTA: >> >> This sounds like a Linux bug. >> >> Linux does not do a PR to get a reversible path dedicated to the GMP> so it always uses the data flow path, thus the GMP path paramenters >> and those in the REQ should always exactly match. >> >> Where is Linux setting the BTH.PKey and how did it choose to use the >> default pkey? Lets fix that at least for sure. Linux isn’t. The BTH.PKey is inserted by the HCA (hw or fw) coming from the P_Key table (10.9.2 in IBTA), selected by a pkey_index associated with the QP. As per C10-133: Packets sent from the Send Queue of a GSI QP shall attach a P_Key associated with that QP, just as a P_Key is associated with nonmanagement QPs >> Once that is fixed the rest of the series makes no sense since a REQ >> with invalid PKey will never arrive. >> >> However... >> >> This series seems inconsistent with the spec. >> >> IIRC the spec doesn't say if a full or limited pkey should be placed >> in the REQ (Hal?). > > CM spec for REQ just says partition key without indicating whether this > means P_Key or just the partition (15 bits) so my read is that either > full or limited pkey is allowed in REQ. > >> It is designed so that the requestor can get a >> single reversible path and put that results into the REQ without >> additional processing, however the PR returns only one PKey and again, >> it is not really specified if it should be the full or limited pkey >> (Hal?). > > Correct; it's not specified. > >> Basically this means that any pkey in the REQ could randomly be the >> full or limited value, and that in-of-itself has not bearing on the >> connection. >> >> So it is quite wrong to insist that the pkey be limited or full when >> processing the REQ. The end port is expected to match against the >> local table. > > Note that there is thorny issue with shared (physical) port > virtualization. In shared port virtualization, the VF pkey assignment is > a local matter. Only thing SM knows is the physical port pkeys where > both full and limited membership of same partition is possible. It is > conceivable that CM REQ contains limited partition key between 2 limited > VFs and for that a new REJ reason code appears to be needed. +1 Håkon > > -- Hal > >> The real answer to your trap problem is to fix the SM to not create >> paths that are non-functional, that is just flat out broken SM >> behavior. >> >> Jason >> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-15 18:11 ` Håkon Bugge @ 2018-05-15 19:04 ` Jason Gunthorpe 2018-05-16 6:47 ` Håkon Bugge 0 siblings, 1 reply; 24+ messages in thread From: Jason Gunthorpe @ 2018-05-15 19:04 UTC (permalink / raw) To: Håkon Bugge Cc: Hal Rosenstock, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel On Tue, May 15, 2018 at 08:11:09PM +0200, Håkon Bugge wrote: > > On 15 May 2018, at 02:38, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: > > > > On 5/14/2018 5:02 PM, Jason Gunthorpe wrote: > >> On Thu, May 10, 2018 at 05:16:28PM +0200, Håkon Bugge wrote: > >> > >>> We are talking about two things here. The PKey in the BTH and the > >>> PKey in the CM REQ payload. They differ. > >>> > >>> I am out of office, but if my memory serves me correct, the PKey in > >>> the BTH in the MAD packet will be the default PKey. Further, we have > >>> per IBTA: > >> > >> This sounds like a Linux bug. > >> > >> Linux does not do a PR to get a reversible path dedicated to the GMP> so it always uses the data flow path, thus the GMP path paramenters > >> and those in the REQ should always exactly match. > >> > >> Where is Linux setting the BTH.PKey and how did it choose to use the > >> default pkey? Lets fix that at least for sure. > > Linux isn’t. The BTH.PKey is inserted by the HCA (hw or fw) coming from the P_Key table (10.9.2 in IBTA), selected by a pkey_index associated with the QP. > > As per C10-133: Packets sent from the Send Queue of a GSI QP shall > attach a P_Key associated with that QP, just as a P_Key is > associated with nonmanagement QPs That language doesn't mean the PKey is forced to the default, it says the pkey is programmable. Linux implemented programmable PKeys for the GSI by using the work request, so it deviates from what the spec imagined (which was probably one GSI QP per PKey). See ib_create_send_mad for instance: mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe; mad_send_wr->send_wr.wr.sg_list = mad_send_wr->sg_list; mad_send_wr->send_wr.wr.num_sge = 2; mad_send_wr->send_wr.wr.opcode = IB_WR_SEND; mad_send_wr->send_wr.wr.send_flags = IB_SEND_SIGNALED; mad_send_wr->send_wr.remote_qpn = remote_qpn; mad_send_wr->send_wr.remote_qkey = IB_QP_SET_QKEY; mad_send_wr->send_wr.pkey_index = pkey_index; The pkey_index associated with the QP1 is ignored: /* * PKey index for QP1 is irrelevant but * one is needed for the Reset to Init transition */ attr->qp_state = IB_QPS_INIT; attr->pkey_index = pkey_index; attr->qkey = (qp->qp_num == 0) ? 0 : IB_QP1_QKEY; ret = ib_modify_qp(qp, attr, IB_QP_STATE | IB_QP_PKEY_INDEX | IB_QP_QKEY); Since is is present in every WR. > >> Basically this means that any pkey in the REQ could randomly be the > >> full or limited value, and that in-of-itself has not bearing on the > >> connection. > >> > >> So it is quite wrong to insist that the pkey be limited or full when > >> processing the REQ. The end port is expected to match against the > >> local table. > > > > Note that there is thorny issue with shared (physical) port > > virtualization. In shared port virtualization, the VF pkey assignment is > > a local matter. Only thing SM knows is the physical port pkeys where > > both full and limited membership of same partition is possible. It is > > conceivable that CM REQ contains limited partition key between 2 limited > > VFs and for that a new REJ reason code appears to be needed. > > +1 This is not a difficult issue. If the GMP is properly tagged with the right PKey then it will never be delivered to the VM if the VM does not have the PKey in the table. It is up to the hypervisor to block GMPs that have Pkeys not in the virtual PKey table of the VF. The only time you could need a new REJ code is if the GMP is using a PKey different from the REQ - which is a pretty goofy thing to do considering this VM case. Remember the SM doesn't know what Pkeys are in the VM, so it is basically impossible for the REQ side to reliably select two different pkeys and know that they will bothmake it to the VM. One pkey could be done reliably if it matched ipoib, for instance, but two really cannot in the general case. So again - the bug here is that the GMP is being sent with the wrong pkey. Fix that, then see what is left to fix.. If I recall there were bugs here in mlx drivers, where the driver sent with the wrong Pkey. I think this has actually been fixed now, so please check the upstream kernel to be sure the Pkey is not what it is supposed to be. Jason ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-15 19:04 ` Jason Gunthorpe @ 2018-05-16 6:47 ` Håkon Bugge 2018-05-16 15:12 ` Jason Gunthorpe 0 siblings, 1 reply; 24+ messages in thread From: Håkon Bugge @ 2018-05-16 6:47 UTC (permalink / raw) To: Jason Gunthorpe Cc: Hal Rosenstock, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel > On 15 May 2018, at 21:04, Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Tue, May 15, 2018 at 08:11:09PM +0200, Håkon Bugge wrote: >>> On 15 May 2018, at 02:38, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: >>> >>> On 5/14/2018 5:02 PM, Jason Gunthorpe wrote: >>>> On Thu, May 10, 2018 at 05:16:28PM +0200, Håkon Bugge wrote: >>>> >>>>> We are talking about two things here. The PKey in the BTH and the >>>>> PKey in the CM REQ payload. They differ. >>>>> >>>>> I am out of office, but if my memory serves me correct, the PKey in >>>>> the BTH in the MAD packet will be the default PKey. Further, we have >>>>> per IBTA: >>>> >>>> This sounds like a Linux bug. >>>> >>>> Linux does not do a PR to get a reversible path dedicated to the GMP> so it always uses the data flow path, thus the GMP path paramenters >>>> and those in the REQ should always exactly match. >>>> >>>> Where is Linux setting the BTH.PKey and how did it choose to use the >>>> default pkey? Lets fix that at least for sure. >> >> Linux isn’t. The BTH.PKey is inserted by the HCA (hw or fw) coming from the P_Key table (10.9.2 in IBTA), selected by a pkey_index associated with the QP. >> >> As per C10-133: Packets sent from the Send Queue of a GSI QP shall >> attach a P_Key associated with that QP, just as a P_Key is >> associated with nonmanagement QPs > > That language doesn't mean the PKey is forced to the default, it says > the pkey is programmable. > > Linux implemented programmable PKeys for the GSI by using the work > request, so it deviates from what the spec imagined (which was > probably one GSI QP per PKey). > > See ib_create_send_mad for instance: > > mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe; > mad_send_wr->send_wr.wr.sg_list = mad_send_wr->sg_list; > mad_send_wr->send_wr.wr.num_sge = 2; > mad_send_wr->send_wr.wr.opcode = IB_WR_SEND; > mad_send_wr->send_wr.wr.send_flags = IB_SEND_SIGNALED; > mad_send_wr->send_wr.remote_qpn = remote_qpn; > mad_send_wr->send_wr.remote_qkey = IB_QP_SET_QKEY; > mad_send_wr->send_wr.pkey_index = pkey_index; > > The pkey_index associated with the QP1 is ignored: > > /* > * PKey index for QP1 is irrelevant but > * one is needed for the Reset to Init transition > */ > attr->qp_state = IB_QPS_INIT; > attr->pkey_index = pkey_index; > attr->qkey = (qp->qp_num == 0) ? 0 : IB_QP1_QKEY; > ret = ib_modify_qp(qp, attr, IB_QP_STATE | > IB_QP_PKEY_INDEX | IB_QP_QKEY); > > Since is is present in every WR. > >>>> Basically this means that any pkey in the REQ could randomly be the >>>> full or limited value, and that in-of-itself has not bearing on the >>>> connection. >>>> >>>> So it is quite wrong to insist that the pkey be limited or full when >>>> processing the REQ. The end port is expected to match against the >>>> local table. >>> >>> Note that there is thorny issue with shared (physical) port >>> virtualization. In shared port virtualization, the VF pkey assignment is >>> a local matter. Only thing SM knows is the physical port pkeys where >>> both full and limited membership of same partition is possible. It is >>> conceivable that CM REQ contains limited partition key between 2 limited >>> VFs and for that a new REJ reason code appears to be needed. >> >> +1 > > This is not a difficult issue. > > If the GMP is properly tagged with the right PKey then it will never > be delivered to the VM if the VM does not have the PKey in the > table. Not quite right. For the shared port model, a GMP will (most probably) be accepted by the physical port, due to: IBTA 3.9.5: QP1 is a member of all of the port’s partitions (i.e., can accept any packet specifying a P_Key contained in the port’s P_Key table). and C9-42: If the destination QP is QP1, the BTH:P_Key shall be compared to the set of P_Keys associated with the port on which the packet arrived. If the P_Key matches any of the keys associated with the port, it shall be considered valid. So, if the GMP is destined to VM1, which is a limited member, but we have another VM2 which is a full member of the same partition, the GMP will pass the HCA’s PKey check. > It is up to the hypervisor to block GMPs that have Pkeys not in > the virtual PKey table of the VF. The packet is received by the HCA and stripped from IB headers, in particular the BTH. How can the "hypervisor" block it when its doesn’t have access to the GMP’s BTH.PKey? > The only time you could need a new REJ code is if the GMP is using a > PKey different from the REQ - which is a pretty goofy thing to do > considering this VM case. Its goofy. In the CX-3 shared port model, the BTH.PKey is the default one and the REQ.PKey is the full one even if the sending VM’s port only is a limited member. This patch series fixes the last issue. > Remember the SM doesn't know what Pkeys are in the VM, so it is > basically impossible for the REQ side to reliably select two different > pkeys and know that they will bothmake it to the VM. The active side should use the "authentic" PKey in the REQ message. That is the one that would be used in BTH.PKey when communication has been established. This is implemented by this patch series. Not sure what you mean by "reliably select two different pkeys". The CM REQ message contains one PKey. > One pkey could be done reliably if it matched ipoib, for instance, but > two really cannot in the general case. > > So again - the bug here is that the GMP is being sent with the wrong > pkey. Fix that, then see what is left to fix.. See above, not sure how that could be implemented. And if it is solved by the HCA trapping the GMP due to the PKey check, it doesn’t help me, as the purpose of the series is to avoid (excessive) PKey traps sent to the SM. > If I recall there were bugs here in mlx drivers, where the driver sent > with the wrong Pkey. I think this has actually been fixed now, so > please check the upstream kernel to be sure the Pkey is not what it is > supposed to be. Let me get back to you with some ibdumps here. Håkon ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-16 6:47 ` Håkon Bugge @ 2018-05-16 15:12 ` Jason Gunthorpe 2018-05-16 16:42 ` Hal Rosenstock 0 siblings, 1 reply; 24+ messages in thread From: Jason Gunthorpe @ 2018-05-16 15:12 UTC (permalink / raw) To: Håkon Bugge Cc: Hal Rosenstock, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel On Wed, May 16, 2018 at 08:47:21AM +0200, Håkon Bugge wrote: > > This is not a difficult issue. > > > > If the GMP is properly tagged with the right PKey then it will never > > be delivered to the VM if the VM does not have the PKey in the > > table. > > Not quite right. For the shared port model, a GMP will (most > probably) be accepted by the physical port, due to: Sure, but I am talking about the VM's 'virtual port'. > So, if the GMP is destined to VM1, which is a limited member, but we > have another VM2 which is a full member of the same partition, the > GMP will pass the HCA’s PKey check. Sure. > > It is up to the hypervisor to block GMPs that have Pkeys not in > > the virtual PKey table of the VF. > > The packet is received by the HCA and stripped from IB headers, in > particular the BTH. How can the "hypervisor" block it when its > doesn’t have access to the GMP’s BTH.PKey? This is wrong too, in Linux the WC's for GMP packets include the pkey index that was used to RX the packet. This is a Linux extension matching the one on the sending side. The hypervisor *must* block these GMPs, it is a hard requirement of the pkey security model for VMs. AFAIK the Mellanox drivers do this right - do you know differently? > > The only time you could need a new REJ code is if the GMP is using a > > PKey different from the REQ - which is a pretty goofy thing to do > > considering this VM case. > > Its goofy. In the CX-3 shared port model, the BTH.PKey is the > default one and the REQ.PKey is the full one even if the sending > VM’s port only is a limited member. This patch series fixes the last > issue. Again, this is wrong, the BTH.Pkey and REQ.Pkey should be the same - please fix that first before trying to change anything else. > > Remember the SM doesn't know what Pkeys are in the VM, so it is > > basically impossible for the REQ side to reliably select two different > > pkeys and know that they will bothmake it to the VM. > > The active side should use the "authentic" PKey in the REQ > message. That is the one that would be used in BTH.PKey when > communication has been established. This is implemented by this > patch series. > > Not sure what you mean by "reliably select two different pkeys". The > CM REQ message contains one PKey. If BTH.Pkey != REQ.PKey then the requestor side has to obviously select two PKeys, which is basically impossible. The VM should not be part of the default partition, for instance. > See above, not sure how that could be implemented. And if it is > solved by the HCA trapping the GMP due to the PKey check, it doesn’t > help me, as the purpose of the series is to avoid (excessive) PKey > traps sent to the SM. It might not help you, but is shows this fix for the pkey trap issue is wrong. You must fix the pkey traps on the sending side, not on the responder side.. > > If I recall there were bugs here in mlx drivers, where the driver > > sent with the wrong Pkey. I think this has actually been fixed > > now, so please check the upstream kernel to be sure the Pkey is > > not what it is supposed to be. > > Let me get back to you with some ibdumps here. Upstream kernel (eg 4.16) and newish Mellanox firmware please. And it would be fantastic if you could ID why this is happening, AFAIK it should be OK in the kernel. Jason ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-16 15:12 ` Jason Gunthorpe @ 2018-05-16 16:42 ` Hal Rosenstock 2018-05-16 16:57 ` Jason Gunthorpe [not found] ` <151B2A36-28F0-4A88-8633-31AE7E55F848@oracle.com> 0 siblings, 2 replies; 24+ messages in thread From: Hal Rosenstock @ 2018-05-16 16:42 UTC (permalink / raw) To: Jason Gunthorpe, Håkon Bugge Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel On 5/16/2018 11:12 AM, Jason Gunthorpe wrote: > On Wed, May 16, 2018 at 08:47:21AM +0200, Håkon Bugge wrote: > >>> This is not a difficult issue. >>> >>> If the GMP is properly tagged with the right PKey then it will never >>> be delivered to the VM if the VM does not have the PKey in the >>> table. >> >> Not quite right. For the shared port model, a GMP will (most >> probably) be accepted by the physical port, due to: > > Sure, but I am talking about the VM's 'virtual port'. > >> So, if the GMP is destined to VM1, which is a limited member, but we >> have another VM2 which is a full member of the same partition, the >> GMP will pass the HCA’s PKey check. > > Sure. > >>> It is up to the hypervisor to block GMPs that have Pkeys not in >>> the virtual PKey table of the VF. >> >> The packet is received by the HCA and stripped from IB headers, in >> particular the BTH. How can the "hypervisor" block it when its >> doesn’t have access to the GMP’s BTH.PKey? > > This is wrong too, in Linux the WC's for GMP packets include the pkey > index that was used to RX the packet. This is a Linux extension > matching the one on the sending side. > > The hypervisor *must* block these GMPs, it is a hard requirement of > the pkey security model for VMs. Aside from the pkey enforcement at the VF, there is also the issue of the pkey trap - if physical port model is followed for shared virtual ports, trap should be generated which could look confusing to SM as well as counting in some error counter (for physical port, it's either PortCounter:Port[Xmit Rcv]ConstraintErrors. > AFAIK the Mellanox drivers do this right - do you know differently? > >>> The only time you could need a new REJ code is if the GMP is using a >>> PKey different from the REQ - which is a pretty goofy thing to do >>> considering this VM case. >> >> Its goofy. In the CX-3 shared port model, the BTH.PKey is the >> default one and the REQ.PKey is the full one even if the sending >> VM’s port only is a limited member. This patch series fixes the last >> issue. > > Again, this is wrong, the BTH.Pkey and REQ.Pkey should be the same - I do not believe there is anything in the spec that requires this. I agree it's the simplest use model though. > please fix that first before trying to change anything else. > >>> Remember the SM doesn't know what Pkeys are in the VM, so it is >>> basically impossible for the REQ side to reliably select two different >>> pkeys and know that they will bothmake it to the VM. >> >> The active side should use the "authentic" PKey in the REQ >> message. That is the one that would be used in BTH.PKey when >> communication has been established. This is implemented by this >> patch series. >> >> Not sure what you mean by "reliably select two different pkeys". The >> CM REQ message contains one PKey. > > If BTH.Pkey != REQ.PKey then the requestor side has to obviously > select two PKeys, which is basically impossible. > > The VM should not be part of the default partition, for instance. I think that the VM is at least a limited member of the default partition. -- Hal >> See above, not sure how that could be implemented. And if it is >> solved by the HCA trapping the GMP due to the PKey check, it doesn’t >> help me, as the purpose of the series is to avoid (excessive) PKey >> traps sent to the SM. > > It might not help you, but is shows this fix for the pkey trap issue > is wrong. You must fix the pkey traps on the sending side, not on the > responder side.. > >>> If I recall there were bugs here in mlx drivers, where the driver >>> sent with the wrong Pkey. I think this has actually been fixed >>> now, so please check the upstream kernel to be sure the Pkey is >>> not what it is supposed to be. >> >> Let me get back to you with some ibdumps here. > > Upstream kernel (eg 4.16) and newish Mellanox firmware please. > > And it would be fantastic if you could ID why this is happening, AFAIK > it should be OK in the kernel. > > Jason > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-16 16:42 ` Hal Rosenstock @ 2018-05-16 16:57 ` Jason Gunthorpe [not found] ` <151B2A36-28F0-4A88-8633-31AE7E55F848@oracle.com> 1 sibling, 0 replies; 24+ messages in thread From: Jason Gunthorpe @ 2018-05-16 16:57 UTC (permalink / raw) To: Hal Rosenstock Cc: Håkon Bugge, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel On Wed, May 16, 2018 at 12:42:37PM -0400, Hal Rosenstock wrote: > >>> The only time you could need a new REJ code is if the GMP is using a > >>> PKey different from the REQ - which is a pretty goofy thing to do > >>> considering this VM case. > >> > >> Its goofy. In the CX-3 shared port model, the BTH.PKey is the > >> default one and the REQ.PKey is the full one even if the sending > >> VM’s port only is a limited member. This patch series fixes the last > >> issue. > > > > Again, this is wrong, the BTH.Pkey and REQ.Pkey should be the same - > > I do not believe there is anything in the spec that requires this. I > agree it's the simplest use model though. The spec doesn't require it, but the design of the Linux CM certainly does. > > If BTH.Pkey != REQ.PKey then the requestor side has to obviously > > select two PKeys, which is basically impossible. > > > > The VM should not be part of the default partition, for instance. > > I think that the VM is at least a limited member of the default partition. Well, being a limited member still means the default pkey cannot be used for CM GMPs. I actually can't think of why you'd want to do this, better to put the SM nodes in all the pkeys and reserve the default pkey completely for the network management control plane. Jason ^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <151B2A36-28F0-4A88-8633-31AE7E55F848@oracle.com>]
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys [not found] ` <151B2A36-28F0-4A88-8633-31AE7E55F848@oracle.com> @ 2018-05-16 18:01 ` Jason Gunthorpe 2018-05-16 18:14 ` Håkon Bugge 0 siblings, 1 reply; 24+ messages in thread From: Jason Gunthorpe @ 2018-05-16 18:01 UTC (permalink / raw) To: Håkon Bugge Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel, Hal Rosenstock On Wed, May 16, 2018 at 07:46:10PM +0200, Håkon Bugge wrote: > OK. Lets take one example. The pkey table contains 0xFFFF, 0x8001, > 0x0001. > > The wce.pkey_index is 1 (i.e., pointing to 0x8001). Now, tell me, was > BTH.PKey 0x8001 (matches 0x8001) or was it 0x0001 (also matching > 0x8001) ? As far as the Linux core is concerned, it must have been 0x8001, because the only way the pkey_index feature works properly is if exact-match takes precedence over in-exact match. Jason ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-16 18:01 ` Jason Gunthorpe @ 2018-05-16 18:14 ` Håkon Bugge 2018-05-16 18:16 ` Jason Gunthorpe 0 siblings, 1 reply; 24+ messages in thread From: Håkon Bugge @ 2018-05-16 18:14 UTC (permalink / raw) To: Jason Gunthorpe Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel, Hal Rosenstock > On 16 May 2018, at 20:01, Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Wed, May 16, 2018 at 07:46:10PM +0200, Håkon Bugge wrote: > >> OK. Lets take one example. The pkey table contains 0xFFFF, 0x8001, >> 0x0001. >> >> The wce.pkey_index is 1 (i.e., pointing to 0x8001). Now, tell me, was >> BTH.PKey 0x8001 (matches 0x8001) or was it 0x0001 (also matching >> 0x8001) ? > > As far as the Linux core is concerned, it must have been 0x8001, > because the only way the pkey_index feature works properly is if > exact-match takes precedence over in-exact match. And now if the table only contains 0xFFFF, 0x8001, how do you tell? Håkon ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-16 18:14 ` Håkon Bugge @ 2018-05-16 18:16 ` Jason Gunthorpe [not found] ` <A087A721-E596-428E-8554-FAEB4BE9B306@oracle.com> 0 siblings, 1 reply; 24+ messages in thread From: Jason Gunthorpe @ 2018-05-16 18:16 UTC (permalink / raw) To: Håkon Bugge Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel, Hal Rosenstock On Wed, May 16, 2018 at 08:14:50PM +0200, Håkon Bugge wrote: > > > > On 16 May 2018, at 20:01, Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > > On Wed, May 16, 2018 at 07:46:10PM +0200, Håkon Bugge wrote: > > > >> OK. Lets take one example. The pkey table contains 0xFFFF, 0x8001, > >> 0x0001. > >> > >> The wce.pkey_index is 1 (i.e., pointing to 0x8001). Now, tell me, was > >> BTH.PKey 0x8001 (matches 0x8001) or was it 0x0001 (also matching > >> 0x8001) ? > > > > As far as the Linux core is concerned, it must have been 0x8001, > > because the only way the pkey_index feature works properly is if > > exact-match takes precedence over in-exact match. > > And now if the table only contains 0xFFFF, 0x8001, how do you tell? It doesn't matter. The delgation of Pkeys to VMs are on a pkey-table-index basis, so if it matches table entry 1 and entry 1 is passed to the VM, then the packet can be passed to the VM. Jason ^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <A087A721-E596-428E-8554-FAEB4BE9B306@oracle.com>]
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys [not found] ` <A087A721-E596-428E-8554-FAEB4BE9B306@oracle.com> @ 2018-05-16 19:30 ` Hal Rosenstock 0 siblings, 0 replies; 24+ messages in thread From: Hal Rosenstock @ 2018-05-16 19:30 UTC (permalink / raw) To: Håkon Bugge, Jason Gunthorpe Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, OFED mailing list, linux-kernel On 5/16/2018 3:22 PM, Håkon Bugge wrote: > But, do we need an update to IBTA (that the BTH.PKey shall be that of the VM's Port)? Nothing in spec mentions shared (port) virtualization so that is an exercise completely left to the reader... Annex A19 is silent on this specific point but the virtual port model has virtual port partition table so there's no ambiguity there. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys 2018-05-09 9:30 ` [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys Håkon Bugge 2018-05-09 11:28 ` Hal Rosenstock @ 2018-05-09 18:11 ` kbuild test robot 1 sibling, 0 replies; 24+ messages in thread From: kbuild test robot @ 2018-05-09 18:11 UTC (permalink / raw) To: Håkon Bugge Cc: kbuild-all, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty, linux-rdma, linux-kernel, Håkon Bugge Hi Håkon, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on linus/master] [also build test WARNING on v4.17-rc4 next-20180509] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/H-kon-Bugge/Do-not-form-IB-connections-between-limited-partition-members/20180509-224727 reproduce: # apt-get install sparse make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) drivers/infiniband/core/cm.c:880:16: sparse: expression using sizeof(void) drivers/infiniband/core/cm.c:861:21: sparse: expression using sizeof(void) >> drivers/infiniband/core/cm.c:1246:25: sparse: cast from restricted __be16 >> drivers/infiniband/core/cm.c:1246:25: sparse: incorrect type in argument 1 (different base types) @@ expected unsigned short [unsigned] [usertype] val @@ got short [unsigned] [usertype] val @@ drivers/infiniband/core/cm.c:1246:25: expected unsigned short [unsigned] [usertype] val drivers/infiniband/core/cm.c:1246:25: got restricted __be16 [usertype] pkey >> drivers/infiniband/core/cm.c:1246:25: sparse: cast from restricted __be16 >> drivers/infiniband/core/cm.c:1246:25: sparse: cast from restricted __be16 drivers/infiniband/core/cm.c:861:21: sparse: expression using sizeof(void) drivers/infiniband/core/cm.c:861:21: sparse: expression using sizeof(void) >> drivers/infiniband/core/cm.c:1414:35: sparse: incorrect type in argument 4 (different base types) @@ expected unsigned short [usertype] *pkey @@ got short [usertype] *pkey @@ drivers/infiniband/core/cm.c:1414:35: expected unsigned short [usertype] *pkey drivers/infiniband/core/cm.c:1414:35: got restricted __be16 *<noident> drivers/infiniband/core/cm.c:861:21: sparse: expression using sizeof(void) drivers/infiniband/core/cm.c:861:21: sparse: expression using sizeof(void) drivers/infiniband/core/cm.c:861:21: sparse: expression using sizeof(void) vim +1246 drivers/infiniband/core/cm.c 1218 1219 static void cm_format_req(struct cm_req_msg *req_msg, 1220 struct cm_id_private *cm_id_priv, 1221 struct ib_cm_req_param *param) 1222 { 1223 struct sa_path_rec *pri_path = param->primary_path; 1224 struct sa_path_rec *alt_path = param->alternate_path; 1225 bool pri_ext = false; 1226 1227 if (pri_path->rec_type == SA_PATH_REC_TYPE_OPA) 1228 pri_ext = opa_is_extended_lid(pri_path->opa.dlid, 1229 pri_path->opa.slid); 1230 1231 cm_format_mad_hdr(&req_msg->hdr, CM_REQ_ATTR_ID, 1232 cm_form_tid(cm_id_priv, CM_MSG_SEQUENCE_REQ)); 1233 1234 req_msg->local_comm_id = cm_id_priv->id.local_id; 1235 req_msg->service_id = param->service_id; 1236 req_msg->local_ca_guid = cm_id_priv->id.device->node_guid; 1237 cm_req_set_local_qpn(req_msg, cpu_to_be32(param->qp_num)); 1238 cm_req_set_init_depth(req_msg, param->initiator_depth); 1239 cm_req_set_remote_resp_timeout(req_msg, 1240 param->remote_cm_response_timeout); 1241 cm_req_set_qp_type(req_msg, param->qp_type); 1242 cm_req_set_flow_ctrl(req_msg, param->flow_control); 1243 cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn)); 1244 cm_req_set_local_resp_timeout(req_msg, 1245 param->local_cm_response_timeout); > 1246 req_msg->pkey = cpu_to_be16(cm_id_priv->pkey); 1247 cm_req_set_path_mtu(req_msg, param->primary_path->mtu); 1248 cm_req_set_max_cm_retries(req_msg, param->max_cm_retries); 1249 1250 if (param->qp_type != IB_QPT_XRC_INI) { 1251 cm_req_set_resp_res(req_msg, param->responder_resources); 1252 cm_req_set_retry_count(req_msg, param->retry_count); 1253 cm_req_set_rnr_retry_count(req_msg, param->rnr_retry_count); 1254 cm_req_set_srq(req_msg, param->srq); 1255 } 1256 1257 req_msg->primary_local_gid = pri_path->sgid; 1258 req_msg->primary_remote_gid = pri_path->dgid; 1259 if (pri_ext) { 1260 req_msg->primary_local_gid.global.interface_id 1261 = OPA_MAKE_ID(be32_to_cpu(pri_path->opa.slid)); 1262 req_msg->primary_remote_gid.global.interface_id 1263 = OPA_MAKE_ID(be32_to_cpu(pri_path->opa.dlid)); 1264 } 1265 if (pri_path->hop_limit <= 1) { 1266 req_msg->primary_local_lid = pri_ext ? 0 : 1267 htons(ntohl(sa_path_get_slid(pri_path))); 1268 req_msg->primary_remote_lid = pri_ext ? 0 : 1269 htons(ntohl(sa_path_get_dlid(pri_path))); 1270 } else { 1271 /* Work-around until there's a way to obtain remote LID info */ 1272 req_msg->primary_local_lid = IB_LID_PERMISSIVE; 1273 req_msg->primary_remote_lid = IB_LID_PERMISSIVE; 1274 } 1275 cm_req_set_primary_flow_label(req_msg, pri_path->flow_label); 1276 cm_req_set_primary_packet_rate(req_msg, pri_path->rate); 1277 req_msg->primary_traffic_class = pri_path->traffic_class; 1278 req_msg->primary_hop_limit = pri_path->hop_limit; 1279 cm_req_set_primary_sl(req_msg, pri_path->sl); 1280 cm_req_set_primary_subnet_local(req_msg, (pri_path->hop_limit <= 1)); 1281 cm_req_set_primary_local_ack_timeout(req_msg, 1282 cm_ack_timeout(cm_id_priv->av.port->cm_dev->ack_delay, 1283 pri_path->packet_life_time)); 1284 1285 if (alt_path) { 1286 bool alt_ext = false; 1287 1288 if (alt_path->rec_type == SA_PATH_REC_TYPE_OPA) 1289 alt_ext = opa_is_extended_lid(alt_path->opa.dlid, 1290 alt_path->opa.slid); 1291 1292 req_msg->alt_local_gid = alt_path->sgid; 1293 req_msg->alt_remote_gid = alt_path->dgid; 1294 if (alt_ext) { 1295 req_msg->alt_local_gid.global.interface_id 1296 = OPA_MAKE_ID(be32_to_cpu(alt_path->opa.slid)); 1297 req_msg->alt_remote_gid.global.interface_id 1298 = OPA_MAKE_ID(be32_to_cpu(alt_path->opa.dlid)); 1299 } 1300 if (alt_path->hop_limit <= 1) { 1301 req_msg->alt_local_lid = alt_ext ? 0 : 1302 htons(ntohl(sa_path_get_slid(alt_path))); 1303 req_msg->alt_remote_lid = alt_ext ? 0 : 1304 htons(ntohl(sa_path_get_dlid(alt_path))); 1305 } else { 1306 req_msg->alt_local_lid = IB_LID_PERMISSIVE; 1307 req_msg->alt_remote_lid = IB_LID_PERMISSIVE; 1308 } 1309 cm_req_set_alt_flow_label(req_msg, 1310 alt_path->flow_label); 1311 cm_req_set_alt_packet_rate(req_msg, alt_path->rate); 1312 req_msg->alt_traffic_class = alt_path->traffic_class; 1313 req_msg->alt_hop_limit = alt_path->hop_limit; 1314 cm_req_set_alt_sl(req_msg, alt_path->sl); 1315 cm_req_set_alt_subnet_local(req_msg, (alt_path->hop_limit <= 1)); 1316 cm_req_set_alt_local_ack_timeout(req_msg, 1317 cm_ack_timeout(cm_id_priv->av.port->cm_dev->ack_delay, 1318 alt_path->packet_life_time)); 1319 } 1320 1321 if (param->private_data && param->private_data_len) 1322 memcpy(req_msg->private_data, param->private_data, 1323 param->private_data_len); 1324 } 1325 1326 static int cm_validate_req_param(struct ib_cm_req_param *param) 1327 { 1328 /* peer-to-peer not supported */ 1329 if (param->peer_to_peer) 1330 return -EINVAL; 1331 1332 if (!param->primary_path) 1333 return -EINVAL; 1334 1335 if (param->qp_type != IB_QPT_RC && param->qp_type != IB_QPT_UC && 1336 param->qp_type != IB_QPT_XRC_INI) 1337 return -EINVAL; 1338 1339 if (param->private_data && 1340 param->private_data_len > IB_CM_REQ_PRIVATE_DATA_SIZE) 1341 return -EINVAL; 1342 1343 if (param->alternate_path && 1344 (param->alternate_path->pkey != param->primary_path->pkey || 1345 param->alternate_path->mtu != param->primary_path->mtu)) 1346 return -EINVAL; 1347 1348 return 0; 1349 } 1350 1351 int ib_send_cm_req(struct ib_cm_id *cm_id, 1352 struct ib_cm_req_param *param) 1353 { 1354 struct cm_id_private *cm_id_priv; 1355 struct cm_req_msg *req_msg; 1356 unsigned long flags; 1357 int ret; 1358 1359 ret = cm_validate_req_param(param); 1360 if (ret) 1361 return ret; 1362 1363 /* Verify that we're not in timewait. */ 1364 cm_id_priv = container_of(cm_id, struct cm_id_private, id); 1365 spin_lock_irqsave(&cm_id_priv->lock, flags); 1366 if (cm_id->state != IB_CM_IDLE) { 1367 spin_unlock_irqrestore(&cm_id_priv->lock, flags); 1368 ret = -EINVAL; 1369 goto out; 1370 } 1371 spin_unlock_irqrestore(&cm_id_priv->lock, flags); 1372 1373 cm_id_priv->timewait_info = cm_create_timewait_info(cm_id_priv-> 1374 id.local_id); 1375 if (IS_ERR(cm_id_priv->timewait_info)) { 1376 ret = PTR_ERR(cm_id_priv->timewait_info); 1377 goto out; 1378 } 1379 1380 ret = cm_init_av_by_path(param->primary_path, &cm_id_priv->av, 1381 cm_id_priv); 1382 if (ret) 1383 goto error1; 1384 if (param->alternate_path) { 1385 ret = cm_init_av_by_path(param->alternate_path, 1386 &cm_id_priv->alt_av, cm_id_priv); 1387 if (ret) 1388 goto error1; 1389 } 1390 cm_id->service_id = param->service_id; 1391 cm_id->service_mask = ~cpu_to_be64(0); 1392 cm_id_priv->timeout_ms = cm_convert_to_ms( 1393 param->primary_path->packet_life_time) * 2 + 1394 cm_convert_to_ms( 1395 param->remote_cm_response_timeout); 1396 cm_id_priv->max_cm_retries = param->max_cm_retries; 1397 cm_id_priv->initiator_depth = param->initiator_depth; 1398 cm_id_priv->responder_resources = param->responder_resources; 1399 cm_id_priv->retry_count = param->retry_count; 1400 cm_id_priv->path_mtu = param->primary_path->mtu; 1401 1402 /* 1403 * We want to send the pkey used in the BTH in packets 1404 * sent. This, in order for the passive side to determine if 1405 * communication is permitted by the respective pkeys. 1406 * 1407 * The pkey in the paths are derived from the MGID, which has 1408 * the full membership bit set. Hence, we retrieve the pkey by 1409 * using the address vector's pkey_index. 1410 */ 1411 ret = ib_get_cached_pkey(cm_id_priv->id.device, 1412 cm_id_priv->av.port->port_num, 1413 cm_id_priv->av.pkey_index, > 1414 &cm_id_priv->pkey); 1415 if (ret) 1416 goto error1; 1417 1418 cm_id_priv->qp_type = param->qp_type; 1419 1420 ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg); 1421 if (ret) 1422 goto error1; 1423 1424 req_msg = (struct cm_req_msg *) cm_id_priv->msg->mad; 1425 cm_format_req(req_msg, cm_id_priv, param); 1426 cm_id_priv->tid = req_msg->hdr.tid; 1427 cm_id_priv->msg->timeout_ms = cm_id_priv->timeout_ms; 1428 cm_id_priv->msg->context[1] = (void *) (unsigned long) IB_CM_REQ_SENT; 1429 1430 cm_id_priv->local_qpn = cm_req_get_local_qpn(req_msg); 1431 cm_id_priv->rq_psn = cm_req_get_starting_psn(req_msg); 1432 1433 spin_lock_irqsave(&cm_id_priv->lock, flags); 1434 ret = ib_post_send_mad(cm_id_priv->msg, NULL); 1435 if (ret) { 1436 spin_unlock_irqrestore(&cm_id_priv->lock, flags); 1437 goto error2; 1438 } 1439 BUG_ON(cm_id->state != IB_CM_IDLE); 1440 cm_id->state = IB_CM_REQ_SENT; 1441 spin_unlock_irqrestore(&cm_id_priv->lock, flags); 1442 return 0; 1443 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2018-05-16 19:30 UTC | newest] Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-05-09 9:30 [PATCH IB/core 0/2] Do not form IB connections between limited partition members Håkon Bugge 2018-05-09 9:30 ` [PATCH IB/core 1/2] IB/core: A full pkey is required to match a limited one Håkon Bugge 2018-05-09 9:30 ` [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys Håkon Bugge 2018-05-09 11:28 ` Hal Rosenstock 2018-05-10 9:16 ` Håkon Bugge 2018-05-10 14:01 ` Hal Rosenstock 2018-05-10 15:16 ` Håkon Bugge 2018-05-10 16:54 ` Hal Rosenstock 2018-05-11 10:55 ` Håkon Bugge 2018-05-11 12:51 ` Hal Rosenstock 2018-05-14 21:04 ` Jason Gunthorpe 2018-05-14 21:02 ` Jason Gunthorpe 2018-05-15 0:38 ` Hal Rosenstock 2018-05-15 18:11 ` Håkon Bugge 2018-05-15 19:04 ` Jason Gunthorpe 2018-05-16 6:47 ` Håkon Bugge 2018-05-16 15:12 ` Jason Gunthorpe 2018-05-16 16:42 ` Hal Rosenstock 2018-05-16 16:57 ` Jason Gunthorpe [not found] ` <151B2A36-28F0-4A88-8633-31AE7E55F848@oracle.com> 2018-05-16 18:01 ` Jason Gunthorpe 2018-05-16 18:14 ` Håkon Bugge 2018-05-16 18:16 ` Jason Gunthorpe [not found] ` <A087A721-E596-428E-8554-FAEB4BE9B306@oracle.com> 2018-05-16 19:30 ` Hal Rosenstock 2018-05-09 18:11 ` kbuild test robot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).