linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH IB/core 0/2] Do not form IB connections between limited partition members
@ 2018-05-09  9:30 Håkon Bugge
  2018-05-09  9:30 ` [PATCH IB/core 1/2] IB/core: A full pkey is required to match a limited one Håkon Bugge
  2018-05-09  9:30 ` [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys Håkon Bugge
  0 siblings, 2 replies; 24+ messages in thread
From: Håkon Bugge @ 2018-05-09  9:30 UTC (permalink / raw)
  To: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty
  Cc: linux-rdma, linux-kernel, Håkon Bugge

Systems using IB partitions might be exposed to excessive pkey
violation traps which are sent to the OpenSM. This can be close to
a DoS attack, and in addition, the OpenSM logs are flooded with these
messages, hiding potential other log messages deemed important in
order to investigate important issues.

This series prohibit RDMA CM to establish connections between two
limited partition members. This avoids pkey violation traps stemming
from unicast messages to be sent to the OpenSM.

[If this patch series get accepted by the community, I ask if
the maintainer can update the reference to the first commit in the
second commit message with a correct 12 chars SHA]

Håkon Bugge (2):
  IB/core: A full pkey is required to match a limited one
  IB/cm: Send authentic pkey in REQ msg and check eligibility of the
    pkeys

 drivers/infiniband/core/cache.c | 32 +++++++++++++++++++++++++++-----
 drivers/infiniband/core/cm.c    | 39 ++++++++++++++++++++++++++++++++-------
 include/rdma/ib_cache.h         | 18 ++++++++++++++++++
 include/rdma/ib_cm.h            |  4 +++-
 4 files changed, 80 insertions(+), 13 deletions(-)

--
2.13.6

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH IB/core 1/2] IB/core: A full pkey is required to match a limited one
  2018-05-09  9:30 [PATCH IB/core 0/2] Do not form IB connections between limited partition members Håkon Bugge
@ 2018-05-09  9:30 ` Håkon Bugge
  2018-05-09  9:30 ` [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys Håkon Bugge
  1 sibling, 0 replies; 24+ messages in thread
From: Håkon Bugge @ 2018-05-09  9:30 UTC (permalink / raw)
  To: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty
  Cc: linux-rdma, linux-kernel, Håkon Bugge

In ib_find_cached_pkey(), the preference is to return the pkey-index
of the full pkey, if both a limited and the full exists in the cache.

However, if both pkeys are limited, the function return success.

To detect if two pkeys are eligible for communication,
ib_find_matched_cached_pkey() is introduced, which requires at least
one of the pkeys to be a full member, in order to return success.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
---
 drivers/infiniband/core/cache.c | 32 +++++++++++++++++++++++++++-----
 include/rdma/ib_cache.h         | 18 ++++++++++++++++++
 2 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index fb2d347f760f..cb7e9818a226 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -983,10 +983,11 @@ int ib_get_cached_subnet_prefix(struct ib_device *device,
 }
 EXPORT_SYMBOL(ib_get_cached_subnet_prefix);
 
-int ib_find_cached_pkey(struct ib_device *device,
-			u8                port_num,
-			u16               pkey,
-			u16              *index)
+static int __ib_find_cached_pkey(struct ib_device *device,
+				 u8                port_num,
+				 u16               pkey,
+				 u16              *index,
+				 bool allow_lim_to_lim_match)
 {
 	struct ib_pkey_cache *cache;
 	unsigned long flags;
@@ -1013,7 +1014,11 @@ int ib_find_cached_pkey(struct ib_device *device,
 				partial_ix = i;
 		}
 
-	if (ret && partial_ix >= 0) {
+	/*
+	 * If table content is limited, pkey must be full to match,
+	 * unless allow_lim_to_lim_match is set
+	 */
+	if (ret && partial_ix >= 0 && (pkey & 0x8000 || allow_lim_to_lim_match)) {
 		*index = partial_ix;
 		ret = 0;
 	}
@@ -1022,6 +1027,23 @@ int ib_find_cached_pkey(struct ib_device *device,
 
 	return ret;
 }
+
+int ib_find_matched_cached_pkey(struct ib_device *device,
+				u8                port_num,
+				u16               pkey,
+				u16              *index)
+{
+	return __ib_find_cached_pkey(device, port_num, pkey, index, false);
+}
+EXPORT_SYMBOL(ib_find_matched_cached_pkey);
+
+int ib_find_cached_pkey(struct ib_device *device,
+			u8                port_num,
+			u16               pkey,
+			u16              *index)
+{
+	return __ib_find_cached_pkey(device, port_num, pkey, index, true);
+}
 EXPORT_SYMBOL(ib_find_cached_pkey);
 
 int ib_find_exact_cached_pkey(struct ib_device *device,
diff --git a/include/rdma/ib_cache.h b/include/rdma/ib_cache.h
index eb49cc8d1f95..748bd27559b6 100644
--- a/include/rdma/ib_cache.h
+++ b/include/rdma/ib_cache.h
@@ -92,6 +92,24 @@ int ib_get_cached_pkey(struct ib_device    *device_handle,
 		       u16                 *pkey);
 
 /**
+ * ib_find_matched_cached_pkey - Returns the PKey table index where a
+ * specified PKey value occurs. Success requires at least one of the
+ * PKeys to be a full-member.
+ * @device: The device to query.
+ * @port_num: The port number of the device to search for the PKey.
+ * @pkey: The PKey value to search for.
+ * @index: The index into the cached PKey table where the PKey was found.
+ *
+ * ib_find_matched_cached_pkey() searches the specified PKey table in
+ * the local software cache and return success unless both PKeys are
+ * limited.
+ */
+int ib_find_matched_cached_pkey(struct ib_device    *device,
+				u8		     port_num,
+				u16		     pkey,
+				u16		    *index);
+
+/**
  * ib_find_cached_pkey - Returns the PKey table index where a specified
  *   PKey value occurs.
  * @device: The device to query.
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-09  9:30 [PATCH IB/core 0/2] Do not form IB connections between limited partition members Håkon Bugge
  2018-05-09  9:30 ` [PATCH IB/core 1/2] IB/core: A full pkey is required to match a limited one Håkon Bugge
@ 2018-05-09  9:30 ` Håkon Bugge
  2018-05-09 11:28   ` Hal Rosenstock
  2018-05-09 18:11   ` kbuild test robot
  1 sibling, 2 replies; 24+ messages in thread
From: Håkon Bugge @ 2018-05-09  9:30 UTC (permalink / raw)
  To: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty
  Cc: linux-rdma, linux-kernel, Håkon Bugge

There is no point in using RDMA CM to establish a connection between
two QPs that cannot possible communicate. Particularly, if both the
active and passive side use limited pkeys, they are not able to
communicate.

In order to detect this situation, the authentic pkey is used in the
CM REQ message. The authentic pkey is the one that the HCA inserts
into the BTH in the IB packets.

When the passive side receives the REQ, commit ("ib_core: A full pkey
is required to match a limited one") ensures that
ib_find_matched_cached_pkey() fails unless at least one of the pkeys
compared has the full-member bit.

In the limited-to-limited case, this will prohibit the connection to
be formed, and thus, Pkey Violation Traps will not be sent to the SM.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
---
 drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++-------
 include/rdma/ib_cm.h         |  4 +++-
 2 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index a92e1a5c202b..52ed51d5bd2a 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
  * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
  * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
+ * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = {
 	[IB_CM_REJ_INVALID_CLASS_VERSION]	= "invalid class version",
 	[IB_CM_REJ_INVALID_FLOW_LABEL]		= "invalid flow label",
 	[IB_CM_REJ_INVALID_ALT_FLOW_LABEL]	= "invalid alt flow label",
+	[IB_CM_REJ_INVALID_PKEY]		= "invalid PKey",
 };
 
 const char *__attribute_const__ ibcm_reject_msg(int reason)
@@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av,
 		return -EINVAL;
 	cm_dev = port->cm_dev;
 
-	ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num,
-				  be16_to_cpu(path->pkey), &av->pkey_index);
+	ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num,
+					  be16_to_cpu(path->pkey), &av->pkey_index);
 	if (ret)
 		return ret;
 
@@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
 	cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn));
 	cm_req_set_local_resp_timeout(req_msg,
 				      param->local_cm_response_timeout);
-	req_msg->pkey = param->primary_path->pkey;
+	req_msg->pkey = cpu_to_be16(cm_id_priv->pkey);
 	cm_req_set_path_mtu(req_msg, param->primary_path->mtu);
 	cm_req_set_max_cm_retries(req_msg, param->max_cm_retries);
 
@@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
 	cm_id_priv->responder_resources = param->responder_resources;
 	cm_id_priv->retry_count = param->retry_count;
 	cm_id_priv->path_mtu = param->primary_path->mtu;
-	cm_id_priv->pkey = param->primary_path->pkey;
+
+	/*
+	 * We want to send the pkey used in the BTH in packets
+	 * sent. This, in order for the passive side to determine if
+	 * communication is permitted by the respective pkeys.
+	 *
+	 * The pkey in the paths are derived from the MGID, which has
+	 * the full membership bit set. Hence, we retrieve the pkey by
+	 * using the address vector's pkey_index.
+	 */
+	ret = ib_get_cached_pkey(cm_id_priv->id.device,
+				 cm_id_priv->av.port->port_num,
+				 cm_id_priv->av.pkey_index,
+				 &cm_id_priv->pkey);
+	if (ret)
+		goto error1;
+
 	cm_id_priv->qp_type = param->qp_type;
 
 	ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg);
@@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work)
 				 cm_id_priv);
 	if (ret) {
 		int err;
+		int rej_reason = (ret == -ENOENT ?
+				  IB_CM_REJ_INVALID_PKEY :
+				  IB_CM_REJ_INVALID_GID);
 
 		err = ib_get_cached_gid(work->port->cm_dev->ib_device,
 					work->port->port_num, 0,
 					&work->path[0].sgid,
 					NULL);
 		if (err)
-			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
+			ib_send_cm_rej(cm_id, rej_reason,
 				       NULL, 0, NULL, 0);
 		else
-			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
+			ib_send_cm_rej(cm_id, rej_reason,
 				       &work->path[0].sgid,
 				       sizeof(work->path[0].sgid),
 				       NULL, 0);
@@ -1975,7 +1996,11 @@ static int cm_req_handler(struct cm_work *work)
 		ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av,
 					 cm_id_priv);
 		if (ret) {
-			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID,
+			int rej_reason = (ret == -ENOENT ?
+					  IB_CM_REJ_INVALID_PKEY :
+					  IB_CM_REJ_INVALID_ALT_GID);
+
+			ib_send_cm_rej(cm_id, rej_reason,
 				       &work->path[0].sgid,
 				       sizeof(work->path[0].sgid), NULL, 0);
 			goto rejected;
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index 7979cb04f529..56b62303946a 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -3,6 +3,7 @@
  * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
  * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
  * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
+ * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -183,7 +184,8 @@ enum ib_cm_rej_reason {
 	IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID	= 30,
 	IB_CM_REJ_INVALID_CLASS_VERSION		= 31,
 	IB_CM_REJ_INVALID_FLOW_LABEL		= 32,
-	IB_CM_REJ_INVALID_ALT_FLOW_LABEL	= 33
+	IB_CM_REJ_INVALID_ALT_FLOW_LABEL	= 33,
+	IB_CM_REJ_INVALID_PKEY			= 34,
 };
 
 struct ib_cm_rej_event_param {
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-09  9:30 ` [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys Håkon Bugge
@ 2018-05-09 11:28   ` Hal Rosenstock
  2018-05-10  9:16     ` Håkon Bugge
  2018-05-09 18:11   ` kbuild test robot
  1 sibling, 1 reply; 24+ messages in thread
From: Hal Rosenstock @ 2018-05-09 11:28 UTC (permalink / raw)
  To: Håkon Bugge, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty
  Cc: linux-rdma, linux-kernel

On 5/9/2018 5:30 AM, Håkon Bugge wrote:
> There is no point in using RDMA CM to establish a connection between
> two QPs that cannot possible communicate. Particularly, if both the
> active and passive side use limited pkeys, they are not able to
> communicate.
> 
> In order to detect this situation, the authentic pkey is used in the
> CM REQ message. The authentic pkey is the one that the HCA inserts
> into the BTH in the IB packets.
> 
> When the passive side receives the REQ, commit ("ib_core: A full pkey
> is required to match a limited one") ensures that
> ib_find_matched_cached_pkey() fails unless at least one of the pkeys
> compared has the full-member bit.
> 
> In the limited-to-limited case, this will prohibit the connection to
> be formed, and thus, Pkey Violation Traps will not be sent to the SM.
> 
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> ---
>  drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++-------
>  include/rdma/ib_cm.h         |  4 +++-
>  2 files changed, 35 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index a92e1a5c202b..52ed51d5bd2a 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -3,6 +3,7 @@
>   * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>   * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
>   * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = {
>  	[IB_CM_REJ_INVALID_CLASS_VERSION]	= "invalid class version",
>  	[IB_CM_REJ_INVALID_FLOW_LABEL]		= "invalid flow label",
>  	[IB_CM_REJ_INVALID_ALT_FLOW_LABEL]	= "invalid alt flow label",
> +	[IB_CM_REJ_INVALID_PKEY]		= "invalid PKey",

If this patch goes ahead, IBA spec for CM should be updated to include this.

>  };
>  
>  const char *__attribute_const__ ibcm_reject_msg(int reason)
> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av,
>  		return -EINVAL;
>  	cm_dev = port->cm_dev;
>  
> -	ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num,
> -				  be16_to_cpu(path->pkey), &av->pkey_index);
> +	ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num,
> +					  be16_to_cpu(path->pkey), &av->pkey_index);
>  	if (ret)
>  		return ret;
>  
> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
>  	cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn));
>  	cm_req_set_local_resp_timeout(req_msg,
>  				      param->local_cm_response_timeout);
> -	req_msg->pkey = param->primary_path->pkey;
> +	req_msg->pkey = cpu_to_be16(cm_id_priv->pkey);
>  	cm_req_set_path_mtu(req_msg, param->primary_path->mtu);
>  	cm_req_set_max_cm_retries(req_msg, param->max_cm_retries);
>  
> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
>  	cm_id_priv->responder_resources = param->responder_resources;
>  	cm_id_priv->retry_count = param->retry_count;
>  	cm_id_priv->path_mtu = param->primary_path->mtu;
> -	cm_id_priv->pkey = param->primary_path->pkey;
> +
> +	/*
> +	 * We want to send the pkey used in the BTH in packets
> +	 * sent. This, in order for the passive side to determine if
> +	 * communication is permitted by the respective pkeys.
> +	 *
> +	 * The pkey in the paths are derived from the MGID, which has
> +	 * the full membership bit set. Hence, we retrieve the pkey by
> +	 * using the address vector's pkey_index.

The paths usually come from the SM and I don't expect SM to provide path
between ports of only limited members of partition. Default ACM provider
forms path from multicast group parameters including pkey. Is that the
scenario of concern ? If so, I still don't fully understand the scenario
because limited members are not supposed to be part of a multicast
group. There was some work started to extend this for client/server
model but it was never completed. However, there may be hole(s) in
various components of implementation which open(s) this door.

-- Hal

> +	 */
> +	ret = ib_get_cached_pkey(cm_id_priv->id.device,
> +				 cm_id_priv->av.port->port_num,
> +				 cm_id_priv->av.pkey_index,
> +				 &cm_id_priv->pkey);
> +	if (ret)
> +		goto error1;
> +
>  	cm_id_priv->qp_type = param->qp_type;
>  
>  	ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg);
> @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work)
>  				 cm_id_priv);
>  	if (ret) {
>  		int err;
> +		int rej_reason = (ret == -ENOENT ?
> +				  IB_CM_REJ_INVALID_PKEY :
> +				  IB_CM_REJ_INVALID_GID);
>  
>  		err = ib_get_cached_gid(work->port->cm_dev->ib_device,
>  					work->port->port_num, 0,
>  					&work->path[0].sgid,
>  					NULL);
>  		if (err)
> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
> +			ib_send_cm_rej(cm_id, rej_reason,
>  				       NULL, 0, NULL, 0);
>  		else
> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
> +			ib_send_cm_rej(cm_id, rej_reason,
>  				       &work->path[0].sgid,
>  				       sizeof(work->path[0].sgid),
>  				       NULL, 0);
> @@ -1975,7 +1996,11 @@ static int cm_req_handler(struct cm_work *work)
>  		ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av,
>  					 cm_id_priv);
>  		if (ret) {
> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID,
> +			int rej_reason = (ret == -ENOENT ?
> +					  IB_CM_REJ_INVALID_PKEY :
> +					  IB_CM_REJ_INVALID_ALT_GID);
> +
> +			ib_send_cm_rej(cm_id, rej_reason,
>  				       &work->path[0].sgid,
>  				       sizeof(work->path[0].sgid), NULL, 0);
>  			goto rejected;
> diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
> index 7979cb04f529..56b62303946a 100644
> --- a/include/rdma/ib_cm.h
> +++ b/include/rdma/ib_cm.h
> @@ -3,6 +3,7 @@
>   * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>   * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
>   * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> @@ -183,7 +184,8 @@ enum ib_cm_rej_reason {
>  	IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID	= 30,
>  	IB_CM_REJ_INVALID_CLASS_VERSION		= 31,
>  	IB_CM_REJ_INVALID_FLOW_LABEL		= 32,
> -	IB_CM_REJ_INVALID_ALT_FLOW_LABEL	= 33
> +	IB_CM_REJ_INVALID_ALT_FLOW_LABEL	= 33,
> +	IB_CM_REJ_INVALID_PKEY			= 34,
>  };
>  
>  struct ib_cm_rej_event_param {
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-09  9:30 ` [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys Håkon Bugge
  2018-05-09 11:28   ` Hal Rosenstock
@ 2018-05-09 18:11   ` kbuild test robot
  1 sibling, 0 replies; 24+ messages in thread
From: kbuild test robot @ 2018-05-09 18:11 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: kbuild-all, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	linux-rdma, linux-kernel, Håkon Bugge

Hi Håkon,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.17-rc4 next-20180509]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/H-kon-Bugge/Do-not-form-IB-connections-between-limited-partition-members/20180509-224727
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   drivers/infiniband/core/cm.c:880:16: sparse: expression using sizeof(void)
   drivers/infiniband/core/cm.c:861:21: sparse: expression using sizeof(void)
>> drivers/infiniband/core/cm.c:1246:25: sparse: cast from restricted __be16
>> drivers/infiniband/core/cm.c:1246:25: sparse: incorrect type in argument 1 (different base types) @@    expected unsigned short [unsigned] [usertype] val @@    got  short [unsigned] [usertype] val @@
   drivers/infiniband/core/cm.c:1246:25:    expected unsigned short [unsigned] [usertype] val
   drivers/infiniband/core/cm.c:1246:25:    got restricted __be16 [usertype] pkey
>> drivers/infiniband/core/cm.c:1246:25: sparse: cast from restricted __be16
>> drivers/infiniband/core/cm.c:1246:25: sparse: cast from restricted __be16
   drivers/infiniband/core/cm.c:861:21: sparse: expression using sizeof(void)
   drivers/infiniband/core/cm.c:861:21: sparse: expression using sizeof(void)
>> drivers/infiniband/core/cm.c:1414:35: sparse: incorrect type in argument 4 (different base types) @@    expected unsigned short [usertype] *pkey @@    got  short [usertype] *pkey @@
   drivers/infiniband/core/cm.c:1414:35:    expected unsigned short [usertype] *pkey
   drivers/infiniband/core/cm.c:1414:35:    got restricted __be16 *<noident>
   drivers/infiniband/core/cm.c:861:21: sparse: expression using sizeof(void)
   drivers/infiniband/core/cm.c:861:21: sparse: expression using sizeof(void)
   drivers/infiniband/core/cm.c:861:21: sparse: expression using sizeof(void)

vim +1246 drivers/infiniband/core/cm.c

  1218	
  1219	static void cm_format_req(struct cm_req_msg *req_msg,
  1220				  struct cm_id_private *cm_id_priv,
  1221				  struct ib_cm_req_param *param)
  1222	{
  1223		struct sa_path_rec *pri_path = param->primary_path;
  1224		struct sa_path_rec *alt_path = param->alternate_path;
  1225		bool pri_ext = false;
  1226	
  1227		if (pri_path->rec_type == SA_PATH_REC_TYPE_OPA)
  1228			pri_ext = opa_is_extended_lid(pri_path->opa.dlid,
  1229						      pri_path->opa.slid);
  1230	
  1231		cm_format_mad_hdr(&req_msg->hdr, CM_REQ_ATTR_ID,
  1232				  cm_form_tid(cm_id_priv, CM_MSG_SEQUENCE_REQ));
  1233	
  1234		req_msg->local_comm_id = cm_id_priv->id.local_id;
  1235		req_msg->service_id = param->service_id;
  1236		req_msg->local_ca_guid = cm_id_priv->id.device->node_guid;
  1237		cm_req_set_local_qpn(req_msg, cpu_to_be32(param->qp_num));
  1238		cm_req_set_init_depth(req_msg, param->initiator_depth);
  1239		cm_req_set_remote_resp_timeout(req_msg,
  1240					       param->remote_cm_response_timeout);
  1241		cm_req_set_qp_type(req_msg, param->qp_type);
  1242		cm_req_set_flow_ctrl(req_msg, param->flow_control);
  1243		cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn));
  1244		cm_req_set_local_resp_timeout(req_msg,
  1245					      param->local_cm_response_timeout);
> 1246		req_msg->pkey = cpu_to_be16(cm_id_priv->pkey);
  1247		cm_req_set_path_mtu(req_msg, param->primary_path->mtu);
  1248		cm_req_set_max_cm_retries(req_msg, param->max_cm_retries);
  1249	
  1250		if (param->qp_type != IB_QPT_XRC_INI) {
  1251			cm_req_set_resp_res(req_msg, param->responder_resources);
  1252			cm_req_set_retry_count(req_msg, param->retry_count);
  1253			cm_req_set_rnr_retry_count(req_msg, param->rnr_retry_count);
  1254			cm_req_set_srq(req_msg, param->srq);
  1255		}
  1256	
  1257		req_msg->primary_local_gid = pri_path->sgid;
  1258		req_msg->primary_remote_gid = pri_path->dgid;
  1259		if (pri_ext) {
  1260			req_msg->primary_local_gid.global.interface_id
  1261				= OPA_MAKE_ID(be32_to_cpu(pri_path->opa.slid));
  1262			req_msg->primary_remote_gid.global.interface_id
  1263				= OPA_MAKE_ID(be32_to_cpu(pri_path->opa.dlid));
  1264		}
  1265		if (pri_path->hop_limit <= 1) {
  1266			req_msg->primary_local_lid = pri_ext ? 0 :
  1267				htons(ntohl(sa_path_get_slid(pri_path)));
  1268			req_msg->primary_remote_lid = pri_ext ? 0 :
  1269				htons(ntohl(sa_path_get_dlid(pri_path)));
  1270		} else {
  1271			/* Work-around until there's a way to obtain remote LID info */
  1272			req_msg->primary_local_lid = IB_LID_PERMISSIVE;
  1273			req_msg->primary_remote_lid = IB_LID_PERMISSIVE;
  1274		}
  1275		cm_req_set_primary_flow_label(req_msg, pri_path->flow_label);
  1276		cm_req_set_primary_packet_rate(req_msg, pri_path->rate);
  1277		req_msg->primary_traffic_class = pri_path->traffic_class;
  1278		req_msg->primary_hop_limit = pri_path->hop_limit;
  1279		cm_req_set_primary_sl(req_msg, pri_path->sl);
  1280		cm_req_set_primary_subnet_local(req_msg, (pri_path->hop_limit <= 1));
  1281		cm_req_set_primary_local_ack_timeout(req_msg,
  1282			cm_ack_timeout(cm_id_priv->av.port->cm_dev->ack_delay,
  1283				       pri_path->packet_life_time));
  1284	
  1285		if (alt_path) {
  1286			bool alt_ext = false;
  1287	
  1288			if (alt_path->rec_type == SA_PATH_REC_TYPE_OPA)
  1289				alt_ext = opa_is_extended_lid(alt_path->opa.dlid,
  1290							      alt_path->opa.slid);
  1291	
  1292			req_msg->alt_local_gid = alt_path->sgid;
  1293			req_msg->alt_remote_gid = alt_path->dgid;
  1294			if (alt_ext) {
  1295				req_msg->alt_local_gid.global.interface_id
  1296					= OPA_MAKE_ID(be32_to_cpu(alt_path->opa.slid));
  1297				req_msg->alt_remote_gid.global.interface_id
  1298					= OPA_MAKE_ID(be32_to_cpu(alt_path->opa.dlid));
  1299			}
  1300			if (alt_path->hop_limit <= 1) {
  1301				req_msg->alt_local_lid = alt_ext ? 0 :
  1302					htons(ntohl(sa_path_get_slid(alt_path)));
  1303				req_msg->alt_remote_lid = alt_ext ? 0 :
  1304					htons(ntohl(sa_path_get_dlid(alt_path)));
  1305			} else {
  1306				req_msg->alt_local_lid = IB_LID_PERMISSIVE;
  1307				req_msg->alt_remote_lid = IB_LID_PERMISSIVE;
  1308			}
  1309			cm_req_set_alt_flow_label(req_msg,
  1310						  alt_path->flow_label);
  1311			cm_req_set_alt_packet_rate(req_msg, alt_path->rate);
  1312			req_msg->alt_traffic_class = alt_path->traffic_class;
  1313			req_msg->alt_hop_limit = alt_path->hop_limit;
  1314			cm_req_set_alt_sl(req_msg, alt_path->sl);
  1315			cm_req_set_alt_subnet_local(req_msg, (alt_path->hop_limit <= 1));
  1316			cm_req_set_alt_local_ack_timeout(req_msg,
  1317				cm_ack_timeout(cm_id_priv->av.port->cm_dev->ack_delay,
  1318					       alt_path->packet_life_time));
  1319		}
  1320	
  1321		if (param->private_data && param->private_data_len)
  1322			memcpy(req_msg->private_data, param->private_data,
  1323			       param->private_data_len);
  1324	}
  1325	
  1326	static int cm_validate_req_param(struct ib_cm_req_param *param)
  1327	{
  1328		/* peer-to-peer not supported */
  1329		if (param->peer_to_peer)
  1330			return -EINVAL;
  1331	
  1332		if (!param->primary_path)
  1333			return -EINVAL;
  1334	
  1335		if (param->qp_type != IB_QPT_RC && param->qp_type != IB_QPT_UC &&
  1336		    param->qp_type != IB_QPT_XRC_INI)
  1337			return -EINVAL;
  1338	
  1339		if (param->private_data &&
  1340		    param->private_data_len > IB_CM_REQ_PRIVATE_DATA_SIZE)
  1341			return -EINVAL;
  1342	
  1343		if (param->alternate_path &&
  1344		    (param->alternate_path->pkey != param->primary_path->pkey ||
  1345		     param->alternate_path->mtu != param->primary_path->mtu))
  1346			return -EINVAL;
  1347	
  1348		return 0;
  1349	}
  1350	
  1351	int ib_send_cm_req(struct ib_cm_id *cm_id,
  1352			   struct ib_cm_req_param *param)
  1353	{
  1354		struct cm_id_private *cm_id_priv;
  1355		struct cm_req_msg *req_msg;
  1356		unsigned long flags;
  1357		int ret;
  1358	
  1359		ret = cm_validate_req_param(param);
  1360		if (ret)
  1361			return ret;
  1362	
  1363		/* Verify that we're not in timewait. */
  1364		cm_id_priv = container_of(cm_id, struct cm_id_private, id);
  1365		spin_lock_irqsave(&cm_id_priv->lock, flags);
  1366		if (cm_id->state != IB_CM_IDLE) {
  1367			spin_unlock_irqrestore(&cm_id_priv->lock, flags);
  1368			ret = -EINVAL;
  1369			goto out;
  1370		}
  1371		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
  1372	
  1373		cm_id_priv->timewait_info = cm_create_timewait_info(cm_id_priv->
  1374								    id.local_id);
  1375		if (IS_ERR(cm_id_priv->timewait_info)) {
  1376			ret = PTR_ERR(cm_id_priv->timewait_info);
  1377			goto out;
  1378		}
  1379	
  1380		ret = cm_init_av_by_path(param->primary_path, &cm_id_priv->av,
  1381					 cm_id_priv);
  1382		if (ret)
  1383			goto error1;
  1384		if (param->alternate_path) {
  1385			ret = cm_init_av_by_path(param->alternate_path,
  1386						 &cm_id_priv->alt_av, cm_id_priv);
  1387			if (ret)
  1388				goto error1;
  1389		}
  1390		cm_id->service_id = param->service_id;
  1391		cm_id->service_mask = ~cpu_to_be64(0);
  1392		cm_id_priv->timeout_ms = cm_convert_to_ms(
  1393					    param->primary_path->packet_life_time) * 2 +
  1394					 cm_convert_to_ms(
  1395					    param->remote_cm_response_timeout);
  1396		cm_id_priv->max_cm_retries = param->max_cm_retries;
  1397		cm_id_priv->initiator_depth = param->initiator_depth;
  1398		cm_id_priv->responder_resources = param->responder_resources;
  1399		cm_id_priv->retry_count = param->retry_count;
  1400		cm_id_priv->path_mtu = param->primary_path->mtu;
  1401	
  1402		/*
  1403		 * We want to send the pkey used in the BTH in packets
  1404		 * sent. This, in order for the passive side to determine if
  1405		 * communication is permitted by the respective pkeys.
  1406		 *
  1407		 * The pkey in the paths are derived from the MGID, which has
  1408		 * the full membership bit set. Hence, we retrieve the pkey by
  1409		 * using the address vector's pkey_index.
  1410		 */
  1411		ret = ib_get_cached_pkey(cm_id_priv->id.device,
  1412					 cm_id_priv->av.port->port_num,
  1413					 cm_id_priv->av.pkey_index,
> 1414					 &cm_id_priv->pkey);
  1415		if (ret)
  1416			goto error1;
  1417	
  1418		cm_id_priv->qp_type = param->qp_type;
  1419	
  1420		ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg);
  1421		if (ret)
  1422			goto error1;
  1423	
  1424		req_msg = (struct cm_req_msg *) cm_id_priv->msg->mad;
  1425		cm_format_req(req_msg, cm_id_priv, param);
  1426		cm_id_priv->tid = req_msg->hdr.tid;
  1427		cm_id_priv->msg->timeout_ms = cm_id_priv->timeout_ms;
  1428		cm_id_priv->msg->context[1] = (void *) (unsigned long) IB_CM_REQ_SENT;
  1429	
  1430		cm_id_priv->local_qpn = cm_req_get_local_qpn(req_msg);
  1431		cm_id_priv->rq_psn = cm_req_get_starting_psn(req_msg);
  1432	
  1433		spin_lock_irqsave(&cm_id_priv->lock, flags);
  1434		ret = ib_post_send_mad(cm_id_priv->msg, NULL);
  1435		if (ret) {
  1436			spin_unlock_irqrestore(&cm_id_priv->lock, flags);
  1437			goto error2;
  1438		}
  1439		BUG_ON(cm_id->state != IB_CM_IDLE);
  1440		cm_id->state = IB_CM_REQ_SENT;
  1441		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
  1442		return 0;
  1443	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-09 11:28   ` Hal Rosenstock
@ 2018-05-10  9:16     ` Håkon Bugge
  2018-05-10 14:01       ` Hal Rosenstock
  0 siblings, 1 reply; 24+ messages in thread
From: Håkon Bugge @ 2018-05-10  9:16 UTC (permalink / raw)
  To: Hal Rosenstock
  Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel



> On 9 May 2018, at 13:28, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
> 
> On 5/9/2018 5:30 AM, Håkon Bugge wrote:
>> There is no point in using RDMA CM to establish a connection between
>> two QPs that cannot possible communicate. Particularly, if both the
>> active and passive side use limited pkeys, they are not able to
>> communicate.
>> 
>> In order to detect this situation, the authentic pkey is used in the
>> CM REQ message. The authentic pkey is the one that the HCA inserts
>> into the BTH in the IB packets.
>> 
>> When the passive side receives the REQ, commit ("ib_core: A full pkey
>> is required to match a limited one") ensures that
>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys
>> compared has the full-member bit.
>> 
>> In the limited-to-limited case, this will prohibit the connection to
>> be formed, and thus, Pkey Violation Traps will not be sent to the SM.
>> 
>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>> ---
>> drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++-------
>> include/rdma/ib_cm.h         |  4 +++-
>> 2 files changed, 35 insertions(+), 8 deletions(-)
>> 
>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>> index a92e1a5c202b..52ed51d5bd2a 100644
>> --- a/drivers/infiniband/core/cm.c
>> +++ b/drivers/infiniband/core/cm.c
>> @@ -3,6 +3,7 @@
>>  * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>  * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
>>  * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>>  *
>>  * This software is available to you under a choice of one of two
>>  * licenses.  You may choose to be licensed under the terms of the GNU
>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = {
>> 	[IB_CM_REJ_INVALID_CLASS_VERSION]	= "invalid class version",
>> 	[IB_CM_REJ_INVALID_FLOW_LABEL]		= "invalid flow label",
>> 	[IB_CM_REJ_INVALID_ALT_FLOW_LABEL]	= "invalid alt flow label",
>> +	[IB_CM_REJ_INVALID_PKEY]		= "invalid PKey",
> 
> If this patch goes ahead, IBA spec for CM should be updated to include this.

Sure, I see:

 33 Invalid Alternate Flow Label

as the latest in the spec.

> 
>> };
>> 
>> const char *__attribute_const__ ibcm_reject_msg(int reason)
>> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av,
>> 		return -EINVAL;
>> 	cm_dev = port->cm_dev;
>> 
>> -	ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num,
>> -				  be16_to_cpu(path->pkey), &av->pkey_index);
>> +	ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num,
>> +					  be16_to_cpu(path->pkey), &av->pkey_index);
>> 	if (ret)
>> 		return ret;
>> 
>> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
>> 	cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn));
>> 	cm_req_set_local_resp_timeout(req_msg,
>> 				      param->local_cm_response_timeout);
>> -	req_msg->pkey = param->primary_path->pkey;
>> +	req_msg->pkey = cpu_to_be16(cm_id_priv->pkey);
>> 	cm_req_set_path_mtu(req_msg, param->primary_path->mtu);
>> 	cm_req_set_max_cm_retries(req_msg, param->max_cm_retries);
>> 
>> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
>> 	cm_id_priv->responder_resources = param->responder_resources;
>> 	cm_id_priv->retry_count = param->retry_count;
>> 	cm_id_priv->path_mtu = param->primary_path->mtu;
>> -	cm_id_priv->pkey = param->primary_path->pkey;
>> +
>> +	/*
>> +	 * We want to send the pkey used in the BTH in packets
>> +	 * sent. This, in order for the passive side to determine if
>> +	 * communication is permitted by the respective pkeys.
>> +	 *
>> +	 * The pkey in the paths are derived from the MGID, which has
>> +	 * the full membership bit set. Hence, we retrieve the pkey by
>> +	 * using the address vector's pkey_index.
> 
> The paths usually come from the SM and I don't expect SM to provide path
> between ports of only limited members of partition.

In my case, it does. 

> Default ACM provider
> forms path from multicast group parameters including pkey. Is that the
> scenario of concern ?

Also RDMA CM does that. Do an ibdump of a CM REQ message sent from a limited port, and you will see the PKey is the full member in the CM REQ msg.

> If so, I still don't fully understand the scenario
> because limited members are not supposed to be part of a multicast
> group. There was some work started to extend this for client/server
> model but it was never completed. However, there may be hole(s) in
> various components of implementation which open(s) this door.

I view OpenSM not returning a valid path between two limited members an orthogonal issue, as OpenSM is another component.

I think the CM REQ message should contain the correct PKey (fixed by this patch series).

And in the event the passive side being a limited member and receives a CM REQ with a limited PKey, that connection should not be formed (fixed by this patch series).

Read me correct, I am also in favour of fixing the OpenSM to not return a valid (but useless) path record in this case.


Thxs, Håkon


> 
> -- Hal
> 
>> +	 */
>> +	ret = ib_get_cached_pkey(cm_id_priv->id.device,
>> +				 cm_id_priv->av.port->port_num,
>> +				 cm_id_priv->av.pkey_index,
>> +				 &cm_id_priv->pkey);
>> +	if (ret)
>> +		goto error1;
>> +
>> 	cm_id_priv->qp_type = param->qp_type;
>> 
>> 	ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg);
>> @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work)
>> 				 cm_id_priv);
>> 	if (ret) {
>> 		int err;
>> +		int rej_reason = (ret == -ENOENT ?
>> +				  IB_CM_REJ_INVALID_PKEY :
>> +				  IB_CM_REJ_INVALID_GID);
>> 
>> 		err = ib_get_cached_gid(work->port->cm_dev->ib_device,
>> 					work->port->port_num, 0,
>> 					&work->path[0].sgid,
>> 					NULL);
>> 		if (err)
>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
>> +			ib_send_cm_rej(cm_id, rej_reason,
>> 				       NULL, 0, NULL, 0);
>> 		else
>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
>> +			ib_send_cm_rej(cm_id, rej_reason,
>> 				       &work->path[0].sgid,
>> 				       sizeof(work->path[0].sgid),
>> 				       NULL, 0);
>> @@ -1975,7 +1996,11 @@ static int cm_req_handler(struct cm_work *work)
>> 		ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av,
>> 					 cm_id_priv);
>> 		if (ret) {
>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID,
>> +			int rej_reason = (ret == -ENOENT ?
>> +					  IB_CM_REJ_INVALID_PKEY :
>> +					  IB_CM_REJ_INVALID_ALT_GID);
>> +
>> +			ib_send_cm_rej(cm_id, rej_reason,
>> 				       &work->path[0].sgid,
>> 				       sizeof(work->path[0].sgid), NULL, 0);
>> 			goto rejected;
>> diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
>> index 7979cb04f529..56b62303946a 100644
>> --- a/include/rdma/ib_cm.h
>> +++ b/include/rdma/ib_cm.h
>> @@ -3,6 +3,7 @@
>>  * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>  * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
>>  * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>>  *
>>  * This software is available to you under a choice of one of two
>>  * licenses.  You may choose to be licensed under the terms of the GNU
>> @@ -183,7 +184,8 @@ enum ib_cm_rej_reason {
>> 	IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID	= 30,
>> 	IB_CM_REJ_INVALID_CLASS_VERSION		= 31,
>> 	IB_CM_REJ_INVALID_FLOW_LABEL		= 32,
>> -	IB_CM_REJ_INVALID_ALT_FLOW_LABEL	= 33
>> +	IB_CM_REJ_INVALID_ALT_FLOW_LABEL	= 33,
>> +	IB_CM_REJ_INVALID_PKEY			= 34,
>> };
>> 
>> struct ib_cm_rej_event_param {
>> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-10  9:16     ` Håkon Bugge
@ 2018-05-10 14:01       ` Hal Rosenstock
  2018-05-10 15:16         ` Håkon Bugge
  0 siblings, 1 reply; 24+ messages in thread
From: Hal Rosenstock @ 2018-05-10 14:01 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel

On 5/10/2018 5:16 AM, Håkon Bugge wrote:
> 
> 
>> On 9 May 2018, at 13:28, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
>>
>> On 5/9/2018 5:30 AM, Håkon Bugge wrote:
>>> There is no point in using RDMA CM to establish a connection between
>>> two QPs that cannot possible communicate. Particularly, if both the
>>> active and passive side use limited pkeys, they are not able to
>>> communicate.
>>>
>>> In order to detect this situation, the authentic pkey is used in the
>>> CM REQ message. The authentic pkey is the one that the HCA inserts
>>> into the BTH in the IB packets.
>>>
>>> When the passive side receives the REQ, commit ("ib_core: A full pkey
>>> is required to match a limited one") ensures that
>>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys
>>> compared has the full-member bit.
>>>
>>> In the limited-to-limited case, this will prohibit the connection to
>>> be formed, and thus, Pkey Violation Traps will not be sent to the SM.
>>>
>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>> ---
>>> drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++-------
>>> include/rdma/ib_cm.h         |  4 +++-
>>> 2 files changed, 35 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>> index a92e1a5c202b..52ed51d5bd2a 100644
>>> --- a/drivers/infiniband/core/cm.c
>>> +++ b/drivers/infiniband/core/cm.c
>>> @@ -3,6 +3,7 @@
>>>  * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>>  * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
>>>  * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>>>  *
>>>  * This software is available to you under a choice of one of two
>>>  * licenses.  You may choose to be licensed under the terms of the GNU
>>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = {
>>> 	[IB_CM_REJ_INVALID_CLASS_VERSION]	= "invalid class version",
>>> 	[IB_CM_REJ_INVALID_FLOW_LABEL]		= "invalid flow label",
>>> 	[IB_CM_REJ_INVALID_ALT_FLOW_LABEL]	= "invalid alt flow label",
>>> +	[IB_CM_REJ_INVALID_PKEY]		= "invalid PKey",
>>
>> If this patch goes ahead, IBA spec for CM should be updated to include this.
> 
> Sure, I see:
> 
>  33 Invalid Alternate Flow Label
> 
> as the latest in the spec.

This appears to be an implementation rather than architectural issue. If
there is limited <-> limited CM, then the MAD would never get up to the
CM layer as it would be filtered by end port pkey enforcement. This is
only needed to protect against current code which can send on the full
rather than limited pkey (in BTH), right ?

>>
>>> };
>>>
>>> const char *__attribute_const__ ibcm_reject_msg(int reason)
>>> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av,
>>> 		return -EINVAL;
>>> 	cm_dev = port->cm_dev;
>>>
>>> -	ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num,
>>> -				  be16_to_cpu(path->pkey), &av->pkey_index);
>>> +	ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num,
>>> +					  be16_to_cpu(path->pkey), &av->pkey_index);
>>> 	if (ret)
>>> 		return ret;
>>>
>>> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
>>> 	cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn));
>>> 	cm_req_set_local_resp_timeout(req_msg,
>>> 				      param->local_cm_response_timeout);
>>> -	req_msg->pkey = param->primary_path->pkey;
>>> +	req_msg->pkey = cpu_to_be16(cm_id_priv->pkey);
>>> 	cm_req_set_path_mtu(req_msg, param->primary_path->mtu);
>>> 	cm_req_set_max_cm_retries(req_msg, param->max_cm_retries);
>>>
>>> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
>>> 	cm_id_priv->responder_resources = param->responder_resources;
>>> 	cm_id_priv->retry_count = param->retry_count;
>>> 	cm_id_priv->path_mtu = param->primary_path->mtu;
>>> -	cm_id_priv->pkey = param->primary_path->pkey;
>>> +
>>> +	/*
>>> +	 * We want to send the pkey used in the BTH in packets
>>> +	 * sent. This, in order for the passive side to determine if
>>> +	 * communication is permitted by the respective pkeys.
>>> +	 *
>>> +	 * The pkey in the paths are derived from the MGID, which has
>>> +	 * the full membership bit set. Hence, we retrieve the pkey by
>>> +	 * using the address vector's pkey_index.
>>
>> The paths usually come from the SM and I don't expect SM to provide path
>> between ports of only limited members of partition.
> 
> In my case, it does. 

Which OpenSM flavor are you talking about (upstream, MLNX, or other) ?
What is OpenSM version string ? Is allow_both_pkeys TRUE or FALSE ?

>> Default ACM provider
>> forms path from multicast group parameters including pkey. Is that the
>> scenario of concern ?
> 
> Also RDMA CM does that. Do an ibdump of a CM REQ message sent from a limited port, and you will see the PKey is the full member in the CM REQ msg.

Not exactly sure what you mean by RDMA CM does this as there are a
number of different ways to use RDMA CM. Are you referring to using
IPoIB for address resolution ? Another way is for a path record to be
passed in from user space and there is no control over it's contents.

At a minimum, I think the comment mentioning MGID should be clarified.

>> If so, I still don't fully understand the scenario
>> because limited members are not supposed to be part of a multicast
>> group. There was some work started to extend this for client/server
>> model but it was never completed. However, there may be hole(s) in
>> various components of implementation which open(s) this door.
> 
> I view OpenSM not returning a valid path between two limited members an orthogonal issue, as OpenSM is another component.

Agreed. I think that there are other components too...

> I think the CM REQ message should contain the correct PKey (fixed by this patch series).
> 
> And in the event the passive side being a limited member and receives a CM REQ with a limited PKey, that connection should not be formed (fixed by this patch series).
This is the backward compatibility for the current behavior on the
active side so that REQ gets rejected rather than accepted, right ?

In general, the approach used when there is pkey mismatch is to silently
drop packet rather than respond which causes a timeout on the requester
side. In this specific case, I'm not sure which approach is better.

Also, is there a similar issue with SIDR_REQ ?

> Read me correct, I am also in favour of fixing the OpenSM to not return a valid (but useless) path record in this case.

Understood and agreed.

-- Hal

> 
> Thxs, Håkon
> 
> 
>>
>> -- Hal
>>
>>> +	 */
>>> +	ret = ib_get_cached_pkey(cm_id_priv->id.device,
>>> +				 cm_id_priv->av.port->port_num,
>>> +				 cm_id_priv->av.pkey_index,
>>> +				 &cm_id_priv->pkey);
>>> +	if (ret)
>>> +		goto error1;
>>> +
>>> 	cm_id_priv->qp_type = param->qp_type;
>>>
>>> 	ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg);
>>> @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work)
>>> 				 cm_id_priv);
>>> 	if (ret) {
>>> 		int err;
>>> +		int rej_reason = (ret == -ENOENT ?
>>> +				  IB_CM_REJ_INVALID_PKEY :
>>> +				  IB_CM_REJ_INVALID_GID);
>>>
>>> 		err = ib_get_cached_gid(work->port->cm_dev->ib_device,
>>> 					work->port->port_num, 0,
>>> 					&work->path[0].sgid,
>>> 					NULL);
>>> 		if (err)
>>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
>>> +			ib_send_cm_rej(cm_id, rej_reason,
>>> 				       NULL, 0, NULL, 0);
>>> 		else
>>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
>>> +			ib_send_cm_rej(cm_id, rej_reason,
>>> 				       &work->path[0].sgid,
>>> 				       sizeof(work->path[0].sgid),
>>> 				       NULL, 0);
>>> @@ -1975,7 +1996,11 @@ static int cm_req_handler(struct cm_work *work)
>>> 		ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av,
>>> 					 cm_id_priv);
>>> 		if (ret) {
>>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID,
>>> +			int rej_reason = (ret == -ENOENT ?
>>> +					  IB_CM_REJ_INVALID_PKEY :
>>> +					  IB_CM_REJ_INVALID_ALT_GID);
>>> +
>>> +			ib_send_cm_rej(cm_id, rej_reason,
>>> 				       &work->path[0].sgid,
>>> 				       sizeof(work->path[0].sgid), NULL, 0);
>>> 			goto rejected;
>>> diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
>>> index 7979cb04f529..56b62303946a 100644
>>> --- a/include/rdma/ib_cm.h
>>> +++ b/include/rdma/ib_cm.h
>>> @@ -3,6 +3,7 @@
>>>  * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>>  * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
>>>  * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>>>  *
>>>  * This software is available to you under a choice of one of two
>>>  * licenses.  You may choose to be licensed under the terms of the GNU
>>> @@ -183,7 +184,8 @@ enum ib_cm_rej_reason {
>>> 	IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID	= 30,
>>> 	IB_CM_REJ_INVALID_CLASS_VERSION		= 31,
>>> 	IB_CM_REJ_INVALID_FLOW_LABEL		= 32,
>>> -	IB_CM_REJ_INVALID_ALT_FLOW_LABEL	= 33
>>> +	IB_CM_REJ_INVALID_ALT_FLOW_LABEL	= 33,
>>> +	IB_CM_REJ_INVALID_PKEY			= 34,
>>> };
>>>
>>> struct ib_cm_rej_event_param {
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-10 14:01       ` Hal Rosenstock
@ 2018-05-10 15:16         ` Håkon Bugge
  2018-05-10 16:54           ` Hal Rosenstock
  2018-05-14 21:02           ` Jason Gunthorpe
  0 siblings, 2 replies; 24+ messages in thread
From: Håkon Bugge @ 2018-05-10 15:16 UTC (permalink / raw)
  To: Hal Rosenstock
  Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel



> On 10 May 2018, at 16:01, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
> 
> On 5/10/2018 5:16 AM, Håkon Bugge wrote:
>> 
>> 
>>> On 9 May 2018, at 13:28, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
>>> 
>>> On 5/9/2018 5:30 AM, Håkon Bugge wrote:
>>>> There is no point in using RDMA CM to establish a connection between
>>>> two QPs that cannot possible communicate. Particularly, if both the
>>>> active and passive side use limited pkeys, they are not able to
>>>> communicate.
>>>> 
>>>> In order to detect this situation, the authentic pkey is used in the
>>>> CM REQ message. The authentic pkey is the one that the HCA inserts
>>>> into the BTH in the IB packets.
>>>> 
>>>> When the passive side receives the REQ, commit ("ib_core: A full pkey
>>>> is required to match a limited one") ensures that
>>>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys
>>>> compared has the full-member bit.
>>>> 
>>>> In the limited-to-limited case, this will prohibit the connection to
>>>> be formed, and thus, Pkey Violation Traps will not be sent to the SM.
>>>> 
>>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>> ---
>>>> drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++-------
>>>> include/rdma/ib_cm.h         |  4 +++-
>>>> 2 files changed, 35 insertions(+), 8 deletions(-)
>>>> 
>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>>> index a92e1a5c202b..52ed51d5bd2a 100644
>>>> --- a/drivers/infiniband/core/cm.c
>>>> +++ b/drivers/infiniband/core/cm.c
>>>> @@ -3,6 +3,7 @@
>>>> * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>>> * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>>>> *
>>>> * This software is available to you under a choice of one of two
>>>> * licenses.  You may choose to be licensed under the terms of the GNU
>>>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = {
>>>> 	[IB_CM_REJ_INVALID_CLASS_VERSION]	= "invalid class version",
>>>> 	[IB_CM_REJ_INVALID_FLOW_LABEL]		= "invalid flow label",
>>>> 	[IB_CM_REJ_INVALID_ALT_FLOW_LABEL]	= "invalid alt flow label",
>>>> +	[IB_CM_REJ_INVALID_PKEY]		= "invalid PKey",
>>> 
>>> If this patch goes ahead, IBA spec for CM should be updated to include this.
>> 
>> Sure, I see:
>> 
>> 33 Invalid Alternate Flow Label
>> 
>> as the latest in the spec.
> 
> This appears to be an implementation rather than architectural issue. If
> there is limited <-> limited CM, then the MAD would never get up to the
> CM layer as it would be filtered by end port pkey enforcement. This is
> only needed to protect against current code which can send on the full
> rather than limited pkey (in BTH), right ?

We are talking about two things here. The PKey in the BTH and the PKey in the CM REQ payload. They differ.

I am out of office, but if my memory serves me correct, the PKey in the BTH in the MAD packet will be the default PKey. Further, we have per IBTA:

C10-132: Packets sent to the General Service Interface QP (QP1) shall be accepted if the P_Key in the packet matches any valid P_Key in the P_Key Table of the port on which the packet arrived. Matching is defined in 10.9.3 Partition Key Matching.

So, by architecture, the MAD arrives.

> 
> 
>>> 
>>>> };
>>>> 
>>>> const char *__attribute_const__ ibcm_reject_msg(int reason)
>>>> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av,
>>>> 		return -EINVAL;
>>>> 	cm_dev = port->cm_dev;
>>>> 
>>>> -	ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num,
>>>> -				  be16_to_cpu(path->pkey), &av->pkey_index);
>>>> +	ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num,
>>>> +					  be16_to_cpu(path->pkey), &av->pkey_index);
>>>> 	if (ret)
>>>> 		return ret;
>>>> 
>>>> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
>>>> 	cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn));
>>>> 	cm_req_set_local_resp_timeout(req_msg,
>>>> 				      param->local_cm_response_timeout);
>>>> -	req_msg->pkey = param->primary_path->pkey;
>>>> +	req_msg->pkey = cpu_to_be16(cm_id_priv->pkey);
>>>> 	cm_req_set_path_mtu(req_msg, param->primary_path->mtu);
>>>> 	cm_req_set_max_cm_retries(req_msg, param->max_cm_retries);
>>>> 
>>>> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
>>>> 	cm_id_priv->responder_resources = param->responder_resources;
>>>> 	cm_id_priv->retry_count = param->retry_count;
>>>> 	cm_id_priv->path_mtu = param->primary_path->mtu;
>>>> -	cm_id_priv->pkey = param->primary_path->pkey;
>>>> +
>>>> +	/*
>>>> +	 * We want to send the pkey used in the BTH in packets
>>>> +	 * sent. This, in order for the passive side to determine if
>>>> +	 * communication is permitted by the respective pkeys.
>>>> +	 *
>>>> +	 * The pkey in the paths are derived from the MGID, which has
>>>> +	 * the full membership bit set. Hence, we retrieve the pkey by
>>>> +	 * using the address vector's pkey_index.
>>> 
>>> The paths usually come from the SM and I don't expect SM to provide path
>>> between ports of only limited members of partition.
>> 
>> In my case, it does. 
> 
> Which OpenSM flavor are you talking about (upstream, MLNX, or other) ?

I guess other is the right term, as its Oracle’s flavour, 3.2.6_20170609.

> What is OpenSM version string ? Is allow_both_pkeys TRUE or FALSE ?

The ports do have both limited and full memberships, and the partitions.conf file contains both “limited”, “both”, and the “full” keywords.

>>> Default ACM provider
>>> forms path from multicast group parameters including pkey. Is that the
>>> scenario of concern ?
>> 
>> Also RDMA CM does that. Do an ibdump of a CM REQ message sent from a limited port, and you will see the PKey is the full member in the CM REQ msg.
> 
> Not exactly sure what you mean by RDMA CM does this as there are a
> number of different ways to use RDMA CM.

I am talking specifically about the CM REQ message. For example, if you run an IB-UV test program and specifies that RDMA CM should be used for connection management. Or it could be kernel ULPs. My point is that the pkey field in the CM REQ message (i.e. not the one in the BTH) is always the full one, even though the port uses the limited one.

> Are you referring to using
> IPoIB for address resolution ? Another way is for a path record to be
> passed in from user space and there is no control over it’s contents.

I did not bring up the path record or address resolution issue ;-)

But I guess you are alluding to the limited-to-limited case should 1) not have been able to perform address resolution and/or 2) the path record should not have been returned. I agree to both 1) and 2).

But if both 1) and 2) succeeds, this patch series is a last resort to stop the connection to be established and avoid pkey trap violation traps being sent.

> At a minimum, I think the comment mentioning MGID should be clarified.

Is the question if an MC group attached by a limited port will include other limited ports or only full member ports? 

With the OpenSM I am running, the MC group contains limited members.

>>> If so, I still don't fully understand the scenario
>>> because limited members are not supposed to be part of a multicast
>>> group. There was some work started to extend this for client/server
>>> model but it was never completed. However, there may be hole(s) in
>>> various components of implementation which open(s) this door.
>> 
>> I view OpenSM not returning a valid path between two limited members an orthogonal issue, as OpenSM is another component.
> 
> Agreed. I think that there are other components too...
> 
>> I think the CM REQ message should contain the correct PKey (fixed by this patch series).
>> 
>> And in the event the passive side being a limited member and receives a CM REQ with a limited PKey, that connection should not be formed (fixed by this patch series).
> This is the backward compatibility for the current behavior on the
> active side so that REQ gets rejected rather than accepted, right ?

Not sure about the backwards compatibility part, but I certainly agree to the rest. Put it another way, when the MAD payload’s pkey is limited in the CM REQ, and there is only a limited matching key in the pkey cache, a REJ is sent back.

> In general, the approach used when there is pkey mismatch is to silently
> drop packet rather than respond which causes a timeout on the requester
> side. In this specific case, I’m not sure which approach is better.

It is not silently dropped when the port supports pkey trap messages. Then a trap message is sent to the OpenSM.

> Also, is there a similar issue with SIDR_REQ ?

Good point, most probaly!

> Read me correct, I am also in favour of fixing the OpenSM to not return a valid (but useless) path record in this case.
> 
> Understood and agreed.


Thxs, Håkon

> 
> -- Hal
> 
>> 
>> Thxs, Håkon
>> 
>> 
>>> 
>>> -- Hal
>>> 
>>>> +	 */
>>>> +	ret = ib_get_cached_pkey(cm_id_priv->id.device,
>>>> +				 cm_id_priv->av.port->port_num,
>>>> +				 cm_id_priv->av.pkey_index,
>>>> +				 &cm_id_priv->pkey);
>>>> +	if (ret)
>>>> +		goto error1;
>>>> +
>>>> 	cm_id_priv->qp_type = param->qp_type;
>>>> 
>>>> 	ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg);
>>>> @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work)
>>>> 				 cm_id_priv);
>>>> 	if (ret) {
>>>> 		int err;
>>>> +		int rej_reason = (ret == -ENOENT ?
>>>> +				  IB_CM_REJ_INVALID_PKEY :
>>>> +				  IB_CM_REJ_INVALID_GID);
>>>> 
>>>> 		err = ib_get_cached_gid(work->port->cm_dev->ib_device,
>>>> 					work->port->port_num, 0,
>>>> 					&work->path[0].sgid,
>>>> 					NULL);
>>>> 		if (err)
>>>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
>>>> +			ib_send_cm_rej(cm_id, rej_reason,
>>>> 				       NULL, 0, NULL, 0);
>>>> 		else
>>>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
>>>> +			ib_send_cm_rej(cm_id, rej_reason,
>>>> 				       &work->path[0].sgid,
>>>> 				       sizeof(work->path[0].sgid),
>>>> 				       NULL, 0);
>>>> @@ -1975,7 +1996,11 @@ static int cm_req_handler(struct cm_work *work)
>>>> 		ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av,
>>>> 					 cm_id_priv);
>>>> 		if (ret) {
>>>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID,
>>>> +			int rej_reason = (ret == -ENOENT ?
>>>> +					  IB_CM_REJ_INVALID_PKEY :
>>>> +					  IB_CM_REJ_INVALID_ALT_GID);
>>>> +
>>>> +			ib_send_cm_rej(cm_id, rej_reason,
>>>> 				       &work->path[0].sgid,
>>>> 				       sizeof(work->path[0].sgid), NULL, 0);
>>>> 			goto rejected;
>>>> diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
>>>> index 7979cb04f529..56b62303946a 100644
>>>> --- a/include/rdma/ib_cm.h
>>>> +++ b/include/rdma/ib_cm.h
>>>> @@ -3,6 +3,7 @@
>>>> * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>>> * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>>>> *
>>>> * This software is available to you under a choice of one of two
>>>> * licenses.  You may choose to be licensed under the terms of the GNU
>>>> @@ -183,7 +184,8 @@ enum ib_cm_rej_reason {
>>>> 	IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID	= 30,
>>>> 	IB_CM_REJ_INVALID_CLASS_VERSION		= 31,
>>>> 	IB_CM_REJ_INVALID_FLOW_LABEL		= 32,
>>>> -	IB_CM_REJ_INVALID_ALT_FLOW_LABEL	= 33
>>>> +	IB_CM_REJ_INVALID_ALT_FLOW_LABEL	= 33,
>>>> +	IB_CM_REJ_INVALID_PKEY			= 34,
>>>> };
>>>> 
>>>> struct ib_cm_rej_event_param {
>>>> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-10 15:16         ` Håkon Bugge
@ 2018-05-10 16:54           ` Hal Rosenstock
  2018-05-11 10:55             ` Håkon Bugge
  2018-05-14 21:02           ` Jason Gunthorpe
  1 sibling, 1 reply; 24+ messages in thread
From: Hal Rosenstock @ 2018-05-10 16:54 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel

On 5/10/2018 11:16 AM, Håkon Bugge wrote:
> 
> 
>> On 10 May 2018, at 16:01, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
>>
>> On 5/10/2018 5:16 AM, Håkon Bugge wrote:
>>>
>>>
>>>> On 9 May 2018, at 13:28, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
>>>>
>>>> On 5/9/2018 5:30 AM, Håkon Bugge wrote:
>>>>> There is no point in using RDMA CM to establish a connection between
>>>>> two QPs that cannot possible communicate. Particularly, if both the
>>>>> active and passive side use limited pkeys, they are not able to
>>>>> communicate.
>>>>>
>>>>> In order to detect this situation, the authentic pkey is used in the
>>>>> CM REQ message. The authentic pkey is the one that the HCA inserts
>>>>> into the BTH in the IB packets.
>>>>>
>>>>> When the passive side receives the REQ, commit ("ib_core: A full pkey
>>>>> is required to match a limited one") ensures that
>>>>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys
>>>>> compared has the full-member bit.
>>>>>
>>>>> In the limited-to-limited case, this will prohibit the connection to
>>>>> be formed, and thus, Pkey Violation Traps will not be sent to the SM.
>>>>>
>>>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>>> ---
>>>>> drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++-------
>>>>> include/rdma/ib_cm.h         |  4 +++-
>>>>> 2 files changed, 35 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>>>> index a92e1a5c202b..52ed51d5bd2a 100644
>>>>> --- a/drivers/infiniband/core/cm.c
>>>>> +++ b/drivers/infiniband/core/cm.c
>>>>> @@ -3,6 +3,7 @@
>>>>> * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>>>> * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
>>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>>>>> *
>>>>> * This software is available to you under a choice of one of two
>>>>> * licenses.  You may choose to be licensed under the terms of the GNU
>>>>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = {
>>>>> 	[IB_CM_REJ_INVALID_CLASS_VERSION]	= "invalid class version",
>>>>> 	[IB_CM_REJ_INVALID_FLOW_LABEL]		= "invalid flow label",
>>>>> 	[IB_CM_REJ_INVALID_ALT_FLOW_LABEL]	= "invalid alt flow label",
>>>>> +	[IB_CM_REJ_INVALID_PKEY]		= "invalid PKey",
>>>>
>>>> If this patch goes ahead, IBA spec for CM should be updated to include this.
>>>
>>> Sure, I see:
>>>
>>> 33 Invalid Alternate Flow Label
>>>
>>> as the latest in the spec.
>>
>> This appears to be an implementation rather than architectural issue. If
>> there is limited <-> limited CM, then the MAD would never get up to the
>> CM layer as it would be filtered by end port pkey enforcement. This is
>> only needed to protect against current code which can send on the full
>> rather than limited pkey (in BTH), right ?
> 
> We are talking about two things here. The PKey in the BTH and the PKey in the CM REQ payload. They differ.

Yes, I understand the difference between the 2 potentially different PKeys.

> I am out of office, but if my memory serves me correct, the PKey in the BTH in the MAD packet will be the default PKey. 

MADs are sent on reversible paths (13.5.4.1) where the response can be
constructed from only header info in the request. BTH:pkey doesn't
__have to be__ the default pkey. It just needs to be a common/shared
pkey between the source/destination pair.

For example, there are some configurations where the port is only a
limited member of the default partition (and full member of other non
default partition(s)).

There are also configurations where the port is both full and limited
member on one or more partition(s). This is the allow_both_pkeys TRUE
case for OpenSM which also uses the 'both' syntax in the partitions
config file. This appears to be the case to which you are referring.

CM MAD could be sent on default partition but that may not work if both
source and dest are only limited members of the default partition.

CM MAD could be sent on any common/shared pkey between the
source/destination ports.

> Further, we have per IBTA:
> 
> C10-132: Packets sent to the General Service Interface QP (QP1) shall be accepted if the P_Key in the packet matches any valid P_Key in the P_Key Table of the port on which the packet arrived. Matching is defined in 10.9.3 Partition Key Matching.
> 
> So, by architecture, the MAD arrives
In the case of port being both full and limited member of the same
partition on which CM MAD comes in on (as defined by pkey in BTH).

I was talking about limited <-> limited CM and didn't realize you were
talking about the 'both' case.

>>
>>
>>>>
>>>>> };
>>>>>
>>>>> const char *__attribute_const__ ibcm_reject_msg(int reason)
>>>>> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av,
>>>>> 		return -EINVAL;
>>>>> 	cm_dev = port->cm_dev;
>>>>>
>>>>> -	ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num,
>>>>> -				  be16_to_cpu(path->pkey), &av->pkey_index);
>>>>> +	ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num,
>>>>> +					  be16_to_cpu(path->pkey), &av->pkey_index);
>>>>> 	if (ret)
>>>>> 		return ret;
>>>>>
>>>>> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
>>>>> 	cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn));
>>>>> 	cm_req_set_local_resp_timeout(req_msg,
>>>>> 				      param->local_cm_response_timeout);
>>>>> -	req_msg->pkey = param->primary_path->pkey;
>>>>> +	req_msg->pkey = cpu_to_be16(cm_id_priv->pkey);
>>>>> 	cm_req_set_path_mtu(req_msg, param->primary_path->mtu);
>>>>> 	cm_req_set_max_cm_retries(req_msg, param->max_cm_retries);
>>>>>
>>>>> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
>>>>> 	cm_id_priv->responder_resources = param->responder_resources;
>>>>> 	cm_id_priv->retry_count = param->retry_count;
>>>>> 	cm_id_priv->path_mtu = param->primary_path->mtu;
>>>>> -	cm_id_priv->pkey = param->primary_path->pkey;
>>>>> +
>>>>> +	/*
>>>>> +	 * We want to send the pkey used in the BTH in packets
>>>>> +	 * sent. This, in order for the passive side to determine if
>>>>> +	 * communication is permitted by the respective pkeys.
>>>>> +	 *
>>>>> +	 * The pkey in the paths are derived from the MGID, which has
>>>>> +	 * the full membership bit set. Hence, we retrieve the pkey by
>>>>> +	 * using the address vector's pkey_index.
>>>>
>>>> The paths usually come from the SM and I don't expect SM to provide path
>>>> between ports of only limited members of partition.
>>>
>>> In my case, it does. 
>>
>> Which OpenSM flavor are you talking about (upstream, MLNX, or other) ?
> 
> I guess other is the right term, as its Oracle’s flavour, 3.2.6_20170609.

I'll check upstream OpenSM when I get a chance to see if same problem
exists. Oracle OpenSM forked from upstream long time ago and not sure
how similar the path record code is. There were various impacts for the
shared port virtualization (for CX-3) in terms of adding the ability to
allow both pkeys.

>> What is OpenSM version string ? Is allow_both_pkeys TRUE or FALSE ?
> 
> The ports do have both limited and full memberships, and the partitions.conf file contains both “limited”, “both”, and the “full” keywords.
> 
>>>> Default ACM provider
>>>> forms path from multicast group parameters including pkey. Is that the
>>>> scenario of concern ?
>>>
>>> Also RDMA CM does that. Do an ibdump of a CM REQ message sent from a limited port, and you will see the PKey is the full member in the CM REQ msg.
>>
>> Not exactly sure what you mean by RDMA CM does this as there are a
>> number of different ways to use RDMA CM.
> 
> I am talking specifically about the CM REQ message. For example, if you run an IB-UV test program and specifies that RDMA CM should be used for connection management. Or it could be kernel ULPs. My point is that the pkey field in the CM REQ message (i.e. not the one in the BTH) is always the full one, even though the port uses the limited one.

When you write "port uses limited one", you mean BTH:pkey is limited
member but REQ:pkey is full member, right ?

> 
>> Are you referring to using
>> IPoIB for address resolution ? Another way is for a path record to be
>> passed in from user space and there is no control over it’s contents.
> 
> I did not bring up the path record or address resolution issue ;-)

You mentioned that "The pkey in the paths are derived from the MGID" and
brought this issue up in my thinking about this issue.

> But I guess you are alluding to the limited-to-limited case should 1) not have been able to perform address resolution and/or 2) the path record should not have been returned. I agree to both 1) and 2).
> 
> But if both 1) and 2) succeeds, this patch series is a last resort to stop the connection to be established and avoid pkey trap violation traps being sent.
> 
>> At a minimum, I think the comment mentioning MGID should be clarified.
> 
> Is the question if an MC group attached by a limited port will include other limited ports or only full member ports? 
> 
> With the OpenSM I am running, the MC group contains limited members.

That is problematic as any sending to MC group by limited members will
cause pkey violation traps to SM.

>>>> If so, I still don't fully understand the scenario
>>>> because limited members are not supposed to be part of a multicast
>>>> group. There was some work started to extend this for client/server
>>>> model but it was never completed. However, there may be hole(s) in
>>>> various components of implementation which open(s) this door.
>>>
>>> I view OpenSM not returning a valid path between two limited members an orthogonal issue, as OpenSM is another component.
>>
>> Agreed. I think that there are other components too...
>>
>>> I think the CM REQ message should contain the correct PKey (fixed by this patch series).
>>>
>>> And in the event the passive side being a limited member and receives a CM REQ with a limited PKey, that connection should not be formed (fixed by this patch series).
>> This is the backward compatibility for the current behavior on the
>> active side so that REQ gets rejected rather than accepted, right ?
> 
> Not sure about the backwards compatibility part, but I certainly agree to the rest. Put it another way, when the MAD payload’s pkey is limited in the CM REQ, and there is only a limited matching key in the pkey cache, a REJ is sent back.

Understood.

>> In general, the approach used when there is pkey mismatch is to silently
>> drop packet rather than respond which causes a timeout on the requester
>> side. In this specific case, I’m not sure which approach is better.
> 
> It is not silently dropped when the port supports pkey trap messages. Then a trap message is sent to the OpenSM.

I was only talking about the requester-responder part. Yes, pkey
security trap would go to SM.

As I wrote, I'm not sure which is preferable: to explictly reject as
this patch proposes or just silently drop and penalize requester.

-- Hal

>> Also, is there a similar issue with SIDR_REQ ?
> 
> Good point, most probaly!
> 
>> Read me correct, I am also in favour of fixing the OpenSM to not return a valid (but useless) path record in this case.
>>
>> Understood and agreed.
> 
> 
> Thxs, Håkon
> 
>>
>> -- Hal
>>
>>>
>>> Thxs, Håkon
>>>
>>>
>>>>
>>>> -- Hal
>>>>
>>>>> +	 */
>>>>> +	ret = ib_get_cached_pkey(cm_id_priv->id.device,
>>>>> +				 cm_id_priv->av.port->port_num,
>>>>> +				 cm_id_priv->av.pkey_index,
>>>>> +				 &cm_id_priv->pkey);
>>>>> +	if (ret)
>>>>> +		goto error1;
>>>>> +
>>>>> 	cm_id_priv->qp_type = param->qp_type;
>>>>>
>>>>> 	ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg);
>>>>> @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work)
>>>>> 				 cm_id_priv);
>>>>> 	if (ret) {
>>>>> 		int err;
>>>>> +		int rej_reason = (ret == -ENOENT ?
>>>>> +				  IB_CM_REJ_INVALID_PKEY :
>>>>> +				  IB_CM_REJ_INVALID_GID);
>>>>>
>>>>> 		err = ib_get_cached_gid(work->port->cm_dev->ib_device,
>>>>> 					work->port->port_num, 0,
>>>>> 					&work->path[0].sgid,
>>>>> 					NULL);
>>>>> 		if (err)
>>>>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
>>>>> +			ib_send_cm_rej(cm_id, rej_reason,
>>>>> 				       NULL, 0, NULL, 0);
>>>>> 		else
>>>>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
>>>>> +			ib_send_cm_rej(cm_id, rej_reason,
>>>>> 				       &work->path[0].sgid,
>>>>> 				       sizeof(work->path[0].sgid),
>>>>> 				       NULL, 0);
>>>>> @@ -1975,7 +1996,11 @@ static int cm_req_handler(struct cm_work *work)
>>>>> 		ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av,
>>>>> 					 cm_id_priv);
>>>>> 		if (ret) {
>>>>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID,
>>>>> +			int rej_reason = (ret == -ENOENT ?
>>>>> +					  IB_CM_REJ_INVALID_PKEY :
>>>>> +					  IB_CM_REJ_INVALID_ALT_GID);
>>>>> +
>>>>> +			ib_send_cm_rej(cm_id, rej_reason,
>>>>> 				       &work->path[0].sgid,
>>>>> 				       sizeof(work->path[0].sgid), NULL, 0);
>>>>> 			goto rejected;
>>>>> diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
>>>>> index 7979cb04f529..56b62303946a 100644
>>>>> --- a/include/rdma/ib_cm.h
>>>>> +++ b/include/rdma/ib_cm.h
>>>>> @@ -3,6 +3,7 @@
>>>>> * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>>>> * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
>>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>>>>> *
>>>>> * This software is available to you under a choice of one of two
>>>>> * licenses.  You may choose to be licensed under the terms of the GNU
>>>>> @@ -183,7 +184,8 @@ enum ib_cm_rej_reason {
>>>>> 	IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID	= 30,
>>>>> 	IB_CM_REJ_INVALID_CLASS_VERSION		= 31,
>>>>> 	IB_CM_REJ_INVALID_FLOW_LABEL		= 32,
>>>>> -	IB_CM_REJ_INVALID_ALT_FLOW_LABEL	= 33
>>>>> +	IB_CM_REJ_INVALID_ALT_FLOW_LABEL	= 33,
>>>>> +	IB_CM_REJ_INVALID_PKEY			= 34,
>>>>> };
>>>>>
>>>>> struct ib_cm_rej_event_param {
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-10 16:54           ` Hal Rosenstock
@ 2018-05-11 10:55             ` Håkon Bugge
  2018-05-11 12:51               ` Hal Rosenstock
  2018-05-14 21:04               ` Jason Gunthorpe
  0 siblings, 2 replies; 24+ messages in thread
From: Håkon Bugge @ 2018-05-11 10:55 UTC (permalink / raw)
  To: Hal Rosenstock
  Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel



> On 10 May 2018, at 18:54, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
> 
> On 5/10/2018 11:16 AM, Håkon Bugge wrote:
>> 
>> 
>>> On 10 May 2018, at 16:01, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
>>> 
>>> On 5/10/2018 5:16 AM, Håkon Bugge wrote:
>>>> 
>>>> 
>>>>> On 9 May 2018, at 13:28, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
>>>>> 
>>>>> On 5/9/2018 5:30 AM, Håkon Bugge wrote:
>>>>>> There is no point in using RDMA CM to establish a connection between
>>>>>> two QPs that cannot possible communicate. Particularly, if both the
>>>>>> active and passive side use limited pkeys, they are not able to
>>>>>> communicate.
>>>>>> 
>>>>>> In order to detect this situation, the authentic pkey is used in the
>>>>>> CM REQ message. The authentic pkey is the one that the HCA inserts
>>>>>> into the BTH in the IB packets.
>>>>>> 
>>>>>> When the passive side receives the REQ, commit ("ib_core: A full pkey
>>>>>> is required to match a limited one") ensures that
>>>>>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys
>>>>>> compared has the full-member bit.
>>>>>> 
>>>>>> In the limited-to-limited case, this will prohibit the connection to
>>>>>> be formed, and thus, Pkey Violation Traps will not be sent to the SM.
>>>>>> 
>>>>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>>>> ---
>>>>>> drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++-------
>>>>>> include/rdma/ib_cm.h         |  4 +++-
>>>>>> 2 files changed, 35 insertions(+), 8 deletions(-)
>>>>>> 
>>>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>>>>> index a92e1a5c202b..52ed51d5bd2a 100644
>>>>>> --- a/drivers/infiniband/core/cm.c
>>>>>> +++ b/drivers/infiniband/core/cm.c
>>>>>> @@ -3,6 +3,7 @@
>>>>>> * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>>>>> * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
>>>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>>>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>>>>>> *
>>>>>> * This software is available to you under a choice of one of two
>>>>>> * licenses.  You may choose to be licensed under the terms of the GNU
>>>>>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = {
>>>>>> 	[IB_CM_REJ_INVALID_CLASS_VERSION]	= "invalid class version",
>>>>>> 	[IB_CM_REJ_INVALID_FLOW_LABEL]		= "invalid flow label",
>>>>>> 	[IB_CM_REJ_INVALID_ALT_FLOW_LABEL]	= "invalid alt flow label",
>>>>>> +	[IB_CM_REJ_INVALID_PKEY]		= "invalid PKey",
>>>>> 
>>>>> If this patch goes ahead, IBA spec for CM should be updated to include this.
>>>> 
>>>> Sure, I see:
>>>> 
>>>> 33 Invalid Alternate Flow Label
>>>> 
>>>> as the latest in the spec.
>>> 
>>> This appears to be an implementation rather than architectural issue. If
>>> there is limited <-> limited CM, then the MAD would never get up to the
>>> CM layer as it would be filtered by end port pkey enforcement. This is
>>> only needed to protect against current code which can send on the full
>>> rather than limited pkey (in BTH), right ?
>> 
>> We are talking about two things here. The PKey in the BTH and the PKey in the CM REQ payload. They differ.
> 
> Yes, I understand the difference between the 2 potentially different PKeys.
> 
>> I am out of office, but if my memory serves me correct, the PKey in the BTH in the MAD packet will be the default PKey. 
> 
> MADs are sent on reversible paths (13.5.4.1) where the response can be
> constructed from only header info in the request. BTH:pkey doesn't
> __have to be__ the default pkey. It just needs to be a common/shared
> pkey between the source/destination pair.
> 
> For example, there are some configurations where the port is only a
> limited member of the default partition (and full member of other non
> default partition(s)).
> 
> There are also configurations where the port is both full and limited
> member on one or more partition(s). This is the allow_both_pkeys TRUE
> case for OpenSM which also uses the 'both' syntax in the partitions
> config file. This appears to be the case to which you are referring.
> 
> CM MAD could be sent on default partition but that may not work if both
> source and dest are only limited members of the default partition.
> 
> CM MAD could be sent on any common/shared pkey between the
> source/destination ports.
> 
>> Further, we have per IBTA:
>> 
>> C10-132: Packets sent to the General Service Interface QP (QP1) shall be accepted if the P_Key in the packet matches any valid P_Key in the P_Key Table of the port on which the packet arrived. Matching is defined in 10.9.3 Partition Key Matching.
>> 
>> So, by architecture, the MAD arrives
> In the case of port being both full and limited member of the same
> partition on which CM MAD comes in on (as defined by pkey in BTH).
> 
> I was talking about limited <-> limited CM and didn't realize you were
> talking about the ‘both' case.

OK. So we agree that for the most typical cases, where the BTH.PKey is the default one, the MAD will be received and not trapped by the PKey check.

>>> 
>>> 
>>>>> 
>>>>>> };
>>>>>> 
>>>>>> const char *__attribute_const__ ibcm_reject_msg(int reason)
>>>>>> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av,
>>>>>> 		return -EINVAL;
>>>>>> 	cm_dev = port->cm_dev;
>>>>>> 
>>>>>> -	ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num,
>>>>>> -				  be16_to_cpu(path->pkey), &av->pkey_index);
>>>>>> +	ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num,
>>>>>> +					  be16_to_cpu(path->pkey), &av->pkey_index);
>>>>>> 	if (ret)
>>>>>> 		return ret;
>>>>>> 
>>>>>> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
>>>>>> 	cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn));
>>>>>> 	cm_req_set_local_resp_timeout(req_msg,
>>>>>> 				      param->local_cm_response_timeout);
>>>>>> -	req_msg->pkey = param->primary_path->pkey;
>>>>>> +	req_msg->pkey = cpu_to_be16(cm_id_priv->pkey);
>>>>>> 	cm_req_set_path_mtu(req_msg, param->primary_path->mtu);
>>>>>> 	cm_req_set_max_cm_retries(req_msg, param->max_cm_retries);
>>>>>> 
>>>>>> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
>>>>>> 	cm_id_priv->responder_resources = param->responder_resources;
>>>>>> 	cm_id_priv->retry_count = param->retry_count;
>>>>>> 	cm_id_priv->path_mtu = param->primary_path->mtu;
>>>>>> -	cm_id_priv->pkey = param->primary_path->pkey;
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * We want to send the pkey used in the BTH in packets
>>>>>> +	 * sent. This, in order for the passive side to determine if
>>>>>> +	 * communication is permitted by the respective pkeys.
>>>>>> +	 *
>>>>>> +	 * The pkey in the paths are derived from the MGID, which has
>>>>>> +	 * the full membership bit set. Hence, we retrieve the pkey by
>>>>>> +	 * using the address vector's pkey_index.
>>>>> 
>>>>> The paths usually come from the SM and I don't expect SM to provide path
>>>>> between ports of only limited members of partition.
>>>> 
>>>> In my case, it does. 
>>> 
>>> Which OpenSM flavor are you talking about (upstream, MLNX, or other) ?
>> 
>> I guess other is the right term, as its Oracle’s flavour, 3.2.6_20170609.
> 
> I'll check upstream OpenSM when I get a chance to see if same problem
> exists. Oracle OpenSM forked from upstream long time ago and not sure
> how similar the path record code is. There were various impacts for the
> shared port virtualization (for CX-3) in terms of adding the ability to
> allow both pkeys.

I just had a chat with Line H. here in Oracle. She confirms that Oracle’s SM and the upstream one act similar in this matter.

Further, using shared port virtualization, OpenSM will adhere to the physical ports only, and doesn’t even know the PKey assignments to the VFs assigned the VMs.

>>> What is OpenSM version string ? Is allow_both_pkeys TRUE or FALSE ?
>> 
>> The ports do have both limited and full memberships, and the partitions.conf file contains both “limited”, “both”, and the “full” keywords.
>> 
>>>>> Default ACM provider
>>>>> forms path from multicast group parameters including pkey. Is that the
>>>>> scenario of concern ?
>>>> 
>>>> Also RDMA CM does that. Do an ibdump of a CM REQ message sent from a limited port, and you will see the PKey is the full member in the CM REQ msg.
>>> 
>>> Not exactly sure what you mean by RDMA CM does this as there are a
>>> number of different ways to use RDMA CM.
>> 
>> I am talking specifically about the CM REQ message. For example, if you run an IB-UV test program and specifies that RDMA CM should be used for connection management. Or it could be kernel ULPs. My point is that the pkey field in the CM REQ message (i.e. not the one in the BTH) is always the full one, even though the port uses the limited one.
> 
> When you write "port uses limited one", you mean BTH:pkey is limited
> member but REQ:pkey is full member, right ?

Yes, but the BTH:pkey is, as stated above most often the default pkey (limited or full), whilst REQ:pkey is a designated, limited pkey.

>>> Are you referring to using
>>> IPoIB for address resolution ? Another way is for a path record to be
>>> passed in from user space and there is no control over it’s contents.
>> 
>> I did not bring up the path record or address resolution issue ;-)
> 
> You mentioned that "The pkey in the paths are derived from the MGID" and
> brought this issue up in my thinking about this issue.

Yes, just to inform why it is full - when it shouldn’t be.

>> But I guess you are alluding to the limited-to-limited case should 1) not have been able to perform address resolution and/or 2) the path record should not have been returned. I agree to both 1) and 2).
>> 
>> But if both 1) and 2) succeeds, this patch series is a last resort to stop the connection to be established and avoid pkey trap violation traps being sent.
>> 
>>> At a minimum, I think the comment mentioning MGID should be clarified.
>> 
>> Is the question if an MC group attached by a limited port will include other limited ports or only full member ports? 
>> 
>> With the OpenSM I am running, the MC group contains limited members.
> 
> That is problematic as any sending to MC group by limited members will
> cause pkey violation traps to SM.

Sure, agree, but a separate issue from this series which handles the unicast case.

>>>>> If so, I still don't fully understand the scenario
>>>>> because limited members are not supposed to be part of a multicast
>>>>> group. There was some work started to extend this for client/server
>>>>> model but it was never completed. However, there may be hole(s) in
>>>>> various components of implementation which open(s) this door.
>>>> 
>>>> I view OpenSM not returning a valid path between two limited members an orthogonal issue, as OpenSM is another component.
>>> 
>>> Agreed. I think that there are other components too...
>>> 
>>>> I think the CM REQ message should contain the correct PKey (fixed by this patch series).
>>>> 
>>>> And in the event the passive side being a limited member and receives a CM REQ with a limited PKey, that connection should not be formed (fixed by this patch series).
>>> This is the backward compatibility for the current behavior on the
>>> active side so that REQ gets rejected rather than accepted, right ?
>> 
>> Not sure about the backwards compatibility part, but I certainly agree to the rest. Put it another way, when the MAD payload’s pkey is limited in the CM REQ, and there is only a limited matching key in the pkey cache, a REJ is sent back.
> 
> Understood.
> 
>>> In general, the approach used when there is pkey mismatch is to silently
>>> drop packet rather than respond which causes a timeout on the requester
>>> side. In this specific case, I’m not sure which approach is better.
>> 
>> It is not silently dropped when the port supports pkey trap messages. Then a trap message is sent to the OpenSM.
> 
> I was only talking about the requester-responder part. Yes, pkey
> security trap would go to SM.
> 
> As I wrote, I'm not sure which is preferable: to explictly reject as
> this patch proposes or just silently drop and penalize requester.

Since the infrastructure is in place, i.e., the REJ reasons, my view is that an explicit REJ with a distinct reason is the most obvious.


Thxs, Håkon

> 
> -- Hal
> 
>>> Also, is there a similar issue with SIDR_REQ ?
>> 
>> Good point, most probaly!
>> 
>>> Read me correct, I am also in favour of fixing the OpenSM to not return a valid (but useless) path record in this case.
>>> 
>>> Understood and agreed.
>> 
>> 
>> Thxs, Håkon
>> 
>>> 
>>> -- Hal
>>> 
>>>> 
>>>> Thxs, Håkon
>>>> 
>>>> 
>>>>> 
>>>>> -- Hal
>>>>> 
>>>>>> +	 */
>>>>>> +	ret = ib_get_cached_pkey(cm_id_priv->id.device,
>>>>>> +				 cm_id_priv->av.port->port_num,
>>>>>> +				 cm_id_priv->av.pkey_index,
>>>>>> +				 &cm_id_priv->pkey);
>>>>>> +	if (ret)
>>>>>> +		goto error1;
>>>>>> +
>>>>>> 	cm_id_priv->qp_type = param->qp_type;
>>>>>> 
>>>>>> 	ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg);
>>>>>> @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work)
>>>>>> 				 cm_id_priv);
>>>>>> 	if (ret) {
>>>>>> 		int err;
>>>>>> +		int rej_reason = (ret == -ENOENT ?
>>>>>> +				  IB_CM_REJ_INVALID_PKEY :
>>>>>> +				  IB_CM_REJ_INVALID_GID);
>>>>>> 
>>>>>> 		err = ib_get_cached_gid(work->port->cm_dev->ib_device,
>>>>>> 					work->port->port_num, 0,
>>>>>> 					&work->path[0].sgid,
>>>>>> 					NULL);
>>>>>> 		if (err)
>>>>>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
>>>>>> +			ib_send_cm_rej(cm_id, rej_reason,
>>>>>> 				       NULL, 0, NULL, 0);
>>>>>> 		else
>>>>>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
>>>>>> +			ib_send_cm_rej(cm_id, rej_reason,
>>>>>> 				       &work->path[0].sgid,
>>>>>> 				       sizeof(work->path[0].sgid),
>>>>>> 				       NULL, 0);
>>>>>> @@ -1975,7 +1996,11 @@ static int cm_req_handler(struct cm_work *work)
>>>>>> 		ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av,
>>>>>> 					 cm_id_priv);
>>>>>> 		if (ret) {
>>>>>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID,
>>>>>> +			int rej_reason = (ret == -ENOENT ?
>>>>>> +					  IB_CM_REJ_INVALID_PKEY :
>>>>>> +					  IB_CM_REJ_INVALID_ALT_GID);
>>>>>> +
>>>>>> +			ib_send_cm_rej(cm_id, rej_reason,
>>>>>> 				       &work->path[0].sgid,
>>>>>> 				       sizeof(work->path[0].sgid), NULL, 0);
>>>>>> 			goto rejected;
>>>>>> diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
>>>>>> index 7979cb04f529..56b62303946a 100644
>>>>>> --- a/include/rdma/ib_cm.h
>>>>>> +++ b/include/rdma/ib_cm.h
>>>>>> @@ -3,6 +3,7 @@
>>>>>> * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>>>>> * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
>>>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>>>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>>>>>> *
>>>>>> * This software is available to you under a choice of one of two
>>>>>> * licenses.  You may choose to be licensed under the terms of the GNU
>>>>>> @@ -183,7 +184,8 @@ enum ib_cm_rej_reason {
>>>>>> 	IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID	= 30,
>>>>>> 	IB_CM_REJ_INVALID_CLASS_VERSION		= 31,
>>>>>> 	IB_CM_REJ_INVALID_FLOW_LABEL		= 32,
>>>>>> -	IB_CM_REJ_INVALID_ALT_FLOW_LABEL	= 33
>>>>>> +	IB_CM_REJ_INVALID_ALT_FLOW_LABEL	= 33,
>>>>>> +	IB_CM_REJ_INVALID_PKEY			= 34,
>>>>>> };
>>>>>> 
>>>>>> struct ib_cm_rej_event_param {
>>>>>> 
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>> 
>>>> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-11 10:55             ` Håkon Bugge
@ 2018-05-11 12:51               ` Hal Rosenstock
  2018-05-14 21:04               ` Jason Gunthorpe
  1 sibling, 0 replies; 24+ messages in thread
From: Hal Rosenstock @ 2018-05-11 12:51 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel

On 5/11/2018 6:55 AM, Håkon Bugge wrote:
> 
> 
>> On 10 May 2018, at 18:54, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
>>
>> On 5/10/2018 11:16 AM, Håkon Bugge wrote:
>>>
>>>
>>>> On 10 May 2018, at 16:01, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
>>>>
>>>> On 5/10/2018 5:16 AM, Håkon Bugge wrote:
>>>>>
>>>>>
>>>>>> On 9 May 2018, at 13:28, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
>>>>>>
>>>>>> On 5/9/2018 5:30 AM, Håkon Bugge wrote:
>>>>>>> There is no point in using RDMA CM to establish a connection between
>>>>>>> two QPs that cannot possible communicate. Particularly, if both the
>>>>>>> active and passive side use limited pkeys, they are not able to
>>>>>>> communicate.
>>>>>>>
>>>>>>> In order to detect this situation, the authentic pkey is used in the
>>>>>>> CM REQ message. The authentic pkey is the one that the HCA inserts
>>>>>>> into the BTH in the IB packets.
>>>>>>>
>>>>>>> When the passive side receives the REQ, commit ("ib_core: A full pkey
>>>>>>> is required to match a limited one") ensures that
>>>>>>> ib_find_matched_cached_pkey() fails unless at least one of the pkeys
>>>>>>> compared has the full-member bit.
>>>>>>>
>>>>>>> In the limited-to-limited case, this will prohibit the connection to
>>>>>>> be formed, and thus, Pkey Violation Traps will not be sent to the SM.
>>>>>>>
>>>>>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>>>>> ---
>>>>>>> drivers/infiniband/core/cm.c | 39 ++++++++++++++++++++++++++++++++-------
>>>>>>> include/rdma/ib_cm.h         |  4 +++-
>>>>>>> 2 files changed, 35 insertions(+), 8 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>>>>>> index a92e1a5c202b..52ed51d5bd2a 100644
>>>>>>> --- a/drivers/infiniband/core/cm.c
>>>>>>> +++ b/drivers/infiniband/core/cm.c
>>>>>>> @@ -3,6 +3,7 @@
>>>>>>> * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>>>>>> * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
>>>>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>>>>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>>>>>>> *
>>>>>>> * This software is available to you under a choice of one of two
>>>>>>> * licenses.  You may choose to be licensed under the terms of the GNU
>>>>>>> @@ -91,6 +92,7 @@ static const char * const ibcm_rej_reason_strs[] = {
>>>>>>> 	[IB_CM_REJ_INVALID_CLASS_VERSION]	= "invalid class version",
>>>>>>> 	[IB_CM_REJ_INVALID_FLOW_LABEL]		= "invalid flow label",
>>>>>>> 	[IB_CM_REJ_INVALID_ALT_FLOW_LABEL]	= "invalid alt flow label",
>>>>>>> +	[IB_CM_REJ_INVALID_PKEY]		= "invalid PKey",
>>>>>>
>>>>>> If this patch goes ahead, IBA spec for CM should be updated to include this.
>>>>>
>>>>> Sure, I see:
>>>>>
>>>>> 33 Invalid Alternate Flow Label
>>>>>
>>>>> as the latest in the spec.
>>>>
>>>> This appears to be an implementation rather than architectural issue. If
>>>> there is limited <-> limited CM, then the MAD would never get up to the
>>>> CM layer as it would be filtered by end port pkey enforcement. This is
>>>> only needed to protect against current code which can send on the full
>>>> rather than limited pkey (in BTH), right ?
>>>
>>> We are talking about two things here. The PKey in the BTH and the PKey in the CM REQ payload. They differ.
>>
>> Yes, I understand the difference between the 2 potentially different PKeys.
>>
>>> I am out of office, but if my memory serves me correct, the PKey in the BTH in the MAD packet will be the default PKey. 
>>
>> MADs are sent on reversible paths (13.5.4.1) where the response can be
>> constructed from only header info in the request. BTH:pkey doesn't
>> __have to be__ the default pkey. It just needs to be a common/shared
>> pkey between the source/destination pair.
>>
>> For example, there are some configurations where the port is only a
>> limited member of the default partition (and full member of other non
>> default partition(s)).
>>
>> There are also configurations where the port is both full and limited
>> member on one or more partition(s). This is the allow_both_pkeys TRUE
>> case for OpenSM which also uses the 'both' syntax in the partitions
>> config file. This appears to be the case to which you are referring.
>>
>> CM MAD could be sent on default partition but that may not work if both
>> source and dest are only limited members of the default partition.
>>
>> CM MAD could be sent on any common/shared pkey between the
>> source/destination ports.
>>
>>> Further, we have per IBTA:
>>>
>>> C10-132: Packets sent to the General Service Interface QP (QP1) shall be accepted if the P_Key in the packet matches any valid P_Key in the P_Key Table of the port on which the packet arrived. Matching is defined in 10.9.3 Partition Key Matching.
>>>
>>> So, by architecture, the MAD arrives
>> In the case of port being both full and limited member of the same
>> partition on which CM MAD comes in on (as defined by pkey in BTH).
>>
>> I was talking about limited <-> limited CM and didn't realize you were
>> talking about the ‘both' case.
> 
> OK. So we agree that for the most typical cases, where the BTH.PKey is the default one, the MAD will be received and not trapped by the PKey check.
> 
>>>>
>>>>
>>>>>>
>>>>>>> };
>>>>>>>
>>>>>>> const char *__attribute_const__ ibcm_reject_msg(int reason)
>>>>>>> @@ -518,8 +520,8 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av,
>>>>>>> 		return -EINVAL;
>>>>>>> 	cm_dev = port->cm_dev;
>>>>>>>
>>>>>>> -	ret = ib_find_cached_pkey(cm_dev->ib_device, port->port_num,
>>>>>>> -				  be16_to_cpu(path->pkey), &av->pkey_index);
>>>>>>> +	ret = ib_find_matched_cached_pkey(cm_dev->ib_device, port->port_num,
>>>>>>> +					  be16_to_cpu(path->pkey), &av->pkey_index);

This routine is shared by LAP send/rcvd and SIDR send. Is this change
the correct thing to do there as well ? Were LAP and SIDR tested ?

>>>>>>> 	if (ret)
>>>>>>> 		return ret;
>>>>>>>
>>>>>>> @@ -1241,7 +1243,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
>>>>>>> 	cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn));
>>>>>>> 	cm_req_set_local_resp_timeout(req_msg,
>>>>>>> 				      param->local_cm_response_timeout);
>>>>>>> -	req_msg->pkey = param->primary_path->pkey;
>>>>>>> +	req_msg->pkey = cpu_to_be16(cm_id_priv->pkey);
>>>>>>> 	cm_req_set_path_mtu(req_msg, param->primary_path->mtu);
>>>>>>> 	cm_req_set_max_cm_retries(req_msg, param->max_cm_retries);
>>>>>>>
>>>>>>> @@ -1396,7 +1398,23 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
>>>>>>> 	cm_id_priv->responder_resources = param->responder_resources;
>>>>>>> 	cm_id_priv->retry_count = param->retry_count;
>>>>>>> 	cm_id_priv->path_mtu = param->primary_path->mtu;
>>>>>>> -	cm_id_priv->pkey = param->primary_path->pkey;
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * We want to send the pkey used in the BTH in packets
>>>>>>> +	 * sent. This, in order for the passive side to determine if
>>>>>>> +	 * communication is permitted by the respective pkeys.
>>>>>>> +	 *
>>>>>>> +	 * The pkey in the paths are derived from the MGID, which has
>>>>>>> +	 * the full membership bit set. Hence, we retrieve the pkey by
>>>>>>> +	 * using the address vector's pkey_index.
>>>>>>
>>>>>> The paths usually come from the SM and I don't expect SM to provide path
>>>>>> between ports of only limited members of partition.
>>>>>
>>>>> In my case, it does. 
>>>>
>>>> Which OpenSM flavor are you talking about (upstream, MLNX, or other) ?
>>>
>>> I guess other is the right term, as its Oracle’s flavour, 3.2.6_20170609.
>>
>> I'll check upstream OpenSM when I get a chance to see if same problem
>> exists. Oracle OpenSM forked from upstream long time ago and not sure
>> how similar the path record code is. There were various impacts for the
>> shared port virtualization (for CX-3) in terms of adding the ability to
>> allow both pkeys.
> 
> I just had a chat with Line H. here in Oracle. She confirms that Oracle’s SM and the upstream one act similar in this matter.

To be sure, I will double check this with upstream when I get a few
cycles to do this.

> Further, using shared port virtualization, OpenSM will adhere to the physical ports only, and doesn’t even know the PKey assignments to the VFs assigned the VMs.

Right; with the shared port virtualization, the SM only knows about the
pport (physical port) partitioning whereas with the newer virtualization
annex there is also the vport (VF) partitioning in addition to the pport
partitioning. This has an impact on the paths that can be provided.

> 
>>>> What is OpenSM version string ? Is allow_both_pkeys TRUE or FALSE ?
>>>
>>> The ports do have both limited and full memberships, and the partitions.conf file contains both “limited”, “both”, and the “full” keywords.
>>>
>>>>>> Default ACM provider
>>>>>> forms path from multicast group parameters including pkey. Is that the
>>>>>> scenario of concern ?
>>>>>
>>>>> Also RDMA CM does that. Do an ibdump of a CM REQ message sent from a limited port, and you will see the PKey is the full member in the CM REQ msg.
>>>>
>>>> Not exactly sure what you mean by RDMA CM does this as there are a
>>>> number of different ways to use RDMA CM.
>>>
>>> I am talking specifically about the CM REQ message. For example, if you run an IB-UV test program and specifies that RDMA CM should be used for connection management. Or it could be kernel ULPs. My point is that the pkey field in the CM REQ message (i.e. not the one in the BTH) is always the full one, even though the port uses the limited one.
>>
>> When you write "port uses limited one", you mean BTH:pkey is limited
>> member but REQ:pkey is full member, right ?
> 
> Yes, but the BTH:pkey is, as stated above most often the default pkey (limited or full), whilst REQ:pkey is a designated, limited pkey.

FWIW I think the comment "The pkey in the paths are derived from the
MGID" can be improved as this is one use case and is not the most common
one.

>>>> Are you referring to using
>>>> IPoIB for address resolution ? Another way is for a path record to be
>>>> passed in from user space and there is no control over it’s contents.
>>>
>>> I did not bring up the path record or address resolution issue ;-)
>>
>> You mentioned that "The pkey in the paths are derived from the MGID" and
>> brought this issue up in my thinking about this issue.
> 
> Yes, just to inform why it is full - when it shouldn’t be.
> 
>>> But I guess you are alluding to the limited-to-limited case should 1) not have been able to perform address resolution and/or 2) the path record should not have been returned. I agree to both 1) and 2).
>>>
>>> But if both 1) and 2) succeeds, this patch series is a last resort to stop the connection to be established and avoid pkey trap violation traps being sent.
>>>
>>>> At a minimum, I think the comment mentioning MGID should be clarified.
>>>
>>> Is the question if an MC group attached by a limited port will include other limited ports or only full member ports? 
>>>
>>> With the OpenSM I am running, the MC group contains limited members.
>>
>> That is problematic as any sending to MC group by limited members will
>> cause pkey violation traps to SM.
> 
> Sure, agree, but a separate issue from this series which handles the unicast case.
> 
>>>>>> If so, I still don't fully understand the scenario
>>>>>> because limited members are not supposed to be part of a multicast
>>>>>> group. There was some work started to extend this for client/server
>>>>>> model but it was never completed. However, there may be hole(s) in
>>>>>> various components of implementation which open(s) this door.
>>>>>
>>>>> I view OpenSM not returning a valid path between two limited members an orthogonal issue, as OpenSM is another component.
>>>>
>>>> Agreed. I think that there are other components too...
>>>>
>>>>> I think the CM REQ message should contain the correct PKey (fixed by this patch series).
>>>>>
>>>>> And in the event the passive side being a limited member and receives a CM REQ with a limited PKey, that connection should not be formed (fixed by this patch series).
>>>> This is the backward compatibility for the current behavior on the
>>>> active side so that REQ gets rejected rather than accepted, right ?
>>>
>>> Not sure about the backwards compatibility part, but I certainly agree to the rest. Put it another way, when the MAD payload’s pkey is limited in the CM REQ, and there is only a limited matching key in the pkey cache, a REJ is sent back.
>>
>> Understood.
>>
>>>> In general, the approach used when there is pkey mismatch is to silently
>>>> drop packet rather than respond which causes a timeout on the requester
>>>> side. In this specific case, I’m not sure which approach is better.
>>>
>>> It is not silently dropped when the port supports pkey trap messages. Then a trap message is sent to the OpenSM.
>>
>> I was only talking about the requester-responder part. Yes, pkey
>> security trap would go to SM.
>>
>> As I wrote, I'm not sure which is preferable: to explictly reject as
>> this patch proposes or just silently drop and penalize requester.
> 
> Since the infrastructure is in place, i.e., the REJ reasons, my view is that an explicit REJ with a distinct reason is the most obvious.

The downsides are that it reveals info that perhaps should not be
revealed and allows the requestor to rerequest sooner. In some
scenarios, that could cause DoS attack on the responder (server).

-- Hal

> 
> Thxs, Håkon
> 
>>
>> -- Hal
>>
>>>> Also, is there a similar issue with SIDR_REQ ?
>>>
>>> Good point, most probaly!
>>>
>>>> Read me correct, I am also in favour of fixing the OpenSM to not return a valid (but useless) path record in this case.
>>>>
>>>> Understood and agreed.
>>>
>>>
>>> Thxs, Håkon
>>>
>>>>
>>>> -- Hal
>>>>
>>>>>
>>>>> Thxs, Håkon
>>>>>
>>>>>
>>>>>>
>>>>>> -- Hal
>>>>>>
>>>>>>> +	 */
>>>>>>> +	ret = ib_get_cached_pkey(cm_id_priv->id.device,
>>>>>>> +				 cm_id_priv->av.port->port_num,
>>>>>>> +				 cm_id_priv->av.pkey_index,
>>>>>>> +				 &cm_id_priv->pkey);
>>>>>>> +	if (ret)
>>>>>>> +		goto error1;
>>>>>>> +
>>>>>>> 	cm_id_priv->qp_type = param->qp_type;
>>>>>>>
>>>>>>> 	ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg);
>>>>>>> @@ -1956,16 +1974,19 @@ static int cm_req_handler(struct cm_work *work)
>>>>>>> 				 cm_id_priv);
>>>>>>> 	if (ret) {
>>>>>>> 		int err;
>>>>>>> +		int rej_reason = (ret == -ENOENT ?
>>>>>>> +				  IB_CM_REJ_INVALID_PKEY :
>>>>>>> +				  IB_CM_REJ_INVALID_GID);
>>>>>>>
>>>>>>> 		err = ib_get_cached_gid(work->port->cm_dev->ib_device,
>>>>>>> 					work->port->port_num, 0,
>>>>>>> 					&work->path[0].sgid,
>>>>>>> 					NULL);
>>>>>>> 		if (err)
>>>>>>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
>>>>>>> +			ib_send_cm_rej(cm_id, rej_reason,
>>>>>>> 				       NULL, 0, NULL, 0);
>>>>>>> 		else
>>>>>>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
>>>>>>> +			ib_send_cm_rej(cm_id, rej_reason,
>>>>>>> 				       &work->path[0].sgid,
>>>>>>> 				       sizeof(work->path[0].sgid),
>>>>>>> 				       NULL, 0);
>>>>>>> @@ -1975,7 +1996,11 @@ static int cm_req_handler(struct cm_work *work)
>>>>>>> 		ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av,
>>>>>>> 					 cm_id_priv);
>>>>>>> 		if (ret) {
>>>>>>> -			ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID,
>>>>>>> +			int rej_reason = (ret == -ENOENT ?
>>>>>>> +					  IB_CM_REJ_INVALID_PKEY :
>>>>>>> +					  IB_CM_REJ_INVALID_ALT_GID);
>>>>>>> +
>>>>>>> +			ib_send_cm_rej(cm_id, rej_reason,
>>>>>>> 				       &work->path[0].sgid,
>>>>>>> 				       sizeof(work->path[0].sgid), NULL, 0);
>>>>>>> 			goto rejected;
>>>>>>> diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
>>>>>>> index 7979cb04f529..56b62303946a 100644
>>>>>>> --- a/include/rdma/ib_cm.h
>>>>>>> +++ b/include/rdma/ib_cm.h
>>>>>>> @@ -3,6 +3,7 @@
>>>>>>> * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
>>>>>>> * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
>>>>>>> * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
>>>>>>> + * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
>>>>>>> *
>>>>>>> * This software is available to you under a choice of one of two
>>>>>>> * licenses.  You may choose to be licensed under the terms of the GNU
>>>>>>> @@ -183,7 +184,8 @@ enum ib_cm_rej_reason {
>>>>>>> 	IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID	= 30,
>>>>>>> 	IB_CM_REJ_INVALID_CLASS_VERSION		= 31,
>>>>>>> 	IB_CM_REJ_INVALID_FLOW_LABEL		= 32,
>>>>>>> -	IB_CM_REJ_INVALID_ALT_FLOW_LABEL	= 33
>>>>>>> +	IB_CM_REJ_INVALID_ALT_FLOW_LABEL	= 33,
>>>>>>> +	IB_CM_REJ_INVALID_PKEY			= 34,
>>>>>>> };
>>>>>>>
>>>>>>> struct ib_cm_rej_event_param {
>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-10 15:16         ` Håkon Bugge
  2018-05-10 16:54           ` Hal Rosenstock
@ 2018-05-14 21:02           ` Jason Gunthorpe
  2018-05-15  0:38             ` Hal Rosenstock
  1 sibling, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2018-05-14 21:02 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Hal Rosenstock, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel

On Thu, May 10, 2018 at 05:16:28PM +0200, Håkon Bugge wrote:

> We are talking about two things here. The PKey in the BTH and the
> PKey in the CM REQ payload. They differ.
> 
> I am out of office, but if my memory serves me correct, the PKey in
> the BTH in the MAD packet will be the default PKey. Further, we have
> per IBTA:

This sounds like a Linux bug.

Linux does not do a PR to get a reversible path dedicated to the GMP,
so it always uses the data flow path, thus the GMP path paramenters
and those in the REQ should always exactly match.

Where is Linux setting the BTH.PKey and how did it choose to use the
default pkey? Lets fix that at least for sure.

Once that is fixed the rest of the series makes no sense since a REQ
with invalid PKey will never arrive.

However...

This series seems inconsistent with the spec.

IIRC the spec doesn't say if a full or limited pkey should be placed
in the REQ (Hal?). It is designed so that the requestor can get a
single reversible path and put that results into the REQ without
additional processing, however the PR returns only one PKey and again,
itis not really specified if it should be the full or limited pkey
(Hal?).

Basically this means that any pkey in the REQ could randomly be the
full or limited value, and that in-of-itself has not bearing on the
connection.

So it is quite wrong to insist that the pkey be limited or full when
processing the REQ. The end port is expected to match against the
local table.

The real answer to your trap problem is to fix the SM to not create
paths that are non-functional, that is just flat out broken SM
behavior.

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-11 10:55             ` Håkon Bugge
  2018-05-11 12:51               ` Hal Rosenstock
@ 2018-05-14 21:04               ` Jason Gunthorpe
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2018-05-14 21:04 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Hal Rosenstock, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel

On Fri, May 11, 2018 at 12:55:00PM +0200, Håkon Bugge wrote:
> > I'll check upstream OpenSM when I get a chance to see if same problem
> > exists. Oracle OpenSM forked from upstream long time ago and not sure
> > how similar the path record code is. There were various impacts for the
> > shared port virtualization (for CX-3) in terms of adding the ability to
> > allow both pkeys.
> 
> I just had a chat with Line H. here in Oracle. She confirms that Oracle’s SM and the upstream one act similar in this matter.
> 
> Further, using shared port virtualization, OpenSM will adhere to the
> physical ports only, and doesn’t even know the PKey assignments to
> the VFs assigned the VMs.

Yes, but that isn't the point. If the VM receives a PR with a pkey
that is not found in the VM's table then it should not even send a CM
REQ in the first place.

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-14 21:02           ` Jason Gunthorpe
@ 2018-05-15  0:38             ` Hal Rosenstock
  2018-05-15 18:11               ` Håkon Bugge
  0 siblings, 1 reply; 24+ messages in thread
From: Hal Rosenstock @ 2018-05-15  0:38 UTC (permalink / raw)
  To: Jason Gunthorpe, Håkon Bugge
  Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel

On 5/14/2018 5:02 PM, Jason Gunthorpe wrote:
> On Thu, May 10, 2018 at 05:16:28PM +0200, Håkon Bugge wrote:
> 
>> We are talking about two things here. The PKey in the BTH and the
>> PKey in the CM REQ payload. They differ.
>>
>> I am out of office, but if my memory serves me correct, the PKey in
>> the BTH in the MAD packet will be the default PKey. Further, we have
>> per IBTA:
> 
> This sounds like a Linux bug.
> 
> Linux does not do a PR to get a reversible path dedicated to the GMP> so it always uses the data flow path, thus the GMP path paramenters
> and those in the REQ should always exactly match.
>
> Where is Linux setting the BTH.PKey and how did it choose to use the
> default pkey? Lets fix that at least for sure.
> 
> Once that is fixed the rest of the series makes no sense since a REQ
> with invalid PKey will never arrive.
> 
> However...
> 
> This series seems inconsistent with the spec.
> 
> IIRC the spec doesn't say if a full or limited pkey should be placed
> in the REQ (Hal?). 

CM spec for REQ just says partition key without indicating whether this
means P_Key or just the partition (15 bits) so my read is that either
full or limited pkey is allowed in REQ.

> It is designed so that the requestor can get a
> single reversible path and put that results into the REQ without
> additional processing, however the PR returns only one PKey and again,
> it is not really specified if it should be the full or limited pkey
> (Hal?).

Correct; it's not specified.

> Basically this means that any pkey in the REQ could randomly be the
> full or limited value, and that in-of-itself has not bearing on the
> connection.
> 
> So it is quite wrong to insist that the pkey be limited or full when
> processing the REQ. The end port is expected to match against the
> local table.

Note that there is thorny issue with shared (physical) port
virtualization. In shared port virtualization, the VF pkey assignment is
a local matter. Only thing SM knows is the physical port pkeys where
both full and limited membership of same partition is possible. It is
conceivable that CM REQ contains limited partition key between 2 limited
VFs and for that a new REJ reason code appears to be needed.

-- Hal

> The real answer to your trap problem is to fix the SM to not create
> paths that are non-functional, that is just flat out broken SM
> behavior.
>
> Jason
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-15  0:38             ` Hal Rosenstock
@ 2018-05-15 18:11               ` Håkon Bugge
  2018-05-15 19:04                 ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Håkon Bugge @ 2018-05-15 18:11 UTC (permalink / raw)
  To: Hal Rosenstock
  Cc: Jason Gunthorpe, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel



> On 15 May 2018, at 02:38, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
> 
> On 5/14/2018 5:02 PM, Jason Gunthorpe wrote:
>> On Thu, May 10, 2018 at 05:16:28PM +0200, Håkon Bugge wrote:
>> 
>>> We are talking about two things here. The PKey in the BTH and the
>>> PKey in the CM REQ payload. They differ.
>>> 
>>> I am out of office, but if my memory serves me correct, the PKey in
>>> the BTH in the MAD packet will be the default PKey. Further, we have
>>> per IBTA:
>> 
>> This sounds like a Linux bug.
>> 
>> Linux does not do a PR to get a reversible path dedicated to the GMP> so it always uses the data flow path, thus the GMP path paramenters
>> and those in the REQ should always exactly match.
>> 
>> Where is Linux setting the BTH.PKey and how did it choose to use the
>> default pkey? Lets fix that at least for sure.

Linux isn’t. The BTH.PKey is inserted by the HCA (hw or fw) coming from the P_Key table (10.9.2 in IBTA), selected by a pkey_index associated with the QP.

As per C10-133: Packets sent from the Send Queue of a GSI QP shall attach a P_Key associated with that QP, just as a P_Key is associated with nonmanagement QPs

>> Once that is fixed the rest of the series makes no sense since a REQ
>> with invalid PKey will never arrive.
>> 
>> However...
>> 
>> This series seems inconsistent with the spec.
>> 
>> IIRC the spec doesn't say if a full or limited pkey should be placed
>> in the REQ (Hal?). 
> 
> CM spec for REQ just says partition key without indicating whether this
> means P_Key or just the partition (15 bits) so my read is that either
> full or limited pkey is allowed in REQ.
> 
>> It is designed so that the requestor can get a
>> single reversible path and put that results into the REQ without
>> additional processing, however the PR returns only one PKey and again,
>> it is not really specified if it should be the full or limited pkey
>> (Hal?).
> 
> Correct; it's not specified.
> 
>> Basically this means that any pkey in the REQ could randomly be the
>> full or limited value, and that in-of-itself has not bearing on the
>> connection.
>> 
>> So it is quite wrong to insist that the pkey be limited or full when
>> processing the REQ. The end port is expected to match against the
>> local table.
> 
> Note that there is thorny issue with shared (physical) port
> virtualization. In shared port virtualization, the VF pkey assignment is
> a local matter. Only thing SM knows is the physical port pkeys where
> both full and limited membership of same partition is possible. It is
> conceivable that CM REQ contains limited partition key between 2 limited
> VFs and for that a new REJ reason code appears to be needed.

+1


Håkon

> 
> -- Hal
> 
>> The real answer to your trap problem is to fix the SM to not create
>> paths that are non-functional, that is just flat out broken SM
>> behavior.
>> 
>> Jason
>> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-15 18:11               ` Håkon Bugge
@ 2018-05-15 19:04                 ` Jason Gunthorpe
  2018-05-16  6:47                   ` Håkon Bugge
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2018-05-15 19:04 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Hal Rosenstock, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel

On Tue, May 15, 2018 at 08:11:09PM +0200, Håkon Bugge wrote:
> > On 15 May 2018, at 02:38, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
> > 
> > On 5/14/2018 5:02 PM, Jason Gunthorpe wrote:
> >> On Thu, May 10, 2018 at 05:16:28PM +0200, Håkon Bugge wrote:
> >> 
> >>> We are talking about two things here. The PKey in the BTH and the
> >>> PKey in the CM REQ payload. They differ.
> >>> 
> >>> I am out of office, but if my memory serves me correct, the PKey in
> >>> the BTH in the MAD packet will be the default PKey. Further, we have
> >>> per IBTA:
> >> 
> >> This sounds like a Linux bug.
> >> 
> >> Linux does not do a PR to get a reversible path dedicated to the GMP> so it always uses the data flow path, thus the GMP path paramenters
> >> and those in the REQ should always exactly match.
> >> 
> >> Where is Linux setting the BTH.PKey and how did it choose to use the
> >> default pkey? Lets fix that at least for sure.
> 
> Linux isn’t. The BTH.PKey is inserted by the HCA (hw or fw) coming from the P_Key table (10.9.2 in IBTA), selected by a pkey_index associated with the QP.
> 
> As per C10-133: Packets sent from the Send Queue of a GSI QP shall
> attach a P_Key associated with that QP, just as a P_Key is
> associated with nonmanagement QPs

That language doesn't mean the PKey is forced to the default, it says
the pkey is programmable.

Linux implemented programmable PKeys for the GSI by using the work
request, so it deviates from what the spec imagined (which was
probably one GSI QP per PKey).

See ib_create_send_mad for instance:

        mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe;
        mad_send_wr->send_wr.wr.sg_list = mad_send_wr->sg_list;
        mad_send_wr->send_wr.wr.num_sge = 2;
        mad_send_wr->send_wr.wr.opcode = IB_WR_SEND;
        mad_send_wr->send_wr.wr.send_flags = IB_SEND_SIGNALED;
        mad_send_wr->send_wr.remote_qpn = remote_qpn;
        mad_send_wr->send_wr.remote_qkey = IB_QP_SET_QKEY;
        mad_send_wr->send_wr.pkey_index = pkey_index;

The pkey_index associated with the QP1 is ignored:

                /*
                 * PKey index for QP1 is irrelevant but
                 * one is needed for the Reset to Init transition
                 */
                attr->qp_state = IB_QPS_INIT;
                attr->pkey_index = pkey_index;
                attr->qkey = (qp->qp_num == 0) ? 0 : IB_QP1_QKEY;
                ret = ib_modify_qp(qp, attr, IB_QP_STATE |
                                             IB_QP_PKEY_INDEX | IB_QP_QKEY);

Since is is present in every WR.

> >> Basically this means that any pkey in the REQ could randomly be the
> >> full or limited value, and that in-of-itself has not bearing on the
> >> connection.
> >> 
> >> So it is quite wrong to insist that the pkey be limited or full when
> >> processing the REQ. The end port is expected to match against the
> >> local table.
> > 
> > Note that there is thorny issue with shared (physical) port
> > virtualization. In shared port virtualization, the VF pkey assignment is
> > a local matter. Only thing SM knows is the physical port pkeys where
> > both full and limited membership of same partition is possible. It is
> > conceivable that CM REQ contains limited partition key between 2 limited
> > VFs and for that a new REJ reason code appears to be needed.
> 
> +1

This is not a difficult issue.

If the GMP is properly tagged with the right PKey then it will never
be delivered to the VM if the VM does not have the PKey in the
table. It is up to the hypervisor to block GMPs that have Pkeys not in
the virtual PKey table of the VF.

The only time you could need a new REJ code is if the GMP is using a
PKey different from the REQ - which is a pretty goofy thing to do
considering this VM case.

Remember the SM doesn't know what Pkeys are in the VM, so it is
basically impossible for the REQ side to reliably select two different
pkeys and know that they will bothmake it to the VM.

One pkey could be done reliably if it matched ipoib, for instance, but
two really cannot in the general case.

So again - the bug here is that the GMP is being sent with the wrong
pkey. Fix that, then see what is left to fix..

If I recall there were bugs here in mlx drivers, where the driver sent
with the wrong Pkey. I think this has actually been fixed now, so
please check the upstream kernel to be sure the Pkey is not what it is
supposed to be.

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-15 19:04                 ` Jason Gunthorpe
@ 2018-05-16  6:47                   ` Håkon Bugge
  2018-05-16 15:12                     ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Håkon Bugge @ 2018-05-16  6:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Hal Rosenstock, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel



> On 15 May 2018, at 21:04, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Tue, May 15, 2018 at 08:11:09PM +0200, Håkon Bugge wrote:
>>> On 15 May 2018, at 02:38, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
>>> 
>>> On 5/14/2018 5:02 PM, Jason Gunthorpe wrote:
>>>> On Thu, May 10, 2018 at 05:16:28PM +0200, Håkon Bugge wrote:
>>>> 
>>>>> We are talking about two things here. The PKey in the BTH and the
>>>>> PKey in the CM REQ payload. They differ.
>>>>> 
>>>>> I am out of office, but if my memory serves me correct, the PKey in
>>>>> the BTH in the MAD packet will be the default PKey. Further, we have
>>>>> per IBTA:
>>>> 
>>>> This sounds like a Linux bug.
>>>> 
>>>> Linux does not do a PR to get a reversible path dedicated to the GMP> so it always uses the data flow path, thus the GMP path paramenters
>>>> and those in the REQ should always exactly match.
>>>> 
>>>> Where is Linux setting the BTH.PKey and how did it choose to use the
>>>> default pkey? Lets fix that at least for sure.
>> 
>> Linux isn’t. The BTH.PKey is inserted by the HCA (hw or fw) coming from the P_Key table (10.9.2 in IBTA), selected by a pkey_index associated with the QP.
>> 
>> As per C10-133: Packets sent from the Send Queue of a GSI QP shall
>> attach a P_Key associated with that QP, just as a P_Key is
>> associated with nonmanagement QPs
> 
> That language doesn't mean the PKey is forced to the default, it says
> the pkey is programmable.
> 
> Linux implemented programmable PKeys for the GSI by using the work
> request, so it deviates from what the spec imagined (which was
> probably one GSI QP per PKey).
> 
> See ib_create_send_mad for instance:
> 
>        mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe;
>        mad_send_wr->send_wr.wr.sg_list = mad_send_wr->sg_list;
>        mad_send_wr->send_wr.wr.num_sge = 2;
>        mad_send_wr->send_wr.wr.opcode = IB_WR_SEND;
>        mad_send_wr->send_wr.wr.send_flags = IB_SEND_SIGNALED;
>        mad_send_wr->send_wr.remote_qpn = remote_qpn;
>        mad_send_wr->send_wr.remote_qkey = IB_QP_SET_QKEY;
>        mad_send_wr->send_wr.pkey_index = pkey_index;
> 
> The pkey_index associated with the QP1 is ignored:
> 
>                /*
>                 * PKey index for QP1 is irrelevant but
>                 * one is needed for the Reset to Init transition
>                 */
>                attr->qp_state = IB_QPS_INIT;
>                attr->pkey_index = pkey_index;
>                attr->qkey = (qp->qp_num == 0) ? 0 : IB_QP1_QKEY;
>                ret = ib_modify_qp(qp, attr, IB_QP_STATE |
>                                             IB_QP_PKEY_INDEX | IB_QP_QKEY);
> 
> Since is is present in every WR.
> 
>>>> Basically this means that any pkey in the REQ could randomly be the
>>>> full or limited value, and that in-of-itself has not bearing on the
>>>> connection.
>>>> 
>>>> So it is quite wrong to insist that the pkey be limited or full when
>>>> processing the REQ. The end port is expected to match against the
>>>> local table.
>>> 
>>> Note that there is thorny issue with shared (physical) port
>>> virtualization. In shared port virtualization, the VF pkey assignment is
>>> a local matter. Only thing SM knows is the physical port pkeys where
>>> both full and limited membership of same partition is possible. It is
>>> conceivable that CM REQ contains limited partition key between 2 limited
>>> VFs and for that a new REJ reason code appears to be needed.
>> 
>> +1
> 
> This is not a difficult issue.
> 
> If the GMP is properly tagged with the right PKey then it will never
> be delivered to the VM if the VM does not have the PKey in the
> table.

Not quite right. For the shared port model, a GMP will (most probably) be accepted by the physical port, due to:

IBTA 3.9.5: QP1 is a member of all of the port’s partitions (i.e., can accept any packet specifying a P_Key contained in the port’s P_Key table).

and 

C9-42: If the destination QP is QP1, the BTH:P_Key shall be compared to the set of P_Keys associated with the port on which the packet arrived. If the P_Key matches any of the keys associated with the port, it shall be considered valid.

So, if the GMP is destined to VM1, which is a limited member, but we have another VM2 which is a full member of the same partition, the GMP will pass the HCA’s PKey check.

> It is up to the hypervisor to block GMPs that have Pkeys not in
> the virtual PKey table of the VF.

The packet is received by the HCA and stripped from IB headers, in particular the BTH. How can the "hypervisor" block it when its doesn’t have access to the GMP’s BTH.PKey?

> The only time you could need a new REJ code is if the GMP is using a
> PKey different from the REQ - which is a pretty goofy thing to do
> considering this VM case.

Its goofy. In the CX-3 shared port model, the BTH.PKey is the default one and the REQ.PKey is the full one even if the sending VM’s port only is a limited member. This patch series fixes the last issue.

> Remember the SM doesn't know what Pkeys are in the VM, so it is
> basically impossible for the REQ side to reliably select two different
> pkeys and know that they will bothmake it to the VM.

The active side should use the "authentic" PKey in the REQ message. That is the one that would be used in BTH.PKey when communication has been established. This is implemented by this patch series.

Not sure what you mean by "reliably select two different pkeys". The CM REQ message contains one PKey.

> One pkey could be done reliably if it matched ipoib, for instance, but
> two really cannot in the general case.
> 
> So again - the bug here is that the GMP is being sent with the wrong
> pkey. Fix that, then see what is left to fix..

See above, not sure how that could be implemented. And if it is solved by the HCA trapping the GMP due to the PKey check, it doesn’t help me, as the purpose of the series is to avoid (excessive) PKey traps sent to the SM.

> If I recall there were bugs here in mlx drivers, where the driver sent
> with the wrong Pkey. I think this has actually been fixed now, so
> please check the upstream kernel to be sure the Pkey is not what it is
> supposed to be.

Let me get back to you with some ibdumps here.


Håkon

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-16  6:47                   ` Håkon Bugge
@ 2018-05-16 15:12                     ` Jason Gunthorpe
  2018-05-16 16:42                       ` Hal Rosenstock
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2018-05-16 15:12 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Hal Rosenstock, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel

On Wed, May 16, 2018 at 08:47:21AM +0200, Håkon Bugge wrote:

> > This is not a difficult issue.
> > 
> > If the GMP is properly tagged with the right PKey then it will never
> > be delivered to the VM if the VM does not have the PKey in the
> > table.
> 
> Not quite right. For the shared port model, a GMP will (most
> probably) be accepted by the physical port, due to:

Sure, but I am talking about the VM's 'virtual port'.

> So, if the GMP is destined to VM1, which is a limited member, but we
> have another VM2 which is a full member of the same partition, the
> GMP will pass the HCA’s PKey check.

Sure.

> > It is up to the hypervisor to block GMPs that have Pkeys not in
> > the virtual PKey table of the VF.
> 
> The packet is received by the HCA and stripped from IB headers, in
> particular the BTH. How can the "hypervisor" block it when its
> doesn’t have access to the GMP’s BTH.PKey?

This is wrong too, in Linux the WC's for GMP packets include the pkey
index that was used to RX the packet. This is a Linux extension
matching the one on the sending side.

The hypervisor *must* block these GMPs, it is a hard requirement of
the pkey security model for VMs.

AFAIK the Mellanox drivers do this right - do you know differently?

> > The only time you could need a new REJ code is if the GMP is using a
> > PKey different from the REQ - which is a pretty goofy thing to do
> > considering this VM case.
>
> Its goofy. In the CX-3 shared port model, the BTH.PKey is the
> default one and the REQ.PKey is the full one even if the sending
> VM’s port only is a limited member. This patch series fixes the last
> issue.

Again, this is wrong, the BTH.Pkey and REQ.Pkey should be the same -
please fix that first before trying to change anything else.

> > Remember the SM doesn't know what Pkeys are in the VM, so it is
> > basically impossible for the REQ side to reliably select two different
> > pkeys and know that they will bothmake it to the VM.
> 
> The active side should use the "authentic" PKey in the REQ
> message. That is the one that would be used in BTH.PKey when
> communication has been established. This is implemented by this
> patch series.
> 
> Not sure what you mean by "reliably select two different pkeys". The
> CM REQ message contains one PKey.

If BTH.Pkey != REQ.PKey then the requestor side has to obviously
select two PKeys, which is basically impossible.

The VM should not be part of the default partition, for instance.

> See above, not sure how that could be implemented. And if it is
> solved by the HCA trapping the GMP due to the PKey check, it doesn’t
> help me, as the purpose of the series is to avoid (excessive) PKey
> traps sent to the SM.

It might not help you, but is shows this fix for the pkey trap issue
is wrong. You must fix the pkey traps on the sending side, not on the
responder side..

> > If I recall there were bugs here in mlx drivers, where the driver
> > sent with the wrong Pkey. I think this has actually been fixed
> > now, so please check the upstream kernel to be sure the Pkey is
> > not what it is supposed to be.
> 
> Let me get back to you with some ibdumps here.

Upstream kernel (eg 4.16) and newish Mellanox firmware please.

And it would be fantastic if you could ID why this is happening, AFAIK
it should be OK in the kernel.

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-16 15:12                     ` Jason Gunthorpe
@ 2018-05-16 16:42                       ` Hal Rosenstock
  2018-05-16 16:57                         ` Jason Gunthorpe
       [not found]                         ` <151B2A36-28F0-4A88-8633-31AE7E55F848@oracle.com>
  0 siblings, 2 replies; 24+ messages in thread
From: Hal Rosenstock @ 2018-05-16 16:42 UTC (permalink / raw)
  To: Jason Gunthorpe, Håkon Bugge
  Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel

On 5/16/2018 11:12 AM, Jason Gunthorpe wrote:
> On Wed, May 16, 2018 at 08:47:21AM +0200, Håkon Bugge wrote:
> 
>>> This is not a difficult issue.
>>>
>>> If the GMP is properly tagged with the right PKey then it will never
>>> be delivered to the VM if the VM does not have the PKey in the
>>> table.
>>
>> Not quite right. For the shared port model, a GMP will (most
>> probably) be accepted by the physical port, due to:
> 
> Sure, but I am talking about the VM's 'virtual port'.
> 
>> So, if the GMP is destined to VM1, which is a limited member, but we
>> have another VM2 which is a full member of the same partition, the
>> GMP will pass the HCA’s PKey check.
> 
> Sure.
> 
>>> It is up to the hypervisor to block GMPs that have Pkeys not in
>>> the virtual PKey table of the VF.
>>
>> The packet is received by the HCA and stripped from IB headers, in
>> particular the BTH. How can the "hypervisor" block it when its
>> doesn’t have access to the GMP’s BTH.PKey?
> 
> This is wrong too, in Linux the WC's for GMP packets include the pkey
> index that was used to RX the packet. This is a Linux extension
> matching the one on the sending side.
> 
> The hypervisor *must* block these GMPs, it is a hard requirement of
> the pkey security model for VMs.

Aside from the pkey enforcement at the VF, there is also the issue of
the pkey trap - if physical port model is followed for shared virtual
ports, trap should be generated which could look confusing to SM as well
as counting in some error counter (for physical port, it's either
PortCounter:Port[Xmit Rcv]ConstraintErrors.

> AFAIK the Mellanox drivers do this right - do you know differently?
> 
>>> The only time you could need a new REJ code is if the GMP is using a
>>> PKey different from the REQ - which is a pretty goofy thing to do
>>> considering this VM case.
>>
>> Its goofy. In the CX-3 shared port model, the BTH.PKey is the
>> default one and the REQ.PKey is the full one even if the sending
>> VM’s port only is a limited member. This patch series fixes the last
>> issue.
> 
> Again, this is wrong, the BTH.Pkey and REQ.Pkey should be the same -

I do not believe there is anything in the spec that requires this. I
agree it's the simplest use model though.

> please fix that first before trying to change anything else.
> 
>>> Remember the SM doesn't know what Pkeys are in the VM, so it is
>>> basically impossible for the REQ side to reliably select two different
>>> pkeys and know that they will bothmake it to the VM.
>>
>> The active side should use the "authentic" PKey in the REQ
>> message. That is the one that would be used in BTH.PKey when
>> communication has been established. This is implemented by this
>> patch series.
>>
>> Not sure what you mean by "reliably select two different pkeys". The
>> CM REQ message contains one PKey.
> 
> If BTH.Pkey != REQ.PKey then the requestor side has to obviously
> select two PKeys, which is basically impossible.
> 
> The VM should not be part of the default partition, for instance.

I think that the VM is at least a limited member of the default partition.

-- Hal

>> See above, not sure how that could be implemented. And if it is
>> solved by the HCA trapping the GMP due to the PKey check, it doesn’t
>> help me, as the purpose of the series is to avoid (excessive) PKey
>> traps sent to the SM.
> 
> It might not help you, but is shows this fix for the pkey trap issue
> is wrong. You must fix the pkey traps on the sending side, not on the
> responder side..
> 
>>> If I recall there were bugs here in mlx drivers, where the driver
>>> sent with the wrong Pkey. I think this has actually been fixed
>>> now, so please check the upstream kernel to be sure the Pkey is
>>> not what it is supposed to be.
>>
>> Let me get back to you with some ibdumps here.
> 
> Upstream kernel (eg 4.16) and newish Mellanox firmware please.
> 
> And it would be fantastic if you could ID why this is happening, AFAIK
> it should be OK in the kernel.
> 
> Jason
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-16 16:42                       ` Hal Rosenstock
@ 2018-05-16 16:57                         ` Jason Gunthorpe
       [not found]                         ` <151B2A36-28F0-4A88-8633-31AE7E55F848@oracle.com>
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2018-05-16 16:57 UTC (permalink / raw)
  To: Hal Rosenstock
  Cc: Håkon Bugge, Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel

On Wed, May 16, 2018 at 12:42:37PM -0400, Hal Rosenstock wrote:

> >>> The only time you could need a new REJ code is if the GMP is using a
> >>> PKey different from the REQ - which is a pretty goofy thing to do
> >>> considering this VM case.
> >>
> >> Its goofy. In the CX-3 shared port model, the BTH.PKey is the
> >> default one and the REQ.PKey is the full one even if the sending
> >> VM’s port only is a limited member. This patch series fixes the last
> >> issue.
> > 
> > Again, this is wrong, the BTH.Pkey and REQ.Pkey should be the same -
> 
> I do not believe there is anything in the spec that requires this. I
> agree it's the simplest use model though.

The spec doesn't require it, but the design of the Linux CM certainly
does.

> > If BTH.Pkey != REQ.PKey then the requestor side has to obviously
> > select two PKeys, which is basically impossible.
> > 
> > The VM should not be part of the default partition, for instance.
> 
> I think that the VM is at least a limited member of the default partition.

Well, being a limited member still means the default pkey cannot be
used for CM GMPs.

I actually can't think of why you'd want to do this, better to put the
SM nodes in all the pkeys and reserve the default pkey completely for
the network management control plane.

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
       [not found]                         ` <151B2A36-28F0-4A88-8633-31AE7E55F848@oracle.com>
@ 2018-05-16 18:01                           ` Jason Gunthorpe
  2018-05-16 18:14                             ` Håkon Bugge
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2018-05-16 18:01 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel, Hal Rosenstock

On Wed, May 16, 2018 at 07:46:10PM +0200, Håkon Bugge wrote:

>    OK. Lets take one example. The pkey table contains 0xFFFF, 0x8001,
>    0x0001.
> 
>    The wce.pkey_index is 1 (i.e., pointing to 0x8001). Now, tell me, was
>    BTH.PKey 0x8001 (matches 0x8001) or was it 0x0001 (also matching
>    0x8001) ?

As far as the Linux core is concerned, it must have been 0x8001,
because the only way the pkey_index feature works properly is if
exact-match takes precedence over in-exact match.

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-16 18:01                           ` Jason Gunthorpe
@ 2018-05-16 18:14                             ` Håkon Bugge
  2018-05-16 18:16                               ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Håkon Bugge @ 2018-05-16 18:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel, Hal Rosenstock



> On 16 May 2018, at 20:01, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Wed, May 16, 2018 at 07:46:10PM +0200, Håkon Bugge wrote:
> 
>>   OK. Lets take one example. The pkey table contains 0xFFFF, 0x8001,
>>   0x0001.
>> 
>>   The wce.pkey_index is 1 (i.e., pointing to 0x8001). Now, tell me, was
>>   BTH.PKey 0x8001 (matches 0x8001) or was it 0x0001 (also matching
>>   0x8001) ?
> 
> As far as the Linux core is concerned, it must have been 0x8001,
> because the only way the pkey_index feature works properly is if
> exact-match takes precedence over in-exact match.

And now if the table only contains 0xFFFF, 0x8001, how do you tell?


Håkon

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
  2018-05-16 18:14                             ` Håkon Bugge
@ 2018-05-16 18:16                               ` Jason Gunthorpe
       [not found]                                 ` <A087A721-E596-428E-8554-FAEB4BE9B306@oracle.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2018-05-16 18:16 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel, Hal Rosenstock

On Wed, May 16, 2018 at 08:14:50PM +0200, Håkon Bugge wrote:
> 
> 
> > On 16 May 2018, at 20:01, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > 
> > On Wed, May 16, 2018 at 07:46:10PM +0200, Håkon Bugge wrote:
> > 
> >>   OK. Lets take one example. The pkey table contains 0xFFFF, 0x8001,
> >>   0x0001.
> >> 
> >>   The wce.pkey_index is 1 (i.e., pointing to 0x8001). Now, tell me, was
> >>   BTH.PKey 0x8001 (matches 0x8001) or was it 0x0001 (also matching
> >>   0x8001) ?
> > 
> > As far as the Linux core is concerned, it must have been 0x8001,
> > because the only way the pkey_index feature works properly is if
> > exact-match takes precedence over in-exact match.
> 
> And now if the table only contains 0xFFFF, 0x8001, how do you tell?

It doesn't matter.

The delgation of Pkeys to VMs are on a pkey-table-index basis, so if
it matches table entry 1 and entry 1 is passed to the VM, then the
packet can be passed to the VM.

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
       [not found]                                 ` <A087A721-E596-428E-8554-FAEB4BE9B306@oracle.com>
@ 2018-05-16 19:30                                   ` Hal Rosenstock
  0 siblings, 0 replies; 24+ messages in thread
From: Hal Rosenstock @ 2018-05-16 19:30 UTC (permalink / raw)
  To: Håkon Bugge, Jason Gunthorpe
  Cc: Doug Ledford, Don Hiatt, Ira Weiny, Sean Hefty,
	OFED mailing list, linux-kernel

On 5/16/2018 3:22 PM, Håkon Bugge wrote:
> But, do we need an update to IBTA (that the BTH.PKey shall be that of the VM's Port)?

Nothing in spec mentions shared (port) virtualization so that is an
exercise completely left to the reader...

Annex A19 is silent on this specific point but the virtual port model
has virtual port partition table so there's no ambiguity there.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-05-16 19:30 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-09  9:30 [PATCH IB/core 0/2] Do not form IB connections between limited partition members Håkon Bugge
2018-05-09  9:30 ` [PATCH IB/core 1/2] IB/core: A full pkey is required to match a limited one Håkon Bugge
2018-05-09  9:30 ` [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys Håkon Bugge
2018-05-09 11:28   ` Hal Rosenstock
2018-05-10  9:16     ` Håkon Bugge
2018-05-10 14:01       ` Hal Rosenstock
2018-05-10 15:16         ` Håkon Bugge
2018-05-10 16:54           ` Hal Rosenstock
2018-05-11 10:55             ` Håkon Bugge
2018-05-11 12:51               ` Hal Rosenstock
2018-05-14 21:04               ` Jason Gunthorpe
2018-05-14 21:02           ` Jason Gunthorpe
2018-05-15  0:38             ` Hal Rosenstock
2018-05-15 18:11               ` Håkon Bugge
2018-05-15 19:04                 ` Jason Gunthorpe
2018-05-16  6:47                   ` Håkon Bugge
2018-05-16 15:12                     ` Jason Gunthorpe
2018-05-16 16:42                       ` Hal Rosenstock
2018-05-16 16:57                         ` Jason Gunthorpe
     [not found]                         ` <151B2A36-28F0-4A88-8633-31AE7E55F848@oracle.com>
2018-05-16 18:01                           ` Jason Gunthorpe
2018-05-16 18:14                             ` Håkon Bugge
2018-05-16 18:16                               ` Jason Gunthorpe
     [not found]                                 ` <A087A721-E596-428E-8554-FAEB4BE9B306@oracle.com>
2018-05-16 19:30                                   ` Hal Rosenstock
2018-05-09 18:11   ` kbuild test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).