All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 for-next 0/7] Add support for multicast loopback prevention to mlx4
@ 2015-10-15 11:44 Eran Ben Elisha
       [not found] ` <1444909482-17113-1-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Eran Ben Elisha @ 2015-10-15 11:44 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Eran Ben Elisha,
	Christoph Lameter

Hi Doug,

This patch-set adds a new  implementation for multicast loopback prevention for
mlx4 driver.  The current implementation is very limited, especially if link
layer is Ethernet. The new implementation is based on HW feature of dropping
incoming multicast packets if the sender QP counter index is equal to the
receiver counter index.

Patch 0001 extends ib_uverbs_create_qp in order to allow receiving the
multicast loopback flag at create flags.
Patch 0002 allows setting IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK in create_flag
field both from uverbs and ib_create_qp.
Patch 0003 adds an infrastructure for the counters' loopback prevention in the
mlx4_core.
Patch 0004 modifies mlx4_en QPs to use the new loopback prevention mode.
Patches 0005-0007 implements this feature for mlx4_ib driver.

Changes from v0:
	Move loopback assignment outside the for loop according to Yuval's comment
	rebase over to-be-rebased/for-4.3
Changes from v1:
	rebase over k.o/for-4.4
	Fix cover letter patch description - was out of order

Tested-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>

Thanks,
Eran.

Eran Ben Elisha (5):
  IB/core: Extend ib_uverbs_create_qp
  IB/core: Allow setting create flags in QP init attribute
  IB/mlx4: Add IB counters table
  IB/mlx4: Add counter based implementation for QP multicast loopback
    block
  IB/mlx4: Add support for blocking multicast loopback QP creation user
    flag

Maor Gottlieb (2):
  net/mlx4_core: Add support for filtering multicast loopback
  net/mlx4_en: Implement mcast loopback prevention for ETH qps

 drivers/infiniband/core/uverbs.h                   |   1 +
 drivers/infiniband/core/uverbs_cmd.c               | 259 +++++++++++++++------
 drivers/infiniband/core/uverbs_main.c              |   1 +
 drivers/infiniband/hw/mlx4/mad.c                   |  25 +-
 drivers/infiniband/hw/mlx4/main.c                  |  66 ++++--
 drivers/infiniband/hw/mlx4/mlx4_ib.h               |  10 +-
 drivers/infiniband/hw/mlx4/qp.c                    |  88 ++++++-
 drivers/net/ethernet/mellanox/mlx4/en_main.c       |  22 ++
 drivers/net/ethernet/mellanox/mlx4/en_resources.c  |  25 ++
 drivers/net/ethernet/mellanox/mlx4/fw.c            |   6 +
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h       |   3 +-
 drivers/net/ethernet/mellanox/mlx4/qp.c            |  19 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  30 ++-
 include/linux/mlx4/device.h                        |   2 +
 include/linux/mlx4/qp.h                            |  24 +-
 include/uapi/rdma/ib_user_verbs.h                  |  26 +++
 16 files changed, 498 insertions(+), 109 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2 for-next 1/7] IB/core: Extend ib_uverbs_create_qp
       [not found] ` <1444909482-17113-1-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-10-15 11:44   ` Eran Ben Elisha
       [not found]     ` <1444909482-17113-2-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-10-15 11:44   ` [PATCH v2 for-next 2/7] IB/core: Allow setting create flags in QP init attribute Eran Ben Elisha
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Eran Ben Elisha @ 2015-10-15 11:44 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Eran Ben Elisha,
	Christoph Lameter

ib_uverbs_ex_create_qp follows the extension verbs
mechanism. New features (for example, QP creation flags
field which is added in a downstream patch) could used
via user-space libraries without breaking the ABI.

Signed-off-by: Eran Ben Elisha <eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h      |   1 +
 drivers/infiniband/core/uverbs_cmd.c  | 259 +++++++++++++++++++++++++---------
 drivers/infiniband/core/uverbs_main.c |   1 +
 include/uapi/rdma/ib_user_verbs.h     |  26 ++++
 4 files changed, 222 insertions(+), 65 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 3863d33..94bbd8c 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -272,5 +272,6 @@ IB_UVERBS_DECLARE_EX_CMD(create_flow);
 IB_UVERBS_DECLARE_EX_CMD(destroy_flow);
 IB_UVERBS_DECLARE_EX_CMD(query_device);
 IB_UVERBS_DECLARE_EX_CMD(create_cq);
+IB_UVERBS_DECLARE_EX_CMD(create_qp);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index be4cb9f..e795d59 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1741,66 +1741,65 @@ ssize_t ib_uverbs_destroy_cq(struct ib_uverbs_file *file,
 	return in_len;
 }
 
-ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
-			    struct ib_device *ib_dev,
-			    const char __user *buf, int in_len,
-			    int out_len)
-{
-	struct ib_uverbs_create_qp      cmd;
-	struct ib_uverbs_create_qp_resp resp;
-	struct ib_udata                 udata;
-	struct ib_uqp_object           *obj;
-	struct ib_device	       *device;
-	struct ib_pd                   *pd = NULL;
-	struct ib_xrcd		       *xrcd = NULL;
-	struct ib_uobject	       *uninitialized_var(xrcd_uobj);
-	struct ib_cq                   *scq = NULL, *rcq = NULL;
-	struct ib_srq                  *srq = NULL;
-	struct ib_qp                   *qp;
-	struct ib_qp_init_attr          attr;
-	int ret;
-
-	if (out_len < sizeof resp)
-		return -ENOSPC;
-
-	if (copy_from_user(&cmd, buf, sizeof cmd))
-		return -EFAULT;
+static int create_qp(struct ib_uverbs_file *file,
+		     struct ib_udata *ucore,
+		     struct ib_udata *uhw,
+		     struct ib_uverbs_ex_create_qp *cmd,
+		     size_t cmd_sz,
+		     int (*cb)(struct ib_uverbs_file *file,
+			       struct ib_uverbs_ex_create_qp_resp *resp,
+			       struct ib_udata *udata),
+		     void *context)
+{
+	struct ib_uqp_object		*obj;
+	struct ib_device		*device;
+	struct ib_pd			*pd = NULL;
+	struct ib_xrcd			*xrcd = NULL;
+	struct ib_uobject		*uninitialized_var(xrcd_uobj);
+	struct ib_cq			*scq = NULL, *rcq = NULL;
+	struct ib_srq			*srq = NULL;
+	struct ib_qp			*qp;
+	char				*buf;
+	struct ib_qp_init_attr		attr;
+	struct ib_uverbs_ex_create_qp_resp resp;
+	int				ret;
 
-	if (cmd.qp_type == IB_QPT_RAW_PACKET && !capable(CAP_NET_RAW))
+	if (cmd->qp_type == IB_QPT_RAW_PACKET && !capable(CAP_NET_RAW))
 		return -EPERM;
 
-	INIT_UDATA(&udata, buf + sizeof cmd,
-		   (unsigned long) cmd.response + sizeof resp,
-		   in_len - sizeof cmd, out_len - sizeof resp);
-
 	obj = kzalloc(sizeof *obj, GFP_KERNEL);
 	if (!obj)
 		return -ENOMEM;
 
-	init_uobj(&obj->uevent.uobject, cmd.user_handle, file->ucontext, &qp_lock_class);
+	init_uobj(&obj->uevent.uobject, cmd->user_handle, file->ucontext,
+		  &qp_lock_class);
 	down_write(&obj->uevent.uobject.mutex);
 
-	if (cmd.qp_type == IB_QPT_XRC_TGT) {
-		xrcd = idr_read_xrcd(cmd.pd_handle, file->ucontext, &xrcd_uobj);
+	if (cmd->qp_type == IB_QPT_XRC_TGT) {
+		xrcd = idr_read_xrcd(cmd->pd_handle, file->ucontext,
+				     &xrcd_uobj);
 		if (!xrcd) {
 			ret = -EINVAL;
 			goto err_put;
 		}
 		device = xrcd->device;
 	} else {
-		if (cmd.qp_type == IB_QPT_XRC_INI) {
-			cmd.max_recv_wr = cmd.max_recv_sge = 0;
+		if (cmd->qp_type == IB_QPT_XRC_INI) {
+			cmd->max_recv_wr = 0;
+			cmd->max_recv_sge = 0;
 		} else {
-			if (cmd.is_srq) {
-				srq = idr_read_srq(cmd.srq_handle, file->ucontext);
+			if (cmd->is_srq) {
+				srq = idr_read_srq(cmd->srq_handle,
+						   file->ucontext);
 				if (!srq || srq->srq_type != IB_SRQT_BASIC) {
 					ret = -EINVAL;
 					goto err_put;
 				}
 			}
 
-			if (cmd.recv_cq_handle != cmd.send_cq_handle) {
-				rcq = idr_read_cq(cmd.recv_cq_handle, file->ucontext, 0);
+			if (cmd->recv_cq_handle != cmd->send_cq_handle) {
+				rcq = idr_read_cq(cmd->recv_cq_handle,
+						  file->ucontext, 0);
 				if (!rcq) {
 					ret = -EINVAL;
 					goto err_put;
@@ -1808,9 +1807,9 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
 			}
 		}
 
-		scq = idr_read_cq(cmd.send_cq_handle, file->ucontext, !!rcq);
+		scq = idr_read_cq(cmd->send_cq_handle, file->ucontext, !!rcq);
 		rcq = rcq ?: scq;
-		pd  = idr_read_pd(cmd.pd_handle, file->ucontext);
+		pd  = idr_read_pd(cmd->pd_handle, file->ucontext);
 		if (!pd || !scq) {
 			ret = -EINVAL;
 			goto err_put;
@@ -1825,31 +1824,54 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
 	attr.recv_cq       = rcq;
 	attr.srq           = srq;
 	attr.xrcd	   = xrcd;
-	attr.sq_sig_type   = cmd.sq_sig_all ? IB_SIGNAL_ALL_WR : IB_SIGNAL_REQ_WR;
-	attr.qp_type       = cmd.qp_type;
+	attr.sq_sig_type   = cmd->sq_sig_all ? IB_SIGNAL_ALL_WR :
+					      IB_SIGNAL_REQ_WR;
+	attr.qp_type       = cmd->qp_type;
 	attr.create_flags  = 0;
 
-	attr.cap.max_send_wr     = cmd.max_send_wr;
-	attr.cap.max_recv_wr     = cmd.max_recv_wr;
-	attr.cap.max_send_sge    = cmd.max_send_sge;
-	attr.cap.max_recv_sge    = cmd.max_recv_sge;
-	attr.cap.max_inline_data = cmd.max_inline_data;
+	attr.cap.max_send_wr     = cmd->max_send_wr;
+	attr.cap.max_recv_wr     = cmd->max_recv_wr;
+	attr.cap.max_send_sge    = cmd->max_send_sge;
+	attr.cap.max_recv_sge    = cmd->max_recv_sge;
+	attr.cap.max_inline_data = cmd->max_inline_data;
 
 	obj->uevent.events_reported     = 0;
 	INIT_LIST_HEAD(&obj->uevent.event_list);
 	INIT_LIST_HEAD(&obj->mcast_list);
 
-	if (cmd.qp_type == IB_QPT_XRC_TGT)
+	if (cmd_sz >= offsetof(typeof(*cmd), create_flags) +
+		      sizeof(cmd->create_flags))
+		attr.create_flags = cmd->create_flags;
+
+	if (attr.create_flags) {
+		ret = -EINVAL;
+		goto err_put;
+	}
+
+	if (cmd->comp_mask) {
+		ret = -EINVAL;
+		goto err_put;
+	}
+
+	buf = (void *)cmd + sizeof(*cmd);
+	if (cmd_sz > sizeof(*cmd))
+		if (!(buf[0] == 0 && !memcmp(buf, buf + 1,
+					     cmd_sz - sizeof(*cmd) - 1))) {
+			ret = -EINVAL;
+			goto err_put;
+		}
+
+	if (cmd->qp_type == IB_QPT_XRC_TGT)
 		qp = ib_create_qp(pd, &attr);
 	else
-		qp = device->create_qp(pd, &attr, &udata);
+		qp = device->create_qp(pd, &attr, uhw);
 
 	if (IS_ERR(qp)) {
 		ret = PTR_ERR(qp);
 		goto err_put;
 	}
 
-	if (cmd.qp_type != IB_QPT_XRC_TGT) {
+	if (cmd->qp_type != IB_QPT_XRC_TGT) {
 		qp->real_qp	  = qp;
 		qp->device	  = device;
 		qp->pd		  = pd;
@@ -1875,19 +1897,20 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
 		goto err_destroy;
 
 	memset(&resp, 0, sizeof resp);
-	resp.qpn             = qp->qp_num;
-	resp.qp_handle       = obj->uevent.uobject.id;
-	resp.max_recv_sge    = attr.cap.max_recv_sge;
-	resp.max_send_sge    = attr.cap.max_send_sge;
-	resp.max_recv_wr     = attr.cap.max_recv_wr;
-	resp.max_send_wr     = attr.cap.max_send_wr;
-	resp.max_inline_data = attr.cap.max_inline_data;
+	resp.base.qpn             = qp->qp_num;
+	resp.base.qp_handle       = obj->uevent.uobject.id;
+	resp.base.max_recv_sge    = attr.cap.max_recv_sge;
+	resp.base.max_send_sge    = attr.cap.max_send_sge;
+	resp.base.max_recv_wr     = attr.cap.max_recv_wr;
+	resp.base.max_send_wr     = attr.cap.max_send_wr;
+	resp.base.max_inline_data = attr.cap.max_inline_data;
 
-	if (copy_to_user((void __user *) (unsigned long) cmd.response,
-			 &resp, sizeof resp)) {
-		ret = -EFAULT;
-		goto err_copy;
-	}
+	resp.response_length = offsetof(typeof(resp), response_length) +
+			       sizeof(resp.response_length);
+
+	ret = cb(file, &resp, ucore);
+	if (ret)
+		goto err_cb;
 
 	if (xrcd) {
 		obj->uxrcd = container_of(xrcd_uobj, struct ib_uxrcd_object,
@@ -1913,9 +1936,8 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
 
 	up_write(&obj->uevent.uobject.mutex);
 
-	return in_len;
-
-err_copy:
+	return 0;
+err_cb:
 	idr_remove_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject);
 
 err_destroy:
@@ -1937,6 +1959,113 @@ err_put:
 	return ret;
 }
 
+static int ib_uverbs_create_qp_cb(struct ib_uverbs_file *file,
+				  struct ib_uverbs_ex_create_qp_resp *resp,
+				  struct ib_udata *ucore)
+{
+	if (ib_copy_to_udata(ucore, &resp->base, sizeof(resp->base)))
+		return -EFAULT;
+
+	return 0;
+}
+
+ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
+			    struct ib_device *ib_dev,
+			    const char __user *buf, int in_len,
+			    int out_len)
+{
+	struct ib_uverbs_create_qp      cmd;
+	struct ib_uverbs_ex_create_qp	cmd_ex;
+	struct ib_udata			ucore;
+	struct ib_udata			uhw;
+	ssize_t resp_size = sizeof(struct ib_uverbs_create_qp_resp);
+	int				err;
+
+	if (out_len < resp_size)
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, buf, sizeof(cmd)))
+		return -EFAULT;
+
+	INIT_UDATA(&ucore, buf, (unsigned long)cmd.response, sizeof(cmd),
+		   resp_size);
+	INIT_UDATA(&uhw, buf + sizeof(cmd),
+		   (unsigned long)cmd.response + resp_size,
+		   in_len - sizeof(cmd), out_len - resp_size);
+
+	memset(&cmd_ex, 0, sizeof(cmd_ex));
+	cmd_ex.user_handle = cmd.user_handle;
+	cmd_ex.pd_handle = cmd.pd_handle;
+	cmd_ex.send_cq_handle = cmd.send_cq_handle;
+	cmd_ex.recv_cq_handle = cmd.recv_cq_handle;
+	cmd_ex.srq_handle = cmd.srq_handle;
+	cmd_ex.max_send_wr = cmd.max_send_wr;
+	cmd_ex.max_recv_wr = cmd.max_recv_wr;
+	cmd_ex.max_send_sge = cmd.max_send_sge;
+	cmd_ex.max_recv_sge = cmd.max_recv_sge;
+	cmd_ex.max_inline_data = cmd.max_inline_data;
+	cmd_ex.sq_sig_all = cmd.sq_sig_all;
+	cmd_ex.qp_type = cmd.qp_type;
+	cmd_ex.is_srq = cmd.is_srq;
+
+	err = create_qp(file, &ucore, &uhw, &cmd_ex,
+			offsetof(typeof(cmd_ex), is_srq) +
+			sizeof(cmd.is_srq), ib_uverbs_create_qp_cb,
+			NULL);
+
+	if (err)
+		return err;
+
+	return in_len;
+}
+
+static int ib_uverbs_ex_create_qp_cb(struct ib_uverbs_file *file,
+				     struct ib_uverbs_ex_create_qp_resp *resp,
+				     struct ib_udata *ucore)
+{
+	if (ib_copy_to_udata(ucore, resp, resp->response_length))
+		return -EFAULT;
+
+	return 0;
+}
+
+int ib_uverbs_ex_create_qp(struct ib_uverbs_file *file,
+			   struct ib_device *ib_dev,
+			   struct ib_udata *ucore,
+			   struct ib_udata *uhw)
+{
+	struct ib_uverbs_ex_create_qp_resp resp;
+	struct ib_uverbs_ex_create_qp cmd = {0};
+	int err;
+
+	if (ucore->inlen < (offsetof(typeof(cmd), comp_mask) +
+			    sizeof(cmd.comp_mask)))
+		return -EINVAL;
+
+	err = ib_copy_from_udata(&cmd, ucore, min(sizeof(cmd), ucore->inlen));
+	if (err)
+		return err;
+
+	if (cmd.comp_mask)
+		return -EINVAL;
+
+	if (cmd.reserved)
+		return -EINVAL;
+
+	if (ucore->outlen < (offsetof(typeof(resp), response_length) +
+			     sizeof(resp.response_length)))
+		return -ENOSPC;
+
+	err = create_qp(file, ucore, uhw, &cmd,
+			min(ucore->inlen, sizeof(cmd)),
+			ib_uverbs_ex_create_qp_cb, NULL);
+
+	if (err)
+		return err;
+
+	return 0;
+}
+
 ssize_t ib_uverbs_open_qp(struct ib_uverbs_file *file,
 			  struct ib_device *ib_dev,
 			  const char __user *buf, int in_len, int out_len)
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index c29a660..e3ef288 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -127,6 +127,7 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file *file,
 	[IB_USER_VERBS_EX_CMD_DESTROY_FLOW]	= ib_uverbs_ex_destroy_flow,
 	[IB_USER_VERBS_EX_CMD_QUERY_DEVICE]	= ib_uverbs_ex_query_device,
 	[IB_USER_VERBS_EX_CMD_CREATE_CQ]	= ib_uverbs_ex_create_cq,
+	[IB_USER_VERBS_EX_CMD_CREATE_QP]        = ib_uverbs_ex_create_qp,
 };
 
 static void ib_uverbs_add_one(struct ib_device *device);
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 978841e..8126c14 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -92,6 +92,7 @@ enum {
 enum {
 	IB_USER_VERBS_EX_CMD_QUERY_DEVICE = IB_USER_VERBS_CMD_QUERY_DEVICE,
 	IB_USER_VERBS_EX_CMD_CREATE_CQ = IB_USER_VERBS_CMD_CREATE_CQ,
+	IB_USER_VERBS_EX_CMD_CREATE_QP = IB_USER_VERBS_CMD_CREATE_QP,
 	IB_USER_VERBS_EX_CMD_CREATE_FLOW = IB_USER_VERBS_CMD_THRESHOLD,
 	IB_USER_VERBS_EX_CMD_DESTROY_FLOW,
 };
@@ -516,6 +517,25 @@ struct ib_uverbs_create_qp {
 	__u64 driver_data[0];
 };
 
+struct ib_uverbs_ex_create_qp {
+	__u64 user_handle;
+	__u32 pd_handle;
+	__u32 send_cq_handle;
+	__u32 recv_cq_handle;
+	__u32 srq_handle;
+	__u32 max_send_wr;
+	__u32 max_recv_wr;
+	__u32 max_send_sge;
+	__u32 max_recv_sge;
+	__u32 max_inline_data;
+	__u8  sq_sig_all;
+	__u8  qp_type;
+	__u8  is_srq;
+	__u8 reserved;
+	__u32 comp_mask;
+	__u32 create_flags;
+};
+
 struct ib_uverbs_open_qp {
 	__u64 response;
 	__u64 user_handle;
@@ -538,6 +558,12 @@ struct ib_uverbs_create_qp_resp {
 	__u32 reserved;
 };
 
+struct ib_uverbs_ex_create_qp_resp {
+	struct ib_uverbs_create_qp_resp base;
+	__u32 comp_mask;
+	__u32 response_length;
+};
+
 /*
  * This struct needs to remain a multiple of 8 bytes to keep the
  * alignment of the modify QP parameters.
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 for-next 2/7] IB/core: Allow setting create flags in QP init attribute
       [not found] ` <1444909482-17113-1-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-10-15 11:44   ` [PATCH v2 for-next 1/7] IB/core: Extend ib_uverbs_create_qp Eran Ben Elisha
@ 2015-10-15 11:44   ` Eran Ben Elisha
       [not found]     ` <1444909482-17113-3-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-10-15 11:44   ` [PATCH v2 for-next 3/7] net/mlx4_core: Add support for filtering multicast loopback Eran Ben Elisha
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Eran Ben Elisha @ 2015-10-15 11:44 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Eran Ben Elisha,
	Christoph Lameter

Allow setting IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK at create_flags in
ib_uverbs_create_qp_ex.

Signed-off-by: Eran Ben Elisha <eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs_cmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index e795d59..e9bafa3 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1843,7 +1843,7 @@ static int create_qp(struct ib_uverbs_file *file,
 		      sizeof(cmd->create_flags))
 		attr.create_flags = cmd->create_flags;
 
-	if (attr.create_flags) {
+	if (attr.create_flags & ~IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK) {
 		ret = -EINVAL;
 		goto err_put;
 	}
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 for-next 3/7] net/mlx4_core: Add support for filtering multicast loopback
       [not found] ` <1444909482-17113-1-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-10-15 11:44   ` [PATCH v2 for-next 1/7] IB/core: Extend ib_uverbs_create_qp Eran Ben Elisha
  2015-10-15 11:44   ` [PATCH v2 for-next 2/7] IB/core: Allow setting create flags in QP init attribute Eran Ben Elisha
@ 2015-10-15 11:44   ` Eran Ben Elisha
  2015-10-15 11:44   ` [PATCH v2 for-next 4/7] net/mlx4_en: Implement mcast loopback prevention for ETH qps Eran Ben Elisha
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Eran Ben Elisha @ 2015-10-15 11:44 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Eran Ben Elisha,
	Christoph Lameter, Maor Gottlieb

From: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Update device capabilities regarding HW filtering multicast loopback support.

Add MLX4_UPDATE_QP_ETH_SRC_CHECK_MC_LB attribute to mlx4_update_qp to
enable changing QP context to support filtering incoming multicast
loopback traffic according the sender's counter index.

Set the corresponding bits in QP context to force the loopback source
checks if attribute is given and HW supports it.

Signed-off-by: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Eran Ben Elisha <eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx4/fw.c            |  6 +++++
 drivers/net/ethernet/mellanox/mlx4/qp.c            | 19 +++++++++++++-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  | 30 +++++++++++++++++-----
 include/linux/mlx4/device.h                        |  2 ++
 include/linux/mlx4/qp.h                            | 24 +++++++++++++----
 5 files changed, 68 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index e8ec1de..b3be3a0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -155,6 +155,8 @@ static void dump_dev_cap_flags2(struct mlx4_dev *dev, u64 flags)
 		[27] = "Port beacon support",
 		[28] = "RX-ALL support",
 		[29] = "802.1ad offload support",
+		[31] = "Modifying loopback source checks using UPDATE_QP support",
+		[32] = "Loopback source checks support",
 	};
 	int i;
 
@@ -964,6 +966,10 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	MLX4_GET(field32, outbox, QUERY_DEV_CAP_EXT_2_FLAGS_OFFSET);
 	if (field32 & (1 << 16))
 		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_UPDATE_QP;
+	if (field32 & (1 << 18))
+		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_UPDATE_QP_SRC_CHECK_LB;
+	if (field32 & (1 << 19))
+		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_LB_SRC_CHK;
 	if (field32 & (1 << 26))
 		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_VLAN_CONTROL;
 	if (field32 & (1 << 20))
diff --git a/drivers/net/ethernet/mellanox/mlx4/qp.c b/drivers/net/ethernet/mellanox/mlx4/qp.c
index 2026863..b162495 100644
--- a/drivers/net/ethernet/mellanox/mlx4/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx4/qp.c
@@ -436,6 +436,23 @@ int mlx4_update_qp(struct mlx4_dev *dev, u32 qpn,
 		cmd->qp_context.pri_path.grh_mylmc = params->smac_index;
 	}
 
+	if (attr & MLX4_UPDATE_QP_ETH_SRC_CHECK_MC_LB) {
+		if (!(dev->caps.flags2
+		      & MLX4_DEV_CAP_FLAG2_UPDATE_QP_SRC_CHECK_LB)) {
+			mlx4_warn(dev,
+				  "Trying to set src check LB, but it isn't supported\n");
+			err = -ENOTSUPP;
+			goto out;
+		}
+		pri_addr_path_mask |=
+			1ULL << MLX4_UPD_QP_PATH_MASK_ETH_SRC_CHECK_MC_LB;
+		if (params->flags &
+		    MLX4_UPDATE_QP_PARAMS_FLAGS_ETH_CHECK_MC_LB) {
+			cmd->qp_context.pri_path.fl |=
+				MLX4_FL_ETH_SRC_CHECK_MC_LB;
+		}
+	}
+
 	if (attr & MLX4_UPDATE_QP_VSD) {
 		qp_mask |= 1ULL << MLX4_UPD_QP_MASK_VSD;
 		if (params->flags & MLX4_UPDATE_QP_PARAMS_FLAGS_VSD_ENABLE)
@@ -458,7 +475,7 @@ int mlx4_update_qp(struct mlx4_dev *dev, u32 qpn,
 	err = mlx4_cmd(dev, mailbox->dma, qpn & 0xffffff, 0,
 		       MLX4_CMD_UPDATE_QP, MLX4_CMD_TIME_CLASS_A,
 		       MLX4_CMD_NATIVE);
-
+out:
 	mlx4_free_cmd_mailbox(dev, mailbox);
 	return err;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 731423c..502f335 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -770,9 +770,12 @@ static int update_vport_qp_param(struct mlx4_dev *dev,
 			}
 		}
 
+		/* preserve IF_COUNTER flag */
+		qpc->pri_path.vlan_control &=
+			MLX4_CTRL_ETH_SRC_CHECK_IF_COUNTER;
 		if (vp_oper->state.link_state == IFLA_VF_LINK_STATE_DISABLE &&
 		    dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_UPDATE_QP) {
-			qpc->pri_path.vlan_control =
+			qpc->pri_path.vlan_control |=
 				MLX4_VLAN_CTRL_ETH_TX_BLOCK_TAGGED |
 				MLX4_VLAN_CTRL_ETH_TX_BLOCK_PRIO_TAGGED |
 				MLX4_VLAN_CTRL_ETH_TX_BLOCK_UNTAGGED |
@@ -780,12 +783,12 @@ static int update_vport_qp_param(struct mlx4_dev *dev,
 				MLX4_VLAN_CTRL_ETH_RX_BLOCK_UNTAGGED |
 				MLX4_VLAN_CTRL_ETH_RX_BLOCK_TAGGED;
 		} else if (0 != vp_oper->state.default_vlan) {
-			qpc->pri_path.vlan_control =
+			qpc->pri_path.vlan_control |=
 				MLX4_VLAN_CTRL_ETH_TX_BLOCK_TAGGED |
 				MLX4_VLAN_CTRL_ETH_RX_BLOCK_PRIO_TAGGED |
 				MLX4_VLAN_CTRL_ETH_RX_BLOCK_UNTAGGED;
 		} else { /* priority tagged */
-			qpc->pri_path.vlan_control =
+			qpc->pri_path.vlan_control |=
 				MLX4_VLAN_CTRL_ETH_TX_BLOCK_TAGGED |
 				MLX4_VLAN_CTRL_ETH_RX_BLOCK_TAGGED;
 		}
@@ -3762,9 +3765,6 @@ int mlx4_INIT2RTR_QP_wrapper(struct mlx4_dev *dev, int slave,
 	update_gid(dev, inbox, (u8)slave);
 	adjust_proxy_tun_qkey(dev, vhcr, qpc);
 	orig_sched_queue = qpc->pri_path.sched_queue;
-	err = update_vport_qp_param(dev, inbox, slave, qpn);
-	if (err)
-		return err;
 
 	err = get_res(dev, slave, qpn, RES_QP, &qp);
 	if (err)
@@ -3774,6 +3774,10 @@ int mlx4_INIT2RTR_QP_wrapper(struct mlx4_dev *dev, int slave,
 		goto out;
 	}
 
+	err = update_vport_qp_param(dev, inbox, slave, qpn);
+	if (err)
+		goto out;
+
 	err = mlx4_DMA_wrapper(dev, slave, vhcr, inbox, outbox, cmd);
 out:
 	/* if no error, save sched queue value passed in by VF. This is
@@ -4208,7 +4212,9 @@ static int add_eth_header(struct mlx4_dev *dev, int slave,
 
 }
 
-#define MLX4_UPD_QP_PATH_MASK_SUPPORTED (1ULL << MLX4_UPD_QP_PATH_MASK_MAC_INDEX)
+#define MLX4_UPD_QP_PATH_MASK_SUPPORTED      (                                \
+	1ULL << MLX4_UPD_QP_PATH_MASK_MAC_INDEX                     |\
+	1ULL << MLX4_UPD_QP_PATH_MASK_ETH_SRC_CHECK_MC_LB)
 int mlx4_UPDATE_QP_wrapper(struct mlx4_dev *dev, int slave,
 			   struct mlx4_vhcr *vhcr,
 			   struct mlx4_cmd_mailbox *inbox,
@@ -4231,6 +4237,16 @@ int mlx4_UPDATE_QP_wrapper(struct mlx4_dev *dev, int slave,
 	    (pri_addr_path_mask & ~MLX4_UPD_QP_PATH_MASK_SUPPORTED))
 		return -EPERM;
 
+	if ((pri_addr_path_mask &
+	     (1ULL << MLX4_UPD_QP_PATH_MASK_ETH_SRC_CHECK_MC_LB)) &&
+		!(dev->caps.flags2 &
+		  MLX4_DEV_CAP_FLAG2_UPDATE_QP_SRC_CHECK_LB)) {
+			mlx4_warn(dev,
+				  "Src check LB for slave %d isn't supported\n",
+				   slave);
+		return -ENOTSUPP;
+	}
+
 	/* Just change the smac for the QP */
 	err = get_res(dev, slave, qpn, RES_QP, &rqp);
 	if (err) {
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index baad4cb..dac6872 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -214,6 +214,8 @@ enum {
 	MLX4_DEV_CAP_FLAG2_IGNORE_FCS		= 1LL <<  28,
 	MLX4_DEV_CAP_FLAG2_PHV_EN		= 1LL <<  29,
 	MLX4_DEV_CAP_FLAG2_SKIP_OUTER_VLAN	= 1LL <<  30,
+	MLX4_DEV_CAP_FLAG2_UPDATE_QP_SRC_CHECK_LB = 1ULL << 31,
+	MLX4_DEV_CAP_FLAG2_LB_SRC_CHK           = 1ULL << 32,
 };
 
 enum {
diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h
index de45a51..fe052e2 100644
--- a/include/linux/mlx4/qp.h
+++ b/include/linux/mlx4/qp.h
@@ -135,7 +135,10 @@ struct mlx4_rss_context {
 
 struct mlx4_qp_path {
 	u8			fl;
-	u8			vlan_control;
+	union {
+		u8			vlan_control;
+		u8			control;
+	};
 	u8			disable_pkey_check;
 	u8			pkey_index;
 	u8			counter_index;
@@ -156,9 +159,16 @@ struct mlx4_qp_path {
 };
 
 enum { /* fl */
-	MLX4_FL_CV      = 1 << 6,
-	MLX4_FL_ETH_HIDE_CQE_VLAN       = 1 << 2
+	MLX4_FL_CV	= 1 << 6,
+	MLX4_FL_ETH_HIDE_CQE_VLAN	= 1 << 2,
+	MLX4_FL_ETH_SRC_CHECK_MC_LB	= 1 << 1,
+	MLX4_FL_ETH_SRC_CHECK_UC_LB	= 1 << 0,
 };
+
+enum { /* control */
+	MLX4_CTRL_ETH_SRC_CHECK_IF_COUNTER	= 1 << 7,
+};
+
 enum { /* vlan_control */
 	MLX4_VLAN_CTRL_ETH_TX_BLOCK_TAGGED	= 1 << 6,
 	MLX4_VLAN_CTRL_ETH_TX_BLOCK_PRIO_TAGGED	= 1 << 5, /* 802.1p priority tag */
@@ -254,6 +264,8 @@ enum {
 	MLX4_UPD_QP_PATH_MASK_SCHED_QUEUE		= 14 + 32,
 	MLX4_UPD_QP_PATH_MASK_IF_COUNTER_INDEX		= 15 + 32,
 	MLX4_UPD_QP_PATH_MASK_FVL_RX			= 16 + 32,
+	MLX4_UPD_QP_PATH_MASK_ETH_SRC_CHECK_UC_LB	= 18 + 32,
+	MLX4_UPD_QP_PATH_MASK_ETH_SRC_CHECK_MC_LB	= 19 + 32,
 };
 
 enum { /* param3 */
@@ -436,11 +448,13 @@ enum mlx4_update_qp_attr {
 	MLX4_UPDATE_QP_VSD		= 1 << 1,
 	MLX4_UPDATE_QP_RATE_LIMIT	= 1 << 2,
 	MLX4_UPDATE_QP_QOS_VPORT	= 1 << 3,
-	MLX4_UPDATE_QP_SUPPORTED_ATTRS	= (1 << 4) - 1
+	MLX4_UPDATE_QP_ETH_SRC_CHECK_MC_LB      = 1 << 4,
+	MLX4_UPDATE_QP_SUPPORTED_ATTRS	= (1 << 5) - 1
 };
 
 enum mlx4_update_qp_params_flags {
-	MLX4_UPDATE_QP_PARAMS_FLAGS_VSD_ENABLE		= 1 << 0,
+	MLX4_UPDATE_QP_PARAMS_FLAGS_ETH_CHECK_MC_LB     = 1 << 0,
+	MLX4_UPDATE_QP_PARAMS_FLAGS_VSD_ENABLE		= 1 << 1,
 };
 
 struct mlx4_update_qp_params {
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 for-next 4/7] net/mlx4_en: Implement mcast loopback prevention for ETH qps
       [not found] ` <1444909482-17113-1-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-10-15 11:44   ` [PATCH v2 for-next 3/7] net/mlx4_core: Add support for filtering multicast loopback Eran Ben Elisha
@ 2015-10-15 11:44   ` Eran Ben Elisha
  2015-10-15 11:44   ` [PATCH v2 for-next 5/7] IB/mlx4: Add IB counters table Eran Ben Elisha
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Eran Ben Elisha @ 2015-10-15 11:44 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Eran Ben Elisha,
	Christoph Lameter, Maor Gottlieb

From: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Set the mcast loopback prevention bit in the QPC for ETH MLX QPs (not
RSS QPs), when the firmware supports this feature. In addition, all rx
ring QPs need to be updated in order not to enforce loopback checks.
This prevents getting packets we sent both from the network stack and
the HCA. Loopback prevention is done by comparing the counter indices of
the sent and receiving QPs. If they're equal, packets aren't
loopback-ed.

Signed-off-by: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Eran Ben Elisha <eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx4/en_main.c      | 22 ++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/en_resources.c | 25 +++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h      |  3 ++-
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index a946e4b..005f910 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -123,6 +123,28 @@ void mlx4_en_update_loopback_state(struct net_device *dev,
 	 */
 	if (mlx4_is_mfunc(priv->mdev->dev) || priv->validate_loopback)
 		priv->flags |= MLX4_EN_FLAG_ENABLE_HW_LOOPBACK;
+
+	mutex_lock(&priv->mdev->state_lock);
+	if (priv->mdev->dev->caps.flags2 &
+	    MLX4_DEV_CAP_FLAG2_UPDATE_QP_SRC_CHECK_LB &&
+	    priv->rss_map.indir_qp.qpn) {
+		int i;
+		int err = 0;
+		int loopback = !!(features & NETIF_F_LOOPBACK);
+
+		for (i = 0; i < priv->rx_ring_num; i++) {
+			int ret;
+
+			ret = mlx4_en_change_mcast_lb(priv,
+						      &priv->rss_map.qps[i],
+						      loopback);
+			if (!err)
+				err = ret;
+		}
+		if (err)
+			mlx4_warn(priv->mdev, "failed to change mcast loopback\n");
+	}
+	mutex_unlock(&priv->mdev->state_lock);
 }
 
 static int mlx4_en_get_profile(struct mlx4_en_dev *mdev)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_resources.c b/drivers/net/ethernet/mellanox/mlx4/en_resources.c
index e482fa1b..12aab5a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_resources.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_resources.c
@@ -69,6 +69,15 @@ void mlx4_en_fill_qp_context(struct mlx4_en_priv *priv, int size, int stride,
 	context->pri_path.counter_index = priv->counter_index;
 	context->cqn_send = cpu_to_be32(cqn);
 	context->cqn_recv = cpu_to_be32(cqn);
+	if (!rss &&
+	    (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_LB_SRC_CHK) &&
+	    context->pri_path.counter_index !=
+			    MLX4_SINK_COUNTER_INDEX(mdev->dev)) {
+		/* disable multicast loopback to qp with same counter */
+		if (!(dev->features & NETIF_F_LOOPBACK))
+			context->pri_path.fl |= MLX4_FL_ETH_SRC_CHECK_MC_LB;
+		context->pri_path.control |= MLX4_CTRL_ETH_SRC_CHECK_IF_COUNTER;
+	}
 	context->db_rec_addr = cpu_to_be64(priv->res.db.dma << 2);
 	if (!(dev->features & NETIF_F_HW_VLAN_CTAG_RX))
 		context->param3 |= cpu_to_be32(1 << 30);
@@ -80,6 +89,22 @@ void mlx4_en_fill_qp_context(struct mlx4_en_priv *priv, int size, int stride,
 	}
 }
 
+int mlx4_en_change_mcast_lb(struct mlx4_en_priv *priv, struct mlx4_qp *qp,
+			    int loopback)
+{
+	int ret;
+	struct mlx4_update_qp_params qp_params;
+
+	memset(&qp_params, 0, sizeof(qp_params));
+	if (!loopback)
+		qp_params.flags = MLX4_UPDATE_QP_PARAMS_FLAGS_ETH_CHECK_MC_LB;
+
+	ret = mlx4_update_qp(priv->mdev->dev, qp->qpn,
+			     MLX4_UPDATE_QP_ETH_SRC_CHECK_MC_LB,
+			     &qp_params);
+
+	return ret;
+}
 
 int mlx4_en_map_buffer(struct mlx4_buf *buf)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index defcf8c..c41f151 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -798,7 +798,8 @@ void mlx4_en_fill_qp_context(struct mlx4_en_priv *priv, int size, int stride,
 void mlx4_en_sqp_event(struct mlx4_qp *qp, enum mlx4_event event);
 int mlx4_en_map_buffer(struct mlx4_buf *buf);
 void mlx4_en_unmap_buffer(struct mlx4_buf *buf);
-
+int mlx4_en_change_mcast_lb(struct mlx4_en_priv *priv, struct mlx4_qp *qp,
+			    int loopback);
 void mlx4_en_calc_rx_buf(struct net_device *dev);
 int mlx4_en_config_rss_steer(struct mlx4_en_priv *priv);
 void mlx4_en_release_rss_steer(struct mlx4_en_priv *priv);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 for-next 5/7] IB/mlx4: Add IB counters table
       [not found] ` <1444909482-17113-1-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2015-10-15 11:44   ` [PATCH v2 for-next 4/7] net/mlx4_en: Implement mcast loopback prevention for ETH qps Eran Ben Elisha
@ 2015-10-15 11:44   ` Eran Ben Elisha
       [not found]     ` <1444909482-17113-6-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-10-15 11:44   ` [PATCH v2 for-next 6/7] IB/mlx4: Add counter based implementation for QP multicast loopback block Eran Ben Elisha
  2015-10-15 11:44   ` [PATCH v2 for-next 7/7] IB/mlx4: Add support for blocking multicast loopback QP creation user flag Eran Ben Elisha
  6 siblings, 1 reply; 29+ messages in thread
From: Eran Ben Elisha @ 2015-10-15 11:44 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Eran Ben Elisha,
	Christoph Lameter

This is an infrastructure step for allocating and attaching more than
one counter to QPs on the same port. Allocate a counters table and
manage the insertion and removals of the counters in load and unload of
mlx4 IB.

Signed-off-by: Eran Ben Elisha <eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/mad.c     | 25 ++++++++++----
 drivers/infiniband/hw/mlx4/main.c    | 63 ++++++++++++++++++++++++++++--------
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  9 +++++-
 drivers/infiniband/hw/mlx4/qp.c      |  8 +++--
 4 files changed, 81 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index 1cd75ff..68f2567 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -824,18 +824,29 @@ static int iboe_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 {
 	struct mlx4_counter counter_stats;
 	struct mlx4_ib_dev *dev = to_mdev(ibdev);
-	int err;
+	struct counter_index *tmp_counter;
+	int err = IB_MAD_RESULT_FAILURE, stats_avail = 0;
 
 	if (in_mad->mad_hdr.mgmt_class != IB_MGMT_CLASS_PERF_MGMT)
 		return -EINVAL;
 
 	memset(&counter_stats, 0, sizeof(counter_stats));
-	err = mlx4_get_counter_stats(dev->dev,
-				     dev->counters[port_num - 1].index,
-				     &counter_stats, 0);
-	if (err)
-		err = IB_MAD_RESULT_FAILURE;
-	else {
+	mutex_lock(&dev->counters_table[port_num - 1].mutex);
+	list_for_each_entry(tmp_counter,
+			    &dev->counters_table[port_num - 1].counters_list,
+			    list) {
+		err = mlx4_get_counter_stats(dev->dev,
+					     tmp_counter->index,
+					     &counter_stats, 0);
+		if (err) {
+			err = IB_MAD_RESULT_FAILURE;
+			stats_avail = 0;
+			break;
+		}
+		stats_avail = 1;
+	}
+	mutex_unlock(&dev->counters_table[port_num - 1].mutex);
+	if (stats_avail) {
 		memset(out_mad->data, 0, sizeof out_mad->data);
 		switch (counter_stats.counter_mode & 0xf) {
 		case 0:
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 38be8dc..232b104 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1249,6 +1249,22 @@ static int add_gid_entry(struct ib_qp *ibqp, union ib_gid *gid)
 	return 0;
 }
 
+static void mlx4_ib_delete_counters_table(struct mlx4_ib_dev *ibdev,
+					  struct mlx4_ib_counters *ctr_table)
+{
+	struct counter_index *counter, *tmp_count;
+
+	mutex_lock(&ctr_table->mutex);
+	list_for_each_entry_safe(counter, tmp_count, &ctr_table->counters_list,
+				 list) {
+		if (counter->allocated)
+			mlx4_counter_free(ibdev->dev, counter->index);
+		list_del(&counter->list);
+		kfree(counter);
+	}
+	mutex_unlock(&ctr_table->mutex);
+}
+
 int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 		   union ib_gid *gid)
 {
@@ -2133,6 +2149,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	int num_req_counters;
 	int allocated;
 	u32 counter_index;
+	struct counter_index *new_counter_index = NULL;
 
 	pr_info_once("%s", mlx4_ib_version);
 
@@ -2304,6 +2321,11 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	if (init_node_data(ibdev))
 		goto err_map;
 
+	for (i = 0; i < ibdev->num_ports; ++i) {
+		mutex_init(&ibdev->counters_table[i].mutex);
+		INIT_LIST_HEAD(&ibdev->counters_table[i].counters_list);
+	}
+
 	num_req_counters = mlx4_is_bonded(dev) ? 1 : ibdev->num_ports;
 	for (i = 0; i < num_req_counters; ++i) {
 		mutex_init(&ibdev->qp1_proxy_lock[i]);
@@ -2322,15 +2344,34 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 			counter_index = mlx4_get_default_counter_index(dev,
 								       i + 1);
 		}
-		ibdev->counters[i].index = counter_index;
-		ibdev->counters[i].allocated = allocated;
+		new_counter_index = kmalloc(sizeof(*new_counter_index),
+					    GFP_KERNEL);
+		if (!new_counter_index) {
+			if (allocated)
+				mlx4_counter_free(ibdev->dev, counter_index);
+			goto err_counter;
+		}
+		new_counter_index->index = counter_index;
+		new_counter_index->allocated = allocated;
+		list_add_tail(&new_counter_index->list,
+			      &ibdev->counters_table[i].counters_list);
+		ibdev->counters_table[i].default_counter = counter_index;
 		pr_info("counter index %d for port %d allocated %d\n",
 			counter_index, i + 1, allocated);
 	}
 	if (mlx4_is_bonded(dev))
 		for (i = 1; i < ibdev->num_ports ; ++i) {
-			ibdev->counters[i].index = ibdev->counters[0].index;
-			ibdev->counters[i].allocated = 0;
+			new_counter_index =
+					kmalloc(sizeof(struct counter_index),
+						GFP_KERNEL);
+			if (!new_counter_index)
+				goto err_counter;
+			new_counter_index->index = counter_index;
+			new_counter_index->allocated = 0;
+			list_add_tail(&new_counter_index->list,
+				      &ibdev->counters_table[i].counters_list);
+			ibdev->counters_table[i].default_counter =
+								counter_index;
 		}
 
 	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
@@ -2439,12 +2480,9 @@ err_steer_qp_release:
 		mlx4_qp_release_range(dev, ibdev->steer_qpn_base,
 				      ibdev->steer_qpn_count);
 err_counter:
-	for (i = 0; i < ibdev->num_ports; ++i) {
-		if (ibdev->counters[i].index != -1 &&
-		    ibdev->counters[i].allocated)
-			mlx4_counter_free(ibdev->dev,
-					  ibdev->counters[i].index);
-	}
+	for (i = 0; i < ibdev->num_ports; ++i)
+		mlx4_ib_delete_counters_table(ibdev, &ibdev->counters_table[i]);
+
 err_map:
 	iounmap(ibdev->uar_map);
 
@@ -2548,9 +2586,8 @@ static void mlx4_ib_remove(struct mlx4_dev *dev, void *ibdev_ptr)
 
 	iounmap(ibdev->uar_map);
 	for (p = 0; p < ibdev->num_ports; ++p)
-		if (ibdev->counters[p].index != -1 &&
-		    ibdev->counters[p].allocated)
-			mlx4_counter_free(ibdev->dev, ibdev->counters[p].index);
+		mlx4_ib_delete_counters_table(ibdev, &ibdev->counters_table[p]);
+
 	mlx4_foreach_port(p, dev, MLX4_PORT_TYPE_IB)
 		mlx4_CLOSE_PORT(dev, p);
 
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 1e7b23b..4056dc1 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -528,10 +528,17 @@ struct mlx4_ib_iov_port {
 };
 
 struct counter_index {
+	struct  list_head       list;
 	u32		index;
 	u8		allocated;
 };
 
+struct mlx4_ib_counters {
+	struct list_head        counters_list;
+	struct mutex            mutex; /* mutex for accessing counters list */
+	u32			default_counter;
+};
+
 struct mlx4_ib_dev {
 	struct ib_device	ib_dev;
 	struct mlx4_dev	       *dev;
@@ -550,7 +557,7 @@ struct mlx4_ib_dev {
 	struct mutex		cap_mask_mutex;
 	bool			ib_active;
 	struct mlx4_ib_iboe	iboe;
-	struct counter_index    counters[MLX4_MAX_PORTS];
+	struct mlx4_ib_counters counters_table[MLX4_MAX_PORTS];
 	int		       *eq_table;
 	struct kobject	       *iov_parent;
 	struct kobject	       *ports_parent;
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 4ad9be3..382913e 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1460,6 +1460,7 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 	int sqd_event;
 	int steer_qp = 0;
 	int err = -EINVAL;
+	int counter_index;
 
 	/* APM is not supported under RoCE */
 	if (attr_mask & IB_QP_ALT_PATH &&
@@ -1543,9 +1544,10 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 	}
 
 	if (cur_state == IB_QPS_INIT && new_state == IB_QPS_RTR) {
-		if (dev->counters[qp->port - 1].index != -1) {
-			context->pri_path.counter_index =
-					dev->counters[qp->port - 1].index;
+		counter_index =
+			dev->counters_table[qp->port - 1].default_counter;
+		if (counter_index != -1) {
+			context->pri_path.counter_index = counter_index;
 			optpar |= MLX4_QP_OPTPAR_COUNTER_INDEX;
 		} else
 			context->pri_path.counter_index =
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 for-next 6/7] IB/mlx4: Add counter based implementation for QP multicast loopback block
       [not found] ` <1444909482-17113-1-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2015-10-15 11:44   ` [PATCH v2 for-next 5/7] IB/mlx4: Add IB counters table Eran Ben Elisha
@ 2015-10-15 11:44   ` Eran Ben Elisha
  2015-10-15 11:44   ` [PATCH v2 for-next 7/7] IB/mlx4: Add support for blocking multicast loopback QP creation user flag Eran Ben Elisha
  6 siblings, 0 replies; 29+ messages in thread
From: Eran Ben Elisha @ 2015-10-15 11:44 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Eran Ben Elisha,
	Christoph Lameter

Current implementation for MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK is not
supported when link layer is Ethernet.

This patch will add counter based implementation for multicast loopback
prevention. HW can drop multicast loopback packets if sender QP counter
index is equal to receiver QP counter index. If qp flag
MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK is set and link layer is Ethernet,
create a new counter and attach it to the QP so it will continue
receiving multicast loopback traffic but it's own.

The decision if to create a new counter is being made at the qp
modification to RTR after the QP's port is set. When QP is destroyed or
moved back to reset state, delete the counter.

Signed-off-by: Eran Ben Elisha <eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  1 +
 drivers/infiniband/hw/mlx4/qp.c      | 67 ++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 4056dc1..086416a 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -320,6 +320,7 @@ struct mlx4_ib_qp {
 	struct list_head	qps_list;
 	struct list_head	cq_recv_list;
 	struct list_head	cq_send_list;
+	struct counter_index	*counter_index;
 };
 
 struct mlx4_ib_srq {
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 382913e..2871949 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -617,6 +617,18 @@ static int qp0_enabled_vf(struct mlx4_dev *dev, int qpn)
 	return 0;
 }
 
+static void mlx4_ib_free_qp_counter(struct mlx4_ib_dev *dev,
+				    struct mlx4_ib_qp *qp)
+{
+	mutex_lock(&dev->counters_table[qp->port - 1].mutex);
+	mlx4_counter_free(dev->dev, qp->counter_index->index);
+	list_del(&qp->counter_index->list);
+	mutex_unlock(&dev->counters_table[qp->port - 1].mutex);
+
+	kfree(qp->counter_index);
+	qp->counter_index = NULL;
+}
+
 static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 			    struct ib_qp_init_attr *init_attr,
 			    struct ib_udata *udata, int sqpn, struct mlx4_ib_qp **caller_qp,
@@ -1189,6 +1201,9 @@ int mlx4_ib_destroy_qp(struct ib_qp *qp)
 		mutex_unlock(&dev->qp1_proxy_lock[mqp->port - 1]);
 	}
 
+	if (mqp->counter_index)
+		mlx4_ib_free_qp_counter(dev, mqp);
+
 	pd = get_pd(mqp);
 	destroy_qp_common(dev, mqp, !!pd->ibpd.uobject);
 
@@ -1447,6 +1462,40 @@ static int handle_eth_ud_smac_index(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *
 	return 0;
 }
 
+static int create_qp_lb_counter(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp)
+{
+	struct counter_index *new_counter_index;
+	int err;
+	u32 tmp_idx;
+
+	if (rdma_port_get_link_layer(&dev->ib_dev, qp->port) !=
+	    IB_LINK_LAYER_ETHERNET ||
+	    !(qp->flags & MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK) ||
+	    !(dev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_LB_SRC_CHK))
+		return 0;
+
+	err = mlx4_counter_alloc(dev->dev, &tmp_idx);
+	if (err)
+		return err;
+
+	new_counter_index = kmalloc(sizeof(*new_counter_index), GFP_KERNEL);
+	if (!new_counter_index) {
+		mlx4_counter_free(dev->dev, tmp_idx);
+		return -ENOMEM;
+	}
+
+	new_counter_index->index = tmp_idx;
+	new_counter_index->allocated = 1;
+	qp->counter_index = new_counter_index;
+
+	mutex_lock(&dev->counters_table[qp->port - 1].mutex);
+	list_add_tail(&new_counter_index->list,
+		      &dev->counters_table[qp->port - 1].counters_list);
+	mutex_unlock(&dev->counters_table[qp->port - 1].mutex);
+
+	return 0;
+}
+
 static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 			       const struct ib_qp_attr *attr, int attr_mask,
 			       enum ib_qp_state cur_state, enum ib_qp_state new_state)
@@ -1520,6 +1569,9 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 		context->sq_size_stride = ilog2(qp->sq.wqe_cnt) << 3;
 	context->sq_size_stride |= qp->sq.wqe_shift - 4;
 
+	if (new_state == IB_QPS_RESET && qp->counter_index)
+		mlx4_ib_free_qp_counter(dev, qp);
+
 	if (cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT) {
 		context->sq_size_stride |= !!qp->sq_no_prefetch << 7;
 		context->xrcd = cpu_to_be32((u32) qp->xrcdn);
@@ -1544,11 +1596,24 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 	}
 
 	if (cur_state == IB_QPS_INIT && new_state == IB_QPS_RTR) {
+		err = create_qp_lb_counter(dev, qp);
+		if (err)
+			goto out;
+
 		counter_index =
 			dev->counters_table[qp->port - 1].default_counter;
+		if (qp->counter_index)
+			counter_index = qp->counter_index->index;
+
 		if (counter_index != -1) {
 			context->pri_path.counter_index = counter_index;
 			optpar |= MLX4_QP_OPTPAR_COUNTER_INDEX;
+			if (qp->counter_index) {
+				context->pri_path.fl |=
+					MLX4_FL_ETH_SRC_CHECK_MC_LB;
+				context->pri_path.vlan_control |=
+					MLX4_CTRL_ETH_SRC_CHECK_IF_COUNTER;
+			}
 		} else
 			context->pri_path.counter_index =
 				MLX4_SINK_COUNTER_INDEX(dev->dev);
@@ -1850,6 +1915,8 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 		}
 	}
 out:
+	if (err && qp->counter_index)
+		mlx4_ib_free_qp_counter(dev, qp);
 	if (err && steer_qp)
 		mlx4_ib_steer_qp_reg(dev, qp, 0);
 	kfree(context);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 for-next 7/7] IB/mlx4: Add support for blocking multicast loopback QP creation user flag
       [not found] ` <1444909482-17113-1-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2015-10-15 11:44   ` [PATCH v2 for-next 6/7] IB/mlx4: Add counter based implementation for QP multicast loopback block Eran Ben Elisha
@ 2015-10-15 11:44   ` Eran Ben Elisha
  6 siblings, 0 replies; 29+ messages in thread
From: Eran Ben Elisha @ 2015-10-15 11:44 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Eran Ben Elisha,
	Christoph Lameter

MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK is now supported downstream.

In addition, this flag was supported only for IB_QPT_UD, now, with the
new implementation it is supported for all QP types.

Support IB_USER_VERBS_EX_CMD_CREATE_QP in order to get the flag from
user space using the extension create qp command.

Signed-off-by: Eran Ben Elisha <eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/main.c |  3 ++-
 drivers/infiniband/hw/mlx4/qp.c   | 13 ++++++++-----
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 232b104..8779d26 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2312,7 +2312,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 
 	ibdev->ib_dev.uverbs_ex_cmd_mask |=
 		(1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE) |
-		(1ull << IB_USER_VERBS_EX_CMD_CREATE_CQ);
+		(1ull << IB_USER_VERBS_EX_CMD_CREATE_CQ) |
+		(1ull << IB_USER_VERBS_EX_CMD_CREATE_QP);
 
 	mlx4_ib_alloc_eqs(dev, ibdev);
 
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 2871949..ba25a1b 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -758,9 +758,6 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 	} else {
 		qp->sq_no_prefetch = 0;
 
-		if (init_attr->create_flags & IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK)
-			qp->flags |= MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK;
-
 		if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO)
 			qp->flags |= MLX4_IB_QP_LSO;
 
@@ -834,6 +831,9 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 			goto err_proxy;
 	}
 
+	if (init_attr->create_flags & IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK)
+		qp->flags |= MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK;
+
 	err = mlx4_qp_alloc(dev->dev, qpn, &qp->mqp, gfp);
 	if (err)
 		goto err_qpn;
@@ -1098,6 +1098,7 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd,
 {
 	struct mlx4_ib_qp *qp = NULL;
 	int err;
+	int sup_u_create_flags = MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK;
 	u16 xrcdn = 0;
 	gfp_t gfp;
 
@@ -1121,8 +1122,10 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd,
 	}
 
 	if (init_attr->create_flags &&
-	    (udata ||
-	     ((init_attr->create_flags & ~(MLX4_IB_SRIOV_SQP | MLX4_IB_QP_CREATE_USE_GFP_NOIO)) &&
+	    ((udata && init_attr->create_flags & ~(sup_u_create_flags)) ||
+	     ((init_attr->create_flags & ~(MLX4_IB_SRIOV_SQP |
+					   MLX4_IB_QP_CREATE_USE_GFP_NOIO |
+					   MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK)) &&
 	      init_attr->qp_type != IB_QPT_UD) ||
 	     ((init_attr->create_flags & MLX4_IB_SRIOV_SQP) &&
 	      init_attr->qp_type > IB_QPT_GSI)))
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 1/7] IB/core: Extend ib_uverbs_create_qp
       [not found]     ` <1444909482-17113-2-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-10-21  8:34       ` Haggai Eran
       [not found]         ` <56274E02.7010002-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-10-21  9:53       ` Sagi Grimberg
  1 sibling, 1 reply; 29+ messages in thread
From: Haggai Eran @ 2015-10-21  8:34 UTC (permalink / raw)
  To: Eran Ben Elisha, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Christoph Lameter

On 15/10/2015 14:44, Eran Ben Elisha wrote:
> ib_uverbs_ex_create_qp follows the extension verbs
> mechanism. New features (for example, QP creation flags
> field which is added in a downstream patch) could used
> via user-space libraries without breaking the ABI.
> 
> Signed-off-by: Eran Ben Elisha <eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---

> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index be4cb9f..e795d59 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -1741,66 +1741,65 @@ ssize_t ib_uverbs_destroy_cq(struct ib_uverbs_file *file,
>  	return in_len;
>  }
>  
> -ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
> -			    struct ib_device *ib_dev,
> -			    const char __user *buf, int in_len,
> -			    int out_len)
> -{
> -	struct ib_uverbs_create_qp      cmd;
> -	struct ib_uverbs_create_qp_resp resp;
> -	struct ib_udata                 udata;
> -	struct ib_uqp_object           *obj;
> -	struct ib_device	       *device;
> -	struct ib_pd                   *pd = NULL;
> -	struct ib_xrcd		       *xrcd = NULL;
> -	struct ib_uobject	       *uninitialized_var(xrcd_uobj);
> -	struct ib_cq                   *scq = NULL, *rcq = NULL;
> -	struct ib_srq                  *srq = NULL;
> -	struct ib_qp                   *qp;
> -	struct ib_qp_init_attr          attr;
> -	int ret;
> -
> -	if (out_len < sizeof resp)
> -		return -ENOSPC;
> -
> -	if (copy_from_user(&cmd, buf, sizeof cmd))
> -		return -EFAULT;
> +static int create_qp(struct ib_uverbs_file *file,
> +		     struct ib_udata *ucore,
> +		     struct ib_udata *uhw,
> +		     struct ib_uverbs_ex_create_qp *cmd,
> +		     size_t cmd_sz,
> +		     int (*cb)(struct ib_uverbs_file *file,
> +			       struct ib_uverbs_ex_create_qp_resp *resp,
> +			       struct ib_udata *udata),
> +		     void *context)
> +{
> +	struct ib_uqp_object		*obj;
> +	struct ib_device		*device;
> +	struct ib_pd			*pd = NULL;
> +	struct ib_xrcd			*xrcd = NULL;
> +	struct ib_uobject		*uninitialized_var(xrcd_uobj);
> +	struct ib_cq			*scq = NULL, *rcq = NULL;
> +	struct ib_srq			*srq = NULL;
> +	struct ib_qp			*qp;
> +	char				*buf;
> +	struct ib_qp_init_attr		attr;
> +	struct ib_uverbs_ex_create_qp_resp resp;
> +	int				ret;
>  
> -	if (cmd.qp_type == IB_QPT_RAW_PACKET && !capable(CAP_NET_RAW))
> +	if (cmd->qp_type == IB_QPT_RAW_PACKET && !capable(CAP_NET_RAW))
>  		return -EPERM;
>  
> -	INIT_UDATA(&udata, buf + sizeof cmd,
> -		   (unsigned long) cmd.response + sizeof resp,
> -		   in_len - sizeof cmd, out_len - sizeof resp);
> -
>  	obj = kzalloc(sizeof *obj, GFP_KERNEL);
>  	if (!obj)
>  		return -ENOMEM;
>  
> -	init_uobj(&obj->uevent.uobject, cmd.user_handle, file->ucontext, &qp_lock_class);
> +	init_uobj(&obj->uevent.uobject, cmd->user_handle, file->ucontext,
> +		  &qp_lock_class);
>  	down_write(&obj->uevent.uobject.mutex);
>  
> -	if (cmd.qp_type == IB_QPT_XRC_TGT) {
> -		xrcd = idr_read_xrcd(cmd.pd_handle, file->ucontext, &xrcd_uobj);
> +	if (cmd->qp_type == IB_QPT_XRC_TGT) {
> +		xrcd = idr_read_xrcd(cmd->pd_handle, file->ucontext,
> +				     &xrcd_uobj);
>  		if (!xrcd) {
>  			ret = -EINVAL;
>  			goto err_put;
>  		}
>  		device = xrcd->device;
>  	} else {
> -		if (cmd.qp_type == IB_QPT_XRC_INI) {
> -			cmd.max_recv_wr = cmd.max_recv_sge = 0;
> +		if (cmd->qp_type == IB_QPT_XRC_INI) {
> +			cmd->max_recv_wr = 0;
> +			cmd->max_recv_sge = 0;
>  		} else {
> -			if (cmd.is_srq) {
> -				srq = idr_read_srq(cmd.srq_handle, file->ucontext);
> +			if (cmd->is_srq) {
> +				srq = idr_read_srq(cmd->srq_handle,
> +						   file->ucontext);
>  				if (!srq || srq->srq_type != IB_SRQT_BASIC) {
>  					ret = -EINVAL;
>  					goto err_put;
>  				}
>  			}
>  
> -			if (cmd.recv_cq_handle != cmd.send_cq_handle) {
> -				rcq = idr_read_cq(cmd.recv_cq_handle, file->ucontext, 0);
> +			if (cmd->recv_cq_handle != cmd->send_cq_handle) {
> +				rcq = idr_read_cq(cmd->recv_cq_handle,
> +						  file->ucontext, 0);
>  				if (!rcq) {
>  					ret = -EINVAL;
>  					goto err_put;
> @@ -1808,9 +1807,9 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
>  			}
>  		}
>  
> -		scq = idr_read_cq(cmd.send_cq_handle, file->ucontext, !!rcq);
> +		scq = idr_read_cq(cmd->send_cq_handle, file->ucontext, !!rcq);
>  		rcq = rcq ?: scq;
> -		pd  = idr_read_pd(cmd.pd_handle, file->ucontext);
> +		pd  = idr_read_pd(cmd->pd_handle, file->ucontext);
>  		if (!pd || !scq) {
>  			ret = -EINVAL;
>  			goto err_put;
> @@ -1825,31 +1824,54 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
>  	attr.recv_cq       = rcq;
>  	attr.srq           = srq;
>  	attr.xrcd	   = xrcd;
> -	attr.sq_sig_type   = cmd.sq_sig_all ? IB_SIGNAL_ALL_WR : IB_SIGNAL_REQ_WR;
> -	attr.qp_type       = cmd.qp_type;
> +	attr.sq_sig_type   = cmd->sq_sig_all ? IB_SIGNAL_ALL_WR :
> +					      IB_SIGNAL_REQ_WR;
> +	attr.qp_type       = cmd->qp_type;
>  	attr.create_flags  = 0;
>  
> -	attr.cap.max_send_wr     = cmd.max_send_wr;
> -	attr.cap.max_recv_wr     = cmd.max_recv_wr;
> -	attr.cap.max_send_sge    = cmd.max_send_sge;
> -	attr.cap.max_recv_sge    = cmd.max_recv_sge;
> -	attr.cap.max_inline_data = cmd.max_inline_data;
> +	attr.cap.max_send_wr     = cmd->max_send_wr;
> +	attr.cap.max_recv_wr     = cmd->max_recv_wr;
> +	attr.cap.max_send_sge    = cmd->max_send_sge;
> +	attr.cap.max_recv_sge    = cmd->max_recv_sge;
> +	attr.cap.max_inline_data = cmd->max_inline_data;
>  
>  	obj->uevent.events_reported     = 0;
>  	INIT_LIST_HEAD(&obj->uevent.event_list);
>  	INIT_LIST_HEAD(&obj->mcast_list);
>  
> -	if (cmd.qp_type == IB_QPT_XRC_TGT)
> +	if (cmd_sz >= offsetof(typeof(*cmd), create_flags) +
> +		      sizeof(cmd->create_flags))
> +		attr.create_flags = cmd->create_flags;
> +
> +	if (attr.create_flags) {
> +		ret = -EINVAL;
> +		goto err_put;
> +	}
> +
> +	if (cmd->comp_mask) {
Do you need this check in addition to the check in ib_uverbs_ex_create_qp()?

> +		ret = -EINVAL;
> +		goto err_put;
> +	}
> +
> +	buf = (void *)cmd + sizeof(*cmd);
> +	if (cmd_sz > sizeof(*cmd))
> +		if (!(buf[0] == 0 && !memcmp(buf, buf + 1,
> +					     cmd_sz - sizeof(*cmd) - 1))) {
> +			ret = -EINVAL;
> +			goto err_put;
> +		}
> +
> +	if (cmd->qp_type == IB_QPT_XRC_TGT)
>  		qp = ib_create_qp(pd, &attr);
>  	else
> -		qp = device->create_qp(pd, &attr, &udata);
> +		qp = device->create_qp(pd, &attr, uhw);
>  
>  	if (IS_ERR(qp)) {
>  		ret = PTR_ERR(qp);
>  		goto err_put;
>  	}
>  
> -	if (cmd.qp_type != IB_QPT_XRC_TGT) {
> +	if (cmd->qp_type != IB_QPT_XRC_TGT) {
>  		qp->real_qp	  = qp;
>  		qp->device	  = device;
>  		qp->pd		  = pd;
> @@ -1875,19 +1897,20 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
>  		goto err_destroy;
>  
>  	memset(&resp, 0, sizeof resp);
> -	resp.qpn             = qp->qp_num;
> -	resp.qp_handle       = obj->uevent.uobject.id;
> -	resp.max_recv_sge    = attr.cap.max_recv_sge;
> -	resp.max_send_sge    = attr.cap.max_send_sge;
> -	resp.max_recv_wr     = attr.cap.max_recv_wr;
> -	resp.max_send_wr     = attr.cap.max_send_wr;
> -	resp.max_inline_data = attr.cap.max_inline_data;
> +	resp.base.qpn             = qp->qp_num;
> +	resp.base.qp_handle       = obj->uevent.uobject.id;
> +	resp.base.max_recv_sge    = attr.cap.max_recv_sge;
> +	resp.base.max_send_sge    = attr.cap.max_send_sge;
> +	resp.base.max_recv_wr     = attr.cap.max_recv_wr;
> +	resp.base.max_send_wr     = attr.cap.max_send_wr;
> +	resp.base.max_inline_data = attr.cap.max_inline_data;
>  
> -	if (copy_to_user((void __user *) (unsigned long) cmd.response,
> -			 &resp, sizeof resp)) {
> -		ret = -EFAULT;
> -		goto err_copy;
> -	}
> +	resp.response_length = offsetof(typeof(resp), response_length) +
> +			       sizeof(resp.response_length);
> +
> +	ret = cb(file, &resp, ucore);
> +	if (ret)
> +		goto err_cb;
>  
>  	if (xrcd) {
>  		obj->uxrcd = container_of(xrcd_uobj, struct ib_uxrcd_object,
> @@ -1913,9 +1936,8 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
>  
>  	up_write(&obj->uevent.uobject.mutex);
>  
> -	return in_len;
> -
> -err_copy:
> +	return 0;
> +err_cb:
>  	idr_remove_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject);
>  
>  err_destroy:
> @@ -1937,6 +1959,113 @@ err_put:
>  	return ret;
>  }
>  
> +static int ib_uverbs_create_qp_cb(struct ib_uverbs_file *file,
> +				  struct ib_uverbs_ex_create_qp_resp *resp,
> +				  struct ib_udata *ucore)
> +{
> +	if (ib_copy_to_udata(ucore, &resp->base, sizeof(resp->base)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
> +			    struct ib_device *ib_dev,
> +			    const char __user *buf, int in_len,
> +			    int out_len)
> +{
> +	struct ib_uverbs_create_qp      cmd;
> +	struct ib_uverbs_ex_create_qp	cmd_ex;
> +	struct ib_udata			ucore;
> +	struct ib_udata			uhw;
> +	ssize_t resp_size = sizeof(struct ib_uverbs_create_qp_resp);
> +	int				err;

I would expect a check here that in_len >= sizeof(cmd). But I see the
previous code didn't have it either. Any reason adding the check would
break user-space?

> +
> +	if (out_len < resp_size)
> +		return -ENOSPC;
> +
> +	if (copy_from_user(&cmd, buf, sizeof(cmd)))
> +		return -EFAULT;
> +
> +	INIT_UDATA(&ucore, buf, (unsigned long)cmd.response, sizeof(cmd),
> +		   resp_size);
> +	INIT_UDATA(&uhw, buf + sizeof(cmd),
> +		   (unsigned long)cmd.response + resp_size,
> +		   in_len - sizeof(cmd), out_len - resp_size);
> +
> +	memset(&cmd_ex, 0, sizeof(cmd_ex));
> +	cmd_ex.user_handle = cmd.user_handle;
> +	cmd_ex.pd_handle = cmd.pd_handle;
> +	cmd_ex.send_cq_handle = cmd.send_cq_handle;
> +	cmd_ex.recv_cq_handle = cmd.recv_cq_handle;
> +	cmd_ex.srq_handle = cmd.srq_handle;
> +	cmd_ex.max_send_wr = cmd.max_send_wr;
> +	cmd_ex.max_recv_wr = cmd.max_recv_wr;
> +	cmd_ex.max_send_sge = cmd.max_send_sge;
> +	cmd_ex.max_recv_sge = cmd.max_recv_sge;
> +	cmd_ex.max_inline_data = cmd.max_inline_data;
> +	cmd_ex.sq_sig_all = cmd.sq_sig_all;
> +	cmd_ex.qp_type = cmd.qp_type;
> +	cmd_ex.is_srq = cmd.is_srq;
> +
> +	err = create_qp(file, &ucore, &uhw, &cmd_ex,
> +			offsetof(typeof(cmd_ex), is_srq) +
> +			sizeof(cmd.is_srq), ib_uverbs_create_qp_cb,
> +			NULL);
> +
> +	if (err)
> +		return err;
> +
> +	return in_len;
> +}

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 2/7] IB/core: Allow setting create flags in QP init attribute
       [not found]     ` <1444909482-17113-3-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-10-21  8:46       ` Haggai Eran
  0 siblings, 0 replies; 29+ messages in thread
From: Haggai Eran @ 2015-10-21  8:46 UTC (permalink / raw)
  To: Eran Ben Elisha, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Christoph Lameter

On 15/10/2015 14:44, Eran Ben Elisha wrote:
> Allow setting IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK at create_flags in
> ib_uverbs_create_qp_ex.
> 
> Signed-off-by: Eran Ben Elisha <eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
>  drivers/infiniband/core/uverbs_cmd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index e795d59..e9bafa3 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -1843,7 +1843,7 @@ static int create_qp(struct ib_uverbs_file *file,
>  		      sizeof(cmd->create_flags))
>  		attr.create_flags = cmd->create_flags;
>  
> -	if (attr.create_flags) {
> +	if (attr.create_flags & ~IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK) {
>  		ret = -EINVAL;
>  		goto err_put;
>  	}
> 

FWIW

Reviewed-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 1/7] IB/core: Extend ib_uverbs_create_qp
       [not found]     ` <1444909482-17113-2-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-10-21  8:34       ` Haggai Eran
@ 2015-10-21  9:53       ` Sagi Grimberg
       [not found]         ` <56276095.6020803-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  1 sibling, 1 reply; 29+ messages in thread
From: Sagi Grimberg @ 2015-10-21  9:53 UTC (permalink / raw)
  To: Eran Ben Elisha, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Christoph Lameter

On 10/15/2015 2:44 PM, Eran Ben Elisha wrote:
> ib_uverbs_ex_create_qp follows the extension verbs
> mechanism. New features (for example, QP creation flags
> field which is added in a downstream patch) could used
> via user-space libraries without breaking the ABI.

This is an important addition, I'll need it soon for stuff
I'm working on...

> +struct ib_uverbs_ex_create_qp {
> +	__u64 user_handle;
> +	__u32 pd_handle;
> +	__u32 send_cq_handle;
> +	__u32 recv_cq_handle;
> +	__u32 srq_handle;
> +	__u32 max_send_wr;
> +	__u32 max_recv_wr;
> +	__u32 max_send_sge;
> +	__u32 max_recv_sge;
> +	__u32 max_inline_data;
> +	__u8  sq_sig_all;
> +	__u8  qp_type;
> +	__u8  is_srq;
> +	__u8 reserved;
> +	__u32 comp_mask;
> +	__u32 create_flags;

Can we please make create_flags u64 to begin with?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 1/7] IB/core: Extend ib_uverbs_create_qp
       [not found]         ` <56276095.6020803-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-10-21 10:04           ` Or Gerlitz
       [not found]             ` <56276336.7000704-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-10-21 10:26           ` Haggai Eran
  1 sibling, 1 reply; 29+ messages in thread
From: Or Gerlitz @ 2015-10-21 10:04 UTC (permalink / raw)
  To: Sagi Grimberg, Eran Ben Elisha, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Christoph Lameter

On 10/21/2015 12:53 PM, Sagi Grimberg wrote:
> On 10/15/2015 2:44 PM, Eran Ben Elisha wrote:
>
>
>> +struct ib_uverbs_ex_create_qp {
>> +    __u64 user_handle;
>> +    __u32 pd_handle;
>> +    __u32 send_cq_handle;
>> +    __u32 recv_cq_handle;
>> +    __u32 srq_handle;
>> +    __u32 max_send_wr;
>> +    __u32 max_recv_wr;
>> +    __u32 max_send_sge;
>> +    __u32 max_recv_sge;
>> +    __u32 max_inline_data;
>> +    __u8  sq_sig_all;
>> +    __u8  qp_type;
>> +    __u8  is_srq;
>> +    __u8 reserved;
>> +    __u32 comp_mask;
>> +    __u32 create_flags;
>
> Can we please make create_flags u64 to begin with?


In the kernel we used 4-5 flags over 10 years, why worry?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 1/7] IB/core: Extend ib_uverbs_create_qp
       [not found]         ` <56276095.6020803-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-10-21 10:04           ` Or Gerlitz
@ 2015-10-21 10:26           ` Haggai Eran
  1 sibling, 0 replies; 29+ messages in thread
From: Haggai Eran @ 2015-10-21 10:26 UTC (permalink / raw)
  To: Sagi Grimberg, Eran Ben Elisha, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Christoph Lameter

On 21/10/2015 12:53, Sagi Grimberg wrote:
> On 10/15/2015 2:44 PM, Eran Ben Elisha wrote:
>> ib_uverbs_ex_create_qp follows the extension verbs
>> mechanism. New features (for example, QP creation flags
>> field which is added in a downstream patch) could used
>> via user-space libraries without breaking the ABI.
> 
> This is an important addition, I'll need it soon for stuff
> I'm working on...
> 
>> +struct ib_uverbs_ex_create_qp {
>> +    __u64 user_handle;
>> +    __u32 pd_handle;
>> +    __u32 send_cq_handle;
>> +    __u32 recv_cq_handle;
>> +    __u32 srq_handle;
>> +    __u32 max_send_wr;
>> +    __u32 max_recv_wr;
>> +    __u32 max_send_sge;
>> +    __u32 max_recv_sge;
>> +    __u32 max_inline_data;
>> +    __u8  sq_sig_all;
>> +    __u8  qp_type;
>> +    __u8  is_srq;
>> +    __u8 reserved;
>> +    __u32 comp_mask;
>> +    __u32 create_flags;
> 
> Can we please make create_flags u64 to begin with?
Note that the size of the struct must be a round number of 64-bit words
to avoid different sizeof values on 32-bit systems. If you change the
size of create_flags you'll need another 32-bit padding field in the struct.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 1/7] IB/core: Extend ib_uverbs_create_qp
       [not found]         ` <56274E02.7010002-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-10-21 13:46           ` eran ben elisha
       [not found]             ` <CAKHjkj=k6QSKj0gMe0z5=Kjbe2e7cVCbPhkq9Z457gpvnRyitg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: eran ben elisha @ 2015-10-21 13:46 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Eran Ben Elisha, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Or Gerlitz, Christoph Lameter

On Wed, Oct 21, 2015 at 11:34 AM, Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On 15/10/2015 14:44, Eran Ben Elisha wrote:
>> ib_uverbs_ex_create_qp follows the extension verbs
>> mechanism. New features (for example, QP creation flags
>> field which is added in a downstream patch) could used
>> via user-space libraries without breaking the ABI.
>>
>> Signed-off-by: Eran Ben Elisha <eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> ---
>
>> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
>> index be4cb9f..e795d59 100644
>> --- a/drivers/infiniband/core/uverbs_cmd.c
>> +++ b/drivers/infiniband/core/uverbs_cmd.c
>> @@ -1741,66 +1741,65 @@ ssize_t ib_uverbs_destroy_cq(struct ib_uverbs_file *file,
>>       return in_len;
>>  }
>>
>> -ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
>> -                         struct ib_device *ib_dev,
>> -                         const char __user *buf, int in_len,
>> -                         int out_len)
>> -{
>> -     struct ib_uverbs_create_qp      cmd;
>> -     struct ib_uverbs_create_qp_resp resp;
>> -     struct ib_udata                 udata;
>> -     struct ib_uqp_object           *obj;
>> -     struct ib_device               *device;
>> -     struct ib_pd                   *pd = NULL;
>> -     struct ib_xrcd                 *xrcd = NULL;
>> -     struct ib_uobject              *uninitialized_var(xrcd_uobj);
>> -     struct ib_cq                   *scq = NULL, *rcq = NULL;
>> -     struct ib_srq                  *srq = NULL;
>> -     struct ib_qp                   *qp;
>> -     struct ib_qp_init_attr          attr;
>> -     int ret;
>> -
>> -     if (out_len < sizeof resp)
>> -             return -ENOSPC;
>> -
>> -     if (copy_from_user(&cmd, buf, sizeof cmd))
>> -             return -EFAULT;
>> +static int create_qp(struct ib_uverbs_file *file,
>> +                  struct ib_udata *ucore,
>> +                  struct ib_udata *uhw,
>> +                  struct ib_uverbs_ex_create_qp *cmd,
>> +                  size_t cmd_sz,
>> +                  int (*cb)(struct ib_uverbs_file *file,
>> +                            struct ib_uverbs_ex_create_qp_resp *resp,
>> +                            struct ib_udata *udata),
>> +                  void *context)
>> +{
>> +     struct ib_uqp_object            *obj;
>> +     struct ib_device                *device;
>> +     struct ib_pd                    *pd = NULL;
>> +     struct ib_xrcd                  *xrcd = NULL;
>> +     struct ib_uobject               *uninitialized_var(xrcd_uobj);
>> +     struct ib_cq                    *scq = NULL, *rcq = NULL;
>> +     struct ib_srq                   *srq = NULL;
>> +     struct ib_qp                    *qp;
>> +     char                            *buf;
>> +     struct ib_qp_init_attr          attr;
>> +     struct ib_uverbs_ex_create_qp_resp resp;
>> +     int                             ret;
>>
>> -     if (cmd.qp_type == IB_QPT_RAW_PACKET && !capable(CAP_NET_RAW))
>> +     if (cmd->qp_type == IB_QPT_RAW_PACKET && !capable(CAP_NET_RAW))
>>               return -EPERM;
>>
>> -     INIT_UDATA(&udata, buf + sizeof cmd,
>> -                (unsigned long) cmd.response + sizeof resp,
>> -                in_len - sizeof cmd, out_len - sizeof resp);
>> -
>>       obj = kzalloc(sizeof *obj, GFP_KERNEL);
>>       if (!obj)
>>               return -ENOMEM;
>>
>> -     init_uobj(&obj->uevent.uobject, cmd.user_handle, file->ucontext, &qp_lock_class);
>> +     init_uobj(&obj->uevent.uobject, cmd->user_handle, file->ucontext,
>> +               &qp_lock_class);
>>       down_write(&obj->uevent.uobject.mutex);
>>
>> -     if (cmd.qp_type == IB_QPT_XRC_TGT) {
>> -             xrcd = idr_read_xrcd(cmd.pd_handle, file->ucontext, &xrcd_uobj);
>> +     if (cmd->qp_type == IB_QPT_XRC_TGT) {
>> +             xrcd = idr_read_xrcd(cmd->pd_handle, file->ucontext,
>> +                                  &xrcd_uobj);
>>               if (!xrcd) {
>>                       ret = -EINVAL;
>>                       goto err_put;
>>               }
>>               device = xrcd->device;
>>       } else {
>> -             if (cmd.qp_type == IB_QPT_XRC_INI) {
>> -                     cmd.max_recv_wr = cmd.max_recv_sge = 0;
>> +             if (cmd->qp_type == IB_QPT_XRC_INI) {
>> +                     cmd->max_recv_wr = 0;
>> +                     cmd->max_recv_sge = 0;
>>               } else {
>> -                     if (cmd.is_srq) {
>> -                             srq = idr_read_srq(cmd.srq_handle, file->ucontext);
>> +                     if (cmd->is_srq) {
>> +                             srq = idr_read_srq(cmd->srq_handle,
>> +                                                file->ucontext);
>>                               if (!srq || srq->srq_type != IB_SRQT_BASIC) {
>>                                       ret = -EINVAL;
>>                                       goto err_put;
>>                               }
>>                       }
>>
>> -                     if (cmd.recv_cq_handle != cmd.send_cq_handle) {
>> -                             rcq = idr_read_cq(cmd.recv_cq_handle, file->ucontext, 0);
>> +                     if (cmd->recv_cq_handle != cmd->send_cq_handle) {
>> +                             rcq = idr_read_cq(cmd->recv_cq_handle,
>> +                                               file->ucontext, 0);
>>                               if (!rcq) {
>>                                       ret = -EINVAL;
>>                                       goto err_put;
>> @@ -1808,9 +1807,9 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
>>                       }
>>               }
>>
>> -             scq = idr_read_cq(cmd.send_cq_handle, file->ucontext, !!rcq);
>> +             scq = idr_read_cq(cmd->send_cq_handle, file->ucontext, !!rcq);
>>               rcq = rcq ?: scq;
>> -             pd  = idr_read_pd(cmd.pd_handle, file->ucontext);
>> +             pd  = idr_read_pd(cmd->pd_handle, file->ucontext);
>>               if (!pd || !scq) {
>>                       ret = -EINVAL;
>>                       goto err_put;
>> @@ -1825,31 +1824,54 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
>>       attr.recv_cq       = rcq;
>>       attr.srq           = srq;
>>       attr.xrcd          = xrcd;
>> -     attr.sq_sig_type   = cmd.sq_sig_all ? IB_SIGNAL_ALL_WR : IB_SIGNAL_REQ_WR;
>> -     attr.qp_type       = cmd.qp_type;
>> +     attr.sq_sig_type   = cmd->sq_sig_all ? IB_SIGNAL_ALL_WR :
>> +                                           IB_SIGNAL_REQ_WR;
>> +     attr.qp_type       = cmd->qp_type;
>>       attr.create_flags  = 0;
>>
>> -     attr.cap.max_send_wr     = cmd.max_send_wr;
>> -     attr.cap.max_recv_wr     = cmd.max_recv_wr;
>> -     attr.cap.max_send_sge    = cmd.max_send_sge;
>> -     attr.cap.max_recv_sge    = cmd.max_recv_sge;
>> -     attr.cap.max_inline_data = cmd.max_inline_data;
>> +     attr.cap.max_send_wr     = cmd->max_send_wr;
>> +     attr.cap.max_recv_wr     = cmd->max_recv_wr;
>> +     attr.cap.max_send_sge    = cmd->max_send_sge;
>> +     attr.cap.max_recv_sge    = cmd->max_recv_sge;
>> +     attr.cap.max_inline_data = cmd->max_inline_data;
>>
>>       obj->uevent.events_reported     = 0;
>>       INIT_LIST_HEAD(&obj->uevent.event_list);
>>       INIT_LIST_HEAD(&obj->mcast_list);
>>
>> -     if (cmd.qp_type == IB_QPT_XRC_TGT)
>> +     if (cmd_sz >= offsetof(typeof(*cmd), create_flags) +
>> +                   sizeof(cmd->create_flags))
>> +             attr.create_flags = cmd->create_flags;
>> +
>> +     if (attr.create_flags) {
>> +             ret = -EINVAL;
>> +             goto err_put;
>> +     }
>> +
>> +     if (cmd->comp_mask) {
> Do you need this check in addition to the check in ib_uverbs_ex_create_qp()?

You are right, I will remove this extra check and resend.

>
>> +             ret = -EINVAL;
>> +             goto err_put;
>> +     }
>> +
>> +     buf = (void *)cmd + sizeof(*cmd);
>> +     if (cmd_sz > sizeof(*cmd))
>> +             if (!(buf[0] == 0 && !memcmp(buf, buf + 1,
>> +                                          cmd_sz - sizeof(*cmd) - 1))) {
>> +                     ret = -EINVAL;
>> +                     goto err_put;
>> +             }
>> +
>> +     if (cmd->qp_type == IB_QPT_XRC_TGT)
>>               qp = ib_create_qp(pd, &attr);
>>       else
>> -             qp = device->create_qp(pd, &attr, &udata);
>> +             qp = device->create_qp(pd, &attr, uhw);
>>
>>       if (IS_ERR(qp)) {
>>               ret = PTR_ERR(qp);
>>               goto err_put;
>>       }
>>
>> -     if (cmd.qp_type != IB_QPT_XRC_TGT) {
>> +     if (cmd->qp_type != IB_QPT_XRC_TGT) {
>>               qp->real_qp       = qp;
>>               qp->device        = device;
>>               qp->pd            = pd;
>> @@ -1875,19 +1897,20 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
>>               goto err_destroy;
>>
>>       memset(&resp, 0, sizeof resp);
>> -     resp.qpn             = qp->qp_num;
>> -     resp.qp_handle       = obj->uevent.uobject.id;
>> -     resp.max_recv_sge    = attr.cap.max_recv_sge;
>> -     resp.max_send_sge    = attr.cap.max_send_sge;
>> -     resp.max_recv_wr     = attr.cap.max_recv_wr;
>> -     resp.max_send_wr     = attr.cap.max_send_wr;
>> -     resp.max_inline_data = attr.cap.max_inline_data;
>> +     resp.base.qpn             = qp->qp_num;
>> +     resp.base.qp_handle       = obj->uevent.uobject.id;
>> +     resp.base.max_recv_sge    = attr.cap.max_recv_sge;
>> +     resp.base.max_send_sge    = attr.cap.max_send_sge;
>> +     resp.base.max_recv_wr     = attr.cap.max_recv_wr;
>> +     resp.base.max_send_wr     = attr.cap.max_send_wr;
>> +     resp.base.max_inline_data = attr.cap.max_inline_data;
>>
>> -     if (copy_to_user((void __user *) (unsigned long) cmd.response,
>> -                      &resp, sizeof resp)) {
>> -             ret = -EFAULT;
>> -             goto err_copy;
>> -     }
>> +     resp.response_length = offsetof(typeof(resp), response_length) +
>> +                            sizeof(resp.response_length);
>> +
>> +     ret = cb(file, &resp, ucore);
>> +     if (ret)
>> +             goto err_cb;
>>
>>       if (xrcd) {
>>               obj->uxrcd = container_of(xrcd_uobj, struct ib_uxrcd_object,
>> @@ -1913,9 +1936,8 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
>>
>>       up_write(&obj->uevent.uobject.mutex);
>>
>> -     return in_len;
>> -
>> -err_copy:
>> +     return 0;
>> +err_cb:
>>       idr_remove_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject);
>>
>>  err_destroy:
>> @@ -1937,6 +1959,113 @@ err_put:
>>       return ret;
>>  }
>>
>> +static int ib_uverbs_create_qp_cb(struct ib_uverbs_file *file,
>> +                               struct ib_uverbs_ex_create_qp_resp *resp,
>> +                               struct ib_udata *ucore)
>> +{
>> +     if (ib_copy_to_udata(ucore, &resp->base, sizeof(resp->base)))
>> +             return -EFAULT;
>> +
>> +     return 0;
>> +}
>> +
>> +ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
>> +                         struct ib_device *ib_dev,
>> +                         const char __user *buf, int in_len,
>> +                         int out_len)
>> +{
>> +     struct ib_uverbs_create_qp      cmd;
>> +     struct ib_uverbs_ex_create_qp   cmd_ex;
>> +     struct ib_udata                 ucore;
>> +     struct ib_udata                 uhw;
>> +     ssize_t resp_size = sizeof(struct ib_uverbs_create_qp_resp);
>> +     int                             err;
>
> I would expect a check here that in_len >= sizeof(cmd). But I see the
> previous code didn't have it either. Any reason adding the check would
> break user-space?

This patch just refactor in ib_uverbs_create_qp and doesn't change any
of it's logic or fix any bug. we can consider such a fix for the
future.

>
>> +
>> +     if (out_len < resp_size)
>> +             return -ENOSPC;
>> +
>> +     if (copy_from_user(&cmd, buf, sizeof(cmd)))
>> +             return -EFAULT;
>> +
>> +     INIT_UDATA(&ucore, buf, (unsigned long)cmd.response, sizeof(cmd),
>> +                resp_size);
>> +     INIT_UDATA(&uhw, buf + sizeof(cmd),
>> +                (unsigned long)cmd.response + resp_size,
>> +                in_len - sizeof(cmd), out_len - resp_size);
>> +
>> +     memset(&cmd_ex, 0, sizeof(cmd_ex));
>> +     cmd_ex.user_handle = cmd.user_handle;
>> +     cmd_ex.pd_handle = cmd.pd_handle;
>> +     cmd_ex.send_cq_handle = cmd.send_cq_handle;
>> +     cmd_ex.recv_cq_handle = cmd.recv_cq_handle;
>> +     cmd_ex.srq_handle = cmd.srq_handle;
>> +     cmd_ex.max_send_wr = cmd.max_send_wr;
>> +     cmd_ex.max_recv_wr = cmd.max_recv_wr;
>> +     cmd_ex.max_send_sge = cmd.max_send_sge;
>> +     cmd_ex.max_recv_sge = cmd.max_recv_sge;
>> +     cmd_ex.max_inline_data = cmd.max_inline_data;
>> +     cmd_ex.sq_sig_all = cmd.sq_sig_all;
>> +     cmd_ex.qp_type = cmd.qp_type;
>> +     cmd_ex.is_srq = cmd.is_srq;
>> +
>> +     err = create_qp(file, &ucore, &uhw, &cmd_ex,
>> +                     offsetof(typeof(cmd_ex), is_srq) +
>> +                     sizeof(cmd.is_srq), ib_uverbs_create_qp_cb,
>> +                     NULL);
>> +
>> +     if (err)
>> +             return err;
>> +
>> +     return in_len;
>> +}
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 1/7] IB/core: Extend ib_uverbs_create_qp
       [not found]             ` <56276336.7000704-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-10-21 14:09               ` Sagi Grimberg
       [not found]                 ` <56279CA4.8040201-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Sagi Grimberg @ 2015-10-21 14:09 UTC (permalink / raw)
  To: Or Gerlitz, Eran Ben Elisha, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Christoph Lameter

On 10/21/2015 1:04 PM, Or Gerlitz wrote:
> On 10/21/2015 12:53 PM, Sagi Grimberg wrote:
>> On 10/15/2015 2:44 PM, Eran Ben Elisha wrote:
>>
>>
>>> +struct ib_uverbs_ex_create_qp {
>>> +    __u64 user_handle;
>>> +    __u32 pd_handle;
>>> +    __u32 send_cq_handle;
>>> +    __u32 recv_cq_handle;
>>> +    __u32 srq_handle;
>>> +    __u32 max_send_wr;
>>> +    __u32 max_recv_wr;
>>> +    __u32 max_send_sge;
>>> +    __u32 max_recv_sge;
>>> +    __u32 max_inline_data;
>>> +    __u8  sq_sig_all;
>>> +    __u8  qp_type;
>>> +    __u8  is_srq;
>>> +    __u8 reserved;
>>> +    __u32 comp_mask;
>>> +    __u32 create_flags;
>>
>> Can we please make create_flags u64 to begin with?
>
>
> In the kernel we used 4-5 flags over 10 years, why worry?

I don't see why we should assume that we won't occupy enough bits in a
long time just for saving extra 4 bytes. But if you are strongly against
it then I won't persist here.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 1/7] IB/core: Extend ib_uverbs_create_qp
       [not found]             ` <CAKHjkj=k6QSKj0gMe0z5=Kjbe2e7cVCbPhkq9Z457gpvnRyitg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-10-21 14:42               ` Haggai Eran
  0 siblings, 0 replies; 29+ messages in thread
From: Haggai Eran @ 2015-10-21 14:42 UTC (permalink / raw)
  To: eran ben elisha
  Cc: Eran Ben Elisha, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Or Gerlitz, Christoph Lameter

On 21/10/2015 16:46, eran ben elisha wrote:
>>> +ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
>>> >> +                         struct ib_device *ib_dev,
>>> >> +                         const char __user *buf, int in_len,
>>> >> +                         int out_len)
>>> >> +{
>>> >> +     struct ib_uverbs_create_qp      cmd;
>>> >> +     struct ib_uverbs_ex_create_qp   cmd_ex;
>>> >> +     struct ib_udata                 ucore;
>>> >> +     struct ib_udata                 uhw;
>>> >> +     ssize_t resp_size = sizeof(struct ib_uverbs_create_qp_resp);
>>> >> +     int                             err;
>> >
>> > I would expect a check here that in_len >= sizeof(cmd). But I see the
>> > previous code didn't have it either. Any reason adding the check would
>> > break user-space?
> This patch just refactor in ib_uverbs_create_qp and doesn't change any
> of it's logic or fix any bug. we can consider such a fix for the
> future.

Fair enough.

Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 1/7] IB/core: Extend ib_uverbs_create_qp
       [not found]                 ` <56279CA4.8040201-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-10-21 18:15                   ` Or Gerlitz
       [not found]                     ` <CAJ3xEMgWcpKbKL3jhiK2RP=H6_iY0eriirAXn1TJaSG7u8frAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Or Gerlitz @ 2015-10-21 18:15 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Or Gerlitz, Eran Ben Elisha, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Christoph Lameter

On Wed, Oct 21, 2015 at 5:09 PM, Sagi Grimberg  wrote:
> On 10/21/2015 1:04 PM, Or Gerlitz wrote:
>> On 10/21/2015 12:53 PM, Sagi Grimberg wrote:
>>> On 10/15/2015 2:44 PM, Eran Ben Elisha wrote:

>>>> +struct ib_uverbs_ex_create_qp {
>>>> +    __u64 user_handle;
>>>> +    __u32 pd_handle;
>>>> +    __u32 send_cq_handle;
>>>> +    __u32 recv_cq_handle;
>>>> +    __u32 srq_handle;
>>>> +    __u32 max_send_wr;
>>>> +    __u32 max_recv_wr;
>>>> +    __u32 max_send_sge;
>>>> +    __u32 max_recv_sge;
>>>> +    __u32 max_inline_data;
>>>> +    __u8  sq_sig_all;
>>>> +    __u8  qp_type;
>>>> +    __u8  is_srq;
>>>> +    __u8 reserved;
>>>> +    __u32 comp_mask;
>>>> +    __u32 create_flags;
>>>
>>>
>>> Can we please make create_flags u64 to begin with?

>> In the kernel we used 4-5 flags over 10 years, why worry?

> I don't see why we should assume that we won't occupy enough bits in a
> long time just for saving extra 4 bytes. But if you are strongly against
> it then I won't persist here.

Again, the kernel stack consuming rate here was < 1 bit a year when
averaging over time since this was introduced. So we should be doing
well for the coming ~10-20 years with this 32 bit field, and we can
easily extend it later, I verified that with Haggai, so yes, don't
want 64 bits now.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 1/7] IB/core: Extend ib_uverbs_create_qp
       [not found]                     ` <CAJ3xEMgWcpKbKL3jhiK2RP=H6_iY0eriirAXn1TJaSG7u8frAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-10-21 18:21                       ` Christoph Lameter
       [not found]                         ` <alpine.DEB.2.20.1510211319520.10833-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Lameter @ 2015-10-21 18:21 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Sagi Grimberg, Or Gerlitz, Eran Ben Elisha, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, 21 Oct 2015, Or Gerlitz wrote:

> Again, the kernel stack consuming rate here was < 1 bit a year when
> averaging over time since this was introduced. So we should be doing
> well for the coming ~10-20 years with this 32 bit field, and we can
> easily extend it later, I verified that with Haggai, so yes, don't
> want 64 bits now.

Hmmm... We are running out of dev_cap flags on mlx4 already. They are 32
bits.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 1/7] IB/core: Extend ib_uverbs_create_qp
       [not found]                         ` <alpine.DEB.2.20.1510211319520.10833-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2015-10-21 18:52                           ` Or Gerlitz
  0 siblings, 0 replies; 29+ messages in thread
From: Or Gerlitz @ 2015-10-21 18:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Sagi Grimberg, Or Gerlitz, Eran Ben Elisha, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Oct 21, 2015 at 9:21 PM, Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> wrote:

> Hmmm... We are running out of dev_cap flags on mlx4 already. They are 32
> bits.

Christoph,

These  are QP creation flags (IB_QP_CREATE_XXX) not device capability
flags (IB_DEVICE_YYY)

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 5/7] IB/mlx4: Add IB counters table
       [not found]     ` <1444909482-17113-6-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-12-24 10:12       ` Sagi Grimberg
       [not found]         ` <567BC52B.4030801-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Sagi Grimberg @ 2015-12-24 10:12 UTC (permalink / raw)
  To: Eran Ben Elisha, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Christoph Lameter

This patch seems to generate a list corruption [1] when I test
with Doug's for-4.5 tree.

Eran, care to take a look at this?

[1]:
mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014) 

mlx4_core: Initializing 0000:04:00.0 

mlx4_core 0000:04:00.0: PCIe link speed is 8.0GT/s, device supports 
8.0GT/s
mlx4_core 0000:04:00.0: PCIe link width is x8, device supports x8 

<mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver 
v2.2-1 (Feb 2014)
<mlx4_ib> mlx4_ib_add: counter index 0 for port 1 allocated 0 

<mlx4_ib> mlx4_ib_add: counter index 1 for port 2 allocated 0 

BUG: unable to handle kernel NULL pointer dereference at 
(null)
IP: [<ffffffff81274e36>] __list_add+0x26/0xd0 

PGD 46da14067 PUD 46daa0067 PMD 0 

Oops: 0000 [#1] SMP 

Modules linked in: mlx4_ib(+) ib_sa ib_mad mlx4_core mlx5_ib mlx5_core 
ib_core ib_addr netconsole configfs nfsv3 nfs fscache cfg80211 rfkill 
x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crc32c_intel 
aesni_intel aes_x86_64 glue_helper lrw dm_mod gf128mul ablk_helper 
cryptd iTCO_wdt iTCO_vendor_support sb_edac shpchp ipmi_si ioatdma 
lpc_ich mfd_core edac_core pcspkr wmi ipmi_msghandler i2c_i801 
acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 
mbcache jbd2 sd_mod isci libsas igb serio_raw ahci ptp pps_core libahci 
i2c_algo_bit scsi_transport_sas i2c_core dca ipv6 autofs4 [last 
unloaded: mlx5_core] 

CPU: 0 PID: 1737 Comm: modprobe Not tainted 4.4.0-rc6+ #107 
 

Hardware name: Supermicro SYS-1027R-WRF/X9DRW, BIOS 3.0a 08/08/2013 
 

task: ffff8804673da800 ti: ffff880466694000 task.ti: ffff880466694000 
 

RIP: 0010:[<ffffffff81274e36>]  [<ffffffff81274e36>] 
__list_add+0x26/0xd0 

RSP: 0018:ffff880466697898  EFLAGS: 00010246 
 

RAX: 00000000ffffffff RBX: ffff8804666978c8 RCX: ffff8804673da800 
 

RDX: ffff88086b8539b8 RSI: 0000000000000000 RDI: ffff8804666978c8 
 

RBP: ffff8804666978b8 R08: 0000000000000000 R09: 0000000000000001 
 

R10: 0000000000000000 R11: 000000000000fffe R12: ffff88086b8539b8 
 

R13: 0000000000000000 R14: ffff88086b8539b8 R15: ffff880466697908
FS:  00007f37a02cf700(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000046b6ee000 CR4: 00000000000406f0
Stack:
  ffff8804673da800 ffff88086b8539b0 ffff8804673da800 ffff88086b8539b4
  ffff880466697958 ffffffff8154f7be ffff880466697904 0000000000000292
  ffff880466697938 ffffffff81259bc1 0000000000007f49 80000000024000c0
Call Trace:
  [<ffffffff8154f7be>] __mutex_lock_slowpath+0x6e/0x110
  [<ffffffff81259bc1>] ? ida_simple_get+0x91/0x100
  [<ffffffff811d354e>] ? kernfs_next_descendant_post+0x1e/0x90
  [<ffffffff811d3646>] ? kernfs_activate+0x86/0xf0
  [<ffffffff8154f87e>] mutex_lock+0x1e/0x40
  [<ffffffffa00fb083>] iboe_process_mad+0x73/0x180 [mlx4_ib]
  [<ffffffffa00fba36>] mlx4_ib_process_mad+0xd6/0x110 [mlx4_ib]
  [<ffffffffa06b4703>] get_perf_mad+0x103/0x140 [ib_core]
  [<ffffffffa06b4764>] get_counter_table+0x24/0x40 [ib_core]
  [<ffffffff8115846e>] ? __kmalloc+0xde/0xe0
  [<ffffffffa06b4895>] add_port+0x115/0x3f0 [ib_core]
  [<ffffffffa06b4c5e>] ib_device_register_sysfs+0xee/0x160 [ib_core]
  [<ffffffffa06b5e05>] ib_register_device+0x1d5/0x300 [ib_core]
  [<ffffffffa010282b>] mlx4_ib_add+0x78b/0xd00 [mlx4_ib]
  [<ffffffffa08027ce>] mlx4_add_device+0x3e/0xb0 [mlx4_core]
  [<ffffffffa0802957>] mlx4_register_interface+0x87/0xe0 [mlx4_core]
  [<ffffffffa0096055>] mlx4_ib_init+0x55/0x72 [mlx4_ib]
  [<ffffffffa0096000>] ? 0xffffffffa0096000
  [<ffffffff81000368>] do_one_initcall+0xa8/0x1c0
  [<ffffffff810ca5bf>] do_init_module+0x5f/0x210
  [<ffffffff810cc267>] load_module+0x5d7/0x700
  [<ffffffff810c97a0>] ? mod_sysfs_teardown+0x140/0x140
  [<ffffffff810c91f0>] ? module_sect_show+0x20/0x20
  [<ffffffff810cc44b>] SyS_finit_module+0xbb/0xf0
  [<ffffffff81551757>] entry_SYSCALL_64_fastpath+0x12/0x6a
Code: 90 90 90 90 90 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 48 
89 fb 4c 89 6d f8 4c 8b 42 08 49 89 f5 49 89 d4 49 39 f0 75 31 <4d> 8b 
45 00 4d 39 c4 75 6f 4c 39 e3 74 45 4c 39 eb 74 40 49 89
RIP  [<ffffffff81274e36>] __list_add+0x26/0xd0
  RSP <ffff880466697898>
CR2: 0000000000000000
---[ end trace 5f4fe0ca857661e6 ]---
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 5/7] IB/mlx4: Add IB counters table
       [not found]         ` <567BC52B.4030801-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-12-24 10:30           ` Sagi Grimberg
  2015-12-24 10:34           ` Or Gerlitz
  1 sibling, 0 replies; 29+ messages in thread
From: Sagi Grimberg @ 2015-12-24 10:30 UTC (permalink / raw)
  To: Eran Ben Elisha, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Christoph Lameter

Doug,

I'm also can't load mlx5 drivers in your tree [1] but
I don't know where it's from, it can come from pretty much everything...

Now I'm left with no useable HW to test with :(


[1]:
mlx5_core 0000:06:00.0: firmware version: 12.14.74
mlx5_core 0000:06:00.1: firmware version: 12.14.74
mlx5_ib: Mellanox Connect-IB Infiniband driver v2.2-1 (Feb 2014)
command failed, status bad parameter(0x3), syndrome 0x7424da
command failed, status bad parameter(0x3), syndrome 0x7424da
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 5/7] IB/mlx4: Add IB counters table
       [not found]         ` <567BC52B.4030801-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-12-24 10:30           ` Sagi Grimberg
@ 2015-12-24 10:34           ` Or Gerlitz
       [not found]             ` <567BCA3F.7050502-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 29+ messages in thread
From: Or Gerlitz @ 2015-12-24 10:34 UTC (permalink / raw)
  To: Sagi Grimberg, Eran Ben Elisha, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Christoph Lameter

On 12/24/2015 12:12 PM, Sagi Grimberg wrote:
> This patch seems to generate a list corruption [1] when I test
> with Doug's for-4.5 tree. Eran, care to take a look at this? 

This patch is part from a series that was introduced in 4.3-rc1 [1], did 
4.4-rc5/6 worked for you before you uploaded there further patches?

Or.

[1]
fbfb662 IB/mlx4: Add support for blocking multicast loopback QP creation 
user flag
7b59f0f IB/mlx4: Add counter based implementation for QP multicast 
loopback block
3ba8e31 IB/mlx4: Add IB counters table
74194fb net/mlx4_en: Implement mcast loopback prevention for ETH qps
9a89283 net/mlx4_core: Add support for filtering multicast loopback
ddf9529 IB/core: Allow setting create flags in QP init attribute
6d8a749 IB/core: Extend ib_uverbs_create_qp


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 5/7] IB/mlx4: Add IB counters table
       [not found]             ` <567BCA3F.7050502-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-12-24 10:42               ` Sagi Grimberg
       [not found]                 ` <567BCBF8.30006-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Sagi Grimberg @ 2015-12-24 10:42 UTC (permalink / raw)
  To: Or Gerlitz, Eran Ben Elisha, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Christoph Lameter


>> This patch seems to generate a list corruption [1] when I test
>> with Doug's for-4.5 tree. Eran, care to take a look at this?
>
> This patch is part from a series that was introduced in 4.3-rc1 [1],

Then something else broke it. Can people check their patches on doug's
tree? At the moment it's unusable...
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 5/7] IB/mlx4: Add IB counters table
       [not found]                 ` <567BCBF8.30006-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-12-24 12:38                   ` Or Gerlitz
       [not found]                     ` <567BE756.9050005-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Or Gerlitz @ 2015-12-24 12:38 UTC (permalink / raw)
  To: Sagi Grimberg, Eran Ben Elisha, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Christoph Lameter

On 12/24/2015 12:42 PM, Sagi Grimberg wrote:
>
>>> This patch seems to generate a list corruption [1] when I test
>>> with Doug's for-4.5 tree. Eran, care to take a look at this?
>>
>> This patch is part from a series that was introduced in 4.3-rc1 [1],
>
> Then something else broke it. Can people check their patches on doug's
> tree? At the moment it's unusable...

Yes, I checked the branch up to commit 882f3b3 "Merge branches 
'4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5" and it works 
(rping, ibv_rc_pingpong over top of mlx4 VPI)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 5/7] IB/mlx4: Add IB counters table
       [not found]                     ` <567BE756.9050005-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-12-24 14:07                       ` Matan Barak
       [not found]                         ` <CAAKD3BBsv8Qe=RZvCTOa=0kuSAqYetx4NOok_Tcp5xtUz+Ggtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Matan Barak @ 2015-12-24 14:07 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Sagi Grimberg, Eran Ben Elisha, Doug Ledford, linux-rdma,
	Christoph Lameter

On Thu, Dec 24, 2015 at 2:38 PM, Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On 12/24/2015 12:42 PM, Sagi Grimberg wrote:
>>
>>
>>>> This patch seems to generate a list corruption [1] when I test
>>>> with Doug's for-4.5 tree. Eran, care to take a look at this?
>>>
>>>
>>> This patch is part from a series that was introduced in 4.3-rc1 [1],
>>
>>
>> Then something else broke it. Can people check their patches on doug's
>> tree? At the moment it's unusable...
>

Leon and I have checked Doug's tree with mlx4_ib disabled and we
didn't encounter any error.
We ran ucmatose over IB connection (in mlx5) and it worked flawlessly.

>
> Yes, I checked the branch up to commit 882f3b3 "Merge branches
> '4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5" and it works (rping,
> ibv_rc_pingpong over top of mlx4 VPI)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 5/7] IB/mlx4: Add IB counters table
       [not found]                         ` <CAAKD3BBsv8Qe=RZvCTOa=0kuSAqYetx4NOok_Tcp5xtUz+Ggtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-12-24 16:09                           ` Matan Barak
       [not found]                             ` <CAAKD3BB5-hUC=0bDqxXxuag9TPqQEeMD74paHrAMOc-MsQwEow-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Matan Barak @ 2015-12-24 16:09 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Sagi Grimberg, Eran Ben Elisha, Doug Ledford, linux-rdma,
	Christoph Lameter

On Thu, Dec 24, 2015 at 4:07 PM, Matan Barak <matanb-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> On Thu, Dec 24, 2015 at 2:38 PM, Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>> On 12/24/2015 12:42 PM, Sagi Grimberg wrote:
>>>
>>>
>>>>> This patch seems to generate a list corruption [1] when I test
>>>>> with Doug's for-4.5 tree. Eran, care to take a look at this?
>>>>
>>>>
>>>> This patch is part from a series that was introduced in 4.3-rc1 [1],
>>>
>>>
>>> Then something else broke it. Can people check their patches on doug's
>>> tree? At the moment it's unusable...
>>
>
> Leon and I have checked Doug's tree with mlx4_ib disabled and we
> didn't encounter any error.
> We ran ucmatose over IB connection (in mlx5) and it worked flawlessly.
>
>>
>> Yes, I checked the branch up to commit 882f3b3 "Merge branches
>> '4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5" and it works (rping,
>> ibv_rc_pingpong over top of mlx4 VPI)
>>

Regarding mlx4, Eran and I analyzed it. We didn't test that, but it
seems like the bug is introduced in the 64bit counters test. Here's a
proposal:

diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 539040f..8da3c83 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -714,11 +714,12 @@ err:
  * Figure out which counter table to use depending on
  * the device capabilities.
  */
-static struct attribute_group *get_counter_table(struct ib_device *dev)
+static struct attribute_group *get_counter_table(struct ib_device *dev,
+                                          int port_num)
 {
        struct ib_class_port_info cpi;

-   if (get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO,
+ if (get_perf_mad(dev, port_num, IB_PMA_CLASS_PORT_INFO,
                                &cpi, 40, sizeof(cpi)) >= 0) {

                if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH)
@@ -776,7 +777,7 @@ static int add_port(struct ib_device *device, int port_num,
                goto err_put;
        }

-   p->pma_table = get_counter_table(device);
+ p->pma_table = get_counter_table(device, port_num);
        ret = sysfs_create_group(&p->kobj, p->pma_table);
        if (ret)
                goto err_put_gid_attrs;


>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 5/7] IB/mlx4: Add IB counters table
       [not found]                             ` <CAAKD3BB5-hUC=0bDqxXxuag9TPqQEeMD74paHrAMOc-MsQwEow-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-12-25 14:50                               ` Hal Rosenstock
       [not found]                                 ` <567D57A5.2050908-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Hal Rosenstock @ 2015-12-25 14:50 UTC (permalink / raw)
  To: Matan Barak, Or Gerlitz
  Cc: Sagi Grimberg, Eran Ben Elisha, Doug Ledford, linux-rdma,
	Christoph Lameter

On 12/24/2015 11:09 AM, Matan Barak wrote:
> On Thu, Dec 24, 2015 at 4:07 PM, Matan Barak <matanb-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
>> On Thu, Dec 24, 2015 at 2:38 PM, Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>> On 12/24/2015 12:42 PM, Sagi Grimberg wrote:
>>>>
>>>>
>>>>>> This patch seems to generate a list corruption [1] when I test
>>>>>> with Doug's for-4.5 tree. Eran, care to take a look at this?
>>>>>
>>>>>
>>>>> This patch is part from a series that was introduced in 4.3-rc1 [1],
>>>>
>>>>
>>>> Then something else broke it. Can people check their patches on doug's
>>>> tree? At the moment it's unusable...
>>>
>>
>> Leon and I have checked Doug's tree with mlx4_ib disabled and we
>> didn't encounter any error.
>> We ran ucmatose over IB connection (in mlx5) and it worked flawlessly.
>>
>>>
>>> Yes, I checked the branch up to commit 882f3b3 "Merge branches
>>> '4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5" and it works (rping,
>>> ibv_rc_pingpong over top of mlx4 VPI)
>>>
> 
> Regarding mlx4, Eran and I analyzed it. We didn't test that, but it
> seems like the bug is introduced in the 64bit counters test. Here's a
> proposal:
> 
> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> index 539040f..8da3c83 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -714,11 +714,12 @@ err:
>   * Figure out which counter table to use depending on
>   * the device capabilities.
>   */
> -static struct attribute_group *get_counter_table(struct ib_device *dev)
> +static struct attribute_group *get_counter_table(struct ib_device *dev,
> +                                          int port_num)
>  {
>         struct ib_class_port_info cpi;
> 
> -   if (get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO,
> + if (get_perf_mad(dev, port_num, IB_PMA_CLASS_PORT_INFO,
>                                 &cpi, 40, sizeof(cpi)) >= 0) {

Your proposal is similar to earlier version of Christoph's patch but was
changed since ClassPortInfo attribute does not have PortSelect field
like other PerfMgt attributes which is where this port num would be
placed. In ClassPortInfo attribute, that location would be the
ClassVersion field that would be set to port number in PerfMgt Get query.

-- Hal

> 
>                 if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH)
> @@ -776,7 +777,7 @@ static int add_port(struct ib_device *device, int port_num,
>                 goto err_put;
>         }
> 
> -   p->pma_table = get_counter_table(device);
> + p->pma_table = get_counter_table(device, port_num);
>         ret = sysfs_create_group(&p->kobj, p->pma_table);
>         if (ret)
>                 goto err_put_gid_attrs;
> 
> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 5/7] IB/mlx4: Add IB counters table
       [not found]                                 ` <567D57A5.2050908-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-12-25 15:56                                   ` Hal Rosenstock
       [not found]                                     ` <567D674B.9050807-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Hal Rosenstock @ 2015-12-25 15:56 UTC (permalink / raw)
  To: Matan Barak, Or Gerlitz
  Cc: Sagi Grimberg, Eran Ben Elisha, Doug Ledford, linux-rdma,
	Christoph Lameter

On 12/25/2015 9:50 AM, Hal Rosenstock wrote:
> On 12/24/2015 11:09 AM, Matan Barak wrote:
>> On Thu, Dec 24, 2015 at 4:07 PM, Matan Barak <matanb-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
>>> On Thu, Dec 24, 2015 at 2:38 PM, Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>>> On 12/24/2015 12:42 PM, Sagi Grimberg wrote:
>>>>>
>>>>>
>>>>>>> This patch seems to generate a list corruption [1] when I test
>>>>>>> with Doug's for-4.5 tree. Eran, care to take a look at this?
>>>>>>
>>>>>>
>>>>>> This patch is part from a series that was introduced in 4.3-rc1 [1],
>>>>>
>>>>>
>>>>> Then something else broke it. Can people check their patches on doug's
>>>>> tree? At the moment it's unusable...
>>>>
>>>
>>> Leon and I have checked Doug's tree with mlx4_ib disabled and we
>>> didn't encounter any error.
>>> We ran ucmatose over IB connection (in mlx5) and it worked flawlessly.
>>>
>>>>
>>>> Yes, I checked the branch up to commit 882f3b3 "Merge branches
>>>> '4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5" and it works (rping,
>>>> ibv_rc_pingpong over top of mlx4 VPI)
>>>>
>>
>> Regarding mlx4, Eran and I analyzed it. We didn't test that, but it
>> seems like the bug is introduced in the 64bit counters test. Here's a
>> proposal:
>>
>> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
>> index 539040f..8da3c83 100644
>> --- a/drivers/infiniband/core/sysfs.c
>> +++ b/drivers/infiniband/core/sysfs.c
>> @@ -714,11 +714,12 @@ err:
>>   * Figure out which counter table to use depending on
>>   * the device capabilities.
>>   */
>> -static struct attribute_group *get_counter_table(struct ib_device *dev)
>> +static struct attribute_group *get_counter_table(struct ib_device *dev,
>> +                                          int port_num)
>>  {
>>         struct ib_class_port_info cpi;
>>
>> -   if (get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO,
>> + if (get_perf_mad(dev, port_num, IB_PMA_CLASS_PORT_INFO,
>>                                 &cpi, 40, sizeof(cpi)) >= 0) {
> 
> Your proposal is similar to earlier version of Christoph's patch but was
> changed since ClassPortInfo attribute does not have PortSelect field
> like other PerfMgt attributes which is where this port num would be
> placed. In ClassPortInfo attribute, that location would be the
> ClassVersion field that would be set to port number in PerfMgt Get query.

In actuality, I don't think it really matters as this is a Get not a Set
and the PMA would do the right thing even if some field in the CPI were
stepped on.

> -- Hal
> 
>>
>>                 if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH)
>> @@ -776,7 +777,7 @@ static int add_port(struct ib_device *device, int port_num,
>>                 goto err_put;
>>         }
>>
>> -   p->pma_table = get_counter_table(device);
>> + p->pma_table = get_counter_table(device, port_num);
>>         ret = sysfs_create_group(&p->kobj, p->pma_table);
>>         if (ret)
>>                 goto err_put_gid_attrs;
>>
>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 for-next 5/7] IB/mlx4: Add IB counters table
       [not found]                                     ` <567D674B.9050807-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-12-27 11:16                                       ` Matan Barak
  0 siblings, 0 replies; 29+ messages in thread
From: Matan Barak @ 2015-12-27 11:16 UTC (permalink / raw)
  To: Hal Rosenstock
  Cc: Or Gerlitz, Sagi Grimberg, Eran Ben Elisha, Doug Ledford,
	linux-rdma, Christoph Lameter

On Fri, Dec 25, 2015 at 5:56 PM, Hal Rosenstock <hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> On 12/25/2015 9:50 AM, Hal Rosenstock wrote:
>> On 12/24/2015 11:09 AM, Matan Barak wrote:
>>> On Thu, Dec 24, 2015 at 4:07 PM, Matan Barak <matanb-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
>>>> On Thu, Dec 24, 2015 at 2:38 PM, Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>>>> On 12/24/2015 12:42 PM, Sagi Grimberg wrote:
>>>>>>
>>>>>>
>>>>>>>> This patch seems to generate a list corruption [1] when I test
>>>>>>>> with Doug's for-4.5 tree. Eran, care to take a look at this?
>>>>>>>
>>>>>>>
>>>>>>> This patch is part from a series that was introduced in 4.3-rc1 [1],
>>>>>>
>>>>>>
>>>>>> Then something else broke it. Can people check their patches on doug's
>>>>>> tree? At the moment it's unusable...
>>>>>
>>>>
>>>> Leon and I have checked Doug's tree with mlx4_ib disabled and we
>>>> didn't encounter any error.
>>>> We ran ucmatose over IB connection (in mlx5) and it worked flawlessly.
>>>>
>>>>>
>>>>> Yes, I checked the branch up to commit 882f3b3 "Merge branches
>>>>> '4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5" and it works (rping,
>>>>> ibv_rc_pingpong over top of mlx4 VPI)
>>>>>
>>>
>>> Regarding mlx4, Eran and I analyzed it. We didn't test that, but it
>>> seems like the bug is introduced in the 64bit counters test. Here's a
>>> proposal:
>>>
>>> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
>>> index 539040f..8da3c83 100644
>>> --- a/drivers/infiniband/core/sysfs.c
>>> +++ b/drivers/infiniband/core/sysfs.c
>>> @@ -714,11 +714,12 @@ err:
>>>   * Figure out which counter table to use depending on
>>>   * the device capabilities.
>>>   */
>>> -static struct attribute_group *get_counter_table(struct ib_device *dev)
>>> +static struct attribute_group *get_counter_table(struct ib_device *dev,
>>> +                                          int port_num)
>>>  {
>>>         struct ib_class_port_info cpi;
>>>
>>> -   if (get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO,
>>> + if (get_perf_mad(dev, port_num, IB_PMA_CLASS_PORT_INFO,
>>>                                 &cpi, 40, sizeof(cpi)) >= 0) {
>>
>> Your proposal is similar to earlier version of Christoph's patch but was
>> changed since ClassPortInfo attribute does not have PortSelect field
>> like other PerfMgt attributes which is where this port num would be
>> placed. In ClassPortInfo attribute, that location would be the
>> ClassVersion field that would be set to port number in PerfMgt Get query.
>
> In actuality, I don't think it really matters as this is a Get not a Set
> and the PMA would do the right thing even if some field in the CPI were
> stepped on.
>

Well, it does matter as it calls the vendor driver with port_num = 0.
Since the kernel is trusted, the vendor driver expects a valid port number.
Giving it an invalid number might result in memory corruptions, as
demonstrated in this case.

>> -- Hal

Matan

>>
>>>
>>>                 if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH)
>>> @@ -776,7 +777,7 @@ static int add_port(struct ib_device *device, int port_num,
>>>                 goto err_put;
>>>         }
>>>
>>> -   p->pma_table = get_counter_table(device);
>>> + p->pma_table = get_counter_table(device, port_num);
>>>         ret = sysfs_create_group(&p->kobj, p->pma_table);
>>>         if (ret)
>>>                 goto err_put_gid_attrs;
>>>
>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2015-12-27 11:16 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-15 11:44 [PATCH v2 for-next 0/7] Add support for multicast loopback prevention to mlx4 Eran Ben Elisha
     [not found] ` <1444909482-17113-1-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-10-15 11:44   ` [PATCH v2 for-next 1/7] IB/core: Extend ib_uverbs_create_qp Eran Ben Elisha
     [not found]     ` <1444909482-17113-2-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-10-21  8:34       ` Haggai Eran
     [not found]         ` <56274E02.7010002-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-10-21 13:46           ` eran ben elisha
     [not found]             ` <CAKHjkj=k6QSKj0gMe0z5=Kjbe2e7cVCbPhkq9Z457gpvnRyitg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-10-21 14:42               ` Haggai Eran
2015-10-21  9:53       ` Sagi Grimberg
     [not found]         ` <56276095.6020803-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-10-21 10:04           ` Or Gerlitz
     [not found]             ` <56276336.7000704-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-10-21 14:09               ` Sagi Grimberg
     [not found]                 ` <56279CA4.8040201-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-10-21 18:15                   ` Or Gerlitz
     [not found]                     ` <CAJ3xEMgWcpKbKL3jhiK2RP=H6_iY0eriirAXn1TJaSG7u8frAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-10-21 18:21                       ` Christoph Lameter
     [not found]                         ` <alpine.DEB.2.20.1510211319520.10833-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2015-10-21 18:52                           ` Or Gerlitz
2015-10-21 10:26           ` Haggai Eran
2015-10-15 11:44   ` [PATCH v2 for-next 2/7] IB/core: Allow setting create flags in QP init attribute Eran Ben Elisha
     [not found]     ` <1444909482-17113-3-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-10-21  8:46       ` Haggai Eran
2015-10-15 11:44   ` [PATCH v2 for-next 3/7] net/mlx4_core: Add support for filtering multicast loopback Eran Ben Elisha
2015-10-15 11:44   ` [PATCH v2 for-next 4/7] net/mlx4_en: Implement mcast loopback prevention for ETH qps Eran Ben Elisha
2015-10-15 11:44   ` [PATCH v2 for-next 5/7] IB/mlx4: Add IB counters table Eran Ben Elisha
     [not found]     ` <1444909482-17113-6-git-send-email-eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-12-24 10:12       ` Sagi Grimberg
     [not found]         ` <567BC52B.4030801-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-12-24 10:30           ` Sagi Grimberg
2015-12-24 10:34           ` Or Gerlitz
     [not found]             ` <567BCA3F.7050502-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-12-24 10:42               ` Sagi Grimberg
     [not found]                 ` <567BCBF8.30006-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-12-24 12:38                   ` Or Gerlitz
     [not found]                     ` <567BE756.9050005-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-12-24 14:07                       ` Matan Barak
     [not found]                         ` <CAAKD3BBsv8Qe=RZvCTOa=0kuSAqYetx4NOok_Tcp5xtUz+Ggtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-12-24 16:09                           ` Matan Barak
     [not found]                             ` <CAAKD3BB5-hUC=0bDqxXxuag9TPqQEeMD74paHrAMOc-MsQwEow-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-12-25 14:50                               ` Hal Rosenstock
     [not found]                                 ` <567D57A5.2050908-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-12-25 15:56                                   ` Hal Rosenstock
     [not found]                                     ` <567D674B.9050807-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-12-27 11:16                                       ` Matan Barak
2015-10-15 11:44   ` [PATCH v2 for-next 6/7] IB/mlx4: Add counter based implementation for QP multicast loopback block Eran Ben Elisha
2015-10-15 11:44   ` [PATCH v2 for-next 7/7] IB/mlx4: Add support for blocking multicast loopback QP creation user flag Eran Ben Elisha

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.