* [PATCH 0/6] IB/mad: Support devices taking pkey_index from the GSI QP
From: Haggai Eran @ 2015-10-14  8:29 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Haggai Eran, Jason Gunthorpe,
	Hal Rosenstock, Sean Hefty, Or Gerlitz, Eli Cohen

Hi,

Patch 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA
CM") started checking the BTH P_Key for RDMA CM requests, and this
uncovered some issues with P_Key handling. In mlx5 the pkey_index in ib_wc
wasn't set correctly [1]. In addition, we found that mlx5 doesn't respect
the pkey_index in ib_send_wr.ud for GSI packets. Carrying the pkey_index
in a work request isn't required by the IBA specifications, so the
Connect-IB and ConnectX-4 cards that mlx5 drives don't support it. The
result is that in kernel 4.3, RDMA CM can reject new connections created
over an mlx5 IPoIB interface that uses a non-default P_Key.

The proposed solution is to change the ib_mad module to create multiple
UD QPs for sending GMPs when the underlying driver doesn't support the
pkey_index in the work request. A side effect of this change is that MADs
going out with pkey_index != 0 will have a source QP number != 1, which
the IBA specifications allow.

To save resources, the code creates QPs only for P_Key table entries that
are non-zero, as sketched below. However, the code doesn't currently
destroy QPs when P_Key entries are cleared; doing that safely would
require reference counting on each QP, which we plan to add later on.
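
A condensed sketch of that scan, using the names from patch 5 (error
handling omitted):

	for (i = 0; i < attr.max_pkeys; ++i) {
		ib_query_pkey(device, port_num, i, &pkey);
		if (!pkey)
			continue;	/* skip cleared/unused entries */
		qp_info->qp[i] = create_qp(qp_info, IB_QPT_UD);
	}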

The patchset is divided as follows. The first two patches add helper
functions and store additional info in ib_mad_qp_info to help eliminate
the dependency on a single ib_qp pointer in ib_mad_qp_info. Patch 3 adds a
capability bit to ib_device to tell whether a device takes the pkey index
from a work request or from the QP. Patch 4 adds SRQ support to the MAD
layer so that MADs destined for any QP belonging to a GSI agent are
received on the same queue. Patch 5 adds the bulk of the code required to
create and use multiple QPs on the same ib_mad_qp_info struct, and patch 6
adds a P_Key change event handler to update the QP table when the SM
changes the available P_Keys.

[1] mlx5: Fix incorrect wc pkey_index assignment for GSI messages
http://www.spinics.net/lists/linux-rdma/msg28374.html

Regards,
Haggai

Haggai Eran (6):
  IB/mad: Use helpers to get ib_device and ib_pd from ib_mad_agent
  IB/mad: Add QP parameters to ib_mad_qp_info
  IB/core: Add capability bit to tell whether per-WR P_Key change in
    GSI is supported
  IB/mad: Use a SRQ for receiving GMPs
  IB/mad: Create multiple QPs for supporting different P_Keys
  IB/mad: P_Key change event handler

 drivers/infiniband/core/agent.c          |   4 +-
 drivers/infiniband/core/cm.c             |   5 +-
 drivers/infiniband/core/mad.c            | 528 ++++++++++++++++++++++++-------
 drivers/infiniband/core/mad_priv.h       |   9 +-
 drivers/infiniband/core/mad_rmpp.c       |   4 +-
 drivers/infiniband/core/sa_query.c       |  10 +-
 drivers/infiniband/core/user_mad.c       |   4 +-
 drivers/infiniband/hw/mlx4/mad.c         |   2 +-
 drivers/infiniband/hw/mlx5/main.c        |   1 +
 drivers/infiniband/hw/mthca/mthca_mad.c  |   2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c    |   2 +-
 drivers/staging/rdma/ipath/ipath_verbs.c |   1 +
 include/rdma/ib_mad.h                    |  16 +-
 include/rdma/ib_verbs.h                  |   2 +
 14 files changed, 452 insertions(+), 138 deletions(-)

-- 
1.7.11.2


* [PATCH 1/6] IB/mad: Use helpers to get ib_device and ib_pd from ib_mad_agent
From: Haggai Eran @ 2015-10-14  8:29 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Haggai Eran, Jason Gunthorpe,
	Hal Rosenstock, Sean Hefty, Or Gerlitz, Eli Cohen

ib_mad_agent currently exposes an ib_qp and an ib_device unnecessarily.
Replace these fields with a single ib_pd, and use helper functions to get
the device and pd instead of accessing the fields directly.
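
For example (taken from the cm.c hunk below), call sites move from direct
field access to the helpers:

	/* before */
	ah = ib_create_ah(mad_agent->qp->pd, &cm_id_priv->av.ah_attr);

	/* after */
	ah = ib_create_ah(ib_mad_agent_pd(mad_agent), &cm_id_priv->av.ah_attr);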

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/agent.c         |  4 +-
 drivers/infiniband/core/cm.c            |  5 ++-
 drivers/infiniband/core/mad.c           | 69 ++++++++++++++++++---------------
 drivers/infiniband/core/mad_rmpp.c      |  4 +-
 drivers/infiniband/core/sa_query.c      | 10 +++--
 drivers/infiniband/core/user_mad.c      |  4 +-
 drivers/infiniband/hw/mlx4/mad.c        |  2 +-
 drivers/infiniband/hw/mthca/mthca_mad.c |  2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c   |  2 +-
 include/rdma/ib_mad.h                   | 16 ++++++--
 10 files changed, 67 insertions(+), 51 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index 0429040304fd..444279ae3827 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -59,7 +59,7 @@ __ib_get_agent_port(const struct ib_device *device, int port_num)
 	struct ib_agent_port_private *entry;
 
 	list_for_each_entry(entry, &ib_agent_port_list, port_list) {
-		if (entry->agent[1]->device == device &&
+		if (ib_mad_agent_device(entry->agent[1]) == device &&
 		    entry->agent[1]->port_num == port_num)
 			return entry;
 	}
@@ -99,7 +99,7 @@ void agent_send_response(const struct ib_mad_hdr *mad_hdr, const struct ib_grh *
 	}
 
 	agent = port_priv->agent[qpn];
-	ah = ib_create_ah_from_wc(agent->qp->pd, wc, grh, port_num);
+	ah = ib_create_ah_from_wc(ib_mad_agent_pd(agent), wc, grh, port_num);
 	if (IS_ERR(ah)) {
 		dev_err(&device->dev, "ib_create_ah_from_wc error %ld\n",
 			PTR_ERR(ah));
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index ea4db9c1d44f..c6150c5b6ada 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -263,7 +263,7 @@ static int cm_alloc_msg(struct cm_id_private *cm_id_priv,
 	struct ib_ah *ah;
 
 	mad_agent = cm_id_priv->av.port->mad_agent;
-	ah = ib_create_ah(mad_agent->qp->pd, &cm_id_priv->av.ah_attr);
+	ah = ib_create_ah(ib_mad_agent_pd(mad_agent), &cm_id_priv->av.ah_attr);
 	if (IS_ERR(ah))
 		return PTR_ERR(ah);
 
@@ -294,7 +294,8 @@ static int cm_alloc_response_msg(struct cm_port *port,
 	struct ib_mad_send_buf *m;
 	struct ib_ah *ah;
 
-	ah = ib_create_ah_from_wc(port->mad_agent->qp->pd, mad_recv_wc->wc,
+	ah = ib_create_ah_from_wc(ib_mad_agent_pd(port->mad_agent),
+				  mad_recv_wc->wc,
 				  mad_recv_wc->recv_buf.grh, port->port_num);
 	if (IS_ERR(ah))
 		return PTR_ERR(ah);
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 4b5c72311deb..62ce3a4c20b7 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -350,11 +350,10 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
 	mad_agent_priv->qp_info = &port_priv->qp_info[qpn];
 	mad_agent_priv->reg_req = reg_req;
 	mad_agent_priv->agent.rmpp_version = rmpp_version;
-	mad_agent_priv->agent.device = device;
 	mad_agent_priv->agent.recv_handler = recv_handler;
 	mad_agent_priv->agent.send_handler = send_handler;
 	mad_agent_priv->agent.context = context;
-	mad_agent_priv->agent.qp = port_priv->qp_info[qpn].qp;
+	mad_agent_priv->agent.pd = port_priv->qp_info[qpn].qp->pd;
 	mad_agent_priv->agent.port_num = port_num;
 	mad_agent_priv->agent.flags = registration_flags;
 	spin_lock_init(&mad_agent_priv->lock);
@@ -516,11 +515,10 @@ struct ib_mad_agent *ib_register_mad_snoop(struct ib_device *device,
 
 	/* Now, fill in the various structures */
 	mad_snoop_priv->qp_info = &port_priv->qp_info[qpn];
-	mad_snoop_priv->agent.device = device;
 	mad_snoop_priv->agent.recv_handler = recv_handler;
 	mad_snoop_priv->agent.snoop_handler = snoop_handler;
 	mad_snoop_priv->agent.context = context;
-	mad_snoop_priv->agent.qp = port_priv->qp_info[qpn].qp;
+	mad_snoop_priv->agent.pd = port_priv->qp_info[qpn].qp->pd;
 	mad_snoop_priv->agent.port_num = port_num;
 	mad_snoop_priv->mad_snoop_flags = mad_snoop_flags;
 	init_completion(&mad_snoop_priv->comp);
@@ -749,7 +747,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	struct ib_mad_private *mad_priv;
 	struct ib_mad_port_private *port_priv;
 	struct ib_mad_agent_private *recv_mad_agent = NULL;
-	struct ib_device *device = mad_agent_priv->agent.device;
+	struct ib_device *device = ib_mad_agent_device(&mad_agent_priv->agent);
 	u8 port_num;
 	struct ib_wc mad_wc;
 	struct ib_send_wr *send_wr = &mad_send_wr->send_wr;
@@ -831,7 +829,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 		goto out;
 	}
 
-	build_smp_wc(mad_agent_priv->agent.qp,
+	build_smp_wc(mad_agent_priv->qp_info->qp,
 		     send_wr->wr_id, drslid,
 		     send_wr->wr.ud.pkey_index,
 		     send_wr->wr.ud.port_num, &mad_wc);
@@ -867,8 +865,9 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 		break;
 	case IB_MAD_RESULT_SUCCESS:
 		/* Treat like an incoming receive MAD */
-		port_priv = ib_get_mad_port(mad_agent_priv->agent.device,
-					    mad_agent_priv->agent.port_num);
+		port_priv = ib_get_mad_port(
+				ib_mad_agent_device(&mad_agent_priv->agent),
+				mad_agent_priv->agent.port_num);
 		if (port_priv) {
 			memcpy(mad_priv->mad, smp, mad_priv->mad_size);
 			recv_mad_agent = find_mad_agent(port_priv,
@@ -949,7 +948,7 @@ static int alloc_send_rmpp_list(struct ib_mad_send_wr_private *send_wr,
 	for (left = send_buf->data_len + pad; left > 0; left -= seg_size) {
 		seg = kmalloc(sizeof (*seg) + seg_size, gfp_mask);
 		if (!seg) {
-			dev_err(&send_buf->mad_agent->device->dev,
+			dev_err(&ib_mad_agent_device(send_buf->mad_agent)->dev,
 				"alloc_send_rmpp_segs: RMPP mem alloc failed for len %zd, gfp %#x\n",
 				sizeof (*seg) + seg_size, gfp_mask);
 			free_send_rmpp_list(send_wr);
@@ -997,7 +996,8 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
 	mad_agent_priv = container_of(mad_agent, struct ib_mad_agent_private,
 				      agent);
 
-	opa = rdma_cap_opa_mad(mad_agent->device, mad_agent->port_num);
+	opa = rdma_cap_opa_mad(ib_mad_agent_device(mad_agent),
+			       mad_agent->port_num);
 
 	if (opa && base_version == OPA_MGMT_BASE_VERSION)
 		mad_size = sizeof(struct opa_mad);
@@ -1028,7 +1028,8 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
 
 	mad_send_wr->mad_agent_priv = mad_agent_priv;
 	mad_send_wr->sg_list[0].length = hdr_len;
-	mad_send_wr->sg_list[0].lkey = mad_agent->qp->pd->local_dma_lkey;
+	mad_send_wr->sg_list[0].lkey =
+		ib_mad_agent_pd(mad_agent)->local_dma_lkey;
 
 	/* OPA MADs don't have to be the full 2048 bytes */
 	if (opa && base_version == OPA_MGMT_BASE_VERSION &&
@@ -1037,7 +1038,8 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
 	else
 		mad_send_wr->sg_list[1].length = mad_size - hdr_len;
 
-	mad_send_wr->sg_list[1].lkey = mad_agent->qp->pd->local_dma_lkey;
+	mad_send_wr->sg_list[1].lkey =
+		ib_mad_agent_pd(mad_agent)->local_dma_lkey;
 
 	mad_send_wr->send_wr.wr_id = (unsigned long) mad_send_wr;
 	mad_send_wr->send_wr.sg_list = mad_send_wr->sg_list;
@@ -1156,21 +1158,23 @@ int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr)
 
 	mad_agent = mad_send_wr->send_buf.mad_agent;
 	sge = mad_send_wr->sg_list;
-	sge[0].addr = ib_dma_map_single(mad_agent->device,
+	sge[0].addr = ib_dma_map_single(ib_mad_agent_device(mad_agent),
 					mad_send_wr->send_buf.mad,
 					sge[0].length,
 					DMA_TO_DEVICE);
-	if (unlikely(ib_dma_mapping_error(mad_agent->device, sge[0].addr)))
+	if (unlikely(ib_dma_mapping_error(ib_mad_agent_device(mad_agent),
+					  sge[0].addr)))
 		return -ENOMEM;
 
 	mad_send_wr->header_mapping = sge[0].addr;
 
-	sge[1].addr = ib_dma_map_single(mad_agent->device,
+	sge[1].addr = ib_dma_map_single(ib_mad_agent_device(mad_agent),
 					ib_get_payload(mad_send_wr),
 					sge[1].length,
 					DMA_TO_DEVICE);
-	if (unlikely(ib_dma_mapping_error(mad_agent->device, sge[1].addr))) {
-		ib_dma_unmap_single(mad_agent->device,
+	if (unlikely(ib_dma_mapping_error(ib_mad_agent_device(mad_agent),
+					  sge[1].addr))) {
+		ib_dma_unmap_single(ib_mad_agent_device(mad_agent),
 				    mad_send_wr->header_mapping,
 				    sge[0].length, DMA_TO_DEVICE);
 		return -ENOMEM;
@@ -1179,7 +1183,7 @@ int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr)
 
 	spin_lock_irqsave(&qp_info->send_queue.lock, flags);
 	if (qp_info->send_queue.count < qp_info->send_queue.max_active) {
-		ret = ib_post_send(mad_agent->qp, &mad_send_wr->send_wr,
+		ret = ib_post_send(qp_info->qp, &mad_send_wr->send_wr,
 				   &bad_send_wr);
 		list = &qp_info->send_queue.list;
 	} else {
@@ -1193,10 +1197,10 @@ int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr)
 	}
 	spin_unlock_irqrestore(&qp_info->send_queue.lock, flags);
 	if (ret) {
-		ib_dma_unmap_single(mad_agent->device,
+		ib_dma_unmap_single(ib_mad_agent_device(mad_agent),
 				    mad_send_wr->header_mapping,
 				    sge[0].length, DMA_TO_DEVICE);
-		ib_dma_unmap_single(mad_agent->device,
+		ib_dma_unmap_single(ib_mad_agent_device(mad_agent),
 				    mad_send_wr->payload_mapping,
 				    sge[1].length, DMA_TO_DEVICE);
 	}
@@ -1337,7 +1341,7 @@ EXPORT_SYMBOL(ib_redirect_mad_qp);
 int ib_process_mad_wc(struct ib_mad_agent *mad_agent,
 		      struct ib_wc *wc)
 {
-	dev_err(&mad_agent->device->dev,
+	dev_err(&ib_mad_agent_device(mad_agent)->dev,
 		"ib_process_mad_wc() not implemented yet\n");
 	return 0;
 }
@@ -1457,7 +1461,7 @@ static int add_nonoui_reg_req(struct ib_mad_reg_req *mad_reg_req,
 		/* Allocate management class table for "new" class version */
 		*class = kzalloc(sizeof **class, GFP_ATOMIC);
 		if (!*class) {
-			dev_err(&agent_priv->agent.device->dev,
+			dev_err(&ib_mad_agent_device(&agent_priv->agent)->dev,
 				"No memory for ib_mad_mgmt_class_table\n");
 			ret = -ENOMEM;
 			goto error1;
@@ -1524,7 +1528,7 @@ static int add_oui_reg_req(struct ib_mad_reg_req *mad_reg_req,
 		/* Allocate mgmt vendor class table for "new" class version */
 		vendor = kzalloc(sizeof *vendor, GFP_ATOMIC);
 		if (!vendor) {
-			dev_err(&agent_priv->agent.device->dev,
+			dev_err(&ib_mad_agent_device(&agent_priv->agent)->dev,
 				"No memory for ib_mad_mgmt_vendor_class_table\n");
 			goto error1;
 		}
@@ -1535,7 +1539,7 @@ static int add_oui_reg_req(struct ib_mad_reg_req *mad_reg_req,
 		/* Allocate table for this management vendor class */
 		vendor_class = kzalloc(sizeof *vendor_class, GFP_ATOMIC);
 		if (!vendor_class) {
-			dev_err(&agent_priv->agent.device->dev,
+			dev_err(&ib_mad_agent_device(&agent_priv->agent)->dev,
 				"No memory for ib_mad_mgmt_vendor_class\n");
 			goto error2;
 		}
@@ -1567,7 +1571,7 @@ static int add_oui_reg_req(struct ib_mad_reg_req *mad_reg_req,
 			goto check_in_use;
 		}
 	}
-	dev_err(&agent_priv->agent.device->dev, "All OUI slots in use\n");
+	dev_err(&ib_mad_agent_device(&agent_priv->agent)->dev, "All OUI slots in use\n");
 	goto error3;
 
 check_in_use:
@@ -1847,7 +1851,7 @@ static inline int rcv_has_same_gid(const struct ib_mad_agent_private *mad_agent_
 	struct ib_ah_attr attr;
 	u8 send_resp, rcv_resp;
 	union ib_gid sgid;
-	struct ib_device *device = mad_agent_priv->agent.device;
+	struct ib_device *device = mad_agent_priv->agent.pd->device;
 	u8 port_num = mad_agent_priv->agent.port_num;
 	u8 lmc;
 
@@ -2417,6 +2421,7 @@ static void ib_mad_send_done_handler(struct ib_mad_port_private *port_priv,
 	struct ib_mad_queue		*send_queue;
 	struct ib_send_wr		*bad_send_wr;
 	struct ib_mad_send_wc		mad_send_wc;
+	struct ib_device		*ibdev;
 	unsigned long flags;
 	int ret;
 
@@ -2427,11 +2432,10 @@ static void ib_mad_send_done_handler(struct ib_mad_port_private *port_priv,
 	qp_info = send_queue->qp_info;
 
 retry:
-	ib_dma_unmap_single(mad_send_wr->send_buf.mad_agent->device,
-			    mad_send_wr->header_mapping,
+	ibdev = ib_mad_agent_device(mad_send_wr->send_buf.mad_agent);
+	ib_dma_unmap_single(ibdev, mad_send_wr->header_mapping,
 			    mad_send_wr->sg_list[0].length, DMA_TO_DEVICE);
-	ib_dma_unmap_single(mad_send_wr->send_buf.mad_agent->device,
-			    mad_send_wr->payload_mapping,
+	ib_dma_unmap_single(ibdev, mad_send_wr->payload_mapping,
 			    mad_send_wr->sg_list[1].length, DMA_TO_DEVICE);
 	queued_send_wr = NULL;
 	spin_lock_irqsave(&send_queue->lock, flags);
@@ -2700,7 +2704,8 @@ static void local_completions(struct work_struct *work)
 			u8 base_version;
 			recv_mad_agent = local->recv_mad_agent;
 			if (!recv_mad_agent) {
-				dev_err(&mad_agent_priv->agent.device->dev,
+				dev_err(&ib_mad_agent_device(
+						&mad_agent_priv->agent)->dev,
 					"No receive MAD agent for local completion\n");
 				free_mad = 1;
 				goto local_send_completion;
@@ -2710,7 +2715,7 @@ static void local_completions(struct work_struct *work)
 			 * Defined behavior is to complete response
 			 * before request
 			 */
-			build_smp_wc(recv_mad_agent->agent.qp,
+			build_smp_wc(recv_mad_agent->qp_info->qp,
 				     (unsigned long) local->mad_send_wr,
 				     be16_to_cpu(IB_LID_PERMISSIVE),
 				     local->mad_send_wr->send_wr.wr.ud.pkey_index,
diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
index 382941b46e43..1d415f71258f 100644
--- a/drivers/infiniband/core/mad_rmpp.c
+++ b/drivers/infiniband/core/mad_rmpp.c
@@ -160,7 +160,7 @@ static struct ib_mad_send_buf *alloc_response_msg(struct ib_mad_agent *agent,
 	struct ib_ah *ah;
 	int hdr_len;
 
-	ah = ib_create_ah_from_wc(agent->qp->pd, recv_wc->wc,
+	ah = ib_create_ah_from_wc(ib_mad_agent_pd(agent), recv_wc->wc,
 				  recv_wc->recv_buf.grh, agent->port_num);
 	if (IS_ERR(ah))
 		return (void *) ah;
@@ -291,7 +291,7 @@ create_rmpp_recv(struct ib_mad_agent_private *agent,
 	if (!rmpp_recv)
 		return NULL;
 
-	rmpp_recv->ah = ib_create_ah_from_wc(agent->agent.qp->pd,
+	rmpp_recv->ah = ib_create_ah_from_wc(ib_mad_agent_pd(&agent->agent),
 					     mad_recv_wc->wc,
 					     mad_recv_wc->recv_buf.grh,
 					     agent->agent.port_num);
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 8c014b33d8e0..3afd8ba408fa 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -443,7 +443,8 @@ static void ib_nl_set_path_rec_attrs(struct sk_buff *skb,
 	/* Construct the family header first */
 	header = (struct rdma_ls_resolve_header *)
 		skb_put(skb, NLMSG_ALIGN(sizeof(*header)));
-	memcpy(header->device_name, query->port->agent->device->name,
+	memcpy(header->device_name,
+	       ib_mad_agent_device(query->port->agent)->name,
 	       LS_DEVICE_NAME_MAX);
 	header->port_num = query->port->port_num;
 
@@ -855,7 +856,8 @@ static void update_sm_ah(struct work_struct *work)
 	struct ib_port_attr port_attr;
 	struct ib_ah_attr   ah_attr;
 
-	if (ib_query_port(port->agent->device, port->port_num, &port_attr)) {
+	if (ib_query_port(ib_mad_agent_device(port->agent), port->port_num,
+			  &port_attr)) {
 		printk(KERN_WARNING "Couldn't query port\n");
 		return;
 	}
@@ -870,7 +872,7 @@ static void update_sm_ah(struct work_struct *work)
 	new_ah->src_path_mask = (1 << port_attr.lmc) - 1;
 
 	new_ah->pkey_index = 0;
-	if (ib_find_pkey(port->agent->device, port->port_num,
+	if (ib_find_pkey(ib_mad_agent_device(port->agent), port->port_num,
 			 IB_DEFAULT_PKEY_FULL, &new_ah->pkey_index))
 		printk(KERN_ERR "Couldn't find index for default PKey\n");
 
@@ -879,7 +881,7 @@ static void update_sm_ah(struct work_struct *work)
 	ah_attr.sl       = port_attr.sm_sl;
 	ah_attr.port_num = port->port_num;
 
-	new_ah->ah = ib_create_ah(port->agent->qp->pd, &ah_attr);
+	new_ah->ah = ib_create_ah(ib_mad_agent_pd(port->agent), &ah_attr);
 	if (IS_ERR(new_ah->ah)) {
 		printk(KERN_WARNING "Couldn't create new SM AH\n");
 		kfree(new_ah);
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 57f281f8d686..ed81fbf4e285 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -236,7 +236,7 @@ static void recv_handler(struct ib_mad_agent *agent,
 	if (packet->mad.hdr.grh_present) {
 		struct ib_ah_attr ah_attr;
 
-		ib_init_ah_from_wc(agent->device, agent->port_num,
+		ib_init_ah_from_wc(ib_mad_agent_device(agent), agent->port_num,
 				   mad_recv_wc->wc, mad_recv_wc->recv_buf.grh,
 				   &ah_attr);
 
@@ -501,7 +501,7 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
 		ah_attr.grh.traffic_class  = packet->mad.hdr.traffic_class;
 	}
 
-	ah = ib_create_ah(agent->qp->pd, &ah_attr);
+	ah = ib_create_ah(ib_mad_agent_pd(agent), &ah_attr);
 	if (IS_ERR(ah)) {
 		ret = PTR_ERR(ah);
 		goto err_up;
diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index 1cd75ff02251..b8771d3eaebf 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -197,7 +197,7 @@ static void update_sm_ah(struct mlx4_ib_dev *dev, u8 port_num, u16 lid, u8 sl)
 	ah_attr.sl       = sl;
 	ah_attr.port_num = port_num;
 
-	new_ah = ib_create_ah(dev->send_agent[port_num - 1][0]->qp->pd,
+	new_ah = ib_create_ah(ib_mad_agent_pd(dev->send_agent[port_num - 1][0]),
 			      &ah_attr);
 	if (IS_ERR(new_ah))
 		return;
diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c
index 7c3f2fb44ba5..5c3e2eff1c25 100644
--- a/drivers/infiniband/hw/mthca/mthca_mad.c
+++ b/drivers/infiniband/hw/mthca/mthca_mad.c
@@ -86,7 +86,7 @@ static void update_sm_ah(struct mthca_dev *dev,
 	ah_attr.sl       = sl;
 	ah_attr.port_num = port_num;
 
-	new_ah = ib_create_ah(dev->send_agent[port_num - 1][0]->qp->pd,
+	new_ah = ib_create_ah(ib_mad_agent_pd(dev->send_agent[port_num - 1][0]),
 			      &ah_attr);
 	if (IS_ERR(new_ah))
 		return;
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index f6fe0414139b..4c8288cc6297 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -465,7 +465,7 @@ static void srpt_mad_recv_handler(struct ib_mad_agent *mad_agent,
 	if (!mad_wc || !mad_wc->recv_buf.mad)
 		return;
 
-	ah = ib_create_ah_from_wc(mad_agent->qp->pd, mad_wc->wc,
+	ah = ib_create_ah_from_wc(ib_mad_agent_pd(mad_agent), mad_wc->wc,
 				  mad_wc->recv_buf.grh, mad_agent->port_num);
 	if (IS_ERR(ah))
 		goto err;
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index 188df91d5851..368001f5efa7 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -449,8 +449,7 @@ typedef void (*ib_mad_recv_handler)(struct ib_mad_agent *mad_agent,
 
 /**
  * ib_mad_agent - Used to track MAD registration with the access layer.
- * @device: Reference to device registration is on.
- * @qp: Reference to QP used for sending and receiving MADs.
+ * @pd: Reference to PD used for sending and receiving MADs.
  * @mr: Memory region for system memory usable for DMA.
  * @recv_handler: Callback handler for a received MAD.
  * @send_handler: Callback handler for a sent MAD.
@@ -467,8 +466,7 @@ enum {
 	IB_MAD_USER_RMPP = IB_USER_MAD_USER_RMPP,
 };
 struct ib_mad_agent {
-	struct ib_device	*device;
-	struct ib_qp		*qp;
+	struct ib_pd		*pd;
 	ib_mad_recv_handler	recv_handler;
 	ib_mad_send_handler	send_handler;
 	ib_mad_snoop_handler	snoop_handler;
@@ -479,6 +477,16 @@ struct ib_mad_agent {
 	u8			rmpp_version;
 };
 
+static inline struct ib_pd *ib_mad_agent_pd(struct ib_mad_agent *agent)
+{
+	return agent->pd;
+}
+
+static inline struct ib_device *ib_mad_agent_device(struct ib_mad_agent *agent)
+{
+	return agent->pd->device;
+}
+
 /**
  * ib_mad_send_wc - MAD send completion information.
  * @send_buf: Send MAD data buffer associated with the send MAD request.
-- 
1.7.11.2


* [PATCH 2/6] IB/mad: Add QP parameters to ib_mad_qp_info
From: Haggai Eran @ 2015-10-14  8:29 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Haggai Eran, Jason Gunthorpe,
	Hal Rosenstock, Sean Hefty, Or Gerlitz, Eli Cohen

In preparation for having an array of QPs in each ib_mad_qp_info, add
qp_type and qp_num fields to the ib_mad_qp_info struct so that clients
won't need to dereference the QPs themselves for this information.
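
For example (taken from the validate_mad() hunk below), callers now read
the QP number without touching the ib_qp:

	/* before */
	u32 qp_num = qp_info->qp->qp_num;

	/* after */
	u32 qp_num = qp_info->qp_num;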

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/mad.c      | 16 ++++++++++------
 drivers/infiniband/core/mad_priv.h |  2 ++
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 62ce3a4c20b7..7a1186173179 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -1799,7 +1799,7 @@ static int validate_mad(const struct ib_mad_hdr *mad_hdr,
 			bool opa)
 {
 	int valid = 0;
-	u32 qp_num = qp_info->qp->qp_num;
+	u32 qp_num = qp_info->qp_num;
 
 	/* Make sure MAD base version is understood */
 	if (mad_hdr->base_version != IB_MGMT_BASE_VERSION &&
@@ -2054,7 +2054,7 @@ static enum smi_action handle_ib_smi(const struct ib_mad_port_private *port_priv
 				    &response->grh, wc,
 				    port_priv->device,
 				    smi_get_fwd_port(smp),
-				    qp_info->qp->qp_num,
+				    qp_info->qp_num,
 				    response->mad_size,
 				    false);
 
@@ -2142,7 +2142,7 @@ handle_opa_smi(struct ib_mad_port_private *port_priv,
 				    &response->grh, wc,
 				    port_priv->device,
 				    opa_smi_get_fwd_port(smp),
-				    qp_info->qp->qp_num,
+				    qp_info->qp_num,
 				    recv->header.wc.byte_len,
 				    true);
 
@@ -2264,7 +2264,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 						    &recv->grh, wc,
 						    port_priv->device,
 						    port_num,
-						    qp_info->qp->qp_num,
+						    qp_info->qp_num,
 						    mad_size, opa);
 				goto out;
 			}
@@ -2283,7 +2283,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 		   generate_unmatched_resp(recv, response, &mad_size, opa)) {
 		agent_send_response((const struct ib_mad_hdr *)response->mad, &recv->grh, wc,
 				    port_priv->device, port_num,
-				    qp_info->qp->qp_num, mad_size, opa);
+				    qp_info->qp_num, mad_size, opa);
 	}
 
 out:
@@ -3009,7 +3009,8 @@ static int ib_mad_port_start(struct ib_mad_port_private *port_priv)
 		 */
 		attr->qp_state = IB_QPS_INIT;
 		attr->pkey_index = pkey_index;
-		attr->qkey = (qp->qp_num == 0) ? 0 : IB_QP1_QKEY;
+		attr->qkey = (port_priv->qp_info[i].qp_num == 0) ? 0 :
+			     IB_QP1_QKEY;
 		ret = ib_modify_qp(qp, attr, IB_QP_STATE |
 					     IB_QP_PKEY_INDEX | IB_QP_QKEY);
 		if (ret) {
@@ -3101,6 +3102,8 @@ static int create_mad_qp(struct ib_mad_qp_info *qp_info,
 	struct ib_qp_init_attr	qp_init_attr;
 	int ret;
 
+	qp_info->qp_type = qp_type;
+
 	memset(&qp_init_attr, 0, sizeof qp_init_attr);
 	qp_init_attr.send_cq = qp_info->port_priv->cq;
 	qp_init_attr.recv_cq = qp_info->port_priv->cq;
@@ -3124,6 +3127,7 @@ static int create_mad_qp(struct ib_mad_qp_info *qp_info,
 	/* Use minimum queue sizes unless the CQ is resized */
 	qp_info->send_queue.max_active = mad_sendq_size;
 	qp_info->recv_queue.max_active = mad_recvq_size;
+	qp_info->qp_num = qp_info->qp->qp_num;
 	return 0;
 
 error:
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index 4a4f7aad0978..ae099f0f9701 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -191,6 +191,8 @@ struct ib_mad_qp_info {
 	struct ib_mad_snoop_private **snoop_table;
 	int snoop_table_size;
 	atomic_t snoop_count;
+	enum ib_qp_type qp_type;
+	u32 qp_num;
 };
 
 struct ib_mad_port_private {
-- 
1.7.11.2


* [PATCH 3/6] IB/core: Add capability bit to tell whether per-WR P_Key change in GSI is supported
From: Haggai Eran @ 2015-10-14  8:29 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Haggai Eran, Jason Gunthorpe,
	Hal Rosenstock, Sean Hefty, Or Gerlitz, Eli Cohen,
	Mike Marciniszyn

The IB specs do not require per work-request control over the P_Key used
in GMPs. Most drivers implement this feature anyway, but some, such as
ipath and mlx5, do not.

Add a capability bit, and set it in the drivers that lack support for a
per-WR P_Key.
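
Core code can then branch on the bit; for example, patch 4 uses it to
decide whether a GSI port needs one QP per P_Key:

	const bool multiple_qps = qp_type == IB_QPT_GSI &&
				  device->gsi_pkey_index_in_qp;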

Cc: Mike Marciniszyn <infinipath-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Eli Cohen <eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c        | 1 +
 drivers/staging/rdma/ipath/ipath_verbs.c | 1 +
 include/rdma/ib_verbs.h                  | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index f1ccd40beae9..2d35e0472215 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1429,6 +1429,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
 	dev->ib_dev.check_mr_status	= mlx5_ib_check_mr_status;
 	dev->ib_dev.get_port_immutable  = mlx5_port_immutable;
+	dev->ib_dev.gsi_pkey_index_in_qp = 1;
 
 	mlx5_ib_internal_fill_odp_caps(dev);
 
diff --git a/drivers/staging/rdma/ipath/ipath_verbs.c b/drivers/staging/rdma/ipath/ipath_verbs.c
index ed2bbc2f7eae..2e13a82c1871 100644
--- a/drivers/staging/rdma/ipath/ipath_verbs.c
+++ b/drivers/staging/rdma/ipath/ipath_verbs.c
@@ -2202,6 +2202,7 @@ int ipath_register_ib_device(struct ipath_devdata *dd)
 	dev->mmap = ipath_mmap;
 	dev->dma_ops = &ipath_dma_mapping_ops;
 	dev->get_port_immutable = ipath_port_immutable;
+	dev->gsi_pkey_index_in_qp = 1;
 
 	snprintf(dev->node_desc, sizeof(dev->node_desc),
 		 IPATH_IDSTR " %s", init_utsname()->nodename);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 7845fae6f2df..dc5cd93b3386 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1792,6 +1792,8 @@ struct ib_device {
 	__be64			     node_guid;
 	u32			     local_dma_lkey;
 	u16                          is_switch:1;
+	/* The device reads the pkey_index from the QP in GSI (QP1) QPs. */
+	u16			     gsi_pkey_index_in_qp:1;
 	u8                           node_type;
 	u8                           phys_port_cnt;
 
-- 
1.7.11.2


* [PATCH 4/6] IB/mad: Use a SRQ for receiving GMPs
From: Haggai Eran @ 2015-10-14  8:29 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Haggai Eran, Jason Gunthorpe,
	Hal Rosenstock, Sean Hefty, Or Gerlitz, Eli Cohen

In preparation for supporting multiple send QPs behind each GSI QP, add
an SRQ that will hold all the receive buffers for these QPs.
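
With the SRQ in place, receive buffers are posted to the SRQ instead of
to each QP's own receive queue (condensed from the diff below):

	if (qp_info->srq)
		ret = ib_post_srq_recv(qp_info->srq, &recv_wr, &bad_recv_wr);
	else
		ret = ib_post_recv(qp_info->qp, &recv_wr, &bad_recv_wr);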

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/mad.c      | 58 ++++++++++++++++++++++++++++++++++----
 drivers/infiniband/core/mad_priv.h |  1 +
 2 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 7a1186173179..2d4457239908 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2921,7 +2921,12 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
 		post = (++recv_queue->count < recv_queue->max_active);
 		list_add_tail(&mad_priv->header.mad_list.list, &recv_queue->list);
 		spin_unlock_irqrestore(&recv_queue->lock, flags);
-		ret = ib_post_recv(qp_info->qp, &recv_wr, &bad_recv_wr);
+		if (qp_info->srq)
+			ret = ib_post_srq_recv(qp_info->srq, &recv_wr,
+					       &bad_recv_wr);
+		else
+			ret = ib_post_recv(qp_info->qp, &recv_wr, &bad_recv_wr);
+
 		if (ret) {
 			spin_lock_irqsave(&recv_queue->lock, flags);
 			list_del(&mad_priv->header.mad_list.list);
@@ -3074,6 +3079,16 @@ static void qp_event_handler(struct ib_event *event, void *qp_context)
 		event->event, qp_info->qp->qp_num);
 }
 
+static void srq_event_handler(struct ib_event *event, void *srq_context)
+{
+	struct ib_mad_qp_info	*qp_info = srq_context;
+
+	/* We aren't expecting limit reached events, so this must be an error */
+	dev_err(&qp_info->port_priv->device->dev,
+		"Fatal error (%d) on MAD SRQ (QP%d)\n",
+		event->event, qp_info->qp->qp_num);
+}
+
 static void init_mad_queue(struct ib_mad_qp_info *qp_info,
 			   struct ib_mad_queue *mad_queue)
 {
@@ -3099,19 +3114,45 @@ static void init_mad_qp(struct ib_mad_port_private *port_priv,
 static int create_mad_qp(struct ib_mad_qp_info *qp_info,
 			 enum ib_qp_type qp_type)
 {
+	struct ib_device *device = qp_info->port_priv->device;
+	struct ib_srq_init_attr srq_init_attr;
 	struct ib_qp_init_attr	qp_init_attr;
+	struct ib_srq *srq = NULL;
+	const bool multiple_qps = qp_type == IB_QPT_GSI &&
+				  device->gsi_pkey_index_in_qp;
+
 	int ret;
 
 	qp_info->qp_type = qp_type;
 
+	if (multiple_qps) {
+		memset(&srq_init_attr, 0, sizeof(srq_init_attr));
+		srq_init_attr.event_handler = srq_event_handler;
+		srq_init_attr.srq_context = qp_info;
+		srq_init_attr.attr.max_wr = mad_recvq_size;
+		srq_init_attr.attr.max_sge = IB_MAD_RECV_REQ_MAX_SG;
+		srq = ib_create_srq(qp_info->port_priv->pd, &srq_init_attr);
+		if (IS_ERR(srq)) {
+			dev_err(&qp_info->port_priv->device->dev,
+				"Couldn't create ib_mad SRQ for QP%d\n",
+				get_spl_qp_index(qp_type));
+			ret = PTR_ERR(srq);
+			goto error_srq;
+		}
+	}
+	qp_info->srq = srq;
+
 	memset(&qp_init_attr, 0, sizeof qp_init_attr);
 	qp_init_attr.send_cq = qp_info->port_priv->cq;
 	qp_init_attr.recv_cq = qp_info->port_priv->cq;
+	qp_init_attr.srq = srq;
 	qp_init_attr.sq_sig_type = IB_SIGNAL_ALL_WR;
 	qp_init_attr.cap.max_send_wr = mad_sendq_size;
-	qp_init_attr.cap.max_recv_wr = mad_recvq_size;
 	qp_init_attr.cap.max_send_sge = IB_MAD_SEND_REQ_MAX_SG;
-	qp_init_attr.cap.max_recv_sge = IB_MAD_RECV_REQ_MAX_SG;
+	if (!srq) {
+		qp_init_attr.cap.max_recv_wr = mad_recvq_size;
+		qp_init_attr.cap.max_recv_sge = IB_MAD_RECV_REQ_MAX_SG;
+	}
 	qp_init_attr.qp_type = qp_type;
 	qp_init_attr.port_num = qp_info->port_priv->port_num;
 	qp_init_attr.qp_context = qp_info;
@@ -3122,7 +3163,7 @@ static int create_mad_qp(struct ib_mad_qp_info *qp_info,
 			"Couldn't create ib_mad QP%d\n",
 			get_spl_qp_index(qp_type));
 		ret = PTR_ERR(qp_info->qp);
-		goto error;
+		goto error_qp;
 	}
 	/* Use minimum queue sizes unless the CQ is resized */
 	qp_info->send_queue.max_active = mad_sendq_size;
@@ -3130,7 +3171,12 @@ static int create_mad_qp(struct ib_mad_qp_info *qp_info,
 	qp_info->qp_num = qp_info->qp->qp_num;
 	return 0;
 
-error:
+error_qp:
+	if (srq) {
+		WARN_ON(ib_destroy_srq(srq));
+		qp_info->srq = NULL;
+	}
+error_srq:
 	return ret;
 }
 
@@ -3140,6 +3186,8 @@ static void destroy_mad_qp(struct ib_mad_qp_info *qp_info)
 		return;
 
 	ib_destroy_qp(qp_info->qp);
+	if (qp_info->srq)
+		WARN_ON(ib_destroy_srq(qp_info->srq));
 	kfree(qp_info->snoop_table);
 }
 
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index ae099f0f9701..0c3b7c576f3a 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -184,6 +184,7 @@ struct ib_mad_queue {
 struct ib_mad_qp_info {
 	struct ib_mad_port_private *port_priv;
 	struct ib_qp *qp;
+	struct ib_srq *srq;
 	struct ib_mad_queue send_queue;
 	struct ib_mad_queue recv_queue;
 	struct list_head overflow_list;
-- 
1.7.11.2


* [PATCH 5/6] IB/mad: Create multiple QPs for supporting different P_Keys
From: Haggai Eran @ 2015-10-14  8:29 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Haggai Eran, Jason Gunthorpe,
	Hal Rosenstock, Sean Hefty, Or Gerlitz, Eli Cohen

On devices that do not support sending different P_Keys on a single GSI
QP, create a separate UD QP for each non-zero P_Key in the P_Key table.
When transmitting a MAD with a specified P_Key, send it on the QP that
matches that P_Key's index. Receive packets from all the QPs through the
common SRQ.
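
The send path then selects the QP by the work request's pkey_index
(condensed from ib_mad_post_send() in the diff below; locking and error
handling omitted):

	if (!qp_info->srq)
		pkey_index = 0;
	else
		pkey_index = mad_send_wr->send_wr.wr.ud.pkey_index;

	qp = qp_info->qp[pkey_index];	/* may be NULL for a cleared P_Key */
	ret = ib_post_send(qp, &mad_send_wr->send_wr, bad_send_wr);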

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/mad.c      | 404 +++++++++++++++++++++++++++----------
 drivers/infiniband/core/mad_priv.h |   4 +-
 2 files changed, 299 insertions(+), 109 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 2d4457239908..02977942574c 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -84,6 +84,8 @@ static int add_nonoui_reg_req(struct ib_mad_reg_req *mad_reg_req,
 			      u8 mgmt_class);
 static int add_oui_reg_req(struct ib_mad_reg_req *mad_reg_req,
 			   struct ib_mad_agent_private *agent_priv);
+static void report_mad_completion(struct ib_mad_qp_info *qp_info,
+				  struct ib_mad_send_wr_private *mad_send_wr);
 
 /*
  * Returns a ib_mad_port_private structure or NULL for a device/port
@@ -353,7 +355,7 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
 	mad_agent_priv->agent.recv_handler = recv_handler;
 	mad_agent_priv->agent.send_handler = send_handler;
 	mad_agent_priv->agent.context = context;
-	mad_agent_priv->agent.pd = port_priv->qp_info[qpn].qp->pd;
+	mad_agent_priv->agent.pd = port_priv->pd;
 	mad_agent_priv->agent.port_num = port_num;
 	mad_agent_priv->agent.flags = registration_flags;
 	spin_lock_init(&mad_agent_priv->lock);
@@ -518,7 +520,7 @@ struct ib_mad_agent *ib_register_mad_snoop(struct ib_device *device,
 	mad_snoop_priv->agent.recv_handler = recv_handler;
 	mad_snoop_priv->agent.snoop_handler = snoop_handler;
 	mad_snoop_priv->agent.context = context;
-	mad_snoop_priv->agent.pd = port_priv->qp_info[qpn].qp->pd;
+	mad_snoop_priv->agent.pd = port_priv->pd;
 	mad_snoop_priv->agent.port_num = port_num;
 	mad_snoop_priv->mad_snoop_flags = mad_snoop_flags;
 	init_completion(&mad_snoop_priv->comp);
@@ -829,10 +831,12 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 		goto out;
 	}
 
-	build_smp_wc(mad_agent_priv->qp_info->qp,
+	read_lock_irqsave(&mad_agent_priv->qp_info->qp_lock, flags);
+	build_smp_wc(mad_agent_priv->qp_info->qp[0],
 		     send_wr->wr_id, drslid,
 		     send_wr->wr.ud.pkey_index,
 		     send_wr->wr.ud.port_num, &mad_wc);
+	read_unlock_irqrestore(&mad_agent_priv->qp_info->qp_lock, flags);
 
 	if (opa && smp->base_version == OPA_MGMT_BASE_VERSION) {
 		mad_wc.byte_len = mad_send_wr->send_buf.hdr_len
@@ -1141,6 +1145,46 @@ void ib_free_send_mad(struct ib_mad_send_buf *send_buf)
 }
 EXPORT_SYMBOL(ib_free_send_mad);
 
+static int ib_mad_post_send(struct ib_mad_qp_info *qp_info,
+			    struct ib_mad_send_wr_private *mad_send_wr,
+			    struct ib_send_wr **bad_send_wr)
+{
+	struct ib_qp *qp;
+	u16 pkey_index;
+	unsigned long flags;
+	struct ib_wc;
+	int ret;
+
+	read_lock_irqsave(&qp_info->qp_lock, flags);
+
+	if (!qp_info->srq)
+		pkey_index = 0;
+	else
+		pkey_index = mad_send_wr->send_wr.wr.ud.pkey_index;
+
+	if (pkey_index >= qp_info->num_qps)
+		goto error;
+
+	qp = qp_info->qp[pkey_index];
+	if (!qp)
+		goto error;
+
+	ret = ib_post_send(qp, &mad_send_wr->send_wr, bad_send_wr);
+	read_unlock_irqrestore(&qp_info->qp_lock, flags);
+
+	return ret;
+
+error:
+	read_unlock_irqrestore(&qp_info->qp_lock, flags);
+	dev_dbg(&qp_info->port_priv->device->dev,
+		"ib_mad: failed to send MAD with pkey_index=%d. dropping.\n",
+		pkey_index);
+
+	report_mad_completion(qp_info, mad_send_wr);
+
+	return 0;
+}
+
 int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr)
 {
 	struct ib_mad_qp_info *qp_info;
@@ -1183,8 +1227,7 @@ int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr)
 
 	spin_lock_irqsave(&qp_info->send_queue.lock, flags);
 	if (qp_info->send_queue.count < qp_info->send_queue.max_active) {
-		ret = ib_post_send(qp_info->qp, &mad_send_wr->send_wr,
-				   &bad_send_wr);
+		ret = ib_mad_post_send(qp_info, mad_send_wr, &bad_send_wr);
 		list = &qp_info->send_queue.list;
 	} else {
 		ret = 0;
@@ -1795,7 +1838,7 @@ out:
 }
 
 static int validate_mad(const struct ib_mad_hdr *mad_hdr,
-			const struct ib_mad_qp_info *qp_info,
+			struct ib_mad_qp_info *qp_info,
 			bool opa)
 {
 	int valid = 0;
@@ -2461,8 +2504,7 @@ retry:
 	ib_mad_complete_send_wr(mad_send_wr, &mad_send_wc);
 
 	if (queued_send_wr) {
-		ret = ib_post_send(qp_info->qp, &queued_send_wr->send_wr,
-				   &bad_send_wr);
+		ret = ib_mad_post_send(qp_info, queued_send_wr, &bad_send_wr);
 		if (ret) {
 			dev_err(&port_priv->device->dev,
 				"ib_post_send failed: %d\n", ret);
@@ -2473,6 +2515,18 @@ retry:
 	}
 }
 
+/* Report a successful completion in order to silently drop a MAD. */
+static void report_mad_completion(struct ib_mad_qp_info *qp_info,
+				  struct ib_mad_send_wr_private *mad_send_wr)
+{
+	struct ib_mad_port_private *port_priv = qp_info->port_priv;
+	struct ib_wc wc;
+
+	memset(&wc, 0, sizeof(wc));
+	wc.wr_id = (unsigned long)&mad_send_wr->mad_list;
+	ib_mad_send_done_handler(port_priv, &wc);
+}
+
 static void mark_sends_for_retry(struct ib_mad_qp_info *qp_info)
 {
 	struct ib_mad_send_wr_private *mad_send_wr;
@@ -2519,8 +2573,8 @@ static void mad_error_handler(struct ib_mad_port_private *port_priv,
 			struct ib_send_wr *bad_send_wr;
 
 			mad_send_wr->retry = 0;
-			ret = ib_post_send(qp_info->qp, &mad_send_wr->send_wr,
-					&bad_send_wr);
+			ret = ib_mad_post_send(qp_info, mad_send_wr,
+					       &bad_send_wr);
 			if (ret)
 				ib_mad_send_done_handler(port_priv, wc);
 		} else
@@ -2533,7 +2587,7 @@ static void mad_error_handler(struct ib_mad_port_private *port_priv,
 		if (attr) {
 			attr->qp_state = IB_QPS_RTS;
 			attr->cur_qp_state = IB_QPS_SQE;
-			ret = ib_modify_qp(qp_info->qp, attr,
+			ret = ib_modify_qp(wc->qp, attr,
 					   IB_QP_STATE | IB_QP_CUR_STATE);
 			kfree(attr);
 			if (ret)
@@ -2680,6 +2734,7 @@ static void local_completions(struct work_struct *work)
 	struct ib_mad_agent_private *mad_agent_priv;
 	struct ib_mad_local_private *local;
 	struct ib_mad_agent_private *recv_mad_agent;
+	struct ib_mad_qp_info *qp_info;
 	unsigned long flags;
 	int free_mad;
 	struct ib_wc wc;
@@ -2715,11 +2770,14 @@ static void local_completions(struct work_struct *work)
 			 * Defined behavior is to complete response
 			 * before request
 			 */
-			build_smp_wc(recv_mad_agent->qp_info->qp,
+			qp_info = recv_mad_agent->qp_info;
+			read_lock_irqsave(&qp_info->qp_lock, flags);
+			build_smp_wc(qp_info->qp[0],
 				     (unsigned long) local->mad_send_wr,
 				     be16_to_cpu(IB_LID_PERMISSIVE),
 				     local->mad_send_wr->send_wr.wr.ud.pkey_index,
 				     recv_mad_agent->agent.port_num, &wc);
+			read_unlock_irqrestore(&qp_info->qp_lock, flags);
 
 			local->mad_priv->header.recv_wc.wc = &wc;
 
@@ -2738,9 +2796,9 @@ static void local_completions(struct work_struct *work)
 			local->mad_priv->header.recv_wc.recv_buf.grh = NULL;
 			local->mad_priv->header.recv_wc.recv_buf.mad =
 						(struct ib_mad *)local->mad_priv->mad;
-			if (atomic_read(&recv_mad_agent->qp_info->snoop_count))
-				snoop_recv(recv_mad_agent->qp_info,
-					  &local->mad_priv->header.recv_wc,
+			if (atomic_read(&qp_info->snoop_count))
+				snoop_recv(qp_info,
+					   &local->mad_priv->header.recv_wc,
 					   IB_MAD_SNOOP_RECVS);
 			recv_mad_agent->agent.recv_handler(
 						&recv_mad_agent->agent,
@@ -2925,7 +2983,8 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
 			ret = ib_post_srq_recv(qp_info->srq, &recv_wr,
 					       &bad_recv_wr);
 		else
-			ret = ib_post_recv(qp_info->qp, &recv_wr, &bad_recv_wr);
+			ret = ib_post_recv(qp_info->qp[0], &recv_wr,
+					   &bad_recv_wr);
 
 		if (ret) {
 			spin_lock_irqsave(&recv_queue->lock, flags);
@@ -2955,9 +3014,6 @@ static void cleanup_recv_queue(struct ib_mad_qp_info *qp_info)
 	struct ib_mad_private *recv;
 	struct ib_mad_list_head *mad_list;
 
-	if (!qp_info->qp)
-		return;
-
 	while (!list_empty(&qp_info->recv_queue.list)) {
 
 		mad_list = list_entry(qp_info->recv_queue.list.next,
@@ -2981,6 +3037,57 @@ static void cleanup_recv_queue(struct ib_mad_qp_info *qp_info)
 	qp_info->recv_queue.count = 0;
 }
 
+/* Called with qp_rwsem locked for write */
+static int setup_qp(struct ib_mad_port_private *port_priv, struct ib_qp *qp,
+		    u16 pkey_index, struct ib_qp_attr *attr)
+{
+	int mask;
+	int ret;
+
+	/*
+	 * PKey index for QP1 is irrelevant on devices with
+	 * gsi_pkey_index_in_qp=0, but one is needed for the Reset to
+	 * Init transition. On devices with gsi_pkey_index_in_qp=1,
+	 * pkey index determines the pkey to be used for all GMPs.
+	 */
+	mask = IB_QP_STATE | IB_QP_PKEY_INDEX | IB_QP_QKEY;
+	attr->qp_state = IB_QPS_INIT;
+	attr->pkey_index = pkey_index;
+	attr->qkey = (qp->qp_num == 0) ? 0 : IB_QP1_QKEY;
+	if (qp->qp_type == IB_QPT_UD) {
+		attr->port_num = port_priv->port_num;
+		mask |= IB_QP_PORT;
+	}
+	ret = ib_modify_qp(qp, attr, mask);
+	if (ret) {
+		dev_err(&qp->device->dev,
+			"Couldn't change QP%d state to INIT: %d\n",
+			qp->qp_num, ret);
+		return ret;
+	}
+
+	attr->qp_state = IB_QPS_RTR;
+	ret = ib_modify_qp(qp, attr, IB_QP_STATE);
+	if (ret) {
+		dev_err(&qp->device->dev,
+			"Couldn't change QP%d state to RTR: %d\n",
+			qp->qp_num, ret);
+		return ret;
+	}
+
+	attr->qp_state = IB_QPS_RTS;
+	attr->sq_psn = IB_MAD_SEND_Q_PSN;
+	ret = ib_modify_qp(qp, attr, IB_QP_STATE | IB_QP_SQ_PSN);
+	if (ret) {
+		dev_err(&qp->device->dev,
+			"Couldn't change QP%d state to RTS: %d\n",
+			qp->qp_num, ret);
+		return ret;
+	}
+
+	return ret;
+}
+
 /*
  * Start the port
  */
@@ -2988,8 +3095,6 @@ static int ib_mad_port_start(struct ib_mad_port_private *port_priv)
 {
 	int ret, i;
 	struct ib_qp_attr *attr;
-	struct ib_qp *qp;
-	u16 pkey_index;
 
 	attr = kmalloc(sizeof *attr, GFP_KERNEL);
 	if (!attr) {
@@ -2998,53 +3103,6 @@ static int ib_mad_port_start(struct ib_mad_port_private *port_priv)
 		return -ENOMEM;
 	}
 
-	ret = ib_find_pkey(port_priv->device, port_priv->port_num,
-			   IB_DEFAULT_PKEY_FULL, &pkey_index);
-	if (ret)
-		pkey_index = 0;
-
-	for (i = 0; i < IB_MAD_QPS_CORE; i++) {
-		qp = port_priv->qp_info[i].qp;
-		if (!qp)
-			continue;
-
-		/*
-		 * PKey index for QP1 is irrelevant but
-		 * one is needed for the Reset to Init transition
-		 */
-		attr->qp_state = IB_QPS_INIT;
-		attr->pkey_index = pkey_index;
-		attr->qkey = (port_priv->qp_info[i].qp_num == 0) ? 0 :
-			     IB_QP1_QKEY;
-		ret = ib_modify_qp(qp, attr, IB_QP_STATE |
-					     IB_QP_PKEY_INDEX | IB_QP_QKEY);
-		if (ret) {
-			dev_err(&port_priv->device->dev,
-				"Couldn't change QP%d state to INIT: %d\n",
-				i, ret);
-			goto out;
-		}
-
-		attr->qp_state = IB_QPS_RTR;
-		ret = ib_modify_qp(qp, attr, IB_QP_STATE);
-		if (ret) {
-			dev_err(&port_priv->device->dev,
-				"Couldn't change QP%d state to RTR: %d\n",
-				i, ret);
-			goto out;
-		}
-
-		attr->qp_state = IB_QPS_RTS;
-		attr->sq_psn = IB_MAD_SEND_Q_PSN;
-		ret = ib_modify_qp(qp, attr, IB_QP_STATE | IB_QP_SQ_PSN);
-		if (ret) {
-			dev_err(&port_priv->device->dev,
-				"Couldn't change QP%d state to RTS: %d\n",
-				i, ret);
-			goto out;
-		}
-	}
-
 	ret = ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP);
 	if (ret) {
 		dev_err(&port_priv->device->dev,
@@ -3076,17 +3134,23 @@ static void qp_event_handler(struct ib_event *event, void *qp_context)
 	/* It's worse than that! He's dead, Jim! */
 	dev_err(&qp_info->port_priv->device->dev,
 		"Fatal error (%d) on MAD QP (%d)\n",
-		event->event, qp_info->qp->qp_num);
+		event->event, event->element.qp->qp_num);
 }
 
 static void srq_event_handler(struct ib_event *event, void *srq_context)
 {
 	struct ib_mad_qp_info	*qp_info = srq_context;
+	int qp_num;
+	unsigned long flags;
+
+	read_lock_irqsave(&qp_info->qp_lock, flags);
+	qp_num = qp_info->qp[0]->qp_num;
+	read_unlock_irqrestore(&qp_info->qp_lock, flags);
 
 	/* We aren't expecting limit reached events, so this must be an error */
 	dev_err(&qp_info->port_priv->device->dev,
 		"Fatal error (%d) on MAD SRQ (QP%d)\n",
-		event->event, qp_info->qp->qp_num);
+		event->event, qp_num);
 }
 
 static void init_mad_queue(struct ib_mad_qp_info *qp_info,
@@ -3102,6 +3166,7 @@ static void init_mad_qp(struct ib_mad_port_private *port_priv,
 			struct ib_mad_qp_info *qp_info)
 {
 	qp_info->port_priv = port_priv;
+	rwlock_init(&qp_info->qp_lock);
 	init_mad_queue(qp_info, &qp_info->send_queue);
 	init_mad_queue(qp_info, &qp_info->recv_queue);
 	INIT_LIST_HEAD(&qp_info->overflow_list);
@@ -3111,12 +3176,161 @@ static void init_mad_qp(struct ib_mad_port_private *port_priv,
 	atomic_set(&qp_info->snoop_count, 0);
 }
 
+static struct ib_qp *create_qp(struct ib_mad_qp_info *qp_info,
+			       enum ib_qp_type qp_type)
+{
+	struct ib_qp_init_attr	qp_init_attr;
+	struct ib_qp *qp;
+
+	memset(&qp_init_attr, 0, sizeof(qp_init_attr));
+	qp_init_attr.send_cq = qp_info->port_priv->cq;
+	qp_init_attr.recv_cq = qp_info->port_priv->cq;
+	qp_init_attr.srq = qp_info->srq;
+	qp_init_attr.sq_sig_type = IB_SIGNAL_ALL_WR;
+	qp_init_attr.cap.max_send_wr = mad_sendq_size;
+	qp_init_attr.cap.max_send_sge = IB_MAD_SEND_REQ_MAX_SG;
+	if (!qp_info->srq) {
+		qp_init_attr.cap.max_recv_wr = mad_recvq_size;
+		qp_init_attr.cap.max_recv_sge = IB_MAD_RECV_REQ_MAX_SG;
+	}
+	qp_init_attr.qp_type = qp_type;
+	qp_init_attr.port_num = qp_info->port_priv->port_num;
+	qp_init_attr.qp_context = qp_info;
+	qp_init_attr.event_handler = qp_event_handler;
+
+	qp = ib_create_qp(qp_info->port_priv->pd, &qp_init_attr);
+	if (IS_ERR(qp)) {
+		dev_err(&qp_info->port_priv->device->dev,
+			"Couldn't create ib_mad QP%d\n",
+			get_spl_qp_index(qp_type));
+	}
+
+	return qp;
+}
+
+static u16 max_pkeys(struct ib_mad_port_private *port_priv)
+{
+	struct ib_device *device = port_priv->device;
+	struct ib_device_attr attr;
+	int ret;
+
+	ret = ib_query_device(device, &attr);
+	if (ret) {
+		WARN_ONCE(1, "ib_query_device failed");
+		return 0;
+	}
+
+	return attr.max_pkeys;
+}
+
+/* Return 0 if the QP should be created with the pkey_index set. Returns an
+ * error if the QP should not be created (a zero P_Key or qp_index >0 for
+ * SMI).
+ */
+static int get_pkey_index(struct ib_mad_qp_info *qp_info, u16 qp_index,
+			  u16 *pkey_index)
+{
+	struct ib_device *device = qp_info->port_priv->device;
+	const int port_num = qp_info->port_priv->port_num;
+	u16 pkey;
+	int ret;
+
+	if (qp_info->qp_type == IB_QPT_SMI) {
+		if (qp_index > 0)
+			return -ENOENT;
+
+		ret = ib_find_pkey(device, port_num, IB_DEFAULT_PKEY_FULL,
+				   pkey_index);
+		if (ret)
+			*pkey_index = 0;
+
+		return 0;
+	}
+
+	*pkey_index = qp_index;
+	ret = ib_query_pkey(device, port_num, *pkey_index, &pkey);
+	if (!ret && !pkey)
+		ret = -ENOENT;
+	return ret;
+}
+
+static int update_pkey_table(struct ib_mad_qp_info *qp_info)
+{
+	struct ib_device *device = qp_info->port_priv->device;
+	const int num_qps = qp_info->num_qps;
+	struct ib_qp *qp;
+	struct ib_qp_attr attr;
+	u16 pkey_index;
+	u16 qp_index;
+	unsigned long flags;
+	int ret = 0;
+
+	for (qp_index = 0; qp_index < num_qps; ++qp_index) {
+		const enum ib_qp_type qp_type = qp_index == 0 ?
+			qp_info->qp_type : IB_QPT_UD;
+
+		ret = get_pkey_index(qp_info, qp_index, &pkey_index);
+		if (ret)
+			/* TODO it would be nice to destroy unused QPs */
+			continue;
+
+		read_lock_irqsave(&qp_info->qp_lock, flags);
+		if (qp_info->qp[qp_index]) {
+			dev_dbg(&device->dev, "Skipping creation of already existing QP at index %d\n",
+				qp_index);
+			read_unlock_irqrestore(&qp_info->qp_lock, flags);
+			continue;
+		}
+		read_unlock_irqrestore(&qp_info->qp_lock, flags);
+
+		qp = create_qp(qp_info, qp_type);
+		if (IS_ERR(qp))
+			return PTR_ERR(qp);
+
+		dev_dbg(&device->dev, "ib_mad: setting up QP%d, qp_index=%hu, pkey_index=%hu, port_num=%d\n",
+			qp->qp_num, qp_index, pkey_index,
+			qp_info->port_priv->port_num);
+
+		ret = setup_qp(qp_info->port_priv, qp, pkey_index, &attr);
+		if (ret) {
+			WARN_ON_ONCE(ib_destroy_qp(qp));
+			return ret;
+		}
+
+		write_lock_irqsave(&qp_info->qp_lock, flags);
+		WARN_ON(qp_info->qp[qp_index]);
+		qp_info->qp[qp_index] = qp;
+		write_unlock_irqrestore(&qp_info->qp_lock, flags);
+	}
+
+	return 0;
+}
+
+static void destroy_mad_qp(struct ib_mad_qp_info *qp_info)
+{
+	u16 qp_index;
+
+	if (!qp_info->qp)
+		return;
+
+	for (qp_index = 0; qp_index < qp_info->num_qps; ++qp_index) {
+		if (qp_info->qp[qp_index])
+			WARN_ON_ONCE(ib_destroy_qp(qp_info->qp[qp_index]));
+	}
+	kfree(qp_info->qp);
+	qp_info->qp = NULL;
+	if (qp_info->srq) {
+		WARN_ON(ib_destroy_srq(qp_info->srq));
+		qp_info->srq = NULL;
+	}
+	kfree(qp_info->snoop_table);
+}
+
 static int create_mad_qp(struct ib_mad_qp_info *qp_info,
 			 enum ib_qp_type qp_type)
 {
 	struct ib_device *device = qp_info->port_priv->device;
 	struct ib_srq_init_attr srq_init_attr;
-	struct ib_qp_init_attr	qp_init_attr;
 	struct ib_srq *srq = NULL;
 	const bool multiple_qps = qp_type == IB_QPT_GSI &&
 				  device->gsi_pkey_index_in_qp;
@@ -3142,55 +3356,29 @@ static int create_mad_qp(struct ib_mad_qp_info *qp_info,
 	}
 	qp_info->srq = srq;
 
-	memset(&qp_init_attr, 0, sizeof qp_init_attr);
-	qp_init_attr.send_cq = qp_info->port_priv->cq;
-	qp_init_attr.recv_cq = qp_info->port_priv->cq;
-	qp_init_attr.srq = srq;
-	qp_init_attr.sq_sig_type = IB_SIGNAL_ALL_WR;
-	qp_init_attr.cap.max_send_wr = mad_sendq_size;
-	qp_init_attr.cap.max_send_sge = IB_MAD_SEND_REQ_MAX_SG;
-	if (!srq) {
-		qp_init_attr.cap.max_recv_wr = mad_recvq_size;
-		qp_init_attr.cap.max_recv_sge = IB_MAD_RECV_REQ_MAX_SG;
-	}
-	qp_init_attr.qp_type = qp_type;
-	qp_init_attr.port_num = qp_info->port_priv->port_num;
-	qp_init_attr.qp_context = qp_info;
-	qp_init_attr.event_handler = qp_event_handler;
-	qp_info->qp = ib_create_qp(qp_info->port_priv->pd, &qp_init_attr);
-	if (IS_ERR(qp_info->qp)) {
-		dev_err(&qp_info->port_priv->device->dev,
-			"Couldn't create ib_mad QP%d\n",
-			get_spl_qp_index(qp_type));
-		ret = PTR_ERR(qp_info->qp);
+	qp_info->num_qps = multiple_qps ? max_pkeys(qp_info->port_priv) : 1;
+	qp_info->qp = kcalloc(qp_info->num_qps, sizeof(*qp_info->qp),
+			      GFP_KERNEL);
+	if (!qp_info->qp) {
+		ret = -ENOMEM;
 		goto error_qp;
 	}
+	ret = update_pkey_table(qp_info);
+	if (ret)
+		goto error_qp;
+
 	/* Use minimum queue sizes unless the CQ is resized */
 	qp_info->send_queue.max_active = mad_sendq_size;
 	qp_info->recv_queue.max_active = mad_recvq_size;
-	qp_info->qp_num = qp_info->qp->qp_num;
+	qp_info->qp_num = qp_info->qp[0]->qp_num;
 	return 0;
 
 error_qp:
-	if (srq) {
-		WARN_ON(ib_destroy_srq(srq));
-		qp_info->srq = NULL;
-	}
+	destroy_mad_qp(qp_info);
 error_srq:
 	return ret;
 }
 
-static void destroy_mad_qp(struct ib_mad_qp_info *qp_info)
-{
-	if (!qp_info->qp)
-		return;
-
-	ib_destroy_qp(qp_info->qp);
-	if (qp_info->srq)
-		WARN_ON(ib_destroy_srq(qp_info->srq));
-	kfree(qp_info->snoop_table);
-}
-
 /*
  * Open the port
  * Create the QP, PD, MR, and CQ if needed
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index 0c3b7c576f3a..32b9532c7868 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -183,7 +183,9 @@ struct ib_mad_queue {
 
 struct ib_mad_qp_info {
 	struct ib_mad_port_private *port_priv;
-	struct ib_qp *qp;
+	rwlock_t qp_lock; /* Protects the contents of the qp array */
+	struct ib_qp **qp;
+	int num_qps;
 	struct ib_srq *srq;
 	struct ib_mad_queue send_queue;
 	struct ib_mad_queue recv_queue;
-- 
1.7.11.2

* [PATCH 6/6] IB/mad: P_Key change event handler
       [not found] ` <1444811388-22486-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2015-10-14  8:29   ` [PATCH 5/6] IB/mad: Create multiple QPs for supporting different P_Keys Haggai Eran
@ 2015-10-14  8:29   ` Haggai Eran
  2015-10-14 17:54   ` [PATCH 0/6] IB/mad: Support devices taking pkey_index from the GSI QP Jason Gunthorpe
  6 siblings, 0 replies; 15+ messages in thread
From: Haggai Eran @ 2015-10-14  8:29 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Haggai Eran, Jason Gunthorpe,
	Hal Rosenstock, Sean Hefty, Or Gerlitz, Eli Cohen

Add a device event handler to capture P_Key table change events. For
devices that don't support setting the P_Key index per work request, update
the per-P_Key QP table in the MAD layer, creating QPs as
needed.

The code currently doesn't destroy created QPs when their pkeys are
cleared. This can be added later on.

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/mad.c      | 51 ++++++++++++++++++++++++++++++++++++--
 drivers/infiniband/core/mad_priv.h |  2 ++
 2 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 02977942574c..a350b4117cb3 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -3153,6 +3153,28 @@ static void srq_event_handler(struct ib_event *event, void *srq_context)
 		event->event, qp_num);
 }
 
+static void device_event_handler(struct ib_event_handler *handler,
+				 struct ib_event *event)
+{
+	struct ib_mad_port_private *port_priv =
+		container_of(handler, struct ib_mad_port_private,
+			     event_handler);
+
+	if (event->element.port_num != port_priv->port_num)
+		return;
+
+	dev_dbg(&port_priv->device->dev, "ib_mad: event %s on port %d\n",
+		ib_event_msg(event->event), event->element.port_num);
+
+	switch (event->event) {
+	case IB_EVENT_PKEY_CHANGE:
+		queue_work(port_priv->wq, &port_priv->pkey_change_work);
+		break;
+	default:
+		break;
+	}
+}
+
 static void init_mad_queue(struct ib_mad_qp_info *qp_info,
 			   struct ib_mad_queue *mad_queue)
 {
@@ -3306,6 +3328,15 @@ static int update_pkey_table(struct ib_mad_qp_info *qp_info)
 	return 0;
 }
 
+static void pkey_change_handler(struct work_struct *work)
+{
+	struct ib_mad_port_private *port_priv =
+		container_of(work, struct ib_mad_port_private,
+			     pkey_change_work);
+
+	update_pkey_table(&port_priv->qp_info[1]);
+}
+
 static void destroy_mad_qp(struct ib_mad_qp_info *qp_info)
 {
 	u16 qp_index;
@@ -3453,6 +3484,17 @@ static int ib_mad_port_open(struct ib_device *device,
 	}
 	INIT_WORK(&port_priv->work, ib_mad_completion_handler);
 
+	if (device->gsi_pkey_index_in_qp) {
+		INIT_WORK(&port_priv->pkey_change_work, pkey_change_handler);
+		INIT_IB_EVENT_HANDLER(&port_priv->event_handler, device,
+				      device_event_handler);
+		ret = ib_register_event_handler(&port_priv->event_handler);
+		if (ret) {
+			dev_err(&device->dev, "Unable to register event handler for ib_mad\n");
+			goto error9;
+		}
+	}
+
 	spin_lock_irqsave(&ib_mad_port_list_lock, flags);
 	list_add_tail(&port_priv->port_list, &ib_mad_port_list);
 	spin_unlock_irqrestore(&ib_mad_port_list_lock, flags);
@@ -3460,16 +3502,19 @@ static int ib_mad_port_open(struct ib_device *device,
 	ret = ib_mad_port_start(port_priv);
 	if (ret) {
 		dev_err(&device->dev, "Couldn't start port\n");
-		goto error9;
+		goto error10;
 	}
 
 	return 0;
 
-error9:
+error10:
 	spin_lock_irqsave(&ib_mad_port_list_lock, flags);
 	list_del_init(&port_priv->port_list);
 	spin_unlock_irqrestore(&ib_mad_port_list_lock, flags);
 
+	if (device->gsi_pkey_index_in_qp)
+		ib_unregister_event_handler(&port_priv->event_handler);
+error9:
 	destroy_workqueue(port_priv->wq);
 error8:
 	destroy_mad_qp(&port_priv->qp_info[1]);
@@ -3507,6 +3552,8 @@ static int ib_mad_port_close(struct ib_device *device, int port_num)
 	list_del_init(&port_priv->port_list);
 	spin_unlock_irqrestore(&ib_mad_port_list_lock, flags);
 
+	if (device->gsi_pkey_index_in_qp)
+		ib_unregister_event_handler(&port_priv->event_handler);
 	destroy_workqueue(port_priv->wq);
 	destroy_mad_qp(&port_priv->qp_info[1]);
 	destroy_mad_qp(&port_priv->qp_info[0]);
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index 32b9532c7868..ee8003648d8a 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -211,6 +211,8 @@ struct ib_mad_port_private {
 	struct workqueue_struct *wq;
 	struct work_struct work;
 	struct ib_mad_qp_info qp_info[IB_MAD_QPS_CORE];
+	struct ib_event_handler event_handler;
+	struct work_struct pkey_change_work;
 };
 
 int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr);
-- 
1.7.11.2

* Re: [PATCH 0/6] IB/mad: Support devices taking pkey_index from the GSI QP
       [not found] ` <1444811388-22486-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2015-10-14  8:29   ` [PATCH 6/6] IB/mad: P_Key change event handler Haggai Eran
@ 2015-10-14 17:54   ` Jason Gunthorpe
       [not found]     ` <20151014175410.GB28534-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  6 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2015-10-14 17:54 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hal Rosenstock,
	Sean Hefty, Or Gerlitz, Eli Cohen

On Wed, Oct 14, 2015 at 11:29:42AM +0300, Haggai Eran wrote:
> respect the pkey_index in ib_send_wr.ud for GSI packets. Apparently having
> the pkey_index in a work request isn't required by the IBA
> specifications,

I disagree. The spec is very clear: 13.5.3.2.1 prescribes how GMP
replies must be generated, and there are only two ways to follow that
prescription:
 - Have some mechanism to set the pkey on outgoing QP1 GMPs
 - Support only one pkey

What this series is doing (src QPN != 1) is absolutely not compliant
with the spec.

If hardware doesn't have the ability to set the pkey on outbound, then
it can only support 1 pkey. This may be why reading other parts of the
spec is confusing: pkey_index in the verbs section is optional, but
without it an implementation cannot support multiple pkeys. Thus, when
multiple pkeys are supported, it is not optional at all.
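
For reference, this is roughly the relevant piece of the verbs
interface (abridged from the kernel's include/rdma/ib_verbs.h; trimmed
for illustration, not the exact layout):

	/* The UD member of the struct ib_send_wr union, as used by the
	 * MAD layer. pkey_index is the mechanism at issue: it is only
	 * meaningful on GSI QPs, and hardware that ignores it cannot
	 * label an outgoing QP1 GMP with anything but one fixed P_Key.
	 */
	union {
		struct {
			struct ib_ah *ah;
			u32 remote_qpn;
			u32 remote_qkey;
			u16 pkey_index;	/* valid for GSI only */
			u8  port_num;	/* valid for DR SMPs on switch only */
		} ud;
		/* ... */
	} wr;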

This proposed hack violates parts of 13.5.3.2.1, so I don't think it
is acceptable for core code to endorse ignoring the spec so badly,
especially when the violations are visible on the wire.

Put the hack in the driver, and obsolete it when the hardware is fixed
to follow the spec. Even better would be to fix this in firmware and
leave the kernel alone.

FWIW, IMHO, no device that works like this should be a candidate for
the IBTA Interop Logo.

Jason

* RE: [PATCH 0/6] IB/mad: Support devices taking pkey_index from the GSI QP
       [not found]     ` <20151014175410.GB28534-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-10-14 21:42       ` Weiny, Ira
  2015-10-19 17:59       ` Haggai Eran
  1 sibling, 0 replies; 15+ messages in thread
From: Weiny, Ira @ 2015-10-14 21:42 UTC (permalink / raw)
  To: Jason Gunthorpe, Haggai Eran
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hal Rosenstock,
	Hefty, Sean, Or Gerlitz, Eli Cohen

I was writing my own objection to this, but Jason beat me to it.  More comments below.

> 
> On Wed, Oct 14, 2015 at 11:29:42AM +0300, Haggai Eran wrote:
> > respect the pkey_index in ib_send_wr.ud for GSI packets. Apparently
> > having the pkey_index in a work request isn't required by the IBA
> > specifications,
> 
> I disagree. The spec is very clear: 13.5.3.2.1 prescribes how GMP replies must
> be generated, and there are only two ways to follow that
> prescription:
>  - Have some mechanism to set the pkey on outgoing QP1 GMPs
>  - Support only one pkey

Agree with Jason.

In addition, Section 9.6.1.1.3 C9-42 specifically states that inbound packets for QP1 are allowed on any P_Key of which the port is a member.  What you are proposing deviates widely from the intent of QP1 processing.

> 
> What this series is doing (src QPN != 1) is absolutely not compliant with the
> spec.

Agree.

> 
> If hardware doesn't have the ability to set the pkey on outbound, then it can
> only support 1 pkey. This may be why reading other parts of the spec is
> confusing. pkey_index in the verbs section is optional, but without it an
> implementation cannot support multiple pkeys. Thus when multiple pkeys are
> supported it is not optional at all.

Agree.

> 
> This proposed hack violates parts of 13.5.3.2.1 so I don't think it is acceptable
> for core code to endorse ignoring the spec so badly. Especially when the
> violations are visible on the wire.
> 
> Put the hack in the driver, and obsolete it when the hardware is fixed to follow
> the spec. Even better would be to fix this in firmware and leave the kernel
> alone.

Agree.

One final point: this extends the MAD layer to do things it was not intended to do.  The MAD layer was designed to handle the special QPs (0 and 1), and we should keep it that way.  Even if we had a valid use case for creating multiple UD QPs for GSI traffic, it should be done via a new "QP pool" mechanism rather than hidden as if it were part of QP1.
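
To sketch the kind of interface I have in mind (purely illustrative;
none of these names exist in the tree):

	/* Hypothetical "QP pool" API, kept separate from the special-QP
	 * (QP0/QP1) handling in ib_mad. All identifiers are made up.
	 */
	struct ib_ud_qp_pool;

	/* Create a pool of UD QPs on the given PD and port. */
	struct ib_ud_qp_pool *ib_ud_qp_pool_create(struct ib_pd *pd,
						   u8 port_num, int max_qps);

	/* Return a UD QP configured for the given P_Key index, creating
	 * one on first use.
	 */
	struct ib_qp *ib_ud_qp_pool_get(struct ib_ud_qp_pool *pool,
					u16 pkey_index);

	void ib_ud_qp_pool_destroy(struct ib_ud_qp_pool *pool);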

Ira

> 
> FWIW, IMHO, no device that works like this should be a candidate for the IBTA
> Interop Logo.
> 
> Jason

* Re: [PATCH 0/6] IB/mad: Support devices taking pkey_index from the GSI QP
       [not found]     ` <20151014175410.GB28534-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-10-14 21:42       ` Weiny, Ira
@ 2015-10-19 17:59       ` Haggai Eran
       [not found]         ` <56252F7B.4000300-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: Haggai Eran @ 2015-10-19 17:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hal Rosenstock,
	Sean Hefty, Or Gerlitz, Eli Cohen

On 10/14/2015 08:54 PM, Jason Gunthorpe wrote:
> What this series is doing (src QPN != 1) is absolutely not compliant
> with the spec.

Hal convinced me that the spec implicitly requires that. Originally I
thought that it was allowed, based on this paragraph from Section 13.5.1
(MAD Interfaces):
> Note that it is not required by IBA that GS managers use QP1 as the
> source QP used to send management packets to GS agents. GS managers
> may send packets from any QP other than QP0. QP1's primary purpose
> is to be a known QP target to which GS managers can send packets
> to initially contact a GS agent on a node.

On 10/14/2015 08:54 PM, Jason Gunthorpe wrote:
> If hardware doesn't have the ability to set the pkey on outbound, then
> it can only support 1 pkey. This may be why reading other parts of the
> spec is confusing. pkey_index in the verbs section is optional, but
> without it an implementation cannot support multiple pkeys. Thus when
> multiple pkeys are supported it is not optional at all.
I wouldn't say that pkey_index is optional. If you look at Section
11.4.1.1 (Post Send Request), there is simply no mention of a pkey index
in the input modifier list.

> Put the hack in the driver, and obsolete it when the hardware is fixed
> to follow the spec. Even better would be to fix this in firmware and
> leave the kernel alone.
We will move the code that creates multiple QPs into the driver, and
make sure it uses SQPN == 1 on all created QPs. Note that a side effect
will be that MADs could be sent out of order (if they are put on
different QPs); however, we would make sure that completions arrive in
the right order.
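
Roughly, the dispatch in the driver's GSI send path could look like the
following (all names are made up for illustration; this is not actual
mlx5 code):

	/* Hypothetical per-P_Key dispatch: the driver exposes a single
	 * software "QP1", but each P_Key index is backed by its own
	 * hardware UD QP, all transmitting with SQPN == 1. The driver
	 * reorders completions so the consumer sees them in posting
	 * order.
	 */
	static int gsi_post_send(struct gsi_state *gsi, struct ib_send_wr *wr)
	{
		struct ib_qp *qp = gsi->tx_qp[wr->wr.ud.pkey_index];
		struct ib_send_wr *bad_wr;

		if (!qp)
			return -EINVAL; /* no QP for this P_Key yet */

		return ib_post_send(qp, wr, &bad_wr);
	}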

> FWIW, IMHO, no device that works like this should be a candidate for
> the IBTA Interop Logo.
I would expect interop tests to test the BTH pkey, but my guess is that
it isn't tested today. That would explain how both the mlx5 and the
ipath drivers have been able to hide this issue so far.

I think most of the kernel code only looks at the BTH pkey to decide
what pkey to use when sending a response. It is only with our cma demux
patches that we started to look at the BTH pkey field for validating a
connection, and so the issue surfaced.

Since the change to the driver will take some time, I suggest that, in
order to fix the issue in 4.3, we change cma to look at the CM request
payload pkey rather than at the BTH. After mlx5 and ipath are fixed, we
can change cma back again.

Regards,
Haggai

* [PATCH] IB/cma: Use inner P_Key to determine netdev
       [not found]         ` <56252F7B.4000300-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-10-19 18:09           ` Haggai Eran
       [not found]             ` <1445278165-18442-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Haggai Eran @ 2015-10-19 18:09 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Sean Hefty, Hal Rosenstock, Jason Gunthorpe, Ira Weiny,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eli Cohen, Or Gerlitz,
	Haggai Eran

When discussing the patches to demux ids in rdma_cm instead of ib_cm, it
was decided that it is best to use the P_Key value in the packet headers
[1]. However, some drivers are currently unable to send the correct
P_Key in GMP headers.

Change the rdma_cm code to look at the P_Key value that is part of the
packet payload as a workaround. Once the drivers are fixed this patch can
be reverted.

Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to
RDMA CM")
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 59a2dafc8c57..e8324543e085 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1067,14 +1067,14 @@ static int cma_save_req_info(const struct ib_cm_event *ib_event,
 		       sizeof(req->local_gid));
 		req->has_gid	= true;
 		req->service_id	= req_param->primary_path->service_id;
-		req->pkey	= req_param->bth_pkey;
+		req->pkey	= be16_to_cpu(req_param->primary_path->pkey);
 		break;
 	case IB_CM_SIDR_REQ_RECEIVED:
 		req->device	= sidr_param->listen_id->device;
 		req->port	= sidr_param->port;
 		req->has_gid	= false;
 		req->service_id	= sidr_param->service_id;
-		req->pkey	= sidr_param->bth_pkey;
+		req->pkey	= sidr_param->pkey;
 		break;
 	default:
 		return -EINVAL;
-- 
1.7.11.2

* Re: [PATCH] IB/cma: Use inner P_Key to determine netdev
       [not found]             ` <1445278165-18442-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-10-19 18:19               ` Jason Gunthorpe
       [not found]                 ` <20151019181953.GA19665-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2015-10-19 18:19 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, Ira Weiny,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eli Cohen, Or Gerlitz

On Mon, Oct 19, 2015 at 09:09:25PM +0300, Haggai Eran wrote:
> When discussing the patches to demux ids in rdma_cm instead of ib_cm, it
> was decided that it is best to use the P_Key value in the packet headers
> [1]. However, some drivers are currently unable to send correct P_Key in
> GMP headers.

You should explicitly describe the broken drivers in the commit text.

I thought mlx5 was fixed for receive already? I'm confused why we need
this.

Jason

* Re: [PATCH] IB/cma: Use inner P_Key to determine netdev
       [not found]                 ` <20151019181953.GA19665-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-10-20  6:45                   ` Haggai Eran
       [not found]                     ` <5625E307.4090105-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Haggai Eran @ 2015-10-20  6:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, Ira Weiny,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eli Cohen, Or Gerlitz

On 19/10/2015 21:19, Jason Gunthorpe wrote:
> On Mon, Oct 19, 2015 at 09:09:25PM +0300, Haggai Eran wrote:
>> When discussing the patches to demux ids in rdma_cm instead of ib_cm, it
>> was decided that it is best to use the P_Key value in the packet headers
>> [1]. However, some drivers are currently unable to send correct P_Key in
>> GMP headers.
> 
> You should explicitly describe the broken drivers in the commit text.
These are mlx5 and ipath. I'll add them to the commit message.

> I thought mlx5 was fixed for receive already? I'm confused why we need
> this.
mlx5 had two issues related to GSI pkeys. The issue that was fixed was
that it treated the pkey value returned by the hardware in receive
completions as a pkey_index. The remaining issue is that it doesn't
respect the ib_send_wr.ud.pkey_index field when sending.

With the current state of things, cma will try to look for an ipoib net
dev matching the BTH pkey of the request, but if the sender is mlx5 or
ipath, the BTH pkey would be the default pkey. If the request was
intended for a different pkey, cma won't find a matching netdev and will
throw away the request.
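
In code terms, the failing lookup boils down to something like this (a
simplified sketch; the wrapper is made up, but ib_get_net_dev_by_params()
is the real entry point that ends up asking ipoib for a matching
interface):

	/* Resolve the net_device for an incoming request by matching,
	 * among other parameters, the request's P_Key against the IPoIB
	 * interfaces on the port. If the sender put the default P_Key in
	 * the BTH instead of the real one, no interface matches and the
	 * request is dropped.
	 */
	static struct net_device *req_to_net_dev(struct ib_device *dev,
						 u8 port, u16 pkey,
						 const union ib_gid *gid,
						 const struct sockaddr *addr)
	{
		return ib_get_net_dev_by_params(dev, port, pkey, gid, addr);
	}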

Haggai

* Re: [PATCH] IB/cma: Use inner P_Key to determine netdev
       [not found]                     ` <5625E307.4090105-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-10-20 16:44                       ` Jason Gunthorpe
       [not found]                         ` <20151020164445.GB24608-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2015-10-20 16:44 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, Ira Weiny,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eli Cohen, Or Gerlitz

On Tue, Oct 20, 2015 at 09:45:27AM +0300, Haggai Eran wrote:
> On 19/10/2015 21:19, Jason Gunthorpe wrote:
> > On Mon, Oct 19, 2015 at 09:09:25PM +0300, Haggai Eran wrote:
> >> When discussing the patches to demux ids in rdma_cm instead of ib_cm, it
> >> was decided that it is best to use the P_Key value in the packet headers
> >> [1]. However, some drivers are currently unable to send correct P_Key in
> >> GMP headers.
> > 
> > You should explicitly describe the broken drivers in the commit text.
> These are mlx5 and ipath. I'll add them to the commit message.

And ipath is actually ipath, the obsolete driver being removed, not
qib? (ie we don't care about it?)

> The remaining issue is that it doesn't respect the
> ib_send_wr.ud.pkey_index field when sending.

But this is a very serious bug, to the point that the mis-labeled packets
may not even be delivered in some cases - do you really care about the
sub-case where mis-labeled packets happen to be deliverable but don't
parse right?

Well, don't forget to undo this patch when the mlx5 driver is fixed..

Jason

* Re: [PATCH] IB/cma: Use inner P_Key to determine netdev
       [not found]                         ` <20151020164445.GB24608-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-10-21 10:51                           ` Haggai Eran
  0 siblings, 0 replies; 15+ messages in thread
From: Haggai Eran @ 2015-10-21 10:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, Ira Weiny,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eli Cohen, Or Gerlitz

On 20/10/2015 19:44, Jason Gunthorpe wrote:
> On Tue, Oct 20, 2015 at 09:45:27AM +0300, Haggai Eran wrote:
>> On 19/10/2015 21:19, Jason Gunthorpe wrote:
>>> On Mon, Oct 19, 2015 at 09:09:25PM +0300, Haggai Eran wrote:
>>>> When discussing the patches to demux ids in rdma_cm instead of ib_cm, it
>>>> was decided that it is best to use the P_Key value in the packet headers
>>>> [1]. However, some drivers are currently unable to send correct P_Key in
>>>> GMP headers.
>>>
>>> You should explicitly describe the broken drivers in the commit text.
>> These are mlx5 and ipath. I'll add them to the commit message.
> 
> And ipath is actually ipath, the obsolete driver being removed, not
> qib? (ie we don't care about it?)
Right. qib is fine.

>> The remaining issue is that it doesn't respect the
>> ib_send_wr.ud.pkey_index field when sending.
> 
> But this is a very serious bug, to the point the mis-labeled packets
> may not even be delivered in some cases - you really care about the
> sub case where mis-labeled packets happen to be deliverable but don't
> parse right?
I understand the issue in mlx5 is serious, but we've lived with it for a
long time. In this patch I only wanted to work around it for the purpose
of fixing the regression introduced in 4.3.

> Well, don't forget to undo this patch when the mlx5 driver is fixed..
Of course.

Thanks,
Haggai

Thread overview: 15+ messages
2015-10-14  8:29 [PATCH 0/6] IB/mad: Support devices taking pkey_index from the GSI QP Haggai Eran
     [not found] ` <1444811388-22486-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-10-14  8:29   ` [PATCH 1/6] IB/mad: Use helpers to get ib_device and ib_pd from ib_mad_agent Haggai Eran
2015-10-14  8:29   ` [PATCH 2/6] IB/mad: Add QP parameters to ib_mad_qp_info Haggai Eran
2015-10-14  8:29   ` [PATCH 3/6] IB/core: Add capability bit to tell whether per-WR P_Key change in GSI is supported Haggai Eran
2015-10-14  8:29   ` [PATCH 4/6] IB/mad: Use a SRQ for receiving GMPs Haggai Eran
2015-10-14  8:29   ` [PATCH 5/6] IB/mad: Create multiple QPs for supporting different P_Keys Haggai Eran
2015-10-14  8:29   ` [PATCH 6/6] IB/mad: P_Key change event handler Haggai Eran
2015-10-14 17:54   ` [PATCH 0/6] IB/mad: Support devices taking pkey_index from the GSI QP Jason Gunthorpe
     [not found]     ` <20151014175410.GB28534-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-10-14 21:42       ` Weiny, Ira
2015-10-19 17:59       ` Haggai Eran
     [not found]         ` <56252F7B.4000300-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-10-19 18:09           ` [PATCH] IB/cma: Use inner P_Key to determine netdev Haggai Eran
     [not found]             ` <1445278165-18442-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-10-19 18:19               ` Jason Gunthorpe
     [not found]                 ` <20151019181953.GA19665-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-10-20  6:45                   ` Haggai Eran
     [not found]                     ` <5625E307.4090105-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-10-20 16:44                       ` Jason Gunthorpe
     [not found]                         ` <20151020164445.GB24608-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-10-21 10:51                           ` Haggai Eran
