* [PATCH 00/14] IB/mad: Add support for OPA MAD processing.
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-06 18:38 UTC
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

The following patch series modifies the kernel MAD processing (ib_mad/ib_umad)
and related interfaces to send and receive Intel Omni-Path Architecture (OPA)
MADs on devices that support them.

OPA MADs share a common header with IBTA MADs, which allows most of the MAD
processing code to be shared.
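
For reference, that shared header is the 24-byte struct ib_mad_hdr from
include/rdma/ib_mad.h, reproduced here as it stands before this series:

struct ib_mad_hdr {
	u8	base_version;	/* MAD base format version */
	u8	mgmt_class;	/* management class */
	u8	class_version;	/* class format version */
	u8	method;		/* Get/Set/Trap/etc. */
	__be16	status;
	__be16	class_specific;
	__be64	tid;		/* transaction ID */
	__be16	attr_id;	/* attribute ID */
	__be16	resv;
	__be32	attr_mod;	/* attribute modifier */
};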

In addition to supporting some IBTA management classes, OPA devices use MADs
with lengths up to 2K.  These larger MADs increase the performance of
management traffic on OPA fabrics.

Devices report their support for OPA MADs through a new capability flag and a
max MAD size in their immutable data.
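
With those two pieces of immutable data the MAD core can size and gate its
processing per port.  A minimal sketch (rdma_max_mad_size is added in patch 8,
rdma_cap_opa_mad in patch 11):

	/* Sketch: consult the immutable data when bringing up a port */
	size_t mad_size = rdma_max_mad_size(device, port_num); /* 2K on OPA */
	bool has_opa = rdma_cap_opa_mad(device, port_num);     /* cap flag */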

Changes from V1:
================

Remove patch:
	IB/mad: Create an RMPP Base header

Add new patch:
	IB/mad cleanup: Clean up function params -- find_mad_agent

Address comments from Jason about the idea of a flex array for struct ib_mad:
	ib_mad does not really allocate struct ib_mad objects; rather, it
	allocates ib_mad_private objects, which is where a flex array is more
	appropriate.  This series therefore changes struct ib_mad_private to
	end in a flex array that stores the MAD data.  Casts to IB/OPA MAD
	structures or headers are used where appropriate.
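
	The resulting structure looks roughly like the sketch below; see
	patch 9 for the real definition:

	struct ib_mad_private {
		struct ib_mad_private_header header;
		size_t mad_size;	/* size allocated for mad[] */
		struct ib_grh grh;
		u8 mad[0];	/* flex array: cast to IB/OPA MAD types */
	} __attribute__ ((packed));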

Minor updates:
	Clean up commit messages
	Fix/add const and bool usage
	Remove inline qualifiers (let the compiler handle inlining)
	Add additional Immutable data checks
	Change WARN_ON to BUG_ON in drivers
	Add out_mad_pkey_index to process_mad in order to maintain the
	"constness" of the struct ib_wc parameter (see the sketch below).


Ira Weiny (14):
  IB/mad cleanup: Clean up function params -- find_mad_agent
  IB/mad cleanup: Generalize processing of MAD data
  IB/mad: Split IB SMI handling from MAD Recv handler
  IB/mad: Create a generic helper for DR SMP Send processing
  IB/mad: Create a generic helper for DR SMP Recv processing
  IB/mad: Create a generic helper for DR forwarding checks
  IB/mad: Support alternate Base Versions when creating MADs
  IB/core: Add ability for drivers to report an alternate MAD size.
  IB/mad: Convert allocations from kmem_cache to kzalloc
  IB/mad: Add support for additional MAD info to/from drivers
  IB/core: Add OPA MAD core capability flag
  IB/mad: Add partial Intel OPA MAD support
  IB/mad: Add partial Intel OPA MAD support
  IB/mad: Add final OPA MAD processing

 drivers/infiniband/core/agent.c              |  15 +-
 drivers/infiniband/core/agent.h              |   4 +-
 drivers/infiniband/core/cm.c                 |   6 +-
 drivers/infiniband/core/device.c             |  11 +
 drivers/infiniband/core/mad.c                | 541 +++++++++++++++++++--------
 drivers/infiniband/core/mad_priv.h           |  11 +-
 drivers/infiniband/core/mad_rmpp.c           |  33 +-
 drivers/infiniband/core/opa_smi.h            |  78 ++++
 drivers/infiniband/core/sa_query.c           |   3 +-
 drivers/infiniband/core/smi.c                | 228 +++++++----
 drivers/infiniband/core/sysfs.c              |   7 +-
 drivers/infiniband/core/user_mad.c           |  20 +-
 drivers/infiniband/hw/amso1100/c2_provider.c |   6 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |   6 +-
 drivers/infiniband/hw/cxgb4/provider.c       |   7 +-
 drivers/infiniband/hw/ehca/ehca_iverbs.h     |   5 +-
 drivers/infiniband/hw/ehca/ehca_main.c       |   2 +
 drivers/infiniband/hw/ehca/ehca_sqp.c        |   9 +-
 drivers/infiniband/hw/ipath/ipath_mad.c      |   9 +-
 drivers/infiniband/hw/ipath/ipath_verbs.c    |   1 +
 drivers/infiniband/hw/ipath/ipath_verbs.h    |   4 +-
 drivers/infiniband/hw/mlx4/mad.c             |  13 +-
 drivers/infiniband/hw/mlx4/main.c            |   2 +
 drivers/infiniband/hw/mlx4/mlx4_ib.h         |   4 +-
 drivers/infiniband/hw/mlx5/mad.c             |   9 +-
 drivers/infiniband/hw/mlx5/main.c            |   1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h         |   4 +-
 drivers/infiniband/hw/mthca/mthca_dev.h      |   5 +-
 drivers/infiniband/hw/mthca/mthca_mad.c      |  13 +-
 drivers/infiniband/hw/mthca/mthca_provider.c |   1 +
 drivers/infiniband/hw/nes/nes_verbs.c        |   4 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c     |   9 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h     |   4 +-
 drivers/infiniband/hw/ocrdma/ocrdma_main.c   |   2 +
 drivers/infiniband/hw/qib/qib_iba7322.c      |   3 +-
 drivers/infiniband/hw/qib/qib_mad.c          |  12 +-
 drivers/infiniband/hw/qib/qib_verbs.c        |   1 +
 drivers/infiniband/hw/qib/qib_verbs.h        |   4 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c        |   3 +-
 include/rdma/ib_mad.h                        |  37 +-
 include/rdma/ib_verbs.h                      |  55 ++-
 include/rdma/opa_smi.h                       | 106 ++++++
 42 files changed, 997 insertions(+), 301 deletions(-)
 create mode 100644 drivers/infiniband/core/opa_smi.h
 create mode 100644 include/rdma/opa_smi.h

-- 
1.8.2


* [PATCH 01/14] IB/mad cleanup: Clean up function params -- find_mad_agent
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-06 18:38 UTC
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

find_mad_agent only needs read-only access to the MAD header.  Update the
ib_mad pointer parameter to a const ib_mad_hdr pointer and adjust the call
tree.

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/mad.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 600af266838c..71cf3d51dad0 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -73,7 +73,7 @@ static int method_in_use(struct ib_mad_mgmt_method_table **method,
 static void remove_mad_reg_req(struct ib_mad_agent_private *priv);
 static struct ib_mad_agent_private *find_mad_agent(
 					struct ib_mad_port_private *port_priv,
-					struct ib_mad *mad);
+					const struct ib_mad_hdr *mad);
 static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
 				    struct ib_mad_private *mad);
 static void cancel_mads(struct ib_mad_agent_private *mad_agent_priv);
@@ -813,7 +813,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 		if (port_priv) {
 			memcpy(&mad_priv->mad.mad, smp, sizeof(struct ib_mad));
 			recv_mad_agent = find_mad_agent(port_priv,
-						        &mad_priv->mad.mad);
+						        &mad_priv->mad.mad.mad_hdr);
 		}
 		if (!port_priv || !recv_mad_agent) {
 			/*
@@ -1324,7 +1324,7 @@ static int check_vendor_class(struct ib_mad_mgmt_vendor_class *vendor_class)
 }
 
 static int find_vendor_oui(struct ib_mad_mgmt_vendor_class *vendor_class,
-			   char *oui)
+			   const char *oui)
 {
 	int i;
 
@@ -1622,13 +1622,13 @@ out:
 
 static struct ib_mad_agent_private *
 find_mad_agent(struct ib_mad_port_private *port_priv,
-	       struct ib_mad *mad)
+	       const struct ib_mad_hdr *mad_hdr)
 {
 	struct ib_mad_agent_private *mad_agent = NULL;
 	unsigned long flags;
 
 	spin_lock_irqsave(&port_priv->reg_lock, flags);
-	if (ib_response_mad(&mad->mad_hdr)) {
+	if (ib_response_mad(mad_hdr)) {
 		u32 hi_tid;
 		struct ib_mad_agent_private *entry;
 
@@ -1636,7 +1636,7 @@ find_mad_agent(struct ib_mad_port_private *port_priv,
 		 * Routing is based on high 32 bits of transaction ID
 		 * of MAD.
 		 */
-		hi_tid = be64_to_cpu(mad->mad_hdr.tid) >> 32;
+		hi_tid = be64_to_cpu(mad_hdr->tid) >> 32;
 		list_for_each_entry(entry, &port_priv->agent_list, agent_list) {
 			if (entry->agent.hi_tid == hi_tid) {
 				mad_agent = entry;
@@ -1648,45 +1648,45 @@ find_mad_agent(struct ib_mad_port_private *port_priv,
 		struct ib_mad_mgmt_method_table *method;
 		struct ib_mad_mgmt_vendor_class_table *vendor;
 		struct ib_mad_mgmt_vendor_class *vendor_class;
-		struct ib_vendor_mad *vendor_mad;
+		const struct ib_vendor_mad *vendor_mad;
 		int index;
 
 		/*
 		 * Routing is based on version, class, and method
 		 * For "newer" vendor MADs, also based on OUI
 		 */
-		if (mad->mad_hdr.class_version >= MAX_MGMT_VERSION)
+		if (mad_hdr->class_version >= MAX_MGMT_VERSION)
 			goto out;
-		if (!is_vendor_class(mad->mad_hdr.mgmt_class)) {
+		if (!is_vendor_class(mad_hdr->mgmt_class)) {
 			class = port_priv->version[
-					mad->mad_hdr.class_version].class;
+					mad_hdr->class_version].class;
 			if (!class)
 				goto out;
-			if (convert_mgmt_class(mad->mad_hdr.mgmt_class) >=
+			if (convert_mgmt_class(mad_hdr->mgmt_class) >=
 			    IB_MGMT_MAX_METHODS)
 				goto out;
 			method = class->method_table[convert_mgmt_class(
-							mad->mad_hdr.mgmt_class)];
+							mad_hdr->mgmt_class)];
 			if (method)
-				mad_agent = method->agent[mad->mad_hdr.method &
+				mad_agent = method->agent[mad_hdr->method &
 							  ~IB_MGMT_METHOD_RESP];
 		} else {
 			vendor = port_priv->version[
-					mad->mad_hdr.class_version].vendor;
+					mad_hdr->class_version].vendor;
 			if (!vendor)
 				goto out;
 			vendor_class = vendor->vendor_class[vendor_class_index(
-						mad->mad_hdr.mgmt_class)];
+						mad_hdr->mgmt_class)];
 			if (!vendor_class)
 				goto out;
 			/* Find matching OUI */
-			vendor_mad = (struct ib_vendor_mad *)mad;
+			vendor_mad = (const struct ib_vendor_mad *)mad_hdr;
 			index = find_vendor_oui(vendor_class, vendor_mad->oui);
 			if (index == -1)
 				goto out;
 			method = vendor_class->method_table[index];
 			if (method) {
-				mad_agent = method->agent[mad->mad_hdr.method &
+				mad_agent = method->agent[mad_hdr->method &
 							  ~IB_MGMT_METHOD_RESP];
 			}
 		}
@@ -2056,7 +2056,7 @@ local:
 		}
 	}
 
-	mad_agent = find_mad_agent(port_priv, &recv->mad.mad);
+	mad_agent = find_mad_agent(port_priv, &recv->mad.mad.mad_hdr);
 	if (mad_agent) {
 		ib_mad_complete_recv(mad_agent, &recv->header.recv_wc);
 		/*
-- 
1.8.2


* [PATCH 02/14] IB/mad cleanup: Generalize processing of MAD data
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-06 18:38 UTC
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

ib_find_send_mad only needs access to the MAD header, not the full IB MAD.
Change the local variable to a const ib_mad_hdr pointer and adjust the
corresponding cast.

This allows the function to be used cleanly with both IB and OPA MADs, because
OPA MADs carry the same header as IB MADs.

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
Changes from V1
	Make ib_mad_hdr const so as to not cast away the const of the WC
	parameter.

 drivers/infiniband/core/mad.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 71cf3d51dad0..fa9157db7b2e 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -1815,18 +1815,18 @@ ib_find_send_mad(const struct ib_mad_agent_private *mad_agent_priv,
 		 const struct ib_mad_recv_wc *wc)
 {
 	struct ib_mad_send_wr_private *wr;
-	struct ib_mad *mad;
+	const struct ib_mad_hdr *mad_hdr;
 
-	mad = (struct ib_mad *)wc->recv_buf.mad;
+	mad_hdr = &wc->recv_buf.mad->mad_hdr;
 
 	list_for_each_entry(wr, &mad_agent_priv->wait_list, agent_list) {
-		if ((wr->tid == mad->mad_hdr.tid) &&
+		if ((wr->tid == mad_hdr->tid) &&
 		    rcv_has_same_class(wr, wc) &&
 		    /*
 		     * Don't check GID for direct routed MADs.
 		     * These might have permissive LIDs.
 		     */
-		    (is_direct(wc->recv_buf.mad->mad_hdr.mgmt_class) ||
+		    (is_direct(mad_hdr->mgmt_class) ||
 		     rcv_has_same_gid(mad_agent_priv, wr, wc)))
 			return (wr->status == IB_WC_SUCCESS) ? wr : NULL;
 	}
@@ -1837,14 +1837,14 @@ ib_find_send_mad(const struct ib_mad_agent_private *mad_agent_priv,
 	 */
 	list_for_each_entry(wr, &mad_agent_priv->send_list, agent_list) {
 		if (is_rmpp_data_mad(mad_agent_priv, wr->send_buf.mad) &&
-		    wr->tid == mad->mad_hdr.tid &&
+		    wr->tid == mad_hdr->tid &&
 		    wr->timeout &&
 		    rcv_has_same_class(wr, wc) &&
 		    /*
 		     * Don't check GID for direct routed MADs.
 		     * These might have permissive LIDs.
 		     */
-		    (is_direct(wc->recv_buf.mad->mad_hdr.mgmt_class) ||
+		    (is_direct(mad_hdr->mgmt_class) ||
 		     rcv_has_same_gid(mad_agent_priv, wr, wc)))
 			/* Verify request has not been canceled */
 			return (wr->status == IB_WC_SUCCESS) ? wr : NULL;
-- 
1.8.2


* [PATCH 03/14] IB/mad: Split IB SMI handling from MAD Recv handler
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-06 18:38 UTC
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Add a helper function, called from the IB MAD receive handler
(ib_mad_recv_done_handler), to process Directed Route SMPs.

This cleans up the MAD receive handler code and allows us to better share the
SMP processing code between IB and OPA SMPs.

IB and OPA SMPs share the same processing algorithm but have different header
formats and permissive LID detection.  Therefore this and subsequent patches
split the common processing code from the IB-specific code in anticipation of
sharing those algorithms with the OPA code.

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
Changes from V1:
	add const to input variables
	make static
	Reword commit message

 drivers/infiniband/core/mad.c | 85 +++++++++++++++++++++++++------------------
 1 file changed, 49 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index fa9157db7b2e..b1c7990f36e3 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -1924,6 +1924,52 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
 	}
 }
 
+static enum smi_action handle_ib_smi(const struct ib_mad_port_private *port_priv,
+				     const struct ib_mad_qp_info *qp_info,
+				     const struct ib_wc *wc,
+				     int port_num,
+				     struct ib_mad_private *recv,
+				     struct ib_mad_private *response)
+{
+	enum smi_forward_action retsmi;
+
+	if (smi_handle_dr_smp_recv(&recv->mad.smp,
+				   port_priv->device->node_type,
+				   port_num,
+				   port_priv->device->phys_port_cnt) ==
+				   IB_SMI_DISCARD)
+		return IB_SMI_DISCARD;
+
+	retsmi = smi_check_forward_dr_smp(&recv->mad.smp);
+	if (retsmi == IB_SMI_LOCAL)
+		return IB_SMI_HANDLE;
+
+	if (retsmi == IB_SMI_SEND) { /* don't forward */
+		if (smi_handle_dr_smp_send(&recv->mad.smp,
+					   port_priv->device->node_type,
+					   port_num) == IB_SMI_DISCARD)
+			return IB_SMI_DISCARD;
+
+		if (smi_check_local_smp(&recv->mad.smp, port_priv->device) == IB_SMI_DISCARD)
+			return IB_SMI_DISCARD;
+	} else if (port_priv->device->node_type == RDMA_NODE_IB_SWITCH) {
+		/* forward case for switches */
+		memcpy(response, recv, sizeof(*response));
+		response->header.recv_wc.wc = &response->header.wc;
+		response->header.recv_wc.recv_buf.mad = &response->mad.mad;
+		response->header.recv_wc.recv_buf.grh = &response->grh;
+
+		agent_send_response(&response->mad.mad,
+				    &response->grh, wc,
+				    port_priv->device,
+				    smi_get_fwd_port(&recv->mad.smp),
+				    qp_info->qp->qp_num);
+
+		return IB_SMI_DISCARD;
+	}
+	return IB_SMI_HANDLE;
+}
+
 static bool generate_unmatched_resp(struct ib_mad_private *recv,
 				    struct ib_mad_private *response)
 {
@@ -1996,45 +2042,12 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 
 	if (recv->mad.mad.mad_hdr.mgmt_class ==
 	    IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) {
-		enum smi_forward_action retsmi;
-
-		if (smi_handle_dr_smp_recv(&recv->mad.smp,
-					   port_priv->device->node_type,
-					   port_num,
-					   port_priv->device->phys_port_cnt) ==
-					   IB_SMI_DISCARD)
+		if (handle_ib_smi(port_priv, qp_info, wc, port_num, recv,
+				  response)
+		    == IB_SMI_DISCARD)
 			goto out;
-
-		retsmi = smi_check_forward_dr_smp(&recv->mad.smp);
-		if (retsmi == IB_SMI_LOCAL)
-			goto local;
-
-		if (retsmi == IB_SMI_SEND) { /* don't forward */
-			if (smi_handle_dr_smp_send(&recv->mad.smp,
-						   port_priv->device->node_type,
-						   port_num) == IB_SMI_DISCARD)
-				goto out;
-
-			if (smi_check_local_smp(&recv->mad.smp, port_priv->device) == IB_SMI_DISCARD)
-				goto out;
-		} else if (port_priv->device->node_type == RDMA_NODE_IB_SWITCH) {
-			/* forward case for switches */
-			memcpy(response, recv, sizeof(*response));
-			response->header.recv_wc.wc = &response->header.wc;
-			response->header.recv_wc.recv_buf.mad = &response->mad.mad;
-			response->header.recv_wc.recv_buf.grh = &response->grh;
-
-			agent_send_response(&response->mad.mad,
-					    &response->grh, wc,
-					    port_priv->device,
-					    smi_get_fwd_port(&recv->mad.smp),
-					    qp_info->qp->qp_num);
-
-			goto out;
-		}
 	}
 
-local:
 	/* Give driver "right of first refusal" on incoming MAD */
 	if (port_priv->device->process_mad) {
 		ret = port_priv->device->process_mad(port_priv->device, 0,
-- 
1.8.2


* [PATCH 04/14] IB/mad: Create a generic helper for DR SMP Send processing
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-06 18:38 UTC
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

IB and OPA SMPs share the same processing algorithm but have different header
formats and permissive LID detection.

Add a generic helper for processing DR SMP Send messages that can be used by
both the IB and OPA SMP code.

Use this function in the current IB function smi_handle_dr_smp_send.

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
Changes from V1:
	Remove unnecessary inline qualifier
	Change parameters to bool
	Add const to input parameters
	Clean up commit message

 drivers/infiniband/core/smi.c | 80 +++++++++++++++++++++++++------------------
 1 file changed, 46 insertions(+), 34 deletions(-)

diff --git a/drivers/infiniband/core/smi.c b/drivers/infiniband/core/smi.c
index e6c6810c8c41..b6dedc0918fe 100644
--- a/drivers/infiniband/core/smi.c
+++ b/drivers/infiniband/core/smi.c
@@ -39,84 +39,80 @@
 #include <rdma/ib_smi.h>
 #include "smi.h"
 
-/*
- * Fixup a directed route SMP for sending
- * Return IB_SMI_DISCARD if the SMP should be discarded
- */
-enum smi_action smi_handle_dr_smp_send(struct ib_smp *smp,
-				       u8 node_type, int port_num)
+static enum smi_action __smi_handle_dr_smp_send(u8 node_type, int port_num,
+						u8 *hop_ptr, u8 hop_cnt,
+						const u8 *initial_path,
+						const u8 *return_path,
+						u8 direction,
+						bool dr_dlid_is_permissive,
+						bool dr_slid_is_permissive)
 {
-	u8 hop_ptr, hop_cnt;
-
-	hop_ptr = smp->hop_ptr;
-	hop_cnt = smp->hop_cnt;
-
 	/* See section 14.2.2.2, Vol 1 IB spec */
 	/* C14-6 -- valid hop_cnt values are from 0 to 63 */
 	if (hop_cnt >= IB_SMP_MAX_PATH_HOPS)
 		return IB_SMI_DISCARD;
 
-	if (!ib_get_smp_direction(smp)) {
+	if (!direction) {
 		/* C14-9:1 */
-		if (hop_cnt && hop_ptr == 0) {
-			smp->hop_ptr++;
-			return (smp->initial_path[smp->hop_ptr] ==
+		if (hop_cnt && *hop_ptr == 0) {
+			(*hop_ptr)++;
+			return (initial_path[*hop_ptr] ==
 				port_num ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-9:2 */
-		if (hop_ptr && hop_ptr < hop_cnt) {
+		if (*hop_ptr && *hop_ptr < hop_cnt) {
 			if (node_type != RDMA_NODE_IB_SWITCH)
 				return IB_SMI_DISCARD;
 
-			/* smp->return_path set when received */
-			smp->hop_ptr++;
-			return (smp->initial_path[smp->hop_ptr] ==
+			/* return_path set when received */
+			(*hop_ptr)++;
+			return (initial_path[*hop_ptr] ==
 				port_num ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-9:3 -- We're at the end of the DR segment of path */
-		if (hop_ptr == hop_cnt) {
-			/* smp->return_path set when received */
-			smp->hop_ptr++;
+		if (*hop_ptr == hop_cnt) {
+			/* return_path set when received */
+			(*hop_ptr)++;
 			return (node_type == RDMA_NODE_IB_SWITCH ||
-				smp->dr_dlid == IB_LID_PERMISSIVE ?
+				dr_dlid_is_permissive ?
 				IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-9:4 -- hop_ptr = hop_cnt + 1 -> give to SMA/SM */
 		/* C14-9:5 -- Fail unreasonable hop pointer */
-		return (hop_ptr == hop_cnt + 1 ? IB_SMI_HANDLE : IB_SMI_DISCARD);
+		return (*hop_ptr == hop_cnt + 1 ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 
 	} else {
 		/* C14-13:1 */
-		if (hop_cnt && hop_ptr == hop_cnt + 1) {
-			smp->hop_ptr--;
-			return (smp->return_path[smp->hop_ptr] ==
+		if (hop_cnt && *hop_ptr == hop_cnt + 1) {
+			(*hop_ptr)--;
+			return (return_path[*hop_ptr] ==
 				port_num ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-13:2 */
-		if (2 <= hop_ptr && hop_ptr <= hop_cnt) {
+		if (2 <= *hop_ptr && *hop_ptr <= hop_cnt) {
 			if (node_type != RDMA_NODE_IB_SWITCH)
 				return IB_SMI_DISCARD;
 
-			smp->hop_ptr--;
-			return (smp->return_path[smp->hop_ptr] ==
+			(*hop_ptr)--;
+			return (return_path[*hop_ptr] ==
 				port_num ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-13:3 -- at the end of the DR segment of path */
-		if (hop_ptr == 1) {
-			smp->hop_ptr--;
+		if (*hop_ptr == 1) {
+			(*hop_ptr)--;
 			/* C14-13:3 -- SMPs destined for SM shouldn't be here */
 			return (node_type == RDMA_NODE_IB_SWITCH ||
-				smp->dr_slid == IB_LID_PERMISSIVE ?
+				dr_slid_is_permissive ?
 				IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-13:4 -- hop_ptr = 0 -> should have gone to SM */
-		if (hop_ptr == 0)
+		if (*hop_ptr == 0)
 			return IB_SMI_HANDLE;
 
 		/* C14-13:5 -- Check for unreasonable hop pointer */
@@ -125,6 +121,22 @@ enum smi_action smi_handle_dr_smp_send(struct ib_smp *smp,
 }
 
 /*
+ * Fixup a directed route SMP for sending
+ * Return IB_SMI_DISCARD if the SMP should be discarded
+ */
+enum smi_action smi_handle_dr_smp_send(struct ib_smp *smp,
+				       u8 node_type, int port_num)
+{
+	return __smi_handle_dr_smp_send(node_type, port_num,
+					&smp->hop_ptr, smp->hop_cnt,
+					smp->initial_path,
+					smp->return_path,
+					ib_get_smp_direction(smp),
+					smp->dr_dlid == IB_LID_PERMISSIVE,
+					smp->dr_slid == IB_LID_PERMISSIVE);
+}
+
+/*
  * Adjust information for a received SMP
  * Return IB_SMI_DISCARD if the SMP should be dropped
  */
-- 
1.8.2


* [PATCH 05/14] IB/mad: Create a generic helper for DR SMP Recv processing
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-06 18:38 UTC
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

IB and OPA SMPs share the same processing algorithm but have different header
formats and permissive LID detection.

Add a generic helper for processing DR SMP Recv messages that can be used by
both the IB and OPA SMP code.

Use this function in the current IB function smi_handle_dr_smp_recv.

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
Changes from V1:
	Remove unnecessary inline qualifier
	Change parameters to bool
	Make initial_path const
	Clean up commit message

 drivers/infiniband/core/smi.c | 79 +++++++++++++++++++++++++------------------
 1 file changed, 46 insertions(+), 33 deletions(-)

diff --git a/drivers/infiniband/core/smi.c b/drivers/infiniband/core/smi.c
index b6dedc0918fe..eb39146adb80 100644
--- a/drivers/infiniband/core/smi.c
+++ b/drivers/infiniband/core/smi.c
@@ -136,91 +136,104 @@ enum smi_action smi_handle_dr_smp_send(struct ib_smp *smp,
 					smp->dr_slid == IB_LID_PERMISSIVE);
 }
 
-/*
- * Adjust information for a received SMP
- * Return IB_SMI_DISCARD if the SMP should be dropped
- */
-enum smi_action smi_handle_dr_smp_recv(struct ib_smp *smp, u8 node_type,
-				       int port_num, int phys_port_cnt)
+static enum smi_action __smi_handle_dr_smp_recv(u8 node_type, int port_num,
+						int phys_port_cnt,
+						u8 *hop_ptr, u8 hop_cnt,
+						const u8 *initial_path,
+						u8 *return_path,
+						u8 direction,
+						bool dr_dlid_is_permissive,
+						bool dr_slid_is_permissive)
 {
-	u8 hop_ptr, hop_cnt;
-
-	hop_ptr = smp->hop_ptr;
-	hop_cnt = smp->hop_cnt;
-
 	/* See section 14.2.2.2, Vol 1 IB spec */
 	/* C14-6 -- valid hop_cnt values are from 0 to 63 */
 	if (hop_cnt >= IB_SMP_MAX_PATH_HOPS)
 		return IB_SMI_DISCARD;
 
-	if (!ib_get_smp_direction(smp)) {
+	if (!direction) {
 		/* C14-9:1 -- sender should have incremented hop_ptr */
-		if (hop_cnt && hop_ptr == 0)
+		if (hop_cnt && *hop_ptr == 0)
 			return IB_SMI_DISCARD;
 
 		/* C14-9:2 -- intermediate hop */
-		if (hop_ptr && hop_ptr < hop_cnt) {
+		if (*hop_ptr && *hop_ptr < hop_cnt) {
 			if (node_type != RDMA_NODE_IB_SWITCH)
 				return IB_SMI_DISCARD;
 
-			smp->return_path[hop_ptr] = port_num;
-			/* smp->hop_ptr updated when sending */
-			return (smp->initial_path[hop_ptr+1] <= phys_port_cnt ?
+			return_path[*hop_ptr] = port_num;
+			/* hop_ptr updated when sending */
+			return (initial_path[*hop_ptr+1] <= phys_port_cnt ?
 				IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-9:3 -- We're at the end of the DR segment of path */
-		if (hop_ptr == hop_cnt) {
+		if (*hop_ptr == hop_cnt) {
 			if (hop_cnt)
-				smp->return_path[hop_ptr] = port_num;
-			/* smp->hop_ptr updated when sending */
+				return_path[*hop_ptr] = port_num;
+			/* hop_ptr updated when sending */
 
 			return (node_type == RDMA_NODE_IB_SWITCH ||
-				smp->dr_dlid == IB_LID_PERMISSIVE ?
+				dr_dlid_is_permissive ?
 				IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-9:4 -- hop_ptr = hop_cnt + 1 -> give to SMA/SM */
 		/* C14-9:5 -- fail unreasonable hop pointer */
-		return (hop_ptr == hop_cnt + 1 ? IB_SMI_HANDLE : IB_SMI_DISCARD);
+		return (*hop_ptr == hop_cnt + 1 ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 
 	} else {
 
 		/* C14-13:1 */
-		if (hop_cnt && hop_ptr == hop_cnt + 1) {
-			smp->hop_ptr--;
-			return (smp->return_path[smp->hop_ptr] ==
+		if (hop_cnt && *hop_ptr == hop_cnt + 1) {
+			(*hop_ptr)--;
+			return (return_path[*hop_ptr] ==
 				port_num ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-13:2 */
-		if (2 <= hop_ptr && hop_ptr <= hop_cnt) {
+		if (2 <= *hop_ptr && *hop_ptr <= hop_cnt) {
 			if (node_type != RDMA_NODE_IB_SWITCH)
 				return IB_SMI_DISCARD;
 
-			/* smp->hop_ptr updated when sending */
-			return (smp->return_path[hop_ptr-1] <= phys_port_cnt ?
+			/* hop_ptr updated when sending */
+			return (return_path[*hop_ptr-1] <= phys_port_cnt ?
 				IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-13:3 -- We're at the end of the DR segment of path */
-		if (hop_ptr == 1) {
-			if (smp->dr_slid == IB_LID_PERMISSIVE) {
+		if (*hop_ptr == 1) {
+			if (dr_slid_is_permissive) {
 				/* giving SMP to SM - update hop_ptr */
-				smp->hop_ptr--;
+				(*hop_ptr)--;
 				return IB_SMI_HANDLE;
 			}
-			/* smp->hop_ptr updated when sending */
+			/* hop_ptr updated when sending */
 			return (node_type == RDMA_NODE_IB_SWITCH ?
 				IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-13:4 -- hop_ptr = 0 -> give to SM */
 		/* C14-13:5 -- Check for unreasonable hop pointer */
-		return (hop_ptr == 0 ? IB_SMI_HANDLE : IB_SMI_DISCARD);
+		return (*hop_ptr == 0 ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 	}
 }
 
+/*
+ * Adjust information for a received SMP
+ * Return IB_SMI_DISCARD if the SMP should be dropped
+ */
+enum smi_action smi_handle_dr_smp_recv(struct ib_smp *smp, u8 node_type,
+				       int port_num, int phys_port_cnt)
+{
+	return __smi_handle_dr_smp_recv(node_type, port_num, phys_port_cnt,
+					&smp->hop_ptr, smp->hop_cnt,
+					smp->initial_path,
+					smp->return_path,
+					ib_get_smp_direction(smp),
+					smp->dr_dlid == IB_LID_PERMISSIVE,
+					smp->dr_slid == IB_LID_PERMISSIVE);
+}
+
 enum smi_forward_action smi_check_forward_dr_smp(struct ib_smp *smp)
 {
 	u8 hop_ptr, hop_cnt;
-- 
1.8.2


* [PATCH 06/14] IB/mad: Create a generic helper for DR forwarding checks
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-06 18:38 UTC
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

IB and OPA SMPs share the same processing algorithm but have different header
formats and permissive LID detection.

Add a generic helper for the DR forwarding checks that can be used by both the
IB and OPA SMP code.

Use this function in the current IB function smi_check_forward_dr_smp.

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
Changes from V1:
	Remove unnecessary inline qualifier
	Change parameters to bool
	Fix is permissive logic
	Clean up commit message

 drivers/infiniband/core/smi.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/smi.c b/drivers/infiniband/core/smi.c
index eb39146adb80..c523b2df2571 100644
--- a/drivers/infiniband/core/smi.c
+++ b/drivers/infiniband/core/smi.c
@@ -234,21 +234,19 @@ enum smi_action smi_handle_dr_smp_recv(struct ib_smp *smp, u8 node_type,
 					smp->dr_slid == IB_LID_PERMISSIVE);
 }
 
-enum smi_forward_action smi_check_forward_dr_smp(struct ib_smp *smp)
+static enum smi_forward_action __smi_check_forward_dr_smp(u8 hop_ptr, u8 hop_cnt,
+							  u8 direction,
+							  bool dr_dlid_is_permissive,
+							  bool dr_slid_is_permissive)
 {
-	u8 hop_ptr, hop_cnt;
-
-	hop_ptr = smp->hop_ptr;
-	hop_cnt = smp->hop_cnt;
-
-	if (!ib_get_smp_direction(smp)) {
+	if (!direction) {
 		/* C14-9:2 -- intermediate hop */
 		if (hop_ptr && hop_ptr < hop_cnt)
 			return IB_SMI_FORWARD;
 
 		/* C14-9:3 -- at the end of the DR segment of path */
 		if (hop_ptr == hop_cnt)
-			return (smp->dr_dlid == IB_LID_PERMISSIVE ?
+			return (dr_dlid_is_permissive ?
 				IB_SMI_SEND : IB_SMI_LOCAL);
 
 		/* C14-9:4 -- hop_ptr = hop_cnt + 1 -> give to SMA/SM */
@@ -261,10 +259,19 @@ enum smi_forward_action smi_check_forward_dr_smp(struct ib_smp *smp)
 
 		/* C14-13:3 -- at the end of the DR segment of path */
 		if (hop_ptr == 1)
-			return (smp->dr_slid != IB_LID_PERMISSIVE ?
+			return (!dr_slid_is_permissive ?
 				IB_SMI_SEND : IB_SMI_LOCAL);
 	}
 	return IB_SMI_LOCAL;
+
+}
+
+enum smi_forward_action smi_check_forward_dr_smp(struct ib_smp *smp)
+{
+	return __smi_check_forward_dr_smp(smp->hop_ptr, smp->hop_cnt,
+					  ib_get_smp_direction(smp),
+					  smp->dr_dlid == IB_LID_PERMISSIVE,
+					  smp->dr_slid == IB_LID_PERMISSIVE);
 }
 
 /*
-- 
1.8.2


* [PATCH 07/14] IB/mad: Support alternate Base Versions when creating MADs
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-06 18:38 UTC
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

In preparation for supporting the new OPA MAD base version, add a base version
parameter to ib_create_send_mad and set it to IB_MGMT_BASE_VERSION for all
current users.

The definition of the new base version and its processing will follow in later
patches.
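
A hypothetical OPA caller would then simply pass the new constant, e.g.
(constant name taken from the later patches in this series):

	msg = ib_create_send_mad(agent, qpn, pkey_index, 0,
				 hdr_len, data_len, GFP_KERNEL,
				 OPA_MGMT_BASE_VERSION);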

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/agent.c         | 3 ++-
 drivers/infiniband/core/cm.c            | 6 ++++--
 drivers/infiniband/core/mad.c           | 3 ++-
 drivers/infiniband/core/mad_rmpp.c      | 6 ++++--
 drivers/infiniband/core/sa_query.c      | 3 ++-
 drivers/infiniband/core/user_mad.c      | 3 ++-
 drivers/infiniband/hw/mlx4/mad.c        | 3 ++-
 drivers/infiniband/hw/mthca/mthca_mad.c | 3 ++-
 drivers/infiniband/hw/qib/qib_iba7322.c | 3 ++-
 drivers/infiniband/hw/qib/qib_mad.c     | 3 ++-
 drivers/infiniband/ulp/srpt/ib_srpt.c   | 3 ++-
 include/rdma/ib_mad.h                   | 4 +++-
 12 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index e51ea76c2523..4fe1fb6b37cd 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -108,7 +108,8 @@ void agent_send_response(const struct ib_mad *mad, const struct ib_grh *grh,
 
 	send_buf = ib_create_send_mad(agent, wc->src_qp, wc->pkey_index, 0,
 				      IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
-				      GFP_KERNEL);
+				      GFP_KERNEL,
+				      IB_MGMT_BASE_VERSION);
 	if (IS_ERR(send_buf)) {
 		dev_err(&device->dev, "ib_create_send_mad error\n");
 		goto err1;
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index c3be6663f4f5..dbddddd6fb5d 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -267,7 +267,8 @@ static int cm_alloc_msg(struct cm_id_private *cm_id_priv,
 	m = ib_create_send_mad(mad_agent, cm_id_priv->id.remote_cm_qpn,
 			       cm_id_priv->av.pkey_index,
 			       0, IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
-			       GFP_ATOMIC);
+			       GFP_ATOMIC,
+			       IB_MGMT_BASE_VERSION);
 	if (IS_ERR(m)) {
 		ib_destroy_ah(ah);
 		return PTR_ERR(m);
@@ -297,7 +298,8 @@ static int cm_alloc_response_msg(struct cm_port *port,
 
 	m = ib_create_send_mad(port->mad_agent, 1, mad_recv_wc->wc->pkey_index,
 			       0, IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
-			       GFP_ATOMIC);
+			       GFP_ATOMIC,
+			       IB_MGMT_BASE_VERSION);
 	if (IS_ERR(m)) {
 		ib_destroy_ah(ah);
 		return PTR_ERR(m);
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index b1c7990f36e3..de7e239d3d6f 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -920,7 +920,8 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
 					    u32 remote_qpn, u16 pkey_index,
 					    int rmpp_active,
 					    int hdr_len, int data_len,
-					    gfp_t gfp_mask)
+					    gfp_t gfp_mask,
+					    u8 base_version)
 {
 	struct ib_mad_agent_private *mad_agent_priv;
 	struct ib_mad_send_wr_private *mad_send_wr;
diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
index f37878c9c06e..2379e2dfa400 100644
--- a/drivers/infiniband/core/mad_rmpp.c
+++ b/drivers/infiniband/core/mad_rmpp.c
@@ -139,7 +139,8 @@ static void ack_recv(struct mad_rmpp_recv *rmpp_recv,
 	hdr_len = ib_get_mad_data_offset(recv_wc->recv_buf.mad->mad_hdr.mgmt_class);
 	msg = ib_create_send_mad(&rmpp_recv->agent->agent, recv_wc->wc->src_qp,
 				 recv_wc->wc->pkey_index, 1, hdr_len,
-				 0, GFP_KERNEL);
+				 0, GFP_KERNEL,
+				 IB_MGMT_BASE_VERSION);
 	if (IS_ERR(msg))
 		return;
 
@@ -165,7 +166,8 @@ static struct ib_mad_send_buf *alloc_response_msg(struct ib_mad_agent *agent,
 	hdr_len = ib_get_mad_data_offset(recv_wc->recv_buf.mad->mad_hdr.mgmt_class);
 	msg = ib_create_send_mad(agent, recv_wc->wc->src_qp,
 				 recv_wc->wc->pkey_index, 1,
-				 hdr_len, 0, GFP_KERNEL);
+				 hdr_len, 0, GFP_KERNEL,
+				 IB_MGMT_BASE_VERSION);
 	if (IS_ERR(msg))
 		ib_destroy_ah(ah);
 	else {
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 7f7c8c9fa92c..78fbedd8d013 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -583,7 +583,8 @@ static int alloc_mad(struct ib_sa_query *query, gfp_t gfp_mask)
 	query->mad_buf = ib_create_send_mad(query->port->agent, 1,
 					    query->sm_ah->pkey_index,
 					    0, IB_MGMT_SA_HDR, IB_MGMT_SA_DATA,
-					    gfp_mask);
+					    gfp_mask,
+					    IB_MGMT_BASE_VERSION);
 	if (IS_ERR(query->mad_buf)) {
 		kref_put(&query->sm_ah->ref, free_sm_ah);
 		return -ENOMEM;
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index e58d701b7791..d4286712405d 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -520,7 +520,8 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
 	packet->msg = ib_create_send_mad(agent,
 					 be32_to_cpu(packet->mad.hdr.qpn),
 					 packet->mad.hdr.pkey_index, rmpp_active,
-					 hdr_len, data_len, GFP_KERNEL);
+					 hdr_len, data_len, GFP_KERNEL,
+					 IB_MGMT_BASE_VERSION);
 	if (IS_ERR(packet->msg)) {
 		ret = PTR_ERR(packet->msg);
 		goto err_ah;
diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index 614ac6f07ae1..6ac41cc15872 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -367,7 +367,8 @@ static void forward_trap(struct mlx4_ib_dev *dev, u8 port_num, const struct ib_m
 
 	if (agent) {
 		send_buf = ib_create_send_mad(agent, qpn, 0, 0, IB_MGMT_MAD_HDR,
-					      IB_MGMT_MAD_DATA, GFP_ATOMIC);
+					      IB_MGMT_MAD_DATA, GFP_ATOMIC,
+					      IB_MGMT_BASE_VERSION);
 		if (IS_ERR(send_buf))
 			return;
 		/*
diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c
index d54608ca0820..e121e646591d 100644
--- a/drivers/infiniband/hw/mthca/mthca_mad.c
+++ b/drivers/infiniband/hw/mthca/mthca_mad.c
@@ -170,7 +170,8 @@ static void forward_trap(struct mthca_dev *dev,
 
 	if (agent) {
 		send_buf = ib_create_send_mad(agent, qpn, 0, 0, IB_MGMT_MAD_HDR,
-					      IB_MGMT_MAD_DATA, GFP_ATOMIC);
+					      IB_MGMT_MAD_DATA, GFP_ATOMIC,
+					      IB_MGMT_BASE_VERSION);
 		if (IS_ERR(send_buf))
 			return;
 		/*
diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index f32b4628e991..6c8ff10101c0 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -5502,7 +5502,8 @@ static void try_7322_ipg(struct qib_pportdata *ppd)
 		goto retry;
 
 	send_buf = ib_create_send_mad(agent, 0, 0, 0, IB_MGMT_MAD_HDR,
-				      IB_MGMT_MAD_DATA, GFP_ATOMIC);
+				      IB_MGMT_MAD_DATA, GFP_ATOMIC,
+				      IB_MGMT_BASE_VERSION);
 	if (IS_ERR(send_buf))
 		goto retry;
 
diff --git a/drivers/infiniband/hw/qib/qib_mad.c b/drivers/infiniband/hw/qib/qib_mad.c
index 6ab8ab89d058..206b2050b247 100644
--- a/drivers/infiniband/hw/qib/qib_mad.c
+++ b/drivers/infiniband/hw/qib/qib_mad.c
@@ -83,7 +83,8 @@ static void qib_send_trap(struct qib_ibport *ibp, void *data, unsigned len)
 		return;
 
 	send_buf = ib_create_send_mad(agent, 0, 0, 0, IB_MGMT_MAD_HDR,
-				      IB_MGMT_MAD_DATA, GFP_ATOMIC);
+				      IB_MGMT_MAD_DATA, GFP_ATOMIC,
+				      IB_MGMT_BASE_VERSION);
 	if (IS_ERR(send_buf))
 		return;
 
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 9b84b4c0a000..68201d69d99e 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -476,7 +476,8 @@ static void srpt_mad_recv_handler(struct ib_mad_agent *mad_agent,
 	rsp = ib_create_send_mad(mad_agent, mad_wc->wc->src_qp,
 				 mad_wc->wc->pkey_index, 0,
 				 IB_MGMT_DEVICE_HDR, IB_MGMT_DEVICE_DATA,
-				 GFP_KERNEL);
+				 GFP_KERNEL,
+				 IB_MGMT_BASE_VERSION);
 	if (IS_ERR(rsp))
 		goto err_rsp;
 
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index c0ea51f90a03..bf03ce07c316 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -618,6 +618,7 @@ int ib_process_mad_wc(struct ib_mad_agent *mad_agent,
  *   automatically adjust the allocated buffer size to account for any
  *   additional padding that may be necessary.
  * @gfp_mask: GFP mask used for the memory allocation.
+ * @base_version: Base Version of this MAD
  *
  * This routine allocates a MAD for sending.  The returned MAD send buffer
  * will reference a data buffer usable for sending a MAD, along
@@ -633,7 +634,8 @@ struct ib_mad_send_buf *ib_create_send_mad(struct ib_mad_agent *mad_agent,
 					   u32 remote_qpn, u16 pkey_index,
 					   int rmpp_active,
 					   int hdr_len, int data_len,
-					   gfp_t gfp_mask);
+					   gfp_t gfp_mask,
+					   u8 base_version);
 
 /**
  * ib_is_mad_class_rmpp - returns whether given management class
-- 
1.8.2


* [PATCH 08/14] IB/core: Add ability for drivers to report an alternate MAD size.
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-06 18:38 UTC
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Add max MAD size to the device immutable data set and have all drivers that
support MADs report the current IB MAD size (IB_MGMT_MAD_SIZE) to the core.

Verify MAD size data in both the MAD core and when reading the immutable data.

OPA drivers will report alternate MAD sizes in subsequent patches.
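
An OPA driver's port_immutable callback would then differ only in what it
reports; a sketch, assuming the OPA capability flag and 2K MAD size constant
introduced later in this series:

	immutable->core_cap_flags |= RDMA_CORE_PORT_INTEL_OPA;
	immutable->max_mad_size = OPA_MGMT_MAD_SIZE;	/* 2K OPA MADs */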

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
Changes from V1:
	Add check on immutable data
	Clean up the comment
	Add const to helper function parameter
	Clean up commit message

 drivers/infiniband/core/device.c             | 11 +++++++++++
 drivers/infiniband/core/mad.c                |  3 +++
 drivers/infiniband/hw/ehca/ehca_main.c       |  2 ++
 drivers/infiniband/hw/ipath/ipath_verbs.c    |  1 +
 drivers/infiniband/hw/mlx4/main.c            |  2 ++
 drivers/infiniband/hw/mlx5/main.c            |  1 +
 drivers/infiniband/hw/mthca/mthca_provider.c |  1 +
 drivers/infiniband/hw/ocrdma/ocrdma_main.c   |  2 ++
 drivers/infiniband/hw/qib/qib_verbs.c        |  1 +
 include/rdma/ib_mad.h                        |  1 +
 include/rdma/ib_verbs.h                      | 18 ++++++++++++++++++
 11 files changed, 43 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 8d07c12ab718..5a67be5d391d 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -211,6 +211,12 @@ static int add_client_context(struct ib_device *device, struct ib_client *client
 	return 0;
 }
 
+static int verify_immutable(const struct ib_device *dev, u8 port)
+{
+	return WARN_ON(!rdma_cap_ib_mad(dev, port) &&
+			    rdma_max_mad_size(dev, port) != 0);
+}
+
 static int read_port_immutable(struct ib_device *device)
 {
 	int ret = -ENOMEM;
@@ -236,6 +242,11 @@ static int read_port_immutable(struct ib_device *device)
 						 &device->port_immutable[port]);
 		if (ret)
 			goto err;
+
+		if (verify_immutable(device, port)) {
+			ret = -EINVAL;
+			goto err;
+		}
 	}
 
 	ret = 0;
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index de7e239d3d6f..7b94be9aa6de 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2938,6 +2938,9 @@ static int ib_mad_port_open(struct ib_device *device,
 	char name[sizeof "ib_mad123"];
 	int has_smi;
 
+	if (WARN_ON(rdma_max_mad_size(device, port_num) < IB_MGMT_MAD_SIZE))
+		return -EFAULT;
+
 	/* Create new device info */
 	port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL);
 	if (!port_priv) {
diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index 5e30b72d3677..64c834b358d6 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -46,6 +46,7 @@
 
 #include <linux/notifier.h>
 #include <linux/memory.h>
+#include <rdma/ib_mad.h>
 #include "ehca_classes.h"
 #include "ehca_iverbs.h"
 #include "ehca_mrmw.h"
@@ -444,6 +445,7 @@ static int ehca_port_immutable(struct ib_device *ibdev, u8 port_num,
 	immutable->pkey_tbl_len = attr.pkey_tbl_len;
 	immutable->gid_tbl_len = attr.gid_tbl_len;
 	immutable->core_cap_flags = RDMA_CORE_PORT_IBA_IB;
+	immutable->max_mad_size = IB_MGMT_MAD_SIZE;
 
 	return 0;
 }
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 764081d305b6..c67e8c22dabc 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1993,6 +1993,7 @@ static int ipath_port_immutable(struct ib_device *ibdev, u8 port_num,
 	immutable->pkey_tbl_len = attr.pkey_tbl_len;
 	immutable->gid_tbl_len = attr.gid_tbl_len;
 	immutable->core_cap_flags = RDMA_CORE_PORT_IBA_IB;
+	immutable->max_mad_size = IB_MGMT_MAD_SIZE;
 
 	return 0;
 }
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 86c0c27120f7..113733ccfa08 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2133,6 +2133,8 @@ static int mlx4_port_immutable(struct ib_device *ibdev, u8 port_num,
 	else
 		immutable->core_cap_flags = RDMA_CORE_PORT_IBA_ROCE;
 
+	immutable->max_mad_size = IB_MGMT_MAD_SIZE;
+
 	return 0;
 }
 
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index b2fdb9cfa645..bb7f718adc11 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1195,6 +1195,7 @@ static int mlx5_port_immutable(struct ib_device *ibdev, u8 port_num,
 	immutable->pkey_tbl_len = attr.pkey_tbl_len;
 	immutable->gid_tbl_len = attr.gid_tbl_len;
 	immutable->core_cap_flags = RDMA_CORE_PORT_IBA_IB;
+	immutable->max_mad_size = IB_MGMT_MAD_SIZE;
 
 	return 0;
 }
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 509d59e7a15a..db11d0cc5d59 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -1257,6 +1257,7 @@ static int mthca_port_immutable(struct ib_device *ibdev, u8 port_num,
 	immutable->pkey_tbl_len = attr.pkey_tbl_len;
 	immutable->gid_tbl_len = attr.gid_tbl_len;
 	immutable->core_cap_flags = RDMA_CORE_PORT_IBA_IB;
+	immutable->max_mad_size = IB_MGMT_MAD_SIZE;
 
 	return 0;
 }
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index f55289869357..8a1398b253a2 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -30,6 +30,7 @@
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_user_verbs.h>
 #include <rdma/ib_addr.h>
+#include <rdma/ib_mad.h>
 
 #include <linux/netdevice.h>
 #include <net/addrconf.h>
@@ -215,6 +216,7 @@ static int ocrdma_port_immutable(struct ib_device *ibdev, u8 port_num,
 	immutable->pkey_tbl_len = attr.pkey_tbl_len;
 	immutable->gid_tbl_len = attr.gid_tbl_len;
 	immutable->core_cap_flags = RDMA_CORE_PORT_IBA_ROCE;
+	immutable->max_mad_size = IB_MGMT_MAD_SIZE;
 
 	return 0;
 }
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index dba1c92f1a54..e9e21f9c36e2 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -2053,6 +2053,7 @@ static int qib_port_immutable(struct ib_device *ibdev, u8 port_num,
 	immutable->pkey_tbl_len = attr.pkey_tbl_len;
 	immutable->gid_tbl_len = attr.gid_tbl_len;
 	immutable->core_cap_flags = RDMA_CORE_PORT_IBA_IB;
+	immutable->max_mad_size = IB_MGMT_MAD_SIZE;
 
 	return 0;
 }
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index bf03ce07c316..349880696abc 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -135,6 +135,7 @@ enum {
 	IB_MGMT_SA_DATA = 200,
 	IB_MGMT_DEVICE_HDR = 64,
 	IB_MGMT_DEVICE_DATA = 192,
+	IB_MGMT_MAD_SIZE = IB_MGMT_MAD_HDR + IB_MGMT_MAD_DATA,
 };
 
 struct ib_mad_hdr {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 7d78794ed189..f054f2db7084 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1523,6 +1523,7 @@ struct ib_port_immutable {
 	int                           pkey_tbl_len;
 	int                           gid_tbl_len;
 	u32                           core_cap_flags;
+	u32                           max_mad_size;
 };
 
 struct ib_device {
@@ -2040,6 +2041,23 @@ static inline bool rdma_cap_read_multi_sge(struct ib_device *device,
 	return !(device->port_immutable[port_num].core_cap_flags & RDMA_CORE_CAP_PROT_IWARP);
 }
 
+/**
+ * rdma_max_mad_size - Return the max MAD size required by this RDMA Port.
+ *
+ * @device: Device
+ * @port_num: Port number
+ *
+ * This MAD size includes the MAD headers and MAD payload.  No other headers
+ * are included.
+ *
+ * Return the max MAD size required by the Port.  Will return 0 if the port
+ * does not support MADs
+ */
+static inline size_t rdma_max_mad_size(const struct ib_device *device, u8 port_num)
+{
+	return device->port_immutable[port_num].max_mad_size;
+}
+
 int ib_query_gid(struct ib_device *device,
 		 u8 port_num, int index, union ib_gid *gid);
 
-- 
1.8.2


* [PATCH 09/14] IB/mad: Convert allocations from kmem_cache to kzalloc
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-06 18:38 UTC
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

This patch implements allocation of alternate-sized receive MAD buffers within
the MAD stack.  Support for sending and receiving variable-sized OPA MADs is
implemented later.

    1) Convert MAD allocations from kmem_cache to kzalloc

       kzalloc is more flexible for supporting devices with different MAD
       sizes, and research and testing showed that the current use of
       kmem_cache provides no performance benefit over kzalloc.

    2) Change struct ib_mad_private to use a flex array for the mad data
    3) Allocate ib_mad_private based on the size specified by devices in
       rdma_max_mad_size (see the sketch after this list).
    4) Carry the allocated size in ib_mad_private to be used when processing
       ib_mad_private objects.
    5) Alter DMA mappings based on the mad_size of ib_mad_private.
    6) Replace the use of sizeof and static defines as appropriate
    7) Add appropriate casts for the MAD data when calling processing
       functions.

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
Changes from V1:
	Completely redesigned based on Jason's comments to make ib_mad a flex
	array.

	It turns out that making ib_mad itself a flex array is not what we
	need: the MAD layer uses struct ib_mad_private to track its buffers,
	so it was more appropriate to make ib_mad_private end in a flex array.
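
	For reference, a minimal standalone sketch of the flex-array pattern
	adopted here (simplified from the alloc_mad_private() added below;
	kzalloc() is replaced with calloc() so the sketch compiles outside
	the kernel):

	#include <stdlib.h>

	struct mad_private {
		size_t mad_size;	/* size of the trailing MAD buffer */
		unsigned char mad[];	/* flex array; MAD data follows */
	};

	static struct mad_private *alloc_mad_private(size_t mad_size)
	{
		/* one allocation covers the header and the MAD buffer */
		struct mad_private *mp = calloc(1, sizeof(*mp) + mad_size);

		if (mp)
			mp->mad_size = mad_size;
		return mp;	/* released with a single free()/kfree() */
	}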

 drivers/infiniband/core/agent.c    |   9 +-
 drivers/infiniband/core/agent.h    |   4 +-
 drivers/infiniband/core/mad.c      | 169 +++++++++++++++++++------------------
 drivers/infiniband/core/mad_priv.h |   7 +-
 4 files changed, 97 insertions(+), 92 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index 4fe1fb6b37cd..6c420736ce93 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -78,9 +78,9 @@ ib_get_agent_port(const struct ib_device *device, int port_num)
 	return entry;
 }
 
-void agent_send_response(const struct ib_mad *mad, const struct ib_grh *grh,
+void agent_send_response(const struct ib_mad_hdr *mad_hdr, const struct ib_grh *grh,
 			 const struct ib_wc *wc, const struct ib_device *device,
-			 int port_num, int qpn)
+			 int port_num, int qpn, size_t resp_mad_len)
 {
 	struct ib_agent_port_private *port_priv;
 	struct ib_mad_agent *agent;
@@ -107,7 +107,8 @@ void agent_send_response(const struct ib_mad *mad, const struct ib_grh *grh,
 	}
 
 	send_buf = ib_create_send_mad(agent, wc->src_qp, wc->pkey_index, 0,
-				      IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
+				      IB_MGMT_MAD_HDR,
+				      resp_mad_len - IB_MGMT_MAD_HDR,
 				      GFP_KERNEL,
 				      IB_MGMT_BASE_VERSION);
 	if (IS_ERR(send_buf)) {
@@ -115,7 +116,7 @@ void agent_send_response(const struct ib_mad *mad, const struct ib_grh *grh,
 		goto err1;
 	}
 
-	memcpy(send_buf->mad, mad, sizeof *mad);
+	memcpy(send_buf->mad, mad_hdr, resp_mad_len);
 	send_buf->ah = ah;
 
 	if (device->node_type == RDMA_NODE_IB_SWITCH) {
diff --git a/drivers/infiniband/core/agent.h b/drivers/infiniband/core/agent.h
index 94b5fb5b3eef..234c8aa380e0 100644
--- a/drivers/infiniband/core/agent.h
+++ b/drivers/infiniband/core/agent.h
@@ -44,8 +44,8 @@ extern int ib_agent_port_open(struct ib_device *device, int port_num);
 
 extern int ib_agent_port_close(struct ib_device *device, int port_num);
 
-extern void agent_send_response(const struct ib_mad *mad, const struct ib_grh *grh,
+extern void agent_send_response(const struct ib_mad_hdr *mad_hdr, const struct ib_grh *grh,
 				const struct ib_wc *wc, const struct ib_device *device,
-				int port_num, int qpn);
+				int port_num, int qpn, size_t resp_mad_len);
 
 #endif	/* __AGENT_H_ */
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 7b94be9aa6de..64b66ffae2cb 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -59,8 +59,6 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in number of work requests
 module_param_named(recv_queue_size, mad_recvq_size, int, 0444);
 MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work requests");
 
-static struct kmem_cache *ib_mad_cache;
-
 static struct list_head ib_mad_port_list;
 static u32 ib_mad_client_id = 0;
 
@@ -717,6 +715,32 @@ static void build_smp_wc(struct ib_qp *qp,
 	wc->port_num = port_num;
 }
 
+static size_t mad_priv_size(const struct ib_mad_private *mp)
+{
+	return sizeof(struct ib_mad_private) + mp->mad_size;
+}
+
+static struct ib_mad_private *alloc_mad_private(size_t mad_size, gfp_t flags)
+{
+	size_t size = sizeof(struct ib_mad_private) + mad_size;
+	struct ib_mad_private *ret = kzalloc(size, flags);
+
+	if (ret)
+		ret->mad_size = mad_size;
+
+	return ret;
+}
+
+static size_t port_mad_size(const struct ib_mad_port_private *port_priv)
+{
+	return rdma_max_mad_size(port_priv->device, port_priv->port_num);
+}
+
+static size_t mad_priv_dma_size(const struct ib_mad_private *mp)
+{
+	return sizeof(struct ib_grh) + mp->mad_size;
+}
+
 /*
  * Return 0 if SMP is to be sent
  * Return 1 if SMP was consumed locally (whether or not solicited)
@@ -736,6 +760,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	u8 port_num;
 	struct ib_wc mad_wc;
 	struct ib_send_wr *send_wr = &mad_send_wr->send_wr;
+	size_t mad_size = port_mad_size(mad_agent_priv->qp_info->port_priv);
 
 	if (device->node_type == RDMA_NODE_IB_SWITCH &&
 	    smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
@@ -771,7 +796,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	}
 	local->mad_priv = NULL;
 	local->recv_mad_agent = NULL;
-	mad_priv = kmem_cache_alloc(ib_mad_cache, GFP_ATOMIC);
+	mad_priv = alloc_mad_private(mad_size, GFP_ATOMIC);
 	if (!mad_priv) {
 		ret = -ENOMEM;
 		dev_err(&device->dev, "No memory for local response MAD\n");
@@ -786,12 +811,12 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 
 	/* No GRH for DR SMP */
 	ret = device->process_mad(device, 0, port_num, &mad_wc, NULL,
-				  (struct ib_mad *)smp,
-				  (struct ib_mad *)&mad_priv->mad);
+				  (const struct ib_mad *)smp,
+				  (struct ib_mad *)mad_priv->mad);
 	switch (ret)
 	{
 	case IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY:
-		if (ib_response_mad(&mad_priv->mad.mad.mad_hdr) &&
+		if (ib_response_mad((const struct ib_mad_hdr *)mad_priv->mad) &&
 		    mad_agent_priv->agent.recv_handler) {
 			local->mad_priv = mad_priv;
 			local->recv_mad_agent = mad_agent_priv;
@@ -801,33 +826,33 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 			 */
 			atomic_inc(&mad_agent_priv->refcount);
 		} else
-			kmem_cache_free(ib_mad_cache, mad_priv);
+			kfree(mad_priv);
 		break;
 	case IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED:
-		kmem_cache_free(ib_mad_cache, mad_priv);
+		kfree(mad_priv);
 		break;
 	case IB_MAD_RESULT_SUCCESS:
 		/* Treat like an incoming receive MAD */
 		port_priv = ib_get_mad_port(mad_agent_priv->agent.device,
 					    mad_agent_priv->agent.port_num);
 		if (port_priv) {
-			memcpy(&mad_priv->mad.mad, smp, sizeof(struct ib_mad));
+			memcpy(mad_priv->mad, smp, mad_priv->mad_size);
 			recv_mad_agent = find_mad_agent(port_priv,
-						        &mad_priv->mad.mad.mad_hdr);
+						        (const struct ib_mad_hdr *)mad_priv->mad);
 		}
 		if (!port_priv || !recv_mad_agent) {
 			/*
 			 * No receiving agent so drop packet and
 			 * generate send completion.
 			 */
-			kmem_cache_free(ib_mad_cache, mad_priv);
+			kfree(mad_priv);
 			break;
 		}
 		local->mad_priv = mad_priv;
 		local->recv_mad_agent = recv_mad_agent;
 		break;
 	default:
-		kmem_cache_free(ib_mad_cache, mad_priv);
+		kfree(mad_priv);
 		kfree(local);
 		ret = -EINVAL;
 		goto out;
@@ -877,7 +902,7 @@ static int alloc_send_rmpp_list(struct ib_mad_send_wr_private *send_wr,
 	struct ib_rmpp_segment *seg = NULL;
 	int left, seg_size, pad;
 
-	send_buf->seg_size = sizeof (struct ib_mad) - send_buf->hdr_len;
+	send_buf->seg_size = sizeof(struct ib_mad) - send_buf->hdr_len;
 	seg_size = send_buf->seg_size;
 	pad = send_wr->pad;
 
@@ -1238,7 +1263,7 @@ void ib_free_recv_mad(struct ib_mad_recv_wc *mad_recv_wc)
 					    recv_wc);
 		priv = container_of(mad_priv_hdr, struct ib_mad_private,
 				    header);
-		kmem_cache_free(ib_mad_cache, priv);
+		kfree(priv);
 	}
 }
 EXPORT_SYMBOL(ib_free_recv_mad);
@@ -1933,58 +1958,62 @@ static enum smi_action handle_ib_smi(const struct ib_mad_port_private *port_priv
 				     struct ib_mad_private *response)
 {
 	enum smi_forward_action retsmi;
+	struct ib_smp *smp = (struct ib_smp *)recv->mad;
 
-	if (smi_handle_dr_smp_recv(&recv->mad.smp,
+	if (smi_handle_dr_smp_recv(smp,
 				   port_priv->device->node_type,
 				   port_num,
 				   port_priv->device->phys_port_cnt) ==
 				   IB_SMI_DISCARD)
 		return IB_SMI_DISCARD;
 
-	retsmi = smi_check_forward_dr_smp(&recv->mad.smp);
+	retsmi = smi_check_forward_dr_smp(smp);
 	if (retsmi == IB_SMI_LOCAL)
 		return IB_SMI_HANDLE;
 
 	if (retsmi == IB_SMI_SEND) { /* don't forward */
-		if (smi_handle_dr_smp_send(&recv->mad.smp,
+		if (smi_handle_dr_smp_send(smp,
 					   port_priv->device->node_type,
 					   port_num) == IB_SMI_DISCARD)
 			return IB_SMI_DISCARD;
 
-		if (smi_check_local_smp(&recv->mad.smp, port_priv->device) == IB_SMI_DISCARD)
+		if (smi_check_local_smp(smp, port_priv->device) == IB_SMI_DISCARD)
 			return IB_SMI_DISCARD;
 	} else if (port_priv->device->node_type == RDMA_NODE_IB_SWITCH) {
 		/* forward case for switches */
-		memcpy(response, recv, sizeof(*response));
+		memcpy(response, recv, mad_priv_size(response));
 		response->header.recv_wc.wc = &response->header.wc;
-		response->header.recv_wc.recv_buf.mad = &response->mad.mad;
+		response->header.recv_wc.recv_buf.mad = (struct ib_mad *)response->mad;
 		response->header.recv_wc.recv_buf.grh = &response->grh;
 
-		agent_send_response(&response->mad.mad,
+		agent_send_response((const struct ib_mad_hdr *)response->mad,
 				    &response->grh, wc,
 				    port_priv->device,
-				    smi_get_fwd_port(&recv->mad.smp),
-				    qp_info->qp->qp_num);
+				    smi_get_fwd_port(smp),
+				    qp_info->qp->qp_num,
+				    response->mad_size);
 
 		return IB_SMI_DISCARD;
 	}
 	return IB_SMI_HANDLE;
 }
 
-static bool generate_unmatched_resp(struct ib_mad_private *recv,
+static bool generate_unmatched_resp(const struct ib_mad_private *recv,
 				    struct ib_mad_private *response)
 {
-	if (recv->mad.mad.mad_hdr.method == IB_MGMT_METHOD_GET ||
-	    recv->mad.mad.mad_hdr.method == IB_MGMT_METHOD_SET) {
-		memcpy(response, recv, sizeof *response);
+	const struct ib_mad_hdr *recv_hdr = (const struct ib_mad_hdr *)recv->mad;
+	struct ib_mad_hdr *resp_hdr = (struct ib_mad_hdr *)response->mad;
+
+	if (recv_hdr->method == IB_MGMT_METHOD_GET ||
+	    recv_hdr->method == IB_MGMT_METHOD_SET) {
+		memcpy(response, recv, mad_priv_size(response));
 		response->header.recv_wc.wc = &response->header.wc;
-		response->header.recv_wc.recv_buf.mad = &response->mad.mad;
+		response->header.recv_wc.recv_buf.mad = (struct ib_mad *)response->mad;
 		response->header.recv_wc.recv_buf.grh = &response->grh;
-		response->mad.mad.mad_hdr.method = IB_MGMT_METHOD_GET_RESP;
-		response->mad.mad.mad_hdr.status =
-			cpu_to_be16(IB_MGMT_MAD_STATUS_UNSUPPORTED_METHOD_ATTRIB);
-		if (recv->mad.mad.mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
-			response->mad.mad.mad_hdr.status |= IB_SMP_DIRECTION;
+		resp_hdr->method = IB_MGMT_METHOD_GET_RESP;
+		resp_hdr->status = cpu_to_be16(IB_MGMT_MAD_STATUS_UNSUPPORTED_METHOD_ATTRIB);
+		if (recv_hdr->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
+			resp_hdr->status |= IB_SMP_DIRECTION;
 
 		return true;
 	} else {
@@ -2011,25 +2040,24 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 	recv = container_of(mad_priv_hdr, struct ib_mad_private, header);
 	ib_dma_unmap_single(port_priv->device,
 			    recv->header.mapping,
-			    sizeof(struct ib_mad_private) -
-			      sizeof(struct ib_mad_private_header),
+			    mad_priv_dma_size(recv),
 			    DMA_FROM_DEVICE);
 
 	/* Setup MAD receive work completion from "normal" work completion */
 	recv->header.wc = *wc;
 	recv->header.recv_wc.wc = &recv->header.wc;
 	recv->header.recv_wc.mad_len = sizeof(struct ib_mad);
-	recv->header.recv_wc.recv_buf.mad = &recv->mad.mad;
+	recv->header.recv_wc.recv_buf.mad = (struct ib_mad *)recv->mad;
 	recv->header.recv_wc.recv_buf.grh = &recv->grh;
 
 	if (atomic_read(&qp_info->snoop_count))
 		snoop_recv(qp_info, &recv->header.recv_wc, IB_MAD_SNOOP_RECVS);
 
 	/* Validate MAD */
-	if (!validate_mad(&recv->mad.mad.mad_hdr, qp_info->qp->qp_num))
+	if (!validate_mad((const struct ib_mad_hdr *)recv->mad, qp_info->qp->qp_num))
 		goto out;
 
-	response = kmem_cache_alloc(ib_mad_cache, GFP_KERNEL);
+	response = alloc_mad_private(recv->mad_size, GFP_ATOMIC);
 	if (!response) {
 		dev_err(&port_priv->device->dev,
 			"ib_mad_recv_done_handler no memory for response buffer\n");
@@ -2041,7 +2069,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 	else
 		port_num = port_priv->port_num;
 
-	if (recv->mad.mad.mad_hdr.mgmt_class ==
+	if (((struct ib_mad_hdr *)recv->mad)->mgmt_class ==
 	    IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) {
 		if (handle_ib_smi(port_priv, qp_info, wc, port_num, recv,
 				  response)
@@ -2054,23 +2082,24 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 		ret = port_priv->device->process_mad(port_priv->device, 0,
 						     port_priv->port_num,
 						     wc, &recv->grh,
-						     &recv->mad.mad,
-						     &response->mad.mad);
+						     (const struct ib_mad *)recv->mad,
+						     (struct ib_mad *)response->mad);
 		if (ret & IB_MAD_RESULT_SUCCESS) {
 			if (ret & IB_MAD_RESULT_CONSUMED)
 				goto out;
 			if (ret & IB_MAD_RESULT_REPLY) {
-				agent_send_response(&response->mad.mad,
+				agent_send_response((const struct ib_mad_hdr *)response->mad,
 						    &recv->grh, wc,
 						    port_priv->device,
 						    port_num,
-						    qp_info->qp->qp_num);
+						    qp_info->qp->qp_num,
+						    response->mad_size);
 				goto out;
 			}
 		}
 	}
 
-	mad_agent = find_mad_agent(port_priv, &recv->mad.mad.mad_hdr);
+	mad_agent = find_mad_agent(port_priv, (const struct ib_mad_hdr *)recv->mad);
 	if (mad_agent) {
 		ib_mad_complete_recv(mad_agent, &recv->header.recv_wc);
 		/*
@@ -2080,16 +2109,16 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 		recv = NULL;
 	} else if ((ret & IB_MAD_RESULT_SUCCESS) &&
 		   generate_unmatched_resp(recv, response)) {
-		agent_send_response(&response->mad.mad, &recv->grh, wc,
-				    port_priv->device, port_num, qp_info->qp->qp_num);
+		agent_send_response((const struct ib_mad_hdr *)response->mad, &recv->grh, wc,
+				    port_priv->device, port_num,
+				    qp_info->qp->qp_num, response->mad_size);
 	}
 
 out:
 	/* Post another receive request for this QP */
 	if (response) {
 		ib_mad_post_receive_mads(qp_info, response);
-		if (recv)
-			kmem_cache_free(ib_mad_cache, recv);
+		kfree(recv);
 	} else
 		ib_mad_post_receive_mads(qp_info, recv);
 }
@@ -2521,7 +2550,7 @@ static void local_completions(struct work_struct *work)
 				 &local->mad_priv->header.recv_wc.rmpp_list);
 			local->mad_priv->header.recv_wc.recv_buf.grh = NULL;
 			local->mad_priv->header.recv_wc.recv_buf.mad =
-						&local->mad_priv->mad.mad;
+						(struct ib_mad *)local->mad_priv->mad;
 			if (atomic_read(&recv_mad_agent->qp_info->snoop_count))
 				snoop_recv(recv_mad_agent->qp_info,
 					  &local->mad_priv->header.recv_wc,
@@ -2549,7 +2578,7 @@ local_send_completion:
 		spin_lock_irqsave(&mad_agent_priv->lock, flags);
 		atomic_dec(&mad_agent_priv->refcount);
 		if (free_mad)
-			kmem_cache_free(ib_mad_cache, local->mad_priv);
+			kfree(local->mad_priv);
 		kfree(local);
 	}
 	spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
@@ -2664,7 +2693,6 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
 	struct ib_mad_queue *recv_queue = &qp_info->recv_queue;
 
 	/* Initialize common scatter list fields */
-	sg_list.length = sizeof *mad_priv - sizeof mad_priv->header;
 	sg_list.lkey = (*qp_info->port_priv->mr).lkey;
 
 	/* Initialize common receive WR fields */
@@ -2678,7 +2706,8 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
 			mad_priv = mad;
 			mad = NULL;
 		} else {
-			mad_priv = kmem_cache_alloc(ib_mad_cache, GFP_KERNEL);
+			mad_priv = alloc_mad_private(port_mad_size(qp_info->port_priv),
+						     GFP_ATOMIC);
 			if (!mad_priv) {
 				dev_err(&qp_info->port_priv->device->dev,
 					"No memory for receive buffer\n");
@@ -2686,10 +2715,10 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
 				break;
 			}
 		}
+		sg_list.length = mad_priv_dma_size(mad_priv);
 		sg_list.addr = ib_dma_map_single(qp_info->port_priv->device,
 						 &mad_priv->grh,
-						 sizeof *mad_priv -
-						   sizeof mad_priv->header,
+						 mad_priv_dma_size(mad_priv),
 						 DMA_FROM_DEVICE);
 		if (unlikely(ib_dma_mapping_error(qp_info->port_priv->device,
 						  sg_list.addr))) {
@@ -2713,10 +2742,9 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
 			spin_unlock_irqrestore(&recv_queue->lock, flags);
 			ib_dma_unmap_single(qp_info->port_priv->device,
 					    mad_priv->header.mapping,
-					    sizeof *mad_priv -
-					      sizeof mad_priv->header,
+					    mad_priv_dma_size(mad_priv),
 					    DMA_FROM_DEVICE);
-			kmem_cache_free(ib_mad_cache, mad_priv);
+			kfree(mad_priv);
 			dev_err(&qp_info->port_priv->device->dev,
 				"ib_post_recv failed: %d\n", ret);
 			break;
@@ -2753,10 +2781,9 @@ static void cleanup_recv_queue(struct ib_mad_qp_info *qp_info)
 
 		ib_dma_unmap_single(qp_info->port_priv->device,
 				    recv->header.mapping,
-				    sizeof(struct ib_mad_private) -
-				      sizeof(struct ib_mad_private_header),
+				    mad_priv_dma_size(recv),
 				    DMA_FROM_DEVICE);
-		kmem_cache_free(ib_mad_cache, recv);
+		kfree(recv);
 	}
 
 	qp_info->recv_queue.count = 0;
@@ -3148,45 +3175,25 @@ static struct ib_client mad_client = {
 
 static int __init ib_mad_init_module(void)
 {
-	int ret;
-
 	mad_recvq_size = min(mad_recvq_size, IB_MAD_QP_MAX_SIZE);
 	mad_recvq_size = max(mad_recvq_size, IB_MAD_QP_MIN_SIZE);
 
 	mad_sendq_size = min(mad_sendq_size, IB_MAD_QP_MAX_SIZE);
 	mad_sendq_size = max(mad_sendq_size, IB_MAD_QP_MIN_SIZE);
 
-	ib_mad_cache = kmem_cache_create("ib_mad",
-					 sizeof(struct ib_mad_private),
-					 0,
-					 SLAB_HWCACHE_ALIGN,
-					 NULL);
-	if (!ib_mad_cache) {
-		pr_err("Couldn't create ib_mad cache\n");
-		ret = -ENOMEM;
-		goto error1;
-	}
-
 	INIT_LIST_HEAD(&ib_mad_port_list);
 
 	if (ib_register_client(&mad_client)) {
 		pr_err("Couldn't register ib_mad client\n");
-		ret = -EINVAL;
-		goto error2;
+		return -EINVAL;
 	}
 
 	return 0;
-
-error2:
-	kmem_cache_destroy(ib_mad_cache);
-error1:
-	return ret;
 }
 
 static void __exit ib_mad_cleanup_module(void)
 {
 	ib_unregister_client(&mad_client);
-	kmem_cache_destroy(ib_mad_cache);
 }
 
 module_init(ib_mad_init_module);
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index 7b19cba2adf0..44c0a7842a70 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -75,12 +75,9 @@ struct ib_mad_private_header {
 
 struct ib_mad_private {
 	struct ib_mad_private_header header;
+	size_t mad_size;
 	struct ib_grh grh;
-	union {
-		struct ib_mad mad;
-		struct ib_rmpp_mad rmpp_mad;
-		struct ib_smp smp;
-	} mad;
+	u8 mad[0];
 } __attribute__ ((packed));
 
 struct ib_rmpp_segment {
-- 
1.8.2

* [PATCH 10/14] IB/mad: Add support for additional MAD info to/from drivers
       [not found] ` <1433615915-24591-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (8 preceding siblings ...)
  2015-06-06 18:38   ` [PATCH 09/14] IB/mad: Convert allocations from kmem_cache to kzalloc ira.weiny-ral2JQCrhuEAvxtiuMwx3w
@ 2015-06-06 18:38   ` ira.weiny-ral2JQCrhuEAvxtiuMwx3w
       [not found]     ` <1433615915-24591-11-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2015-06-06 18:38   ` [PATCH 11/14] IB/core: Add OPA MAD core capability flag ira.weiny-ral2JQCrhuEAvxtiuMwx3w
                     ` (4 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-06 18:38 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

In order to support alternate-sized MADs (and variable-sized MADs on OPA
devices), add in/out MAD size parameters to the process_mad core call.

In addition, add an out_mad_pkey_index to communicate the pkey index the driver
wishes the MAD stack to use when sending OPA MAD responses.

The out MAD size and the out MAD PKey index are required by the MAD
stack to generate responses on OPA devices.

Furthermore, the in and out MAD parameters are made generic by specifying them
as ib_mad_hdr rather than ib_mad.

Drivers are modified as needed and are protected by BUG_ON checks in case the
MAD sizes passed to them are incorrect.

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
Changes since V1:
	Add const to the in MAD parameter
	Change WARN_ON to BUG_ON in drivers
	Maintain const struct ib_mad through casts
	Add out_mad_pkey_index
	Update commit message
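
	As a reference for reviewers, the per-driver pattern looks roughly
	like this (hypothetical IB-only driver "foo"; it mirrors the driver
	changes below):

	static int foo_process_mad(struct ib_device *ibdev, int mad_flags,
				   u8 port_num, const struct ib_wc *in_wc,
				   const struct ib_grh *in_grh,
				   const struct ib_mad_hdr *in, size_t in_mad_size,
				   struct ib_mad_hdr *out, size_t *out_mad_size,
				   u16 *out_mad_pkey_index)
	{
		const struct ib_mad *in_mad = (const struct ib_mad *)in;
		struct ib_mad *out_mad = (struct ib_mad *)out;

		/* IB-only device: the core must pass fixed 256-byte MADs */
		BUG_ON(in_mad_size != sizeof(*in_mad) ||
		       *out_mad_size != sizeof(*out_mad));

		/* ... dispatch on in_mad->mad_hdr.mgmt_class as before ... */
		return IB_MAD_RESULT_SUCCESS;
	}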

 drivers/infiniband/core/mad.c                | 17 ++++++++++++-----
 drivers/infiniband/core/sysfs.c              |  7 ++++++-
 drivers/infiniband/hw/amso1100/c2_provider.c |  6 +++++-
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |  6 +++++-
 drivers/infiniband/hw/cxgb4/provider.c       |  7 +++++--
 drivers/infiniband/hw/ehca/ehca_iverbs.h     |  5 +++--
 drivers/infiniband/hw/ehca/ehca_sqp.c        |  9 ++++++++-
 drivers/infiniband/hw/ipath/ipath_mad.c      |  9 ++++++++-
 drivers/infiniband/hw/ipath/ipath_verbs.h    |  4 +++-
 drivers/infiniband/hw/mlx4/mad.c             | 10 +++++++++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h         |  4 +++-
 drivers/infiniband/hw/mlx5/mad.c             |  9 ++++++++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h         |  4 +++-
 drivers/infiniband/hw/mthca/mthca_dev.h      |  5 +++--
 drivers/infiniband/hw/mthca/mthca_mad.c      | 10 ++++++++--
 drivers/infiniband/hw/nes/nes_verbs.c        |  4 +++-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c     |  9 ++++++++-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h     |  4 +++-
 drivers/infiniband/hw/qib/qib_mad.c          |  9 ++++++++-
 drivers/infiniband/hw/qib/qib_verbs.h        |  4 +++-
 include/rdma/ib_verbs.h                      |  9 ++++++---
 21 files changed, 120 insertions(+), 31 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 64b66ffae2cb..e072d2a94690 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -761,6 +761,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	struct ib_wc mad_wc;
 	struct ib_send_wr *send_wr = &mad_send_wr->send_wr;
 	size_t mad_size = port_mad_size(mad_agent_priv->qp_info->port_priv);
+	u16 out_mad_pkey_index = 0;
 
 	if (device->node_type == RDMA_NODE_IB_SWITCH &&
 	    smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
@@ -811,8 +812,9 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 
 	/* No GRH for DR SMP */
 	ret = device->process_mad(device, 0, port_num, &mad_wc, NULL,
-				  (const struct ib_mad *)smp,
-				  (struct ib_mad *)mad_priv->mad);
+				  (const struct ib_mad_hdr *)smp, mad_size,
+				  (struct ib_mad_hdr *)mad_priv->mad,
+				  &mad_size, &out_mad_pkey_index);
 	switch (ret)
 	{
 	case IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY:
@@ -2030,6 +2032,8 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 	struct ib_mad_agent_private *mad_agent;
 	int port_num;
 	int ret = IB_MAD_RESULT_SUCCESS;
+	size_t mad_size;
+	u16 resp_mad_pkey_index = 0;
 
 	mad_list = (struct ib_mad_list_head *)(unsigned long)wc->wr_id;
 	qp_info = mad_list->mad_queue->qp_info;
@@ -2057,7 +2061,8 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 	if (!validate_mad((const struct ib_mad_hdr *)recv->mad, qp_info->qp->qp_num))
 		goto out;
 
-	response = alloc_mad_private(recv->mad_size, GFP_ATOMIC);
+	mad_size = recv->mad_size;
+	response = alloc_mad_private(mad_size, GFP_KERNEL);
 	if (!response) {
 		dev_err(&port_priv->device->dev,
 			"ib_mad_recv_done_handler no memory for response buffer\n");
@@ -2082,8 +2087,10 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 		ret = port_priv->device->process_mad(port_priv->device, 0,
 						     port_priv->port_num,
 						     wc, &recv->grh,
-						     (const struct ib_mad *)recv->mad,
-						     (struct ib_mad *)response->mad);
+						     (const struct ib_mad_hdr *)recv->mad,
+						     recv->mad_size,
+						     (struct ib_mad_hdr *)response->mad,
+						     &mad_size, &resp_mad_pkey_index);
 		if (ret & IB_MAD_RESULT_SUCCESS) {
 			if (ret & IB_MAD_RESULT_CONSUMED)
 				goto out;
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index d0334c101ecb..ed6b6c85c334 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -326,6 +326,8 @@ static ssize_t show_pma_counter(struct ib_port *p, struct port_attribute *attr,
 	int width  = (tab_attr->index >> 16) & 0xff;
 	struct ib_mad *in_mad  = NULL;
 	struct ib_mad *out_mad = NULL;
+	size_t mad_size = sizeof(*out_mad);
+	u16 out_mad_pkey_index = 0;
 	ssize_t ret;
 
 	if (!p->ibdev->process_mad)
@@ -347,7 +349,10 @@ static ssize_t show_pma_counter(struct ib_port *p, struct port_attribute *attr,
 	in_mad->data[41] = p->port_num;	/* PortSelect field */
 
 	if ((p->ibdev->process_mad(p->ibdev, IB_MAD_IGNORE_MKEY,
-		 p->port_num, NULL, NULL, in_mad, out_mad) &
+		 p->port_num, NULL, NULL,
+		 (const struct ib_mad_hdr *)in_mad, mad_size,
+		 (struct ib_mad_hdr *)out_mad, &mad_size,
+		 &out_mad_pkey_index) &
 	     (IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY)) !=
 	    (IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY)) {
 		ret = -EINVAL;
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index 0f007a6b188b..eac836286fd7 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -584,7 +584,11 @@ static int c2_process_mad(struct ib_device *ibdev,
 			  u8 port_num,
 			  const struct ib_wc *in_wc,
 			  const struct ib_grh *in_grh,
-			  const struct ib_mad *in_mad, struct ib_mad *out_mad)
+			  const struct ib_mad_hdr *in_mad,
+			  size_t in_mad_size,
+			  struct ib_mad_hdr *out_mad,
+			  size_t *out_mad_size,
+			  u16 *out_mad_pkey_index)
 {
 	pr_debug("%s:%u\n", __func__, __LINE__);
 	return -ENOSYS;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 19c830ecbb67..8b6504095414 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -87,7 +87,11 @@ static int iwch_process_mad(struct ib_device *ibdev,
 			    u8 port_num,
 			    const struct ib_wc *in_wc,
 			    const struct ib_grh *in_grh,
-			    const struct ib_mad *in_mad, struct ib_mad *out_mad)
+			    const struct ib_mad_hdr *in_mad,
+			    size_t in_mad_size,
+			    struct ib_mad_hdr *out_mad,
+			    size_t *out_mad_size,
+			    u16 *out_mad_pkey_index)
 {
 	return -ENOSYS;
 }
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 75ea26a32076..02aad989c75b 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -82,8 +82,11 @@ static int c4iw_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 static int c4iw_process_mad(struct ib_device *ibdev, int mad_flags,
 			    u8 port_num, const struct ib_wc *in_wc,
 			    const struct ib_grh *in_grh,
-			    const struct ib_mad *in_mad,
-			    struct ib_mad *out_mad)
+			    const struct ib_mad_hdr *in_mad,
+			    size_t in_mad_size,
+			    struct ib_mad_hdr *out_mad,
+			    size_t *out_mad_size,
+			    u16 *out_mad_pkey_index)
 {
 	return -ENOSYS;
 }
diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
index 582fc71a8488..0931203d6478 100644
--- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
+++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
@@ -192,8 +192,9 @@ int ehca_mmap(struct ib_ucontext *context, struct vm_area_struct *vma);
 
 int ehca_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 		     const struct ib_wc *in_wc, const struct ib_grh *in_grh,
-		     const struct ib_mad *in_mad,
-		     struct ib_mad *out_mad);
+		     const struct ib_mad_hdr *in, size_t in_mad_size,
+		     struct ib_mad_hdr *out, size_t *out_mad_size,
+		     u16 *out_mad_pkey_index);
 
 void ehca_poll_eqs(unsigned long data);
 
diff --git a/drivers/infiniband/hw/ehca/ehca_sqp.c b/drivers/infiniband/hw/ehca/ehca_sqp.c
index 889ccfda6401..12b5bc23832b 100644
--- a/drivers/infiniband/hw/ehca/ehca_sqp.c
+++ b/drivers/infiniband/hw/ehca/ehca_sqp.c
@@ -218,9 +218,16 @@ perf_reply:
 
 int ehca_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 		     const struct ib_wc *in_wc, const struct ib_grh *in_grh,
-		     const struct ib_mad *in_mad, struct ib_mad *out_mad)
+		     const struct ib_mad_hdr *in, size_t in_mad_size,
+		     struct ib_mad_hdr *out, size_t *out_mad_size,
+		     u16 *out_mad_pkey_index)
 {
 	int ret;
+	const struct ib_mad *in_mad = (const struct ib_mad *)in;
+	struct ib_mad *out_mad = (struct ib_mad *)out;
+
+	BUG_ON(in_mad_size != sizeof(*in_mad) ||
+	       *out_mad_size != sizeof(*out_mad));
 
 	if (!port_num || port_num > ibdev->phys_port_cnt || !in_wc)
 		return IB_MAD_RESULT_FAILURE;
diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c b/drivers/infiniband/hw/ipath/ipath_mad.c
index 9e8929e23740..948188e37f95 100644
--- a/drivers/infiniband/hw/ipath/ipath_mad.c
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c
@@ -1491,9 +1491,16 @@ bail:
  */
 int ipath_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 		      const struct ib_wc *in_wc, const struct ib_grh *in_grh,
-		      const struct ib_mad *in_mad, struct ib_mad *out_mad)
+		      const struct ib_mad_hdr *in, size_t in_mad_size,
+		      struct ib_mad_hdr *out, size_t *out_mad_size,
+		      u16 *out_mad_pkey_index)
 {
 	int ret;
+	const struct ib_mad *in_mad = (const struct ib_mad *)in;
+	struct ib_mad *out_mad = (struct ib_mad *)out;
+
+	BUG_ON(in_mad_size != sizeof(*in_mad) ||
+	       *out_mad_size != sizeof(*out_mad));
 
 	switch (in_mad->mad_hdr.mgmt_class) {
 	case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE:
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h
index 7a2b6a17f844..6c3f4d9179b2 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h
@@ -703,7 +703,9 @@ int ipath_process_mad(struct ib_device *ibdev,
 		      u8 port_num,
 		      const struct ib_wc *in_wc,
 		      const struct ib_grh *in_grh,
-		      const struct ib_mad *in_mad, struct ib_mad *out_mad);
+		      const struct ib_mad_hdr *in, size_t in_mad_size,
+		      struct ib_mad_hdr *out, size_t *out_mad_size,
+		      u16 *out_mad_pkey_index);
 
 /*
  * Compare the lower 24 bits of the two values.
diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index 6ac41cc15872..e960085bef64 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -869,8 +869,16 @@ static int iboe_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 
 int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 			const struct ib_wc *in_wc, const struct ib_grh *in_grh,
-			const struct ib_mad *in_mad, struct ib_mad *out_mad)
+			const struct ib_mad_hdr *in, size_t in_mad_size,
+			struct ib_mad_hdr *out, size_t *out_mad_size,
+			u16 *out_mad_pkey_index)
 {
+	const struct ib_mad *in_mad = (const struct ib_mad *)in;
+	struct ib_mad *out_mad = (struct ib_mad *)out;
+
+	BUG_ON(in_mad_size != sizeof(*in_mad) ||
+	       *out_mad_size != sizeof(*out_mad));
+
 	switch (rdma_port_get_link_layer(ibdev, port_num)) {
 	case IB_LINK_LAYER_INFINIBAND:
 		return ib_process_mad(ibdev, mad_flags, port_num, in_wc,
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 645d55ef0604..298883bb0c7e 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -710,7 +710,9 @@ int mlx4_MAD_IFC(struct mlx4_ib_dev *dev, int mad_ifc_flags,
 		 const void *in_mad, void *response_mad);
 int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags,	u8 port_num,
 			const struct ib_wc *in_wc, const struct ib_grh *in_grh,
-			const struct ib_mad *in_mad, struct ib_mad *out_mad);
+			const struct ib_mad_hdr *in, size_t in_mad_size,
+			struct ib_mad_hdr *out, size_t *out_mad_size,
+			u16 *out_mad_pkey_index);
 int mlx4_ib_mad_init(struct mlx4_ib_dev *dev);
 void mlx4_ib_mad_cleanup(struct mlx4_ib_dev *dev);
 
diff --git a/drivers/infiniband/hw/mlx5/mad.c b/drivers/infiniband/hw/mlx5/mad.c
index 34e519cd4c64..8e45714fa369 100644
--- a/drivers/infiniband/hw/mlx5/mad.c
+++ b/drivers/infiniband/hw/mlx5/mad.c
@@ -59,10 +59,17 @@ int mlx5_MAD_IFC(struct mlx5_ib_dev *dev, int ignore_mkey, int ignore_bkey,
 
 int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 			const struct ib_wc *in_wc, const struct ib_grh *in_grh,
-			const struct ib_mad *in_mad, struct ib_mad *out_mad)
+			const struct ib_mad_hdr *in, size_t in_mad_size,
+			struct ib_mad_hdr *out, size_t *out_mad_size,
+			u16 *out_mad_pkey_index)
 {
 	u16 slid;
 	int err;
+	const struct ib_mad *in_mad = (const struct ib_mad *)in;
+	struct ib_mad *out_mad = (struct ib_mad *)out;
+
+	BUG_ON(in_mad_size != sizeof(*in_mad) ||
+	       *out_mad_size != sizeof(*out_mad));
 
 	slid = in_wc ? in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE);
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index c6219032d00c..1013944d3180 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -587,7 +587,9 @@ int mlx5_ib_unmap_fmr(struct list_head *fmr_list);
 int mlx5_ib_fmr_dealloc(struct ib_fmr *ibfmr);
 int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 			const struct ib_wc *in_wc, const struct ib_grh *in_grh,
-			const struct ib_mad *in_mad, struct ib_mad *out_mad);
+			const struct ib_mad_hdr *in, size_t in_mad_size,
+			struct ib_mad_hdr *out, size_t *out_mad_size,
+			u16 *out_mad_pkey_index);
 struct ib_xrcd *mlx5_ib_alloc_xrcd(struct ib_device *ibdev,
 					  struct ib_ucontext *context,
 					  struct ib_udata *udata);
diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h
index b70f9ff23171..4393a022867b 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -578,8 +578,9 @@ int mthca_process_mad(struct ib_device *ibdev,
 		      u8 port_num,
 		      const struct ib_wc *in_wc,
 		      const struct ib_grh *in_grh,
-		      const struct ib_mad *in_mad,
-		      struct ib_mad *out_mad);
+		      const struct ib_mad_hdr *in, size_t in_mad_size,
+		      struct ib_mad_hdr *out, size_t *out_mad_size,
+		      u16 *out_mad_pkey_index);
 int mthca_create_agents(struct mthca_dev *dev);
 void mthca_free_agents(struct mthca_dev *dev);
 
diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c
index e121e646591d..6b2418b74c99 100644
--- a/drivers/infiniband/hw/mthca/mthca_mad.c
+++ b/drivers/infiniband/hw/mthca/mthca_mad.c
@@ -198,13 +198,19 @@ int mthca_process_mad(struct ib_device *ibdev,
 		      u8 port_num,
 		      const struct ib_wc *in_wc,
 		      const struct ib_grh *in_grh,
-		      const struct ib_mad *in_mad,
-		      struct ib_mad *out_mad)
+		      const struct ib_mad_hdr *in, size_t in_mad_size,
+		      struct ib_mad_hdr *out, size_t *out_mad_size,
+		      u16 *out_mad_pkey_index)
 {
 	int err;
 	u16 slid = in_wc ? in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE);
 	u16 prev_lid = 0;
 	struct ib_port_attr pattr;
+	const struct ib_mad *in_mad = (const struct ib_mad *)in;
+	struct ib_mad *out_mad = (struct ib_mad *)out;
+
+	BUG_ON(in_mad_size != sizeof(*in_mad) ||
+	       *out_mad_size != sizeof(*out_mad));
 
 	/* Forward locally generated traps to the SM */
 	if (in_mad->mad_hdr.method == IB_MGMT_METHOD_TRAP &&
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index 0099e419e24f..3fec491a332c 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -3222,7 +3222,9 @@ static int nes_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
  */
 static int nes_process_mad(struct ib_device *ibdev, int mad_flags,
 		u8 port_num, const struct ib_wc *in_wc, const struct ib_grh *in_grh,
-		const struct ib_mad *in_mad, struct ib_mad *out_mad)
+		const struct ib_mad_hdr *in, size_t in_mad_size,
+		struct ib_mad_hdr *out, size_t *out_mad_size,
+		u16 *out_mad_pkey_index)
 {
 	nes_debug(NES_DBG_INIT, "\n");
 	return -ENOSYS;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index 3216bce08a10..5f8a8dd423fc 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -198,10 +198,17 @@ int ocrdma_process_mad(struct ib_device *ibdev,
 		       u8 port_num,
 		       const struct ib_wc *in_wc,
 		       const struct ib_grh *in_grh,
-		       const struct ib_mad *in_mad, struct ib_mad *out_mad)
+		       const struct ib_mad_hdr *in, size_t in_mad_size,
+		       struct ib_mad_hdr *out, size_t *out_mad_size,
+		       u16 *out_mad_pkey_index)
 {
 	int status;
 	struct ocrdma_dev *dev;
+	const struct ib_mad *in_mad = (const struct ib_mad *)in;
+	struct ib_mad *out_mad = (struct ib_mad *)out;
+
+	BUG_ON(in_mad_size != sizeof(*in_mad) ||
+	       *out_mad_size != sizeof(*out_mad));
 
 	switch (in_mad->mad_hdr.mgmt_class) {
 	case IB_MGMT_CLASS_PERF_MGMT:
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
index 5c4ae3eba47c..cf366fe03cb8 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
@@ -44,5 +44,7 @@ int ocrdma_process_mad(struct ib_device *,
 		       u8 port_num,
 		       const struct ib_wc *in_wc,
 		       const struct ib_grh *in_grh,
-		       const struct ib_mad *in_mad, struct ib_mad *out_mad);
+		       const struct ib_mad_hdr *in, size_t in_mad_size,
+		       struct ib_mad_hdr *out, size_t *out_mad_size,
+		       u16 *out_mad_pkey_index);
 #endif				/* __OCRDMA_AH_H__ */
diff --git a/drivers/infiniband/hw/qib/qib_mad.c b/drivers/infiniband/hw/qib/qib_mad.c
index 206b2050b247..05e3242d8442 100644
--- a/drivers/infiniband/hw/qib/qib_mad.c
+++ b/drivers/infiniband/hw/qib/qib_mad.c
@@ -2402,11 +2402,18 @@ bail:
  */
 int qib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port,
 		    const struct ib_wc *in_wc, const struct ib_grh *in_grh,
-		    const struct ib_mad *in_mad, struct ib_mad *out_mad)
+		    const struct ib_mad_hdr *in, size_t in_mad_size,
+		    struct ib_mad_hdr *out, size_t *out_mad_size,
+		    u16 *out_mad_pkey_index)
 {
 	int ret;
 	struct qib_ibport *ibp = to_iport(ibdev, port);
 	struct qib_pportdata *ppd = ppd_from_ibp(ibp);
+	const struct ib_mad *in_mad = (const struct ib_mad *)in;
+	struct ib_mad *out_mad = (struct ib_mad *)out;
+
+	BUG_ON(in_mad_size != sizeof(*in_mad) ||
+	       *out_mad_size != sizeof(*out_mad));
 
 	switch (in_mad->mad_hdr.mgmt_class) {
 	case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE:
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index f2f57749c07d..f6e8b0b9d947 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -873,7 +873,9 @@ void qib_sys_guid_chg(struct qib_ibport *ibp);
 void qib_node_desc_chg(struct qib_ibport *ibp);
 int qib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 		    const struct ib_wc *in_wc, const struct ib_grh *in_grh,
-		    const struct ib_mad *in_mad, struct ib_mad *out_mad);
+		    const struct ib_mad_hdr *in, size_t in_mad_size,
+		    struct ib_mad_hdr *out, size_t *out_mad_size,
+		    u16 *out_mad_pkey_index);
 int qib_create_agents(struct qib_ibdev *dev);
 void qib_free_agents(struct qib_ibdev *dev);
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index f054f2db7084..594d4d5665de 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1452,7 +1452,7 @@ struct ib_flow {
 	struct ib_uobject	*uobject;
 };
 
-struct ib_mad;
+struct ib_mad_hdr;
 struct ib_grh;
 
 enum ib_process_mad_flags {
@@ -1693,8 +1693,11 @@ struct ib_device {
 						  u8 port_num,
 						  const struct ib_wc *in_wc,
 						  const struct ib_grh *in_grh,
-						  const struct ib_mad *in_mad,
-						  struct ib_mad *out_mad);
+						  const struct ib_mad_hdr *in_mad,
+						  size_t in_mad_size,
+						  struct ib_mad_hdr *out_mad,
+						  size_t *out_mad_size,
+						  u16 *out_mad_pkey_index);
 	struct ib_xrcd *	   (*alloc_xrcd)(struct ib_device *device,
 						 struct ib_ucontext *ucontext,
 						 struct ib_udata *udata);
-- 
1.8.2

* [PATCH 11/14] IB/core: Add OPA MAD core capability flag
       [not found] ` <1433615915-24591-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (9 preceding siblings ...)
  2015-06-06 18:38   ` [PATCH 10/14] IB/mad: Add support for additional MAD info to/from drivers ira.weiny-ral2JQCrhuEAvxtiuMwx3w
@ 2015-06-06 18:38   ` ira.weiny-ral2JQCrhuEAvxtiuMwx3w
  2015-06-06 18:38   ` [PATCH 12/14] IB/mad: Add partial Intel OPA MAD support ira.weiny-ral2JQCrhuEAvxtiuMwx3w
                     ` (3 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-06 18:38 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Add OPA MAD support flags to the core capability immutable flags.  In
addition, add the rdma_cap_opa_mad helper function so that core code can
detect OPA MAD support.

OPA MADs share a common header with IBTA MADs but differ in some respects to
increase performance.

Sharing a common header with IBTA MADs allows us to share most of the MAD
processing code when dealing with OPA MADs in addition to supporting some IBTA
MADs on OPA devices.

OPA MADs differ in the following ways:

	1) MADs are variable size up to 2K
	   IBTA defined MADs remain fixed at 256 bytes
	2) OPA SMPs must carry valid PKeys
	3) OPA SMP packets are a different format

The MAD stack will use this new functionality to determine if OPA MAD
processing should occur on individual device ports.

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
Changes from V1:
	Update commit to indicate the difference between OPA and IB MADs
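
	A hypothetical caller-side sketch of how code can key off the new
	capability bit (the MAD stack's real use of rdma_cap_opa_mad() lands
	in the following patches):

	/* Pick the per-port receive MAD buffer size at QP setup time. */
	static size_t foo_mad_recv_buf_size(struct ib_device *device, u8 port_num)
	{
		if (rdma_cap_opa_mad(device, port_num))
			return 2048;	/* OPA MADs are variable length, up to 2K */
		return 256;		/* IBTA MADs are fixed at 256 bytes */
	}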

 include/rdma/ib_verbs.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 594d4d5665de..deebe9db3a19 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -362,6 +362,7 @@ union rdma_protocol_stats {
 #define RDMA_CORE_CAP_IB_CM             0x00000004
 #define RDMA_CORE_CAP_IW_CM             0x00000008
 #define RDMA_CORE_CAP_IB_SA             0x00000010
+#define RDMA_CORE_CAP_OPA_MAD           0x00000020
 
 /* Address format                       0x000FF000 */
 #define RDMA_CORE_CAP_AF_IB             0x00001000
@@ -386,6 +387,8 @@ union rdma_protocol_stats {
 					| RDMA_CORE_CAP_ETH_AH)
 #define RDMA_CORE_PORT_IWARP           (RDMA_CORE_CAP_PROT_IWARP \
 					| RDMA_CORE_CAP_IW_CM)
+#define RDMA_CORE_PORT_INTEL_OPA       (RDMA_CORE_PORT_IBA_IB  \
+					| RDMA_CORE_CAP_OPA_MAD)
 
 struct ib_port_attr {
 	enum ib_port_state	state;
@@ -1874,6 +1877,31 @@ static inline bool rdma_cap_ib_mad(const struct ib_device *device, u8 port_num)
 }
 
 /**
+ * rdma_cap_opa_mad - Check if the port of device provides support for OPA
+ * Management Datagrams.
+ * @device: Device to check
+ * @port_num: Port number to check
+ *
+ * Intel OmniPath devices extend and/or replace the InfiniBand Management
+ * datagrams with their own versions.  These OPA MADs share many but not all of
+ * the characteristics of InfiniBand MADs.
+ *
+ * OPA MADs differ in the following ways:
+ *
+ *    1) MADs are variable size up to 2K
+ *       IBTA defined MADs remain fixed at 256 bytes
+ *    2) OPA SMPs must carry valid PKeys
+ *    3) OPA SMP packets are a different format
+ *
+ * Return: true if the port supports OPA MAD packet formats.
+ */
+static inline bool rdma_cap_opa_mad(struct ib_device *device, u8 port_num)
+{
+	return (device->port_immutable[port_num].core_cap_flags & RDMA_CORE_CAP_OPA_MAD)
+		== RDMA_CORE_CAP_OPA_MAD;
+}
+
+/**
  * rdma_cap_ib_smi - Check if the port of a device provides an Infiniband
  * Subnet Management Agent (SMA) on the Subnet Management Interface (SMI).
  * @device: Device to check
-- 
1.8.2

* [PATCH 12/14] IB/mad: Add partial Intel OPA MAD support
       [not found] ` <1433615915-24591-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (10 preceding siblings ...)
  2015-06-06 18:38   ` [PATCH 11/14] IB/core: Add OPA MAD core capability flag ira.weiny-ral2JQCrhuEAvxtiuMwx3w
@ 2015-06-06 18:38   ` ira.weiny-ral2JQCrhuEAvxtiuMwx3w
  2015-06-06 18:38   ` [PATCH 13/14] " ira.weiny-ral2JQCrhuEAvxtiuMwx3w
                     ` (2 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-06 18:38 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

This patch is the first of three which add processing of OPA MADs:

1) Add Intel Omni-Path Architecture defines
2) Increase max management version to accommodate OPA
3) Update ib_create_send_mad
	If the device supports OPA MADs and the MAD being sent is the OPA base
	version, alter the MAD size and sg lengths as appropriate

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
Changes from V1:
	Update commit message
	Add check for OPA_MGMT_MAD_SIZE on a device which specifies OPA support
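
	For reference, the size constants introduced below work out as
	follows (the common MAD header is 24 bytes and the MAD header plus
	RMPP header is 36 bytes):

	IB_MGMT_MAD_SIZE   = 24 + 232  = 256 bytes
	OPA_MGMT_MAD_SIZE  = 24 + 2024 = 2048 bytes
	OPA_MGMT_RMPP_DATA = 2048 - 36 = 2012 bytes of payload per segment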

 drivers/infiniband/core/mad.c      | 42 +++++++++++++++++++++++++++++---------
 drivers/infiniband/core/mad_priv.h |  2 +-
 drivers/infiniband/core/mad_rmpp.c |  7 ++++---
 include/rdma/ib_mad.h              | 24 +++++++++++++++++++---
 4 files changed, 58 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index e072d2a94690..81acde8b5e90 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -874,11 +874,11 @@ out:
 	return ret;
 }
 
-static int get_pad_size(int hdr_len, int data_len)
+static int get_pad_size(int hdr_len, int data_len, size_t mad_size)
 {
 	int seg_size, pad;
 
-	seg_size = sizeof(struct ib_mad) - hdr_len;
+	seg_size = mad_size - hdr_len;
 	if (data_len && seg_size) {
 		pad = seg_size - data_len % seg_size;
 		return pad == seg_size ? 0 : pad;
@@ -897,14 +897,15 @@ static void free_send_rmpp_list(struct ib_mad_send_wr_private *mad_send_wr)
 }
 
 static int alloc_send_rmpp_list(struct ib_mad_send_wr_private *send_wr,
-				gfp_t gfp_mask)
+				size_t mad_size, gfp_t gfp_mask)
 {
 	struct ib_mad_send_buf *send_buf = &send_wr->send_buf;
 	struct ib_rmpp_mad *rmpp_mad = send_buf->mad;
 	struct ib_rmpp_segment *seg = NULL;
 	int left, seg_size, pad;
 
-	send_buf->seg_size = sizeof(struct ib_mad) - send_buf->hdr_len;
+	send_buf->seg_size = mad_size - send_buf->hdr_len;
+	send_buf->seg_rmpp_size = mad_size - IB_MGMT_RMPP_HDR;
 	seg_size = send_buf->seg_size;
 	pad = send_wr->pad;
 
@@ -954,20 +955,30 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
 	struct ib_mad_send_wr_private *mad_send_wr;
 	int pad, message_size, ret, size;
 	void *buf;
+	size_t mad_size;
+	bool opa;
 
 	mad_agent_priv = container_of(mad_agent, struct ib_mad_agent_private,
 				      agent);
-	pad = get_pad_size(hdr_len, data_len);
+
+	opa = rdma_cap_opa_mad(mad_agent->device, mad_agent->port_num);
+
+	if (opa && base_version == OPA_MGMT_BASE_VERSION)
+		mad_size = sizeof(struct opa_mad);
+	else
+		mad_size = sizeof(struct ib_mad);
+
+	pad = get_pad_size(hdr_len, data_len, mad_size);
 	message_size = hdr_len + data_len + pad;
 
 	if (ib_mad_kernel_rmpp_agent(mad_agent)) {
-		if (!rmpp_active && message_size > sizeof(struct ib_mad))
+		if (!rmpp_active && message_size > mad_size)
 			return ERR_PTR(-EINVAL);
 	} else
-		if (rmpp_active || message_size > sizeof(struct ib_mad))
+		if (rmpp_active || message_size > mad_size)
 			return ERR_PTR(-EINVAL);
 
-	size = rmpp_active ? hdr_len : sizeof(struct ib_mad);
+	size = rmpp_active ? hdr_len : mad_size;
 	buf = kzalloc(sizeof *mad_send_wr + size, gfp_mask);
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
@@ -982,7 +993,14 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
 	mad_send_wr->mad_agent_priv = mad_agent_priv;
 	mad_send_wr->sg_list[0].length = hdr_len;
 	mad_send_wr->sg_list[0].lkey = mad_agent->mr->lkey;
-	mad_send_wr->sg_list[1].length = sizeof(struct ib_mad) - hdr_len;
+
+	/* OPA MADs don't have to be the full 2048 bytes */
+	if (opa && base_version == OPA_MGMT_BASE_VERSION &&
+	    data_len < mad_size - hdr_len)
+		mad_send_wr->sg_list[1].length = data_len;
+	else
+		mad_send_wr->sg_list[1].length = mad_size - hdr_len;
+
 	mad_send_wr->sg_list[1].lkey = mad_agent->mr->lkey;
 
 	mad_send_wr->send_wr.wr_id = (unsigned long) mad_send_wr;
@@ -995,7 +1013,7 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
 	mad_send_wr->send_wr.wr.ud.pkey_index = pkey_index;
 
 	if (rmpp_active) {
-		ret = alloc_send_rmpp_list(mad_send_wr, gfp_mask);
+		ret = alloc_send_rmpp_list(mad_send_wr, mad_size, gfp_mask);
 		if (ret) {
 			kfree(buf);
 			return ERR_PTR(ret);
@@ -2975,6 +2993,10 @@ static int ib_mad_port_open(struct ib_device *device,
 	if (WARN_ON(rdma_max_mad_size(device, port_num) < IB_MGMT_MAD_SIZE))
 		return -EFAULT;
 
+	if (WARN_ON(rdma_cap_opa_mad(device, port_num) &&
+		    rdma_max_mad_size(device, port_num) < OPA_MGMT_MAD_SIZE))
+		return -EFAULT;
+
 	/* Create new device info */
 	port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL);
 	if (!port_priv) {
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index 44c0a7842a70..e8852be0c3f8 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -56,7 +56,7 @@
 
 /* Registration table sizes */
 #define MAX_MGMT_CLASS		80
-#define MAX_MGMT_VERSION	8
+#define MAX_MGMT_VERSION	0x83
 #define MAX_MGMT_OUI		8
 #define MAX_MGMT_VENDOR_RANGE2	(IB_MGMT_CLASS_VENDOR_RANGE2_END - \
 				IB_MGMT_CLASS_VENDOR_RANGE2_START + 1)
diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
index 2379e2dfa400..f4e4fe609e95 100644
--- a/drivers/infiniband/core/mad_rmpp.c
+++ b/drivers/infiniband/core/mad_rmpp.c
@@ -572,13 +572,14 @@ static int send_next_seg(struct ib_mad_send_wr_private *mad_send_wr)
 
 	if (mad_send_wr->seg_num == 1) {
 		rmpp_mad->rmpp_hdr.rmpp_rtime_flags |= IB_MGMT_RMPP_FLAG_FIRST;
-		paylen = mad_send_wr->send_buf.seg_count * IB_MGMT_RMPP_DATA -
-			 mad_send_wr->pad;
+		paylen = (mad_send_wr->send_buf.seg_count *
+			  mad_send_wr->send_buf.seg_rmpp_size) -
+			  mad_send_wr->pad;
 	}
 
 	if (mad_send_wr->seg_num == mad_send_wr->send_buf.seg_count) {
 		rmpp_mad->rmpp_hdr.rmpp_rtime_flags |= IB_MGMT_RMPP_FLAG_LAST;
-		paylen = IB_MGMT_RMPP_DATA - mad_send_wr->pad;
+		paylen = mad_send_wr->send_buf.seg_rmpp_size - mad_send_wr->pad;
 	}
 	rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(paylen);
 
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index 349880696abc..eaa8c5dc472b 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -42,8 +42,11 @@
 #include <rdma/ib_verbs.h>
 #include <uapi/rdma/ib_user_mad.h>
 
-/* Management base version */
+/* Management base versions */
 #define IB_MGMT_BASE_VERSION			1
+#define OPA_MGMT_BASE_VERSION			0x80
+
+#define OPA_SMP_CLASS_VERSION			0x80
 
 /* Management classes */
 #define IB_MGMT_CLASS_SUBN_LID_ROUTED		0x01
@@ -136,6 +139,9 @@ enum {
 	IB_MGMT_DEVICE_HDR = 64,
 	IB_MGMT_DEVICE_DATA = 192,
 	IB_MGMT_MAD_SIZE = IB_MGMT_MAD_HDR + IB_MGMT_MAD_DATA,
+	OPA_MGMT_MAD_DATA = 2024,
+	OPA_MGMT_RMPP_DATA = 2012,
+	OPA_MGMT_MAD_SIZE = IB_MGMT_MAD_HDR + OPA_MGMT_MAD_DATA,
 };
 
 struct ib_mad_hdr {
@@ -182,6 +188,11 @@ struct ib_mad {
 	u8			data[IB_MGMT_MAD_DATA];
 };
 
+struct opa_mad {
+	struct ib_mad_hdr	mad_hdr;
+	u8			data[OPA_MGMT_MAD_DATA];
+};
+
 struct ib_rmpp_mad {
 	struct ib_mad_hdr	mad_hdr;
 	struct ib_rmpp_hdr	rmpp_hdr;
@@ -236,7 +247,10 @@ struct ib_class_port_info {
  *   includes the common MAD, RMPP, and class specific headers.
  * @data_len: Indicates the total size of user-transferred data.
  * @seg_count: The number of RMPP segments allocated for this send.
- * @seg_size: Size of each RMPP segment.
+ * @seg_size: Size of the data in each RMPP segment.  This does not include
+ *   class specific headers.
+ * @seg_rmpp_size: Size of each RMPP segment including the class specific
+ *   headers.
  * @timeout_ms: Time to wait for a response.
  * @retries: Number of times to retry a request for a response.  For MADs
  *   using RMPP, this applies per window.  On completion, returns the number
@@ -256,6 +270,7 @@ struct ib_mad_send_buf {
 	int			data_len;
 	int			seg_count;
 	int			seg_size;
+	int			seg_rmpp_size;
 	int			timeout_ms;
 	int			retries;
 };
@@ -402,7 +417,10 @@ struct ib_mad_send_wc {
 struct ib_mad_recv_buf {
 	struct list_head	list;
 	struct ib_grh		*grh;
-	struct ib_mad		*mad;
+	union {
+		struct ib_mad	*mad;
+		struct opa_mad	*opa_mad;
+	};
 };
 
 /**
-- 
1.8.2

* [PATCH 13/14] IB/mad: Add partial Intel OPA MAD support
       [not found] ` <1433615915-24591-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (11 preceding siblings ...)
  2015-06-06 18:38   ` [PATCH 12/14] IB/mad: Add partial Intel OPA MAD support ira.weiny-ral2JQCrhuEAvxtiuMwx3w
@ 2015-06-06 18:38   ` ira.weiny-ral2JQCrhuEAvxtiuMwx3w
  2015-06-06 18:38   ` [PATCH 14/14] IB/mad: Add final OPA MAD processing ira.weiny-ral2JQCrhuEAvxtiuMwx3w
  2015-06-12 20:00   ` [PATCH 00/14] IB/mad: Add support for " Doug Ledford
  14 siblings, 0 replies; 42+ messages in thread
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-06 18:38 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Add OPA SMP processing functionality.

Define the new OPA SMP format and create support functions for it, using the
previously defined helper functions as appropriate.

These functions are defined in this patch and used in the final OPA MAD support
patch.

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
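A hypothetical receive-side sketch of how the new helpers chain together
(simplified; the real dispatch, including the switch forwarding case, is
wired up in the final patch of the series):

	static enum smi_action foo_handle_opa_dr_smp(struct ib_device *device,
						     int port_num,
						     int phys_port_cnt,
						     struct opa_smp *smp)
	{
		enum smi_forward_action retsmi;

		if (opa_smi_handle_dr_smp_recv(smp, device->node_type, port_num,
					       phys_port_cnt) == IB_SMI_DISCARD)
			return IB_SMI_DISCARD;

		retsmi = opa_smi_check_forward_dr_smp(smp);
		if (retsmi == IB_SMI_LOCAL)
			return IB_SMI_HANDLE;	/* deliver to the local SMA */

		if (retsmi == IB_SMI_SEND &&	/* don't forward */
		    opa_smi_handle_dr_smp_send(smp, device->node_type,
					       port_num) == IB_SMI_DISCARD)
			return IB_SMI_DISCARD;

		return IB_SMI_HANDLE;
	}
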
 drivers/infiniband/core/mad_priv.h |   1 +
 drivers/infiniband/core/opa_smi.h  |  78 +++++++++++++++++++++++++++
 drivers/infiniband/core/smi.c      |  54 +++++++++++++++++++
 include/rdma/opa_smi.h             | 106 +++++++++++++++++++++++++++++++++++++
 4 files changed, 239 insertions(+)
 create mode 100644 drivers/infiniband/core/opa_smi.h
 create mode 100644 include/rdma/opa_smi.h

diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index e8852be0c3f8..4423f68e2a77 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -41,6 +41,7 @@
 #include <linux/workqueue.h>
 #include <rdma/ib_mad.h>
 #include <rdma/ib_smi.h>
+#include <rdma/opa_smi.h>
 
 #define IB_MAD_QPS_CORE		2 /* Always QP0 and QP1 as a minimum */
 
diff --git a/drivers/infiniband/core/opa_smi.h b/drivers/infiniband/core/opa_smi.h
new file mode 100644
index 000000000000..62d91bfa4cb7
--- /dev/null
+++ b/drivers/infiniband/core/opa_smi.h
@@ -0,0 +1,78 @@
+/*
+ * Copyright (c) 2014 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#ifndef __OPA_SMI_H_
+#define __OPA_SMI_H_
+
+#include <rdma/ib_smi.h>
+#include <rdma/opa_smi.h>
+
+#include "smi.h"
+
+enum smi_action opa_smi_handle_dr_smp_recv(struct opa_smp *smp, u8 node_type,
+				       int port_num, int phys_port_cnt);
+int opa_smi_get_fwd_port(struct opa_smp *smp);
+extern enum smi_forward_action opa_smi_check_forward_dr_smp(struct opa_smp *smp);
+extern enum smi_action opa_smi_handle_dr_smp_send(struct opa_smp *smp,
+					      u8 node_type, int port_num);
+
+/*
+ * Return IB_SMI_HANDLE if the SMP should be handled by the local SMA/SM
+ * via process_mad
+ */
+static inline enum smi_action opa_smi_check_local_smp(struct opa_smp *smp,
+						      struct ib_device *device)
+{
+	/* C14-9:3 -- We're at the end of the DR segment of path */
+	/* C14-9:4 -- Hop Pointer = Hop Count + 1 -> give to SMA/SM */
+	return (device->process_mad &&
+		!opa_get_smp_direction(smp) &&
+		(smp->hop_ptr == smp->hop_cnt + 1)) ?
+		IB_SMI_HANDLE : IB_SMI_DISCARD;
+}
+
+/*
+ * Return IB_SMI_HANDLE if the SMP should be handled by the local SMA/SM
+ * via process_mad
+ */
+static inline enum smi_action opa_smi_check_local_returning_smp(struct opa_smp *smp,
+								struct ib_device *device)
+{
+	/* C14-13:3 -- We're at the end of the DR segment of path */
+	/* C14-13:4 -- Hop Pointer == 0 -> give to SM */
+	return (device->process_mad &&
+		opa_get_smp_direction(smp) &&
+		!smp->hop_ptr) ? IB_SMI_HANDLE : IB_SMI_DISCARD;
+}
+
+#endif	/* __OPA_SMI_H_ */
diff --git a/drivers/infiniband/core/smi.c b/drivers/infiniband/core/smi.c
index c523b2df2571..368a561d1a5d 100644
--- a/drivers/infiniband/core/smi.c
+++ b/drivers/infiniband/core/smi.c
@@ -5,6 +5,7 @@
  * Copyright (c) 2004, 2005 Topspin Corporation.  All rights reserved.
  * Copyright (c) 2004-2007 Voltaire Corporation.  All rights reserved.
  * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
+ * Copyright (c) 2014 Intel Corporation.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -38,6 +39,7 @@
 
 #include <rdma/ib_smi.h>
 #include "smi.h"
+#include "opa_smi.h"
 
 static enum smi_action __smi_handle_dr_smp_send(u8 node_type, int port_num,
 						u8 *hop_ptr, u8 hop_cnt,
@@ -136,6 +138,20 @@ enum smi_action smi_handle_dr_smp_send(struct ib_smp *smp,
 					smp->dr_slid == IB_LID_PERMISSIVE);
 }
 
+enum smi_action opa_smi_handle_dr_smp_send(struct opa_smp *smp,
+				       u8 node_type, int port_num)
+{
+	return __smi_handle_dr_smp_send(node_type, port_num,
+					&smp->hop_ptr, smp->hop_cnt,
+					smp->route.dr.initial_path,
+					smp->route.dr.return_path,
+					opa_get_smp_direction(smp),
+					smp->route.dr.dr_dlid ==
+					OPA_LID_PERMISSIVE,
+					smp->route.dr.dr_slid ==
+					OPA_LID_PERMISSIVE);
+}
+
 static enum smi_action __smi_handle_dr_smp_recv(u8 node_type, int port_num,
 						int phys_port_cnt,
 						u8 *hop_ptr, u8 hop_cnt,
@@ -234,6 +250,24 @@ enum smi_action smi_handle_dr_smp_recv(struct ib_smp *smp, u8 node_type,
 					smp->dr_slid == IB_LID_PERMISSIVE);
 }
 
+/*
+ * Adjust information for a received SMP
+ * Return IB_SMI_DISCARD if the SMP should be dropped
+ */
+enum smi_action opa_smi_handle_dr_smp_recv(struct opa_smp *smp, u8 node_type,
+					   int port_num, int phys_port_cnt)
+{
+	return __smi_handle_dr_smp_recv(node_type, port_num, phys_port_cnt,
+					&smp->hop_ptr, smp->hop_cnt,
+					smp->route.dr.initial_path,
+					smp->route.dr.return_path,
+					opa_get_smp_direction(smp),
+					smp->route.dr.dr_dlid ==
+					OPA_LID_PERMISSIVE,
+					smp->route.dr.dr_slid ==
+					OPA_LID_PERMISSIVE);
+}
+
 static enum smi_forward_action __smi_check_forward_dr_smp(u8 hop_ptr, u8 hop_cnt,
 							  u8 direction,
 							  bool dr_dlid_is_permissive,
@@ -274,6 +308,16 @@ enum smi_forward_action smi_check_forward_dr_smp(struct ib_smp *smp)
 					  smp->dr_slid == IB_LID_PERMISSIVE);
 }
 
+enum smi_forward_action opa_smi_check_forward_dr_smp(struct opa_smp *smp)
+{
+	return __smi_check_forward_dr_smp(smp->hop_ptr, smp->hop_cnt,
+					  opa_get_smp_direction(smp),
+					  smp->route.dr.dr_dlid ==
+					  OPA_LID_PERMISSIVE,
+					  smp->route.dr.dr_slid ==
+					  OPA_LID_PERMISSIVE);
+}
+
 /*
  * Return the forwarding port number from initial_path for outgoing SMP and
  * from return_path for returning SMP
@@ -283,3 +327,13 @@ int smi_get_fwd_port(struct ib_smp *smp)
 	return (!ib_get_smp_direction(smp) ? smp->initial_path[smp->hop_ptr+1] :
 		smp->return_path[smp->hop_ptr-1]);
 }
+
+/*
+ * Return the forwarding port number from initial_path for outgoing SMP and
+ * from return_path for returning SMP
+ */
+int opa_smi_get_fwd_port(struct opa_smp *smp)
+{
+	return !opa_get_smp_direction(smp) ? smp->route.dr.initial_path[smp->hop_ptr+1] :
+		smp->route.dr.return_path[smp->hop_ptr-1];
+}
diff --git a/include/rdma/opa_smi.h b/include/rdma/opa_smi.h
new file mode 100644
index 000000000000..29063e84c253
--- /dev/null
+++ b/include/rdma/opa_smi.h
@@ -0,0 +1,106 @@
+/*
+ * Copyright (c) 2014 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if !defined(OPA_SMI_H)
+#define OPA_SMI_H
+
+#include <rdma/ib_mad.h>
+#include <rdma/ib_smi.h>
+
+#define OPA_SMP_LID_DATA_SIZE			2016
+#define OPA_SMP_DR_DATA_SIZE			1872
+#define OPA_SMP_MAX_PATH_HOPS			64
+
+#define OPA_SMI_CLASS_VERSION			0x80
+
+#define OPA_LID_PERMISSIVE			cpu_to_be32(0xFFFFFFFF)
+
+struct opa_smp {
+	u8	base_version;
+	u8	mgmt_class;
+	u8	class_version;
+	u8	method;
+	__be16	status;
+	u8	hop_ptr;
+	u8	hop_cnt;
+	__be64	tid;
+	__be16	attr_id;
+	__be16	resv;
+	__be32	attr_mod;
+	__be64	mkey;
+	union {
+		struct {
+			uint8_t data[OPA_SMP_LID_DATA_SIZE];
+		} lid;
+		struct {
+			__be32	dr_slid;
+			__be32	dr_dlid;
+			u8	initial_path[OPA_SMP_MAX_PATH_HOPS];
+			u8	return_path[OPA_SMP_MAX_PATH_HOPS];
+			u8	reserved[8];
+			u8	data[OPA_SMP_DR_DATA_SIZE];
+		} dr;
+	} route;
+} __packed;
+
+
+static inline u8
+opa_get_smp_direction(struct opa_smp *smp)
+{
+	return ib_get_smp_direction((struct ib_smp *)smp);
+}
+
+static inline u8 *opa_get_smp_data(struct opa_smp *smp)
+{
+	if (smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
+		return smp->route.dr.data;
+
+	return smp->route.lid.data;
+}
+
+static inline size_t opa_get_smp_data_size(struct opa_smp *smp)
+{
+	if (smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
+		return sizeof(smp->route.dr.data);
+
+	return sizeof(smp->route.lid.data);
+}
+
+static inline size_t opa_get_smp_header_size(struct opa_smp *smp)
+{
+	if (smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
+		return sizeof(*smp) - sizeof(smp->route.dr.data);
+
+	return sizeof(*smp) - sizeof(smp->route.lid.data);
+}
+
+#endif /* OPA_SMI_H */
-- 
1.8.2


* [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found] ` <1433615915-24591-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (12 preceding siblings ...)
  2015-06-06 18:38   ` [PATCH 13/14] " ira.weiny-ral2JQCrhuEAvxtiuMwx3w
@ 2015-06-06 18:38   ` ira.weiny-ral2JQCrhuEAvxtiuMwx3w
       [not found]     ` <1433615915-24591-15-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2015-06-12 20:00   ` [PATCH 00/14] IB/mad: Add support for " Doug Ledford
  14 siblings, 1 reply; 42+ messages in thread
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-06 18:38 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

For devices which support OPA MADs:

   1) Use previously defined SMP support functions.

   2) Pass correct base version to ib_create_send_mad when processing OPA MADs.

   3) Process the out_mad_pkey_index returned by agents for a response.  This
      is necessary because OPA SMP packets must carry a valid pkey.

   4) Carry the correct segment size (OPA vs IBTA) of RMPP messages within
      ib_mad_recv_wc.

   5) Handle variable-length OPA MADs by:

        * Adjusting the 'fake' WC for locally routed SMPs to represent the
          proper incoming byte_len
        * Using the out_mad_size returned by the local HCA agents
                1) when sending agent responses on the wire
                2) when passing responses through the local_completions
                   function

	NOTE: wc.byte_len includes the GRH length and therefore is different
	      from the in_mad_size specified to the local HCA agents.
	      out_mad_size should _not_ include the GRH length, as the GRH is
	      accounted for separately when the work completion is built (see
	      the sketch below).
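
	For illustration, a sketch of that accounting (the names follow the
	hunks below; this is not itself part of the patch):

		/* fake WC for a locally routed OPA SMP: GRH is counted */
		mad_wc.byte_len = send_buf.hdr_len + send_buf.data_len
				  + sizeof(struct ib_grh);

		/* the sizes exchanged with the local agent carry no GRH;
		 * the out_mad_size it returns must likewise exclude it */
		ret = device->process_mad(device, 0, port_num, &mad_wc, NULL,
					  (const struct ib_mad_hdr *)smp,
					  mad_size, out_mad, &out_mad_size,
					  &out_mad_pkey_index);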

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
Changes from V1:
	Use out_mad_pkey_index rather than the wc.pkey_index from the agents
		This is a much cleaner interface than returning data in the
		in_wc which can now remain const.
	Correct all opa variables to be bool type
	Adjust for ib_mad_private being a flex array (sketched below)
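
	For context, the flex array gives ib_mad_private roughly this shape
	(a sketch of the result of the earlier kzalloc conversion patch, not
	a verbatim copy):

		struct ib_mad_private {
			struct ib_mad_private_header header;
			size_t mad_size;
			struct ib_grh grh;
			u8 mad[0];	/* IB or OPA MAD, mad_size bytes */
		} __packed;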

 drivers/infiniband/core/agent.c    |   7 +-
 drivers/infiniband/core/agent.h    |   2 +-
 drivers/infiniband/core/mad.c      | 220 ++++++++++++++++++++++++++++++++-----
 drivers/infiniband/core/mad_priv.h |   1 +
 drivers/infiniband/core/mad_rmpp.c |  20 +++-
 drivers/infiniband/core/user_mad.c |  19 ++--
 include/rdma/ib_mad.h              |   8 ++
 7 files changed, 234 insertions(+), 43 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index 6c420736ce93..c7dcfe4ca5f1 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -80,7 +80,7 @@ ib_get_agent_port(const struct ib_device *device, int port_num)
 
 void agent_send_response(const struct ib_mad_hdr *mad_hdr, const struct ib_grh *grh,
 			 const struct ib_wc *wc, const struct ib_device *device,
-			 int port_num, int qpn, size_t resp_mad_len)
+			 int port_num, int qpn, size_t resp_mad_len, bool opa)
 {
 	struct ib_agent_port_private *port_priv;
 	struct ib_mad_agent *agent;
@@ -106,11 +106,14 @@ void agent_send_response(const struct ib_mad_hdr *mad_hdr, const struct ib_grh *
 		return;
 	}
 
+	if (opa && mad_hdr->base_version != OPA_MGMT_BASE_VERSION)
+		resp_mad_len = IB_MGMT_MAD_SIZE;
+
 	send_buf = ib_create_send_mad(agent, wc->src_qp, wc->pkey_index, 0,
 				      IB_MGMT_MAD_HDR,
 				      resp_mad_len - IB_MGMT_MAD_HDR,
 				      GFP_KERNEL,
-				      IB_MGMT_BASE_VERSION);
+				      mad_hdr->base_version);
 	if (IS_ERR(send_buf)) {
 		dev_err(&device->dev, "ib_create_send_mad error\n");
 		goto err1;
diff --git a/drivers/infiniband/core/agent.h b/drivers/infiniband/core/agent.h
index 234c8aa380e0..65f92bedae44 100644
--- a/drivers/infiniband/core/agent.h
+++ b/drivers/infiniband/core/agent.h
@@ -46,6 +46,6 @@ extern int ib_agent_port_close(struct ib_device *device, int port_num);
 
 extern void agent_send_response(const struct ib_mad_hdr *mad_hdr, const struct ib_grh *grh,
 				const struct ib_wc *wc, const struct ib_device *device,
-				int port_num, int qpn, size_t resp_mad_len);
+				int port_num, int qpn, size_t resp_mad_len, bool opa);
 
 #endif	/* __AGENT_H_ */
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 81acde8b5e90..17c90519c03e 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2005 Intel Corporation.  All rights reserved.
  * Copyright (c) 2005 Mellanox Technologies Ltd.  All rights reserved.
  * Copyright (c) 2009 HNR Consulting. All rights reserved.
+ * Copyright (c) 2014 Intel Corporation.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -44,6 +45,7 @@
 #include "mad_priv.h"
 #include "mad_rmpp.h"
 #include "smi.h"
+#include "opa_smi.h"
 #include "agent.h"
 
 MODULE_LICENSE("Dual BSD/GPL");
@@ -751,6 +753,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 {
 	int ret = 0;
 	struct ib_smp *smp = mad_send_wr->send_buf.mad;
+	struct opa_smp *opa_smp = (struct opa_smp *)smp;
 	unsigned long flags;
 	struct ib_mad_local_private *local;
 	struct ib_mad_private *mad_priv;
@@ -762,6 +765,9 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	struct ib_send_wr *send_wr = &mad_send_wr->send_wr;
 	size_t mad_size = port_mad_size(mad_agent_priv->qp_info->port_priv);
 	u16 out_mad_pkey_index = 0;
+	u16 drslid;
+	bool opa = rdma_cap_opa_mad(mad_agent_priv->qp_info->port_priv->device,
+				    mad_agent_priv->qp_info->port_priv->port_num);
 
 	if (device->node_type == RDMA_NODE_IB_SWITCH &&
 	    smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
@@ -775,19 +781,48 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	 * If we are at the start of the LID routed part, don't update the
 	 * hop_ptr or hop_cnt.  See section 14.2.2, Vol 1 IB spec.
 	 */
-	if ((ib_get_smp_direction(smp) ? smp->dr_dlid : smp->dr_slid) ==
-	     IB_LID_PERMISSIVE &&
-	     smi_handle_dr_smp_send(smp, device->node_type, port_num) ==
-	     IB_SMI_DISCARD) {
-		ret = -EINVAL;
-		dev_err(&device->dev, "Invalid directed route\n");
-		goto out;
-	}
+	if (opa && smp->class_version == OPA_SMP_CLASS_VERSION) {
+		u32 opa_drslid;
+
+		if ((opa_get_smp_direction(opa_smp)
+		     ? opa_smp->route.dr.dr_dlid : opa_smp->route.dr.dr_slid) ==
+		     OPA_LID_PERMISSIVE &&
+		     opa_smi_handle_dr_smp_send(opa_smp, device->node_type,
+						port_num) == IB_SMI_DISCARD) {
+			ret = -EINVAL;
+			dev_err(&device->dev, "OPA Invalid directed route\n");
+			goto out;
+		}
+		opa_drslid = be32_to_cpu(opa_smp->route.dr.dr_slid);
+		if (opa_drslid != OPA_LID_PERMISSIVE &&
+		    opa_drslid & 0xffff0000) {
+			ret = -EINVAL;
+			dev_err(&device->dev, "OPA Invalid dr_slid 0x%x\n",
+			       opa_drslid);
+			goto out;
+		}
+		drslid = (u16)(opa_drslid & 0x0000ffff);
 
-	/* Check to post send on QP or process locally */
-	if (smi_check_local_smp(smp, device) == IB_SMI_DISCARD &&
-	    smi_check_local_returning_smp(smp, device) == IB_SMI_DISCARD)
-		goto out;
+		/* Check to post send on QP or process locally */
+		if (opa_smi_check_local_smp(opa_smp, device) == IB_SMI_DISCARD &&
+		    opa_smi_check_local_returning_smp(opa_smp, device) == IB_SMI_DISCARD)
+			goto out;
+	} else {
+		if ((ib_get_smp_direction(smp) ? smp->dr_dlid : smp->dr_slid) ==
+		     IB_LID_PERMISSIVE &&
+		     smi_handle_dr_smp_send(smp, device->node_type, port_num) ==
+		     IB_SMI_DISCARD) {
+			ret = -EINVAL;
+			dev_err(&device->dev, "Invalid directed route\n");
+			goto out;
+		}
+		drslid = be16_to_cpu(smp->dr_slid);
+
+		/* Check to post send on QP or process locally */
+		if (smi_check_local_smp(smp, device) == IB_SMI_DISCARD &&
+		    smi_check_local_returning_smp(smp, device) == IB_SMI_DISCARD)
+			goto out;
+	}
 
 	local = kmalloc(sizeof *local, GFP_ATOMIC);
 	if (!local) {
@@ -806,10 +841,16 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	}
 
 	build_smp_wc(mad_agent_priv->agent.qp,
-		     send_wr->wr_id, be16_to_cpu(smp->dr_slid),
+		     send_wr->wr_id, drslid,
 		     send_wr->wr.ud.pkey_index,
 		     send_wr->wr.ud.port_num, &mad_wc);
 
+	if (opa && smp->base_version == OPA_MGMT_BASE_VERSION) {
+		mad_wc.byte_len = mad_send_wr->send_buf.hdr_len
+					+ mad_send_wr->send_buf.data_len
+					+ sizeof(struct ib_grh);
+	}
+
 	/* No GRH for DR SMP */
 	ret = device->process_mad(device, 0, port_num, &mad_wc, NULL,
 				  (const struct ib_mad_hdr *)smp, mad_size,
@@ -861,6 +902,10 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	}
 
 	local->mad_send_wr = mad_send_wr;
+	if (opa) {
+		local->mad_send_wr->send_wr.wr.ud.pkey_index = out_mad_pkey_index;
+		local->return_wc_byte_len = mad_size;
+	}
 	/* Reference MAD agent until send side of local completion handled */
 	atomic_inc(&mad_agent_priv->refcount);
 	/* Queue local completion to local list */
@@ -1754,14 +1799,18 @@ out:
 	return mad_agent;
 }
 
-static int validate_mad(const struct ib_mad_hdr *mad_hdr, u32 qp_num)
+static int validate_mad(const struct ib_mad_hdr *mad_hdr,
+			const struct ib_mad_qp_info *qp_info,
+			bool opa)
 {
 	int valid = 0;
+	u32 qp_num = qp_info->qp->qp_num;
 
 	/* Make sure MAD base version is understood */
-	if (mad_hdr->base_version != IB_MGMT_BASE_VERSION) {
-		pr_err("MAD received with unsupported base version %d\n",
-			mad_hdr->base_version);
+	if (mad_hdr->base_version != IB_MGMT_BASE_VERSION &&
+	    (!opa || mad_hdr->base_version != OPA_MGMT_BASE_VERSION)) {
+		pr_err("MAD received with unsupported base version %d %s\n",
+		       mad_hdr->base_version, opa ? "(opa)" : "");
 		goto out;
 	}
 
@@ -2011,7 +2060,8 @@ static enum smi_action handle_ib_smi(const struct ib_mad_port_private *port_priv
 				    port_priv->device,
 				    smi_get_fwd_port(smp),
 				    qp_info->qp->qp_num,
-				    response->mad_size);
+				    response->mad_size,
+				    false);
 
 		return IB_SMI_DISCARD;
 	}
@@ -2019,7 +2069,8 @@ static enum smi_action handle_ib_smi(const struct ib_mad_port_private *port_priv
 }
 
 static bool generate_unmatched_resp(const struct ib_mad_private *recv,
-				    struct ib_mad_private *response)
+				    struct ib_mad_private *response,
+				    size_t *resp_len, bool opa)
 {
 	const struct ib_mad_hdr *recv_hdr = (const struct ib_mad_hdr *)recv->mad;
 	struct ib_mad_hdr *resp_hdr = (struct ib_mad_hdr *)response->mad;
@@ -2035,11 +2086,96 @@ static bool generate_unmatched_resp(const struct ib_mad_private *recv,
 		if (recv_hdr->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
 			resp_hdr->status |= IB_SMP_DIRECTION;
 
+		if (opa && recv_hdr->base_version == OPA_MGMT_BASE_VERSION) {
+			if (recv_hdr->mgmt_class ==
+			    IB_MGMT_CLASS_SUBN_LID_ROUTED ||
+			    recv_hdr->mgmt_class ==
+			    IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
+				*resp_len = opa_get_smp_header_size(
+							(struct opa_smp *)recv->mad);
+			else
+				*resp_len = sizeof(struct ib_mad_hdr);
+		}
+
 		return true;
 	} else {
 		return false;
 	}
 }
+
+static enum smi_action
+handle_opa_smi(struct ib_mad_port_private *port_priv,
+	       struct ib_mad_qp_info *qp_info,
+	       struct ib_wc *wc,
+	       int port_num,
+	       struct ib_mad_private *recv,
+	       struct ib_mad_private *response)
+{
+	enum smi_forward_action retsmi;
+	struct opa_smp *smp = (struct opa_smp *)recv->mad;
+
+	if (opa_smi_handle_dr_smp_recv(smp,
+				   port_priv->device->node_type,
+				   port_num,
+				   port_priv->device->phys_port_cnt) ==
+				   IB_SMI_DISCARD)
+		return IB_SMI_DISCARD;
+
+	retsmi = opa_smi_check_forward_dr_smp(smp);
+	if (retsmi == IB_SMI_LOCAL)
+		return IB_SMI_HANDLE;
+
+	if (retsmi == IB_SMI_SEND) { /* don't forward */
+		if (opa_smi_handle_dr_smp_send(smp,
+					   port_priv->device->node_type,
+					   port_num) == IB_SMI_DISCARD)
+			return IB_SMI_DISCARD;
+
+		if (opa_smi_check_local_smp(smp, port_priv->device) ==
+		    IB_SMI_DISCARD)
+			return IB_SMI_DISCARD;
+
+	} else if (port_priv->device->node_type == RDMA_NODE_IB_SWITCH) {
+		/* forward case for switches */
+		memcpy(response, recv, mad_priv_size(response));
+		response->header.recv_wc.wc = &response->header.wc;
+		response->header.recv_wc.recv_buf.opa_mad =
+				(struct opa_mad *)response->mad;
+		response->header.recv_wc.recv_buf.grh = &response->grh;
+
+		agent_send_response((const struct ib_mad_hdr *)response->mad,
+				    &response->grh, wc,
+				    port_priv->device,
+				    opa_smi_get_fwd_port(smp),
+				    qp_info->qp->qp_num,
+				    recv->header.wc.byte_len,
+				    true);
+
+		return IB_SMI_DISCARD;
+	}
+
+	return IB_SMI_HANDLE;
+}
+
+static enum smi_action
+handle_smi(struct ib_mad_port_private *port_priv,
+	   struct ib_mad_qp_info *qp_info,
+	   struct ib_wc *wc,
+	   int port_num,
+	   struct ib_mad_private *recv,
+	   struct ib_mad_private *response,
+	   bool opa)
+{
+	struct ib_mad_hdr *mad_hdr = (struct ib_mad_hdr *)recv->mad;
+
+	if (opa && mad_hdr->base_version == OPA_MGMT_BASE_VERSION &&
+	    mad_hdr->class_version == OPA_SMI_CLASS_VERSION)
+		return handle_opa_smi(port_priv, qp_info, wc, port_num, recv,
+				      response);
+
+	return handle_ib_smi(port_priv, qp_info, wc, port_num, recv, response);
+}
+
 static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 				     struct ib_wc *wc)
 {
@@ -2052,11 +2188,15 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 	int ret = IB_MAD_RESULT_SUCCESS;
 	size_t mad_size;
 	u16 resp_mad_pkey_index = 0;
+	bool opa;
 
 	mad_list = (struct ib_mad_list_head *)(unsigned long)wc->wr_id;
 	qp_info = mad_list->mad_queue->qp_info;
 	dequeue_mad(mad_list);
 
+	opa = rdma_cap_opa_mad(qp_info->port_priv->device,
+			       qp_info->port_priv->port_num);
+
 	mad_priv_hdr = container_of(mad_list, struct ib_mad_private_header,
 				    mad_list);
 	recv = container_of(mad_priv_hdr, struct ib_mad_private, header);
@@ -2068,7 +2208,15 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 	/* Setup MAD receive work completion from "normal" work completion */
 	recv->header.wc = *wc;
 	recv->header.recv_wc.wc = &recv->header.wc;
-	recv->header.recv_wc.mad_len = sizeof(struct ib_mad);
+
+	if (opa && ((struct ib_mad_hdr *)(recv->mad))->base_version == OPA_MGMT_BASE_VERSION) {
+		recv->header.recv_wc.mad_len = wc->byte_len - sizeof(struct ib_grh);
+		recv->header.recv_wc.mad_seg_size = sizeof(struct opa_mad);
+	} else {
+		recv->header.recv_wc.mad_len = sizeof(struct ib_mad);
+		recv->header.recv_wc.mad_seg_size = sizeof(struct ib_mad);
+	}
+
 	recv->header.recv_wc.recv_buf.mad = (struct ib_mad *)recv->mad;
 	recv->header.recv_wc.recv_buf.grh = &recv->grh;
 
@@ -2076,7 +2224,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 		snoop_recv(qp_info, &recv->header.recv_wc, IB_MAD_SNOOP_RECVS);
 
 	/* Validate MAD */
-	if (!validate_mad((const struct ib_mad_hdr *)recv->mad, qp_info->qp->qp_num))
+	if (!validate_mad((const struct ib_mad_hdr *)recv->mad, qp_info, opa))
 		goto out;
 
 	mad_size = recv->mad_size;
@@ -2094,8 +2242,8 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 
 	if (((struct ib_mad_hdr *)recv->mad)->mgmt_class ==
 	    IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) {
-		if (handle_ib_smi(port_priv, qp_info, wc, port_num, recv,
-				  response)
+		if (handle_smi(port_priv, qp_info, wc, port_num, recv,
+			       response, opa)
 		    == IB_SMI_DISCARD)
 			goto out;
 	}
@@ -2118,7 +2266,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 						    port_priv->device,
 						    port_num,
 						    qp_info->qp->qp_num,
-						    response->mad_size);
+						    mad_size, opa);
 				goto out;
 			}
 		}
@@ -2133,10 +2281,10 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 		 */
 		recv = NULL;
 	} else if ((ret & IB_MAD_RESULT_SUCCESS) &&
-		   generate_unmatched_resp(recv, response)) {
+		   generate_unmatched_resp(recv, response, &mad_size, opa)) {
 		agent_send_response((const struct ib_mad_hdr *)response->mad, &recv->grh, wc,
 				    port_priv->device, port_num,
-				    qp_info->qp->qp_num, response->mad_size);
+				    qp_info->qp->qp_num, mad_size, opa);
 	}
 
 out:
@@ -2537,10 +2685,14 @@ static void local_completions(struct work_struct *work)
 	int free_mad;
 	struct ib_wc wc;
 	struct ib_mad_send_wc mad_send_wc;
+	bool opa;
 
 	mad_agent_priv =
 		container_of(work, struct ib_mad_agent_private, local_work);
 
+	opa = rdma_cap_opa_mad(mad_agent_priv->qp_info->port_priv->device,
+			       mad_agent_priv->qp_info->port_priv->port_num);
+
 	spin_lock_irqsave(&mad_agent_priv->lock, flags);
 	while (!list_empty(&mad_agent_priv->local_list)) {
 		local = list_entry(mad_agent_priv->local_list.next,
@@ -2550,6 +2702,7 @@ static void local_completions(struct work_struct *work)
 		spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
 		free_mad = 0;
 		if (local->mad_priv) {
+			u8 base_version;
 			recv_mad_agent = local->recv_mad_agent;
 			if (!recv_mad_agent) {
 				dev_err(&mad_agent_priv->agent.device->dev,
@@ -2565,11 +2718,20 @@ static void local_completions(struct work_struct *work)
 			build_smp_wc(recv_mad_agent->agent.qp,
 				     (unsigned long) local->mad_send_wr,
 				     be16_to_cpu(IB_LID_PERMISSIVE),
-				     0, recv_mad_agent->agent.port_num, &wc);
+				     local->mad_send_wr->send_wr.wr.ud.pkey_index,
+				     recv_mad_agent->agent.port_num, &wc);
 
 			local->mad_priv->header.recv_wc.wc = &wc;
-			local->mad_priv->header.recv_wc.mad_len =
-						sizeof(struct ib_mad);
+
+			base_version = ((struct ib_mad_hdr *)(local->mad_priv->mad))->base_version;
+			if (opa && base_version == OPA_MGMT_BASE_VERSION) {
+				local->mad_priv->header.recv_wc.mad_len = local->return_wc_byte_len;
+				local->mad_priv->header.recv_wc.mad_seg_size = sizeof(struct opa_mad);
+			} else {
+				local->mad_priv->header.recv_wc.mad_len = sizeof(struct ib_mad);
+				local->mad_priv->header.recv_wc.mad_seg_size = sizeof(struct ib_mad);
+			}
+
 			INIT_LIST_HEAD(&local->mad_priv->header.recv_wc.rmpp_list);
 			list_add(&local->mad_priv->header.recv_wc.recv_buf.list,
 				 &local->mad_priv->header.recv_wc.rmpp_list);
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index 4423f68e2a77..5be89f98928f 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -148,6 +148,7 @@ struct ib_mad_local_private {
 	struct ib_mad_private *mad_priv;
 	struct ib_mad_agent_private *recv_mad_agent;
 	struct ib_mad_send_wr_private *mad_send_wr;
+	size_t return_wc_byte_len;
 };
 
 struct ib_mad_mgmt_method_table {
diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
index f4e4fe609e95..382941b46e43 100644
--- a/drivers/infiniband/core/mad_rmpp.c
+++ b/drivers/infiniband/core/mad_rmpp.c
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2005 Intel Inc. All rights reserved.
  * Copyright (c) 2005-2006 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2014 Intel Corporation.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -67,6 +68,7 @@ struct mad_rmpp_recv {
 	u8 mgmt_class;
 	u8 class_version;
 	u8 method;
+	u8 base_version;
 };
 
 static inline void deref_rmpp_recv(struct mad_rmpp_recv *rmpp_recv)
@@ -318,6 +320,7 @@ create_rmpp_recv(struct ib_mad_agent_private *agent,
 	rmpp_recv->mgmt_class = mad_hdr->mgmt_class;
 	rmpp_recv->class_version = mad_hdr->class_version;
 	rmpp_recv->method  = mad_hdr->method;
+	rmpp_recv->base_version  = mad_hdr->base_version;
 	return rmpp_recv;
 
 error:	kfree(rmpp_recv);
@@ -433,14 +436,23 @@ static inline int get_mad_len(struct mad_rmpp_recv *rmpp_recv)
 {
 	struct ib_rmpp_mad *rmpp_mad;
 	int hdr_size, data_size, pad;
+	bool opa = rdma_cap_opa_mad(rmpp_recv->agent->qp_info->port_priv->device,
+				    rmpp_recv->agent->qp_info->port_priv->port_num);
 
 	rmpp_mad = (struct ib_rmpp_mad *)rmpp_recv->cur_seg_buf->mad;
 
 	hdr_size = ib_get_mad_data_offset(rmpp_mad->mad_hdr.mgmt_class);
-	data_size = sizeof(struct ib_rmpp_mad) - hdr_size;
-	pad = IB_MGMT_RMPP_DATA - be32_to_cpu(rmpp_mad->rmpp_hdr.paylen_newwin);
-	if (pad > IB_MGMT_RMPP_DATA || pad < 0)
-		pad = 0;
+	if (opa && rmpp_recv->base_version == OPA_MGMT_BASE_VERSION) {
+		data_size = sizeof(struct opa_rmpp_mad) - hdr_size;
+		pad = OPA_MGMT_RMPP_DATA - be32_to_cpu(rmpp_mad->rmpp_hdr.paylen_newwin);
+		if (pad > OPA_MGMT_RMPP_DATA || pad < 0)
+			pad = 0;
+	} else {
+		data_size = sizeof(struct ib_rmpp_mad) - hdr_size;
+		pad = IB_MGMT_RMPP_DATA - be32_to_cpu(rmpp_mad->rmpp_hdr.paylen_newwin);
+		if (pad > IB_MGMT_RMPP_DATA || pad < 0)
+			pad = 0;
+	}
 
 	return hdr_size + rmpp_recv->seg_num * data_size - pad;
 }
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index d4286712405d..35567fffaa4e 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -262,20 +262,23 @@ static ssize_t copy_recv_mad(struct ib_umad_file *file, char __user *buf,
 {
 	struct ib_mad_recv_buf *recv_buf;
 	int left, seg_payload, offset, max_seg_payload;
+	size_t seg_size;
 
-	/* We need enough room to copy the first (or only) MAD segment. */
 	recv_buf = &packet->recv_wc->recv_buf;
-	if ((packet->length <= sizeof (*recv_buf->mad) &&
+	seg_size = packet->recv_wc->mad_seg_size;
+
+	/* We need enough room to copy the first (or only) MAD segment. */
+	if ((packet->length <= seg_size &&
 	     count < hdr_size(file) + packet->length) ||
-	    (packet->length > sizeof (*recv_buf->mad) &&
-	     count < hdr_size(file) + sizeof (*recv_buf->mad)))
+	    (packet->length > seg_size &&
+	     count < hdr_size(file) + seg_size))
 		return -EINVAL;
 
 	if (copy_to_user(buf, &packet->mad, hdr_size(file)))
 		return -EFAULT;
 
 	buf += hdr_size(file);
-	seg_payload = min_t(int, packet->length, sizeof (*recv_buf->mad));
+	seg_payload = min_t(int, packet->length, seg_size);
 	if (copy_to_user(buf, recv_buf->mad, seg_payload))
 		return -EFAULT;
 
@@ -292,7 +295,7 @@ static ssize_t copy_recv_mad(struct ib_umad_file *file, char __user *buf,
 			return -ENOSPC;
 		}
 		offset = ib_get_mad_data_offset(recv_buf->mad->mad_hdr.mgmt_class);
-		max_seg_payload = sizeof (struct ib_mad) - offset;
+		max_seg_payload = seg_size - offset;
 
 		for (left = packet->length - seg_payload, buf += seg_payload;
 		     left; left -= seg_payload, buf += seg_payload) {
@@ -450,6 +453,7 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
 	struct ib_rmpp_mad *rmpp_mad;
 	__be64 *tid;
 	int ret, data_len, hdr_len, copy_offset, rmpp_active;
+	u8 base_version;
 
 	if (count < hdr_size(file) + IB_MGMT_RMPP_HDR)
 		return -EINVAL;
@@ -516,12 +520,13 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
 		rmpp_active = 0;
 	}
 
+	base_version = ((struct ib_mad_hdr *)&packet->mad.data)->base_version;
 	data_len = count - hdr_size(file) - hdr_len;
 	packet->msg = ib_create_send_mad(agent,
 					 be32_to_cpu(packet->mad.hdr.qpn),
 					 packet->mad.hdr.pkey_index, rmpp_active,
 					 hdr_len, data_len, GFP_KERNEL,
-					 IB_MGMT_BASE_VERSION);
+					 base_version);
 	if (IS_ERR(packet->msg)) {
 		ret = PTR_ERR(packet->msg);
 		goto err_ah;
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index eaa8c5dc472b..c8422d5a5a91 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -199,6 +199,12 @@ struct ib_rmpp_mad {
 	u8			data[IB_MGMT_RMPP_DATA];
 };
 
+struct opa_rmpp_mad {
+	struct ib_mad_hdr	mad_hdr;
+	struct ib_rmpp_hdr	rmpp_hdr;
+	u8			data[OPA_MGMT_RMPP_DATA];
+};
+
 struct ib_sa_mad {
 	struct ib_mad_hdr	mad_hdr;
 	struct ib_rmpp_hdr	rmpp_hdr;
@@ -429,6 +435,7 @@ struct ib_mad_recv_buf {
  * @recv_buf: Specifies the location of the received data buffer(s).
  * @rmpp_list: Specifies a list of RMPP reassembled received MAD buffers.
  * @mad_len: The length of the received MAD, without duplicated headers.
+ * @mad_seg_size: The size of individual MAD segments
  *
  * For received response, the wr_id contains a pointer to the ib_mad_send_buf
  *   for the corresponding send request.
@@ -438,6 +445,7 @@ struct ib_mad_recv_wc {
 	struct ib_mad_recv_buf	recv_buf;
 	struct list_head	rmpp_list;
 	int			mad_len;
+	size_t			mad_seg_size;
 };
 
 /**
-- 
1.8.2


* RE: [PATCH 10/14] IB/mad: Add support for additional MAD info to/from drivers
       [not found]     ` <1433615915-24591-11-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-06-08 18:50       ` Hefty, Sean
  0 siblings, 0 replies; 42+ messages in thread
From: Hefty, Sean @ 2015-06-08 18:50 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Weiny, Ira

> @@ -1693,8 +1693,11 @@ struct ib_device {
>  						  u8 port_num,
>  						  const struct ib_wc *in_wc,
>  						  const struct ib_grh *in_grh,
> -						  const struct ib_mad *in_mad,
> -						  struct ib_mad *out_mad);
> +						  const struct ib_mad_hdr *in_mad,
> +						  size_t in_mad_size,
> +						  struct ib_mad_hdr *out_mad,
> +						  size_t *out_mad_size,
> +						  u16 *out_mad_pkey_index);

I don't have an alternate suggestion at the moment, but this call is getting to be a little much.
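
For a sense of what the widened hook asks of a driver, a minimal IB-only
implementation might look like this (a sketch only; "foo" is a placeholder
driver name):

	static int foo_process_mad(struct ib_device *ibdev, int mad_flags,
				   u8 port_num, const struct ib_wc *in_wc,
				   const struct ib_grh *in_grh,
				   const struct ib_mad_hdr *in_mad,
				   size_t in_mad_size,
				   struct ib_mad_hdr *out_mad,
				   size_t *out_mad_size,
				   u16 *out_mad_pkey_index)
	{
		/* an IB-only device handles fixed 256-byte MADs */
		if (in_mad_size != sizeof(struct ib_mad) ||
		    *out_mad_size != sizeof(struct ib_mad))
			return IB_MAD_RESULT_FAILURE;

		/* ... existing class/method dispatch unchanged ... */
		return IB_MAD_RESULT_SUCCESS;
	}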

* RE: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]     ` <1433615915-24591-15-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-06-10  6:30       ` Liran Liss
       [not found]         ` <HE1PR05MB1418BB6C461790B76D9C02A3B1BD0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Liran Liss @ 2015-06-10  6:30 UTC (permalink / raw)
  To: ira.weiny-ral2JQCrhuEAvxtiuMwx3w, dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

> From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Hi Ira,

OPA cannot impersonate IB; OPA node and link types have to be designated as such.
In terms of MAD processing flows, both explicit (as in the handle_opa_smi() call below) and implicit code paths (which share IB flows - there are several cases) must make this distinction.

> +static enum smi_action
> +handle_opa_smi(struct ib_mad_port_private *port_priv,
> +	       struct ib_mad_qp_info *qp_info,
> +	       struct ib_wc *wc,
> +	       int port_num,
> +	       struct ib_mad_private *recv,
> +	       struct ib_mad_private *response)
> +{
...
> +	} else if (port_priv->device->node_type == RDMA_NODE_IB_SWITCH)  <----

--Liran

* Re: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]         ` <HE1PR05MB1418BB6C461790B76D9C02A3B1BD0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2015-06-10 17:54           ` ira.weiny
  2015-06-10 18:37           ` Doug Ledford
  1 sibling, 0 replies; 42+ messages in thread
From: ira.weiny @ 2015-06-10 17:54 UTC (permalink / raw)
  To: Liran Liss
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Jun 10, 2015 at 06:30:58AM +0000, Liran Liss wrote:
> > From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> 
> Hi Ira,
> 
> OPA cannot impersonate IB; OPA node and link types have to be designated as such.

This was discussed at length and we agreed that the kernel would have explicit
capabilities communicated between the drivers and the core layers rather than
using the link layer to determine what core support was needed.

For Node Type, OPA is its own "namespace" and as such we use the same values
for "CA" and "Switch".  The code you reference below is explicitly executed
only on OPA devices so I don't see why this is in conflict with IB.

> In terms of MAD processing flows, both explicit (as in the handle_opa_smi() call below) and implicit code paths (which share IB flows - there are several cases) must make this distinction.
> 

Agreed, and all OPA differences are limited to devices/ports which explicitly
indicate they are OPA ports.

For example:

        opa = rdma_cap_opa_mad(qp_info->port_priv->device,
                               qp_info->port_priv->port_num);

...

        if (opa && ((struct ib_mad_hdr *)(recv->mad))->base_version == OPA_MGMT_BASE_VERSION) {
                recv->header.recv_wc.mad_len = wc->byte_len - sizeof(struct ib_grh);
                recv->header.recv_wc.mad_seg_size = sizeof(struct opa_mad);
        } else {
                recv->header.recv_wc.mad_len = sizeof(struct ib_mad);
                recv->header.recv_wc.mad_seg_size = sizeof(struct ib_mad);
        }


If I missed a place where this is not the case, please let me know, but I made
this change many months back and I'm pretty sure I caught them all.

Thanks,
Ira


> > +static enum smi_action
> > +handle_opa_smi(struct ib_mad_port_private *port_priv,
> > +	       struct ib_mad_qp_info *qp_info,
> > +	       struct ib_wc *wc,
> > +	       int port_num,
> > +	       struct ib_mad_private *recv,
> > +	       struct ib_mad_private *response)
> > +{
> ...
> > +	} else if (port_priv->device->node_type == RDMA_NODE_IB_SWITCH)  <----
> 
> --Liran

* Re: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]         ` <HE1PR05MB1418BB6C461790B76D9C02A3B1BD0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  2015-06-10 17:54           ` ira.weiny
@ 2015-06-10 18:37           ` Doug Ledford
       [not found]             ` <1433961446.71666.26.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 42+ messages in thread
From: Doug Ledford @ 2015-06-10 18:37 UTC (permalink / raw)
  To: Liran Liss
  Cc: ira.weiny-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, 2015-06-10 at 06:30 +0000, Liran Liss wrote:
> > From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> 
> Hi Ira,
> 
> OPA cannot impersonate IB; OPA node and link types have to be designated as such.
> In terms of MAD processing flows, both explicit (as in the handle_opa_smi() call below) and implicit code paths (which share IB flows - there are several cases) must make this distinction.

As far as in the kernel is concerned, the individual capability bits are
much more important.  I would actually like to do away with the
node_type variable from struct ib_device eventually.  As for user space,
where we have to maintain ABI, node_type can be IB_CA (after all, the
OPA devices are just like RoCE devices in that they implement IB VERBS
as their user visible transport, and only addressing/management is
different from link layer IB devices), link layer needs to be OPA.

> > +static enum smi_action
> > +handle_opa_smi(struct ib_mad_port_private *port_priv,
> > +	       struct ib_mad_qp_info *qp_info,
> > +	       struct ib_wc *wc,
> > +	       int port_num,
> > +	       struct ib_mad_private *recv,
> > +	       struct ib_mad_private *response)
> > +{
> ...
> > +	} else if (port_priv->device->node_type == RDMA_NODE_IB_SWITCH)  <----
> 
> --Liran


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



* Re: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]             ` <1433961446.71666.26.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-06-10 18:56               ` Jason Gunthorpe
       [not found]                 ` <20150610185653.GA28153-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Jason Gunthorpe @ 2015-06-10 18:56 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Liran Liss, ira.weiny-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Jun 10, 2015 at 02:37:26PM -0400, Doug Ledford wrote:
> On Wed, 2015-06-10 at 06:30 +0000, Liran Liss wrote:
> > > From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > 
> > Hi Ira,
> > 
> > OPA cannot impersonate IB; OPA node and link types have to be
> > designated as such.  In terms of MAD processing flows, both
> > explicit (as in the handle_opa_smi() call below) and implicit code
> > paths (which share IB flows - there are several cases) must make
> > this distinction.
> 
> As far as in the kernel is concerned, the individual capability bits
> are much more important.  I would actually like to do away with the
> node_type variable from struct ib_device eventually.  As for user
> space,

All SMI code has different behavior if it is running on a switch or
HCA, so testing for 'switchyness' is very appropriate here.
cap_is_switch_smi would be a nice refinement to let us drop nodetype.
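
Concretely, such a helper might look something like this (a sketch; the flag
name and its value are hypothetical, following the existing immutable-data
capability pattern):

	/* hypothetical immutable core capability bit */
	#define RDMA_CORE_CAP_IB_SWITCH		0x00200000

	static inline bool cap_is_switch_smi(const struct ib_device *device,
					     u8 port_num)
	{
		return device->port_immutable[port_num].core_cap_flags &
		       RDMA_CORE_CAP_IB_SWITCH;
	}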

I don't have a problem with sharing the IBA constant names for MAD
structures (like RDMA_NODE_IB_SWITCH) between IB and OPA code. They
already share the structure layouts/etc.

Jason

* Re: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]                 ` <20150610185653.GA28153-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-10 19:59                   ` Doug Ledford
       [not found]                     ` <1433966378.71666.44.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Doug Ledford @ 2015-06-10 19:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liran Liss, ira.weiny-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, 2015-06-10 at 12:56 -0600, Jason Gunthorpe wrote:
> On Wed, Jun 10, 2015 at 02:37:26PM -0400, Doug Ledford wrote:
> > On Wed, 2015-06-10 at 06:30 +0000, Liran Liss wrote:
> > > > From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > > 
> > > Hi Ira,
> > > 
> > > OPA cannot impersonate IB; OPA node and link types have to be
> > > designated as such.  In terms of MAD processing flows, both
> > > explicit (as in the handle_opa_smi() call below) and implicit code
> > > paths (which share IB flows - there are several cases) must make
> > > this distinction.
> > 
> > As far as in the kernel is concerned, the individual capability bits
> > are much more important.  I would actually like to do away with the
> > node_type variable from struct ib_device eventually.  As for user
> > space,
> 
> All SMI code has different behavior if it is running on a switch or
> HCA, so testing for 'switchyness' is very appropriate here.

Sure...

> cap_is_switch_smi would be a nice refinement to let us drop nodetype.

Exactly, we need a bit added to the immutable data bits, and a new cap_
helper, and then nodetype is ready to be retired.  Add a bit, drop a
u8 ;-)

> I don't have a problem with sharing the IBA constant names for MAD
> structures (like RDMA_NODE_IB_SWITCH) between IB and OPA code. They
> already share the structure layouts/etc.
> 
> Jason


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



* RE: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]                     ` <1433966378.71666.44.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-06-11 18:27                       ` Liran Liss
       [not found]                         ` <HE1PR05MB141885494D6967919DAE135EB1BC0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  2015-06-11 21:00                       ` Hefty, Sean
  1 sibling, 1 reply; 42+ messages in thread
From: Liran Liss @ 2015-06-11 18:27 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: ira.weiny-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

> From: Doug Ledford [mailto:dledford@redhat.com]

> > > > OPA cannot impersonate IB; OPA node and link types have to be
> > > > designated as such.  In terms of MAD processing flows, both
> > > > explicit (as in the handle_opa_smi() call below) and implicit code
> > > > paths (which share IB flows - there are several cases) must make
> > > > this distinction.
> > >
> > > As far as in the kernel is concerned, the individual capability bits
> > > are much more important.  I would actually like to do away with the
> > > node_type variable from struct ib_device eventually.  As for user
> > > space,

We agreed on the concept of capability bits for the sake of simplifying code sharing.
That is OK.

But the node_type stands for more than just an abstract RDMA device:
In IB, it designates an instance of an industry-standard, well-defined device type: its possible link types, transport, semantics, management, everything.
It *should* be exposed to user-space so apps that know and care what they are running on could continue to work.

The place for abstraction is in the rdmacm/CMA, which serves applications that just
want some RDMA functionality regardless of the underlying technology.

> >
> > All SMI code has different behavior if it is running on a switch or
> > HCA, so testing for 'switchyness' is very appropriate here.
> 
> Sure...
> 
> > cap_is_switch_smi would be a nice refinement to let us drop nodetype.
> 
> Exactly, we need a bit added to the immutable data bits, and a new cap_
> helper, and then nodetype is ready to be retired.  Add a bit, drop a
> u8 ;-)
> 

This is indeed a viable solution.

> > I don't have a problem with sharing the IBA constant names for MAD
> > structures (like RDMA_NODE_IB_SWITCH) between IB and OPA code. They
> > already share the structure layouts/etc.
> >

The node type is reflected to user-space, which, as I mentioned above, is important.
Abusing this enumeration is misleading, even in the kernel.
Jason's proposal for a 'cap_is_switch_smi' is more readable, and directly in line with
the explicit capability approach that we discussed.



* RE: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]                     ` <1433966378.71666.44.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-06-11 18:27                       ` Liran Liss
@ 2015-06-11 21:00                       ` Hefty, Sean
       [not found]                         ` <1828884A29C6694DAF28B7E6B8A82373A8FEF1EA-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  1 sibling, 1 reply; 42+ messages in thread
From: Hefty, Sean @ 2015-06-11 21:00 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA


> > cap_is_switch_smi would be a nice refinement to let us drop nodetype.
> 
> Exactly, we need a bit added to the immutable data bits, and a new cap_
> helper, and then nodetype is ready to be retired.  Add a bit, drop a
> u8 ;-)

I agree that the node type enum isn't particularly useful and should be retired.  In fact, I don't see where RDMA_NODE_IB_SWITCH is used by any upstream device.  So I don't think there's any obligation to keep it.  But even if we do, I'm not sure this is the correct approach.  I don't know this for a fact, but it seems more likely that someone would embed Linux on an IB switch than they would plug an IB switch into a Linux-based system.  The code is designed around the latter.  Making this a system-wide setting might simplify the code and optimize the code paths.

- Sean

* Re: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]                         ` <1828884A29C6694DAF28B7E6B8A82373A8FEF1EA-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-06-11 23:24                           ` Hal Rosenstock
       [not found]                             ` <557A18C0.6010200-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Hal Rosenstock @ 2015-06-11 23:24 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Doug Ledford, Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 6/11/2015 5:00 PM, Hefty, Sean wrote:
>>> cap_is_switch_smi would be a nice refinement to let us drop nodetype.
>>
>> Exactly, we need a bit added to the immutable data bits, and a new cap_
>> helper, and then nodetype is ready to be retired.  Add a bit, drop a
>> u8 ;-)
> 
> I agree that the node type enum isn't particularly useful and should be retired.  

Are you referring to kernel space or user space or both ?

> In fact, I don't see where RDMA_NODE_IB_SWITCH is used by any upstream device.  

While not upstream, there are at least 2 vendors with one or more switch
device drivers using the upstream stack.

> So I don't think there's any obligation to keep it.  

In kernel space, we can get rid of it but it's exposed by verbs and
currently relied upon in user space in a number of places.

There's one kernel place that needs more than just cap_is_switch_smi().

> But even if we do, I'm not sure this is the correct approach.  I don't know this for a fact, 
> but it seems more likely that someone would embed Linux on an IB switch than they would plug an IB switch 
> into a Linux based system.  The code is designed around the latter.  Making this a system wide setting might simplify the code and optimize the code paths.

I think we need to discuss how user space would be addressed.

-- Hal

> - Sean

* RE: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]                             ` <557A18C0.6010200-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-06-11 23:52                               ` Hefty, Sean
       [not found]                                 ` <1828884A29C6694DAF28B7E6B8A82373A8FEF321-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Hefty, Sean @ 2015-06-11 23:52 UTC (permalink / raw)
  To: Hal Rosenstock
  Cc: Doug Ledford, Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA

> > I agree that the node type enum isn't particularly useful and should be
> retired.
> 
> Are you referring to kernel space or user space or both ?

Short term, kernel space.  User space needs to keep something around for backwards compatibility.

But the in-tree code will never expose this value.

> > But even if we do, I'm not sure this is the correct approach.  I don't
> know this for a fact,
> > but it seems more likely that someone would embed Linux on an IB switch
> than they would plug an IB switch
> > into a Linux based system.  The code is designed around the latter.
> Making this a system wide setting might simplify the code and optimize the
> code paths.
> 
> I think we need to discuss how user space would be addressed.

This is an issue with out-of-tree drivers.  We're having to guess what things might be doing.  Are all devices exposed as a 'switch', or is there ever a case where a 'switch' device and an HCA device are reported together, or (highly unlikely) a switch device and an RNIC?

If the real use case is to embed Linux on a switch, then we could look at making that a system-wide setting, rather than per device.  This could clean up the kernel without impacting the uABI.
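
If it helps to picture it, a system-wide knob could be as small as a module
parameter (purely hypothetical; the parameter name is invented here):

	/* hypothetical system-wide switch designation for ib_core */
	static bool ib_node_is_switch;
	module_param(ib_node_is_switch, bool, 0444);
	MODULE_PARM_DESC(ib_node_is_switch,
			 "Treat this system's RDMA devices as switch ports");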


* Re: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]                                 ` <1828884A29C6694DAF28B7E6B8A82373A8FEF321-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-06-12  0:22                                   ` Hal Rosenstock
  0 siblings, 0 replies; 42+ messages in thread
From: Hal Rosenstock @ 2015-06-12  0:22 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Doug Ledford, Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 6/11/2015 7:52 PM, Hefty, Sean wrote:
>>> I agree that the node type enum isn't particularly useful and should be
>> retired.
>>
>> Are you referring to kernel space or user space or both ?
> 
> Short term, kernel space.  User space needs to keep something around for backwards compatibility.
> 
> But the in tree code will never expose this value up.
> 
>>> But even if we do, I'm not sure this is the correct approach.  I don't
>> know this for a fact,
>>> but it seems more likely that someone would embed Linux on an IB switch
>> than they would plug an IB switch
>>> into a Linux based system.  The code is designed around the latter.
>> Making this a system wide setting might simplify the code and optimize the
>> code paths.
>>
>> I think we need to discuss how user space would be addressed.
> 
> This is an issue with out of tree drivers.  We're having to guess what things might be doing.  
> Are all devices being exposed up as a 'switch', or is there ever a case where there's a 'switch' device 
> and an HCA device being reported together, or (highly unlikely) a switch device and an RNIC?

Gateways are composed of switch + HCA devices. There are other, more
complex cases of multiple devices.

> If the real use case is to embed Linux on a switch, then we could look at making that a system wide setting, 
> rather than per device.  This could clean up the kernel without impacting the uABI.

I think that system wide is too limiting and it needs to be on a
per-device basis.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]                         ` <HE1PR05MB141885494D6967919DAE135EB1BC0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2015-06-12 14:23                           ` Doug Ledford
       [not found]                             ` <557AEB5D.1040003-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Doug Ledford @ 2015-06-12 14:23 UTC (permalink / raw)
  To: Liran Liss, Jason Gunthorpe
  Cc: ira.weiny-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 06/11/2015 02:27 PM, Liran Liss wrote:
>> From: Doug Ledford [mailto:dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org]
> 
>>>>> OPA cannot impersonate IB; OPA node and link types have to be
>>>>> designated as such.  In terms of MAD processing flows, both
>>>>> explicit (as in the handle_opa_smi() call below) and implicit code
>>>>> paths (which share IB flows - there are several cases) must make
>>>>> this distinction.
>>>>
>>>> As far as in the kernel is concerned, the individual capability bits
>>>> are much more important.  I would actually like to do away with the
>>>> node_type variable from struct ib_device eventually.  As for user
>>>> space,
> 
> We agreed on the concept of capability bits for the sake of simplifying code sharing.
> That is OK.
> 
> But the node_type stands for more than just an abstract RDMA device:
> In IB, it designates an instance of an industry-standard, well-defined, device type: it's possible link types, transport, semantics, management, everything.
> It *should* be exposed to user-space so apps that know and care what they are running on could continue to work.

I'm sorry, but your argument here is not very convincing at all.  And
it's somewhat hypocritical.  When RoCE was first introduced, the *exact*
same argument could be used to argue for why RoCE should require a new
node_type.  Except then, because RoCE was your own, you argued for, and
got, an expansion of the IB node_type definition that now included a
relevant link_layer attribute that apps never needed to care about
before.  However, now you are a victim of your own success.  You set the
standard then that if the new device can properly emulate an IB Verbs/IB
Link Layer device in terms of A) supported primitives (iWARP and usNIC
both fail here, and hence why they have their own node_types) and B)
queue pair creation process modulo link layer specific addressing
attributes, then that device qualifies to use the IB_CA node_type and
merely needs only a link_layer attribute to differentiate it.

The new OPA stuff appears to be following *exactly* the same development
model/path that RoCE did.  When RoCE was introduced, all the apps that
really cared about low level addressing on the link layer had to be
modified to encompass the new link type.  This is simply link_layer
number three for apps to care about.

> The place for abstraction is in the rdmacm/CMA, which serves applications that just
> want some RDMA functionality regardless of the underlying technology.
> 
>>>
>>> All SMI code has different behavior if it is running on a switch or
>>> HCA, so testing for 'switchyness' is very appropriate here.
>>
>> Sure...
>>
>>> cap_is_switch_smi would be a nice refinement to let us drop nodetype.
>>
>> Exactly, we need a bit added to the immutable data bits, and a new cap_
>> helper, and then nodetype is ready to be retired.  Add a bit, drop a
>> u8 ;-)
>>
> 
> This is indeed a viable solution.
> 
>>> I don't have a problem with sharing the IBA constant names for MAD
>>> structures (like RDMA_NODE_IB_SWITCH) between IB and OPA code. They
>>> already share the structure layouts/etc.
>>>
> 
> The node type is reflected to user-space, which, as I mentioned above, is important.
> Abusing this enumeration is misleading, even in the kernel.
> Jason's proposal for a 'cap_is_switch_smi' is more readable, and directly in line with
> the explicit capability approach that we discussed.
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 00/14] IB/mad: Add support for OPA MAD processing.
       [not found] ` <1433615915-24591-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (13 preceding siblings ...)
  2015-06-06 18:38   ` [PATCH 14/14] IB/mad: Add final OPA MAD processing ira.weiny-ral2JQCrhuEAvxtiuMwx3w
@ 2015-06-12 20:00   ` Doug Ledford
  14 siblings, 0 replies; 42+ messages in thread
From: Doug Ledford @ 2015-06-12 20:00 UTC (permalink / raw)
  To: ira.weiny-ral2JQCrhuEAvxtiuMwx3w; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 06/06/2015 02:38 PM, ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> 
> The following patch series modifies the kernel MAD processing (ib_mad/ib_umad)
> and related interfaces to send and receive Intel Omni-Path Architecture MADs on
> devices which support them.
> 
> OPA MADs share the same common header with IBTA MADs which allows us to share
> most of the MAD processing code.
> 
> In addition to supporting some IBTA management classes, OPA devices use MADs
> with lengths up to 2K.  These MADs increase the performance of management
> traffic on OPA fabrics.
> 
> Devices report their support of OPA MADs through the new immutable data
> capability flag and immutable max mad size.
> 
> Changes from V1:
> ================
> 
> Remove patch:
> 	IB/mad: Create an RMPP Base header
> 
> Add new patch:
> 	IB/mad cleanup: Clean up function params -- find_mad_agent
> 
> Address comments from Jason about the idea of a flex array for struct ib_mad:
> 	ib_mad does not really allocate struct ib_mads.  Rather it allocates
> 	ib_mad_private objects.  This is where the flex array was more
> 	appropriate.  So this series changes struct ib_mad_private to end in a
> 	flex array to store MAD data.  Casts are used where appropriate to
> 	IB/OPA mad structures or headers.
> 
> Minor updates:
> 	Clean up commit messages
> 	Fix/add const and bool usage
> 	Remove inline qualifiers (let complier handle inline)
> 	Add additional Immutable data checks
> 	Change WARN_ON to BUG_ON in drivers
> 	Add out_mad_pkey_index to process_mad in order to maintain the
> 	"constness" of the struct ib_wc parameter.
> 
> 
> Ira Weiny (14):
>   IB/mad cleanup: Clean up function params -- find_mad_agent
>   IB/mad cleanup: Generalize processing of MAD data
>   IB/mad: Split IB SMI handling from MAD Recv handler
>   IB/mad: Create a generic helper for DR SMP Send processing
>   IB/mad: Create a generic helper for DR SMP Recv processing
>   IB/mad: Create a generic helper for DR forwarding checks
>   IB/mad: Support alternate Base Versions when creating MADs
>   IB/core: Add ability for drivers to report an alternate MAD size.
>   IB/mad: Convert allocations from kmem_cache to kzalloc
>   IB/mad: Add support for additional MAD info to/from drivers
>   IB/core: Add OPA MAD core capability flag
>   IB/mad: Add partial Intel OPA MAD support
>   IB/mad: Add partial Intel OPA MAD support
>   IB/mad: Add final OPA MAD processing

There haven't been any further technical issues with this patchset.
There is the ongoing argument about usage of node_type, but I think I've
made my opinion on that matter clear.  Changes to the core code to make
CA versus SWITCH operation not rely on node_type can be done as
follow-ons to this patch set.  As such, I've picked this up for 4.2.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]                             ` <557AEB5D.1040003-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-06-14 19:16                               ` Liran Liss
       [not found]                                 ` <HE1PR05MB14182DCD7003B52A28BB62A5B1B90-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Liran Liss @ 2015-06-14 19:16 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: ira.weiny-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

> From: Doug Ledford [mailto:dledford@redhat.com]

> > But the node_type stands for more than just an abstract RDMA device:
> > In IB, it designates an instance of an industry-standard, well-defined,
> device type: it's possible link types, transport, semantics, management,
> everything.
> > It *should* be exposed to user-space so apps that know and care what
> they are running on could continue to work.
> 
> I'm sorry, but your argument here is not very convincing at all.  And
> it's somewhat hypocritical.  When RoCE was first introduced, the *exact*
> same argument could be used to argue for why RoCE should require a new
> node_type.  Except then, because RoCE was your own, you argued for, and
> got, an expansion of the IB node_type definition that now included a
> relevant link_layer attribute that apps never needed to care about
> before.  However, now you are a victim of your own success.  You set the
> standard then that if the new device can properly emulate an IB Verbs/IB
> Link Layer device in terms of A) supported primitives (iWARP and usNIC
> both fail here, and hence why they have their own node_types) and B)
> queue pair creation process modulo link layer specific addressing
> attributes, then that device qualifies to use the IB_CA node_type and
> merely needs only a link_layer attribute to differentiate it.
> 

No. RoCE is an open standard from the IBTA with the exact same RDMA protocol semantics as InfiniBand and a clear set of compliance rules without which an implementation can't claim to be such. A RoCE device *is* an IB CA with an Ethernet link.
In contrast, OPA is a proprietary protocol. We don't know what primitives are supported, and whether the semantics of supported primitives are the same as in InfiniBand.

> The new OPA stuff appears to be following *exactly* the same development
> model/path that RoCE did.  When RoCE was introduced, all the apps that
> really cared about low level addressing on the link layer had to be
> modified to encompass the new link type.  This is simply link_layer
> number three for apps to care about.
> 

You are missing my point. API transparency is not a synonym for full semantic equivalence.  The Node Type doesn't indicate the level of adherence to an API.  Node Type indicates compliance with a specification (e.g. wire protocol, remote order of execution, error semantics, architectural limitations, etc.).  The IBTA CA and Switch Node Types belong to devices that are compliant with the corresponding specifications from the InfiniBand Trade Association.  And that doesn't prevent applications from choosing to be coded to run over nodes of different Node Types, as happens today with IB/RoCE and iWARP.

This has nothing to do with addressing.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]                                 ` <HE1PR05MB14182DCD7003B52A28BB62A5B1B90-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2015-06-15  5:39                                   ` Doug Ledford
       [not found]                                     ` <557E6514.1060600-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Doug Ledford @ 2015-06-15  5:39 UTC (permalink / raw)
  To: Liran Liss, Jason Gunthorpe
  Cc: ira.weiny-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 06/14/2015 03:16 PM, Liran Liss wrote:
>> From: Doug Ledford [mailto:dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org]
> 
>>> But the node_type stands for more than just an abstract RDMA device:
>>> In IB, it designates an instance of an industry-standard, well-defined,
>> device type: it's possible link types, transport, semantics, management,
>> everything.
>>> It *should* be exposed to user-space so apps that know and care what
>> they are running on could continue to work.
>>
>> I'm sorry, but your argument here is not very convincing at all.  And
>> it's somewhat hypocritical.  When RoCE was first introduced, the *exact*
>> same argument could be used to argue for why RoCE should require a new
>> node_type.  Except then, because RoCE was your own, you argued for, and
>> got, an expansion of the IB node_type definition that now included a
>> relevant link_layer attribute that apps never needed to care about
>> before.  However, now you are a victim of your own success.  You set the
>> standard then that if the new device can properly emulate an IB Verbs/IB
>> Link Layer device in terms of A) supported primitives (iWARP and usNIC
>> both fail here, and hence why they have their own node_types) and B)
>> queue pair creation process modulo link layer specific addressing
>> attributes, then that device qualifies to use the IB_CA node_type and
>> merely needs only a link_layer attribute to differentiate it.
>>
> 
> No. RoCE is as an open standard from the IBTA with the exact same RDMA protocol semantics as InfiniBand and a clear set of compliancy rules without which an implementation can't claim to be such. A RoCE device *is* an IB CA with an Ethernet link.
> In contrast, OPA is a proprietary protocol. We don't know what primitives are supported, and whether the semantics of supported primitives are the same as in InfiniBand.

Intel has stated on this list that they intend for RDMA apps to run on
OPA transparently.  That pretty much implies the list of primitives and
everything else that they must support.  However, time will tell if they
succeeded or not.

>> The new OPA stuff appears to be following *exactly* the same development
>> model/path that RoCE did.  When RoCE was introduced, all the apps that
>> really cared about low level addressing on the link layer had to be
>> modified to encompass the new link type.  This is simply link_layer
>> number three for apps to care about.
>>
> 
> You are missing my point. API transparency is not a synonym for full semantic equivalence.  The Node Type doesn’t indicate level of adherence to an API. Node Type indicates compliancy to a  specification (e.g. wire protocol, remote order of execution, error semantics, architectural limitations, etc). The IBTA CA and Switch Node Types belong to devices that are compliant to the corresponding specifications from the InfiniBand Trade Association.  And that doesn’t prevent applications to choose to be coded to run over nodes of different Node Type as it happens today with IB/RoCE and iWARP.
> 
> This has nothing to do with addressing.

And whether you like it or not, Intel is intentionally creating a
device/fabric with the specific intention of mimicking the IB_CA device
type (with stated exceptions for MAD packets and addresses).  They
obviously won't have certification as an IB_CA, but that's not their
aim.  Their aim is to be a functional drop in replacement that apps
don't need to know about except for the stated exceptions.

And I'm not missing your point.  Your point is inappropriate.  You're
trying to conflate certification with a functional API.  The IB_CA node
type is not an official certification of anything, and the linux kernel
is not an official certifying body for anything.  If you want
certification, you go to the OFA and the UNH-IOL testing program.
There, you have the rights to the certification branding logo and you
have the right to deny access to that logo to anyone that doesn't meet
the branding requirements.

You're right that apps can be coded to other CA types, like RNICs and
USNICs.  However, those are all very different from an IB_CA due to
limited queue pair types or limited primitives.  If OPA had that same
limitation then I would agree it needs a different node type.

So this will be my litmus test.  Currently, an app that supports all of
the RDMA types looks like this:

if (node_type == RNIC)
	do iwarpy stuff
else if (node_type == USNIC)
	do USNIC stuff
else if (node_type == IB_CA)
	do IB verbs stuff
	if (link_layer == Ethernet)
		do RoCE addressing/management
	else
		do IB addressing/management



If, in the end, apps that are modified to support OPA end up looking
like this:

if (node_type == RNIC)
	do iwarpy stuff
else if (node_type == USNIC)
	do USNIC stuff
else if (node_type == IB_CA || node_type == OPA_CA)
	do IB verbs stuff
	if (node_type == OPA_CA)
		do OPA addressing/management
	else if (link_layer == Ethernet)
		do RoCE addressing/management
	else
		do IB addressing/management

where you can plainly see that the exact same goal can be accomplished
whether you have an OPA node_type or an IB_CA node_type + OPA
link_layer, then I will be fine with either a new node_type or a new
link_layer.  They will be functionally equivalent as far as I'm concerned.
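
(For concreteness, both attributes in the litmus test above are plain
libibverbs reads, so either outcome really is just a different branch
condition.  A minimal sketch, assuming port 1 and eliding error handling:

#include <stdio.h>
#include <infiniband/verbs.h>

static void classify(struct ibv_device *dev)
{
	struct ibv_context *ctx = ibv_open_device(dev);
	struct ibv_port_attr pattr;

	if (!ctx)
		return;
	if (ibv_query_port(ctx, 1, &pattr)) {
		ibv_close_device(ctx);
		return;
	}

	switch (dev->node_type) {
	case IBV_NODE_RNIC:
		printf("%s: iWARP\n", ibv_get_device_name(dev));
		break;
	case IBV_NODE_USNIC:
		printf("%s: usNIC\n", ibv_get_device_name(dev));
		break;
	case IBV_NODE_CA:
		if (pattr.link_layer == IBV_LINK_LAYER_ETHERNET)
			printf("%s: RoCE addressing/management\n",
			       ibv_get_device_name(dev));
		else
			printf("%s: IB addressing/management\n",
			       ibv_get_device_name(dev));
		break;
	default:
		break;
	}
	ibv_close_device(ctx);
}
)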

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD




^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]                                     ` <557E6514.1060600-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-06-16 21:05                                       ` Liran Liss
       [not found]                                         ` <HE1PR05MB1418C8F8E54FCC790B0CCAE3B1A70-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  2015-06-16 22:12                                       ` Hefty, Sean
  1 sibling, 1 reply; 42+ messages in thread
From: Liran Liss @ 2015-06-16 21:05 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: ira.weiny-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

> From: Doug Ledford [mailto:dledford@redhat.com]

> > No. RoCE is as an open standard from the IBTA with the exact same RDMA
> protocol semantics as InfiniBand and a clear set of compliancy rules without
> which an implementation can't claim to be such. A RoCE device *is* an IB CA
> with an Ethernet link.
> > In contrast, OPA is a proprietary protocol. We don't know what primitives
> are supported, and whether the semantics of supported primitives are the
> same as in InfiniBand.
> 
> Intel has stated on this list that they intend for RDMA apps to run on
> OPA transparently.  That pretty much implies the list of primitives and
> everything else that they must support.  However, time will tell if they
> succeeded or not.
> 

I am sorry, but that's not good enough.
When I see an IB device, I know exactly what to expect. I can't say anything regarding an OPA device.

It might be that today the semantics are "close enough".
But in the future, both feature sets and semantics may diverge considerably.
What are you going to do then?

In addition, today, the host admin knows that 2 IB CA nodes will always interoperate. If you share the node type with OPA, everything breaks down. There is no way of knowing which devices work with which.

> >> The new OPA stuff appears to be following *exactly* the same
> development
> >> model/path that RoCE did.  When RoCE was introduced, all the apps that
> >> really cared about low level addressing on the link layer had to be
> >> modified to encompass the new link type.  This is simply link_layer
> >> number three for apps to care about.
> >>
> >
> > You are missing my point. API transparency is not a synonym for full
> semantic equivalence.  The Node Type doesn’t indicate level of adherence to
> an API. Node Type indicates compliancy to a  specification (e.g. wire protocol,
> remote order of execution, error semantics, architectural limitations, etc).
> The IBTA CA and Switch Node Types belong to devices that are compliant to
> the corresponding specifications from the InfiniBand Trade Association.  And
> that doesn’t prevent applications to choose to be coded to run over nodes of
> different Node Type as it happens today with IB/RoCE and iWARP.
> >
> > This has nothing to do with addressing.
> 
> And whether you like it or not, Intel is intentionally creating a
> device/fabric with the specific intention of mimicking the IB_CA device
> type (with stated exceptions for MAD packets and addresses).  They
> obviously won't have certification as an IB_CA, but that's not their
> aim.  Their aim is to be a functional drop in replacement that apps
> don't need to know about except for the stated exceptions.
> 

Intentions are nice, but there is no way to define these "stated exceptions" apart from a specification.

> And I'm not missing your point.  Your point is inappropriate.  You're
> trying to conflate certification with a functional API.  The IB_CA node
> type is not an official certification of anything, and the linux kernel
> is not an official certifying body for anything.  If you want
> certification, you go to the OFA and the UNH-IOL testing program.
> There, you have the rights to the certification branding logo and you
> have the right to deny access to that logo to anyone that doesn't meet
> the branding requirements.

Who said anything about certification?
I am talking about present and future semantic compliance with what an IB CA stands for, and about interoperability guarantees.

ib_verbs define an *extensive* direct HW access API, which is constantly evolving.
You cannot describe the intricate object relations and semantics through an API.
In addition, you can't abstract anything or fix stuff in SW.
The only way to *truly* know what to expect when performing Verbs calls is to check the node type.

ib_verbs was never only an API. It started as the Linux implementation of the IBTA standard, with guaranteed semantics and wire protocol.
Later, the interface was reused to support additional RDMA devices. However, you could *always* check the node type if you wanted to, thereby retaining the standard guarantees. Win-win situation...

This is a very strong property; we should not give up on it.

> 
> You're right that apps can be coded to other CA types, like RNICs and
> USNICs.  However, those are all very different from an IB_CA due to
> limited queue pair types or limited primitives.  If OPA had that same
> limitation then I would agree it needs a different node type.
> 

How do you know that it doesn't?
Have you seen the OPA specification?

> So this will be my litmus test.  Currently, an app that supports all of
> the RDMA types looks like this:
> 
> if (node_type == RNIC)
> 	do iwarpy stuff
> else if (node_type == USNIC)
> 	do USNIC stuff
> else if (node_type == IB_CA)
> 	do IB verbs stuff
> 	if (link_layer == Ethernet)
> 		do RoCE addressing/management
> 	else
> 		do IB addressing/management
> 
> 
> 
> If, in the end, apps that are modified to support OPA end up looking
> like this:
> 
> if (node_type == RNIC)
> 	do iwarpy stuff
> else if (node_type == USNIC)
> 	do USNIC stuff
> else if (node_type == IB_CA || node_type == OPA_CA)
> 	do IB verbs stuff
> 	if (node_type == OPA_CA)
> 		do OPA addressing/management
> 	else if (link_layer == Ethernet)
> 		do RoCE addressing/management
> 	else
> 		do IB addressing/management
> 
> where you can plainly see that the exact same goal can be accomplished
> whether you have an OPA node_type or an IB_CA node_type + OPA
> link_layer, then I will be fine with either a new node_type or a new
> link_layer.  They will be functionally equivalent as far as I'm concerned.
> 

It is true that for some applications, your abstraction might work transparently.
But for other applications, your "do IB verbs stuff" (and not just the addressing/management) will either break today or break tomorrow.

This is bad both for IB and for OPA.

Why on earth are we putting ourselves into a position which could easily be avoided in the first place?

The solution is simple:
- As an API, Verbs will support IB/ROCE, iWARP, USNIC, and OPA
- The node type and link type refer to specific technologies
-- Most applications indeed don't care and don't check either of these properties anyway
-- Those that do, do it for a good reason; don't break them
- Management helpers will do a good job to keep the code maintainable and efficient even if OPA and IB have different node types

Win-win situation...
--Liran




^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]                                     ` <557E6514.1060600-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-06-16 21:05                                       ` Liran Liss
@ 2015-06-16 22:12                                       ` Hefty, Sean
  1 sibling, 0 replies; 42+ messages in thread
From: Hefty, Sean @ 2015-06-16 22:12 UTC (permalink / raw)
  To: Doug Ledford, Liran Liss, Jason Gunthorpe
  Cc: Weiny, Ira, linux-rdma-u79uwXL29TY76Z2rM5mHXA

> You're right that apps can be coded to other CA types, like RNICs and
> USNICs.  However, those are all very different from an IB_CA due to
> limited queue pair types or limited primitives.  If OPA had that same
> limitation then I would agree it needs a different node type.
> 
> So this will be my litmus test.  Currently, an app that supports all of
> the RDMA types looks like this:
> 
> if (node_type == RNIC)
> 	do iwarpy stuff
> else if (node_type == USNIC)
> 	do USNIC stuff
> else if (node_type == IB_CA)
> 	do IB verbs stuff
> 	if (link_layer == Ethernet)
> 		do RoCE addressing/management
> 	else
> 		do IB addressing/management

The node type values were originally defined to align with the IB management NodeInfo structure.  AFAICT, there was no intent to associate those values with specific functionality or addressing or verbs support or anything else, really, outside of what IB management needed.

iWARP added a new node type so that the IB management code could ignore those devices.  RoCE basically broke this association by forcing additional checks in the local management code to also check the link layer.  The recent MAD capability bits are a superior solution, making the node type obsolete.

At this point, the node type essentially indicates whether we start counting ports at 0 or at 1.  The NodeType that an OPA channel adapter will report in a NodeInfo structure will be 1, the same value as if it were an IB channel adapter.
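
(Concretely, that is the whole pattern.  A paraphrase of how the core
iterates ports today; helpers along these lines were being discussed
for the tree:

/* Switches have a management port 0; everything else counts from 1. */
static inline int rdma_start_port(const struct ib_device *device)
{
	return device->node_type == RDMA_NODE_IB_SWITCH ? 0 : 1;
}

static inline int rdma_end_port(const struct ib_device *device)
{
	return device->node_type == RDMA_NODE_IB_SWITCH ?
		0 : device->phys_port_cnt;
}
)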

In the end, this argument doesn't matter one iota.  The kernel code barely relies on the node type, and a user space verbs provider can report whatever value it wants.

- Sean

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]                                         ` <HE1PR05MB1418C8F8E54FCC790B0CCAE3B1A70-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2015-06-17 14:03                                           ` Weiny, Ira
       [not found]                                             ` <2807E5FD2F6FDA4886F6618EAC48510E1109EA02-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2015-06-18 21:00                                           ` Doug Ledford
  1 sibling, 1 reply; 42+ messages in thread
From: Weiny, Ira @ 2015-06-17 14:03 UTC (permalink / raw)
  To: Liran Liss, Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

> 
> ib_verbs define an *extensive* direct HW access API, which is constantly
> evolving.

This is the problem with verbs...

> You cannot describe the intricate object relations and semantics through an
> API.
> In addition, you can't abstract anything or fix stuff in SW.
> The only way to *truly* know what to expect when performing Verbs calls is to
> check the node type.

How can you say this?

mthca, mlx4, mlx5, and qib all have different sets of functionality... all with the same node type.  OPA has the same set as qib...  same node type.

> 
> ib_verbs was never only an API. It started as the Linux implementation of the
> IBTA standard, with guaranteed semantics and wire protocol.
> Later, the interface was reused to support additional RDMA devices. However,
> you could *always* check the node type if you wanted to, thereby retaining the
> standard guarantees. Win-win situation...

Not true at all.  For example, qib does not support XRC and yet has the same node type as mlx4 (5)...
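
(Which is why an app that needs XRC already probes the explicit capability
instead of the node type.  A sketch against stock libibverbs, assuming its
IBV_DEVICE_XRC capability bit:

#include <stdbool.h>
#include <infiniband/verbs.h>

/* XRC support is a device capability flag, not a node type property. */
static bool device_supports_xrc(struct ibv_context *ctx)
{
	struct ibv_device_attr attr;

	if (ibv_query_device(ctx, &attr))
		return false;
	return attr.device_cap_flags & IBV_DEVICE_XRC;
}
)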

> 
> This is a very strong property; we should not give up on it.

On the contrary, the property is weak: it implies functionality, or the lack of it, rather than stating it explicitly.  This was done because getting changes to kernel ABIs was hard, and we took a shortcut with node type which we should not have.  OPA attempts to stop this madness and supports the functionality of verbs _As_ _Defined_, rather than creating yet another set of things which applications need to check against.

> 
> >
> > You're right that apps can be coded to other CA types, like RNICs and
> > USNICs.  However, those are all very different from an IB_CA due to
> > limited queue pair types or limited primitives.  If OPA had that same
> > limitation then I would agree it needs a different node type.
> >
> 
> How do you know that it doesn't?

Up until now you have had to take my word for it.  Now that the driver has been posted it should be clear what verbs we support (same as qib).

Ira


^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]                                             ` <2807E5FD2F6FDA4886F6618EAC48510E1109EA02-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-06-18 20:12                                               ` Liran Liss
  0 siblings, 0 replies; 42+ messages in thread
From: Liran Liss @ 2015-06-18 20:12 UTC (permalink / raw)
  To: Weiny, Ira, Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

> From: Weiny, Ira [mailto:ira.weiny@intel.com]

> > ib_verbs define an *extensive* direct HW access API, which is constantly
> > evolving.
> 
> This is the problem with verbs...

Huh?
It is its strength, if you don't break backward compatibility...

> 
> > You cannot describe the intricate object relations and semantics through an
> > API.
> > In addition, you can't abstract anything or fix stuff in SW.
> > The only way to *truly* know what to expect when performing Verbs calls
> is to
> > check the node type.
> 
> How can you say this?
> 
> mthca, mlx4, mlx5, and qib all have different sets of functionality... all with
> the same node type.  OPA has the same set as qib...  same node type.
> 

Only that qib is IB, which is fully interoperable with mlx*

> >
> > ib_verbs was never only an API. It started as the Linux implementation of
> the
> > IBTA standard, with guaranteed semantics and wire protocol.
> > Later, the interface was reused to support additional RDMA devices.
> However,
> > you could *always* check the node type if you wanted to, thereby retaining
> the
> > standard guarantees. Win-win situation...
> 
> Not true at all.  For example, Qib does not support XRC and yet has the same
> node type as mlx4 (5)...

The node type is for guaranteeing semantics and interop for the features that you do implement...

> 
> >
> > This is a very strong property; we should not give up on it.
> 
> On the contrary the property is weak and implies functionality or lack of
> functionality rather than being explicit.  This was done because getting
> changes to kernel ABIs was hard and we took a shortcut with node type
> which we should not have.  OPA attempts to stop this madness and supports
> the functionality of verbs _As_ _Defined_ rather than creating yet another
> set of things which applications need to check against.
> 

I totally agree that we should improve the expressiveness and accuracy of our capabilities;
you don't need OPA for this. Unfortunately, it is not always the case. 

Also, there are behaviors that are not defined by the API but still rely on the node type.
Management applications, for example.

--Liran

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]                                         ` <HE1PR05MB1418C8F8E54FCC790B0CCAE3B1A70-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  2015-06-17 14:03                                           ` Weiny, Ira
@ 2015-06-18 21:00                                           ` Doug Ledford
       [not found]                                             ` <953CDD5A-2738-4427-B763-EBFB4BBB2E03-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 42+ messages in thread
From: Doug Ledford @ 2015-06-18 21:00 UTC (permalink / raw)
  To: Liran Liss; +Cc: Jason Gunthorpe, Ira Weiny, linux-rdma-u79uwXL29TY76Z2rM5mHXA


> On Jun 16, 2015, at 5:05 PM, Liran Liss <liranl-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> 
>> From: Doug Ledford [mailto:dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org]
> 
>>> No. RoCE is as an open standard from the IBTA with the exact same RDMA
>> protocol semantics as InfiniBand and a clear set of compliancy rules without
>> which an implementation can't claim to be such. A RoCE device *is* an IB CA
>> with an Ethernet link.
>>> In contrast, OPA is a proprietary protocol. We don't know what primitives
>> are supported, and whether the semantics of supported primitives are the
>> same as in InfiniBand.
>> 
>> Intel has stated on this list that they intend for RDMA apps to run on
>> OPA transparently.  That pretty much implies the list of primitives and
>> everything else that they must support.  However, time will tell if they
>> succeeded or not.
>> 
> 
> I am sorry, but that's not good enough.
> When I see an IB device, I know exactly what to expect. I can't say anything regarding an OPA device.
> 
> It might be that today the semantics are "close enough".
> But in the future, both feature sets and semantics may diverge considerably.
> What are you going to do then?
> 
> In addition, today, the host admin knows that 2 IB CA nodes will always interoperate. If you share the node type with OPA, everything breaks down. There is no way of knowing which devices work with which.

You've not done yourself any favors with this argument.  You've actually stretched yourself into the land of hyperbole and FUD in order to make it.  Do you not see that "2 IB CA nodes will always interoperate" is not true as soon as you consider differing link layer types?  For example, an mlx4_en device will not interoperate with a qib device, yet they are both IB_CA node types.  Conflating an OPA device having node type IB_CA and link layer OPA with everything breaking down is pure and utter rubbish.  And with that, we are done with this discussion.  I've detailed what my litmus test will be, and I'm sticking with exactly that.

In the case of iWARP and usNIC, there are significant differences from an IB_CA that can force a program to significantly alter its intended transfer mechanism (for instance usNIC is UD only, and iWARP can't do atomics or immediate data, so any transfer engine design that uses either of those is out of the question).  On the other hand, everything that uses IB_CA supports the various primitives and varies only in its addressing/management.  If OPA stays true to that (and it certainly does so far by supporting the same verbs as qib), then IB_CA/link_layer OPA is perfectly acceptable, and in fact preferred, because it will produce the minimum amount of change in user space applications before they can support OPA devices.

>> So this will be my litmus test.  Currently, an app that supports all of
>> the RDMA types looks like this:
>> 
>> if (node_type == RNIC)
>> 	do iwarpy stuff
>> else if (node_type == USNIC)
>> 	do USNIC stuff
>> else if (node_type == IB_CA)
>> 	do IB verbs stuff
>> 	if (link_layer == Ethernet)
>> 		do RoCE addressing/management
>> 	else
>> 		do IB addressing/management
>> 
>> 
>> 
>> If, in the end, apps that are modified to support OPA end up looking
>> like this:
>> 
>> if (node_type == RNIC)
>> 	do iwarpy stuff
>> else if (node_type == USNIC)
>> 	do USNIC stuff
>> else if (node_type == IB_CA || node_type == OPA_CA)
>> 	do IB verbs stuff
>> 	if (node_type == OPA_CA)
>> 		do OPA addressing/management
>> 	else if (link_layer == Ethernet)
>> 		do RoCE addressing/management
>> 	else
>> 		do IB addressing/management
>> 
>> where you can plainly see that the exact same goal can be accomplished
>> whether you have an OPA node_type or an IB_CA node_type + OPA
>> link_layer, then I will be fine with either a new node_type or a new
>> link_layer.  They will be functionally equivalent as far as I'm concerned.
>> 
> 
> It is true that for some applications, your abstraction might work transparently.
> But for other applications, your "do IB verbs stuff" (and not just the addressing/management) will either break today or break tomorrow.

FUD.  Come to me when you have a concrete issue and not hand-wavy scare mongering.

> This is bad both for IB and for OPA.

No, it’s not.

> Why on earth are we putting ourselves into a position which could easily be avoided in the first place?
> 
> The solution is simple:
> - As an API, Verbs will support IB/ROCE, iWARP, USNIC, and OPA

There is *zero* functional difference between node_type == OPA and node_type == IB_CA plus link_layer == OPA. An application has *exactly* what it needs to do everything you have mentioned.  It changes the test the application makes, but not what the application does.

> - The node type and link type refer to specific technologies

Yes, and Intel has made it clear that they are copying IB Verbs as a technology.  It is common for the one being copied to be pissed off by the copying, but it is also common that they can't do a damn thing about it.

> -- Most applications indeed don't care and don't check either of these properties anyway
> -- Those that do, do it for a good reason; don't break them
> - Management helpers will do a good job to keep the code maintainable and efficient even if OPA and IB have different node types
> 
> Win-win situation...
> --Liran
> 
> 
> 

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
	GPG Key ID: 0E572FDD







^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]                                             ` <953CDD5A-2738-4427-B763-EBFB4BBB2E03-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-06-19 11:53                                               ` Hal Rosenstock
  0 siblings, 0 replies; 42+ messages in thread
From: Hal Rosenstock @ 2015-06-19 11:53 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Liran Liss, Jason Gunthorpe, Ira Weiny,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 6/18/2015 5:00 PM, Doug Ledford wrote:
> There is *zero* functional difference between node_type == OPA or node_type == IB_CA and link_layer == OPA. 
> An application has *exactly* what they need

We have neither of these things in the kernel today.  Also, if I interpret what was written first by Ira and more recently by Sean, even if either of these things were done, the user space provider library for OPA might just change these back to the IB types.

-- Hal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 14/14] IB/mad: Add final OPA MAD processing
@ 2015-05-28 16:21 Liran Liss
  0 siblings, 0 replies; 42+ messages in thread
From: Liran Liss @ 2015-05-28 16:21 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

> >
> > Why do you have RDMA_NODE_IB_SWITCH related stuff inside the
> handle_opa_smi() function?
> > Is there a node type of "switch" in OPA similar to IB?
> >
> 
> Yes.  OPA uses the same node types as IB.
> 
> Ira
> 

No, OPA cannot impersonate IB.
It has to have distinct node and link types.

--Liran


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]         ` <20150520185901.GK28496-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-21 16:23           ` ira.weiny
  0 siblings, 0 replies; 42+ messages in thread
From: ira.weiny @ 2015-05-21 16:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Wed, May 20, 2015 at 12:59:01PM -0600, Jason Gunthorpe wrote:
> On Wed, May 20, 2015 at 04:13:35AM -0400, ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> > @@ -433,14 +436,23 @@ static inline int get_mad_len(struct mad_rmpp_recv *rmpp_recv)
> >  {
> >  	struct ib_rmpp_base *rmpp_base;
> >  	int hdr_size, data_size, pad;
> > +	int opa = rdma_cap_opa_mad(rmpp_recv->agent->qp_info->port_priv->device,
> > +				   rmpp_recv->agent->qp_info->port_priv->port_num);
> 
> bool opa

Thanks Fixed

Ira


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]         ` <CY1PR03MB1440B98A7FE0A82E1BE53D75DEC20-DUcFgbLRNhB/HYnSB+xpdWP7xZHs9kq/vxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2015-05-20 21:26           ` ira.weiny
  0 siblings, 0 replies; 42+ messages in thread
From: ira.weiny @ 2015-05-20 21:26 UTC (permalink / raw)
  To: Suri Shelvapille
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

> 
> Why do you have RDMA_NODE_IB_SWITCH related stuff inside the handle_opa_smi() function?
> Is there a node type of "switch" in OPA similar to IB?
> 

Yes.  OPA uses the same node types as IB.

Ira


^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]     ` <1432109615-19564-15-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2015-05-20 18:59       ` Jason Gunthorpe
@ 2015-05-20 21:11       ` Suri Shelvapille
       [not found]         ` <CY1PR03MB1440B98A7FE0A82E1BE53D75DEC20-DUcFgbLRNhB/HYnSB+xpdWP7xZHs9kq/vxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  1 sibling, 1 reply; 42+ messages in thread
From: Suri Shelvapille @ 2015-05-20 21:11 UTC (permalink / raw)
  To: ira.weiny-ral2JQCrhuEAvxtiuMwx3w, dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

Can you please clarify:

+static enum smi_action
+handle_opa_smi(struct ib_mad_port_private *port_priv,
+	       struct ib_mad_qp_info *qp_info,
+	       struct ib_wc *wc,
+	       int port_num,
+	       struct ib_mad_private *recv,
+	       struct ib_mad_private *response)
+{
+	enum smi_forward_action retsmi;
+
+	if (opa_smi_handle_dr_smp_recv(&recv->mad.opa_smp,
+				       port_priv->device->node_type,
+				       port_num,
+				       port_priv->device->phys_port_cnt) ==
+	    IB_SMI_DISCARD)
+		return IB_SMI_DISCARD;
+
+	retsmi = opa_smi_check_forward_dr_smp(&recv->mad.opa_smp);
+	if (retsmi == IB_SMI_LOCAL)
+		return IB_SMI_HANDLE;
+
+	if (retsmi == IB_SMI_SEND) { /* don't forward */
+		if (opa_smi_handle_dr_smp_send(&recv->mad.opa_smp,
+					       port_priv->device->node_type,
+					       port_num) == IB_SMI_DISCARD)
+			return IB_SMI_DISCARD;
+
+		if (opa_smi_check_local_smp(&recv->mad.opa_smp,
+					    port_priv->device) == IB_SMI_DISCARD)
+			return IB_SMI_DISCARD;
+
+	} else if (port_priv->device->node_type == RDMA_NODE_IB_SWITCH) {
+		/* forward case for switches */
+		memcpy(response, recv, sizeof(*response));
+		response->header.recv_wc.wc = &response->header.wc;
+		response->header.recv_wc.recv_buf.opa_mad = &response->mad.opa_mad;
+		response->header.recv_wc.recv_buf.grh = &response->grh;
+
+		agent_send_response((struct ib_mad *)&response->mad.mad,
+				    &response->grh, wc,
+				    port_priv->device,
+				    opa_smi_get_fwd_port(&recv->mad.opa_smp),
+				    qp_info->qp->qp_num,
+				    recv->header.wc.byte_len,
+				    1);
+
+		return IB_SMI_DISCARD;
+	}
+
+	return IB_SMI_HANDLE;
+}
+

Why do you have RDMA_NODE_IB_SWITCH related stuff inside the handle_opa_smi() function?
Is there a node type of "switch" in OPA similar to IB?


Thanks,
Suri


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found]     ` <1432109615-19564-15-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-05-20 18:59       ` Jason Gunthorpe
       [not found]         ` <20150520185901.GK28496-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-05-20 21:11       ` Suri Shelvapille
  1 sibling, 1 reply; 42+ messages in thread
From: Jason Gunthorpe @ 2015-05-20 18:59 UTC (permalink / raw)
  To: ira.weiny-ral2JQCrhuEAvxtiuMwx3w
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Wed, May 20, 2015 at 04:13:35AM -0400, ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> @@ -433,14 +436,23 @@ static inline int get_mad_len(struct mad_rmpp_recv *rmpp_recv)
>  {
>  	struct ib_rmpp_base *rmpp_base;
>  	int hdr_size, data_size, pad;
> +	int opa = rdma_cap_opa_mad(rmpp_recv->agent->qp_info->port_priv->device,
> +				   rmpp_recv->agent->qp_info->port_priv->port_num);

bool opa

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 14/14] IB/mad: Add final OPA MAD processing
       [not found] ` <1432109615-19564-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-05-20  8:13   ` ira.weiny-ral2JQCrhuEAvxtiuMwx3w
       [not found]     ` <1432109615-19564-15-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-05-20  8:13 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

For devices which support OPA MADs

Use previously defined SMP support functions.

Pass correct base version to ib_create_send_mad when processing OPA MADs.

Process wc.pkey_index returned by agents for response because OPA SMP packets
must carry a valid pkey.

Carry the correct segment size (OPA vs IBTA) of RMPP messages within
ib_mad_recv_wc.

Handle variable length OPA MADs by:

        * Adjusting the 'fake' WC for locally routed SMP's to represent the
          proper incoming byte_len
        * out_mad_size is used from the local HCA agents
                1) when sending agent responses on the wire
                2) when passing responses through the local_completions
		   function

NOTE: wc.byte_len includes the GRH length and therefore is different from the
      in_mad_size specified to the local HCA agents.  out_mad_size should _not_
      include the GRH length as it is added by the verbs layer and is not part
      of MAD processing.

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/agent.c    |  23 +++-
 drivers/infiniband/core/agent.h    |   3 +-
 drivers/infiniband/core/mad.c      | 222 +++++++++++++++++++++++++++++++------
 drivers/infiniband/core/mad_priv.h |   1 +
 drivers/infiniband/core/mad_rmpp.c |  20 +++-
 drivers/infiniband/core/user_mad.c |  19 ++--
 include/rdma/ib_mad.h              |   2 +
 7 files changed, 242 insertions(+), 48 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index 5c7627c3278c..fb305635c95d 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -80,13 +80,16 @@ ib_get_agent_port(struct ib_device *device, int port_num)
 
 void agent_send_response(struct ib_mad *mad, struct ib_grh *grh,
 			 struct ib_wc *wc, struct ib_device *device,
-			 int port_num, int qpn)
+			 int port_num, int qpn, u32 resp_mad_len,
+			 bool opa)
 {
 	struct ib_agent_port_private *port_priv;
 	struct ib_mad_agent *agent;
 	struct ib_mad_send_buf *send_buf;
 	struct ib_ah *ah;
 	struct ib_mad_send_wr_private *mad_send_wr;
+	size_t data_len;
+	u8 base_version;
 
 	if (device->node_type == RDMA_NODE_IB_SWITCH)
 		port_priv = ib_get_agent_port(device, 0);
@@ -106,16 +109,26 @@ void agent_send_response(struct ib_mad *mad, struct ib_grh *grh,
 		return;
 	}
 
+	/* On OPA devices base version determines MAD size */
+	base_version = mad->mad_hdr.base_version;
+	if (opa && base_version == OPA_MGMT_BASE_VERSION)
+		data_len = resp_mad_len - IB_MGMT_MAD_HDR;
+	else
+		data_len = IB_MGMT_MAD_DATA;
+
 	send_buf = ib_create_send_mad(agent, wc->src_qp, wc->pkey_index, 0,
-				      IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
-				      GFP_KERNEL,
-				      IB_MGMT_BASE_VERSION);
+				      IB_MGMT_MAD_HDR, data_len, GFP_KERNEL,
+				      base_version);
 	if (IS_ERR(send_buf)) {
 		dev_err(&device->dev, "ib_create_send_mad error\n");
 		goto err1;
 	}
 
-	memcpy(send_buf->mad, mad, sizeof *mad);
+	if (opa && base_version == OPA_MGMT_BASE_VERSION)
+		memcpy(send_buf->mad, mad, resp_mad_len);
+	else
+		memcpy(send_buf->mad, mad, sizeof(*mad));
+
 	send_buf->ah = ah;
 
 	if (device->node_type == RDMA_NODE_IB_SWITCH) {
diff --git a/drivers/infiniband/core/agent.h b/drivers/infiniband/core/agent.h
index 6669287009c2..7ede18b34ca8 100644
--- a/drivers/infiniband/core/agent.h
+++ b/drivers/infiniband/core/agent.h
@@ -46,6 +46,7 @@ extern int ib_agent_port_close(struct ib_device *device, int port_num);
 
 extern void agent_send_response(struct ib_mad *mad, struct ib_grh *grh,
 				struct ib_wc *wc, struct ib_device *device,
-				int port_num, int qpn);
+				int port_num, int qpn, u32 resp_mad_len,
+				bool opa);
 
 #endif	/* __AGENT_H_ */
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 50a63247f6f9..e3328a353f92 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2005 Intel Corporation.  All rights reserved.
  * Copyright (c) 2005 Mellanox Technologies Ltd.  All rights reserved.
  * Copyright (c) 2009 HNR Consulting. All rights reserved.
+ * Copyright (c) 2014 Intel Corporation.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -44,6 +45,7 @@
 #include "mad_priv.h"
 #include "mad_rmpp.h"
 #include "smi.h"
+#include "opa_smi.h"
 #include "agent.h"
 
 MODULE_LICENSE("Dual BSD/GPL");
@@ -736,6 +738,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 {
 	int ret = 0;
 	struct ib_smp *smp = mad_send_wr->send_buf.mad;
+	struct opa_smp *opa_smp = (struct opa_smp *)smp;
 	unsigned long flags;
 	struct ib_mad_local_private *local;
 	struct ib_mad_private *mad_priv;
@@ -747,6 +750,9 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	struct ib_send_wr *send_wr = &mad_send_wr->send_wr;
 	size_t in_mad_size = max_mad_size(mad_agent_priv->qp_info->port_priv);
 	size_t out_mad_size;
+	u16 drslid;
+	bool opa = rdma_cap_opa_mad(mad_agent_priv->qp_info->port_priv->device,
+				    mad_agent_priv->qp_info->port_priv->port_num);
 
 	if (device->node_type == RDMA_NODE_IB_SWITCH &&
 	    smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
@@ -760,19 +766,47 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	 * If we are at the start of the LID routed part, don't update the
 	 * hop_ptr or hop_cnt.  See section 14.2.2, Vol 1 IB spec.
 	 */
-	if ((ib_get_smp_direction(smp) ? smp->dr_dlid : smp->dr_slid) ==
-	     IB_LID_PERMISSIVE &&
-	     smi_handle_dr_smp_send(smp, device->node_type, port_num) ==
-	     IB_SMI_DISCARD) {
-		ret = -EINVAL;
-		dev_err(&device->dev, "Invalid directed route\n");
-		goto out;
-	}
+	if (opa && smp->class_version == OPA_SMP_CLASS_VERSION) {
+		u32 opa_drslid;
+		if ((opa_get_smp_direction(opa_smp)
+		     ? opa_smp->route.dr.dr_dlid : opa_smp->route.dr.dr_slid) ==
+		     OPA_LID_PERMISSIVE &&
+		     opa_smi_handle_dr_smp_send(opa_smp, device->node_type,
+						port_num) == IB_SMI_DISCARD) {
+			ret = -EINVAL;
+			dev_err(&device->dev, "OPA Invalid directed route\n");
+			goto out;
+		}
+		opa_drslid = be32_to_cpu(opa_smp->route.dr.dr_slid);
+		if (opa_drslid != OPA_LID_PERMISSIVE &&
+		    (opa_drslid & 0xffff0000)) {
+			ret = -EINVAL;
+			dev_err(&device->dev, "OPA Invalid dr_slid 0x%x\n",
+				opa_drslid);
+			goto out;
+		}
+		drslid = (u16)(opa_drslid & 0x0000ffff);
 
-	/* Check to post send on QP or process locally */
-	if (smi_check_local_smp(smp, device) == IB_SMI_DISCARD &&
-	    smi_check_local_returning_smp(smp, device) == IB_SMI_DISCARD)
-		goto out;
+		/* Check to post send on QP or process locally */
+		if (opa_smi_check_local_smp(opa_smp, device) == IB_SMI_DISCARD &&
+		    opa_smi_check_local_returning_smp(opa_smp, device) == IB_SMI_DISCARD)
+			goto out;
+	} else {
+		if ((ib_get_smp_direction(smp) ? smp->dr_dlid : smp->dr_slid) ==
+		     IB_LID_PERMISSIVE &&
+		     smi_handle_dr_smp_send(smp, device->node_type, port_num) ==
+		     IB_SMI_DISCARD) {
+			ret = -EINVAL;
+			dev_err(&device->dev, "Invalid directed route\n");
+			goto out;
+		}
+		drslid = be16_to_cpu(smp->dr_slid);
+
+		/* Check to post send on QP or process locally */
+		if (smi_check_local_smp(smp, device) == IB_SMI_DISCARD &&
+		    smi_check_local_returning_smp(smp, device) == IB_SMI_DISCARD)
+			goto out;
+	}
 
 	local = kmalloc(sizeof *local, GFP_ATOMIC);
 	if (!local) {
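
OPA directed-route SMPs carry 32-bit LIDs, while build_smp_wc() still takes a
16-bit slid, so the send path above refuses any non-permissive dr_slid with
the upper 16 bits set before narrowing it. A standalone sketch of that rule
(illustrative; the permissive LID is taken as all-ones and compared in host
byte order, which is byte-order neutral for 0xffffffff):

	#include <stdbool.h>
	#include <stdint.h>

	#define OPA_LID_PERMISSIVE	0xffffffffu

	/* Narrow a 32-bit OPA DR SLID to the 16-bit WC slid, if possible. */
	static bool narrow_opa_dr_slid(uint32_t opa_drslid, uint16_t *drslid)
	{
		if (opa_drslid != OPA_LID_PERMISSIVE &&
		    (opa_drslid & 0xffff0000))
			return false;	/* does not fit in 16 bits */

		*drslid = (uint16_t)(opa_drslid & 0x0000ffff);
		return true;
	}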
@@ -793,10 +827,16 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	}
 
 	build_smp_wc(mad_agent_priv->agent.qp,
-		     send_wr->wr_id, be16_to_cpu(smp->dr_slid),
+		     send_wr->wr_id, drslid,
 		     send_wr->wr.ud.pkey_index,
 		     send_wr->wr.ud.port_num, &mad_wc);
 
+	if (opa && smp->base_version == OPA_MGMT_BASE_VERSION) {
+		mad_wc.byte_len = mad_send_wr->send_buf.hdr_len
+					+ mad_send_wr->send_buf.data_len
+					+ sizeof(struct ib_grh);
+	}
+
 	/* No GRH for DR SMP */
 	ret = device->process_mad(device, 0, port_num, &mad_wc, NULL,
 				  (struct ib_mad_hdr *)smp, in_mad_size,
@@ -825,7 +865,10 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 		port_priv = ib_get_mad_port(mad_agent_priv->agent.device,
 					    mad_agent_priv->agent.port_num);
 		if (port_priv) {
-			memcpy(&mad_priv->mad.mad, smp, sizeof(struct ib_mad));
+			if (opa && smp->base_version == OPA_MGMT_BASE_VERSION)
+				memcpy(&mad_priv->mad.mad, smp, sizeof(struct opa_mad));
+			else
+				memcpy(&mad_priv->mad.mad, smp, sizeof(struct ib_mad));
 			recv_mad_agent = find_mad_agent(port_priv,
 						        &mad_priv->mad.mad);
 		}
@@ -848,6 +891,8 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	}
 
 	local->mad_send_wr = mad_send_wr;
+	local->mad_send_wr->send_wr.wr.ud.pkey_index = mad_wc.pkey_index;
+	local->return_wc_byte_len = out_mad_size;
 	/* Reference MAD agent until send side of local completion handled */
 	atomic_inc(&mad_agent_priv->refcount);
 	/* Queue local completion to local list */
@@ -1740,14 +1785,18 @@ out:
 	return mad_agent;
 }
 
-static int validate_mad(const struct ib_mad_hdr *mad_hdr, u32 qp_num)
+static int validate_mad(const struct ib_mad_hdr *mad_hdr,
+			const struct ib_mad_qp_info *qp_info,
+			bool opa)
 {
 	int valid = 0;
+	u32 qp_num = qp_info->qp->qp_num;
 
 	/* Make sure MAD base version is understood */
-	if (mad_hdr->base_version != IB_MGMT_BASE_VERSION) {
-		pr_err("MAD received with unsupported base version %d\n",
-			mad_hdr->base_version);
+	if (mad_hdr->base_version != IB_MGMT_BASE_VERSION &&
+	    (!opa || mad_hdr->base_version != OPA_MGMT_BASE_VERSION)) {
+		pr_err("MAD received with unsupported base version %d %s\n",
+		       mad_hdr->base_version, opa ? "(opa)" : "");
 		goto out;
 	}
 
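
validate_mad() now accepts a second base version, but only on ports that
reported OPA support. Stated positively, the check above is equivalent to
this small sketch (base_version_supported() is a hypothetical name):

	#include <stdbool.h>
	#include <stdint.h>

	#define IB_MGMT_BASE_VERSION	1
	#define OPA_MGMT_BASE_VERSION	0x80

	static bool base_version_supported(uint8_t base_version, bool opa)
	{
		return base_version == IB_MGMT_BASE_VERSION ||
		       (opa && base_version == OPA_MGMT_BASE_VERSION);
	}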
@@ -1995,7 +2044,9 @@ enum smi_action handle_ib_smi(struct ib_mad_port_private *port_priv,
 				    &response->grh, wc,
 				    port_priv->device,
 				    smi_get_fwd_port(&recv->mad.smp),
-				    qp_info->qp->qp_num);
+				    qp_info->qp->qp_num,
+				    sizeof(struct ib_mad),
+				    false);
 
 		return IB_SMI_DISCARD;
 	}
@@ -2008,7 +2059,9 @@ static size_t mad_recv_buf_size(struct ib_mad_port_private *port_priv)
 }
 
 static bool generate_unmatched_resp(struct ib_mad_private *recv,
-				    struct ib_mad_private *response)
+				    struct ib_mad_private *response,
+				    size_t *resp_len,
+				    bool opa)
 {
 	if (recv->mad.mad.mad_hdr.method == IB_MGMT_METHOD_GET ||
 	    recv->mad.mad.mad_hdr.method == IB_MGMT_METHOD_SET) {
@@ -2022,11 +2075,90 @@ static bool generate_unmatched_resp(struct ib_mad_private *recv,
 		if (recv->mad.mad.mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
 			response->mad.mad.mad_hdr.status |= IB_SMP_DIRECTION;
 
+		if (opa && recv->mad.mad.mad_hdr.base_version == OPA_MGMT_BASE_VERSION) {
+			if (recv->mad.mad.mad_hdr.mgmt_class ==
+			    IB_MGMT_CLASS_SUBN_LID_ROUTED ||
+			    recv->mad.mad.mad_hdr.mgmt_class ==
+			    IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
+				*resp_len = opa_get_smp_header_size(
+							(struct opa_smp *)&recv->mad.smp);
+			else
+				*resp_len = sizeof(struct ib_mad_hdr);
+		}
+
 		return true;
 	} else {
 		return false;
 	}
 }
+
+static enum smi_action
+handle_opa_smi(struct ib_mad_port_private *port_priv,
+	       struct ib_mad_qp_info *qp_info,
+	       struct ib_wc *wc,
+	       int port_num,
+	       struct ib_mad_private *recv,
+	       struct ib_mad_private *response)
+{
+	enum smi_forward_action retsmi;
+
+	if (opa_smi_handle_dr_smp_recv(&recv->mad.opa_smp,
+				   port_priv->device->node_type,
+				   port_num,
+				   port_priv->device->phys_port_cnt) ==
+				   IB_SMI_DISCARD)
+		return IB_SMI_DISCARD;
+
+	retsmi = opa_smi_check_forward_dr_smp(&recv->mad.opa_smp);
+	if (retsmi == IB_SMI_LOCAL)
+		return IB_SMI_HANDLE;
+
+	if (retsmi == IB_SMI_SEND) { /* don't forward */
+		if (opa_smi_handle_dr_smp_send(&recv->mad.opa_smp,
+					   port_priv->device->node_type,
+					   port_num) == IB_SMI_DISCARD)
+			return IB_SMI_DISCARD;
+
+		if (opa_smi_check_local_smp(&recv->mad.opa_smp, port_priv->device) == IB_SMI_DISCARD)
+			return IB_SMI_DISCARD;
+
+	} else if (port_priv->device->node_type == RDMA_NODE_IB_SWITCH) {
+		/* forward case for switches */
+		memcpy(response, recv, sizeof(*response));
+		response->header.recv_wc.wc = &response->header.wc;
+		response->header.recv_wc.recv_buf.opa_mad = &response->mad.opa_mad;
+		response->header.recv_wc.recv_buf.grh = &response->grh;
+
+		agent_send_response((struct ib_mad *)&response->mad.mad,
+				    &response->grh, wc,
+				    port_priv->device,
+				    opa_smi_get_fwd_port(&recv->mad.opa_smp),
+				    qp_info->qp->qp_num,
+				    recv->header.wc.byte_len,
+				    true);
+
+		return IB_SMI_DISCARD;
+	}
+
+	return IB_SMI_HANDLE;
+}
+
+static enum smi_action
+handle_smi(struct ib_mad_port_private *port_priv,
+	   struct ib_mad_qp_info *qp_info,
+	   struct ib_wc *wc,
+	   int port_num,
+	   struct ib_mad_private *recv,
+	   struct ib_mad_private *response,
+	   bool opa)
+{
+	if (opa && recv->mad.mad.mad_hdr.base_version == OPA_MGMT_BASE_VERSION &&
+	    recv->mad.mad.mad_hdr.class_version == OPA_SMI_CLASS_VERSION)
+		return handle_opa_smi(port_priv, qp_info, wc, port_num, recv, response);
+
+	return handle_ib_smi(port_priv, qp_info, wc, port_num, recv, response);
+}
+
 static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 				     struct ib_wc *wc)
 {
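
handle_smi() dispatches on both the base version and the SMI class version,
so plain IB SMPs arriving on an OPA port still take the existing IB path.
The dispatch predicate reduces to this sketch (the OPA_SMI_CLASS_VERSION
value is an assumption based on this series' opa_smi.h):

	#include <stdbool.h>
	#include <stdint.h>

	#define OPA_MGMT_BASE_VERSION	0x80
	#define OPA_SMI_CLASS_VERSION	0x80	/* assumed */

	/* True when the received SMP must go through the OPA SMI code. */
	static bool is_opa_smp(bool opa_port, uint8_t base_version,
			       uint8_t class_version)
	{
		return opa_port &&
		       base_version == OPA_MGMT_BASE_VERSION &&
		       class_version == OPA_SMI_CLASS_VERSION;
	}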
@@ -2038,11 +2170,15 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 	int port_num;
 	int ret = IB_MAD_RESULT_SUCCESS;
 	size_t resp_mad_size;
+	bool opa;
 
 	mad_list = (struct ib_mad_list_head *)(unsigned long)wc->wr_id;
 	qp_info = mad_list->mad_queue->qp_info;
 	dequeue_mad(mad_list);
 
+	opa = rdma_cap_opa_mad(qp_info->port_priv->device,
+			       qp_info->port_priv->port_num);
+
 	mad_priv_hdr = container_of(mad_list, struct ib_mad_private_header,
 				    mad_list);
 	recv = container_of(mad_priv_hdr, struct ib_mad_private, header);
@@ -2054,7 +2190,13 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 	/* Setup MAD receive work completion from "normal" work completion */
 	recv->header.wc = *wc;
 	recv->header.recv_wc.wc = &recv->header.wc;
-	recv->header.recv_wc.mad_len = sizeof(struct ib_mad);
+	if (opa && recv->mad.mad.mad_hdr.base_version == OPA_MGMT_BASE_VERSION) {
+		recv->header.recv_wc.mad_len = wc->byte_len - sizeof(struct ib_grh);
+		recv->header.recv_wc.mad_seg_size = sizeof(struct opa_mad);
+	} else {
+		recv->header.recv_wc.mad_len = sizeof(struct ib_mad);
+		recv->header.recv_wc.mad_seg_size = sizeof(struct ib_mad);
+	}
 	recv->header.recv_wc.recv_buf.mad = &recv->mad.mad;
 	recv->header.recv_wc.recv_buf.grh = &recv->grh;
 
@@ -2062,7 +2204,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 		snoop_recv(qp_info, &recv->header.recv_wc, IB_MAD_SNOOP_RECVS);
 
 	/* Validate MAD */
-	if (!validate_mad(&recv->mad.mad.mad_hdr, qp_info->qp->qp_num))
+	if (!validate_mad(&recv->mad.mad.mad_hdr, qp_info, opa))
 		goto out;
 
 	resp_mad_size = max_mad_size(port_priv);
@@ -2080,8 +2222,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 
 	if (recv->mad.mad.mad_hdr.mgmt_class ==
 	    IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) {
-		if (handle_ib_smi(port_priv, qp_info, wc, port_num, recv,
-				  response)
+		if (handle_smi(port_priv, qp_info, wc, port_num, recv, response, opa)
 		    == IB_SMI_DISCARD)
 			goto out;
 	}
@@ -2103,7 +2244,9 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 						    &recv->grh, wc,
 						    port_priv->device,
 						    port_num,
-						    qp_info->qp->qp_num);
+						    qp_info->qp->qp_num,
+						    resp_mad_size,
+						    opa);
 				goto out;
 			}
 		}
@@ -2118,9 +2261,12 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 		 */
 		recv = NULL;
 	} else if ((ret & IB_MAD_RESULT_SUCCESS) &&
-		   generate_unmatched_resp(recv, response)) {
+		   generate_unmatched_resp(recv, response, &resp_mad_size, opa)) {
 		agent_send_response(&response->mad.mad, &recv->grh, wc,
-				    port_priv->device, port_num, qp_info->qp->qp_num);
+				    port_priv->device, port_num,
+				    qp_info->qp->qp_num,
+				    resp_mad_size,
+				    opa);
 	}
 
 out:
@@ -2522,10 +2668,14 @@ static void local_completions(struct work_struct *work)
 	int free_mad;
 	struct ib_wc wc;
 	struct ib_mad_send_wc mad_send_wc;
+	bool opa;
 
 	mad_agent_priv =
 		container_of(work, struct ib_mad_agent_private, local_work);
 
+	opa = rdma_cap_opa_mad(mad_agent_priv->qp_info->port_priv->device,
+			       mad_agent_priv->qp_info->port_priv->port_num);
+
 	spin_lock_irqsave(&mad_agent_priv->lock, flags);
 	while (!list_empty(&mad_agent_priv->local_list)) {
 		local = list_entry(mad_agent_priv->local_list.next,
@@ -2535,6 +2685,7 @@ static void local_completions(struct work_struct *work)
 		spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
 		free_mad = 0;
 		if (local->mad_priv) {
+			u8 base_version;
 			recv_mad_agent = local->recv_mad_agent;
 			if (!recv_mad_agent) {
 				dev_err(&mad_agent_priv->agent.device->dev,
@@ -2550,11 +2701,20 @@ static void local_completions(struct work_struct *work)
 			build_smp_wc(recv_mad_agent->agent.qp,
 				     (unsigned long) local->mad_send_wr,
 				     be16_to_cpu(IB_LID_PERMISSIVE),
-				     0, recv_mad_agent->agent.port_num, &wc);
+				     local->mad_send_wr->send_wr.wr.ud.pkey_index,
+				     recv_mad_agent->agent.port_num, &wc);
 
 			local->mad_priv->header.recv_wc.wc = &wc;
-			local->mad_priv->header.recv_wc.mad_len =
-						sizeof(struct ib_mad);
+
+			base_version = local->mad_priv->mad.mad.mad_hdr.base_version;
+			if (opa && base_version == OPA_MGMT_BASE_VERSION) {
+				local->mad_priv->header.recv_wc.mad_len = local->return_wc_byte_len;
+				local->mad_priv->header.recv_wc.mad_seg_size = sizeof(struct opa_mad);
+			} else {
+				local->mad_priv->header.recv_wc.mad_len = sizeof(struct ib_mad);
+				local->mad_priv->header.recv_wc.mad_seg_size = sizeof(struct ib_mad);
+			}
+
 			INIT_LIST_HEAD(&local->mad_priv->header.recv_wc.rmpp_list);
 			list_add(&local->mad_priv->header.recv_wc.recv_buf.list,
 				 &local->mad_priv->header.recv_wc.rmpp_list);
@@ -2703,7 +2863,7 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
 	struct ib_mad_queue *recv_queue = &qp_info->recv_queue;
 
 	/* Initialize common scatter list fields */
-	sg_list.length = sizeof *mad_priv - sizeof mad_priv->header;
+	sg_list.length = mad_recv_buf_size(qp_info->port_priv);
 	sg_list.lkey = (*qp_info->port_priv->mr).lkey;
 
 	/* Initialize common receive WR fields */
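
With struct ib_mad_private now ending in a flex array, the posted receive
length has to come from the port's maximum MAD size rather than from sizeof.
A sketch of the assumed shape of mad_recv_buf_size() (illustrative; the
actual helper is defined earlier in this patch):

	#include <stddef.h>

	#define IB_GRH_SIZE	40	/* sizeof(struct ib_grh) */

	/* A receive buffer must hold a GRH plus the largest MAD the port
	 * supports: 256 bytes for IB, up to 2048 for OPA. */
	static size_t mad_recv_buf_size_sketch(size_t port_max_mad_size)
	{
		return IB_GRH_SIZE + port_max_mad_size;
	}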
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index 6e8b02fdcc5f..e8db522cc447 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -154,6 +154,7 @@ struct ib_mad_local_private {
 	struct ib_mad_private *mad_priv;
 	struct ib_mad_agent_private *recv_mad_agent;
 	struct ib_mad_send_wr_private *mad_send_wr;
+	size_t return_wc_byte_len;
 };
 
 struct ib_mad_mgmt_method_table {
diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
index 9c284d9b4fa9..4930bb90319f 100644
--- a/drivers/infiniband/core/mad_rmpp.c
+++ b/drivers/infiniband/core/mad_rmpp.c
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2005 Intel Inc. All rights reserved.
  * Copyright (c) 2005-2006 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2014 Intel Corporation.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -67,6 +68,7 @@ struct mad_rmpp_recv {
 	u8 mgmt_class;
 	u8 class_version;
 	u8 method;
+	u8 base_version;
 };
 
 static inline void deref_rmpp_recv(struct mad_rmpp_recv *rmpp_recv)
@@ -318,6 +320,7 @@ create_rmpp_recv(struct ib_mad_agent_private *agent,
 	rmpp_recv->mgmt_class = mad_hdr->mgmt_class;
 	rmpp_recv->class_version = mad_hdr->class_version;
 	rmpp_recv->method  = mad_hdr->method;
+	rmpp_recv->base_version  = mad_hdr->base_version;
 	return rmpp_recv;
 
 error:	kfree(rmpp_recv);
@@ -433,14 +436,23 @@ static inline int get_mad_len(struct mad_rmpp_recv *rmpp_recv)
 {
 	struct ib_rmpp_base *rmpp_base;
 	int hdr_size, data_size, pad;
+	bool opa = rdma_cap_opa_mad(rmpp_recv->agent->qp_info->port_priv->device,
+				    rmpp_recv->agent->qp_info->port_priv->port_num);
 
 	rmpp_base = (struct ib_rmpp_base *)rmpp_recv->cur_seg_buf->mad;
 
 	hdr_size = ib_get_mad_data_offset(rmpp_base->mad_hdr.mgmt_class);
-	data_size = sizeof(struct ib_rmpp_mad) - hdr_size;
-	pad = IB_MGMT_RMPP_DATA - be32_to_cpu(rmpp_base->rmpp_hdr.paylen_newwin);
-	if (pad > IB_MGMT_RMPP_DATA || pad < 0)
-		pad = 0;
+	if (opa && rmpp_recv->base_version == OPA_MGMT_BASE_VERSION) {
+		data_size = sizeof(struct opa_rmpp_mad) - hdr_size;
+		pad = OPA_MGMT_RMPP_DATA - be32_to_cpu(rmpp_base->rmpp_hdr.paylen_newwin);
+		if (pad > OPA_MGMT_RMPP_DATA || pad < 0)
+			pad = 0;
+	} else {
+		data_size = sizeof(struct ib_rmpp_mad) - hdr_size;
+		pad = IB_MGMT_RMPP_DATA - be32_to_cpu(rmpp_base->rmpp_hdr.paylen_newwin);
+		if (pad > IB_MGMT_RMPP_DATA || pad < 0)
+			pad = 0;
+	}
 
 	return hdr_size + rmpp_recv->seg_num * data_size - pad;
 }
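
get_mad_len() reassembles an RMPP transfer as hdr_size + seg_num * data_size
- pad, where both the per-segment data size and the maximum pad now depend on
the base version. A standalone sketch with the segment sizes this series uses
(220 data bytes per IB segment and 2012 per OPA segment, i.e. MAD size minus
the 36-byte RMPP offset; rmpp_mad_len() is a hypothetical name and paylen is
already byte-swapped):

	#include <stdbool.h>

	#define IB_MGMT_RMPP_DATA	220	/* 256 - 36 */
	#define OPA_MGMT_RMPP_DATA	2012	/* 2048 - 36 */

	static int rmpp_mad_len(bool opa_mad, int hdr_size, int seg_num,
				int paylen)
	{
		int mad_size = opa_mad ? 2048 : 256;
		int data_size = mad_size - hdr_size;
		int max_data = opa_mad ? OPA_MGMT_RMPP_DATA : IB_MGMT_RMPP_DATA;
		int pad = max_data - paylen;

		if (pad > max_data || pad < 0)
			pad = 0;	/* ignore a missing/bogus paylen */

		return hdr_size + seg_num * data_size - pad;
	}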
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 6be72a563c61..8be631d94615 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -262,20 +262,23 @@ static ssize_t copy_recv_mad(struct ib_umad_file *file, char __user *buf,
 {
 	struct ib_mad_recv_buf *recv_buf;
 	int left, seg_payload, offset, max_seg_payload;
+	size_t seg_size;
 
-	/* We need enough room to copy the first (or only) MAD segment. */
 	recv_buf = &packet->recv_wc->recv_buf;
-	if ((packet->length <= sizeof (*recv_buf->mad) &&
+	seg_size = packet->recv_wc->mad_seg_size;
+
+	/* We need enough room to copy the first (or only) MAD segment. */
+	if ((packet->length <= seg_size &&
 	     count < hdr_size(file) + packet->length) ||
-	    (packet->length > sizeof (*recv_buf->mad) &&
-	     count < hdr_size(file) + sizeof (*recv_buf->mad)))
+	    (packet->length > seg_size &&
+	     count < hdr_size(file) + seg_size))
 		return -EINVAL;
 
 	if (copy_to_user(buf, &packet->mad, hdr_size(file)))
 		return -EFAULT;
 
 	buf += hdr_size(file);
-	seg_payload = min_t(int, packet->length, sizeof (*recv_buf->mad));
+	seg_payload = min_t(int, packet->length, seg_size);
 	if (copy_to_user(buf, recv_buf->mad, seg_payload))
 		return -EFAULT;
 
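
copy_recv_mad() now sizes both the first-segment copy and every follow-on
segment from mad_seg_size instead of the fixed sizeof(struct ib_mad). The
arithmetic, pulled into hypothetical helpers (offset is the class-specific
data offset from ib_get_mad_data_offset()):

	#include <stddef.h>

	/* Bytes to copy for the first (or only) segment. */
	static size_t first_seg_payload(size_t packet_len, size_t seg_size)
	{
		return packet_len < seg_size ? packet_len : seg_size;
	}

	/* RMPP payload bytes carried by each subsequent full segment. */
	static size_t max_seg_payload(size_t seg_size, size_t offset)
	{
		return seg_size - offset;
	}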
@@ -292,7 +295,7 @@ static ssize_t copy_recv_mad(struct ib_umad_file *file, char __user *buf,
 			return -ENOSPC;
 		}
 		offset = ib_get_mad_data_offset(recv_buf->mad->mad_hdr.mgmt_class);
-		max_seg_payload = sizeof (struct ib_mad) - offset;
+		max_seg_payload = seg_size - offset;
 
 		for (left = packet->length - seg_payload, buf += seg_payload;
 		     left; left -= seg_payload, buf += seg_payload) {
@@ -450,6 +453,7 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
 	struct ib_rmpp_base *rmpp_base;
 	__be64 *tid;
 	int ret, data_len, hdr_len, copy_offset, rmpp_active;
+	u8 base_version;
 
 	if (count < hdr_size(file) + IB_MGMT_RMPP_HDR)
 		return -EINVAL;
@@ -516,12 +520,13 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
 		rmpp_active = 0;
 	}
 
+	base_version = ((struct ib_mad_hdr *)&packet->mad.data)->base_version;
 	data_len = count - hdr_size(file) - hdr_len;
 	packet->msg = ib_create_send_mad(agent,
 					 be32_to_cpu(packet->mad.hdr.qpn),
 					 packet->mad.hdr.pkey_index, rmpp_active,
 					 hdr_len, data_len, GFP_KERNEL,
-					 IB_MGMT_BASE_VERSION);
+					 base_version);
 	if (IS_ERR(packet->msg)) {
 		ret = PTR_ERR(packet->msg);
 		goto err_ah;
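
Since ib_umad_write() now forwards the base version found in the written MAD
header, user space opts into OPA MADs simply by filling in that byte before
writing to the umad device. A hedged user-space sketch (the header is
abbreviated to its first four bytes; mark_opa() is hypothetical):

	#include <stdint.h>

	#define OPA_MGMT_BASE_VERSION	0x80

	struct mad_hdr {		/* start of the common IB/OPA header */
		uint8_t	base_version;
		uint8_t	mgmt_class;
		uint8_t	class_version;
		uint8_t	method;
		/* ...20 more header bytes... */
	};

	static void mark_opa(void *mad_data)
	{
		struct mad_hdr *hdr = mad_data;

		hdr->base_version = OPA_MGMT_BASE_VERSION;
	}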
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index a8a6e9d2485e..e5b664098047 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -434,6 +434,7 @@ struct ib_mad_recv_buf {
  * @recv_buf: Specifies the location of the received data buffer(s).
  * @rmpp_list: Specifies a list of RMPP reassembled received MAD buffers.
  * @mad_len: The length of the received MAD, without duplicated headers.
+ * @mad_seg_size: The size of individual MAD segments.
  *
  * For received response, the wr_id contains a pointer to the ib_mad_send_buf
  *   for the corresponding send request.
@@ -443,6 +444,7 @@ struct ib_mad_recv_wc {
 	struct ib_mad_recv_buf	recv_buf;
 	struct list_head	rmpp_list;
 	int			mad_len;
+	size_t			mad_seg_size;
 };
 
 /**
-- 
1.8.2
