* [PATCH v4 00/19] IB/mad: Add support for Intel Omni-Path Architecture (OPA) MAD processing.
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

The following patch series modifies the kernel MAD processing (ib_mad/ib_umad)
and related interfaces to send and receive Intel Omni-Path Architecture MADs on
devices which support them.

In addition to supporting some IBTA management classes, OPA devices use MADs
with lengths up to 2K.  These "jumbo" MADs increase the performance of
management traffic.

To distinguish IBTA MADs from OPA MADs, a new Base Version is introduced.  The
new format shares the same common header with IBTA MADs, which allows most of
the MAD processing code to be shared when dealing with the new Base Version.
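
As a rough illustration of why the shared common header is enough to
dispatch on, consider the sketch below.  The constant and the 2K size are
stand-ins for values introduced later in this series, not definitions from
this posting:

	#include <rdma/ib_mad.h>

	/* Sketch only: the OPA base version value is assumed here. */
	#define OPA_MGMT_BASE_VERSION_EXAMPLE 0x80

	static size_t mad_size_from_hdr(const struct ib_mad_hdr *hdr)
	{
		/* Both formats begin with struct ib_mad_hdr, so the base
		 * version in the common header is enough to pick a size.
		 */
		if (hdr->base_version == OPA_MGMT_BASE_VERSION_EXAMPLE)
			return 2048;	/* OPA "jumbo" MAD */
		return 256;		/* IBTA MAD */
	}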


The patch series is broken into 4 main areas.

1) Add the ability for devices to indicate their maximum MAD size and
   modify the MAD code to use this size.

2) Enhance the interface to the device agents to support larger and variable
   length MADs.

3) Add a capability bit to indicate support for OPA MADs.

4) Add support for creating and processing OPA MADs.


Changes for V4:

	Rebased to the latest of Roland's for-next branch (3.19-rc4)
	Fixed a compile issue in the ehca driver found with the 0-day build


Ira Weiny (19):
  IB/mad: Rename is_data_mad to is_rmpp_data_mad
  IB/core: Cache device attributes for use by upper level drivers
  IB/mad: Change validate_mad signature to take ib_mad_hdr rather than
    ib_mad
  IB/mad: Change ib_response_mad signature to take ib_mad_hdr rather
    than ib_mad
  IB/mad: Change cast in rcv_has_same_class
  IB/core: Add max_mad_size to ib_device_attr
  IB/mad: Convert ib_mad_private allocations from kmem_cache to kmalloc
  IB/mad: Add helper function for smi_handle_dr_smp_send
  IB/mad: Add helper function for smi_handle_dr_smp_recv
  IB/mad: Add helper function for smi_check_forward_dr_smp
  IB/mad: Add helper function for SMI processing
  IB/mad: Add MAD size parameters to process_mad
  IB/mad: Add base version parameter to ib_create_send_mad
  IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
  IB/mad: Create jumbo_mad data structures
  IB/mad: Add Intel Omni-Path Architecture defines
  IB/mad: Implement support for Intel Omni-Path Architecture base
    version MADs in ib_create_send_mad
  IB/mad: Implement Intel Omni-Path Architecture SMP processing
  IB/mad: Implement Intel Omni-Path Architecture MAD processing

 drivers/infiniband/core/agent.c              |  26 +-
 drivers/infiniband/core/agent.h              |   3 +-
 drivers/infiniband/core/cm.c                 |   6 +-
 drivers/infiniband/core/device.c             |   2 +
 drivers/infiniband/core/mad.c                | 519 ++++++++++++++++++---------
 drivers/infiniband/core/mad_priv.h           |   7 +-
 drivers/infiniband/core/mad_rmpp.c           | 144 ++++----
 drivers/infiniband/core/opa_smi.h            |  78 ++++
 drivers/infiniband/core/sa_query.c           |   3 +-
 drivers/infiniband/core/smi.c                | 231 ++++++++----
 drivers/infiniband/core/smi.h                |   6 +
 drivers/infiniband/core/sysfs.c              |   5 +-
 drivers/infiniband/core/user_mad.c           |  38 +-
 drivers/infiniband/hw/amso1100/c2_provider.c |   5 +-
 drivers/infiniband/hw/amso1100/c2_rnic.c     |   1 +
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |   6 +-
 drivers/infiniband/hw/cxgb4/provider.c       |   8 +-
 drivers/infiniband/hw/ehca/ehca_hca.c        |   3 +
 drivers/infiniband/hw/ehca/ehca_iverbs.h     |   4 +-
 drivers/infiniband/hw/ehca/ehca_sqp.c        |   8 +-
 drivers/infiniband/hw/ipath/ipath_mad.c      |   8 +-
 drivers/infiniband/hw/ipath/ipath_verbs.c    |   1 +
 drivers/infiniband/hw/ipath/ipath_verbs.h    |   3 +-
 drivers/infiniband/hw/mlx4/mad.c             |  12 +-
 drivers/infiniband/hw/mlx4/main.c            |   1 +
 drivers/infiniband/hw/mlx4/mlx4_ib.h         |   3 +-
 drivers/infiniband/hw/mlx5/mad.c             |   8 +-
 drivers/infiniband/hw/mlx5/main.c            |   1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h         |   3 +-
 drivers/infiniband/hw/mthca/mthca_dev.h      |   4 +-
 drivers/infiniband/hw/mthca/mthca_mad.c      |  12 +-
 drivers/infiniband/hw/mthca/mthca_provider.c |   2 +
 drivers/infiniband/hw/nes/nes_verbs.c        |   4 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c     |   3 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h     |   3 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |   1 +
 drivers/infiniband/hw/qib/qib_iba7322.c      |   3 +-
 drivers/infiniband/hw/qib/qib_mad.c          |  11 +-
 drivers/infiniband/hw/qib/qib_verbs.c        |   1 +
 drivers/infiniband/hw/qib/qib_verbs.h        |   3 +-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c |   2 +
 drivers/infiniband/ulp/srpt/ib_srpt.c        |   3 +-
 include/rdma/ib_mad.h                        |  40 ++-
 include/rdma/ib_verbs.h                      |  15 +-
 include/rdma/opa_smi.h                       | 106 ++++++
 45 files changed, 999 insertions(+), 357 deletions(-)
 create mode 100644 drivers/infiniband/core/opa_smi.h
 create mode 100644 include/rdma/opa_smi.h

-- 
1.8.2


* [PATCH v4 01/19] IB/mad: Rename is_data_mad to is_rmpp_data_mad
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

is_rmpp_data_mad is a more descriptive name for this function.

Reviewed-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/mad.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 74c30f4..4673262 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -1734,7 +1734,7 @@ out:
 	return valid;
 }
 
-static int is_data_mad(struct ib_mad_agent_private *mad_agent_priv,
+static int is_rmpp_data_mad(struct ib_mad_agent_private *mad_agent_priv,
 		       struct ib_mad_hdr *mad_hdr)
 {
 	struct ib_rmpp_mad *rmpp_mad;
@@ -1836,7 +1836,7 @@ ib_find_send_mad(struct ib_mad_agent_private *mad_agent_priv,
 	 * been notified that the send has completed
 	 */
 	list_for_each_entry(wr, &mad_agent_priv->send_list, agent_list) {
-		if (is_data_mad(mad_agent_priv, wr->send_buf.mad) &&
+		if (is_rmpp_data_mad(mad_agent_priv, wr->send_buf.mad) &&
 		    wr->tid == mad->mad_hdr.tid &&
 		    wr->timeout &&
 		    rcv_has_same_class(wr, wc) &&
@@ -2411,7 +2411,8 @@ find_send_wr(struct ib_mad_agent_private *mad_agent_priv,
 
 	list_for_each_entry(mad_send_wr, &mad_agent_priv->send_list,
 			    agent_list) {
-		if (is_data_mad(mad_agent_priv, mad_send_wr->send_buf.mad) &&
+		if (is_rmpp_data_mad(mad_agent_priv,
+				     mad_send_wr->send_buf.mad) &&
 		    &mad_send_wr->send_buf == send_buf)
 			return mad_send_wr;
 	}
-- 
1.8.2


* [PATCH v4 02/19] IB/core: Cache device attributes for use by upper level drivers
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Upper level drivers can access these cached device attributes rather than
caching them on their own.
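
For example (a hypothetical ULP snippet, not part of this patch), a consumer
can read the cached attributes directly instead of issuing its own
query_device() call at init time and storing a private copy:

	#include <rdma/ib_verbs.h>

	/* Hypothetical consumer: my_ulp_check_device() is illustrative. */
	static int my_ulp_check_device(struct ib_device *device)
	{
		/* before: device->query_device(device, &attr) into a
		 * ULP-private copy; now the core caches the attributes
		 * at registration time.
		 */
		if (device->cached_dev_attrs.max_qp_wr < 128)
			return -ENODEV;
		return 0;
	}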

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
 drivers/infiniband/core/device.c | 2 ++
 include/rdma/ib_verbs.h          | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece..30d9d09 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -294,6 +294,8 @@ int ib_register_device(struct ib_device *device,
 	spin_lock_init(&device->event_handler_lock);
 	spin_lock_init(&device->client_data_lock);
 
+	device->query_device(device, &device->cached_dev_attrs);
+
 	ret = read_port_table_lengths(device);
 	if (ret) {
 		printk(KERN_WARNING "Couldn't create table lengths cache for device %s\n",
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0d74f1d..0116e4b 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1675,6 +1675,7 @@ struct ib_device {
 	u32			     local_dma_lkey;
 	u8                           node_type;
 	u8                           phys_port_cnt;
+	struct ib_device_attr        cached_dev_attrs;
 };
 
 struct ib_client {
-- 
1.8.2


* [PATCH v4 03/19] IB/mad: Change validate_mad signature to take ib_mad_hdr rather than ib_mad
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

validate_mad only needs access to the MAD header and can be used for both IB
and Jumbo MADs.
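
A minimal sketch of the benefit (the caller below is hypothetical and shown
only for illustration, inside mad.c where validate_mad() is visible): a
receive buffer of either size can be validated through its leading common
header:

	/* Hypothetical: accept_any_mad() is not part of this patch. */
	static int accept_any_mad(void *recv_buf, u32 qp_num)
	{
		/* IB and jumbo MADs both begin with struct ib_mad_hdr,
		 * so validate_mad() no longer needs a full struct ib_mad.
		 */
		struct ib_mad_hdr *hdr = recv_buf;

		return validate_mad(hdr, qp_num);
	}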

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/mad.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 4673262..9ffff9b 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -1708,20 +1708,20 @@ out:
 	return mad_agent;
 }
 
-static int validate_mad(struct ib_mad *mad, u32 qp_num)
+static int validate_mad(struct ib_mad_hdr *mad_hdr, u32 qp_num)
 {
 	int valid = 0;
 
 	/* Make sure MAD base version is understood */
-	if (mad->mad_hdr.base_version != IB_MGMT_BASE_VERSION) {
+	if (mad_hdr->base_version != IB_MGMT_BASE_VERSION) {
 		pr_err("MAD received with unsupported base version %d\n",
-			mad->mad_hdr.base_version);
+			mad_hdr->base_version);
 		goto out;
 	}
 
 	/* Filter SMI packets sent to other than QP0 */
-	if ((mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED) ||
-	    (mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)) {
+	if ((mad_hdr->mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED) ||
+	    (mad_hdr->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)) {
 		if (qp_num == 0)
 			valid = 1;
 	} else {
@@ -1979,7 +1979,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 		snoop_recv(qp_info, &recv->header.recv_wc, IB_MAD_SNOOP_RECVS);
 
 	/* Validate MAD */
-	if (!validate_mad(&recv->mad.mad, qp_info->qp->qp_num))
+	if (!validate_mad(&recv->mad.mad.mad_hdr, qp_info->qp->qp_num))
 		goto out;
 
 	response = kmem_cache_alloc(ib_mad_cache, GFP_KERNEL);
-- 
1.8.2


* [PATCH v4 04/19] IB/mad: Change ib_response_mad signature to take ib_mad_hdr rather than ib_mad
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

ib_response_mad only needs access to the MAD header and can be used for both IB
and Jumbo MADs.

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/mad.c      | 20 ++++++++++----------
 drivers/infiniband/core/user_mad.c |  6 +++---
 include/rdma/ib_mad.h              |  2 +-
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 9ffff9b..66b3940 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -179,12 +179,12 @@ static int is_vendor_method_in_use(
 	return 0;
 }
 
-int ib_response_mad(struct ib_mad *mad)
+int ib_response_mad(struct ib_mad_hdr *hdr)
 {
-	return ((mad->mad_hdr.method & IB_MGMT_METHOD_RESP) ||
-		(mad->mad_hdr.method == IB_MGMT_METHOD_TRAP_REPRESS) ||
-		((mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_BM) &&
-		 (mad->mad_hdr.attr_mod & IB_BM_ATTR_MOD_RESP)));
+	return ((hdr->method & IB_MGMT_METHOD_RESP) ||
+		(hdr->method == IB_MGMT_METHOD_TRAP_REPRESS) ||
+		((hdr->mgmt_class == IB_MGMT_CLASS_BM) &&
+		 (hdr->attr_mod & IB_BM_ATTR_MOD_RESP)));
 }
 EXPORT_SYMBOL(ib_response_mad);
 
@@ -791,7 +791,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	switch (ret)
 	{
 	case IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY:
-		if (ib_response_mad(&mad_priv->mad.mad) &&
+		if (ib_response_mad(&mad_priv->mad.mad.mad_hdr) &&
 		    mad_agent_priv->agent.recv_handler) {
 			local->mad_priv = mad_priv;
 			local->recv_mad_agent = mad_agent_priv;
@@ -1628,7 +1628,7 @@ find_mad_agent(struct ib_mad_port_private *port_priv,
 	unsigned long flags;
 
 	spin_lock_irqsave(&port_priv->reg_lock, flags);
-	if (ib_response_mad(mad)) {
+	if (ib_response_mad(&mad->mad_hdr)) {
 		u32 hi_tid;
 		struct ib_mad_agent_private *entry;
 
@@ -1765,8 +1765,8 @@ static inline int rcv_has_same_gid(struct ib_mad_agent_private *mad_agent_priv,
 	u8 port_num = mad_agent_priv->agent.port_num;
 	u8 lmc;
 
-	send_resp = ib_response_mad((struct ib_mad *)wr->send_buf.mad);
-	rcv_resp = ib_response_mad(rwc->recv_buf.mad);
+	send_resp = ib_response_mad((struct ib_mad_hdr *)wr->send_buf.mad);
+	rcv_resp = ib_response_mad(&rwc->recv_buf.mad->mad_hdr);
 
 	if (send_resp == rcv_resp)
 		/* both requests, or both responses. GIDs different */
@@ -1879,7 +1879,7 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
 	}
 
 	/* Complete corresponding request */
-	if (ib_response_mad(mad_recv_wc->recv_buf.mad)) {
+	if (ib_response_mad(&mad_recv_wc->recv_buf.mad->mad_hdr)) {
 		spin_lock_irqsave(&mad_agent_priv->lock, flags);
 		mad_send_wr = ib_find_send_mad(mad_agent_priv, mad_recv_wc);
 		if (!mad_send_wr) {
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 928cdd2..66b5217 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -426,11 +426,11 @@ static int is_duplicate(struct ib_umad_file *file,
 		 * the same TID, reject the second as a duplicate.  This is more
 		 * restrictive than required by the spec.
 		 */
-		if (!ib_response_mad((struct ib_mad *) hdr)) {
-			if (!ib_response_mad((struct ib_mad *) sent_hdr))
+		if (!ib_response_mad(hdr)) {
+			if (!ib_response_mad(sent_hdr))
 				return 1;
 			continue;
-		} else if (!ib_response_mad((struct ib_mad *) sent_hdr))
+		} else if (!ib_response_mad(sent_hdr))
 			continue;
 
 		if (same_destination(&packet->mad.hdr, &sent_packet->mad.hdr))
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index 9bb99e9..9c89939 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -263,7 +263,7 @@ struct ib_mad_send_buf {
  * ib_response_mad - Returns if the specified MAD has been generated in
  *   response to a sent request or trap.
  */
-int ib_response_mad(struct ib_mad *mad);
+int ib_response_mad(struct ib_mad_hdr *hdr);
 
 /**
  * ib_get_rmpp_resptime - Returns the RMPP response time.
-- 
1.8.2


* [PATCH v4 05/19] IB/mad: Change cast in rcv_has_same_class
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

rcv_has_same_class only needs access to the MAD header and can be used for both IB
and Jumbo MADs.

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
 drivers/infiniband/core/mad.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 66b3940..819b794 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -1750,7 +1750,7 @@ static int is_rmpp_data_mad(struct ib_mad_agent_private *mad_agent_priv,
 static inline int rcv_has_same_class(struct ib_mad_send_wr_private *wr,
 				     struct ib_mad_recv_wc *rwc)
 {
-	return ((struct ib_mad *)(wr->send_buf.mad))->mad_hdr.mgmt_class ==
+	return ((struct ib_mad_hdr *)(wr->send_buf.mad))->mgmt_class ==
 		rwc->recv_buf.mad->mad_hdr.mgmt_class;
 }
 
-- 
1.8.2


* [PATCH v4 06/19] IB/core: Add max_mad_size to ib_device_attr
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Change all IB drivers to report their maximum MAD size.
Add a check to verify that all devices support at least IB_MGMT_MAD_SIZE.
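
With the existing sizes in ib_mad.h this works out to the classic 256-byte
IB MAD; the enum below repeats the header's existing values for reference,
with only the last entry added by this patch:

	enum {
		IB_MGMT_MAD_HDR  = 24,	/* common MAD header */
		IB_MGMT_MAD_DATA = 232,	/* class payload */
		/* added by this patch: 24 + 232 = 256 bytes */
		IB_MGMT_MAD_SIZE = IB_MGMT_MAD_HDR + IB_MGMT_MAD_DATA,
	};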

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---

Changes from V3:
	Fix an ehca compile issue found with the 0-day build

 drivers/infiniband/core/mad.c                | 6 ++++++
 drivers/infiniband/hw/amso1100/c2_rnic.c     | 1 +
 drivers/infiniband/hw/cxgb3/iwch_provider.c  | 1 +
 drivers/infiniband/hw/cxgb4/provider.c       | 1 +
 drivers/infiniband/hw/ehca/ehca_hca.c        | 3 +++
 drivers/infiniband/hw/ipath/ipath_verbs.c    | 1 +
 drivers/infiniband/hw/mlx4/main.c            | 1 +
 drivers/infiniband/hw/mlx5/main.c            | 1 +
 drivers/infiniband/hw/mthca/mthca_provider.c | 2 ++
 drivers/infiniband/hw/nes/nes_verbs.c        | 1 +
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  | 1 +
 drivers/infiniband/hw/qib/qib_verbs.c        | 1 +
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 2 ++
 include/rdma/ib_mad.h                        | 1 +
 include/rdma/ib_verbs.h                      | 1 +
 15 files changed, 24 insertions(+)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 819b794..a6a33cf 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2924,6 +2924,12 @@ static int ib_mad_port_open(struct ib_device *device,
 	char name[sizeof "ib_mad123"];
 	int has_smi;
 
+	if (device->cached_dev_attrs.max_mad_size < IB_MGMT_MAD_SIZE) {
+		dev_err(&device->dev, "Min MAD size for device is %u\n",
+			IB_MGMT_MAD_SIZE);
+		return -EFAULT;
+	}
+
 	/* Create new device info */
 	port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL);
 	if (!port_priv) {
diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c
index d2a6d96..63322c0 100644
--- a/drivers/infiniband/hw/amso1100/c2_rnic.c
+++ b/drivers/infiniband/hw/amso1100/c2_rnic.c
@@ -197,6 +197,7 @@ static int c2_rnic_query(struct c2_dev *c2dev, struct ib_device_attr *props)
 	props->max_srq_sge         = 0;
 	props->max_pkeys           = 0;
 	props->local_ca_ack_delay  = 0;
+	props->max_mad_size        = IB_MGMT_MAD_SIZE;
 
  bail2:
 	vq_repbuf_free(c2dev, reply);
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 811b24a..b8a80aa0 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1174,6 +1174,7 @@ static int iwch_query_device(struct ib_device *ibdev,
 	props->max_pd = dev->attr.max_pds;
 	props->local_ca_ack_delay = 0;
 	props->max_fast_reg_page_list_len = T3_MAX_FASTREG_DEPTH;
+	props->max_mad_size = IB_MGMT_MAD_SIZE;
 
 	return 0;
 }
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 66bd6a2..299c70c 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -332,6 +332,7 @@ static int c4iw_query_device(struct ib_device *ibdev,
 	props->max_pd = T4_MAX_NUM_PD;
 	props->local_ca_ack_delay = 0;
 	props->max_fast_reg_page_list_len = t4_max_fr_depth(use_dsgl);
+	props->max_mad_size = IB_MGMT_MAD_SIZE;
 
 	return 0;
 }
diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c
index 9ed4d25..6166146 100644
--- a/drivers/infiniband/hw/ehca/ehca_hca.c
+++ b/drivers/infiniband/hw/ehca/ehca_hca.c
@@ -40,6 +40,7 @@
  */
 
 #include <linux/gfp.h>
+#include <rdma/ib_mad.h>
 
 #include "ehca_tools.h"
 #include "ehca_iverbs.h"
@@ -133,6 +134,8 @@ int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
 		if (rblock->hca_cap_indicators & cap_mapping[i + 1])
 			props->device_cap_flags |= cap_mapping[i];
 
+	props->max_mad_size = IB_MGMT_MAD_SIZE;
+
 query_device1:
 	ehca_free_fw_ctrlblock(rblock);
 
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 44ea939..4c6474c 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1538,6 +1538,7 @@ static int ipath_query_device(struct ib_device *ibdev,
 	props->max_mcast_qp_attach = ib_ipath_max_mcast_qp_attached;
 	props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
 		props->max_mcast_grp;
+	props->max_mad_size = IB_MGMT_MAD_SIZE;
 
 	return 0;
 }
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 57ecc5b..88326a7 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -229,6 +229,7 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
 					   props->max_mcast_grp;
 	props->max_map_per_fmr = dev->dev->caps.max_fmr_maps;
+	props->max_mad_size        = IB_MGMT_MAD_SIZE;
 
 out:
 	kfree(in_mad);
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 8a87404..24a0a54 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -243,6 +243,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
 					   props->max_mcast_grp;
 	props->max_map_per_fmr = INT_MAX; /* no limit in ConnectIB */
+	props->max_mad_size        = IB_MGMT_MAD_SIZE;
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 	if (dev->mdev->caps.gen.flags & MLX5_DEV_CAP_FLAG_ON_DMND_PG)
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 415f8e1..236c0df 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -123,6 +123,8 @@ static int mthca_query_device(struct ib_device *ibdev,
 		props->max_map_per_fmr =
 			(1 << (32 - ilog2(mdev->limits.num_mpts))) - 1;
 
+	props->max_mad_size = IB_MGMT_MAD_SIZE;
+
 	err = 0;
  out:
 	kfree(in_mad);
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index c0d0296..93e67e2 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -555,6 +555,7 @@ static int nes_query_device(struct ib_device *ibdev, struct ib_device_attr *prop
 	props->max_qp_init_rd_atom = props->max_qp_rd_atom;
 	props->atomic_cap = IB_ATOMIC_NONE;
 	props->max_map_per_fmr = 1;
+	props->max_mad_size = IB_MGMT_MAD_SIZE;
 
 	return 0;
 }
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index fb8d8c4..7ae0a22 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -103,6 +103,7 @@ int ocrdma_query_device(struct ib_device *ibdev, struct ib_device_attr *attr)
 	attr->local_ca_ack_delay = dev->attr.local_ca_ack_delay;
 	attr->max_fast_reg_page_list_len = dev->attr.max_pages_per_frmr;
 	attr->max_pkeys = 1;
+	attr->max_mad_size = IB_MGMT_MAD_SIZE;
 	return 0;
 }
 
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index 9bcfbd8..5d6447b 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -1591,6 +1591,7 @@ static int qib_query_device(struct ib_device *ibdev,
 	props->max_mcast_qp_attach = ib_qib_max_mcast_qp_attached;
 	props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
 		props->max_mcast_grp;
+	props->max_mad_size = IB_MGMT_MAD_SIZE;
 
 	return 0;
 }
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index 53bd6a2..b72ad7f 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -22,6 +22,7 @@
 
 #include <rdma/ib_user_verbs.h>
 #include <rdma/ib_addr.h>
+#include <rdma/ib_mad.h>
 
 #include "usnic_abi.h"
 #include "usnic_ib.h"
@@ -296,6 +297,7 @@ int usnic_ib_query_device(struct ib_device *ibdev,
 	props->max_mcast_qp_attach = 0;
 	props->max_total_mcast_qp_attach = 0;
 	props->max_map_per_fmr = 0;
+	props->max_mad_size = IB_MGMT_MAD_SIZE;
 	/* Owned by Userspace
 	 * max_qp_wr, max_sge, max_sge_rd, max_cqe */
 	mutex_unlock(&us_ibdev->usdev_lock);
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index 9c89939..5823016 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -135,6 +135,7 @@ enum {
 	IB_MGMT_SA_DATA = 200,
 	IB_MGMT_DEVICE_HDR = 64,
 	IB_MGMT_DEVICE_DATA = 192,
+	IB_MGMT_MAD_SIZE = IB_MGMT_MAD_HDR + IB_MGMT_MAD_DATA,
 };
 
 struct ib_mad_hdr {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0116e4b..64d3479 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -210,6 +210,7 @@ struct ib_device_attr {
 	int			sig_prot_cap;
 	int			sig_guard_cap;
 	struct ib_odp_caps	odp_caps;
+	u32			max_mad_size;
 };
 
 enum ib_mtu {
-- 
1.8.2


* [PATCH v4 07/19] IB/mad: Convert ib_mad_private allocations from kmem_cache to kmalloc
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Use the new max_mad_size specified by devices for the allocations and DMA maps.

kmalloc is more flexible in supporting devices with different MAD sizes, and
research and testing showed that the current use of kmem_cache provides no
performance benefit over kmalloc.
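
A sketch of the before/after allocation is below; the gfp parameter is a
generalization added for illustration (the patch itself uses a fixed flag
per call site):

	static struct ib_mad_private *alloc_mad_buf(struct ib_device *dev,
						    gfp_t gfp)
	{
		/* old: kmem_cache_alloc(ib_mad_cache, gfp) always returned
		 * sizeof(struct ib_mad_private), even for devices limited
		 * to 256-byte MADs; kmalloc can size per device instead.
		 */
		return kmalloc(sizeof(struct ib_mad_private_header) +
			       sizeof(struct ib_grh) +
			       dev->cached_dev_attrs.max_mad_size, gfp);
	}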

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
 drivers/infiniband/core/mad.c | 73 ++++++++++++++++++-------------------------
 1 file changed, 30 insertions(+), 43 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index a6a33cf..cc0a3ad 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -59,8 +59,6 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in number of work requests
 module_param_named(recv_queue_size, mad_recvq_size, int, 0444);
 MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work requests");
 
-static struct kmem_cache *ib_mad_cache;
-
 static struct list_head ib_mad_port_list;
 static u32 ib_mad_client_id = 0;
 
@@ -717,6 +715,13 @@ static void build_smp_wc(struct ib_qp *qp,
 	wc->port_num = port_num;
 }
 
+static struct ib_mad_private *alloc_mad_priv(struct ib_device *dev)
+{
+	return (kmalloc(sizeof(struct ib_mad_private_header) +
+			sizeof(struct ib_grh) +
+			dev->cached_dev_attrs.max_mad_size, GFP_ATOMIC));
+}
+
 /*
  * Return 0 if SMP is to be sent
  * Return 1 if SMP was consumed locally (whether or not solicited)
@@ -771,7 +776,8 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	}
 	local->mad_priv = NULL;
 	local->recv_mad_agent = NULL;
-	mad_priv = kmem_cache_alloc(ib_mad_cache, GFP_ATOMIC);
+
+	mad_priv = alloc_mad_priv(mad_agent_priv->agent.device);
 	if (!mad_priv) {
 		ret = -ENOMEM;
 		dev_err(&device->dev, "No memory for local response MAD\n");
@@ -801,10 +807,10 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 			 */
 			atomic_inc(&mad_agent_priv->refcount);
 		} else
-			kmem_cache_free(ib_mad_cache, mad_priv);
+			kfree(mad_priv);
 		break;
 	case IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED:
-		kmem_cache_free(ib_mad_cache, mad_priv);
+		kfree(mad_priv);
 		break;
 	case IB_MAD_RESULT_SUCCESS:
 		/* Treat like an incoming receive MAD */
@@ -820,14 +826,14 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 			 * No receiving agent so drop packet and
 			 * generate send completion.
 			 */
-			kmem_cache_free(ib_mad_cache, mad_priv);
+			kfree(mad_priv);
 			break;
 		}
 		local->mad_priv = mad_priv;
 		local->recv_mad_agent = recv_mad_agent;
 		break;
 	default:
-		kmem_cache_free(ib_mad_cache, mad_priv);
+		kfree(mad_priv);
 		kfree(local);
 		ret = -EINVAL;
 		goto out;
@@ -1237,7 +1243,7 @@ void ib_free_recv_mad(struct ib_mad_recv_wc *mad_recv_wc)
 					    recv_wc);
 		priv = container_of(mad_priv_hdr, struct ib_mad_private,
 				    header);
-		kmem_cache_free(ib_mad_cache, priv);
+		kfree(priv);
 	}
 }
 EXPORT_SYMBOL(ib_free_recv_mad);
@@ -1924,6 +1930,11 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
 	}
 }
 
+static size_t mad_recv_buf_size(struct ib_device *dev)
+{
+	return(sizeof(struct ib_grh) + dev->cached_dev_attrs.max_mad_size);
+}
+
 static bool generate_unmatched_resp(struct ib_mad_private *recv,
 				    struct ib_mad_private *response)
 {
@@ -1964,8 +1975,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 	recv = container_of(mad_priv_hdr, struct ib_mad_private, header);
 	ib_dma_unmap_single(port_priv->device,
 			    recv->header.mapping,
-			    sizeof(struct ib_mad_private) -
-			      sizeof(struct ib_mad_private_header),
+			    mad_recv_buf_size(port_priv->device),
 			    DMA_FROM_DEVICE);
 
 	/* Setup MAD receive work completion from "normal" work completion */
@@ -1982,7 +1992,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 	if (!validate_mad(&recv->mad.mad.mad_hdr, qp_info->qp->qp_num))
 		goto out;
 
-	response = kmem_cache_alloc(ib_mad_cache, GFP_KERNEL);
+	response = alloc_mad_priv(port_priv->device);
 	if (!response) {
 		dev_err(&port_priv->device->dev,
 			"ib_mad_recv_done_handler no memory for response buffer\n");
@@ -2075,7 +2085,7 @@ out:
 	if (response) {
 		ib_mad_post_receive_mads(qp_info, response);
 		if (recv)
-			kmem_cache_free(ib_mad_cache, recv);
+			kfree(recv);
 	} else
 		ib_mad_post_receive_mads(qp_info, recv);
 }
@@ -2535,7 +2545,7 @@ local_send_completion:
 		spin_lock_irqsave(&mad_agent_priv->lock, flags);
 		atomic_dec(&mad_agent_priv->refcount);
 		if (free_mad)
-			kmem_cache_free(ib_mad_cache, local->mad_priv);
+			kfree(local->mad_priv);
 		kfree(local);
 	}
 	spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
@@ -2664,7 +2674,7 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
 			mad_priv = mad;
 			mad = NULL;
 		} else {
-			mad_priv = kmem_cache_alloc(ib_mad_cache, GFP_KERNEL);
+			mad_priv = alloc_mad_priv(qp_info->port_priv->device);
 			if (!mad_priv) {
 				dev_err(&qp_info->port_priv->device->dev,
 					"No memory for receive buffer\n");
@@ -2674,8 +2684,7 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
 		}
 		sg_list.addr = ib_dma_map_single(qp_info->port_priv->device,
 						 &mad_priv->grh,
-						 sizeof *mad_priv -
-						   sizeof mad_priv->header,
+						 mad_recv_buf_size(qp_info->port_priv->device),
 						 DMA_FROM_DEVICE);
 		if (unlikely(ib_dma_mapping_error(qp_info->port_priv->device,
 						  sg_list.addr))) {
@@ -2699,10 +2708,9 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
 			spin_unlock_irqrestore(&recv_queue->lock, flags);
 			ib_dma_unmap_single(qp_info->port_priv->device,
 					    mad_priv->header.mapping,
-					    sizeof *mad_priv -
-					      sizeof mad_priv->header,
+					    mad_recv_buf_size(qp_info->port_priv->device),
 					    DMA_FROM_DEVICE);
-			kmem_cache_free(ib_mad_cache, mad_priv);
+			kfree(mad_priv);
 			dev_err(&qp_info->port_priv->device->dev,
 				"ib_post_recv failed: %d\n", ret);
 			break;
@@ -2739,10 +2747,9 @@ static void cleanup_recv_queue(struct ib_mad_qp_info *qp_info)
 
 		ib_dma_unmap_single(qp_info->port_priv->device,
 				    recv->header.mapping,
-				    sizeof(struct ib_mad_private) -
-				      sizeof(struct ib_mad_private_header),
+				    mad_recv_buf_size(qp_info->port_priv->device),
 				    DMA_FROM_DEVICE);
-		kmem_cache_free(ib_mad_cache, recv);
+		kfree(recv);
 	}
 
 	qp_info->recv_queue.count = 0;
@@ -3138,45 +3145,25 @@ static struct ib_client mad_client = {
 
 static int __init ib_mad_init_module(void)
 {
-	int ret;
-
 	mad_recvq_size = min(mad_recvq_size, IB_MAD_QP_MAX_SIZE);
 	mad_recvq_size = max(mad_recvq_size, IB_MAD_QP_MIN_SIZE);
 
 	mad_sendq_size = min(mad_sendq_size, IB_MAD_QP_MAX_SIZE);
 	mad_sendq_size = max(mad_sendq_size, IB_MAD_QP_MIN_SIZE);
 
-	ib_mad_cache = kmem_cache_create("ib_mad",
-					 sizeof(struct ib_mad_private),
-					 0,
-					 SLAB_HWCACHE_ALIGN,
-					 NULL);
-	if (!ib_mad_cache) {
-		pr_err("Couldn't create ib_mad cache\n");
-		ret = -ENOMEM;
-		goto error1;
-	}
-
 	INIT_LIST_HEAD(&ib_mad_port_list);
 
 	if (ib_register_client(&mad_client)) {
 		pr_err("Couldn't register ib_mad client\n");
-		ret = -EINVAL;
-		goto error2;
+		return(-EINVAL);
 	}
 
 	return 0;
-
-error2:
-	kmem_cache_destroy(ib_mad_cache);
-error1:
-	return ret;
 }
 
 static void __exit ib_mad_cleanup_module(void)
 {
 	ib_unregister_client(&mad_client);
-	kmem_cache_destroy(ib_mad_cache);
 }
 
 module_init(ib_mad_init_module);
-- 
1.8.2


* [PATCH v4 08/19] IB/mad: Add helper function for smi_handle_dr_smp_send
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

This helper function will be used for processing both IB and OPA SMP sends.
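
For a sense of the intended reuse, an OPA-side wrapper could look roughly
like the sketch below.  The opa_smp field names, opa_get_smp_direction()
and OPA_LID_PERMISSIVE follow the OPA structures introduced later in this
series and are shown here as assumptions; see the OPA SMP processing patch
for the real code:

	/* Assumed OPA wrapper, living alongside the helper in smi.c. */
	enum smi_action opa_smi_handle_dr_smp_send(struct opa_smp *smp,
						   u8 node_type, int port_num)
	{
		return __smi_handle_dr_smp_send(node_type, port_num,
						&smp->hop_ptr, smp->hop_cnt,
						smp->route.dr.initial_path,
						smp->route.dr.return_path,
						opa_get_smp_direction(smp),
						smp->route.dr.dr_dlid ==
							OPA_LID_PERMISSIVE,
						smp->route.dr.dr_slid ==
							OPA_LID_PERMISSIVE);
	}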

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/smi.c | 81 +++++++++++++++++++++++++------------------
 1 file changed, 47 insertions(+), 34 deletions(-)

diff --git a/drivers/infiniband/core/smi.c b/drivers/infiniband/core/smi.c
index 5855e44..3bac6e6 100644
--- a/drivers/infiniband/core/smi.c
+++ b/drivers/infiniband/core/smi.c
@@ -39,84 +39,81 @@
 #include <rdma/ib_smi.h>
 #include "smi.h"
 
-/*
- * Fixup a directed route SMP for sending
- * Return 0 if the SMP should be discarded
- */
-enum smi_action smi_handle_dr_smp_send(struct ib_smp *smp,
-				       u8 node_type, int port_num)
+static inline
+enum smi_action __smi_handle_dr_smp_send(u8 node_type, int port_num,
+					 u8 *hop_ptr, u8 hop_cnt,
+					 u8 *initial_path,
+					 u8 *return_path,
+					 u8 direction,
+					 int dr_dlid_is_permissive,
+					 int dr_slid_is_permissive)
 {
-	u8 hop_ptr, hop_cnt;
-
-	hop_ptr = smp->hop_ptr;
-	hop_cnt = smp->hop_cnt;
-
 	/* See section 14.2.2.2, Vol 1 IB spec */
 	/* C14-6 -- valid hop_cnt values are from 0 to 63 */
 	if (hop_cnt >= IB_SMP_MAX_PATH_HOPS)
 		return IB_SMI_DISCARD;
 
-	if (!ib_get_smp_direction(smp)) {
+	if (!direction) {
 		/* C14-9:1 */
-		if (hop_cnt && hop_ptr == 0) {
-			smp->hop_ptr++;
-			return (smp->initial_path[smp->hop_ptr] ==
+		if (hop_cnt && *hop_ptr == 0) {
+			(*hop_ptr)++;
+			return (initial_path[*hop_ptr] ==
 				port_num ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-9:2 */
-		if (hop_ptr && hop_ptr < hop_cnt) {
+		if (*hop_ptr && *hop_ptr < hop_cnt) {
 			if (node_type != RDMA_NODE_IB_SWITCH)
 				return IB_SMI_DISCARD;
 
-			/* smp->return_path set when received */
-			smp->hop_ptr++;
-			return (smp->initial_path[smp->hop_ptr] ==
+			/* return_path set when received */
+			(*hop_ptr)++;
+			return (initial_path[*hop_ptr] ==
 				port_num ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-9:3 -- We're at the end of the DR segment of path */
-		if (hop_ptr == hop_cnt) {
-			/* smp->return_path set when received */
-			smp->hop_ptr++;
+		if (*hop_ptr == hop_cnt) {
+			/* return_path set when received */
+			(*hop_ptr)++;
 			return (node_type == RDMA_NODE_IB_SWITCH ||
-				smp->dr_dlid == IB_LID_PERMISSIVE ?
+				dr_dlid_is_permissive ?
 				IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-9:4 -- hop_ptr = hop_cnt + 1 -> give to SMA/SM */
 		/* C14-9:5 -- Fail unreasonable hop pointer */
-		return (hop_ptr == hop_cnt + 1 ? IB_SMI_HANDLE : IB_SMI_DISCARD);
+		return (*hop_ptr == hop_cnt + 1 ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 
 	} else {
 		/* C14-13:1 */
-		if (hop_cnt && hop_ptr == hop_cnt + 1) {
-			smp->hop_ptr--;
-			return (smp->return_path[smp->hop_ptr] ==
+		if (hop_cnt && *hop_ptr == hop_cnt + 1) {
+			(*hop_ptr)--;
+			return (return_path[*hop_ptr] ==
 				port_num ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-13:2 */
-		if (2 <= hop_ptr && hop_ptr <= hop_cnt) {
+		if (2 <= *hop_ptr && *hop_ptr <= hop_cnt) {
 			if (node_type != RDMA_NODE_IB_SWITCH)
 				return IB_SMI_DISCARD;
 
-			smp->hop_ptr--;
-			return (smp->return_path[smp->hop_ptr] ==
+			(*hop_ptr)--;
+			return (return_path[*hop_ptr] ==
 				port_num ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-13:3 -- at the end of the DR segment of path */
-		if (hop_ptr == 1) {
-			smp->hop_ptr--;
+		if (*hop_ptr == 1) {
+			(*hop_ptr)--;
 			/* C14-13:3 -- SMPs destined for SM shouldn't be here */
 			return (node_type == RDMA_NODE_IB_SWITCH ||
-				smp->dr_slid == IB_LID_PERMISSIVE ?
+				dr_slid_is_permissive ?
 				IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-13:4 -- hop_ptr = 0 -> should have gone to SM */
-		if (hop_ptr == 0)
+		if (*hop_ptr == 0)
 			return IB_SMI_HANDLE;
 
 		/* C14-13:5 -- Check for unreasonable hop pointer */
@@ -125,6 +122,22 @@ enum smi_action smi_handle_dr_smp_send(struct ib_smp *smp,
 }
 
 /*
+ * Fixup a directed route SMP for sending
+ * Return 0 if the SMP should be discarded
+ */
+enum smi_action smi_handle_dr_smp_send(struct ib_smp *smp,
+				       u8 node_type, int port_num)
+{
+	return __smi_handle_dr_smp_send(node_type, port_num,
+					&smp->hop_ptr, smp->hop_cnt,
+					smp->initial_path,
+					smp->return_path,
+					ib_get_smp_direction(smp),
+					smp->dr_dlid == IB_LID_PERMISSIVE,
+					smp->dr_slid == IB_LID_PERMISSIVE);
+}
+
+/*
  * Adjust information for a received SMP
  * Return 0 if the SMP should be dropped
  */
-- 
1.8.2


* [PATCH v4 09/19] IB/mad: Add helper function for smi_handle_dr_smp_recv
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

This helper function will be used for processing both IB and OPA SMP recvs.

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/smi.c | 80 +++++++++++++++++++++++++------------------
 1 file changed, 47 insertions(+), 33 deletions(-)

diff --git a/drivers/infiniband/core/smi.c b/drivers/infiniband/core/smi.c
index 3bac6e6..24670de 100644
--- a/drivers/infiniband/core/smi.c
+++ b/drivers/infiniband/core/smi.c
@@ -137,91 +137,105 @@ enum smi_action smi_handle_dr_smp_send(struct ib_smp *smp,
 					smp->dr_slid == IB_LID_PERMISSIVE);
 }
 
-/*
- * Adjust information for a received SMP
- * Return 0 if the SMP should be dropped
- */
-enum smi_action smi_handle_dr_smp_recv(struct ib_smp *smp, u8 node_type,
-				       int port_num, int phys_port_cnt)
+static inline
+enum smi_action __smi_handle_dr_smp_recv(u8 node_type, int port_num,
+					 int phys_port_cnt,
+					 u8 *hop_ptr, u8 hop_cnt,
+					 u8 *initial_path,
+					 u8 *return_path,
+					 u8 direction,
+					 int dr_dlid_is_permissive,
+					 int dr_slid_is_permissive)
 {
-	u8 hop_ptr, hop_cnt;
-
-	hop_ptr = smp->hop_ptr;
-	hop_cnt = smp->hop_cnt;
-
 	/* See section 14.2.2.2, Vol 1 IB spec */
 	/* C14-6 -- valid hop_cnt values are from 0 to 63 */
 	if (hop_cnt >= IB_SMP_MAX_PATH_HOPS)
 		return IB_SMI_DISCARD;
 
-	if (!ib_get_smp_direction(smp)) {
+	if (!direction) {
 		/* C14-9:1 -- sender should have incremented hop_ptr */
-		if (hop_cnt && hop_ptr == 0)
+		if (hop_cnt && *hop_ptr == 0)
 			return IB_SMI_DISCARD;
 
 		/* C14-9:2 -- intermediate hop */
-		if (hop_ptr && hop_ptr < hop_cnt) {
+		if (*hop_ptr && *hop_ptr < hop_cnt) {
 			if (node_type != RDMA_NODE_IB_SWITCH)
 				return IB_SMI_DISCARD;
 
-			smp->return_path[hop_ptr] = port_num;
-			/* smp->hop_ptr updated when sending */
-			return (smp->initial_path[hop_ptr+1] <= phys_port_cnt ?
+			return_path[*hop_ptr] = port_num;
+			/* hop_ptr updated when sending */
+			return (initial_path[*hop_ptr+1] <= phys_port_cnt ?
 				IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-9:3 -- We're at the end of the DR segment of path */
-		if (hop_ptr == hop_cnt) {
+		if (*hop_ptr == hop_cnt) {
 			if (hop_cnt)
-				smp->return_path[hop_ptr] = port_num;
-			/* smp->hop_ptr updated when sending */
+				return_path[*hop_ptr] = port_num;
+			/* hop_ptr updated when sending */
 
 			return (node_type == RDMA_NODE_IB_SWITCH ||
-				smp->dr_dlid == IB_LID_PERMISSIVE ?
+				dr_dlid_is_permissive ?
 				IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-9:4 -- hop_ptr = hop_cnt + 1 -> give to SMA/SM */
 		/* C14-9:5 -- fail unreasonable hop pointer */
-		return (hop_ptr == hop_cnt + 1 ? IB_SMI_HANDLE : IB_SMI_DISCARD);
+		return (*hop_ptr == hop_cnt + 1 ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 
 	} else {
 
 		/* C14-13:1 */
-		if (hop_cnt && hop_ptr == hop_cnt + 1) {
-			smp->hop_ptr--;
-			return (smp->return_path[smp->hop_ptr] ==
+		if (hop_cnt && *hop_ptr == hop_cnt + 1) {
+			(*hop_ptr)--;
+			return (return_path[*hop_ptr] ==
 				port_num ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-13:2 */
-		if (2 <= hop_ptr && hop_ptr <= hop_cnt) {
+		if (2 <= *hop_ptr && *hop_ptr <= hop_cnt) {
 			if (node_type != RDMA_NODE_IB_SWITCH)
 				return IB_SMI_DISCARD;
 
-			/* smp->hop_ptr updated when sending */
-			return (smp->return_path[hop_ptr-1] <= phys_port_cnt ?
+			/* hop_ptr updated when sending */
+			return (return_path[*hop_ptr-1] <= phys_port_cnt ?
 				IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-13:3 -- We're at the end of the DR segment of path */
-		if (hop_ptr == 1) {
-			if (smp->dr_slid == IB_LID_PERMISSIVE) {
+		if (*hop_ptr == 1) {
+			if (dr_slid_is_permissive) {
 				/* giving SMP to SM - update hop_ptr */
-				smp->hop_ptr--;
+				(*hop_ptr)--;
 				return IB_SMI_HANDLE;
 			}
-			/* smp->hop_ptr updated when sending */
+			/* hop_ptr updated when sending */
 			return (node_type == RDMA_NODE_IB_SWITCH ?
 				IB_SMI_HANDLE : IB_SMI_DISCARD);
 		}
 
 		/* C14-13:4 -- hop_ptr = 0 -> give to SM */
 		/* C14-13:5 -- Check for unreasonable hop pointer */
-		return (hop_ptr == 0 ? IB_SMI_HANDLE : IB_SMI_DISCARD);
+		return (*hop_ptr == 0 ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 	}
 }
 
+/*
+ * Adjust information for a received SMP
+ * Return 0 if the SMP should be dropped
+ */
+enum smi_action smi_handle_dr_smp_recv(struct ib_smp *smp, u8 node_type,
+				       int port_num, int phys_port_cnt)
+{
+	return __smi_handle_dr_smp_recv(node_type, port_num, phys_port_cnt,
+					&smp->hop_ptr, smp->hop_cnt,
+					smp->initial_path,
+					smp->return_path,
+					ib_get_smp_direction(smp),
+					smp->dr_dlid == IB_LID_PERMISSIVE,
+					smp->dr_slid == IB_LID_PERMISSIVE);
+}
+
 enum smi_forward_action smi_check_forward_dr_smp(struct ib_smp *smp)
 {
 	u8 hop_ptr, hop_cnt;
-- 
1.8.2


* [PATCH v4 10/19] IB/mad: Add helper function for smi_check_forward_dr_smp
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

This helper function will be used for processing both IB and OPA SMPs.

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/smi.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/smi.c b/drivers/infiniband/core/smi.c
index 24670de..8a5fb1d 100644
--- a/drivers/infiniband/core/smi.c
+++ b/drivers/infiniband/core/smi.c
@@ -236,21 +236,20 @@ enum smi_action smi_handle_dr_smp_recv(struct ib_smp *smp, u8 node_type,
 					smp->dr_slid == IB_LID_PERMISSIVE);
 }
 
-enum smi_forward_action smi_check_forward_dr_smp(struct ib_smp *smp)
+static inline
+enum smi_forward_action __smi_check_forward_dr_smp(u8 hop_ptr, u8 hop_cnt,
+						   u8 direction,
+						   int dr_dlid_is_permissive,
+						   int dr_slid_is_permissive)
 {
-	u8 hop_ptr, hop_cnt;
-
-	hop_ptr = smp->hop_ptr;
-	hop_cnt = smp->hop_cnt;
-
-	if (!ib_get_smp_direction(smp)) {
+	if (!direction) {
 		/* C14-9:2 -- intermediate hop */
 		if (hop_ptr && hop_ptr < hop_cnt)
 			return IB_SMI_FORWARD;
 
 		/* C14-9:3 -- at the end of the DR segment of path */
 		if (hop_ptr == hop_cnt)
-			return (smp->dr_dlid == IB_LID_PERMISSIVE ?
+			return (dr_dlid_is_permissive ?
 				IB_SMI_SEND : IB_SMI_LOCAL);
 
 		/* C14-9:4 -- hop_ptr = hop_cnt + 1 -> give to SMA/SM */
@@ -263,10 +262,19 @@ enum smi_forward_action smi_check_forward_dr_smp(struct ib_smp *smp)
 
 		/* C14-13:3 -- at the end of the DR segment of path */
 		if (hop_ptr == 1)
-			return (smp->dr_slid != IB_LID_PERMISSIVE ?
+			return (dr_slid_is_permissive ?
 				IB_SMI_SEND : IB_SMI_LOCAL);
 	}
 	return IB_SMI_LOCAL;
+
+}
+
+enum smi_forward_action smi_check_forward_dr_smp(struct ib_smp *smp)
+{
+	return __smi_check_forward_dr_smp(smp->hop_ptr, smp->hop_cnt,
+					  ib_get_smp_direction(smp),
+					  smp->dr_dlid == IB_LID_PERMISSIVE,
+					  smp->dr_slid != IB_LID_PERMISSIVE);
 }
 
 /*
-- 
1.8.2


* [PATCH v4 11/19] IB/mad: Add helper function for SMI processing
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

This helper function will be used for processing both IB and OPA SMPs.

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/mad.c | 85 +++++++++++++++++++++++++------------------
 1 file changed, 49 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index cc0a3ad..2ffeace 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -1930,6 +1930,52 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
 	}
 }
 
+static enum smi_action handle_ib_smi(struct ib_mad_port_private *port_priv,
+				     struct ib_mad_qp_info *qp_info,
+				     struct ib_wc *wc,
+				     int port_num,
+				     struct ib_mad_private *recv,
+				     struct ib_mad_private *response)
+{
+	enum smi_forward_action retsmi;
+
+	if (smi_handle_dr_smp_recv(&recv->mad.smp,
+				   port_priv->device->node_type,
+				   port_num,
+				   port_priv->device->phys_port_cnt) ==
+				   IB_SMI_DISCARD)
+		return IB_SMI_DISCARD;
+
+	retsmi = smi_check_forward_dr_smp(&recv->mad.smp);
+	if (retsmi == IB_SMI_LOCAL)
+		return IB_SMI_HANDLE;
+
+	if (retsmi == IB_SMI_SEND) { /* don't forward */
+		if (smi_handle_dr_smp_send(&recv->mad.smp,
+					   port_priv->device->node_type,
+					   port_num) == IB_SMI_DISCARD)
+			return IB_SMI_DISCARD;
+
+		if (smi_check_local_smp(&recv->mad.smp, port_priv->device) == IB_SMI_DISCARD)
+			return IB_SMI_DISCARD;
+	} else if (port_priv->device->node_type == RDMA_NODE_IB_SWITCH) {
+		/* forward case for switches */
+		memcpy(response, recv, sizeof(*response));
+		response->header.recv_wc.wc = &response->header.wc;
+		response->header.recv_wc.recv_buf.mad = &response->mad.mad;
+		response->header.recv_wc.recv_buf.grh = &response->grh;
+
+		agent_send_response(&response->mad.mad,
+				    &response->grh, wc,
+				    port_priv->device,
+				    smi_get_fwd_port(&recv->mad.smp),
+				    qp_info->qp->qp_num);
+
+		return IB_SMI_DISCARD;
+	}
+	return IB_SMI_HANDLE;
+}
+
 static size_t mad_recv_buf_size(struct ib_device *dev)
 {
 	return(sizeof(struct ib_grh) + dev->cached_dev_attrs.max_mad_size);
@@ -2006,45 +2052,12 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 
 	if (recv->mad.mad.mad_hdr.mgmt_class ==
 	    IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) {
-		enum smi_forward_action retsmi;
-
-		if (smi_handle_dr_smp_recv(&recv->mad.smp,
-					   port_priv->device->node_type,
-					   port_num,
-					   port_priv->device->phys_port_cnt) ==
-					   IB_SMI_DISCARD)
+		if (handle_ib_smi(port_priv, qp_info, wc, port_num, recv,
+				  response)
+		    == IB_SMI_DISCARD)
 			goto out;
-
-		retsmi = smi_check_forward_dr_smp(&recv->mad.smp);
-		if (retsmi == IB_SMI_LOCAL)
-			goto local;
-
-		if (retsmi == IB_SMI_SEND) { /* don't forward */
-			if (smi_handle_dr_smp_send(&recv->mad.smp,
-						   port_priv->device->node_type,
-						   port_num) == IB_SMI_DISCARD)
-				goto out;
-
-			if (smi_check_local_smp(&recv->mad.smp, port_priv->device) == IB_SMI_DISCARD)
-				goto out;
-		} else if (port_priv->device->node_type == RDMA_NODE_IB_SWITCH) {
-			/* forward case for switches */
-			memcpy(response, recv, sizeof(*response));
-			response->header.recv_wc.wc = &response->header.wc;
-			response->header.recv_wc.recv_buf.mad = &response->mad.mad;
-			response->header.recv_wc.recv_buf.grh = &response->grh;
-
-			agent_send_response(&response->mad.mad,
-					    &response->grh, wc,
-					    port_priv->device,
-					    smi_get_fwd_port(&recv->mad.smp),
-					    qp_info->qp->qp_num);
-
-			goto out;
-		}
 	}
 
-local:
 	/* Give driver "right of first refusal" on incoming MAD */
 	if (port_priv->device->process_mad) {
 		ret = port_priv->device->process_mad(port_priv->device, 0,
-- 
1.8.2


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v4 12/19] IB/mad: Add MAD size parameters to process_mad
       [not found] ` <1423092585-26692-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (10 preceding siblings ...)
  2015-02-04 23:29   ` [PATCH v4 11/19] IB/mad: Add helper function for SMI processing ira.weiny-ral2JQCrhuEAvxtiuMwx3w
@ 2015-02-04 23:29   ` ira.weiny-ral2JQCrhuEAvxtiuMwx3w
       [not found]     ` <1423092585-26692-13-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2015-02-04 23:29   ` [PATCH v4 13/19] IB/mad: Add base version parameter to ib_create_send_mad ira.weiny-ral2JQCrhuEAvxtiuMwx3w
                     ` (7 subsequent siblings)
  19 siblings, 1 reply; 84+ messages in thread
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

In support of variable-length MADs, add "in" and "out" MAD size parameters to
the process_mad call.

The "out" MAD size parameter is passed by reference so that the agent can
update it to indicate the proper response length for the MAD stack to send.

The "in" and "out" MAD parameters are made generic by specifying them as
ib_mad_hdr rather than ib_mad.

Drivers are modified as needed.
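
The per-driver change is small and uniform; for an IB-only driver it reduces
to a size check plus casts (a sketch mirroring the hunks below; "foo" is a
placeholder driver name):

	static int foo_process_mad(struct ib_device *ibdev, int mad_flags,
				   u8 port_num, struct ib_wc *in_wc,
				   struct ib_grh *in_grh,
				   struct ib_mad_hdr *in, size_t in_mad_size,
				   struct ib_mad_hdr *out, size_t *out_mad_size)
	{
		struct ib_mad *in_mad = (struct ib_mad *)in;
		struct ib_mad *out_mad = (struct ib_mad *)out;

		/* IB drivers handle only fixed 256-byte MADs */
		if (in_mad_size != sizeof(*in_mad) ||
		    *out_mad_size != sizeof(*out_mad))
			return IB_MAD_RESULT_FAILURE;

		/* ... class-specific processing exactly as before ... */
		return IB_MAD_RESULT_SUCCESS;
	}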

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
 drivers/infiniband/core/mad.c                | 30 ++++++++++++++++++----------
 drivers/infiniband/core/sysfs.c              |  5 ++++-
 drivers/infiniband/hw/amso1100/c2_provider.c |  5 ++++-
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |  5 ++++-
 drivers/infiniband/hw/cxgb4/provider.c       |  7 +++++--
 drivers/infiniband/hw/ehca/ehca_iverbs.h     |  4 ++--
 drivers/infiniband/hw/ehca/ehca_sqp.c        |  8 +++++++-
 drivers/infiniband/hw/ipath/ipath_mad.c      |  8 +++++++-
 drivers/infiniband/hw/ipath/ipath_verbs.h    |  3 ++-
 drivers/infiniband/hw/mlx4/mad.c             |  9 ++++++++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h         |  3 ++-
 drivers/infiniband/hw/mlx5/mad.c             |  8 +++++++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h         |  3 ++-
 drivers/infiniband/hw/mthca/mthca_dev.h      |  4 ++--
 drivers/infiniband/hw/mthca/mthca_mad.c      |  9 +++++++--
 drivers/infiniband/hw/nes/nes_verbs.c        |  3 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c     |  3 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h     |  3 ++-
 drivers/infiniband/hw/qib/qib_mad.c          |  8 +++++++-
 drivers/infiniband/hw/qib/qib_verbs.h        |  3 ++-
 include/rdma/ib_verbs.h                      |  8 +++++---
 21 files changed, 103 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 2ffeace..4d93ad2 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -715,11 +715,12 @@ static void build_smp_wc(struct ib_qp *qp,
 	wc->port_num = port_num;
 }
 
-static struct ib_mad_private *alloc_mad_priv(struct ib_device *dev)
+static struct ib_mad_private *alloc_mad_priv(struct ib_device *dev,
+					     size_t *mad_size)
 {
+	*mad_size = dev->cached_dev_attrs.max_mad_size;
 	return (kmalloc(sizeof(struct ib_mad_private_header) +
-			sizeof(struct ib_grh) +
-			dev->cached_dev_attrs.max_mad_size, GFP_ATOMIC));
+			sizeof(struct ib_grh) + *mad_size, GFP_ATOMIC));
 }
 
 /*
@@ -741,6 +742,8 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	u8 port_num;
 	struct ib_wc mad_wc;
 	struct ib_send_wr *send_wr = &mad_send_wr->send_wr;
+	size_t in_mad_size = mad_agent_priv->agent.device->cached_dev_attrs.max_mad_size;
+	size_t out_mad_size;
 
 	if (device->node_type == RDMA_NODE_IB_SWITCH &&
 	    smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
@@ -777,7 +780,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	local->mad_priv = NULL;
 	local->recv_mad_agent = NULL;
 
-	mad_priv = alloc_mad_priv(mad_agent_priv->agent.device);
+	mad_priv = alloc_mad_priv(mad_agent_priv->agent.device, &out_mad_size);
 	if (!mad_priv) {
 		ret = -ENOMEM;
 		dev_err(&device->dev, "No memory for local response MAD\n");
@@ -792,8 +795,9 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 
 	/* No GRH for DR SMP */
 	ret = device->process_mad(device, 0, port_num, &mad_wc, NULL,
-				  (struct ib_mad *)smp,
-				  (struct ib_mad *)&mad_priv->mad);
+				  (struct ib_mad_hdr *)smp, in_mad_size,
+				  (struct ib_mad_hdr *)&mad_priv->mad,
+				  &out_mad_size);
 	switch (ret)
 	{
 	case IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY:
@@ -2011,6 +2015,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 	struct ib_mad_agent_private *mad_agent;
 	int port_num;
 	int ret = IB_MAD_RESULT_SUCCESS;
+	size_t resp_mad_size;
 
 	mad_list = (struct ib_mad_list_head *)(unsigned long)wc->wr_id;
 	qp_info = mad_list->mad_queue->qp_info;
@@ -2038,7 +2043,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 	if (!validate_mad(&recv->mad.mad.mad_hdr, qp_info->qp->qp_num))
 		goto out;
 
-	response = alloc_mad_priv(port_priv->device);
+	response = alloc_mad_priv(port_priv->device, &resp_mad_size);
 	if (!response) {
 		dev_err(&port_priv->device->dev,
 			"ib_mad_recv_done_handler no memory for response buffer\n");
@@ -2063,8 +2068,10 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 		ret = port_priv->device->process_mad(port_priv->device, 0,
 						     port_priv->port_num,
 						     wc, &recv->grh,
-						     &recv->mad.mad,
-						     &response->mad.mad);
+						     (struct ib_mad_hdr *)&recv->mad.mad,
+						     port_priv->device->cached_dev_attrs.max_mad_size,
+						     (struct ib_mad_hdr *)&response->mad.mad,
+						     &resp_mad_size);
 		if (ret & IB_MAD_RESULT_SUCCESS) {
 			if (ret & IB_MAD_RESULT_CONSUMED)
 				goto out;
@@ -2687,7 +2694,10 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
 			mad_priv = mad;
 			mad = NULL;
 		} else {
-			mad_priv = alloc_mad_priv(qp_info->port_priv->device);
+			size_t mad_size;
+
+			mad_priv = alloc_mad_priv(qp_info->port_priv->device,
+						  &mad_size);
 			if (!mad_priv) {
 				dev_err(&qp_info->port_priv->device->dev,
 					"No memory for receive buffer\n");
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index cbd0383..a59bb8f 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -326,6 +326,7 @@ static ssize_t show_pma_counter(struct ib_port *p, struct port_attribute *attr,
 	int width  = (tab_attr->index >> 16) & 0xff;
 	struct ib_mad *in_mad  = NULL;
 	struct ib_mad *out_mad = NULL;
+	size_t out_mad_size = sizeof(*out_mad);
 	ssize_t ret;
 
 	if (!p->ibdev->process_mad)
@@ -347,7 +348,9 @@ static ssize_t show_pma_counter(struct ib_port *p, struct port_attribute *attr,
 	in_mad->data[41] = p->port_num;	/* PortSelect field */
 
 	if ((p->ibdev->process_mad(p->ibdev, IB_MAD_IGNORE_MKEY,
-		 p->port_num, NULL, NULL, in_mad, out_mad) &
+		 p->port_num, NULL, NULL,
+		 (struct ib_mad_hdr *)in_mad, sizeof(*in_mad),
+		 (struct ib_mad_hdr *)out_mad, &out_mad_size) &
 	     (IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY)) !=
 	    (IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY)) {
 		ret = -EINVAL;
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index bdf3507..94926e6 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -584,7 +584,10 @@ static int c2_process_mad(struct ib_device *ibdev,
 			  u8 port_num,
 			  struct ib_wc *in_wc,
 			  struct ib_grh *in_grh,
-			  struct ib_mad *in_mad, struct ib_mad *out_mad)
+			  struct ib_mad_hdr *in_mad,
+			  size_t in_mad_size,
+			  struct ib_mad_hdr *out_mad,
+			  size_t *out_mad_size)
 {
 	pr_debug("%s:%u\n", __func__, __LINE__);
 	return -ENOSYS;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index b8a80aa0..226f4e3 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -87,7 +87,10 @@ static int iwch_process_mad(struct ib_device *ibdev,
 			    u8 port_num,
 			    struct ib_wc *in_wc,
 			    struct ib_grh *in_grh,
-			    struct ib_mad *in_mad, struct ib_mad *out_mad)
+			    struct ib_mad_hdr *in_mad,
+			    size_t in_mad_size,
+			    struct ib_mad_hdr *out_mad,
+			    size_t *out_mad_size)
 {
 	return -ENOSYS;
 }
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 299c70c..a4cfe9a 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -81,8 +81,11 @@ static int c4iw_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 
 static int c4iw_process_mad(struct ib_device *ibdev, int mad_flags,
 			    u8 port_num, struct ib_wc *in_wc,
-			    struct ib_grh *in_grh, struct ib_mad *in_mad,
-			    struct ib_mad *out_mad)
+			    struct ib_grh *in_grh,
+			    struct ib_mad_hdr *in_mad,
+			    size_t in_mad_size,
+			    struct ib_mad_hdr *out_mad,
+			    size_t *out_mad_size)
 {
 	return -ENOSYS;
 }
diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
index 22f79af..d9fa829 100644
--- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
+++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
@@ -189,8 +189,8 @@ int ehca_mmap(struct ib_ucontext *context, struct vm_area_struct *vma);
 
 int ehca_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 		     struct ib_wc *in_wc, struct ib_grh *in_grh,
-		     struct ib_mad *in_mad,
-		     struct ib_mad *out_mad);
+		     struct ib_mad_hdr *in, size_t in_mad_size,
+		     struct ib_mad_hdr *out, size_t *out_mad_size);
 
 void ehca_poll_eqs(unsigned long data);
 
diff --git a/drivers/infiniband/hw/ehca/ehca_sqp.c b/drivers/infiniband/hw/ehca/ehca_sqp.c
index dba8f9f..d4ed490 100644
--- a/drivers/infiniband/hw/ehca/ehca_sqp.c
+++ b/drivers/infiniband/hw/ehca/ehca_sqp.c
@@ -218,9 +218,15 @@ perf_reply:
 
 int ehca_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 		     struct ib_wc *in_wc, struct ib_grh *in_grh,
-		     struct ib_mad *in_mad, struct ib_mad *out_mad)
+		     struct ib_mad_hdr *in, size_t in_mad_size,
+		     struct ib_mad_hdr *out, size_t *out_mad_size)
 {
 	int ret;
+	struct ib_mad *in_mad = (struct ib_mad *)in;
+	struct ib_mad *out_mad = (struct ib_mad *)out;
+
+	if (in_mad_size != sizeof(*in_mad) || *out_mad_size != sizeof(*out_mad))
+		return IB_MAD_RESULT_FAILURE;
 
 	if (!port_num || port_num > ibdev->phys_port_cnt || !in_wc)
 		return IB_MAD_RESULT_FAILURE;
diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c b/drivers/infiniband/hw/ipath/ipath_mad.c
index e890e5b..d554089 100644
--- a/drivers/infiniband/hw/ipath/ipath_mad.c
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c
@@ -1491,9 +1491,15 @@ bail:
  */
 int ipath_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 		      struct ib_wc *in_wc, struct ib_grh *in_grh,
-		      struct ib_mad *in_mad, struct ib_mad *out_mad)
+		      struct ib_mad_hdr *in, size_t in_mad_size,
+		      struct ib_mad_hdr *out, size_t *out_mad_size)
 {
 	int ret;
+	struct ib_mad *in_mad = (struct ib_mad *)in;
+	struct ib_mad *out_mad = (struct ib_mad *)out;
+
+	if (in_mad_size != sizeof(*in_mad) || *out_mad_size != sizeof(*out_mad))
+		return IB_MAD_RESULT_FAILURE;
 
 	switch (in_mad->mad_hdr.mgmt_class) {
 	case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE:
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h
index ae6cff4..cd8dd09 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h
@@ -703,7 +703,8 @@ int ipath_process_mad(struct ib_device *ibdev,
 		      u8 port_num,
 		      struct ib_wc *in_wc,
 		      struct ib_grh *in_grh,
-		      struct ib_mad *in_mad, struct ib_mad *out_mad);
+		      struct ib_mad_hdr *in, size_t in_mad_size,
+		      struct ib_mad_hdr *out, size_t *out_mad_size);
 
 /*
  * Compare the lower 24 bits of the two values.
diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index 82a7dd8..4acb3ee 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -855,8 +855,15 @@ static int iboe_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 
 int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 			struct ib_wc *in_wc, struct ib_grh *in_grh,
-			struct ib_mad *in_mad, struct ib_mad *out_mad)
+			struct ib_mad_hdr *in, size_t in_mad_size,
+			struct ib_mad_hdr *out, size_t *out_mad_size)
 {
+	struct ib_mad *in_mad = (struct ib_mad *)in;
+	struct ib_mad *out_mad = (struct ib_mad *)out;
+
+	if (in_mad_size != sizeof(*in_mad) || *out_mad_size != sizeof(*out_mad))
+		return IB_MAD_RESULT_FAILURE;
+
 	switch (rdma_port_get_link_layer(ibdev, port_num)) {
 	case IB_LINK_LAYER_INFINIBAND:
 		return ib_process_mad(ibdev, mad_flags, port_num, in_wc,
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 6eb743f..c5960fe 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -690,7 +690,8 @@ int mlx4_MAD_IFC(struct mlx4_ib_dev *dev, int mad_ifc_flags,
 		 void *in_mad, void *response_mad);
 int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags,	u8 port_num,
 			struct ib_wc *in_wc, struct ib_grh *in_grh,
-			struct ib_mad *in_mad, struct ib_mad *out_mad);
+			struct ib_mad_hdr *in, size_t in_mad_size,
+			struct ib_mad_hdr *out, size_t *out_mad_size);
 int mlx4_ib_mad_init(struct mlx4_ib_dev *dev);
 void mlx4_ib_mad_cleanup(struct mlx4_ib_dev *dev);
 
diff --git a/drivers/infiniband/hw/mlx5/mad.c b/drivers/infiniband/hw/mlx5/mad.c
index 657af9a..c790066 100644
--- a/drivers/infiniband/hw/mlx5/mad.c
+++ b/drivers/infiniband/hw/mlx5/mad.c
@@ -59,10 +59,16 @@ int mlx5_MAD_IFC(struct mlx5_ib_dev *dev, int ignore_mkey, int ignore_bkey,
 
 int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 			struct ib_wc *in_wc, struct ib_grh *in_grh,
-			struct ib_mad *in_mad, struct ib_mad *out_mad)
+			struct ib_mad_hdr *in, size_t in_mad_size,
+			struct ib_mad_hdr *out, size_t *out_mad_size)
 {
 	u16 slid;
 	int err;
+	struct ib_mad *in_mad = (struct ib_mad *)in;
+	struct ib_mad *out_mad = (struct ib_mad *)out;
+
+	if (in_mad_size != sizeof(*in_mad) || *out_mad_size != sizeof(*out_mad))
+		return IB_MAD_RESULT_FAILURE;
 
 	slid = in_wc ? in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE);
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 83f22fe..7897d35 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -589,7 +589,8 @@ int mlx5_ib_unmap_fmr(struct list_head *fmr_list);
 int mlx5_ib_fmr_dealloc(struct ib_fmr *ibfmr);
 int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 			struct ib_wc *in_wc, struct ib_grh *in_grh,
-			struct ib_mad *in_mad, struct ib_mad *out_mad);
+			struct ib_mad_hdr *in, size_t in_mad_size,
+			struct ib_mad_hdr *out, size_t *out_mad_size);
 struct ib_xrcd *mlx5_ib_alloc_xrcd(struct ib_device *ibdev,
 					  struct ib_ucontext *context,
 					  struct ib_udata *udata);
diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h
index 7e6a6d6..60d15f1 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -578,8 +578,8 @@ int mthca_process_mad(struct ib_device *ibdev,
 		      u8 port_num,
 		      struct ib_wc *in_wc,
 		      struct ib_grh *in_grh,
-		      struct ib_mad *in_mad,
-		      struct ib_mad *out_mad);
+		      struct ib_mad_hdr *in, size_t in_mad_size,
+		      struct ib_mad_hdr *out, size_t *out_mad_size);
 int mthca_create_agents(struct mthca_dev *dev);
 void mthca_free_agents(struct mthca_dev *dev);
 
diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c
index 8881fa3..5d597a1 100644
--- a/drivers/infiniband/hw/mthca/mthca_mad.c
+++ b/drivers/infiniband/hw/mthca/mthca_mad.c
@@ -197,13 +197,18 @@ int mthca_process_mad(struct ib_device *ibdev,
 		      u8 port_num,
 		      struct ib_wc *in_wc,
 		      struct ib_grh *in_grh,
-		      struct ib_mad *in_mad,
-		      struct ib_mad *out_mad)
+		      struct ib_mad_hdr *in, size_t in_mad_size,
+		      struct ib_mad_hdr *out, size_t *out_mad_size)
 {
 	int err;
 	u16 slid = in_wc ? in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE);
 	u16 prev_lid = 0;
 	struct ib_port_attr pattr;
+	struct ib_mad *in_mad = (struct ib_mad *)in;
+	struct ib_mad *out_mad = (struct ib_mad *)out;
+
+	if (in_mad_size != sizeof(*in_mad) || *out_mad_size != sizeof(*out_mad))
+		return IB_MAD_RESULT_FAILURE;
 
 	/* Forward locally generated traps to the SM */
 	if (in_mad->mad_hdr.method == IB_MGMT_METHOD_TRAP &&
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index 93e67e2..3aa038f 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -3224,7 +3224,8 @@ static int nes_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
  */
 static int nes_process_mad(struct ib_device *ibdev, int mad_flags,
 		u8 port_num, struct ib_wc *in_wc, struct ib_grh *in_grh,
-		struct ib_mad *in_mad, struct ib_mad *out_mad)
+		struct ib_mad_hdr *in, size_t in_mad_size,
+		struct ib_mad_hdr *out, size_t *out_mad_size)
 {
 	nes_debug(NES_DBG_INIT, "\n");
 	return -ENOSYS;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index f3cc8c9..8f22518 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -189,7 +189,8 @@ int ocrdma_process_mad(struct ib_device *ibdev,
 		       u8 port_num,
 		       struct ib_wc *in_wc,
 		       struct ib_grh *in_grh,
-		       struct ib_mad *in_mad, struct ib_mad *out_mad)
+		       struct ib_mad_hdr *in, size_t in_mad_size,
+		       struct ib_mad_hdr *out, size_t *out_mad_size)
 {
 	return IB_MAD_RESULT_SUCCESS;
 }
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
index 8ac49e7..78e45ec 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
@@ -38,5 +38,6 @@ int ocrdma_process_mad(struct ib_device *,
 		       u8 port_num,
 		       struct ib_wc *in_wc,
 		       struct ib_grh *in_grh,
-		       struct ib_mad *in_mad, struct ib_mad *out_mad);
+		       struct ib_mad_hdr *in, size_t in_mad_size,
+		       struct ib_mad_hdr *out, size_t *out_mad_size);
 #endif				/* __OCRDMA_AH_H__ */
diff --git a/drivers/infiniband/hw/qib/qib_mad.c b/drivers/infiniband/hw/qib/qib_mad.c
index 636be11..fb022a3 100644
--- a/drivers/infiniband/hw/qib/qib_mad.c
+++ b/drivers/infiniband/hw/qib/qib_mad.c
@@ -2401,11 +2401,17 @@ bail:
  */
 int qib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port,
 		    struct ib_wc *in_wc, struct ib_grh *in_grh,
-		    struct ib_mad *in_mad, struct ib_mad *out_mad)
+		    struct ib_mad_hdr *in, size_t in_mad_size,
+		    struct ib_mad_hdr *out, size_t *out_mad_size)
 {
 	int ret;
 	struct qib_ibport *ibp = to_iport(ibdev, port);
 	struct qib_pportdata *ppd = ppd_from_ibp(ibp);
+	struct ib_mad *in_mad = (struct ib_mad *)in;
+	struct ib_mad *out_mad = (struct ib_mad *)out;
+
+	if (in_mad_size != sizeof(*in_mad) || *out_mad_size != sizeof(*out_mad))
+		return IB_MAD_RESULT_FAILURE;
 
 	switch (in_mad->mad_hdr.mgmt_class) {
 	case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE:
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index bfc8948..77f1d31 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -873,7 +873,8 @@ void qib_sys_guid_chg(struct qib_ibport *ibp);
 void qib_node_desc_chg(struct qib_ibport *ibp);
 int qib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 		    struct ib_wc *in_wc, struct ib_grh *in_grh,
-		    struct ib_mad *in_mad, struct ib_mad *out_mad);
+		    struct ib_mad_hdr *in, size_t in_mad_size,
+		    struct ib_mad_hdr *out, size_t *out_mad_size);
 int qib_create_agents(struct qib_ibdev *dev);
 void qib_free_agents(struct qib_ibdev *dev);
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 64d3479..3ab4033 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1408,7 +1408,7 @@ struct ib_flow {
 	struct ib_uobject	*uobject;
 };
 
-struct ib_mad;
+struct ib_mad_hdr;
 struct ib_grh;
 
 enum ib_process_mad_flags {
@@ -1640,8 +1640,10 @@ struct ib_device {
 						  u8 port_num,
 						  struct ib_wc *in_wc,
 						  struct ib_grh *in_grh,
-						  struct ib_mad *in_mad,
-						  struct ib_mad *out_mad);
+						  struct ib_mad_hdr *in_mad,
+						  size_t in_mad_size,
+						  struct ib_mad_hdr *out_mad,
+						  size_t *out_mad_size);
 	struct ib_xrcd *	   (*alloc_xrcd)(struct ib_device *device,
 						 struct ib_ucontext *ucontext,
 						 struct ib_udata *udata);
-- 
1.8.2


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v4 13/19] IB/mad: Add base version parameter to ib_create_send_mad
       [not found] ` <1423092585-26692-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (11 preceding siblings ...)
  2015-02-04 23:29   ` [PATCH v4 12/19] IB/mad: Add MAD size parameters to process_mad ira.weiny-ral2JQCrhuEAvxtiuMwx3w
@ 2015-02-04 23:29   ` ira.weiny-ral2JQCrhuEAvxtiuMwx3w
  2015-02-04 23:29   ` [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag ira.weiny-ral2JQCrhuEAvxtiuMwx3w
                     ` (6 subsequent siblings)
  19 siblings, 0 replies; 84+ messages in thread
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

In preparation for supporting the new OPA MAD base version, add a base version
parameter to ib_create_send_mad and set it to IB_MGMT_BASE_VERSION for all
current users.

The new base version and its processing are defined in later patches.
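
Every existing call site simply passes the IB base version explicitly; e.g.
(abbreviated from the agent.c hunk below, with error handling shortened):

	send_buf = ib_create_send_mad(agent, wc->src_qp, wc->pkey_index, 0,
				      IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
				      GFP_KERNEL, IB_MGMT_BASE_VERSION);
	if (IS_ERR(send_buf))
		return PTR_ERR(send_buf);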

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/agent.c         | 3 ++-
 drivers/infiniband/core/cm.c            | 6 ++++--
 drivers/infiniband/core/mad.c           | 3 ++-
 drivers/infiniband/core/mad_rmpp.c      | 6 ++++--
 drivers/infiniband/core/sa_query.c      | 3 ++-
 drivers/infiniband/core/user_mad.c      | 3 ++-
 drivers/infiniband/hw/mlx4/mad.c        | 3 ++-
 drivers/infiniband/hw/mthca/mthca_mad.c | 3 ++-
 drivers/infiniband/hw/qib/qib_iba7322.c | 3 ++-
 drivers/infiniband/hw/qib/qib_mad.c     | 3 ++-
 drivers/infiniband/ulp/srpt/ib_srpt.c   | 3 ++-
 include/rdma/ib_mad.h                   | 4 +++-
 12 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index f6d2961..b6bd305 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -108,7 +108,8 @@ void agent_send_response(struct ib_mad *mad, struct ib_grh *grh,
 
 	send_buf = ib_create_send_mad(agent, wc->src_qp, wc->pkey_index, 0,
 				      IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
-				      GFP_KERNEL);
+				      GFP_KERNEL,
+				      IB_MGMT_BASE_VERSION);
 	if (IS_ERR(send_buf)) {
 		dev_err(&device->dev, "ib_create_send_mad error\n");
 		goto err1;
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index e28a494..5767781 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -267,7 +267,8 @@ static int cm_alloc_msg(struct cm_id_private *cm_id_priv,
 	m = ib_create_send_mad(mad_agent, cm_id_priv->id.remote_cm_qpn,
 			       cm_id_priv->av.pkey_index,
 			       0, IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
-			       GFP_ATOMIC);
+			       GFP_ATOMIC,
+			       IB_MGMT_BASE_VERSION);
 	if (IS_ERR(m)) {
 		ib_destroy_ah(ah);
 		return PTR_ERR(m);
@@ -297,7 +298,8 @@ static int cm_alloc_response_msg(struct cm_port *port,
 
 	m = ib_create_send_mad(port->mad_agent, 1, mad_recv_wc->wc->pkey_index,
 			       0, IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
-			       GFP_ATOMIC);
+			       GFP_ATOMIC,
+			       IB_MGMT_BASE_VERSION);
 	if (IS_ERR(m)) {
 		ib_destroy_ah(ah);
 		return PTR_ERR(m);
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 4d93ad2..2145294 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -930,7 +930,8 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
 					    u32 remote_qpn, u16 pkey_index,
 					    int rmpp_active,
 					    int hdr_len, int data_len,
-					    gfp_t gfp_mask)
+					    gfp_t gfp_mask,
+					    u8 base_version)
 {
 	struct ib_mad_agent_private *mad_agent_priv;
 	struct ib_mad_send_wr_private *mad_send_wr;
diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
index f37878c..2379e2d 100644
--- a/drivers/infiniband/core/mad_rmpp.c
+++ b/drivers/infiniband/core/mad_rmpp.c
@@ -139,7 +139,8 @@ static void ack_recv(struct mad_rmpp_recv *rmpp_recv,
 	hdr_len = ib_get_mad_data_offset(recv_wc->recv_buf.mad->mad_hdr.mgmt_class);
 	msg = ib_create_send_mad(&rmpp_recv->agent->agent, recv_wc->wc->src_qp,
 				 recv_wc->wc->pkey_index, 1, hdr_len,
-				 0, GFP_KERNEL);
+				 0, GFP_KERNEL,
+				 IB_MGMT_BASE_VERSION);
 	if (IS_ERR(msg))
 		return;
 
@@ -165,7 +166,8 @@ static struct ib_mad_send_buf *alloc_response_msg(struct ib_mad_agent *agent,
 	hdr_len = ib_get_mad_data_offset(recv_wc->recv_buf.mad->mad_hdr.mgmt_class);
 	msg = ib_create_send_mad(agent, recv_wc->wc->src_qp,
 				 recv_wc->wc->pkey_index, 1,
-				 hdr_len, 0, GFP_KERNEL);
+				 hdr_len, 0, GFP_KERNEL,
+				 IB_MGMT_BASE_VERSION);
 	if (IS_ERR(msg))
 		ib_destroy_ah(ah);
 	else {
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index c38f030..32c3fe6 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -583,7 +583,8 @@ static int alloc_mad(struct ib_sa_query *query, gfp_t gfp_mask)
 	query->mad_buf = ib_create_send_mad(query->port->agent, 1,
 					    query->sm_ah->pkey_index,
 					    0, IB_MGMT_SA_HDR, IB_MGMT_SA_DATA,
-					    gfp_mask);
+					    gfp_mask,
+					    IB_MGMT_BASE_VERSION);
 	if (IS_ERR(query->mad_buf)) {
 		kref_put(&query->sm_ah->ref, free_sm_ah);
 		return -ENOMEM;
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 66b5217..9628494 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -521,7 +521,8 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
 	packet->msg = ib_create_send_mad(agent,
 					 be32_to_cpu(packet->mad.hdr.qpn),
 					 packet->mad.hdr.pkey_index, rmpp_active,
-					 hdr_len, data_len, GFP_KERNEL);
+					 hdr_len, data_len, GFP_KERNEL,
+					 IB_MGMT_BASE_VERSION);
 	if (IS_ERR(packet->msg)) {
 		ret = PTR_ERR(packet->msg);
 		goto err_ah;
diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index 4acb3ee..cd97722 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -358,7 +358,8 @@ static void forward_trap(struct mlx4_ib_dev *dev, u8 port_num, struct ib_mad *ma
 
 	if (agent) {
 		send_buf = ib_create_send_mad(agent, qpn, 0, 0, IB_MGMT_MAD_HDR,
-					      IB_MGMT_MAD_DATA, GFP_ATOMIC);
+					      IB_MGMT_MAD_DATA, GFP_ATOMIC,
+					      IB_MGMT_BASE_VERSION);
 		if (IS_ERR(send_buf))
 			return;
 		/*
diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c
index 5d597a1..83817da 100644
--- a/drivers/infiniband/hw/mthca/mthca_mad.c
+++ b/drivers/infiniband/hw/mthca/mthca_mad.c
@@ -170,7 +170,8 @@ static void forward_trap(struct mthca_dev *dev,
 
 	if (agent) {
 		send_buf = ib_create_send_mad(agent, qpn, 0, 0, IB_MGMT_MAD_HDR,
-					      IB_MGMT_MAD_DATA, GFP_ATOMIC);
+					      IB_MGMT_MAD_DATA, GFP_ATOMIC,
+					      IB_MGMT_BASE_VERSION);
 		if (IS_ERR(send_buf))
 			return;
 		/*
diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index a7eb325..d41a4170 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -5488,7 +5488,8 @@ static void try_7322_ipg(struct qib_pportdata *ppd)
 		goto retry;
 
 	send_buf = ib_create_send_mad(agent, 0, 0, 0, IB_MGMT_MAD_HDR,
-				      IB_MGMT_MAD_DATA, GFP_ATOMIC);
+				      IB_MGMT_MAD_DATA, GFP_ATOMIC,
+				      IB_MGMT_BASE_VERSION);
 	if (IS_ERR(send_buf))
 		goto retry;
 
diff --git a/drivers/infiniband/hw/qib/qib_mad.c b/drivers/infiniband/hw/qib/qib_mad.c
index fb022a3..cfc1be7 100644
--- a/drivers/infiniband/hw/qib/qib_mad.c
+++ b/drivers/infiniband/hw/qib/qib_mad.c
@@ -83,7 +83,8 @@ static void qib_send_trap(struct qib_ibport *ibp, void *data, unsigned len)
 		return;
 
 	send_buf = ib_create_send_mad(agent, 0, 0, 0, IB_MGMT_MAD_HDR,
-				      IB_MGMT_MAD_DATA, GFP_ATOMIC);
+				      IB_MGMT_MAD_DATA, GFP_ATOMIC,
+				      IB_MGMT_BASE_VERSION);
 	if (IS_ERR(send_buf))
 		return;
 
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index eb694dd..990c6be 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -477,7 +477,8 @@ static void srpt_mad_recv_handler(struct ib_mad_agent *mad_agent,
 	rsp = ib_create_send_mad(mad_agent, mad_wc->wc->src_qp,
 				 mad_wc->wc->pkey_index, 0,
 				 IB_MGMT_DEVICE_HDR, IB_MGMT_DEVICE_DATA,
-				 GFP_KERNEL);
+				 GFP_KERNEL,
+				 IB_MGMT_BASE_VERSION);
 	if (IS_ERR(rsp))
 		goto err_rsp;
 
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index 5823016..00a5e51 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -619,6 +619,7 @@ int ib_process_mad_wc(struct ib_mad_agent *mad_agent,
  *   automatically adjust the allocated buffer size to account for any
  *   additional padding that may be necessary.
  * @gfp_mask: GFP mask used for the memory allocation.
+ * @base_version: Base Version of this MAD
  *
  * This routine allocates a MAD for sending.  The returned MAD send buffer
  * will reference a data buffer usable for sending a MAD, along
@@ -634,7 +635,8 @@ struct ib_mad_send_buf *ib_create_send_mad(struct ib_mad_agent *mad_agent,
 					   u32 remote_qpn, u16 pkey_index,
 					   int rmpp_active,
 					   int hdr_len, int data_len,
-					   gfp_t gfp_mask);
+					   gfp_t gfp_mask,
+					   u8 base_version);
 
 /**
  * ib_is_mad_class_rmpp - returns whether given management class
-- 
1.8.2


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found] ` <1423092585-26692-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (12 preceding siblings ...)
  2015-02-04 23:29   ` [PATCH v4 13/19] IB/mad: Add base version parameter to ib_create_send_mad ira.weiny-ral2JQCrhuEAvxtiuMwx3w
@ 2015-02-04 23:29   ` ira.weiny-ral2JQCrhuEAvxtiuMwx3w
       [not found]     ` <1423092585-26692-15-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2015-02-04 23:29   ` [PATCH v4 15/19] IB/mad: Create jumbo_mad data structures ira.weiny-ral2JQCrhuEAvxtiuMwx3w
                     ` (5 subsequent siblings)
  19 siblings, 1 reply; 84+ messages in thread
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

OPA MADs share a common header with IBTA MADs, but use a different base
version and an extended length.  These "jumbo" MADs increase the performance
of management traffic.

Sharing a common header allows most of the MAD processing code to be reused
for OPA MADs, and also allows OPA devices to support some IBTA MADs.

Add a device capability flag to indicate OPA MAD support on the device.
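
A driver for an OPA-capable device would then advertise the flag (and its
larger MAD size) when reporting device attributes; a minimal sketch ("foo" is
a placeholder, and JUMBO_MGMT_MAD_SIZE is introduced in the next patch):

	static int foo_query_device(struct ib_device *ibdev,
				    struct ib_device_attr *props)
	{
		/* ... fill in the other device attributes ... */
		props->device_cap_flags2 |= IB_DEVICE_OPA_MAD_SUPPORT;
		props->max_mad_size = JUMBO_MGMT_MAD_SIZE;	/* 2K MADs */
		return 0;
	}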

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
 include/rdma/ib_verbs.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 3ab4033..2614233 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -128,6 +128,10 @@ enum ib_device_cap_flags {
 	IB_DEVICE_ON_DEMAND_PAGING	= (1<<31),
 };
 
+enum ib_device_cap_flags2 {
+	IB_DEVICE_OPA_MAD_SUPPORT	= 1
+};
+
 enum ib_signature_prot_cap {
 	IB_PROT_T10DIF_TYPE_1 = 1,
 	IB_PROT_T10DIF_TYPE_2 = 1 << 1,
@@ -210,6 +214,7 @@ struct ib_device_attr {
 	int			sig_prot_cap;
 	int			sig_guard_cap;
 	struct ib_odp_caps	odp_caps;
+	u64			device_cap_flags2;
 	u32			max_mad_size;
 };
 
-- 
1.8.2


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v4 15/19] IB/mad: Create jumbo_mad data structures
       [not found] ` <1423092585-26692-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (13 preceding siblings ...)
  2015-02-04 23:29   ` [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag ira.weiny-ral2JQCrhuEAvxtiuMwx3w
@ 2015-02-04 23:29   ` ira.weiny-ral2JQCrhuEAvxtiuMwx3w
       [not found]     ` <1423092585-26692-16-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2015-02-04 23:29   ` [PATCH v4 16/19] IB/mad: Add Intel Omni-Path Architecture defines ira.weiny-ral2JQCrhuEAvxtiuMwx3w
                     ` (4 subsequent siblings)
  19 siblings, 1 reply; 84+ messages in thread
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Define jumbo_mad and jumbo_rmpp_mad.

The jumbo MAD structures are 2K versions of the ib_mad and ib_rmpp_mad
structures.  Currently only OPA base version MADs are of this type.

Create an RMPP base header, ib_rmpp_base, shared between ib_rmpp_mad and
jumbo_rmpp_mad.

Update existing code to use the new structures.
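
The point of the shared base is that code touching only the MAD and RMPP
headers no longer cares which size variant it was handed; e.g. (an
illustrative helper, not part of this patch):

	static bool rmpp_is_active(void *mad)
	{
		/* Valid for both ib_rmpp_mad and jumbo_rmpp_mad, since
		 * both begin with struct ib_rmpp_base. */
		struct ib_rmpp_base *rmpp_base = mad;

		return ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) &
		       IB_MGMT_RMPP_FLAG_ACTIVE;
	}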

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
 drivers/infiniband/core/mad.c      |  18 +++---
 drivers/infiniband/core/mad_priv.h |   2 +
 drivers/infiniband/core/mad_rmpp.c | 120 ++++++++++++++++++-------------------
 drivers/infiniband/core/user_mad.c |  16 ++---
 include/rdma/ib_mad.h              |  26 +++++++-
 5 files changed, 103 insertions(+), 79 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 2145294..316b4b2 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -883,7 +883,7 @@ static int alloc_send_rmpp_list(struct ib_mad_send_wr_private *send_wr,
 				gfp_t gfp_mask)
 {
 	struct ib_mad_send_buf *send_buf = &send_wr->send_buf;
-	struct ib_rmpp_mad *rmpp_mad = send_buf->mad;
+	struct ib_rmpp_base *rmpp_base = send_buf->mad;
 	struct ib_rmpp_segment *seg = NULL;
 	int left, seg_size, pad;
 
@@ -909,10 +909,10 @@ static int alloc_send_rmpp_list(struct ib_mad_send_wr_private *send_wr,
 	if (pad)
 		memset(seg->data + seg_size - pad, 0, pad);
 
-	rmpp_mad->rmpp_hdr.rmpp_version = send_wr->mad_agent_priv->
+	rmpp_base->rmpp_hdr.rmpp_version = send_wr->mad_agent_priv->
 					  agent.rmpp_version;
-	rmpp_mad->rmpp_hdr.rmpp_type = IB_MGMT_RMPP_TYPE_DATA;
-	ib_set_rmpp_flags(&rmpp_mad->rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
+	rmpp_base->rmpp_hdr.rmpp_type = IB_MGMT_RMPP_TYPE_DATA;
+	ib_set_rmpp_flags(&rmpp_base->rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
 
 	send_wr->cur_seg = container_of(send_wr->rmpp_list.next,
 					struct ib_rmpp_segment, list);
@@ -1748,14 +1748,14 @@ out:
 static int is_rmpp_data_mad(struct ib_mad_agent_private *mad_agent_priv,
 		       struct ib_mad_hdr *mad_hdr)
 {
-	struct ib_rmpp_mad *rmpp_mad;
+	struct ib_rmpp_base *rmpp_base;
 
-	rmpp_mad = (struct ib_rmpp_mad *)mad_hdr;
+	rmpp_base = (struct ib_rmpp_base *)mad_hdr;
 	return !mad_agent_priv->agent.rmpp_version ||
 		!ib_mad_kernel_rmpp_agent(&mad_agent_priv->agent) ||
-		!(ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) &
+		!(ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) &
 				    IB_MGMT_RMPP_FLAG_ACTIVE) ||
-		(rmpp_mad->rmpp_hdr.rmpp_type == IB_MGMT_RMPP_TYPE_DATA);
+		(rmpp_base->rmpp_hdr.rmpp_type == IB_MGMT_RMPP_TYPE_DATA);
 }
 
 static inline int rcv_has_same_class(struct ib_mad_send_wr_private *wr,
@@ -1897,7 +1897,7 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
 			spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
 			if (!ib_mad_kernel_rmpp_agent(&mad_agent_priv->agent)
 			   && ib_is_mad_class_rmpp(mad_recv_wc->recv_buf.mad->mad_hdr.mgmt_class)
-			   && (ib_get_rmpp_flags(&((struct ib_rmpp_mad *)mad_recv_wc->recv_buf.mad)->rmpp_hdr)
+			   && (ib_get_rmpp_flags(&((struct ib_rmpp_base *)mad_recv_wc->recv_buf.mad)->rmpp_hdr)
 					& IB_MGMT_RMPP_FLAG_ACTIVE)) {
 				/* user rmpp is in effect
 				 * and this is an active RMPP MAD
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index d1a0b0e..d71ddcc 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -80,6 +80,8 @@ struct ib_mad_private {
 		struct ib_mad mad;
 		struct ib_rmpp_mad rmpp_mad;
 		struct ib_smp smp;
+		struct jumbo_mad jumbo_mad;
+		struct jumbo_rmpp_mad jumbo_rmpp_mad;
 	} mad;
 } __attribute__ ((packed));
 
diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
index 2379e2d..7184530 100644
--- a/drivers/infiniband/core/mad_rmpp.c
+++ b/drivers/infiniband/core/mad_rmpp.c
@@ -111,10 +111,10 @@ void ib_cancel_rmpp_recvs(struct ib_mad_agent_private *agent)
 }
 
 static void format_ack(struct ib_mad_send_buf *msg,
-		       struct ib_rmpp_mad *data,
+		       struct ib_rmpp_base *data,
 		       struct mad_rmpp_recv *rmpp_recv)
 {
-	struct ib_rmpp_mad *ack = msg->mad;
+	struct ib_rmpp_base *ack = msg->mad;
 	unsigned long flags;
 
 	memcpy(ack, &data->mad_hdr, msg->hdr_len);
@@ -144,7 +144,7 @@ static void ack_recv(struct mad_rmpp_recv *rmpp_recv,
 	if (IS_ERR(msg))
 		return;
 
-	format_ack(msg, (struct ib_rmpp_mad *) recv_wc->recv_buf.mad, rmpp_recv);
+	format_ack(msg, (struct ib_rmpp_base *) recv_wc->recv_buf.mad, rmpp_recv);
 	msg->ah = rmpp_recv->ah;
 	ret = ib_post_send_mad(msg, NULL);
 	if (ret)
@@ -182,20 +182,20 @@ static void ack_ds_ack(struct ib_mad_agent_private *agent,
 		       struct ib_mad_recv_wc *recv_wc)
 {
 	struct ib_mad_send_buf *msg;
-	struct ib_rmpp_mad *rmpp_mad;
+	struct ib_rmpp_base *rmpp_base;
 	int ret;
 
 	msg = alloc_response_msg(&agent->agent, recv_wc);
 	if (IS_ERR(msg))
 		return;
 
-	rmpp_mad = msg->mad;
-	memcpy(rmpp_mad, recv_wc->recv_buf.mad, msg->hdr_len);
+	rmpp_base = msg->mad;
+	memcpy(rmpp_base, recv_wc->recv_buf.mad, msg->hdr_len);
 
-	rmpp_mad->mad_hdr.method ^= IB_MGMT_METHOD_RESP;
-	ib_set_rmpp_flags(&rmpp_mad->rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
-	rmpp_mad->rmpp_hdr.seg_num = 0;
-	rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(1);
+	rmpp_base->mad_hdr.method ^= IB_MGMT_METHOD_RESP;
+	ib_set_rmpp_flags(&rmpp_base->rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
+	rmpp_base->rmpp_hdr.seg_num = 0;
+	rmpp_base->rmpp_hdr.paylen_newwin = cpu_to_be32(1);
 
 	ret = ib_post_send_mad(msg, NULL);
 	if (ret) {
@@ -215,23 +215,23 @@ static void nack_recv(struct ib_mad_agent_private *agent,
 		      struct ib_mad_recv_wc *recv_wc, u8 rmpp_status)
 {
 	struct ib_mad_send_buf *msg;
-	struct ib_rmpp_mad *rmpp_mad;
+	struct ib_rmpp_base *rmpp_base;
 	int ret;
 
 	msg = alloc_response_msg(&agent->agent, recv_wc);
 	if (IS_ERR(msg))
 		return;
 
-	rmpp_mad = msg->mad;
-	memcpy(rmpp_mad, recv_wc->recv_buf.mad, msg->hdr_len);
+	rmpp_base = msg->mad;
+	memcpy(rmpp_base, recv_wc->recv_buf.mad, msg->hdr_len);
 
-	rmpp_mad->mad_hdr.method ^= IB_MGMT_METHOD_RESP;
-	rmpp_mad->rmpp_hdr.rmpp_version = IB_MGMT_RMPP_VERSION;
-	rmpp_mad->rmpp_hdr.rmpp_type = IB_MGMT_RMPP_TYPE_ABORT;
-	ib_set_rmpp_flags(&rmpp_mad->rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
-	rmpp_mad->rmpp_hdr.rmpp_status = rmpp_status;
-	rmpp_mad->rmpp_hdr.seg_num = 0;
-	rmpp_mad->rmpp_hdr.paylen_newwin = 0;
+	rmpp_base->mad_hdr.method ^= IB_MGMT_METHOD_RESP;
+	rmpp_base->rmpp_hdr.rmpp_version = IB_MGMT_RMPP_VERSION;
+	rmpp_base->rmpp_hdr.rmpp_type = IB_MGMT_RMPP_TYPE_ABORT;
+	ib_set_rmpp_flags(&rmpp_base->rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
+	rmpp_base->rmpp_hdr.rmpp_status = rmpp_status;
+	rmpp_base->rmpp_hdr.seg_num = 0;
+	rmpp_base->rmpp_hdr.paylen_newwin = 0;
 
 	ret = ib_post_send_mad(msg, NULL);
 	if (ret) {
@@ -373,18 +373,18 @@ insert_rmpp_recv(struct ib_mad_agent_private *agent,
 
 static inline int get_last_flag(struct ib_mad_recv_buf *seg)
 {
-	struct ib_rmpp_mad *rmpp_mad;
+	struct ib_rmpp_base *rmpp_base;
 
-	rmpp_mad = (struct ib_rmpp_mad *) seg->mad;
-	return ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) & IB_MGMT_RMPP_FLAG_LAST;
+	rmpp_base = (struct ib_rmpp_base *) seg->mad;
+	return ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) & IB_MGMT_RMPP_FLAG_LAST;
 }
 
 static inline int get_seg_num(struct ib_mad_recv_buf *seg)
 {
-	struct ib_rmpp_mad *rmpp_mad;
+	struct ib_rmpp_base *rmpp_base;
 
-	rmpp_mad = (struct ib_rmpp_mad *) seg->mad;
-	return be32_to_cpu(rmpp_mad->rmpp_hdr.seg_num);
+	rmpp_base = (struct ib_rmpp_base *) seg->mad;
+	return be32_to_cpu(rmpp_base->rmpp_hdr.seg_num);
 }
 
 static inline struct ib_mad_recv_buf * get_next_seg(struct list_head *rmpp_list,
@@ -436,9 +436,9 @@ static inline int get_mad_len(struct mad_rmpp_recv *rmpp_recv)
 
 	rmpp_mad = (struct ib_rmpp_mad *)rmpp_recv->cur_seg_buf->mad;
 
-	hdr_size = ib_get_mad_data_offset(rmpp_mad->mad_hdr.mgmt_class);
+	hdr_size = ib_get_mad_data_offset(rmpp_mad->base.mad_hdr.mgmt_class);
 	data_size = sizeof(struct ib_rmpp_mad) - hdr_size;
-	pad = IB_MGMT_RMPP_DATA - be32_to_cpu(rmpp_mad->rmpp_hdr.paylen_newwin);
+	pad = IB_MGMT_RMPP_DATA - be32_to_cpu(rmpp_mad->base.rmpp_hdr.paylen_newwin);
 	if (pad > IB_MGMT_RMPP_DATA || pad < 0)
 		pad = 0;
 
@@ -567,20 +567,20 @@ static int send_next_seg(struct ib_mad_send_wr_private *mad_send_wr)
 	u32 paylen = 0;
 
 	rmpp_mad = mad_send_wr->send_buf.mad;
-	ib_set_rmpp_flags(&rmpp_mad->rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
-	rmpp_mad->rmpp_hdr.seg_num = cpu_to_be32(++mad_send_wr->seg_num);
+	ib_set_rmpp_flags(&rmpp_mad->base.rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
+	rmpp_mad->base.rmpp_hdr.seg_num = cpu_to_be32(++mad_send_wr->seg_num);
 
 	if (mad_send_wr->seg_num == 1) {
-		rmpp_mad->rmpp_hdr.rmpp_rtime_flags |= IB_MGMT_RMPP_FLAG_FIRST;
+		rmpp_mad->base.rmpp_hdr.rmpp_rtime_flags |= IB_MGMT_RMPP_FLAG_FIRST;
 		paylen = mad_send_wr->send_buf.seg_count * IB_MGMT_RMPP_DATA -
 			 mad_send_wr->pad;
 	}
 
 	if (mad_send_wr->seg_num == mad_send_wr->send_buf.seg_count) {
-		rmpp_mad->rmpp_hdr.rmpp_rtime_flags |= IB_MGMT_RMPP_FLAG_LAST;
+		rmpp_mad->base.rmpp_hdr.rmpp_rtime_flags |= IB_MGMT_RMPP_FLAG_LAST;
 		paylen = IB_MGMT_RMPP_DATA - mad_send_wr->pad;
 	}
-	rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(paylen);
+	rmpp_mad->base.rmpp_hdr.paylen_newwin = cpu_to_be32(paylen);
 
 	/* 2 seconds for an ACK until we can find the packet lifetime */
 	timeout = mad_send_wr->send_buf.timeout_ms;
@@ -644,19 +644,19 @@ static void process_rmpp_ack(struct ib_mad_agent_private *agent,
 			     struct ib_mad_recv_wc *mad_recv_wc)
 {
 	struct ib_mad_send_wr_private *mad_send_wr;
-	struct ib_rmpp_mad *rmpp_mad;
+	struct ib_rmpp_base *rmpp_base;
 	unsigned long flags;
 	int seg_num, newwin, ret;
 
-	rmpp_mad = (struct ib_rmpp_mad *)mad_recv_wc->recv_buf.mad;
-	if (rmpp_mad->rmpp_hdr.rmpp_status) {
+	rmpp_base = (struct ib_rmpp_base *)mad_recv_wc->recv_buf.mad;
+	if (rmpp_base->rmpp_hdr.rmpp_status) {
 		abort_send(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_BAD_STATUS);
 		nack_recv(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_BAD_STATUS);
 		return;
 	}
 
-	seg_num = be32_to_cpu(rmpp_mad->rmpp_hdr.seg_num);
-	newwin = be32_to_cpu(rmpp_mad->rmpp_hdr.paylen_newwin);
+	seg_num = be32_to_cpu(rmpp_base->rmpp_hdr.seg_num);
+	newwin = be32_to_cpu(rmpp_base->rmpp_hdr.paylen_newwin);
 	if (newwin < seg_num) {
 		abort_send(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_W2S);
 		nack_recv(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_W2S);
@@ -741,7 +741,7 @@ process_rmpp_data(struct ib_mad_agent_private *agent,
 	struct ib_rmpp_hdr *rmpp_hdr;
 	u8 rmpp_status;
 
-	rmpp_hdr = &((struct ib_rmpp_mad *)mad_recv_wc->recv_buf.mad)->rmpp_hdr;
+	rmpp_hdr = &((struct ib_rmpp_base *)mad_recv_wc->recv_buf.mad)->rmpp_hdr;
 
 	if (rmpp_hdr->rmpp_status) {
 		rmpp_status = IB_MGMT_RMPP_STATUS_BAD_STATUS;
@@ -770,30 +770,30 @@ bad:
 static void process_rmpp_stop(struct ib_mad_agent_private *agent,
 			      struct ib_mad_recv_wc *mad_recv_wc)
 {
-	struct ib_rmpp_mad *rmpp_mad;
+	struct ib_rmpp_base *rmpp_base;
 
-	rmpp_mad = (struct ib_rmpp_mad *)mad_recv_wc->recv_buf.mad;
+	rmpp_base = (struct ib_rmpp_base *)mad_recv_wc->recv_buf.mad;
 
-	if (rmpp_mad->rmpp_hdr.rmpp_status != IB_MGMT_RMPP_STATUS_RESX) {
+	if (rmpp_base->rmpp_hdr.rmpp_status != IB_MGMT_RMPP_STATUS_RESX) {
 		abort_send(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_BAD_STATUS);
 		nack_recv(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_BAD_STATUS);
 	} else
-		abort_send(agent, mad_recv_wc, rmpp_mad->rmpp_hdr.rmpp_status);
+		abort_send(agent, mad_recv_wc, rmpp_base->rmpp_hdr.rmpp_status);
 }
 
 static void process_rmpp_abort(struct ib_mad_agent_private *agent,
 			       struct ib_mad_recv_wc *mad_recv_wc)
 {
-	struct ib_rmpp_mad *rmpp_mad;
+	struct ib_rmpp_base *rmpp_base;
 
-	rmpp_mad = (struct ib_rmpp_mad *)mad_recv_wc->recv_buf.mad;
+	rmpp_base = (struct ib_rmpp_base *)mad_recv_wc->recv_buf.mad;
 
-	if (rmpp_mad->rmpp_hdr.rmpp_status < IB_MGMT_RMPP_STATUS_ABORT_MIN ||
-	    rmpp_mad->rmpp_hdr.rmpp_status > IB_MGMT_RMPP_STATUS_ABORT_MAX) {
+	if (rmpp_base->rmpp_hdr.rmpp_status < IB_MGMT_RMPP_STATUS_ABORT_MIN ||
+	    rmpp_base->rmpp_hdr.rmpp_status > IB_MGMT_RMPP_STATUS_ABORT_MAX) {
 		abort_send(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_BAD_STATUS);
 		nack_recv(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_BAD_STATUS);
 	} else
-		abort_send(agent, mad_recv_wc, rmpp_mad->rmpp_hdr.rmpp_status);
+		abort_send(agent, mad_recv_wc, rmpp_base->rmpp_hdr.rmpp_status);
 }
 
 struct ib_mad_recv_wc *
@@ -803,16 +803,16 @@ ib_process_rmpp_recv_wc(struct ib_mad_agent_private *agent,
 	struct ib_rmpp_mad *rmpp_mad;
 
 	rmpp_mad = (struct ib_rmpp_mad *)mad_recv_wc->recv_buf.mad;
-	if (!(rmpp_mad->rmpp_hdr.rmpp_rtime_flags & IB_MGMT_RMPP_FLAG_ACTIVE))
+	if (!(rmpp_mad->base.rmpp_hdr.rmpp_rtime_flags & IB_MGMT_RMPP_FLAG_ACTIVE))
 		return mad_recv_wc;
 
-	if (rmpp_mad->rmpp_hdr.rmpp_version != IB_MGMT_RMPP_VERSION) {
+	if (rmpp_mad->base.rmpp_hdr.rmpp_version != IB_MGMT_RMPP_VERSION) {
 		abort_send(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_UNV);
 		nack_recv(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_UNV);
 		goto out;
 	}
 
-	switch (rmpp_mad->rmpp_hdr.rmpp_type) {
+	switch (rmpp_mad->base.rmpp_hdr.rmpp_type) {
 	case IB_MGMT_RMPP_TYPE_DATA:
 		return process_rmpp_data(agent, mad_recv_wc);
 	case IB_MGMT_RMPP_TYPE_ACK:
@@ -873,11 +873,11 @@ int ib_send_rmpp_mad(struct ib_mad_send_wr_private *mad_send_wr)
 	int ret;
 
 	rmpp_mad = mad_send_wr->send_buf.mad;
-	if (!(ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) &
+	if (!(ib_get_rmpp_flags(&rmpp_mad->base.rmpp_hdr) &
 	      IB_MGMT_RMPP_FLAG_ACTIVE))
 		return IB_RMPP_RESULT_UNHANDLED;
 
-	if (rmpp_mad->rmpp_hdr.rmpp_type != IB_MGMT_RMPP_TYPE_DATA) {
+	if (rmpp_mad->base.rmpp_hdr.rmpp_type != IB_MGMT_RMPP_TYPE_DATA) {
 		mad_send_wr->seg_num = 1;
 		return IB_RMPP_RESULT_INTERNAL;
 	}
@@ -895,15 +895,15 @@ int ib_send_rmpp_mad(struct ib_mad_send_wr_private *mad_send_wr)
 int ib_process_rmpp_send_wc(struct ib_mad_send_wr_private *mad_send_wr,
 			    struct ib_mad_send_wc *mad_send_wc)
 {
-	struct ib_rmpp_mad *rmpp_mad;
+	struct ib_rmpp_base *rmpp_base;
 	int ret;
 
-	rmpp_mad = mad_send_wr->send_buf.mad;
-	if (!(ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) &
+	rmpp_base = mad_send_wr->send_buf.mad;
+	if (!(ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) &
 	      IB_MGMT_RMPP_FLAG_ACTIVE))
 		return IB_RMPP_RESULT_UNHANDLED; /* RMPP not active */
 
-	if (rmpp_mad->rmpp_hdr.rmpp_type != IB_MGMT_RMPP_TYPE_DATA)
+	if (rmpp_base->rmpp_hdr.rmpp_type != IB_MGMT_RMPP_TYPE_DATA)
 		return IB_RMPP_RESULT_INTERNAL;	 /* ACK, STOP, or ABORT */
 
 	if (mad_send_wc->status != IB_WC_SUCCESS ||
@@ -933,11 +933,11 @@ int ib_process_rmpp_send_wc(struct ib_mad_send_wr_private *mad_send_wr,
 
 int ib_retry_rmpp(struct ib_mad_send_wr_private *mad_send_wr)
 {
-	struct ib_rmpp_mad *rmpp_mad;
+	struct ib_rmpp_base *rmpp_base;
 	int ret;
 
-	rmpp_mad = mad_send_wr->send_buf.mad;
-	if (!(ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) &
+	rmpp_base = mad_send_wr->send_buf.mad;
+	if (!(ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) &
 	      IB_MGMT_RMPP_FLAG_ACTIVE))
 		return IB_RMPP_RESULT_UNHANDLED; /* RMPP not active */
 
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 9628494..ac33d34 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -448,7 +448,7 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
 	struct ib_mad_agent *agent;
 	struct ib_ah_attr ah_attr;
 	struct ib_ah *ah;
-	struct ib_rmpp_mad *rmpp_mad;
+	struct ib_rmpp_base *rmpp_base;
 	__be64 *tid;
 	int ret, data_len, hdr_len, copy_offset, rmpp_active;
 
@@ -504,13 +504,13 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
 		goto err_up;
 	}
 
-	rmpp_mad = (struct ib_rmpp_mad *) packet->mad.data;
-	hdr_len = ib_get_mad_data_offset(rmpp_mad->mad_hdr.mgmt_class);
+	rmpp_base = (struct ib_rmpp_base *) packet->mad.data;
+	hdr_len = ib_get_mad_data_offset(rmpp_base->mad_hdr.mgmt_class);
 
-	if (ib_is_mad_class_rmpp(rmpp_mad->mad_hdr.mgmt_class)
+	if (ib_is_mad_class_rmpp(rmpp_base->mad_hdr.mgmt_class)
 	    && ib_mad_kernel_rmpp_agent(agent)) {
 		copy_offset = IB_MGMT_RMPP_HDR;
-		rmpp_active = ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) &
+		rmpp_active = ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) &
 						IB_MGMT_RMPP_FLAG_ACTIVE;
 	} else {
 		copy_offset = IB_MGMT_MAD_HDR;
@@ -558,12 +558,12 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
 		tid = &((struct ib_mad_hdr *) packet->msg->mad)->tid;
 		*tid = cpu_to_be64(((u64) agent->hi_tid) << 32 |
 				   (be64_to_cpup(tid) & 0xffffffff));
-		rmpp_mad->mad_hdr.tid = *tid;
+		rmpp_base->mad_hdr.tid = *tid;
 	}
 
 	if (!ib_mad_kernel_rmpp_agent(agent)
-	   && ib_is_mad_class_rmpp(rmpp_mad->mad_hdr.mgmt_class)
-	   && (ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) & IB_MGMT_RMPP_FLAG_ACTIVE)) {
+	   && ib_is_mad_class_rmpp(rmpp_base->mad_hdr.mgmt_class)
+	   && (ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) & IB_MGMT_RMPP_FLAG_ACTIVE)) {
 		spin_lock_irq(&file->send_lock);
 		list_add_tail(&packet->list, &file->send_list);
 		spin_unlock_irq(&file->send_lock);
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index 00a5e51..80e7cf4 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -136,6 +136,11 @@ enum {
 	IB_MGMT_DEVICE_HDR = 64,
 	IB_MGMT_DEVICE_DATA = 192,
 	IB_MGMT_MAD_SIZE = IB_MGMT_MAD_HDR + IB_MGMT_MAD_DATA,
+	JUMBO_MGMT_MAD_HDR = IB_MGMT_MAD_HDR,
+	JUMBO_MGMT_MAD_DATA = 2024,
+	JUMBO_MGMT_RMPP_HDR = IB_MGMT_RMPP_HDR,
+	JUMBO_MGMT_RMPP_DATA = 2012,
+	JUMBO_MGMT_MAD_SIZE = JUMBO_MGMT_MAD_HDR + JUMBO_MGMT_MAD_DATA,
 };
 
 struct ib_mad_hdr {
@@ -182,12 +187,26 @@ struct ib_mad {
 	u8			data[IB_MGMT_MAD_DATA];
 };
 
-struct ib_rmpp_mad {
+struct jumbo_mad {
+	struct ib_mad_hdr	mad_hdr;
+	u8			data[JUMBO_MGMT_MAD_DATA];
+};
+
+struct ib_rmpp_base {
 	struct ib_mad_hdr	mad_hdr;
 	struct ib_rmpp_hdr	rmpp_hdr;
+} __packed;
+
+struct ib_rmpp_mad {
+	struct ib_rmpp_base	base;
 	u8			data[IB_MGMT_RMPP_DATA];
 };
 
+struct jumbo_rmpp_mad {
+	struct ib_rmpp_base	base;
+	u8			data[JUMBO_MGMT_RMPP_DATA];
+};
+
 struct ib_sa_mad {
 	struct ib_mad_hdr	mad_hdr;
 	struct ib_rmpp_hdr	rmpp_hdr;
@@ -402,7 +421,10 @@ struct ib_mad_send_wc {
 struct ib_mad_recv_buf {
 	struct list_head	list;
 	struct ib_grh		*grh;
-	struct ib_mad		*mad;
+	union {
+		struct ib_mad		*mad;
+		struct jumbo_mad	*jumbo_mad;
+	};
 };
 
 /**
-- 
1.8.2


* [PATCH v4 16/19] IB/mad: Add Intel Omni-Path Architecture defines
       [not found] ` <1423092585-26692-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (14 preceding siblings ...)
  2015-02-04 23:29   ` [PATCH v4 15/19] IB/mad: Create jumbo_mad data structures ira.weiny-ral2JQCrhuEAvxtiuMwx3w
@ 2015-02-04 23:29   ` ira.weiny-ral2JQCrhuEAvxtiuMwx3w
       [not found]     ` <1423092585-26692-17-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2015-02-04 23:29   ` [PATCH v4 17/19] IB/mad: Implement support for Intel Omni-Path Architecture base version MADs in ib_create_send_mad ira.weiny-ral2JQCrhuEAvxtiuMwx3w
                     ` (3 subsequent siblings)
  19 siblings, 1 reply; 84+ messages in thread
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

OPA_SMP_CLASS_VERSION -- defined as 0x80
OPA_MGMT_BASE_VERSION -- defined as 0x80

Increase the maximum management version to accommodate OPA.
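
The bump of MAX_MGMT_VERSION follows from the per-port registration
table being sized by that constant and indexed by class version; with
OPA class versions starting at 0x80, the table must cover that range.
As a minimal sketch (not part of this patch; the helper name is
hypothetical), a consumer could gate OPA handling on the new define:

	#include <rdma/ib_mad.h>

	/* true if this MAD header carries the OPA base version */
	static inline bool is_opa_base_version(const struct ib_mad_hdr *hdr)
	{
		return hdr->base_version == OPA_MGMT_BASE_VERSION;
	}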

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
 drivers/infiniband/core/mad_priv.h | 2 +-
 include/rdma/ib_mad.h              | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index d71ddcc..10df80d 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -56,7 +56,7 @@
 
 /* Registration table sizes */
 #define MAX_MGMT_CLASS		80
-#define MAX_MGMT_VERSION	8
+#define MAX_MGMT_VERSION	0x83
 #define MAX_MGMT_OUI		8
 #define MAX_MGMT_VENDOR_RANGE2	(IB_MGMT_CLASS_VENDOR_RANGE2_END - \
 				IB_MGMT_CLASS_VENDOR_RANGE2_START + 1)
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index 80e7cf4..8938f1e 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -42,8 +42,11 @@
 #include <rdma/ib_verbs.h>
 #include <uapi/rdma/ib_user_mad.h>
 
-/* Management base version */
+/* Management base versions */
 #define IB_MGMT_BASE_VERSION			1
+#define OPA_MGMT_BASE_VERSION			0x80
+
+#define OPA_SMP_CLASS_VERSION			0x80
 
 /* Management classes */
 #define IB_MGMT_CLASS_SUBN_LID_ROUTED		0x01
-- 
1.8.2


* [PATCH v4 17/19] IB/mad: Implement support for Intel Omni-Path Architecture base version MADs in ib_create_send_mad
       [not found] ` <1423092585-26692-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (15 preceding siblings ...)
  2015-02-04 23:29   ` [PATCH v4 16/19] IB/mad: Add Intel Omni-Path Architecture defines ira.weiny-ral2JQCrhuEAvxtiuMwx3w
@ 2015-02-04 23:29   ` ira.weiny-ral2JQCrhuEAvxtiuMwx3w
       [not found]     ` <1423092585-26692-18-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2015-02-04 23:29   ` [PATCH v4 18/19] IB/mad: Implement Intel Omni-Path Architecture SMP processing ira.weiny-ral2JQCrhuEAvxtiuMwx3w
                     ` (2 subsequent siblings)
  19 siblings, 1 reply; 84+ messages in thread
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

If the device supports OPA MADs, process the OPA base version:
	Set the MAD size and sg lengths as appropriate.
	Split RMPP MADs as appropriate.
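
As a sketch of the caller-visible behavior (the agent, remote_qpn, and
pkey_index variables are assumed for illustration), requesting a
full-sized OPA send buffer now looks like:

	struct ib_mad_send_buf *msg;

	msg = ib_create_send_mad(agent, remote_qpn, pkey_index,
				 0 /* rmpp_active */,
				 JUMBO_MGMT_MAD_HDR, JUMBO_MGMT_MAD_DATA,
				 GFP_KERNEL, OPA_MGMT_BASE_VERSION);
	if (IS_ERR(msg))
		return PTR_ERR(msg);

Passing IB_MGMT_BASE_VERSION instead keeps the existing 256 byte MAD
sizing, so current callers are unaffected.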

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
 drivers/infiniband/core/mad.c | 38 ++++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 316b4b2..5aefe4c 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -857,11 +857,11 @@ out:
 	return ret;
 }
 
-static int get_pad_size(int hdr_len, int data_len)
+static int get_pad_size(int hdr_len, int data_len, size_t mad_size)
 {
 	int seg_size, pad;
 
-	seg_size = sizeof(struct ib_mad) - hdr_len;
+	seg_size = mad_size - hdr_len;
 	if (data_len && seg_size) {
 		pad = seg_size - data_len % seg_size;
 		return pad == seg_size ? 0 : pad;
@@ -880,14 +880,14 @@ static void free_send_rmpp_list(struct ib_mad_send_wr_private *mad_send_wr)
 }
 
 static int alloc_send_rmpp_list(struct ib_mad_send_wr_private *send_wr,
-				gfp_t gfp_mask)
+				size_t mad_size, gfp_t gfp_mask)
 {
 	struct ib_mad_send_buf *send_buf = &send_wr->send_buf;
 	struct ib_rmpp_base *rmpp_base = send_buf->mad;
 	struct ib_rmpp_segment *seg = NULL;
 	int left, seg_size, pad;
 
-	send_buf->seg_size = sizeof (struct ib_mad) - send_buf->hdr_len;
+	send_buf->seg_size = mad_size - send_buf->hdr_len;
 	seg_size = send_buf->seg_size;
 	pad = send_wr->pad;
 
@@ -937,20 +937,31 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
 	struct ib_mad_send_wr_private *mad_send_wr;
 	int pad, message_size, ret, size;
 	void *buf;
+	size_t mad_size;
+	int opa;
 
 	mad_agent_priv = container_of(mad_agent, struct ib_mad_agent_private,
 				      agent);
-	pad = get_pad_size(hdr_len, data_len);
+
+	opa = mad_agent_priv->agent.device->cached_dev_attrs.device_cap_flags2 &
+	      IB_DEVICE_OPA_MAD_SUPPORT;
+
+	if (opa && base_version == OPA_MGMT_BASE_VERSION)
+		mad_size = sizeof(struct jumbo_mad);
+	else
+		mad_size = sizeof(struct ib_mad);
+
+	pad = get_pad_size(hdr_len, data_len, mad_size);
 	message_size = hdr_len + data_len + pad;
 
 	if (ib_mad_kernel_rmpp_agent(mad_agent)) {
-		if (!rmpp_active && message_size > sizeof(struct ib_mad))
+		if (!rmpp_active && message_size > mad_size)
 			return ERR_PTR(-EINVAL);
 	} else
-		if (rmpp_active || message_size > sizeof(struct ib_mad))
+		if (rmpp_active || message_size > mad_size)
 			return ERR_PTR(-EINVAL);
 
-	size = rmpp_active ? hdr_len : sizeof(struct ib_mad);
+	size = rmpp_active ? hdr_len : mad_size;
 	buf = kzalloc(sizeof *mad_send_wr + size, gfp_mask);
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
@@ -965,7 +976,14 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
 	mad_send_wr->mad_agent_priv = mad_agent_priv;
 	mad_send_wr->sg_list[0].length = hdr_len;
 	mad_send_wr->sg_list[0].lkey = mad_agent->mr->lkey;
-	mad_send_wr->sg_list[1].length = sizeof(struct ib_mad) - hdr_len;
+
+	/* OPA MADs don't have to be the full 2048 bytes */
+	if (opa && base_version == OPA_MGMT_BASE_VERSION &&
+	    data_len < mad_size - hdr_len)
+		mad_send_wr->sg_list[1].length = data_len;
+	else
+		mad_send_wr->sg_list[1].length = mad_size - hdr_len;
+
 	mad_send_wr->sg_list[1].lkey = mad_agent->mr->lkey;
 
 	mad_send_wr->send_wr.wr_id = (unsigned long) mad_send_wr;
@@ -978,7 +996,7 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
 	mad_send_wr->send_wr.wr.ud.pkey_index = pkey_index;
 
 	if (rmpp_active) {
-		ret = alloc_send_rmpp_list(mad_send_wr, gfp_mask);
+		ret = alloc_send_rmpp_list(mad_send_wr, mad_size, gfp_mask);
 		if (ret) {
 			kfree(buf);
 			return ERR_PTR(ret);
-- 
1.8.2


* [PATCH v4 18/19] IB/mad: Implement Intel Omni-Path Architecture SMP processing
       [not found] ` <1423092585-26692-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (16 preceding siblings ...)
  2015-02-04 23:29   ` [PATCH v4 17/19] IB/mad: Implement support for Intel Omni-Path Architecture base version MADs in ib_create_send_mad ira.weiny-ral2JQCrhuEAvxtiuMwx3w
@ 2015-02-04 23:29   ` ira.weiny-ral2JQCrhuEAvxtiuMwx3w
       [not found]     ` <1423092585-26692-19-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2015-02-04 23:29   ` [PATCH v4 19/19] IB/mad: Implement Intel Omni-Path Architecture MAD processing ira.weiny-ral2JQCrhuEAvxtiuMwx3w
  2015-02-06 20:34   ` [PATCH v4 00/19] IB/mad: Add support for Intel Omni-Path Architecture (OPA) " Hal Rosenstock
  19 siblings, 1 reply; 84+ messages in thread
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Define the new OPA SMP format and create support functions for it, reusing
the previously defined helper functions where appropriate.
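
As a minimal sketch of how the new accessors fit together (the function
and its arguments are hypothetical), a handler can reach the SMP payload
without caring whether the packet is LID routed or directed routed:

	static void copy_attr_to_smp(struct opa_smp *smp,
				     const void *attr, size_t len)
	{
		u8 *data = opa_get_smp_data(smp);
		size_t max = opa_get_smp_data_size(smp);

		/* route.lid.data or route.dr.data, as appropriate */
		memcpy(data, attr, min(len, max));
	}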

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/mad_priv.h |   2 +
 drivers/infiniband/core/opa_smi.h  |  78 +++++++++++++++++++++++++++
 drivers/infiniband/core/smi.c      |  54 +++++++++++++++++++
 drivers/infiniband/core/smi.h      |   6 +++
 include/rdma/opa_smi.h             | 106 +++++++++++++++++++++++++++++++++++++
 5 files changed, 246 insertions(+)
 create mode 100644 drivers/infiniband/core/opa_smi.h
 create mode 100644 include/rdma/opa_smi.h

diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index 10df80d..141b05a 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -41,6 +41,7 @@
 #include <linux/workqueue.h>
 #include <rdma/ib_mad.h>
 #include <rdma/ib_smi.h>
+#include <rdma/opa_smi.h>
 
 #define IB_MAD_QPS_CORE		2 /* Always QP0 and QP1 as a minimum */
 
@@ -82,6 +83,7 @@ struct ib_mad_private {
 		struct ib_smp smp;
 		struct jumbo_mad jumbo_mad;
 		struct jumbo_rmpp_mad jumbo_rmpp_mad;
+		struct opa_smp opa_smp;
 	} mad;
 } __attribute__ ((packed));
 
diff --git a/drivers/infiniband/core/opa_smi.h b/drivers/infiniband/core/opa_smi.h
new file mode 100644
index 0000000..d180179
--- /dev/null
+++ b/drivers/infiniband/core/opa_smi.h
@@ -0,0 +1,78 @@
+/*
+ * Copyright (c) 2014 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#ifndef __OPA_SMI_H_
+#define __OPA_SMI_H_
+
+#include <rdma/ib_smi.h>
+#include <rdma/opa_smi.h>
+
+#include "smi.h"
+
+enum smi_action opa_smi_handle_dr_smp_recv(struct opa_smp *smp, u8 node_type,
+				       int port_num, int phys_port_cnt);
+int opa_smi_get_fwd_port(struct opa_smp *smp);
+extern enum smi_forward_action opa_smi_check_forward_dr_smp(struct opa_smp *smp);
+extern enum smi_action opa_smi_handle_dr_smp_send(struct opa_smp *smp,
+					      u8 node_type, int port_num);
+
+/*
+ * Return IB_SMI_HANDLE if the SMP should be handled by the local SMA/SM
+ * via process_mad
+ */
+static inline enum smi_action opa_smi_check_local_smp(struct opa_smp *smp,
+						  struct ib_device *device)
+{
+	/* C14-9:3 -- We're at the end of the DR segment of path */
+	/* C14-9:4 -- Hop Pointer = Hop Count + 1 -> give to SMA/SM */
+	return (device->process_mad &&
+		!opa_get_smp_direction(smp) &&
+		(smp->hop_ptr == smp->hop_cnt + 1)) ?
+		IB_SMI_HANDLE : IB_SMI_DISCARD;
+}
+
+/*
+ * Return IB_SMI_HANDLE if the SMP should be handled by the local SMA/SM
+ * via process_mad
+ */
+static inline enum smi_action opa_smi_check_local_returning_smp(struct opa_smp *smp,
+						   struct ib_device *device)
+{
+	/* C14-13:3 -- We're at the end of the DR segment of path */
+	/* C14-13:4 -- Hop Pointer == 0 -> give to SM */
+	return (device->process_mad &&
+		opa_get_smp_direction(smp) &&
+		!smp->hop_ptr) ? IB_SMI_HANDLE : IB_SMI_DISCARD;
+}
+
+#endif	/* __OPA_SMI_H_ */
diff --git a/drivers/infiniband/core/smi.c b/drivers/infiniband/core/smi.c
index 8a5fb1d..a38ccb4 100644
--- a/drivers/infiniband/core/smi.c
+++ b/drivers/infiniband/core/smi.c
@@ -5,6 +5,7 @@
  * Copyright (c) 2004, 2005 Topspin Corporation.  All rights reserved.
  * Copyright (c) 2004-2007 Voltaire Corporation.  All rights reserved.
  * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
+ * Copyright (c) 2014 Intel Corporation.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -38,6 +39,7 @@
 
 #include <rdma/ib_smi.h>
 #include "smi.h"
+#include "opa_smi.h"
 
 static inline
 enum smi_action __smi_handle_dr_smp_send(u8 node_type, int port_num,
@@ -137,6 +139,20 @@ enum smi_action smi_handle_dr_smp_send(struct ib_smp *smp,
 					smp->dr_slid == IB_LID_PERMISSIVE);
 }
 
+enum smi_action opa_smi_handle_dr_smp_send(struct opa_smp *smp,
+				       u8 node_type, int port_num)
+{
+	return __smi_handle_dr_smp_send(node_type, port_num,
+					&smp->hop_ptr, smp->hop_cnt,
+					smp->route.dr.initial_path,
+					smp->route.dr.return_path,
+					opa_get_smp_direction(smp),
+					smp->route.dr.dr_dlid ==
+					OPA_LID_PERMISSIVE,
+					smp->route.dr.dr_slid ==
+					OPA_LID_PERMISSIVE);
+}
+
 static inline
 enum smi_action __smi_handle_dr_smp_recv(u8 node_type, int port_num,
 					 int phys_port_cnt,
@@ -236,6 +252,24 @@ enum smi_action smi_handle_dr_smp_recv(struct ib_smp *smp, u8 node_type,
 					smp->dr_slid == IB_LID_PERMISSIVE);
 }
 
+/*
+ * Adjust information for a received SMP
+ * Return IB_SMI_DISCARD if the SMP should be dropped
+ */
+enum smi_action opa_smi_handle_dr_smp_recv(struct opa_smp *smp, u8 node_type,
+					   int port_num, int phys_port_cnt)
+{
+	return __smi_handle_dr_smp_recv(node_type, port_num, phys_port_cnt,
+					&smp->hop_ptr, smp->hop_cnt,
+					smp->route.dr.initial_path,
+					smp->route.dr.return_path,
+					opa_get_smp_direction(smp),
+					smp->route.dr.dr_dlid ==
+					OPA_LID_PERMISSIVE,
+					smp->route.dr.dr_slid ==
+					OPA_LID_PERMISSIVE);
+}
+
 static inline
 enum smi_forward_action __smi_check_forward_dr_smp(u8 hop_ptr, u8 hop_cnt,
 						   u8 direction,
@@ -277,6 +311,16 @@ enum smi_forward_action smi_check_forward_dr_smp(struct ib_smp *smp)
 					  smp->dr_slid != IB_LID_PERMISSIVE);
 }
 
+enum smi_forward_action opa_smi_check_forward_dr_smp(struct opa_smp *smp)
+{
+	return __smi_check_forward_dr_smp(smp->hop_ptr, smp->hop_cnt,
+					  opa_get_smp_direction(smp),
+					  smp->route.dr.dr_dlid ==
+					  OPA_LID_PERMISSIVE,
+					  smp->route.dr.dr_slid ==
+					  OPA_LID_PERMISSIVE);
+}
+
 /*
  * Return the forwarding port number from initial_path for outgoing SMP and
  * from return_path for returning SMP
@@ -286,3 +330,13 @@ int smi_get_fwd_port(struct ib_smp *smp)
 	return (!ib_get_smp_direction(smp) ? smp->initial_path[smp->hop_ptr+1] :
 		smp->return_path[smp->hop_ptr-1]);
 }
+
+/*
+ * Return the forwarding port number from initial_path for outgoing SMP and
+ * from return_path for returning SMP
+ */
+int opa_smi_get_fwd_port(struct opa_smp *smp)
+{
+	return !opa_get_smp_direction(smp) ? smp->route.dr.initial_path[smp->hop_ptr+1] :
+		smp->route.dr.return_path[smp->hop_ptr-1];
+}
diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h
index aff96ba..e95c537 100644
--- a/drivers/infiniband/core/smi.h
+++ b/drivers/infiniband/core/smi.h
@@ -62,6 +62,9 @@ extern enum smi_action smi_handle_dr_smp_send(struct ib_smp *smp,
  * Return IB_SMI_HANDLE if the SMP should be handled by the local SMA/SM
  * via process_mad
  */
+/* NOTE: This is also called on opa_smps, so do not check fields which are
+ * not common between ib_smp and opa_smp.
+ */
 static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
 						  struct ib_device *device)
 {
@@ -77,6 +80,9 @@ static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
  * Return IB_SMI_HANDLE if the SMP should be handled by the local SMA/SM
  * via process_mad
  */
+/* NOTE: This is also called on opa_smps, so do not check fields which are
+ * not common between ib_smp and opa_smp.
+ */
 static inline enum smi_action smi_check_local_returning_smp(struct ib_smp *smp,
 						   struct ib_device *device)
 {
diff --git a/include/rdma/opa_smi.h b/include/rdma/opa_smi.h
new file mode 100644
index 0000000..29063e8
--- /dev/null
+++ b/include/rdma/opa_smi.h
@@ -0,0 +1,106 @@
+/*
+ * Copyright (c) 2014 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if !defined(OPA_SMI_H)
+#define OPA_SMI_H
+
+#include <rdma/ib_mad.h>
+#include <rdma/ib_smi.h>
+
+#define OPA_SMP_LID_DATA_SIZE			2016
+#define OPA_SMP_DR_DATA_SIZE			1872
+#define OPA_SMP_MAX_PATH_HOPS			64
+
+#define OPA_SMI_CLASS_VERSION			0x80
+
+#define OPA_LID_PERMISSIVE			cpu_to_be32(0xFFFFFFFF)
+
+struct opa_smp {
+	u8	base_version;
+	u8	mgmt_class;
+	u8	class_version;
+	u8	method;
+	__be16	status;
+	u8	hop_ptr;
+	u8	hop_cnt;
+	__be64	tid;
+	__be16	attr_id;
+	__be16	resv;
+	__be32	attr_mod;
+	__be64	mkey;
+	union {
+		struct {
+			uint8_t data[OPA_SMP_LID_DATA_SIZE];
+		} lid;
+		struct {
+			__be32	dr_slid;
+			__be32	dr_dlid;
+			u8	initial_path[OPA_SMP_MAX_PATH_HOPS];
+			u8	return_path[OPA_SMP_MAX_PATH_HOPS];
+			u8	reserved[8];
+			u8	data[OPA_SMP_DR_DATA_SIZE];
+		} dr;
+	} route;
+} __packed;
+
+
+static inline u8
+opa_get_smp_direction(struct opa_smp *smp)
+{
+	return ib_get_smp_direction((struct ib_smp *)smp);
+}
+
+static inline u8 *opa_get_smp_data(struct opa_smp *smp)
+{
+	if (smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
+		return smp->route.dr.data;
+
+	return smp->route.lid.data;
+}
+
+static inline size_t opa_get_smp_data_size(struct opa_smp *smp)
+{
+	if (smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
+		return sizeof(smp->route.dr.data);
+
+	return sizeof(smp->route.lid.data);
+}
+
+static inline size_t opa_get_smp_header_size(struct opa_smp *smp)
+{
+	if (smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
+		return sizeof(*smp) - sizeof(smp->route.dr.data);
+
+	return sizeof(*smp) - sizeof(smp->route.lid.data);
+}
+
+#endif /* OPA_SMI_H */
-- 
1.8.2


* [PATCH v4 19/19] IB/mad: Implement Intel Omni-Path Architecture MAD processing
       [not found] ` <1423092585-26692-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (17 preceding siblings ...)
  2015-02-04 23:29   ` [PATCH v4 18/19] IB/mad: Implement Intel Omni-Path Architecture SMP processing ira.weiny-ral2JQCrhuEAvxtiuMwx3w
@ 2015-02-04 23:29   ` ira.weiny-ral2JQCrhuEAvxtiuMwx3w
       [not found]     ` <1423092585-26692-20-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2015-02-06 20:34   ` [PATCH v4 00/19] IB/mad: Add support for Intel Omni-Path Architecture (OPA) " Hal Rosenstock
  19 siblings, 1 reply; 84+ messages in thread
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w @ 2015-02-04 23:29 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Ira Weiny

From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

For devices which support OPA MADs:

OPA SMP packets must carry a valid pkey;
	process the wc.pkey_index returned by agents for the response.

Handle variable length OPA MADs by:

	* Adjusting the 'fake' WC for locally routed SMPs to represent the
	  proper incoming byte_len
	* Using the out_mad_size returned by the local HCA agents
		1) when sending agent responses on the wire
		2) when passing responses through the local_completions function

NOTE: wc.byte_len includes the GRH length and therefore is different from the
      in_mad_size specified to the local HCA agents.  out_mad_size should _not_
      include the GRH length as it is added by the verbs layer and is not part
      of MAD processing.
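
Concretely, the 'fake' WC adjustment in the hunk below boils down to:

	/* Make byte_len look like a wire reception: count the GRH even
	 * though DR SMPs carry none, since the receive path subtracts
	 * sizeof(struct ib_grh) to recover the MAD length.
	 */
	mad_wc.byte_len = mad_send_wr->send_buf.hdr_len
				+ mad_send_wr->send_buf.data_len
				+ sizeof(struct ib_grh);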

Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

---
 drivers/infiniband/core/agent.c    |  27 +++-
 drivers/infiniband/core/agent.h    |   3 +-
 drivers/infiniband/core/mad.c      | 251 ++++++++++++++++++++++++++++++-------
 drivers/infiniband/core/mad_priv.h |   1 +
 drivers/infiniband/core/mad_rmpp.c |  32 +++--
 drivers/infiniband/core/user_mad.c |  35 +++---
 include/rdma/ib_mad.h              |   2 +
 7 files changed, 276 insertions(+), 75 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index b6bd305..18275a5 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -80,13 +80,17 @@ ib_get_agent_port(struct ib_device *device, int port_num)
 
 void agent_send_response(struct ib_mad *mad, struct ib_grh *grh,
 			 struct ib_wc *wc, struct ib_device *device,
-			 int port_num, int qpn)
+			 int port_num, int qpn, u32 resp_mad_len,
+			 int opa)
 {
 	struct ib_agent_port_private *port_priv;
 	struct ib_mad_agent *agent;
 	struct ib_mad_send_buf *send_buf;
 	struct ib_ah *ah;
+	size_t data_len;
+	size_t hdr_len;
 	struct ib_mad_send_wr_private *mad_send_wr;
+	u8 base_version;
 
 	if (device->node_type == RDMA_NODE_IB_SWITCH)
 		port_priv = ib_get_agent_port(device, 0);
@@ -106,16 +110,29 @@ void agent_send_response(struct ib_mad *mad, struct ib_grh *grh,
 		return;
 	}
 
+	/* base version determines MAD size */
+	base_version = mad->mad_hdr.base_version;
+	if (opa && base_version == OPA_MGMT_BASE_VERSION) {
+		data_len = resp_mad_len - JUMBO_MGMT_MAD_HDR;
+		hdr_len = JUMBO_MGMT_MAD_HDR;
+	} else {
+		data_len = IB_MGMT_MAD_DATA;
+		hdr_len = IB_MGMT_MAD_HDR;
+	}
+
 	send_buf = ib_create_send_mad(agent, wc->src_qp, wc->pkey_index, 0,
-				      IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
-				      GFP_KERNEL,
-				      IB_MGMT_BASE_VERSION);
+				      hdr_len, data_len, GFP_KERNEL,
+				      base_version);
 	if (IS_ERR(send_buf)) {
 		dev_err(&device->dev, "ib_create_send_mad error\n");
 		goto err1;
 	}
 
-	memcpy(send_buf->mad, mad, sizeof *mad);
+	if (opa && base_version == OPA_MGMT_BASE_VERSION)
+		memcpy(send_buf->mad, mad, JUMBO_MGMT_MAD_HDR + data_len);
+	else
+		memcpy(send_buf->mad, mad, sizeof(*mad));
+
 	send_buf->ah = ah;
 
 	if (device->node_type == RDMA_NODE_IB_SWITCH) {
diff --git a/drivers/infiniband/core/agent.h b/drivers/infiniband/core/agent.h
index 6669287..1dee837 100644
--- a/drivers/infiniband/core/agent.h
+++ b/drivers/infiniband/core/agent.h
@@ -46,6 +46,7 @@ extern int ib_agent_port_close(struct ib_device *device, int port_num);
 
 extern void agent_send_response(struct ib_mad *mad, struct ib_grh *grh,
 				struct ib_wc *wc, struct ib_device *device,
-				int port_num, int qpn);
+				int port_num, int qpn, u32 resp_mad_len,
+				int opa);
 
 #endif	/* __AGENT_H_ */
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 5aefe4c..9b7dc36 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2005 Intel Corporation.  All rights reserved.
  * Copyright (c) 2005 Mellanox Technologies Ltd.  All rights reserved.
  * Copyright (c) 2009 HNR Consulting. All rights reserved.
+ * Copyright (c) 2014 Intel Corporation.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -44,6 +45,7 @@
 #include "mad_priv.h"
 #include "mad_rmpp.h"
 #include "smi.h"
+#include "opa_smi.h"
 #include "agent.h"
 
 MODULE_LICENSE("Dual BSD/GPL");
@@ -733,6 +735,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 {
 	int ret = 0;
 	struct ib_smp *smp = mad_send_wr->send_buf.mad;
+	struct opa_smp *opa_smp = (struct opa_smp *)smp;
 	unsigned long flags;
 	struct ib_mad_local_private *local;
 	struct ib_mad_private *mad_priv;
@@ -744,6 +747,9 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	struct ib_send_wr *send_wr = &mad_send_wr->send_wr;
 	size_t in_mad_size = mad_agent_priv->agent.device->cached_dev_attrs.max_mad_size;
 	size_t out_mad_size;
+	u16 drslid;
+	int opa = mad_agent_priv->qp_info->qp->device->cached_dev_attrs.device_cap_flags2 &
+		  IB_DEVICE_OPA_MAD_SUPPORT;
 
 	if (device->node_type == RDMA_NODE_IB_SWITCH &&
 	    smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
@@ -757,13 +763,36 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	 * If we are at the start of the LID routed part, don't update the
 	 * hop_ptr or hop_cnt.  See section 14.2.2, Vol 1 IB spec.
 	 */
-	if ((ib_get_smp_direction(smp) ? smp->dr_dlid : smp->dr_slid) ==
-	     IB_LID_PERMISSIVE &&
-	     smi_handle_dr_smp_send(smp, device->node_type, port_num) ==
-	     IB_SMI_DISCARD) {
-		ret = -EINVAL;
-		dev_err(&device->dev, "Invalid directed route\n");
-		goto out;
+	if (opa && smp->class_version == OPA_SMP_CLASS_VERSION) {
+		u32 opa_drslid;
+		if ((opa_get_smp_direction(opa_smp)
+		     ? opa_smp->route.dr.dr_dlid : opa_smp->route.dr.dr_slid) ==
+		     OPA_LID_PERMISSIVE &&
+		     opa_smi_handle_dr_smp_send(opa_smp, device->node_type,
+						port_num) == IB_SMI_DISCARD) {
+			ret = -EINVAL;
+			dev_err(&device->dev, "OPA Invalid directed route\n");
+			goto out;
+		}
+		opa_drslid = be32_to_cpu(opa_smp->route.dr.dr_slid);
+		if (opa_drslid != OPA_LID_PERMISSIVE &&
+		    opa_drslid & 0xffff0000) {
+			ret = -EINVAL;
+			dev_err(&device->dev, "OPA Invalid dr_slid 0x%x\n",
+			       opa_drslid);
+			goto out;
+		}
+		drslid = (u16)(opa_drslid & 0x0000ffff);
+	} else {
+		if ((ib_get_smp_direction(smp) ? smp->dr_dlid : smp->dr_slid) ==
+		     IB_LID_PERMISSIVE &&
+		     smi_handle_dr_smp_send(smp, device->node_type, port_num) ==
+		     IB_SMI_DISCARD) {
+			ret = -EINVAL;
+			dev_err(&device->dev, "Invalid directed route\n");
+			goto out;
+		}
+		drslid = be16_to_cpu(smp->dr_slid);
 	}
 
 	/* Check to post send on QP or process locally */
@@ -789,10 +818,16 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	}
 
 	build_smp_wc(mad_agent_priv->agent.qp,
-		     send_wr->wr_id, be16_to_cpu(smp->dr_slid),
+		     send_wr->wr_id, drslid,
 		     send_wr->wr.ud.pkey_index,
 		     send_wr->wr.ud.port_num, &mad_wc);
 
+	if (opa && smp->base_version == OPA_MGMT_BASE_VERSION) {
+		mad_wc.byte_len = mad_send_wr->send_buf.hdr_len
+					+ mad_send_wr->send_buf.data_len
+					+ sizeof(struct ib_grh);
+	}
+
 	/* No GRH for DR SMP */
 	ret = device->process_mad(device, 0, port_num, &mad_wc, NULL,
 				  (struct ib_mad_hdr *)smp, in_mad_size,
@@ -821,7 +856,10 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 		port_priv = ib_get_mad_port(mad_agent_priv->agent.device,
 					    mad_agent_priv->agent.port_num);
 		if (port_priv) {
-			memcpy(&mad_priv->mad.mad, smp, sizeof(struct ib_mad));
+			if (opa && smp->base_version == OPA_MGMT_BASE_VERSION)
+				memcpy(&mad_priv->mad.mad, smp, sizeof(struct jumbo_mad));
+			else
+				memcpy(&mad_priv->mad.mad, smp, sizeof(struct ib_mad));
 			recv_mad_agent = find_mad_agent(port_priv,
 						        &mad_priv->mad.mad);
 		}
@@ -844,6 +882,8 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	}
 
 	local->mad_send_wr = mad_send_wr;
+	local->mad_send_wr->send_wr.wr.ud.pkey_index = mad_wc.pkey_index;
+	local->return_wc_byte_len = out_mad_size;
 	/* Reference MAD agent until send side of local completion handled */
 	atomic_inc(&mad_agent_priv->refcount);
 	/* Queue local completion to local list */
@@ -1737,14 +1777,18 @@ out:
 	return mad_agent;
 }
 
-static int validate_mad(struct ib_mad_hdr *mad_hdr, u32 qp_num)
+static int validate_mad(struct ib_mad_hdr *mad_hdr,
+			struct ib_mad_qp_info *qp_info,
+			int opa)
 {
 	int valid = 0;
+	u32 qp_num = qp_info->qp->qp_num;
 
 	/* Make sure MAD base version is understood */
-	if (mad_hdr->base_version != IB_MGMT_BASE_VERSION) {
-		pr_err("MAD received with unsupported base version %d\n",
-			mad_hdr->base_version);
+	if (mad_hdr->base_version != IB_MGMT_BASE_VERSION &&
+	    (!opa || mad_hdr->base_version != OPA_MGMT_BASE_VERSION)) {
+		pr_err("MAD received with unsupported base version %d %s\n",
+		       mad_hdr->base_version, opa ? "(opa)" : "");
 		goto out;
 	}
 
@@ -1844,18 +1888,18 @@ ib_find_send_mad(struct ib_mad_agent_private *mad_agent_priv,
 		 struct ib_mad_recv_wc *wc)
 {
 	struct ib_mad_send_wr_private *wr;
-	struct ib_mad *mad;
+	struct ib_mad_hdr *mad_hdr;
 
-	mad = (struct ib_mad *)wc->recv_buf.mad;
+	mad_hdr = (struct ib_mad_hdr *)wc->recv_buf.mad;
 
 	list_for_each_entry(wr, &mad_agent_priv->wait_list, agent_list) {
-		if ((wr->tid == mad->mad_hdr.tid) &&
+		if ((wr->tid == mad_hdr->tid) &&
 		    rcv_has_same_class(wr, wc) &&
 		    /*
 		     * Don't check GID for direct routed MADs.
 		     * These might have permissive LIDs.
 		     */
-		    (is_direct(wc->recv_buf.mad->mad_hdr.mgmt_class) ||
+		    (is_direct(mad_hdr->mgmt_class) ||
 		     rcv_has_same_gid(mad_agent_priv, wr, wc)))
 			return (wr->status == IB_WC_SUCCESS) ? wr : NULL;
 	}
@@ -1866,14 +1910,14 @@ ib_find_send_mad(struct ib_mad_agent_private *mad_agent_priv,
 	 */
 	list_for_each_entry(wr, &mad_agent_priv->send_list, agent_list) {
 		if (is_rmpp_data_mad(mad_agent_priv, wr->send_buf.mad) &&
-		    wr->tid == mad->mad_hdr.tid &&
+		    wr->tid == mad_hdr->tid &&
 		    wr->timeout &&
 		    rcv_has_same_class(wr, wc) &&
 		    /*
 		     * Don't check GID for direct routed MADs.
 		     * These might have permissive LIDs.
 		     */
-		    (is_direct(wc->recv_buf.mad->mad_hdr.mgmt_class) ||
+		    (is_direct(mad_hdr->mgmt_class) ||
 		     rcv_has_same_gid(mad_agent_priv, wr, wc)))
 			/* Verify request has not been canceled */
 			return (wr->status == IB_WC_SUCCESS) ? wr : NULL;
@@ -1889,7 +1933,7 @@ void ib_mark_mad_done(struct ib_mad_send_wr_private *mad_send_wr)
 			      &mad_send_wr->mad_agent_priv->done_list);
 }
 
-static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
+void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
 				 struct ib_mad_recv_wc *mad_recv_wc)
 {
 	struct ib_mad_send_wr_private *mad_send_wr;
@@ -1992,7 +2036,9 @@ enum smi_action handle_ib_smi(struct ib_mad_port_private *port_priv,
 				    &response->grh, wc,
 				    port_priv->device,
 				    smi_get_fwd_port(&recv->mad.smp),
-				    qp_info->qp->qp_num);
+				    qp_info->qp->qp_num,
+				    sizeof(struct ib_mad),
+				    0);
 
 		return IB_SMI_DISCARD;
 	}
@@ -2005,7 +2051,9 @@ static size_t mad_recv_buf_size(struct ib_device *dev)
 }
 
 static bool generate_unmatched_resp(struct ib_mad_private *recv,
-				    struct ib_mad_private *response)
+				    struct ib_mad_private *response,
+				    size_t *resp_len,
+				    int opa)
 {
 	if (recv->mad.mad.mad_hdr.method == IB_MGMT_METHOD_GET ||
 	    recv->mad.mad.mad_hdr.method == IB_MGMT_METHOD_SET) {
@@ -2019,29 +2067,103 @@ static bool generate_unmatched_resp(struct ib_mad_private *recv,
 		if (recv->mad.mad.mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
 			response->mad.mad.mad_hdr.status |= IB_SMP_DIRECTION;
 
+		if (opa && recv->mad.mad.mad_hdr.base_version == OPA_MGMT_BASE_VERSION) {
+			if (recv->mad.mad.mad_hdr.mgmt_class ==
+			    IB_MGMT_CLASS_SUBN_LID_ROUTED ||
+			    recv->mad.mad.mad_hdr.mgmt_class ==
+			    IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
+				*resp_len = opa_get_smp_header_size(
+							(struct opa_smp *)&recv->mad.smp);
+			else
+				*resp_len = sizeof(struct ib_mad_hdr);
+		}
+
 		return true;
 	} else {
 		return false;
 	}
 }
+
+static enum smi_action
+handle_opa_smi(struct ib_mad_port_private *port_priv,
+	       struct ib_mad_qp_info *qp_info,
+	       struct ib_wc *wc,
+	       int port_num,
+	       struct ib_mad_private *recv,
+	       struct ib_mad_private *response)
+{
+	enum smi_forward_action retsmi;
+
+	if (opa_smi_handle_dr_smp_recv(&recv->mad.opa_smp,
+				   port_priv->device->node_type,
+				   port_num,
+				   port_priv->device->phys_port_cnt) ==
+				   IB_SMI_DISCARD)
+		return IB_SMI_DISCARD;
+
+	retsmi = opa_smi_check_forward_dr_smp(&recv->mad.opa_smp);
+	if (retsmi == IB_SMI_LOCAL)
+		return IB_SMI_HANDLE;
+
+	if (retsmi == IB_SMI_SEND) { /* don't forward */
+		if (opa_smi_handle_dr_smp_send(&recv->mad.opa_smp,
+					   port_priv->device->node_type,
+					   port_num) == IB_SMI_DISCARD)
+			return IB_SMI_DISCARD;
+
+		if (opa_smi_check_local_smp(&recv->mad.opa_smp, port_priv->device) == IB_SMI_DISCARD)
+			return IB_SMI_DISCARD;
+
+	} else if (port_priv->device->node_type == RDMA_NODE_IB_SWITCH) {
+		/* forward case for switches */
+		memcpy(response, recv, sizeof(*response));
+		response->header.recv_wc.wc = &response->header.wc;
+		response->header.recv_wc.recv_buf.jumbo_mad = &response->mad.jumbo_mad;
+		response->header.recv_wc.recv_buf.grh = &response->grh;
+
+		agent_send_response((struct ib_mad *)&response->mad.mad,
+				    &response->grh, wc,
+				    port_priv->device,
+				    opa_smi_get_fwd_port(&recv->mad.opa_smp),
+				    qp_info->qp->qp_num,
+				    recv->header.wc.byte_len,
+				    1);
+
+		return IB_SMI_DISCARD;
+	}
+
+	return IB_SMI_HANDLE;
+}
+
+static enum smi_action
+handle_smi(struct ib_mad_port_private *port_priv,
+	   struct ib_mad_qp_info *qp_info,
+	   struct ib_wc *wc,
+	   int port_num,
+	   struct ib_mad_private *recv,
+	   struct ib_mad_private *response,
+	   int opa)
+{
+	if (opa && recv->mad.mad.mad_hdr.base_version == OPA_MGMT_BASE_VERSION &&
+	    recv->mad.mad.mad_hdr.class_version == OPA_SMI_CLASS_VERSION)
+		return handle_opa_smi(port_priv, qp_info, wc, port_num, recv, response);
+
+	return handle_ib_smi(port_priv, qp_info, wc, port_num, recv, response);
+}
+
 static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
-				     struct ib_wc *wc)
+				     struct ib_wc *wc,
+				     struct ib_mad_private_header *mad_priv_hdr,
+				     struct ib_mad_qp_info *qp_info)
 {
-	struct ib_mad_qp_info *qp_info;
-	struct ib_mad_private_header *mad_priv_hdr;
 	struct ib_mad_private *recv, *response = NULL;
-	struct ib_mad_list_head *mad_list;
 	struct ib_mad_agent_private *mad_agent;
 	int port_num;
 	int ret = IB_MAD_RESULT_SUCCESS;
 	size_t resp_mad_size;
+	int opa = qp_info->qp->device->cached_dev_attrs.device_cap_flags2 &
+		  IB_DEVICE_OPA_MAD_SUPPORT;
 
-	mad_list = (struct ib_mad_list_head *)(unsigned long)wc->wr_id;
-	qp_info = mad_list->mad_queue->qp_info;
-	dequeue_mad(mad_list);
-
-	mad_priv_hdr = container_of(mad_list, struct ib_mad_private_header,
-				    mad_list);
 	recv = container_of(mad_priv_hdr, struct ib_mad_private, header);
 	ib_dma_unmap_single(port_priv->device,
 			    recv->header.mapping,
@@ -2051,7 +2173,13 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 	/* Setup MAD receive work completion from "normal" work completion */
 	recv->header.wc = *wc;
 	recv->header.recv_wc.wc = &recv->header.wc;
-	recv->header.recv_wc.mad_len = sizeof(struct ib_mad);
+	if (opa && recv->mad.mad.mad_hdr.base_version == OPA_MGMT_BASE_VERSION) {
+		recv->header.recv_wc.mad_len = wc->byte_len - sizeof(struct ib_grh);
+		recv->header.recv_wc.mad_seg_size = sizeof(struct jumbo_mad);
+	} else {
+		recv->header.recv_wc.mad_len = sizeof(struct ib_mad);
+		recv->header.recv_wc.mad_seg_size = sizeof(struct ib_mad);
+	}
 	recv->header.recv_wc.recv_buf.mad = &recv->mad.mad;
 	recv->header.recv_wc.recv_buf.grh = &recv->grh;
 
@@ -2059,7 +2187,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 		snoop_recv(qp_info, &recv->header.recv_wc, IB_MAD_SNOOP_RECVS);
 
 	/* Validate MAD */
-	if (!validate_mad(&recv->mad.mad.mad_hdr, qp_info->qp->qp_num))
+	if (!validate_mad(&recv->mad.mad.mad_hdr, qp_info, opa))
 		goto out;
 
 	response = alloc_mad_priv(port_priv->device, &resp_mad_size);
@@ -2076,8 +2204,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 
 	if (recv->mad.mad.mad_hdr.mgmt_class ==
 	    IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) {
-		if (handle_ib_smi(port_priv, qp_info, wc, port_num, recv,
-				  response)
+		if (handle_smi(port_priv, qp_info, wc, port_num, recv, response, opa)
 		    == IB_SMI_DISCARD)
 			goto out;
 	}
@@ -2099,7 +2226,9 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 						    &recv->grh, wc,
 						    port_priv->device,
 						    port_num,
-						    qp_info->qp->qp_num);
+						    qp_info->qp->qp_num,
+						    resp_mad_size,
+						    opa);
 				goto out;
 			}
 		}
@@ -2114,9 +2243,12 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
 		 */
 		recv = NULL;
 	} else if ((ret & IB_MAD_RESULT_SUCCESS) &&
-		   generate_unmatched_resp(recv, response)) {
+		   generate_unmatched_resp(recv, response, &resp_mad_size, opa)) {
 		agent_send_response(&response->mad.mad, &recv->grh, wc,
-				    port_priv->device, port_num, qp_info->qp->qp_num);
+				    port_priv->device, port_num,
+				    qp_info->qp->qp_num,
+				    resp_mad_size,
+				    opa);
 	}
 
 out:
@@ -2381,6 +2513,23 @@ static void mad_error_handler(struct ib_mad_port_private *port_priv,
 	}
 }
 
+static void ib_mad_recv_mad(struct ib_mad_port_private *port_priv,
+			    struct ib_wc *wc)
+{
+	struct ib_mad_qp_info *qp_info;
+	struct ib_mad_list_head *mad_list;
+	struct ib_mad_private_header *mad_priv_hdr;
+
+	mad_list = (struct ib_mad_list_head *)(unsigned long)wc->wr_id;
+	qp_info = mad_list->mad_queue->qp_info;
+	dequeue_mad(mad_list);
+
+	mad_priv_hdr = container_of(mad_list, struct ib_mad_private_header,
+				    mad_list);
+
+	ib_mad_recv_done_handler(port_priv, wc, mad_priv_hdr, qp_info);
+}
+
 /*
  * IB MAD completion callback
  */
@@ -2399,7 +2548,7 @@ static void ib_mad_completion_handler(struct work_struct *work)
 				ib_mad_send_done_handler(port_priv, &wc);
 				break;
 			case IB_WC_RECV:
-				ib_mad_recv_done_handler(port_priv, &wc);
+				ib_mad_recv_mad(port_priv, &wc);
 				break;
 			default:
 				BUG_ON(1);
@@ -2518,10 +2667,14 @@ static void local_completions(struct work_struct *work)
 	int free_mad;
 	struct ib_wc wc;
 	struct ib_mad_send_wc mad_send_wc;
+	int opa;
 
 	mad_agent_priv =
 		container_of(work, struct ib_mad_agent_private, local_work);
 
+	opa = mad_agent_priv->qp_info->qp->device->cached_dev_attrs.device_cap_flags2 &
+	      IB_DEVICE_OPA_MAD_SUPPORT;
+
 	spin_lock_irqsave(&mad_agent_priv->lock, flags);
 	while (!list_empty(&mad_agent_priv->local_list)) {
 		local = list_entry(mad_agent_priv->local_list.next,
@@ -2531,6 +2684,7 @@ static void local_completions(struct work_struct *work)
 		spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
 		free_mad = 0;
 		if (local->mad_priv) {
+			u8 base_version;
 			recv_mad_agent = local->recv_mad_agent;
 			if (!recv_mad_agent) {
 				dev_err(&mad_agent_priv->agent.device->dev,
@@ -2546,11 +2700,20 @@ static void local_completions(struct work_struct *work)
 			build_smp_wc(recv_mad_agent->agent.qp,
 				     (unsigned long) local->mad_send_wr,
 				     be16_to_cpu(IB_LID_PERMISSIVE),
-				     0, recv_mad_agent->agent.port_num, &wc);
+				     local->mad_send_wr->send_wr.wr.ud.pkey_index,
+				     recv_mad_agent->agent.port_num, &wc);
 
 			local->mad_priv->header.recv_wc.wc = &wc;
-			local->mad_priv->header.recv_wc.mad_len =
-						sizeof(struct ib_mad);
+
+			base_version = local->mad_priv->mad.mad.mad_hdr.base_version;
+			if (opa && base_version == OPA_MGMT_BASE_VERSION) {
+				local->mad_priv->header.recv_wc.mad_len = local->return_wc_byte_len;
+				local->mad_priv->header.recv_wc.mad_seg_size = sizeof(struct jumbo_mad);
+			} else {
+				local->mad_priv->header.recv_wc.mad_len = sizeof(struct ib_mad);
+				local->mad_priv->header.recv_wc.mad_seg_size = sizeof(struct ib_mad);
+			}
+
 			INIT_LIST_HEAD(&local->mad_priv->header.recv_wc.rmpp_list);
 			list_add(&local->mad_priv->header.recv_wc.recv_buf.list,
 				 &local->mad_priv->header.recv_wc.rmpp_list);
@@ -2699,7 +2862,7 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
 	struct ib_mad_queue *recv_queue = &qp_info->recv_queue;
 
 	/* Initialize common scatter list fields */
-	sg_list.length = sizeof *mad_priv - sizeof mad_priv->header;
+	sg_list.length = mad_recv_buf_size(qp_info->port_priv->device);
 	sg_list.lkey = (*qp_info->port_priv->mr).lkey;
 
 	/* Initialize common receive WR fields */
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index 141b05a..dd42ace 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -154,6 +154,7 @@ struct ib_mad_local_private {
 	struct ib_mad_private *mad_priv;
 	struct ib_mad_agent_private *recv_mad_agent;
 	struct ib_mad_send_wr_private *mad_send_wr;
+	size_t return_wc_byte_len;
 };
 
 struct ib_mad_mgmt_method_table {
diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
index 7184530..6f69d5a 100644
--- a/drivers/infiniband/core/mad_rmpp.c
+++ b/drivers/infiniband/core/mad_rmpp.c
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2005 Intel Inc. All rights reserved.
  * Copyright (c) 2005-2006 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2014 Intel Corporation.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -67,6 +68,7 @@ struct mad_rmpp_recv {
 	u8 mgmt_class;
 	u8 class_version;
 	u8 method;
+	u8 base_version;
 };
 
 static inline void deref_rmpp_recv(struct mad_rmpp_recv *rmpp_recv)
@@ -318,6 +320,7 @@ create_rmpp_recv(struct ib_mad_agent_private *agent,
 	rmpp_recv->mgmt_class = mad_hdr->mgmt_class;
 	rmpp_recv->class_version = mad_hdr->class_version;
 	rmpp_recv->method  = mad_hdr->method;
+	rmpp_recv->base_version  = mad_hdr->base_version;
 	return rmpp_recv;
 
 error:	kfree(rmpp_recv);
@@ -431,16 +434,25 @@ static void update_seg_num(struct mad_rmpp_recv *rmpp_recv,
 
 static inline int get_mad_len(struct mad_rmpp_recv *rmpp_recv)
 {
-	struct ib_rmpp_mad *rmpp_mad;
+	struct ib_rmpp_base *rmpp_base;
 	int hdr_size, data_size, pad;
+	int opa = rmpp_recv->agent->qp_info->qp->device->cached_dev_attrs.device_cap_flags2 &
+		  IB_DEVICE_OPA_MAD_SUPPORT;
 
-	rmpp_mad = (struct ib_rmpp_mad *)rmpp_recv->cur_seg_buf->mad;
+	rmpp_base = (struct ib_rmpp_base *)rmpp_recv->cur_seg_buf->mad;
 
-	hdr_size = ib_get_mad_data_offset(rmpp_mad->base.mad_hdr.mgmt_class);
-	data_size = sizeof(struct ib_rmpp_mad) - hdr_size;
-	pad = IB_MGMT_RMPP_DATA - be32_to_cpu(rmpp_mad->base.rmpp_hdr.paylen_newwin);
-	if (pad > IB_MGMT_RMPP_DATA || pad < 0)
-		pad = 0;
+	hdr_size = ib_get_mad_data_offset(rmpp_base->mad_hdr.mgmt_class);
+	if (opa && rmpp_recv->base_version == OPA_MGMT_BASE_VERSION) {
+		data_size = sizeof(struct jumbo_rmpp_mad) - hdr_size;
+		pad = JUMBO_MGMT_RMPP_DATA - be32_to_cpu(rmpp_base->rmpp_hdr.paylen_newwin);
+		if (pad > JUMBO_MGMT_RMPP_DATA || pad < 0)
+			pad = 0;
+	} else {
+		data_size = sizeof(struct ib_rmpp_mad) - hdr_size;
+		pad = IB_MGMT_RMPP_DATA - be32_to_cpu(rmpp_base->rmpp_hdr.paylen_newwin);
+		if (pad > IB_MGMT_RMPP_DATA || pad < 0)
+			pad = 0;
+	}
 
 	return hdr_size + rmpp_recv->seg_num * data_size - pad;
 }
@@ -933,11 +945,11 @@ int ib_process_rmpp_send_wc(struct ib_mad_send_wr_private *mad_send_wr,
 
 int ib_retry_rmpp(struct ib_mad_send_wr_private *mad_send_wr)
 {
-	struct ib_rmpp_base *rmpp_base;
+	struct ib_rmpp_mad *rmpp_mad;
 	int ret;
 
-	rmpp_base = mad_send_wr->send_buf.mad;
-	if (!(ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) &
+	rmpp_mad = mad_send_wr->send_buf.mad;
+	if (!(ib_get_rmpp_flags(&rmpp_mad->base.rmpp_hdr) &
 	      IB_MGMT_RMPP_FLAG_ACTIVE))
 		return IB_RMPP_RESULT_UNHANDLED; /* RMPP not active */
 
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index ac33d34..1192f6c 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -263,20 +263,23 @@ static ssize_t copy_recv_mad(struct ib_umad_file *file, char __user *buf,
 {
 	struct ib_mad_recv_buf *recv_buf;
 	int left, seg_payload, offset, max_seg_payload;
+	size_t seg_size;
 
-	/* We need enough room to copy the first (or only) MAD segment. */
 	recv_buf = &packet->recv_wc->recv_buf;
-	if ((packet->length <= sizeof (*recv_buf->mad) &&
+	seg_size = packet->recv_wc->mad_seg_size;
+
+	/* We need enough room to copy the first (or only) MAD segment. */
+	if ((packet->length <= seg_size &&
 	     count < hdr_size(file) + packet->length) ||
-	    (packet->length > sizeof (*recv_buf->mad) &&
-	     count < hdr_size(file) + sizeof (*recv_buf->mad)))
+	    (packet->length > seg_size &&
+	     count < hdr_size(file) + seg_size))
 		return -EINVAL;
 
 	if (copy_to_user(buf, &packet->mad, hdr_size(file)))
 		return -EFAULT;
 
 	buf += hdr_size(file);
-	seg_payload = min_t(int, packet->length, sizeof (*recv_buf->mad));
+	seg_payload = min_t(int, packet->length, seg_size);
 	if (copy_to_user(buf, recv_buf->mad, seg_payload))
 		return -EFAULT;
 
@@ -293,7 +296,7 @@ static ssize_t copy_recv_mad(struct ib_umad_file *file, char __user *buf,
 			return -ENOSPC;
 		}
 		offset = ib_get_mad_data_offset(recv_buf->mad->mad_hdr.mgmt_class);
-		max_seg_payload = sizeof (struct ib_mad) - offset;
+		max_seg_payload = seg_size - offset;
 
 		for (left = packet->length - seg_payload, buf += seg_payload;
 		     left; left -= seg_payload, buf += seg_payload) {
@@ -448,9 +451,10 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
 	struct ib_mad_agent *agent;
 	struct ib_ah_attr ah_attr;
 	struct ib_ah *ah;
-	struct ib_rmpp_base *rmpp_base;
+	struct ib_rmpp_mad *rmpp_mad;
 	__be64 *tid;
 	int ret, data_len, hdr_len, copy_offset, rmpp_active;
+	u8 base_version;
 
 	if (count < hdr_size(file) + IB_MGMT_RMPP_HDR)
 		return -EINVAL;
@@ -504,25 +508,26 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
 		goto err_up;
 	}
 
-	rmpp_base = (struct ib_rmpp_base *) packet->mad.data;
-	hdr_len = ib_get_mad_data_offset(rmpp_base->mad_hdr.mgmt_class);
+	rmpp_mad = (struct ib_rmpp_mad *) packet->mad.data;
+	hdr_len = ib_get_mad_data_offset(rmpp_mad->base.mad_hdr.mgmt_class);
 
-	if (ib_is_mad_class_rmpp(rmpp_base->mad_hdr.mgmt_class)
+	if (ib_is_mad_class_rmpp(rmpp_mad->base.mad_hdr.mgmt_class)
 	    && ib_mad_kernel_rmpp_agent(agent)) {
 		copy_offset = IB_MGMT_RMPP_HDR;
-		rmpp_active = ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) &
+		rmpp_active = ib_get_rmpp_flags(&rmpp_mad->base.rmpp_hdr) &
 						IB_MGMT_RMPP_FLAG_ACTIVE;
 	} else {
 		copy_offset = IB_MGMT_MAD_HDR;
 		rmpp_active = 0;
 	}
 
+	base_version = ((struct ib_mad_hdr *)&packet->mad.data)->base_version;
 	data_len = count - hdr_size(file) - hdr_len;
 	packet->msg = ib_create_send_mad(agent,
 					 be32_to_cpu(packet->mad.hdr.qpn),
 					 packet->mad.hdr.pkey_index, rmpp_active,
 					 hdr_len, data_len, GFP_KERNEL,
-					 IB_MGMT_BASE_VERSION);
+					 base_version);
 	if (IS_ERR(packet->msg)) {
 		ret = PTR_ERR(packet->msg);
 		goto err_ah;
@@ -558,12 +563,12 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
 		tid = &((struct ib_mad_hdr *) packet->msg->mad)->tid;
 		*tid = cpu_to_be64(((u64) agent->hi_tid) << 32 |
 				   (be64_to_cpup(tid) & 0xffffffff));
-		rmpp_base->mad_hdr.tid = *tid;
+		rmpp_mad->base.mad_hdr.tid = *tid;
 	}
 
 	if (!ib_mad_kernel_rmpp_agent(agent)
-	   && ib_is_mad_class_rmpp(rmpp_base->mad_hdr.mgmt_class)
-	   && (ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) & IB_MGMT_RMPP_FLAG_ACTIVE)) {
+	   && ib_is_mad_class_rmpp(rmpp_mad->base.mad_hdr.mgmt_class)
+	   && (ib_get_rmpp_flags(&rmpp_mad->base.rmpp_hdr) & IB_MGMT_RMPP_FLAG_ACTIVE)) {
 		spin_lock_irq(&file->send_lock);
 		list_add_tail(&packet->list, &file->send_list);
 		spin_unlock_irq(&file->send_lock);
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index 8938f1e..f5b6a27 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -436,6 +436,7 @@ struct ib_mad_recv_buf {
  * @recv_buf: Specifies the location of the received data buffer(s).
  * @rmpp_list: Specifies a list of RMPP reassembled received MAD buffers.
  * @mad_len: The length of the received MAD, without duplicated headers.
+ * @mad_seg_size: The size of individual MAD segments
  *
  * For received response, the wr_id contains a pointer to the ib_mad_send_buf
  *   for the corresponding send request.
@@ -445,6 +446,7 @@ struct ib_mad_recv_wc {
 	struct ib_mad_recv_buf	recv_buf;
 	struct list_head	rmpp_list;
 	int			mad_len;
+	size_t			mad_seg_size;
 };
 
 /**
-- 
1.8.2


* Re: [PATCH v4 00/19] IB/mad: Add support for Intel Omni-Path Architecture (OPA) MAD processing.
       [not found] ` <1423092585-26692-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (18 preceding siblings ...)
  2015-02-04 23:29   ` [PATCH v4 19/19] IB/mad: Implement Intel Omni-Path Architecture MAD processing ira.weiny-ral2JQCrhuEAvxtiuMwx3w
@ 2015-02-06 20:34   ` Hal Rosenstock
       [not found]     ` <54D52562.5050408-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  19 siblings, 1 reply; 84+ messages in thread
From: Hal Rosenstock @ 2015-02-06 20:34 UTC (permalink / raw)
  To: ira.weiny-ral2JQCrhuEAvxtiuMwx3w
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 2/4/2015 6:29 PM, ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> 
> The following patch series modifies the kernel MAD processing (ib_mad/ib_umad)
> and related interfaces to send and receive Intel Omni-Path Architecture MADs on
> devices which support them.
> 
> In addition to supporting some IBTA management classes, OPA devices use MADs
> with lengths up to 2K.  These "jumbo" MADs increase the performance of
> management traffic.
> 
> To distinguish IBTA MADs from OPA MADs a new Base Version is introduced.  

With your recent changes, I don't think that statement above is strictly
true any longer. While OPA does use a different base version for its
jumbo MADs, aren't OPA MADs distinguished from IBTA MADs by the new OPA
MAD device capability bit?

> The
> new format shares the same common header with IBTA MADs which allows us to
> share most of the MAD processing code when dealing with the new Base Version.
> 
> 
> The patch series is broken into 3 main areas.
> 
> 1) Add the ability for devices to indicate MAD size.
>    modify the MAD code to use this MAD size
> 
> 2) Enhance the interface to the device agents to support larger and variable
>    length MADs.
> 
> 3) Add capability bit to indicate support for OPA MADs
> 
> 4) Add support for creating and processing OPA MADs
> 
> 
> Changes for V4:
> 
> 	Rebased to latest Rolands for-next branch (3.19-rc4)
> 	Fixed compile issue in ehca driver found with 0-day build.
> 
> 
> Ira Weiny (19):
>   IB/mad: Rename is_data_mad to is_rmpp_data_mad
>   IB/core: Cache device attributes for use by upper level drivers
>   IB/mad: Change validate_mad signature to take ib_mad_hdr rather than
>     ib_mad
>   IB/mad: Change ib_response_mad signature to take ib_mad_hdr rather
>     than ib_mad
>   IB/mad: Change cast in rcv_has_same_class
>   IB/core: Add max_mad_size to ib_device_attr
>   IB/mad: Convert ib_mad_private allocations from kmem_cache to kmalloc
>   IB/mad: Add helper function for smi_handle_dr_smp_send
>   IB/mad: Add helper function for smi_handle_dr_smp_recv
>   IB/mad: Add helper function for smi_check_forward_dr_smp
>   IB/mad: Add helper function for SMI processing
>   IB/mad: Add MAD size parameters to process_mad
>   IB/mad: Add base version parameter to ib_create_send_mad
>   IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
>   IB/mad: Create jumbo_mad data structures
>   IB/mad: Add Intel Omni-Path Architecture defines
>   IB/mad: Implement support for Intel Omni-Path Architecture base
>     version MADs in ib_create_send_mad
>   IB/mad: Implement Intel Omni-Path Architecture SMP processing
>   IB/mad: Implement Intel Omni-Path Architecture MAD processing
> 
>  drivers/infiniband/core/agent.c              |  26 +-
>  drivers/infiniband/core/agent.h              |   3 +-
>  drivers/infiniband/core/cm.c                 |   6 +-
>  drivers/infiniband/core/device.c             |   2 +
>  drivers/infiniband/core/mad.c                | 519 ++++++++++++++++++---------
>  drivers/infiniband/core/mad_priv.h           |   7 +-
>  drivers/infiniband/core/mad_rmpp.c           | 144 ++++----
>  drivers/infiniband/core/opa_smi.h            |  78 ++++
>  drivers/infiniband/core/sa_query.c           |   3 +-
>  drivers/infiniband/core/smi.c                | 231 ++++++++----
>  drivers/infiniband/core/smi.h                |   6 +
>  drivers/infiniband/core/sysfs.c              |   5 +-
>  drivers/infiniband/core/user_mad.c           |  38 +-
>  drivers/infiniband/hw/amso1100/c2_provider.c |   5 +-
>  drivers/infiniband/hw/amso1100/c2_rnic.c     |   1 +
>  drivers/infiniband/hw/cxgb3/iwch_provider.c  |   6 +-
>  drivers/infiniband/hw/cxgb4/provider.c       |   8 +-
>  drivers/infiniband/hw/ehca/ehca_hca.c        |   3 +
>  drivers/infiniband/hw/ehca/ehca_iverbs.h     |   4 +-
>  drivers/infiniband/hw/ehca/ehca_sqp.c        |   8 +-
>  drivers/infiniband/hw/ipath/ipath_mad.c      |   8 +-
>  drivers/infiniband/hw/ipath/ipath_verbs.c    |   1 +
>  drivers/infiniband/hw/ipath/ipath_verbs.h    |   3 +-
>  drivers/infiniband/hw/mlx4/mad.c             |  12 +-
>  drivers/infiniband/hw/mlx4/main.c            |   1 +
>  drivers/infiniband/hw/mlx4/mlx4_ib.h         |   3 +-
>  drivers/infiniband/hw/mlx5/mad.c             |   8 +-
>  drivers/infiniband/hw/mlx5/main.c            |   1 +
>  drivers/infiniband/hw/mlx5/mlx5_ib.h         |   3 +-
>  drivers/infiniband/hw/mthca/mthca_dev.h      |   4 +-
>  drivers/infiniband/hw/mthca/mthca_mad.c      |  12 +-
>  drivers/infiniband/hw/mthca/mthca_provider.c |   2 +
>  drivers/infiniband/hw/nes/nes_verbs.c        |   4 +-
>  drivers/infiniband/hw/ocrdma/ocrdma_ah.c     |   3 +-
>  drivers/infiniband/hw/ocrdma/ocrdma_ah.h     |   3 +-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |   1 +
>  drivers/infiniband/hw/qib/qib_iba7322.c      |   3 +-
>  drivers/infiniband/hw/qib/qib_mad.c          |  11 +-
>  drivers/infiniband/hw/qib/qib_verbs.c        |   1 +
>  drivers/infiniband/hw/qib/qib_verbs.h        |   3 +-
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.c |   2 +
>  drivers/infiniband/ulp/srpt/ib_srpt.c        |   3 +-
>  include/rdma/ib_mad.h                        |  40 ++-
>  include/rdma/ib_verbs.h                      |  15 +-
>  include/rdma/opa_smi.h                       | 106 ++++++
>  45 files changed, 999 insertions(+), 357 deletions(-)
>  create mode 100644 drivers/infiniband/core/opa_smi.h
>  create mode 100644 include/rdma/opa_smi.h

What performance tests were run in terms of IBTA MADs ?

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]     ` <1423092585-26692-15-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-02-06 20:35       ` Hal Rosenstock
       [not found]         ` <54D52589.8020305-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hal Rosenstock @ 2015-02-06 20:35 UTC (permalink / raw)
  To: ira.weiny-ral2JQCrhuEAvxtiuMwx3w
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 2/4/2015 6:29 PM, ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> 
> OPA MADs share a common header with IBTA MADs but with a different base version
> and an extended length.  These "jumbo" MADs increase the performance of
> management traffic.
> 
> Sharing a common header with IBTA MADs allows us to share most of the MAD
> processing code when dealing with OPA MADs in addition to supporting some IBTA
> MADs on OPA devices.
> 
> Add a device capability flag to indicate OPA MAD support on the device.
> 
> Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> 
> ---
>  include/rdma/ib_verbs.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 3ab4033..2614233 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -128,6 +128,10 @@ enum ib_device_cap_flags {
>  	IB_DEVICE_ON_DEMAND_PAGING	= (1<<31),
>  };
>  
> +enum ib_device_cap_flags2 {
> +	IB_DEVICE_OPA_MAD_SUPPORT	= 1
> +};
> +
>  enum ib_signature_prot_cap {
>  	IB_PROT_T10DIF_TYPE_1 = 1,
>  	IB_PROT_T10DIF_TYPE_2 = 1 << 1,
> @@ -210,6 +214,7 @@ struct ib_device_attr {
>  	int			sig_prot_cap;
>  	int			sig_guard_cap;
>  	struct ib_odp_caps	odp_caps;
> +	u64			device_cap_flags2;
>  	u32			max_mad_size;
>  };
>  

Why is OPA support determined via a device capability flag ? What are
the tradeoffs for doing it this way versus the other choices that have
been used in the past for other RDMA technologies like RoCE, iWARP,
usNIC, … ?

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 00/19] IB/mad: Add support for Intel Omni-Path Architecture (OPA) MAD processing.
       [not found]     ` <54D52562.5050408-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-02-09 21:20       ` Weiny, Ira
  0 siblings, 0 replies; 84+ messages in thread
From: Weiny, Ira @ 2015-02-09 21:20 UTC (permalink / raw)
  To: 'Hal Rosenstock'
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA

> 
> On 2/4/2015 6:29 PM, ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> > From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> >
> > The following patch series modifies the kernel MAD processing
> > (ib_mad/ib_umad) and related interfaces to send and receive Intel
> > Omni-Path Architecture MADs on devices which support them.
> >
> > In addition to supporting some IBTA management classes, OPA devices
> > use MADs with lengths up to 2K.  These "jumbo" MADs increase the
> > performance of management traffic.
> >
> > To distinguish IBTA MADs from OPA MADs a new Base Version is introduced.
> 
> With your recent changes, I don't think that statement above is strictly true any
> longer. While OPA does use a different base version for its jumbo MADs, aren't
> OPA MADs distinguished from IBTA MADs by the new OPA MAD device
> capability bit ?
> 

True.

However, OPA MADs with a base version of 0x1 are compatible with and therefore can be processed by the same code as IBTA MADs.
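
For illustration only (a rough sketch, not code from this series;
OPA_MGMT_BASE_VERSION is an assumed name for the new base version
constant), the receive path can key off the shared common header:

static bool mad_uses_opa_format(const struct ib_mad_hdr *hdr)
{
	/* Anything carrying the IB base version (0x1) flows through
	 * the existing IBTA code path, even on an OPA device. */
	return hdr->base_version == OPA_MGMT_BASE_VERSION;
}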

If I need to respin the series I will update the comment.

> 
> What performance tests were run in terms of IBTA MADs ?

Sorry for the delay.  As there have been a couple of versions of this series since I ran those tests in December, I took the time to re-run them.

OpenSM (sweep time) and infiniband-diag (iblinkinfo and saquery) were timed on my small cluster with no noticeable change in performance.  But this is not the best test, as I only have 6 or so nodes on 2 switches.

For example, iblinkinfo runs very quickly:

[root@phcppriv12 OPENIB_FF]# time iblinkinfo > /dev/null

real    0m0.072s
user    0m0.002s
sys     0m0.041s


The better tests we have at this small scale are a couple of tools (closed source) which send SMA and PMA packets as rapidly as possible.

Those showed no difference in performance.

For example, I ran these tools with 3 different kernels: 1) "stock Roland", 2) the series in question up to the kmalloc patch, and 3) the full OPA series.

Here is a summary of the results:

             Roland for-next (ecb7b12)   up to kmalloc patch   full OPA patch set

SMA          21072                       21324                 21381
SMA rcv      17139                       17329                 17303

PMA          24159.4                     24401.2               24166.5
PMA rcv      24159.4                     24401.2               24166.5


NOTE: The results above are deliberately shown without units as I am not allowed to publish performance numbers.  However, larger numbers mean better performance.

The numbers are quite repeatable and, as you can see, are all close to each other from kernel to kernel.
 
The remote node for these tests was running the stock RHEL7 kernel.  All software and hardware were held constant except for the kernel patches in question.

Ira

> 
> -- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]         ` <54D52589.8020305-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-02-11 15:40           ` Weiny, Ira
       [not found]             ` <2807E5FD2F6FDA4886F6618EAC48510E0CC244A8-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Weiny, Ira @ 2015-02-11 15:40 UTC (permalink / raw)
  To: 'Hal Rosenstock'
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA

> On 2/4/2015 6:29 PM, ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> > From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> >
> > OPA MADs share a common header with IBTA MADs but with a different
> > base version and an extended length.  These "jumbo" MADs increase the
> > performance of management traffic.
> >
> > Sharing a common header with IBTA MADs allows us to share most of the
> > MAD processing code when dealing with OPA MADs in addition to
> > supporting some IBTA MADs on OPA devices.
> >
> > Add a device capability flag to indicate OPA MAD support on the device.
> >
> > Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> >
> > ---
> >  include/rdma/ib_verbs.h | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index
> > 3ab4033..2614233 100644
> > --- a/include/rdma/ib_verbs.h
> > +++ b/include/rdma/ib_verbs.h
> > @@ -128,6 +128,10 @@ enum ib_device_cap_flags {
> >  	IB_DEVICE_ON_DEMAND_PAGING	= (1<<31),
> >  };
> >
> > +enum ib_device_cap_flags2 {
> > +	IB_DEVICE_OPA_MAD_SUPPORT	= 1
> > +};
> > +
> >  enum ib_signature_prot_cap {
> >  	IB_PROT_T10DIF_TYPE_1 = 1,
> >  	IB_PROT_T10DIF_TYPE_2 = 1 << 1,
> > @@ -210,6 +214,7 @@ struct ib_device_attr {
> >  	int			sig_prot_cap;
> >  	int			sig_guard_cap;
> >  	struct ib_odp_caps	odp_caps;
> > +	u64			device_cap_flags2;
> >  	u32			max_mad_size;
> >  };
> >
> 
> Why is OPA support determined via a device capability flag ? What are the
> tradeoffs for doing it this way versus the other choices that have been used in
> the past for other RDMA technologies like RoCE, iWARP, usNIC, ... ?

None of those technologies use the MAD stack for Subnet Management.  Other MAD support is very limited (i.e. IB-compatible PMA queries on the local port only).

Do you have a suggestion for alternatives?

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]             ` <2807E5FD2F6FDA4886F6618EAC48510E0CC244A8-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-02-12 14:00               ` Hal Rosenstock
       [not found]                 ` <54DCB1E9.7010309-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hal Rosenstock @ 2015-02-12 14:00 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 2/11/2015 10:40 AM, Weiny, Ira wrote:
>> On 2/4/2015 6:29 PM, ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
>>> From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>>>
>>> OPA MADs share a common header with IBTA MADs but with a different
>>> base version and an extended length.  These "jumbo" MADs increase the
>>> performance of management traffic.
>>>
>>> Sharing a common header with IBTA MADs allows us to share most of the
>>> MAD processing code when dealing with OPA MADs in addition to
>>> supporting some IBTA MADs on OPA devices.
>>>
>>> Add a device capability flag to indicate OPA MAD support on the device.
>>>
>>> Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>>>
>>> ---
>>>  include/rdma/ib_verbs.h | 5 +++++
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index
>>> 3ab4033..2614233 100644
>>> --- a/include/rdma/ib_verbs.h
>>> +++ b/include/rdma/ib_verbs.h
>>> @@ -128,6 +128,10 @@ enum ib_device_cap_flags {
>>>  	IB_DEVICE_ON_DEMAND_PAGING	= (1<<31),
>>>  };
>>>
>>> +enum ib_device_cap_flags2 {
>>> +	IB_DEVICE_OPA_MAD_SUPPORT	= 1
>>> +};
>>> +
>>>  enum ib_signature_prot_cap {
>>>  	IB_PROT_T10DIF_TYPE_1 = 1,
>>>  	IB_PROT_T10DIF_TYPE_2 = 1 << 1,
>>> @@ -210,6 +214,7 @@ struct ib_device_attr {
>>>  	int			sig_prot_cap;
>>>  	int			sig_guard_cap;
>>>  	struct ib_odp_caps	odp_caps;
>>> +	u64			device_cap_flags2;
>>>  	u32			max_mad_size;
>>>  };
>>>
>>
>> Why is OPA support determined via a device capability flag ? What are the
>> tradeoffs for doing it this way versus the other choices that have been used in
>> the past for other RDMA technologies like RoCE, iWARP, usNIC, ... ?
> 
> None of those technologies use the MAD stack for Subnet Management.  Other MAD support is very limited (ie IB compatible PMA queries on the local port only).
> 
> Do you have a suggestion for alternatives?

The desire to leverage the IB MAD infrastructure for OPA is understood
but the current approach represents OPA as a device capability which
does not seem appropriate because OPA is clearly a different type of
RDMA technology than IB.

-- Hal

> Ira
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                 ` <54DCB1E9.7010309-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-02-17 21:25                   ` Weiny, Ira
       [not found]                     ` <2807E5FD2F6FDA4886F6618EAC48510E0CC29020-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Weiny, Ira @ 2015-02-17 21:25 UTC (permalink / raw)
  To: 'Hal Rosenstock'
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA

> >>>
> >>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index
> >>> 3ab4033..2614233 100644
> >>> --- a/include/rdma/ib_verbs.h
> >>> +++ b/include/rdma/ib_verbs.h
> >>> @@ -128,6 +128,10 @@ enum ib_device_cap_flags {
> >>>  	IB_DEVICE_ON_DEMAND_PAGING	= (1<<31),
> >>>  };
> >>>
> >>> +enum ib_device_cap_flags2 {
> >>> +	IB_DEVICE_OPA_MAD_SUPPORT	= 1
> >>> +};
> >>> +
> >>>  enum ib_signature_prot_cap {
> >>>  	IB_PROT_T10DIF_TYPE_1 = 1,
> >>>  	IB_PROT_T10DIF_TYPE_2 = 1 << 1,
> >>> @@ -210,6 +214,7 @@ struct ib_device_attr {
> >>>  	int			sig_prot_cap;
> >>>  	int			sig_guard_cap;
> >>>  	struct ib_odp_caps	odp_caps;
> >>> +	u64			device_cap_flags2;
> >>>  	u32			max_mad_size;
> >>>  };
> >>>
> >>
> >> Why is OPA support determined via a device capability flag ? What are
> >> the tradeoffs for doing it this way versus the other choices that
> >> have been used in the past for other RDMA technologies like RoCE, iWARP,
> usNIC, ... ?
> >
> > None of those technologies use the MAD stack for Subnet Management.
> Other MAD support is very limited (ie IB compatible PMA queries on the local
> port only).
> >
> > Do you have a suggestion for alternatives?
> 
> The desire to leverage the IB MAD infrastructure for OPA is understood but the
> current approach represents OPA as a device capability which does not seem
> appropriate because OPA is clearly a different type of RDMA technology than
> IB.
> 

While it is a different type of technology, standard verbs[*] remains 100% compatible.  Unlike other verbs technologies, user space software does not need any knowledge that the underlying device is not IB.  For example, PR (and SA) queries, CM, rdmacm, and verbs calls themselves are all 100% IB compatible.

Therefore, to address your initial question regarding tradeoffs: I believe this method is the least invasive to the code and avoids any potential performance penalties to core verbs.

Ira

[*] We don't support some of the extensions, particularly those which have been most recently introduced.  And we would like to make our own extensions in the form of higher MTU availability, but the patch is not yet ready to be submitted upstream.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                     ` <2807E5FD2F6FDA4886F6618EAC48510E0CC29020-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-02-23 18:54                       ` Hal Rosenstock
       [not found]                         ` <54EB7756.7070407-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hal Rosenstock @ 2015-02-23 18:54 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 2/17/2015 4:25 PM, Weiny, Ira wrote:
>>>>>
>>>>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index
>>>>> 3ab4033..2614233 100644
>>>>> --- a/include/rdma/ib_verbs.h
>>>>> +++ b/include/rdma/ib_verbs.h
>>>>> @@ -128,6 +128,10 @@ enum ib_device_cap_flags {
>>>>>  	IB_DEVICE_ON_DEMAND_PAGING	= (1<<31),
>>>>>  };
>>>>>
>>>>> +enum ib_device_cap_flags2 {
>>>>> +	IB_DEVICE_OPA_MAD_SUPPORT	= 1
>>>>> +};
>>>>> +
>>>>>  enum ib_signature_prot_cap {
>>>>>  	IB_PROT_T10DIF_TYPE_1 = 1,
>>>>>  	IB_PROT_T10DIF_TYPE_2 = 1 << 1,
>>>>> @@ -210,6 +214,7 @@ struct ib_device_attr {
>>>>>  	int			sig_prot_cap;
>>>>>  	int			sig_guard_cap;
>>>>>  	struct ib_odp_caps	odp_caps;
>>>>> +	u64			device_cap_flags2;
>>>>>  	u32			max_mad_size;
>>>>>  };
>>>>>
>>>>
>>>> Why is OPA support determined via a device capability flag ? What are
>>>> the tradeoffs for doing it this way versus the other choices that
>>>> have been used in the past for other RDMA technologies like RoCE, iWARP,
>> usNIC, ... ?
>>>
>>> None of those technologies use the MAD stack for Subnet Management.
>> Other MAD support is very limited (ie IB compatible PMA queries on the local
>> port only).
>>>
>>> Do you have a suggestion for alternatives?
>>
>> The desire to leverage the IB MAD infrastructure for OPA is understood but the
>> current approach represents OPA as a device capability which does not seem
>> appropriate because OPA is clearly a different type of RDMA technology than
>> IB.
>>
> 
> While it is a different type of technology, standard verbs[*] remains 100% compatible.  Unlike other verbs technologies user space software does not need any knowledge that the underlying device is not IB.  For example, PR (and SA) queries, CM, rdmacm, and verbs calls themselves are all 100% IB compatible.

Even if OPA is 100% standard verbs compatible, which it does not appear
to be, that does not make OPA an extra capability of an IBA device.
While it is a primary goal of the RDMA stack to have a common verbs API
for various RDMA interconnects, each one is properly represented to
allow its unique characteristics to be exposed.

> Therefore, to address your initial question regarding tradeoffs I believe this method is the least invasive to the code as well as removing any potential performance penalties to core verbs.
> 
> Ira
> 
> [*] We don't support some of the extensions particularly those which have been most recently introduced.  And we would like to make our own extensions in the form of higher MTU availability, but the patch is not yet ready to be submitted upstream.

There appear to be a number of things that are not exposed by the
current patch set which will be needed in subsequent patches. It would
be better to see the complete picture so it can be reviewed as a whole.

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 06/19] IB/core: Add max_mad_size to ib_device_attr
       [not found]     ` <1423092585-26692-7-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-02-24 14:16       ` Doug Ledford
       [not found]         ` <1424787385.4847.16.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Doug Ledford @ 2015-02-24 14:16 UTC (permalink / raw)
  To: ira.weiny-ral2JQCrhuEAvxtiuMwx3w
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

[-- Attachment #1: Type: text/plain, Size: 10109 bytes --]

On Wed, 2015-02-04 at 18:29 -0500, ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> 
> Change all IB drivers to report the max MAD size.
> Add check to verify that all devices support at least IB_MGMT_MAD_SIZE
> 
> Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> 
> ---
> 
> Changes from V3:
> 	Fix ehca compile found with 0-day build
> 
>  drivers/infiniband/core/mad.c                | 6 ++++++
>  drivers/infiniband/hw/amso1100/c2_rnic.c     | 1 +
>  drivers/infiniband/hw/cxgb3/iwch_provider.c  | 1 +
>  drivers/infiniband/hw/cxgb4/provider.c       | 1 +
>  drivers/infiniband/hw/ehca/ehca_hca.c        | 3 +++
>  drivers/infiniband/hw/ipath/ipath_verbs.c    | 1 +
>  drivers/infiniband/hw/mlx4/main.c            | 1 +
>  drivers/infiniband/hw/mlx5/main.c            | 1 +
>  drivers/infiniband/hw/mthca/mthca_provider.c | 2 ++
>  drivers/infiniband/hw/nes/nes_verbs.c        | 1 +
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  | 1 +
>  drivers/infiniband/hw/qib/qib_verbs.c        | 1 +
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 2 ++
>  include/rdma/ib_mad.h                        | 1 +
>  include/rdma/ib_verbs.h                      | 1 +
>  15 files changed, 24 insertions(+)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index 819b794..a6a33cf 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -2924,6 +2924,12 @@ static int ib_mad_port_open(struct ib_device *device,
>  	char name[sizeof "ib_mad123"];
>  	int has_smi;
>  
> +	if (device->cached_dev_attrs.max_mad_size < IB_MGMT_MAD_SIZE) {
> +		dev_err(&device->dev, "Min MAD size for device is %u\n",
> +			IB_MGMT_MAD_SIZE);
> +		return -EFAULT;
> +	}
> +

The printk message here is not very informative and it qualifies as an
error.  Someone reading that for the first time in the dmesg output and
wondering why their device isn't working will be confused if they don't
know about the mad size changes you are making here.  Something like
"max supported MAD size (%u) < min required by ib_mad (%u), ignoring dev
\n"

>  	/* Create new device info */
>  	port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL);
>  	if (!port_priv) {
> diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c
> index d2a6d96..63322c0 100644
> --- a/drivers/infiniband/hw/amso1100/c2_rnic.c
> +++ b/drivers/infiniband/hw/amso1100/c2_rnic.c
> @@ -197,6 +197,7 @@ static int c2_rnic_query(struct c2_dev *c2dev, struct ib_device_attr *props)
>  	props->max_srq_sge         = 0;
>  	props->max_pkeys           = 0;
>  	props->local_ca_ack_delay  = 0;
> +	props->max_mad_size        = IB_MGMT_MAD_SIZE;
>  
>   bail2:
>  	vq_repbuf_free(c2dev, reply);
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> index 811b24a..b8a80aa0 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> @@ -1174,6 +1174,7 @@ static int iwch_query_device(struct ib_device *ibdev,
>  	props->max_pd = dev->attr.max_pds;
>  	props->local_ca_ack_delay = 0;
>  	props->max_fast_reg_page_list_len = T3_MAX_FASTREG_DEPTH;
> +	props->max_mad_size = IB_MGMT_MAD_SIZE;
>  
>  	return 0;
>  }
> diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
> index 66bd6a2..299c70c 100644
> --- a/drivers/infiniband/hw/cxgb4/provider.c
> +++ b/drivers/infiniband/hw/cxgb4/provider.c
> @@ -332,6 +332,7 @@ static int c4iw_query_device(struct ib_device *ibdev,
>  	props->max_pd = T4_MAX_NUM_PD;
>  	props->local_ca_ack_delay = 0;
>  	props->max_fast_reg_page_list_len = t4_max_fr_depth(use_dsgl);
> +	props->max_mad_size = IB_MGMT_MAD_SIZE;
>  
>  	return 0;
>  }
> diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c
> index 9ed4d25..6166146 100644
> --- a/drivers/infiniband/hw/ehca/ehca_hca.c
> +++ b/drivers/infiniband/hw/ehca/ehca_hca.c
> @@ -40,6 +40,7 @@
>   */
>  
>  #include <linux/gfp.h>
> +#include <rdma/ib_mad.h>
>  
>  #include "ehca_tools.h"
>  #include "ehca_iverbs.h"
> @@ -133,6 +134,8 @@ int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
>  		if (rblock->hca_cap_indicators & cap_mapping[i + 1])
>  			props->device_cap_flags |= cap_mapping[i];
>  
> +	props->max_mad_size = IB_MGMT_MAD_SIZE;
> +
>  query_device1:
>  	ehca_free_fw_ctrlblock(rblock);
>  
> diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
> index 44ea939..4c6474c 100644
> --- a/drivers/infiniband/hw/ipath/ipath_verbs.c
> +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
> @@ -1538,6 +1538,7 @@ static int ipath_query_device(struct ib_device *ibdev,
>  	props->max_mcast_qp_attach = ib_ipath_max_mcast_qp_attached;
>  	props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
>  		props->max_mcast_grp;
> +	props->max_mad_size = IB_MGMT_MAD_SIZE;
>  
>  	return 0;
>  }
> diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
> index 57ecc5b..88326a7 100644
> --- a/drivers/infiniband/hw/mlx4/main.c
> +++ b/drivers/infiniband/hw/mlx4/main.c
> @@ -229,6 +229,7 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
>  	props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
>  					   props->max_mcast_grp;
>  	props->max_map_per_fmr = dev->dev->caps.max_fmr_maps;
> +	props->max_mad_size        = IB_MGMT_MAD_SIZE;
>  
>  out:
>  	kfree(in_mad);
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index 8a87404..24a0a54 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -243,6 +243,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
>  	props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
>  					   props->max_mcast_grp;
>  	props->max_map_per_fmr = INT_MAX; /* no limit in ConnectIB */
> +	props->max_mad_size        = IB_MGMT_MAD_SIZE;
>  
>  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
>  	if (dev->mdev->caps.gen.flags & MLX5_DEV_CAP_FLAG_ON_DMND_PG)
> diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
> index 415f8e1..236c0df 100644
> --- a/drivers/infiniband/hw/mthca/mthca_provider.c
> +++ b/drivers/infiniband/hw/mthca/mthca_provider.c
> @@ -123,6 +123,8 @@ static int mthca_query_device(struct ib_device *ibdev,
>  		props->max_map_per_fmr =
>  			(1 << (32 - ilog2(mdev->limits.num_mpts))) - 1;
>  
> +	props->max_mad_size = IB_MGMT_MAD_SIZE;
> +
>  	err = 0;
>   out:
>  	kfree(in_mad);
> diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
> index c0d0296..93e67e2 100644
> --- a/drivers/infiniband/hw/nes/nes_verbs.c
> +++ b/drivers/infiniband/hw/nes/nes_verbs.c
> @@ -555,6 +555,7 @@ static int nes_query_device(struct ib_device *ibdev, struct ib_device_attr *prop
>  	props->max_qp_init_rd_atom = props->max_qp_rd_atom;
>  	props->atomic_cap = IB_ATOMIC_NONE;
>  	props->max_map_per_fmr = 1;
> +	props->max_mad_size = IB_MGMT_MAD_SIZE;
>  
>  	return 0;
>  }
> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> index fb8d8c4..7ae0a22 100644
> --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> @@ -103,6 +103,7 @@ int ocrdma_query_device(struct ib_device *ibdev, struct ib_device_attr *attr)
>  	attr->local_ca_ack_delay = dev->attr.local_ca_ack_delay;
>  	attr->max_fast_reg_page_list_len = dev->attr.max_pages_per_frmr;
>  	attr->max_pkeys = 1;
> +	attr->max_mad_size = IB_MGMT_MAD_SIZE;
>  	return 0;
>  }
>  
> diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
> index 9bcfbd8..5d6447b 100644
> --- a/drivers/infiniband/hw/qib/qib_verbs.c
> +++ b/drivers/infiniband/hw/qib/qib_verbs.c
> @@ -1591,6 +1591,7 @@ static int qib_query_device(struct ib_device *ibdev,
>  	props->max_mcast_qp_attach = ib_qib_max_mcast_qp_attached;
>  	props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
>  		props->max_mcast_grp;
> +	props->max_mad_size = IB_MGMT_MAD_SIZE;
>  
>  	return 0;
>  }
> diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
> index 53bd6a2..b72ad7f 100644
> --- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
> +++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
> @@ -22,6 +22,7 @@
>  
>  #include <rdma/ib_user_verbs.h>
>  #include <rdma/ib_addr.h>
> +#include <rdma/ib_mad.h>
>  
>  #include "usnic_abi.h"
>  #include "usnic_ib.h"
> @@ -296,6 +297,7 @@ int usnic_ib_query_device(struct ib_device *ibdev,
>  	props->max_mcast_qp_attach = 0;
>  	props->max_total_mcast_qp_attach = 0;
>  	props->max_map_per_fmr = 0;
> +	props->max_mad_size = IB_MGMT_MAD_SIZE;
>  	/* Owned by Userspace
>  	 * max_qp_wr, max_sge, max_sge_rd, max_cqe */
>  	mutex_unlock(&us_ibdev->usdev_lock);
> diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
> index 9c89939..5823016 100644
> --- a/include/rdma/ib_mad.h
> +++ b/include/rdma/ib_mad.h
> @@ -135,6 +135,7 @@ enum {
>  	IB_MGMT_SA_DATA = 200,
>  	IB_MGMT_DEVICE_HDR = 64,
>  	IB_MGMT_DEVICE_DATA = 192,
> +	IB_MGMT_MAD_SIZE = IB_MGMT_MAD_HDR + IB_MGMT_MAD_DATA,
>  };
>  
>  struct ib_mad_hdr {
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 0116e4b..64d3479 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -210,6 +210,7 @@ struct ib_device_attr {
>  	int			sig_prot_cap;
>  	int			sig_guard_cap;
>  	struct ib_odp_caps	odp_caps;
> +	u32			max_mad_size;
>  };
>  
>  enum ib_mtu {


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 07/19] IB/mad: Convert ib_mad_private allocations from kmem_cache to kmalloc
       [not found]     ` <1423092585-26692-8-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-02-24 14:22       ` Doug Ledford
       [not found]         ` <1424787735.4847.19.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Doug Ledford @ 2015-02-24 14:22 UTC (permalink / raw)
  To: ira.weiny-ral2JQCrhuEAvxtiuMwx3w
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

[-- Attachment #1: Type: text/plain, Size: 8971 bytes --]

On Wed, 2015-02-04 at 18:29 -0500, ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> From: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> 
> Use the new max_mad_size specified by devices for the allocations and DMA maps.
> 
> kmalloc is more flexible to support devices with different sized MADs and
> research and testing showed that the current use of kmem_cache does not provide
> performance benefits over kmalloc.
> 
> Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> 
> ---
>  drivers/infiniband/core/mad.c | 73 ++++++++++++++++++-------------------------
>  1 file changed, 30 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index a6a33cf..cc0a3ad 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -59,8 +59,6 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in number of work requests
>  module_param_named(recv_queue_size, mad_recvq_size, int, 0444);
>  MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work requests");
>  
> -static struct kmem_cache *ib_mad_cache;
> -
>  static struct list_head ib_mad_port_list;
>  static u32 ib_mad_client_id = 0;
>  
> @@ -717,6 +715,13 @@ static void build_smp_wc(struct ib_qp *qp,
>  	wc->port_num = port_num;
>  }
>  
> +static struct ib_mad_private *alloc_mad_priv(struct ib_device *dev)
> +{
> +	return (kmalloc(sizeof(struct ib_mad_private_header) +
> +			sizeof(struct ib_grh) +
> +			dev->cached_dev_attrs.max_mad_size, GFP_ATOMIC));

Ouch!  GFP_ATOMIC?  I thought that generally all of the mad processing
was done from workqueue context where sleeping is allowed?  In the two
places where you removed kmem_cache_alloc() calls and replaced it with
calls to this code, they both used GFP_KERNEL and now you have switched
it to GFP_ATOMIC.  If there isn't a good reason for this, it should be
switched back to GFP_KERNEL.

> +}
> +
>  /*
>   * Return 0 if SMP is to be sent
>   * Return 1 if SMP was consumed locally (whether or not solicited)
> @@ -771,7 +776,8 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>  	}
>  	local->mad_priv = NULL;
>  	local->recv_mad_agent = NULL;
> -	mad_priv = kmem_cache_alloc(ib_mad_cache, GFP_ATOMIC);
> +
> +	mad_priv = alloc_mad_priv(mad_agent_priv->agent.device);
>  	if (!mad_priv) {
>  		ret = -ENOMEM;
>  		dev_err(&device->dev, "No memory for local response MAD\n");
> @@ -801,10 +807,10 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>  			 */
>  			atomic_inc(&mad_agent_priv->refcount);
>  		} else
> -			kmem_cache_free(ib_mad_cache, mad_priv);
> +			kfree(mad_priv);
>  		break;
>  	case IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED:
> -		kmem_cache_free(ib_mad_cache, mad_priv);
> +		kfree(mad_priv);
>  		break;
>  	case IB_MAD_RESULT_SUCCESS:
>  		/* Treat like an incoming receive MAD */
> @@ -820,14 +826,14 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>  			 * No receiving agent so drop packet and
>  			 * generate send completion.
>  			 */
> -			kmem_cache_free(ib_mad_cache, mad_priv);
> +			kfree(mad_priv);
>  			break;
>  		}
>  		local->mad_priv = mad_priv;
>  		local->recv_mad_agent = recv_mad_agent;
>  		break;
>  	default:
> -		kmem_cache_free(ib_mad_cache, mad_priv);
> +		kfree(mad_priv);
>  		kfree(local);
>  		ret = -EINVAL;
>  		goto out;
> @@ -1237,7 +1243,7 @@ void ib_free_recv_mad(struct ib_mad_recv_wc *mad_recv_wc)
>  					    recv_wc);
>  		priv = container_of(mad_priv_hdr, struct ib_mad_private,
>  				    header);
> -		kmem_cache_free(ib_mad_cache, priv);
> +		kfree(priv);
>  	}
>  }
>  EXPORT_SYMBOL(ib_free_recv_mad);
> @@ -1924,6 +1930,11 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
>  	}
>  }
>  
> +static size_t mad_recv_buf_size(struct ib_device *dev)
> +{
> +	return(sizeof(struct ib_grh) + dev->cached_dev_attrs.max_mad_size);
> +}
> +
>  static bool generate_unmatched_resp(struct ib_mad_private *recv,
>  				    struct ib_mad_private *response)
>  {
> @@ -1964,8 +1975,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
>  	recv = container_of(mad_priv_hdr, struct ib_mad_private, header);
>  	ib_dma_unmap_single(port_priv->device,
>  			    recv->header.mapping,
> -			    sizeof(struct ib_mad_private) -
> -			      sizeof(struct ib_mad_private_header),
> +			    mad_recv_buf_size(port_priv->device),
>  			    DMA_FROM_DEVICE);
>  
>  	/* Setup MAD receive work completion from "normal" work completion */
> @@ -1982,7 +1992,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
>  	if (!validate_mad(&recv->mad.mad.mad_hdr, qp_info->qp->qp_num))
>  		goto out;
>  
> -	response = kmem_cache_alloc(ib_mad_cache, GFP_KERNEL);
> +	response = alloc_mad_priv(port_priv->device);
>  	if (!response) {
>  		dev_err(&port_priv->device->dev,
>  			"ib_mad_recv_done_handler no memory for response buffer\n");
> @@ -2075,7 +2085,7 @@ out:
>  	if (response) {
>  		ib_mad_post_receive_mads(qp_info, response);
>  		if (recv)
> -			kmem_cache_free(ib_mad_cache, recv);
> +			kfree(recv);
>  	} else
>  		ib_mad_post_receive_mads(qp_info, recv);
>  }
> @@ -2535,7 +2545,7 @@ local_send_completion:
>  		spin_lock_irqsave(&mad_agent_priv->lock, flags);
>  		atomic_dec(&mad_agent_priv->refcount);
>  		if (free_mad)
> -			kmem_cache_free(ib_mad_cache, local->mad_priv);
> +			kfree(local->mad_priv);
>  		kfree(local);
>  	}
>  	spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
> @@ -2664,7 +2674,7 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
>  			mad_priv = mad;
>  			mad = NULL;
>  		} else {
> -			mad_priv = kmem_cache_alloc(ib_mad_cache, GFP_KERNEL);
> +			mad_priv = alloc_mad_priv(qp_info->port_priv->device);
>  			if (!mad_priv) {
>  				dev_err(&qp_info->port_priv->device->dev,
>  					"No memory for receive buffer\n");
> @@ -2674,8 +2684,7 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
>  		}
>  		sg_list.addr = ib_dma_map_single(qp_info->port_priv->device,
>  						 &mad_priv->grh,
> -						 sizeof *mad_priv -
> -						   sizeof mad_priv->header,
> +						 mad_recv_buf_size(qp_info->port_priv->device),
>  						 DMA_FROM_DEVICE);
>  		if (unlikely(ib_dma_mapping_error(qp_info->port_priv->device,
>  						  sg_list.addr))) {
> @@ -2699,10 +2708,9 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
>  			spin_unlock_irqrestore(&recv_queue->lock, flags);
>  			ib_dma_unmap_single(qp_info->port_priv->device,
>  					    mad_priv->header.mapping,
> -					    sizeof *mad_priv -
> -					      sizeof mad_priv->header,
> +					    mad_recv_buf_size(qp_info->port_priv->device),
>  					    DMA_FROM_DEVICE);
> -			kmem_cache_free(ib_mad_cache, mad_priv);
> +			kfree(mad_priv);
>  			dev_err(&qp_info->port_priv->device->dev,
>  				"ib_post_recv failed: %d\n", ret);
>  			break;
> @@ -2739,10 +2747,9 @@ static void cleanup_recv_queue(struct ib_mad_qp_info *qp_info)
>  
>  		ib_dma_unmap_single(qp_info->port_priv->device,
>  				    recv->header.mapping,
> -				    sizeof(struct ib_mad_private) -
> -				      sizeof(struct ib_mad_private_header),
> +				    mad_recv_buf_size(qp_info->port_priv->device),
>  				    DMA_FROM_DEVICE);
> -		kmem_cache_free(ib_mad_cache, recv);
> +		kfree(recv);
>  	}
>  
>  	qp_info->recv_queue.count = 0;
> @@ -3138,45 +3145,25 @@ static struct ib_client mad_client = {
>  
>  static int __init ib_mad_init_module(void)
>  {
> -	int ret;
> -
>  	mad_recvq_size = min(mad_recvq_size, IB_MAD_QP_MAX_SIZE);
>  	mad_recvq_size = max(mad_recvq_size, IB_MAD_QP_MIN_SIZE);
>  
>  	mad_sendq_size = min(mad_sendq_size, IB_MAD_QP_MAX_SIZE);
>  	mad_sendq_size = max(mad_sendq_size, IB_MAD_QP_MIN_SIZE);
>  
> -	ib_mad_cache = kmem_cache_create("ib_mad",
> -					 sizeof(struct ib_mad_private),
> -					 0,
> -					 SLAB_HWCACHE_ALIGN,
> -					 NULL);
> -	if (!ib_mad_cache) {
> -		pr_err("Couldn't create ib_mad cache\n");
> -		ret = -ENOMEM;
> -		goto error1;
> -	}
> -
>  	INIT_LIST_HEAD(&ib_mad_port_list);
>  
>  	if (ib_register_client(&mad_client)) {
>  		pr_err("Couldn't register ib_mad client\n");
> -		ret = -EINVAL;
> -		goto error2;
> +		return(-EINVAL);
>  	}
>  
>  	return 0;
> -
> -error2:
> -	kmem_cache_destroy(ib_mad_cache);
> -error1:
> -	return ret;
>  }
>  
>  static void __exit ib_mad_cleanup_module(void)
>  {
>  	ib_unregister_client(&mad_client);
> -	kmem_cache_destroy(ib_mad_cache);
>  }
>  
>  module_init(ib_mad_init_module);


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                         ` <54EB7756.7070407-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-02-25  0:29                           ` Weiny, Ira
       [not found]                             ` <2807E5FD2F6FDA4886F6618EAC48510E0CC3D330-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Weiny, Ira @ 2015-02-25  0:29 UTC (permalink / raw)
  To: Hal Rosenstock
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA

> >>>
> >>> Do you have a suggestion for alternatives?
> >>
> >> The desire to leverage the IB MAD infrastructure for OPA is
> >> understood but the current approach represents OPA as a device
> >> capability which does not seem appropriate because OPA is clearly a
> >> different type of RDMA technology than IB.
> >>
> >
> > While it is a different type of technology, standard verbs[*] remains 100%
> compatible.  Unlike other verbs technologies user space software does not need
> any knowledge that the underlying device is not IB.  For example, PR (and SA)
> queries, CM, rdmacm, and verbs calls themselves are all 100% IB compatible.
> 
> Even if OPA is 100% standard verbs compatible which it does not appear to be,
> that does not make OPA an extra capability of an IBA device.

I don't want to make it an extra capability of an IBA device.  I want to make it an extra capability of a "verbs" device.

> While it is a primary goal of the RDMA stack to have a common verbs API for
> various RDMA interconnects, each one is properly represented to allow its
> unique characteristics to be exposed.

The difference here is that we have maintained IB Verbs compatibility where other RDMA technologies did not.  We have tested many Verbs applications (both kernel and user space) and they function _without_ _modification_.

Despite this compatibility we are still having this discussion.

I can think of no other way to signal the MAD capability to the MAD stack which will preserve the verbs compatibility in the same way.
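
As a sketch of what I mean (the helper itself is hypothetical; the flag
and the cached attributes are the ones added by this series), the MAD
stack would do something like:

static bool dev_supports_opa_mads(struct ib_device *device)
{
	/* IB_DEVICE_OPA_MAD_SUPPORT sits in the new device_cap_flags2
	 * word from patch 14; verbs consumers never need to look. */
	return device->cached_dev_attrs.device_cap_flags2 &
	       IB_DEVICE_OPA_MAD_SUPPORT;
}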

> 
> > Therefore, to address your initial question regarding tradeoffs I believe this
> method is the least invasive to the code as well as removing any potential
> performance penalties to core verbs.
> >
> > Ira
> >
> > [*] We don't support some of the extensions particularly those which have
> been most recently introduced.  And we would like to make our own extensions
> in the form of higher MTU availability, but the patch is not yet ready to be
> submitted upstream.
> 
> There appear to be a number of things that are not exposed by the current
> patch set which will be needed in subsequent patches. It would be better to see
> the complete picture so it can be reviewed as a whole.

Is there something in particular you would like to see?  There are no other patches required in the core modules for verbs applications to function.  The MTU patch only improves verbs performance.

Ira 

> 
> -- Hal
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body
> of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                             ` <2807E5FD2F6FDA4886F6618EAC48510E0CC3D330-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-02-25 17:13                               ` Doug Ledford
       [not found]                                 ` <1424884438.4847.91.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Doug Ledford @ 2015-02-25 17:13 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: Hal Rosenstock, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 5338 bytes --]

On Wed, 2015-02-25 at 00:29 +0000, Weiny, Ira wrote:
> > >>>
> > >>> Do you have a suggestion for alternatives?
> > >>
> > >> The desire to leverage the IB MAD infrastructure for OPA is
> > >> understood but the current approach represents OPA as a device
> > >> capability which does not seem appropriate because OPA is clearly a
> > >> different type of RDMA technology than IB.
> > >>
> > >
> > > While it is a different type of technology, standard verbs[*] remains 100%
> > compatible.  Unlike other verbs technologies user space software does not need
> > any knowledge that the underlying device is not IB.  For example, PR (and SA)
> > queries, CM, rdmacm, and verbs calls themselves are all 100% IB compatible.
> > 
> > Even if OPA is 100% standard verbs compatible which it does not appear to be,
> > that does not make OPA an extra capability of an IBA device.
> 
> I don't want to make it an extra capability of an IBA device.  I want to make it an extra capability of a "verbs" device.

And this, friends, is why it's bad to make both a link layer and a user
space API with the exact same name ;-).  Anyway, I get your point, Ira,
and it makes sense to me.  However, I also get Hal's point.  Our track
record on this particular issue is a bit wonky though.

First we had InfiniBand.

Then came iWARP, and we used the transport type to differentiate it from
an actual InfiniBand device, but left the underlying link layer listed
as InfiniBand.  Then came RoCE, and we listed its transport type as
InfiniBand, but changed the link layer to Ethernet.  Which left us in
the oxymoronic position that even though iWARP was over Ethernet, the
tools said it was over InfiniBand, while RoCE was the only thing that
listed Ethernet as the link layer.  We later fixed that up with some
hacks in tools to keep users from being confused and filing bugs.

Maybe this represents an opportunity to straighten some of this mess
out.  If I remember correctly, this is the matrix of technologies today:

Technology	LinkLayer	Transport

InfiniBand	InfiniBand	InfiniBand Verbs
iWARP		InfiniBand	iWARP Verbs (subset of IBV, with
				specific connection establishment
				requirements that don't exist with IBV)
RoCE		Ethernet	InfiniBand Verbs (but with different
				addressing because of the different
				link layer)
OPA		?		InfiniBand Verbs

It makes me wonder if we shouldn't make this matrix more accurate:

Technology	LinkLayer	Transport

InfiniBand	InfiniBand	InfiniBand Verbs
iWARP		Ethernet	iWARP Verbs
RoCE		Ethernet	RoCE-v1 or RoCE-v2
OPA		?		OPA Verbs

With this sort of setup, the core ib_mad/ib_umad code would simply check
the verbs type to see what support it can enable.  For IBV it would be
the existing support, for OPAV it would be the additional jumbo support.

I'm not sure how much we might expect a change like this to break
existing software though, so maybe straightening this mess out is a
non-starter.

> > While it is a primary goal of the RDMA stack to have a common verbs API for
> > various RDMA interconnects, each one is properly represented to allow it's
> > unique characteristics to be exposed.
> 
> The difference here is that we have maintained IB Verbs compatibility where other RDMA technologies did not.  We have tested many Verbs applications (both kernel and user space) and they function _without_ _modification_.
> 
> Despite this compatibility we are still having this discussion.
> 
> I can think of no other way to signal the MAD capability to the MAD stack which will preserve the verbs compatibility in the same way.

See above.  Define a new transport type, OPAVerbs, that is a superset of
IBV and enable jumbo support when OPAV is the transport on the link.
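
A hypothetical sketch of that alternative (RDMA_TRANSPORT_OPA is an
assumed new enum value, not something in the posted patches):

static bool dev_supports_jumbo_mads(struct ib_device *device)
{
	/* Key jumbo MAD support off the transport type instead of a
	 * device capability bit. */
	return rdma_node_get_transport(device->node_type) ==
	       RDMA_TRANSPORT_OPA;
}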

> > 
> > > Therefore, to address your initial question regarding tradeoffs I believe this
> > method is the least invasive to the code as well as removing any potential
> > performance penalties to core verbs.
> > >
> > > Ira
> > >
> > > [*] We don't support some of the extensions particularly those which have
> > been most recently introduced.  And we would like to make our own extensions
> > in the form of higher MTU availability, but the patch is not yet ready to be
> > submitted upstream.
> > 
> > There appear to be a number of things that are not exposed by the current
> > patch set which will be needed in subsequent patches. It would be better to see
> > the complete picture so it can be reviewed as a whole.
> 
> Is there something in particular you would like to see?  There are no other patches required in the core modules for verbs applications to function.  The MTU patch only improves verbs performance.
> 
> Ira 
> 
> > 
> > -- Hal
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body
> > of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at
> > http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 06/19] IB/core: Add max_mad_size to ib_device_attr
       [not found]         ` <1424787385.4847.16.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-02-25 18:13           ` Weiny, Ira
       [not found]             ` <2807E5FD2F6FDA4886F6618EAC48510E0CC3ECE5-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Weiny, Ira @ 2015-02-25 18:13 UTC (permalink / raw)
  To: Doug Ledford
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

> > diff --git a/drivers/infiniband/core/mad.c
> > b/drivers/infiniband/core/mad.c index 819b794..a6a33cf 100644
> > --- a/drivers/infiniband/core/mad.c
> > +++ b/drivers/infiniband/core/mad.c
> > @@ -2924,6 +2924,12 @@ static int ib_mad_port_open(struct ib_device
> *device,
> >  	char name[sizeof "ib_mad123"];
> >  	int has_smi;
> >
> > +	if (device->cached_dev_attrs.max_mad_size < IB_MGMT_MAD_SIZE) {
> > +		dev_err(&device->dev, "Min MAD size for device is %u\n",
> > +			IB_MGMT_MAD_SIZE);
> > +		return -EFAULT;
> > +	}
> > +
> 
> The printk message here is not very informative and it qualifies as an error.
> Someone reading that for the first time in the dmesg output and wondering
> why their device isn't working will be confused if they don't know about the
> mad size changes you are making here.  Something like "max supported MAD
> size (%u) < min required by ib_mad (%u), ignoring dev \n"

Good suggestion.

Fixed for v5 with this message.

+               dev_err(&device->dev,
+                       "Max supported MAD size (%u) < min required by ib_mad (%u), ignoring device (%s)\n",
+                       device->cached_dev_attrs.max_mad_size,
+                       IB_MGMT_MAD_SIZE, device->name);
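
In context, the whole guard would look something like this (my guess at
the v5 shape, combining the v4 check with the new message):

	if (device->cached_dev_attrs.max_mad_size < IB_MGMT_MAD_SIZE) {
		dev_err(&device->dev,
			"Max supported MAD size (%u) < min required by ib_mad (%u), ignoring device (%s)\n",
			device->cached_dev_attrs.max_mad_size,
			IB_MGMT_MAD_SIZE, device->name);
		return -EFAULT;
	}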


Ira


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 06/19] IB/core: Add max_mad_size to ib_device_attr
       [not found]             ` <2807E5FD2F6FDA4886F6618EAC48510E0CC3ECE5-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-02-25 18:23               ` Jason Gunthorpe
       [not found]                 ` <20150225182308.GA14580-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Jason Gunthorpe @ 2015-02-25 18:23 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: Doug Ledford, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Wed, Feb 25, 2015 at 06:13:35PM +0000, Weiny, Ira wrote:
> > > diff --git a/drivers/infiniband/core/mad.c
> > > b/drivers/infiniband/core/mad.c index 819b794..a6a33cf 100644
> > > +++ b/drivers/infiniband/core/mad.c
> > > @@ -2924,6 +2924,12 @@ static int ib_mad_port_open(struct ib_device
> > *device,
> > >  	char name[sizeof "ib_mad123"];
> > >  	int has_smi;
> > >
> > > +	if (device->cached_dev_attrs.max_mad_size < IB_MGMT_MAD_SIZE) {
> > > +		dev_err(&device->dev, "Min MAD size for device is %u\n",
> > > +			IB_MGMT_MAD_SIZE);
> > > +		return -EFAULT;
> > > +	}
> > > +
> > 
> > The printk message here is not very informative and it qualifies as an error.
> > Someone reading that for the first time in the dmesg output and wondering
> > why their device isn't working will be confused if they don't know about the
> > mad size changes you are making here.  Something like "max supported MAD
> > size (%u) < min required by ib_mad (%u), ignoring dev \n"
> 
> Good suggestion.
> 
> Fixed for v5 with this message.
> 
> +               dev_err(&device->dev,
> +                       "Max supported MAD size (%u) < min required by ib_mad (%u), ignoring device (%s)\n",
> +                       device->cached_dev_attrs.max_mad_size,
> +                       IB_MGMT_MAD_SIZE, device->name);

It also seems redundant since the only call to ib_mad_port_open is:

                if (ib_mad_port_open(device, i)) {
                        printk(KERN_ERR PFX "Couldn't open %s port %d\n",
                               device->name, i);

So, why does this particular error deserve a special double error
print? I assume it is basically impossible to hit?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 07/19] IB/mad: Convert ib_mad_private allocations from kmem_cache to kmalloc
       [not found]         ` <1424787735.4847.19.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-02-25 18:23           ` Weiny, Ira
  0 siblings, 0 replies; 84+ messages in thread
From: Weiny, Ira @ 2015-02-25 18:23 UTC (permalink / raw)
  To: Doug Ledford
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

> >
> > +static struct ib_mad_private *alloc_mad_priv(struct ib_device *dev) {
> > +	return (kmalloc(sizeof(struct ib_mad_private_header) +
> > +			sizeof(struct ib_grh) +
> > +			dev->cached_dev_attrs.max_mad_size,
> GFP_ATOMIC));
> 
> Ouch!  GFP_ATOMIC?  I thought that generally all of the mad processing was
> done from workqueue context where sleeping is allowed?  In the two places
> where you removed kmem_cache_alloc() calls and replaced it with calls to this
> code, they both used GFP_KERNEL and now you have switched it to
> GFP_ATOMIC.  If there isn't a good reason for this, it should be switched back
> to GFP_KERNEL.

The original kmem_cache_alloc calls actually use both GFP_ATOMIC (1 usage, see below) and GFP_KERNEL (the 2 usages you reference).

My bad for not making this specific to the allocation.

I will research the original GFP_ATOMIC usage and, if it is necessary, have this function take a gfp_t parameter.  Otherwise, if we can get away with GFP_KERNEL, I agree that would be best.
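
For concreteness, a minimal sketch of that direction (assuming the helper simply takes and forwards the flags; the struct and field names are those from the hunk above):

static struct ib_mad_private *alloc_mad_priv(struct ib_device *dev,
					     gfp_t flags)
{
	/* GFP_KERNEL from workqueue context; GFP_ATOMIC only from the
	 * one atomic call site noted below */
	return kmalloc(sizeof(struct ib_mad_private_header) +
		       sizeof(struct ib_grh) +
		       dev->cached_dev_attrs.max_mad_size, flags);
}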

> 
> > +}
> > +
> >  /*
> >   * Return 0 if SMP is to be sent
> >   * Return 1 if SMP was consumed locally (whether or not solicited)
> > @@ -771,7 +776,8 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
> >  	}
> >  	local->mad_priv = NULL;
> >  	local->recv_mad_agent = NULL;
> > -	mad_priv = kmem_cache_alloc(ib_mad_cache, GFP_ATOMIC);

Original usage here... 

Thanks,
Ira

> > +
> > +	mad_priv = alloc_mad_priv(mad_agent_priv->agent.device);
> >  	if (!mad_priv) {
> >  		ret = -ENOMEM;
> >  		dev_err(&device->dev, "No memory for local response MAD\n"); @@


^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 06/19] IB/core: Add max_mad_size to ib_device_attr
       [not found]                 ` <20150225182308.GA14580-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-02-25 18:32                   ` Weiny, Ira
       [not found]                     ` <2807E5FD2F6FDA4886F6618EAC48510E0CC3ED5B-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Weiny, Ira @ 2015-02-25 18:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

> >
> > Fixed for v5 with this message.
> >
> > +               dev_err(&device->dev,
> > +                       "Max supported MAD size (%u) < min required by ib_mad
> (%u), ignoring device (%s)\n",
> > +                       device->cached_dev_attrs.max_mad_size,
> > +                       IB_MGMT_MAD_SIZE, device->name);
> 
> It also seems redundant since the only call to ib_mad_port_open is:
> 
>                 if (ib_mad_port_open(device, i)) {
>                         printk(KERN_ERR PFX "Couldn't open %s port %d\n",
>                                device->name, i);
> 
> So, why does this particular error deserve a special double error print? I
> assume it is basically impossible to hit?

This does indicate a coding error.  Generally, I prefer details of why the device could not open the port.  But if the community feels this is redundant or "not possible", I can drop the hunk.

Ira


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 06/19] IB/core: Add max_mad_size to ib_device_attr
       [not found]                     ` <2807E5FD2F6FDA4886F6618EAC48510E0CC3ED5B-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-02-25 18:43                       ` Jason Gunthorpe
  0 siblings, 0 replies; 84+ messages in thread
From: Jason Gunthorpe @ 2015-02-25 18:43 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: Doug Ledford, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Wed, Feb 25, 2015 at 06:32:27PM +0000, Weiny, Ira wrote:
> > >
> > > Fixed for v5 with this message.
> > >
> > > +               dev_err(&device->dev,
> > > +                       "Max supported MAD size (%u) < min required by ib_mad
> > (%u), ignoring device (%s)\n",
> > > +                       device->cached_dev_attrs.max_mad_size,
> > > +                       IB_MGMT_MAD_SIZE, device->name);
> > 
> > It also seems redundant since the only call to ib_mad_port_open is:
> > 
> >                 if (ib_mad_port_open(device, i)) {
> >                         printk(KERN_ERR PFX "Couldn't open %s port %d\n",
> >                                device->name, i);
> > 
> > So, why does this particular error deserve a special double error print? I
> > assume it is basically impossible to hit?
> 
> This does indicate a coding error.  Generally I prefer details of
> why the device could not open the port.  But if the community feels
> this is redundant or "not possible" I can drop the hunk.

Internal logic errors are handled with WARN_ON/BUG/etc.
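
For illustration, a minimal sketch of that convention applied to the hunk in question (WARN_ONCE is an assumed choice here, not the submitted code; it returns the condition, so the check and the report collapse into one statement):

	if (WARN_ONCE(device->cached_dev_attrs.max_mad_size < IB_MGMT_MAD_SIZE,
		      "%s: max_mad_size %u below required %u\n", device->name,
		      device->cached_dev_attrs.max_mad_size, IB_MGMT_MAD_SIZE))
		return -EFAULT;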

Jason

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                 ` <1424884438.4847.91.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-03-04  7:21                                   ` Weiny, Ira
       [not found]                                     ` <2807E5FD2F6FDA4886F6618EAC48510E0CC4F18C-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Weiny, Ira @ 2015-03-04  7:21 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Hal Rosenstock, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

Doug,

You have given me a lot to think about...  Comments below...

> > > >
> > > > While it is a different type of technology, standard verbs[*]
> > > > remains 100%
> > > compatible.  Unlike other verbs technologies user space software
> > > does not need any knowledge that the underlying device is not IB.
> > > For example, PR (and SA) queries, CM, rdmacm, and verbs calls themselves
> are all 100% IB compatible.
> > >
> > > Even if OPA is 100% standard verbs compatible which it does not
> > > appear to be, that does not make OPA an extra capability of an IBA device.
> >
> > I don't want to make it an extra capability of an IBA device.  I want to make it
> an extra capability of a "verbs" device.
> 
> And this, friends, is why it's bad to make both a link layer and an user space API
> with the exact same name ;-).  Anyway, I get your point Ira and it makes sense
> to me.  However, I also get Hal's point.  Our track record on this particular
> issue is a bit wonky though.

Thanks for laying this out.  I too understand Hals point.

> 
> First we had InfiniBand.
> 
> Then came iWARP, and we used the transport type to differentiate it from an
> actual InfiniBand device, but left the underlying link layer listed as InfiniBand.
> Then came RoCE, and we listed its transport type as InfiniBand, but changed
> the link layer to Ethernet.  Which left us in the oxymoronic position that even
> though iWARP was over Ethernet, the tools said it was over InfiniBand, while
> RoCE was the only thing that listed Ethernet as the link layer.  We later fixed
> that up with some hacks in tools to keep users from being confused and filing
> bugs.
> 
> Maybe this represents an opportunity to straighten some of this mess out.  If I
> remember correctly, this is the matrix of technologies today:
> 
> Technology	LinkLayer	Transport
> 
> InfiniBand	InfiniBand	InfiniBand Verbs
> iWARP		InfiniBand	iWARP Verbs (subset of IBV, with
> 				specific connection establishment
> 				requirements that don't exist with IBV)
> RoCE		Ethernet	InfiniBand Verbs (but with different
> 				addressing because of the different
> 				link layer)
> OPA		?		InfiniBand Verbs

I think this is _relatively_ accurate.  The one exception is the various IB verbs extensions which have been introduced.  While most are being pushed into the spec, not all of them are in the spec prior to being in the kernel.  That makes it difficult to keep up with what "IB Verbs" really is.

Mind you, I'm not opposed to having IB Verbs be flexible.  But I think we can accurately have multiple underlying technologies which support IB Verbs with various extensions.

> 
> It makes me wonder if we shouldn't make this matrix more accurate:
> 
> Technology	LinkLayer	Transport
> 
> InfiniBand	InfiniBand	InfiniBand Verbs
> iWARP		Ethernet	iWARP Verbs
> RoCE		Ethernet	RoCE-v1 or RoCE-v2
> OPA		?		OPA Verbs
> 
> With this sort of setup, the core ib_mad/ib_umad code would simply check the
> verbs type to see what support it can enable.  For IBV it would be the existing
> support, for OPAV it would be the additional jumbo support.

OPA, to be compatible with IB Verbs, uses the same node types as InfiniBand verbs (1 == CA, 2 == Switch).  As such, it returns the same Transport type.

> 
> I'm not sure how much we might expect a change like this to break existing
> software though, so maybe staightening this mess out is a non-starter.

I think this is going to break quite a bit.  I have prototyped setting OPA devices to "OPA Link Layer" and the perftest tools just fall over.  Any changes to the Link layer or the transport types will require a transition period for ULPs.

> 
> > > While it is a primary goal of the RDMA stack to have a common verbs
> > > API for various RDMA interconnects, each one is properly represented
> > > to allow it's unique characteristics to be exposed.
> >
> > The difference here is that we have maintained IB Verbs compatibility where
> other RDMA technologies did not.  We have tested many Verbs applications
> (both kernel and user space) and they function _without_ _modification_.
> >
> > Despite this compatibility we are still having this discussion.
> >
> > I can think of no other way to signal the MAD capability to the MAD stack
> which will preserve the verbs compatibility in the same way.
> 
> See above.  Define a new transport type, OPAVerbs, that is a superset of IBV
> and enable jumbo support when OPAV is the transport on the link.

But the transport type is not changing.  Once again, we are attempting to be completely verbs compatible.  From the MAD stack's POV, the verbs calls in the kernel are no different.

Would it be acceptable if the result of my patch series was:

InfiniBand	InfiniBand	InfiniBand Verbs
iWARP		InfiniBand	iWARP Verbs (subset of IBV, with
				specific connection establishment
				requirements that don't exist with IBV)
RoCE		Ethernet	InfiniBand Verbs (but with different
				addressing because of the different
				link layer)
OPA		OPA		InfiniBand Verbs

And the MAD stack looked at the link layer to see the difference?
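
(For concreteness, a hypothetical sketch of such a check; rdma_port_get_link_layer() already exists, and IB_LINK_LAYER_OMNI_PATH_ARCH is the new enum value prototyped in a WIP patch later in this thread:)

static bool has_opa_mads(struct ib_device *dev, u8 port_num)
{
	return rdma_port_get_link_layer(dev, port_num) ==
	       IB_LINK_LAYER_OMNI_PATH_ARCH;
}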

Ira


^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                     ` <2807E5FD2F6FDA4886F6618EAC48510E0CC4F18C-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-03-04 16:02                                       ` Hefty, Sean
       [not found]                                         ` <1828884A29C6694DAF28B7E6B8A8237399E6F06F-8oqHQFITsIHTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2015-03-06 17:47                                       ` Jason Gunthorpe
  1 sibling, 1 reply; 84+ messages in thread
From: Hefty, Sean @ 2015-03-04 16:02 UTC (permalink / raw)
  To: Weiny, Ira, Doug Ledford
  Cc: Hal Rosenstock, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

> InfiniBand	InfiniBand	InfiniBand Verbs
> iWARP		InfiniBand	iWARP Verbs (subset of IBV, with
> 				specific connection establishment
> 				requirements that don't exist with IBV)
> RoCE		Ethernet	InfiniBand Verbs (but with different
> 				addressing because of the different
> 				link layer)
> OPA		OPA		InfiniBand Verbs

Verbs is an interface definition to hardware that has been twisted to be a software API and extended to expose vendor-specific implementation 'features' as extensions.  It is not a transport.

The device capability bits seem to have evolved to mean: vendor A implemented some random 'feature' in their hardware and wants all applications to now check for this 'feature' and change their code to use it.  Basically, what gets defined as a device cap is rather arbitrary.

- Sean

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                         ` <1828884A29C6694DAF28B7E6B8A8237399E6F06F-8oqHQFITsIHTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-03-04 16:41                                           ` Weiny, Ira
       [not found]                                             ` <2807E5FD2F6FDA4886F6618EAC48510E0CC4F50B-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Weiny, Ira @ 2015-03-04 16:41 UTC (permalink / raw)
  To: Hefty, Sean, Doug Ledford
  Cc: Hal Rosenstock, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

> 
> > InfiniBand	InfiniBand	InfiniBand Verbs
> > iWARP		InfiniBand	iWARP Verbs (subset of IBV, with
> > 				specific connection establishment
> > 				requirements that don't exist with IBV)
> > RoCE		Ethernet	InfiniBand Verbs (but with different
> > 				addressing because of the different
> > 				link layer)
> > OPA		OPA		InfiniBand Verbs
> 
> Verbs is an interface definition to hardware that has been twisted to be a
> software API and extended to expose vendor-specific implementation 'features'
> as extensions.  It is not a transport.
> 
> The device capability bits seems to have evolved to mean: vendor A
> implemented some random 'feature' in their hardware and wants all
> applications to now check for this 'feature' and change their code to use it.
> Basically, what gets defined as a device cap is rather arbitrary.
> 

This was the point I was trying to make, and the reason OPA MAD support was implemented as a device capability.

Ira


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                     ` <2807E5FD2F6FDA4886F6618EAC48510E0CC4F18C-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2015-03-04 16:02                                       ` Hefty, Sean
@ 2015-03-06 17:47                                       ` Jason Gunthorpe
       [not found]                                         ` <20150306174729.GE22375-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 84+ messages in thread
From: Jason Gunthorpe @ 2015-03-06 17:47 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: Doug Ledford, Hal Rosenstock, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Mar 04, 2015 at 07:21:48AM +0000, Weiny, Ira wrote:

> I think this is going to break quite a bit.  I have prototyped
> setting OPA devices to "OPA Link Layer" and the perftest tools just
> fall over.  Any changes to the Link layer or the transport types
> will require a transition period for ULPs.

How do the perftest tools work with OPA in the first place? OPA seems
to have 32 bit lids. Do you mean it 'works' as long as the lid is < 16
bits? Same general point about all of verbs, lots of 'uint16_t lid' in
the interfaces?

Jason

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                         ` <20150306174729.GE22375-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-03-06 22:47                                           ` Weiny, Ira
       [not found]                                             ` <2807E5FD2F6FDA4886F6618EAC48510E0CC53437-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Weiny, Ira @ 2015-03-06 22:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, Hal Rosenstock, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

> 
> On Wed, Mar 04, 2015 at 07:21:48AM +0000, Weiny, Ira wrote:
> 
> > I think this is going to break quite a bit.  I have prototyped setting
> > OPA devices to "OPA Link Layer" and the perftest tools just fall over.
> > Any changes to the Link layer or the transport types will require a
> > transition period for ULPs.
> 
> How do the perftest tools work with OPA in the first place? OPA seems to have
> 32 bit lids. Do you mean it 'works' as long as the lid is < 16 bits? Same general
> point about all of verbs, lots of 'uint16_t lid' in the interfaces?

The 32-bit LIDs in the SMP are designed for future expansion.  Currently, OPA does not support > 16-bit LIDs.
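
(Illustrative only: a hypothetical helper showing that, as long as LIDs stay below 2^16, a 32-bit on-the-wire LID converts losslessly to the 16-bit verbs lid:)

static inline u16 opa_lid_to_ib_lid(u32 opa_lid)
{
	/* OPA does not currently assign LIDs that need more than 16 bits */
	WARN_ON(opa_lid & 0xffff0000);
	return (u16)opa_lid;
}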

Ira


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                             ` <2807E5FD2F6FDA4886F6618EAC48510E0CC4F50B-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-03-09 21:22                                               ` Hal Rosenstock
       [not found]                                                 ` <54FE0F16.5090905-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hal Rosenstock @ 2015-03-09 21:22 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: Hefty, Sean, Doug Ledford, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 3/4/2015 11:41 AM, Weiny, Ira wrote:
>>
>>> InfiniBand	InfiniBand	InfiniBand Verbs
>>> iWARP		InfiniBand	iWARP Verbs (subset of IBV, with
>>> 				specific connection establishment
>>> 				requirements that don't exist with IBV)
>>> RoCE		Ethernet	InfiniBand Verbs (but with different
>>> 				addressing because of the different
>>> 				link layer)
>>> OPA		OPA		InfiniBand Verbs
>>
>> Verbs is an interface definition to hardware that has been twisted to be a
>> software API and extended to expose vendor-specific implementation 'features'
>> as extensions.  It is not a transport.
>>
>> The device capability bits seems to have evolved to mean: vendor A
>> implemented some random 'feature' in their hardware and wants all
>> applications to now check for this 'feature' and change their code to use it.
>> Basically, what gets defined as a device cap is rather arbitrary.
>>
> 
> This was the point I was trying to make and the reason the OPA MAD support was implemented as a device capability.

The proposed device capability stands for a change that is way more
drastic than just a vendor extension. This is a device running a
completely different wire protocol which does not interoperate with the
IB devices it is impersonating.

Also, it does not come under the same jurisdiction as IB. It is
entirely possible that IBTA could make some change in the future where
OPA can no longer masquerade as IB.

OPA must also be identifiable by any verbs or management application. To
do that, it should be properly represented in node type, transport type,
and link layer, regardless of the changes needed. All the other RDMA
technologies have gone through this process.

-- Hal

> Ira
> 


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                             ` <2807E5FD2F6FDA4886F6618EAC48510E0CC53437-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-03-09 21:23                                               ` Hal Rosenstock
  0 siblings, 0 replies; 84+ messages in thread
From: Hal Rosenstock @ 2015-03-09 21:23 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: Jason Gunthorpe, Doug Ledford, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 3/6/2015 5:47 PM, Weiny, Ira wrote:
>>
>> On Wed, Mar 04, 2015 at 07:21:48AM +0000, Weiny, Ira wrote:
>>
>>> I think this is going to break quite a bit.  I have prototyped setting
>>> OPA devices to "OPA Link Layer" and the perftest tools just fall over.
>>> Any changes to the Link layer or the transport types will require a
>>> transition period for ULPs.
>>
>> How do the perftest tools work with OPA in the first place? OPA seems to have
>> 32 bit lids. Do you mean it 'works' as long as the lid is < 16 bits? Same general
>> point about all of verbs, lots of 'uint16_t lid' in the interfaces?
> 
> The 32 bit LIDs in the SMP are designed for future expansion.  Currently OPA does not support > 16 bit LIDs.

It's not just verbs (structures and APIs) but also CM and any other
place where LID is used, and those places are numerous.

Is a new verbs API coming for this? How will compatibility be dealt with?

This is also another example of why a more complete picture of OPA is needed.

-- Hal

> 
> Ira
> 
> 


^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                                 ` <54FE0F16.5090905-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-03-11  7:27                                                   ` Weiny, Ira
       [not found]                                                     ` <2807E5FD2F6FDA4886F6618EAC48510E0CC5C11A-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Weiny, Ira @ 2015-03-11  7:27 UTC (permalink / raw)
  To: Hal Rosenstock
  Cc: Hefty, Sean, Doug Ledford, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA


> On 3/4/2015 11:41 AM, Weiny, Ira wrote:
> >>
> >>> InfiniBand	InfiniBand	InfiniBand Verbs
> >>> iWARP		InfiniBand	iWARP Verbs (subset of IBV, with
> >>> 				specific connection establishment
> >>> 				requirements that don't exist with IBV)
> >>> RoCE		Ethernet	InfiniBand Verbs (but with different
> >>> 				addressing because of the different
> >>> 				link layer)
> >>> OPA		OPA		InfiniBand Verbs
> >>
> >> Verbs is an interface definition to hardware that has been twisted to
> >> be a software API and extended to expose vendor-specific implementation
> 'features'
> >> as extensions.  It is not a transport.
> >>
> >> The device capability bits seems to have evolved to mean: vendor A
> >> implemented some random 'feature' in their hardware and wants all
> >> applications to now check for this 'feature' and change their code to use it.
> >> Basically, what gets defined as a device cap is rather arbitrary.
> >>
> >
> > This was the point I was trying to make and the reason the OPA MAD support
> was implemented as a device capability.
> 
> The proposed device capability stands for a change that is way more drastic
> than just a vendor extension. This is a device running a completely different
> wire protocol which does not interoperate with the IB device that is
> impersonating.
> 
> Also, it does not come under the same jurisdication as IB. It is entirely possible
> that IBTA could make some change in the future where OPA can no longer
> masquerade as IB.
> 
> OPA must also be identifiable by any verbs 

I disagree.  Software applications running IP don't need to know whether they are running over IB or Ethernet.  Why would this have to be true for Verbs?
	
perftest, libibverbs (example apps), mvapich (verbs), openmpi (verbs), ibacm (librdmacm based applications), srp, and ipoib all run without any knowledge of the link being OPA.  While not an exhaustive list of verbs applications, this is a pretty good sampling.  We have specifically designed OPA to support these applications without modifications.

> or management application. 

Agreed, but _only_ when talking to the hardware.  Other MAD interfaces are IB compatible.

The idea of the original series was to check the device capability bits (in kernel and in userspace) for SM and diagnostic tool support.  As currently submitted, I would need to add a kernel ABI to get the extended capability bits (because the originals were exhausted).  I have been waiting for this discussion to settle before going through that effort.
	
> To do
> that, it should be properly represented in node type, transport type, and link
> layer as all the other RDMA technologies have been regardless of the changes
> needed. All the other RDMA technologies have gone through this process.
> 

As I suggested above: Would setting the Link Layer to a new value (i.e. IB_LINK_LAYER_OMNI_PATH_ARCH) while maintaining the Transport as InfiniBand Verbs be satisfactory?

While this breaks at least some of the examples I list above, I believe I have worked out a way to phase in this support through our provider library.


To address your comments from the other fork of this thread:

> > The 32 bit LIDs in the SMP are designed for future expansion.  Currently OPA
> does not support > 16 bit LIDs.
> 
> It's not just verbs (structures and APIs) but also CM and any other place where
> LID is used which are numerous.
> 
> Is a new verbs coming for this ?

I can't guarantee that no changes will be required in the future.  But right now the answer is "No" because we specifically implemented OPA to be "InfiniBand Verbs".

> How will compatibility be dealt with ?

Compatibility is provided by presenting identical InfiniBand Verbs interfaces (kernel abi, SA protocols, etc) to the user.

> 
> This is also another example why a more complete picture of OPA is needed.
> 
> -- Hal


At this moment OPA provides ULP compatibility for all IB Verbs applications, with the exception of management software which talks directly to the SMA and PMA.  All ULP interactions with management (Path Record, CM, Multicast joins, etc.) are still IB formatted and compatible.  We have specifically designed this interoperability with OFA.

To help illustrate my point, I have included 2 patches below.  The first defines a new Link Layer, IB_LINK_LAYER_OMNI_PATH_ARCH, and modifies all the kernel interfaces which look at the link layer.  This has been minimally tested with IPoIB and the perftest tools (the modification of which is included as the 2nd patch).  The entire patch boils down to changing "if (InfiniBand)" to "if (InfiniBand || OPA)".  IMO this is rather inefficient, but if this is more acceptable to the community I am willing to investigate it further.

Ira



From 9f09be92576204b3ead71f714fc231110f03bff6 Mon Sep 17 00:00:00 2001
From: Ira Weiny <ira.weiny@intel.com>
Date: Wed, 3 Dec 2014 20:01:09 -0500
Subject: [PATCH] WIP: IB/core: Add IB_LINK_LAYER_OMNI_PATH_ARCH

This OPA Link Layer is 100% compatible with InfiniBand Verbs software.
---
 drivers/infiniband/core/agent.c           |  4 +++-
 drivers/infiniband/core/cma.c             | 39 +++++++++++++++++++++++--------
 drivers/infiniband/core/mad.c             |  4 +++-
 drivers/infiniband/core/multicast.c       | 16 ++++++++-----
 drivers/infiniband/core/sa_query.c        | 22 +++++++++++------
 drivers/infiniband/core/sysfs.c           |  2 ++
 drivers/infiniband/core/ucma.c            |  1 +
 drivers/infiniband/ulp/ipoib/ipoib_main.c |  3 ++-
 include/rdma/ib_verbs.h                   |  1 +
 9 files changed, 66 insertions(+), 26 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index f6d2961..0b1e7ee 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -147,6 +147,7 @@ int ib_agent_port_open(struct ib_device *device, int port_num)
 	struct ib_agent_port_private *port_priv;
 	unsigned long flags;
 	int ret;
+	enum rdma_link_layer ll;
 
 	/* Create new device info */
 	port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL);
@@ -156,7 +157,8 @@ int ib_agent_port_open(struct ib_device *device, int port_num)
 		goto error1;
 	}
 
-	if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND) {
+	ll = rdma_port_get_link_layer(device, port_num);
+	if (ll == IB_LINK_LAYER_INFINIBAND || ll == IB_LINK_LAYER_OMNI_PATH_ARCH) {
 		/* Obtain send only MAD agent for SMI QP */
 		port_priv->agent[0] = ib_register_mad_agent(device, port_num,
 							    IB_QPT_SMI, NULL, 0,
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d570030..d16586c 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -349,6 +349,15 @@ static int cma_translate_addr(struct sockaddr *addr, struct rdma_dev_addr *dev_a
 	return ret;
 }
 
+static inline int ll_matches_dev_type(enum rdma_link_layer ll,
+				      unsigned short dev_type)
+{
+	return ((dev_type == ARPHRD_INFINIBAND &&
+		(ll == IB_LINK_LAYER_INFINIBAND || ll == IB_LINK_LAYER_OMNI_PATH_ARCH))
+		||
+		(dev_type != ARPHRD_INFINIBAND && ll == IB_LINK_LAYER_ETHERNET));
+}
+
 static int cma_acquire_dev(struct rdma_id_private *id_priv,
 			   struct rdma_id_private *listen_id_priv)
 {
@@ -357,11 +366,9 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
 	union ib_gid gid, iboe_gid;
 	int ret = -ENODEV;
 	u8 port, found_port;
-	enum rdma_link_layer dev_ll = dev_addr->dev_type == ARPHRD_INFINIBAND ?
-		IB_LINK_LAYER_INFINIBAND : IB_LINK_LAYER_ETHERNET;
+	unsigned short dev_type = dev_addr->dev_type;
 
-	if (dev_ll != IB_LINK_LAYER_INFINIBAND &&
-	    id_priv->id.ps == RDMA_PS_IPOIB)
+	if (dev_type != ARPHRD_INFINIBAND && id_priv->id.ps == RDMA_PS_IPOIB)
 		return -EINVAL;
 
 	mutex_lock(&lock);
@@ -370,9 +377,11 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
 
 	memcpy(&gid, dev_addr->src_dev_addr +
 	       rdma_addr_gid_offset(dev_addr), sizeof gid);
+
 	if (listen_id_priv &&
-	    rdma_port_get_link_layer(listen_id_priv->id.device,
-				     listen_id_priv->id.port_num) == dev_ll) {
+	    ll_matches_dev_type(rdma_port_get_link_layer(listen_id_priv->id.device,
+							 listen_id_priv->id.port_num),
+				dev_type)) {
 		cma_dev = listen_id_priv->cma_dev;
 		port = listen_id_priv->id.port_num;
 		if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
@@ -394,7 +403,8 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
 			    listen_id_priv->cma_dev == cma_dev &&
 			    listen_id_priv->id.port_num == port)
 				continue;
-			if (rdma_port_get_link_layer(cma_dev->device, port) == dev_ll) {
+			if (ll_matches_dev_type(rdma_port_get_link_layer(cma_dev->device, port),
+						dev_type)) {
 				if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
 				    rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
 					ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, &found_port, NULL);
@@ -699,9 +709,11 @@ static int cma_ib_init_qp_attr(struct rdma_id_private *id_priv,
 	struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
 	int ret;
 	u16 pkey;
+	enum rdma_link_layer ll;
+
+	ll = rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num);
 
-	if (rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num) ==
-	    IB_LINK_LAYER_INFINIBAND)
+	if (ll == IB_LINK_LAYER_INFINIBAND || ll == IB_LINK_LAYER_OMNI_PATH_ARCH)
 		pkey = ib_addr_get_pkey(dev_addr);
 	else
 		pkey = 0xffff;
@@ -930,6 +942,7 @@ static void cma_cancel_route(struct rdma_id_private *id_priv)
 {
 	switch (rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)) {
 	case IB_LINK_LAYER_INFINIBAND:
+	case IB_LINK_LAYER_OMNI_PATH_ARCH:
 		if (id_priv->query)
 			ib_sa_cancel_query(id_priv->query_id, id_priv->query);
 		break;
@@ -1008,6 +1021,7 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
 		list_del(&mc->list);
 		switch (rdma_port_get_link_layer(id_priv->cma_dev->device, id_priv->id.port_num)) {
 		case IB_LINK_LAYER_INFINIBAND:
+		case IB_LINK_LAYER_OMNI_PATH_ARCH:
 			ib_sa_free_multicast(mc->multicast.ib);
 			kfree(mc);
 			break;
@@ -1971,6 +1985,7 @@ int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms)
 	case RDMA_TRANSPORT_IB:
 		switch (rdma_port_get_link_layer(id->device, id->port_num)) {
 		case IB_LINK_LAYER_INFINIBAND:
+		case IB_LINK_LAYER_OMNI_PATH_ARCH:
 			ret = cma_resolve_ib_route(id_priv, timeout_ms);
 			break;
 		case IB_LINK_LAYER_ETHERNET:
@@ -2023,6 +2038,7 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
 	u16 pkey;
 	int ret;
 	u8 p;
+	enum rdma_link_layer ll;
 
 	cma_dev = NULL;
 	mutex_lock(&lock);
@@ -2059,8 +2075,9 @@ port_found:
 	if (ret)
 		goto out;
 
+	ll = rdma_port_get_link_layer(cma_dev->device, p);
 	id_priv->id.route.addr.dev_addr.dev_type =
-		(rdma_port_get_link_layer(cma_dev->device, p) == IB_LINK_LAYER_INFINIBAND) ?
+		(ll == IB_LINK_LAYER_INFINIBAND || ll == IB_LINK_LAYER_OMNI_PATH_ARCH) ?
 		ARPHRD_INFINIBAND : ARPHRD_ETHER;
 
 	rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
@@ -3364,6 +3381,7 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
 	case RDMA_TRANSPORT_IB:
 		switch (rdma_port_get_link_layer(id->device, id->port_num)) {
 		case IB_LINK_LAYER_INFINIBAND:
+		case IB_LINK_LAYER_OMNI_PATH_ARCH:
 			ret = cma_join_ib_multicast(id_priv, mc);
 			break;
 		case IB_LINK_LAYER_ETHERNET:
@@ -3408,6 +3426,7 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr)
 			if (rdma_node_get_transport(id_priv->cma_dev->device->node_type) == RDMA_TRANSPORT_IB) {
 				switch (rdma_port_get_link_layer(id->device, id->port_num)) {
 				case IB_LINK_LAYER_INFINIBAND:
+				case IB_LINK_LAYER_OMNI_PATH_ARCH:
 					ib_sa_free_multicast(mc->multicast.ib);
 					kfree(mc);
 					break;
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 74c30f4..4100312 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2922,6 +2922,7 @@ static int ib_mad_port_open(struct ib_device *device,
 	unsigned long flags;
 	char name[sizeof "ib_mad123"];
 	int has_smi;
+	enum rdma_link_layer ll;
 
 	/* Create new device info */
 	port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL);
@@ -2938,7 +2939,8 @@ static int ib_mad_port_open(struct ib_device *device,
 	init_mad_qp(port_priv, &port_priv->qp_info[1]);
 
 	cq_size = mad_sendq_size + mad_recvq_size;
-	has_smi = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND;
+	ll = rdma_port_get_link_layer(device, port_num);
+	has_smi = (ll == IB_LINK_LAYER_INFINIBAND || ll == IB_LINK_LAYER_OMNI_PATH_ARCH);
 	if (has_smi)
 		cq_size *= 2;
 
diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index fa17b55..5bce1a58 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -778,10 +778,12 @@ static void mcast_event_handler(struct ib_event_handler *handler,
 {
 	struct mcast_device *dev;
 	int index;
+	enum rdma_link_layer ll;
 
 	dev = container_of(handler, struct mcast_device, event_handler);
-	if (rdma_port_get_link_layer(dev->device, event->element.port_num) !=
-	    IB_LINK_LAYER_INFINIBAND)
+
+	ll = rdma_port_get_link_layer(dev->device, event->element.port_num);
+	if (ll != IB_LINK_LAYER_INFINIBAND && ll != IB_LINK_LAYER_OMNI_PATH_ARCH)
 		return;
 
 	index = event->element.port_num - dev->start_port;
@@ -824,8 +826,9 @@ static void mcast_add_one(struct ib_device *device)
 	}
 
 	for (i = 0; i <= dev->end_port - dev->start_port; i++) {
-		if (rdma_port_get_link_layer(device, dev->start_port + i) !=
-		    IB_LINK_LAYER_INFINIBAND)
+		enum rdma_link_layer ll = rdma_port_get_link_layer(device,
+								   dev->start_port + i);
+		if (ll != IB_LINK_LAYER_INFINIBAND && ll != IB_LINK_LAYER_OMNI_PATH_ARCH)
 			continue;
 		port = &dev->port[i];
 		port->dev = dev;
@@ -863,8 +866,9 @@ static void mcast_remove_one(struct ib_device *device)
 	flush_workqueue(mcast_wq);
 
 	for (i = 0; i <= dev->end_port - dev->start_port; i++) {
-		if (rdma_port_get_link_layer(device, dev->start_port + i) ==
-		    IB_LINK_LAYER_INFINIBAND) {
+		enum rdma_link_layer ll = rdma_port_get_link_layer(device,
+								   dev->start_port + i);
+		if (ll == IB_LINK_LAYER_INFINIBAND || ll == IB_LINK_LAYER_OMNI_PATH_ARCH) {
 			port = &dev->port[i];
 			deref_port(port);
 			wait_for_completion(&port->comp);
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index c38f030..90eda12 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -450,7 +450,8 @@ static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event
 		struct ib_sa_port *port =
 			&sa_dev->port[event->element.port_num - sa_dev->start_port];
 
-		if (rdma_port_get_link_layer(handler->device, port->port_num) != IB_LINK_LAYER_INFINIBAND)
+		enum rdma_link_layer ll = rdma_port_get_link_layer(handler->device, port->port_num);
+		if (ll != IB_LINK_LAYER_INFINIBAND && ll != IB_LINK_LAYER_OMNI_PATH_ARCH)
 			return;
 
 		spin_lock_irqsave(&port->ah_lock, flags);
@@ -1153,6 +1154,7 @@ static void ib_sa_add_one(struct ib_device *device)
 {
 	struct ib_sa_device *sa_dev;
 	int s, e, i;
+	enum rdma_link_layer ll;
 
 	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
 		return;
@@ -1175,7 +1177,8 @@ static void ib_sa_add_one(struct ib_device *device)
 
 	for (i = 0; i <= e - s; ++i) {
 		spin_lock_init(&sa_dev->port[i].ah_lock);
-		if (rdma_port_get_link_layer(device, i + 1) != IB_LINK_LAYER_INFINIBAND)
+		ll = rdma_port_get_link_layer(device, i + 1);
+		if (ll != IB_LINK_LAYER_INFINIBAND && ll != IB_LINK_LAYER_OMNI_PATH_ARCH)
 			continue;
 
 		sa_dev->port[i].sm_ah    = NULL;
@@ -1204,16 +1207,20 @@ static void ib_sa_add_one(struct ib_device *device)
 	if (ib_register_event_handler(&sa_dev->event_handler))
 		goto err;
 
-	for (i = 0; i <= e - s; ++i)
-		if (rdma_port_get_link_layer(device, i + 1) == IB_LINK_LAYER_INFINIBAND)
+	for (i = 0; i <= e - s; ++i) {
+		ll = rdma_port_get_link_layer(device, i + 1);
+		if (ll == IB_LINK_LAYER_INFINIBAND || ll == IB_LINK_LAYER_OMNI_PATH_ARCH)
 			update_sm_ah(&sa_dev->port[i].update_task);
+	}
 
 	return;
 
 err:
-	while (--i >= 0)
-		if (rdma_port_get_link_layer(device, i + 1) == IB_LINK_LAYER_INFINIBAND)
+	while (--i >= 0) {
+		ll = rdma_port_get_link_layer(device, i + 1);
+		if (ll == IB_LINK_LAYER_INFINIBAND || ll == IB_LINK_LAYER_OMNI_PATH_ARCH)
 			ib_unregister_mad_agent(sa_dev->port[i].agent);
+	}
 
 	kfree(sa_dev);
 
@@ -1233,7 +1240,8 @@ static void ib_sa_remove_one(struct ib_device *device)
 	flush_workqueue(ib_wq);
 
 	for (i = 0; i <= sa_dev->end_port - sa_dev->start_port; ++i) {
-		if (rdma_port_get_link_layer(device, i + 1) == IB_LINK_LAYER_INFINIBAND) {
+		enum rdma_link_layer ll = rdma_port_get_link_layer(device, i + 1);
+		if (ll == IB_LINK_LAYER_INFINIBAND || ll == IB_LINK_LAYER_OMNI_PATH_ARCH) {
 			ib_unregister_mad_agent(sa_dev->port[i].agent);
 			if (sa_dev->port[i].sm_ah)
 				kref_put(&sa_dev->port[i].sm_ah->ref, free_sm_ah);
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index cbd0383..66b01f4 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -253,6 +253,8 @@ static ssize_t link_layer_show(struct ib_port *p, struct port_attribute *unused,
 		return sprintf(buf, "%s\n", "InfiniBand");
 	case IB_LINK_LAYER_ETHERNET:
 		return sprintf(buf, "%s\n", "Ethernet");
+	case IB_LINK_LAYER_OMNI_PATH_ARCH:
+		return sprintf(buf, "%s\n", "OmniPathArch");
 	default:
 		return sprintf(buf, "%s\n", "Unknown");
 	}
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 45d67e9..502e2e8 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -727,6 +727,7 @@ static ssize_t ucma_query_route(struct ucma_file *file,
 		switch (rdma_port_get_link_layer(ctx->cm_id->device,
 			ctx->cm_id->port_num)) {
 		case IB_LINK_LAYER_INFINIBAND:
+		case IB_LINK_LAYER_OMNI_PATH_ARCH:
 			ucma_copy_ib_route(&resp, &ctx->cm_id->route);
 			break;
 		case IB_LINK_LAYER_ETHERNET:
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 58b5aa3..5c51866 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1673,7 +1673,8 @@ static void ipoib_add_one(struct ib_device *device)
 	}
 
 	for (p = s; p <= e; ++p) {
-		if (rdma_port_get_link_layer(device, p) != IB_LINK_LAYER_INFINIBAND)
+		enum rdma_link_layer ll = rdma_port_get_link_layer(device, p);
+		if (ll != IB_LINK_LAYER_INFINIBAND && ll != IB_LINK_LAYER_OMNI_PATH_ARCH)
 			continue;
 		dev = ipoib_add_port("ib%d", device, p);
 		if (!IS_ERR(dev)) {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 65994a1..6a15088 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -88,6 +88,7 @@ enum rdma_link_layer {
 	IB_LINK_LAYER_UNSPECIFIED,
 	IB_LINK_LAYER_INFINIBAND,
 	IB_LINK_LAYER_ETHERNET,
+	IB_LINK_LAYER_OMNI_PATH_ARCH,
 };
 
 enum ib_device_cap_flags {
-- 
1.8.2





From 28c5c7e44b87c4e3e29634fd378da4871401cbcd Mon Sep 17 00:00:00 2001
From: Ira Weiny <ira.weiny@intel.com>
Date: Wed, 11 Mar 2015 02:28:14 -0400
Subject: [PATCH] perftest: WIP add OPA Link Layer support

---
 src/perftest_parameters.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/perftest_parameters.c b/src/perftest_parameters.c
index fc4088a..928ba96 100755
--- a/src/perftest_parameters.c
+++ b/src/perftest_parameters.c
@@ -985,6 +985,7 @@ const char *transport_str(enum ibv_transport_type type)
 /******************************************************************************
  *
  ******************************************************************************/
+#define IBV_LINK_LAYER_OPA (IBV_LINK_LAYER_ETHERNET+1)
 const char *link_layer_str(uint8_t link_layer)
 {
 	switch (link_layer) {
@@ -994,6 +995,8 @@ const char *link_layer_str(uint8_t link_layer)
 			return "IB";
 		case IBV_LINK_LAYER_ETHERNET:
 			return "Ethernet";
+		case IBV_LINK_LAYER_OPA:
+			return "OPA";
 		#ifdef HAVE_SCIF
 		case IBV_LINK_LAYER_SCIF:
 			return "SCIF";
-- 
1.8.2




^ permalink raw reply related	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                                     ` <2807E5FD2F6FDA4886F6618EAC48510E0CC5C11A-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-03-16 23:59                                                       ` Hefty, Sean
       [not found]                                                         ` <1828884A29C6694DAF28B7E6B8A8237399E8106A-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hefty, Sean @ 2015-03-16 23:59 UTC (permalink / raw)
  To: Weiny, Ira, Hal Rosenstock
  Cc: Doug Ledford, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

> > or management application.
> 
> Agreed, but _only_ when talking to the hardware.  Other MAD interfaces are
> IB compatible.

Maybe this is what Ira is thinking, and just not explaining very well.  But it makes sense to me to use management-specific fields/attributes/flags for the *management* pieces, rather than using the link and/or transport layer protocols as a proxy.  Management-related code should really branch based on that.

The introduction of new OPA link and transport protocols, and fixing what's there for iWarp, can then be addressed separately.

I don't have any thoughts on what the management-specific fields/attributes/flags should be -- whether new fields are added to the device attributes, a management flag is defined, etc.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                                         ` <1828884A29C6694DAF28B7E6B8A8237399E8106A-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-03-17 23:36                                                           ` Hefty, Sean
       [not found]                                                             ` <1828884A29C6694DAF28B7E6B8A8237399E818C6-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hefty, Sean @ 2015-03-17 23:36 UTC (permalink / raw)
  To: Hefty, Sean, Weiny, Ira, Hal Rosenstock
  Cc: Doug Ledford, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA


> But it makes sense to me to use management specific
> fields/attributes/flags for the *management* pieces, rather than using the
> link and/or transport layer protocols as a proxy.  Management related code
> should really branch based on that.

As a proposal, we could add a new field to the kernel port attribute structure.  The field would be a bitmask of management capabilities/protocols:

IB_MGMT_PROTO_SM - supports IB SMPs
IB_MGMT_PROTO_SA - supports IB SA MADs
IB_MGMT_PROTO_GS - supports IB GSI MADs (e.g. CM, PM, ...)
IB_MGMT_PROTO_OPA_SM - supports OPA SMPs (or whatever they are called)
IB_MGMT_PROTO_OPA_GS - supports OPA GS MADs (or whatever is supported) 

If the *GS flags are not sufficient to distinguish between MADs supported over IB and RoCE, they can be further divided (e.g. CM, PM, BM, DM, etc.).

This would provide a direct mapping of which management protocols are supported for a given port, rather than having it inferred from the link/transport fields, which should really be independent.  It would also allow for simple checks by the core layer.

If we want the code to be more generic, additional field(s) could be added, such as mad_size, so that any size of management datagram is supported.  This would be used instead of inferring the size based on the supported protocol.
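
(A minimal sketch of how such port attributes could look; these names are hypothetical, not part of the proposal:)

/* hypothetical per-port management attributes, per the proposal above */
struct ib_port_mgmt_attr {
	u32 mgmt_proto;		/* bitmask of IB_MGMT_PROTO_* flags */
	u32 max_mad_size;	/* in bytes, instead of inferring the size
				 * from the supported protocol */
};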

- Sean 

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                                             ` <1828884A29C6694DAF28B7E6B8A8237399E818C6-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-03-20 13:38                                                               ` Michael Wang
  2015-03-20 13:48                                                               ` Michael Wang
  1 sibling, 0 replies; 84+ messages in thread
From: Michael Wang @ 2015-03-20 13:38 UTC (permalink / raw)
  To: Hefty, Sean, Weiny, Ira, Hal Rosenstock
  Cc: Doug Ledford, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi, folks

I've done a draft (very rough draft...) according to my understanding of
Sean's proposal.

The implementation allows the device to set up the management flags during
ib_query_port() (amso1100 as an example); later we can use the flags to
check the capability.

For a new capability/protocol like OPA, the device could set a new flag,
IB_MGMT_PROTO_OPA, during the query_port() callback, and a helper like
rdma_mgmt_cap_opa() could be used for the management branch.

What do you think about this?

Regards,
Michael Wang



diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d570030..ad1685e 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -375,8 +375,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
                       listen_id_priv->id.port_num) == dev_ll) {
          cma_dev = listen_id_priv->cma_dev;
          port = listen_id_priv->id.port_num;
-        if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
-            rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
+        if (rdma_mgmt_cap_iboe(cma_dev->device, port))
              ret = ib_find_cached_gid(cma_dev->device, &iboe_gid,
                           &found_port, NULL);
          else
@@ -395,8 +394,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
                  listen_id_priv->id.port_num == port)
                  continue;
              if (rdma_port_get_link_layer(cma_dev->device, port) == dev_ll) {
-                if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
-                    rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
+                if (rdma_mgmt_cap_iboe(cma_dev->device, port))
                      ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, &found_port, NULL);
                  else
                      ret = ib_find_cached_gid(cma_dev->device, &gid, &found_port, NULL);
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 74c30f4..0ae6b04 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2938,7 +2938,7 @@ static int ib_mad_port_open(struct ib_device *device,
      init_mad_qp(port_priv, &port_priv->qp_info[1]);

      cq_size = mad_sendq_size + mad_recvq_size;
-    has_smi = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND;
+    has_smi = rdma_mgmt_cap_smi(device, port_num);
      if (has_smi)
          cq_size *= 2;

@@ -3057,7 +3057,7 @@ static void ib_mad_init_device(struct ib_device *device)
  {
      int start, end, i;

-    if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
+    if (!rdma_mgmt_cap_ib(device))
          return;

      if (device->node_type == RDMA_NODE_IB_SWITCH) {
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f93eb8d..5ecf9c8 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -146,6 +146,26 @@ enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_num)
  }
  EXPORT_SYMBOL(rdma_port_get_link_layer);

+int rdma_port_default_mgmt_flags(struct ib_device *device, u8 port_num)
+{
+    int mgmt_flags = 0;
+    enum rdma_transport_type tp =
+            rdma_node_get_transport(device->node_type);
+    enum rdma_link_layer ll =
+            rdma_port_get_link_layer(device, port_num);
+
+    if (tp == RDMA_TRANSPORT_IB) {
+        mgmt_flags |= IB_MGMT_PROTO_IB;
+        if (ll == IB_LINK_LAYER_INFINIBAND)
+            mgmt_flags |= IB_MGMT_PROTO_SMI;
+        if (ll == IB_LINK_LAYER_ETHERNET)	/* IBoE (RoCE) */
+            mgmt_flags |= IB_MGMT_PROTO_IBOE;
+    }
+
+    return mgmt_flags;
+}
+EXPORT_SYMBOL(rdma_port_default_mgmt_flags);
+
  /* Protection domains */

  struct ib_pd *ib_alloc_pd(struct ib_device *device)
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index bdf3507..04d005e 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -96,6 +96,9 @@ static int c2_query_port(struct ib_device *ibdev,
      props->active_width = 1;
      props->active_speed = IB_SPEED_SDR;

+    /* Makeup flags here, by default or on your own */
+    props->mgmt_flags = rdma_port_default_mgmt_flags(ibdev, port);
+
      return 0;
  }

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 65994a1..d19c7c9 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -90,6 +90,13 @@ enum rdma_link_layer {
      IB_LINK_LAYER_ETHERNET,
  };

+enum rdma_mgmt_flag {	/* used as a bitmask, so each value is a distinct bit */
+    IB_MGMT_PROTO_IB   = 1 << 0,
+    IB_MGMT_PROTO_SMI  = 1 << 1,
+    IB_MGMT_PROTO_IBOE = 1 << 2,
+    /* More here */
+};
+
  enum ib_device_cap_flags {
      IB_DEVICE_RESIZE_MAX_WR        = 1,
      IB_DEVICE_BAD_PKEY_CNTR        = (1<<1),
@@ -352,6 +359,7 @@ struct ib_port_attr {
      enum ib_mtu        active_mtu;
      int            gid_tbl_len;
      u32            port_cap_flags;
+    u32            mgmt_flags;
      u32            max_msg_sz;
      u32            bad_pkey_cntr;
      u32            qkey_viol_cntr;
@@ -1743,6 +1751,32 @@ int ib_query_port(struct ib_device *device,
  enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device,
                             u8 port_num);

+int rdma_port_default_mgmt_flags(struct ib_device *device, u8 port_num);
+
+static inline int rdma_mgmt_cap(struct ib_device *device, u8 port_num)
+{
+    struct ib_port_attr port_attr;
+    memset(&port_attr, 0, sizeof port_attr);
+    ib_query_port(device, port_num, &port_attr);
+    return port_attr.mgmt_flags;
+}
+
+static inline int rdma_mgmt_cap_ib(struct ib_device *device)
+{
+    u8 port_num = device->node_type == RDMA_NODE_IB_SWITCH ? 0 : 1;
+    return rdma_mgmt_cap(device, port_num) & IB_MGMT_PROTO_IB;
+}
+
+static inline int rdma_mgmt_cap_smi(struct ib_device *device, u8 port_num)
+{
+    return rdma_mgmt_cap(device, port_num) & IB_MGMT_PROTO_SMI;
+}
+
+static inline int rdma_mgmt_cap_iboe(struct ib_device *device, u8 port_num)
+{
+    return rdma_mgmt_cap(device, port_num) & IB_MGMT_PROTO_IBOE;
+}
+
  int ib_query_gid(struct ib_device *device,
           u8 port_num, int index, union ib_gid *gid);



On 03/18/2015 12:36 AM, Hefty, Sean wrote:
>> But it makes sense to me to use management specific
>> fields/attributes/flags for the *management* pieces, rather than using the
>> link and/or transport layer protocols as a proxy.  Management related code
>> should really branch based on that.
> As a proposal, we could add a new field to the kernel port attribute structure.  The field would be a bitmask of management capabilities/protocols:
>
> IB_MGMT_PROTO_SM - supports IB SMPs
> IB_MGMT_PROTO_SA - supports IB SA MADs
> IB_MGMT_PROTO_GS - supports IB GSI MADs (e.g. CM, PM, ...)
> IB_MGMT_PROTO_OPA_SM - supports OPA SMPs (or whatever they are called)
> IB_MGMT_PROTO_OPA_GS - supports OPA GS MADs (or whatever is supported)
>
> If the *GS flags are not sufficient to distinguish between MADs supported over IB and RoCE, it can be further divided (i.e. CM, PM, BM, DM, etc.).
>
> This would provide a direct mapping of which management protocols are supported for a given port, rather than it being inferred by the link/transport fields, which should really be independent.  It would also allow for simple checks by the core layer.
>
> If we want the code to be more generic, additional field(s) could be added, such as mad_size, so that any size of management datagram is supported.  This would be used instead of inferring the size based on the supported protocol.
>
> - Sean


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                                             ` <1828884A29C6694DAF28B7E6B8A8237399E818C6-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2015-03-20 13:38                                                               ` Michael Wang
@ 2015-03-20 13:48                                                               ` Michael Wang
       [not found]                                                                 ` <6A3D3202-0128-4F33-B596-D7A76AB66DF8@gmail.com>
  1 sibling, 1 reply; 84+ messages in thread
From: Michael Wang @ 2015-03-20 13:48 UTC (permalink / raw)
  To: Hefty, Sean, Weiny, Ira, Hal Rosenstock
  Cc: Doug Ledford, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi, folks

I've done a draft (a very rough draft...) according to my understanding of
Sean's proposal.

The implementation allows a device to set up the management flags during
ib_query_port() (amso1100 as an example), and later we can use the flags to
check the capability.

For a new capability/protocol like OPA, a device could set a new flag,
IB_MGMT_PROTO_OPA, in its query_port() callback, and a helper like
rdma_mgmt_cap_opa() could then be used to branch the management code.

What do you think about this?

Regards,
Michael Wang



diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d570030..ad1685e 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -375,8 +375,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
                       listen_id_priv->id.port_num) == dev_ll) {
          cma_dev = listen_id_priv->cma_dev;
          port = listen_id_priv->id.port_num;
-        if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
-            rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
+        if (rdma_mgmt_cap_iboe(cma_dev->device, port))
              ret = ib_find_cached_gid(cma_dev->device, &iboe_gid,
                           &found_port, NULL);
          else
@@ -395,8 +394,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
                  listen_id_priv->id.port_num == port)
                  continue;
              if (rdma_port_get_link_layer(cma_dev->device, port) == dev_ll) {
-                if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
-                    rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
+                if (rdma_mgmt_cap_iboe(cma_dev->device, port))
                      ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, &found_port, NULL);
                  else
                      ret = ib_find_cached_gid(cma_dev->device, &gid, &found_port, NULL);
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 74c30f4..0ae6b04 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2938,7 +2938,7 @@ static int ib_mad_port_open(struct ib_device *device,
      init_mad_qp(port_priv, &port_priv->qp_info[1]);

      cq_size = mad_sendq_size + mad_recvq_size;
-    has_smi = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND;
+    has_smi = rdma_mgmt_cap_smi(device, port_num);
      if (has_smi)
          cq_size *= 2;

@@ -3057,7 +3057,7 @@ static void ib_mad_init_device(struct ib_device *device)
  {
      int start, end, i;

-    if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
+    if (!rdma_mgmt_cap_ib(device))
          return;

      if (device->node_type == RDMA_NODE_IB_SWITCH) {
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f93eb8d..5ecf9c8 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -146,6 +146,26 @@ enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_
  }
  EXPORT_SYMBOL(rdma_port_get_link_layer);

+int rdma_port_default_mgmt_flags(struct ib_device *device, u8 port_num)
+{
+    int mgmt_flags = 0;
+    enum rdma_transport_type tp =
+            rdma_node_get_transport(device->node_type);
+    enum rdma_link_layer ll =
+            rdma_port_get_link_layer(device, port_num);
+
+    if (tp == RDMA_TRANSPORT_IB) {
+        mgmt_flags |= IB_MGMT_PROTO_IB;
+        if (ll == IB_LINK_LAYER_INFINIBAND) {
+            mgmt_flags |= IB_MGMT_PROTO_SMI;
+            mgmt_flags |= IB_MGMT_PROTO_IBOE;
+        }
+    }
+
+    return mgmt_flags;
+}
+EXPORT_SYMBOL(rdma_port_default_mgmt_flags);
+
  /* Protection domains */

  struct ib_pd *ib_alloc_pd(struct ib_device *device)
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index bdf3507..04d005e 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -96,6 +96,9 @@ static int c2_query_port(struct ib_device *ibdev,
      props->active_width = 1;
      props->active_speed = IB_SPEED_SDR;

+    /* Make up the flags here, by default or on your own */
+    props->mgmt_flags = rdma_port_default_mgmt_flags(ibdev, port);
+
      return 0;
  }

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 65994a1..d19c7c9 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -90,6 +90,13 @@ enum rdma_link_layer {
      IB_LINK_LAYER_ETHERNET,
  };

+enum rdma_mgmt_flag {
+    IB_MGMT_PROTO_IB,
+    IB_MGMT_PROTO_SMI,
+    IB_MGMT_PROTO_IBOE,
+    /* More Here*/
+};
+
  enum ib_device_cap_flags {
      IB_DEVICE_RESIZE_MAX_WR        = 1,
      IB_DEVICE_BAD_PKEY_CNTR        = (1<<1),
@@ -352,6 +359,7 @@ struct ib_port_attr {
      enum ib_mtu        active_mtu;
      int            gid_tbl_len;
      u32            port_cap_flags;
+    u32            mgmt_flags;
      u32            max_msg_sz;
      u32            bad_pkey_cntr;
      u32            qkey_viol_cntr;
@@ -1743,6 +1751,32 @@ int ib_query_port(struct ib_device *device,
  enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device,
                             u8 port_num);

+int rdma_port_default_mgmt_flags(struct ib_device *device, u8 port_num);
+
+static inline int rdma_mgmt_cap(struct ib_device *device, u8 port_num)
+{
+    struct ib_port_attr port_attr;
+    memset(&port_attr, 0, sizeof port_attr);
+    ib_query_port(device, port_num, &port_attr);
+    return port_attr.mgmt_flags;
+}
+
+static inline int rdma_mgmt_cap_ib(struct ib_device *device)
+{
+    u8 port_num = device->node_type == RDMA_NODE_IB_SWITCH ? 0 : 1;
+    return rdma_mgmt_cap(device, port_num) & IB_MGMT_PROTO_IB;
+}
+
+static inline int rdma_mgmt_cap_smi(struct ib_device *device, u8 port_num)
+{
+    return rdma_mgmt_cap(device, port_num) & IB_MGMT_PROTO_SMI;
+}
+
+static inline int rdma_mgmt_cap_iboe(struct ib_device *device, u8 port_num)
+{
+    return rdma_mgmt_cap(device, port_num) & IB_MGMT_PROTO_IBOE;
+}
+
  int ib_query_gid(struct ib_device *device,
           u8 port_num, int index, union ib_gid *gid);



On 03/18/2015 12:36 AM, Hefty, Sean wrote:
>> But it makes sense to me to use management specific
>> fields/attributes/flags for the *management* pieces, rather than using the
>> link and/or transport layer protocols as a proxy.  Management related code
>> should really branch based on that.
> As a proposal, we could add a new field to the kernel port attribute structure.  The field would be a bitmask of management capabilities/protocols:
>
> IB_MGMT_PROTO_SM - supports IB SMPs
> IB_MGMT_PROTO_SA - supports IB SA MADs
> IB_MGMT_PROTO_GS - supports IB GSI MADs (e.g. CM, PM, ...)
> IB_MGMT_PROTO_OPA_SM - supports OPA SMPs (or whatever they are called)
> IB_MGMT_PROTO_OPA_GS - supports OPA GS MADs (or whatever is supported)
>
> If the *GS flags are not sufficient to distinguish between MADs supported over IB and RoCE, it can be further divided (i.e. CM, PM, BM, DM, etc.).
>
> This would provide a direct mapping of which management protocols are supported for a given port, rather than it being inferred by the link/transport fields, which should really be independent.  It would also allow for simple checks by the core layer.
>
> If we want the code to be more generic, additional field(s) could be added, such as mad_size, so that any size of management datagram is supported.  This would be used instead of inferring the size based on the supported protocol.
>
> - Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                                                     ` <20150320235748.GA22703-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
@ 2015-03-21  0:05                                                                       ` ira.weiny
       [not found]                                                                         ` <20150321000541.GA24717-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: ira.weiny @ 2015-03-21  0:05 UTC (permalink / raw)
  To: Michael Wang
  Cc: Doug Ledford, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hefty, Sean, Hal Rosenstock

My apologies to those who are duplicated here.  This did not make it to the
mailing list due to mail configuration issues.

From ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org Fri Mar 20 18:55:29 2015

> 
> Hi, folks
> 
> I've done a draft (a very rough draft...) according to my understanding of
> Sean's proposal.
> 
> The implementation allows a device to set up the management flags during
> ib_query_port() (amso1100 as an example), and later we can use the flags to
> check the capability.
> 
> For a new capability/protocol like OPA, a device could set a new flag,
> IB_MGMT_PROTO_OPA, in its query_port() callback, and a helper like
> rdma_mgmt_cap_opa() could then be used to branch the management code.
> 
> What do you think about this?

This is not saving us anything...  See below.

> 
> Regards,
> Michael Wang
> 
> 
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index d570030..ad1685e 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -375,8 +375,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
>                        listen_id_priv->id.port_num) == dev_ll) {
>           cma_dev = listen_id_priv->cma_dev;
>           port = listen_id_priv->id.port_num;
> -        if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
> -            rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
> +        if (rdma_mgmt_cap_iboe(cma_dev->device, port))

This is still indicating a specific technology "iboe" rather than the specific
management capabilities the port has.

Also, this if statement does not seem to have anything to do with management
support.  Here the iboe_gid has a different format and needs to be processed
differently from the gid.

>               ret = ib_find_cached_gid(cma_dev->device, &iboe_gid,
>                            &found_port, NULL);
>           else
> @@ -395,8 +394,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
>                   listen_id_priv->id.port_num == port)
>                   continue;
>               if (rdma_port_get_link_layer(cma_dev->device, port) == dev_ll) {
> -                if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
> -                    rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
> +                if (rdma_mgmt_cap_iboe(cma_dev->device, port))
>                       ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, &found_port, NULL);
>                   else
>                       ret = ib_find_cached_gid(cma_dev->device, &gid, &found_port, NULL);
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index 74c30f4..0ae6b04 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -2938,7 +2938,7 @@ static int ib_mad_port_open(struct ib_device *device,
>       init_mad_qp(port_priv, &port_priv->qp_info[1]);
> 
>       cq_size = mad_sendq_size + mad_recvq_size;
> -    has_smi = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND;
> +    has_smi = rdma_mgmt_cap_smi(device, port_num);
>       if (has_smi)
>           cq_size *= 2;
> 
> @@ -3057,7 +3057,7 @@ static void ib_mad_init_device(struct ib_device *device)
>   {
>       int start, end, i;
> 
> -    if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
> +    if (!rdma_mgmt_cap_ib(device))
>           return;
> 
>       if (device->node_type == RDMA_NODE_IB_SWITCH) {
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index f93eb8d..5ecf9c8 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -146,6 +146,26 @@ enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_
>   }
>   EXPORT_SYMBOL(rdma_port_get_link_layer);
> 
> +int rdma_port_default_mgmt_flags(struct ib_device *device, u8 port_num)
> +{
> +    int mgmt_flags = 0;
> +    enum rdma_transport_type tp =
> +            rdma_node_get_transport(device->node_type);
> +    enum rdma_link_layer ll =
> +            rdma_port_get_link_layer(device, port_num);

This does not separate the management capabilities from the transport and link
layer like Sean was advocating.  This is just refactoring the current
implementation with the use of additional flags.

> +
> +    if (tp == RDMA_TRANSPORT_IB) {
> +        mgmt_flags |= IB_MGMT_PROTO_IB;
> +        if (ll == IB_LINK_LAYER_INFINIBAND) {
> +            mgmt_flags |= IB_MGMT_PROTO_SMI;
> +            mgmt_flags |= IB_MGMT_PROTO_IBOE;
> +        }
> +    }
> +
> +    return mgmt_flags;
> +}
> +EXPORT_SYMBOL(rdma_port_default_mgmt_flags);
> +
>   /* Protection domains */
> 
>   struct ib_pd *ib_alloc_pd(struct ib_device *device)
> diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
> index bdf3507..04d005e 100644
> --- a/drivers/infiniband/hw/amso1100/c2_provider.c
> +++ b/drivers/infiniband/hw/amso1100/c2_provider.c
> @@ -96,6 +96,9 @@ static int c2_query_port(struct ib_device *ibdev,
>       props->active_width = 1;
>       props->active_speed = IB_SPEED_SDR;
> 
> +    /* Make up the flags here, by default or on your own */
> +    props->mgmt_flags = rdma_port_default_mgmt_flags(ibdev, port);
> +
>       return 0;
>   }
> 
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 65994a1..d19c7c9 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -90,6 +90,13 @@ enum rdma_link_layer {
>       IB_LINK_LAYER_ETHERNET,
>   };
> 
> +enum rdma_mgmt_flag {
> +    IB_MGMT_PROTO_IB,
> +    IB_MGMT_PROTO_SMI,
> +    IB_MGMT_PROTO_IBOE,

IB and IBoE are not management protocols.

> +    /* More Here*/
> +};
> +
>   enum ib_device_cap_flags {
>       IB_DEVICE_RESIZE_MAX_WR        = 1,
>       IB_DEVICE_BAD_PKEY_CNTR        = (1<<1),
> @@ -352,6 +359,7 @@ struct ib_port_attr {
>       enum ib_mtu        active_mtu;
>       int            gid_tbl_len;
>       u32            port_cap_flags;
> +    u32            mgmt_flags;
>       u32            max_msg_sz;
>       u32            bad_pkey_cntr;
>       u32            qkey_viol_cntr;
> @@ -1743,6 +1751,32 @@ int ib_query_port(struct ib_device *device,
>   enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device,
>                              u8 port_num);
> 
> +int rdma_port_default_mgmt_flags(struct ib_device *device, u8 port_num);

This should return u32.

I think I would rather see your other patch go in to clean up the code a bit
and work this issue separately.

Ira

> +
> +static inline int rdma_mgmt_cap(struct ib_device *device, u8 port_num)
> +{
> +    struct ib_port_attr port_attr;
> +    memset(&port_attr, 0, sizeof port_attr);
> +    ib_query_port(device, port_num, &port_attr);
> +    return port_attr.mgmt_flags;
> +}
> +
> +static inline int rdma_mgmt_cap_ib(struct ib_device *device)
> +{
> +    u8 port_num = device->node_type == RDMA_NODE_IB_SWITCH ? 0 : 1;
> +    return rdma_mgmt_cap(device, port_num) & IB_MGMT_PROTO_IB;
> +}
> +
> +static inline int rdma_mgmt_cap_smi(struct ib_device *device, u8 port_num)
> +{
> +    return rdma_mgmt_cap(device, port_num) & IB_MGMT_PROTO_SMI;
> +}
> +
> +static inline int rdma_mgmt_cap_iboe(struct ib_device *device, u8 port_num)
> +{
> +    return rdma_mgmt_cap(device, port_num) & IB_MGMT_PROTO_IBOE;
> +}
> +
>   int ib_query_gid(struct ib_device *device,
>            u8 port_num, int index, union ib_gid *gid);
> 
> 
> 
> On 03/18/2015 12:36 AM, Hefty, Sean wrote:
> >> But it makes sense to me to use management specific
> >> fields/attributes/flags for the *management* pieces, rather than using the
> >> link and/or transport layer protocols as a proxy.  Management related code
> >> should really branch based on that.
> > As a proposal, we could add a new field to the kernel port attribute structure.  The field would be a bitmask of management capabilities/protocols:
> >
> > IB_MGMT_PROTO_SM - supports IB SMPs
> > IB_MGMT_PROTO_SA - supports IB SA MADs
> > IB_MGMT_PROTO_GS - supports IB GSI MADs (e.g. CM, PM, ...)
> > IB_MGMT_PROTO_OPA_SM - supports OPA SMPs (or whatever they are called)
> > IB_MGMT_PROTO_OPA_GS - supports OPA GS MADs (or whatever is supported)
> >
> > If the *GS flags are not sufficient to distinguish between MADs supported over IB and RoCE, it can be further divided (i.e. CM, PM, BM, DM, etc.).
> >
> > This would provide a direct mapping of which management protocols are supported for a given port, rather than it being inferred by the link/transport fields, which should really be independent.  It would also allow for simple checks by the core layer.
> >
> > If we want the code to be more generic, additional field(s) could be added, such as mad_size, so that any size of management datagram is supported.  This would be used instead of inferring the size based on the supported protocol.
> >
> > - Sean
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 




--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                                                         ` <20150321000541.GA24717-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
@ 2015-03-21  7:49                                                                           ` Yun Wang
       [not found]                                                                             ` <CAJuTgQUsZ34F-dKpsmW+5=axDWb93pA43LZ-qKbEjqyu-RUUmg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Yun Wang @ 2015-03-21  7:49 UTC (permalink / raw)
  To: ira.weiny
  Cc: Doug Ledford, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hefty, Sean, Hal Rosenstock

Hello, Ira

Thanks for the reply :-)

On Sat, Mar 21, 2015 at 1:05 AM, ira.weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> My apologies to those who are duplicated here.  This did not make it to the
> mailing list due to mail configuration issues.
>
> From ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org Fri Mar 20 18:55:29 2015
>
[snip]
>> -        if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
>> -            rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
>> +        if (rdma_mgmt_cap_iboe(cma_dev->device, port))
>
> This is still indicating a specific technology "iboe" rather than the specific
> management capabilities the port has.
>
> Also this if statement does not seem to have anything to do with the management
> support.  Here the iboe_gid is a different format and needs to be processed
> differently from the gid.

Agreed, the mgmt cap here is closer to the concept of a 'feature' than a
'protocol'.

The purpose is to help branch the management code: when a vendor declares
that its device supports the iboe format, the MAD layer can get the hint by
checking the port attribute and then branch the management path, which is
currently inferred from the transport type and link layer type rather than
told to us by the vendor.

>
>>               ret = ib_find_cached_gid(cma_dev->device, &iboe_gid,
>>                            &found_port, NULL);
>>           else
[snip]
>> +    enum rdma_transport_type tp =
>> +            rdma_node_get_transport(device->node_type);
>> +    enum rdma_link_layer ll =
>> +            rdma_port_get_link_layer(device, port_num);
>
> This does not separate the management capabilities from the transport and link
> layer like Sean was advocating.  This is just refactoring the current
> implementation with the use of additional flags.

Yes, it currently copies the logic we use to branch the management code;
however, if the vendor could provide its own setup code here, we would no
longer need to infer anything but could follow the vendor's indicator.
This 'default' method is for the transition.

Sean, could you please give more hints or details on what you'd rather
have?  I think I misunderstood your suggestion...

>
>> +
[snip]
>>
>> +enum rdma_mgmt_flag {
>> +    IB_MGMT_PROTO_IB,
>> +    IB_MGMT_PROTO_SMI,
>> +    IB_MGMT_PROTO_IBOE,
>
> IB and IBoE are not management protocols.

Maybe it should be 'feature' rather than 'proto'?

We have a management branch for iboe although it's not really a protocol
(or does it belong to any protocol in particular?).  Also, we have too many
places that check whether a device has the InfiniBand feature, especially
at the beginning of initialization; I really want some way to simplify
these checks, or to make them more of a formal mechanism.

>
[snip]
>>
>> +int rdma_port_default_mgmt_flags(struct ib_device *device, u8 port_num);
>
> This should return u32.

Yeah... that's why I call it a draft ;-)

Actually the flags are not even defined as bits but as plain integers :-P
I was just trying to introduce the idea to see if it's close to Sean's
proposal.

>
> I think I would rather see your other patch go in to clean up the code a bit
> and work this issue separately.

Agreed, it doesn't seem like simple work.  I too would rather clean up the
inference code first and consolidate it into some public helpers, which can
later be adapted to a new mechanism.

Sean, what's your opinion?

Regards,
Michael Wang

>
> Ira
>
>> +
>> +static inline int rdma_mgmt_cap(struct ib_device *device, u8 port_num)
>> +{
>> +    struct ib_port_attr port_attr;
>> +    memset(&port_attr, 0, sizeof port_attr);
>> +    ib_query_port(device, port_num, &port_attr);
>> +    return port_attr.mgmt_flags;
>> +}
>> +
>> +static inline int rdma_mgmt_cap_ib(struct ib_device *device)
>> +{
>> +    u8 port_num = device->node_type == RDMA_NODE_IB_SWITCH ? 0 : 1;
>> +    return rdma_mgmt_cap(device, port_num) & IB_MGMT_PROTO_IB;
>> +}
>> +
>> +static inline int rdma_mgmt_cap_smi(struct ib_device *device, u8 port_num)
>> +{
>> +    return rdma_mgmt_cap(device, port_num) & IB_MGMT_PROTO_SMI;
>> +}
>> +
>> +static inline int rdma_mgmt_cap_iboe(struct ib_device *device, u8 port_num)
>> +{
>> +    return rdma_mgmt_cap(device, port_num) & IB_MGMT_PROTO_IBOE;
>> +}
>> +
>>   int ib_query_gid(struct ib_device *device,
>>            u8 port_num, int index, union ib_gid *gid);
>>
>>
>>
>> On 03/18/2015 12:36 AM, Hefty, Sean wrote:
>> >> But it makes sense to me to use management specific
>> >> fields/attributes/flags for the *management* pieces, rather than using the
>> >> link and/or transport layer protocols as a proxy.  Management related code
>> >> should really branch based on that.
>> > As a proposal, we could add a new field to the kernel port attribute structure.  The field would be a bitmask of management capabilities/protocols:
>> >
>> > IB_MGMT_PROTO_SM - supports IB SMPs
>> > IB_MGMT_PROTO_SA - supports IB SA MADs
>> > IB_MGMT_PROTO_GS - supports IB GSI MADs (e.g. CM, PM, ...)
>> > IB_MGMT_PROTO_OPA_SM - supports OPA SMPs (or whatever they are called)
>> > IB_MGMT_PROTO_OPA_GS - supports OPA GS MADs (or whatever is supported)
>> >
>> > If the *GS flags are not sufficient to distinguish between MADs supported over IB and RoCE, it can be further divided (i.e. CM, PM, BM, DM, etc.).
>> >
>> > This would provide a direct mapping of which management protocols are supported for a given port, rather than it being inferred by the link/transport fields, which should really be independent.  It would also allow for simple checks by the core layer.
>> >
>> > If we want the code to be more generic, additional field(s) could be added, such as mad_size, so that any size of management datagram is supported.  This would be used instead of inferring the size based on the supported protocol.
>> >
>> > - Sean
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                                                             ` <CAJuTgQUsZ34F-dKpsmW+5=axDWb93pA43LZ-qKbEjqyu-RUUmg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-03-23 16:31                                                                               ` Hefty, Sean
       [not found]                                                                                 ` <1828884A29C6694DAF28B7E6B8A8237399E82E58-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hefty, Sean @ 2015-03-23 16:31 UTC (permalink / raw)
  To: Yun Wang, Weiny, Ira
  Cc: Doug Ledford, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hal Rosenstock

> Yes, it currently copies the logic we use to branch the management code;
> however, if the vendor could provide its own setup code here, we would no
> longer need to infer anything but could follow the vendor's indicator.
> This 'default' method is for the transition.
> 
> Sean, could you please give more hints or details on what you'd rather
> have?  I think I misunderstood your suggestion...

To restate my suggestion, I was thinking of defining something like this:

enum {
	IB_MGMT_PROTO_SM = (1 << 0),   /* supports IB SMPs */
	IB_MGMT_PROTO_SA = (1 << 1),   /* supports IB SA MADs */
	IB_MGMT_PROTO_GS = (1 << 2),   /* supports IB GSI MADs (e.g. PM, ...) */
	IB_MGMT_PROTO_CM = (1 << 3),   /* IB CM called out separately */
	IB_MGMT_PROTO_IW_CM = (1 << 4),/* iWarp CM */
	/* OPA can define new values here */
};

struct ib_port_attr {
	...
	u32	mgmt_proto;  /* bitmask of supported protocols */
};

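As an illustration, a check in the core layer would then reduce to a simple
bit test.  Sketch only (setup_smi() is a made-up consumer; only mgmt_proto
is new here):

	struct ib_port_attr attr;

	/* driver fills attr.mgmt_proto in its query_port() callback */
	if (!ib_query_port(device, port_num, &attr) &&
	    (attr.mgmt_proto & IB_MGMT_PROTO_SM))
		setup_smi(device, port_num);	/* hypothetical consumer */
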
I am not familiar enough with RoCE (IBoE) to know off the top of my head if this breakdown works as I defined it, or if IB_MGMT_PROTO_GS needs to be separated into more mgmt classes.  (Hal or Ira might.)  I separated out the CM class, as the rdma cm has checks where it wants to distinguish between which CM protocol to execute (IB or iWarp).

This change would be limited to management checks only.  There may still be places in the code where the link and transport checks would continue to exist.  Again, this is just a suggestion.  Without actually implementing the patch, I don't know if this would simplify things.  The checks in the rdma cm, in particular, are messy.

- Sean

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                                                                 ` <1828884A29C6694DAF28B7E6B8A8237399E82E58-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-03-24 12:49                                                                                   ` Michael Wang
       [not found]                                                                                     ` <55115D61.9040201-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Michael Wang @ 2015-03-24 12:49 UTC (permalink / raw)
  To: Hefty, Sean, Weiny, Ira
  Cc: Doug Ledford, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hal Rosenstock

On 03/23/2015 05:31 PM, Hefty, Sean wrote:
>> [snip]
> To restate my suggestion, I was thinking of defining something like this:
>
> enum {
> 	IB_MGMT_PROTO_SM = (1 << 0),   /* supports IB SMPs */
> 	IB_MGMT_PROTO_SA = (1 << 1),   /* supports IB SA MADs */
> 	IB_MGMT_PROTO_GS = (1 << 2),   /* supports IB GSI MADs (e.g. PM, ...) */
> 	IB_MGMT_PROTO_CM = (1 << 3),   /* IB CM called out separately */
> 	IB_MGMT_PROTO_IW_CM = (1 << 4),/* iWarp CM */
> 	/* OPA can define new values here */
> };
>
> struct ib_port_attr {
> 	...
> 	u32	mgmt_proto;  /* bitmask of supported protocols */
> };

Thanks for the restatement, Sean :) It seems your proposal is also to have
the vendor set up 'mgmt_proto' during ib_query_port(), correct?

>
> I am not familiar enough with RoCE (IBoE) to know off the top of my head if this breakdown works as I defined it, or if IB_MGMT_PROTO_GS needs to be separated into more mgmt classes.  (Hal or Ira might.)  I separated out the CM class, as the rdma cm has checks where it wants to distinguish between which CM protocol to execute (IB or iWarp).

Maybe we can apply this idea to the CM part first?
>
> This change would be limited to management checks only.  There may still be places in the code where the link and transport checks would continue to exist.  Again, this is just a suggestion.  Without actually implementing the patch, I don't know if this would simplify things.  The checks in the rdma cm, in particular, are messy.
I think it's time to make a formal patch now and discuss the problem in
a separate thread.  I'll still use the mechanism from the draft and apply
these flags; let's see if it satisfies people ;-)

Regards,
Michael Wang

> - Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                                                                     ` <55115D61.9040201-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
@ 2015-03-25 10:30                                                                                       ` Michael Wang
       [not found]                                                                                         ` <55128E48.1080406-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Michael Wang @ 2015-03-25 10:30 UTC (permalink / raw)
  To: Hefty, Sean, Weiny, Ira
  Cc: Doug Ledford, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hal Rosenstock

Hi, Sean

On 03/24/2015 01:49 PM, Michael Wang wrote:
> On 03/23/2015 05:31 PM, Hefty, Sean wrote:
>>> [snip]
>> To restate my suggestion, I was thinking of defining something like this:
>>
>> enum {
>>     IB_MGMT_PROTO_SM = (1 << 0),   /* supports IB SMPs */
>>     IB_MGMT_PROTO_SA = (1 << 1),   /* supports IB SA MADs */
>>     IB_MGMT_PROTO_GS = (1 << 2),   /* supports IB GSI MADs (e.g. PM, ...) */
>>     IB_MGMT_PROTO_CM = (1 << 3),   /* IB CM called out separately */
>>     IB_MGMT_PROTO_IW_CM = (1 << 4),/* iWarp CM */
>>     /* OPA can define new values here */
>> };
>>
>> struct ib_port_attr {
>>     ...
>>     u32    mgmt_proto;  /* bitmask of supported protocols */
>> };
>
> Thanks for the restatement, Sean :) It seems your proposal is also to have
> the vendor set up 'mgmt_proto' during ib_query_port(), correct?

I think we've got one problem here: if we rely on ib_query_port() to set up
the mgmt flags each time, performance may suffer, since some implementations
of query_port() are really expensive, e.g. hardware communication (mlx4/5)
and mutex protection (usnic)...

I also found that the current implementation doesn't match the idea very
well.  For example, for CM I haven't found any place that checks whether a
specific port supports CM or not (correct me please); mostly we only check
the device rather than its port.  SM does check the port, but since we
already verified the transport layer at the beginning, just checking the
link layer doesn't sound that bad...

Thus I think the flags may not be very helpful to the current
implementation; maybe using some helpers to refine the code is more
applicable?

I'll send out the patch later with just the helpers; we can discuss in that
thread and see if there are any better solutions ;-)

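To give an idea of the shape I mean, a rough sketch (cap_ib_smi() is a
made-up name, and it still infers from the link layer for now):

	/* one helper instead of open-coded transport/link-layer checks
	 * scattered across the core
	 */
	static inline bool cap_ib_smi(struct ib_device *device, u8 port_num)
	{
		return rdma_port_get_link_layer(device, port_num) ==
		       IB_LINK_LAYER_INFINIBAND;
	}
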
Regards,
Michael Wang

>
>>
>> I am not familiar enough with RoCE (IBoE) to know off the top of my 
>> head if this breakdown works as I defined it, or if IB_MGMT_PROTO_GS 
>> needs to be separated into more mgmt classes. (Hal or Ira might.)  I 
>> separated out the CM class, as the rdma cm has checks where it wants 
>> to distinguish between which CM protocol to execute (IB or iWarp).
>
> Maybe we can apply this idea to the CM part first?
>>
>> This change would be limited to management checks only.  There may 
>> still be places in the code where the link and transport checks would 
>> continue to exist.  Again, this is just a suggestion.  Without 
>> actually implementing the patch, I don't know if this would simplify 
>> things.  The checks in the rdma cm, in particular, are messy.
> I think it's time to make a formal patch now and discuss the problem in
> a separate thread.  I'll still use the mechanism from the draft and apply
> these flags; let's see if it satisfies people ;-)
>
> Regards,
> Michael Wang
>
>> - Sean
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag
       [not found]                                                                                         ` <55128E48.1080406-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
@ 2015-04-02 22:45                                                                                           ` ira.weiny
  0 siblings, 0 replies; 84+ messages in thread
From: ira.weiny @ 2015-04-02 22:45 UTC (permalink / raw)
  To: Michael Wang
  Cc: Hefty, Sean, Doug Ledford, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hal Rosenstock

On Wed, Mar 25, 2015 at 11:30:32AM +0100, Michael Wang wrote:
> Hi, Sean
> 
> On 03/24/2015 01:49 PM, Michael Wang wrote:
> >On 03/23/2015 05:31 PM, Hefty, Sean wrote:
> >>>[snip]
> >>To restate my suggestion, I was thinking of defining something like this:
> >>
> >>enum {
> >>    IB_MGMT_PROTO_SM = (1 << 0),   /* supports IB SMPs */
> >>    IB_MGMT_PROTO_SA = (1 << 1),   /* supports IB SA MADs */
> >>    IB_MGMT_PROTO_GS = (1 << 2),   /* supports IB GSI MADs (e.g. PM, ...) */
> >>    IB_MGMT_PROTO_CM = (1 << 3),   /* IB CM called out separately */
> >>    IB_MGMT_PROTO_IW_CM = (1 << 4),/* iWarp CM */
> >>    /* OPA can define new values here */
> >>};
> >>
> >>struct ib_port_attr {
> >>    ...
> >>    u32    mgmt_proto;  /* bitmask of supported protocols */
> >>};
> >
> >Thanks for the restatement, Sean :) It seems your proposal is also to have
> >the vendor set up 'mgmt_proto' during ib_query_port(), correct?
> 
> I think we've got one problem here: if we rely on ib_query_port() to set up
> the mgmt flags each time, performance may suffer, since some implementations
> of query_port() are really expensive, e.g. hardware communication (mlx4/5)
> and mutex protection (usnic)...

This is why my OPA patch series included a cache of the device attributes.

https://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg22827.html

This was suggested by Or and incorporated rather than having the MAD stack
issue ib_query_device.  The same could be done for port attributes.

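For example (sketch only; cached_port_attrs is a hypothetical per-port array
filled once at device registration, mirroring cached_dev_attrs):

	static inline u32 rdma_port_mgmt_flags(struct ib_device *device,
					       u8 port_num)
	{
		/* no query_port() round trip on the hot path */
		return device->cached_port_attrs[port_num - 1].mgmt_flags;
	}
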
That said, I would like to see the next version of your patch series.  We
have had a lot of churn on that thread, and I think you have a lot of things
you were going to incorporate.  :-D

Thanks!
Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 02/19] IB/core: Cache device attributes for use by upper level drivers
       [not found]     ` <1423092585-26692-3-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-04-03 20:43       ` Hefty, Sean
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD574-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hefty, Sean @ 2015-04-03 20:43 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Weiny, Ira

> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 0d74f1d..0116e4b 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1675,6 +1675,7 @@ struct ib_device {
>  	u32			     local_dma_lkey;
>  	u8                           node_type;
>  	u8                           phys_port_cnt;
> +	struct ib_device_attr        cached_dev_attrs;
>  };

Looking at the device attributes, I think all of the values are static for a given device.  If this is indeed the case, then I would just remove the word 'cached' from the field name.  Cached makes me think of the values dynamically changing, and if that's the case, then this patch isn't sufficient.

Alternatively, if there are only a few values that ULPs need, maybe just store those.
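
For instance (sketch only; the field placement is hypothetical):

	struct ib_device {
		...
		size_t	max_mad_size;	/* the one value the MAD stack needs */
	};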
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 03/19] IB/mad: Change validate_mad signature to take ib_mad_hdr rather than ib_mad
       [not found]     ` <1423092585-26692-4-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-04-03 21:20       ` Hefty, Sean
  0 siblings, 0 replies; 84+ messages in thread
From: Hefty, Sean @ 2015-04-03 21:20 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Weiny, Ira

Reviewed-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 04/19] IB/mad: Change ib_response_mad signature to take ib_mad_hdr rather than ib_mad
       [not found]     ` <1423092585-26692-5-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-04-03 21:20       ` Hefty, Sean
  0 siblings, 0 replies; 84+ messages in thread
From: Hefty, Sean @ 2015-04-03 21:20 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Weiny, Ira

Reviewed-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 08/19] IB/mad: Add helper function for smi_handle_dr_smp_send
       [not found]     ` <1423092585-26692-9-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-04-03 21:28       ` Hefty, Sean
  0 siblings, 0 replies; 84+ messages in thread
From: Hefty, Sean @ 2015-04-03 21:28 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Weiny, Ira

Reviewed-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 12/19] IB/mad: Add MAD size parameters to process_mad
       [not found]     ` <1423092585-26692-13-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-04-03 22:40       ` Hefty, Sean
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD692-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hefty, Sean @ 2015-04-03 22:40 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Weiny, Ira

> -static struct ib_mad_private *alloc_mad_priv(struct ib_device *dev)
> +static struct ib_mad_private *alloc_mad_priv(struct ib_device *dev,
> +					     size_t *mad_size)
>  {
> +	*mad_size = dev->cached_dev_attrs.max_mad_size;

Why does this function return a value that the caller can just read from the device?

Actually, it's odd for an alloc() call to return how much it allocated, rather than taking that as input.

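I.e., something closer to this shape (sketch only, same function name):

	static struct ib_mad_private *alloc_mad_priv(struct ib_device *dev,
						     size_t mad_size)
	{
		/* caller passes e.g. dev->cached_dev_attrs.max_mad_size */
		return kmalloc(sizeof(struct ib_mad_private_header) +
			       sizeof(struct ib_grh) + mad_size, GFP_ATOMIC);
	}
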
>  	return (kmalloc(sizeof(struct ib_mad_private_header) +
> -			sizeof(struct ib_grh) +
> -			dev->cached_dev_attrs.max_mad_size, GFP_ATOMIC));
> +			sizeof(struct ib_grh) + *mad_size, GFP_ATOMIC));
>  }
> 
>  /*
> @@ -741,6 +742,8 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>  	u8 port_num;
>  	struct ib_wc mad_wc;
>  	struct ib_send_wr *send_wr = &mad_send_wr->send_wr;
> +	size_t in_mad_size = mad_agent_priv->agent.device->cached_dev_attrs.max_mad_size;
> +	size_t out_mad_size;
> 
>  	if (device->node_type == RDMA_NODE_IB_SWITCH &&
>  	    smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
> @@ -777,7 +780,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>  	local->mad_priv = NULL;
>  	local->recv_mad_agent = NULL;
> 
> -	mad_priv = alloc_mad_priv(mad_agent_priv->agent.device);
> +	mad_priv = alloc_mad_priv(mad_agent_priv->agent.device, &out_mad_size);
>  	if (!mad_priv) {
>  		ret = -ENOMEM;
>  		dev_err(&device->dev, "No memory for local response MAD\n");
> @@ -792,8 +795,9 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
> 
>  	/* No GRH for DR SMP */
>  	ret = device->process_mad(device, 0, port_num, &mad_wc, NULL,
> -				  (struct ib_mad *)smp,
> -				  (struct ib_mad *)&mad_priv->mad);
> +				  (struct ib_mad_hdr *)smp, in_mad_size,
> +				  (struct ib_mad_hdr *)&mad_priv->mad,
> +				  &out_mad_size);

Rather than calling device->process_mad() directly, would it be better to call a common function?  So we can avoid adding:

> +	struct ib_mad *in_mad = (struct ib_mad *)in;
> +	struct ib_mad *out_mad = (struct ib_mad *)out;
> +
> +	if (in_mad_size != sizeof(*in_mad) || *out_mad_size != sizeof(*out_mad))
> +		return IB_MAD_RESULT_FAILURE;

to existing drivers?

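Something along these lines is what I mean by a common function (sketch;
call_process_mad() is a made-up name and the size check shown is only
illustrative):

	static int call_process_mad(struct ib_device *device, int flags,
				    u8 port_num, struct ib_wc *wc,
				    struct ib_grh *grh,
				    struct ib_mad_hdr *in, size_t in_size,
				    struct ib_mad_hdr *out, size_t *out_size)
	{
		/* drivers that only handle IBTA MADs get their size
		 * validation done once, here in the core
		 */
		if (device->cached_dev_attrs.max_mad_size == sizeof(struct ib_mad) &&
		    (in_size != sizeof(struct ib_mad) ||
		     *out_size != sizeof(struct ib_mad)))
			return IB_MAD_RESULT_FAILURE;

		return device->process_mad(device, flags, port_num, wc, grh,
					   in, in_size, out, out_size);
	}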

>  	switch (ret)
>  	{
>  	case IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY:
> @@ -2011,6 +2015,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
>  	struct ib_mad_agent_private *mad_agent;
>  	int port_num;
>  	int ret = IB_MAD_RESULT_SUCCESS;
> +	size_t resp_mad_size;
> 
>  	mad_list = (struct ib_mad_list_head *)(unsigned long)wc->wr_id;
>  	qp_info = mad_list->mad_queue->qp_info;
> @@ -2038,7 +2043,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
>  	if (!validate_mad(&recv->mad.mad.mad_hdr, qp_info->qp->qp_num))
>  		goto out;
> 
> -	response = alloc_mad_priv(port_priv->device);
> +	response = alloc_mad_priv(port_priv->device, &resp_mad_size);
>  	if (!response) {
>  		dev_err(&port_priv->device->dev,
>  			"ib_mad_recv_done_handler no memory for response
> buffer\n");
> @@ -2063,8 +2068,10 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
>  		ret = port_priv->device->process_mad(port_priv->device, 0,
>  						     port_priv->port_num,
>  						     wc, &recv->grh,
> -						     &recv->mad.mad,
> -						     &response->mad.mad);
> +						     (struct ib_mad_hdr *)&recv->mad.mad,
> +						     port_priv->device->cached_dev_attrs.max_mad_size,

This is the size of the allocated buffer.  Something based on wc.byte_len seems like a better option.

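E.g. (sketch; assumes wc->byte_len includes the GRH, as it does for
received UD MADs):

	size_t in_mad_size = wc->byte_len - sizeof(struct ib_grh);
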

> +						     (struct ib_mad_hdr *)&response->mad.mad,
> +						     &resp_mad_size);
>  		if (ret & IB_MAD_RESULT_SUCCESS) {
>  			if (ret & IB_MAD_RESULT_CONSUMED)
>  				goto out;
> @@ -2687,7 +2694,10 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
>  			mad_priv = mad;
>  			mad = NULL;
>  		} else {
> -			mad_priv = alloc_mad_priv(qp_info->port_priv->device);
> +			size_t mad_size;
> +
> +			mad_priv = alloc_mad_priv(qp_info->port_priv->device,
> +						  &mad_size);
>  			if (!mad_priv) {
>  				dev_err(&qp_info->port_priv->device->dev,
>  					"No memory for receive buffer\n");
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v4 15/19] IB/mad: Create jumbo_mad data structures
       [not found]     ` <1423092585-26692-16-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-04-03 23:08       ` Hefty, Sean
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD6C8-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hefty, Sean @ 2015-04-03 23:08 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Weiny, Ira

> Define jumbo_mad and jumbo_rmpp_mad.

I would just use 'opa_mad' in place of 'jumbo_mad'.  Jumbo sounds like a marketing term or an elephant's name.
 
> Jumbo MAD structures are 2K versions of ib_mad and ib_rmpp_mad structures.
> Currently only OPA base version MADs are of this type.
> 
> Create an RMPP Base header to share between ib_rmpp_mad and jumbo_rmpp_mad
> 
> Update existing code to use the new structures.
> 
> Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> 
> ---
>  drivers/infiniband/core/mad.c      |  18 +++---
>  drivers/infiniband/core/mad_priv.h |   2 +
>  drivers/infiniband/core/mad_rmpp.c | 120 ++++++++++++++++++-------------------
>  drivers/infiniband/core/user_mad.c |  16 ++---
>  include/rdma/ib_mad.h              |  26 +++++++-
>  5 files changed, 103 insertions(+), 79 deletions(-)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index 2145294..316b4b2 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -883,7 +883,7 @@ static int alloc_send_rmpp_list(struct ib_mad_send_wr_private *send_wr,
>  				gfp_t gfp_mask)
>  {
>  	struct ib_mad_send_buf *send_buf = &send_wr->send_buf;
> -	struct ib_rmpp_mad *rmpp_mad = send_buf->mad;
> +	struct ib_rmpp_base *rmpp_base = send_buf->mad;
>  	struct ib_rmpp_segment *seg = NULL;
>  	int left, seg_size, pad;
> 
> @@ -909,10 +909,10 @@ static int alloc_send_rmpp_list(struct ib_mad_send_wr_private *send_wr,
>  	if (pad)
>  		memset(seg->data + seg_size - pad, 0, pad);
> 
> -	rmpp_mad->rmpp_hdr.rmpp_version = send_wr->mad_agent_priv->
> +	rmpp_base->rmpp_hdr.rmpp_version = send_wr->mad_agent_priv->
>  					  agent.rmpp_version;
> -	rmpp_mad->rmpp_hdr.rmpp_type = IB_MGMT_RMPP_TYPE_DATA;
> -	ib_set_rmpp_flags(&rmpp_mad->rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
> +	rmpp_base->rmpp_hdr.rmpp_type = IB_MGMT_RMPP_TYPE_DATA;
> +	ib_set_rmpp_flags(&rmpp_base->rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
> 
>  	send_wr->cur_seg = container_of(send_wr->rmpp_list.next,
>  					struct ib_rmpp_segment, list);
> @@ -1748,14 +1748,14 @@ out:
>  static int is_rmpp_data_mad(struct ib_mad_agent_private *mad_agent_priv,
>  		       struct ib_mad_hdr *mad_hdr)
>  {
> -	struct ib_rmpp_mad *rmpp_mad;
> +	struct ib_rmpp_base *rmpp_base;
> 
> -	rmpp_mad = (struct ib_rmpp_mad *)mad_hdr;
> +	rmpp_base = (struct ib_rmpp_base *)mad_hdr;
>  	return !mad_agent_priv->agent.rmpp_version ||
>  		!ib_mad_kernel_rmpp_agent(&mad_agent_priv->agent) ||
> -		!(ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) &
> +		!(ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) &
>  				    IB_MGMT_RMPP_FLAG_ACTIVE) ||
> -		(rmpp_mad->rmpp_hdr.rmpp_type == IB_MGMT_RMPP_TYPE_DATA);
> +		(rmpp_base->rmpp_hdr.rmpp_type == IB_MGMT_RMPP_TYPE_DATA);
>  }
> 
>  static inline int rcv_has_same_class(struct ib_mad_send_wr_private *wr,
> @@ -1897,7 +1897,7 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
>  			spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
>  			if (!ib_mad_kernel_rmpp_agent(&mad_agent_priv->agent)
>  			   && ib_is_mad_class_rmpp(mad_recv_wc->recv_buf.mad->mad_hdr.mgmt_class)
> -			   && (ib_get_rmpp_flags(&((struct ib_rmpp_mad *)mad_recv_wc->recv_buf.mad)->rmpp_hdr)
> +			   && (ib_get_rmpp_flags(&((struct ib_rmpp_base *)mad_recv_wc->recv_buf.mad)->rmpp_hdr)
>  					& IB_MGMT_RMPP_FLAG_ACTIVE)) {
>  				/* user rmpp is in effect
>  				 * and this is an active RMPP MAD
> diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
> index d1a0b0e..d71ddcc 100644
> --- a/drivers/infiniband/core/mad_priv.h
> +++ b/drivers/infiniband/core/mad_priv.h
> @@ -80,6 +80,8 @@ struct ib_mad_private {
>  		struct ib_mad mad;
>  		struct ib_rmpp_mad rmpp_mad;
>  		struct ib_smp smp;
> +		struct jumbo_mad jumbo_mad;
> +		struct jumbo_rmpp_mad jumbo_rmpp_mad;
>  	} mad;
>  } __attribute__ ((packed));
> 
> diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
> index 2379e2d..7184530 100644
> --- a/drivers/infiniband/core/mad_rmpp.c
> +++ b/drivers/infiniband/core/mad_rmpp.c
> @@ -111,10 +111,10 @@ void ib_cancel_rmpp_recvs(struct ib_mad_agent_private *agent)
>  }
> 
>  static void format_ack(struct ib_mad_send_buf *msg,
> -		       struct ib_rmpp_mad *data,
> +		       struct ib_rmpp_base *data,
>  		       struct mad_rmpp_recv *rmpp_recv)
>  {
> -	struct ib_rmpp_mad *ack = msg->mad;
> +	struct ib_rmpp_base *ack = msg->mad;
>  	unsigned long flags;
> 
>  	memcpy(ack, &data->mad_hdr, msg->hdr_len);
> @@ -144,7 +144,7 @@ static void ack_recv(struct mad_rmpp_recv *rmpp_recv,
>  	if (IS_ERR(msg))
>  		return;
> 
> -	format_ack(msg, (struct ib_rmpp_mad *) recv_wc->recv_buf.mad, rmpp_recv);
> +	format_ack(msg, (struct ib_rmpp_base *) recv_wc->recv_buf.mad, rmpp_recv);
>  	msg->ah = rmpp_recv->ah;
>  	ret = ib_post_send_mad(msg, NULL);
>  	if (ret)
> @@ -182,20 +182,20 @@ static void ack_ds_ack(struct ib_mad_agent_private *agent,
>  		       struct ib_mad_recv_wc *recv_wc)
>  {
>  	struct ib_mad_send_buf *msg;
> -	struct ib_rmpp_mad *rmpp_mad;
> +	struct ib_rmpp_base *rmpp_base;
>  	int ret;
> 
>  	msg = alloc_response_msg(&agent->agent, recv_wc);
>  	if (IS_ERR(msg))
>  		return;
> 
> -	rmpp_mad = msg->mad;
> -	memcpy(rmpp_mad, recv_wc->recv_buf.mad, msg->hdr_len);
> +	rmpp_base = msg->mad;
> +	memcpy(rmpp_base, recv_wc->recv_buf.mad, msg->hdr_len);
> 
> -	rmpp_mad->mad_hdr.method ^= IB_MGMT_METHOD_RESP;
> -	ib_set_rmpp_flags(&rmpp_mad->rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
> -	rmpp_mad->rmpp_hdr.seg_num = 0;
> -	rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(1);
> +	rmpp_base->mad_hdr.method ^= IB_MGMT_METHOD_RESP;
> +	ib_set_rmpp_flags(&rmpp_base->rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
> +	rmpp_base->rmpp_hdr.seg_num = 0;
> +	rmpp_base->rmpp_hdr.paylen_newwin = cpu_to_be32(1);
> 
>  	ret = ib_post_send_mad(msg, NULL);
>  	if (ret) {
> @@ -215,23 +215,23 @@ static void nack_recv(struct ib_mad_agent_private *agent,
>  		      struct ib_mad_recv_wc *recv_wc, u8 rmpp_status)
>  {
>  	struct ib_mad_send_buf *msg;
> -	struct ib_rmpp_mad *rmpp_mad;
> +	struct ib_rmpp_base *rmpp_base;
>  	int ret;
> 
>  	msg = alloc_response_msg(&agent->agent, recv_wc);
>  	if (IS_ERR(msg))
>  		return;
> 
> -	rmpp_mad = msg->mad;
> -	memcpy(rmpp_mad, recv_wc->recv_buf.mad, msg->hdr_len);
> +	rmpp_base = msg->mad;
> +	memcpy(rmpp_base, recv_wc->recv_buf.mad, msg->hdr_len);
> 
> -	rmpp_mad->mad_hdr.method ^= IB_MGMT_METHOD_RESP;
> -	rmpp_mad->rmpp_hdr.rmpp_version = IB_MGMT_RMPP_VERSION;
> -	rmpp_mad->rmpp_hdr.rmpp_type = IB_MGMT_RMPP_TYPE_ABORT;
> -	ib_set_rmpp_flags(&rmpp_mad->rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
> -	rmpp_mad->rmpp_hdr.rmpp_status = rmpp_status;
> -	rmpp_mad->rmpp_hdr.seg_num = 0;
> -	rmpp_mad->rmpp_hdr.paylen_newwin = 0;
> +	rmpp_base->mad_hdr.method ^= IB_MGMT_METHOD_RESP;
> +	rmpp_base->rmpp_hdr.rmpp_version = IB_MGMT_RMPP_VERSION;
> +	rmpp_base->rmpp_hdr.rmpp_type = IB_MGMT_RMPP_TYPE_ABORT;
> +	ib_set_rmpp_flags(&rmpp_base->rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
> +	rmpp_base->rmpp_hdr.rmpp_status = rmpp_status;
> +	rmpp_base->rmpp_hdr.seg_num = 0;
> +	rmpp_base->rmpp_hdr.paylen_newwin = 0;
> 
>  	ret = ib_post_send_mad(msg, NULL);
>  	if (ret) {
> @@ -373,18 +373,18 @@ insert_rmpp_recv(struct ib_mad_agent_private *agent,
> 
>  static inline int get_last_flag(struct ib_mad_recv_buf *seg)
>  {
> -	struct ib_rmpp_mad *rmpp_mad;
> +	struct ib_rmpp_base *rmpp_base;
> 
> -	rmpp_mad = (struct ib_rmpp_mad *) seg->mad;
> -	return ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) & IB_MGMT_RMPP_FLAG_LAST;
> +	rmpp_base = (struct ib_rmpp_base *) seg->mad;
> +	return ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) & IB_MGMT_RMPP_FLAG_LAST;
>  }
> 
>  static inline int get_seg_num(struct ib_mad_recv_buf *seg)
>  {
> -	struct ib_rmpp_mad *rmpp_mad;
> +	struct ib_rmpp_base *rmpp_base;
> 
> -	rmpp_mad = (struct ib_rmpp_mad *) seg->mad;
> -	return be32_to_cpu(rmpp_mad->rmpp_hdr.seg_num);
> +	rmpp_base = (struct ib_rmpp_base *) seg->mad;
> +	return be32_to_cpu(rmpp_base->rmpp_hdr.seg_num);
>  }
> 
>  static inline struct ib_mad_recv_buf * get_next_seg(struct list_head *rmpp_list,
> @@ -436,9 +436,9 @@ static inline int get_mad_len(struct mad_rmpp_recv *rmpp_recv)
> 
>  	rmpp_mad = (struct ib_rmpp_mad *)rmpp_recv->cur_seg_buf->mad;
> 
> -	hdr_size = ib_get_mad_data_offset(rmpp_mad->mad_hdr.mgmt_class);
> +	hdr_size = ib_get_mad_data_offset(rmpp_mad->base.mad_hdr.mgmt_class);
>  	data_size = sizeof(struct ib_rmpp_mad) - hdr_size;
> -	pad = IB_MGMT_RMPP_DATA - be32_to_cpu(rmpp_mad->rmpp_hdr.paylen_newwin);
> +	pad = IB_MGMT_RMPP_DATA - be32_to_cpu(rmpp_mad->base.rmpp_hdr.paylen_newwin);
>  	if (pad > IB_MGMT_RMPP_DATA || pad < 0)
>  		pad = 0;
> 
> @@ -567,20 +567,20 @@ static int send_next_seg(struct ib_mad_send_wr_private *mad_send_wr)
>  	u32 paylen = 0;
> 
>  	rmpp_mad = mad_send_wr->send_buf.mad;
> -	ib_set_rmpp_flags(&rmpp_mad->rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
> -	rmpp_mad->rmpp_hdr.seg_num = cpu_to_be32(++mad_send_wr->seg_num);
> +	ib_set_rmpp_flags(&rmpp_mad->base.rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE);
> +	rmpp_mad->base.rmpp_hdr.seg_num = cpu_to_be32(++mad_send_wr->seg_num);
> 
>  	if (mad_send_wr->seg_num == 1) {
> -		rmpp_mad->rmpp_hdr.rmpp_rtime_flags |= IB_MGMT_RMPP_FLAG_FIRST;
> +		rmpp_mad->base.rmpp_hdr.rmpp_rtime_flags |= IB_MGMT_RMPP_FLAG_FIRST;
>  		paylen = mad_send_wr->send_buf.seg_count * IB_MGMT_RMPP_DATA -
>  			 mad_send_wr->pad;
>  	}
> 
>  	if (mad_send_wr->seg_num == mad_send_wr->send_buf.seg_count) {
> -		rmpp_mad->rmpp_hdr.rmpp_rtime_flags |= IB_MGMT_RMPP_FLAG_LAST;
> +		rmpp_mad->base.rmpp_hdr.rmpp_rtime_flags |= IB_MGMT_RMPP_FLAG_LAST;
>  		paylen = IB_MGMT_RMPP_DATA - mad_send_wr->pad;
>  	}
> -	rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(paylen);
> +	rmpp_mad->base.rmpp_hdr.paylen_newwin = cpu_to_be32(paylen);
> 
>  	/* 2 seconds for an ACK until we can find the packet lifetime */
>  	timeout = mad_send_wr->send_buf.timeout_ms;
> @@ -644,19 +644,19 @@ static void process_rmpp_ack(struct ib_mad_agent_private *agent,
>  			     struct ib_mad_recv_wc *mad_recv_wc)
>  {
>  	struct ib_mad_send_wr_private *mad_send_wr;
> -	struct ib_rmpp_mad *rmpp_mad;
> +	struct ib_rmpp_base *rmpp_base;
>  	unsigned long flags;
>  	int seg_num, newwin, ret;
> 
> -	rmpp_mad = (struct ib_rmpp_mad *)mad_recv_wc->recv_buf.mad;
> -	if (rmpp_mad->rmpp_hdr.rmpp_status) {
> +	rmpp_base = (struct ib_rmpp_base *)mad_recv_wc->recv_buf.mad;
> +	if (rmpp_base->rmpp_hdr.rmpp_status) {
>  		abort_send(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_BAD_STATUS);
>  		nack_recv(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_BAD_STATUS);
>  		return;
>  	}
> 
> -	seg_num = be32_to_cpu(rmpp_mad->rmpp_hdr.seg_num);
> -	newwin = be32_to_cpu(rmpp_mad->rmpp_hdr.paylen_newwin);
> +	seg_num = be32_to_cpu(rmpp_base->rmpp_hdr.seg_num);
> +	newwin = be32_to_cpu(rmpp_base->rmpp_hdr.paylen_newwin);
>  	if (newwin < seg_num) {
>  		abort_send(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_W2S);
>  		nack_recv(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_W2S);
> @@ -741,7 +741,7 @@ process_rmpp_data(struct ib_mad_agent_private *agent,
>  	struct ib_rmpp_hdr *rmpp_hdr;
>  	u8 rmpp_status;
> 
> -	rmpp_hdr = &((struct ib_rmpp_mad *)mad_recv_wc->recv_buf.mad)->rmpp_hdr;
> +	rmpp_hdr = &((struct ib_rmpp_base *)mad_recv_wc->recv_buf.mad)->rmpp_hdr;
> 
>  	if (rmpp_hdr->rmpp_status) {
>  		rmpp_status = IB_MGMT_RMPP_STATUS_BAD_STATUS;
> @@ -770,30 +770,30 @@ bad:
>  static void process_rmpp_stop(struct ib_mad_agent_private *agent,
>  			      struct ib_mad_recv_wc *mad_recv_wc)
>  {
> -	struct ib_rmpp_mad *rmpp_mad;
> +	struct ib_rmpp_base *rmpp_base;
> 
> -	rmpp_mad = (struct ib_rmpp_mad *)mad_recv_wc->recv_buf.mad;
> +	rmpp_base = (struct ib_rmpp_base *)mad_recv_wc->recv_buf.mad;
> 
> -	if (rmpp_mad->rmpp_hdr.rmpp_status != IB_MGMT_RMPP_STATUS_RESX) {
> +	if (rmpp_base->rmpp_hdr.rmpp_status != IB_MGMT_RMPP_STATUS_RESX) {
>  		abort_send(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_BAD_STATUS);
>  		nack_recv(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_BAD_STATUS);
>  	} else
> -		abort_send(agent, mad_recv_wc, rmpp_mad->rmpp_hdr.rmpp_status);
> +		abort_send(agent, mad_recv_wc, rmpp_base->rmpp_hdr.rmpp_status);
>  }
> 
>  static void process_rmpp_abort(struct ib_mad_agent_private *agent,
>  			       struct ib_mad_recv_wc *mad_recv_wc)
>  {
> -	struct ib_rmpp_mad *rmpp_mad;
> +	struct ib_rmpp_base *rmpp_base;
> 
> -	rmpp_mad = (struct ib_rmpp_mad *)mad_recv_wc->recv_buf.mad;
> +	rmpp_base = (struct ib_rmpp_base *)mad_recv_wc->recv_buf.mad;
> 
> -	if (rmpp_mad->rmpp_hdr.rmpp_status < IB_MGMT_RMPP_STATUS_ABORT_MIN ||
> -	    rmpp_mad->rmpp_hdr.rmpp_status > IB_MGMT_RMPP_STATUS_ABORT_MAX) {
> +	if (rmpp_base->rmpp_hdr.rmpp_status < IB_MGMT_RMPP_STATUS_ABORT_MIN ||
> +	    rmpp_base->rmpp_hdr.rmpp_status > IB_MGMT_RMPP_STATUS_ABORT_MAX) {
>  		abort_send(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_BAD_STATUS);
>  		nack_recv(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_BAD_STATUS);
>  	} else
> -		abort_send(agent, mad_recv_wc, rmpp_mad->rmpp_hdr.rmpp_status);
> +		abort_send(agent, mad_recv_wc, rmpp_base->rmpp_hdr.rmpp_status);
>  }
> 
>  struct ib_mad_recv_wc *
> @@ -803,16 +803,16 @@ ib_process_rmpp_recv_wc(struct ib_mad_agent_private *agent,
>  	struct ib_rmpp_mad *rmpp_mad;
> 
>  	rmpp_mad = (struct ib_rmpp_mad *)mad_recv_wc->recv_buf.mad;
> -	if (!(rmpp_mad->rmpp_hdr.rmpp_rtime_flags & IB_MGMT_RMPP_FLAG_ACTIVE))
> +	if (!(rmpp_mad->base.rmpp_hdr.rmpp_rtime_flags & IB_MGMT_RMPP_FLAG_ACTIVE))
>  		return mad_recv_wc;
> 
> -	if (rmpp_mad->rmpp_hdr.rmpp_version != IB_MGMT_RMPP_VERSION) {
> +	if (rmpp_mad->base.rmpp_hdr.rmpp_version != IB_MGMT_RMPP_VERSION) {
>  		abort_send(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_UNV);
>  		nack_recv(agent, mad_recv_wc, IB_MGMT_RMPP_STATUS_UNV);
>  		goto out;
>  	}
> 
> -	switch (rmpp_mad->rmpp_hdr.rmpp_type) {
> +	switch (rmpp_mad->base.rmpp_hdr.rmpp_type) {
>  	case IB_MGMT_RMPP_TYPE_DATA:
>  		return process_rmpp_data(agent, mad_recv_wc);
>  	case IB_MGMT_RMPP_TYPE_ACK:
> @@ -873,11 +873,11 @@ int ib_send_rmpp_mad(struct ib_mad_send_wr_private *mad_send_wr)
>  	int ret;
> 
>  	rmpp_mad = mad_send_wr->send_buf.mad;
> -	if (!(ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) &
> +	if (!(ib_get_rmpp_flags(&rmpp_mad->base.rmpp_hdr) &
>  	      IB_MGMT_RMPP_FLAG_ACTIVE))
>  		return IB_RMPP_RESULT_UNHANDLED;
> 
> -	if (rmpp_mad->rmpp_hdr.rmpp_type != IB_MGMT_RMPP_TYPE_DATA) {
> +	if (rmpp_mad->base.rmpp_hdr.rmpp_type != IB_MGMT_RMPP_TYPE_DATA) {
>  		mad_send_wr->seg_num = 1;
>  		return IB_RMPP_RESULT_INTERNAL;
>  	}
> @@ -895,15 +895,15 @@ int ib_send_rmpp_mad(struct ib_mad_send_wr_private *mad_send_wr)
>  int ib_process_rmpp_send_wc(struct ib_mad_send_wr_private *mad_send_wr,
>  			    struct ib_mad_send_wc *mad_send_wc)
>  {
> -	struct ib_rmpp_mad *rmpp_mad;
> +	struct ib_rmpp_base *rmpp_base;
>  	int ret;
> 
> -	rmpp_mad = mad_send_wr->send_buf.mad;
> -	if (!(ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) &
> +	rmpp_base = mad_send_wr->send_buf.mad;
> +	if (!(ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) &
>  	      IB_MGMT_RMPP_FLAG_ACTIVE))
>  		return IB_RMPP_RESULT_UNHANDLED; /* RMPP not active */
> 
> -	if (rmpp_mad->rmpp_hdr.rmpp_type != IB_MGMT_RMPP_TYPE_DATA)
> +	if (rmpp_base->rmpp_hdr.rmpp_type != IB_MGMT_RMPP_TYPE_DATA)
>  		return IB_RMPP_RESULT_INTERNAL;	 /* ACK, STOP, or ABORT */
> 
>  	if (mad_send_wc->status != IB_WC_SUCCESS ||
> @@ -933,11 +933,11 @@ int ib_process_rmpp_send_wc(struct ib_mad_send_wr_private *mad_send_wr,
> 
>  int ib_retry_rmpp(struct ib_mad_send_wr_private *mad_send_wr)
>  {
> -	struct ib_rmpp_mad *rmpp_mad;
> +	struct ib_rmpp_base *rmpp_base;
>  	int ret;
> 
> -	rmpp_mad = mad_send_wr->send_buf.mad;
> -	if (!(ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) &
> +	rmpp_base = mad_send_wr->send_buf.mad;
> +	if (!(ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) &
>  	      IB_MGMT_RMPP_FLAG_ACTIVE))
>  		return IB_RMPP_RESULT_UNHANDLED; /* RMPP not active */
> 
> diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
> index 9628494..ac33d34 100644
> --- a/drivers/infiniband/core/user_mad.c
> +++ b/drivers/infiniband/core/user_mad.c
> @@ -448,7 +448,7 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
>  	struct ib_mad_agent *agent;
>  	struct ib_ah_attr ah_attr;
>  	struct ib_ah *ah;
> -	struct ib_rmpp_mad *rmpp_mad;
> +	struct ib_rmpp_base *rmpp_base;
>  	__be64 *tid;
>  	int ret, data_len, hdr_len, copy_offset, rmpp_active;
> 
> @@ -504,13 +504,13 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
>  		goto err_up;
>  	}
> 
> -	rmpp_mad = (struct ib_rmpp_mad *) packet->mad.data;
> -	hdr_len = ib_get_mad_data_offset(rmpp_mad->mad_hdr.mgmt_class);
> +	rmpp_base = (struct ib_rmpp_base *) packet->mad.data;
> +	hdr_len = ib_get_mad_data_offset(rmpp_base->mad_hdr.mgmt_class);
> 
> -	if (ib_is_mad_class_rmpp(rmpp_mad->mad_hdr.mgmt_class)
> +	if (ib_is_mad_class_rmpp(rmpp_base->mad_hdr.mgmt_class)
>  	    && ib_mad_kernel_rmpp_agent(agent)) {
>  		copy_offset = IB_MGMT_RMPP_HDR;
> -		rmpp_active = ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) &
> +		rmpp_active = ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) &
>  						IB_MGMT_RMPP_FLAG_ACTIVE;
>  	} else {
>  		copy_offset = IB_MGMT_MAD_HDR;
> @@ -558,12 +558,12 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
>  		tid = &((struct ib_mad_hdr *) packet->msg->mad)->tid;
>  		*tid = cpu_to_be64(((u64) agent->hi_tid) << 32 |
>  				   (be64_to_cpup(tid) & 0xffffffff));
> -		rmpp_mad->mad_hdr.tid = *tid;
> +		rmpp_base->mad_hdr.tid = *tid;
>  	}
> 
>  	if (!ib_mad_kernel_rmpp_agent(agent)
> -	   && ib_is_mad_class_rmpp(rmpp_mad->mad_hdr.mgmt_class)
> -	   && (ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) & IB_MGMT_RMPP_FLAG_ACTIVE)) {
> +	   && ib_is_mad_class_rmpp(rmpp_base->mad_hdr.mgmt_class)
> +	   && (ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) & IB_MGMT_RMPP_FLAG_ACTIVE)) {
>  		spin_lock_irq(&file->send_lock);
>  		list_add_tail(&packet->list, &file->send_list);
>  		spin_unlock_irq(&file->send_lock);
> diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
> index 00a5e51..80e7cf4 100644
> --- a/include/rdma/ib_mad.h
> +++ b/include/rdma/ib_mad.h
> @@ -136,6 +136,11 @@ enum {
>  	IB_MGMT_DEVICE_HDR = 64,
>  	IB_MGMT_DEVICE_DATA = 192,
>  	IB_MGMT_MAD_SIZE = IB_MGMT_MAD_HDR + IB_MGMT_MAD_DATA,
> +	JUMBO_MGMT_MAD_HDR = IB_MGMT_MAD_HDR,
> +	JUMBO_MGMT_MAD_DATA = 2024,
> +	JUMBO_MGMT_RMPP_HDR = IB_MGMT_RMPP_HDR,
> +	JUMBO_MGMT_RMPP_DATA = 2012,
> +	JUMBO_MGMT_MAD_SIZE = JUMBO_MGMT_MAD_HDR + JUMBO_MGMT_MAD_DATA,

Keep the "IB_" prefix, or add a new "OPA_" prefix.
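For illustration (these names are only a suggestion, not part of the patch), the "OPA_" variant would read:

	OPA_MGMT_MAD_HDR = IB_MGMT_MAD_HDR,
	OPA_MGMT_MAD_DATA = 2024,
	OPA_MGMT_RMPP_HDR = IB_MGMT_RMPP_HDR,
	OPA_MGMT_RMPP_DATA = 2012,
	OPA_MGMT_MAD_SIZE = OPA_MGMT_MAD_HDR + OPA_MGMT_MAD_DATA,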

>  };
> 
>  struct ib_mad_hdr {
> @@ -182,12 +187,26 @@ struct ib_mad {
>  	u8			data[IB_MGMT_MAD_DATA];
>  };
> 
> -struct ib_rmpp_mad {
> +struct jumbo_mad {
> +	struct ib_mad_hdr	mad_hdr;
> +	u8			data[JUMBO_MGMT_MAD_DATA];
> +};
> +
> +struct ib_rmpp_base {
>  	struct ib_mad_hdr	mad_hdr;
>  	struct ib_rmpp_hdr	rmpp_hdr;
> +} __packed;
> +
> +struct ib_rmpp_mad {
> +	struct ib_rmpp_base	base;
>  	u8			data[IB_MGMT_RMPP_DATA];
>  };
> 
> +struct jumbo_rmpp_mad {
> +	struct ib_rmpp_base	base;
> +	u8			data[JUMBO_MGMT_RMPP_DATA];
> +};

Please separate this patch into 2 changes.  One that adds and updates ib_rmpp_base, with the second one defining ib_opa_mad & ib_opa_rmpp_mad (or whatever prefix is chosen).

* RE: [PATCH v4 16/19] IB/mad: Add Intel Omni-Path Architecture defines
       [not found]     ` <1423092585-26692-17-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-04-03 23:33       ` Hefty, Sean
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD708-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hefty, Sean @ 2015-04-03 23:33 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Weiny, Ira

>  /* Registration table sizes */
>  #define MAX_MGMT_CLASS		80
> -#define MAX_MGMT_VERSION	8
> +#define MAX_MGMT_VERSION	0x83

It's unfortunate that this results in a big jump in used versions.  Mad_priv.h defines this:

struct ib_mad_port_private {
	...
	struct ib_mad_mgmt_version_table version[MAX_MGMT_VERSION];

struct ib_mad_mgmt_version_table {
	struct ib_mad_mgmt_class_table *class;
	struct ib_mad_mgmt_vendor_class_table *vendor;
};

This ends up allocating about 2K of NULL pointers per port.  Not a huge deal, but still.

I don't have a great fix here.  Maybe the version[] array can be the necessary size, with some sort of simple mapping function from version to the index?
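As a rough sketch (the helper name is made up, and it assumes the OPA versions occupy 0x80-0x82):

static int mgmt_version_index(u8 version)
{
	if (version < 8)
		return version;			/* existing IB versions */
	if (version >= 0x80 && version <= 0x82)
		return 8 + version - 0x80;	/* OPA versions -> 8..10 */
	return -1;				/* unsupported version */
}

The version[] array could then hold 11 entries instead of 0x83, at the cost of one extra mapping on every lookup.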

* RE: [PATCH v4 17/19] IB/mad: Implement support for Intel Omni-Path Architecture base version MADs in ib_create_send_mad
       [not found]     ` <1423092585-26692-18-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-04-03 23:40       ` Hefty, Sean
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD722-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hefty, Sean @ 2015-04-03 23:40 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Weiny, Ira

> @@ -937,20 +937,31 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
>  	struct ib_mad_send_wr_private *mad_send_wr;
>  	int pad, message_size, ret, size;
>  	void *buf;
> +	size_t mad_size;
> +	int opa;
> 
>  	mad_agent_priv = container_of(mad_agent, struct ib_mad_agent_private,
>  				      agent);
> -	pad = get_pad_size(hdr_len, data_len);
> +
> +	opa = mad_agent_priv->agent.device->cached_dev_attrs.device_cap_flags2 &
> +	      IB_DEVICE_OPA_MAD_SUPPORT;
> +
> +	if (opa && base_version == OPA_MGMT_BASE_VERSION)
> +		mad_size = sizeof(struct jumbo_mad);
> +	else
> +		mad_size = sizeof(struct ib_mad);

Didn't an earlier patch make it possible to read the mad_size directly from the device?
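i.e., something along the lines of (sketch):

	mad_size = mad_agent_priv->agent.device->cached_dev_attrs.max_mad_size;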


* RE: [PATCH v4 18/19] IB/mad: Implement Intel Omni-Path Architecture SMP processing
       [not found]     ` <1423092585-26692-19-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-04-03 23:47       ` Hefty, Sean
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD742-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hefty, Sean @ 2015-04-03 23:47 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Weiny, Ira

> @@ -236,6 +252,24 @@ enum smi_action smi_handle_dr_smp_recv(struct ib_smp *smp, u8 node_type,
>  					smp->dr_slid == IB_LID_PERMISSIVE);
>  }
> 
> +/*
> + * Adjust information for a received SMP
> + * Return 0 if the SMP should be dropped

The function returns an enum, so the comment about returning 0 is misleading.  The entire comment seems unnecessary.

> + */
> +enum smi_action opa_smi_handle_dr_smp_recv(struct opa_smp *smp, u8 node_type,
> +					   int port_num, int phys_port_cnt)
> +{
> +	return __smi_handle_dr_smp_recv(node_type, port_num, phys_port_cnt,
> +					&smp->hop_ptr, smp->hop_cnt,
> +					smp->route.dr.initial_path,
> +					smp->route.dr.return_path,
> +					opa_get_smp_direction(smp),
> +					smp->route.dr.dr_dlid ==
> +					OPA_LID_PERMISSIVE,
> +					smp->route.dr.dr_slid ==
> +					OPA_LID_PERMISSIVE);
> +}
> +
>  static inline
>  enum smi_forward_action __smi_check_forward_dr_smp(u8 hop_ptr, u8 hop_cnt,
>  						   u8 direction,
> @@ -277,6 +311,16 @@ enum smi_forward_action smi_check_forward_dr_smp(struct ib_smp *smp)
>  					  smp->dr_slid != IB_LID_PERMISSIVE);
>  }
> 
> +enum smi_forward_action opa_smi_check_forward_dr_smp(struct opa_smp *smp)
> +{
> +	return __smi_check_forward_dr_smp(smp->hop_ptr, smp->hop_cnt,
> +					  opa_get_smp_direction(smp),
> +					  smp->route.dr.dr_dlid ==
> +					  OPA_LID_PERMISSIVE,
> +					  smp->route.dr.dr_slid ==
> +					  OPA_LID_PERMISSIVE);
> +}
> +
>  /*
>   * Return the forwarding port number from initial_path for outgoing SMP and
>   * from return_path for returning SMP
> @@ -286,3 +330,13 @@ int smi_get_fwd_port(struct ib_smp *smp)
>  	return (!ib_get_smp_direction(smp) ? smp->initial_path[smp->hop_ptr+1] :
>  		smp->return_path[smp->hop_ptr-1]);
>  }
> +
> +/*
> + * Return the forwarding port number from initial_path for outgoing SMP and
> + * from return_path for returning SMP
> + */
> +int opa_smi_get_fwd_port(struct opa_smp *smp)
> +{
> +	return !opa_get_smp_direction(smp) ? smp->route.dr.initial_path[smp->hop_ptr+1] :
> +		smp->route.dr.return_path[smp->hop_ptr-1];
> +}
> diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h
> index aff96ba..e95c537 100644
> --- a/drivers/infiniband/core/smi.h
> +++ b/drivers/infiniband/core/smi.h
> @@ -62,6 +62,9 @@ extern enum smi_action smi_handle_dr_smp_send(struct ib_smp *smp,
>   * Return IB_SMI_HANDLE if the SMP should be handled by the local SMA/SM
>   * via process_mad
>   */
> +/* NOTE: This is called on opa_smp's don't check fields which are not common
> + * between ib_smp and opa_smp
> + */

This comment suggests that the function is not correct for OPA.  Is that the case?

>  static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
>  						  struct ib_device *device)
>  {
> @@ -77,6 +80,9 @@ static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
>   * Return IB_SMI_HANDLE if the SMP should be handled by the local SMA/SM
>   * via process_mad
>   */
> +/* NOTE: This is called on opa_smp's don't check fields which are not common
> + * between ib_smp and opa_smp
> + */

Same comment

>  static inline enum smi_action smi_check_local_returning_smp(struct ib_smp *smp,
>  						   struct ib_device *device)
>  {

* RE: [PATCH v4 15/19] IB/mad: Create jumbo_mad data structures
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD6C8-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-04-04  0:14           ` Hefty, Sean
  2015-04-08 15:33           ` ira.weiny
  1 sibling, 0 replies; 84+ messages in thread
From: Hefty, Sean @ 2015-04-04  0:14 UTC (permalink / raw)
  To: Hefty, Sean, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Weiny, Ira

> > diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
> > index 00a5e51..80e7cf4 100644
> > --- a/include/rdma/ib_mad.h
> > +++ b/include/rdma/ib_mad.h
> > @@ -136,6 +136,11 @@ enum {
> >  	IB_MGMT_DEVICE_HDR = 64,
> >  	IB_MGMT_DEVICE_DATA = 192,
> >  	IB_MGMT_MAD_SIZE = IB_MGMT_MAD_HDR + IB_MGMT_MAD_DATA,
> > +	JUMBO_MGMT_MAD_HDR = IB_MGMT_MAD_HDR,
> > +	JUMBO_MGMT_MAD_DATA = 2024,
> > +	JUMBO_MGMT_RMPP_HDR = IB_MGMT_RMPP_HDR,

Examining a later patch in this series highlighted that JUMBO_MGMT_MAD_HDR and JUMBO_MGMT_RMPP_HDR are the same size as the corresponding IB_MGMT_* defines.  This seems unnecessary.

* RE: [PATCH v4 19/19] IB/mad: Implement Intel Omni-Path Architecture MAD processing
       [not found]     ` <1423092585-26692-20-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-04-04  1:44       ` Hefty, Sean
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD9DE-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hefty, Sean @ 2015-04-04  1:44 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Weiny, Ira

> diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
> index b6bd305..18275a5 100644
> --- a/drivers/infiniband/core/agent.c
> +++ b/drivers/infiniband/core/agent.c
> @@ -80,13 +80,17 @@ ib_get_agent_port(struct ib_device *device, int port_num)
> 
>  void agent_send_response(struct ib_mad *mad, struct ib_grh *grh,
>  			 struct ib_wc *wc, struct ib_device *device,
> -			 int port_num, int qpn)
> +			 int port_num, int qpn, u32 resp_mad_len,
> +			 int opa)

Can't OPA support be determined by looking at the device structure?

>  {
>  	struct ib_agent_port_private *port_priv;
>  	struct ib_mad_agent *agent;
>  	struct ib_mad_send_buf *send_buf;
>  	struct ib_ah *ah;
> +	size_t data_len;
> +	size_t hdr_len;
>  	struct ib_mad_send_wr_private *mad_send_wr;
> +	u8 base_version;
> 
>  	if (device->node_type == RDMA_NODE_IB_SWITCH)
>  		port_priv = ib_get_agent_port(device, 0);
> @@ -106,16 +110,29 @@ void agent_send_response(struct ib_mad *mad, struct ib_grh *grh,
>  		return;
>  	}
> 
> +	/* base version determines MAD size */
> +	base_version = mad->mad_hdr.base_version;
> +	if (opa && base_version == OPA_MGMT_BASE_VERSION) {
> +		data_len = resp_mad_len - JUMBO_MGMT_MAD_HDR;
> +		hdr_len = JUMBO_MGMT_MAD_HDR;
> +	} else {
> +		data_len = IB_MGMT_MAD_DATA;
> +		hdr_len = IB_MGMT_MAD_HDR;
> +	}

I _think_ this can be simplified to:

	hdr_len = IB_MGMT_MAD_HDR;
	data_len = resp_mad_len - hdr_len;

IB should set resp_mad_len = 256 in all cases.

> +
>  	send_buf = ib_create_send_mad(agent, wc->src_qp, wc->pkey_index, 0,
> -				      IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
> -				      GFP_KERNEL,
> -				      IB_MGMT_BASE_VERSION);
> +				      hdr_len, data_len, GFP_KERNEL,
> +				      base_version);
>  	if (IS_ERR(send_buf)) {
>  		dev_err(&device->dev, "ib_create_send_mad error\n");
>  		goto err1;
>  	}
> 
> -	memcpy(send_buf->mad, mad, sizeof *mad);
> +	if (opa && base_version == OPA_MGMT_BASE_VERSION)
> +		memcpy(send_buf->mad, mad, JUMBO_MGMT_MAD_HDR + data_len);
> +	else
> +		memcpy(send_buf->mad, mad, sizeof(*mad));

And this can probably be simplified to:

	memcpy(send_buf->mad, mad, resp_mad_len);

> +
>  	send_buf->ah = ah;
> 
>  	if (device->node_type == RDMA_NODE_IB_SWITCH) {
> diff --git a/drivers/infiniband/core/agent.h b/drivers/infiniband/core/agent.h
> index 6669287..1dee837 100644
> --- a/drivers/infiniband/core/agent.h
> +++ b/drivers/infiniband/core/agent.h
> @@ -46,6 +46,7 @@ extern int ib_agent_port_close(struct ib_device *device, int port_num);
> 
>  extern void agent_send_response(struct ib_mad *mad, struct ib_grh *grh,
>  				struct ib_wc *wc, struct ib_device *device,
> -				int port_num, int qpn);
> +				int port_num, int qpn, u32 resp_mad_len,
> +				int opa);
> 
>  #endif	/* __AGENT_H_ */
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index 5aefe4c..9b7dc36 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -3,6 +3,7 @@
>   * Copyright (c) 2005 Intel Corporation.  All rights reserved.
>   * Copyright (c) 2005 Mellanox Technologies Ltd.  All rights reserved.
>   * Copyright (c) 2009 HNR Consulting. All rights reserved.
> + * Copyright (c) 2014 Intel Corporation.  All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> @@ -44,6 +45,7 @@
>  #include "mad_priv.h"
>  #include "mad_rmpp.h"
>  #include "smi.h"
> +#include "opa_smi.h"
>  #include "agent.h"
> 
>  MODULE_LICENSE("Dual BSD/GPL");
> @@ -733,6 +735,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>  {
>  	int ret = 0;
>  	struct ib_smp *smp = mad_send_wr->send_buf.mad;
> +	struct opa_smp *opa_smp = (struct opa_smp *)smp;
>  	unsigned long flags;
>  	struct ib_mad_local_private *local;
>  	struct ib_mad_private *mad_priv;
> @@ -744,6 +747,9 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>  	struct ib_send_wr *send_wr = &mad_send_wr->send_wr;
>  	size_t in_mad_size = mad_agent_priv->agent.device->cached_dev_attrs.max_mad_size;
>  	size_t out_mad_size;
> +	u16 drslid;
> +	int opa = mad_agent_priv->qp_info->qp->device->cached_dev_attrs.device_cap_flags2 &
> +		  IB_DEVICE_OPA_MAD_SUPPORT;
> 
>  	if (device->node_type == RDMA_NODE_IB_SWITCH &&
>  	    smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
> @@ -757,13 +763,36 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>  	 * If we are at the start of the LID routed part, don't update the
>  	 * hop_ptr or hop_cnt.  See section 14.2.2, Vol 1 IB spec.
>  	 */
> -	if ((ib_get_smp_direction(smp) ? smp->dr_dlid : smp->dr_slid) ==
> -	     IB_LID_PERMISSIVE &&
> -	     smi_handle_dr_smp_send(smp, device->node_type, port_num) ==
> -	     IB_SMI_DISCARD) {
> -		ret = -EINVAL;
> -		dev_err(&device->dev, "Invalid directed route\n");
> -		goto out;
> +	if (opa && smp->class_version == OPA_SMP_CLASS_VERSION) {

There are several places where this sort of check is made.  IMO, this check should only require looking at the MAD, not the MAD + the device attributes that the MAD will be transferred on.  I would actually prefer to see this as:

	if (smp->class_version == OPA_SMP_CLASS_VERSION)

That check is sufficient.  There is no conflict with IB MADs, and it needlessly complicates the code to assume that the IBTA is going to someday define another 128 class versions in such a way that those versions will not require any other changes to the code.

> +		u32 opa_drslid;
> +		if ((opa_get_smp_direction(opa_smp)
> +		     ? opa_smp->route.dr.dr_dlid : opa_smp->route.dr.dr_slid) ==
> +		     OPA_LID_PERMISSIVE &&
> +		     opa_smi_handle_dr_smp_send(opa_smp, device->node_type,
> +						port_num) == IB_SMI_DISCARD) {
> +			ret = -EINVAL;
> +			dev_err(&device->dev, "OPA Invalid directed route\n");
> +			goto out;
> +		}
> +		opa_drslid = be32_to_cpu(opa_smp->route.dr.dr_slid);
> +		if (opa_drslid != OPA_LID_PERMISSIVE &&
> +		    opa_drslid & 0xffff0000) {
> +			ret = -EINVAL;
> +			dev_err(&device->dev, "OPA Invalid dr_slid 0x%x\n",
> +			       opa_drslid);
> +			goto out;
> +		}
> +		drslid = (u16)(opa_drslid & 0x0000ffff);
> +	} else {
> +		if ((ib_get_smp_direction(smp) ? smp->dr_dlid : smp->dr_slid) ==
> +		     IB_LID_PERMISSIVE &&
> +		     smi_handle_dr_smp_send(smp, device->node_type, port_num) ==
> +		     IB_SMI_DISCARD) {
> +			ret = -EINVAL;
> +			dev_err(&device->dev, "Invalid directed route\n");
> +			goto out;
> +		}
> +		drslid = be16_to_cpu(smp->dr_slid);
>  	}
> 
>  	/* Check to post send on QP or process locally */
> @@ -789,10 +818,16 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>  	}
> 
>  	build_smp_wc(mad_agent_priv->agent.qp,
> -		     send_wr->wr_id, be16_to_cpu(smp->dr_slid),
> +		     send_wr->wr_id, drslid,
>  		     send_wr->wr.ud.pkey_index,
>  		     send_wr->wr.ud.port_num, &mad_wc);
> 
> +	if (opa && smp->base_version == OPA_MGMT_BASE_VERSION) {
> +		mad_wc.byte_len = mad_send_wr->send_buf.hdr_len
> +					+ mad_send_wr->send_buf.data_len
> +					+ sizeof(struct ib_grh);
> +	}
> +
>  	/* No GRH for DR SMP */
>  	ret = device->process_mad(device, 0, port_num, &mad_wc, NULL,
>  				  (struct ib_mad_hdr *)smp, in_mad_size,
> @@ -821,7 +856,10 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>  		port_priv = ib_get_mad_port(mad_agent_priv->agent.device,
>  					    mad_agent_priv->agent.port_num);
>  		if (port_priv) {
> -			memcpy(&mad_priv->mad.mad, smp, sizeof(struct ib_mad));
> +			if (opa && smp->base_version == OPA_MGMT_BASE_VERSION)
> +				memcpy(&mad_priv->mad.mad, smp, sizeof(struct jumbo_mad));
> +			else
> +				memcpy(&mad_priv->mad.mad, smp, sizeof(struct ib_mad));
>  			recv_mad_agent = find_mad_agent(port_priv,
>  						        &mad_priv->mad.mad);
>  		}
> @@ -844,6 +882,8 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>  	}
> 
>  	local->mad_send_wr = mad_send_wr;
> +	local->mad_send_wr->send_wr.wr.ud.pkey_index = mad_wc.pkey_index;
> +	local->return_wc_byte_len = out_mad_size;
>  	/* Reference MAD agent until send side of local completion handled */
>  	atomic_inc(&mad_agent_priv->refcount);
>  	/* Queue local completion to local list */
> @@ -1737,14 +1777,18 @@ out:
>  	return mad_agent;
>  }
> 
> -static int validate_mad(struct ib_mad_hdr *mad_hdr, u32 qp_num)
> +static int validate_mad(struct ib_mad_hdr *mad_hdr,
> +			struct ib_mad_qp_info *qp_info,
> +			int opa)

I'm not a fan of having an 'opa' integer passed around to a bunch of functions.  This can be determined through the qp_info parameter.
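A small helper would keep the extra parameter out of all of these signatures; a minimal sketch (helper name made up):

static inline int qp_info_supports_opa(struct ib_mad_qp_info *qp_info)
{
	return !!(qp_info->qp->device->cached_dev_attrs.device_cap_flags2 &
		  IB_DEVICE_OPA_MAD_SUPPORT);
}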

>  {
>  	int valid = 0;
> +	u32 qp_num = qp_info->qp->qp_num;

Am I missing where this is used?

> 
>  	/* Make sure MAD base version is understood */
> -	if (mad_hdr->base_version != IB_MGMT_BASE_VERSION) {
> -		pr_err("MAD received with unsupported base version %d\n",
> -			mad_hdr->base_version);
> +	if (mad_hdr->base_version != IB_MGMT_BASE_VERSION &&
> +	    (!opa || mad_hdr->base_version != OPA_MGMT_BASE_VERSION)) {
> +		pr_err("MAD received with unsupported base version %d %s\n",
> +		       mad_hdr->base_version, opa ? "(opa)" : "");
>  		goto out;
>  	}
> 
> @@ -1844,18 +1888,18 @@ ib_find_send_mad(struct ib_mad_agent_private *mad_agent_priv,
>  		 struct ib_mad_recv_wc *wc)
>  {
>  	struct ib_mad_send_wr_private *wr;
> -	struct ib_mad *mad;
> +	struct ib_mad_hdr *mad_hdr;
> 
> -	mad = (struct ib_mad *)wc->recv_buf.mad;
> +	mad_hdr = (struct ib_mad_hdr *)wc->recv_buf.mad;
> 
>  	list_for_each_entry(wr, &mad_agent_priv->wait_list, agent_list) {
> -		if ((wr->tid == mad->mad_hdr.tid) &&
> +		if ((wr->tid == mad_hdr->tid) &&
>  		    rcv_has_same_class(wr, wc) &&
>  		    /*
>  		     * Don't check GID for direct routed MADs.
>  		     * These might have permissive LIDs.
>  		     */
> -		    (is_direct(wc->recv_buf.mad->mad_hdr.mgmt_class) ||
> +		    (is_direct(mad_hdr->mgmt_class) ||
>  		     rcv_has_same_gid(mad_agent_priv, wr, wc)))
>  			return (wr->status == IB_WC_SUCCESS) ? wr : NULL;
>  	}
> @@ -1866,14 +1910,14 @@ ib_find_send_mad(struct ib_mad_agent_private *mad_agent_priv,
>  	 */
>  	list_for_each_entry(wr, &mad_agent_priv->send_list, agent_list) {
>  		if (is_rmpp_data_mad(mad_agent_priv, wr->send_buf.mad) &&
> -		    wr->tid == mad->mad_hdr.tid &&
> +		    wr->tid == mad_hdr->tid &&
>  		    wr->timeout &&
>  		    rcv_has_same_class(wr, wc) &&
>  		    /*
>  		     * Don't check GID for direct routed MADs.
>  		     * These might have permissive LIDs.
>  		     */
> -		    (is_direct(wc->recv_buf.mad->mad_hdr.mgmt_class) ||
> +		    (is_direct(mad_hdr->mgmt_class) ||
>  		     rcv_has_same_gid(mad_agent_priv, wr, wc)))
>  			/* Verify request has not been canceled */
>  			return (wr->status == IB_WC_SUCCESS) ? wr : NULL;

The updates to the two functions above can be pulled out into a separate commit.

> @@ -1889,7 +1933,7 @@ void ib_mark_mad_done(struct ib_mad_send_wr_private *mad_send_wr)
>  			      &mad_send_wr->mad_agent_priv->done_list);
>  }
> 
> -static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
> +void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
>  				 struct ib_mad_recv_wc *mad_recv_wc)
>  {
>  	struct ib_mad_send_wr_private *mad_send_wr;
> @@ -1992,7 +2036,9 @@ enum smi_action handle_ib_smi(struct ib_mad_port_private *port_priv,
>  				    &response->grh, wc,
>  				    port_priv->device,
>  				    smi_get_fwd_port(&recv->mad.smp),
> -				    qp_info->qp->qp_num);
> +				    qp_info->qp->qp_num,
> +				    sizeof(struct ib_mad),
> +				    0);
> 
>  		return IB_SMI_DISCARD;
>  	}
> @@ -2005,7 +2051,9 @@ static size_t mad_recv_buf_size(struct ib_device *dev)
>  }
> 
>  static bool generate_unmatched_resp(struct ib_mad_private *recv,
> -				    struct ib_mad_private *response)
> +				    struct ib_mad_private *response,
> +				    size_t *resp_len,
> +				    int opa)
>  {
>  	if (recv->mad.mad.mad_hdr.method == IB_MGMT_METHOD_GET ||
>  	    recv->mad.mad.mad_hdr.method == IB_MGMT_METHOD_SET) {
> @@ -2019,29 +2067,103 @@ static bool generate_unmatched_resp(struct ib_mad_private *recv,
>  		if (recv->mad.mad.mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
>  			response->mad.mad.mad_hdr.status |= IB_SMP_DIRECTION;
> 
> +		if (opa && recv->mad.mad.mad_hdr.base_version == OPA_MGMT_BASE_VERSION) {
> +			if (recv->mad.mad.mad_hdr.mgmt_class ==
> +			    IB_MGMT_CLASS_SUBN_LID_ROUTED ||
> +			    recv->mad.mad.mad_hdr.mgmt_class ==
> +			    IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
> +				*resp_len = opa_get_smp_header_size(
> +							(struct opa_smp *)&recv->mad.smp);
> +			else
> +				*resp_len = sizeof(struct ib_mad_hdr);
> +		}
> +

A local variable mad_hdr = &recv->mad.mad.mad_hdr may help with the readability of this function.
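i.e., declare once at the top (sketch):

	struct ib_mad_hdr *mad_hdr = &recv->mad.mad.mad_hdr;

and the method/class checks above become short one-line tests on mad_hdr.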

>  		return true;
>  	} else {
>  		return false;
>  	}
>  }
> +
> +static enum smi_action
> +handle_opa_smi(struct ib_mad_port_private *port_priv,
> +	       struct ib_mad_qp_info *qp_info,
> +	       struct ib_wc *wc,
> +	       int port_num,
> +	       struct ib_mad_private *recv,
> +	       struct ib_mad_private *response)
> +{
> +	enum smi_forward_action retsmi;
> +
> +	if (opa_smi_handle_dr_smp_recv(&recv->mad.opa_smp,
> +				   port_priv->device->node_type,
> +				   port_num,
> +				   port_priv->device->phys_port_cnt) ==
> +				   IB_SMI_DISCARD)
> +		return IB_SMI_DISCARD;
> +
> +	retsmi = opa_smi_check_forward_dr_smp(&recv->mad.opa_smp);
> +	if (retsmi == IB_SMI_LOCAL)
> +		return IB_SMI_HANDLE;
> +
> +	if (retsmi == IB_SMI_SEND) { /* don't forward */
> +		if (opa_smi_handle_dr_smp_send(&recv->mad.opa_smp,
> +					   port_priv->device->node_type,
> +					   port_num) == IB_SMI_DISCARD)
> +			return IB_SMI_DISCARD;
> +
> +		if (opa_smi_check_local_smp(&recv->mad.opa_smp, port_priv->device) == IB_SMI_DISCARD)
> +			return IB_SMI_DISCARD;
> +
> +	} else if (port_priv->device->node_type == RDMA_NODE_IB_SWITCH) {
> +		/* forward case for switches */
> +		memcpy(response, recv, sizeof(*response));
> +		response->header.recv_wc.wc = &response->header.wc;
> +		response->header.recv_wc.recv_buf.jumbo_mad = &response->mad.jumbo_mad;
> +		response->header.recv_wc.recv_buf.grh = &response->grh;
> +
> +		agent_send_response((struct ib_mad *)&response->mad.mad,
> +				    &response->grh, wc,
> +				    port_priv->device,
> +				    opa_smi_get_fwd_port(&recv->mad.opa_smp),
> +				    qp_info->qp->qp_num,
> +				    recv->header.wc.byte_len,
> +				    1);
> +
> +		return IB_SMI_DISCARD;
> +	}
> +
> +	return IB_SMI_HANDLE;
> +}
> +
> +static enum smi_action
> +handle_smi(struct ib_mad_port_private *port_priv,
> +	   struct ib_mad_qp_info *qp_info,
> +	   struct ib_wc *wc,
> +	   int port_num,
> +	   struct ib_mad_private *recv,
> +	   struct ib_mad_private *response,
> +	   int opa)
> +{
> +	if (opa && recv->mad.mad.mad_hdr.base_version == OPA_MGMT_BASE_VERSION &&
> +	    recv->mad.mad.mad_hdr.class_version == OPA_SMI_CLASS_VERSION)
> +		return handle_opa_smi(port_priv, qp_info, wc, port_num, recv, response);
> +
> +	return handle_ib_smi(port_priv, qp_info, wc, port_num, recv, response);
> +}
> +
>  static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
> -				     struct ib_wc *wc)
> +				     struct ib_wc *wc,
> +				     struct ib_mad_private_header *mad_priv_hdr,
> +				     struct ib_mad_qp_info *qp_info)
>  {
> -	struct ib_mad_qp_info *qp_info;
> -	struct ib_mad_private_header *mad_priv_hdr;
>  	struct ib_mad_private *recv, *response = NULL;
> -	struct ib_mad_list_head *mad_list;
>  	struct ib_mad_agent_private *mad_agent;
>  	int port_num;
>  	int ret = IB_MAD_RESULT_SUCCESS;
>  	size_t resp_mad_size;
> +	int opa = qp_info->qp->device->cached_dev_attrs.device_cap_flags2 &
> +		  IB_DEVICE_OPA_MAD_SUPPORT;
> 
> -	mad_list = (struct ib_mad_list_head *)(unsigned long)wc->wr_id;
> -	qp_info = mad_list->mad_queue->qp_info;
> -	dequeue_mad(mad_list);
> -
> -	mad_priv_hdr = container_of(mad_list, struct ib_mad_private_header,
> -				    mad_list);
>  	recv = container_of(mad_priv_hdr, struct ib_mad_private, header);
>  	ib_dma_unmap_single(port_priv->device,
>  			    recv->header.mapping,
> @@ -2051,7 +2173,13 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
>  	/* Setup MAD receive work completion from "normal" work completion */
>  	recv->header.wc = *wc;
>  	recv->header.recv_wc.wc = &recv->header.wc;
> -	recv->header.recv_wc.mad_len = sizeof(struct ib_mad);
> +	if (opa && recv->mad.mad.mad_hdr.base_version == OPA_MGMT_BASE_VERSION) {
> +		recv->header.recv_wc.mad_len = wc->byte_len - sizeof(struct ib_grh);

Can this logic to set mad_len be used for both OPA and IB?
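If wc->byte_len is also valid for IB-sized receives, the branch could collapse to (sketch, assumption untested):

	recv->header.recv_wc.mad_len = wc->byte_len - sizeof(struct ib_grh);

leaving only mad_seg_size to differ between the two formats.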

> +		recv->header.recv_wc.mad_seg_size = sizeof(struct jumbo_mad);
> +	} else {
> +		recv->header.recv_wc.mad_len = sizeof(struct ib_mad);
> +		recv->header.recv_wc.mad_seg_size = sizeof(struct ib_mad);
> +	}
>  	recv->header.recv_wc.recv_buf.mad = &recv->mad.mad;
>  	recv->header.recv_wc.recv_buf.grh = &recv->grh;
> 
> @@ -2059,7 +2187,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
>  		snoop_recv(qp_info, &recv->header.recv_wc, IB_MAD_SNOOP_RECVS);
> 
>  	/* Validate MAD */
> -	if (!validate_mad(&recv->mad.mad.mad_hdr, qp_info->qp->qp_num))
> +	if (!validate_mad(&recv->mad.mad.mad_hdr, qp_info, opa))
>  		goto out;
> 
>  	response = alloc_mad_priv(port_priv->device, &resp_mad_size);
> @@ -2076,8 +2204,7 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
> 
>  	if (recv->mad.mad.mad_hdr.mgmt_class ==
>  	    IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) {
> -		if (handle_ib_smi(port_priv, qp_info, wc, port_num, recv,
> -				  response)
> +		if (handle_smi(port_priv, qp_info, wc, port_num, recv, response, opa)
>  		    == IB_SMI_DISCARD)
>  			goto out;
>  	}
> @@ -2099,7 +2226,9 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
>  						    &recv->grh, wc,
>  						    port_priv->device,
>  						    port_num,
> -						    qp_info->qp->qp_num);
> +						    qp_info->qp->qp_num,
> +						    resp_mad_size,
> +						    opa);
>  				goto out;
>  			}
>  		}
> @@ -2114,9 +2243,12 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
>  		 */
>  		recv = NULL;
>  	} else if ((ret & IB_MAD_RESULT_SUCCESS) &&
> -		   generate_unmatched_resp(recv, response)) {
> +		   generate_unmatched_resp(recv, response, &resp_mad_size, opa)) {
>  		agent_send_response(&response->mad.mad, &recv->grh, wc,
> -				    port_priv->device, port_num, qp_info->qp->qp_num);
> +				    port_priv->device, port_num,
> +				    qp_info->qp->qp_num,
> +				    resp_mad_size,
> +				    opa);
>  	}
> 
>  out:
> @@ -2381,6 +2513,23 @@ static void mad_error_handler(struct ib_mad_port_private *port_priv,
>  	}
>  }
> 
> +static void ib_mad_recv_mad(struct ib_mad_port_private *port_priv,
> +			    struct ib_wc *wc)
> +{
> +	struct ib_mad_qp_info *qp_info;
> +	struct ib_mad_list_head *mad_list;
> +	struct ib_mad_private_header *mad_priv_hdr;
> +
> +	mad_list = (struct ib_mad_list_head *)(unsigned long)wc->wr_id;
> +	qp_info = mad_list->mad_queue->qp_info;
> +	dequeue_mad(mad_list);
> +
> +	mad_priv_hdr = container_of(mad_list, struct ib_mad_private_header,
> +				    mad_list);
> +
> +	ib_mad_recv_done_handler(port_priv, wc, mad_priv_hdr, qp_info);
> +}
> +
>  /*
>   * IB MAD completion callback
>   */
> @@ -2399,7 +2548,7 @@ static void ib_mad_completion_handler(struct work_struct *work)
>  				ib_mad_send_done_handler(port_priv, &wc);
>  				break;
>  			case IB_WC_RECV:
> -				ib_mad_recv_done_handler(port_priv, &wc);
> +				ib_mad_recv_mad(port_priv, &wc);
>  				break;
>  			default:
>  				BUG_ON(1);
> @@ -2518,10 +2667,14 @@ static void local_completions(struct work_struct *work)
>  	int free_mad;
>  	struct ib_wc wc;
>  	struct ib_mad_send_wc mad_send_wc;
> +	int opa;
> 
>  	mad_agent_priv =
>  		container_of(work, struct ib_mad_agent_private, local_work);
> 
> +	opa = mad_agent_priv->qp_info->qp->device->cached_dev_attrs.device_cap_flags2 &
> +	      IB_DEVICE_OPA_MAD_SUPPORT;
> +
>  	spin_lock_irqsave(&mad_agent_priv->lock, flags);
>  	while (!list_empty(&mad_agent_priv->local_list)) {
>  		local = list_entry(mad_agent_priv->local_list.next,
> @@ -2531,6 +2684,7 @@ static void local_completions(struct work_struct *work)
>  		spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
>  		free_mad = 0;
>  		if (local->mad_priv) {
> +			u8 base_version;
>  			recv_mad_agent = local->recv_mad_agent;
>  			if (!recv_mad_agent) {
>  				dev_err(&mad_agent_priv->agent.device->dev,
> @@ -2546,11 +2700,20 @@ static void local_completions(struct work_struct *work)
>  			build_smp_wc(recv_mad_agent->agent.qp,
>  				     (unsigned long) local->mad_send_wr,
>  				     be16_to_cpu(IB_LID_PERMISSIVE),
> -				     0, recv_mad_agent->agent.port_num, &wc);
> +				     local->mad_send_wr->send_wr.wr.ud.pkey_index,
> +				     recv_mad_agent->agent.port_num, &wc);
> 
>  			local->mad_priv->header.recv_wc.wc = &wc;
> -			local->mad_priv->header.recv_wc.mad_len =
> -						sizeof(struct ib_mad);
> +
> +			base_version = local->mad_priv->mad.mad.mad_hdr.base_version;
> +			if (opa && base_version == OPA_MGMT_BASE_VERSION) {

Okay, how about having something like this?

int is_opa_mad(struct ib_mad_private *mad_priv)

that returns true if the MAD is a new OPA MAD
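e.g. (sketch):

static int is_opa_mad(struct ib_mad_private *mad_priv)
{
	return mad_priv->mad.mad.mad_hdr.base_version == OPA_MGMT_BASE_VERSION;
}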

> +				local->mad_priv->header.recv_wc.mad_len = local->return_wc_byte_len;

The mad_len calculation seems like it should be the same in all cases.

> +				local->mad_priv->header.recv_wc.mad_seg_size = sizeof(struct jumbo_mad);
> +			} else {
> +				local->mad_priv->header.recv_wc.mad_len = sizeof(struct ib_mad);
> +				local->mad_priv->header.recv_wc.mad_seg_size = sizeof(struct ib_mad);
> +			}
> +
>  			INIT_LIST_HEAD(&local->mad_priv->header.recv_wc.rmpp_list);
>  			list_add(&local->mad_priv->header.recv_wc.recv_buf.list,
>  				 &local->mad_priv->header.recv_wc.rmpp_list);
> @@ -2699,7 +2862,7 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
>  	struct ib_mad_queue *recv_queue = &qp_info->recv_queue;
> 
>  	/* Initialize common scatter list fields */
> -	sg_list.length = sizeof *mad_priv - sizeof mad_priv->header;
> +	sg_list.length = mad_recv_buf_size(qp_info->port_priv->device);
>  	sg_list.lkey = (*qp_info->port_priv->mr).lkey;
> 
>  	/* Initialize common receive WR fields */
> diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
> index 141b05a..dd42ace 100644
> --- a/drivers/infiniband/core/mad_priv.h
> +++ b/drivers/infiniband/core/mad_priv.h
> @@ -154,6 +154,7 @@ struct ib_mad_local_private {
>  	struct ib_mad_private *mad_priv;
>  	struct ib_mad_agent_private *recv_mad_agent;
>  	struct ib_mad_send_wr_private *mad_send_wr;
> +	size_t return_wc_byte_len;
>  };
> 
>  struct ib_mad_mgmt_method_table {
> diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
> index 7184530..6f69d5a 100644
> --- a/drivers/infiniband/core/mad_rmpp.c
> +++ b/drivers/infiniband/core/mad_rmpp.c
> @@ -1,6 +1,7 @@
>  /*
>   * Copyright (c) 2005 Intel Inc. All rights reserved.
>   * Copyright (c) 2005-2006 Voltaire, Inc. All rights reserved.
> + * Copyright (c) 2014 Intel Corporation.  All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> @@ -67,6 +68,7 @@ struct mad_rmpp_recv {
>  	u8 mgmt_class;
>  	u8 class_version;
>  	u8 method;
> +	u8 base_version;

You're not really caring about the base version, right?  You really just want to know if this is an OPA MAD.
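i.e., the struct could carry the result of that check instead (sketch):

	u8 opa;	/* nonzero when reassembling an OPA (jumbo) MAD */

set once in create_rmpp_recv() from mad_hdr->base_version, so get_mad_len() doesn't re-derive it.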

>  };
> 
>  static inline void deref_rmpp_recv(struct mad_rmpp_recv *rmpp_recv)
> @@ -318,6 +320,7 @@ create_rmpp_recv(struct ib_mad_agent_private *agent,
>  	rmpp_recv->mgmt_class = mad_hdr->mgmt_class;
>  	rmpp_recv->class_version = mad_hdr->class_version;
>  	rmpp_recv->method  = mad_hdr->method;
> +	rmpp_recv->base_version  = mad_hdr->base_version;
>  	return rmpp_recv;
> 
>  error:	kfree(rmpp_recv);
> @@ -431,16 +434,25 @@ static void update_seg_num(struct mad_rmpp_recv *rmpp_recv,
> 
>  static inline int get_mad_len(struct mad_rmpp_recv *rmpp_recv)
>  {
> -	struct ib_rmpp_mad *rmpp_mad;
> +	struct ib_rmpp_base *rmpp_base;
>  	int hdr_size, data_size, pad;
> +	int opa = rmpp_recv->agent->qp_info->qp->device->cached_dev_attrs.device_cap_flags2 &
> +		  IB_DEVICE_OPA_MAD_SUPPORT;
> 
> -	rmpp_mad = (struct ib_rmpp_mad *)rmpp_recv->cur_seg_buf->mad;
> +	rmpp_base = (struct ib_rmpp_base *)rmpp_recv->cur_seg_buf->mad;
> 
> -	hdr_size = ib_get_mad_data_offset(rmpp_mad->base.mad_hdr.mgmt_class);
> -	data_size = sizeof(struct ib_rmpp_mad) - hdr_size;
> -	pad = IB_MGMT_RMPP_DATA - be32_to_cpu(rmpp_mad->base.rmpp_hdr.paylen_newwin);
> -	if (pad > IB_MGMT_RMPP_DATA || pad < 0)
> -		pad = 0;
> +	hdr_size = ib_get_mad_data_offset(rmpp_base->mad_hdr.mgmt_class);
> +	if (opa && rmpp_recv->base_version == OPA_MGMT_BASE_VERSION) {
> +		data_size = sizeof(struct jumbo_rmpp_mad) - hdr_size;
> +		pad = JUMBO_MGMT_RMPP_DATA - be32_to_cpu(rmpp_base->rmpp_hdr.paylen_newwin);
> +		if (pad > JUMBO_MGMT_RMPP_DATA || pad < 0)
> +			pad = 0;
> +	} else {
> +		data_size = sizeof(struct ib_rmpp_mad) - hdr_size;
> +		pad = IB_MGMT_RMPP_DATA - be32_to_cpu(rmpp_base->rmpp_hdr.paylen_newwin);
> +		if (pad > IB_MGMT_RMPP_DATA || pad < 0)
> +			pad = 0;
> +	}
> 
>  	return hdr_size + rmpp_recv->seg_num * data_size - pad;
>  }
> @@ -933,11 +945,11 @@ int ib_process_rmpp_send_wc(struct ib_mad_send_wr_private *mad_send_wr,
> 
>  int ib_retry_rmpp(struct ib_mad_send_wr_private *mad_send_wr)
>  {
> -	struct ib_rmpp_base *rmpp_base;
> +	struct ib_rmpp_mad *rmpp_mad;
>  	int ret;
> 
> -	rmpp_base = mad_send_wr->send_buf.mad;
> -	if (!(ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) &
> +	rmpp_mad = mad_send_wr->send_buf.mad;
> +	if (!(ib_get_rmpp_flags(&rmpp_mad->base.rmpp_hdr) &
>  	      IB_MGMT_RMPP_FLAG_ACTIVE))
>  		return IB_RMPP_RESULT_UNHANDLED; /* RMPP not active */
> 
> diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
> index ac33d34..1192f6c 100644
> --- a/drivers/infiniband/core/user_mad.c
> +++ b/drivers/infiniband/core/user_mad.c
> @@ -263,20 +263,23 @@ static ssize_t copy_recv_mad(struct ib_umad_file *file, char __user *buf,
>  {
>  	struct ib_mad_recv_buf *recv_buf;
>  	int left, seg_payload, offset, max_seg_payload;
> +	size_t seg_size;
> 
> -	/* We need enough room to copy the first (or only) MAD segment. */
>  	recv_buf = &packet->recv_wc->recv_buf;
> -	if ((packet->length <= sizeof (*recv_buf->mad) &&
> +	seg_size = packet->recv_wc->mad_seg_size;
> +
> +	/* We need enough room to copy the first (or only) MAD segment. */
> +	if ((packet->length <= seg_size &&
>  	     count < hdr_size(file) + packet->length) ||
> -	    (packet->length > sizeof (*recv_buf->mad) &&
> -	     count < hdr_size(file) + sizeof (*recv_buf->mad)))
> +	    (packet->length > seg_size &&
> +	     count < hdr_size(file) + seg_size))
>  		return -EINVAL;
> 
>  	if (copy_to_user(buf, &packet->mad, hdr_size(file)))
>  		return -EFAULT;
> 
>  	buf += hdr_size(file);
> -	seg_payload = min_t(int, packet->length, sizeof (*recv_buf->mad));
> +	seg_payload = min_t(int, packet->length, seg_size);
>  	if (copy_to_user(buf, recv_buf->mad, seg_payload))
>  		return -EFAULT;
> 
> @@ -293,7 +296,7 @@ static ssize_t copy_recv_mad(struct ib_umad_file *file, char __user *buf,
>  			return -ENOSPC;
>  		}
>  		offset = ib_get_mad_data_offset(recv_buf->mad->mad_hdr.mgmt_class);
> -		max_seg_payload = sizeof (struct ib_mad) - offset;
> +		max_seg_payload = seg_size - offset;
> 
>  		for (left = packet->length - seg_payload, buf += seg_payload;
>  		     left; left -= seg_payload, buf += seg_payload) {
> @@ -448,9 +451,10 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
>  	struct ib_mad_agent *agent;
>  	struct ib_ah_attr ah_attr;
>  	struct ib_ah *ah;
> -	struct ib_rmpp_base *rmpp_base;
> +	struct ib_rmpp_mad *rmpp_mad;
>  	__be64 *tid;
>  	int ret, data_len, hdr_len, copy_offset, rmpp_active;
> +	u8 base_version;
> 
>  	if (count < hdr_size(file) + IB_MGMT_RMPP_HDR)
>  		return -EINVAL;
> @@ -504,25 +508,26 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
>  		goto err_up;
>  	}
> 
> -	rmpp_base = (struct ib_rmpp_base *) packet->mad.data;
> -	hdr_len = ib_get_mad_data_offset(rmpp_base->mad_hdr.mgmt_class);
> +	rmpp_mad = (struct ib_rmpp_mad *) packet->mad.data;
> +	hdr_len = ib_get_mad_data_offset(rmpp_mad->base.mad_hdr.mgmt_class);
> 
> -	if (ib_is_mad_class_rmpp(rmpp_base->mad_hdr.mgmt_class)
> +	if (ib_is_mad_class_rmpp(rmpp_mad->base.mad_hdr.mgmt_class)
>  	    && ib_mad_kernel_rmpp_agent(agent)) {
>  		copy_offset = IB_MGMT_RMPP_HDR;
> -		rmpp_active = ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) &
> +		rmpp_active = ib_get_rmpp_flags(&rmpp_mad->base.rmpp_hdr) &
>  						IB_MGMT_RMPP_FLAG_ACTIVE;
>  	} else {
>  		copy_offset = IB_MGMT_MAD_HDR;
>  		rmpp_active = 0;
>  	}
> 
> +	base_version = ((struct ib_mad_hdr *)&packet->mad.data)->base_version;
>  	data_len = count - hdr_size(file) - hdr_len;
>  	packet->msg = ib_create_send_mad(agent,
>  					 be32_to_cpu(packet->mad.hdr.qpn),
>  					 packet->mad.hdr.pkey_index, rmpp_active,
>  					 hdr_len, data_len, GFP_KERNEL,
> -					 IB_MGMT_BASE_VERSION);
> +					 base_version);
>  	if (IS_ERR(packet->msg)) {
>  		ret = PTR_ERR(packet->msg);
>  		goto err_ah;
> @@ -558,12 +563,12 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
>  		tid = &((struct ib_mad_hdr *) packet->msg->mad)->tid;
>  		*tid = cpu_to_be64(((u64) agent->hi_tid) << 32 |
>  				   (be64_to_cpup(tid) & 0xffffffff));
> -		rmpp_base->mad_hdr.tid = *tid;
> +		rmpp_mad->base.mad_hdr.tid = *tid;
>  	}
> 
>  	if (!ib_mad_kernel_rmpp_agent(agent)
> -	   && ib_is_mad_class_rmpp(rmpp_base->mad_hdr.mgmt_class)
> -	   && (ib_get_rmpp_flags(&rmpp_base->rmpp_hdr) & IB_MGMT_RMPP_FLAG_ACTIVE)) {
> +	   && ib_is_mad_class_rmpp(rmpp_mad->base.mad_hdr.mgmt_class)
> +	   && (ib_get_rmpp_flags(&rmpp_mad->base.rmpp_hdr) & IB_MGMT_RMPP_FLAG_ACTIVE)) {
>  		spin_lock_irq(&file->send_lock);
>  		list_add_tail(&packet->list, &file->send_list);
>  		spin_unlock_irq(&file->send_lock);
> diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
> index 8938f1e..f5b6a27 100644
> --- a/include/rdma/ib_mad.h
> +++ b/include/rdma/ib_mad.h
> @@ -436,6 +436,7 @@ struct ib_mad_recv_buf {
>   * @recv_buf: Specifies the location of the received data buffer(s).
>   * @rmpp_list: Specifies a list of RMPP reassembled received MAD buffers.
>   * @mad_len: The length of the received MAD, without duplicated headers.
> + * @mad_seg_size: The size of individual MAD segments
>   *
>   * For received response, the wr_id contains a pointer to the ib_mad_send_buf
>   *   for the corresponding send request.
> @@ -445,6 +446,7 @@ struct ib_mad_recv_wc {
>  	struct ib_mad_recv_buf	recv_buf;
>  	struct list_head	rmpp_list;
>  	int			mad_len;
> +	size_t			mad_seg_size;
>  };
> 
>  /**
> --
> 1.8.2
> 

* Re: [PATCH v4 02/19] IB/core: Cache device attributes for use by upper level drivers
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD574-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-04-06 22:10           ` ira.weiny
       [not found]             ` <20150406221044.GA433-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: ira.weiny @ 2015-04-06 22:10 UTC (permalink / raw)
  To: Hefty, Sean, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Fri, Apr 03, 2015 at 02:43:46PM -0600, Hefty, Sean wrote:
> > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> > index 0d74f1d..0116e4b 100644
> > --- a/include/rdma/ib_verbs.h
> > +++ b/include/rdma/ib_verbs.h
> > @@ -1675,6 +1675,7 @@ struct ib_device {
> >  	u32			     local_dma_lkey;
> >  	u8                           node_type;
> >  	u8                           phys_port_cnt;
> > +	struct ib_device_attr        cached_dev_attrs;
> >  };
> 
> Looking at the device attributes, I think all of the values are static for a given device.
>

Yes

> If this is indeed the case, then I would just remove the word 'cached' from the field name.  Cached makes me think of the values dynamically changing, and if that's the case, then this patch isn't sufficient.
> 

I understand your point and I originally called this "attributes" but it was
suggested to call it cached_dev_attrs.

https://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg22486.html

I can change it back if we are all agreed.


Roland?


> Alternatively, if there's only a few values that ULPs need, maybe just store those.

This patch was suggested because ULPs need too many of the flags for storing
only a select few to be practical.

A quote from Or:

"I find it very annoying that upper level drivers replicate in different ways
elements from the IB device attributes returned by ib_query_device. I met that
in multiple drivers and upcoming designs for which I do code review. Are you up
to come up with a patch that caches the device attributes on the device
structure? if not, I can do that.. and have your code to see it."

https://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg22011.html

Roland, is such a patch even acceptable, or would you prefer to have a
query_device call by each ULP?

Ira


* RE: [PATCH v4 02/19] IB/core: Cache device attributes for use by upper level drivers
       [not found]             ` <20150406221044.GA433-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
@ 2015-04-06 22:43               ` Hefty, Sean
       [not found]                 ` <1828884A29C6694DAF28B7E6B8A82373A8FBDE3E-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hefty, Sean @ 2015-04-06 22:43 UTC (permalink / raw)
  To: Weiny, Ira, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

> I understand your point and I originally called this "attributes" but it
> was
> suggested to call it cached_dev_attrs.
> 
> https://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg22486.html
> 
> I can change it back if we are all agreed.

I'll disagree with Or on this.  Unless some of these values can change dynamically, these are the attributes.

This leads me to ask: is there a reason to keep the per-device query_attribute() call?  (This would not be part of this series.)  Are there attributes being returned from the kernel to user space that fall outside of the defined attribute area?


* Re: [PATCH v4 19/19] IB/mad: Implement Intel Omni-Path Architecture MAD processing
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD9DE-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-04-07  2:29           ` Jason Gunthorpe
       [not found]             ` <20150407022954.GA7531-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-04-07 17:15             ` Hefty, Sean
  0 siblings, 2 replies; 84+ messages in thread
From: Jason Gunthorpe @ 2015-04-07  2:29 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Weiny, Ira, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Sat, Apr 04, 2015 at 01:44:40AM +0000, Hefty, Sean wrote:

> > +	if (opa && smp->class_version == OPA_SMP_CLASS_VERSION) {
> 
> There are several places where this sort of check is made.  IMO,
> this check should only require looking at the MAD, not the MAD + the
> device attributes that the MAD will be transferred on.  I would
> actually prefer to see this as:
> 
> 	if (smp->class_version == OPA_SMP_CLASS_VERSION)
> 
> That check is sufficient.  There is no conflict with IB MADs, and
> it's needlessly complicates the code to assume that the IBTA is
> going to someday define another 128 class versions in such a way
> that those versions will not require any other changes to the code.

Hal asked for this, and I agree. It is just lazy not to check the
underlying device type for this stuff - they are different number
spaces, administered by different bodies with no apparent
coordination.

The IBA is pretty clear about what should happen when processing an
unsupported class version, and adding OPA shouldn't suddenly make the IB
side non-conformant, however aesthetically unpleasing the code may be.

Jason

* RE: [PATCH v4 19/19] IB/mad: Implement Intel Omni-Path Architecture MAD processing
       [not found]             ` <20150407022954.GA7531-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-04-07 16:57               ` Hefty, Sean
       [not found]                 ` <1828884A29C6694DAF28B7E6B8A82373A8FBE150-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hefty, Sean @ 2015-04-07 16:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Weiny, Ira, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

> Hal asked for this, and I agree. It is just lazy not to check the
> underlying device type for this stuff - they are different number
> spaces, administered by different bodies with no apparent
> coordination.
> 
> The IBA is pretty clear about what should happen when processing an
> unsupported class version, and adding OPA shouldn't suddenly make the IB
> side non-conformant, however aesthetically unpleasing the code may be.

Are there checks to ensure that MADs not supported by RoCE aren't processed on RoCE ports?

* RE: [PATCH v4 19/19] IB/mad: Implement Intel Omni-Path Architecture MAD processing
  2015-04-07  2:29           ` Jason Gunthorpe
       [not found]             ` <20150407022954.GA7531-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-04-07 17:15             ` Hefty, Sean
       [not found]               ` <1828884A29C6694DAF28B7E6B8A82373A8FBE17D-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  1 sibling, 1 reply; 84+ messages in thread
From: Hefty, Sean @ 2015-04-07 17:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Weiny, Ira, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

> > The IBA is pretty clear about what should happen when processing an
> > unsupported class version, and adding OPA shouldn't suddenly make the
> > IB side non-conformant, however aesthetically unpleasing the code may be.
> 
> Are there checks to ensure that MADs not supported by RoCE aren't
> processed on RoCE ports?

For the receive side, we need to enhance validate_mad().  A similar check could be added to the send side, if we don't trust the senders.

Mgmt_class checks are everywhere in the code.  Pairing each with device checks for IB, RoCE, or OPA would be unwieldy.

IMO, a MAD should be self-identifying.  E.g. update the mad private structure to indicate what sort of MAD it is -- is_smp(), is_gmp(), is_rmpp(), is_opa(), is_ib(), much like the device checks are being updated.
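
Something like the following sketch, for instance (the flag and helper
names here are hypothetical, following the examples above):

	/* identity decoded once during initial validation */
	enum mad_ident {
		MAD_IDENT_IB	= 1 << 0,
		MAD_IDENT_OPA	= 1 << 1,
		MAD_IDENT_SMP	= 1 << 2,
		MAD_IDENT_GMP	= 1 << 3,
		MAD_IDENT_RMPP	= 1 << 4,
	};

	/* e.g. stored in a new 'ident' field of struct ib_mad_private */
	static inline int is_opa(struct ib_mad_private *mad)
	{
		return mad->ident & MAD_IDENT_OPA;
	}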

* Re: [PATCH v4 19/19] IB/mad: Implement Intel Omni-Path Architecture MAD processing
       [not found]                 ` <1828884A29C6694DAF28B7E6B8A82373A8FBE150-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-04-07 17:16                   ` Jason Gunthorpe
       [not found]                     ` <20150407171659.GA15634-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Jason Gunthorpe @ 2015-04-07 17:16 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Weiny, Ira, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Tue, Apr 07, 2015 at 04:57:43PM +0000, Hefty, Sean wrote:
> > Hal asked for this, and I agree. It is just lazy not to check the
> > underlying device type for this stuff - they are different number
> > spaces, administered by different bodies with no apparent
> > coordination.
> > 
> > The IBA is pretty clear about what should happen when processing an
> > unsupported class version, and adding OPA shouldn't suddenly make the
> > IB side non-conformant, however aesthetically unpleasing the code may be.
> 
> Are there checks to ensure that MADs not supported by RoCE aren't
> processed on RoCE ports?

RoCE isn't different, it uses the same numbering space and rules as
IB.

Jason

* Re: [PATCH v4 19/19] IB/mad: Implement Intel Omni-Path Architecture MAD processing
       [not found]               ` <1828884A29C6694DAF28B7E6B8A82373A8FBE17D-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-04-07 17:19                 ` Jason Gunthorpe
       [not found]                   ` <20150407171920.GB15634-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Jason Gunthorpe @ 2015-04-07 17:19 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Weiny, Ira, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Tue, Apr 07, 2015 at 05:15:07PM +0000, Hefty, Sean wrote:

> IMO, a MAD should be self-identifying.  E.g. update the mad private
> structure to indicate what sort of MAD it is -- is_smp(), is_gmp(),
> is_rmpp(), is_opa(), is_ib(), much like the device checks are being
> updated.

Decoding this into the mad private structure during initial validation
seems like a good idea.

Jason

* Re: [PATCH v4 12/19] IB/mad: Add MAD size parameters to process_mad
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD692-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-04-07 17:23           ` ira.weiny
       [not found]             ` <20150407172303.GB433-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: ira.weiny @ 2015-04-07 17:23 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Fri, Apr 03, 2015 at 04:40:20PM -0600, Hefty, Sean wrote:
> > -static struct ib_mad_private *alloc_mad_priv(struct ib_device *dev)
> > +static struct ib_mad_private *alloc_mad_priv(struct ib_device *dev,
> > +					     size_t *mad_size)
> >  {
> > +	*mad_size = dev->cached_dev_attrs.max_mad_size;
> 
> Why does this function return the value that the caller can just read from the device?
>
> Actually, it's odd for an alloc() call to return how much it allocated, rather than taking that as input.

True.

This was done just for the convenience of the callers.  The previous patch used
the ib_device to determine the size; I was just trying to keep that semantic,
with the function "helping" the caller...

I'll change this to:

static struct ib_mad_private *alloc_mad_priv(size_t mad_size, gfp_t flags);

which also addresses Doug's comments regarding GFP_ATOMIC.
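
A minimal sketch of the result (keeping kmalloc as the underlying
allocator, as in the current patch):

	static struct ib_mad_private *alloc_mad_priv(size_t mad_size,
						     gfp_t flags)
	{
		/* header + GRH + device-sized MAD buffer */
		return kmalloc(sizeof(struct ib_mad_private_header) +
			       sizeof(struct ib_grh) + mad_size, flags);
	}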

> 
> >  	return (kmalloc(sizeof(struct ib_mad_private_header) +
> > -			sizeof(struct ib_grh) +
> > -			dev->cached_dev_attrs.max_mad_size, GFP_ATOMIC));
> > +			sizeof(struct ib_grh) + *mad_size, GFP_ATOMIC));
> >  }
> > 
> >  /*
> > @@ -741,6 +742,8 @@ static int handle_outgoing_dr_smp(struct
> > ib_mad_agent_private *mad_agent_priv,
> >  	u8 port_num;
> >  	struct ib_wc mad_wc;
> >  	struct ib_send_wr *send_wr = &mad_send_wr->send_wr;
> > +	size_t in_mad_size = mad_agent_priv->agent.device->cached_dev_attrs.max_mad_size;
> > +	size_t out_mad_size;
> > 
> >  	if (device->node_type == RDMA_NODE_IB_SWITCH &&
> >  	    smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
> > @@ -777,7 +780,7 @@ static int handle_outgoing_dr_smp(struct
> > ib_mad_agent_private *mad_agent_priv,
> >  	local->mad_priv = NULL;
> >  	local->recv_mad_agent = NULL;
> > 
> > -	mad_priv = alloc_mad_priv(mad_agent_priv->agent.device);
> > +	mad_priv = alloc_mad_priv(mad_agent_priv->agent.device, &out_mad_size);
> >  	if (!mad_priv) {
> >  		ret = -ENOMEM;
> >  		dev_err(&device->dev, "No memory for local response MAD\n");
> > @@ -792,8 +795,9 @@ static int handle_outgoing_dr_smp(struct
> > ib_mad_agent_private *mad_agent_priv,
> > 
> >  	/* No GRH for DR SMP */
> >  	ret = device->process_mad(device, 0, port_num, &mad_wc, NULL,
> > -				  (struct ib_mad *)smp,
> > -				  (struct ib_mad *)&mad_priv->mad);
> > +				  (struct ib_mad_hdr *)smp, in_mad_size,
> > +				  (struct ib_mad_hdr *)&mad_priv->mad,
> > +				  &out_mad_size);
> 
> Rather than calling device->process_mad() directly, would it be better to call a common function?  So we can avoid adding:
> 
> > +	struct ib_mad *in_mad = (struct ib_mad *)in;
> > +	struct ib_mad *out_mad = (struct ib_mad *)out;
> > +
> > +	if (in_mad_size != sizeof(*in_mad) || *out_mad_size != sizeof(*out_mad))
> > +		return IB_MAD_RESULT_FAILURE;
> 
> to existing drivers?

No.  The checks need to be done by the device drivers.  IB devices expect
exactly 256 bytes, OPA devices expect 2K, while some drivers don't care at all
(they don't support MADs).

However, upon further reflection, these checks indicate a programming error:
either the device is reporting the wrong MAD size or there is a bug in the MAD
module.  Therefore these should be WARN_ONs.
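
For an IB-only driver the check then reads something like this sketch
(its placement within each driver's process_mad is illustrative):

	struct ib_mad *in_mad = (struct ib_mad *)in;
	struct ib_mad *out_mad = (struct ib_mad *)out;

	/* a size mismatch here is a bug in the MAD layer, not bad input */
	if (WARN_ON(in_mad_size != sizeof(*in_mad) ||
		    *out_mad_size != sizeof(*out_mad)))
		return IB_MAD_RESULT_FAILURE;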

> 
> 
> >  	switch (ret)
> >  	{
> >  	case IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY:
> > @@ -2011,6 +2015,7 @@ static void ib_mad_recv_done_handler(struct
> > ib_mad_port_private *port_priv,
> >  	struct ib_mad_agent_private *mad_agent;
> >  	int port_num;
> >  	int ret = IB_MAD_RESULT_SUCCESS;
> > +	size_t resp_mad_size;
> > 
> >  	mad_list = (struct ib_mad_list_head *)(unsigned long)wc->wr_id;
> >  	qp_info = mad_list->mad_queue->qp_info;
> > @@ -2038,7 +2043,7 @@ static void ib_mad_recv_done_handler(struct
> > ib_mad_port_private *port_priv,
> >  	if (!validate_mad(&recv->mad.mad.mad_hdr, qp_info->qp->qp_num))
> >  		goto out;
> > 
> > -	response = alloc_mad_priv(port_priv->device);
> > +	response = alloc_mad_priv(port_priv->device, &resp_mad_size);
> >  	if (!response) {
> >  		dev_err(&port_priv->device->dev,
> >  			"ib_mad_recv_done_handler no memory for response
> > buffer\n");
> > @@ -2063,8 +2068,10 @@ static void ib_mad_recv_done_handler(struct
> > ib_mad_port_private *port_priv,
> >  		ret = port_priv->device->process_mad(port_priv->device, 0,
> >  						     port_priv->port_num,
> >  						     wc, &recv->grh,
> > -						     &recv->mad.mad,
> > -						     &response->mad.mad);
> > +						     (struct ib_mad_hdr *)&recv->mad.mad,
> > +						     port_priv->device->cached_dev_attrs.max_mad_size,
> 
> This is the size of the allocated buffer.

Yes

> Something based on wc.byte_len seems like a better option.

The IB process_mad calls currently have no size checks and expect the full IB
MAD...  I did not want to break that semantic and risk introducing bugs in
those drivers.

While it seems byte_len should be sufficient, I can't test all of the devices.

Furthermore, for OPA devices there is no issue with passing the full buffer
size, so I prefer to leave this as is.
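
For reference, the byte_len alternative would amount to something like this
one-line sketch, taking the received length from the work completion:

	/* received MAD length = bytes on the wire minus the GRH */
	size_t in_mad_size = wc->byte_len - sizeof(struct ib_grh);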

Ira

> 
> 
> > +						     (struct ib_mad_hdr *)&response->mad.mad,
> > +						     &resp_mad_size);
> >  		if (ret & IB_MAD_RESULT_SUCCESS) {
> >  			if (ret & IB_MAD_RESULT_CONSUMED)
> >  				goto out;
> > @@ -2687,7 +2694,10 @@ static int ib_mad_post_receive_mads(struct
> > ib_mad_qp_info *qp_info,
> >  			mad_priv = mad;
> >  			mad = NULL;
> >  		} else {
> > -			mad_priv = alloc_mad_priv(qp_info->port_priv->device);
> > +			size_t mad_size;
> > +
> > +			mad_priv = alloc_mad_priv(qp_info->port_priv->device,
> > +						  &mad_size);
> >  			if (!mad_priv) {
> >  				dev_err(&qp_info->port_priv->device->dev,
> >  					"No memory for receive buffer\n");

* RE: [PATCH v4 19/19] IB/mad: Implement Intel Omni-Path Architecture MAD processing
       [not found]                     ` <20150407171659.GA15634-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-04-07 17:41                       ` Hefty, Sean
       [not found]                         ` <1828884A29C6694DAF28B7E6B8A82373A8FBE1EB-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hefty, Sean @ 2015-04-07 17:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Weiny, Ira, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

> > > The IBA is pretty clear about what should happen when processing an
> > > unsupported class version, and adding OPA shouldn't suddenly make the
> > > IB side non-conformant, however aesthetically unpleasing the code may be.
> >
> > Are there checks to ensure that MADs not supported by RoCE aren't
> > processed on RoCE ports?
> 
> RoCE isn't different, it uses the same numbering space and rules as
> IB.

I was agreeing with the point made about non-compliance.  RoCE only supports a subset of IB management, and AFAIK, there are no checks to validate MADs received over a RoCE port.  Those should be added.

* RE: [PATCH v4 12/19] IB/mad: Add MAD size parameters to process_mad
       [not found]             ` <20150407172303.GB433-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
@ 2015-04-07 17:53               ` Hefty, Sean
       [not found]                 ` <1828884A29C6694DAF28B7E6B8A82373A8FBE224-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: Hefty, Sean @ 2015-04-07 17:53 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

> > Rather than calling device->process_mad() directly, would it be better
> > to call a common function?  So we can avoid adding:
> >
> > > +	struct ib_mad *in_mad = (struct ib_mad *)in;
> > > +	struct ib_mad *out_mad = (struct ib_mad *)out;
> > > +
> > > +	if (in_mad_size != sizeof(*in_mad) || *out_mad_size != sizeof(*out_mad))
> > > +		return IB_MAD_RESULT_FAILURE;
> >
> > to existing drivers?
> 
> No.  The checks need to be done by the devices.  IB devices expect exactly
> 256
> bytes, OPA devices expect 2K, while some drivers don't care at all (don't
> support MADs).

The checks are per device, but that doesn't mean that every driver has to do the same check.

ib_process_mad(...)
{
	if (ib_device(...))
		insert check here
	else if (opa_device(...))
		insert some other check here
	else
		hit caller on nose
	dev->process_mad(...)
}

I'm fine either way.  It's more a matter of how much trust is given to the other kernel components.
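
Concretely, such a wrapper might look like the following sketch
(opa_device() is a hypothetical capability test standing in for the
IB_DEVICE_OPA_MAD_SUPPORT check; the signature follows this series):

	static int ib_process_mad(struct ib_device *dev, int flags, u8 port_num,
				  struct ib_wc *wc, struct ib_grh *grh,
				  struct ib_mad_hdr *in, size_t in_size,
				  struct ib_mad_hdr *out, size_t *out_size)
	{
		size_t want = opa_device(dev) ? sizeof(struct jumbo_mad) :
						sizeof(struct ib_mad);

		/* do the per-transport size validation once, centrally */
		if (WARN_ON(in_size != want))
			return IB_MAD_RESULT_FAILURE;

		return dev->process_mad(dev, flags, port_num, wc, grh,
					in, in_size, out, out_size);
	}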


* Re: [PATCH v4 19/19] IB/mad: Implement Intel Omni-Path Architecture MAD processing
       [not found]                         ` <1828884A29C6694DAF28B7E6B8A82373A8FBE1EB-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-04-07 18:06                           ` Jason Gunthorpe
  0 siblings, 0 replies; 84+ messages in thread
From: Jason Gunthorpe @ 2015-04-07 18:06 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Weiny, Ira, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Tue, Apr 07, 2015 at 05:41:34PM +0000, Hefty, Sean wrote:
> > > > The IBA is pretty clear about what should happen when processing an
> > > > unsupported class version, and adding OPA shouldn't suddenly make the
> > > > IB side non-conformant, however aesthetically unpleasing the code may be.
> > >
> > > Are there checks to ensure that MADs not supported by RoCE aren't
> > > processed on RoCE ports?
> > 
> > RoCE isn't different, it uses the same numbering space and rules as
> > IB.
> 
> I was agreeing with the point made about non-compliance.  RoCE only
> supports a subset of IB management, and AFAIK, there are no checks
> to validate MADs received over a RoCE port.  Those should be added.

RoCE follows the base IB spec pretty well, except that there is no QP0
(or SMPs, or SMI, etc). There is already code that avoids creating QP0
MAD stuff and the SMI for RoCE. A RoCE driver should never deliver a
QP0 packet to the mad layer.

There are already various checks that tie SMPs to QP0, on tx and also
in validate_mad.  E.g. validate_mad already refuses every SMP on RoCE,
since qp_num can never be 0.

So I think we are good already.  Every difference not caught by the
missing QP0 is handled by not registering certain agents for RoCE, which
looks OK already since I think most GSIs are supported.
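
The relevant validate_mad logic is roughly the following (sketched from
the existing code, omitting the base-version check):

	static int validate_mad(struct ib_mad_hdr *mad_hdr, u32 qp_num)
	{
		/* SMI classes are only valid on QP0... */
		if (mad_hdr->mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED ||
		    mad_hdr->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
			return qp_num == 0;

		/* ...and everything else only on the GSI QPs */
		return qp_num != 0;
	}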

Jason

* Re: [PATCH v4 15/19] IB/mad: Create jumbo_mad data structures
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD6C8-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2015-04-04  0:14           ` Hefty, Sean
@ 2015-04-08 15:33           ` ira.weiny
  1 sibling, 0 replies; 84+ messages in thread
From: ira.weiny @ 2015-04-08 15:33 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Fri, Apr 03, 2015 at 05:08:53PM -0600, Hefty, Sean wrote:
> > Define jumbo_mad and jumbo_rmpp_mad.
> 
> I would just use 'opa_mad' in place of 'jumbo_mad'.  Jumbo sounds like a marketing term or elephant name.

Done in v5.

>  
> > Jumbo MAD structures are 2K versions of ib_mad and ib_rmpp_mad structures.
> > Currently only OPA base version MADs are of this type.
> > 
> > Create an RMPP Base header to share between ib_rmpp_mad and jumbo_rmpp_mad
> > 
> > Update existing code to use the new structures.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > 
> > ---

[snip]

> > diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
> > index 00a5e51..80e7cf4 100644
> > --- a/include/rdma/ib_mad.h
> > +++ b/include/rdma/ib_mad.h
> > @@ -136,6 +136,11 @@ enum {
> >  	IB_MGMT_DEVICE_HDR = 64,
> >  	IB_MGMT_DEVICE_DATA = 192,
> >  	IB_MGMT_MAD_SIZE = IB_MGMT_MAD_HDR + IB_MGMT_MAD_DATA,
> > +	JUMBO_MGMT_MAD_HDR = IB_MGMT_MAD_HDR,
> > +	JUMBO_MGMT_MAD_DATA = 2024,
> > +	JUMBO_MGMT_RMPP_HDR = IB_MGMT_RMPP_HDR,
> > +	JUMBO_MGMT_RMPP_DATA = 2012,
> > +	JUMBO_MGMT_MAD_SIZE = JUMBO_MGMT_MAD_HDR + JUMBO_MGMT_MAD_DATA,
> 
> Keep the "IB_" prefix, or add a new "OPA_" prefix.

I'll change JUMBO to OPA to match the "jumbo" to "opa" change.

Integrating this and your follow up comment v5 of this hunk now reads:

@@ -136,6 +136,9 @@ enum {
        IB_MGMT_DEVICE_HDR = 64,
        IB_MGMT_DEVICE_DATA = 192,
        IB_MGMT_MAD_SIZE = IB_MGMT_MAD_HDR + IB_MGMT_MAD_DATA,
+       OPA_MGMT_MAD_DATA = 2024,
+       OPA_MGMT_RMPP_DATA = 2012,
+       OPA_MGMT_MAD_SIZE = IB_MGMT_MAD_HDR + OPA_MGMT_MAD_DATA,
 };
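
With the jumbo -> opa rename the structures presumably end up along these
lines (a sketch following the renames described above; the final names may
differ):

	struct opa_mad {
		struct ib_mad_hdr	mad_hdr;
		u8			data[OPA_MGMT_MAD_DATA];
	};

	struct opa_rmpp_mad {
		struct ib_rmpp_base	base;
		u8			data[OPA_MGMT_RMPP_DATA];
	};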

> 
> >  };
> > 
> >  struct ib_mad_hdr {
> > @@ -182,12 +187,26 @@ struct ib_mad {
> >  	u8			data[IB_MGMT_MAD_DATA];
> >  };
> > 
> > -struct ib_rmpp_mad {
> > +struct jumbo_mad {
> > +	struct ib_mad_hdr	mad_hdr;
> > +	u8			data[JUMBO_MGMT_MAD_DATA];
> > +};
> > +
> > +struct ib_rmpp_base {
> >  	struct ib_mad_hdr	mad_hdr;
> >  	struct ib_rmpp_hdr	rmpp_hdr;
> > +} __packed;
> > +
> > +struct ib_rmpp_mad {
> > +	struct ib_rmpp_base	base;
> >  	u8			data[IB_MGMT_RMPP_DATA];
> >  };
> > 
> > +struct jumbo_rmpp_mad {
> > +	struct ib_rmpp_base	base;
> > +	u8			data[JUMBO_MGMT_RMPP_DATA];
> > +};
> 
> Please separate this patch into 2 changes.  One that adds and updates ib_rmpp_base, with the second one defining ib_opa_mad & ib_opa_rmpp_mad (or whatever prefix is chosen).

Done in v5.

Ira


* Re: [PATCH v4 02/19] IB/core: Cache device attributes for use by upper level drivers
       [not found]                 ` <1828884A29C6694DAF28B7E6B8A82373A8FBDE3E-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-04-08 16:01                   ` ira.weiny
  0 siblings, 0 replies; 84+ messages in thread
From: ira.weiny @ 2015-04-08 16:01 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Mon, Apr 06, 2015 at 04:43:01PM -0600, Hefty, Sean wrote:
> > I understand your point and I originally called this "attributes" but it
> > was
> > suggested to call it cached_dev_attrs.
> > 
> > https://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg22486.html
> > 
> > I can change it back if we are all agreed.
> 
> I'll disagree with Or on this.  Unless some of these values can change dynamically, these are the attributes.

Or

Any comment?

Personally, this patch was not originally part of my series.  I had an
ib_query_device call, which Or objected to as "yet another caching" of a
device attribute.

With the discussions on "management helpers" I probably don't need this at all.
So, if I can remove it I will, and then we can debate its merits separately
from the OPA changes.

Ira

> 
> This leads me to, is there a reason to keep the per device query_attribute() call?  (This would not be part of this series.)  Are there attributes being returned from the kernel to user space that fall outside of the defined attribute area?
> 

* Re: [PATCH v4 12/19] IB/mad: Add MAD size parameters to process_mad
       [not found]                 ` <1828884A29C6694DAF28B7E6B8A82373A8FBE224-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-04-08 16:07                   ` ira.weiny
  0 siblings, 0 replies; 84+ messages in thread
From: ira.weiny @ 2015-04-08 16:07 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Tue, Apr 07, 2015 at 11:53:00AM -0600, Hefty, Sean wrote:
> > > Rather than calling device->process_mad() directly, would it be better
> > > to call a common function?  So we can avoid adding:
> > >
> > > > +	struct ib_mad *in_mad = (struct ib_mad *)in;
> > > > +	struct ib_mad *out_mad = (struct ib_mad *)out;
> > > > +
> > > > +	if (in_mad_size != sizeof(*in_mad) || *out_mad_size != sizeof(*out_mad))
> > > > +		return IB_MAD_RESULT_FAILURE;
> > >
> > > to existing drivers?
> > 
> > No.  The checks need to be done by the devices.  IB devices expect exactly
> > 256
> > bytes, OPA devices expect 2K, while some drivers don't care at all (don't
> > support MADs).
> 
> The checks are per device, but that doesn't mean that every driver has to do the same check.
> 
> ib_process_mad(...)
> {
> 	if (ib_device(...))
> 		insert check here
> 	else if (opa_device(...))
> 		insert some other check here
> 	else
> 		hit caller on nose

See, this is another oddity in the stack.  Devices which are neither OPA
nor IB (iWARP or RoCE) have some support for MADs, so we can't just "hit
the caller on the nose".

> 	dev->process_mad(...)
> }
> 
> I'm fine either way.  It's more a matter of how much trust is given to the other kernel components.

Indeed, I debated whether I should add such checks at all.  In the end I have
very little trust, so I did...  ;-)

However, to better reflect the nature of the error I've changed them all to
WARN_ONs.

Ira


* Re: [PATCH v4 19/19] IB/mad: Implement Intel Omni-Path Architecture MAD processing
       [not found]                   ` <20150407171920.GB15634-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-04-08 21:24                     ` ira.weiny
  0 siblings, 0 replies; 84+ messages in thread
From: ira.weiny @ 2015-04-08 21:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Hefty, Sean, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Tue, Apr 07, 2015 at 11:19:20AM -0600, Jason Gunthorpe wrote:
> On Tue, Apr 07, 2015 at 05:15:07PM +0000, Hefty, Sean wrote:
> 
> > IMO, a MAD should be self-identifying.

A "MAD" is self-identifying (contains versions and classes).

IBoE CM MADs are a sub-set of the IB MADs.  As are the PMA MADs which those
devices support.  So the MAD stack need not perform any checks on those MADs.

Additionally, the MAD stack simply checks if QP0 is supported and fails any
post to it, regardless of MAD content.  And by definition the MAD stack will
never get anything from QP0.

> > E.g. update the mad private
> > structure to indicate what sort of MAD it is -- is_smp(), is_gmp(),
> > is_rmpp(), is_opa(), is_ib(), much like the device checks are being
> > updated.
> 
> decording into the mad private structure during initial validation
> seems like a good idea.

I think what you mean is that the MAD needs to self-identify the "MAD space"
(OPA vs IB) in which it resides.  I agree this would be nice.

However, I need to work through the full implications of this.
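
If it works out, the minimal form would probably just key off the base
version, e.g. (a hypothetical helper, not part of this series):

	/* OPA MADs are distinguished by the new base version */
	static inline int ib_mad_is_opa(struct ib_mad_hdr *hdr)
	{
		return hdr->base_version == OPA_MGMT_BASE_VERSION;
	}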

Ira

> 
> Jason

* Re: [PATCH v4 17/19] IB/mad: Implement support for Intel Omni-Path Architecture base version MADs in ib_create_send_mad
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD722-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-04-08 21:36           ` ira.weiny
       [not found]             ` <20150408213612.GG433-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
  0 siblings, 1 reply; 84+ messages in thread
From: ira.weiny @ 2015-04-08 21:36 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Fri, Apr 03, 2015 at 05:40:22PM -0600, Hefty, Sean wrote:
> > @@ -937,20 +937,31 @@ struct ib_mad_send_buf * ib_create_send_mad(struct
> > ib_mad_agent *mad_agent,
> >  	struct ib_mad_send_wr_private *mad_send_wr;
> >  	int pad, message_size, ret, size;
> >  	void *buf;
> > +	size_t mad_size;
> > +	int opa;
> > 
> >  	mad_agent_priv = container_of(mad_agent, struct
> > ib_mad_agent_private,
> >  				      agent);
> > -	pad = get_pad_size(hdr_len, data_len);
> > +
> > +	opa = mad_agent_priv->agent.device->cached_dev_attrs.device_cap_flags2 &
> > +	      IB_DEVICE_OPA_MAD_SUPPORT;
> > +
> > +	if (opa && base_version == OPA_MGMT_BASE_VERSION)
> > +		mad_size = sizeof(struct jumbo_mad);
> > +	else
> > +		mad_size = sizeof(struct ib_mad);
> 
> Didn't an earlier patch make it possible to read the mad_size directly from the device?
> 

Not exactly; that patch specified the "max_mad_size" the device supports.

OPA devices also support IB MADs.  If the IB base version is specified on an
OPA device, the IB MAD size is to be used.  This maintains compatibility with
any software issuing IB MADs.
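
So, for example, an agent on an OPA device can still choose per MAD (a
sketch using the new base_version parameter; the other arguments are the
usual ib_create_send_mad ones):

	struct ib_mad_send_buf *msg;

	/* OPA-sized send buffer */
	msg = ib_create_send_mad(agent, remote_qpn, pkey_index, 0,
				 IB_MGMT_MAD_HDR, OPA_MGMT_MAD_DATA,
				 GFP_KERNEL, OPA_MGMT_BASE_VERSION);

	/* or an IB-sized send buffer on the same device */
	msg = ib_create_send_mad(agent, remote_qpn, pkey_index, 0,
				 IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
				 GFP_KERNEL, IB_MGMT_BASE_VERSION);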

Ira


* Re: [PATCH v4 16/19] IB/mad: Add Intel Omni-Path Architecture defines
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD708-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-04-08 21:41           ` ira.weiny
  0 siblings, 0 replies; 84+ messages in thread
From: ira.weiny @ 2015-04-08 21:41 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Fri, Apr 03, 2015 at 05:33:55PM -0600, Hefty, Sean wrote:
> >  /* Registration table sizes */
> >  #define MAX_MGMT_CLASS		80
> > -#define MAX_MGMT_VERSION	8
> > +#define MAX_MGMT_VERSION	0x83
> 
> It's unfortunate that this results in a big jump in used versions.  Mad_priv.h defines this:

It is unfortunate.

> 
> struct ib_mad_port_private {
> 	...
> 	struct ib_mad_mgmt_version_table version[MAX_MGMT_VERSION];
> 
> struct ib_mad_mgmt_version_table {
> 	struct ib_mad_mgmt_class_table *class;
> 	struct ib_mad_mgmt_vendor_class_table *vendor;
> };
> 
> This ends up allocating about 2K of NULL pointers per port.  Not a huge deal, but still.

I agree this is not ideal, but it is not a large amount of space, nor is it
something which is being allocated dynamically.

> 
> I don't have a great fix here.  Maybe the version[] array can be the necessary size, with some sort of simple mapping function from version to the index?

I did not have a great fix either.  Hence the current implementation.

Frankly, I don't know of many systems that have more than a few ports, and at
2K each this does not seem like a big deal.
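
For the record, the mapping function you suggest might look something like
this (purely illustrative; it assumes the OPA class versions occupy
0x80-0x82):

	/* map the sparse class versions onto a dense array index */
	static int mad_version_index(u8 version)
	{
		if (version < 8)			/* IB versions */
			return version;
		if (version >= 0x80 && version < 0x83)	/* OPA versions */
			return 8 + (version - 0x80);
		return -1;				/* unsupported */
	}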

Ira


* RE: [PATCH v4 17/19] IB/mad: Implement support for Intel Omni-Path Architecture base version MADs in ib_create_send_mad
       [not found]             ` <20150408213612.GG433-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
@ 2015-04-08 21:44               ` Hefty, Sean
  0 siblings, 0 replies; 84+ messages in thread
From: Hefty, Sean @ 2015-04-08 21:44 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

> > Didn't an earlier patch make it possible to read the mad_size directly
> > from the device?
> >
> 
> Not exactly, that patch specified the "max_mad_size" the device supported.
> 
> OPA devices support IB MADs.  If the IB Base Version is specified on an
> OPA
> device the IB MAD size is to be used.  This maintains compatibility with
> any
> software issuing IB MADs.

Got it - thanks

* Re: [PATCH v4 18/19] IB/mad: Implement Intel Omni-Path Architecture SMP processing
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD742-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-04-09  4:41           ` ira.weiny
  0 siblings, 0 replies; 84+ messages in thread
From: ira.weiny @ 2015-04-09  4:41 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Fri, Apr 03, 2015 at 05:47:49PM -0600, Hefty, Sean wrote:
> > @@ -236,6 +252,24 @@ enum smi_action smi_handle_dr_smp_recv(struct ib_smp
> > *smp, u8 node_type,
> >  					smp->dr_slid == IB_LID_PERMISSIVE);
> >  }
> > 
> > +/*
> > + * Adjust information for a received SMP
> > + * Return 0 if the SMP should be dropped
> 
> The function returns an enum.  The comment of returning 0 is misleading.  The entire comment seems unnecessary. 

Sorry, I just copied the same comment from the IB function.

/*
 * Adjust information for a received SMP
 * Return 0 if the SMP should be dropped
 */
enum smi_action smi_handle_dr_smp_recv(struct ib_smp *smp, u8 node_type,
                                       int port_num, int phys_port_cnt)

I'll add a patch which cleans up those original comments and then update those
in this series.

> 
> > + */
> > +enum smi_action opa_smi_handle_dr_smp_recv(struct opa_smp *smp, u8
> > node_type,
> > +					   int port_num, int phys_port_cnt)
> > +{
> > +	return __smi_handle_dr_smp_recv(node_type, port_num, phys_port_cnt,
> > +					&smp->hop_ptr, smp->hop_cnt,
> > +					smp->route.dr.initial_path,
> > +					smp->route.dr.return_path,
> > +					opa_get_smp_direction(smp),
> > +					smp->route.dr.dr_dlid ==
> > +					OPA_LID_PERMISSIVE,
> > +					smp->route.dr.dr_slid ==
> > +					OPA_LID_PERMISSIVE);
> > +}
> > +
> >  static inline
> >  enum smi_forward_action __smi_check_forward_dr_smp(u8 hop_ptr, u8
> > hop_cnt,
> >  						   u8 direction,
> > @@ -277,6 +311,16 @@ enum smi_forward_action
> > smi_check_forward_dr_smp(struct ib_smp *smp)
> >  					  smp->dr_slid != IB_LID_PERMISSIVE);
> >  }
> > 
> > +enum smi_forward_action opa_smi_check_forward_dr_smp(struct opa_smp *smp)
> > +{
> > +	return __smi_check_forward_dr_smp(smp->hop_ptr, smp->hop_cnt,
> > +					  opa_get_smp_direction(smp),
> > +					  smp->route.dr.dr_dlid ==
> > +					  OPA_LID_PERMISSIVE,
> > +					  smp->route.dr.dr_slid ==
> > +					  OPA_LID_PERMISSIVE);
> > +}
> > +
> >  /*
> >   * Return the forwarding port number from initial_path for outgoing SMP
> > and
> >   * from return_path for returning SMP
> > @@ -286,3 +330,13 @@ int smi_get_fwd_port(struct ib_smp *smp)
> >  	return (!ib_get_smp_direction(smp) ? smp->initial_path[smp->hop_ptr+1] :
> >  		smp->return_path[smp->hop_ptr-1]);
> >  }
> > +
> > +/*
> > + * Return the forwarding port number from initial_path for outgoing SMP
> > and
> > + * from return_path for returning SMP
> > + */
> > +int opa_smi_get_fwd_port(struct opa_smp *smp)
> > +{
> > +	return !opa_get_smp_direction(smp) ? smp->route.dr.initial_path[smp->hop_ptr+1] :
> > +		smp->route.dr.return_path[smp->hop_ptr-1];
> > +}
> > diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h
> > index aff96ba..e95c537 100644
> > --- a/drivers/infiniband/core/smi.h
> > +++ b/drivers/infiniband/core/smi.h
> > @@ -62,6 +62,9 @@ extern enum smi_action smi_handle_dr_smp_send(struct
> > ib_smp *smp,
> >   * Return IB_SMI_HANDLE if the SMP should be handled by the local SMA/SM
> >   * via process_mad
> >   */
> > +/* NOTE: This is called on opa_smp's don't check fields which are not
> > common
> > + * between ib_smp and opa_smp
> > + */
> 
> This comment suggests that the function is not correct for OPA.  ?

This was a mistake left over from an early version of the patches.  The OPA
versions are in opa_smi.h; those should be used.

I've removed the comment and fixed handle_outgoing_dr_smp.

> 
> >  static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
> >  						  struct ib_device *device)
> >  {
> > @@ -77,6 +80,9 @@ static inline enum smi_action smi_check_local_smp(struct
> > ib_smp *smp,
> >   * Return IB_SMI_HANDLE if the SMP should be handled by the local SMA/SM
> >   * via process_mad
> >   */
> > +/* NOTE: This is called on opa_smp's don't check fields which are not
> > common
> > + * between ib_smp and opa_smp
> > + */
> 
> Same comment

Same fix.

Ira

> 
> >  static inline enum smi_action smi_check_local_returning_smp(struct ib_smp
> > *smp,
> >  						   struct ib_device *device)
> >  {

end of thread, other threads:[~2015-04-09  4:41 UTC | newest]

Thread overview: 84+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-04 23:29 [PATCH v4 00/19] IB/mad: Add support for Intel Omni-Path Architecture (OPA) MAD processing ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found] ` <1423092585-26692-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-02-04 23:29   ` [PATCH v4 01/19] IB/mad: Rename is_data_mad to is_rmpp_data_mad ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-02-04 23:29   ` [PATCH v4 02/19] IB/core: Cache device attributes for use by upper level drivers ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found]     ` <1423092585-26692-3-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-04-03 20:43       ` Hefty, Sean
     [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD574-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-04-06 22:10           ` ira.weiny
     [not found]             ` <20150406221044.GA433-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
2015-04-06 22:43               ` Hefty, Sean
     [not found]                 ` <1828884A29C6694DAF28B7E6B8A82373A8FBDE3E-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-04-08 16:01                   ` ira.weiny
2015-02-04 23:29   ` [PATCH v4 03/19] IB/mad: Change validate_mad signature to take ib_mad_hdr rather than ib_mad ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found]     ` <1423092585-26692-4-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-04-03 21:20       ` Hefty, Sean
2015-02-04 23:29   ` [PATCH v4 04/19] IB/mad: Change ib_response_mad " ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found]     ` <1423092585-26692-5-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-04-03 21:20       ` Hefty, Sean
2015-02-04 23:29   ` [PATCH v4 05/19] IB/mad: Change cast in rcv_has_same_class ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-02-04 23:29   ` [PATCH v4 06/19] IB/core: Add max_mad_size to ib_device_attr ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found]     ` <1423092585-26692-7-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-02-24 14:16       ` Doug Ledford
     [not found]         ` <1424787385.4847.16.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-02-25 18:13           ` Weiny, Ira
     [not found]             ` <2807E5FD2F6FDA4886F6618EAC48510E0CC3ECE5-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-02-25 18:23               ` Jason Gunthorpe
     [not found]                 ` <20150225182308.GA14580-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-02-25 18:32                   ` Weiny, Ira
     [not found]                     ` <2807E5FD2F6FDA4886F6618EAC48510E0CC3ED5B-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-02-25 18:43                       ` Jason Gunthorpe
2015-02-04 23:29   ` [PATCH v4 07/19] IB/mad: Convert ib_mad_private allocations from kmem_cache to kmalloc ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found]     ` <1423092585-26692-8-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-02-24 14:22       ` Doug Ledford
     [not found]         ` <1424787735.4847.19.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-02-25 18:23           ` Weiny, Ira
2015-02-04 23:29   ` [PATCH v4 08/19] IB/mad: Add helper function for smi_handle_dr_smp_send ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found]     ` <1423092585-26692-9-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-04-03 21:28       ` Hefty, Sean
2015-02-04 23:29   ` [PATCH v4 09/19] IB/mad: Add helper function for smi_handle_dr_smp_recv ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-02-04 23:29   ` [PATCH v4 10/19] IB/mad: Add helper function for smi_check_forward_dr_smp ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-02-04 23:29   ` [PATCH v4 11/19] IB/mad: Add helper function for SMI processing ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-02-04 23:29   ` [PATCH v4 12/19] IB/mad: Add MAD size parameters to process_mad ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found]     ` <1423092585-26692-13-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-04-03 22:40       ` Hefty, Sean
     [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD692-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-04-07 17:23           ` ira.weiny
     [not found]             ` <20150407172303.GB433-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
2015-04-07 17:53               ` Hefty, Sean
     [not found]                 ` <1828884A29C6694DAF28B7E6B8A82373A8FBE224-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-04-08 16:07                   ` ira.weiny
2015-02-04 23:29   ` [PATCH v4 13/19] IB/mad: Add base version parameter to ib_create_send_mad ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-02-04 23:29   ` [PATCH v4 14/19] IB/core: Add IB_DEVICE_OPA_MAD_SUPPORT device cap flag ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found]     ` <1423092585-26692-15-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-02-06 20:35       ` Hal Rosenstock
     [not found]         ` <54D52589.8020305-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-02-11 15:40           ` Weiny, Ira
     [not found]             ` <2807E5FD2F6FDA4886F6618EAC48510E0CC244A8-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-02-12 14:00               ` Hal Rosenstock
     [not found]                 ` <54DCB1E9.7010309-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-02-17 21:25                   ` Weiny, Ira
     [not found]                     ` <2807E5FD2F6FDA4886F6618EAC48510E0CC29020-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-02-23 18:54                       ` Hal Rosenstock
     [not found]                         ` <54EB7756.7070407-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-02-25  0:29                           ` Weiny, Ira
     [not found]                             ` <2807E5FD2F6FDA4886F6618EAC48510E0CC3D330-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-02-25 17:13                               ` Doug Ledford
     [not found]                                 ` <1424884438.4847.91.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-03-04  7:21                                   ` Weiny, Ira
     [not found]                                     ` <2807E5FD2F6FDA4886F6618EAC48510E0CC4F18C-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-03-04 16:02                                       ` Hefty, Sean
     [not found]                                         ` <1828884A29C6694DAF28B7E6B8A8237399E6F06F-8oqHQFITsIHTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-03-04 16:41                                           ` Weiny, Ira
     [not found]                                             ` <2807E5FD2F6FDA4886F6618EAC48510E0CC4F50B-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-03-09 21:22                                               ` Hal Rosenstock
     [not found]                                                 ` <54FE0F16.5090905-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-03-11  7:27                                                   ` Weiny, Ira
     [not found]                                                     ` <2807E5FD2F6FDA4886F6618EAC48510E0CC5C11A-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-03-16 23:59                                                       ` Hefty, Sean
     [not found]                                                         ` <1828884A29C6694DAF28B7E6B8A8237399E8106A-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-03-17 23:36                                                           ` Hefty, Sean
     [not found]                                                             ` <1828884A29C6694DAF28B7E6B8A8237399E818C6-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-03-20 13:38                                                               ` Michael Wang
2015-03-20 13:48                                                               ` Michael Wang
     [not found]                                                                 ` <6A3D3202-0128-4F33-B596-D7A76AB66DF8@gmail.com>
     [not found]                                                                   ` <20150320235748.GA22703@phlsvsds.ph.intel.com>
     [not found]                                                                     ` <20150320235748.GA22703-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
2015-03-21  0:05                                                                       ` ira.weiny
     [not found]                                                                         ` <20150321000541.GA24717-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
2015-03-21  7:49                                                                           ` Yun Wang
     [not found]                                                                             ` <CAJuTgQUsZ34F-dKpsmW+5=axDWb93pA43LZ-qKbEjqyu-RUUmg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-03-23 16:31                                                                               ` Hefty, Sean
     [not found]                                                                                 ` <1828884A29C6694DAF28B7E6B8A8237399E82E58-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-03-24 12:49                                                                                   ` Michael Wang
     [not found]                                                                                     ` <55115D61.9040201-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
2015-03-25 10:30                                                                                       ` Michael Wang
     [not found]                                                                                         ` <55128E48.1080406-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
2015-04-02 22:45                                                                                           ` ira.weiny
2015-03-06 17:47                                       ` Jason Gunthorpe
     [not found]                                         ` <20150306174729.GE22375-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-03-06 22:47                                           ` Weiny, Ira
     [not found]                                             ` <2807E5FD2F6FDA4886F6618EAC48510E0CC53437-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-03-09 21:23                                               ` Hal Rosenstock
2015-02-04 23:29   ` [PATCH v4 15/19] IB/mad: Create jumbo_mad data structures ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found]     ` <1423092585-26692-16-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-04-03 23:08       ` Hefty, Sean
     [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD6C8-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-04-04  0:14           ` Hefty, Sean
2015-04-08 15:33           ` ira.weiny
2015-02-04 23:29   ` [PATCH v4 16/19] IB/mad: Add Intel Omni-Path Architecture defines ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found]     ` <1423092585-26692-17-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-04-03 23:33       ` Hefty, Sean
     [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD708-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-04-08 21:41           ` ira.weiny
2015-02-04 23:29   ` [PATCH v4 17/19] IB/mad: Implement support for Intel Omni-Path Architecture base version MADs in ib_create_send_mad ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found]     ` <1423092585-26692-18-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-04-03 23:40       ` Hefty, Sean
     [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD722-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-04-08 21:36           ` ira.weiny
     [not found]             ` <20150408213612.GG433-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
2015-04-08 21:44               ` Hefty, Sean
2015-02-04 23:29   ` [PATCH v4 18/19] IB/mad: Implement Intel Omni-Path Architecture SMP processing ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found]     ` <1423092585-26692-19-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-04-03 23:47       ` Hefty, Sean
     [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD742-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-04-09  4:41           ` ira.weiny
2015-02-04 23:29   ` [PATCH v4 19/19] IB/mad: Implement Intel Omni-Path Architecture MAD processing ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found]     ` <1423092585-26692-20-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-04-04  1:44       ` Hefty, Sean
     [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A8FBD9DE-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-04-07  2:29           ` Jason Gunthorpe
     [not found]             ` <20150407022954.GA7531-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-04-07 16:57               ` Hefty, Sean
     [not found]                 ` <1828884A29C6694DAF28B7E6B8A82373A8FBE150-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-04-07 17:16                   ` Jason Gunthorpe
     [not found]                     ` <20150407171659.GA15634-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-04-07 17:41                       ` Hefty, Sean
     [not found]                         ` <1828884A29C6694DAF28B7E6B8A82373A8FBE1EB-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-04-07 18:06                           ` Jason Gunthorpe
2015-04-07 17:15             ` Hefty, Sean
     [not found]               ` <1828884A29C6694DAF28B7E6B8A82373A8FBE17D-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-04-07 17:19                 ` Jason Gunthorpe
     [not found]                   ` <20150407171920.GB15634-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-04-08 21:24                     ` ira.weiny
2015-02-06 20:34   ` [PATCH v4 00/19] IB/mad: Add support for Intel Omni-Path Architecture (OPA) " Hal Rosenstock
     [not found]     ` <54D52562.5050408-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-02-09 21:20       ` Weiny, Ira
