All of lore.kernel.org
 help / color / mirror / Atom feed
* [pull request][rdma-next 00/11] Hardware tag matching support
@ 2017-08-14 13:53 Leon Romanovsky
       [not found] ` <20170814135352.5324-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Leon Romanovsky @ 2017-08-14 13:53 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky

This patch series adds to Mellanox ConnectX HCA driver support of
tag matching. It introduces new hardware object eXtended shared Receive
Queue (XRQ), which follows SRQ semantics with addition of extended
receive buffers topologies and offloads.

This series adds tag matching topology and rendezvouz offload.

Main changes between RFC and current version:
 * Followed after RFC posted on the ML and OFVWG discussions
 * Implements agreed verbs interface
 * Rebased on top of latest version
 * Adding feature description under Documentaion/infiniband
 * In struct ib_srq_init_attr moved CQ outside XRC inner struct.
 * Added max size of the information passed after the RNDV header
 * Added hca_sq_owner HW flag for RNDV QPs

Thanks

----------------------------------------------------------------

Doug,

Please note that I merged our shared pull request from the mailing
list before sending this series. Please let me know if something needs
to be redone.

Thanks

----------------------------------------------------------------
The following changes since commit c8252e205138a6649a2274ad658b6fd6cce7b334:

  Merge tag 'mlx5-shared-2017-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into HEAD (2017-08-13 15:48:40 +0300)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git tags/rdma-next-2017-08-14

for you to fetch changes up to 617a0639046f2e1d0cc32782ce60fe25aac3f7e1:

  Documentation: Hardware tag matching (2017-08-13 15:49:08 +0300)

----------------------------------------------------------------
TAG matching support

Message Passing Interface (MPI) is a communication protocol that is
widely used for exchange of messages among processes in high-performance
computing (HPC) systems. Messages sent from a sending process to a
destination process are marked with an identifying label, referred to as
a tag. Destination processes post buffers in local memory that are
similarly marked with tags. When a message is received by the receiver
(i.e., the host computer on which the destination process is running),
the message is stored in a buffer whose tag matches the message tag. The
process of finding a buffer with a matching tag for the received packet
is called tag matching.

There are two protocols that are generally used to send messages over
MPI: The "Eager Protocol" is best suited to small messages that are
simply sent to the destination process and received in an appropriate
matching buffer. The "Rendezvous Protocol" is better suited to large
messages. In Rendezvous, when the sender process has a large message to
send, it first sends a small message to the destination process
announcing its intention to send the large message. This small message
is referred to as an RTS (ready to send) message. The RTS includes the
message tag and buffer address in the sender. The destination process
matches the RTS to a posted receive buffer, or posts such a buffer if
one does not already exist. Once a matching receive buffer has been
posted at the destination process side, the receiver initiates a remote
direct memory access (RDMA) read request to read the data from the
buffer address listed by the sender in the RTS message.

MPI tag matching, when performed in software by a host processor, can
consume substantial host resources, thus detracting from the performance
of the actual software applications that are using MPI for
communications. One possible solution is to offload the entire tag
matching process to a peripheral hardware device, such as a network
interface controller (NIC). In this case, the software application using
MPI will post a set of buffers in a memory of the host processor and
will pass the entire list of tags associated with the buffers to the
NIC. In large-scale networks, however, the NIC may be required to
simultaneously support many communicating processes and contexts
(referred to in MPI parlance as "ranks" and "communicators,"
respectively). NIC access to and matching of the large lists of tags
involved in such a scenario can itself become a bottleneck. The NIC must
also be able to handle "unexpected" traffic, for which buffers and tags
have not yet been posted, which may also degrade performance.

When the NIC receives a message over the network from one of the peer
processes, and the message contains a label in accordance with the
protocol, the NIC compares the label to the labels in the part of the
list that was pushed to the NIC. Upon finding a match to the label, the
NIC writes data conveyed in the message to the buffer in the memory that
is associated with this label and submits a notification to the software
process. The notification serves two purposes: both to indicate to the
software process that the label has been consumed, so that the process
will update the list of the labels posted to the NIC; and to inform the
software process that the data are available in the buffer. In some
cases (such as when the NIC retrieves the data from the remote node by
RDMA), the NIC may submit two notifications, in the form of completion
reports, of which the first informs the software process of the
consumption of the label and the second announces availability of the
data.

----------------------------------------------------------------
Artemy Kovalyov (11):
      net/mlx5: Update HW layout definitions
      IB/core: Add XRQ capabilities
      IB/core: Separate CQ handle in SRQ context
      IB/core: Add new SRQ type IB_SRQT_TM
      IB/uverbs: Add XRQ creation parameter to UAPI
      IB/uverbs: Add new SRQ type IB_SRQT_TM
      IB/uverbs: Expose XRQ capabilities
      IB/mlx5: Fill XRQ capabilities
      net/mlx5: Add XRQ support
      IB/mlx5: Support IB_SRQT_TM
      Documentation: Hardware tag matching

 Documentation/infiniband/tag_matching.txt     |  64 +++++++++++
 drivers/infiniband/core/uverbs_cmd.c          |  43 ++++++--
 drivers/infiniband/core/verbs.c               |  16 +--
 drivers/infiniband/hw/mlx4/srq.c              |   4 +-
 drivers/infiniband/hw/mlx5/main.c             |  20 +++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |   5 +
 drivers/infiniband/hw/mlx5/qp.c               |   9 +-
 drivers/infiniband/hw/mlx5/srq.c              |  29 +++--
 drivers/net/ethernet/mellanox/mlx5/core/srq.c | 150 ++++++++++++++++++++++++--
 include/linux/mlx5/driver.h                   |   1 +
 include/linux/mlx5/mlx5_ifc.h                 |   9 +-
 include/linux/mlx5/srq.h                      |   5 +
 include/rdma/ib_verbs.h                       |  58 +++++++---
 include/uapi/rdma/ib_user_verbs.h             |  17 ++-
 14 files changed, 371 insertions(+), 59 deletions(-)
 create mode 100644 Documentation/infiniband/tag_matching.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [rdma-next 01/11] net/mlx5: Update HW layout definitions
       [not found] ` <20170814135352.5324-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2017-08-14 13:53   ` Leon Romanovsky
  2017-08-14 13:53   ` [rdma-next 02/11] IB/core: Add XRQ capabilities Leon Romanovsky
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2017-08-14 13:53 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

* add offload_type field to mlx5_ifc_qpc_bits
* update mlx5_ifc_xrqc_bits layout

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yossi Itigin <yosefe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/linux/mlx5/mlx5_ifc.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 671004923f57..8d62c108dd80 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2016,6 +2016,10 @@ enum {
 	MLX5_QPC_PM_STATE_MIGRATED  = 0x3,
 };
 
+enum {
+	MLX5_QPC_OFFLOAD_TYPE_RNDV  = 0x1,
+};
+
 enum {
 	MLX5_QPC_END_PADDING_MODE_SCATTER_AS_IS                = 0x0,
 	MLX5_QPC_END_PADDING_MODE_PAD_TO_CACHE_LINE_ALIGNMENT  = 0x1,
@@ -2059,7 +2063,8 @@ struct mlx5_ifc_qpc_bits {
 	u8         st[0x8];
 	u8         reserved_at_10[0x3];
 	u8         pm_state[0x2];
-	u8         reserved_at_15[0x7];
+	u8         reserved_at_15[0x3];
+	u8         offload_type[0x4];
 	u8         end_padding_mode[0x2];
 	u8         reserved_at_1e[0x2];
 
@@ -3004,7 +3009,7 @@ struct mlx5_ifc_xrqc_bits {
 
 	struct mlx5_ifc_tag_matching_topology_context_bits tag_matching_topology_context;
 
-	u8         reserved_at_180[0x880];
+	u8         reserved_at_180[0x280];
 
 	struct mlx5_ifc_wq_bits wq;
 };
-- 
2.14.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [rdma-next 02/11] IB/core: Add XRQ capabilities
       [not found] ` <20170814135352.5324-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2017-08-14 13:53   ` [rdma-next 01/11] net/mlx5: Update HW layout definitions Leon Romanovsky
@ 2017-08-14 13:53   ` Leon Romanovsky
  2017-08-14 13:53   ` [rdma-next 03/11] IB/core: Separate CQ handle in SRQ context Leon Romanovsky
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2017-08-14 13:53 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

This patch adds following TM XRQ capabilities:

* max_rndv_hdr_size - Max size of rendezvous request message
* max_num_tags - Max number of entries in tag matching list
* max_ops - Max number of outstanding list operations
* max_sge - Max number of SGE in tag matching entry
* flags - the following flags are currently defined:
    - IB_TM_CAP_RC - Support tag matching on RC transport

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yossi Itigin <yosefe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/rdma/ib_verbs.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6aec3971628b..f539a6fe298c 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -276,6 +276,24 @@ struct ib_rss_caps {
 	u32 max_rwq_indirection_table_size;
 };
 
+enum ib_tm_cap_flags {
+	/*  Support tag matching on RC transport */
+	IB_TM_CAP_RC		    = 1 << 0,
+};
+
+struct ib_xrq_caps {
+	/* Max size of RNDV header */
+	u32 max_rndv_hdr_size;
+	/* Max number of entries in tag matching list */
+	u32 max_num_tags;
+	/* From enum ib_tm_cap_flags */
+	u32 flags;
+	/* Max number of outstanding list operations */
+	u32 max_ops;
+	/* Max number of SGE in tag matching entry */
+	u32 max_sge;
+};
+
 enum ib_cq_creation_flags {
 	IB_CQ_FLAGS_TIMESTAMP_COMPLETION   = 1 << 0,
 	IB_CQ_FLAGS_IGNORE_OVERRUN	   = 1 << 1,
@@ -336,6 +354,7 @@ struct ib_device_attr {
 	struct ib_rss_caps	rss_caps;
 	u32			max_wq_type_rq;
 	u32			raw_packet_caps; /* Use ib_raw_packet_caps enum */
+	struct ib_xrq_caps	xrq_caps;
 };
 
 enum ib_mtu {
-- 
2.14.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [rdma-next 03/11] IB/core: Separate CQ handle in SRQ context
       [not found] ` <20170814135352.5324-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2017-08-14 13:53   ` [rdma-next 01/11] net/mlx5: Update HW layout definitions Leon Romanovsky
  2017-08-14 13:53   ` [rdma-next 02/11] IB/core: Add XRQ capabilities Leon Romanovsky
@ 2017-08-14 13:53   ` Leon Romanovsky
  2017-08-14 13:53   ` [rdma-next 04/11] IB/core: Add new SRQ type IB_SRQT_TM Leon Romanovsky
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2017-08-14 13:53 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Before this change CQ attached to SRQ was part of XRC specific extension.
Moving CQ handle out makes it available to other types extending SRQ
functionality.

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yossi Itigin <yosefe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/core/uverbs_cmd.c | 27 +++++++++++++++++----------
 drivers/infiniband/core/verbs.c      | 16 +++++++++-------
 drivers/infiniband/hw/mlx4/srq.c     |  4 ++--
 drivers/infiniband/hw/mlx5/main.c    | 10 +++++-----
 drivers/infiniband/hw/mlx5/srq.c     | 11 +++++++----
 include/rdma/ib_verbs.h              | 31 ++++++++++++++++++++-----------
 6 files changed, 60 insertions(+), 39 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 515425a50059..afa4c1b7891a 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3479,10 +3479,12 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 
 		obj->uxrcd = container_of(xrcd_uobj, struct ib_uxrcd_object, uobject);
 		atomic_inc(&obj->uxrcd->refcnt);
+	}
 
-		attr.ext.xrc.cq  = uobj_get_obj_read(cq, cmd->cq_handle,
-						     file->ucontext);
-		if (!attr.ext.xrc.cq) {
+	if (ib_srq_has_cq(cmd->srq_type)) {
+		attr.ext.cq  = uobj_get_obj_read(cq, cmd->cq_handle,
+						 file->ucontext);
+		if (!attr.ext.cq) {
 			ret = -EINVAL;
 			goto err_put_xrcd;
 		}
@@ -3517,10 +3519,13 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 	srq->event_handler = attr.event_handler;
 	srq->srq_context   = attr.srq_context;
 
+	if (ib_srq_has_cq(cmd->srq_type)) {
+		srq->ext.cq       = attr.ext.cq;
+		atomic_inc(&attr.ext.cq->usecnt);
+	}
+
 	if (cmd->srq_type == IB_SRQT_XRC) {
-		srq->ext.xrc.cq   = attr.ext.xrc.cq;
 		srq->ext.xrc.xrcd = attr.ext.xrc.xrcd;
-		atomic_inc(&attr.ext.xrc.cq->usecnt);
 		atomic_inc(&attr.ext.xrc.xrcd->usecnt);
 	}
 
@@ -3543,10 +3548,12 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 		goto err_copy;
 	}
 
-	if (cmd->srq_type == IB_SRQT_XRC) {
+	if (cmd->srq_type == IB_SRQT_XRC)
 		uobj_put_read(xrcd_uobj);
-		uobj_put_obj_read(attr.ext.xrc.cq);
-	}
+
+	if (ib_srq_has_cq(cmd->srq_type))
+		uobj_put_obj_read(attr.ext.cq);
+
 	uobj_put_obj_read(pd);
 	uobj_alloc_commit(&obj->uevent.uobject);
 
@@ -3559,8 +3566,8 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 	uobj_put_obj_read(pd);
 
 err_put_cq:
-	if (cmd->srq_type == IB_SRQT_XRC)
-		uobj_put_obj_read(attr.ext.xrc.cq);
+	if (ib_srq_has_cq(cmd->srq_type))
+		uobj_put_obj_read(attr.ext.cq);
 
 err_put_xrcd:
 	if (cmd->srq_type == IB_SRQT_XRC) {
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index e3aa9c60bcc5..cc955b16027f 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -620,11 +620,13 @@ struct ib_srq *ib_create_srq(struct ib_pd *pd,
 		srq->event_handler = srq_init_attr->event_handler;
 		srq->srq_context   = srq_init_attr->srq_context;
 		srq->srq_type      = srq_init_attr->srq_type;
+		if (ib_srq_has_cq(srq->srq_type)) {
+			srq->ext.cq   = srq_init_attr->ext.cq;
+			atomic_inc(&srq->ext.cq->usecnt);
+		}
 		if (srq->srq_type == IB_SRQT_XRC) {
 			srq->ext.xrc.xrcd = srq_init_attr->ext.xrc.xrcd;
-			srq->ext.xrc.cq   = srq_init_attr->ext.xrc.cq;
 			atomic_inc(&srq->ext.xrc.xrcd->usecnt);
-			atomic_inc(&srq->ext.xrc.cq->usecnt);
 		}
 		atomic_inc(&pd->usecnt);
 		atomic_set(&srq->usecnt, 0);
@@ -665,18 +667,18 @@ int ib_destroy_srq(struct ib_srq *srq)
 
 	pd = srq->pd;
 	srq_type = srq->srq_type;
-	if (srq_type == IB_SRQT_XRC) {
+	if (ib_srq_has_cq(srq_type))
+		cq = srq->ext.cq;
+	if (srq_type == IB_SRQT_XRC)
 		xrcd = srq->ext.xrc.xrcd;
-		cq = srq->ext.xrc.cq;
-	}
 
 	ret = srq->device->destroy_srq(srq);
 	if (!ret) {
 		atomic_dec(&pd->usecnt);
-		if (srq_type == IB_SRQT_XRC) {
+		if (srq_type == IB_SRQT_XRC)
 			atomic_dec(&xrcd->usecnt);
+		if (ib_srq_has_cq(srq_type))
 			atomic_dec(&cq->usecnt);
-		}
 	}
 
 	return ret;
diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c
index 0facaf5f6d23..210fe964a65d 100644
--- a/drivers/infiniband/hw/mlx4/srq.c
+++ b/drivers/infiniband/hw/mlx4/srq.c
@@ -183,8 +183,8 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd,
 		}
 	}
 
-	cqn = (init_attr->srq_type == IB_SRQT_XRC) ?
-		to_mcq(init_attr->ext.xrc.cq)->mcq.cqn : 0;
+	cqn = ib_srq_has_cq(init_attr->srq_type) ?
+		to_mcq(init_attr->ext.cq)->mcq.cqn : 0;
 	xrcdn = (init_attr->srq_type == IB_SRQT_XRC) ?
 		to_mxrcd(init_attr->ext.xrc.xrcd)->xrcdn :
 		(u16) dev->dev->caps.reserved_xrcds;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 986404827848..387d90ad8ec1 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3154,7 +3154,7 @@ static int create_dev_resources(struct mlx5_ib_resources *devr)
 	attr.attr.max_sge = 1;
 	attr.attr.max_wr = 1;
 	attr.srq_type = IB_SRQT_XRC;
-	attr.ext.xrc.cq = devr->c0;
+	attr.ext.cq = devr->c0;
 	attr.ext.xrc.xrcd = devr->x0;
 
 	devr->s0 = mlx5_ib_create_srq(devr->p0, &attr, NULL);
@@ -3169,9 +3169,9 @@ static int create_dev_resources(struct mlx5_ib_resources *devr)
 	devr->s0->srq_context   = NULL;
 	devr->s0->srq_type      = IB_SRQT_XRC;
 	devr->s0->ext.xrc.xrcd	= devr->x0;
-	devr->s0->ext.xrc.cq	= devr->c0;
+	devr->s0->ext.cq	= devr->c0;
 	atomic_inc(&devr->s0->ext.xrc.xrcd->usecnt);
-	atomic_inc(&devr->s0->ext.xrc.cq->usecnt);
+	atomic_inc(&devr->s0->ext.cq->usecnt);
 	atomic_inc(&devr->p0->usecnt);
 	atomic_set(&devr->s0->usecnt, 0);
 
@@ -3190,9 +3190,9 @@ static int create_dev_resources(struct mlx5_ib_resources *devr)
 	devr->s1->event_handler = NULL;
 	devr->s1->srq_context   = NULL;
 	devr->s1->srq_type      = IB_SRQT_BASIC;
-	devr->s1->ext.xrc.cq	= devr->c0;
+	devr->s1->ext.cq	= devr->c0;
 	atomic_inc(&devr->p0->usecnt);
-	atomic_set(&devr->s0->usecnt, 0);
+	atomic_set(&devr->s1->usecnt, 0);
 
 	for (port = 0; port < ARRAY_SIZE(devr->ports); ++port) {
 		INIT_WORK(&devr->ports[port].pkey_change_work,
diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c
index 43707b101f47..022b4c642047 100644
--- a/drivers/infiniband/hw/mlx5/srq.c
+++ b/drivers/infiniband/hw/mlx5/srq.c
@@ -292,13 +292,16 @@ struct ib_srq *mlx5_ib_create_srq(struct ib_pd *pd,
 	in.wqe_shift = srq->msrq.wqe_shift - 4;
 	if (srq->wq_sig)
 		in.flags |= MLX5_SRQ_FLAG_WQ_SIG;
-	if (init_attr->srq_type == IB_SRQT_XRC) {
+
+	if (init_attr->srq_type == IB_SRQT_XRC)
 		in.xrcd = to_mxrcd(init_attr->ext.xrc.xrcd)->xrcdn;
-		in.cqn = to_mcq(init_attr->ext.xrc.cq)->mcq.cqn;
-	} else if (init_attr->srq_type == IB_SRQT_BASIC) {
+	else
 		in.xrcd = to_mxrcd(dev->devr.x0)->xrcdn;
+
+	if (ib_srq_has_cq(init_attr->srq_type))
+		in.cqn = to_mcq(init_attr->ext.cq)->mcq.cqn;
+	else
 		in.cqn = to_mcq(dev->devr.c0)->mcq.cqn;
-	}
 
 	in.pd = to_mpd(pd)->pdn;
 	in.db_record = srq->db.dma;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index f539a6fe298c..f0e9d30bb0b9 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -989,6 +989,11 @@ enum ib_srq_type {
 	IB_SRQT_XRC
 };
 
+static inline bool ib_srq_has_cq(enum ib_srq_type srq_type)
+{
+	return srq_type == IB_SRQT_XRC;
+}
+
 enum ib_srq_attr_mask {
 	IB_SRQ_MAX_WR	= 1 << 0,
 	IB_SRQ_LIMIT	= 1 << 1,
@@ -1006,11 +1011,13 @@ struct ib_srq_init_attr {
 	struct ib_srq_attr	attr;
 	enum ib_srq_type	srq_type;
 
-	union {
-		struct {
-			struct ib_xrcd *xrcd;
-			struct ib_cq   *cq;
-		} xrc;
+	struct {
+		struct ib_cq   *cq;
+		union {
+			struct {
+				struct ib_xrcd *xrcd;
+			} xrc;
+		};
 	} ext;
 };
 
@@ -1549,12 +1556,14 @@ struct ib_srq {
 	enum ib_srq_type	srq_type;
 	atomic_t		usecnt;
 
-	union {
-		struct {
-			struct ib_xrcd *xrcd;
-			struct ib_cq   *cq;
-			u32		srq_num;
-		} xrc;
+	struct {
+		struct ib_cq   *cq;
+		union {
+			struct {
+				struct ib_xrcd *xrcd;
+				u32		srq_num;
+			} xrc;
+		};
 	} ext;
 };
 
-- 
2.14.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [rdma-next 04/11] IB/core: Add new SRQ type IB_SRQT_TM
       [not found] ` <20170814135352.5324-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (2 preceding siblings ...)
  2017-08-14 13:53   ` [rdma-next 03/11] IB/core: Separate CQ handle in SRQ context Leon Romanovsky
@ 2017-08-14 13:53   ` Leon Romanovsky
  2017-08-14 13:53   ` [rdma-next 05/11] IB/uverbs: Add XRQ creation parameter to UAPI Leon Romanovsky
                     ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2017-08-14 13:53 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

This patch adds new SRQ type - IB_SRQT_TM. The new SRQ type supports tag
matching and rendezvous offloads for MPI applications.

When SRQ receives a message it will search through the matching list
for the corresponding posted receive buffer. The process of searching
the matching list is called tag matching.
In case the tag matching results in a match, the received message will
be placed in the address specified by the receive buffer. In case no
match was found the message will be placed in a generic buffer until the
corresponding receive buffer will be posted. These messages are called
unexpected and their set is called an unexpected list.

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yossi Itigin <yosefe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/rdma/ib_verbs.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index f0e9d30bb0b9..80ede01d2237 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -986,12 +986,14 @@ enum ib_cq_notify_flags {
 
 enum ib_srq_type {
 	IB_SRQT_BASIC,
-	IB_SRQT_XRC
+	IB_SRQT_XRC,
+	IB_SRQT_TM,
 };
 
 static inline bool ib_srq_has_cq(enum ib_srq_type srq_type)
 {
-	return srq_type == IB_SRQT_XRC;
+	return srq_type == IB_SRQT_XRC ||
+	       srq_type == IB_SRQT_TM;
 }
 
 enum ib_srq_attr_mask {
@@ -1017,6 +1019,10 @@ struct ib_srq_init_attr {
 			struct {
 				struct ib_xrcd *xrcd;
 			} xrc;
+
+			struct {
+				u32		max_num_tags;
+			} tag_matching;
 		};
 	} ext;
 };
-- 
2.14.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [rdma-next 05/11] IB/uverbs: Add XRQ creation parameter to UAPI
       [not found] ` <20170814135352.5324-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (3 preceding siblings ...)
  2017-08-14 13:53   ` [rdma-next 04/11] IB/core: Add new SRQ type IB_SRQT_TM Leon Romanovsky
@ 2017-08-14 13:53   ` Leon Romanovsky
  2017-08-14 13:53   ` [rdma-next 06/11] IB/uverbs: Add new SRQ type IB_SRQT_TM Leon Romanovsky
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2017-08-14 13:53 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add tm_list_size parameter to struct ib_uverbs_create_xsrq.
If SRQ type is tag-matching this field defines maximum size
of tag matching list. Otherwise, it is expected to be zero.

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yossi Itigin <yosefe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/uapi/rdma/ib_user_verbs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 63656d2e8705..d5434bbf40c8 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -1024,7 +1024,7 @@ struct ib_uverbs_create_xsrq {
 	__u32 max_wr;
 	__u32 max_sge;
 	__u32 srq_limit;
-	__u32 reserved;
+	__u32 max_num_tags;
 	__u32 xrcd_handle;
 	__u32 cq_handle;
 	__u64 driver_data[0];
-- 
2.14.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [rdma-next 06/11] IB/uverbs: Add new SRQ type IB_SRQT_TM
       [not found] ` <20170814135352.5324-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (4 preceding siblings ...)
  2017-08-14 13:53   ` [rdma-next 05/11] IB/uverbs: Add XRQ creation parameter to UAPI Leon Romanovsky
@ 2017-08-14 13:53   ` Leon Romanovsky
  2017-08-14 13:53   ` [rdma-next 07/11] IB/uverbs: Expose XRQ capabilities Leon Romanovsky
                     ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2017-08-14 13:53 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add new SRQ type capable of new tag matching feature.

When SRQ receives a message it will search through the matching list
for the corresponding posted receive buffer. The process of searching
the matching list is called tag matching.

In case the tag matching results in a match, the received message will
be placed in the address specified by the receive buffer. In case no
match was found the message will be placed in a generic buffer until the
corresponding receive buffer will be posted. These messages are called
unexpected and their set is called an unexpected list.

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yossi Itigin <yosefe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/core/uverbs_cmd.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index afa4c1b7891a..e3ac272a0fa9 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1430,7 +1430,7 @@ static int create_qp(struct ib_uverbs_file *file,
 			if (cmd->is_srq) {
 				srq = uobj_get_obj_read(srq, cmd->srq_handle,
 							file->ucontext);
-				if (!srq || srq->srq_type != IB_SRQT_BASIC) {
+				if (!srq || srq->srq_type == IB_SRQT_XRC) {
 					ret = -EINVAL;
 					goto err_put;
 				}
@@ -3463,6 +3463,9 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
+	if (cmd->srq_type == IB_SRQT_TM)
+		attr.ext.tag_matching.max_num_tags = cmd->max_num_tags;
+
 	if (cmd->srq_type == IB_SRQT_XRC) {
 		xrcd_uobj = uobj_get_read(uobj_get_type(xrcd), cmd->xrcd_handle,
 					  file->ucontext);
@@ -3597,6 +3600,7 @@ ssize_t ib_uverbs_create_srq(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
+	memset(&xcmd, 0, sizeof(xcmd));
 	xcmd.response	 = cmd.response;
 	xcmd.user_handle = cmd.user_handle;
 	xcmd.srq_type	 = IB_SRQT_BASIC;
-- 
2.14.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [rdma-next 07/11] IB/uverbs: Expose XRQ capabilities
       [not found] ` <20170814135352.5324-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (5 preceding siblings ...)
  2017-08-14 13:53   ` [rdma-next 06/11] IB/uverbs: Add new SRQ type IB_SRQT_TM Leon Romanovsky
@ 2017-08-14 13:53   ` Leon Romanovsky
  2017-08-14 13:53   ` [rdma-next 08/11] IB/mlx5: Fill " Leon Romanovsky
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2017-08-14 13:53 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Make XRQ capabilities available via ibv_query_device() verb.

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yossi Itigin <yosefe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/core/uverbs_cmd.c | 10 ++++++++++
 include/uapi/rdma/ib_user_verbs.h    | 15 +++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index e3ac272a0fa9..1996f096c4f8 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3850,6 +3850,16 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 
 	resp.raw_packet_caps = attr.raw_packet_caps;
 	resp.response_length += sizeof(resp.raw_packet_caps);
+
+	if (ucore->outlen < resp.response_length + sizeof(resp.xrq_caps))
+		goto end;
+
+	resp.xrq_caps.max_rndv_hdr_size = attr.xrq_caps.max_rndv_hdr_size;
+	resp.xrq_caps.max_num_tags      = attr.xrq_caps.max_num_tags;
+	resp.xrq_caps.max_ops		= attr.xrq_caps.max_ops;
+	resp.xrq_caps.max_sge		= attr.xrq_caps.max_sge;
+	resp.xrq_caps.flags		= attr.xrq_caps.flags;
+	resp.response_length += sizeof(resp.xrq_caps);
 end:
 	err = ib_copy_to_udata(ucore, &resp, resp.response_length);
 	return err;
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index d5434bbf40c8..9a0b6479fe0c 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -236,6 +236,20 @@ struct ib_uverbs_rss_caps {
 	__u32 reserved;
 };
 
+struct ib_uverbs_tm_caps {
+	/* Max size of rendezvous request message */
+	__u32 max_rndv_hdr_size;
+	/* Max number of entries in tag matching list */
+	__u32 max_num_tags;
+	/* TM flags */
+	__u32 flags;
+	/* Max number of outstanding list operations */
+	__u32 max_ops;
+	/* Max number of SGE in tag matching entry */
+	__u32 max_sge;
+	__u32 reserved;
+};
+
 struct ib_uverbs_ex_query_device_resp {
 	struct ib_uverbs_query_device_resp base;
 	__u32 comp_mask;
@@ -247,6 +261,7 @@ struct ib_uverbs_ex_query_device_resp {
 	struct ib_uverbs_rss_caps rss_caps;
 	__u32  max_wq_type_rq;
 	__u32 raw_packet_caps;
+	struct ib_uverbs_tm_caps xrq_caps;
 };
 
 struct ib_uverbs_query_port {
-- 
2.14.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [rdma-next 08/11] IB/mlx5: Fill XRQ capabilities
       [not found] ` <20170814135352.5324-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (6 preceding siblings ...)
  2017-08-14 13:53   ` [rdma-next 07/11] IB/uverbs: Expose XRQ capabilities Leon Romanovsky
@ 2017-08-14 13:53   ` Leon Romanovsky
  2017-08-14 13:53   ` [rdma-next 09/11] net/mlx5: Add XRQ support Leon Romanovsky
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2017-08-14 13:53 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Provide driver specific values for XRQ capabilities.

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yossi Itigin <yosefe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c    | 10 ++++++++++
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  5 +++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 387d90ad8ec1..c72c33e2b65f 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -777,6 +777,16 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 			1 << MLX5_CAP_GEN(dev->mdev, log_max_rq);
 	}
 
+	if (MLX5_CAP_GEN(mdev, tag_matching)) {
+		props->xrq_caps.max_rndv_hdr_size = MLX5_TM_MAX_RNDV_MSG_SIZE;
+		props->xrq_caps.max_num_tags =
+			(1 << MLX5_CAP_GEN(mdev, log_tag_matching_list_sz)) - 1;
+		props->xrq_caps.flags = IB_TM_CAP_RC;
+		props->xrq_caps.max_ops =
+			1 << MLX5_CAP_GEN(mdev, log_max_qp_sz);
+		props->xrq_caps.max_sge = MLX5_TM_MAX_SGE;
+	}
+
 	if (field_avail(typeof(resp), cqe_comp_caps, uhw->outlen)) {
 		resp.cqe_comp_caps.max_num =
 			MLX5_CAP_GEN(dev->mdev, cqe_compression) ?
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 7ac991070020..76a731f39d2b 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -107,6 +107,11 @@ enum {
 	MLX5_CQE_VERSION_V1,
 };
 
+enum {
+	MLX5_TM_MAX_RNDV_MSG_SIZE	= 64,
+	MLX5_TM_MAX_SGE			= 1,
+};
+
 struct mlx5_ib_vma_private_data {
 	struct list_head list;
 	struct vm_area_struct *vma;
-- 
2.14.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [rdma-next 09/11] net/mlx5: Add XRQ support
       [not found] ` <20170814135352.5324-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (7 preceding siblings ...)
  2017-08-14 13:53   ` [rdma-next 08/11] IB/mlx5: Fill " Leon Romanovsky
@ 2017-08-14 13:53   ` Leon Romanovsky
  2017-08-14 13:53   ` [rdma-next 10/11] IB/mlx5: Support IB_SRQT_TM Leon Romanovsky
  2017-08-14 13:53   ` [rdma-next 11/11] Documentation: Hardware tag matching Leon Romanovsky
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2017-08-14 13:53 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add support to new XRQ(eXtended shared Receive Queue)
hardware object. It supports SRQ semantics with addition
of extended receive buffers topologies and offloads.

Currently supports tag matching topology and rendezvouz offload.

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yossi Itigin <yosefe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/srq.c | 150 ++++++++++++++++++++++++--
 include/linux/mlx5/driver.h                   |   1 +
 include/linux/mlx5/srq.h                      |   5 +
 3 files changed, 146 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/srq.c b/drivers/net/ethernet/mellanox/mlx5/core/srq.c
index f774de6f5fcb..7673da04efa4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/srq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/srq.c
@@ -435,16 +435,128 @@ static int query_rmp_cmd(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
 	return err;
 }
 
+static int create_xrq_cmd(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
+			  struct mlx5_srq_attr *in)
+{
+	u32 create_out[MLX5_ST_SZ_DW(create_xrq_out)] = {0};
+	void *create_in;
+	void *xrqc;
+	void *wq;
+	int pas_size;
+	int inlen;
+	int err;
+
+	pas_size = get_pas_size(in);
+	inlen = MLX5_ST_SZ_BYTES(create_xrq_in) + pas_size;
+	create_in = kvzalloc(inlen, GFP_KERNEL);
+	if (!create_in)
+		return -ENOMEM;
+
+	xrqc = MLX5_ADDR_OF(create_xrq_in, create_in, xrq_context);
+	wq = MLX5_ADDR_OF(xrqc, xrqc, wq);
+
+	set_wq(wq, in);
+	memcpy(MLX5_ADDR_OF(xrqc, xrqc, wq.pas), in->pas, pas_size);
+
+	if (in->type == IB_SRQT_TM) {
+		MLX5_SET(xrqc, xrqc, topology, MLX5_XRQC_TOPOLOGY_TAG_MATCHING);
+		if (in->flags & MLX5_SRQ_FLAG_RNDV)
+			MLX5_SET(xrqc, xrqc, offload, MLX5_XRQC_OFFLOAD_RNDV);
+		MLX5_SET(xrqc, xrqc,
+			 tag_matching_topology_context.log_matching_list_sz,
+			 in->tm_log_list_size);
+	}
+	MLX5_SET(xrqc, xrqc, user_index, in->user_index);
+	MLX5_SET(xrqc, xrqc, cqn, in->cqn);
+	MLX5_SET(create_xrq_in, create_in, opcode, MLX5_CMD_OP_CREATE_XRQ);
+	err = mlx5_cmd_exec(dev, create_in, inlen, create_out,
+			    sizeof(create_out));
+	kvfree(create_in);
+	if (!err)
+		srq->srqn = MLX5_GET(create_xrq_out, create_out, xrqn);
+
+	return err;
+}
+
+static int destroy_xrq_cmd(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq)
+{
+	u32 in[MLX5_ST_SZ_DW(destroy_xrq_in)] = {0};
+	u32 out[MLX5_ST_SZ_DW(destroy_xrq_out)] = {0};
+
+	MLX5_SET(destroy_xrq_in, in, opcode, MLX5_CMD_OP_DESTROY_XRQ);
+	MLX5_SET(destroy_xrq_in, in, xrqn,   srq->srqn);
+
+	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
+
+static int arm_xrq_cmd(struct mlx5_core_dev *dev,
+		       struct mlx5_core_srq *srq,
+		       u16 lwm)
+{
+	u32 out[MLX5_ST_SZ_DW(arm_rq_out)] = {0};
+	u32 in[MLX5_ST_SZ_DW(arm_rq_in)] = {0};
+
+	MLX5_SET(arm_rq_in, in, opcode,     MLX5_CMD_OP_ARM_RQ);
+	MLX5_SET(arm_rq_in, in, op_mod,     MLX5_ARM_RQ_IN_OP_MOD_XRQ);
+	MLX5_SET(arm_rq_in, in, srq_number, srq->srqn);
+	MLX5_SET(arm_rq_in, in, lwm,	    lwm);
+
+	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
+
+static int query_xrq_cmd(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
+			 struct mlx5_srq_attr *out)
+{
+	u32 in[MLX5_ST_SZ_DW(query_xrq_in)] = {0};
+	u32 *xrq_out;
+	int outlen = MLX5_ST_SZ_BYTES(query_xrq_out);
+	void *xrqc;
+	int err;
+
+	xrq_out = kvzalloc(outlen, GFP_KERNEL);
+	if (!xrq_out)
+		return -ENOMEM;
+
+	MLX5_SET(query_xrq_in, in, opcode, MLX5_CMD_OP_QUERY_XRQ);
+	MLX5_SET(query_xrq_in, in, xrqn, srq->srqn);
+
+	err = mlx5_cmd_exec(dev, in, sizeof(in), xrq_out, outlen);
+	if (err)
+		goto out;
+
+	xrqc = MLX5_ADDR_OF(query_xrq_out, xrq_out, xrq_context);
+	get_wq(MLX5_ADDR_OF(xrqc, xrqc, wq), out);
+	if (MLX5_GET(xrqc, xrqc, state) != MLX5_XRQC_STATE_GOOD)
+		out->flags |= MLX5_SRQ_FLAG_ERR;
+	out->tm_next_tag =
+		MLX5_GET(xrqc, xrqc,
+			 tag_matching_topology_context.append_next_index);
+	out->tm_hw_phase_cnt =
+		MLX5_GET(xrqc, xrqc,
+			 tag_matching_topology_context.hw_phase_cnt);
+	out->tm_sw_phase_cnt =
+		MLX5_GET(xrqc, xrqc,
+			 tag_matching_topology_context.sw_phase_cnt);
+
+out:
+	kvfree(xrq_out);
+	return err;
+}
+
 static int create_srq_split(struct mlx5_core_dev *dev,
 			    struct mlx5_core_srq *srq,
 			    struct mlx5_srq_attr *in)
 {
 	if (!dev->issi)
 		return create_srq_cmd(dev, srq, in);
-	else if (srq->common.res == MLX5_RES_XSRQ)
+	switch (srq->common.res) {
+	case MLX5_RES_XSRQ:
 		return create_xrc_srq_cmd(dev, srq, in);
-	else
+	case MLX5_RES_XRQ:
+		return create_xrq_cmd(dev, srq, in);
+	default:
 		return create_rmp_cmd(dev, srq, in);
+	}
 }
 
 static int destroy_srq_split(struct mlx5_core_dev *dev,
@@ -452,10 +564,14 @@ static int destroy_srq_split(struct mlx5_core_dev *dev,
 {
 	if (!dev->issi)
 		return destroy_srq_cmd(dev, srq);
-	else if (srq->common.res == MLX5_RES_XSRQ)
+	switch (srq->common.res) {
+	case MLX5_RES_XSRQ:
 		return destroy_xrc_srq_cmd(dev, srq);
-	else
+	case MLX5_RES_XRQ:
+		return destroy_xrq_cmd(dev, srq);
+	default:
 		return destroy_rmp_cmd(dev, srq);
+	}
 }
 
 int mlx5_core_create_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
@@ -464,10 +580,16 @@ int mlx5_core_create_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
 	int err;
 	struct mlx5_srq_table *table = &dev->priv.srq_table;
 
-	if (in->type == IB_SRQT_XRC)
+	switch (in->type) {
+	case IB_SRQT_XRC:
 		srq->common.res = MLX5_RES_XSRQ;
-	else
+		break;
+	case IB_SRQT_TM:
+		srq->common.res = MLX5_RES_XRQ;
+		break;
+	default:
 		srq->common.res = MLX5_RES_SRQ;
+	}
 
 	err = create_srq_split(dev, srq, in);
 	if (err)
@@ -528,10 +650,14 @@ int mlx5_core_query_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
 {
 	if (!dev->issi)
 		return query_srq_cmd(dev, srq, out);
-	else if (srq->common.res == MLX5_RES_XSRQ)
+	switch (srq->common.res) {
+	case MLX5_RES_XSRQ:
 		return query_xrc_srq_cmd(dev, srq, out);
-	else
+	case MLX5_RES_XRQ:
+		return query_xrq_cmd(dev, srq, out);
+	default:
 		return query_rmp_cmd(dev, srq, out);
+	}
 }
 EXPORT_SYMBOL(mlx5_core_query_srq);
 
@@ -540,10 +666,14 @@ int mlx5_core_arm_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
 {
 	if (!dev->issi)
 		return arm_srq_cmd(dev, srq, lwm, is_srq);
-	else if (srq->common.res == MLX5_RES_XSRQ)
+	switch (srq->common.res) {
+	case MLX5_RES_XSRQ:
 		return arm_xrc_srq_cmd(dev, srq, lwm);
-	else
+	case MLX5_RES_XRQ:
+		return arm_xrq_cmd(dev, srq, lwm);
+	default:
 		return arm_rmp_cmd(dev, srq, lwm);
+	}
 }
 EXPORT_SYMBOL(mlx5_core_arm_srq);
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index ea89e32a0d5f..f17e51a4ef94 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -418,6 +418,7 @@ enum mlx5_res_type {
 	MLX5_RES_SQ	= MLX5_EVENT_QUEUE_TYPE_SQ,
 	MLX5_RES_SRQ	= 3,
 	MLX5_RES_XSRQ	= 4,
+	MLX5_RES_XRQ	= 5,
 };
 
 struct mlx5_core_rsc_common {
diff --git a/include/linux/mlx5/srq.h b/include/linux/mlx5/srq.h
index 1cde0fd53f90..24ff23e27c8a 100644
--- a/include/linux/mlx5/srq.h
+++ b/include/linux/mlx5/srq.h
@@ -38,6 +38,7 @@
 enum {
 	MLX5_SRQ_FLAG_ERR    = (1 << 0),
 	MLX5_SRQ_FLAG_WQ_SIG = (1 << 1),
+	MLX5_SRQ_FLAG_RNDV   = (1 << 2),
 };
 
 struct mlx5_srq_attr {
@@ -56,6 +57,10 @@ struct mlx5_srq_attr {
 	u32 user_index;
 	u64 db_record;
 	__be64 *pas;
+	u32 tm_log_list_size;
+	u32 tm_next_tag;
+	u32 tm_hw_phase_cnt;
+	u32 tm_sw_phase_cnt;
 };
 
 struct mlx5_core_dev;
-- 
2.14.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [rdma-next 10/11] IB/mlx5: Support IB_SRQT_TM
       [not found] ` <20170814135352.5324-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (8 preceding siblings ...)
  2017-08-14 13:53   ` [rdma-next 09/11] net/mlx5: Add XRQ support Leon Romanovsky
@ 2017-08-14 13:53   ` Leon Romanovsky
  2017-08-14 13:53   ` [rdma-next 11/11] Documentation: Hardware tag matching Leon Romanovsky
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2017-08-14 13:53 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Pass to mlx5_core flag to enable rendezvous offload, list_size and CQ
when SRQ created with IB_SRQT_TM.

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yossi Itigin <yosefe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/qp.c  |  9 +++++++--
 drivers/infiniband/hw/mlx5/srq.c | 18 +++++++++++++++---
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 5c7ce9bd466e..ebbfd721661d 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1718,10 +1718,15 @@ static int create_qp_common(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 
 	MLX5_SET(qpc, qpc, rq_type, get_rx_type(qp, init_attr));
 
-	if (qp->sq.wqe_cnt)
+	if (qp->sq.wqe_cnt) {
 		MLX5_SET(qpc, qpc, log_sq_size, ilog2(qp->sq.wqe_cnt));
-	else
+	} else {
 		MLX5_SET(qpc, qpc, no_sq, 1);
+		if (init_attr->srq &&
+		    init_attr->srq->srq_type == IB_SRQT_TM)
+			MLX5_SET(qpc, qpc, offload_type,
+				 MLX5_QPC_OFFLOAD_TYPE_RNDV);
+	}
 
 	/* Set default resources */
 	switch (init_attr->qp_type) {
diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c
index 022b4c642047..d9af04100362 100644
--- a/drivers/infiniband/hw/mlx5/srq.c
+++ b/drivers/infiniband/hw/mlx5/srq.c
@@ -101,7 +101,7 @@ static int create_srq_user(struct ib_pd *pd, struct mlx5_ib_srq *srq,
 				 udata->inlen - sizeof(ucmd)))
 		return -EINVAL;
 
-	if (in->type == IB_SRQT_XRC) {
+	if (in->type != IB_SRQT_BASIC) {
 		err = get_srq_user_index(to_mucontext(pd->uobject->context),
 					 &ucmd, udata->inlen, &uidx);
 		if (err)
@@ -145,7 +145,7 @@ static int create_srq_user(struct ib_pd *pd, struct mlx5_ib_srq *srq,
 	in->log_page_size = page_shift - MLX5_ADAPTER_PAGE_SHIFT;
 	in->page_offset = offset;
 	if (MLX5_CAP_GEN(dev->mdev, cqe_version) == MLX5_CQE_VERSION_V1 &&
-	    in->type == IB_SRQT_XRC)
+	    in->type != IB_SRQT_BASIC)
 		in->user_index = uidx;
 
 	return 0;
@@ -205,7 +205,7 @@ static int create_srq_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_srq *srq,
 
 	in->log_page_size = srq->buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT;
 	if (MLX5_CAP_GEN(dev->mdev, cqe_version) == MLX5_CQE_VERSION_V1 &&
-	    in->type == IB_SRQT_XRC)
+	    in->type != IB_SRQT_BASIC)
 		in->user_index = MLX5_IB_DEFAULT_UIDX;
 
 	return 0;
@@ -298,6 +298,18 @@ struct ib_srq *mlx5_ib_create_srq(struct ib_pd *pd,
 	else
 		in.xrcd = to_mxrcd(dev->devr.x0)->xrcdn;
 
+	if (init_attr->srq_type == IB_SRQT_TM) {
+		in.tm_log_list_size =
+			ilog2(init_attr->ext.tag_matching.max_num_tags) + 1;
+		if (in.tm_log_list_size >
+		    MLX5_CAP_GEN(dev->mdev, log_tag_matching_list_sz)) {
+			mlx5_ib_dbg(dev, "TM SRQ max_num_tags exceeding limit\n");
+			err = -EINVAL;
+			goto err_usr_kern_srq;
+		}
+		in.flags |= MLX5_SRQ_FLAG_RNDV;
+	}
+
 	if (ib_srq_has_cq(init_attr->srq_type))
 		in.cqn = to_mcq(init_attr->ext.cq)->mcq.cqn;
 	else
-- 
2.14.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [rdma-next 11/11] Documentation: Hardware tag matching
       [not found] ` <20170814135352.5324-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (9 preceding siblings ...)
  2017-08-14 13:53   ` [rdma-next 10/11] IB/mlx5: Support IB_SRQT_TM Leon Romanovsky
@ 2017-08-14 13:53   ` Leon Romanovsky
  10 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2017-08-14 13:53 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add document providing definitions of terms and core explanations
for tag matching (TM) protocols, eager and rendezvous,
TM application header, tag list manipulations and matching process.

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 Documentation/infiniband/tag_matching.txt | 64 +++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 Documentation/infiniband/tag_matching.txt

diff --git a/Documentation/infiniband/tag_matching.txt b/Documentation/infiniband/tag_matching.txt
new file mode 100644
index 000000000000..d2a3bf819226
--- /dev/null
+++ b/Documentation/infiniband/tag_matching.txt
@@ -0,0 +1,64 @@
+Tag matching logic
+
+The MPI standard defines a set of rules, known as tag-matching, for matching
+source send operations to destination receives.  The following parameters must
+match the following source and destination parameters:
+*	Communicator
+*	User tag - wild card may be specified by the receiver
+*	Source rank – wild car may be specified by the receiver
+*	Destination rank – wild
+The ordering rules require that when more than one pair of send and receive
+message envelopes may match, the pair that includes the earliest posted-send
+and the earliest posted-receive is the pair that must be used to satisfy the
+matching operation. However, this doesn’t imply that tags are consumed in
+the order they are created, e.g., a later generated tag may be consumed, if
+earlier tags can’t be used to satisfy the matching rules.
+
+When a message is sent from the sender to the receiver, the communication
+library may attempt to process the operation either after or before the
+corresponding matching receive is posted.  If a matching receive is posted,
+this is an expected message, otherwise it is called an unexpected message.
+Implementations frequently use different matching schemes for these two
+different matching instances.
+
+To keep MPI library memory footprint down, MPI implementations typically use
+two different protocols for this purpose:
+
+1.	The Eager protocol- the complete message is sent when the send is
+processed by the sender. A completion send is received in the send_cq
+notifying that the buffer can be reused.
+
+2.	The Rendezvous Protocol - the sender sends the tag-matching header,
+and perhaps a portion of data when first notifying the receiver. When the
+corresponding buffer is posted, the responder will use the information from
+the header to initiate an RDMA READ operation directly to the matching buffer.
+A fin message needs to be received in order for the buffer to be reused.
+
+Tag matching implementation
+
+There are two types of matching objects used, the posted receive list and the
+unexpected message list. The application posts receive buffers through calls
+to the MPI receive routines in the posted receive list and posts send messages
+using the MPI send routines. The head of the posted receive list may be
+maintained by the hardware, with the software expected to shadow this list.
+
+When send is initiated and arrives at the receive side, if there is no
+pre-posted receive for this arriving message, it is passed to the software and
+placed in the unexpected message list. Otherwise the match is processed,
+including rendezvous processing, if appropriate, delivering the data to the
+specified receive buffer. This allows overlapping receive-side MPI tag
+matching with computation.
+
+When a receive-message is posted, the communication library will first check
+the software unexpected message list for a matching receive. If a match is
+found, data is delivered to the user buffer, using a software controlled
+protocol. The UCX implementation uses either an eager or rendezvous protocol,
+depending on data size. If no match is found, the entire pre-posted receive
+list is maintained by the hardware, and there is space to add one more
+pre-posted receive to this list, this receive is passed to the hardware.
+Software is expected to shadow this list, to help with processing MPI cancel
+operations. In addition, because hardware and software are not expected to be
+tightly synchronized with respect to the tag-matching operation, this shadow
+list is used to detect the case that a pre-posted receive is passed to the
+hardware, as the matching unexpected message is being passed from the hardware
+to the software.
-- 
2.14.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-08-14 13:53 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-14 13:53 [pull request][rdma-next 00/11] Hardware tag matching support Leon Romanovsky
     [not found] ` <20170814135352.5324-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2017-08-14 13:53   ` [rdma-next 01/11] net/mlx5: Update HW layout definitions Leon Romanovsky
2017-08-14 13:53   ` [rdma-next 02/11] IB/core: Add XRQ capabilities Leon Romanovsky
2017-08-14 13:53   ` [rdma-next 03/11] IB/core: Separate CQ handle in SRQ context Leon Romanovsky
2017-08-14 13:53   ` [rdma-next 04/11] IB/core: Add new SRQ type IB_SRQT_TM Leon Romanovsky
2017-08-14 13:53   ` [rdma-next 05/11] IB/uverbs: Add XRQ creation parameter to UAPI Leon Romanovsky
2017-08-14 13:53   ` [rdma-next 06/11] IB/uverbs: Add new SRQ type IB_SRQT_TM Leon Romanovsky
2017-08-14 13:53   ` [rdma-next 07/11] IB/uverbs: Expose XRQ capabilities Leon Romanovsky
2017-08-14 13:53   ` [rdma-next 08/11] IB/mlx5: Fill " Leon Romanovsky
2017-08-14 13:53   ` [rdma-next 09/11] net/mlx5: Add XRQ support Leon Romanovsky
2017-08-14 13:53   ` [rdma-next 10/11] IB/mlx5: Support IB_SRQT_TM Leon Romanovsky
2017-08-14 13:53   ` [rdma-next 11/11] Documentation: Hardware tag matching Leon Romanovsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.