* [PATCH rdma-next 00/10] Hardware tag matching support
@ 2016-08-28 11:00 Leon Romanovsky
       [not found] ` <1472382050-25908-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Leon Romanovsky @ 2016-08-28 11:00 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Message Passing Interface (MPI) is a communication protocol that is
widely used for exchange of messages among processes in high-performance
computing (HPC) systems. Messages sent from a sending process to a
destination process are marked with an identifying label, referred to as
a tag. Destination processes post buffers in local memory that are
similarly marked with tags. When a message is received by the receiver
(i.e., the host computer on which the destination process is running),
the message is stored in a buffer whose tag matches the message tag. The
process of finding a buffer with a matching tag for the received packet
is called tag matching.
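
As an illustration only, software tag matching can be sketched as a
linear walk over the posted receives. The structure and function names
below are hypothetical and not part of this series; the per-entry mask
is an assumption modelled on the tag matching mask mentioned later in
the capability list.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical posted-receive entry; not a structure from this series. */
struct posted_recv {
	uint64_t tag;    /* tag the receiver is waiting for */
	uint64_t mask;   /* bits of the tag that must match */
	void    *buf;    /* destination buffer for the payload */
	int      in_use; /* nonzero while the entry is still posted */
};

/*
 * Return the index of the first posted entry whose masked tag equals the
 * masked message tag, or -1 if nothing matches. A matched entry is marked
 * as consumed so it cannot match a second message.
 */
static int tm_find_match(struct posted_recv *list, size_t n, uint64_t msg_tag)
{
	for (size_t i = 0; i < n; i++) {
		if (!list[i].in_use)
			continue;
		if (((list[i].tag ^ msg_tag) & list[i].mask) == 0) {
			list[i].in_use = 0;
			return (int)i;
		}
	}
	return -1;
}
```

The walk is O(n) per message, which is the host cost that the offload
described below is meant to remove.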

There are two protocols that are generally used to send messages over
MPI: The "Eager Protocol" is best suited to small messages that are
simply sent to the destination process and received in an appropriate
matching buffer. The "Rendezvous Protocol" is better suited to large
messages. In Rendezvous, when the sender process has a large message to
send, it first sends a small message to the destination process
announcing its intention to send the large message. This small message
is referred to as an RTS (ready to send) message. The RTS includes the
message tag and buffer address in the sender. The destination process
matches the RTS to a posted receive buffer, or posts such a buffer if
one does not already exist. Once a matching receive buffer has been
posted at the destination process side, the receiver initiates a remote
direct memory access (RDMA) read request to read the data from the
buffer address listed by the sender in the RTS message.
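
A minimal sketch of the sender side of the two protocols, under stated
assumptions: the eager size limit, the RTS layout and the rkey field are
hypothetical and are not taken from this series or from any wire format.

```c
#include <stddef.h>
#include <stdint.h>

enum msg_proto { PROTO_EAGER, PROTO_RNDV };

/* Hypothetical RTS layout: the tag plus where the payload lives on the sender. */
struct rts_msg {
	uint64_t tag;
	uint64_t remote_addr; /* sender buffer address, read later via RDMA */
	uint32_t rkey;        /* assumption: key that authorizes the RDMA read */
	uint32_t length;      /* payload size in bytes */
};

/* Small messages go eager, large ones go through rendezvous. */
static enum msg_proto choose_proto(size_t len, size_t eager_limit)
{
	return len <= eager_limit ? PROTO_EAGER : PROTO_RNDV;
}

/* Build the small announcement that precedes a rendezvous transfer. */
static struct rts_msg make_rts(uint64_t tag, const void *buf,
			       uint32_t rkey, uint32_t len)
{
	struct rts_msg rts = {
		.tag         = tag,
		.remote_addr = (uint64_t)(uintptr_t)buf,
		.rkey        = rkey,
		.length      = len,
	};
	return rts;
}
```

The receiver would match rts.tag against its posted buffers and then
issue the RDMA read from rts.remote_addr.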

MPI tag matching, when performed in software by a host processor, can
consume substantial host resources, thus detracting from the performance
of the actual software applications that are using MPI for
communications. One possible solution is to offload the entire tag
matching process to a peripheral hardware device, such as a network
interface controller (NIC). In this case, the software application using
MPI will post a set of buffers in a memory of the host processor and
will pass the entire list of tags associated with the buffers to the
NIC. In large-scale networks, however, the NIC may be required to
simultaneously support many communicating processes and contexts
(referred to in MPI parlance as "ranks" and "communicators,"
respectively). NIC access to and matching of the large lists of tags
involved in such a scenario can itself become a bottleneck. The NIC must
also be able to handle "unexpected" traffic, for which buffers and tags
have not yet been posted, which may also degrade performance.

When the NIC receives a message over the network from one of the peer
processes, and the message contains a label (tag) in accordance with the
protocol, the NIC compares the label to the labels in the part of the
list that the software pushed to the NIC. Upon finding a match to the label, the
NIC writes data conveyed in the message to the buffer in the memory that
is associated with this label and submits a notification to the software
process. The notification serves two purposes: both to indicate to the
software process that the label has been consumed, so that the process
will update the list of the labels posted to the NIC; and to inform the
software process that the data are available in the buffer. In some
cases (such as when the NIC retrieves the data from the remote node by
RDMA), the NIC may submit two notifications, in the form of completion
reports, of which the first informs the software process of the
consumption of the label and the second announces availability of the
data.
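
The two-notification case can be pictured as a tiny state tracker on the
software side. This is only a sketch; the event names and the state
structure are hypothetical and do not correspond to completion opcodes
defined by this series.

```c
#include <stdbool.h>

/* Hypothetical event kinds for the two notifications described above. */
enum tm_event {
	TM_EV_TAG_CONSUMED, /* label was matched; the list slot can be reused */
	TM_EV_DATA_READY,   /* payload is available in the buffer */
};

struct tm_recv_state {
	bool tag_consumed;
	bool data_ready;
};

/* Record one event; return true once the receive is fully complete. */
static bool tm_handle_event(struct tm_recv_state *st, enum tm_event ev)
{
	switch (ev) {
	case TM_EV_TAG_CONSUMED:
		st->tag_consumed = true;
		break;
	case TM_EV_DATA_READY:
		st->data_ready = true;
		break;
	}
	return st->tag_consumed && st->data_ready;
}
```

When a single notification serves both purposes, the caller would simply
feed both events back to back.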

This patch series adds tag matching support to the Mellanox ConnectX HCA
driver. It introduces a new hardware object, the eXtended shared Receive
Queue (XRQ), which follows SRQ semantics and adds extended receive buffer
topologies and offloads. This series adds the tag matching topology and
the rendezvous offload.

Available in the "topic/xrq" topic branch of this git repo:
git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git

Or for browsing:
https://git.kernel.org/cgit/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/xrq

Thanks,
  Artemy & Leon

Artemy Kovalyov (10):
  IB/core: Add XRQ capabilities
  IB/core: Make CQ separate part of SRQ context
  IB/core: Add new SRQ type IB_SRQT_TAG_MATCHING
  IB/uverbs: Expose tag matching capabilties to UAPI
  IB/uverbs: Expose XRQ capabilities
  IB/uverbs: Add XRQ creation parameter to UAPI
  IB/uverbs: Add new SRQ type IB_SRQT_TAG_MATCHING
  IB/mlx5: Fill XRQ capabilities
  net/mlx5: Add XRQ support
  IB/mlx5: Support IB_SRQT_TAG_MATCHING

 drivers/infiniband/core/uverbs_cmd.c          |  31 +++++-
 drivers/infiniband/core/verbs.c               |  16 +--
 drivers/infiniband/hw/mlx5/main.c             |  21 +++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |   6 ++
 drivers/infiniband/hw/mlx5/srq.c              |  15 ++-
 drivers/net/ethernet/mellanox/mlx5/core/srq.c | 150 ++++++++++++++++++++++++--
 include/linux/mlx5/driver.h                   |   1 +
 include/linux/mlx5/srq.h                      |   5 +
 include/rdma/ib_verbs.h                       |  61 +++++++++--
 include/uapi/rdma/ib_user_verbs.h             |  36 ++++++-
 10 files changed, 307 insertions(+), 35 deletions(-)

--
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH rdma-next 01/10] IB/core: Add XRQ capabilities
       [not found] ` <1472382050-25908-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2016-08-28 11:00   ` Leon Romanovsky
  2016-08-28 11:00   ` [PATCH rdma-next 02/10] IB/core: Make CQ separate part of SRQ context Leon Romanovsky
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Leon Romanovsky @ 2016-08-28 11:00 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

This patch adds the following XRQ capabilities:

* max_unexpected_tags - The maximum number of tags
    in the unexpected message list

* tag_mask_length - Number of bits in a tag matching mask

* header_size - The maximum size for the TM header including RNDV info

* app_context_size - The size for the application context field in the
    XRQ context

* max_match_list - Maximal number of tags which can be matched
    simultaneously

* capability_flags - Tag matching message type supported:
    - IBV_NO_TAG - The HW supports messages without tag
	sent on QPs attached to an XRQ
    - IBV_EAGER_EXPECTED - The HW supports tag matching
	for EAGER messages when the send arrives after
	the corresponding receive
    - IBV_EAGER_UNEXPECTED - The HW supports tag matching
	for EAGER messages when the send arrives before
	the corresponding receive
    - IBV_RNDV_EXPECTED_RC - The HW supports tag matching
	for RENDEZVOUS messages when the send arrives after
	the corresponding receive (for RC QPs)
    - IBV_RNDV_UNEXPECTED - The HW supports tag matching
	for RENDEZVOUS messages when the send arrives before
	the corresponding receive
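
A short sketch of how a consumer might test these flags. The flag values
are copied from the enum added by this patch; the helper name
xrq_has_caps() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined by the enum in this patch. */
enum ibv_xrq_cap_flags {
	IBV_NO_TAG           = 1 << 0,
	IBV_EAGER_EXPECTED   = 1 << 1,
	IBV_EAGER_UNEXPECTED = 1 << 2,
	IBV_RNDV_EXPECTED_RC = 1 << 3,
	IBV_RNDV_UNEXPECTED  = 1 << 5,
};

/* Hypothetical helper: true only if every required bit is set. */
static bool xrq_has_caps(uint32_t capability_flags, uint32_t required)
{
	return (capability_flags & required) == required;
}
```

A caller that needs full rendezvous offload would pass
IBV_RNDV_EXPECTED_RC | IBV_RNDV_UNEXPECTED as the required mask.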

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/rdma/ib_verbs.h | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 8e90dd2..000f57d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -261,6 +261,38 @@ struct ib_odp_caps {
 	} per_transport_caps;
 };
 
+enum ibv_xrq_cap_flags {
+	/* The HW supports messages without tag
+	 * sent on QPs attached to a XRQ
+	 */
+	IBV_NO_TAG	     = 1 << 0,
+	/* The HW supports tag matching for EAGER messages when
+	 * the send arrives after the corresponding receive
+	 */
+	IBV_EAGER_EXPECTED   = 1 << 1,
+	/* The HW supports tag matching for EAGER messages when
+	 * the send arrives before the corresponding receive
+	 */
+	IBV_EAGER_UNEXPECTED = 1 << 2,
+	/* The HW supports tag matching for RENDEZVOUS messages when
+	 * the send arrives after the corresponding receive (for RC QPs)
+	 */
+	IBV_RNDV_EXPECTED_RC = 1 << 3,
+	/* The HW supports tag matching for RENDEZVOUS messages when
+	 * the send arrives before the corresponding receive
+	 */
+	IBV_RNDV_UNEXPECTED  = 1 << 5,
+};
+
+struct ib_xrq_caps {
+	uint32_t max_unexpected_tags;
+	uint32_t tag_mask_length;
+	uint32_t header_size;
+	uint32_t app_context_size;
+	uint32_t max_match_list;
+	uint32_t capability_flags;
+};
+
 enum ib_cq_creation_flags {
 	IB_CQ_FLAGS_TIMESTAMP_COMPLETION   = 1 << 0,
 	IB_CQ_FLAGS_IGNORE_OVERRUN	   = 1 << 1,
@@ -318,6 +350,7 @@ struct ib_device_attr {
 	struct ib_odp_caps	odp_caps;
 	uint64_t		timestamp_mask;
 	uint64_t		hca_core_clock; /* in KHZ */
+	struct ib_xrq_caps	xrq_caps;
 };
 
 enum ib_mtu {
-- 
2.7.4


* [PATCH rdma-next 02/10] IB/core: Make CQ separate part of SRQ context
       [not found] ` <1472382050-25908-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2016-08-28 11:00   ` [PATCH rdma-next 01/10] IB/core: Add XRQ capabilities Leon Romanovsky
@ 2016-08-28 11:00   ` Leon Romanovsky
  2016-08-28 11:00   ` [PATCH rdma-next 03/10] IB/core: Add new SRQ type IB_SRQT_TAG_MATCHING Leon Romanovsky
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Leon Romanovsky @ 2016-08-28 11:00 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Before this change, the CQ attached to an SRQ was part of the XRC-specific
extension. Moving the CQ out makes it available to other SRQ types that
extend SRQ functionality.

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/core/uverbs_cmd.c |  2 +-
 drivers/infiniband/core/verbs.c      | 16 +++++++++-------
 drivers/infiniband/hw/mlx5/main.c    |  8 ++++----
 include/rdma/ib_verbs.h              | 19 +++++++++++++------
 4 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index f664731..e1fb1cd 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3847,7 +3847,7 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 	srq->srq_context   = attr.srq_context;
 
 	if (cmd->srq_type == IB_SRQT_XRC) {
-		srq->ext.xrc.cq   = attr.ext.xrc.cq;
+		srq->ext.cq       = attr.ext.xrc.cq;
 		srq->ext.xrc.xrcd = attr.ext.xrc.xrcd;
 		atomic_inc(&attr.ext.xrc.cq->usecnt);
 		atomic_inc(&attr.ext.xrc.xrcd->usecnt);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f2b776e..3349ffb 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -596,11 +596,13 @@ struct ib_srq *ib_create_srq(struct ib_pd *pd,
 		srq->event_handler = srq_init_attr->event_handler;
 		srq->srq_context   = srq_init_attr->srq_context;
 		srq->srq_type      = srq_init_attr->srq_type;
+		if (ib_srq_has_cq(srq->srq_type)) {
+			srq->ext.cq   = srq_init_attr->ext.xrc.cq;
+			atomic_inc(&srq->ext.cq->usecnt);
+		}
 		if (srq->srq_type == IB_SRQT_XRC) {
 			srq->ext.xrc.xrcd = srq_init_attr->ext.xrc.xrcd;
-			srq->ext.xrc.cq   = srq_init_attr->ext.xrc.cq;
 			atomic_inc(&srq->ext.xrc.xrcd->usecnt);
-			atomic_inc(&srq->ext.xrc.cq->usecnt);
 		}
 		atomic_inc(&pd->usecnt);
 		atomic_set(&srq->usecnt, 0);
@@ -641,18 +643,18 @@ int ib_destroy_srq(struct ib_srq *srq)
 
 	pd = srq->pd;
 	srq_type = srq->srq_type;
-	if (srq_type == IB_SRQT_XRC) {
+	if (ib_srq_has_cq(srq_type))
+		cq = srq->ext.cq;
+	if (srq_type == IB_SRQT_XRC)
 		xrcd = srq->ext.xrc.xrcd;
-		cq = srq->ext.xrc.cq;
-	}
 
 	ret = srq->device->destroy_srq(srq);
 	if (!ret) {
 		atomic_dec(&pd->usecnt);
-		if (srq_type == IB_SRQT_XRC) {
+		if (srq_type == IB_SRQT_XRC)
 			atomic_dec(&xrcd->usecnt);
+		if (ib_srq_has_cq(srq_type))
 			atomic_dec(&cq->usecnt);
-		}
 	}
 
 	return ret;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index f02a975..77ca218 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2387,9 +2387,9 @@ static int create_dev_resources(struct mlx5_ib_resources *devr)
 	devr->s0->srq_context   = NULL;
 	devr->s0->srq_type      = IB_SRQT_XRC;
 	devr->s0->ext.xrc.xrcd	= devr->x0;
-	devr->s0->ext.xrc.cq	= devr->c0;
+	devr->s0->ext.cq	= devr->c0;
 	atomic_inc(&devr->s0->ext.xrc.xrcd->usecnt);
-	atomic_inc(&devr->s0->ext.xrc.cq->usecnt);
+	atomic_inc(&devr->s0->ext.cq->usecnt);
 	atomic_inc(&devr->p0->usecnt);
 	atomic_set(&devr->s0->usecnt, 0);
 
@@ -2408,9 +2408,9 @@ static int create_dev_resources(struct mlx5_ib_resources *devr)
 	devr->s1->event_handler = NULL;
 	devr->s1->srq_context   = NULL;
 	devr->s1->srq_type      = IB_SRQT_BASIC;
-	devr->s1->ext.xrc.cq	= devr->c0;
+	devr->s1->ext.cq	= devr->c0;
 	atomic_inc(&devr->p0->usecnt);
-	atomic_set(&devr->s0->usecnt, 0);
+	atomic_set(&devr->s1->usecnt, 0);
 
 	for (port = 0; port < ARRAY_SIZE(devr->ports); ++port) {
 		INIT_WORK(&devr->ports[port].pkey_change_work,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 000f57d..8b92698 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -935,6 +935,11 @@ enum ib_srq_type {
 	IB_SRQT_XRC
 };
 
+static inline bool ib_srq_has_cq(enum ib_srq_type srq_type)
+{
+	return srq_type == IB_SRQT_XRC;
+}
+
 enum ib_srq_attr_mask {
 	IB_SRQ_MAX_WR	= 1 << 0,
 	IB_SRQ_LIMIT	= 1 << 1,
@@ -1457,12 +1462,14 @@ struct ib_srq {
 	enum ib_srq_type	srq_type;
 	atomic_t		usecnt;
 
-	union {
-		struct {
-			struct ib_xrcd *xrcd;
-			struct ib_cq   *cq;
-			u32		srq_num;
-		} xrc;
+	struct {
+		struct ib_cq   *cq;
+		union {
+			struct {
+				struct ib_xrcd *xrcd;
+				u32		srq_num;
+			} xrc;
+		};
 	} ext;
 };
 
-- 
2.7.4


* [PATCH rdma-next 03/10] IB/core: Add new SRQ type IB_SRQT_TAG_MATCHING
       [not found] ` <1472382050-25908-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2016-08-28 11:00   ` [PATCH rdma-next 01/10] IB/core: Add XRQ capabilities Leon Romanovsky
  2016-08-28 11:00   ` [PATCH rdma-next 02/10] IB/core: Make CQ separate part of SRQ context Leon Romanovsky
@ 2016-08-28 11:00   ` Leon Romanovsky
  2016-08-28 11:00   ` [PATCH rdma-next 04/10] IB/uverbs: Expose tag matching capabilties to UAPI Leon Romanovsky
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Leon Romanovsky @ 2016-08-28 11:00 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add a new SRQ type, IB_SRQT_TAG_MATCHING. It supports the extended tag
matching receive buffer topology and the rendezvous offload.

When the SRQ receives a message, it searches the matching list for the
corresponding posted receive buffer. This search is called tag matching.
If a match is found, the received message is placed at the address
specified by the receive buffer. If no match is found, the message is
placed in a generic buffer until the corresponding receive buffer is
posted. Such messages are called unexpected, and the set of them is
called the unexpected list.
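
The interplay between the matching list and the unexpected list can be
sketched as follows. This is a simplified software model with exact-tag
comparison and fixed-size arrays; all names are hypothetical and it does
not describe how the hardware implements the lists.

```c
#include <stddef.h>
#include <stdint.h>

#define TM_MAX 8

/* Hypothetical fixed-size tag list, kept in arrival order. */
struct tm_queue {
	uint64_t tags[TM_MAX];
	size_t   n;
};

/* Remove the first entry equal to tag; return 1 if one was found. */
static int tm_take(struct tm_queue *q, uint64_t tag)
{
	for (size_t i = 0; i < q->n; i++) {
		if (q->tags[i] == tag) {
			for (size_t j = i + 1; j < q->n; j++)
				q->tags[j - 1] = q->tags[j];
			q->n--;
			return 1;
		}
	}
	return 0;
}

/* Append tag; return 0 on success, -1 if the list is full. */
static int tm_put(struct tm_queue *q, uint64_t tag)
{
	if (q->n == TM_MAX)
		return -1;
	q->tags[q->n++] = tag;
	return 0;
}

/* Message arrival: 1 = matched, 0 = stored as unexpected, -1 = list full. */
static int tm_on_message(struct tm_queue *posted, struct tm_queue *unexpected,
			 uint64_t tag)
{
	if (tm_take(posted, tag))
		return 1;
	return tm_put(unexpected, tag) ? -1 : 0;
}

/* Receive posted: 1 = served from unexpected, 0 = queued, -1 = list full. */
static int tm_on_post_recv(struct tm_queue *posted, struct tm_queue *unexpected,
			   uint64_t tag)
{
	if (tm_take(unexpected, tag))
		return 1;
	return tm_put(posted, tag) ? -1 : 0;
}
```

Checking the unexpected list before queuing a new receive is what lets a
late receive pick up a message that arrived early.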

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/rdma/ib_verbs.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 8b92698..0e7a711 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -932,12 +932,14 @@ enum ib_cq_notify_flags {
 
 enum ib_srq_type {
 	IB_SRQT_BASIC,
-	IB_SRQT_XRC
+	IB_SRQT_XRC,
+	IB_SRQT_TAG_MATCHING,
 };
 
 static inline bool ib_srq_has_cq(enum ib_srq_type srq_type)
 {
-	return srq_type == IB_SRQT_XRC;
+	return srq_type == IB_SRQT_XRC ||
+	       srq_type == IB_SRQT_TAG_MATCHING;
 }
 
 enum ib_srq_attr_mask {
@@ -962,6 +964,11 @@ struct ib_srq_init_attr {
 			struct ib_xrcd *xrcd;
 			struct ib_cq   *cq;
 		} xrc;
+
+		struct {
+			struct ib_cq   *cq;
+			u32		list_size;
+		} tag_matching;
 	} ext;
 };
 
-- 
2.7.4


* [PATCH rdma-next 04/10] IB/uverbs: Expose tag matching capabilties to UAPI
       [not found] ` <1472382050-25908-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (2 preceding siblings ...)
  2016-08-28 11:00   ` [PATCH rdma-next 03/10] IB/core: Add new SRQ type IB_SRQT_TAG_MATCHING Leon Romanovsky
@ 2016-08-28 11:00   ` Leon Romanovsky
  2016-08-28 11:00   ` [PATCH rdma-next 05/10] IB/uverbs: Expose XRQ capabilities Leon Romanovsky
                     ` (6 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Leon Romanovsky @ 2016-08-28 11:00 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Expose the following tag matching capabilities to UAPI:

* max_unexpected_tags - The maximum number of tags
    in the unexpected message list

* tag_mask_length - Number of bits in a tag matching mask

* header_size - The maximum size for the TM header including RNDV info

* app_context_size - The size for the application context field in the
    XRQ context

* max_match_list - Maximal number of tags which can be matched
    simultaneously

* capability_flags - Tag matching message type supported:
    - IB_NO_TAG - The HW supports messages without tag
	sent on QPs attached to an XRQ
    - IB_EAGER_EXPECTED - The HW supports tag matching
	for EAGER messages when the send arrives after
	the corresponding receive
    - IB_EAGER_UNEXPECTED - The HW supports tag matching
	for EAGER messages when the send arrives before
	the corresponding receive
    - IB_RNDV_EXPECTED - The HW supports tag matching
	for RENDEZVOUS messages when the send arrives after
	the corresponding receive (for RC QPs)
    - IB_RNDV_UNEXPECTED - The HW supports tag matching
	for RENDEZVOUS messages when the send arrives before
	the corresponding receive

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/uapi/rdma/ib_user_verbs.h | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 7f035f4b..4da8f90 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -224,6 +224,39 @@ struct ib_uverbs_odp_caps {
 	__u32 reserved;
 };

+
+enum ibv_tm_cap_flags {
+	/* The HW supports messages without tag
+	 * sent on QPs attached to a XRQ
+	 */
+	IB_NO_TAG	     = 1 << 0,
+	/* The HW supports tag matching for EAGER messages when
+	 * the send arrives after the corresponding receive
+	 */
+	IB_EAGER_EXPECTED   = 1 << 1,
+	/* The HW supports tag matching for EAGER messages when
+	 * the send arrives before the corresponding receive
+	 */
+	IB_EAGER_UNEXPECTED = 1 << 2,
+	/* The HW supports tag matching for RENDEZVOUS messages when
+	 * the send arrives after the corresponding receive (for RC QPs)
+	 */
+	IB_RNDV_EXPECTED    = 1 << 3,
+	/* The HW supports tag matching for RENDEZVOUS messages when
+	 * the send arrives before the corresponding receive
+	 */
+	IB_RNDV_UNEXPECTED  = 1 << 5,
+};
+
+struct ib_uverbs_tm_caps {
+	__u32 max_unexpected_tags;
+	__u32 tag_mask_length;
+	__u32 header_size;
+	__u32 app_context_size;
+	__u32 max_match_list;
+	__u32 capability_flags;
+};
+
 struct ib_uverbs_ex_query_device_resp {
 	struct ib_uverbs_query_device_resp base;
 	__u32 comp_mask;
@@ -232,6 +265,7 @@ struct ib_uverbs_ex_query_device_resp {
 	__u64 timestamp_mask;
 	__u64 hca_core_clock; /* in KHZ */
 	__u64 device_cap_flags_ex;
+	struct ib_uverbs_tm_caps xrq_caps;
 };

 struct ib_uverbs_query_port {
--
2.7.4


* [PATCH rdma-next 05/10] IB/uverbs: Expose XRQ capabilities
       [not found] ` <1472382050-25908-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (3 preceding siblings ...)
  2016-08-28 11:00   ` [PATCH rdma-next 04/10] IB/uverbs: Expose tag matching capabilties to UAPI Leon Romanovsky
@ 2016-08-28 11:00   ` Leon Romanovsky
  2016-08-28 11:00   ` [PATCH rdma-next 06/10] IB/uverbs: Add XRQ creation parameter to UAPI Leon Romanovsky
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Leon Romanovsky @ 2016-08-28 11:00 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Make XRQ capabilities available via ibv_query_device() verb.
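
The response is extended in the usual way for this command: a new field
is filled only when the caller's output buffer is large enough to hold
it, so older user space keeps working. A standalone sketch of that
pattern follows; the structures and the function are simplified,
hypothetical stand-ins and not the kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the capability block. */
struct tm_caps {
	uint32_t max_unexpected_tags;
	uint32_t tag_mask_length;
	uint32_t header_size;
	uint32_t app_context_size;
	uint32_t max_match_list;
	uint32_t capability_flags;
};

/* Hypothetical stand-in for the tail of the extended response. */
struct query_resp {
	uint64_t       device_cap_flags_ex;
	struct tm_caps xrq_caps;
};

/*
 * Fill as many trailing fields as fit in outlen bytes and return the
 * number of bytes that are valid in resp.
 */
static size_t fill_query_resp(struct query_resp *resp, size_t outlen,
			      uint64_t cap_flags, const struct tm_caps *caps)
{
	size_t len = 0;

	if (outlen < len + sizeof(resp->device_cap_flags_ex))
		return len;
	resp->device_cap_flags_ex = cap_flags;
	len += sizeof(resp->device_cap_flags_ex);

	/* Older callers pass a shorter buffer and stop here. */
	if (outlen < len + sizeof(resp->xrq_caps))
		return len;
	resp->xrq_caps = *caps;
	len += sizeof(resp->xrq_caps);

	return len;
}
```

Only the returned number of bytes would be copied back to the caller.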

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/core/uverbs_cmd.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index e1fb1cd..548d4b4 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -4173,6 +4173,17 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 
 	resp.device_cap_flags_ex = attr.device_cap_flags;
 	resp.response_length += sizeof(resp.device_cap_flags_ex);
+
+	if (ucore->outlen < resp.response_length + sizeof(resp.xrq_caps))
+		goto end;
+
+	resp.xrq_caps.max_unexpected_tags = attr.xrq_caps.max_unexpected_tags;
+	resp.xrq_caps.tag_mask_length = attr.xrq_caps.tag_mask_length;
+	resp.xrq_caps.header_size = attr.xrq_caps.header_size;
+	resp.xrq_caps.app_context_size = attr.xrq_caps.app_context_size;
+	resp.xrq_caps.max_match_list = attr.xrq_caps.max_match_list;
+	resp.xrq_caps.capability_flags = attr.xrq_caps.capability_flags;
+	resp.response_length += sizeof(resp.xrq_caps);
 end:
 	err = ib_copy_to_udata(ucore, &resp, resp.response_length);
 	return err;
-- 
2.7.4


* [PATCH rdma-next 06/10] IB/uverbs: Add XRQ creation parameter to UAPI
       [not found] ` <1472382050-25908-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (4 preceding siblings ...)
  2016-08-28 11:00   ` [PATCH rdma-next 05/10] IB/uverbs: Expose XRQ capabilities Leon Romanovsky
@ 2016-08-28 11:00   ` Leon Romanovsky
  2016-08-28 11:00   ` [PATCH rdma-next 07/10] IB/uverbs: Add new SRQ type IB_SRQT_TAG_MATCHING Leon Romanovsky
                     ` (4 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Leon Romanovsky @ 2016-08-28 11:00 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add the tm_list_size parameter to struct ib_uverbs_create_xsrq.
If the SRQ type is tag matching, this field defines the maximum size
of the tag matching list. Otherwise it is reserved.

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/uapi/rdma/ib_user_verbs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 4da8f90..475e098 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -968,7 +968,7 @@ struct ib_uverbs_create_xsrq {
 	__u32 max_wr;
 	__u32 max_sge;
 	__u32 srq_limit;
-	__u32 reserved;
+	__u32 tm_list_size;
 	__u32 xrcd_handle;
 	__u32 cq_handle;
 	__u64 driver_data[0];
-- 
2.7.4


* [PATCH rdma-next 07/10] IB/uverbs: Add new SRQ type IB_SRQT_TAG_MATCHING
       [not found] ` <1472382050-25908-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (5 preceding siblings ...)
  2016-08-28 11:00   ` [PATCH rdma-next 06/10] IB/uverbs: Add XRQ creation parameter to UAPI Leon Romanovsky
@ 2016-08-28 11:00   ` Leon Romanovsky
  2016-08-28 11:00   ` [PATCH rdma-next 08/10] IB/mlx5: Fill XRQ capabilities Leon Romanovsky
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Leon Romanovsky @ 2016-08-28 11:00 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add a new SRQ type that supports the new tag matching feature.

When the SRQ receives a message, it searches the matching list for the
corresponding posted receive buffer. This search is called tag matching.
If a match is found, the received message is placed at the address
specified by the receive buffer. If no match is found, the message is
placed in a generic buffer until the corresponding receive buffer is
posted. Such messages are called unexpected, and the set of them is
called the unexpected list.

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/core/uverbs_cmd.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 548d4b4..b240c9a 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1831,7 +1831,7 @@ static int create_qp(struct ib_uverbs_file *file,
 			if (cmd->is_srq) {
 				srq = idr_read_srq(cmd->srq_handle,
 						   file->ucontext);
-				if (!srq || srq->srq_type != IB_SRQT_BASIC) {
+				if (!srq || srq->srq_type == IB_SRQT_XRC) {
 					ret = -EINVAL;
 					goto err_put;
 				}
@@ -3815,6 +3815,15 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 			ret = -EINVAL;
 			goto err_put_xrcd;
 		}
+	} else if (cmd->srq_type == IB_SRQT_TAG_MATCHING) {
+		attr.ext.tag_matching.cq  = idr_read_cq(cmd->cq_handle,
+							file->ucontext, 0);
+		if (!attr.ext.tag_matching.cq) {
+			ret = -EINVAL;
+			goto err;
+		}
+
+		attr.ext.tag_matching.list_size = cmd->tm_list_size;
 	}
 
 	pd  = idr_read_pd(cmd->pd_handle, file->ucontext);
@@ -3851,6 +3860,9 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 		srq->ext.xrc.xrcd = attr.ext.xrc.xrcd;
 		atomic_inc(&attr.ext.xrc.cq->usecnt);
 		atomic_inc(&attr.ext.xrc.xrcd->usecnt);
+	} else if (cmd->srq_type == IB_SRQT_TAG_MATCHING) {
+		srq->ext.cq = attr.ext.tag_matching.cq;
+		atomic_inc(&attr.ext.tag_matching.cq->usecnt);
 	}
 
 	atomic_inc(&pd->usecnt);
@@ -3877,6 +3889,8 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 	if (cmd->srq_type == IB_SRQT_XRC) {
 		put_uobj_read(xrcd_uobj);
 		put_cq_read(attr.ext.xrc.cq);
+	} else if (cmd->srq_type == IB_SRQT_TAG_MATCHING) {
+		put_cq_read(attr.ext.tag_matching.cq);
 	}
 	put_pd_read(pd);
 
@@ -3902,6 +3916,8 @@ err_put:
 err_put_cq:
 	if (cmd->srq_type == IB_SRQT_XRC)
 		put_cq_read(attr.ext.xrc.cq);
+	else if (cmd->srq_type == IB_SRQT_TAG_MATCHING)
+		put_cq_read(attr.ext.tag_matching.cq);
 
 err_put_xrcd:
 	if (cmd->srq_type == IB_SRQT_XRC) {
-- 
2.7.4


* [PATCH rdma-next 08/10] IB/mlx5: Fill XRQ capabilities
       [not found] ` <1472382050-25908-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (6 preceding siblings ...)
  2016-08-28 11:00   ` [PATCH rdma-next 07/10] IB/uverbs: Add new SRQ type IB_SRQT_TAG_MATCHING Leon Romanovsky
@ 2016-08-28 11:00   ` Leon Romanovsky
  2016-08-28 11:00   ` [PATCH rdma-next 09/10] net/mlx5: Add XRQ support Leon Romanovsky
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Leon Romanovsky @ 2016-08-28 11:00 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Provide driver-specific values for the XRQ capabilities.

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c    | 13 +++++++++++++
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  6 ++++++
 2 files changed, 19 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 77ca218..692a687 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -594,6 +594,19 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	if (!mlx5_core_is_pf(mdev))
 		props->device_cap_flags |= IB_DEVICE_VIRTUAL_FUNCTION;
 
+	props->xrq_caps.max_unexpected_tags =
+		1 << MLX5_CAP_GEN(mdev, log_max_srq_sz);
+	props->xrq_caps.tag_mask_length  = MLX5_TM_TAG_SIZE;
+	props->xrq_caps.header_size      = MLX5_TM_HEADER_SIZE;
+	props->xrq_caps.app_context_size = MLX5_TM_APP_CTX_SIZE;
+	props->xrq_caps.max_match_list   =
+		1 << MLX5_CAP_GEN(mdev, log_tag_matching_list_sz);
+	props->xrq_caps.capability_flags = IBV_NO_TAG |
+					   IBV_EAGER_EXPECTED |
+					   IBV_EAGER_UNEXPECTED |
+					   IBV_RNDV_EXPECTED_RC |
+					   IBV_RNDV_UNEXPECTED;
+
 	if (uhw->outlen) {
 		err = ib_copy_to_udata(uhw, &resp, resp.response_length);
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index a59034a..1c204d7 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -105,6 +105,12 @@ enum {
 	MLX5_CQE_VERSION_V1,
 };
 
+enum {
+	MLX5_TM_TAG_SIZE		= 64,
+	MLX5_TM_HEADER_SIZE		= 16,
+	MLX5_TM_APP_CTX_SIZE		= 32,
+};
+
 struct mlx5_ib_vma_private_data {
 	struct list_head list;
 	struct vm_area_struct *vma;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH rdma-next 09/10] net/mlx5: Add XRQ support
       [not found] ` <1472382050-25908-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (7 preceding siblings ...)
  2016-08-28 11:00   ` [PATCH rdma-next 08/10] IB/mlx5: Fill XRQ capabilities Leon Romanovsky
@ 2016-08-28 11:00   ` Leon Romanovsky
  2016-08-28 11:00   ` [PATCH rdma-next 10/10] IB/mlx5: Support IB_SRQT_TAG_MATCHING Leon Romanovsky
  2016-10-07 14:56   ` [PATCH rdma-next 00/10] Hardware tag matching support Leon Romanovsky
  10 siblings, 0 replies; 23+ messages in thread
From: Leon Romanovsky @ 2016-08-28 11:00 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add support for the new XRQ (eXtended shared Receive Queue)
hardware object. It supports SRQ semantics with the addition
of extended receive buffer topologies and offloads.

Currently supports the tag matching topology and rendezvous offload.

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
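Note (not part of the commit message): a minimal sketch of the dispatch
this patch sets up, assuming the reading of the hunks below is right.
The SRQ type selects a resource type at creation time, and every later
command is routed on that resource type, with the legacy command used
when ISSI is 0. All names ending in _sketch are hypothetical; the
numeric resource values follow enum mlx5_res_type.

```c
/* Hypothetical mirrors of the types used by the dispatch code. */
enum srq_type_sketch { SRQT_BASIC_SKETCH, SRQT_XRC_SKETCH, SRQT_TM_SKETCH };
enum res_type_sketch { RES_SRQ_SKETCH = 3, RES_XSRQ_SKETCH = 4, RES_XRQ_SKETCH = 5 };
enum create_cmd_sketch { CMD_SRQ, CMD_XRC_SRQ, CMD_XRQ, CMD_RMP };

/* Same mapping as the switch added to mlx5_core_create_srq(). */
static inline enum res_type_sketch srq_type_to_res(enum srq_type_sketch type)
{
	switch (type) {
	case SRQT_XRC_SKETCH:
		return RES_XSRQ_SKETCH;
	case SRQT_TM_SKETCH:
		return RES_XRQ_SKETCH;
	default:
		return RES_SRQ_SKETCH;
	}
}

/* Same routing as create_srq_split(): legacy path first, then by type. */
static inline enum create_cmd_sketch pick_create_cmd(int issi,
						      enum res_type_sketch res)
{
	if (!issi)
		return CMD_SRQ;
	switch (res) {
	case RES_XSRQ_SKETCH:
		return CMD_XRC_SRQ;
	case RES_XRQ_SKETCH:
		return CMD_XRQ;
	default:
		return CMD_RMP;
	}
}
```

One consequence visible in the sketch: with ISSI 0 a tag matching SRQ
still takes the legacy create path, exactly as in the patch.
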
 drivers/net/ethernet/mellanox/mlx5/core/srq.c | 150 ++++++++++++++++++++++++--
 include/linux/mlx5/driver.h                   |   1 +
 include/linux/mlx5/srq.h                      |   5 +
 3 files changed, 146 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/srq.c b/drivers/net/ethernet/mellanox/mlx5/core/srq.c
index 30996300..0669dd9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/srq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/srq.c
@@ -435,16 +435,128 @@ out:
 	return err;
 }

+static int create_xrq_cmd(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
+			  struct mlx5_srq_attr *in)
+{
+	u32 create_out[MLX5_ST_SZ_DW(create_xrq_out)] = {0};
+	void *create_in;
+	void *xrqc;
+	void *wq;
+	int pas_size;
+	int inlen;
+	int err;
+
+	pas_size = get_pas_size(in);
+	inlen = MLX5_ST_SZ_BYTES(create_xrq_in) + pas_size;
+	create_in = mlx5_vzalloc(inlen);
+	if (!create_in)
+		return -ENOMEM;
+
+	xrqc = MLX5_ADDR_OF(create_xrq_in, create_in, xrq_context);
+	wq = MLX5_ADDR_OF(xrqc, xrqc, wq);
+
+	set_wq(wq, in);
+	memcpy(MLX5_ADDR_OF(xrqc, xrqc, wq.pas), in->pas, pas_size);
+
+	if (in->type == IB_SRQT_TAG_MATCHING) {
+		MLX5_SET(xrqc, xrqc, topology, MLX5_XRQC_TOPOLOGY_TAG_MATCHING);
+		if (in->flags & MLX5_SRQ_FLAG_RNDV)
+			MLX5_SET(xrqc, xrqc, offload, MLX5_XRQC_OFFLOAD_RNDV);
+		MLX5_SET(xrqc, xrqc,
+			 tag_matching_topology_context.log_matching_list_sz,
+			 in->tm_log_list_size);
+	}
+	MLX5_SET(xrqc, xrqc, user_index, in->user_index);
+	MLX5_SET(xrqc, xrqc, cqn, in->cqn);
+	MLX5_SET(create_xrq_in, create_in, opcode, MLX5_CMD_OP_CREATE_XRQ);
+	err = mlx5_cmd_exec(dev, create_in, inlen, create_out,
+			    sizeof(create_out));
+	kvfree(create_in);
+	if (!err)
+		srq->srqn = MLX5_GET(create_xrq_out, create_out, xrqn);
+
+	return err;
+}
+
+static int destroy_xrq_cmd(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq)
+{
+	u32 in[MLX5_ST_SZ_DW(destroy_xrq_in)] = {0};
+	u32 out[MLX5_ST_SZ_DW(destroy_xrq_out)] = {0};
+
+	MLX5_SET(destroy_xrq_in, in, opcode, MLX5_CMD_OP_DESTROY_XRQ);
+	MLX5_SET(destroy_xrq_in, in, xrqn,   srq->srqn);
+
+	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
+
+static int arm_xrq_cmd(struct mlx5_core_dev *dev,
+		       struct mlx5_core_srq *srq,
+		       u16 lwm)
+{
+	u32 out[MLX5_ST_SZ_DW(arm_rq_out)] = {0};
+	u32 in[MLX5_ST_SZ_DW(arm_rq_in)] = {0};
+
+	MLX5_SET(arm_rq_in, in, opcode,     MLX5_CMD_OP_ARM_RQ);
+	MLX5_SET(arm_rq_in, in, op_mod,     MLX5_ARM_RQ_IN_OP_MOD_XRQ);
+	MLX5_SET(arm_rq_in, in, srq_number, srq->srqn);
+	MLX5_SET(arm_rq_in, in, lwm,	    lwm);
+
+	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
+
+static int query_xrq_cmd(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
+			 struct mlx5_srq_attr *out)
+{
+	u32 in[MLX5_ST_SZ_DW(query_xrq_in)] = {0};
+	u32 *xrq_out;
+	int outlen = MLX5_ST_SZ_BYTES(query_xrq_out);
+	void *xrqc;
+	int err;
+
+	xrq_out = mlx5_vzalloc(outlen);
+	if (!xrq_out)
+		return -ENOMEM;
+
+	MLX5_SET(query_xrq_in, in, opcode, MLX5_CMD_OP_QUERY_XRQ);
+	MLX5_SET(query_xrq_in, in, xrqn, srq->srqn);
+
+	err = mlx5_cmd_exec(dev, in, sizeof(in), xrq_out, outlen);
+	if (err)
+		goto out;
+
+	xrqc = MLX5_ADDR_OF(query_xrq_out, xrq_out, xrq_context);
+	get_wq(MLX5_ADDR_OF(xrqc, xrqc, wq), out);
+	if (MLX5_GET(xrqc, xrqc, state) != MLX5_XRQC_STATE_GOOD)
+		out->flags |= MLX5_SRQ_FLAG_ERR;
+	out->tm_next_tag =
+		MLX5_GET(xrqc, xrqc,
+			 tag_matching_topology_context.append_next_index);
+	out->tm_hw_phase_cnt =
+		MLX5_GET(xrqc, xrqc,
+			 tag_matching_topology_context.hw_phase_cnt);
+	out->tm_sw_phase_cnt =
+		MLX5_GET(xrqc, xrqc,
+			 tag_matching_topology_context.sw_phase_cnt);
+
+out:
+	kvfree(xrq_out);
+	return err;
+}
+
 static int create_srq_split(struct mlx5_core_dev *dev,
 			    struct mlx5_core_srq *srq,
 			    struct mlx5_srq_attr *in)
 {
 	if (!dev->issi)
 		return create_srq_cmd(dev, srq, in);
-	else if (srq->common.res == MLX5_RES_XSRQ)
+	switch (srq->common.res) {
+	case MLX5_RES_XSRQ:
 		return create_xrc_srq_cmd(dev, srq, in);
-	else
+	case MLX5_RES_XRQ:
+		return create_xrq_cmd(dev, srq, in);
+	default:
 		return create_rmp_cmd(dev, srq, in);
+	}
 }

 static int destroy_srq_split(struct mlx5_core_dev *dev,
@@ -452,10 +564,14 @@ static int destroy_srq_split(struct mlx5_core_dev *dev,
 {
 	if (!dev->issi)
 		return destroy_srq_cmd(dev, srq);
-	else if (srq->common.res == MLX5_RES_XSRQ)
+	switch (srq->common.res) {
+	case MLX5_RES_XSRQ:
 		return destroy_xrc_srq_cmd(dev, srq);
-	else
+	case MLX5_RES_XRQ:
+		return destroy_xrq_cmd(dev, srq);
+	default:
 		return destroy_rmp_cmd(dev, srq);
+	}
 }

 int mlx5_core_create_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
@@ -464,10 +580,16 @@ int mlx5_core_create_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
 	int err;
 	struct mlx5_srq_table *table = &dev->priv.srq_table;

-	if (in->type == IB_SRQT_XRC)
+	switch (in->type) {
+	case IB_SRQT_XRC:
 		srq->common.res = MLX5_RES_XSRQ;
-	else
+		break;
+	case IB_SRQT_TAG_MATCHING:
+		srq->common.res = MLX5_RES_XRQ;
+		break;
+	default:
 		srq->common.res = MLX5_RES_SRQ;
+	}

 	err = create_srq_split(dev, srq, in);
 	if (err)
@@ -528,10 +650,14 @@ int mlx5_core_query_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
 {
 	if (!dev->issi)
 		return query_srq_cmd(dev, srq, out);
-	else if (srq->common.res == MLX5_RES_XSRQ)
+	switch (srq->common.res) {
+	case MLX5_RES_XSRQ:
 		return query_xrc_srq_cmd(dev, srq, out);
-	else
+	case MLX5_RES_XRQ:
+		return query_xrq_cmd(dev, srq, out);
+	default:
 		return query_rmp_cmd(dev, srq, out);
+	}
 }
 EXPORT_SYMBOL(mlx5_core_query_srq);

@@ -540,10 +666,14 @@ int mlx5_core_arm_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
 {
 	if (!dev->issi)
 		return arm_srq_cmd(dev, srq, lwm, is_srq);
-	else if (srq->common.res == MLX5_RES_XSRQ)
+	switch (srq->common.res) {
+	case MLX5_RES_XSRQ:
 		return arm_xrc_srq_cmd(dev, srq, lwm);
-	else
+	case MLX5_RES_XRQ:
+		return arm_xrq_cmd(dev, srq, lwm);
+	default:
 		return arm_rmp_cmd(dev, srq, lwm);
+	}
 }
 EXPORT_SYMBOL(mlx5_core_arm_srq);

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 5cb9fa7..c158db0 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -368,6 +368,7 @@ enum mlx5_res_type {
 	MLX5_RES_SQ	= MLX5_EVENT_QUEUE_TYPE_SQ,
 	MLX5_RES_SRQ	= 3,
 	MLX5_RES_XSRQ	= 4,
+	MLX5_RES_XRQ	= 5,
 };

 struct mlx5_core_rsc_common {
diff --git a/include/linux/mlx5/srq.h b/include/linux/mlx5/srq.h
index 33c97dc..9905353 100644
--- a/include/linux/mlx5/srq.h
+++ b/include/linux/mlx5/srq.h
@@ -38,6 +38,7 @@
 enum {
 	MLX5_SRQ_FLAG_ERR    = (1 << 0),
 	MLX5_SRQ_FLAG_WQ_SIG = (1 << 1),
+	MLX5_SRQ_FLAG_RNDV   = (1 << 2),
 };

 struct mlx5_srq_attr {
@@ -56,6 +57,10 @@ struct mlx5_srq_attr {
 	u32 user_index;
 	u64 db_record;
 	u64 *pas;
+	u32 tm_log_list_size;
+	u32 tm_next_tag;
+	u32 tm_hw_phase_cnt;
+	u32 tm_sw_phase_cnt;
 };

 struct mlx5_core_dev;
--
2.7.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH rdma-next 10/10] IB/mlx5: Support IB_SRQT_TAG_MATCHING
       [not found] ` <1472382050-25908-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (8 preceding siblings ...)
  2016-08-28 11:00   ` [PATCH rdma-next 09/10] net/mlx5: Add XRQ support Leon Romanovsky
@ 2016-08-28 11:00   ` Leon Romanovsky
  2016-10-07 14:56   ` [PATCH rdma-next 00/10] Hardware tag matching support Leon Romanovsky
  10 siblings, 0 replies; 23+ messages in thread
From: Leon Romanovsky @ 2016-08-28 11:00 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Artemy Kovalyov

From: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

When an SRQ is created with IB_SRQT_TAG_MATCHING, pass the CQ, the
list_size and the flag that enables rendezvous offload to mlx5_core.

Signed-off-by: Artemy Kovalyov <artemyko-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
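Note (not part of the commit message): a minimal sketch of the
list_size conversion done below with ilog2(). The helper names are
hypothetical. The point it illustrates is that a floor log2 silently
rounds a non-power-of-two list_size down; whether callers are required
to pass a power of two is an assumption, since the patch does not say.

```c
#include <stdint.h>

/* Floor of log2(v) for v > 0, the value the kernel's ilog2() yields. */
static inline unsigned int ilog2_sketch(uint32_t v)
{
	unsigned int log = 0;

	while (v >>= 1)
		log++;
	return log;
}

/* Nonzero when v has exactly one bit set. */
static inline int is_power_of_two_sketch(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Size the hardware would actually get after the log2 round trip. */
static inline uint32_t effective_list_size(uint32_t requested)
{
	return (uint32_t)1 << ilog2_sketch(requested);
}
```

For a requested list_size of 100 the round trip yields 64, so a check
like is_power_of_two_sketch() before the conversion may be worth
considering.
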
 drivers/infiniband/hw/mlx5/srq.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c
index ed6ac52..82fe622 100644
--- a/drivers/infiniband/hw/mlx5/srq.c
+++ b/drivers/infiniband/hw/mlx5/srq.c
@@ -102,7 +102,7 @@ static int create_srq_user(struct ib_pd *pd, struct mlx5_ib_srq *srq,
 				 udata->inlen - sizeof(ucmd)))
 		return -EINVAL;
 
-	if (in->type == IB_SRQT_XRC) {
+	if (in->type != IB_SRQT_BASIC) {
 		err = get_srq_user_index(to_mucontext(pd->uobject->context),
 					 &ucmd, udata->inlen, &uidx);
 		if (err)
@@ -146,7 +146,7 @@ static int create_srq_user(struct ib_pd *pd, struct mlx5_ib_srq *srq,
 	in->log_page_size = page_shift - MLX5_ADAPTER_PAGE_SHIFT;
 	in->page_offset = offset;
 	if (MLX5_CAP_GEN(dev->mdev, cqe_version) == MLX5_CQE_VERSION_V1 &&
-	    in->type == IB_SRQT_XRC)
+	    in->type != IB_SRQT_BASIC)
 		in->user_index = uidx;
 
 	return 0;
@@ -213,7 +213,7 @@ static int create_srq_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_srq *srq,
 
 	in->log_page_size = page_shift - MLX5_ADAPTER_PAGE_SHIFT;
 	if (MLX5_CAP_GEN(dev->mdev, cqe_version) == MLX5_CQE_VERSION_V1 &&
-	    in->type == IB_SRQT_XRC)
+	    in->type != IB_SRQT_BASIC)
 		in->user_index = MLX5_IB_DEFAULT_UIDX;
 
 	return 0;
@@ -284,6 +284,8 @@ struct ib_srq *mlx5_ib_create_srq(struct ib_pd *pd,
 		    desc_size, init_attr->attr.max_wr, srq->msrq.max, srq->msrq.max_gs,
 		    srq->msrq.max_avail_gather);
 
+	in.type = init_attr->srq_type;
+
 	if (pd->uobject)
 		err = create_srq_user(pd, srq, &in, udata, buf_size);
 	else
@@ -295,7 +297,6 @@ struct ib_srq *mlx5_ib_create_srq(struct ib_pd *pd,
 		goto err_srq;
 	}
 
-	in.type = init_attr->srq_type;
 	in.log_size = ilog2(srq->msrq.max);
 	in.wqe_shift = srq->msrq.wqe_shift - 4;
 	if (srq->wq_sig)
@@ -306,6 +307,12 @@ struct ib_srq *mlx5_ib_create_srq(struct ib_pd *pd,
 	} else if (init_attr->srq_type == IB_SRQT_BASIC) {
 		in.xrcd = to_mxrcd(dev->devr.x0)->xrcdn;
 		in.cqn = to_mcq(dev->devr.c0)->mcq.cqn;
+	} else if (init_attr->srq_type == IB_SRQT_TAG_MATCHING) {
+		in.xrcd = to_mxrcd(dev->devr.x0)->xrcdn;
+		in.cqn = to_mcq(init_attr->ext.tag_matching.cq)->mcq.cqn;
+		in.tm_log_list_size =
+			ilog2(init_attr->ext.tag_matching.list_size);
+		in.flags |= MLX5_SRQ_FLAG_RNDV;
 	}
 
 	in.pd = to_mpd(pd)->pdn;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next 00/10] Hardware tag matching support
       [not found] ` <1472382050-25908-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (9 preceding siblings ...)
  2016-08-28 11:00   ` [PATCH rdma-next 10/10] IB/mlx5: Support IB_SRQT_TAG_MATCHING Leon Romanovsky
@ 2016-10-07 14:56   ` Leon Romanovsky
       [not found]     ` <20161007145620.GV9282-2ukJVAZIZ/Y@public.gmane.org>
  10 siblings, 1 reply; 23+ messages in thread
From: Leon Romanovsky @ 2016-10-07 14:56 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 5872 bytes --]

On Sun, Aug 28, 2016 at 02:00:40PM +0300, Leon Romanovsky wrote:
> Message Passing Interface (MPI) is a communication protocol that is
> widely used for exchange of messages among processes in high-performance
> computing (HPC) systems. Messages sent from a sending process to a
> destination process are marked with an identifying label, referred to as
> a tag. Destination processes post buffers in local memory that are
> similarly marked with tags. When a message is received by the receiver
> (i.e., the host computer on which the destination process is running),
> the message is stored in a buffer whose tag matches the message tag. The
> process of finding a buffer with a matching tag for the received packet
> is called tag matching.
>
> There are two protocols that are generally used to send messages over
> MPI: The "Eager Protocol" is best suited to small messages that are
> simply sent to the destination process and received in an appropriate
> matching buffer. The "Rendezvous Protocol" is better suited to large
> messages. In Rendezvous, when the sender process has a large message to
> send, it first sends a small message to the destination process
> announcing its intention to send the large message. This small message
> is referred to as an RTS (ready to send) message. The RTS includes the
> message tag and buffer address in the sender. The destination process
> matches the RTS to a posted receive buffer, or posts such a buffer if
> one does not already exist. Once a matching receive buffer has been
> posted at the destination process side, the receiver initiates a remote
> direct memory access (RDMA) read request to read the data from the
> buffer address listed by the sender in the RTS message.
>
> MPI tag matching, when performed in software by a host processor, can
> consume substantial host resources, thus detracting from the performance
> of the actual software applications that are using MPI for
> communications. One possible solution is to offload the entire tag
> matching process to a peripheral hardware device, such as a network
> interface controller (NIC). In this case, the software application using
> MPI will post a set of buffers in a memory of the host processor and
> will pass the entire list of tags associated with the buffers to the
> NIC. In large-scale networks, however, the NIC may be required to
> simultaneously support many communicating processes and contexts
> (referred to in MPI parlance as "ranks" and "communicators,"
> respectively). NIC access to and matching of the large lists of tags
> involved in such a scenario can itself become a bottleneck. The NIC must
> also be able to handle "unexpected" traffic, for which buffers and tags
> have not yet been posted, which may also degrade performance.
>
> When the NIC receives a message over the network from one of the peer
> processes, and the message contains a label in accordance with the
> protocol, the NIC compares the label to the labels in the part of the
> list that was pushed to the NIC. Upon finding a match to the label, the
> NIC writes data conveyed in the message to the buffer in the memory that
> is associated with this label and submits a notification to the software
> process. The notification serves two purposes: both to indicate to the
> software process that the label has been consumed, so that the process
> will update the list of the labels posted to the NIC; and to inform the
> software process that the data are available in the buffer. In some
> cases (such as when the NIC retrieves the data from the remote node by
> RDMA), the NIC may submit two notifications, in the form of completion
> reports, of which the first informs the software process of the
> consumption of the label and the second announces availability of the
> data.
>
> This patch series adds to Mellanox ConnectX HCA driver support of
> tag matching. It introduces new hardware object eXtended shared Receive
> Queue (XRQ), which follows SRQ semantics with addition of extended
> receive buffers topologies and offloads. This series adds tag matching
> topology and rendezvous offload.
>
> Available in the "topic/xrq" topic branch of this git repo:
> git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git
>
> Or for browsing:
> https://git.kernel.org/cgit/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/xrq

Hi Doug,

For some reason, I don't see this patch set in your tree. Did I miss it?

Thanks

>
> Thanks,
>   Artemy & Leon
>
> Artemy Kovalyov (10):
>   IB/core: Add XRQ capabilities
>   IB/core: Make CQ separate part of SRQ context
>   IB/core: Add new SRQ type IB_SRQT_TAG_MATCHING
>   IB/uverbs: Expose tag matching capabilties to UAPI
>   IB/uverbs: Expose XRQ capabilities
>   IB/uverbs: Add XRQ creation parameter to UAPI
>   IB/uverbs: Add new SRQ type IB_SRQT_TAG_MATCHING
>   IB/mlx5: Fill XRQ capabilities
>   net/mlx5: Add XRQ support
>   IB/mlx5: Support IB_SRQT_TAG_MATCHING
>
>  drivers/infiniband/core/uverbs_cmd.c          |  31 +++++-
>  drivers/infiniband/core/verbs.c               |  16 +--
>  drivers/infiniband/hw/mlx5/main.c             |  21 +++-
>  drivers/infiniband/hw/mlx5/mlx5_ib.h          |   6 ++
>  drivers/infiniband/hw/mlx5/srq.c              |  15 ++-
>  drivers/net/ethernet/mellanox/mlx5/core/srq.c | 150 ++++++++++++++++++++++++--
>  include/linux/mlx5/driver.h                   |   1 +
>  include/linux/mlx5/srq.h                      |   5 +
>  include/rdma/ib_verbs.h                       |  61 +++++++++--
>  include/uapi/rdma/ib_user_verbs.h             |  36 ++++++-
>  10 files changed, 307 insertions(+), 35 deletions(-)
>
> --
> 2.7.4
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH rdma-next 00/10] Hardware tag matching support
       [not found]     ` <20161007145620.GV9282-2ukJVAZIZ/Y@public.gmane.org>
@ 2016-10-07 16:47       ` Hefty, Sean
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373AB093986-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Hefty, Sean @ 2016-10-07 16:47 UTC (permalink / raw)
  To: Leon Romanovsky, dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

> For some reason, I don't see this patch set in your tree. Did I miss
> it?
> 
> Thanks
> 
> >
> > Thanks,
> >   Artemy & Leon
> >
> > Artemy Kovalyov (10):
> >   IB/core: Add XRQ capabilities
> >   IB/core: Make CQ separate part of SRQ context
> >   IB/core: Add new SRQ type IB_SRQT_TAG_MATCHING
> >   IB/uverbs: Expose tag matching capabilties to UAPI
> >   IB/uverbs: Expose XRQ capabilities
> >   IB/uverbs: Add XRQ creation parameter to UAPI
> >   IB/uverbs: Add new SRQ type IB_SRQT_TAG_MATCHING
> >   IB/mlx5: Fill XRQ capabilities
> >   net/mlx5: Add XRQ support
> >   IB/mlx5: Support IB_SRQT_TAG_MATCHING

I was out when these were submitted, so this may have been answered.  These patches change the uABI.  Have the changes been vetted out by the IBTA or some other standards (I'm using that term loosely) organization?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next 00/10] Hardware tag matching support
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373AB093986-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2016-10-13 14:15           ` Doug Ledford
       [not found]             ` <6259953b-27fe-77c9-ea90-af744f188671-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Doug Ledford @ 2016-10-13 14:15 UTC (permalink / raw)
  To: Hefty, Sean, Leon Romanovsky; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA


[-- Attachment #1.1: Type: text/plain, Size: 1441 bytes --]

On 10/7/2016 12:47 PM, Hefty, Sean wrote:
>> For some reason, I don't see this patch set in your tree. Did I miss
>> it?
>>
>> Thanks
>>
>>>
>>> Thanks,
>>>   Artemy & Leon
>>>
>>> Artemy Kovalyov (10):
>>>   IB/core: Add XRQ capabilities
>>>   IB/core: Make CQ separate part of SRQ context
>>>   IB/core: Add new SRQ type IB_SRQT_TAG_MATCHING
>>>   IB/uverbs: Expose tag matching capabilties to UAPI
>>>   IB/uverbs: Expose XRQ capabilities
>>>   IB/uverbs: Add XRQ creation parameter to UAPI
>>>   IB/uverbs: Add new SRQ type IB_SRQT_TAG_MATCHING
>>>   IB/mlx5: Fill XRQ capabilities
>>>   net/mlx5: Add XRQ support
>>>   IB/mlx5: Support IB_SRQT_TAG_MATCHING
> 
> I was out when these were submitted, so this may have been answered.  These patches change the uABI.  Have the changes been vetted out by the IBTA or some other standards (I'm using that term loosely) organization?
> 

I doubt they have, but they change the kernel to user library API.  I'm
not so worried about that since it's entirely under our control.  It's
the user library to application API that I would be more concerned with.

As these stand, they look fairly clean to me.  Minimal in disruption and
I see no reason other vendors couldn't use this to do tag matching on
their own hardware if they wanted.  I'm inclined to take them.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH rdma-next 00/10] Hardware tag matching support
       [not found]             ` <6259953b-27fe-77c9-ea90-af744f188671-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-10-13 17:02               ` Hefty, Sean
       [not found]                 ` <1828884A29C6694DAF28B7E6B8A82373AB095429-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Hefty, Sean @ 2016-10-13 17:02 UTC (permalink / raw)
  To: Doug Ledford, Leon Romanovsky; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

> I doubt they have, but they change the kernel to user library API.  I'm
> not so worried about that since it's entirely under our control.  It's
> the user library to application API that I would be more concerned
> with.

I thought there was an agreement that since verbs implies a hardware implementation, then all kernel to user ABI changes to verbs would go through some external organization, such as the OFVWG, for discussion and approval.  Has that changed? 

- Sean

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next 00/10] Hardware tag matching support
       [not found]                 ` <1828884A29C6694DAF28B7E6B8A82373AB095429-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2016-10-13 17:06                   ` Christoph Hellwig
       [not found]                     ` <20161013170641.GA9094-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2016-10-13 17:06 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Doug Ledford, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, Oct 13, 2016 at 05:02:13PM +0000, Hefty, Sean wrote:
> > I doubt they have, but they change the kernel to user library API.  I'm
> > not so worried about that since it's entirely under our control.  It's
> > the user library to application API that I would be more concerned
> > with.
> 
> I thought there was an agreement that since verbs implies a hardware implementation, then all kernel to user ABI changes to verbs would go through some external organization, such as the OFVWG, for discussion and approval.  Has that changed? 

What makes OFVWG more qualified to review an ABI than linux-rdma?

That being said, a review by an independent third party in addition to
the vendors (intentional plural here!) is important, but I think we can
handle that just fine.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH rdma-next 00/10] Hardware tag matching support
       [not found]                     ` <20161013170641.GA9094-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2016-10-13 19:43                       ` Hefty, Sean
       [not found]                         ` <1828884A29C6694DAF28B7E6B8A82373AB095647-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Hefty, Sean @ 2016-10-13 19:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Doug Ledford, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Jason Gunthorpe

> > > I doubt they have, but they change the kernel to user library API.
> I'm
> > > not so worried about that since it's entirely under our control.
> It's
> > > the user library to application API that I would be more concerned
> > > with.
> >
> > I thought there was an agreement that since verbs implies a hardware
> implementation, then all kernel to user ABI changes to verbs would go
> through some external organization, such as the OFVWG, for discussion
> and approval.  Has that changed?
> 
> What makes OFVWG more qualified to review an ABI than linux-rdma?

We've had this discussion on the list before.  The issue is that verbs isn't just a software API.  It defines a hardware implementation.  The consensus was that some 'standards' body needed to be able to review the proposed changes and their impact.  The OFVWG was selected as the best available option.  If we want to say no, and let the linux-rdma maintainer dictate the hardware interface exported by the kernel, then, yes, that's an option.

Now, if we want to kill the notion that there's some sort of common rdma/verbs interface that all devices will be able to use effectively, and replace it with vendor specific interfaces, I would completely go along with that.  Then let user space figure out how to deal with the results.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next 00/10] Hardware tag matching support
       [not found]                         ` <1828884A29C6694DAF28B7E6B8A82373AB095647-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2016-10-13 19:56                           ` Jason Gunthorpe
       [not found]                             ` <20161013195605.GA8077-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2016-10-13 19:56 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Christoph Hellwig, Doug Ledford, Leon Romanovsky,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, Oct 13, 2016 at 07:43:25PM +0000, Hefty, Sean wrote:

> > What makes OFVWG more qualified to review an ABI than linux-rdma?
> 
> We've had this discussion on the list before.  The issue is that
> verbs isn't just a software API.  It defines a hardware
> implementation.  The consensus was that some 'standards' body needed
> to be able to review the proposed changes and their impact.  The OFVWG
> was selected as the best available option.  If we want to say no,
> and let the linux-rdma maintainer dictate the hardware interface
> exported by the kernel, then, yes, that's an option.

A bigger objection I have is that much of this new stuff is not
adequately documented.

Tag matching is a great example. It looks like this introduces many new
headers into the on-wire protocol. Where are these headers documented?
Where is the spec for the on-wire format? Where is the implementation
for rxe?

Basically, is there enough public information that another vendor
could implement this wire protocol? (if not, why are we even
*contemplating* merging something like this???)

'Hopefully' the answer is that this information is all hidden away in
the IBTA... which opens up an entire new line of problems: reviewing
something like this without seeing the agreed spec.

The basic problem with linux-rdma is that it is not a reasonable place
to review a new on-wire protocol amendment to the IBA. Nor is ofvwg
for that matter.

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next 00/10] Hardware tag matching support
       [not found]                             ` <20161013195605.GA8077-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-10-13 20:02                               ` Christoph Hellwig
       [not found]                                 ` <20161013200208.GA8998-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2016-10-13 20:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Hefty, Sean, Christoph Hellwig, Doug Ledford, Leon Romanovsky,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, Oct 13, 2016 at 01:56:05PM -0600, Jason Gunthorpe wrote:
> A bigger objection I have is that much of this new stuff is not
> adequately documented.
> 
> Tag matching is a great example. It looks like this introduces many new
> headers into the on-wire protocol. Where are these headers documented?
> Where is the spec for the on-wire format? Where is the implementation
> for rxe?

Agreed on that part.  For on-the-wire changes we need the hardware
standards organization to act: IBTA for IB and RoCE, and IETF for iWarp,
although not even having an active WG for iWarp there is a major problem
at the moment.

* Re: [PATCH rdma-next 00/10] Hardware tag matching support
From: Doug Ledford @ 2016-10-13 20:30 UTC (permalink / raw)
  To: Christoph Hellwig, Jason Gunthorpe
  Cc: Hefty, Sean, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA


On 10/13/2016 4:02 PM, Christoph Hellwig wrote:
> On Thu, Oct 13, 2016 at 01:56:05PM -0600, Jason Gunthorpe wrote:
>> A bigger objection I have is that much of this new stuff is not
>> adequately documented.
>>
>> Tag matching is a great example. It looks like this introduces many new
>> headers into the on-wire protocol. Where are these headers documented?
>> Where is the spec for the on-wire format? Where is the implementation
>> for rxe?
> 
> Agreed on that part.  For on-the-wire changes we need the hardware
> standards organization to act: IBTA for IB and RoCE, and IETF for iWarp,
> although not even having an active WG for iWarp there is a major problem
> at the moment.
> 

I think there are some fundamental questions here about what this is.
As I understand it (and Mellanox can chime in if I'm wrong), tag
matching already exists on the wire, it's just implemented above the
level of the link layer transmissions.  This enables a hardware
accelerator for something that is already done at an upper layer, so the
implementation is mostly defined by the de facto existing use in upper
layer applications, not by the IBTA.  I believe this could roughly be
considered analogous to a network driver enabling a layer3 or layer4
accelerator.  It's not a layer1 thing, so the IBTA is not appropriate to
review it.
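
To make "already done at an upper layer" concrete, here is a minimal
sketch of software tag matching of the kind an MPI-style library might
perform before any hardware offload. It is not taken from this patch set
or from any real MPI implementation; the names tm_queue, tm_post and
tm_match, the fixed queue depth, and the tag/mask convention are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TM_MAX_POSTED 16	/* hypothetical queue depth */

/* One posted receive: a buffer plus the tag pattern it accepts. */
struct tm_recv {
	uint64_t tag;
	uint64_t mask;	/* 1 bits must match; 0 bits are wildcards */
	void *buf;
};

/* Posted receives, kept in posting order. */
struct tm_queue {
	struct tm_recv posted[TM_MAX_POSTED];
	size_t count;
};

static void tm_init(struct tm_queue *q)
{
	assert(q != NULL);
	q->count = 0;
}

/* Append a receive buffer; returns 0 on success, -1 if the queue is full. */
static int tm_post(struct tm_queue *q, uint64_t tag, uint64_t mask, void *buf)
{
	assert(q != NULL);
	if (q->count == TM_MAX_POSTED)
		return -1;
	q->posted[q->count].tag = tag;
	q->posted[q->count].mask = mask;
	q->posted[q->count].buf = buf;
	q->count++;
	return 0;
}

/*
 * Find the oldest posted receive whose masked tag equals the message tag,
 * remove it from the queue and return its buffer. NULL means no match,
 * i.e. the message is "unexpected" and the caller has to stash it.
 */
static void *tm_match(struct tm_queue *q, uint64_t msg_tag)
{
	assert(q != NULL);
	for (size_t i = 0; i < q->count; i++) {
		struct tm_recv *r = &q->posted[i];

		if (((r->tag ^ msg_tag) & r->mask) == 0) {
			void *buf = r->buf;

			/* Close the gap while preserving posting order. */
			for (size_t j = i + 1; j < q->count; j++)
				q->posted[j - 1] = q->posted[j];
			q->count--;
			return buf;
		}
	}
	return NULL;
}
```

A hardware accelerator would, roughly speaking, take over the search in
tm_match; which tags, masks and headers it sees on the wire is exactly
the part this sketch leaves open.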

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD


* Re: [PATCH rdma-next 00/10] Hardware tag matching support
From: Jason Gunthorpe @ 2016-10-13 20:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Christoph Hellwig, Hefty, Sean, Leon Romanovsky,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, Oct 13, 2016 at 04:30:10PM -0400, Doug Ledford wrote:
> On 10/13/2016 4:02 PM, Christoph Hellwig wrote:
> > On Thu, Oct 13, 2016 at 01:56:05PM -0600, Jason Gunthorpe wrote:
> >> A bigger objection I have is that much of this new stuff is not
> >> adequately documented.
> >>
> >> Tag matching is a great example. It looks like this introduces many new
> >> headers into the on-wire protocol. Where are these headers documented?
> >> Where is the spec for the on-wire format? Where is the implementation
> >> for rxe?
> > 
> > Agreed on that part.  For on-the-wire changes we need the hardware
> > standards organization to act: IBTA for IB and RoCE, and IETF for iWarp,
> > although not even having an active WG for iWarp there is a major problem
> > at the moment.
> > 
> 
> I think there are some fundamental questions here about what this is.
> As I understand it (and Mellanox can chime in if I'm wrong), tag
> matching already exists on the wire, it's just implemented above the
> level of the link layer transmissions.  This enables a hardware
> accelerator for something that is already done at an upper layer, so the
> implementation is mostly defined by the de facto existing use in upper
> layer applications, not by the IBTA.  I believe this could roughly be
> considered analogous to a network driver enabling a layer3 or layer4
> accelerator.  It's not a layer1 thing, so the IBTA is not appropriate to
> review it.

Maybe, but that doesn't change my original point. This is clearly not
defined well enough in the patches alone for any other vendor to
implement it.

If we can't even reach a basic threshold of understanding it is
pointless to expect other vendors to chime in on the design.

I'm starting to wonder if we should insist on an implementation of all
this new stuff in rxe before accepting. At least then we have code as
documentation ...

Jason

* RE: [PATCH rdma-next 00/10] Hardware tag matching support
From: Hefty, Sean @ 2016-10-13 21:23 UTC (permalink / raw)
  To: Doug Ledford, Christoph Hellwig, Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA

> I think there are some fundamental questions here about what this is.
> As I understand it (and Mellanox can chime in if I'm wrong), tag
> matching already exists on the wire, it's just implemented above the
> level of the link layer transmissions.  This enables a hardware
> accelerator for something that is already done at an upper layer, so the
> implementation is mostly defined by the de facto existing use in upper
> layer applications, not by the IBTA.  I believe this could roughly be

MPIs definitely make use of this sort of protocol, but it is an application level protocol that is non-standard, even between MPI implementations.  There's no interoperability between them.

The Intel True Scale and OPA implementations support tag matching, with HW offload support, but those too are proprietary (i.e. PSM).  In the case of PSM, I believe the tag matching is part of the transport protocol.  There's a decent chance the Mellanox implementation is the same.

> considered analogous to a network driver enabling a layer3 or layer4
> accelerator.  It's not a layer1 thing, so the IBTA is not appropriate to
> review it.

The IBTA defines layers 1-4, plus the verbs interfaces.  So, it _could_ be an IBTA issue.  However, the IBTA allows for vendor-specific features (e.g. PSM), so it could be a vendor feature.

IMO, this is a feature worth supporting.  My concern is over how it is exposed.  There's a lot of variety in how tag matching can be implemented.

- Sean

* Re: [PATCH rdma-next 00/10] Hardware tag matching support
From: Leon Romanovsky @ 2016-10-14  4:19 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Christoph Hellwig, Jason Gunthorpe, Hefty, Sean,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, Oct 13, 2016 at 04:30:10PM -0400, Doug Ledford wrote:
> On 10/13/2016 4:02 PM, Christoph Hellwig wrote:
> > On Thu, Oct 13, 2016 at 01:56:05PM -0600, Jason Gunthorpe wrote:
> >> A bigger objection I have is that much of this new stuff is not
> >> adequately documented.
> >>
> >> Tag matching is a great example. It looks like this introduces many new
> >> headers into the on-wire protocol. Where are these headers documented?
> >> Where is the spec for the on-wire format? Where is the implementation
> >> for rxe?
> >
> > Agreed on that part.  For on-the-wire changes we need the hardware
> > standards organization to act: IBTA for IB and RoCE, and IETF for iWarp,
> > although not even having an active WG for iWarp there is a major problem
> > at the moment.
> >
>
> I think there are some fundamental questions here about what this is.
> As I understand it (and Mellanox can chime in if I'm wrong), tag
> matching already exists on the wire, it's just implemented above the
> level of the link layer transmissions.  This enables a hardware
> accelerator for something that is already done at an upper layer, so the
> implementation is mostly defined by the de facto existing use in upper
> layer applications, not by the IBTA.  I believe this could roughly be
> considered analogous to a network driver enabling a layer3 or layer4
> accelerator.  It's not a layer1 thing, so the IBTA is not appropriate to
> review it.

Yes, you are right. It is in use by most if not all major HPC players,
and in this patch set we are providing the ability to configure it and
to expose the supported flavours of tag matching, so they will be able
to run RHEL out of the box. There are no HW limitations on how other
vendors (let's pretend that they exist) will implement such a feature.

As Sean mentioned in the other email, PSM has it, and libfabric natively
supports this functionality through the MXM provider.

Regarding documentation, we are continuously improving our submissions
and will definitely pay attention to that point.

Thanks

>
> --
> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>     GPG Key ID: 0E572FDD
>

