All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9] SIF related verbs patches
@ 2016-09-02  0:09 Knut Omang
       [not found] ` <1472774969-18997-1-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Knut Omang @ 2016-09-02  0:09 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Knut Omang

This patch set contains the changes and bug fixes necessary
to support Oracle's new Infiniband HCA (SIF) from the kernel side.

The exception is patch 2 which is a useful consolidation and simplification
made possible by patch 1 and the replacement of the dma_attr type with
a plain bitmask.

The changes to other rdma drivers are just trivial consequences of
the extended umem_get and create_ah calls and should hopefully be
uncontroversial.

Patches 4, 6 and 7 are related to corresponding libibverbs patches.

Patch 8 reserves a bit in ib_qp_create_flags for indicating to
the driver that the QP is going to be used for Ethernet over IB.

The patch set is based on commit 64278fe89b729dddb8dbe5ea52220685f6103d1b
in Doug's k.o/for-4.9 branch.

Dag Moxnes (1):
  ib_mad: incoming sminfo SMPs gets discarded if no process_mad function
    is registered

Knut Omang (8):
  ib_umem: Add a new, more generic ib_umem_get_attrs
  ib_umem: With the new ib_umem_get_attrs, simplify ib_umem_get
  ib: Add udata argument to create_ah
  ib_uverbs: Add padding to end align ib_uverbs_reg_mr_resp
  ib_uverbs: Avoid vendor specific masking of attributes in query_qp
  ib_{uverbs/core}: add new ib_create_qp_ex with udata arg
  ib_uverbs: Support for kernel implementation of XRC
  ib_verbs: Add a new qp create flag to request features for Ethernet
    over IB

 drivers/infiniband/core/core_priv.h          |  3 ++
 drivers/infiniband/core/mad.c                |  6 +++
 drivers/infiniband/core/smi.h                |  6 +--
 drivers/infiniband/core/umem.c               | 23 +++++----
 drivers/infiniband/core/uverbs_cmd.c         | 71 +++++++++++++++++-----------
 drivers/infiniband/core/verbs.c              | 15 ++++--
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |  5 +-
 drivers/infiniband/hw/cxgb4/mem.c            |  2 +-
 drivers/infiniband/hw/cxgb4/provider.c       |  3 +-
 drivers/infiniband/hw/i40iw/i40iw_verbs.c    |  2 +-
 drivers/infiniband/hw/mlx4/ah.c              |  3 +-
 drivers/infiniband/hw/mlx4/cq.c              |  4 +-
 drivers/infiniband/hw/mlx4/doorbell.c        |  2 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h         |  3 +-
 drivers/infiniband/hw/mlx4/mr.c              |  5 +-
 drivers/infiniband/hw/mlx4/qp.c              | 21 +++++++-
 drivers/infiniband/hw/mlx4/srq.c             |  2 +-
 drivers/infiniband/hw/mlx5/ah.c              |  3 +-
 drivers/infiniband/hw/mlx5/cq.c              | 12 +++--
 drivers/infiniband/hw/mlx5/doorbell.c        |  2 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h         |  3 +-
 drivers/infiniband/hw/mlx5/mr.c              |  2 +-
 drivers/infiniband/hw/mlx5/qp.c              | 21 +++++++-
 drivers/infiniband/hw/mlx5/srq.c             |  2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c | 11 +++--
 drivers/infiniband/hw/nes/nes_verbs.c        |  5 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c     |  5 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h     |  3 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |  2 +-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c |  3 +-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h |  3 +-
 drivers/infiniband/sw/rdmavt/mr.c            |  2 +-
 drivers/infiniband/sw/rxe/rxe_mr.c           |  2 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c        |  3 +-
 include/rdma/ib_umem.h                       | 19 ++++++--
 include/rdma/ib_verbs.h                      | 16 ++++++-
 include/uapi/rdma/ib_user_verbs.h            |  3 ++
 37 files changed, 208 insertions(+), 90 deletions(-)

--
2.5.5
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/9] ib_mad: incoming sminfo SMPs gets discarded if no process_mad function is registered
       [not found] ` <1472774969-18997-1-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2016-09-02  0:09   ` Knut Omang
       [not found]     ` <1472774969-18997-2-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2016-09-02  0:09   ` [PATCH 2/9] ib_umem: Add a new, more generic ib_umem_get_attrs Knut Omang
                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Knut Omang @ 2016-09-02  0:09 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Dag Moxnes, Knut Omang

From: Dag Moxnes <dag.moxnes-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

The process_mad function is an optional IB driver entry point
allows a driver to intercept or modify MAD traffic.

This fix allows MAD traffic to flow down to the device also
when MAD traffic is completely handled by the device and
no process_mad function is provided.

Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/core/mad.c | 6 ++++++
 drivers/infiniband/core/smi.h | 6 ++----
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 2d49228..ece33ec 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -813,6 +813,12 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 			goto out;
 	}
 
+	/* If device does not define the optional process_mad function it means that
+	 * everything is handled in hardware:
+	 */
+	if (!device->process_mad)
+		goto out;
+
 	local = kmalloc(sizeof *local, GFP_ATOMIC);
 	if (!local) {
 		ret = -ENOMEM;
diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h
index 33c91c8..16a9f9a 100644
--- a/drivers/infiniband/core/smi.h
+++ b/drivers/infiniband/core/smi.h
@@ -67,8 +67,7 @@ static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
 {
 	/* C14-9:3 -- We're at the end of the DR segment of path */
 	/* C14-9:4 -- Hop Pointer = Hop Count + 1 -> give to SMA/SM */
-	return ((device->process_mad &&
-		!ib_get_smp_direction(smp) &&
+	return ((!ib_get_smp_direction(smp) &&
 		(smp->hop_ptr == smp->hop_cnt + 1)) ?
 		IB_SMI_HANDLE : IB_SMI_DISCARD);
 }
@@ -82,8 +81,7 @@ static inline enum smi_action smi_check_local_returning_smp(struct ib_smp *smp,
 {
 	/* C14-13:3 -- We're at the end of the DR segment of path */
 	/* C14-13:4 -- Hop Pointer == 0 -> give to SM */
-	return ((device->process_mad &&
-		ib_get_smp_direction(smp) &&
+	return ((ib_get_smp_direction(smp) &&
 		!smp->hop_ptr) ? IB_SMI_HANDLE : IB_SMI_DISCARD);
 }
 
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/9] ib_umem: Add a new, more generic ib_umem_get_attrs
       [not found] ` <1472774969-18997-1-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2016-09-02  0:09   ` [PATCH 1/9] ib_mad: incoming sminfo SMPs gets discarded if no process_mad function is registered Knut Omang
@ 2016-09-02  0:09   ` Knut Omang
  2016-09-02  0:09   ` [PATCH 3/9] ib_umem: With the new ib_umem_get_attrs, simplify ib_umem_get Knut Omang
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 39+ messages in thread
From: Knut Omang @ 2016-09-02  0:09 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Knut Omang

This call allows a full range of DMA attributes and also
DMA direction to be supplied and is just a refactor of the old ib_umem_get.
Reimplement ib_umem_get using the new generic call,
now a trivial implementation.

Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/core/umem.c | 23 +++++++++++++++--------
 include/rdma/ib_umem.h         | 15 ++++++++++++++-
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index c68746c..699a0f7 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -52,7 +52,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
 	if (umem->nmap > 0)
 		ib_dma_unmap_sg(dev, umem->sg_head.sgl,
 				umem->nmap,
-				DMA_BIDIRECTIONAL);
+				umem->dir);
 
 	for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {
 
@@ -82,6 +82,17 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
 struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 			    size_t size, int access, int dmasync)
 {
+	unsigned long dma_attrs = 0;
+	if (dmasync)
+		dma_attrs |= DMA_ATTR_WRITE_BARRIER;
+	return ib_umem_get_attrs(context, addr, size, access, DMA_BIDIRECTIONAL, dma_attrs);
+}
+EXPORT_SYMBOL(ib_umem_get);
+
+struct ib_umem *ib_umem_get_attrs(struct ib_ucontext *context, unsigned long addr,
+				  size_t size, int access, enum dma_data_direction dir,
+				  unsigned long dma_attrs)
+{
 	struct ib_umem *umem;
 	struct page **page_list;
 	struct vm_area_struct **vma_list;
@@ -91,16 +102,11 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	unsigned long npages;
 	int ret;
 	int i;
-	unsigned long dma_attrs = 0;
 	struct scatterlist *sg, *sg_list_start;
 	int need_release = 0;
 
-	if (dmasync)
-		dma_attrs |= DMA_ATTR_WRITE_BARRIER;
-
 	if (!size)
 		return ERR_PTR(-EINVAL);
-
 	/*
 	 * If the combination of the addr and size requested for this memory
 	 * region causes an integer overflow, return error.
@@ -121,6 +127,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	umem->address   = addr;
 	umem->page_size = PAGE_SIZE;
 	umem->pid       = get_task_pid(current, PIDTYPE_PID);
+	umem->dir	= dir;
 	/*
 	 * We ask for writable memory if any of the following
 	 * access flags are set.  "Local write" and "remote write"
@@ -213,7 +220,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	umem->nmap = ib_dma_map_sg_attrs(context->device,
 				  umem->sg_head.sgl,
 				  umem->npages,
-				  DMA_BIDIRECTIONAL,
+				  dir,
 				  dma_attrs);
 
 	if (umem->nmap <= 0) {
@@ -239,7 +246,7 @@ out:
 
 	return ret < 0 ? ERR_PTR(ret) : umem;
 }
-EXPORT_SYMBOL(ib_umem_get);
+EXPORT_SYMBOL(ib_umem_get_attrs);
 
 static void ib_umem_account(struct work_struct *work)
 {
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 2d83cfd..2876679 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -36,6 +36,7 @@
 #include <linux/list.h>
 #include <linux/scatterlist.h>
 #include <linux/workqueue.h>
+#include <linux/dma-direction.h>
 
 struct ib_ucontext;
 struct ib_umem_odp;
@@ -47,6 +48,7 @@ struct ib_umem {
 	int			page_size;
 	int                     writable;
 	int                     hugetlb;
+	enum dma_data_direction dir;
 	struct work_struct	work;
 	struct pid             *pid;
 	struct mm_struct       *mm;
@@ -81,9 +83,12 @@ static inline size_t ib_umem_num_pages(struct ib_umem *umem)
 }
 
 #ifdef CONFIG_INFINIBAND_USER_MEM
-
 struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 			    size_t size, int access, int dmasync);
+struct ib_umem *ib_umem_get_attrs(struct ib_ucontext *context, unsigned long addr,
+				  size_t size, int access,
+				  enum dma_data_direction dir,
+				  unsigned long dma_attrs);
 void ib_umem_release(struct ib_umem *umem);
 int ib_umem_page_count(struct ib_umem *umem);
 int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
@@ -98,6 +103,14 @@ static inline struct ib_umem *ib_umem_get(struct ib_ucontext *context,
 					  int access, int dmasync) {
 	return ERR_PTR(-EINVAL);
 }
+static inline struct ib_umem *ib_umem_get_attrs(struct ib_ucontext *context,
+						unsigned long addr,
+						size_t size, int access,
+						enum dma_data_direction dir,
+						unsigned long dma_attrs)
+{
+	return ERR_PTR(-EINVAL);
+}
 static inline void ib_umem_release(struct ib_umem *umem) { }
 static inline int ib_umem_page_count(struct ib_umem *umem) { return 0; }
 static inline int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 3/9] ib_umem: With the new ib_umem_get_attrs, simplify ib_umem_get
       [not found] ` <1472774969-18997-1-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2016-09-02  0:09   ` [PATCH 1/9] ib_mad: incoming sminfo SMPs gets discarded if no process_mad function is registered Knut Omang
  2016-09-02  0:09   ` [PATCH 2/9] ib_umem: Add a new, more generic ib_umem_get_attrs Knut Omang
@ 2016-09-02  0:09   ` Knut Omang
  2016-09-02  0:09   ` [PATCH 4/9] ib: Add udata argument to create_ah Knut Omang
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 39+ messages in thread
From: Knut Omang @ 2016-09-02  0:09 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Knut Omang

Instead of having most drivers provide a specialized
and weakly typed "boolean" argument
dmasync = 0 argument which are only set to nonzero 3 places,
instead let these special use cases use the
more generic ib_umem_get_attrs which gives access to
the full range of dma attributes and also dma direction,
and eliminate the dmasync parameter from the ib_umem_get call,

Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/core/umem.c               |  8 ++------
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |  2 +-
 drivers/infiniband/hw/cxgb4/mem.c            |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_verbs.c    |  2 +-
 drivers/infiniband/hw/mlx4/cq.c              |  4 ++--
 drivers/infiniband/hw/mlx4/doorbell.c        |  2 +-
 drivers/infiniband/hw/mlx4/mr.c              |  5 ++---
 drivers/infiniband/hw/mlx4/qp.c              |  2 +-
 drivers/infiniband/hw/mlx4/srq.c             |  2 +-
 drivers/infiniband/hw/mlx5/cq.c              | 12 +++++++-----
 drivers/infiniband/hw/mlx5/doorbell.c        |  2 +-
 drivers/infiniband/hw/mlx5/mr.c              |  2 +-
 drivers/infiniband/hw/mlx5/qp.c              |  4 ++--
 drivers/infiniband/hw/mlx5/srq.c             |  2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c |  8 ++++++--
 drivers/infiniband/hw/nes/nes_verbs.c        |  2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |  2 +-
 drivers/infiniband/sw/rdmavt/mr.c            |  2 +-
 drivers/infiniband/sw/rxe/rxe_mr.c           |  2 +-
 include/rdma/ib_umem.h                       |  4 ++--
 20 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 699a0f7..938c940 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -77,15 +77,11 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
  * @addr: userspace virtual address to start at
  * @size: length of region to pin
  * @access: IB_ACCESS_xxx flags for memory being pinned
- * @dmasync: flush in-flight DMA when the memory region is written
  */
 struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
-			    size_t size, int access, int dmasync)
+			    size_t size, int access)
 {
-	unsigned long dma_attrs = 0;
-	if (dmasync)
-		dma_attrs |= DMA_ATTR_WRITE_BARRIER;
-	return ib_umem_get_attrs(context, addr, size, access, DMA_BIDIRECTIONAL, dma_attrs);
+	return ib_umem_get_attrs(context, addr, size, access, DMA_BIDIRECTIONAL, 0);
 }
 EXPORT_SYMBOL(ib_umem_get);
 
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 3edb806..4225479 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -576,7 +576,7 @@ static struct ib_mr *iwch_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 
 	mhp->rhp = rhp;
 
-	mhp->umem = ib_umem_get(pd->uobject->context, start, length, acc, 0);
+	mhp->umem = ib_umem_get(pd->uobject->context, start, length, acc);
 	if (IS_ERR(mhp->umem)) {
 		err = PTR_ERR(mhp->umem);
 		kfree(mhp);
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 0b91b0f..7a30b552 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -509,7 +509,7 @@ struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 
 	mhp->rhp = rhp;
 
-	mhp->umem = ib_umem_get(pd->uobject->context, start, length, acc, 0);
+	mhp->umem = ib_umem_get(pd->uobject->context, start, length, acc);
 	if (IS_ERR(mhp->umem)) {
 		err = PTR_ERR(mhp->umem);
 		kfree_skb(mhp->dereg_skb);
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index 2360338..3cf1c23 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -1703,7 +1703,7 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd,
 
 	if (length > I40IW_MAX_MR_SIZE)
 		return ERR_PTR(-EINVAL);
-	region = ib_umem_get(pd->uobject->context, start, length, acc, 0);
+	region = ib_umem_get(pd->uobject->context, start, length, acc);
 	if (IS_ERR(region))
 		return (struct ib_mr *)region;
 
diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index d6fc8a6..8c3ef00 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -141,8 +141,8 @@ static int mlx4_ib_get_cq_umem(struct mlx4_ib_dev *dev, struct ib_ucontext *cont
 	int err;
 	int cqe_size = dev->dev->caps.cqe_size;
 
-	*umem = ib_umem_get(context, buf_addr, cqe * cqe_size,
-			    IB_ACCESS_LOCAL_WRITE, 1);
+	*umem = ib_umem_get_attrs(context, buf_addr, cqe * cqe_size,
+			IB_ACCESS_LOCAL_WRITE, DMA_BIDIRECTIONAL, DMA_ATTR_WRITE_BARRIER);
 	if (IS_ERR(*umem))
 		return PTR_ERR(*umem);
 
diff --git a/drivers/infiniband/hw/mlx4/doorbell.c b/drivers/infiniband/hw/mlx4/doorbell.c
index c517409..f87aa5f 100644
--- a/drivers/infiniband/hw/mlx4/doorbell.c
+++ b/drivers/infiniband/hw/mlx4/doorbell.c
@@ -62,7 +62,7 @@ int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, unsigned long virt,
 	page->user_virt = (virt & PAGE_MASK);
 	page->refcnt    = 0;
 	page->umem      = ib_umem_get(&context->ibucontext, virt & PAGE_MASK,
-				      PAGE_SIZE, 0, 0);
+				      PAGE_SIZE, 0);
 	if (IS_ERR(page->umem)) {
 		err = PTR_ERR(page->umem);
 		kfree(page);
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 5d73989..31ee0f7 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -148,7 +148,7 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	/* Force registering the memory as writable. */
 	/* Used for memory re-registeration. HCA protects the access */
 	mr->umem = ib_umem_get(pd->uobject->context, start, length,
-			       access_flags | IB_ACCESS_LOCAL_WRITE, 0);
+			       access_flags | IB_ACCESS_LOCAL_WRITE);
 	if (IS_ERR(mr->umem)) {
 		err = PTR_ERR(mr->umem);
 		goto err_free;
@@ -230,8 +230,7 @@ int mlx4_ib_rereg_user_mr(struct ib_mr *mr, int flags,
 		ib_umem_release(mmr->umem);
 		mmr->umem = ib_umem_get(mr->uobject->context, start, length,
 					mr_access_flags |
-					IB_ACCESS_LOCAL_WRITE,
-					0);
+					IB_ACCESS_LOCAL_WRITE);
 		if (IS_ERR(mmr->umem)) {
 			err = PTR_ERR(mmr->umem);
 			/* Prevent mlx4_ib_dereg_mr from free'ing invalid pointer */
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 768085f..363b1cb 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -742,7 +742,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 			goto err;
 
 		qp->umem = ib_umem_get(pd->uobject->context, ucmd.buf_addr,
-				       qp->buf_size, 0, 0);
+				       qp->buf_size, 0);
 		if (IS_ERR(qp->umem)) {
 			err = PTR_ERR(qp->umem);
 			goto err;
diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c
index 0597f3e..31c297b 100644
--- a/drivers/infiniband/hw/mlx4/srq.c
+++ b/drivers/infiniband/hw/mlx4/srq.c
@@ -115,7 +115,7 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd,
 		}
 
 		srq->umem = ib_umem_get(pd->uobject->context, ucmd.buf_addr,
-					buf_size, 0, 0);
+					buf_size, 0);
 		if (IS_ERR(srq->umem)) {
 			err = PTR_ERR(srq->umem);
 			goto err_srq;
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 35a9f71..c505f0b 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -776,9 +776,11 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 
 	*cqe_size = ucmd.cqe_size;
 
-	cq->buf.umem = ib_umem_get(context, ucmd.buf_addr,
-				   entries * ucmd.cqe_size,
-				   IB_ACCESS_LOCAL_WRITE, 1);
+	cq->buf.umem = ib_umem_get_attrs(context, ucmd.buf_addr,
+					 entries * ucmd.cqe_size,
+					 IB_ACCESS_LOCAL_WRITE,
+					 DMA_BIDIRECTIONAL,
+					 DMA_ATTR_WRITE_BARRIER);
 	if (IS_ERR(cq->buf.umem)) {
 		err = PTR_ERR(cq->buf.umem);
 		return err;
@@ -1137,8 +1139,8 @@ static int resize_user(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
 	if (ucmd.reserved0 || ucmd.reserved1)
 		return -EINVAL;
 
-	umem = ib_umem_get(context, ucmd.buf_addr, entries * ucmd.cqe_size,
-			   IB_ACCESS_LOCAL_WRITE, 1);
+	umem = ib_umem_get_attrs(context, ucmd.buf_addr, entries * ucmd.cqe_size,
+			IB_ACCESS_LOCAL_WRITE, DMA_BIDIRECTIONAL, DMA_ATTR_WRITE_BARRIER);
 	if (IS_ERR(umem)) {
 		err = PTR_ERR(umem);
 		return err;
diff --git a/drivers/infiniband/hw/mlx5/doorbell.c b/drivers/infiniband/hw/mlx5/doorbell.c
index a0e4e6d..b9f8158 100644
--- a/drivers/infiniband/hw/mlx5/doorbell.c
+++ b/drivers/infiniband/hw/mlx5/doorbell.c
@@ -64,7 +64,7 @@ int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context, unsigned long virt,
 	page->user_virt = (virt & PAGE_MASK);
 	page->refcnt    = 0;
 	page->umem      = ib_umem_get(&context->ibucontext, virt & PAGE_MASK,
-				      PAGE_SIZE, 0, 0);
+				      PAGE_SIZE, 0);
 	if (IS_ERR(page->umem)) {
 		err = PTR_ERR(page->umem);
 		kfree(page);
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 6f7e347..baca6e7 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -822,7 +822,7 @@ static struct ib_umem *mr_umem_get(struct ib_pd *pd, u64 start, u64 length,
 {
 	struct mlx5_ib_dev *dev = to_mdev(pd->device);
 	struct ib_umem *umem = ib_umem_get(pd->uobject->context, start, length,
-					   access_flags, 0);
+					   access_flags);
 	if (IS_ERR(umem)) {
 		mlx5_ib_err(dev, "umem get failed (%ld)\n", PTR_ERR(umem));
 		return (void *)umem;
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index f3c943f..3a48d9db 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -632,7 +632,7 @@ static int mlx5_ib_umem_get(struct mlx5_ib_dev *dev,
 {
 	int err;
 
-	*umem = ib_umem_get(pd->uobject->context, addr, size, 0, 0);
+	*umem = ib_umem_get(pd->uobject->context, addr, size, 0);
 	if (IS_ERR(*umem)) {
 		mlx5_ib_dbg(dev, "umem_get failed\n");
 		return PTR_ERR(*umem);
@@ -684,7 +684,7 @@ static int create_user_rq(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 
 	context = to_mucontext(pd->uobject->context);
 	rwq->umem = ib_umem_get(pd->uobject->context, ucmd->buf_addr,
-			       rwq->buf_size, 0, 0);
+			       rwq->buf_size, 0);
 	if (IS_ERR(rwq->umem)) {
 		mlx5_ib_dbg(dev, "umem_get failed\n");
 		err = PTR_ERR(rwq->umem);
diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c
index ed6ac52..f2f614c 100644
--- a/drivers/infiniband/hw/mlx5/srq.c
+++ b/drivers/infiniband/hw/mlx5/srq.c
@@ -112,7 +112,7 @@ static int create_srq_user(struct ib_pd *pd, struct mlx5_ib_srq *srq,
 	srq->wq_sig = !!(ucmd.flags & MLX5_SRQ_FLAG_SIGNATURE);
 
 	srq->umem = ib_umem_get(pd->uobject->context, ucmd.buf_addr, buf_size,
-				0, 0);
+				0);
 	if (IS_ERR(srq->umem)) {
 		mlx5_ib_dbg(dev, "failed umem get, size %d\n", buf_size);
 		err = PTR_ERR(srq->umem);
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index da2335f..86c60dc 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -910,6 +910,7 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	int i, k, entry;
 	int err = 0;
 	int write_mtt_size;
+	unsigned long dma_attrs = 0;
 
 	if (udata->inlen - sizeof (struct ib_uverbs_cmd_hdr) < sizeof ucmd) {
 		if (!to_mucontext(pd->uobject->context)->reg_mr_warned) {
@@ -926,8 +927,11 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	if (!mr)
 		return ERR_PTR(-ENOMEM);
 
-	mr->umem = ib_umem_get(pd->uobject->context, start, length, acc,
-			       ucmd.mr_attrs & MTHCA_MR_DMASYNC);
+	if (ucmd.mr_attrs & MTHCA_MR_DMASYNC)
+		dma_attrs |= DMA_ATTR_WRITE_BARRIER;
+
+	mr->umem = ib_umem_get_attrs(pd->uobject->context, start, length, acc,
+				DMA_BIDIRECTIONAL, dma_attrs);
 
 	if (IS_ERR(mr->umem)) {
 		err = PTR_ERR(mr->umem);
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index bd69125..f417e1d 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -2170,7 +2170,7 @@ static struct ib_mr *nes_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	u8 stag_key;
 	int first_page = 1;
 
-	region = ib_umem_get(pd->uobject->context, start, length, acc, 0);
+	region = ib_umem_get(pd->uobject->context, start, length, acc);
 	if (IS_ERR(region)) {
 		return (struct ib_mr *)region;
 	}
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index b1a3d91..5c80765 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -967,7 +967,7 @@ struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
 	if (!mr)
 		return ERR_PTR(status);
-	mr->umem = ib_umem_get(ibpd->uobject->context, start, len, acc, 0);
+	mr->umem = ib_umem_get(ibpd->uobject->context, start, len, acc);
 	if (IS_ERR(mr->umem)) {
 		status = -EFAULT;
 		goto umem_err;
diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c
index 80c4b6b..564724a 100644
--- a/drivers/infiniband/sw/rdmavt/mr.c
+++ b/drivers/infiniband/sw/rdmavt/mr.c
@@ -369,7 +369,7 @@ struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		return ERR_PTR(-EINVAL);
 
 	umem = ib_umem_get(pd->uobject->context, start, length,
-			   mr_access_flags, 0);
+			   mr_access_flags);
 	if (IS_ERR(umem))
 		return (void *)umem;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index f3dab65..f4d47fd 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -169,7 +169,7 @@ int rxe_mem_init_user(struct rxe_dev *rxe, struct rxe_pd *pd, u64 start,
 	void			*vaddr;
 	int err;
 
-	umem = ib_umem_get(pd->ibpd.uobject->context, start, length, access, 0);
+	umem = ib_umem_get(pd->ibpd.uobject->context, start, length, access);
 	if (IS_ERR(umem)) {
 		pr_warn("err %d from rxe_umem_get\n",
 			(int)PTR_ERR(umem));
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 2876679..472c2f6 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -84,7 +84,7 @@ static inline size_t ib_umem_num_pages(struct ib_umem *umem)
 
 #ifdef CONFIG_INFINIBAND_USER_MEM
 struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
-			    size_t size, int access, int dmasync);
+			    size_t size, int access);
 struct ib_umem *ib_umem_get_attrs(struct ib_ucontext *context, unsigned long addr,
 				  size_t size, int access,
 				  enum dma_data_direction dir,
@@ -100,7 +100,7 @@ int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
 
 static inline struct ib_umem *ib_umem_get(struct ib_ucontext *context,
 					  unsigned long addr, size_t size,
-					  int access, int dmasync) {
+					  int access) {
 	return ERR_PTR(-EINVAL);
 }
 static inline struct ib_umem *ib_umem_get_attrs(struct ib_ucontext *context,
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 4/9] ib: Add udata argument to create_ah
       [not found] ` <1472774969-18997-1-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
                     ` (2 preceding siblings ...)
  2016-09-02  0:09   ` [PATCH 3/9] ib_umem: With the new ib_umem_get_attrs, simplify ib_umem_get Knut Omang
@ 2016-09-02  0:09   ` Knut Omang
       [not found]     ` <1472774969-18997-5-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2016-09-02  0:09   ` [PATCH 5/9] ib_uverbs: Add padding to end align ib_uverbs_reg_mr_resp Knut Omang
                     ` (4 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Knut Omang @ 2016-09-02  0:09 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Knut Omang

Most of the ib device driver entry points supports optional
device specific parameter transfer between user space and kernel space
via the udata argument - add a similar argument for ib_create_ah.

Update all infiniband drivers to include this agument
in their driver entry point implementation.

Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/core/uverbs_cmd.c         | 10 +++++++++-
 drivers/infiniband/core/verbs.c              |  2 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |  3 ++-
 drivers/infiniband/hw/cxgb4/provider.c       |  3 ++-
 drivers/infiniband/hw/mlx4/ah.c              |  3 ++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h         |  3 ++-
 drivers/infiniband/hw/mlx5/ah.c              |  3 ++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h         |  3 ++-
 drivers/infiniband/hw/mthca/mthca_provider.c |  3 ++-
 drivers/infiniband/hw/nes/nes_verbs.c        |  3 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c     |  5 +++--
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h     |  3 ++-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c |  3 ++-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h |  3 ++-
 drivers/infiniband/sw/rxe/rxe_verbs.c        |  3 ++-
 include/rdma/ib_verbs.h                      |  3 ++-
 16 files changed, 39 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index f664731..dbc5885 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2874,6 +2874,7 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
 	struct ib_pd			*pd;
 	struct ib_ah			*ah;
 	struct ib_ah_attr		attr;
+	struct ib_udata                 udata;
 	int ret;
 
 	if (out_len < sizeof resp)
@@ -2882,6 +2883,10 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
+	INIT_UDATA(&udata, buf + sizeof cmd,
+		(unsigned long) cmd.response + sizeof resp,
+		in_len - sizeof cmd, out_len - sizeof resp);
+
 	uobj = kmalloc(sizeof *uobj, GFP_KERNEL);
 	if (!uobj)
 		return -ENOMEM;
@@ -2908,14 +2913,17 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
 	memset(&attr.dmac, 0, sizeof(attr.dmac));
 	memcpy(attr.grh.dgid.raw, cmd.attr.grh.dgid, 16);
 
-	ah = ib_create_ah(pd, &attr);
+	ah = pd->device->create_ah(pd, &attr, &udata);
 	if (IS_ERR(ah)) {
 		ret = PTR_ERR(ah);
 		goto err_put;
 	}
 
+	ah->device  = pd->device;
+	ah->pd      = pd;
 	ah->uobject  = uobj;
 	uobj->object = ah;
+	atomic_inc(&pd->usecnt);
 
 	ret = idr_add_uobj(&ib_uverbs_ah_idr, uobj);
 	if (ret)
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f2b776e..6efe23d 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -293,7 +293,7 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
 {
 	struct ib_ah *ah;
 
-	ah = pd->device->create_ah(pd, ah_attr);
+	ah = pd->device->create_ah(pd, ah_attr, NULL);
 
 	if (!IS_ERR(ah)) {
 		ah->device  = pd->device;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 4225479..ffdf530 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -62,7 +62,8 @@
 #include "common.h"
 
 static struct ib_ah *iwch_ah_create(struct ib_pd *pd,
-				    struct ib_ah_attr *ah_attr)
+				    struct ib_ah_attr *ah_attr,
+				    struct ib_udata *udata)
 {
 	return ERR_PTR(-ENOSYS);
 }
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index df127ce..b476df0 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -59,7 +59,8 @@ module_param(fastreg_support, int, 0644);
 MODULE_PARM_DESC(fastreg_support, "Advertise fastreg support (default=1)");
 
 static struct ib_ah *c4iw_ah_create(struct ib_pd *pd,
-				    struct ib_ah_attr *ah_attr)
+				    struct ib_ah_attr *ah_attr,
+				    struct ib_udata *udata)
 {
 	return ERR_PTR(-ENOSYS);
 }
diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c
index 5fc6233..1ea2ba3 100644
--- a/drivers/infiniband/hw/mlx4/ah.c
+++ b/drivers/infiniband/hw/mlx4/ah.c
@@ -124,7 +124,8 @@ static struct ib_ah *create_iboe_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr
 	return &ah->ibah;
 }
 
-struct ib_ah *mlx4_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
+struct ib_ah *mlx4_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr,
+				struct ib_udata *udata)
 {
 	struct mlx4_ib_ah *ah;
 	struct ib_ah *ret;
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 7c5832e..c3812e3 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -740,7 +740,8 @@ int mlx4_ib_arm_cq(struct ib_cq *cq, enum ib_cq_notify_flags flags);
 void __mlx4_ib_cq_clean(struct mlx4_ib_cq *cq, u32 qpn, struct mlx4_ib_srq *srq);
 void mlx4_ib_cq_clean(struct mlx4_ib_cq *cq, u32 qpn, struct mlx4_ib_srq *srq);
 
-struct ib_ah *mlx4_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr);
+struct ib_ah *mlx4_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr,
+				struct ib_udata *udata);
 int mlx4_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr);
 int mlx4_ib_destroy_ah(struct ib_ah *ah);
 
diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c
index 745efa4..b5d0f9a 100644
--- a/drivers/infiniband/hw/mlx5/ah.c
+++ b/drivers/infiniband/hw/mlx5/ah.c
@@ -64,7 +64,8 @@ static struct ib_ah *create_ib_ah(struct mlx5_ib_dev *dev,
 	return &ah->ibah;
 }
 
-struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
+struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr,
+				struct ib_udata *udata)
 {
 	struct mlx5_ib_ah *ah;
 	struct mlx5_ib_dev *dev = to_mdev(pd->device);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index a59034a..82a1ccd 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -726,7 +726,8 @@ void mlx5_ib_free_srq_wqe(struct mlx5_ib_srq *srq, int wqe_index);
 int mlx5_MAD_IFC(struct mlx5_ib_dev *dev, int ignore_mkey, int ignore_bkey,
 		 u8 port, const struct ib_wc *in_wc, const struct ib_grh *in_grh,
 		 const void *in_mad, void *response_mad);
-struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr);
+struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr,
+				struct ib_udata *udata);
 int mlx5_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr);
 int mlx5_ib_destroy_ah(struct ib_ah *ah);
 struct ib_srq *mlx5_ib_create_srq(struct ib_pd *pd,
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 86c60dc..9618e32 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -409,7 +409,8 @@ static int mthca_dealloc_pd(struct ib_pd *pd)
 }
 
 static struct ib_ah *mthca_ah_create(struct ib_pd *pd,
-				     struct ib_ah_attr *ah_attr)
+				     struct ib_ah_attr *ah_attr,
+				     struct ib_udata *udata)
 {
 	int err;
 	struct mthca_ah *ah;
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index f417e1d..76862c9 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -771,7 +771,8 @@ static int nes_dealloc_pd(struct ib_pd *ibpd)
 /**
  * nes_create_ah
  */
-static struct ib_ah *nes_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
+static struct ib_ah *nes_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr,
+				struct ib_udata *udata)
 {
 	return ERR_PTR(-ENOSYS);
 }
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index 797362a..c65ac291 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -154,7 +154,8 @@ static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah,
 	return status;
 }
 
-struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
+struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr,
+			struct ib_udata *udata)
 {
 	u32 *ahid_addr;
 	int status;
@@ -203,7 +204,7 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
 						      &sgid_attr.ndev->ifindex,
 						      NULL);
 		if (status) {
-			pr_err("%s(): Failed to resolve dmac from gid." 
+			pr_err("%s(): Failed to resolve dmac from gid."
 				"status = %d\n", __func__, status);
 			goto av_conf_err;
 		}
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
index 3856dd4..327b888a 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
@@ -50,7 +50,8 @@ enum {
 	OCRDMA_AH_L3_TYPE_MASK		= 0x03,
 	OCRDMA_AH_L3_TYPE_SHIFT		= 0x1D /* 29 bits */
 };
-struct ib_ah *ocrdma_create_ah(struct ib_pd *, struct ib_ah_attr *);
+struct ib_ah *ocrdma_create_ah(struct ib_pd *, struct ib_ah_attr *,
+			struct ib_udata *);
 int ocrdma_destroy_ah(struct ib_ah *);
 int ocrdma_query_ah(struct ib_ah *, struct ib_ah_attr *);
 int ocrdma_modify_ah(struct ib_ah *, struct ib_ah_attr *);
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index a5bfbba..9051193 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -738,7 +738,8 @@ int usnic_ib_mmap(struct ib_ucontext *context,
 
 /* In ib callbacks section -  Start of stub funcs */
 struct ib_ah *usnic_ib_create_ah(struct ib_pd *pd,
-					struct ib_ah_attr *ah_attr)
+				struct ib_ah_attr *ah_attr,
+				struct ib_udata *udata)
 {
 	usnic_dbg("\n");
 	return ERR_PTR(-EPERM);
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
index 0d9d2e6a..1eecd8e 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
@@ -75,7 +75,8 @@ int usnic_ib_dealloc_ucontext(struct ib_ucontext *ibcontext);
 int usnic_ib_mmap(struct ib_ucontext *context,
 			struct vm_area_struct *vma);
 struct ib_ah *usnic_ib_create_ah(struct ib_pd *pd,
-					struct ib_ah_attr *ah_attr);
+				struct ib_ah_attr *ah_attr,
+				struct ib_udata *udata);
 int usnic_ib_destroy_ah(struct ib_ah *ah);
 int usnic_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			struct ib_send_wr **bad_wr);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 4552be9..64ade0b 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -314,7 +314,8 @@ static int rxe_init_av(struct rxe_dev *rxe, struct ib_ah_attr *attr,
 	return err;
 }
 
-static struct ib_ah *rxe_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
+static struct ib_ah *rxe_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr,
+				struct ib_udata *udata)
 {
 	int err;
 	struct rxe_dev *rxe = to_rdev(ibpd->device);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 8e90dd2..192d591 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1879,7 +1879,8 @@ struct ib_device {
 					       struct ib_udata *udata);
 	int                        (*dealloc_pd)(struct ib_pd *pd);
 	struct ib_ah *             (*create_ah)(struct ib_pd *pd,
-						struct ib_ah_attr *ah_attr);
+						struct ib_ah_attr *ah_attr,
+						struct ib_udata *udata);
 	int                        (*modify_ah)(struct ib_ah *ah,
 						struct ib_ah_attr *ah_attr);
 	int                        (*query_ah)(struct ib_ah *ah,
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 5/9] ib_uverbs: Add padding to end align ib_uverbs_reg_mr_resp
       [not found] ` <1472774969-18997-1-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
                     ` (3 preceding siblings ...)
  2016-09-02  0:09   ` [PATCH 4/9] ib: Add udata argument to create_ah Knut Omang
@ 2016-09-02  0:09   ` Knut Omang
       [not found]     ` <1472774969-18997-6-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2016-09-02  0:09   ` [PATCH 6/9] ib_uverbs: Avoid vendor specific masking of attributes in query_qp Knut Omang
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Knut Omang @ 2016-09-02  0:09 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Knut Omang

The ib_uverbs_reg_mr_resp structure was not 64 bit end aligned
as required by the protocol. This causes alignment issues
if a device specific driver needs to transfer extra response
arguments.

Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 include/uapi/rdma/ib_user_verbs.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 7f035f4b..6b8c9c0 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -307,6 +307,7 @@ struct ib_uverbs_reg_mr_resp {
 	__u32 mr_handle;
 	__u32 lkey;
 	__u32 rkey;
+	__u32 reserved;
 };
 
 struct ib_uverbs_rereg_mr {
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 6/9] ib_uverbs: Avoid vendor specific masking of attributes in query_qp
       [not found] ` <1472774969-18997-1-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
                     ` (4 preceding siblings ...)
  2016-09-02  0:09   ` [PATCH 5/9] ib_uverbs: Add padding to end align ib_uverbs_reg_mr_resp Knut Omang
@ 2016-09-02  0:09   ` Knut Omang
       [not found]     ` <1472774969-18997-7-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2016-09-02  0:09   ` [PATCH 7/9] ib_{uverbs/core}: add new ib_create_qp_ex with udata arg Knut Omang
                     ` (2 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Knut Omang @ 2016-09-02  0:09 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Knut Omang

This commit removes the implementation and use of the modify_qp_mask
helper function from the generic OFED implementation and into individual
device drivers.

Like with use of the ib_modify_qp_is_ok function it should be up to
each device driver how to handle bits set in the attribute masks.

With the modify_qp_mask function applied in the generic code,
drivers would not see the bits that the user process actually sets.

The restrictions imposed by the filter are also beyond what
is imposed by the Infiniband standard, and would also limit
future drivers or hardware from checking for unsupported or
invalid settings.

Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/core/uverbs_cmd.c | 19 ++-----------------
 drivers/infiniband/hw/mlx4/qp.c      | 19 ++++++++++++++++++-
 drivers/infiniband/hw/mlx5/qp.c      | 17 +++++++++++++++++
 3 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index dbc5885..35d18e0 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2314,20 +2314,6 @@ out:
 	return ret ? ret : in_len;
 }
 
-/* Remove ignored fields set in the attribute mask */
-static int modify_qp_mask(enum ib_qp_type qp_type, int mask)
-{
-	switch (qp_type) {
-	case IB_QPT_XRC_INI:
-		return mask & ~(IB_QP_MAX_DEST_RD_ATOMIC | IB_QP_MIN_RNR_TIMER);
-	case IB_QPT_XRC_TGT:
-		return mask & ~(IB_QP_MAX_QP_RD_ATOMIC | IB_QP_RETRY_CNT |
-				IB_QP_RNR_RETRY);
-	default:
-		return mask;
-	}
-}
-
 ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
 			    struct ib_device *ib_dev,
 			    const char __user *buf, int in_len,
@@ -2405,10 +2391,9 @@ ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
 		ret = ib_resolve_eth_dmac(qp, attr, &cmd.attr_mask);
 		if (ret)
 			goto release_qp;
-		ret = qp->device->modify_qp(qp, attr,
-			modify_qp_mask(qp->qp_type, cmd.attr_mask), &udata);
+		ret = qp->device->modify_qp(qp, attr, cmd.attr_mask, &udata);
 	} else {
-		ret = ib_modify_qp(qp, attr, modify_qp_mask(qp->qp_type, cmd.attr_mask));
+		ret = ib_modify_qp(qp, attr, cmd.attr_mask);
 	}
 
 	if (ret)
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 363b1cb..3a5061f 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -2154,6 +2154,20 @@ out:
 	return err;
 }
 
+/* filter out ignored fields set in the attribute mask */
+static int modify_qp_mask(enum ib_qp_type qp_type, int mask)
+{
+	switch (qp_type) {
+	case IB_QPT_XRC_INI:
+		return mask & ~(IB_QP_MAX_DEST_RD_ATOMIC | IB_QP_MIN_RNR_TIMER);
+	case IB_QPT_XRC_TGT:
+		return mask & ~(IB_QP_MAX_QP_RD_ATOMIC | IB_QP_RETRY_CNT |
+				IB_QP_RNR_RETRY);
+	default:
+		return mask;
+	}
+}
+
 static int _mlx4_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 			      int attr_mask, struct ib_udata *udata)
 {
@@ -2162,6 +2176,10 @@ static int _mlx4_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 	enum ib_qp_state cur_state, new_state;
 	int err = -EINVAL;
 	int ll;
+
+	if (udata)
+		attr_mask = modify_qp_mask(ibqp->qp_type, attr_mask);
+
 	mutex_lock(&qp->mutex);
 
 	cur_state = attr_mask & IB_QP_CUR_STATE ? attr->cur_qp_state : qp->state;
@@ -3501,4 +3519,3 @@ out:
 	mutex_unlock(&qp->mutex);
 	return err;
 }
-
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 3a48d9db..5006ab0 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -2737,6 +2737,20 @@ out:
 	return err;
 }
 
+/* filter out ignored fields set in the attribute mask */
+static int modify_qp_mask(enum ib_qp_type qp_type, int mask)
+{
+	switch (qp_type) {
+	case IB_QPT_XRC_INI:
+		return mask & ~(IB_QP_MAX_DEST_RD_ATOMIC | IB_QP_MIN_RNR_TIMER);
+	case IB_QPT_XRC_TGT:
+		return mask & ~(IB_QP_MAX_QP_RD_ATOMIC | IB_QP_RETRY_CNT |
+				IB_QP_RNR_RETRY);
+	default:
+		return mask;
+	}
+}
+
 int mlx5_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 		      int attr_mask, struct ib_udata *udata)
 {
@@ -2751,6 +2765,9 @@ int mlx5_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 	if (ibqp->rwq_ind_tbl)
 		return -ENOSYS;
 
+	if (udata)
+		attr_mask = modify_qp_mask(ibqp->qp_type, attr_mask);
+
 	if (unlikely(ibqp->qp_type == IB_QPT_GSI))
 		return mlx5_ib_gsi_modify_qp(ibqp, attr, attr_mask);
 
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 7/9] ib_{uverbs/core}: add new ib_create_qp_ex with udata arg
       [not found] ` <1472774969-18997-1-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
                     ` (5 preceding siblings ...)
  2016-09-02  0:09   ` [PATCH 6/9] ib_uverbs: Avoid vendor specific masking of attributes in query_qp Knut Omang
@ 2016-09-02  0:09   ` Knut Omang
  2016-09-02  0:09   ` [PATCH 8/9] ib_uverbs: Support for kernel implementation of XRC Knut Omang
  2016-09-02  0:09   ` [PATCH 9/9] ib_verbs: Add a new qp create flag to request features for Ethernet over IB Knut Omang
  8 siblings, 0 replies; 39+ messages in thread
From: Knut Omang @ 2016-09-02  0:09 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Knut Omang

Necessary to get device specific arguments through to XRC QPs.
Added new local header file to serve as support interface
between ib_core and ib_uverbs.

Right now there is a lot of duplicate setup code in uverbs_cmd.c
on the ib_uverbs side and verbs.c on the ib_core side. This commit
is a quick fix to have XRC support working, but similar calls
can be added to consolidate the code for other parts of the API.

Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/core/core_priv.h  |  3 +++
 drivers/infiniband/core/uverbs_cmd.c |  2 +-
 drivers/infiniband/core/verbs.c      | 13 +++++++++++--
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 19d499d..0819006 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -152,5 +152,8 @@ int ib_nl_handle_set_timeout(struct sk_buff *skb,
 			     struct netlink_callback *cb);
 int ib_nl_handle_ip_res_resp(struct sk_buff *skb,
 			     struct netlink_callback *cb);
+struct ib_qp *ib_create_qp_ex(struct ib_pd *pd,
+			struct ib_qp_init_attr *qp_init_attr,
+			struct ib_udata *udata);
 
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 35d18e0..4323093 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1905,7 +1905,7 @@ static int create_qp(struct ib_uverbs_file *file,
 		}
 
 	if (cmd->qp_type == IB_QPT_XRC_TGT)
-		qp = ib_create_qp(pd, &attr);
+		qp = ib_create_qp_ex(pd, &attr, uhw);
 	else
 		qp = device->create_qp(pd, &attr, uhw);
 
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 6efe23d..8f71b00 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -751,8 +751,9 @@ static struct ib_qp *ib_create_xrc_qp(struct ib_qp *qp,
 	return qp;
 }
 
-struct ib_qp *ib_create_qp(struct ib_pd *pd,
-			   struct ib_qp_init_attr *qp_init_attr)
+struct ib_qp *ib_create_qp_ex(struct ib_pd *pd,
+			struct ib_qp_init_attr *qp_init_attr,
+			struct ib_udata *udata)
 {
 	struct ib_device *device = pd ? pd->device : qp_init_attr->xrcd->device;
 	struct ib_qp *qp;
@@ -836,6 +837,14 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 
 	return qp;
 }
+EXPORT_SYMBOL(ib_create_qp_ex);
+
+
+struct ib_qp *ib_create_qp(struct ib_pd *pd,
+			   struct ib_qp_init_attr *qp_init_attr)
+{
+	return ib_create_qp_ex(pd, qp_init_attr,NULL);
+}
 EXPORT_SYMBOL(ib_create_qp);
 
 static const struct {
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 8/9] ib_uverbs: Support for kernel implementation of XRC
       [not found] ` <1472774969-18997-1-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
                     ` (6 preceding siblings ...)
  2016-09-02  0:09   ` [PATCH 7/9] ib_{uverbs/core}: add new ib_create_qp_ex with udata arg Knut Omang
@ 2016-09-02  0:09   ` Knut Omang
       [not found]     ` <1472774969-18997-9-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2016-09-02  0:09   ` [PATCH 9/9] ib_verbs: Add a new qp create flag to request features for Ethernet over IB Knut Omang
  8 siblings, 1 reply; 39+ messages in thread
From: Knut Omang @ 2016-09-02  0:09 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Knut Omang

Extends the kernel/user space interface for work requests to also provide
the XRC shared receive queue number. Necessary to support
kernel level implementation of user verbs for XRC.

Requires a corresponding libibverbs change to support XRC.

Also fix kernel support for XRC broken by commit

"IB: remove xrc_remote_srq_num from struct ib_send_wr"

which removed a field needed to support kernel side XRC as part of an effort
to trim a work request to sizes dependent of the actual request instead
of the union approach.

With this commit I try to follow the pattern outlined by that cleanup
to also support kernel side XRC. Since XRC attributes are associated with
QP type and are (almost) orthogonal to the type of request, the XRC
specific attribute(s) would have to be applicable to several different
work request type specific subtypes. Since the subtypes have different sizes,
putting the xrc specific attributes at the end would require accessor functions
and to keep explicit track of the size of the subtype used.

The chosen solution is to introduce the xrc specific attributes at the top of
the struct instead, this way type checking is taking care of most issues,
except that extra care is needed at deallocation time. Note that this requires
padding of the xrc specific attributes that matches the size of struct ib_sge,
to avoid that the ALIGN() calls used to ensure that the scatter list of the
work is aligned does not extend beyond the size of the allocates space:

struct ib_xrc_wr
{
  <xrc specific part>;
  struct ib_send_wr wr;
}
< subtype extensions will still extend ib_send_wr down here >

Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/core/uverbs_cmd.c | 40 ++++++++++++++++++++++++++++--------
 include/rdma/ib_verbs.h              | 12 +++++++++++
 include/uapi/rdma/ib_user_verbs.h    |  2 ++
 3 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 4323093..f7e9904 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2469,12 +2469,30 @@ ssize_t ib_uverbs_destroy_qp(struct ib_uverbs_file *file,
 	return in_len;
 }
 
-static void *alloc_wr(size_t wr_size, __u32 num_sge)
+static void *alloc_wr(struct ib_qp *qp, size_t wr_size, __u32 num_sge)
 {
-	return kmalloc(ALIGN(wr_size, sizeof (struct ib_sge)) +
-			 num_sge * sizeof (struct ib_sge), GFP_KERNEL);
+	void *wr;
+	size_t xrc_ext = qp->qp_type == IB_QPT_XRC_INI ?
+		sizeof(struct ib_xrc_wr) - sizeof(struct ib_send_wr) :
+		0;
+
+	wr = kmalloc(ALIGN(wr_size + xrc_ext, sizeof (struct ib_sge)) +
+		num_sge * sizeof (struct ib_sge), GFP_KERNEL);
+	if (unlikely(!wr))
+		return wr;
+
+	return wr + xrc_ext;
 };
 
+static void free_wr(struct ib_qp *qp, struct ib_send_wr *wr)
+{
+	void *d;
+	if (unlikely(!wr))
+		return;
+	d = qp->qp_type == IB_QPT_XRC_INI ? xrc_wr(wr) : (void *)wr;
+	kfree(d);
+}
+
 ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
 			    struct ib_device *ib_dev,
 			    const char __user *buf, int in_len,
@@ -2509,6 +2527,7 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
 		goto out;
 
 	is_ud = qp->qp_type == IB_QPT_UD;
+
 	sg_ind = 0;
 	last = NULL;
 	for (i = 0; i < cmd.wr_count; ++i) {
@@ -2534,7 +2553,7 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
 			}
 
 			next_size = sizeof(*ud);
-			ud = alloc_wr(next_size, user_wr->num_sge);
+			ud = alloc_wr(qp, next_size, user_wr->num_sge);
 			if (!ud) {
 				ret = -ENOMEM;
 				goto out_put;
@@ -2556,7 +2575,7 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
 			struct ib_rdma_wr *rdma;
 
 			next_size = sizeof(*rdma);
-			rdma = alloc_wr(next_size, user_wr->num_sge);
+			rdma = alloc_wr(qp, next_size, user_wr->num_sge);
 			if (!rdma) {
 				ret = -ENOMEM;
 				goto out_put;
@@ -2571,7 +2590,7 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
 			struct ib_atomic_wr *atomic;
 
 			next_size = sizeof(*atomic);
-			atomic = alloc_wr(next_size, user_wr->num_sge);
+			atomic = alloc_wr(qp, next_size, user_wr->num_sge);
 			if (!atomic) {
 				ret = -ENOMEM;
 				goto out_put;
@@ -2587,7 +2606,7 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
 			   user_wr->opcode == IB_WR_SEND_WITH_IMM ||
 			   user_wr->opcode == IB_WR_SEND_WITH_INV) {
 			next_size = sizeof(*next);
-			next = alloc_wr(next_size, user_wr->num_sge);
+			next = alloc_wr(qp, next_size, user_wr->num_sge);
 			if (!next) {
 				ret = -ENOMEM;
 				goto out_put;
@@ -2605,6 +2624,11 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
 			next->ex.invalidate_rkey = user_wr->ex.invalidate_rkey;
 		}
 
+		if (qp->qp_type == IB_QPT_XRC_INI) {
+			struct ib_xrc_wr *xrc = xrc_wr(next);
+			xrc->remote_srqn = user_wr->xrc_remote_srq_num;
+		}
+
 		if (!last)
 			wr = next;
 		else
@@ -2653,7 +2677,7 @@ out_put:
 		if (is_ud && ud_wr(wr)->ah)
 			put_ah_read(ud_wr(wr)->ah);
 		next = wr->next;
-		kfree(wr);
+		free_wr(qp, wr);
 		wr = next;
 	}
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 192d591..093d68e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1276,6 +1276,18 @@ static inline struct ib_sig_handover_wr *sig_handover_wr(struct ib_send_wr *wr)
 	return container_of(wr, struct ib_sig_handover_wr, wr);
 }
 
+struct ib_xrc_wr {
+	u32 			remote_srqn;
+	u32			reserved1;
+	u64			reserved2;
+	struct ib_send_wr	wr;
+};
+
+static inline struct ib_xrc_wr *xrc_wr(struct ib_send_wr *wr)
+{
+	return container_of(wr, struct ib_xrc_wr, wr);
+}
+
 struct ib_recv_wr {
 	struct ib_recv_wr      *next;
 	union {
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 6b8c9c0..db74a5e 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -725,6 +725,8 @@ struct ib_uverbs_send_wr {
 			__u32 reserved;
 		} ud;
 	} wr;
+	__u32 xrc_remote_srq_num;
+	__u32 reserved;
 };
 
 struct ib_uverbs_post_send {
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 9/9] ib_verbs: Add a new qp create flag to request features for Ethernet over IB
       [not found] ` <1472774969-18997-1-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
                     ` (7 preceding siblings ...)
  2016-09-02  0:09   ` [PATCH 8/9] ib_uverbs: Support for kernel implementation of XRC Knut Omang
@ 2016-09-02  0:09   ` Knut Omang
       [not found]     ` <1472774969-18997-10-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  8 siblings, 1 reply; 39+ messages in thread
From: Knut Omang @ 2016-09-02  0:09 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Knut Omang

Some Infiniband HCAs need to know if a QP is going to
be used for Ethernet over IB (EOIB).

Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 include/rdma/ib_verbs.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 093d68e..d6f3f55 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -991,6 +991,7 @@ enum ib_qp_create_flags {
 	IB_QP_CREATE_SIGNATURE_EN		= 1 << 6,
 	IB_QP_CREATE_USE_GFP_NOIO		= 1 << 7,
 	IB_QP_CREATE_SCATTER_FCS		= 1 << 8,
+	IB_QP_CREATE_EOIB			= 1 << 9,
 	/* reserve bits 26-31 for low level drivers' internal use */
 	IB_QP_CREATE_RESERVED_START		= 1 << 26,
 	IB_QP_CREATE_RESERVED_END		= 1 << 31,
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/9] ib: Add udata argument to create_ah
       [not found]     ` <1472774969-18997-5-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2016-09-02  0:38       ` kbuild test robot
  2016-09-02  0:39       ` kbuild test robot
  1 sibling, 0 replies; 39+ messages in thread
From: kbuild test robot @ 2016-09-02  0:38 UTC (permalink / raw)
  Cc: kbuild-all-JC7UmRfGjtg, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Knut Omang

[-- Attachment #1: Type: text/plain, Size: 4177 bytes --]

Hi Knut,

[auto build test ERROR on rdma/master]
[also build test ERROR on v4.8-rc4 next-20160825]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Knut-Omang/SIF-related-verbs-patches/20160902-081205
base:   https://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git master
config: sparc64-allyesconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sparc64 

All errors (new ones prefixed by >>):

   drivers/infiniband/hw/i40iw/i40iw_verbs.c: In function 'i40iw_init_rdma_device':
>> drivers/infiniband/hw/i40iw/i40iw_verbs.c:2650:27: error: assignment from incompatible pointer type [-Werror=incompatible-pointer-types]
     iwibdev->ibdev.create_ah = i40iw_create_ah;
                              ^
   cc1: some warnings being treated as errors

vim +2650 drivers/infiniband/hw/i40iw/i40iw_verbs.c

d3749841 Faisal Latif      2016-01-20  2634  	iwibdev->ibdev.dealloc_ucontext = i40iw_dealloc_ucontext;
d3749841 Faisal Latif      2016-01-20  2635  	iwibdev->ibdev.mmap = i40iw_mmap;
d3749841 Faisal Latif      2016-01-20  2636  	iwibdev->ibdev.alloc_pd = i40iw_alloc_pd;
d3749841 Faisal Latif      2016-01-20  2637  	iwibdev->ibdev.dealloc_pd = i40iw_dealloc_pd;
d3749841 Faisal Latif      2016-01-20  2638  	iwibdev->ibdev.create_qp = i40iw_create_qp;
d3749841 Faisal Latif      2016-01-20  2639  	iwibdev->ibdev.modify_qp = i40iw_modify_qp;
d3749841 Faisal Latif      2016-01-20  2640  	iwibdev->ibdev.query_qp = i40iw_query_qp;
d3749841 Faisal Latif      2016-01-20  2641  	iwibdev->ibdev.destroy_qp = i40iw_destroy_qp;
d3749841 Faisal Latif      2016-01-20  2642  	iwibdev->ibdev.create_cq = i40iw_create_cq;
d3749841 Faisal Latif      2016-01-20  2643  	iwibdev->ibdev.destroy_cq = i40iw_destroy_cq;
d3749841 Faisal Latif      2016-01-20  2644  	iwibdev->ibdev.get_dma_mr = i40iw_get_dma_mr;
d3749841 Faisal Latif      2016-01-20  2645  	iwibdev->ibdev.reg_user_mr = i40iw_reg_user_mr;
d3749841 Faisal Latif      2016-01-20  2646  	iwibdev->ibdev.dereg_mr = i40iw_dereg_mr;
b40f4757 Christoph Lameter 2016-05-16  2647  	iwibdev->ibdev.alloc_hw_stats = i40iw_alloc_hw_stats;
b40f4757 Christoph Lameter 2016-05-16  2648  	iwibdev->ibdev.get_hw_stats = i40iw_get_hw_stats;
d3749841 Faisal Latif      2016-01-20  2649  	iwibdev->ibdev.query_device = i40iw_query_device;
d3749841 Faisal Latif      2016-01-20 @2650  	iwibdev->ibdev.create_ah = i40iw_create_ah;
d3749841 Faisal Latif      2016-01-20  2651  	iwibdev->ibdev.destroy_ah = i40iw_destroy_ah;
c2b75ef7 Ismail, Mustafa   2016-04-18  2652  	iwibdev->ibdev.drain_sq = i40iw_drain_sq;
c2b75ef7 Ismail, Mustafa   2016-04-18  2653  	iwibdev->ibdev.drain_rq = i40iw_drain_rq;
b7aee855 Ismail, Mustafa   2016-04-18  2654  	iwibdev->ibdev.alloc_mr = i40iw_alloc_mr;
b7aee855 Ismail, Mustafa   2016-04-18  2655  	iwibdev->ibdev.map_mr_sg = i40iw_map_mr_sg;
d3749841 Faisal Latif      2016-01-20  2656  	iwibdev->ibdev.iwcm = kzalloc(sizeof(*iwibdev->ibdev.iwcm), GFP_KERNEL);
d3749841 Faisal Latif      2016-01-20  2657  	if (!iwibdev->ibdev.iwcm) {
d3749841 Faisal Latif      2016-01-20  2658  		ib_dealloc_device(&iwibdev->ibdev);

:::::: The code at line 2650 was first introduced by commit
:::::: d37498417947cb2299fc749ae4af1d204c768cba i40iw: add files for iwarp interface

:::::: TO: Faisal Latif <faisal.latif-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
:::::: CC: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 47053 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/9] ib: Add udata argument to create_ah
       [not found]     ` <1472774969-18997-5-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2016-09-02  0:38       ` kbuild test robot
@ 2016-09-02  0:39       ` kbuild test robot
       [not found]         ` <201609020848.QliVPrIS%fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 39+ messages in thread
From: kbuild test robot @ 2016-09-02  0:39 UTC (permalink / raw)
  Cc: kbuild-all-JC7UmRfGjtg, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Knut Omang

[-- Attachment #1: Type: text/plain, Size: 4011 bytes --]

Hi Knut,

[auto build test WARNING on rdma/master]
[also build test WARNING on v4.8-rc4 next-20160825]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Knut-Omang/SIF-related-verbs-patches/20160902-081205
base:   https://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git master
config: tile-allyesconfig (attached as .config)
compiler: tilegx-linux-gcc (GCC) 4.6.2
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=tile 

All warnings (new ones prefixed by >>):

   drivers/infiniband/hw/i40iw/i40iw_verbs.c: In function 'i40iw_init_rdma_device':
>> drivers/infiniband/hw/i40iw/i40iw_verbs.c:2650:27: warning: assignment from incompatible pointer type [enabled by default]

vim +2650 drivers/infiniband/hw/i40iw/i40iw_verbs.c

d3749841 Faisal Latif      2016-01-20  2634  	iwibdev->ibdev.dealloc_ucontext = i40iw_dealloc_ucontext;
d3749841 Faisal Latif      2016-01-20  2635  	iwibdev->ibdev.mmap = i40iw_mmap;
d3749841 Faisal Latif      2016-01-20  2636  	iwibdev->ibdev.alloc_pd = i40iw_alloc_pd;
d3749841 Faisal Latif      2016-01-20  2637  	iwibdev->ibdev.dealloc_pd = i40iw_dealloc_pd;
d3749841 Faisal Latif      2016-01-20  2638  	iwibdev->ibdev.create_qp = i40iw_create_qp;
d3749841 Faisal Latif      2016-01-20  2639  	iwibdev->ibdev.modify_qp = i40iw_modify_qp;
d3749841 Faisal Latif      2016-01-20  2640  	iwibdev->ibdev.query_qp = i40iw_query_qp;
d3749841 Faisal Latif      2016-01-20  2641  	iwibdev->ibdev.destroy_qp = i40iw_destroy_qp;
d3749841 Faisal Latif      2016-01-20  2642  	iwibdev->ibdev.create_cq = i40iw_create_cq;
d3749841 Faisal Latif      2016-01-20  2643  	iwibdev->ibdev.destroy_cq = i40iw_destroy_cq;
d3749841 Faisal Latif      2016-01-20  2644  	iwibdev->ibdev.get_dma_mr = i40iw_get_dma_mr;
d3749841 Faisal Latif      2016-01-20  2645  	iwibdev->ibdev.reg_user_mr = i40iw_reg_user_mr;
d3749841 Faisal Latif      2016-01-20  2646  	iwibdev->ibdev.dereg_mr = i40iw_dereg_mr;
b40f4757 Christoph Lameter 2016-05-16  2647  	iwibdev->ibdev.alloc_hw_stats = i40iw_alloc_hw_stats;
b40f4757 Christoph Lameter 2016-05-16  2648  	iwibdev->ibdev.get_hw_stats = i40iw_get_hw_stats;
d3749841 Faisal Latif      2016-01-20  2649  	iwibdev->ibdev.query_device = i40iw_query_device;
d3749841 Faisal Latif      2016-01-20 @2650  	iwibdev->ibdev.create_ah = i40iw_create_ah;
d3749841 Faisal Latif      2016-01-20  2651  	iwibdev->ibdev.destroy_ah = i40iw_destroy_ah;
c2b75ef7 Ismail, Mustafa   2016-04-18  2652  	iwibdev->ibdev.drain_sq = i40iw_drain_sq;
c2b75ef7 Ismail, Mustafa   2016-04-18  2653  	iwibdev->ibdev.drain_rq = i40iw_drain_rq;
b7aee855 Ismail, Mustafa   2016-04-18  2654  	iwibdev->ibdev.alloc_mr = i40iw_alloc_mr;
b7aee855 Ismail, Mustafa   2016-04-18  2655  	iwibdev->ibdev.map_mr_sg = i40iw_map_mr_sg;
d3749841 Faisal Latif      2016-01-20  2656  	iwibdev->ibdev.iwcm = kzalloc(sizeof(*iwibdev->ibdev.iwcm), GFP_KERNEL);
d3749841 Faisal Latif      2016-01-20  2657  	if (!iwibdev->ibdev.iwcm) {
d3749841 Faisal Latif      2016-01-20  2658  		ib_dealloc_device(&iwibdev->ibdev);

:::::: The code at line 2650 was first introduced by commit
:::::: d37498417947cb2299fc749ae4af1d204c768cba i40iw: add files for iwarp interface

:::::: TO: Faisal Latif <faisal.latif-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
:::::: CC: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 45508 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/9] ib_uverbs: Add padding to end align ib_uverbs_reg_mr_resp
       [not found]     ` <1472774969-18997-6-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2016-09-02  2:09       ` Jason Gunthorpe
       [not found]         ` <20160902020945.GB30057-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Jason Gunthorpe @ 2016-09-02  2:09 UTC (permalink / raw)
  To: Knut Omang; +Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 02, 2016 at 02:09:25AM +0200, Knut Omang wrote:
> The ib_uverbs_reg_mr_resp structure was not 64 bit end aligned
> as required by the protocol. This causes alignment issues
> if a device specific driver needs to transfer extra response
> arguments.

Can you explain in the commit message why this doesn't break anything?
How does compat work with existing libibverbs?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/9] ib_uverbs: Avoid vendor specific masking of attributes in query_qp
       [not found]     ` <1472774969-18997-7-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2016-09-02  2:13       ` Jason Gunthorpe
       [not found]         ` <20160902021300.GC30057-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Jason Gunthorpe @ 2016-09-02  2:13 UTC (permalink / raw)
  To: Knut Omang; +Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 02, 2016 at 02:09:26AM +0200, Knut Omang wrote:
> This commit removes the implementation and use of the modify_qp_mask
> helper function from the generic OFED implementation and into individual
> device drivers.
> 
> Like with use of the ib_modify_qp_is_ok function it should be up to
> each device driver how to handle bits set in the attribute masks.
> 
> With the modify_qp_mask function applied in the generic code,
> drivers would not see the bits that the user process actually sets.
> 
> The restrictions imposed by the filter are also beyond what
> is imposed by the Infiniband standard, and would also limit
> future drivers or hardware from checking for unsupported or
> invalid settings.

I'm not excited about this direction. It is not OK for different
drivers to use different mask algorithms, they must all behave the
same.

What is it you want to do differently, and why shouldn't the mlx
drivers be updated to match that?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 8/9] ib_uverbs: Support for kernel implementation of XRC
       [not found]     ` <1472774969-18997-9-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2016-09-02  2:16       ` Jason Gunthorpe
       [not found]         ` <20160902021640.GD30057-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Jason Gunthorpe @ 2016-09-02  2:16 UTC (permalink / raw)
  To: Knut Omang; +Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 02, 2016 at 02:09:28AM +0200, Knut Omang wrote:

> +++ b/include/uapi/rdma/ib_user_verbs.h
> @@ -725,6 +725,8 @@ struct ib_uverbs_send_wr {
>  			__u32 reserved;
>  		} ud;
>  	} wr;
> +	__u32 xrc_remote_srq_num;
> +	__u32 reserved;
>  };

Hum, how does compat work here? Isn't send_wr usually an array?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 9/9] ib_verbs: Add a new qp create flag to request features for Ethernet over IB
       [not found]     ` <1472774969-18997-10-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2016-09-02  2:17       ` Jason Gunthorpe
       [not found]         ` <20160902021729.GE30057-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Jason Gunthorpe @ 2016-09-02  2:17 UTC (permalink / raw)
  To: Knut Omang; +Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 02, 2016 at 02:09:29AM +0200, Knut Omang wrote:
> Some Infiniband HCAs need to know if a QP is going to
> be used for Ethernet over IB (EOIB).

You will have to send this after your driver. I recommend a patch
proposing this functionality with your driver as the example
implementation, along with a kernel user, as mellanox typically does.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/9] ib_uverbs: Avoid vendor specific masking of attributes in query_qp
       [not found]         ` <20160902021300.GC30057-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-09-02  4:45           ` Knut Omang
       [not found]             ` <1472791552.9410.258.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Knut Omang @ 2016-09-02  4:45 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, 2016-09-01 at 20:13 -0600, Jason Gunthorpe wrote:
> On Fri, Sep 02, 2016 at 02:09:26AM +0200, Knut Omang wrote:
> > This commit removes the implementation and use of the modify_qp_mask
> > helper function from the generic OFED implementation and into individual
> > device drivers.
> > 
> > Like with use of the ib_modify_qp_is_ok function it should be up to
> > each device driver how to handle bits set in the attribute masks.
> > 
> > With the modify_qp_mask function applied in the generic code,
> > drivers would not see the bits that the user process actually sets.
> > 
> > The restrictions imposed by the filter are also beyond what
> > is imposed by the Infiniband standard, and would also limit
> > future drivers or hardware from checking for unsupported or
> > invalid settings.
> 
> I'm not excited about this direction. It is not OK for different
> drivers to use different mask algorithms, they must all behave the
> same.

The problem I am solving here is that SIF expects (and requires) some of 
the bits that the user correctly sets, but that Mellanox either ignore or need to
be 0. So when that mask is applied to the user input, the value that reaches
the SIF driver is not correct from SIFs perspective. 
If the values goes through as the ULP sets them, then all is fine.

> What is it you want to do differently, and why shouldn't the mlx
> drivers be updated to match that?

I'd like the hardware specific part to decide which of the user bits that
are interesting and which are not. Currently that information is lost on the way
down to the hardware specific calls.

Knut

> 
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 9/9] ib_verbs: Add a new qp create flag to request features for Ethernet over IB
       [not found]         ` <20160902021729.GE30057-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-09-02  5:04           ` Knut Omang
       [not found]             ` <1472792670.9410.276.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Knut Omang @ 2016-09-02  5:04 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, 2016-09-01 at 20:17 -0600, Jason Gunthorpe wrote:
> On Fri, Sep 02, 2016 at 02:09:29AM +0200, Knut Omang wrote:
> > Some Infiniband HCAs need to know if a QP is going to
> > be used for Ethernet over IB (EOIB).
> 
> You will have to send this after your driver. I recommend a patch
> proposing this functionality with your driver as the example
> implementation, along with a kernel user, as mellanox typically does.

It's a bit of a chicken and egg situation since the driver 
depends on the patches. If I can 'tick off' this bit soon, I can move ahead and 
get to the send the driver too.

There's not much info in the driver - it just forwards that bit to a hardware bit
and it all happens from there. But hardware needs to know when to set that bit,
as it is only valid when operating as transport for Ethernet.

Please - it would be a big help for me (saving a lot of work down the line that 
is better spent working with the community than with handling more version issues later) 
if I can get an indication that just that bit would be acceptable 8-D ...

Thanks,
Knut


> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/9] ib_uverbs: Add padding to end align ib_uverbs_reg_mr_resp
       [not found]         ` <20160902020945.GB30057-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-09-02  7:54           ` Knut Omang
  0 siblings, 0 replies; 39+ messages in thread
From: Knut Omang @ 2016-09-02  7:54 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, 2016-09-01 at 20:09 -0600, Jason Gunthorpe wrote:
> On Fri, Sep 02, 2016 at 02:09:25AM +0200, Knut Omang wrote:
> > 
> > The ib_uverbs_reg_mr_resp structure was not 64 bit end aligned
> > as required by the protocol. This causes alignment issues
> > if a device specific driver needs to transfer extra response
> > arguments.
> Can you explain in the commit message why this doesn't break anything?
> How does compat work with existing libibverbs?

It doesn't, and I realize I need to handle both

old userspace -> new kernel
new userspace -> old kernel

I am working on a fix for it, which would have to be in ib_uverbs due to the 

if  ( <user size> != <kernel size> ) 
   fail

semantics which effectively inhibits any changes at all  :-/

Thanks,
Knut

> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 8/9] ib_uverbs: Support for kernel implementation of XRC
       [not found]         ` <20160902021640.GD30057-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-09-02  7:55           ` Knut Omang
  0 siblings, 0 replies; 39+ messages in thread
From: Knut Omang @ 2016-09-02  7:55 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, 2016-09-01 at 20:16 -0600, Jason Gunthorpe wrote:
> On Fri, Sep 02, 2016 at 02:09:28AM +0200, Knut Omang wrote:
> 
> > +++ b/include/uapi/rdma/ib_user_verbs.h
> > @@ -725,6 +725,8 @@ struct ib_uverbs_send_wr {
> >  			__u32 reserved;
> >  		} ud;
> >  	} wr;
> > +	__u32 xrc_remote_srq_num;
> > +	__u32 reserved;
> >  };
> 
> Hum, how does compat work here? Isn't send_wr usually an array?

Thanks for pointing that out - you are quite right - this has the same issue as
the other 32 bit padding addition.

As the two patch set stands, both libibverbs and the kernel needs to be updated 
at the same time due to these end alignment issues.

Can we agree on that the patch itself is necessary to support vendor specific
extensions, and that the misaligment I fix here is a bug fix?

Then people can continue to evaluate the patch set assuming 
I can propose a solution to the  

old libibverbs - new kernel or
new libiverbs, old kernel

situations?

Thanks for the reviews!
Knut

> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/9] ib: Add udata argument to create_ah
       [not found]         ` <201609020848.QliVPrIS%fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2016-09-02  8:01           ` Knut Omang
  0 siblings, 0 replies; 39+ messages in thread
From: Knut Omang @ 2016-09-02  8:01 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Doug Ledford

Good catch by the robot - I am obviously missing a few options in my test .config
Will fix both of these.

Knut

On Fri, 2016-09-02 at 08:39 +0800, kbuild test robot wrote:
> Hi Knut,
> 
> [auto build test WARNING on rdma/master]
> [also build test WARNING on v4.8-rc4 next-20160825]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> [Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch
> series was built on]
> [Check https://git-scm.com/docs/git-format-patch for more information]
> 
> url:    https://github.com/0day-ci/linux/commits/Knut-Omang/SIF-related-verbs-patches/20160902-081205
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git master
> config: tile-allyesconfig (attached as .config)
> compiler: tilegx-linux-gcc (GCC) 4.6.2
> reproduce:
>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=tile 
> 
> All warnings (new ones prefixed by >>):
> 
>    drivers/infiniband/hw/i40iw/i40iw_verbs.c: In function 'i40iw_init_rdma_device':
> > 
> > > 
> > > drivers/infiniband/hw/i40iw/i40iw_verbs.c:2650:27: warning: assignment from incompatible pointer type [enabled by default]
> vim +2650 drivers/infiniband/hw/i40iw/i40iw_verbs.c
> 
> d3749841 Faisal Latif      2016-01-20  2634  	iwibdev->ibdev.dealloc_ucontext = i40iw_dealloc_ucontext;
> d3749841 Faisal Latif      2016-01-20  2635  	iwibdev->ibdev.mmap = i40iw_mmap;
> d3749841 Faisal Latif      2016-01-20  2636  	iwibdev->ibdev.alloc_pd = i40iw_alloc_pd;
> d3749841 Faisal Latif      2016-01-20  2637  	iwibdev->ibdev.dealloc_pd = i40iw_dealloc_pd;
> d3749841 Faisal Latif      2016-01-20  2638  	iwibdev->ibdev.create_qp = i40iw_create_qp;
> d3749841 Faisal Latif      2016-01-20  2639  	iwibdev->ibdev.modify_qp = i40iw_modify_qp;
> d3749841 Faisal Latif      2016-01-20  2640  	iwibdev->ibdev.query_qp = i40iw_query_qp;
> d3749841 Faisal Latif      2016-01-20  2641  	iwibdev->ibdev.destroy_qp = i40iw_destroy_qp;
> d3749841 Faisal Latif      2016-01-20  2642  	iwibdev->ibdev.create_cq = i40iw_create_cq;
> d3749841 Faisal Latif      2016-01-20  2643  	iwibdev->ibdev.destroy_cq = i40iw_destroy_cq;
> d3749841 Faisal Latif      2016-01-20  2644  	iwibdev->ibdev.get_dma_mr = i40iw_get_dma_mr;
> d3749841 Faisal Latif      2016-01-20  2645  	iwibdev->ibdev.reg_user_mr = i40iw_reg_user_mr;
> d3749841 Faisal Latif      2016-01-20  2646  	iwibdev->ibdev.dereg_mr = i40iw_dereg_mr;
> b40f4757 Christoph Lameter 2016-05-16  2647  	iwibdev->ibdev.alloc_hw_stats = i40iw_alloc_hw_stats;
> b40f4757 Christoph Lameter 2016-05-16  2648  	iwibdev->ibdev.get_hw_stats = i40iw_get_hw_stats;
> d3749841 Faisal Latif      2016-01-20  2649  	iwibdev->ibdev.query_device = i40iw_query_device;
> d3749841 Faisal Latif      2016-01-20 @2650  	iwibdev->ibdev.create_ah = i40iw_create_ah;
> d3749841 Faisal Latif      2016-01-20  2651  	iwibdev->ibdev.destroy_ah = i40iw_destroy_ah;
> c2b75ef7 Ismail, Mustafa   2016-04-18  2652  	iwibdev->ibdev.drain_sq = i40iw_drain_sq;
> c2b75ef7 Ismail, Mustafa   2016-04-18  2653  	iwibdev->ibdev.drain_rq = i40iw_drain_rq;
> b7aee855 Ismail, Mustafa   2016-04-18  2654  	iwibdev->ibdev.alloc_mr = i40iw_alloc_mr;
> b7aee855 Ismail, Mustafa   2016-04-18  2655  	iwibdev->ibdev.map_mr_sg = i40iw_map_mr_sg;
> d3749841 Faisal Latif      2016-01-20  2656  	iwibdev->ibdev.iwcm = kzalloc(sizeof(*iwibdev->ibdev.iwcm), GFP_KERNEL);
> d3749841 Faisal Latif      2016-01-20  2657  	if (!iwibdev->ibdev.iwcm) {
> d3749841 Faisal Latif      2016-01-20  2658  		ib_dealloc_device(&iwibdev->ibdev);
> 
> :::::: The code at line 2650 was first introduced by commit
> :::::: d37498417947cb2299fc749ae4af1d204c768cba i40iw: add files for iwarp interface
> 
> :::::: TO: Faisal Latif <faisal.latif-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> :::::: CC: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 9/9] ib_verbs: Add a new qp create flag to request features for Ethernet over IB
       [not found]             ` <1472792670.9410.276.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2016-09-02 11:00               ` Knut Omang
       [not found]                 ` <1472814057.3975.47.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Knut Omang @ 2016-09-02 11:00 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Fri, 2016-09-02 at 07:04 +0200, Knut Omang wrote:
> On Thu, 2016-09-01 at 20:17 -0600, Jason Gunthorpe wrote:
> > 
> > On Fri, Sep 02, 2016 at 02:09:29AM +0200, Knut Omang wrote:
> > > 
> > > Some Infiniband HCAs need to know if a QP is going to
> > > be used for Ethernet over IB (EOIB).
> > You will have to send this after your driver. I recommend a patch
> > proposing this functionality with your driver as the example
> > implementation, along with a kernel user, as mellanox typically does.
> It's a bit of a chicken and egg situation since the driver 
> depends on the patches. If I can 'tick off' this bit soon, I can move ahead and 
> get to the send the driver too.
> 
> There's not much info in the driver - it just forwards that bit to a hardware bit
> and it all happens from there. But hardware needs to know when to set that bit,
> as it is only valid when operating as transport for Ethernet.
> 
> Please - it would be a big help for me (saving a lot of work down the line that 
> is better spent working with the community than with handling more version issues later) 
> if I can get an indication that just that bit would be acceptable 8-D ...

I will add the following further justification to the commit message for v2:

Support for encapsulation of Ethernet over IB (EoIB) is a generic feature 
where different HCA implementations can include different special features. 
One example is that if the HCA knows the QP is going to be used for Ethernet, 
the implementation can ensure that all messages sent on this QP represents 
a properly formatted encapsulation of Ethernet frames with legal header values.
Or the HCA can perform the encapsulation itself.

Knut
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 9/9] ib_verbs: Add a new qp create flag to request features for Ethernet over IB
       [not found]                 ` <1472814057.3975.47.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2016-09-02 16:19                   ` Santosh Shilimkar
       [not found]                     ` <b38c5cf6-4e07-33eb-8704-284498481bb6-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2016-09-02 16:34                   ` Jason Gunthorpe
  1 sibling, 1 reply; 39+ messages in thread
From: Santosh Shilimkar @ 2016-09-02 16:19 UTC (permalink / raw)
  To: Knut Omang, Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 9/2/2016 4:00 AM, Knut Omang wrote:
> On Fri, 2016-09-02 at 07:04 +0200, Knut Omang wrote:
>> On Thu, 2016-09-01 at 20:17 -0600, Jason Gunthorpe wrote:
>>>
>>> On Fri, Sep 02, 2016 at 02:09:29AM +0200, Knut Omang wrote:
>>>>
>>>> Some Infiniband HCAs need to know if a QP is going to
>>>> be used for Ethernet over IB (EOIB).
>>> You will have to send this after your driver. I recommend a patch
>>> proposing this functionality with your driver as the example
>>> implementation, along with a kernel user, as mellanox typically does.
>> It's a bit of a chicken and egg situation since the driver
>> depends on the patches. If I can 'tick off' this bit soon, I can move ahead and
>> get to the send the driver too.
>>
>> There's not much info in the driver - it just forwards that bit to a hardware bit
>> and it all happens from there. But hardware needs to know when to set that bit,
>> as it is only valid when operating as transport for Ethernet.
>>
Right. It comes down to removing mask filter at verbs layer and letting 
driver get the full view of it so that it can better deal with it based
on information.

>> Please - it would be a big help for me (saving a lot of work down the line that
>> is better spent working with the community than with handling more version issues later)
>> if I can get an indication that just that bit would be acceptable 8-D ...
>
> I will add the following further justification to the commit message for v2:
>
> Support for encapsulation of Ethernet over IB (EoIB) is a generic feature
> where different HCA implementations can include different special features.
> One example is that if the HCA knows the QP is going to be used for Ethernet,
> the implementation can ensure that all messages sent on this QP represents
> a properly formatted encapsulation of Ethernet frames with legal header values.
> Or the HCA can perform the encapsulation itself.
>
Don't want to hijack your patch but I also have seen another need where
ULP needs to pass additional information to driver for creating
RC qp with couple of add-ons . My current thought is to use reserved
QP flag bits (bits 26-31, for low level drivers' internal use) but
wanted to check if there is any objection to it.

Regards,
Santosh


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 9/9] ib_verbs: Add a new qp create flag to request features for Ethernet over IB
       [not found]                 ` <1472814057.3975.47.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2016-09-02 16:19                   ` Santosh Shilimkar
@ 2016-09-02 16:34                   ` Jason Gunthorpe
       [not found]                     ` <20160902163451.GC24997-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 39+ messages in thread
From: Jason Gunthorpe @ 2016-09-02 16:34 UTC (permalink / raw)
  To: Knut Omang; +Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 02, 2016 at 01:00:57PM +0200, Knut Omang wrote:
> On Fri, 2016-09-02 at 07:04 +0200, Knut Omang wrote:
> > On Thu, 2016-09-01 at 20:17 -0600, Jason Gunthorpe wrote:
> > > 
> > > On Fri, Sep 02, 2016 at 02:09:29AM +0200, Knut Omang wrote:
> > > > 
> > > > Some Infiniband HCAs need to know if a QP is going to
> > > > be used for Ethernet over IB (EOIB).
> > > You will have to send this after your driver. I recommend a patch
> > > proposing this functionality with your driver as the example
> > > implementation, along with a kernel user, as mellanox typically does.
> > It's a bit of a chicken and egg situation since the driver 
> > depends on the patches. If I can 'tick off' this bit soon, I can move ahead and 
> > get to the send the driver too.
> > 
> > There's not much info in the driver - it just forwards that bit to a hardware bit
> > and it all happens from there. But hardware needs to know when to set that bit,
> > as it is only valid when operating as transport for Ethernet.
> > 
> > Please - it would be a big help for me (saving a lot of work down the line that 
> > is better spent working with the community than with handling more version issues later) 
> > if I can get an indication that just that bit would be acceptable 8-D ...
> 
> I will add the following further justification to the commit message for v2:
> 
> Support for encapsulation of Ethernet over IB (EoIB) is a generic feature 
> where different HCA implementations can include different special features. 
> One example is that if the HCA knows the QP is going to be used for Ethernet, 
> the implementation can ensure that all messages sent on this QP represents 
> a properly formatted encapsulation of Ethernet frames with legal header values.
> Or the HCA can perform the encapsulation itself.

It would be better to remove support for this from your driver in the
initial submission.

Keep new uAPIs in new driver submissions to an absolute minimum please.

And as I said, you need to introduce a kernel user of this bit as
well, so the driver alone is not enough.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/9] ib_uverbs: Avoid vendor specific masking of attributes in query_qp
       [not found]             ` <1472791552.9410.258.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2016-09-02 17:10               ` Jason Gunthorpe
       [not found]                 ` <20160902171008.GE24997-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Jason Gunthorpe @ 2016-09-02 17:10 UTC (permalink / raw)
  To: Knut Omang; +Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 02, 2016 at 06:45:52AM +0200, Knut Omang wrote:
> On Thu, 2016-09-01 at 20:13 -0600, Jason Gunthorpe wrote:
> > On Fri, Sep 02, 2016 at 02:09:26AM +0200, Knut Omang wrote:
> > > This commit removes the implementation and use of the modify_qp_mask
> > > helper function from the generic OFED implementation and into individual
> > > device drivers.
> > > 
> > > Like with use of the ib_modify_qp_is_ok function it should be up to
> > > each device driver how to handle bits set in the attribute masks.
> > > 
> > > With the modify_qp_mask function applied in the generic code,
> > > drivers would not see the bits that the user process actually sets.
> > > 
> > > The restrictions imposed by the filter are also beyond what
> > > is imposed by the Infiniband standard, and would also limit
> > > future drivers or hardware from checking for unsupported or
> > > invalid settings.
> > 
> > I'm not excited about this direction. It is not OK for different
> > drivers to use different mask algorithms, they must all behave the
> > same.
> 
> The problem I am solving here is that SIF expects (and requires) some of 
> the bits that the user correctly sets, but that Mellanox either ignore or need to
> be 0. So when that mask is applied to the user input, the value that reaches
> the SIF driver is not correct from SIFs perspective. 
> If the values goes through as the ULP sets them, then all is fine.

I guess I still don't understand. SIF and mlx cannot have different
behavior here. How is the application writer to know what to do?

Can you give a concrete example of the problem?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 9/9] ib_verbs: Add a new qp create flag to request features for Ethernet over IB
       [not found]                     ` <b38c5cf6-4e07-33eb-8704-284498481bb6-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2016-09-02 17:11                       ` Jason Gunthorpe
       [not found]                         ` <20160902171107.GF24997-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Jason Gunthorpe @ 2016-09-02 17:11 UTC (permalink / raw)
  To: Santosh Shilimkar
  Cc: Knut Omang, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 02, 2016 at 09:19:43AM -0700, Santosh Shilimkar wrote:

> Don't want to hijack your patch but I also have seen another need where
> ULP needs to pass additional information to driver for creating
> RC qp with couple of add-ons . My current thought is to use reserved
> QP flag bits (bits 26-31, for low level drivers' internal use)

Nope, absolutely not.

Propose an actual verbs API extensions for what you want to do just
like Mellanox has been doing.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 9/9] ib_verbs: Add a new qp create flag to request features for Ethernet over IB
       [not found]                         ` <20160902171107.GF24997-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-09-02 17:14                           ` Santosh Shilimkar
  0 siblings, 0 replies; 39+ messages in thread
From: Santosh Shilimkar @ 2016-09-02 17:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Knut Omang, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 9/2/2016 10:11 AM, Jason Gunthorpe wrote:
> On Fri, Sep 02, 2016 at 09:19:43AM -0700, Santosh Shilimkar wrote:
>
>> Don't want to hijack your patch but I also have seen another need where
>> ULP needs to pass additional information to driver for creating
>> RC qp with couple of add-ons . My current thought is to use reserved
>> QP flag bits (bits 26-31, for low level drivers' internal use)
>
> Nope, absolutely not.
>
> Propose an actual verbs API extensions for what you want to do just
> like Mellanox has been doing.
>
Thanks Jason for comments. Will do !!

Regards,
Santosh
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/9] ib_uverbs: Avoid vendor specific masking of attributes in query_qp
       [not found]                 ` <20160902171008.GE24997-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-09-02 17:35                   ` Knut Omang
       [not found]                     ` <1472837728.9410.340.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Knut Omang @ 2016-09-02 17:35 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Fri, 2016-09-02 at 11:10 -0600, Jason Gunthorpe wrote:
> On Fri, Sep 02, 2016 at 06:45:52AM +0200, Knut Omang wrote:
> > On Thu, 2016-09-01 at 20:13 -0600, Jason Gunthorpe wrote:
> > > On Fri, Sep 02, 2016 at 02:09:26AM +0200, Knut Omang wrote:
> > > > This commit removes the implementation and use of the modify_qp_mask
> > > > helper function from the generic OFED implementation and into individual
> > > > device drivers.
> > > > 
> > > > Like with use of the ib_modify_qp_is_ok function it should be up to
> > > > each device driver how to handle bits set in the attribute masks.
> > > > 
> > > > With the modify_qp_mask function applied in the generic code,
> > > > drivers would not see the bits that the user process actually sets.
> > > > 
> > > > The restrictions imposed by the filter are also beyond what
> > > > is imposed by the Infiniband standard, and would also limit
> > > > future drivers or hardware from checking for unsupported or
> > > > invalid settings.
> > > 
> > > I'm not excited about this direction. It is not OK for different
> > > drivers to use different mask algorithms, they must all behave the
> > > same.
> > 
> > The problem I am solving here is that SIF expects (and requires) some of 
> > the bits that the user correctly sets, but that Mellanox either ignore or need to
> > be 0. So when that mask is applied to the user input, the value that reaches
> > the SIF driver is not correct from SIFs perspective. 
> > If the values goes through as the ULP sets them, then all is fine.
> 
> I guess I still don't understand. SIF and mlx cannot have different
> behavior here. 

Rest assure, being the new kid on the block here, of course we need to 
work with existing applications. We try to test with whatever we can get our 
hands on, and make sure we behave sensibly..

> How is the application writer to know what to do?

This masking applies to XRC only.
The application does not need to do anything, it treats the XRC QP as any other
RC QP and attempt similar modify_qp operations.
Mlx chooses to ignore some of the bits provided. 

> Can you give a concrete example of the problem?

Maybe someone from Mellanox can explain the reason 
for having the filtering there in the first place,
and only from user mode XRC QPs?

Knut

> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/9] ib_mad: incoming sminfo SMPs gets discarded if no process_mad function is registered
       [not found]     ` <1472774969-18997-2-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2016-09-06 14:01       ` Hal Rosenstock
       [not found]         ` <57867d7f-cc9f-5ec2-6632-c552e6469e40-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Hal Rosenstock @ 2016-09-06 14:01 UTC (permalink / raw)
  To: Knut Omang, Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Dag Moxnes

On 9/1/2016 8:09 PM, Knut Omang wrote:
> From: Dag Moxnes <dag.moxnes-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> 
> The process_mad function is an optional IB driver entry point
> allows a driver to intercept or modify MAD traffic.

process_mad is optional but currently that optionality is based on
whether MADs (QP0 and/or QP1) are supported or not. This is reflected in
the core_cap_flags and is related to the device type: IB including RoCE
v. non IB. The current in tree device drivers that do not support
process_mad are i40iw (iWARP) and usnic.

> This fix allows MAD traffic to flow down to the device also
> when MAD traffic is completely handled by the device

All MAD traffic is not handled by the device.

> and no process_mad function is provided.
> 
> Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/infiniband/core/mad.c | 6 ++++++
>  drivers/infiniband/core/smi.h | 6 ++----
>  2 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index 2d49228..ece33ec 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -813,6 +813,12 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>  			goto out;
>  	}
>  
> +	/* If device does not define the optional process_mad function it means that
> +	 * everything is handled in hardware:
> +	 */
> +	if (!device->process_mad)
> +		goto out;
> +

This changes the ib_post_send_mad flow for those devices in that it
takes the send rather than error path.

>  	local = kmalloc(sizeof *local, GFP_ATOMIC);
>  	if (!local) {
>  		ret = -ENOMEM;
> diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h
> index 33c91c8..16a9f9a 100644
> --- a/drivers/infiniband/core/smi.h
> +++ b/drivers/infiniband/core/smi.h
> @@ -67,8 +67,7 @@ static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
>  {
>  	/* C14-9:3 -- We're at the end of the DR segment of path */
>  	/* C14-9:4 -- Hop Pointer = Hop Count + 1 -> give to SMA/SM */
> -	return ((device->process_mad &&
> -		!ib_get_smp_direction(smp) &&
> +	return ((!ib_get_smp_direction(smp) &&
>  		(smp->hop_ptr == smp->hop_cnt + 1)) ?
>  		IB_SMI_HANDLE : IB_SMI_DISCARD);
>  }
> @@ -82,8 +81,7 @@ static inline enum smi_action smi_check_local_returning_smp(struct ib_smp *smp,
>  {
>  	/* C14-13:3 -- We're at the end of the DR segment of path */
>  	/* C14-13:4 -- Hop Pointer == 0 -> give to SM */
> -	return ((device->process_mad &&
> -		ib_get_smp_direction(smp) &&
> +	return ((ib_get_smp_direction(smp) &&
>  		!smp->hop_ptr) ? IB_SMI_HANDLE : IB_SMI_DISCARD);
>  }

Also, with this approach of optional process_mad for IB device, will/how
will sysfs port counters be supported for this device ? This currently
relies on process_mad.

It should be possible to implement a process_mad routine to return
ib_mad_result based on management class and perhaps attribute ID in the
case of SMInfo. Can this alternative approach be used for SIF ?

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/9] ib_mad: incoming sminfo SMPs gets discarded if no process_mad function is registered
       [not found]         ` <57867d7f-cc9f-5ec2-6632-c552e6469e40-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2016-09-07 11:42           ` Knut Omang
       [not found]             ` <1473248532.3103.51.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Knut Omang @ 2016-09-07 11:42 UTC (permalink / raw)
  To: Hal Rosenstock, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Dag Moxnes

On Tue, 2016-09-06 at 10:01 -0400, Hal Rosenstock wrote:
> On 9/1/2016 8:09 PM, Knut Omang wrote:
> > 
> > From: Dag Moxnes <dag.moxnes-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> > 
> > The process_mad function is an optional IB driver entry point
> > allows a driver to intercept or modify MAD traffic.
> 
> process_mad is optional but currently that optionality is based on
> whether MADs (QP0 and/or QP1) are supported or not. This is reflected in
> the core_cap_flags and is related to the device type: IB including RoCE
> v. non IB. The current in tree device drivers that do not support
> process_mad are i40iw (iWARP) and usnic.

I see - we are introducing a new case here, then.

> > 
> > This fix allows MAD traffic to flow down to the device also
> > when MAD traffic is completely handled by the device
>
> All MAD traffic is not handled by the device.

Yes, in our case all MAD packet handling is done by the device, the driver's task 
for MAD packets is just to forward.

> > 
> > and no process_mad function is provided.
> > 
> > Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> > ---
> >  drivers/infiniband/core/mad.c | 6 ++++++
> >  drivers/infiniband/core/smi.h | 6 ++----
> >  2 files changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> > index 2d49228..ece33ec 100644
> > --- a/drivers/infiniband/core/mad.c
> > +++ b/drivers/infiniband/core/mad.c
> > @@ -813,6 +813,12 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
> >  			goto out;
> >  	}
> >  
> > +	/* If device does not define the optional process_mad function it means that
> > +	 * everything is handled in hardware:
> > +	 */
> > +	if (!device->process_mad)
> > +		goto out;
> > +
> This changes the ib_post_send_mad flow for those devices in that it
> takes the send rather than error path.

Yes, but no packets will ever be received by this function for the i40iw and usnic 
devices because they have said in their capabilities that they do not support SMI?

> >  	local = kmalloc(sizeof *local, GFP_ATOMIC);
> >  	if (!local) {
> >  		ret = -ENOMEM;
> > diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h
> > index 33c91c8..16a9f9a 100644
> > --- a/drivers/infiniband/core/smi.h
> > +++ b/drivers/infiniband/core/smi.h
> > @@ -67,8 +67,7 @@ static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
> >  {
> >  	/* C14-9:3 -- We're at the end of the DR segment of path */
> >  	/* C14-9:4 -- Hop Pointer = Hop Count + 1 -> give to SMA/SM */
> > -	return ((device->process_mad &&
> > -		!ib_get_smp_direction(smp) &&
> > +	return ((!ib_get_smp_direction(smp) &&
> >  		(smp->hop_ptr == smp->hop_cnt + 1)) ?
> >  		IB_SMI_HANDLE : IB_SMI_DISCARD);
> >  }
> > @@ -82,8 +81,7 @@ static inline enum smi_action smi_check_local_returning_smp(struct ib_smp *smp,
> >  {
> >  	/* C14-13:3 -- We're at the end of the DR segment of path */
> >  	/* C14-13:4 -- Hop Pointer == 0 -> give to SM */
> > -	return ((device->process_mad &&
> > -		ib_get_smp_direction(smp) &&
> > +	return ((ib_get_smp_direction(smp) &&
> >  		!smp->hop_ptr) ? IB_SMI_HANDLE : IB_SMI_DISCARD);
> >  }
> Also, with this approach of optional process_mad for IB device, will/how
> will sysfs port counters be supported for this device ? This currently
> relies on process_mad.

Yes, that is actually a problem for us right now. 
We do however think that the solution of composing a packet to send via process_mad is 
kind of overkill as this doesn't allow devices to optimize the way to retrieve this info.

> It should be possible to implement a process_mad routine to return
> ib_mad_result based on management class and perhaps attribute ID in the
> case of SMInfo. Can this alternative approach be used for SIF ?

The problem is that the handle_outgoing_dr_smp function has an implicit assumption that 
some packets are handled by the process_mad function itself. There is no way to provide return 
values from the process_mad function that ensures that packets are always forwarded to the device,
so the only viable solution without breaking the API seems to be to not implement process_mad.

Thanks for the review,

Knut

> -- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 9/9] ib_verbs: Add a new qp create flag to request features for Ethernet over IB
       [not found]                     ` <20160902163451.GC24997-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-09-07 17:33                       ` Knut Omang
  0 siblings, 0 replies; 39+ messages in thread
From: Knut Omang @ 2016-09-07 17:33 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Fri, 2016-09-02 at 10:34 -0600, Jason Gunthorpe wrote:
> On Fri, Sep 02, 2016 at 01:00:57PM +0200, Knut Omang wrote:
> > On Fri, 2016-09-02 at 07:04 +0200, Knut Omang wrote:
> > > On Thu, 2016-09-01 at 20:17 -0600, Jason Gunthorpe wrote:
> > > > 
> > > > On Fri, Sep 02, 2016 at 02:09:29AM +0200, Knut Omang wrote:
> > > > > 
> > > > > Some Infiniband HCAs need to know if a QP is going to
> > > > > be used for Ethernet over IB (EOIB).
> > > > You will have to send this after your driver. I recommend a patch
> > > > proposing this functionality with your driver as the example
> > > > implementation, along with a kernel user, as mellanox typically does.
> > > It's a bit of a chicken and egg situation since the driver 
> > > depends on the patches. If I can 'tick off' this bit soon, I can move ahead and 
> > > get to the send the driver too.
> > > 
> > > There's not much info in the driver - it just forwards that bit to a hardware bit
> > > and it all happens from there. But hardware needs to know when to set that bit,
> > > as it is only valid when operating as transport for Ethernet.
> > > 
> > > Please - it would be a big help for me (saving a lot of work down the line that 
> > > is better spent working with the community than with handling more version issues later) 
> > > if I can get an indication that just that bit would be acceptable 8-D ...
> > 
> > I will add the following further justification to the commit message for v2:
> > 
> > Support for encapsulation of Ethernet over IB (EoIB) is a generic feature 
> > where different HCA implementations can include different special features. 
> > One example is that if the HCA knows the QP is going to be used for Ethernet, 
> > the implementation can ensure that all messages sent on this QP represents 
> > a properly formatted encapsulation of Ethernet frames with legal header values.
> > Or the HCA can perform the encapsulation itself.
> 
> It would be better to remove support for this from your driver in the
> initial submission.

Ok, will do.

> Keep new uAPIs in new driver submissions to an absolute minimum please.

I completely agree with you that APIs need a lot of care and consideration, 
rest assure ;-)

> And as I said, you need to introduce a kernel user of this bit as
> well, so the driver alone is not enough.

Ok.
The Ethernet driver using this is outside of my control.

Knut
 
> 
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/9] ib_mad: incoming sminfo SMPs gets discarded if no process_mad function is registered
       [not found]             ` <1473248532.3103.51.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2016-09-08 11:33               ` Hal Rosenstock
       [not found]                 ` <098f69ae-6940-589a-e9ad-c65e34c958b7-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Hal Rosenstock @ 2016-09-08 11:33 UTC (permalink / raw)
  To: Knut Omang, Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Dag Moxnes

On 9/7/2016 7:42 AM, Knut Omang wrote:
> On Tue, 2016-09-06 at 10:01 -0400, Hal Rosenstock wrote:
>> On 9/1/2016 8:09 PM, Knut Omang wrote:
>>>
>>> From: Dag Moxnes <dag.moxnes-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>
>>> The process_mad function is an optional IB driver entry point
>>> allows a driver to intercept or modify MAD traffic.
>>  
>> process_mad is optional but currently that optionality is based on
>> whether MADs (QP0 and/or QP1) are supported or not. This is reflected in
>> the core_cap_flags and is related to the device type: IB including RoCE
>> v. non IB. The current in tree device drivers that do not support
>> process_mad are i40iw (iWARP) and usnic.
> 
> I see - we are introducing a new case here, then.
>>>
>>> This fix allows MAD traffic to flow down to the device also
>>> when MAD traffic is completely handled by the device
>>
>> All MAD traffic is not handled by the device.
> 
> Yes, in our case all MAD packet handling is done by the device, 

SMInfo is not handled by device unless you have SM in hardware.
It also depends on management class, method, and direction
(outgoing/incoming) and in case of DR SMP the DR path.

> the driver's task for MAD packets is just to forward.

I think that it's a little more complex than that. See above.

Perhaps some of this is due to not understanding what the effects of
forwarding are with device such as SIF where SMA (and SMI) as well as
PMA are totally in hardware. If MAD is locally handled rather than
"forwarded", I assume that the device generates and sends the response
MAD, right ?

Do locally handled MADs work correctly ? Does OpenSM properly run on SIF
? I saw a post yesterday on issue with running OpenSM on SIF in terms of
obtaining the local PortInfo via DR SMP with hop count of 0.

>>>
>>> and no process_mad function is provided.
>>>
>>> Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>> ---
>>>  drivers/infiniband/core/mad.c | 6 ++++++
>>>  drivers/infiniband/core/smi.h | 6 ++----
>>>  2 files changed, 8 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>>> index 2d49228..ece33ec 100644
>>> --- a/drivers/infiniband/core/mad.c
>>> +++ b/drivers/infiniband/core/mad.c
>>> @@ -813,6 +813,12 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>>>  			goto out;
>>>  	}
>>>  
>>> +	/* If device does not define the optional process_mad function it means that
>>> +	 * everything is handled in hardware:
>>> +	 */
>>> +	if (!device->process_mad)
>>> +		goto out;
>>> +
>> This changes the ib_post_send_mad flow for those devices in that it
>> takes the send rather than error path.
> 
> Yes, but no packets will ever be received by this function for the i40iw and usnic 
> devices because they have said in their capabilities that they do not support SMI?

You're right - it shouldn't get here for those devices (it's based on
MAD rather than SMI cap) - umad open would fail for the port GUID.

>>>  	local = kmalloc(sizeof *local, GFP_ATOMIC);
>>>  	if (!local) {
>>>  		ret = -ENOMEM;
>>> diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h
>>> index 33c91c8..16a9f9a 100644
>>> --- a/drivers/infiniband/core/smi.h
>>> +++ b/drivers/infiniband/core/smi.h
>>> @@ -67,8 +67,7 @@ static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
>>>  {
>>>  	/* C14-9:3 -- We're at the end of the DR segment of path */
>>>  	/* C14-9:4 -- Hop Pointer = Hop Count + 1 -> give to SMA/SM */
>>> -	return ((device->process_mad &&
>>> -		!ib_get_smp_direction(smp) &&
>>> +	return ((!ib_get_smp_direction(smp) &&
>>>  		(smp->hop_ptr == smp->hop_cnt + 1)) ?
>>>  		IB_SMI_HANDLE : IB_SMI_DISCARD);
>>>  }
>>> @@ -82,8 +81,7 @@ static inline enum smi_action smi_check_local_returning_smp(struct ib_smp *smp,
>>>  {
>>>  	/* C14-13:3 -- We're at the end of the DR segment of path */
>>>  	/* C14-13:4 -- Hop Pointer == 0 -> give to SM */
>>> -	return ((device->process_mad &&
>>> -		ib_get_smp_direction(smp) &&
>>> +	return ((ib_get_smp_direction(smp) &&
>>>  		!smp->hop_ptr) ? IB_SMI_HANDLE : IB_SMI_DISCARD);
>>>  }
>> Also, with this approach of optional process_mad for IB device, will/how
>> will sysfs port counters be supported for this device ? This currently
>> relies on process_mad.
> 
> Yes, that is actually a problem for us right now. 
> We do however think that the solution of composing a packet to send via process_mad is 
> kind of overkill as this doesn't allow devices to optimize the way to retrieve this info.

Is there a way other than via PMA to get these counters ?

>> It should be possible to implement a process_mad routine to return
>> ib_mad_result based on management class and perhaps attribute ID in the
>> case of SMInfo. Can this alternative approach be used for SIF ?
> 
> The problem is that the handle_outgoing_dr_smp function has an implicit assumption that 
> some packets are handled by the process_mad function itself. There is no way to provide return 
> values from the process_mad function that ensures that packets are always forwarded to the device,
> so the only viable solution without breaking the API seems to be to not implement process_mad.

Are you referring to the difference between returning 0 when no
process_mad function as this patch does and what happens when
process_mad returns 0 inside of the ib_mad_post_send routine ? Doesn't
the send MAD need to be tracked in the MAD layer ? Is there some problem
invoking ib_send_mad (or ib_send_rmpp_mad) ?

-- Hal

> Thanks for the review,
> 
> Knut
> 
>> -- Hal
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/9] ib_mad: incoming sminfo SMPs gets discarded if no process_mad function is registered
       [not found]                 ` <098f69ae-6940-589a-e9ad-c65e34c958b7-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2016-09-08 17:22                   ` Knut Omang
       [not found]                     ` <1473355350.569.5.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Knut Omang @ 2016-09-08 17:22 UTC (permalink / raw)
  To: Hal Rosenstock, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Dag Moxnes

On Thu, 2016-09-08 at 07:33 -0400, Hal Rosenstock wrote:
> On 9/7/2016 7:42 AM, Knut Omang wrote:
>> On Tue, 2016-09-06 at 10:01 -0400, Hal Rosenstock wrote:
>>> On 9/1/2016 8:09 PM, Knut Omang wrote:
>>>>
>>>> From: Dag Moxnes <dag.moxnes-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>>
>>>> The process_mad function is an optional IB driver entry point
>>>> allows a driver to intercept or modify MAD traffic.
>>>
>>> process_mad is optional but currently that optionality is based on
>>> whether MADs (QP0 and/or QP1) are supported or not. This is reflected in
>>> the core_cap_flags and is related to the device type: IB including RoCE
>>> v. non IB. The current in tree device drivers that do not support
>>> process_mad are i40iw (iWARP) and usnic.
>>
>> I see - we are introducing a new case here, then.
>>>>
>>>> This fix allows MAD traffic to flow down to the device also
>>>> when MAD traffic is completely handled by the device
>>>
>>> All MAD traffic is not handled by the device.
>>
>> Yes, in our case all MAD packet handling is done by the device,
>
> SMInfo is not handled by device unless you have SM in hardware.
> It also depends on management class, method, and direction
> (outgoing/incoming) and in case of DR SMP the DR path.

Yes, with "handling" here we really mean "deciding what to do with it" -
in our case, the driver is not involved in those decisions at all.
That includes both inbound and outbound DR and LID routed SMInfo packets.

>> the driver's task for MAD packets is just to forward.
>
> I think that it's a little more complex than that. See above.
>
> Perhaps some of this is due to not understanding what the effects of
> forwarding are with device such as SIF where SMA (and SMI) as well as
> PMA are totally in hardware. If MAD is locally handled rather than
> "forwarded", I assume that the device generates and sends the response
> MAD, right ?

Yes, correct.

> Do locally handled MADs work correctly ?

Yes

> Does OpenSM properly run on SIF?

The goal is to have OpenSM work seamlessly with SIF, yes.

> I saw a post yesterday on issue with running OpenSM on SIF in terms of
> obtaining the local PortInfo via DR SMP with hop count of 0.

Do you have a link ref to this?

>>>> and no process_mad function is provided.
>>>>
>>>> Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>> ---
>>>>  drivers/infiniband/core/mad.c | 6 ++++++
>>>>  drivers/infiniband/core/smi.h | 6 ++----
>>>>  2 files changed, 8 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>>>> index 2d49228..ece33ec 100644
>>>> --- a/drivers/infiniband/core/mad.c
>>>> +++ b/drivers/infiniband/core/mad.c
>>>> @@ -813,6 +813,12 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>>>>  			goto out;
>>>>  	}
>>>>
>>>> +	/* If device does not define the optional process_mad function it means that
>>>> +	 * everything is handled in hardware:
>>>> +	 */
>>>> +	if (!device->process_mad)
>>>> +		goto out;
>>>> +
>>> This changes the ib_post_send_mad flow for those devices in that it
>>> takes the send rather than error path.
>>
>> Yes, but no packets will ever be received by this function for the i40iw and usnic
>> devices because they have said in their capabilities that they do not support SMI?
>
> You're right - it shouldn't get here for those devices (it's based on
> MAD rather than SMI cap) - umad open would fail for the port GUID.

Ok, I see - good!

>>>>  	local = kmalloc(sizeof *local, GFP_ATOMIC);
>>>>  	if (!local) {
>>>>  		ret = -ENOMEM;
>>>> diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h
>>>> index 33c91c8..16a9f9a 100644
>>>> --- a/drivers/infiniband/core/smi.h
>>>> +++ b/drivers/infiniband/core/smi.h
>>>> @@ -67,8 +67,7 @@ static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
>>>>  {
>>>>  	/* C14-9:3 -- We're at the end of the DR segment of path */
>>>>  	/* C14-9:4 -- Hop Pointer = Hop Count + 1 -> give to SMA/SM */
>>>> -	return ((device->process_mad &&
>>>> -		!ib_get_smp_direction(smp) &&
>>>> +	return ((!ib_get_smp_direction(smp) &&
>>>>  		(smp->hop_ptr == smp->hop_cnt + 1)) ?
>>>>  		IB_SMI_HANDLE : IB_SMI_DISCARD);
>>>>  }
>>>> @@ -82,8 +81,7 @@ static inline enum smi_action smi_check_local_returning_smp(struct ib_smp *smp,
>>>>  {
>>>>  	/* C14-13:3 -- We're at the end of the DR segment of path */
>>>>  	/* C14-13:4 -- Hop Pointer == 0 -> give to SM */
>>>> -	return ((device->process_mad &&
>>>> -		ib_get_smp_direction(smp) &&
>>>> +	return ((ib_get_smp_direction(smp) &&
>>>>  		!smp->hop_ptr) ? IB_SMI_HANDLE : IB_SMI_DISCARD);
>>>>  }
>>> Also, with this approach of optional process_mad for IB device, will/how
>>> will sysfs port counters be supported for this device ? This currently
>>> relies on process_mad.
>>
>> Yes, that is actually a problem for us right now.
>> We do however think that the solution of composing a packet to send via process_mad is
>> kind of overkill as this doesn't allow devices to optimize the way to retrieve this info.
>
> Is there a way other than via PMA to get these counters ?

Yes, the driver have a work request based model to request such info from the device.
This is partly firmware based so it can be extended/adapted if necessary.

>>> It should be possible to implement a process_mad routine to return
>>> ib_mad_result based on management class and perhaps attribute ID in the
>>> case of SMInfo. Can this alternative approach be used for SIF ?
>>
>> The problem is that the handle_outgoing_dr_smp function has an implicit assumption that
>> some packets are handled by the process_mad function itself. There is no way to provide return
>> values from the process_mad function that ensures that packets are always forwarded to the device,
>> so the only viable solution without breaking the API seems to be to not implement process_mad.
>
> Are you referring to the difference between returning 0 when no
> process_mad function as this patch does and what happens when
> process_mad returns 0 inside of the ib_mad_post_send routine ?

Yes, that sounds about right, it's been a while since we struggled with this.

> Doesn't the send MAD need to be tracked in the MAD layer ?

The MAD layer tracking happens above the driver, it creates and manages QP0 and sees all mad packets
that gets sent as with Connect-X<n> ?

> Is there some problem invoking ib_send_mad (or ib_send_rmpp_mad) ?

No, to my knowledge we are able to support the normal management
applications just as one would expect.

Thanks,
Knut

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/9] ib_mad: incoming sminfo SMPs gets discarded if no process_mad function is registered
       [not found]                     ` <1473355350.569.5.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2016-09-08 19:02                       ` Hal Rosenstock
       [not found]                         ` <2eddf795-71bb-1866-42d9-8a3ba3d512d4-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Hal Rosenstock @ 2016-09-08 19:02 UTC (permalink / raw)
  To: Knut Omang, Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Dag Moxnes

On 9/8/2016 1:22 PM, Knut Omang wrote:
> On Thu, 2016-09-08 at 07:33 -0400, Hal Rosenstock wrote:
>> On 9/7/2016 7:42 AM, Knut Omang wrote:
>>> On Tue, 2016-09-06 at 10:01 -0400, Hal Rosenstock wrote:
>>>> On 9/1/2016 8:09 PM, Knut Omang wrote:
>>>>>
>>>>> From: Dag Moxnes <dag.moxnes-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>>>
>>>>> The process_mad function is an optional IB driver entry point
>>>>> allows a driver to intercept or modify MAD traffic.
>>>>
>>>> process_mad is optional but currently that optionality is based on
>>>> whether MADs (QP0 and/or QP1) are supported or not. This is reflected in
>>>> the core_cap_flags and is related to the device type: IB including RoCE
>>>> v. non IB. The current in tree device drivers that do not support
>>>> process_mad are i40iw (iWARP) and usnic.
>>>
>>> I see - we are introducing a new case here, then.
>>>>>
>>>>> This fix allows MAD traffic to flow down to the device also
>>>>> when MAD traffic is completely handled by the device
>>>>
>>>> All MAD traffic is not handled by the device.
>>>
>>> Yes, in our case all MAD packet handling is done by the device,
>>
>> SMInfo is not handled by device unless you have SM in hardware.
>> It also depends on management class, method, and direction
>> (outgoing/incoming) and in case of DR SMP the DR path.
> 
> Yes, with "handling" here we really mean "deciding what to do with it" -
> in our case, the driver is not involved in those decisions at all.
> That includes both inbound and outbound DR and LID routed SMInfo packets.
> 
>>> the driver's task for MAD packets is just to forward.
>>
>> I think that it's a little more complex than that. See above.
>>
>> Perhaps some of this is due to not understanding what the effects of
>> forwarding are with device such as SIF where SMA (and SMI) as well as
>> PMA are totally in hardware. If MAD is locally handled rather than
>> "forwarded", I assume that the device generates and sends the response
>> MAD, right ?
> 
> Yes, correct.
> 
>> Do locally handled MADs work correctly ?
> 
> Yes
> 
>> Does OpenSM properly run on SIF?
> 
> The goal is to have OpenSM work seamlessly with SIF, yes.

More than whether it's a goal, I'm asking whether OpenSM currently works
seamlessly with SIF with your changes (including driver not yet
submitted to linux-rdma) ?

>> I saw a post yesterday on issue with running OpenSM on SIF in terms of
>> obtaining the local PortInfo via DR SMP with hop count of 0.
> 
> Do you have a link ref to this?

http://lists.openfabrics.org/pipermail/users/2016-September/000607.html

>>>>> and no process_mad function is provided.
>>>>>
>>>>> Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>>> ---
>>>>>  drivers/infiniband/core/mad.c | 6 ++++++
>>>>>  drivers/infiniband/core/smi.h | 6 ++----
>>>>>  2 files changed, 8 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>>>>> index 2d49228..ece33ec 100644
>>>>> --- a/drivers/infiniband/core/mad.c
>>>>> +++ b/drivers/infiniband/core/mad.c
>>>>> @@ -813,6 +813,12 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>>>>>  			goto out;
>>>>>  	}
>>>>>
>>>>> +	/* If device does not define the optional process_mad function it means that
>>>>> +	 * everything is handled in hardware:
>>>>> +	 */
>>>>> +	if (!device->process_mad)
>>>>> +		goto out;
>>>>> +
>>>> This changes the ib_post_send_mad flow for those devices in that it
>>>> takes the send rather than error path.
>>>
>>> Yes, but no packets will ever be received by this function for the i40iw and usnic
>>> devices because they have said in their capabilities that they do not support SMI?
>>
>> You're right - it shouldn't get here for those devices (it's based on
>> MAD rather than SMI cap) - umad open would fail for the port GUID.
> 
> Ok, I see - good!
> 
>>>>>  	local = kmalloc(sizeof *local, GFP_ATOMIC);
>>>>>  	if (!local) {
>>>>>  		ret = -ENOMEM;
>>>>> diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h
>>>>> index 33c91c8..16a9f9a 100644
>>>>> --- a/drivers/infiniband/core/smi.h
>>>>> +++ b/drivers/infiniband/core/smi.h
>>>>> @@ -67,8 +67,7 @@ static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
>>>>>  {
>>>>>  	/* C14-9:3 -- We're at the end of the DR segment of path */
>>>>>  	/* C14-9:4 -- Hop Pointer = Hop Count + 1 -> give to SMA/SM */
>>>>> -	return ((device->process_mad &&
>>>>> -		!ib_get_smp_direction(smp) &&
>>>>> +	return ((!ib_get_smp_direction(smp) &&
>>>>>  		(smp->hop_ptr == smp->hop_cnt + 1)) ?
>>>>>  		IB_SMI_HANDLE : IB_SMI_DISCARD);
>>>>>  }
>>>>> @@ -82,8 +81,7 @@ static inline enum smi_action smi_check_local_returning_smp(struct ib_smp *smp,
>>>>>  {
>>>>>  	/* C14-13:3 -- We're at the end of the DR segment of path */
>>>>>  	/* C14-13:4 -- Hop Pointer == 0 -> give to SM */
>>>>> -	return ((device->process_mad &&
>>>>> -		ib_get_smp_direction(smp) &&
>>>>> +	return ((ib_get_smp_direction(smp) &&
>>>>>  		!smp->hop_ptr) ? IB_SMI_HANDLE : IB_SMI_DISCARD);
>>>>>  }
>>>> Also, with this approach of optional process_mad for IB device, will/how
>>>> will sysfs port counters be supported for this device ? This currently
>>>> relies on process_mad.
>>>
>>> Yes, that is actually a problem for us right now.
>>> We do however think that the solution of composing a packet to send via process_mad is
>>> kind of overkill as this doesn't allow devices to optimize the way to retrieve this info.
>>
>> Is there a way other than via PMA to get these counters ?
> 
> Yes, the driver have a work request based model to request such info from the device.
> This is partly firmware based so it can be extended/adapted if necessary.

sysfs.c could be extended to support this in addition to the current PMA
approach for PortCounters/PortCountersExtended although PMA access to
your device needs to work anyhow for at least PortCounters. Perhaps a
new optional device driver entry point for get_port_stats taking
ib_device struct and port number and returning some stats struct ?

>>>> It should be possible to implement a process_mad routine to return
>>>> ib_mad_result based on management class and perhaps attribute ID in the
>>>> case of SMInfo. Can this alternative approach be used for SIF ?
>>>
>>> The problem is that the handle_outgoing_dr_smp function has an implicit assumption that
>>> some packets are handled by the process_mad function itself. There is no way to provide return
>>> values from the process_mad function that ensures that packets are always forwarded to the device,
>>> so the only viable solution without breaking the API seems to be to not implement process_mad.
>>
>> Are you referring to the difference between returning 0 when no
>> process_mad function as this patch does and what happens when
>> process_mad returns 0 inside of the ib_mad_post_send routine ?
> 
> Yes, that sounds about right, it's been a while since we struggled with this.
> 
>> Doesn't the send MAD need to be tracked in the MAD layer ?
> 
> The MAD layer tracking happens above the driver, it creates and manages QP0

and QP1 too

> and sees all mad packets that gets sent as with Connect-X<n> ?

Yes. That's what I'm referring to here as this is skipped since
process_mad is NULL. Why wouldn't this be needed with SIF ? What happens
if SIF device doesn't respond when it should ?
Is there some error scenario where no response would come back ?

-- Hal

>> Is there some problem invoking ib_send_mad (or ib_send_rmpp_mad) ?
> 
> No, to my knowledge we are able to support the normal management
> applications just as one would expect.
> 
> Thanks,
> Knut
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/9] ib_mad: incoming sminfo SMPs gets discarded if no process_mad function is registered
       [not found]                         ` <2eddf795-71bb-1866-42d9-8a3ba3d512d4-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2016-09-09  2:23                           ` Knut Omang
       [not found]                             ` <1473387784.569.17.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Knut Omang @ 2016-09-09  2:23 UTC (permalink / raw)
  To: Hal Rosenstock, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Dag Moxnes

On Thu, 2016-09-08 at 15:02 -0400, Hal Rosenstock wrote:
> On 9/8/2016 1:22 PM, Knut Omang wrote:
>> On Thu, 2016-09-08 at 07:33 -0400, Hal Rosenstock wrote:
>>> On 9/7/2016 7:42 AM, Knut Omang wrote:
>>>> On Tue, 2016-09-06 at 10:01 -0400, Hal Rosenstock wrote:
>>>>> On 9/1/2016 8:09 PM, Knut Omang wrote:
>>>>>>
>>>>>> From: Dag Moxnes <dag.moxnes-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>>>>
>>>>>> The process_mad function is an optional IB driver entry point
>>>>>> allows a driver to intercept or modify MAD traffic.
>>>>>
>>>>> process_mad is optional but currently that optionality is based on
>>>>> whether MADs (QP0 and/or QP1) are supported or not. This is reflected in
>>>>> the core_cap_flags and is related to the device type: IB including RoCE
>>>>> v. non IB. The current in tree device drivers that do not support
>>>>> process_mad are i40iw (iWARP) and usnic.
>>>>
>>>> I see - we are introducing a new case here, then.
>>>>>>
>>>>>> This fix allows MAD traffic to flow down to the device also
>>>>>> when MAD traffic is completely handled by the device
>>>>>
>>>>> All MAD traffic is not handled by the device.
>>>>
>>>> Yes, in our case all MAD packet handling is done by the device,
>>>
>>> SMInfo is not handled by device unless you have SM in hardware.
>>> It also depends on management class, method, and direction
>>> (outgoing/incoming) and in case of DR SMP the DR path.
>>
>> Yes, with "handling" here we really mean "deciding what to do with it" -
>> in our case, the driver is not involved in those decisions at all.
>> That includes both inbound and outbound DR and LID routed SMInfo packets.
>>
>>>> the driver's task for MAD packets is just to forward.
>>>
>>> I think that it's a little more complex than that. See above.
>>>
>>> Perhaps some of this is due to not understanding what the effects of
>>> forwarding are with device such as SIF where SMA (and SMI) as well as
>>> PMA are totally in hardware. If MAD is locally handled rather than
>>> "forwarded", I assume that the device generates and sends the response
>>> MAD, right ?
>>
>> Yes, correct.
>>
>>> Do locally handled MADs work correctly ?
>>
>> Yes
>>
>>> Does OpenSM properly run on SIF?
>>
>> The goal is to have OpenSM work seamlessly with SIF, yes.
>
> More than whether it's a goal, I'm asking whether OpenSM currently works
> seamlessly with SIF with your changes (including driver not yet
> submitted to linux-rdma) ?

Yes, the SM works seamlessly with this patch and the driver.

>>> I saw a post yesterday on issue with running OpenSM on SIF in terms of
>>> obtaining the local PortInfo via DR SMP with hop count of 0.
>>
>> Do you have a link ref to this?
>
> http://lists.openfabrics.org/pipermail/users/2016-September/000607.html

Probably not SIF ;-)

>>>>>> and no process_mad function is provided.
>>>>>>
>>>>>> Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>>>> ---
>>>>>>  drivers/infiniband/core/mad.c | 6 ++++++
>>>>>>  drivers/infiniband/core/smi.h | 6 ++----
>>>>>>  2 files changed, 8 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>>>>>> index 2d49228..ece33ec 100644
>>>>>> --- a/drivers/infiniband/core/mad.c
>>>>>> +++ b/drivers/infiniband/core/mad.c
>>>>>> @@ -813,6 +813,12 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>>>>>>  			goto out;
>>>>>>  	}
>>>>>>
>>>>>> +	/* If device does not define the optional process_mad function it means that
>>>>>> +	 * everything is handled in hardware:
>>>>>> +	 */
>>>>>> +	if (!device->process_mad)
>>>>>> +		goto out;
>>>>>> +
>>>>> This changes the ib_post_send_mad flow for those devices in that it
>>>>> takes the send rather than error path.
>>>>
>>>> Yes, but no packets will ever be received by this function for the i40iw and usnic
>>>> devices because they have said in their capabilities that they do not support SMI?
>>>
>>> You're right - it shouldn't get here for those devices (it's based on
>>> MAD rather than SMI cap) - umad open would fail for the port GUID.
>>
>> Ok, I see - good!
>>
>>>>>>  	local = kmalloc(sizeof *local, GFP_ATOMIC);
>>>>>>  	if (!local) {
>>>>>>  		ret = -ENOMEM;
>>>>>> diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h
>>>>>> index 33c91c8..16a9f9a 100644
>>>>>> --- a/drivers/infiniband/core/smi.h
>>>>>> +++ b/drivers/infiniband/core/smi.h
>>>>>> @@ -67,8 +67,7 @@ static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
>>>>>>  {
>>>>>>  	/* C14-9:3 -- We're at the end of the DR segment of path */
>>>>>>  	/* C14-9:4 -- Hop Pointer = Hop Count + 1 -> give to SMA/SM */
>>>>>> -	return ((device->process_mad &&
>>>>>> -		!ib_get_smp_direction(smp) &&
>>>>>> +	return ((!ib_get_smp_direction(smp) &&
>>>>>>  		(smp->hop_ptr == smp->hop_cnt + 1)) ?
>>>>>>  		IB_SMI_HANDLE : IB_SMI_DISCARD);
>>>>>>  }
>>>>>> @@ -82,8 +81,7 @@ static inline enum smi_action smi_check_local_returning_smp(struct ib_smp *smp,
>>>>>>  {
>>>>>>  	/* C14-13:3 -- We're at the end of the DR segment of path */
>>>>>>  	/* C14-13:4 -- Hop Pointer == 0 -> give to SM */
>>>>>> -	return ((device->process_mad &&
>>>>>> -		ib_get_smp_direction(smp) &&
>>>>>> +	return ((ib_get_smp_direction(smp) &&
>>>>>>  		!smp->hop_ptr) ? IB_SMI_HANDLE : IB_SMI_DISCARD);
>>>>>>  }
>>>>> Also, with this approach of optional process_mad for IB device, will/how
>>>>> will sysfs port counters be supported for this device ? This currently
>>>>> relies on process_mad.
>>>>
>>>> Yes, that is actually a problem for us right now.
>>>> We do however think that the solution of composing a packet to send via process_mad is
>>>> kind of overkill as this doesn't allow devices to optimize the way to retrieve this info.
>>>
>>> Is there a way other than via PMA to get these counters ?
>>
>> Yes, the driver have a work request based model to request such info from the device.
>> This is partly firmware based so it can be extended/adapted if necessary.
>
> sysfs.c could be extended to support this in addition to the current PMA
> approach for PortCounters/PortCountersExtended although PMA access to
> your device needs to work anyhow for at least PortCounters. Perhaps a
> new optional device driver entry point for get_port_stats taking
> ib_device struct and port number and returning some stats struct ?

Sounds good! We do support PMA as well.
>
>>>>> It should be possible to implement a process_mad routine to return
>>>>> ib_mad_result based on management class and perhaps attribute ID in the
>>>>> case of SMInfo. Can this alternative approach be used for SIF ?
>>>>
>>>> The problem is that the handle_outgoing_dr_smp function has an implicit assumption that
>>>> some packets are handled by the process_mad function itself. There is no way to provide return
>>>> values from the process_mad function that ensures that packets are always forwarded to the device,
>>>> so the only viable solution without breaking the API seems to be to not implement process_mad.
>>>
>>> Are you referring to the difference between returning 0 when no
>>> process_mad function as this patch does and what happens when
>>> process_mad returns 0 inside of the ib_mad_post_send routine ?
>>
>> Yes, that sounds about right, it's been a while since we struggled with this.
>>
>>> Doesn't the send MAD need to be tracked in the MAD layer ?
>>
>> The MAD layer tracking happens above the driver, it creates and manages QP0
>
> and QP1 too

Yes, a similar story applies for QP1.

>> and sees all mad packets that gets sent as with Connect-X<n> ?
>
> Yes. That's what I'm referring to here as this is skipped since
> process_mad is NULL. Why wouldn't this be needed with SIF ? What happens
> if SIF device doesn't respond when it should ?
> Is there some error scenario where no response would come back ?

On the contrary, the idea is that the Infiniband side of SIF is up and ready to
handle MAD traffic etc. long before the host has booted, or if the host
had crashed for some reason.

Knut

> -- Hal
>
>>> Is there some problem invoking ib_send_mad (or ib_send_rmpp_mad) ?
>>
>> No, to my knowledge we are able to support the normal management
>> applications just as one would expect.
>>
>> Thanks,
>> Knut
>>
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/9] ib_mad: incoming sminfo SMPs gets discarded if no process_mad function is registered
       [not found]                             ` <1473387784.569.17.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2016-09-09 14:24                               ` Hal Rosenstock
       [not found]                                 ` <37195bbc-0db0-8054-2174-8dc0faf7c692-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Hal Rosenstock @ 2016-09-09 14:24 UTC (permalink / raw)
  To: Knut Omang, Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Dag Moxnes

On 9/8/2016 10:23 PM, Knut Omang wrote:
> On Thu, 2016-09-08 at 15:02 -0400, Hal Rosenstock wrote:
>> On 9/8/2016 1:22 PM, Knut Omang wrote:
>>> On Thu, 2016-09-08 at 07:33 -0400, Hal Rosenstock wrote:
>>>> On 9/7/2016 7:42 AM, Knut Omang wrote:
>>>>> On Tue, 2016-09-06 at 10:01 -0400, Hal Rosenstock wrote:
>>>>>> On 9/1/2016 8:09 PM, Knut Omang wrote:
>>>>>>>
>>>>>>> From: Dag Moxnes <dag.moxnes-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>>>>>
>>>>>>> The process_mad function is an optional IB driver entry point
>>>>>>> allows a driver to intercept or modify MAD traffic.
>>>>>>
>>>>>> process_mad is optional but currently that optionality is based on
>>>>>> whether MADs (QP0 and/or QP1) are supported or not. This is reflected in
>>>>>> the core_cap_flags and is related to the device type: IB including RoCE
>>>>>> v. non IB. The current in tree device drivers that do not support
>>>>>> process_mad are i40iw (iWARP) and usnic.
>>>>>
>>>>> I see - we are introducing a new case here, then.
>>>>>>>
>>>>>>> This fix allows MAD traffic to flow down to the device also
>>>>>>> when MAD traffic is completely handled by the device
>>>>>>
>>>>>> All MAD traffic is not handled by the device.
>>>>>
>>>>> Yes, in our case all MAD packet handling is done by the device,
>>>>
>>>> SMInfo is not handled by device unless you have SM in hardware.
>>>> It also depends on management class, method, and direction
>>>> (outgoing/incoming) and in case of DR SMP the DR path.
>>>
>>> Yes, with "handling" here we really mean "deciding what to do with it" -
>>> in our case, the driver is not involved in those decisions at all.
>>> That includes both inbound and outbound DR and LID routed SMInfo packets.
>>>
>>>>> the driver's task for MAD packets is just to forward.
>>>>
>>>> I think that it's a little more complex than that. See above.
>>>>
>>>> Perhaps some of this is due to not understanding what the effects of
>>>> forwarding are with device such as SIF where SMA (and SMI) as well as
>>>> PMA are totally in hardware. If MAD is locally handled rather than
>>>> "forwarded", I assume that the device generates and sends the response
>>>> MAD, right ?
>>>
>>> Yes, correct.
>>>
>>>> Do locally handled MADs work correctly ?
>>>
>>> Yes
>>>
>>>> Does OpenSM properly run on SIF?
>>>
>>> The goal is to have OpenSM work seamlessly with SIF, yes.
>>
>> More than whether it's a goal, I'm asking whether OpenSM currently works
>> seamlessly with SIF with your changes (including driver not yet
>> submitted to linux-rdma) ?
> 
> Yes, the SM works seamlessly with this patch and the driver.
> 
>>>> I saw a post yesterday on issue with running OpenSM on SIF in terms of
>>>> obtaining the local PortInfo via DR SMP with hop count of 0.
>>>
>>> Do you have a link ref to this?
>>
>> http://lists.openfabrics.org/pipermail/users/2016-September/000607.html
> 
> Probably not SIF ;-)
> 
>>>>>>> and no process_mad function is provided.
>>>>>>>
>>>>>>> Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>>>>> ---
>>>>>>>  drivers/infiniband/core/mad.c | 6 ++++++
>>>>>>>  drivers/infiniband/core/smi.h | 6 ++----
>>>>>>>  2 files changed, 8 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>>>>>>> index 2d49228..ece33ec 100644
>>>>>>> --- a/drivers/infiniband/core/mad.c
>>>>>>> +++ b/drivers/infiniband/core/mad.c
>>>>>>> @@ -813,6 +813,12 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>>>>>>>  			goto out;
>>>>>>>  	}
>>>>>>>
>>>>>>> +	/* If device does not define the optional process_mad function it means that
>>>>>>> +	 * everything is handled in hardware:
>>>>>>> +	 */
>>>>>>> +	if (!device->process_mad)
>>>>>>> +		goto out;
>>>>>>> +
>>>>>> This changes the ib_post_send_mad flow for those devices in that it
>>>>>> takes the send rather than error path.
>>>>>
>>>>> Yes, but no packets will ever be received by this function for the i40iw and usnic
>>>>> devices because they have said in their capabilities that they do not support SMI?
>>>>
>>>> You're right - it shouldn't get here for those devices (it's based on
>>>> MAD rather than SMI cap) - umad open would fail for the port GUID.
>>>
>>> Ok, I see - good!
>>>
>>>>>>>  	local = kmalloc(sizeof *local, GFP_ATOMIC);
>>>>>>>  	if (!local) {
>>>>>>>  		ret = -ENOMEM;
>>>>>>> diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h
>>>>>>> index 33c91c8..16a9f9a 100644
>>>>>>> --- a/drivers/infiniband/core/smi.h
>>>>>>> +++ b/drivers/infiniband/core/smi.h
>>>>>>> @@ -67,8 +67,7 @@ static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
>>>>>>>  {
>>>>>>>  	/* C14-9:3 -- We're at the end of the DR segment of path */
>>>>>>>  	/* C14-9:4 -- Hop Pointer = Hop Count + 1 -> give to SMA/SM */
>>>>>>> -	return ((device->process_mad &&
>>>>>>> -		!ib_get_smp_direction(smp) &&
>>>>>>> +	return ((!ib_get_smp_direction(smp) &&
>>>>>>>  		(smp->hop_ptr == smp->hop_cnt + 1)) ?
>>>>>>>  		IB_SMI_HANDLE : IB_SMI_DISCARD);
>>>>>>>  }
>>>>>>> @@ -82,8 +81,7 @@ static inline enum smi_action smi_check_local_returning_smp(struct ib_smp *smp,
>>>>>>>  {
>>>>>>>  	/* C14-13:3 -- We're at the end of the DR segment of path */
>>>>>>>  	/* C14-13:4 -- Hop Pointer == 0 -> give to SM */
>>>>>>> -	return ((device->process_mad &&
>>>>>>> -		ib_get_smp_direction(smp) &&
>>>>>>> +	return ((ib_get_smp_direction(smp) &&
>>>>>>>  		!smp->hop_ptr) ? IB_SMI_HANDLE : IB_SMI_DISCARD);
>>>>>>>  }
>>>>>> Also, with this approach of optional process_mad for IB device, will/how
>>>>>> will sysfs port counters be supported for this device ? This currently
>>>>>> relies on process_mad.
>>>>>
>>>>> Yes, that is actually a problem for us right now.
>>>>> We do however think that the solution of composing a packet to send via process_mad is
>>>>> kind of overkill as this doesn't allow devices to optimize the way to retrieve this info.
>>>>
>>>> Is there a way other than via PMA to get these counters ?
>>>
>>> Yes, the driver have a work request based model to request such info from the device.
>>> This is partly firmware based so it can be extended/adapted if necessary.
>>
>> sysfs.c could be extended to support this in addition to the current PMA
>> approach for PortCounters/PortCountersExtended although PMA access to
>> your device needs to work anyhow for at least PortCounters. Perhaps a
>> new optional device driver entry point for get_port_stats taking
>> ib_device struct and port number and returning some stats struct ?
> 
> Sounds good! We do support PMA as well.
>>
>>>>>> It should be possible to implement a process_mad routine to return
>>>>>> ib_mad_result based on management class and perhaps attribute ID in the
>>>>>> case of SMInfo. Can this alternative approach be used for SIF ?
>>>>>
>>>>> The problem is that the handle_outgoing_dr_smp function has an implicit assumption that
>>>>> some packets are handled by the process_mad function itself. There is no way to provide return
>>>>> values from the process_mad function that ensures that packets are always forwarded to the device,
>>>>> so the only viable solution without breaking the API seems to be to not implement process_mad.
>>>>
>>>> Are you referring to the difference between returning 0 when no
>>>> process_mad function as this patch does and what happens when
>>>> process_mad returns 0 inside of the ib_mad_post_send routine ?
>>>
>>> Yes, that sounds about right, it's been a while since we struggled with this.
>>>
>>>> Doesn't the send MAD need to be tracked in the MAD layer ?
>>>
>>> The MAD layer tracking happens above the driver, it creates and manages QP0
>>
>> and QP1 too
> 
> Yes, a similar story applies for QP1.
> 
>>> and sees all mad packets that gets sent as with Connect-X<n> ?
>>
>> Yes. That's what I'm referring to here as this is skipped since
>> process_mad is NULL. Why wouldn't this be needed with SIF ? What happens
>> if SIF device doesn't respond when it should ?
>> Is there some error scenario where no response would come back ?
> 
> On the contrary, the idea is that the Infiniband side of SIF is up and ready to
> handle MAD traffic etc. long before the host has booted, or if the host
> had crashed for some reason.

No transmission medium (including backplanes) are perfect. Isn't there
some rare case where there is some bit error occurs causing bad CRC so
device never sees valid local MAD or response never makes it back ?

Anyhow, I think I see now. There is no such thing as local completion
with SIF so that case is handled like any other MAD send in
ib_post_send_mad code path (and that tracks the MAD and handles timeouts
in case response is not received to "transaction").

One minor thing:
Since SIF device includes SMA and PMA, there should be no need for mad
to "open" the agents. That can either be handled in mad.c:ib_mad_[init
remove]_device to skip ib_agent_port_[open close] when
device->process_mad is not defined or modify those routines in agent.c
to just return if device->process_mad is not defined.

-- Hal

> 
> Knut
> 
>> -- Hal
>>
>>>> Is there some problem invoking ib_send_mad (or ib_send_rmpp_mad) ?
>>>
>>> No, to my knowledge we are able to support the normal management
>>> applications just as one would expect.
>>>
>>> Thanks,
>>> Knut
>>>
>>>
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/9] ib_mad: incoming sminfo SMPs gets discarded if no process_mad function is registered
       [not found]                                 ` <37195bbc-0db0-8054-2174-8dc0faf7c692-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2016-09-12  9:09                                   ` Knut Omang
       [not found]                                     ` <1473671364.17998.17.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Knut Omang @ 2016-09-12  9:09 UTC (permalink / raw)
  To: Hal Rosenstock, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Dag Moxnes

On Fri, 2016-09-09 at 10:24 -0400, Hal Rosenstock wrote:
> On 9/8/2016 10:23 PM, Knut Omang wrote:
> > 
> > On Thu, 2016-09-08 at 15:02 -0400, Hal Rosenstock wrote:
> > > 
> > > On 9/8/2016 1:22 PM, Knut Omang wrote:
> > > > 
> > > > On Thu, 2016-09-08 at 07:33 -0400, Hal Rosenstock wrote:
> > > > > 
> > > > > On 9/7/2016 7:42 AM, Knut Omang wrote:
> > > > > > 
> > > > > > On Tue, 2016-09-06 at 10:01 -0400, Hal Rosenstock wrote:
> > > > > > > 
> > > > > > > On 9/1/2016 8:09 PM, Knut Omang wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > From: Dag Moxnes <dag.moxnes-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> > > > > > > > 
> > > > > > > > The process_mad function is an optional IB driver entry point
> > > > > > > > allows a driver to intercept or modify MAD traffic.
> > > > > > > process_mad is optional but currently that optionality is based on
> > > > > > > whether MADs (QP0 and/or QP1) are supported or not. This is reflected in
> > > > > > > the core_cap_flags and is related to the device type: IB including RoCE
> > > > > > > v. non IB. The current in tree device drivers that do not support
> > > > > > > process_mad are i40iw (iWARP) and usnic.
> > > > > > I see - we are introducing a new case here, then.
> > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > This fix allows MAD traffic to flow down to the device also
> > > > > > > > when MAD traffic is completely handled by the device
> > > > > > > All MAD traffic is not handled by the device.
> > > > > > Yes, in our case all MAD packet handling is done by the device,
> > > > > SMInfo is not handled by device unless you have SM in hardware.
> > > > > It also depends on management class, method, and direction
> > > > > (outgoing/incoming) and in case of DR SMP the DR path.
> > > > Yes, with "handling" here we really mean "deciding what to do with it" -
> > > > in our case, the driver is not involved in those decisions at all.
> > > > That includes both inbound and outbound DR and LID routed SMInfo packets.
> > > > 
> > > > > 
> > > > > > 
> > > > > > the driver's task for MAD packets is just to forward.
> > > > > I think that it's a little more complex than that. See above.
> > > > > 
> > > > > Perhaps some of this is due to not understanding what the effects of
> > > > > forwarding are with device such as SIF where SMA (and SMI) as well as
> > > > > PMA are totally in hardware. If MAD is locally handled rather than
> > > > > "forwarded", I assume that the device generates and sends the response
> > > > > MAD, right ?
> > > > Yes, correct.
> > > > 
> > > > > 
> > > > > Do locally handled MADs work correctly ?
> > > > Yes
> > > > 
> > > > > 
> > > > > Does OpenSM properly run on SIF?
> > > > The goal is to have OpenSM work seamlessly with SIF, yes.
> > > More than whether it's a goal, I'm asking whether OpenSM currently works
> > > seamlessly with SIF with your changes (including driver not yet
> > > submitted to linux-rdma) ?
> > Yes, the SM works seamlessly with this patch and the driver.
> > 
> > > 
> > > > 
> > > > > 
> > > > > I saw a post yesterday on issue with running OpenSM on SIF in terms of
> > > > > obtaining the local PortInfo via DR SMP with hop count of 0.
> > > > Do you have a link ref to this?
> > > http://lists.openfabrics.org/pipermail/users/2016-September/000607.html
> > Probably not SIF ;-)
> > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > > 
> > > > > > > > and no process_mad function is provided.
> > > > > > > > 
> > > > > > > > Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> > > > > > > > ---
> > > > > > > >  drivers/infiniband/core/mad.c | 6 ++++++
> > > > > > > >  drivers/infiniband/core/smi.h | 6 ++----
> > > > > > > >  2 files changed, 8 insertions(+), 4 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> > > > > > > > index 2d49228..ece33ec 100644
> > > > > > > > --- a/drivers/infiniband/core/mad.c
> > > > > > > > +++ b/drivers/infiniband/core/mad.c
> > > > > > > > @@ -813,6 +813,12 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
> > > > > > > >  			goto out;
> > > > > > > >  	}
> > > > > > > > 
> > > > > > > > +	/* If device does not define the optional process_mad function it means that
> > > > > > > > +	 * everything is handled in hardware:
> > > > > > > > +	 */
> > > > > > > > +	if (!device->process_mad)
> > > > > > > > +		goto out;
> > > > > > > > +
> > > > > > > This changes the ib_post_send_mad flow for those devices in that it
> > > > > > > takes the send rather than error path.
> > > > > > Yes, but no packets will ever be received by this function for the i40iw and usnic
> > > > > > devices because they have said in their capabilities that they do not support SMI?
> > > > > You're right - it shouldn't get here for those devices (it's based on
> > > > > MAD rather than SMI cap) - umad open would fail for the port GUID.
> > > > Ok, I see - good!
> > > > 
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > > 
> > > > > > > >  	local = kmalloc(sizeof *local, GFP_ATOMIC);
> > > > > > > >  	if (!local) {
> > > > > > > >  		ret = -ENOMEM;
> > > > > > > > diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h
> > > > > > > > index 33c91c8..16a9f9a 100644
> > > > > > > > --- a/drivers/infiniband/core/smi.h
> > > > > > > > +++ b/drivers/infiniband/core/smi.h
> > > > > > > > @@ -67,8 +67,7 @@ static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
> > > > > > > >  {
> > > > > > > >  	/* C14-9:3 -- We're at the end of the DR segment of path */
> > > > > > > >  	/* C14-9:4 -- Hop Pointer = Hop Count + 1 -> give to SMA/SM */
> > > > > > > > -	return ((device->process_mad &&
> > > > > > > > -		!ib_get_smp_direction(smp) &&
> > > > > > > > +	return ((!ib_get_smp_direction(smp) &&
> > > > > > > >  		(smp->hop_ptr == smp->hop_cnt + 1)) ?
> > > > > > > >  		IB_SMI_HANDLE : IB_SMI_DISCARD);
> > > > > > > >  }
> > > > > > > > @@ -82,8 +81,7 @@ static inline enum smi_action smi_check_local_returning_smp(struct ib_smp *smp,
> > > > > > > >  {
> > > > > > > >  	/* C14-13:3 -- We're at the end of the DR segment of path */
> > > > > > > >  	/* C14-13:4 -- Hop Pointer == 0 -> give to SM */
> > > > > > > > -	return ((device->process_mad &&
> > > > > > > > -		ib_get_smp_direction(smp) &&
> > > > > > > > +	return ((ib_get_smp_direction(smp) &&
> > > > > > > >  		!smp->hop_ptr) ? IB_SMI_HANDLE : IB_SMI_DISCARD);
> > > > > > > >  }
> > > > > > > Also, with this approach of optional process_mad for IB device, will/how
> > > > > > > will sysfs port counters be supported for this device ? This currently
> > > > > > > relies on process_mad.
> > > > > > Yes, that is actually a problem for us right now.
> > > > > > We do however think that the solution of composing a packet to send via process_mad is
> > > > > > kind of overkill as this doesn't allow devices to optimize the way to retrieve this info.
> > > > > Is there a way other than via PMA to get these counters ?
> > > > Yes, the driver have a work request based model to request such info from the device.
> > > > This is partly firmware based so it can be extended/adapted if necessary.
> > > sysfs.c could be extended to support this in addition to the current PMA
> > > approach for PortCounters/PortCountersExtended although PMA access to
> > > your device needs to work anyhow for at least PortCounters. Perhaps a
> > > new optional device driver entry point for get_port_stats taking
> > > ib_device struct and port number and returning some stats struct ?
> > Sounds good! We do support PMA as well.
> > > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > It should be possible to implement a process_mad routine to return
> > > > > > > ib_mad_result based on management class and perhaps attribute ID in the
> > > > > > > case of SMInfo. Can this alternative approach be used for SIF ?
> > > > > > The problem is that the handle_outgoing_dr_smp function has an implicit assumption that
> > > > > > some packets are handled by the process_mad function itself. There is no way to provide return
> > > > > > values from the process_mad function that ensures that packets are always forwarded to the device,
> > > > > > so the only viable solution without breaking the API seems to be to not implement process_mad.
> > > > > Are you referring to the difference between returning 0 when no
> > > > > process_mad function as this patch does and what happens when
> > > > > process_mad returns 0 inside of the ib_mad_post_send routine ?
> > > > Yes, that sounds about right, it's been a while since we struggled with this.
> > > > 
> > > > > 
> > > > > Doesn't the send MAD need to be tracked in the MAD layer ?
> > > > The MAD layer tracking happens above the driver, it creates and manages QP0
> > > and QP1 too
> > Yes, a similar story applies for QP1.
> > 
> > > 
> > > > 
> > > > and sees all mad packets that gets sent as with Connect-X<n> ?
> > > Yes. That's what I'm referring to here as this is skipped since
> > > process_mad is NULL. Why wouldn't this be needed with SIF ? What happens
> > > if SIF device doesn't respond when it should ?
> > > Is there some error scenario where no response would come back ?
> > On the contrary, the idea is that the Infiniband side of SIF is up and ready to
> > handle MAD traffic etc. long before the host has booted, or if the host
> > had crashed for some reason.
> No transmission medium (including backplanes) are perfect. Isn't there
> some rare case where there is some bit error occurs causing bad CRC so
> device never sees valid local MAD or response never makes it back ?

My understanding is that this would then be a PCIe error case - relatively fatal 
from the host's perspective? Of course I am not claiming in any way that 
fatal errors can't happen...

> Anyhow, I think I see now. There is no such thing as local completion
> with SIF so that case is handled like any other MAD send in
> ib_post_send_mad code path (and that tracks the MAD and handles timeouts
> in case response is not received to "transaction").
> 
> One minor thing:
> Since SIF device includes SMA and PMA, there should be no need for mad
> to "open" the agents. That can either be handled in mad.c:ib_mad_[init
> remove]_device to skip ib_agent_port_[open close] when
> device->process_mad is not defined or modify those routines in agent.c
> to just return if device->process_mad is not defined.

It's been a while since I looked at the details there - I would have to revisit more carefully.
We still need to get QP0 and QP1 "created" so I think the current API is satisfactory from 
our perspective. That's probably the easiest for v1 - 
then we can look at further further improvements
once the code is in?

Thanks,
Knut

> 
> -- Hal
> 
> > 
> > 
> > Knut
> > 
> > > 
> > > -- Hal
> > > 
> > > > 
> > > > > 
> > > > > Is there some problem invoking ib_send_mad (or ib_send_rmpp_mad) ?
> > > > No, to my knowledge we are able to support the normal management
> > > > applications just as one would expect.
> > > > 
> > > > Thanks,
> > > > Knut
> > > > 
> > > > 
> > 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/9] ib_mad: incoming sminfo SMPs gets discarded if no process_mad function is registered
       [not found]                                     ` <1473671364.17998.17.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2016-09-12 14:03                                       ` Hal Rosenstock
  0 siblings, 0 replies; 39+ messages in thread
From: Hal Rosenstock @ 2016-09-12 14:03 UTC (permalink / raw)
  To: Knut Omang, Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Dag Moxnes

On 9/12/2016 5:09 AM, Knut Omang wrote:
> On Fri, 2016-09-09 at 10:24 -0400, Hal Rosenstock wrote:
>> On 9/8/2016 10:23 PM, Knut Omang wrote:
>>>
>>> On Thu, 2016-09-08 at 15:02 -0400, Hal Rosenstock wrote:
>>>>
>>>> On 9/8/2016 1:22 PM, Knut Omang wrote:
>>>>>
>>>>> On Thu, 2016-09-08 at 07:33 -0400, Hal Rosenstock wrote:
>>>>>>
>>>>>> On 9/7/2016 7:42 AM, Knut Omang wrote:
>>>>>>>
>>>>>>> On Tue, 2016-09-06 at 10:01 -0400, Hal Rosenstock wrote:
>>>>>>>>
>>>>>>>> On 9/1/2016 8:09 PM, Knut Omang wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> From: Dag Moxnes <dag.moxnes-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>>>>>>>
>>>>>>>>> The process_mad function is an optional IB driver entry point
>>>>>>>>> allows a driver to intercept or modify MAD traffic.
>>>>>>>> process_mad is optional but currently that optionality is based on
>>>>>>>> whether MADs (QP0 and/or QP1) are supported or not. This is reflected in
>>>>>>>> the core_cap_flags and is related to the device type: IB including RoCE
>>>>>>>> v. non IB. The current in tree device drivers that do not support
>>>>>>>> process_mad are i40iw (iWARP) and usnic.
>>>>>>> I see - we are introducing a new case here, then.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This fix allows MAD traffic to flow down to the device also
>>>>>>>>> when MAD traffic is completely handled by the device
>>>>>>>> All MAD traffic is not handled by the device.
>>>>>>> Yes, in our case all MAD packet handling is done by the device,
>>>>>> SMInfo is not handled by device unless you have SM in hardware.
>>>>>> It also depends on management class, method, and direction
>>>>>> (outgoing/incoming) and in case of DR SMP the DR path.
>>>>> Yes, with "handling" here we really mean "deciding what to do with it" -
>>>>> in our case, the driver is not involved in those decisions at all.
>>>>> That includes both inbound and outbound DR and LID routed SMInfo packets.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> the driver's task for MAD packets is just to forward.
>>>>>> I think that it's a little more complex than that. See above.
>>>>>>
>>>>>> Perhaps some of this is due to not understanding what the effects of
>>>>>> forwarding are with device such as SIF where SMA (and SMI) as well as
>>>>>> PMA are totally in hardware. If MAD is locally handled rather than
>>>>>> "forwarded", I assume that the device generates and sends the response
>>>>>> MAD, right ?
>>>>> Yes, correct.
>>>>>
>>>>>>
>>>>>> Do locally handled MADs work correctly ?
>>>>> Yes
>>>>>
>>>>>>
>>>>>> Does OpenSM properly run on SIF?
>>>>> The goal is to have OpenSM work seamlessly with SIF, yes.
>>>> More than whether it's a goal, I'm asking whether OpenSM currently works
>>>> seamlessly with SIF with your changes (including driver not yet
>>>> submitted to linux-rdma) ?
>>> Yes, the SM works seamlessly with this patch and the driver.
>>>
>>>>
>>>>>
>>>>>>
>>>>>> I saw a post yesterday on issue with running OpenSM on SIF in terms of
>>>>>> obtaining the local PortInfo via DR SMP with hop count of 0.
>>>>> Do you have a link ref to this?
>>>> http://lists.openfabrics.org/pipermail/users/2016-September/000607.html
>>> Probably not SIF ;-)
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> and no process_mad function is provided.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>>>>>>> ---
>>>>>>>>>  drivers/infiniband/core/mad.c | 6 ++++++
>>>>>>>>>  drivers/infiniband/core/smi.h | 6 ++----
>>>>>>>>>  2 files changed, 8 insertions(+), 4 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>>>>>>>>> index 2d49228..ece33ec 100644
>>>>>>>>> --- a/drivers/infiniband/core/mad.c
>>>>>>>>> +++ b/drivers/infiniband/core/mad.c
>>>>>>>>> @@ -813,6 +813,12 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
>>>>>>>>>  			goto out;
>>>>>>>>>  	}
>>>>>>>>>
>>>>>>>>> +	/* If device does not define the optional process_mad function it means that
>>>>>>>>> +	 * everything is handled in hardware:
>>>>>>>>> +	 */
>>>>>>>>> +	if (!device->process_mad)
>>>>>>>>> +		goto out;
>>>>>>>>> +
>>>>>>>> This changes the ib_post_send_mad flow for those devices in that it
>>>>>>>> takes the send rather than error path.
>>>>>>> Yes, but no packets will ever be received by this function for the i40iw and usnic
>>>>>>> devices because they have said in their capabilities that they do not support SMI?
>>>>>> You're right - it shouldn't get here for those devices (it's based on
>>>>>> MAD rather than SMI cap) - umad open would fail for the port GUID.
>>>>> Ok, I see - good!
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>  	local = kmalloc(sizeof *local, GFP_ATOMIC);
>>>>>>>>>  	if (!local) {
>>>>>>>>>  		ret = -ENOMEM;
>>>>>>>>> diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h
>>>>>>>>> index 33c91c8..16a9f9a 100644
>>>>>>>>> --- a/drivers/infiniband/core/smi.h
>>>>>>>>> +++ b/drivers/infiniband/core/smi.h
>>>>>>>>> @@ -67,8 +67,7 @@ static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
>>>>>>>>>  {
>>>>>>>>>  	/* C14-9:3 -- We're at the end of the DR segment of path */
>>>>>>>>>  	/* C14-9:4 -- Hop Pointer = Hop Count + 1 -> give to SMA/SM */
>>>>>>>>> -	return ((device->process_mad &&
>>>>>>>>> -		!ib_get_smp_direction(smp) &&
>>>>>>>>> +	return ((!ib_get_smp_direction(smp) &&
>>>>>>>>>  		(smp->hop_ptr == smp->hop_cnt + 1)) ?
>>>>>>>>>  		IB_SMI_HANDLE : IB_SMI_DISCARD);
>>>>>>>>>  }
>>>>>>>>> @@ -82,8 +81,7 @@ static inline enum smi_action smi_check_local_returning_smp(struct ib_smp *smp,
>>>>>>>>>  {
>>>>>>>>>  	/* C14-13:3 -- We're at the end of the DR segment of path */
>>>>>>>>>  	/* C14-13:4 -- Hop Pointer == 0 -> give to SM */
>>>>>>>>> -	return ((device->process_mad &&
>>>>>>>>> -		ib_get_smp_direction(smp) &&
>>>>>>>>> +	return ((ib_get_smp_direction(smp) &&
>>>>>>>>>  		!smp->hop_ptr) ? IB_SMI_HANDLE : IB_SMI_DISCARD);
>>>>>>>>>  }
>>>>>>>> Also, with this approach of optional process_mad for IB device, will/how
>>>>>>>> will sysfs port counters be supported for this device ? This currently
>>>>>>>> relies on process_mad.
>>>>>>> Yes, that is actually a problem for us right now.
>>>>>>> We do however think that the solution of composing a packet to send via process_mad is
>>>>>>> kind of overkill as this doesn't allow devices to optimize the way to retrieve this info.
>>>>>> Is there a way other than via PMA to get these counters ?
>>>>> Yes, the driver have a work request based model to request such info from the device.
>>>>> This is partly firmware based so it can be extended/adapted if necessary.
>>>> sysfs.c could be extended to support this in addition to the current PMA
>>>> approach for PortCounters/PortCountersExtended although PMA access to
>>>> your device needs to work anyhow for at least PortCounters. Perhaps a
>>>> new optional device driver entry point for get_port_stats taking
>>>> ib_device struct and port number and returning some stats struct ?
>>> Sounds good! We do support PMA as well.
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> It should be possible to implement a process_mad routine to return
>>>>>>>> ib_mad_result based on management class and perhaps attribute ID in the
>>>>>>>> case of SMInfo. Can this alternative approach be used for SIF ?
>>>>>>> The problem is that the handle_outgoing_dr_smp function has an implicit assumption that
>>>>>>> some packets are handled by the process_mad function itself. There is no way to provide return
>>>>>>> values from the process_mad function that ensures that packets are always forwarded to the device,
>>>>>>> so the only viable solution without breaking the API seems to be to not implement process_mad.
>>>>>> Are you referring to the difference between returning 0 when no
>>>>>> process_mad function as this patch does and what happens when
>>>>>> process_mad returns 0 inside of the ib_mad_post_send routine ?
>>>>> Yes, that sounds about right, it's been a while since we struggled with this.
>>>>>
>>>>>>
>>>>>> Doesn't the send MAD need to be tracked in the MAD layer ?
>>>>> The MAD layer tracking happens above the driver, it creates and manages QP0
>>>> and QP1 too
>>> Yes, a similar story applies for QP1.
>>>
>>>>
>>>>>
>>>>> and sees all mad packets that gets sent as with Connect-X<n> ?
>>>> Yes. That's what I'm referring to here as this is skipped since
>>>> process_mad is NULL. Why wouldn't this be needed with SIF ? What happens
>>>> if SIF device doesn't respond when it should ?
>>>> Is there some error scenario where no response would come back ?
>>> On the contrary, the idea is that the Infiniband side of SIF is up and ready to
>>> handle MAD traffic etc. long before the host has booted, or if the host
>>> had crashed for some reason.
>> No transmission medium (including backplanes) are perfect. Isn't there
>> some rare case where there is some bit error occurs causing bad CRC so
>> device never sees valid local MAD or response never makes it back ?
> 
> My understanding is that this would then be a PCIe error case - relatively fatal 
> from the host's perspective? Of course I am not claiming in any way that 
> fatal errors can't happen...
> 
>> Anyhow, I think I see now. There is no such thing as local completion
>> with SIF so that case is handled like any other MAD send in
>> ib_post_send_mad code path (and that tracks the MAD and handles timeouts
>> in case response is not received to "transaction").
>>
>> One minor thing:
>> Since SIF device includes SMA and PMA, there should be no need for mad
>> to "open" the agents. That can either be handled in mad.c:ib_mad_[init
>> remove]_device to skip ib_agent_port_[open close] when
>> device->process_mad is not defined or modify those routines in agent.c
>> to just return if device->process_mad is not defined.
> 
> It's been a while since I looked at the details there - I would have to revisit more carefully.
> We still need to get QP0 and QP1 "created"

They would be created based on rdma_cap_ib_[ mad smi] in
ib_mad_port_open regardless of whether the agents are opened or not.
It's just that the agents (SMA and PMA) would not be registered.

> so I think the current API is satisfactory from 
> our perspective. That's probably the easiest for v1 - 
> then we can look at further further improvements
> once the code is in?

Yes, this is merely "optimization" in that it eliminates unneeded MAD
registrations and should have no other operational impact. Those agents
should never receive or send anything.

-- Hal

> Thanks,
> Knut
> 
>>
>> -- Hal
>>
>>>
>>>
>>> Knut
>>>
>>>>
>>>> -- Hal
>>>>
>>>>>
>>>>>>
>>>>>> Is there some problem invoking ib_send_mad (or ib_send_rmpp_mad) ?
>>>>> No, to my knowledge we are able to support the normal management
>>>>> applications just as one would expect.
>>>>>
>>>>> Thanks,
>>>>> Knut
>>>>>
>>>>>
>>>
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/9] ib_uverbs: Avoid vendor specific masking of attributes in query_qp
       [not found]                     ` <1472837728.9410.340.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2016-09-12 15:05                       ` Knut Omang
  0 siblings, 0 replies; 39+ messages in thread
From: Knut Omang @ 2016-09-12 15:05 UTC (permalink / raw)
  To: Sean Hefty
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe

On Fri, 2016-09-02 at 19:35 +0200, Knut Omang wrote:
> On Fri, 2016-09-02 at 11:10 -0600, Jason Gunthorpe wrote:
> > On Fri, Sep 02, 2016 at 06:45:52AM +0200, Knut Omang wrote:
> > > On Thu, 2016-09-01 at 20:13 -0600, Jason Gunthorpe wrote:
> > > > On Fri, Sep 02, 2016 at 02:09:26AM +0200, Knut Omang wrote:
> > > > > This commit removes the implementation and use of the modify_qp_mask
> > > > > helper function from the generic OFED implementation and into individual
> > > > > device drivers.
> > > > > 
> > > > > Like with use of the ib_modify_qp_is_ok function it should be up to
> > > > > each device driver how to handle bits set in the attribute masks.
> > > > > 
> > > > > With the modify_qp_mask function applied in the generic code,
> > > > > drivers would not see the bits that the user process actually sets.
> > > > > 
> > > > > The restrictions imposed by the filter are also beyond what
> > > > > is imposed by the Infiniband standard, and would also limit
> > > > > future drivers or hardware from checking for unsupported or
> > > > > invalid settings.
> > > > 
> > > > I'm not excited about this direction. It is not OK for different
> > > > drivers to use different mask algorithms, they must all behave the
> > > > same.
> > > 
> > > The problem I am solving here is that SIF expects (and requires) some of 
> > > the bits that the user correctly sets, but that Mellanox either ignore or need to
> > > be 0. So when that mask is applied to the user input, the value that reaches
> > > the SIF driver is not correct from SIFs perspective. 
> > > If the values goes through as the ULP sets them, then all is fine.
> > 
> > I guess I still don't understand. SIF and mlx cannot have different
> > behavior here. 
> 
> Rest assure, being the new kid on the block here, of course we need to 
> work with existing applications. We try to test with whatever we can get our 
> hands on, and make sure we behave sensibly..
> 
> > How is the application writer to know what to do?
> 
> This masking applies to XRC only.
> The application does not need to do anything, it treats the XRC QP as any other
> RC QP and attempt similar modify_qp operations.
> Mlx chooses to ignore some of the bits provided. 
> 
> > Can you give a concrete example of the problem?
> 
> Maybe someone from Mellanox can explain the reason 
> for having the filtering there in the first place,
> and only from user mode XRC QPs?

Getting some help from git blame here: 

The modify_qp_mask function was introduced by this commit: 
9977f4f64bfeb5d907a793a6880aab2d43b0bed2 by Sean Hefty.

Can I ask why this function was introduced, and if it is still needed?

Thanks,
Knut

> 
> Knut
> 
> > Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2016-09-12 15:05 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-02  0:09 [PATCH 0/9] SIF related verbs patches Knut Omang
     [not found] ` <1472774969-18997-1-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-09-02  0:09   ` [PATCH 1/9] ib_mad: incoming sminfo SMPs gets discarded if no process_mad function is registered Knut Omang
     [not found]     ` <1472774969-18997-2-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-09-06 14:01       ` Hal Rosenstock
     [not found]         ` <57867d7f-cc9f-5ec2-6632-c552e6469e40-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2016-09-07 11:42           ` Knut Omang
     [not found]             ` <1473248532.3103.51.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-09-08 11:33               ` Hal Rosenstock
     [not found]                 ` <098f69ae-6940-589a-e9ad-c65e34c958b7-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2016-09-08 17:22                   ` Knut Omang
     [not found]                     ` <1473355350.569.5.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-09-08 19:02                       ` Hal Rosenstock
     [not found]                         ` <2eddf795-71bb-1866-42d9-8a3ba3d512d4-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2016-09-09  2:23                           ` Knut Omang
     [not found]                             ` <1473387784.569.17.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-09-09 14:24                               ` Hal Rosenstock
     [not found]                                 ` <37195bbc-0db0-8054-2174-8dc0faf7c692-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2016-09-12  9:09                                   ` Knut Omang
     [not found]                                     ` <1473671364.17998.17.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-09-12 14:03                                       ` Hal Rosenstock
2016-09-02  0:09   ` [PATCH 2/9] ib_umem: Add a new, more generic ib_umem_get_attrs Knut Omang
2016-09-02  0:09   ` [PATCH 3/9] ib_umem: With the new ib_umem_get_attrs, simplify ib_umem_get Knut Omang
2016-09-02  0:09   ` [PATCH 4/9] ib: Add udata argument to create_ah Knut Omang
     [not found]     ` <1472774969-18997-5-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-09-02  0:38       ` kbuild test robot
2016-09-02  0:39       ` kbuild test robot
     [not found]         ` <201609020848.QliVPrIS%fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2016-09-02  8:01           ` Knut Omang
2016-09-02  0:09   ` [PATCH 5/9] ib_uverbs: Add padding to end align ib_uverbs_reg_mr_resp Knut Omang
     [not found]     ` <1472774969-18997-6-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-09-02  2:09       ` Jason Gunthorpe
     [not found]         ` <20160902020945.GB30057-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-09-02  7:54           ` Knut Omang
2016-09-02  0:09   ` [PATCH 6/9] ib_uverbs: Avoid vendor specific masking of attributes in query_qp Knut Omang
     [not found]     ` <1472774969-18997-7-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-09-02  2:13       ` Jason Gunthorpe
     [not found]         ` <20160902021300.GC30057-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-09-02  4:45           ` Knut Omang
     [not found]             ` <1472791552.9410.258.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-09-02 17:10               ` Jason Gunthorpe
     [not found]                 ` <20160902171008.GE24997-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-09-02 17:35                   ` Knut Omang
     [not found]                     ` <1472837728.9410.340.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-09-12 15:05                       ` Knut Omang
2016-09-02  0:09   ` [PATCH 7/9] ib_{uverbs/core}: add new ib_create_qp_ex with udata arg Knut Omang
2016-09-02  0:09   ` [PATCH 8/9] ib_uverbs: Support for kernel implementation of XRC Knut Omang
     [not found]     ` <1472774969-18997-9-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-09-02  2:16       ` Jason Gunthorpe
     [not found]         ` <20160902021640.GD30057-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-09-02  7:55           ` Knut Omang
2016-09-02  0:09   ` [PATCH 9/9] ib_verbs: Add a new qp create flag to request features for Ethernet over IB Knut Omang
     [not found]     ` <1472774969-18997-10-git-send-email-knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-09-02  2:17       ` Jason Gunthorpe
     [not found]         ` <20160902021729.GE30057-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-09-02  5:04           ` Knut Omang
     [not found]             ` <1472792670.9410.276.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-09-02 11:00               ` Knut Omang
     [not found]                 ` <1472814057.3975.47.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-09-02 16:19                   ` Santosh Shilimkar
     [not found]                     ` <b38c5cf6-4e07-33eb-8704-284498481bb6-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-09-02 17:11                       ` Jason Gunthorpe
     [not found]                         ` <20160902171107.GF24997-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-09-02 17:14                           ` Santosh Shilimkar
2016-09-02 16:34                   ` Jason Gunthorpe
     [not found]                     ` <20160902163451.GC24997-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-09-07 17:33                       ` Knut Omang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.