* [PATCH for-next V2 0/9] Add completion timestamping support
@ 2015-05-31 12:14 Or Gerlitz
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:14 UTC
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Or Gerlitz

Hi Doug,

This patchset adds completion timestamping support for verbs consumers.

After reviewing the weekend threads, we've changed the flag name to reflect
that this is a completion time-stamp and folded the actual mlx4 support
into one patch.

Regarding the related user-space support, it's possible to add what you
were suggesting: ibv_get_raw_cqe_timestamp(), which returns the ts in cycles,
and ibv_get_cqe_timestamp(), which returns the ts in ns. This makes the value
returned by the poll cq verb an opaque one that must go through one of the
converters.

We would like to go for one helper, ibv_get_timestamp(uint64_t raw_time, flag),
which could get the raw time-stamp and one of the following flags: RAW_TIME,
RAW_NS_TIME.

We think this would address the reviewer comments for the kernel submission.

The user-space code is at the locations below (it still uses
IB_CQ_FLAGS_TIMESTAMP and is missing the conversion functions):

 https://github.com/matanb10/libibverbs timestamp-v1
 https://github.com/matanb10/libmlx4 timestamp-v1

Timestamping is used by applications in order to know when a WQE was
received/transmitted by the HW. The value is given in HCA hardware cycles,
but could be easily converted as the hardware's core clock frequency is
available through an extension of query device.

Moreover, we add an ability to read the HCA's current clock. This could be
useful in order to synchronize events to the wall clock.

This functionality is achieved by adding/extending the following verbs:

create_cq - create_cq is extended in order to allow passing creation flags
to the CQ creation function. We change the IB/core --> vendors API
to be easily extendible by passing a struct which contains
comp_vector, cqe and the new flags parameter. In order to create
a CQ which supports timestamping, IB_CQ_FLAGS_TIMESTAMP_COMPLETION should be given.

query_device - We extend query_device uverb further by giving the hardware's
clock frequency and the timestamp mask (the number of timestamp
bits which are supported). If timestamp isn't supported, 0 is returned.
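
As a rough illustration of how a consumer might use the timestamp mask, the
sketch below assumes timestamp_mask is a contiguous low-bit mask (2^n - 1 for
n supported bits) and that the counter wraps at most once between the two
samples. Both helper names are hypothetical and are not part of this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: per the description above, a device that does
 * not support timestamping reports 0. */
static inline int hca_timestamp_supported(uint64_t timestamp_mask)
{
	return timestamp_mask != 0;
}

/* Hypothetical helper: cycles elapsed between two raw timestamps.
 * Assumes timestamp_mask is of the form 2^n - 1 and that the counter
 * wrapped at most once between 'earlier' and 'later'. Unsigned
 * subtraction followed by masking then yields the right distance even
 * when 'later' is numerically smaller than 'earlier'. */
static inline uint64_t hca_cycles_elapsed(uint64_t earlier, uint64_t later,
					  uint64_t timestamp_mask)
{
	assert(timestamp_mask != 0);
	return (later - earlier) & timestamp_mask;
}
```

Masking after the subtraction keeps the helper branch-free. A caller would
check hca_timestamp_supported() first and skip timestamp handling when it
returns 0.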

In order to read the timestamp in the WQE, the user needs to query the device
for support, create an appropriate CQ (using the extended uverb with
IB_CQ_FLAGS_TIMESTAMP_COMPLETION) and poll the CQ with an extended poll_cq
verb (currently only implemented in user-space).

In mlx4, allowing the user to read the core clock efficiently involves mapping
this area of the hardware to user-space (being done by using a mmap command)
and reading the clock from the correct offset of the page. 

This offset is returned in the vendor's specific data from mlx4's kernel driver 
to the mlx4's user-space driver. query_device is modified in order to support
passing this vendor specific data. A user-space application could use a new
verb in order to read the hardware's clock.

Translating the hardware's clock into ms could be done by dividing this
value by hca_core_clock (which is returned by the extended version of
query_device uverb).
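
The division above can be sketched as follows. This assumes hca_core_clock is
reported in kHz, i.e. cycles per millisecond, which is what makes the quotient
come out in ms; the helper names are hypothetical and not part of the proposed
verbs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert raw HCA cycles to milliseconds.
 * Assumes hca_core_clock is in kHz (cycles per millisecond). */
static inline uint64_t hca_cycles_to_ms(uint64_t cycles,
					uint64_t hca_core_clock_khz)
{
	assert(hca_core_clock_khz != 0);
	return cycles / hca_core_clock_khz;
}

/* Hypothetical helper: the same conversion with nanosecond resolution.
 * The whole-millisecond part and the remainder are scaled separately,
 * which keeps the intermediate products smaller than multiplying the
 * full cycle count by 10^6 would. */
static inline uint64_t hca_cycles_to_ns(uint64_t cycles,
					uint64_t hca_core_clock_khz)
{
	uint64_t ms, rem;

	assert(hca_core_clock_khz != 0);
	ms = cycles / hca_core_clock_khz;
	rem = cycles % hca_core_clock_khz;
	return ms * 1000000ULL + (rem * 1000000ULL) / hca_core_clock_khz;
}
```

Both helpers truncate toward zero, so a caller that needs sub-millisecond
precision would use the ns variant.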

A user-space application could get the current HW's clock by executing

ibv_query_values_ex(struct ibv_context *context, uint32_t q_values,
                    struct ibv_values_ex *values)

The function gets a mask of the values to query and returns their values.
Vendors could either implement this as a uverb command or use their
user-space driver to return those values directly from the HW (the mlx4 way).

Matan and Or.

Changes from V1:
 (1) fixed lustre IB's code build
 (2) squashed mlx4 V1 9-11 patches into one
 (3) changed IB_CQ_FLAGS_TIMESTAMP --> IB_CQ_FLAGS_TIMESTAMP_COMPLETION

Changes from V0:
(1) Pass ib_cq_init_attr instead of cqe and comp_vector.
(2) Fix unneeded indentation.
(3) Change flags to u32.
(4) Add const to create_cq's ib_cq_init_attr argument in vendor implementation.

Matan Barak (9):
  IB/core: Change provider's API of create_cq to be extendible
  IB/core: Change ib_create_cq to use struct ib_cq_init_attr
  IB/core: Add CQ creation time-stamping flag
  IB/core: Extend ib_uverbs_create_cq
  IB/core: Add timestamp_mask and hca_core_clock to query_device
  IB/core: Pass hardware specific data in query_device
  IB/mlx4: Add mmap call to map the hardware clock
  IB/mlx4: Support extended create_cq and query_device uverbs
  IB/mlx4: Add support for CQ time-stamping

 drivers/infiniband/core/device.c                   |    6 +-
 drivers/infiniband/core/mad.c                      |    5 +-
 drivers/infiniband/core/uverbs.h                   |    1 +
 drivers/infiniband/core/uverbs_cmd.c               |  188 ++++++++++++++++----
 drivers/infiniband/core/uverbs_main.c              |    1 +
 drivers/infiniband/core/verbs.c                    |    4 +-
 drivers/infiniband/hw/amso1100/c2_provider.c       |   14 ++-
 drivers/infiniband/hw/cxgb3/iwch_provider.c        |   19 ++-
 drivers/infiniband/hw/cxgb4/cq.c                   |    9 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h             |    8 +-
 drivers/infiniband/hw/cxgb4/provider.c             |    8 +-
 drivers/infiniband/hw/ehca/ehca_cq.c               |    7 +-
 drivers/infiniband/hw/ehca/ehca_hca.c              |    6 +-
 drivers/infiniband/hw/ehca/ehca_iverbs.h           |    6 +-
 drivers/infiniband/hw/ehca/ehca_main.c             |    6 +-
 drivers/infiniband/hw/ipath/ipath_cq.c             |    9 +-
 drivers/infiniband/hw/ipath/ipath_verbs.c          |    7 +-
 drivers/infiniband/hw/ipath/ipath_verbs.h          |    3 +-
 drivers/infiniband/hw/mlx4/cq.c                    |   13 ++-
 drivers/infiniband/hw/mlx4/mad.c                   |    5 +-
 drivers/infiniband/hw/mlx4/main.c                  |   67 +++++++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h               |   19 ++-
 drivers/infiniband/hw/mlx5/cq.c                    |   10 +-
 drivers/infiniband/hw/mlx5/main.c                  |   19 ++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h               |    5 +-
 drivers/infiniband/hw/mthca/mthca_provider.c       |   15 ++-
 drivers/infiniband/hw/nes/nes_verbs.c              |   17 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c        |   13 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h        |    9 +-
 drivers/infiniband/hw/qib/qib_cq.c                 |   11 +-
 drivers/infiniband/hw/qib/qib_verbs.c              |    6 +-
 drivers/infiniband/hw/qib/qib_verbs.h              |    5 +-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c       |   16 ++-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h       |   10 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c         |    9 +-
 drivers/infiniband/ulp/iser/iser_verbs.c           |    6 +-
 drivers/infiniband/ulp/isert/ib_isert.c            |    6 +-
 drivers/infiniband/ulp/srp/ib_srp.c                |   10 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c              |    5 +-
 drivers/net/ethernet/mellanox/mlx4/main.c          |   19 ++
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    7 +-
 include/linux/mlx4/device.h                        |    9 +
 include/rdma/ib_verbs.h                            |   25 ++-
 include/uapi/rdma/ib_user_verbs.h                  |   19 ++
 net/9p/trans_rdma.c                                |    5 +-
 net/rds/ib_cm.c                                    |    8 +-
 net/rds/iw_cm.c                                    |    8 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c           |   10 +-
 net/sunrpc/xprtrdma/verbs.c                        |   10 +-
 49 files changed, 564 insertions(+), 139 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH for-next V2 1/9] IB/core: Change provider's API of create_cq to be extendible
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-05-31 12:14   ` Or Gerlitz
       [not found]     ` <1433074457-26437-2-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-05-31 12:14   ` [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr Or Gerlitz
                     ` (9 subsequent siblings)
  10 siblings, 1 reply; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:14 UTC
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add a new ib_cq_init_attr structure which contains the
previous cqe (minimum number of CQ entries) and comp_vector
(completion vector) in addition to a new flags field.
All vendors' create_cq callbacks are changed in order
to work with the new API.

This commit does not change any functionality.
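
The pattern applied to each vendor callback can be sketched in isolation as
below. This is a user-space mock for illustration only: example_create_cq is
a hypothetical stand-in that returns 0 or a negative errno rather than a
struct ib_cq pointer, and u32 is spelled uint32_t.

```c
#include <errno.h>
#include <stdint.h>

/* User-space mirror of the structure this patch adds to
 * include/rdma/ib_verbs.h. */
struct ib_cq_init_attr {
	unsigned int	cqe;
	int		comp_vector;
	uint32_t	flags;
};

/* Hypothetical stand-in for a provider's create_cq callback. */
static int example_create_cq(const struct ib_cq_init_attr *attr)
{
	/* Unpack the field that used to be a separate argument. */
	int entries = attr->cqe;

	/* No creation flags are understood by this provider, so any
	 * flag that is set is rejected. */
	if (attr->flags)
		return -EINVAL;

	if (entries < 1)
		return -EINVAL;

	return 0;
}
```

Rejecting unknown flags up front means a later patch can define a flag
without older providers silently ignoring it.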

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs_cmd.c         |    6 ++++--
 drivers/infiniband/core/verbs.c              |    3 ++-
 drivers/infiniband/hw/amso1100/c2_provider.c |    7 ++++++-
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |   11 ++++++++---
 drivers/infiniband/hw/cxgb4/cq.c             |    9 +++++++--
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h       |    8 ++++----
 drivers/infiniband/hw/ehca/ehca_cq.c         |    7 ++++++-
 drivers/infiniband/hw/ehca/ehca_iverbs.h     |    3 ++-
 drivers/infiniband/hw/ipath/ipath_cq.c       |    9 +++++++--
 drivers/infiniband/hw/ipath/ipath_verbs.h    |    3 ++-
 drivers/infiniband/hw/mlx4/cq.c              |    8 +++++++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h         |    3 ++-
 drivers/infiniband/hw/mlx5/cq.c              |   10 ++++++++--
 drivers/infiniband/hw/mlx5/main.c            |    3 ++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h         |    5 +++--
 drivers/infiniband/hw/mthca/mthca_provider.c |    8 ++++++--
 drivers/infiniband/hw/nes/nes_verbs.c        |   11 ++++++++---
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |    7 ++++++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  |    6 ++++--
 drivers/infiniband/hw/qib/qib_cq.c           |   11 ++++++++---
 drivers/infiniband/hw/qib/qib_verbs.h        |    5 +++--
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c |   10 +++++++---
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h |    7 ++++---
 include/rdma/ib_verbs.h                      |   10 ++++++++--
 24 files changed, 124 insertions(+), 46 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a9f0489..1954ebb 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1341,6 +1341,7 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
 	struct ib_uverbs_event_file    *ev_file = NULL;
 	struct ib_cq                   *cq;
 	int                             ret;
+	struct ib_cq_init_attr attr = {};
 
 	if (out_len < sizeof resp)
 		return -ENOSPC;
@@ -1376,8 +1377,9 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
 	INIT_LIST_HEAD(&obj->comp_list);
 	INIT_LIST_HEAD(&obj->async_list);
 
-	cq = file->device->ib_dev->create_cq(file->device->ib_dev, cmd.cqe,
-					     cmd.comp_vector,
+	attr.cqe = cmd.cqe;
+	attr.comp_vector = cmd.comp_vector;
+	cq = file->device->ib_dev->create_cq(file->device->ib_dev, &attr,
 					     file->ucontext, &udata);
 	if (IS_ERR(cq)) {
 		ret = PTR_ERR(cq);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 685a362..f7615d4 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1078,8 +1078,9 @@ struct ib_cq *ib_create_cq(struct ib_device *device,
 			   void *cq_context, int cqe, int comp_vector)
 {
 	struct ib_cq *cq;
+	struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector};
 
-	cq = device->create_cq(device, cqe, comp_vector, NULL, NULL);
+	cq = device->create_cq(device, &attr, NULL, NULL);
 
 	if (!IS_ERR(cq)) {
 		cq->device        = device;
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index d396c39..a43e022 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -286,13 +286,18 @@ static int c2_destroy_qp(struct ib_qp *ib_qp)
 	return 0;
 }
 
-static struct ib_cq *c2_create_cq(struct ib_device *ibdev, int entries, int vector,
+static struct ib_cq *c2_create_cq(struct ib_device *ibdev,
+				  const struct ib_cq_init_attr *attr,
 				  struct ib_ucontext *context,
 				  struct ib_udata *udata)
 {
+	int entries = attr->cqe;
 	struct c2_cq *cq;
 	int err;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	cq = kmalloc(sizeof(*cq), GFP_KERNEL);
 	if (!cq) {
 		pr_debug("%s: Unable to allocate CQ\n", __func__);
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 061ef08..2eaf7e8 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -138,10 +138,12 @@ static int iwch_destroy_cq(struct ib_cq *ib_cq)
 	return 0;
 }
 
-static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries, int vector,
-			     struct ib_ucontext *ib_context,
-			     struct ib_udata *udata)
+static struct ib_cq *iwch_create_cq(struct ib_device *ibdev,
+				    const struct ib_cq_init_attr *attr,
+				    struct ib_ucontext *ib_context,
+				    struct ib_udata *udata)
 {
+	int entries = attr->cqe;
 	struct iwch_dev *rhp;
 	struct iwch_cq *chp;
 	struct iwch_create_cq_resp uresp;
@@ -151,6 +153,9 @@ static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries, int ve
 	size_t resplen;
 
 	PDBG("%s ib_dev %p entries %d\n", __func__, ibdev, entries);
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	rhp = to_iwch_dev(ibdev);
 	chp = kzalloc(sizeof(*chp), GFP_KERNEL);
 	if (!chp)
diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c
index 68ddb37..9a2c01c 100644
--- a/drivers/infiniband/hw/cxgb4/cq.c
+++ b/drivers/infiniband/hw/cxgb4/cq.c
@@ -866,10 +866,13 @@ int c4iw_destroy_cq(struct ib_cq *ib_cq)
 	return 0;
 }
 
-struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, int entries,
-			     int vector, struct ib_ucontext *ib_context,
+struct ib_cq *c4iw_create_cq(struct ib_device *ibdev,
+			     const struct ib_cq_init_attr *attr,
+			     struct ib_ucontext *ib_context,
 			     struct ib_udata *udata)
 {
+	int entries = attr->cqe;
+	int vector = attr->comp_vector;
 	struct c4iw_dev *rhp;
 	struct c4iw_cq *chp;
 	struct c4iw_create_cq_resp uresp;
@@ -879,6 +882,8 @@ struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, int entries,
 	struct c4iw_mm_entry *mm, *mm2;
 
 	PDBG("%s ib_dev %p entries %d\n", __func__, ibdev, entries);
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
 
 	rhp = to_c4iw_dev(ibdev);
 
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 97bb555..540ed02 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -992,10 +992,10 @@ int c4iw_reregister_phys_mem(struct ib_mr *mr,
 				     int acc, u64 *iova_start);
 int c4iw_dereg_mr(struct ib_mr *ib_mr);
 int c4iw_destroy_cq(struct ib_cq *ib_cq);
-struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, int entries,
-					int vector,
-					struct ib_ucontext *ib_context,
-					struct ib_udata *udata);
+struct ib_cq *c4iw_create_cq(struct ib_device *ibdev,
+			     const struct ib_cq_init_attr *attr,
+			     struct ib_ucontext *ib_context,
+			     struct ib_udata *udata);
 int c4iw_resize_cq(struct ib_cq *cq, int cqe, struct ib_udata *udata);
 int c4iw_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags);
 int c4iw_destroy_qp(struct ib_qp *ib_qp);
diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c
index 8cc8375..9b68b17 100644
--- a/drivers/infiniband/hw/ehca/ehca_cq.c
+++ b/drivers/infiniband/hw/ehca/ehca_cq.c
@@ -113,10 +113,12 @@ struct ehca_qp *ehca_cq_get_qp(struct ehca_cq *cq, int real_qp_num)
 	return ret;
 }
 
-struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector,
+struct ib_cq *ehca_create_cq(struct ib_device *device,
+			     const struct ib_cq_init_attr *attr,
 			     struct ib_ucontext *context,
 			     struct ib_udata *udata)
 {
+	int cqe = attr->cqe;
 	static const u32 additional_cqe = 20;
 	struct ib_cq *cq;
 	struct ehca_cq *my_cq;
@@ -131,6 +133,9 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector,
 	int ipz_rc, i;
 	unsigned long flags;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	if (cqe >= 0xFFFFFFFF - 64 - additional_cqe)
 		return ERR_PTR(-EINVAL);
 
diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
index 077185b..952f2fb 100644
--- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
+++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
@@ -129,7 +129,8 @@ int ehca_destroy_eq(struct ehca_shca *shca, struct ehca_eq *eq);
 void *ehca_poll_eq(struct ehca_shca *shca, struct ehca_eq *eq);
 
 
-struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector,
+struct ib_cq *ehca_create_cq(struct ib_device *device,
+			     const struct ib_cq_init_attr *attr,
 			     struct ib_ucontext *context,
 			     struct ib_udata *udata);
 
diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c
index 0416c6c..e9dd911 100644
--- a/drivers/infiniband/hw/ipath/ipath_cq.c
+++ b/drivers/infiniband/hw/ipath/ipath_cq.c
@@ -188,7 +188,7 @@ static void send_complete(unsigned long data)
 /**
  * ipath_create_cq - create a completion queue
  * @ibdev: the device this completion queue is attached to
- * @entries: the minimum size of the completion queue
+ * @attr: creation attributes
  * @context: unused by the InfiniPath driver
  * @udata: unused by the InfiniPath driver
  *
@@ -197,16 +197,21 @@ static void send_complete(unsigned long data)
  *
  * Called by ib_create_cq() in the generic verbs code.
  */
-struct ib_cq *ipath_create_cq(struct ib_device *ibdev, int entries, int comp_vector,
+struct ib_cq *ipath_create_cq(struct ib_device *ibdev,
+			      const struct ib_cq_init_attr *attr,
 			      struct ib_ucontext *context,
 			      struct ib_udata *udata)
 {
+	int entries = attr->cqe;
 	struct ipath_ibdev *dev = to_idev(ibdev);
 	struct ipath_cq *cq;
 	struct ipath_cq_wc *wc;
 	struct ib_cq *ret;
 	u32 sz;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	if (entries < 1 || entries > ib_ipath_max_cqes) {
 		ret = ERR_PTR(-EINVAL);
 		goto done;
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h
index ae6cff4..c4e4e81 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h
@@ -807,7 +807,8 @@ void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int sig);
 
 int ipath_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry);
 
-struct ib_cq *ipath_create_cq(struct ib_device *ibdev, int entries, int comp_vector,
+struct ib_cq *ipath_create_cq(struct ib_device *ibdev,
+			      const struct ib_cq_init_attr *attr,
 			      struct ib_ucontext *context,
 			      struct ib_udata *udata);
 
diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 0176caa..8e44aaa 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -166,15 +166,21 @@ err_buf:
 	return err;
 }
 
-struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector,
+struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
+				const struct ib_cq_init_attr *attr,
 				struct ib_ucontext *context,
 				struct ib_udata *udata)
 {
+	int entries = attr->cqe;
+	int vector = attr->comp_vector;
 	struct mlx4_ib_dev *dev = to_mdev(ibdev);
 	struct mlx4_ib_cq *cq;
 	struct mlx4_uar *uar;
 	int err;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	if (entries < 1 || entries > dev->dev->caps.max_cqes)
 		return ERR_PTR(-EINVAL);
 
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index fce3934..f127efc 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -668,7 +668,8 @@ void mlx4_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
 
 int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period);
 int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata);
-struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector,
+struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
+				const struct ib_cq_init_attr *attr,
 				struct ib_ucontext *context,
 				struct ib_udata *udata);
 int mlx4_ib_destroy_cq(struct ib_cq *cq);
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 2ee6b10..09fbae6 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -736,10 +736,13 @@ static void destroy_cq_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq)
 	mlx5_db_free(dev->mdev, &cq->db);
 }
 
-struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, int entries,
-				int vector, struct ib_ucontext *context,
+struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev,
+				const struct ib_cq_init_attr *attr,
+				struct ib_ucontext *context,
 				struct ib_udata *udata)
 {
+	int entries = attr->cqe;
+	int vector = attr->comp_vector;
 	struct mlx5_create_cq_mbox_in *cqb = NULL;
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
 	struct mlx5_ib_cq *cq;
@@ -750,6 +753,9 @@ struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, int entries,
 	int eqn;
 	int err;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	if (entries < 0)
 		return ERR_PTR(-EINVAL);
 
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index b2fdb9c..9565c20 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1087,6 +1087,7 @@ static int create_dev_resources(struct mlx5_ib_resources *devr)
 {
 	struct ib_srq_init_attr attr;
 	struct mlx5_ib_dev *dev;
+	struct ib_cq_init_attr cq_attr = {.cqe = 1};
 	int ret = 0;
 
 	dev = container_of(devr, struct mlx5_ib_dev, devr);
@@ -1100,7 +1101,7 @@ static int create_dev_resources(struct mlx5_ib_resources *devr)
 	devr->p0->uobject = NULL;
 	atomic_set(&devr->p0->usecnt, 0);
 
-	devr->c0 = mlx5_ib_create_cq(&dev->ib_dev, 1, 0, NULL, NULL);
+	devr->c0 = mlx5_ib_create_cq(&dev->ib_dev, &cq_attr, NULL, NULL);
 	if (IS_ERR(devr->c0)) {
 		ret = PTR_ERR(devr->c0);
 		goto error1;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index dff1cfc..b086162 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -556,8 +556,9 @@ int mlx5_ib_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
 void *mlx5_get_send_wqe(struct mlx5_ib_qp *qp, int n);
 int mlx5_ib_read_user_wqe(struct mlx5_ib_qp *qp, int send, int wqe_index,
 			  void *buffer, u32 length);
-struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, int entries,
-				int vector, struct ib_ucontext *context,
+struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev,
+				const struct ib_cq_init_attr *attr,
+				struct ib_ucontext *context,
 				struct ib_udata *udata);
 int mlx5_ib_destroy_cq(struct ib_cq *cq);
 int mlx5_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc);
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 509d59e..f1b04bd 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -641,16 +641,20 @@ static int mthca_destroy_qp(struct ib_qp *qp)
 	return 0;
 }
 
-static struct ib_cq *mthca_create_cq(struct ib_device *ibdev, int entries,
-				     int comp_vector,
+static struct ib_cq *mthca_create_cq(struct ib_device *ibdev,
+				     const struct ib_cq_init_attr *attr,
 				     struct ib_ucontext *context,
 				     struct ib_udata *udata)
 {
+	int entries = attr->cqe;
 	struct mthca_create_cq ucmd;
 	struct mthca_cq *cq;
 	int nent;
 	int err;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	if (entries < 1 || entries > to_mdev(ibdev)->limits.max_cqes)
 		return ERR_PTR(-EINVAL);
 
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index 05530e3..a9d62c5 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -1526,10 +1526,12 @@ static int nes_destroy_qp(struct ib_qp *ibqp)
 /**
  * nes_create_cq
  */
-static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries,
-		int comp_vector,
-		struct ib_ucontext *context, struct ib_udata *udata)
+static struct ib_cq *nes_create_cq(struct ib_device *ibdev,
+				   const struct ib_cq_init_attr *attr,
+				   struct ib_ucontext *context,
+				   struct ib_udata *udata)
 {
+	int entries = attr->cqe;
 	u64 u64temp;
 	struct nes_vnic *nesvnic = to_nesvnic(ibdev);
 	struct nes_device *nesdev = nesvnic->nesdev;
@@ -1549,6 +1551,9 @@ static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries,
 	unsigned long flags;
 	int ret;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	if (entries > nesadapter->max_cqe)
 		return ERR_PTR(-EINVAL);
 
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 8771755..f441df6 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -1000,10 +1000,12 @@ err:
 	return status;
 }
 
-struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev, int entries, int vector,
+struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev,
+			       const struct ib_cq_init_attr *attr,
 			       struct ib_ucontext *ib_ctx,
 			       struct ib_udata *udata)
 {
+	int entries = attr->cqe;
 	struct ocrdma_cq *cq;
 	struct ocrdma_dev *dev = get_ocrdma_dev(ibdev);
 	struct ocrdma_ucontext *uctx = NULL;
@@ -1011,6 +1013,9 @@ struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev, int entries, int vector,
 	int status;
 	struct ocrdma_create_cq_ureq ureq;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	if (udata) {
 		if (ib_copy_from_udata(&ureq, udata, sizeof(ureq)))
 			return ERR_PTR(-EFAULT);
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index 3cdc81e..e853195 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -59,8 +59,10 @@ struct ib_pd *ocrdma_alloc_pd(struct ib_device *,
 			      struct ib_ucontext *, struct ib_udata *);
 int ocrdma_dealloc_pd(struct ib_pd *pd);
 
-struct ib_cq *ocrdma_create_cq(struct ib_device *, int entries, int vector,
-			       struct ib_ucontext *, struct ib_udata *);
+struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev,
+			       const struct ib_cq_init_attr *attr,
+			       struct ib_ucontext *ib_ctx,
+			       struct ib_udata *udata);
 int ocrdma_resize_cq(struct ib_cq *, int cqe, struct ib_udata *);
 int ocrdma_destroy_cq(struct ib_cq *);
 
diff --git a/drivers/infiniband/hw/qib/qib_cq.c b/drivers/infiniband/hw/qib/qib_cq.c
index ab4e11c..2b45d0b 100644
--- a/drivers/infiniband/hw/qib/qib_cq.c
+++ b/drivers/infiniband/hw/qib/qib_cq.c
@@ -203,7 +203,7 @@ static void send_complete(struct kthread_work *work)
 /**
  * qib_create_cq - create a completion queue
  * @ibdev: the device this completion queue is attached to
- * @entries: the minimum size of the completion queue
+ * @attr: creation attributes
  * @context: unused by the QLogic_IB driver
  * @udata: user data for libibverbs.so
  *
@@ -212,16 +212,21 @@ static void send_complete(struct kthread_work *work)
  *
  * Called by ib_create_cq() in the generic verbs code.
  */
-struct ib_cq *qib_create_cq(struct ib_device *ibdev, int entries,
-			    int comp_vector, struct ib_ucontext *context,
+struct ib_cq *qib_create_cq(struct ib_device *ibdev,
+			    const struct ib_cq_init_attr *attr,
+			    struct ib_ucontext *context,
 			    struct ib_udata *udata)
 {
+	int entries = attr->cqe;
 	struct qib_ibdev *dev = to_idev(ibdev);
 	struct qib_cq *cq;
 	struct qib_cq_wc *wc;
 	struct ib_cq *ret;
 	u32 sz;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	if (entries < 1 || entries > ib_qib_max_cqes) {
 		ret = ERR_PTR(-EINVAL);
 		goto done;
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index bfc8948..e9b26a4 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -1007,8 +1007,9 @@ void qib_cq_enter(struct qib_cq *cq, struct ib_wc *entry, int sig);
 
 int qib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry);
 
-struct ib_cq *qib_create_cq(struct ib_device *ibdev, int entries,
-			    int comp_vector, struct ib_ucontext *context,
+struct ib_cq *qib_create_cq(struct ib_device *ibdev,
+			    const struct ib_cq_init_attr *attr,
+			    struct ib_ucontext *context,
 			    struct ib_udata *udata);
 
 int qib_destroy_cq(struct ib_cq *ibcq);
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index 53bd6a2..2b3f7e3 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -570,13 +570,17 @@ int usnic_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 	return status;
 }
 
-struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev, int entries,
-					int vector, struct ib_ucontext *context,
-					struct ib_udata *udata)
+struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev,
+				 const struct ib_cq_init_attr *attr,
+				 struct ib_ucontext *context,
+				 struct ib_udata *udata)
 {
 	struct ib_cq *cq;
 
 	usnic_dbg("\n");
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	cq = kzalloc(sizeof(*cq), GFP_KERNEL);
 	if (!cq)
 		return ERR_PTR(-EBUSY);
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
index 57ddba5..2ab8427 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
@@ -46,9 +46,10 @@ struct ib_qp *usnic_ib_create_qp(struct ib_pd *pd,
 int usnic_ib_destroy_qp(struct ib_qp *qp);
 int usnic_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 				int attr_mask, struct ib_udata *udata);
-struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev, int entries,
-					int vector, struct ib_ucontext *context,
-					struct ib_udata *udata);
+struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev,
+				 const struct ib_cq_init_attr *attr,
+				 struct ib_ucontext *context,
+				 struct ib_udata *udata);
 int usnic_ib_destroy_cq(struct ib_cq *cq);
 struct ib_mr *usnic_ib_reg_mr(struct ib_pd *pd, u64 start, u64 length,
 				u64 virt_addr, int access_flags,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index ad499bd..6c715d7 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -173,6 +173,12 @@ struct ib_odp_caps {
 	} per_transport_caps;
 };
 
+struct ib_cq_init_attr {
+	unsigned int	cqe;
+	int		comp_vector;
+	u32		flags;
+};
+
 struct ib_device_attr {
 	u64			fw_ver;
 	__be64			sys_image_guid;
@@ -1614,8 +1620,8 @@ struct ib_device {
 	int                        (*post_recv)(struct ib_qp *qp,
 						struct ib_recv_wr *recv_wr,
 						struct ib_recv_wr **bad_recv_wr);
-	struct ib_cq *             (*create_cq)(struct ib_device *device, int cqe,
-						int comp_vector,
+	struct ib_cq *             (*create_cq)(struct ib_device *device,
+						const struct ib_cq_init_attr *attr,
 						struct ib_ucontext *context,
 						struct ib_udata *udata);
 	int                        (*modify_cq)(struct ib_cq *cq, u16 cq_count,
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-05-31 12:14   ` [PATCH for-next V2 1/9] IB/core: Change provider's API of create_cq to be extendible Or Gerlitz
@ 2015-05-31 12:14   ` Or Gerlitz
       [not found]     ` <1433074457-26437-3-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-05-31 12:14   ` [PATCH for-next V2 3/9] IB/core: Add CQ creation time-stamping flag Or Gerlitz
                     ` (8 subsequent siblings)
  10 siblings, 1 reply; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:14 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Currently, ib_create_cq takes separate cqe and comp_vector arguments
instead of the extendible ib_cq_init_attr struct.

Earlier patches already changed the vendors to work with
ib_cq_init_attr. This patch changes the consumers too.
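
The per-consumer conversion can be sketched outside the kernel. This is a
userspace mock-up under stated assumptions, not kernel code: the struct
mirrors the one added to include/rdma/ib_verbs.h in patch 1/9 (with u32
replaced by uint32_t), while make_cq_attr() and mock_create_cq() are
hypothetical names introduced only for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Mirrors struct ib_cq_init_attr from patch 1/9 (u32 -> uint32_t). */
struct ib_cq_init_attr {
	unsigned int	cqe;
	int		comp_vector;
	uint32_t	flags;
};

/*
 * Hypothetical helper: the pattern every converted caller follows.
 * Zero the whole struct first so fields added later (such as flags)
 * default to 0, then set only what the caller cares about.
 */
static struct ib_cq_init_attr make_cq_attr(unsigned int cqe, int comp_vector)
{
	struct ib_cq_init_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.cqe = cqe;
	attr.comp_vector = comp_vector;
	return attr;
}

/*
 * Hypothetical stand-in for a provider's create_cq: it rejects any
 * flags it does not understand and otherwise reports the CQ size.
 */
static int mock_create_cq(const struct ib_cq_init_attr *attr)
{
	if (attr->flags)
		return -EINVAL;
	return (int)attr->cqe;
}
```

A caller that creates two CQs can reuse one zeroed struct and change only
cqe between the calls, which is what the ipoib, srp, rds and xprtrdma
hunks in this patch do.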

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/mad.c                      |    5 ++++-
 drivers/infiniband/core/verbs.c                    |    5 ++---
 drivers/infiniband/hw/ehca/ehca_main.c             |    6 +++++-
 drivers/infiniband/hw/mlx4/mad.c                   |    5 ++++-
 drivers/infiniband/hw/mlx4/main.c                  |    5 ++++-
 drivers/infiniband/hw/mlx5/main.c                  |    7 +++++--
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c         |    9 +++++++--
 drivers/infiniband/ulp/iser/iser_verbs.c           |    6 +++++-
 drivers/infiniband/ulp/isert/ib_isert.c            |    6 +++++-
 drivers/infiniband/ulp/srp/ib_srp.c                |   10 ++++++++--
 drivers/infiniband/ulp/srpt/ib_srpt.c              |    5 ++++-
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    7 +++++--
 include/rdma/ib_verbs.h                            |    6 ++----
 net/9p/trans_rdma.c                                |    5 ++++-
 net/rds/ib_cm.c                                    |    8 ++++++--
 net/rds/iw_cm.c                                    |    8 ++++++--
 net/sunrpc/xprtrdma/svc_rdma_transport.c           |   10 ++++++----
 net/sunrpc/xprtrdma/verbs.c                        |   10 ++++++----
 18 files changed, 88 insertions(+), 35 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 600af26..ad3f729 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2923,6 +2923,7 @@ static int ib_mad_port_open(struct ib_device *device,
 	unsigned long flags;
 	char name[sizeof "ib_mad123"];
 	int has_smi;
+	struct ib_cq_init_attr cq_attr;
 
 	/* Create new device info */
 	port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL);
@@ -2943,9 +2944,11 @@ static int ib_mad_port_open(struct ib_device *device,
 	if (has_smi)
 		cq_size *= 2;
 
+	memset(&cq_attr, 0, sizeof(cq_attr));
+	cq_attr.cqe = cq_size;
 	port_priv->cq = ib_create_cq(port_priv->device,
 				     ib_mad_thread_completion_handler,
-				     NULL, port_priv, cq_size, 0);
+				     NULL, port_priv, &cq_attr);
 	if (IS_ERR(port_priv->cq)) {
 		dev_err(&device->dev, "Couldn't create ib_mad CQ\n");
 		ret = PTR_ERR(port_priv->cq);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f7615d4..a716ae2 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1075,12 +1075,11 @@ EXPORT_SYMBOL(ib_destroy_qp);
 struct ib_cq *ib_create_cq(struct ib_device *device,
 			   ib_comp_handler comp_handler,
 			   void (*event_handler)(struct ib_event *, void *),
-			   void *cq_context, int cqe, int comp_vector)
+			   void *cq_context, struct ib_cq_init_attr *cq_attr)
 {
 	struct ib_cq *cq;
-	struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector};
 
-	cq = device->create_cq(device, &attr, NULL, NULL);
+	cq = device->create_cq(device, cq_attr, NULL, NULL);
 
 	if (!IS_ERR(cq)) {
 		cq->device        = device;
diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index 5e30b72..7727556 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -552,6 +552,7 @@ static int ehca_create_aqp1(struct ehca_shca *shca, u32 port)
 	struct ib_cq *ibcq;
 	struct ib_qp *ibqp;
 	struct ib_qp_init_attr qp_init_attr;
+	struct ib_cq_init_attr cq_attr;
 	int ret;
 
 	if (sport->ibcq_aqp1) {
@@ -559,7 +560,10 @@ static int ehca_create_aqp1(struct ehca_shca *shca, u32 port)
 		return -EPERM;
 	}
 
-	ibcq = ib_create_cq(&shca->ib_device, NULL, NULL, (void *)(-1), 10, 0);
+	memset(&cq_attr, 0, sizeof(cq_attr));
+	cq_attr.cqe = 10;
+	ibcq = ib_create_cq(&shca->ib_device, NULL, NULL, (void *)(-1),
+			    &cq_attr);
 	if (IS_ERR(ibcq)) {
 		ehca_err(&shca->ib_device, "Cannot create AQP1 CQ.");
 		return PTR_ERR(ibcq);
diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index 9cd2b00..462e728 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -1773,6 +1773,7 @@ static int create_pv_resources(struct ib_device *ibdev, int slave, int port,
 			       int create_tun, struct mlx4_ib_demux_pv_ctx *ctx)
 {
 	int ret, cq_size;
+	struct ib_cq_init_attr cq_attr;
 
 	if (ctx->state != DEMUX_PV_STATE_DOWN)
 		return -EEXIST;
@@ -1801,8 +1802,10 @@ static int create_pv_resources(struct ib_device *ibdev, int slave, int port,
 	if (ctx->has_smi)
 		cq_size *= 2;
 
+	memset(&cq_attr, 0, sizeof(cq_attr));
+	cq_attr.cqe = cq_size;
 	ctx->cq = ib_create_cq(ctx->ib_dev, mlx4_ib_tunnel_comp_handler,
-			       NULL, ctx, cq_size, 0);
+			       NULL, ctx, &cq_attr);
 	if (IS_ERR(ctx->cq)) {
 		ret = PTR_ERR(ctx->cq);
 		pr_err("Couldn't create tunnel CQ (%d)\n", ret);
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index c49dd0b..3b90df0 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -758,6 +758,7 @@ static struct ib_xrcd *mlx4_ib_alloc_xrcd(struct ib_device *ibdev,
 					  struct ib_udata *udata)
 {
 	struct mlx4_ib_xrcd *xrcd;
+	struct ib_cq_init_attr cq_attr;
 	int err;
 
 	if (!(to_mdev(ibdev)->dev->caps.flags & MLX4_DEV_CAP_FLAG_XRC))
@@ -777,7 +778,9 @@ static struct ib_xrcd *mlx4_ib_alloc_xrcd(struct ib_device *ibdev,
 		goto err2;
 	}
 
-	xrcd->cq = ib_create_cq(ibdev, NULL, NULL, xrcd, 1, 0);
+	memset(&cq_attr, 0, sizeof(cq_attr));
+	cq_attr.cqe = 1;
+	xrcd->cq = ib_create_cq(ibdev, NULL, NULL, xrcd, &cq_attr);
 	if (IS_ERR(xrcd->cq)) {
 		err = PTR_ERR(xrcd->cq);
 		goto err3;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 9565c20..b9976a0 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -971,6 +971,7 @@ static int create_umr_res(struct mlx5_ib_dev *dev)
 	struct ib_cq *cq;
 	struct ib_qp *qp;
 	struct ib_mr *mr;
+	struct ib_cq_init_attr cq_attr;
 	int ret;
 
 	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
@@ -994,8 +995,10 @@ static int create_umr_res(struct mlx5_ib_dev *dev)
 		goto error_1;
 	}
 
-	cq = ib_create_cq(&dev->ib_dev, mlx5_umr_cq_handler, NULL, NULL, 128,
-			  0);
+	memset(&cq_attr, 0, sizeof(cq_attr));
+	cq_attr.cqe = 128;
+	cq = ib_create_cq(&dev->ib_dev, mlx5_umr_cq_handler, NULL, NULL,
+			  &cq_attr);
 	if (IS_ERR(cq)) {
 		mlx5_ib_dbg(dev, "Couldn't create CQ for sync UMR QP\n");
 		ret = PTR_ERR(cq);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index e5cc430..93a0ca4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -141,6 +141,7 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 		.sq_sig_type = IB_SIGNAL_ALL_WR,
 		.qp_type     = IB_QPT_UD
 	};
+	struct ib_cq_init_attr cq_attr;
 
 	int ret, size;
 	int i;
@@ -178,14 +179,18 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 	} else
 		goto out_free_wq;
 
-	priv->recv_cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size, 0);
+	memset(&cq_attr, 0, sizeof(cq_attr));
+	cq_attr.cqe = size;
+	priv->recv_cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL,
+				     dev, &cq_attr);
 	if (IS_ERR(priv->recv_cq)) {
 		printk(KERN_WARNING "%s: failed to create receive CQ\n", ca->name);
 		goto out_cm_dev_cleanup;
 	}
 
+	cq_attr.cqe = ipoib_sendq_size;
 	priv->send_cq = ib_create_cq(priv->ca, ipoib_send_comp_handler, NULL,
-				     dev, ipoib_sendq_size, 0);
+				     dev, &cq_attr);
 	if (IS_ERR(priv->send_cq)) {
 		printk(KERN_WARNING "%s: failed to create send CQ\n", ca->name);
 		goto out_free_recv_cq;
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index d33c5c0..e287287 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -126,14 +126,18 @@ static int iser_create_device_ib_res(struct iser_device *device)
 		goto pd_err;
 
 	for (i = 0; i < device->comps_used; i++) {
+		struct ib_cq_init_attr cq_attr;
 		struct iser_comp *comp = &device->comps[i];
 
 		comp->device = device;
+		memset(&cq_attr, 0, sizeof(cq_attr));
+		cq_attr.cqe = max_cqe;
+		cq_attr.comp_vector = i;
 		comp->cq = ib_create_cq(device->ib_device,
 					iser_cq_callback,
 					iser_cq_event_callback,
 					(void *)comp,
-					max_cqe, i);
+					&cq_attr);
 		if (IS_ERR(comp->cq)) {
 			comp->cq = NULL;
 			goto cq_err;
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index d99a0c8..59ac1fc 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -318,15 +318,19 @@ isert_alloc_comps(struct isert_device *device,
 	max_cqe = min(ISER_MAX_CQ_LEN, attr->max_cqe);
 
 	for (i = 0; i < device->comps_used; i++) {
+		struct ib_cq_init_attr cq_attr;
 		struct isert_comp *comp = &device->comps[i];
 
 		comp->device = device;
 		INIT_WORK(&comp->work, isert_cq_work);
+		memset(&cq_attr, 0, sizeof(cq_attr));
+		cq_attr.cqe = max_cqe;
+		cq_attr.comp_vector = i;
 		comp->cq = ib_create_cq(device->ib_device,
 					isert_cq_callback,
 					isert_cq_event_callback,
 					(void *)comp,
-					max_cqe, i);
+					&cq_attr);
 		if (IS_ERR(comp->cq)) {
 			isert_err("Unable to allocate cq\n");
 			ret = PTR_ERR(comp->cq);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index c3f654d..16b29cb 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -500,6 +500,7 @@ static int srp_create_ch_ib(struct srp_rdma_ch *ch)
 	struct ib_fmr_pool *fmr_pool = NULL;
 	struct srp_fr_pool *fr_pool = NULL;
 	const int m = 1 + dev->use_fast_reg;
+	struct ib_cq_init_attr cq_attr;
 	int ret;
 
 	init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL);
@@ -507,15 +508,20 @@ static int srp_create_ch_ib(struct srp_rdma_ch *ch)
 		return -ENOMEM;
 
 	/* + 1 for SRP_LAST_WR_ID */
+	memset(&cq_attr, 0, sizeof(cq_attr));
+	cq_attr.cqe = target->queue_size + 1;
+	cq_attr.comp_vector = ch->comp_vector;
 	recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, ch,
-			       target->queue_size + 1, ch->comp_vector);
+			       &cq_attr);
 	if (IS_ERR(recv_cq)) {
 		ret = PTR_ERR(recv_cq);
 		goto err;
 	}
 
+	cq_attr.cqe = m * target->queue_size;
+	cq_attr.comp_vector = ch->comp_vector;
 	send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, ch,
-			       m * target->queue_size, ch->comp_vector);
+			       &cq_attr);
 	if (IS_ERR(send_cq)) {
 		ret = PTR_ERR(send_cq);
 		goto err_recv_cq;
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 9b84b4c..1a65202 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -2080,6 +2080,7 @@ static int srpt_create_ch_ib(struct srpt_rdma_ch *ch)
 	struct srpt_port *sport = ch->sport;
 	struct srpt_device *sdev = sport->sdev;
 	u32 srp_sq_size = sport->port_attrib.srp_sq_size;
+	struct ib_cq_init_attr cq_attr;
 	int ret;
 
 	WARN_ON(ch->rq_size < 1);
@@ -2090,8 +2091,10 @@ static int srpt_create_ch_ib(struct srpt_rdma_ch *ch)
 		goto out;
 
 retry:
+	memset(&cq_attr, 0, sizeof(cq_attr));
+	cq_attr.cqe = ch->rq_size + srp_sq_size;
 	ch->cq = ib_create_cq(sdev->device, srpt_completion, NULL, ch,
-			      ch->rq_size + srp_sq_size, 0);
+			      &cq_attr);
 	if (IS_ERR(ch->cq)) {
 		ret = PTR_ERR(ch->cq);
 		pr_err("failed to create CQ cqe= %d ret= %d\n",
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
index 3bad441..a23aa76 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
@@ -647,6 +647,7 @@ kib_conn_t *kiblnd_create_conn(kib_peer_t *peer, struct rdma_cm_id *cmid,
 	kib_dev_t	      *dev;
 	struct ib_qp_init_attr *init_qp_attr;
 	struct kib_sched_info	*sched;
+	struct ib_cq_init_attr  cq_attr;
 	kib_conn_t		*conn;
 	struct ib_cq		*cq;
 	unsigned long		flags;
@@ -742,10 +743,12 @@ kib_conn_t *kiblnd_create_conn(kib_peer_t *peer, struct rdma_cm_id *cmid,
 
 	kiblnd_map_rx_descs(conn);
 
+	memset(&cq_attr, 0, sizeof(cq_attr));
+	cq_attr.cqe = IBLND_CQ_ENTRIES(version);
+	cq_attr.comp_vector = kiblnd_get_completion_vector(conn, cpt);
 	cq = ib_create_cq(cmid->device,
 			  kiblnd_cq_completion, kiblnd_cq_event, conn,
-			  IBLND_CQ_ENTRIES(version),
-			  kiblnd_get_completion_vector(conn, cpt));
+			  &cq_attr);
 	if (IS_ERR(cq)) {
 		CERROR("Can't create CQ: %ld, cqe: %d\n",
 		       PTR_ERR(cq), IBLND_CQ_ENTRIES(version));
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6c715d7..321a0b3 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2314,16 +2314,14 @@ static inline int ib_post_recv(struct ib_qp *qp,
  *   asynchronous event not associated with a completion occurs on the CQ.
  * @cq_context: Context associated with the CQ returned to the user via
  *   the associated completion and event handlers.
- * @cqe: The minimum size of the CQ.
- * @comp_vector - Completion vector used to signal completion events.
- *     Must be >= 0 and < context->num_comp_vectors.
+ * @cq_attr: The attributes the CQ should be created upon.
  *
  * Users can examine the cq structure to determine the actual CQ size.
  */
 struct ib_cq *ib_create_cq(struct ib_device *device,
 			   ib_comp_handler comp_handler,
 			   void (*event_handler)(struct ib_event *, void *),
-			   void *cq_context, int cqe, int comp_vector);
+			   void *cq_context, struct ib_cq_init_attr *cq_attr);
 
 /**
  * ib_resize_cq - Modifies the capacity of the CQ.
diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index 3533d2a..bfbcb45 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -648,6 +648,7 @@ rdma_create_trans(struct p9_client *client, const char *addr, char *args)
 	struct rdma_conn_param conn_param;
 	struct ib_qp_init_attr qp_attr;
 	struct ib_device_attr devattr;
+	struct ib_cq_init_attr cq_attr;
 
 	/* Parse the transport specific mount options */
 	err = parse_opts(args, &opts);
@@ -705,9 +706,11 @@ rdma_create_trans(struct p9_client *client, const char *addr, char *args)
 		goto error;
 
 	/* Create the Completion Queue */
+	memset(&cq_attr, 0, sizeof(cq_attr));
+	cq_attr.cqe = opts.sq_depth + opts.rq_depth + 1;
 	rdma->cq = ib_create_cq(rdma->cm_id->device, cq_comp_handler,
 				cq_event_handler, client,
-				opts.sq_depth + opts.rq_depth + 1, 0);
+				&cq_attr);
 	if (IS_ERR(rdma->cq))
 		goto error;
 	ib_req_notify_cq(rdma->cq, IB_CQ_NEXT_COMP);
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index b8d1bda..7aa1488 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -247,6 +247,7 @@ static int rds_ib_setup_qp(struct rds_connection *conn)
 	struct rds_ib_connection *ic = conn->c_transport_data;
 	struct ib_device *dev = ic->i_cm_id->device;
 	struct ib_qp_init_attr attr;
+	struct ib_cq_init_attr cq_attr;
 	struct rds_ib_device *rds_ibdev;
 	int ret;
 
@@ -270,9 +271,11 @@ static int rds_ib_setup_qp(struct rds_connection *conn)
 	ic->i_pd = rds_ibdev->pd;
 	ic->i_mr = rds_ibdev->mr;
 
+	memset(&cq_attr, 0, sizeof(cq_attr));
+	cq_attr.cqe = ic->i_send_ring.w_nr + 1;
 	ic->i_send_cq = ib_create_cq(dev, rds_ib_send_cq_comp_handler,
 				     rds_ib_cq_event_handler, conn,
-				     ic->i_send_ring.w_nr + 1, 0);
+				     &cq_attr);
 	if (IS_ERR(ic->i_send_cq)) {
 		ret = PTR_ERR(ic->i_send_cq);
 		ic->i_send_cq = NULL;
@@ -280,9 +283,10 @@ static int rds_ib_setup_qp(struct rds_connection *conn)
 		goto out;
 	}
 
+	cq_attr.cqe = ic->i_recv_ring.w_nr;
 	ic->i_recv_cq = ib_create_cq(dev, rds_ib_recv_cq_comp_handler,
 				     rds_ib_cq_event_handler, conn,
-				     ic->i_recv_ring.w_nr, 0);
+				     &cq_attr);
 	if (IS_ERR(ic->i_recv_cq)) {
 		ret = PTR_ERR(ic->i_recv_cq);
 		ic->i_recv_cq = NULL;
diff --git a/net/rds/iw_cm.c b/net/rds/iw_cm.c
index a6c2bea..bc29196 100644
--- a/net/rds/iw_cm.c
+++ b/net/rds/iw_cm.c
@@ -179,6 +179,7 @@ static int rds_iw_init_qp_attrs(struct ib_qp_init_attr *attr,
 		void *context)
 {
 	struct ib_device *dev = rds_iwdev->dev;
+	struct ib_cq_init_attr cq_attr;
 	unsigned int send_size, recv_size;
 	int ret;
 
@@ -198,9 +199,11 @@ static int rds_iw_init_qp_attrs(struct ib_qp_init_attr *attr,
 	attr->sq_sig_type = IB_SIGNAL_REQ_WR;
 	attr->qp_type = IB_QPT_RC;
 
+	memset(&cq_attr, 0, sizeof(cq_attr));
+	cq_attr.cqe = send_size;
 	attr->send_cq = ib_create_cq(dev, send_cq_handler,
 				     rds_iw_cq_event_handler,
-				     context, send_size, 0);
+				     context, &cq_attr);
 	if (IS_ERR(attr->send_cq)) {
 		ret = PTR_ERR(attr->send_cq);
 		attr->send_cq = NULL;
@@ -208,9 +211,10 @@ static int rds_iw_init_qp_attrs(struct ib_qp_init_attr *attr,
 		goto out;
 	}
 
+	cq_attr.cqe = recv_size;
 	attr->recv_cq = ib_create_cq(dev, recv_cq_handler,
 				     rds_iw_cq_event_handler,
-				     context, recv_size, 0);
+				     context, &cq_attr);
 	if (IS_ERR(attr->recv_cq)) {
 		ret = PTR_ERR(attr->recv_cq);
 		attr->recv_cq = NULL;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 88eb994..ad96ee5 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -855,6 +855,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	struct svcxprt_rdma *listen_rdma;
 	struct svcxprt_rdma *newxprt = NULL;
 	struct rdma_conn_param conn_param;
+	struct ib_cq_init_attr cq_attr;
 	struct ib_qp_init_attr qp_attr;
 	struct ib_device_attr devattr;
 	int uninitialized_var(dma_mr_acc);
@@ -907,22 +908,23 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 		dprintk("svcrdma: error creating PD for connect request\n");
 		goto errout;
 	}
+	memset(&cq_attr, 0, sizeof(cq_attr));
+	cq_attr.cqe = newxprt->sc_sq_depth;
 	newxprt->sc_sq_cq = ib_create_cq(newxprt->sc_cm_id->device,
 					 sq_comp_handler,
 					 cq_event_handler,
 					 newxprt,
-					 newxprt->sc_sq_depth,
-					 0);
+					 &cq_attr);
 	if (IS_ERR(newxprt->sc_sq_cq)) {
 		dprintk("svcrdma: error creating SQ CQ for connect request\n");
 		goto errout;
 	}
+	cq_attr.cqe = newxprt->sc_max_requests;
 	newxprt->sc_rq_cq = ib_create_cq(newxprt->sc_cm_id->device,
 					 rq_comp_handler,
 					 cq_event_handler,
 					 newxprt,
-					 newxprt->sc_max_requests,
-					 0);
+					 &cq_attr);
 	if (IS_ERR(newxprt->sc_rq_cq)) {
 		dprintk("svcrdma: error creating RQ CQ for connect request\n");
 		goto errout;
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 6f6b8a5..5ac60d4 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -644,6 +644,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 {
 	struct ib_device_attr *devattr = &ia->ri_devattr;
 	struct ib_cq *sendcq, *recvcq;
+	struct ib_cq_init_attr cq_attr;
 	int rc, err;
 
 	/* check provider's send/recv wr limits */
@@ -691,9 +692,10 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	init_waitqueue_head(&ep->rep_connect_wait);
 	INIT_DELAYED_WORK(&ep->rep_connect_worker, rpcrdma_connect_worker);
 
+	memset(&cq_attr, 0, sizeof(cq_attr));
+	cq_attr.cqe = ep->rep_attr.cap.max_send_wr + 1;
 	sendcq = ib_create_cq(ia->ri_id->device, rpcrdma_sendcq_upcall,
-				  rpcrdma_cq_async_error_upcall, ep,
-				  ep->rep_attr.cap.max_send_wr + 1, 0);
+				  rpcrdma_cq_async_error_upcall, ep, &cq_attr);
 	if (IS_ERR(sendcq)) {
 		rc = PTR_ERR(sendcq);
 		dprintk("RPC:       %s: failed to create send CQ: %i\n",
@@ -708,9 +710,9 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 		goto out2;
 	}
 
+	cq_attr.cqe = ep->rep_attr.cap.max_recv_wr + 1;
 	recvcq = ib_create_cq(ia->ri_id->device, rpcrdma_recvcq_upcall,
-				  rpcrdma_cq_async_error_upcall, ep,
-				  ep->rep_attr.cap.max_recv_wr + 1, 0);
+				  rpcrdma_cq_async_error_upcall, ep, &cq_attr);
 	if (IS_ERR(recvcq)) {
 		rc = PTR_ERR(recvcq);
 		dprintk("RPC:       %s: failed to create recv CQ: %i\n",
-- 
1.7.1



* [PATCH for-next V2 3/9] IB/core: Add CQ creation time-stamping flag
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-05-31 12:14   ` [PATCH for-next V2 1/9] IB/core: Change provider's API of create_cq to be extendible Or Gerlitz
  2015-05-31 12:14   ` [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr Or Gerlitz
@ 2015-05-31 12:14   ` Or Gerlitz
  2015-05-31 12:14   ` [PATCH for-next V2 4/9] IB/core: Extend ib_uverbs_create_cq Or Gerlitz
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:14 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add a CQ creation flag which dictates that the created CQ will report
the completion time-stamp value in the WC.
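
How a provider might validate the new flag can be sketched as follows.
This is an illustrative userspace mock-up, not code from this series: the
enum is copied from this patch, while check_cq_flags() and its
"supported" mask are hypothetical. Providers touched by patch 1/9 that
support no flags simply reject any non-zero value with -EINVAL, which is
the supported == 0 case here.

```c
#include <errno.h>
#include <stdint.h>

/* Copied from this patch. */
enum ib_cq_creation_flags {
	IB_CQ_FLAGS_TIMESTAMP_COMPLETION = 1 << 0,
};

/*
 * Hypothetical helper: accept the request only if every requested
 * flag is in the provider's supported mask.
 */
static int check_cq_flags(uint32_t requested, uint32_t supported)
{
	if (requested & ~supported)
		return -EINVAL;
	return 0;
}
```

Rejecting unknown bits rather than ignoring them lets a consumer detect
that the device will not deliver time-stamps before it starts polling.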

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 include/rdma/ib_verbs.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 321a0b3..62e88e1 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -173,6 +173,10 @@ struct ib_odp_caps {
 	} per_transport_caps;
 };
 
+enum ib_cq_creation_flags {
+	IB_CQ_FLAGS_TIMESTAMP_COMPLETION   = 1 << 0,
+};
+
 struct ib_cq_init_attr {
 	unsigned int	cqe;
 	int		comp_vector;
-- 
1.7.1



* [PATCH for-next V2 4/9] IB/core: Extend ib_uverbs_create_cq
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-05-31 12:14   ` [PATCH for-next V2 3/9] IB/core: Add CQ creation time-stamping flag Or Gerlitz
@ 2015-05-31 12:14   ` Or Gerlitz
  2015-05-31 12:14   ` [PATCH for-next V2 5/9] IB/core: Add timestamp_mask and hca_core_clock to query_device Or Gerlitz
                     ` (6 subsequent siblings)
  10 siblings, 0 replies; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:14 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

ib_uverbs_ex_create_cq follows the extension verbs
mechanism. New features (for example, the CQ creation flags
field which is added in a downstream patch) could be used
via user-space libraries without breaking the ABI.
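
The size check that keeps the legacy command working can be sketched in
userspace. This is a mock-up under stated assumptions: the struct mirrors
ib_uverbs_ex_create_cq from this patch with __u64/__u32/__s32 replaced by
stdint types, the comparison is the same one create_cq() performs, and
effective_flags() is a hypothetical name used only here.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct ib_uverbs_ex_create_cq from this patch. */
struct ib_uverbs_ex_create_cq {
	uint64_t user_handle;
	uint32_t cqe;
	uint32_t comp_vector;
	int32_t  comp_channel;
	uint32_t comp_mask;
	uint32_t flags;
	uint32_t reserved;
};

/*
 * Hypothetical helper: honour cmd->flags only when the caller's
 * command is large enough to contain the field, using the same
 * comparison as create_cq() in this patch. Shorter (legacy)
 * commands behave as if flags were 0.
 */
static uint32_t effective_flags(const struct ib_uverbs_ex_create_cq *cmd,
				size_t cmd_sz)
{
	if (cmd_sz > offsetof(struct ib_uverbs_ex_create_cq, flags) +
		     sizeof(cmd->flags))
		return cmd->flags;
	return 0;
}
```

The legacy path passes a size that ends at comp_channel, so the flags
field is never consulted there; the extended path passes the full
command size, so flags set by user space reach the provider.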

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h      |    1 +
 drivers/infiniband/core/uverbs_cmd.c  |  170 ++++++++++++++++++++++++++-------
 drivers/infiniband/core/uverbs_main.c |    1 +
 include/uapi/rdma/ib_user_verbs.h     |   17 ++++
 4 files changed, 154 insertions(+), 35 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index b716b08..ba365b6 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -259,5 +259,6 @@ IB_UVERBS_DECLARE_CMD(close_xrcd);
 IB_UVERBS_DECLARE_EX_CMD(create_flow);
 IB_UVERBS_DECLARE_EX_CMD(destroy_flow);
 IB_UVERBS_DECLARE_EX_CMD(query_device);
+IB_UVERBS_DECLARE_EX_CMD(create_cq);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 1954ebb..51311b1 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1330,41 +1330,37 @@ ssize_t ib_uverbs_create_comp_channel(struct ib_uverbs_file *file,
 	return in_len;
 }
 
-ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
-			    const char __user *buf, int in_len,
-			    int out_len)
+static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
+				       struct ib_udata *ucore,
+				       struct ib_udata *uhw,
+				       struct ib_uverbs_ex_create_cq *cmd,
+				       size_t cmd_sz,
+				       int (*cb)(struct ib_uverbs_file *file,
+						 struct ib_ucq_object *obj,
+						 struct ib_uverbs_ex_create_cq_resp *resp,
+						 struct ib_udata *udata,
+						 void *context),
+				       void *context)
 {
-	struct ib_uverbs_create_cq      cmd;
-	struct ib_uverbs_create_cq_resp resp;
-	struct ib_udata                 udata;
 	struct ib_ucq_object           *obj;
 	struct ib_uverbs_event_file    *ev_file = NULL;
 	struct ib_cq                   *cq;
 	int                             ret;
+	struct ib_uverbs_ex_create_cq_resp resp;
 	struct ib_cq_init_attr attr = {};
 
-	if (out_len < sizeof resp)
-		return -ENOSPC;
-
-	if (copy_from_user(&cmd, buf, sizeof cmd))
-		return -EFAULT;
-
-	INIT_UDATA(&udata, buf + sizeof cmd,
-		   (unsigned long) cmd.response + sizeof resp,
-		   in_len - sizeof cmd, out_len - sizeof resp);
-
-	if (cmd.comp_vector >= file->device->num_comp_vectors)
-		return -EINVAL;
+	if (cmd->comp_vector >= file->device->num_comp_vectors)
+		return ERR_PTR(-EINVAL);
 
 	obj = kmalloc(sizeof *obj, GFP_KERNEL);
 	if (!obj)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
-	init_uobj(&obj->uobject, cmd.user_handle, file->ucontext, &cq_lock_class);
+	init_uobj(&obj->uobject, cmd->user_handle, file->ucontext, &cq_lock_class);
 	down_write(&obj->uobject.mutex);
 
-	if (cmd.comp_channel >= 0) {
-		ev_file = ib_uverbs_lookup_comp_file(cmd.comp_channel);
+	if (cmd->comp_channel >= 0) {
+		ev_file = ib_uverbs_lookup_comp_file(cmd->comp_channel);
 		if (!ev_file) {
 			ret = -EINVAL;
 			goto err;
@@ -1377,10 +1373,14 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
 	INIT_LIST_HEAD(&obj->comp_list);
 	INIT_LIST_HEAD(&obj->async_list);
 
-	attr.cqe = cmd.cqe;
-	attr.comp_vector = cmd.comp_vector;
+	attr.cqe = cmd->cqe;
+	attr.comp_vector = cmd->comp_vector;
+
+	if (cmd_sz > offsetof(typeof(*cmd), flags) + sizeof(cmd->flags))
+		attr.flags = cmd->flags;
+
 	cq = file->device->ib_dev->create_cq(file->device->ib_dev, &attr,
-					     file->ucontext, &udata);
+					     file->ucontext, uhw);
 	if (IS_ERR(cq)) {
 		ret = PTR_ERR(cq);
 		goto err_file;
@@ -1399,14 +1399,15 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
 		goto err_free;
 
 	memset(&resp, 0, sizeof resp);
-	resp.cq_handle = obj->uobject.id;
-	resp.cqe       = cq->cqe;
+	resp.base.cq_handle = obj->uobject.id;
+	resp.base.cqe       = cq->cqe;
 
-	if (copy_to_user((void __user *) (unsigned long) cmd.response,
-			 &resp, sizeof resp)) {
-		ret = -EFAULT;
-		goto err_copy;
-	}
+	resp.response_length = offsetof(typeof(resp), response_length) +
+		sizeof(resp.response_length);
+
+	ret = cb(file, obj, &resp, ucore, context);
+	if (ret)
+		goto err_cb;
 
 	mutex_lock(&file->mutex);
 	list_add_tail(&obj->uobject.list, &file->ucontext->cq_list);
@@ -1416,9 +1417,9 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
 
 	up_write(&obj->uobject.mutex);
 
-	return in_len;
+	return obj;
 
-err_copy:
+err_cb:
 	idr_remove_uobj(&ib_uverbs_cq_idr, &obj->uobject);
 
 err_free:
@@ -1430,7 +1431,106 @@ err_file:
 
 err:
 	put_uobj_write(&obj->uobject);
-	return ret;
+
+	return ERR_PTR(ret);
+}
+
+static int ib_uverbs_create_cq_cb(struct ib_uverbs_file *file,
+				  struct ib_ucq_object *obj,
+				  struct ib_uverbs_ex_create_cq_resp *resp,
+				  struct ib_udata *ucore, void *context)
+{
+	if (ib_copy_to_udata(ucore, &resp->base, sizeof(resp->base)))
+		return -EFAULT;
+
+	return 0;
+}
+
+ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
+			    const char __user *buf, int in_len,
+			    int out_len)
+{
+	struct ib_uverbs_create_cq      cmd;
+	struct ib_uverbs_ex_create_cq	cmd_ex;
+	struct ib_uverbs_create_cq_resp resp;
+	struct ib_udata                 ucore;
+	struct ib_udata                 uhw;
+	struct ib_ucq_object           *obj;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, buf, sizeof(cmd)))
+		return -EFAULT;
+
+	INIT_UDATA(&ucore, buf, cmd.response, sizeof(cmd), sizeof(resp));
+
+	INIT_UDATA(&uhw, buf + sizeof(cmd),
+		   (unsigned long)cmd.response + sizeof(resp),
+		   in_len - sizeof(cmd), out_len - sizeof(resp));
+
+	memset(&cmd_ex, 0, sizeof(cmd_ex));
+	cmd_ex.user_handle = cmd.user_handle;
+	cmd_ex.cqe = cmd.cqe;
+	cmd_ex.comp_vector = cmd.comp_vector;
+	cmd_ex.comp_channel = cmd.comp_channel;
+
+	obj = create_cq(file, &ucore, &uhw, &cmd_ex,
+			offsetof(typeof(cmd_ex), comp_channel) +
+			sizeof(cmd.comp_channel), ib_uverbs_create_cq_cb,
+			NULL);
+
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	return in_len;
+}
+
+static int ib_uverbs_ex_create_cq_cb(struct ib_uverbs_file *file,
+				     struct ib_ucq_object *obj,
+				     struct ib_uverbs_ex_create_cq_resp *resp,
+				     struct ib_udata *ucore, void *context)
+{
+	if (ib_copy_to_udata(ucore, resp, resp->response_length))
+		return -EFAULT;
+
+	return 0;
+}
+
+int ib_uverbs_ex_create_cq(struct ib_uverbs_file *file,
+			   struct ib_udata *ucore,
+			   struct ib_udata *uhw)
+{
+	struct ib_uverbs_ex_create_cq_resp resp;
+	struct ib_uverbs_ex_create_cq  cmd;
+	struct ib_ucq_object           *obj;
+	int err;
+
+	if (ucore->inlen < sizeof(cmd))
+		return -EINVAL;
+
+	err = ib_copy_from_udata(&cmd, ucore, sizeof(cmd));
+	if (err)
+		return err;
+
+	if (cmd.comp_mask)
+		return -EINVAL;
+
+	if (cmd.reserved)
+		return -EINVAL;
+
+	if (ucore->outlen < (offsetof(typeof(resp), response_length) +
+			     sizeof(resp.response_length)))
+		return -ENOSPC;
+
+	obj = create_cq(file, ucore, uhw, &cmd,
+			min(ucore->inlen, sizeof(cmd)),
+			ib_uverbs_ex_create_cq_cb, NULL);
+
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	return 0;
 }
 
 ssize_t ib_uverbs_resize_cq(struct ib_uverbs_file *file,
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 88cce9b..f6eef2d 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -124,6 +124,7 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file *file,
 	[IB_USER_VERBS_EX_CMD_CREATE_FLOW]	= ib_uverbs_ex_create_flow,
 	[IB_USER_VERBS_EX_CMD_DESTROY_FLOW]	= ib_uverbs_ex_destroy_flow,
 	[IB_USER_VERBS_EX_CMD_QUERY_DEVICE]	= ib_uverbs_ex_query_device,
+	[IB_USER_VERBS_EX_CMD_CREATE_CQ]	= ib_uverbs_ex_create_cq,
 };
 
 static void ib_uverbs_add_one(struct ib_device *device);
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index b513e66..19f0256 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -91,6 +91,7 @@ enum {
 
 enum {
 	IB_USER_VERBS_EX_CMD_QUERY_DEVICE = IB_USER_VERBS_CMD_QUERY_DEVICE,
+	IB_USER_VERBS_EX_CMD_CREATE_CQ = IB_USER_VERBS_CMD_CREATE_CQ,
 	IB_USER_VERBS_EX_CMD_CREATE_FLOW = IB_USER_VERBS_CMD_THRESHOLD,
 	IB_USER_VERBS_EX_CMD_DESTROY_FLOW,
 };
@@ -353,11 +354,27 @@ struct ib_uverbs_create_cq {
 	__u64 driver_data[0];
 };
 
+struct ib_uverbs_ex_create_cq {
+	__u64 user_handle;
+	__u32 cqe;
+	__u32 comp_vector;
+	__s32 comp_channel;
+	__u32 comp_mask;
+	__u32 flags;
+	__u32 reserved;
+};
+
 struct ib_uverbs_create_cq_resp {
 	__u32 cq_handle;
 	__u32 cqe;
 };
 
+struct ib_uverbs_ex_create_cq_resp {
+	struct ib_uverbs_create_cq_resp base;
+	__u32 comp_mask;
+	__u32 response_length;
+};
+
 struct ib_uverbs_resize_cq {
 	__u64 response;
 	__u32 cq_handle;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 5/9] IB/core: Add timestamp_mask and hca_core_clock to query_device
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2015-05-31 12:14   ` [PATCH for-next V2 4/9] IB/core: Extend ib_uverbs_create_cq Or Gerlitz
@ 2015-05-31 12:14   ` Or Gerlitz
  2015-05-31 12:14   ` [PATCH for-next V2 6/9] IB/core: Pass hardware specific data in query_device Or Gerlitz
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:14 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

In order to expose timestamps we need to expose two attributes
in query_device:
timestamp_mask - mask of the bits that are valid in the timestamp.
		 Timestamp values are at most 64 bits wide.
hca_core_clock - the timestamp is given in HW cycles; hca_core_clock
		 is the frequency of the HCA core clock and is needed
		 in order to convert cycles to seconds.
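
As a rough illustration of how a consumer might combine the two
attributes, here is a minimal sketch of a cycles-to-nanoseconds
conversion. cycles_to_ns() is a hypothetical helper that is not part of
this series, and the patch does not state the unit of hca_core_clock, so
the sketch simply assumes the frequency is handed in as kHz.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: convert a raw completion
 * timestamp (HCA cycles) to nanoseconds.
 *
 * ASSUMPTION: core_clock_khz is the HCA core clock frequency in kHz.
 * The patch itself does not define the unit of hca_core_clock.
 *
 * The result wraps if the true value does not fit in 64 bits.
 */
static uint64_t cycles_to_ns(uint64_t raw, uint64_t timestamp_mask,
			     uint64_t core_clock_khz)
{
	uint64_t cycles = raw & timestamp_mask;	/* keep only the valid bits */
	uint64_t whole, rem;

	if (core_clock_khz == 0)
		return 0;	/* frequency not reported, nothing to convert */

	/*
	 * ns = cycles * 1e6 / kHz. Split into quotient and remainder so the
	 * intermediate product does not overflow for wide timestamps.
	 */
	whole = cycles / core_clock_khz;
	rem = cycles % core_clock_khz;

	return whole * 1000000ULL + (rem * 1000000ULL) / core_clock_khz;
}
```

For example, with a 427 MHz clock (427000 kHz under the assumption
above), 427000 cycles correspond to one millisecond.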

Both attributes are added to ib_query_device and to its uverbs
counterpart.
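
On the uverbs side, the new fields are appended to the response only
when the user buffer is large enough, so older user space with a shorter
response struct keeps working. The following is a simplified, standalone
sketch of that pattern. struct demo_resp and demo_fill_resp() are
illustrative stand-ins and not the kernel's types.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an extensible uverbs response (illustration only). */
struct demo_resp {
	uint32_t comp_mask;
	uint32_t response_length;
	uint64_t timestamp_mask;
	uint64_t hca_core_clock;
};

/*
 * Fill as many trailing fields as fit in a user buffer of outlen bytes and
 * return the number of bytes that would be copied back. Returns 0 when even
 * the mandatory header does not fit (the kernel code returns -ENOSPC there).
 */
static size_t demo_fill_resp(struct demo_resp *resp, size_t outlen,
			     uint64_t mask, uint64_t clock)
{
	memset(resp, 0, sizeof(*resp));
	resp->response_length = offsetof(struct demo_resp, response_length) +
				sizeof(resp->response_length);

	if (outlen < resp->response_length)
		return 0;

	/* Each optional field is added only if the caller has room for it. */
	if (outlen < resp->response_length + sizeof(resp->timestamp_mask))
		goto end;
	resp->timestamp_mask = mask;
	resp->response_length += sizeof(resp->timestamp_mask);

	if (outlen < resp->response_length + sizeof(resp->hca_core_clock))
		goto end;
	resp->hca_core_clock = clock;
	resp->response_length += sizeof(resp->hca_core_clock);

end:
	return resp->response_length;
}
```

A caller that only knows the header gets response_length == 8 back and
never sees the timestamp fields, while a newer caller can read
response_length to learn which fields were actually filled in.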

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/device.c     |    2 ++
 drivers/infiniband/core/uverbs_cmd.c |   14 ++++++++++++++
 include/rdma/ib_verbs.h              |    2 ++
 include/uapi/rdma/ib_user_verbs.h    |    2 ++
 4 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 8d07c12..568cb41 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -539,6 +539,8 @@ EXPORT_SYMBOL(ib_dispatch_event);
 int ib_query_device(struct ib_device *device,
 		    struct ib_device_attr *device_attr)
 {
+	memset(device_attr, 0, sizeof(*device_attr));
+
 	return device->query_device(device, device_attr);
 }
 EXPORT_SYMBOL(ib_query_device);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 51311b1..11ee298 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3426,6 +3426,8 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 	if (ucore->outlen < resp.response_length)
 		return -ENOSPC;
 
+	memset(&attr, 0, sizeof(attr));
+
 	err = device->query_device(device, &attr);
 	if (err)
 		return err;
@@ -3450,6 +3452,18 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 #endif
 	resp.response_length += sizeof(resp.odp_caps);
 
+	if (ucore->outlen < resp.response_length + sizeof(resp.timestamp_mask))
+		goto end;
+
+	resp.timestamp_mask = attr.timestamp_mask;
+	resp.response_length += sizeof(resp.timestamp_mask);
+
+	if (ucore->outlen < resp.response_length + sizeof(resp.hca_core_clock))
+		goto end;
+
+	resp.hca_core_clock = attr.hca_core_clock;
+	resp.response_length += sizeof(resp.hca_core_clock);
+
 end:
 	err = ib_copy_to_udata(ucore, &resp, resp.response_length);
 	if (err)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 62e88e1..a45f674 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -227,6 +227,8 @@ struct ib_device_attr {
 	int			sig_prot_cap;
 	int			sig_guard_cap;
 	struct ib_odp_caps	odp_caps;
+	uint64_t		timestamp_mask;
+	uint64_t		hca_core_clock;
 };
 
 enum ib_mtu {
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 19f0256..f9b8843 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -223,6 +223,8 @@ struct ib_uverbs_ex_query_device_resp {
 	__u32 comp_mask;
 	__u32 response_length;
 	struct ib_uverbs_odp_caps odp_caps;
+	__u64 timestamp_mask;
+	__u64 hca_core_clock;
 };
 
 struct ib_uverbs_query_port {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 6/9] IB/core: Pass hardware specific data in query_device
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2015-05-31 12:14   ` [PATCH for-next V2 5/9] IB/core: Add timestamp_mask and hca_core_clock to query_device Or Gerlitz
@ 2015-05-31 12:14   ` Or Gerlitz
       [not found]     ` <1433074457-26437-7-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-05-31 12:14   ` [PATCH for-next V2 7/9] IB/mlx4: Add mmap call to map the hardware clock Or Gerlitz
                     ` (4 subsequent siblings)
  10 siblings, 1 reply; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:14 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Vendors should be able to pass vendor-specific data to/from
user-space via the query_device uverb. In order to do this,
we need to pass the vendor-specific udata to the drivers'
query_device callbacks.
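
In this patch, every driver rejects a non-empty udata, and in-kernel
callers pass a zero-length one. A minimal standalone sketch of that
contract follows. struct demo_udata and demo_query_device() are
simplified stand-ins for struct ib_udata and a driver callback, used
here for illustration only.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for struct ib_udata: only the two lengths matter here. */
struct demo_udata {
	size_t inlen;
	size_t outlen;
};

/*
 * Stand-in for a driver's query_device callback that has no vendor-specific
 * data to exchange: any vendor payload in either direction is an error.
 */
static int demo_query_device(const struct demo_udata *uhw)
{
	if (uhw->inlen || uhw->outlen)
		return -EINVAL;

	/* ... fill in the device attributes here ... */
	return 0;
}
```

A kernel caller would pass a udata with both lengths set to zero, which
always passes the check. A driver that later wants to exchange vendor
data relaxes the check for the lengths it understands.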

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/device.c             |    4 +++-
 drivers/infiniband/core/uverbs_cmd.c         |    2 +-
 drivers/infiniband/hw/amso1100/c2_provider.c |    7 +++++--
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |    8 ++++++--
 drivers/infiniband/hw/cxgb4/provider.c       |    8 ++++++--
 drivers/infiniband/hw/ehca/ehca_hca.c        |    6 +++++-
 drivers/infiniband/hw/ehca/ehca_iverbs.h     |    3 ++-
 drivers/infiniband/hw/ipath/ipath_verbs.c    |    7 +++++--
 drivers/infiniband/hw/mlx4/main.c            |    6 +++++-
 drivers/infiniband/hw/mlx5/main.c            |    9 +++++++--
 drivers/infiniband/hw/mthca/mthca_provider.c |    7 +++++--
 drivers/infiniband/hw/nes/nes_verbs.c        |    6 +++++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |    6 +++++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  |    3 ++-
 drivers/infiniband/hw/qib/qib_verbs.c        |    6 ++++--
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c |    6 +++++-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h |    3 ++-
 include/rdma/ib_verbs.h                      |    3 ++-
 18 files changed, 75 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 568cb41..694bd66 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -539,9 +539,11 @@ EXPORT_SYMBOL(ib_dispatch_event);
 int ib_query_device(struct ib_device *device,
 		    struct ib_device_attr *device_attr)
 {
+	struct ib_udata uhw = {.outlen = 0, .inlen = 0};
+
 	memset(device_attr, 0, sizeof(*device_attr));
 
-	return device->query_device(device, device_attr);
+	return device->query_device(device, device_attr, &uhw);
 }
 EXPORT_SYMBOL(ib_query_device);
 
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 11ee298..bbb02ff 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3428,7 +3428,7 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 
 	memset(&attr, 0, sizeof(attr));
 
-	err = device->query_device(device, &attr);
+	err = device->query_device(device, &attr, uhw);
 	if (err)
 		return err;
 
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index a43e022..382f109 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -63,13 +63,16 @@
 #include "c2_provider.h"
 #include "c2_user.h"
 
-static int c2_query_device(struct ib_device *ibdev,
-			   struct ib_device_attr *props)
+static int c2_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			   struct ib_udata *uhw)
 {
 	struct c2_dev *c2dev = to_c2dev(ibdev);
 
 	pr_debug("%s:%u\n", __func__, __LINE__);
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	*props = c2dev->props;
 	return 0;
 }
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 2eaf7e8..c4b5936 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1150,13 +1150,17 @@ static u64 fw_vers_string_to_u64(struct iwch_dev *iwch_dev)
 	       (fw_mic & 0xffff);
 }
 
-static int iwch_query_device(struct ib_device *ibdev,
-			     struct ib_device_attr *props)
+static int iwch_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			     struct ib_udata *uhw)
 {
 
 	struct iwch_dev *dev;
+
 	PDBG("%s ibdev %p\n", __func__, ibdev);
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	dev = to_iwch_dev(ibdev);
 	memset(props, 0, sizeof *props);
 	memcpy(&props->sys_image_guid, dev->rdev.t3cdev_p->lldev->dev_addr, 6);
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index ef08a9f..05a96a5 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -301,13 +301,17 @@ static int c4iw_query_gid(struct ib_device *ibdev, u8 port, int index,
 	return 0;
 }
 
-static int c4iw_query_device(struct ib_device *ibdev,
-			     struct ib_device_attr *props)
+static int c4iw_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			     struct ib_udata *uhw)
 {
 
 	struct c4iw_dev *dev;
+
 	PDBG("%s ibdev %p\n", __func__, ibdev);
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	dev = to_c4iw_dev(ibdev);
 	memset(props, 0, sizeof *props);
 	memcpy(&props->sys_image_guid, dev->rdev.lldi.ports[0]->dev_addr, 6);
diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c
index 9ed4d25..e8b1bb6 100644
--- a/drivers/infiniband/hw/ehca/ehca_hca.c
+++ b/drivers/infiniband/hw/ehca/ehca_hca.c
@@ -50,7 +50,8 @@ static unsigned int limit_uint(unsigned int value)
 	return min_t(unsigned int, value, INT_MAX);
 }
 
-int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
+int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+		      struct ib_udata *uhw)
 {
 	int i, ret = 0;
 	struct ehca_shca *shca = container_of(ibdev, struct ehca_shca,
@@ -71,6 +72,9 @@ int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
 		IB_DEVICE_PORT_ACTIVE_EVENT,  HCA_CAP_PORT_ACTIVE_EVENT,
 	};
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	rblock = ehca_alloc_fw_ctrlblock(GFP_KERNEL);
 	if (!rblock) {
 		ehca_err(&shca->ib_device, "Can't allocate rblock memory.");
diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
index 952f2fb..192eef1 100644
--- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
+++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
@@ -44,7 +44,8 @@
 
 #include "ehca_classes.h"
 
-int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props);
+int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+		      struct ib_udata *uhw);
 
 int ehca_query_port(struct ib_device *ibdev, u8 port,
 		    struct ib_port_attr *props);
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 764081d..78af73c 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1495,11 +1495,14 @@ bail:
 	return 0;
 }
 
-static int ipath_query_device(struct ib_device *ibdev,
-			      struct ib_device_attr *props)
+static int ipath_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			      struct ib_udata *uhw)
 {
 	struct ipath_ibdev *dev = to_idev(ibdev);
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	memset(props, 0, sizeof(*props));
 
 	props->device_cap_flags = IB_DEVICE_BAD_PKEY_CNTR |
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 3b90df0..914beae 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -132,7 +132,8 @@ static int num_ib_ports(struct mlx4_dev *dev)
 }
 
 static int mlx4_ib_query_device(struct ib_device *ibdev,
-				struct ib_device_attr *props)
+				struct ib_device_attr *props,
+				struct ib_udata *uhw)
 {
 	struct mlx4_ib_dev *dev = to_mdev(ibdev);
 	struct ib_smp *in_mad  = NULL;
@@ -140,6 +141,9 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	int err = -ENOMEM;
 	int have_ib_ports;
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	in_mad  = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index b9976a0..5a8486c 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -63,7 +63,8 @@ static char mlx5_version[] =
 	DRIVER_VERSION " (" DRIVER_RELDATE ")\n";
 
 static int mlx5_ib_query_device(struct ib_device *ibdev,
-				struct ib_device_attr *props)
+				struct ib_device_attr *props,
+				struct ib_udata *uhw)
 {
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
 	struct ib_smp *in_mad  = NULL;
@@ -74,6 +75,9 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	int max_sq_sg;
 	u64 flags;
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	gen = &dev->mdev->caps.gen;
 	in_mad  = kzalloc(sizeof(*in_mad), GFP_KERNEL);
 	out_mad = kmalloc(sizeof(*out_mad), GFP_KERNEL);
@@ -910,6 +914,7 @@ static int get_port_caps(struct mlx5_ib_dev *dev)
 	struct mlx5_general_caps *gen;
 	int err = -ENOMEM;
 	int port;
+	struct ib_udata uhw = {.inlen = 0, .outlen = 0};
 
 	gen = &dev->mdev->caps.gen;
 	pprops = kmalloc(sizeof(*pprops), GFP_KERNEL);
@@ -920,7 +925,7 @@ static int get_port_caps(struct mlx5_ib_dev *dev)
 	if (!dprops)
 		goto out;
 
-	err = mlx5_ib_query_device(&dev->ib_dev, dprops);
+	err = mlx5_ib_query_device(&dev->ib_dev, dprops, &uhw);
 	if (err) {
 		mlx5_ib_warn(dev, "query_device failed %d\n", err);
 		goto out;
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index f1b04bd..d6d7c16 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -57,14 +57,17 @@ static void init_query_mad(struct ib_smp *mad)
 	mad->method    	   = IB_MGMT_METHOD_GET;
 }
 
-static int mthca_query_device(struct ib_device *ibdev,
-			      struct ib_device_attr *props)
+static int mthca_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			      struct ib_udata *uhw)
 {
 	struct ib_smp *in_mad  = NULL;
 	struct ib_smp *out_mad = NULL;
 	int err = -ENOMEM;
 	struct mthca_dev *mdev = to_mdev(ibdev);
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	in_mad  = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index a9d62c5..0c2fb7a 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -512,12 +512,16 @@ static void nes_free_fast_reg_page_list(struct ib_fast_reg_page_list *pifrpl)
 /**
  * nes_query_device
  */
-static int nes_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
+static int nes_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			    struct ib_udata *uhw)
 {
 	struct nes_vnic *nesvnic = to_nesvnic(ibdev);
 	struct nes_device *nesdev = nesvnic->nesdev;
 	struct nes_ib_device *nesibdev = nesvnic->nesibdev;
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	memset(props, 0, sizeof(*props));
 	memcpy(&props->sys_image_guid, nesvnic->netdev->dev_addr, 6);
 
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index f441df6..fce11ec 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -61,10 +61,14 @@ int ocrdma_query_gid(struct ib_device *ibdev, u8 port,
 	return 0;
 }
 
-int ocrdma_query_device(struct ib_device *ibdev, struct ib_device_attr *attr)
+int ocrdma_query_device(struct ib_device *ibdev, struct ib_device_attr *attr,
+			struct ib_udata *uhw)
 {
 	struct ocrdma_dev *dev = get_ocrdma_dev(ibdev);
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	memset(attr, 0, sizeof *attr);
 	memcpy(&attr->fw_ver, &dev->attr.fw_ver[0],
 	       min(sizeof(dev->attr.fw_ver), sizeof(attr->fw_ver)));
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index e853195..b15c608 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -36,7 +36,8 @@ int ocrdma_post_recv(struct ib_qp *, struct ib_recv_wr *,
 int ocrdma_poll_cq(struct ib_cq *, int num_entries, struct ib_wc *wc);
 int ocrdma_arm_cq(struct ib_cq *, enum ib_cq_notify_flags flags);
 
-int ocrdma_query_device(struct ib_device *, struct ib_device_attr *props);
+int ocrdma_query_device(struct ib_device *, struct ib_device_attr *props,
+			struct ib_udata *uhw);
 int ocrdma_query_port(struct ib_device *, u8 port, struct ib_port_attr *props);
 int ocrdma_modify_port(struct ib_device *, u8 port, int mask,
 		       struct ib_port_modify *props);
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index dba1c92..e1dad45 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -1550,12 +1550,14 @@ full:
 	}
 }
 
-static int qib_query_device(struct ib_device *ibdev,
-			    struct ib_device_attr *props)
+static int qib_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			    struct ib_udata *uhw)
 {
 	struct qib_devdata *dd = dd_from_ibdev(ibdev);
 	struct qib_ibdev *dev = to_idev(ibdev);
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
 	memset(props, 0, sizeof(*props));
 
 	props->device_cap_flags = IB_DEVICE_BAD_PKEY_CNTR |
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index 2b3f7e3..7df4382 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -248,7 +248,8 @@ enum rdma_link_layer usnic_ib_port_link_layer(struct ib_device *device,
 }
 
 int usnic_ib_query_device(struct ib_device *ibdev,
-				struct ib_device_attr *props)
+			  struct ib_device_attr *props,
+			  struct ib_udata *uhw)
 {
 	struct usnic_ib_dev *us_ibdev = to_usdev(ibdev);
 	union ib_gid gid;
@@ -257,6 +258,9 @@ int usnic_ib_query_device(struct ib_device *ibdev,
 	int qp_per_vf;
 
 	usnic_dbg("\n");
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	mutex_lock(&us_ibdev->usdev_lock);
 	us_ibdev->netdev->ethtool_ops->get_drvinfo(us_ibdev->netdev, &info);
 	us_ibdev->netdev->ethtool_ops->get_settings(us_ibdev->netdev, &cmd);
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
index 2ab8427..0bd04ef 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
@@ -24,7 +24,8 @@
 enum rdma_link_layer usnic_ib_port_link_layer(struct ib_device *device,
 						u8 port_num);
 int usnic_ib_query_device(struct ib_device *ibdev,
-				struct ib_device_attr *props);
+				struct ib_device_attr *props,
+			  struct ib_udata *uhw);
 int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
 				struct ib_port_attr *props);
 enum rdma_protocol_type
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index a45f674..045b5b2 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1562,7 +1562,8 @@ struct ib_device {
 	int		           (*get_protocol_stats)(struct ib_device *device,
 							 union rdma_protocol_stats *stats);
 	int		           (*query_device)(struct ib_device *device,
-						   struct ib_device_attr *device_attr);
+						   struct ib_device_attr *device_attr,
+						   struct ib_udata *udata);
 	int		           (*query_port)(struct ib_device *device,
 						 u8 port_num,
 						 struct ib_port_attr *port_attr);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 7/9] IB/mlx4: Add mmap call to map the hardware clock
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2015-05-31 12:14   ` [PATCH for-next V2 6/9] IB/core: Pass hardware specific data in query_device Or Gerlitz
@ 2015-05-31 12:14   ` Or Gerlitz
  2015-05-31 12:14   ` [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs Or Gerlitz
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:14 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

In order to read the HCA's cycle counter efficiently in
user space, we need to map the HCA's register.
This is done through an mmap call.
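
From the user-space side, the mapping could look roughly like the sketch
below. It assumes an already opened uverbs command fd and only relies on
what the diff shows: the clock page is selected by page offset 3, and
the register's offset inside that page is reported separately (see the
hca_core_clock_offset field in patch 9/9). All demo_* names are
hypothetical, and the layout of the register itself is not covered here.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_CLOCK_PGOFF 3	/* matches the vma->vm_pgoff == 3 check in the patch */

/* Byte offset to hand to mmap() in order to select the clock page. */
static off_t demo_clock_mmap_offset(long page_size)
{
	return (off_t)DEMO_CLOCK_PGOFF * page_size;
}

/*
 * Map the clock page read-only through an open uverbs command fd.
 * Returns MAP_FAILED on error, like mmap() itself.
 */
static void *demo_map_clock_page(int cmd_fd)
{
	long page_size = sysconf(_SC_PAGESIZE);

	if (page_size <= 0)
		return MAP_FAILED;

	return mmap(NULL, (size_t)page_size, PROT_READ, MAP_SHARED, cmd_fd,
		    demo_clock_mmap_offset(page_size));
}

/* Address of the clock register inside the mapped page. */
static const volatile void *demo_clock_reg(const volatile void *page,
					   uint64_t offset_in_page)
{
	return (const volatile uint8_t *)page + offset_in_page;
}
```

Mapping a whole page and adding the in-page offset afterwards is needed
because mmap() works at page granularity while the register may sit
anywhere inside the page.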

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/main.c         |   18 +++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx4/main.c |   19 +++++++++++++++++++
 include/linux/mlx4/device.h               |    9 +++++++++
 3 files changed, 45 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 914beae..3992349 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -716,8 +716,24 @@ static int mlx4_ib_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
 				       dev->dev->caps.num_uars,
 				       PAGE_SIZE, vma->vm_page_prot))
 			return -EAGAIN;
-	} else
+	} else if (vma->vm_pgoff == 3) {
+		struct mlx4_clock_params params;
+		int ret = mlx4_get_internal_clock_params(dev->dev, &params);
+
+		if (ret)
+			return ret;
+
+		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+		if (io_remap_pfn_range(vma, vma->vm_start,
+				       (pci_resource_start(dev->dev->persist->pdev,
+							   params.bar) +
+					params.offset)
+				       >> PAGE_SHIFT,
+				       PAGE_SIZE, vma->vm_page_prot))
+			return -EAGAIN;
+	} else {
 		return -EINVAL;
+	}
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index ced5eca..70de39c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1674,6 +1674,25 @@ static int map_internal_clock(struct mlx4_dev *dev)
 	return 0;
 }
 
+int mlx4_get_internal_clock_params(struct mlx4_dev *dev,
+				   struct mlx4_clock_params *params)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+
+	if (mlx4_is_slave(dev))
+		return -ENOTSUPP;
+
+	if (!params)
+		return -EINVAL;
+
+	params->bar = priv->fw.clock_bar;
+	params->offset = priv->fw.clock_offset;
+	params->size = MLX4_CLOCK_SIZE;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(mlx4_get_internal_clock_params);
+
 static void unmap_internal_clock(struct mlx4_dev *dev)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 83e80ab..f94984f 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -829,6 +829,12 @@ struct mlx4_dev {
 	struct mlx4_vf_dev     *dev_vfs;
 };
 
+struct mlx4_clock_params {
+	u64 offset;
+	u8 bar;
+	u8 size;
+};
+
 struct mlx4_eqe {
 	u8			reserved1;
 	u8			type;
@@ -1485,4 +1491,7 @@ int mlx4_ACCESS_PTYS_REG(struct mlx4_dev *dev,
 			 enum mlx4_access_reg_method method,
 			 struct mlx4_ptys_reg *ptys_reg);
 
+int mlx4_get_internal_clock_params(struct mlx4_dev *dev,
+				   struct mlx4_clock_params *params);
+
 #endif /* MLX4_DEVICE_H */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2015-05-31 12:14   ` [PATCH for-next V2 7/9] IB/mlx4: Add mmap call to map the hardware clock Or Gerlitz
@ 2015-05-31 12:14   ` Or Gerlitz
       [not found]     ` <1433074457-26437-9-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-05-31 12:14   ` [PATCH for-next V2 9/9] IB/mlx4: Add support for CQ time-stamping Or Gerlitz
                     ` (2 subsequent siblings)
  10 siblings, 1 reply; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:14 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add support for ib_uverbs_ex_create_cq and ib_uverbs_ex_query_device
by setting the appropriate bits in uverbs_ex_cmd_mask.
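
For readers less familiar with the mask, here is a small standalone
sketch of how such a per-command bitmask gates extended commands,
assuming the core tests the command's bit before dispatching. The
command numbers below are placeholders for illustration; the real values
come from include/uapi/rdma/ib_user_verbs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder command numbers, for illustration only. */
enum {
	DEMO_EX_CMD_QUERY_DEVICE = 1,
	DEMO_EX_CMD_CREATE_CQ = 18,
};

/* True if the driver advertised support for the extended command cmd. */
static bool demo_ex_cmd_supported(uint64_t ex_cmd_mask, unsigned int cmd)
{
	/* Guard the shift: only 64 commands fit in the mask. */
	return cmd < 64 && (ex_cmd_mask & (1ULL << cmd)) != 0;
}
```

A driver that ORs both bits into its mask, as mlx4 does in this patch,
makes both commands reachable; any command whose bit is left clear stays
unavailable for that device.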

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/main.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 3992349..832d571 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2323,6 +2323,10 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 			(1ull << IB_USER_VERBS_EX_CMD_DESTROY_FLOW);
 	}
 
+	ibdev->ib_dev.uverbs_ex_cmd_mask |=
+		(1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE) |
+		(1ull << IB_USER_VERBS_EX_CMD_CREATE_CQ);
+
 	mlx4_ib_alloc_eqs(dev, ibdev);
 
 	spin_lock_init(&iboe->lock);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 9/9] IB/mlx4: Add support for CQ time-stamping
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (7 preceding siblings ...)
  2015-05-31 12:14   ` [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs Or Gerlitz
@ 2015-05-31 12:14   ` Or Gerlitz
  2015-05-31 12:31   ` [PATCH for-next V2 0/9] Add completion timestamping support Or Gerlitz
  2015-05-31 19:00   ` Doug Ledford
  10 siblings, 0 replies; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:14 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

This includes:

* support allocation of CQ with the TIMESTAMP_COMPLETION creation flag.

* add timestamp_mask and hca_core_clock to query_device, reporting the
  number of supported timestamp bits (mask) and the hca_core_clock frequency.

* return hca core clock's offset in query_device vendor's data,
  this is needed in order to read the HCA's core clock.
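
The creation-flag handling in the first item boils down to "reject
anything outside the supported set, then turn the timestamp flag into a
boolean". A minimal standalone sketch of that check is shown below. The
DEMO_* names and the flag's bit value are placeholders and not the
kernel's definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder flag value, for illustration only. */
#define DEMO_CQ_FLAGS_TIMESTAMP_COMPLETION	(1u << 0)
#define DEMO_CQ_CREATE_FLAGS_SUPPORTED		DEMO_CQ_FLAGS_TIMESTAMP_COMPLETION

/*
 * Returns 1 if the CQ should be created with timestamping, 0 if not, and
 * -EINVAL if the caller set any flag bit outside the supported set.
 */
static int demo_cq_wants_timestamp(uint32_t flags)
{
	if (flags & ~DEMO_CQ_CREATE_FLAGS_SUPPORTED)
		return -EINVAL;

	return !!(flags & DEMO_CQ_FLAGS_TIMESTAMP_COMPLETION);
}
```

Masking with the complement of the supported set, instead of rejecting
any non-zero value, lets new flags be added later without touching the
rejection logic.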

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/cq.c      |    9 +++++--
 drivers/infiniband/hw/mlx4/main.c    |   38 ++++++++++++++++++++++++++++++++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   16 ++++++++++++++
 3 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 8e44aaa..36eb3d0 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -166,6 +166,7 @@ err_buf:
 	return err;
 }
 
+#define CQ_CREATE_FLAGS_SUPPORTED IB_CQ_FLAGS_TIMESTAMP_COMPLETION
 struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
 				const struct ib_cq_init_attr *attr,
 				struct ib_ucontext *context,
@@ -178,10 +179,10 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
 	struct mlx4_uar *uar;
 	int err;
 
-	if (attr->flags)
+	if (entries < 1 || entries > dev->dev->caps.max_cqes)
 		return ERR_PTR(-EINVAL);
 
-	if (entries < 1 || entries > dev->dev->caps.max_cqes)
+	if (attr->flags & ~CQ_CREATE_FLAGS_SUPPORTED)
 		return ERR_PTR(-EINVAL);
 
 	cq = kmalloc(sizeof *cq, GFP_KERNEL);
@@ -194,6 +195,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
 	spin_lock_init(&cq->lock);
 	cq->resize_buf = NULL;
 	cq->resize_umem = NULL;
+	cq->create_flags = attr->flags;
 	INIT_LIST_HEAD(&cq->send_qp_list);
 	INIT_LIST_HEAD(&cq->recv_qp_list);
 
@@ -237,7 +239,8 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
 		vector = dev->eq_table[vector % ibdev->num_comp_vectors];
 
 	err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar,
-			    cq->db.dma, &cq->mcq, vector, 0, 0);
+			    cq->db.dma, &cq->mcq, vector, 0,
+			    !!(cq->create_flags & IB_CQ_FLAGS_TIMESTAMP_COMPLETION));
 	if (err)
 		goto err_dbmap;
 
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 832d571..c642bf0 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -140,10 +140,27 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	struct ib_smp *out_mad = NULL;
 	int err = -ENOMEM;
 	int have_ib_ports;
+	struct mlx4_uverbs_ex_query_device cmd;
+	struct mlx4_uverbs_ex_query_device_resp resp = {.comp_mask = 0};
+	struct mlx4_clock_params clock_params;
 
-	if (uhw->inlen || uhw->outlen)
-		return -EINVAL;
+	if (uhw->inlen) {
+		if (uhw->inlen < sizeof(cmd))
+			return -EINVAL;
+
+		err = ib_copy_from_udata(&cmd, uhw, sizeof(cmd));
+		if (err)
+			return err;
+
+		if (cmd.comp_mask)
+			return -EINVAL;
+
+		if (cmd.reserved)
+			return -EINVAL;
+	}
 
+	resp.response_length = offsetof(typeof(resp), response_length) +
+		sizeof(resp.response_length);
 	in_mad  = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
@@ -233,7 +250,24 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
 					   props->max_mcast_grp;
 	props->max_map_per_fmr = dev->dev->caps.max_fmr_maps;
+	props->hca_core_clock = dev->dev->caps.hca_core_clock;
+	props->timestamp_mask = 0xFFFFFFFFFFFFULL;
 
+	err = mlx4_get_internal_clock_params(dev->dev, &clock_params);
+	if (err)
+		goto out;
+
+	if (uhw->outlen >= resp.response_length + sizeof(resp.hca_core_clock_offset)) {
+		resp.hca_core_clock_offset = clock_params.offset % PAGE_SIZE;
+		resp.response_length += sizeof(resp.hca_core_clock_offset);
+		resp.comp_mask |= QUERY_DEVICE_RESP_MASK_TIMESTAMP;
+	}
+
+	if (uhw->outlen) {
+		err = ib_copy_to_udata(uhw, &resp, resp.response_length);
+		if (err)
+			goto out;
+	}
 out:
 	kfree(in_mad);
 	kfree(out_mad);
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index f127efc..8421e15 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -110,6 +110,7 @@ struct mlx4_ib_cq {
 	struct mutex		resize_mutex;
 	struct ib_umem	       *umem;
 	struct ib_umem	       *resize_umem;
+	int			create_flags;
 	/* List of qps that it serves.*/
 	struct list_head		send_qp_list;
 	struct list_head		recv_qp_list;
@@ -555,6 +556,21 @@ struct mlx4_ib_qp_tunnel_init_attr {
 	u8 port;
 };
 
+struct mlx4_uverbs_ex_query_device {
+	__u32 comp_mask;
+	__u32 reserved;
+};
+
+enum query_device_resp_mask {
+	QUERY_DEVICE_RESP_MASK_TIMESTAMP = 1UL << 0,
+};
+
+struct mlx4_uverbs_ex_query_device_resp {
+	__u32 comp_mask;
+	__u32 response_length;
+	__u64 hca_core_clock_offset;
+};
+
 static inline struct mlx4_ib_dev *to_mdev(struct ib_device *ibdev)
 {
 	return container_of(ibdev, struct mlx4_ib_dev, ib_dev);
-- 
1.7.1
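
For readers following the query_device hunk above: the response grows only as far as the caller's output buffer allows, and comp_mask tells the caller which optional fields were filled in. Below is a minimal user-space sketch of that pattern, not the driver code itself. The names sketch_resp, SKETCH_RESP_MASK_TIMESTAMP and sketch_fill_resp are hypothetical, memcpy() stands in for ib_copy_to_udata(), and, unlike the patch, the sketch refuses a non-empty buffer that is too small for the base header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct mlx4_uverbs_ex_query_device_resp. */
struct sketch_resp {
	uint32_t comp_mask;
	uint32_t response_length;
	uint64_t hca_core_clock_offset;
};

/* Hypothetical stand-in for QUERY_DEVICE_RESP_MASK_TIMESTAMP. */
#define SKETCH_RESP_MASK_TIMESTAMP (1u << 0)

/*
 * Fill a caller buffer of 'outlen' bytes. The base header (comp_mask and
 * response_length) is always reported; the clock offset is appended only
 * when the buffer has room for it. Returns the number of bytes written.
 */
static size_t sketch_fill_resp(void *out, size_t outlen, uint64_t clock_offset)
{
	struct sketch_resp resp;

	memset(&resp, 0, sizeof(resp));
	resp.response_length = offsetof(struct sketch_resp, response_length) +
			       sizeof(resp.response_length);

	/* Assumption of this sketch: reject buffers smaller than the header. */
	if (outlen < resp.response_length)
		return 0;

	if (outlen >= resp.response_length + sizeof(resp.hca_core_clock_offset)) {
		resp.hca_core_clock_offset = clock_offset;
		resp.response_length += sizeof(resp.hca_core_clock_offset);
		resp.comp_mask |= SKETCH_RESP_MASK_TIMESTAMP;
	}

	memcpy(out, &resp, resp.response_length);
	return resp.response_length;
}
```

A caller built against an older, shorter layout would pass a smaller outlen and simply never see the timestamp bit in comp_mask.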


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (8 preceding siblings ...)
  2015-05-31 12:14   ` [PATCH for-next V2 9/9] IB/mlx4: Add support for CQ time-stamping Or Gerlitz
@ 2015-05-31 12:31   ` Or Gerlitz
  2015-05-31 19:00   ` Doug Ledford
  10 siblings, 0 replies; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:31 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On 5/31/2015 3:14 PM, Or Gerlitz wrote:
> We would to go for one helper

Would like to go for one helper

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (9 preceding siblings ...)
  2015-05-31 12:31   ` [PATCH for-next V2 0/9] Add completion timestamping support Or Gerlitz
@ 2015-05-31 19:00   ` Doug Ledford
       [not found]     ` <1433098827.114391.179.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  10 siblings, 1 reply; 64+ messages in thread
From: Doug Ledford @ 2015-05-31 19:00 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

[-- Attachment #1: Type: text/plain, Size: 10062 bytes --]

On Sun, 2015-05-31 at 15:14 +0300, Or Gerlitz wrote:
> Hi Doug,
> 
> This patchset adds completion timestamping support for verbs consumers.
> 
> Reviewing the weekend threads, we've changed the flag name to reflect
> that this is a completion time-stamp and folded the actual mlx4 support
> into one patch.
> 
> Regarding the related user-space support, it's possible to add what you
> were suggesting, ibv_get_raw_cqe_timestamp() -- returns ts in cycles and
> ibv_get_cqe_timestamp() -- returns ts in ns. This makes the value returned
> by the poll cq verb an opaque one that must go through one of the converters.
> 
> We would like to go for one helper, ibv_get_timestamp(uint64_t raw_time, flag), which
> could get the raw time-stamp and one of the following flags: RAW_TIME, RAW_NS_TIME.

I'm theoretically OK with something similar to the above.  However, the
NS time should not be raw.  It should be cooked and should be able to be
valid to compare between different adapters.  Right now, the cycle
counter that you are exposing is only useful for ordering between
packets received on a single adapter where the cycle counter is the same
on all packets.  Throw in a different vendor's card, or two of your own
cards, and the issue gets much more complex.  The cooked value should be
an actual, real time that can be used across these more complex
environments.  Because of that, it really shouldn't be called RAW.

So, if you want a single entry point, I would suggest something like
this:

enum ib_timestamp_flags {
	IB_TIMESTAMP_COMPLETION = (1 << 0), // specify on create_cq
	IB_TIMESTAMP_WQE_BEGIN =  (1 << 1), // specify on create qp?
	IB_TIMESTAMP_WQE_END =    (1 << 2), // specify on create qp?
	IB_TIMESTAMP_RAW =        (1 << 31)
};

enum ib_cq_creation_flags {
	IB_CQ_FLAGS_TIMESTAMP_COMPLETION = (1 << 0)
};

/**
 * ibv_get_timestamp - Return the requested timestamp for the given wc
 * @wc - work completion to get timestamp results from
 * @ts - struct timespec to return timestamp in
 * @flags - which timestamp to return and in what form
 *
 * Depending on the flags used to create the queue pair/completion
 * queue, different timestamps might be available.  Callers should
 * specify which timestamp they are interested in using the flags
 * element, and if they wish either a cooked or raw timestamp.  A
 * raw timestamp is implementation defined and will be passed back
 * in the tv_nsec portion of the struct timespec.  A raw timestamp
 * can not be relied upon to have any ordering value between more
 * than one HCA or driver.  A cooked timestamp will return a valid
 * struct timespec normalized as closely as possible to the return
 * value for CLOCK_MONOTONIC of clock_gettime at the time of the
 * timestamp.
 */
int ibv_get_timestamp(struct ibv_wc *wc, struct timespec *ts, int flags);
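
To make the raw/cooked distinction in the comment above concrete, here is a rough sketch, under stated assumptions, of what such a helper might do internally. Everything prefixed with sketch_ is hypothetical and not part of any existing libibverbs API: the calibration struct assumes the provider can sample a hardware cycle count and CLOCK_MONOTONIC at roughly the same moment, the core clock is assumed to be given in kHz, and counter wrap-around is not handled.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical flag, modelled on the proposed IB_TIMESTAMP_RAW. */
#define SKETCH_TIMESTAMP_RAW (1u << 31)

/*
 * Hypothetical calibration point: a hardware cycle count and the
 * CLOCK_MONOTONIC value (in ns) assumed to be sampled at the same moment.
 */
struct sketch_calibration {
	uint64_t hw_cycles;
	uint64_t mono_ns;
	uint64_t core_clock_khz;	/* assumption: frequency in kHz */
};

/* Returns 0 on success, -1 on bad arguments or a timestamp before 'cal'. */
static int sketch_get_timestamp(uint64_t raw_cycles,
				const struct sketch_calibration *cal,
				struct timespec *ts, unsigned int flags)
{
	uint64_t delta, ns;

	if (!cal || !ts || cal->core_clock_khz == 0)
		return -1;

	if (flags & SKETCH_TIMESTAMP_RAW) {
		/* Raw: implementation defined, passed back in tv_nsec.
		 * Note that a 48-bit count does not fit a 32-bit long. */
		ts->tv_sec = 0;
		ts->tv_nsec = (long)raw_cycles;
		return 0;
	}

	if (raw_cycles < cal->hw_cycles)
		return -1;

	/* Cooked: offset from the calibration point, in nanoseconds.
	 * Quotient/remainder split avoids overflowing 64 bits. */
	delta = raw_cycles - cal->hw_cycles;
	ns = cal->mono_ns +
	     (delta / cal->core_clock_khz) * 1000000ULL +
	     (delta % cal->core_clock_khz) * 1000000ULL / cal->core_clock_khz;

	ts->tv_sec = (time_t)(ns / 1000000000ULL);
	ts->tv_nsec = (long)(ns % 1000000000ULL);
	return 0;
}
```

A real implementation would also have to refresh the calibration point periodically to bound drift between the HCA clock and the system clock, which this sketch leaves out.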

> We think this would address the reviewer comments for the kernel submission.
> 
> The user-space code is at the following locations (it still uses
> IB_CQ_FLAGS_TIMESTAMP and is missing the conversion functions):
> 
>  https://github.com/matanb10/libibverbs timestamp-v1
>  https://github.com/matanb10/libmlx4 timestamp-v1
> 
> Timestamping is used by applications in order to know when a WQE was
> received/transmitted by the HW. The value is given in HCA hardware cycles,
> but can be easily converted, as the hardware's core clock frequency is
> available through an extension of query device.
> 
> Moreover, we add the ability to read the HCA's current clock. This could be
> useful in order to synchronize events to the wall clock.
> 
> This functionality is achieved by adding/extending the following verbs:
> 
> create_cq - create_cq is extended in order to allow passing creation flags
> to the CQ creation function. We change IB/core --> vendors API
> to be easily extendible by passing a struct which contains
> comp_vectors, cqe and the new flags parameter. In order to create
> CQ which supports timestamping, IB_CQ_FLAGS_TIMESTAMP_COMPLETION should be given.
> 
> query_device - We extend query_device uverb further by giving the hardware's
> clock frequency and the timestamp mask (the number of timestamp
> bits which are supported). If timestamp isn't supported, 0 is returned.
> 
> In order to read the timestamp in the WQE, the user needs to query the device
> for support, create an appropriate CQ (using the extended uverb with
> IB_CQ_FLAGS_TIMESTAMP_COMPLETION) and poll the CQ with an extended poll_cq verb (currently,
> only implemented in user-space).
> 
> In mlx4, allowing the user to read the core clock efficiently involves mapping
> this area of the hardware to user-space (being done by using a mmap command)
> and reading the clock from the correct offset of the page. 
> 
> This offset is returned in the vendor's specific data from mlx4's kernel driver 
> to the mlx4's user-space driver. query_device is modified in order to support
> passing this vendor specific data. A user-space application could use a new
> verb in order to read the hardware's clock.
> 
> Translating the hardware's clock into ms could be done by dividing this
> value by hca_core_clock (which is returned by the extended version of
> query_device uverb).
> 
> A user-space application could get the current HW's clock by executing
> 
> ibv_query_values_ex(struct ibv_context *context, uint32_t q_values,
>                     struct ibv_values_ex *values)
> 
> The function gets a mask of the values to query and returns their values.
> Vendors could either implement this as a uverb command or use their 
> user-space driver to return those values directly from the HW (the mlx4 way).
> 
> Matan and Or.
> 
> Changes from V1:
>  (1) fixed lustre IB's code build
>  (2) squashed mlx4 V1 9-11 patches into one
>  (3) changed IB_CQ_FLAGS_TIMESTAMP --> IB_CQ_FLAGS_TIMESTAMP_COMPLETION
> 
> Changes from V0:
> (1) Pass ib_cq_init_attr instead of cqe and comp_vector.
> (2) Fix unneeded indentation.
> (3) Change flags to u32.
> (4) Add const to create_cq's ib_cq_init_attr argument in vendor implementation.
> 
> Matan Barak (9):
>   IB/core: Change provider's API of create_cq to be extendible
>   IB/core: Change ib_create_cq to use struct ib_cq_init_attr
>   IB/core: Add CQ creation time-stamping flag
>   IB/core: Extend ib_uverbs_create_cq
>   IB/core: Add timestamp_mask and hca_core_clock to query_device
>   IB/core: Pass hardware specific data in query_device
>   IB/mlx4: Add mmap call to map the hardware clock
>   IB/mlx4: Support extended create_cq and query_device uverbs
>   IB/mlx4: Add support for CQ time-stamping
> 
>  drivers/infiniband/core/device.c                   |    6 +-
>  drivers/infiniband/core/mad.c                      |    5 +-
>  drivers/infiniband/core/uverbs.h                   |    1 +
>  drivers/infiniband/core/uverbs_cmd.c               |  188 ++++++++++++++++----
>  drivers/infiniband/core/uverbs_main.c              |    1 +
>  drivers/infiniband/core/verbs.c                    |    4 +-
>  drivers/infiniband/hw/amso1100/c2_provider.c       |   14 ++-
>  drivers/infiniband/hw/cxgb3/iwch_provider.c        |   19 ++-
>  drivers/infiniband/hw/cxgb4/cq.c                   |    9 +-
>  drivers/infiniband/hw/cxgb4/iw_cxgb4.h             |    8 +-
>  drivers/infiniband/hw/cxgb4/provider.c             |    8 +-
>  drivers/infiniband/hw/ehca/ehca_cq.c               |    7 +-
>  drivers/infiniband/hw/ehca/ehca_hca.c              |    6 +-
>  drivers/infiniband/hw/ehca/ehca_iverbs.h           |    6 +-
>  drivers/infiniband/hw/ehca/ehca_main.c             |    6 +-
>  drivers/infiniband/hw/ipath/ipath_cq.c             |    9 +-
>  drivers/infiniband/hw/ipath/ipath_verbs.c          |    7 +-
>  drivers/infiniband/hw/ipath/ipath_verbs.h          |    3 +-
>  drivers/infiniband/hw/mlx4/cq.c                    |   13 ++-
>  drivers/infiniband/hw/mlx4/mad.c                   |    5 +-
>  drivers/infiniband/hw/mlx4/main.c                  |   67 +++++++-
>  drivers/infiniband/hw/mlx4/mlx4_ib.h               |   19 ++-
>  drivers/infiniband/hw/mlx5/cq.c                    |   10 +-
>  drivers/infiniband/hw/mlx5/main.c                  |   19 ++-
>  drivers/infiniband/hw/mlx5/mlx5_ib.h               |    5 +-
>  drivers/infiniband/hw/mthca/mthca_provider.c       |   15 ++-
>  drivers/infiniband/hw/nes/nes_verbs.c              |   17 ++-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c        |   13 ++-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.h        |    9 +-
>  drivers/infiniband/hw/qib/qib_cq.c                 |   11 +-
>  drivers/infiniband/hw/qib/qib_verbs.c              |    6 +-
>  drivers/infiniband/hw/qib/qib_verbs.h              |    5 +-
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.c       |   16 ++-
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.h       |   10 +-
>  drivers/infiniband/ulp/ipoib/ipoib_verbs.c         |    9 +-
>  drivers/infiniband/ulp/iser/iser_verbs.c           |    6 +-
>  drivers/infiniband/ulp/isert/ib_isert.c            |    6 +-
>  drivers/infiniband/ulp/srp/ib_srp.c                |   10 +-
>  drivers/infiniband/ulp/srpt/ib_srpt.c              |    5 +-
>  drivers/net/ethernet/mellanox/mlx4/main.c          |   19 ++
>  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    7 +-
>  include/linux/mlx4/device.h                        |    9 +
>  include/rdma/ib_verbs.h                            |   25 ++-
>  include/uapi/rdma/ib_user_verbs.h                  |   19 ++
>  net/9p/trans_rdma.c                                |    5 +-
>  net/rds/ib_cm.c                                    |    8 +-
>  net/rds/iw_cm.c                                    |    8 +-
>  net/sunrpc/xprtrdma/svc_rdma_transport.c           |   10 +-
>  net/sunrpc/xprtrdma/verbs.c                        |   10 +-
>  49 files changed, 564 insertions(+), 139 deletions(-)
> 
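
On the conversion described in the cover letter (dividing the cycle count by hca_core_clock to get ms), the arithmetic could look roughly like the sketch below. It assumes hca_core_clock is expressed in kHz, which is what the "divide to get ms" description implies, and a 48-bit counter as reported through timestamp_mask in the mlx4 patch. The sketch_ names are hypothetical helpers, not existing verbs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the 48-bit timestamp_mask reported by the mlx4 patch. */
#define SKETCH_TS_MASK 0xFFFFFFFFFFFFULL

/* Cycles elapsed from 'earlier' to 'later', tolerating one counter wrap. */
static uint64_t sketch_cycles_between(uint64_t earlier, uint64_t later,
				      uint64_t mask)
{
	return (later - earlier) & mask;
}

/*
 * Convert a cycle count to nanoseconds, assuming core_clock_khz is in kHz
 * (cycles / kHz = ms). Splitting into quotient and remainder keeps every
 * intermediate product within 64 bits for counts of up to 48 bits.
 */
static uint64_t sketch_cycles_to_ns(uint64_t cycles, uint64_t core_clock_khz)
{
	assert(core_clock_khz != 0);
	return (cycles / core_clock_khz) * 1000000ULL +
	       (cycles % core_clock_khz) * 1000000ULL / core_clock_khz;
}
```

The masked subtraction only gives a meaningful interval when the two completions are less than one full counter period apart; ordering across adapters still needs the cooked form discussed above.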


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 1/9] IB/core: Change provider's API of create_cq to be extendible
       [not found]     ` <1433074457-26437-2-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-06-01  9:21       ` Devesh Sharma
  2015-06-01 17:00       ` Jason Gunthorpe
  1 sibling, 0 replies; 64+ messages in thread
From: Devesh Sharma @ 2015-06-01  9:21 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak

Looks good.

Reviewed-by: Devesh Sharma <devesh.sharma-1wcpHE2jlwO1Z/+hSey0Gg@public.gmane.org>

On Sun, May 31, 2015 at 5:44 PM, Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> Add a new ib_cq_init_attr structure which contains the
> previous cqe (minimum number of CQ entries) and comp_vector
> (completion vector) in addition to a new flags field.
> All vendors' create_cq callbacks are changed in order
> to work with the new API.
>
> This commit does not change any functionality.
>
> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
>  drivers/infiniband/core/uverbs_cmd.c         |    6 ++++--
>  drivers/infiniband/core/verbs.c              |    3 ++-
>  drivers/infiniband/hw/amso1100/c2_provider.c |    7 ++++++-
>  drivers/infiniband/hw/cxgb3/iwch_provider.c  |   11 ++++++++---
>  drivers/infiniband/hw/cxgb4/cq.c             |    9 +++++++--
>  drivers/infiniband/hw/cxgb4/iw_cxgb4.h       |    8 ++++----
>  drivers/infiniband/hw/ehca/ehca_cq.c         |    7 ++++++-
>  drivers/infiniband/hw/ehca/ehca_iverbs.h     |    3 ++-
>  drivers/infiniband/hw/ipath/ipath_cq.c       |    9 +++++++--
>  drivers/infiniband/hw/ipath/ipath_verbs.h    |    3 ++-
>  drivers/infiniband/hw/mlx4/cq.c              |    8 +++++++-
>  drivers/infiniband/hw/mlx4/mlx4_ib.h         |    3 ++-
>  drivers/infiniband/hw/mlx5/cq.c              |   10 ++++++++--
>  drivers/infiniband/hw/mlx5/main.c            |    3 ++-
>  drivers/infiniband/hw/mlx5/mlx5_ib.h         |    5 +++--
>  drivers/infiniband/hw/mthca/mthca_provider.c |    8 ++++++--
>  drivers/infiniband/hw/nes/nes_verbs.c        |   11 ++++++++---
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |    7 ++++++-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  |    6 ++++--
>  drivers/infiniband/hw/qib/qib_cq.c           |   11 ++++++++---
>  drivers/infiniband/hw/qib/qib_verbs.h        |    5 +++--
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.c |   10 +++++++---
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.h |    7 ++++---
>  include/rdma/ib_verbs.h                      |   10 ++++++++--
>  24 files changed, 124 insertions(+), 46 deletions(-)
>
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index a9f0489..1954ebb 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -1341,6 +1341,7 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
>         struct ib_uverbs_event_file    *ev_file = NULL;
>         struct ib_cq                   *cq;
>         int                             ret;
> +       struct ib_cq_init_attr attr = {};
>
>         if (out_len < sizeof resp)
>                 return -ENOSPC;
> @@ -1376,8 +1377,9 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
>         INIT_LIST_HEAD(&obj->comp_list);
>         INIT_LIST_HEAD(&obj->async_list);
>
> -       cq = file->device->ib_dev->create_cq(file->device->ib_dev, cmd.cqe,
> -                                            cmd.comp_vector,
> +       attr.cqe = cmd.cqe;
> +       attr.comp_vector = cmd.comp_vector;
> +       cq = file->device->ib_dev->create_cq(file->device->ib_dev, &attr,
>                                              file->ucontext, &udata);
>         if (IS_ERR(cq)) {
>                 ret = PTR_ERR(cq);
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index 685a362..f7615d4 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -1078,8 +1078,9 @@ struct ib_cq *ib_create_cq(struct ib_device *device,
>                            void *cq_context, int cqe, int comp_vector)
>  {
>         struct ib_cq *cq;
> +       struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector};
>
> -       cq = device->create_cq(device, cqe, comp_vector, NULL, NULL);
> +       cq = device->create_cq(device, &attr, NULL, NULL);
>
>         if (!IS_ERR(cq)) {
>                 cq->device        = device;
> diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
> index d396c39..a43e022 100644
> --- a/drivers/infiniband/hw/amso1100/c2_provider.c
> +++ b/drivers/infiniband/hw/amso1100/c2_provider.c
> @@ -286,13 +286,18 @@ static int c2_destroy_qp(struct ib_qp *ib_qp)
>         return 0;
>  }
>
> -static struct ib_cq *c2_create_cq(struct ib_device *ibdev, int entries, int vector,
> +static struct ib_cq *c2_create_cq(struct ib_device *ibdev,
> +                                 const struct ib_cq_init_attr *attr,
>                                   struct ib_ucontext *context,
>                                   struct ib_udata *udata)
>  {
> +       int entries = attr->cqe;
>         struct c2_cq *cq;
>         int err;
>
> +       if (attr->flags)
> +               return ERR_PTR(-EINVAL);
> +
>         cq = kmalloc(sizeof(*cq), GFP_KERNEL);
>         if (!cq) {
>                 pr_debug("%s: Unable to allocate CQ\n", __func__);
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> index 061ef08..2eaf7e8 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> @@ -138,10 +138,12 @@ static int iwch_destroy_cq(struct ib_cq *ib_cq)
>         return 0;
>  }
>
> -static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries, int vector,
> -                            struct ib_ucontext *ib_context,
> -                            struct ib_udata *udata)
> +static struct ib_cq *iwch_create_cq(struct ib_device *ibdev,
> +                                   const struct ib_cq_init_attr *attr,
> +                                   struct ib_ucontext *ib_context,
> +                                   struct ib_udata *udata)
>  {
> +       int entries = attr->cqe;
>         struct iwch_dev *rhp;
>         struct iwch_cq *chp;
>         struct iwch_create_cq_resp uresp;
> @@ -151,6 +153,9 @@ static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries, int ve
>         size_t resplen;
>
>         PDBG("%s ib_dev %p entries %d\n", __func__, ibdev, entries);
> +       if (attr->flags)
> +               return ERR_PTR(-EINVAL);
> +
>         rhp = to_iwch_dev(ibdev);
>         chp = kzalloc(sizeof(*chp), GFP_KERNEL);
>         if (!chp)
> diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c
> index 68ddb37..9a2c01c 100644
> --- a/drivers/infiniband/hw/cxgb4/cq.c
> +++ b/drivers/infiniband/hw/cxgb4/cq.c
> @@ -866,10 +866,13 @@ int c4iw_destroy_cq(struct ib_cq *ib_cq)
>         return 0;
>  }
>
> -struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, int entries,
> -                            int vector, struct ib_ucontext *ib_context,
> +struct ib_cq *c4iw_create_cq(struct ib_device *ibdev,
> +                            const struct ib_cq_init_attr *attr,
> +                            struct ib_ucontext *ib_context,
>                              struct ib_udata *udata)
>  {
> +       int entries = attr->cqe;
> +       int vector = attr->comp_vector;
>         struct c4iw_dev *rhp;
>         struct c4iw_cq *chp;
>         struct c4iw_create_cq_resp uresp;
> @@ -879,6 +882,8 @@ struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, int entries,
>         struct c4iw_mm_entry *mm, *mm2;
>
>         PDBG("%s ib_dev %p entries %d\n", __func__, ibdev, entries);
> +       if (attr->flags)
> +               return ERR_PTR(-EINVAL);
>
>         rhp = to_c4iw_dev(ibdev);
>
> diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
> index 97bb555..540ed02 100644
> --- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
> +++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
> @@ -992,10 +992,10 @@ int c4iw_reregister_phys_mem(struct ib_mr *mr,
>                                      int acc, u64 *iova_start);
>  int c4iw_dereg_mr(struct ib_mr *ib_mr);
>  int c4iw_destroy_cq(struct ib_cq *ib_cq);
> -struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, int entries,
> -                                       int vector,
> -                                       struct ib_ucontext *ib_context,
> -                                       struct ib_udata *udata);
> +struct ib_cq *c4iw_create_cq(struct ib_device *ibdev,
> +                            const struct ib_cq_init_attr *attr,
> +                            struct ib_ucontext *ib_context,
> +                            struct ib_udata *udata);
>  int c4iw_resize_cq(struct ib_cq *cq, int cqe, struct ib_udata *udata);
>  int c4iw_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags);
>  int c4iw_destroy_qp(struct ib_qp *ib_qp);
> diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c
> index 8cc8375..9b68b17 100644
> --- a/drivers/infiniband/hw/ehca/ehca_cq.c
> +++ b/drivers/infiniband/hw/ehca/ehca_cq.c
> @@ -113,10 +113,12 @@ struct ehca_qp *ehca_cq_get_qp(struct ehca_cq *cq, int real_qp_num)
>         return ret;
>  }
>
> -struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector,
> +struct ib_cq *ehca_create_cq(struct ib_device *device,
> +                            const struct ib_cq_init_attr *attr,
>                              struct ib_ucontext *context,
>                              struct ib_udata *udata)
>  {
> +       int cqe = attr->cqe;
>         static const u32 additional_cqe = 20;
>         struct ib_cq *cq;
>         struct ehca_cq *my_cq;
> @@ -131,6 +133,9 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector,
>         int ipz_rc, i;
>         unsigned long flags;
>
> +       if (attr->flags)
> +               return ERR_PTR(-EINVAL);
> +
>         if (cqe >= 0xFFFFFFFF - 64 - additional_cqe)
>                 return ERR_PTR(-EINVAL);
>
> diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
> index 077185b..952f2fb 100644
> --- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
> +++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
> @@ -129,7 +129,8 @@ int ehca_destroy_eq(struct ehca_shca *shca, struct ehca_eq *eq);
>  void *ehca_poll_eq(struct ehca_shca *shca, struct ehca_eq *eq);
>
>
> -struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector,
> +struct ib_cq *ehca_create_cq(struct ib_device *device,
> +                            const struct ib_cq_init_attr *attr,
>                              struct ib_ucontext *context,
>                              struct ib_udata *udata);
>
> diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c
> index 0416c6c..e9dd911 100644
> --- a/drivers/infiniband/hw/ipath/ipath_cq.c
> +++ b/drivers/infiniband/hw/ipath/ipath_cq.c
> @@ -188,7 +188,7 @@ static void send_complete(unsigned long data)
>  /**
>   * ipath_create_cq - create a completion queue
>   * @ibdev: the device this completion queue is attached to
> - * @entries: the minimum size of the completion queue
> + * @attr: creation attributes
>   * @context: unused by the InfiniPath driver
>   * @udata: unused by the InfiniPath driver
>   *
> @@ -197,16 +197,21 @@ static void send_complete(unsigned long data)
>   *
>   * Called by ib_create_cq() in the generic verbs code.
>   */
> -struct ib_cq *ipath_create_cq(struct ib_device *ibdev, int entries, int comp_vector,
> +struct ib_cq *ipath_create_cq(struct ib_device *ibdev,
> +                             const struct ib_cq_init_attr *attr,
>                               struct ib_ucontext *context,
>                               struct ib_udata *udata)
>  {
> +       int entries = attr->cqe;
>         struct ipath_ibdev *dev = to_idev(ibdev);
>         struct ipath_cq *cq;
>         struct ipath_cq_wc *wc;
>         struct ib_cq *ret;
>         u32 sz;
>
> +       if (attr->flags)
> +               return ERR_PTR(-EINVAL);
> +
>         if (entries < 1 || entries > ib_ipath_max_cqes) {
>                 ret = ERR_PTR(-EINVAL);
>                 goto done;
> diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h
> index ae6cff4..c4e4e81 100644
> --- a/drivers/infiniband/hw/ipath/ipath_verbs.h
> +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h
> @@ -807,7 +807,8 @@ void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int sig);
>
>  int ipath_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry);
>
> -struct ib_cq *ipath_create_cq(struct ib_device *ibdev, int entries, int comp_vector,
> +struct ib_cq *ipath_create_cq(struct ib_device *ibdev,
> +                             const struct ib_cq_init_attr *attr,
>                               struct ib_ucontext *context,
>                               struct ib_udata *udata);
>
> diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
> index 0176caa..8e44aaa 100644
> --- a/drivers/infiniband/hw/mlx4/cq.c
> +++ b/drivers/infiniband/hw/mlx4/cq.c
> @@ -166,15 +166,21 @@ err_buf:
>         return err;
>  }
>
> -struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector,
> +struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
> +                               const struct ib_cq_init_attr *attr,
>                                 struct ib_ucontext *context,
>                                 struct ib_udata *udata)
>  {
> +       int entries = attr->cqe;
> +       int vector = attr->comp_vector;
>         struct mlx4_ib_dev *dev = to_mdev(ibdev);
>         struct mlx4_ib_cq *cq;
>         struct mlx4_uar *uar;
>         int err;
>
> +       if (attr->flags)
> +               return ERR_PTR(-EINVAL);
> +
>         if (entries < 1 || entries > dev->dev->caps.max_cqes)
>                 return ERR_PTR(-EINVAL);
>
> diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
> index fce3934..f127efc 100644
> --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
> +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
> @@ -668,7 +668,8 @@ void mlx4_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
>
>  int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period);
>  int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata);
> -struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector,
> +struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
> +                               const struct ib_cq_init_attr *attr,
>                                 struct ib_ucontext *context,
>                                 struct ib_udata *udata);
>  int mlx4_ib_destroy_cq(struct ib_cq *cq);
> diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
> index 2ee6b10..09fbae6 100644
> --- a/drivers/infiniband/hw/mlx5/cq.c
> +++ b/drivers/infiniband/hw/mlx5/cq.c
> @@ -736,10 +736,13 @@ static void destroy_cq_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq)
>         mlx5_db_free(dev->mdev, &cq->db);
>  }
>
> -struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, int entries,
> -                               int vector, struct ib_ucontext *context,
> +struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev,
> +                               const struct ib_cq_init_attr *attr,
> +                               struct ib_ucontext *context,
>                                 struct ib_udata *udata)
>  {
> +       int entries = attr->cqe;
> +       int vector = attr->comp_vector;
>         struct mlx5_create_cq_mbox_in *cqb = NULL;
>         struct mlx5_ib_dev *dev = to_mdev(ibdev);
>         struct mlx5_ib_cq *cq;
> @@ -750,6 +753,9 @@ struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, int entries,
>         int eqn;
>         int err;
>
> +       if (attr->flags)
> +               return ERR_PTR(-EINVAL);
> +
>         if (entries < 0)
>                 return ERR_PTR(-EINVAL);
>
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index b2fdb9c..9565c20 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -1087,6 +1087,7 @@ static int create_dev_resources(struct mlx5_ib_resources *devr)
>  {
>         struct ib_srq_init_attr attr;
>         struct mlx5_ib_dev *dev;
> +       struct ib_cq_init_attr cq_attr = {.cqe = 1};
>         int ret = 0;
>
>         dev = container_of(devr, struct mlx5_ib_dev, devr);
> @@ -1100,7 +1101,7 @@ static int create_dev_resources(struct mlx5_ib_resources *devr)
>         devr->p0->uobject = NULL;
>         atomic_set(&devr->p0->usecnt, 0);
>
> -       devr->c0 = mlx5_ib_create_cq(&dev->ib_dev, 1, 0, NULL, NULL);
> +       devr->c0 = mlx5_ib_create_cq(&dev->ib_dev, &cq_attr, NULL, NULL);
>         if (IS_ERR(devr->c0)) {
>                 ret = PTR_ERR(devr->c0);
>                 goto error1;
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index dff1cfc..b086162 100644
> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -556,8 +556,9 @@ int mlx5_ib_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
>  void *mlx5_get_send_wqe(struct mlx5_ib_qp *qp, int n);
>  int mlx5_ib_read_user_wqe(struct mlx5_ib_qp *qp, int send, int wqe_index,
>                           void *buffer, u32 length);
> -struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, int entries,
> -                               int vector, struct ib_ucontext *context,
> +struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev,
> +                               const struct ib_cq_init_attr *attr,
> +                               struct ib_ucontext *context,
>                                 struct ib_udata *udata);
>  int mlx5_ib_destroy_cq(struct ib_cq *cq);
>  int mlx5_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc);
> diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
> index 509d59e..f1b04bd 100644
> --- a/drivers/infiniband/hw/mthca/mthca_provider.c
> +++ b/drivers/infiniband/hw/mthca/mthca_provider.c
> @@ -641,16 +641,20 @@ static int mthca_destroy_qp(struct ib_qp *qp)
>         return 0;
>  }
>
> -static struct ib_cq *mthca_create_cq(struct ib_device *ibdev, int entries,
> -                                    int comp_vector,
> +static struct ib_cq *mthca_create_cq(struct ib_device *ibdev,
> +                                    const struct ib_cq_init_attr *attr,
>                                      struct ib_ucontext *context,
>                                      struct ib_udata *udata)
>  {
> +       int entries = attr->cqe;
>         struct mthca_create_cq ucmd;
>         struct mthca_cq *cq;
>         int nent;
>         int err;
>
> +       if (attr->flags)
> +               return ERR_PTR(-EINVAL);
> +
>         if (entries < 1 || entries > to_mdev(ibdev)->limits.max_cqes)
>                 return ERR_PTR(-EINVAL);
>
> diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
> index 05530e3..a9d62c5 100644
> --- a/drivers/infiniband/hw/nes/nes_verbs.c
> +++ b/drivers/infiniband/hw/nes/nes_verbs.c
> @@ -1526,10 +1526,12 @@ static int nes_destroy_qp(struct ib_qp *ibqp)
>  /**
>   * nes_create_cq
>   */
> -static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries,
> -               int comp_vector,
> -               struct ib_ucontext *context, struct ib_udata *udata)
> +static struct ib_cq *nes_create_cq(struct ib_device *ibdev,
> +                                  const struct ib_cq_init_attr *attr,
> +                                  struct ib_ucontext *context,
> +                                  struct ib_udata *udata)
>  {
> +       int entries = attr->cqe;
>         u64 u64temp;
>         struct nes_vnic *nesvnic = to_nesvnic(ibdev);
>         struct nes_device *nesdev = nesvnic->nesdev;
> @@ -1549,6 +1551,9 @@ static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries,
>         unsigned long flags;
>         int ret;
>
> +       if (attr->flags)
> +               return ERR_PTR(-EINVAL);
> +
>         if (entries > nesadapter->max_cqe)
>                 return ERR_PTR(-EINVAL);
>
> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> index 8771755..f441df6 100644
> --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> @@ -1000,10 +1000,12 @@ err:
>         return status;
>  }
>
> -struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev, int entries, int vector,
> +struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev,
> +                              const struct ib_cq_init_attr *attr,
>                                struct ib_ucontext *ib_ctx,
>                                struct ib_udata *udata)
>  {
> +       int entries = attr->cqe;
>         struct ocrdma_cq *cq;
>         struct ocrdma_dev *dev = get_ocrdma_dev(ibdev);
>         struct ocrdma_ucontext *uctx = NULL;
> @@ -1011,6 +1013,9 @@ struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev, int entries, int vector,
>         int status;
>         struct ocrdma_create_cq_ureq ureq;
>
> +       if (attr->flags)
> +               return ERR_PTR(-EINVAL);
> +
>         if (udata) {
>                 if (ib_copy_from_udata(&ureq, udata, sizeof(ureq)))
>                         return ERR_PTR(-EFAULT);
> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
> index 3cdc81e..e853195 100644
> --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
> +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
> @@ -59,8 +59,10 @@ struct ib_pd *ocrdma_alloc_pd(struct ib_device *,
>                               struct ib_ucontext *, struct ib_udata *);
>  int ocrdma_dealloc_pd(struct ib_pd *pd);
>
> -struct ib_cq *ocrdma_create_cq(struct ib_device *, int entries, int vector,
> -                              struct ib_ucontext *, struct ib_udata *);
> +struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev,
> +                              const struct ib_cq_init_attr *attr,
> +                              struct ib_ucontext *ib_ctx,
> +                              struct ib_udata *udata);
>  int ocrdma_resize_cq(struct ib_cq *, int cqe, struct ib_udata *);
>  int ocrdma_destroy_cq(struct ib_cq *);
>
> diff --git a/drivers/infiniband/hw/qib/qib_cq.c b/drivers/infiniband/hw/qib/qib_cq.c
> index ab4e11c..2b45d0b 100644
> --- a/drivers/infiniband/hw/qib/qib_cq.c
> +++ b/drivers/infiniband/hw/qib/qib_cq.c
> @@ -203,7 +203,7 @@ static void send_complete(struct kthread_work *work)
>  /**
>   * qib_create_cq - create a completion queue
>   * @ibdev: the device this completion queue is attached to
> - * @entries: the minimum size of the completion queue
> + * @attr: creation attributes
>   * @context: unused by the QLogic_IB driver
>   * @udata: user data for libibverbs.so
>   *
> @@ -212,16 +212,21 @@ static void send_complete(struct kthread_work *work)
>   *
>   * Called by ib_create_cq() in the generic verbs code.
>   */
> -struct ib_cq *qib_create_cq(struct ib_device *ibdev, int entries,
> -                           int comp_vector, struct ib_ucontext *context,
> +struct ib_cq *qib_create_cq(struct ib_device *ibdev,
> +                           const struct ib_cq_init_attr *attr,
> +                           struct ib_ucontext *context,
>                             struct ib_udata *udata)
>  {
> +       int entries = attr->cqe;
>         struct qib_ibdev *dev = to_idev(ibdev);
>         struct qib_cq *cq;
>         struct qib_cq_wc *wc;
>         struct ib_cq *ret;
>         u32 sz;
>
> +       if (attr->flags)
> +               return ERR_PTR(-EINVAL);
> +
>         if (entries < 1 || entries > ib_qib_max_cqes) {
>                 ret = ERR_PTR(-EINVAL);
>                 goto done;
> diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
> index bfc8948..e9b26a4 100644
> --- a/drivers/infiniband/hw/qib/qib_verbs.h
> +++ b/drivers/infiniband/hw/qib/qib_verbs.h
> @@ -1007,8 +1007,9 @@ void qib_cq_enter(struct qib_cq *cq, struct ib_wc *entry, int sig);
>
>  int qib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry);
>
> -struct ib_cq *qib_create_cq(struct ib_device *ibdev, int entries,
> -                           int comp_vector, struct ib_ucontext *context,
> +struct ib_cq *qib_create_cq(struct ib_device *ibdev,
> +                           const struct ib_cq_init_attr *attr,
> +                           struct ib_ucontext *context,
>                             struct ib_udata *udata);
>
>  int qib_destroy_cq(struct ib_cq *ibcq);
> diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
> index 53bd6a2..2b3f7e3 100644
> --- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
> +++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
> @@ -570,13 +570,17 @@ int usnic_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
>         return status;
>  }
>
> -struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev, int entries,
> -                                       int vector, struct ib_ucontext *context,
> -                                       struct ib_udata *udata)
> +struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev,
> +                                const struct ib_cq_init_attr *attr,
> +                                struct ib_ucontext *context,
> +                                struct ib_udata *udata)
>  {
>         struct ib_cq *cq;
>
>         usnic_dbg("\n");
> +       if (attr->flags)
> +               return ERR_PTR(-EINVAL);
> +
>         cq = kzalloc(sizeof(*cq), GFP_KERNEL);
>         if (!cq)
>                 return ERR_PTR(-EBUSY);
> diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
> index 57ddba5..2ab8427 100644
> --- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
> +++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
> @@ -46,9 +46,10 @@ struct ib_qp *usnic_ib_create_qp(struct ib_pd *pd,
>  int usnic_ib_destroy_qp(struct ib_qp *qp);
>  int usnic_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
>                                 int attr_mask, struct ib_udata *udata);
> -struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev, int entries,
> -                                       int vector, struct ib_ucontext *context,
> -                                       struct ib_udata *udata);
> +struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev,
> +                                const struct ib_cq_init_attr *attr,
> +                                struct ib_ucontext *context,
> +                                struct ib_udata *udata);
>  int usnic_ib_destroy_cq(struct ib_cq *cq);
>  struct ib_mr *usnic_ib_reg_mr(struct ib_pd *pd, u64 start, u64 length,
>                                 u64 virt_addr, int access_flags,
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index ad499bd..6c715d7 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -173,6 +173,12 @@ struct ib_odp_caps {
>         } per_transport_caps;
>  };
>
> +struct ib_cq_init_attr {
> +       unsigned int    cqe;
> +       int             comp_vector;
> +       u32             flags;
> +};
> +
>  struct ib_device_attr {
>         u64                     fw_ver;
>         __be64                  sys_image_guid;
> @@ -1614,8 +1620,8 @@ struct ib_device {
>         int                        (*post_recv)(struct ib_qp *qp,
>                                                 struct ib_recv_wr *recv_wr,
>                                                 struct ib_recv_wr **bad_recv_wr);
> -       struct ib_cq *             (*create_cq)(struct ib_device *device, int cqe,
> -                                               int comp_vector,
> +       struct ib_cq *             (*create_cq)(struct ib_device *device,
> +                                               const struct ib_cq_init_attr *attr,
>                                                 struct ib_ucontext *context,
>                                                 struct ib_udata *udata);
>         int                        (*modify_cq)(struct ib_cq *cq, u16 cq_count,
> --
> 1.7.1
>
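For readers following the create_cq change quoted above, the pattern each
driver adopts can be sketched in plain userspace C. This is only an
illustration: struct ib_cq_init_attr is copied from the patch (with u32
spelled uint32_t), while demo_create_cq() and DEMO_MAX_CQES are hypothetical
names, and the function returns an errno value directly instead of the
kernel's ERR_PTR().

```c
#include <errno.h>
#include <stdint.h>

/* Mirrors the struct the patch adds to include/rdma/ib_verbs.h. */
struct ib_cq_init_attr {
	unsigned int	cqe;
	int		comp_vector;
	uint32_t	flags;
};

#define DEMO_MAX_CQES 1024	/* hypothetical device limit */

/* Hypothetical stand-in for a driver's create_cq; returns 0 or -errno. */
static int demo_create_cq(const struct ib_cq_init_attr *attr)
{
	int entries = attr->cqe;

	/* This driver supports no creation flags yet, so reject any. */
	if (attr->flags)
		return -EINVAL;

	if (entries < 1 || entries > DEMO_MAX_CQES)
		return -EINVAL;

	return 0;
}
```

A caller that only needs a size, like the mlx5 create_dev_resources() hunk,
initializes the struct with {.cqe = 1} and leaves comp_vector and flags at
zero.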


* Re: [PATCH for-next V2 6/9] IB/core: Pass hardware specific data in query_device
       [not found]     ` <1433074457-26437-7-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-06-01  9:24       ` Devesh Sharma
  0 siblings, 0 replies; 64+ messages in thread
From: Devesh Sharma @ 2015-06-01  9:24 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak

The ocrdma part looks good.

Reviewed-by: Devesh Sharma <devesh.sharma-1wcpHE2jlwO1Z/+hSey0Gg@public.gmane.org>

On Sun, May 31, 2015 at 5:44 PM, Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> Vendors should be able to pass vendor specific data to/from
> user-space via query_device uverb. In order to do this,
> we need to pass the vendors' specific udata.
>
> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
>  drivers/infiniband/core/device.c             |    4 +++-
>  drivers/infiniband/core/uverbs_cmd.c         |    2 +-
>  drivers/infiniband/hw/amso1100/c2_provider.c |    7 +++++--
>  drivers/infiniband/hw/cxgb3/iwch_provider.c  |    8 ++++++--
>  drivers/infiniband/hw/cxgb4/provider.c       |    8 ++++++--
>  drivers/infiniband/hw/ehca/ehca_hca.c        |    6 +++++-
>  drivers/infiniband/hw/ehca/ehca_iverbs.h     |    3 ++-
>  drivers/infiniband/hw/ipath/ipath_verbs.c    |    7 +++++--
>  drivers/infiniband/hw/mlx4/main.c            |    6 +++++-
>  drivers/infiniband/hw/mlx5/main.c            |    9 +++++++--
>  drivers/infiniband/hw/mthca/mthca_provider.c |    7 +++++--
>  drivers/infiniband/hw/nes/nes_verbs.c        |    6 +++++-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |    6 +++++-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  |    3 ++-
>  drivers/infiniband/hw/qib/qib_verbs.c        |    6 ++++--
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.c |    6 +++++-
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.h |    3 ++-
>  include/rdma/ib_verbs.h                      |    3 ++-
>  18 files changed, 75 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index 568cb41..694bd66 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -539,9 +539,11 @@ EXPORT_SYMBOL(ib_dispatch_event);
>  int ib_query_device(struct ib_device *device,
>                     struct ib_device_attr *device_attr)
>  {
> +       struct ib_udata uhw = {.outlen = 0, .inlen = 0};
> +
>         memset(device_attr, 0, sizeof(*device_attr));
>
> -       return device->query_device(device, device_attr);
> +       return device->query_device(device, device_attr, &uhw);
>  }
>  EXPORT_SYMBOL(ib_query_device);
>
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index 11ee298..bbb02ff 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -3428,7 +3428,7 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
>
>         memset(&attr, 0, sizeof(attr));
>
> -       err = device->query_device(device, &attr);
> +       err = device->query_device(device, &attr, uhw);
>         if (err)
>                 return err;
>
> diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
> index a43e022..382f109 100644
> --- a/drivers/infiniband/hw/amso1100/c2_provider.c
> +++ b/drivers/infiniband/hw/amso1100/c2_provider.c
> @@ -63,13 +63,16 @@
>  #include "c2_provider.h"
>  #include "c2_user.h"
>
> -static int c2_query_device(struct ib_device *ibdev,
> -                          struct ib_device_attr *props)
> +static int c2_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
> +                          struct ib_udata *uhw)
>  {
>         struct c2_dev *c2dev = to_c2dev(ibdev);
>
>         pr_debug("%s:%u\n", __func__, __LINE__);
>
> +       if (uhw->inlen || uhw->outlen)
> +               return -EINVAL;
> +
>         *props = c2dev->props;
>         return 0;
>  }
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> index 2eaf7e8..c4b5936 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> @@ -1150,13 +1150,17 @@ static u64 fw_vers_string_to_u64(struct iwch_dev *iwch_dev)
>                (fw_mic & 0xffff);
>  }
>
> -static int iwch_query_device(struct ib_device *ibdev,
> -                            struct ib_device_attr *props)
> +static int iwch_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
> +                            struct ib_udata *uhw)
>  {
>
>         struct iwch_dev *dev;
> +
>         PDBG("%s ibdev %p\n", __func__, ibdev);
>
> +       if (uhw->inlen || uhw->outlen)
> +               return -EINVAL;
> +
>         dev = to_iwch_dev(ibdev);
>         memset(props, 0, sizeof *props);
>         memcpy(&props->sys_image_guid, dev->rdev.t3cdev_p->lldev->dev_addr, 6);
> diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
> index ef08a9f..05a96a5 100644
> --- a/drivers/infiniband/hw/cxgb4/provider.c
> +++ b/drivers/infiniband/hw/cxgb4/provider.c
> @@ -301,13 +301,17 @@ static int c4iw_query_gid(struct ib_device *ibdev, u8 port, int index,
>         return 0;
>  }
>
> -static int c4iw_query_device(struct ib_device *ibdev,
> -                            struct ib_device_attr *props)
> +static int c4iw_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
> +                            struct ib_udata *uhw)
>  {
>
>         struct c4iw_dev *dev;
> +
>         PDBG("%s ibdev %p\n", __func__, ibdev);
>
> +       if (uhw->inlen || uhw->outlen)
> +               return -EINVAL;
> +
>         dev = to_c4iw_dev(ibdev);
>         memset(props, 0, sizeof *props);
>         memcpy(&props->sys_image_guid, dev->rdev.lldi.ports[0]->dev_addr, 6);
> diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c
> index 9ed4d25..e8b1bb6 100644
> --- a/drivers/infiniband/hw/ehca/ehca_hca.c
> +++ b/drivers/infiniband/hw/ehca/ehca_hca.c
> @@ -50,7 +50,8 @@ static unsigned int limit_uint(unsigned int value)
>         return min_t(unsigned int, value, INT_MAX);
>  }
>
> -int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
> +int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
> +                     struct ib_udata *uhw)
>  {
>         int i, ret = 0;
>         struct ehca_shca *shca = container_of(ibdev, struct ehca_shca,
> @@ -71,6 +72,9 @@ int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
>                 IB_DEVICE_PORT_ACTIVE_EVENT,  HCA_CAP_PORT_ACTIVE_EVENT,
>         };
>
> +       if (uhw->inlen || uhw->outlen)
> +               return -EINVAL;
> +
>         rblock = ehca_alloc_fw_ctrlblock(GFP_KERNEL);
>         if (!rblock) {
>                 ehca_err(&shca->ib_device, "Can't allocate rblock memory.");
> diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
> index 952f2fb..192eef1 100644
> --- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
> +++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
> @@ -44,7 +44,8 @@
>
>  #include "ehca_classes.h"
>
> -int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props);
> +int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
> +                     struct ib_udata *uhw);
>
>  int ehca_query_port(struct ib_device *ibdev, u8 port,
>                     struct ib_port_attr *props);
> diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
> index 764081d..78af73c 100644
> --- a/drivers/infiniband/hw/ipath/ipath_verbs.c
> +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
> @@ -1495,11 +1495,14 @@ bail:
>         return 0;
>  }
>
> -static int ipath_query_device(struct ib_device *ibdev,
> -                             struct ib_device_attr *props)
> +static int ipath_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
> +                             struct ib_udata *uhw)
>  {
>         struct ipath_ibdev *dev = to_idev(ibdev);
>
> +       if (uhw->inlen || uhw->outlen)
> +               return -EINVAL;
> +
>         memset(props, 0, sizeof(*props));
>
>         props->device_cap_flags = IB_DEVICE_BAD_PKEY_CNTR |
> diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
> index 3b90df0..914beae 100644
> --- a/drivers/infiniband/hw/mlx4/main.c
> +++ b/drivers/infiniband/hw/mlx4/main.c
> @@ -132,7 +132,8 @@ static int num_ib_ports(struct mlx4_dev *dev)
>  }
>
>  static int mlx4_ib_query_device(struct ib_device *ibdev,
> -                               struct ib_device_attr *props)
> +                               struct ib_device_attr *props,
> +                               struct ib_udata *uhw)
>  {
>         struct mlx4_ib_dev *dev = to_mdev(ibdev);
>         struct ib_smp *in_mad  = NULL;
> @@ -140,6 +141,9 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
>         int err = -ENOMEM;
>         int have_ib_ports;
>
> +       if (uhw->inlen || uhw->outlen)
> +               return -EINVAL;
> +
>         in_mad  = kzalloc(sizeof *in_mad, GFP_KERNEL);
>         out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
>         if (!in_mad || !out_mad)
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index b9976a0..5a8486c 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -63,7 +63,8 @@ static char mlx5_version[] =
>         DRIVER_VERSION " (" DRIVER_RELDATE ")\n";
>
>  static int mlx5_ib_query_device(struct ib_device *ibdev,
> -                               struct ib_device_attr *props)
> +                               struct ib_device_attr *props,
> +                               struct ib_udata *uhw)
>  {
>         struct mlx5_ib_dev *dev = to_mdev(ibdev);
>         struct ib_smp *in_mad  = NULL;
> @@ -74,6 +75,9 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
>         int max_sq_sg;
>         u64 flags;
>
> +       if (uhw->inlen || uhw->outlen)
> +               return -EINVAL;
> +
>         gen = &dev->mdev->caps.gen;
>         in_mad  = kzalloc(sizeof(*in_mad), GFP_KERNEL);
>         out_mad = kmalloc(sizeof(*out_mad), GFP_KERNEL);
> @@ -910,6 +914,7 @@ static int get_port_caps(struct mlx5_ib_dev *dev)
>         struct mlx5_general_caps *gen;
>         int err = -ENOMEM;
>         int port;
> +       struct ib_udata uhw = {.inlen = 0, .outlen = 0};
>
>         gen = &dev->mdev->caps.gen;
>         pprops = kmalloc(sizeof(*pprops), GFP_KERNEL);
> @@ -920,7 +925,7 @@ static int get_port_caps(struct mlx5_ib_dev *dev)
>         if (!dprops)
>                 goto out;
>
> -       err = mlx5_ib_query_device(&dev->ib_dev, dprops);
> +       err = mlx5_ib_query_device(&dev->ib_dev, dprops, &uhw);
>         if (err) {
>                 mlx5_ib_warn(dev, "query_device failed %d\n", err);
>                 goto out;
> diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
> index f1b04bd..d6d7c16 100644
> --- a/drivers/infiniband/hw/mthca/mthca_provider.c
> +++ b/drivers/infiniband/hw/mthca/mthca_provider.c
> @@ -57,14 +57,17 @@ static void init_query_mad(struct ib_smp *mad)
>         mad->method        = IB_MGMT_METHOD_GET;
>  }
>
> -static int mthca_query_device(struct ib_device *ibdev,
> -                             struct ib_device_attr *props)
> +static int mthca_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
> +                             struct ib_udata *uhw)
>  {
>         struct ib_smp *in_mad  = NULL;
>         struct ib_smp *out_mad = NULL;
>         int err = -ENOMEM;
>         struct mthca_dev *mdev = to_mdev(ibdev);
>
> +       if (uhw->inlen || uhw->outlen)
> +               return -EINVAL;
> +
>         in_mad  = kzalloc(sizeof *in_mad, GFP_KERNEL);
>         out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
>         if (!in_mad || !out_mad)
> diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
> index a9d62c5..0c2fb7a 100644
> --- a/drivers/infiniband/hw/nes/nes_verbs.c
> +++ b/drivers/infiniband/hw/nes/nes_verbs.c
> @@ -512,12 +512,16 @@ static void nes_free_fast_reg_page_list(struct ib_fast_reg_page_list *pifrpl)
>  /**
>   * nes_query_device
>   */
> -static int nes_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
> +static int nes_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
> +                           struct ib_udata *uhw)
>  {
>         struct nes_vnic *nesvnic = to_nesvnic(ibdev);
>         struct nes_device *nesdev = nesvnic->nesdev;
>         struct nes_ib_device *nesibdev = nesvnic->nesibdev;
>
> +       if (uhw->inlen || uhw->outlen)
> +               return -EINVAL;
> +
>         memset(props, 0, sizeof(*props));
>         memcpy(&props->sys_image_guid, nesvnic->netdev->dev_addr, 6);
>
> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> index f441df6..fce11ec 100644
> --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> @@ -61,10 +61,14 @@ int ocrdma_query_gid(struct ib_device *ibdev, u8 port,
>         return 0;
>  }
>
> -int ocrdma_query_device(struct ib_device *ibdev, struct ib_device_attr *attr)
> +int ocrdma_query_device(struct ib_device *ibdev, struct ib_device_attr *attr,
> +                       struct ib_udata *uhw)
>  {
>         struct ocrdma_dev *dev = get_ocrdma_dev(ibdev);
>
> +       if (uhw->inlen || uhw->outlen)
> +               return -EINVAL;
> +
>         memset(attr, 0, sizeof *attr);
>         memcpy(&attr->fw_ver, &dev->attr.fw_ver[0],
>                min(sizeof(dev->attr.fw_ver), sizeof(attr->fw_ver)));
> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
> index e853195..b15c608 100644
> --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
> +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
> @@ -36,7 +36,8 @@ int ocrdma_post_recv(struct ib_qp *, struct ib_recv_wr *,
>  int ocrdma_poll_cq(struct ib_cq *, int num_entries, struct ib_wc *wc);
>  int ocrdma_arm_cq(struct ib_cq *, enum ib_cq_notify_flags flags);
>
> -int ocrdma_query_device(struct ib_device *, struct ib_device_attr *props);
> +int ocrdma_query_device(struct ib_device *, struct ib_device_attr *props,
> +                       struct ib_udata *uhw);
>  int ocrdma_query_port(struct ib_device *, u8 port, struct ib_port_attr *props);
>  int ocrdma_modify_port(struct ib_device *, u8 port, int mask,
>                        struct ib_port_modify *props);
> diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
> index dba1c92..e1dad45 100644
> --- a/drivers/infiniband/hw/qib/qib_verbs.c
> +++ b/drivers/infiniband/hw/qib/qib_verbs.c
> @@ -1550,12 +1550,14 @@ full:
>         }
>  }
>
> -static int qib_query_device(struct ib_device *ibdev,
> -                           struct ib_device_attr *props)
> +static int qib_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
> +                           struct ib_udata *uhw)
>  {
>         struct qib_devdata *dd = dd_from_ibdev(ibdev);
>         struct qib_ibdev *dev = to_idev(ibdev);
>
> +       if (uhw->inlen || uhw->outlen)
> +               return -EINVAL;
>         memset(props, 0, sizeof(*props));
>
>         props->device_cap_flags = IB_DEVICE_BAD_PKEY_CNTR |
> diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
> index 2b3f7e3..7df4382 100644
> --- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
> +++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
> @@ -248,7 +248,8 @@ enum rdma_link_layer usnic_ib_port_link_layer(struct ib_device *device,
>  }
>
>  int usnic_ib_query_device(struct ib_device *ibdev,
> -                               struct ib_device_attr *props)
> +                         struct ib_device_attr *props,
> +                         struct ib_udata *uhw)
>  {
>         struct usnic_ib_dev *us_ibdev = to_usdev(ibdev);
>         union ib_gid gid;
> @@ -257,6 +258,9 @@ int usnic_ib_query_device(struct ib_device *ibdev,
>         int qp_per_vf;
>
>         usnic_dbg("\n");
> +       if (uhw->inlen || uhw->outlen)
> +               return -EINVAL;
> +
>         mutex_lock(&us_ibdev->usdev_lock);
>         us_ibdev->netdev->ethtool_ops->get_drvinfo(us_ibdev->netdev, &info);
>         us_ibdev->netdev->ethtool_ops->get_settings(us_ibdev->netdev, &cmd);
> diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
> index 2ab8427..0bd04ef 100644
> --- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
> +++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
> @@ -24,7 +24,8 @@
>  enum rdma_link_layer usnic_ib_port_link_layer(struct ib_device *device,
>                                                 u8 port_num);
>  int usnic_ib_query_device(struct ib_device *ibdev,
> -                               struct ib_device_attr *props);
> +                               struct ib_device_attr *props,
> +                         struct ib_udata *uhw);
>  int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
>                                 struct ib_port_attr *props);
>  enum rdma_protocol_type
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index a45f674..045b5b2 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1562,7 +1562,8 @@ struct ib_device {
>         int                        (*get_protocol_stats)(struct ib_device *device,
>                                                          union rdma_protocol_stats *stats);
>         int                        (*query_device)(struct ib_device *device,
> -                                                  struct ib_device_attr *device_attr);
> +                                                  struct ib_device_attr *device_attr,
> +                                                  struct ib_udata *udata);
>         int                        (*query_port)(struct ib_device *device,
>                                                  u8 port_num,
>                                                  struct ib_port_attr *port_attr);
> --
> 1.7.1
>
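The calling convention this patch introduces can be modelled in a few lines
of userspace C. This is a sketch under assumptions, not kernel code: struct
demo_udata keeps only the inlen and outlen fields of the kernel's struct
ib_udata, and every demo_* name and the max_cqe value are made up for
illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Only the two length fields of struct ib_udata are modelled here. */
struct demo_udata {
	size_t inlen;
	size_t outlen;
};

struct demo_device_attr {
	int max_cqe;
};

/* Driver callback with no vendor data: any vendor payload is an error. */
static int demo_query_device(struct demo_device_attr *props,
			     const struct demo_udata *uhw)
{
	if (uhw->inlen || uhw->outlen)
		return -EINVAL;

	memset(props, 0, sizeof(*props));
	props->max_cqe = 1024;	/* arbitrary illustrative value */
	return 0;
}

/* In-kernel style caller: passes an empty udata, as ib_query_device() does. */
static int demo_query_device_kernel(struct demo_device_attr *props)
{
	struct demo_udata uhw = { .inlen = 0, .outlen = 0 };

	return demo_query_device(props, &uhw);
}
```

Rejecting a non-empty udata keeps drivers that do not yet understand vendor
data from silently ignoring a request they cannot honour.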


* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]     ` <1433098827.114391.179.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-06-01  9:30       ` Matan Barak
       [not found]         ` <CAAKD3BBo9A5AV9Uv-q1TgSQ6hs-oSUn5QsUT+5+6mk1RwW7AUA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Matan Barak @ 2015-06-01  9:30 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Sun, May 31, 2015 at 10:00 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Sun, 2015-05-31 at 15:14 +0300, Or Gerlitz wrote:
>> Hi Doug,
>>
>> This patchset adds completion timestamping support for verbs consumers.
>>
>> Reviewing the weekend threads, we've changed the flag name to reflect
>> that this is a completion time-stamp and folded the actual mlx4 support
>> into one patch.
>>
>> Regarding the related user-space support, it's possible to add what you
>> were suggesting, ibv_get_raw_cqe_timestamp() -- returns ts in cycles and
>> ibv_get_cqe_timestamp() -- returns ts in ns, this makes the value returned
>> by the poll cq verb an opaque one that must go through one of the converters.
>>
>> We would like to go for one helper ibv_get_timestamp(uint64_t raw_time, flag) which
>> could get the raw time-stamp and one of the following flags: RAW_TIME, RAW_NS_TIME.
>
> I'm theoretically OK with something similar to the above.  However, the
> NS time should not be raw.  It should be cooked and should be able to be
> valid to compare between different adapters.  Right now, the cycle
> counter that you are exposing is only useful for ordering between
> packets received on a single adapter where the cycle counter is the same
> on all packets.  Throw in a different vendor's card, or two of your own
> cards, and the issue gets much more complex.  The cooked value should be
> an actual, real time that can be used across these more complex
> environments.  Because of that, it really shouldn't be called RAW.
>

Thanks for the feedback, Doug.
We wanted to add RAW_NS in order to spare the user from calculating it himself
(dividing the cycle count by the core clock frequency).
In addition, it's possible to implement a future NS_TIME
(without the "raw"), which
will convert the opaque time to system-wide ns.

> So, if you want a single entry point, I would suggest something like
> this:
>
> enum ib_timestamp_flags {
>         IB_TIMESTAMP_COMPLETION = (1 << 0), // specify on create_cq
>         IB_TIMESTAMP_WQE_BEGIN =  (1 << 1), // specify on create qp?
>         IB_TIMESTAMP_WQE_END =    (1 << 2), // specify on create qp?
>         IB_TIMESTAMP_RAW =        (1 << 31)
> };
>
> enum ib_cq_creation_flags {
>         IB_CQ_FLAGS_TIMESTAMP_COMPLETION = (1 << 0)
> };
>
> /**
>  * ibv_get_timestamp - Return the requested timestamp for the given wc
>  * @wc - work completion to get timestamp results from
>  * @ts - struct timespec to return timestamp in
>  * @flags - which timestamp to return and in what form
>  *
>  * Depending on the flags used to create the queue pair/completion
>  * queue, different timestamps might be available.  Callers should
>  * specify which timestamp they are interested in using the flags
>  * element, and if they wish either a cooked or raw timestamp.  A
>  * raw timestamp is implementation defined and will be passed back
>  * in the tv_nsec portion of the struct timespec.  A raw timestamp
>  * can not be relied upon to have any ordering value between more
>  * than one HCA or driver.  A cooked timestamp will return a valid
>  * struct timespec normalized as closely as possible to the return
>  * value for CLOCK_MONOTONIC of clock_gettime at the time of the
>  * timestamp.
>  */
> int ibv_get_timestamp(struct ibv_wc *wc, struct timespec *ts, int
> flags);
>

We wanted to split the flow here:
In create_cq, the user tells the kernel/HCA which timestamp he
would like to get.
It could be a completion timestamp, a start-of-WQE timestamp or
whatever he wants.
The timestamp the user gets in the WQE is opaque. Every vendor could
implement it
as it wants, in order to minimize the performance impact.

The second part is ibv_get_timestamp. It gets an opaque timestamp and
outputs a value converted to the time format the user asked for.
For example, if IB_TIMESTAMP_NS_TIME is given, the function should output
a system-wide NS value (we would like to implement this only in the future).
Currently, only RAW and RAW_NS will be supported: RAW gives the time
in cycles and RAW_NS gives a NS value with an unknown time reference.

We think ibv_get_timestamp shouldn't get a wqe but a 64-bit opaque value.
The reason is that it could then also be used to translate the current time
returned by query_values to the different types of timestamp.
What do you think?
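
To make the proposal concrete, here is a rough sketch of such a helper. The
names (ts_convert, TS_RAW, TS_RAW_NS) are made up for illustration and are not
part of libibverbs, and the core clock is assumed to be reported in kHz, which
is only a guess based on the "divide by hca_core_clock to get ms" remark in the
cover letter.

```c
#include <stdint.h>

/* Hypothetical format selectors, following the RAW / RAW_NS naming above. */
enum ts_format {
	TS_RAW,		/* HCA cycles, returned unchanged */
	TS_RAW_NS,	/* nanoseconds with an unknown time reference */
};

/*
 * Convert an opaque 64-bit timestamp to the requested format.
 * core_clock_khz is assumed to be the HCA core clock in kHz.
 * Returns 0 on success, -1 on an unsupported format or a zero clock.
 */
static int ts_convert(uint64_t raw, uint64_t core_clock_khz,
		      enum ts_format fmt, uint64_t *out)
{
	switch (fmt) {
	case TS_RAW:
		*out = raw;
		return 0;
	case TS_RAW_NS:
		if (!core_clock_khz)
			return -1;
		/*
		 * ns = cycles * 10^6 / kHz. Quotient and remainder are
		 * scaled separately to keep the intermediate products small.
		 */
		*out = (raw / core_clock_khz) * 1000000ULL +
		       (raw % core_clock_khz) * 1000000ULL / core_clock_khz;
		return 0;
	default:
		return -1;
	}
}
```

A system-wide NS format would need an extra case plus some per-device clock
state, which is why it is left out of this sketch.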


Thanks for taking a look.

Matan

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]         ` <CAAKD3BBo9A5AV9Uv-q1TgSQ6hs-oSUn5QsUT+5+6mk1RwW7AUA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-06-01 11:25           ` Doug Ledford
       [not found]             ` <1433157904.114391.188.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Doug Ledford @ 2015-06-01 11:25 UTC (permalink / raw)
  To: Matan Barak
  Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

[-- Attachment #1: Type: text/plain, Size: 13671 bytes --]

On Mon, 2015-06-01 at 12:30 +0300, Matan Barak wrote:
> On Sun, May 31, 2015 at 10:00 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > On Sun, 2015-05-31 at 15:14 +0300, Or Gerlitz wrote:
> >> Hi Doug,
> >>
> >> This patchset adds completion timestamping support for verbs consumers.
> >>
> >> Reviewing the weekend threads, we've changed the flag name to reflect
> >> that this is a completion time-stamp and folded the actual mlx4 support
> >> into one patch.
> >>
> >> Regarding the related user-space support, it's possible to add what you
> >> were suggesting, ibv_get_raw_cqe_timestamp() -- returns ts in cycles and
> >> ibv_get_cqe_timestamp() -- returns ts in ns, this makes the value returned
> >> by the poll cq verb an opaque one that must go through one of the converters.
> >>
> >> We would like to go for one helper ibv_get_timestamp(uint64_t raw_time, flag) which
> >> could get the raw time-stamp and one of the following flags: RAW_TIME, RAW_NS_TIME.
> >
> > I'm theoretically OK with something similar to the above.  However, the
> > NS time should not be raw.  It should be cooked and should be able to be
> > valid to compare between different adapters.  Right now, the cycle
> > counter that you are exposing is only useful for ordering between
> > packets received on a single adapter where the cycle counter is the same
> > on all packets.  Throw in a different vendor's card, or two of your own
> > cards, and the issue gets much more complex.  The cooked value should be
> > an actual, real time that can be used across these more complex
> > environments.  Because of that, it really shouldn't be called RAW.
> >
> 
> Thanks for the feedback, Doug.
> We wanted to add RAW_NS in order to spare the user from calculating it himself
> (dividing the cycle count by the core clock frequency).

What's the point?  If it's raw, it's raw.  It's not coordinated between
adapters.  Whether it's in ns or ps or flipflops doesn't matter, it's a
flat number that has no reference to anything else, so the only thing
that matters is < another version of itself or not.
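
For what it's worth, even that "<" test needs a little care, because the
counter is narrower than 64 bits and will wrap. A minimal sketch, assuming the
device reports the number of valid timestamp bits (raw_ts_after and valid_bits
are illustrative names, not an existing API):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if raw timestamp a was taken after b on the same HCA.
 * valid_bits is the number of timestamp bits the device supports.
 * The comparison is done modulo 2^valid_bits, so it is only meaningful
 * while the two samples are less than half the counter range apart.
 */
static bool raw_ts_after(uint64_t a, uint64_t b, unsigned int valid_bits)
{
	uint64_t mask = valid_bits >= 64 ? UINT64_MAX
					 : ((1ULL << valid_bits) - 1);
	uint64_t diff = (a - b) & mask;
	uint64_t half = (mask >> 1) + 1;	/* half of the counter range */

	return diff != 0 && diff < half;
}
```

This still says nothing about ordering across two adapters, which is the
point being made above.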

> In addition, it's possible to implement a future NS_TIME
> (without the "raw"), which
> will convert the opaque time to system-wide ns.
> 
> > So, if you want a single entry point, I would suggest something like
> > this:
> >
> > enum ib_timestamp_flags {
> >         IB_TIMESTAMP_COMPLETION = (1 << 0), // specify on create_cq
> >         IB_TIMESTAMP_WQE_BEGIN =  (1 << 1), // specify on create qp?
> >         IB_TIMESTAMP_WQE_END =    (1 << 2), // specify on create qp?
> >         IB_TIMESTAMP_RAW =        (1 << 31)
> > };
> >
> > enum ib_cq_creation_flags {
> >         IB_CQ_FLAGS_TIMESTAMP_COMPLETION = (1 << 0)
> > };
> >
> > /**
> >  * ibv_get_timestamp - Return the requested timestamp for the given wc
> >  * @wc - work completion to get timestamp results from
> >  * @ts - struct timespec to return timestamp in
> >  * @flags - which timestamp to return and in what form
> >  *
> >  * Depending on the flags used to create the queue pair/completion
> >  * queue, different timestamps might be available.  Callers should
> >  * specify which timestamp they are interested in using the flags
> >  * element, and if they wish either a cooked or raw timestamp.  A
> >  * raw timestamp is implementation defined and will be passed back
> >  * in the tv_nsec portion of the struct timespec.  A raw timestamp
> >  * can not be relied upon to have any ordering value between more
> >  * than one HCA or driver.  A cooked timestamp will return a valid
> >  * struct timespec normalized as closely as possible to the return
> >  * value for CLOCK_MONOTONIC of clock_gettime at the time of the
> >  * timestamp.
> >  */
> > int ibv_get_timestamp(struct ibv_wc *wc, struct timespec *ts, int
> > flags);
> >
> 
> We wanted to split the flow here:
> In create_cq, the user tells the kernel/HCA which timestamp he
> would like to get.

Correction, which timestamp*s*.

> It could be a completion timestamp, a start-of-WQE timestamp or
> whatever he wants.
> The timestamp the user gets in the WQE is opaque. Every vendor could
> implement it
> as it wants, in order to minimize the performance impact.

Again, timestamp(s).

> The second part is ibv_get_timestamp. It gets an opaque timestamp

No.  As you've already pointed out, how each vendor implements returning
the timestamp(s) could be totally different.  There are no timestamp
entries in the existing wc struct.  Expecting the user to pass the raw
value to the ibv_get_timestamp function makes no sense and violates the
attempted abstraction of ibverbs.  Passing in the wc struct allows the
driver to internally allocate a wc struct with extra private elements
and pass that back to the user; when the user passes it back to
ibv_get_timestamp, the elements are there in the private portion of the
struct.

>  and
> outputs a value converted to the time format the user asked for.
> For example, if IB_TIMESTAMP_NS_TIME is given, the function should output
> a system-wide NS value (we would like to implement this only in the future).
> Currently, only RAW and RAW_NS will be supported: RAW gives the time
> in cycles and RAW_NS gives a NS value with an unknown time reference.

Raw is raw.  Converting from raw yogurt to raw purple makes no sense,
it's raw.

> We think ibv_get_timestamp shouldn't get a wqe but a 64-bit opaque value.
> The reason is that it could then also be used to translate the current time
> returned by query_values to the different types of timestamp.
> What do you think?

See above.  It needs to be a wc struct unless you plan to identify every
timestamp we might concurrently enable and publicly change the wc struct
to include all of them so that the user can get the opaque value to pass
to the conversion function.



-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]             ` <1433157904.114391.188.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-06-01 13:58               ` Christoph Lameter
       [not found]                 ` <alpine.DEB.2.11.1506010852360.26845-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  2015-06-01 16:43               ` Jason Gunthorpe
  1 sibling, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2015-06-01 13:58 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon

On Mon, 1 Jun 2015, Doug Ledford wrote:

> What's the point?  If it's raw, it's raw.  It's not coordinated between
> adapters.  Whether it's in ns or ps or flipflops doesn't matter, it's a
> flat number that has no reference to anything else, so the only thing
> that matters is < another version of itself or not.

It can be coordinated between different adapters through the use of time
software that can work with cycles and frequencies to scale the value of
the cycles to realtime. Software like that is available in ptpd,
timekeeper etc. Each NIC basically has its own clock and the timekeeping
software would have to track the scaling and the aberration factor over
time in order to come up with accurate absolute time values derived from
the cycle counters of these NICs. Since we are dealing here with values
that need to be accurate to within less than 100ns this is not trivial and
one can easily get a ns value that is absolutely useless.

Since it is not trivial it's better kept out of the timestamp support in
the RDMA API. If the app developer wants a trivial conversion then they
can open-code a simple division by the frequency. At that point it
should be clear though that this raw time value is of limited use given
its inaccuracy and the dependence on the NIC clock.
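
To illustrate what "tracking the scaling" means in the simplest possible
form, here is a sketch that fits a straight line through two paired readings
of the NIC cycle counter and the system clock. All names are hypothetical,
and real timekeeping software does far more (filtering, drift tracking,
outlier rejection), so this shows the idea and nothing else.

```c
#include <stdint.h>

/* One paired reading of the NIC counter and the system clock. */
struct clock_sample {
	uint64_t cycles;	/* NIC cycle counter reading */
	uint64_t sys_ns;	/* system clock reading taken at the same moment */
};

/* Linear mapping from NIC cycles to system nanoseconds. */
struct cycle_map {
	struct clock_sample base;	/* most recent paired reading */
	double ns_per_cycle;		/* estimated scaling factor */
};

/* Estimate the scaling from two paired readings; returns -1 if unusable. */
static int cycle_map_fit(struct cycle_map *m, struct clock_sample older,
			 struct clock_sample newer)
{
	if (newer.cycles <= older.cycles || newer.sys_ns < older.sys_ns)
		return -1;
	m->base = newer;
	m->ns_per_cycle = (double)(newer.sys_ns - older.sys_ns) /
			  (double)(newer.cycles - older.cycles);
	return 0;
}

/* Map a cycle count to system time; works on either side of the base. */
static uint64_t cycle_map_to_ns(const struct cycle_map *m, uint64_t cycles)
{
	int64_t delta = (int64_t)(cycles - m->base.cycles);
	int64_t delta_ns = (int64_t)((double)delta * m->ns_per_cycle);

	return m->base.sys_ns + (uint64_t)delta_ns;
}
```

The error of such a mapping grows with the distance from the base sample and
with the jitter in taking the paired readings, which is where the sub-100ns
requirement becomes hard.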




^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                 ` <alpine.DEB.2.11.1506010852360.26845-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2015-06-01 16:39                   ` Doug Ledford
  0 siblings, 0 replies; 64+ messages in thread
From: Doug Ledford @ 2015-06-01 16:39 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon

[-- Attachment #1: Type: text/plain, Size: 1911 bytes --]

On Mon, 2015-06-01 at 08:58 -0500, Christoph Lameter wrote:
> On Mon, 1 Jun 2015, Doug Ledford wrote:
> 
> > What's the point?  If it's raw, it's raw.  It's not coordinated between
> > adapters.  Whether it's in ns or ps or flipflops doesn't matter, it's a
> > flat number that has no reference to anything else, so the only thing
> > that matters is < another version of itself or not.
> 
> It can be coordinated between different adapters through the use of time
> software that can work with cycles and frequencies to scale the value of
> the cycles to realtime.

And that is precisely what the cooked values should be.

>  Software like that is available in ptpd,
> timekeeper etc. Each NIC basically has its own clock and the timekeeping
> software would have to track the scaling and the aberration factor over
> time in order to come up with accurate absolute time values derived from
> the cycle counters of these NICs. Since we are dealing here with values
> that need to be accurate to within less than 100ns this is not trivial and
> one can easily get a ns value that is absolutely useless.

Agreed.  The cooked value is not going to be a simple thing.  I fail to
see how this is making a case that we should duplicate that code in
every app that uses a timestamp versus getting it right once in
libibverbs.

> Since it is not trivial it's better kept out of the timestamp support in
> the RDMA API. If the app developer wants a trivial conversion then they
> can open-code a simple division by the frequency. At that point it
> should be clear though that this raw time value is of limited use given
> its inaccuracy and the dependence on the NIC clock.

The raw value is just that: raw.  And it *is* of limited use unless you
only have one adapter.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]             ` <1433157904.114391.188.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-06-01 13:58               ` Christoph Lameter
@ 2015-06-01 16:43               ` Jason Gunthorpe
       [not found]                 ` <20150601164322.GA14391-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-01 16:43 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon

On Mon, Jun 01, 2015 at 07:25:04AM -0400, Doug Ledford wrote:

> attempted abstraction of ibverbs.  Passing in the wc struct allows the
> driver to internally allocate a wc struct with extra private elements
> and pass that back to the user; when the user passes it back to
> ibv_get_timestamp, the elements are there in the private portion of the
> struct.

wc structures are allocated by the caller, so there is no option for the
driver to create private elements.

AFAIK, Christoph's use case is essentially the only meaningful use
case for this feature; generalizing too much may destroy the
performance that is valuable here.

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr
       [not found]     ` <1433074457-26437-3-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-06-01 16:53       ` Jason Gunthorpe
       [not found]         ` <20150601165335.GB14391-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-01 16:53 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak

On Sun, May 31, 2015 at 03:14:10PM +0300, Or Gerlitz wrote:

> +	struct ib_cq_init_attr cq_attr;
>  
>  	/* Create new device info */
>  	port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL);
> @@ -2943,9 +2944,11 @@ static int ib_mad_port_open(struct ib_device *device,
>  	if (has_smi)
>  		cq_size *= 2;
>  
> +	memset(&cq_attr, 0, sizeof(cq_attr));
> +	cq_attr.cqe = cq_size;

Why does this patch switch to using memset when the prior patch used
= {} ?

> @@ -1075,12 +1075,11 @@ EXPORT_SYMBOL(ib_destroy_qp);
>  struct ib_cq *ib_create_cq(struct ib_device *device,
>  			   ib_comp_handler comp_handler,
>  			   void (*event_handler)(struct ib_event *, void *),
> -			   void *cq_context, int cqe, int comp_vector)
> +			   void *cq_context, struct ib_cq_init_attr *cq_attr)
>  {
>  	struct ib_cq *cq;
> -	struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector};
>  
> -	cq = device->create_cq(device, &attr, NULL, NULL);
> +	cq = device->create_cq(device, cq_attr, NULL, NULL);

How does this compile without warnings?

The prior patch did:

-       struct ib_cq *             (*create_cq)(struct ib_device *device, int cqe,
-                                               int comp_vector,
+       struct ib_cq *             (*create_cq)(struct ib_device *device,
+                                               const struct ib_cq_init_attr *attr,
                                                struct ib_ucontext *context,
                                                struct ib_udata *udata);

Otherwise looks OK.

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs
       [not found]     ` <1433074457-26437-9-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-06-01 16:56       ` Jason Gunthorpe
       [not found]         ` <20150601165649.GC14391-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-01 16:56 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak

On Sun, May 31, 2015 at 03:14:16PM +0300, Or Gerlitz wrote:
> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Add support for ib_uverbs_ex_create_cq and ib_uverbs_ex_query_device
> by setting the appropriate bit in uverbs_ex_cmd_mask.

Why is this a separate patch? Surely the bits should be or'd in the patches
that actually include the code to do the new commands?

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 1/9] IB/core: Change provider's API of create_cq to be extendible
       [not found]     ` <1433074457-26437-2-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-06-01  9:21       ` Devesh Sharma
@ 2015-06-01 17:00       ` Jason Gunthorpe
  1 sibling, 0 replies; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-01 17:00 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak

On Sun, May 31, 2015 at 03:14:09PM +0300, Or Gerlitz wrote:
> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Add a new ib_cq_init_attr structure which contains the
> previous cqe (minimum number of CQ entries) and comp_vector
> (completion vector) in addition to a new flags field.
> All vendors' create_cq callbacks are changed in order
> to work with the new API.
> 
> This commit does not change any functionality.
> 
> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Reviewed-By: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                 ` <20150601164322.GA14391-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-01 17:00                   ` Doug Ledford
       [not found]                     ` <1433178057.114391.208.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-06-02 14:35                   ` Doug Ledford
  1 sibling, 1 reply; 64+ messages in thread
From: Doug Ledford @ 2015-06-01 17:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon


On Mon, 2015-06-01 at 10:43 -0600, Jason Gunthorpe wrote:
> On Mon, Jun 01, 2015 at 07:25:04AM -0400, Doug Ledford wrote:
> 
> > attempted abstraction of ibverbs.  Passing in the wc struct allows the
> > driver to internally allocate a wc struct with extra private elements
> > and pass that back to the user, when the user passes it back to
> > ibv_get_timestamp the elements are there in the private portion of the
> > struct.
> 
> wc structures are allocated by the caller, there is no option for the
> driver to create private elements.

You're right, the data would have to be housed somewhere in the driver
private completion structs (for example, in the CQE the card posts to
memory that the driver then massages into a WC).

> AFAIK, Christoph's use case is essentially the only meaningful use
> case for this feature, generalizing too much may destroy the
> performance that is valuable here.

I'm not convinced of that.  Steve has already spoken up about the
timestamps available in cxgb4.  Those are very different and yet still
highly valuable to someone investigating performance of their RDMA
application.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                     ` <1433178057.114391.208.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-06-01 17:29                       ` Jason Gunthorpe
  0 siblings, 0 replies; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-01 17:29 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon

On Mon, Jun 01, 2015 at 01:00:57PM -0400, Doug Ledford wrote:
> > case for this feature, generalizing too much may destroy the
> > performance that is valuable here.
 
> I'm not convinced of that.  Steve has already spoken up about the
> timestamps available in cxgb4.  Those are very different and yet still
> highly valuable to someone investigating performance of their RDMA
> application.

? cxgb4 looks nearly identical to me. There is only one HW time stamp,
'cqe_sge_ts', which occurs at some point in the flow and is written to
the CQE. The current cycle counter can be read from the SGE_TIMESTAMP_LO
registers. Same as mlx4, really.

The rest is just bookkeeping and logging that doesn't require special
verbs support for an app to implement.

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs
       [not found]         ` <20150601165649.GC14391-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-02  7:21           ` Matan Barak
       [not found]             ` <CAAKD3BBsTFYKwOBK+6u04x6deRYzjYESJzbpyy83UtoChZhpOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Matan Barak @ 2015-06-02  7:21 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak

On Mon, Jun 1, 2015 at 7:56 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Sun, May 31, 2015 at 03:14:16PM +0300, Or Gerlitz wrote:
>> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> Add support for ib_uverbs_ex_create_cq and ib_uverbs_ex_query_device
>> by setting the appropriate bit in uverbs_ex_cmd_mask.
>
> Why is this a separate patch? Surely the bits should be or'd in the patches
> that actually include the code to do the new commands?
>

Although this patch can stand on its own (requesting unsupported flags
will just fail and the comp_mask of uhw's will be disabled), there's no
real value in not squashing it into the other mlx4 patches.
We'll do that for the next version.

> Jason

Thanks for the review.

Matan


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr
       [not found]         ` <20150601165335.GB14391-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-02  7:29           ` Matan Barak
       [not found]             ` <CAAKD3BAyNpEKwM7xu=LzFo2F=qP=EN9ZteL2yyAgBVuPNmvtqA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Matan Barak @ 2015-06-02  7:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak

On Mon, Jun 1, 2015 at 7:53 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Sun, May 31, 2015 at 03:14:10PM +0300, Or Gerlitz wrote:
>
>> +     struct ib_cq_init_attr cq_attr;
>>
>>       /* Create new device info */
>>       port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL);
>> @@ -2943,9 +2944,11 @@ static int ib_mad_port_open(struct ib_device *device,
>>       if (has_smi)
>>               cq_size *= 2;
>>
>> +     memset(&cq_attr, 0, sizeof(cq_attr));
>> +     cq_attr.cqe = cq_size;
>
> Why does this patch switch to using memset when the prior patch used
> = {} ?
>

Why does it matter? Both are valid approaches, aren't they?

>> @@ -1075,12 +1075,11 @@ EXPORT_SYMBOL(ib_destroy_qp);
>>  struct ib_cq *ib_create_cq(struct ib_device *device,
>>                          ib_comp_handler comp_handler,
>>                          void (*event_handler)(struct ib_event *, void *),
>> -                        void *cq_context, int cqe, int comp_vector)
>> +                        void *cq_context, struct ib_cq_init_attr *cq_attr)
>>  {
>>       struct ib_cq *cq;
>> -     struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector};
>>
>> -     cq = device->create_cq(device, &attr, NULL, NULL);
>> +     cq = device->create_cq(device, cq_attr, NULL, NULL);
>
> How does this compile without warnings?
>

Do you mean that there's a missing const here? It doesn't (and shouldn't)
cause warnings.
However, ib_create_cq could be changed so that it takes a const cq_attr as well.

> The prior patch did:
>
> -       struct ib_cq *             (*create_cq)(struct ib_device *device, int cqe,
> -                                               int comp_vector,
> +       struct ib_cq *             (*create_cq)(struct ib_device *device,
> +                                               const struct ib_cq_init_attr *attr,
>                                                 struct ib_ucontext *context,
>                                                 struct ib_udata *udata);
>
> Otherwise looks OK.
>
> Jason

Thanks for the review.

Matan


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                 ` <20150601164322.GA14391-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-06-01 17:00                   ` Doug Ledford
@ 2015-06-02 14:35                   ` Doug Ledford
       [not found]                     ` <1433255724.114391.225.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 64+ messages in thread
From: Doug Ledford @ 2015-06-02 14:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon


On Mon, 2015-06-01 at 10:43 -0600, Jason Gunthorpe wrote:
> On Mon, Jun 01, 2015 at 07:25:04AM -0400, Doug Ledford wrote:
> 
> > attempted abstraction of ibverbs.  Passing in the wc struct allows the
> > driver to internally allocate a wc struct with extra private elements
> > and pass that back to the user, when the user passes it back to
> > ibv_get_timestamp the elements are there in the private portion of the
> > struct.
> 
> wc structures are allocated by the caller, there is no option for the
> driver to create private elements.

Well, they *are* using an extended work completion structure.  Unlike
what I mentioned, where they create a larger one themselves, you have to
allocate a struct ibv_wc_ex instead of a struct ibv_wc and then you have
to call poll_cq_ex, which expects a struct ibv_wc_ex.

So, just so everyone is clear on this point: the current user space
implementation of this feature creates an unversioned, newly named
ibv_wc_ex struct that is ibv_wc with a 64bit timestamp tacked on at the
end (not 64bit aligned either).  If we ever wanted to have a different
extension to our ibv_wc struct, there is no good way to do that.  If, at
some point, we had multiple extensions and the user was able to select
which they wanted to utilize, this structure extension is not flexible
enough to deal with that.  At a minimum, if we are going to have a one
shot extension to the wc struct like this, I would prefer to see it
called struct ibv_wc_timestamp and there be a ibv_poll_cq_timestamp.  At
least that way people would not use the generic _ex and assume this is
the one and only _ex that we will ever need for work completions.

Jason, when the XRC and flow steering extensions were added to
libibverbs, you complained loudly that they were not added in the agreed
upon format and cited a previous on list discussion.  Do you have a link
to that discussion?

> AFAIK, Christoph's use case is essentially the only meaningful use
> case for this feature, generalizing too much may destroy the
> performance that is valuable here.

There is actually room in a 64byte cacheline for two 64bit timestamps
and another 2 bytes of padding or something else.


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                     ` <1433255724.114391.225.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-06-02 14:44                       ` Or Gerlitz
       [not found]                         ` <556DC167.5070205-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-06-02 18:08                       ` Jason Gunthorpe
  1 sibling, 1 reply; 64+ messages in thread
From: Or Gerlitz @ 2015-06-02 14:44 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Jason Gunthorpe, Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon

On 6/2/2015 5:35 PM, Doug Ledford wrote:
> On Mon, 2015-06-01 at 10:43 -0600, Jason Gunthorpe wrote:
>> On Mon, Jun 01, 2015 at 07:25:04AM -0400, Doug Ledford wrote:
>>
>>> attempted abstraction of ibverbs.  Passing in the wc struct allows the
>>> driver to internally allocate a wc struct with extra private elements
>>> and pass that back to the user, when the user passes it back to
>>> ibv_get_timestamp the elements are there in the private portion of the
>>> struct.
>> wc structures are allocated by the caller, there is no option for the
>> driver to create private elements.
> Well, they *are* using an extended work completion structure.  Unlike
> what I mentioned, where they create a larger one themselves, you have to
> allocate a struct ibv_wc_ex instead of a struct ibv_wc and then you have
> to call poll_cq_ex, which expects a struct ibv_wc_ex.
>
> So, just so everyone is clear on this point: the current user space
> implementation of this feature creates an unversioned, newly named
> ibv_wc_ex struct that is ibv_wc with a 64bit timestamp tacked on at the
> end (not 64bit aligned either).  If we ever wanted to have a different
> extension to our ibv_wc struct, there is no good way to do that.  If, at
> some point, we had multiple extensions and the user was able to select
> which they wanted to utilize, this structure extension is not flexible
> enough to deal with that.  At a minimum, if we are going to have a one
> shot extension to the wc struct like this, I would prefer to see it
> called struct ibv_wc_timestamp and there be a ibv_poll_cq_timestamp.  At
> least that way people would not use the generic _ex and assume this is
> the one and only _ex that we will ever need for work completions.
>
> Jason, when the XRC and flow steering extensions were added to
> libibverbs, you complained loudly that they were not added in the agreed
> upon format and cited a previous on list discussion.  Do you have a link
> to that discussion?

Doug,

Do we agree that this part of the discussion (and also the point below)
relates to the libibverbs API exposed to applications, and not to the
kernel -> user API to support time-stamping?

Or.

>
>> AFAIK, Christoph's use case is essentially the only meaningful use
>> case for this feature, generalizing too much may destroy the
>> performance that is valuable here.
> There is actually room in a 64byte cacheline for two 64bit timestamps
> and another 2 bytes of padding or something else.
>
>



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                         ` <556DC167.5070205-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-06-02 14:53                           ` Doug Ledford
  0 siblings, 0 replies; 64+ messages in thread
From: Doug Ledford @ 2015-06-02 14:53 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jason Gunthorpe, Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon


On Tue, 2015-06-02 at 17:44 +0300, Or Gerlitz wrote:
> On 6/2/2015 5:35 PM, Doug Ledford wrote:
> > On Mon, 2015-06-01 at 10:43 -0600, Jason Gunthorpe wrote:
> >> On Mon, Jun 01, 2015 at 07:25:04AM -0400, Doug Ledford wrote:
> >>
> >>> attempted abstraction of ibverbs.  Passing in the wc struct allows the
> >>> driver to internally allocate a wc struct with extra private elements
> >>> and pass that back to the user, when the user passes it back to
> >>> ibv_get_timestamp the elements are there in the private portion of the
> >>> struct.
> >> wc structures are allocated by the caller, there is no option for the
> >> driver to create private elements.
> > Well, they *are* using an extended work completion structure.  Unlike
> > what I mentioned, where they create a larger one themselves, you have to
> > allocate a struct ibv_wc_ex instead of a struct ibv_wc and then you have
> > to call poll_cq_ex, which expects a struct ibv_wc_ex.
> >
> > So, just so everyone is clear on this point: the current user space
> > implementation of this feature creates an unversioned, newly named
> > ibv_wc_ex struct that is ibv_wc with a 64bit timestamp tacked on at the
> > end (not 64bit aligned either).  If we ever wanted to have a different
> > extension to our ibv_wc struct, there is no good way to do that.  If, at
> > some point, we had multiple extensions and the user was able to select
> > which they wanted to utilize, this structure extension is not flexible
> > enough to deal with that.  At a minimum, if we are going to have a one
> > shot extension to the wc struct like this, I would prefer to see it
> > called struct ibv_wc_timestamp and there be a ibv_poll_cq_timestamp.  At
> > least that way people would not use the generic _ex and assume this is
> > the one and only _ex that we will ever need for work completions.
> >
> > Jason, when the XRC and flow steering extensions were added to
> > libibverbs, you complained loudly that they were not added in the agreed
> > upon format and cited a previous on list discussion.  Do you have a link
> > to that discussion?
> 
> Doug,
> 
> Do we agree that this part of the discussion (and also the below point) 
> are related to the libibverbs API to applications and not to the kernel 
> -> user API to support time-stamping?

Yes.

> Or.
> 
> >
> >> AFAIK, Christoph's use case is essentially the only meaningful use
> >> case for this feature, generalizing too much may destroy the
> >> performance that is valuable here.
> > There is actually room in a 64byte cacheline for two 64bit timestamps
> > and another 2 bytes of padding or something else.
> >
> >
> 
> 


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs
       [not found]             ` <CAAKD3BBsTFYKwOBK+6u04x6deRYzjYESJzbpyy83UtoChZhpOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-06-02 17:07               ` Jason Gunthorpe
       [not found]                 ` <20150602170736.GA17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-02 17:07 UTC (permalink / raw)
  To: Matan Barak
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak

On Tue, Jun 02, 2015 at 10:21:47AM +0300, Matan Barak wrote:
> On Mon, Jun 1, 2015 at 7:56 PM, Jason Gunthorpe
> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> > On Sun, May 31, 2015 at 03:14:16PM +0300, Or Gerlitz wrote:
> >> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>
> >> Add support for ib_uverbs_ex_create_cq and ib_uverbs_ex_query_device
> >> by setting the appropriate bit in uverbs_ex_cmd_mask.
> >
> > Why is this a separate patch? Surely the bits should be or'd in the patches
> > that actually include the code to do the new commands?
> >
> 
> Although this patch can stand on its own (requesting unsupported flags
> will just fail and the comp_mask of uhw's will be disabled), there's no
> real value in not squashing it into the other mlx4 patches.

Then there is something wrong with this 'uverbs_ex_cmd_mask' stuff:
the bit should be clear if the driver cannot handle the ex inputs, but
these patches make all drivers handle the ex style (by checking
flags), so all drivers should have some of the bits set.

An extended command that doesn't use any extended features should
transparently degrade to the normal command as often as possible.

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr
       [not found]             ` <CAAKD3BAyNpEKwM7xu=LzFo2F=qP=EN9ZteL2yyAgBVuPNmvtqA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-06-02 17:27               ` Jason Gunthorpe
       [not found]                 ` <20150602172711.GB17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-02 17:27 UTC (permalink / raw)
  To: Matan Barak
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak

On Tue, Jun 02, 2015 at 10:29:14AM +0300, Matan Barak wrote:
> On Mon, Jun 1, 2015 at 7:53 PM, Jason Gunthorpe
> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> > On Sun, May 31, 2015 at 03:14:10PM +0300, Or Gerlitz wrote:
> >
> >> +     struct ib_cq_init_attr cq_attr;
> >>
> >>       /* Create new device info */
> >>       port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL);
> >> @@ -2943,9 +2944,11 @@ static int ib_mad_port_open(struct ib_device *device,
> >>       if (has_smi)
> >>               cq_size *= 2;
> >>
> >> +     memset(&cq_attr, 0, sizeof(cq_attr));
> >> +     cq_attr.cqe = cq_size;
> >
> > Why does this patch switch to using memset when the prior patch used
> > = {} ?
> >
> 
> Why does it matter? Both are valid approaches, aren't they?

Sure, but why mix and match techniques in the same code base? Is there
a reason?

> >> @@ -1075,12 +1075,11 @@ EXPORT_SYMBOL(ib_destroy_qp);
> >>  struct ib_cq *ib_create_cq(struct ib_device *device,
> >>                          ib_comp_handler comp_handler,
> >>                          void (*event_handler)(struct ib_event *, void *),
> >> -                        void *cq_context, int cqe, int comp_vector)
> >> +                        void *cq_context, struct ib_cq_init_attr *cq_attr)
> >>  {
> >>       struct ib_cq *cq;
> >> -     struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector};
> >>
> >> -     cq = device->create_cq(device, &attr, NULL, NULL);
> >> +     cq = device->create_cq(device, cq_attr, NULL, NULL);
> >
> > How does this compile without warnings?
> >
> 
> Do you mean that there's a missing const here? It doesn't (and shouldn't)
> cause warnings.

Warnings will be emitted when you compile with -Wcast-qual (make W=3).

Some of the static tools may warn/error on implicitly casting away
const, I'm not sure.

It is broadly undesirable to implicitly cast away const, even if the
Kernel warning defaults don't produce the message.
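
To make the two review points concrete, here is a small standalone sketch
(plain C, not the kernel code; struct cq_init_attr, provider_create_cq and
create_cq are hypothetical stand-ins for the structures and callbacks in the
patch). It shows that the designated-initializer and memset styles leave the
same member values, and that declaring the wrapper's parameter const, as the
provider callback already does, keeps the qualifiers consistent along the
call chain.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct ib_cq_init_attr; not the kernel definition. */
struct cq_init_attr {
	int cqe;
	int comp_vector;
	unsigned int flags;
};

/* Hypothetical provider callback taking a const pointer, as in patch 1/9. */
static int provider_create_cq(const struct cq_init_attr *attr)
{
	/* Returns a dummy value derived from the attributes, for illustration. */
	return attr->cqe + attr->comp_vector + (int)attr->flags;
}

/* Wrapper whose parameter is const as well, so qualifiers match end to end. */
static int create_cq(const struct cq_init_attr *attr)
{
	assert(attr != NULL);
	return provider_create_cq(attr);
}

/* Style A: designated initializer; members not named are zero-initialized. */
static struct cq_init_attr attr_by_initializer(int cq_size)
{
	struct cq_init_attr attr = { .cqe = cq_size };

	return attr;
}

/* Style B: memset followed by assignment, as in the mad.c hunk. */
static struct cq_init_attr attr_by_memset(int cq_size)
{
	struct cq_init_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.cqe = cq_size;
	return attr;
}
```

Both helpers produce identical member values. One difference worth keeping
in mind is that memset also clears any padding bytes, which an initializer
is not guaranteed to do; that only matters if the struct is compared or
copied bytewise.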

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                     ` <1433255724.114391.225.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-06-02 14:44                       ` Or Gerlitz
@ 2015-06-02 18:08                       ` Jason Gunthorpe
       [not found]                         ` <20150602180844.GD17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-02 18:08 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon

On Tue, Jun 02, 2015 at 10:35:24AM -0400, Doug Ledford wrote:

> So, just so everyone is clear on this point: the current user space
> implementation of this feature creates an unversioned, newly named
> ibv_wc_ex struct that is ibv_wc with a 64bit timestamp tacked on at the
> end (not 64bit aligned either).  If we ever wanted to have a different
> extension to our ibv_wc struct, there is no good way to do that.

No, if they followed (I didn't check yet) the extension scheme then the
poll call is

 struct ibv_wc_ex wcs[num_wcs];
 ibv_poll_wc_ex(wcs, num_wcs, sizeof(wcs[0]));

And the drivers decide what to do based on the 3rd argument, which is
essentially the ABI version.
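
A rough sketch of how such a size-as-version scheme can work, under stated
assumptions: struct wc_v1, struct wc_v2 and poll_wc_sized are hypothetical
names invented for illustration, not the libibverbs API, and the completion
values written are made-up sample data. The provider fills only as many
bytes per entry as the caller declared, so an old caller with the shorter
struct never sees the appended timestamp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical original completion layout. */
struct wc_v1 {
	uint64_t wr_id;
	uint32_t status;
	uint32_t byte_len;
};

/* Hypothetical extended layout: v1 fields first, new field appended. */
struct wc_v2 {
	uint64_t wr_id;
	uint32_t status;
	uint32_t byte_len;
	uint64_t timestamp;
};

/*
 * Fill num entries of wc_size bytes each. wc_size plays the role of the
 * ABI version: only the fields that fit in the caller's entry are written.
 * Returns the number of entries written, or -1 if wc_size is too small
 * to hold even the original layout.
 */
static int poll_wc_sized(void *wcs, int num, size_t wc_size)
{
	unsigned char *p = wcs;
	int i;

	assert(wcs != NULL);
	if (wc_size < sizeof(struct wc_v1))
		return -1;

	for (i = 0; i < num; i++, p += wc_size) {
		/* Sample completion data, made up for this sketch. */
		struct wc_v2 full = {
			.wr_id = (uint64_t)i,
			.status = 0,
			.byte_len = 64,
			.timestamp = 1000u + (uint64_t)i,
		};
		size_t n = wc_size < sizeof(full) ? wc_size : sizeof(full);

		/* v1 is a prefix of v2, so copying a prefix is a valid v1 entry. */
		memcpy(p, &full, n);
	}
	return num;
}
```

The cost of this design is the one raised in the reply: each new field can
only be appended after the previous ones, so extensions are ordered rather
than independently selectable.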

> Jason, when the XRC and flow steering extensions were added to
> libibverbs, you complained loudly that they were not added in the agreed
> upon format and cited a previous on list discussion.  Do you have a link
> to that discussion?

Not off hand, but that was different, that was a misuse of comp_mask,
IIRC.

Or, the question in my mind based on looking at the UAPI patches is
what things should be driver private and what should be general.

Broadly my thoughts:
 - Should the frequency and mask be general, or driver private? If the
   cycles->ns conversion is a function they should be driver private.
   Even if they are general at libibverbs, they don't *have* to be in
   the kernel's general query response.
 - Should frequency even be frequency? Most clocks are expressed
   accurately as a period in picoseconds. Frequency is more often
   imprecise (e.g. Ethernet is 3200 ps or 312.5 MHz).
   However, FDR/EDR is fractional for both (4693.33333333 ps vs
   213.0681818181818 MHz).
   Precision is very important for time conversions, so a
   multiply-divide scheme would be ideal.
   This suggests to me that these details really are not
   general.
 - There should be a much better definition of what all this stuff is:
   units for frequency? When is the timestamp applied?
 - Should an app even be exposed to mask? This is very difficult
   to use correctly in the general case. Only cases where an app is
   restarted more often than a wrap period are trivial to use properly.
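
As a sketch of the multiply-divide idea from the list above (the helper
names cycles_to_ns and cycles_delta are hypothetical, and the 48-bit mask in
the usage note is an assumption for illustration, not a statement about any
particular HCA): the period is expressed as an exact ratio mult/div in
nanoseconds per cycle, e.g. 16/5 for the 3200 ps Ethernet clock and 352/75
for the 4693.333... ps FDR/EDR clock, so no precision is lost to a rounded
frequency.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ns = cycles * mult / div, rounded down. Splitting cycles into quotient
 * and remainder keeps the intermediate products within 64 bits whenever
 * the final result itself fits.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult, uint32_t div)
{
	uint64_t q, r;

	assert(div != 0);
	q = cycles / div;
	r = cycles % div;
	return q * mult + (r * mult) / div;
}

/*
 * Cycles elapsed between two raw reads of a counter that is only as wide
 * as mask. Correct only if fewer than one full wrap happened in between,
 * which is the difficulty with exposing the mask to applications.
 */
static uint64_t cycles_delta(uint64_t earlier, uint64_t later, uint64_t mask)
{
	return (later - earlier) & mask;
}
```

For example, with an assumed 48-bit counter, a read of 0xFFFFFFFFFFF0
followed by a read of 0x10 gives a delta of 0x20 cycles, and 75 cycles of
the FDR/EDR clock convert to exactly 352 ns.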

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                         ` <20150602180844.GD17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-02 18:51                           ` Doug Ledford
       [not found]                             ` <1433271083.40123.1.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-06-03 19:48                           ` Or Gerlitz
  2015-06-04 16:17                           ` Hefty, Sean
  2 siblings, 1 reply; 64+ messages in thread
From: Doug Ledford @ 2015-06-02 18:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon

On Tue, 2015-06-02 at 12:08 -0600, Jason Gunthorpe wrote:
> On Tue, Jun 02, 2015 at 10:35:24AM -0400, Doug Ledford wrote:
> 
> > So, just so everyone is clear on this point: the current user space
> > implementation of this feature creates an unversioned, newly named
> > ibv_wc_ex struct that is ibv_wc with a 64bit timestamp tacked on at the
> > end (not 64bit aligned either).  If we ever wanted to have a different
> > extension to our ibv_wc struct, there is no good way to do that.
> 
> No, if they followed (I didn't check yes) the extension scheme then the
> poll call is
> 
>  struct ibv_wc_ex wcs[num_wcs]
>  ibv_poll_wc_ex(&wcs,num_wcs,sizeof(wcs[0]));
> 
> And the drivers decide what to do based on the 3rd argument, which is
> essentially the ABI version.

Ick.  OK.  I would *much* prefer something done akin to the routines in
packer.c of the kernel, but that's not my call to make, the decision on
the ABI/API extension mechanism was made long ago.  It does, however,
mean that extensions are serial and not modular, and that's a shame.

> > Jason, when the XRC and flow steering extensions were added to
> > libibverbs, you complained loudly that they were not added in the agreed
> > upon format and cited a previous on list discussion.  Do you have a link
> > to that discussion?
> 
> Not off hand, but that was different, that was a misuse of comp_mask,
> IIRC.
> 
> Or, the question in my mind based on looking at the UAPI patches is
> what things should be driver private and what should be general.
> 
> Broadly my thoughts:
>  - Should the frequency and mask be general, or driver private? If the
>    cycles->ns conversion is a function they should be driver private.
>    Even if they are general at libibverbs, they don't *have* to be in
>    the kernel's general query response.
>  - Should frequency even be frequency? Most clocks are expressed
>    accurately as a period in picoseconds. Frequency is more often
>    imprecise. (eg ethernet is 3200 ps or 312.5MHz)
>    However FDR/EDR is fractional for both (4693.33333333 ps vs
>    213.0681818181818 MHz)
>    Precision is very important for time conversions, so a
>    multiply-divide scheme would be ideal.
>    This is suggesting to me these details really are not
>    general.
>  - There should be much better definition on what all this stuff is,
>    units for frequency? When is the timestamp applied?
>  - Should an app even be exposed to mask? This is very difficult
>    to use correctly in the general case. Only cases where an app is
>    restarted more often than a wrap period are trivial to use properly.
> 
> Jason


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD


* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                             ` <1433271083.40123.1.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-06-02 19:04                               ` Jason Gunthorpe
       [not found]                                 ` <20150602190410.GA23362-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-02 19:04 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon

On Tue, Jun 02, 2015 at 02:51:23PM -0400, Doug Ledford wrote:
> On Tue, 2015-06-02 at 12:08 -0600, Jason Gunthorpe wrote:
> > On Tue, Jun 02, 2015 at 10:35:24AM -0400, Doug Ledford wrote:
> > 
> > > So, just so everyone is clear on this point: the current user space
> > > implementation of this feature creates an unversioned, newly named
> > > ibv_wc_ex struct that is ibv_wc with a 64bit timestamp tacked on at the
> > > end (not 64bit aligned either).  If we ever wanted to have a different
> > > extension to our ibv_wc struct, there is no good way to do that.
> > 
> > No, if they followed (I didn't check yes) the extension scheme then the
> > poll call is
> > 
> >  struct ibv_wc_ex wcs[num_wcs]
> >  ibv_poll_wc_ex(&wcs,num_wcs,sizeof(wcs[0]));
> > 
> > And the drivers decide what to do based on the 3rd argument, which is
> > essentially the ABI version.
> 
> Ick.  OK.  I would *much* prefer something done akin to the routines in
> packer.c of the kernel, but that's not my call to make, the decision on
> the ABI/API extension mechanism was made long ago.  It does, however,
> mean that extensions are serial and not modular, and that's a shame.

All verbs extensions are essentially serial, each extension requires a
fixed allocation of structure bytes, made by upstream.

This is also why no vendor may ship an extension that is not upstream
and continue to use the same soname as upstream. Similarly for the
kernel.

This is fairly performance neutral, while a packer.c scheme would be
unacceptably expensive, IMHO. poll_wc is one of the most performance
sensitive routines in the library.

Jason

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                 ` <20150602190410.GA23362-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-02 19:56                                   ` Doug Ledford
  0 siblings, 0 replies; 64+ messages in thread
From: Doug Ledford @ 2015-06-02 19:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon

On Tue, 2015-06-02 at 13:04 -0600, Jason Gunthorpe wrote:
> On Tue, Jun 02, 2015 at 02:51:23PM -0400, Doug Ledford wrote:
> > On Tue, 2015-06-02 at 12:08 -0600, Jason Gunthorpe wrote:
> > > On Tue, Jun 02, 2015 at 10:35:24AM -0400, Doug Ledford wrote:
> > > 
> > > > So, just so everyone is clear on this point: the current user space
> > > > implementation of this feature creates an unversioned, newly named
> > > > ibv_wc_ex struct that is ibv_wc with a 64bit timestamp tacked on at the
> > > > end (not 64bit aligned either).  If we ever wanted to have a different
> > > > extension to our ibv_wc struct, there is no good way to do that.
> > > 
> > > No, if they followed (I didn't check yes) the extension scheme then the
> > > poll call is
> > > 
> > >  struct ibv_wc_ex wcs[num_wcs]
> > >  ibv_poll_wc_ex(&wcs,num_wcs,sizeof(wcs[0]));
> > > 
> > > And the drivers decide what to do based on the 3rd argument, which is
> > > essentially the ABI version.
> > 
> > Ick.  OK.  I would *much* prefer something done akin to the routines in
> > packer.c of the kernel, but that's not my call to make, the decision on
> > the ABI/API extension mechanism was made long ago.  It does, however,
> > mean that extensions are serial and not modular, and that's a shame.
> 
> All verbs extensions are essentially serial, each extension requires a
> fixed allocation of structure bytes, made by upstream.
> 
> This is also why no vendor may ship an extension that is not upstream
> and continue to use the same soname as upstream. Similarly for the
> kernel.
> 
> This is fairly performance neutral, while a packer.c scheme would be
> unacceptably expensive, IMHO. poll_wc is one of the most performance
> sensitive routines in the library.

I disagree.  Obviously I haven't run them in a tight loop to confirm,
but I looked at mthca, mlx4, and cxgb4 user libraries, and all of them
have complex *_poll_one routines that convert their internal cqe's to
wc's.  The packer routines aren't any more complex or any slower (at
least not necessarily, it all depends on the particular transformation
needed).  The packer routines are just hard to read.

And, as Christoph pointed out, we can keep our wc in a single cache line
right now.  However, we only need a few extensions to blow that out of
the water.  If some extension comes along that gets allocated past the
64byte cacheline size, and that extension is used far more frequently
than say this timestamp, then we will have forced a cache line break on
a frequently used item for a less frequently used item.  So, there would
be benefits to a modular approach in terms of allowing the user to
select what items they want and to keep their important items in that
single cache line.
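
To make the cache line concern concrete, here is a small sketch. The
structs are hypothetical: wc_base is only loosely modelled on ibv_wc
(48 bytes on common ABIs), and ext_b/ext_c are invented placeholder
extensions. It also assumes each entry starts on a 64-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 64

/* Hypothetical 48-byte base completion, loosely modelled on ibv_wc. */
struct wc_base {
	uint64_t wr_id;
	uint32_t status, opcode, vendor_err, byte_len;
	uint32_t imm_data, qp_num, src_qp, wc_flags;
	uint16_t pkey_index, slid;
	uint8_t sl, dlid_path_bits;
};

/* Serial scheme: every extension gets a fixed slot in upstream order. */
struct wc_serial {
	struct wc_base base;
	uint64_t timestamp;	/* first extension */
	uint64_t ext_b;		/* hypothetical later extension */
	uint64_t ext_c;		/* hypothetical, lands past byte 64 */
};

/* Modular scheme: the consumer picks which extensions follow the base. */
struct wc_only_c {
	struct wc_base base;
	uint64_t ext_c;
};

/* Nonzero if a field at [offset, offset + size) is in the first line. */
int in_first_cacheline(size_t offset, size_t size)
{
	assert(size > 0);
	return offset + size <= CACHELINE_BYTES;
}
```

With the serial layout ext_c always starts at byte 64, however rarely
the earlier extensions are used. In the modular layout a consumer that
only wants ext_c keeps it at byte 48, inside the first line.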

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD


* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs
       [not found]                 ` <20150602170736.GA17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-03  8:57                   ` Matan Barak
       [not found]                     ` <CAAKD3BARPocK1zZ2nYV0xWHqEASJTYbp1aQxwymi-TYTyK_vGA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Matan Barak @ 2015-06-03  8:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak

On Tue, Jun 2, 2015 at 8:07 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Tue, Jun 02, 2015 at 10:21:47AM +0300, Matan Barak wrote:
>> On Mon, Jun 1, 2015 at 7:56 PM, Jason Gunthorpe
>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>> > On Sun, May 31, 2015 at 03:14:16PM +0300, Or Gerlitz wrote:
>> >> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> >>
>> >> Add support for ib_uverbs_ex_create_cq and ib_uverbs_ex_query_device
>> >> by setting the appropriate bit in uverbs_ex_cmd_mask.
>> >
>> > Why is this a separate patch? Surely the bits should be or'd in the patches
>> > that actually include the code to do the new commands?
>> >
>>
>> Although this patch can stand on its own (requesting unsupported flags
>> will just fail and the
>> comp_mask of uhw's will be disabled), there's no real value for not
>> squashing it into the other mlx4 patches.
>
> Then there is something wrong with this 'uverbs_ex_cmd_mask' stuff -
> the bit should be clear if the driver cannot handle the ex inputs, but
> these patches make all drivers handle the ex style (by checking
> flags), so all drivers should have some of the bits set..
>

That's a general comment regarding the extension mechanism.
Since by nature the extended verbs are extensible, one consumer could support
A and B while the other only supports A, but they both indicate they support
this extended verb.
You could argue that if the "flags" field wasn't tested, we would need this
uverbs_ex_cmd_mask - but because it could also be used by kernel consumers,
this check is necessary.

> An extended command that doesn't use any extended features should
> transparently degrade to the normal command as often as possible.
>

That means that uverbs_ex_cmd_mask should only be used on extended commands that
are user-space specific.
Anyway, we could add these IB_USER_VERBS_EX_CMD flags to all vendors,
but IMHO this general problem doesn't relate to this series,
which is only about adding timestamp support.

> Jason

Thanks for your comments.

Matan

* Re: [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr
       [not found]                 ` <20150602172711.GB17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-03  9:04                   ` Matan Barak
       [not found]                     ` <CAAKD3BA7ZD4Gx5c16Ak+0_64rzazw+7n4frkuKHyrvdT=8ac6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Matan Barak @ 2015-06-03  9:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak

On Tue, Jun 2, 2015 at 8:27 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Tue, Jun 02, 2015 at 10:29:14AM +0300, Matan Barak wrote:
>> On Mon, Jun 1, 2015 at 7:53 PM, Jason Gunthorpe
>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>> > On Sun, May 31, 2015 at 03:14:10PM +0300, Or Gerlitz wrote:
>> >
>> >> +     struct ib_cq_init_attr cq_attr;
>> >>
>> >>       /* Create new device info */
>> >>       port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL);
>> >> @@ -2943,9 +2944,11 @@ static int ib_mad_port_open(struct ib_device *device,
>> >>       if (has_smi)
>> >>               cq_size *= 2;
>> >>
>> >> +     memset(&cq_attr, 0, sizeof(cq_attr));
>> >> +     cq_attr.cqe = cq_size;
>> >
>> > Why does this patch switch to using memset when the prior patch used
>> > = {} ?
>> >
>>
>> Why does it matter? Both are valid approaches, aren't they?
>
> Sure, but why mix and match techniques in the same code base? Is there
> a reason?
>

No reason. We'll change that to {} style.

>> >> @@ -1075,12 +1075,11 @@ EXPORT_SYMBOL(ib_destroy_qp);
>> >>  struct ib_cq *ib_create_cq(struct ib_device *device,
>> >>                          ib_comp_handler comp_handler,
>> >>                          void (*event_handler)(struct ib_event *, void *),
>> >> -                        void *cq_context, int cqe, int comp_vector)
>> >> +                        void *cq_context, struct ib_cq_init_attr *cq_attr)
>> >>  {
>> >>       struct ib_cq *cq;
>> >> -     struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector};
>> >>
>> >> -     cq = device->create_cq(device, &attr, NULL, NULL);
>> >> +     cq = device->create_cq(device, cq_attr, NULL, NULL);
>> >
>> > How does this compile without warnings?
>> >
>>
>> Do you mean that there's a missing const here? It doesn't (and shouldn't)
>> cause warnings.
>
> Warnings will be emitted when you compile with -Wcast-qual (make W=3).
>
> Some of the static tools may warn/error on implicitly casting away
> const, I'm not sure.
>
> It is broadly undesirable to implicitly cast away const, even if the
> Kernel warning defaults don't produce the message.
>

Why is it casting away the const?
ib_create_cq gets a "struct ib_cq_init_attr *" and passes it to
device->create_cq which gets a "const struct ib_cq_init_attr *",
so I'm adding a const, which is perfectly fine.
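
A tiny stand-alone illustration of the same point, with hypothetical
names (cq_attr, core_create_cq, driver_create_cq) in place of the real
kernel ones. Passing a non-const pointer to a parameter declared const
only adds a qualifier, so it is an ordinary implicit conversion.

```c
#include <assert.h>

struct cq_attr {
	int cqe;
	int comp_vector;
};

/* Callee promises not to modify the attributes it is given. */
int driver_create_cq(const struct cq_attr *attr)
{
	assert(attr);
	return attr->cqe;
}

/*
 * Caller holds a non-const pointer.  Handing it to a function that
 * takes a const pointer adds const; it does not cast anything away.
 */
int core_create_cq(struct cq_attr *attr)
{
	return driver_create_cq(attr);
}
```

The problematic direction is the reverse one: taking a const pointer and
passing it to a callee that expects a non-const pointer.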

Anyway, we'll add const to ib_create_cq as well and run make W=3 to verify.

> Jason

Thanks for your comments.

Matan

* Re: [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr
       [not found]                     ` <CAAKD3BA7ZD4Gx5c16Ak+0_64rzazw+7n4frkuKHyrvdT=8ac6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-06-03 16:02                       ` Jason Gunthorpe
  0 siblings, 0 replies; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-03 16:02 UTC (permalink / raw)
  To: Matan Barak
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak

On Wed, Jun 03, 2015 at 12:04:23PM +0300, Matan Barak wrote:
> Why is it casting away the const?
> ib_create_cq gets a "struct ib_cq_init_attr *" and passes it to
> device->create_cq which gets a "const struct ib_cq_init_attr *",
> so I'm adding a const, which is perfectly fine.

Right, never mind, I read the patch upside down..

Jason

* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs
       [not found]                     ` <CAAKD3BARPocK1zZ2nYV0xWHqEASJTYbp1aQxwymi-TYTyK_vGA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-06-03 16:31                       ` Jason Gunthorpe
       [not found]                         ` <20150603163146.GD12073-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-03 16:31 UTC (permalink / raw)
  To: Matan Barak
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak

On Wed, Jun 03, 2015 at 11:57:12AM +0300, Matan Barak wrote:
> That's a general comment regarding the extension mechanism.

Yes, but it is also a specific comment about patch #4, which adds
ib_uverbs_ex_create_cq.

Based on the implementation of create_cq, it is pretty clear that
every driver supports ib_uverbs_ex_create_cq, so patch #4 should just
force the flag in the device register function so it is globally
enabled.

query_device looks like it is the same, passing in the original
structure length will always work on any device. So Mellanox should
send a bugfix patch for that as well, unrelated to this series.

> > An extended command that doesn't use any extended features should
> > transparently degrade to the normal command as often as possible.
> 
> That means that uverbs_ex_cmd_mask should only be used on extended commands that
> are user-space specific.

The bit should only be clear on commands that can never return
anything but ENOSYS. I.e. the kernel has no support for the command at
all, or for some reason the driver cannot handle the call.

The latter case should be rare, it would be the case if we added a new
command that was not extending an existing command, for instance.

Jason

* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs
       [not found]                         ` <20150603163146.GD12073-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-03 18:58                           ` Or Gerlitz
       [not found]                             ` <CAJ3xEMhGdX-35WAjauKZE7+Wg39kbXP35UzYX33J6OB8g_9rYA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Or Gerlitz @ 2015-06-03 18:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, Or Gerlitz, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak

On Wed, Jun 3, 2015 at 7:31 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Wed, Jun 03, 2015 at 11:57:12AM +0300, Matan Barak wrote:
>> That's a general comment regarding the extension mechanism.
>
> Yes, but it is also a specific comment about patch #4 which adds,
> ib_uverbs_ex_create_cq.
>
> Based on the implementation of create_cq, it is pretty clear that
> every driver supports ib_uverbs_ex_create_cq, so patch #4 should just
> force the flag in the device register function so it is globally enabled.

But the other drivers currently do not support any CQ creation flag
and hence no extended functionality. I don't see the point of signaling
to user-space that the verb is supported; please elaborate.

* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs
       [not found]                             ` <CAJ3xEMhGdX-35WAjauKZE7+Wg39kbXP35UzYX33J6OB8g_9rYA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-06-03 19:16                               ` Jason Gunthorpe
       [not found]                                 ` <20150603191609.GB7902-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-03 19:16 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Matan Barak, Or Gerlitz, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak

On Wed, Jun 03, 2015 at 09:58:25PM +0300, Or Gerlitz wrote:
> On Wed, Jun 3, 2015 at 7:31 PM, Jason Gunthorpe
> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> > On Wed, Jun 03, 2015 at 11:57:12AM +0300, Matan Barak wrote:
> >> That's a general comment regarding the extension mechanism.
> >
> > Yes, but it is also a specific comment about patch #4 which adds,
> > ib_uverbs_ex_create_cq.
> >
> > Based on the implementation of create_cq, it is pretty clear that
> > every driver supports ib_uverbs_ex_create_cq, so patch #4 should just
> > force the flag in the device register function so it is globally enabled.
> 
> But the other drivers currently do not support any CQ creation flag
> and hence no extended functionality, I don't see the point signaling
> towards user-space that the verb is supported, please elaborate.

They support the base functionality, the flags = 0 case.

There is no reason to block access to the base functionality via the
extended api. That just creates hassles for userspace.

If userspace detects the extended API is present, it can just
switch unconditionally all usage to that API. This is how most new
kernel syscalls are introduced (glibc does this transparently).
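
As a sketch of what that looks like from the userspace side, assuming
made-up function names: the three "entry points" below are stand-ins
written for this example, not real uverbs calls. The wrapper always
tries the extended call and falls back only on ENOSYS.

```c
#include <assert.h>
#include <errno.h>

typedef int (*create_cq_ex_fn)(int cqe, unsigned int flags);
typedef int (*create_cq_fn)(int cqe);

/* Stand-ins for kernel entry points; all three are hypothetical. */
int legacy_create_cq(int cqe)
{
	return cqe > 0 ? 0 : -EINVAL;
}

int ex_create_cq_absent(int cqe, unsigned int flags)
{
	(void)cqe;
	(void)flags;
	return -ENOSYS;		/* the command does not exist at all */
}

int ex_create_cq_present(int cqe, unsigned int flags)
{
	if (flags)
		return -EINVAL;	/* command exists, this flag does not */
	return legacy_create_cq(cqe);
}

/*
 * Always try the extended call first.  Fall back only on -ENOSYS, and
 * only when the request can be expressed through the legacy call.
 */
int create_cq_compat(create_cq_ex_fn ex, create_cq_fn legacy,
		     int cqe, unsigned int flags)
{
	int ret;

	assert(ex && legacy);
	ret = ex(cqe, flags);
	if (ret != -ENOSYS)
		return ret;
	return flags ? -ENOSYS : legacy(cqe);
}
```

If a driver returns ENOSYS for flags = 0 even though it could serve the
request, every caller is pushed through the fallback path for no reason.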

Detecting what flags a driver supports (if any) is any entirely
different and orthogonal issue to introducing comp_mask/etc.

Jason

* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs
       [not found]                                 ` <20150603191609.GB7902-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-03 19:35                                   ` Or Gerlitz
       [not found]                                     ` <CAJ3xEMiPkxf_Qu40AozMRWZaY-dAfEyAOZGyT9VYsNpQO1Q-Zw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Or Gerlitz @ 2015-06-03 19:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, Or Gerlitz, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak

On Wed, Jun 3, 2015 at 10:16 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Wed, Jun 03, 2015 at 09:58:25PM +0300, Or Gerlitz wrote:
>> On Wed, Jun 3, 2015 at 7:31 PM, Jason Gunthorpe
>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>> > On Wed, Jun 03, 2015 at 11:57:12AM +0300, Matan Barak wrote:
>> >> That's a general comment regarding the extension mechanism.
>> >
>> > Yes, but it is also a specific comment about patch #4 which adds,
>> > ib_uverbs_ex_create_cq.
>> >
>> > Based on the implementation of create_cq, it is pretty clear that
>> > every driver supports ib_uverbs_ex_create_cq, so patch #4 should just
>> > force the flag in the device register function so it is globally enabled.

>> But the other drivers currently do not support any CQ creation flag
>> and hence no extended functionality, I don't see the point signaling
>> towards user-space that the verb is supported, please elaborate.

> They support the base functionality, the flags = 0 case.

Which doesn't let consumers use any new functionality.

> There is no reason to block access to the base functionality via the
> extended api. That just creates hassles for userspace.
> If userspace detects the extended API is present, it can just
> switch unconditionally all usage to that API.

This is a user-space run-time story; they don't know that all the LL
drivers support the extended API for CQ creation. We had to check the
flag in all LL drivers since the in-kernel IB stack has no (and need
not have any) notion of extended calls.

> This is how most new kernel syscalls are introduced (glibc
> does this transparently).

That's an interesting comment. And you know what, basically we can add
auto support for that call in uverbs.

But the point here is a bit different: I somehow have the feeling that
unless ~each and every one of your review comments is accepted to the
letter, no inclusion.

You are not the maintainer here, and even maintainers prefer not to
force each of their detailed comments on submitters.

This specific comment relates to a TINY in-kernel thing that can be changed later.

If from ten comments you give me I accept five as is, with the other
five I am trying to argue, on two of them we agree to my side, on two
we go your side and on the last one we let the maintainer make the cut,
this is a healthy process that makes sense.

Currently it feels like either accepting 98% of the comments you
give or no acceptance.

> Detecting what flags a driver supports (if any) is any entirely
> different and orthogonal issue to introducing comp_mask/etc.

I didn't say that detecting which flags are supported relates to
exposing that extended uverbs call. I don't understand the "is any
entirely different" part of the sentence, is that because of me being
EMS-er?

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                         ` <20150602180844.GD17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-06-02 18:51                           ` Doug Ledford
@ 2015-06-03 19:48                           ` Or Gerlitz
       [not found]                             ` <CAJ3xEMjZqwJC8=HrMBv_eSVKVtLW7qjVZNjtuD28octGPtYu6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2015-06-04 16:17                           ` Hefty, Sean
  2 siblings, 1 reply; 64+ messages in thread
From: Or Gerlitz @ 2015-06-03 19:48 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Lameter
  Cc: Doug Ledford, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Tue, Jun 2, 2015 at 9:08 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:

> Or, the question in my mind based on looking at the UAPI patches is
> what things should be driver private and what should be general.
>
> Broadly my thoughts:
>  - Should the frequency and mask be general, or driver private? If the
>    cycles->ns conversion is a function they should be driver private.
>    Even if they are general at libibverbs, they don't *have* to be in
>    the kernel's general query response.

If they are general in libibverbs, what's the point of not putting them
in the kernel's general query response?

>  - Should frequency even be frequency? Most clocks are expressed
>    accurately as a period in picoseconds. Frequency is more often
>    imprecise. (eg ethernet is 3200 ps or 312.5MHz)
>    However FDR/EDR is fractional for both (4693.33333333 ps vs
>    213.0681818181818 MHz)
>    Precision is very important for time conversions, so a
>    multiply-divide scheme would be ideal.

From Christoph's response I got the impression that our proposal of
exposing frequency and mask combined with raw time stamps excellently
fits typical user needs, so I thought we're good. Doug made a comment
that things look OK to him and the rest of the work would be when we
come to review the user-space patches.

>    This is suggesting to me these details really are not general.
>  - There should be much better definition on what all this stuff is,
>    units for frequency? When is the timestamp applied?

The timestamp is applied when the WC is generated, as Doug asked, we
changed the flag name to reflect that. I guess that the units for
frequency are MHz but I will check that and we can document it in the
kernel IB core patch and later in man pages.

>  - Should an app even be exposed to mask? This is very difficult
>    to use correctly in the general case. Only cases where an app is
>    restarted more often than a wrap period are trivial to use properly.

* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs
       [not found]                                     ` <CAJ3xEMiPkxf_Qu40AozMRWZaY-dAfEyAOZGyT9VYsNpQO1Q-Zw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-06-03 20:38                                       ` Jason Gunthorpe
  0 siblings, 0 replies; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-03 20:38 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Matan Barak, Or Gerlitz, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak

On Wed, Jun 03, 2015 at 10:35:03PM +0300, Or Gerlitz wrote:

> > They support the base functionality, the flags = 0 case.
> 
> which doesn't let consumers to use any new functionality.

So what? A call with flags = 0 works, why return ENOSYS for all
drivers except mlx4 in that case? It doesn't make sense to be
asymmetric like that.

Again, the extension process (patch #4) was to introduce the flags, as
long as the flags are processed properly then the syscall is
functional and should not return ENOSYS. It does not matter which
flags, if any, are supported.
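
Roughly, the behaviour being argued for looks like the sketch below. The
struct, the flag bit and the function are hypothetical stand-ins rather
than code from the patches, and EINVAL for an unsupported flag is an
assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define CQ_FLAG_TIMESTAMP (1u << 0)	/* hypothetical flag bit */

struct fake_driver {
	uint32_t supported_cq_flags;	/* 0 for a driver with no extensions */
};

/*
 * Extended create_cq as every driver could expose it: flags == 0 always
 * works, and an unsupported flag is rejected without claiming that the
 * command itself is missing.
 */
int ex_create_cq(const struct fake_driver *drv, uint32_t flags, int cqe)
{
	assert(drv);
	if (cqe <= 0)
		return -EINVAL;
	if (flags & ~drv->supported_cq_flags)
		return -EINVAL;	/* assumption: EINVAL, never ENOSYS */
	return 0;
}
```

A driver with supported_cq_flags == 0 still accepts the call, so the
result is the same on every driver when no flags are requested.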

> But the point here is a bit different: I somehow have the feeling that
> unless ~each and every one of your review comments are accepted to the
> letter, no inclusion.

I am just reviewing, Doug will have to decide if discussion is done or
not. To be clear: 'no inclusion' from me would be a clear NAK
statement.

If I'm going to provide my Reviewed-By I want to see:
 1) Comments addressed via a code change
 2) Comments addressed via a persuasive technical argument
 3) Comments addressed as 'too much work'/'un-important'/'personal preference'/etc.
 4) Comments addressed because I am wrong

And try to be clear about it, explain clearly.

> You are not the maintainer here, and even maintainers prefer not to
> force each of their detailed comments on submitters.

This isn't a detailed comment, this is a significant point about how a
UAPI is expected to work. And yes, UAPI is important, details are
important and I will argue for my viewpoint.

There is a huge difference between doing work on your own drivers and
doing core work. I do not know many cases where a maintainer/reviewer
of core sections will let details slide. There is a high expectation
for core code, and a very high expectation for UAPI.

> This specific comment relates to a TINY in-kernel thing that can be
> changed later.

Where is the pride? Do it right!

> If from ten comments you give me I accept as is five, with the other
> five I am trying to argue, on two of them we agree to my side, on two
> we go your side and on the last one we let the maintainer to cut, this
> is a healthy process that makes sense.

Sure, but you have to make a persuasive technical argument.. You can't
just argue..

In this case, you completely skipped over my main point:
 Drivers that only support flags == 0 should not return ENOSYS.

I gave several reasons why I think this is important, and how
userspace can use this, and how it is normal in the kernel.

You responded to the reasons, but ignored the actual thesis, and
didn't provide any counter reasons to support your idea:
 Drivers that only support flags == 0 should return ENOSYS.

So we are not debating, we are just arguing, and it isn't productive.

> I don't understand the "is any entirely different" part of the
> sentence, is that as of me being EMS-er?

No, that is just me typoing 'an -> any'. Sorry

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                             ` <CAJ3xEMjZqwJC8=HrMBv_eSVKVtLW7qjVZNjtuD28octGPtYu6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-06-03 20:46                               ` Jason Gunthorpe
       [not found]                                 ` <20150603204633.GE7902-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-03 20:46 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Christoph Lameter, Doug Ledford, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Wed, Jun 03, 2015 at 10:48:28PM +0300, Or Gerlitz wrote:

> >  - Should the frequency and mask be general, or driver private? If the
> >    cycles->ns conversion is a function they should be driver private.
> >    Even if they are general at libibverbs, they don't *have* to be in
> >    the kernel's general query response.
> 
> If they are general in libibverbs, what's the point not to put them in
> the kernel's general query response?

If there is a timestamp_to_ns API then they would not be general in
libibverbs either.

> >  - Should frequency even be frequency? Most clocks are expressed
> >    accurately as a period in picoseconds. Frequency is more often
> >    imprecise. (eg ethernet is 3200 ps or 312.5MHz)
> >    However FDR/EDR is fractional for both (4693.33333333 ps vs
> >    213.0681818181818 MHz)
> >    Precision is very important for time conversions, so a
> >    multiply-divide scheme would be ideal.
> 
> From Christoph's response I got the impression that our proposal of
> exposing frequency and mask combined with raw time stamps excellently
> fits typical user needs, so I thought we're good. Doug made a comment
> that things look OK to him and the rest of the work would be when we
> come to review the user-space patches.

This response ignores my point about precision.

MHz is fine *for mlx hardware* but someone else's hardware that uses,
say 312.5 MHz (ie the ethernet symbol clock) is NOT OK because MHz
loses too much precision.
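
A rough sketch of the multiply-divide idea, using the 312.5 MHz (3200 ps)
example from above. The function names are made up for illustration, and the
plain 64-bit arithmetic is a simplification that a real implementation would
need to guard against overflow:

```c
#include <stdint.h>

/*
 * Convert device cycles to nanoseconds as cycles * mult / div.
 * Plain 64-bit arithmetic: cycles * mult can overflow for large cycle
 * counts, which a real implementation would have to handle.
 */
static uint64_t ex_cycles_to_ns(uint64_t cycles, uint64_t mult, uint64_t div)
{
	return cycles * mult / div;
}

/* 312.5 MHz clock, exact: period is 3200 ps, so ns = cycles * 3200 / 1000. */
static uint64_t ex_ns_exact(uint64_t cycles)
{
	return ex_cycles_to_ns(cycles, 3200, 1000);
}

/* Same clock reported as an integer 312 MHz: ns = cycles * 1000 / 312. */
static uint64_t ex_ns_from_int_mhz(uint64_t cycles)
{
	return ex_cycles_to_ns(cycles, 1000, 312);
}
```

One second's worth of cycles (312,500,000) converts to exactly
1,000,000,000 ns with the exact ratio, and to 1,001,602,564 ns with the
truncated frequency, i.e. roughly 1.6 ms of error per second. The fractional
FDR/EDR period (4693.33... ps) fits the same scheme as mult = 14080,
div = 3000.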

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                 ` <20150603204633.GE7902-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-04  0:55                                   ` Christoph Lameter
       [not found]                                     ` <alpine.DEB.2.11.1506031953500.11879-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2015-06-04  0:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Wed, 3 Jun 2015, Jason Gunthorpe wrote:

> MHz is fine *for mlx hardware* but someone else's hardware that uses,
> say 312.5 MHz (ie the ethernet symbol clock) is NOT OK because MHz
> loses too much precision.

Oscillators vary in frequency. In order to accurately convert to ns the
drift due to temperature etc. needs to be taken into consideration. The
"ns" value there is pretty rough as well. Accurate time may need time
software to continually monitor the *actual* frequency of the oscillator.
I think the raw cycles and the rough oscillator speed are fine.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                     ` <alpine.DEB.2.11.1506031953500.11879-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2015-06-04  4:25                                       ` Jason Gunthorpe
       [not found]                                         ` <20150604042540.GA8837-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-04  4:25 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Or Gerlitz, Doug Ledford, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Wed, Jun 03, 2015 at 07:55:58PM -0500, Christoph Lameter wrote:

> I think the raw cycles and the rough oscillator speed are fine.

Time keeping is designed to adjust for 100's of ppm drift between
clocks.

A communications clock source will be spec'd to be below 200ppm in
accuracy. IB clocks are below 100 ppm, and PCI-E is 300ppm (approx, I
didn't check, order of magnitude is close)

That translates into 0.0625 Hz. for a 312.5 MHz ethernet reference clock

Compared to 5,000,000 Hz in error from rounding.

So no, I disagree that rough is fine for anything.
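
The comparison can be worked through as below, assuming the 312.5 MHz clock
and a 200 ppm rating. The helper names are illustrative only. Done this way
the tolerance comes out as 62,500 Hz and the whole-MHz truncation error as
500,000 Hz (the figure given later in the thread), so the numbers above read
as if a unit or a digit slipped; the conclusion is unchanged:

```c
#include <stdint.h>

/* Allowed frequency deviation in Hz for a clock of freq_hz rated at ppm. */
static uint64_t ex_tolerance_hz(uint64_t freq_hz, uint64_t ppm)
{
	return freq_hz * ppm / 1000000;
}

/* Error in Hz when freq_hz is truncated to a whole number of MHz. */
static uint64_t ex_mhz_truncation_error_hz(uint64_t freq_hz)
{
	return freq_hz % 1000000;
}

/* Express an absolute error in Hz as parts per million of freq_hz. */
static uint64_t ex_error_ppm(uint64_t error_hz, uint64_t freq_hz)
{
	return error_hz * 1000000 / freq_hz;
}
```

For 312,500,000 Hz the truncation error is 1600 ppm, eight times the 200 ppm
oscillator tolerance.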

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                         ` <20150602180844.GD17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-06-02 18:51                           ` Doug Ledford
  2015-06-03 19:48                           ` Or Gerlitz
@ 2015-06-04 16:17                           ` Hefty, Sean
       [not found]                             ` <1828884A29C6694DAF28B7E6B8A82373A8FE4F16-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2 siblings, 1 reply; 64+ messages in thread
From: Hefty, Sean @ 2015-06-04 16:17 UTC (permalink / raw)
  To: Jason Gunthorpe, Doug Ledford
  Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon

> No, if they followed (I didn't check yet) the extension scheme then the
> poll call is
> 
>  struct ibv_wc_ex wcs[num_wcs];
>  ibv_poll_wc_ex(&wcs, num_wcs, sizeof(wcs[0]));
> 
> And the drivers decide what to do based on the 3rd argument, which is
> essentially the ABI version.

As a couple of options:

You could add a 'completion format' field to the cq attribute structure, which would allow poll to cast to a specific structure.

If we wanted to go with Doug's earlier idea, you could add a 'provider context' field to the wc.  This would allow calling back into the provider to retrieve more data.  Though, this concept would be racy if other completions can overwrite whatever data was trying to be retrieved.
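
To make the first option concrete, here is a small sketch of a 'completion
format' chosen at CQ creation, with poll writing entries of that format into
an untyped buffer. Every name below (ex_cq, ex_cq_format, ex_wc_ts,
ex_poll_cq) is hypothetical and the poll routine only fabricates one
completion for illustration; it is not the libibverbs API:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion formats selectable at CQ creation time. */
enum ex_cq_format { EX_CQ_FORMAT_BASIC, EX_CQ_FORMAT_TIMESTAMP };

struct ex_wc_basic {
	uint64_t wr_id;
	uint32_t status;
	uint32_t byte_len;
};

struct ex_wc_ts {
	uint64_t wr_id;
	uint32_t status;
	uint32_t byte_len;
	uint64_t timestamp;	/* raw device cycles */
};

struct ex_cq {
	enum ex_cq_format format;	/* fixed when the CQ is created */
};

/* Size of one completion entry for the format chosen at creation. */
static size_t ex_cq_entry_size(const struct ex_cq *cq)
{
	return cq->format == EX_CQ_FORMAT_TIMESTAMP ?
		sizeof(struct ex_wc_ts) : sizeof(struct ex_wc_basic);
}

/* Poll sketch: writes one made-up completion in the CQ's format. */
static int ex_poll_cq(const struct ex_cq *cq, void *buf, int count)
{
	if (count < 1)
		return 0;
	if (cq->format == EX_CQ_FORMAT_TIMESTAMP) {
		struct ex_wc_ts *wc = buf;

		wc->wr_id = 1;
		wc->status = 0;
		wc->byte_len = 64;
		wc->timestamp = 123456;
	} else {
		struct ex_wc_basic *wc = buf;

		wc->wr_id = 1;
		wc->status = 0;
		wc->byte_len = 64;
	}
	return 1;
}
```

The caller has to pass a buffer matching the format it asked for, which is
where the lack of type checking comes from.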

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                             ` <1828884A29C6694DAF28B7E6B8A82373A8FE4F16-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-06-04 16:47                               ` Jason Gunthorpe
       [not found]                                 ` <20150604164759.GC27699-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-06-04 17:30                               ` Doug Ledford
  1 sibling, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-04 16:47 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Doug Ledford, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Thu, Jun 04, 2015 at 04:17:35PM +0000, Hefty, Sean wrote:
> > No, if they followed (I didn't check yet) the extension scheme then the
> > poll call is
> > 
> >  struct ibv_wc_ex wcs[num_wcs];
> >  ibv_poll_wc_ex(&wcs, num_wcs, sizeof(wcs[0]));
> > 
> > And the drivers decide what to do based on the 3rd argument, which is
> > essentially the ABI version.
> 
> As a couple of options:
> 
> You could add a 'completion format' field to the cq attribute
> structure, which would allow poll to cast to a specific structure.

Subtle, and tricky to use.. But sure..

> If we wanted to go with Doug's earlier idea, you could add a
> 'provider context' field to the wc.  This would allow calling back
> into the provider to retrieve more data.  Though, this concept would
> be racy if other completions can overwrite whatever data was trying
> to be retrieved.

Right, in the model we have where poll advances the HW completion
queue, that is not straightforward at all.

Can you share what the libfabric change would look like to add a
timestamp field to the completion? That might be informative,
libfabric is also consuming the kernel UAPI.

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                 ` <20150604164759.GC27699-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-04 17:24                                   ` Hefty, Sean
       [not found]                                     ` <1828884A29C6694DAF28B7E6B8A82373A8FE4F98-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Hefty, Sean @ 2015-06-04 17:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

> Can you share what the libfabric change would look like to add a
> timestamp field to the completion? That might be informative,
> libfabric is also consuming the kernel UAPI.

Libfabric uses a CQ format option, so this would just require a new format.

There is no type checking on CQ poll (read).  The provider interface looks like this:

struct fi_ops_cq {
	size_t	size;
	ssize_t	(*read)(struct fid_cq *cq, void *buf, size_t count);
	ssize_t	(*readfrom)(struct fid_cq *cq, void *buf, size_t count,
			fi_addr_t *src_addr);
	/* ... remaining operations omitted ... */
};

(We could construct static inline calls with type checking if someone wanted that.)

If I were adding timestamps, I would probably define a new completion structure with 2 u64 time stamp fields (start and end times), and figure out when start occurred, end occurred, and the timing metric later.  :)

I would assume that these are non-wrapping values.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                             ` <1828884A29C6694DAF28B7E6B8A82373A8FE4F16-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2015-06-04 16:47                               ` Jason Gunthorpe
@ 2015-06-04 17:30                               ` Doug Ledford
  1 sibling, 0 replies; 64+ messages in thread
From: Doug Ledford @ 2015-06-04 17:30 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Jason Gunthorpe, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

[-- Attachment #1: Type: text/plain, Size: 996 bytes --]

On Thu, 2015-06-04 at 16:17 +0000, Hefty, Sean wrote:
> > No, if they followed (I didn't check yet) the extension scheme then the
> > poll call is
> > 
> >  struct ibv_wc_ex wcs[num_wcs];
> >  ibv_poll_wc_ex(&wcs, num_wcs, sizeof(wcs[0]));
> > 
> > And the drivers decide what to do based on the 3rd argument, which is
> > essentially the ABI version.
> 
> As a couple of options:
> 
> You could add a 'completion format' field to the cq attribute structure, which would allow poll to cast to a specific structure.

This is probably what I would consider an ideal way to handle things.

> If we wanted to go with Doug's earlier idea, you could add a 'provider context' field to the wc.  This would allow calling back into the provider to retrieve more data.  Though, this concept would be racy if other completions can overwrite whatever data was trying to be retrieved.


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                         ` <20150604042540.GA8837-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-06  8:16                                           ` Christoph Lameter
       [not found]                                             ` <alpine.DEB.2.11.1506060304480.29634-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2015-06-06  8:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Wed, 3 Jun 2015, Jason Gunthorpe wrote:

> On Wed, Jun 03, 2015 at 07:55:58PM -0500, Christoph Lameter wrote:
>
> > I think the raw cycles and the rough oscillator speed are fine.
>
> Time keeping is designed to adjust for 100's of ppm drift between
> clocks.

What time keeping? NTP? PTP is supposed to be accurate to 10s of ns and
we would need an accuracy in that range.

> A communications clock source will be spec'd to be below 200ppm in
> accuracy. IB clocks are below 100 ppm, and PCI-E is 300ppm (approx, I
> didn't check, order of magnitude is close)

Well that is not usable. A ns is a billionth of a second, which is the unit
of measurement of these activities here. A send action can be around 600-1000ns.
If we are off by 200ppm then that is 200 microseconds per second, meaning 200000 ns.
And it's our experience that these clocks can be off by milliseconds in
practice.
> That translates into 0.0625 Hz. for a 312.5 MHz ethernet reference clock

Ok that is around 3ns per cycle? And you think the accuracy is therefore
in femtoseconds? I have never seen something that accurate. Wish something
like that would exist. Maybe in some labs that provide the source of
global timekeeping?

> Compared to 5,000,000 Hz in error from rounding.

Huh?

> So no, I disagree that rough is fine for anything.

I am sorry but the practical issues that we are dealing with in
timekeeping today shows just the opposite. For a true comparison of clocks
with nanosecond accuracy you would need time corrected values and that is
a challenge due to the variances of the clocks that we see.




^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                     ` <1828884A29C6694DAF28B7E6B8A82373A8FE4F98-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-06-06  8:27                                       ` Christoph Lameter
       [not found]                                         ` <alpine.DEB.2.11.1506060318570.29860-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2015-06-06  8:27 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Jason Gunthorpe, Doug Ledford, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Thu, 4 Jun 2015, Hefty, Sean wrote:

> If I were adding timestamps, I would probably define a new completion
> structure with 2 u64 time stamp fields (start and end times), and figure
> out when start occurred, end occurred, and the timing metric later.  :)

Not sure why you would need the start. The app knows when it submitted a
send request and incoming packets can be readily timed with taps if
necessary. If you want the start on inbound packets then you have the
challenge that the adapter needs to figure out when the first bit of the
message actually arrived and the timestamp information needs to be pushed
through all the way through the pipeline. Completion is easily done.

> I would assume that these are non-wrapping values.

It's fine what we have now as far as I can tell.

I am not sure why it is necessary to make this more complicated than it is
now. We need a simple means to obtain the completion time and that is what
the current implementation provides. There is even another vendor
(chelsio) who has a similar implementation.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                             ` <alpine.DEB.2.11.1506060304480.29634-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2015-06-06 15:45                                               ` Doug Ledford
       [not found]                                                 ` <1433605546.40123.217.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Doug Ledford @ 2015-06-06 15:45 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Jason Gunthorpe, Or Gerlitz, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

[-- Attachment #1: Type: text/plain, Size: 3288 bytes --]

On Sat, 2015-06-06 at 03:16 -0500, Christoph Lameter wrote:
> On Wed, 3 Jun 2015, Jason Gunthorpe wrote:
> 
> > On Wed, Jun 03, 2015 at 07:55:58PM -0500, Christoph Lameter wrote:
> >
> > > I think the raw cycles and the rough oscillator speed are fine.
> >
> > Time keeping is designed to adjust for 100's of ppm drift between
> > clocks.
> 
> What time keeping? NTP? PTP is supposed to be accurate to 10s of ns and
> we would need an accuracy in that range.
> 
> > A communications clock source will be spec'd to be below 200ppm in
> > accuracy. IB clocks are below 100 ppm, and PCI-E is 300ppm (approx, I
> > didn't check, order of magnitude is close)
> 
> Well that is not usable. A ns is a billionth of a second, which is the unit
> of measurement of these activities here. A send action can be around 600-1000ns.
> If we are off by 200ppm then that is 200 microseconds per second, meaning 200000 ns.
> And it's our experience that these clocks can be off by milliseconds in
> practice.

The ppm rating is based upon the speed of the clock, not time.  It's how
many cycles of variance you are allowed from the target speed given in
cycles / millions of cycles of the target clock frequency.  If you have
a 312.5MHz clock, and your accuracy is specified as 100ppm, then the
total clock variability is 312.5 * 100 = 31250 cycles (I suspect that
this is an absolute variance, and so the tolerance would be +-1/2 of the
total amount, but I don't know that for certain).

> > That translates into 0.0625 Hz. for a 312.5 MHz ethernet reference clock

I don't know how this number is derived, but 0.0625Hz sounds like an odd
variance.

> Ok that is around 3ns per cycle? And you think the accuracy is therefore
> in femtoseconds? I have never seen something that accurate. Wish something
> like that would exist. Maybe in some labs that provide the source of
> global timekeeping?
> 
> > Compared to 5,000,000 Hz in error from rounding.
> 
> Huh?

He's pointing out that the design as specified passes the clock
frequency to the user space library in terms of integer MHz.  The
standard Ethernet clock frequency is 312.5MHz.  That .5MHz, or
500,000Hz, must be rounded off as it is passed from the kernel to the
user space library.  And that 500,000 cycle per second error in the
stated speed of the clock is *way* larger than the +- error variance in
the clocks you are using.  If you are having problems keeping your time
numbers synchronized, then this is likely a bigger problem than the
variance of the clocks.

> > So no, I disagree that rough is fine for anything.
> 
> I am sorry but the practical issues that we are dealing with in
> timekeeping today shows just the opposite. For a true comparison of clocks
> with nanosecond accuracy you would need time corrected values and that is
> a challenge due to the variances of the clocks that we see.

Jason's point, and one that isn't addressed yet, is that this might not
be variance in the clocks and instead might be a design flaw in the API
you are using and the way the clock speeds are passed to user space.
Changing from int MHz to int KHz might solve your problem.
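
As a rough illustration of what integer kHz would buy, assuming the frequency
is reported as a whole number of kHz (the function name is made up, and the
64-bit multiply can overflow for large cycle counts):

```c
#include <stdint.h>

/*
 * Convert cycles to nanoseconds given the clock frequency in whole kHz:
 * ns = cycles * 1e9 / (freq_khz * 1e3) = cycles * 1e6 / freq_khz.
 */
static uint64_t ex_cycles_to_ns_khz(uint64_t cycles, uint64_t freq_khz)
{
	return cycles * 1000000 / freq_khz;
}
```

A 312.5 MHz clock is exactly 312500 kHz, so one second of cycles converts to
exactly 1,000,000,000 ns. A fractional frequency such as the 213.0681818 MHz
FDR/EDR example would still be truncated (to 213068 kHz), but the resulting
error is under 1 ppm rather than hundreds of ppm.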


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                                 ` <1433605546.40123.217.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-06-06 20:25                                                   ` Or Gerlitz
  2015-06-07  4:42                                                   ` Jason Gunthorpe
  2015-06-07  4:49                                                   ` Christoph Lameter
  2 siblings, 0 replies; 64+ messages in thread
From: Or Gerlitz @ 2015-06-06 20:25 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Christoph Lameter, Jason Gunthorpe, Matan Barak,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Sat, Jun 6, 2015 at 6:45 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

>>> So no, I disagree that rough is fine for anything.

>> I am sorry but the practical issues that we are dealing with in
>> timekeeping today shows just the opposite. For a true comparison of clocks
>> with nanosecond accuracy you would need time corrected values and that is
>> a challenge due to the variances of the clocks that we see.

> Jason's point, and one that isn't addressed yet, is that this might not
> be variance in the clocks and instead might be a design flaw in the API
> you are using and the way the clock speeds are passed to user space.
> Changing from int MHz to int KHz might solve your problem.

OK, so if we have the UAPI to pass the clock frequency in KHz that
would put us in a better place? seems very much doable.

Or.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                                 ` <1433605546.40123.217.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-06-06 20:25                                                   ` Or Gerlitz
@ 2015-06-07  4:42                                                   ` Jason Gunthorpe
  2015-06-07  4:49                                                   ` Christoph Lameter
  2 siblings, 0 replies; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-07  4:42 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Christoph Lameter, Or Gerlitz, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Sat, Jun 06, 2015 at 11:45:46AM -0400, Doug Ledford wrote:
> The ppm rating is based upon the speed of the clock, not time.  It's how
> many cycles of variance you are allowed from the target speed given in
> cycles / millions of cycles of the target clock frequency.

Right, it is 'parts per million'. Hz, ppm, and a measure of 'phase
noise' are the three technical characteristics used to define a
communications clock source.

> > > That translates into 0.0625 Hz. for a 312.5 MHz ethernet reference clock
> 
> I don't know how this number is derived, but 0.0625Hz sounds like an odd
> variance.

I used 200ppm for that example, as it is closer to the ethernet worst
case.

> He's pointing out that the design as specified passes the clock
> frequency to the user space library in terms of integer MHz.  The
> standard Ethernet clock frequency is 312.5MHz.  That .5MHz, or
> 500,000Hz, must be rounded off as it is passed from the kernel to
> the

Yes, right, thank you, I thought I was losing my mind :)

> Jason's point, and one that isn't addressed yet, is that this might not
> be variance in the clocks and instead might be a design flaw in the API
> you are using and the way the clock speeds are passed to user space.
> Changing from int MHz to int KHz might solve your problem.

I would use a period in picoseconds to describe a clock..

But really, if you start talking about IEEE 1588 (the PTP standard
Christoph mentioned) then even that is not enough accuracy to
represent a synchronized clock, and the frequency may change as the
NIC adjusts the time base.

So, if verbs has a 'time stamp to timespec' driver call back then that
would be most general. Userspace that needs high speed self managed
conversion can call 'time stamp to timespec' once with the value 1E9
and learn the general clock period, exactly as if it was in
query_device. So nothing is lost by this small API change.
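
A sketch of how that could look, under the assumption that the driver keeps
its clock period privately in picoseconds. The names ex_dev,
ex_ts_to_timespec and ex_learn_period_ps are hypothetical, and the 64-bit
product is a simplification that overflows for very large cycle counts:

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical device state, private to the driver. */
struct ex_dev {
	uint64_t period_ps;	/* clock period in picoseconds */
};

/* Hypothetical driver callback: raw cycle count to struct timespec. */
static struct timespec ex_ts_to_timespec(const struct ex_dev *dev,
					 uint64_t cycles)
{
	/* 64-bit product; overflows for very large cycle counts. */
	uint64_t ps = cycles * dev->period_ps;
	struct timespec ts;

	ts.tv_sec = (time_t)(ps / 1000000000000ULL);
	ts.tv_nsec = (long)(ps % 1000000000000ULL / 1000);
	return ts;
}

/* Userspace side: learn the period by converting 1e9 cycles once. */
static uint64_t ex_learn_period_ps(const struct ex_dev *dev)
{
	struct timespec ts = ex_ts_to_timespec(dev, 1000000000ULL);
	uint64_t total_ns = (uint64_t)ts.tv_sec * 1000000000ULL +
			    (uint64_t)ts.tv_nsec;

	/* 1e9 cycles took total_ns, so one cycle is total_ns / 1e6 ps. */
	return total_ns / 1000000;
}
```

For a 3200 ps clock, converting 1e9 cycles yields 3.2 s, from which the
caller recovers the 3200 ps period without it ever appearing in a general
query response.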

Which suggests to me that this shouldn't be in the kernel general UAPI at
all.

That just leaves the mask, I dislike it, but.. if Or says it is too
hard to fix then we are stuck with it. I don't think the wrapping can
really be fixed unless the HW generates a CQE entry for every counter
roll over. At least I haven't had a better idea on the subject this week..
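
For what the mask does allow, here is a minimal sketch of a wrap-tolerant
difference between two raw timestamps. The function name is illustrative,
and it assumes the mask is of the form 2^n - 1 and that fewer than one wrap
period elapsed between the two samples, which is exactly the condition an
app cannot verify in the general case:

```c
#include <stdint.h>

/*
 * Cycles elapsed from 'earlier' to 'later' for a counter that wraps at
 * mask + 1. Only correct if less than one full wrap period separates the
 * two samples; a longer gap is silently aliased to a shorter one.
 */
static uint64_t ex_cycles_delta(uint64_t earlier, uint64_t later,
				uint64_t mask)
{
	return (later - earlier) & mask;
}
```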

Jason

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                                 ` <1433605546.40123.217.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-06-06 20:25                                                   ` Or Gerlitz
  2015-06-07  4:42                                                   ` Jason Gunthorpe
@ 2015-06-07  4:49                                                   ` Christoph Lameter
  2 siblings, 0 replies; 64+ messages in thread
From: Christoph Lameter @ 2015-06-07  4:49 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Jason Gunthorpe, Or Gerlitz, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Sat, 6 Jun 2015, Doug Ledford wrote:

> The ppm rating is based upon the speed of the clock, not time.  It's how
> many cycles of variance you are allowed from the target speed given in
> cycles / millions of cycles of the target clock frequency.  If you have
> a 312.5MHz clock, and your accuracy is specified as 100ppm, then the
> total clock variability is 312.5 * 100 = 31250 cycles (I suspect that
> this is an absolute variance, and so the tolerance would be +-1/2 of the
> total amount, but I don't know that for certain).

Ok well then you also have the problem that the clock may be off in
general already by a certain factor from the true speed of the flow of
time due to manufacturing variances etc. We are only talking about the
instability of the clock source while operating it seems?

> > I am sorry but the practical issues that we are dealing with in
> > timekeeping today shows just the opposite. For a true comparison of clocks
> > with nanosecond accuracy you would need time corrected values and that is
> > a challenge due to the variances of the clocks that we see.
>
> Jason's point, and one that isn't addressed yet, is that this might not
> be variance in the clocks and instead might be a design flaw in the API
> you are using and the way the clock speeds are passed to user space.
> Changing from int MHz to int KHz might solve your problem.

That sounds doable. Maybe we need to look at how clock speeds are
specified elsewhere?

man adjtimex

gives some ways that this is done in the general API for clock adjustment.
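
As a small illustration of that interface on Linux, the frequency offset in
struct timex is expressed in ppm with a 16-bit fractional part. The helper
names below are made up for this sketch; adjtimex() and struct timex are the
real glibc interface from <sys/timex.h>:

```c
#include <sys/timex.h>

/* Convert adjtimex(2) "scaled ppm" (ppm with a 16-bit fraction) to ppm. */
static double ex_scaled_ppm_to_ppm(long scaled)
{
	return (double)scaled / 65536.0;
}

/*
 * Read the current frequency offset of the system clock in ppm.
 * modes == 0 makes this a read-only query. Returns 0 on success, -1 on error.
 */
static int ex_read_freq_ppm(double *ppm)
{
	struct timex tx = { 0 };

	if (adjtimex(&tx) == -1)
		return -1;
	*ppm = ex_scaled_ppm_to_ppm(tx.freq);
	return 0;
}
```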

Or maybe better look at IEEE 1588 for ways to specify the clock
characteristics?

http://www.nist.gov/el/isd/ieee/ieee1588.cfm


^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                         ` <alpine.DEB.2.11.1506060318570.29860-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2015-06-08 16:40                                           ` Hefty, Sean
       [not found]                                             ` <1828884A29C6694DAF28B7E6B8A82373A8FE5AB8-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Hefty, Sean @ 2015-06-08 16:40 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Jason Gunthorpe, Doug Ledford, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

> > If I were adding timestamps, I would probably define a new completion
> > structure with 2 u64 time stamp fields (start and end times), and figure
> > out when start occurred, end occurred, and the timing metric later.  :)
> 
> Not sure why you would need the start. The app knows when it submitted a
> send request and incoming packets can be readily timed with taps if
> necessary. If you want the start on inbound packets then you have the
> challenge that the adapter needs to figure out when the first bit of the
> message actually arrived and the timestamp information needs to be pushed
> through all the way through the pipeline. Completion is easily done.

You're assuming that the only start time of interest is when a send operation has been posted.  Jason asked what I would do with libfabric.  That interface supports triggered operations.  It has also been designed such that a rendezvous (that has to be one of the most difficult words in the English language to spell correctly, even with spell check) protocol could be implemented by the provider.  On the receive side, it may be of interest to report the start and ending time for larger transfers, primarily for debugging purposes.

I have no idea how the time stamps are expected to be used, so why limit it?  An app could just as easily create their own time stamp when reading a work completion, especially when the data is going into an anonymous receive buffer.  That would seem to work for your use case.

> > I would assume that these are non-wrapping values.
> 
> It's fine what we have now as far as I can tell.
> 
> I am not sure why it is necessary to make this more complicated than it is
> now. We need a simple means to obtain the completion time and that is what
> the current implementation provides. There is even another vendor
> (chelsio) who has a similar implementation.

And I thought Chelsio mentioned that they have 2 time stamps.

I have no problem with a bare metal interface exposing this.  But pretending that it's generic and that this is the one and only way that this could be implemented doesn't make it so.


* RE: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                             ` <1828884A29C6694DAF28B7E6B8A82373A8FE5AB8-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-06-10  1:26                                               ` Christoph Lameter
       [not found]                                                 ` <alpine.DEB.2.11.1506092018190.6681-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2015-06-10  1:26 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Jason Gunthorpe, Doug Ledford, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Mon, 8 Jun 2015, Hefty, Sean wrote:

> You're assuming that the only start time of interest is when a send
> operation has been posted.  Jason asked what I would do with libfabric.
> That interface supports triggered operations.  It has also been designed
> such that a rendezvous (that has to be one of the most difficult words in
> the English language to spell correctly, even with spell check) protocol
> could be implemented by the provider.  On the receive side, it may be of
> interest to report the start and ending time for larger transfers,
> primarily for debugging purposes.

There are multiple problems with libfabric related to the use cases in my
area. Most of all the lack of multicast support. Then there is the build
up of software bloat on top. The interest here is in low latency
operations. Rendezvous and other new features are really not wanted if
they increase the latency.

> I have no idea how the time stamps are expected to be used, so why limit
> it?  An app could just as easily create their own time stamp when reading
> a work completion, especially when the data is going into an anonymous
> receive buffer.  That would seem to work for your use case.

No, it cannot, as described earlier. The work can be completed much earlier
than when the polling thread gets around to checking for it. We do that today
since there is nothing better, but this means that there is a gap there.
On the send side you have no easy way of telling when the operation was
complete without the timestamp.

> I have no problem with a bare metal interface exposing this.  But
> pretending that it's generic and that this is the one and only way that
> this could be implemented doesn't make it so.

This is a way it was implemented and it's usable. Shooting for pie in the
sky does not bring us anything. Nor do ideas of requirements from a new
experimental API that does not support the basic features that we need
and seems to be on its way to mess up the latencies of access to RDMA
operations.



* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                                 ` <alpine.DEB.2.11.1506092018190.6681-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2015-06-10  8:45                                                   ` Or Gerlitz
       [not found]                                                     ` <5577F93F.9030405-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-06-10 16:04                                                   ` Hefty, Sean
  1 sibling, 1 reply; 64+ messages in thread
From: Or Gerlitz @ 2015-06-10  8:45 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Christoph Lameter, Hefty, Sean, Jason Gunthorpe, Matan Barak,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On 6/10/2015 4:26 AM, Christoph Lameter wrote:
>> > I have no problem with a bare metal interface exposing this.  But
>> > pretending that it's generic and that this is the one and only way that
>> > this could be implemented doesn't make it so.
> This is a way it was implemented and it's usable. Shooting for pie in the
> sky does not bring us anything. Nor do ideas of requirements from a new
> experimental API that does not support the basic features that we need
> and seems to be on its way to mess up the latencies of access to RDMA operations.

Doug,

What's your say as maintainer here?

The current proposal has:

1. raw HCA clock completion generation time-stamp for CQEs
2. HCA clock frequency in kHz
3. mask telling how many bits are relevant from the 64-bit time-stamp

This is fairly simple, practical and very much usable to FSI
applications and users, and can be extended later if someone comes up
with a better/other combination of the frequency/mask. Have a GO?
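
As a rough illustration of how a consumer might combine the three items above, here is a minimal sketch that takes two raw time-stamps, the mask and the frequency in kHz and produces an elapsed time in nanoseconds. The function names are hypothetical and not part of the patchset or of libibverbs; the sketch assumes the mask covers contiguous low-order bits and that the counter wrapped at most once between the two samples.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Cycles elapsed between two raw time-stamps. Only the bits selected
 * by mask are meaningful, so the subtraction is done modulo the
 * counter width; this stays correct across a single wrap.
 */
static uint64_t ts_delta_cycles(uint64_t earlier, uint64_t later,
                                uint64_t mask)
{
    return (later - earlier) & mask;
}

/*
 * Convert a cycle count to nanoseconds for a clock given in kHz:
 * ns = cycles * 1e6 / khz. The quotient and remainder are scaled
 * separately so the intermediate product does not overflow for
 * large cycle counts. The result is truncated, not rounded.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t khz)
{
    assert(khz != 0);
    uint64_t whole = cycles / khz;   /* whole milliseconds */
    uint64_t rest = cycles % khz;    /* leftover cycles */
    return whole * 1000000ULL + (rest * 1000000ULL) / khz;
}
```

A caller would compute `cycles_to_ns(ts_delta_cycles(t0, t1, mask), khz)`. Converting a single raw value to wall-clock time additionally needs a reference reading of the HCA clock, which this sketch does not cover.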

Or.

* RE: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                                 ` <alpine.DEB.2.11.1506092018190.6681-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  2015-06-10  8:45                                                   ` Or Gerlitz
@ 2015-06-10 16:04                                                   ` Hefty, Sean
       [not found]                                                     ` <1828884A29C6694DAF28B7E6B8A82373A8FE6746-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  1 sibling, 1 reply; 64+ messages in thread
From: Hefty, Sean @ 2015-06-10 16:04 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Jason Gunthorpe, Doug Ledford, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

> There are multiple problems with libfabric related to the use cases in my
> area. Most of all the lack of multicast support. Then there is the build
> up of software bloat on top. The interest here is in low latency
> operations. Rendezvous and other new features are really not wanted if
> they increase the latency.

Multicast is only supported by one vendor that has taken a hostile position against libfabric.  Support for multicast will eventually be there, but it's definitely not a priority for me.  As an open source project, anyone is welcome to propose patches.

For native providers, libfabric will reduce latency.  That's a provider implementation issue, and native providers will be available soon.  The OFIWG chose to have a working set of interfaces that applications can begin using immediately, versus waiting until there was a large set of native providers.

This is an entirely unrelated topic for this thread.  Jason asked what I would do with libfabric.  I answered.  Your comments do nothing to change my mind on that answer.

> This is a way it was implemented

IMO, this is exactly the problem.  The entire design is being driven by the implementation.  That produces an unmaintainable API and fractures the software ecosystem, which is exactly where we are today.




* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                                     ` <5577F93F.9030405-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-06-10 23:54                                                       ` Doug Ledford
  0 siblings, 0 replies; 64+ messages in thread
From: Doug Ledford @ 2015-06-10 23:54 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Christoph Lameter, Hefty, Sean, Jason Gunthorpe, Matan Barak,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Wed, 2015-06-10 at 11:45 +0300, Or Gerlitz wrote:
> On 6/10/2015 4:26 AM, Christoph Lameter wrote:
> >> > I have no problem with a bare metal interface exposing this.  But
> >> > pretending that it's generic and that this is the one and only way that
> >> > this could be implemented doesn't make it so.
> > This is a way it was implemented and it's usable. Shooting for pie in the
> > sky does not bring us anything. Nor do ideas of requirements from a new
> > experimental API that does not support the basic features that we need
> > and seems to be on its way to mess up the latencies of access to RDMA operations.
> 
> Doug,
> 
> What's your say as maintainer here?
> 
> The current proposal has:
> 
> 1. raw HCA clock completion generation time-stamp for CQEs
> 2. HCA clock frequency in kHz
> 3. mask telling how many bits are relevant from the 64-bit time-stamp
> 
> This is fairly simple, practical and very much usable to FSI
> applications and users, and can be extended later if someone comes up
> with a better/other combination of the frequency/mask. Have a GO?

This is all related to the kernel <-> libibverbs interface.  In that
regard, I'm fine with what we have here.

To be more specific, the CQ creation flags and use of create_cq_ex and
the extension of the query_device struct and use of extended query
device are really the only user visible items here, and I'm OK with
those.  None of these items are hot path items and structure growth with
new fields is not the major item it is for the wc struct.

Now, the change to the wc struct and the change to ibv_poll_cq are more
important and still need some work to get to a final implementation IMO.
But that work is all limited to libibverbs and doesn't impact this
kernel patchset.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



* RE: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                                     ` <1828884A29C6694DAF28B7E6B8A82373A8FE6746-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-06-11 15:24                                                       ` Christoph Lameter
       [not found]                                                         ` <alpine.DEB.2.11.1506111010580.18426-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2015-06-11 15:24 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Jason Gunthorpe, Doug Ledford, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Wed, 10 Jun 2015, Hefty, Sean wrote:

> > There are multiple problems with libfabric related to the use cases in my
> > area. Most of all the lack of multicast support. Then there is the build
> > up of software bloat on top. The interest here is in low latency
> > operations. Rendezvous and other new features are really not wanted if
> > they increase the latency.
>
> Multicast is only supported by one vendor that has taken a hostile
> position against libfabric.  Support for multicast will eventually be
> there, but it's definitely not a priority for me.  As an open source
> project, anyone is welcome to propose patches.

Intel is supporting multicast in hardware. It's just a bad implementation
(broadcast and filtering MC groups in the HCA or what was that?) and there
is no plan to fix the issues despite the problem being known for quite
some time. Also does this mean that libfabric only supports the
features needed by Intel?

> For native providers, libfabric will reduce latency.  That's a provider
> implementation issue, and native providers will be available soon.  The
> OFIWG chose to have a working set of interfaces that applications can
> begin using immediately, versus waiting until there was a large set of
> native providers.

I would be interested to see some measurements. AFAICT the Intel solutions
are based on historically inferior IB technology from Qlogic which has
never been able in my lab tests to compete latency wise with other
vendors. I have heard these latency claims repeatedly from Qlogic
personnel over the years.

> IMO, this is exactly the problem.  The entire design is being driven by
> the implementation.  That produces an unmaintainable API and fractures the
> software ecosystem, which is exactly where we are today.

This is a well designed solution and it's easy to use.

It would help libfabric if you would work with other vendors and
industries to include support for their needs. MPI is not the only
application running on the fabrics. I understand that it is
historically the only area in which Qlogic hardware was able to compete
but I think you need to move beyond that. APIs should be as general as
possible, abstracting hardware as much as possible. A viable libfabric
needs to be easy to use and low overhead, as well as covering the requirements
of multiple vendors and use cases.


* RE: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found]                                                         ` <alpine.DEB.2.11.1506111010580.18426-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2015-06-11 19:48                                                           ` Hefty, Sean
  0 siblings, 0 replies; 64+ messages in thread
From: Hefty, Sean @ 2015-06-11 19:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Jason Gunthorpe, Doug Ledford, Matan Barak, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

> Intel is supporting multicast in hardware. It's just a bad implementation
> (broadcast and filtering MC groups in the HCA or what was that?) and there
> is no plan to fix the issues despite the problem being known for quite
> some time. Also does this mean that libfabric only supports the
> features needed by Intel?

Libfabric supports whatever features apps require and the participating vendors want to provide.  However, I, personally, have a limited amount of time to my day and will focus my effort on either what my management requires of me or areas that are most interesting.  Libfabric is specifically designed to be vendor, transport, and implementation neutral.

> I would be interested to see some measurements. AFAICT the Intel solutions
> are based on historically inferior IB technology from Qlogic which has
> never been able in my lab tests to compete latency wise with other
> vendors. I have heard these latency claims repeatedly from Qlogic
> personnel over the years.

You are referring to hardware latency.  Libfabric is software.  No amount of software is going to overcome hardware limitations.  The entire reason multicast support was removed from libfabric 1.0 was that the proposed API would have introduced latency by adding a branch into the code path.

> This is a well designed solution and it's easy to use.

I fundamentally disagree with the practice of ad-hoc API design.  I stated this on the mail list probably 3 years ago.  I see nothing wrong with allowing and encouraging vendor specific extensions.

> It would help libfabric if you would work with other vendors and
> industries to include support for their needs. MPI is not the only
> application running on the fabrics. I understand that it is
> historically the only area in which Qlogic hardware was able to compete
> but I think you need to move beyond that. APIs should be as general as
> possible, abstracting hardware as much as possible. A viable libfabric
> needs to be easy to use and low overhead, as well as covering the requirements
> of multiple vendors and use cases.

Libfabric included requirements from multiple users and applications - MPI, SHMEM, PGAS, DBMS, and sockets all provided input.  It chose to target MPI as an initial priority, but it is not limited to MPI at all.  It also works with other vendors, including vendors that do not support the verbs interfaces -- Cisco, Cray, Intel PSM, plus others.  I, personally, ensured that libfabric would layer well over verbs based hardware.  That doesn't mean that I'm obligated to provide optimized providers over everyone's hardware.

The goal was not to spend 3 years working on a new API, but to get something usable within a short timeframe that could be extended.  OFIWG could have taken a different approach, but this was what the community (not Intel) selected.

As a company, Intel has many products.  A competitor in one area of the company may be a partner in another.  Xeon is by far the most important to this discussion.  It's why Intel dedicated developers to enabling high performance networking in Linux for over 10 years -- even before Intel had any products in those spaces.  And it's why Intel continues to fund development.  Sure, Intel now has IB and Omni-Path Architecture products, but they also have iWarp and Ethernet.  Intel MPI runs over a bunch of different fabrics.  Libfabric doesn't just need to work well over Intel NICs, it needs to work well over Intel platforms.

Returning to this thread, if I had to add time stamps to libfabric, I would still add 2 time stamps into a new completion structure.  Those time stamps would be selected using a method similar to what Doug stated in an earlier email.  The app would use an enum to select what the time stamps would capture.  However, I would lean more to having those values specified as part of the endpoint attributes, rather than the CQ.
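
The shape described above could look roughly like the following sketch. Every name here is hypothetical and invented for illustration; none of it is an existing libfabric or verbs definition. It assumes two time stamp slots per completion, an enum chosen per endpoint that says what each slot captures, and non-wrapping values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: what event a time stamp slot should capture. */
enum example_ts_event {
    EXAMPLE_TS_NONE = 0,   /* slot unused */
    EXAMPLE_TS_START,      /* transfer started */
    EXAMPLE_TS_END,        /* transfer completed */
};

/* Hypothetical endpoint attribute: one selection per slot. */
struct example_ep_ts_attr {
    enum example_ts_event ts_event[2];
};

/* Hypothetical completion entry carrying two u64 time stamps. */
struct example_cq_ts_entry {
    void *op_context;
    uint64_t ts[2];
};

/*
 * If the endpoint was set up to capture start in slot 0 and end in
 * slot 1, store the elapsed raw units in *out and return 0.
 * Otherwise return -1 and leave *out untouched.
 */
static int example_transfer_duration(const struct example_ep_ts_attr *attr,
                                     const struct example_cq_ts_entry *entry,
                                     uint64_t *out)
{
    if (attr->ts_event[0] != EXAMPLE_TS_START ||
        attr->ts_event[1] != EXAMPLE_TS_END)
        return -1;
    *out = entry->ts[1] - entry->ts[0];
    return 0;
}
```

Keeping the selection in the endpoint attributes means the completion entry itself stays a fixed pair of u64 values, and the reader interprets them according to how the endpoint was configured.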

- Sean


Thread overview: 64+ messages
2015-05-31 12:14 [PATCH for-next V2 0/9] Add completion timestamping support Or Gerlitz
     [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-31 12:14   ` [PATCH for-next V2 1/9] IB/core: Change provider's API of create_cq to be extendible Or Gerlitz
     [not found]     ` <1433074457-26437-2-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-06-01  9:21       ` Devesh Sharma
2015-06-01 17:00       ` Jason Gunthorpe
2015-05-31 12:14   ` [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr Or Gerlitz
     [not found]     ` <1433074457-26437-3-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-06-01 16:53       ` Jason Gunthorpe
     [not found]         ` <20150601165335.GB14391-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-02  7:29           ` Matan Barak
     [not found]             ` <CAAKD3BAyNpEKwM7xu=LzFo2F=qP=EN9ZteL2yyAgBVuPNmvtqA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-06-02 17:27               ` Jason Gunthorpe
     [not found]                 ` <20150602172711.GB17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-03  9:04                   ` Matan Barak
     [not found]                     ` <CAAKD3BA7ZD4Gx5c16Ak+0_64rzazw+7n4frkuKHyrvdT=8ac6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-06-03 16:02                       ` Jason Gunthorpe
2015-05-31 12:14   ` [PATCH for-next V2 3/9] IB/core: Add CQ creation time-stamping flag Or Gerlitz
2015-05-31 12:14   ` [PATCH for-next V2 4/9] IB/core: Extend ib_uverbs_create_cq Or Gerlitz
2015-05-31 12:14   ` [PATCH for-next V2 5/9] IB/core: Add timestamp_mask and hca_core_clock to query_device Or Gerlitz
2015-05-31 12:14   ` [PATCH for-next V2 6/9] IB/core: Pass hardware specific data in query_device Or Gerlitz
     [not found]     ` <1433074457-26437-7-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-06-01  9:24       ` Devesh Sharma
2015-05-31 12:14   ` [PATCH for-next V2 7/9] IB/mlx4: Add mmap call to map the hardware clock Or Gerlitz
2015-05-31 12:14   ` [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs Or Gerlitz
     [not found]     ` <1433074457-26437-9-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-06-01 16:56       ` Jason Gunthorpe
     [not found]         ` <20150601165649.GC14391-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-02  7:21           ` Matan Barak
     [not found]             ` <CAAKD3BBsTFYKwOBK+6u04x6deRYzjYESJzbpyy83UtoChZhpOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-06-02 17:07               ` Jason Gunthorpe
     [not found]                 ` <20150602170736.GA17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-03  8:57                   ` Matan Barak
     [not found]                     ` <CAAKD3BARPocK1zZ2nYV0xWHqEASJTYbp1aQxwymi-TYTyK_vGA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-06-03 16:31                       ` Jason Gunthorpe
     [not found]                         ` <20150603163146.GD12073-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-03 18:58                           ` Or Gerlitz
     [not found]                             ` <CAJ3xEMhGdX-35WAjauKZE7+Wg39kbXP35UzYX33J6OB8g_9rYA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-06-03 19:16                               ` Jason Gunthorpe
     [not found]                                 ` <20150603191609.GB7902-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-03 19:35                                   ` Or Gerlitz
     [not found]                                     ` <CAJ3xEMiPkxf_Qu40AozMRWZaY-dAfEyAOZGyT9VYsNpQO1Q-Zw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-06-03 20:38                                       ` Jason Gunthorpe
2015-05-31 12:14   ` [PATCH for-next V2 9/9] IB/mlx4: Add support for CQ time-stamping Or Gerlitz
2015-05-31 12:31   ` [PATCH for-next V2 0/9] Add completion timestamping support Or Gerlitz
2015-05-31 19:00   ` Doug Ledford
     [not found]     ` <1433098827.114391.179.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-06-01  9:30       ` Matan Barak
     [not found]         ` <CAAKD3BBo9A5AV9Uv-q1TgSQ6hs-oSUn5QsUT+5+6mk1RwW7AUA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-06-01 11:25           ` Doug Ledford
     [not found]             ` <1433157904.114391.188.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-06-01 13:58               ` Christoph Lameter
     [not found]                 ` <alpine.DEB.2.11.1506010852360.26845-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2015-06-01 16:39                   ` Doug Ledford
2015-06-01 16:43               ` Jason Gunthorpe
     [not found]                 ` <20150601164322.GA14391-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-01 17:00                   ` Doug Ledford
     [not found]                     ` <1433178057.114391.208.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-06-01 17:29                       ` Jason Gunthorpe
2015-06-02 14:35                   ` Doug Ledford
     [not found]                     ` <1433255724.114391.225.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-06-02 14:44                       ` Or Gerlitz
     [not found]                         ` <556DC167.5070205-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-06-02 14:53                           ` Doug Ledford
2015-06-02 18:08                       ` Jason Gunthorpe
     [not found]                         ` <20150602180844.GD17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-02 18:51                           ` Doug Ledford
     [not found]                             ` <1433271083.40123.1.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-06-02 19:04                               ` Jason Gunthorpe
     [not found]                                 ` <20150602190410.GA23362-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-02 19:56                                   ` Doug Ledford
2015-06-03 19:48                           ` Or Gerlitz
     [not found]                             ` <CAJ3xEMjZqwJC8=HrMBv_eSVKVtLW7qjVZNjtuD28octGPtYu6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-06-03 20:46                               ` Jason Gunthorpe
     [not found]                                 ` <20150603204633.GE7902-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-04  0:55                                   ` Christoph Lameter
     [not found]                                     ` <alpine.DEB.2.11.1506031953500.11879-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2015-06-04  4:25                                       ` Jason Gunthorpe
     [not found]                                         ` <20150604042540.GA8837-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-06  8:16                                           ` Christoph Lameter
     [not found]                                             ` <alpine.DEB.2.11.1506060304480.29634-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2015-06-06 15:45                                               ` Doug Ledford
     [not found]                                                 ` <1433605546.40123.217.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-06-06 20:25                                                   ` Or Gerlitz
2015-06-07  4:42                                                   ` Jason Gunthorpe
2015-06-07  4:49                                                   ` Christoph Lameter
2015-06-04 16:17                           ` Hefty, Sean
     [not found]                             ` <1828884A29C6694DAF28B7E6B8A82373A8FE4F16-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-06-04 16:47                               ` Jason Gunthorpe
     [not found]                                 ` <20150604164759.GC27699-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-06-04 17:24                                   ` Hefty, Sean
     [not found]                                     ` <1828884A29C6694DAF28B7E6B8A82373A8FE4F98-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-06-06  8:27                                       ` Christoph Lameter
     [not found]                                         ` <alpine.DEB.2.11.1506060318570.29860-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2015-06-08 16:40                                           ` Hefty, Sean
     [not found]                                             ` <1828884A29C6694DAF28B7E6B8A82373A8FE5AB8-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-06-10  1:26                                               ` Christoph Lameter
     [not found]                                                 ` <alpine.DEB.2.11.1506092018190.6681-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2015-06-10  8:45                                                   ` Or Gerlitz
     [not found]                                                     ` <5577F93F.9030405-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-06-10 23:54                                                       ` Doug Ledford
2015-06-10 16:04                                                   ` Hefty, Sean
     [not found]                                                     ` <1828884A29C6694DAF28B7E6B8A82373A8FE6746-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-06-11 15:24                                                       ` Christoph Lameter
     [not found]                                                         ` <alpine.DEB.2.11.1506111010580.18426-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2015-06-11 19:48                                                           ` Hefty, Sean
2015-06-04 17:30                               ` Doug Ledford
