* [PATCH for-next 00/10] Add completion timestamping support
@ 2015-05-17 13:36 Or Gerlitz
       [not found] ` <1431869786-6308-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Or Gerlitz @ 2015-05-17 13:36 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Or Gerlitz

Hi Doug,

This patchset adds completion timestamping support for verbs consumers.

Timestamping is used by applications in order to know when a WQE was
received/transmitted by the HW. The value is given in HCA hardware cycles,
but can easily be converted, as the hardware's core clock frequency is
available through an extension of query device.

Moreover, we add the ability to read the HCA's current clock. This could be
useful in order to synchronize events to the wall clock.

This functionality is achieved by adding/extending the following verbs:

create_cq - create_cq is extended in order to allow passing creation flags
to the CQ creation function. We change the IB/core --> vendors API
to be easily extendible by passing a struct which contains
comp_vector, cqe and the new flags parameter. In order to create a
CQ which supports timestamping, IB_CQ_FLAGS_TIMESTAMP should be given.

query_device - We extend query_device uverb further by giving the hardware's
clock frequency and the timestamp mask (the number of timestamp
bits which are supported). If timestamp isn't supported, 0 is returned.
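
As a rough illustration of the create_cq change described above (a
standalone sketch, not code from the patches): the attribute struct mirrors
the cqe, comp_vector and flags fields, and a provider rejects any flag it
does not understand. The names example_create_cq, EXAMPLE_CQ_FLAGS_TIMESTAMP
and the supported_flags parameter are made up for this example, and the
flag's bit value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the three fields the series places in struct ib_cq_init_attr. */
struct cq_init_attr {
	int cqe;          /* minimum number of CQ entries */
	int comp_vector;  /* completion vector */
	uint32_t flags;   /* creation flags */
};

/* Hypothetical bit value; the real IB_CQ_FLAGS_TIMESTAMP value may differ. */
#define EXAMPLE_CQ_FLAGS_TIMESTAMP (1u << 0)

/*
 * Hypothetical provider-side check. Returns 0 on success or a negative
 * errno. A provider without timestamp support passes supported_flags == 0,
 * so any non-zero flags value is refused with -EINVAL.
 */
static int example_create_cq(const struct cq_init_attr *attr,
			     uint32_t supported_flags)
{
	assert(attr != NULL);

	if (attr->flags & ~supported_flags)
		return -EINVAL;
	if (attr->cqe < 1)
		return -EINVAL;
	return 0;
}
```

Passing a struct keeps the callback signature stable: a later field or flag
only needs a new member or bit, and providers that ignore it keep failing
cleanly instead of silently creating a CQ without the requested feature.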

In order to read the timestamp in the WQE, the user needs to query the device
for support, create an appropriate CQ (using the extended uverb with
IB_CQ_FLAGS_TIMESTAMP) and poll the CQ with an extended poll_cq verb (currently,
only implemented in user-space).

In mlx4, allowing the user to read the core clock efficiently involves mapping
this area of the hardware to user-space (done with an mmap call)
and reading the clock from the correct offset of the page.

This offset is returned in the vendor-specific data passed from mlx4's kernel
driver to mlx4's user-space driver. query_device is modified in order to support
passing this vendor-specific data. A user-space application could use a new
verb in order to read the hardware's clock.
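
A minimal sketch of what reading the clock from a mapped page could look
like, assuming the counter is exposed as two 32-bit words (high word first)
in native byte order at the returned offset. That layout, the retry loop and
the name example_read_clock are assumptions for illustration only; the
actual register layout and byte order are defined by the hardware and the
user-space driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read a 64-bit free-running counter from an already mapped page.
 * 'page' is the start of the mapping and 'offset' the byte offset of the
 * counter inside it. The high word is read twice so that a carry from the
 * low word into the high word between the two reads is detected and the
 * read is retried.
 */
static uint64_t example_read_clock(const volatile uint32_t *page,
				   size_t offset)
{
	const volatile uint32_t *reg;
	uint32_t hi, lo, hi_again;

	assert(page != NULL);
	assert(offset % sizeof(uint32_t) == 0);

	reg = page + offset / sizeof(uint32_t);
	do {
		hi = reg[0];
		lo = reg[1];
		hi_again = reg[0];
	} while (hi != hi_again);

	return ((uint64_t)hi << 32) | lo;
}
```

In a real application the page pointer would come from mmap() on the
device's file descriptor and the offset from the vendor-specific part of the
query_device response.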

Translating the hardware's clock into ms can be done by dividing this
value by hca_core_clock (which is returned by the extended version of
the query_device uverb).
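
The conversion can be sketched as follows. This assumes hca_core_clock is
reported in kHz (which is what makes cycles / hca_core_clock come out in
ms) and that timestamp_mask is a bitmask of the valid timestamp bits; both
are assumptions, and the helper names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Cycles elapsed between two completion timestamps. Masking the
 * difference handles a single wraparound of a counter that is narrower
 * than 64 bits (for example a 48-bit timestamp).
 */
static uint64_t example_cycles_delta(uint64_t start, uint64_t end,
				     uint64_t timestamp_mask)
{
	return (end - start) & timestamp_mask;
}

/* Convert a cycle count to milliseconds, assuming the clock is in kHz. */
static uint64_t example_cycles_to_ms(uint64_t cycles,
				     uint64_t core_clock_khz)
{
	assert(core_clock_khz != 0);
	return cycles / core_clock_khz;
}
```

For instance, with a 427000 kHz core clock, 427000 cycles correspond to 1 ms.
Integer division truncates, so intervals shorter than 1 ms come out as 0;
an application that needs finer resolution would scale the cycle count
before dividing.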


Matan and Or.


Matan Barak (10):
  IB/core: Change provider's API of create_cq to be extendible
  IB/core: Add flags parameter for ib_create_cq
  IB/core: Extend ib_uverbs_create_cq
  IB/core: Add timestamp_mask and hca_core_clock to query_device
  IB/core: Pass hardware specific data in query_device
  IB/mlx4: Add mmap call to map the hardware clock
  IB/mlx4: Support extended create_cq and query_device uverbs
  IB/mlx4: Add support for timestamp in cq creation
  IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
  IB/mlx4: Return hca core clock's offset in query_device vendor's data

 drivers/infiniband/core/device.c                   |    6 +-
 drivers/infiniband/core/mad.c                      |    4 +-
 drivers/infiniband/core/uverbs.h                   |    1 +
 drivers/infiniband/core/uverbs_cmd.c               |  188 ++++++++++++++++----
 drivers/infiniband/core/uverbs_main.c              |    1 +
 drivers/infiniband/core/verbs.c                    |    6 +-
 drivers/infiniband/hw/amso1100/c2_provider.c       |   14 ++-
 drivers/infiniband/hw/cxgb3/iwch_provider.c        |   19 ++-
 drivers/infiniband/hw/cxgb4/cq.c                   |    9 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h             |    8 +-
 drivers/infiniband/hw/cxgb4/provider.c             |    8 +-
 drivers/infiniband/hw/ehca/ehca_cq.c               |    7 +-
 drivers/infiniband/hw/ehca/ehca_hca.c              |    6 +-
 drivers/infiniband/hw/ehca/ehca_iverbs.h           |    6 +-
 drivers/infiniband/hw/ehca/ehca_main.c             |    3 +-
 drivers/infiniband/hw/ipath/ipath_cq.c             |    9 +-
 drivers/infiniband/hw/ipath/ipath_verbs.c          |    7 +-
 drivers/infiniband/hw/ipath/ipath_verbs.h          |    3 +-
 drivers/infiniband/hw/mlx4/cq.c                    |   13 ++-
 drivers/infiniband/hw/mlx4/mad.c                   |    2 +-
 drivers/infiniband/hw/mlx4/main.c                  |   64 +++++++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h               |   19 ++-
 drivers/infiniband/hw/mlx5/cq.c                    |   10 +-
 drivers/infiniband/hw/mlx5/main.c                  |   14 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h               |    5 +-
 drivers/infiniband/hw/mthca/mthca_provider.c       |   15 ++-
 drivers/infiniband/hw/nes/nes_verbs.c              |   17 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c        |   13 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h        |    9 +-
 drivers/infiniband/hw/qib/qib_cq.c                 |   11 +-
 drivers/infiniband/hw/qib/qib_verbs.c              |    6 +-
 drivers/infiniband/hw/qib/qib_verbs.h              |    5 +-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c       |   16 ++-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h       |   10 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c         |    5 +-
 drivers/infiniband/ulp/iser/iser_verbs.c           |    8 +-
 drivers/infiniband/ulp/isert/ib_isert.c            |    7 +-
 drivers/infiniband/ulp/srp/ib_srp.c                |    4 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c              |    2 +-
 drivers/net/ethernet/mellanox/mlx4/main.c          |   19 ++
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    7 +-
 include/linux/mlx4/device.h                        |    9 +
 include/rdma/ib_verbs.h                            |   21 ++-
 include/uapi/rdma/ib_user_verbs.h                  |   19 ++
 net/9p/trans_rdma.c                                |    2 +-
 net/rds/ib_cm.c                                    |    4 +-
 net/rds/iw_cm.c                                    |    8 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c           |   15 +-
 net/sunrpc/xprtrdma/verbs.c                        |    8 +-
 49 files changed, 520 insertions(+), 152 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH for-next 01/10] IB/core: Change provider's API of create_cq to be extendible
       [not found] ` <1431869786-6308-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-05-17 13:36   ` Or Gerlitz
       [not found]     ` <1431869786-6308-2-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-05-17 13:36   ` [PATCH for-next 02/10] IB/core: Add flags parameter for ib_create_cq Or Gerlitz
                     ` (8 subsequent siblings)
  9 siblings, 1 reply; 71+ messages in thread
From: Or Gerlitz @ 2015-05-17 13:36 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add a new ib_cq_init_attr structure which contains the
previous cqe (minimum number of CQ entries) and comp_vector
(completion vector) in addition to a new flags field.
All vendors' create_cq callbacks are changed in order
to work with the new API.

This commit does not change any functionality.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs_cmd.c         |    6 ++++--
 drivers/infiniband/core/verbs.c              |    3 ++-
 drivers/infiniband/hw/amso1100/c2_provider.c |    7 ++++++-
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |   11 ++++++++---
 drivers/infiniband/hw/cxgb4/cq.c             |    9 +++++++--
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h       |    8 ++++----
 drivers/infiniband/hw/ehca/ehca_cq.c         |    7 ++++++-
 drivers/infiniband/hw/ehca/ehca_iverbs.h     |    3 ++-
 drivers/infiniband/hw/ipath/ipath_cq.c       |    9 +++++++--
 drivers/infiniband/hw/ipath/ipath_verbs.h    |    3 ++-
 drivers/infiniband/hw/mlx4/cq.c              |    8 +++++++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h         |    3 ++-
 drivers/infiniband/hw/mlx5/cq.c              |   10 ++++++++--
 drivers/infiniband/hw/mlx5/main.c            |    3 ++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h         |    5 +++--
 drivers/infiniband/hw/mthca/mthca_provider.c |    8 ++++++--
 drivers/infiniband/hw/nes/nes_verbs.c        |   11 ++++++++---
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |    7 ++++++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  |    6 ++++--
 drivers/infiniband/hw/qib/qib_cq.c           |   11 ++++++++---
 drivers/infiniband/hw/qib/qib_verbs.h        |    5 +++--
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c |   10 +++++++---
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h |    7 ++++---
 include/rdma/ib_verbs.h                      |   10 ++++++++--
 24 files changed, 124 insertions(+), 46 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a9f0489..f3d5830 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1341,6 +1341,7 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
 	struct ib_uverbs_event_file    *ev_file = NULL;
 	struct ib_cq                   *cq;
 	int                             ret;
+	struct ib_cq_init_attr attr = {.cqe = 0};
 
 	if (out_len < sizeof resp)
 		return -ENOSPC;
@@ -1376,8 +1377,9 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
 	INIT_LIST_HEAD(&obj->comp_list);
 	INIT_LIST_HEAD(&obj->async_list);
 
-	cq = file->device->ib_dev->create_cq(file->device->ib_dev, cmd.cqe,
-					     cmd.comp_vector,
+	attr.cqe = cmd.cqe;
+	attr.comp_vector = cmd.comp_vector;
+	cq = file->device->ib_dev->create_cq(file->device->ib_dev, &attr,
 					     file->ucontext, &udata);
 	if (IS_ERR(cq)) {
 		ret = PTR_ERR(cq);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index d110a5e..0da8163 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1013,8 +1013,9 @@ struct ib_cq *ib_create_cq(struct ib_device *device,
 			   void *cq_context, int cqe, int comp_vector)
 {
 	struct ib_cq *cq;
+	struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector};
 
-	cq = device->create_cq(device, cqe, comp_vector, NULL, NULL);
+	cq = device->create_cq(device, &attr, NULL, NULL);
 
 	if (!IS_ERR(cq)) {
 		cq->device        = device;
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index 6fe329a..de82025 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -292,13 +292,18 @@ static int c2_destroy_qp(struct ib_qp *ib_qp)
 	return 0;
 }
 
-static struct ib_cq *c2_create_cq(struct ib_device *ibdev, int entries, int vector,
+static struct ib_cq *c2_create_cq(struct ib_device *ibdev,
+				  struct ib_cq_init_attr *attr,
 				  struct ib_ucontext *context,
 				  struct ib_udata *udata)
 {
+	int entries = attr->cqe;
 	struct c2_cq *cq;
 	int err;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	cq = kmalloc(sizeof(*cq), GFP_KERNEL);
 	if (!cq) {
 		pr_debug("%s: Unable to allocate CQ\n", __func__);
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 298d1ca..2d8c783 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -138,10 +138,12 @@ static int iwch_destroy_cq(struct ib_cq *ib_cq)
 	return 0;
 }
 
-static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries, int vector,
-			     struct ib_ucontext *ib_context,
-			     struct ib_udata *udata)
+static struct ib_cq *iwch_create_cq(struct ib_device *ibdev,
+				    struct ib_cq_init_attr *attr,
+				    struct ib_ucontext *ib_context,
+				    struct ib_udata *udata)
 {
+	int entries = attr->cqe;
 	struct iwch_dev *rhp;
 	struct iwch_cq *chp;
 	struct iwch_create_cq_resp uresp;
@@ -151,6 +153,9 @@ static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries, int ve
 	size_t resplen;
 
 	PDBG("%s ib_dev %p entries %d\n", __func__, ibdev, entries);
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	rhp = to_iwch_dev(ibdev);
 	chp = kzalloc(sizeof(*chp), GFP_KERNEL);
 	if (!chp)
diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c
index 68ddb37..37cd229 100644
--- a/drivers/infiniband/hw/cxgb4/cq.c
+++ b/drivers/infiniband/hw/cxgb4/cq.c
@@ -866,10 +866,13 @@ int c4iw_destroy_cq(struct ib_cq *ib_cq)
 	return 0;
 }
 
-struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, int entries,
-			     int vector, struct ib_ucontext *ib_context,
+struct ib_cq *c4iw_create_cq(struct ib_device *ibdev,
+			     struct ib_cq_init_attr *attr,
+			     struct ib_ucontext *ib_context,
 			     struct ib_udata *udata)
 {
+	int entries = attr->cqe;
+	int vector = attr->comp_vector;
 	struct c4iw_dev *rhp;
 	struct c4iw_cq *chp;
 	struct c4iw_create_cq_resp uresp;
@@ -879,6 +882,8 @@ struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, int entries,
 	struct c4iw_mm_entry *mm, *mm2;
 
 	PDBG("%s ib_dev %p entries %d\n", __func__, ibdev, entries);
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
 
 	rhp = to_c4iw_dev(ibdev);
 
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 97bb555..9bff2f6 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -992,10 +992,10 @@ int c4iw_reregister_phys_mem(struct ib_mr *mr,
 				     int acc, u64 *iova_start);
 int c4iw_dereg_mr(struct ib_mr *ib_mr);
 int c4iw_destroy_cq(struct ib_cq *ib_cq);
-struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, int entries,
-					int vector,
-					struct ib_ucontext *ib_context,
-					struct ib_udata *udata);
+struct ib_cq *c4iw_create_cq(struct ib_device *ibdev,
+			     struct ib_cq_init_attr *attr,
+			     struct ib_ucontext *ib_context,
+			     struct ib_udata *udata);
 int c4iw_resize_cq(struct ib_cq *cq, int cqe, struct ib_udata *udata);
 int c4iw_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags);
 int c4iw_destroy_qp(struct ib_qp *ib_qp);
diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c
index 8cc8375..4f1d18f 100644
--- a/drivers/infiniband/hw/ehca/ehca_cq.c
+++ b/drivers/infiniband/hw/ehca/ehca_cq.c
@@ -113,10 +113,12 @@ struct ehca_qp *ehca_cq_get_qp(struct ehca_cq *cq, int real_qp_num)
 	return ret;
 }
 
-struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector,
+struct ib_cq *ehca_create_cq(struct ib_device *device,
+			     struct ib_cq_init_attr *attr,
 			     struct ib_ucontext *context,
 			     struct ib_udata *udata)
 {
+	int cqe = attr->cqe;
 	static const u32 additional_cqe = 20;
 	struct ib_cq *cq;
 	struct ehca_cq *my_cq;
@@ -131,6 +133,9 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector,
 	int ipz_rc, i;
 	unsigned long flags;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	if (cqe >= 0xFFFFFFFF - 64 - additional_cqe)
 		return ERR_PTR(-EINVAL);
 
diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
index 077185b..a5624fb 100644
--- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
+++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
@@ -129,7 +129,8 @@ int ehca_destroy_eq(struct ehca_shca *shca, struct ehca_eq *eq);
 void *ehca_poll_eq(struct ehca_shca *shca, struct ehca_eq *eq);
 
 
-struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector,
+struct ib_cq *ehca_create_cq(struct ib_device *device,
+			     struct ib_cq_init_attr *attr,
 			     struct ib_ucontext *context,
 			     struct ib_udata *udata);
 
diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c
index 0416c6c..90c7b1f 100644
--- a/drivers/infiniband/hw/ipath/ipath_cq.c
+++ b/drivers/infiniband/hw/ipath/ipath_cq.c
@@ -188,7 +188,7 @@ static void send_complete(unsigned long data)
 /**
  * ipath_create_cq - create a completion queue
  * @ibdev: the device this completion queue is attached to
- * @entries: the minimum size of the completion queue
+ * @attr: creation attributes
  * @context: unused by the InfiniPath driver
  * @udata: unused by the InfiniPath driver
  *
@@ -197,16 +197,21 @@ static void send_complete(unsigned long data)
  *
  * Called by ib_create_cq() in the generic verbs code.
  */
-struct ib_cq *ipath_create_cq(struct ib_device *ibdev, int entries, int comp_vector,
+struct ib_cq *ipath_create_cq(struct ib_device *ibdev,
+			      struct ib_cq_init_attr *attr,
 			      struct ib_ucontext *context,
 			      struct ib_udata *udata)
 {
+	int entries = attr->cqe;
 	struct ipath_ibdev *dev = to_idev(ibdev);
 	struct ipath_cq *cq;
 	struct ipath_cq_wc *wc;
 	struct ib_cq *ret;
 	u32 sz;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	if (entries < 1 || entries > ib_ipath_max_cqes) {
 		ret = ERR_PTR(-EINVAL);
 		goto done;
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h
index ae6cff4..f502c09 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h
@@ -807,7 +807,8 @@ void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int sig);
 
 int ipath_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry);
 
-struct ib_cq *ipath_create_cq(struct ib_device *ibdev, int entries, int comp_vector,
+struct ib_cq *ipath_create_cq(struct ib_device *ibdev,
+			      struct ib_cq_init_attr *attr,
 			      struct ib_ucontext *context,
 			      struct ib_udata *udata);
 
diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 0176caa..02b0424 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -166,15 +166,21 @@ err_buf:
 	return err;
 }
 
-struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector,
+struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
+				struct ib_cq_init_attr *attr,
 				struct ib_ucontext *context,
 				struct ib_udata *udata)
 {
+	int entries = attr->cqe;
+	int vector = attr->comp_vector;
 	struct mlx4_ib_dev *dev = to_mdev(ibdev);
 	struct mlx4_ib_cq *cq;
 	struct mlx4_uar *uar;
 	int err;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	if (entries < 1 || entries > dev->dev->caps.max_cqes)
 		return ERR_PTR(-EINVAL);
 
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index fce3934..8bd07c9 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -668,7 +668,8 @@ void mlx4_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
 
 int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period);
 int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata);
-struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector,
+struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
+				struct ib_cq_init_attr *attr,
 				struct ib_ucontext *context,
 				struct ib_udata *udata);
 int mlx4_ib_destroy_cq(struct ib_cq *cq);
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 2ee6b10..e5973cc 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -736,10 +736,13 @@ static void destroy_cq_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq)
 	mlx5_db_free(dev->mdev, &cq->db);
 }
 
-struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, int entries,
-				int vector, struct ib_ucontext *context,
+struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev,
+				struct ib_cq_init_attr *attr,
+				struct ib_ucontext *context,
 				struct ib_udata *udata)
 {
+	int entries = attr->cqe;
+	int vector = attr->comp_vector;
 	struct mlx5_create_cq_mbox_in *cqb = NULL;
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
 	struct mlx5_ib_cq *cq;
@@ -750,6 +753,9 @@ struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, int entries,
 	int eqn;
 	int err;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	if (entries < 0)
 		return ERR_PTR(-EINVAL);
 
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 8dec380..5a86c45 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1093,6 +1093,7 @@ static int create_dev_resources(struct mlx5_ib_resources *devr)
 {
 	struct ib_srq_init_attr attr;
 	struct mlx5_ib_dev *dev;
+	struct ib_cq_init_attr cq_attr = {.cqe = 1};
 	int ret = 0;
 
 	dev = container_of(devr, struct mlx5_ib_dev, devr);
@@ -1106,7 +1107,7 @@ static int create_dev_resources(struct mlx5_ib_resources *devr)
 	devr->p0->uobject = NULL;
 	atomic_set(&devr->p0->usecnt, 0);
 
-	devr->c0 = mlx5_ib_create_cq(&dev->ib_dev, 1, 0, NULL, NULL);
+	devr->c0 = mlx5_ib_create_cq(&dev->ib_dev, &cq_attr, NULL, NULL);
 	if (IS_ERR(devr->c0)) {
 		ret = PTR_ERR(devr->c0);
 		goto error1;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index dff1cfc..1c3547b 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -556,8 +556,9 @@ int mlx5_ib_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
 void *mlx5_get_send_wqe(struct mlx5_ib_qp *qp, int n);
 int mlx5_ib_read_user_wqe(struct mlx5_ib_qp *qp, int send, int wqe_index,
 			  void *buffer, u32 length);
-struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, int entries,
-				int vector, struct ib_ucontext *context,
+struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev,
+				struct ib_cq_init_attr *attr,
+				struct ib_ucontext *context,
 				struct ib_udata *udata);
 int mlx5_ib_destroy_cq(struct ib_cq *cq);
 int mlx5_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc);
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index ad1cca3..a3ebbab 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -647,16 +647,20 @@ static int mthca_destroy_qp(struct ib_qp *qp)
 	return 0;
 }
 
-static struct ib_cq *mthca_create_cq(struct ib_device *ibdev, int entries,
-				     int comp_vector,
+static struct ib_cq *mthca_create_cq(struct ib_device *ibdev,
+				     struct ib_cq_init_attr *attr,
 				     struct ib_ucontext *context,
 				     struct ib_udata *udata)
 {
+	int entries = attr->cqe;
 	struct mthca_create_cq ucmd;
 	struct mthca_cq *cq;
 	int nent;
 	int err;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	if (entries < 1 || entries > to_mdev(ibdev)->limits.max_cqes)
 		return ERR_PTR(-EINVAL);
 
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index 027f6d1..f76bd8a 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -1532,10 +1532,12 @@ static int nes_destroy_qp(struct ib_qp *ibqp)
 /**
  * nes_create_cq
  */
-static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries,
-		int comp_vector,
-		struct ib_ucontext *context, struct ib_udata *udata)
+static struct ib_cq *nes_create_cq(struct ib_device *ibdev,
+				   struct ib_cq_init_attr *attr,
+				   struct ib_ucontext *context,
+				   struct ib_udata *udata)
 {
+	int entries = attr->cqe;
 	u64 u64temp;
 	struct nes_vnic *nesvnic = to_nesvnic(ibdev);
 	struct nes_device *nesdev = nesvnic->nesdev;
@@ -1555,6 +1557,9 @@ static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries,
 	unsigned long flags;
 	int ret;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	if (entries > nesadapter->max_cqe)
 		return ERR_PTR(-EINVAL);
 
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 3e98360..a4cb8f7 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -1006,10 +1006,12 @@ err:
 	return status;
 }
 
-struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev, int entries, int vector,
+struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev,
+			       struct ib_cq_init_attr *attr,
 			       struct ib_ucontext *ib_ctx,
 			       struct ib_udata *udata)
 {
+	int entries = attr->cqe;
 	struct ocrdma_cq *cq;
 	struct ocrdma_dev *dev = get_ocrdma_dev(ibdev);
 	struct ocrdma_ucontext *uctx = NULL;
@@ -1017,6 +1019,9 @@ struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev, int entries, int vector,
 	int status;
 	struct ocrdma_create_cq_ureq ureq;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	if (udata) {
 		if (ib_copy_from_udata(&ureq, udata, sizeof(ureq)))
 			return ERR_PTR(-EFAULT);
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index 3cdc81e..aab99ae 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -59,8 +59,10 @@ struct ib_pd *ocrdma_alloc_pd(struct ib_device *,
 			      struct ib_ucontext *, struct ib_udata *);
 int ocrdma_dealloc_pd(struct ib_pd *pd);
 
-struct ib_cq *ocrdma_create_cq(struct ib_device *, int entries, int vector,
-			       struct ib_ucontext *, struct ib_udata *);
+struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev,
+			       struct ib_cq_init_attr *attr,
+			       struct ib_ucontext *ib_ctx,
+			       struct ib_udata *udata);
 int ocrdma_resize_cq(struct ib_cq *, int cqe, struct ib_udata *);
 int ocrdma_destroy_cq(struct ib_cq *);
 
diff --git a/drivers/infiniband/hw/qib/qib_cq.c b/drivers/infiniband/hw/qib/qib_cq.c
index ab4e11c..2cf8544 100644
--- a/drivers/infiniband/hw/qib/qib_cq.c
+++ b/drivers/infiniband/hw/qib/qib_cq.c
@@ -203,7 +203,7 @@ static void send_complete(struct kthread_work *work)
 /**
  * qib_create_cq - create a completion queue
  * @ibdev: the device this completion queue is attached to
- * @entries: the minimum size of the completion queue
+ * @attr: creation attributes
  * @context: unused by the QLogic_IB driver
  * @udata: user data for libibverbs.so
  *
@@ -212,16 +212,21 @@ static void send_complete(struct kthread_work *work)
  *
  * Called by ib_create_cq() in the generic verbs code.
  */
-struct ib_cq *qib_create_cq(struct ib_device *ibdev, int entries,
-			    int comp_vector, struct ib_ucontext *context,
+struct ib_cq *qib_create_cq(struct ib_device *ibdev,
+			    struct ib_cq_init_attr *attr,
+			    struct ib_ucontext *context,
 			    struct ib_udata *udata)
 {
+	int entries = attr->cqe;
 	struct qib_ibdev *dev = to_idev(ibdev);
 	struct qib_cq *cq;
 	struct qib_cq_wc *wc;
 	struct ib_cq *ret;
 	u32 sz;
 
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	if (entries < 1 || entries > ib_qib_max_cqes) {
 		ret = ERR_PTR(-EINVAL);
 		goto done;
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index bfc8948..8c19015 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -1007,8 +1007,9 @@ void qib_cq_enter(struct qib_cq *cq, struct ib_wc *entry, int sig);
 
 int qib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry);
 
-struct ib_cq *qib_create_cq(struct ib_device *ibdev, int entries,
-			    int comp_vector, struct ib_ucontext *context,
+struct ib_cq *qib_create_cq(struct ib_device *ibdev,
+			    struct ib_cq_init_attr *attr,
+			    struct ib_ucontext *context,
 			    struct ib_udata *udata);
 
 int qib_destroy_cq(struct ib_cq *ibcq);
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index 732b5c5..e282a2f 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -576,13 +576,17 @@ int usnic_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 	return status;
 }
 
-struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev, int entries,
-					int vector, struct ib_ucontext *context,
-					struct ib_udata *udata)
+struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev,
+				 struct ib_cq_init_attr *attr,
+				 struct ib_ucontext *context,
+				 struct ib_udata *udata)
 {
 	struct ib_cq *cq;
 
 	usnic_dbg("\n");
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
 	cq = kzalloc(sizeof(*cq), GFP_KERNEL);
 	if (!cq)
 		return ERR_PTR(-EBUSY);
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
index 57ddba5..a484404 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
@@ -46,9 +46,10 @@ struct ib_qp *usnic_ib_create_qp(struct ib_pd *pd,
 int usnic_ib_destroy_qp(struct ib_qp *qp);
 int usnic_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 				int attr_mask, struct ib_udata *udata);
-struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev, int entries,
-					int vector, struct ib_ucontext *context,
-					struct ib_udata *udata);
+struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev,
+				 struct ib_cq_init_attr *attr,
+				 struct ib_ucontext *context,
+				 struct ib_udata *udata);
 int usnic_ib_destroy_cq(struct ib_cq *cq);
 struct ib_mr *usnic_ib_reg_mr(struct ib_pd *pd, u64 start, u64 length,
 				u64 virt_addr, int access_flags,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 8d59479..ad0e2ea 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -173,6 +173,12 @@ struct ib_odp_caps {
 	} per_transport_caps;
 };
 
+struct ib_cq_init_attr {
+	int cqe;
+	int comp_vector;
+	u32 flags;
+};
+
 struct ib_device_attr {
 	u64			fw_ver;
 	__be64			sys_image_guid;
@@ -1570,8 +1576,8 @@ struct ib_device {
 	int                        (*post_recv)(struct ib_qp *qp,
 						struct ib_recv_wr *recv_wr,
 						struct ib_recv_wr **bad_recv_wr);
-	struct ib_cq *             (*create_cq)(struct ib_device *device, int cqe,
-						int comp_vector,
+	struct ib_cq *             (*create_cq)(struct ib_device *device,
+						struct ib_cq_init_attr *attr,
 						struct ib_ucontext *context,
 						struct ib_udata *udata);
 	int                        (*modify_cq)(struct ib_cq *cq, u16 cq_count,
-- 
1.7.1


* [PATCH for-next 02/10] IB/core: Add flags parameter for ib_create_cq
       [not found] ` <1431869786-6308-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-05-17 13:36   ` [PATCH for-next 01/10] IB/core: Change provider's API of create_cq to be extendible Or Gerlitz
@ 2015-05-17 13:36   ` Or Gerlitz
       [not found]     ` <1431869786-6308-3-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-05-17 13:36   ` [PATCH for-next 03/10] IB/core: Extend ib_uverbs_create_cq Or Gerlitz
                     ` (7 subsequent siblings)
  9 siblings, 1 reply; 71+ messages in thread
From: Or Gerlitz @ 2015-05-17 13:36 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

We add a flags parameter for ib_create_cq in order
to specify creation flags for the CQ.

This commit does not change any functionality.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/mad.c                      |    4 ++--
 drivers/infiniband/core/verbs.c                    |    5 +++--
 drivers/infiniband/hw/ehca/ehca_main.c             |    3 ++-
 drivers/infiniband/hw/mlx4/mad.c                   |    2 +-
 drivers/infiniband/hw/mlx4/main.c                  |    2 +-
 drivers/infiniband/hw/mlx5/main.c                  |    2 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c         |    5 +++--
 drivers/infiniband/ulp/iser/iser_verbs.c           |    8 +++-----
 drivers/infiniband/ulp/isert/ib_isert.c            |    7 ++++---
 drivers/infiniband/ulp/srp/ib_srp.c                |    4 ++--
 drivers/infiniband/ulp/srpt/ib_srpt.c              |    2 +-
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    7 +++----
 include/rdma/ib_verbs.h                            |    6 +++++-
 net/9p/trans_rdma.c                                |    2 +-
 net/rds/ib_cm.c                                    |    4 ++--
 net/rds/iw_cm.c                                    |    8 ++++----
 net/sunrpc/xprtrdma/svc_rdma_transport.c           |   15 +++++----------
 net/sunrpc/xprtrdma/verbs.c                        |    8 ++++----
 18 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 600af26..2abcdcb 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2944,8 +2944,8 @@ static int ib_mad_port_open(struct ib_device *device,
 		cq_size *= 2;
 
 	port_priv->cq = ib_create_cq(port_priv->device,
-				     ib_mad_thread_completion_handler,
-				     NULL, port_priv, cq_size, 0);
+				     ib_mad_thread_completion_handler, NULL,
+				     port_priv, cq_size, 0, 0);
 	if (IS_ERR(port_priv->cq)) {
 		dev_err(&device->dev, "Couldn't create ib_mad CQ\n");
 		ret = PTR_ERR(port_priv->cq);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 0da8163..56c2873 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1010,10 +1010,11 @@ EXPORT_SYMBOL(ib_destroy_qp);
 struct ib_cq *ib_create_cq(struct ib_device *device,
 			   ib_comp_handler comp_handler,
 			   void (*event_handler)(struct ib_event *, void *),
-			   void *cq_context, int cqe, int comp_vector)
+			   void *cq_context, int cqe, int comp_vector, u32 flags)
 {
 	struct ib_cq *cq;
-	struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector};
+	struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector,
+				       .flags = flags};
 
 	cq = device->create_cq(device, &attr, NULL, NULL);
 
diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index 321545b..81869e3 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -542,7 +542,8 @@ static int ehca_create_aqp1(struct ehca_shca *shca, u32 port)
 		return -EPERM;
 	}
 
-	ibcq = ib_create_cq(&shca->ib_device, NULL, NULL, (void *)(-1), 10, 0);
+	ibcq = ib_create_cq(&shca->ib_device, NULL, NULL, (void *)(-1), 10,
+			    0, 0);
 	if (IS_ERR(ibcq)) {
 		ehca_err(&shca->ib_device, "Cannot create AQP1 CQ.");
 		return PTR_ERR(ibcq);
diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index 9cd2b00..96bf34e 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -1802,7 +1802,7 @@ static int create_pv_resources(struct ib_device *ibdev, int slave, int port,
 		cq_size *= 2;
 
 	ctx->cq = ib_create_cq(ctx->ib_dev, mlx4_ib_tunnel_comp_handler,
-			       NULL, ctx, cq_size, 0);
+			       NULL, ctx, cq_size, 0, 0);
 	if (IS_ERR(ctx->cq)) {
 		ret = PTR_ERR(ctx->cq);
 		pr_err("Couldn't create tunnel CQ (%d)\n", ret);
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 26678d2..fea61fd 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -786,7 +786,7 @@ static struct ib_xrcd *mlx4_ib_alloc_xrcd(struct ib_device *ibdev,
 		goto err2;
 	}
 
-	xrcd->cq = ib_create_cq(ibdev, NULL, NULL, xrcd, 1, 0);
+	xrcd->cq = ib_create_cq(ibdev, NULL, NULL, xrcd, 1, 0, 0);
 	if (IS_ERR(xrcd->cq)) {
 		err = PTR_ERR(xrcd->cq);
 		goto err3;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 5a86c45..720ebe3 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1001,7 +1001,7 @@ static int create_umr_res(struct mlx5_ib_dev *dev)
 	}
 
 	cq = ib_create_cq(&dev->ib_dev, mlx5_umr_cq_handler, NULL, NULL, 128,
-			  0);
+			  0, 0);
 	if (IS_ERR(cq)) {
 		mlx5_ib_dbg(dev, "Couldn't create CQ for sync UMR QP\n");
 		ret = PTR_ERR(cq);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index e5cc430..2bf241b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -178,14 +178,15 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 	} else
 		goto out_free_wq;
 
-	priv->recv_cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size, 0);
+	priv->recv_cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL,
+				     dev, size, 0, 0);
 	if (IS_ERR(priv->recv_cq)) {
 		printk(KERN_WARNING "%s: failed to create receive CQ\n", ca->name);
 		goto out_cm_dev_cleanup;
 	}
 
 	priv->send_cq = ib_create_cq(priv->ca, ipoib_send_comp_handler, NULL,
-				     dev, ipoib_sendq_size, 0);
+				     dev, ipoib_sendq_size, 0, 0);
 	if (IS_ERR(priv->send_cq)) {
 		printk(KERN_WARNING "%s: failed to create send CQ\n", ca->name);
 		goto out_free_recv_cq;
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index cc2dd35..922d322 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -126,11 +126,9 @@ static int iser_create_device_ib_res(struct iser_device *device)
 		struct iser_comp *comp = &device->comps[i];
 
 		comp->device = device;
-		comp->cq = ib_create_cq(device->ib_device,
-					iser_cq_callback,
-					iser_cq_event_callback,
-					(void *)comp,
-					max_cqe, i);
+		comp->cq = ib_create_cq(device->ib_device, iser_cq_callback,
+					iser_cq_event_callback, (void *)comp,
+					max_cqe, i, 0);
 		if (IS_ERR(comp->cq)) {
 			comp->cq = NULL;
 			goto cq_err;
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 327529e..f7d9ae0 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -320,11 +320,12 @@ isert_alloc_comps(struct isert_device *device,
 
 		comp->device = device;
 		INIT_WORK(&comp->work, isert_cq_work);
-		comp->cq = ib_create_cq(device->ib_device,
-					isert_cq_callback,
+		comp->cq = ib_create_cq(device->ib_device, isert_cq_callback,
 					isert_cq_event_callback,
 					(void *)comp,
-					max_cqe, i);
+					max_cqe,
+					i,
+					0);
 		if (IS_ERR(comp->cq)) {
 			isert_err("Unable to allocate cq\n");
 			ret = PTR_ERR(comp->cq);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 918814c..1a2214a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -507,14 +507,14 @@ static int srp_create_ch_ib(struct srp_rdma_ch *ch)
 
 	/* + 1 for SRP_LAST_WR_ID */
 	recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, ch,
-			       target->queue_size + 1, ch->comp_vector);
+			       target->queue_size + 1, ch->comp_vector, 0);
 	if (IS_ERR(recv_cq)) {
 		ret = PTR_ERR(recv_cq);
 		goto err;
 	}
 
 	send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, ch,
-			       m * target->queue_size, ch->comp_vector);
+			       m * target->queue_size, ch->comp_vector, 0);
 	if (IS_ERR(send_cq)) {
 		ret = PTR_ERR(send_cq);
 		goto err_recv_cq;
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 9b84b4c..7923683 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -2091,7 +2091,7 @@ static int srpt_create_ch_ib(struct srpt_rdma_ch *ch)
 
 retry:
 	ch->cq = ib_create_cq(sdev->device, srpt_completion, NULL, ch,
-			      ch->rq_size + srp_sq_size, 0);
+			      ch->rq_size + srp_sq_size, 0, 0);
 	if (IS_ERR(ch->cq)) {
 		ret = PTR_ERR(ch->cq);
 		pr_err("failed to create CQ cqe= %d ret= %d\n",
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
index 3bad441..f7b9e4b 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
@@ -742,10 +742,9 @@ kib_conn_t *kiblnd_create_conn(kib_peer_t *peer, struct rdma_cm_id *cmid,
 
 	kiblnd_map_rx_descs(conn);
 
-	cq = ib_create_cq(cmid->device,
-			  kiblnd_cq_completion, kiblnd_cq_event, conn,
-			  IBLND_CQ_ENTRIES(version),
-			  kiblnd_get_completion_vector(conn, cpt));
+	cq = ib_create_cq(cmid->device, kiblnd_cq_completion,
+			  kiblnd_cq_event, conn, IBLND_CQ_ENTRIES(version),
+			  kiblnd_get_completion_vector(conn, cpt), 0);
 	if (IS_ERR(cq)) {
 		CERROR("Can't create CQ: %ld, cqe: %d\n",
 		       PTR_ERR(cq), IBLND_CQ_ENTRIES(version));
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index ad0e2ea..1bbe4a4 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -173,6 +173,10 @@ struct ib_odp_caps {
 	} per_transport_caps;
 };
 
+enum ib_cq_creation_flags {
+	IB_CQ_FLAGS_TIMESTAMP	= 1 << 0,
+};
+
 struct ib_cq_init_attr {
 	int cqe;
 	int comp_vector;
@@ -2193,7 +2197,7 @@ static inline int ib_post_recv(struct ib_qp *qp,
 struct ib_cq *ib_create_cq(struct ib_device *device,
 			   ib_comp_handler comp_handler,
 			   void (*event_handler)(struct ib_event *, void *),
-			   void *cq_context, int cqe, int comp_vector);
+			   void *cq_context, int cqe, int comp_vector, u32 flags);
 
 /**
  * ib_resize_cq - Modifies the capacity of the CQ.
diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index 3533d2a..9b75f8c 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -707,7 +707,7 @@ rdma_create_trans(struct p9_client *client, const char *addr, char *args)
 	/* Create the Completion Queue */
 	rdma->cq = ib_create_cq(rdma->cm_id->device, cq_comp_handler,
 				cq_event_handler, client,
-				opts.sq_depth + opts.rq_depth + 1, 0);
+				opts.sq_depth + opts.rq_depth + 1, 0, 0);
 	if (IS_ERR(rdma->cq))
 		goto error;
 	ib_req_notify_cq(rdma->cq, IB_CQ_NEXT_COMP);
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index 31b74f5..a7c28ac 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -293,7 +293,7 @@ static int rds_ib_setup_qp(struct rds_connection *conn)
 
 	ic->i_send_cq = ib_create_cq(dev, rds_ib_send_cq_comp_handler,
 				     rds_ib_cq_event_handler, conn,
-				     ic->i_send_ring.w_nr + 1, 0);
+				     ic->i_send_ring.w_nr + 1, 0, 0);
 	if (IS_ERR(ic->i_send_cq)) {
 		ret = PTR_ERR(ic->i_send_cq);
 		ic->i_send_cq = NULL;
@@ -303,7 +303,7 @@ static int rds_ib_setup_qp(struct rds_connection *conn)
 
 	ic->i_recv_cq = ib_create_cq(dev, rds_ib_recv_cq_comp_handler,
 				     rds_ib_cq_event_handler, conn,
-				     ic->i_recv_ring.w_nr, 0);
+				     ic->i_recv_ring.w_nr, 0, 0);
 	if (IS_ERR(ic->i_recv_cq)) {
 		ret = PTR_ERR(ic->i_recv_cq);
 		ic->i_recv_cq = NULL;
diff --git a/net/rds/iw_cm.c b/net/rds/iw_cm.c
index a6c2bea..aec9189 100644
--- a/net/rds/iw_cm.c
+++ b/net/rds/iw_cm.c
@@ -199,8 +199,8 @@ static int rds_iw_init_qp_attrs(struct ib_qp_init_attr *attr,
 	attr->qp_type = IB_QPT_RC;
 
 	attr->send_cq = ib_create_cq(dev, send_cq_handler,
-				     rds_iw_cq_event_handler,
-				     context, send_size, 0);
+				     rds_iw_cq_event_handler, context,
+				     send_size, 0, 0);
 	if (IS_ERR(attr->send_cq)) {
 		ret = PTR_ERR(attr->send_cq);
 		attr->send_cq = NULL;
@@ -209,8 +209,8 @@ static int rds_iw_init_qp_attrs(struct ib_qp_init_attr *attr,
 	}
 
 	attr->recv_cq = ib_create_cq(dev, recv_cq_handler,
-				     rds_iw_cq_event_handler,
-				     context, recv_size, 0);
+				     rds_iw_cq_event_handler, context,
+				     recv_size, 0, 0);
 	if (IS_ERR(attr->recv_cq)) {
 		ret = PTR_ERR(attr->recv_cq);
 		attr->recv_cq = NULL;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 3df8320..cfb5915 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -901,21 +901,16 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 		goto errout;
 	}
 	newxprt->sc_sq_cq = ib_create_cq(newxprt->sc_cm_id->device,
-					 sq_comp_handler,
-					 cq_event_handler,
-					 newxprt,
-					 newxprt->sc_sq_depth,
-					 0);
+					 sq_comp_handler, cq_event_handler,
+					 newxprt, newxprt->sc_sq_depth, 0, 0);
 	if (IS_ERR(newxprt->sc_sq_cq)) {
 		dprintk("svcrdma: error creating SQ CQ for connect request\n");
 		goto errout;
 	}
 	newxprt->sc_rq_cq = ib_create_cq(newxprt->sc_cm_id->device,
-					 rq_comp_handler,
-					 cq_event_handler,
-					 newxprt,
-					 newxprt->sc_max_requests,
-					 0);
+					 rq_comp_handler, cq_event_handler,
+					 newxprt, newxprt->sc_max_requests,
+					 0, 0);
 	if (IS_ERR(newxprt->sc_rq_cq)) {
 		dprintk("svcrdma: error creating RQ CQ for connect request\n");
 		goto errout;
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 4870d27..77a65c6 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -772,8 +772,8 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	INIT_DELAYED_WORK(&ep->rep_connect_worker, rpcrdma_connect_worker);
 
 	sendcq = ib_create_cq(ia->ri_id->device, rpcrdma_sendcq_upcall,
-				  rpcrdma_cq_async_error_upcall, ep,
-				  ep->rep_attr.cap.max_send_wr + 1, 0);
+			      rpcrdma_cq_async_error_upcall, ep,
+			      ep->rep_attr.cap.max_send_wr + 1, 0, 0);
 	if (IS_ERR(sendcq)) {
 		rc = PTR_ERR(sendcq);
 		dprintk("RPC:       %s: failed to create send CQ: %i\n",
@@ -789,8 +789,8 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	}
 
 	recvcq = ib_create_cq(ia->ri_id->device, rpcrdma_recvcq_upcall,
-				  rpcrdma_cq_async_error_upcall, ep,
-				  ep->rep_attr.cap.max_recv_wr + 1, 0);
+			      rpcrdma_cq_async_error_upcall, ep,
+			      ep->rep_attr.cap.max_recv_wr + 1, 0, 0);
 	if (IS_ERR(recvcq)) {
 		rc = PTR_ERR(recvcq);
 		dprintk("RPC:       %s: failed to create recv CQ: %i\n",
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH for-next 03/10] IB/core: Extend ib_uverbs_create_cq
       [not found] ` <1431869786-6308-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-05-17 13:36   ` [PATCH for-next 01/10] IB/core: Change provider's API of create_cq to be extendible Or Gerlitz
  2015-05-17 13:36   ` [PATCH for-next 02/10] IB/core: Add flags parameter for ib_create_cq Or Gerlitz
@ 2015-05-17 13:36   ` Or Gerlitz
  2015-05-17 13:36   ` [PATCH for-next 04/10] IB/core: Add timestamp_mask and hca_core_clock to query_device Or Gerlitz
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 71+ messages in thread
From: Or Gerlitz @ 2015-05-17 13:36 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

ib_uverbs_ex_create_cq follows the extension verbs
mechanism. New features (for example, the CQ creation flags
field which is added in a downstream patch) could be used
via user-space libraries without breaking the ABI.
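
The idea behind the extension mechanism can be sketched in user space: a
field is honoured only when the command the caller supplied is long enough
to contain it. The struct below copies the layout of
ib_uverbs_ex_create_cq from this patch using stdint types; the names
ex_create_cq_cmd and flags_if_present are hypothetical and exist only for
this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field order as struct ib_uverbs_ex_create_cq in this patch. */
struct ex_create_cq_cmd {
	uint64_t user_handle;
	uint32_t cqe;
	uint32_t comp_vector;
	int32_t  comp_channel;
	uint32_t comp_mask;
	uint32_t flags;
	uint32_t reserved;
};

/* Hypothetical helper: return the flags only if cmd_sz bytes of command
 * reach past the flags field, using the same strict comparison as the
 * patch; otherwise behave as if no flags were given. */
uint32_t flags_if_present(const struct ex_create_cq_cmd *cmd, size_t cmd_sz)
{
	if (cmd_sz > offsetof(struct ex_create_cq_cmd, flags) + sizeof(cmd->flags))
		return cmd->flags;

	return 0;
}
```

With this check, the legacy command path (whose size stops at
comp_channel) never sees flags, while a full-sized extended command does.
Because the comparison is strict, a command that ends exactly at the flags
field would also be treated as carrying no flags.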

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h      |    1 +
 drivers/infiniband/core/uverbs_cmd.c  |  172 ++++++++++++++++++++++++++-------
 drivers/infiniband/core/uverbs_main.c |    1 +
 include/uapi/rdma/ib_user_verbs.h     |   17 +++
 4 files changed, 155 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index b716b08..ba365b6 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -259,5 +259,6 @@ IB_UVERBS_DECLARE_CMD(close_xrcd);
 IB_UVERBS_DECLARE_EX_CMD(create_flow);
 IB_UVERBS_DECLARE_EX_CMD(destroy_flow);
 IB_UVERBS_DECLARE_EX_CMD(query_device);
+IB_UVERBS_DECLARE_EX_CMD(create_cq);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index f3d5830..5a6f17d 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1330,41 +1330,37 @@ ssize_t ib_uverbs_create_comp_channel(struct ib_uverbs_file *file,
 	return in_len;
 }
 
-ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
-			    const char __user *buf, int in_len,
-			    int out_len)
+static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
+				       struct ib_udata *ucore,
+				       struct ib_udata *uhw,
+				       struct ib_uverbs_ex_create_cq *cmd,
+				       size_t cmd_sz,
+				       int (*cb)(struct ib_uverbs_file *file,
+						 struct ib_ucq_object *obj,
+						 struct ib_uverbs_ex_create_cq_resp *resp,
+						 struct ib_udata *udata,
+						 void *context),
+				       void *context)
 {
-	struct ib_uverbs_create_cq      cmd;
-	struct ib_uverbs_create_cq_resp resp;
-	struct ib_udata                 udata;
 	struct ib_ucq_object           *obj;
 	struct ib_uverbs_event_file    *ev_file = NULL;
 	struct ib_cq                   *cq;
 	int                             ret;
-	struct ib_cq_init_attr attr = {.cqe = 0};
-
-	if (out_len < sizeof resp)
-		return -ENOSPC;
-
-	if (copy_from_user(&cmd, buf, sizeof cmd))
-		return -EFAULT;
+	struct ib_uverbs_ex_create_cq_resp resp;
+	struct ib_cq_init_attr attr = {.flags = 0};
 
-	INIT_UDATA(&udata, buf + sizeof cmd,
-		   (unsigned long) cmd.response + sizeof resp,
-		   in_len - sizeof cmd, out_len - sizeof resp);
-
-	if (cmd.comp_vector >= file->device->num_comp_vectors)
-		return -EINVAL;
+	if (cmd->comp_vector >= file->device->num_comp_vectors)
+		return ERR_PTR(-EINVAL);
 
 	obj = kmalloc(sizeof *obj, GFP_KERNEL);
 	if (!obj)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
-	init_uobj(&obj->uobject, cmd.user_handle, file->ucontext, &cq_lock_class);
+	init_uobj(&obj->uobject, cmd->user_handle, file->ucontext, &cq_lock_class);
 	down_write(&obj->uobject.mutex);
 
-	if (cmd.comp_channel >= 0) {
-		ev_file = ib_uverbs_lookup_comp_file(cmd.comp_channel);
+	if (cmd->comp_channel >= 0) {
+		ev_file = ib_uverbs_lookup_comp_file(cmd->comp_channel);
 		if (!ev_file) {
 			ret = -EINVAL;
 			goto err;
@@ -1377,10 +1373,14 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
 	INIT_LIST_HEAD(&obj->comp_list);
 	INIT_LIST_HEAD(&obj->async_list);
 
-	attr.cqe = cmd.cqe;
-	attr.comp_vector = cmd.comp_vector;
+	attr.cqe = cmd->cqe;
+	attr.comp_vector = cmd->comp_vector;
+
+	if (cmd_sz > offsetof(typeof(*cmd), flags) + sizeof(cmd->flags))
+		attr.flags = cmd->flags;
+
 	cq = file->device->ib_dev->create_cq(file->device->ib_dev, &attr,
-					     file->ucontext, &udata);
+					     file->ucontext, uhw);
 	if (IS_ERR(cq)) {
 		ret = PTR_ERR(cq);
 		goto err_file;
@@ -1399,14 +1399,15 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
 		goto err_free;
 
 	memset(&resp, 0, sizeof resp);
-	resp.cq_handle = obj->uobject.id;
-	resp.cqe       = cq->cqe;
+	resp.base.cq_handle = obj->uobject.id;
+	resp.base.cqe       = cq->cqe;
 
-	if (copy_to_user((void __user *) (unsigned long) cmd.response,
-			 &resp, sizeof resp)) {
-		ret = -EFAULT;
-		goto err_copy;
-	}
+	resp.response_length = offsetof(typeof(resp), response_length) +
+		sizeof(resp.response_length);
+
+	ret = cb(file, obj, &resp, ucore, context);
+	if (ret)
+		goto err_cb;
 
 	mutex_lock(&file->mutex);
 	list_add_tail(&obj->uobject.list, &file->ucontext->cq_list);
@@ -1416,9 +1417,9 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
 
 	up_write(&obj->uobject.mutex);
 
-	return in_len;
+	return obj;
 
-err_copy:
+err_cb:
 	idr_remove_uobj(&ib_uverbs_cq_idr, &obj->uobject);
 
 err_free:
@@ -1430,7 +1431,106 @@ err_file:
 
 err:
 	put_uobj_write(&obj->uobject);
-	return ret;
+
+	return ERR_PTR(ret);
+}
+
+static int ib_uverbs_create_cq_cb(struct ib_uverbs_file *file,
+				  struct ib_ucq_object *obj,
+				  struct ib_uverbs_ex_create_cq_resp *resp,
+				  struct ib_udata *ucore, void *context)
+{
+	if (ib_copy_to_udata(ucore, &resp->base, sizeof(resp->base)))
+		return -EFAULT;
+
+	return 0;
+}
+
+ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
+			    const char __user *buf, int in_len,
+			    int out_len)
+{
+	struct ib_uverbs_create_cq      cmd;
+	struct ib_uverbs_ex_create_cq	cmd_ex;
+	struct ib_uverbs_create_cq_resp resp;
+	struct ib_udata                 ucore;
+	struct ib_udata                 uhw;
+	struct ib_ucq_object           *obj;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, buf, sizeof(cmd)))
+		return -EFAULT;
+
+	INIT_UDATA(&ucore, buf, cmd.response, sizeof(cmd), sizeof(resp));
+
+	INIT_UDATA(&uhw, buf + sizeof(cmd),
+		   (unsigned long)cmd.response + sizeof(resp),
+		   in_len - sizeof(cmd), out_len - sizeof(resp));
+
+	memset(&cmd_ex, 0, sizeof(cmd_ex));
+	cmd_ex.user_handle = cmd.user_handle;
+	cmd_ex.cqe = cmd.cqe;
+	cmd_ex.comp_vector = cmd.comp_vector;
+	cmd_ex.comp_channel = cmd.comp_channel;
+
+	obj = create_cq(file, &ucore, &uhw, &cmd_ex,
+			offsetof(typeof(cmd_ex), comp_channel) +
+			sizeof(cmd.comp_channel), ib_uverbs_create_cq_cb,
+			NULL);
+
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	return in_len;
+}
+
+static int ib_uverbs_ex_create_cq_cb(struct ib_uverbs_file *file,
+				     struct ib_ucq_object *obj,
+				     struct ib_uverbs_ex_create_cq_resp *resp,
+				     struct ib_udata *ucore, void *context)
+{
+	if (ib_copy_to_udata(ucore, resp, resp->response_length))
+		return -EFAULT;
+
+	return 0;
+}
+
+int ib_uverbs_ex_create_cq(struct ib_uverbs_file *file,
+			   struct ib_udata *ucore,
+			   struct ib_udata *uhw)
+{
+	struct ib_uverbs_ex_create_cq_resp resp;
+	struct ib_uverbs_ex_create_cq  cmd;
+	struct ib_ucq_object           *obj;
+	int err;
+
+	if (ucore->inlen < sizeof(cmd))
+		return -EINVAL;
+
+	err = ib_copy_from_udata(&cmd, ucore, sizeof(cmd));
+	if (err)
+		return err;
+
+	if (cmd.comp_mask)
+		return -EINVAL;
+
+	if (cmd.reserved)
+		return -EINVAL;
+
+	if (ucore->outlen < (offsetof(typeof(resp), response_length) +
+			     sizeof(resp.response_length)))
+		return -ENOSPC;
+
+	obj = create_cq(file, ucore, uhw, &cmd,
+			min(ucore->inlen, sizeof(cmd)),
+			ib_uverbs_ex_create_cq_cb, NULL);
+
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	return 0;
 }
 
 ssize_t ib_uverbs_resize_cq(struct ib_uverbs_file *file,
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 88cce9b..f6eef2d 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -124,6 +124,7 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file *file,
 	[IB_USER_VERBS_EX_CMD_CREATE_FLOW]	= ib_uverbs_ex_create_flow,
 	[IB_USER_VERBS_EX_CMD_DESTROY_FLOW]	= ib_uverbs_ex_destroy_flow,
 	[IB_USER_VERBS_EX_CMD_QUERY_DEVICE]	= ib_uverbs_ex_query_device,
+	[IB_USER_VERBS_EX_CMD_CREATE_CQ]	= ib_uverbs_ex_create_cq,
 };
 
 static void ib_uverbs_add_one(struct ib_device *device);
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index b513e66..19f0256 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -91,6 +91,7 @@ enum {
 
 enum {
 	IB_USER_VERBS_EX_CMD_QUERY_DEVICE = IB_USER_VERBS_CMD_QUERY_DEVICE,
+	IB_USER_VERBS_EX_CMD_CREATE_CQ = IB_USER_VERBS_CMD_CREATE_CQ,
 	IB_USER_VERBS_EX_CMD_CREATE_FLOW = IB_USER_VERBS_CMD_THRESHOLD,
 	IB_USER_VERBS_EX_CMD_DESTROY_FLOW,
 };
@@ -353,11 +354,27 @@ struct ib_uverbs_create_cq {
 	__u64 driver_data[0];
 };
 
+struct ib_uverbs_ex_create_cq {
+	__u64 user_handle;
+	__u32 cqe;
+	__u32 comp_vector;
+	__s32 comp_channel;
+	__u32 comp_mask;
+	__u32 flags;
+	__u32 reserved;
+};
+
 struct ib_uverbs_create_cq_resp {
 	__u32 cq_handle;
 	__u32 cqe;
 };
 
+struct ib_uverbs_ex_create_cq_resp {
+	struct ib_uverbs_create_cq_resp base;
+	__u32 comp_mask;
+	__u32 response_length;
+};
+
 struct ib_uverbs_resize_cq {
 	__u64 response;
 	__u32 cq_handle;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH for-next 04/10] IB/core: Add timestamp_mask and hca_core_clock to query_device
       [not found] ` <1431869786-6308-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-05-17 13:36   ` [PATCH for-next 03/10] IB/core: Extend ib_uverbs_create_cq Or Gerlitz
@ 2015-05-17 13:36   ` Or Gerlitz
  2015-05-17 13:36   ` [PATCH for-next 05/10] IB/core: Pass hardware specific data in query_device Or Gerlitz
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 71+ messages in thread
From: Or Gerlitz @ 2015-05-17 13:36 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

In order to expose timestamps we need to expose two attributes
in query_device:
timestamp_mask - how many bits are valid in the timestamp.
		 Timestamp values can be at most 64 bits wide.
hca_core_clock - the timestamp is given in HW cycles; hca_core_clock
		 is the frequency of the HCA clock and is needed in
		 order to convert cycles to seconds.

This is added both to ib_query_device and its respective uverbs
counterpart.
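
A consumer could combine the two attributes roughly as follows. This is a
sketch under two assumptions that the text above does not state:
timestamp_mask is treated as a bitmask of the valid bits, and
hca_core_clock is taken to be in kHz. The function name hca_cycles_to_ns
is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a raw completion timestamp to nanoseconds.
 * Assumes timestamp_mask is a bitmask and core_clock_khz is in kHz. */
uint64_t hca_cycles_to_ns(uint64_t raw, uint64_t timestamp_mask,
			  uint64_t core_clock_khz)
{
	/* Keep only the bits the device reports as valid. */
	uint64_t cycles = raw & timestamp_mask;

	assert(core_clock_khz != 0);

	/* ns = cycles * 10^6 / kHz. Divide first, then scale the quotient
	 * and the remainder separately, which keeps the intermediates
	 * smaller than cycles * 10^6. */
	return (cycles / core_clock_khz) * 1000000ULL +
	       (cycles % core_clock_khz) * 1000000ULL / core_clock_khz;
}
```

For example, 427000 cycles at an assumed 427000 kHz clock is one
millisecond. Very large cycle counts can still produce a result that does
not fit in 64 bits, so a real consumer would need to decide how to handle
wraparound.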

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/device.c     |    2 ++
 drivers/infiniband/core/uverbs_cmd.c |   14 ++++++++++++++
 include/rdma/ib_verbs.h              |    2 ++
 include/uapi/rdma/ib_user_verbs.h    |    2 ++
 4 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index b360350..4962b9f 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -559,6 +559,8 @@ EXPORT_SYMBOL(ib_dispatch_event);
 int ib_query_device(struct ib_device *device,
 		    struct ib_device_attr *device_attr)
 {
+	memset(device_attr, 0, sizeof(*device_attr));
+
 	return device->query_device(device, device_attr);
 }
 EXPORT_SYMBOL(ib_query_device);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 5a6f17d..d8789f4 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3426,6 +3426,8 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 	if (ucore->outlen < resp.response_length)
 		return -ENOSPC;
 
+	memset(&attr, 0, sizeof(attr));
+
 	err = device->query_device(device, &attr);
 	if (err)
 		return err;
@@ -3450,6 +3452,18 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 #endif
 	resp.response_length += sizeof(resp.odp_caps);
 
+	if (ucore->outlen < resp.response_length + sizeof(resp.timestamp_mask))
+		goto end;
+
+	resp.timestamp_mask = attr.timestamp_mask;
+	resp.response_length += sizeof(resp.timestamp_mask);
+
+	if (ucore->outlen < resp.response_length + sizeof(resp.hca_core_clock))
+		goto end;
+
+	resp.hca_core_clock = attr.hca_core_clock;
+	resp.response_length += sizeof(resp.hca_core_clock);
+
 end:
 	err = ib_copy_to_udata(ucore, &resp, resp.response_length);
 	if (err)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1bbe4a4..63b11cb 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -227,6 +227,8 @@ struct ib_device_attr {
 	int			sig_prot_cap;
 	int			sig_guard_cap;
 	struct ib_odp_caps	odp_caps;
+	uint64_t		timestamp_mask;
+	uint64_t		hca_core_clock;
 };
 
 enum ib_mtu {
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 19f0256..f9b8843 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -223,6 +223,8 @@ struct ib_uverbs_ex_query_device_resp {
 	__u32 comp_mask;
 	__u32 response_length;
 	struct ib_uverbs_odp_caps odp_caps;
+	__u64 timestamp_mask;
+	__u64 hca_core_clock;
 };
 
 struct ib_uverbs_query_port {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH for-next 05/10] IB/core: Pass hardware specific data in query_device
       [not found] ` <1431869786-6308-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2015-05-17 13:36   ` [PATCH for-next 04/10] IB/core: Add timestamp_mask and hca_core_clock to query_device Or Gerlitz
@ 2015-05-17 13:36   ` Or Gerlitz
  2015-05-17 13:36   ` [PATCH for-next 06/10] IB/mlx4: Add mmap call to map the hardware clock Or Gerlitz
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 71+ messages in thread
From: Or Gerlitz @ 2015-05-17 13:36 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Vendors should be able to pass vendor-specific data to/from
user-space via the query_device uverb. In order to do this,
we need to pass the vendor-specific udata to the provider's
query_device callback.
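
Providers that have no vendor-specific data to exchange can reject any
non-empty udata, which is the check this patch adds to the drivers it
touches. The snippet below is a user-space sketch of that check only:
udata_len is a reduced stand-in for struct ib_udata and
reject_vendor_udata is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Reduced stand-in for struct ib_udata: only the two lengths used here. */
struct udata_len {
	size_t inlen;
	size_t outlen;
};

/* Hypothetical helper: a provider without vendor-specific query_device
 * data accepts only an empty udata and fails otherwise. */
int reject_vendor_udata(const struct udata_len *uhw)
{
	if (uhw->inlen || uhw->outlen)
		return -EINVAL;

	return 0;
}
```

The in-kernel ib_query_device path passes an empty udata, so it is
unaffected by the check; only a user-space caller that supplies vendor
data to a provider that does not understand it gets an error.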

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/device.c             |    4 +++-
 drivers/infiniband/core/uverbs_cmd.c         |    2 +-
 drivers/infiniband/hw/amso1100/c2_provider.c |    7 +++++--
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |    8 ++++++--
 drivers/infiniband/hw/cxgb4/provider.c       |    8 ++++++--
 drivers/infiniband/hw/ehca/ehca_hca.c        |    6 +++++-
 drivers/infiniband/hw/ehca/ehca_iverbs.h     |    3 ++-
 drivers/infiniband/hw/ipath/ipath_verbs.c    |    7 +++++--
 drivers/infiniband/hw/mlx4/main.c            |    6 +++++-
 drivers/infiniband/hw/mlx5/main.c            |    9 +++++++--
 drivers/infiniband/hw/mthca/mthca_provider.c |    7 +++++--
 drivers/infiniband/hw/nes/nes_verbs.c        |    6 +++++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |    6 +++++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  |    3 ++-
 drivers/infiniband/hw/qib/qib_verbs.c        |    6 ++++--
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c |    6 +++++-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h |    3 ++-
 include/rdma/ib_verbs.h                      |    3 ++-
 18 files changed, 75 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 4962b9f..c459aa1 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -559,9 +559,11 @@ EXPORT_SYMBOL(ib_dispatch_event);
 int ib_query_device(struct ib_device *device,
 		    struct ib_device_attr *device_attr)
 {
+	struct ib_udata uhw = {.outlen = 0, .inlen = 0};
+
 	memset(device_attr, 0, sizeof(*device_attr));
 
-	return device->query_device(device, device_attr);
+	return device->query_device(device, device_attr, &uhw);
 }
 EXPORT_SYMBOL(ib_query_device);
 
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index d8789f4..f3d47d1 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3428,7 +3428,7 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 
 	memset(&attr, 0, sizeof(attr));
 
-	err = device->query_device(device, &attr);
+	err = device->query_device(device, &attr, uhw);
 	if (err)
 		return err;
 
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index de82025..17f0092 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -63,13 +63,16 @@
 #include "c2_provider.h"
 #include "c2_user.h"
 
-static int c2_query_device(struct ib_device *ibdev,
-			   struct ib_device_attr *props)
+static int c2_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			   struct ib_udata *uhw)
 {
 	struct c2_dev *c2dev = to_c2dev(ibdev);
 
 	pr_debug("%s:%u\n", __func__, __LINE__);
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	*props = c2dev->props;
 	return 0;
 }
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 2d8c783..157fc36 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1150,13 +1150,17 @@ static u64 fw_vers_string_to_u64(struct iwch_dev *iwch_dev)
 	       (fw_mic & 0xffff);
 }
 
-static int iwch_query_device(struct ib_device *ibdev,
-			     struct ib_device_attr *props)
+static int iwch_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			     struct ib_udata *uhw)
 {
 
 	struct iwch_dev *dev;
+
 	PDBG("%s ibdev %p\n", __func__, ibdev);
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	dev = to_iwch_dev(ibdev);
 	memset(props, 0, sizeof *props);
 	memcpy(&props->sys_image_guid, dev->rdev.t3cdev_p->lldev->dev_addr, 6);
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index f52ee63..03e82d4 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -301,13 +301,17 @@ static int c4iw_query_gid(struct ib_device *ibdev, u8 port, int index,
 	return 0;
 }
 
-static int c4iw_query_device(struct ib_device *ibdev,
-			     struct ib_device_attr *props)
+static int c4iw_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			     struct ib_udata *uhw)
 {
 
 	struct c4iw_dev *dev;
+
 	PDBG("%s ibdev %p\n", __func__, ibdev);
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	dev = to_c4iw_dev(ibdev);
 	memset(props, 0, sizeof *props);
 	memcpy(&props->sys_image_guid, dev->rdev.lldi.ports[0]->dev_addr, 6);
diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c
index 1f4dc9c..e68718c 100644
--- a/drivers/infiniband/hw/ehca/ehca_hca.c
+++ b/drivers/infiniband/hw/ehca/ehca_hca.c
@@ -50,7 +50,8 @@ static unsigned int limit_uint(unsigned int value)
 	return min_t(unsigned int, value, INT_MAX);
 }
 
-int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
+int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+		      struct ib_udata *uhw)
 {
 	int i, ret = 0;
 	struct ehca_shca *shca = container_of(ibdev, struct ehca_shca,
@@ -71,6 +72,9 @@ int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
 		IB_DEVICE_PORT_ACTIVE_EVENT,  HCA_CAP_PORT_ACTIVE_EVENT,
 	};
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	rblock = ehca_alloc_fw_ctrlblock(GFP_KERNEL);
 	if (!rblock) {
 		ehca_err(&shca->ib_device, "Can't allocate rblock memory.");
diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
index a5624fb..0353953 100644
--- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
+++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
@@ -44,7 +44,8 @@
 
 #include "ehca_classes.h"
 
-int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props);
+int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+		      struct ib_udata *uhw);
 
 int ehca_query_port(struct ib_device *ibdev, u8 port,
 		    struct ib_port_attr *props);
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 34b94c3..72970e5 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1495,11 +1495,14 @@ bail:
 	return 0;
 }
 
-static int ipath_query_device(struct ib_device *ibdev,
-			      struct ib_device_attr *props)
+static int ipath_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			      struct ib_udata *uhw)
 {
 	struct ipath_ibdev *dev = to_idev(ibdev);
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	memset(props, 0, sizeof(*props));
 
 	props->device_cap_flags = IB_DEVICE_BAD_PKEY_CNTR |
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index fea61fd..fe697ec 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -132,7 +132,8 @@ static int num_ib_ports(struct mlx4_dev *dev)
 }
 
 static int mlx4_ib_query_device(struct ib_device *ibdev,
-				struct ib_device_attr *props)
+				struct ib_device_attr *props,
+				struct ib_udata *uhw)
 {
 	struct mlx4_ib_dev *dev = to_mdev(ibdev);
 	struct ib_smp *in_mad  = NULL;
@@ -140,6 +141,9 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	int err = -ENOMEM;
 	int have_ib_ports;
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	in_mad  = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 720ebe3..989fddb 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -63,7 +63,8 @@ static char mlx5_version[] =
 	DRIVER_VERSION " (" DRIVER_RELDATE ")\n";
 
 static int mlx5_ib_query_device(struct ib_device *ibdev,
-				struct ib_device_attr *props)
+				struct ib_device_attr *props,
+				struct ib_udata *uhw)
 {
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
 	struct ib_smp *in_mad  = NULL;
@@ -74,6 +75,9 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	int max_sq_sg;
 	u64 flags;
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	gen = &dev->mdev->caps.gen;
 	in_mad  = kzalloc(sizeof(*in_mad), GFP_KERNEL);
 	out_mad = kmalloc(sizeof(*out_mad), GFP_KERNEL);
@@ -916,6 +920,7 @@ static int get_port_caps(struct mlx5_ib_dev *dev)
 	struct mlx5_general_caps *gen;
 	int err = -ENOMEM;
 	int port;
+	struct ib_udata uhw = {.inlen = 0, .outlen = 0};
 
 	gen = &dev->mdev->caps.gen;
 	pprops = kmalloc(sizeof(*pprops), GFP_KERNEL);
@@ -926,7 +931,7 @@ static int get_port_caps(struct mlx5_ib_dev *dev)
 	if (!dprops)
 		goto out;
 
-	err = mlx5_ib_query_device(&dev->ib_dev, dprops);
+	err = mlx5_ib_query_device(&dev->ib_dev, dprops, &uhw);
 	if (err) {
 		mlx5_ib_warn(dev, "query_device failed %d\n", err);
 		goto out;
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index a3ebbab..8421513 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -57,14 +57,17 @@ static void init_query_mad(struct ib_smp *mad)
 	mad->method    	   = IB_MGMT_METHOD_GET;
 }
 
-static int mthca_query_device(struct ib_device *ibdev,
-			      struct ib_device_attr *props)
+static int mthca_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			      struct ib_udata *uhw)
 {
 	struct ib_smp *in_mad  = NULL;
 	struct ib_smp *out_mad = NULL;
 	int err = -ENOMEM;
 	struct mthca_dev *mdev = to_mdev(ibdev);
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	in_mad  = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index f76bd8a..05aed87 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -512,12 +512,16 @@ static void nes_free_fast_reg_page_list(struct ib_fast_reg_page_list *pifrpl)
 /**
  * nes_query_device
  */
-static int nes_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
+static int nes_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			    struct ib_udata *uhw)
 {
 	struct nes_vnic *nesvnic = to_nesvnic(ibdev);
 	struct nes_device *nesdev = nesvnic->nesdev;
 	struct nes_ib_device *nesibdev = nesvnic->nesibdev;
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	memset(props, 0, sizeof(*props));
 	memcpy(&props->sys_image_guid, nesvnic->netdev->dev_addr, 6);
 
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index a4cb8f7..836a16c 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -61,10 +61,14 @@ int ocrdma_query_gid(struct ib_device *ibdev, u8 port,
 	return 0;
 }
 
-int ocrdma_query_device(struct ib_device *ibdev, struct ib_device_attr *attr)
+int ocrdma_query_device(struct ib_device *ibdev, struct ib_device_attr *attr,
+			struct ib_udata *uhw)
 {
 	struct ocrdma_dev *dev = get_ocrdma_dev(ibdev);
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	memset(attr, 0, sizeof *attr);
 	memcpy(&attr->fw_ver, &dev->attr.fw_ver[0],
 	       min(sizeof(dev->attr.fw_ver), sizeof(attr->fw_ver)));
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index aab99ae..2bb44cb 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -36,7 +36,8 @@ int ocrdma_post_recv(struct ib_qp *, struct ib_recv_wr *,
 int ocrdma_poll_cq(struct ib_cq *, int num_entries, struct ib_wc *wc);
 int ocrdma_arm_cq(struct ib_cq *, enum ib_cq_notify_flags flags);
 
-int ocrdma_query_device(struct ib_device *, struct ib_device_attr *props);
+int ocrdma_query_device(struct ib_device *, struct ib_device_attr *props,
+			struct ib_udata *uhw);
 int ocrdma_query_port(struct ib_device *, u8 port, struct ib_port_attr *props);
 int ocrdma_modify_port(struct ib_device *, u8 port, int mask,
 		       struct ib_port_modify *props);
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index 9fd4b28..2a4a4be 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -1550,12 +1550,14 @@ full:
 	}
 }
 
-static int qib_query_device(struct ib_device *ibdev,
-			    struct ib_device_attr *props)
+static int qib_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			    struct ib_udata *uhw)
 {
 	struct qib_devdata *dd = dd_from_ibdev(ibdev);
 	struct qib_ibdev *dev = to_idev(ibdev);
 
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
 	memset(props, 0, sizeof(*props));
 
 	props->device_cap_flags = IB_DEVICE_BAD_PKEY_CNTR |
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index e282a2f..ec198c3 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -248,7 +248,8 @@ enum rdma_link_layer usnic_ib_port_link_layer(struct ib_device *device,
 }
 
 int usnic_ib_query_device(struct ib_device *ibdev,
-				struct ib_device_attr *props)
+			  struct ib_device_attr *props,
+			  struct ib_udata *uhw)
 {
 	struct usnic_ib_dev *us_ibdev = to_usdev(ibdev);
 	union ib_gid gid;
@@ -257,6 +258,9 @@ int usnic_ib_query_device(struct ib_device *ibdev,
 	int qp_per_vf;
 
 	usnic_dbg("\n");
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	mutex_lock(&us_ibdev->usdev_lock);
 	us_ibdev->netdev->ethtool_ops->get_drvinfo(us_ibdev->netdev, &info);
 	us_ibdev->netdev->ethtool_ops->get_settings(us_ibdev->netdev, &cmd);
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
index a484404..b900613 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
@@ -24,7 +24,8 @@
 enum rdma_link_layer usnic_ib_port_link_layer(struct ib_device *device,
 						u8 port_num);
 int usnic_ib_query_device(struct ib_device *ibdev,
-				struct ib_device_attr *props);
+			  struct ib_device_attr *props,
+			  struct ib_udata *uhw);
 int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
 				struct ib_port_attr *props);
 enum rdma_protocol_type
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 63b11cb..7d3107d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1516,7 +1516,8 @@ struct ib_device {
 	int		           (*get_protocol_stats)(struct ib_device *device,
 							 union rdma_protocol_stats *stats);
 	int		           (*query_device)(struct ib_device *device,
-						   struct ib_device_attr *device_attr);
+						   struct ib_device_attr *device_attr,
+						   struct ib_udata *udata);
 	int		           (*query_port)(struct ib_device *device,
 						 u8 port_num,
 						 struct ib_port_attr *port_attr);
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 71+ messages in thread
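
For readers following patch 05 above: the guard that each provider gains can be sketched in isolation as below. This is only an illustration under stated assumptions. struct mock_udata and the mock_* functions are hypothetical stand-ins, not kernel API; only the inlen/outlen test itself is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ib_udata; only the two length
 * fields that the patch inspects are modelled here. */
struct mock_udata {
	size_t inlen;
	size_t outlen;
};

/* Hypothetical provider callback: a driver that defines no
 * vendor-specific query_device data rejects any request carrying some. */
int mock_query_device(const struct mock_udata *uhw)
{
	assert(uhw != NULL);

	if (uhw->inlen || uhw->outlen)
		return -EINVAL;

	/* ... device attributes would be filled in here ... */
	return 0;
}

/* In-kernel callers, like ib_query_device() in the patch, pass an
 * empty udata so the guard above never triggers for them. */
int mock_kernel_query(void)
{
	struct mock_udata uhw = { .inlen = 0, .outlen = 0 };

	return mock_query_device(&uhw);
}
```

Rejecting unexpected vendor data up front keeps the door open for a driver to define such data later without silently accepting garbage today.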

* [PATCH for-next 06/10] IB/mlx4: Add mmap call to map the hardware clock
       [not found] ` <1431869786-6308-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2015-05-17 13:36   ` [PATCH for-next 05/10] IB/core: Pass hardware specific data in query_device Or Gerlitz
@ 2015-05-17 13:36   ` Or Gerlitz
  2015-05-17 13:36   ` [PATCH for-next 07/10] IB/mlx4: Support extended create_cq and query_device uverbs Or Gerlitz
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 71+ messages in thread
From: Or Gerlitz @ 2015-05-17 13:36 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

In order to read the HCA's cycle counter efficiently from
user space, we need to map the HCA's clock register.
This is done through an mmap call.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/main.c         |   18 +++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx4/main.c |   19 +++++++++++++++++++
 include/linux/mlx4/device.h               |    9 +++++++++
 3 files changed, 45 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index fe697ec..1b60687 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -725,8 +725,24 @@ static int mlx4_ib_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
 				       dev->dev->caps.num_uars,
 				       PAGE_SIZE, vma->vm_page_prot))
 			return -EAGAIN;
-	} else
+	} else if (vma->vm_pgoff == 3) {
+		struct mlx4_clock_params params;
+		int ret = mlx4_get_internal_clock_params(dev->dev, &params);
+
+		if (ret)
+			return ret;
+
+		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+		if (io_remap_pfn_range(vma, vma->vm_start,
+				       (pci_resource_start(dev->dev->persist->pdev,
+							   params.bar) +
+					params.offset)
+				       >> PAGE_SHIFT,
+				       PAGE_SIZE, vma->vm_page_prot))
+			return -EAGAIN;
+	} else {
 		return -EINVAL;
+	}
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index ced5eca..70de39c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1674,6 +1674,25 @@ static int map_internal_clock(struct mlx4_dev *dev)
 	return 0;
 }
 
+int mlx4_get_internal_clock_params(struct mlx4_dev *dev,
+				   struct mlx4_clock_params *params)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+
+	if (mlx4_is_slave(dev))
+		return -ENOTSUPP;
+
+	if (!params)
+		return -EINVAL;
+
+	params->bar = priv->fw.clock_bar;
+	params->offset = priv->fw.clock_offset;
+	params->size = MLX4_CLOCK_SIZE;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(mlx4_get_internal_clock_params);
+
 static void unmap_internal_clock(struct mlx4_dev *dev)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 83e80ab..f94984f 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -829,6 +829,12 @@ struct mlx4_dev {
 	struct mlx4_vf_dev     *dev_vfs;
 };
 
+struct mlx4_clock_params {
+	u64 offset;
+	u8 bar;
+	u8 size;
+};
+
 struct mlx4_eqe {
 	u8			reserved1;
 	u8			type;
@@ -1485,4 +1491,7 @@ int mlx4_ACCESS_PTYS_REG(struct mlx4_dev *dev,
 			 enum mlx4_access_reg_method method,
 			 struct mlx4_ptys_reg *ptys_reg);
 
+int mlx4_get_internal_clock_params(struct mlx4_dev *dev,
+				   struct mlx4_clock_params *params);
+
 #endif /* MLX4_DEVICE_H */
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 71+ messages in thread
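
The address arithmetic in patch 06 above can be sketched as follows. Only one page is mapped, so the physical address of the clock is split into a page frame number (what the remap call receives) and an offset inside that page (what user space has to add to the mapped base). This is a sketch under assumptions: 4 KiB pages, a page-aligned BAR, and hypothetical SKETCH_* and clock_* names that are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel uses PAGE_SHIFT and PAGE_SIZE. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Page frame number passed to the remap call:
 * (BAR start + clock offset) >> PAGE_SHIFT. */
uint64_t clock_pfn(uint64_t bar_start, uint64_t clock_offset)
{
	return (bar_start + clock_offset) >> SKETCH_PAGE_SHIFT;
}

/* Offset of the counter inside the single mapped page. */
uint64_t clock_page_offset(uint64_t clock_offset)
{
	return clock_offset % SKETCH_PAGE_SIZE;
}

/* Recombine both halves; this only equals bar_start + clock_offset
 * when the BAR itself starts on a page boundary. */
uint64_t clock_phys_addr(uint64_t bar_start, uint64_t clock_offset)
{
	assert(bar_start % SKETCH_PAGE_SIZE == 0);

	return (clock_pfn(bar_start, clock_offset) << SKETCH_PAGE_SHIFT) +
	       clock_page_offset(clock_offset);
}
```

With a BAR at 0xf0000000 and a clock offset of 0x1234, the mapped frame is 0xf0001 and the counter sits 0x234 bytes into the page.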

* [PATCH for-next 07/10] IB/mlx4: Support extended create_cq and query_device uverbs
       [not found] ` <1431869786-6308-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2015-05-17 13:36   ` [PATCH for-next 06/10] IB/mlx4: Add mmap call to map the hardware clock Or Gerlitz
@ 2015-05-17 13:36   ` Or Gerlitz
  2015-05-17 13:36   ` [PATCH for-next 08/10] IB/mlx4: Add support for timestamp in cq creation Or Gerlitz
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 71+ messages in thread
From: Or Gerlitz @ 2015-05-17 13:36 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add support for ib_uverbs_ex_create_cq and ib_uverbs_ex_query_device
by setting the appropriate bits in uverbs_ex_cmd_mask.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/main.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 1b60687..bb7d42c 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2309,6 +2309,10 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 			(1ull << IB_USER_VERBS_EX_CMD_DESTROY_FLOW);
 	}
 
+	ibdev->ib_dev.uverbs_ex_cmd_mask |=
+		(1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE) |
+		(1ull << IB_USER_VERBS_EX_CMD_CREATE_CQ);
+
 	mlx4_ib_alloc_eqs(dev, ibdev);
 
 	spin_lock_init(&iboe->lock);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH for-next 08/10] IB/mlx4: Add support for timestamp in cq creation
       [not found] ` <1431869786-6308-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2015-05-17 13:36   ` [PATCH for-next 07/10] IB/mlx4: Support extended create_cq and query_device uverbs Or Gerlitz
@ 2015-05-17 13:36   ` Or Gerlitz
       [not found]     ` <1431869786-6308-9-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-05-17 13:36   ` [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device Or Gerlitz
  2015-05-17 13:36   ` [PATCH for-next 10/10] IB/mlx4: Return hca core clock's offset in query_device vendor's data Or Gerlitz
  9 siblings, 1 reply; 71+ messages in thread
From: Or Gerlitz @ 2015-05-17 13:36 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Support allocation of a CQ with the IB_CQ_FLAGS_TIMESTAMP
creation flag.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/cq.c      |    9 ++++++---
 drivers/infiniband/hw/mlx4/mlx4_ib.h |    1 +
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 02b0424..da7ea19 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -166,6 +166,7 @@ err_buf:
 	return err;
 }
 
+#define CQ_CREATE_FLAGS_SUPPORTED IB_CQ_FLAGS_TIMESTAMP
 struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
 				struct ib_cq_init_attr *attr,
 				struct ib_ucontext *context,
@@ -178,10 +179,10 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
 	struct mlx4_uar *uar;
 	int err;
 
-	if (attr->flags)
+	if (entries < 1 || entries > dev->dev->caps.max_cqes)
 		return ERR_PTR(-EINVAL);
 
-	if (entries < 1 || entries > dev->dev->caps.max_cqes)
+	if (attr->flags & ~CQ_CREATE_FLAGS_SUPPORTED)
 		return ERR_PTR(-EINVAL);
 
 	cq = kmalloc(sizeof *cq, GFP_KERNEL);
@@ -194,6 +195,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
 	spin_lock_init(&cq->lock);
 	cq->resize_buf = NULL;
 	cq->resize_umem = NULL;
+	cq->create_flags = attr->flags;
 	INIT_LIST_HEAD(&cq->send_qp_list);
 	INIT_LIST_HEAD(&cq->recv_qp_list);
 
@@ -237,7 +239,8 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
 		vector = dev->eq_table[vector % ibdev->num_comp_vectors];
 
 	err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar,
-			    cq->db.dma, &cq->mcq, vector, 0, 0);
+			    cq->db.dma, &cq->mcq, vector, 0,
+			    !!(cq->create_flags & IB_CQ_FLAGS_TIMESTAMP));
 	if (err)
 		goto err_dbmap;
 
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 8bd07c9..3afd3c5 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -110,6 +110,7 @@ struct mlx4_ib_cq {
 	struct mutex		resize_mutex;
 	struct ib_umem	       *umem;
 	struct ib_umem	       *resize_umem;
+	int			create_flags;
 	/* List of qps that it serves.*/
 	struct list_head		send_qp_list;
 	struct list_head		recv_qp_list;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 71+ messages in thread
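
The flag handling in patch 08 above follows a common pattern: reject any bit outside the supported set, then collapse the timestamp bit to 0 or 1 before handing it on. A minimal sketch, assuming made-up flag values (the SKETCH_* names and sketch_check_cq_flags are hypothetical; the real IB_CQ_FLAGS_TIMESTAMP value is defined elsewhere in the series):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values, for illustration only. */
enum sketch_cq_flags {
	SKETCH_CQ_FLAGS_TIMESTAMP = 1 << 0,
	SKETCH_CQ_FLAGS_UNKNOWN   = 1 << 1,	/* a flag this driver lacks */
};

#define SKETCH_CQ_FLAGS_SUPPORTED ((unsigned int)SKETCH_CQ_FLAGS_TIMESTAMP)

/* Returns -EINVAL if any unsupported bit is set; otherwise stores
 * 0 or 1 in *timestamp_en and returns 0. */
int sketch_check_cq_flags(unsigned int flags, int *timestamp_en)
{
	assert(timestamp_en != NULL);

	if (flags & ~SKETCH_CQ_FLAGS_SUPPORTED)
		return -EINVAL;

	/* !! normalises "bit is set" to exactly 1. */
	*timestamp_en = !!(flags & SKETCH_CQ_FLAGS_TIMESTAMP);
	return 0;
}
```

Masking against the supported set, rather than testing for "any flag at all" as the code did before the patch, lets new flags be added one at a time without touching the rejection path.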

* [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found] ` <1431869786-6308-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (7 preceding siblings ...)
  2015-05-17 13:36   ` [PATCH for-next 08/10] IB/mlx4: Add support for timestamp in cq creation Or Gerlitz
@ 2015-05-17 13:36   ` Or Gerlitz
       [not found]     ` <1431869786-6308-10-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-05-17 13:36   ` [PATCH for-next 10/10] IB/mlx4: Return hca core clock's offset in query_device vendor's data Or Gerlitz
  9 siblings, 1 reply; 71+ messages in thread
From: Or Gerlitz @ 2015-05-17 13:36 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

mlx4 needs to report the number of supported timestamp
bits (mask) and the hca_core_clock frequency.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/main.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index bb7d42c..b77dd77 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -233,6 +233,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
 					   props->max_mcast_grp;
 	props->max_map_per_fmr = dev->dev->caps.max_fmr_maps;
+	props->hca_core_clock = dev->dev->caps.hca_core_clock;
+	props->timestamp_mask = 0xFFFFFFFFFFFFULL;
 
 out:
 	kfree(in_mad);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 71+ messages in thread
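
How a consumer might use the two values reported by patch 09 above can be sketched as below. The 48-bit mask is taken from the patch; everything else is an assumption of this sketch. In particular, the clock is assumed to be given in kHz (the patch does not state the unit), and ts_delta_cycles and cycles_to_ns are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* 48-bit mask, the value the patch reports in timestamp_mask. */
#define SKETCH_TS_MASK UINT64_C(0xFFFFFFFFFFFF)

/* Cycles elapsed between two raw timestamps; masking the difference
 * tolerates a single wrap of the 48-bit counter. */
uint64_t ts_delta_cycles(uint64_t start, uint64_t end)
{
	return (end - start) & SKETCH_TS_MASK;
}

/* Assumption: clock_khz is the core clock frequency in kHz.
 * ns = cycles * 1e6 / kHz, split into quotient and remainder so the
 * multiplication does not overflow for large cycle counts. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t clock_khz)
{
	assert(clock_khz != 0);

	return (cycles / clock_khz) * UINT64_C(1000000) +
	       (cycles % clock_khz) * UINT64_C(1000000) / clock_khz;
}
```

For example, 427000 cycles of a 427000 kHz clock come out as 1000000 ns, i.e. one millisecond.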

* [PATCH for-next 10/10] IB/mlx4: Return hca core clock's offset in query_device vendor's data
       [not found] ` <1431869786-6308-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (8 preceding siblings ...)
  2015-05-17 13:36   ` [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device Or Gerlitz
@ 2015-05-17 13:36   ` Or Gerlitz
  9 siblings, 0 replies; 71+ messages in thread
From: Or Gerlitz @ 2015-05-17 13:36 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

In order to read the HCA's core clock, user space needs to know
the clock's offset within the mapped page. This offset is passed
through query_device's vendor-specific data.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/main.c    |   36 ++++++++++++++++++++++++++++++++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   15 ++++++++++++++
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index b77dd77..c0e830c 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -140,10 +140,27 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	struct ib_smp *out_mad = NULL;
 	int err = -ENOMEM;
 	int have_ib_ports;
+	struct mlx4_uverbs_ex_query_device cmd;
+	struct mlx4_uverbs_ex_query_device_resp resp = {.comp_mask = 0};
+	struct mlx4_clock_params clock_params;
 
-	if (uhw->inlen || uhw->outlen)
-		return -EINVAL;
+	if (uhw->inlen) {
+		if (uhw->inlen < sizeof(cmd))
+			return -EINVAL;
+
+		err = ib_copy_from_udata(&cmd, uhw, sizeof(cmd));
+		if (err)
+			return err;
+
+		if (cmd.comp_mask)
+			return -EINVAL;
+
+		if (cmd.reserved)
+			return -EINVAL;
+	}
 
+	resp.response_length = offsetof(typeof(resp), response_length) +
+		sizeof(resp.response_length);
 	in_mad  = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
@@ -236,6 +253,21 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	props->hca_core_clock = dev->dev->caps.hca_core_clock;
 	props->timestamp_mask = 0xFFFFFFFFFFFFULL;
 
+	err = mlx4_get_internal_clock_params(dev->dev, &clock_params);
+	if (err)
+		goto out;
+
+	if (uhw->outlen >= resp.response_length + sizeof(resp.hca_core_clock_offset)) {
+		resp.hca_core_clock_offset = clock_params.offset % PAGE_SIZE;
+		resp.response_length += sizeof(resp.hca_core_clock_offset);
+		resp.comp_mask |= QUERY_DEVICE_RESP_MASK_TIMESTAMP;
+	}
+
+	if (uhw->outlen) {
+		err = ib_copy_to_udata(uhw, &resp, resp.response_length);
+		if (err)
+			goto out;
+	}
 out:
 	kfree(in_mad);
 	kfree(out_mad);
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 3afd3c5..f15754c 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -556,6 +556,21 @@ struct mlx4_ib_qp_tunnel_init_attr {
 	u8 port;
 };
 
+struct mlx4_uverbs_ex_query_device {
+	__u32 comp_mask;
+	__u32 reserved;
+};
+
+enum query_device_resp_mask {
+	QUERY_DEVICE_RESP_MASK_TIMESTAMP = 1UL << 0,
+};
+
+struct mlx4_uverbs_ex_query_device_resp {
+	__u32 comp_mask;
+	__u32 response_length;
+	__u64 hca_core_clock_offset;
+};
+
 static inline struct mlx4_ib_dev *to_mdev(struct ib_device *ibdev)
 {
 	return container_of(ibdev, struct mlx4_ib_dev, ib_dev);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 71+ messages in thread
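
The response sizing in patch 10 above can be sketched in user-space C as follows: the base response covers everything up to and including response_length, and the clock offset is appended only when the caller's buffer has room for it. struct sketch_resp mirrors the layout from the patch, but the names are hypothetical. The check that refuses buffers shorter than the base response is an addition of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the layout of mlx4_uverbs_ex_query_device_resp. */
struct sketch_resp {
	uint32_t comp_mask;
	uint32_t response_length;
	uint64_t hca_core_clock_offset;
};

#define SKETCH_RESP_MASK_TIMESTAMP (1u << 0)

/* Fills 'out' (a buffer of 'outlen' bytes) and returns the number of
 * bytes written, or 0 if nothing was written. */
size_t sketch_fill_resp(void *out, size_t outlen, uint64_t clock_offset)
{
	struct sketch_resp resp = { 0 };

	assert(out != NULL || outlen == 0);

	/* Base response: up to and including response_length. */
	resp.response_length = offsetof(struct sketch_resp, response_length) +
			       sizeof(resp.response_length);

	/* Optional field: reported only if the buffer can hold it. */
	if (outlen >= resp.response_length + sizeof(resp.hca_core_clock_offset)) {
		resp.hca_core_clock_offset = clock_offset;
		resp.response_length += sizeof(resp.hca_core_clock_offset);
		resp.comp_mask |= SKETCH_RESP_MASK_TIMESTAMP;
	}

	/* Sketch-only guard: never write past the caller's buffer. */
	if (outlen < resp.response_length)
		return 0;

	memcpy(out, &resp, resp.response_length);
	return resp.response_length;
}
```

An older caller with an 8-byte buffer gets comp_mask == 0 and never sees the new field, while a newer caller checks the timestamp bit in comp_mask before reading hca_core_clock_offset.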

* Re: [PATCH for-next 01/10] IB/core: Change provider's API of create_cq to be extendible
       [not found]     ` <1431869786-6308-2-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-05-19 18:45       ` Jason Gunthorpe
       [not found]         ` <20150519184535.GJ18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-19 18:45 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak

On Sun, May 17, 2015 at 04:36:17PM +0300, Or Gerlitz wrote:
> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Add a new ib_cq_init_attr structure which contains the
> previous cqe (minimum number of CQ entries) and comp_vector
> (completion vector) in addition to a new flags field.
> All vendors' create_cq callbacks are changed in order
> to work with the new API.
> 
> This commit does not change any functionality.

This seems reasonable to me.

> @@ -1341,6 +1341,7 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
>  	struct ib_uverbs_event_file    *ev_file = NULL;
>  	struct ib_cq                   *cq;
>  	int                             ret;
> +	struct ib_cq_init_attr attr = {.cqe = 0};

This doesn't seem necessary, it is unconditionally set below:

>  	if (out_len < sizeof resp)
>  		return -ENOSPC;
> @@ -1376,8 +1377,9 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
>  	INIT_LIST_HEAD(&obj->comp_list);
>  	INIT_LIST_HEAD(&obj->async_list);
>  
> -	cq = file->device->ib_dev->create_cq(file->device->ib_dev, cmd.cqe,
> -					     cmd.comp_vector,
> +	attr.cqe = cmd.cqe;
> +	attr.comp_vector = cmd.comp_vector;
> +	cq = file->device->ib_dev->create_cq(file->device->ib_dev, &attr,
>  					     file->ucontext, &udata);

> +++ b/drivers/infiniband/core/verbs.c
> @@ -1013,8 +1013,9 @@ struct ib_cq *ib_create_cq(struct ib_device *device,
>  			   void *cq_context, int cqe, int comp_vector)
>  {
>  	struct ib_cq *cq;
> +	struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector};
>  
> -	cq = device->create_cq(device, cqe, comp_vector, NULL, NULL);
> +	cq = device->create_cq(device, &attr, NULL, NULL);

Hum, I guess it makes sense to stop flowing ib_cq_init_attr at this
point, for this patch, but it does seem a bit weird from an API design
perspective.

> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 8d59479..ad0e2ea 100644
> +++ b/include/rdma/ib_verbs.h
> @@ -173,6 +173,12 @@ struct ib_odp_caps {
>  	} per_transport_caps;
>  };
>  
> +struct ib_cq_init_attr {
> +	int cqe;

 unsigned int cqe

Can't be negative..

> +	struct ib_cq *             (*create_cq)(struct ib_device *device,
> +						struct ib_cq_init_attr *attr,

const struct ib_cq_init_attr *attr,

And related changes that will cause.
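
To illustrate the two suggestions above (an unsigned cqe and a const
attribute pointer), here is a minimal sketch. The struct and function
names ending in _sketch are hypothetical stand-ins, not the kernel
definitions, and the return convention is simplified to an int.

```c
/* Illustrative stand-ins only; not the definitions from ib_verbs.h. */
struct cq_init_attr_sketch {
	unsigned int cqe;         /* minimum number of CQ entries, never negative */
	int          comp_vector; /* completion vector */
};

struct device_sketch {
	/* The provider callback receives the attributes read-only. */
	int (*create_cq)(struct device_sketch *dev,
			 const struct cq_init_attr_sketch *attr);
};

/* Hypothetical provider: inspects attr but cannot modify it. */
static int provider_create_cq_sketch(struct device_sketch *dev,
				     const struct cq_init_attr_sketch *attr)
{
	(void)dev;
	if (attr->cqe == 0)
		return -1;        /* reject an empty CQ request */
	return (int)attr->cqe;    /* pretend this is the allocated size */
}
```

With the const qualifier, a provider that tried to write attr->cqe
would fail to compile, which is the kind of "related change" the
comment anticipates.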

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 02/10] IB/core: Add flags parameter for ib_create_cq
       [not found]     ` <1431869786-6308-3-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-05-19 18:47       ` Jason Gunthorpe
       [not found]         ` <20150519184711.GK18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-05-20 15:24       ` Yann Droneaud
  1 sibling, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-19 18:47 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak

On Sun, May 17, 2015 at 04:36:18PM +0300, Or Gerlitz wrote:
> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> We add a flags parameter for ib_create_cq in order
> to specify creation flags for the CQ.

Why?

At this point you should keep flowing ib_cq_init_attr up to callers of
ib_create_cq, so we don't have to do this again.

It makes sense to keep that as a distinct patch..
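
A rough sketch of what passing the attribute struct all the way up to
the consumer-facing call could look like. All names here are
hypothetical and the CQ object is reduced to three fields; the point is
only that a caller using designated initializers is unaffected when a
new field such as flags is added later.

```c
/* Hypothetical attribute struct; 'flags' stands for a later addition. */
struct cq_attr_sketch {
	unsigned int cqe;
	int          comp_vector;
	unsigned int flags;
};

/* Hypothetical CQ object that just records what it was created with. */
struct cq_sketch {
	unsigned int cqe;
	int          comp_vector;
	unsigned int flags;
};

/* The consumer-facing call takes the attributes as one const pointer. */
static int create_cq_sketch(struct cq_sketch *cq,
			    const struct cq_attr_sketch *attr)
{
	if (!cq || !attr || attr->cqe == 0)
		return -1;
	cq->cqe = attr->cqe;
	cq->comp_vector = attr->comp_vector;
	cq->flags = attr->flags;
	return 0;
}
```

A caller written as { .cqe = 128, .comp_vector = 1 } leaves flags at
zero and keeps compiling unchanged if further fields are appended.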

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 08/10] IB/mlx4: Add support for timestamp in cq creation
       [not found]     ` <1431869786-6308-9-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-05-19 18:55       ` Jason Gunthorpe
       [not found]         ` <20150519185527.GL18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-19 18:55 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak

On Sun, May 17, 2015 at 04:36:24PM +0300, Or Gerlitz wrote:
> @@ -110,6 +110,7 @@ struct mlx4_ib_cq {
>  	struct mutex		resize_mutex;
>  	struct ib_umem	       *umem;
>  	struct ib_umem	       *resize_umem;
> +	int			create_flags;

flags is u32
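
For illustration, a minimal sketch of a fixed-width flags field with a
check for unknown bits. The _sketch names are hypothetical; only the
value 1 << 0 for the timestamp flag comes from the patch. Rejecting
unsupported bits is an assumption about how a driver might validate the
field, not something the series is known to do.

```c
#include <stdint.h>

/* Mirrors IB_CQ_FLAGS_TIMESTAMP = 1 << 0 from the patch. */
enum cq_creation_flags_sketch {
	CQ_FLAGS_TIMESTAMP_SKETCH = 1u << 0,
};

/* Every flag bit this hypothetical driver understands. */
#define CQ_FLAGS_SUPPORTED_SKETCH ((uint32_t)CQ_FLAGS_TIMESTAMP_SKETCH)

/* Return 0 if only supported bits are set, -1 otherwise. */
static int check_cq_flags_sketch(uint32_t flags)
{
	return (flags & ~CQ_FLAGS_SUPPORTED_SKETCH) ? -1 : 0;
}
```

With an unsigned 32-bit type, bit 31 can be tested without running into
the sign bit, which a plain int does not guarantee.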

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]     ` <1431869786-6308-10-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-05-19 18:58       ` Jason Gunthorpe
       [not found]         ` <20150519185801.GM18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-19 18:58 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak, Yann Droneaud

On Sun, May 17, 2015 at 04:36:25PM +0300, Or Gerlitz wrote:
> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> mlx4 needs to report the number of supported timestamp
> bits (mask) and the hca_core_clock frequency.

This is critical information to parse the timestamp, why is it hidden
in vendor specific land?

We can't really look at the uapi changes here without also seeing the
verbs side changes.

I'd like to see Yann review all the uapi stuff as well.

I'm not sure how much I like this idea of appending vendor stuff to
these replies... Hiding uapi in a driver just seems like asking for
trouble.
	
Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]         ` <20150519185801.GM18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-19 19:00           ` Jason Gunthorpe
       [not found]             ` <20150519190031.GN18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-19 19:00 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak, Yann Droneaud

On Tue, May 19, 2015 at 12:58:01PM -0600, Jason Gunthorpe wrote:
> On Sun, May 17, 2015 at 04:36:25PM +0300, Or Gerlitz wrote:
> > From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > 
> > mlx4 needs to report the number of supported timestamp
> > bits (mask) and the hca_core_clock frequency.
> 
> This is critical information to parse the timestamp, why is it hidden
> in vendor specific land?

Sorry, direct this toward the page offset in the next patch.

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]             ` <20150519190031.GN18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-19 19:11               ` Or Gerlitz
       [not found]                 ` <CAJ3xEMiO2y6G5YzzkGC1M=XB6yrtNJ7jj13pTUAee1TssiYwkQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Or Gerlitz @ 2015-05-19 19:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

On Tue, May 19, 2015 at 10:00 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Tue, May 19, 2015 at 12:58:01PM -0600, Jason Gunthorpe wrote:
>> On Sun, May 17, 2015 at 04:36:25PM +0300, Or Gerlitz wrote:
>> > From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> >
>> > mlx4 needs to report the number of supported timestamp
>> > bits (mask) and the hca_core_clock frequency.
>>
>> This is critical information to parse the timestamp, why is it hidden
>> in vendor specific land?
>
> Sorry, direct this toward the page offset in the next patch.

We see this as a piece of detail related to the specific vendor
driver. I don't see the point of overloading the user space verbs
library or app with an API to get the offset of the clock within the
mmap page and such, do you?

Or.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                 ` <CAJ3xEMiO2y6G5YzzkGC1M=XB6yrtNJ7jj13pTUAee1TssiYwkQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-05-19 19:15                   ` Jason Gunthorpe
       [not found]                     ` <20150519191553.GP18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-19 19:15 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

On Tue, May 19, 2015 at 10:11:43PM +0300, Or Gerlitz wrote:
> > Sorry, direct this toward the page offset in the next patch.
> 
> We see this as a piece of detail related to the specific vendor
> driver. I don't see the point of overloading the user space verbs
> library or app with an API to get the offset of the clock within the
> mmap page and such, do you?

Can't you find a better place for this than overloading the
query_device call? I'd rather see you add some kind of driver-specific
command than this.

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                     ` <20150519191553.GP18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-19 19:30                       ` Or Gerlitz
       [not found]                         ` <CAJ3xEMjj31SpZmYXORCePVCQfMOWVKEhtNay-v59FcnCOQWm9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Or Gerlitz @ 2015-05-19 19:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

On Tue, May 19, 2015 at 10:15 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Tue, May 19, 2015 at 10:11:43PM +0300, Or Gerlitz wrote:
>> > Sorry, direct this toward the page offset in the next patch.
>>
>> We see this as a piece of detail related to the specific vendor
>> driver. I don't see the point of overloading the user space verbs
>> library or app with an API to get the offset of the clock within the
>> mmap page and such, do you?
>
> Can't you find a better place for this than overloading the
> query_device call? I'd rather see you add some kind of driver-specific
> command than this.

Let's be precise:

Are you objecting to adding the clock frequency and mask to the query device verb?
Why?

If not, are you objecting to using the vendor-specific track of the verb to
pass from the vendor driver to the vendor library this or that detail
which is needed for proper operation? Why?

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                         ` <CAJ3xEMjj31SpZmYXORCePVCQfMOWVKEhtNay-v59FcnCOQWm9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-05-20  0:29                           ` Jason Gunthorpe
       [not found]                             ` <20150520002915.GD16941-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-20  0:29 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

On Tue, May 19, 2015 at 10:30:00PM +0300, Or Gerlitz wrote:
> Are you objecting to adding the clock frequency and mask to the query device verb?
> Why?

Let's see the verbs side and I'll let you know.

> If not, are you objecting to using the vendor-specific track of the verb to
> pass from the vendor driver to the vendor library this or that detail
> which is needed for proper operation? Why?

I'm uncomfortable seeing otherwise vendor-neutral calls gain vendor
extensions.

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 02/10] IB/core: Add flags parameter for ib_create_cq
       [not found]         ` <20150519184711.GK18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-20  8:54           ` Sagi Grimberg
       [not found]             ` <555C4BAB.6090602-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Sagi Grimberg @ 2015-05-20  8:54 UTC (permalink / raw)
  To: Jason Gunthorpe, Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak

On 5/19/2015 9:47 PM, Jason Gunthorpe wrote:
> On Sun, May 17, 2015 at 04:36:18PM +0300, Or Gerlitz wrote:
>> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> We add a flags parameter for ib_create_cq in order
>> to specify creation flags for the CQ.
>
> Why?
>
> At this point you should keep flowing ib_cq_init_attr up to callers of
> ib_create_cq, so we don't have to do this again.
>
> It makes sense to keep that as a distinct patch..

I completely agree with Jason. Let's move away from
APIs that are not easily extendable.

Sagi.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 08/10] IB/mlx4: Add support for timestamp in cq creation
       [not found]         ` <20150519185527.GL18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-20  8:56           ` Sagi Grimberg
       [not found]             ` <555C4C45.7010300-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Sagi Grimberg @ 2015-05-20  8:56 UTC (permalink / raw)
  To: Jason Gunthorpe, Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak

On 5/19/2015 9:55 PM, Jason Gunthorpe wrote:
> On Sun, May 17, 2015 at 04:36:24PM +0300, Or Gerlitz wrote:
>> @@ -110,6 +110,7 @@ struct mlx4_ib_cq {
>>   	struct mutex		resize_mutex;
>>   	struct ib_umem	       *umem;
>>   	struct ib_umem	       *resize_umem;
>> +	int			create_flags;
>
> flags is u32

Maybe even u64 so we won't run out of bits like we did in
device_cap_flags.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                             ` <20150520002915.GD16941-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-20 14:40                               ` Or Gerlitz
  2015-05-20 14:41                               ` Or Gerlitz
  1 sibling, 0 replies; 71+ messages in thread
From: Or Gerlitz @ 2015-05-20 14:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak, Yann Droneaud

On 5/20/2015 3:29 AM, Jason Gunthorpe wrote:
> On Tue, May 19, 2015 at 10:30:00PM +0300, Or Gerlitz wrote:
>> Are you objecting to adding the clock frequency and mask to the query device verb?
>> Why?
> Let's see the verbs side and I'll let you know.

You mean the user-space series for libibverbs/libmlx4? I don't see why this
should be a must for the review of the kernel bits. The user-space code
is coming up soon, sure, but we should be able to review kernel patches
without being required to actually see the user-space code.

As the change-logs here explained, the clock frequency is needed for
applications to convert the HCA clock delta (current time - timestamp on
WC) into nanoseconds and such. The mask is needed to know how many bits
of the 64-bit timestamp are supported by the HW.
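
As a rough illustration of that conversion, here is a sketch under two
assumptions that are not stated in this thread: the clock frequency is
reported in kHz, and the mask is a contiguous run of low-order bits.
The function names are hypothetical.

```c
#include <stdint.h>

/*
 * Cycles elapsed between two raw timestamps. Masking the difference
 * handles a counter that wrapped once between 'then' and 'now'.
 */
static uint64_t ts_delta_cycles_sketch(uint64_t now, uint64_t then,
				       uint64_t mask)
{
	return (now - then) & mask;
}

/*
 * Convert cycles to nanoseconds for a clock given in kHz (assumed
 * unit): ns = cycles * 1e6 / freq_khz. The division is split into
 * whole milliseconds plus a remainder to reduce the overflow risk.
 */
static uint64_t cycles_to_ns_sketch(uint64_t cycles, uint64_t freq_khz)
{
	uint64_t whole_ms = cycles / freq_khz;
	uint64_t rem = cycles % freq_khz;

	return whole_ms * 1000000ULL + rem * 1000000ULL / freq_khz;
}
```

For example, with a 48-bit mask and a 500000 kHz clock, a delta of 1000
cycles corresponds to 2000 ns.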

>
>> If not, are you objecting to using the vendor-specific track of the verb to
>> pass from the vendor driver to the vendor library this or that detail
>> which is needed for proper operation? Why?
> I'm uncomfortable seeing otherwise vendor-neutral calls gain vendor
> extensions.

But this is the whole purpose of the udata framework in uverbs, right? For
each uverb command, the vendor user-space library has a well-defined
channel to communicate directly with the low-level vendor driver
through the uverbs channels.

Or.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                             ` <20150520002915.GD16941-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-05-20 14:40                               ` Or Gerlitz
@ 2015-05-20 14:41                               ` Or Gerlitz
       [not found]                                 ` <555C9D00.2090609-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: Or Gerlitz @ 2015-05-20 14:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak, Yann Droneaud

On 5/20/2015 3:29 AM, Jason Gunthorpe wrote:
> On Tue, May 19, 2015 at 10:30:00PM +0300, Or Gerlitz wrote:
>> Are you objecting to adding the clock frequency and mask to the query device verb?
>> Why?
> Let's see the verbs side and I'll let you know.

You mean the user-space series for libibverbs/libmlx4? I don't see why this
should be a must for the review of the kernel bits. The user-space code
is coming up soon, sure, but we should be able to review kernel patches
without being required to actually see the user-space code.

As the change-logs here explained, the clock frequency is needed for
applications to convert the HCA clock delta (current time - timestamp on
WC) into nanoseconds and such. The mask is needed to know how many bits
of the 64-bit timestamp are supported by the HW.

>
>> If not, are you objecting to using the vendor-specific track of the verb to
>> pass from the vendor driver to the vendor library this or that detail
>> which is needed for proper operation? Why?
> I'm uncomfortable seeing otherwise vendor-neutral calls gain vendor
> extensions.

But this is the whole purpose of the udata framework in uverbs, right? For
each uverb command, the vendor user-space library has a well-defined
channel to communicate directly with the low-level vendor driver
through the uverbs channels.

Or.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 08/10] IB/mlx4: Add support for timestamp in cq creation
       [not found]             ` <555C4C45.7010300-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-05-20 14:43               ` Or Gerlitz
  2015-05-20 17:27               ` Jason Gunthorpe
  1 sibling, 0 replies; 71+ messages in thread
From: Or Gerlitz @ 2015-05-20 14:43 UTC (permalink / raw)
  To: Sagi Grimberg, Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak

On 5/20/2015 11:56 AM, Sagi Grimberg wrote:
> On 5/19/2015 9:55 PM, Jason Gunthorpe wrote:
>> On Sun, May 17, 2015 at 04:36:24PM +0300, Or Gerlitz wrote:
>>> @@ -110,6 +110,7 @@ struct mlx4_ib_cq {
>>>       struct mutex        resize_mutex;
>>>       struct ib_umem           *umem;
>>>       struct ib_umem           *resize_umem;
>>> +    int            create_flags;
>>
>> flags is u32
>
> Maybe even u64 so we won't run out of bits like we did in
> device_cap_flags.

Hold your horses: you are comparing the volume of the whole feature set
exposed by the RDMA subsystem to user-space vs. the number of future CQ
creation flags? I bet we will not have > 8 of them till 2020, and the
framework used for this uverb command is extendable...


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                 ` <555C9D00.2090609-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-05-20 15:11                                   ` Yann Droneaud
       [not found]                                     ` <1432134677.5304.23.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Yann Droneaud @ 2015-05-20 15:11 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jason Gunthorpe, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak

Hi,

Le mercredi 20 mai 2015 à 17:41 +0300, Or Gerlitz a écrit :
> On 5/20/2015 3:29 AM, Jason Gunthorpe wrote:
> > On Tue, May 19, 2015 at 10:30:00PM +0300, Or Gerlitz wrote:
> > > Are you objecting to adding the clock frequency and mask to the
> > > query device verb?
> > > Why?
> > Let's see the verbs side and I'll let you know.
> 
> You mean the user-space series for libibverbs/libmlx4? I don't see why
> this should be a must for the review of the kernel bits. The user-space
> code is coming up soon, sure, but we should be able to review kernel
> patches without being required to actually see the user-space code.
> 

In some other subsystems: no userspace code, no merge.

http://blog.ffwll.ch/2015/05/gfx-kernel-upstreaming-requirements.html


> As the change-logs here explained, the clock frequency is needed for
> applications to convert the HCA clock delta (current time - timestamp
> on WC) into nanoseconds and such. The mask is needed to know how many
> bits of the 64-bit timestamp are supported by the HW.
> 
> > > If not, are you objecting to using the vendor-specific track of the
> > > verb to pass from the vendor driver to the vendor library this or
> > > that detail which is needed for proper operation? Why?
> > I'm uncomfortable seeing otherwise vendor-neutral calls gain vendor
> > extensions.
> 
> But this is the whole purpose of the udata framework in uverbs, right?
> For each uverb command, the vendor user-space library has a well-defined
> channel to communicate directly with the low-level vendor driver
> through the uverbs channels.
> 

Uverbs convey information between kernel and userspace drivers to
implement verbs for userspace applications. I don't think it's designed
to allow vendors to add random extensions in the best way with regard to
backward/forward compatibility.


Anyway, please, we have to make drivers that behave as good citizens
toward the kernel *and* userspace. Adding a dedicated extension that
will later be replaced by a generic, vendor-neutral extension will make
backward compatibility painful to maintain.

So let's think about how this timestamp extension can be made generic
enough to be future-proof (and at least present-proof, to address
current use cases).

Regards.

-- 
Yann Droneaud
OPTEYA




^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 01/10] IB/core: Change provider's API of create_cq to be extendible
       [not found]         ` <20150519184535.GJ18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-20 15:14           ` Yann Droneaud
  2015-05-20 16:41           ` Matan Barak
  1 sibling, 0 replies; 71+ messages in thread
From: Yann Droneaud @ 2015-05-20 15:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak

Le mardi 19 mai 2015 à 12:45 -0600, Jason Gunthorpe a écrit :
> > +     struct ib_cq *             (*create_cq)(struct ib_device *device,
> > +                                             struct ib_cq_init_attr *attr,
> 
> const struct ib_cq_init_attr *attr,
> 
> And related changes that will cause.

I was going to ask for the same change.

Regards.

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 02/10] IB/core: Add flags parameter for ib_create_cq
       [not found]     ` <1431869786-6308-3-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-05-19 18:47       ` Jason Gunthorpe
@ 2015-05-20 15:24       ` Yann Droneaud
       [not found]         ` <1432135457.5304.33.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: Yann Droneaud @ 2015-05-20 15:24 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak

Hi,

Le dimanche 17 mai 2015 à 16:36 +0300, Or Gerlitz a écrit :
> diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c 
> b/drivers/infiniband/ulp/iser/iser_verbs.c
> index cc2dd35..922d322 100644
> --- a/drivers/infiniband/ulp/iser/iser_verbs.c
> +++ b/drivers/infiniband/ulp/iser/iser_verbs.c
> @@ -126,11 +126,9 @@ static int iser_create_device_ib_res(struct 
> iser_device *device)
>                 struct iser_comp *comp = &device->comps[i];
>  
>                 comp->device = device;
> -               comp->cq = ib_create_cq(device->ib_device,
> -                                       iser_cq_callback,
> -                                       iser_cq_event_callback,
> -                                       (void *)comp,
> -                                       max_cqe, i);
> +               comp->cq = ib_create_cq(device->ib_device, iser_cq_callback,
> +                                       iser_cq_event_callback, (void *)comp,
> +                                       max_cqe, i, 0);

Don't change indentation needlessly.


>                 if (IS_ERR(comp->cq)) {
>                         comp->cq = NULL;
>                         goto cq_err;


> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
> b/drivers/infiniband/ulp/isert/ib_isert.c
> index 327529e..f7d9ae0 100644
> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> @@ -320,11 +320,12 @@ isert_alloc_comps(struct isert_device *device,
>  
>                 comp->device = device;
>                 INIT_WORK(&comp->work, isert_cq_work);
> -               comp->cq = ib_create_cq(device->ib_device,
> -                                       isert_cq_callback,
> +               comp->cq = ib_create_cq(device->ib_device, isert_cq_callback,

Don't change indentation.

>                                         isert_cq_event_callback,
>                                         (void *)comp,
> -                                       max_cqe, i);
> +                                       max_cqe,
> +                                       i,
> +                                       0);
>                 if (IS_ERR(comp->cq)) {
>                         isert_err("Unable to allocate cq\n");
>                         ret = PTR_ERR(comp->cq);

> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index ad0e2ea..1bbe4a4 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -173,6 +173,10 @@ struct ib_odp_caps {
>         } per_transport_caps;
>  };
>  
> +enum ib_cq_creation_flags {
> +       IB_CQ_FLAGS_TIMESTAMP   = 1 << 0,
> +};
> +

That must be part of a different patch; otherwise the commit message
is not true.

>  struct ib_cq_init_attr {
>         int cqe;
>         int comp_vector;

> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 3df8320..cfb5915 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -901,21 +901,16 @@ static struct svc_xprt *svc_rdma_accept(struct
> svc_xprt *xprt)
>                 goto errout;
>         }
>         newxprt->sc_sq_cq = ib_create_cq(newxprt->sc_cm_id->device,
> -                                        sq_comp_handler,
> -                                        cq_event_handler,
> -                                        newxprt,
> -                                        newxprt->sc_sq_depth,
> -                                        0);
> +                                        sq_comp_handler, cq_event_handler,
> +                                        newxprt, newxprt->sc_sq_depth, 0, 0);

Don't change indentation.

>         if (IS_ERR(newxprt->sc_sq_cq)) {
>                 dprintk("svcrdma: error creating SQ CQ for connect
> request\n");
>                 goto errout;
>         }
>         newxprt->sc_rq_cq = ib_create_cq(newxprt->sc_cm_id->device,
> -                                        rq_comp_handler,
> -                                        cq_event_handler,
> -                                        newxprt,
> -                                        newxprt->sc_max_requests,
> -                                        0);
> +                                        rq_comp_handler, cq_event_handler,
> +                                        newxprt, newxprt->sc_max_requests,
> +                                        0, 0);

Don't change indentation.

>         if (IS_ERR(newxprt->sc_rq_cq)) {
>                 dprintk("svcrdma: error creating RQ CQ for connect
> request\n");
>                 goto errout;


Regards

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 01/10] IB/core: Change provider's API of create_cq to be extendible
       [not found]         ` <20150519184535.GJ18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-05-20 15:14           ` Yann Droneaud
@ 2015-05-20 16:41           ` Matan Barak
       [not found]             ` <CAAKD3BAXBPgkFp=bmeDNoj6QbAXoHMGt_nv+LHCE09g5=nyPGQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: Matan Barak @ 2015-05-20 16:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak

On Tue, May 19, 2015 at 9:45 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Sun, May 17, 2015 at 04:36:17PM +0300, Or Gerlitz wrote:
>> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> Add a new ib_cq_init_attr structure which contains the
>> previous cqe (minimum number of CQ entries) and comp_vector
>> (completion vector) in addition to a new flags field.
>> All vendors' create_cq callbacks are changed in order
>> to work with the new API.
>>
>> This commit does not change any functionality.
>
> This seems reasonable to me.
>
>> @@ -1341,6 +1341,7 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
>>       struct ib_uverbs_event_file    *ev_file = NULL;
>>       struct ib_cq                   *cq;
>>       int                             ret;
>> +     struct ib_cq_init_attr attr = {.cqe = 0};
>
> This doesn't seem necessary, it is unconditionally set below:
>

Almost :) It also zeros (default value) all other fields. I could
replace it with a memset if it's clearer.

>>       if (out_len < sizeof resp)
>>               return -ENOSPC;
>> @@ -1376,8 +1377,9 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
>>       INIT_LIST_HEAD(&obj->comp_list);
>>       INIT_LIST_HEAD(&obj->async_list);
>>
>> -     cq = file->device->ib_dev->create_cq(file->device->ib_dev, cmd.cqe,
>> -                                          cmd.comp_vector,
>> +     attr.cqe = cmd.cqe;
>> +     attr.comp_vector = cmd.comp_vector;
>> +     cq = file->device->ib_dev->create_cq(file->device->ib_dev, &attr,
>>                                            file->ucontext, &udata);
>
>> +++ b/drivers/infiniband/core/verbs.c
>> @@ -1013,8 +1013,9 @@ struct ib_cq *ib_create_cq(struct ib_device *device,
>>                          void *cq_context, int cqe, int comp_vector)
>>  {
>>       struct ib_cq *cq;
>> +     struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector};
>>
>> -     cq = device->create_cq(device, cqe, comp_vector, NULL, NULL);
>> +     cq = device->create_cq(device, &attr, NULL, NULL);
>
> Hum, I guess it makes sense to stop flowing ib_cq_init_attr at this
> point, for this patch, but it does seem a bit weird from an API design
> perspective.
>

I guess you suggest that ib_create_cq will take ib_cq_init_attr * instead of
separate parameters for cqe and comp_vector. Ok, I'll change that for next
patch set.

>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>> index 8d59479..ad0e2ea 100644
>> +++ b/include/rdma/ib_verbs.h
>> @@ -173,6 +173,12 @@ struct ib_odp_caps {
>>       } per_transport_caps;
>>  };
>>
>> +struct ib_cq_init_attr {
>> +     int cqe;
>
>  unsigned int cqe
>
> Can't be negative..
>

Correct, but that's the current API as well. I'll change that.

>> +     struct ib_cq *             (*create_cq)(struct ib_device *device,
>> +                                             struct ib_cq_init_attr *attr,
>
> const struct ib_cq_init_attr *attr,
>
> And related changes that will cause.
>

I'll change that, thanks!

Matan

> Jason

* Re: [PATCH for-next 02/10] IB/core: Add flags parameter for ib_create_cq
       [not found]             ` <555C4BAB.6090602-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-05-20 16:44               ` Matan Barak
  0 siblings, 0 replies; 71+ messages in thread
From: Matan Barak @ 2015-05-20 16:44 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Jason Gunthorpe, Or Gerlitz, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak

On Wed, May 20, 2015 at 11:54 AM, Sagi Grimberg
<sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> On 5/19/2015 9:47 PM, Jason Gunthorpe wrote:
>>
>> On Sun, May 17, 2015 at 04:36:18PM +0300, Or Gerlitz wrote:
>>>
>>> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>
>>> We add a flags parameter for ib_create_cq in order
>>> to specify creation flags for the CQ.
>>
>>
>> Why?
>>
>> At this point you should keep flowing ib_cq_init_attr up to callers of
>> ib_create_cq , so we don't have to do this again.
>>
>> It makes sense to keep that as a distinct patch..
>
>
> I completely agree with Jason. Let's move away from
> non-easily-extendable APIs.
>

I'll change it to:
struct ib_cq *ib_create_cq(struct ib_device *device,
                          ib_comp_handler comp_handler,
                          void (*event_handler)(struct ib_event *, void *),
                          void *cq_context, const struct ib_cq_init_attr *attr);
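
A minimal sketch of why passing an attribute struct by const pointer is easier to extend than a growing argument list. Every name prefixed with example_ below is a stand-in invented for illustration, not the kernel's ib_* definitions, and the rule of rejecting unknown flag bits is an assumption about how a provider might validate the input:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical creation flags, mirroring the shape of the proposed enum. */
enum example_cq_flags {
	EXAMPLE_CQ_FLAGS_TIMESTAMP = 1 << 0,
};

/* All flag bits this sketch knows about. */
#define EXAMPLE_CQ_FLAGS_KNOWN ((uint32_t)EXAMPLE_CQ_FLAGS_TIMESTAMP)

/* Stand-in for an extensible attribute struct: new fields can be added
 * here without touching the signature of every caller. */
struct example_cq_init_attr {
	unsigned int cqe;
	int comp_vector;
	uint32_t flags;
};

/* Stand-in for the object being created. */
struct example_cq {
	unsigned int cqe;
	int comp_vector;
	uint32_t flags;
};

/* Returns 0 on success, -1 if the attributes are not acceptable.
 * The attr pointer is const because the callee only reads it. */
static int example_create_cq(struct example_cq *cq,
			     const struct example_cq_init_attr *attr)
{
	assert(cq != NULL && attr != NULL);

	if (attr->cqe == 0)
		return -1;
	if (attr->flags & ~EXAMPLE_CQ_FLAGS_KNOWN)
		return -1;	/* unknown flag bits (assumed policy) */

	cq->cqe = attr->cqe;
	cq->comp_vector = attr->comp_vector;
	cq->flags = attr->flags;
	return 0;
}
```

A caller that does not care about flags fills in only cqe and comp_vector with a designated initializer and gets flags == 0 for free, so adding a field later does not force a change at every call site.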

Thanks for the review.

Matan

> Sagi.
>

* Re: [PATCH for-next 02/10] IB/core: Add flags parameter for ib_create_cq
       [not found]         ` <1432135457.5304.33.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
@ 2015-05-20 16:46           ` Matan Barak
  0 siblings, 0 replies; 71+ messages in thread
From: Matan Barak @ 2015-05-20 16:46 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak

On Wed, May 20, 2015 at 6:24 PM, Yann Droneaud <ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org> wrote:
> Hi,
>
> Le dimanche 17 mai 2015 à 16:36 +0300, Or Gerlitz a écrit :
>> diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c
>> b/drivers/infiniband/ulp/iser/iser_verbs.c
>> index cc2dd35..922d322 100644
>> --- a/drivers/infiniband/ulp/iser/iser_verbs.c
>> +++ b/drivers/infiniband/ulp/iser/iser_verbs.c
>> @@ -126,11 +126,9 @@ static int iser_create_device_ib_res(struct
>> iser_device *device)
>>                 struct iser_comp *comp = &device->comps[i];
>>
>>                 comp->device = device;
>> -               comp->cq = ib_create_cq(device->ib_device,
>> -                                       iser_cq_callback,
>> -                                       iser_cq_event_callback,
>> -                                       (void *)comp,
>> -                                       max_cqe, i);
>> +               comp->cq = ib_create_cq(device->ib_device, iser_cq_callback,
>> +                                       iser_cq_event_callback, (void *)comp,
>> +                                       max_cqe, i, 0);
>
> Don't change indentation needlessly.
>
>
>>                 if (IS_ERR(comp->cq)) {
>>                         comp->cq = NULL;
>>                         goto cq_err;
>
>
>> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c
>> b/drivers/infiniband/ulp/isert/ib_isert.c
>> index 327529e..f7d9ae0 100644
>> --- a/drivers/infiniband/ulp/isert/ib_isert.c
>> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
>> @@ -320,11 +320,12 @@ isert_alloc_comps(struct isert_device *device,
>>
>>                 comp->device = device;
>>                 INIT_WORK(&comp->work, isert_cq_work);
>> -               comp->cq = ib_create_cq(device->ib_device,
>> -                                       isert_cq_callback,
>> +               comp->cq = ib_create_cq(device->ib_device, isert_cq_callback,
>
> Don't change indentation.
>
>>                                         isert_cq_event_callback,
>>                                         (void *)comp,
>> -                                       max_cqe, i);
>> +                                       max_cqe,
>> +                                       i,
>> +                                       0);
>>                 if (IS_ERR(comp->cq)) {
>>                         isert_err("Unable to allocate cq\n");
>>                         ret = PTR_ERR(comp->cq);
>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>> index ad0e2ea..1bbe4a4 100644
>> --- a/include/rdma/ib_verbs.h
>> +++ b/include/rdma/ib_verbs.h
>> @@ -173,6 +173,10 @@ struct ib_odp_caps {
>>         } per_transport_caps;
>>  };
>>
>> +enum ib_cq_creation_flags {
>> +       IB_CQ_FLAGS_TIMESTAMP   = 1 << 0,
>> +};
>> +
>
> That must be part of a different patch, otherwise the commit message
> is not true.
>
>>  struct ib_cq_init_attr {
>>         int cqe;
>>         int comp_vector;
>
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> index 3df8320..cfb5915 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> @@ -901,21 +901,16 @@ static struct svc_xprt *svc_rdma_accept(struct
>> svc_xprt *xprt)
>>                 goto errout;
>>         }
>>         newxprt->sc_sq_cq = ib_create_cq(newxprt->sc_cm_id->device,
>> -                                        sq_comp_handler,
>> -                                        cq_event_handler,
>> -                                        newxprt,
>> -                                        newxprt->sc_sq_depth,
>> -                                        0);
>> +                                        sq_comp_handler, cq_event_handler,
>> +                                        newxprt, newxprt->sc_sq_depth, 0, 0);
>
> Don't change indentation.
>
>>         if (IS_ERR(newxprt->sc_sq_cq)) {
>>                 dprintk("svcrdma: error creating SQ CQ for connect
>> request\n");
>>                 goto errout;
>>         }
>>         newxprt->sc_rq_cq = ib_create_cq(newxprt->sc_cm_id->device,
>> -                                        rq_comp_handler,
>> -                                        cq_event_handler,
>> -                                        newxprt,
>> -                                        newxprt->sc_max_requests,
>> -                                        0);
>> +                                        rq_comp_handler, cq_event_handler,
>> +                                        newxprt, newxprt->sc_max_requests,
>> +                                        0, 0);
>
> Don't change indentation.
>
>>         if (IS_ERR(newxprt->sc_rq_cq)) {
>>                 dprintk("svcrdma: error creating RQ CQ for connect
>> request\n");
>>                 goto errout;
>

Nice catches, I'll fix that for the next version.

Thanks.

Matan

>
> Regards
>
> --
> Yann Droneaud
> OPTEYA
>
>

* Re: [PATCH for-next 08/10] IB/mlx4: Add support for timestamp in cq creation
       [not found]             ` <555C4C45.7010300-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-05-20 14:43               ` Or Gerlitz
@ 2015-05-20 17:27               ` Jason Gunthorpe
  1 sibling, 0 replies; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-20 17:27 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak

On Wed, May 20, 2015 at 11:56:37AM +0300, Sagi Grimberg wrote:
> On 5/19/2015 9:55 PM, Jason Gunthorpe wrote:
> >On Sun, May 17, 2015 at 04:36:24PM +0300, Or Gerlitz wrote:
> >>@@ -110,6 +110,7 @@ struct mlx4_ib_cq {
> >>  	struct mutex		resize_mutex;
> >>  	struct ib_umem	       *umem;
> >>  	struct ib_umem	       *resize_umem;
> >>+	int			create_flags;
> >
> >flags is u32
> 
> Maybe even u64 so we won't run out of bits like we did in
> device_cap_flags.

It is all internal so the bitwidth doesn't matter, we can change
it. My remark was because the definition of flags in the other
structure is u32, and copying u32 to int is sloppy.
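
A small sketch of the point about matching widths. The struct and flag names are hypothetical, chosen only for illustration. The idea is that storing a u32 flags word in a field of the same unsigned type avoids any conversion, whereas copying it into an int is implementation-defined in C for values above INT_MAX (for example, if bit 31 were ever used as a flag):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag occupying the top bit of a 32-bit flags word. */
#define EXAMPLE_FLAG_BIT31 (UINT32_C(1) << 31)

/* Driver-private state keeps the same type as the attribute it copies. */
struct example_cq_state {
	uint32_t create_flags;
};

static void example_store_flags(struct example_cq_state *cq,
				uint32_t attr_flags)
{
	assert(cq != NULL);
	/* Same type on both sides: no narrowing, no sign change. */
	cq->create_flags = attr_flags;
}

static int example_has_flag(const struct example_cq_state *cq, uint32_t flag)
{
	assert(cq != NULL);
	return (cq->create_flags & flag) != 0;
}
```

With an int field the same assignment would compile, but the stored value for a word with the top bit set would depend on the compiler, which is the kind of sloppiness the remark is about.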

Jason

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                     ` <1432134677.5304.23.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
@ 2015-05-20 17:37                                       ` Jason Gunthorpe
  2015-05-20 17:53                                       ` Or Gerlitz
  1 sibling, 0 replies; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-20 17:37 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak

On Wed, May 20, 2015 at 05:11:17PM +0200, Yann Droneaud wrote:
> Hi,
> 
> Le mercredi 20 mai 2015 à 17:41 +0300, Or Gerlitz a écrit :
> > On 5/20/2015 3:29 AM, Jason Gunthorpe wrote:
> > > On Tue, May 19, 2015 at 10:30:00PM +0300, Or Gerlitz wrote:
> > > > > Are you objecting to adding the clock frequency and mask to the
> > > > > query device verb?
> > > > > why?
> > > Let's see the verbs side and I'll let you know.
> > 
> > You mean the user series of libibverbs/libmlx4? I don't see why this 
> > should be a must for the review of the kernel bits. The user-space 
> > code 
> > is coming up soon, sure, but we should be able to review kernel 
> > patches 
> > without requiring to actually see the user-space code.
> > 
> 
> In some other subsystems: no userspace code, no merge.
> 
> http://blog.ffwll.ch/2015/05/gfx-kernel-upstreaming-requirements.html

I think we need to have the same policy.

Like for the time base related values, if the uapi is some kind of
'ibv_get_wc_timestamp_ns' call (which would make sense) then those are
actually vendor values and we don't need them in the public structure.
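
To make the idea concrete, here is a rough sketch of the conversion such a helper might perform inside a vendor library. The function name, its parameters, and the choice of kHz as the unit for the core clock frequency are all assumptions made for illustration; the thread does not specify them:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a raw completion timestamp (in HCA clock cycles) to nanoseconds.
 *
 * raw  - timestamp as read from the completion
 * mask - valid timestamp bits; 0 is taken to mean "not supported"
 * khz  - core clock frequency in kHz (assumed unit)
 *
 * Returns 0 when timestamping is unsupported. The result is assumed to
 * fit in 64 bits, which holds for realistic clock rates and masks.
 */
static uint64_t example_cycles_to_ns(uint64_t raw, uint64_t mask, uint64_t khz)
{
	uint64_t cycles, whole_ms, rem;

	if (mask == 0 || khz == 0)
		return 0;

	/* Keeps rem * 1000000 below 2^64 in the expression further down. */
	assert(khz < (UINT64_C(1) << 40));

	cycles = raw & mask;

	/* khz cycles elapse per millisecond; split to limit overflow. */
	whole_ms = cycles / khz;
	rem = cycles % khz;

	return whole_ms * UINT64_C(1000000) + (rem * UINT64_C(1000000)) / khz;
}
```

If the conversion lives behind a call like this, the application never sees the mask or the frequency, which is why they would not need to appear in the public structure.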

Jason

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                     ` <1432134677.5304.23.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
  2015-05-20 17:37                                       ` Jason Gunthorpe
@ 2015-05-20 17:53                                       ` Or Gerlitz
       [not found]                                         ` <CAJ3xEMjO2HqFePAwXnoCt0rCzRLfYbqdP+883v4LaMovj7Y-FA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: Or Gerlitz @ 2015-05-20 17:53 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: Or Gerlitz, Jason Gunthorpe, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak

On Wed, May 20, 2015 at 6:11 PM, Yann Droneaud <ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org> wrote:

>> But this is the whole purpose of the udata framework in uverbs, right? for
>> each uverb command the vendor user-space library has a well defined
>> channel to communicate directly with the low level vendor driver
>> throughout the uverbs channels.

> Uverbs convey information between kernel and userspace drivers to
> implement verbs for userspace applications. I don't think it's designed
> to allow vendors to add random extensions in the best way with regard to
> backward/forward compatibility.

Disagree that this is random extension. The people that designed this
stack 10y ago (Roland and Co.) looked very nicely forward and realized
that not all the HW are the same nor can be put 101% under the same
API with no way out, and hence they came up with udata.

Please state how you see the role of the uverbs udata mechanism.

* Re: [PATCH for-next 01/10] IB/core: Change provider's API of create_cq to be extendible
       [not found]             ` <CAAKD3BAXBPgkFp=bmeDNoj6QbAXoHMGt_nv+LHCE09g5=nyPGQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-05-20 19:01               ` Jason Gunthorpe
  0 siblings, 0 replies; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-20 19:01 UTC (permalink / raw)
  To: Matan Barak
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak

On Wed, May 20, 2015 at 07:41:17PM +0300, Matan Barak wrote:
> On Tue, May 19, 2015 at 9:45 PM, Jason Gunthorpe
> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> > On Sun, May 17, 2015 at 04:36:17PM +0300, Or Gerlitz wrote:
> >> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>
> >> Add a new ib_cq_init_attr structure which contains the
> >> previous cqe (minimum number of CQ entries) and comp_vector
> >> (completion vector) in addition to a new flags field.
> >> All vendors' create_cq callbacks are changed in order
> >> to work with the new API.
> >>
> >> This commit does not change any functionality.
> >
> > This seems reasonable to me.
> >
> >> @@ -1341,6 +1341,7 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file,
> >>       struct ib_uverbs_event_file    *ev_file = NULL;
> >>       struct ib_cq                   *cq;
> >>       int                             ret;
> >> +     struct ib_cq_init_attr attr = {.cqe = 0};
> >
> > This doesn't seem necessary, it is unconditionally set below:
> >
> 
> Almost :) It also zeros (default value) all other fields. I could
> replace it with a memset if it's clearer.

All fields are set unconditionally, so there is no need to init.

= {};

is the idiomatic way to zero a structure.
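
For reference, a short sketch comparing the three spellings discussed in this sub-thread. The struct is a stand-in, not the kernel's ib_cq_init_attr. Note that the empty-brace form is a GNU extension before C23, accepted by GCC and Clang, which is the relevant dialect for kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in attribute struct for illustration only. */
struct example_cq_init_attr {
	int cqe;
	int comp_vector;
	unsigned int flags;
};

/* Naming one member zero-initializes every member not named. */
static struct example_cq_init_attr example_attr_designated(void)
{
	struct example_cq_init_attr attr = {.cqe = 0};
	return attr;
}

/* Empty braces: zero-initializes all members (GNU extension before C23). */
static struct example_cq_init_attr example_attr_empty(void)
{
	struct example_cq_init_attr attr = {};
	return attr;
}

/* Explicit memset: sets every byte of the struct to zero. */
static struct example_cq_init_attr example_attr_memset(void)
{
	struct example_cq_init_attr attr;

	memset(&attr, 0, sizeof(attr));
	return attr;
}

static int example_attr_is_zero(const struct example_cq_init_attr *attr)
{
	assert(attr != NULL);
	return attr->cqe == 0 && attr->comp_vector == 0 && attr->flags == 0;
}
```

All three leave every member at zero; the empty-brace form says so without singling out one field.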

Jason

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                         ` <CAJ3xEMjO2HqFePAwXnoCt0rCzRLfYbqdP+883v4LaMovj7Y-FA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-05-23  4:26                                           ` Or Gerlitz
       [not found]                                             ` <CAJ3xEMgppUPqOFm7L0+fR3z_N2ALZFb+GpSAGvyyNROZoFCUPA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Or Gerlitz @ 2015-05-23  4:26 UTC (permalink / raw)
  To: Yann Droneaud, Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak

On Wed, May 20, 2015 at 8:53 PM, Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Wed, May 20, 2015 at 6:11 PM, Yann Droneaud <ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org> wrote:
>
>>> But this is the whole purpose of the udata framework in uverbs, right? for
>>> each uverb command the vendor user-space library has a well defined
>>> channel to communicate directly with the low level vendor driver
>>> throughout the uverbs channels.
>
>> Uverbs convey information between kernel and userspace drivers to
>> implement verbs for userspace applications. I don't think it's designed
>> to allow vendors to add random extensions in the best way with regard to
>> backward/forward compatibility.
>
> Disagree that this is random extension. The people that designed this
> stack 10y ago (Roland and Co.) looked very nicely forward and realized
> that not all the HW are the same nor can be put 101% under the same
> API with no way out, and hence they came up with udata.
>
> Please state how you see the role of the uverbs udata mechanism.

Guys, still waiting to hear why you think it's wrong here to use the
mechanism which was built from day-1 for the purpose of allowing the
user-space driver library to communicate with the kernel driver and
pass values in both directions.

Or.

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                             ` <CAJ3xEMgppUPqOFm7L0+fR3z_N2ALZFb+GpSAGvyyNROZoFCUPA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-05-26  8:10                                               ` Or Gerlitz
       [not found]                                                 ` <CAJ3xEMinNsQrmQ7atAVkshefBDBQ1HRuGbOT_cJ4RxEj4dWUPA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Or Gerlitz @ 2015-05-26  8:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak, Yann Droneaud

On Sat, May 23, 2015 at 7:26 AM, Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Wed, May 20, 2015 at 8:53 PM, Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On Wed, May 20, 2015 at 6:11 PM, Yann Droneaud <ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org> wrote:
>>
>>>> But this is the whole purpose of the udata framework in uverbs, right? for
>>>> each uverb command the vendor user-space library has a well defined
>>>> channel to communicate directly with the low level vendor driver
>>>> throughout the uverbs channels.
>>
>>> Uverbs convey information between kernel and userspace drivers to
>>> implement verbs for userspace applications. I don't think it's designed
>>> to allow vendors to add random extensions in the best way with regard to
>>> backward/forward compatibility.
>>
>> Disagree that this is random extension. The people that designed this
>> stack 10y ago (Roland and Co.) looked very nicely forward and realized
>> that not all the HW are the same nor can be put 101% under the same
>> API with no way out, and hence they came up with udata.
>>
>> Please state how you see the role of the uverbs udata mechanism.
>
> Guys, still waiting to hear why you think it's wrong here to use the
> mechanism which was built from day-1 for the purpose of allowing the
> user-space driver library to communicate with the kernel driver and
> pass values in both directions.

Jason, ping. It's fair to require that if you made a review argument against
the design done here, and we responded about a week ago saying why
this design is valid (e.g., it follows the 10y old IB stack udata mechanism and
such), you would comment on the response and not leave it in the air.

Or.

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                 ` <CAJ3xEMinNsQrmQ7atAVkshefBDBQ1HRuGbOT_cJ4RxEj4dWUPA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-05-26 16:06                                                   ` Jason Gunthorpe
       [not found]                                                     ` <20150526160624.GA11800-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-26 16:06 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak, Yann Droneaud

On Tue, May 26, 2015 at 11:10:45AM +0300, Or Gerlitz wrote:

> Jason, ping, it's fair to require that if you made a review argument against
> the design done here and we've responded about a week ago, saying why
> this design is valid (e.g goes along the 10y old IB stack udata mechanism and
> such) -- you would comment on the response and not  leave it in the air.

Was it not clear? Yann and I asked to see the user space side before
reviewing this series further.

Jason

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                     ` <20150526160624.GA11800-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-26 18:33                                                       ` Or Gerlitz
       [not found]                                                         ` <CAJ3xEMj5qphSq5QQHKPsihuK6StmJ+NHiPbkY-ZVGmpViB1AXA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Or Gerlitz @ 2015-05-26 18:33 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak, Yann Droneaud

On Tue, May 26, 2015 at 7:06 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Tue, May 26, 2015 at 11:10:45AM +0300, Or Gerlitz wrote:
>
>> Jason, ping, it's fair to require that if you made a review argument against
>> the design done here and we've responded about a week ago, saying why
>> this design is valid (e.g goes along the 10y old IB stack udata mechanism and
>> such) -- you would comment on the response and not  leave it in the air.
>
> Was it not clear? Yann and I asked to see the user space side before
> reviewing this series further.

Jason, you (U2 BTW) play really, really hard - refusing to say **one**
word on your approach towards the built-in udata mechanism for uverbs
which I asked you to comment on.

On top of that, as happens **all** the **time** in netdev and possibly
other subsystems, user space facing kernel patches were reviewed and
accepted in this list over the last ten years without seeing their
user-space counterparts at the time of the kernel submission. There's
no reason to impose this as hard requirement just b/c two reviewers
ask that. You don't own this place.

Or.

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                         ` <CAJ3xEMj5qphSq5QQHKPsihuK6StmJ+NHiPbkY-ZVGmpViB1AXA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-05-26 18:53                                                           ` Jason Gunthorpe
       [not found]                                                             ` <20150526185315.GF11800-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-26 18:53 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak, Yann Droneaud

On Tue, May 26, 2015 at 09:33:18PM +0300, Or Gerlitz wrote:
> On Tue, May 26, 2015 at 7:06 PM, Jason Gunthorpe
> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> > On Tue, May 26, 2015 at 11:10:45AM +0300, Or Gerlitz wrote:
> >
> >> Jason, ping, it's fair to require that if you made a review argument against
> >> the design done here and we've responded about a week ago, saying why
> >> this design is valid (e.g goes along the 10y old IB stack udata mechanism and
> >> such) -- you would comment on the response and not  leave it in the air.
> >
> > Was it not clear? Yann and I asked to see the user space side before
> > reviewing this series further.
> 
> Jason, you (U2 BTW) play really, really hard - refusing to say **one**
> word on your approach towards the built-in udata mechanism for uverbs
> which I asked you to comment on.

And I asked to see the user space side and you have angrily refused
every time.

So I guess we are both playing hard.

FWIW, your comments on udata seemed compelling, but I want to see the
whole solution before saying I'm OK with it.

Jason

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                             ` <20150526185315.GF11800-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-26 20:39                                                               ` Or Gerlitz
       [not found]                                                                 ` <CAJ3xEMh7JYN8sP5MYdVjH=tqPFURwvD7jVvV0=PBiEn1qEFi2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Or Gerlitz @ 2015-05-26 20:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak, Yann Droneaud

On Tue, May 26, 2015 at 9:53 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Tue, May 26, 2015 at 09:33:18PM +0300, Or Gerlitz wrote:
>> On Tue, May 26, 2015 at 7:06 PM, Jason Gunthorpe
>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>> > On Tue, May 26, 2015 at 11:10:45AM +0300, Or Gerlitz wrote:
>> >
>> >> Jason, ping, it's fair to require that if you made a review argument against
>> >> the design done here and we've responded about a week ago, saying why
>> >> this design is valid (e.g goes along the 10y old IB stack udata mechanism and
>> >> such) -- you would comment on the response and not  leave it in the air.
>> >
>> > Was it not clear? Yann and I asked to see the user space side before
>> > reviewing this series further.
>>
>> Jason, you (U2 BTW) play really, really hard - refusing to say **one**
>> word on your approach towards the built-in udata mechanism for uverbs
>> which I asked you to comment on.

> And I asked to see the user space side and you have angrily refused
> every time.

AFAIR I never ever refused to show any piece of code which went under
my hands towards Linux to any-one.

> So I guess we are both playing hard.

I disagree, you act as sort of being the boss here, stating every now
and then your preferences and way of engineering things as the
ultimate guidelines for Linux and/or RDMA engineering.

> FWIW, your comments on udata seemed compelling

Good to hear

> but I want to see the whole solution before saying I'm OK with it.

go look, not the final cut but should be close to what we'll submit

https://github.com/matanb10/libibverbs timestamp-v0
https://github.com/matanb10/libmlx4 timestamp-v0

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                 ` <CAJ3xEMh7JYN8sP5MYdVjH=tqPFURwvD7jVvV0=PBiEn1qEFi2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-05-26 22:07                                                                   ` Jason Gunthorpe
       [not found]                                                                     ` <20150526220724.GC4502-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-26 22:07 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak, Yann Droneaud

On Tue, May 26, 2015 at 11:39:04PM +0300, Or Gerlitz wrote:
> >> Jason, you (U2 BTW) play really, really hard - refusing to say **one**
> >> word on your approach towards the built-in udata mechanism for uverbs
> >> which I asked you to comment on.
> 
> > And I asked to see the user space side and you have angrily refused
> > every time.
> 
> AFAIR I never ever refused to show any piece of code which went under
> my hands towards Linux to any-one.

For future reference, when someone asks a question and you go off on
a tangential rant and ignore the question, then that process repeats,
still without answering the question - most English speakers would
call that refusing to answer the question. It is not looked upon
kindly.

I'm really confused why you didn't just post the github links last
week, the patches are all a month old on there. Was it really so
offensive to you that we wanted to review the kernel UAPI patches and
verbs patches together?

> > So I guess we are both playing hard.
> 
> I disagree, you act as sort of being the boss here, stating every now
> and then your preferences and way of engineering things as the
> ultimate guidelines for Linux and/or RDMA engineering.

Let's be clear Or, I have given you (and others) some very pointed
comments and advice, privately and publicly. That is not 'being the
boss'; that is contributing to fix our community.

When it comes to my patch comments, I give direction on what I want to
see to provide my Reviewed-By.

If you don't like it, then find someone else to review your code.

I'm busy, and I don't work for you. If I don't want to review some
patches because my questions have been ignored, then that is entirely
my prerogative.

Pinging me *three times this week* on this stupid timestamp thing, is
somewhere between annoying and offensive.

Chill out Or.

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                     ` <20150526220724.GC4502-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-27 11:54                                                                       ` Or Gerlitz
       [not found]                                                                         ` <CAJ3xEMgdqBnn6hobRJFut6Nyu2pKt40rb46Z3dYx3MaohR2urA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Or Gerlitz @ 2015-05-27 11:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak, Yann Droneaud

On Wed, May 27, 2015 at 1:07 AM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Tue, May 26, 2015 at 11:39:04PM +0300, Or Gerlitz wrote:
>> >> Jason, you (U2 BTW) play really, really hard - refusing to say **one**
>> >> word on your approach towards the built-in udata mechanism for uverbs
>> >> which I asked you to comment on.
>>
>> > And I asked to see the user space side and you have angrily refused
>> > every time.
>>
>> AFAIR I never ever refused to show any piece of code which went under
>> my hands towards Linux to any-one.

> For future reference, when someone asks a question and you go off on
> a tangential rant and ignore the question, then that process repeats,
> still without answering the question - most English speakers would
> call that refusing to answer the question. It is not looked upon kindly.

Jason,

It's not that you asked to see the code ala "hey, do you happen to
have a git with the user space code for people to inspect while doing
the review on the kernel part", but rather U2 saying in a definitive
manner that posting the user space code should be imposed as
pre-requirement to acceptance of the kernel parts.

In parallel, U2 totally rejected our usage of udata @ on the spot and
when I mentioned that it's a feature which was designed for that
purpose exactly and from day one, it took me three reminders to get a
"you know what, maybe that can fly" comment from you.

So here I started to realize that there's something in the attitude
that goes beyond the details, and I made the you're not the boss
comment.


> I'm really confused why you didn't just post the github links last
> week, the patches are all a month old on there. Was it really so
> offensive to you that we wanted to review the kernel UAPI patches and
> verbs patches together?
>
>> > So I guess we are both playing hard.
>>
>> I disagree, you act as sort of being the boss here, stating every now
>> and then your preferences and way of engineering things as the
>> ultimate guidelines for Linux and/or RDMA engineering.
>
> Let's be clear Or, I have given you (and others) some very pointed
> comments and advice, privately and publicly. That is not 'being the
> boss'; that is contributing to fix our community.
>
> When it comes to my patch comments, I give direction on what I want to
> see to provide my Reviewed-By.
>
> If you don't like it, then find someone else to review your code.
>
> I'm busy, and I don't work for you. If I don't want to review some
> patches because my questions have been ignored, then that is entirely
> my prerogative.
>
> Pinging me *three times this week* on this stupid timestamp thing, is
> somewhere between annoying and offensive.

See above, why I made these pings.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                         ` <CAJ3xEMgdqBnn6hobRJFut6Nyu2pKt40rb46Z3dYx3MaohR2urA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-05-27 18:48                                                                           ` Jason Gunthorpe
       [not found]                                                                             ` <20150527184856.GA16059-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-27 18:48 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak, Yann Droneaud

On Wed, May 27, 2015 at 02:54:12PM +0300, Or Gerlitz wrote:
> On Wed, May 27, 2015 at 1:07 AM, Jason Gunthorpe
> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> > On Tue, May 26, 2015 at 11:39:04PM +0300, Or Gerlitz wrote:
> >> >> Jason, you (U2 BTW) play really, really hard - refusing to say **one**
> >> >> word on your approach towards the built-in udata mechanism for uverbs
> >> >> which I asked you to comment on.
> >>
> >> > And I asked to see the user space side and you have angrily refused
> >> > every time.
> >>
> >> AFAIR I never ever refused to show any piece of code which went under
> >> my hands towards Linux to any-one.
> 
> > For future reference, when someone asks a question and you go off on
> > a tangential rant and ignore the question, then that process repeats,
> > still without answering the question - most English speakers would
> > call that refusing to answer the question. It is not looked upon kindly.
> 
> Jason,
> 
> It's not that you asked to see the code ala "hey, do you happen to
> have a git with the user space code for people to inspect while
> doing the review on the kernel part", but rather U2 saying in a
> definitive manner that posting the user space code should be imposed
> as pre-requirement to acceptance of the kernel parts.

I really didn't Or:

First ask:
 'We can't really look at the uapi changes here without also seeing the
  verbs side changes.'
 (I know others on the list feel the same, so I use 'we')

Second ask:
 'Lets see the verbs side and I'll let you know.'
(.. to your questions based on my review comments ..)

Third ask:
 'I think we need to have the same policy.'
(.. To Yann's point that other kernel communities have a mandatory
  UAPI policy)

Fourth (exasperated) ask:
 'Was it not clear? Yann and I asked to see the user space side before
  reviewing this series further.'

I know you are ESL, and I cut you a lot of slack, but *come on* - that
is incredibly soft language, and certainly not bossing and imposing in
a definitive manner a blanket requirement on all patches.

Advice: You would be well served to spend a bit more time on your
emails. I have no idea what 'but rather U2 saying' means, for
instance. Sometimes I just guess at what you are trying to say :|

That is the price we pay for an inclusive international community, but
everyone needs to be careful before starting a flame war based on
perceived slight in the text and phrasing of a message. email is hard.

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                             ` <20150527184856.GA16059-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-27 21:33                                                                               ` Or Gerlitz
       [not found]                                                                                 ` <CAJ3xEMh_g0zBza5HoVZTy9cTRbXqatzvBMAJcRjQhH7Zi=cHAg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Or Gerlitz @ 2015-05-27 21:33 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak, Yann Droneaud

On Wed, May 27, 2015 at 9:48 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Wed, May 27, 2015 at 02:54:12PM +0300, Or Gerlitz wrote:
>> On Wed, May 27, 2015 at 1:07 AM, Jason Gunthorpe
>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>> > On Tue, May 26, 2015 at 11:39:04PM +0300, Or Gerlitz wrote:
>> >> >> Jason, you (U2 BTW) play really, really hard - refusing to say **one**
>> >> >> word on your approach towards the built-in udata mechanism for uverbs
>> >> >> which I asked you to comment on.
>> >>
>> >> > And I asked to see the user space side and you have angrily refused
>> >> > every time.
>> >>
>> >> AFAIR I never ever refused to show any piece of code which went under
>> >> my hands towards Linux to any-one.
>>
>> > For future reference, when someone asks a question and you go off on
>> > a tangential rant and ignore the question, then that process repeats,
>> > still without answering the question - most English speakers would
>> > call that refusing to answer the question. It is not looked upon kindly.
>>
>> Jason,
>>
>> It's not that you asked to see the code ala "hey, do you happen to
>> have a git with the user space code for people to inspect while
>> doing the review on the kernel part", but rather U2 saying in a
>> definitive manner that posting the user space code should be imposed
>> as pre-requirement to acceptance of the kernel parts.
>
> I really didn't Or:
>
> First ask:
>  'We can't really look at the uapi changes here without also seeing the
>   verbs side changes.'
>  (I know others on the list feel the same, so I use 'we')
>
> Second ask:
>  'Lets see the verbs side and I'll let you know.'
> (.. to your questions based on my review comments ..)
>
> Third ask:
>  'I think we need to have the same policy.'
> (.. To Yann's point that other kernel communities have a mandatory
>   UAPI policy)
>
> Fourth (exasperated) ask:
>  'Was it not clear? Yann and I asked to see the user space side before
>   reviewing this series further.'
>
> I know you are ESL, and I cut you a lot of slack, but *come on* - that
> is incredibly soft language, and certainly not bossing and imposing in
> a definitive manner a blanket requirement on all patches.

Jason,

ESL indeed am I, and in that respect, this clarification, even if
being tedious to set or read, helps.

Still, I'd like to try further to show you where the bossing thing came from:

(1) "show me the user space code prior to acceptance of the kernel
part" never was a requirement on this community since the day we were
born (Q4/2004)

(2) instantly rejecting a usage of a mechanism existing just for that
use case since the first year of our life (2005)

is something perceived by me as two people (that's the U2) that come
and say, "game's over, the old, non-functioning boss is gone, new boss
(== we) in town and forget about everything you knew before".

So you say that it was a wrong perception; I hope so. Let's see how Doug
sees your feedback, namely either as community reviewer feedback or as
new rules being set overnight. Waiting.

This series is (1) simple compared to other stuff being reviewed here
nowadays, and (2) has very nice value to latency sensitive
applications, so two wins, let's get it done.

> Advice: You would be well served to spend a bit more time on your
> emails. I have no idea what 'but rather U2 saying' means, for
> instance. Sometimes I just guess at what you are trying to say :|

point taken

> That is the price we pay for an inclusive international community, but
> everyone needs to be careful before starting a flame war based on
> perceived slight in the text and phrasing of a message. email is hard.

point taken. In this case (as you can see from my response above) I am
not convinced yet that this was a false positive.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                 ` <CAJ3xEMh_g0zBza5HoVZTy9cTRbXqatzvBMAJcRjQhH7Zi=cHAg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-05-27 22:21                                                                                   ` Jason Gunthorpe
       [not found]                                                                                     ` <20150527222108.GA7855-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-27 22:21 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak, Yann Droneaud

On Thu, May 28, 2015 at 12:33:58AM +0300, Or Gerlitz wrote:

> (2) instantly rejecting a usage of a mechanism existing just for that
> use case since the first year of our life (2005)

I didn't reject anything, I said I wanted to see the user side to
continue my review.

> This series is (1) simple compared to other stuff being reviewed here
> nowadays,

What? It is 11 patches long, introduces several UAPI changes, does
not implement a standardized feature, adds new uses of latent kernel
functions and exists to support a unique feature of a single hardware
vendor that few understand the usecase for.

I would describe this as one of the toughest to review series on
patchworks right now.

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                     ` <20150527222108.GA7855-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-28  7:04                                                                                       ` Or Gerlitz
       [not found]                                                                                         ` <5566BDE4.50709-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Or Gerlitz @ 2015-05-28  7:04 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Lameter
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai,
	Tal Alon, Matan Barak, Yann Droneaud

On 5/28/2015 1:21 AM, Jason Gunthorpe wrote:
> exists to support a unique feature of a single hardware vendor that few understand the use case for

Responding in EIM (End In Mind) manner

The use case is very clear: low latency applications using UD or RAW
PACKET QPs that need to know the time it takes for different HW/SW
layers to get their packets through. This is the verbs version of
SO_TIMESTAMP and friends (Documentation/networking/timestamping*) --
Christoph, can you add some info on common use-cases for this?

I bet that > 20 upstream Eth NIC drivers support time-stamping, so
there's no reason that a modern HCA will not support it too.

> It is 11 patches long, introduces several UAPI changes,

      1  IB/core: Change provider's API of create_cq to be extendible
      2  IB/core: Change ib_create_cq to use struct ib_cq_init_attr
      3  IB/core: Add CQ creation time-stamping flag
      4  IB/core: Extend ib_uverbs_create_cq
      5  IB/core: Add timestamp_mask and hca_core_clock to query_device
      6  IB/core: Pass hardware specific data in query_device
      7  IB/mlx4: Add mmap call to map the hardware clock
      8  IB/mlx4: Support extended create_cq and query_device uverbs
      9  IB/mlx4: Add support for timestamp in cq creation
     10  IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
     11  IB/mlx4: Return hca core clock's offset in query_device vendor's data

01-02 just cosmetics that don't add any new functionality
03    adding CQ creation flag to the kernel verbs
04    new uverbs API to extend CQ creation
05    extending uverbs query device to return two more values
06    small fix to missing udata mechanics in uverbs query device
07-11 mlx4 provider side of the CQ setup and clock mmapping to user-space

the core of the review should be around the 03-06 zone, and with experts
such as Yann (and you) the uverbs part shouldn't be too complex to review
and fix if needed.

> does not implement a standardized feature,

This is standard in Eth NIC, return the time-stamp of when the packet 
arrived/sent


> adds new uses of latent kernel functions

ESL I am

Or.


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                         ` <5566BDE4.50709-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-05-28 14:13                                                                                           ` Christoph Lameter
       [not found]                                                                                             ` <alpine.DEB.2.11.1505280908001.16142-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Christoph Lameter @ 2015-05-28 14:13 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jason Gunthorpe, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

On Thu, 28 May 2015, Or Gerlitz wrote:

> The use case is very clear, low latency applications using UD or RAW PACKET
> QPs that needs to know the time it takes for different HW/SW layers to get
> their packets through. The verbs version of SO_TIMESTAMP and friends
> (Documentation/networking/timestamping*)-- Christoph, can you add some info on
> common use-cases for this?

Well, timestamp information is widely used in the financial industry to track
latency and also for fairness. Exchanges/Brokers etc often must guarantee
that the packet received first is not processed after packets received
later. Packet timestamps in many ways affect processing and are also used
for error checking etc.

What we have to do without this is to use RDTSC to get a timestamp but the
packet reception / sending time then is inaccurate due to the instructions that
have to be executed before and after. And there is additional overhead due
to that processing.

> I bet that > 20 upstream Eth NIC drivers supports time-stamping, so there's no
> reason that a modern HCA will not support it too.

Right. Ethernet timekeeping support and timestamping has become fairly
standard. Even the onboard ones do that these days.

> > does not implement a standardized feature,
>
> This is standard in Eth NIC, return the time-stamp of when the packet
> arrived/sent

It needs desperately to be in the standard.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                             ` <alpine.DEB.2.11.1505280908001.16142-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2015-05-28 16:24                                                                                               ` Jason Gunthorpe
       [not found]                                                                                                 ` <20150528162416.GA6515-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-28 16:24 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

On Thu, May 28, 2015 at 09:13:41AM -0500, Christoph Lameter wrote:
> What we have to do without this is to use RDTSC to get a timestamp but the
> packet reception / sending time then is inaccurate due to the instructions that
> have to be executed before and after. And there is additional overhead due
> to that processing.

After a quick look through, the biggest question in my mind is what
should the timestamp value in the wc be?

Right now it is some coded thing in clock cycles.

Should we require the driver to convert to ns before passing the wc
back to the app? (Looks like the socket implementation uniformly uses
us or ns)

Should the app open code the conversion from clock cycles to ns, or
vfunc down to the driver?

Is the coding going to be OK for multiple HW vendors?

These questions affect what the UAPI should be, if the answer is 'use
ns everywhere' then some of this stuff being added does not belong in
the general API.

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                 ` <20150528162416.GA6515-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-28 17:14                                                                                                   ` Christoph Lameter
       [not found]                                                                                                     ` <alpine.DEB.2.11.1505281212330.1711-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Christoph Lameter @ 2015-05-28 17:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

On Thu, 28 May 2015, Jason Gunthorpe wrote:

> After a quick look through, the biggest question in my mind is what
> should the timestamp value in the wc be?
>
> Right now it is some coded thing in clock cycles.

This is sufficient since it can be converted to ns or whatever one wants.

> Should we require the driver to convert to ns before passing the wc
> back to the app? (Looks like the socket implementation uniformly uses
> us or ns)

But that requires additional processing.

> Should the app open code the conversion from clock cycles to ns, or
> vfunc down to the driver?

The API provides the ability to retrieve the clock freq which is
sufficient for the user to convert the value to meaningful time values.




^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                     ` <alpine.DEB.2.11.1505281212330.1711-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2015-05-28 17:50                                                                                                       ` Jason Gunthorpe
       [not found]                                                                                                         ` <20150528175043.GA10966-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-05-29 16:59                                                                                                       ` Doug Ledford
  1 sibling, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-28 17:50 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

On Thu, May 28, 2015 at 12:14:15PM -0500, Christoph Lameter wrote:
> On Thu, 28 May 2015, Jason Gunthorpe wrote:
> 
> > After a quick look through, the biggest question in my mind is what
> > should the timestamp value in the wc be?
> >
> > Right now it is some coded thing in clock cycles.
> 
> This is sufficient since it can be converted to ns or whatever one wants.

Sure it is sufficient, but is it a robust UAPI, will it support
multiple hardware vendors?

Is anyone else in ethernet using verbs to deliver IP packets?

Having a conversion function, or doing it in the wc generation is more
'obviously safe' for future proofing the UAPI.

> > Should we require the driver to convert to ns before passing the wc
> > back to the app? (Looks like the socket implementation uniformly uses
> > us or ns)

> But that requires additional processing.

Well, it is only additional if the app is going to ignore the time
stamp or not convert it to ns right away. Is that the common use case?

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                         ` <20150528175043.GA10966-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-28 18:30                                                                                                           ` Christoph Lameter
       [not found]                                                                                                             ` <alpine.DEB.2.11.1505281325150.1794-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Christoph Lameter @ 2015-05-28 18:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

On Thu, 28 May 2015, Jason Gunthorpe wrote:

> > This is sufficient since it can be converted to ns or whatever one wants.
>
> Sure it is sufficient, but is it a robust UAPI, will it support
> multiple hardware vendors?

What would prevent other hardware vendors from exporting their counters?

> Is anyone else in ethernet using verbs to deliver IP packets?

This is not only for Ethernet. Internally Infiniband is frequently used
and there also timestamps are useful.

> Having a conversion function, or doing it in the wc generation is more
> 'obviously safe' for future proofing the UAPI.

Well no. There has been a history of putting time corrections etc etc into
these. Once you move from a raw counter to actual time various
complications may need to be dealt with. For simple time differentials the
counter is sufficient. If you really want proper "time" when something
happens then you may want to scale and correct etc the value and have some
sort of time sync approach.

> > But that requires additional processing.
>
> Well, it is only additional if the app is going to ignore the time
> stamp or not convert it to ns right away. Is that the common use case?

The app may not need proper time but just a cycle count differential. A
cycle count differential is often easier to handle than a ns value. And
having ns values leads to the assumption that they are "correct" so
various factors related to tuning the clock etc may have to be applied.
You want to deal with these issues only if really necessary.



^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                             ` <alpine.DEB.2.11.1505281325150.1794-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2015-05-28 19:50                                                                                                               ` Jason Gunthorpe
       [not found]                                                                                                                 ` <20150528195034.GA11182-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-28 19:50 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

On Thu, May 28, 2015 at 01:30:52PM -0500, Christoph Lameter wrote:
> On Thu, 28 May 2015, Jason Gunthorpe wrote:
> 
> > > This is sufficient since it can be converted to ns or whatever one wants.
> >
> > Sure it is sufficient, but is it a robust UAPI, will it support
> > multiple hardware vendors?
> 
> What would prevent other hardware vendors from exporting their counters?

Well, I'm not a HW vendor, so I don't know for sure, the proposal is:

timestamp_mask - how many bits are valid in the timestamp.
                 timestamp values can be at most 64 bits.
hca_core_clock - timestamp is given in HW cycles, hca_core_clock
                 is the frequency of the HCA and is necessary in
                 order to convert cycles to seconds.

So presumably cycle difference is:
  (a-b) & mask
And to ns is
 a*1E9/hca_core_clock

So first, the above is a tricky bit of math to open code, integer
overflow needs to be avoided - so at a minimum I'd like to see
libibverbs provide this as a function call. I'd probably also say
difference should be an inline too.
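
A minimal sketch of what such helpers might look like, assuming
hca_core_clock is expressed in Hz (the unit is not stated in this thread)
and that the mask has the form 2^n - 1. The names ts_cycles_diff and
ts_cycles_to_ns are hypothetical and are not part of libibverbs. The
conversion splits the value into whole seconds and a remainder so that
the multiplication by 1E9 cannot overflow 64 bits for frequencies below
roughly 18 GHz.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: cycles elapsed from 'earlier' to 'later' on a
 * counter that wraps at mask + 1. Only meaningful if fewer than
 * mask + 1 cycles really elapsed between the two samples. */
static inline uint64_t ts_cycles_diff(uint64_t later, uint64_t earlier,
                                      uint64_t mask)
{
    return (later - earlier) & mask;
}

/* Hypothetical helper: convert a cycle count to nanoseconds.
 * 'hz' is the counter frequency in Hz (an assumed unit).
 * A naive cycles * 1000000000 / hz overflows once cycles exceeds
 * about 1.8e10, so whole seconds and the remainder are converted
 * separately. rem < hz, so rem * 1e9 fits as long as hz < ~1.8e10. */
static inline uint64_t ts_cycles_to_ns(uint64_t cycles, uint64_t hz)
{
    assert(hz != 0);
    uint64_t secs = cycles / hz;
    uint64_t rem = cycles % hz;
    return secs * 1000000000ULL + rem * 1000000000ULL / hz;
}
```

With a 48-bit mask, ts_cycles_diff() still gives the right answer for two
samples taken on either side of a single wrap. It cannot detect that more
than one full wrap happened in between.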

If it is a function call then maybe we don't need to tell the app
about hca_core_clock?

Second: What about wrap around? Does it even make sense to expose less
than 64 bits to userspace? Should the driver manage wrap around to
create a flat 64 bit space?

Otherwise, if the app has to do it, there is no event indicating wrap
has occurred.

Without managing wrapping, cycle count difference is not a reliable
calculation.

> > Is anyone else in ethernet using verbs to deliver IP packets?
> 
> This is not only for Ethernet. Internally Infiniband is frequently used
> and there also timestamps are useful.

Sure, but should any other vendors be commenting on this UAPI?

> Well no. There has been a history of putting time corrections etc etc into
> these. Once you move from a raw counter to actual time various
> complications may need to be dealt with. For simple time differentials the
> counter is sufficient. If you really want proper "time" when something
> happens then you may want to scale and correct etc the value and have some
> sort of time sync approach.

Sure OK, I'm sold, export a cycle counter of some sort.

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                                 ` <20150528195034.GA11182-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-28 20:34                                                                                                                   ` Christoph Lameter
       [not found]                                                                                                                     ` <alpine.DEB.2.11.1505281522020.2703-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Christoph Lameter @ 2015-05-28 20:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

On Thu, 28 May 2015, Jason Gunthorpe wrote:

> Second: What about wrap around? Does it even make sense to expose less
> than 64 bits to userspace? Should the driver manage wrap around to
> create a flat 64 bit space?

The wrap around is given by the mask. Cycle registers are often shorter
than 64 bits.

> Otherwise, if the app has to do it, there is no event indicating wrap
> has occurred.

Well yes. You take the difference and then apply the mask. If you are
outside of the cycle range then this will not give you a proper time.

> Without managing wrapping, cycle count difference is not a reliable
> calculation.

Counters only work for the interval in which they do not go
through a complete cycle. But that is the nature of these counters. And
that's what you see when you look into the kernel timer subsystem for
example.

> > This is not only for Ethernet. Internally Infiniband is frequently used
> > and there also timestamps are useful.
>
> Sure, but should any other vendors be commenting on this UAPI?

I think they would have spoken up if they had any objections?


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                                     ` <alpine.DEB.2.11.1505281522020.2703-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2015-05-28 20:47                                                                                                                       ` Jason Gunthorpe
       [not found]                                                                                                                         ` <20150528204749.GA12780-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2015-05-28 20:47 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

On Thu, May 28, 2015 at 03:34:00PM -0500, Christoph Lameter wrote:
> On Thu, 28 May 2015, Jason Gunthorpe wrote:
> 
> > Second: What about wrap around? Does it even make sense to expose less
> > than 64 bits to userspace? Should the driver manage wrap around to
> > create a flat 64 bit space?
> 
> The wrap around is given by the mask. Cycle registers are often shorter
> than 64 bits.

I am aware of how cycle counters work.

My point was exposing raw wrapping counters is a horrible UAPI.

Shouldn't the driver software extend smaller counters to 64 bits?
That would take a single OR and an unlikely branch, so don't say
'performance' :)
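
One way the extension described above could be sketched, assuming the
raw values are seen in order, from a single thread, and at least once
per wrap period. The struct ts_extender and ts_extend() names are
hypothetical and the mask is assumed to have the form 2^n - 1. This is
not taken from the mlx4 driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-counter state kept by driver software. */
struct ts_extender {
    uint64_t mask;  /* valid bits of the hardware counter */
    uint64_t high;  /* software-maintained bits above the mask */
    uint64_t last;  /* last raw value observed */
};

/* Extend a narrow, wrapping hardware counter to a flat 64-bit value.
 * The common path is one compare and one OR; the branch is taken only
 * when the hardware counter has wrapped since the previous sample. */
static inline uint64_t ts_extend(struct ts_extender *e, uint64_t raw)
{
    assert(e != 0);
    raw &= e->mask;
    if (raw < e->last)          /* unlikely: wrap since last sample */
        e->high += e->mask + 1;
    e->last = raw;
    return e->high | raw;
}
```

If two samples can be more than one wrap period apart, or if they can be
observed out of order, this simple scheme gives wrong results and some
periodic refresh of the state would be needed.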

> through a complete cycle. But that is the nature of these counters. And
> that's what you see when you look into the kernel timer subsystem for
> example.

Very little in the kernel is exposed to that wrapping, the timer
subsystem takes care of it. Certainly, userspace never sees it.

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                                         ` <20150528204749.GA12780-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-05-29  7:59                                                                                                                           ` Hefty, Sean
       [not found]                                                                                                                             ` <1828884A29C6694DAF28B7E6B8A82373A8FE1F36-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2015-05-29 13:46                                                                                                                           ` Christoph Lameter
  1 sibling, 1 reply; 71+ messages in thread
From: Hefty, Sean @ 2015-05-29  7:59 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Lameter
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

> > > Second: What about wrap around? Does it even make sense to expose less
> > > than 64 bits to userspace? Should the driver manage wrap around to
> > > create a flat 64 bit space?
> >
> > The wrap around is given by the mask. Cycle registers are often shorter
> > than 64 bits.
> 
> I am aware of how cycle counters work.
> 
> My point was exposing raw wrapping counters is a horrible UAPI.
> 
> Shouldn't the driver software extend smaller counters to 64 bits?
> That would take a single or and an unlikely branch, so don't say
> 'performance' :)

It's one thing to time stamp a frame or packet.  But this is assigning a time stamp to a work completion.  I don't even know what that's supposed to mean when considering 2 GB (or larger) transfers, RDMA read operations, XRC, dynamic connections, out of order retransmissions, shared receive queues, and other exotic features.  IMO, this is currently vendor specific functionality, and not obviously applicable as a generic feature.  It is certainly poorly defined and exposed in a very implementation specific way.

The use case given by Christoph only speaks of packet level time stamps.  One could argue that such a use case would place the stamp near the packet (similar to the GRH), rather than embedded into a work completion.  This would allow time stamps even in the absence of a work completion.

IMO, vendors already have ways to expose vendor specific features to user space.  I would mark this as vendor specific and keep it out of the core.

- Sean

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                                         ` <20150528204749.GA12780-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-05-29  7:59                                                                                                                           ` Hefty, Sean
@ 2015-05-29 13:46                                                                                                                           ` Christoph Lameter
       [not found]                                                                                                                             ` <alpine.DEB.2.11.1505290845040.8178-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: Christoph Lameter @ 2015-05-29 13:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

On Thu, 28 May 2015, Jason Gunthorpe wrote:

> My point was exposing raw wrapping counters is a horrible UAPI.

Well this is a kernel bypass API and a lot of raw hardware issues will
have to be handled since you do go directly to the device.

> > through a complete cycle. But that is the nature of these counters. And
> > that's what you see when you look into the kernel timer subsystem for
> > example.
>
> Very little in the kernel is exposed to that wrapping, the timer
> subsystem takes care of it. Certainly, userspace never sees it.

Right but then we are not at the comfortable sockets API here but at the
bare metal level.


^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                                             ` <1828884A29C6694DAF28B7E6B8A82373A8FE1F36-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-05-29 13:55                                                                                                                               ` Christoph Lameter
       [not found]                                                                                                                                 ` <alpine.DEB.2.11.1505290846480.8178-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Christoph Lameter @ 2015-05-29 13:55 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Jason Gunthorpe, Or Gerlitz, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Yann Droneaud

On Fri, 29 May 2015, Hefty, Sean wrote:

> > Shouldn't the driver software extend smaller counters to 64 bits?
> > That would take a single or and an unlikely branch, so don't say
> > 'performance' :)
>
> It's one thing to time stamp a frame or packet.  But this is assigning a
> time stamp to a work completion.  I don't even know what that's supposed to
> mean when considering 2 GB (or larger) transfers, RDMA read operations, XRC,
> dynamic connections, out of order retransmissions, shared receive queues,
> and other exotic features.  IMO, this is currently vendor specific

What is the issue here? The timestamp is created when the processing is
finished by the nic and the completion entry becomes available.

> functionality, and not obviously applicable as a generic feature.  It is
> certainly poorly defined and exposed in a very implementation specific way.

It is defined exactly like any other cycle counter in hardware and it is
exposed using an API that allows multiple vendors to use these cycle
counters without regard to a particular vendor's implementation.

I sure wish that Intel would be supporting a feature like this. Please
come up with a better alternative if there is one. This is likely going to
be a differentiator for the vendor used in our industry.

> The use case given by Christoph only speaks of packet level time stamps.
> One could argue that such a use case would place the stamp near the packet
> (similar to the GRH), rather than embedded into a work completion.  This
> would allow time stamps even in the absence of a work completion.
>
> IMO, vendors already have ways to expose vendor specific features to user
> space.  I would mark this as vendor specific and keep it out of the core.

How exactly would that work? How can we attach vendor specific extension
to a completion structure?

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                                             ` <alpine.DEB.2.11.1505290845040.8178-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2015-05-29 16:49                                                                                                                               ` Doug Ledford
       [not found]                                                                                                                                 ` <1432918170.114391.104.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Doug Ledford @ 2015-05-29 16:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Jason Gunthorpe, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

[-- Attachment #1: Type: text/plain, Size: 2239 bytes --]

On Fri, 2015-05-29 at 08:46 -0500, Christoph Lameter wrote:
> On Thu, 28 May 2015, Jason Gunthorpe wrote:
> 
> > My point was exposing raw wrapping counters is a horrible UAPI.
> 
> Well this is a kernel bypass API and a lot of raw hardware issues will
> have to be handled since you do go directly to the device.

No, that's not entirely true, and it *certainly* is not the correct way
to think about verbs extensions.  Is it kernel bypass?  Yes.  Does it go
direct to the hardware?  Not as far as the user application is
concerned.  The direct hardware access is abstracted away in the verbs
library.  Because the verbs library is a hardware abstraction layer, any
extensions to it need to be well thought out.  And by that I mean if it
is of general use, then it should be added in a general, abstract way
that any hardware can implement.  If it is specific to just one vendor's
hardware, then it can be added in a means that is specific to that
vendor's hardware.

Now, as a general rule, I would call timestamps general.  They should be
added in a fashion that anyone can implement.  They should also be well
defined.  Sean's questions raise a very valid point.  Exactly what is
being timestamped, and do we care about different timestamp options?  Is
it completion of message, start of message, transfer from HCA to main
system memory completion, etc.  The 00/10 header to this patch series
was probably answering Sean's question, but just based on the name of
the TIMESTAMP flag to the CQ creation attr struct it isn't clear that
this is the case.

> > > through a complete cycle. But that is the nature of these counters. And
> > > that's what you see when you look into the kernel timer subsystem for
> > > example.
> >
> > Very little in the kernel is exposed to that wrapping, the timer
> > subsystem takes care of it. Certainly, userspace never sees it.
> 
> Right but then we are not at the comfortable sockets API here but at the
> bare metal level.

That's not entirely true.  We still hold to our abstractions, they are
just intentionally kept very thin and high performing.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                                                 ` <alpine.DEB.2.11.1505290846480.8178-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2015-05-29 16:52                                                                                                                                   ` Hefty, Sean
       [not found]                                                                                                                                     ` <1828884A29C6694DAF28B7E6B8A82373A8FE2069-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Hefty, Sean @ 2015-05-29 16:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Jason Gunthorpe, Or Gerlitz, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Yann Droneaud

> What is the issue here? The timestamp is created when the processing is
> finished by the nic and the completion entry becomes available.

The timestamp is generated when a work completion entry is written.  If there's a clear use case for this, it hasn't been described.  The use case you mentioned only works if there is a 1:1 relationship between a packet and a work completion.  That is not what is being defined here.

> It is exactly defined like any other cycle counters in hardware and it is
> exposed using an API that allows multiple vendors to use these cycle
> counters without regard to a particular vendors implementation.

I disagree.  This is associated with a specific implementation.  It assumes that the counter is part of a CQ entry, and that the counter is written when the completion is written.  There's nothing that requires other vendors to follow that model, nor is it clear that this is a generic or useful enough operation that other vendors would want to follow this model.  Why not have the time stamp record the start of the transaction?  The end?  Have two stamps, one for the first packet, and one for the last?  Limit this to single packet operations only?

> > The use case given by Christoph only speaks of packet level time stamps.
> > One could argue that such a use case would place the stamp near the
> > packet (similar to the GRH), rather than embedded into a work completion.
> > This would allow time stamps even in the absence of a work completion.
> >
> > IMO, vendors already have ways to expose vendor specific features to
> > user space.  I would mark this as vendor specific and keep it out of the
> > core.
> 
> How exactly would that work? How can we attach vendor specific extension
> to a completion structure?

I'm just stating that there are other ways of exposing this sort of feature.  A time stamp could just as easily be written with the data, similar to the GRH.  One of the points of defining the verbs extensions was exactly so that a vendor could export their own functionality.

- Sean

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                                                 ` <1432918170.114391.104.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-05-29 16:59                                                                                                                                   ` Christoph Lameter
  2015-05-29 17:09                                                                                                                                   ` Hefty, Sean
  1 sibling, 0 replies; 71+ messages in thread
From: Christoph Lameter @ 2015-05-29 16:59 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Jason Gunthorpe, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

On Fri, 29 May 2015, Doug Ledford wrote:

> > Well this is a kernel bypass API and a lot of raw hardware issues will
> > have to be handled since you do go directly to the device.
>
> No, that's not entirely true, and it *certainly* is not the correct way
> to think about verbs extensions.  Is it kernel bypass?  Yes.  Does it go
> direct to the hardware?  Not as far as the user application is
> concerned.  The direct hardware access is abstracted away in the verbs

There is a compromise here by using the kernel as an administrative
function (setup and configuration of QPs) but using bare metal for the
data path. The structures modified for send/receive are structures that
are directly understood and handled by the hardware. That is the core
benefit of the RDMA API which results in the wanted performance and
latency.

The administrative function / bare metal separation is also reflected in
this patchset. The admin function allows the determination of the cycle
counter freq and size. The bare metal cycle counter exists in the
fastpath.

> library.  Because the verbs library is a hardware abstraction layer, any
> extensions to it need to be well thought out.  And by that I mean if it
> is of general use, then it should be added in a general, abstract way
> that any hardware can implement.  If it is specific to just one vendor's
> hardware, then it can be added in a means that is specific to that
> vendor's hardware.

What is particular here to the vendor?

> Now, as a general rule, I would call timestamps general.  They should be
> added in a fashion that anyone can implement.  They should also be well
> defined.  Sean's questions raise a very valid point.  Exactly what is
> being timestamped, and do we care about different timestamp options?  Is
> it completion of message, start of message, transfer from HCA to main
> system memory completion, etc.  The 00/10 header to this patch series
> was probably answering Sean's question, but just based on the name of
> the TIMESTAMP flag to the CQ creation attr struct it isn't clear that
> this is the case.

Ok then let's answer that.

> > Right but then we are not at the comfortable sockets API here but at the
> > bare metal level.
>
> That's not entirely true.  We still hold to our abstractions, they are
> just intentionally kept very thin and high performing.

Well there is a distinction here.

We provide the comfort of setup and administrative functions through the
kernel API. We still try to isolate the application as much as possible
when we go to the data paths but we need to hit bare metal in the fastpath
in order to accomplish our mission of maximum performance and minimum
latency. This cannot be accomplished with kernel calls and therefore it is
an example of kernel bypass. We want this as comfy as possible of course
and well defined.



^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                     ` <alpine.DEB.2.11.1505281212330.1711-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  2015-05-28 17:50                                                                                                       ` Jason Gunthorpe
@ 2015-05-29 16:59                                                                                                       ` Doug Ledford
       [not found]                                                                                                         ` <1432918767.114391.110.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: Doug Ledford @ 2015-05-29 16:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Jason Gunthorpe, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

[-- Attachment #1: Type: text/plain, Size: 2121 bytes --]

On Thu, 2015-05-28 at 12:14 -0500, Christoph Lameter wrote:
> On Thu, 28 May 2015, Jason Gunthorpe wrote:
> 
> > After a quick look through, the biggest question in my mind is what
> > should the timestamp value in the wc be?
> >
> > Right now it is some coded thing in clock cycles.
> 
> This is sufficient since it can be converted to ns or whatever one wants.

It is sufficient for your use.  It is not, however, a good API.

> > Should we require the driver to convert to ns before passing the wc
> > back to the app? (Looks like the socket implementation uniformly uses
> > us or ns)
> 
> But that requires additional processing.

Yes.

> > Should the app open code the conversion from clock cycles to ns, or
> > vfunc down to the driver?
> 
> The API provides the ability to retrieve the clock freq which is
> sufficient for the user to convert the value to meaningful time values.

I would prefer if the access to the timestamp were implemented via a
function in libibverbs (I haven't looked at the git repo, too little
time, I'll get to it).  Something like ibv_get_cqe_timestamp().  That
function should be general and return a suitable, normalized value (ns
probably).  If you just want a simple comparator without the overhead of
normalizing to time, and are willing to accept the consequences of that,
then I would expect you to use something like
ibv_get_raw_cqe_timestamp() to get the unadulterated cycle counter.  For
the most part, the user space application should not know details like
"we are using a cycle counter in the HCA processor for timestamping",
that's below the level of abstraction we attempt to maintain at the
verbs level.  Libmlx4 should be the only thing aware of that fact, and
it talks to the mlx4 driver in the kernel to get the details it needs.
And by putting it into a function that libmlx4 implements, if libcxgb4
decides to implement timestamps and does it in a different way, the app
doesn't care, it just uses the same call.
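
A rough sketch of the split described above, under stated assumptions:
every name here (struct sketch_cqe, struct sketch_provider, the
sketch_get_* wrappers, the two example providers and the 250 MHz figure)
is invented for illustration and does not exist in libibverbs, libmlx4
or libcxgb4. It only shows how a normalized call and a raw call could
both dispatch to provider code so the application never sees how the
hardware counts time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical completion entry carrying a provider-defined timestamp. */
struct sketch_cqe {
    uint64_t raw_ts;
};

/* Hypothetical per-provider hooks, filled in by the provider library. */
struct sketch_provider {
    uint64_t (*raw_ts)(const struct sketch_cqe *cqe); /* unmodified value */
    uint64_t (*ts_ns)(const struct sketch_cqe *cqe);  /* normalized to ns */
};

/* Provider A: timestamp is a cycle counter at an invented 250 MHz,
 * so one cycle is 4 ns. */
static uint64_t a_raw(const struct sketch_cqe *c) { return c->raw_ts; }
static uint64_t a_ns(const struct sketch_cqe *c)  { return c->raw_ts * 4; }

/* Provider B: hardware is assumed to report nanoseconds directly. */
static uint64_t b_raw(const struct sketch_cqe *c) { return c->raw_ts; }
static uint64_t b_ns(const struct sketch_cqe *c)  { return c->raw_ts; }

static const struct sketch_provider provider_a = { a_raw, a_ns };
static const struct sketch_provider provider_b = { b_raw, b_ns };

/* Generic entry points: the application uses the same two calls
 * regardless of which provider is underneath. */
static inline uint64_t
sketch_get_cqe_timestamp(const struct sketch_provider *p,
                         const struct sketch_cqe *cqe)
{
    assert(p && p->ts_ns);
    return p->ts_ns(cqe);
}

static inline uint64_t
sketch_get_raw_cqe_timestamp(const struct sketch_provider *p,
                             const struct sketch_cqe *cqe)
{
    assert(p && p->raw_ts);
    return p->raw_ts(cqe);
}
```

The cost of this shape is an indirect call per completion for the
normalized path, which is the overhead an application can avoid by using
the raw variant and accepting provider-specific units.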


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                                                 ` <1432918170.114391.104.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-05-29 16:59                                                                                                                                   ` Christoph Lameter
@ 2015-05-29 17:09                                                                                                                                   ` Hefty, Sean
       [not found]                                                                                                                                     ` <1828884A29C6694DAF28B7E6B8A82373A8FE2126-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: Hefty, Sean @ 2015-05-29 17:09 UTC (permalink / raw)
  To: Doug Ledford, Christoph Lameter
  Cc: Jason Gunthorpe, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1261 bytes --]

> Now, as a general rule, I would call timestamps general.  They should be
> added in a fashion that anyone can implement.  They should also be well
> defined.  Sean's questions raise a very valid point.  Exactly what is
> being timestamped, and do we care about different timestamp options?  Is
> it completion of message, start of message, transfer from HCA to main
> system memory completion, etc.  The 00/10 header to this patch series
> was probably answering Sean's question, but just based on the name of
> the TIMESTAMP flag to the CQ creation attr struct it isn't clear that
> this is the case.

I didn't see the information that I was looking for in the patch header to this series.  As Jason pointed out, the use case is lacking.

IMO, it could make just as much sense to associate/enable time stamping with the QP as with the CQ, or even make it configurable per operation or operation type.

If Christoph has a clear use case and wants to go to the 'bare metal', then a vendor specific option seems ideal.  At least until there are other implementations or the driving use case is clearer.

- Sean

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                                                     ` <1828884A29C6694DAF28B7E6B8A82373A8FE2126-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-05-29 17:14                                                                                                                                       ` Doug Ledford
       [not found]                                                                                                                                         ` <1432919688.114391.116.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Doug Ledford @ 2015-05-29 17:14 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Christoph Lameter, Jason Gunthorpe, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Yann Droneaud


On Fri, 2015-05-29 at 17:09 +0000, Hefty, Sean wrote:
> > Now, as a general rule, I would call timestamps general.  They should be
> > added in a fashion that anyone can implement.  They should also be well
> > defined.  Sean's questions raise a very valid point.  Exactly what is
> > being timestamped, and do we care about different timestamp options?  Is
> > it completion of message, start of message, transfer from HCA to main
> > system memory completion, etc.  The 00/10 header to this patch series
> > was probably answering Sean's question, but just based on the name of
> > the TIMESTAMP flag to the CQ creation attr struct it isn't clear that
> > this is the case.
> 
> I didn't see the information that I was looking for in the patch header to this series.  As Jason pointed out, the use case is lacking.
> 
> IMO, it could make just as much sense to associate/enable time stamping with the QP as with the CQ, or even make it configurable per operation or operation type.
> 
> If Christoph has a clear use case and wants to go to the 'bare metal', then a vendor specific option seems ideal.  At least until there are other implementations or the driving use case is clearer.

The use case is clear IMO.  It's for financial trading software.  I
don't think they really care about details like whether it's the start
or end packet, or completion, or whatever.  They need a tie breaker
between when they have two different buy or sell orders on the same lot
of stock.  Any deterministic timing/ordering method will do as long as
they consistently apply it I think.  And the faster and lower overhead
the process, the better.  He doesn't really want a timestamp, he merely
wants a sequence ordering.  But a timestamp is what they are using to
get him what he needs.

Is that a fair guess Christoph?
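
As a rough illustration of the tie-breaker idea above, here is a minimal sketch that orders a batch of completions by their raw hardware timestamp and falls back to poll order when two stamps are equal. The `Completion` record, its field names, and the sample values are hypothetical and not part of the patch series; the sketch also assumes the counter does not wrap within the batch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Completion:
    """Hypothetical record pairing an order with its completion timestamp."""
    order_id: str
    raw_timestamp: int  # raw HCA cycles (assumed not to wrap within the batch)
    poll_index: int     # position in which software polled the completion


def order_by_completion(completions):
    """Sort by hardware timestamp; equal stamps keep their poll order."""
    return sorted(completions, key=lambda c: (c.raw_timestamp, c.poll_index))


batch = [
    Completion("sell-17", raw_timestamp=1_000_250, poll_index=0),
    Completion("buy-42", raw_timestamp=1_000_100, poll_index=1),
    Completion("buy-43", raw_timestamp=1_000_250, poll_index=2),
]
ranked = [c.order_id for c in order_by_completion(batch)]
print(ranked)
```

Any ordering key works for this purpose as long as it is applied consistently, which is why the raw cycle count is enough here and no conversion to wall-clock time is needed.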

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD




* RE: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                                                         ` <1432919688.114391.116.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-05-29 17:18                                                                                                                                           ` Hefty, Sean
  2015-06-01 11:39                                                                                                                                           ` Christoph Lameter
  1 sibling, 0 replies; 71+ messages in thread
From: Hefty, Sean @ 2015-05-29 17:18 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Christoph Lameter, Jason Gunthorpe, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Yann Droneaud

> The use case is clear IMO.  It's for financial trading software.  I
> don't think they really care about details like whether it's the start
> or end packet, or completion, or whatever.  They need a tie breaker
> between when they have two different buy or sell orders on the same lot
> of stock.  Any deterministic timing/ordering method will do as long as
> they consistently apply it I think.  And the faster and lower overhead
> the process, the better.  He doesn't really want a timestamp, he merely
> wants a sequence ordering.  But a timestamp is what they are using to
> get him what he needs.
> 
> Is that a fair guess Christoph?

I get Christoph's usage model.  But AFAIK, he uses UD packets, often with multicast.  So the timestamp ends up associated indirectly with an actual packet.  That model makes sense to me.  It’s the generic time stamping of a work completion (for, say, a 2 GB transfer over RC using an SRQ) that doesn't make sense to me.  That's the part I'm struggling with.


* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                                                     ` <1828884A29C6694DAF28B7E6B8A82373A8FE2069-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-05-29 19:21                                                                                                                                       ` Steve Wise
  2015-06-01 11:50                                                                                                                                       ` Christoph Lameter
  1 sibling, 0 replies; 71+ messages in thread
From: Steve Wise @ 2015-05-29 19:21 UTC (permalink / raw)
  To: Hefty, Sean, Christoph Lameter
  Cc: Jason Gunthorpe, Or Gerlitz, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Yann Droneaud

On 5/29/2015 11:52 AM, Hefty, Sean wrote:
>> What is the issue here? The timestamp is created when the processing is
>> finished by the nic and the completion entry becomes available.
> The timestamp is generated when a work completion entry is written.  If there's a clear use case for this, it hasn't been described.  The use case you mentioned only works if there is a 1:1 relationship between a packet and a work completion.  That is not what is being defined here.
>
>> It is exactly defined like any other cycle counters in hardware and it is
>> exposed using an API that allows multiple vendors to use these cycle
>> counters without regard to a particular vendors implementation.
> I disagree.  This is associated with a specific implementation.  It assumes that the counter is part of a CQ entry, and that the counter is written when the completion is written.  There's nothing that requires other vendors to follow that model, nor is it clear that this is a generic or useful enough operation that other vendors would want to follow this model.  Why not have the time stamp record the start of the transaction?  The end?  Have two stamps, once for the first packet, and one for the last?  Limit this to single packet operations only?

FWIW: cxgb4 hardware includes a hw timestamp in its CQE as well. It is 
used by SW for CQ overflow detection and debug timing analysis...

>>> The use case given by Christoph only speaks of packet level time stamps.
>>> One could argue that such a use case would place the stamp near the
>>> packet (similar to the GRH), rather than embedded into a work completion.
>>> This would allow time stamps even in the absence of a work completion.
>>>
>>> IMO, vendors already have ways to expose vendor specific features to
>>> user space.  I would mark this as vendor specific and keep it out of
>>> the core.
>> How exactly would that work? How can we attach vendor specific extension
>> to a completion structure?
> I'm just stating that there are other ways of exposing this sort of feature.  A time stamp could just as easily be written with the data, similar to the GRH.  One of the points of defining the verbs extensions was exactly so that a vendor could export their own functionality.
>
> - Sean
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                                                         ` <1432919688.114391.116.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-05-29 17:18                                                                                                                                           ` Hefty, Sean
@ 2015-06-01 11:39                                                                                                                                           ` Christoph Lameter
       [not found]                                                                                                                                             ` <alpine.DEB.2.11.1506010635360.25165-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: Christoph Lameter @ 2015-06-01 11:39 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Hefty, Sean, Jason Gunthorpe, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Yann Droneaud

On Fri, 29 May 2015, Doug Ledford wrote:

> The use case is clear IMO.  It's for financial trading software.  I
> don't think they really care about details like whether it's the start
> or end packet, or completion, or whatever.  They need a tie breaker
> between when they have two different buy or sell orders on the same lot
> of stock.  Any deterministic timing/ordering method will do as long as
> they consistently apply it I think.  And the faster and lower overhead
> the process, the better.  He doesn't really want a timestamp, he merely
> wants a sequence ordering.  But a timestamp is what they are using to
> get him what he needs.
>
> Is that a fair guess Christoph?

We want to have a time stamp when the action is complete and the data is
available to the application or the send action is complete and the CQ
entry can be reused. That is a well defined point and that is how the time
stamps function in the existing implementation. This is an obvious
understanding and it's pretty clear.

The time stamp needs to be at the end of the action because the timestamp
is used to:

1. Assess the impact of network processing. This can be compared with
   packet timestamps from capture devices off the wire.
2. Delineate the start of packet processing in software.



* Re: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                         ` <1432918767.114391.110.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-06-01 11:44                                                                                                           ` Christoph Lameter
  0 siblings, 0 replies; 71+ messages in thread
From: Christoph Lameter @ 2015-06-01 11:44 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Jason Gunthorpe, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud


> > This is sufficient since it can be converted to ns or whatever one wants.
>
> It is sufficient for your use.  It is not, however, a good API.

I hate these foggy statements that you guys come up with. Why is it not a
good API? Having a cycle counter without processing overhead is a good
thing and the way counters are handled is pretty well established.

> > The API provides the ability to retrieve the clock freq which is
> > sufficient for the user to convert the value to meaningful time values.
>
> I would prefer if the access to the timestamp were implemented via a
> function in libibverbs (I haven't looked at the git repo, too little
> time, I'll get to it).  Something like ibv_get_cqe_timestamp().  That
> function should be general and return a suitable, normalized value (ns
> probably).  If you just want a simple comparator without the overhead of
> normalizing to time, and are willing to accept the consequences of that,
> then I would expect you to use something like

That would introduce additional latencies and would make that feature no
longer useful to us. The advantage of this approach is that the counter is
in the same cacheline that is already used. It is very low overhead.

> ibv_get_raw_cqe_timestamp() to get the unadulterated cycle counter.  For
> the most part, the user space application should not know details like
> "we are using a cycle counter in the HCA processor for timestamping",

Why not? A cycle counter is the general way of providing
timestamps in hardware. RDTSC is such a cycle counter as well. There are
numerous examples of counters like that.
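
For reference, the conversion being argued about is a single multiply and divide. The sketch below assumes the clock frequency reported by query_device is expressed in kHz (an assumption about units, not something stated in this thread) and uses hypothetical function names. The check for a zero timestamp mask follows the cover letter's statement that 0 is returned when timestamping is not supported.

```python
def timestamp_supported(timestamp_mask):
    """A mask of 0 means the device reports no timestamp support."""
    return timestamp_mask != 0


def cycles_to_ns(raw_cycles, core_clock_khz):
    """Convert a raw cycle count to nanoseconds with integer arithmetic.

    core_clock_khz is assumed to be the core clock frequency in kHz,
    so one cycle lasts 1_000_000 / core_clock_khz nanoseconds.
    """
    if core_clock_khz <= 0:
        raise ValueError("clock frequency must be positive")
    return raw_cycles * 1_000_000 // core_clock_khz


# Illustrative values only: 427,000 cycles at 427 MHz is one millisecond.
print(cycles_to_ns(427_000, 427_000))
```

An application that only compares stamps from the same device can skip the conversion entirely and work on raw cycles, which is the low-overhead path described above.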



* RE: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                                                     ` <1828884A29C6694DAF28B7E6B8A82373A8FE2069-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2015-05-29 19:21                                                                                                                                       ` Steve Wise
@ 2015-06-01 11:50                                                                                                                                       ` Christoph Lameter
  1 sibling, 0 replies; 71+ messages in thread
From: Christoph Lameter @ 2015-06-01 11:50 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Jason Gunthorpe, Or Gerlitz, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Yann Droneaud

On Fri, 29 May 2015, Hefty, Sean wrote:

> > What is the issue here? The timestamp is created when the processing is
> > finished by the nic and the completion entry becomes available.
>
> The timestamp is generated when a work completion entry is written.  If
> there's a clear use case for this, it hasn't been described.  The use case
> you mentioned only works if there is a 1:1 relationship between a packet
> and a work completion.  That is not what is being defined here.

It does make sense to have a timestamp when the work described by a CQ has
been completed. For that you do not need a 1:1 correspondence.

My use case is focused on single packet processing because that is what we
do here.

> > It is exactly defined like any other cycle counters in hardware and it is
> > exposed using an API that allows multiple vendors to use these cycle
> > counters without regard to a particular vendors implementation.
>
> I disagree.  This is associated with a specific implementation.  It assumes
> that the counter is part of a CQ entry, and that the counter is written
> when the completion is written.  There's nothing that requires other vendors
> to follow that model, nor is it clear that this is a generic or useful
> enough operation that other vendors would want to follow this model.  Why
> not have the time stamp record the start of the transaction?  The end?  Have
> two stamps, once for the first packet, and one for the last?  Limit this to
> single packet operations only?

Why would you have a timestamp at the beginning of the transaction? That
is useless because you can use packet capture devices to establish that
timepoint. At that point you have not identified the CQ anyways.

Having a timestamp when an action is complete makes sense, is generic
and general.

Adding the timestamp to the data means that the application now has to
separate the metadata (timestamp) from the data stream. Not good.


* RE: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                                                             ` <alpine.DEB.2.11.1506010635360.25165-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2015-06-01 14:54                                                                                                                                               ` Hefty, Sean
       [not found]                                                                                                                                                 ` <1828884A29C6694DAF28B7E6B8A82373A8FE2688-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Hefty, Sean @ 2015-06-01 14:54 UTC (permalink / raw)
  To: Christoph Lameter, Doug Ledford
  Cc: Jason Gunthorpe, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Amir Vadai, Tal Alon, Matan Barak, Yann Droneaud

> We want to have a time stamp when the action is complete and the data is
> available to the application or the send action is complete and the CQ
> entry can be reused.

This is what polling the completion from the CQ tells you, independent of there being a time stamp.

* RE: [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device
       [not found]                                                                                                                                                 ` <1828884A29C6694DAF28B7E6B8A82373A8FE2688-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-06-04  0:58                                                                                                                                                   ` Christoph Lameter
  0 siblings, 0 replies; 71+ messages in thread
From: Christoph Lameter @ 2015-06-04  0:58 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Doug Ledford, Jason Gunthorpe, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon,
	Matan Barak, Yann Droneaud

On Mon, 1 Jun 2015, Hefty, Sean wrote:

> > We want to have a time stamp when the action is complete and the data is
> > available to the application or the send action is complete and the CQ
> > entry can be reused.
>
> This is what polling the completion from the CQ tells you, independent of
> there being a time stamp.

But you may not be polling that frequently. Polling threads may check
multiple sources of events and may also be currently executing code to handle
an event. Also there is the problem of the OS interrupting you. All of
these sources of inaccuracy are removed by the timestamp.

That was for inbound. For outbound you do not get a timestamp without this
feature. Typically reclaim of outbound work requests is delayed quite a
bit and getting a timestamp later does not reflect the actual time the
message was sent.
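
To make the polling-delay point concrete, here is a small sketch of how an application might measure the gap between the hardware stamp in a completion and the HCA clock value read when the completion is finally polled. It treats the timestamp mask as a bitmask of valid counter bits and picks a 48-bit width purely for illustration; both are assumptions, as are the function name and the sample values.

```python
def elapsed_cycles(earlier, later, timestamp_mask):
    """Cycles from `earlier` to `later` on a counter that wraps at mask + 1.

    Only valid when the true gap is shorter than one full wrap period.
    """
    return (later - earlier) & timestamp_mask


MASK_48 = (1 << 48) - 1  # assumed 48-bit counter, for illustration only

cqe_stamp = MASK_48 - 99   # completion written just before the counter wraps
clock_at_poll = 400        # HCA clock read when software finally polls
delay = elapsed_cycles(cqe_stamp, clock_at_poll, MASK_48)
print(delay)
```

Masking the difference keeps the result correct across a single counter wrap, so the measured delay reflects how late the poll was and not an artifact of the counter width.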


end of thread, other threads:[~2015-06-04  0:58 UTC | newest]

Thread overview: 71+ messages
2015-05-17 13:36 [PATCH for-next 00/10] Add completion timestamping support Or Gerlitz
     [not found] ` <1431869786-6308-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-17 13:36   ` [PATCH for-next 01/10] IB/core: Change provider's API of create_cq to be extendible Or Gerlitz
     [not found]     ` <1431869786-6308-2-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-19 18:45       ` Jason Gunthorpe
     [not found]         ` <20150519184535.GJ18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-20 15:14           ` Yann Droneaud
2015-05-20 16:41           ` Matan Barak
     [not found]             ` <CAAKD3BAXBPgkFp=bmeDNoj6QbAXoHMGt_nv+LHCE09g5=nyPGQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-20 19:01               ` Jason Gunthorpe
2015-05-17 13:36   ` [PATCH for-next 02/10] IB/core: Add flags parameter for ib_create_cq Or Gerlitz
     [not found]     ` <1431869786-6308-3-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-19 18:47       ` Jason Gunthorpe
     [not found]         ` <20150519184711.GK18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-20  8:54           ` Sagi Grimberg
     [not found]             ` <555C4BAB.6090602-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-05-20 16:44               ` Matan Barak
2015-05-20 15:24       ` Yann Droneaud
     [not found]         ` <1432135457.5304.33.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
2015-05-20 16:46           ` Matan Barak
2015-05-17 13:36   ` [PATCH for-next 03/10] IB/core: Extend ib_uverbs_create_cq Or Gerlitz
2015-05-17 13:36   ` [PATCH for-next 04/10] IB/core: Add timestamp_mask and hca_core_clock to query_device Or Gerlitz
2015-05-17 13:36   ` [PATCH for-next 05/10] IB/core: Pass hardware specific data in query_device Or Gerlitz
2015-05-17 13:36   ` [PATCH for-next 06/10] IB/mlx4: Add mmap call to map the hardware clock Or Gerlitz
2015-05-17 13:36   ` [PATCH for-next 07/10] IB/mlx4: Support extended create_cq and query_device uverbs Or Gerlitz
2015-05-17 13:36   ` [PATCH for-next 08/10] IB/mlx4: Add support for timestamp in cq creation Or Gerlitz
     [not found]     ` <1431869786-6308-9-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-19 18:55       ` Jason Gunthorpe
     [not found]         ` <20150519185527.GL18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-20  8:56           ` Sagi Grimberg
     [not found]             ` <555C4C45.7010300-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-05-20 14:43               ` Or Gerlitz
2015-05-20 17:27               ` Jason Gunthorpe
2015-05-17 13:36   ` [PATCH for-next 09/10] IB/mlx4: Add timestamp_mask and hca_core_clock to query_device Or Gerlitz
     [not found]     ` <1431869786-6308-10-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-19 18:58       ` Jason Gunthorpe
     [not found]         ` <20150519185801.GM18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-19 19:00           ` Jason Gunthorpe
     [not found]             ` <20150519190031.GN18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-19 19:11               ` Or Gerlitz
     [not found]                 ` <CAJ3xEMiO2y6G5YzzkGC1M=XB6yrtNJ7jj13pTUAee1TssiYwkQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-19 19:15                   ` Jason Gunthorpe
     [not found]                     ` <20150519191553.GP18675-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-19 19:30                       ` Or Gerlitz
     [not found]                         ` <CAJ3xEMjj31SpZmYXORCePVCQfMOWVKEhtNay-v59FcnCOQWm9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-20  0:29                           ` Jason Gunthorpe
     [not found]                             ` <20150520002915.GD16941-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-20 14:40                               ` Or Gerlitz
2015-05-20 14:41                               ` Or Gerlitz
     [not found]                                 ` <555C9D00.2090609-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-20 15:11                                   ` Yann Droneaud
     [not found]                                     ` <1432134677.5304.23.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
2015-05-20 17:37                                       ` Jason Gunthorpe
2015-05-20 17:53                                       ` Or Gerlitz
     [not found]                                         ` <CAJ3xEMjO2HqFePAwXnoCt0rCzRLfYbqdP+883v4LaMovj7Y-FA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-23  4:26                                           ` Or Gerlitz
     [not found]                                             ` <CAJ3xEMgppUPqOFm7L0+fR3z_N2ALZFb+GpSAGvyyNROZoFCUPA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-26  8:10                                               ` Or Gerlitz
     [not found]                                                 ` <CAJ3xEMinNsQrmQ7atAVkshefBDBQ1HRuGbOT_cJ4RxEj4dWUPA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-26 16:06                                                   ` Jason Gunthorpe
     [not found]                                                     ` <20150526160624.GA11800-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-26 18:33                                                       ` Or Gerlitz
     [not found]                                                         ` <CAJ3xEMj5qphSq5QQHKPsihuK6StmJ+NHiPbkY-ZVGmpViB1AXA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-26 18:53                                                           ` Jason Gunthorpe
     [not found]                                                             ` <20150526185315.GF11800-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-26 20:39                                                               ` Or Gerlitz
     [not found]                                                                 ` <CAJ3xEMh7JYN8sP5MYdVjH=tqPFURwvD7jVvV0=PBiEn1qEFi2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-26 22:07                                                                   ` Jason Gunthorpe
     [not found]                                                                     ` <20150526220724.GC4502-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-27 11:54                                                                       ` Or Gerlitz
     [not found]                                                                         ` <CAJ3xEMgdqBnn6hobRJFut6Nyu2pKt40rb46Z3dYx3MaohR2urA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-27 18:48                                                                           ` Jason Gunthorpe
     [not found]                                                                             ` <20150527184856.GA16059-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-27 21:33                                                                               ` Or Gerlitz
     [not found]                                                                                 ` <CAJ3xEMh_g0zBza5HoVZTy9cTRbXqatzvBMAJcRjQhH7Zi=cHAg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-27 22:21                                                                                   ` Jason Gunthorpe
     [not found]                                                                                     ` <20150527222108.GA7855-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-28  7:04                                                                                       ` Or Gerlitz
     [not found]                                                                                         ` <5566BDE4.50709-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-05-28 14:13                                                                                           ` Christoph Lameter
     [not found]                                                                                             ` <alpine.DEB.2.11.1505280908001.16142-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2015-05-28 16:24                                                                                               ` Jason Gunthorpe
     [not found]                                                                                                 ` <20150528162416.GA6515-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-28 17:14                                                                                                   ` Christoph Lameter
     [not found]                                                                                                     ` <alpine.DEB.2.11.1505281212330.1711-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2015-05-28 17:50                                                                                                       ` Jason Gunthorpe
     [not found]                                                                                                         ` <20150528175043.GA10966-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-28 18:30                                                                                                           ` Christoph Lameter
     [not found]                                                                                                             ` <alpine.DEB.2.11.1505281325150.1794-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2015-05-28 19:50                                                                                                               ` Jason Gunthorpe
     [not found]                                                                                                                 ` <20150528195034.GA11182-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-28 20:34                                                                                                                   ` Christoph Lameter
     [not found]                                                                                                                     ` <alpine.DEB.2.11.1505281522020.2703-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2015-05-28 20:47                                                                                                                       ` Jason Gunthorpe
     [not found]                                                                                                                         ` <20150528204749.GA12780-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-05-29  7:59                                                                                                                           ` Hefty, Sean
     [not found]                                                                                                                             ` <1828884A29C6694DAF28B7E6B8A82373A8FE1F36-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-05-29 13:55                                                                                                                               ` Christoph Lameter
     [not found]                                                                                                                                 ` <alpine.DEB.2.11.1505290846480.8178-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2015-05-29 16:52                                                                                                                                   ` Hefty, Sean
     [not found]                                                                                                                                     ` <1828884A29C6694DAF28B7E6B8A82373A8FE2069-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-05-29 19:21                                                                                                                                       ` Steve Wise
2015-06-01 11:50                                                                                                                                       ` Christoph Lameter
2015-05-29 13:46                                                                                                                           ` Christoph Lameter
     [not found]                                                                                                                             ` <alpine.DEB.2.11.1505290845040.8178-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2015-05-29 16:49                                                                                                                               ` Doug Ledford
     [not found]                                                                                                                                 ` <1432918170.114391.104.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-29 16:59                                                                                                                                   ` Christoph Lameter
2015-05-29 17:09                                                                                                                                   ` Hefty, Sean
     [not found]                                                                                                                                     ` <1828884A29C6694DAF28B7E6B8A82373A8FE2126-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-05-29 17:14                                                                                                                                       ` Doug Ledford
     [not found]                                                                                                                                         ` <1432919688.114391.116.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-29 17:18                                                                                                                                           ` Hefty, Sean
2015-06-01 11:39                                                                                                                                           ` Christoph Lameter
     [not found]                                                                                                                                             ` <alpine.DEB.2.11.1506010635360.25165-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2015-06-01 14:54                                                                                                                                               ` Hefty, Sean
     [not found]                                                                                                                                                 ` <1828884A29C6694DAF28B7E6B8A82373A8FE2688-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-06-04  0:58                                                                                                                                                   ` Christoph Lameter
2015-05-29 16:59                                                                                                       ` Doug Ledford
     [not found]                                                                                                         ` <1432918767.114391.110.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-06-01 11:44                                                                                                           ` Christoph Lameter
2015-05-17 13:36   ` [PATCH for-next 10/10] IB/mlx4: Return hca core clock's offset in query_device vendor's data Or Gerlitz
