* [PATCH for-next V2 0/9] Add completion timestamping support
@ 2015-05-31 12:14 Or Gerlitz
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:14 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Or Gerlitz

Hi Doug,

This patchset adds completion timestamping support for verbs consumers.

Reviewing the weekend threads, we've changed the flag name to reflect
that this is a completion time-stamp and folded the actual mlx4 support
into one patch.

Regarding the related user-space support, it's possible to add what you
were suggesting: ibv_get_raw_cqe_timestamp(), which returns the ts in
cycles, and ibv_get_cqe_timestamp(), which returns the ts in ns. This
makes the value returned by the poll cq verb an opaque one that must go
through one of the convertors.

We would prefer to go for one helper, ibv_get_timestamp(uint64_t raw_time, flag),
which gets the raw time-stamp and one of the following flags:
RAW_TIME, RAW_NS_TIME.

We think this would address the reviewer comments for the kernel
submission.

The user-space code (which still uses IB_CQ_FLAGS_TIMESTAMP and lacks
the conversion functions) is in:

https://github.com/matanb10/libibverbs timestamp-v1
https://github.com/matanb10/libmlx4 timestamp-v1

Timestamping is used by applications in order to know when a WQE was
received/transmitted by the HW. The value is given in HCA hardware
cycles, but can easily be converted, as the hardware's core clock
frequency is available through an extension of query device.
Moreover, we add the ability to read the HCA's current clock. This is
useful in order to synchronize events to the wall clock.

This functionality is achieved by adding/extending the following verbs:

create_cq - create_cq is extended in order to allow passing creation
flags to the CQ creation function. We change the IB/core --> vendors API
to be easily extendible by passing a struct which contains comp_vector,
cqe and the new flags parameter. In order to create a CQ which supports
timestamping, IB_CQ_FLAGS_TIMESTAMP_COMPLETION should be given.

query_device - We extend the query_device uverb further by returning the
hardware's clock frequency and the timestamp mask (the number of
timestamp bits which are supported). If timestamping isn't supported,
0 is returned.

In order to read the timestamp in the WQE, the user needs to query the
device for support, create an appropriate CQ (using the extended uverb
with IB_CQ_FLAGS_TIMESTAMP_COMPLETION) and poll the CQ with an extended
poll_cq verb (currently implemented only in user-space).

In mlx4, allowing the user to read the core clock efficiently involves
mapping this area of the hardware to user-space (done with an mmap
command) and reading the clock from the correct offset of the page.
This offset is returned in the vendor specific data from mlx4's kernel
driver to mlx4's user-space driver. query_device is modified in order
to support passing this vendor specific data. A user-space application
can use a new verb in order to read the hardware's clock.

Translating the hardware's clock into ms can be done by dividing this
value by hca_core_clock (which is returned by the extended version of
the query_device uverb).

A user-space application can get the HW's current clock by executing

ibv_query_values_ex(struct ibv_context *context, uint32_t q_values,
                    struct ibv_values_ex *values)

The function gets a mask of the values to query and returns their
values. Vendors can either implement this as a uverb command or use
their user-space driver to return those values directly from the HW
(the mlx4 way).

Matan and Or.
Changes from V1:
 (1) fixed lustre IB's code build
 (2) squashed mlx4 V1 9-11 patches into one
 (3) changed IB_CQ_FLAGS_TIMESTAMP --> IB_CQ_FLAGS_TIMESTAMP_COMPLETION

Changes from V0:
 (1) Pass ib_cq_init_attr instead of cqe and comp_vector.
 (2) Fix unneeded indentation.
 (3) Change flags to u32.
 (4) Add const to create_cq's ib_cq_init_attr argument in vendor implementation.

Matan Barak (9):
  IB/core: Change provider's API of create_cq to be extendible
  IB/core: Change ib_create_cq to use struct ib_cq_init_attr
  IB/core: Add CQ creation time-stamping flag
  IB/core: Extend ib_uverbs_create_cq
  IB/core: Add timestamp_mask and hca_core_clock to query_device
  IB/core: Pass hardware specific data in query_device
  IB/mlx4: Add mmap call to map the hardware clock
  IB/mlx4: Support extended create_cq and query_device uverbs
  IB/mlx4: Add support for CQ time-stamping

 drivers/infiniband/core/device.c                   |    6 +-
 drivers/infiniband/core/mad.c                      |    5 +-
 drivers/infiniband/core/uverbs.h                   |    1 +
 drivers/infiniband/core/uverbs_cmd.c               |  188 ++++++++++++++++----
 drivers/infiniband/core/uverbs_main.c              |    1 +
 drivers/infiniband/core/verbs.c                    |    4 +-
 drivers/infiniband/hw/amso1100/c2_provider.c       |   14 ++-
 drivers/infiniband/hw/cxgb3/iwch_provider.c        |   19 ++-
 drivers/infiniband/hw/cxgb4/cq.c                   |    9 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h             |    8 +-
 drivers/infiniband/hw/cxgb4/provider.c             |    8 +-
 drivers/infiniband/hw/ehca/ehca_cq.c               |    7 +-
 drivers/infiniband/hw/ehca/ehca_hca.c              |    6 +-
 drivers/infiniband/hw/ehca/ehca_iverbs.h           |    6 +-
 drivers/infiniband/hw/ehca/ehca_main.c             |    6 +-
 drivers/infiniband/hw/ipath/ipath_cq.c             |    9 +-
 drivers/infiniband/hw/ipath/ipath_verbs.c          |    7 +-
 drivers/infiniband/hw/ipath/ipath_verbs.h          |    3 +-
 drivers/infiniband/hw/mlx4/cq.c                    |   13 ++-
 drivers/infiniband/hw/mlx4/mad.c                   |    5 +-
 drivers/infiniband/hw/mlx4/main.c                  |   67 +++++++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h               |   19 ++-
 drivers/infiniband/hw/mlx5/cq.c                    |   10 +-
 drivers/infiniband/hw/mlx5/main.c                  |   19 ++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h               |    5 +-
 drivers/infiniband/hw/mthca/mthca_provider.c       |   15 ++-
 drivers/infiniband/hw/nes/nes_verbs.c              |   17 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c        |   13 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h        |    9 +-
 drivers/infiniband/hw/qib/qib_cq.c                 |   11 +-
 drivers/infiniband/hw/qib/qib_verbs.c              |    6 +-
 drivers/infiniband/hw/qib/qib_verbs.h              |    5 +-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c       |   16 ++-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h       |   10 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c         |    9 +-
 drivers/infiniband/ulp/iser/iser_verbs.c           |    6 +-
 drivers/infiniband/ulp/isert/ib_isert.c            |    6 +-
 drivers/infiniband/ulp/srp/ib_srp.c                |   10 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c              |    5 +-
 drivers/net/ethernet/mellanox/mlx4/main.c          |   19 ++
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    7 +-
 include/linux/mlx4/device.h                        |    9 +
 include/rdma/ib_verbs.h                            |   25 ++-
 include/uapi/rdma/ib_user_verbs.h                  |   19 ++
 net/9p/trans_rdma.c                                |    5 +-
 net/rds/ib_cm.c                                    |    8 +-
 net/rds/iw_cm.c                                    |    8 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c           |   10 +-
 net/sunrpc/xprtrdma/verbs.c                        |   10 +-
 49 files changed, 564 insertions(+), 139 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread
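[Editorial sketch, not part of the posted series.] The cover letter says a raw completion timestamp is expressed in HCA cycles and that dividing it by hca_core_clock yields milliseconds, which implies the frequency is reported in kHz. Under that assumption, and further assuming that timestamp_mask can be used as a bitmask of the valid counter bits, the conversion might look roughly like the following. The helper names ts_delta_cycles() and cycles_to_ns() are hypothetical and do not appear in the patches or in libibverbs.

```c
#include <stdint.h>

/*
 * Hypothetical helper: cycles elapsed between two raw timestamps.
 * Assumes timestamp_mask is a bitmask of the form 2^n - 1, so the
 * subtraction stays correct when the hardware counter wraps around.
 */
static uint64_t ts_delta_cycles(uint64_t start, uint64_t end,
                                uint64_t timestamp_mask)
{
    return (end - start) & timestamp_mask;
}

/*
 * Hypothetical helper: convert a cycle count to nanoseconds.
 * Assumes hca_core_clock_khz is the core clock in kHz, i.e.
 * cycles / kHz = milliseconds, as the cover letter describes.
 * Returns 0 when the clock is reported as 0 (timestamping unsupported).
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t hca_core_clock_khz)
{
    uint64_t ms, rem;

    if (hca_core_clock_khz == 0)
        return 0;

    /* Split whole milliseconds from the remainder to limit overflow. */
    ms = cycles / hca_core_clock_khz;
    rem = cycles % hca_core_clock_khz;

    return ms * 1000000ULL + (rem * 1000000ULL) / hca_core_clock_khz;
}
```

A caller would feed two raw values from polled completions into ts_delta_cycles() and pass the result to cycles_to_ns(); whether the real user-space helper ends up looking like this depends on how the ibv_get_timestamp() discussion above is resolved.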
[parent not found: <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>]
* [PATCH for-next V2 1/9] IB/core: Change provider's API of create_cq to be extendible
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-05-31 12:14   ` Or Gerlitz
       [not found]     ` <1433074457-26437-2-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-05-31 12:14   ` [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr Or Gerlitz
                     ` (9 subsequent siblings)
  10 siblings, 1 reply; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:14 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add a new ib_cq_init_attr structure which contains the
previous cqe (minimum number of CQ entries) and comp_vector
(completion vector) in addition to a new flags field.
All vendors' create_cq callbacks are changed in order
to work with the new API.

This commit does not change any functionality.
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/core/uverbs_cmd.c | 6 ++++-- drivers/infiniband/core/verbs.c | 3 ++- drivers/infiniband/hw/amso1100/c2_provider.c | 7 ++++++- drivers/infiniband/hw/cxgb3/iwch_provider.c | 11 ++++++++--- drivers/infiniband/hw/cxgb4/cq.c | 9 +++++++-- drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 8 ++++---- drivers/infiniband/hw/ehca/ehca_cq.c | 7 ++++++- drivers/infiniband/hw/ehca/ehca_iverbs.h | 3 ++- drivers/infiniband/hw/ipath/ipath_cq.c | 9 +++++++-- drivers/infiniband/hw/ipath/ipath_verbs.h | 3 ++- drivers/infiniband/hw/mlx4/cq.c | 8 +++++++- drivers/infiniband/hw/mlx4/mlx4_ib.h | 3 ++- drivers/infiniband/hw/mlx5/cq.c | 10 ++++++++-- drivers/infiniband/hw/mlx5/main.c | 3 ++- drivers/infiniband/hw/mlx5/mlx5_ib.h | 5 +++-- drivers/infiniband/hw/mthca/mthca_provider.c | 8 ++++++-- drivers/infiniband/hw/nes/nes_verbs.c | 11 ++++++++--- drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 7 ++++++- drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 6 ++++-- drivers/infiniband/hw/qib/qib_cq.c | 11 ++++++++--- drivers/infiniband/hw/qib/qib_verbs.h | 5 +++-- drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 10 +++++++--- drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 7 ++++--- include/rdma/ib_verbs.h | 10 ++++++++-- 24 files changed, 124 insertions(+), 46 deletions(-) diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index a9f0489..1954ebb 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -1341,6 +1341,7 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file, struct ib_uverbs_event_file *ev_file = NULL; struct ib_cq *cq; int ret; + struct ib_cq_init_attr attr = {}; if (out_len < sizeof resp) return -ENOSPC; @@ -1376,8 +1377,9 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file, INIT_LIST_HEAD(&obj->comp_list); 
INIT_LIST_HEAD(&obj->async_list); - cq = file->device->ib_dev->create_cq(file->device->ib_dev, cmd.cqe, - cmd.comp_vector, + attr.cqe = cmd.cqe; + attr.comp_vector = cmd.comp_vector; + cq = file->device->ib_dev->create_cq(file->device->ib_dev, &attr, file->ucontext, &udata); if (IS_ERR(cq)) { ret = PTR_ERR(cq); diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 685a362..f7615d4 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1078,8 +1078,9 @@ struct ib_cq *ib_create_cq(struct ib_device *device, void *cq_context, int cqe, int comp_vector) { struct ib_cq *cq; + struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector}; - cq = device->create_cq(device, cqe, comp_vector, NULL, NULL); + cq = device->create_cq(device, &attr, NULL, NULL); if (!IS_ERR(cq)) { cq->device = device; diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c index d396c39..a43e022 100644 --- a/drivers/infiniband/hw/amso1100/c2_provider.c +++ b/drivers/infiniband/hw/amso1100/c2_provider.c @@ -286,13 +286,18 @@ static int c2_destroy_qp(struct ib_qp *ib_qp) return 0; } -static struct ib_cq *c2_create_cq(struct ib_device *ibdev, int entries, int vector, +static struct ib_cq *c2_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, struct ib_ucontext *context, struct ib_udata *udata) { + int entries = attr->cqe; struct c2_cq *cq; int err; + if (attr->flags) + return ERR_PTR(-EINVAL); + cq = kmalloc(sizeof(*cq), GFP_KERNEL); if (!cq) { pr_debug("%s: Unable to allocate CQ\n", __func__); diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index 061ef08..2eaf7e8 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -138,10 +138,12 @@ static int iwch_destroy_cq(struct ib_cq *ib_cq) return 0; } -static struct ib_cq *iwch_create_cq(struct ib_device 
*ibdev, int entries, int vector, - struct ib_ucontext *ib_context, - struct ib_udata *udata) +static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, + struct ib_ucontext *ib_context, + struct ib_udata *udata) { + int entries = attr->cqe; struct iwch_dev *rhp; struct iwch_cq *chp; struct iwch_create_cq_resp uresp; @@ -151,6 +153,9 @@ static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries, int ve size_t resplen; PDBG("%s ib_dev %p entries %d\n", __func__, ibdev, entries); + if (attr->flags) + return ERR_PTR(-EINVAL); + rhp = to_iwch_dev(ibdev); chp = kzalloc(sizeof(*chp), GFP_KERNEL); if (!chp) diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c index 68ddb37..9a2c01c 100644 --- a/drivers/infiniband/hw/cxgb4/cq.c +++ b/drivers/infiniband/hw/cxgb4/cq.c @@ -866,10 +866,13 @@ int c4iw_destroy_cq(struct ib_cq *ib_cq) return 0; } -struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, int entries, - int vector, struct ib_ucontext *ib_context, +struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, + struct ib_ucontext *ib_context, struct ib_udata *udata) { + int entries = attr->cqe; + int vector = attr->comp_vector; struct c4iw_dev *rhp; struct c4iw_cq *chp; struct c4iw_create_cq_resp uresp; @@ -879,6 +882,8 @@ struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, int entries, struct c4iw_mm_entry *mm, *mm2; PDBG("%s ib_dev %p entries %d\n", __func__, ibdev, entries); + if (attr->flags) + return ERR_PTR(-EINVAL); rhp = to_c4iw_dev(ibdev); diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h index 97bb555..540ed02 100644 --- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h +++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h @@ -992,10 +992,10 @@ int c4iw_reregister_phys_mem(struct ib_mr *mr, int acc, u64 *iova_start); int c4iw_dereg_mr(struct ib_mr *ib_mr); int c4iw_destroy_cq(struct ib_cq *ib_cq); -struct ib_cq 
*c4iw_create_cq(struct ib_device *ibdev, int entries, - int vector, - struct ib_ucontext *ib_context, - struct ib_udata *udata); +struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, + struct ib_ucontext *ib_context, + struct ib_udata *udata); int c4iw_resize_cq(struct ib_cq *cq, int cqe, struct ib_udata *udata); int c4iw_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags); int c4iw_destroy_qp(struct ib_qp *ib_qp); diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c index 8cc8375..9b68b17 100644 --- a/drivers/infiniband/hw/ehca/ehca_cq.c +++ b/drivers/infiniband/hw/ehca/ehca_cq.c @@ -113,10 +113,12 @@ struct ehca_qp *ehca_cq_get_qp(struct ehca_cq *cq, int real_qp_num) return ret; } -struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector, +struct ib_cq *ehca_create_cq(struct ib_device *device, + const struct ib_cq_init_attr *attr, struct ib_ucontext *context, struct ib_udata *udata) { + int cqe = attr->cqe; static const u32 additional_cqe = 20; struct ib_cq *cq; struct ehca_cq *my_cq; @@ -131,6 +133,9 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector, int ipz_rc, i; unsigned long flags; + if (attr->flags) + return ERR_PTR(-EINVAL); + if (cqe >= 0xFFFFFFFF - 64 - additional_cqe) return ERR_PTR(-EINVAL); diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h index 077185b..952f2fb 100644 --- a/drivers/infiniband/hw/ehca/ehca_iverbs.h +++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h @@ -129,7 +129,8 @@ int ehca_destroy_eq(struct ehca_shca *shca, struct ehca_eq *eq); void *ehca_poll_eq(struct ehca_shca *shca, struct ehca_eq *eq); -struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector, +struct ib_cq *ehca_create_cq(struct ib_device *device, + const struct ib_cq_init_attr *attr, struct ib_ucontext *context, struct ib_udata *udata); diff --git 
a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c index 0416c6c..e9dd911 100644 --- a/drivers/infiniband/hw/ipath/ipath_cq.c +++ b/drivers/infiniband/hw/ipath/ipath_cq.c @@ -188,7 +188,7 @@ static void send_complete(unsigned long data) /** * ipath_create_cq - create a completion queue * @ibdev: the device this completion queue is attached to - * @entries: the minimum size of the completion queue + * @attr: creation attributes * @context: unused by the InfiniPath driver * @udata: unused by the InfiniPath driver * @@ -197,16 +197,21 @@ static void send_complete(unsigned long data) * * Called by ib_create_cq() in the generic verbs code. */ -struct ib_cq *ipath_create_cq(struct ib_device *ibdev, int entries, int comp_vector, +struct ib_cq *ipath_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, struct ib_ucontext *context, struct ib_udata *udata) { + int entries = attr->cqe; struct ipath_ibdev *dev = to_idev(ibdev); struct ipath_cq *cq; struct ipath_cq_wc *wc; struct ib_cq *ret; u32 sz; + if (attr->flags) + return ERR_PTR(-EINVAL); + if (entries < 1 || entries > ib_ipath_max_cqes) { ret = ERR_PTR(-EINVAL); goto done; diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h index ae6cff4..c4e4e81 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.h +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h @@ -807,7 +807,8 @@ void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int sig); int ipath_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry); -struct ib_cq *ipath_create_cq(struct ib_device *ibdev, int entries, int comp_vector, +struct ib_cq *ipath_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, struct ib_ucontext *context, struct ib_udata *udata); diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 0176caa..8e44aaa 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c 
@@ -166,15 +166,21 @@ err_buf: return err; } -struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector, +struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, struct ib_ucontext *context, struct ib_udata *udata) { + int entries = attr->cqe; + int vector = attr->comp_vector; struct mlx4_ib_dev *dev = to_mdev(ibdev); struct mlx4_ib_cq *cq; struct mlx4_uar *uar; int err; + if (attr->flags) + return ERR_PTR(-EINVAL); + if (entries < 1 || entries > dev->dev->caps.max_cqes) return ERR_PTR(-EINVAL); diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index fce3934..f127efc 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -668,7 +668,8 @@ void mlx4_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list); int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period); int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata); -struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector, +struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, struct ib_ucontext *context, struct ib_udata *udata); int mlx4_ib_destroy_cq(struct ib_cq *cq); diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c index 2ee6b10..09fbae6 100644 --- a/drivers/infiniband/hw/mlx5/cq.c +++ b/drivers/infiniband/hw/mlx5/cq.c @@ -736,10 +736,13 @@ static void destroy_cq_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq) mlx5_db_free(dev->mdev, &cq->db); } -struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, int entries, - int vector, struct ib_ucontext *context, +struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, + struct ib_ucontext *context, struct ib_udata *udata) { + int entries = attr->cqe; + int vector = attr->comp_vector; struct mlx5_create_cq_mbox_in *cqb = NULL; struct 
mlx5_ib_dev *dev = to_mdev(ibdev); struct mlx5_ib_cq *cq; @@ -750,6 +753,9 @@ struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, int entries, int eqn; int err; + if (attr->flags) + return ERR_PTR(-EINVAL); + if (entries < 0) return ERR_PTR(-EINVAL); diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index b2fdb9c..9565c20 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1087,6 +1087,7 @@ static int create_dev_resources(struct mlx5_ib_resources *devr) { struct ib_srq_init_attr attr; struct mlx5_ib_dev *dev; + struct ib_cq_init_attr cq_attr = {.cqe = 1}; int ret = 0; dev = container_of(devr, struct mlx5_ib_dev, devr); @@ -1100,7 +1101,7 @@ static int create_dev_resources(struct mlx5_ib_resources *devr) devr->p0->uobject = NULL; atomic_set(&devr->p0->usecnt, 0); - devr->c0 = mlx5_ib_create_cq(&dev->ib_dev, 1, 0, NULL, NULL); + devr->c0 = mlx5_ib_create_cq(&dev->ib_dev, &cq_attr, NULL, NULL); if (IS_ERR(devr->c0)) { ret = PTR_ERR(devr->c0); goto error1; diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index dff1cfc..b086162 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -556,8 +556,9 @@ int mlx5_ib_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr, void *mlx5_get_send_wqe(struct mlx5_ib_qp *qp, int n); int mlx5_ib_read_user_wqe(struct mlx5_ib_qp *qp, int send, int wqe_index, void *buffer, u32 length); -struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, int entries, - int vector, struct ib_ucontext *context, +struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, + struct ib_ucontext *context, struct ib_udata *udata); int mlx5_ib_destroy_cq(struct ib_cq *cq); int mlx5_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc); diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c index 
509d59e..f1b04bd 100644 --- a/drivers/infiniband/hw/mthca/mthca_provider.c +++ b/drivers/infiniband/hw/mthca/mthca_provider.c @@ -641,16 +641,20 @@ static int mthca_destroy_qp(struct ib_qp *qp) return 0; } -static struct ib_cq *mthca_create_cq(struct ib_device *ibdev, int entries, - int comp_vector, +static struct ib_cq *mthca_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, struct ib_ucontext *context, struct ib_udata *udata) { + int entries = attr->cqe; struct mthca_create_cq ucmd; struct mthca_cq *cq; int nent; int err; + if (attr->flags) + return ERR_PTR(-EINVAL); + if (entries < 1 || entries > to_mdev(ibdev)->limits.max_cqes) return ERR_PTR(-EINVAL); diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c index 05530e3..a9d62c5 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -1526,10 +1526,12 @@ static int nes_destroy_qp(struct ib_qp *ibqp) /** * nes_create_cq */ -static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries, - int comp_vector, - struct ib_ucontext *context, struct ib_udata *udata) +static struct ib_cq *nes_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, + struct ib_ucontext *context, + struct ib_udata *udata) { + int entries = attr->cqe; u64 u64temp; struct nes_vnic *nesvnic = to_nesvnic(ibdev); struct nes_device *nesdev = nesvnic->nesdev; @@ -1549,6 +1551,9 @@ static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries, unsigned long flags; int ret; + if (attr->flags) + return ERR_PTR(-EINVAL); + if (entries > nesadapter->max_cqe) return ERR_PTR(-EINVAL); diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c index 8771755..f441df6 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c @@ -1000,10 +1000,12 @@ err: return status; } -struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev, int 
entries, int vector, +struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, struct ib_ucontext *ib_ctx, struct ib_udata *udata) { + int entries = attr->cqe; struct ocrdma_cq *cq; struct ocrdma_dev *dev = get_ocrdma_dev(ibdev); struct ocrdma_ucontext *uctx = NULL; @@ -1011,6 +1013,9 @@ struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev, int entries, int vector, int status; struct ocrdma_create_cq_ureq ureq; + if (attr->flags) + return ERR_PTR(-EINVAL); + if (udata) { if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) return ERR_PTR(-EFAULT); diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h index 3cdc81e..e853195 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h @@ -59,8 +59,10 @@ struct ib_pd *ocrdma_alloc_pd(struct ib_device *, struct ib_ucontext *, struct ib_udata *); int ocrdma_dealloc_pd(struct ib_pd *pd); -struct ib_cq *ocrdma_create_cq(struct ib_device *, int entries, int vector, - struct ib_ucontext *, struct ib_udata *); +struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, + struct ib_ucontext *ib_ctx, + struct ib_udata *udata); int ocrdma_resize_cq(struct ib_cq *, int cqe, struct ib_udata *); int ocrdma_destroy_cq(struct ib_cq *); diff --git a/drivers/infiniband/hw/qib/qib_cq.c b/drivers/infiniband/hw/qib/qib_cq.c index ab4e11c..2b45d0b 100644 --- a/drivers/infiniband/hw/qib/qib_cq.c +++ b/drivers/infiniband/hw/qib/qib_cq.c @@ -203,7 +203,7 @@ static void send_complete(struct kthread_work *work) /** * qib_create_cq - create a completion queue * @ibdev: the device this completion queue is attached to - * @entries: the minimum size of the completion queue + * @attr: creation attributes * @context: unused by the QLogic_IB driver * @udata: user data for libibverbs.so * @@ -212,16 +212,21 @@ static void send_complete(struct kthread_work *work) * * Called by 
ib_create_cq() in the generic verbs code. */ -struct ib_cq *qib_create_cq(struct ib_device *ibdev, int entries, - int comp_vector, struct ib_ucontext *context, +struct ib_cq *qib_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, + struct ib_ucontext *context, struct ib_udata *udata) { + int entries = attr->cqe; struct qib_ibdev *dev = to_idev(ibdev); struct qib_cq *cq; struct qib_cq_wc *wc; struct ib_cq *ret; u32 sz; + if (attr->flags) + return ERR_PTR(-EINVAL); + if (entries < 1 || entries > ib_qib_max_cqes) { ret = ERR_PTR(-EINVAL); goto done; diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h index bfc8948..e9b26a4 100644 --- a/drivers/infiniband/hw/qib/qib_verbs.h +++ b/drivers/infiniband/hw/qib/qib_verbs.h @@ -1007,8 +1007,9 @@ void qib_cq_enter(struct qib_cq *cq, struct ib_wc *entry, int sig); int qib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry); -struct ib_cq *qib_create_cq(struct ib_device *ibdev, int entries, - int comp_vector, struct ib_ucontext *context, +struct ib_cq *qib_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, + struct ib_ucontext *context, struct ib_udata *udata); int qib_destroy_cq(struct ib_cq *ibcq); diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c index 53bd6a2..2b3f7e3 100644 --- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c +++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c @@ -570,13 +570,17 @@ int usnic_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, return status; } -struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev, int entries, - int vector, struct ib_ucontext *context, - struct ib_udata *udata) +struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, + struct ib_ucontext *context, + struct ib_udata *udata) { struct ib_cq *cq; usnic_dbg("\n"); + if (attr->flags) + return ERR_PTR(-EINVAL); + cq = 
kzalloc(sizeof(*cq), GFP_KERNEL); if (!cq) return ERR_PTR(-EBUSY); diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h index 57ddba5..2ab8427 100644 --- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h +++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h @@ -46,9 +46,10 @@ struct ib_qp *usnic_ib_create_qp(struct ib_pd *pd, int usnic_ib_destroy_qp(struct ib_qp *qp); int usnic_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask, struct ib_udata *udata); -struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev, int entries, - int vector, struct ib_ucontext *context, - struct ib_udata *udata); +struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev, + const struct ib_cq_init_attr *attr, + struct ib_ucontext *context, + struct ib_udata *udata); int usnic_ib_destroy_cq(struct ib_cq *cq); struct ib_mr *usnic_ib_reg_mr(struct ib_pd *pd, u64 start, u64 length, u64 virt_addr, int access_flags, diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index ad499bd..6c715d7 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -173,6 +173,12 @@ struct ib_odp_caps { } per_transport_caps; }; +struct ib_cq_init_attr { + unsigned int cqe; + int comp_vector; + u32 flags; +}; + struct ib_device_attr { u64 fw_ver; __be64 sys_image_guid; @@ -1614,8 +1620,8 @@ struct ib_device { int (*post_recv)(struct ib_qp *qp, struct ib_recv_wr *recv_wr, struct ib_recv_wr **bad_recv_wr); - struct ib_cq * (*create_cq)(struct ib_device *device, int cqe, - int comp_vector, + struct ib_cq * (*create_cq)(struct ib_device *device, + const struct ib_cq_init_attr *attr, struct ib_ucontext *context, struct ib_udata *udata); int (*modify_cq)(struct ib_cq *cq, u16 cq_count, -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply 
related [flat|nested] 64+ messages in thread
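[Editorial sketch, not part of the posted series.] The pattern this patch applies to every provider is: take the creation parameters through one attribute struct, and return -EINVAL for any flag the provider does not understand, so that adding flags later cannot be silently ignored. The following is a user-space stand-in that compiles outside the kernel. The struct mirrors the fields of ib_cq_init_attr from the diff above, while DEMO_CQ_FLAG_TIMESTAMP, demo_create_cq_basic() and demo_create_cq_ts() are hypothetical names, and plain int return codes stand in for the kernel's ERR_PTR() convention.

```c
#include <errno.h>
#include <stdint.h>

/* User-space stand-in with the same fields as struct ib_cq_init_attr. */
struct cq_init_attr {
    unsigned int cqe;         /* minimum number of CQ entries */
    int          comp_vector; /* completion vector */
    uint32_t     flags;       /* creation flags */
};

/* Hypothetical flag bit, used only for this illustration. */
#define DEMO_CQ_FLAG_TIMESTAMP (1u << 0)

/*
 * Hypothetical provider with no flag support: like the callbacks
 * converted in this patch, it rejects any non-zero flags.
 */
static int demo_create_cq_basic(const struct cq_init_attr *attr)
{
    if (attr->flags)
        return -EINVAL;
    if (attr->cqe < 1)
        return -EINVAL;
    return 0;
}

/*
 * Hypothetical provider that understands one flag: it masks out the
 * bits it knows and rejects anything left over.
 */
static int demo_create_cq_ts(const struct cq_init_attr *attr)
{
    if (attr->flags & ~DEMO_CQ_FLAG_TIMESTAMP)
        return -EINVAL;
    if (attr->cqe < 1)
        return -EINVAL;
    return 0;
}
```

Because callers initialize the struct with designated initializers (as ib_create_cq() does in the diff), fields added later default to zero and existing call sites keep their behaviour.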
[parent not found: <1433074457-26437-2-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH for-next V2 1/9] IB/core: Change provider's API of create_cq to be extendible [not found] ` <1433074457-26437-2-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2015-06-01 9:21 ` Devesh Sharma 2015-06-01 17:00 ` Jason Gunthorpe 1 sibling, 0 replies; 64+ messages in thread From: Devesh Sharma @ 2015-06-01 9:21 UTC (permalink / raw) To: Or Gerlitz Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak Looks good. Reviewed-By: Devesh Sharma <devesh.sharma-1wcpHE2jlwO1Z/+hSey0Gg@public.gmane.org> On Sun, May 31, 2015 at 5:44 PM, Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote: > From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> > > Add a new ib_cq_init_attr structure which contains the > previous cqe (minimum number of CQ entries) and comp_vector > (completion vector) in addition to a new flags field. > All vendors' create_cq callbacks are changed in order > to work with the new API. > > This commit does not change any functionality. 
> > Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> > Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> > --- > drivers/infiniband/core/uverbs_cmd.c | 6 ++++-- > drivers/infiniband/core/verbs.c | 3 ++- > drivers/infiniband/hw/amso1100/c2_provider.c | 7 ++++++- > drivers/infiniband/hw/cxgb3/iwch_provider.c | 11 ++++++++--- > drivers/infiniband/hw/cxgb4/cq.c | 9 +++++++-- > drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 8 ++++---- > drivers/infiniband/hw/ehca/ehca_cq.c | 7 ++++++- > drivers/infiniband/hw/ehca/ehca_iverbs.h | 3 ++- > drivers/infiniband/hw/ipath/ipath_cq.c | 9 +++++++-- > drivers/infiniband/hw/ipath/ipath_verbs.h | 3 ++- > drivers/infiniband/hw/mlx4/cq.c | 8 +++++++- > drivers/infiniband/hw/mlx4/mlx4_ib.h | 3 ++- > drivers/infiniband/hw/mlx5/cq.c | 10 ++++++++-- > drivers/infiniband/hw/mlx5/main.c | 3 ++- > drivers/infiniband/hw/mlx5/mlx5_ib.h | 5 +++-- > drivers/infiniband/hw/mthca/mthca_provider.c | 8 ++++++-- > drivers/infiniband/hw/nes/nes_verbs.c | 11 ++++++++--- > drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 7 ++++++- > drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 6 ++++-- > drivers/infiniband/hw/qib/qib_cq.c | 11 ++++++++--- > drivers/infiniband/hw/qib/qib_verbs.h | 5 +++-- > drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 10 +++++++--- > drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 7 ++++--- > include/rdma/ib_verbs.h | 10 ++++++++-- > 24 files changed, 124 insertions(+), 46 deletions(-) > > diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c > index a9f0489..1954ebb 100644 > --- a/drivers/infiniband/core/uverbs_cmd.c > +++ b/drivers/infiniband/core/uverbs_cmd.c > @@ -1341,6 +1341,7 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file, > struct ib_uverbs_event_file *ev_file = NULL; > struct ib_cq *cq; > int ret; > + struct ib_cq_init_attr attr = {}; > > if (out_len < sizeof resp) > return -ENOSPC; > @@ -1376,8 +1377,9 @@ ssize_t 
ib_uverbs_create_cq(struct ib_uverbs_file *file, > INIT_LIST_HEAD(&obj->comp_list); > INIT_LIST_HEAD(&obj->async_list); > > - cq = file->device->ib_dev->create_cq(file->device->ib_dev, cmd.cqe, > - cmd.comp_vector, > + attr.cqe = cmd.cqe; > + attr.comp_vector = cmd.comp_vector; > + cq = file->device->ib_dev->create_cq(file->device->ib_dev, &attr, > file->ucontext, &udata); > if (IS_ERR(cq)) { > ret = PTR_ERR(cq); > diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c > index 685a362..f7615d4 100644 > --- a/drivers/infiniband/core/verbs.c > +++ b/drivers/infiniband/core/verbs.c > @@ -1078,8 +1078,9 @@ struct ib_cq *ib_create_cq(struct ib_device *device, > void *cq_context, int cqe, int comp_vector) > { > struct ib_cq *cq; > + struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector}; > > - cq = device->create_cq(device, cqe, comp_vector, NULL, NULL); > + cq = device->create_cq(device, &attr, NULL, NULL); > > if (!IS_ERR(cq)) { > cq->device = device; > diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c > index d396c39..a43e022 100644 > --- a/drivers/infiniband/hw/amso1100/c2_provider.c > +++ b/drivers/infiniband/hw/amso1100/c2_provider.c > @@ -286,13 +286,18 @@ static int c2_destroy_qp(struct ib_qp *ib_qp) > return 0; > } > > -static struct ib_cq *c2_create_cq(struct ib_device *ibdev, int entries, int vector, > +static struct ib_cq *c2_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > struct ib_ucontext *context, > struct ib_udata *udata) > { > + int entries = attr->cqe; > struct c2_cq *cq; > int err; > > + if (attr->flags) > + return ERR_PTR(-EINVAL); > + > cq = kmalloc(sizeof(*cq), GFP_KERNEL); > if (!cq) { > pr_debug("%s: Unable to allocate CQ\n", __func__); > diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c > index 061ef08..2eaf7e8 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c > 
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c > @@ -138,10 +138,12 @@ static int iwch_destroy_cq(struct ib_cq *ib_cq) > return 0; > } > > -static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries, int vector, > - struct ib_ucontext *ib_context, > - struct ib_udata *udata) > +static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > + struct ib_ucontext *ib_context, > + struct ib_udata *udata) > { > + int entries = attr->cqe; > struct iwch_dev *rhp; > struct iwch_cq *chp; > struct iwch_create_cq_resp uresp; > @@ -151,6 +153,9 @@ static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries, int ve > size_t resplen; > > PDBG("%s ib_dev %p entries %d\n", __func__, ibdev, entries); > + if (attr->flags) > + return ERR_PTR(-EINVAL); > + > rhp = to_iwch_dev(ibdev); > chp = kzalloc(sizeof(*chp), GFP_KERNEL); > if (!chp) > diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c > index 68ddb37..9a2c01c 100644 > --- a/drivers/infiniband/hw/cxgb4/cq.c > +++ b/drivers/infiniband/hw/cxgb4/cq.c > @@ -866,10 +866,13 @@ int c4iw_destroy_cq(struct ib_cq *ib_cq) > return 0; > } > > -struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, int entries, > - int vector, struct ib_ucontext *ib_context, > +struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > + struct ib_ucontext *ib_context, > struct ib_udata *udata) > { > + int entries = attr->cqe; > + int vector = attr->comp_vector; > struct c4iw_dev *rhp; > struct c4iw_cq *chp; > struct c4iw_create_cq_resp uresp; > @@ -879,6 +882,8 @@ struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, int entries, > struct c4iw_mm_entry *mm, *mm2; > > PDBG("%s ib_dev %p entries %d\n", __func__, ibdev, entries); > + if (attr->flags) > + return ERR_PTR(-EINVAL); > > rhp = to_c4iw_dev(ibdev); > > diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h > index 97bb555..540ed02 
100644 > --- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h > +++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h > @@ -992,10 +992,10 @@ int c4iw_reregister_phys_mem(struct ib_mr *mr, > int acc, u64 *iova_start); > int c4iw_dereg_mr(struct ib_mr *ib_mr); > int c4iw_destroy_cq(struct ib_cq *ib_cq); > -struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, int entries, > - int vector, > - struct ib_ucontext *ib_context, > - struct ib_udata *udata); > +struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > + struct ib_ucontext *ib_context, > + struct ib_udata *udata); > int c4iw_resize_cq(struct ib_cq *cq, int cqe, struct ib_udata *udata); > int c4iw_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags); > int c4iw_destroy_qp(struct ib_qp *ib_qp); > diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c > index 8cc8375..9b68b17 100644 > --- a/drivers/infiniband/hw/ehca/ehca_cq.c > +++ b/drivers/infiniband/hw/ehca/ehca_cq.c > @@ -113,10 +113,12 @@ struct ehca_qp *ehca_cq_get_qp(struct ehca_cq *cq, int real_qp_num) > return ret; > } > > -struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector, > +struct ib_cq *ehca_create_cq(struct ib_device *device, > + const struct ib_cq_init_attr *attr, > struct ib_ucontext *context, > struct ib_udata *udata) > { > + int cqe = attr->cqe; > static const u32 additional_cqe = 20; > struct ib_cq *cq; > struct ehca_cq *my_cq; > @@ -131,6 +133,9 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector, > int ipz_rc, i; > unsigned long flags; > > + if (attr->flags) > + return ERR_PTR(-EINVAL); > + > if (cqe >= 0xFFFFFFFF - 64 - additional_cqe) > return ERR_PTR(-EINVAL); > > diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h > index 077185b..952f2fb 100644 > --- a/drivers/infiniband/hw/ehca/ehca_iverbs.h > +++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h > @@ -129,7 +129,8 @@ int 
ehca_destroy_eq(struct ehca_shca *shca, struct ehca_eq *eq); > void *ehca_poll_eq(struct ehca_shca *shca, struct ehca_eq *eq); > > > -struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector, > +struct ib_cq *ehca_create_cq(struct ib_device *device, > + const struct ib_cq_init_attr *attr, > struct ib_ucontext *context, > struct ib_udata *udata); > > diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c > index 0416c6c..e9dd911 100644 > --- a/drivers/infiniband/hw/ipath/ipath_cq.c > +++ b/drivers/infiniband/hw/ipath/ipath_cq.c > @@ -188,7 +188,7 @@ static void send_complete(unsigned long data) > /** > * ipath_create_cq - create a completion queue > * @ibdev: the device this completion queue is attached to > - * @entries: the minimum size of the completion queue > + * @attr: creation attributes > * @context: unused by the InfiniPath driver > * @udata: unused by the InfiniPath driver > * > @@ -197,16 +197,21 @@ static void send_complete(unsigned long data) > * > * Called by ib_create_cq() in the generic verbs code. 
> */ > -struct ib_cq *ipath_create_cq(struct ib_device *ibdev, int entries, int comp_vector, > +struct ib_cq *ipath_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > struct ib_ucontext *context, > struct ib_udata *udata) > { > + int entries = attr->cqe; > struct ipath_ibdev *dev = to_idev(ibdev); > struct ipath_cq *cq; > struct ipath_cq_wc *wc; > struct ib_cq *ret; > u32 sz; > > + if (attr->flags) > + return ERR_PTR(-EINVAL); > + > if (entries < 1 || entries > ib_ipath_max_cqes) { > ret = ERR_PTR(-EINVAL); > goto done; > diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h > index ae6cff4..c4e4e81 100644 > --- a/drivers/infiniband/hw/ipath/ipath_verbs.h > +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h > @@ -807,7 +807,8 @@ void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int sig); > > int ipath_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry); > > -struct ib_cq *ipath_create_cq(struct ib_device *ibdev, int entries, int comp_vector, > +struct ib_cq *ipath_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > struct ib_ucontext *context, > struct ib_udata *udata); > > diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c > index 0176caa..8e44aaa 100644 > --- a/drivers/infiniband/hw/mlx4/cq.c > +++ b/drivers/infiniband/hw/mlx4/cq.c > @@ -166,15 +166,21 @@ err_buf: > return err; > } > > -struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector, > +struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > struct ib_ucontext *context, > struct ib_udata *udata) > { > + int entries = attr->cqe; > + int vector = attr->comp_vector; > struct mlx4_ib_dev *dev = to_mdev(ibdev); > struct mlx4_ib_cq *cq; > struct mlx4_uar *uar; > int err; > > + if (attr->flags) > + return ERR_PTR(-EINVAL); > + > if (entries < 1 || entries > dev->dev->caps.max_cqes) > return 
ERR_PTR(-EINVAL); > > diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h > index fce3934..f127efc 100644 > --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h > +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h > @@ -668,7 +668,8 @@ void mlx4_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list); > > int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period); > int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata); > -struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector, > +struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > struct ib_ucontext *context, > struct ib_udata *udata); > int mlx4_ib_destroy_cq(struct ib_cq *cq); > diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c > index 2ee6b10..09fbae6 100644 > --- a/drivers/infiniband/hw/mlx5/cq.c > +++ b/drivers/infiniband/hw/mlx5/cq.c > @@ -736,10 +736,13 @@ static void destroy_cq_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq) > mlx5_db_free(dev->mdev, &cq->db); > } > > -struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, int entries, > - int vector, struct ib_ucontext *context, > +struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > + struct ib_ucontext *context, > struct ib_udata *udata) > { > + int entries = attr->cqe; > + int vector = attr->comp_vector; > struct mlx5_create_cq_mbox_in *cqb = NULL; > struct mlx5_ib_dev *dev = to_mdev(ibdev); > struct mlx5_ib_cq *cq; > @@ -750,6 +753,9 @@ struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, int entries, > int eqn; > int err; > > + if (attr->flags) > + return ERR_PTR(-EINVAL); > + > if (entries < 0) > return ERR_PTR(-EINVAL); > > diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c > index b2fdb9c..9565c20 100644 > --- a/drivers/infiniband/hw/mlx5/main.c > +++ b/drivers/infiniband/hw/mlx5/main.c > @@ 
-1087,6 +1087,7 @@ static int create_dev_resources(struct mlx5_ib_resources *devr) > { > struct ib_srq_init_attr attr; > struct mlx5_ib_dev *dev; > + struct ib_cq_init_attr cq_attr = {.cqe = 1}; > int ret = 0; > > dev = container_of(devr, struct mlx5_ib_dev, devr); > @@ -1100,7 +1101,7 @@ static int create_dev_resources(struct mlx5_ib_resources *devr) > devr->p0->uobject = NULL; > atomic_set(&devr->p0->usecnt, 0); > > - devr->c0 = mlx5_ib_create_cq(&dev->ib_dev, 1, 0, NULL, NULL); > + devr->c0 = mlx5_ib_create_cq(&dev->ib_dev, &cq_attr, NULL, NULL); > if (IS_ERR(devr->c0)) { > ret = PTR_ERR(devr->c0); > goto error1; > diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h > index dff1cfc..b086162 100644 > --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h > +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h > @@ -556,8 +556,9 @@ int mlx5_ib_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr, > void *mlx5_get_send_wqe(struct mlx5_ib_qp *qp, int n); > int mlx5_ib_read_user_wqe(struct mlx5_ib_qp *qp, int send, int wqe_index, > void *buffer, u32 length); > -struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, int entries, > - int vector, struct ib_ucontext *context, > +struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > + struct ib_ucontext *context, > struct ib_udata *udata); > int mlx5_ib_destroy_cq(struct ib_cq *cq); > int mlx5_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc); > diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c > index 509d59e..f1b04bd 100644 > --- a/drivers/infiniband/hw/mthca/mthca_provider.c > +++ b/drivers/infiniband/hw/mthca/mthca_provider.c > @@ -641,16 +641,20 @@ static int mthca_destroy_qp(struct ib_qp *qp) > return 0; > } > > -static struct ib_cq *mthca_create_cq(struct ib_device *ibdev, int entries, > - int comp_vector, > +static struct ib_cq *mthca_create_cq(struct ib_device *ibdev, > + const struct 
ib_cq_init_attr *attr, > struct ib_ucontext *context, > struct ib_udata *udata) > { > + int entries = attr->cqe; > struct mthca_create_cq ucmd; > struct mthca_cq *cq; > int nent; > int err; > > + if (attr->flags) > + return ERR_PTR(-EINVAL); > + > if (entries < 1 || entries > to_mdev(ibdev)->limits.max_cqes) > return ERR_PTR(-EINVAL); > > diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c > index 05530e3..a9d62c5 100644 > --- a/drivers/infiniband/hw/nes/nes_verbs.c > +++ b/drivers/infiniband/hw/nes/nes_verbs.c > @@ -1526,10 +1526,12 @@ static int nes_destroy_qp(struct ib_qp *ibqp) > /** > * nes_create_cq > */ > -static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries, > - int comp_vector, > - struct ib_ucontext *context, struct ib_udata *udata) > +static struct ib_cq *nes_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > + struct ib_ucontext *context, > + struct ib_udata *udata) > { > + int entries = attr->cqe; > u64 u64temp; > struct nes_vnic *nesvnic = to_nesvnic(ibdev); > struct nes_device *nesdev = nesvnic->nesdev; > @@ -1549,6 +1551,9 @@ static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries, > unsigned long flags; > int ret; > > + if (attr->flags) > + return ERR_PTR(-EINVAL); > + > if (entries > nesadapter->max_cqe) > return ERR_PTR(-EINVAL); > > diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c > index 8771755..f441df6 100644 > --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c > +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c > @@ -1000,10 +1000,12 @@ err: > return status; > } > > -struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev, int entries, int vector, > +struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > struct ib_ucontext *ib_ctx, > struct ib_udata *udata) > { > + int entries = attr->cqe; > struct ocrdma_cq *cq; > struct ocrdma_dev *dev = 
get_ocrdma_dev(ibdev); > struct ocrdma_ucontext *uctx = NULL; > @@ -1011,6 +1013,9 @@ struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev, int entries, int vector, > int status; > struct ocrdma_create_cq_ureq ureq; > > + if (attr->flags) > + return ERR_PTR(-EINVAL); > + > if (udata) { > if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) > return ERR_PTR(-EFAULT); > diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h > index 3cdc81e..e853195 100644 > --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h > +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h > @@ -59,8 +59,10 @@ struct ib_pd *ocrdma_alloc_pd(struct ib_device *, > struct ib_ucontext *, struct ib_udata *); > int ocrdma_dealloc_pd(struct ib_pd *pd); > > -struct ib_cq *ocrdma_create_cq(struct ib_device *, int entries, int vector, > - struct ib_ucontext *, struct ib_udata *); > +struct ib_cq *ocrdma_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > + struct ib_ucontext *ib_ctx, > + struct ib_udata *udata); > int ocrdma_resize_cq(struct ib_cq *, int cqe, struct ib_udata *); > int ocrdma_destroy_cq(struct ib_cq *); > > diff --git a/drivers/infiniband/hw/qib/qib_cq.c b/drivers/infiniband/hw/qib/qib_cq.c > index ab4e11c..2b45d0b 100644 > --- a/drivers/infiniband/hw/qib/qib_cq.c > +++ b/drivers/infiniband/hw/qib/qib_cq.c > @@ -203,7 +203,7 @@ static void send_complete(struct kthread_work *work) > /** > * qib_create_cq - create a completion queue > * @ibdev: the device this completion queue is attached to > - * @entries: the minimum size of the completion queue > + * @attr: creation attributes > * @context: unused by the QLogic_IB driver > * @udata: user data for libibverbs.so > * > @@ -212,16 +212,21 @@ static void send_complete(struct kthread_work *work) > * > * Called by ib_create_cq() in the generic verbs code. 
> */ > -struct ib_cq *qib_create_cq(struct ib_device *ibdev, int entries, > - int comp_vector, struct ib_ucontext *context, > +struct ib_cq *qib_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > + struct ib_ucontext *context, > struct ib_udata *udata) > { > + int entries = attr->cqe; > struct qib_ibdev *dev = to_idev(ibdev); > struct qib_cq *cq; > struct qib_cq_wc *wc; > struct ib_cq *ret; > u32 sz; > > + if (attr->flags) > + return ERR_PTR(-EINVAL); > + > if (entries < 1 || entries > ib_qib_max_cqes) { > ret = ERR_PTR(-EINVAL); > goto done; > diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h > index bfc8948..e9b26a4 100644 > --- a/drivers/infiniband/hw/qib/qib_verbs.h > +++ b/drivers/infiniband/hw/qib/qib_verbs.h > @@ -1007,8 +1007,9 @@ void qib_cq_enter(struct qib_cq *cq, struct ib_wc *entry, int sig); > > int qib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry); > > -struct ib_cq *qib_create_cq(struct ib_device *ibdev, int entries, > - int comp_vector, struct ib_ucontext *context, > +struct ib_cq *qib_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > + struct ib_ucontext *context, > struct ib_udata *udata); > > int qib_destroy_cq(struct ib_cq *ibcq); > diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c > index 53bd6a2..2b3f7e3 100644 > --- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c > +++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c > @@ -570,13 +570,17 @@ int usnic_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, > return status; > } > > -struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev, int entries, > - int vector, struct ib_ucontext *context, > - struct ib_udata *udata) > +struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > + struct ib_ucontext *context, > + struct ib_udata *udata) > { > struct ib_cq *cq; > > usnic_dbg("\n"); > 
+ if (attr->flags) > + return ERR_PTR(-EINVAL); > + > cq = kzalloc(sizeof(*cq), GFP_KERNEL); > if (!cq) > return ERR_PTR(-EBUSY); > diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h > index 57ddba5..2ab8427 100644 > --- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h > +++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h > @@ -46,9 +46,10 @@ struct ib_qp *usnic_ib_create_qp(struct ib_pd *pd, > int usnic_ib_destroy_qp(struct ib_qp *qp); > int usnic_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, > int attr_mask, struct ib_udata *udata); > -struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev, int entries, > - int vector, struct ib_ucontext *context, > - struct ib_udata *udata); > +struct ib_cq *usnic_ib_create_cq(struct ib_device *ibdev, > + const struct ib_cq_init_attr *attr, > + struct ib_ucontext *context, > + struct ib_udata *udata); > int usnic_ib_destroy_cq(struct ib_cq *cq); > struct ib_mr *usnic_ib_reg_mr(struct ib_pd *pd, u64 start, u64 length, > u64 virt_addr, int access_flags, > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h > index ad499bd..6c715d7 100644 > --- a/include/rdma/ib_verbs.h > +++ b/include/rdma/ib_verbs.h > @@ -173,6 +173,12 @@ struct ib_odp_caps { > } per_transport_caps; > }; > > +struct ib_cq_init_attr { > + unsigned int cqe; > + int comp_vector; > + u32 flags; > +}; > + > struct ib_device_attr { > u64 fw_ver; > __be64 sys_image_guid; > @@ -1614,8 +1620,8 @@ struct ib_device { > int (*post_recv)(struct ib_qp *qp, > struct ib_recv_wr *recv_wr, > struct ib_recv_wr **bad_recv_wr); > - struct ib_cq * (*create_cq)(struct ib_device *device, int cqe, > - int comp_vector, > + struct ib_cq * (*create_cq)(struct ib_device *device, > + const struct ib_cq_init_attr *attr, > struct ib_ucontext *context, > struct ib_udata *udata); > int (*modify_cq)(struct ib_cq *cq, u16 cq_count, > -- > 1.7.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" 
in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH for-next V2 1/9] IB/core: Change provider's API of create_cq to be extendible [not found] ` <1433074457-26437-2-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-06-01 9:21 ` Devesh Sharma @ 2015-06-01 17:00 ` Jason Gunthorpe 1 sibling, 0 replies; 64+ messages in thread From: Jason Gunthorpe @ 2015-06-01 17:00 UTC (permalink / raw) To: Or Gerlitz Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak On Sun, May 31, 2015 at 03:14:09PM +0300, Or Gerlitz wrote: > From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> > > Add a new ib_cq_init_attr structure which contains the > previous cqe (minimum number of CQ entries) and comp_vector > (completion vector) in addition to a new flags field. > All vendors' create_cq callbacks are changed in order > to work with the new API. > > This commit does not change any functionality. > > Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> > Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> Reviewed-By: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> Jason
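For readers following the thread, the pattern patch 1/9 applies to every provider can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: the struct mirrors the fields of ib_cq_init_attr from the diff, but mock_cq, mock_create_cq, mock_create_cq_legacy and MOCK_MAX_CQES are hypothetical names, and the functions return a plain negative errno where the real callbacks return ERR_PTR().

```c
#include <errno.h>
#include <stdint.h>

/* Same three fields as struct ib_cq_init_attr in the patch;
 * the name and the userspace types are assumptions of this sketch. */
struct cq_init_attr {
	unsigned int cqe;	/* minimum number of CQ entries */
	int comp_vector;	/* completion vector */
	uint32_t flags;		/* creation flags, none defined yet */
};

/* Hypothetical stand-in for struct ib_cq. */
struct mock_cq {
	unsigned int cqe;
	int comp_vector;
};

/* Hypothetical device limit, playing the role of caps.max_cqes. */
#define MOCK_MAX_CQES 1024

/* Provider-side callback: takes the attr struct instead of loose
 * (cqe, comp_vector) arguments and rejects any flag it does not know,
 * as each driver in the patch does with "if (attr->flags)". */
int mock_create_cq(const struct cq_init_attr *attr, struct mock_cq *cq)
{
	if (attr->flags)
		return -EINVAL;
	if (attr->cqe < 1 || attr->cqe > MOCK_MAX_CQES)
		return -EINVAL;

	cq->cqe = attr->cqe;
	cq->comp_vector = attr->comp_vector;
	return 0;
}

/* Core-side wrapper keeping the old two-argument signature, in the
 * spirit of ib_create_cq() after patch 1/9: it packs the arguments
 * into a struct whose flags field is implicitly zero. */
int mock_create_cq_legacy(int cqe, int comp_vector, struct mock_cq *cq)
{
	struct cq_init_attr attr = { .cqe = cqe, .comp_vector = comp_vector };

	return mock_create_cq(&attr, cq);
}
```

Because a provider that has not been taught a flag fails the call outright, a later patch can define a new flag bit without an old driver silently ignoring it.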
* [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-05-31 12:14 ` [PATCH for-next V2 1/9] IB/core: Change provider's API of create_cq to be extendible Or Gerlitz @ 2015-05-31 12:14 ` Or Gerlitz [not found] ` <1433074457-26437-3-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-05-31 12:14 ` [PATCH for-next V2 3/9] IB/core: Add CQ creation time-stamping flag Or Gerlitz ` (8 subsequent siblings) 10 siblings, 1 reply; 64+ messages in thread From: Or Gerlitz @ 2015-05-31 12:14 UTC (permalink / raw) To: Doug Ledford Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak, Or Gerlitz From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> Currently, ib_create_cq uses cqe and comp_vector instead of the extendible ib_cq_init_attr struct. Earlier patches already changed the vendors to work with ib_cq_init_attr. This patch changes the consumers too.
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/core/mad.c | 5 ++++- drivers/infiniband/core/verbs.c | 5 ++--- drivers/infiniband/hw/ehca/ehca_main.c | 6 +++++- drivers/infiniband/hw/mlx4/mad.c | 5 ++++- drivers/infiniband/hw/mlx4/main.c | 5 ++++- drivers/infiniband/hw/mlx5/main.c | 7 +++++-- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 9 +++++++-- drivers/infiniband/ulp/iser/iser_verbs.c | 6 +++++- drivers/infiniband/ulp/isert/ib_isert.c | 6 +++++- drivers/infiniband/ulp/srp/ib_srp.c | 10 ++++++++-- drivers/infiniband/ulp/srpt/ib_srpt.c | 5 ++++- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 7 +++++-- include/rdma/ib_verbs.h | 6 ++---- net/9p/trans_rdma.c | 5 ++++- net/rds/ib_cm.c | 8 ++++++-- net/rds/iw_cm.c | 8 ++++++-- net/sunrpc/xprtrdma/svc_rdma_transport.c | 10 ++++++---- net/sunrpc/xprtrdma/verbs.c | 10 ++++++---- 18 files changed, 88 insertions(+), 35 deletions(-) diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 600af26..ad3f729 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -2923,6 +2923,7 @@ static int ib_mad_port_open(struct ib_device *device, unsigned long flags; char name[sizeof "ib_mad123"]; int has_smi; + struct ib_cq_init_attr cq_attr; /* Create new device info */ port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL); @@ -2943,9 +2944,11 @@ static int ib_mad_port_open(struct ib_device *device, if (has_smi) cq_size *= 2; + memset(&cq_attr, 0, sizeof(cq_attr)); + cq_attr.cqe = cq_size; port_priv->cq = ib_create_cq(port_priv->device, ib_mad_thread_completion_handler, - NULL, port_priv, cq_size, 0); + NULL, port_priv, &cq_attr); if (IS_ERR(port_priv->cq)) { dev_err(&device->dev, "Couldn't create ib_mad CQ\n"); ret = PTR_ERR(port_priv->cq); diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index f7615d4..a716ae2 100644 --- 
a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1075,12 +1075,11 @@ EXPORT_SYMBOL(ib_destroy_qp); struct ib_cq *ib_create_cq(struct ib_device *device, ib_comp_handler comp_handler, void (*event_handler)(struct ib_event *, void *), - void *cq_context, int cqe, int comp_vector) + void *cq_context, struct ib_cq_init_attr *cq_attr) { struct ib_cq *cq; - struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector}; - cq = device->create_cq(device, &attr, NULL, NULL); + cq = device->create_cq(device, cq_attr, NULL, NULL); if (!IS_ERR(cq)) { cq->device = device; diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 5e30b72..7727556 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -552,6 +552,7 @@ static int ehca_create_aqp1(struct ehca_shca *shca, u32 port) struct ib_cq *ibcq; struct ib_qp *ibqp; struct ib_qp_init_attr qp_init_attr; + struct ib_cq_init_attr cq_attr; int ret; if (sport->ibcq_aqp1) { @@ -559,7 +560,10 @@ static int ehca_create_aqp1(struct ehca_shca *shca, u32 port) return -EPERM; } - ibcq = ib_create_cq(&shca->ib_device, NULL, NULL, (void *)(-1), 10, 0); + memset(&cq_attr, 0, sizeof(cq_attr)); + cq_attr.cqe = 10; + ibcq = ib_create_cq(&shca->ib_device, NULL, NULL, (void *)(-1), + &cq_attr); if (IS_ERR(ibcq)) { ehca_err(&shca->ib_device, "Cannot create AQP1 CQ."); return PTR_ERR(ibcq); diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c index 9cd2b00..462e728 100644 --- a/drivers/infiniband/hw/mlx4/mad.c +++ b/drivers/infiniband/hw/mlx4/mad.c @@ -1773,6 +1773,7 @@ static int create_pv_resources(struct ib_device *ibdev, int slave, int port, int create_tun, struct mlx4_ib_demux_pv_ctx *ctx) { int ret, cq_size; + struct ib_cq_init_attr cq_attr; if (ctx->state != DEMUX_PV_STATE_DOWN) return -EEXIST; @@ -1801,8 +1802,10 @@ static int create_pv_resources(struct ib_device *ibdev, int slave, int port, if 
(ctx->has_smi) cq_size *= 2; + memset(&cq_attr, 0, sizeof(cq_attr)); + cq_attr.cqe = cq_size; ctx->cq = ib_create_cq(ctx->ib_dev, mlx4_ib_tunnel_comp_handler, - NULL, ctx, cq_size, 0); + NULL, ctx, &cq_attr); if (IS_ERR(ctx->cq)) { ret = PTR_ERR(ctx->cq); pr_err("Couldn't create tunnel CQ (%d)\n", ret); diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index c49dd0b..3b90df0 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -758,6 +758,7 @@ static struct ib_xrcd *mlx4_ib_alloc_xrcd(struct ib_device *ibdev, struct ib_udata *udata) { struct mlx4_ib_xrcd *xrcd; + struct ib_cq_init_attr cq_attr; int err; if (!(to_mdev(ibdev)->dev->caps.flags & MLX4_DEV_CAP_FLAG_XRC)) @@ -777,7 +778,9 @@ static struct ib_xrcd *mlx4_ib_alloc_xrcd(struct ib_device *ibdev, goto err2; } - xrcd->cq = ib_create_cq(ibdev, NULL, NULL, xrcd, 1, 0); + memset(&cq_attr, 0, sizeof(cq_attr)); + cq_attr.cqe = 1; + xrcd->cq = ib_create_cq(ibdev, NULL, NULL, xrcd, &cq_attr); if (IS_ERR(xrcd->cq)) { err = PTR_ERR(xrcd->cq); goto err3; diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 9565c20..b9976a0 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -971,6 +971,7 @@ static int create_umr_res(struct mlx5_ib_dev *dev) struct ib_cq *cq; struct ib_qp *qp; struct ib_mr *mr; + struct ib_cq_init_attr cq_attr; int ret; attr = kzalloc(sizeof(*attr), GFP_KERNEL); @@ -994,8 +995,10 @@ static int create_umr_res(struct mlx5_ib_dev *dev) goto error_1; } - cq = ib_create_cq(&dev->ib_dev, mlx5_umr_cq_handler, NULL, NULL, 128, - 0); + memset(&cq_attr, 0, sizeof(cq_attr)); + cq_attr.cqe = 128; + cq = ib_create_cq(&dev->ib_dev, mlx5_umr_cq_handler, NULL, NULL, + &cq_attr); if (IS_ERR(cq)) { mlx5_ib_dbg(dev, "Couldn't create CQ for sync UMR QP\n"); ret = PTR_ERR(cq); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index 
e5cc430..93a0ca4 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -141,6 +141,7 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) .sq_sig_type = IB_SIGNAL_ALL_WR, .qp_type = IB_QPT_UD }; + struct ib_cq_init_attr cq_attr; int ret, size; int i; @@ -178,14 +179,18 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) } else goto out_free_wq; - priv->recv_cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size, 0); + memset(&cq_attr, 0, sizeof(cq_attr)); + cq_attr.cqe = size; + priv->recv_cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, + dev, &cq_attr); if (IS_ERR(priv->recv_cq)) { printk(KERN_WARNING "%s: failed to create receive CQ\n", ca->name); goto out_cm_dev_cleanup; } + cq_attr.cqe = ipoib_sendq_size; priv->send_cq = ib_create_cq(priv->ca, ipoib_send_comp_handler, NULL, - dev, ipoib_sendq_size, 0); + dev, &cq_attr); if (IS_ERR(priv->send_cq)) { printk(KERN_WARNING "%s: failed to create send CQ\n", ca->name); goto out_free_recv_cq; diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index d33c5c0..e287287 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -126,14 +126,18 @@ static int iser_create_device_ib_res(struct iser_device *device) goto pd_err; for (i = 0; i < device->comps_used; i++) { + struct ib_cq_init_attr cq_attr; struct iser_comp *comp = &device->comps[i]; comp->device = device; + memset(&cq_attr, 0, sizeof(cq_attr)); + cq_attr.cqe = max_cqe; + cq_attr.comp_vector = i; comp->cq = ib_create_cq(device->ib_device, iser_cq_callback, iser_cq_event_callback, (void *)comp, - max_cqe, i); + &cq_attr); if (IS_ERR(comp->cq)) { comp->cq = NULL; goto cq_err; diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index d99a0c8..59ac1fc 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ 
b/drivers/infiniband/ulp/isert/ib_isert.c @@ -318,15 +318,19 @@ isert_alloc_comps(struct isert_device *device, max_cqe = min(ISER_MAX_CQ_LEN, attr->max_cqe); for (i = 0; i < device->comps_used; i++) { + struct ib_cq_init_attr cq_attr; struct isert_comp *comp = &device->comps[i]; comp->device = device; INIT_WORK(&comp->work, isert_cq_work); + memset(&cq_attr, 0, sizeof(cq_attr)); + cq_attr.cqe = max_cqe; + cq_attr.comp_vector = i; comp->cq = ib_create_cq(device->ib_device, isert_cq_callback, isert_cq_event_callback, (void *)comp, - max_cqe, i); + &cq_attr); if (IS_ERR(comp->cq)) { isert_err("Unable to allocate cq\n"); ret = PTR_ERR(comp->cq); diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index c3f654d..16b29cb 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -500,6 +500,7 @@ static int srp_create_ch_ib(struct srp_rdma_ch *ch) struct ib_fmr_pool *fmr_pool = NULL; struct srp_fr_pool *fr_pool = NULL; const int m = 1 + dev->use_fast_reg; + struct ib_cq_init_attr cq_attr; int ret; init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL); @@ -507,15 +508,20 @@ static int srp_create_ch_ib(struct srp_rdma_ch *ch) return -ENOMEM; /* + 1 for SRP_LAST_WR_ID */ + memset(&cq_attr, 0, sizeof(cq_attr)); + cq_attr.cqe = target->queue_size + 1; + cq_attr.comp_vector = ch->comp_vector; recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, ch, - target->queue_size + 1, ch->comp_vector); + &cq_attr); if (IS_ERR(recv_cq)) { ret = PTR_ERR(recv_cq); goto err; } + cq_attr.cqe = m * target->queue_size; + cq_attr.comp_vector = ch->comp_vector; send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, ch, - m * target->queue_size, ch->comp_vector); + &cq_attr); if (IS_ERR(send_cq)) { ret = PTR_ERR(send_cq); goto err_recv_cq; diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 9b84b4c..1a65202 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ 
b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -2080,6 +2080,7 @@ static int srpt_create_ch_ib(struct srpt_rdma_ch *ch) struct srpt_port *sport = ch->sport; struct srpt_device *sdev = sport->sdev; u32 srp_sq_size = sport->port_attrib.srp_sq_size; + struct ib_cq_init_attr cq_attr; int ret; WARN_ON(ch->rq_size < 1); @@ -2090,8 +2091,10 @@ static int srpt_create_ch_ib(struct srpt_rdma_ch *ch) goto out; retry: + memset(&cq_attr, 0, sizeof(cq_attr)); + cq_attr.cqe = ch->rq_size + srp_sq_size; ch->cq = ib_create_cq(sdev->device, srpt_completion, NULL, ch, - ch->rq_size + srp_sq_size, 0); + &cq_attr); if (IS_ERR(ch->cq)) { ret = PTR_ERR(ch->cq); pr_err("failed to create CQ cqe= %d ret= %d\n", diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c index 3bad441..a23aa76 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c @@ -647,6 +647,7 @@ kib_conn_t *kiblnd_create_conn(kib_peer_t *peer, struct rdma_cm_id *cmid, kib_dev_t *dev; struct ib_qp_init_attr *init_qp_attr; struct kib_sched_info *sched; + struct ib_cq_init_attr cq_attr; kib_conn_t *conn; struct ib_cq *cq; unsigned long flags; @@ -742,10 +743,12 @@ kib_conn_t *kiblnd_create_conn(kib_peer_t *peer, struct rdma_cm_id *cmid, kiblnd_map_rx_descs(conn); + memset(&cq_attr, 0, sizeof(cq_attr)); + cq_attr.cqe = IBLND_CQ_ENTRIES(version); + cq_attr.comp_vector = kiblnd_get_completion_vector(conn, cpt); cq = ib_create_cq(cmid->device, kiblnd_cq_completion, kiblnd_cq_event, conn, - IBLND_CQ_ENTRIES(version), - kiblnd_get_completion_vector(conn, cpt)); + &cq_attr); if (IS_ERR(cq)) { CERROR("Can't create CQ: %ld, cqe: %d\n", PTR_ERR(cq), IBLND_CQ_ENTRIES(version)); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 6c715d7..321a0b3 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -2314,16 +2314,14 @@ static inline int ib_post_recv(struct ib_qp *qp, * 
asynchronous event not associated with a completion occurs on the CQ. * @cq_context: Context associated with the CQ returned to the user via * the associated completion and event handlers. - * @cqe: The minimum size of the CQ. - * @comp_vector - Completion vector used to signal completion events. - * Must be >= 0 and < context->num_comp_vectors. + * @cq_attr: The attributes the CQ should be created upon. * * Users can examine the cq structure to determine the actual CQ size. */ struct ib_cq *ib_create_cq(struct ib_device *device, ib_comp_handler comp_handler, void (*event_handler)(struct ib_event *, void *), - void *cq_context, int cqe, int comp_vector); + void *cq_context, struct ib_cq_init_attr *cq_attr); /** * ib_resize_cq - Modifies the capacity of the CQ. diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c index 3533d2a..bfbcb45 100644 --- a/net/9p/trans_rdma.c +++ b/net/9p/trans_rdma.c @@ -648,6 +648,7 @@ rdma_create_trans(struct p9_client *client, const char *addr, char *args) struct rdma_conn_param conn_param; struct ib_qp_init_attr qp_attr; struct ib_device_attr devattr; + struct ib_cq_init_attr cq_attr; /* Parse the transport specific mount options */ err = parse_opts(args, &opts); @@ -705,9 +706,11 @@ rdma_create_trans(struct p9_client *client, const char *addr, char *args) goto error; /* Create the Completion Queue */ + memset(&cq_attr, 0, sizeof(cq_attr)); + cq_attr.cqe = opts.sq_depth + opts.rq_depth + 1; rdma->cq = ib_create_cq(rdma->cm_id->device, cq_comp_handler, cq_event_handler, client, - opts.sq_depth + opts.rq_depth + 1, 0); + &cq_attr); if (IS_ERR(rdma->cq)) goto error; ib_req_notify_cq(rdma->cq, IB_CQ_NEXT_COMP); diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c index b8d1bda..7aa1488 100644 --- a/net/rds/ib_cm.c +++ b/net/rds/ib_cm.c @@ -247,6 +247,7 @@ static int rds_ib_setup_qp(struct rds_connection *conn) struct rds_ib_connection *ic = conn->c_transport_data; struct ib_device *dev = ic->i_cm_id->device; struct ib_qp_init_attr attr; + 
struct ib_cq_init_attr cq_attr; struct rds_ib_device *rds_ibdev; int ret; @@ -270,9 +271,11 @@ static int rds_ib_setup_qp(struct rds_connection *conn) ic->i_pd = rds_ibdev->pd; ic->i_mr = rds_ibdev->mr; + memset(&cq_attr, 0, sizeof(cq_attr)); + cq_attr.cqe = ic->i_send_ring.w_nr + 1; ic->i_send_cq = ib_create_cq(dev, rds_ib_send_cq_comp_handler, rds_ib_cq_event_handler, conn, - ic->i_send_ring.w_nr + 1, 0); + &cq_attr); if (IS_ERR(ic->i_send_cq)) { ret = PTR_ERR(ic->i_send_cq); ic->i_send_cq = NULL; @@ -280,9 +283,10 @@ static int rds_ib_setup_qp(struct rds_connection *conn) goto out; } + cq_attr.cqe = ic->i_recv_ring.w_nr; ic->i_recv_cq = ib_create_cq(dev, rds_ib_recv_cq_comp_handler, rds_ib_cq_event_handler, conn, - ic->i_recv_ring.w_nr, 0); + &cq_attr); if (IS_ERR(ic->i_recv_cq)) { ret = PTR_ERR(ic->i_recv_cq); ic->i_recv_cq = NULL; diff --git a/net/rds/iw_cm.c b/net/rds/iw_cm.c index a6c2bea..bc29196 100644 --- a/net/rds/iw_cm.c +++ b/net/rds/iw_cm.c @@ -179,6 +179,7 @@ static int rds_iw_init_qp_attrs(struct ib_qp_init_attr *attr, void *context) { struct ib_device *dev = rds_iwdev->dev; + struct ib_cq_init_attr cq_attr; unsigned int send_size, recv_size; int ret; @@ -198,9 +199,11 @@ static int rds_iw_init_qp_attrs(struct ib_qp_init_attr *attr, attr->sq_sig_type = IB_SIGNAL_REQ_WR; attr->qp_type = IB_QPT_RC; + memset(&cq_attr, 0, sizeof(cq_attr)); + cq_attr.cqe = send_size; attr->send_cq = ib_create_cq(dev, send_cq_handler, rds_iw_cq_event_handler, - context, send_size, 0); + context, &cq_attr); if (IS_ERR(attr->send_cq)) { ret = PTR_ERR(attr->send_cq); attr->send_cq = NULL; @@ -208,9 +211,10 @@ static int rds_iw_init_qp_attrs(struct ib_qp_init_attr *attr, goto out; } + cq_attr.cqe = recv_size; attr->recv_cq = ib_create_cq(dev, recv_cq_handler, rds_iw_cq_event_handler, - context, recv_size, 0); + context, &cq_attr); if (IS_ERR(attr->recv_cq)) { ret = PTR_ERR(attr->recv_cq); attr->recv_cq = NULL; diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c 
b/net/sunrpc/xprtrdma/svc_rdma_transport.c index 88eb994..ad96ee5 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -855,6 +855,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt) struct svcxprt_rdma *listen_rdma; struct svcxprt_rdma *newxprt = NULL; struct rdma_conn_param conn_param; + struct ib_cq_init_attr cq_attr; struct ib_qp_init_attr qp_attr; struct ib_device_attr devattr; int uninitialized_var(dma_mr_acc); @@ -907,22 +908,23 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt) dprintk("svcrdma: error creating PD for connect request\n"); goto errout; } + memset(&cq_attr, 0, sizeof(cq_attr)); + cq_attr.cqe = newxprt->sc_sq_depth; newxprt->sc_sq_cq = ib_create_cq(newxprt->sc_cm_id->device, sq_comp_handler, cq_event_handler, newxprt, - newxprt->sc_sq_depth, - 0); + &cq_attr); if (IS_ERR(newxprt->sc_sq_cq)) { dprintk("svcrdma: error creating SQ CQ for connect request\n"); goto errout; } + cq_attr.cqe = newxprt->sc_max_requests; newxprt->sc_rq_cq = ib_create_cq(newxprt->sc_cm_id->device, rq_comp_handler, cq_event_handler, newxprt, - newxprt->sc_max_requests, - 0); + &cq_attr); if (IS_ERR(newxprt->sc_rq_cq)) { dprintk("svcrdma: error creating RQ CQ for connect request\n"); goto errout; diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 6f6b8a5..5ac60d4 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -644,6 +644,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia, { struct ib_device_attr *devattr = &ia->ri_devattr; struct ib_cq *sendcq, *recvcq; + struct ib_cq_init_attr cq_attr; int rc, err; /* check provider's send/recv wr limits */ @@ -691,9 +692,10 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia, init_waitqueue_head(&ep->rep_connect_wait); INIT_DELAYED_WORK(&ep->rep_connect_worker, rpcrdma_connect_worker); + memset(&cq_attr, 0, sizeof(cq_attr)); + cq_attr.cqe = ep->rep_attr.cap.max_send_wr + 
1; sendcq = ib_create_cq(ia->ri_id->device, rpcrdma_sendcq_upcall, - rpcrdma_cq_async_error_upcall, ep, - ep->rep_attr.cap.max_send_wr + 1, 0); + rpcrdma_cq_async_error_upcall, ep, &cq_attr); if (IS_ERR(sendcq)) { rc = PTR_ERR(sendcq); dprintk("RPC: %s: failed to create send CQ: %i\n", @@ -708,9 +710,9 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia, goto out2; } + cq_attr.cqe = ep->rep_attr.cap.max_recv_wr + 1; recvcq = ib_create_cq(ia->ri_id->device, rpcrdma_recvcq_upcall, - rpcrdma_cq_async_error_upcall, ep, - ep->rep_attr.cap.max_recv_wr + 1, 0); + rpcrdma_cq_async_error_upcall, ep, &cq_attr); if (IS_ERR(recvcq)) { rc = PTR_ERR(recvcq); dprintk("RPC: %s: failed to create recv CQ: %i\n", -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 64+ messages in thread
* Re: [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr [not found] ` <1433074457-26437-3-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2015-06-01 16:53 ` Jason Gunthorpe [not found] ` <20150601165335.GB14391-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread From: Jason Gunthorpe @ 2015-06-01 16:53 UTC (permalink / raw) To: Or Gerlitz Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak On Sun, May 31, 2015 at 03:14:10PM +0300, Or Gerlitz wrote: > + struct ib_cq_init_attr cq_attr; > > /* Create new device info */ > port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL); > @@ -2943,9 +2944,11 @@ static int ib_mad_port_open(struct ib_device *device, > if (has_smi) > cq_size *= 2; > > + memset(&cq_attr, 0, sizeof(cq_attr)); > + cq_attr.cqe = cq_size; Why does this patch switch to using memset when the prior patch used = {} ? > @@ -1075,12 +1075,11 @@ EXPORT_SYMBOL(ib_destroy_qp); > struct ib_cq *ib_create_cq(struct ib_device *device, > ib_comp_handler comp_handler, > void (*event_handler)(struct ib_event *, void *), > - void *cq_context, int cqe, int comp_vector) > + void *cq_context, struct ib_cq_init_attr *cq_attr) > { > struct ib_cq *cq; > - struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector}; > > - cq = device->create_cq(device, &attr, NULL, NULL); > + cq = device->create_cq(device, cq_attr, NULL, NULL); How does this compile without warnings? The prior patch did: - struct ib_cq * (*create_cq)(struct ib_device *device, int cqe, - int comp_vector, + struct ib_cq * (*create_cq)(struct ib_device *device, + const struct ib_cq_init_attr *attr, struct ib_ucontext *context, struct ib_udata *udata); Otherwise looks OK. 
Jason
* Re: [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr [not found] ` <20150601165335.GB14391-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-06-02 7:29 ` Matan Barak [not found] ` <CAAKD3BAyNpEKwM7xu=LzFo2F=qP=EN9ZteL2yyAgBVuPNmvtqA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread From: Matan Barak @ 2015-06-02 7:29 UTC (permalink / raw) To: Jason Gunthorpe Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak On Mon, Jun 1, 2015 at 7:53 PM, Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote: > On Sun, May 31, 2015 at 03:14:10PM +0300, Or Gerlitz wrote: > >> + struct ib_cq_init_attr cq_attr; >> >> /* Create new device info */ >> port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL); >> @@ -2943,9 +2944,11 @@ static int ib_mad_port_open(struct ib_device *device, >> if (has_smi) >> cq_size *= 2; >> >> + memset(&cq_attr, 0, sizeof(cq_attr)); >> + cq_attr.cqe = cq_size; > > Why does this patch switch to using memset when the prior patch used > = {} ? > Why does it matter? Both are valid approaches, aren't they? >> @@ -1075,12 +1075,11 @@ EXPORT_SYMBOL(ib_destroy_qp); >> struct ib_cq *ib_create_cq(struct ib_device *device, >> ib_comp_handler comp_handler, >> void (*event_handler)(struct ib_event *, void *), >> - void *cq_context, int cqe, int comp_vector) >> + void *cq_context, struct ib_cq_init_attr *cq_attr) >> { >> struct ib_cq *cq; >> - struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector}; >> >> - cq = device->create_cq(device, &attr, NULL, NULL); >> + cq = device->create_cq(device, cq_attr, NULL, NULL); > > How does this compile without warnings? > Do you mean that there's a missing const here? It doesn't (and shouldn't) cause warnings. However, ib_create_cq could be changed that it'll get const cq_attr as well. 
> The prior patch did: > > - struct ib_cq * (*create_cq)(struct ib_device *device, int cqe, > - int comp_vector, > + struct ib_cq * (*create_cq)(struct ib_device *device, > + const struct ib_cq_init_attr *attr, > struct ib_ucontext *context, > struct ib_udata *udata); > > Otherwise looks OK. > > Jason Thanks for the review. Matan > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr [not found] ` <CAAKD3BAyNpEKwM7xu=LzFo2F=qP=EN9ZteL2yyAgBVuPNmvtqA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2015-06-02 17:27 ` Jason Gunthorpe [not found] ` <20150602172711.GB17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread From: Jason Gunthorpe @ 2015-06-02 17:27 UTC (permalink / raw) To: Matan Barak Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak On Tue, Jun 02, 2015 at 10:29:14AM +0300, Matan Barak wrote: > On Mon, Jun 1, 2015 at 7:53 PM, Jason Gunthorpe > <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote: > > On Sun, May 31, 2015 at 03:14:10PM +0300, Or Gerlitz wrote: > > > >> + struct ib_cq_init_attr cq_attr; > >> > >> /* Create new device info */ > >> port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL); > >> @@ -2943,9 +2944,11 @@ static int ib_mad_port_open(struct ib_device *device, > >> if (has_smi) > >> cq_size *= 2; > >> > >> + memset(&cq_attr, 0, sizeof(cq_attr)); > >> + cq_attr.cqe = cq_size; > > > > Why does this patch switch to using memset when the prior patch used > > = {} ? > > > > Why does it matter? Both are valid approaches, aren't they? Sure, but why mix and match techniques in the same code base? Is there a reason? > >> @@ -1075,12 +1075,11 @@ EXPORT_SYMBOL(ib_destroy_qp); > >> struct ib_cq *ib_create_cq(struct ib_device *device, > >> ib_comp_handler comp_handler, > >> void (*event_handler)(struct ib_event *, void *), > >> - void *cq_context, int cqe, int comp_vector) > >> + void *cq_context, struct ib_cq_init_attr *cq_attr) > >> { > >> struct ib_cq *cq; > >> - struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector}; > >> > >> - cq = device->create_cq(device, &attr, NULL, NULL); > >> + cq = device->create_cq(device, cq_attr, NULL, NULL); > > > > How does this compile without warnings? 
> > > > Do you mean that there's a missing const here? It doesn't (and shouldn't) > cause warnings. Warnings will be emitted when you compile with -Wcast-qual (make W=3). Some of the static tools may warn/error on implicitly casting away const, I'm not sure. It is broadly undesirable to implicitly cast away const, even if the Kernel warning defaults don't produce the message. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr [not found] ` <20150602172711.GB17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-06-03 9:04 ` Matan Barak [not found] ` <CAAKD3BA7ZD4Gx5c16Ak+0_64rzazw+7n4frkuKHyrvdT=8ac6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread From: Matan Barak @ 2015-06-03 9:04 UTC (permalink / raw) To: Jason Gunthorpe Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak On Tue, Jun 2, 2015 at 8:27 PM, Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote: > On Tue, Jun 02, 2015 at 10:29:14AM +0300, Matan Barak wrote: >> On Mon, Jun 1, 2015 at 7:53 PM, Jason Gunthorpe >> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote: >> > On Sun, May 31, 2015 at 03:14:10PM +0300, Or Gerlitz wrote: >> > >> >> + struct ib_cq_init_attr cq_attr; >> >> >> >> /* Create new device info */ >> >> port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL); >> >> @@ -2943,9 +2944,11 @@ static int ib_mad_port_open(struct ib_device *device, >> >> if (has_smi) >> >> cq_size *= 2; >> >> >> >> + memset(&cq_attr, 0, sizeof(cq_attr)); >> >> + cq_attr.cqe = cq_size; >> > >> > Why does this patch switch to using memset when the prior patch used >> > = {} ? >> > >> >> Why does it matter? Both are valid approaches, aren't they? > > Sure, but why mix and match techniques in the same code base? Is there > a reason? > No reason. We'll change that to {} style. 
>> >> @@ -1075,12 +1075,11 @@ EXPORT_SYMBOL(ib_destroy_qp); >> >> struct ib_cq *ib_create_cq(struct ib_device *device, >> >> ib_comp_handler comp_handler, >> >> void (*event_handler)(struct ib_event *, void *), >> >> - void *cq_context, int cqe, int comp_vector) >> >> + void *cq_context, struct ib_cq_init_attr *cq_attr) >> >> { >> >> struct ib_cq *cq; >> >> - struct ib_cq_init_attr attr = {.cqe = cqe, .comp_vector = comp_vector}; >> >> >> >> - cq = device->create_cq(device, &attr, NULL, NULL); >> >> + cq = device->create_cq(device, cq_attr, NULL, NULL); >> > >> > How does this compile without warnings? >> > >> >> Do you mean that there's a missing const here? It doesn't (and shouldn't) >> cause warnings. > > Warnings will be emitted when you compile with -Wcast-qual (make W=3). > > Some of the static tools may warn/error on implicitly casting away > const, I'm not sure. > > It is broadly undesirable to implicitly cast away const, even if the > Kernel warning defaults don't produce the message. > Why is it casting away the const? ib_create_cq gets a "struct ib_cq_init_attr *" and passes it to device->create_cq which gets a "const struct ib_cq_init_attr *", so I'm adding a const, which is perfectly fine. Anyway, we'll add const to ib_create_cq as well and run make W=3 to verify. > Jason Thanks for your comments. Matan -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
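The initialization question settled in this sub-thread (memset versus an initializer) can be illustrated with a small userspace sketch. It is not kernel code: `struct cq_init_attr` is a stand-in that assumes the three fields the series gives `struct ib_cq_init_attr`, and both helper names are hypothetical. The point is only that both styles leave every member that is not explicitly set equal to zero, so the choice is one of consistency, as Jason notes.

```c
#include <string.h>

/* Userspace stand-in for the kernel's struct ib_cq_init_attr (assumed layout). */
struct cq_init_attr {
	unsigned int cqe;
	int comp_vector;
	unsigned int flags;
};

/* Style used by this patch: clear the struct, then assign the fields. */
static struct cq_init_attr attr_with_memset(unsigned int cqe, int comp_vector)
{
	struct cq_init_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.cqe = cqe;
	attr.comp_vector = comp_vector;
	return attr;
}

/* Initializer style: members that are not named are zero-initialized. */
static struct cq_init_attr attr_with_initializer(unsigned int cqe, int comp_vector)
{
	struct cq_init_attr attr = { .cqe = cqe, .comp_vector = comp_vector };

	return attr;
}
```

Either helper returns a structure whose `flags` member is 0, which is what callers that do not ask for timestamping rely on.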
* Re: [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr
@ 2015-06-03 16:02 Jason Gunthorpe
From: Jason Gunthorpe @ 2015-06-03 16:02 UTC
To: Matan Barak
Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak

On Wed, Jun 03, 2015 at 12:04:23PM +0300, Matan Barak wrote:

> Why is it casting away the const?
> ib_create_cq gets a "struct ib_cq_init_attr *" and passes it to
> device->create_cq which gets a "const struct ib_cq_init_attr *",
> so I'm adding a const, which is perfectly fine.

Right, never mind, I read the patch upside down..

Jason
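The conclusion of the const discussion can be checked with a minimal userspace sketch. The names below (`struct cq_init_attr`, `provider_create`, `core_create`) are hypothetical stand-ins for `struct ib_cq_init_attr`, the provider's `create_cq` callback and `ib_create_cq`. Passing a pointer to non-const data where a pointer to const is expected only adds a qualifier, so no cast is involved and `-Wcast-qual` has nothing to report.

```c
/* Userspace stand-in for struct ib_cq_init_attr (assumed, reduced layout). */
struct cq_init_attr {
	unsigned int cqe;
	int comp_vector;
};

/* Provider-style callback: promises not to modify the attributes. */
static unsigned int provider_create(const struct cq_init_attr *attr)
{
	return attr->cqe;
}

/* Core-style wrapper: takes a non-const pointer and forwards it. */
static unsigned int core_create(struct cq_init_attr *attr)
{
	/* "T *" converts implicitly to "const T *"; a qualifier is added, not dropped. */
	return provider_create(attr);
}
```

The opposite direction, handing a `const struct cq_init_attr *` to a function that takes a non-const pointer, is the case that discards a qualifier.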
* [PATCH for-next V2 3/9] IB/core: Add CQ creation time-stamping flag [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-05-31 12:14 ` [PATCH for-next V2 1/9] IB/core: Change provider's API of create_cq to be extendible Or Gerlitz 2015-05-31 12:14 ` [PATCH for-next V2 2/9] IB/core: Change ib_create_cq to use struct ib_cq_init_attr Or Gerlitz @ 2015-05-31 12:14 ` Or Gerlitz 2015-05-31 12:14 ` [PATCH for-next V2 4/9] IB/core: Extend ib_uverbs_create_cq Or Gerlitz ` (7 subsequent siblings) 10 siblings, 0 replies; 64+ messages in thread From: Or Gerlitz @ 2015-05-31 12:14 UTC (permalink / raw) To: Doug Ledford Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak, Or Gerlitz From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> Add CQ creation flag which dictates that the created CQ will report completion time-stamp value in the WC. Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- include/rdma/ib_verbs.h | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 321a0b3..62e88e1 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -173,6 +173,10 @@ struct ib_odp_caps { } per_transport_caps; }; +enum ib_cq_creation_flags { + IB_CQ_FLAGS_TIMESTAMP_COMPLETION = 1 << 0, +}; + struct ib_cq_init_attr { unsigned int cqe; int comp_vector; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 64+ messages in thread
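As an illustration of how the new flag might be consumed, here is a userspace sketch. Only the flag's bit value (`1 << 0`) comes from the patch above. The structure, the `CQ_SUPPORTED_FLAGS` mask and `check_cq_flags()` are hypothetical, and rejecting unknown flag bits with `-EINVAL` is an assumption about reasonable provider behaviour, not something this patch implements.

```c
#include <errno.h>

/* Mirrors the enum added by the patch, under a userspace name. */
enum cq_creation_flags {
	CQ_FLAGS_TIMESTAMP_COMPLETION = 1 << 0,
};

/* Userspace stand-in for struct ib_cq_init_attr (assumed layout). */
struct cq_init_attr {
	unsigned int cqe;
	int comp_vector;
	unsigned int flags;
};

/* Hypothetical: the set of creation flags a provider understands. */
#define CQ_SUPPORTED_FLAGS ((unsigned int)CQ_FLAGS_TIMESTAMP_COMPLETION)

/*
 * Returns 0 and reports whether completion timestamps were requested,
 * or -EINVAL if the caller set a flag bit this provider does not know.
 */
static int check_cq_flags(const struct cq_init_attr *attr, int *want_timestamp)
{
	if (attr->flags & ~CQ_SUPPORTED_FLAGS)
		return -EINVAL;

	*want_timestamp = (attr->flags & CQ_FLAGS_TIMESTAMP_COMPLETION) != 0;
	return 0;
}
```

A caller that wants timestamped completions would set `attr.flags = CQ_FLAGS_TIMESTAMP_COMPLETION` before creating the CQ; callers that zero the structure get the old behaviour.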
* [PATCH for-next V2 4/9] IB/core: Extend ib_uverbs_create_cq [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (2 preceding siblings ...) 2015-05-31 12:14 ` [PATCH for-next V2 3/9] IB/core: Add CQ creation time-stamping flag Or Gerlitz @ 2015-05-31 12:14 ` Or Gerlitz 2015-05-31 12:14 ` [PATCH for-next V2 5/9] IB/core: Add timestamp_mask and hca_core_clock to query_device Or Gerlitz ` (6 subsequent siblings) 10 siblings, 0 replies; 64+ messages in thread From: Or Gerlitz @ 2015-05-31 12:14 UTC (permalink / raw) To: Doug Ledford Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak, Or Gerlitz From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ib_uverbs_ex_create_cq follows the extension verbs mechanism. New features (for example, the CQ creation flags field which is added in a downstream patch) can be used via user-space libraries without breaking the ABI. Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/core/uverbs.h | 1 + drivers/infiniband/core/uverbs_cmd.c | 170 ++++++++++++++++++++++++++------- drivers/infiniband/core/uverbs_main.c | 1 + include/uapi/rdma/ib_user_verbs.h | 17 ++++ 4 files changed, 154 insertions(+), 35 deletions(-) diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h index b716b08..ba365b6 100644 --- a/drivers/infiniband/core/uverbs.h +++ b/drivers/infiniband/core/uverbs.h @@ -259,5 +259,6 @@ IB_UVERBS_DECLARE_CMD(close_xrcd); IB_UVERBS_DECLARE_EX_CMD(create_flow); IB_UVERBS_DECLARE_EX_CMD(destroy_flow); IB_UVERBS_DECLARE_EX_CMD(query_device); +IB_UVERBS_DECLARE_EX_CMD(create_cq); #endif /* UVERBS_H */ diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 1954ebb..51311b1 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ 
-1330,41 +1330,37 @@ ssize_t ib_uverbs_create_comp_channel(struct ib_uverbs_file *file, return in_len; } -ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file, - const char __user *buf, int in_len, - int out_len) +static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file, + struct ib_udata *ucore, + struct ib_udata *uhw, + struct ib_uverbs_ex_create_cq *cmd, + size_t cmd_sz, + int (*cb)(struct ib_uverbs_file *file, + struct ib_ucq_object *obj, + struct ib_uverbs_ex_create_cq_resp *resp, + struct ib_udata *udata, + void *context), + void *context) { - struct ib_uverbs_create_cq cmd; - struct ib_uverbs_create_cq_resp resp; - struct ib_udata udata; struct ib_ucq_object *obj; struct ib_uverbs_event_file *ev_file = NULL; struct ib_cq *cq; int ret; + struct ib_uverbs_ex_create_cq_resp resp; struct ib_cq_init_attr attr = {}; - if (out_len < sizeof resp) - return -ENOSPC; - - if (copy_from_user(&cmd, buf, sizeof cmd)) - return -EFAULT; - - INIT_UDATA(&udata, buf + sizeof cmd, - (unsigned long) cmd.response + sizeof resp, - in_len - sizeof cmd, out_len - sizeof resp); - - if (cmd.comp_vector >= file->device->num_comp_vectors) - return -EINVAL; + if (cmd->comp_vector >= file->device->num_comp_vectors) + return ERR_PTR(-EINVAL); obj = kmalloc(sizeof *obj, GFP_KERNEL); if (!obj) - return -ENOMEM; + return ERR_PTR(-ENOMEM); - init_uobj(&obj->uobject, cmd.user_handle, file->ucontext, &cq_lock_class); + init_uobj(&obj->uobject, cmd->user_handle, file->ucontext, &cq_lock_class); down_write(&obj->uobject.mutex); - if (cmd.comp_channel >= 0) { - ev_file = ib_uverbs_lookup_comp_file(cmd.comp_channel); + if (cmd->comp_channel >= 0) { + ev_file = ib_uverbs_lookup_comp_file(cmd->comp_channel); if (!ev_file) { ret = -EINVAL; goto err; @@ -1377,10 +1373,14 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file, INIT_LIST_HEAD(&obj->comp_list); INIT_LIST_HEAD(&obj->async_list); - attr.cqe = cmd.cqe; - attr.comp_vector = cmd.comp_vector; + attr.cqe = cmd->cqe; + 
attr.comp_vector = cmd->comp_vector; + + if (cmd_sz > offsetof(typeof(*cmd), flags) + sizeof(cmd->flags)) + attr.flags = cmd->flags; + cq = file->device->ib_dev->create_cq(file->device->ib_dev, &attr, - file->ucontext, &udata); + file->ucontext, uhw); if (IS_ERR(cq)) { ret = PTR_ERR(cq); goto err_file; @@ -1399,14 +1399,15 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file, goto err_free; memset(&resp, 0, sizeof resp); - resp.cq_handle = obj->uobject.id; - resp.cqe = cq->cqe; + resp.base.cq_handle = obj->uobject.id; + resp.base.cqe = cq->cqe; - if (copy_to_user((void __user *) (unsigned long) cmd.response, - &resp, sizeof resp)) { - ret = -EFAULT; - goto err_copy; - } + resp.response_length = offsetof(typeof(resp), response_length) + + sizeof(resp.response_length); + + ret = cb(file, obj, &resp, ucore, context); + if (ret) + goto err_cb; mutex_lock(&file->mutex); list_add_tail(&obj->uobject.list, &file->ucontext->cq_list); @@ -1416,9 +1417,9 @@ ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file, up_write(&obj->uobject.mutex); - return in_len; + return obj; -err_copy: +err_cb: idr_remove_uobj(&ib_uverbs_cq_idr, &obj->uobject); err_free: @@ -1430,7 +1431,106 @@ err_file: err: put_uobj_write(&obj->uobject); - return ret; + + return ERR_PTR(ret); +} + +static int ib_uverbs_create_cq_cb(struct ib_uverbs_file *file, + struct ib_ucq_object *obj, + struct ib_uverbs_ex_create_cq_resp *resp, + struct ib_udata *ucore, void *context) +{ + if (ib_copy_to_udata(ucore, &resp->base, sizeof(resp->base))) + return -EFAULT; + + return 0; +} + +ssize_t ib_uverbs_create_cq(struct ib_uverbs_file *file, + const char __user *buf, int in_len, + int out_len) +{ + struct ib_uverbs_create_cq cmd; + struct ib_uverbs_ex_create_cq cmd_ex; + struct ib_uverbs_create_cq_resp resp; + struct ib_udata ucore; + struct ib_udata uhw; + struct ib_ucq_object *obj; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, buf, sizeof(cmd))) + return -EFAULT; + + 
INIT_UDATA(&ucore, buf, cmd.response, sizeof(cmd), sizeof(resp)); + + INIT_UDATA(&uhw, buf + sizeof(cmd), + (unsigned long)cmd.response + sizeof(resp), + in_len - sizeof(cmd), out_len - sizeof(resp)); + + memset(&cmd_ex, 0, sizeof(cmd_ex)); + cmd_ex.user_handle = cmd.user_handle; + cmd_ex.cqe = cmd.cqe; + cmd_ex.comp_vector = cmd.comp_vector; + cmd_ex.comp_channel = cmd.comp_channel; + + obj = create_cq(file, &ucore, &uhw, &cmd_ex, + offsetof(typeof(cmd_ex), comp_channel) + + sizeof(cmd.comp_channel), ib_uverbs_create_cq_cb, + NULL); + + if (IS_ERR(obj)) + return PTR_ERR(obj); + + return in_len; +} + +static int ib_uverbs_ex_create_cq_cb(struct ib_uverbs_file *file, + struct ib_ucq_object *obj, + struct ib_uverbs_ex_create_cq_resp *resp, + struct ib_udata *ucore, void *context) +{ + if (ib_copy_to_udata(ucore, resp, resp->response_length)) + return -EFAULT; + + return 0; +} + +int ib_uverbs_ex_create_cq(struct ib_uverbs_file *file, + struct ib_udata *ucore, + struct ib_udata *uhw) +{ + struct ib_uverbs_ex_create_cq_resp resp; + struct ib_uverbs_ex_create_cq cmd; + struct ib_ucq_object *obj; + int err; + + if (ucore->inlen < sizeof(cmd)) + return -EINVAL; + + err = ib_copy_from_udata(&cmd, ucore, sizeof(cmd)); + if (err) + return err; + + if (cmd.comp_mask) + return -EINVAL; + + if (cmd.reserved) + return -EINVAL; + + if (ucore->outlen < (offsetof(typeof(resp), response_length) + + sizeof(resp.response_length))) + return -ENOSPC; + + obj = create_cq(file, ucore, uhw, &cmd, + min(ucore->inlen, sizeof(cmd)), + ib_uverbs_ex_create_cq_cb, NULL); + + if (IS_ERR(obj)) + return PTR_ERR(obj); + + return 0; } ssize_t ib_uverbs_resize_cq(struct ib_uverbs_file *file, diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index 88cce9b..f6eef2d 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -124,6 +124,7 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file *file, 
[IB_USER_VERBS_EX_CMD_CREATE_FLOW] = ib_uverbs_ex_create_flow, [IB_USER_VERBS_EX_CMD_DESTROY_FLOW] = ib_uverbs_ex_destroy_flow, [IB_USER_VERBS_EX_CMD_QUERY_DEVICE] = ib_uverbs_ex_query_device, + [IB_USER_VERBS_EX_CMD_CREATE_CQ] = ib_uverbs_ex_create_cq, }; static void ib_uverbs_add_one(struct ib_device *device); diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h index b513e66..19f0256 100644 --- a/include/uapi/rdma/ib_user_verbs.h +++ b/include/uapi/rdma/ib_user_verbs.h @@ -91,6 +91,7 @@ enum { enum { IB_USER_VERBS_EX_CMD_QUERY_DEVICE = IB_USER_VERBS_CMD_QUERY_DEVICE, + IB_USER_VERBS_EX_CMD_CREATE_CQ = IB_USER_VERBS_CMD_CREATE_CQ, IB_USER_VERBS_EX_CMD_CREATE_FLOW = IB_USER_VERBS_CMD_THRESHOLD, IB_USER_VERBS_EX_CMD_DESTROY_FLOW, }; @@ -353,11 +354,27 @@ struct ib_uverbs_create_cq { __u64 driver_data[0]; }; +struct ib_uverbs_ex_create_cq { + __u64 user_handle; + __u32 cqe; + __u32 comp_vector; + __s32 comp_channel; + __u32 comp_mask; + __u32 flags; + __u32 reserved; +}; + struct ib_uverbs_create_cq_resp { __u32 cq_handle; __u32 cqe; }; +struct ib_uverbs_ex_create_cq_resp { + struct ib_uverbs_create_cq_resp base; + __u32 comp_mask; + __u32 response_length; +}; + struct ib_uverbs_resize_cq { __u64 response; __u32 cq_handle; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 64+ messages in thread
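The size-based field check that the shared create_cq() helper in this patch relies on can be sketched in userspace as follows. The struct mirrors the field order of `struct ib_uverbs_ex_create_cq` using `<stdint.h>` types; `flags_from_cmd()` is a hypothetical helper, not part of the patch. The sketch uses `>=` for "the caller's buffer covers the whole `flags` field", whereas the patch uses a strict `>`; for the two sizes the patch actually passes (the legacy size ending at `comp_channel`, and the full structure, which has a `reserved` word after `flags`) both comparisons give the same answer.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct ib_uverbs_ex_create_cq (same field order). */
struct ex_create_cq {
	uint64_t user_handle;
	uint32_t cqe;
	uint32_t comp_vector;
	int32_t  comp_channel;
	uint32_t comp_mask;
	uint32_t flags;
	uint32_t reserved;
};

/*
 * Return cmd->flags only if the caller supplied enough bytes to cover
 * that field; otherwise behave as if no flags were given. This is how
 * an older, shorter command can share a code path with the extended one.
 */
static uint32_t flags_from_cmd(const struct ex_create_cq *cmd, size_t cmd_sz)
{
	if (cmd_sz >= offsetof(struct ex_create_cq, flags) + sizeof(cmd->flags))
		return cmd->flags;
	return 0;
}
```

With this layout the legacy path passes 20 bytes (the end of `comp_channel`) and never reads `flags`, while the extended path passes the full 32 bytes and does.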
* [PATCH for-next V2 5/9] IB/core: Add timestamp_mask and hca_core_clock to query_device
From: Or Gerlitz @ 2015-05-31 12:14 UTC
To: Doug Ledford
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

In order to expose timestamps, we need to expose two attributes in
query_device:

timestamp_mask - how many bits are valid in the timestamp. Timestamp
                 values can be at most 64 bits wide.

hca_core_clock - timestamps are given in HW cycles; hca_core_clock is
                 the frequency of the HCA's core clock and is needed
                 in order to convert cycles to seconds.

Both attributes are added to ib_query_device and to its uverbs
counterpart.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/device.c     |    2 ++
 drivers/infiniband/core/uverbs_cmd.c |   14 ++++++++++++++
 include/rdma/ib_verbs.h              |    2 ++
 include/uapi/rdma/ib_user_verbs.h    |    2 ++
 4 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 8d07c12..568cb41 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -539,6 +539,8 @@ EXPORT_SYMBOL(ib_dispatch_event);
 int ib_query_device(struct ib_device *device,
 		    struct ib_device_attr *device_attr)
 {
+	memset(device_attr, 0, sizeof(*device_attr));
+
 	return device->query_device(device, device_attr);
 }
 EXPORT_SYMBOL(ib_query_device);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 51311b1..11ee298 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3426,6 +3426,8 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 	if (ucore->outlen < resp.response_length)
 		return -ENOSPC;

+	memset(&attr, 0, sizeof(attr));
+
 	err = device->query_device(device, &attr);
 	if (err)
 		return err;
@@ -3450,6 +3452,18 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 #endif
 	resp.response_length += sizeof(resp.odp_caps);

+	if (ucore->outlen < resp.response_length + sizeof(resp.timestamp_mask))
+		goto end;
+
+	resp.timestamp_mask = attr.timestamp_mask;
+	resp.response_length += sizeof(resp.timestamp_mask);
+
+	if (ucore->outlen < resp.response_length + sizeof(resp.hca_core_clock))
+		goto end;
+
+	resp.hca_core_clock = attr.hca_core_clock;
+	resp.response_length += sizeof(resp.hca_core_clock);
+
 end:
 	err = ib_copy_to_udata(ucore, &resp, resp.response_length);
 	if (err)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 62e88e1..a45f674 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -227,6 +227,8 @@ struct ib_device_attr {
 	int sig_prot_cap;
 	int sig_guard_cap;
 	struct ib_odp_caps odp_caps;
+	uint64_t timestamp_mask;
+	uint64_t hca_core_clock;
 };

 enum ib_mtu {
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 19f0256..f9b8843 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -223,6 +223,8 @@ struct ib_uverbs_ex_query_device_resp {
 	__u32 comp_mask;
 	__u32 response_length;
 	struct ib_uverbs_odp_caps odp_caps;
+	__u64 timestamp_mask;
+	__u64 hca_core_clock;
 };

 struct ib_uverbs_query_port {
--
1.7.1
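To illustrate how a consumer might combine the two new attributes, here is a minimal user-space sketch of a cycles-to-nanoseconds conversion. It rests on two assumptions that this patch does not state: that hca_core_clock is reported in kHz (the cover letter says dividing a cycle count by hca_core_clock gives milliseconds), and that timestamp_mask is a bitmask of the form 2^n - 1 covering the valid timestamp bits. The helper names ts_delta_cycles() and ts_cycles_to_ns() are hypothetical and are not part of the proposed API.

```c
#include <stdint.h>

/*
 * Cycles elapsed between two raw timestamps, tolerating a single wrap of
 * the hardware counter. Assumes timestamp_mask == 2^n - 1.
 */
static uint64_t ts_delta_cycles(uint64_t start, uint64_t end,
				uint64_t timestamp_mask)
{
	return (end - start) & timestamp_mask;
}

/*
 * Convert a cycle count to nanoseconds, assuming the clock frequency is
 * given in kHz. Returns 0 when the frequency is 0, which is what the
 * device reports when timestamping is unsupported.
 */
static uint64_t ts_cycles_to_ns(uint64_t cycles, uint64_t hca_core_clock_khz)
{
	uint64_t ms, rem;

	if (!hca_core_clock_khz)
		return 0;

	/* cycles / kHz yields whole milliseconds; convert the remainder too. */
	ms = cycles / hca_core_clock_khz;
	rem = cycles % hca_core_clock_khz;

	return ms * 1000000ULL + (rem * 1000000ULL) / hca_core_clock_khz;
}
```

Splitting the division into a millisecond part and a remainder keeps the intermediate product small, so the conversion does not overflow for large cycle counts.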
* [PATCH for-next V2 6/9] IB/core: Pass hardware specific data in query_device
From: Or Gerlitz @ 2015-05-31 12:14 UTC
To: Doug Ledford
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Vendors should be able to pass vendor-specific data to and from
user-space via the query_device uverb. To do this, we need to pass
the vendor-specific udata to the driver's query_device callback.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/device.c             |    4 +++-
 drivers/infiniband/core/uverbs_cmd.c         |    2 +-
 drivers/infiniband/hw/amso1100/c2_provider.c |    7 +++++--
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |    8 ++++++--
 drivers/infiniband/hw/cxgb4/provider.c       |    8 ++++++--
 drivers/infiniband/hw/ehca/ehca_hca.c        |    6 +++++-
 drivers/infiniband/hw/ehca/ehca_iverbs.h     |    3 ++-
 drivers/infiniband/hw/ipath/ipath_verbs.c    |    7 +++++--
 drivers/infiniband/hw/mlx4/main.c            |    6 +++++-
 drivers/infiniband/hw/mlx5/main.c            |    9 +++++++--
 drivers/infiniband/hw/mthca/mthca_provider.c |    7 +++++--
 drivers/infiniband/hw/nes/nes_verbs.c        |    6 +++++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |    6 +++++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  |    3 ++-
 drivers/infiniband/hw/qib/qib_verbs.c        |    6 ++++--
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c |    6 +++++-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h |    3 ++-
 include/rdma/ib_verbs.h                      |    3 ++-
 18 files changed, 75 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 568cb41..694bd66 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -539,9 +539,11 @@ EXPORT_SYMBOL(ib_dispatch_event);
 int ib_query_device(struct ib_device *device,
 		    struct ib_device_attr *device_attr)
 {
+	struct ib_udata uhw = {.outlen = 0, .inlen = 0};
+
 	memset(device_attr, 0, sizeof(*device_attr));

-	return device->query_device(device, device_attr);
+	return device->query_device(device, device_attr, &uhw);
 }
 EXPORT_SYMBOL(ib_query_device);

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 11ee298..bbb02ff 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3428,7 +3428,7 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,

 	memset(&attr, 0, sizeof(attr));

-	err = device->query_device(device, &attr);
+	err = device->query_device(device, &attr, uhw);
 	if (err)
 		return err;

diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index a43e022..382f109 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -63,13 +63,16 @@
 #include "c2_provider.h"
 #include "c2_user.h"

-static int c2_query_device(struct ib_device *ibdev,
-			   struct ib_device_attr *props)
+static int c2_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			   struct ib_udata *uhw)
 {
 	struct c2_dev *c2dev = to_c2dev(ibdev);

 	pr_debug("%s:%u\n", __func__, __LINE__);

+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	*props = c2dev->props;
 	return 0;
 }
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 2eaf7e8..c4b5936 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1150,13 +1150,17 @@ static u64 fw_vers_string_to_u64(struct iwch_dev *iwch_dev)
 	       (fw_mic & 0xffff);
 }

-static int iwch_query_device(struct ib_device *ibdev,
-			     struct ib_device_attr *props)
+static int iwch_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			     struct ib_udata *uhw)
 {

 	struct iwch_dev *dev;
+
 	PDBG("%s ibdev %p\n", __func__, ibdev);

+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	dev = to_iwch_dev(ibdev);
 	memset(props, 0, sizeof *props);
 	memcpy(&props->sys_image_guid, dev->rdev.t3cdev_p->lldev->dev_addr, 6);
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index ef08a9f..05a96a5 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -301,13 +301,17 @@ static int c4iw_query_gid(struct ib_device *ibdev, u8 port, int index,
 	return 0;
 }

-static int c4iw_query_device(struct ib_device *ibdev,
-			     struct ib_device_attr *props)
+static int c4iw_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			     struct ib_udata *uhw)
 {

 	struct c4iw_dev *dev;
+
 	PDBG("%s ibdev %p\n", __func__, ibdev);

+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	dev = to_c4iw_dev(ibdev);
 	memset(props, 0, sizeof *props);
 	memcpy(&props->sys_image_guid, dev->rdev.lldi.ports[0]->dev_addr, 6);
diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c
index 9ed4d25..e8b1bb6 100644
--- a/drivers/infiniband/hw/ehca/ehca_hca.c
+++ b/drivers/infiniband/hw/ehca/ehca_hca.c
@@ -50,7 +50,8 @@ static unsigned int limit_uint(unsigned int value)
 	return min_t(unsigned int, value, INT_MAX);
 }

-int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
+int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+		      struct ib_udata *uhw)
 {
 	int i, ret = 0;
 	struct ehca_shca *shca = container_of(ibdev, struct ehca_shca,
@@ -71,6 +72,9 @@ int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
 		IB_DEVICE_PORT_ACTIVE_EVENT, HCA_CAP_PORT_ACTIVE_EVENT,
 	};

+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	rblock = ehca_alloc_fw_ctrlblock(GFP_KERNEL);
 	if (!rblock) {
 		ehca_err(&shca->ib_device, "Can't allocate rblock memory.");
diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
index 952f2fb..192eef1 100644
--- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
+++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
@@ -44,7 +44,8 @@

 #include "ehca_classes.h"

-int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props);
+int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+		      struct ib_udata *uhw);

 int ehca_query_port(struct ib_device *ibdev, u8 port,
 		    struct ib_port_attr *props);
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 764081d..78af73c 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1495,11 +1495,14 @@ bail:
 	return 0;
 }

-static int ipath_query_device(struct ib_device *ibdev,
-			      struct ib_device_attr *props)
+static int ipath_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			      struct ib_udata *uhw)
 {
 	struct ipath_ibdev *dev = to_idev(ibdev);

+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	memset(props, 0, sizeof(*props));

 	props->device_cap_flags = IB_DEVICE_BAD_PKEY_CNTR |
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 3b90df0..914beae 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -132,7 +132,8 @@ static int num_ib_ports(struct mlx4_dev *dev)
 }

 static int mlx4_ib_query_device(struct ib_device *ibdev,
-				struct ib_device_attr *props)
+				struct ib_device_attr *props,
+				struct ib_udata *uhw)
 {
 	struct mlx4_ib_dev *dev = to_mdev(ibdev);
 	struct ib_smp *in_mad = NULL;
@@ -140,6 +141,9 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	int err = -ENOMEM;
 	int have_ib_ports;

+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index b9976a0..5a8486c 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -63,7 +63,8 @@ static char mlx5_version[] =
 	DRIVER_VERSION " (" DRIVER_RELDATE ")\n";

 static int mlx5_ib_query_device(struct ib_device *ibdev,
-				struct ib_device_attr *props)
+				struct ib_device_attr *props,
+				struct ib_udata *uhw)
 {
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
 	struct ib_smp *in_mad = NULL;
@@ -74,6 +75,9 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	int max_sq_sg;
 	u64 flags;

+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	gen = &dev->mdev->caps.gen;
 	in_mad = kzalloc(sizeof(*in_mad), GFP_KERNEL);
 	out_mad = kmalloc(sizeof(*out_mad), GFP_KERNEL);
@@ -910,6 +914,7 @@ static int get_port_caps(struct mlx5_ib_dev *dev)
 	struct mlx5_general_caps *gen;
 	int err = -ENOMEM;
 	int port;
+	struct ib_udata uhw = {.inlen = 0, .outlen = 0};

 	gen = &dev->mdev->caps.gen;
 	pprops = kmalloc(sizeof(*pprops), GFP_KERNEL);
@@ -920,7 +925,7 @@ static int get_port_caps(struct mlx5_ib_dev *dev)
 	if (!dprops)
 		goto out;

-	err = mlx5_ib_query_device(&dev->ib_dev, dprops);
+	err = mlx5_ib_query_device(&dev->ib_dev, dprops, &uhw);
 	if (err) {
 		mlx5_ib_warn(dev, "query_device failed %d\n", err);
 		goto out;
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index f1b04bd..d6d7c16 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -57,14 +57,17 @@ static void init_query_mad(struct ib_smp *mad)
 	mad->method = IB_MGMT_METHOD_GET;
 }

-static int mthca_query_device(struct ib_device *ibdev,
-			      struct ib_device_attr *props)
+static int mthca_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			      struct ib_udata *uhw)
 {
 	struct ib_smp *in_mad = NULL;
 	struct ib_smp *out_mad = NULL;
 	int err = -ENOMEM;
 	struct mthca_dev *mdev = to_mdev(ibdev);

+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index a9d62c5..0c2fb7a 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -512,12 +512,16 @@ static void nes_free_fast_reg_page_list(struct ib_fast_reg_page_list *pifrpl)
 /**
  * nes_query_device
  */
-static int nes_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
+static int nes_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			    struct ib_udata *uhw)
 {
 	struct nes_vnic *nesvnic = to_nesvnic(ibdev);
 	struct nes_device *nesdev = nesvnic->nesdev;
 	struct nes_ib_device *nesibdev = nesvnic->nesibdev;

+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	memset(props, 0, sizeof(*props));
 	memcpy(&props->sys_image_guid, nesvnic->netdev->dev_addr, 6);

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index f441df6..fce11ec 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -61,10 +61,14 @@ int ocrdma_query_gid(struct ib_device *ibdev, u8 port,
 	return 0;
 }

-int ocrdma_query_device(struct ib_device *ibdev, struct ib_device_attr *attr)
+int ocrdma_query_device(struct ib_device *ibdev, struct ib_device_attr *attr,
+			struct ib_udata *uhw)
 {
 	struct ocrdma_dev *dev = get_ocrdma_dev(ibdev);

+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	memset(attr, 0, sizeof *attr);
 	memcpy(&attr->fw_ver, &dev->attr.fw_ver[0],
 	       min(sizeof(dev->attr.fw_ver), sizeof(attr->fw_ver)));
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index e853195..b15c608 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -36,7 +36,8 @@ int ocrdma_post_recv(struct ib_qp *, struct ib_recv_wr *,
 int ocrdma_poll_cq(struct ib_cq *, int num_entries, struct ib_wc *wc);
 int ocrdma_arm_cq(struct ib_cq *, enum ib_cq_notify_flags flags);

-int ocrdma_query_device(struct ib_device *, struct ib_device_attr *props);
+int ocrdma_query_device(struct ib_device *, struct ib_device_attr *props,
+			struct ib_udata *uhw);
 int ocrdma_query_port(struct ib_device *, u8 port, struct ib_port_attr *props);
 int ocrdma_modify_port(struct ib_device *, u8 port, int mask,
 		       struct ib_port_modify *props);
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index dba1c92..e1dad45 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -1550,12 +1550,14 @@ full:
 	}
 }

-static int qib_query_device(struct ib_device *ibdev,
-			    struct ib_device_attr *props)
+static int qib_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			    struct ib_udata *uhw)
 {
 	struct qib_devdata *dd = dd_from_ibdev(ibdev);
 	struct qib_ibdev *dev = to_idev(ibdev);

+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
 	memset(props, 0, sizeof(*props));

 	props->device_cap_flags = IB_DEVICE_BAD_PKEY_CNTR |
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index 2b3f7e3..7df4382 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -248,7 +248,8 @@ enum rdma_link_layer usnic_ib_port_link_layer(struct ib_device *device,
 }

 int usnic_ib_query_device(struct ib_device *ibdev,
-			  struct ib_device_attr *props)
+			  struct ib_device_attr *props,
+			  struct ib_udata *uhw)
 {
 	struct usnic_ib_dev *us_ibdev = to_usdev(ibdev);
 	union ib_gid gid;
@@ -257,6 +258,9 @@ int usnic_ib_query_device(struct ib_device *ibdev,
 	int qp_per_vf;

 	usnic_dbg("\n");
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+
 	mutex_lock(&us_ibdev->usdev_lock);
 	us_ibdev->netdev->ethtool_ops->get_drvinfo(us_ibdev->netdev, &info);
 	us_ibdev->netdev->ethtool_ops->get_settings(us_ibdev->netdev, &cmd);
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
index 2ab8427..0bd04ef 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
@@ -24,7 +24,8 @@
 enum rdma_link_layer usnic_ib_port_link_layer(struct ib_device *device,
 					      u8 port_num);
 int usnic_ib_query_device(struct ib_device *ibdev,
-			  struct ib_device_attr *props);
+			  struct ib_device_attr *props,
+			  struct ib_udata *uhw);
 int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
 			struct ib_port_attr *props);
 enum rdma_protocol_type
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index a45f674..045b5b2 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1562,7 +1562,8 @@ struct ib_device {
 	int (*get_protocol_stats)(struct ib_device *device,
 				  union rdma_protocol_stats *stats);
 	int (*query_device)(struct ib_device *device,
-			    struct ib_device_attr *device_attr);
+			    struct ib_device_attr *device_attr,
+			    struct ib_udata *udata);
 	int (*query_port)(struct ib_device *device,
 			  u8 port_num,
 			  struct ib_port_attr *port_attr);
--
1.7.1
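The guard that this patch adds to every driver can be exercised in isolation. The following is a minimal user-space sketch under stated assumptions: struct udata_sketch is a hypothetical stand-in for the kernel's struct ib_udata that models only the two length fields the guard inspects, and query_device_sketch() is a hypothetical driver callback with no vendor-specific payload of its own.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for struct ib_udata; only the two length
 * fields checked by the patch are modeled here.
 */
struct udata_sketch {
	size_t inlen;
	size_t outlen;
};

/*
 * A driver that defines no vendor-specific query_device data rejects
 * any caller that supplies or expects such data.
 */
static int query_device_sketch(const struct udata_sketch *uhw)
{
	if (uhw->inlen || uhw->outlen)
		return -EINVAL;

	/* ... fill in the generic device attributes here ... */
	return 0;
}
```

An in-kernel caller such as ib_query_device() passes a zero-length udata and therefore always gets past the guard; only a user-space request carrying vendor data to a driver that does not understand it is refused.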
* Re: [PATCH for-next V2 6/9] IB/core: Pass hardware specific data in query_device
From: Devesh Sharma @ 2015-06-01 9:24 UTC
To: Or Gerlitz
Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak

ocrdma part looks good.

Reviewed-By: Devesh Sharma <devesh.sharma-1wcpHE2jlwO1Z/+hSey0Gg@public.gmane.org>

On Sun, May 31, 2015 at 5:44 PM, Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> Vendors should be able to pass vendor specific data to/from
> user-space via query_device uverb. In order to do this,
> we need to pass the vendors' specific udata.
>
> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
> [...]
* [PATCH for-next V2 7/9] IB/mlx4: Add mmap call to map the hardware clock
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2015-05-31 12:14   ` [PATCH for-next V2 6/9] IB/core: Pass hardware specific data in query_device Or Gerlitz
@ 2015-05-31 12:14   ` Or Gerlitz
  2015-05-31 12:14   ` [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs Or Gerlitz
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:14 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

In order to read the HCA's cycle counter efficiently in user space, we need
to map the HCA's register. This is done through mmap call.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/main.c         |   18 +++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx4/main.c |   19 +++++++++++++++++++
 include/linux/mlx4/device.h               |    9 +++++++++
 3 files changed, 45 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 914beae..3992349 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -716,8 +716,24 @@ static int mlx4_ib_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
 				       dev->dev->caps.num_uars,
 				       PAGE_SIZE, vma->vm_page_prot))
 			return -EAGAIN;
-	} else
+	} else if (vma->vm_pgoff == 3) {
+		struct mlx4_clock_params params;
+		int ret = mlx4_get_internal_clock_params(dev->dev, &params);
+
+		if (ret)
+			return ret;
+
+		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+		if (io_remap_pfn_range(vma, vma->vm_start,
+				       (pci_resource_start(dev->dev->persist->pdev,
+							   params.bar) +
+					params.offset)
+				       >> PAGE_SHIFT,
+				       PAGE_SIZE, vma->vm_page_prot))
+			return -EAGAIN;
+	} else {
 		return -EINVAL;
+	}
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index ced5eca..70de39c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1674,6 +1674,25 @@ static int map_internal_clock(struct mlx4_dev *dev)
 	return 0;
 }
 
+int mlx4_get_internal_clock_params(struct mlx4_dev *dev,
+				   struct mlx4_clock_params *params)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+
+	if (mlx4_is_slave(dev))
+		return -ENOTSUPP;
+
+	if (!params)
+		return -EINVAL;
+
+	params->bar = priv->fw.clock_bar;
+	params->offset = priv->fw.clock_offset;
+	params->size = MLX4_CLOCK_SIZE;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(mlx4_get_internal_clock_params);
+
 static void unmap_internal_clock(struct mlx4_dev *dev)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 83e80ab..f94984f 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -829,6 +829,12 @@ struct mlx4_dev {
 	struct mlx4_vf_dev     *dev_vfs;
 };
 
+struct mlx4_clock_params {
+	u64 offset;
+	u8 bar;
+	u8 size;
+};
+
 struct mlx4_eqe {
 	u8			reserved1;
 	u8			type;
@@ -1485,4 +1491,7 @@ int mlx4_ACCESS_PTYS_REG(struct mlx4_dev *dev,
 			 enum mlx4_access_reg_method method,
 			 struct mlx4_ptys_reg *ptys_reg);
 
+int mlx4_get_internal_clock_params(struct mlx4_dev *dev,
+				   struct mlx4_clock_params *params);
+
 #endif /* MLX4_DEVICE_H */
-- 
1.7.1
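The mmap hunk above maps only the page that contains the clock register, so user space still needs the register's offset inside that page (patch 9/9 reports it as clock_params.offset % PAGE_SIZE). The following is a minimal sketch of that address arithmetic, not driver code. It assumes a 4 KiB page and a page-aligned BAR start; the SKETCH_* macros and the three helper names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. The kernel uses PAGE_SHIFT/PAGE_SIZE. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_SIZE  (1ull << SKETCH_PAGE_SHIFT)

/* Page frame number passed to the remap call: (BAR start + clock offset) >> PAGE_SHIFT. */
static inline uint64_t sketch_clock_pfn(uint64_t bar_start, uint64_t clock_offset)
{
	return (bar_start + clock_offset) >> SKETCH_PAGE_SHIFT;
}

/* Offset of the clock register inside the single mapped page. */
static inline uint64_t sketch_clock_offset_in_page(uint64_t clock_offset)
{
	return clock_offset % SKETCH_PAGE_SIZE;
}

/*
 * Address that a read at (mapped page base + in-page offset) resolves to.
 * With a page-aligned bar_start this equals bar_start + clock_offset.
 */
static inline uint64_t sketch_clock_phys_addr(uint64_t bar_start, uint64_t clock_offset)
{
	return (sketch_clock_pfn(bar_start, clock_offset) << SKETCH_PAGE_SHIFT) +
	       sketch_clock_offset_in_page(clock_offset);
}
```

For example, with a hypothetical BAR start of 0xf0000000 and a clock offset of 0x5a38, the mapped page frame is 0xf0005 and the register sits 0xa38 bytes into the mapping.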
* [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2015-05-31 12:14   ` [PATCH for-next V2 7/9] IB/mlx4: Add mmap call to map the hardware clock Or Gerlitz
@ 2015-05-31 12:14   ` Or Gerlitz
       [not found]     ` <1433074457-26437-9-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-05-31 12:14   ` [PATCH for-next V2 9/9] IB/mlx4: Add support for CQ time-stamping Or Gerlitz
                     ` (2 subsequent siblings)
  10 siblings, 1 reply; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:14 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add support for ib_uverbs_ex_create_cq and ib_uverbs_ex_query_device
by setting the appropriate bit in uverbs_ex_cmd_mask.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/main.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 3992349..832d571 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2323,6 +2323,10 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 			(1ull << IB_USER_VERBS_EX_CMD_DESTROY_FLOW);
 	}
 
+	ibdev->ib_dev.uverbs_ex_cmd_mask |=
+		(1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE) |
+		(1ull << IB_USER_VERBS_EX_CMD_CREATE_CQ);
+
 	mlx4_ib_alloc_eqs(dev, ibdev);
 
 	spin_lock_init(&iboe->lock);
-- 
1.7.1
[parent not found: <1433074457-26437-9-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs
       [not found]   ` <1433074457-26437-9-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-06-01 16:56     ` Jason Gunthorpe
       [not found]       ` <20150601165649.GC14391-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-01 16:56 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak

On Sun, May 31, 2015 at 03:14:16PM +0300, Or Gerlitz wrote:
> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> Add support for ib_uverbs_ex_create_cq and ib_uverbs_ex_query_device
> by setting the appropriate bit in uverbs_ex_cmd_mask.

Why is this a separate patch? Surely the bits should be or'd in the patches
that actually include the code to do the new commands?

Jason
[parent not found: <20150601165649.GC14391-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs [not found] ` <20150601165649.GC14391-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-06-02 7:21 ` Matan Barak [not found] ` <CAAKD3BBsTFYKwOBK+6u04x6deRYzjYESJzbpyy83UtoChZhpOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread From: Matan Barak @ 2015-06-02 7:21 UTC (permalink / raw) To: Jason Gunthorpe Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak On Mon, Jun 1, 2015 at 7:56 PM, Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote: > On Sun, May 31, 2015 at 03:14:16PM +0300, Or Gerlitz wrote: >> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> >> >> Add support for ib_uverbs_ex_create_cq and ib_uverbs_ex_query_device >> by setting the appropriate bit in uverbs_ex_cmd_mask. > > Why is this a seperate patch? Surely the bits should be or'd in the patches > that actually include the code to do the new commands? > Although this patch can stand on its own (requesting unsupported flags will just fail and the comp_mask of uhw's will be disabled), there's no real value for not squashing it into the other mlx4 patches. We'll do that for the next version. > Jason Thanks for the review. Matan > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
[parent not found: <CAAKD3BBsTFYKwOBK+6u04x6deRYzjYESJzbpyy83UtoChZhpOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs [not found] ` <CAAKD3BBsTFYKwOBK+6u04x6deRYzjYESJzbpyy83UtoChZhpOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2015-06-02 17:07 ` Jason Gunthorpe [not found] ` <20150602170736.GA17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread From: Jason Gunthorpe @ 2015-06-02 17:07 UTC (permalink / raw) To: Matan Barak Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak On Tue, Jun 02, 2015 at 10:21:47AM +0300, Matan Barak wrote: > On Mon, Jun 1, 2015 at 7:56 PM, Jason Gunthorpe > <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote: > > On Sun, May 31, 2015 at 03:14:16PM +0300, Or Gerlitz wrote: > >> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> > >> > >> Add support for ib_uverbs_ex_create_cq and ib_uverbs_ex_query_device > >> by setting the appropriate bit in uverbs_ex_cmd_mask. > > > > Why is this a seperate patch? Surely the bits should be or'd in the patches > > that actually include the code to do the new commands? > > > > Although this patch can stand on its own (requesting unsupported flags > will just fail and the > comp_mask of uhw's will be disabled), there's no real value for not > squashing it into the other mlx4 patches. Then there is something wrong with this 'uverbs_ex_cmd_mask' stuff - the bit should be clear if the driver cannot handle the ex inputs, but these patches make all drivers handle the ex style (by checking flags), so all drivers should have some of the bits set.. An extended command that doesn't use any extended features should transparently degrade to the normal command as often as possible. 
Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
[parent not found: <20150602170736.GA17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs
       [not found]   ` <20150602170736.GA17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-03  8:57     ` Matan Barak
       [not found]       ` <CAAKD3BARPocK1zZ2nYV0xWHqEASJTYbp1aQxwymi-TYTyK_vGA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Matan Barak @ 2015-06-03 8:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak

On Tue, Jun 2, 2015 at 8:07 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Tue, Jun 02, 2015 at 10:21:47AM +0300, Matan Barak wrote:
>> On Mon, Jun 1, 2015 at 7:56 PM, Jason Gunthorpe
>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>> > On Sun, May 31, 2015 at 03:14:16PM +0300, Or Gerlitz wrote:
>> >> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> >>
>> >> Add support for ib_uverbs_ex_create_cq and ib_uverbs_ex_query_device
>> >> by setting the appropriate bit in uverbs_ex_cmd_mask.
>> >
>> > Why is this a separate patch? Surely the bits should be or'd in the patches
>> > that actually include the code to do the new commands?
>> >
>>
>> Although this patch can stand on its own (requesting unsupported flags
>> will just fail and the
>> comp_mask of uhw's will be disabled), there's no real value for not
>> squashing it into the other mlx4 patches.
>
> Then there is something wrong with this 'uverbs_ex_cmd_mask' stuff -
> the bit should be clear if the driver cannot handle the ex inputs, but
> these patches make all drivers handle the ex style (by checking
> flags), so all drivers should have some of the bits set..
>

That's a general comment regarding the extension mechanism.
Since by nature the extended verbs are extendible, one consumer could
support A and B while the other only supports A, but they both indicate
that they support this extension verb.
You could argue that if the "flags" field wasn't tested, we would have
needed this uverbs_ex_cmd_mask, but because it could also be used by
kernel consumers, this check is necessary.

> An extended command that doesn't use any extended features should
> transparently degrade to the normal command as often as possible.
>

That means that uverbs_ex_cmd_mask should only be used on extended commands that
are user-space specific.
Anyway, we could add these IB_USER_VERBS_EX_CMD flags to all vendors,
but IMHO this general problem doesn't relate to this series, which is
only about adding timestamp support.

> Jason

Thanks for your comments.

Matan
[parent not found: <CAAKD3BARPocK1zZ2nYV0xWHqEASJTYbp1aQxwymi-TYTyK_vGA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs [not found] ` <CAAKD3BARPocK1zZ2nYV0xWHqEASJTYbp1aQxwymi-TYTyK_vGA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2015-06-03 16:31 ` Jason Gunthorpe [not found] ` <20150603163146.GD12073-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread From: Jason Gunthorpe @ 2015-06-03 16:31 UTC (permalink / raw) To: Matan Barak Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak On Wed, Jun 03, 2015 at 11:57:12AM +0300, Matan Barak wrote: > That's a general comment regarding the extension mechanism. Yes, but it is also a specific comment about patch #4 which adds, ib_uverbs_ex_create_cq. Based on the implementation of create_cq, it is pretty clear that every driver supports ib_uverbs_ex_create_cq, so patch #4 should just force the flag in the device register function so it is globally enabled. query_device looks like it is the same, passing in the original structure length will always work on any device. So Mellanox should send a bugfix patch for that as well, unrelated to this series. > > An extended command that doesn't use any extended features should > > transparently degrade to the normal command as often as possible. > > That means that uverbs_ex_cmd_mask should only be used on extended commands that > are user-space specific. The bit should only be clear on commands that can never return anything but ENOSYS. Ie the kernel has no support for the command at all, or for some reason the driver cannot handle the call. The latter case should be rare, it would be the case if we added a new command that was not extending an existing command, for instance. 
Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
[parent not found: <20150603163146.GD12073-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs [not found] ` <20150603163146.GD12073-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-06-03 18:58 ` Or Gerlitz [not found] ` <CAJ3xEMhGdX-35WAjauKZE7+Wg39kbXP35UzYX33J6OB8g_9rYA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread From: Or Gerlitz @ 2015-06-03 18:58 UTC (permalink / raw) To: Jason Gunthorpe Cc: Matan Barak, Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak On Wed, Jun 3, 2015 at 7:31 PM, Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote: > On Wed, Jun 03, 2015 at 11:57:12AM +0300, Matan Barak wrote: >> That's a general comment regarding the extension mechanism. > > Yes, but it is also a specific comment about patch #4 which adds, > ib_uverbs_ex_create_cq. > > Based on the implementation of create_cq, it is pretty clear that > every driver supports ib_uverbs_ex_create_cq, so patch #4 should just > force the flag in the device register function so it is globally enabled. But the other drivers currently do not support any CQ creation flag and hence no extended functionality, I don't see the point signaling towards user-space that the verb is supported, please elaborate. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
[parent not found: <CAJ3xEMhGdX-35WAjauKZE7+Wg39kbXP35UzYX33J6OB8g_9rYA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs [not found] ` <CAJ3xEMhGdX-35WAjauKZE7+Wg39kbXP35UzYX33J6OB8g_9rYA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2015-06-03 19:16 ` Jason Gunthorpe [not found] ` <20150603191609.GB7902-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread From: Jason Gunthorpe @ 2015-06-03 19:16 UTC (permalink / raw) To: Or Gerlitz Cc: Matan Barak, Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak On Wed, Jun 03, 2015 at 09:58:25PM +0300, Or Gerlitz wrote: > On Wed, Jun 3, 2015 at 7:31 PM, Jason Gunthorpe > <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote: > > On Wed, Jun 03, 2015 at 11:57:12AM +0300, Matan Barak wrote: > >> That's a general comment regarding the extension mechanism. > > > > Yes, but it is also a specific comment about patch #4 which adds, > > ib_uverbs_ex_create_cq. > > > > Based on the implementation of create_cq, it is pretty clear that > > every driver supports ib_uverbs_ex_create_cq, so patch #4 should just > > force the flag in the device register function so it is globally enabled. > > But the other drivers currently do not support any CQ creation flag > and hence no extended functionality, I don't see the point signaling > towards user-space that the verb is supported, please elaborate. They support the base functionality, the flags = 0 case. There is no reason to block access to the base functionality via the extended api. That just creates hassles for userspace. If userspace detects the extended API is present, it can just switch unconditionally all usage to that API. This is how most new kernel syscalls are introduced (glibc does this transparently). Detecting what flags a driver supports (if any) is any entirely different and orthogonal issue to introducing comp_mask/etc. 
Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
[parent not found: <20150603191609.GB7902-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs
       [not found]   ` <20150603191609.GB7902-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-03 19:35     ` Or Gerlitz
       [not found]       ` <CAJ3xEMiPkxf_Qu40AozMRWZaY-dAfEyAOZGyT9VYsNpQO1Q-Zw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Or Gerlitz @ 2015-06-03 19:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak

On Wed, Jun 3, 2015 at 10:16 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Wed, Jun 03, 2015 at 09:58:25PM +0300, Or Gerlitz wrote:
>> On Wed, Jun 3, 2015 at 7:31 PM, Jason Gunthorpe
>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>> > On Wed, Jun 03, 2015 at 11:57:12AM +0300, Matan Barak wrote:
>> >> That's a general comment regarding the extension mechanism.
>> >
>> > Yes, but it is also a specific comment about patch #4 which adds,
>> > ib_uverbs_ex_create_cq.
>> >
>> > Based on the implementation of create_cq, it is pretty clear that
>> > every driver supports ib_uverbs_ex_create_cq, so patch #4 should just
>> > force the flag in the device register function so it is globally enabled.

>> But the other drivers currently do not support any CQ creation flag
>> and hence no extended functionality, I don't see the point signaling
>> towards user-space that the verb is supported, please elaborate.

> They support the base functionality, the flags = 0 case.

which doesn't let consumers use any new functionality.

> There is no reason to block access to the base functionality via the
> extended api. That just creates hassles for userspace.

> If userspace detects the extended API is present, it can just
> switch unconditionally all usage to that API.

This is a user-space run-time story; they don't have the knowledge that
all the LL drivers support the extended API for CQ creation. We had to
check the flag in all LL drivers since the in-kernel IB stack has no
(and need not have any) notion of extended calls.

> This is how most new kernel syscalls are introduced (glibc
> does this transparently).

That's an interesting comment. And you know what, basically we can add
auto support for that call in uverbs.

But the point here is a bit different: I somehow have the feeling that
unless ~each and every one of your review comments are accepted to the
letter, no inclusion.

You are not the maintainer here, and even maintainers prefer not to
force each of their detailed comments on submitters. This specific
comment relates to a TINY in-kernel thing that can be changed later.

If from ten comments you give me I accept as is five, with the other
five I am trying to argue, on two of them we agree to my side, on two
we go your side and on the last one we let the maintainer cut, this
is a healthy process that makes sense. Currently it feels like either
accepting 98% of the comments you give or no acceptance.

> Detecting what flags a driver supports (if any) is any entirely
> different and orthogonal issue to introducing comp_mask/etc.

I didn't say that detecting which flags are supported relates to
exposing that extended uverbs call.

I don't understand the "is any entirely different" part of the
sentence, is that as of me being EMS-er?
[parent not found: <CAJ3xEMiPkxf_Qu40AozMRWZaY-dAfEyAOZGyT9VYsNpQO1Q-Zw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs
       [not found]   ` <CAJ3xEMiPkxf_Qu40AozMRWZaY-dAfEyAOZGyT9VYsNpQO1Q-Zw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-06-03 20:38     ` Jason Gunthorpe
  0 siblings, 0 replies; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-03 20:38 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Matan Barak, Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak

On Wed, Jun 03, 2015 at 10:35:03PM +0300, Or Gerlitz wrote:
> > They support the base functionality, the flags = 0 case.
>
> which doesn't let consumers use any new functionality.

So what? A call with flags = 0 works, why return ENOSYS for all
drivers except mlx4 in that case? It doesn't make sense to be
asymmetric like that.

Again, the extension process (patch #4) was to introduce the flags, as
long as the flags are processed properly then the syscall is functional
and should not return ENOSYS. It does not matter which flags, if any,
are supported.

> But the point here is a bit different: I somehow have the feeling that
> unless ~each and every one of your review comments are accepted to the
> letter, no inclusion.

I am just reviewing, Doug will have to decide if discussion is done or
not.

To be clear: 'no inclusion' from me would be a clear NAK statement.

If I'm going to provide my Reviewed-By I want to see:
 1) Comments addressed via a code change
 2) Comments addressed via a persuasive technical argument
 3) Comments addressed as 'too much work'/'un-important'/'personal
    preference'/etc.
 4) Comments addressed because I am wrong

And try to be clear about it, explain clearly.

> You are not the maintainer here, and even maintainers prefer not to
> force each of their detailed comments on submitters.

This isn't a detailed comment, this is a significant point about how a
UAPI is expected to work. And yes, UAPI is important, details are
important and I will argue for my viewpoint.

There is a huge difference between doing work on your own drivers and
doing core work. I do not know many cases where a maintainer/reviewer
of core sections will let details slide. There is a high expectation
for core code, and a very high expectation for UAPI.

> This specific comment relates to a TINY in-kernel thing that can be
> changed later.

Where is the pride? Do it right!

> If from ten comments you give me I accept as is five, with the other
> five I am trying to argue, on two of them we agree to my side, on two
> we go your side and on the last one we let the maintainer cut, this
> is a healthy process that makes sense.

Sure, but you have to make a persuasive technical argument.. You can't
just argue..

In this case, you completely skipped over my main point:
  Drivers that only support flags == 0 should not return ENOSYS.

I gave several reasons why I think this is important, and how
userspace can use this, and how it is normal in the kernel.

You responded to the reasons, but ignored the actual thesis, and
didn't provide any counter reasons to support your idea:
  Drivers that only support flags == 0 should return ENOSYS.

So we are not debating, we are just arguing, and it isn't productive.

> I don't understand the "is any entirely different" part of the
> sentence, is that as of me being EMS-er?

No, that is just me typoing 'an -> any'. Sorry

Jason
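The position argued above (a driver that supports no creation flags should still accept the extended call when flags == 0) can be illustrated with a small stand-alone sketch. This is not code from the series: struct sketch_driver, sketch_create_cq() and SKETCH_CQ_FLAG_TIMESTAMP are hypothetical names, and the mask test only mirrors the attr->flags & ~CQ_CREATE_FLAGS_SUPPORTED check that patch 9/9 adds to mlx4.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for IB_CQ_FLAGS_TIMESTAMP_COMPLETION. */
#define SKETCH_CQ_FLAG_TIMESTAMP (1u << 0)

/* Hypothetical per-driver description: which creation flags it understands. */
struct sketch_driver {
	uint32_t supported_cq_flags;
};

/*
 * Extended-style entry point: any flag bit the driver does not understand
 * is rejected with -EINVAL, while flags == 0 always passes, so the call
 * degrades to the base behaviour on every driver.
 */
static inline int sketch_create_cq(const struct sketch_driver *drv, uint32_t flags)
{
	if (flags & ~drv->supported_cq_flags)
		return -EINVAL;

	return 0; /* the CQ would be created here */
}
```

Under this scheme, whether the entry point exists and which flags a given driver accepts remain two separate questions, which is the distinction drawn in the message above.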
* [PATCH for-next V2 9/9] IB/mlx4: Add support for CQ time-stamping
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (7 preceding siblings ...)
  2015-05-31 12:14   ` [PATCH for-next V2 8/9] IB/mlx4: Support extended create_cq and query_device uverbs Or Gerlitz
@ 2015-05-31 12:14   ` Or Gerlitz
  2015-05-31 12:31   ` [PATCH for-next V2 0/9] Add completion timestamping support Or Gerlitz
  2015-05-31 19:00   ` Doug Ledford
  10 siblings, 0 replies; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:14 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon, Matan Barak, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

This includes:

 * support allocation of CQ with the TIMESTAMP_COMPLETION creation flag.

 * add timestamp_mask and hca_core_clock to query_device, reporting the
   number of supported timestamp bits (mask) and the hca_core_clock frequency.

 * return hca core clock's offset in query_device vendor's data,
   this is needed in order to read the HCA's core clock.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/cq.c      |    9 +++++--
 drivers/infiniband/hw/mlx4/main.c    |   38 ++++++++++++++++++++++++++++++++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   16 ++++++++++++++
 3 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 8e44aaa..36eb3d0 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -166,6 +166,7 @@ err_buf:
 	return err;
 }
 
+#define CQ_CREATE_FLAGS_SUPPORTED IB_CQ_FLAGS_TIMESTAMP_COMPLETION
 struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
 				const struct ib_cq_init_attr *attr,
 				struct ib_ucontext *context,
@@ -178,10 +179,10 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
 	struct mlx4_uar *uar;
 	int err;
 
-	if (attr->flags)
+	if (entries < 1 || entries > dev->dev->caps.max_cqes)
 		return ERR_PTR(-EINVAL);
 
-	if (entries < 1 || entries > dev->dev->caps.max_cqes)
+	if (attr->flags & ~CQ_CREATE_FLAGS_SUPPORTED)
 		return ERR_PTR(-EINVAL);
 
 	cq = kmalloc(sizeof *cq, GFP_KERNEL);
@@ -194,6 +195,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
 	spin_lock_init(&cq->lock);
 	cq->resize_buf = NULL;
 	cq->resize_umem = NULL;
+	cq->create_flags = attr->flags;
 	INIT_LIST_HEAD(&cq->send_qp_list);
 	INIT_LIST_HEAD(&cq->recv_qp_list);
 
@@ -237,7 +239,8 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
 		vector = dev->eq_table[vector % ibdev->num_comp_vectors];
 
 	err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar,
-			    cq->db.dma, &cq->mcq, vector, 0, 0);
+			    cq->db.dma, &cq->mcq, vector, 0,
+			    !!(cq->create_flags & IB_CQ_FLAGS_TIMESTAMP_COMPLETION));
 	if (err)
 		goto err_dbmap;
 
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 832d571..c642bf0 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -140,10 +140,27 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	struct ib_smp *out_mad = NULL;
 	int err = -ENOMEM;
 	int have_ib_ports;
+	struct mlx4_uverbs_ex_query_device cmd;
+	struct mlx4_uverbs_ex_query_device_resp resp = {.comp_mask = 0};
+	struct mlx4_clock_params clock_params;
 
-	if (uhw->inlen || uhw->outlen)
-		return -EINVAL;
+	if (uhw->inlen) {
+		if (uhw->inlen < sizeof(cmd))
+			return -EINVAL;
+
+		err = ib_copy_from_udata(&cmd, uhw, sizeof(cmd));
+		if (err)
+			return err;
+
+		if (cmd.comp_mask)
+			return -EINVAL;
+
+		if (cmd.reserved)
+			return -EINVAL;
+	}
 
+	resp.response_length = offsetof(typeof(resp), response_length) +
+		sizeof(resp.response_length);
 	in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
@@ -233,7 +250,24 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
 					   props->max_mcast_grp;
 	props->max_map_per_fmr = dev->dev->caps.max_fmr_maps;
+	props->hca_core_clock = dev->dev->caps.hca_core_clock;
+	props->timestamp_mask = 0xFFFFFFFFFFFFULL;
 
+	err = mlx4_get_internal_clock_params(dev->dev, &clock_params);
+	if (err)
+		goto out;
+
+	if (uhw->outlen >= resp.response_length + sizeof(resp.hca_core_clock_offset)) {
+		resp.hca_core_clock_offset = clock_params.offset % PAGE_SIZE;
+		resp.response_length += sizeof(resp.hca_core_clock_offset);
+		resp.comp_mask |= QUERY_DEVICE_RESP_MASK_TIMESTAMP;
+	}
+
+	if (uhw->outlen) {
+		err = ib_copy_to_udata(uhw, &resp, resp.response_length);
+		if (err)
+			goto out;
+	}
 out:
 	kfree(in_mad);
 	kfree(out_mad);
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index f127efc..8421e15 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -110,6 +110,7 @@ struct mlx4_ib_cq {
 	struct mutex resize_mutex;
 	struct ib_umem *umem;
 	struct ib_umem *resize_umem;
+	int create_flags;
 	/* List of qps that it serves.*/
 	struct list_head send_qp_list;
 	struct list_head recv_qp_list;
@@ -555,6 +556,21 @@ struct mlx4_ib_qp_tunnel_init_attr {
 	u8 port;
 };
 
+struct mlx4_uverbs_ex_query_device {
+	__u32 comp_mask;
+	__u32 reserved;
+};
+
+enum query_device_resp_mask {
+	QUERY_DEVICE_RESP_MASK_TIMESTAMP = 1UL << 0,
+};
+
+struct mlx4_uverbs_ex_query_device_resp {
+	__u32 comp_mask;
+	__u32 response_length;
+	__u64 hca_core_clock_offset;
+};
+
 static inline struct mlx4_ib_dev *to_mdev(struct ib_device *ibdev)
 {
 	return container_of(ibdev, struct mlx4_ib_dev, ib_dev);
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread
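The query_device hunk in the patch above builds its vendor response incrementally: response_length starts at the end of the mandatory header, and the optional hca_core_clock_offset field is appended (with its comp_mask bit set) only when the caller's buffer has room for it. What follows is a minimal user-space sketch of that pattern, not the driver code. The names demo_resp, DEMO_RESP_MASK_TIMESTAMP and demo_fill_resp are hypothetical and exist only for this illustration; the sketch writes into a plain buffer with memcpy() instead of ib_copy_to_udata(), and, as an extra assumption of the sketch, it writes nothing when the buffer cannot hold the mandatory header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the response layout used in the patch. */
struct demo_resp {
	uint32_t comp_mask;
	uint32_t response_length;
	uint64_t hca_core_clock_offset;
};

#define DEMO_RESP_MASK_TIMESTAMP (1u << 0)

/*
 * Fill a caller-supplied buffer of outlen bytes.
 * Returns the number of bytes written (0 if even the header does not fit).
 */
static size_t demo_fill_resp(void *out, size_t outlen, uint64_t clock_offset)
{
	struct demo_resp resp;

	memset(&resp, 0, sizeof(resp));

	/* Mandatory part: everything up to and including response_length. */
	resp.response_length =
		(uint32_t)(offsetof(struct demo_resp, response_length) +
			   sizeof(resp.response_length));

	/* Sketch-only guard: do not write a truncated header. */
	if (outlen < resp.response_length)
		return 0;

	/* Optional field: reported only when the buffer has room for it. */
	if (outlen >= resp.response_length + sizeof(resp.hca_core_clock_offset)) {
		resp.hca_core_clock_offset = clock_offset;
		resp.response_length += sizeof(resp.hca_core_clock_offset);
		resp.comp_mask |= DEMO_RESP_MASK_TIMESTAMP;
	}

	memcpy(out, &resp, resp.response_length);
	return resp.response_length;
}
```

A consumer of such a response would test the comp_mask bit before reading the optional field, so an older caller with a shorter buffer and a newer caller with a longer one can both talk to the same provider.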
* Re: [PATCH for-next V2 0/9] Add completion timestamping support
       [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (8 preceding siblings ...)
  2015-05-31 12:14   ` [PATCH for-next V2 9/9] IB/mlx4: Add support for CQ time-stamping Or Gerlitz
@ 2015-05-31 12:31   ` Or Gerlitz
  2015-05-31 19:00   ` Doug Ledford
  10 siblings, 0 replies; 64+ messages in thread
From: Or Gerlitz @ 2015-05-31 12:31 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On 5/31/2015 3:14 PM, Or Gerlitz wrote:
> We would to go for one helper

Would like to go for one helper

^ permalink raw reply	[flat|nested] 64+ messages in thread
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <1433074457-26437-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (9 preceding siblings ...) 2015-05-31 12:31 ` [PATCH for-next V2 0/9] Add completion timestamping support Or Gerlitz @ 2015-05-31 19:00 ` Doug Ledford [not found] ` <1433098827.114391.179.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 10 siblings, 1 reply; 64+ messages in thread From: Doug Ledford @ 2015-05-31 19:00 UTC (permalink / raw) To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon [-- Attachment #1: Type: text/plain, Size: 10062 bytes --] On Sun, 2015-05-31 at 15:14 +0300, Or Gerlitz wrote: > Hi Doug, > > This patchset adds completion timestamping supports for verbs consumers. > > Reviewing the weekend threads, we've changed the flag time to reflect > that this is completion time-stamp and folded the mlx4 actual support > into one patch. > > Regarding the related user-space support, it's possible to add what you > were suggesting, ibv_get_raw_cqe_timestamp() -- returns ts in cycles and > ibv_get_cqe_timestamp() -- returns ts in ns, this makes the value returned > by the poll cq verb an opaque one that must go through one of the convertors. > > We would to go for one helper ibv_get_timestamp(uint64_t raw_time, flag) which > could get the raw time-stamp and one of the following flags: RAW_TIME, RAW_NS_TIME. I'm theoretically OK with something similar to the above. However, the NS time should not be raw. It should be cooked and should be able to be valid to compare between different adapters. Right now, the cycle counter that you are exposing is only useful for ordering between packets received on a single adapter where the cycle counter is the same on all packets. Throw in a different vendor's card, or two of your own cards, and the issue gets much more complex. The cooked value should be an actual, real time that can be used across these more complex environments. 
Because of that, it really shouldn't be called RAW.

So, if you want a single entry point, I would suggest something like
this:

enum ib_timestamp_flags {
	IB_TIMESTAMP_COMPLETION = (1 << 0), // specify on create_cq
	IB_TIMESTAMP_WQE_BEGIN = (1 << 1), // specify on create qp?
	IB_TIMESTAMP_WQE_END = (1 << 2), // specify on create qp?
	IB_TIMESTAMP_RAW = (1 << 31)
};

enum ib_cq_creation_flags {
	IB_CQ_FLAGS_TIMESTAMP_COMPLETION = (1 << 0)
};

/**
 * ibv_get_timestamp - Return the requested timestamp for the given wc
 * @wc - work completion to get timestamp results from
 * @ts - struct timespec to return timestamp in
 * @flags - which timestamp to return and in what form
 *
 * Depending on the flags used to create the queue pair/completion
 * queue, different timestamps might be available. Callers should
 * specify which timestamp they are interested in using the flags
 * element, and if they wish either a cooked or raw timestamp. A
 * raw timestamp is implementation defined and will be passed back
 * in the tv_nsec portion of the struct timespec. A raw timestamp
 * can not be relied upon to have any ordering value between more
 * than one HCA or driver. A cooked timestamp will return a valid
 * struct timespec normalized as closely as possible to the return
 * value for CLOCK_MONOTONIC of clock_gettime at the time of the
 * timestamp.
 */
int ibv_get_timestamp(struct ibv_wc *wc, struct timespec *ts, int flags);

> We think this would address the reviewer comments for the kernel submission.
>
> The user-space code is in (still uses IB_CQ_FLAGS_TIMESTAMP and miss the
> conversion functions)
>
> https://github.com/matanb10/libibverbs timestamp-v1
> https://github.com/matanb10/libmlx4 timestamp-v1
>
> Timestamping is used by applications in order to know when a WQE was
> received/transmitted by the HW. The value is given is HCA hardware cycles,
> but could be easily converted as the hardware's core clock frequecny is
> available through extension of query device.
> > Moreover, we add an ability to read the HCA's current clock. This could be > useful on order to synchronize events to the wall clock. > > This functionality is achieved by adding/extending the following verbs: > > create_cq - create_cq is extended in order to allow passing creation flags > to the CQ creation function. We change IB/core --> vendors API > to be easily extendible by passing a struct which contains > comp_vectors, cqe and the new flags parameter. In order to create > CQ which supports timestamping, IB_CQ_FLAGS_TIMESTAMP_COMPLETION should be given. > > query_device - We extend query_device uverb further by giving the hardware's > clock frequency and the timestamp mask (the number of timestamp > bits which are supported). If timestamp isn't supported, 0 is returned. > > In order to read the timestamp in the WQE, the user needs to query the device > for support, create an appropriate CQ (using the extanded uverb with > IB_CQ_FLAGS_TIMESTAMP_COMPLETION) and poll the CQ with an extended poll_cq verb (currently, > only implemented in user-space). > > In mlx4, allowing the user to read the core clock efficiently involves mapping > this area of the hardware to user-space (being done by using a mmap command) > and reading the clock from the correct offset of the page. > > This offset is returned in the vendor's specific data from mlx4's kernel driver > to the mlx4's user-space driver. query_device is modified in order to support > passing this vendor specific data. A user-space application could use a new > verb in order to read the hardware's clock. > > Translating the hardware's clock into ms could be done by dividing this > value by hca_core_clock (which is returned by the extended version of > query_device uverb). 
> > A user-space application could get the current HW's clock by executing > > ibv_query_values_ex(struct ibv_context *context, uint32_t q_values, > struct ibv_values_ex *values) > > The function gets a mask of the values to query and return their values. > Vendors could either implement this as a uverb command or use their > user-space driver to return those values directly from the HW (the mlx4 way). > > Matan and Or. > > Changes from V1: > (1) fixed lustre IB's code build > (2) squashed mlx4 V1 9-11 patches into one > (3) changed IB_CQ_FLAGS_TIMESTAMP --> IB_CQ_FLAGS_TIMESTAMP_COMPLETION > > Changes from V0: > (1) Pass ib_cq_init_attr instead of cqe and comp_vector. > (2) Fix unneeded indentation. > (3) Change flags to u32. > (4) Add const to create_cq's ib_cq_init_attr argument in vendor implementation. > > Matan Barak (9): > IB/core: Change provider's API of create_cq to be extendible > IB/core: Change ib_create_cq to use struct ib_cq_init_attr > IB/core: Add CQ creation time-stamping flag > IB/core: Extend ib_uverbs_create_cq > IB/core: Add timestamp_mask and hca_core_clock to query_device > IB/core: Pass hardware specific data in query_device > IB/mlx4: Add mmap call to map the hardware clock > IB/mlx4: Support extended create_cq and query_device uverbs > IB/mlx4: Add support for CQ time-stamping > > drivers/infiniband/core/device.c | 6 +- > drivers/infiniband/core/mad.c | 5 +- > drivers/infiniband/core/uverbs.h | 1 + > drivers/infiniband/core/uverbs_cmd.c | 188 ++++++++++++++++---- > drivers/infiniband/core/uverbs_main.c | 1 + > drivers/infiniband/core/verbs.c | 4 +- > drivers/infiniband/hw/amso1100/c2_provider.c | 14 ++- > drivers/infiniband/hw/cxgb3/iwch_provider.c | 19 ++- > drivers/infiniband/hw/cxgb4/cq.c | 9 +- > drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 8 +- > drivers/infiniband/hw/cxgb4/provider.c | 8 +- > drivers/infiniband/hw/ehca/ehca_cq.c | 7 +- > drivers/infiniband/hw/ehca/ehca_hca.c | 6 +- > drivers/infiniband/hw/ehca/ehca_iverbs.h | 6 +- > 
drivers/infiniband/hw/ehca/ehca_main.c | 6 +- > drivers/infiniband/hw/ipath/ipath_cq.c | 9 +- > drivers/infiniband/hw/ipath/ipath_verbs.c | 7 +- > drivers/infiniband/hw/ipath/ipath_verbs.h | 3 +- > drivers/infiniband/hw/mlx4/cq.c | 13 ++- > drivers/infiniband/hw/mlx4/mad.c | 5 +- > drivers/infiniband/hw/mlx4/main.c | 67 +++++++- > drivers/infiniband/hw/mlx4/mlx4_ib.h | 19 ++- > drivers/infiniband/hw/mlx5/cq.c | 10 +- > drivers/infiniband/hw/mlx5/main.c | 19 ++- > drivers/infiniband/hw/mlx5/mlx5_ib.h | 5 +- > drivers/infiniband/hw/mthca/mthca_provider.c | 15 ++- > drivers/infiniband/hw/nes/nes_verbs.c | 17 ++- > drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 13 ++- > drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 9 +- > drivers/infiniband/hw/qib/qib_cq.c | 11 +- > drivers/infiniband/hw/qib/qib_verbs.c | 6 +- > drivers/infiniband/hw/qib/qib_verbs.h | 5 +- > drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 16 ++- > drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 10 +- > drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 9 +- > drivers/infiniband/ulp/iser/iser_verbs.c | 6 +- > drivers/infiniband/ulp/isert/ib_isert.c | 6 +- > drivers/infiniband/ulp/srp/ib_srp.c | 10 +- > drivers/infiniband/ulp/srpt/ib_srpt.c | 5 +- > drivers/net/ethernet/mellanox/mlx4/main.c | 19 ++ > .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 7 +- > include/linux/mlx4/device.h | 9 + > include/rdma/ib_verbs.h | 25 ++- > include/uapi/rdma/ib_user_verbs.h | 19 ++ > net/9p/trans_rdma.c | 5 +- > net/rds/ib_cm.c | 8 +- > net/rds/iw_cm.c | 8 +- > net/sunrpc/xprtrdma/svc_rdma_transport.c | 10 +- > net/sunrpc/xprtrdma/verbs.c | 10 +- > 49 files changed, 564 insertions(+), 139 deletions(-) > -- Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> GPG KeyID: 0E572FDD [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 64+ messages in thread
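The distinction drawn in the reply above between a raw and a cooked timestamp can be made concrete. A raw value is only a cycle count on one adapter; a cooked value has to be tied to a shared time base such as CLOCK_MONOTONIC. The following is a rough sketch of one way a library could cook a raw value, under several assumptions that are not taken from the thread: a reference pair (HCA cycle counter and monotonic time sampled at the same instant) is available, the core clock frequency is expressed in kHz and is non-zero, the counter wraps at the width given by the timestamp mask, and the completion happened after the reference sample and less than one wrap period later. All names (demo_clock_ref, demo_cycles_to_ns, demo_cook_ns) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical reference point tying the HCA counter to a shared clock. */
struct demo_clock_ref {
	uint64_t hw_cycles;	/* HCA cycle counter at the reference instant */
	uint64_t mono_ns;	/* CLOCK_MONOTONIC at the same instant, in ns */
	uint64_t freq_khz;	/* core clock frequency, assumed kHz, non-zero */
	uint64_t mask;		/* valid counter bits, e.g. 0xFFFFFFFFFFFF */
};

/* ns = cycles * 10^6 / kHz, split so the multiplication cannot overflow
 * for realistic frequencies and counter widths. */
static uint64_t demo_cycles_to_ns(uint64_t cycles, uint64_t freq_khz)
{
	uint64_t whole = cycles / freq_khz;
	uint64_t rem = cycles % freq_khz;

	return whole * 1000000ULL + rem * 1000000ULL / freq_khz;
}

/* Convert a raw completion timestamp to ns on the reference time base. */
static uint64_t demo_cook_ns(const struct demo_clock_ref *ref, uint64_t raw)
{
	/* Masked subtraction stays correct across one counter wrap. */
	uint64_t delta = (raw - ref->hw_cycles) & ref->mask;

	return ref->mono_ns + demo_cycles_to_ns(delta, ref->freq_khz);
}
```

Splitting the resulting nanosecond count into a struct timespec is then a division and a remainder by 10^9. The sketch ignores clock drift between the HCA and the host, which a real implementation would have to handle by refreshing the reference pair periodically.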
[parent not found: <1433098827.114391.179.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <1433098827.114391.179.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2015-06-01 9:30 ` Matan Barak [not found] ` <CAAKD3BBo9A5AV9Uv-q1TgSQ6hs-oSUn5QsUT+5+6mk1RwW7AUA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread From: Matan Barak @ 2015-06-01 9:30 UTC (permalink / raw) To: Doug Ledford Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon On Sun, May 31, 2015 at 10:00 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > On Sun, 2015-05-31 at 15:14 +0300, Or Gerlitz wrote: >> Hi Doug, >> >> This patchset adds completion timestamping supports for verbs consumers. >> >> Reviewing the weekend threads, we've changed the flag time to reflect >> that this is completion time-stamp and folded the mlx4 actual support >> into one patch. >> >> Regarding the related user-space support, it's possible to add what you >> were suggesting, ibv_get_raw_cqe_timestamp() -- returns ts in cycles and >> ibv_get_cqe_timestamp() -- returns ts in ns, this makes the value returned >> by the poll cq verb an opaque one that must go through one of the convertors. >> >> We would to go for one helper ibv_get_timestamp(uint64_t raw_time, flag) which >> could get the raw time-stamp and one of the following flags: RAW_TIME, RAW_NS_TIME. > > I'm theoretically OK with something similar to the above. However, the > NS time should not be raw. It should be cooked and should be able to be > valid to compare between different adapters. Right now, the cycle > counter that you are exposing is only useful for ordering between > packets received on a single adapter where the cycle counter is the same > on all packets. Throw in a different vendor's card, or two of your own > cards, and the issue gets much more complex. The cooked value should be > an actual, real time that can be used across these more complex > environments. 
> Because of that, it really shouldn't be called RAW.
>

Thanks for the feedback Doug.
We wanted to add RAW_NS in order to free the user from calculating it by himself
(dividing the cycles value in the core_clock).
In addition to this, it's possible to implement a future NS_TIME
(without the "raw"), which
will convert the opaque time to system wide ns.

> So, if you want a single entry point, I would suggest something like
> this:
>
> enum ib_timestamp_flags {
>         IB_TIMESTAMP_COMPLETION = (1 << 0), // specify on create_cq
>         IB_TIMESTAMP_WQE_BEGIN = (1 << 1), // specify on create qp?
>         IB_TIMESTAMP_WQE_END = (1 << 2), // specify on create qp?
>         IB_TIMESTAMP_RAW = (1 << 31)
> };
>
> enum ib_cq_creation_flags {
>         IB_CQ_FLAGS_TIMESTAMP_COMPLETION = (1 << 0)
> };
>
> /**
>  * ibv_get_timestamp - Return the requested timestamp for the given wc
>  * @wc - work completion to get timestamp results from
>  * @ts - struct timespec to return timestamp in
>  * @flags - which timestamp to return and in what form
>  *
>  * Depending on the flags used to create the queue pair/completion
>  * queue, different timestamps might be available. Callers should
>  * specify which timestamp they are interested in using the flags
>  * element, and if they wish either a cooked or raw timestamp. A
>  * raw timestamp is implementation defined and will be passed back
>  * in the tv_nsec portion of the struct timespec. A raw timestamp
>  * can not be relied upon to have any ordering value between more
>  * than one HCA or driver. A cooked timestamp will return a valid
>  * struct timespec normalized as closely as possible to the return
>  * value for CLOCK_MONOTONIC of clock_gettime at the time of the
>  * timestamp.
>  */
> int ibv_get_timestamp(struct ibv_wc *wc, struct timespec *ts, int
> flags);
>

We wanted to divide the flow here:
In create_cq, the user notifies the kernel/HCA which timestamp he
would like to get.
It could be a completion timestamp, a start of WQE timestamp or
whatever he wants.
The timestamp the user gets in the WQE is opaque. Every vendor could
implement it
as it wants - in order to have minimal implication in performance.
The second part is ibv_get_timestamp. It gets an opaque timestamp and
outputs a converted value in respect to the time the user wanted to get.
For example, if IB_TIMESTAMP_NS_TIME is given, the function should output
a system-wide NS value (we would like to implement this only in the future).
Currently, only RAW and RAW_NS will be supported, while RAW gives the time
in cycles and RAW_NS gives a NS value with an unknown time reference.
We think ibv_get_timestamp shouldn't get a wqe but a 64bit opaque value.
The reason for this is that it could be used in order to translate query_values
current time to different types of timestamp.
What do you think?

>> We think this would address the reviewer comments for the kernel submission.
>>
>> The user-space code is in (still uses IB_CQ_FLAGS_TIMESTAMP and miss the
>> conversion functions)
>>
>> https://github.com/matanb10/libibverbs timestamp-v1
>> https://github.com/matanb10/libmlx4 timestamp-v1
>>
>> Timestamping is used by applications in order to know when a WQE was
>> received/transmitted by the HW. The value is given is HCA hardware cycles,
>> but could be easily converted as the hardware's core clock frequecny is
>> available through extension of query device.
>>
>> Moreover, we add an ability to read the HCA's current clock. This could be
>> useful on order to synchronize events to the wall clock.
>>
>> This functionality is achieved by adding/extending the following verbs:
>>
>> create_cq - create_cq is extended in order to allow passing creation flags
>> to the CQ creation function. We change IB/core --> vendors API
>> to be easily extendible by passing a struct which contains
>> comp_vectors, cqe and the new flags parameter. In order to create
>> CQ which supports timestamping, IB_CQ_FLAGS_TIMESTAMP_COMPLETION should be given.
>> >> query_device - We extend query_device uverb further by giving the hardware's >> clock frequency and the timestamp mask (the number of timestamp >> bits which are supported). If timestamp isn't supported, 0 is returned. >> >> In order to read the timestamp in the WQE, the user needs to query the device >> for support, create an appropriate CQ (using the extanded uverb with >> IB_CQ_FLAGS_TIMESTAMP_COMPLETION) and poll the CQ with an extended poll_cq verb (currently, >> only implemented in user-space). >> >> In mlx4, allowing the user to read the core clock efficiently involves mapping >> this area of the hardware to user-space (being done by using a mmap command) >> and reading the clock from the correct offset of the page. >> >> This offset is returned in the vendor's specific data from mlx4's kernel driver >> to the mlx4's user-space driver. query_device is modified in order to support >> passing this vendor specific data. A user-space application could use a new >> verb in order to read the hardware's clock. >> >> Translating the hardware's clock into ms could be done by dividing this >> value by hca_core_clock (which is returned by the extended version of >> query_device uverb). >> >> A user-space application could get the current HW's clock by executing >> >> ibv_query_values_ex(struct ibv_context *context, uint32_t q_values, >> struct ibv_values_ex *values) >> >> The function gets a mask of the values to query and return their values. >> Vendors could either implement this as a uverb command or use their >> user-space driver to return those values directly from the HW (the mlx4 way). >> >> Matan and Or. >> >> Changes from V1: >> (1) fixed lustre IB's code build >> (2) squashed mlx4 V1 9-11 patches into one >> (3) changed IB_CQ_FLAGS_TIMESTAMP --> IB_CQ_FLAGS_TIMESTAMP_COMPLETION >> >> Changes from V0: >> (1) Pass ib_cq_init_attr instead of cqe and comp_vector. >> (2) Fix unneeded indentation. >> (3) Change flags to u32. 
>> (4) Add const to create_cq's ib_cq_init_attr argument in vendor implementation. >> >> Matan Barak (9): >> IB/core: Change provider's API of create_cq to be extendible >> IB/core: Change ib_create_cq to use struct ib_cq_init_attr >> IB/core: Add CQ creation time-stamping flag >> IB/core: Extend ib_uverbs_create_cq >> IB/core: Add timestamp_mask and hca_core_clock to query_device >> IB/core: Pass hardware specific data in query_device >> IB/mlx4: Add mmap call to map the hardware clock >> IB/mlx4: Support extended create_cq and query_device uverbs >> IB/mlx4: Add support for CQ time-stamping >> >> drivers/infiniband/core/device.c | 6 +- >> drivers/infiniband/core/mad.c | 5 +- >> drivers/infiniband/core/uverbs.h | 1 + >> drivers/infiniband/core/uverbs_cmd.c | 188 ++++++++++++++++---- >> drivers/infiniband/core/uverbs_main.c | 1 + >> drivers/infiniband/core/verbs.c | 4 +- >> drivers/infiniband/hw/amso1100/c2_provider.c | 14 ++- >> drivers/infiniband/hw/cxgb3/iwch_provider.c | 19 ++- >> drivers/infiniband/hw/cxgb4/cq.c | 9 +- >> drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 8 +- >> drivers/infiniband/hw/cxgb4/provider.c | 8 +- >> drivers/infiniband/hw/ehca/ehca_cq.c | 7 +- >> drivers/infiniband/hw/ehca/ehca_hca.c | 6 +- >> drivers/infiniband/hw/ehca/ehca_iverbs.h | 6 +- >> drivers/infiniband/hw/ehca/ehca_main.c | 6 +- >> drivers/infiniband/hw/ipath/ipath_cq.c | 9 +- >> drivers/infiniband/hw/ipath/ipath_verbs.c | 7 +- >> drivers/infiniband/hw/ipath/ipath_verbs.h | 3 +- >> drivers/infiniband/hw/mlx4/cq.c | 13 ++- >> drivers/infiniband/hw/mlx4/mad.c | 5 +- >> drivers/infiniband/hw/mlx4/main.c | 67 +++++++- >> drivers/infiniband/hw/mlx4/mlx4_ib.h | 19 ++- >> drivers/infiniband/hw/mlx5/cq.c | 10 +- >> drivers/infiniband/hw/mlx5/main.c | 19 ++- >> drivers/infiniband/hw/mlx5/mlx5_ib.h | 5 +- >> drivers/infiniband/hw/mthca/mthca_provider.c | 15 ++- >> drivers/infiniband/hw/nes/nes_verbs.c | 17 ++- >> drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 13 ++- >> 
drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 9 +- >> drivers/infiniband/hw/qib/qib_cq.c | 11 +- >> drivers/infiniband/hw/qib/qib_verbs.c | 6 +- >> drivers/infiniband/hw/qib/qib_verbs.h | 5 +- >> drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 16 ++- >> drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 10 +- >> drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 9 +- >> drivers/infiniband/ulp/iser/iser_verbs.c | 6 +- >> drivers/infiniband/ulp/isert/ib_isert.c | 6 +- >> drivers/infiniband/ulp/srp/ib_srp.c | 10 +- >> drivers/infiniband/ulp/srpt/ib_srpt.c | 5 +- >> drivers/net/ethernet/mellanox/mlx4/main.c | 19 ++ >> .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 7 +- >> include/linux/mlx4/device.h | 9 + >> include/rdma/ib_verbs.h | 25 ++- >> include/uapi/rdma/ib_user_verbs.h | 19 ++ >> net/9p/trans_rdma.c | 5 +- >> net/rds/ib_cm.c | 8 +- >> net/rds/iw_cm.c | 8 +- >> net/sunrpc/xprtrdma/svc_rdma_transport.c | 10 +- >> net/sunrpc/xprtrdma/verbs.c | 10 +- >> 49 files changed, 564 insertions(+), 139 deletions(-) >> > > > -- > Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > GPG KeyID: 0E572FDD > Thanks for taking a look. Matan -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
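The two-stage flow described in the reply above (an opaque 64-bit value comes out of the poll path, and a separate helper converts it according to a requested form) could look roughly like the sketch below. This is an illustration of the idea only and is not the libibverbs API: demo_get_timestamp, enum demo_ts_mode and its values are hypothetical names, the core clock frequency is assumed to be in kHz, and the sketch simply returns -1 for a system-wide NS_TIME mode, since the reply says that form is left for the future.

```c
#include <stdint.h>

/* Hypothetical conversion modes, loosely following the RAW / RAW_NS /
 * NS_TIME forms discussed in the thread. */
enum demo_ts_mode {
	DEMO_TS_RAW,		/* opaque value returned unchanged (cycles) */
	DEMO_TS_RAW_NS,		/* ns since an unknown reference point */
	DEMO_TS_NS_TIME		/* system-wide ns, not implemented here */
};

/*
 * Convert an opaque timestamp into the requested form.
 * Returns 0 on success, -1 for an unsupported mode or a zero frequency.
 */
static int demo_get_timestamp(uint64_t opaque, enum demo_ts_mode mode,
			      uint64_t freq_khz, uint64_t *out)
{
	switch (mode) {
	case DEMO_TS_RAW:
		*out = opaque;
		return 0;
	case DEMO_TS_RAW_NS:
		if (freq_khz == 0)
			return -1;
		/* ns = cycles * 10^6 / kHz, split to avoid overflow. */
		*out = opaque / freq_khz * 1000000ULL +
		       opaque % freq_khz * 1000000ULL / freq_khz;
		return 0;
	default:
		return -1;
	}
}
```

Because the helper takes a plain 64-bit value, the same call could be applied to a clock reading obtained outside the completion path, which is the argument made in the reply for not tying the helper to a work completion.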
[parent not found: <CAAKD3BBo9A5AV9Uv-q1TgSQ6hs-oSUn5QsUT+5+6mk1RwW7AUA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <CAAKD3BBo9A5AV9Uv-q1TgSQ6hs-oSUn5QsUT+5+6mk1RwW7AUA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2015-06-01 11:25 ` Doug Ledford [not found] ` <1433157904.114391.188.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread From: Doug Ledford @ 2015-06-01 11:25 UTC (permalink / raw) To: Matan Barak Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon [-- Attachment #1: Type: text/plain, Size: 13671 bytes --] On Mon, 2015-06-01 at 12:30 +0300, Matan Barak wrote: > On Sun, May 31, 2015 at 10:00 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > > On Sun, 2015-05-31 at 15:14 +0300, Or Gerlitz wrote: > >> Hi Doug, > >> > >> This patchset adds completion timestamping supports for verbs consumers. > >> > >> Reviewing the weekend threads, we've changed the flag time to reflect > >> that this is completion time-stamp and folded the mlx4 actual support > >> into one patch. > >> > >> Regarding the related user-space support, it's possible to add what you > >> were suggesting, ibv_get_raw_cqe_timestamp() -- returns ts in cycles and > >> ibv_get_cqe_timestamp() -- returns ts in ns, this makes the value returned > >> by the poll cq verb an opaque one that must go through one of the convertors. > >> > >> We would to go for one helper ibv_get_timestamp(uint64_t raw_time, flag) which > >> could get the raw time-stamp and one of the following flags: RAW_TIME, RAW_NS_TIME. > > > > I'm theoretically OK with something similar to the above. However, the > > NS time should not be raw. It should be cooked and should be able to be > > valid to compare between different adapters. Right now, the cycle > > counter that you are exposing is only useful for ordering between > > packets received on a single adapter where the cycle counter is the same > > on all packets. 
> > Throw in a different vendor's card, or two of your own
> > cards, and the issue gets much more complex. The cooked value should be
> > an actual, real time that can be used across these more complex
> > environments. Because of that, it really shouldn't be called RAW.
> >
>
> Thanks for the feedback Doug.
> We wanted to add RAW_NS in order to free the user from calculating it by himself
> (dividing the cycles value in the core_clock).

What's the point? If it's raw, it's raw. It's not coordinated between
adapters. Whether it's in ns or ps or flipflops doesn't matter, it's a
flat number that has no reference to anything else, so the only thing
that matters is < another version of itself or not.

> In addition to this, it's possible to implement a future NS_TIME
> (without the "raw"), which
> will convert the opaque time to system wide ns.
>
> > So, if you want a single entry point, I would suggest something like
> > this:
> >
> > enum ib_timestamp_flags {
> >         IB_TIMESTAMP_COMPLETION = (1 << 0), // specify on create_cq
> >         IB_TIMESTAMP_WQE_BEGIN = (1 << 1), // specify on create qp?
> >         IB_TIMESTAMP_WQE_END = (1 << 2), // specify on create qp?
> >         IB_TIMESTAMP_RAW = (1 << 31)
> > };
> >
> > enum ib_cq_creation_flags {
> >         IB_CQ_FLAGS_TIMESTAMP_COMPLETION = (1 << 0)
> > };
> >
> > /**
> >  * ibv_get_timestamp - Return the requested timestamp for the given wc
> >  * @wc - work completion to get timestamp results from
> >  * @ts - struct timespec to return timestamp in
> >  * @flags - which timestamp to return and in what form
> >  *
> >  * Depending on the flags used to create the queue pair/completion
> >  * queue, different timestamps might be available. Callers should
> >  * specify which timestamp they are interested in using the flags
> >  * element, and if they wish either a cooked or raw timestamp. A
> >  * raw timestamp is implementation defined and will be passed back
> >  * in the tv_nsec portion of the struct timespec. A raw timestamp
> >  * can not be relied upon to have any ordering value between more
> >  * than one HCA or driver. A cooked timestamp will return a valid
> >  * struct timespec normalized as closely as possible to the return
> >  * value for CLOCK_MONOTONIC of clock_gettime at the time of the
> >  * timestamp.
> >  */
> > int ibv_get_timestamp(struct ibv_wc *wc, struct timespec *ts, int
> > flags);
> >
>
> We wanted to divide the flow here:
> In create_cq, the user notifies the kernel/HCA which timestamp he
> would like to get.

Correction, which timestamp*s*.

> It could be a completion timestamp, a start of WQE timestamp or
> whatever he wants.
> The timestamp the user gets in the WQE is opaque. Every vendor could
> implement it
> as it wants - in order to have minimal implication in performance.

Again, timestamp(s).

> The second part is ibv_get_timestamp. It gets an opaque timestamp

No. As you've already pointed out, how each vendor implements returning
the timestamp(s) could be totally different. There are no timestamp
entries in the existing wc struct. Expecting the user to pass the raw
value to the ibv_get_timestamp function makes no sense and violates the
attempted abstraction of ibverbs. Passing in the wc struct allows the
driver to internally allocate a wc struct with extra private elements
and pass that back to the user, when the user passes it back to
ibv_get_timestamp the elements are there in the private portion of the
struct.

> and
> outputs a converted value in respect to the time the user wanted to get.
> For example, if IB_TIMESTAMP_NS_TIME is given, the function should output
> a system-wide NS value (we would like to implement this only in the future).
> Currently, only RAW and RAW_NS will be supported, while RAW gives the time
> in cycles and RAW_NS gives a NS value with an unknown time reference.

Raw is raw. Converting from raw yogurt to raw purple makes no sense,
it's raw.

> We think ibv_get_timestamp shouldn't get a wqe but a 64bit opaque value.
> The reason for this is that it could be used in order to translate query_values
> current time to different types of timestamp.
> What do you think?

See above. It needs to be a wc struct unless you plan to identify every
timestamp we might concurrently enable and publicly change the wc struct
to include all of them so that the user can get the opaque value to pass
to the conversion function.

> >> We think this would address the reviewer comments for the kernel submission.
> >>
> >> The user-space code is in (still uses IB_CQ_FLAGS_TIMESTAMP and miss the
> >> conversion functions)
> >>
> >> https://github.com/matanb10/libibverbs timestamp-v1
> >> https://github.com/matanb10/libmlx4 timestamp-v1
> >>
> >> Timestamping is used by applications in order to know when a WQE was
> >> received/transmitted by the HW. The value is given is HCA hardware cycles,
> >> but could be easily converted as the hardware's core clock frequecny is
> >> available through extension of query device.
> >>
> >> Moreover, we add an ability to read the HCA's current clock. This could be
> >> useful on order to synchronize events to the wall clock.
> >>
> >> This functionality is achieved by adding/extending the following verbs:
> >>
> >> create_cq - create_cq is extended in order to allow passing creation flags
> >> to the CQ creation function. We change IB/core --> vendors API
> >> to be easily extendible by passing a struct which contains
> >> comp_vectors, cqe and the new flags parameter. In order to create
> >> CQ which supports timestamping, IB_CQ_FLAGS_TIMESTAMP_COMPLETION should be given.
> >>
> >> query_device - We extend query_device uverb further by giving the hardware's
> >> clock frequency and the timestamp mask (the number of timestamp
> >> bits which are supported). If timestamp isn't supported, 0 is returned.
> >> > >> In order to read the timestamp in the WQE, the user needs to query the device > >> for support, create an appropriate CQ (using the extanded uverb with > >> IB_CQ_FLAGS_TIMESTAMP_COMPLETION) and poll the CQ with an extended poll_cq verb (currently, > >> only implemented in user-space). > >> > >> In mlx4, allowing the user to read the core clock efficiently involves mapping > >> this area of the hardware to user-space (being done by using a mmap command) > >> and reading the clock from the correct offset of the page. > >> > >> This offset is returned in the vendor's specific data from mlx4's kernel driver > >> to the mlx4's user-space driver. query_device is modified in order to support > >> passing this vendor specific data. A user-space application could use a new > >> verb in order to read the hardware's clock. > >> > >> Translating the hardware's clock into ms could be done by dividing this > >> value by hca_core_clock (which is returned by the extended version of > >> query_device uverb). > >> > >> A user-space application could get the current HW's clock by executing > >> > >> ibv_query_values_ex(struct ibv_context *context, uint32_t q_values, > >> struct ibv_values_ex *values) > >> > >> The function gets a mask of the values to query and return their values. > >> Vendors could either implement this as a uverb command or use their > >> user-space driver to return those values directly from the HW (the mlx4 way). > >> > >> Matan and Or. > >> > >> Changes from V1: > >> (1) fixed lustre IB's code build > >> (2) squashed mlx4 V1 9-11 patches into one > >> (3) changed IB_CQ_FLAGS_TIMESTAMP --> IB_CQ_FLAGS_TIMESTAMP_COMPLETION > >> > >> Changes from V0: > >> (1) Pass ib_cq_init_attr instead of cqe and comp_vector. > >> (2) Fix unneeded indentation. > >> (3) Change flags to u32. > >> (4) Add const to create_cq's ib_cq_init_attr argument in vendor implementation. 
> >> > >> Matan Barak (9): > >> IB/core: Change provider's API of create_cq to be extendible > >> IB/core: Change ib_create_cq to use struct ib_cq_init_attr > >> IB/core: Add CQ creation time-stamping flag > >> IB/core: Extend ib_uverbs_create_cq > >> IB/core: Add timestamp_mask and hca_core_clock to query_device > >> IB/core: Pass hardware specific data in query_device > >> IB/mlx4: Add mmap call to map the hardware clock > >> IB/mlx4: Support extended create_cq and query_device uverbs > >> IB/mlx4: Add support for CQ time-stamping > >> > >> drivers/infiniband/core/device.c | 6 +- > >> drivers/infiniband/core/mad.c | 5 +- > >> drivers/infiniband/core/uverbs.h | 1 + > >> drivers/infiniband/core/uverbs_cmd.c | 188 ++++++++++++++++---- > >> drivers/infiniband/core/uverbs_main.c | 1 + > >> drivers/infiniband/core/verbs.c | 4 +- > >> drivers/infiniband/hw/amso1100/c2_provider.c | 14 ++- > >> drivers/infiniband/hw/cxgb3/iwch_provider.c | 19 ++- > >> drivers/infiniband/hw/cxgb4/cq.c | 9 +- > >> drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 8 +- > >> drivers/infiniband/hw/cxgb4/provider.c | 8 +- > >> drivers/infiniband/hw/ehca/ehca_cq.c | 7 +- > >> drivers/infiniband/hw/ehca/ehca_hca.c | 6 +- > >> drivers/infiniband/hw/ehca/ehca_iverbs.h | 6 +- > >> drivers/infiniband/hw/ehca/ehca_main.c | 6 +- > >> drivers/infiniband/hw/ipath/ipath_cq.c | 9 +- > >> drivers/infiniband/hw/ipath/ipath_verbs.c | 7 +- > >> drivers/infiniband/hw/ipath/ipath_verbs.h | 3 +- > >> drivers/infiniband/hw/mlx4/cq.c | 13 ++- > >> drivers/infiniband/hw/mlx4/mad.c | 5 +- > >> drivers/infiniband/hw/mlx4/main.c | 67 +++++++- > >> drivers/infiniband/hw/mlx4/mlx4_ib.h | 19 ++- > >> drivers/infiniband/hw/mlx5/cq.c | 10 +- > >> drivers/infiniband/hw/mlx5/main.c | 19 ++- > >> drivers/infiniband/hw/mlx5/mlx5_ib.h | 5 +- > >> drivers/infiniband/hw/mthca/mthca_provider.c | 15 ++- > >> drivers/infiniband/hw/nes/nes_verbs.c | 17 ++- > >> drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 13 ++- > >> 
drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 9 +- > >> drivers/infiniband/hw/qib/qib_cq.c | 11 +- > >> drivers/infiniband/hw/qib/qib_verbs.c | 6 +- > >> drivers/infiniband/hw/qib/qib_verbs.h | 5 +- > >> drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 16 ++- > >> drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 10 +- > >> drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 9 +- > >> drivers/infiniband/ulp/iser/iser_verbs.c | 6 +- > >> drivers/infiniband/ulp/isert/ib_isert.c | 6 +- > >> drivers/infiniband/ulp/srp/ib_srp.c | 10 +- > >> drivers/infiniband/ulp/srpt/ib_srpt.c | 5 +- > >> drivers/net/ethernet/mellanox/mlx4/main.c | 19 ++ > >> .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 7 +- > >> include/linux/mlx4/device.h | 9 + > >> include/rdma/ib_verbs.h | 25 ++- > >> include/uapi/rdma/ib_user_verbs.h | 19 ++ > >> net/9p/trans_rdma.c | 5 +- > >> net/rds/ib_cm.c | 8 +- > >> net/rds/iw_cm.c | 8 +- > >> net/sunrpc/xprtrdma/svc_rdma_transport.c | 10 +- > >> net/sunrpc/xprtrdma/verbs.c | 10 +- > >> 49 files changed, 564 insertions(+), 139 deletions(-) > >> > > > > > > -- > > Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > > GPG KeyID: 0E572FDD > > > > Thanks for taking a look. > > Matan -- Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> GPG KeyID: 0E572FDD [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH for-next V2 0/9] Add completion timestamping support
From: Christoph Lameter @ 2015-06-01 13:58 UTC
To: Doug Ledford
Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Mon, 1 Jun 2015, Doug Ledford wrote:

> What's the point? If it's raw, it's raw. It's not coordinated between
> adapters. Whether it's in ns or ps or flipflops doesn't matter, it's a
> flat number that has no reference to anything else, so the only thing
> that matters is < another version of itself or not.

It can be coordinated between different adapters through the use of time
software that can work with cycles and frequencies to scale the value of
the cycles to realtime. Software like that is available in ptpd,
timekeeper etc. Each NIC basically has its own clock and the timekeeping
software would have to track the scaling and the aberration factor over
time in order to come up with accurate absolute time values derived from
the cycle counters of these NICs. Since we are dealing here with values
that need to be accurate to within less than 100ns this is not trivial and
one can easily get a ns value that is absolutely useless.

Since it is not trivial it's better kept out of the timestamp support in
the RDMA API. If the app developer wants a trivial conversion then they
can opencode a simple multiplication by the frequency. At that point it
should be clear though that this raw time value is of limited use given
its inaccuracy and the dependence on the NIC clock.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
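A minimal sketch of the "trivial conversion" an application could open-code,
not the API proposed in the patchset. It assumes the core clock frequency is
reported in kHz (the thread does not settle the units); raw_cycles_to_ns and
freq_khz are hypothetical names. Strictly speaking, the conversion divides the
cycle count by the frequency, which is the same as multiplying by the clock
period. The result is only a duration on one NIC's free-running clock, with no
relation to wall-clock time or to another adapter.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a raw cycle count to nanoseconds.
 * freq_khz is an assumed unit: cycles per millisecond.
 * The division is split so that cycles * 1000000 is never formed
 * directly, which would overflow 64 bits for large counters.
 */
static uint64_t raw_cycles_to_ns(uint64_t cycles, uint64_t freq_khz)
{
    assert(freq_khz != 0);

    uint64_t whole_ms = cycles / freq_khz;  /* complete milliseconds */
    uint64_t rem = cycles % freq_khz;       /* leftover cycles, < freq_khz */

    return whole_ms * 1000000ULL + (rem * 1000000ULL) / freq_khz;
}
```

This truncates toward zero and ignores clock drift entirely, which is the
inaccuracy the message above warns about.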
* Re: [PATCH for-next V2 0/9] Add completion timestamping support
From: Doug Ledford @ 2015-06-01 16:39 UTC
To: Christoph Lameter
Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Mon, 2015-06-01 at 08:58 -0500, Christoph Lameter wrote:
> On Mon, 1 Jun 2015, Doug Ledford wrote:
>
> > What's the point? If it's raw, it's raw. It's not coordinated between
> > adapters. Whether it's in ns or ps or flipflops doesn't matter, it's a
> > flat number that has no reference to anything else, so the only thing
> > that matters is < another version of itself or not.
>
> It can be coordinated between different adapters through the use of time
> software that can work with cycles and frequencies to scale the value of
> the cycles to realtime.

And that is precisely what the cooked values should be.

> Software like that is available in ptpd,
> timekeeper etc. Each NIC basically has its own clock and the timekeeping
> software would have to track the scaling and the aberration factor over
> time in order to come up with accurate absolute time values derived from
> the cycle counters of these NICs. Since we are dealing here with values
> that need to be accurate to within less than 100ns this is not trivial and
> one can easily get a ns value that is absolutely useless.

Agreed. The cooked value is not going to be a simple thing. I fail to
see how this is making a case that we should duplicate that code in
every app that uses a timestamp versus getting it right once in
libibverbs.

> Since it is not trivial it's better kept out of the timestamp support in
> the RDMA API. If the app developer wants a trivial conversion then they
> can opencode a simple multiplication by the frequency. At that point it
> should be clear though that this raw time value is of limited use given
> its inaccuracy and the dependence on the NIC clock.

The raw value is just that: raw. And it *is* of limited use unless you
only have one adapter.

--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD
* Re: [PATCH for-next V2 0/9] Add completion timestamping support
From: Jason Gunthorpe @ 2015-06-01 16:43 UTC
To: Doug Ledford
Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Mon, Jun 01, 2015 at 07:25:04AM -0400, Doug Ledford wrote:

> attempted abstraction of ibverbs. Passing in the wc struct allows the
> driver to internally allocate a wc struct with extra private elements
> and pass that back to the user, when the user passes it back to
> ibv_get_timestamp the elements are there in the private portion of the
> struct.

wc structures are allocated by the caller, there is no option for the
driver to create private elements.

AFAIK, Christoph's use case is essentially the only meaningful use
case for this feature, generalizing too much may destroy the
performance that is valuable here.

Jason
* Re: [PATCH for-next V2 0/9] Add completion timestamping support
From: Doug Ledford @ 2015-06-01 17:00 UTC
To: Jason Gunthorpe
Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Mon, 2015-06-01 at 10:43 -0600, Jason Gunthorpe wrote:
> On Mon, Jun 01, 2015 at 07:25:04AM -0400, Doug Ledford wrote:
>
> > attempted abstraction of ibverbs. Passing in the wc struct allows the
> > driver to internally allocate a wc struct with extra private elements
> > and pass that back to the user, when the user passes it back to
> > ibv_get_timestamp the elements are there in the private portion of the
> > struct.
>
> wc structures are allocated by the caller, there is no option for the
> driver to create private elements.

You're right, the data would have to be housed somewhere in the driver
private completion structs (for example, in the CQE the card posts to
memory that the driver then massages into a WC).

> AFAIK, Christoph's use case is essentially the only meaningful use
> case for this feature, generalizing too much may destroy the
> performance that is valuable here.

I'm not convinced of that. Steve has already spoken up about the
timestamps available in cxgb4. Those are very different and yet still
highly valuable to someone investigating performance of their RDMA
application.

--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD
* Re: [PATCH for-next V2 0/9] Add completion timestamping support
From: Jason Gunthorpe @ 2015-06-01 17:29 UTC
To: Doug Ledford
Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Mon, Jun 01, 2015 at 01:00:57PM -0400, Doug Ledford wrote:
> > case for this feature, generalizing too much may destroy the
> > performance that is valuable here.

> I'm not convinced of that. Steve has already spoken up about the
> timestamps available in cxgb4. Those are very different and yet still
> highly valuable to someone investigating performance of their RDMA
> application.

? cxgb4 looks nearly identical to me. There is only one HW time stamp
'cqe_sge_ts', which occurs at some point in the flow, and is written
to the CQE. The current cycle counter can be read from SGE_TIMESTAMP_LO
registers.

Same as mlx4, really.

The rest is just bookkeeping and logging that doesn't require special
verbs support for an app to implement.

Jason
* Re: [PATCH for-next V2 0/9] Add completion timestamping support
From: Doug Ledford @ 2015-06-02 14:35 UTC
To: Jason Gunthorpe
Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Mon, 2015-06-01 at 10:43 -0600, Jason Gunthorpe wrote:
> On Mon, Jun 01, 2015 at 07:25:04AM -0400, Doug Ledford wrote:
>
> > attempted abstraction of ibverbs. Passing in the wc struct allows the
> > driver to internally allocate a wc struct with extra private elements
> > and pass that back to the user, when the user passes it back to
> > ibv_get_timestamp the elements are there in the private portion of the
> > struct.
>
> wc structures are allocated by the caller, there is no option for the
> driver to create private elements.

Well, they *are* using an extended work completion structure. Unlike
what I mentioned, where they create a larger one themselves, you have to
allocate a struct ibv_wc_ex instead of a struct ibv_wc and then you have
to call poll_cq_ex, which expects a struct ibv_wc_ex.

So, just so everyone is clear on this point: the current user space
implementation of this feature creates an unversioned, newly named
ibv_wc_ex struct that is ibv_wc with a 64bit timestamp tacked on at the
end (not 64bit aligned either). If we ever wanted to have a different
extension to our ibv_wc struct, there is no good way to do that. If, at
some point, we had multiple extensions and the user was able to select
which they wanted to utilize, this structure extension is not flexible
enough to deal with that. At a minimum, if we are going to have a one
shot extension to the wc struct like this, I would prefer to see it
called struct ibv_wc_timestamp and there be an ibv_poll_cq_timestamp. At
least that way people would not use the generic _ex and assume this is
the one and only _ex that we will ever need for work completions.

Jason, when the XRC and flow steering extensions were added to
libibverbs, you complained loudly that they were not added in the agreed
upon format and cited a previous on list discussion. Do you have a link
to that discussion?

> AFAIK, Christoph's use case is essentially the only meaningful use
> case for this feature, generalizing too much may destroy the
> performance that is valuable here.

There is actually room in a 64byte cacheline for two 64bit timestamps
and another 2 bytes of padding or something else.

--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD
* Re: [PATCH for-next V2 0/9] Add completion timestamping support
From: Or Gerlitz @ 2015-06-02 14:44 UTC
To: Doug Ledford
Cc: Jason Gunthorpe, Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On 6/2/2015 5:35 PM, Doug Ledford wrote:
> On Mon, 2015-06-01 at 10:43 -0600, Jason Gunthorpe wrote:
>> On Mon, Jun 01, 2015 at 07:25:04AM -0400, Doug Ledford wrote:
>>
>>> attempted abstraction of ibverbs. Passing in the wc struct allows the
>>> driver to internally allocate a wc struct with extra private elements
>>> and pass that back to the user, when the user passes it back to
>>> ibv_get_timestamp the elements are there in the private portion of the
>>> struct.
>> wc structures are allocated by the caller, there is no option for the
>> driver to create private elements.
> Well, they *are* using an extended work completion structure. Unlike
> what I mentioned, where they create a larger one themselves, you have to
> allocate a struct ibv_wc_ex instead of a struct ibv_wc and then you have
> to call poll_cq_ex, which expects a struct ibv_wc_ex.
>
> So, just so everyone is clear on this point: the current user space
> implementation of this feature creates an unversioned, newly named
> ibv_wc_ex struct that is ibv_wc with a 64bit timestamp tacked on at the
> end (not 64bit aligned either). If we ever wanted to have a different
> extension to our ibv_wc struct, there is no good way to do that. If, at
> some point, we had multiple extensions and the user was able to select
> which they wanted to utilize, this structure extension is not flexible
> enough to deal with that. At a minimum, if we are going to have a one
> shot extension to the wc struct like this, I would prefer to see it
> called struct ibv_wc_timestamp and there be an ibv_poll_cq_timestamp. At
> least that way people would not use the generic _ex and assume this is
> the one and only _ex that we will ever need for work completions.
>
> Jason, when the XRC and flow steering extensions were added to
> libibverbs, you complained loudly that they were not added in the agreed
> upon format and cited a previous on list discussion. Do you have a link
> to that discussion?

Doug,

Do we agree that this part of the discussion (and also the below point)
are related to the libibverbs API to applications and not to the kernel
-> user API to support time-stamping?

Or.

>
>> AFAIK, Christoph's use case is essentially the only meaningful use
>> case for this feature, generalizing too much may destroy the
>> performance that is valuable here.
> There is actually room in a 64byte cacheline for two 64bit timestamps
> and another 2 bytes of padding or something else.
* Re: [PATCH for-next V2 0/9] Add completion timestamping support
From: Doug Ledford @ 2015-06-02 14:53 UTC
To: Or Gerlitz
Cc: Jason Gunthorpe, Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Tue, 2015-06-02 at 17:44 +0300, Or Gerlitz wrote:
> On 6/2/2015 5:35 PM, Doug Ledford wrote:
> > On Mon, 2015-06-01 at 10:43 -0600, Jason Gunthorpe wrote:
> >> On Mon, Jun 01, 2015 at 07:25:04AM -0400, Doug Ledford wrote:
> >>
> >>> attempted abstraction of ibverbs. Passing in the wc struct allows the
> >>> driver to internally allocate a wc struct with extra private elements
> >>> and pass that back to the user, when the user passes it back to
> >>> ibv_get_timestamp the elements are there in the private portion of the
> >>> struct.
> >> wc structures are allocated by the caller, there is no option for the
> >> driver to create private elements.
> > Well, they *are* using an extended work completion structure. Unlike
> > what I mentioned, where they create a larger one themselves, you have to
> > allocate a struct ibv_wc_ex instead of a struct ibv_wc and then you have
> > to call poll_cq_ex, which expects a struct ibv_wc_ex.
> >
> > So, just so everyone is clear on this point: the current user space
> > implementation of this feature creates an unversioned, newly named
> > ibv_wc_ex struct that is ibv_wc with a 64bit timestamp tacked on at the
> > end (not 64bit aligned either). If we ever wanted to have a different
> > extension to our ibv_wc struct, there is no good way to do that. If, at
> > some point, we had multiple extensions and the user was able to select
> > which they wanted to utilize, this structure extension is not flexible
> > enough to deal with that. At a minimum, if we are going to have a one
> > shot extension to the wc struct like this, I would prefer to see it
> > called struct ibv_wc_timestamp and there be an ibv_poll_cq_timestamp. At
> > least that way people would not use the generic _ex and assume this is
> > the one and only _ex that we will ever need for work completions.
> >
> > Jason, when the XRC and flow steering extensions were added to
> > libibverbs, you complained loudly that they were not added in the agreed
> > upon format and cited a previous on list discussion. Do you have a link
> > to that discussion?
>
> Doug,
>
> Do we agree that this part of the discussion (and also the below point)
> are related to the libibverbs API to applications and not to the kernel
> -> user API to support time-stamping?

Yes.

> Or.
>
> >
> >> AFAIK, Christoph's use case is essentially the only meaningful use
> >> case for this feature, generalizing too much may destroy the
> >> performance that is valuable here.
> > There is actually room in a 64byte cacheline for two 64bit timestamps
> > and another 2 bytes of padding or something else.

--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD
* Re: [PATCH for-next V2 0/9] Add completion timestamping support
From: Jason Gunthorpe @ 2015-06-02 18:08 UTC
To: Doug Ledford
Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Tue, Jun 02, 2015 at 10:35:24AM -0400, Doug Ledford wrote:

> So, just so everyone is clear on this point: the current user space
> implementation of this feature creates an unversioned, newly named
> ibv_wc_ex struct that is ibv_wc with a 64bit timestamp tacked on at the
> end (not 64bit aligned either). If we ever wanted to have a different
> extension to our ibv_wc struct, there is no good way to do that.

No, if they followed (I didn't check yet) the extension scheme then the
poll call is

  struct ibv_wc_ex wcs[num_wcs];
  ibv_poll_wc_ex(&wcs,num_wcs,sizeof(wcs[0]));

And the drivers decide what to do based on the 3rd argument, which is
essentially the ABI version.

> Jason, when the XRC and flow steering extensions were added to
> libibverbs, you complained loudly that they were not added in the agreed
> upon format and cited a previous on list discussion. Do you have a link
> to that discussion?

Not off hand, but that was different, that was a misuse of comp_mask,
IIRC.

Or, the question in my mind based on looking at the UAPI patches is
what things should be driver private and what should be general.

Broadly my thoughts:
 - Should the frequency and mask be general, or driver private? If the
   cycles->ns conversion is a function they should be driver private.
   Even if they are general at libibverbs, they don't *have* to be in
   the kernel's general query response.
 - Should frequency even be frequency? Most clocks are expressed
   accurately as a period in picoseconds. Frequency is more often
   imprecise. (eg ethernet is 3200 ps or 312.5MHz)
   However FDR/EDR is fractional for both (4693.33333333 ps vs
   213.0681818181818 MHz)
   Precision is very important for time conversions, so a
   multiply-divide scheme would be ideal.
   This is suggesting to me these details really are not
   general.
 - There should be much better definition on what all this stuff is,
   units for frequency? When is the timestamp applied?
 - Should an app even be exposed to mask? This is very difficult
   to use correctly in the general case. Only cases where an app is
   restarted more often than a wrap period are trivial to use properly.

Jason
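The multiply-divide idea and the wrap problem raised above can be sketched as
follows. This is an illustration only, not code from the patchset:
struct clock_ratio, cycles_to_ns and cycles_elapsed are hypothetical names.
The ratios used in the comments follow from the periods quoted in the
message: 3200 ps per cycle is 16/5 ns, and 4693.33... ps per cycle is
352/75 ns, so both clocks are represented exactly with integers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical exact clock description: ns = cycles * mult / div. */
struct clock_ratio {
    uint64_t mult;
    uint64_t div;
};

/*
 * Exact integer conversion, truncating toward zero.
 * Splitting into quotient and remainder avoids forming cycles * mult,
 * which could overflow 64 bits for large cycle counts.
 */
static uint64_t cycles_to_ns(uint64_t cycles, struct clock_ratio r)
{
    assert(r.div != 0);

    uint64_t whole = cycles / r.div;
    uint64_t rem = cycles % r.div;

    return whole * r.mult + (rem * r.mult) / r.div;
}

/*
 * Cycles elapsed between two samples of a counter that is only
 * as wide as mask. Correct only if fewer than one full wrap period
 * passed between the two samples; the caller cannot detect more.
 */
static uint64_t cycles_elapsed(uint64_t earlier, uint64_t later, uint64_t mask)
{
    return (later - earlier) & mask;
}
```

The caveat on cycles_elapsed is the difficulty with exposing the mask: an
application that samples less often than the wrap period silently gets a
wrong interval.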
* Re: [PATCH for-next V2 0/9] Add completion timestamping support
From: Doug Ledford @ 2015-06-02 18:51 UTC
To: Jason Gunthorpe
Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Tue, 2015-06-02 at 12:08 -0600, Jason Gunthorpe wrote:
> On Tue, Jun 02, 2015 at 10:35:24AM -0400, Doug Ledford wrote:
>
> > So, just so everyone is clear on this point: the current user space
> > implementation of this feature creates an unversioned, newly named
> > ibv_wc_ex struct that is ibv_wc with a 64bit timestamp tacked on at the
> > end (not 64bit aligned either). If we ever wanted to have a different
> > extension to our ibv_wc struct, there is no good way to do that.
>
> No, if they followed (I didn't check yet) the extension scheme then the
> poll call is
>
>   struct ibv_wc_ex wcs[num_wcs];
>   ibv_poll_wc_ex(&wcs,num_wcs,sizeof(wcs[0]));
>
> And the drivers decide what to do based on the 3rd argument, which is
> essentially the ABI version.

Ick. OK. I would *much* prefer something done akin to the routines in
packer.c of the kernel, but that's not my call to make, the decision on
the ABI/API extension mechanism was made long ago. It does, however,
mean that extensions are serial and not modular, and that's a shame.

> > Jason, when the XRC and flow steering extensions were added to
> > libibverbs, you complained loudly that they were not added in the agreed
> > upon format and cited a previous on list discussion. Do you have a link
> > to that discussion?
>
> Not off hand, but that was different, that was a misuse of comp_mask,
> IIRC.
>
> Or, the question in my mind based on looking at the UAPI patches is
> what things should be driver private and what should be general.
>
> Broadly my thoughts:
>  - Should the frequency and mask be general, or driver private? If the
>    cycles->ns conversion is a function they should be driver private.
>    Even if they are general at libibverbs, they don't *have* to be in
>    the kernel's general query response.
>  - Should frequency even be frequency? Most clocks are expressed
>    accurately as a period in picoseconds. Frequency is more often
>    imprecise. (eg ethernet is 3200 ps or 312.5MHz)
>    However FDR/EDR is fractional for both (4693.33333333 ps vs
>    213.0681818181818 MHz)
>    Precision is very important for time conversions, so a
>    multiply-divide scheme would be ideal.
>    This is suggesting to me these details really are not
>    general.
>  - There should be much better definition on what all this stuff is,
>    units for frequency? When is the timestamp applied?
>  - Should an app even be exposed to mask? This is very difficult
>    to use correctly in the general case. Only cases where an app is
>    restarted more often than a wrap period are trivial to use properly.
>
> Jason

--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <1433271083.40123.1.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2015-06-02 19:04 ` Jason Gunthorpe [not found] ` <20150602190410.GA23362-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread From: Jason Gunthorpe @ 2015-06-02 19:04 UTC (permalink / raw) To: Doug Ledford Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon On Tue, Jun 02, 2015 at 02:51:23PM -0400, Doug Ledford wrote: > On Tue, 2015-06-02 at 12:08 -0600, Jason Gunthorpe wrote: > > On Tue, Jun 02, 2015 at 10:35:24AM -0400, Doug Ledford wrote: > > > > > So, just so everyone is clear on this point: the current user space > > > implementation of this feature creates an unversioned, newly named > > > ibv_wc_ex struct that is ibv_wc with a 64bit timestamp tacked on at the > > > end (not 64bit aligned either). If we ever wanted to have a different > > > extension to our ibv_wc struct, there is no good way to do that. > > > > No, if they followed (I didn't check yes) the extension scheme then the > > poll call is > > > > struct ibv_wc_ex wcs[num_wcs] > > ibv_poll_wc_ex(&wcs,num_wcs,sizeof(wcs[0])); > > > > And the drivers decide what to do based on the 3rd argument, which is > > essentially the ABI version. > > Ick. OK. I would *much* prefer something done akin to the routines in > packer.c of the kernel, but that's not my call to make, the decision on > the ABI/API extension mechanism was made long ago. It does, however, > mean that extensions are serial and not modular, and that's a shame. All verbs extensions are essentially serial, each extension requires a fixed allocation of structure bytes, made by upstream. This is also why no vendor may ship an extension that is not upstream and continue to use the same soname as upstream. Similarly for the kernel. 
This is fairly performance neutral, while a packer.c scheme would be unacceptably expensive, IMHO. poll_wc is one of the most performance sensitive routines in the library. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
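The size-as-version scheme described above can be sketched as follows. This is a minimal illustration only: `demo_wc`, `demo_wc_ex` and `demo_poll` are hypothetical names, not the libibverbs structures or entry points, and the completions are synthesized rather than read from hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts for illustration; not the real libibverbs structs. */
struct demo_wc {              /* original entry */
    uint64_t wr_id;
    uint32_t status;
    uint32_t byte_len;
};

struct demo_wc_ex {           /* same prefix, with a timestamp appended */
    uint64_t wr_id;
    uint32_t status;
    uint32_t byte_len;
    uint64_t timestamp;       /* raw HCA cycles */
};

/*
 * Fill up to num entries of entry_size bytes each. The caller passes
 * sizeof() of its own entry type, which acts as the ABI version and
 * tells the provider how many fields it may write.
 * Returns the number of entries written, or -1 for an unknown size.
 */
int demo_poll(void *entries, int num, size_t entry_size)
{
    assert(entries != NULL && num >= 0);
    if (entry_size != sizeof(struct demo_wc) &&
        entry_size != sizeof(struct demo_wc_ex))
        return -1;

    for (int i = 0; i < num; i++) {
        /* Synthesized completion standing in for a hardware CQE. */
        struct demo_wc_ex full = {
            .wr_id = 100 + (uint64_t)i,
            .status = 0,
            .byte_len = 64,
            .timestamp = 1000 + (uint64_t)i,
        };
        /* Copy only the prefix that the caller's struct can hold. */
        memcpy((char *)entries + (size_t)i * entry_size, &full, entry_size);
    }
    return num;
}
```

Because each new field can only be appended to the previous layout, extensions under this scheme are necessarily serial, which is the property being debated in the thread.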
[parent not found: <20150602190410.GA23362-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <20150602190410.GA23362-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-06-02 19:56 ` Doug Ledford 0 siblings, 0 replies; 64+ messages in thread From: Doug Ledford @ 2015-06-02 19:56 UTC (permalink / raw) To: Jason Gunthorpe Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon [-- Attachment #1: Type: text/plain, Size: 2886 bytes --] On Tue, 2015-06-02 at 13:04 -0600, Jason Gunthorpe wrote: > On Tue, Jun 02, 2015 at 02:51:23PM -0400, Doug Ledford wrote: > > On Tue, 2015-06-02 at 12:08 -0600, Jason Gunthorpe wrote: > > > On Tue, Jun 02, 2015 at 10:35:24AM -0400, Doug Ledford wrote: > > > > > > > So, just so everyone is clear on this point: the current user space > > > > implementation of this feature creates an unversioned, newly named > > > > ibv_wc_ex struct that is ibv_wc with a 64bit timestamp tacked on at the > > > > end (not 64bit aligned either). If we ever wanted to have a different > > > > extension to our ibv_wc struct, there is no good way to do that. > > > > > > No, if they followed (I didn't check yes) the extension scheme then the > > > poll call is > > > > > > struct ibv_wc_ex wcs[num_wcs] > > > ibv_poll_wc_ex(&wcs,num_wcs,sizeof(wcs[0])); > > > > > > And the drivers decide what to do based on the 3rd argument, which is > > > essentially the ABI version. > > > > Ick. OK. I would *much* prefer something done akin to the routines in > > packer.c of the kernel, but that's not my call to make, the decision on > > the ABI/API extension mechanism was made long ago. It does, however, > > mean that extensions are serial and not modular, and that's a shame. > > All verbs extensions are essentially serial, each extension requires a > fixed allocation of structure bytes, made by upstream. > > This is also why no vendor may ship an extension that is not upstream > and continue to use the same soname as upstream. Similarly for the > kernel. 
> > This is fairly performance neutral, while a packer.c scheme would be > unacceptably expensive, IMHO. poll_wc is one of the most performance > sensitive routines in the library. I disagree. Obviously I haven't run them in a tight loop to confirm, but I looked at mthca, mlx4, and cxgb4 user libraries, and all of them have complex *_poll_one routines that convert their internal cqe's to wc's. The packer routines aren't any more complex or any slower (at least not necessarily, it all depends on the particular transformation needed). The packer routines are just hard to read. And, as Christoph pointed out, we can keep our wc in a single cache line right now. However, we only need a few extensions to blow that out of the water. If some extension comes along that gets allocated past the 64byte cacheline size, and that extension is used far more frequently than say this timestamp, then we will have forced a cache line break on a frequently used item for a less frequently used item. So, there would be benefits to a modular approach in terms of allowing the user to select what items they want and to keep their important items in that single cache line. -- Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> GPG KeyID: 0E572FDD [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 64+ messages in thread
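The cache-line concern can be made concrete with a small sketch. The structures below are hypothetical stand-ins (not the real `ibv_wc`), and a 64-byte cache line is assumed, as in the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64u   /* assumed cache line size in bytes */

/* Hypothetical completion carrying only the timestamp extension. */
struct demo_wc_ts {
    uint64_t wr_id;
    uint32_t status;
    uint32_t byte_len;
    uint64_t timestamp;
};

/* Hypothetical completion after several more serial extensions. */
struct demo_wc_many {
    struct demo_wc_ts base;
    uint8_t later_extensions[48];
};

/* Cache lines spanned by one line-aligned entry of the given size. */
size_t demo_lines_per_entry(size_t entry_size)
{
    assert(entry_size > 0);
    return (entry_size + DEMO_CACHE_LINE - 1) / DEMO_CACHE_LINE;
}
```

With these made-up layouts the first structure fits in one line and the second spills into a second one; any field placed past byte 64 costs an extra line fetch regardless of how often the earlier fields are used.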
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <20150602180844.GD17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2015-06-02 18:51 ` Doug Ledford @ 2015-06-03 19:48 ` Or Gerlitz [not found] ` <CAJ3xEMjZqwJC8=HrMBv_eSVKVtLW7qjVZNjtuD28octGPtYu6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2015-06-04 16:17 ` Hefty, Sean 2 siblings, 1 reply; 64+ messages in thread
From: Or Gerlitz @ 2015-06-03 19:48 UTC (permalink / raw)
To: Jason Gunthorpe, Christoph Lameter
Cc: Doug Ledford, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Tue, Jun 2, 2015 at 9:08 PM, Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:

> Or, the question in my mind based on looking at the UAPI patches is
> what things should be driver private and what should be general.
>
> Broadly my thoughts:
> - Should the frequency and mask be general, or driver private? If the
> cycles->ns conversion is a function they should be driver private.
> Even if they are general at libibverbs, they don't *have* to be in
> the kernel's general query response.

If they are general in libibverbs, what's the point of not putting them in the kernel's general query response?

> - Should frequency even be frequency? Most clocks are expressed
> accurately as a period in picoseconds. Frequency is more often
> imprecise. (eg ethernet is 3200 ps or 312.5MHz)
> However FDR/EDR is fractional for both (4693.33333333 ps vs
> 213.0681818181818 MHz)
> Precision is very important for time conversions, so a
> multiply-divide scheme would be ideal.

From Christoph's response I got the impression that our proposal of exposing frequency and mask combined with raw time stamps excellently fits typical user needs, so I thought we're good. Doug made a comment that things look OK to him and the rest of the work would be when we come to review the user-space patches.

> This is suggesting to me these details really are not general.
> - There should be much better definition on what all this stuff is,
> units for frequency? When is the timestamp applied?

The timestamp is applied when the WC is generated; as Doug asked, we changed the flag name to reflect that. I guess that the units for frequency are MHz, but I will check that and we can document it in the kernel IB core patch and later in man pages.

> - Should an app even be exposed to mask? This is very difficult
> to use correctly in the general case. Only cases where an app is
> restarted more often than a wrap period are trivial to use properly.

^ permalink raw reply [flat|nested] 64+ messages in thread
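To illustrate why the mask is hard to use correctly, here is a minimal sketch of a wrap-aware difference between two raw timestamps. The function name is hypothetical, and the mask is assumed to cover contiguous low-order bits (for example 48 bits); neither assumption comes from the patches.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Cycles elapsed from 'earlier' to 'later' for a counter that only
 * implements the bits set in 'mask' (assumed: contiguous low bits).
 * The result is correct only if less than one full wrap period
 * separates the two samples; longer gaps silently alias.
 */
uint64_t demo_ts_delta(uint64_t earlier, uint64_t later, uint64_t mask)
{
    assert(mask != 0 && (mask & (mask + 1)) == 0);
    return (later - earlier) & mask;
}
```

The masked subtraction handles a single wrap, but nothing in the raw value says how many wraps occurred, which is the general-case difficulty raised in the quoted bullet.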
[parent not found: <CAJ3xEMjZqwJC8=HrMBv_eSVKVtLW7qjVZNjtuD28octGPtYu6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <CAJ3xEMjZqwJC8=HrMBv_eSVKVtLW7qjVZNjtuD28octGPtYu6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2015-06-03 20:46 ` Jason Gunthorpe [not found] ` <20150603204633.GE7902-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-03 20:46 UTC (permalink / raw)
To: Or Gerlitz
Cc: Christoph Lameter, Doug Ledford, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Wed, Jun 03, 2015 at 10:48:28PM +0300, Or Gerlitz wrote:

> > - Should the frequency and mask be general, or driver private? If the
> > cycles->ns conversion is a function they should be driver private.
> > Even if they are general at libibverbs, they don't *have* to be in
> > the kernel's general query response.
>
> If they are general in libibverbs, what's the point not to put them in
> the kernel's general query response?

If there is a timestamp_to_ns API then they would not be general in libibverbs either.

> > - Should frequency even be frequency? Most clocks are expressed
> > accurately as a period in picoseconds. Frequency is more often
> > imprecise. (eg ethernet is 3200 ps or 312.5MHz)
> > However FDR/EDR is fractional for both (4693.33333333 ps vs
> > 213.0681818181818 MHz)
> > Precision is very important for time conversions, so a
> > multiply-divide scheme would be ideal.
>
> From Christoph's response I got the impression that our proposal of
> exposing frequency and mask combined with raw time stamps excellently
> fits typical user needs, so I thought we're good. Doug made a comment
> that things look OK to him and the rest of the work would be when we
> come to review the user-space patches.

This response ignores my point about precision.

MHz is fine *for mlx hardware*, but for someone else's hardware that uses, say, 312.5 MHz (i.e. the ethernet symbol clock) it is NOT OK, because integer MHz loses too much precision.

Jason

^ permalink raw reply [flat|nested] 64+ messages in thread
[parent not found: <20150603204633.GE7902-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <20150603204633.GE7902-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-06-04 0:55 ` Christoph Lameter [not found] ` <alpine.DEB.2.11.1506031953500.11879-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2015-06-04 0:55 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Or Gerlitz, Doug Ledford, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Wed, 3 Jun 2015, Jason Gunthorpe wrote:

> MHz is fine *for mlx hardware* but someone elses hardware that uses,
> say 312.5 MHz (ie the ethernet symbol clock) is NOT OK because MHz
> looses too much precision.

Oscillators vary in frequency. In order to convert accurately to ns, the drift due to temperature etc. needs to be taken into consideration. The "ns" value there is pretty rough as well. Accurate time may need time software to continually monitor the *actual* frequency of the oscillator.

I think the raw cycles and the rough oscillator speed are fine.

^ permalink raw reply [flat|nested] 64+ messages in thread
[parent not found: <alpine.DEB.2.11.1506031953500.11879-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>]
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <alpine.DEB.2.11.1506031953500.11879-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org> @ 2015-06-04 4:25 ` Jason Gunthorpe [not found] ` <20150604042540.GA8837-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-04 4:25 UTC (permalink / raw)
To: Christoph Lameter
Cc: Or Gerlitz, Doug Ledford, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Wed, Jun 03, 2015 at 07:55:58PM -0500, Christoph Lameter wrote:

> I think the raw cycles and the rough oscillator speed are fine.

Time keeping is designed to adjust for 100's of ppm drift between clocks.

A communications clock source will be spec'd to be below 200ppm in accuracy. IB clocks are below 100 ppm, and PCI-E is 300ppm (approx, I didn't check, order of magnitude is close)

That translates into 0.0625 Hz. for a 312.5 MHz ethernet reference clock

Compared to 5,000,000 Hz in error from rounding.

So no, I disagree that rough is fine for anything.

Jason

^ permalink raw reply [flat|nested] 64+ messages in thread
[parent not found: <20150604042540.GA8837-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <20150604042540.GA8837-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-06-06 8:16 ` Christoph Lameter [not found] ` <alpine.DEB.2.11.1506060304480.29634-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2015-06-06 8:16 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Or Gerlitz, Doug Ledford, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Wed, 3 Jun 2015, Jason Gunthorpe wrote:

> On Wed, Jun 03, 2015 at 07:55:58PM -0500, Christoph Lameter wrote:
>
> > I think the raw cycles and the rough oscillator speed are fine.
>
> Time keeping is designed to adjust for 100's of ppm drift between
> clocks.

What time keeping? NTP? PTP is supposed to be accurate to 10s of ns and we would need an accuracy in that range.

> A communications clock source will be spec'd to be below 200ppm in
> accuracy. IB clocks are below 100 ppm, and PCI-E is 300ppm (approx, I
> didn't check, order of magnitude is close)

Well, that is not usable. A ns is a billionth of a second, which is the unit of measurement for these activities here. A send action can be around 600-1000ns. If we are off by 200ppm then that is 200 microseconds, meaning 200000 ns. And it's our experience that these clocks can be off by milliseconds in practice.

> That translates into 0.0625 Hz. for a 312.5 MHz ethernet reference clock

Ok, that is around 3ns per cycle? And you think the accuracy is therefore in femtoseconds? I have never seen something that accurate. Wish something like that would exist. Maybe in some labs that provide the source of global timekeeping?

> Compared to 5,000,000 Hz in error from rounding.

Huh?

> So no, I disagree that rough is fine for anything.

I am sorry, but the practical issues that we are dealing with in timekeeping today show just the opposite. For a true comparison of clocks with nanosecond accuracy you would need time corrected values, and that is a challenge due to the variances of the clocks that we see.

^ permalink raw reply [flat|nested] 64+ messages in thread
[parent not found: <alpine.DEB.2.11.1506060304480.29634-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>]
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <alpine.DEB.2.11.1506060304480.29634-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org> @ 2015-06-06 15:45 ` Doug Ledford [not found] ` <1433605546.40123.217.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread From: Doug Ledford @ 2015-06-06 15:45 UTC (permalink / raw) To: Christoph Lameter Cc: Jason Gunthorpe, Or Gerlitz, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon [-- Attachment #1: Type: text/plain, Size: 3288 bytes --] On Sat, 2015-06-06 at 03:16 -0500, Christoph Lameter wrote: > On Wed, 3 Jun 2015, Jason Gunthorpe wrote: > > > On Wed, Jun 03, 2015 at 07:55:58PM -0500, Christoph Lameter wrote: > > > > > I thknk the raw cycles and the rought oscillator speed are fine. > > > > Time keeping is designed to adjust for 100's of ppm drift between > > clocks. > > What time keeping? Ntp? pptp is supposed to be accurate to 10s of ns and > we would need an accuracy in that range. > > > A communications clock source will be spec'd to be below 200ppm in > > accuracy. IB clocks are below 100 ppm, and PCI-E is 300ppm (approx, I > > didn't check, order of magnitue is close) > > Well that is not usable. ns are a billionth of a second which is the unit > of measurement of these activities here. A send action can be around 600-1000ns. > If we are off by 200ppm then that is 200 microseconds meaning 200000 ns. > And its our experience that these clocks can be off by milliseconds in > practice. The ppm rating is based upon the speed of the clock, not time. It's how many cycles of variance you are allowed from the target speed given in cycles / millions of cycles of the target clock frequency. 
If you have a 312.5MHz clock, and your accuracy is specified as 100ppm, then the total clock variability is 312.5 * 100 = 31250 cycles (I suspect that this is an absolute variance, and so the tolerance would be +-1/2 of the total amount, but I don't know that for certain). > > That translates into 0.0625 Hz. for a 312.5 MHz ethernet reference clock I don't know how this number is derived, but 0.0625Hz sounds like an odd variance. > Ok that is around 3ns per cycle? And you think the accuracy is therefore > in femtoseconds? I have never seen something that accurate. Wish something > like that would exist. Maybe in some labs that provide the source of > global timekeeping? > > > Compared to 5,000,000 Hz in error from rounding. > > Huh? He's pointing out that the design as specified passes the clock frequency to the user space library in terms of integer MHz. The standard Ethernet clock frequency is 312.5MHz. That .5MHz, or 500,000Hz, must be rounded off as it is passed from the kernel to the user space library. And that 500,000 cycle per second error in the stated speed of the clock is *way* larger than the +- error variance in the clocks you are using. If you are having problems keeping your time numbers synchronized, then this is likely a bigger problem than the variance of the clocks. > > So no, I disagree that rough is fine for anything. > > I am sorry but the practical issues that we are dealing with in > timekeeping today shows just the opposite. For a true comparison of clocks > with nanosecond accuracy you would need time corrected values and that is > a challenge due to the variances of the clocks that we see. Jason's point, and one that isn't addressed yet, is that this might not be variance in the clocks and instead might be a design flaw in the API you are using and the way the clock speeds are passed to user space. Changing from int MHz to int KHz might solve your problem. 
-- Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> GPG KeyID: 0E572FDD [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 64+ messages in thread
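The magnitudes in this exchange can be checked with a short calculation. The helper names below are hypothetical, and the arithmetic is offered as a sketch rather than as anyone's stated derivation: 100 ppm of 312.5 MHz comes to 31,250 Hz and 200 ppm to 62,500 Hz (0.0625 MHz, which suggests the "0.0625" figure quoted above may have been meant in MHz), while truncating 312.5 MHz to an integer number of MHz discards 500,000 Hz.

```c
#include <assert.h>
#include <stdint.h>

/* Tolerance in Hz for a clock of freq_hz specified to +/- ppm. */
uint64_t demo_ppm_to_hz(uint64_t freq_hz, uint64_t ppm)
{
    /* freq_hz * ppm must fit in 64 bits; fine for the values used here. */
    return freq_hz * ppm / 1000000u;
}

/* Hz lost when freq_hz is reported as a whole number of unit_hz. */
uint64_t demo_truncation_error_hz(uint64_t freq_hz, uint64_t unit_hz)
{
    assert(unit_hz != 0);
    return freq_hz % unit_hz;
}
```

For a 312,500,000 Hz clock, an integer-MHz report is off by 500,000 Hz, eight times the 200 ppm tolerance, whereas an integer-kHz report is exact for this particular frequency. A fractional frequency such as the FDR/EDR value mentioned earlier would still lose something in kHz.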
[parent not found: <1433605546.40123.217.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <1433605546.40123.217.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2015-06-06 20:25 ` Or Gerlitz 2015-06-07 4:42 ` Jason Gunthorpe 2015-06-07 4:49 ` Christoph Lameter 2 siblings, 0 replies; 64+ messages in thread From: Or Gerlitz @ 2015-06-06 20:25 UTC (permalink / raw) To: Doug Ledford Cc: Christoph Lameter, Jason Gunthorpe, Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon On Sat, Jun 6, 2015 at 6:45 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: >>> So no, I disagree that rough is fine for anything. >> I am sorry but the practical issues that we are dealing with in >> timekeeping today shows just the opposite. For a true comparison of clocks >> with nanosecond accuracy you would need time corrected values and that is >> a challenge due to the variances of the clocks that we see. > Jason's point, and one that isn't addressed yet, is that this might not > be variance in the clocks and instead might be a design flaw in the API > you are using and the way the clock speeds are passed to user space. > Changing from int MHz to int KHz might solve your problem. OK, so if we have the UAPI to pass the clock frequency in KHz that would put us in a better place? seems very much doable. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <1433605546.40123.217.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2015-06-06 20:25 ` Or Gerlitz @ 2015-06-07 4:42 ` Jason Gunthorpe 2015-06-07 4:49 ` Christoph Lameter 2 siblings, 0 replies; 64+ messages in thread
From: Jason Gunthorpe @ 2015-06-07 4:42 UTC (permalink / raw)
To: Doug Ledford
Cc: Christoph Lameter, Or Gerlitz, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Sat, Jun 06, 2015 at 11:45:46AM -0400, Doug Ledford wrote:

> The ppm rating is based upon the speed of the clock, not time. It's how
> many cycles of variance you are allowed from the target speed given in
> cycles / millions of cycles of the target clock frequency.

Right, it is 'parts per million'. Hz, ppm, and a measure of 'phase noise' are the three technical characteristics used to define a communications clock source.

> > > That translates into 0.0625 Hz. for a 312.5 MHz ethernet reference clock
>
> I don't know how this number is derived, but 0.0625Hz sounds like an odd
> variance.

I used 200ppm for that example, as it is closer to the ethernet worst case.

> He's pointing out that the design as specified passes the clock
> frequency to the user space library in terms of integer MHz. The
> standard Ethernet clock frequency is 312.5MHz. That .5MHz, or
> 500,000Hz, must be rounded off as it is passed from the kernel to
> the

Yes, right, thank you, I thought I was losing my mind :)

> Jason's point, and one that isn't addressed yet, is that this might not
> be variance in the clocks and instead might be a design flaw in the API
> you are using and the way the clock speeds are passed to user space.
> Changing from int MHz to int KHz might solve your problem.

I would use a period in picoseconds to describe a clock..

But really, if you start talking about IEEE 1588 (the PTP standard Christoph mentioned) then even that is not enough accuracy to represent a synchronized clock, and the frequency may change as the NIC adjusts the time base.

So, if verbs has a 'time stamp to timespec' driver callback then that would be most general. Userspace that needs high speed self managed conversion can call 'time stamp to timespec' once with the value 1E9 and learn the general clock period, exactly as if it was in query_device. So nothing is lost by this small API change.

Which suggests to me that this shouldn't be in the kernel general UAPI at all.

That just leaves the mask. I dislike it, but.. if Or says it is too hard to fix then we are stuck with it. I don't think the wrapping can really be fixed unless the HW generates a CQE entry for every counter roll over. At least I haven't had a better idea on the subject this week..

Jason

^ permalink raw reply [flat|nested] 64+ messages in thread
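A multiply-divide conversion of the kind discussed in the thread could look roughly like the sketch below. The function name and the choice of a `mult`/`div` pair are assumptions for illustration, not an API from the patch set. The ratios follow from the periods quoted earlier: 3200 ps per cycle is 16/5 ns, and 4693.33 ps per cycle is 352/75 ns.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert raw cycles to nanoseconds as cycles * mult / div, where
 * mult/div is the clock period in ns expressed as an exact fraction.
 * The multiplication is split so that the remainder term cannot
 * overflow; (cycles / div) * mult can still overflow for very large
 * cycle counts, which this sketch does not guard against.
 */
uint64_t demo_cycles_to_ns(uint64_t cycles, uint32_t mult, uint32_t div)
{
    assert(div != 0);
    uint64_t whole = cycles / div;
    uint64_t rem = cycles % div;
    return whole * mult + rem * mult / div;
}
```

Calling such a function once with 1E9 cycles, as suggested above, yields 3,200,000,000 ns for the 16/5 ratio, from which a caller can recover the 3.2 ns period without the period ever appearing in a query response.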
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <1433605546.40123.217.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2015-06-06 20:25 ` Or Gerlitz 2015-06-07 4:42 ` Jason Gunthorpe @ 2015-06-07 4:49 ` Christoph Lameter 2 siblings, 0 replies; 64+ messages in thread
From: Christoph Lameter @ 2015-06-07 4:49 UTC (permalink / raw)
To: Doug Ledford
Cc: Jason Gunthorpe, Or Gerlitz, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Sat, 6 Jun 2015, Doug Ledford wrote:

> The ppm rating is based upon the speed of the clock, not time. It's how
> many cycles of variance you are allowed from the target speed given in
> cycles / millions of cycles of the target clock frequency. If you have
> a 312.5MHz clock, and your accuracy is specified as 100ppm, then the
> total clock variability is 312.5 * 100 = 31250 cycles (I suspect that
> this is an absolute variance, and so the tolerance would be +-1/2 of the
> total amount, but I don't know that for certain).

Ok, well then you also have the problem that the clock may already be off in general by a certain factor from the true speed of the flow of time, due to manufacturing variances etc. We are only talking about the instability of the clock source while operating, it seems?

> > I am sorry but the practical issues that we are dealing with in
> > timekeeping today shows just the opposite. For a true comparison of clocks
> > with nanosecond accuracy you would need time corrected values and that is
> > a challenge due to the variances of the clocks that we see.
>
> Jason's point, and one that isn't addressed yet, is that this might not
> be variance in the clocks and instead might be a design flaw in the API
> you are using and the way the clock speeds are passed to user space.
> Changing from int MHz to int KHz might solve your problem.

That sounds doable. Maybe we need to look at how clock speeds are specified elsewhere?

man adjtimex

gives some ways that this is done in the general API for clock adjustment. Or maybe better look at IEEE 1588 for ways to specify the clock characteristics?

http://www.nist.gov/el/isd/ieee/ieee1588.cfm

^ permalink raw reply [flat|nested] 64+ messages in thread
* RE: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <20150602180844.GD17776-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2015-06-02 18:51 ` Doug Ledford 2015-06-03 19:48 ` Or Gerlitz @ 2015-06-04 16:17 ` Hefty, Sean [not found] ` <1828884A29C6694DAF28B7E6B8A82373A8FE4F16-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org> 2 siblings, 1 reply; 64+ messages in thread From: Hefty, Sean @ 2015-06-04 16:17 UTC (permalink / raw) To: Jason Gunthorpe, Doug Ledford Cc: Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon > No, if they followed (I didn't check yes) the extension scheme then the > poll call is > > struct ibv_wc_ex wcs[num_wcs] > ibv_poll_wc_ex(&wcs,num_wcs,sizeof(wcs[0])); > > And the drivers decide what to do based on the 3rd argument, which is > essentially the ABI version. As a couple of options: You could add a 'completion format' field to the cq attribute structure, which would allow poll to cast to a specific structure. If we wanted to go with Doug's earlier idea, you could add a 'provider context' field to the wc. This would allow calling back into the provider to retrieve more data. Though, this concept would be racy if other completions can overwrite whatever data was trying to be retrieved. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
[parent not found: <1828884A29C6694DAF28B7E6B8A82373A8FE4F16-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <1828884A29C6694DAF28B7E6B8A82373A8FE4F16-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2015-06-04 16:47 ` Jason Gunthorpe [not found] ` <20150604164759.GC27699-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2015-06-04 17:30 ` Doug Ledford 1 sibling, 1 reply; 64+ messages in thread From: Jason Gunthorpe @ 2015-06-04 16:47 UTC (permalink / raw) To: Hefty, Sean Cc: Doug Ledford, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon On Thu, Jun 04, 2015 at 04:17:35PM +0000, Hefty, Sean wrote: > > No, if they followed (I didn't check yes) the extension scheme then the > > poll call is > > > > struct ibv_wc_ex wcs[num_wcs] > > ibv_poll_wc_ex(&wcs,num_wcs,sizeof(wcs[0])); > > > > And the drivers decide what to do based on the 3rd argument, which is > > essentially the ABI version. > > As a couple of options: > > You could add a 'completion format' field to the cq attribute > structure, which would allow poll to cast to a specific structure. Subtle, and tricky to use.. But sure.. > If we wanted to go with Doug's earlier idea, you could add a > 'provider context' field to the wc. This would allow calling back > into the provider to retrieve more data. Though, this concept would > be racy if other completions can overwrite whatever data was trying > to be retrieved. Right, in the model we have where poll advances the HW completion queue, that is not straightforward at all. Can you share what the libfabric change would look like to add a timestamp field to the completion? That might be informative, libfabric is also consuming the kernel UAPI. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
[parent not found: <20150604164759.GC27699-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <20150604164759.GC27699-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-06-04 17:24 ` Hefty, Sean [not found] ` <1828884A29C6694DAF28B7E6B8A82373A8FE4F98-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread
From: Hefty, Sean @ 2015-06-04 17:24 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Doug Ledford, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

> Can you share what the libfabric change would look like to add a
> timestamp field to the completion? That might be informative,
> libfabric is also consuming the kernel UAPI.

Libfabric uses a CQ format option, so this would just require a new format. There is no type checking on CQ poll (read). The provider interface looks like this:

struct fi_ops_cq {
	size_t	size;
	ssize_t	(*read)(struct fid_cq *cq, void *buf, size_t count);
	ssize_t	(*readfrom)(struct fid_cq *cq, void *buf, size_t count,
			fi_addr_t *src_addr);

(We could construct static inline calls with type checking if someone wanted that.)

If I were adding timestamps, I would probably define a new completion structure with 2 u64 time stamp fields (start and end times), and figure out when start occurred, end occurred, and the timing metric later. :)

I would assume that these are non-wrapping values.

^ permalink raw reply [flat|nested] 64+ messages in thread
[parent not found: <1828884A29C6694DAF28B7E6B8A82373A8FE4F98-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* RE: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <1828884A29C6694DAF28B7E6B8A82373A8FE4F98-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2015-06-06 8:27 ` Christoph Lameter [not found] ` <alpine.DEB.2.11.1506060318570.29860-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2015-06-06 8:27 UTC (permalink / raw)
To: Hefty, Sean
Cc: Jason Gunthorpe, Doug Ledford, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon

On Thu, 4 Jun 2015, Hefty, Sean wrote:

> If I were adding timestamps, I would probably define a new completion
> structure with 2 u64 time stamp fields (start and end times), and figure
> out when start occurred, end occurred, and the timing metric later. :)

Not sure why you would need the start. The app knows when it submitted a send request, and incoming packets can be readily timed with taps if necessary. If you want the start on inbound packets then you have the challenge that the adapter needs to figure out when the first bit of the message actually arrived, and the timestamp information needs to be pushed all the way through the pipeline. Completion is easily done.

> I would assume that these are non-wrapping values.

It's fine what we have now, as far as I can tell. I am not sure why it is necessary to make this more complicated than it is now. We need a simple means to obtain the completion time and that is what the current implementation provides. There is even another vendor (Chelsio) who has a similar implementation.

^ permalink raw reply [flat|nested] 64+ messages in thread
* RE: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <alpine.DEB.2.11.1506060318570.29860-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org> @ 2015-06-08 16:40 ` Hefty, Sean [not found] ` <1828884A29C6694DAF28B7E6B8A82373A8FE5AB8-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread From: Hefty, Sean @ 2015-06-08 16:40 UTC (permalink / raw) To: Christoph Lameter Cc: Jason Gunthorpe, Doug Ledford, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon > > If I were adding timestamps, I would probably define a new completion > > structure with 2 u64 time stamp fields (start and end times), and figure > > out when start occurred, end occurred, and the timing metric later. :) > > Not sure why you would need the start. The app knows when it submitted a > send request and incoming packets can be readily timed with taps if > necessary. If you want the start on inbound packets then you have the > challenge that the adapter needs to figure out when the first bit of the > message actually arrived and the timestamp information needs to be pushed > through all the way through the pipeline. Completion is easily done. You're assuming that the only start time of interest is when a send operation has been posted. Jason asked what I would do with libfabric. That interface supports triggered operations. It has also been designed such that a rendezvous (that has to be one of the most difficult words in the English language to spell correctly, even with spell check) protocol could be implemented by the provider. On the receive side, it may be of interest to report the start and ending time for larger transfers, primarily for debugging purposes. I have no idea how the time stamps are expected to be used, so why limit it? An app could just as easily create their own time stamp when reading a work completion, especially when the data is going into an anonymous receive buffer. 
That would seem to work for your use case. > > I would assume that these are non-wrapping values. > > Its fine what we have now as far as I can tell. > > I am not sure why it is necessary to make this more complicated than it is > now. We need a simple means to obtain the completion time and that is what > the current implementation provides. There is even another vendor > (chelsio) who has a similar implementation. And I thought Chelsio mentioned that they have 2 time stamps. I have no problem with a bare metal interface exposing this. But pretending that it's generic and that this is the one and only way that this could be implemented doesn't make it so.
* RE: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <1828884A29C6694DAF28B7E6B8A82373A8FE5AB8-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2015-06-10 1:26 ` Christoph Lameter [not found] ` <alpine.DEB.2.11.1506092018190.6681-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread From: Christoph Lameter @ 2015-06-10 1:26 UTC (permalink / raw) To: Hefty, Sean Cc: Jason Gunthorpe, Doug Ledford, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon On Mon, 8 Jun 2015, Hefty, Sean wrote: > You're assuming that the only start time of interest is when a send operation has been posted. Jason asked what I would do with libfabric. That interface supports triggered operations. It has also been designed such that a rendezvous (that has to be one of the most difficult words in the English language to spell correctly, even with spell check) protocol could be implemented by the provider. On the receive side, it may be of interest to report the start and ending time for larger transfers, primarily for debugging purposes. There are multiple problems with libfabric related to the use cases in my area. Most of all the lack of multicast support. Then there is the build up of software bloat on top. The interest here is in low latency operations. Rendezvous and other new features are really not wanted if they increase the latency. > I have no idea how the time stamps are expected to be used, so why limit it? An app could just as easily create their own time stamp when reading a work completion, especially when the data is going into an anonymous receive buffer. That would seem to work for your use case. No, it cannot, as described earlier. The work can be completed much earlier than when the polling thread gets around to checking for it. We do that today since there is nothing better, but this means that there is a gap there.
On the send side you have no easy way of telling when the operation was complete without the timestamp. > I have no problem with a bare metal interface exposing this. But pretending that it's generic and that this is the one and only way that this could be implemented doesn't make it so. This is a way it was implemented and it's usable. Shooting for pie in the sky does not bring us anything. Nor do ideas of requirements from a new experimental API that does not support the basic features that we need and seems to be on its way to mess up the latencies of access to RDMA operations.
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <alpine.DEB.2.11.1506092018190.6681-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org> @ 2015-06-10 8:45 ` Or Gerlitz [not found] ` <5577F93F.9030405-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-06-10 16:04 ` Hefty, Sean 1 sibling, 1 reply; 64+ messages in thread From: Or Gerlitz @ 2015-06-10 8:45 UTC (permalink / raw) To: Doug Ledford Cc: Christoph Lameter, Hefty, Sean, Jason Gunthorpe, Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon On 6/10/2015 4:26 AM, Christoph Lameter wrote: >> I have no problem with a bare metal interface exposing this. But pretending that it's generic and that this is the one and only way that this could be implemented doesn't make it so. > This is a way it was implemented and its usable. Shooting for pie in the > sky does not bring us anything. Nor ideas of requirements from a new > experimental API that does not support the basic features that we need > and seems to be on its way to mess up the latencies of access to RDMA operations. Doug, What's your maintainer say here? The current proposal has: 1. raw HCA clock completion generation time-stamp for CQEs 2. HCA clock frequency in kHz 3. mask telling how many bits are relevant from the 64bit time-stamp This is fairly simple, practical and very much usable to FSI applications and users, and can be extended later if someone comes up with a better/other combination of the frequency/mask. Have a GO? Or.
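The three items in the proposal above (raw timestamp, clock frequency in kHz, valid-bit mask) are sufficient for a consumer to turn two raw completion timestamps into elapsed nanoseconds. A minimal sketch in C follows. The struct `ts_caps` and both helper functions are hypothetical names introduced only for illustration; the field names follow the `timestamp_mask` and `hca_core_clock` names used in patch 5/9. The sketch assumes the mask is a bitmask with the low-order valid bits set and that the counter wraps at most once between the two samples.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two values reported by the extended
 * query_device in this patchset (not a real libibverbs type). */
struct ts_caps {
    uint64_t timestamp_mask;  /* assumed: low-order valid bits set */
    uint64_t hca_core_clock;  /* HCA clock frequency in kHz */
};

/* Cycles elapsed between two raw timestamps. Masking the difference
 * tolerates a single wrap of the hardware counter. */
static uint64_t ts_delta_cycles(const struct ts_caps *caps,
                                uint64_t earlier, uint64_t later)
{
    return (later - earlier) & caps->timestamp_mask;
}

/* Convert cycles to nanoseconds: ns = cycles * 10^6 / kHz.
 * The division is split so that cycles * 10^6 cannot overflow. */
static uint64_t ts_cycles_to_ns(const struct ts_caps *caps, uint64_t cycles)
{
    assert(caps->hca_core_clock != 0);
    uint64_t khz = caps->hca_core_clock;
    return (cycles / khz) * 1000000ULL +
           ((cycles % khz) * 1000000ULL) / khz;
}
```

With a 48-bit mask and a 427000 kHz clock (illustrative numbers only), 427 cycles convert to 1000 ns, and a pair of samples that straddles the wrap point still yields the correct small delta.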
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <5577F93F.9030405-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2015-06-10 23:54 ` Doug Ledford 0 siblings, 0 replies; 64+ messages in thread From: Doug Ledford @ 2015-06-10 23:54 UTC (permalink / raw) To: Or Gerlitz Cc: Christoph Lameter, Hefty, Sean, Jason Gunthorpe, Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon On Wed, 2015-06-10 at 11:45 +0300, Or Gerlitz wrote: > On 6/10/2015 4:26 AM, Christoph Lameter wrote: > >> I have no problem with a bare metal interface exposing this. But pretending that it's generic and that this is the one and only way that this could be implemented doesn't make it so. > > This is a way it was implemented and its usable. Shooting for pie in the > > sky does not bring us anything. Nor ideas of requirements from a new > > experimental API that does not support the basic features that we need > > and seems to be on its way to mess up the latencies of access to RDMA operations. > > Doug, > > What's your maintainer say here? > > The current proposal has: > > 1. raw HCA clock completion generation time-stamp for CQEs > 2. HCA clock frequency in kHz > 3. mask telling how many bits are relevant from the 64bit time-stamp > > This is fairly simple, practical and very much usable to FSI > applications and users, and can be extended later if someone comes up > with better/other combination of the frequency/mask. Have a GO? This is all related to the kernel <-> libibverbs interface. In that regard, I'm fine with what we have here. To be more specific, the CQ creation flags and use of create_cq_ex and the extension of the query_device struct and use of extended query device are really the only user visible items here, and I'm OK with those. None of these items are hot path items and structure growth with new fields is not the major item it is for the wc struct.
Now, the change to the wc struct and the change to ibv_poll_cq are more important and still need some work to get to a final implementation IMO. But that work is all limited to libibverbs and doesn't impact this kernel patchset. -- Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> GPG KeyID: 0E572FDD
* RE: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <alpine.DEB.2.11.1506092018190.6681-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org> 2015-06-10 8:45 ` Or Gerlitz @ 2015-06-10 16:04 ` Hefty, Sean [not found] ` <1828884A29C6694DAF28B7E6B8A82373A8FE6746-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org> 1 sibling, 1 reply; 64+ messages in thread From: Hefty, Sean @ 2015-06-10 16:04 UTC (permalink / raw) To: Christoph Lameter Cc: Jason Gunthorpe, Doug Ledford, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon > There are multiple problems with libfabric related to the use cases in my > area. Most of all the lack of multicast support. Then there is the build > up of software bloat on top. The interest here is in low latency > operations. Rendezvous and other new features are really not wanted if > they increase the latency. Multicast is only supported by one vendor that has taken a hostile position against libfabric. Support for multicast will eventually be there, but it's definitely not a priority for me. As an open source project, anyone is welcome to propose patches. For native providers, libfabric will reduce latency. That's a provider implementation issue, and native providers will be available soon. The OFIWG chose to have a working set of interfaces that applications can begin using immediately, versus waiting until there was a large set of native providers. This is an entirely unrelated topic for this thread. Jason asked what I would do with libfabric. I answered. Your comments do nothing to change my mind on that answer. > This is a way it was implemented IMO, this is exactly the problem. The entire design is being driven by the implementation. That produces an unmaintainable API and fractures the software ecosystem, which is exactly where we are today.
* RE: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <1828884A29C6694DAF28B7E6B8A82373A8FE6746-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2015-06-11 15:24 ` Christoph Lameter [not found] ` <alpine.DEB.2.11.1506111010580.18426-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org> 0 siblings, 1 reply; 64+ messages in thread From: Christoph Lameter @ 2015-06-11 15:24 UTC (permalink / raw) To: Hefty, Sean Cc: Jason Gunthorpe, Doug Ledford, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon On Wed, 10 Jun 2015, Hefty, Sean wrote: > > There are multiple problems with libfabric related to the use cases in my > > area. Most of all the lack of multicast support. Then there is the build > > up of software bloat on top. The interest here is in low latency > > operations. Rendezvous and other new features are really not wanted if > > they increase the latency. > > Multicast is only supported by one vendor that has taken a hostile position against libfabric. Support for multicast will eventually be there, but it's definitely not a priority for me. As an open source project, anyone is welcome to propose patches. Intel is supporting multicast in hardware. It's just a bad implementation (broadcast and filtering MC groups in the HCA or what was that?) and there is no plan to fix the issues despite the problem being known for quite some time. Also, does this mean that libfabric only supports the features needed by Intel? > For native providers, libfabric will reduce latency. That's a provider implementation issue, and native providers will be available soon. The OFIWG selected to have a working set of interfaces that applications can begin using immediately, versus waiting until there were a large set of native providers. I would be interested to see some measurements.
AFAICT the Intel solutions are based on historically inferior IB technology from Qlogic, which has never been able in my lab tests to compete latency-wise with other vendors. I have heard these latency claims repeatedly from Qlogic personnel over the years. > IMO, this is exactly the problem. The entire design is being driving by the implementation. That produces an unmaintainable API and fractures the software ecosystem, which is exactly where we are today. This is a well designed solution and it's easy to use. It would help libfabric if you would work with other vendors and industries to include support for their needs. MPI is not the only application that is running on the fabrics. I understand that is historically the only area in which Qlogic hardware was able to compete, but I think you need to move beyond that. APIs should be as general as possible, abstracting hardware as much as possible. A viable libfabric needs to be easy to use and low overhead, as well as covering the requirements of multiple vendors and use cases.
* RE: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <alpine.DEB.2.11.1506111010580.18426-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org> @ 2015-06-11 19:48 ` Hefty, Sean 0 siblings, 0 replies; 64+ messages in thread From: Hefty, Sean @ 2015-06-11 19:48 UTC (permalink / raw) To: Christoph Lameter Cc: Jason Gunthorpe, Doug Ledford, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon > Intel is supporting multicast in hardware. It's just a bad implementation > (broadcast and filtering MC groups in the HCA or what was that?) and there > is no plan to fix the issues despite the problem being known for quite > some time. Also, does this mean that libfabric only supports the > features needed by Intel? Libfabric supports whatever features apps require and the participating vendors want to provide. However, I, personally, have a limited amount of time in my day and will focus my effort on either what my management requires of me or areas that are most interesting. Libfabric is specifically designed to be vendor, transport, and implementation neutral. > I would be interested to see some measurements. AFAICT the Intel solutions > are based on historically inferior IB technology from Qlogic which has > never been able in my lab tests to compete latency wise with other > vendors. I have heard these latency claims repeatedly from Qlogic > personnel over the years. You are referring to hardware latency. Libfabric is software. No amount of software is going to overcome hardware limitations. The entire reason multicast support was removed from libfabric 1.0 was that the proposed API would have introduced latency by adding a branch into the code path. > This is a well designed solution and it's easy to use. I fundamentally disagree with the practice of ad-hoc API design. I stated this on the mail list probably 3 years ago. I see nothing wrong with allowing and encouraging vendor specific extensions.
> It would help libfabric if you would work with other vendors and > industries to include support for their needs. MPI is not the only > applications that are running on the fabrics. I understand that is > historically the only area in which Qlogic hardware was able to compete > but I think you need to move beyond that. APIs should be as general as > possible abstracting hardware as much as possible. A viable libfabric > needs to be easy to use, low overhead as well as covering the requirements > of multiple vendors and use cases. Libfabric included requirements from multiple users and applications - MPI, SHMEM, PGAS, DBMS, and sockets all provided input. It chose to target MPI as an initial priority, but it is not limited to MPI at all. It also works with other vendors, including vendors that do not support the verbs interfaces -- Cisco, Cray, Intel PSM, plus others. I, personally, ensured that libfabric would layer well over verbs based hardware. That doesn't mean that I'm obligated to provide optimized providers over everyone's hardware. The goal was not to spend 3 years working on a new API, but to get something usable within a short timeframe that could be extended. OFIWG could have taken a different approach, but this was what the community (not Intel) selected. As a company, Intel has many products. A competitor in one area of the company may be a partner in another. Xeon is by far the most important to this discussion. It's why Intel dedicated developers to enabling high performance networking in Linux for over 10 years -- even before Intel had any products in those spaces. And it's why Intel continues to fund development. Sure, Intel now has IB and Omni-Path Architecture products, but they also have iWarp and Ethernet. Intel MPI runs over a bunch of different fabrics. Libfabric doesn't just need to work well over Intel NICs, it needs to work well over Intel platforms. 
Returning to this thread, if I had to add time stamps to libfabric, I would still add 2 time stamps into a new completion structure. Those time stamps would be selected using a method similar to what Doug stated in an earlier email. The app would use an enum to select what the time stamps would capture. However, I would lean more toward having those values specified as part of the endpoint attributes, rather than the CQ. - Sean
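The two-timestamp design described in the closing paragraph above can be pictured roughly as follows. This is only a sketch under stated assumptions: every name here (`ts_event`, `ep_ts_attr`, `completion_ts`, `fill_completion_ts`) is hypothetical and none of it is part of libfabric or verbs. It shows an endpoint attribute carrying two enum selectors, and a provider-side helper that copies the selected event times into a completion with two u64 slots.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical events that a timestamp slot could be asked to capture. */
enum ts_event {
    TS_NONE = 0,      /* slot unused */
    TS_POSTED,        /* operation handed to the hardware */
    TS_FIRST_BYTE,    /* first byte of the transfer seen */
    TS_LAST_BYTE,     /* last byte of the transfer seen */
    TS_COMPLETED,     /* completion generated */
    TS_EVENT_MAX
};

/* Hypothetical endpoint attribute: which event each slot records. */
struct ep_ts_attr {
    enum ts_event slot_event[2];
};

/* Hypothetical completion structure with two u64 timestamp fields. */
struct completion_ts {
    uint64_t wr_id;
    uint64_t ts[2];
};

/* Provider-side helper: observed[] holds the time recorded for each
 * event of one operation; only the two selected events are reported. */
static void fill_completion_ts(const struct ep_ts_attr *attr, uint64_t wr_id,
                               const uint64_t observed[TS_EVENT_MAX],
                               struct completion_ts *out)
{
    out->wr_id = wr_id;
    for (int i = 0; i < 2; i++) {
        enum ts_event ev = attr->slot_event[i];
        assert((unsigned)ev < (unsigned)TS_EVENT_MAX);
        out->ts[i] = (ev == TS_NONE) ? 0 : observed[ev];
    }
}
```

Placing the selectors on the endpoint rather than the CQ, as suggested in the message, means the choice is fixed before any operation is posted, so the per-completion path only copies two values.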
* Re: [PATCH for-next V2 0/9] Add completion timestamping support [not found] ` <1828884A29C6694DAF28B7E6B8A82373A8FE4F16-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org> 2015-06-04 16:47 ` Jason Gunthorpe @ 2015-06-04 17:30 ` Doug Ledford 1 sibling, 0 replies; 64+ messages in thread From: Doug Ledford @ 2015-06-04 17:30 UTC (permalink / raw) To: Hefty, Sean Cc: Jason Gunthorpe, Matan Barak, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Amir Vadai, Tal Alon On Thu, 2015-06-04 at 16:17 +0000, Hefty, Sean wrote: > > No, if they followed (I didn't check yet) the extension scheme then the > > poll call is > > > > struct ibv_wc_ex wcs[num_wcs] > > ibv_poll_wc_ex(&wcs,num_wcs,sizeof(wcs[0])); > > > > And the drivers decide what to do based on the 3rd argument, which is > > essentially the ABI version. > > As a couple of options: > > You could add a 'completion format' field to the cq attribute structure, which would allow poll to cast to a specific structure. This is probably what I would consider an ideal way to handle things. > If we wanted to go with Doug's earlier idea, you could add a 'provider context' field to the wc. This would allow calling back into the provider to retrieve more data. Though, this concept would be racy if other completions can overwrite whatever data was trying to be retrieved. -- Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> GPG KeyID: 0E572FDD
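The size-as-ABI-version scheme quoted above (the caller passes `sizeof(wcs[0])` and the driver decides what to fill in) can be sketched as follows. This is an illustration under stated assumptions, not the libibverbs implementation: `wc_v1`, `wc_v2` and `poll_sized` are hypothetical, the newer structure is assumed to extend the older one only by appending fields, and a plain array stands in for the hardware completion queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical original work completion layout. */
struct wc_v1 {
    uint64_t wr_id;
    uint32_t status;
    uint32_t reserved;
};

/* Hypothetical extended layout: same prefix, timestamp appended. */
struct wc_v2 {
    uint64_t wr_id;
    uint32_t status;
    uint32_t reserved;
    uint64_t timestamp;
};

/* Copy up to num_wcs pending completions into the caller's array.
 * wc_size is the caller's element size and acts as the ABI version:
 * an old caller gets only the v1 prefix, a new caller also gets the
 * timestamp. The stride is always the caller's element size. */
static int poll_sized(const struct wc_v2 *pending, int num_pending,
                      void *wcs, int num_wcs, size_t wc_size)
{
    assert(wc_size >= sizeof(struct wc_v1));
    int n = num_pending < num_wcs ? num_pending : num_wcs;
    size_t copy = wc_size < sizeof(struct wc_v2) ? wc_size
                                                 : sizeof(struct wc_v2);
    for (int i = 0; i < n; i++)
        memcpy((char *)wcs + (size_t)i * wc_size, &pending[i], copy);
    return n;
}
```

A binary built against the older header keeps working because it never sees bytes beyond its own element size, while the 'completion format' alternative discussed in the message would instead fix the layout once at CQ creation time.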