Netdev Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH rdma-next v1 00/12] DEVX asynchronous events
@ 2019-06-18 17:15 Leon Romanovsky
  2019-06-18 17:15 ` [PATCH mlx5-next v1 01/12] net/mlx5: Fix mlx5_core_destroy_cq() error flow Leon Romanovsky
                   ` (12 more replies)
  0 siblings, 13 replies; 35+ messages in thread
From: Leon Romanovsky @ 2019-06-18 17:15 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Yishai Hadas, Saeed Mahameed,
	linux-netdev

From: Leon Romanovsky <leonro@mellanox.com>

Changelog:
 v0 -> v1:
 * Fix the unbind / hot unplug flows to work properly.
 * Fix Ref count handling on the eventfd mode in some flow.
 * Rebased to latest rdma-next

Thanks

------------------------------------------------------------------------------------
From Yishai:

This series enables RDMA applications that use the DEVX interface to
subscribe and read device asynchronous events.

The solution is designed to allow extension of events in the future
without need to perform any changes in the driver code.

To enable that few changes had been done in mlx5_core, it includes:
 * Reading device event capabilities that are user related
   (affiliated and un-affiliated) and set the matching mask upon
   creating the matching EQ.
 * Enable DEVX/mlx5_ib to register for ANY event instead of the option to
   get some hard-coded ones.
 * Enable DEVX/mlx5_ib to get the device raw data for CQ completion events.
 * Enhance mlx5_core_create/destroy CQ to enable DEVX using them so that CQ
   events will be reported as well.

In mlx5_ib layer the below changes were done:
 * A new DEVX API was introduced to allocate an event channel by using
   the uverbs FD object type.
 * Implement the FD channel operations to enable read/poo/close over it.
 * A new DEVX API was introduced to subscribe for specific events over an
   event channel.
 * Manage an internal data structure  over XA(s) to subscribe/dispatch events
   over the different event channels.
 * Use from DEVX the mlx5_core APIs to create/destroy a CQ to be able to
   get its relevant events.

Yishai

Yishai Hadas (12):
  net/mlx5: Fix mlx5_core_destroy_cq() error flow
  net/mlx5: Use event mask based on device capabilities
  net/mlx5: Expose the API to register for ANY event
  net/mlx5: mlx5_core_create_cq() enhancements
  net/mlx5: Report a CQ error event only when a handler was set
  net/mlx5: Report EQE data upon CQ completion
  net/mlx5: Expose device definitions for object events
  IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD
  IB/mlx5: Register DEVX with mlx5_core to get async events
  IB/mlx5: Enable subscription for device events over DEVX
  IB/mlx5: Implement DEVX dispatching event
  IB/mlx5: Add DEVX support for CQ events

 drivers/infiniband/hw/mlx5/cq.c               |    5 +-
 drivers/infiniband/hw/mlx5/devx.c             | 1082 ++++++++++++++++-
 drivers/infiniband/hw/mlx5/main.c             |   10 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |   12 +
 drivers/infiniband/hw/mlx5/odp.c              |    3 +-
 drivers/infiniband/hw/mlx5/qp.c               |    2 +-
 drivers/net/ethernet/mellanox/mlx5/core/cq.c  |   21 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |    2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |    3 +-
 .../net/ethernet/mellanox/mlx5/core/en_txrx.c |    2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  |   68 +-
 .../ethernet/mellanox/mlx5/core/fpga/conn.c   |    6 +-
 drivers/net/ethernet/mellanox/mlx5/core/fw.c  |    6 +
 .../net/ethernet/mellanox/mlx5/core/lib/eq.h  |    5 +-
 include/linux/mlx5/cq.h                       |    6 +-
 include/linux/mlx5/device.h                   |    6 +-
 include/linux/mlx5/driver.h                   |    2 +
 include/linux/mlx5/eq.h                       |    4 +-
 include/linux/mlx5/mlx5_ifc.h                 |   34 +-
 include/uapi/rdma/mlx5_user_ioctl_cmds.h      |   19 +
 include/uapi/rdma/mlx5_user_ioctl_verbs.h     |    9 +
 21 files changed, 1237 insertions(+), 70 deletions(-)

--
2.20.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH mlx5-next v1 01/12] net/mlx5: Fix mlx5_core_destroy_cq() error flow
  2019-06-18 17:15 [PATCH rdma-next v1 00/12] DEVX asynchronous events Leon Romanovsky
@ 2019-06-18 17:15 ` Leon Romanovsky
  2019-06-18 17:15 ` [PATCH mlx5-next v1 02/12] net/mlx5: Use event mask based on device capabilities Leon Romanovsky
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Leon Romanovsky @ 2019-06-18 17:15 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Yishai Hadas, Saeed Mahameed,
	linux-netdev

From: Yishai Hadas <yishaih@mellanox.com>

The firmware command to destroy a CQ might fail when the object is
referenced by other object and the ref count is managed by the firmware.

To enable a second successful destruction post the first failure need to
change  mlx5_eq_del_cq() to be a void function.

As an error in mlx5_eq_del_cq() is quite fatal from the option to
recover, a debug message inside it should be good enougth and it was
changed to be void.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cq.c     |  9 ++-------
 drivers/net/ethernet/mellanox/mlx5/core/eq.c     | 16 +++++++---------
 drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h |  2 +-
 3 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index 713a17ee3751..703d88332bc6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -158,13 +158,8 @@ int mlx5_core_destroy_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq)
 	u32 in[MLX5_ST_SZ_DW(destroy_cq_in)] = {0};
 	int err;
 
-	err = mlx5_eq_del_cq(mlx5_get_async_eq(dev), cq);
-	if (err)
-		return err;
-
-	err = mlx5_eq_del_cq(&cq->eq->core, cq);
-	if (err)
-		return err;
+	mlx5_eq_del_cq(mlx5_get_async_eq(dev), cq);
+	mlx5_eq_del_cq(&cq->eq->core, cq);
 
 	MLX5_SET(destroy_cq_in, in, opcode, MLX5_CMD_OP_DESTROY_CQ);
 	MLX5_SET(destroy_cq_in, in, cqn, cq->cqn);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 58fff2f39b38..8000d2a4a7e2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -389,7 +389,7 @@ int mlx5_eq_add_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq)
 	return err;
 }
 
-int mlx5_eq_del_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq)
+void mlx5_eq_del_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq)
 {
 	struct mlx5_cq_table *table = &eq->cq_table;
 	struct mlx5_core_cq *tmp;
@@ -399,16 +399,14 @@ int mlx5_eq_del_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq)
 	spin_unlock(&table->lock);
 
 	if (!tmp) {
-		mlx5_core_warn(eq->dev, "cq 0x%x not found in eq 0x%x tree\n", eq->eqn, cq->cqn);
-		return -ENOENT;
+		mlx5_core_dbg(eq->dev, "cq 0x%x not found in eq 0x%x tree\n",
+			      eq->eqn, cq->cqn);
+		return;
 	}
 
-	if (tmp != cq) {
-		mlx5_core_warn(eq->dev, "corruption on cqn 0x%x in eq 0x%x\n", eq->eqn, cq->cqn);
-		return -EINVAL;
-	}
-
-	return 0;
+	if (tmp != cq)
+		mlx5_core_dbg(eq->dev, "corruption on cqn 0x%x in eq 0x%x\n",
+			      eq->eqn, cq->cqn);
 }
 
 int mlx5_eq_table_init(struct mlx5_core_dev *dev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
index 24bd991a727e..d826e63d5a17 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
@@ -75,7 +75,7 @@ int mlx5_eq_table_create(struct mlx5_core_dev *dev);
 void mlx5_eq_table_destroy(struct mlx5_core_dev *dev);
 
 int mlx5_eq_add_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq);
-int mlx5_eq_del_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq);
+void mlx5_eq_del_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq);
 struct mlx5_eq_comp *mlx5_eqn2comp_eq(struct mlx5_core_dev *dev, int eqn);
 struct mlx5_eq *mlx5_get_async_eq(struct mlx5_core_dev *dev);
 void mlx5_cq_tasklet_cb(unsigned long data);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH mlx5-next v1 02/12] net/mlx5: Use event mask based on device capabilities
  2019-06-18 17:15 [PATCH rdma-next v1 00/12] DEVX asynchronous events Leon Romanovsky
  2019-06-18 17:15 ` [PATCH mlx5-next v1 01/12] net/mlx5: Fix mlx5_core_destroy_cq() error flow Leon Romanovsky
@ 2019-06-18 17:15 ` Leon Romanovsky
  2019-06-27  0:23   ` Saeed Mahameed
  2019-06-18 17:15 ` [PATCH mlx5-next v1 03/12] net/mlx5: Expose the API to register for ANY event Leon Romanovsky
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Leon Romanovsky @ 2019-06-18 17:15 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Yishai Hadas, Saeed Mahameed,
	linux-netdev

From: Yishai Hadas <yishaih@mellanox.com>

Use the reported device capabilities for the supported user events (i.e.
affiliated and un-affiliated) to set the EQ mask.

As the event mask can be up to 256 defined by 4 entries of u64 change
the applicable code to work accordingly.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/odp.c             |  3 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 45 ++++++++++++++++----
 drivers/net/ethernet/mellanox/mlx5/core/fw.c |  6 +++
 include/linux/mlx5/device.h                  |  6 ++-
 include/linux/mlx5/eq.h                      |  4 +-
 include/linux/mlx5/mlx5_ifc.h                | 13 ++++--
 6 files changed, 63 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 600fe23e2eae..a6740ec308ed 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -1559,10 +1559,11 @@ mlx5_ib_create_pf_eq(struct mlx5_ib_dev *dev, struct mlx5_ib_pf_eq *eq)
 	eq->irq_nb.notifier_call = mlx5_ib_eq_pf_int;
 	param = (struct mlx5_eq_param) {
 		.irq_index = 0,
-		.mask = 1 << MLX5_EVENT_TYPE_PAGE_FAULT,
 		.nent = MLX5_IB_NUM_PF_EQE,
 	};
 	eq->core = mlx5_eq_create_generic(dev->mdev, &param);
+
+	param.mask[0] = 1ull << MLX5_EVENT_TYPE_PAGE_FAULT;
 	if (IS_ERR(eq->core)) {
 		err = PTR_ERR(eq->core);
 		goto err_wq;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 8000d2a4a7e2..9d07add38940 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -256,6 +256,7 @@ create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
 	int inlen;
 	u32 *in;
 	int err;
+	int i;
 
 	/* Init CQ table */
 	memset(cq_table, 0, sizeof(*cq_table));
@@ -283,10 +284,12 @@ create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
 	mlx5_fill_page_array(&eq->buf, pas);
 
 	MLX5_SET(create_eq_in, in, opcode, MLX5_CMD_OP_CREATE_EQ);
-	if (!param->mask && MLX5_CAP_GEN(dev, log_max_uctx))
+	if (!param->mask[0] && MLX5_CAP_GEN(dev, log_max_uctx))
 		MLX5_SET(create_eq_in, in, uid, MLX5_SHARED_RESOURCE_UID);
 
-	MLX5_SET64(create_eq_in, in, event_bitmask, param->mask);
+	for (i = 0; i < 4; i++)
+		MLX5_ARRAY_SET64(create_eq_in, in, event_bitmask, i,
+				 param->mask[i]);
 
 	eqc = MLX5_ADDR_OF(create_eq_in, in, eq_context_entry);
 	MLX5_SET(eqc, eqc, log_eq_size, ilog2(eq->nent));
@@ -507,10 +510,32 @@ static int cq_err_event_notifier(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
-static u64 gather_async_events_mask(struct mlx5_core_dev *dev)
+static void gather_async_events_from_cap(struct mlx5_core_dev *dev,
+					 u64 mask[4])
+{
+	__be64 *user_unaffiliated_events;
+	__be64 *user_affiliated_events;
+	int i;
+
+	user_affiliated_events =
+		MLX5_CAP_DEV_EVENT(dev, user_affiliated_events);
+	user_unaffiliated_events =
+		MLX5_CAP_DEV_EVENT(dev, user_unaffiliated_events);
+
+	for (i = 0; i < 4; i++)
+		mask[i] = be64_to_cpu(user_affiliated_events[i] |
+				      user_unaffiliated_events[i]);
+}
+
+static void gather_async_events_mask(struct mlx5_core_dev *dev, u64 mask[4])
 {
 	u64 async_event_mask = MLX5_ASYNC_EVENT_MASK;
 
+	if (MLX5_CAP_GEN(dev, event_cap)) {
+		gather_async_events_from_cap(dev, mask);
+		return;
+	}
+
 	if (MLX5_VPORT_MANAGER(dev))
 		async_event_mask |= (1ull << MLX5_EVENT_TYPE_NIC_VPORT_CHANGE);
 
@@ -544,7 +569,7 @@ static u64 gather_async_events_mask(struct mlx5_core_dev *dev)
 		async_event_mask |=
 			(1ull << MLX5_EVENT_TYPE_ESW_FUNCTIONS_CHANGED);
 
-	return async_event_mask;
+	mask[0] = async_event_mask;
 }
 
 static int create_async_eqs(struct mlx5_core_dev *dev)
@@ -559,9 +584,11 @@ static int create_async_eqs(struct mlx5_core_dev *dev)
 	table->cmd_eq.irq_nb.notifier_call = mlx5_eq_async_int;
 	param = (struct mlx5_eq_param) {
 		.irq_index = 0,
-		.mask = 1ull << MLX5_EVENT_TYPE_CMD,
+		.mask = {1ull << MLX5_EVENT_TYPE_CMD},
 		.nent = MLX5_NUM_CMD_EQE,
 	};
+
+	param.mask[0] = 1ull << MLX5_EVENT_TYPE_CMD;
 	err = create_async_eq(dev, &table->cmd_eq.core, &param);
 	if (err) {
 		mlx5_core_warn(dev, "failed to create cmd EQ %d\n", err);
@@ -577,9 +604,9 @@ static int create_async_eqs(struct mlx5_core_dev *dev)
 	table->async_eq.irq_nb.notifier_call = mlx5_eq_async_int;
 	param = (struct mlx5_eq_param) {
 		.irq_index = 0,
-		.mask = gather_async_events_mask(dev),
 		.nent = MLX5_NUM_ASYNC_EQE,
 	};
+	gather_async_events_mask(dev, param.mask);
 	err = create_async_eq(dev, &table->async_eq.core, &param);
 	if (err) {
 		mlx5_core_warn(dev, "failed to create async EQ %d\n", err);
@@ -595,9 +622,11 @@ static int create_async_eqs(struct mlx5_core_dev *dev)
 	table->pages_eq.irq_nb.notifier_call = mlx5_eq_async_int;
 	param = (struct mlx5_eq_param) {
 		.irq_index = 0,
-		.mask =  1 << MLX5_EVENT_TYPE_PAGE_REQUEST,
+		.mask = {1 << MLX5_EVENT_TYPE_PAGE_REQUEST},
 		.nent = /* TODO: sriov max_vf + */ 1,
 	};
+
+	param.mask[0] = 1ull << MLX5_EVENT_TYPE_PAGE_REQUEST;
 	err = create_async_eq(dev, &table->pages_eq.core, &param);
 	if (err) {
 		mlx5_core_warn(dev, "failed to create pages EQ %d\n", err);
@@ -789,7 +818,7 @@ static int create_comp_eqs(struct mlx5_core_dev *dev)
 		eq->irq_nb.notifier_call = mlx5_eq_comp_int;
 		param = (struct mlx5_eq_param) {
 			.irq_index = vecidx,
-			.mask = 0,
+			.mask = {0},
 			.nent = nent,
 		};
 		err = create_map_eq(dev, &eq->core, &param);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index 1ab6f7e3bec6..05367f15c3a7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -202,6 +202,12 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
 			return err;
 	}
 
+	if (MLX5_CAP_GEN(dev, event_cap)) {
+		err = mlx5_core_get_caps(dev, MLX5_CAP_DEV_EVENT);
+		if (err)
+			return err;
+	}
+
 	return 0;
 }
 
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 5e760067ac41..0d1abe097627 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -351,7 +351,7 @@ enum mlx5_event {
 
 	MLX5_EVENT_TYPE_DEVICE_TRACER      = 0x26,
 
-	MLX5_EVENT_TYPE_MAX                = MLX5_EVENT_TYPE_DEVICE_TRACER + 1,
+	MLX5_EVENT_TYPE_MAX                = 0x100,
 };
 
 enum {
@@ -1077,6 +1077,7 @@ enum mlx5_cap_type {
 	MLX5_CAP_DEBUG,
 	MLX5_CAP_RESERVED_14,
 	MLX5_CAP_DEV_MEM,
+	MLX5_CAP_DEV_EVENT = 0x14,
 	/* NUM OF CAP Types */
 	MLX5_CAP_NUM
 };
@@ -1255,6 +1256,9 @@ enum mlx5_qcam_feature_groups {
 #define MLX5_CAP64_DEV_MEM(mdev, cap)\
 	MLX5_GET64(device_mem_cap, mdev->caps.hca_cur[MLX5_CAP_DEV_MEM], cap)
 
+#define MLX5_CAP_DEV_EVENT(mdev, cap)\
+	MLX5_ADDR_OF(device_event_cap, (mdev)->caps.hca_cur[MLX5_CAP_DEV_EVENT], cap)
+
 enum {
 	MLX5_CMD_STAT_OK			= 0x0,
 	MLX5_CMD_STAT_INT_ERR			= 0x1,
diff --git a/include/linux/mlx5/eq.h b/include/linux/mlx5/eq.h
index 70e16dcfb4c4..202df2e5fe8c 100644
--- a/include/linux/mlx5/eq.h
+++ b/include/linux/mlx5/eq.h
@@ -15,7 +15,9 @@ struct mlx5_core_dev;
 struct mlx5_eq_param {
 	u8             irq_index;
 	int            nent;
-	u64            mask;
+	u64            mask[4];
+	void          *context;
+	irq_handler_t  handler;
 };
 
 struct mlx5_eq *
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 16348528fef6..3ef716c054c2 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -823,6 +823,12 @@ struct mlx5_ifc_device_mem_cap_bits {
 	u8         reserved_at_180[0x680];
 };
 
+struct mlx5_ifc_device_event_cap_bits {
+	u8         user_affiliated_events[4][0x40];
+
+	u8         user_unaffiliated_events[4][0x40];
+};
+
 enum {
 	MLX5_ATOMIC_CAPS_ATOMIC_SIZE_QP_1_BYTE     = 0x0,
 	MLX5_ATOMIC_CAPS_ATOMIC_SIZE_QP_2_BYTES    = 0x2,
@@ -980,7 +986,8 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 
 	u8         log_max_srq_sz[0x8];
 	u8         log_max_qp_sz[0x8];
-	u8         reserved_at_90[0x8];
+	u8         event_cap[0x1];
+	u8         reserved_at_91[0x7];
 	u8         prio_tag_required[0x1];
 	u8         reserved_at_99[0x2];
 	u8         log_max_qp[0x5];
@@ -7364,9 +7371,9 @@ struct mlx5_ifc_create_eq_in_bits {
 
 	u8         reserved_at_280[0x40];
 
-	u8         event_bitmask[0x40];
+	u8         event_bitmask[4][0x40];
 
-	u8         reserved_at_300[0x580];
+	u8         reserved_at_3c0[0x4c0];
 
 	u8         pas[0][0x40];
 };
-- 
2.20.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH mlx5-next v1 03/12] net/mlx5: Expose the API to register for ANY event
  2019-06-18 17:15 [PATCH rdma-next v1 00/12] DEVX asynchronous events Leon Romanovsky
  2019-06-18 17:15 ` [PATCH mlx5-next v1 01/12] net/mlx5: Fix mlx5_core_destroy_cq() error flow Leon Romanovsky
  2019-06-18 17:15 ` [PATCH mlx5-next v1 02/12] net/mlx5: Use event mask based on device capabilities Leon Romanovsky
@ 2019-06-18 17:15 ` Leon Romanovsky
  2019-06-18 17:15 ` [PATCH mlx5-next v1 04/12] net/mlx5: mlx5_core_create_cq() enhancements Leon Romanovsky
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Leon Romanovsky @ 2019-06-18 17:15 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Yishai Hadas, Saeed Mahameed,
	linux-netdev

From: Yishai Hadas <yishaih@mellanox.com>

Expose the API to register for ANY event, mlx5_ib will be able to use
this functionality for its needs.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c     | 2 ++
 drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h | 3 ---
 include/linux/mlx5/driver.h                      | 2 ++
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 9d07add38940..a7a8bf73e465 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -954,6 +954,7 @@ int mlx5_eq_notifier_register(struct mlx5_core_dev *dev, struct mlx5_nb *nb)
 
 	return atomic_notifier_chain_register(&eqt->nh[nb->event_type], &nb->nb);
 }
+EXPORT_SYMBOL(mlx5_eq_notifier_register);
 
 int mlx5_eq_notifier_unregister(struct mlx5_core_dev *dev, struct mlx5_nb *nb)
 {
@@ -964,3 +965,4 @@ int mlx5_eq_notifier_unregister(struct mlx5_core_dev *dev, struct mlx5_nb *nb)
 
 	return atomic_notifier_chain_unregister(&eqt->nh[nb->event_type], &nb->nb);
 }
+EXPORT_SYMBOL(mlx5_eq_notifier_unregister);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
index d826e63d5a17..3dfab91ae5f2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
@@ -97,7 +97,4 @@ void mlx5_core_eq_free_irqs(struct mlx5_core_dev *dev);
 struct cpu_rmap *mlx5_eq_table_get_rmap(struct mlx5_core_dev *dev);
 #endif
 
-int mlx5_eq_notifier_register(struct mlx5_core_dev *dev, struct mlx5_nb *nb);
-int mlx5_eq_notifier_unregister(struct mlx5_core_dev *dev, struct mlx5_nb *nb);
-
 #endif
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index d8ab633406c2..3f8b9d8e2070 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1047,6 +1047,8 @@ int mlx5_register_interface(struct mlx5_interface *intf);
 void mlx5_unregister_interface(struct mlx5_interface *intf);
 int mlx5_notifier_register(struct mlx5_core_dev *dev, struct notifier_block *nb);
 int mlx5_notifier_unregister(struct mlx5_core_dev *dev, struct notifier_block *nb);
+int mlx5_eq_notifier_register(struct mlx5_core_dev *dev, struct mlx5_nb *nb);
+int mlx5_eq_notifier_unregister(struct mlx5_core_dev *dev, struct mlx5_nb *nb);
 
 int mlx5_core_query_vendor_id(struct mlx5_core_dev *mdev, u32 *vendor_id);
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH mlx5-next v1 04/12] net/mlx5: mlx5_core_create_cq() enhancements
  2019-06-18 17:15 [PATCH rdma-next v1 00/12] DEVX asynchronous events Leon Romanovsky
                   ` (2 preceding siblings ...)
  2019-06-18 17:15 ` [PATCH mlx5-next v1 03/12] net/mlx5: Expose the API to register for ANY event Leon Romanovsky
@ 2019-06-18 17:15 ` Leon Romanovsky
  2019-06-18 17:15 ` [PATCH mlx5-next v1 05/12] net/mlx5: Report a CQ error event only when a handler was set Leon Romanovsky
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Leon Romanovsky @ 2019-06-18 17:15 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Yishai Hadas, Saeed Mahameed,
	linux-netdev

From: Yishai Hadas <yishaih@mellanox.com>

Enhance mlx5_core_create_cq() to get the command out buffer from the
callers to let them use the output.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/cq.c                     | 3 ++-
 drivers/net/ethernet/mellanox/mlx5/core/cq.c        | 7 +++----
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c   | 3 ++-
 drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c | 3 ++-
 include/linux/mlx5/cq.h                             | 2 +-
 5 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 07b73df0e1a3..9f39e7b9dd1b 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -892,6 +892,7 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	int vector = attr->comp_vector;
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
 	struct mlx5_ib_cq *cq = to_mcq(ibcq);
+	u32 out[MLX5_ST_SZ_DW(create_cq_out)];
 	int uninitialized_var(index);
 	int uninitialized_var(inlen);
 	u32 *cqb = NULL;
@@ -954,7 +955,7 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	if (cq->create_flags & IB_UVERBS_CQ_FLAGS_IGNORE_OVERRUN)
 		MLX5_SET(cqc, cqc, oi, 1);
 
-	err = mlx5_core_create_cq(dev->mdev, &cq->mcq, cqb, inlen);
+	err = mlx5_core_create_cq(dev->mdev, &cq->mcq, cqb, inlen, out, sizeof(out));
 	if (err)
 		goto err_cqb;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index 703d88332bc6..1bd4336392a2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -87,11 +87,10 @@ static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq)
 }
 
 int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
-			u32 *in, int inlen)
+			u32 *in, int inlen, u32 *out, int outlen)
 {
 	int eqn = MLX5_GET(cqc, MLX5_ADDR_OF(create_cq_in, in, cq_context), c_eqn);
 	u32 dout[MLX5_ST_SZ_DW(destroy_cq_out)];
-	u32 out[MLX5_ST_SZ_DW(create_cq_out)];
 	u32 din[MLX5_ST_SZ_DW(destroy_cq_in)];
 	struct mlx5_eq_comp *eq;
 	int err;
@@ -100,9 +99,9 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 	if (IS_ERR(eq))
 		return PTR_ERR(eq);
 
-	memset(out, 0, sizeof(out));
+	memset(out, 0, outlen);
 	MLX5_SET(create_cq_in, in, opcode, MLX5_CMD_OP_CREATE_CQ);
-	err = mlx5_cmd_exec(dev, in, inlen, out, sizeof(out));
+	err = mlx5_cmd_exec(dev, in, inlen, out, outlen);
 	if (err)
 		return err;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 457cc39423f2..7186282785df 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1517,6 +1517,7 @@ static void mlx5e_free_cq(struct mlx5e_cq *cq)
 
 static int mlx5e_create_cq(struct mlx5e_cq *cq, struct mlx5e_cq_param *param)
 {
+	u32 out[MLX5_ST_SZ_DW(create_cq_out)];
 	struct mlx5_core_dev *mdev = cq->mdev;
 	struct mlx5_core_cq *mcq = &cq->mcq;
 
@@ -1551,7 +1552,7 @@ static int mlx5e_create_cq(struct mlx5e_cq *cq, struct mlx5e_cq_param *param)
 					    MLX5_ADAPTER_PAGE_SHIFT);
 	MLX5_SET64(cqc, cqc, dbr_addr,      cq->wq_ctrl.db.dma);
 
-	err = mlx5_core_create_cq(mdev, mcq, in, inlen);
+	err = mlx5_core_create_cq(mdev, mcq, in, inlen, out, sizeof(out));
 
 	kvfree(in);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
index ca2296a2f9ee..dc7b9d9f274d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
@@ -429,6 +429,7 @@ static int mlx5_fpga_conn_create_cq(struct mlx5_fpga_conn *conn, int cq_size)
 	struct mlx5_fpga_device *fdev = conn->fdev;
 	struct mlx5_core_dev *mdev = fdev->mdev;
 	u32 temp_cqc[MLX5_ST_SZ_DW(cqc)] = {0};
+	u32 out[MLX5_ST_SZ_DW(create_cq_out)];
 	struct mlx5_wq_param wqp;
 	struct mlx5_cqe64 *cqe;
 	int inlen, err, eqn;
@@ -476,7 +477,7 @@ static int mlx5_fpga_conn_create_cq(struct mlx5_fpga_conn *conn, int cq_size)
 	pas = (__be64 *)MLX5_ADDR_OF(create_cq_in, in, pas);
 	mlx5_fill_page_frag_array(&conn->cq.wq_ctrl.buf, pas);
 
-	err = mlx5_core_create_cq(mdev, &conn->cq.mcq, in, inlen);
+	err = mlx5_core_create_cq(mdev, &conn->cq.mcq, in, inlen, out, sizeof(out));
 	kvfree(in);
 
 	if (err)
diff --git a/include/linux/mlx5/cq.h b/include/linux/mlx5/cq.h
index 769326ea1d9b..e44157a2b7db 100644
--- a/include/linux/mlx5/cq.h
+++ b/include/linux/mlx5/cq.h
@@ -185,7 +185,7 @@ static inline void mlx5_cq_put(struct mlx5_core_cq *cq)
 }
 
 int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
-			u32 *in, int inlen);
+			u32 *in, int inlen, u32 *out, int outlen);
 int mlx5_core_destroy_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq);
 int mlx5_core_query_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 		       u32 *out, int outlen);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH mlx5-next v1 05/12] net/mlx5: Report a CQ error event only when a handler was set
  2019-06-18 17:15 [PATCH rdma-next v1 00/12] DEVX asynchronous events Leon Romanovsky
                   ` (3 preceding siblings ...)
  2019-06-18 17:15 ` [PATCH mlx5-next v1 04/12] net/mlx5: mlx5_core_create_cq() enhancements Leon Romanovsky
@ 2019-06-18 17:15 ` Leon Romanovsky
  2019-06-18 17:15 ` [PATCH mlx5-next v1 06/12] net/mlx5: Report EQE data upon CQ completion Leon Romanovsky
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Leon Romanovsky @ 2019-06-18 17:15 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Yishai Hadas, Saeed Mahameed,
	linux-netdev

From: Yishai Hadas <yishaih@mellanox.com>

Report a CQ error event only when a handler was set.

This enables mlx5_ib to not set a handler upon CQ creation and use some
other mechanism to get this event as of other events by the
mlx5_eq_notifier_register API.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index a7a8bf73e465..316fbed81470 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -503,7 +503,8 @@ static int cq_err_event_notifier(struct notifier_block *nb,
 		return NOTIFY_OK;
 	}
 
-	cq->event(cq, type);
+	if (cq->event)
+		cq->event(cq, type);
 
 	mlx5_cq_put(cq);
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH mlx5-next v1 06/12] net/mlx5: Report EQE data upon CQ completion
  2019-06-18 17:15 [PATCH rdma-next v1 00/12] DEVX asynchronous events Leon Romanovsky
                   ` (4 preceding siblings ...)
  2019-06-18 17:15 ` [PATCH mlx5-next v1 05/12] net/mlx5: Report a CQ error event only when a handler was set Leon Romanovsky
@ 2019-06-18 17:15 ` Leon Romanovsky
  2019-06-18 17:15 ` [PATCH mlx5-next v1 07/12] net/mlx5: Expose device definitions for object events Leon Romanovsky
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Leon Romanovsky @ 2019-06-18 17:15 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Yishai Hadas, Saeed Mahameed,
	linux-netdev

From: Yishai Hadas <yishaih@mellanox.com>

Report EQE data upon CQ completion to let upper layers use this data.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/cq.c                     | 2 +-
 drivers/infiniband/hw/mlx5/main.c                   | 2 +-
 drivers/infiniband/hw/mlx5/qp.c                     | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/cq.c        | 5 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en.h        | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c   | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c        | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c | 3 ++-
 include/linux/mlx5/cq.h                             | 4 ++--
 9 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 9f39e7b9dd1b..fa6d49d583fe 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -37,7 +37,7 @@
 #include "mlx5_ib.h"
 #include "srq.h"
 
-static void mlx5_ib_cq_comp(struct mlx5_core_cq *cq)
+static void mlx5_ib_cq_comp(struct mlx5_core_cq *cq, struct mlx5_eqe *eqe)
 {
 	struct ib_cq *ibcq = &to_mibcq(cq)->ibcq;
 
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 7fb1f775e3d4..c9c6710afc6b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4463,7 +4463,7 @@ static void mlx5_ib_handle_internal_error(struct mlx5_ib_dev *ibdev)
 	 * lock/unlock above locks Now need to arm all involved CQs.
 	 */
 	list_for_each_entry(mcq, &cq_armed_list, reset_notify) {
-		mcq->comp(mcq);
+		mcq->comp(mcq, NULL);
 	}
 	spin_unlock_irqrestore(&ibdev->reset_flow_resource_lock, flags);
 }
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 63d8f61e50e0..f240e022682b 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -6409,7 +6409,7 @@ static void handle_drain_completion(struct ib_cq *cq,
 		/* Run the CQ handler - this makes sure that the drain WR will
 		 * be processed if wasn't processed yet.
 		 */
-		mcq->mcq.comp(&mcq->mcq);
+		mcq->mcq.comp(&mcq->mcq, NULL);
 	}
 
 	wait_for_completion(&sdrain->done);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index 1bd4336392a2..818edc63e428 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -58,7 +58,7 @@ void mlx5_cq_tasklet_cb(unsigned long data)
 	list_for_each_entry_safe(mcq, temp, &ctx->process_list,
 				 tasklet_ctx.list) {
 		list_del_init(&mcq->tasklet_ctx.list);
-		mcq->tasklet_ctx.comp(mcq);
+		mcq->tasklet_ctx.comp(mcq, NULL);
 		mlx5_cq_put(mcq);
 		if (time_after(jiffies, end))
 			break;
@@ -68,7 +68,8 @@ void mlx5_cq_tasklet_cb(unsigned long data)
 		tasklet_schedule(&ctx->task);
 }
 
-static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq)
+static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq,
+				   struct mlx5_eqe *eqe)
 {
 	unsigned long flags;
 	struct mlx5_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 3a183d690e23..16753f263079 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -780,7 +780,7 @@ netdev_tx_t mlx5e_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 			  struct mlx5e_tx_wqe *wqe, u16 pi, bool xmit_more);
 
 void mlx5e_trigger_irq(struct mlx5e_icosq *sq);
-void mlx5e_completion_event(struct mlx5_core_cq *mcq);
+void mlx5e_completion_event(struct mlx5_core_cq *mcq, struct mlx5_eqe *eqe);
 void mlx5e_cq_error_event(struct mlx5_core_cq *mcq, enum mlx5_event event);
 int mlx5e_napi_poll(struct napi_struct *napi, int budget);
 bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index f9862bf75491..c665ae0f22bd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -136,7 +136,7 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
 	return work_done;
 }
 
-void mlx5e_completion_event(struct mlx5_core_cq *mcq)
+void mlx5e_completion_event(struct mlx5_core_cq *mcq, struct mlx5_eqe *eqe)
 {
 	struct mlx5e_cq *cq = container_of(mcq, struct mlx5e_cq, mcq);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 316fbed81470..185f99e30bd5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -153,7 +153,7 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
 		cq = mlx5_eq_cq_get(eq, cqn);
 		if (likely(cq)) {
 			++cq->arm_sn;
-			cq->comp(cq);
+			cq->comp(cq, eqe);
 			mlx5_cq_put(cq);
 		} else {
 			mlx5_core_warn(eq->dev, "Completion event for bogus CQ 0x%x\n", cqn);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
index dc7b9d9f274d..028891823f32 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
@@ -414,7 +414,8 @@ static void mlx5_fpga_conn_cq_tasklet(unsigned long data)
 	mlx5_fpga_conn_cqes(conn, MLX5_FPGA_CQ_BUDGET);
 }
 
-static void mlx5_fpga_conn_cq_complete(struct mlx5_core_cq *mcq)
+static void mlx5_fpga_conn_cq_complete(struct mlx5_core_cq *mcq,
+				       struct mlx5_eqe *eqe)
 {
 	struct mlx5_fpga_conn *conn;
 
diff --git a/include/linux/mlx5/cq.h b/include/linux/mlx5/cq.h
index e44157a2b7db..40748fc1b11b 100644
--- a/include/linux/mlx5/cq.h
+++ b/include/linux/mlx5/cq.h
@@ -47,7 +47,7 @@ struct mlx5_core_cq {
 	struct completion	free;
 	unsigned		vector;
 	unsigned int		irqn;
-	void (*comp)		(struct mlx5_core_cq *);
+	void (*comp)(struct mlx5_core_cq *cq, struct mlx5_eqe *eqe);
 	void (*event)		(struct mlx5_core_cq *, enum mlx5_event);
 	u32			cons_index;
 	unsigned		arm_sn;
@@ -55,7 +55,7 @@ struct mlx5_core_cq {
 	int			pid;
 	struct {
 		struct list_head list;
-		void (*comp)(struct mlx5_core_cq *);
+		void (*comp)(struct mlx5_core_cq *cq, struct mlx5_eqe *eqe);
 		void		*priv;
 	} tasklet_ctx;
 	int			reset_notify_added;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH mlx5-next v1 07/12] net/mlx5: Expose device definitions for object events
  2019-06-18 17:15 [PATCH rdma-next v1 00/12] DEVX asynchronous events Leon Romanovsky
                   ` (5 preceding siblings ...)
  2019-06-18 17:15 ` [PATCH mlx5-next v1 06/12] net/mlx5: Report EQE data upon CQ completion Leon Romanovsky
@ 2019-06-18 17:15 ` Leon Romanovsky
  2019-06-18 17:15 ` [PATCH rdma-next v1 08/12] IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD Leon Romanovsky
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Leon Romanovsky @ 2019-06-18 17:15 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Yishai Hadas, Saeed Mahameed,
	linux-netdev

From: Yishai Hadas <yishaih@mellanox.com>

Expose an extra device definitions for objects events.

It includes: object_type values for legacy objects and generic data
header for any other object.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 3ef716c054c2..80a453a1e949 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -91,6 +91,20 @@ enum {
 
 enum {
 	MLX5_OBJ_TYPE_GENEVE_TLV_OPT = 0x000b,
+	MLX5_OBJ_TYPE_MKEY = 0xff01,
+	MLX5_OBJ_TYPE_QP = 0xff02,
+	MLX5_OBJ_TYPE_PSV = 0xff03,
+	MLX5_OBJ_TYPE_RMP = 0xff04,
+	MLX5_OBJ_TYPE_XRC_SRQ = 0xff05,
+	MLX5_OBJ_TYPE_RQ = 0xff06,
+	MLX5_OBJ_TYPE_SQ = 0xff07,
+	MLX5_OBJ_TYPE_TIR = 0xff08,
+	MLX5_OBJ_TYPE_TIS = 0xff09,
+	MLX5_OBJ_TYPE_DCT = 0xff0a,
+	MLX5_OBJ_TYPE_XRQ = 0xff0b,
+	MLX5_OBJ_TYPE_RQT = 0xff0e,
+	MLX5_OBJ_TYPE_FLOW_COUNTER = 0xff0f,
+	MLX5_OBJ_TYPE_CQ = 0xff10,
 };
 
 enum {
@@ -9755,4 +9769,11 @@ struct mlx5_ifc_query_esw_functions_out_bits {
 	u8         reserved_at_280[0x180];
 };
 
+struct mlx5_ifc_affiliated_event_header_bits {
+	u8         reserved_at_0[0x10];
+	u8         obj_type[0x10];
+
+	u8         obj_id[0x10];
+};
+
 #endif /* MLX5_IFC_H */
-- 
2.20.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH rdma-next v1 08/12] IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD
  2019-06-18 17:15 [PATCH rdma-next v1 00/12] DEVX asynchronous events Leon Romanovsky
                   ` (6 preceding siblings ...)
  2019-06-18 17:15 ` [PATCH mlx5-next v1 07/12] net/mlx5: Expose device definitions for object events Leon Romanovsky
@ 2019-06-18 17:15 ` Leon Romanovsky
  2019-06-24 11:51   ` Jason Gunthorpe
  2019-06-18 17:15 ` [PATCH rdma-next v1 09/12] IB/mlx5: Register DEVX with mlx5_core to get async events Leon Romanovsky
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Leon Romanovsky @ 2019-06-18 17:15 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Yishai Hadas, Saeed Mahameed,
	linux-netdev

From: Yishai Hadas <yishaih@mellanox.com>

Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD and its initial
implementation.

This object is from type class FD and will be used to read DEVX
async events.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/devx.c         | 112 ++++++++++++++++++++--
 include/uapi/rdma/mlx5_user_ioctl_cmds.h  |  10 ++
 include/uapi/rdma/mlx5_user_ioctl_verbs.h |   4 +
 3 files changed, 116 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index 80b42d069328..1815ce0f8daf 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -33,6 +33,24 @@ struct devx_async_data {
 	struct mlx5_ib_uapi_devx_async_cmd_hdr hdr;
 };
 
+struct devx_async_event_queue {
+	spinlock_t		lock;
+	wait_queue_head_t	poll_wait;
+	struct list_head	event_list;
+	atomic_t		bytes_in_use;
+	u8			is_destroyed:1;
+	u32			flags;
+};
+
+struct devx_async_event_file {
+	struct ib_uobject		uobj;
+	struct list_head subscribed_events_list; /* Head of events that are
+						  * subscribed to this FD
+						  */
+	struct devx_async_event_queue	ev_queue;
+	struct mlx5_ib_dev *dev;
+};
+
 #define MLX5_MAX_DESTROY_INBOX_SIZE_DW MLX5_ST_SZ_DW(delete_fte_in)
 struct devx_obj {
 	struct mlx5_core_dev	*mdev;
@@ -1336,27 +1354,21 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_QUERY)(
 			      cmd_out, cmd_out_len);
 }
 
-struct devx_async_event_queue {
-	spinlock_t		lock;
-	wait_queue_head_t	poll_wait;
-	struct list_head	event_list;
-	atomic_t		bytes_in_use;
-	u8			is_destroyed:1;
-};
-
 struct devx_async_cmd_event_file {
 	struct ib_uobject		uobj;
 	struct devx_async_event_queue	ev_queue;
 	struct mlx5_async_ctx		async_ctx;
 };
 
-static void devx_init_event_queue(struct devx_async_event_queue *ev_queue)
+static void devx_init_event_queue(struct devx_async_event_queue *ev_queue,
+				  u32 flags)
 {
 	spin_lock_init(&ev_queue->lock);
 	INIT_LIST_HEAD(&ev_queue->event_list);
 	init_waitqueue_head(&ev_queue->poll_wait);
 	atomic_set(&ev_queue->bytes_in_use, 0);
 	ev_queue->is_destroyed = 0;
+	ev_queue->flags = flags;
 }
 
 static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_ASYNC_CMD_FD_ALLOC)(
@@ -1370,11 +1382,38 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_ASYNC_CMD_FD_ALLOC)(
 
 	ev_file = container_of(uobj, struct devx_async_cmd_event_file,
 			       uobj);
-	devx_init_event_queue(&ev_file->ev_queue);
+	devx_init_event_queue(&ev_file->ev_queue, 0);
 	mlx5_cmd_init_async_ctx(mdev->mdev, &ev_file->async_ctx);
 	return 0;
 }
 
+static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_ASYNC_EVENT_FD_ALLOC)(
+	struct uverbs_attr_bundle *attrs)
+{
+	struct ib_uobject *uobj = uverbs_attr_get_uobject(
+		attrs, MLX5_IB_ATTR_DEVX_ASYNC_EVENT_FD_ALLOC_HANDLE);
+	struct devx_async_event_file *ev_file;
+	struct mlx5_ib_ucontext *c = rdma_udata_to_drv_context(
+		&attrs->driver_udata, struct mlx5_ib_ucontext, ibucontext);
+	struct mlx5_ib_dev *dev = to_mdev(c->ibucontext.device);
+	u32 flags;
+	int err;
+
+	err = uverbs_get_flags32(&flags, attrs,
+		MLX5_IB_ATTR_DEVX_ASYNC_EVENT_FD_ALLOC_FLAGS,
+		MLX5_IB_UAPI_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA);
+
+	if (err)
+		return err;
+
+	ev_file = container_of(uobj, struct devx_async_event_file,
+			       uobj);
+	devx_init_event_queue(&ev_file->ev_queue, flags);
+	INIT_LIST_HEAD(&ev_file->subscribed_events_list);
+	ev_file->dev = dev;
+	return 0;
+}
+
 static void devx_query_callback(int status, struct mlx5_async_work *context)
 {
 	struct devx_async_data *async_data =
@@ -1729,6 +1768,32 @@ static const struct file_operations devx_async_cmd_event_fops = {
 	.llseek	 = no_llseek,
 };
 
+static ssize_t devx_async_event_read(struct file *filp, char __user *buf,
+				     size_t count, loff_t *pos)
+{
+	return -EINVAL;
+}
+
+static __poll_t devx_async_event_poll(struct file *filp,
+				      struct poll_table_struct *wait)
+{
+	return 0;
+}
+
+static int devx_async_event_close(struct inode *inode, struct file *filp)
+{
+	uverbs_close_fd(filp);
+	return 0;
+}
+
+static const struct file_operations devx_async_event_fops = {
+	.owner	 = THIS_MODULE,
+	.read	 = devx_async_event_read,
+	.poll    = devx_async_event_poll,
+	.release = devx_async_event_close,
+	.llseek	 = no_llseek,
+};
+
 static int devx_hot_unplug_async_cmd_event_file(struct ib_uobject *uobj,
 						   enum rdma_remove_reason why)
 {
@@ -1748,6 +1813,12 @@ static int devx_hot_unplug_async_cmd_event_file(struct ib_uobject *uobj,
 	return 0;
 };
 
+static int devx_hot_unplug_async_event_file(struct ib_uobject *uobj,
+					    enum rdma_remove_reason why)
+{
+	return 0;
+};
+
 DECLARE_UVERBS_NAMED_METHOD(
 	MLX5_IB_METHOD_DEVX_UMEM_REG,
 	UVERBS_ATTR_IDR(MLX5_IB_ATTR_DEVX_UMEM_REG_HANDLE,
@@ -1913,6 +1984,24 @@ DECLARE_UVERBS_NAMED_OBJECT(
 			     O_RDONLY),
 	&UVERBS_METHOD(MLX5_IB_METHOD_DEVX_ASYNC_CMD_FD_ALLOC));
 
+DECLARE_UVERBS_NAMED_METHOD(
+	MLX5_IB_METHOD_DEVX_ASYNC_EVENT_FD_ALLOC,
+	UVERBS_ATTR_FD(MLX5_IB_ATTR_DEVX_ASYNC_EVENT_FD_ALLOC_HANDLE,
+			MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD,
+			UVERBS_ACCESS_NEW,
+			UA_MANDATORY),
+	UVERBS_ATTR_FLAGS_IN(MLX5_IB_ATTR_DEVX_ASYNC_EVENT_FD_ALLOC_FLAGS,
+			enum mlx5_ib_uapi_devx_create_event_channel_flags,
+			UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_OBJECT(
+	MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD,
+	UVERBS_TYPE_ALLOC_FD(sizeof(struct devx_async_event_file),
+			     devx_hot_unplug_async_event_file,
+			     &devx_async_event_fops, "[devx_async_event]",
+			     O_RDONLY),
+	&UVERBS_METHOD(MLX5_IB_METHOD_DEVX_ASYNC_EVENT_FD_ALLOC));
+
 static bool devx_is_supported(struct ib_device *device)
 {
 	struct mlx5_ib_dev *dev = to_mdev(device);
@@ -1933,5 +2022,8 @@ const struct uapi_definition mlx5_ib_devx_defs[] = {
 	UAPI_DEF_CHAIN_OBJ_TREE_NAMED(
 		MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD,
 		UAPI_DEF_IS_OBJ_SUPPORTED(devx_is_supported)),
+	UAPI_DEF_CHAIN_OBJ_TREE_NAMED(
+		MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD,
+		UAPI_DEF_IS_OBJ_SUPPORTED(devx_is_supported)),
 	{},
 };
diff --git a/include/uapi/rdma/mlx5_user_ioctl_cmds.h b/include/uapi/rdma/mlx5_user_ioctl_cmds.h
index d404c951954c..6ad8f4f11ddd 100644
--- a/include/uapi/rdma/mlx5_user_ioctl_cmds.h
+++ b/include/uapi/rdma/mlx5_user_ioctl_cmds.h
@@ -127,16 +127,26 @@ enum mlx5_ib_devx_async_cmd_fd_alloc_attrs {
 	MLX5_IB_ATTR_DEVX_ASYNC_CMD_FD_ALLOC_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
 };
 
+enum mlx5_ib_devx_async_event_fd_alloc_attrs {
+	MLX5_IB_ATTR_DEVX_ASYNC_EVENT_FD_ALLOC_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+	MLX5_IB_ATTR_DEVX_ASYNC_EVENT_FD_ALLOC_FLAGS,
+};
+
 enum mlx5_ib_devx_async_cmd_fd_methods {
 	MLX5_IB_METHOD_DEVX_ASYNC_CMD_FD_ALLOC = (1U << UVERBS_ID_NS_SHIFT),
 };
 
+enum mlx5_ib_devx_async_event_fd_methods {
+	MLX5_IB_METHOD_DEVX_ASYNC_EVENT_FD_ALLOC = (1U << UVERBS_ID_NS_SHIFT),
+};
+
 enum mlx5_ib_objects {
 	MLX5_IB_OBJECT_DEVX = (1U << UVERBS_ID_NS_SHIFT),
 	MLX5_IB_OBJECT_DEVX_OBJ,
 	MLX5_IB_OBJECT_DEVX_UMEM,
 	MLX5_IB_OBJECT_FLOW_MATCHER,
 	MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD,
+	MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD,
 };
 
 enum mlx5_ib_flow_matcher_create_attrs {
diff --git a/include/uapi/rdma/mlx5_user_ioctl_verbs.h b/include/uapi/rdma/mlx5_user_ioctl_verbs.h
index a8f34c237458..57beea4589e4 100644
--- a/include/uapi/rdma/mlx5_user_ioctl_verbs.h
+++ b/include/uapi/rdma/mlx5_user_ioctl_verbs.h
@@ -63,5 +63,9 @@ enum mlx5_ib_uapi_dm_type {
 	MLX5_IB_UAPI_DM_TYPE_HEADER_MODIFY_SW_ICM,
 };
 
+enum mlx5_ib_uapi_devx_create_event_channel_flags {
+	MLX5_IB_UAPI_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA = 1 << 0,
+};
+
 #endif
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH rdma-next v1 09/12] IB/mlx5: Register DEVX with mlx5_core to get async events
  2019-06-18 17:15 [PATCH rdma-next v1 00/12] DEVX asynchronous events Leon Romanovsky
                   ` (7 preceding siblings ...)
  2019-06-18 17:15 ` [PATCH rdma-next v1 08/12] IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD Leon Romanovsky
@ 2019-06-18 17:15 ` Leon Romanovsky
  2019-06-24 11:52   ` Jason Gunthorpe
  2019-06-18 17:15 ` [PATCH rdma-next v1 10/12] IB/mlx5: Enable subscription for device events over DEVX Leon Romanovsky
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Leon Romanovsky @ 2019-06-18 17:15 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Yishai Hadas, Saeed Mahameed,
	linux-netdev

From: Yishai Hadas <yishaih@mellanox.com>

Register DEVX with with mlx5_core to get async events.  This will enable
to dispatch the applicable events to its consumers in down stream
patches.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/devx.c    | 30 ++++++++++++++++++++++++++++
 drivers/infiniband/hw/mlx5/main.c    |  8 ++++++--
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 12 +++++++++++
 3 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index 1815ce0f8daf..e9b9ba5a3e9a 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -1670,6 +1670,36 @@ static int devx_umem_cleanup(struct ib_uobject *uobject,
 	return 0;
 }
 
+static int devx_event_notifier(struct notifier_block *nb,
+			       unsigned long event_type, void *data)
+{
+	return NOTIFY_DONE;
+}
+
+void mlx5_ib_devx_init_event_table(struct mlx5_ib_dev *dev)
+{
+	struct mlx5_devx_event_table *table = &dev->devx_event_table;
+
+	xa_init(&table->event_xa);
+	mutex_init(&table->event_xa_lock);
+	MLX5_NB_INIT(&table->devx_nb, devx_event_notifier, NOTIFY_ANY);
+	mlx5_eq_notifier_register(dev->mdev, &table->devx_nb);
+}
+
+void mlx5_ib_devx_cleanup_event_table(struct mlx5_ib_dev *dev)
+{
+	struct mlx5_devx_event_table *table = &dev->devx_event_table;
+	void *entry;
+	unsigned long id;
+
+	mlx5_eq_notifier_unregister(dev->mdev, &table->devx_nb);
+
+	xa_for_each(&table->event_xa, id, entry)
+		kfree(entry);
+
+	xa_destroy(&table->event_xa);
+}
+
 static ssize_t devx_async_cmd_event_read(struct file *filp, char __user *buf,
 					 size_t count, loff_t *pos)
 {
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index c9c6710afc6b..67b9e7ac569a 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -6653,15 +6653,19 @@ static int mlx5_ib_stage_devx_init(struct mlx5_ib_dev *dev)
 	int uid;
 
 	uid = mlx5_ib_devx_create(dev, false);
-	if (uid > 0)
+	if (uid > 0) {
 		dev->devx_whitelist_uid = uid;
+		mlx5_ib_devx_init_event_table(dev);
+	}
 
 	return 0;
 }
 static void mlx5_ib_stage_devx_cleanup(struct mlx5_ib_dev *dev)
 {
-	if (dev->devx_whitelist_uid)
+	if (dev->devx_whitelist_uid) {
+		mlx5_ib_devx_cleanup_event_table(dev);
 		mlx5_ib_devx_destroy(dev, dev->devx_whitelist_uid);
+	}
 }
 
 void __mlx5_ib_remove(struct mlx5_ib_dev *dev,
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 9cf23ae6324e..556af34b788b 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -944,6 +944,13 @@ struct mlx5_ib_pf_eq {
 	mempool_t *pool;
 };
 
+struct mlx5_devx_event_table {
+	struct mlx5_nb devx_nb;
+	/* serialize updating the event_xa */
+	struct mutex event_xa_lock;
+	struct xarray event_xa;
+};
+
 struct mlx5_ib_dev {
 	struct ib_device		ib_dev;
 	struct mlx5_core_dev		*mdev;
@@ -994,6 +1001,7 @@ struct mlx5_ib_dev {
 	struct mlx5_srq_table   srq_table;
 	struct mlx5_async_ctx   async_ctx;
 	int			free_port;
+	struct mlx5_devx_event_table devx_event_table;
 };
 
 static inline struct mlx5_ib_cq *to_mibcq(struct mlx5_core_cq *mcq)
@@ -1333,6 +1341,8 @@ void mlx5_ib_put_native_port_mdev(struct mlx5_ib_dev *dev,
 #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
 int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user);
 void mlx5_ib_devx_destroy(struct mlx5_ib_dev *dev, u16 uid);
+void mlx5_ib_devx_init_event_table(struct mlx5_ib_dev *dev);
+void mlx5_ib_devx_cleanup_event_table(struct mlx5_ib_dev *dev);
 const struct uverbs_object_tree_def *mlx5_ib_get_devx_tree(void);
 extern const struct uapi_definition mlx5_ib_devx_defs[];
 extern const struct uapi_definition mlx5_ib_flow_defs[];
@@ -1349,6 +1359,8 @@ static inline int
 mlx5_ib_devx_create(struct mlx5_ib_dev *dev,
 			   bool is_user) { return -EOPNOTSUPP; }
 static inline void mlx5_ib_devx_destroy(struct mlx5_ib_dev *dev, u16 uid) {}
+static inline void mlx5_ib_devx_init_event_table(struct mlx5_ib_dev *dev) {}
+static inline void mlx5_ib_devx_cleanup_event_table(struct mlx5_ib_dev *dev) {}
 static inline bool mlx5_ib_devx_is_flow_dest(void *obj, int *dest_id,
 					     int *dest_type)
 {
-- 
2.20.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH rdma-next v1 10/12] IB/mlx5: Enable subscription for device events over DEVX
  2019-06-18 17:15 [PATCH rdma-next v1 00/12] DEVX asynchronous events Leon Romanovsky
                   ` (8 preceding siblings ...)
  2019-06-18 17:15 ` [PATCH rdma-next v1 09/12] IB/mlx5: Register DEVX with mlx5_core to get async events Leon Romanovsky
@ 2019-06-18 17:15 ` Leon Romanovsky
  2019-06-24 11:57   ` Jason Gunthorpe
  2019-06-18 17:15 ` [PATCH rdma-next v1 11/12] IB/mlx5: Implement DEVX dispatching event Leon Romanovsky
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Leon Romanovsky @ 2019-06-18 17:15 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Yishai Hadas, Saeed Mahameed,
	linux-netdev

From: Yishai Hadas <yishaih@mellanox.com>

Enable subscription for device events over DEVX.

Each subscription is added to the two level XA data structure according
to its event number and the DEVX object information in case was given
with the given target fd.

Those events will be reported over the given fd once will occur.
Downstream patches will mange the dispatching to any subscription.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/devx.c        | 564 ++++++++++++++++++++++-
 include/uapi/rdma/mlx5_user_ioctl_cmds.h |   9 +
 2 files changed, 566 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index e9b9ba5a3e9a..304b13e7a265 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -14,6 +14,7 @@
 #include <linux/mlx5/driver.h>
 #include <linux/mlx5/fs.h>
 #include "mlx5_ib.h"
+#include <linux/xarray.h>
 
 #define UVERBS_MODULE_NAME mlx5_ib
 #include <rdma/uverbs_named_ioctl.h>
@@ -33,6 +34,37 @@ struct devx_async_data {
 	struct mlx5_ib_uapi_devx_async_cmd_hdr hdr;
 };
 
+/* first level XA value data structure */
+struct devx_event {
+	struct xarray object_ids; /* second XA level, Key = object id */
+	struct list_head unaffiliated_list;
+};
+
+/* second level XA value data structure */
+struct devx_obj_event {
+	struct rcu_head rcu;
+	struct list_head obj_sub_list;
+};
+
+struct devx_event_subscription {
+	struct list_head file_list; /* headed in private_data->
+				     * subscribed_events_list
+				     */
+	struct list_head xa_list; /* headed in devx_event->unaffiliated_list or
+				   * devx_obj_event->obj_sub_list
+				   */
+	struct list_head obj_list; /* headed in devx_object */
+
+	u32 xa_key_level1;
+	u32 xa_key_level2;
+	struct rcu_head	rcu;
+	u64 cookie;
+	bool is_obj_related;
+	struct ib_uobject *fd_uobj;
+	void *object;	/* May need direct access upon hot unplug */
+	struct eventfd_ctx *eventfd;
+};
+
 struct devx_async_event_queue {
 	spinlock_t		lock;
 	wait_queue_head_t	poll_wait;
@@ -62,6 +94,7 @@ struct devx_obj {
 		struct mlx5_ib_devx_mr	devx_mr;
 		struct mlx5_core_dct	core_dct;
 	};
+	struct list_head event_sub; /* holds devx_event_subscription entries */
 };
 
 struct devx_umem {
@@ -167,6 +200,105 @@ bool mlx5_ib_devx_is_flow_counter(void *obj, u32 *counter_id)
 	return false;
 }
 
+static bool is_legacy_unaffiliated_event_num(u16 event_num)
+{
+	switch (event_num) {
+	case MLX5_EVENT_TYPE_PORT_CHANGE:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static bool is_legacy_obj_event_num(u16 event_num)
+{
+	switch (event_num) {
+	case MLX5_EVENT_TYPE_PATH_MIG:
+	case MLX5_EVENT_TYPE_COMM_EST:
+	case MLX5_EVENT_TYPE_SQ_DRAINED:
+	case MLX5_EVENT_TYPE_SRQ_LAST_WQE:
+	case MLX5_EVENT_TYPE_SRQ_RQ_LIMIT:
+	case MLX5_EVENT_TYPE_CQ_ERROR:
+	case MLX5_EVENT_TYPE_WQ_CATAS_ERROR:
+	case MLX5_EVENT_TYPE_PATH_MIG_FAILED:
+	case MLX5_EVENT_TYPE_WQ_INVAL_REQ_ERROR:
+	case MLX5_EVENT_TYPE_WQ_ACCESS_ERROR:
+	case MLX5_EVENT_TYPE_SRQ_CATAS_ERROR:
+	case MLX5_EVENT_TYPE_DCT_DRAINED:
+	case MLX5_EVENT_TYPE_COMP:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static u16 get_legacy_obj_type(u16 opcode)
+{
+	switch (opcode) {
+	case MLX5_CMD_OP_CREATE_RQ:
+	case MLX5_CMD_OP_CREATE_XRC_SRQ:
+		return MLX5_EVENT_QUEUE_TYPE_RQ;
+	case MLX5_CMD_OP_CREATE_QP:
+		return MLX5_EVENT_QUEUE_TYPE_QP;
+	case MLX5_CMD_OP_CREATE_SQ:
+		return MLX5_EVENT_QUEUE_TYPE_SQ;
+	case MLX5_CMD_OP_CREATE_DCT:
+		return MLX5_EVENT_QUEUE_TYPE_DCT;
+	default:
+		return 0;
+	}
+}
+
+static u16 get_dec_obj_type(struct devx_obj *obj, u16 event_num)
+{
+	u16 opcode;
+
+	opcode = (obj->obj_id >> 32) & 0xffff;
+
+	if (is_legacy_obj_event_num(event_num))
+		return get_legacy_obj_type(opcode);
+
+	switch (opcode) {
+	case MLX5_CMD_OP_CREATE_GENERAL_OBJECT:
+		return (obj->obj_id >> 48);
+	case MLX5_CMD_OP_CREATE_RQ:
+		return MLX5_OBJ_TYPE_RQ;
+	case MLX5_CMD_OP_CREATE_QP:
+		return MLX5_OBJ_TYPE_QP;
+	case MLX5_CMD_OP_CREATE_SQ:
+		return MLX5_OBJ_TYPE_SQ;
+	case MLX5_CMD_OP_CREATE_DCT:
+		return MLX5_OBJ_TYPE_DCT;
+	case MLX5_CMD_OP_CREATE_TIR:
+		return MLX5_OBJ_TYPE_TIR;
+	case MLX5_CMD_OP_CREATE_TIS:
+		return MLX5_OBJ_TYPE_TIS;
+	case MLX5_CMD_OP_CREATE_PSV:
+		return MLX5_OBJ_TYPE_PSV;
+	case MLX5_OBJ_TYPE_MKEY:
+		return MLX5_OBJ_TYPE_MKEY;
+	case MLX5_CMD_OP_CREATE_RMP:
+		return MLX5_OBJ_TYPE_RMP;
+	case MLX5_CMD_OP_CREATE_XRC_SRQ:
+		return MLX5_OBJ_TYPE_XRC_SRQ;
+	case MLX5_CMD_OP_CREATE_XRQ:
+		return MLX5_OBJ_TYPE_XRQ;
+	case MLX5_CMD_OP_CREATE_RQT:
+		return MLX5_OBJ_TYPE_RQT;
+	case MLX5_CMD_OP_ALLOC_FLOW_COUNTER:
+		return MLX5_OBJ_TYPE_FLOW_COUNTER;
+	case MLX5_CMD_OP_CREATE_CQ:
+		return MLX5_OBJ_TYPE_CQ;
+	default:
+		return 0;
+	}
+}
+
+static u32 get_dec_obj_id(u64 obj_id)
+{
+	return (obj_id & 0xffffffff);
+}
+
 /*
  * As the obj_id in the firmware is not globally unique the object type
  * must be considered upon checking for a valid object id.
@@ -1143,14 +1275,53 @@ static void devx_cleanup_mkey(struct devx_obj *obj)
 	write_unlock_irqrestore(&table->lock, flags);
 }
 
+static void devx_cleanup_subscription(struct mlx5_ib_dev *dev,
+				      struct devx_event_subscription *sub)
+{
+	list_del_rcu(&sub->file_list);
+	list_del_rcu(&sub->xa_list);
+
+	if (sub->is_obj_related) {
+		struct devx_event *event;
+		struct devx_obj_event *xa_val_level2;
+
+		list_del_rcu(&sub->obj_list);
+
+		/* check whether the key level 1 for this obj_sub_list
+		 * is empty
+		 */
+		event = xa_load(&dev->devx_event_table.event_xa,
+				sub->xa_key_level1);
+		WARN_ON(!event);
+
+		xa_val_level2 = xa_load(&event->object_ids,
+				sub->xa_key_level2);
+		if (list_empty(&xa_val_level2->obj_sub_list)) {
+			xa_erase(&event->object_ids,
+				 sub->xa_key_level2);
+			kfree_rcu(xa_val_level2, rcu);
+		}
+	}
+
+	if (sub->eventfd)
+		eventfd_ctx_put(sub->eventfd);
+
+	kfree_rcu(sub, rcu);
+}
+
 static int devx_obj_cleanup(struct ib_uobject *uobject,
 			    enum rdma_remove_reason why,
 			    struct uverbs_attr_bundle *attrs)
 {
 	u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)];
+	struct mlx5_devx_event_table *devx_event_table;
 	struct devx_obj *obj = uobject->object;
+	struct devx_event_subscription *sub_entry, *tmp;
+	struct mlx5_ib_dev *dev;
+	u32 obj_id;
 	int ret;
 
+	dev = mlx5_udata_to_mdev(&attrs->driver_udata);
 	if (obj->flags & DEVX_OBJ_FLAGS_INDIRECT_MKEY)
 		devx_cleanup_mkey(obj);
 
@@ -1162,10 +1333,15 @@ static int devx_obj_cleanup(struct ib_uobject *uobject,
 	if (ib_is_destroy_retryable(ret, why, uobject))
 		return ret;
 
-	if (obj->flags & DEVX_OBJ_FLAGS_INDIRECT_MKEY) {
-		struct mlx5_ib_dev *dev =
-			mlx5_udata_to_mdev(&attrs->driver_udata);
+	devx_event_table = &dev->devx_event_table;
+	obj_id = get_dec_obj_id(obj->obj_id);
 
+	mutex_lock(&devx_event_table->event_xa_lock);
+	list_for_each_entry_safe(sub_entry, tmp, &obj->event_sub, obj_list)
+		devx_cleanup_subscription(dev, sub_entry);
+	mutex_unlock(&devx_event_table->event_xa_lock);
+
+	if (obj->flags & DEVX_OBJ_FLAGS_INDIRECT_MKEY) {
 		call_srcu(&dev->mr_srcu, &obj->devx_mr.rcu,
 			  devx_free_indirect_mkey);
 		return ret;
@@ -1237,6 +1413,7 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_CREATE)(
 
 	uobj->object = obj;
 	obj->mdev = dev->mdev;
+	INIT_LIST_HEAD(&obj->event_sub);
 	devx_obj_build_destroy_cmd(cmd_in, cmd_out, obj->dinbox, &obj->dinlen,
 				   &obj_id);
 	WARN_ON(obj->dinlen > MLX5_MAX_DESTROY_INBOX_SIZE_DW * sizeof(u32));
@@ -1523,6 +1700,350 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_ASYNC_QUERY)(
 	return err;
 }
 
+static void
+subscribe_event_xa_dealloc(struct mlx5_devx_event_table *devx_event_table,
+			   u32 key_level1,
+			   u32 key_level2,
+			   struct devx_obj_event *alloc_obj_event)
+{
+	struct devx_event *event;
+
+	/* Level 1 is valid for future use - no need to free */
+	if (!alloc_obj_event)
+		return;
+
+	event = xa_load(&devx_event_table->event_xa, key_level1);
+	WARN_ON(!event);
+
+	xa_erase(&event->object_ids, key_level2);
+	kfree(alloc_obj_event);
+}
+
+static int
+subscribe_event_xa_alloc(struct mlx5_devx_event_table *devx_event_table,
+			 u32 key_level1,
+			 bool is_level2,
+			 u32 key_level2,
+			 struct devx_obj_event **alloc_obj_event)
+{
+	struct devx_obj_event *obj_event;
+	struct devx_event *event;
+	bool new_entry_level1 = false;
+	int err;
+
+	event = xa_load(&devx_event_table->event_xa, key_level1);
+	if (!event) {
+		event = kzalloc(sizeof(*event), GFP_KERNEL);
+		if (!event)
+			return -ENOMEM;
+
+		new_entry_level1 = true;
+		INIT_LIST_HEAD(&event->unaffiliated_list);
+		xa_init(&event->object_ids);
+
+		err = xa_insert(&devx_event_table->event_xa,
+				key_level1,
+				event,
+				GFP_KERNEL);
+		if (err)
+			goto end;
+	}
+
+	if (!is_level2)
+		return 0;
+
+	obj_event = xa_load(&event->object_ids, key_level2);
+	if (!obj_event) {
+		err = xa_reserve(&event->object_ids, key_level2, GFP_KERNEL);
+		if (err)
+			goto err_level1;
+
+		obj_event = kzalloc(sizeof(*obj_event), GFP_KERNEL);
+		if (!obj_event) {
+			err = -ENOMEM;
+			goto err_level2;
+		}
+
+		INIT_LIST_HEAD(&obj_event->obj_sub_list);
+		*alloc_obj_event = obj_event;
+	}
+
+	return 0;
+
+err_level2:
+	xa_erase(&event->object_ids, key_level2);
+
+err_level1:
+	if (new_entry_level1)
+		xa_erase(&devx_event_table->event_xa, key_level1);
+end:
+	if (new_entry_level1)
+		kfree(event);
+	return err;
+}
+
+static bool is_valid_events_legacy(int num_events, u16 *event_type_num_list,
+				   struct devx_obj *obj)
+{
+	int i;
+
+	for (i = 0; i < num_events; i++) {
+		if (obj) {
+			if (!is_legacy_obj_event_num(event_type_num_list[i]))
+				return false;
+		} else if (!is_legacy_unaffiliated_event_num(
+				event_type_num_list[i])) {
+			return false;
+		}
+	}
+
+	return true;
+}
+
+#define MAX_SUPP_EVENT_NUM 255
+static bool is_valid_events(struct mlx5_core_dev *dev,
+			    int num_events, u16 *event_type_num_list,
+			    struct devx_obj *obj)
+{
+	__be64 *aff_events;
+	__be64 *unaff_events;
+	int mask_entry;
+	int mask_bit;
+	int i;
+
+	if (MLX5_CAP_GEN(dev, event_cap)) {
+		aff_events = MLX5_CAP_DEV_EVENT(dev,
+						user_affiliated_events);
+		unaff_events = MLX5_CAP_DEV_EVENT(dev,
+						  user_unaffiliated_events);
+	} else {
+		return is_valid_events_legacy(num_events, event_type_num_list,
+					      obj);
+	}
+
+	for (i = 0; i < num_events; i++) {
+		if (event_type_num_list[i] > MAX_SUPP_EVENT_NUM)
+			return false;
+
+		mask_entry = event_type_num_list[i] / 64;
+		mask_bit = event_type_num_list[i] % 64;
+
+		if (obj) {
+			/* CQ completion */
+			if (event_type_num_list[i] == 0)
+				continue;
+
+			if (!(be64_to_cpu(aff_events[mask_entry]) &
+					(1ull << mask_bit)))
+				return false;
+
+			continue;
+		}
+
+		if (!(be64_to_cpu(unaff_events[mask_entry]) &
+				(1ull << mask_bit)))
+			return false;
+	}
+
+	return true;
+}
+
+#define MAX_NUM_EVENTS 16
+static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_SUBSCRIBE_EVENT)(
+	struct uverbs_attr_bundle *attrs)
+{
+	struct ib_uobject *devx_uobj = uverbs_attr_get_uobject(
+				attrs,
+				MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_OBJ_HANDLE);
+	struct mlx5_ib_ucontext *c = rdma_udata_to_drv_context(
+		&attrs->driver_udata, struct mlx5_ib_ucontext, ibucontext);
+	struct mlx5_ib_dev *dev = to_mdev(c->ibucontext.device);
+	struct ib_uobject *fd_uobj;
+	struct devx_obj *obj = NULL;
+	struct devx_async_event_file *ev_file;
+	struct mlx5_devx_event_table *devx_event_table = &dev->devx_event_table;
+	u16 *event_type_num_list;
+	struct devx_event_subscription **event_sub_arr;
+	struct devx_obj_event  **event_obj_array_alloc;
+	int redirect_fd;
+	bool use_eventfd = false;
+	int num_events;
+	int num_alloc_xa_entries = 0;
+	u16 obj_type = 0;
+	u64 cookie = 0;
+	u32 obj_id = 0;
+	int err;
+	int i;
+
+	if (!c->devx_uid)
+		return -EINVAL;
+
+	if (!IS_ERR(devx_uobj)) {
+		obj = (struct devx_obj *)devx_uobj->object;
+		if (obj)
+			obj_id = get_dec_obj_id(obj->obj_id);
+	}
+
+	fd_uobj = uverbs_attr_get_uobject(attrs,
+				MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_FD_HANDLE);
+	if (IS_ERR(fd_uobj))
+		return PTR_ERR(fd_uobj);
+
+	ev_file = container_of(fd_uobj, struct devx_async_event_file,
+			       uobj);
+
+	if (uverbs_attr_is_valid(attrs,
+				 MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_FD_NUM)) {
+		err = uverbs_copy_from(&redirect_fd, attrs,
+			       MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_FD_NUM);
+		if (err)
+			return err;
+
+		use_eventfd = true;
+	}
+
+	if (uverbs_attr_is_valid(attrs,
+				 MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_COOKIE)) {
+		if (use_eventfd)
+			return -EINVAL;
+
+		err = uverbs_copy_from(&cookie, attrs,
+				MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_COOKIE);
+		if (err)
+			return err;
+	}
+
+	num_events = uverbs_attr_ptr_get_array_size(
+		attrs, MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_TYPE_NUM_LIST,
+		sizeof(u16));
+
+	if (num_events < 0)
+		return num_events;
+
+	if (num_events > MAX_NUM_EVENTS)
+		return -EINVAL;
+
+	event_type_num_list = uverbs_attr_get_alloced_ptr(attrs,
+			MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_TYPE_NUM_LIST);
+
+	if (!is_valid_events(dev->mdev, num_events, event_type_num_list, obj))
+		return -EINVAL;
+
+	event_sub_arr = uverbs_zalloc(attrs,
+		MAX_NUM_EVENTS * sizeof(struct devx_event_subscription *));
+	event_obj_array_alloc = uverbs_zalloc(attrs,
+		MAX_NUM_EVENTS * sizeof(struct devx_obj_event *));
+
+	if (!event_sub_arr || !event_obj_array_alloc)
+		return -ENOMEM;
+
+	/* Protect from concurrent subscriptions to same XA entries to allow
+	 * both to succeed
+	 */
+	mutex_lock(&devx_event_table->event_xa_lock);
+	for (i = 0; i < num_events; i++) {
+		u32 key_level1;
+
+		if (obj)
+			obj_type = get_dec_obj_type(obj,
+						    event_type_num_list[i]);
+		key_level1 = event_type_num_list[i] | obj_type << 16;
+
+		err = subscribe_event_xa_alloc(devx_event_table,
+					       key_level1,
+					       obj ? true : false,
+					       obj_id,
+					       &event_obj_array_alloc[i]);
+		if (err)
+			goto err;
+
+		num_alloc_xa_entries++;
+		event_sub_arr[i] = kzalloc(sizeof(*event_sub_arr[i]),
+					   GFP_KERNEL);
+		if (!event_sub_arr[i])
+			goto err;
+
+		if (use_eventfd) {
+			event_sub_arr[i]->eventfd =
+				eventfd_ctx_fdget(redirect_fd);
+
+			if (IS_ERR(event_sub_arr[i]->eventfd)) {
+				err = PTR_ERR(event_sub_arr[i]->eventfd);
+				event_sub_arr[i]->eventfd = NULL;
+				goto err;
+			}
+		}
+
+		event_sub_arr[i]->cookie = cookie;
+		event_sub_arr[i]->fd_uobj = fd_uobj;
+		event_sub_arr[i]->object = fd_uobj->object;
+		/* May be needed upon cleanup the devx object/subscription */
+		event_sub_arr[i]->xa_key_level1 = key_level1;
+		event_sub_arr[i]->xa_key_level2 = obj_id;
+		event_sub_arr[i]->is_obj_related = obj ? true : false;
+	}
+
+	/* Once all the allocations and the reservations for level 2
+	 * have been done we can go ahead and insert the data without any
+	 * concern of a failure.
+	 */
+	for (i = 0; i < num_events; i++) {
+		struct devx_event *event;
+		struct devx_obj_event  *obj_event;
+
+		spin_lock_irq(&ev_file->ev_queue.lock);
+		list_add_tail_rcu(&event_sub_arr[i]->file_list,
+				  &ev_file->subscribed_events_list);
+		spin_unlock_irq(&ev_file->ev_queue.lock);
+
+		event = xa_load(&devx_event_table->event_xa,
+				event_sub_arr[i]->xa_key_level1);
+		WARN_ON(!event);
+
+		if (!obj) {
+			list_add_tail_rcu(&event_sub_arr[i]->xa_list,
+					  &event->unaffiliated_list);
+			continue;
+		}
+
+		/* If was no entry prior calling this API need to insert
+		 * this XA key level 2 entry.
+		 */
+		if (event_obj_array_alloc[i]) {
+			err = xa_err(xa_store(&event->object_ids, obj_id,
+					      event_obj_array_alloc[i],
+					      GFP_KERNEL));
+			WARN_ON(err);
+		}
+
+		obj_event = xa_load(&event->object_ids, obj_id);
+		WARN_ON(!obj_event);
+		list_add_tail_rcu(&event_sub_arr[i]->xa_list,
+				  &obj_event->obj_sub_list);
+		list_add_tail_rcu(&event_sub_arr[i]->obj_list,
+				  &obj->event_sub);
+	}
+
+	mutex_unlock(&devx_event_table->event_xa_lock);
+	return 0;
+
+err:
+	for (i = 0; i < num_alloc_xa_entries; i++) {
+		subscribe_event_xa_dealloc(devx_event_table,
+					   event_sub_arr[i]->xa_key_level1,
+					   obj_id,
+					   event_obj_array_alloc[i]);
+
+		if (event_sub_arr[i]->eventfd)
+			eventfd_ctx_put(event_sub_arr[i]->eventfd);
+
+		kvfree(event_sub_arr[i]);
+	}
+
+	mutex_unlock(&devx_event_table->event_xa_lock);
+	return err;
+}
+
 static int devx_umem_get(struct mlx5_ib_dev *dev, struct ib_ucontext *ucontext,
 			 struct uverbs_attr_bundle *attrs,
 			 struct devx_umem *obj)
@@ -1689,14 +2210,21 @@ void mlx5_ib_devx_init_event_table(struct mlx5_ib_dev *dev)
 void mlx5_ib_devx_cleanup_event_table(struct mlx5_ib_dev *dev)
 {
 	struct mlx5_devx_event_table *table = &dev->devx_event_table;
+	struct devx_event_subscription *sub, *tmp;
+	struct devx_event *event;
 	void *entry;
 	unsigned long id;
 
 	mlx5_eq_notifier_unregister(dev->mdev, &table->devx_nb);
-
-	xa_for_each(&table->event_xa, id, entry)
+	mutex_lock(&dev->devx_event_table.event_xa_lock);
+	xa_for_each(&table->event_xa, id, entry) {
+		event = entry;
+		list_for_each_entry_safe(sub, tmp, &event->unaffiliated_list,
+					 xa_list)
+			devx_cleanup_subscription(dev, sub);
 		kfree(entry);
-
+	}
+	mutex_unlock(&dev->devx_event_table.event_xa_lock);
 	xa_destroy(&table->event_xa);
 }
 
@@ -1980,10 +2508,32 @@ DECLARE_UVERBS_NAMED_METHOD(
 		UVERBS_ATTR_TYPE(u64),
 		UA_MANDATORY));
 
+DECLARE_UVERBS_NAMED_METHOD(
+	MLX5_IB_METHOD_DEVX_SUBSCRIBE_EVENT,
+	UVERBS_ATTR_FD(MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_FD_HANDLE,
+		MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD,
+		UVERBS_ACCESS_READ,
+		UA_MANDATORY),
+	UVERBS_ATTR_IDR(MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_OBJ_HANDLE,
+		MLX5_IB_OBJECT_DEVX_OBJ,
+		UVERBS_ACCESS_READ,
+		UA_OPTIONAL),
+	UVERBS_ATTR_PTR_IN(MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_TYPE_NUM_LIST,
+		UVERBS_ATTR_MIN_SIZE(sizeof(u16)),
+		UA_MANDATORY,
+		UA_ALLOC_AND_COPY),
+	UVERBS_ATTR_PTR_IN(MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_COOKIE,
+		UVERBS_ATTR_TYPE(u64),
+		UA_OPTIONAL),
+	UVERBS_ATTR_PTR_IN(MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_FD_NUM,
+		UVERBS_ATTR_TYPE(u32),
+		UA_OPTIONAL));
+
 DECLARE_UVERBS_GLOBAL_METHODS(MLX5_IB_OBJECT_DEVX,
 			      &UVERBS_METHOD(MLX5_IB_METHOD_DEVX_OTHER),
 			      &UVERBS_METHOD(MLX5_IB_METHOD_DEVX_QUERY_UAR),
-			      &UVERBS_METHOD(MLX5_IB_METHOD_DEVX_QUERY_EQN));
+			      &UVERBS_METHOD(MLX5_IB_METHOD_DEVX_QUERY_EQN),
+			      &UVERBS_METHOD(MLX5_IB_METHOD_DEVX_SUBSCRIBE_EVENT));
 
 DECLARE_UVERBS_NAMED_OBJECT(MLX5_IB_OBJECT_DEVX_OBJ,
 			    UVERBS_TYPE_ALLOC_IDR(devx_obj_cleanup),
diff --git a/include/uapi/rdma/mlx5_user_ioctl_cmds.h b/include/uapi/rdma/mlx5_user_ioctl_cmds.h
index 6ad8f4f11ddd..d0da070cf0ab 100644
--- a/include/uapi/rdma/mlx5_user_ioctl_cmds.h
+++ b/include/uapi/rdma/mlx5_user_ioctl_cmds.h
@@ -51,6 +51,7 @@ enum mlx5_ib_devx_methods {
 	MLX5_IB_METHOD_DEVX_OTHER  = (1U << UVERBS_ID_NS_SHIFT),
 	MLX5_IB_METHOD_DEVX_QUERY_UAR,
 	MLX5_IB_METHOD_DEVX_QUERY_EQN,
+	MLX5_IB_METHOD_DEVX_SUBSCRIBE_EVENT,
 };
 
 enum  mlx5_ib_devx_other_attrs {
@@ -93,6 +94,14 @@ enum mlx5_ib_devx_obj_query_async_attrs {
 	MLX5_IB_ATTR_DEVX_OBJ_QUERY_ASYNC_OUT_LEN,
 };
 
+enum mlx5_ib_devx_subscribe_event_attrs {
+	MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_FD_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+	MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_OBJ_HANDLE,
+	MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_TYPE_NUM_LIST,
+	MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_FD_NUM,
+	MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_COOKIE,
+};
+
 enum  mlx5_ib_devx_query_eqn_attrs {
 	MLX5_IB_ATTR_DEVX_QUERY_EQN_USER_VEC = (1U << UVERBS_ID_NS_SHIFT),
 	MLX5_IB_ATTR_DEVX_QUERY_EQN_DEV_EQN,
-- 
2.20.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH rdma-next v1 11/12] IB/mlx5: Implement DEVX dispatching event
  2019-06-18 17:15 [PATCH rdma-next v1 00/12] DEVX asynchronous events Leon Romanovsky
                   ` (9 preceding siblings ...)
  2019-06-18 17:15 ` [PATCH rdma-next v1 10/12] IB/mlx5: Enable subscription for device events over DEVX Leon Romanovsky
@ 2019-06-18 17:15 ` Leon Romanovsky
  2019-06-24 12:03   ` Jason Gunthorpe
  2019-06-18 17:15 ` [PATCH rdma-next v1 12/12] IB/mlx5: Add DEVX support for CQ events Leon Romanovsky
  2019-06-18 18:51 ` [PATCH rdma-next v1 00/12] DEVX asynchronous events Saeed Mahameed
  12 siblings, 1 reply; 35+ messages in thread
From: Leon Romanovsky @ 2019-06-18 17:15 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Yishai Hadas, Saeed Mahameed,
	linux-netdev

From: Yishai Hadas <yishaih@mellanox.com>

Implement DEVX dispatching event by looking up for the applicable
subscriptions for the reported event and using their target fd to
signal/set the event.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/devx.c         | 362 +++++++++++++++++++++-
 include/uapi/rdma/mlx5_user_ioctl_verbs.h |   5 +
 2 files changed, 357 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index 304b13e7a265..49fdce95d6d9 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -34,6 +34,11 @@ struct devx_async_data {
 	struct mlx5_ib_uapi_devx_async_cmd_hdr hdr;
 };
 
+struct devx_async_event_data {
+	struct list_head list; /* headed in ev_queue->event_list */
+	struct mlx5_ib_uapi_devx_async_event_hdr hdr;
+};
+
 /* first level XA value data structure */
 struct devx_event {
 	struct xarray object_ids; /* second XA level, Key = object id */
@@ -54,7 +59,9 @@ struct devx_event_subscription {
 				   * devx_obj_event->obj_sub_list
 				   */
 	struct list_head obj_list; /* headed in devx_object */
+	struct list_head event_list; /* headed in ev_queue->event_list */
 
+	u8  is_cleaned:1;
 	u32 xa_key_level1;
 	u32 xa_key_level2;
 	struct rcu_head	rcu;
@@ -71,6 +78,7 @@ struct devx_async_event_queue {
 	struct list_head	event_list;
 	atomic_t		bytes_in_use;
 	u8			is_destroyed:1;
+	u8			is_overflow_err:1;
 	u32			flags;
 };
 
@@ -294,6 +302,46 @@ static u16 get_dec_obj_type(struct devx_obj *obj, u16 event_num)
 	}
 }
 
+/* Any future affiliated event should have a fixed header to get the obj
+ * type and id including events on legacy objects.
+ */
+static u32 get_affiliated_event_obj_id(struct mlx5_eqe *eqe)
+{
+	u32 obj_id = MLX5_GET(affiliated_event_header, eqe, obj_id);
+
+	return obj_id;
+}
+
+static u16 get_affiliated_event_obj_type(struct mlx5_eqe *eqe)
+{
+	u16 obj_type = MLX5_GET(affiliated_event_header, eqe, obj_type);
+
+	return obj_type;
+}
+
+static u16 get_event_obj_type(unsigned long event_type, struct mlx5_eqe *eqe)
+{
+	switch (event_type) {
+	case MLX5_EVENT_TYPE_WQ_CATAS_ERROR:
+	case MLX5_EVENT_TYPE_WQ_ACCESS_ERROR:
+	case MLX5_EVENT_TYPE_WQ_INVAL_REQ_ERROR:
+	case MLX5_EVENT_TYPE_SRQ_LAST_WQE:
+	case MLX5_EVENT_TYPE_PATH_MIG:
+	case MLX5_EVENT_TYPE_PATH_MIG_FAILED:
+	case MLX5_EVENT_TYPE_COMM_EST:
+	case MLX5_EVENT_TYPE_SQ_DRAINED:
+	case MLX5_EVENT_TYPE_SRQ_RQ_LIMIT:
+	case MLX5_EVENT_TYPE_SRQ_CATAS_ERROR:
+		return eqe->data.qp_srq.type;
+	case MLX5_EVENT_TYPE_CQ_ERROR:
+		return 0;
+	case MLX5_EVENT_TYPE_DCT_DRAINED:
+		return MLX5_EVENT_QUEUE_TYPE_DCT;
+	default:
+		return get_affiliated_event_obj_type(eqe);
+	}
+}
+
 static u32 get_dec_obj_id(u64 obj_id)
 {
 	return (obj_id & 0xffffffff);
@@ -1276,9 +1324,13 @@ static void devx_cleanup_mkey(struct devx_obj *obj)
 }
 
 static void devx_cleanup_subscription(struct mlx5_ib_dev *dev,
-				      struct devx_event_subscription *sub)
+				      struct devx_event_subscription *sub,
+				      bool file_close)
 {
-	list_del_rcu(&sub->file_list);
+	if (sub->is_cleaned)
+		goto end;
+
+	sub->is_cleaned = 1;
 	list_del_rcu(&sub->xa_list);
 
 	if (sub->is_obj_related) {
@@ -1303,10 +1355,15 @@ static void devx_cleanup_subscription(struct mlx5_ib_dev *dev,
 		}
 	}
 
-	if (sub->eventfd)
-		eventfd_ctx_put(sub->eventfd);
+end:
+	if (file_close) {
+		if (sub->eventfd)
+			eventfd_ctx_put(sub->eventfd);
 
-	kfree_rcu(sub, rcu);
+		list_del_rcu(&sub->file_list);
+		/* subscription may not be used by the read API any more */
+		kfree_rcu(sub, rcu);
+	}
 }
 
 static int devx_obj_cleanup(struct ib_uobject *uobject,
@@ -1338,7 +1395,7 @@ static int devx_obj_cleanup(struct ib_uobject *uobject,
 
 	mutex_lock(&devx_event_table->event_xa_lock);
 	list_for_each_entry_safe(sub_entry, tmp, &obj->event_sub, obj_list)
-		devx_cleanup_subscription(dev, sub_entry);
+		devx_cleanup_subscription(dev, sub_entry, false);
 	mutex_unlock(&devx_event_table->event_xa_lock);
 
 	if (obj->flags & DEVX_OBJ_FLAGS_INDIRECT_MKEY) {
@@ -1588,6 +1645,7 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_ASYNC_EVENT_FD_ALLOC)(
 	devx_init_event_queue(&ev_file->ev_queue, flags);
 	INIT_LIST_HEAD(&ev_file->subscribed_events_list);
 	ev_file->dev = dev;
+	get_device(&dev->ib_dev.dev);
 	return 0;
 }
 
@@ -1981,6 +2039,7 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_SUBSCRIBE_EVENT)(
 		event_sub_arr[i]->xa_key_level1 = key_level1;
 		event_sub_arr[i]->xa_key_level2 = obj_id;
 		event_sub_arr[i]->is_obj_related = obj ? true : false;
+		INIT_LIST_HEAD(&event_sub_arr[i]->event_list);
 	}
 
 	/* Once all the allocations and the reservations for level 2
@@ -2191,10 +2250,174 @@ static int devx_umem_cleanup(struct ib_uobject *uobject,
 	return 0;
 }
 
+static bool is_unaffiliated_event(struct mlx5_core_dev *dev,
+				  unsigned long event_type)
+{
+	__be64 *unaff_events;
+	int mask_entry;
+	int mask_bit;
+
+	if (!MLX5_CAP_GEN(dev, event_cap))
+		return is_legacy_unaffiliated_event_num(event_type);
+
+	unaff_events = MLX5_CAP_DEV_EVENT(dev,
+					  user_unaffiliated_events);
+	WARN_ON(event_type > MAX_SUPP_EVENT_NUM);
+
+	mask_entry = event_type / 64;
+	mask_bit = event_type % 64;
+
+	if (!(be64_to_cpu(unaff_events[mask_entry]) & (1ull << mask_bit)))
+		return false;
+
+	return true;
+}
+
+static u32 devx_get_obj_id_from_event(unsigned long event_type, void *data)
+{
+	struct mlx5_eqe *eqe = data;
+	u32 obj_id = 0;
+
+	switch (event_type) {
+	case MLX5_EVENT_TYPE_SRQ_CATAS_ERROR:
+	case MLX5_EVENT_TYPE_SRQ_RQ_LIMIT:
+	case MLX5_EVENT_TYPE_PATH_MIG:
+	case MLX5_EVENT_TYPE_COMM_EST:
+	case MLX5_EVENT_TYPE_SQ_DRAINED:
+	case MLX5_EVENT_TYPE_SRQ_LAST_WQE:
+	case MLX5_EVENT_TYPE_WQ_CATAS_ERROR:
+	case MLX5_EVENT_TYPE_PATH_MIG_FAILED:
+	case MLX5_EVENT_TYPE_WQ_INVAL_REQ_ERROR:
+	case MLX5_EVENT_TYPE_WQ_ACCESS_ERROR:
+		obj_id = be32_to_cpu(eqe->data.qp_srq.qp_srq_n) & 0xffffff;
+		break;
+	case MLX5_EVENT_TYPE_DCT_DRAINED:
+		obj_id = be32_to_cpu(eqe->data.dct.dctn) & 0xffffff;
+		break;
+	case MLX5_EVENT_TYPE_CQ_ERROR:
+		obj_id = be32_to_cpu(eqe->data.cq_err.cqn) & 0xffffff;
+		break;
+	default:
+		obj_id = get_affiliated_event_obj_id(data);
+		break;
+	}
+
+	return obj_id;
+}
+
+static int deliver_event(struct devx_event_subscription *event_sub,
+			 const void *data)
+{
+	struct ib_uobject *fd_uobj = event_sub->fd_uobj;
+	struct devx_async_event_file *ev_file;
+	struct devx_async_event_queue *ev_queue;
+	struct devx_async_event_data *event_data;
+	unsigned long flags;
+	bool omit_data;
+
+	ev_file = container_of(fd_uobj, struct devx_async_event_file,
+			       uobj);
+	ev_queue = &ev_file->ev_queue;
+	omit_data = ev_queue->flags &
+		MLX5_IB_UAPI_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA;
+
+	if (omit_data) {
+		spin_lock_irqsave(&ev_queue->lock, flags);
+		if (!list_empty(&event_sub->event_list)) {
+			spin_unlock_irqrestore(&ev_queue->lock, flags);
+			return 0;
+		}
+
+		list_add_tail(&event_sub->event_list, &ev_queue->event_list);
+		spin_unlock_irqrestore(&ev_queue->lock, flags);
+		wake_up_interruptible(&ev_queue->poll_wait);
+		return 0;
+	}
+
+	event_data = kzalloc(sizeof(*event_data) +
+			     (omit_data ? 0 : sizeof(struct mlx5_eqe)),
+			     GFP_ATOMIC);
+	if (!event_data) {
+		spin_lock_irqsave(&ev_queue->lock, flags);
+		ev_queue->is_overflow_err = 1;
+		spin_unlock_irqrestore(&ev_queue->lock, flags);
+		return -ENOMEM;
+	}
+
+	event_data->hdr.cookie = event_sub->cookie;
+	memcpy(event_data->hdr.out_data, data, sizeof(struct mlx5_eqe));
+
+	spin_lock_irqsave(&ev_queue->lock, flags);
+	list_add_tail(&event_data->list, &ev_queue->event_list);
+	spin_unlock_irqrestore(&ev_queue->lock, flags);
+	wake_up_interruptible(&ev_queue->poll_wait);
+
+	return 0;
+}
+
+static void dispatch_event_fd(struct list_head *fd_list,
+			      const void *data)
+{
+	struct devx_event_subscription *item;
+
+	list_for_each_entry_rcu(item, fd_list, xa_list) {
+		if (!get_file_rcu((struct file *)item->object))
+			continue;
+
+		if (item->eventfd) {
+			eventfd_signal(item->eventfd, 1);
+			fput(item->object);
+			continue;
+		}
+
+		deliver_event(item, data);
+		fput(item->object);
+	}
+}
+
 static int devx_event_notifier(struct notifier_block *nb,
 			       unsigned long event_type, void *data)
 {
-	return NOTIFY_DONE;
+	struct mlx5_devx_event_table *table;
+	struct mlx5_ib_dev *dev;
+	struct devx_event *event;
+	struct devx_obj_event *obj_event;
+	u16 obj_type = 0;
+	bool is_unaffiliated;
+	u32 obj_id;
+
+	/* Explicit filtering to kernel events which may occur frequently */
+	if (event_type == MLX5_EVENT_TYPE_CMD ||
+	    event_type == MLX5_EVENT_TYPE_PAGE_REQUEST)
+		return NOTIFY_OK;
+
+	table = container_of(nb, struct mlx5_devx_event_table, devx_nb.nb);
+	dev = container_of(table, struct mlx5_ib_dev, devx_event_table);
+	is_unaffiliated = is_unaffiliated_event(dev->mdev, event_type);
+
+	if (!is_unaffiliated)
+		obj_type = get_event_obj_type(event_type, data);
+	event = xa_load(&table->event_xa, event_type | (obj_type << 16));
+	if (!event)
+		return NOTIFY_DONE;
+
+	if (is_unaffiliated) {
+		dispatch_event_fd(&event->unaffiliated_list, data);
+		return NOTIFY_OK;
+	}
+
+	obj_id = devx_get_obj_id_from_event(event_type, data);
+	rcu_read_lock();
+	obj_event = xa_load(&event->object_ids, obj_id);
+	if (!obj_event) {
+		rcu_read_unlock();
+		return NOTIFY_DONE;
+	}
+
+	dispatch_event_fd(&obj_event->obj_sub_list, data);
+
+	rcu_read_unlock();
+	return NOTIFY_OK;
 }
 
 void mlx5_ib_devx_init_event_table(struct mlx5_ib_dev *dev)
@@ -2221,7 +2444,7 @@ void mlx5_ib_devx_cleanup_event_table(struct mlx5_ib_dev *dev)
 		event = entry;
 		list_for_each_entry_safe(sub, tmp, &event->unaffiliated_list,
 					 xa_list)
-			devx_cleanup_subscription(dev, sub);
+			devx_cleanup_subscription(dev, sub, false);
 		kfree(entry);
 	}
 	mutex_unlock(&dev->devx_event_table.event_xa_lock);
@@ -2329,18 +2552,126 @@ static const struct file_operations devx_async_cmd_event_fops = {
 static ssize_t devx_async_event_read(struct file *filp, char __user *buf,
 				     size_t count, loff_t *pos)
 {
-	return -EINVAL;
+	struct devx_async_event_file *ev_file = filp->private_data;
+	struct devx_async_event_queue *ev_queue = &ev_file->ev_queue;
+	struct devx_event_subscription *event_sub;
+	struct devx_async_event_data *uninitialized_var(event);
+	int ret = 0;
+	size_t eventsz;
+	bool omit_data;
+	void *event_data;
+
+	omit_data = ev_queue->flags &
+		MLX5_IB_UAPI_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA;
+
+	spin_lock_irq(&ev_queue->lock);
+
+	if (ev_queue->is_overflow_err) {
+		ev_queue->is_overflow_err = 0;
+		spin_unlock_irq(&ev_queue->lock);
+		return -EOVERFLOW;
+	}
+
+	while (list_empty(&ev_queue->event_list)) {
+		spin_unlock_irq(&ev_queue->lock);
+
+		if (filp->f_flags & O_NONBLOCK)
+			return -EAGAIN;
+
+		if (wait_event_interruptible(ev_queue->poll_wait,
+			    (!list_empty(&ev_queue->event_list) ||
+			     ev_queue->is_destroyed))) {
+			return -ERESTARTSYS;
+		}
+
+		if (list_empty(&ev_queue->event_list) &&
+		    ev_queue->is_destroyed)
+			return -EIO;
+
+		spin_lock_irq(&ev_queue->lock);
+	}
+
+	if (omit_data) {
+		event_sub = list_first_entry(&ev_queue->event_list,
+					struct devx_event_subscription,
+					event_list);
+		eventsz = sizeof(event_sub->cookie);
+		event_data = &event_sub->cookie;
+	} else {
+		event = list_first_entry(&ev_queue->event_list,
+				      struct devx_async_event_data, list);
+		eventsz = sizeof(struct mlx5_eqe) +
+			sizeof(struct mlx5_ib_uapi_devx_async_event_hdr);
+		event_data = &event->hdr;
+	}
+
+	if (eventsz > count) {
+		spin_unlock_irq(&ev_queue->lock);
+		return -ENOSPC;
+	}
+
+	if (omit_data)
+		list_del_init(&event_sub->event_list);
+	else
+		list_del(&event->list);
+
+	spin_unlock_irq(&ev_queue->lock);
+
+	if (copy_to_user(buf, event_data, eventsz))
+		ret = -EFAULT;
+	else
+		ret = eventsz;
+
+	if (!omit_data)
+		kfree(event);
+	return ret;
 }
 
 static __poll_t devx_async_event_poll(struct file *filp,
 				      struct poll_table_struct *wait)
 {
-	return 0;
+	struct devx_async_event_file *ev_file = filp->private_data;
+	struct devx_async_event_queue *ev_queue = &ev_file->ev_queue;
+	__poll_t pollflags = 0;
+
+	poll_wait(filp, &ev_queue->poll_wait, wait);
+
+	spin_lock_irq(&ev_queue->lock);
+	if (ev_queue->is_destroyed)
+		pollflags = EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
+	else if (!list_empty(&ev_queue->event_list))
+		pollflags = EPOLLIN | EPOLLRDNORM;
+	spin_unlock_irq(&ev_queue->lock);
+
+	return pollflags;
 }
 
 static int devx_async_event_close(struct inode *inode, struct file *filp)
 {
+	struct ib_uobject *uobj = filp->private_data;
+	struct devx_async_event_file *ev_file =
+		container_of(uobj, struct devx_async_event_file, uobj);
+	struct devx_event_subscription *event_sub, *event_sub_tmp;
+	struct devx_async_event_data *entry, *tmp;
+
+	mutex_lock(&ev_file->dev->devx_event_table.event_xa_lock);
+	/* delete the subscriptions which are related to this FD */
+	list_for_each_entry_safe(event_sub, event_sub_tmp,
+				 &ev_file->subscribed_events_list, file_list)
+		devx_cleanup_subscription(ev_file->dev, event_sub, true);
+	mutex_unlock(&ev_file->dev->devx_event_table.event_xa_lock);
+
+	/* free the pending events allocation */
+	if (!(ev_file->ev_queue.flags &
+	    MLX5_IB_UAPI_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA)) {
+		spin_lock_irq(&ev_file->ev_queue.lock);
+		list_for_each_entry_safe(entry, tmp,
+					 &ev_file->ev_queue.event_list, list)
+			kfree(entry); /* read can't come any nore */
+		spin_unlock_irq(&ev_file->ev_queue.lock);
+	}
 	uverbs_close_fd(filp);
+	put_device(&ev_file->dev->ib_dev.dev);
 	return 0;
 }
 
@@ -2374,6 +2705,17 @@ static int devx_hot_unplug_async_cmd_event_file(struct ib_uobject *uobj,
 static int devx_hot_unplug_async_event_file(struct ib_uobject *uobj,
 					    enum rdma_remove_reason why)
 {
+	struct devx_async_event_file *ev_file =
+		container_of(uobj, struct devx_async_event_file,
+			     uobj);
+	struct devx_async_event_queue *ev_queue = &ev_file->ev_queue;
+
+	spin_lock_irq(&ev_queue->lock);
+	ev_queue->is_destroyed = 1;
+	spin_unlock_irq(&ev_queue->lock);
+
+	if (why == RDMA_REMOVE_DRIVER_REMOVE)
+		wake_up_interruptible(&ev_queue->poll_wait);
 	return 0;
 };
 
diff --git a/include/uapi/rdma/mlx5_user_ioctl_verbs.h b/include/uapi/rdma/mlx5_user_ioctl_verbs.h
index 57beea4589e4..9500ff7363ef 100644
--- a/include/uapi/rdma/mlx5_user_ioctl_verbs.h
+++ b/include/uapi/rdma/mlx5_user_ioctl_verbs.h
@@ -67,5 +67,10 @@ enum mlx5_ib_uapi_devx_create_event_channel_flags {
 	MLX5_IB_UAPI_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA = 1 << 0,
 };
 
+struct mlx5_ib_uapi_devx_async_event_hdr {
+	__aligned_u64	cookie;
+	__u8		out_data[];
+};
+
 #endif
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH rdma-next v1 12/12] IB/mlx5: Add DEVX support for CQ events
  2019-06-18 17:15 [PATCH rdma-next v1 00/12] DEVX asynchronous events Leon Romanovsky
                   ` (10 preceding siblings ...)
  2019-06-18 17:15 ` [PATCH rdma-next v1 11/12] IB/mlx5: Implement DEVX dispatching event Leon Romanovsky
@ 2019-06-18 17:15 ` Leon Romanovsky
  2019-06-24 12:04   ` Jason Gunthorpe
  2019-06-18 18:51 ` [PATCH rdma-next v1 00/12] DEVX asynchronous events Saeed Mahameed
  12 siblings, 1 reply; 35+ messages in thread
From: Leon Romanovsky @ 2019-06-18 17:15 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Yishai Hadas, Saeed Mahameed,
	linux-netdev

From: Yishai Hadas <yishaih@mellanox.com>

Add DEVX support for CQ events by creating and destroying the CQ via
mlx5_core and set an handler to manage its completions.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/devx.c | 40 +++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index 49fdce95d6d9..91ccd58ebc05 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -19,9 +19,12 @@
 #define UVERBS_MODULE_NAME mlx5_ib
 #include <rdma/uverbs_named_ioctl.h>
 
+static void dispatch_event_fd(struct list_head *fd_list, const void *data);
+
 enum devx_obj_flags {
 	DEVX_OBJ_FLAGS_INDIRECT_MKEY = 1 << 0,
 	DEVX_OBJ_FLAGS_DCT = 1 << 1,
+	DEVX_OBJ_FLAGS_CQ = 1 << 2,
 };
 
 struct devx_async_data {
@@ -94,6 +97,7 @@ struct devx_async_event_file {
 #define MLX5_MAX_DESTROY_INBOX_SIZE_DW MLX5_ST_SZ_DW(delete_fte_in)
 struct devx_obj {
 	struct mlx5_core_dev	*mdev;
+	struct mlx5_ib_dev	*ib_dev;
 	u64			obj_id;
 	u32			dinlen; /* destroy inbox length */
 	u32			dinbox[MLX5_MAX_DESTROY_INBOX_SIZE_DW];
@@ -101,6 +105,7 @@ struct devx_obj {
 	union {
 		struct mlx5_ib_devx_mr	devx_mr;
 		struct mlx5_core_dct	core_dct;
+		struct mlx5_core_cq	core_cq;
 	};
 	struct list_head event_sub; /* holds devx_event_subscription entries */
 };
@@ -1384,6 +1389,8 @@ static int devx_obj_cleanup(struct ib_uobject *uobject,
 
 	if (obj->flags & DEVX_OBJ_FLAGS_DCT)
 		ret = mlx5_core_destroy_dct(obj->mdev, &obj->core_dct);
+	else if (obj->flags & DEVX_OBJ_FLAGS_CQ)
+		ret = mlx5_core_destroy_cq(obj->mdev, &obj->core_cq);
 	else
 		ret = mlx5_cmd_exec(obj->mdev, obj->dinbox, obj->dinlen, out,
 				    sizeof(out));
@@ -1408,6 +1415,30 @@ static int devx_obj_cleanup(struct ib_uobject *uobject,
 	return ret;
 }
 
+static void devx_cq_comp(struct mlx5_core_cq *mcq, struct mlx5_eqe *eqe)
+{
+	struct devx_obj *obj = container_of(mcq, struct devx_obj, core_cq);
+	struct mlx5_devx_event_table *table;
+	struct devx_event *event;
+	struct devx_obj_event *obj_event;
+	u32 obj_id = mcq->cqn;
+
+	table = &obj->ib_dev->devx_event_table;
+	event = xa_load(&table->event_xa, MLX5_EVENT_TYPE_COMP);
+	if (!event)
+		return;
+
+	rcu_read_lock();
+	obj_event = xa_load(&event->object_ids, obj_id);
+	if (!obj_event) {
+		rcu_read_unlock();
+		return;
+	}
+
+	dispatch_event_fd(&obj_event->obj_sub_list, eqe);
+	rcu_read_unlock();
+}
+
 static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_CREATE)(
 	struct uverbs_attr_bundle *attrs)
 {
@@ -1459,6 +1490,12 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_CREATE)(
 		err = mlx5_core_create_dct(dev->mdev, &obj->core_dct,
 					   cmd_in, cmd_in_len,
 					   cmd_out, cmd_out_len);
+	} else if (opcode == MLX5_CMD_OP_CREATE_CQ) {
+		obj->flags |= DEVX_OBJ_FLAGS_CQ;
+		obj->core_cq.comp = devx_cq_comp;
+		err = mlx5_core_create_cq(dev->mdev, &obj->core_cq,
+					  cmd_in, cmd_in_len, cmd_out,
+					  cmd_out_len);
 	} else {
 		err = mlx5_cmd_exec(dev->mdev, cmd_in,
 				    cmd_in_len,
@@ -1471,6 +1508,7 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_CREATE)(
 	uobj->object = obj;
 	obj->mdev = dev->mdev;
 	INIT_LIST_HEAD(&obj->event_sub);
+	obj->ib_dev = dev;
 	devx_obj_build_destroy_cmd(cmd_in, cmd_out, obj->dinbox, &obj->dinlen,
 				   &obj_id);
 	WARN_ON(obj->dinlen > MLX5_MAX_DESTROY_INBOX_SIZE_DW * sizeof(u32));
@@ -1498,6 +1536,8 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_CREATE)(
 obj_destroy:
 	if (obj->flags & DEVX_OBJ_FLAGS_DCT)
 		mlx5_core_destroy_dct(obj->mdev, &obj->core_dct);
+	else if (obj->flags & DEVX_OBJ_FLAGS_CQ)
+		mlx5_core_destroy_cq(obj->mdev, &obj->core_cq);
 	else
 		mlx5_cmd_exec(obj->mdev, obj->dinbox, obj->dinlen, out,
 			      sizeof(out));
-- 
2.20.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 00/12] DEVX asynchronous events
  2019-06-18 17:15 [PATCH rdma-next v1 00/12] DEVX asynchronous events Leon Romanovsky
                   ` (11 preceding siblings ...)
  2019-06-18 17:15 ` [PATCH rdma-next v1 12/12] IB/mlx5: Add DEVX support for CQ events Leon Romanovsky
@ 2019-06-18 18:51 ` Saeed Mahameed
  2019-06-19  4:45   ` Leon Romanovsky
  12 siblings, 1 reply; 35+ messages in thread
From: Saeed Mahameed @ 2019-06-18 18:51 UTC (permalink / raw)
  To: Jason Gunthorpe, leon, dledford
  Cc: Yishai Hadas, netdev, Leon Romanovsky, linux-rdma

On Tue, 2019-06-18 at 20:15 +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> Changelog:
>  v0 -> v1:

Normally 1st submission is V1 and 2nd is V2.
so this should have been v1->v2.

For mlx5-next patches:

Acked-by: Saeed Mahameed <saeedm@mellanox.com>


>  * Fix the unbind / hot unplug flows to work properly.
>  * Fix Ref count handling on the eventfd mode in some flow.
>  * Rebased to latest rdma-next
> 
> Thanks
> 
> -------------------------------------------------------------------
> -----------------
> From Yishai:
> 
> This series enables RDMA applications that use the DEVX interface to
> subscribe and read device asynchronous events.
> 
> The solution is designed to allow extension of events in the future
> without need to perform any changes in the driver code.
> 
> To enable that few changes had been done in mlx5_core, it includes:
>  * Reading device event capabilities that are user related
>    (affiliated and un-affiliated) and set the matching mask upon
>    creating the matching EQ.
>  * Enable DEVX/mlx5_ib to register for ANY event instead of the
> option to
>    get some hard-coded ones.
>  * Enable DEVX/mlx5_ib to get the device raw data for CQ completion
> events.
>  * Enhance mlx5_core_create/destroy CQ to enable DEVX using them so
> that CQ
>    events will be reported as well.
> 
> In mlx5_ib layer the below changes were done:
>  * A new DEVX API was introduced to allocate an event channel by
> using
>    the uverbs FD object type.
>  * Implement the FD channel operations to enable read/poo/close over
> it.
>  * A new DEVX API was introduced to subscribe for specific events
> over an
>    event channel.
>  * Manage an internal data structure  over XA(s) to
> subscribe/dispatch events
>    over the different event channels.
>  * Use from DEVX the mlx5_core APIs to create/destroy a CQ to be able
> to
>    get its relevant events.
> 
> Yishai
> 
> Yishai Hadas (12):
>   net/mlx5: Fix mlx5_core_destroy_cq() error flow
>   net/mlx5: Use event mask based on device capabilities
>   net/mlx5: Expose the API to register for ANY event
>   net/mlx5: mlx5_core_create_cq() enhancements
>   net/mlx5: Report a CQ error event only when a handler was set
>   net/mlx5: Report EQE data upon CQ completion
>   net/mlx5: Expose device definitions for object events
>   IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD
>   IB/mlx5: Register DEVX with mlx5_core to get async events
>   IB/mlx5: Enable subscription for device events over DEVX
>   IB/mlx5: Implement DEVX dispatching event
>   IB/mlx5: Add DEVX support for CQ events
> 
>  drivers/infiniband/hw/mlx5/cq.c               |    5 +-
>  drivers/infiniband/hw/mlx5/devx.c             | 1082
> ++++++++++++++++-
>  drivers/infiniband/hw/mlx5/main.c             |   10 +-
>  drivers/infiniband/hw/mlx5/mlx5_ib.h          |   12 +
>  drivers/infiniband/hw/mlx5/odp.c              |    3 +-
>  drivers/infiniband/hw/mlx5/qp.c               |    2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/cq.c  |   21 +-
>  drivers/net/ethernet/mellanox/mlx5/core/en.h  |    2 +-
>  .../net/ethernet/mellanox/mlx5/core/en_main.c |    3 +-
>  .../net/ethernet/mellanox/mlx5/core/en_txrx.c |    2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c  |   68 +-
>  .../ethernet/mellanox/mlx5/core/fpga/conn.c   |    6 +-
>  drivers/net/ethernet/mellanox/mlx5/core/fw.c  |    6 +
>  .../net/ethernet/mellanox/mlx5/core/lib/eq.h  |    5 +-
>  include/linux/mlx5/cq.h                       |    6 +-
>  include/linux/mlx5/device.h                   |    6 +-
>  include/linux/mlx5/driver.h                   |    2 +
>  include/linux/mlx5/eq.h                       |    4 +-
>  include/linux/mlx5/mlx5_ifc.h                 |   34 +-
>  include/uapi/rdma/mlx5_user_ioctl_cmds.h      |   19 +
>  include/uapi/rdma/mlx5_user_ioctl_verbs.h     |    9 +
>  21 files changed, 1237 insertions(+), 70 deletions(-)
> 
> --
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 00/12] DEVX asynchronous events
  2019-06-18 18:51 ` [PATCH rdma-next v1 00/12] DEVX asynchronous events Saeed Mahameed
@ 2019-06-19  4:45   ` Leon Romanovsky
  2019-06-24 21:57     ` Saeed Mahameed
  0 siblings, 1 reply; 35+ messages in thread
From: Leon Romanovsky @ 2019-06-19  4:45 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Jason Gunthorpe, dledford, Yishai Hadas, netdev, linux-rdma

On Tue, Jun 18, 2019 at 06:51:45PM +0000, Saeed Mahameed wrote:
> On Tue, 2019-06-18 at 20:15 +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@mellanox.com>
> >
> > Changelog:
> >  v0 -> v1:
>
> Normally 1st submission is V1 and 2nd is V2.
> so this should have been v1->v2.

"Normally" depends on the language you are using. In C, everything
starts from 0, including version of patches :).

>
> For mlx5-next patches:
>
> Acked-by: Saeed Mahameed <saeedm@mellanox.com>

Thanks

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 08/12] IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD
  2019-06-18 17:15 ` [PATCH rdma-next v1 08/12] IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD Leon Romanovsky
@ 2019-06-24 11:51   ` Jason Gunthorpe
  2019-06-24 13:25     ` Yishai Hadas
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2019-06-24 11:51 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Yishai Hadas,
	Saeed Mahameed, linux-netdev

On Tue, Jun 18, 2019 at 08:15:36PM +0300, Leon Romanovsky wrote:
> From: Yishai Hadas <yishaih@mellanox.com>
> 
> Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD and its initial
> implementation.
> 
> This object is from type class FD and will be used to read DEVX
> async events.
> 
> Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/hw/mlx5/devx.c         | 112 ++++++++++++++++++++--
>  include/uapi/rdma/mlx5_user_ioctl_cmds.h  |  10 ++
>  include/uapi/rdma/mlx5_user_ioctl_verbs.h |   4 +
>  3 files changed, 116 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
> index 80b42d069328..1815ce0f8daf 100644
> +++ b/drivers/infiniband/hw/mlx5/devx.c
> @@ -33,6 +33,24 @@ struct devx_async_data {
>  	struct mlx5_ib_uapi_devx_async_cmd_hdr hdr;
>  };
>  
> +struct devx_async_event_queue {

It seems to be a mistake to try and re-use the async_event_queue for
both cmd and event, as they use it very differently and don't even
store the same things in the event_list. I think it is bettter to just
inline this into devx_async_event_file (and inline the old struct in
the cmd file

> +	spinlock_t		lock;
> +	wait_queue_head_t	poll_wait;
> +	struct list_head	event_list;
> +	atomic_t		bytes_in_use;
> +	u8			is_destroyed:1;
> +	u32			flags;
> +};

All the flags testing is ugly, why not just add another bitfield?

> +
> +struct devx_async_event_file {
> +	struct ib_uobject		uobj;
> +	struct list_head subscribed_events_list; /* Head of events that are
> +						  * subscribed to this FD
> +						  */
> +	struct devx_async_event_queue	ev_queue;
> +	struct mlx5_ib_dev *dev;
> +};
> +

Crazy indenting

> diff --git a/include/uapi/rdma/mlx5_user_ioctl_verbs.h b/include/uapi/rdma/mlx5_user_ioctl_verbs.h
> index a8f34c237458..57beea4589e4 100644
> +++ b/include/uapi/rdma/mlx5_user_ioctl_verbs.h
> @@ -63,5 +63,9 @@ enum mlx5_ib_uapi_dm_type {
>  	MLX5_IB_UAPI_DM_TYPE_HEADER_MODIFY_SW_ICM,
>  };
>  
> +enum mlx5_ib_uapi_devx_create_event_channel_flags {
> +	MLX5_IB_UAPI_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA = 1
> << 0,

Maybe this name is too long

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 09/12] IB/mlx5: Register DEVX with mlx5_core to get async events
  2019-06-18 17:15 ` [PATCH rdma-next v1 09/12] IB/mlx5: Register DEVX with mlx5_core to get async events Leon Romanovsky
@ 2019-06-24 11:52   ` Jason Gunthorpe
  2019-06-24 13:36     ` Yishai Hadas
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2019-06-24 11:52 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Yishai Hadas,
	Saeed Mahameed, linux-netdev

On Tue, Jun 18, 2019 at 08:15:37PM +0300, Leon Romanovsky wrote:
>  void __mlx5_ib_remove(struct mlx5_ib_dev *dev,
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index 9cf23ae6324e..556af34b788b 100644
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -944,6 +944,13 @@ struct mlx5_ib_pf_eq {
>  	mempool_t *pool;
>  };
>  
> +struct mlx5_devx_event_table {
> +	struct mlx5_nb devx_nb;
> +	/* serialize updating the event_xa */
> +	struct mutex event_xa_lock;
> +	struct xarray event_xa;
> +};
> +
>  struct mlx5_ib_dev {
>  	struct ib_device		ib_dev;
>  	struct mlx5_core_dev		*mdev;
> @@ -994,6 +1001,7 @@ struct mlx5_ib_dev {
>  	struct mlx5_srq_table   srq_table;
>  	struct mlx5_async_ctx   async_ctx;
>  	int			free_port;
> +	struct mlx5_devx_event_table devx_event_table;

I really question if adding all these structs really does anything for
readability..

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 10/12] IB/mlx5: Enable subscription for device events over DEVX
  2019-06-18 17:15 ` [PATCH rdma-next v1 10/12] IB/mlx5: Enable subscription for device events over DEVX Leon Romanovsky
@ 2019-06-24 11:57   ` Jason Gunthorpe
  2019-06-24 16:13     ` Yishai Hadas
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2019-06-24 11:57 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Yishai Hadas,
	Saeed Mahameed, linux-netdev

On Tue, Jun 18, 2019 at 08:15:38PM +0300, Leon Romanovsky wrote:
> From: Yishai Hadas <yishaih@mellanox.com>
> 
> Enable subscription for device events over DEVX.
> 
> Each subscription is added to the two level XA data structure according
> to its event number and the DEVX object information in case was given
> with the given target fd.
> 
> Those events will be reported over the given fd once will occur.
> Downstream patches will mange the dispatching to any subscription.
> 
> Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/hw/mlx5/devx.c        | 564 ++++++++++++++++++++++-
>  include/uapi/rdma/mlx5_user_ioctl_cmds.h |   9 +
>  2 files changed, 566 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
> index e9b9ba5a3e9a..304b13e7a265 100644
> +++ b/drivers/infiniband/hw/mlx5/devx.c
> @@ -14,6 +14,7 @@
>  #include <linux/mlx5/driver.h>
>  #include <linux/mlx5/fs.h>
>  #include "mlx5_ib.h"
> +#include <linux/xarray.h>
>  
>  #define UVERBS_MODULE_NAME mlx5_ib
>  #include <rdma/uverbs_named_ioctl.h>
> @@ -33,6 +34,37 @@ struct devx_async_data {
>  	struct mlx5_ib_uapi_devx_async_cmd_hdr hdr;
>  };
>  
> +/* first level XA value data structure */
> +struct devx_event {
> +	struct xarray object_ids; /* second XA level, Key = object id */
> +	struct list_head unaffiliated_list;
> +};
> +
> +/* second level XA value data structure */
> +struct devx_obj_event {
> +	struct rcu_head rcu;
> +	struct list_head obj_sub_list;
> +};
> +
> +struct devx_event_subscription {
> +	struct list_head file_list; /* headed in private_data->
> +				     * subscribed_events_list
> +				     */
> +	struct list_head xa_list; /* headed in devx_event->unaffiliated_list or
> +				   * devx_obj_event->obj_sub_list
> +				   */
> +	struct list_head obj_list; /* headed in devx_object */
> +
> +	u32 xa_key_level1;
> +	u32 xa_key_level2;
> +	struct rcu_head	rcu;
> +	u64 cookie;
> +	bool is_obj_related;
> +	struct ib_uobject *fd_uobj;
> +	void *object;	/* May need direct access upon hot unplug */

This should be a 'struct file *' and have a better name.

And I'm unclear why we need to store both the ib_uobject and the
struct file for the same thing? And why are we storing the uobj here
instead of the struct devx_async_event_file *?

Since uobj->object == flip && filp->private_data == uobj, I have a
hard time to understand why we need both things, it seems to me that
if we get the fget on the filp then we can rely on the
filp->private_data to get back to the devx_async_event_file.

> +	struct eventfd_ctx *eventfd;
> +};
> +

>  /*
>   * As the obj_id in the firmware is not globally unique the object type
>   * must be considered upon checking for a valid object id.
> @@ -1143,14 +1275,53 @@ static void devx_cleanup_mkey(struct devx_obj *obj)
>  	write_unlock_irqrestore(&table->lock, flags);
>  }
>  
> +static void devx_cleanup_subscription(struct mlx5_ib_dev *dev,
> +				      struct devx_event_subscription *sub)
> +{
> +	list_del_rcu(&sub->file_list);
> +	list_del_rcu(&sub->xa_list);
> +
> +	if (sub->is_obj_related) {

is_obj_related looks like it is just list_empty(obj_list)??

Success oriented flow

> @@ -1523,6 +1700,350 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_ASYNC_QUERY)(
>  	return err;
>  }
>  
> +static void
> +subscribe_event_xa_dealloc(struct mlx5_devx_event_table *devx_event_table,
> +			   u32 key_level1,
> +			   u32 key_level2,
> +			   struct devx_obj_event *alloc_obj_event)
> +{
> +	struct devx_event *event;
> +
> +	/* Level 1 is valid for future use - no need to free */
> +	if (!alloc_obj_event)
> +		return;
> +
> +	event = xa_load(&devx_event_table->event_xa, key_level1);
> +	WARN_ON(!event);
> +
> +	xa_erase(&event->object_ids, key_level2);

Shoulnd't this only erase if the value stored is NULL?

> +	kfree(alloc_obj_event);
> +}
> +
> +static int
> +subscribe_event_xa_alloc(struct mlx5_devx_event_table *devx_event_table,
> +			 u32 key_level1,
> +			 bool is_level2,
> +			 u32 key_level2,
> +			 struct devx_obj_event **alloc_obj_event)
> +{
> +	struct devx_obj_event *obj_event;
> +	struct devx_event *event;
> +	bool new_entry_level1 = false;
> +	int err;
> +
> +	event = xa_load(&devx_event_table->event_xa, key_level1);
> +	if (!event) {
> +		event = kzalloc(sizeof(*event), GFP_KERNEL);
> +		if (!event)
> +			return -ENOMEM;
> +
> +		new_entry_level1 = true;
> +		INIT_LIST_HEAD(&event->unaffiliated_list);
> +		xa_init(&event->object_ids);
> +
> +		err = xa_insert(&devx_event_table->event_xa,
> +				key_level1,
> +				event,
> +				GFP_KERNEL);
> +		if (err)
> +			goto end;
> +	}
> +
> +	if (!is_level2)
> +		return 0;
> +
> +	obj_event = xa_load(&event->object_ids, key_level2);
> +	if (!obj_event) {
> +		err = xa_reserve(&event->object_ids, key_level2, GFP_KERNEL);
> +		if (err)
> +			goto err_level1;
> +
> +		obj_event = kzalloc(sizeof(*obj_event), GFP_KERNEL);
> +		if (!obj_event) {
> +			err = -ENOMEM;
> +			goto err_level2;
> +		}
> +
> +		INIT_LIST_HEAD(&obj_event->obj_sub_list);
> +		*alloc_obj_event = obj_event;

This is goofy, just store the empty obj_event in the xa instead of
using xa_reserve, and when you go to do the error unwind just delete
any level2' devx_obj_event' that has a list_empty(obj_sub_list), get
rid of the wonky alloc_obj_event stuff.

The best configuration would be to use devx_cleanup_subscription to
undo the partially ready subscription.

> +	}
> +
> +	return 0;
> +
> +err_level2:
> +	xa_erase(&event->object_ids, key_level2);
> +
> +err_level1:
> +	if (new_entry_level1)
> +		xa_erase(&devx_event_table->event_xa, key_level1);
> +end:
> +	if (new_entry_level1)
> +		kfree(event);

Can't do this, once the level1 is put in the tree it could be referenced by
the irqs. At least it needs a kfree_rcu, most likely it is simpler to
just leave it.

> +#define MAX_NUM_EVENTS 16
> +static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_SUBSCRIBE_EVENT)(
> +	struct uverbs_attr_bundle *attrs)
> +{
> +	struct ib_uobject *devx_uobj = uverbs_attr_get_uobject(
> +				attrs,
> +				MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_OBJ_HANDLE);
> +	struct mlx5_ib_ucontext *c = rdma_udata_to_drv_context(
> +		&attrs->driver_udata, struct mlx5_ib_ucontext, ibucontext);
> +	struct mlx5_ib_dev *dev = to_mdev(c->ibucontext.device);
> +	struct ib_uobject *fd_uobj;
> +	struct devx_obj *obj = NULL;
> +	struct devx_async_event_file *ev_file;
> +	struct mlx5_devx_event_table *devx_event_table = &dev->devx_event_table;
> +	u16 *event_type_num_list;
> +	struct devx_event_subscription **event_sub_arr;
> +	struct devx_obj_event  **event_obj_array_alloc;
> +	int redirect_fd;
> +	bool use_eventfd = false;
> +	int num_events;
> +	int num_alloc_xa_entries = 0;
> +	u16 obj_type = 0;
> +	u64 cookie = 0;
> +	u32 obj_id = 0;
> +	int err;
> +	int i;
> +
> +	if (!c->devx_uid)
> +		return -EINVAL;
> +
> +	if (!IS_ERR(devx_uobj)) {
> +		obj = (struct devx_obj *)devx_uobj->object;
> +		if (obj)
> +			obj_id = get_dec_obj_id(obj->obj_id);
> +	}
> +
> +	fd_uobj = uverbs_attr_get_uobject(attrs,
> +				MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_FD_HANDLE);
> +	if (IS_ERR(fd_uobj))
> +		return PTR_ERR(fd_uobj);
> +
> +	ev_file = container_of(fd_uobj, struct devx_async_event_file,
> +			       uobj);
> +
> +	if (uverbs_attr_is_valid(attrs,
> +				 MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_FD_NUM)) {
> +		err = uverbs_copy_from(&redirect_fd, attrs,
> +			       MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_FD_NUM);
> +		if (err)
> +			return err;
> +
> +		use_eventfd = true;
> +	}
> +
> +	if (uverbs_attr_is_valid(attrs,
> +				 MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_COOKIE)) {
> +		if (use_eventfd)
> +			return -EINVAL;
> +
> +		err = uverbs_copy_from(&cookie, attrs,
> +				MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_COOKIE);
> +		if (err)
> +			return err;
> +	}
> +
> +	num_events = uverbs_attr_ptr_get_array_size(
> +		attrs, MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_TYPE_NUM_LIST,
> +		sizeof(u16));
> +
> +	if (num_events < 0)
> +		return num_events;
> +
> +	if (num_events > MAX_NUM_EVENTS)
> +		return -EINVAL;
> +
> +	event_type_num_list = uverbs_attr_get_alloced_ptr(attrs,
> +			MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_TYPE_NUM_LIST);
> +
> +	if (!is_valid_events(dev->mdev, num_events, event_type_num_list, obj))
> +		return -EINVAL;
> +
> +	event_sub_arr = uverbs_zalloc(attrs,
> +		MAX_NUM_EVENTS * sizeof(struct devx_event_subscription *));
> +	event_obj_array_alloc = uverbs_zalloc(attrs,
> +		MAX_NUM_EVENTS * sizeof(struct devx_obj_event *));

There are so many list_heads in the devx_event_subscription, why not
use just one of them to store the allocated events instead of this
temp array? ie event_list looks good for this purpose.

> +
> +	if (!event_sub_arr || !event_obj_array_alloc)
> +		return -ENOMEM;
> +
> +	/* Protect from concurrent subscriptions to same XA entries to allow
> +	 * both to succeed
> +	 */
> +	mutex_lock(&devx_event_table->event_xa_lock);
> +	for (i = 0; i < num_events; i++) {
> +		u32 key_level1;
> +
> +		if (obj)
> +			obj_type = get_dec_obj_type(obj,
> +						    event_type_num_list[i]);
> +		key_level1 = event_type_num_list[i] | obj_type << 16;
> +
> +		err = subscribe_event_xa_alloc(devx_event_table,
> +					       key_level1,
> +					       obj ? true : false,
> +					       obj_id,
> +					       &event_obj_array_alloc[i]);

Usless ?:

> +		if (err)
> +			goto err;
> +
> +		num_alloc_xa_entries++;
> +		event_sub_arr[i] = kzalloc(sizeof(*event_sub_arr[i]),
> +					   GFP_KERNEL);
> +		if (!event_sub_arr[i])
> +			goto err;
> +
> +		if (use_eventfd) {
> +			event_sub_arr[i]->eventfd =
> +				eventfd_ctx_fdget(redirect_fd);
> +
> +			if (IS_ERR(event_sub_arr[i]->eventfd)) {
> +				err = PTR_ERR(event_sub_arr[i]->eventfd);
> +				event_sub_arr[i]->eventfd = NULL;
> +				goto err;
> +			}
> +		}
> +
> +		event_sub_arr[i]->cookie = cookie;
> +		event_sub_arr[i]->fd_uobj = fd_uobj;
> +		event_sub_arr[i]->object = fd_uobj->object;
> +		/* May be needed upon cleanup the devx object/subscription */
> +		event_sub_arr[i]->xa_key_level1 = key_level1;
> +		event_sub_arr[i]->xa_key_level2 = obj_id;
> +		event_sub_arr[i]->is_obj_related = obj ? true : false;

Unneeded ?:


Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 11/12] IB/mlx5: Implement DEVX dispatching event
  2019-06-18 17:15 ` [PATCH rdma-next v1 11/12] IB/mlx5: Implement DEVX dispatching event Leon Romanovsky
@ 2019-06-24 12:03   ` Jason Gunthorpe
  2019-06-24 16:55     ` Yishai Hadas
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2019-06-24 12:03 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Yishai Hadas,
	Saeed Mahameed, linux-netdev

On Tue, Jun 18, 2019 at 08:15:39PM +0300, Leon Romanovsky wrote:
> From: Yishai Hadas <yishaih@mellanox.com>
> 
> Implement DEVX dispatching event by looking up for the applicable
> subscriptions for the reported event and using their target fd to
> signal/set the event.
> 
> Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/hw/mlx5/devx.c         | 362 +++++++++++++++++++++-
>  include/uapi/rdma/mlx5_user_ioctl_verbs.h |   5 +
>  2 files changed, 357 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
> index 304b13e7a265..49fdce95d6d9 100644
> +++ b/drivers/infiniband/hw/mlx5/devx.c
> @@ -34,6 +34,11 @@ struct devx_async_data {
>  	struct mlx5_ib_uapi_devx_async_cmd_hdr hdr;
>  };
>  
> +struct devx_async_event_data {
> +	struct list_head list; /* headed in ev_queue->event_list */
> +	struct mlx5_ib_uapi_devx_async_event_hdr hdr;
> +};
> +
>  /* first level XA value data structure */
>  struct devx_event {
>  	struct xarray object_ids; /* second XA level, Key = object id */
> @@ -54,7 +59,9 @@ struct devx_event_subscription {
>  				   * devx_obj_event->obj_sub_list
>  				   */
>  	struct list_head obj_list; /* headed in devx_object */
> +	struct list_head event_list; /* headed in ev_queue->event_list */
>  
> +	u8  is_cleaned:1;

There is a loose bool 'is_obj_related' that should be combined with
this bool bitfield as well.

>  static void devx_cleanup_subscription(struct mlx5_ib_dev *dev,
> -				      struct devx_event_subscription *sub)
> +				      struct devx_event_subscription *sub,
> +				      bool file_close)
>  {
> -	list_del_rcu(&sub->file_list);
> +	if (sub->is_cleaned)
> +		goto end;
> +
> +	sub->is_cleaned = 1;
>  	list_del_rcu(&sub->xa_list);
>  
>  	if (sub->is_obj_related) {
> @@ -1303,10 +1355,15 @@ static void devx_cleanup_subscription(struct mlx5_ib_dev *dev,
>  		}
>  	}
>  
> -	if (sub->eventfd)
> -		eventfd_ctx_put(sub->eventfd);
> +end:
> +	if (file_close) {
> +		if (sub->eventfd)
> +			eventfd_ctx_put(sub->eventfd);
>  
> -	kfree_rcu(sub, rcu);
> +		list_del_rcu(&sub->file_list);
> +		/* subscription may not be used by the read API any more */
> +		kfree_rcu(sub, rcu);
> +	}

Dis like this confusing file_close stuff, just put this in the single place
that calls this with the true bool

> +static int deliver_event(struct devx_event_subscription *event_sub,
> +			 const void *data)
> +{
> +	struct ib_uobject *fd_uobj = event_sub->fd_uobj;
> +	struct devx_async_event_file *ev_file;
> +	struct devx_async_event_queue *ev_queue;
> +	struct devx_async_event_data *event_data;
> +	unsigned long flags;
> +	bool omit_data;
> +
> +	ev_file = container_of(fd_uobj, struct devx_async_event_file,
> +			       uobj);
> +	ev_queue = &ev_file->ev_queue;
> +	omit_data = ev_queue->flags &
> +		MLX5_IB_UAPI_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA;
> +
> +	if (omit_data) {
> +		spin_lock_irqsave(&ev_queue->lock, flags);
> +		if (!list_empty(&event_sub->event_list)) {
> +			spin_unlock_irqrestore(&ev_queue->lock, flags);
> +			return 0;
> +		}
> +
> +		list_add_tail(&event_sub->event_list, &ev_queue->event_list);
> +		spin_unlock_irqrestore(&ev_queue->lock, flags);
> +		wake_up_interruptible(&ev_queue->poll_wait);
> +		return 0;
> +	}
> +
> +	event_data = kzalloc(sizeof(*event_data) +
> +			     (omit_data ? 0 : sizeof(struct mlx5_eqe)),
> +			     GFP_ATOMIC);

omit_data is always false here

> +	if (!event_data) {
> +		spin_lock_irqsave(&ev_queue->lock, flags);
> +		ev_queue->is_overflow_err = 1;
> +		spin_unlock_irqrestore(&ev_queue->lock, flags);
> +		return -ENOMEM;
> +	}
> +
> +	event_data->hdr.cookie = event_sub->cookie;
> +	memcpy(event_data->hdr.out_data, data, sizeof(struct mlx5_eqe));
> +
> +	spin_lock_irqsave(&ev_queue->lock, flags);
> +	list_add_tail(&event_data->list, &ev_queue->event_list);
> +	spin_unlock_irqrestore(&ev_queue->lock, flags);
> +	wake_up_interruptible(&ev_queue->poll_wait);
> +
> +	return 0;
> +}
> +
> +static void dispatch_event_fd(struct list_head *fd_list,
> +			      const void *data)
> +{
> +	struct devx_event_subscription *item;
> +
> +	list_for_each_entry_rcu(item, fd_list, xa_list) {
> +		if (!get_file_rcu((struct file *)item->object))
> +			continue;
> +
> +		if (item->eventfd) {
> +			eventfd_signal(item->eventfd, 1);
> +			fput(item->object);
> +			continue;
> +		}
> +
> +		deliver_event(item, data);
> +		fput(item->object);
> +	}
> +}
> +
>  static int devx_event_notifier(struct notifier_block *nb,
>  			       unsigned long event_type, void *data)
>  {
> -	return NOTIFY_DONE;
> +	struct mlx5_devx_event_table *table;
> +	struct mlx5_ib_dev *dev;
> +	struct devx_event *event;
> +	struct devx_obj_event *obj_event;
> +	u16 obj_type = 0;
> +	bool is_unaffiliated;
> +	u32 obj_id;
> +
> +	/* Explicit filtering to kernel events which may occur frequently */
> +	if (event_type == MLX5_EVENT_TYPE_CMD ||
> +	    event_type == MLX5_EVENT_TYPE_PAGE_REQUEST)
> +		return NOTIFY_OK;
> +
> +	table = container_of(nb, struct mlx5_devx_event_table, devx_nb.nb);
> +	dev = container_of(table, struct mlx5_ib_dev, devx_event_table);
> +	is_unaffiliated = is_unaffiliated_event(dev->mdev, event_type);
> +
> +	if (!is_unaffiliated)
> +		obj_type = get_event_obj_type(event_type, data);
> +	event = xa_load(&table->event_xa, event_type | (obj_type << 16));
> +	if (!event)
> +		return NOTIFY_DONE;

event should be in the rcu as well

> +	if (is_unaffiliated) {
> +		dispatch_event_fd(&event->unaffiliated_list, data);
> +		return NOTIFY_OK;
> +	}
> +
> +	obj_id = devx_get_obj_id_from_event(event_type, data);
> +	rcu_read_lock();
> +	obj_event = xa_load(&event->object_ids, obj_id);
> +	if (!obj_event) {
> +		rcu_read_unlock();
> +		return NOTIFY_DONE;
> +	}
> +
> +	dispatch_event_fd(&obj_event->obj_sub_list, data);
> +
> +	rcu_read_unlock();
> +	return NOTIFY_OK;
>  }
>  
>  void mlx5_ib_devx_init_event_table(struct mlx5_ib_dev *dev)
> @@ -2221,7 +2444,7 @@ void mlx5_ib_devx_cleanup_event_table(struct mlx5_ib_dev *dev)
>  		event = entry;
>  		list_for_each_entry_safe(sub, tmp, &event->unaffiliated_list,
>  					 xa_list)
> -			devx_cleanup_subscription(dev, sub);
> +			devx_cleanup_subscription(dev, sub, false);
>  		kfree(entry);
>  	}
>  	mutex_unlock(&dev->devx_event_table.event_xa_lock);
> @@ -2329,18 +2552,126 @@ static const struct file_operations devx_async_cmd_event_fops = {
>  static ssize_t devx_async_event_read(struct file *filp, char __user *buf,
>  				     size_t count, loff_t *pos)
>  {
> -	return -EINVAL;
> +	struct devx_async_event_file *ev_file = filp->private_data;
> +	struct devx_async_event_queue *ev_queue = &ev_file->ev_queue;
> +	struct devx_event_subscription *event_sub;
> +	struct devx_async_event_data *uninitialized_var(event);
> +	int ret = 0;
> +	size_t eventsz;
> +	bool omit_data;
> +	void *event_data;
> +
> +	omit_data = ev_queue->flags &
> +		MLX5_IB_UAPI_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA;
> +
> +	spin_lock_irq(&ev_queue->lock);
> +
> +	if (ev_queue->is_overflow_err) {
> +		ev_queue->is_overflow_err = 0;
> +		spin_unlock_irq(&ev_queue->lock);
> +		return -EOVERFLOW;
> +	}
> +
> +	while (list_empty(&ev_queue->event_list)) {
> +		spin_unlock_irq(&ev_queue->lock);
> +
> +		if (filp->f_flags & O_NONBLOCK)
> +			return -EAGAIN;
> +
> +		if (wait_event_interruptible(ev_queue->poll_wait,
> +			    (!list_empty(&ev_queue->event_list) ||
> +			     ev_queue->is_destroyed))) {
> +			return -ERESTARTSYS;
> +		}
> +
> +		if (list_empty(&ev_queue->event_list) &&
> +		    ev_queue->is_destroyed)
> +			return -EIO;

All these tests should be under the lock.

Why don't we return EIO as soon as is-destroyed happens? What is the
point of flushing out the accumulated events?

> +
> +		spin_lock_irq(&ev_queue->lock);
> +	}
> +
> +	if (omit_data) {
> +		event_sub = list_first_entry(&ev_queue->event_list,
> +					struct devx_event_subscription,
> +					event_list);
> +		eventsz = sizeof(event_sub->cookie);
> +		event_data = &event_sub->cookie;
> +	} else {
> +		event = list_first_entry(&ev_queue->event_list,
> +				      struct devx_async_event_data, list);
> +		eventsz = sizeof(struct mlx5_eqe) +
> +			sizeof(struct mlx5_ib_uapi_devx_async_event_hdr);
> +		event_data = &event->hdr;
> +	}
> +
> +	if (eventsz > count) {
> +		spin_unlock_irq(&ev_queue->lock);
> +		return -ENOSPC;

This is probably the wrong errno

> +	}
> +
> +	if (omit_data)
> +		list_del_init(&event_sub->event_list);
> +	else
> +		list_del(&event->list);
> +
> +	spin_unlock_irq(&ev_queue->lock);
> +
> +	if (copy_to_user(buf, event_data, eventsz))
> +		ret = -EFAULT;
> +	else
> +		ret = eventsz;

This is really poorly ordered, EFAULT will cause the event to be lost. :(

Maybe the event should be re-added on error? Tricky.

> +	if (!omit_data)
> +		kfree(event);
> +	return ret;
>  }
>  
>  static __poll_t devx_async_event_poll(struct file *filp,
>  				      struct poll_table_struct *wait)
>  {
> -	return 0;
> +	struct devx_async_event_file *ev_file = filp->private_data;
> +	struct devx_async_event_queue *ev_queue = &ev_file->ev_queue;
> +	__poll_t pollflags = 0;
> +
> +	poll_wait(filp, &ev_queue->poll_wait, wait);
> +
> +	spin_lock_irq(&ev_queue->lock);
> +	if (ev_queue->is_destroyed)
> +		pollflags = EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
> +	else if (!list_empty(&ev_queue->event_list))
> +		pollflags = EPOLLIN | EPOLLRDNORM;
> +	spin_unlock_irq(&ev_queue->lock);
> +
> +	return pollflags;
>  }
>  
>  static int devx_async_event_close(struct inode *inode, struct file *filp)
>  {
> +	struct ib_uobject *uobj = filp->private_data;
> +	struct devx_async_event_file *ev_file =
> +		container_of(uobj, struct devx_async_event_file, uobj);
> +	struct devx_event_subscription *event_sub, *event_sub_tmp;
> +	struct devx_async_event_data *entry, *tmp;
> +
> +	mutex_lock(&ev_file->dev->devx_event_table.event_xa_lock);
> +	/* delete the subscriptions which are related to this FD */
> +	list_for_each_entry_safe(event_sub, event_sub_tmp,
> +				 &ev_file->subscribed_events_list, file_list)
> +		devx_cleanup_subscription(ev_file->dev, event_sub, true);
> +	mutex_unlock(&ev_file->dev->devx_event_table.event_xa_lock);
> +
> +	/* free the pending events allocation */
> +	if (!(ev_file->ev_queue.flags &
> +	    MLX5_IB_UAPI_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA)) {
> +		spin_lock_irq(&ev_file->ev_queue.lock);
> +		list_for_each_entry_safe(entry, tmp,
> +					 &ev_file->ev_queue.event_list, list)
> +			kfree(entry); /* read can't come any nore */

spelling

> +		spin_unlock_irq(&ev_file->ev_queue.lock);
> +	}
>  	uverbs_close_fd(filp);
> +	put_device(&ev_file->dev->ib_dev.dev);
>  	return 0;
>  }
>  
> @@ -2374,6 +2705,17 @@ static int devx_hot_unplug_async_cmd_event_file(struct ib_uobject *uobj,
>  static int devx_hot_unplug_async_event_file(struct ib_uobject *uobj,
>  					    enum rdma_remove_reason why)
>  {
> +	struct devx_async_event_file *ev_file =
> +		container_of(uobj, struct devx_async_event_file,
> +			     uobj);
> +	struct devx_async_event_queue *ev_queue = &ev_file->ev_queue;
> +
> +	spin_lock_irq(&ev_queue->lock);
> +	ev_queue->is_destroyed = 1;
> +	spin_unlock_irq(&ev_queue->lock);
> +
> +	if (why == RDMA_REMOVE_DRIVER_REMOVE)
> +		wake_up_interruptible(&ev_queue->poll_wait);

Why isn't this wakeup always done?

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 12/12] IB/mlx5: Add DEVX support for CQ events
  2019-06-18 17:15 ` [PATCH rdma-next v1 12/12] IB/mlx5: Add DEVX support for CQ events Leon Romanovsky
@ 2019-06-24 12:04   ` Jason Gunthorpe
  2019-06-24 17:03     ` Yishai Hadas
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2019-06-24 12:04 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Yishai Hadas,
	Saeed Mahameed, linux-netdev

On Tue, Jun 18, 2019 at 08:15:40PM +0300, Leon Romanovsky wrote:
> From: Yishai Hadas <yishaih@mellanox.com>
> 
> Add DEVX support for CQ events by creating and destroying the CQ via
> mlx5_core and set an handler to manage its completions.
> 
> Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/hw/mlx5/devx.c | 40 +++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
> index 49fdce95d6d9..91ccd58ebc05 100644
> +++ b/drivers/infiniband/hw/mlx5/devx.c
> @@ -19,9 +19,12 @@
>  #define UVERBS_MODULE_NAME mlx5_ib
>  #include <rdma/uverbs_named_ioctl.h>
>  
> +static void dispatch_event_fd(struct list_head *fd_list, const void *data);
> +
>  enum devx_obj_flags {
>  	DEVX_OBJ_FLAGS_INDIRECT_MKEY = 1 << 0,
>  	DEVX_OBJ_FLAGS_DCT = 1 << 1,
> +	DEVX_OBJ_FLAGS_CQ = 1 << 2,
>  };
>  
>  struct devx_async_data {
> @@ -94,6 +97,7 @@ struct devx_async_event_file {
>  #define MLX5_MAX_DESTROY_INBOX_SIZE_DW MLX5_ST_SZ_DW(delete_fte_in)
>  struct devx_obj {
>  	struct mlx5_core_dev	*mdev;
> +	struct mlx5_ib_dev	*ib_dev;

This seems strange, why would we need to store the core_dev and the ib_dev
in a struct when ibdev->mdev == core_dev?

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 08/12] IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD
  2019-06-24 11:51   ` Jason Gunthorpe
@ 2019-06-24 13:25     ` Yishai Hadas
  2019-06-24 14:30       ` Jason Gunthorpe
  0 siblings, 1 reply; 35+ messages in thread
From: Yishai Hadas @ 2019-06-24 13:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Yishai Hadas,
	Saeed Mahameed, linux-netdev

On 6/24/2019 2:51 PM, Jason Gunthorpe wrote:
> On Tue, Jun 18, 2019 at 08:15:36PM +0300, Leon Romanovsky wrote:
>> From: Yishai Hadas <yishaih@mellanox.com>
>>
>> Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD and its initial
>> implementation.
>>
>> This object is from type class FD and will be used to read DEVX
>> async events.
>>
>> Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
>> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>>   drivers/infiniband/hw/mlx5/devx.c         | 112 ++++++++++++++++++++--
>>   include/uapi/rdma/mlx5_user_ioctl_cmds.h  |  10 ++
>>   include/uapi/rdma/mlx5_user_ioctl_verbs.h |   4 +
>>   3 files changed, 116 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
>> index 80b42d069328..1815ce0f8daf 100644
>> +++ b/drivers/infiniband/hw/mlx5/devx.c
>> @@ -33,6 +33,24 @@ struct devx_async_data {
>>   	struct mlx5_ib_uapi_devx_async_cmd_hdr hdr;
>>   };
>>   
>> +struct devx_async_event_queue {
> 
> It seems to be a mistake to try and re-use the async_event_queue for
> both cmd and event, as they use it very differently and don't even
> store the same things in the event_list. I think it is bettter to just
> inline this into devx_async_event_file (and inline the old struct in
> the cmd file
> 

How about having another struct with all the event's queue fields 
together ? this has the benefit of having all those related fields in 
one place and leave the cmd as is.

Alternatively,
We can inline the event stuff under devx_async_event_file and leave the 
cmd for now under a struct as it's not directly related to this series.

What do you think ?


>> +	spinlock_t		lock;
>> +	wait_queue_head_t	poll_wait;
>> +	struct list_head	event_list;
>> +	atomic_t		bytes_in_use;
>> +	u8			is_destroyed:1;
>> +	u32			flags;
>> +};
> 
> All the flags testing is ugly, why not just add another bitfield?

The flags are coming from user space and have their different name 
space, I prefer to not mix with kernel ones. (i.e. is_destroyed).
Makes sense ?

> 
>> +
>> +struct devx_async_event_file {
>> +	struct ib_uobject		uobj;
>> +	struct list_head subscribed_events_list; /* Head of events that are
>> +						  * subscribed to this FD
>> +						  */
>> +	struct devx_async_event_queue	ev_queue;
>> +	struct mlx5_ib_dev *dev;
>> +};
>> +
> 
> Crazy indenting
> 
OK, will handle.

>> diff --git a/include/uapi/rdma/mlx5_user_ioctl_verbs.h b/include/uapi/rdma/mlx5_user_ioctl_verbs.h
>> index a8f34c237458..57beea4589e4 100644
>> +++ b/include/uapi/rdma/mlx5_user_ioctl_verbs.h
>> @@ -63,5 +63,9 @@ enum mlx5_ib_uapi_dm_type {
>>   	MLX5_IB_UAPI_DM_TYPE_HEADER_MODIFY_SW_ICM,
>>   };
>>   
>> +enum mlx5_ib_uapi_devx_create_event_channel_flags {
>> +	MLX5_IB_UAPI_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA = 1
>> << 0,
> 
> Maybe this name is too long

Quite long but follows the name scheme having the UAPI prefix.
Any shorter suggestion ?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 09/12] IB/mlx5: Register DEVX with mlx5_core to get async events
  2019-06-24 11:52   ` Jason Gunthorpe
@ 2019-06-24 13:36     ` Yishai Hadas
  2019-06-24 14:30       ` Jason Gunthorpe
  0 siblings, 1 reply; 35+ messages in thread
From: Yishai Hadas @ 2019-06-24 13:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky,
	RDMA mailing list, Yishai Hadas, Saeed Mahameed, linux-netdev

On 6/24/2019 2:52 PM, Jason Gunthorpe wrote:
> On Tue, Jun 18, 2019 at 08:15:37PM +0300, Leon Romanovsky wrote:
>>   void __mlx5_ib_remove(struct mlx5_ib_dev *dev,
>> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
>> index 9cf23ae6324e..556af34b788b 100644
>> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
>> @@ -944,6 +944,13 @@ struct mlx5_ib_pf_eq {
>>   	mempool_t *pool;
>>   };
>>   
>> +struct mlx5_devx_event_table {
>> +	struct mlx5_nb devx_nb;
>> +	/* serialize updating the event_xa */
>> +	struct mutex event_xa_lock;
>> +	struct xarray event_xa;
>> +};
>> +
>>   struct mlx5_ib_dev {
>>   	struct ib_device		ib_dev;
>>   	struct mlx5_core_dev		*mdev;
>> @@ -994,6 +1001,7 @@ struct mlx5_ib_dev {
>>   	struct mlx5_srq_table   srq_table;
>>   	struct mlx5_async_ctx   async_ctx;
>>   	int			free_port;
>> +	struct mlx5_devx_event_table devx_event_table;
> 
> I really question if adding all these structs really does anything for
> readability..
> 

I would prefer this option to add only one structure (i.e. 
mlx5_devx_event_table) on ib_dev, it will hold internally the other 
related stuff.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 08/12] IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD
  2019-06-24 13:25     ` Yishai Hadas
@ 2019-06-24 14:30       ` Jason Gunthorpe
  0 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2019-06-24 14:30 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Yishai Hadas,
	Saeed Mahameed, linux-netdev

On Mon, Jun 24, 2019 at 04:25:37PM +0300, Yishai Hadas wrote:
> On 6/24/2019 2:51 PM, Jason Gunthorpe wrote:
> > On Tue, Jun 18, 2019 at 08:15:36PM +0300, Leon Romanovsky wrote:
> > > From: Yishai Hadas <yishaih@mellanox.com>
> > > 
> > > Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD and its initial
> > > implementation.
> > > 
> > > This object is from type class FD and will be used to read DEVX
> > > async events.
> > > 
> > > Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
> > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > >   drivers/infiniband/hw/mlx5/devx.c         | 112 ++++++++++++++++++++--
> > >   include/uapi/rdma/mlx5_user_ioctl_cmds.h  |  10 ++
> > >   include/uapi/rdma/mlx5_user_ioctl_verbs.h |   4 +
> > >   3 files changed, 116 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
> > > index 80b42d069328..1815ce0f8daf 100644
> > > +++ b/drivers/infiniband/hw/mlx5/devx.c
> > > @@ -33,6 +33,24 @@ struct devx_async_data {
> > >   	struct mlx5_ib_uapi_devx_async_cmd_hdr hdr;
> > >   };
> > > +struct devx_async_event_queue {
> > 
> > It seems to be a mistake to try and re-use the async_event_queue for
> > both cmd and event, as they use it very differently and don't even
> > store the same things in the event_list. I think it is bettter to just
> > inline this into devx_async_event_file (and inline the old struct in
> > the cmd file
> > 
> 
> How about having another struct with all the event's queue fields together ?
> this has the benefit of having all those related fields in one place and
> leave the cmd as is.
> 
> Alternatively,
> We can inline the event stuff under devx_async_event_file and leave the cmd
> for now under a struct as it's not directly related to this series.

I would probbaly do this

> > > +	spinlock_t		lock;
> > > +	wait_queue_head_t	poll_wait;
> > > +	struct list_head	event_list;
> > > +	atomic_t		bytes_in_use;
> > > +	u8			is_destroyed:1;
> > > +	u32			flags;
> > > +};
> > 
> > All the flags testing is ugly, why not just add another bitfield?
> 
> The flags are coming from user space and have their different name space, I
> prefer to not mix with kernel ones. (i.e. is_destroyed).
> Makes sense ?

No, better to add a bitfield than store the raw flags and another
bitfield.

> > > diff --git a/include/uapi/rdma/mlx5_user_ioctl_verbs.h b/include/uapi/rdma/mlx5_user_ioctl_verbs.h
> > > index a8f34c237458..57beea4589e4 100644
> > > +++ b/include/uapi/rdma/mlx5_user_ioctl_verbs.h
> > > @@ -63,5 +63,9 @@ enum mlx5_ib_uapi_dm_type {
> > >   	MLX5_IB_UAPI_DM_TYPE_HEADER_MODIFY_SW_ICM,
> > >   };
> > > +enum mlx5_ib_uapi_devx_create_event_channel_flags {
> > > +	MLX5_IB_UAPI_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA = 1
> > > << 0,
> > 
> > Maybe this name is too long
> 
> Quite long but follows the name scheme having the UAPI prefix.
> Any shorter suggestion ?
> 

I think you should shorten it

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 09/12] IB/mlx5: Register DEVX with mlx5_core to get async events
  2019-06-24 13:36     ` Yishai Hadas
@ 2019-06-24 14:30       ` Jason Gunthorpe
  0 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2019-06-24 14:30 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky,
	RDMA mailing list, Yishai Hadas, Saeed Mahameed, linux-netdev

On Mon, Jun 24, 2019 at 04:36:44PM +0300, Yishai Hadas wrote:
> On 6/24/2019 2:52 PM, Jason Gunthorpe wrote:
> > On Tue, Jun 18, 2019 at 08:15:37PM +0300, Leon Romanovsky wrote:
> > >   void __mlx5_ib_remove(struct mlx5_ib_dev *dev,
> > > diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> > > index 9cf23ae6324e..556af34b788b 100644
> > > +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> > > @@ -944,6 +944,13 @@ struct mlx5_ib_pf_eq {
> > >   	mempool_t *pool;
> > >   };
> > > +struct mlx5_devx_event_table {
> > > +	struct mlx5_nb devx_nb;
> > > +	/* serialize updating the event_xa */
> > > +	struct mutex event_xa_lock;
> > > +	struct xarray event_xa;
> > > +};
> > > +
> > >   struct mlx5_ib_dev {
> > >   	struct ib_device		ib_dev;
> > >   	struct mlx5_core_dev		*mdev;
> > > @@ -994,6 +1001,7 @@ struct mlx5_ib_dev {
> > >   	struct mlx5_srq_table   srq_table;
> > >   	struct mlx5_async_ctx   async_ctx;
> > >   	int			free_port;
> > > +	struct mlx5_devx_event_table devx_event_table;
> > 
> > I really question if adding all these structs really does anything for
> > readability..
> > 
> 
> I would prefer this option to add only one structure (i.e.
> mlx5_devx_event_table) on ib_dev, it will hold internally the other related
> stuff.

It seems confounding but generally is the style in this struct :\

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 10/12] IB/mlx5: Enable subscription for device events over DEVX
  2019-06-24 11:57   ` Jason Gunthorpe
@ 2019-06-24 16:13     ` Yishai Hadas
  2019-06-24 17:56       ` Jason Gunthorpe
  0 siblings, 1 reply; 35+ messages in thread
From: Yishai Hadas @ 2019-06-24 16:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky,
	RDMA mailing list, Yishai Hadas, Saeed Mahameed, linux-netdev

On 6/24/2019 2:57 PM, Jason Gunthorpe wrote:
> On Tue, Jun 18, 2019 at 08:15:38PM +0300, Leon Romanovsky wrote:
>> From: Yishai Hadas <yishaih@mellanox.com>
>>
>> Enable subscription for device events over DEVX.
>>
>> Each subscription is added to the two level XA data structure according
>> to its event number and the DEVX object information in case was given
>> with the given target fd.
>>
>> Those events will be reported over the given fd once will occur.
>> Downstream patches will mange the dispatching to any subscription.
>>
>> Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
>> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>>   drivers/infiniband/hw/mlx5/devx.c        | 564 ++++++++++++++++++++++-
>>   include/uapi/rdma/mlx5_user_ioctl_cmds.h |   9 +
>>   2 files changed, 566 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
>> index e9b9ba5a3e9a..304b13e7a265 100644
>> +++ b/drivers/infiniband/hw/mlx5/devx.c
>> @@ -14,6 +14,7 @@
>>   #include <linux/mlx5/driver.h>
>>   #include <linux/mlx5/fs.h>
>>   #include "mlx5_ib.h"
>> +#include <linux/xarray.h>
>>   
>>   #define UVERBS_MODULE_NAME mlx5_ib
>>   #include <rdma/uverbs_named_ioctl.h>
>> @@ -33,6 +34,37 @@ struct devx_async_data {
>>   	struct mlx5_ib_uapi_devx_async_cmd_hdr hdr;
>>   };
>>   
>> +/* first level XA value data structure */
>> +struct devx_event {
>> +	struct xarray object_ids; /* second XA level, Key = object id */
>> +	struct list_head unaffiliated_list;
>> +};
>> +
>> +/* second level XA value data structure */
>> +struct devx_obj_event {
>> +	struct rcu_head rcu;
>> +	struct list_head obj_sub_list;
>> +};
>> +
>> +struct devx_event_subscription {
>> +	struct list_head file_list; /* headed in private_data->
>> +				     * subscribed_events_list
>> +				     */
>> +	struct list_head xa_list; /* headed in devx_event->unaffiliated_list or
>> +				   * devx_obj_event->obj_sub_list
>> +				   */
>> +	struct list_head obj_list; /* headed in devx_object */
>> +
>> +	u32 xa_key_level1;
>> +	u32 xa_key_level2;
>> +	struct rcu_head	rcu;
>> +	u64 cookie;
>> +	bool is_obj_related;
>> +	struct ib_uobject *fd_uobj;
>> +	void *object;	/* May need direct access upon hot unplug */
> 
> This should be a 'struct file *' and have a better name.
> 

OK, will change.

> And I'm unclear why we need to store both the ib_uobject and the
> struct file for the same thing? 

Post hot unplug/unbind the uobj can't be accessed any more to reach the 
object as it will be set to NULL by ib_core layer [1].
As the filp is still open we need a direct access to it down the road 
and for that we have it separately.

This was the comment that I have just put above in the code, I may 
improve it with more details as pointed here.

[1]
https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/rdma_core.c#L149

And why are we storing the uobj here
> instead of the struct devx_async_event_file *?
> 

Basically storing the uobj is the same as of storing the 
devx_async_event_file, as we just use container_of to get it from.
There is no direct access from uobj to any of its fields so that we 
should be fine.

> Since uobj->object == flip && filp->private_data == uobj, I have a
> hard time to understand why we need both things, it seems to me that
> if we get the fget on the filp then we can rely on the
> filp->private_data to get back to the devx_async_event_file.
> 

The idea was to not take an extra ref count on the file (i.e. fget) per 
subscription, this will let the release option to be called once the 
file will be closed by the application.
Otherwise we might need to consider having some unsubscribe method to 
put the ref count back with all its overhead and implications without a 
real justified reason.


>> +	struct eventfd_ctx *eventfd;
>> +};
>> +
> 
>>   /*
>>    * As the obj_id in the firmware is not globally unique the object type
>>    * must be considered upon checking for a valid object id.
>> @@ -1143,14 +1275,53 @@ static void devx_cleanup_mkey(struct devx_obj *obj)
>>   	write_unlock_irqrestore(&table->lock, flags);
>>   }
>>   
>> +static void devx_cleanup_subscription(struct mlx5_ib_dev *dev,
>> +				      struct devx_event_subscription *sub)
>> +{
>> +	list_del_rcu(&sub->file_list);
>> +	list_del_rcu(&sub->xa_list);
>> +
>> +	if (sub->is_obj_related) {
> 
> is_obj_related looks like it is just list_empty(obj_list)??
>


Yes, in that approach we may need to call INIT_LIST_HEAD(sub->obj_list) 
in case it wasn't an object upon subscription, will do that.

> Success oriented flow
> 
>> @@ -1523,6 +1700,350 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_OBJ_ASYNC_QUERY)(
>>   	return err;
>>   }
>>   
>> +static void
>> +subscribe_event_xa_dealloc(struct mlx5_devx_event_table *devx_event_table,
>> +			   u32 key_level1,
>> +			   u32 key_level2,
>> +			   struct devx_obj_event *alloc_obj_event)
>> +{
>> +	struct devx_event *event;
>> +
>> +	/* Level 1 is valid for future use - no need to free */
>> +	if (!alloc_obj_event)
>> +		return;
>> +
>> +	event = xa_load(&devx_event_table->event_xa, key_level1);
>> +	WARN_ON(!event);
>> +
>> +	xa_erase(&event->object_ids, key_level2);
> 
> Shoulnd't this only erase if the value stored is NULL?
> 

If this key_level2 wasn't allocated by the subscribe flow and exists 
before we may not reach here and return from the above lines in this 
function [1], otherwise we need to erase as done here.

[1]
if (!alloc_obj_event)
	return;

>> +	kfree(alloc_obj_event);
>> +}
>> +
>> +static int
>> +subscribe_event_xa_alloc(struct mlx5_devx_event_table *devx_event_table,
>> +			 u32 key_level1,
>> +			 bool is_level2,
>> +			 u32 key_level2,
>> +			 struct devx_obj_event **alloc_obj_event)
>> +{
>> +	struct devx_obj_event *obj_event;
>> +	struct devx_event *event;
>> +	bool new_entry_level1 = false;
>> +	int err;
>> +
>> +	event = xa_load(&devx_event_table->event_xa, key_level1);
>> +	if (!event) {
>> +		event = kzalloc(sizeof(*event), GFP_KERNEL);
>> +		if (!event)
>> +			return -ENOMEM;
>> +
>> +		new_entry_level1 = true;
>> +		INIT_LIST_HEAD(&event->unaffiliated_list);
>> +		xa_init(&event->object_ids);
>> +
>> +		err = xa_insert(&devx_event_table->event_xa,
>> +				key_level1,
>> +				event,
>> +				GFP_KERNEL);
>> +		if (err)
>> +			goto end;
>> +	}
>> +
>> +	if (!is_level2)
>> +		return 0;
>> +
>> +	obj_event = xa_load(&event->object_ids, key_level2);
>> +	if (!obj_event) {
>> +		err = xa_reserve(&event->object_ids, key_level2, GFP_KERNEL);
>> +		if (err)
>> +			goto err_level1;
>> +
>> +		obj_event = kzalloc(sizeof(*obj_event), GFP_KERNEL);
>> +		if (!obj_event) {
>> +			err = -ENOMEM;
>> +			goto err_level2;
>> +		}
>> +
>> +		INIT_LIST_HEAD(&obj_event->obj_sub_list);
>> +		*alloc_obj_event = obj_event;
> 
> This is goofy, just store the empty obj_event in the xa instead of
> using xa_reserve, and when you go to do the error unwind just delete
> any level2' devx_obj_event' that has a list_empty(obj_sub_list), get
> rid of the wonky alloc_obj_event stuff.
> 

Please see my answer above about how level2 is managed by this 
alloc_obj_event, is that really worth a change ? I found current logic 
to be clear. I may put some note here if we can stay with that.

> The best configuration would be to use devx_cleanup_subscription to
> undo the partially ready subscription.
> 

This partially ready subscription might not match the 
devx_cleanup_subscription(), e.g. it wasn't added to xa_list and can't 
be deleted without any specific flag to ignore ..


>> +	}
>> +
>> +	return 0;
>> +
>> +err_level2:
>> +	xa_erase(&event->object_ids, key_level2);
>> +
>> +err_level1:
>> +	if (new_entry_level1)
>> +		xa_erase(&devx_event_table->event_xa, key_level1);
>> +end:
>> +	if (new_entry_level1)
>> +		kfree(event);
> 
> Can't do this, once the level1 is put in the tree it could be referenced by
> the irqs. At least it needs a kfree_rcu, most likely it is simpler to
> just leave it.
> 

Agree, it looks simpler just to leave it, will handle.

>> +#define MAX_NUM_EVENTS 16
>> +static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_SUBSCRIBE_EVENT)(
>> +	struct uverbs_attr_bundle *attrs)
>> +{
>> +	struct ib_uobject *devx_uobj = uverbs_attr_get_uobject(
>> +				attrs,
>> +				MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_OBJ_HANDLE);
>> +	struct mlx5_ib_ucontext *c = rdma_udata_to_drv_context(
>> +		&attrs->driver_udata, struct mlx5_ib_ucontext, ibucontext);
>> +	struct mlx5_ib_dev *dev = to_mdev(c->ibucontext.device);
>> +	struct ib_uobject *fd_uobj;
>> +	struct devx_obj *obj = NULL;
>> +	struct devx_async_event_file *ev_file;
>> +	struct mlx5_devx_event_table *devx_event_table = &dev->devx_event_table;
>> +	u16 *event_type_num_list;
>> +	struct devx_event_subscription **event_sub_arr;
>> +	struct devx_obj_event  **event_obj_array_alloc;
>> +	int redirect_fd;
>> +	bool use_eventfd = false;
>> +	int num_events;
>> +	int num_alloc_xa_entries = 0;
>> +	u16 obj_type = 0;
>> +	u64 cookie = 0;
>> +	u32 obj_id = 0;
>> +	int err;
>> +	int i;
>> +
>> +	if (!c->devx_uid)
>> +		return -EINVAL;
>> +
>> +	if (!IS_ERR(devx_uobj)) {
>> +		obj = (struct devx_obj *)devx_uobj->object;
>> +		if (obj)
>> +			obj_id = get_dec_obj_id(obj->obj_id);
>> +	}
>> +
>> +	fd_uobj = uverbs_attr_get_uobject(attrs,
>> +				MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_FD_HANDLE);
>> +	if (IS_ERR(fd_uobj))
>> +		return PTR_ERR(fd_uobj);
>> +
>> +	ev_file = container_of(fd_uobj, struct devx_async_event_file,
>> +			       uobj);
>> +
>> +	if (uverbs_attr_is_valid(attrs,
>> +				 MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_FD_NUM)) {
>> +		err = uverbs_copy_from(&redirect_fd, attrs,
>> +			       MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_FD_NUM);
>> +		if (err)
>> +			return err;
>> +
>> +		use_eventfd = true;
>> +	}
>> +
>> +	if (uverbs_attr_is_valid(attrs,
>> +				 MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_COOKIE)) {
>> +		if (use_eventfd)
>> +			return -EINVAL;
>> +
>> +		err = uverbs_copy_from(&cookie, attrs,
>> +				MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_COOKIE);
>> +		if (err)
>> +			return err;
>> +	}
>> +
>> +	num_events = uverbs_attr_ptr_get_array_size(
>> +		attrs, MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_TYPE_NUM_LIST,
>> +		sizeof(u16));
>> +
>> +	if (num_events < 0)
>> +		return num_events;
>> +
>> +	if (num_events > MAX_NUM_EVENTS)
>> +		return -EINVAL;
>> +
>> +	event_type_num_list = uverbs_attr_get_alloced_ptr(attrs,
>> +			MLX5_IB_ATTR_DEVX_SUBSCRIBE_EVENT_TYPE_NUM_LIST);
>> +
>> +	if (!is_valid_events(dev->mdev, num_events, event_type_num_list, obj))
>> +		return -EINVAL;
>> +
>> +	event_sub_arr = uverbs_zalloc(attrs,
>> +		MAX_NUM_EVENTS * sizeof(struct devx_event_subscription *));
>> +	event_obj_array_alloc = uverbs_zalloc(attrs,
>> +		MAX_NUM_EVENTS * sizeof(struct devx_obj_event *));
> 
> There are so many list_heads in the devx_event_subscription, why not
> use just one of them to store the allocated events instead of this
> temp array? ie event_list looks good for this purpose.
> 

I'm using the array later on with direct access to the index that should 
be de-allocated. I would prefer staying with this array rather than 
using the 'event_list' which has other purpose down the road, it's used 
per subscription and doesn't look match to hold the devx_obj_event which 
has no list entry for this purpose..

>> +
>> +	if (!event_sub_arr || !event_obj_array_alloc)
>> +		return -ENOMEM;
>> +
>> +	/* Protect from concurrent subscriptions to same XA entries to allow
>> +	 * both to succeed
>> +	 */
>> +	mutex_lock(&devx_event_table->event_xa_lock);
>> +	for (i = 0; i < num_events; i++) {
>> +		u32 key_level1;
>> +
>> +		if (obj)
>> +			obj_type = get_dec_obj_type(obj,
>> +						    event_type_num_list[i]);
>> +		key_level1 = event_type_num_list[i] | obj_type << 16;
>> +
>> +		err = subscribe_event_xa_alloc(devx_event_table,
>> +					       key_level1,
>> +					       obj ? true : false,
>> +					       obj_id,
>> +					       &event_obj_array_alloc[i]);
> 
> Usless ?:

What do you suggest instead ?

> 
>> +		if (err)
>> +			goto err;
>> +
>> +		num_alloc_xa_entries++;
>> +		event_sub_arr[i] = kzalloc(sizeof(*event_sub_arr[i]),
>> +					   GFP_KERNEL);
>> +		if (!event_sub_arr[i])
>> +			goto err;
>> +
>> +		if (use_eventfd) {
>> +			event_sub_arr[i]->eventfd =
>> +				eventfd_ctx_fdget(redirect_fd);
>> +
>> +			if (IS_ERR(event_sub_arr[i]->eventfd)) {
>> +				err = PTR_ERR(event_sub_arr[i]->eventfd);
>> +				event_sub_arr[i]->eventfd = NULL;
>> +				goto err;
>> +			}
>> +		}
>> +
>> +		event_sub_arr[i]->cookie = cookie;
>> +		event_sub_arr[i]->fd_uobj = fd_uobj;
>> +		event_sub_arr[i]->object = fd_uobj->object;
>> +		/* May be needed upon cleanup the devx object/subscription */
>> +		event_sub_arr[i]->xa_key_level1 = key_level1;
>> +		event_sub_arr[i]->xa_key_level2 = obj_id;
>> +		event_sub_arr[i]->is_obj_related = obj ? true : false;
> 



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 11/12] IB/mlx5: Implement DEVX dispatching event
  2019-06-24 12:03   ` Jason Gunthorpe
@ 2019-06-24 16:55     ` Yishai Hadas
  2019-06-24 18:06       ` Jason Gunthorpe
  0 siblings, 1 reply; 35+ messages in thread
From: Yishai Hadas @ 2019-06-24 16:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky,
	RDMA mailing list, Yishai Hadas, Saeed Mahameed, linux-netdev

On 6/24/2019 3:03 PM, Jason Gunthorpe wrote:
> On Tue, Jun 18, 2019 at 08:15:39PM +0300, Leon Romanovsky wrote:
>> From: Yishai Hadas <yishaih@mellanox.com>
>>
>> Implement DEVX dispatching event by looking up for the applicable
>> subscriptions for the reported event and using their target fd to
>> signal/set the event.
>>
>> Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
>> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>>   drivers/infiniband/hw/mlx5/devx.c         | 362 +++++++++++++++++++++-
>>   include/uapi/rdma/mlx5_user_ioctl_verbs.h |   5 +
>>   2 files changed, 357 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
>> index 304b13e7a265..49fdce95d6d9 100644
>> +++ b/drivers/infiniband/hw/mlx5/devx.c
>> @@ -34,6 +34,11 @@ struct devx_async_data {
>>   	struct mlx5_ib_uapi_devx_async_cmd_hdr hdr;
>>   };
>>   
>> +struct devx_async_event_data {
>> +	struct list_head list; /* headed in ev_queue->event_list */
>> +	struct mlx5_ib_uapi_devx_async_event_hdr hdr;
>> +};
>> +
>>   /* first level XA value data structure */
>>   struct devx_event {
>>   	struct xarray object_ids; /* second XA level, Key = object id */
>> @@ -54,7 +59,9 @@ struct devx_event_subscription {
>>   				   * devx_obj_event->obj_sub_list
>>   				   */
>>   	struct list_head obj_list; /* headed in devx_object */
>> +	struct list_head event_list; /* headed in ev_queue->event_list */
>>   
>> +	u8  is_cleaned:1;
> 
> There is a loose bool 'is_obj_related' that should be combined with
> this bool bitfield as well.
> 

OK

>>   static void devx_cleanup_subscription(struct mlx5_ib_dev *dev,
>> -				      struct devx_event_subscription *sub)
>> +				      struct devx_event_subscription *sub,
>> +				      bool file_close)
>>   {
>> -	list_del_rcu(&sub->file_list);
>> +	if (sub->is_cleaned)
>> +		goto end;
>> +
>> +	sub->is_cleaned = 1;
>>   	list_del_rcu(&sub->xa_list);
>>   
>>   	if (sub->is_obj_related) {
>> @@ -1303,10 +1355,15 @@ static void devx_cleanup_subscription(struct mlx5_ib_dev *dev,
>>   		}
>>   	}
>>   
>> -	if (sub->eventfd)
>> -		eventfd_ctx_put(sub->eventfd);
>> +end:
>> +	if (file_close) {
>> +		if (sub->eventfd)
>> +			eventfd_ctx_put(sub->eventfd);
>>   
>> -	kfree_rcu(sub, rcu);
>> +		list_del_rcu(&sub->file_list);
>> +		/* subscription may not be used by the read API any more */
>> +		kfree_rcu(sub, rcu);
>> +	}
> 
> Dis like this confusing file_close stuff, just put this in the single place
> that calls this with the true bool
> 

OK, will do.

>> +static int deliver_event(struct devx_event_subscription *event_sub,
>> +			 const void *data)
>> +{
>> +	struct ib_uobject *fd_uobj = event_sub->fd_uobj;
>> +	struct devx_async_event_file *ev_file;
>> +	struct devx_async_event_queue *ev_queue;
>> +	struct devx_async_event_data *event_data;
>> +	unsigned long flags;
>> +	bool omit_data;
>> +
>> +	ev_file = container_of(fd_uobj, struct devx_async_event_file,
>> +			       uobj);
>> +	ev_queue = &ev_file->ev_queue;
>> +	omit_data = ev_queue->flags &
>> +		MLX5_IB_UAPI_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA;
>> +
>> +	if (omit_data) {
>> +		spin_lock_irqsave(&ev_queue->lock, flags);
>> +		if (!list_empty(&event_sub->event_list)) {
>> +			spin_unlock_irqrestore(&ev_queue->lock, flags);
>> +			return 0;
>> +		}
>> +
>> +		list_add_tail(&event_sub->event_list, &ev_queue->event_list);
>> +		spin_unlock_irqrestore(&ev_queue->lock, flags);
>> +		wake_up_interruptible(&ev_queue->poll_wait);
>> +		return 0;
>> +	}
>> +
>> +	event_data = kzalloc(sizeof(*event_data) +
>> +			     (omit_data ? 0 : sizeof(struct mlx5_eqe)),
>> +			     GFP_ATOMIC);
> 
> omit_data is always false here
> 

Correct, will clean it up.

>> +	if (!event_data) {
>> +		spin_lock_irqsave(&ev_queue->lock, flags);
>> +		ev_queue->is_overflow_err = 1;
>> +		spin_unlock_irqrestore(&ev_queue->lock, flags);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	event_data->hdr.cookie = event_sub->cookie;
>> +	memcpy(event_data->hdr.out_data, data, sizeof(struct mlx5_eqe));
>> +
>> +	spin_lock_irqsave(&ev_queue->lock, flags);
>> +	list_add_tail(&event_data->list, &ev_queue->event_list);
>> +	spin_unlock_irqrestore(&ev_queue->lock, flags);
>> +	wake_up_interruptible(&ev_queue->poll_wait);
>> +
>> +	return 0;
>> +}
>> +
>> +static void dispatch_event_fd(struct list_head *fd_list,
>> +			      const void *data)
>> +{
>> +	struct devx_event_subscription *item;
>> +
>> +	list_for_each_entry_rcu(item, fd_list, xa_list) {
>> +		if (!get_file_rcu((struct file *)item->object))
>> +			continue;
>> +
>> +		if (item->eventfd) {
>> +			eventfd_signal(item->eventfd, 1);
>> +			fput(item->object);
>> +			continue;
>> +		}
>> +
>> +		deliver_event(item, data);
>> +		fput(item->object);
>> +	}
>> +}
>> +
>>   static int devx_event_notifier(struct notifier_block *nb,
>>   			       unsigned long event_type, void *data)
>>   {
>> -	return NOTIFY_DONE;
>> +	struct mlx5_devx_event_table *table;
>> +	struct mlx5_ib_dev *dev;
>> +	struct devx_event *event;
>> +	struct devx_obj_event *obj_event;
>> +	u16 obj_type = 0;
>> +	bool is_unaffiliated;
>> +	u32 obj_id;
>> +
>> +	/* Explicit filtering to kernel events which may occur frequently */
>> +	if (event_type == MLX5_EVENT_TYPE_CMD ||
>> +	    event_type == MLX5_EVENT_TYPE_PAGE_REQUEST)
>> +		return NOTIFY_OK;
>> +
>> +	table = container_of(nb, struct mlx5_devx_event_table, devx_nb.nb);
>> +	dev = container_of(table, struct mlx5_ib_dev, devx_event_table);
>> +	is_unaffiliated = is_unaffiliated_event(dev->mdev, event_type);
>> +
>> +	if (!is_unaffiliated)
>> +		obj_type = get_event_obj_type(event_type, data);
>> +	event = xa_load(&table->event_xa, event_type | (obj_type << 16));
>> +	if (!event)
>> +		return NOTIFY_DONE;
> 
> event should be in the rcu as well

Do we really need this ? I didn't see a flow that really requires that.
> 
>> +	if (is_unaffiliated) {
>> +		dispatch_event_fd(&event->unaffiliated_list, data);
>> +		return NOTIFY_OK;
>> +	}
>> +
>> +	obj_id = devx_get_obj_id_from_event(event_type, data);
>> +	rcu_read_lock();
>> +	obj_event = xa_load(&event->object_ids, obj_id);
>> +	if (!obj_event) {
>> +		rcu_read_unlock();
>> +		return NOTIFY_DONE;
>> +	}
>> +
>> +	dispatch_event_fd(&obj_event->obj_sub_list, data);
>> +
>> +	rcu_read_unlock();
>> +	return NOTIFY_OK;
>>   }
>>   
>>   void mlx5_ib_devx_init_event_table(struct mlx5_ib_dev *dev)
>> @@ -2221,7 +2444,7 @@ void mlx5_ib_devx_cleanup_event_table(struct mlx5_ib_dev *dev)
>>   		event = entry;
>>   		list_for_each_entry_safe(sub, tmp, &event->unaffiliated_list,
>>   					 xa_list)
>> -			devx_cleanup_subscription(dev, sub);
>> +			devx_cleanup_subscription(dev, sub, false);
>>   		kfree(entry);
>>   	}
>>   	mutex_unlock(&dev->devx_event_table.event_xa_lock);
>> @@ -2329,18 +2552,126 @@ static const struct file_operations devx_async_cmd_event_fops = {
>>   static ssize_t devx_async_event_read(struct file *filp, char __user *buf,
>>   				     size_t count, loff_t *pos)
>>   {
>> -	return -EINVAL;
>> +	struct devx_async_event_file *ev_file = filp->private_data;
>> +	struct devx_async_event_queue *ev_queue = &ev_file->ev_queue;
>> +	struct devx_event_subscription *event_sub;
>> +	struct devx_async_event_data *uninitialized_var(event);
>> +	int ret = 0;
>> +	size_t eventsz;
>> +	bool omit_data;
>> +	void *event_data;
>> +
>> +	omit_data = ev_queue->flags &
>> +		MLX5_IB_UAPI_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA;
>> +
>> +	spin_lock_irq(&ev_queue->lock);
>> +
>> +	if (ev_queue->is_overflow_err) {
>> +		ev_queue->is_overflow_err = 0;
>> +		spin_unlock_irq(&ev_queue->lock);
>> +		return -EOVERFLOW;
>> +	}
>> +
>> +	while (list_empty(&ev_queue->event_list)) {
>> +		spin_unlock_irq(&ev_queue->lock);
>> +
>> +		if (filp->f_flags & O_NONBLOCK)
>> +			return -EAGAIN;
>> +
>> +		if (wait_event_interruptible(ev_queue->poll_wait,
>> +			    (!list_empty(&ev_queue->event_list) ||
>> +			     ev_queue->is_destroyed))) {
>> +			return -ERESTARTSYS;
>> +		}
>> +
>> +		if (list_empty(&ev_queue->event_list) &&
>> +		    ev_queue->is_destroyed)
>> +			return -EIO;
> 
> All these tests should be under the lock.

We can't call wait_event_interruptible() above which may sleep under the 
lock, correct ? are you referring to the list_empty() and is_destroyed ?

By the way looking in uverb code [1], similar code which is not done 
under the lock as of here..

[1] 
https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/uverbs_main.c#L244


> 
> Why don't we return EIO as soon as is-destroyed happens? What is the
> point of flushing out the accumulated events?

It follows the above uverb code/logic that returns existing events even 
in that case, also the async command events in this file follows that 
logic, I suggest to stay consistent.
> 
>> +
>> +		spin_lock_irq(&ev_queue->lock);
>> +	}
>> +
>> +	if (omit_data) {
>> +		event_sub = list_first_entry(&ev_queue->event_list,
>> +					struct devx_event_subscription,
>> +					event_list);
>> +		eventsz = sizeof(event_sub->cookie);
>> +		event_data = &event_sub->cookie;
>> +	} else {
>> +		event = list_first_entry(&ev_queue->event_list,
>> +				      struct devx_async_event_data, list);
>> +		eventsz = sizeof(struct mlx5_eqe) +
>> +			sizeof(struct mlx5_ib_uapi_devx_async_event_hdr);
>> +		event_data = &event->hdr;
>> +	}
>> +
>> +	if (eventsz > count) {
>> +		spin_unlock_irq(&ev_queue->lock);
>> +		return -ENOSPC;
> 
> This is probably the wrong errno

OK, will change to -EINVAL as in uverbs
https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/uverbs_main.c#L254

> 
>> +	}
>> +
>> +	if (omit_data)
>> +		list_del_init(&event_sub->event_list);
>> +	else
>> +		list_del(&event->list);
>> +
>> +	spin_unlock_irq(&ev_queue->lock);
>> +
>> +	if (copy_to_user(buf, event_data, eventsz))
>> +		ret = -EFAULT;
>> +	else
>> +		ret = eventsz;
> 
> This is really poorly ordered, EFAULT will cause the event to be lost. :(

Agree but apparently rare case .. see also the below notes.

> 
> Maybe the event should be re-added on error? Tricky.

What will happen if another copy_to_user may then fail again (loop ?) 
... not sure that we want to get into this tricky handling ...

As of above, It follows the logic from uverbs at that area.
https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/uverbs_main.c#L267

> 
>> +	if (!omit_data)
>> +		kfree(event);
>> +	return ret;
>>   }
>>   
>>   static __poll_t devx_async_event_poll(struct file *filp,
>>   				      struct poll_table_struct *wait)
>>   {
>> -	return 0;
>> +	struct devx_async_event_file *ev_file = filp->private_data;
>> +	struct devx_async_event_queue *ev_queue = &ev_file->ev_queue;
>> +	__poll_t pollflags = 0;
>> +
>> +	poll_wait(filp, &ev_queue->poll_wait, wait);
>> +
>> +	spin_lock_irq(&ev_queue->lock);
>> +	if (ev_queue->is_destroyed)
>> +		pollflags = EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
>> +	else if (!list_empty(&ev_queue->event_list))
>> +		pollflags = EPOLLIN | EPOLLRDNORM;
>> +	spin_unlock_irq(&ev_queue->lock);
>> +
>> +	return pollflags;
>>   }
>>   
>>   static int devx_async_event_close(struct inode *inode, struct file *filp)
>>   {
>> +	struct ib_uobject *uobj = filp->private_data;
>> +	struct devx_async_event_file *ev_file =
>> +		container_of(uobj, struct devx_async_event_file, uobj);
>> +	struct devx_event_subscription *event_sub, *event_sub_tmp;
>> +	struct devx_async_event_data *entry, *tmp;
>> +
>> +	mutex_lock(&ev_file->dev->devx_event_table.event_xa_lock);
>> +	/* delete the subscriptions which are related to this FD */
>> +	list_for_each_entry_safe(event_sub, event_sub_tmp,
>> +				 &ev_file->subscribed_events_list, file_list)
>> +		devx_cleanup_subscription(ev_file->dev, event_sub, true);
>> +	mutex_unlock(&ev_file->dev->devx_event_table.event_xa_lock);
>> +
>> +	/* free the pending events allocation */
>> +	if (!(ev_file->ev_queue.flags &
>> +	    MLX5_IB_UAPI_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA)) {
>> +		spin_lock_irq(&ev_file->ev_queue.lock);
>> +		list_for_each_entry_safe(entry, tmp,
>> +					 &ev_file->ev_queue.event_list, list)
>> +			kfree(entry); /* read can't come any nore */
> 
> spelling

OK

> 
>> +		spin_unlock_irq(&ev_file->ev_queue.lock);
>> +	}
>>   	uverbs_close_fd(filp);
>> +	put_device(&ev_file->dev->ib_dev.dev);
>>   	return 0;
>>   }
>>   
>> @@ -2374,6 +2705,17 @@ static int devx_hot_unplug_async_cmd_event_file(struct ib_uobject *uobj,
>>   static int devx_hot_unplug_async_event_file(struct ib_uobject *uobj,
>>   					    enum rdma_remove_reason why)
>>   {
>> +	struct devx_async_event_file *ev_file =
>> +		container_of(uobj, struct devx_async_event_file,
>> +			     uobj);
>> +	struct devx_async_event_queue *ev_queue = &ev_file->ev_queue;
>> +
>> +	spin_lock_irq(&ev_queue->lock);
>> +	ev_queue->is_destroyed = 1;
>> +	spin_unlock_irq(&ev_queue->lock);
>> +
>> +	if (why == RDMA_REMOVE_DRIVER_REMOVE)
>> +		wake_up_interruptible(&ev_queue->poll_wait);
> 
> Why isn't this wakeup always done?

Maybe you are right and this can be always done to wake up any readers 
as the 'is_destroyed' was set.

By the way, any idea why it was done as such in uverbs [1] for similar 
flow ? also the command events follows that.


[1]
https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/uverbs_std_types.c#L207

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 12/12] IB/mlx5: Add DEVX support for CQ events
  2019-06-24 12:04   ` Jason Gunthorpe
@ 2019-06-24 17:03     ` Yishai Hadas
  2019-06-24 18:06       ` Jason Gunthorpe
  0 siblings, 1 reply; 35+ messages in thread
From: Yishai Hadas @ 2019-06-24 17:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky,
	RDMA mailing list, Yishai Hadas, Saeed Mahameed, linux-netdev

On 6/24/2019 3:04 PM, Jason Gunthorpe wrote:
> On Tue, Jun 18, 2019 at 08:15:40PM +0300, Leon Romanovsky wrote:
>> From: Yishai Hadas <yishaih@mellanox.com>
>>
>> Add DEVX support for CQ events by creating and destroying the CQ via
>> mlx5_core and set an handler to manage its completions.
>>
>> Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
>> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>>   drivers/infiniband/hw/mlx5/devx.c | 40 +++++++++++++++++++++++++++++++
>>   1 file changed, 40 insertions(+)
>>
>> diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
>> index 49fdce95d6d9..91ccd58ebc05 100644
>> +++ b/drivers/infiniband/hw/mlx5/devx.c
>> @@ -19,9 +19,12 @@
>>   #define UVERBS_MODULE_NAME mlx5_ib
>>   #include <rdma/uverbs_named_ioctl.h>
>>   
>> +static void dispatch_event_fd(struct list_head *fd_list, const void *data);
>> +
>>   enum devx_obj_flags {
>>   	DEVX_OBJ_FLAGS_INDIRECT_MKEY = 1 << 0,
>>   	DEVX_OBJ_FLAGS_DCT = 1 << 1,
>> +	DEVX_OBJ_FLAGS_CQ = 1 << 2,
>>   };
>>   
>>   struct devx_async_data {
>> @@ -94,6 +97,7 @@ struct devx_async_event_file {
>>   #define MLX5_MAX_DESTROY_INBOX_SIZE_DW MLX5_ST_SZ_DW(delete_fte_in)
>>   struct devx_obj {
>>   	struct mlx5_core_dev	*mdev;
>> +	struct mlx5_ib_dev	*ib_dev;
> 
> This seems strange, why would we need to store the core_dev and the ib_dev
> in a struct when ibdev->mdev == core_dev?
> 

We need to add the ib_dev as we can't access it from the core_dev.
Most of this patch we can probably go and drop the mdev and access it 
from ib_dev, I preferred to not handle that in this patch.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 10/12] IB/mlx5: Enable subscription for device events over DEVX
  2019-06-24 16:13     ` Yishai Hadas
@ 2019-06-24 17:56       ` Jason Gunthorpe
  0 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2019-06-24 17:56 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky,
	RDMA mailing list, Yishai Hadas, Saeed Mahameed, linux-netdev

On Mon, Jun 24, 2019 at 07:13:14PM +0300, Yishai Hadas wrote:
> > > +	u32 xa_key_level1;
> > > +	u32 xa_key_level2;
> > > +	struct rcu_head	rcu;
> > > +	u64 cookie;
> > > +	bool is_obj_related;
> > > +	struct ib_uobject *fd_uobj;
> > > +	void *object;	/* May need direct access upon hot unplug */
> > 
> > This should be a 'struct file *' and have a better name.
> > 
> 
> OK, will change.
> 
> > And I'm unclear why we need to store both the ib_uobject and the
> > struct file for the same thing?
> 
> Post hot unplug/unbind the uobj can't be accessed any more to reach the
> object as it will be set to NULL by ib_core layer [1].

struct file users need to get the uobject from the file->private_data
under a fget.

There is only place place that needed fd_uobj, and it was under the
fget section, so it should simply use private_data.

This is why you should only store the struct file and not the uobject.

> This was the comment that I have just put above in the code, I may improve
> it with more details as pointed here.
> 
> [1]
> https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/rdma_core.c#L149

I'm wondering if this is a bug to do this for fds?

> > Since uobj->object == flip && filp->private_data == uobj, I have a
> > hard time to understand why we need both things, it seems to me that
> > if we get the fget on the filp then we can rely on the
> > filp->private_data to get back to the devx_async_event_file.
> > 
> 
> The idea was to not take an extra ref count on the file (i.e. fget) per
> subscription, this will let the release option to be called once the file
> will be closed by the application.

No extra ref is needed, the fget is already obtained in the only place
that needs fd_uobj.

> > > +	obj_event = xa_load(&event->object_ids, key_level2);
> > > +	if (!obj_event) {
> > > +		err = xa_reserve(&event->object_ids, key_level2, GFP_KERNEL);
> > > +		if (err)
> > > +			goto err_level1;
> > > +
> > > +		obj_event = kzalloc(sizeof(*obj_event), GFP_KERNEL);
> > > +		if (!obj_event) {
> > > +			err = -ENOMEM;
> > > +			goto err_level2;
> > > +		}
> > > +
> > > +		INIT_LIST_HEAD(&obj_event->obj_sub_list);
> > > +		*alloc_obj_event = obj_event;
> > 
> > This is goofy, just store the empty obj_event in the xa instead of
> > using xa_reserve, and when you go to do the error unwind just delete
> > any level2' devx_obj_event' that has a list_empty(obj_sub_list), get
> > rid of the wonky alloc_obj_event stuff.
> > 
> 
> Please see my answer above about how level2 is managed by this
> alloc_obj_event, is that really worth a change ? I found current logic to be
> clear. I may put some note here if we can stay with that.

I think it is alot cleaner/simpler than using this extra memory

> > The best configuration would be to use devx_cleanup_subscription to
> > undo the partially ready subscription.
> 
> This partially ready subscription might not match the
> devx_cleanup_subscription(), e.g. it wasn't added to xa_list and can't be
> deleted without any specific flag to ignore ..

Maybe, but I suspect it can work out

> > > +	event_sub_arr = uverbs_zalloc(attrs,
> > > +		MAX_NUM_EVENTS * sizeof(struct devx_event_subscription *));
> > > +	event_obj_array_alloc = uverbs_zalloc(attrs,
> > > +		MAX_NUM_EVENTS * sizeof(struct devx_obj_event *));
> > 
> > There are so many list_heads in the devx_event_subscription, why not
> > use just one of them to store the allocated events instead of this
> > temp array? ie event_list looks good for this purpose.
> > 
> 
> I'm using the array later on with direct access to the index that should be
> de-allocated. I would prefer staying with this array rather than using the
> 'event_list' which has other purpose down the road, it's used per
> subscription and doesn't look match to hold the devx_obj_event which has no
> list entry for this purpose..

Replace the event_obj_array_alloc by storing that data directly in
the xarray

Replace the event_sub_arr by building them into a linked list - it
always need to iterate over the whole list anyhow.

> > > +
> > > +	if (!event_sub_arr || !event_obj_array_alloc)
> > > +		return -ENOMEM;
> > > +
> > > +	/* Protect from concurrent subscriptions to same XA entries to allow
> > > +	 * both to succeed
> > > +	 */
> > > +	mutex_lock(&devx_event_table->event_xa_lock);
> > > +	for (i = 0; i < num_events; i++) {
> > > +		u32 key_level1;
> > > +
> > > +		if (obj)
> > > +			obj_type = get_dec_obj_type(obj,
> > > +						    event_type_num_list[i]);
> > > +		key_level1 = event_type_num_list[i] | obj_type << 16;
> > > +
> > > +		err = subscribe_event_xa_alloc(devx_event_table,
> > > +					       key_level1,
> > > +					       obj ? true : false,
> > > +					       obj_id,
> > > +					       &event_obj_array_alloc[i]);
> > 
> > Usless ?:
> 
> What do you suggest instead ?

Nothing is needed, cast to implicit bool is always forced to
true/false

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 11/12] IB/mlx5: Implement DEVX dispatching event
  2019-06-24 16:55     ` Yishai Hadas
@ 2019-06-24 18:06       ` Jason Gunthorpe
  2019-06-25 14:41         ` Yishai Hadas
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2019-06-24 18:06 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky,
	RDMA mailing list, Yishai Hadas, Saeed Mahameed, linux-netdev

On Mon, Jun 24, 2019 at 07:55:32PM +0300, Yishai Hadas wrote:

> > > +	/* Explicit filtering to kernel events which may occur frequently */
> > > +	if (event_type == MLX5_EVENT_TYPE_CMD ||
> > > +	    event_type == MLX5_EVENT_TYPE_PAGE_REQUEST)
> > > +		return NOTIFY_OK;
> > > +
> > > +	table = container_of(nb, struct mlx5_devx_event_table, devx_nb.nb);
> > > +	dev = container_of(table, struct mlx5_ib_dev, devx_event_table);
> > > +	is_unaffiliated = is_unaffiliated_event(dev->mdev, event_type);
> > > +
> > > +	if (!is_unaffiliated)
> > > +		obj_type = get_event_obj_type(event_type, data);
> > > +	event = xa_load(&table->event_xa, event_type | (obj_type << 16));
> > > +	if (!event)
> > > +		return NOTIFY_DONE;
> > 
> > event should be in the rcu as well
> 
> Do we really need this ? I didn't see a flow that really requires
> that.

I think there are no frees left? Even so it makes much more sense to
include the event in the rcu as if we ever did need to kfree it would
have to be via rcu

> > > +	while (list_empty(&ev_queue->event_list)) {
> > > +		spin_unlock_irq(&ev_queue->lock);
> > > +
> > > +		if (filp->f_flags & O_NONBLOCK)
> > > +			return -EAGAIN;
> > > +
> > > +		if (wait_event_interruptible(ev_queue->poll_wait,
> > > +			    (!list_empty(&ev_queue->event_list) ||
> > > +			     ev_queue->is_destroyed))) {
> > > +			return -ERESTARTSYS;
> > > +		}
> > > +
> > > +		if (list_empty(&ev_queue->event_list) &&
> > > +		    ev_queue->is_destroyed)
> > > +			return -EIO;
> > 
> > All these tests should be under the lock.
> 
> We can't call wait_event_interruptible() above which may sleep under the
> lock, correct ? are you referring to the list_empty() and
> is_destroyed ?

yes

> By the way looking in uverb code [1], similar code which is not done under
> the lock as of here..
> 
> [1] https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/uverbs_main.c#L244

Also not a good idea

> > Why don't we return EIO as soon as is-destroyed happens? What is the
> > point of flushing out the accumulated events?
> 
> It follows the above uverb code/logic that returns existing events even in
> that case, also the async command events in this file follows that logic, I
> suggest to stay consistent.

Don't follow broken uverbs stuff...

> > Maybe the event should be re-added on error? Tricky.
> 
> What will happen if another copy_to_user may then fail again (loop ?) ...
> not sure that we want to get into this tricky handling ...
> 
> As of above, It follows the logic from uverbs at that area.
> https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/uverbs_main.c#L267

again it is wrong...

There is no loop if you just stick the item back on the head of the
list and exit, which is probably the right thing to do..

> > > @@ -2374,6 +2705,17 @@ static int devx_hot_unplug_async_cmd_event_file(struct ib_uobject *uobj,
> > >   static int devx_hot_unplug_async_event_file(struct ib_uobject *uobj,
> > >   					    enum rdma_remove_reason why)
> > >   {
> > > +	struct devx_async_event_file *ev_file =
> > > +		container_of(uobj, struct devx_async_event_file,
> > > +			     uobj);
> > > +	struct devx_async_event_queue *ev_queue = &ev_file->ev_queue;
> > > +
> > > +	spin_lock_irq(&ev_queue->lock);
> > > +	ev_queue->is_destroyed = 1;
> > > +	spin_unlock_irq(&ev_queue->lock);
> > > +
> > > +	if (why == RDMA_REMOVE_DRIVER_REMOVE)
> > > +		wake_up_interruptible(&ev_queue->poll_wait);
> > 
> > Why isn't this wakeup always done?
> 
> Maybe you are right and this can be always done to wake up any readers as
> the 'is_destroyed' was set.
> 
> By the way, any idea why it was done as such in uverbs [1] for similar flow
> ? also the command events follows that.

I don't know, it is probably pointless too.

If we don't need it here then we shouldn't have it.

These random pointless ifs bother me as we have to spend time trying
to figure out that they are pointless down the road.

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 12/12] IB/mlx5: Add DEVX support for CQ events
  2019-06-24 17:03     ` Yishai Hadas
@ 2019-06-24 18:06       ` Jason Gunthorpe
  0 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2019-06-24 18:06 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky,
	RDMA mailing list, Yishai Hadas, Saeed Mahameed, linux-netdev

On Mon, Jun 24, 2019 at 08:03:07PM +0300, Yishai Hadas wrote:
> On 6/24/2019 3:04 PM, Jason Gunthorpe wrote:
> > On Tue, Jun 18, 2019 at 08:15:40PM +0300, Leon Romanovsky wrote:
> > > From: Yishai Hadas <yishaih@mellanox.com>
> > > 
> > > Add DEVX support for CQ events by creating and destroying the CQ via
> > > mlx5_core and set an handler to manage its completions.
> > > 
> > > Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
> > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > >   drivers/infiniband/hw/mlx5/devx.c | 40 +++++++++++++++++++++++++++++++
> > >   1 file changed, 40 insertions(+)
> > > 
> > > diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
> > > index 49fdce95d6d9..91ccd58ebc05 100644
> > > +++ b/drivers/infiniband/hw/mlx5/devx.c
> > > @@ -19,9 +19,12 @@
> > >   #define UVERBS_MODULE_NAME mlx5_ib
> > >   #include <rdma/uverbs_named_ioctl.h>
> > > +static void dispatch_event_fd(struct list_head *fd_list, const void *data);
> > > +
> > >   enum devx_obj_flags {
> > >   	DEVX_OBJ_FLAGS_INDIRECT_MKEY = 1 << 0,
> > >   	DEVX_OBJ_FLAGS_DCT = 1 << 1,
> > > +	DEVX_OBJ_FLAGS_CQ = 1 << 2,
> > >   };
> > >   struct devx_async_data {
> > > @@ -94,6 +97,7 @@ struct devx_async_event_file {
> > >   #define MLX5_MAX_DESTROY_INBOX_SIZE_DW MLX5_ST_SZ_DW(delete_fte_in)
> > >   struct devx_obj {
> > >   	struct mlx5_core_dev	*mdev;
> > > +	struct mlx5_ib_dev	*ib_dev;
> > 
> > This seems strange, why would we need to store the core_dev and the ib_dev
> > in a struct when ibdev->mdev == core_dev?
> > 
> 
> We need to add the ib_dev as we can't access it from the core_dev.
> Most of this patch we can probably go and drop the mdev and access it from
> ib_dev, I preferred to not handle that in this patch.

Should add a patch to revise it then

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 00/12] DEVX asynchronous events
  2019-06-19  4:45   ` Leon Romanovsky
@ 2019-06-24 21:57     ` Saeed Mahameed
  2019-06-30  8:53       ` Leon Romanovsky
  0 siblings, 1 reply; 35+ messages in thread
From: Saeed Mahameed @ 2019-06-24 21:57 UTC (permalink / raw)
  To: leon; +Cc: Jason Gunthorpe, Yishai Hadas, netdev, linux-rdma, dledford

On Wed, 2019-06-19 at 07:45 +0300, Leon Romanovsky wrote:
> On Tue, Jun 18, 2019 at 06:51:45PM +0000, Saeed Mahameed wrote:
> > On Tue, 2019-06-18 at 20:15 +0300, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@mellanox.com>
> > > 
> > > Changelog:
> > >  v0 -> v1:
> > 
> > Normally 1st submission is V1 and 2nd is V2.
> > so this should have been v1->v2.
> 
> "Normally" depends on the language you are using. In C, everything
> starts from 0, including version of patches :).
> 

You are wrong:
quoting: https://kernelnewbies.org/PatchTipsAndTricks

"For example, if you're sending the second revision of a patch, you
should use [PATCH v2]."

now don't tell me that second revision is actually 3rd revision or 1st
is 2nd :).. 

> > For mlx5-next patches:
> > 
> > Acked-by: Saeed Mahameed <saeedm@mellanox.com>
> 
> Thanks

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 11/12] IB/mlx5: Implement DEVX dispatching event
  2019-06-24 18:06       ` Jason Gunthorpe
@ 2019-06-25 14:41         ` Yishai Hadas
  2019-06-25 20:23           ` Jason Gunthorpe
  0 siblings, 1 reply; 35+ messages in thread
From: Yishai Hadas @ 2019-06-25 14:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky,
	RDMA mailing list, Yishai Hadas, Saeed Mahameed, linux-netdev

On 6/24/2019 9:06 PM, Jason Gunthorpe wrote:
> On Mon, Jun 24, 2019 at 07:55:32PM +0300, Yishai Hadas wrote:
> 
>>>> +	/* Explicit filtering to kernel events which may occur frequently */
>>>> +	if (event_type == MLX5_EVENT_TYPE_CMD ||
>>>> +	    event_type == MLX5_EVENT_TYPE_PAGE_REQUEST)
>>>> +		return NOTIFY_OK;
>>>> +
>>>> +	table = container_of(nb, struct mlx5_devx_event_table, devx_nb.nb);
>>>> +	dev = container_of(table, struct mlx5_ib_dev, devx_event_table);
>>>> +	is_unaffiliated = is_unaffiliated_event(dev->mdev, event_type);
>>>> +
>>>> +	if (!is_unaffiliated)
>>>> +		obj_type = get_event_obj_type(event_type, data);
>>>> +	event = xa_load(&table->event_xa, event_type | (obj_type << 16));
>>>> +	if (!event)
>>>> +		return NOTIFY_DONE;
>>>
>>> event should be in the rcu as well
>>
>> Do we really need this ? I didn't see a flow that really requires
>> that.
> 
> I think there are no frees left? Even so it makes much more sense to
> include the event in the rcu as if we ever did need to kfree it would
> have to be via rcu
> 

OK

>>>> +	while (list_empty(&ev_queue->event_list)) {
>>>> +		spin_unlock_irq(&ev_queue->lock);
>>>> +
>>>> +		if (filp->f_flags & O_NONBLOCK)
>>>> +			return -EAGAIN;
>>>> +
>>>> +		if (wait_event_interruptible(ev_queue->poll_wait,
>>>> +			    (!list_empty(&ev_queue->event_list) ||
>>>> +			     ev_queue->is_destroyed))) {
>>>> +			return -ERESTARTSYS;
>>>> +		}
>>>> +
>>>> +		if (list_empty(&ev_queue->event_list) &&
>>>> +		    ev_queue->is_destroyed)
>>>> +			return -EIO;
>>>
>>> All these tests should be under the lock.
>>
>> We can't call wait_event_interruptible() above which may sleep under the
>> lock, correct ? are you referring to the list_empty() and
>> is_destroyed ?
> 
> yes
> 
>> By the way looking in uverb code [1], similar code which is not done under
>> the lock as of here..
>>
>> [1] https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/uverbs_main.c#L244
> 
> Also not a good idea
> 
>>> Why don't we return EIO as soon as is-destroyed happens? What is the
>>> point of flushing out the accumulated events?
>>
>> It follows the above uverb code/logic that returns existing events even in
>> that case, also the async command events in this file follows that logic, I
>> suggest to stay consistent.
> 
> Don't follow broken uverbs stuff...

May it be that there is some event that we still want to deliver post 
unbind/hot-unplug ? for example IB_EVENT_DEVICE_FATAL in uverbs and 
others from the driver code.

Not sure that we want to change this logic.
What do you think ?

> 
>>> Maybe the event should be re-added on error? Tricky.
>>
>> What will happen if another copy_to_user may then fail again (loop ?) ...
>> not sure that we want to get into this tricky handling ...
>>
>> As of above, It follows the logic from uverbs at that area.
>> https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/uverbs_main.c#L267
> 
> again it is wrong...
> 
> There is no loop if you just stick the item back on the head of the
> list and exit, which is probably the right thing to do..
> 

What if copy_to_user() will fail again just later on ? we might end-up 
with loop of read(s) that always find an event as it was put back.
I suggest to leave this flow as it's now, at least for this series 
submission.

Agree ?


>>>> @@ -2374,6 +2705,17 @@ static int devx_hot_unplug_async_cmd_event_file(struct ib_uobject *uobj,
>>>>    static int devx_hot_unplug_async_event_file(struct ib_uobject *uobj,
>>>>    					    enum rdma_remove_reason why)
>>>>    {
>>>> +	struct devx_async_event_file *ev_file =
>>>> +		container_of(uobj, struct devx_async_event_file,
>>>> +			     uobj);
>>>> +	struct devx_async_event_queue *ev_queue = &ev_file->ev_queue;
>>>> +
>>>> +	spin_lock_irq(&ev_queue->lock);
>>>> +	ev_queue->is_destroyed = 1;
>>>> +	spin_unlock_irq(&ev_queue->lock);
>>>> +
>>>> +	if (why == RDMA_REMOVE_DRIVER_REMOVE)
>>>> +		wake_up_interruptible(&ev_queue->poll_wait);
>>>
>>> Why isn't this wakeup always done?
>>
>> Maybe you are right and this can be always done to wake up any readers as
>> the 'is_destroyed' was set.
>>
>> By the way, any idea why it was done as such in uverbs [1] for similar flow
>> ? also the command events follows that.
> 
> I don't know, it is probably pointless too.
> 
> If we don't need it here then we shouldn't have it.
> 
> These random pointless ifs bother me as we have to spend time trying
> to figure out that they are pointless down the road.
> 

OK, will drop this if.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 11/12] IB/mlx5: Implement DEVX dispatching event
  2019-06-25 14:41         ` Yishai Hadas
@ 2019-06-25 20:23           ` Jason Gunthorpe
  0 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2019-06-25 20:23 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky,
	RDMA mailing list, Yishai Hadas, Saeed Mahameed, linux-netdev

On Tue, Jun 25, 2019 at 05:41:49PM +0300, Yishai Hadas wrote:
> > > > Why don't we return EIO as soon as is-destroyed happens? What is the
> > > > point of flushing out the accumulated events?
> > > 
> > > It follows the above uverb code/logic that returns existing events even in
> > > that case, also the async command events in this file follows that logic, I
> > > suggest to stay consistent.
> > 
> > Don't follow broken uverbs stuff...
> 
> May it be that there is some event that we still want to deliver post
> unbind/hot-unplug ? for example IB_EVENT_DEVICE_FATAL in uverbs and others
> from the driver code.

EIO is DEVICE_FATAL.

> Not sure that we want to change this logic.
> What do you think ?

I think this code should exit immediately with EIO if the device is
disassociated.

> > > > Maybe the event should be re-added on error? Tricky.
> > > 
> > > What will happen if another copy_to_user may then fail again (loop ?) ...
> > > not sure that we want to get into this tricky handling ...
> > > 
> > > As of above, It follows the logic from uverbs at that area.
> > > https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/core/uverbs_main.c#L267
> > 
> > again it is wrong...
> > 
> > There is no loop if you just stick the item back on the head of the
> > list and exit, which is probably the right thing to do..
> > 
> 
> What if copy_to_user() will fail again just later on ? we might end-up with
> loop of read(s) that always find an event as it was put back.

That is clearly an application bug and is not the concern of the
kernel..

> I suggest to leave this flow as it's now, at least for this series
> submission.
> 
> Agree ?

I don't think you can actually fix this, so maybe we have to leave
it. But add a comment explaining 

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH mlx5-next v1 02/12] net/mlx5: Use event mask based on device capabilities
  2019-06-18 17:15 ` [PATCH mlx5-next v1 02/12] net/mlx5: Use event mask based on device capabilities Leon Romanovsky
@ 2019-06-27  0:23   ` Saeed Mahameed
  0 siblings, 0 replies; 35+ messages in thread
From: Saeed Mahameed @ 2019-06-27  0:23 UTC (permalink / raw)
  To: Jason Gunthorpe, leon, dledford
  Cc: Yishai Hadas, netdev, Leon Romanovsky, linux-rdma

On Tue, 2019-06-18 at 20:15 +0300, Leon Romanovsky wrote:
> From: Yishai Hadas <yishaih@mellanox.com>
> 
> Use the reported device capabilities for the supported user events
> (i.e.
> affiliated and un-affiliated) to set the EQ mask.
> 
> As the event mask can be up to 256 defined by 4 entries of u64 change
> the applicable code to work accordingly.
> 
> Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
>  drivers/infiniband/hw/mlx5/odp.c             |  3 +-
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c | 45 ++++++++++++++++
> ----
>  drivers/net/ethernet/mellanox/mlx5/core/fw.c |  6 +++
>  include/linux/mlx5/device.h                  |  6 ++-
>  include/linux/mlx5/eq.h                      |  4 +-
>  include/linux/mlx5/mlx5_ifc.h                | 13 ++++--
>  6 files changed, 63 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/odp.c
> b/drivers/infiniband/hw/mlx5/odp.c
> index 600fe23e2eae..a6740ec308ed 100644
> --- a/drivers/infiniband/hw/mlx5/odp.c
> +++ b/drivers/infiniband/hw/mlx5/odp.c
> @@ -1559,10 +1559,11 @@ mlx5_ib_create_pf_eq(struct mlx5_ib_dev *dev,
> struct mlx5_ib_pf_eq *eq)
>  	eq->irq_nb.notifier_call = mlx5_ib_eq_pf_int;
>  	param = (struct mlx5_eq_param) {
>  		.irq_index = 0,
> -		.mask = 1 << MLX5_EVENT_TYPE_PAGE_FAULT,
>  		.nent = MLX5_IB_NUM_PF_EQE,
>  	};
>  	eq->core = mlx5_eq_create_generic(dev->mdev, &param);
> +
> +	param.mask[0] = 1ull << MLX5_EVENT_TYPE_PAGE_FAULT;

As Yishai already pointer out, there is a regression here,
the line above was merged in the wrong order, mask should be setup
before calling mlx5_eq_create_generic.

I will expect V3.

>  	if (IS_ERR(eq->core)) {
>  		err = PTR_ERR(eq->core);
>  		goto err_wq;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index 8000d2a4a7e2..9d07add38940 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -256,6 +256,7 @@ create_map_eq(struct mlx5_core_dev *dev, struct
> mlx5_eq *eq,
>  	int inlen;
>  	u32 *in;
>  	int err;
> +	int i;
>  
>  	/* Init CQ table */
>  	memset(cq_table, 0, sizeof(*cq_table));
> @@ -283,10 +284,12 @@ create_map_eq(struct mlx5_core_dev *dev, struct
> mlx5_eq *eq,
>  	mlx5_fill_page_array(&eq->buf, pas);
>  
>  	MLX5_SET(create_eq_in, in, opcode, MLX5_CMD_OP_CREATE_EQ);
> -	if (!param->mask && MLX5_CAP_GEN(dev, log_max_uctx))
> +	if (!param->mask[0] && MLX5_CAP_GEN(dev, log_max_uctx))
>  		MLX5_SET(create_eq_in, in, uid,
> MLX5_SHARED_RESOURCE_UID);
>  
> -	MLX5_SET64(create_eq_in, in, event_bitmask, param->mask);
> +	for (i = 0; i < 4; i++)
> +		MLX5_ARRAY_SET64(create_eq_in, in, event_bitmask, i,
> +				 param->mask[i]);
>  
>  	eqc = MLX5_ADDR_OF(create_eq_in, in, eq_context_entry);
>  	MLX5_SET(eqc, eqc, log_eq_size, ilog2(eq->nent));
> @@ -507,10 +510,32 @@ static int cq_err_event_notifier(struct
> notifier_block *nb,
>  	return NOTIFY_OK;
>  }
>  
> -static u64 gather_async_events_mask(struct mlx5_core_dev *dev)
> +static void gather_async_events_from_cap(struct mlx5_core_dev *dev,
> +					 u64 mask[4])
> +{
> +	__be64 *user_unaffiliated_events;
> +	__be64 *user_affiliated_events;
> +	int i;
> +
> +	user_affiliated_events =
> +		MLX5_CAP_DEV_EVENT(dev, user_affiliated_events);
> +	user_unaffiliated_events =
> +		MLX5_CAP_DEV_EVENT(dev, user_unaffiliated_events);
> +
> +	for (i = 0; i < 4; i++)
> +		mask[i] = be64_to_cpu(user_affiliated_events[i] |
> +				      user_unaffiliated_events[i]);
> +}
> +
> +static void gather_async_events_mask(struct mlx5_core_dev *dev, u64
> mask[4])
>  {
>  	u64 async_event_mask = MLX5_ASYNC_EVENT_MASK;
>  
> +	if (MLX5_CAP_GEN(dev, event_cap)) {
> +		gather_async_events_from_cap(dev, mask);
> +		return;
> +	}
> +
>  	if (MLX5_VPORT_MANAGER(dev))
>  		async_event_mask |= (1ull <<
> MLX5_EVENT_TYPE_NIC_VPORT_CHANGE);
>  
> @@ -544,7 +569,7 @@ static u64 gather_async_events_mask(struct
> mlx5_core_dev *dev)
>  		async_event_mask |=
>  			(1ull <<
> MLX5_EVENT_TYPE_ESW_FUNCTIONS_CHANGED);
>  
> -	return async_event_mask;
> +	mask[0] = async_event_mask;
>  }
>  
>  static int create_async_eqs(struct mlx5_core_dev *dev)
> @@ -559,9 +584,11 @@ static int create_async_eqs(struct mlx5_core_dev
> *dev)
>  	table->cmd_eq.irq_nb.notifier_call = mlx5_eq_async_int;
>  	param = (struct mlx5_eq_param) {
>  		.irq_index = 0,
> -		.mask = 1ull << MLX5_EVENT_TYPE_CMD,
> +		.mask = {1ull << MLX5_EVENT_TYPE_CMD},
>  		.nent = MLX5_NUM_CMD_EQE,
>  	};
> +
> +	param.mask[0] = 1ull << MLX5_EVENT_TYPE_CMD;

No need to setup the mask twice, 
here and everywhere in this patch, pick one approach to set it.

>  	err = create_async_eq(dev, &table->cmd_eq.core, &param);
>  	if (err) {
>  		mlx5_core_warn(dev, "failed to create cmd EQ %d\n",
> err);
> @@ -577,9 +604,9 @@ static int create_async_eqs(struct mlx5_core_dev
> *dev)
>  	table->async_eq.irq_nb.notifier_call = mlx5_eq_async_int;
>  	param = (struct mlx5_eq_param) {
>  		.irq_index = 0,
> -		.mask = gather_async_events_mask(dev),
>  		.nent = MLX5_NUM_ASYNC_EQE,
>  	};
> +	gather_async_events_mask(dev, param.mask);
>  	err = create_async_eq(dev, &table->async_eq.core, &param);
>  	if (err) {
>  		mlx5_core_warn(dev, "failed to create async EQ %d\n",
> err);
> @@ -595,9 +622,11 @@ static int create_async_eqs(struct mlx5_core_dev
> *dev)
>  	table->pages_eq.irq_nb.notifier_call = mlx5_eq_async_int;
>  	param = (struct mlx5_eq_param) {
>  		.irq_index = 0,
> -		.mask =  1 << MLX5_EVENT_TYPE_PAGE_REQUEST,
> +		.mask = {1 << MLX5_EVENT_TYPE_PAGE_REQUEST},
>  		.nent = /* TODO: sriov max_vf + */ 1,
>  	};
> +
> +	param.mask[0] = 1ull << MLX5_EVENT_TYPE_PAGE_REQUEST;
>  	err = create_async_eq(dev, &table->pages_eq.core, &param);
>  	if (err) {
>  		mlx5_core_warn(dev, "failed to create pages EQ %d\n",
> err);
> @@ -789,7 +818,7 @@ static int create_comp_eqs(struct mlx5_core_dev
> *dev)
>  		eq->irq_nb.notifier_call = mlx5_eq_comp_int;
>  		param = (struct mlx5_eq_param) {
>  			.irq_index = vecidx,
> -			.mask = 0,
> +			.mask = {0},
>  			.nent = nent,
>  		};
>  		err = create_map_eq(dev, &eq->core, &param);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
> b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
> index 1ab6f7e3bec6..05367f15c3a7 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
> @@ -202,6 +202,12 @@ int mlx5_query_hca_caps(struct mlx5_core_dev
> *dev)
>  			return err;
>  	}
>  
> +	if (MLX5_CAP_GEN(dev, event_cap)) {
> +		err = mlx5_core_get_caps(dev, MLX5_CAP_DEV_EVENT);
> +		if (err)
> +			return err;
> +	}
> +
>  	return 0;
>  }
>  
> diff --git a/include/linux/mlx5/device.h
> b/include/linux/mlx5/device.h
> index 5e760067ac41..0d1abe097627 100644
> --- a/include/linux/mlx5/device.h
> +++ b/include/linux/mlx5/device.h
> @@ -351,7 +351,7 @@ enum mlx5_event {
>  
>  	MLX5_EVENT_TYPE_DEVICE_TRACER      = 0x26,
>  
> -	MLX5_EVENT_TYPE_MAX                =
> MLX5_EVENT_TYPE_DEVICE_TRACER + 1,
> +	MLX5_EVENT_TYPE_MAX                = 0x100,
>  };
>  
>  enum {
> @@ -1077,6 +1077,7 @@ enum mlx5_cap_type {
>  	MLX5_CAP_DEBUG,
>  	MLX5_CAP_RESERVED_14,
>  	MLX5_CAP_DEV_MEM,
> +	MLX5_CAP_DEV_EVENT = 0x14,
>  	/* NUM OF CAP Types */
>  	MLX5_CAP_NUM
>  };
> @@ -1255,6 +1256,9 @@ enum mlx5_qcam_feature_groups {
>  #define MLX5_CAP64_DEV_MEM(mdev, cap)\
>  	MLX5_GET64(device_mem_cap, mdev-
> >caps.hca_cur[MLX5_CAP_DEV_MEM], cap)
>  
> +#define MLX5_CAP_DEV_EVENT(mdev, cap)\
> +	MLX5_ADDR_OF(device_event_cap, (mdev)-
> >caps.hca_cur[MLX5_CAP_DEV_EVENT], cap)
> +
>  enum {
>  	MLX5_CMD_STAT_OK			= 0x0,
>  	MLX5_CMD_STAT_INT_ERR			= 0x1,
> diff --git a/include/linux/mlx5/eq.h b/include/linux/mlx5/eq.h
> index 70e16dcfb4c4..202df2e5fe8c 100644
> --- a/include/linux/mlx5/eq.h
> +++ b/include/linux/mlx5/eq.h
> @@ -15,7 +15,9 @@ struct mlx5_core_dev;
>  struct mlx5_eq_param {
>  	u8             irq_index;
>  	int            nent;
> -	u64            mask;
> +	u64            mask[4];
> +	void          *context;
> +	irq_handler_t  handler;
>  };
>  
>  struct mlx5_eq *
> diff --git a/include/linux/mlx5/mlx5_ifc.h
> b/include/linux/mlx5/mlx5_ifc.h
> index 16348528fef6..3ef716c054c2 100644
> --- a/include/linux/mlx5/mlx5_ifc.h
> +++ b/include/linux/mlx5/mlx5_ifc.h
> @@ -823,6 +823,12 @@ struct mlx5_ifc_device_mem_cap_bits {
>  	u8         reserved_at_180[0x680];
>  };
>  
> +struct mlx5_ifc_device_event_cap_bits {
> +	u8         user_affiliated_events[4][0x40];
> +
> +	u8         user_unaffiliated_events[4][0x40];
> +};
> +
>  enum {
>  	MLX5_ATOMIC_CAPS_ATOMIC_SIZE_QP_1_BYTE     = 0x0,
>  	MLX5_ATOMIC_CAPS_ATOMIC_SIZE_QP_2_BYTES    = 0x2,
> @@ -980,7 +986,8 @@ struct mlx5_ifc_cmd_hca_cap_bits {
>  
>  	u8         log_max_srq_sz[0x8];
>  	u8         log_max_qp_sz[0x8];
> -	u8         reserved_at_90[0x8];
> +	u8         event_cap[0x1];
> +	u8         reserved_at_91[0x7];
>  	u8         prio_tag_required[0x1];
>  	u8         reserved_at_99[0x2];
>  	u8         log_max_qp[0x5];
> @@ -7364,9 +7371,9 @@ struct mlx5_ifc_create_eq_in_bits {
>  
>  	u8         reserved_at_280[0x40];
>  
> -	u8         event_bitmask[0x40];
> +	u8         event_bitmask[4][0x40];
>  
> -	u8         reserved_at_300[0x580];
> +	u8         reserved_at_3c0[0x4c0];
>  
>  	u8         pas[0][0x40];
>  };

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH rdma-next v1 00/12] DEVX asynchronous events
  2019-06-24 21:57     ` Saeed Mahameed
@ 2019-06-30  8:53       ` Leon Romanovsky
  0 siblings, 0 replies; 35+ messages in thread
From: Leon Romanovsky @ 2019-06-30  8:53 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Jason Gunthorpe, Yishai Hadas, netdev, linux-rdma, dledford

On Mon, Jun 24, 2019 at 09:57:05PM +0000, Saeed Mahameed wrote:
> On Wed, 2019-06-19 at 07:45 +0300, Leon Romanovsky wrote:
> > On Tue, Jun 18, 2019 at 06:51:45PM +0000, Saeed Mahameed wrote:
> > > On Tue, 2019-06-18 at 20:15 +0300, Leon Romanovsky wrote:
> > > > From: Leon Romanovsky <leonro@mellanox.com>
> > > >
> > > > Changelog:
> > > >  v0 -> v1:
> > >
> > > Normally 1st submission is V1 and 2nd is V2.
> > > so this should have been v1->v2.
> >
> > "Normally" depends on the language you are using. In C, everything
> > starts from 0, including version of patches :).
> >
>
> You are wrong:
> quoting: https://kernelnewbies.org/PatchTipsAndTricks
>
> "For example, if you're sending the second revision of a patch, you
> should use [PATCH v2]."
>
> now don't tell me that second revision is actually 3rd revision or 1st
> is 2nd :)..

:)

If you don't mind, I will stick to common sense (v0, v1, v2 ...)
and official kernel documentation, which mentions existence of v1.

https://elixir.bootlin.com/linux/latest/source/Documentation/process/submitting-patches.rst#L682

>
> > > For mlx5-next patches:
> > >
> > > Acked-by: Saeed Mahameed <saeedm@mellanox.com>
> >
> > Thanks

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, back to index

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-18 17:15 [PATCH rdma-next v1 00/12] DEVX asynchronous events Leon Romanovsky
2019-06-18 17:15 ` [PATCH mlx5-next v1 01/12] net/mlx5: Fix mlx5_core_destroy_cq() error flow Leon Romanovsky
2019-06-18 17:15 ` [PATCH mlx5-next v1 02/12] net/mlx5: Use event mask based on device capabilities Leon Romanovsky
2019-06-27  0:23   ` Saeed Mahameed
2019-06-18 17:15 ` [PATCH mlx5-next v1 03/12] net/mlx5: Expose the API to register for ANY event Leon Romanovsky
2019-06-18 17:15 ` [PATCH mlx5-next v1 04/12] net/mlx5: mlx5_core_create_cq() enhancements Leon Romanovsky
2019-06-18 17:15 ` [PATCH mlx5-next v1 05/12] net/mlx5: Report a CQ error event only when a handler was set Leon Romanovsky
2019-06-18 17:15 ` [PATCH mlx5-next v1 06/12] net/mlx5: Report EQE data upon CQ completion Leon Romanovsky
2019-06-18 17:15 ` [PATCH mlx5-next v1 07/12] net/mlx5: Expose device definitions for object events Leon Romanovsky
2019-06-18 17:15 ` [PATCH rdma-next v1 08/12] IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD Leon Romanovsky
2019-06-24 11:51   ` Jason Gunthorpe
2019-06-24 13:25     ` Yishai Hadas
2019-06-24 14:30       ` Jason Gunthorpe
2019-06-18 17:15 ` [PATCH rdma-next v1 09/12] IB/mlx5: Register DEVX with mlx5_core to get async events Leon Romanovsky
2019-06-24 11:52   ` Jason Gunthorpe
2019-06-24 13:36     ` Yishai Hadas
2019-06-24 14:30       ` Jason Gunthorpe
2019-06-18 17:15 ` [PATCH rdma-next v1 10/12] IB/mlx5: Enable subscription for device events over DEVX Leon Romanovsky
2019-06-24 11:57   ` Jason Gunthorpe
2019-06-24 16:13     ` Yishai Hadas
2019-06-24 17:56       ` Jason Gunthorpe
2019-06-18 17:15 ` [PATCH rdma-next v1 11/12] IB/mlx5: Implement DEVX dispatching event Leon Romanovsky
2019-06-24 12:03   ` Jason Gunthorpe
2019-06-24 16:55     ` Yishai Hadas
2019-06-24 18:06       ` Jason Gunthorpe
2019-06-25 14:41         ` Yishai Hadas
2019-06-25 20:23           ` Jason Gunthorpe
2019-06-18 17:15 ` [PATCH rdma-next v1 12/12] IB/mlx5: Add DEVX support for CQ events Leon Romanovsky
2019-06-24 12:04   ` Jason Gunthorpe
2019-06-24 17:03     ` Yishai Hadas
2019-06-24 18:06       ` Jason Gunthorpe
2019-06-18 18:51 ` [PATCH rdma-next v1 00/12] DEVX asynchronous events Saeed Mahameed
2019-06-19  4:45   ` Leon Romanovsky
2019-06-24 21:57     ` Saeed Mahameed
2019-06-30  8:53       ` Leon Romanovsky

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org netdev@archiver.kernel.org
	public-inbox-index netdev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox