* [pull request][for-next 0/7] Mellanox, mlx5 shared code updates 2018-02-21
@ 2018-02-21 20:13 Saeed Mahameed
  2018-02-21 20:13 ` [for-next 1/7] net/mlx5: CQ Database per EQ Saeed Mahameed
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Saeed Mahameed @ 2018-02-21 20:13 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: Leon Romanovsky, Jason Gunthorpe, netdev, linux-rdma, Saeed Mahameed

Hi Dave & Doug,

This series includes shared code updates for the mlx5 core driver, used
by both the netdev and rdma subsystems.  It should be pulled into both
trees so that netdev- and rdma-specific submissions can continue
separately.

For more information, please see the tag log below.

P.S. We expect two more shared code pull requests.

The series doesn't conflict with the latest mlx5 net fixes series.

Please pull and let me know if there's any issue,

Thanks,
Saeed.

---

The following changes since commit 7928b2cbe55b2a410a0f5c1f154610059c57b1b2:

  Linux 4.16-rc1 (2018-02-11 15:04:29 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git tags/mlx5-updates-2018-02-21

for you to fetch changes up to 388ca8be00370db132464e27f745b8a0add19fcb:

  IB/mlx5: Implement fragmented completion queue (CQ) (2018-02-15 00:30:03 -0800)

----------------------------------------------------------------
mlx5-updates-2018-02-21

This series includes shared code updates for the mlx5 core driver, used
by both the netdev and rdma subsystems.

By Saeed,
The first six patches of the series address a performance issue and
should provide a performance boost for multi-core, interrupt-hungry
workloads.  The issue is fixed in the first patch; all other patches
refactor the code in light of this fix.

The problem being fixed is a spinlock, shared across all HCA IRQs,
that protects the CQ database.  To solve this we simply move the CQ
database and its spinlock to be per EQ (IRQ), and thus per core.

By Yonatan,
Fragmented completion queue (CQ) for RDMA: a core driver implementation
that creates fragmented CQ buffers rather than one large contiguous
memory buffer.  The scheme already exists and is used by the netdev
CQs; the patch shares that code with the rdma CQ creation flow and
makes use of the new API in the mlx5_ib driver.

Thanks,
Saeed.

----------------------------------------------------------------
Saeed Mahameed (6):
      net/mlx5: CQ Database per EQ
      net/mlx5: Add missing likely/unlikely hints to cq events
      net/mlx5: EQ add/del CQ API
      net/mlx5: CQ hold/put API
      net/mlx5: Move CQ completion and event forwarding logic to eq.c
      net/mlx5: Remove redundant EQ API exports

Yonatan Cohen (1):
      IB/mlx5: Implement fragmented completion queue (CQ)

 drivers/infiniband/hw/mlx5/cq.c                    |  64 +++++++-----
 drivers/infiniband/hw/mlx5/mlx5_ib.h               |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/alloc.c    |  37 ++++---
 drivers/net/ethernet/mellanox/mlx5/core/cq.c       | 116 +++++----------------
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |  11 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c       |  92 +++++++++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |   8 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |  21 ++++
 drivers/net/ethernet/mellanox/mlx5/core/wq.c       |  18 ++--
 drivers/net/ethernet/mellanox/mlx5/core/wq.h       |  22 ++--
 include/linux/mlx5/cq.h                            |  14 ++-
 include/linux/mlx5/driver.h                        |  88 ++++++++--------
 12 files changed, 279 insertions(+), 218 deletions(-)

* [for-next 1/7] net/mlx5: CQ Database per EQ
  2018-02-21 20:13 [pull request][for-next 0/7] Mellanox, mlx5 shared code updates 2018-02-21 Saeed Mahameed
@ 2018-02-21 20:13 ` Saeed Mahameed
  2018-02-21 20:13 ` [for-next 2/7] net/mlx5: Add missing likely/unlikely hints to cq events Saeed Mahameed
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Saeed Mahameed @ 2018-02-21 20:13 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: Leon Romanovsky, Jason Gunthorpe, netdev, linux-rdma, Saeed Mahameed

Before this patch the driver had one CQ database protected by a single
spinlock, which synchronizes CQ add/remove operations with CQ IRQ
interrupt handling.

On a system with a large number of CPUs and a workload that generates
lots of interrupts, this global spinlock becomes a nasty hotspot and
introduces contention between the active cores, significantly hurting
performance and becoming a bottleneck that prevents seamless CPU
scaling.

To solve this we simply move the CQ database and its spinlock to be per
EQ (IRQ), thus per core.
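
In effect this is a data-structure move; a simplified sketch of the
resulting layout (field names as in the driver.h hunk below):

	struct mlx5_cq_table {
		/* protect radix tree */
		spinlock_t		lock;
		struct radix_tree_root	tree;	/* cqn -> struct mlx5_core_cq */
	};

	struct mlx5_eq {
		struct mlx5_core_dev	*dev;
		struct mlx5_cq_table	cq_table;	/* was global in mlx5_priv */
		/* ... doorbell, cons_index, buf, ... */
	};

Each EQ (and thus each core) now takes only its own lock.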

Tested with:
system: 2 sockets, 14 cores per socket, hyperthreading, 2x14x2=56 cores
netperf command: ./super_netperf 200 -P 0 -t TCP_RR  -H <server> -l 30 -- -r 300,300 -o -s 1M,1M -S 1M,1M

WITHOUT THIS PATCH:
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft %steal  %guest  %gnice   %idle
Average:     all    4.32    0.00   36.15    0.09    0.00   34.02   0.00    0.00    0.00   25.41

Samples: 2M of event 'cycles:pp', Event count (approx.): 1554616897271
Overhead  Command          Shared Object                 Symbol
+   14.28%  swapper          [kernel.vmlinux]              [k] intel_idle
+   12.25%  swapper          [kernel.vmlinux]              [k] queued_spin_lock_slowpath
+   10.29%  netserver        [kernel.vmlinux]              [k] queued_spin_lock_slowpath
+    1.32%  netserver        [kernel.vmlinux]              [k] mlx5e_xmit

WITH THIS PATCH:
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all    4.27    0.00   34.31    0.01    0.00   18.71    0.00    0.00    0.00   42.69

Samples: 2M of event 'cycles:pp', Event count (approx.): 1498132937483
Overhead  Command          Shared Object             Symbol
+   23.33%  swapper          [kernel.vmlinux]          [k] intel_idle
+    1.69%  netserver        [kernel.vmlinux]          [k] mlx5e_xmit

Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cq.c   | 69 +++++++++++++++-----------
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   | 10 +++-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  8 +--
 include/linux/mlx5/cq.h                        |  3 +-
 include/linux/mlx5/driver.h                    | 22 ++++----
 5 files changed, 62 insertions(+), 50 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index 1016e05c7ec7..dfbeeaa43276 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -86,10 +86,10 @@ static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq)
 	spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
 }
 
-void mlx5_cq_completion(struct mlx5_core_dev *dev, u32 cqn)
+void mlx5_cq_completion(struct mlx5_eq *eq, u32 cqn)
 {
+	struct mlx5_cq_table *table = &eq->cq_table;
 	struct mlx5_core_cq *cq;
-	struct mlx5_cq_table *table = &dev->priv.cq_table;
 
 	spin_lock(&table->lock);
 	cq = radix_tree_lookup(&table->tree, cqn);
@@ -98,7 +98,7 @@ void mlx5_cq_completion(struct mlx5_core_dev *dev, u32 cqn)
 	spin_unlock(&table->lock);
 
 	if (!cq) {
-		mlx5_core_warn(dev, "Completion event for bogus CQ 0x%x\n", cqn);
+		mlx5_core_warn(eq->dev, "Completion event for bogus CQ 0x%x\n", cqn);
 		return;
 	}
 
@@ -110,9 +110,9 @@ void mlx5_cq_completion(struct mlx5_core_dev *dev, u32 cqn)
 		complete(&cq->free);
 }
 
-void mlx5_cq_event(struct mlx5_core_dev *dev, u32 cqn, int event_type)
+void mlx5_cq_event(struct mlx5_eq *eq, u32 cqn, int event_type)
 {
-	struct mlx5_cq_table *table = &dev->priv.cq_table;
+	struct mlx5_cq_table *table = &eq->cq_table;
 	struct mlx5_core_cq *cq;
 
 	spin_lock(&table->lock);
@@ -124,7 +124,7 @@ void mlx5_cq_event(struct mlx5_core_dev *dev, u32 cqn, int event_type)
 	spin_unlock(&table->lock);
 
 	if (!cq) {
-		mlx5_core_warn(dev, "Async event for bogus CQ 0x%x\n", cqn);
+		mlx5_core_warn(eq->dev, "Async event for bogus CQ 0x%x\n", cqn);
 		return;
 	}
 
@@ -137,19 +137,22 @@ void mlx5_cq_event(struct mlx5_core_dev *dev, u32 cqn, int event_type)
 int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 			u32 *in, int inlen)
 {
-	struct mlx5_cq_table *table = &dev->priv.cq_table;
 	u32 out[MLX5_ST_SZ_DW(create_cq_out)];
 	u32 din[MLX5_ST_SZ_DW(destroy_cq_in)];
 	u32 dout[MLX5_ST_SZ_DW(destroy_cq_out)];
 	int eqn = MLX5_GET(cqc, MLX5_ADDR_OF(create_cq_in, in, cq_context),
 			   c_eqn);
-	struct mlx5_eq *eq;
+	struct mlx5_eq *eq, *async_eq;
+	struct mlx5_cq_table *table;
 	int err;
 
+	async_eq = &dev->priv.eq_table.async_eq;
 	eq = mlx5_eqn2eq(dev, eqn);
 	if (IS_ERR(eq))
 		return PTR_ERR(eq);
 
+	table = &eq->cq_table;
+
 	memset(out, 0, sizeof(out));
 	MLX5_SET(create_cq_in, in, opcode, MLX5_CMD_OP_CREATE_CQ);
 	err = mlx5_cmd_exec(dev, in, inlen, out, sizeof(out));
@@ -159,6 +162,7 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 	cq->cqn = MLX5_GET(create_cq_out, out, cqn);
 	cq->cons_index = 0;
 	cq->arm_sn     = 0;
+	cq->eq         = eq;
 	refcount_set(&cq->refcount, 1);
 	init_completion(&cq->free);
 	if (!cq->comp)
@@ -167,12 +171,20 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 	cq->tasklet_ctx.priv = &eq->tasklet_ctx;
 	INIT_LIST_HEAD(&cq->tasklet_ctx.list);
 
+	/* Add to comp EQ CQ tree to recv comp events */
 	spin_lock_irq(&table->lock);
 	err = radix_tree_insert(&table->tree, cq->cqn, cq);
 	spin_unlock_irq(&table->lock);
 	if (err)
 		goto err_cmd;
 
+	/* Add to async EQ CQ tree to recv Async events */
+	spin_lock_irq(&async_eq->cq_table.lock);
+	err = radix_tree_insert(&async_eq->cq_table.tree, cq->cqn, cq);
+	spin_unlock_irq(&async_eq->cq_table.lock);
+	if (err)
+		goto err_cq_table;
+
 	cq->pid = current->pid;
 	err = mlx5_debug_cq_add(dev, cq);
 	if (err)
@@ -183,6 +195,10 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 
 	return 0;
 
+err_cq_table:
+	spin_lock_irq(&table->lock);
+	radix_tree_delete(&table->tree, cq->cqn);
+	spin_unlock_irq(&table->lock);
 err_cmd:
 	memset(din, 0, sizeof(din));
 	memset(dout, 0, sizeof(dout));
@@ -195,21 +211,34 @@ EXPORT_SYMBOL(mlx5_core_create_cq);
 
 int mlx5_core_destroy_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq)
 {
-	struct mlx5_cq_table *table = &dev->priv.cq_table;
+	struct mlx5_cq_table *async_eq_cq_table = &dev->priv.eq_table.async_eq.cq_table;
+	struct mlx5_cq_table *table = &cq->eq->cq_table;
 	u32 out[MLX5_ST_SZ_DW(destroy_cq_out)] = {0};
 	u32 in[MLX5_ST_SZ_DW(destroy_cq_in)] = {0};
 	struct mlx5_core_cq *tmp;
 	int err;
 
+	spin_lock_irq(&async_eq_cq_table->lock);
+	tmp = radix_tree_delete(&async_eq_cq_table->tree, cq->cqn);
+	spin_unlock_irq(&async_eq_cq_table->lock);
+	if (!tmp) {
+		mlx5_core_warn(dev, "cq 0x%x not found in async eq cq tree\n", cq->cqn);
+		return -EINVAL;
+	}
+	if (tmp != cq) {
+		mlx5_core_warn(dev, "corruption on cqn 0x%x in async eq cq tree\n", cq->cqn);
+		return -EINVAL;
+	}
+
 	spin_lock_irq(&table->lock);
 	tmp = radix_tree_delete(&table->tree, cq->cqn);
 	spin_unlock_irq(&table->lock);
 	if (!tmp) {
-		mlx5_core_warn(dev, "cq 0x%x not found in tree\n", cq->cqn);
+		mlx5_core_warn(dev, "cq 0x%x not found in comp eq cq tree\n", cq->cqn);
 		return -EINVAL;
 	}
 	if (tmp != cq) {
-		mlx5_core_warn(dev, "corruption on srqn 0x%x\n", cq->cqn);
+		mlx5_core_warn(dev, "corruption on cqn 0x%x in comp eq cq tree\n", cq->cqn);
 		return -EINVAL;
 	}
 
@@ -270,21 +299,3 @@ int mlx5_core_modify_cq_moderation(struct mlx5_core_dev *dev,
 	return mlx5_core_modify_cq(dev, cq, in, sizeof(in));
 }
 EXPORT_SYMBOL(mlx5_core_modify_cq_moderation);
-
-int mlx5_init_cq_table(struct mlx5_core_dev *dev)
-{
-	struct mlx5_cq_table *table = &dev->priv.cq_table;
-	int err;
-
-	memset(table, 0, sizeof(*table));
-	spin_lock_init(&table->lock);
-	INIT_RADIX_TREE(&table->tree, GFP_ATOMIC);
-	err = mlx5_cq_debugfs_init(dev);
-
-	return err;
-}
-
-void mlx5_cleanup_cq_table(struct mlx5_core_dev *dev)
-{
-	mlx5_cq_debugfs_cleanup(dev);
-}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 25106e996a96..328403ebf2f5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -415,7 +415,7 @@ static irqreturn_t mlx5_eq_int(int irq, void *eq_ptr)
 		switch (eqe->type) {
 		case MLX5_EVENT_TYPE_COMP:
 			cqn = be32_to_cpu(eqe->data.comp.cqn) & 0xffffff;
-			mlx5_cq_completion(dev, cqn);
+			mlx5_cq_completion(eq, cqn);
 			break;
 		case MLX5_EVENT_TYPE_DCT_DRAINED:
 			rsn = be32_to_cpu(eqe->data.dct.dctn) & 0xffffff;
@@ -472,7 +472,7 @@ static irqreturn_t mlx5_eq_int(int irq, void *eq_ptr)
 			cqn = be32_to_cpu(eqe->data.cq_err.cqn) & 0xffffff;
 			mlx5_core_warn(dev, "CQ error on CQN 0x%x, syndrome 0x%x\n",
 				       cqn, eqe->data.cq_err.syndrome);
-			mlx5_cq_event(dev, cqn, eqe->type);
+			mlx5_cq_event(eq, cqn, eqe->type);
 			break;
 
 		case MLX5_EVENT_TYPE_PAGE_REQUEST:
@@ -567,6 +567,7 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
 		       int nent, u64 mask, const char *name,
 		       enum mlx5_eq_type type)
 {
+	struct mlx5_cq_table *cq_table = &eq->cq_table;
 	u32 out[MLX5_ST_SZ_DW(create_eq_out)] = {0};
 	struct mlx5_priv *priv = &dev->priv;
 	irq_handler_t handler;
@@ -576,6 +577,11 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
 	u32 *in;
 	int err;
 
+	/* Init CQ table */
+	memset(cq_table, 0, sizeof(*cq_table));
+	spin_lock_init(&cq_table->lock);
+	INIT_RADIX_TREE(&cq_table->tree, GFP_ATOMIC);
+
 	eq->type = type;
 	eq->nent = roundup_pow_of_two(nent + MLX5_NUM_SPARE_EQE);
 	eq->cons_index = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 2ef641c91c26..8cc22bf80c87 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -942,9 +942,9 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 		goto out;
 	}
 
-	err = mlx5_init_cq_table(dev);
+	err = mlx5_cq_debugfs_init(dev);
 	if (err) {
-		dev_err(&pdev->dev, "failed to initialize cq table\n");
+		dev_err(&pdev->dev, "failed to initialize cq debugfs\n");
 		goto err_eq_cleanup;
 	}
 
@@ -1002,7 +1002,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 	mlx5_cleanup_mkey_table(dev);
 	mlx5_cleanup_srq_table(dev);
 	mlx5_cleanup_qp_table(dev);
-	mlx5_cleanup_cq_table(dev);
+	mlx5_cq_debugfs_cleanup(dev);
 
 err_eq_cleanup:
 	mlx5_eq_cleanup(dev);
@@ -1023,7 +1023,7 @@ static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
 	mlx5_cleanup_mkey_table(dev);
 	mlx5_cleanup_srq_table(dev);
 	mlx5_cleanup_qp_table(dev);
-	mlx5_cleanup_cq_table(dev);
+	mlx5_cq_debugfs_cleanup(dev);
 	mlx5_eq_cleanup(dev);
 }
 
diff --git a/include/linux/mlx5/cq.h b/include/linux/mlx5/cq.h
index 48c181a2acc9..06ba425a6ad7 100644
--- a/include/linux/mlx5/cq.h
+++ b/include/linux/mlx5/cq.h
@@ -60,6 +60,7 @@ struct mlx5_core_cq {
 	} tasklet_ctx;
 	int			reset_notify_added;
 	struct list_head	reset_notify;
+	struct mlx5_eq		*eq;
 };
 
 
@@ -171,8 +172,6 @@ static inline void mlx5_cq_arm(struct mlx5_core_cq *cq, u32 cmd,
 	mlx5_write64(doorbell, uar_page + MLX5_CQ_DOORBELL, NULL);
 }
 
-int mlx5_init_cq_table(struct mlx5_core_dev *dev);
-void mlx5_cleanup_cq_table(struct mlx5_core_dev *dev);
 int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 			u32 *in, int inlen);
 int mlx5_core_destroy_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 6ed79a8a8318..96e003db2bcd 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -375,8 +375,15 @@ struct mlx5_eq_pagefault {
 	mempool_t		*pool;
 };
 
+struct mlx5_cq_table {
+	/* protect radix tree */
+	spinlock_t		lock;
+	struct radix_tree_root	tree;
+};
+
 struct mlx5_eq {
 	struct mlx5_core_dev   *dev;
+	struct mlx5_cq_table	cq_table;
 	__be32 __iomem	       *doorbell;
 	u32			cons_index;
 	struct mlx5_buf		buf;
@@ -526,13 +533,6 @@ struct mlx5_core_health {
 	struct delayed_work		recover_work;
 };
 
-struct mlx5_cq_table {
-	/* protect radix tree
-	 */
-	spinlock_t		lock;
-	struct radix_tree_root	tree;
-};
-
 struct mlx5_qp_table {
 	/* protect radix tree
 	 */
@@ -654,10 +654,6 @@ struct mlx5_priv {
 	struct dentry	       *cmdif_debugfs;
 	/* end: qp staff */
 
-	/* start: cq staff */
-	struct mlx5_cq_table	cq_table;
-	/* end: cq staff */
-
 	/* start: mkey staff */
 	struct mlx5_mkey_table	mkey_table;
 	/* end: mkey staff */
@@ -1053,12 +1049,12 @@ int mlx5_eq_init(struct mlx5_core_dev *dev);
 void mlx5_eq_cleanup(struct mlx5_core_dev *dev);
 void mlx5_fill_page_array(struct mlx5_buf *buf, __be64 *pas);
 void mlx5_fill_page_frag_array(struct mlx5_frag_buf *frag_buf, __be64 *pas);
-void mlx5_cq_completion(struct mlx5_core_dev *dev, u32 cqn);
+void mlx5_cq_completion(struct mlx5_eq *eq, u32 cqn);
 void mlx5_rsc_event(struct mlx5_core_dev *dev, u32 rsn, int event_type);
 void mlx5_srq_event(struct mlx5_core_dev *dev, u32 srqn, int event_type);
 struct mlx5_core_srq *mlx5_core_get_srq(struct mlx5_core_dev *dev, u32 srqn);
 void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool forced);
-void mlx5_cq_event(struct mlx5_core_dev *dev, u32 cqn, int event_type);
+void mlx5_cq_event(struct mlx5_eq *eq, u32 cqn, int event_type);
 int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
 		       int nent, u64 mask, const char *name,
 		       enum mlx5_eq_type type);
-- 
2.14.3

* [for-next 2/7] net/mlx5: Add missing likely/unlikely hints to cq events
  2018-02-21 20:13 [pull request][for-next 0/7] Mellanox, mlx5 shared code updates 2018-02-21 Saeed Mahameed
  2018-02-21 20:13 ` [for-next 1/7] net/mlx5: CQ Database per EQ Saeed Mahameed
@ 2018-02-21 20:13 ` Saeed Mahameed
  2018-02-21 20:13 ` [for-next 3/7] net/mlx5: EQ add/del CQ API Saeed Mahameed
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Saeed Mahameed @ 2018-02-21 20:13 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: Leon Romanovsky, Jason Gunthorpe, netdev, linux-rdma, Saeed Mahameed

If a hardware event is targeting a CQ, that CQ should exist; annotate
the successful lookup as likely and the error handling flows as
unlikely.
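
The resulting pattern in the lookup path (a consolidated sketch of the
hunks below):

	cq = radix_tree_lookup(&table->tree, cqn);
	if (likely(cq))			/* hot path: the CQ exists */
		refcount_inc(&cq->refcount);
	spin_unlock(&table->lock);

	if (unlikely(!cq)) {		/* cold path: bogus CQN from HW */
		mlx5_core_warn(eq->dev, "Async event for bogus CQ 0x%x\n", cqn);
		return;
	}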

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index dfbeeaa43276..9feeb555e937 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -97,7 +97,7 @@ void mlx5_cq_completion(struct mlx5_eq *eq, u32 cqn)
 		refcount_inc(&cq->refcount);
 	spin_unlock(&table->lock);
 
-	if (!cq) {
+	if (unlikely(!cq)) {
 		mlx5_core_warn(eq->dev, "Completion event for bogus CQ 0x%x\n", cqn);
 		return;
 	}
@@ -118,12 +118,12 @@ void mlx5_cq_event(struct mlx5_eq *eq, u32 cqn, int event_type)
 	spin_lock(&table->lock);
 
 	cq = radix_tree_lookup(&table->tree, cqn);
-	if (cq)
+	if (likely(cq))
 		refcount_inc(&cq->refcount);
 
 	spin_unlock(&table->lock);
 
-	if (!cq) {
+	if (unlikely(!cq)) {
 		mlx5_core_warn(eq->dev, "Async event for bogus CQ 0x%x\n", cqn);
 		return;
 	}
-- 
2.14.3

* [for-next 3/7] net/mlx5: EQ add/del CQ API
  2018-02-21 20:13 [pull request][for-next 0/7] Mellanox, mlx5 shared code updates 2018-02-21 Saeed Mahameed
  2018-02-21 20:13 ` [for-next 1/7] net/mlx5: CQ Database per EQ Saeed Mahameed
  2018-02-21 20:13 ` [for-next 2/7] net/mlx5: Add missing likely/unlikely hints to cq events Saeed Mahameed
@ 2018-02-21 20:13 ` Saeed Mahameed
  2018-02-21 20:13 ` [for-next 4/7] net/mlx5: CQ hold/put API Saeed Mahameed
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Saeed Mahameed @ 2018-02-21 20:13 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: Leon Romanovsky, Jason Gunthorpe, netdev, linux-rdma, Saeed Mahameed

Add an API to add/delete a CQ to/from an EQ's CQ table, to be used in
cq.c upon CQ creation/destruction, now that the CQ table is private to
eq.c.
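
The CQ create/destroy flows in cq.c then reduce to the following
(sketch, mirroring the hunks below):

	/* on create: register with the comp EQ and the async EQ */
	err = mlx5_eq_add_cq(eq, cq);
	if (err)
		goto err_cmd;
	err = mlx5_eq_add_cq(&dev->priv.eq_table.async_eq, cq);
	if (err)
		goto err_cq_add;

	/* on destroy: unregister from both */
	err = mlx5_eq_del_cq(&dev->priv.eq_table.async_eq, cq);
	...
	err = mlx5_eq_del_cq(cq->eq, cq);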

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cq.c       | 60 ++++++----------------
 drivers/net/ethernet/mellanox/mlx5/core/eq.c       | 34 ++++++++++++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |  4 ++
 3 files changed, 53 insertions(+), 45 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index 9feeb555e937..f6e478d05ecc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -137,22 +137,17 @@ void mlx5_cq_event(struct mlx5_eq *eq, u32 cqn, int event_type)
 int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 			u32 *in, int inlen)
 {
+	int eqn = MLX5_GET(cqc, MLX5_ADDR_OF(create_cq_in, in, cq_context), c_eqn);
+	u32 dout[MLX5_ST_SZ_DW(destroy_cq_out)];
 	u32 out[MLX5_ST_SZ_DW(create_cq_out)];
 	u32 din[MLX5_ST_SZ_DW(destroy_cq_in)];
-	u32 dout[MLX5_ST_SZ_DW(destroy_cq_out)];
-	int eqn = MLX5_GET(cqc, MLX5_ADDR_OF(create_cq_in, in, cq_context),
-			   c_eqn);
-	struct mlx5_eq *eq, *async_eq;
-	struct mlx5_cq_table *table;
+	struct mlx5_eq *eq;
 	int err;
 
-	async_eq = &dev->priv.eq_table.async_eq;
 	eq = mlx5_eqn2eq(dev, eqn);
 	if (IS_ERR(eq))
 		return PTR_ERR(eq);
 
-	table = &eq->cq_table;
-
 	memset(out, 0, sizeof(out));
 	MLX5_SET(create_cq_in, in, opcode, MLX5_CMD_OP_CREATE_CQ);
 	err = mlx5_cmd_exec(dev, in, inlen, out, sizeof(out));
@@ -172,18 +167,14 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 	INIT_LIST_HEAD(&cq->tasklet_ctx.list);
 
 	/* Add to comp EQ CQ tree to recv comp events */
-	spin_lock_irq(&table->lock);
-	err = radix_tree_insert(&table->tree, cq->cqn, cq);
-	spin_unlock_irq(&table->lock);
+	err = mlx5_eq_add_cq(eq, cq);
 	if (err)
 		goto err_cmd;
 
-	/* Add to async EQ CQ tree to recv Async events */
-	spin_lock_irq(&async_eq->cq_table.lock);
-	err = radix_tree_insert(&async_eq->cq_table.tree, cq->cqn, cq);
-	spin_unlock_irq(&async_eq->cq_table.lock);
+	/* Add to async EQ CQ tree to recv async events */
+	err = mlx5_eq_add_cq(&dev->priv.eq_table.async_eq, cq);
 	if (err)
-		goto err_cq_table;
+		goto err_cq_add;
 
 	cq->pid = current->pid;
 	err = mlx5_debug_cq_add(dev, cq);
@@ -195,10 +186,8 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 
 	return 0;
 
-err_cq_table:
-	spin_lock_irq(&table->lock);
-	radix_tree_delete(&table->tree, cq->cqn);
-	spin_unlock_irq(&table->lock);
+err_cq_add:
+	mlx5_eq_del_cq(eq, cq);
 err_cmd:
 	memset(din, 0, sizeof(din));
 	memset(dout, 0, sizeof(dout));
@@ -211,36 +200,17 @@ EXPORT_SYMBOL(mlx5_core_create_cq);
 
 int mlx5_core_destroy_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq)
 {
-	struct mlx5_cq_table *async_eq_cq_table = &dev->priv.eq_table.async_eq.cq_table;
-	struct mlx5_cq_table *table = &cq->eq->cq_table;
 	u32 out[MLX5_ST_SZ_DW(destroy_cq_out)] = {0};
 	u32 in[MLX5_ST_SZ_DW(destroy_cq_in)] = {0};
-	struct mlx5_core_cq *tmp;
 	int err;
 
-	spin_lock_irq(&async_eq_cq_table->lock);
-	tmp = radix_tree_delete(&async_eq_cq_table->tree, cq->cqn);
-	spin_unlock_irq(&async_eq_cq_table->lock);
-	if (!tmp) {
-		mlx5_core_warn(dev, "cq 0x%x not found in async eq cq tree\n", cq->cqn);
-		return -EINVAL;
-	}
-	if (tmp != cq) {
-		mlx5_core_warn(dev, "corruption on cqn 0x%x in async eq cq tree\n", cq->cqn);
-		return -EINVAL;
-	}
+	err = mlx5_eq_del_cq(&dev->priv.eq_table.async_eq, cq);
+	if (err)
+		return err;
 
-	spin_lock_irq(&table->lock);
-	tmp = radix_tree_delete(&table->tree, cq->cqn);
-	spin_unlock_irq(&table->lock);
-	if (!tmp) {
-		mlx5_core_warn(dev, "cq 0x%x not found in comp eq cq tree\n", cq->cqn);
-		return -EINVAL;
-	}
-	if (tmp != cq) {
-		mlx5_core_warn(dev, "corruption on cqn 0x%x in comp eq cq tree\n", cq->cqn);
-		return -EINVAL;
-	}
+	err = mlx5_eq_del_cq(cq->eq, cq);
+	if (err)
+		return err;
 
 	MLX5_SET(destroy_cq_in, in, opcode, MLX5_CMD_OP_DESTROY_CQ);
 	MLX5_SET(destroy_cq_in, in, cqn, cq->cqn);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 328403ebf2f5..c1f0468e95bd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -704,6 +704,40 @@ int mlx5_destroy_unmap_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
 }
 EXPORT_SYMBOL_GPL(mlx5_destroy_unmap_eq);
 
+int mlx5_eq_add_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq)
+{
+	struct mlx5_cq_table *table = &eq->cq_table;
+	int err;
+
+	spin_lock_irq(&table->lock);
+	err = radix_tree_insert(&table->tree, cq->cqn, cq);
+	spin_unlock_irq(&table->lock);
+
+	return err;
+}
+
+int mlx5_eq_del_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq)
+{
+	struct mlx5_cq_table *table = &eq->cq_table;
+	struct mlx5_core_cq *tmp;
+
+	spin_lock_irq(&table->lock);
+	tmp = radix_tree_delete(&table->tree, cq->cqn);
+	spin_unlock_irq(&table->lock);
+
+	if (!tmp) {
+		mlx5_core_warn(eq->dev, "cq 0x%x not found in eq 0x%x tree\n", cq->cqn, eq->eqn);
+		return -ENOENT;
+	}
+
+	if (tmp != cq) {
+		mlx5_core_warn(eq->dev, "corruption on cqn 0x%x in eq 0x%x\n", cq->cqn, eq->eqn);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 int mlx5_eq_init(struct mlx5_core_dev *dev)
 {
 	int err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 394552f36fcf..54a1cbfb1b5a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -38,6 +38,7 @@
 #include <linux/sched.h>
 #include <linux/if_link.h>
 #include <linux/firmware.h>
+#include <linux/mlx5/cq.h>
 
 #define DRIVER_NAME "mlx5_core"
 #define DRIVER_VERSION "5.0-0"
@@ -115,6 +116,9 @@ int mlx5_destroy_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
 					u32 element_id);
 int mlx5_wait_for_vf_pages(struct mlx5_core_dev *dev);
 u64 mlx5_read_internal_timer(struct mlx5_core_dev *dev);
+
+int mlx5_eq_add_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq);
+int mlx5_eq_del_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq);
 struct mlx5_eq *mlx5_eqn2eq(struct mlx5_core_dev *dev, int eqn);
 u32 mlx5_eq_poll_irq_disabled(struct mlx5_eq *eq);
 void mlx5_cq_tasklet_cb(unsigned long data);
-- 
2.14.3

* [for-next 4/7] net/mlx5: CQ hold/put API
  2018-02-21 20:13 [pull request][for-next 0/7] Mellanox, mlx5 shared code updates 2018-02-21 Saeed Mahameed
                   ` (2 preceding siblings ...)
  2018-02-21 20:13 ` [for-next 3/7] net/mlx5: EQ add/del CQ API Saeed Mahameed
@ 2018-02-21 20:13 ` Saeed Mahameed
  2018-02-21 20:13 ` [for-next 5/7] net/mlx5: Move CQ completion and event forwarding logic to eq.c Saeed Mahameed
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Saeed Mahameed @ 2018-02-21 20:13 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: Leon Romanovsky, Jason Gunthorpe, netdev, linux-rdma, Saeed Mahameed

Now that the CQ table is per EQ, add an API to hold/put a CQ, to be
used from eq.c in a downstream patch.
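
The intended usage pattern (a sketch based on the hunks below): any
lookup that hands out a CQ takes a reference, and the consumer drops it
when done, so destroy can wait for in-flight users:

	cq = radix_tree_lookup(&table->tree, cqn);
	if (likely(cq))
		mlx5_cq_hold(cq);	/* pin the CQ while handling the event */
	...
	cq->comp(cq);
	mlx5_cq_put(cq);		/* final put completes cq->free */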

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cq.c | 42 +++++++++++++---------------
 include/linux/mlx5/cq.h                      | 11 ++++++++
 2 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index f6e478d05ecc..06dc7bd302ed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -58,8 +58,7 @@ void mlx5_cq_tasklet_cb(unsigned long data)
 				 tasklet_ctx.list) {
 		list_del_init(&mcq->tasklet_ctx.list);
 		mcq->tasklet_ctx.comp(mcq);
-		if (refcount_dec_and_test(&mcq->refcount))
-			complete(&mcq->free);
+		mlx5_cq_put(mcq);
 		if (time_after(jiffies, end))
 			break;
 	}
@@ -80,23 +79,31 @@ static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq)
 	 * still arrive.
 	 */
 	if (list_empty_careful(&cq->tasklet_ctx.list)) {
-		refcount_inc(&cq->refcount);
+		mlx5_cq_hold(cq);
 		list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
 	}
 	spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
 }
 
-void mlx5_cq_completion(struct mlx5_eq *eq, u32 cqn)
+/* caller must eventually call mlx5_cq_put on the returned cq */
+static struct mlx5_core_cq *mlx5_eq_cq_get(struct mlx5_eq *eq, u32 cqn)
 {
 	struct mlx5_cq_table *table = &eq->cq_table;
-	struct mlx5_core_cq *cq;
+	struct mlx5_core_cq *cq = NULL;
 
 	spin_lock(&table->lock);
 	cq = radix_tree_lookup(&table->tree, cqn);
 	if (likely(cq))
-		refcount_inc(&cq->refcount);
+		mlx5_cq_hold(cq);
 	spin_unlock(&table->lock);
 
+	return cq;
+}
+
+void mlx5_cq_completion(struct mlx5_eq *eq, u32 cqn)
+{
+	struct mlx5_core_cq *cq = mlx5_eq_cq_get(eq, cqn);
+
 	if (unlikely(!cq)) {
 		mlx5_core_warn(eq->dev, "Completion event for bogus CQ 0x%x\n", cqn);
 		return;
@@ -106,22 +113,12 @@ void mlx5_cq_completion(struct mlx5_eq *eq, u32 cqn)
 
 	cq->comp(cq);
 
-	if (refcount_dec_and_test(&cq->refcount))
-		complete(&cq->free);
+	mlx5_cq_put(cq);
 }
 
 void mlx5_cq_event(struct mlx5_eq *eq, u32 cqn, int event_type)
 {
-	struct mlx5_cq_table *table = &eq->cq_table;
-	struct mlx5_core_cq *cq;
-
-	spin_lock(&table->lock);
-
-	cq = radix_tree_lookup(&table->tree, cqn);
-	if (likely(cq))
-		refcount_inc(&cq->refcount);
-
-	spin_unlock(&table->lock);
+	struct mlx5_core_cq *cq = mlx5_eq_cq_get(eq, cqn);
 
 	if (unlikely(!cq)) {
 		mlx5_core_warn(eq->dev, "Async event for bogus CQ 0x%x\n", cqn);
@@ -130,8 +127,7 @@ void mlx5_cq_event(struct mlx5_eq *eq, u32 cqn, int event_type)
 
 	cq->event(cq, event_type);
 
-	if (refcount_dec_and_test(&cq->refcount))
-		complete(&cq->free);
+	mlx5_cq_put(cq);
 }
 
 int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
@@ -158,7 +154,8 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 	cq->cons_index = 0;
 	cq->arm_sn     = 0;
 	cq->eq         = eq;
-	refcount_set(&cq->refcount, 1);
+	refcount_set(&cq->refcount, 0);
+	mlx5_cq_hold(cq);
 	init_completion(&cq->free);
 	if (!cq->comp)
 		cq->comp = mlx5_add_cq_to_tasklet;
@@ -221,8 +218,7 @@ int mlx5_core_destroy_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq)
 	synchronize_irq(cq->irqn);
 
 	mlx5_debug_cq_remove(dev, cq);
-	if (refcount_dec_and_test(&cq->refcount))
-		complete(&cq->free);
+	mlx5_cq_put(cq);
 	wait_for_completion(&cq->free);
 
 	return 0;
diff --git a/include/linux/mlx5/cq.h b/include/linux/mlx5/cq.h
index 06ba425a6ad7..445ad194e0fe 100644
--- a/include/linux/mlx5/cq.h
+++ b/include/linux/mlx5/cq.h
@@ -172,6 +172,17 @@ static inline void mlx5_cq_arm(struct mlx5_core_cq *cq, u32 cmd,
 	mlx5_write64(doorbell, uar_page + MLX5_CQ_DOORBELL, NULL);
 }
 
+static inline void mlx5_cq_hold(struct mlx5_core_cq *cq)
+{
+	refcount_inc(&cq->refcount);
+}
+
+static inline void mlx5_cq_put(struct mlx5_core_cq *cq)
+{
+	if (refcount_dec_and_test(&cq->refcount))
+		complete(&cq->free);
+}
+
 int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 			u32 *in, int inlen);
 int mlx5_core_destroy_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq);
-- 
2.14.3

* [for-next 5/7] net/mlx5: Move CQ completion and event forwarding logic to eq.c
  2018-02-21 20:13 [pull request][for-next 0/7] Mellanox, mlx5 shared code updates 2018-02-21 Saeed Mahameed
                   ` (3 preceding siblings ...)
  2018-02-21 20:13 ` [for-next 4/7] net/mlx5: CQ hold/put API Saeed Mahameed
@ 2018-02-21 20:13 ` Saeed Mahameed
  2018-02-21 20:13 ` [for-next 6/7] net/mlx5: Remove redundant EQ API exports Saeed Mahameed
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Saeed Mahameed @ 2018-02-21 20:13 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: Leon Romanovsky, Jason Gunthorpe, netdev, linux-rdma, Saeed Mahameed

Since the CQ tree is now per EQ, CQ completion and event forwarding
have become EQ-specific logic; this patch moves that logic to eq.c and
makes those functions static.
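
The resulting dispatch path lives entirely in eq.c (sketch):

	mlx5_eq_int()				/* EQ interrupt handler */
	  -> mlx5_eq_cq_completion(eq, cqn)	/* MLX5_EVENT_TYPE_COMP */
		-> mlx5_eq_cq_get(), cq->comp(cq), mlx5_cq_put(cq)
	  -> mlx5_eq_cq_event(eq, cqn, type)	/* CQ error events */
		-> mlx5_eq_cq_get(), cq->event(cq, type), mlx5_cq_put(cq)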

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cq.c | 45 -------------------------
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 49 ++++++++++++++++++++++++++--
 include/linux/mlx5/driver.h                  |  2 --
 3 files changed, 47 insertions(+), 49 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index 06dc7bd302ed..669ed16938b3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -85,51 +85,6 @@ static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq)
 	spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
 }
 
-/* caller must eventually call mlx5_cq_put on the returned cq */
-static struct mlx5_core_cq *mlx5_eq_cq_get(struct mlx5_eq *eq, u32 cqn)
-{
-	struct mlx5_cq_table *table = &eq->cq_table;
-	struct mlx5_core_cq *cq = NULL;
-
-	spin_lock(&table->lock);
-	cq = radix_tree_lookup(&table->tree, cqn);
-	if (likely(cq))
-		mlx5_cq_hold(cq);
-	spin_unlock(&table->lock);
-
-	return cq;
-}
-
-void mlx5_cq_completion(struct mlx5_eq *eq, u32 cqn)
-{
-	struct mlx5_core_cq *cq = mlx5_eq_cq_get(eq, cqn);
-
-	if (unlikely(!cq)) {
-		mlx5_core_warn(eq->dev, "Completion event for bogus CQ 0x%x\n", cqn);
-		return;
-	}
-
-	++cq->arm_sn;
-
-	cq->comp(cq);
-
-	mlx5_cq_put(cq);
-}
-
-void mlx5_cq_event(struct mlx5_eq *eq, u32 cqn, int event_type)
-{
-	struct mlx5_core_cq *cq = mlx5_eq_cq_get(eq, cqn);
-
-	if (unlikely(!cq)) {
-		mlx5_core_warn(eq->dev, "Async event for bogus CQ 0x%x\n", cqn);
-		return;
-	}
-
-	cq->event(cq, event_type);
-
-	mlx5_cq_put(cq);
-}
-
 int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 			u32 *in, int inlen)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index c1f0468e95bd..7e442b38a8ca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -393,6 +393,51 @@ static void general_event_handler(struct mlx5_core_dev *dev,
 	}
 }
 
+/* caller must eventually call mlx5_cq_put on the returned cq */
+static struct mlx5_core_cq *mlx5_eq_cq_get(struct mlx5_eq *eq, u32 cqn)
+{
+	struct mlx5_cq_table *table = &eq->cq_table;
+	struct mlx5_core_cq *cq = NULL;
+
+	spin_lock(&table->lock);
+	cq = radix_tree_lookup(&table->tree, cqn);
+	if (likely(cq))
+		mlx5_cq_hold(cq);
+	spin_unlock(&table->lock);
+
+	return cq;
+}
+
+static void mlx5_eq_cq_completion(struct mlx5_eq *eq, u32 cqn)
+{
+	struct mlx5_core_cq *cq = mlx5_eq_cq_get(eq, cqn);
+
+	if (unlikely(!cq)) {
+		mlx5_core_warn(eq->dev, "Completion event for bogus CQ 0x%x\n", cqn);
+		return;
+	}
+
+	++cq->arm_sn;
+
+	cq->comp(cq);
+
+	mlx5_cq_put(cq);
+}
+
+static void mlx5_eq_cq_event(struct mlx5_eq *eq, u32 cqn, int event_type)
+{
+	struct mlx5_core_cq *cq = mlx5_eq_cq_get(eq, cqn);
+
+	if (unlikely(!cq)) {
+		mlx5_core_warn(eq->dev, "Async event for bogus CQ 0x%x\n", cqn);
+		return;
+	}
+
+	cq->event(cq, event_type);
+
+	mlx5_cq_put(cq);
+}
+
 static irqreturn_t mlx5_eq_int(int irq, void *eq_ptr)
 {
 	struct mlx5_eq *eq = eq_ptr;
@@ -415,7 +460,7 @@ static irqreturn_t mlx5_eq_int(int irq, void *eq_ptr)
 		switch (eqe->type) {
 		case MLX5_EVENT_TYPE_COMP:
 			cqn = be32_to_cpu(eqe->data.comp.cqn) & 0xffffff;
-			mlx5_cq_completion(eq, cqn);
+			mlx5_eq_cq_completion(eq, cqn);
 			break;
 		case MLX5_EVENT_TYPE_DCT_DRAINED:
 			rsn = be32_to_cpu(eqe->data.dct.dctn) & 0xffffff;
@@ -472,7 +517,7 @@ static irqreturn_t mlx5_eq_int(int irq, void *eq_ptr)
 			cqn = be32_to_cpu(eqe->data.cq_err.cqn) & 0xffffff;
 			mlx5_core_warn(dev, "CQ error on CQN 0x%x, syndrome 0x%x\n",
 				       cqn, eqe->data.cq_err.syndrome);
-			mlx5_cq_event(eq, cqn, eqe->type);
+			mlx5_eq_cq_event(eq, cqn, eqe->type);
 			break;
 
 		case MLX5_EVENT_TYPE_PAGE_REQUEST:
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 96e003db2bcd..09e2f3e8753c 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1049,12 +1049,10 @@ int mlx5_eq_init(struct mlx5_core_dev *dev);
 void mlx5_eq_cleanup(struct mlx5_core_dev *dev);
 void mlx5_fill_page_array(struct mlx5_buf *buf, __be64 *pas);
 void mlx5_fill_page_frag_array(struct mlx5_frag_buf *frag_buf, __be64 *pas);
-void mlx5_cq_completion(struct mlx5_eq *eq, u32 cqn);
 void mlx5_rsc_event(struct mlx5_core_dev *dev, u32 rsn, int event_type);
 void mlx5_srq_event(struct mlx5_core_dev *dev, u32 srqn, int event_type);
 struct mlx5_core_srq *mlx5_core_get_srq(struct mlx5_core_dev *dev, u32 srqn);
 void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool forced);
-void mlx5_cq_event(struct mlx5_eq *eq, u32 cqn, int event_type);
 int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
 		       int nent, u64 mask, const char *name,
 		       enum mlx5_eq_type type);
-- 
2.14.3

* [for-next 6/7] net/mlx5: Remove redundant EQ API exports
  2018-02-21 20:13 [pull request][for-next 0/7] Mellanox, mlx5 shared code updates 2018-02-21 Saeed Mahameed
                   ` (4 preceding siblings ...)
  2018-02-21 20:13 ` [for-next 5/7] net/mlx5: Move CQ completion and event forwarding logic to eq.c Saeed Mahameed
@ 2018-02-21 20:13 ` Saeed Mahameed
  2018-02-21 20:13 ` [for-next 7/7] IB/mlx5: Implement fragmented completion queue (CQ) Saeed Mahameed
  2018-02-22 19:39 ` [pull request][for-next 0/7] Mellanox, mlx5 shared code updates 2018-02-21 David Miller
  7 siblings, 0 replies; 15+ messages in thread
From: Saeed Mahameed @ 2018-02-21 20:13 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: Leon Romanovsky, Jason Gunthorpe, netdev, linux-rdma, Saeed Mahameed

The EQ structure and API are private to the mlx5_core driver; external
drivers should not have access to, or the means to manipulate, EQ
objects.

Remove the redundant exports and move the API functions out of the
linux/mlx5 include directory into the driver's private mlx5_core.h
include file.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c        |  3 ---
 drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h | 17 +++++++++++++++++
 include/linux/mlx5/driver.h                         | 17 -----------------
 3 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 7e442b38a8ca..c1c94974e16b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -720,7 +720,6 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
 	mlx5_buf_free(dev, &eq->buf);
 	return err;
 }
-EXPORT_SYMBOL_GPL(mlx5_create_map_eq);
 
 int mlx5_destroy_unmap_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
 {
@@ -747,7 +746,6 @@ int mlx5_destroy_unmap_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
 
 	return err;
 }
-EXPORT_SYMBOL_GPL(mlx5_destroy_unmap_eq);
 
 int mlx5_eq_add_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq)
 {
@@ -925,4 +923,3 @@ int mlx5_core_eq_query(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
 	MLX5_SET(query_eq_in, in, eq_number, eq->eqn);
 	return mlx5_cmd_exec(dev, in, sizeof(in), out, outlen);
 }
-EXPORT_SYMBOL_GPL(mlx5_core_eq_query);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 54a1cbfb1b5a..23e17ac0cba5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -117,11 +117,28 @@ int mlx5_destroy_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
 int mlx5_wait_for_vf_pages(struct mlx5_core_dev *dev);
 u64 mlx5_read_internal_timer(struct mlx5_core_dev *dev);
 
+int mlx5_eq_init(struct mlx5_core_dev *dev);
+void mlx5_eq_cleanup(struct mlx5_core_dev *dev);
+int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
+		       int nent, u64 mask, const char *name,
+		       enum mlx5_eq_type type);
+int mlx5_destroy_unmap_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq);
 int mlx5_eq_add_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq);
 int mlx5_eq_del_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq);
+int mlx5_core_eq_query(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
+		       u32 *out, int outlen);
+int mlx5_start_eqs(struct mlx5_core_dev *dev);
+void mlx5_stop_eqs(struct mlx5_core_dev *dev);
 struct mlx5_eq *mlx5_eqn2eq(struct mlx5_core_dev *dev, int eqn);
 u32 mlx5_eq_poll_irq_disabled(struct mlx5_eq *eq);
 void mlx5_cq_tasklet_cb(unsigned long data);
+void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool forced);
+int mlx5_debug_eq_add(struct mlx5_core_dev *dev, struct mlx5_eq *eq);
+void mlx5_debug_eq_remove(struct mlx5_core_dev *dev, struct mlx5_eq *eq);
+int mlx5_eq_debugfs_init(struct mlx5_core_dev *dev);
+void mlx5_eq_debugfs_cleanup(struct mlx5_core_dev *dev);
+int mlx5_cq_debugfs_init(struct mlx5_core_dev *dev);
+void mlx5_cq_debugfs_cleanup(struct mlx5_core_dev *dev);
 
 int mlx5_query_pcam_reg(struct mlx5_core_dev *dev, u32 *pcam, u8 feature_group,
 			u8 access_reg_group);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 09e2f3e8753c..2860a253275b 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1045,20 +1045,11 @@ int mlx5_satisfy_startup_pages(struct mlx5_core_dev *dev, int boot);
 int mlx5_reclaim_startup_pages(struct mlx5_core_dev *dev);
 void mlx5_register_debugfs(void);
 void mlx5_unregister_debugfs(void);
-int mlx5_eq_init(struct mlx5_core_dev *dev);
-void mlx5_eq_cleanup(struct mlx5_core_dev *dev);
 void mlx5_fill_page_array(struct mlx5_buf *buf, __be64 *pas);
 void mlx5_fill_page_frag_array(struct mlx5_frag_buf *frag_buf, __be64 *pas);
 void mlx5_rsc_event(struct mlx5_core_dev *dev, u32 rsn, int event_type);
 void mlx5_srq_event(struct mlx5_core_dev *dev, u32 srqn, int event_type);
 struct mlx5_core_srq *mlx5_core_get_srq(struct mlx5_core_dev *dev, u32 srqn);
-void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool forced);
-int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
-		       int nent, u64 mask, const char *name,
-		       enum mlx5_eq_type type);
-int mlx5_destroy_unmap_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq);
-int mlx5_start_eqs(struct mlx5_core_dev *dev);
-void mlx5_stop_eqs(struct mlx5_core_dev *dev);
 int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn,
 		    unsigned int *irqn);
 int mlx5_core_attach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, u32 qpn);
@@ -1070,14 +1061,6 @@ int mlx5_core_access_reg(struct mlx5_core_dev *dev, void *data_in,
 			 int size_in, void *data_out, int size_out,
 			 u16 reg_num, int arg, int write);
 
-int mlx5_debug_eq_add(struct mlx5_core_dev *dev, struct mlx5_eq *eq);
-void mlx5_debug_eq_remove(struct mlx5_core_dev *dev, struct mlx5_eq *eq);
-int mlx5_core_eq_query(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
-		       u32 *out, int outlen);
-int mlx5_eq_debugfs_init(struct mlx5_core_dev *dev);
-void mlx5_eq_debugfs_cleanup(struct mlx5_core_dev *dev);
-int mlx5_cq_debugfs_init(struct mlx5_core_dev *dev);
-void mlx5_cq_debugfs_cleanup(struct mlx5_core_dev *dev);
 int mlx5_db_alloc(struct mlx5_core_dev *dev, struct mlx5_db *db);
 int mlx5_db_alloc_node(struct mlx5_core_dev *dev, struct mlx5_db *db,
 		       int node);
-- 
2.14.3

* [for-next 7/7] IB/mlx5: Implement fragmented completion queue (CQ)
  2018-02-21 20:13 [pull request][for-next 0/7] Mellanox, mlx5 shared code updates 2018-02-21 Saeed Mahameed
                   ` (5 preceding siblings ...)
  2018-02-21 20:13 ` [for-next 6/7] net/mlx5: Remove redundant EQ API exports Saeed Mahameed
@ 2018-02-21 20:13 ` Saeed Mahameed
  2018-02-22 17:34   ` Jason Gunthorpe
  2018-02-23  0:04   ` Santosh Shilimkar
  2018-02-22 19:39 ` [pull request][for-next 0/7] Mellanox, mlx5 shared code updates 2018-02-21 David Miller
  7 siblings, 2 replies; 15+ messages in thread
From: Saeed Mahameed @ 2018-02-21 20:13 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: Leon Romanovsky, Jason Gunthorpe, netdev, linux-rdma,
	Yonatan Cohen, Leon Romanovsky, Saeed Mahameed

From: Yonatan Cohen <yonatanc@mellanox.com>

The current implementation of CQ creation requires contiguous memory;
such a requirement is problematic once memory is fragmented or the
system is low on memory, and it causes failures in
dma_zalloc_coherent().

This patch implements a new fragmented CQ scheme to overcome this
issue by introducing a new type, 'struct mlx5_frag_buf_ctrl', for
allocating fragmented buffers rather than contiguous ones, and bases
the Completion Queues (CQs) on this new fragmented buffer.

It fixes the following crashes:
kworker/29:0: page allocation failure: order:6, mode:0x80d0
CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<>] dump_stack+0x19/0x1b
[<>] warn_alloc_failed+0x110/0x180
[<>] __alloc_pages_slowpath+0x6b7/0x725
[<>] __alloc_pages_nodemask+0x405/0x420
[<>] dma_generic_alloc_coherent+0x8f/0x140
[<>] x86_swiotlb_alloc_coherent+0x21/0x50
[<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
[<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
[<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
[<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
[<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
[<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
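
For orientation, here is a sketch of how an entry index is resolved on
a fragmented buffer.  The helper below is illustrative only: the field
names match the mlx5_frag_buf_ctrl fields used in the wq.c hunk, but
the body of the actual mlx5_frag_buf_get_wqe() helper is not quoted in
this patch:

	/* illustrative only -- resolve entry 'ix' in a fragmented buffer */
	static void *frag_buf_get_entry(struct mlx5_frag_buf_ctrl *fbc, u32 ix)
	{
		/* which PAGE_SIZE fragment holds entry 'ix' */
		unsigned int frag = ix >> fbc->log_frag_strides;

		/* base of that fragment plus the offset within it */
		return fbc->frag_buf.frags[frag].buf +
		       ((ix & fbc->frag_sz_m1) << fbc->log_stride);
	}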

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/infiniband/hw/mlx5/cq.c                 | 64 +++++++++++++++----------
 drivers/infiniband/hw/mlx5/mlx5_ib.h            |  6 +--
 drivers/net/ethernet/mellanox/mlx5/core/alloc.c | 37 +++++++++-----
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 11 +++--
 drivers/net/ethernet/mellanox/mlx5/core/wq.c    | 18 +++----
 drivers/net/ethernet/mellanox/mlx5/core/wq.h    | 22 +++------
 include/linux/mlx5/driver.h                     | 51 ++++++++++++++------
 7 files changed, 124 insertions(+), 85 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 5b974fb97611..c4c7b82f4ac1 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -64,14 +64,9 @@ static void mlx5_ib_cq_event(struct mlx5_core_cq *mcq, enum mlx5_event type)
 	}
 }
 
-static void *get_cqe_from_buf(struct mlx5_ib_cq_buf *buf, int n, int size)
-{
-	return mlx5_buf_offset(&buf->buf, n * size);
-}
-
 static void *get_cqe(struct mlx5_ib_cq *cq, int n)
 {
-	return get_cqe_from_buf(&cq->buf, n, cq->mcq.cqe_sz);
+	return mlx5_frag_buf_get_wqe(&cq->buf.fbc, n);
 }
 
 static u8 sw_ownership_bit(int n, int nent)
@@ -403,7 +398,7 @@ static void handle_atomics(struct mlx5_ib_qp *qp, struct mlx5_cqe64 *cqe64,
 
 static void free_cq_buf(struct mlx5_ib_dev *dev, struct mlx5_ib_cq_buf *buf)
 {
-	mlx5_buf_free(dev->mdev, &buf->buf);
+	mlx5_frag_buf_free(dev->mdev, &buf->fbc.frag_buf);
 }
 
 static void get_sig_err_item(struct mlx5_sig_err_cqe *cqe,
@@ -724,12 +719,25 @@ int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags)
 	return ret;
 }
 
-static int alloc_cq_buf(struct mlx5_ib_dev *dev, struct mlx5_ib_cq_buf *buf,
-			int nent, int cqe_size)
+static int alloc_cq_frag_buf(struct mlx5_ib_dev *dev,
+			     struct mlx5_ib_cq_buf *buf,
+			     int nent,
+			     int cqe_size)
 {
+	struct mlx5_frag_buf_ctrl *c = &buf->fbc;
+	struct mlx5_frag_buf *frag_buf = &c->frag_buf;
+	u32 cqc_buff[MLX5_ST_SZ_DW(cqc)] = {0};
 	int err;
 
-	err = mlx5_buf_alloc(dev->mdev, nent * cqe_size, &buf->buf);
+	MLX5_SET(cqc, cqc_buff, log_cq_size, ilog2(cqe_size));
+	MLX5_SET(cqc, cqc_buff, cqe_sz, (cqe_size == 128) ? 1 : 0);
+
+	mlx5_core_init_cq_frag_buf(&buf->fbc, cqc_buff);
+
+	err = mlx5_frag_buf_alloc_node(dev->mdev,
+				       nent * cqe_size,
+				       frag_buf,
+				       dev->mdev->priv.numa_node);
 	if (err)
 		return err;
 
@@ -862,14 +870,15 @@ static void destroy_cq_user(struct mlx5_ib_cq *cq, struct ib_ucontext *context)
 	ib_umem_release(cq->buf.umem);
 }
 
-static void init_cq_buf(struct mlx5_ib_cq *cq, struct mlx5_ib_cq_buf *buf)
+static void init_cq_frag_buf(struct mlx5_ib_cq *cq,
+			     struct mlx5_ib_cq_buf *buf)
 {
 	int i;
 	void *cqe;
 	struct mlx5_cqe64 *cqe64;
 
 	for (i = 0; i < buf->nent; i++) {
-		cqe = get_cqe_from_buf(buf, i, buf->cqe_size);
+		cqe = get_cqe(cq, i);
 		cqe64 = buf->cqe_size == 64 ? cqe : cqe + 64;
 		cqe64->op_own = MLX5_CQE_INVALID << 4;
 	}
@@ -891,14 +900,15 @@ static int create_cq_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
 	cq->mcq.arm_db     = cq->db.db + 1;
 	cq->mcq.cqe_sz = cqe_size;
 
-	err = alloc_cq_buf(dev, &cq->buf, entries, cqe_size);
+	err = alloc_cq_frag_buf(dev, &cq->buf, entries, cqe_size);
 	if (err)
 		goto err_db;
 
-	init_cq_buf(cq, &cq->buf);
+	init_cq_frag_buf(cq, &cq->buf);
 
 	*inlen = MLX5_ST_SZ_BYTES(create_cq_in) +
-		 MLX5_FLD_SZ_BYTES(create_cq_in, pas[0]) * cq->buf.buf.npages;
+		 MLX5_FLD_SZ_BYTES(create_cq_in, pas[0]) *
+		 cq->buf.fbc.frag_buf.npages;
 	*cqb = kvzalloc(*inlen, GFP_KERNEL);
 	if (!*cqb) {
 		err = -ENOMEM;
@@ -906,11 +916,12 @@ static int create_cq_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
 	}
 
 	pas = (__be64 *)MLX5_ADDR_OF(create_cq_in, *cqb, pas);
-	mlx5_fill_page_array(&cq->buf.buf, pas);
+	mlx5_fill_page_frag_array(&cq->buf.fbc.frag_buf, pas);
 
 	cqc = MLX5_ADDR_OF(create_cq_in, *cqb, cq_context);
 	MLX5_SET(cqc, cqc, log_page_size,
-		 cq->buf.buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
+		 cq->buf.fbc.frag_buf.page_shift -
+		 MLX5_ADAPTER_PAGE_SHIFT);
 
 	*index = dev->mdev->priv.uar->index;
 
@@ -1207,11 +1218,11 @@ static int resize_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
 	if (!cq->resize_buf)
 		return -ENOMEM;
 
-	err = alloc_cq_buf(dev, cq->resize_buf, entries, cqe_size);
+	err = alloc_cq_frag_buf(dev, cq->resize_buf, entries, cqe_size);
 	if (err)
 		goto ex;
 
-	init_cq_buf(cq, cq->resize_buf);
+	init_cq_frag_buf(cq, cq->resize_buf);
 
 	return 0;
 
@@ -1256,9 +1267,8 @@ static int copy_resize_cqes(struct mlx5_ib_cq *cq)
 	}
 
 	while ((scqe64->op_own >> 4) != MLX5_CQE_RESIZE_CQ) {
-		dcqe = get_cqe_from_buf(cq->resize_buf,
-					(i + 1) & (cq->resize_buf->nent),
-					dsize);
+		dcqe = mlx5_frag_buf_get_wqe(&cq->resize_buf->fbc,
+					     (i + 1) & cq->resize_buf->nent);
 		dcqe64 = dsize == 64 ? dcqe : dcqe + 64;
 		sw_own = sw_ownership_bit(i + 1, cq->resize_buf->nent);
 		memcpy(dcqe, scqe, dsize);
@@ -1324,8 +1334,11 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 		cqe_size = 64;
 		err = resize_kernel(dev, cq, entries, cqe_size);
 		if (!err) {
-			npas = cq->resize_buf->buf.npages;
-			page_shift = cq->resize_buf->buf.page_shift;
+			struct mlx5_frag_buf_ctrl *c;
+
+			c = &cq->resize_buf->fbc;
+			npas = c->frag_buf.npages;
+			page_shift = c->frag_buf.page_shift;
 		}
 	}
 
@@ -1346,7 +1359,8 @@ int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 		mlx5_ib_populate_pas(dev, cq->resize_umem, page_shift,
 				     pas, 0);
 	else
-		mlx5_fill_page_array(&cq->resize_buf->buf, pas);
+		mlx5_fill_page_frag_array(&cq->resize_buf->fbc.frag_buf,
+					  pas);
 
 	MLX5_SET(modify_cq_in, in,
 		 modify_field_select_resize_field_select.resize_field_select.resize_field_select,
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 139385129973..eafb9751daf6 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -371,7 +371,7 @@ struct mlx5_ib_qp {
 		struct mlx5_ib_rss_qp rss_qp;
 		struct mlx5_ib_dct dct;
 	};
-	struct mlx5_buf		buf;
+	struct mlx5_frag_buf	buf;
 
 	struct mlx5_db		db;
 	struct mlx5_ib_wq	rq;
@@ -413,7 +413,7 @@ struct mlx5_ib_qp {
 };
 
 struct mlx5_ib_cq_buf {
-	struct mlx5_buf		buf;
+	struct mlx5_frag_buf_ctrl fbc;
 	struct ib_umem		*umem;
 	int			cqe_size;
 	int			nent;
@@ -495,7 +495,7 @@ struct mlx5_ib_wc {
 struct mlx5_ib_srq {
 	struct ib_srq		ibsrq;
 	struct mlx5_core_srq	msrq;
-	struct mlx5_buf		buf;
+	struct mlx5_frag_buf	buf;
 	struct mlx5_db		db;
 	u64		       *wrid;
 	/* protect SRQ handling
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/alloc.c b/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
index 47239bf7bf43..323ffe8bf7e4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
@@ -71,19 +71,24 @@ static void *mlx5_dma_zalloc_coherent_node(struct mlx5_core_dev *dev,
 }
 
 int mlx5_buf_alloc_node(struct mlx5_core_dev *dev, int size,
-			struct mlx5_buf *buf, int node)
+			struct mlx5_frag_buf *buf, int node)
 {
 	dma_addr_t t;
 
 	buf->size = size;
 	buf->npages       = 1;
 	buf->page_shift   = (u8)get_order(size) + PAGE_SHIFT;
-	buf->direct.buf   = mlx5_dma_zalloc_coherent_node(dev, size,
-							  &t, node);
-	if (!buf->direct.buf)
+
+	buf->frags = kzalloc(sizeof(*buf->frags), GFP_KERNEL);
+	if (!buf->frags)
 		return -ENOMEM;
 
-	buf->direct.map = t;
+	buf->frags->buf   = mlx5_dma_zalloc_coherent_node(dev, size,
+							  &t, node);
+	if (!buf->frags->buf)
+		goto err_out;
+
+	buf->frags->map = t;
 
 	while (t & ((1 << buf->page_shift) - 1)) {
 		--buf->page_shift;
@@ -91,18 +96,24 @@ int mlx5_buf_alloc_node(struct mlx5_core_dev *dev, int size,
 	}
 
 	return 0;
+err_out:
+	kfree(buf->frags);
+	return -ENOMEM;
 }
 
-int mlx5_buf_alloc(struct mlx5_core_dev *dev, int size, struct mlx5_buf *buf)
+int mlx5_buf_alloc(struct mlx5_core_dev *dev,
+		   int size, struct mlx5_frag_buf *buf)
 {
 	return mlx5_buf_alloc_node(dev, size, buf, dev->priv.numa_node);
 }
-EXPORT_SYMBOL_GPL(mlx5_buf_alloc);
+EXPORT_SYMBOL(mlx5_buf_alloc);
 
-void mlx5_buf_free(struct mlx5_core_dev *dev, struct mlx5_buf *buf)
+void mlx5_buf_free(struct mlx5_core_dev *dev, struct mlx5_frag_buf *buf)
 {
-	dma_free_coherent(&dev->pdev->dev, buf->size, buf->direct.buf,
-			  buf->direct.map);
+	dma_free_coherent(&dev->pdev->dev, buf->size, buf->frags->buf,
+			  buf->frags->map);
+
+	kfree(buf->frags);
 }
 EXPORT_SYMBOL_GPL(mlx5_buf_free);
 
@@ -147,6 +158,7 @@ int mlx5_frag_buf_alloc_node(struct mlx5_core_dev *dev, int size,
 err_out:
 	return -ENOMEM;
 }
+EXPORT_SYMBOL_GPL(mlx5_frag_buf_alloc_node);
 
 void mlx5_frag_buf_free(struct mlx5_core_dev *dev, struct mlx5_frag_buf *buf)
 {
@@ -162,6 +174,7 @@ void mlx5_frag_buf_free(struct mlx5_core_dev *dev, struct mlx5_frag_buf *buf)
 	}
 	kfree(buf->frags);
 }
+EXPORT_SYMBOL_GPL(mlx5_frag_buf_free);
 
 static struct mlx5_db_pgdir *mlx5_alloc_db_pgdir(struct mlx5_core_dev *dev,
 						 int node)
@@ -275,13 +288,13 @@ void mlx5_db_free(struct mlx5_core_dev *dev, struct mlx5_db *db)
 }
 EXPORT_SYMBOL_GPL(mlx5_db_free);
 
-void mlx5_fill_page_array(struct mlx5_buf *buf, __be64 *pas)
+void mlx5_fill_page_array(struct mlx5_frag_buf *buf, __be64 *pas)
 {
 	u64 addr;
 	int i;
 
 	for (i = 0; i < buf->npages; i++) {
-		addr = buf->direct.map + (i << buf->page_shift);
+		addr = buf->frags->map + (i << buf->page_shift);
 
 		pas[i] = cpu_to_be64(addr);
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 0d4bb0688faa..80b84f6af2a1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -52,7 +52,7 @@ static inline bool mlx5e_rx_hw_stamp(struct hwtstamp_config *config)
 static inline void mlx5e_read_cqe_slot(struct mlx5e_cq *cq, u32 cqcc,
 				       void *data)
 {
-	u32 ci = cqcc & cq->wq.sz_m1;
+	u32 ci = cqcc & cq->wq.fbc.sz_m1;
 
 	memcpy(data, mlx5_cqwq_get_wqe(&cq->wq, ci), sizeof(struct mlx5_cqe64));
 }
@@ -74,9 +74,10 @@ static inline void mlx5e_read_mini_arr_slot(struct mlx5e_cq *cq, u32 cqcc)
 
 static inline void mlx5e_cqes_update_owner(struct mlx5e_cq *cq, u32 cqcc, int n)
 {
-	u8 op_own = (cqcc >> cq->wq.log_sz) & 1;
-	u32 wq_sz = 1 << cq->wq.log_sz;
-	u32 ci = cqcc & cq->wq.sz_m1;
+	struct mlx5_frag_buf_ctrl *fbc = &cq->wq.fbc;
+	u8 op_own = (cqcc >> fbc->log_sz) & 1;
+	u32 wq_sz = 1 << fbc->log_sz;
+	u32 ci = cqcc & fbc->sz_m1;
 	u32 ci_top = min_t(u32, wq_sz, ci + n);
 
 	for (; ci < ci_top; ci++, n--) {
@@ -101,7 +102,7 @@ static inline void mlx5e_decompress_cqe(struct mlx5e_rq *rq,
 	cq->title.byte_cnt     = cq->mini_arr[cq->mini_arr_idx].byte_cnt;
 	cq->title.check_sum    = cq->mini_arr[cq->mini_arr_idx].checksum;
 	cq->title.op_own      &= 0xf0;
-	cq->title.op_own      |= 0x01 & (cqcc >> cq->wq.log_sz);
+	cq->title.op_own      |= 0x01 & (cqcc >> cq->wq.fbc.log_sz);
 	cq->title.wqe_counter  = cpu_to_be16(cq->decmprs_wqe_counter);
 
 	if (rq->wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wq.c b/drivers/net/ethernet/mellanox/mlx5/core/wq.c
index 6bcfc25350f5..ea66448ba365 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/wq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/wq.c
@@ -41,7 +41,7 @@ u32 mlx5_wq_cyc_get_size(struct mlx5_wq_cyc *wq)
 
 u32 mlx5_cqwq_get_size(struct mlx5_cqwq *wq)
 {
-	return wq->sz_m1 + 1;
+	return wq->fbc.sz_m1 + 1;
 }
 
 u32 mlx5_wq_ll_get_size(struct mlx5_wq_ll *wq)
@@ -62,7 +62,7 @@ static u32 mlx5_wq_qp_get_byte_size(struct mlx5_wq_qp *wq)
 
 static u32 mlx5_cqwq_get_byte_size(struct mlx5_cqwq *wq)
 {
-	return mlx5_cqwq_get_size(wq) << wq->log_stride;
+	return mlx5_cqwq_get_size(wq) << wq->fbc.log_stride;
 }
 
 static u32 mlx5_wq_ll_get_byte_size(struct mlx5_wq_ll *wq)
@@ -92,7 +92,7 @@ int mlx5_wq_cyc_create(struct mlx5_core_dev *mdev, struct mlx5_wq_param *param,
 		goto err_db_free;
 	}
 
-	wq->buf = wq_ctrl->buf.direct.buf;
+	wq->buf = wq_ctrl->buf.frags->buf;
 	wq->db  = wq_ctrl->db.db;
 
 	wq_ctrl->mdev = mdev;
@@ -130,7 +130,7 @@ int mlx5_wq_qp_create(struct mlx5_core_dev *mdev, struct mlx5_wq_param *param,
 		goto err_db_free;
 	}
 
-	wq->rq.buf = wq_ctrl->buf.direct.buf;
+	wq->rq.buf = wq_ctrl->buf.frags->buf;
 	wq->sq.buf = wq->rq.buf + mlx5_wq_cyc_get_byte_size(&wq->rq);
 	wq->rq.db  = &wq_ctrl->db.db[MLX5_RCV_DBR];
 	wq->sq.db  = &wq_ctrl->db.db[MLX5_SND_DBR];
@@ -151,11 +151,7 @@ int mlx5_cqwq_create(struct mlx5_core_dev *mdev, struct mlx5_wq_param *param,
 {
 	int err;
 
-	wq->log_stride	= 6 + MLX5_GET(cqc, cqc, cqe_sz);
-	wq->log_sz	= MLX5_GET(cqc, cqc, log_cq_size);
-	wq->sz_m1	= (1 << wq->log_sz) - 1;
-	wq->log_frag_strides = PAGE_SHIFT - wq->log_stride;
-	wq->frag_sz_m1	= (1 << wq->log_frag_strides) - 1;
+	mlx5_core_init_cq_frag_buf(&wq->fbc, cqc);
 
 	err = mlx5_db_alloc_node(mdev, &wq_ctrl->db, param->db_numa_node);
 	if (err) {
@@ -172,7 +168,7 @@ int mlx5_cqwq_create(struct mlx5_core_dev *mdev, struct mlx5_wq_param *param,
 		goto err_db_free;
 	}
 
-	wq->frag_buf = wq_ctrl->frag_buf;
+	wq->fbc.frag_buf = wq_ctrl->frag_buf;
 	wq->db  = wq_ctrl->db.db;
 
 	wq_ctrl->mdev = mdev;
@@ -209,7 +205,7 @@ int mlx5_wq_ll_create(struct mlx5_core_dev *mdev, struct mlx5_wq_param *param,
 		goto err_db_free;
 	}
 
-	wq->buf = wq_ctrl->buf.direct.buf;
+	wq->buf = wq_ctrl->buf.frags->buf;
 	wq->db  = wq_ctrl->db.db;
 
 	for (i = 0; i < wq->sz_m1; i++) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wq.h b/drivers/net/ethernet/mellanox/mlx5/core/wq.h
index 718589d0cec2..fca90b94596d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/wq.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/wq.h
@@ -45,7 +45,7 @@ struct mlx5_wq_param {
 
 struct mlx5_wq_ctrl {
 	struct mlx5_core_dev	*mdev;
-	struct mlx5_buf		buf;
+	struct mlx5_frag_buf	buf;
 	struct mlx5_db		db;
 };
 
@@ -68,14 +68,9 @@ struct mlx5_wq_qp {
 };
 
 struct mlx5_cqwq {
-	struct mlx5_frag_buf	frag_buf;
-	__be32			*db;
-	u32			sz_m1;
-	u32			frag_sz_m1;
-	u32			cc; /* consumer counter */
-	u8			log_sz;
-	u8			log_stride;
-	u8			log_frag_strides;
+	struct mlx5_frag_buf_ctrl fbc;
+	__be32			  *db;
+	u32			  cc; /* consumer counter */
 };
 
 struct mlx5_wq_ll {
@@ -131,20 +126,17 @@ static inline int mlx5_wq_cyc_cc_bigger(u16 cc1, u16 cc2)
 
 static inline u32 mlx5_cqwq_get_ci(struct mlx5_cqwq *wq)
 {
-	return wq->cc & wq->sz_m1;
+	return wq->cc & wq->fbc.sz_m1;
 }
 
 static inline void *mlx5_cqwq_get_wqe(struct mlx5_cqwq *wq, u32 ix)
 {
-	unsigned int frag = (ix >> wq->log_frag_strides);
-
-	return wq->frag_buf.frags[frag].buf +
-		((wq->frag_sz_m1 & ix) << wq->log_stride);
+	return mlx5_frag_buf_get_wqe(&wq->fbc, ix);
 }
 
 static inline u32 mlx5_cqwq_get_wrap_cnt(struct mlx5_cqwq *wq)
 {
-	return wq->cc >> wq->log_sz;
+	return wq->cc >> wq->fbc.log_sz;
 }
 
 static inline void mlx5_cqwq_pop(struct mlx5_cqwq *wq)
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 2860a253275b..bfea26af6de5 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -345,13 +345,6 @@ struct mlx5_buf_list {
 	dma_addr_t		map;
 };
 
-struct mlx5_buf {
-	struct mlx5_buf_list	direct;
-	int			npages;
-	int			size;
-	u8			page_shift;
-};
-
 struct mlx5_frag_buf {
 	struct mlx5_buf_list	*frags;
 	int			npages;
@@ -359,6 +352,15 @@ struct mlx5_frag_buf {
 	u8			page_shift;
 };
 
+struct mlx5_frag_buf_ctrl {
+	struct mlx5_frag_buf	frag_buf;
+	u32			sz_m1;
+	u32			frag_sz_m1;
+	u8			log_sz;
+	u8			log_stride;
+	u8			log_frag_strides;
+};
+
 struct mlx5_eq_tasklet {
 	struct list_head list;
 	struct list_head process_list;
@@ -386,7 +388,7 @@ struct mlx5_eq {
 	struct mlx5_cq_table	cq_table;
 	__be32 __iomem	       *doorbell;
 	u32			cons_index;
-	struct mlx5_buf		buf;
+	struct mlx5_frag_buf	buf;
 	int			size;
 	unsigned int		irqn;
 	u8			eqn;
@@ -932,9 +934,9 @@ struct mlx5_hca_vport_context {
 	bool			grh_required;
 };
 
-static inline void *mlx5_buf_offset(struct mlx5_buf *buf, int offset)
+static inline void *mlx5_buf_offset(struct mlx5_frag_buf *buf, int offset)
 {
-		return buf->direct.buf + offset;
+		return buf->frags->buf + offset;
 }
 
 #define STRUCT_FIELD(header, field) \
@@ -973,6 +975,25 @@ static inline u32 mlx5_base_mkey(const u32 key)
 	return key & 0xffffff00u;
 }
 
+static inline void mlx5_core_init_cq_frag_buf(struct mlx5_frag_buf_ctrl *fbc,
+					      void *cqc)
+{
+	fbc->log_stride	= 6 + MLX5_GET(cqc, cqc, cqe_sz);
+	fbc->log_sz	= MLX5_GET(cqc, cqc, log_cq_size);
+	fbc->sz_m1	= (1 << fbc->log_sz) - 1;
+	fbc->log_frag_strides = PAGE_SHIFT - fbc->log_stride;
+	fbc->frag_sz_m1	= (1 << fbc->log_frag_strides) - 1;
+}
+
+static inline void *mlx5_frag_buf_get_wqe(struct mlx5_frag_buf_ctrl *fbc,
+					  u32 ix)
+{
+	unsigned int frag = (ix >> fbc->log_frag_strides);
+
+	return fbc->frag_buf.frags[frag].buf +
+		((fbc->frag_sz_m1 & ix) << fbc->log_stride);
+}
+
 int mlx5_cmd_init(struct mlx5_core_dev *dev);
 void mlx5_cmd_cleanup(struct mlx5_core_dev *dev);
 void mlx5_cmd_use_events(struct mlx5_core_dev *dev);
@@ -998,9 +1019,10 @@ void mlx5_drain_health_wq(struct mlx5_core_dev *dev);
 void mlx5_trigger_health_work(struct mlx5_core_dev *dev);
 void mlx5_drain_health_recovery(struct mlx5_core_dev *dev);
 int mlx5_buf_alloc_node(struct mlx5_core_dev *dev, int size,
-			struct mlx5_buf *buf, int node);
-int mlx5_buf_alloc(struct mlx5_core_dev *dev, int size, struct mlx5_buf *buf);
-void mlx5_buf_free(struct mlx5_core_dev *dev, struct mlx5_buf *buf);
+			struct mlx5_frag_buf *buf, int node);
+int mlx5_buf_alloc(struct mlx5_core_dev *dev,
+		   int size, struct mlx5_frag_buf *buf);
+void mlx5_buf_free(struct mlx5_core_dev *dev, struct mlx5_frag_buf *buf);
 int mlx5_frag_buf_alloc_node(struct mlx5_core_dev *dev, int size,
 			     struct mlx5_frag_buf *buf, int node);
 void mlx5_frag_buf_free(struct mlx5_core_dev *dev, struct mlx5_frag_buf *buf);
@@ -1045,7 +1067,8 @@ int mlx5_satisfy_startup_pages(struct mlx5_core_dev *dev, int boot);
 int mlx5_reclaim_startup_pages(struct mlx5_core_dev *dev);
 void mlx5_register_debugfs(void);
 void mlx5_unregister_debugfs(void);
-void mlx5_fill_page_array(struct mlx5_buf *buf, __be64 *pas);
+
+void mlx5_fill_page_array(struct mlx5_frag_buf *buf, __be64 *pas);
 void mlx5_fill_page_frag_array(struct mlx5_frag_buf *frag_buf, __be64 *pas);
 void mlx5_rsc_event(struct mlx5_core_dev *dev, u32 rsn, int event_type);
 void mlx5_srq_event(struct mlx5_core_dev *dev, u32 srqn, int event_type);
-- 
2.14.3
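
A note on the indexing math the patch introduces: mlx5_core_init_cq_frag_buf()
derives every mask from two CQC fields (cqe_sz and log_cq_size), and
mlx5_frag_buf_get_wqe() then splits a CQE index into a fragment number and a
byte offset inside that fragment. The standalone userspace sketch below
reproduces that arithmetic; the struct and helper names are illustrative
stand-ins rather than the driver's own, and the values (4K pages, 64-byte
CQEs, 1024 entries) are assumptions:

/* Userspace model of the fragment indexing math; values illustrative. */
#include <stdio.h>

#define MODEL_PAGE_SHIFT 12		/* 4K pages assumed */

struct fbc_model {
	unsigned int log_stride;	/* log2 of CQE size in bytes */
	unsigned int log_sz;		/* log2 of number of CQEs */
	unsigned int sz_m1;		/* CQE index mask */
	unsigned int log_frag_strides;	/* log2 of CQEs per fragment */
	unsigned int frag_sz_m1;	/* intra-fragment index mask */
};

/* Mirrors mlx5_core_init_cq_frag_buf(): cqe_sz 0 -> 64B, 1 -> 128B */
static void fbc_model_init(struct fbc_model *fbc, unsigned int cqe_sz,
			   unsigned int log_cq_size)
{
	fbc->log_stride = 6 + cqe_sz;
	fbc->log_sz = log_cq_size;
	fbc->sz_m1 = (1u << fbc->log_sz) - 1;
	fbc->log_frag_strides = MODEL_PAGE_SHIFT - fbc->log_stride;
	fbc->frag_sz_m1 = (1u << fbc->log_frag_strides) - 1;
}

int main(void)
{
	struct fbc_model fbc;
	unsigned int ix = 100;		/* arbitrary CQE index */

	fbc_model_init(&fbc, 0, 10);	/* 64B CQEs, 1024 entries */

	/* Same split as mlx5_frag_buf_get_wqe(): page-sized fragment
	 * holding the CQE, then the byte offset within it.
	 */
	printf("frag   = %u\n", ix >> fbc.log_frag_strides);
	printf("offset = %u\n", (ix & fbc.frag_sz_m1) << fbc.log_stride);
	return 0;
}

With 64-byte CQEs on 4K pages each fragment holds 64 CQEs, so index 100
resolves to fragment 1 at byte offset 2304, exactly the slot the old
contiguous layout would have used, without needing one large buffer.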


* Re: [for-next 7/7] IB/mlx5: Implement fragmented completion queue (CQ)
  2018-02-21 20:13 ` [for-next 7/7] IB/mlx5: Implement fragmented completion queue (CQ) Saeed Mahameed
@ 2018-02-22 17:34   ` Jason Gunthorpe
  2018-02-23  0:04   ` Santosh Shilimkar
  1 sibling, 0 replies; 15+ messages in thread
From: Jason Gunthorpe @ 2018-02-22 17:34 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Doug Ledford, Leon Romanovsky, netdev,
	linux-rdma, Yonatan Cohen, Leon Romanovsky

On Wed, Feb 21, 2018 at 12:13:54PM -0800, Saeed Mahameed wrote:
> From: Yonatan Cohen <yonatanc@mellanox.com>
> 
> The current implementation of create CQ requires contiguous
> memory; such a requirement is problematic once memory is
> fragmented or the system is low on memory, as it causes
> failures in dma_zalloc_coherent().
> 
> This patch implements a new scheme of fragmented CQs to
> overcome this issue by introducing a new type, 'struct
> mlx5_frag_buf_ctrl', to allocate fragmented buffers rather
> than contiguous ones.
> 
> Base the Completion Queues (CQs) on this new fragmented buffer.
> 
> It fixes following crashes:
> kworker/29:0: page allocation failure: order:6, mode:0x80d0
> CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
> Workqueue: ib_cm cm_work_handler [ib_cm]
> Call Trace:
> [<>] dump_stack+0x19/0x1b
> [<>] warn_alloc_failed+0x110/0x180
> [<>] __alloc_pages_slowpath+0x6b7/0x725
> [<>] __alloc_pages_nodemask+0x405/0x420
> [<>] dma_generic_alloc_coherent+0x8f/0x140
> [<>] x86_swiotlb_alloc_coherent+0x21/0x50
> [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
> [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
> [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
> [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
> [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
> [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
> 
> Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
> Signed-off-by: Leon Romanovsky <leon@kernel.org>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
>  drivers/infiniband/hw/mlx5/cq.c                 | 64 +++++++++++++++----------
>  drivers/infiniband/hw/mlx5/mlx5_ib.h            |  6 +--
>  drivers/net/ethernet/mellanox/mlx5/core/alloc.c | 37 +++++++++-----
>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 11 +++--
>  drivers/net/ethernet/mellanox/mlx5/core/wq.c    | 18 +++----
>  drivers/net/ethernet/mellanox/mlx5/core/wq.h    | 22 +++------
>  include/linux/mlx5/driver.h                     | 51 ++++++++++++++------
>  7 files changed, 124 insertions(+), 85 deletions(-)

For the drivers/infiniband stuff:

Acked-by: Jason Gunthorpe <jgg@mellanox.com>

Thanks,
Jason


* Re: [pull request][for-next 0/7] Mellanox, mlx5 shared code updates 2018-02-21
  2018-02-21 20:13 [pull request][for-next 0/7] Mellanox, mlx5 shared code updates 2018-02-21 Saeed Mahameed
                   ` (6 preceding siblings ...)
  2018-02-21 20:13 ` [for-next 7/7] IB/mlx5: Implement fragmented completion queue (CQ) Saeed Mahameed
@ 2018-02-22 19:39 ` David Miller
  2018-02-23  2:10   ` Doug Ledford
  7 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2018-02-22 19:39 UTC (permalink / raw)
  To: saeedm; +Cc: dledford, leonro, jgg, netdev, linux-rdma

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Wed, 21 Feb 2018 12:13:47 -0800

> This series includes shared code updates for mlx5 core driver for both
> netdev and rdma subsystems.  This series should be pulled to both
> trees so we can continue netdev and rdma specific submissions separately.
> 
> For more information please see tag log below.
> 
> P.S. We expect two more shared code pull requests.
> 
> The series doesn't cause any conflict with the latest mlx5 net fixes
> series.
> 
> Please pull and let me know if there's any issue,

Looks good to me, pulled into net-next, thanks.


* Re: [for-next 7/7] IB/mlx5: Implement fragmented completion queue (CQ)
  2018-02-21 20:13 ` [for-next 7/7] IB/mlx5: Implement fragmented completion queue (CQ) Saeed Mahameed
  2018-02-22 17:34   ` Jason Gunthorpe
@ 2018-02-23  0:04   ` Santosh Shilimkar
  2018-02-23 19:13     ` Saeed Mahameed
  1 sibling, 1 reply; 15+ messages in thread
From: Santosh Shilimkar @ 2018-02-23  0:04 UTC (permalink / raw)
  To: Saeed Mahameed, David S. Miller, Doug Ledford
  Cc: Leon Romanovsky, Jason Gunthorpe, netdev, linux-rdma,
	Yonatan Cohen, Leon Romanovsky

Hi Saeed

On 2/21/2018 12:13 PM, Saeed Mahameed wrote:
> From: Yonatan Cohen <yonatanc@mellanox.com>
> 
> The current implementation of create CQ requires contiguous
> memory; such a requirement is problematic once memory is
> fragmented or the system is low on memory, as it causes
> failures in dma_zalloc_coherent().
> 
> This patch implements a new scheme of fragmented CQs to
> overcome this issue by introducing a new type, 'struct
> mlx5_frag_buf_ctrl', to allocate fragmented buffers rather
> than contiguous ones.
> 
> Base the Completion Queues (CQs) on this new fragmented buffer.
> 
> It fixes following crashes:
> kworker/29:0: page allocation failure: order:6, mode:0x80d0
> CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
> Workqueue: ib_cm cm_work_handler [ib_cm]
> Call Trace:
> [<>] dump_stack+0x19/0x1b
> [<>] warn_alloc_failed+0x110/0x180
> [<>] __alloc_pages_slowpath+0x6b7/0x725
> [<>] __alloc_pages_nodemask+0x405/0x420
> [<>] dma_generic_alloc_coherent+0x8f/0x140
> [<>] x86_swiotlb_alloc_coherent+0x21/0x50
> [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
> [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
> [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
> [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
> [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
> [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
> 
> Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
> Signed-off-by: Leon Romanovsky <leon@kernel.org>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
Jason mentioned this patch to me off-list. We were seeing a
similar issue with SRQs & QPs, so I am wondering whether you
have any plans to make a similar change for the other resources
too, so that they don't rely on higher-order page allocations
for ICM tables.

Regards,
Santosh


* Re: [pull request][for-next 0/7] Mellanox, mlx5 shared code updates 2018-02-21
  2018-02-22 19:39 ` [pull request][for-next 0/7] Mellanox, mlx5 shared code updates 2018-02-21 David Miller
@ 2018-02-23  2:10   ` Doug Ledford
  0 siblings, 0 replies; 15+ messages in thread
From: Doug Ledford @ 2018-02-23  2:10 UTC (permalink / raw)
  To: David Miller, saeedm; +Cc: leonro, jgg, netdev, linux-rdma

On Thu, 2018-02-22 at 14:39 -0500, David Miller wrote:
> From: Saeed Mahameed <saeedm@mellanox.com>
> Date: Wed, 21 Feb 2018 12:13:47 -0800
> 
> > This series includes shared code updates for mlx5 core driver for both
> > netdev and rdma subsystems.  This series should be pulled to both
> > trees so we can continue netdev and rdma specific submissions separately.
> > 
> > For more information please see tag log below.
> > 
> > P.S. We expect two more shared code pull requests.
> > 
> > The series doesn't cause any conflict with the latest mlx5 net fixes
> > series.
> > 
> > Please pull and let me know if there's any issue,
> 
> Looks good to me, pulled into net-next, thanks.

Thanks, pulled into rdma-next.

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

* Re: [for-next 7/7] IB/mlx5: Implement fragmented completion queue (CQ)
  2018-02-23  0:04   ` Santosh Shilimkar
@ 2018-02-23 19:13     ` Saeed Mahameed
  2018-02-24  9:40       ` Majd Dibbiny
  0 siblings, 1 reply; 15+ messages in thread
From: Saeed Mahameed @ 2018-02-23 19:13 UTC (permalink / raw)
  To: santosh.shilimkar, davem, Majd Dibbiny, dledford
  Cc: Jason Gunthorpe, netdev, Leon Romanovsky, linux-rdma, leon,
	Yonatan Cohen

On Thu, 2018-02-22 at 16:04 -0800, Santosh Shilimkar wrote:
> Hi Saeed
> 
> On 2/21/2018 12:13 PM, Saeed Mahameed wrote:
> > From: Yonatan Cohen <yonatanc@mellanox.com>
> > 
> > The current implementation of create CQ requires contiguous
> > memory; such a requirement is problematic once memory is
> > fragmented or the system is low on memory, as it causes
> > failures in dma_zalloc_coherent().
> > 
> > This patch implements a new scheme of fragmented CQs to
> > overcome this issue by introducing a new type, 'struct
> > mlx5_frag_buf_ctrl', to allocate fragmented buffers rather
> > than contiguous ones.
> > 
> > Base the Completion Queues (CQs) on this new fragmented buffer.
> > 
> > It fixes following crashes:
> > kworker/29:0: page allocation failure: order:6, mode:0x80d0
> > CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
> > Workqueue: ib_cm cm_work_handler [ib_cm]
> > Call Trace:
> > [<>] dump_stack+0x19/0x1b
> > [<>] warn_alloc_failed+0x110/0x180
> > [<>] __alloc_pages_slowpath+0x6b7/0x725
> > [<>] __alloc_pages_nodemask+0x405/0x420
> > [<>] dma_generic_alloc_coherent+0x8f/0x140
> > [<>] x86_swiotlb_alloc_coherent+0x21/0x50
> > [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
> > [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
> > [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
> > [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
> > [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
> > [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
> > 
> > Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
> > Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leon@kernel.org>
> > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> > ---
> 
> Jason mentioned this patch to me off-list. We were seeing a
> similar issue with SRQs & QPs, so I am wondering whether you
> have any plans to make a similar change for the other resources
> too, so that they don't rely on higher-order page allocations
> for ICM tables.
> 

Hi Santosh,

Adding Majd,

Which ULP is in question? How big are the QPs/SRQs you create that
lead to this problem?

For ICM tables we already allocate only order-0 pages;
see alloc_system_page() in
drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c.

But for kernel RDMA SRQ and QP buffers there is room for
improvement.

Majd, do you know if we have any near-term plans for this?

> Regards,
> Santosh
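
The contrast drawn above is the heart of the series: a CQ buffer that
previously needed a single order-6 dma_zalloc_coherent() call (256KB of
contiguous memory, as in the crash log quoted earlier) becomes an array
of order-0 fragments that each fit in one page. A rough userspace model
of such a fragment-array allocation and its unwind path, with calloc()
standing in for mlx5_dma_zalloc_coherent_node() and all names
hypothetical:

/* Userspace model of a fragment-array allocator; names hypothetical. */
#include <stdio.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096

struct frag_model {
	void *buf;
};

/* Split 'size' bytes into page-sized pieces so that no single
 * allocation ever exceeds order 0.
 */
static struct frag_model *frag_buf_alloc(size_t size, size_t *nfrags)
{
	size_t n = (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	struct frag_model *frags = calloc(n, sizeof(*frags));
	size_t i;

	if (!frags)
		return NULL;

	for (i = 0; i < n; i++) {
		frags[i].buf = calloc(1, MODEL_PAGE_SIZE);
		if (!frags[i].buf)
			goto err_free;
	}

	*nfrags = n;
	return frags;

err_free:
	while (i--)			/* unwind the pieces we did get */
		free(frags[i].buf);
	free(frags);
	return NULL;
}

int main(void)
{
	size_t i, n = 0;
	/* 256KB: order 6 if contiguous, 64 order-0 pieces here */
	struct frag_model *frags = frag_buf_alloc(64 * MODEL_PAGE_SIZE, &n);

	if (!frags)
		return 1;
	printf("allocated %zu page-sized fragments\n", n);
	for (i = 0; i < n; i++)
		free(frags[i].buf);
	free(frags);
	return 0;
}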


* Re: [for-next 7/7] IB/mlx5: Implement fragmented completion queue (CQ)
  2018-02-23 19:13     ` Saeed Mahameed
@ 2018-02-24  9:40       ` Majd Dibbiny
  2018-02-25 19:56         ` Santosh Shilimkar
  0 siblings, 1 reply; 15+ messages in thread
From: Majd Dibbiny @ 2018-02-24  9:40 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: santosh.shilimkar, davem, dledford, Jason Gunthorpe, netdev,
	Leon Romanovsky, linux-rdma, leon, Yonatan Cohen


> On Feb 23, 2018, at 9:13 PM, Saeed Mahameed <saeedm@mellanox.com> wrote:
> 
>> On Thu, 2018-02-22 at 16:04 -0800, Santosh Shilimkar wrote:
>> Hi Saeed
>> 
>>> On 2/21/2018 12:13 PM, Saeed Mahameed wrote:
>>> From: Yonatan Cohen <yonatanc@mellanox.com>
>>> 
>>> The current implementation of create CQ requires contiguous
>>> memory; such a requirement is problematic once memory is
>>> fragmented or the system is low on memory, as it causes
>>> failures in dma_zalloc_coherent().
>>> 
>>> This patch implements a new scheme of fragmented CQs to
>>> overcome this issue by introducing a new type, 'struct
>>> mlx5_frag_buf_ctrl', to allocate fragmented buffers rather
>>> than contiguous ones.
>>> 
>>> Base the Completion Queues (CQs) on this new fragmented buffer.
>>> 
>>> It fixes following crashes:
>>> kworker/29:0: page allocation failure: order:6, mode:0x80d0
>>> CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
>>> Workqueue: ib_cm cm_work_handler [ib_cm]
>>> Call Trace:
>>> [<>] dump_stack+0x19/0x1b
>>> [<>] warn_alloc_failed+0x110/0x180
>>> [<>] __alloc_pages_slowpath+0x6b7/0x725
>>> [<>] __alloc_pages_nodemask+0x405/0x420
>>> [<>] dma_generic_alloc_coherent+0x8f/0x140
>>> [<>] x86_swiotlb_alloc_coherent+0x21/0x50
>>> [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
>>> [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
>>> [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
>>> [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
>>> [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
>>> [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
>>> 
>>> Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
>>> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
>>> Signed-off-by: Leon Romanovsky <leon@kernel.org>
>>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
>>> ---
>> 
>> Jason mentioned this patch to me off-list. We were seeing a
>> similar issue with SRQs & QPs, so I am wondering whether you
>> have any plans to make a similar change for the other resources
>> too, so that they don't rely on higher-order page allocations
>> for ICM tables.
>> 
> 
> Hi Santosh,
> 
> Adding Majd,
> 
> Which ULP is in question? How big are the QPs/SRQs you create that
> lead to this problem?
> 
> For ICM tables we already allocate only order-0 pages;
> see alloc_system_page() in
> drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c.
> 
> But for kernel RDMA SRQ and QP buffers there is room for
> improvement.
> 
> Majd, do you know if we have any near-term plans for this?

It’s in our plans to move all the buffers to use 0-order pages.

Santosh,

Is this RDS? Do you have persistent failure with some configuration? Can you please share more information?

Thanks
> 
>> Regards,
>> Santosh


* Re: [for-next 7/7] IB/mlx5: Implement fragmented completion queue (CQ)
  2018-02-24  9:40       ` Majd Dibbiny
@ 2018-02-25 19:56         ` Santosh Shilimkar
  0 siblings, 0 replies; 15+ messages in thread
From: Santosh Shilimkar @ 2018-02-25 19:56 UTC (permalink / raw)
  To: Majd Dibbiny, Saeed Mahameed
  Cc: davem, dledford, Jason Gunthorpe, netdev, Leon Romanovsky,
	linux-rdma, leon, Yonatan Cohen

On 2/24/2018 1:40 AM, Majd Dibbiny wrote:
> 
>> On Feb 23, 2018, at 9:13 PM, Saeed Mahameed <saeedm@mellanox.com> wrote:
>>
>>> On Thu, 2018-02-22 at 16:04 -0800, Santosh Shilimkar wrote:
>>> Hi Saeed
>>>
>>>> On 2/21/2018 12:13 PM, Saeed Mahameed wrote:

[...]

>>>
>>> Jason mentioned this patch to me off-list. We were seeing a
>>> similar issue with SRQs & QPs, so I am wondering whether you
>>> have any plans to make a similar change for the other resources
>>> too, so that they don't rely on higher-order page allocations
>>> for ICM tables.
>>>
>>
>> Hi Santosh,
>>
>> Adding Majd,
>>
>> Which ULP is in question? How big are the QPs/SRQs you create that
>> lead to this problem?
>> 
>> For ICM tables we already allocate only order-0 pages;
>> see alloc_system_page() in
>> drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c.
>> 
>> But for kernel RDMA SRQ and QP buffers there is room for
>> improvement.
>> 
>> Majd, do you know if we have any near-term plans for this?
> 
> It’s in our plans to move all the buffers to use 0-order pages.
> 
> Santosh,
> 
> Is this RDS? Do you have persistent failure with some configuration? Can you please share more information?
> 
No, the issue was seen with user verbs, and actually with the MLX4
driver. My last question was more about both the MLX4 and MLX5
drivers' ICM allocation for all the resources.

With the MLX4 driver, we have seen corruption issues with MLX4_NO_RR
while recycling resources, so we ended up switching back to the
round-robin bitmap allocation that existed before Jack's commit
7c6d74d23 {mlx4_core: Roll back round robin bitmap allocation commit
for CQs, SRQs, and MPTs}.

With the default round robin, the corruption issue went away, but its
undesired effect of bloating the ICM tables until you hit the
resource limit means more memory fragmentation. Since these resources
make use of higher-order allocations, in fragmented-memory scenarios
we see contention on the mm lock for seconds while the compaction
layer tries to stitch pages together, which takes time.

If these allocations didn't make use of higher-order pages, the issue
could certainly be avoided, hence the reason behind the question.

Of course, we wouldn't have ended up with this issue if MLX4_NO_RR
worked without corruption :-)

Regards,
Santosh
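
To make the trade-off described above concrete: a first-free policy
keeps recycling the lowest entries, so the ICM backing the table stays
small but freed entries are reused immediately, while round robin
defers reuse and gradually walks the whole table. A minimal sketch of
the two bitmap policies (purely illustrative, not the mlx4 bitmap
code):

/* Two toy bitmap allocation policies over an 8-entry table. */
#include <stdio.h>

#define NBITS 8

/* First-free: take the lowest clear bit; compact, immediate reuse. */
static int alloc_first_free(unsigned int *bitmap)
{
	int i;

	for (i = 0; i < NBITS; i++) {
		if (!(*bitmap & (1u << i))) {
			*bitmap |= 1u << i;
			return i;
		}
	}
	return -1;
}

/* Round robin: scan from just past the last grant; defers reuse of
 * freed entries but keeps touching new parts of the table.
 */
static int alloc_round_robin(unsigned int *bitmap, int *last)
{
	int i;

	for (i = 0; i < NBITS; i++) {
		int bit = (*last + 1 + i) % NBITS;

		if (!(*bitmap & (1u << bit))) {
			*bitmap |= 1u << bit;
			*last = bit;
			return bit;
		}
	}
	return -1;
}

int main(void)
{
	unsigned int bm_ff = 0, bm_rr = 0;
	int last = -1, i;

	for (i = 0; i < 4; i++) {
		int ff = alloc_first_free(&bm_ff);
		int rr = alloc_round_robin(&bm_rr, &last);

		if (ff < 0 || rr < 0)
			break;
		printf("first-free %d, round robin %d\n", ff, rr);
		bm_ff &= ~(1u << ff);	/* freed right away */
		bm_rr &= ~(1u << rr);	/* freed right away */
	}
	return 0;
}

Even though every entry is freed immediately, the round-robin side
keeps advancing (0, 1, 2, 3) while first-free repeats slot 0, which is
why round robin ends up instantiating far more of the ICM table.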

