* [pull request][net-next V2 00/15] mlx5 updates 2020-12-01
@ 2020-12-03  4:20 Saeed Mahameed
  2020-12-03  4:20 ` [net-next V2 01/15] net/mlx5e: Free drop RQ in a dedicated function Saeed Mahameed
                   ` (14 more replies)
  0 siblings, 15 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-03  4:20 UTC
  To: Jakub Kicinski; +Cc: David S. Miller, netdev, Saeed Mahameed

Hi Jakub,

v1->v2: Removed merge commit of mlx5-next.

This series adds port tx timestamping support and some misc updates.
For more information please see tag log below.

Please pull and let me know if there is any problem.

Thanks,
Saeed.

---

The following changes since commit cec85994c6b4fa6beb5de61dcd03e23001b9deb5:

  bareudp: constify device_type declaration (2020-12-02 18:00:18 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2020-12-01

for you to fetch changes up to f5c3fd7ba638e4fa5908144b3995dcb6854fba60:

  net/mlx5e: Fill mlx5e_create_cq_param in a function (2020-12-02 20:14:42 -0800)

----------------------------------------------------------------
mlx5-updates-2020-12-01

mlx5e port TX timestamping support and MISC updates

1) Add support for port TX timestamping, for better PTP accuracy.

Currently, mlx5 HW TX timestamping is done on CQE (TX completion)
generation, which is much earlier than when the packet actually goes out
to the wire. In this series, Eran implements the option to do timestamping
on the port using a special SQ (Send Queue). Such a Send Queue generates 2
CQEs (TX completions): the original one and a new one when the packet
leaves the port. Due to the nature of this special handling, the mechanism
is opt-in only and is off by default to avoid any performance
degradation on normal traffic flows.

2) Misc updates and trivial improvements.
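
As a rough illustration of item 1, the two-completion scheme can be
modeled in plain userspace C as below. This is a hypothetical sketch, not
code from the series: all demo_* names are invented here, and the policy
of reporting the port timestamp only once both completions have arrived
is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not driver code: one packet sent on a
 * port-timestamping SQ, which is expected to see two completions. */
struct demo_tx_pkt {
	bool     have_cqe_ts;   /* regular TX completion (CQE) seen */
	bool     have_port_ts;  /* completion for leaving the port seen */
	uint64_t cqe_ts;
	uint64_t port_ts;
};

/* Record the timestamp carried by the original TX completion. */
static void demo_on_cqe(struct demo_tx_pkt *p, uint64_t ts)
{
	assert(p);
	p->cqe_ts = ts;
	p->have_cqe_ts = true;
}

/* Record the timestamp taken when the packet left the port. */
static void demo_on_port_cqe(struct demo_tx_pkt *p, uint64_t ts)
{
	assert(p);
	p->port_ts = ts;
	p->have_port_ts = true;
}

/* Assumed policy: report only after both completions, prefer port_ts. */
static bool demo_tx_ts_ready(const struct demo_tx_pkt *p, uint64_t *ts)
{
	if (!p->have_cqe_ts || !p->have_port_ts)
		return false;
	*ts = p->port_ts;
	return true;
}
```

In this model a caller feeds both completions for a packet and reads the
timestamp only when demo_tx_ts_ready() returns true; the extra completion
per packet is the kind of overhead that motivates keeping the feature
opt-in.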

----------------------------------------------------------------
Aya Levin (3):
      net/mlx5e: Allow CQ outside of channel context
      net/mlx5e: Allow RQ outside of channel context
      net/mlx5e: Split between RX/TX tunnel FW support indication

Eran Ben Elisha (6):
      net/mlx5e: Allow SQ outside of channel context
      net/mlx5e: Change skb fifo push/pop API to be used without SQ
      net/mlx5e: Split SW group counters update function
      net/mlx5e: Move MLX5E_RX_ERR_CQE macro
      net/mlx5e: Add TX PTP port object support
      net/mlx5e: Add TX port timestamp support

Maxim Mikityanskiy (1):
      net/mlx5e: Fill mlx5e_create_cq_param in a function

Shay Drory (1):
      net/mlx5: Arm only EQs with EQEs

Tariq Toukan (1):
      net/mlx5e: Free drop RQ in a dedicated function

YueHaibing (2):
      net/mlx5e: Remove duplicated include
      net/mlx5: Fix passing zero to 'PTR_ERR'

Zhu Yanjun (1):
      net/mlx5e: remove unnecessary memset

 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  63 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en/fs.h    |   3 +-
 .../net/ethernet/mellanox/mlx5/core/en/health.c    |  16 +-
 .../net/ethernet/mellanox/mlx5/core/en/health.h    |   7 +-
 .../net/ethernet/mellanox/mlx5/core/en/params.h    |  10 +
 drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c   | 529 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h   |  63 +++
 .../ethernet/mellanox/mlx5/core/en/reporter_rx.c   |  52 +-
 .../ethernet/mellanox/mlx5/core/en/reporter_tx.c   | 215 +++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h  |  19 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.c |   9 +-
 .../mellanox/mlx5/core/en_accel/tls_rxtx.c         |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  33 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c    |  20 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 252 ++++++----
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |  29 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 403 +++++++++++-----
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |  11 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c    |  77 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   5 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c       |   6 +-
 .../mellanox/mlx5/core/esw/acl/egress_lgcy.c       |   2 +-
 .../mellanox/mlx5/core/esw/acl/egress_ofld.c       |   2 +-
 .../mellanox/mlx5/core/esw/acl/ingress_lgcy.c      |   2 +-
 .../mellanox/mlx5/core/esw/acl/ingress_ofld.c      |   2 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |   1 -
 27 files changed, 1485 insertions(+), 350 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h

* [net-next V2 01/15] net/mlx5e: Free drop RQ in a dedicated function
From: Saeed Mahameed @ 2020-12-03  4:20 UTC
  To: Jakub Kicinski
  Cc: David S. Miller, netdev, Tariq Toukan, Aya Levin,
	Maxim Mikityanskiy, Saeed Mahameed

From: Tariq Toukan <tariqt@nvidia.com>

The drop RQ has very limited objects to be freed, and differs
from regular RQs in the context that it is freed from.
Add a dedicated function for it, use it where needed, and remove
the drop_rq-specific checks in the generic function.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c  | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)
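
The split described in the commit message can be pictured with the
userspace sketch below. It is only an illustration under assumptions:
the demo_* types are invented stand-ins, and a plain counter takes the
place of the real program reference counting.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver objects, for illustration only. */
struct demo_prog { int refcnt; };

struct demo_rq {
	void             *wq;    /* work queue memory; every RQ has it */
	struct demo_prog *prog;  /* attached program; regular RQs only */
};

/* Generic path: may assume a fully initialized, channel-owned RQ. */
static void demo_free_rq(struct demo_rq *rq)
{
	assert(rq);
	if (rq->prog)
		rq->prog->refcnt--;	/* drop the reference the RQ held */
	rq->prog = NULL;
	free(rq->wq);
	rq->wq = NULL;
}

/* Dedicated path: the drop RQ only ever owns its work queue. */
static void demo_free_drop_rq(struct demo_rq *rq)
{
	assert(rq);
	free(rq->wq);
	rq->wq = NULL;
}
```

Keeping the two paths apart means the generic function no longer needs
to ask whether it was handed the special-case object.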

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 527c5f12c5af..aab6b5d7de0a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -613,14 +613,11 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 
 static void mlx5e_free_rq(struct mlx5e_rq *rq)
 {
-	struct mlx5e_channel *c = rq->channel;
-	struct bpf_prog *old_prog = NULL;
+	struct bpf_prog *old_prog;
 	int i;
 
-	/* drop_rq has neither channel nor xdp_prog. */
-	if (c)
-		old_prog = rcu_dereference_protected(rq->xdp_prog,
-						     lockdep_is_held(&c->priv->state_lock));
+	old_prog = rcu_dereference_protected(rq->xdp_prog,
+					     lockdep_is_held(&rq->channel->priv->state_lock));
 	if (old_prog)
 		bpf_prog_put(old_prog);
 
@@ -3196,6 +3193,11 @@ int mlx5e_close(struct net_device *netdev)
 	return err;
 }
 
+static void mlx5e_free_drop_rq(struct mlx5e_rq *rq)
+{
+	mlx5_wq_destroy(&rq->wq_ctrl);
+}
+
 static int mlx5e_alloc_drop_rq(struct mlx5_core_dev *mdev,
 			       struct mlx5e_rq *rq,
 			       struct mlx5e_rq_param *param)
@@ -3263,7 +3265,7 @@ int mlx5e_open_drop_rq(struct mlx5e_priv *priv,
 	return 0;
 
 err_free_rq:
-	mlx5e_free_rq(drop_rq);
+	mlx5e_free_drop_rq(drop_rq);
 
 err_destroy_cq:
 	mlx5e_destroy_cq(cq);
@@ -3277,7 +3279,7 @@ int mlx5e_open_drop_rq(struct mlx5e_priv *priv,
 void mlx5e_close_drop_rq(struct mlx5e_rq *drop_rq)
 {
 	mlx5e_destroy_rq(drop_rq);
-	mlx5e_free_rq(drop_rq);
+	mlx5e_free_drop_rq(drop_rq);
 	mlx5e_destroy_cq(&drop_rq->cq);
 	mlx5e_free_cq(&drop_rq->cq);
 }
-- 
2.26.2


* [net-next V2 02/15] net/mlx5e: Allow CQ outside of channel context
From: Saeed Mahameed @ 2020-12-03  4:20 UTC
  To: Jakub Kicinski
  Cc: David S. Miller, netdev, Aya Levin, Eran Ben Elisha,
	Tariq Toukan, Saeed Mahameed

From: Aya Levin <ayal@nvidia.com>

In order to be able to create a CQ outside of a channel context, remove
cq->channel direct pointer. This requires adding a direct pointer to
channel statistics, netdevice, priv and to mlx5_core in order to support
CQs that are a part of mlx5e_channel.
In addition, parameters that were previously derived from the channel
like napi, NUMA node, channel stats and index are now assembled in
struct mlx5e_create_cq_param which is given to mlx5e_open_cq() instead
of channel pointer. Generalizing mlx5e_open_cq() allows opening CQ
outside of channel context which will be used in following patches in
the patch-set.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  | 16 ++++-
 .../ethernet/mellanox/mlx5/core/en/health.c   |  3 +-
 .../net/ethernet/mellanox/mlx5/core/en/txrx.h |  2 +-
 .../mellanox/mlx5/core/en/xsk/setup.c         | 12 +++-
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 67 ++++++++++++-------
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  6 +-
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   |  3 +-
 .../net/ethernet/mellanox/mlx5/core/en_txrx.c |  5 +-
 8 files changed, 73 insertions(+), 41 deletions(-)
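
The idea of passing creation parameters explicitly, instead of reading
them through a channel pointer, can be sketched in userspace C as
follows. The demo_* structs are simplified, hypothetical stand-ins (the
napi pointer is left out), so this shows the shape of the change rather
than the driver's actual code.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins; the real structs carry more fields. */
struct demo_ch_stats { unsigned long events; };

/* Everything the CQ used to read from the channel, passed explicitly. */
struct demo_create_cq_param {
	struct demo_ch_stats *ch_stats;
	int node;	/* NUMA node for the CQ buffers */
	int ix;		/* index used to pick the EQ */
};

struct demo_cq {
	struct demo_ch_stats *ch_stats;
	int buf_numa_node;
	int eq_ix;
};

/* The caller decides where the values come from: a channel or not. */
static void demo_open_cq(struct demo_cq *cq,
			 const struct demo_create_cq_param *ccp)
{
	assert(cq && ccp && ccp->ch_stats);
	cq->ch_stats = ccp->ch_stats;
	cq->buf_numa_node = ccp->node;
	cq->eq_ix = ccp->ix;
}

/* Completion handling touches only what the CQ itself points to. */
static void demo_completion_event(struct demo_cq *cq)
{
	cq->ch_stats->events++;
}
```

A caller without any channel can fill demo_create_cq_param from its own
stats object and indices, which is the property the later patches in the
set are said to rely on.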

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2f05b0f9de01..2d149ab48ce1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -282,10 +282,12 @@ struct mlx5e_cq {
 	u16                        event_ctr;
 	struct napi_struct        *napi;
 	struct mlx5_core_cq        mcq;
-	struct mlx5e_channel      *channel;
+	struct mlx5e_ch_stats     *ch_stats;
 
 	/* control */
+	struct net_device         *netdev;
 	struct mlx5_core_dev      *mdev;
+	struct mlx5e_priv         *priv;
 	struct mlx5_wq_ctrl        wq_ctrl;
 } ____cacheline_aligned_in_smp;
 
@@ -923,9 +925,17 @@ int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params,
 		     struct mlx5e_xdpsq *sq, bool is_redirect);
 void mlx5e_close_xdpsq(struct mlx5e_xdpsq *sq);
 
+struct mlx5e_create_cq_param {
+	struct napi_struct *napi;
+	struct mlx5e_ch_stats *ch_stats;
+	int node;
+	int ix;
+};
+
 struct mlx5e_cq_param;
-int mlx5e_open_cq(struct mlx5e_channel *c, struct dim_cq_moder moder,
-		  struct mlx5e_cq_param *param, struct mlx5e_cq *cq);
+int mlx5e_open_cq(struct mlx5e_priv *priv, struct dim_cq_moder moder,
+		  struct mlx5e_cq_param *param, struct mlx5e_create_cq_param *ccp,
+		  struct mlx5e_cq *cq);
 void mlx5e_close_cq(struct mlx5e_cq *cq);
 
 int mlx5e_open_locked(struct net_device *netdev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
index 69a05da0e3e3..c62f5e881377 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
@@ -37,13 +37,12 @@ int mlx5e_health_fmsg_named_obj_nest_end(struct devlink_fmsg *fmsg)
 
 int mlx5e_health_cq_diag_fmsg(struct mlx5e_cq *cq, struct devlink_fmsg *fmsg)
 {
-	struct mlx5e_priv *priv = cq->channel->priv;
 	u32 out[MLX5_ST_SZ_DW(query_cq_out)] = {};
 	u8 hw_status;
 	void *cqc;
 	int err;
 
-	err = mlx5_core_query_cq(priv->mdev, &cq->mcq, out);
+	err = mlx5_core_query_cq(cq->mdev, &cq->mcq, out);
 	if (err)
 		return err;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
index 07ee1d236ab3..ac47efaaebd5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
@@ -308,7 +308,7 @@ static inline void mlx5e_dump_error_cqe(struct mlx5e_cq *cq, u32 qn,
 
 	ci = mlx5_cqwq_ctr2ix(wq, wq->cc - 1);
 
-	netdev_err(cq->channel->netdev,
+	netdev_err(cq->netdev,
 		   "Error cqe on cqn 0x%x, ci 0x%x, qn 0x%x, opcode 0x%x, syndrome 0x%x, vendor syndrome 0x%x\n",
 		   cq->mcq.cqn, ci, qn,
 		   get_cqe_opcode((struct mlx5_cqe64 *)err_cqe),
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
index be3465ba38ca..7703e6553da6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
@@ -48,9 +48,15 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
 		   struct mlx5e_xsk_param *xsk, struct xsk_buff_pool *pool,
 		   struct mlx5e_channel *c)
 {
+	struct mlx5e_create_cq_param ccp = {};
 	struct mlx5e_channel_param *cparam;
 	int err;
 
+	ccp.napi = &c->napi;
+	ccp.ch_stats = c->stats;
+	ccp.node = cpu_to_node(c->cpu);
+	ccp.ix = c->ix;
+
 	if (!mlx5e_validate_xsk_param(params, xsk, priv->mdev))
 		return -EINVAL;
 
@@ -60,7 +66,8 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
 
 	mlx5e_build_xsk_cparam(priv, params, xsk, cparam);
 
-	err = mlx5e_open_cq(c, params->rx_cq_moderation, &cparam->rq.cqp, &c->xskrq.cq);
+	err = mlx5e_open_cq(c->priv, params->rx_cq_moderation, &cparam->rq.cqp, &ccp,
+			    &c->xskrq.cq);
 	if (unlikely(err))
 		goto err_free_cparam;
 
@@ -68,7 +75,8 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
 	if (unlikely(err))
 		goto err_close_rx_cq;
 
-	err = mlx5e_open_cq(c, params->tx_cq_moderation, &cparam->xdp_sq.cqp, &c->xsksq.cq);
+	err = mlx5e_open_cq(c->priv, params->tx_cq_moderation, &cparam->xdp_sq.cqp, &ccp,
+			    &c->xsksq.cq);
 	if (unlikely(err))
 		goto err_close_rq;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index aab6b5d7de0a..67995a4ce220 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1515,10 +1515,11 @@ void mlx5e_close_xdpsq(struct mlx5e_xdpsq *sq)
 	mlx5e_free_xdpsq(sq);
 }
 
-static int mlx5e_alloc_cq_common(struct mlx5_core_dev *mdev,
+static int mlx5e_alloc_cq_common(struct mlx5e_priv *priv,
 				 struct mlx5e_cq_param *param,
 				 struct mlx5e_cq *cq)
 {
+	struct mlx5_core_dev *mdev = priv->mdev;
 	struct mlx5_core_cq *mcq = &cq->mcq;
 	int eqn_not_used;
 	unsigned int irqn;
@@ -1551,25 +1552,27 @@ static int mlx5e_alloc_cq_common(struct mlx5_core_dev *mdev,
 	}
 
 	cq->mdev = mdev;
+	cq->netdev = priv->netdev;
+	cq->priv = priv;
 
 	return 0;
 }
 
-static int mlx5e_alloc_cq(struct mlx5e_channel *c,
+static int mlx5e_alloc_cq(struct mlx5e_priv *priv,
 			  struct mlx5e_cq_param *param,
+			  struct mlx5e_create_cq_param *ccp,
 			  struct mlx5e_cq *cq)
 {
-	struct mlx5_core_dev *mdev = c->priv->mdev;
 	int err;
 
-	param->wq.buf_numa_node = cpu_to_node(c->cpu);
-	param->wq.db_numa_node  = cpu_to_node(c->cpu);
-	param->eq_ix   = c->ix;
+	param->wq.buf_numa_node = ccp->node;
+	param->wq.db_numa_node  = ccp->node;
+	param->eq_ix            = ccp->ix;
 
-	err = mlx5e_alloc_cq_common(mdev, param, cq);
+	err = mlx5e_alloc_cq_common(priv, param, cq);
 
-	cq->napi    = &c->napi;
-	cq->channel = c;
+	cq->napi     = ccp->napi;
+	cq->ch_stats = ccp->ch_stats;
 
 	return err;
 }
@@ -1633,13 +1636,14 @@ static void mlx5e_destroy_cq(struct mlx5e_cq *cq)
 	mlx5_core_destroy_cq(cq->mdev, &cq->mcq);
 }
 
-int mlx5e_open_cq(struct mlx5e_channel *c, struct dim_cq_moder moder,
-		  struct mlx5e_cq_param *param, struct mlx5e_cq *cq)
+int mlx5e_open_cq(struct mlx5e_priv *priv, struct dim_cq_moder moder,
+		  struct mlx5e_cq_param *param, struct mlx5e_create_cq_param *ccp,
+		  struct mlx5e_cq *cq)
 {
-	struct mlx5_core_dev *mdev = c->mdev;
+	struct mlx5_core_dev *mdev = priv->mdev;
 	int err;
 
-	err = mlx5e_alloc_cq(c, param, cq);
+	err = mlx5e_alloc_cq(priv, param, ccp, cq);
 	if (err)
 		return err;
 
@@ -1665,14 +1669,15 @@ void mlx5e_close_cq(struct mlx5e_cq *cq)
 
 static int mlx5e_open_tx_cqs(struct mlx5e_channel *c,
 			     struct mlx5e_params *params,
+			     struct mlx5e_create_cq_param *ccp,
 			     struct mlx5e_channel_param *cparam)
 {
 	int err;
 	int tc;
 
 	for (tc = 0; tc < c->num_tc; tc++) {
-		err = mlx5e_open_cq(c, params->tx_cq_moderation,
-				    &cparam->txq_sq.cqp, &c->sq[tc].cq);
+		err = mlx5e_open_cq(c->priv, params->tx_cq_moderation, &cparam->txq_sq.cqp,
+				    ccp, &c->sq[tc].cq);
 		if (err)
 			goto err_close_tx_cqs;
 	}
@@ -1812,30 +1817,40 @@ static int mlx5e_open_queues(struct mlx5e_channel *c,
 			     struct mlx5e_channel_param *cparam)
 {
 	struct dim_cq_moder icocq_moder = {0, 0};
+	struct mlx5e_create_cq_param ccp = {};
 	int err;
 
-	err = mlx5e_open_cq(c, icocq_moder, &cparam->icosq.cqp, &c->async_icosq.cq);
+	ccp.napi = &c->napi;
+	ccp.ch_stats = c->stats;
+	ccp.node = cpu_to_node(c->cpu);
+	ccp.ix = c->ix;
+
+	err = mlx5e_open_cq(c->priv, icocq_moder, &cparam->icosq.cqp, &ccp,
+			    &c->async_icosq.cq);
 	if (err)
 		return err;
 
-	err = mlx5e_open_cq(c, icocq_moder, &cparam->async_icosq.cqp, &c->icosq.cq);
+	err = mlx5e_open_cq(c->priv, icocq_moder, &cparam->async_icosq.cqp, &ccp,
+			    &c->icosq.cq);
 	if (err)
 		goto err_close_async_icosq_cq;
 
-	err = mlx5e_open_tx_cqs(c, params, cparam);
+	err = mlx5e_open_tx_cqs(c, params, &ccp, cparam);
 	if (err)
 		goto err_close_icosq_cq;
 
-	err = mlx5e_open_cq(c, params->tx_cq_moderation, &cparam->xdp_sq.cqp, &c->xdpsq.cq);
+	err = mlx5e_open_cq(c->priv, params->tx_cq_moderation, &cparam->xdp_sq.cqp, &ccp,
+			    &c->xdpsq.cq);
 	if (err)
 		goto err_close_tx_cqs;
 
-	err = mlx5e_open_cq(c, params->rx_cq_moderation, &cparam->rq.cqp, &c->rq.cq);
+	err = mlx5e_open_cq(c->priv, params->rx_cq_moderation, &cparam->rq.cqp, &ccp,
+			    &c->rq.cq);
 	if (err)
 		goto err_close_xdp_tx_cqs;
 
-	err = c->xdp ? mlx5e_open_cq(c, params->tx_cq_moderation,
-				     &cparam->xdp_sq.cqp, &c->rq_xdpsq.cq) : 0;
+	err = c->xdp ? mlx5e_open_cq(c->priv, params->tx_cq_moderation, &cparam->xdp_sq.cqp,
+				     &ccp, &c->rq_xdpsq.cq) : 0;
 	if (err)
 		goto err_close_rx_cq;
 
@@ -3221,14 +3236,16 @@ static int mlx5e_alloc_drop_rq(struct mlx5_core_dev *mdev,
 	return 0;
 }
 
-static int mlx5e_alloc_drop_cq(struct mlx5_core_dev *mdev,
+static int mlx5e_alloc_drop_cq(struct mlx5e_priv *priv,
 			       struct mlx5e_cq *cq,
 			       struct mlx5e_cq_param *param)
 {
+	struct mlx5_core_dev *mdev = priv->mdev;
+
 	param->wq.buf_numa_node = dev_to_node(mlx5_core_dma_dev(mdev));
 	param->wq.db_numa_node  = dev_to_node(mlx5_core_dma_dev(mdev));
 
-	return mlx5e_alloc_cq_common(mdev, param, cq);
+	return mlx5e_alloc_cq_common(priv, param, cq);
 }
 
 int mlx5e_open_drop_rq(struct mlx5e_priv *priv,
@@ -3242,7 +3259,7 @@ int mlx5e_open_drop_rq(struct mlx5e_priv *priv,
 
 	mlx5e_build_drop_rq_param(priv, &rq_param);
 
-	err = mlx5e_alloc_drop_cq(mdev, cq, &cq_param);
+	err = mlx5e_alloc_drop_cq(priv, cq, &cq_param);
 	if (err)
 		return err;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 6628a0197b4e..08163dca15a0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -670,13 +670,13 @@ int mlx5e_poll_ico_cq(struct mlx5e_cq *cq)
 			sqcc += wi->num_wqebbs;
 
 			if (last_wqe && unlikely(get_cqe_opcode(cqe) != MLX5_CQE_REQ)) {
-				netdev_WARN_ONCE(cq->channel->netdev,
+				netdev_WARN_ONCE(cq->netdev,
 						 "Bad OP in ICOSQ CQE: 0x%x\n",
 						 get_cqe_opcode(cqe));
 				mlx5e_dump_error_cqe(&sq->cq, sq->sqn,
 						     (struct mlx5_err_cqe *)cqe);
 				if (!test_and_set_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
-					queue_work(cq->channel->priv->wq, &sq->recover_work);
+					queue_work(cq->priv->wq, &sq->recover_work);
 				break;
 			}
 
@@ -697,7 +697,7 @@ int mlx5e_poll_ico_cq(struct mlx5e_cq *cq)
 				break;
 #endif
 			default:
-				netdev_WARN_ONCE(cq->channel->netdev,
+				netdev_WARN_ONCE(cq->netdev,
 						 "Bad WQE type in ICOSQ WQE info: 0x%x\n",
 						 wi->wqe_type);
 			}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 6dd3ea3cbbed..14af7488cc4f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -797,8 +797,7 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 				mlx5e_dump_error_cqe(&sq->cq, sq->sqn,
 						     (struct mlx5_err_cqe *)cqe);
 				mlx5_wq_cyc_wqe_dump(&sq->wq, ci, wi->num_wqebbs);
-				queue_work(cq->channel->priv->wq,
-					   &sq->recover_work);
+				queue_work(cq->priv->wq, &sq->recover_work);
 			}
 			stats->cqe_err++;
 		}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index d5868670f8a5..1ec3d62f026d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -221,14 +221,13 @@ void mlx5e_completion_event(struct mlx5_core_cq *mcq, struct mlx5_eqe *eqe)
 
 	napi_schedule(cq->napi);
 	cq->event_ctr++;
-	cq->channel->stats->events++;
+	cq->ch_stats->events++;
 }
 
 void mlx5e_cq_error_event(struct mlx5_core_cq *mcq, enum mlx5_event event)
 {
 	struct mlx5e_cq *cq = container_of(mcq, struct mlx5e_cq, mcq);
-	struct mlx5e_channel *c = cq->channel;
-	struct net_device *netdev = c->netdev;
+	struct net_device *netdev = cq->netdev;
 
 	netdev_err(netdev, "%s: cqn=0x%.6x event=0x%.2x\n",
 		   __func__, mcq->cqn, event);
-- 
2.26.2


* [net-next V2 03/15] net/mlx5e: Allow RQ outside of channel context
From: Saeed Mahameed @ 2020-12-03  4:20 UTC
  To: Jakub Kicinski
  Cc: David S. Miller, netdev, Aya Levin, Eran Ben Elisha,
	Tariq Toukan, Saeed Mahameed

From: Aya Levin <ayal@nvidia.com>

In order to be able to create an RQ outside of a channel context, remove
rq->channel direct pointer. This requires adding direct pointers to
ICOSQ and priv in order to support RQs that are part of mlx5e_channel.
Use channel_stats from the corresponding CQ.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  3 +-
 .../ethernet/mellanox/mlx5/core/en/health.c   |  9 ++--
 .../ethernet/mellanox/mlx5/core/en/health.h   |  3 +-
 .../mellanox/mlx5/core/en/reporter_rx.c       | 50 ++++++++++---------
 .../mellanox/mlx5/core/en/reporter_tx.c       |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 23 ++++-----
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 22 ++++----
 7 files changed, 59 insertions(+), 53 deletions(-)
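
Once the ICOSQ becomes an optional pointer on the RQ, reporting code has
to cope with it being absent. The userspace sketch below mirrors the
message format used by the RX timeout reporter in this patch; the demo_*
types and the function name are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for the driver structs. */
struct demo_icosq { unsigned int sqn; };

struct demo_rq {
	int                ix;
	unsigned int       rqn;
	unsigned int       cqn;
	struct demo_icosq *icosq;	/* NULL for an RQ without an ICOSQ */
};

/* Build the timeout message; the ICOSQ part appears only if one exists. */
static void demo_rx_timeout_str(const struct demo_rq *rq, char *buf, size_t len)
{
	char icosq_str[32] = "";

	assert(rq && buf);
	if (rq->icosq)
		snprintf(icosq_str, sizeof(icosq_str), "ICOSQ: 0x%x, ",
			 rq->icosq->sqn);
	snprintf(buf, len, "RX timeout on channel: %d, %sRQ: 0x%x, CQ: 0x%x",
		 rq->ix, icosq_str, rq->rqn, rq->cqn);
}
```

Formatting the optional part into its own buffer first keeps a single
format string for both cases.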

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2d149ab48ce1..3dec0731f4da 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -595,7 +595,6 @@ struct mlx5e_rq {
 		u8             map_dir;   /* dma map direction */
 	} buff;
 
-	struct mlx5e_channel  *channel;
 	struct device         *pdev;
 	struct net_device     *netdev;
 	struct mlx5e_rq_stats *stats;
@@ -604,6 +603,8 @@ struct mlx5e_rq {
 	struct mlx5e_page_cache page_cache;
 	struct hwtstamp_config *tstamp;
 	struct mlx5_clock      *clock;
+	struct mlx5e_icosq    *icosq;
+	struct mlx5e_priv     *priv;
 
 	mlx5e_fp_handle_rx_cqe handle_rx_cqe;
 	mlx5e_fp_post_rx_wqes  post_wqes;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
index c62f5e881377..e8fc535e6f91 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
@@ -205,21 +205,22 @@ int mlx5e_health_recover_channels(struct mlx5e_priv *priv)
 	return err;
 }
 
-int mlx5e_health_channel_eq_recover(struct mlx5_eq_comp *eq, struct mlx5e_channel *channel)
+int mlx5e_health_channel_eq_recover(struct net_device *dev, struct mlx5_eq_comp *eq,
+				    struct mlx5e_ch_stats *stats)
 {
 	u32 eqe_count;
 
-	netdev_err(channel->netdev, "EQ 0x%x: Cons = 0x%x, irqn = 0x%x\n",
+	netdev_err(dev, "EQ 0x%x: Cons = 0x%x, irqn = 0x%x\n",
 		   eq->core.eqn, eq->core.cons_index, eq->core.irqn);
 
 	eqe_count = mlx5_eq_poll_irq_disabled(eq);
 	if (!eqe_count)
 		return -EIO;
 
-	netdev_err(channel->netdev, "Recovered %d eqes on EQ 0x%x\n",
+	netdev_err(dev, "Recovered %d eqes on EQ 0x%x\n",
 		   eqe_count, eq->core.eqn);
 
-	channel->stats->eq_rearm++;
+	stats->eq_rearm++;
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
index b9aadddfd000..48d0232ce654 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
@@ -43,7 +43,8 @@ struct mlx5e_err_ctx {
 };
 
 int mlx5e_health_sq_to_ready(struct mlx5e_channel *channel, u32 sqn);
-int mlx5e_health_channel_eq_recover(struct mlx5_eq_comp *eq, struct mlx5e_channel *channel);
+int mlx5e_health_channel_eq_recover(struct net_device *dev, struct mlx5_eq_comp *eq,
+				    struct mlx5e_ch_stats *stats);
 int mlx5e_health_recover_channels(struct mlx5e_priv *priv);
 int mlx5e_health_report(struct mlx5e_priv *priv,
 			struct devlink_health_reporter *reporter, char *err_str,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index 9913647a1faf..0206e033a271 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -146,17 +146,16 @@ static int mlx5e_rx_reporter_err_rq_cqe_recover(void *ctx)
 
 static int mlx5e_rx_reporter_timeout_recover(void *ctx)
 {
-	struct mlx5e_icosq *icosq;
 	struct mlx5_eq_comp *eq;
 	struct mlx5e_rq *rq;
 	int err;
 
 	rq = ctx;
-	icosq = &rq->channel->icosq;
 	eq = rq->cq.mcq.eq;
-	err = mlx5e_health_channel_eq_recover(eq, rq->channel);
-	if (err)
-		clear_bit(MLX5E_SQ_STATE_ENABLED, &icosq->state);
+
+	err = mlx5e_health_channel_eq_recover(rq->netdev, eq, rq->cq.ch_stats);
+	if (err && rq->icosq)
+		clear_bit(MLX5E_SQ_STATE_ENABLED, &rq->icosq->state);
 
 	return err;
 }
@@ -233,21 +232,13 @@ static int mlx5e_reporter_icosq_diagnose(struct mlx5e_icosq *icosq, u8 hw_state,
 static int mlx5e_rx_reporter_build_diagnose_output(struct mlx5e_rq *rq,
 						   struct devlink_fmsg *fmsg)
 {
-	struct mlx5e_priv *priv = rq->channel->priv;
-	struct mlx5e_icosq *icosq;
-	u8 icosq_hw_state;
 	u16 wqe_counter;
 	int wqes_sz;
 	u8 hw_state;
 	u16 wq_head;
 	int err;
 
-	icosq = &rq->channel->icosq;
-	err = mlx5e_query_rq_state(priv->mdev, rq->rqn, &hw_state);
-	if (err)
-		return err;
-
-	err = mlx5_core_query_sq_state(priv->mdev, icosq->sqn, &icosq_hw_state);
+	err = mlx5e_query_rq_state(rq->mdev, rq->rqn, &hw_state);
 	if (err)
 		return err;
 
@@ -259,7 +250,7 @@ static int mlx5e_rx_reporter_build_diagnose_output(struct mlx5e_rq *rq,
 	if (err)
 		return err;
 
-	err = devlink_fmsg_u32_pair_put(fmsg, "channel ix", rq->channel->ix);
+	err = devlink_fmsg_u32_pair_put(fmsg, "channel ix", rq->ix);
 	if (err)
 		return err;
 
@@ -295,9 +286,18 @@ static int mlx5e_rx_reporter_build_diagnose_output(struct mlx5e_rq *rq,
 	if (err)
 		return err;
 
-	err = mlx5e_reporter_icosq_diagnose(icosq, icosq_hw_state, fmsg);
-	if (err)
-		return err;
+	if (rq->icosq) {
+		struct mlx5e_icosq *icosq = rq->icosq;
+		u8 icosq_hw_state;
+
+		err = mlx5_core_query_sq_state(rq->mdev, icosq->sqn, &icosq_hw_state);
+		if (err)
+			return err;
+
+		err = mlx5e_reporter_icosq_diagnose(icosq, icosq_hw_state, fmsg);
+		if (err)
+			return err;
+	}
 
 	err = devlink_fmsg_obj_nest_end(fmsg);
 	if (err)
@@ -557,25 +557,29 @@ static int mlx5e_rx_reporter_dump(struct devlink_health_reporter *reporter,
 
 void mlx5e_reporter_rx_timeout(struct mlx5e_rq *rq)
 {
-	struct mlx5e_icosq *icosq = &rq->channel->icosq;
-	struct mlx5e_priv *priv = rq->channel->priv;
+	char icosq_str[MLX5E_REPORTER_PER_Q_MAX_LEN] = {};
 	char err_str[MLX5E_REPORTER_PER_Q_MAX_LEN];
+	struct mlx5e_icosq *icosq = rq->icosq;
+	struct mlx5e_priv *priv = rq->priv;
 	struct mlx5e_err_ctx err_ctx = {};
 
 	err_ctx.ctx = rq;
 	err_ctx.recover = mlx5e_rx_reporter_timeout_recover;
 	err_ctx.dump = mlx5e_rx_reporter_dump_rq;
+
+	if (icosq)
+		snprintf(icosq_str, sizeof(icosq_str), "ICOSQ: 0x%x, ", icosq->sqn);
 	snprintf(err_str, sizeof(err_str),
-		 "RX timeout on channel: %d, ICOSQ: 0x%x RQ: 0x%x, CQ: 0x%x",
-		 icosq->channel->ix, icosq->sqn, rq->rqn, rq->cq.mcq.cqn);
+		 "RX timeout on channel: %d, %sRQ: 0x%x, CQ: 0x%x",
+		 rq->ix, icosq_str, rq->rqn, rq->cq.mcq.cqn);
 
 	mlx5e_health_report(priv, priv->rx_reporter, err_str, &err_ctx);
 }
 
 void mlx5e_reporter_rq_cqe_err(struct mlx5e_rq *rq)
 {
-	struct mlx5e_priv *priv = rq->channel->priv;
 	char err_str[MLX5E_REPORTER_PER_Q_MAX_LEN];
+	struct mlx5e_priv *priv = rq->priv;
 	struct mlx5e_err_ctx err_ctx = {};
 
 	err_ctx.ctx = rq;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 8be6eaa3eeb1..97bfeae17dec 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -100,7 +100,7 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
 	sq = to_ctx->sq;
 	eq = sq->cq.mcq.eq;
 	priv = sq->channel->priv;
-	err = mlx5e_health_channel_eq_recover(eq, sq->channel);
+	err = mlx5e_health_channel_eq_recover(sq->channel->netdev, eq, sq->channel->stats);
 	if (!err) {
 		to_ctx->status = 0; /* this sq recovered */
 		return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 67995a4ce220..559ef38a6358 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -412,9 +412,10 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 	rq->wq_type = params->rq_wq_type;
 	rq->pdev    = c->pdev;
 	rq->netdev  = c->netdev;
+	rq->priv    = c->priv;
 	rq->tstamp  = c->tstamp;
 	rq->clock   = &mdev->clock;
-	rq->channel = c;
+	rq->icosq   = &c->icosq;
 	rq->ix      = c->ix;
 	rq->mdev    = mdev;
 	rq->hw_mtu  = MLX5E_SW2HW_MTU(params, params->sw_mtu);
@@ -617,7 +618,7 @@ static void mlx5e_free_rq(struct mlx5e_rq *rq)
 	int i;
 
 	old_prog = rcu_dereference_protected(rq->xdp_prog,
-					     lockdep_is_held(&rq->channel->priv->state_lock));
+					     lockdep_is_held(&rq->priv->state_lock));
 	if (old_prog)
 		bpf_prog_put(old_prog);
 
@@ -717,9 +718,7 @@ int mlx5e_modify_rq_state(struct mlx5e_rq *rq, int curr_state, int next_state)
 
 static int mlx5e_modify_rq_scatter_fcs(struct mlx5e_rq *rq, bool enable)
 {
-	struct mlx5e_channel *c = rq->channel;
-	struct mlx5e_priv *priv = c->priv;
-	struct mlx5_core_dev *mdev = priv->mdev;
+	struct mlx5_core_dev *mdev = rq->mdev;
 
 	void *in;
 	void *rqc;
@@ -748,8 +747,7 @@ static int mlx5e_modify_rq_scatter_fcs(struct mlx5e_rq *rq, bool enable)
 
 static int mlx5e_modify_rq_vsd(struct mlx5e_rq *rq, bool vsd)
 {
-	struct mlx5e_channel *c = rq->channel;
-	struct mlx5_core_dev *mdev = c->mdev;
+	struct mlx5_core_dev *mdev = rq->mdev;
 	void *in;
 	void *rqc;
 	int inlen;
@@ -783,7 +781,6 @@ static void mlx5e_destroy_rq(struct mlx5e_rq *rq)
 int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time)
 {
 	unsigned long exp_time = jiffies + msecs_to_jiffies(wait_time);
-	struct mlx5e_channel *c = rq->channel;
 
 	u16 min_wqes = mlx5_min_rx_wqes(rq->wq_type, mlx5e_rqwq_get_size(rq));
 
@@ -794,8 +791,8 @@ int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time)
 		msleep(20);
 	} while (time_before(jiffies, exp_time));
 
-	netdev_warn(c->netdev, "Failed to get min RX wqes on Channel[%d] RQN[0x%x] wq cur_sz(%d) min_rx_wqes(%d)\n",
-		    c->ix, rq->rqn, mlx5e_rqwq_get_cur_sz(rq), min_wqes);
+	netdev_warn(rq->netdev, "Failed to get min RX wqes on Channel[%d] RQN[0x%x] wq cur_sz(%d) min_rx_wqes(%d)\n",
+		    rq->ix, rq->rqn, mlx5e_rqwq_get_cur_sz(rq), min_wqes);
 
 	mlx5e_reporter_rx_timeout(rq);
 	return -ETIMEDOUT;
@@ -910,7 +907,7 @@ int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
 void mlx5e_activate_rq(struct mlx5e_rq *rq)
 {
 	set_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
-	mlx5e_trigger_irq(&rq->channel->icosq);
+	mlx5e_trigger_irq(rq->icosq);
 }
 
 void mlx5e_deactivate_rq(struct mlx5e_rq *rq)
@@ -922,7 +919,7 @@ void mlx5e_deactivate_rq(struct mlx5e_rq *rq)
 void mlx5e_close_rq(struct mlx5e_rq *rq)
 {
 	cancel_work_sync(&rq->dim.work);
-	cancel_work_sync(&rq->channel->icosq.recover_work);
+	cancel_work_sync(&rq->icosq->recover_work);
 	cancel_work_sync(&rq->recover_work);
 	mlx5e_destroy_rq(rq);
 	mlx5e_free_rx_descs(rq);
@@ -4411,7 +4408,7 @@ static void mlx5e_rq_replace_xdp_prog(struct mlx5e_rq *rq, struct bpf_prog *prog
 	struct bpf_prog *old_prog;
 
 	old_prog = rcu_replace_pointer(rq->xdp_prog, prog,
-				       lockdep_is_held(&rq->channel->priv->state_lock));
+				       lockdep_is_held(&rq->priv->state_lock));
 	if (old_prog)
 		bpf_prog_put(old_prog);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 08163dca15a0..5c0015024f62 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -503,7 +503,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 {
 	struct mlx5e_mpw_info *wi = &rq->mpwqe.info[ix];
 	struct mlx5e_dma_info *dma_info = &wi->umr.dma_info[0];
-	struct mlx5e_icosq *sq = &rq->channel->icosq;
+	struct mlx5e_icosq *sq = rq->icosq;
 	struct mlx5_wq_cyc *wq = &sq->wq;
 	struct mlx5e_umr_wqe *umr_wqe;
 	u16 xlt_offset = ix << (MLX5E_LOG_ALIGNED_MPWQE_PPW - 1);
@@ -713,9 +713,9 @@ int mlx5e_poll_ico_cq(struct mlx5e_cq *cq)
 
 INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq)
 {
-	struct mlx5e_icosq *sq = &rq->channel->icosq;
 	struct mlx5_wq_ll *wq = &rq->mpwqe.wq;
 	u8  umr_completed = rq->mpwqe.umr_completed;
+	struct mlx5e_icosq *sq = rq->icosq;
 	int alloc_err = 0;
 	u8  missing, i;
 	u16 head;
@@ -1218,11 +1218,12 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe,
 static void trigger_report(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 {
 	struct mlx5_err_cqe *err_cqe = (struct mlx5_err_cqe *)cqe;
+	struct mlx5e_priv *priv = rq->priv;
 
 	if (cqe_syndrome_needs_recover(err_cqe->syndrome) &&
 	    !test_and_set_bit(MLX5E_RQ_STATE_RECOVERING, &rq->state)) {
 		mlx5e_dump_error_cqe(&rq->cq, rq->rqn, err_cqe);
-		queue_work(rq->channel->priv->wq, &rq->recover_work);
+		queue_work(priv->wq, &rq->recover_work);
 	}
 }
 
@@ -1771,8 +1772,9 @@ static void mlx5e_ipsec_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cq
 
 int mlx5e_rq_set_handlers(struct mlx5e_rq *rq, struct mlx5e_params *params, bool xsk)
 {
+	struct net_device *netdev = rq->netdev;
 	struct mlx5_core_dev *mdev = rq->mdev;
-	struct mlx5e_channel *c = rq->channel;
+	struct mlx5e_priv *priv = rq->priv;
 
 	switch (rq->wq_type) {
 	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
@@ -1784,15 +1786,15 @@ int mlx5e_rq_set_handlers(struct mlx5e_rq *rq, struct mlx5e_params *params, bool
 		rq->post_wqes = mlx5e_post_rx_mpwqes;
 		rq->dealloc_wqe = mlx5e_dealloc_rx_mpwqe;
 
-		rq->handle_rx_cqe = c->priv->profile->rx_handlers->handle_rx_cqe_mpwqe;
+		rq->handle_rx_cqe = priv->profile->rx_handlers->handle_rx_cqe_mpwqe;
 #ifdef CONFIG_MLX5_EN_IPSEC
 		if (MLX5_IPSEC_DEV(mdev)) {
-			netdev_err(c->netdev, "MPWQE RQ with IPSec offload not supported\n");
+			netdev_err(netdev, "MPWQE RQ with IPSec offload not supported\n");
 			return -EINVAL;
 		}
 #endif
 		if (!rq->handle_rx_cqe) {
-			netdev_err(c->netdev, "RX handler of MPWQE RQ is not set\n");
+			netdev_err(netdev, "RX handler of MPWQE RQ is not set\n");
 			return -EINVAL;
 		}
 		break;
@@ -1807,13 +1809,13 @@ int mlx5e_rq_set_handlers(struct mlx5e_rq *rq, struct mlx5e_params *params, bool
 
 #ifdef CONFIG_MLX5_EN_IPSEC
 		if ((mlx5_fpga_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_CAP_DEVICE) &&
-		    c->priv->ipsec)
+		    priv->ipsec)
 			rq->handle_rx_cqe = mlx5e_ipsec_handle_rx_cqe;
 		else
 #endif
-			rq->handle_rx_cqe = c->priv->profile->rx_handlers->handle_rx_cqe;
+			rq->handle_rx_cqe = priv->profile->rx_handlers->handle_rx_cqe;
 		if (!rq->handle_rx_cqe) {
-			netdev_err(c->netdev, "RX handler of RQ is not set\n");
+			netdev_err(netdev, "RX handler of RQ is not set\n");
 			return -EINVAL;
 		}
 	}
-- 
2.26.2



* [net-next V2 04/15] net/mlx5e: Allow SQ outside of channel context
  2020-12-03  4:20 [pull request][net-next V2 00/15] mlx5 updates 2020-12-01 Saeed Mahameed
                   ` (2 preceding siblings ...)
  2020-12-03  4:20 ` [net-next V2 03/15] net/mlx5e: Allow RQ " Saeed Mahameed
@ 2020-12-03  4:20 ` Saeed Mahameed
  2020-12-03  4:20 ` [net-next V2 05/15] net/mlx5e: Change skb fifo push/pop API to be used without SQ Saeed Mahameed
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-03  4:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, netdev, Eran Ben Elisha, Aya Levin,
	Tariq Toukan, Saeed Mahameed

From: Eran Ben Elisha <eranbe@nvidia.com>

In order to be able to create an SQ outside of a channel context, remove
the direct sq->channel pointer. This requires adding direct pointers to
the netdevice, priv and mlx5_core_dev, so that SQs which are part of an
mlx5e_channel keep access to them. Use the channel stats (ch_stats) from
the corresponding CQ.
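
As a rough illustration of the pattern only (a userspace sketch, not driver
code), a queue can cache the pointers it needs at setup time, so the same
struct works with or without an owning channel. Every demo_* name below is
hypothetical, and the -1 "no channel" marker is an assumption made for the
example.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins; not the driver's real types. */
struct demo_netdev   { const char *name; };
struct demo_core_dev { int id; };
struct demo_priv     { int state; };

struct demo_channel {
	struct demo_netdev   *netdev;
	struct demo_core_dev *mdev;
	struct demo_priv     *priv;
	int                   ix;
};

/* The queue caches what it needs and holds no channel pointer. */
struct demo_sq {
	struct demo_netdev   *netdev;
	struct demo_core_dev *mdev;
	struct demo_priv     *priv;
	int                   ch_ix;
};

/* Channel-owned queue: copy the pointers once at setup time. */
void demo_sq_init_from_channel(struct demo_sq *sq,
			       const struct demo_channel *c)
{
	assert(sq && c);
	sq->netdev = c->netdev;
	sq->mdev   = c->mdev;
	sq->priv   = c->priv;
	sq->ch_ix  = c->ix;
}

/* Standalone queue: same fields, no channel involved. */
void demo_sq_init_standalone(struct demo_sq *sq,
			     struct demo_netdev *netdev,
			     struct demo_core_dev *mdev,
			     struct demo_priv *priv)
{
	assert(sq);
	sq->netdev = netdev;
	sq->mdev   = mdev;
	sq->priv   = priv;
	sq->ch_ix  = -1; /* illustrative "no channel" marker (assumption) */
}

/* Users read sq->netdev directly rather than sq->channel->netdev. */
const char *demo_sq_dev_name(const struct demo_sq *sq)
{
	return sq->netdev->name;
}
```

Code that only consumes the queue, such as demo_sq_dev_name() here, does
not need to know which of the two init paths was used.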

Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  4 +++-
 .../ethernet/mellanox/mlx5/core/en/health.c   |  4 +---
 .../ethernet/mellanox/mlx5/core/en/health.h   |  2 +-
 .../mellanox/mlx5/core/en/reporter_rx.c       |  2 +-
 .../mellanox/mlx5/core/en/reporter_tx.c       | 20 +++++++++----------
 .../mellanox/mlx5/core/en_accel/tls_rxtx.c    |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  9 +++++----
 7 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 3dec0731f4da..c014e8ff66aa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -369,10 +369,12 @@ struct mlx5e_txqsq {
 	unsigned int               hw_mtu;
 	struct hwtstamp_config    *tstamp;
 	struct mlx5_clock         *clock;
+	struct net_device         *netdev;
+	struct mlx5_core_dev      *mdev;
+	struct mlx5e_priv         *priv;
 
 	/* control path */
 	struct mlx5_wq_ctrl        wq_ctrl;
-	struct mlx5e_channel      *channel;
 	int                        ch_ix;
 	int                        txq_ix;
 	u32                        rate_limit;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
index e8fc535e6f91..718f8c0a4f6b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
@@ -157,10 +157,8 @@ void mlx5e_health_channels_update(struct mlx5e_priv *priv)
 						     DEVLINK_HEALTH_REPORTER_STATE_HEALTHY);
 }
 
-int mlx5e_health_sq_to_ready(struct mlx5e_channel *channel, u32 sqn)
+int mlx5e_health_sq_to_ready(struct mlx5_core_dev *mdev, struct net_device *dev, u32 sqn)
 {
-	struct mlx5_core_dev *mdev = channel->mdev;
-	struct net_device *dev = channel->netdev;
 	struct mlx5e_modify_sq_param msp = {};
 	int err;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
index 48d0232ce654..f88fbbe06995 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
@@ -42,7 +42,7 @@ struct mlx5e_err_ctx {
 	void *ctx;
 };
 
-int mlx5e_health_sq_to_ready(struct mlx5e_channel *channel, u32 sqn);
+int mlx5e_health_sq_to_ready(struct mlx5_core_dev *mdev, struct net_device *dev, u32 sqn);
 int mlx5e_health_channel_eq_recover(struct net_device *dev, struct mlx5_eq_comp *eq,
 				    struct mlx5e_ch_stats *stats);
 int mlx5e_health_recover_channels(struct mlx5e_priv *priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index 0206e033a271..d80bbd17e5f8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -87,7 +87,7 @@ static int mlx5e_rx_reporter_err_icosq_cqe_recover(void *ctx)
 
 	/* At this point, both the rq and the icosq are disabled */
 
-	err = mlx5e_health_sq_to_ready(icosq->channel, icosq->sqn);
+	err = mlx5e_health_sq_to_ready(mdev, dev, icosq->sqn);
 	if (err)
 		goto out;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 97bfeae17dec..88b3b21d1068 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -15,7 +15,7 @@ static int mlx5e_wait_for_sq_flush(struct mlx5e_txqsq *sq)
 		msleep(20);
 	}
 
-	netdev_err(sq->channel->netdev,
+	netdev_err(sq->netdev,
 		   "Wait for SQ 0x%x flush timeout (sq cc = 0x%x, sq pc = 0x%x)\n",
 		   sq->sqn, sq->cc, sq->pc);
 
@@ -41,8 +41,8 @@ static int mlx5e_tx_reporter_err_cqe_recover(void *ctx)
 	int err;
 
 	sq = ctx;
-	mdev = sq->channel->mdev;
-	dev = sq->channel->netdev;
+	mdev = sq->mdev;
+	dev = sq->netdev;
 
 	if (!test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
 		return 0;
@@ -68,7 +68,7 @@ static int mlx5e_tx_reporter_err_cqe_recover(void *ctx)
 	 * pending WQEs. SQ can safely reset the SQ.
 	 */
 
-	err = mlx5e_health_sq_to_ready(sq->channel, sq->sqn);
+	err = mlx5e_health_sq_to_ready(mdev, dev, sq->sqn);
 	if (err)
 		goto out;
 
@@ -99,8 +99,8 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
 	to_ctx = ctx;
 	sq = to_ctx->sq;
 	eq = sq->cq.mcq.eq;
-	priv = sq->channel->priv;
-	err = mlx5e_health_channel_eq_recover(sq->channel->netdev, eq, sq->channel->stats);
+	priv = sq->priv;
+	err = mlx5e_health_channel_eq_recover(sq->netdev, eq, sq->cq.ch_stats);
 	if (!err) {
 		to_ctx->status = 0; /* this sq recovered */
 		return err;
@@ -144,8 +144,8 @@ static int
 mlx5e_tx_reporter_build_diagnose_output(struct devlink_fmsg *fmsg,
 					struct mlx5e_txqsq *sq, int tc)
 {
-	struct mlx5e_priv *priv = sq->channel->priv;
 	bool stopped = netif_xmit_stopped(sq->txq);
+	struct mlx5e_priv *priv = sq->priv;
 	u8 state;
 	int err;
 
@@ -396,8 +396,8 @@ static int mlx5e_tx_reporter_dump(struct devlink_health_reporter *reporter,
 
 void mlx5e_reporter_tx_err_cqe(struct mlx5e_txqsq *sq)
 {
-	struct mlx5e_priv *priv = sq->channel->priv;
 	char err_str[MLX5E_REPORTER_PER_Q_MAX_LEN];
+	struct mlx5e_priv *priv = sq->priv;
 	struct mlx5e_err_ctx err_ctx = {};
 
 	err_ctx.ctx = sq;
@@ -410,9 +410,9 @@ void mlx5e_reporter_tx_err_cqe(struct mlx5e_txqsq *sq)
 
 int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq)
 {
-	struct mlx5e_priv *priv = sq->channel->priv;
 	char err_str[MLX5E_REPORTER_PER_Q_MAX_LEN];
 	struct mlx5e_tx_timeout_ctx to_ctx = {};
+	struct mlx5e_priv *priv = sq->priv;
 	struct mlx5e_err_ctx err_ctx = {};
 
 	to_ctx.sq = sq;
@@ -421,7 +421,7 @@ int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq)
 	err_ctx.dump = mlx5e_tx_reporter_dump_sq;
 	snprintf(err_str, sizeof(err_str),
 		 "TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u",
-		 sq->channel->ix, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
+		 sq->ch_ix, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
 		 jiffies_to_usecs(jiffies - sq->txq->trans_start));
 
 	mlx5e_health_report(priv, priv->tx_reporter, err_str, &err_ctx);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
index f51c04284e4d..2b51d3222ca1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
@@ -276,7 +276,7 @@ bool mlx5e_tls_handle_tx_skb(struct net_device *netdev, struct mlx5e_txqsq *sq,
 	if (WARN_ON_ONCE(tls_ctx->netdev != netdev))
 		goto err_out;
 
-	if (mlx5_accel_is_ktls_tx(sq->channel->mdev))
+	if (mlx5_accel_is_ktls_tx(sq->mdev))
 		return mlx5e_ktls_handle_tx_skb(tls_ctx, sq, skb, datalen, state);
 
 	/* FPGA */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 559ef38a6358..38506b8b6f82 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1132,7 +1132,9 @@ static int mlx5e_alloc_txqsq(struct mlx5e_channel *c,
 	sq->tstamp    = c->tstamp;
 	sq->clock     = &mdev->clock;
 	sq->mkey_be   = c->mkey_be;
-	sq->channel   = c;
+	sq->netdev    = c->netdev;
+	sq->mdev      = c->mdev;
+	sq->priv      = c->priv;
 	sq->ch_ix     = c->ix;
 	sq->txq_ix    = txq_ix;
 	sq->uar_map   = mdev->mlx5e_res.bfreg.map;
@@ -1332,7 +1334,7 @@ static int mlx5e_open_txqsq(struct mlx5e_channel *c,
 
 void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
 {
-	sq->txq = netdev_get_tx_queue(sq->channel->netdev, sq->txq_ix);
+	sq->txq = netdev_get_tx_queue(sq->netdev, sq->txq_ix);
 	set_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
 	netdev_tx_reset_queue(sq->txq);
 	netif_tx_start_queue(sq->txq);
@@ -1370,8 +1372,7 @@ static void mlx5e_deactivate_txqsq(struct mlx5e_txqsq *sq)
 
 static void mlx5e_close_txqsq(struct mlx5e_txqsq *sq)
 {
-	struct mlx5e_channel *c = sq->channel;
-	struct mlx5_core_dev *mdev = c->mdev;
+	struct mlx5_core_dev *mdev = sq->mdev;
 	struct mlx5_rate_limit rl = {0};
 
 	cancel_work_sync(&sq->dim.work);
-- 
2.26.2



* [net-next V2 05/15] net/mlx5e: Change skb fifo push/pop API to be used without SQ
  2020-12-03  4:20 [pull request][net-next V2 00/15] mlx5 updates 2020-12-01 Saeed Mahameed
                   ` (3 preceding siblings ...)
  2020-12-03  4:20 ` [net-next V2 04/15] net/mlx5e: Allow SQ " Saeed Mahameed
@ 2020-12-03  4:20 ` Saeed Mahameed
  2020-12-03  4:20 ` [net-next V2 06/15] net/mlx5e: Split SW group counters update function Saeed Mahameed
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-03  4:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, netdev, Eran Ben Elisha, Tariq Toukan, Saeed Mahameed

From: Eran Ben Elisha <eranbe@nvidia.com>

The skb fifo push/pop API used pre-defined attributes within
mlx5e_txqsq.
In order to share the skb fifo API with other non-SQ use cases,
change the API to take the newly defined mlx5e_skb_fifo struct as input.
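
The shape of such a fifo can be sketched in plain userspace C. This is a
minimal illustration under assumptions, not the driver code: the demo_*
names are hypothetical, void pointers stand in for skbs, and the ring size
is assumed to be a power of two so that masking the free-running 16-bit
counters yields a valid slot index.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative fifo descriptor; the counters live in the owner. */
struct demo_fifo {
	void     **slots; /* ring storage, owned by the caller */
	uint16_t  *pc;    /* producer counter, owned by the caller */
	uint16_t  *cc;    /* consumer counter, owned by the caller */
	uint16_t   mask;  /* ring size - 1 */
};

void demo_fifo_init(struct demo_fifo *f, void **slots, uint16_t size,
		    uint16_t *pc, uint16_t *cc)
{
	/* Masking only works for power-of-two sizes. */
	assert(size && (size & (size - 1)) == 0);
	f->slots = slots;
	f->pc    = pc;
	f->cc    = cc;
	f->mask  = size - 1;
}

void **demo_fifo_get(struct demo_fifo *f, uint16_t i)
{
	return &f->slots[i & f->mask];
}

/* No fullness check: the caller must not overrun the ring. */
void demo_fifo_push(struct demo_fifo *f, void *item)
{
	*demo_fifo_get(f, (*f->pc)++) = item;
}

/* No emptiness check: the caller must only pop what was pushed. */
void *demo_fifo_pop(struct demo_fifo *f)
{
	return *demo_fifo_get(f, (*f->cc)++);
}
```

Because the descriptor only points at the counters, any object that owns a
pair of 16-bit counters and a slot array can reuse the same push/pop
helpers.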

Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      | 10 ++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h | 15 +++++++++------
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 13 ++++++++-----
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   |  6 +++---
 4 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index c014e8ff66aa..7f3bd3d406b3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -331,6 +331,13 @@ struct mlx5e_tx_mpwqe {
 	u8 inline_on;
 };
 
+struct mlx5e_skb_fifo {
+	struct sk_buff **fifo;
+	u16 *pc;
+	u16 *cc;
+	u16 mask;
+};
+
 struct mlx5e_txqsq {
 	/* data path */
 
@@ -351,11 +358,10 @@ struct mlx5e_txqsq {
 	/* read only */
 	struct mlx5_wq_cyc         wq;
 	u32                        dma_fifo_mask;
-	u16                        skb_fifo_mask;
 	struct mlx5e_sq_stats     *stats;
 	struct {
 		struct mlx5e_sq_dma       *dma_fifo;
-		struct sk_buff           **skb_fifo;
+		struct mlx5e_skb_fifo      skb_fifo;
 		struct mlx5e_tx_wqe_info  *wqe_info;
 	} db;
 	void __iomem              *uar_map;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
index ac47efaaebd5..115ab19ffab1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
@@ -250,21 +250,24 @@ mlx5e_dma_push(struct mlx5e_txqsq *sq, dma_addr_t addr, u32 size,
 	dma->type = map_type;
 }
 
-static inline struct sk_buff **mlx5e_skb_fifo_get(struct mlx5e_txqsq *sq, u16 i)
+static inline
+struct sk_buff **mlx5e_skb_fifo_get(struct mlx5e_skb_fifo *fifo, u16 i)
 {
-	return &sq->db.skb_fifo[i & sq->skb_fifo_mask];
+	return &fifo->fifo[i & fifo->mask];
 }
 
-static inline void mlx5e_skb_fifo_push(struct mlx5e_txqsq *sq, struct sk_buff *skb)
+static inline
+void mlx5e_skb_fifo_push(struct mlx5e_skb_fifo *fifo, struct sk_buff *skb)
 {
-	struct sk_buff **skb_item = mlx5e_skb_fifo_get(sq, sq->skb_fifo_pc++);
+	struct sk_buff **skb_item = mlx5e_skb_fifo_get(fifo, (*fifo->pc)++);
 
 	*skb_item = skb;
 }
 
-static inline struct sk_buff *mlx5e_skb_fifo_pop(struct mlx5e_txqsq *sq)
+static inline
+struct sk_buff *mlx5e_skb_fifo_pop(struct mlx5e_skb_fifo *fifo)
 {
-	return *mlx5e_skb_fifo_get(sq, sq->skb_fifo_cc++);
+	return *mlx5e_skb_fifo_get(fifo, (*fifo->cc)++);
 }
 
 static inline void
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 38506b8b6f82..3ea15d62acd9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1086,7 +1086,7 @@ static void mlx5e_free_icosq(struct mlx5e_icosq *sq)
 static void mlx5e_free_txqsq_db(struct mlx5e_txqsq *sq)
 {
 	kvfree(sq->db.wqe_info);
-	kvfree(sq->db.skb_fifo);
+	kvfree(sq->db.skb_fifo.fifo);
 	kvfree(sq->db.dma_fifo);
 }
 
@@ -1098,19 +1098,22 @@ static int mlx5e_alloc_txqsq_db(struct mlx5e_txqsq *sq, int numa)
 	sq->db.dma_fifo = kvzalloc_node(array_size(df_sz,
 						   sizeof(*sq->db.dma_fifo)),
 					GFP_KERNEL, numa);
-	sq->db.skb_fifo = kvzalloc_node(array_size(df_sz,
-						   sizeof(*sq->db.skb_fifo)),
+	sq->db.skb_fifo.fifo = kvzalloc_node(array_size(df_sz,
+							sizeof(*sq->db.skb_fifo.fifo)),
 					GFP_KERNEL, numa);
 	sq->db.wqe_info = kvzalloc_node(array_size(wq_sz,
 						   sizeof(*sq->db.wqe_info)),
 					GFP_KERNEL, numa);
-	if (!sq->db.dma_fifo || !sq->db.skb_fifo || !sq->db.wqe_info) {
+	if (!sq->db.dma_fifo || !sq->db.skb_fifo.fifo || !sq->db.wqe_info) {
 		mlx5e_free_txqsq_db(sq);
 		return -ENOMEM;
 	}
 
 	sq->dma_fifo_mask = df_sz - 1;
-	sq->skb_fifo_mask = df_sz - 1;
+
+	sq->db.skb_fifo.pc   = &sq->skb_fifo_pc;
+	sq->db.skb_fifo.cc   = &sq->skb_fifo_cc;
+	sq->db.skb_fifo.mask = df_sz - 1;
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 14af7488cc4f..c6b20b77a0f2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -572,7 +572,7 @@ mlx5e_sq_xmit_mpwqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 		goto err_unmap;
 	mlx5e_dma_push(sq, txd.dma_addr, txd.len, MLX5E_DMA_MAP_SINGLE);
 
-	mlx5e_skb_fifo_push(sq, skb);
+	mlx5e_skb_fifo_push(&sq->db.skb_fifo, skb);
 
 	mlx5e_tx_mpwqe_add_dseg(sq, &txd);
 
@@ -711,7 +711,7 @@ static void mlx5e_tx_wi_consume_fifo_skbs(struct mlx5e_txqsq *sq, struct mlx5e_t
 	int i;
 
 	for (i = 0; i < wi->num_fifo_pkts; i++) {
-		struct sk_buff *skb = mlx5e_skb_fifo_pop(sq);
+		struct sk_buff *skb = mlx5e_skb_fifo_pop(&sq->db.skb_fifo);
 
 		mlx5e_consume_skb(sq, skb, cqe, napi_budget);
 	}
@@ -831,7 +831,7 @@ static void mlx5e_tx_wi_kfree_fifo_skbs(struct mlx5e_txqsq *sq, struct mlx5e_tx_
 	int i;
 
 	for (i = 0; i < wi->num_fifo_pkts; i++)
-		dev_kfree_skb_any(mlx5e_skb_fifo_pop(sq));
+		dev_kfree_skb_any(mlx5e_skb_fifo_pop(&sq->db.skb_fifo));
 }
 
 void mlx5e_free_txqsq_descs(struct mlx5e_txqsq *sq)
-- 
2.26.2



* [net-next V2 06/15] net/mlx5e: Split SW group counters update function
  2020-12-03  4:20 [pull request][net-next V2 00/15] mlx5 updates 2020-12-01 Saeed Mahameed
                   ` (4 preceding siblings ...)
  2020-12-03  4:20 ` [net-next V2 05/15] net/mlx5e: Change skb fifo push/pop API to be used without SQ Saeed Mahameed
@ 2020-12-03  4:20 ` Saeed Mahameed
  2020-12-03  4:21 ` [net-next V2 07/15] net/mlx5e: Move MLX5E_RX_ERR_CQE macro Saeed Mahameed
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-03  4:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, netdev, Eran Ben Elisha, Tariq Toukan, Saeed Mahameed

From: Eran Ben Elisha <eranbe@nvidia.com>

The SW group counter update function aggregates SW stats from the many
mlx5e_*_stats structs that reside in a given mlx5e_channel_stats struct.
Split the function into a few helper functions.

These helpers will be used later in the series to aggregate specific
mlx5e_*_stats which are not defined inside mlx5e_channel_stats.
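
The refactoring pattern can be sketched in userspace C as follows. The
demo_* structs and their two counters are hypothetical, heavily reduced
stand-ins chosen for the example. The point is only that each per-type
helper takes the stats struct it sums, so it can also be called on a
stats block that lives outside the per-channel array.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_TC 2 /* illustrative number of SQs per channel */

struct demo_rq_stats { uint64_t packets, bytes; };
struct demo_sq_stats { uint64_t packets, bytes; };

struct demo_channel_stats {
	struct demo_rq_stats rq;
	struct demo_sq_stats sq[DEMO_MAX_TC];
};

struct demo_sw_stats {
	uint64_t rx_packets, rx_bytes;
	uint64_t tx_packets, tx_bytes;
};

/* One helper per stats type; each adds into the aggregate. */
void demo_update_rq(struct demo_sw_stats *s, const struct demo_rq_stats *rq)
{
	s->rx_packets += rq->packets;
	s->rx_bytes   += rq->bytes;
}

void demo_update_sq(struct demo_sw_stats *s, const struct demo_sq_stats *sq)
{
	s->tx_packets += sq->packets;
	s->tx_bytes   += sq->bytes;
}

/* The top-level loop only walks the channels and calls the helpers.
 * The caller is expected to zero *s beforehand (assumption of this sketch).
 */
void demo_update_sw(struct demo_sw_stats *s,
		    const struct demo_channel_stats *ch, int nch)
{
	int i, j;

	assert(s && ch && nch >= 0);
	for (i = 0; i < nch; i++) {
		demo_update_rq(s, &ch[i].rq);
		for (j = 0; j < DEMO_MAX_TC; j++)
			demo_update_sq(s, &ch[i].sq[j]);
	}
}
```

A stats block that is not part of demo_channel_stats can then be folded in
with a direct call such as demo_update_sq(&s, &extra).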

Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en_stats.c    | 288 ++++++++++--------
 1 file changed, 161 insertions(+), 127 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 78f6a6f0a7e0..ebfb47a09128 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -248,6 +248,160 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(sw)
 	return idx;
 }
 
+static void mlx5e_stats_grp_sw_update_stats_xdp_red(struct mlx5e_sw_stats *s,
+						    struct mlx5e_xdpsq_stats *xdpsq_red_stats)
+{
+	s->tx_xdp_xmit  += xdpsq_red_stats->xmit;
+	s->tx_xdp_mpwqe += xdpsq_red_stats->mpwqe;
+	s->tx_xdp_inlnw += xdpsq_red_stats->inlnw;
+	s->tx_xdp_nops  += xdpsq_red_stats->nops;
+	s->tx_xdp_full  += xdpsq_red_stats->full;
+	s->tx_xdp_err   += xdpsq_red_stats->err;
+	s->tx_xdp_cqes  += xdpsq_red_stats->cqes;
+}
+
+static void mlx5e_stats_grp_sw_update_stats_xdpsq(struct mlx5e_sw_stats *s,
+						  struct mlx5e_xdpsq_stats *xdpsq_stats)
+{
+	s->rx_xdp_tx_xmit  += xdpsq_stats->xmit;
+	s->rx_xdp_tx_mpwqe += xdpsq_stats->mpwqe;
+	s->rx_xdp_tx_inlnw += xdpsq_stats->inlnw;
+	s->rx_xdp_tx_nops  += xdpsq_stats->nops;
+	s->rx_xdp_tx_full  += xdpsq_stats->full;
+	s->rx_xdp_tx_err   += xdpsq_stats->err;
+	s->rx_xdp_tx_cqe   += xdpsq_stats->cqes;
+}
+
+static void mlx5e_stats_grp_sw_update_stats_xsksq(struct mlx5e_sw_stats *s,
+						  struct mlx5e_xdpsq_stats *xsksq_stats)
+{
+	s->tx_xsk_xmit  += xsksq_stats->xmit;
+	s->tx_xsk_mpwqe += xsksq_stats->mpwqe;
+	s->tx_xsk_inlnw += xsksq_stats->inlnw;
+	s->tx_xsk_full  += xsksq_stats->full;
+	s->tx_xsk_err   += xsksq_stats->err;
+	s->tx_xsk_cqes  += xsksq_stats->cqes;
+}
+
+static void mlx5e_stats_grp_sw_update_stats_xskrq(struct mlx5e_sw_stats *s,
+						  struct mlx5e_rq_stats *xskrq_stats)
+{
+	s->rx_xsk_packets                += xskrq_stats->packets;
+	s->rx_xsk_bytes                  += xskrq_stats->bytes;
+	s->rx_xsk_csum_complete          += xskrq_stats->csum_complete;
+	s->rx_xsk_csum_unnecessary       += xskrq_stats->csum_unnecessary;
+	s->rx_xsk_csum_unnecessary_inner += xskrq_stats->csum_unnecessary_inner;
+	s->rx_xsk_csum_none              += xskrq_stats->csum_none;
+	s->rx_xsk_ecn_mark               += xskrq_stats->ecn_mark;
+	s->rx_xsk_removed_vlan_packets   += xskrq_stats->removed_vlan_packets;
+	s->rx_xsk_xdp_drop               += xskrq_stats->xdp_drop;
+	s->rx_xsk_xdp_redirect           += xskrq_stats->xdp_redirect;
+	s->rx_xsk_wqe_err                += xskrq_stats->wqe_err;
+	s->rx_xsk_mpwqe_filler_cqes      += xskrq_stats->mpwqe_filler_cqes;
+	s->rx_xsk_mpwqe_filler_strides   += xskrq_stats->mpwqe_filler_strides;
+	s->rx_xsk_oversize_pkts_sw_drop  += xskrq_stats->oversize_pkts_sw_drop;
+	s->rx_xsk_buff_alloc_err         += xskrq_stats->buff_alloc_err;
+	s->rx_xsk_cqe_compress_blks      += xskrq_stats->cqe_compress_blks;
+	s->rx_xsk_cqe_compress_pkts      += xskrq_stats->cqe_compress_pkts;
+	s->rx_xsk_congst_umr             += xskrq_stats->congst_umr;
+	s->rx_xsk_arfs_err               += xskrq_stats->arfs_err;
+}
+
+static void mlx5e_stats_grp_sw_update_stats_rq_stats(struct mlx5e_sw_stats *s,
+						     struct mlx5e_rq_stats *rq_stats)
+{
+	s->rx_packets                 += rq_stats->packets;
+	s->rx_bytes                   += rq_stats->bytes;
+	s->rx_lro_packets             += rq_stats->lro_packets;
+	s->rx_lro_bytes               += rq_stats->lro_bytes;
+	s->rx_ecn_mark                += rq_stats->ecn_mark;
+	s->rx_removed_vlan_packets    += rq_stats->removed_vlan_packets;
+	s->rx_csum_none               += rq_stats->csum_none;
+	s->rx_csum_complete           += rq_stats->csum_complete;
+	s->rx_csum_complete_tail      += rq_stats->csum_complete_tail;
+	s->rx_csum_complete_tail_slow += rq_stats->csum_complete_tail_slow;
+	s->rx_csum_unnecessary        += rq_stats->csum_unnecessary;
+	s->rx_csum_unnecessary_inner  += rq_stats->csum_unnecessary_inner;
+	s->rx_xdp_drop                += rq_stats->xdp_drop;
+	s->rx_xdp_redirect            += rq_stats->xdp_redirect;
+	s->rx_wqe_err                 += rq_stats->wqe_err;
+	s->rx_mpwqe_filler_cqes       += rq_stats->mpwqe_filler_cqes;
+	s->rx_mpwqe_filler_strides    += rq_stats->mpwqe_filler_strides;
+	s->rx_oversize_pkts_sw_drop   += rq_stats->oversize_pkts_sw_drop;
+	s->rx_buff_alloc_err          += rq_stats->buff_alloc_err;
+	s->rx_cqe_compress_blks       += rq_stats->cqe_compress_blks;
+	s->rx_cqe_compress_pkts       += rq_stats->cqe_compress_pkts;
+	s->rx_cache_reuse             += rq_stats->cache_reuse;
+	s->rx_cache_full              += rq_stats->cache_full;
+	s->rx_cache_empty             += rq_stats->cache_empty;
+	s->rx_cache_busy              += rq_stats->cache_busy;
+	s->rx_cache_waive             += rq_stats->cache_waive;
+	s->rx_congst_umr              += rq_stats->congst_umr;
+	s->rx_arfs_err                += rq_stats->arfs_err;
+	s->rx_recover                 += rq_stats->recover;
+#ifdef CONFIG_MLX5_EN_TLS
+	s->rx_tls_decrypted_packets   += rq_stats->tls_decrypted_packets;
+	s->rx_tls_decrypted_bytes     += rq_stats->tls_decrypted_bytes;
+	s->rx_tls_ctx                 += rq_stats->tls_ctx;
+	s->rx_tls_del                 += rq_stats->tls_del;
+	s->rx_tls_resync_req_pkt      += rq_stats->tls_resync_req_pkt;
+	s->rx_tls_resync_req_start    += rq_stats->tls_resync_req_start;
+	s->rx_tls_resync_req_end      += rq_stats->tls_resync_req_end;
+	s->rx_tls_resync_req_skip     += rq_stats->tls_resync_req_skip;
+	s->rx_tls_resync_res_ok       += rq_stats->tls_resync_res_ok;
+	s->rx_tls_resync_res_skip     += rq_stats->tls_resync_res_skip;
+	s->rx_tls_err                 += rq_stats->tls_err;
+#endif
+}
+
+static void mlx5e_stats_grp_sw_update_stats_ch_stats(struct mlx5e_sw_stats *s,
+						     struct mlx5e_ch_stats *ch_stats)
+{
+	s->ch_events      += ch_stats->events;
+	s->ch_poll        += ch_stats->poll;
+	s->ch_arm         += ch_stats->arm;
+	s->ch_aff_change  += ch_stats->aff_change;
+	s->ch_force_irq   += ch_stats->force_irq;
+	s->ch_eq_rearm    += ch_stats->eq_rearm;
+}
+
+static void mlx5e_stats_grp_sw_update_stats_sq(struct mlx5e_sw_stats *s,
+					       struct mlx5e_sq_stats *sq_stats)
+{
+	s->tx_packets               += sq_stats->packets;
+	s->tx_bytes                 += sq_stats->bytes;
+	s->tx_tso_packets           += sq_stats->tso_packets;
+	s->tx_tso_bytes             += sq_stats->tso_bytes;
+	s->tx_tso_inner_packets     += sq_stats->tso_inner_packets;
+	s->tx_tso_inner_bytes       += sq_stats->tso_inner_bytes;
+	s->tx_added_vlan_packets    += sq_stats->added_vlan_packets;
+	s->tx_nop                   += sq_stats->nop;
+	s->tx_mpwqe_blks            += sq_stats->mpwqe_blks;
+	s->tx_mpwqe_pkts            += sq_stats->mpwqe_pkts;
+	s->tx_queue_stopped         += sq_stats->stopped;
+	s->tx_queue_wake            += sq_stats->wake;
+	s->tx_queue_dropped         += sq_stats->dropped;
+	s->tx_cqe_err               += sq_stats->cqe_err;
+	s->tx_recover               += sq_stats->recover;
+	s->tx_xmit_more             += sq_stats->xmit_more;
+	s->tx_csum_partial_inner    += sq_stats->csum_partial_inner;
+	s->tx_csum_none             += sq_stats->csum_none;
+	s->tx_csum_partial          += sq_stats->csum_partial;
+#ifdef CONFIG_MLX5_EN_TLS
+	s->tx_tls_encrypted_packets += sq_stats->tls_encrypted_packets;
+	s->tx_tls_encrypted_bytes   += sq_stats->tls_encrypted_bytes;
+	s->tx_tls_ctx               += sq_stats->tls_ctx;
+	s->tx_tls_ooo               += sq_stats->tls_ooo;
+	s->tx_tls_dump_bytes        += sq_stats->tls_dump_bytes;
+	s->tx_tls_dump_packets      += sq_stats->tls_dump_packets;
+	s->tx_tls_resync_bytes      += sq_stats->tls_resync_bytes;
+	s->tx_tls_skip_no_sync_data += sq_stats->tls_skip_no_sync_data;
+	s->tx_tls_drop_no_sync_data += sq_stats->tls_drop_no_sync_data;
+	s->tx_tls_drop_bypass_req   += sq_stats->tls_drop_bypass_req;
+#endif
+	s->tx_cqes                  += sq_stats->cqes;
+}
+
 static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
 {
 	struct mlx5e_sw_stats *s = &priv->stats.sw;
@@ -258,139 +412,19 @@ static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
 	for (i = 0; i < priv->max_nch; i++) {
 		struct mlx5e_channel_stats *channel_stats =
 			&priv->channel_stats[i];
-		struct mlx5e_xdpsq_stats *xdpsq_red_stats = &channel_stats->xdpsq;
-		struct mlx5e_xdpsq_stats *xdpsq_stats = &channel_stats->rq_xdpsq;
-		struct mlx5e_xdpsq_stats *xsksq_stats = &channel_stats->xsksq;
-		struct mlx5e_rq_stats *xskrq_stats = &channel_stats->xskrq;
-		struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
-		struct mlx5e_ch_stats *ch_stats = &channel_stats->ch;
 		int j;
 
-		s->rx_packets	+= rq_stats->packets;
-		s->rx_bytes	+= rq_stats->bytes;
-		s->rx_lro_packets += rq_stats->lro_packets;
-		s->rx_lro_bytes	+= rq_stats->lro_bytes;
-		s->rx_ecn_mark	+= rq_stats->ecn_mark;
-		s->rx_removed_vlan_packets += rq_stats->removed_vlan_packets;
-		s->rx_csum_none	+= rq_stats->csum_none;
-		s->rx_csum_complete += rq_stats->csum_complete;
-		s->rx_csum_complete_tail += rq_stats->csum_complete_tail;
-		s->rx_csum_complete_tail_slow += rq_stats->csum_complete_tail_slow;
-		s->rx_csum_unnecessary += rq_stats->csum_unnecessary;
-		s->rx_csum_unnecessary_inner += rq_stats->csum_unnecessary_inner;
-		s->rx_xdp_drop     += rq_stats->xdp_drop;
-		s->rx_xdp_redirect += rq_stats->xdp_redirect;
-		s->rx_xdp_tx_xmit  += xdpsq_stats->xmit;
-		s->rx_xdp_tx_mpwqe += xdpsq_stats->mpwqe;
-		s->rx_xdp_tx_inlnw += xdpsq_stats->inlnw;
-		s->rx_xdp_tx_nops  += xdpsq_stats->nops;
-		s->rx_xdp_tx_full  += xdpsq_stats->full;
-		s->rx_xdp_tx_err   += xdpsq_stats->err;
-		s->rx_xdp_tx_cqe   += xdpsq_stats->cqes;
-		s->rx_wqe_err   += rq_stats->wqe_err;
-		s->rx_mpwqe_filler_cqes    += rq_stats->mpwqe_filler_cqes;
-		s->rx_mpwqe_filler_strides += rq_stats->mpwqe_filler_strides;
-		s->rx_oversize_pkts_sw_drop += rq_stats->oversize_pkts_sw_drop;
-		s->rx_buff_alloc_err += rq_stats->buff_alloc_err;
-		s->rx_cqe_compress_blks += rq_stats->cqe_compress_blks;
-		s->rx_cqe_compress_pkts += rq_stats->cqe_compress_pkts;
-		s->rx_cache_reuse += rq_stats->cache_reuse;
-		s->rx_cache_full  += rq_stats->cache_full;
-		s->rx_cache_empty += rq_stats->cache_empty;
-		s->rx_cache_busy  += rq_stats->cache_busy;
-		s->rx_cache_waive += rq_stats->cache_waive;
-		s->rx_congst_umr  += rq_stats->congst_umr;
-		s->rx_arfs_err    += rq_stats->arfs_err;
-		s->rx_recover     += rq_stats->recover;
-#ifdef CONFIG_MLX5_EN_TLS
-		s->rx_tls_decrypted_packets += rq_stats->tls_decrypted_packets;
-		s->rx_tls_decrypted_bytes   += rq_stats->tls_decrypted_bytes;
-		s->rx_tls_ctx               += rq_stats->tls_ctx;
-		s->rx_tls_del               += rq_stats->tls_del;
-		s->rx_tls_resync_req_pkt    += rq_stats->tls_resync_req_pkt;
-		s->rx_tls_resync_req_start  += rq_stats->tls_resync_req_start;
-		s->rx_tls_resync_req_end    += rq_stats->tls_resync_req_end;
-		s->rx_tls_resync_req_skip   += rq_stats->tls_resync_req_skip;
-		s->rx_tls_resync_res_ok     += rq_stats->tls_resync_res_ok;
-		s->rx_tls_resync_res_skip   += rq_stats->tls_resync_res_skip;
-		s->rx_tls_err               += rq_stats->tls_err;
-#endif
-		s->ch_events      += ch_stats->events;
-		s->ch_poll        += ch_stats->poll;
-		s->ch_arm         += ch_stats->arm;
-		s->ch_aff_change  += ch_stats->aff_change;
-		s->ch_force_irq   += ch_stats->force_irq;
-		s->ch_eq_rearm    += ch_stats->eq_rearm;
+		mlx5e_stats_grp_sw_update_stats_rq_stats(s, &channel_stats->rq);
+		mlx5e_stats_grp_sw_update_stats_xdpsq(s, &channel_stats->rq_xdpsq);
+		mlx5e_stats_grp_sw_update_stats_ch_stats(s, &channel_stats->ch);
 		/* xdp redirect */
-		s->tx_xdp_xmit    += xdpsq_red_stats->xmit;
-		s->tx_xdp_mpwqe   += xdpsq_red_stats->mpwqe;
-		s->tx_xdp_inlnw   += xdpsq_red_stats->inlnw;
-		s->tx_xdp_nops	  += xdpsq_red_stats->nops;
-		s->tx_xdp_full    += xdpsq_red_stats->full;
-		s->tx_xdp_err     += xdpsq_red_stats->err;
-		s->tx_xdp_cqes    += xdpsq_red_stats->cqes;
+		mlx5e_stats_grp_sw_update_stats_xdp_red(s, &channel_stats->xdpsq);
 		/* AF_XDP zero-copy */
-		s->rx_xsk_packets                += xskrq_stats->packets;
-		s->rx_xsk_bytes                  += xskrq_stats->bytes;
-		s->rx_xsk_csum_complete          += xskrq_stats->csum_complete;
-		s->rx_xsk_csum_unnecessary       += xskrq_stats->csum_unnecessary;
-		s->rx_xsk_csum_unnecessary_inner += xskrq_stats->csum_unnecessary_inner;
-		s->rx_xsk_csum_none              += xskrq_stats->csum_none;
-		s->rx_xsk_ecn_mark               += xskrq_stats->ecn_mark;
-		s->rx_xsk_removed_vlan_packets   += xskrq_stats->removed_vlan_packets;
-		s->rx_xsk_xdp_drop               += xskrq_stats->xdp_drop;
-		s->rx_xsk_xdp_redirect           += xskrq_stats->xdp_redirect;
-		s->rx_xsk_wqe_err                += xskrq_stats->wqe_err;
-		s->rx_xsk_mpwqe_filler_cqes      += xskrq_stats->mpwqe_filler_cqes;
-		s->rx_xsk_mpwqe_filler_strides   += xskrq_stats->mpwqe_filler_strides;
-		s->rx_xsk_oversize_pkts_sw_drop  += xskrq_stats->oversize_pkts_sw_drop;
-		s->rx_xsk_buff_alloc_err         += xskrq_stats->buff_alloc_err;
-		s->rx_xsk_cqe_compress_blks      += xskrq_stats->cqe_compress_blks;
-		s->rx_xsk_cqe_compress_pkts      += xskrq_stats->cqe_compress_pkts;
-		s->rx_xsk_congst_umr             += xskrq_stats->congst_umr;
-		s->rx_xsk_arfs_err               += xskrq_stats->arfs_err;
-		s->tx_xsk_xmit                   += xsksq_stats->xmit;
-		s->tx_xsk_mpwqe                  += xsksq_stats->mpwqe;
-		s->tx_xsk_inlnw                  += xsksq_stats->inlnw;
-		s->tx_xsk_full                   += xsksq_stats->full;
-		s->tx_xsk_err                    += xsksq_stats->err;
-		s->tx_xsk_cqes                   += xsksq_stats->cqes;
+		mlx5e_stats_grp_sw_update_stats_xskrq(s, &channel_stats->xskrq);
+		mlx5e_stats_grp_sw_update_stats_xsksq(s, &channel_stats->xsksq);
 
 		for (j = 0; j < priv->max_opened_tc; j++) {
-			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[j];
-
-			s->tx_packets		+= sq_stats->packets;
-			s->tx_bytes		+= sq_stats->bytes;
-			s->tx_tso_packets	+= sq_stats->tso_packets;
-			s->tx_tso_bytes		+= sq_stats->tso_bytes;
-			s->tx_tso_inner_packets	+= sq_stats->tso_inner_packets;
-			s->tx_tso_inner_bytes	+= sq_stats->tso_inner_bytes;
-			s->tx_added_vlan_packets += sq_stats->added_vlan_packets;
-			s->tx_nop               += sq_stats->nop;
-			s->tx_mpwqe_blks        += sq_stats->mpwqe_blks;
-			s->tx_mpwqe_pkts        += sq_stats->mpwqe_pkts;
-			s->tx_queue_stopped	+= sq_stats->stopped;
-			s->tx_queue_wake	+= sq_stats->wake;
-			s->tx_queue_dropped	+= sq_stats->dropped;
-			s->tx_cqe_err		+= sq_stats->cqe_err;
-			s->tx_recover		+= sq_stats->recover;
-			s->tx_xmit_more		+= sq_stats->xmit_more;
-			s->tx_csum_partial_inner += sq_stats->csum_partial_inner;
-			s->tx_csum_none		+= sq_stats->csum_none;
-			s->tx_csum_partial	+= sq_stats->csum_partial;
-#ifdef CONFIG_MLX5_EN_TLS
-			s->tx_tls_encrypted_packets += sq_stats->tls_encrypted_packets;
-			s->tx_tls_encrypted_bytes   += sq_stats->tls_encrypted_bytes;
-			s->tx_tls_ctx               += sq_stats->tls_ctx;
-			s->tx_tls_ooo               += sq_stats->tls_ooo;
-			s->tx_tls_dump_bytes        += sq_stats->tls_dump_bytes;
-			s->tx_tls_dump_packets      += sq_stats->tls_dump_packets;
-			s->tx_tls_resync_bytes      += sq_stats->tls_resync_bytes;
-			s->tx_tls_skip_no_sync_data += sq_stats->tls_skip_no_sync_data;
-			s->tx_tls_drop_no_sync_data += sq_stats->tls_drop_no_sync_data;
-			s->tx_tls_drop_bypass_req   += sq_stats->tls_drop_bypass_req;
-#endif
-			s->tx_cqes		+= sq_stats->cqes;
+			mlx5e_stats_grp_sw_update_stats_sq(s, &channel_stats->sq[j]);
 
 			/* https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92657 */
 			barrier();
-- 
2.26.2


* [net-next V2 07/15] net/mlx5e: Move MLX5E_RX_ERR_CQE macro
  2020-12-03  4:20 [pull request][net-next V2 00/15] mlx5 updates 2020-12-01 Saeed Mahameed
                   ` (5 preceding siblings ...)
  2020-12-03  4:20 ` [net-next V2 06/15] net/mlx5e: Split SW group counters update function Saeed Mahameed
@ 2020-12-03  4:21 ` Saeed Mahameed
  2020-12-03  4:21 ` [net-next V2 08/15] net/mlx5e: Add TX PTP port object support Saeed Mahameed
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-03  4:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, netdev, Eran Ben Elisha, Tariq Toukan, Saeed Mahameed

From: Eran Ben Elisha <eranbe@nvidia.com>

The MLX5E_RX_ERR_CQE macro is used only in the data path; move it to
the appropriate header file.

Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/health.h | 2 --
 drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h   | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
index f88fbbe06995..018262d0164b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
@@ -7,8 +7,6 @@
 #include "en.h"
 #include "diag/rsc_dump.h"
 
-#define MLX5E_RX_ERR_CQE(cqe) (get_cqe_opcode(cqe) != MLX5_CQE_RESP_SEND)
-
 static inline bool cqe_syndrome_needs_recover(u8 syndrome)
 {
 	return syndrome == MLX5_CQE_SYNDROME_LOCAL_QP_OP_ERR ||
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
index 115ab19ffab1..7943eb30b837 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
@@ -24,6 +24,8 @@
 
 #define INL_HDR_START_SZ (sizeof(((struct mlx5_wqe_eth_seg *)NULL)->inline_hdr.start))
 
+#define MLX5E_RX_ERR_CQE(cqe) (get_cqe_opcode(cqe) != MLX5_CQE_RESP_SEND)
+
 enum mlx5e_icosq_wqe_type {
 	MLX5E_ICOSQ_WQE_NOP,
 	MLX5E_ICOSQ_WQE_UMR_RX,
-- 
2.26.2


* [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-03  4:20 [pull request][net-next V2 00/15] mlx5 updates 2020-12-01 Saeed Mahameed
                   ` (6 preceding siblings ...)
  2020-12-03  4:21 ` [net-next V2 07/15] net/mlx5e: Move MLX5E_RX_ERR_CQE macro Saeed Mahameed
@ 2020-12-03  4:21 ` Saeed Mahameed
  2020-12-04  2:29   ` Jakub Kicinski
  2020-12-03  4:21 ` [net-next V2 09/15] net/mlx5e: Add TX port timestamp support Saeed Mahameed
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-03  4:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, netdev, Eran Ben Elisha, Tariq Toukan, Saeed Mahameed

From: Eran Ben Elisha <eranbe@nvidia.com>

Add TX PTP port object support for better TX timestamping accuracy.
Currently, the driver supports CQE-based TX timestamping. The device
also offers TX port timestamping, which has less jitter and better
reflects the actual time at which a packet is transmitted.

Define a new driver layout called ptpsq, on which the driver creates
SQs that support TX port timestamping for their transmitted packets.
The driver identifies PTP TX skbs and steers them to these dedicated
SQs as part of the select queue ndo.
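
The steering decision described above can be modelled in user space as
follows. This is only a sketch under stated assumptions, not the
driver's select queue code: struct model_priv, model_select_queue() and
the wants_hw_tstamp argument are hypothetical stand-ins (the last one
for whatever check the driver applies to the skb's timestamp request).
Only the two map names, channel_tc2realtxq and port_ptp_tc2realtxq, are
taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_CH 4	/* hypothetical limits for the model only */
#define MODEL_MAX_TC 2

/* Hypothetical, reduced stand-in for the relevant mlx5e_priv fields. */
struct model_priv {
	bool port_ptp_opened;
	int channel_tc2realtxq[MODEL_MAX_CH][MODEL_MAX_TC];
	int port_ptp_tc2realtxq[MODEL_MAX_TC];
};

/* Pick the TX queue for a packet on channel 'ch' and traffic class 'tc'.
 * ASSUMPTION: a packet is steered to the ptpsq of its TC only when the
 * ptpsqs exist and the packet asks for a hardware timestamp; everything
 * else keeps using the regular per-channel SQ.
 */
static inline int model_select_queue(const struct model_priv *p,
				     int ch, int tc, bool wants_hw_tstamp)
{
	assert(ch >= 0 && ch < MODEL_MAX_CH);
	assert(tc >= 0 && tc < MODEL_MAX_TC);

	if (p->port_ptp_opened && wants_hw_tstamp)
		return p->port_ptp_tc2realtxq[tc];

	return p->channel_tc2realtxq[ch][tc];
}
```

In this model a packet that does not request a timestamp, or any packet
while the ptpsqs are closed, never leaves its regular queue, which
matches the opt-in nature of the feature.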

The driver holds a ptpsq per TC and reports them at
netif_set_real_num_tx_queues().
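
As a rough illustration of the resulting queue numbering, the sketch
below mirrors the ix_base computation in mlx5e_ptp_open_txqsqs() from
this patch. Both helpers are hypothetical and exist only for this
example; the queue-count rule (one extra queue per TC while the ptpsqs
are open) is an assumption inferred from the "+1" added to the txq2sq
array size, not a quote of the driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of the driver: txq index of the ptpsq
 * serving traffic class 'tc'. Mirrors ix_base + tc in
 * mlx5e_ptp_open_txqsqs(), where ix_base = num_tc * num_channels.
 */
static inline int model_ptp_txq_ix(int num_channels, int num_tc, int tc)
{
	assert(num_channels > 0 && num_tc > 0);
	assert(tc >= 0 && tc < num_tc);

	return num_tc * num_channels + tc;
}

/* Hypothetical helper: TX queue count to report to the stack.
 * ASSUMPTION: the per-TC ptpsqs are counted only while they are open.
 */
static inline int model_real_num_tx_queues(int num_channels, int num_tc,
					   bool port_ptp_opened)
{
	int num_txqs = num_channels * num_tc;

	if (port_ptp_opened)
		num_txqs += num_tc;

	return num_txqs;
}
```

With 4 channels and 2 TCs, the ptpsqs would take txq indices 8 and 9,
directly after the regular SQs, and the reported queue count would grow
from 8 to 10 once port timestamping is enabled.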

Add all functionality needed to transmit on a ptpsq and to poll the
completions received via it.

Add ptpsq to the TX reporter recover, diagnose and dump methods.

Creation of ptpsqs is disabled by default and can be enabled via the
tx_port_ts private flag.

This patch steers all timestamp-related packets to a ptpsq, but it
does not yet enable port timestamp support for it. That support is
added in the following patch.

Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  29 +-
 .../ethernet/mellanox/mlx5/core/en/params.h   |   8 +
 .../net/ethernet/mellanox/mlx5/core/en/ptp.c  | 360 ++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en/ptp.h  |  48 +++
 .../mellanox/mlx5/core/en/reporter_tx.c       | 166 ++++++--
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  |  33 ++
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  82 ++--
 .../ethernet/mellanox/mlx5/core/en_stats.c    |  96 +++++
 .../ethernet/mellanox/mlx5/core/en_stats.h    |   3 +
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   |  56 ++-
 11 files changed, 823 insertions(+), 60 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 83a67ca43a41..77961643d5a9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -25,7 +25,7 @@ mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
 		en_tx.o en_rx.o en_dim.o en_txrx.o en/xdp.o en_stats.o \
 		en_selftest.o en/port.o en/monitor_stats.o en/health.o \
 		en/reporter_tx.o en/reporter_rx.o en/params.o en/xsk/pool.o \
-		en/xsk/setup.o en/xsk/rx.o en/xsk/tx.o en/devlink.o
+		en/xsk/setup.o en/xsk/rx.o en/xsk/tx.o en/devlink.o en/ptp.o
 
 #
 # Netdev extra
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 7f3bd3d406b3..6864c79d2d9a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -227,6 +227,7 @@ enum mlx5e_priv_flag {
 	MLX5E_PFLAG_RX_NO_CSUM_COMPLETE,
 	MLX5E_PFLAG_XDP_TX_MPWQE,
 	MLX5E_PFLAG_SKB_TX_MPWQE,
+	MLX5E_PFLAG_TX_PORT_TS,
 	MLX5E_NUM_PFLAGS, /* Keep last */
 };
 
@@ -338,6 +339,8 @@ struct mlx5e_skb_fifo {
 	u16 mask;
 };
 
+struct mlx5e_ptpsq;
+
 struct mlx5e_txqsq {
 	/* data path */
 
@@ -385,6 +388,7 @@ struct mlx5e_txqsq {
 	int                        txq_ix;
 	u32                        rate_limit;
 	struct work_struct         recover_work;
+	struct mlx5e_ptpsq        *ptpsq;
 } ____cacheline_aligned_in_smp;
 
 struct mlx5e_dma_info {
@@ -692,8 +696,11 @@ struct mlx5e_channel {
 	int                        cpu;
 };
 
+struct mlx5e_port_ptp;
+
 struct mlx5e_channels {
 	struct mlx5e_channel **c;
+	struct mlx5e_port_ptp  *port_ptp;
 	unsigned int           num;
 	struct mlx5e_params    params;
 };
@@ -708,6 +715,11 @@ struct mlx5e_channel_stats {
 	struct mlx5e_xdpsq_stats xsksq;
 } ____cacheline_aligned_in_smp;
 
+struct mlx5e_port_ptp_stats {
+	struct mlx5e_ch_stats ch;
+	struct mlx5e_sq_stats sq[MLX5E_MAX_NUM_TC];
+} ____cacheline_aligned_in_smp;
+
 enum {
 	MLX5E_STATE_OPENED,
 	MLX5E_STATE_DESTROYING,
@@ -777,8 +789,10 @@ struct mlx5e_scratchpad {
 
 struct mlx5e_priv {
 	/* priv data path fields - start */
-	struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC];
+	/* +1 for port ptp ts */
+	struct mlx5e_txqsq *txq2sq[(MLX5E_MAX_NUM_CHANNELS + 1) * MLX5E_MAX_NUM_TC];
 	int channel_tc2realtxq[MLX5E_MAX_NUM_CHANNELS][MLX5E_MAX_NUM_TC];
+	int port_ptp_tc2realtxq[MLX5E_MAX_NUM_TC];
 #ifdef CONFIG_MLX5_CORE_EN_DCB
 	struct mlx5e_dcbx_dp       dcbx_dp;
 #endif
@@ -813,12 +827,15 @@ struct mlx5e_priv {
 	struct net_device         *netdev;
 	struct mlx5e_stats         stats;
 	struct mlx5e_channel_stats channel_stats[MLX5E_MAX_NUM_CHANNELS];
+	struct mlx5e_port_ptp_stats port_ptp_stats;
 	u16                        max_nch;
 	u8                         max_opened_tc;
+	bool                       port_ptp_opened;
 	struct hwtstamp_config     tstamp;
 	u16                        q_counter;
 	u16                        drop_rq_q_counter;
 	struct notifier_block      events_nb;
+	int                        num_tc_x_num_ch;
 
 	struct udp_tunnel_nic_info nic_info;
 #ifdef CONFIG_MLX5_CORE_EN_DCB
@@ -993,7 +1010,17 @@ void mlx5e_deactivate_icosq(struct mlx5e_icosq *icosq);
 int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
 		    struct mlx5e_modify_sq_param *p);
 void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq);
+void mlx5e_deactivate_txqsq(struct mlx5e_txqsq *sq);
+void mlx5e_free_txqsq(struct mlx5e_txqsq *sq);
 void mlx5e_tx_disable_queue(struct netdev_queue *txq);
+int mlx5e_alloc_txqsq_db(struct mlx5e_txqsq *sq, int numa);
+void mlx5e_free_txqsq_db(struct mlx5e_txqsq *sq);
+struct mlx5e_create_sq_param;
+int mlx5e_create_sq_rdy(struct mlx5_core_dev *mdev,
+			struct mlx5e_sq_param *param,
+			struct mlx5e_create_sq_param *csp,
+			u32 *sqn);
+void mlx5e_tx_err_cqe_work(struct work_struct *recover_work);
 
 static inline bool mlx5_tx_swp_supported(struct mlx5_core_dev *mdev)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
index 187007ad3349..70e463712b7f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
@@ -41,6 +41,14 @@ struct mlx5e_channel_param {
 	struct mlx5e_sq_param      async_icosq;
 };
 
+struct mlx5e_create_sq_param {
+	struct mlx5_wq_ctrl        *wq_ctrl;
+	u32                         cqn;
+	u32                         tisn;
+	u8                          tis_lst_sz;
+	u8                          min_inline_mode;
+};
+
 static inline bool mlx5e_qid_get_ch_if_in_group(struct mlx5e_params *params,
 						u16 qid,
 						enum mlx5e_rq_group group,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
new file mode 100644
index 000000000000..8639b5104df7
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
@@ -0,0 +1,360 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+// Copyright (c) 2020 Mellanox Technologies
+
+#include "en/ptp.h"
+#include "en/txrx.h"
+#include "lib/clock.h"
+
+static int mlx5e_ptp_napi_poll(struct napi_struct *napi, int budget)
+{
+	struct mlx5e_port_ptp *c = container_of(napi, struct mlx5e_port_ptp,
+						napi);
+	struct mlx5e_ch_stats *ch_stats = c->stats;
+	bool busy = false;
+	int work_done = 0;
+	int i;
+
+	rcu_read_lock();
+
+	ch_stats->poll++;
+
+	for (i = 0; i < c->num_tc; i++)
+		busy |= mlx5e_poll_tx_cq(&c->ptpsq[i].txqsq.cq, budget);
+
+	if (busy) {
+		work_done = budget;
+		goto out;
+	}
+
+	if (unlikely(!napi_complete_done(napi, work_done)))
+		goto out;
+
+	ch_stats->arm++;
+
+	for (i = 0; i < c->num_tc; i++)
+		mlx5e_cq_arm(&c->ptpsq[i].txqsq.cq);
+
+out:
+	rcu_read_unlock();
+
+	return work_done;
+}
+
+static int mlx5e_ptp_alloc_txqsq(struct mlx5e_port_ptp *c, int txq_ix,
+				 struct mlx5e_params *params,
+				 struct mlx5e_sq_param *param,
+				 struct mlx5e_txqsq *sq, int tc,
+				 struct mlx5e_ptpsq *ptpsq)
+{
+	void *sqc_wq               = MLX5_ADDR_OF(sqc, param->sqc, wq);
+	struct mlx5_core_dev *mdev = c->mdev;
+	struct mlx5_wq_cyc *wq = &sq->wq;
+	int err;
+	int node;
+
+	sq->pdev      = c->pdev;
+	sq->tstamp    = c->tstamp;
+	sq->clock     = &mdev->clock;
+	sq->mkey_be   = c->mkey_be;
+	sq->netdev    = c->netdev;
+	sq->priv      = c->priv;
+	sq->mdev      = mdev;
+	sq->ch_ix     = c->ix;
+	sq->txq_ix    = txq_ix;
+	sq->uar_map   = mdev->mlx5e_res.bfreg.map;
+	sq->min_inline_mode = params->tx_min_inline_mode;
+	sq->hw_mtu    = MLX5E_SW2HW_MTU(params, params->sw_mtu);
+	sq->stats     = &c->priv->port_ptp_stats.sq[tc];
+	sq->ptpsq     = ptpsq;
+	INIT_WORK(&sq->recover_work, mlx5e_tx_err_cqe_work);
+	if (!MLX5_CAP_ETH(mdev, wqe_vlan_insert))
+		set_bit(MLX5E_SQ_STATE_VLAN_NEED_L2_INLINE, &sq->state);
+	sq->stop_room = param->stop_room;
+
+	node = dev_to_node(mlx5_core_dma_dev(mdev));
+
+	param->wq.db_numa_node = node;
+	err = mlx5_wq_cyc_create(mdev, &param->wq, sqc_wq, wq, &sq->wq_ctrl);
+	if (err)
+		return err;
+	wq->db    = &wq->db[MLX5_SND_DBR];
+
+	err = mlx5e_alloc_txqsq_db(sq, node);
+	if (err)
+		goto err_sq_wq_destroy;
+
+	return 0;
+
+err_sq_wq_destroy:
+	mlx5_wq_destroy(&sq->wq_ctrl);
+
+	return err;
+}
+
+static void mlx5e_ptp_destroy_sq(struct mlx5_core_dev *mdev, u32 sqn)
+{
+	mlx5_core_destroy_sq(mdev, sqn);
+}
+
+static int mlx5e_ptp_open_txqsq(struct mlx5e_port_ptp *c, u32 tisn,
+				int txq_ix, struct mlx5e_ptp_params *cparams,
+				int tc, struct mlx5e_ptpsq *ptpsq)
+{
+	struct mlx5e_sq_param *sqp = &cparams->txq_sq_param;
+	struct mlx5e_txqsq *txqsq = &ptpsq->txqsq;
+	struct mlx5e_create_sq_param csp = {};
+	int err;
+
+	err = mlx5e_ptp_alloc_txqsq(c, txq_ix, &cparams->params, sqp,
+				    txqsq, tc, ptpsq);
+	if (err)
+		return err;
+
+	csp.tisn            = tisn;
+	csp.tis_lst_sz      = 1;
+	csp.cqn             = txqsq->cq.mcq.cqn;
+	csp.wq_ctrl         = &txqsq->wq_ctrl;
+	csp.min_inline_mode = txqsq->min_inline_mode;
+
+	err = mlx5e_create_sq_rdy(c->mdev, sqp, &csp, &txqsq->sqn);
+	if (err)
+		goto err_free_txqsq;
+
+	return 0;
+
+err_free_txqsq:
+	mlx5e_free_txqsq(txqsq);
+
+	return err;
+}
+
+static void mlx5e_ptp_close_txqsq(struct mlx5e_ptpsq *ptpsq)
+{
+	struct mlx5e_txqsq *sq = &ptpsq->txqsq;
+	struct mlx5_core_dev *mdev = sq->mdev;
+
+	cancel_work_sync(&sq->recover_work);
+	mlx5e_ptp_destroy_sq(mdev, sq->sqn);
+	mlx5e_free_txqsq_descs(sq);
+	mlx5e_free_txqsq(sq);
+}
+
+static int mlx5e_ptp_open_txqsqs(struct mlx5e_port_ptp *c,
+				 struct mlx5e_ptp_params *cparams)
+{
+	struct mlx5e_params *params = &cparams->params;
+	int ix_base;
+	int err;
+	int tc;
+
+	ix_base = params->num_tc * params->num_channels;
+
+	for (tc = 0; tc < params->num_tc; tc++) {
+		int txq_ix = ix_base + tc;
+
+		err = mlx5e_ptp_open_txqsq(c, c->priv->tisn[c->lag_port][tc], txq_ix,
+					   cparams, tc, &c->ptpsq[tc]);
+		if (err)
+			goto close_txqsq;
+	}
+
+	return 0;
+
+close_txqsq:
+	for (--tc; tc >= 0; tc--)
+		mlx5e_ptp_close_txqsq(&c->ptpsq[tc]);
+
+	return err;
+}
+
+static void mlx5e_ptp_close_txqsqs(struct mlx5e_port_ptp *c)
+{
+	int tc;
+
+	for (tc = 0; tc < c->num_tc; tc++)
+		mlx5e_ptp_close_txqsq(&c->ptpsq[tc]);
+}
+
+static int mlx5e_ptp_open_cqs(struct mlx5e_port_ptp *c,
+			      struct mlx5e_ptp_params *cparams)
+{
+	struct mlx5e_params *params = &cparams->params;
+	struct mlx5e_create_cq_param ccp = {};
+	struct dim_cq_moder ptp_moder = {};
+	struct mlx5e_cq_param *cq_param;
+	int err;
+	int tc;
+
+	ccp.node     = dev_to_node(mlx5_core_dma_dev(c->mdev));
+	ccp.ch_stats = c->stats;
+	ccp.napi     = &c->napi;
+	ccp.ix       = c->ix;
+
+	cq_param = &cparams->txq_sq_param.cqp;
+
+	for (tc = 0; tc < params->num_tc; tc++) {
+		struct mlx5e_cq *cq = &c->ptpsq[tc].txqsq.cq;
+
+		err = mlx5e_open_cq(c->priv, ptp_moder, cq_param, &ccp, cq);
+		if (err)
+			goto out_err_txqsq_cq;
+	}
+
+	return 0;
+
+out_err_txqsq_cq:
+	for (--tc; tc >= 0; tc--)
+		mlx5e_close_cq(&c->ptpsq[tc].txqsq.cq);
+
+	return err;
+}
+
+static void mlx5e_ptp_close_cqs(struct mlx5e_port_ptp *c)
+{
+	int tc;
+
+	for (tc = 0; tc < c->num_tc; tc++)
+		mlx5e_close_cq(&c->ptpsq[tc].txqsq.cq);
+}
+
+static void mlx5e_ptp_build_sq_param(struct mlx5e_priv *priv,
+				     struct mlx5e_params *params,
+				     struct mlx5e_sq_param *param)
+{
+	void *sqc = param->sqc;
+	void *wq;
+
+	mlx5e_build_sq_param_common(priv, param);
+
+	wq = MLX5_ADDR_OF(sqc, sqc, wq);
+	MLX5_SET(wq, wq, log_wq_sz, params->log_sq_size);
+	param->stop_room = mlx5e_stop_room_for_wqe(MLX5_SEND_WQE_MAX_WQEBBS);
+	mlx5e_build_tx_cq_param(priv, params, &param->cqp);
+}
+
+static void mlx5e_ptp_build_params(struct mlx5e_port_ptp *c,
+				   struct mlx5e_ptp_params *cparams,
+				   struct mlx5e_params *orig)
+{
+	struct mlx5e_params *params = &cparams->params;
+
+	params->tx_min_inline_mode = orig->tx_min_inline_mode;
+	params->num_channels = orig->num_channels;
+	params->hard_mtu = orig->hard_mtu;
+	params->sw_mtu = orig->sw_mtu;
+	params->num_tc = orig->num_tc;
+
+	/* SQ */
+	params->log_sq_size = orig->log_sq_size;
+
+	mlx5e_ptp_build_sq_param(c->priv, params, &cparams->txq_sq_param);
+}
+
+static int mlx5e_ptp_open_queues(struct mlx5e_port_ptp *c,
+				 struct mlx5e_ptp_params *cparams)
+{
+	int err;
+
+	err = mlx5e_ptp_open_cqs(c, cparams);
+	if (err)
+		return err;
+
+	napi_enable(&c->napi);
+
+	err = mlx5e_ptp_open_txqsqs(c, cparams);
+	if (err)
+		goto disable_napi;
+
+	return 0;
+
+disable_napi:
+	napi_disable(&c->napi);
+	mlx5e_ptp_close_cqs(c);
+
+	return err;
+}
+
+static void mlx5e_ptp_close_queues(struct mlx5e_port_ptp *c)
+{
+	mlx5e_ptp_close_txqsqs(c);
+	napi_disable(&c->napi);
+	mlx5e_ptp_close_cqs(c);
+}
+
+int mlx5e_port_ptp_open(struct mlx5e_priv *priv, struct mlx5e_params *params,
+			u8 lag_port, struct mlx5e_port_ptp **cp)
+{
+	struct net_device *netdev = priv->netdev;
+	struct mlx5_core_dev *mdev = priv->mdev;
+	struct mlx5e_ptp_params *cparams;
+	struct mlx5e_port_ptp *c;
+	unsigned int irq;
+	int err;
+	int eqn;
+
+	err = mlx5_vector2eqn(priv->mdev, 0, &eqn, &irq);
+	if (err)
+		return err;
+
+	c = kvzalloc_node(sizeof(*c), GFP_KERNEL, dev_to_node(mlx5_core_dma_dev(mdev)));
+	cparams = kvzalloc(sizeof(*cparams), GFP_KERNEL);
+	if (!c || !cparams)
+		return -ENOMEM;
+
+	c->priv     = priv;
+	c->mdev     = priv->mdev;
+	c->tstamp   = &priv->tstamp;
+	c->ix       = 0;
+	c->pdev     = mlx5_core_dma_dev(priv->mdev);
+	c->netdev   = priv->netdev;
+	c->mkey_be  = cpu_to_be32(priv->mdev->mlx5e_res.mkey.key);
+	c->num_tc   = params->num_tc;
+	c->stats    = &priv->port_ptp_stats.ch;
+	c->irq_desc = irq_to_desc(irq);
+	c->lag_port = lag_port;
+
+	netif_napi_add(netdev, &c->napi, mlx5e_ptp_napi_poll, 64);
+
+	mlx5e_ptp_build_params(c, cparams, params);
+
+	err = mlx5e_ptp_open_queues(c, cparams);
+	if (unlikely(err))
+		goto err_napi_del;
+
+	*cp = c;
+
+	kvfree(cparams);
+
+	return 0;
+
+err_napi_del:
+	netif_napi_del(&c->napi);
+
+	kvfree(cparams);
+	kvfree(c);
+	return err;
+}
+
+void mlx5e_port_ptp_close(struct mlx5e_port_ptp *c)
+{
+	mlx5e_ptp_close_queues(c);
+	netif_napi_del(&c->napi);
+
+	kvfree(c);
+}
+
+void mlx5e_ptp_activate_channel(struct mlx5e_port_ptp *c)
+{
+	int tc;
+
+	for (tc = 0; tc < c->num_tc; tc++)
+		mlx5e_activate_txqsq(&c->ptpsq[tc].txqsq);
+}
+
+void mlx5e_ptp_deactivate_channel(struct mlx5e_port_ptp *c)
+{
+	int tc;
+
+	for (tc = 0; tc < c->num_tc; tc++)
+		mlx5e_deactivate_txqsq(&c->ptpsq[tc].txqsq);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h
new file mode 100644
index 000000000000..daa3b6953e3f
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2020 Mellanox Technologies. */
+
+#ifndef __MLX5_EN_PTP_H__
+#define __MLX5_EN_PTP_H__
+
+#include "en.h"
+#include "en/params.h"
+#include "en_stats.h"
+
+struct mlx5e_ptpsq {
+	struct mlx5e_txqsq       txqsq;
+};
+
+struct mlx5e_port_ptp {
+	/* data path */
+	struct mlx5e_ptpsq         ptpsq[MLX5E_MAX_NUM_TC];
+	struct napi_struct         napi;
+	struct device             *pdev;
+	struct net_device         *netdev;
+	__be32                     mkey_be;
+	u8                         num_tc;
+	u8                         lag_port;
+
+	/* data path - accessed per napi poll */
+	struct irq_desc *irq_desc;
+	struct mlx5e_ch_stats     *stats;
+
+	/* control */
+	struct mlx5e_priv         *priv;
+	struct mlx5_core_dev      *mdev;
+	struct hwtstamp_config    *tstamp;
+	DECLARE_BITMAP(state, MLX5E_CHANNEL_NUM_STATES);
+	int                        ix;
+};
+
+struct mlx5e_ptp_params {
+	struct mlx5e_params        params;
+	struct mlx5e_sq_param      txq_sq_param;
+};
+
+int mlx5e_port_ptp_open(struct mlx5e_priv *priv, struct mlx5e_params *params,
+			u8 lag_port, struct mlx5e_port_ptp **cp);
+void mlx5e_port_ptp_close(struct mlx5e_port_ptp *c);
+void mlx5e_ptp_activate_channel(struct mlx5e_port_ptp *c);
+void mlx5e_ptp_deactivate_channel(struct mlx5e_port_ptp *c);
+
+#endif /* __MLX5_EN_PTP_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 88b3b21d1068..c55a2ad10599 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -2,6 +2,7 @@
 /* Copyright (c) 2019 Mellanox Technologies. */
 
 #include "health.h"
+#include "en/ptp.h"
 
 static int mlx5e_wait_for_sq_flush(struct mlx5e_txqsq *sq)
 {
@@ -141,8 +142,8 @@ static int mlx5e_tx_reporter_recover(struct devlink_health_reporter *reporter,
 }
 
 static int
-mlx5e_tx_reporter_build_diagnose_output(struct devlink_fmsg *fmsg,
-					struct mlx5e_txqsq *sq, int tc)
+mlx5e_tx_reporter_build_diagnose_output_sq_common(struct devlink_fmsg *fmsg,
+						  struct mlx5e_txqsq *sq, int tc)
 {
 	bool stopped = netif_xmit_stopped(sq->txq);
 	struct mlx5e_priv *priv = sq->priv;
@@ -153,14 +154,6 @@ mlx5e_tx_reporter_build_diagnose_output(struct devlink_fmsg *fmsg,
 	if (err)
 		return err;
 
-	err = devlink_fmsg_obj_nest_start(fmsg);
-	if (err)
-		return err;
-
-	err = devlink_fmsg_u32_pair_put(fmsg, "channel ix", sq->ch_ix);
-	if (err)
-		return err;
-
 	err = devlink_fmsg_u32_pair_put(fmsg, "tc", tc);
 	if (err)
 		return err;
@@ -193,7 +186,24 @@ mlx5e_tx_reporter_build_diagnose_output(struct devlink_fmsg *fmsg,
 	if (err)
 		return err;
 
-	err = mlx5e_health_eq_diag_fmsg(sq->cq.mcq.eq, fmsg);
+	return mlx5e_health_eq_diag_fmsg(sq->cq.mcq.eq, fmsg);
+}
+
+static int
+mlx5e_tx_reporter_build_diagnose_output(struct devlink_fmsg *fmsg,
+					struct mlx5e_txqsq *sq, int tc)
+{
+	int err;
+
+	err = devlink_fmsg_obj_nest_start(fmsg);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u32_pair_put(fmsg, "channel ix", sq->ch_ix);
+	if (err)
+		return err;
+
+	err = mlx5e_tx_reporter_build_diagnose_output_sq_common(fmsg, sq, tc);
 	if (err)
 		return err;
 
@@ -204,49 +214,116 @@ mlx5e_tx_reporter_build_diagnose_output(struct devlink_fmsg *fmsg,
 	return 0;
 }
 
-static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
-				      struct devlink_fmsg *fmsg,
-				      struct netlink_ext_ack *extack)
+static int
+mlx5e_tx_reporter_build_diagnose_output_ptpsq(struct devlink_fmsg *fmsg,
+					      struct mlx5e_ptpsq *ptpsq, int tc)
 {
-	struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
-	struct mlx5e_txqsq *generic_sq = priv->txq2sq[0];
-	u32 sq_stride, sq_sz;
-
-	int i, tc, err = 0;
+	int err;
 
-	mutex_lock(&priv->state_lock);
+	err = devlink_fmsg_obj_nest_start(fmsg);
+	if (err)
+		return err;
 
-	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
-		goto unlock;
+	err = devlink_fmsg_string_pair_put(fmsg, "channel", "ptp");
+	if (err)
+		return err;
 
-	sq_sz = mlx5_wq_cyc_get_size(&generic_sq->wq);
-	sq_stride = MLX5_SEND_WQE_BB;
+	err = mlx5e_tx_reporter_build_diagnose_output_sq_common(fmsg,
+								&ptpsq->txqsq,
+								tc);
+	if (err)
+		return err;
 
-	err = mlx5e_health_fmsg_named_obj_nest_start(fmsg, "Common Config");
+	err = devlink_fmsg_obj_nest_end(fmsg);
 	if (err)
-		goto unlock;
+		return err;
+
+	return 0;
+}
+
+static int
+mlx5e_tx_reporter_diagnose_generic_txqsq(struct devlink_fmsg *fmsg,
+					 struct mlx5e_txqsq *txqsq)
+{
+	u32 sq_stride, sq_sz;
+	int err;
 
 	err = mlx5e_health_fmsg_named_obj_nest_start(fmsg, "SQ");
 	if (err)
-		goto unlock;
+		return err;
+
+	sq_sz = mlx5_wq_cyc_get_size(&txqsq->wq);
+	sq_stride = MLX5_SEND_WQE_BB;
 
 	err = devlink_fmsg_u64_pair_put(fmsg, "stride size", sq_stride);
 	if (err)
-		goto unlock;
+		return err;
 
 	err = devlink_fmsg_u32_pair_put(fmsg, "size", sq_sz);
 	if (err)
-		goto unlock;
+		return err;
 
-	err = mlx5e_health_cq_common_diag_fmsg(&generic_sq->cq, fmsg);
+	err = mlx5e_health_cq_common_diag_fmsg(&txqsq->cq, fmsg);
 	if (err)
-		goto unlock;
+		return err;
+
+	return mlx5e_health_fmsg_named_obj_nest_end(fmsg);
+}
+
+static int
+mlx5e_tx_reporter_diagnose_common_config(struct devlink_health_reporter *reporter,
+					 struct devlink_fmsg *fmsg)
+{
+	struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
+	struct mlx5e_txqsq *generic_sq = priv->txq2sq[0];
+	struct mlx5e_ptpsq *generic_ptpsq;
+	int err;
+
+	err = mlx5e_health_fmsg_named_obj_nest_start(fmsg, "Common Config");
+	if (err)
+		return err;
+
+	err = mlx5e_tx_reporter_diagnose_generic_txqsq(fmsg, generic_sq);
+	if (err)
+		return err;
+
+	generic_ptpsq = priv->channels.port_ptp ?
+			&priv->channels.port_ptp->ptpsq[0] :
+			NULL;
+	if (!generic_ptpsq)
+		goto out;
+
+	err = mlx5e_health_fmsg_named_obj_nest_start(fmsg, "PTP");
+	if (err)
+		return err;
+
+	err = mlx5e_tx_reporter_diagnose_generic_txqsq(fmsg, &generic_ptpsq->txqsq);
+	if (err)
+		return err;
 
 	err = mlx5e_health_fmsg_named_obj_nest_end(fmsg);
 	if (err)
+		return err;
+
+out:
+	return mlx5e_health_fmsg_named_obj_nest_end(fmsg);
+}
+
+static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
+				      struct devlink_fmsg *fmsg,
+				      struct netlink_ext_ack *extack)
+{
+	struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
+	struct mlx5e_port_ptp *ptp_ch = priv->channels.port_ptp;
+
+	int i, tc, err = 0;
+
+	mutex_lock(&priv->state_lock);
+
+	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
 		goto unlock;
 
-	err = mlx5e_health_fmsg_named_obj_nest_end(fmsg);
+	err = mlx5e_tx_reporter_diagnose_common_config(reporter, fmsg);
 	if (err)
 		goto unlock;
 
@@ -265,6 +342,19 @@ static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
 				goto unlock;
 		}
 	}
+
+	if (!ptp_ch)
+		goto close_sqs_nest;
+
+	for (tc = 0; tc < priv->channels.params.num_tc; tc++) {
+		err = mlx5e_tx_reporter_build_diagnose_output_ptpsq(fmsg,
+								    &ptp_ch->ptpsq[tc],
+								    tc);
+		if (err)
+			goto unlock;
+	}
+
+close_sqs_nest:
 	err = devlink_fmsg_arr_pair_nest_end(fmsg);
 	if (err)
 		goto unlock;
@@ -338,6 +428,7 @@ static int mlx5e_tx_reporter_dump_sq(struct mlx5e_priv *priv, struct devlink_fms
 static int mlx5e_tx_reporter_dump_all_sqs(struct mlx5e_priv *priv,
 					  struct devlink_fmsg *fmsg)
 {
+	struct mlx5e_port_ptp *ptp_ch = priv->channels.port_ptp;
 	struct mlx5_rsc_key key = {};
 	int i, tc, err;
 
@@ -373,6 +464,17 @@ static int mlx5e_tx_reporter_dump_all_sqs(struct mlx5e_priv *priv,
 				return err;
 		}
 	}
+
+	if (ptp_ch) {
+		for (tc = 0; tc < priv->channels.params.num_tc; tc++) {
+			struct mlx5e_txqsq *sq = &ptp_ch->ptpsq[tc].txqsq;
+
+			err = mlx5e_health_queue_dump(priv, fmsg, sq->sqn, "PTP SQ");
+			if (err)
+				return err;
+		}
+	}
+
 	return devlink_fmsg_arr_pair_nest_end(fmsg);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 42e61dc28ead..30542d98ab27 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1946,6 +1946,38 @@ static int set_pflag_skb_tx_mpwqe(struct net_device *netdev, bool enable)
 	return set_pflag_tx_mpwqe_common(netdev, MLX5E_PFLAG_SKB_TX_MPWQE, enable);
 }
 
+static int set_pflag_tx_port_ts(struct net_device *netdev, bool enable)
+{
+	struct mlx5e_priv *priv = netdev_priv(netdev);
+	struct mlx5_core_dev *mdev = priv->mdev;
+	struct mlx5e_channels new_channels = {};
+	int err;
+
+	if (!MLX5_CAP_GEN(mdev, ts_cqe_to_dest_cqn))
+		return -EOPNOTSUPP;
+
+	new_channels.params = priv->channels.params;
+	MLX5E_SET_PFLAG(&new_channels.params, MLX5E_PFLAG_TX_PORT_TS, enable);
+	/* No need to verify SQ stop room as
+	 * ptpsq.txqsq.stop_room <= generic_sq->stop_room, and both
+	 * have the same log_sq_size.
+	 */
+
+	if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+		priv->channels.params = new_channels.params;
+		err = mlx5e_num_channels_changed(priv);
+		goto out;
+	}
+
+	err = mlx5e_safe_switch_channels(priv, &new_channels,
+					 mlx5e_num_channels_changed_ctx, NULL);
+out:
+	if (!err)
+		priv->port_ptp_opened = true;
+
+	return err;
+}
+
 static const struct pflag_desc mlx5e_priv_flags[MLX5E_NUM_PFLAGS] = {
 	{ "rx_cqe_moder",        set_pflag_rx_cqe_based_moder },
 	{ "tx_cqe_moder",        set_pflag_tx_cqe_based_moder },
@@ -1954,6 +1986,7 @@ static const struct pflag_desc mlx5e_priv_flags[MLX5E_NUM_PFLAGS] = {
 	{ "rx_no_csum_complete", set_pflag_rx_no_csum_complete },
 	{ "xdp_tx_mpwqe",        set_pflag_xdp_tx_mpwqe },
 	{ "skb_tx_mpwqe",        set_pflag_skb_tx_mpwqe },
+	{ "tx_port_ts",          set_pflag_tx_port_ts },
 };
 
 static int mlx5e_handle_pflag(struct net_device *netdev,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 3ea15d62acd9..e36a13238271 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -64,6 +64,7 @@
 #include "en/hv_vhca_stats.h"
 #include "en/devlink.h"
 #include "lib/mlx5.h"
+#include "en/ptp.h"
 
 bool mlx5e_check_fragmented_striding_rq_cap(struct mlx5_core_dev *mdev)
 {
@@ -1083,14 +1084,14 @@ static void mlx5e_free_icosq(struct mlx5e_icosq *sq)
 	mlx5_wq_destroy(&sq->wq_ctrl);
 }
 
-static void mlx5e_free_txqsq_db(struct mlx5e_txqsq *sq)
+void mlx5e_free_txqsq_db(struct mlx5e_txqsq *sq)
 {
 	kvfree(sq->db.wqe_info);
 	kvfree(sq->db.skb_fifo.fifo);
 	kvfree(sq->db.dma_fifo);
 }
 
-static int mlx5e_alloc_txqsq_db(struct mlx5e_txqsq *sq, int numa)
+int mlx5e_alloc_txqsq_db(struct mlx5e_txqsq *sq, int numa)
 {
 	int wq_sz = mlx5_wq_cyc_get_size(&sq->wq);
 	int df_sz = wq_sz * MLX5_SEND_WQEBB_NUM_DS;
@@ -1118,7 +1119,6 @@ static int mlx5e_alloc_txqsq_db(struct mlx5e_txqsq *sq, int numa)
 	return 0;
 }
 
-static void mlx5e_tx_err_cqe_work(struct work_struct *recover_work);
 static int mlx5e_alloc_txqsq(struct mlx5e_channel *c,
 			     int txq_ix,
 			     struct mlx5e_params *params,
@@ -1176,20 +1176,12 @@ static int mlx5e_alloc_txqsq(struct mlx5e_channel *c,
 	return err;
 }
 
-static void mlx5e_free_txqsq(struct mlx5e_txqsq *sq)
+void mlx5e_free_txqsq(struct mlx5e_txqsq *sq)
 {
 	mlx5e_free_txqsq_db(sq);
 	mlx5_wq_destroy(&sq->wq_ctrl);
 }
 
-struct mlx5e_create_sq_param {
-	struct mlx5_wq_ctrl        *wq_ctrl;
-	u32                         cqn;
-	u32                         tisn;
-	u8                          tis_lst_sz;
-	u8                          min_inline_mode;
-};
-
 static int mlx5e_create_sq(struct mlx5_core_dev *mdev,
 			   struct mlx5e_sq_param *param,
 			   struct mlx5e_create_sq_param *csp,
@@ -1271,10 +1263,10 @@ static void mlx5e_destroy_sq(struct mlx5_core_dev *mdev, u32 sqn)
 	mlx5_core_destroy_sq(mdev, sqn);
 }
 
-static int mlx5e_create_sq_rdy(struct mlx5_core_dev *mdev,
-			       struct mlx5e_sq_param *param,
-			       struct mlx5e_create_sq_param *csp,
-			       u32 *sqn)
+int mlx5e_create_sq_rdy(struct mlx5_core_dev *mdev,
+			struct mlx5e_sq_param *param,
+			struct mlx5e_create_sq_param *csp,
+			u32 *sqn)
 {
 	struct mlx5e_modify_sq_param msp = {0};
 	int err;
@@ -1350,7 +1342,7 @@ void mlx5e_tx_disable_queue(struct netdev_queue *txq)
 	__netif_tx_unlock_bh(txq);
 }
 
-static void mlx5e_deactivate_txqsq(struct mlx5e_txqsq *sq)
+void mlx5e_deactivate_txqsq(struct mlx5e_txqsq *sq)
 {
 	struct mlx5_wq_cyc *wq = &sq->wq;
 
@@ -1389,7 +1381,7 @@ static void mlx5e_close_txqsq(struct mlx5e_txqsq *sq)
 	mlx5e_free_txqsq(sq);
 }
 
-static void mlx5e_tx_err_cqe_work(struct work_struct *recover_work)
+void mlx5e_tx_err_cqe_work(struct work_struct *recover_work)
 {
 	struct mlx5e_txqsq *sq = container_of(recover_work, struct mlx5e_txqsq,
 					      recover_work);
@@ -2374,6 +2366,13 @@ int mlx5e_open_channels(struct mlx5e_priv *priv,
 			goto err_close_channels;
 	}
 
+	if (MLX5E_GET_PFLAG(&chs->params, MLX5E_PFLAG_TX_PORT_TS)) {
+		err = mlx5e_port_ptp_open(priv, &chs->params, chs->c[0]->lag_port,
+					  &chs->port_ptp);
+		if (err)
+			goto err_close_channels;
+	}
+
 	mlx5e_health_channels_update(priv);
 	kvfree(cparam);
 	return 0;
@@ -2395,6 +2394,9 @@ static void mlx5e_activate_channels(struct mlx5e_channels *chs)
 
 	for (i = 0; i < chs->num; i++)
 		mlx5e_activate_channel(chs->c[i]);
+
+	if (chs->port_ptp)
+		mlx5e_ptp_activate_channel(chs->port_ptp);
 }
 
 #define MLX5E_RQ_WQES_TIMEOUT 20000 /* msecs */
@@ -2421,6 +2423,9 @@ static void mlx5e_deactivate_channels(struct mlx5e_channels *chs)
 {
 	int i;
 
+	if (chs->port_ptp)
+		mlx5e_ptp_deactivate_channel(chs->port_ptp);
+
 	for (i = 0; i < chs->num; i++)
 		mlx5e_deactivate_channel(chs->c[i]);
 }
@@ -2429,6 +2434,9 @@ void mlx5e_close_channels(struct mlx5e_channels *chs)
 {
 	int i;
 
+	if (chs->port_ptp)
+		mlx5e_port_ptp_close(chs->port_ptp);
+
 	for (i = 0; i < chs->num; i++)
 		mlx5e_close_channel(chs->c[i]);
 
@@ -2914,6 +2922,8 @@ static int mlx5e_update_netdev_queues(struct mlx5e_priv *priv)
 	nch = priv->channels.params.num_channels;
 	ntc = priv->channels.params.num_tc;
 	num_txqs = nch * ntc;
+	if (MLX5E_GET_PFLAG(&priv->channels.params, MLX5E_PFLAG_TX_PORT_TS))
+		num_txqs += ntc;
 	num_rxqs = nch * priv->profile->rq_groups;
 
 	mlx5e_netdev_set_tcs(netdev, nch, ntc);
@@ -2987,14 +2997,13 @@ MLX5E_DEFINE_PREACTIVATE_WRAPPER_CTX(mlx5e_num_channels_changed);
 
 static void mlx5e_build_txq_maps(struct mlx5e_priv *priv)
 {
-	int i, ch;
+	int i, ch, tc, num_tc;
 
 	ch = priv->channels.num;
+	num_tc = priv->channels.params.num_tc;
 
 	for (i = 0; i < ch; i++) {
-		int tc;
-
-		for (tc = 0; tc < priv->channels.params.num_tc; tc++) {
+		for (tc = 0; tc < num_tc; tc++) {
 			struct mlx5e_channel *c = priv->channels.c[i];
 			struct mlx5e_txqsq *sq = &c->sq[tc];
 
@@ -3002,10 +3011,28 @@ static void mlx5e_build_txq_maps(struct mlx5e_priv *priv)
 			priv->channel_tc2realtxq[i][tc] = i + tc * ch;
 		}
 	}
+
+	if (!priv->channels.port_ptp)
+		return;
+
+	for (tc = 0; tc < num_tc; tc++) {
+		struct mlx5e_port_ptp *c = priv->channels.port_ptp;
+		struct mlx5e_txqsq *sq = &c->ptpsq[tc].txqsq;
+
+		priv->txq2sq[sq->txq_ix] = sq;
+		priv->port_ptp_tc2realtxq[tc] = priv->num_tc_x_num_ch + tc;
+	}
+}
+
+static void mlx5e_update_num_tc_x_num_ch(struct mlx5e_priv *priv)
+{
+	priv->num_tc_x_num_ch = priv->channels.params.num_tc *
+				priv->channels.num;
 }
 
 void mlx5e_activate_priv_channels(struct mlx5e_priv *priv)
 {
+	mlx5e_update_num_tc_x_num_ch(priv);
 	mlx5e_build_txq_maps(priv);
 	mlx5e_activate_channels(&priv->channels);
 	mlx5e_xdp_tx_enable(priv);
@@ -4342,6 +4369,7 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
 {
 	struct mlx5e_priv *priv = container_of(work, struct mlx5e_priv,
 					       tx_timeout_work);
+	struct net_device *netdev = priv->netdev;
 	int i;
 
 	rtnl_lock();
@@ -4350,9 +4378,9 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
 		goto unlock;
 
-	for (i = 0; i < priv->channels.num * priv->channels.params.num_tc; i++) {
+	for (i = 0; i < netdev->real_num_tx_queues; i++) {
 		struct netdev_queue *dev_queue =
-			netdev_get_tx_queue(priv->netdev, i);
+			netdev_get_tx_queue(netdev, i);
 		struct mlx5e_txqsq *sq = priv->txq2sq[i];
 
 		if (!netif_xmit_stopped(dev_queue))
@@ -5334,10 +5362,14 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
 				       void *ppriv)
 {
 	struct net_device *netdev;
+	unsigned int ptp_txqs = 0;
 	int err;
 
+	if (MLX5_CAP_GEN(mdev, ts_cqe_to_dest_cqn))
+		ptp_txqs = profile->max_tc;
+
 	netdev = alloc_etherdev_mqs(sizeof(struct mlx5e_priv),
-				    nch * profile->max_tc,
+				    nch * profile->max_tc + ptp_txqs,
 				    nch * profile->rq_groups);
 	if (!netdev) {
 		mlx5_core_err(mdev, "alloc_etherdev_mqs() failed\n");
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index ebfb47a09128..9d57dc94c767 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -402,6 +402,24 @@ static void mlx5e_stats_grp_sw_update_stats_sq(struct mlx5e_sw_stats *s,
 	s->tx_cqes                  += sq_stats->cqes;
 }
 
+static void mlx5e_stats_grp_sw_update_stats_ptp(struct mlx5e_priv *priv,
+						struct mlx5e_sw_stats *s)
+{
+	int i;
+
+	if (!priv->port_ptp_opened)
+		return;
+
+	mlx5e_stats_grp_sw_update_stats_ch_stats(s, &priv->port_ptp_stats.ch);
+
+	for (i = 0; i < priv->max_opened_tc; i++) {
+		mlx5e_stats_grp_sw_update_stats_sq(s, &priv->port_ptp_stats.sq[i]);
+
+		/* https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92657 */
+		barrier();
+	}
+}
+
 static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
 {
 	struct mlx5e_sw_stats *s = &priv->stats.sw;
@@ -430,6 +448,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
 			barrier();
 		}
 	}
+	mlx5e_stats_grp_sw_update_stats_ptp(priv, s);
 }
 
 static const struct counter_desc q_stats_desc[] = {
@@ -1690,6 +1709,30 @@ static const struct counter_desc ch_stats_desc[] = {
 	{ MLX5E_DECLARE_CH_STAT(struct mlx5e_ch_stats, eq_rearm) },
 };
 
+static const struct counter_desc ptp_sq_stats_desc[] = {
+	{ MLX5E_DECLARE_PTP_TX_STAT(struct mlx5e_sq_stats, packets) },
+	{ MLX5E_DECLARE_PTP_TX_STAT(struct mlx5e_sq_stats, bytes) },
+	{ MLX5E_DECLARE_PTP_TX_STAT(struct mlx5e_sq_stats, csum_partial) },
+	{ MLX5E_DECLARE_PTP_TX_STAT(struct mlx5e_sq_stats, csum_partial_inner) },
+	{ MLX5E_DECLARE_PTP_TX_STAT(struct mlx5e_sq_stats, added_vlan_packets) },
+	{ MLX5E_DECLARE_PTP_TX_STAT(struct mlx5e_sq_stats, nop) },
+	{ MLX5E_DECLARE_PTP_TX_STAT(struct mlx5e_sq_stats, csum_none) },
+	{ MLX5E_DECLARE_PTP_TX_STAT(struct mlx5e_sq_stats, stopped) },
+	{ MLX5E_DECLARE_PTP_TX_STAT(struct mlx5e_sq_stats, dropped) },
+	{ MLX5E_DECLARE_PTP_TX_STAT(struct mlx5e_sq_stats, xmit_more) },
+	{ MLX5E_DECLARE_PTP_TX_STAT(struct mlx5e_sq_stats, recover) },
+	{ MLX5E_DECLARE_PTP_TX_STAT(struct mlx5e_sq_stats, cqes) },
+	{ MLX5E_DECLARE_PTP_TX_STAT(struct mlx5e_sq_stats, wake) },
+	{ MLX5E_DECLARE_PTP_TX_STAT(struct mlx5e_sq_stats, cqe_err) },
+};
+
+static const struct counter_desc ptp_ch_stats_desc[] = {
+	{ MLX5E_DECLARE_PTP_CH_STAT(struct mlx5e_ch_stats, events) },
+	{ MLX5E_DECLARE_PTP_CH_STAT(struct mlx5e_ch_stats, poll) },
+	{ MLX5E_DECLARE_PTP_CH_STAT(struct mlx5e_ch_stats, arm) },
+	{ MLX5E_DECLARE_PTP_CH_STAT(struct mlx5e_ch_stats, eq_rearm) },
+};
+
 #define NUM_RQ_STATS			ARRAY_SIZE(rq_stats_desc)
 #define NUM_SQ_STATS			ARRAY_SIZE(sq_stats_desc)
 #define NUM_XDPSQ_STATS			ARRAY_SIZE(xdpsq_stats_desc)
@@ -1697,6 +1740,57 @@ static const struct counter_desc ch_stats_desc[] = {
 #define NUM_XSKRQ_STATS			ARRAY_SIZE(xskrq_stats_desc)
 #define NUM_XSKSQ_STATS			ARRAY_SIZE(xsksq_stats_desc)
 #define NUM_CH_STATS			ARRAY_SIZE(ch_stats_desc)
+#define NUM_PTP_SQ_STATS		ARRAY_SIZE(ptp_sq_stats_desc)
+#define NUM_PTP_CH_STATS		ARRAY_SIZE(ptp_ch_stats_desc)
+
+static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(ptp)
+{
+	return priv->port_ptp_opened ?
+	       NUM_PTP_CH_STATS + (NUM_PTP_SQ_STATS * priv->max_opened_tc) :
+	       0;
+}
+
+static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(ptp)
+{
+	int i, tc;
+
+	if (!priv->port_ptp_opened)
+		return idx;
+
+	for (i = 0; i < NUM_PTP_CH_STATS; i++)
+		sprintf(data + (idx++) * ETH_GSTRING_LEN,
+			ptp_ch_stats_desc[i].format);
+
+	for (tc = 0; tc < priv->max_opened_tc; tc++)
+		for (i = 0; i < NUM_PTP_SQ_STATS; i++)
+			sprintf(data + (idx++) * ETH_GSTRING_LEN,
+				ptp_sq_stats_desc[i].format, tc);
+
+	return idx;
+}
+
+static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(ptp)
+{
+	int i, tc;
+
+	if (!priv->port_ptp_opened)
+		return idx;
+
+	for (i = 0; i < NUM_PTP_CH_STATS; i++)
+		data[idx++] =
+			MLX5E_READ_CTR64_CPU(&priv->port_ptp_stats.ch,
+					     ptp_ch_stats_desc, i);
+
+	for (tc = 0; tc < priv->max_opened_tc; tc++)
+		for (i = 0; i < NUM_PTP_SQ_STATS; i++)
+			data[idx++] =
+				MLX5E_READ_CTR64_CPU(&priv->port_ptp_stats.sq[tc],
+						     ptp_sq_stats_desc, i);
+
+	return idx;
+}
+
+static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(ptp) { return; }
 
 static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(channels)
 {
@@ -1818,6 +1912,7 @@ MLX5E_DEFINE_STATS_GRP(channels, 0);
 MLX5E_DEFINE_STATS_GRP(per_port_buff_congest, 0);
 MLX5E_DEFINE_STATS_GRP(eth_ext, 0);
 static MLX5E_DEFINE_STATS_GRP(tls, 0);
+static MLX5E_DEFINE_STATS_GRP(ptp, 0);
 
 /* The stats groups order is opposite to the update_stats() order calls */
 mlx5e_stats_grp_t mlx5e_nic_stats_grps[] = {
@@ -1840,6 +1935,7 @@ mlx5e_stats_grp_t mlx5e_nic_stats_grps[] = {
 	&MLX5E_STATS_GRP(tls),
 	&MLX5E_STATS_GRP(channels),
 	&MLX5E_STATS_GRP(per_port_buff_congest),
+	&MLX5E_STATS_GRP(ptp),
 };
 
 unsigned int mlx5e_nic_stats_grps_num(struct mlx5e_priv *priv)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 162daaadb0d8..98ffebcc93b9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -51,6 +51,9 @@
 #define MLX5E_DECLARE_XSKSQ_STAT(type, fld) "tx%d_xsk_"#fld, offsetof(type, fld)
 #define MLX5E_DECLARE_CH_STAT(type, fld) "ch%d_"#fld, offsetof(type, fld)
 
+#define MLX5E_DECLARE_PTP_TX_STAT(type, fld) "ptp_tx%d_"#fld, offsetof(type, fld)
+#define MLX5E_DECLARE_PTP_CH_STAT(type, fld) "ptp_ch_"#fld, offsetof(type, fld)
+
 struct counter_desc {
 	char		format[ETH_GSTRING_LEN];
 	size_t		offset; /* Byte offset */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index c6b20b77a0f2..0ae68cb25035 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -39,6 +39,7 @@
 #include "ipoib/ipoib.h"
 #include "en_accel/en_accel.h"
 #include "lib/clock.h"
+#include "en/ptp.h"
 
 static void mlx5e_dma_unmap_wqe_err(struct mlx5e_txqsq *sq, u8 num_dma)
 {
@@ -66,14 +67,67 @@ static inline int mlx5e_get_dscp_up(struct mlx5e_priv *priv, struct sk_buff *skb
 }
 #endif
 
+static bool mlx5e_use_ptpsq(struct sk_buff *skb)
+{
+	struct flow_keys fk;
+
+	if (!skb_flow_dissect_flow_keys(skb, &fk, 0))
+		return false;
+
+	if (fk.basic.n_proto == htons(ETH_P_1588))
+		return true;
+
+	if (fk.basic.n_proto != htons(ETH_P_IP) &&
+	    fk.basic.n_proto != htons(ETH_P_IPV6))
+		return false;
+
+	return fk.basic.ip_proto == IPPROTO_UDP;
+}
+
+static u16 mlx5e_select_ptpsq(struct net_device *dev, struct sk_buff *skb)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	int up = 0;
+
+	if (!netdev_get_num_tc(dev))
+		goto return_txq;
+
+#ifdef CONFIG_MLX5_CORE_EN_DCB
+	if (priv->dcbx_dp.trust_state == MLX5_QPTS_TRUST_DSCP)
+		up = mlx5e_get_dscp_up(priv, skb);
+	else
+#endif
+		if (skb_vlan_tag_present(skb))
+			up = skb_vlan_tag_get_prio(skb);
+
+return_txq:
+	return priv->port_ptp_tc2realtxq[up];
+}
+
 u16 mlx5e_select_queue(struct net_device *dev, struct sk_buff *skb,
 		       struct net_device *sb_dev)
 {
-	int txq_ix = netdev_pick_tx(dev, skb, NULL);
 	struct mlx5e_priv *priv = netdev_priv(dev);
+	int txq_ix;
 	int up = 0;
 	int ch_ix;
 
+	if (unlikely(priv->channels.port_ptp)) {
+		if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) &&
+		    mlx5e_use_ptpsq(skb))
+			return mlx5e_select_ptpsq(dev, skb);
+
+		txq_ix = netdev_pick_tx(dev, skb, NULL);
+		/* Fix netdev_pick_tx() not to choose ptp_channel txqs.
+		 * If they are selected, switch to regular queues.
+		 * Driver to select these queues only at mlx5e_select_ptpsq().
+		 */
+		if (unlikely(txq_ix >= priv->num_tc_x_num_ch))
+			txq_ix = txq_ix % priv->num_tc_x_num_ch;
+	} else {
+		txq_ix = netdev_pick_tx(dev, skb, NULL);
+	}
+
 	if (!netdev_get_num_tc(dev))
 		return txq_ix;
 
-- 
2.26.2



* [net-next V2 09/15] net/mlx5e: Add TX port timestamp support
  2020-12-03  4:20 [pull request][net-next V2 00/15] mlx5 updates 2020-12-01 Saeed Mahameed
                   ` (7 preceding siblings ...)
  2020-12-03  4:21 ` [net-next V2 08/15] net/mlx5e: Add TX PTP port object support Saeed Mahameed
@ 2020-12-03  4:21 ` Saeed Mahameed
  2020-12-03  4:21 ` [net-next V2 10/15] net/mlx5e: remove unnecessary memset Saeed Mahameed
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-03  4:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, netdev, Eran Ben Elisha, Tariq Toukan, Saeed Mahameed

From: Eran Ben Elisha <eranbe@nvidia.com>

The accuracy of transmitted packet timestamping can be improved by using
the timestamp taken at the port instead of the packet's CQE creation
timestamp, as it better reflects the actual time the packet was transmitted.

TX port timestamping is supported starting from ConnectX-6 Dx hardware.
Although only the CQE timestamp can be attached to the original completion,
the TX port timestamp can be obtained via an additional completion on a
special CQ associated with the SQ (in addition to the regular CQ).

The driver ignores the original packet completion timestamp and reports
back the timestamp of the special CQ completion. If the absolute difference
between the two completion timestamps is greater than 1/128 second, the
TX port timestamp is ignored, as its jitter is too big.
No skb will be generated out of the extra completion.
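
The acceptance rule above can be modelled in userspace roughly as follows.
This is only a sketch of the logic in mlx5e_skb_cb_hwtstamp_handler(), not
the driver code: struct ts_pair, struct ts_stats and ts_complete() are
hypothetical names invented for the illustration, and timestamps are plain
nanosecond integers where zero means "not seen yet".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define NSEC_PER_SEC   1000000000LL
#define MAX_TS_DIFF_NS (NSEC_PER_SEC >> 7)	/* 1/128 s = 7812500 ns */

/* Hypothetical stand-in for the per-skb control block. */
struct ts_pair {
	int64_t cqe_ts;		/* regular completion, 0 = not seen yet */
	int64_t port_ts;	/* port completion, 0 = not seen yet */
};

/* Hypothetical stand-in for the dedicated CQ counters. */
struct ts_stats {
	uint64_t abort;
	uint64_t abort_abs_diff_ns;
};

enum ts_kind { TS_CQE, TS_PORT };

/*
 * Record one completion timestamp. Returns true and stores the port
 * timestamp in *report once both completions have arrived and they
 * differ by at most 1/128 second. Returns false while still waiting
 * for the other completion, or when the pair is rejected.
 */
static bool ts_complete(struct ts_pair *p, enum ts_kind kind, int64_t ts,
			struct ts_stats *st, int64_t *report)
{
	int64_t diff, port;

	assert(p && st && report);

	if (kind == TS_CQE)
		p->cqe_ts = ts;
	else
		p->port_ts = ts;

	if (!p->cqe_ts || !p->port_ts)
		return false;	/* the other completion has not arrived */

	diff = llabs(p->port_ts - p->cqe_ts);
	port = p->port_ts;
	p->cqe_ts = 0;		/* pair consumed, reset for reuse */
	p->port_ts = 0;

	if (diff > MAX_TS_DIFF_NS) {
		st->abort++;
		st->abort_abs_diff_ns += (uint64_t)diff;
		return false;	/* jitter too big, nothing is reported */
	}

	*report = port;
	return true;
}
```

The order in which the two completions arrive does not matter in this
model: whichever arrives second triggers the comparison.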

Allocate an additional CQ per ptpsq to receive the TX port timestamp.

The driver holds an skb FIFO in order to map each transmitted skb to
the two expected completions. When using ptpsq, hold a double refcount on
the skb, to guarantee it will not get released before both completions
arrive.
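
A minimal userspace model of the FIFO and the double reference may help
when reading the diff. It assumes a power-of-two ring indexed by free-running
producer/consumer counters, as the mask setup in
mlx5e_ptp_alloc_traffic_db() suggests. struct pkt, pkt_get(), pkt_put() and
ptp_xmit() are hypothetical stand-ins for the skb and its refcount helpers,
not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted sk_buff. */
struct pkt {
	int refcnt;
	int id;
};

/* Ring of packet pointers; size must be a power of two. */
struct pkt_fifo {
	struct pkt **fifo;
	uint16_t pc;	/* producer counter, free running */
	uint16_t cc;	/* consumer counter, free running */
	uint16_t mask;	/* size - 1 */
};

static int fifo_init(struct pkt_fifo *f, uint16_t size)
{
	assert(size && (size & (size - 1)) == 0);

	f->fifo = calloc(size, sizeof(*f->fifo));
	if (!f->fifo)
		return -1;

	f->pc = 0;
	f->cc = 0;
	f->mask = size - 1;
	return 0;
}

static void fifo_push(struct pkt_fifo *f, struct pkt *p)
{
	f->fifo[f->pc++ & f->mask] = p;
}

static struct pkt *fifo_pop(struct pkt_fifo *f)
{
	return f->fifo[f->cc++ & f->mask];
}

static void pkt_get(struct pkt *p)
{
	p->refcnt++;
}

/* Drop one reference; returns true when the last one is gone. */
static bool pkt_put(struct pkt *p)
{
	return --p->refcnt == 0;
}

/*
 * Transmit on the PTP queue: take a second reference and remember the
 * packet so the port-timestamp completion can find it later.
 */
static void ptp_xmit(struct pkt_fifo *f, struct pkt *p)
{
	pkt_get(p);
	fifo_push(f, p);
}
```

In this model the regular completion drops one reference and the
port-timestamp completion pops the packet from the FIFO and drops the
other, so the packet is only freed after both have been handled.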

Expose dedicated counters for the additional PTP CQ and connect the CQ to
the TX health reporter.

Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   1 +
 .../ethernet/mellanox/mlx5/core/en/params.h   |   1 +
 .../net/ethernet/mellanox/mlx5/core/en/ptp.c  | 173 +++++++++++++++++-
 .../net/ethernet/mellanox/mlx5/core/en/ptp.h  |  15 ++
 .../mellanox/mlx5/core/en/reporter_tx.c       |  37 +++-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   1 +
 .../ethernet/mellanox/mlx5/core/en_stats.c    |  21 ++-
 .../ethernet/mellanox/mlx5/core/en_stats.h    |   8 +
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   |  12 +-
 9 files changed, 262 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 6864c79d2d9a..a1a81cfeb607 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -718,6 +718,7 @@ struct mlx5e_channel_stats {
 struct mlx5e_port_ptp_stats {
 	struct mlx5e_ch_stats ch;
 	struct mlx5e_sq_stats sq[MLX5E_MAX_NUM_TC];
+	struct mlx5e_ptp_cq_stats cq[MLX5E_MAX_NUM_TC];
 } ____cacheline_aligned_in_smp;
 
 enum {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
index 70e463712b7f..3959254d4181 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
@@ -44,6 +44,7 @@ struct mlx5e_channel_param {
 struct mlx5e_create_sq_param {
 	struct mlx5_wq_ctrl        *wq_ctrl;
 	u32                         cqn;
+	u32                         ts_cqe_to_dest_cqn;
 	u32                         tisn;
 	u8                          tis_lst_sz;
 	u8                          min_inline_mode;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
index 8639b5104df7..351118985a57 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
@@ -5,6 +5,115 @@
 #include "en/txrx.h"
 #include "lib/clock.h"
 
+struct mlx5e_skb_cb_hwtstamp {
+	ktime_t cqe_hwtstamp;
+	ktime_t port_hwtstamp;
+};
+
+void mlx5e_skb_cb_hwtstamp_init(struct sk_buff *skb)
+{
+	memset(skb->cb, 0, sizeof(struct mlx5e_skb_cb_hwtstamp));
+}
+
+static struct mlx5e_skb_cb_hwtstamp *mlx5e_skb_cb_get_hwts(struct sk_buff *skb)
+{
+	BUILD_BUG_ON(sizeof(struct mlx5e_skb_cb_hwtstamp) > sizeof(skb->cb));
+	return (struct mlx5e_skb_cb_hwtstamp *)skb->cb;
+}
+
+static void mlx5e_skb_cb_hwtstamp_tx(struct sk_buff *skb,
+				     struct mlx5e_ptp_cq_stats *cq_stats)
+{
+	struct skb_shared_hwtstamps hwts = {};
+	ktime_t diff;
+
+	diff = abs(mlx5e_skb_cb_get_hwts(skb)->port_hwtstamp -
+		   mlx5e_skb_cb_get_hwts(skb)->cqe_hwtstamp);
+
+	/* Maximal allowed diff is 1 / 128 second */
+	if (diff > (NSEC_PER_SEC >> 7)) {
+		cq_stats->abort++;
+		cq_stats->abort_abs_diff_ns += diff;
+		return;
+	}
+
+	hwts.hwtstamp = mlx5e_skb_cb_get_hwts(skb)->port_hwtstamp;
+	skb_tstamp_tx(skb, &hwts);
+}
+
+void mlx5e_skb_cb_hwtstamp_handler(struct sk_buff *skb, int hwtstamp_type,
+				   ktime_t hwtstamp,
+				   struct mlx5e_ptp_cq_stats *cq_stats)
+{
+	switch (hwtstamp_type) {
+	case (MLX5E_SKB_CB_CQE_HWTSTAMP):
+		mlx5e_skb_cb_get_hwts(skb)->cqe_hwtstamp = hwtstamp;
+		break;
+	case (MLX5E_SKB_CB_PORT_HWTSTAMP):
+		mlx5e_skb_cb_get_hwts(skb)->port_hwtstamp = hwtstamp;
+		break;
+	}
+
+	/* If both CQEs arrive, check and report the port tstamp, and clear skb cb as
+	 * skb soon to be released.
+	 */
+	if (!mlx5e_skb_cb_get_hwts(skb)->cqe_hwtstamp ||
+	    !mlx5e_skb_cb_get_hwts(skb)->port_hwtstamp)
+		return;
+
+	mlx5e_skb_cb_hwtstamp_tx(skb, cq_stats);
+	memset(skb->cb, 0, sizeof(struct mlx5e_skb_cb_hwtstamp));
+}
+
+static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq,
+				    struct mlx5_cqe64 *cqe,
+				    int budget)
+{
+	struct sk_buff *skb = mlx5e_skb_fifo_pop(&ptpsq->skb_fifo);
+	ktime_t hwtstamp;
+
+	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
+		ptpsq->cq_stats->err_cqe++;
+		goto out;
+	}
+
+	hwtstamp = mlx5_timecounter_cyc2time(ptpsq->txqsq.clock, get_cqe_ts(cqe));
+	mlx5e_skb_cb_hwtstamp_handler(skb, MLX5E_SKB_CB_PORT_HWTSTAMP,
+				      hwtstamp, ptpsq->cq_stats);
+	ptpsq->cq_stats->cqe++;
+
+out:
+	napi_consume_skb(skb, budget);
+}
+
+static bool mlx5e_ptp_poll_ts_cq(struct mlx5e_cq *cq, int budget)
+{
+	struct mlx5e_ptpsq *ptpsq = container_of(cq, struct mlx5e_ptpsq, ts_cq);
+	struct mlx5_cqwq *cqwq = &cq->wq;
+	struct mlx5_cqe64 *cqe;
+	int work_done = 0;
+
+	if (unlikely(!test_bit(MLX5E_SQ_STATE_ENABLED, &ptpsq->txqsq.state)))
+		return false;
+
+	cqe = mlx5_cqwq_get_cqe(cqwq);
+	if (!cqe)
+		return false;
+
+	do {
+		mlx5_cqwq_pop(cqwq);
+
+		mlx5e_ptp_handle_ts_cqe(ptpsq, cqe, budget);
+	} while ((++work_done < budget) && (cqe = mlx5_cqwq_get_cqe(cqwq)));
+
+	mlx5_cqwq_update_db_record(cqwq);
+
+	/* ensure cq space is freed before enabling more cqes */
+	wmb();
+
+	return work_done == budget;
+}
+
 static int mlx5e_ptp_napi_poll(struct napi_struct *napi, int budget)
 {
 	struct mlx5e_port_ptp *c = container_of(napi, struct mlx5e_port_ptp,
@@ -18,8 +127,10 @@ static int mlx5e_ptp_napi_poll(struct napi_struct *napi, int budget)
 
 	ch_stats->poll++;
 
-	for (i = 0; i < c->num_tc; i++)
+	for (i = 0; i < c->num_tc; i++) {
 		busy |= mlx5e_poll_tx_cq(&c->ptpsq[i].txqsq.cq, budget);
+		busy |= mlx5e_ptp_poll_ts_cq(&c->ptpsq[i].ts_cq, budget);
+	}
 
 	if (busy) {
 		work_done = budget;
@@ -31,8 +142,10 @@ static int mlx5e_ptp_napi_poll(struct napi_struct *napi, int budget)
 
 	ch_stats->arm++;
 
-	for (i = 0; i < c->num_tc; i++)
+	for (i = 0; i < c->num_tc; i++) {
 		mlx5e_cq_arm(&c->ptpsq[i].txqsq.cq);
+		mlx5e_cq_arm(&c->ptpsq[i].ts_cq);
+	}
 
 out:
 	rcu_read_unlock();
@@ -96,6 +209,37 @@ static void mlx5e_ptp_destroy_sq(struct mlx5_core_dev *mdev, u32 sqn)
 	mlx5_core_destroy_sq(mdev, sqn);
 }
 
+static int mlx5e_ptp_alloc_traffic_db(struct mlx5e_ptpsq *ptpsq, int numa)
+{
+	int wq_sz = mlx5_wq_cyc_get_size(&ptpsq->txqsq.wq);
+
+	ptpsq->skb_fifo.fifo = kvzalloc_node(array_size(wq_sz, sizeof(*ptpsq->skb_fifo.fifo)),
+					     GFP_KERNEL, numa);
+	if (!ptpsq->skb_fifo.fifo)
+		return -ENOMEM;
+
+	ptpsq->skb_fifo.pc   = &ptpsq->skb_fifo_pc;
+	ptpsq->skb_fifo.cc   = &ptpsq->skb_fifo_cc;
+	ptpsq->skb_fifo.mask = wq_sz - 1;
+
+	return 0;
+}
+
+static void mlx5e_ptp_drain_skb_fifo(struct mlx5e_skb_fifo *skb_fifo)
+{
+	while (*skb_fifo->pc != *skb_fifo->cc) {
+		struct sk_buff *skb = mlx5e_skb_fifo_pop(skb_fifo);
+
+		dev_kfree_skb_any(skb);
+	}
+}
+
+static void mlx5e_ptp_free_traffic_db(struct mlx5e_skb_fifo *skb_fifo)
+{
+	mlx5e_ptp_drain_skb_fifo(skb_fifo);
+	kvfree(skb_fifo->fifo);
+}
+
 static int mlx5e_ptp_open_txqsq(struct mlx5e_port_ptp *c, u32 tisn,
 				int txq_ix, struct mlx5e_ptp_params *cparams,
 				int tc, struct mlx5e_ptpsq *ptpsq)
@@ -115,11 +259,17 @@ static int mlx5e_ptp_open_txqsq(struct mlx5e_port_ptp *c, u32 tisn,
 	csp.cqn             = txqsq->cq.mcq.cqn;
 	csp.wq_ctrl         = &txqsq->wq_ctrl;
 	csp.min_inline_mode = txqsq->min_inline_mode;
+	csp.ts_cqe_to_dest_cqn = ptpsq->ts_cq.mcq.cqn;
 
 	err = mlx5e_create_sq_rdy(c->mdev, sqp, &csp, &txqsq->sqn);
 	if (err)
 		goto err_free_txqsq;
 
+	err = mlx5e_ptp_alloc_traffic_db(ptpsq,
+					 dev_to_node(mlx5_core_dma_dev(c->mdev)));
+	if (err)
+		goto err_free_txqsq;
+
 	return 0;
 
 err_free_txqsq:
@@ -133,6 +283,7 @@ static void mlx5e_ptp_close_txqsq(struct mlx5e_ptpsq *ptpsq)
 	struct mlx5e_txqsq *sq = &ptpsq->txqsq;
 	struct mlx5_core_dev *mdev = sq->mdev;
 
+	mlx5e_ptp_free_traffic_db(&ptpsq->skb_fifo);
 	cancel_work_sync(&sq->recover_work);
 	mlx5e_ptp_destroy_sq(mdev, sq->sqn);
 	mlx5e_free_txqsq_descs(sq);
@@ -200,8 +351,23 @@ static int mlx5e_ptp_open_cqs(struct mlx5e_port_ptp *c,
 			goto out_err_txqsq_cq;
 	}
 
+	for (tc = 0; tc < params->num_tc; tc++) {
+		struct mlx5e_cq *cq = &c->ptpsq[tc].ts_cq;
+		struct mlx5e_ptpsq *ptpsq = &c->ptpsq[tc];
+
+		err = mlx5e_open_cq(c->priv, ptp_moder, cq_param, &ccp, cq);
+		if (err)
+			goto out_err_ts_cq;
+
+		ptpsq->cq_stats = &c->priv->port_ptp_stats.cq[tc];
+	}
+
 	return 0;
 
+out_err_ts_cq:
+	for (--tc; tc >= 0; tc--)
+		mlx5e_close_cq(&c->ptpsq[tc].ts_cq);
+	tc = params->num_tc;
 out_err_txqsq_cq:
 	for (--tc; tc >= 0; tc--)
 		mlx5e_close_cq(&c->ptpsq[tc].txqsq.cq);
@@ -213,6 +379,9 @@ static void mlx5e_ptp_close_cqs(struct mlx5e_port_ptp *c)
 {
 	int tc;
 
+	for (tc = 0; tc < c->num_tc; tc++)
+		mlx5e_close_cq(&c->ptpsq[tc].ts_cq);
+
 	for (tc = 0; tc < c->num_tc; tc++)
 		mlx5e_close_cq(&c->ptpsq[tc].txqsq.cq);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h
index daa3b6953e3f..28aa5ae118f4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h
@@ -10,6 +10,11 @@
 
 struct mlx5e_ptpsq {
 	struct mlx5e_txqsq       txqsq;
+	struct mlx5e_cq          ts_cq;
+	u16                      skb_fifo_cc;
+	u16                      skb_fifo_pc;
+	struct mlx5e_skb_fifo    skb_fifo;
+	struct mlx5e_ptp_cq_stats *cq_stats;
 };
 
 struct mlx5e_port_ptp {
@@ -45,4 +50,14 @@ void mlx5e_port_ptp_close(struct mlx5e_port_ptp *c);
 void mlx5e_ptp_activate_channel(struct mlx5e_port_ptp *c);
 void mlx5e_ptp_deactivate_channel(struct mlx5e_port_ptp *c);
 
+enum {
+	MLX5E_SKB_CB_CQE_HWTSTAMP  = BIT(0),
+	MLX5E_SKB_CB_PORT_HWTSTAMP = BIT(1),
+};
+
+void mlx5e_skb_cb_hwtstamp_handler(struct sk_buff *skb, int hwtstamp_type,
+				   ktime_t hwtstamp,
+				   struct mlx5e_ptp_cq_stats *cq_stats);
+
+void mlx5e_skb_cb_hwtstamp_init(struct sk_buff *skb);
 #endif /* __MLX5_EN_PTP_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index c55a2ad10599..d7275c84313e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -228,9 +228,19 @@ mlx5e_tx_reporter_build_diagnose_output_ptpsq(struct devlink_fmsg *fmsg,
 	if (err)
 		return err;
 
-	err = mlx5e_tx_reporter_build_diagnose_output_sq_common(fmsg,
-								&ptpsq->txqsq,
-								tc);
+	err = mlx5e_tx_reporter_build_diagnose_output_sq_common(fmsg, &ptpsq->txqsq, tc);
+	if (err)
+		return err;
+
+	err = mlx5e_health_fmsg_named_obj_nest_start(fmsg, "Port TS");
+	if (err)
+		return err;
+
+	err = mlx5e_health_cq_diag_fmsg(&ptpsq->ts_cq, fmsg);
+	if (err)
+		return err;
+
+	err = mlx5e_health_fmsg_named_obj_nest_end(fmsg);
 	if (err)
 		return err;
 
@@ -270,6 +280,23 @@ mlx5e_tx_reporter_diagnose_generic_txqsq(struct devlink_fmsg *fmsg,
 	return mlx5e_health_fmsg_named_obj_nest_end(fmsg);
 }
 
+static int
+mlx5e_tx_reporter_diagnose_generic_tx_port_ts(struct devlink_fmsg *fmsg,
+					      struct mlx5e_ptpsq *ptpsq)
+{
+	int err;
+
+	err = mlx5e_health_fmsg_named_obj_nest_start(fmsg, "Port TS");
+	if (err)
+		return err;
+
+	err = mlx5e_health_cq_common_diag_fmsg(&ptpsq->ts_cq, fmsg);
+	if (err)
+		return err;
+
+	return mlx5e_health_fmsg_named_obj_nest_end(fmsg);
+}
+
 static int
 mlx5e_tx_reporter_diagnose_common_config(struct devlink_health_reporter *reporter,
 					 struct devlink_fmsg *fmsg)
@@ -301,6 +328,10 @@ mlx5e_tx_reporter_diagnose_common_config(struct devlink_health_reporter *reporte
 	if (err)
 		return err;
 
+	err = mlx5e_tx_reporter_diagnose_generic_tx_port_ts(fmsg, generic_ptpsq);
+	if (err)
+		return err;
+
 	err = mlx5e_health_fmsg_named_obj_nest_end(fmsg);
 	if (err)
 		return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e36a13238271..fd12d906d239 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1206,6 +1206,7 @@ static int mlx5e_create_sq(struct mlx5_core_dev *mdev,
 	MLX5_SET(sqc,  sqc, tis_lst_sz, csp->tis_lst_sz);
 	MLX5_SET(sqc,  sqc, tis_num_0, csp->tisn);
 	MLX5_SET(sqc,  sqc, cqn, csp->cqn);
+	MLX5_SET(sqc,  sqc, ts_cqe_to_dest_cqn, csp->ts_cqe_to_dest_cqn);
 
 	if (MLX5_CAP_ETH(mdev, wqe_inline_mode) == MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
 		MLX5_SET(sqc,  sqc, min_wqe_inline_mode, csp->min_inline_mode);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 9d57dc94c767..2cf2042b37c7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -1733,6 +1733,13 @@ static const struct counter_desc ptp_ch_stats_desc[] = {
 	{ MLX5E_DECLARE_PTP_CH_STAT(struct mlx5e_ch_stats, eq_rearm) },
 };
 
+static const struct counter_desc ptp_cq_stats_desc[] = {
+	{ MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, cqe) },
+	{ MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, err_cqe) },
+	{ MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, abort) },
+	{ MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, abort_abs_diff_ns) },
+};
+
 #define NUM_RQ_STATS			ARRAY_SIZE(rq_stats_desc)
 #define NUM_SQ_STATS			ARRAY_SIZE(sq_stats_desc)
 #define NUM_XDPSQ_STATS			ARRAY_SIZE(xdpsq_stats_desc)
@@ -1742,11 +1749,13 @@ static const struct counter_desc ptp_ch_stats_desc[] = {
 #define NUM_CH_STATS			ARRAY_SIZE(ch_stats_desc)
 #define NUM_PTP_SQ_STATS		ARRAY_SIZE(ptp_sq_stats_desc)
 #define NUM_PTP_CH_STATS		ARRAY_SIZE(ptp_ch_stats_desc)
+#define NUM_PTP_CQ_STATS		ARRAY_SIZE(ptp_cq_stats_desc)
 
 static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(ptp)
 {
 	return priv->port_ptp_opened ?
-	       NUM_PTP_CH_STATS + (NUM_PTP_SQ_STATS * priv->max_opened_tc) :
+	       NUM_PTP_CH_STATS +
+	       ((NUM_PTP_SQ_STATS + NUM_PTP_CQ_STATS) * priv->max_opened_tc) :
 	       0;
 }
 
@@ -1766,6 +1775,10 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(ptp)
 			sprintf(data + (idx++) * ETH_GSTRING_LEN,
 				ptp_sq_stats_desc[i].format, tc);
 
+	for (tc = 0; tc < priv->max_opened_tc; tc++)
+		for (i = 0; i < NUM_PTP_CQ_STATS; i++)
+			sprintf(data + (idx++) * ETH_GSTRING_LEN,
+				ptp_cq_stats_desc[i].format, tc);
 	return idx;
 }
 
@@ -1787,6 +1800,12 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(ptp)
 				MLX5E_READ_CTR64_CPU(&priv->port_ptp_stats.sq[tc],
 						     ptp_sq_stats_desc, i);
 
+	for (tc = 0; tc < priv->max_opened_tc; tc++)
+		for (i = 0; i < NUM_PTP_CQ_STATS; i++)
+			data[idx++] =
+				MLX5E_READ_CTR64_CPU(&priv->port_ptp_stats.cq[tc],
+						     ptp_cq_stats_desc, i);
+
 	return idx;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 98ffebcc93b9..e41fc11f2ce7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -53,6 +53,7 @@
 
 #define MLX5E_DECLARE_PTP_TX_STAT(type, fld) "ptp_tx%d_"#fld, offsetof(type, fld)
 #define MLX5E_DECLARE_PTP_CH_STAT(type, fld) "ptp_ch_"#fld, offsetof(type, fld)
+#define MLX5E_DECLARE_PTP_CQ_STAT(type, fld) "ptp_cq%d_"#fld, offsetof(type, fld)
 
 struct counter_desc {
 	char		format[ETH_GSTRING_LEN];
@@ -401,6 +402,13 @@ struct mlx5e_ch_stats {
 	u64 eq_rearm;
 };
 
+struct mlx5e_ptp_cq_stats {
+	u64 cqe;
+	u64 err_cqe;
+	u64 abort;
+	u64 abort_abs_diff_ns;
+};
+
 struct mlx5e_stats {
 	struct mlx5e_sw_stats sw;
 	struct mlx5e_qcounter_stats qcnt;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 0ae68cb25035..5736645842f5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -449,6 +449,12 @@ mlx5e_txwqe_complete(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 
 	mlx5e_tx_check_stop(sq);
 
+	if (unlikely(sq->ptpsq)) {
+		mlx5e_skb_cb_hwtstamp_init(skb);
+		mlx5e_skb_fifo_push(&sq->ptpsq->skb_fifo, skb);
+		skb_get(skb);
+	}
+
 	send_doorbell = __netdev_tx_sent_queue(sq->txq, attr->num_bytes, xmit_more);
 	if (send_doorbell)
 		mlx5e_notify_hw(wq, sq->pc, sq->uar_map, cseg);
@@ -753,7 +759,11 @@ static void mlx5e_consume_skb(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 		u64 ts = get_cqe_ts(cqe);
 
 		hwts.hwtstamp = mlx5_timecounter_cyc2time(sq->clock, ts);
-		skb_tstamp_tx(skb, &hwts);
+		if (sq->ptpsq)
+			mlx5e_skb_cb_hwtstamp_handler(skb, MLX5E_SKB_CB_CQE_HWTSTAMP,
+						      hwts.hwtstamp, sq->ptpsq->cq_stats);
+		else
+			skb_tstamp_tx(skb, &hwts);
 	}
 
 	napi_consume_skb(skb, napi_budget);
-- 
2.26.2



* [net-next V2 10/15] net/mlx5e: remove unnecessary memset
  2020-12-03  4:20 [pull request][net-next V2 00/15] mlx5 updates 2020-12-01 Saeed Mahameed
                   ` (8 preceding siblings ...)
  2020-12-03  4:21 ` [net-next V2 09/15] net/mlx5e: Add TX port timestamp support Saeed Mahameed
@ 2020-12-03  4:21 ` Saeed Mahameed
  2020-12-03  4:21 ` [net-next V2 11/15] net/mlx5e: Remove duplicated include Saeed Mahameed
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-03  4:21 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: David S. Miller, netdev, Zhu Yanjun, Saeed Mahameed

From: Zhu Yanjun <yanjunz@nvidia.com>

Since kvzalloc() already zero-initializes the allocated memory, there is
no need to clear it again with memset().

Fixes: 11b717d61526 ("net/mlx5: E-Switch, Get reg_c0 value on CQE")
Signed-off-by: Zhu Yanjun <yanjunz@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index c9c2962ad49f..f68a4afeefb8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -1680,7 +1680,6 @@ static int esw_create_restore_table(struct mlx5_eswitch *esw)
 		goto out_free;
 	}
 
-	memset(flow_group_in, 0, inlen);
 	match_criteria = MLX5_ADDR_OF(create_flow_group_in, flow_group_in,
 				      match_criteria);
 	misc = MLX5_ADDR_OF(fte_match_param, match_criteria,
-- 
2.26.2



* [net-next V2 11/15] net/mlx5e: Remove duplicated include
  2020-12-03  4:20 [pull request][net-next V2 00/15] mlx5 updates 2020-12-01 Saeed Mahameed
                   ` (9 preceding siblings ...)
  2020-12-03  4:21 ` [net-next V2 10/15] net/mlx5e: remove unnecessary memset Saeed Mahameed
@ 2020-12-03  4:21 ` Saeed Mahameed
  2020-12-03  4:21 ` [net-next V2 12/15] net/mlx5: Arm only EQs with EQEs Saeed Mahameed
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-03  4:21 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: David S. Miller, netdev, YueHaibing, Saeed Mahameed

From: YueHaibing <yuehaibing@huawei.com>

Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 5c0015024f62..7f5851c61218 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -52,7 +52,6 @@
 #include "en/xsk/rx.h"
 #include "en/health.h"
 #include "en/params.h"
-#include "en/txrx.h"
 
 static struct sk_buff *
 mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
-- 
2.26.2



* [net-next V2 12/15] net/mlx5: Arm only EQs with EQEs
  2020-12-03  4:20 [pull request][net-next V2 00/15] mlx5 updates 2020-12-01 Saeed Mahameed
                   ` (10 preceding siblings ...)
  2020-12-03  4:21 ` [net-next V2 11/15] net/mlx5e: Remove duplicated include Saeed Mahameed
@ 2020-12-03  4:21 ` Saeed Mahameed
  2020-12-03  4:21 ` [net-next V2 13/15] net/mlx5: Fix passing zero to 'PTR_ERR' Saeed Mahameed
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-03  4:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, netdev, Shay Drory, Parav Pandit, Saeed Mahameed

From: Shay Drory <shayd@nvidia.com>

Currently, when more than one EQ shares an IRQ and that IRQ fires, all
the EQs sharing the IRQ are armed, regardless of whether an EQ has any
EQEs.
When multiple EQs share an IRQ, one or more of them can have valid EQEs
while the rest have none. Arm only the EQs that have EQEs.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
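For illustration only, a minimal user-space sketch of the behaviour this
patch aims for. It is not the driver code: struct toy_eq, toy_eq_poll()
and toy_irq_handler() are hypothetical stand-ins for an EQ, its interrupt
handler and the notifier chain of a shared IRQ, and TOY_POLLING_BUDGET
only mimics MLX5_EQ_POLLING_BUDGET. A handler that finds no EQE returns
early, so its EQ is neither updated nor armed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_POLLING_BUDGET 128	/* stand-in for MLX5_EQ_POLLING_BUDGET */

/* Hypothetical model of an EQ: 'pending' counts unconsumed EQEs. */
struct toy_eq {
	unsigned int pending;
	unsigned int cons_index;
	unsigned int arm_count;	/* how many times this EQ was armed */
};

/* Per-EQ handler: consume up to the budget, arm only if work was done. */
static int toy_eq_poll(struct toy_eq *eq)
{
	int num_eqes = 0;

	assert(eq != NULL);

	if (!eq->pending)
		return 0;	/* no EQE: leave the EQ untouched */

	do {
		eq->pending--;
		eq->cons_index++;
	} while (++num_eqes < TOY_POLLING_BUDGET && eq->pending);

	eq->arm_count++;	/* models eq_update_ci(eq, 1) */
	return num_eqes;
}

/* One interrupt on an IRQ shared by n EQs: every handler is called. */
static int toy_irq_handler(struct toy_eq *eqs, size_t n)
{
	int total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += toy_eq_poll(&eqs[i]);
	return total;
}
```

With three EQs on one IRQ, of which the middle one is empty, only the
first and the last are armed after the interrupt.
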
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 4ea5d6ddf56a..fc0afa03d407 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -136,7 +136,7 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
 
 	eqe = next_eqe_sw(eq);
 	if (!eqe)
-		goto out;
+		return 0;
 
 	do {
 		struct mlx5_core_cq *cq;
@@ -161,8 +161,6 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
 		++eq->cons_index;
 
 	} while ((++num_eqes < MLX5_EQ_POLLING_BUDGET) && (eqe = next_eqe_sw(eq)));
-
-out:
 	eq_update_ci(eq, 1);
 
 	if (cqn != -1)
@@ -250,9 +248,9 @@ static int mlx5_eq_async_int(struct notifier_block *nb,
 		++eq->cons_index;
 
 	} while ((++num_eqes < MLX5_EQ_POLLING_BUDGET) && (eqe = next_eqe_sw(eq)));
+	eq_update_ci(eq, 1);
 
 out:
-	eq_update_ci(eq, 1);
 	mlx5_eq_async_int_unlock(eq_async, recovery, &flags);
 
 	return unlikely(recovery) ? num_eqes : 0;
-- 
2.26.2



* [net-next V2 13/15] net/mlx5: Fix passing zero to 'PTR_ERR'
  2020-12-03  4:20 [pull request][net-next V2 00/15] mlx5 updates 2020-12-01 Saeed Mahameed
                   ` (11 preceding siblings ...)
  2020-12-03  4:21 ` [net-next V2 12/15] net/mlx5: Arm only EQs with EQEs Saeed Mahameed
@ 2020-12-03  4:21 ` Saeed Mahameed
  2020-12-03  4:21 ` [net-next V2 14/15] net/mlx5e: Split between RX/TX tunnel FW support indication Saeed Mahameed
  2020-12-03  4:21 ` [net-next V2 15/15] net/mlx5e: Fill mlx5e_create_cq_param in a function Saeed Mahameed
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-03  4:21 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: David S. Miller, netdev, YueHaibing, Saeed Mahameed

From: YueHaibing <yuehaibing@huawei.com>

Fix smatch warnings:

drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c:105 esw_acl_egress_lgcy_setup() warn: passing zero to 'PTR_ERR'
drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c:177 esw_acl_egress_ofld_setup() warn: passing zero to 'PTR_ERR'
drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c:184 esw_acl_ingress_lgcy_setup() warn: passing zero to 'PTR_ERR'
drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_ofld.c:262 esw_acl_ingress_ofld_setup() warn: passing zero to 'PTR_ERR'

esw_acl_table_create() never returns NULL, so the
NULL test should be removed.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
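For illustration only, a user-space sketch of why smatch complains. The
helpers below are simplified local re-implementations that mirror the
kernel's ERR_PTR()/PTR_ERR()/IS_ERR() from include/linux/err.h, and
toy_table_create()/toy_setup() are hypothetical. The point is that
PTR_ERR(NULL) evaluates to 0, so an IS_ERR_OR_NULL() check followed by
PTR_ERR() would report success if the pointer were ever NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (simplified local copy). */
static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* Decode a pointer back into a long; a NULL pointer decodes to 0. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for pointers in the top MAX_ERRNO addresses. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Hypothetical creator that returns a valid pointer or ERR_PTR, never NULL. */
static int toy_table;

static void *toy_table_create(int fail)
{
	return fail ? ERR_PTR(-ENOMEM) : (void *)&toy_table;
}

/* Caller in the style used after this patch: IS_ERR() is sufficient. */
static int toy_setup(int fail)
{
	void *acl = toy_table_create(fail);

	if (IS_ERR(acl))
		return (int)PTR_ERR(acl);
	return 0;
}
```

Since the creator never returns NULL, IS_ERR() covers every failure and
PTR_ERR() is only ever applied to a real error pointer.
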
 drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c  | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c  | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_ofld.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c
index d46f8b225ebe..2b85d4777303 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c
@@ -101,7 +101,7 @@ int esw_acl_egress_lgcy_setup(struct mlx5_eswitch *esw,
 	vport->egress.acl = esw_acl_table_create(esw, vport->vport,
 						 MLX5_FLOW_NAMESPACE_ESW_EGRESS,
 						 table_size);
-	if (IS_ERR_OR_NULL(vport->egress.acl)) {
+	if (IS_ERR(vport->egress.acl)) {
 		err = PTR_ERR(vport->egress.acl);
 		vport->egress.acl = NULL;
 		goto out;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c
index c3faae67e4d6..4c74e2690d57 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c
@@ -173,7 +173,7 @@ int esw_acl_egress_ofld_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport
 		table_size++;
 	vport->egress.acl = esw_acl_table_create(esw, vport->vport,
 						 MLX5_FLOW_NAMESPACE_ESW_EGRESS, table_size);
-	if (IS_ERR_OR_NULL(vport->egress.acl)) {
+	if (IS_ERR(vport->egress.acl)) {
 		err = PTR_ERR(vport->egress.acl);
 		vport->egress.acl = NULL;
 		return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c
index b68976b378b8..d64fad2823e7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c
@@ -180,7 +180,7 @@ int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw,
 		vport->ingress.acl = esw_acl_table_create(esw, vport->vport,
 							  MLX5_FLOW_NAMESPACE_ESW_INGRESS,
 							  table_size);
-		if (IS_ERR_OR_NULL(vport->ingress.acl)) {
+		if (IS_ERR(vport->ingress.acl)) {
 			err = PTR_ERR(vport->ingress.acl);
 			vport->ingress.acl = NULL;
 			return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_ofld.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_ofld.c
index 4e55d7225a26..548c005ea633 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_ofld.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_ofld.c
@@ -258,7 +258,7 @@ int esw_acl_ingress_ofld_setup(struct mlx5_eswitch *esw,
 	vport->ingress.acl = esw_acl_table_create(esw, vport->vport,
 						  MLX5_FLOW_NAMESPACE_ESW_INGRESS,
 						  num_ftes);
-	if (IS_ERR_OR_NULL(vport->ingress.acl)) {
+	if (IS_ERR(vport->ingress.acl)) {
 		err = PTR_ERR(vport->ingress.acl);
 		vport->ingress.acl = NULL;
 		return err;
-- 
2.26.2



* [net-next V2 14/15] net/mlx5e: Split between RX/TX tunnel FW support indication
  2020-12-03  4:20 [pull request][net-next V2 00/15] mlx5 updates 2020-12-01 Saeed Mahameed
                   ` (12 preceding siblings ...)
  2020-12-03  4:21 ` [net-next V2 13/15] net/mlx5: Fix passing zero to 'PTR_ERR' Saeed Mahameed
@ 2020-12-03  4:21 ` Saeed Mahameed
  2020-12-03  4:21 ` [net-next V2 15/15] net/mlx5e: Fill mlx5e_create_cq_param in a function Saeed Mahameed
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-03  4:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, netdev, Aya Levin, Moshe Shemesh, Saeed Mahameed

From: Aya Levin <ayal@nvidia.com>

Use the new FW caps to advertise ip-in-ip tunnel support separately
for RX and TX.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
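For illustration only, a user-space sketch of the capability split. The
struct toy_caps fields are hypothetical stand-ins for the FW capability
bits read with MLX5_CAP_ETH(), and the protocol numbers are redefined
locally. Each direction accepts either the older combined ip-over-ip
bit or its own direction-specific bit, while GRE is unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* IANA protocol numbers, redefined locally to keep the sketch portable. */
enum {
	TOY_IPPROTO_IPIP = 4,
	TOY_IPPROTO_IPV6 = 41,
	TOY_IPPROTO_GRE  = 47,
};

/* Hypothetical snapshot of the relevant FW capability bits. */
struct toy_caps {
	bool gre;		/* models tunnel_stateless_gre */
	bool ip_over_ip;	/* models tunnel_stateless_ip_over_ip */
	bool ip_over_ip_rx;	/* models tunnel_stateless_ip_over_ip_rx */
	bool ip_over_ip_tx;	/* models tunnel_stateless_ip_over_ip_tx */
};

static bool toy_tunnel_proto_supported_rx(const struct toy_caps *caps, int proto)
{
	assert(caps != NULL);

	switch (proto) {
	case TOY_IPPROTO_GRE:
		return caps->gre;
	case TOY_IPPROTO_IPIP:
	case TOY_IPPROTO_IPV6:
		return caps->ip_over_ip || caps->ip_over_ip_rx;
	default:
		return false;
	}
}

static bool toy_tunnel_proto_supported_tx(const struct toy_caps *caps, int proto)
{
	assert(caps != NULL);

	switch (proto) {
	case TOY_IPPROTO_GRE:
		return caps->gre;
	case TOY_IPPROTO_IPIP:
	case TOY_IPPROTO_IPV6:
		return caps->ip_over_ip || caps->ip_over_ip_tx;
	default:
		return false;
	}
}
```

A device that sets only the RX bit would get the RX steering rules but
would not advertise the IPXIP GSO features, which depend on the TX check.
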
 .../net/ethernet/mellanox/mlx5/core/en/fs.h   |  3 +-
 .../net/ethernet/mellanox/mlx5/core/en_fs.c   | 20 +++++++----
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 34 ++++++++++++++++---
 3 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
index dc744702aee4..5749557749b0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
@@ -287,8 +287,7 @@ void mlx5e_disable_cvlan_filter(struct mlx5e_priv *priv);
 int mlx5e_create_flow_steering(struct mlx5e_priv *priv);
 void mlx5e_destroy_flow_steering(struct mlx5e_priv *priv);
 
-bool mlx5e_tunnel_proto_supported(struct mlx5_core_dev *mdev, u8 proto_type);
-bool mlx5e_any_tunnel_proto_supported(struct mlx5_core_dev *mdev);
+u8 mlx5e_get_proto_by_tunnel_type(enum mlx5e_tunnel_types tt);
 
 #endif /* __MLX5E_FLOW_STEER_H__ */
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index 1f48f99c0997..fa8149f6eb08 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -772,25 +772,31 @@ static struct mlx5e_etype_proto ttc_tunnel_rules[] = {
 
 };
 
-bool mlx5e_tunnel_proto_supported(struct mlx5_core_dev *mdev, u8 proto_type)
+u8 mlx5e_get_proto_by_tunnel_type(enum mlx5e_tunnel_types tt)
+{
+	return ttc_tunnel_rules[tt].proto;
+}
+
+static bool mlx5e_tunnel_proto_supported_rx(struct mlx5_core_dev *mdev, u8 proto_type)
 {
 	switch (proto_type) {
 	case IPPROTO_GRE:
 		return MLX5_CAP_ETH(mdev, tunnel_stateless_gre);
 	case IPPROTO_IPIP:
 	case IPPROTO_IPV6:
-		return MLX5_CAP_ETH(mdev, tunnel_stateless_ip_over_ip);
+		return (MLX5_CAP_ETH(mdev, tunnel_stateless_ip_over_ip) ||
+			MLX5_CAP_ETH(mdev, tunnel_stateless_ip_over_ip_rx));
 	default:
 		return false;
 	}
 }
 
-bool mlx5e_any_tunnel_proto_supported(struct mlx5_core_dev *mdev)
+static bool mlx5e_tunnel_any_rx_proto_supported(struct mlx5_core_dev *mdev)
 {
 	int tt;
 
 	for (tt = 0; tt < MLX5E_NUM_TUNNEL_TT; tt++) {
-		if (mlx5e_tunnel_proto_supported(mdev, ttc_tunnel_rules[tt].proto))
+		if (mlx5e_tunnel_proto_supported_rx(mdev, ttc_tunnel_rules[tt].proto))
 			return true;
 	}
 	return false;
@@ -798,7 +804,7 @@ bool mlx5e_any_tunnel_proto_supported(struct mlx5_core_dev *mdev)
 
 bool mlx5e_tunnel_inner_ft_supported(struct mlx5_core_dev *mdev)
 {
-	return (mlx5e_any_tunnel_proto_supported(mdev) &&
+	return (mlx5e_tunnel_any_rx_proto_supported(mdev) &&
 		MLX5_CAP_FLOWTABLE_NIC_RX(mdev, ft_field_support.inner_ip_version));
 }
 
@@ -899,8 +905,8 @@ static int mlx5e_generate_ttc_table_rules(struct mlx5e_priv *priv,
 	dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
 	dest.ft   = params->inner_ttc->ft.t;
 	for (tt = 0; tt < MLX5E_NUM_TUNNEL_TT; tt++) {
-		if (!mlx5e_tunnel_proto_supported(priv->mdev,
-						  ttc_tunnel_rules[tt].proto))
+		if (!mlx5e_tunnel_proto_supported_rx(priv->mdev,
+						     ttc_tunnel_rules[tt].proto))
 			continue;
 		trules[tt] = mlx5e_generate_ttc_rule(priv, ft, &dest,
 						     ttc_tunnel_rules[tt].etype,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index fd12d906d239..26be6eb44fed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4279,6 +4279,20 @@ int mlx5e_get_vf_stats(struct net_device *dev,
 }
 #endif
 
+static bool mlx5e_tunnel_proto_supported_tx(struct mlx5_core_dev *mdev, u8 proto_type)
+{
+	switch (proto_type) {
+	case IPPROTO_GRE:
+		return MLX5_CAP_ETH(mdev, tunnel_stateless_gre);
+	case IPPROTO_IPIP:
+	case IPPROTO_IPV6:
+		return (MLX5_CAP_ETH(mdev, tunnel_stateless_ip_over_ip) ||
+			MLX5_CAP_ETH(mdev, tunnel_stateless_ip_over_ip_tx));
+	default:
+		return false;
+	}
+}
+
 static bool mlx5e_gre_tunnel_inner_proto_offload_supported(struct mlx5_core_dev *mdev,
 							   struct sk_buff *skb)
 {
@@ -4321,7 +4335,7 @@ static netdev_features_t mlx5e_tunnel_features_check(struct mlx5e_priv *priv,
 		break;
 	case IPPROTO_IPIP:
 	case IPPROTO_IPV6:
-		if (mlx5e_tunnel_proto_supported(priv->mdev, IPPROTO_IPIP))
+		if (mlx5e_tunnel_proto_supported_tx(priv->mdev, IPPROTO_IPIP))
 			return features;
 		break;
 	case IPPROTO_UDP:
@@ -4906,6 +4920,17 @@ void mlx5e_vxlan_set_netdev_info(struct mlx5e_priv *priv)
 	priv->netdev->udp_tunnel_nic_info = &priv->nic_info;
 }
 
+static bool mlx5e_tunnel_any_tx_proto_supported(struct mlx5_core_dev *mdev)
+{
+	int tt;
+
+	for (tt = 0; tt < MLX5E_NUM_TUNNEL_TT; tt++) {
+		if (mlx5e_tunnel_proto_supported_tx(mdev, mlx5e_get_proto_by_tunnel_type(tt)))
+			return true;
+	}
+	return (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev));
+}
+
 static void mlx5e_build_nic_netdev(struct net_device *netdev)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
@@ -4951,8 +4976,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 
 	mlx5e_vxlan_set_netdev_info(priv);
 
-	if (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev) ||
-	    mlx5e_any_tunnel_proto_supported(mdev)) {
+	if (mlx5e_tunnel_any_tx_proto_supported(mdev)) {
 		netdev->hw_enc_features |= NETIF_F_HW_CSUM;
 		netdev->hw_enc_features |= NETIF_F_TSO;
 		netdev->hw_enc_features |= NETIF_F_TSO6;
@@ -4969,7 +4993,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 					 NETIF_F_GSO_UDP_TUNNEL_CSUM;
 	}
 
-	if (mlx5e_tunnel_proto_supported(mdev, IPPROTO_GRE)) {
+	if (mlx5e_tunnel_proto_supported_tx(mdev, IPPROTO_GRE)) {
 		netdev->hw_features     |= NETIF_F_GSO_GRE |
 					   NETIF_F_GSO_GRE_CSUM;
 		netdev->hw_enc_features |= NETIF_F_GSO_GRE |
@@ -4978,7 +5002,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 						NETIF_F_GSO_GRE_CSUM;
 	}
 
-	if (mlx5e_tunnel_proto_supported(mdev, IPPROTO_IPIP)) {
+	if (mlx5e_tunnel_proto_supported_tx(mdev, IPPROTO_IPIP)) {
 		netdev->hw_features |= NETIF_F_GSO_IPXIP4 |
 				       NETIF_F_GSO_IPXIP6;
 		netdev->hw_enc_features |= NETIF_F_GSO_IPXIP4 |
-- 
2.26.2



* [net-next V2 15/15] net/mlx5e: Fill mlx5e_create_cq_param in a function
  2020-12-03  4:20 [pull request][net-next V2 00/15] mlx5 updates 2020-12-01 Saeed Mahameed
                   ` (13 preceding siblings ...)
  2020-12-03  4:21 ` [net-next V2 14/15] net/mlx5e: Split between RX/TX tunnel FW support indication Saeed Mahameed
@ 2020-12-03  4:21 ` Saeed Mahameed
  14 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-03  4:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, netdev, Maxim Mikityanskiy, Tariq Toukan,
	Saeed Mahameed

From: Maxim Mikityanskiy <maximmi@mellanox.com>

Create a function to fill the fields of struct mlx5e_create_cq_param
based on a channel. The purpose is code reuse between normal CQs, XSK
CQs and the upcoming QoS CQs.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
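For illustration only, a user-space sketch of why the callers can drop
their "= {}" initializers. Assigning a compound literal with designated
initializers zero-initializes every member that is not named, so the
helper fully overwrites whatever the caller's struct held before. The
toy_* types are hypothetical, and "cpu / 2" merely stands in for
cpu_to_node().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parameter block with one field the builder never names. */
struct toy_cq_param {
	void *napi;
	void *ch_stats;
	int node;
	int ix;
	int extra;
};

/* Hypothetical channel providing the source values. */
struct toy_channel {
	int napi;
	int stats;
	int cpu;
	int ix;
};

/* Fill *ccp from the channel; members not listed become zero. */
static void toy_build_cq_param(struct toy_cq_param *ccp, struct toy_channel *c)
{
	assert(ccp != NULL && c != NULL);

	*ccp = (struct toy_cq_param) {
		.napi = &c->napi,
		.ch_stats = &c->stats,
		.node = c->cpu / 2,	/* stand-in for cpu_to_node(c->cpu) */
		.ix = c->ix,
	};
}
```

Even if the destination starts out filled with garbage, the member that
the builder does not mention reads back as zero after the call.
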
 .../net/ethernet/mellanox/mlx5/core/en/params.h |  1 +
 .../ethernet/mellanox/mlx5/core/en/xsk/setup.c  |  7 ++-----
 .../net/ethernet/mellanox/mlx5/core/en_main.c   | 17 ++++++++++++-----
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
index 3959254d4181..807147d97a0f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
@@ -111,6 +111,7 @@ u16 mlx5e_get_rq_headroom(struct mlx5_core_dev *mdev,
 
 /* Build queue parameters */
 
+void mlx5e_build_create_cq_param(struct mlx5e_create_cq_param *ccp, struct mlx5e_channel *c);
 void mlx5e_build_rq_param(struct mlx5e_priv *priv,
 			  struct mlx5e_params *params,
 			  struct mlx5e_xsk_param *xsk,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
index 7703e6553da6..d87c345878d3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
@@ -48,14 +48,11 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
 		   struct mlx5e_xsk_param *xsk, struct xsk_buff_pool *pool,
 		   struct mlx5e_channel *c)
 {
-	struct mlx5e_create_cq_param ccp = {};
 	struct mlx5e_channel_param *cparam;
+	struct mlx5e_create_cq_param ccp;
 	int err;
 
-	ccp.napi = &c->napi;
-	ccp.ch_stats = c->stats;
-	ccp.node = cpu_to_node(c->cpu);
-	ccp.ix = c->ix;
+	mlx5e_build_create_cq_param(&ccp, c);
 
 	if (!mlx5e_validate_xsk_param(params, xsk, priv->mdev))
 		return -EINVAL;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 26be6eb44fed..e573a82ce037 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1806,18 +1806,25 @@ static int mlx5e_set_tx_maxrate(struct net_device *dev, int index, u32 rate)
 	return err;
 }
 
+void mlx5e_build_create_cq_param(struct mlx5e_create_cq_param *ccp, struct mlx5e_channel *c)
+{
+	*ccp = (struct mlx5e_create_cq_param) {
+		.napi = &c->napi,
+		.ch_stats = c->stats,
+		.node = cpu_to_node(c->cpu),
+		.ix = c->ix,
+	};
+}
+
 static int mlx5e_open_queues(struct mlx5e_channel *c,
 			     struct mlx5e_params *params,
 			     struct mlx5e_channel_param *cparam)
 {
 	struct dim_cq_moder icocq_moder = {0, 0};
-	struct mlx5e_create_cq_param ccp = {};
+	struct mlx5e_create_cq_param ccp;
 	int err;
 
-	ccp.napi = &c->napi;
-	ccp.ch_stats = c->stats;
-	ccp.node = cpu_to_node(c->cpu);
-	ccp.ix = c->ix;
+	mlx5e_build_create_cq_param(&ccp, c);
 
 	err = mlx5e_open_cq(c->priv, icocq_moder, &cparam->icosq.cqp, &ccp,
 			    &c->async_icosq.cq);
-- 
2.26.2



* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-03  4:21 ` [net-next V2 08/15] net/mlx5e: Add TX PTP port object support Saeed Mahameed
@ 2020-12-04  2:29   ` Jakub Kicinski
  2020-12-04 19:33     ` Saeed Mahameed
  0 siblings, 1 reply; 41+ messages in thread
From: Jakub Kicinski @ 2020-12-04  2:29 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: David S. Miller, netdev, Eran Ben Elisha, Tariq Toukan

On Wed, 2 Dec 2020 20:21:01 -0800 Saeed Mahameed wrote:
> Add TX PTP port object support for better TX timestamping accuracy.
> Currently, driver supports CQE based TX port timestamp. Device
> also offers TX port timestamp, which has less jitter and better
> reflects the actual time of a packet's transmit.

How much better is it?

Is the new implementation standard compliant or just a "better
guess"?

> Define new driver layout called ptpsq, on which driver will create
> SQs that will support TX port timestamp for their transmitted packets.
> Driver to identify PTP TX skbs and steer them to these dedicated SQs
> as part of the select queue ndo.
> 
> Driver to hold ptpsq per TC and report them at
> netif_set_real_num_tx_queues().
> 
> Add support for all needed functionality in order to xmit and poll
> completions received via ptpsq.
> 
> Add ptpsq to the TX reporter recover, diagnose and dump methods.
> 
> Creation of ptpsqs is disabled by default, and can be enabled via
> tx_port_ts private flag.

This flag is pretty bad user experience.

> This patch steer all timestamp related packets to a ptpsq, but it
> does not open the port timestamp support for it. The support will
> be added in the following patch.

Overall I'm a little shocked by this, let me sleep on it :)

More info on the trade offs and considerations which led to the
implementation would be useful.


* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-04  2:29   ` Jakub Kicinski
@ 2020-12-04 19:33     ` Saeed Mahameed
  2020-12-04 20:26       ` Jakub Kicinski
  0 siblings, 1 reply; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-04 19:33 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: David S. Miller, netdev, Eran Ben Elisha, Tariq Toukan

On Thu, 2020-12-03 at 18:29 -0800, Jakub Kicinski wrote:
> On Wed, 2 Dec 2020 20:21:01 -0800 Saeed Mahameed wrote:
> > Add TX PTP port object support for better TX timestamping accuracy.
> > Currently, driver supports CQE based TX port timestamp. Device
> > also offers TX port timestamp, which has less jitter and better
> > reflects the actual time of a packet's transmit.
> 
> How much better is it?
> 
> Is the new implementation standard compliant or just a "better
> guess"?
> 

It is not a guess for sure. The closer to the output port you take the
stamp, the more accurate you get; this is why we need the HW timestamp
in the first place. I don't have the exact number, but we target
compliance with G.8273.2 class C (30 nsec), and this code allows
Linux systems to be deployed in the 5G telco edge, where this standard
is needed.

> > Define new driver layout called ptpsq, on which driver will create
> > SQs that will support TX port timestamp for their transmitted
> > packets.
> > Driver to identify PTP TX skbs and steer them to these dedicated
> > SQs
> > as part of the select queue ndo.
> > 
> > Driver to hold ptpsq per TC and report them at
> > netif_set_real_num_tx_queues().
> > 
> > Add support for all needed functionality in order to xmit and poll
> > completions received via ptpsq.
> > 
> > Add ptpsq to the TX reporter recover, diagnose and dump methods.
> > 
> > Creation of ptpsqs is disabled by default, and can be enabled via
> > tx_port_ts private flag.
> 
> This flag is pretty bad user experience.
> 

Yeah, nothing I could do about this. There is a large memory footprint
I want to avoid, and we don't want to complicate PTP ctrl API of
the HW operating mode, so until we improve the HW, we prefer to keep
this feature as a private flag.

> > This patch steer all timestamp related packets to a ptpsq, but it
> > does not open the port timestamp support for it. The support will
> > be added in the following patch.
> 
> Overall I'm a little shocked by this, let me sleep on it :)
> 
> More info on the trade offs and considerations which led to the
> implementation would be useful.

To get the improved accuracy we need a special type of SQ attached to
special HW objects that will provide more accurate stamping.

The trade-offs are:

Option 1) Convert ALL regular txqs (SQs) to work in this port stamping
mode.

Pros: no need for any special mode in the driver, and no additional
memory other than the new HW objects we create for the special stamping.

Cons: significant performance hit for non-PTP traffic (the HW stamps
all packets in the slow but more accurate mode).

Option 2) Route PTP traffic to a special SQ per ring. This SQ will be
PTP port accurate; normal traffic will continue through regular SQs.

Pros: regular non-PTP traffic is not affected.
Cons: high memory footprint for creating the special SQs.


So we prefer (2) plus a private flag, to avoid the performance hit and
the redundant memory usage out of the box.
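
For illustration only, option (2) could be sketched in user space as
below. This is not the driver's select-queue code: struct toy_txq_layout
and both functions are hypothetical, the caller is assumed to have
already classified the packet as PTP, and placing the per-TC PTP queues
after the regular ones is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical TX queue layout for one netdev. */
struct toy_txq_layout {
	unsigned int num_channels;
	unsigned int num_tc;
	bool tx_port_ts;	/* models the tx_port_ts private flag */
};

/* Total TX queues: the extra per-TC PTP SQs exist only when enabled. */
static unsigned int toy_num_txqs(const struct toy_txq_layout *l)
{
	unsigned int n;

	assert(l != NULL);

	n = l->num_channels * l->num_tc;
	if (l->tx_port_ts)
		n += l->num_tc;
	return n;
}

/* Pick a queue: PTP packets go to the PTP SQ of their TC, if enabled. */
static unsigned int toy_select_queue(const struct toy_txq_layout *l,
				     unsigned int channel, unsigned int tc,
				     bool is_ptp)
{
	assert(l != NULL && channel < l->num_channels && tc < l->num_tc);

	if (l->tx_port_ts && is_ptp)
		return l->num_channels * l->num_tc + tc;

	return tc * l->num_channels + channel;
}
```

With the flag off nothing extra is allocated and PTP packets use the
regular queues; with it on, only PTP packets leave the regular path.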






* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-04 19:33     ` Saeed Mahameed
@ 2020-12-04 20:26       ` Jakub Kicinski
  2020-12-04 21:57         ` Saeed Mahameed
  2020-12-05  1:49         ` Vladimir Oltean
  0 siblings, 2 replies; 41+ messages in thread
From: Jakub Kicinski @ 2020-12-04 20:26 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev, Eran Ben Elisha, Tariq Toukan,
	Richard Cochran, Vladimir Oltean, Willem de Bruijn

On Fri, 04 Dec 2020 11:33:26 -0800 Saeed Mahameed wrote:
> On Thu, 2020-12-03 at 18:29 -0800, Jakub Kicinski wrote:
> > On Wed, 2 Dec 2020 20:21:01 -0800 Saeed Mahameed wrote:  
> > > Add TX PTP port object support for better TX timestamping accuracy.
> > > Currently, driver supports CQE based TX port timestamp. Device
> > > also offers TX port timestamp, which has less jitter and better
> > > reflects the actual time of a packet's transmit.  
> > 
> > How much better is it?
> > 
> > Is the new implementation is standard compliant or just a "better
> > guess"?
> 
> It is not a guess for sure, the closer to the output port you take the
> stamp the more accurate you get, this is why we need the HW timestamp
> in first place, i don't have the exact number though, but we target to
> be compliant with G.8273.2 class C, (30 nsec), and this code allow
> Linux systems to be deployed in the 5G telco edge. Where this standard
> is needed.

I see. IIRC there was also an IEEE standard which specified the exact
time stamping point (i.e. SFD crosses layer X). If it's class C that
answers the question, I think.

> > > Define new driver layout called ptpsq, on which driver will create
> > > SQs that will support TX port timestamp for their transmitted
> > > packets.
> > > Driver to identify PTP TX skbs and steer them to these dedicated
> > > SQs
> > > as part of the select queue ndo.
> > > 
> > > Driver to hold ptpsq per TC and report them at
> > > netif_set_real_num_tx_queues().
> > > 
> > > Add support for all needed functionality in order to xmit and poll
> > > completions received via ptpsq.
> > > 
> > > Add ptpsq to the TX reporter recover, diagnose and dump methods.
> > > 
> > > Creation of ptpsqs is disabled by default, and can be enabled via
> > > tx_port_ts private flag.  
> > 
> > This flag is pretty bad user experience.
> 
> Yeah, nothing i  could do about this, there is a large memory foot
> print i want to avoid, and we don't want to complicate PTP ctrl API of
> the HW operating mode, so until we improve the HW, we prefer to keep
> this feature as a private flag.
> 
> > > This patch steer all timestamp related packets to a ptpsq, but it
> > > does not open the port timestamp support for it. The support will
> > > be added in the following patch.  
> > 
> > Overall I'm a little shocked by this, let me sleep on it :)
> > 
> > More info on the trade offs and considerations which led to the
> > implementation would be useful.  
> 
> To get the Improved accuracy we need a special type of SQs attached to
> special HW objects that will provide more accurate stamping.
> 
> Trade-offs are :
> 
> options 1) convert ALL regular txqs (SQs) to work in this port stamping
> mode.
> 
> Pros: no need for any special mode in driver, no additional memory,
> other than the new HW objects we create for the special stamping.
> 
> Cons: significant performance hit for non PTP traffic, (the hw stamps
> all packets in the slow but more accurate mode)

Just to be clear (Alexei brought this up when I mentioned these
patches) - the requirement for the separate queues is because the time
stamp enable is a queue property, not a per WQE / frame thing? I
couldn't find this in the code - could you point me to where it's set?

> option 2) route PTP traffic to a special SQs per ring, this SQ will be
> PTP port accurate, Normal traffic will continue through regular SQs
> 
> Pros: Regular non PTP traffic not affected.
> Cons: High memory footprint for creating special SQs
> 
> 
> So we prefer (2) + private flag to avoid the performance hit and the
> redundant memory usage out of the box.

Option 3 - have only one special PTP queue in the system. PTP traffic
is rather low rate, queue per core doesn't seem necessary.


Since you said the PTP queues are slower / higher overhead - are you not
concerned that QUIC traffic will get mis-directed to them? People like
hardware time stamps for all sorts of measurements these days. Plus,
since UDP doesn't itself set ooo those applications may be surprised to
see an increased out-of-order rate.

Why not use the PTP classification helpers we already have?


* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-04 20:26       ` Jakub Kicinski
@ 2020-12-04 21:57         ` Saeed Mahameed
  2020-12-04 22:52           ` Jakub Kicinski
                             ` (2 more replies)
  2020-12-05  1:49         ` Vladimir Oltean
  1 sibling, 3 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-04 21:57 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, netdev, Eran Ben Elisha, Tariq Toukan,
	Richard Cochran, Vladimir Oltean, Willem de Bruijn

On Fri, 2020-12-04 at 12:26 -0800, Jakub Kicinski wrote:
> On Fri, 04 Dec 2020 11:33:26 -0800 Saeed Mahameed wrote:
> > On Thu, 2020-12-03 at 18:29 -0800, Jakub Kicinski wrote:
> > > On Wed, 2 Dec 2020 20:21:01 -0800 Saeed Mahameed wrote:  
> > > > Add TX PTP port object support for better TX timestamping
> > > > accuracy.
> > > > Currently, driver supports CQE based TX port timestamp. Device
> > > > also offers TX port timestamp, which has less jitter and better
> > > > reflects the actual time of a packet's transmit.  
> > > 
> > > How much better is it?
> > > 
> > > Is the new implementation is standard compliant or just a "better
> > > guess"?
> > 
> > It is not a guess for sure, the closer to the output port you take
> > the
> > stamp the more accurate you get, this is why we need the HW
> > timestamp
> > in first place, i don't have the exact number though, but we target
> > to
> > be compliant with G.8273.2 class C, (30 nsec), and this code allow
> > Linux systems to be deployed in the 5G telco edge. Where this
> > standard
> > is needed.
> 
> I see. IIRC there was also an IEEE standard which specified the exact
> time stamping point (i.e. SFD crosses layer X). If it's class C that
> answers the question, I think.
> 
> > > > Define new driver layout called ptpsq, on which driver will
> > > > create
> > > > SQs that will support TX port timestamp for their transmitted
> > > > packets.
> > > > Driver to identify PTP TX skbs and steer them to these
> > > > dedicated
> > > > SQs
> > > > as part of the select queue ndo.
> > > > 
> > > > Driver to hold ptpsq per TC and report them at
> > > > netif_set_real_num_tx_queues().
> > > > 
> > > > Add support for all needed functionality in order to xmit and
> > > > poll
> > > > completions received via ptpsq.
> > > > 
> > > > Add ptpsq to the TX reporter recover, diagnose and dump
> > > > methods.
> > > > 
> > > > Creation of ptpsqs is disabled by default, and can be enabled
> > > > via
> > > > tx_port_ts private flag.  
> > > 
> > > This flag is pretty bad user experience.
> > 
> > Yeah, nothing i  could do about this, there is a large memory foot
> > print i want to avoid, and we don't want to complicate PTP ctrl API
> > of
> > the HW operating mode, so until we improve the HW, we prefer to
> > keep
> > this feature as a private flag.
> > 
> > > > This patch steer all timestamp related packets to a ptpsq, but
> > > > it
> > > > does not open the port timestamp support for it. The support
> > > > will
> > > > be added in the following patch.  
> > > 
> > > Overall I'm a little shocked by this, let me sleep on it :)
> > > 
> > > More info on the trade offs and considerations which led to the
> > > implementation would be useful.  
> > 
> > To get the Improved accuracy we need a special type of SQs attached
> > to
> > special HW objects that will provide more accurate stamping.
> > 
> > Trade-offs are :
> > 
> > options 1) convert ALL regular txqs (SQs) to work in this port
> > stamping
> > mode.
> > 
> > Pros: no need for any special mode in driver, no additional memory,
> > other than the new HW objects we create for the special stamping.
> > 
> > Cons: significant performance hit for non PTP traffic, (the hw
> > stamps
> > all packets in the slow but more accurate mode)
> 
> Just to be clear (Alexei brought this up when I mentioned these
> patches) - the requirement for the separate queues is because the
> time
> stamp enable is a queue property, not a per WQE / frame thing? I
> couldn't find this in the code - could you point me to where it's
> set?
> 

Yes, it is not per WQE; it is a new SQ property, which we set in
mlx5e_ptp_open_txqsq() and then pass to mlx5e_create_sq(),

where we set it in the hw context like so:

MLX5_SET(sqc, sqc, ts_cqe_to_dest_cqn, csp->ts_cqe_to_dest_cqn);

A nice quirk! This will be line #1234 in mlx5/core/en_main.c :)


> > option 2) route PTP traffic to a special SQs per ring, this SQ will
> > be
> > PTP port accurate, Normal traffic will continue through regular SQs
> > 
> > Pros: Regular non PTP traffic not affected.
> > Cons: High memory footprint for creating special SQs
> > 
> > 
> > So we prefer (2) + private flag to avoid the performance hit and
> > the
> > redundant memory usage out of the box.
> 
> Option 3 - have only one special PTP queue in the system. PTP traffic
> is rather low rate, queue per core doesn't seem necessary.
> 

We only forward PTP traffic to the new special queue, but we create more
than one to avoid internal locking, as we will utilize the tx softirq
percpu.

After double-checking the code, it seems Eran and Tariq have decided to
forward all UDP traffic; let me check with them what happened here.


> 
> Since you said the PTP queues are slower / higher overhead - are you
> not
> concerned that QUIC traffic will get mis-directed to them? People
> like
> hardware time stamps for all sort of measurements these days. Plus,
> since UDP doesn't itself set ooo those applications may be surprised
> to
> see increased out-of-order rate.
> 

Right, I thought Eran was looking for the PTP UDP port as well.
Let me verify what happened here.

> Why not use the PTP classification helpers we already have?

Do you mean ptp_parse_header() or the eBPF prog?
We use skb_flow_dissect(), which should be simple enough.




* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-04 21:57         ` Saeed Mahameed
@ 2020-12-04 22:52           ` Jakub Kicinski
  2020-12-05  0:55             ` Vladimir Oltean
  2020-12-04 23:17           ` Jakub Kicinski
  2020-12-06 13:33           ` Eran Ben Elisha
  2 siblings, 1 reply; 41+ messages in thread
From: Jakub Kicinski @ 2020-12-04 22:52 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev, Eran Ben Elisha, Tariq Toukan,
	Richard Cochran, Vladimir Oltean, Willem de Bruijn

On Fri, 04 Dec 2020 13:57:49 -0800 Saeed Mahameed wrote:
> > Why not use the PTP classification helpers we already have?  
> 
> do you mean ptp_parse_header() or the ebpf prog ?
> We use skb_flow_dissect() which should be simple enough.

Not sure which exact one TBH, I just know we have helpers for this, 
so if we don't use them it'd be good to at least justify why.

Maybe someone with more practical knowledge here can chime in with 
a recommendation for a helper to find PTP frames on TX?


* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-04 21:57         ` Saeed Mahameed
  2020-12-04 22:52           ` Jakub Kicinski
@ 2020-12-04 23:17           ` Jakub Kicinski
  2020-12-04 23:57             ` Saeed Mahameed
  2020-12-06 13:36             ` Eran Ben Elisha
  2020-12-06 13:33           ` Eran Ben Elisha
  2 siblings, 2 replies; 41+ messages in thread
From: Jakub Kicinski @ 2020-12-04 23:17 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: David S. Miller, netdev, Eran Ben Elisha, Tariq Toukan

On Fri, 04 Dec 2020 13:57:49 -0800 Saeed Mahameed wrote:
> > > option 2) route PTP traffic to a special SQs per ring, this SQ will
> > > be
> > > PTP port accurate, Normal traffic will continue through regular SQs
> > > 
> > > Pros: Regular non PTP traffic not affected.
> > > Cons: High memory footprint for creating special SQs
> > > 
> > > So we prefer (2) + private flag to avoid the performance hit and
> > > the
> > > redundant memory usage out of the box.  
> > 
> > Option 3 - have only one special PTP queue in the system. PTP traffic
> > is rather low rate, queue per core doesn't seem necessary.
> 
> We only forward ptp traffic to the new special queue but we create more
> than one to avoid internal locking as we will utilize the tx softirq
> percpu.

In other words, to make the driver implementation simpler we'll have
a pretty basic feature hidden behind an ethtool priv knob and a number
of queues which doesn't match reality reported to user space. Hm.


* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-04 23:17           ` Jakub Kicinski
@ 2020-12-04 23:57             ` Saeed Mahameed
  2020-12-05  0:24               ` Jakub Kicinski
  2020-12-06 13:36             ` Eran Ben Elisha
  1 sibling, 1 reply; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-04 23:57 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: David S. Miller, netdev, Eran Ben Elisha, Tariq Toukan

On Fri, 2020-12-04 at 15:17 -0800, Jakub Kicinski wrote:
> On Fri, 04 Dec 2020 13:57:49 -0800 Saeed Mahameed wrote:
> > > > option 2) route PTP traffic to a special SQs per ring, this SQ
> > > > will
> > > > be
> > > > PTP port accurate, Normal traffic will continue through regular
> > > > SQs
> > > > 
> > > > Pros: Regular non PTP traffic not affected.
> > > > Cons: High memory footprint for creating special SQs
> > > > 
> > > > So we prefer (2) + private flag to avoid the performance hit
> > > > and
> > > > the
> > > > redundant memory usage out of the box.  
> > > 
> > > Option 3 - have only one special PTP queue in the system. PTP
> > > traffic
> > > is rather low rate, queue per core doesn't seem necessary.
> > 
> > We only forward ptp traffic to the new special queue but we create
> > more
> > than one to avoid internal locking as we will utilize the tx
> > softirq
> > percpu.
> 
> In other words to make the driver implementation simpler we'll have
> a pretty basic feature hidden behind a ethtool priv knob and a number
> of queues which doesn't match reality reported to user space. Hm.

I look at these queues as special HW objects that allow the accurate
PTP stamping; they piggyback on the reported txqs, so they are
transparent, they just increase the memory footprint of each ring.

For the priv flags, one of the floating ideas was to
use hwtstamp_rx_filters flags:
 
https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/net_tstamp.h#L107

Our hardware timestamps all packets for free whether you request it or
not. Currently there is no option to set up "ALL_PTP" traffic in ethtool
-T, but we can add this flag, as it makes sense for it to be in ethtool
-T; we could then use it in mlx5 to determine whether the user selected
ALL_PTP, in which case PTP packets will go through this accurate
special path.

This is not a W/A or an abuse of the new flag; it just means that if you
select ALL_PTP, a side effect will be that our HW is more accurate
for PTP traffic.

What do you think?

Regarding reducing to a single special queue, I will discuss with Eran
and the team on Sunday.

Thanks,
Saeed.



* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-04 23:57             ` Saeed Mahameed
@ 2020-12-05  0:24               ` Jakub Kicinski
  2020-12-06 13:37                 ` Eran Ben Elisha
  0 siblings, 1 reply; 41+ messages in thread
From: Jakub Kicinski @ 2020-12-05  0:24 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: David S. Miller, netdev, Eran Ben Elisha, Tariq Toukan

On Fri, 04 Dec 2020 15:57:36 -0800 Saeed Mahameed wrote:
> On Fri, 2020-12-04 at 15:17 -0800, Jakub Kicinski wrote:
> > On Fri, 04 Dec 2020 13:57:49 -0800 Saeed Mahameed wrote:  
> > > > > option 2) route PTP traffic to a special SQs per ring, this SQ
> > > > > will
> > > > > be
> > > > > PTP port accurate, Normal traffic will continue through regular
> > > > > SQs
> > > > > 
> > > > > Pros: Regular non PTP traffic not affected.
> > > > > Cons: High memory footprint for creating special SQs
> > > > > 
> > > > > So we prefer (2) + private flag to avoid the performance hit
> > > > > and
> > > > > the
> > > > > redundant memory usage out of the box.    
> > > > 
> > > > Option 3 - have only one special PTP queue in the system. PTP
> > > > traffic
> > > > is rather low rate, queue per core doesn't seem necessary.  
> > > 
> > > We only forward ptp traffic to the new special queue but we create
> > > more
> > > than one to avoid internal locking as we will utilize the tx
> > > softirq
> > > percpu.  
> > 
> > In other words to make the driver implementation simpler we'll have
> > a pretty basic feature hidden behind a ethtool priv knob and a number
> > of queues which doesn't match reality reported to user space. Hm.  
> 
> I look at these queues as a special HW objects to allow the accurate
> PTP stamping, they piggyback on the reported txqs, so they are
> transparent, 

But they are visible to the stack, via sysfs, netlink. Any check
in the kernel that tries to help the driver by validating user input
against real_num_tx_queues will be moot for mlx5e.

mlx5e hides the AF_XDP queues behind normal RSS queues, but it would
have extra visible queues for TX PTP.

> they just increase the memory footprint of each ring.

For every ring or for every TC? (which is hopefully 1 in any non-DCB
deployment?)

> for the priv flags, one of the floating ideas was to
> use hwtstamp_rx_filters flags:
>  
> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/net_tstamp.h#L107
> 
> Our hardware timestamps all packets for free whether you request it or
> not, Currently there is no option to setup "ALL_PTP" traffic in ethtool
> -T, but we can add this flag as it make sense to be in ethtool -T, thus
> we could use it in mlx5 to determine if user selected ALL_PTP, then ptp
> packets will go through this accurate special path.
> 
> This is not a W/A or an abuse to the new flag, it just means if you
> select ALL_PTP then a side effect will be our HW will be more accurate 
> for PTP traffic.
> 
> What do you think ?

That sounds much better than the priv flag, yes.

> Regarding reducing to a single special queue, i will discuss with Eran
> and the Team on Sunday.

Okay, thanks.


* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-04 22:52           ` Jakub Kicinski
@ 2020-12-05  0:55             ` Vladimir Oltean
  2020-12-07  6:22               ` Saeed Mahameed
  0 siblings, 1 reply; 41+ messages in thread
From: Vladimir Oltean @ 2020-12-05  0:55 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, netdev, Eran Ben Elisha,
	Tariq Toukan, Richard Cochran, Willem de Bruijn

Hi Jakub,

On Fri, Dec 04, 2020 at 02:52:40PM -0800, Jakub Kicinski wrote:
> On Fri, 04 Dec 2020 13:57:49 -0800 Saeed Mahameed wrote:
> > > Why not use the PTP classification helpers we already have?
> >
> > do you mean ptp_parse_header() or the ebpf prog ?
> > We use skb_flow_dissect() which should be simple enough.
>
> Not sure which exact one TBH, I just know we have helpers for this,
> so if we don't use them it'd be good to at least justify why.
>
> Maybe someone with more practical knowledge here can chime in with
> a recommendation for a helper to find PTP frames on TX?

ptp_classify_raw is optimized to identify PTP event messages (the only
ones that need to be timestamped as far as the protocol is concerned).
PTP general messages (Follow-Up, Delay_Resp, Announce etc) will return
PTP_CLASS_NONE from ptp_classify_raw.

But maybe there is an even better way, since this is on the TX path,
maybe the .ndo_select_queue operation can simply look at
	skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP
when deciding whether to send it to the "good" queue or not. This has
the advantage of being less expensive than any sort of frame classification.

Nonetheless, some tests would need to be run. In theory, practice and
theory are the same, whereas in practice they aren't.
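
The tx_flags idea can be sketched in user space as follows. This is only an
illustration under stated assumptions, not the mlx5e implementation: the
mock_skb and mock_priv structures, the MOCK_SKBTX_HW_TSTAMP stand-in and the
choice to place the PTP SQ right after the regular queues are all
hypothetical. In the kernel, the flag would be read from
skb_shinfo(skb)->tx_flags inside the driver's .ndo_select_queue.

```c
#include <stdint.h>

/* Stand-in for SKBTX_HW_TSTAMP, which the kernel keeps in
 * skb_shinfo(skb)->tx_flags. Hypothetical name for this sketch. */
#define MOCK_SKBTX_HW_TSTAMP (1u << 0)

/* Hypothetical, trimmed-down view of what a select-queue hook needs. */
struct mock_skb {
	uint8_t  tx_flags;  /* models skb_shinfo(skb)->tx_flags */
	uint16_t hash_txq;  /* value the stack would hash onto a regular queue */
};

struct mock_priv {
	uint16_t num_regular_txqs; /* regular SQs use indices 0..n-1 */
	int      ptpsq_enabled;    /* models the opt-in knob */
};

/* Pick a TX queue index. Packets that requested a HW TX timestamp go to
 * the dedicated PTP SQ (index n in this sketch); everything else stays
 * on the regular queues. No frame parsing is needed. */
uint16_t mock_select_queue(const struct mock_priv *priv,
			   const struct mock_skb *skb)
{
	if (priv->ptpsq_enabled && (skb->tx_flags & MOCK_SKBTX_HW_TSTAMP))
		return priv->num_regular_txqs;

	return skb->hash_txq % priv->num_regular_txqs;
}
```

Note that such a check steers every timestamp-requesting packet, not only PTP
event messages, so the QUIC concern raised earlier in the thread would still
apply unless it is combined with a classifier.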


* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-04 20:26       ` Jakub Kicinski
  2020-12-04 21:57         ` Saeed Mahameed
@ 2020-12-05  1:49         ` Vladimir Oltean
  2020-12-05  2:10           ` Jakub Kicinski
                             ` (2 more replies)
  1 sibling, 3 replies; 41+ messages in thread
From: Vladimir Oltean @ 2020-12-05  1:49 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, netdev, Eran Ben Elisha,
	Tariq Toukan, Richard Cochran, Vladimir Oltean, Willem de Bruijn

On Fri, Dec 04, 2020 at 12:26:13PM -0800, Jakub Kicinski wrote:
> On Fri, 04 Dec 2020 11:33:26 -0800 Saeed Mahameed wrote:
> > On Thu, 2020-12-03 at 18:29 -0800, Jakub Kicinski wrote:
> > > On Wed, 2 Dec 2020 20:21:01 -0800 Saeed Mahameed wrote:
> > > > Add TX PTP port object support for better TX timestamping accuracy.
> > > > Currently, driver supports CQE based TX port timestamp. Device
> > > > also offers TX port timestamp, which has less jitter and better
> > > > reflects the actual time of a packet's transmit.
> > >
> > > How much better is it?
> > >
> > > Is the new implementation is standard compliant or just a "better
> > > guess"?
> >
> > It is not a guess for sure, the closer to the output port you take the
> > stamp the more accurate you get, this is why we need the HW timestamp
> > in first place, i don't have the exact number though, but we target to
> > be compliant with G.8273.2 class C, (30 nsec), and this code allow
> > Linux systems to be deployed in the 5G telco edge. Where this standard
> > is needed.
>
> I see. IIRC there was also an IEEE standard which specified the exact
> time stamping point (i.e. SFD crosses layer X). If it's class C that
> answers the question, I think.

The ITU-T G.8273.2 specification just requires a Class C clock to have a
maximum absolute time error under steady state of 30 ns. And taking
timestamps closer to the wire eliminates some clock domain crossings
from what is measured in the path delay, this is probably the reason why
timestamping is more accurate, and it helps to achieve the required
jitter figure.

The IEEE standard that you're thinking of is clause "7.3.4 Generation of
event message timestamps" of IEEE 1588.

-----------------------------[cut here]-----------------------------
7.3.4.1 Event message timestamp point

Unless otherwise specified in a transport-specific annex to this
standard, the message timestamp point for an event message shall be the
beginning of the first symbol after the Start of Frame (SOF) delimiter.

7.3.4.2 Event timestamp generation

All PTP event messages are timestamped on egress and ingress. The
timestamp shall be the time at which the event message timestamp point
passes the reference plane marking the boundary between the PTP node and
the network.

NOTE 1— If an implementation generates event message timestamps using a
point other than the message timestamp point, then the generated
timestamps should be appropriately corrected by the time interval
between the actual time of detection and the time the message timestamp
point passed the reference plane. Failure to make these corrections
results in a time offset between the slave and master clocks.
-----------------------------[cut here]-----------------------------

So there you go, it just says "the reference plane marking the boundary
between the PTP node and the network". So it depends on how you draw the
borders. I cannot seem to find any more precise definition.

Regardless of the layer at which the timestamp is taken, it is the
jitter that matters more than the reduced path delay. The latter is just
a side effect.
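
One way to see why jitter matters more than a constant shift is the usual
delay request-response arithmetic. The sketch below assumes a symmetric path
and uses hypothetical names (ptp_exchange, ptp_offset_ns,
ptp_mean_path_delay_ns); it is not taken from any PTP stack.

```c
#include <stdint.h>

/* Timestamps of one delay request-response exchange, in nanoseconds. */
struct ptp_exchange {
	int64_t t1; /* master clock: Sync egress */
	int64_t t2; /* slave clock:  Sync ingress */
	int64_t t3; /* slave clock:  Delay_Req egress */
	int64_t t4; /* master clock: Delay_Req ingress */
};

/* Offset of the slave clock from the master, assuming a symmetric path. */
int64_t ptp_offset_ns(const struct ptp_exchange *x)
{
	return ((x->t2 - x->t1) - (x->t4 - x->t3)) / 2;
}

/* Mean one-way path delay under the same symmetry assumption. */
int64_t ptp_mean_path_delay_ns(const struct ptp_exchange *x)
{
	return ((x->t2 - x->t1) + (x->t4 - x->t3)) / 2;
}
```

If t1 is taken a fixed e nanoseconds before the frame really leaves the port,
the computed offset shifts by e/2. A fixed shift can be calibrated out as
NOTE 1 describes; a shift that varies from packet to packet (jitter) cannot.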

"How much better" is an interesting question though.


* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-05  1:49         ` Vladimir Oltean
@ 2020-12-05  2:10           ` Jakub Kicinski
  2020-12-05 13:20           ` Richard Cochran
  2020-12-07  5:50           ` Saeed Mahameed
  2 siblings, 0 replies; 41+ messages in thread
From: Jakub Kicinski @ 2020-12-05  2:10 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Saeed Mahameed, David S. Miller, netdev, Eran Ben Elisha,
	Tariq Toukan, Richard Cochran, Vladimir Oltean, Willem de Bruijn

On Sat, 5 Dec 2020 03:49:27 +0200 Vladimir Oltean wrote:
> So there you go, it just says "the reference plane marking the boundary
> between the PTP node and the network". So it depends on how you draw the
> borders. I cannot seem to find any more precise definition.

Ah, you made me go search :)

I was referring to what's now section 90 of IEEE 802.3-2018.

> Regardless of the layer at which the timestamp is taken, it is the
> jitter that matters more than the reduced path delay. The latter is just
> a side effect.


* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-05  1:49         ` Vladimir Oltean
  2020-12-05  2:10           ` Jakub Kicinski
@ 2020-12-05 13:20           ` Richard Cochran
  2020-12-07  5:50           ` Saeed Mahameed
  2 siblings, 0 replies; 41+ messages in thread
From: Richard Cochran @ 2020-12-05 13:20 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Jakub Kicinski, Saeed Mahameed, David S. Miller, netdev,
	Eran Ben Elisha, Tariq Toukan, Vladimir Oltean, Willem de Bruijn

On Sat, Dec 05, 2020 at 03:49:27AM +0200, Vladimir Oltean wrote:
> So there you go, it just says "the reference plane marking the boundary
> between the PTP node and the network". So it depends on how you draw the
> borders.

It depends on the physical link technology.  You can't just "draw the
borders" anywhere you like!

Thanks,
Richard


* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-04 21:57         ` Saeed Mahameed
  2020-12-04 22:52           ` Jakub Kicinski
  2020-12-04 23:17           ` Jakub Kicinski
@ 2020-12-06 13:33           ` Eran Ben Elisha
  2 siblings, 0 replies; 41+ messages in thread
From: Eran Ben Elisha @ 2020-12-06 13:33 UTC (permalink / raw)
  To: Saeed Mahameed, Jakub Kicinski
  Cc: David S. Miller, netdev, Tariq Toukan, Richard Cochran,
	Vladimir Oltean, Willem de Bruijn



On 12/4/2020 11:57 PM, Saeed Mahameed wrote:
> We only forward ptp traffic to the new special queue but we create more
> than one to avoid internal locking as we will utilize the tx softirq
> percpu.
> 
> After double checking the code it seems Eran and Tariq have decided to
> forward all UDP traffic, let me double check with them what happened
> here.

We thought about extending the support of these queues to UDP in general,
and not just PTP. But we can roll this back to PTP time-critical events
on dport 319 only.
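
A minimal sketch of such a dport check, written as a user-space function over
a raw UDP header. The names udp_is_ptp_event and PTP_EVENT_UDP_PORT are
hypothetical; the driver itself would rely on the flow dissector or the
existing PTP helpers rather than on code like this.

```c
#include <stddef.h>
#include <stdint.h>

/* UDP port used by PTP event messages; general messages use 320. */
#define PTP_EVENT_UDP_PORT 319

/* udp points at the first byte of a UDP header (8 bytes, network byte
 * order). Returns 1 if the destination port is the PTP event port. */
int udp_is_ptp_event(const uint8_t *udp, size_t len)
{
	uint16_t dport;

	if (len < 8)
		return 0; /* truncated header */

	/* Destination port is bytes 2-3 of the UDP header, big endian. */
	dport = (uint16_t)((udp[2] << 8) | udp[3]);
	return dport == PTP_EVENT_UDP_PORT;
}
```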


* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-04 23:17           ` Jakub Kicinski
  2020-12-04 23:57             ` Saeed Mahameed
@ 2020-12-06 13:36             ` Eran Ben Elisha
  2020-12-07 20:29               ` Jakub Kicinski
  1 sibling, 1 reply; 41+ messages in thread
From: Eran Ben Elisha @ 2020-12-06 13:36 UTC (permalink / raw)
  To: Jakub Kicinski, Saeed Mahameed; +Cc: David S. Miller, netdev, Tariq Toukan



On 12/5/2020 1:17 AM, Jakub Kicinski wrote:
>> We only forward ptp traffic to the new special queue but we create more
>> than one to avoid internal locking as we will utilize the tx softirq
>> percpu.
> In other words to make the driver implementation simpler we'll have
> a pretty basic feature hidden behind a ethtool priv knob and a number
> of queues which doesn't match reality reported to user space. Hm.

We are not hiding these queues from the netdev stack. We report them in 
real num of TX queues and manage them as any other queue. The only 
change is that select_queue() will select a stream to them if and only 
if they match specific criteria.


* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-05  0:24               ` Jakub Kicinski
@ 2020-12-06 13:37                 ` Eran Ben Elisha
  2020-12-06 17:08                   ` Richard Cochran
  0 siblings, 1 reply; 41+ messages in thread
From: Eran Ben Elisha @ 2020-12-06 13:37 UTC (permalink / raw)
  To: Jakub Kicinski, Saeed Mahameed; +Cc: David S. Miller, netdev, Tariq Toukan



On 12/5/2020 2:24 AM, Jakub Kicinski wrote:
> On Fri, 04 Dec 2020 15:57:36 -0800 Saeed Mahameed wrote:
>> On Fri, 2020-12-04 at 15:17 -0800, Jakub Kicinski wrote:
>>> On Fri, 04 Dec 2020 13:57:49 -0800 Saeed Mahameed wrote:
>>>>>> option 2) route PTP traffic to a special SQs per ring, this SQ
>>>>>> will
>>>>>> be
>>>>>> PTP port accurate, Normal traffic will continue through regular
>>>>>> SQs
>>>>>>
>>>>>> Pros: Regular non PTP traffic not affected.
>>>>>> Cons: High memory footprint for creating special SQs
>>>>>>
>>>>>> So we prefer (2) + private flag to avoid the performance hit
>>>>>> and
>>>>>> the
>>>>>> redundant memory usage out of the box.
>>>>>
>>>>> Option 3 - have only one special PTP queue in the system. PTP
>>>>> traffic
>>>>> is rather low rate, queue per core doesn't seem necessary.
>>>>
>>>> We only forward ptp traffic to the new special queue but we create
>>>> more
>>>> than one to avoid internal locking as we will utilize the tx
>>>> softirq
>>>> percpu.
>>>
>>> In other words to make the driver implementation simpler we'll have
>>> a pretty basic feature hidden behind a ethtool priv knob and a number
>>> of queues which doesn't match reality reported to user space. Hm.
>>
>> I look at these queues as a special HW objects to allow the accurate
>> PTP stamping, they piggyback on the reported txqs, so they are
>> transparent,
> 
> But they are visible to the stack, via sysfs, netlink. Any check
> in the kernel that tries to help the driver by validating user input
> against real_num_tx_queues will be moot for mlx5e.

Repeating it here: we report them in real num of TX queues.

> 
> mlx5e hides the AF_XDP queues behind normal RSS queues, but it would
> have extra visible queues for TX PTP.
> 
>> they just increase the memory footprint of each ring.
> 
> For every ring or for every TC? (which is hopefully 1 in any non-DCB
> deployment?)

For every TC, not for every ring.

> 
>> for the priv flags, one of the floating ideas was to
>> use hwtstamp_rx_filters flags:
>>   
>> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/net_tstamp.h#L107
>>
>> Our hardware timestamps all packets for free whether you request it or
>> not, Currently there is no option to setup "ALL_PTP" traffic in ethtool
>> -T, but we can add this flag as it make sense to be in ethtool -T, thus
>> we could use it in mlx5 to determine if user selected ALL_PTP, then ptp
>> packets will go through this accurate special path.
>>
>> This is not a W/A or an abuse to the new flag, it just means if you
>> select ALL_PTP then a side effect will be our HW will be more accurate
>> for PTP traffic.
>>
>> What do you think ?
> 
> That sounds much better than the priv flag, yes.

Our hardware can provide a more accurate timestamp, subject to a few
limitations. It requires higher memory consumption ({SQ, 2 x CQ,
internal HW LB RQ} per TC), and it also has a performance impact (more
CQEs to consume, for example).
Some customers are happy with the accuracy they get today and don't want
the extra penalty, so they don't want to be automatically shifted to the
new TS logic.

Adding a new enum value to the ioctl (HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY,
for example) means we have to add it all the way: drivers, kernel ptp,
user space ptp, ethtool.

My concerns are:
1. Timestamp applications (like ptp4l or similar) will have to add
support for configuring the driver to use
HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY, if supported, via ioctl prior to
packet transmission. From the application's point of view, the dual-mode
(HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY, HWTSTAMP_TX_ON) support is
redundant, as it offers nothing new.
2. Other vendors will have to support it as well, while it is not clear
what is expected from them if they cannot improve accuracy between the
two modes.

This feature is just an internal enhancement, and as such it should be
added only as a vendor private configuration flag. We are not proposing
any standard here for others to follow.

If we did not have the limitations above, it could have been added
silently as the default.

I suggest we reconsider the ethtool private flag; the ioctl change might
be a long journey in the wrong direction.

> 
>> Regarding reducing to a single special queue, i will discuss with Eran
>> and the Team on Sunday.
> 
> Okay, thanks.
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-06 13:37                 ` Eran Ben Elisha
@ 2020-12-06 17:08                   ` Richard Cochran
  2020-12-07  8:37                     ` Saeed Mahameed
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Cochran @ 2020-12-06 17:08 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: Jakub Kicinski, Saeed Mahameed, David S. Miller, netdev, Tariq Toukan

On Sun, Dec 06, 2020 at 03:37:47PM +0200, Eran Ben Elisha wrote:
> Adding new enum to the ioctl means we have add
> (HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY for example) all the way - drivers,
> kernel ptp, user space ptp, ethtool.
> 
> My concerns are:
> 1. Timestamp applications (like ptp4l or similar) will have to add support
> for configuring the driver to use HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY if
> supported via ioctl prior to packets transmit. From application point of
> view, the dual-modes (HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY , HWTSTAMP_TX_ON)
> support is redundant, as it offers nothing new.

Well said.

> 2. Other vendors will have to support it as well, when not sure what is the
> expectation from them if they cannot improve accuracy between them.

If there were multiple different devices out there with this kind of
implementation (different levels of accuracy with increasing run time
performance cost), then we could consider such a flag.  However, to my
knowledge, this feature is unique to your device.

> This feature is just an internal enhancement, and as such it should be added
> only as a vendor private configuration flag. We are not offering here about
> any standard for others to follow.

+1

Thanks,
Richard

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-05  1:49         ` Vladimir Oltean
  2020-12-05  2:10           ` Jakub Kicinski
  2020-12-05 13:20           ` Richard Cochran
@ 2020-12-07  5:50           ` Saeed Mahameed
  2 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-07  5:50 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski
  Cc: David S. Miller, netdev, Eran Ben Elisha, Tariq Toukan,
	Richard Cochran, Vladimir Oltean, Willem de Bruijn

On Sat, 2020-12-05 at 03:49 +0200, Vladimir Oltean wrote:
> On Fri, Dec 04, 2020 at 12:26:13PM -0800, Jakub Kicinski wrote:
> > On Fri, 04 Dec 2020 11:33:26 -0800 Saeed Mahameed wrote:
> > > On Thu, 2020-12-03 at 18:29 -0800, Jakub Kicinski wrote:
> > > > On Wed, 2 Dec 2020 20:21:01 -0800 Saeed Mahameed wrote:
> > > > > Add TX PTP port object support for better TX timestamping
> > > > > accuracy.
> > > > > Currently, driver supports CQE based TX port timestamp.
> > > > > Device
> > > > > also offers TX port timestamp, which has less jitter and
> > > > > better
> > > > > reflects the actual time of a packet's transmit.
> > > > 
> > > > How much better is it?
> > > > 
> > > > Is the new implementation is standard compliant or just a
> > > > "better
> > > > guess"?
> > > 
> > > It is not a guess for sure, the closer to the output port you
> > > take the
> > > stamp the more accurate you get, this is why we need the HW
> > > timestamp
> > > in first place, i don't have the exact number though, but we
> > > target to
> > > be compliant with G.8273.2 class C, (30 nsec), and this code
> > > allow
> > > Linux systems to be deployed in the 5G telco edge. Where this
> > > standard
> > > is needed.
> > 
> > I see. IIRC there was also an IEEE standard which specified the
> > exact
> > time stamping point (i.e. SFD crosses layer X). If it's class C
> > that
> > answers the question, I think.
> 
> The ITU-T G.8273.2 specification just requires a Class C clock to
> have a
> maximum absolute time error under steady state of 30 ns. And taking
> timestamps closer to the wire eliminates some clock domain crossings
> from what is measured in the path delay, this is probably the reason
> why
> timestamping is more accurate, and it helps to achieve the required
> jitter figure.
> 
> The IEEE standard that you're thinking of is clause "7.3.4 Generation
> of
> event message timestamps" of IEEE 1588.
> 
> -----------------------------[cut here]-----------------------------
> 7.3.4.1 Event message timestamp point
> 
> Unless otherwise specified in a transport-specific annex to this
> standard, the message timestamp point for an event message shall be
> the
> beginning of the first symbol after the Start of Frame (SOF)
> delimiter.
> 
> 7.3.4.2 Event timestamp generation
> 
> All PTP event messages are timestamped on egress and ingress. The
> timestamp shall be the time at which the event message timestamp
> point
> passes the reference plane marking the boundary between the PTP node
> and
> the network.
> 
> NOTE 1— If an implementation generates event message timestamps using
> a
> point other than the message timestamp point, then the generated
> timestamps should be appropriately corrected by the time interval
> between the actual time of detection and the time the message
> timestamp
> point passed the reference plane. Failure to make these corrections
> results in a time offset between the slave and master clocks.
> -----------------------------[cut here]-----------------------------
> 
> So there you go, it just says "the reference plane marking the
> boundary
> between the PTP node and the network". So it depends on how you draw
> the
> borders. I cannot seem to find any more precise definition.
> 
> Regardless of the layer at which the timestamp is taken, it is the
> jitter that matters more than the reduced path delay. The latter is
> just
> a side effect.
> 

So the closer to the wire you take the stamp, the less potential there
is for jitter, since this is after all the HW pipeline's variable delays.
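
To illustrate the point, here is a small sketch of the two ideas in
play: a fixed offset between the capture point and the reference plane
can be corrected (as in NOTE 1 quoted above), while variable pipeline
delay cannot and shows up as jitter. The function names and the
nanosecond figures used with them are hypothetical and not measurements
from any device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: apply a known, constant capture-to-reference-plane
 * delay to an egress timestamp taken before the packet reaches the wire.
 */
int64_t sketch_corrected_egress_ts(int64_t captured_ns,
                                   int64_t capture_to_plane_ns)
{
    return captured_ns + capture_to_plane_ns;
}

/*
 * Hypothetical helper: peak-to-peak spread of the capture-to-wire delay
 * over n samples. A constant delay yields 0 (fully correctable); any
 * variation remains as error no matter which fixed correction is used.
 */
int64_t sketch_residual_jitter(const int64_t *delay_ns, size_t n)
{
    int64_t lo, hi;
    size_t i;

    assert(n > 0);
    lo = hi = delay_ns[0];
    for (i = 1; i < n; i++) {
        if (delay_ns[i] < lo)
            lo = delay_ns[i];
        if (delay_ns[i] > hi)
            hi = delay_ns[i];
    }
    return hi - lo;
}
```

Stamping after the variable stages of the pipeline shrinks the spread
that the second helper measures, which is the part a static correction
cannot remove.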

> "How much better" is an interesting question though.



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-05  0:55             ` Vladimir Oltean
@ 2020-12-07  6:22               ` Saeed Mahameed
  0 siblings, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-07  6:22 UTC (permalink / raw)
  To: Vladimir Oltean, Jakub Kicinski
  Cc: David S. Miller, netdev, Eran Ben Elisha, Tariq Toukan,
	Richard Cochran, Willem de Bruijn

On Sat, 2020-12-05 at 00:55 +0000, Vladimir Oltean wrote:
> Hi Jakub,
> 
> On Fri, Dec 04, 2020 at 02:52:40PM -0800, Jakub Kicinski wrote:
> > On Fri, 04 Dec 2020 13:57:49 -0800 Saeed Mahameed wrote:
> > > > Why not use the PTP classification helpers we already have?
> > > 
> > > do you mean ptp_parse_header() or the ebpf prog ?
> > > We use skb_flow_dissect() which should be simple enough.
> > 
> > Not sure which exact one TBH, I just know we have helpers for this,
> > so if we don't use them it'd be good to at least justify why.
> > 
> > Maybe someone with more practical knowledge here can chime in with
> > a recommendation for a helper to find PTP frames on TX?
> 
> ptp_classify_raw is optimized to identify PTP event messages (the
> only
> ones that need to be timestamped as far as the protocol is
> concerned).
> PTP general messages (Follow-Up, Delay_Resp, Announce etc) will
> return
> PTP_CLASS_NONE from ptp_classify_raw.
> 

I looked at the implementation. While it is nice to see that it runs
an ebpf program, it seems these functions are meant for those who care
about the content of those PTP messages.

Select queue has to be consistent for a specific stream, so I'd rather
look up the well-known ptp port via the standard flow dissector and
select the queue accordingly. Using any other mechanism might cause
inconsistencies and out-of-order (ooo) delivery.

Also, the flow dissector handles non-linear skbs very nicely, whereas
the two ptp classifier methods don't. They actually have different
purposes than what we are looking for.

So I think we should stick with our simple flow dissector
implementation.
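
Roughly, the selection logic described above could look like the
user-space sketch below. It is not the mlx5 code: struct
sketch_flow_keys stands in for whatever the flow dissector returns, all
names are hypothetical, and fields are assumed to be in host byte
order. The only external facts used are the PTP event UDP port (319)
and the PTP-over-Ethernet ethertype (0x88F7).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ETH_P_IP     0x0800
#define SKETCH_ETH_P_IPV6   0x86DD
#define SKETCH_ETH_P_1588   0x88F7  /* PTP over Ethernet */
#define SKETCH_IPPROTO_UDP  17
#define SKETCH_PTP_EV_PORT  319     /* PTP event messages over UDP */

/* Hypothetical stand-in for the dissected flow keys of one packet. */
struct sketch_flow_keys {
    uint16_t ethertype;
    uint8_t  ip_proto;
    uint16_t dst_port;
    uint32_t flow_hash;
};

/* True when the keys identify a PTP stream by ethertype or UDP port. */
bool sketch_is_ptp_flow(const struct sketch_flow_keys *k)
{
    if (k->ethertype == SKETCH_ETH_P_1588)
        return true;
    if (k->ethertype != SKETCH_ETH_P_IP && k->ethertype != SKETCH_ETH_P_IPV6)
        return false;
    return k->ip_proto == SKETCH_IPPROTO_UDP &&
           k->dst_port == SKETCH_PTP_EV_PORT;
}

/*
 * PTP flows always map to the dedicated queue; everything else is
 * spread over the regular queues by flow hash. Both branches depend
 * only on per-flow fields, so a given stream never changes queue.
 */
uint16_t sketch_select_queue(const struct sketch_flow_keys *k,
                             uint16_t num_regular, uint16_t ptp_queue)
{
    assert(num_regular > 0);
    if (sketch_is_ptp_flow(k))
        return ptp_queue;
    return (uint16_t)(k->flow_hash % num_regular);
}
```

Because the decision is keyed on the flow rather than on per-packet
state, packets of one stream cannot be split across queues, which is
the consistency property argued for above.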

> But maybe there is an even better way, since this is on the TX path,
> maybe the .ndo_select_queue operation can simply look at
> 	skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP
> when deciding whether to send it to the "good" queue or not. This has
> the advantage of being less expensive than any sort of frame
> classification.
> 

We also considered this. It is bad in our case because it would easily
break performance for users who do setsockopt(SO_TIMESTAMPING) on
TCP/UDP sockets and favor performance over precision but still want
HW timestamping.

> Nonetheless, some tests would need to be run. In theory, practice and
> theory are the same, whereas in practice they aren't.

In Theory, I don't agree ;-).




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-06 17:08                   ` Richard Cochran
@ 2020-12-07  8:37                     ` Saeed Mahameed
  2020-12-07 11:05                       ` Eran Ben Elisha
  2020-12-07 15:19                       ` Richard Cochran
  0 siblings, 2 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-07  8:37 UTC (permalink / raw)
  To: Richard Cochran, Eran Ben Elisha
  Cc: Jakub Kicinski, David S. Miller, netdev, Tariq Toukan

On Sun, 2020-12-06 at 09:08 -0800, Richard Cochran wrote:
> On Sun, Dec 06, 2020 at 03:37:47PM +0200, Eran Ben Elisha wrote:
> > Adding new enum to the ioctl means we have add
> > (HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY for example) all the way -
> > drivers,
> > kernel ptp, user space ptp, ethtool.
> > 

Not exactly:
1) The flag name should be HWTSTAMP_TX_PTP_EVENTS, similar to what we
already have in RX, which would mean:
HW stamp all PTP events, don't care about the rest.

2) There is no need to add it to drivers from the get-go. Only drivers
that are interested may implement it, and I am sure there are tons that
would like to have this flag if their hw timestamping implementation is
slow! Other drivers will just keep doing what they are doing: timestamp
all traffic even if the user requested this flag, again exactly like
many other drivers do for RX flags (hwtstamp_rx_filters).

> > My concerns are:
> > 1. Timestamp applications (like ptp4l or similar) will have to add
> > support
> > for configuring the driver to use HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY
> > if
> > supported via ioctl prior to packets transmit. From application
> > point of
> > view, the dual-modes (HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY ,
> > HWTSTAMP_TX_ON)
> > support is redundant, as it offers nothing new.
> 
> Well said.
> 

Disagree. It is not a dual mode; it just allows the user to have finer
granularity over what the hw stamps, exactly like what we have in rx.

we are not adding any new mechanism.

> > 2. Other vendors will have to support it as well, when not sure
> > what is the
> > expectation from them if they cannot improve accuracy between them.
> 
> If there were multiple different devices out there with this kind of
> implementation (different levels of accuracy with increasing run time
> performance cost), then we could consider such a flag.  However, to
> my
> knowledge, this feature is unique to your device.
> 

I agree, but I never meant to have a flag that indicates two different
levels of accuracy; that would be a very wild mistake for sure!

The new flag will be about selecting the granularity of what gets a hw
stamp and what doesn't, aligning with the RX filter API.

> > This feature is just an internal enhancement, and as such it should
> > be added
> > only as a vendor private configuration flag. We are not offering
> > here about
> > any standard for others to follow.
> 
> +1
> 

Our driver feature is an internal enhancement, yes, but the suggested
flag is very far from indicating any internal enhancement. It is
actually an enhancement to the current API, and a very simple extension
with a wide range of improvements to all layers.

Our driver can optimize accuracy when this flag is set. Other drivers
might be happy to implement it since they already have slow hw, and
this flag would allow them to get better TCP/UDP performance while
still performing ptp hw stamping. Some admins/apps will use it to avoid
stamping all traffic on tx. Win win win.
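
As a rough sketch of the semantics I have in mind (nothing here exists
in net_tstamp.h today; the enum, its values and the function name are
all hypothetical and only illustrate the proposal):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical TX modes; SKETCH_TX_PTP_EVENTS models the proposed flag. */
enum sketch_tx_type {
    SKETCH_TX_OFF,
    SKETCH_TX_ON,
    SKETCH_TX_PTP_EVENTS,
};

/*
 * Decide whether one outgoing packet gets a HW timestamp.
 * driver_can_filter: the driver implements PTP-event-only stamping.
 * skb_wants_tstamp:  the sender requested a TX timestamp for this packet.
 * is_ptp_event:      the packet was classified as a PTP event message.
 */
bool sketch_should_hw_stamp(enum sketch_tx_type mode, bool driver_can_filter,
                            bool skb_wants_tstamp, bool is_ptp_event)
{
    if (mode == SKETCH_TX_OFF || !skb_wants_tstamp)
        return false;
    /* A driver that implements the flag stamps PTP events only. */
    if (mode == SKETCH_TX_PTP_EVENTS && driver_can_filter)
        return is_ptp_event;
    /* Any other driver keeps stamping everything that asked for it. */
    return true;
}
```

The fallback branch is the behavior described above for drivers that do
not implement the flag, in the same way many drivers over-match on the
RX filters.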





^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-07  8:37                     ` Saeed Mahameed
@ 2020-12-07 11:05                       ` Eran Ben Elisha
  2020-12-07 15:19                       ` Richard Cochran
  1 sibling, 0 replies; 41+ messages in thread
From: Eran Ben Elisha @ 2020-12-07 11:05 UTC (permalink / raw)
  To: Saeed Mahameed, Richard Cochran
  Cc: Jakub Kicinski, David S. Miller, netdev, Tariq Toukan



On 12/7/2020 10:37 AM, Saeed Mahameed wrote:
> On Sun, 2020-12-06 at 09:08 -0800, Richard Cochran wrote:
>> On Sun, Dec 06, 2020 at 03:37:47PM +0200, Eran Ben Elisha wrote:
>>> Adding new enum to the ioctl means we have add
>>> (HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY for example) all the way -
>>> drivers,
>>> kernel ptp, user space ptp, ethtool.
>>>
> 
> Not exactly,
> 1) the flag name should be HWTSTAMP_TX_PTP_EVENTS, similar to what we
> already have in RX, which will mean:
> HW stamp all PTP events, don't care about the rest.
> 
> 2) no need to add it to drivers from the get go, only drivers who are
> interested may implement it, and i am sure there are tons who would
> like to have this flag if their hw timestamping implementation is slow
> ! other drivers will just keep doing what they are doing, timestamp all
> traffic even if user requested this flag, again exactly like many other
> drivers do for RX flags (hwtstamp_rx_filters).
> 
>>> My concerns are:
>>> 1. Timestamp applications (like ptp4l or similar) will have to add
>>> support
>>> for configuring the driver to use HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY
>>> if
>>> supported via ioctl prior to packets transmit. From application
>>> point of
>>> view, the dual-modes (HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY ,
>>> HWTSTAMP_TX_ON)
>>> support is redundant, as it offers nothing new.
>>
>> Well said.
>>
> 
> disagree, it is not a dual mode, just allow the user to have better
> granularity for what hw stamps, exactly like what we have in rx.
> 
> we are not adding any new mechanism.
> 
>>> 2. Other vendors will have to support it as well, when not sure
>>> what is the
>>> expectation from them if they cannot improve accuracy between them.
>>
>> If there were multiple different devices out there with this kind of
>> implementation (different levels of accuracy with increasing run time
>> performance cost), then we could consider such a flag.  However, to
>> my
>> knowledge, this feature is unique to your device.
>>
> 
> I agree, but i never meant to have a flag that indicate two different
> levels of accuracy, that would be a very wild mistake for sure!
> 
> The new flag will be about selecting granularity of what gets a hw
> stamp and what doesn't, aligning with the RX filter API.
> 
>>> This feature is just an internal enhancement, and as such it should
>>> be added
>>> only as a vendor private configuration flag. We are not offering
>>> here about
>>> any standard for others to follow.
>>
>> +1
>>
> 
> Our driver feature is and internal enhancement yes, but the suggested
> flag is very far from indicating any internal enhancement, is actually
> an enhancement to the current API, and is a very simple extension with
> wide range of improvements to all layers.
> 
> Our driver can optimize accuracy when this flag is set, other drivers
> might be happy to implement it since they already have a slow hw and
> this flag would allow them to run better TCP/UDP performance while
> still performing ptp hw stamping, some admins/apps will use it to avoid
> stamping all traffic on tx, win win win.
> 
> 
Seems interesting. I can form such V2 patches soon.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-07  8:37                     ` Saeed Mahameed
  2020-12-07 11:05                       ` Eran Ben Elisha
@ 2020-12-07 15:19                       ` Richard Cochran
  2020-12-07 20:42                         ` Jakub Kicinski
  1 sibling, 1 reply; 41+ messages in thread
From: Richard Cochran @ 2020-12-07 15:19 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Eran Ben Elisha, Jakub Kicinski, David S. Miller, netdev, Tariq Toukan

On Mon, Dec 07, 2020 at 12:37:45AM -0800, Saeed Mahameed wrote:

> we are not adding any new mechanism.

Sorry, I didn't catch the beginning of this thread.  Are you proposing
adding HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY to net_tstamp.h ?

If so, then ...

> Our driver feature is and internal enhancement yes, but the suggested
> flag is very far from indicating any internal enhancement, is actually
> an enhancement to the current API, and is a very simple extension with
> wide range of improvements to all layers.

No, that would be no enhancement but rather a hack for poorly designed
hardware.
 
> Our driver can optimize accuracy when this flag is set, other drivers
> might be happy to implement it since they already have a slow hw

Name three other drivers that would "be happy" to implement this.  Can
you name even one other?

Thanks,
Richard



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-06 13:36             ` Eran Ben Elisha
@ 2020-12-07 20:29               ` Jakub Kicinski
  0 siblings, 0 replies; 41+ messages in thread
From: Jakub Kicinski @ 2020-12-07 20:29 UTC (permalink / raw)
  To: Eran Ben Elisha; +Cc: Saeed Mahameed, David S. Miller, netdev, Tariq Toukan

On Sun, 6 Dec 2020 15:36:38 +0200 Eran Ben Elisha wrote:
> On 12/5/2020 1:17 AM, Jakub Kicinski wrote:
> >> We only forward ptp traffic to the new special queue but we create more
> >> than one to avoid internal locking as we will utilize the tx softirq
> >> percpu.  
> > In other words to make the driver implementation simpler we'll have
> > a pretty basic feature hidden behind a ethtool priv knob and a number
> > of queues which doesn't match reality reported to user space. Hm.  
> 
> We are not hiding these queues from the netdev stack. We report them in 
> real num of TX queues and manage them as any other queue. The only 
> change is that select_queue() will select a stream to them if and only 
> if they match specific criteria.

Please read more carefully what you're replying to. That helps
communication and limits frustration quite a lot.

I said the queues are hidden behind the ethtool knob, as in they are
only instantiated when the knob is turned from its default position.
Then you report to the stack that you have n+m queues, but in fact
there are only n queues that are of general use.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-07 15:19                       ` Richard Cochran
@ 2020-12-07 20:42                         ` Jakub Kicinski
  2020-12-07 22:04                           ` Saeed Mahameed
  2020-12-08 13:02                           ` Richard Cochran
  0 siblings, 2 replies; 41+ messages in thread
From: Jakub Kicinski @ 2020-12-07 20:42 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Saeed Mahameed, Eran Ben Elisha, David S. Miller, netdev, Tariq Toukan

On Mon, 7 Dec 2020 07:19:06 -0800 Richard Cochran wrote:
> On Mon, Dec 07, 2020 at 12:37:45AM -0800, Saeed Mahameed wrote:
> > we are not adding any new mechanism.  
> 
> Sorry, I didn't catch the beginning of this thread.  Are you proposing
> adding HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY to net_tstamp.h ?
> 
> If so, then ...
> 
> > Our driver feature is and internal enhancement yes, but the suggested
> > flag is very far from indicating any internal enhancement, is actually
> > an enhancement to the current API, and is a very simple extension with
> > wide range of improvements to all layers.  
> 
> No, that would be no enhancement but rather a hack for poorly designed
> hardware.
> 
> > Our driver can optimize accuracy when this flag is set, other drivers
> > might be happy to implement it since they already have a slow hw  
> 
> Name three other drivers that would "be happy" to implement this.  Can
> you name even one other?

The behavior is not entirely dissimilar to the time stamps on
multi-layered devices (e.g. DSA switches). The time stamp can either 
be generated when the packet enters the device (current mlx5 behavior)
or when it actually egresses thru the MAC (what this set adds).

So while we could find other hardware like this if we squint hard enough
- I'm not sure how much practical use for CPU-side stamps there is in DSA.


My main concern is the user friendliness. I think there is no question
that a user running ptp4l would want this mlx5 knob to be enabled. Would
we rather see a patch to ptp4l that turns a per-driver knob, or should we
shoot for some form of an API that tells the kernel that we're
expecting ns level time accuracy?

That's how I would phrase the dilemma here.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-07 20:42                         ` Jakub Kicinski
@ 2020-12-07 22:04                           ` Saeed Mahameed
  2020-12-08 13:02                           ` Richard Cochran
  1 sibling, 0 replies; 41+ messages in thread
From: Saeed Mahameed @ 2020-12-07 22:04 UTC (permalink / raw)
  To: Jakub Kicinski, Richard Cochran
  Cc: Eran Ben Elisha, David S. Miller, netdev, Tariq Toukan

On Mon, 2020-12-07 at 12:42 -0800, Jakub Kicinski wrote:
> On Mon, 7 Dec 2020 07:19:06 -0800 Richard Cochran wrote:
> > On Mon, Dec 07, 2020 at 12:37:45AM -0800, Saeed Mahameed wrote:
> > > we are not adding any new mechanism.  
> > 
> > Sorry, I didn't catch the beginning of this thread.  Are you
> > proposing
> > adding HWTSTAMP_TX_ON_TIME_CRITICAL_ONLY to net_tstamp.h ?
> > 
> > If so, then ...
> > 
> > > Our driver feature is and internal enhancement yes, but the
> > > suggested
> > > flag is very far from indicating any internal enhancement, is
> > > actually
> > > an enhancement to the current API, and is a very simple extension
> > > with
> > > wide range of improvements to all layers.  
> > 
> > No, that would be no enhancement but rather a hack for poorly
> > designed
> > hardware.
> > 

Why? How is the new flag different from HWTSTAMP_TX_ONESTEP_SYNC?
It is a way to fine-tune the driver; nothing is hacky about the new
flag.

> > > Our driver can optimize accuracy when this flag is set, other
> > > drivers
> > > might be happy to implement it since they already have a slow
> > > hw  
> > 
> > Name three other drivers that would "be happy" to implement
> > this.  Can
> > you name even one other?
> 
> The behavior is not entirely dissimilar to the time stamps on
> multi-layered devices (e.g. DSA switches). The time stamp can either 
> be generated when the packet enters the device (current mlx5
> behavior)
> or when it actually egresses thru the MAC (what this set adds).
> 
> So while we could find other hardware like this if we squint hard
> enough
> - I'm not sure how much practical use for CPU-side stamps there is in
> DSA.
> 
> 
> My main concern is the user friendliness. I think there is no
> question
> that user running ptp4l would want this mlx5 knob to be enabled.
> Would
> we rather see a patch to ptp4l that turns per driver knob or should
> we
> shoot for some form of an API that tells the kernel that we're
> expecting ns level time accuracy? 
> 
> That's how I would phrase the dilemma here.

This is why I think that a new PTP tx flag, letting the driver know
that only PTP EVENT messages are important, would be the perfect answer
for all of the above. This flag has a very standard definition, which
could also mean improved precision for PTP messages if the HW can do
it. ptp4l should always choose this flag if it is present, as ptp4l
shouldn't request ptp hw tstamp on all tx traffic as it is doing today;
that is just overkill.

The other option would be adding a new knob outside the scope of the
PTP APIs, which is going to be as ugly as a private flag.



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [net-next V2 08/15] net/mlx5e: Add TX PTP port object support
  2020-12-07 20:42                         ` Jakub Kicinski
  2020-12-07 22:04                           ` Saeed Mahameed
@ 2020-12-08 13:02                           ` Richard Cochran
  1 sibling, 0 replies; 41+ messages in thread
From: Richard Cochran @ 2020-12-08 13:02 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, Eran Ben Elisha, David S. Miller, netdev, Tariq Toukan

On Mon, Dec 07, 2020 at 12:42:33PM -0800, Jakub Kicinski wrote:

> The behavior is not entirely dissimilar to the time stamps on
> multi-layered devices (e.g. DSA switches). The time stamp can either 
> be generated when the packet enters the device (current mlx5 behavior)
> or when it actually egresses thru the MAC (what this set adds).

To be useful, the time stamps must be taken on the external ports.
Generating the time stamp at the DMA reception in the device doesn't
even make sense, unless the delay through the device is constant.

> My main concern is the user friendliness. I think there is no question
> that user running ptp4l would want this mlx5 knob to be enabled.

Right.

> Would
> we rather see a patch to ptp4l that turns per driver knob or should we
> shoot for some form of an API that tells the kernel that we're
> expecting ns level time accuracy? 

This is a hardware-specific "feature".  One of the guiding principles
of the linuxptp user space stack is not to become a catalog of
workarounds for random hardware.  IMO the kernel's API should not
encourage "special" hardware either.  After all, we have lots and lots
of PTP hardware supported, all of them already working with the
existing API just fine.

My preference is for a global knob for users of this hardware, either

- a compile time Kconfig option on the driver, or
- some kind of sysctl/debugfs knob

Thanks,
Richard




^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2020-12-08 13:03 UTC | newest]

Thread overview: 41+ messages
2020-12-03  4:20 [pull request][net-next V2 00/15] mlx5 updates 2020-12-01 Saeed Mahameed
2020-12-03  4:20 ` [net-next V2 01/15] net/mlx5e: Free drop RQ in a dedicated function Saeed Mahameed
2020-12-03  4:20 ` [net-next V2 02/15] net/mlx5e: Allow CQ outside of channel context Saeed Mahameed
2020-12-03  4:20 ` [net-next V2 03/15] net/mlx5e: Allow RQ " Saeed Mahameed
2020-12-03  4:20 ` [net-next V2 04/15] net/mlx5e: Allow SQ " Saeed Mahameed
2020-12-03  4:20 ` [net-next V2 05/15] net/mlx5e: Change skb fifo push/pop API to be used without SQ Saeed Mahameed
2020-12-03  4:20 ` [net-next V2 06/15] net/mlx5e: Split SW group counters update function Saeed Mahameed
2020-12-03  4:21 ` [net-next V2 07/15] net/mlx5e: Move MLX5E_RX_ERR_CQE macro Saeed Mahameed
2020-12-03  4:21 ` [net-next V2 08/15] net/mlx5e: Add TX PTP port object support Saeed Mahameed
2020-12-04  2:29   ` Jakub Kicinski
2020-12-04 19:33     ` Saeed Mahameed
2020-12-04 20:26       ` Jakub Kicinski
2020-12-04 21:57         ` Saeed Mahameed
2020-12-04 22:52           ` Jakub Kicinski
2020-12-05  0:55             ` Vladimir Oltean
2020-12-07  6:22               ` Saeed Mahameed
2020-12-04 23:17           ` Jakub Kicinski
2020-12-04 23:57             ` Saeed Mahameed
2020-12-05  0:24               ` Jakub Kicinski
2020-12-06 13:37                 ` Eran Ben Elisha
2020-12-06 17:08                   ` Richard Cochran
2020-12-07  8:37                     ` Saeed Mahameed
2020-12-07 11:05                       ` Eran Ben Elisha
2020-12-07 15:19                       ` Richard Cochran
2020-12-07 20:42                         ` Jakub Kicinski
2020-12-07 22:04                           ` Saeed Mahameed
2020-12-08 13:02                           ` Richard Cochran
2020-12-06 13:36             ` Eran Ben Elisha
2020-12-07 20:29               ` Jakub Kicinski
2020-12-06 13:33           ` Eran Ben Elisha
2020-12-05  1:49         ` Vladimir Oltean
2020-12-05  2:10           ` Jakub Kicinski
2020-12-05 13:20           ` Richard Cochran
2020-12-07  5:50           ` Saeed Mahameed
2020-12-03  4:21 ` [net-next V2 09/15] net/mlx5e: Add TX port timestamp support Saeed Mahameed
2020-12-03  4:21 ` [net-next V2 10/15] net/mlx5e: remove unnecessary memset Saeed Mahameed
2020-12-03  4:21 ` [net-next V2 11/15] net/mlx5e: Remove duplicated include Saeed Mahameed
2020-12-03  4:21 ` [net-next V2 12/15] net/mlx5: Arm only EQs with EQEs Saeed Mahameed
2020-12-03  4:21 ` [net-next V2 13/15] net/mlx5: Fix passing zero to 'PTR_ERR' Saeed Mahameed
2020-12-03  4:21 ` [net-next V2 14/15] net/mlx5e: Split between RX/TX tunnel FW support indication Saeed Mahameed
2020-12-03  4:21 ` [net-next V2 15/15] net/mlx5e: Fill mlx5e_create_cq_param in a function Saeed Mahameed
