* [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19
@ 2018-05-21 21:04 Saeed Mahameed
  2018-05-21 21:04 ` [net-next 1/6] net/dcb: Add dcbnl buffer attribute Saeed Mahameed
                   ` (6 more replies)
  0 siblings, 7 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-05-21 21:04 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Saeed Mahameed

Hi Dave,

This is a mlx5e only pull request, for more information please see tag
log below.

Please pull and let me know if there's any problem.

Thanks,
Saeed.

---

The following changes since commit eb38401c779d350e9e31396471ea072fa29aec9b:

  net: stmmac: Populate missing callbacks in HWIF initialization (2018-05-18 13:56:08 -0400)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5e-updates-2018-05-19

for you to fetch changes up to e4362d6f7191732f47f5bc63cc4c736494f1f964:

  net/mlx5e: Receive buffer support for DCBX (2018-05-19 06:09:23 -0700)

----------------------------------------------------------------
mlx5e-updates-2018-05-19

This series contains updates for the mlx5e netdevice driver around a
single theme: DSCP to priority mapping. The first patch, from Huy, adds
the needed API in dcbnl; the second patch adds the needed mlx5 core
capability bits for the feature; and the remaining patches are mlx5e
(netdev) only changes that add support for the feature.

From: Huy Nguyen

Dscp to priority mapping for Ethernet packets:

These patches enable differentiated services code point (dscp) to
priority mapping for Ethernet packets. Once this feature is enabled, a
packet is routed to the corresponding priority based on its dscp value.
The user can combine this feature with the priority flow control (pfc)
feature to get priority flow control based on dscp.

Firmware interface:
Mellanox firmware provides two control knobs for this feature:
  The QPTS register allows changing the trust state between dscp and
  pcp mode. The default is pcp mode. Once in dscp mode, firmware
  routes a packet based on its dscp value if the dscp field exists.

  The QPDPM register allows mapping a specific dscp (0 to 63) to a
  specific priority (0 to 7). By default, all dscps are mapped to
  priority zero.

Software interface:
This feature is controlled via the application priority TLV. IEEE
specification P802.1Qcd/D2.1 defines priority selector id 5 for the
application priority TLV; this APP TLV selector defines the DSCP to
priority map. The APP TLV can be sent by the switch or set locally
using software such as lldptool. In the mlx5 driver, we add support for
the dcbnl getapp and setapp callbacks. The mlx5 driver only handles
selector id 5 application entries (dscp application priority entries).
If the user sends multiple dscp to priority APP TLV entries for the
same dscp, the last one sent takes effect and all previously sent
entries are deleted.

The firmware trust state (in QPTS register) is changed based on the
number of dscp to priority application entries. When the first dscp to
priority application entry is added by the user, the trust state is
changed to dscp. When the last dscp to priority application entry is
deleted by the user, the trust state is changed to pcp.

When the port is in DSCP trust state, the transmit queue is selected
based on the dscp of the skb.

When the port is in DSCP trust state and the vport inline mode is not
NONE, firmware requires the mlx5 driver to copy the IP header into the
WQE ethernet segment inline header if the skb has one. This is done by
changing the transmit queue's (sq's) min inline mode to L3. Note that
the min inline mode of sqs that belong to other features, such as the
xdpsq and icosq, is not modified.

----------------------------------------------------------------
Huy Nguyen (6):
      net/dcb: Add dcbnl buffer attribute
      net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask
      net/mlx5e: Move port speed code from en_ethtool.c to en/port.c
      net/mlx5e: PPTB and PBMC register firmware command support
      net/mlx5e: Receive buffer configuration
      net/mlx5e: Receive buffer support for DCBX

 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |   8 +-
 .../net/ethernet/mellanox/mlx5/core/en/Makefile    |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en/port.c  | 237 +++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en/port.h  |  48 +++
 .../ethernet/mellanox/mlx5/core/en/port_buffer.c   | 327 +++++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/en/port_buffer.h   |  75 +++++
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 131 ++++++++-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 102 +++----
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |   3 +-
 include/linux/mlx5/device.h                        |   3 +
 include/linux/mlx5/driver.h                        |   2 +
 include/linux/mlx5/mlx5_ifc.h                      |  47 +++
 include/net/dcbnl.h                                |   4 +
 include/uapi/linux/dcbnl.h                         |  10 +
 net/dcb/dcbnl.c                                    |  20 ++
 17 files changed, 945 insertions(+), 80 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/Makefile
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/port.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/port.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-21 21:04 [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19 Saeed Mahameed
@ 2018-05-21 21:04 ` Saeed Mahameed
  2018-05-22  5:20   ` Jakub Kicinski
  2018-05-23 20:19   ` Jakub Kicinski
  2018-05-21 21:04 ` [net-next 2/6] net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask Saeed Mahameed
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-05-21 21:04 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Huy Nguyen, Saeed Mahameed

From: Huy Nguyen <huyn@mellanox.com>

In this patch, we add a dcbnl buffer attribute to allow the user to
change the NIC's buffer configuration, such as the priority to buffer
mapping and the size of each individual buffer.

Combined with the pfc attribute, this attribute allows an advanced user
to fine-tune the QoS settings for a specific priority queue. For
example, the user can give a dedicated buffer to one or more
priorities, or give a larger buffer to certain priorities.

We present a use case scenario where the dcbnl buffer attribute,
configured by an advanced user, helps reduce the latency of messages of
different sizes.

Scenario description:
On ConnectX-5, we run latency-sensitive traffic with small/medium
message sizes ranging from 64B to 256KB, and bandwidth-sensitive
traffic with large message sizes of 512KB and 1MB. We group the small,
medium, and large message sizes onto their own pfc-enabled priorities
as follows:
  Priorities 1 & 2 (64B, 256B and 1KB)
  Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
  Priorities 5 & 6 (512KB and 1MB)

By default, ConnectX-5 maps all pfc-enabled priorities to a single
lossless buffer of fixed size, 50% of the total available buffer space;
the other 50% is assigned to the lossy buffer. Using the dcbnl buffer
attribute, we create three equally sized lossless buffers, each with
25% of the total available buffer space, so the lossy buffer size
shrinks to 25%. The priority to lossless buffer mappings are set as
follows:
  Priorities 1 & 2 on lossless buffer #1
  Priorities 3 & 4 on lossless buffer #2
  Priorities 5 & 6 on lossless buffer #3

We observe the following latency improvements for small and medium
message sizes. Note that bandwidth for the large message sizes is
reduced, but the total bandwidth remains the same.
  256B message size (42% latency reduction)
  4K message size (21% latency reduction)
  64K message size (16% latency reduction)

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/net/dcbnl.h        |  4 ++++
 include/uapi/linux/dcbnl.h | 10 ++++++++++
 net/dcb/dcbnl.c            | 20 ++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/include/net/dcbnl.h b/include/net/dcbnl.h
index 207d9ba1f92c..0e5e91be2d30 100644
--- a/include/net/dcbnl.h
+++ b/include/net/dcbnl.h
@@ -101,6 +101,10 @@ struct dcbnl_rtnl_ops {
 	/* CEE peer */
 	int (*cee_peer_getpg) (struct net_device *, struct cee_pg *);
 	int (*cee_peer_getpfc) (struct net_device *, struct cee_pfc *);
+
+	/* buffer settings */
+	int (*dcbnl_getbuffer)(struct net_device *, struct dcbnl_buffer *);
+	int (*dcbnl_setbuffer)(struct net_device *, struct dcbnl_buffer *);
 };
 
 #endif /* __NET_DCBNL_H__ */
diff --git a/include/uapi/linux/dcbnl.h b/include/uapi/linux/dcbnl.h
index 2c0c6453c3f4..1ddc0a44c172 100644
--- a/include/uapi/linux/dcbnl.h
+++ b/include/uapi/linux/dcbnl.h
@@ -163,6 +163,15 @@ struct ieee_pfc {
 	__u64	indications[IEEE_8021QAZ_MAX_TCS];
 };
 
+#define IEEE_8021Q_MAX_PRIORITIES 8
+#define DCBX_MAX_BUFFERS  8
+struct dcbnl_buffer {
+	/* priority to buffer mapping */
+	__u8    prio2buffer[IEEE_8021Q_MAX_PRIORITIES];
+	/* buffer size in Bytes */
+	__u32   buffer_size[DCBX_MAX_BUFFERS];
+};
+
 /* CEE DCBX std supported values */
 #define CEE_DCBX_MAX_PGS	8
 #define CEE_DCBX_MAX_PRIO	8
@@ -406,6 +415,7 @@ enum ieee_attrs {
 	DCB_ATTR_IEEE_MAXRATE,
 	DCB_ATTR_IEEE_QCN,
 	DCB_ATTR_IEEE_QCN_STATS,
+	DCB_ATTR_DCB_BUFFER,
 	__DCB_ATTR_IEEE_MAX
 };
 #define DCB_ATTR_IEEE_MAX (__DCB_ATTR_IEEE_MAX - 1)
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index bae7d78aa068..d2f4e0c1faaf 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -176,6 +176,7 @@ static const struct nla_policy dcbnl_ieee_policy[DCB_ATTR_IEEE_MAX + 1] = {
 	[DCB_ATTR_IEEE_MAXRATE]   = {.len = sizeof(struct ieee_maxrate)},
 	[DCB_ATTR_IEEE_QCN]         = {.len = sizeof(struct ieee_qcn)},
 	[DCB_ATTR_IEEE_QCN_STATS]   = {.len = sizeof(struct ieee_qcn_stats)},
+	[DCB_ATTR_DCB_BUFFER]       = {.len = sizeof(struct dcbnl_buffer)},
 };
 
 /* DCB number of traffic classes nested attributes. */
@@ -1094,6 +1095,16 @@ static int dcbnl_ieee_fill(struct sk_buff *skb, struct net_device *netdev)
 			return -EMSGSIZE;
 	}
 
+	if (ops->dcbnl_getbuffer) {
+		struct dcbnl_buffer buffer;
+
+		memset(&buffer, 0, sizeof(buffer));
+		err = ops->dcbnl_getbuffer(netdev, &buffer);
+		if (!err &&
+		    nla_put(skb, DCB_ATTR_DCB_BUFFER, sizeof(buffer), &buffer))
+			return -EMSGSIZE;
+	}
+
 	app = nla_nest_start(skb, DCB_ATTR_IEEE_APP_TABLE);
 	if (!app)
 		return -EMSGSIZE;
@@ -1453,6 +1464,15 @@ static int dcbnl_ieee_set(struct net_device *netdev, struct nlmsghdr *nlh,
 			goto err;
 	}
 
+	if (ieee[DCB_ATTR_DCB_BUFFER] && ops->dcbnl_setbuffer) {
+		struct dcbnl_buffer *buffer =
+			nla_data(ieee[DCB_ATTR_DCB_BUFFER]);
+
+		err = ops->dcbnl_setbuffer(netdev, buffer);
+		if (err)
+			goto err;
+	}
+
 	if (ieee[DCB_ATTR_IEEE_APP_TABLE]) {
 		struct nlattr *attr;
 		int rem;
-- 
2.17.0


* [net-next 2/6] net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask
  2018-05-21 21:04 [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19 Saeed Mahameed
  2018-05-21 21:04 ` [net-next 1/6] net/dcb: Add dcbnl buffer attribute Saeed Mahameed
@ 2018-05-21 21:04 ` Saeed Mahameed
  2018-05-22 10:19   ` Or Gerlitz
  2018-05-21 21:04 ` [net-next 3/6] net/mlx5e: Move port speed code from en_ethtool.c to en/port.c Saeed Mahameed
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: Saeed Mahameed @ 2018-05-21 21:04 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Huy Nguyen, Saeed Mahameed

From: Huy Nguyen <huyn@mellanox.com>

Add the pbmc and pptb bits to the port_access_reg_cap_mask. These two
bits indicate whether the device supports receive buffer configuration.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/device.h   |  3 +++
 include/linux/mlx5/mlx5_ifc.h | 12 ++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 2bc27f8c5b87..db0332a6d23c 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1152,6 +1152,9 @@ enum mlx5_qcam_feature_groups {
 #define MLX5_CAP_PCAM_FEATURE(mdev, fld) \
 	MLX5_GET(pcam_reg, (mdev)->caps.pcam, feature_cap_mask.enhanced_features.fld)
 
+#define MLX5_CAP_PCAM_REG(mdev, reg) \
+	MLX5_GET(pcam_reg, (mdev)->caps.pcam, port_access_reg_cap_mask.regs_5000_to_507f.reg)
+
 #define MLX5_CAP_MCAM_REG(mdev, reg) \
 	MLX5_GET(mcam_reg, (mdev)->caps.mcam, mng_access_reg_cap_mask.access_regs.reg)
 
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index b4ea8a9914c4..f687989d336b 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -8003,6 +8003,17 @@ struct mlx5_ifc_pcam_enhanced_features_bits {
 	u8         ppcnt_statistical_group[0x1];
 };
 
+struct mlx5_ifc_pcam_regs_5000_to_507f_bits {
+	u8         port_access_reg_cap_mask_127_to_96[0x20];
+	u8         port_access_reg_cap_mask_95_to_64[0x20];
+	u8         port_access_reg_cap_mask_63_to_32[0x20];
+
+	u8         port_access_reg_cap_mask_31_to_13[0x13];
+	u8         pbmc[0x1];
+	u8         pptb[0x1];
+	u8         port_access_reg_cap_mask_10_to_0[0xb];
+};
+
 struct mlx5_ifc_pcam_reg_bits {
 	u8         reserved_at_0[0x8];
 	u8         feature_group[0x8];
@@ -8012,6 +8023,7 @@ struct mlx5_ifc_pcam_reg_bits {
 	u8         reserved_at_20[0x20];
 
 	union {
+		struct mlx5_ifc_pcam_regs_5000_to_507f_bits regs_5000_to_507f;
 		u8         reserved_at_0[0x80];
 	} port_access_reg_cap_mask;
 
-- 
2.17.0


* [net-next 3/6] net/mlx5e: Move port speed code from en_ethtool.c to en/port.c
  2018-05-21 21:04 [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19 Saeed Mahameed
  2018-05-21 21:04 ` [net-next 1/6] net/dcb: Add dcbnl buffer attribute Saeed Mahameed
  2018-05-21 21:04 ` [net-next 2/6] net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask Saeed Mahameed
@ 2018-05-21 21:04 ` Saeed Mahameed
  2018-05-21 21:05 ` [net-next 4/6] net/mlx5e: PPTB and PBMC register firmware command support Saeed Mahameed
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-05-21 21:04 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Huy Nguyen, Saeed Mahameed

From: Huy Nguyen <huyn@mellanox.com>

Move the four functions below from en_ethtool.c to en/port.c. These
functions are used by both en_ethtool.c and en_main.c, and downstream
patches will use them without the ethtool link mode dependency:
  u32 mlx5e_port_ptys2speed(u32 eth_proto_oper);
  int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
  int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
  u32 mlx5e_port_speed2linkmodes(u32 speed);

Delete the speed field from the ptys2ethtool_table built by
mlx5e_build_ptys2ethtool_map; this table now only keeps the mapping
between mlx5e link modes and ethtool link modes. Add a new table,
mlx5e_link_speed, for translating from an mlx5e link mode to the actual
speed.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   2 -
 .../ethernet/mellanox/mlx5/core/en/Makefile   |   1 +
 .../net/ethernet/mellanox/mlx5/core/en/port.c | 129 ++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en/port.h |  43 ++++++
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  | 102 +++++---------
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   3 +-
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   |   3 +-
 8 files changed, 213 insertions(+), 72 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/Makefile
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/port.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/port.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index a7135f5d5cf6..651cf3640420 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,7 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o fpga/conn.o fpga/sdk.o \
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
 		en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o \
-		en_arfs.o en_fs_ethtool.o en_selftest.o
+		en_arfs.o en_fs_ethtool.o en_selftest.o en/port.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index bc91a7335c93..d13a86a1d702 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -932,8 +932,6 @@ void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv);
 
 void mlx5e_build_default_indir_rqt(u32 *indirection_rqt, int len,
 				   int num_channels);
-int mlx5e_get_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
-
 void mlx5e_set_tx_cq_mode_params(struct mlx5e_params *params,
 				 u8 cq_period_mode);
 void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/en/Makefile
new file mode 100644
index 000000000000..d8e17110f25d
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/Makefile
@@ -0,0 +1 @@
+subdir-ccflags-y += -I$(src)/..
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
new file mode 100644
index 000000000000..9f04542f3661
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
@@ -0,0 +1,129 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "port.h"
+
+/* speed in units of 1Mb */
+static const u32 mlx5e_link_speed[MLX5E_LINK_MODES_NUMBER] = {
+	[MLX5E_1000BASE_CX_SGMII] = 1000,
+	[MLX5E_1000BASE_KX]       = 1000,
+	[MLX5E_10GBASE_CX4]       = 10000,
+	[MLX5E_10GBASE_KX4]       = 10000,
+	[MLX5E_10GBASE_KR]        = 10000,
+	[MLX5E_20GBASE_KR2]       = 20000,
+	[MLX5E_40GBASE_CR4]       = 40000,
+	[MLX5E_40GBASE_KR4]       = 40000,
+	[MLX5E_56GBASE_R4]        = 56000,
+	[MLX5E_10GBASE_CR]        = 10000,
+	[MLX5E_10GBASE_SR]        = 10000,
+	[MLX5E_10GBASE_ER]        = 10000,
+	[MLX5E_40GBASE_SR4]       = 40000,
+	[MLX5E_40GBASE_LR4]       = 40000,
+	[MLX5E_50GBASE_SR2]       = 50000,
+	[MLX5E_100GBASE_CR4]      = 100000,
+	[MLX5E_100GBASE_SR4]      = 100000,
+	[MLX5E_100GBASE_KR4]      = 100000,
+	[MLX5E_100GBASE_LR4]      = 100000,
+	[MLX5E_100BASE_TX]        = 100,
+	[MLX5E_1000BASE_T]        = 1000,
+	[MLX5E_10GBASE_T]         = 10000,
+	[MLX5E_25GBASE_CR]        = 25000,
+	[MLX5E_25GBASE_KR]        = 25000,
+	[MLX5E_25GBASE_SR]        = 25000,
+	[MLX5E_50GBASE_CR2]       = 50000,
+	[MLX5E_50GBASE_KR2]       = 50000,
+};
+
+u32 mlx5e_port_ptys2speed(u32 eth_proto_oper)
+{
+	unsigned long temp = eth_proto_oper;
+	u32 speed = 0;
+	int i;
+
+	i = find_first_bit(&temp, MLX5E_LINK_MODES_NUMBER);
+	if (i < MLX5E_LINK_MODES_NUMBER)
+		speed = mlx5e_link_speed[i];
+
+	return speed;
+}
+
+int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed)
+{
+	u32 out[MLX5_ST_SZ_DW(ptys_reg)] = {};
+	u32 eth_proto_oper;
+	int err;
+
+	err = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_EN, 1);
+	if (err)
+		return err;
+
+	eth_proto_oper = MLX5_GET(ptys_reg, out, eth_proto_oper);
+	*speed = mlx5e_port_ptys2speed(eth_proto_oper);
+	if (!(*speed)) {
+		mlx5_core_warn(mdev, "cannot get port speed\n");
+		err = -EINVAL;
+	}
+
+	return err;
+}
+
+int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed)
+{
+	u32 max_speed = 0;
+	u32 proto_cap;
+	int err;
+	int i;
+
+	err = mlx5_query_port_proto_cap(mdev, &proto_cap, MLX5_PTYS_EN);
+	if (err)
+		return err;
+
+	for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i)
+		if (proto_cap & MLX5E_PROT_MASK(i))
+			max_speed = max(max_speed, mlx5e_link_speed[i]);
+
+	*speed = max_speed;
+	return 0;
+}
+
+u32 mlx5e_port_speed2linkmodes(u32 speed)
+{
+	u32 link_modes = 0;
+	int i;
+
+	for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i) {
+		if (mlx5e_link_speed[i] == speed)
+			link_modes |= MLX5E_PROT_MASK(i);
+	}
+
+	return link_modes;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.h b/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
new file mode 100644
index 000000000000..7aae38e98a65
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
@@ -0,0 +1,43 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef __MLX5E_EN_PORT_H
+#define __MLX5E_EN_PORT_H
+
+#include <linux/mlx5/driver.h>
+#include "en.h"
+
+u32 mlx5e_port_ptys2speed(u32 eth_proto_oper);
+int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
+int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
+u32 mlx5e_port_speed2linkmodes(u32 speed);
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 2b786c4d3dab..42bd256e680d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -31,6 +31,7 @@
  */
 
 #include "en.h"
+#include "en/port.h"
 
 void mlx5e_ethtool_get_drvinfo(struct mlx5e_priv *priv,
 			       struct ethtool_drvinfo *drvinfo)
@@ -59,18 +60,16 @@ static void mlx5e_get_drvinfo(struct net_device *dev,
 struct ptys2ethtool_config {
 	__ETHTOOL_DECLARE_LINK_MODE_MASK(supported);
 	__ETHTOOL_DECLARE_LINK_MODE_MASK(advertised);
-	u32 speed;
 };
 
 static struct ptys2ethtool_config ptys2ethtool_table[MLX5E_LINK_MODES_NUMBER];
 
-#define MLX5_BUILD_PTYS2ETHTOOL_CONFIG(reg_, speed_, ...)               \
+#define MLX5_BUILD_PTYS2ETHTOOL_CONFIG(reg_, ...)                       \
 	({                                                              \
 		struct ptys2ethtool_config *cfg;                        \
 		const unsigned int modes[] = { __VA_ARGS__ };           \
 		unsigned int i;                                         \
 		cfg = &ptys2ethtool_table[reg_];                        \
-		cfg->speed = speed_;                                    \
 		bitmap_zero(cfg->supported,                             \
 			    __ETHTOOL_LINK_MODE_MASK_NBITS);            \
 		bitmap_zero(cfg->advertised,                            \
@@ -83,55 +82,55 @@ static struct ptys2ethtool_config ptys2ethtool_table[MLX5E_LINK_MODES_NUMBER];
 
 void mlx5e_build_ptys2ethtool_map(void)
 {
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_1000BASE_CX_SGMII, SPEED_1000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_1000BASE_CX_SGMII,
 				       ETHTOOL_LINK_MODE_1000baseKX_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_1000BASE_KX, SPEED_1000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_1000BASE_KX,
 				       ETHTOOL_LINK_MODE_1000baseKX_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_CX4, SPEED_10000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_CX4,
 				       ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_KX4, SPEED_10000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_KX4,
 				       ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_KR, SPEED_10000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_KR,
 				       ETHTOOL_LINK_MODE_10000baseKR_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_20GBASE_KR2, SPEED_20000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_20GBASE_KR2,
 				       ETHTOOL_LINK_MODE_20000baseKR2_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_CR4, SPEED_40000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_CR4,
 				       ETHTOOL_LINK_MODE_40000baseCR4_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_KR4, SPEED_40000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_KR4,
 				       ETHTOOL_LINK_MODE_40000baseKR4_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_56GBASE_R4, SPEED_56000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_56GBASE_R4,
 				       ETHTOOL_LINK_MODE_56000baseKR4_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_CR, SPEED_10000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_CR,
 				       ETHTOOL_LINK_MODE_10000baseKR_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_SR, SPEED_10000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_SR,
 				       ETHTOOL_LINK_MODE_10000baseKR_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_ER, SPEED_10000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_ER,
 				       ETHTOOL_LINK_MODE_10000baseKR_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_SR4, SPEED_40000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_SR4,
 				       ETHTOOL_LINK_MODE_40000baseSR4_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_LR4, SPEED_40000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_LR4,
 				       ETHTOOL_LINK_MODE_40000baseLR4_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_SR2, SPEED_50000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_SR2,
 				       ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_CR4, SPEED_100000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_CR4,
 				       ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_SR4, SPEED_100000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_SR4,
 				       ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_KR4, SPEED_100000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_KR4,
 				       ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_LR4, SPEED_100000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_LR4,
 				       ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_T, SPEED_10000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_T,
 				       ETHTOOL_LINK_MODE_10000baseT_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_CR, SPEED_25000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_CR,
 				       ETHTOOL_LINK_MODE_25000baseCR_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_KR, SPEED_25000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_KR,
 				       ETHTOOL_LINK_MODE_25000baseKR_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_SR, SPEED_25000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_SR,
 				       ETHTOOL_LINK_MODE_25000baseSR_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_CR2, SPEED_50000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_CR2,
 				       ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT);
-	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_KR2, SPEED_50000,
+	MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_KR2,
 				       ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT);
 }
 
@@ -617,43 +616,24 @@ static void ptys2ethtool_supported_advertised_port(struct ethtool_link_ksettings
 	}
 }
 
-int mlx5e_get_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed)
-{
-	u32 max_speed = 0;
-	u32 proto_cap;
-	int err;
-	int i;
-
-	err = mlx5_query_port_proto_cap(mdev, &proto_cap, MLX5_PTYS_EN);
-	if (err)
-		return err;
-
-	for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i)
-		if (proto_cap & MLX5E_PROT_MASK(i))
-			max_speed = max(max_speed, ptys2ethtool_table[i].speed);
-
-	*speed = max_speed;
-	return 0;
-}
-
 static void get_speed_duplex(struct net_device *netdev,
 			     u32 eth_proto_oper,
 			     struct ethtool_link_ksettings *link_ksettings)
 {
-	int i;
 	u32 speed = SPEED_UNKNOWN;
 	u8 duplex = DUPLEX_UNKNOWN;
 
 	if (!netif_carrier_ok(netdev))
 		goto out;
 
-	for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i) {
-		if (eth_proto_oper & MLX5E_PROT_MASK(i)) {
-			speed = ptys2ethtool_table[i].speed;
-			duplex = DUPLEX_FULL;
-			break;
-		}
+	speed = mlx5e_port_ptys2speed(eth_proto_oper);
+	if (!speed) {
+		speed = SPEED_UNKNOWN;
+		goto out;
 	}
+
+	duplex = DUPLEX_FULL;
+
 out:
 	link_ksettings->base.speed = speed;
 	link_ksettings->base.duplex = duplex;
@@ -811,18 +791,6 @@ static u32 mlx5e_ethtool2ptys_adver_link(const unsigned long *link_modes)
 	return ptys_modes;
 }
 
-static u32 mlx5e_ethtool2ptys_speed_link(u32 speed)
-{
-	u32 i, speed_links = 0;
-
-	for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i) {
-		if (ptys2ethtool_table[i].speed == speed)
-			speed_links |= MLX5E_PROT_MASK(i);
-	}
-
-	return speed_links;
-}
-
 static int mlx5e_set_link_ksettings(struct net_device *netdev,
 				    const struct ethtool_link_ksettings *link_ksettings)
 {
@@ -842,7 +810,7 @@ static int mlx5e_set_link_ksettings(struct net_device *netdev,
 
 	link_modes = link_ksettings->base.autoneg == AUTONEG_ENABLE ?
 		mlx5e_ethtool2ptys_adver_link(link_ksettings->link_modes.advertising) :
-		mlx5e_ethtool2ptys_speed_link(speed);
+		mlx5e_port_speed2linkmodes(speed);
 
 	err = mlx5_query_port_proto_cap(mdev, &eth_proto_cap, MLX5_PTYS_EN);
 	if (err) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b5a7580b12fe..cee44c21766c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -46,6 +46,7 @@
 #include "accel/ipsec.h"
 #include "accel/tls.h"
 #include "vxlan.h"
+#include "en/port.h"
 
 struct mlx5e_rq_param {
 	u32			rqc[MLX5_ST_SZ_DW(rqc)];
@@ -4082,7 +4083,7 @@ static bool slow_pci_heuristic(struct mlx5_core_dev *mdev)
 	u32 link_speed = 0;
 	u32 pci_bw = 0;
 
-	mlx5e_get_max_linkspeed(mdev, &link_speed);
+	mlx5e_port_max_linkspeed(mdev, &link_speed);
 	pci_bw = pcie_bandwidth_available(mdev->pdev, NULL, NULL, NULL);
 	mlx5_core_dbg_once(mdev, "Max link speed = %d, PCI BW = %d\n",
 			   link_speed, pci_bw);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 674f1d7d2737..a9c96fe8e4fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -52,6 +52,7 @@
 #include "eswitch.h"
 #include "vxlan.h"
 #include "fs_core.h"
+#include "en/port.h"
 
 struct mlx5_nic_flow_attr {
 	u32 action;
@@ -613,7 +614,7 @@ static int mlx5e_hairpin_flow_add(struct mlx5e_priv *priv,
 
 	params.q_counter = priv->q_counter;
 	/* set hairpin pair per each 50Gbs share of the link */
-	mlx5e_get_max_linkspeed(priv->mdev, &link_speed);
+	mlx5e_port_max_linkspeed(priv->mdev, &link_speed);
 	link_speed = max_t(u32, link_speed, 50000);
 	link_speed64 = link_speed;
 	do_div(link_speed64, 50000);
-- 
2.17.0


* [net-next 4/6] net/mlx5e: PPTB and PBMC register firmware command support
  2018-05-21 21:04 [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19 Saeed Mahameed
                   ` (2 preceding siblings ...)
  2018-05-21 21:04 ` [net-next 3/6] net/mlx5e: Move port speed code from en_ethtool.c to en/port.c Saeed Mahameed
@ 2018-05-21 21:05 ` Saeed Mahameed
  2018-05-21 21:05 ` [net-next 5/6] net/mlx5e: Receive buffer configuration Saeed Mahameed
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-05-21 21:05 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Huy Nguyen, Saeed Mahameed

From: Huy Nguyen <huyn@mellanox.com>

Add firmware command interface to read and write PPTB and PBMC
registers.

The PPTB register enables mapping a priority to a specific receive
buffer.

The PBMC register enables changing the receive buffer's configuration,
such as buffer size, xon/xoff thresholds, and the buffer's lossy and
shared properties.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/port.c | 108 ++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en/port.h |   5 +
 include/linux/mlx5/driver.h                   |   2 +
 include/linux/mlx5/mlx5_ifc.h                 |  35 ++++++
 4 files changed, 150 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
index 9f04542f3661..24e3b564964f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
@@ -127,3 +127,111 @@ u32 mlx5e_port_speed2linkmodes(u32 speed)
 
 	return link_modes;
 }
+
+int mlx5e_port_query_pbmc(struct mlx5_core_dev *mdev, void *out)
+{
+	int sz = MLX5_ST_SZ_BYTES(pbmc_reg);
+	void *in;
+	int err;
+
+	in = kzalloc(sz, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(pbmc_reg, in, local_port, 1);
+	err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PBMC, 0, 0);
+
+	kfree(in);
+	return err;
+}
+
+int mlx5e_port_set_pbmc(struct mlx5_core_dev *mdev, void *in)
+{
+	int sz = MLX5_ST_SZ_BYTES(pbmc_reg);
+	void *out;
+	int err;
+
+	out = kzalloc(sz, GFP_KERNEL);
+	if (!out)
+		return -ENOMEM;
+
+	MLX5_SET(pbmc_reg, in, local_port, 1);
+	err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PBMC, 0, 1);
+
+	kfree(out);
+	return err;
+}
+
+/* buffer[i]: buffer that priority i mapped to */
+int mlx5e_port_query_priority2buffer(struct mlx5_core_dev *mdev, u8 *buffer)
+{
+	int sz = MLX5_ST_SZ_BYTES(pptb_reg);
+	u32 prio_x_buff;
+	void *out;
+	void *in;
+	int prio;
+	int err;
+
+	in = kzalloc(sz, GFP_KERNEL);
+	out = kzalloc(sz, GFP_KERNEL);
+	if (!in || !out) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	MLX5_SET(pptb_reg, in, local_port, 1);
+	err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPTB, 0, 0);
+	if (err)
+		goto out;
+
+	prio_x_buff = MLX5_GET(pptb_reg, out, prio_x_buff);
+	for (prio = 0; prio < 8; prio++) {
+		buffer[prio] = (u8)(prio_x_buff >> (4 * prio)) & 0xF;
+		mlx5_core_dbg(mdev, "prio %d, buffer %d\n", prio, buffer[prio]);
+	}
+out:
+	kfree(in);
+	kfree(out);
+	return err;
+}
+
+int mlx5e_port_set_priority2buffer(struct mlx5_core_dev *mdev, u8 *buffer)
+{
+	int sz = MLX5_ST_SZ_BYTES(pptb_reg);
+	u32 prio_x_buff;
+	void *out;
+	void *in;
+	int prio;
+	int err;
+
+	in = kzalloc(sz, GFP_KERNEL);
+	out = kzalloc(sz, GFP_KERNEL);
+	if (!in || !out) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	/* First query the pptb register */
+	MLX5_SET(pptb_reg, in, local_port, 1);
+	err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPTB, 0, 0);
+	if (err)
+		goto out;
+
+	memcpy(in, out, sz);
+	MLX5_SET(pptb_reg, in, local_port, 1);
+
+	/* Update the pm and prio_x_buff */
+	MLX5_SET(pptb_reg, in, pm, 0xFF);
+
+	prio_x_buff = 0;
+	for (prio = 0; prio < 8; prio++)
+		prio_x_buff |= (buffer[prio] << (4 * prio));
+	MLX5_SET(pptb_reg, in, prio_x_buff, prio_x_buff);
+
+	err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPTB, 0, 1);
+
+out:
+	kfree(in);
+	kfree(out);
+	return err;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.h b/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
index 7aae38e98a65..f8cbd8194179 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
@@ -40,4 +40,9 @@ u32 mlx5e_port_ptys2speed(u32 eth_proto_oper);
 int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
 int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
 u32 mlx5e_port_speed2linkmodes(u32 speed);
+
+int mlx5e_port_query_pbmc(struct mlx5_core_dev *mdev, void *out);
+int mlx5e_port_set_pbmc(struct mlx5_core_dev *mdev, void *in);
+int mlx5e_port_query_priority2buffer(struct mlx5_core_dev *mdev, u8 *buffer);
+int mlx5e_port_set_priority2buffer(struct mlx5_core_dev *mdev, u8 *buffer);
 #endif
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 2a156c5dfadd..395fc1a9e378 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -124,6 +124,8 @@ enum {
 	MLX5_REG_PAOS		 = 0x5006,
 	MLX5_REG_PFCC            = 0x5007,
 	MLX5_REG_PPCNT		 = 0x5008,
+	MLX5_REG_PPTB            = 0x500b,
+	MLX5_REG_PBMC            = 0x500c,
 	MLX5_REG_PMAOS		 = 0x5012,
 	MLX5_REG_PUDE		 = 0x5009,
 	MLX5_REG_PMPE		 = 0x5010,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index f687989d336b..edbddeaacc88 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -8788,6 +8788,41 @@ struct mlx5_ifc_qpts_reg_bits {
 	u8         trust_state[0x3];
 };
 
+struct mlx5_ifc_pptb_reg_bits {
+	u8         reserved_at_0[0x2];
+	u8         mm[0x2];
+	u8         reserved_at_4[0x4];
+	u8         local_port[0x8];
+	u8         reserved_at_10[0x6];
+	u8         cm[0x1];
+	u8         um[0x1];
+	u8         pm[0x8];
+
+	u8         prio_x_buff[0x20];
+
+	u8         pm_msb[0x8];
+	u8         reserved_at_48[0x10];
+	u8         ctrl_buff[0x4];
+	u8         untagged_buff[0x4];
+};
+
+struct mlx5_ifc_pbmc_reg_bits {
+	u8         reserved_at_0[0x8];
+	u8         local_port[0x8];
+	u8         reserved_at_10[0x10];
+
+	u8         xoff_timer_value[0x10];
+	u8         xoff_refresh[0x10];
+
+	u8         reserved_at_40[0x9];
+	u8         fullness_threshold[0x7];
+	u8         port_buffer_size[0x10];
+
+	struct mlx5_ifc_bufferx_reg_bits buffer[10];
+
+	u8         reserved_at_2e0[0x40];
+};
+
 struct mlx5_ifc_qtct_reg_bits {
 	u8         reserved_at_0[0x8];
 	u8         port_number[0x8];
-- 
2.17.0


* [net-next 5/6] net/mlx5e: Receive buffer configuration
  2018-05-21 21:04 [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19 Saeed Mahameed
                   ` (3 preceding siblings ...)
  2018-05-21 21:05 ` [net-next 4/6] net/mlx5e: PPTB and PBMC register firmware command support Saeed Mahameed
@ 2018-05-21 21:05 ` Saeed Mahameed
  2018-05-21 21:05 ` [net-next 6/6] net/mlx5e: Receive buffer support for DCBX Saeed Mahameed
  2018-05-22 19:38 ` [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19 David Miller
  6 siblings, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-05-21 21:05 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Huy Nguyen, Saeed Mahameed

From: Huy Nguyen <huyn@mellanox.com>

Add APIs for buffer configuration based on changes in the pfc
configuration, cable length, buffer size configuration,
and priority-to-buffer mapping.

Note that the xoff formula is as below:
  xoff = (301 + 2.16 * len [m]) * speed [Gbps] + 2.72 * MTU [B]
  xoff_threshold = buffer_size - xoff
  xon_threshold = xoff_threshold - MTU

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   5 +
 .../mellanox/mlx5/core/en/port_buffer.c       | 327 ++++++++++++++++++
 .../mellanox/mlx5/core/en/port_buffer.h       |  75 ++++
 4 files changed, 408 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 651cf3640420..9efbf193ad5a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -21,7 +21,7 @@ mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
 mlx5_core-$(CONFIG_MLX5_ESWITCH) += eswitch.o eswitch_offloads.o en_rep.o en_tc.o
 
-mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
+mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o en/port_buffer.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_IPOIB) += ipoib/ipoib.o ipoib/ethtool.o ipoib/ipoib_vlan.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index d13a86a1d702..9ab7158a7ce7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -65,6 +65,7 @@ struct page_pool;
 #define MLX5E_HW2SW_MTU(params, hwmtu) ((hwmtu) - ((params)->hard_mtu))
 #define MLX5E_SW2HW_MTU(params, swmtu) ((swmtu) + ((params)->hard_mtu))
 
+#define MLX5E_MAX_PRIORITY      8
 #define MLX5E_MAX_DSCP          64
 #define MLX5E_MAX_NUM_TC	8
 
@@ -275,6 +276,10 @@ struct mlx5e_dcbx {
 	/* The only setting that cannot be read from FW */
 	u8                         tc_tsa[IEEE_8021QAZ_MAX_TCS];
 	u8                         cap;
+
+	/* Buffer configuration */
+	u32                        cable_len;
+	u32                        xoff;
 };
 
 struct mlx5e_dcbx_dp {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
new file mode 100644
index 000000000000..c047da8752da
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
@@ -0,0 +1,327 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include "port_buffer.h"
+
+int mlx5e_port_query_buffer(struct mlx5e_priv *priv,
+			    struct mlx5e_port_buffer *port_buffer)
+{
+	struct mlx5_core_dev *mdev = priv->mdev;
+	int sz = MLX5_ST_SZ_BYTES(pbmc_reg);
+	u32 total_used = 0;
+	void *buffer;
+	void *out;
+	int err;
+	int i;
+
+	out = kzalloc(sz, GFP_KERNEL);
+	if (!out)
+		return -ENOMEM;
+
+	err = mlx5e_port_query_pbmc(mdev, out);
+	if (err)
+		goto out;
+
+	for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+		buffer = MLX5_ADDR_OF(pbmc_reg, out, buffer[i]);
+		port_buffer->buffer[i].lossy =
+			MLX5_GET(bufferx_reg, buffer, lossy);
+		port_buffer->buffer[i].epsb =
+			MLX5_GET(bufferx_reg, buffer, epsb);
+		port_buffer->buffer[i].size =
+			MLX5_GET(bufferx_reg, buffer, size) << MLX5E_BUFFER_CELL_SHIFT;
+		port_buffer->buffer[i].xon =
+			MLX5_GET(bufferx_reg, buffer, xon_threshold) << MLX5E_BUFFER_CELL_SHIFT;
+		port_buffer->buffer[i].xoff =
+			MLX5_GET(bufferx_reg, buffer, xoff_threshold) << MLX5E_BUFFER_CELL_SHIFT;
+		total_used += port_buffer->buffer[i].size;
+
+		mlx5e_dbg(HW, priv, "buffer %d: size=%d, xon=%d, xoff=%d, epsb=%d, lossy=%d\n", i,
+			  port_buffer->buffer[i].size,
+			  port_buffer->buffer[i].xon,
+			  port_buffer->buffer[i].xoff,
+			  port_buffer->buffer[i].epsb,
+			  port_buffer->buffer[i].lossy);
+	}
+
+	port_buffer->port_buffer_size =
+		MLX5_GET(pbmc_reg, out, port_buffer_size) << MLX5E_BUFFER_CELL_SHIFT;
+	port_buffer->spare_buffer_size =
+		port_buffer->port_buffer_size - total_used;
+
+	mlx5e_dbg(HW, priv, "total buffer size=%d, spare buffer size=%d\n",
+		  port_buffer->port_buffer_size,
+		  port_buffer->spare_buffer_size);
+out:
+	kfree(out);
+	return err;
+}
+
+static int port_set_buffer(struct mlx5e_priv *priv,
+			   struct mlx5e_port_buffer *port_buffer)
+{
+	struct mlx5_core_dev *mdev = priv->mdev;
+	int sz = MLX5_ST_SZ_BYTES(pbmc_reg);
+	void *buffer;
+	void *in;
+	int err;
+	int i;
+
+	in = kzalloc(sz, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	err = mlx5e_port_query_pbmc(mdev, in);
+	if (err)
+		goto out;
+
+	for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+		buffer = MLX5_ADDR_OF(pbmc_reg, in, buffer[i]);
+
+		MLX5_SET(bufferx_reg, buffer, size,
+			 port_buffer->buffer[i].size >> MLX5E_BUFFER_CELL_SHIFT);
+		MLX5_SET(bufferx_reg, buffer, lossy,
+			 port_buffer->buffer[i].lossy);
+		MLX5_SET(bufferx_reg, buffer, xoff_threshold,
+			 port_buffer->buffer[i].xoff >> MLX5E_BUFFER_CELL_SHIFT);
+		MLX5_SET(bufferx_reg, buffer, xon_threshold,
+			 port_buffer->buffer[i].xon >> MLX5E_BUFFER_CELL_SHIFT);
+	}
+
+	err = mlx5e_port_set_pbmc(mdev, in);
+out:
+	kfree(in);
+	return err;
+}
+
+/* xoff = ((301+2.16 * len [m]) * speed [Gbps] + 2.72 MTU [B]) */
+static u32 calculate_xoff(struct mlx5e_priv *priv, unsigned int mtu)
+{
+	u32 speed;
+	u32 xoff;
+	int err;
+
+	err = mlx5e_port_linkspeed(priv->mdev, &speed);
+	if (err)
+		return 0;
+
+	xoff = (301 + 216 * priv->dcbx.cable_len / 100) * speed / 1000 + 272 * mtu / 100;
+
+	mlx5e_dbg(HW, priv, "%s: xoff=%d\n", __func__, xoff);
+	return xoff;
+}
+
+static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer,
+				 u32 xoff, unsigned int mtu)
+{
+	int i;
+
+	for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+		if (port_buffer->buffer[i].lossy) {
+			port_buffer->buffer[i].xoff = 0;
+			port_buffer->buffer[i].xon  = 0;
+			continue;
+		}
+
+		if (port_buffer->buffer[i].size <
+		    (xoff + mtu + (1 << MLX5E_BUFFER_CELL_SHIFT)))
+			return -ENOMEM;
+
+		port_buffer->buffer[i].xoff = port_buffer->buffer[i].size - xoff;
+		port_buffer->buffer[i].xon  = port_buffer->buffer[i].xoff - mtu;
+	}
+
+	return 0;
+}
+
+/**
+ * update_buffer_lossy()
+ *   mtu: device's MTU
+ *   pfc_en: <input> current pfc configuration
+ *   buffer: <input> current prio to buffer mapping
+ *   xoff:   <input> xoff value
+ *   port_buffer: <output> port receive buffer configuration
+ *   change: <output>
+ *
+ *   Update buffer configuration based on pfc configuration and priority
+ *   to buffer mapping.
+ *   Buffer's lossy bit is changed to:
+ *     lossless if there is at least one PFC enabled priority mapped to this buffer
+ *     lossy if all priorities mapped to this buffer are PFC disabled
+ *
+ *   Return:
+ *     Return 0 if no error.
+ *     Set change to true if buffer configuration is modified.
+ */
+static int update_buffer_lossy(unsigned int mtu,
+			       u8 pfc_en, u8 *buffer, u32 xoff,
+			       struct mlx5e_port_buffer *port_buffer,
+			       bool *change)
+{
+	bool changed = false;
+	u8 lossy_count;
+	u8 prio_count;
+	u8 lossy;
+	int prio;
+	int err;
+	int i;
+
+	for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+		prio_count = 0;
+		lossy_count = 0;
+
+		for (prio = 0; prio < MLX5E_MAX_PRIORITY; prio++) {
+			if (buffer[prio] != i)
+				continue;
+
+			prio_count++;
+			lossy_count += !(pfc_en & (1 << prio));
+		}
+
+		if (lossy_count == prio_count)
+			lossy = 1;
+		else /* lossy_count < prio_count */
+			lossy = 0;
+
+		if (lossy != port_buffer->buffer[i].lossy) {
+			port_buffer->buffer[i].lossy = lossy;
+			changed = true;
+		}
+	}
+
+	if (changed) {
+		err = update_xoff_threshold(port_buffer, xoff, mtu);
+		if (err)
+			return err;
+
+		*change = true;
+	}
+
+	return 0;
+}
+
+int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
+				    u32 change, unsigned int mtu,
+				    struct ieee_pfc *pfc,
+				    u32 *buffer_size,
+				    u8 *prio2buffer)
+{
+	struct mlx5e_port_buffer port_buffer;
+	u32 xoff = calculate_xoff(priv, mtu);
+	bool update_prio2buffer = false;
+	u8 buffer[MLX5E_MAX_PRIORITY];
+	bool update_buffer = false;
+	u32 total_used = 0;
+	u8 curr_pfc_en;
+	int err;
+	int i;
+
+	mlx5e_dbg(HW, priv, "%s: change=%x\n", __func__, change);
+
+	err = mlx5e_port_query_buffer(priv, &port_buffer);
+	if (err)
+		return err;
+
+	if (change & MLX5E_PORT_BUFFER_CABLE_LEN) {
+		update_buffer = true;
+		err = update_xoff_threshold(&port_buffer, xoff, mtu);
+		if (err)
+			return err;
+	}
+
+	if (change & MLX5E_PORT_BUFFER_PFC) {
+		err = mlx5e_port_query_priority2buffer(priv->mdev, buffer);
+		if (err)
+			return err;
+
+		err = update_buffer_lossy(mtu, pfc->pfc_en, buffer, xoff,
+					  &port_buffer, &update_buffer);
+		if (err)
+			return err;
+	}
+
+	if (change & MLX5E_PORT_BUFFER_PRIO2BUFFER) {
+		update_prio2buffer = true;
+		err = mlx5_query_port_pfc(priv->mdev, &curr_pfc_en, NULL);
+		if (err)
+			return err;
+
+		err = update_buffer_lossy(mtu, curr_pfc_en, prio2buffer, xoff,
+					  &port_buffer, &update_buffer);
+		if (err)
+			return err;
+	}
+
+	if (change & MLX5E_PORT_BUFFER_SIZE) {
+		for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+			mlx5e_dbg(HW, priv, "%s: buffer[%d]=%d\n", __func__, i, buffer_size[i]);
+			if (!port_buffer.buffer[i].lossy && !buffer_size[i]) {
+				mlx5e_dbg(HW, priv, "%s: lossless buffer[%d] size cannot be zero\n",
+					  __func__, i);
+				return -EINVAL;
+			}
+
+			port_buffer.buffer[i].size = buffer_size[i];
+			total_used += buffer_size[i];
+		}
+
+		mlx5e_dbg(HW, priv, "%s: total buffer requested=%d\n", __func__, total_used);
+
+		if (total_used > port_buffer.port_buffer_size)
+			return -EINVAL;
+
+		update_buffer = true;
+		err = update_xoff_threshold(&port_buffer, xoff, mtu);
+		if (err)
+			return err;
+	}
+
+	/* Need to update buffer configuration if xoff value is changed */
+	if (!update_buffer && xoff != priv->dcbx.xoff) {
+		update_buffer = true;
+		err = update_xoff_threshold(&port_buffer, xoff, mtu);
+		if (err)
+			return err;
+	}
+	priv->dcbx.xoff = xoff;
+
+	/* Apply the settings */
+	if (update_buffer) {
+		err = port_set_buffer(priv, &port_buffer);
+		if (err)
+			return err;
+	}
+
+	if (update_prio2buffer)
+		err = mlx5e_port_set_priority2buffer(priv->mdev, prio2buffer);
+
+	return err;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h
new file mode 100644
index 000000000000..34f55b81a0de
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h
@@ -0,0 +1,75 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef __MLX5_EN_PORT_BUFFER_H__
+#define __MLX5_EN_PORT_BUFFER_H__
+
+#include "en.h"
+#include "port.h"
+
+#define MLX5E_MAX_BUFFER 8
+#define MLX5E_BUFFER_CELL_SHIFT 7
+#define MLX5E_DEFAULT_CABLE_LEN 7 /* 7 meters */
+
+#define MLX5_BUFFER_SUPPORTED(mdev) (MLX5_CAP_GEN(mdev, pcam_reg) && \
+				     MLX5_CAP_PCAM_REG(mdev, pbmc) && \
+				     MLX5_CAP_PCAM_REG(mdev, pptb))
+
+enum {
+	MLX5E_PORT_BUFFER_CABLE_LEN   = BIT(0),
+	MLX5E_PORT_BUFFER_PFC         = BIT(1),
+	MLX5E_PORT_BUFFER_PRIO2BUFFER = BIT(2),
+	MLX5E_PORT_BUFFER_SIZE        = BIT(3),
+};
+
+struct mlx5e_bufferx_reg {
+	u8   lossy;
+	u8   epsb;
+	u32  size;
+	u32  xoff;
+	u32  xon;
+};
+
+struct mlx5e_port_buffer {
+	u32                       port_buffer_size;
+	u32                       spare_buffer_size;
+	struct mlx5e_bufferx_reg  buffer[MLX5E_MAX_BUFFER];
+};
+
+int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
+				    u32 change, unsigned int mtu,
+				    struct ieee_pfc *pfc,
+				    u32 *buffer_size,
+				    u8 *prio2buffer);
+
+int mlx5e_port_query_buffer(struct mlx5e_priv *priv,
+			    struct mlx5e_port_buffer *port_buffer);
+#endif
-- 
2.17.0


* [net-next 6/6] net/mlx5e: Receive buffer support for DCBX
  2018-05-21 21:04 [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19 Saeed Mahameed
                   ` (4 preceding siblings ...)
  2018-05-21 21:05 ` [net-next 5/6] net/mlx5e: Receive buffer configuration Saeed Mahameed
@ 2018-05-21 21:05 ` Saeed Mahameed
  2018-05-22 19:38 ` [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19 David Miller
  6 siblings, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-05-21 21:05 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Huy Nguyen, Saeed Mahameed

From: Huy Nguyen <huyn@mellanox.com>

Add dcbnl set/get buffer configuration callbacks that allow the user to
set/get the buffer size configuration and the priority-to-buffer
mapping.

By default, firmware controls the receive buffer configuration and the
priority-to-buffer mapping based on changes in the pfc settings. Once
the set buffer callback is triggered, the buffer configuration changes
to manual mode, in which the mlx5 driver itself adjusts the buffer
configuration in response to changes in the pfc settings.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   1 +
 .../ethernet/mellanox/mlx5/core/en_dcbnl.c    | 131 +++++++++++++++++-
 2 files changed, 125 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 9ab7158a7ce7..c5c7a6d687ff 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -278,6 +278,7 @@ struct mlx5e_dcbx {
 	u8                         cap;
 
 	/* Buffer configuration */
+	bool                       manual_buffer;
 	u32                        cable_len;
 	u32                        xoff;
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index c641d5656b2d..fa6cd8aa077c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -32,8 +32,8 @@
 #include <linux/device.h>
 #include <linux/netdevice.h>
 #include "en.h"
-
-#define MLX5E_MAX_PRIORITY 8
+#include "en/port.h"
+#include "en/port_buffer.h"
 
 #define MLX5E_100MB (100000)
 #define MLX5E_1GB   (1000000)
@@ -41,6 +41,9 @@
 #define MLX5E_CEE_STATE_UP    1
 #define MLX5E_CEE_STATE_DOWN  0
 
+/* Max supported cable length is 1000 meters */
+#define MLX5E_MAX_CABLE_LENGTH 1000
+
 enum {
 	MLX5E_VENDOR_TC_GROUP_NUM = 7,
 	MLX5E_LOWEST_PRIO_GROUP   = 0,
@@ -338,6 +341,9 @@ static int mlx5e_dcbnl_ieee_getpfc(struct net_device *dev,
 		pfc->indications[i] = PPORT_PER_PRIO_GET(pstats, i, rx_pause);
 	}
 
+	if (MLX5_BUFFER_SUPPORTED(mdev))
+		pfc->delay = priv->dcbx.cable_len;
+
 	return mlx5_query_port_pfc(mdev, &pfc->pfc_en, NULL);
 }
 
@@ -346,16 +352,39 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
 	struct mlx5_core_dev *mdev = priv->mdev;
+	u32 old_cable_len = priv->dcbx.cable_len;
+	struct ieee_pfc pfc_new;
+	u32 changed = 0;
 	u8 curr_pfc_en;
-	int ret;
+	int ret = 0;
 
+	/* pfc_en */
 	mlx5_query_port_pfc(mdev, &curr_pfc_en, NULL);
+	if (pfc->pfc_en != curr_pfc_en) {
+		ret = mlx5_set_port_pfc(mdev, pfc->pfc_en, pfc->pfc_en);
+		if (ret)
+			return ret;
+		mlx5_toggle_port_link(mdev);
+		changed |= MLX5E_PORT_BUFFER_PFC;
+	}
 
-	if (pfc->pfc_en == curr_pfc_en)
-		return 0;
+	if (pfc->delay &&
+	    pfc->delay < MLX5E_MAX_CABLE_LENGTH &&
+	    pfc->delay != priv->dcbx.cable_len) {
+		priv->dcbx.cable_len = pfc->delay;
+		changed |= MLX5E_PORT_BUFFER_CABLE_LEN;
+	}
 
-	ret = mlx5_set_port_pfc(mdev, pfc->pfc_en, pfc->pfc_en);
-	mlx5_toggle_port_link(mdev);
+	if (MLX5_BUFFER_SUPPORTED(mdev)) {
+		pfc_new.pfc_en = (changed & MLX5E_PORT_BUFFER_PFC) ? pfc->pfc_en : curr_pfc_en;
+		if (priv->dcbx.manual_buffer)
+			ret = mlx5e_port_manual_buffer_config(priv, changed,
+							      dev->mtu, &pfc_new,
+							      NULL, NULL);
+
+		if (ret && (changed & MLX5E_PORT_BUFFER_CABLE_LEN))
+			priv->dcbx.cable_len = old_cable_len;
+	}
 
 	if (!ret) {
 		mlx5e_dbg(HW, priv,
@@ -873,6 +902,89 @@ static void mlx5e_dcbnl_setpfcstate(struct net_device *netdev, u8 state)
 	cee_cfg->pfc_enable = state;
 }
 
+static int mlx5e_dcbnl_getbuffer(struct net_device *dev,
+				 struct dcbnl_buffer *dcb_buffer)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5_core_dev *mdev = priv->mdev;
+	struct mlx5e_port_buffer port_buffer;
+	u8 buffer[MLX5E_MAX_PRIORITY];
+	int i, err;
+
+	if (!MLX5_BUFFER_SUPPORTED(mdev))
+		return -EOPNOTSUPP;
+
+	err = mlx5e_port_query_priority2buffer(mdev, buffer);
+	if (err)
+		return err;
+
+	for (i = 0; i < MLX5E_MAX_PRIORITY; i++)
+		dcb_buffer->prio2buffer[i] = buffer[i];
+
+	err = mlx5e_port_query_buffer(priv, &port_buffer);
+	if (err)
+		return err;
+
+	for (i = 0; i < MLX5E_MAX_BUFFER; i++)
+		dcb_buffer->buffer_size[i] = port_buffer.buffer[i].size;
+
+	return 0;
+}
+
+static int mlx5e_dcbnl_setbuffer(struct net_device *dev,
+				 struct dcbnl_buffer *dcb_buffer)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5_core_dev *mdev = priv->mdev;
+	struct mlx5e_port_buffer port_buffer;
+	u8 old_prio2buffer[MLX5E_MAX_PRIORITY];
+	u32 *buffer_size = NULL;
+	u8 *prio2buffer = NULL;
+	u32 changed = 0;
+	int i, err;
+
+	if (!MLX5_BUFFER_SUPPORTED(mdev))
+		return -EOPNOTSUPP;
+
+	for (i = 0; i < DCBX_MAX_BUFFERS; i++)
+		mlx5_core_dbg(mdev, "buffer[%d]=%d\n", i, dcb_buffer->buffer_size[i]);
+
+	for (i = 0; i < MLX5E_MAX_PRIORITY; i++)
+		mlx5_core_dbg(mdev, "priority %d buffer%d\n", i, dcb_buffer->prio2buffer[i]);
+
+	err = mlx5e_port_query_priority2buffer(mdev, old_prio2buffer);
+	if (err)
+		return err;
+
+	for (i = 0; i < MLX5E_MAX_PRIORITY; i++) {
+		if (dcb_buffer->prio2buffer[i] != old_prio2buffer[i]) {
+			changed |= MLX5E_PORT_BUFFER_PRIO2BUFFER;
+			prio2buffer = dcb_buffer->prio2buffer;
+			break;
+		}
+	}
+
+	err = mlx5e_port_query_buffer(priv, &port_buffer);
+	if (err)
+		return err;
+
+	for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+		if (port_buffer.buffer[i].size != dcb_buffer->buffer_size[i]) {
+			changed |= MLX5E_PORT_BUFFER_SIZE;
+			buffer_size = dcb_buffer->buffer_size;
+			break;
+		}
+	}
+
+	if (!changed)
+		return 0;
+
+	priv->dcbx.manual_buffer = true;
+	err = mlx5e_port_manual_buffer_config(priv, changed, dev->mtu, NULL,
+					      buffer_size, prio2buffer);
+	return err;
+}
+
 const struct dcbnl_rtnl_ops mlx5e_dcbnl_ops = {
 	.ieee_getets	= mlx5e_dcbnl_ieee_getets,
 	.ieee_setets	= mlx5e_dcbnl_ieee_setets,
@@ -884,6 +996,8 @@ const struct dcbnl_rtnl_ops mlx5e_dcbnl_ops = {
 	.ieee_delapp    = mlx5e_dcbnl_ieee_delapp,
 	.getdcbx	= mlx5e_dcbnl_getdcbx,
 	.setdcbx	= mlx5e_dcbnl_setdcbx,
+	.dcbnl_getbuffer = mlx5e_dcbnl_getbuffer,
+	.dcbnl_setbuffer = mlx5e_dcbnl_setbuffer,
 
 /* CEE interfaces */
 	.setall         = mlx5e_dcbnl_setall,
@@ -1091,5 +1205,8 @@ void mlx5e_dcbnl_initialize(struct mlx5e_priv *priv)
 	if (priv->dcbx.mode == MLX5E_DCBX_PARAM_VER_OPER_HOST)
 		priv->dcbx.cap |= DCB_CAP_DCBX_HOST;
 
+	priv->dcbx.manual_buffer = false;
+	priv->dcbx.cable_len = MLX5E_DEFAULT_CABLE_LEN;
+
 	mlx5e_ets_init(priv);
 }
-- 
2.17.0


* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-21 21:04 ` [net-next 1/6] net/dcb: Add dcbnl buffer attribute Saeed Mahameed
@ 2018-05-22  5:20   ` Jakub Kicinski
  2018-05-22 15:36     ` Huy Nguyen
  2018-05-23  9:43     ` Jiri Pirko
  2018-05-23 20:19   ` Jakub Kicinski
  1 sibling, 2 replies; 32+ messages in thread
From: Jakub Kicinski @ 2018-05-22  5:20 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev, Huy Nguyen, Jiri Pirko, Or Gerlitz

On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:
> From: Huy Nguyen <huyn@mellanox.com>
> 
> In this patch, we add a dcbnl buffer attribute to allow the user to
> change the NIC's buffer configuration, such as the priority-to-buffer
> mapping and the size of each individual buffer.
> 
> This attribute, combined with the pfc attribute, allows an advanced user
> to fine-tune the QoS settings for a specific priority queue. For example,
> the user can give a dedicated buffer to one or more priorities, or
> assign a larger buffer to certain priorities.
> 
> We present a use case scenario where the dcbnl buffer attribute,
> configured by an advanced user, helps reduce the latency of messages of
> different sizes.
> 
> Scenario description:
> On ConnectX-5, we run latency-sensitive traffic with
> small/medium message sizes ranging from 64B to 256KB, and bandwidth-sensitive
> traffic with large message sizes of 512KB and 1MB. We group the small, medium,
> and large message sizes into their own pfc-enabled priorities as follows:
>   Priorities 1 & 2 (64B, 256B and 1KB)
>   Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
>   Priorities 5 & 6 (512KB and 1MB)
> 
> By default, ConnectX-5 maps all pfc-enabled priorities to a single
> lossless fixed buffer sized at 50% of the total available buffer space. The
> other 50% is assigned to the lossy buffer. Using the dcbnl buffer attribute,
> we create three equal-sized lossless buffers, each with 25% of the total
> available buffer space. Thus, the lossy buffer size is reduced to 25%. The
> priority to lossless buffer mappings are set as follows:
>   Priorities 1 & 2 on lossless buffer #1
>   Priorities 3 & 4 on lossless buffer #2
>   Priorities 5 & 6 on lossless buffer #3
> 
> We observe the following latency improvements for small and medium message
> sizes. Please note that bandwidth for the large message sizes is reduced,
> but the total bandwidth remains the same.
>   256B message size (42% latency reduction)
>   4K message size (21% latency reduction)
>   64K message size (16% latency reduction)
> 
> Signed-off-by: Huy Nguyen <huyn@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

On a cursory look this bears a lot of resemblance to the devlink shared
buffer configuration ABI.  Did you look into using that?

Just to be clear devlink shared buffer ABIs don't require representors
and "switchdev mode".

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 2/6] net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask
  2018-05-21 21:04 ` [net-next 2/6] net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask Saeed Mahameed
@ 2018-05-22 10:19   ` Or Gerlitz
  2018-05-22 10:21     ` Or Gerlitz
  0 siblings, 1 reply; 32+ messages in thread
From: Or Gerlitz @ 2018-05-22 10:19 UTC (permalink / raw)
  To: Huy Nguyen; +Cc: David S. Miller, Linux Netdev List, Saeed Mahameed

On Tue, May 22, 2018 at 12:04 AM, Saeed Mahameed <saeedm@mellanox.com> wrote:
> From: Huy Nguyen <huyn@mellanox.com>
>
> Add pbmc and pptb in the port_access_reg_cap_mask. These two
> bits determine if device supports receive buffer configuration.
>
> Signed-off-by: Huy Nguyen <huyn@mellanox.com>

Huy, Parav reviewed your code to death (but he's still alive and kicking!),
go ahead and add his Reviewed-by to the entire series.

> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 2/6] net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask
  2018-05-22 10:19   ` Or Gerlitz
@ 2018-05-22 10:21     ` Or Gerlitz
  2018-05-22 16:01       ` Saeed Mahameed
  0 siblings, 1 reply; 32+ messages in thread
From: Or Gerlitz @ 2018-05-22 10:21 UTC (permalink / raw)
  To: Huy Nguyen; +Cc: David S. Miller, Linux Netdev List, Saeed Mahameed

On Tue, May 22, 2018 at 1:19 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Tue, May 22, 2018 at 12:04 AM, Saeed Mahameed <saeedm@mellanox.com> wrote:
>> From: Huy Nguyen <huyn@mellanox.com>
>>
>> Add pbmc and pptb in the port_access_reg_cap_mask. These two
>> bits determine if device supports receive buffer configuration.
>>
>> Signed-off-by: Huy Nguyen <huyn@mellanox.com>
>
> Huy, Parav reviewed your code to death (but he's still alive and kicking!),
> go ahead and add his Reviewed-by to the entire series.

When you fix that, also address checkpatch's warning:

WARNING: Missing or malformed SPDX-License-Identifier tag in line 1

which appears in four places along the series.


>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-22  5:20   ` Jakub Kicinski
@ 2018-05-22 15:36     ` Huy Nguyen
  2018-05-22 18:32       ` Jakub Kicinski
  2018-05-23  9:43     ` Jiri Pirko
  1 sibling, 1 reply; 32+ messages in thread
From: Huy Nguyen @ 2018-05-22 15:36 UTC (permalink / raw)
  To: Jakub Kicinski, Saeed Mahameed
  Cc: David S. Miller, netdev, Jiri Pirko, Or Gerlitz, Parav Pandit

On 5/22/2018 12:20 AM, Jakub Kicinski wrote:
> On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:
>> From: Huy Nguyen <huyn@mellanox.com>
>>
>> [commit message snipped]
> On a cursory look this bares a lot of resemblance to devlink shared
> buffer configuration ABI.  Did you look into using that?
>
> Just to be clear devlink shared buffer ABIs don't require representors
> and "switchdev mode".
> .
[HQN] Dear Jakub, there are several reasons why the devlink shared buffer
ABI cannot be used:
1. The devlink shared buffer ABI is modelled on the switch CLI, which
you can read more about at https://community.mellanox.com/docs/DOC-2558.
2. The dcbnl interfaces have been used for QoS settings. In a NIC, the
buffer configuration is tied to priority (ETS, PFC), not to a port as
in a switch.
3. Shared buffer, alpha, and threshold are switch-specific terms.

Please let me know if you have any further questions.
Regards,
Huy Nguyen

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 2/6] net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask
  2018-05-22 10:21     ` Or Gerlitz
@ 2018-05-22 16:01       ` Saeed Mahameed
  2018-05-24 21:21         ` Or Gerlitz
  0 siblings, 1 reply; 32+ messages in thread
From: Saeed Mahameed @ 2018-05-22 16:01 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Huy Nguyen, David S. Miller, Linux Netdev List, Saeed Mahameed

On Tue, May 22, 2018 at 3:21 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Tue, May 22, 2018 at 1:19 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Tue, May 22, 2018 at 12:04 AM, Saeed Mahameed <saeedm@mellanox.com> wrote:
>>> From: Huy Nguyen <huyn@mellanox.com>
>>>
>>> Add pbmc and pptb in the port_access_reg_cap_mask. These two
>>> bits determine if device supports receive buffer configuration.
>>>
>>> Signed-off-by: Huy Nguyen <huyn@mellanox.com>
>>
>> Huy, Parav reviewed your code to death (but he's still alive and kicking!),
>> go ahead and add his Reviewed-by to the entire series.
>
> when you fix that, also address checkpatch's scream on
>
> WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
>
> in four cases along the series
>

We are going to do this once for all mlx5 files soon; I don't want to
have two types of license headers in the meanwhile.
Let's keep this as-is until then.

>
>>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-22 15:36     ` Huy Nguyen
@ 2018-05-22 18:32       ` Jakub Kicinski
  2018-05-23  1:01         ` Huy Nguyen
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2018-05-22 18:32 UTC (permalink / raw)
  To: Huy Nguyen
  Cc: Saeed Mahameed, David S. Miller, netdev, Jiri Pirko, Or Gerlitz,
	Parav Pandit

On Tue, 22 May 2018 10:36:17 -0500, Huy Nguyen wrote:
> On 5/22/2018 12:20 AM, Jakub Kicinski wrote:
> > On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:  
> >> From: Huy Nguyen <huyn@mellanox.com>
> >>
> >> [commit message snipped]
> > On a cursory look this bares a lot of resemblance to devlink shared
> > buffer configuration ABI.  Did you look into using that?
> >
> > Just to be clear devlink shared buffer ABIs don't require representors
> > and "switchdev mode".
> > .  
> [HQN] Dear Jakub, there are several reasons that devlink shared buffer 
> ABI cannot be used:
> 1. The devlink shared buffer ABI is written based on the switch cli 
> which you can find out more
> from this link https://community.mellanox.com/docs/DOC-2558.

The devlink API accommodates the requirements of both simpler (SwitchX2?)
and more advanced schemes (present in Spectrum).  The simpler/basic static
threshold configuration is exactly what you are doing here, AFAIU.

> 2. The dcbnl interfaces have been used for QoS settings.

QoS settings != shared buffer configuration.

> In NIC, the  buffer configuration are tied to priority (ETS PFC).

Some customers use DCB, a lot (most?) of them don't.  I don't think the
"this is a logical extension of a commonly used API" really stands here.

> The buffer configuration are not tied to port like switch.

It's tied to a port and TCs, you just have one port but still have 8
TCs exactly like a switch...

> 3. Shared buffer, alpha, threshold are switch specific terms.

IDK how talking about alpha is relevant, it's just one threshold type
the API supports.  As far as shared buffer and threshold I don't know
if these are switch terms (or how "switch" differs from "NIC" at that
level) - I personally find carving shared buffer into pools very
intuitive.

Could you give examples of commands/configs one can use with your new
ABI?  How does one query the total size of the buffer to be carved?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19
  2018-05-21 21:04 [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19 Saeed Mahameed
                   ` (5 preceding siblings ...)
  2018-05-21 21:05 ` [net-next 6/6] net/mlx5e: Receive buffer support for DCBX Saeed Mahameed
@ 2018-05-22 19:38 ` David Miller
  6 siblings, 0 replies; 32+ messages in thread
From: David Miller @ 2018-05-22 19:38 UTC (permalink / raw)
  To: saeedm; +Cc: netdev

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Mon, 21 May 2018 14:04:56 -0700

> This is a mlx5e only pull request, for more information please see tag
> log below.
> 
> Please pull and let me know if there's any problem.

The dcbnl vs. devlink shared buffer API issue needs to be discussed more
thoroughly.

Even if no changes happen to the code in the end, the results of the
discussion and example configurations for the new mechanism need to
be added to the commit message.  At a minimum.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-22 18:32       ` Jakub Kicinski
@ 2018-05-23  1:01         ` Huy Nguyen
  2018-05-23  6:15           ` Or Gerlitz
  2018-05-23  9:23           ` Jakub Kicinski
  0 siblings, 2 replies; 32+ messages in thread
From: Huy Nguyen @ 2018-05-23  1:01 UTC (permalink / raw)
  To: Jakub Kicinski, Saeed Mahameed, David S. Miller, netdev,
	Jiri Pirko, Or Gerlitz, Parav Pandit

Dear Jakub, PSB.

On 5/22/2018 1:32 PM, Jakub Kicinski wrote:
> On Tue, 22 May 2018 10:36:17 -0500, Huy Nguyen wrote:
>> On 5/22/2018 12:20 AM, Jakub Kicinski wrote:
>>> On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:
>>>> From: Huy Nguyen <huyn@mellanox.com>
>>>>
>>>> [commit message snipped]
>>> On a cursory look this bares a lot of resemblance to devlink shared
>>> buffer configuration ABI.  Did you look into using that?
>>>
>>> Just to be clear devlink shared buffer ABIs don't require representors
>>> and "switchdev mode".
>>> .
>> [HQN] Dear Jakub, there are several reasons that devlink shared buffer
>> ABI cannot be used:
>> 1. The devlink shared buffer ABI is written based on the switch cli
>> which you can find out more
>> from this link https://community.mellanox.com/docs/DOC-2558.
> Devlink API accommodates requirements of simpler (SwitchX2?) and more
> advanced schemes (present in Spectrum).  The simpler/basic static
> threshold configurations is exactly what you are doing here, AFAIU.
[HQN] The devlink API is tailored specifically for switches. We don't
configure thresholds explicitly; it is done via PFC. Once PFC is enabled
on a priority, the threshold is set up based on a proprietary formula
that was tested rigorously for performance.
>> 2. The dcbnl interfaces have been used for QoS settings.
> QoS settings != shared buffer configuration.
[HQN] I think we have different definitions of "shared buffer". Please
refer to the switch CLI link below; it explains in detail what "shared
buffer" means in a switch. Our NIC does not support a "shared buffer".
https://community.mellanox.com/docs/DOC-2591

>
>> In NIC, the  buffer configuration are tied to priority (ETS PFC).
> Some customers use DCB, a lot (most?) of them don't.  I don't think the
> "this is a logical extension of a commonly used API" really stands here.
[HQN] DCBNL is being actively used. The whole point of this patch
is to tie the buffer configuration to the IEEE priority and the IEEE PFC
configuration.

An ambitious future goal is to have the switch configure the NIC's buffer
size and buffer mapping via TLV packets and this DCBNL interface. But we
won't go that far here.
>
>> The buffer configuration are not tied to port like switch.
> It's tied to a port and TCs, you just have one port but still have 8
> TCs exactly like a switch...
[HQN] No. Our buffers are tied to priorities, not to TCs.
>> 3. Shared buffer, alpha, threshold are switch specific terms.
> IDK how talking about alpha is relevant, it's just one threshold type
> the API supports.  As far as shared buffer and threshold I don't know
> if these are switch terms (or how "switch" differs from "NIC" at that
> level) - I personally find carving shared buffer into pools very
> intuitive.
[HQN] Yes, I understand your point too. The NIC's buffer shares some
characteristics with the switch's buffer settings. But this DCB buffer
setting is meant to improve performance and to work together with the
PFC setting. We would like to keep all the QoS settings under DCB
netlink, as they are designed to be this way.

>
> Could you give examples of commands/configs one can use with your new
> ABI?
[HQN] The plan is to add the support to lldptool once the kernel code is
accepted. To test the kernel code, I am using small Python scripts that
work on top of the netlink library. The format will be similar to other
options in lldptool:
     priority2buffer: 0,2,5,7,1,2,3,6 maps priorities 0,1,2,3,4,5,6,7 to
buffers 0,2,5,7,1,2,3,6
     buffer_size: 87296,87296,0,87296,0,0,0,0 sets the receive buffer size
for buffers 0,1,2,3,4,5,6,7 respectively
>    How does one query the total size of the buffer to be carved?
[HQN] This is not necessary. If the total size is too big, an error will
be returned via the DCB netlink interface.
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-23  1:01         ` Huy Nguyen
@ 2018-05-23  6:15           ` Or Gerlitz
  2018-05-23  9:23           ` Jakub Kicinski
  1 sibling, 0 replies; 32+ messages in thread
From: Or Gerlitz @ 2018-05-23  6:15 UTC (permalink / raw)
  To: Huy Nguyen; +Cc: Linux Netdev List

On Wed, May 23, 2018 at 4:01 AM, Huy Nguyen <huyn@mellanox.com> wrote:
> Dear Jakub, PSB.
> On 5/22/2018 1:32 PM, Jakub Kicinski wrote:

>> Devlink API accommodates requirements of simpler (SwitchX2?) and more
>> advanced schemes (present in Spectrum).  The simpler/basic static
>> threshold configurations is exactly what you are doing here, AFAIU.

> [HQN] Devlink API is tailored specifically for switch. We don't configure
> threshold configuration
> explicitly. It is done via PFC. Once PFC is enabled on priority, threshold
> is setup based on our
> proprietary formula that were tested rigorously for performance.

Huy, please do not prefix your reply lines with your name; it's not needed
and is confusing. The email clients used by people on this list do that job.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-23  1:01         ` Huy Nguyen
  2018-05-23  6:15           ` Or Gerlitz
@ 2018-05-23  9:23           ` Jakub Kicinski
  2018-05-23  9:33             ` Jiri Pirko
                               ` (3 more replies)
  1 sibling, 4 replies; 32+ messages in thread
From: Jakub Kicinski @ 2018-05-23  9:23 UTC (permalink / raw)
  To: Huy Nguyen
  Cc: Saeed Mahameed, David S. Miller, netdev, Jiri Pirko, Or Gerlitz,
	Parav Pandit, Ido Schimmel

On Tue, 22 May 2018 20:01:21 -0500, Huy Nguyen wrote:
> On 5/22/2018 1:32 PM, Jakub Kicinski wrote:
> > On Tue, 22 May 2018 10:36:17 -0500, Huy Nguyen wrote:  
> >> On 5/22/2018 12:20 AM, Jakub Kicinski wrote:  
> >>> On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:  
> >>>> From: Huy Nguyen <huyn@mellanox.com>
> >>>>
> >>>> [commit message snipped]
> >>> On a cursory look this bares a lot of resemblance to devlink shared
> >>> buffer configuration ABI.  Did you look into using that?
> >>>
> >>> Just to be clear devlink shared buffer ABIs don't require representors
> >>> and "switchdev mode".
> >>> .  
> >> [HQN] Dear Jakub, there are several reasons that devlink shared buffer
> >> ABI cannot be used:
> >> 1. The devlink shared buffer ABI is written based on the switch cli
> >> which you can find out more
> >> from this link https://community.mellanox.com/docs/DOC-2558.  
> > Devlink API accommodates requirements of simpler (SwitchX2?) and more
> > advanced schemes (present in Spectrum).  The simpler/basic static
> > threshold configurations is exactly what you are doing here, AFAIU.  
> [HQN] Devlink API is tailored specifically for switch.

I hope that is not true, since we (Netronome) are trying to use it for
NIC configuration, too.  We should generalize the API if need be.

> We don't configure threshold configuration explicitly. It is done via
> PFC. Once PFC is enabled on priority, threshold is setup based on our
> proprietary formula that were tested rigorously for performance.

Are you referring to XOFF/XON thresholds?  I don't think the "threshold
type" in devlink API implies we are setting XON/XOFF thresholds
directly :S  If PFC is enabled we may be setting them indirectly,
obviously.

My understanding is that for static threshold type the size parameter
specifies the max amount of memory given pool can consume.

> >> 2. The dcbnl interfaces have been used for QoS settings.  
> > QoS settings != shared buffer configuration.  
> [HQN] I think we have different definition about "shared buffer".
> Please refer to this below switch cli link.
> It explained in detail what is the "shared buffer" in switch means.
> Our NIC does not have "shared buffer" supported.
> https://community.mellanox.com/docs/DOC-2591

Yes, we must have different definitions of "shared buffer" :)  That
link, however, didn't clarify much for me...  In mlx5 you seem to have a
buffer which is shared between priorities, even if it's not what would
be referred to as shared buffer in switch context.

> >> In NIC, the  buffer configuration are tied to priority (ETS PFC).  
> > Some customers use DCB, a lot (most?) of them don't.  I don't think
> > the "this is a logical extension of a commonly used API" really
> > stands here.  
> [HQN] DCBNL are being actively used. The whole point of this patch
> is to tie buffer configuration with IEEE's priority and is IEEE's PFC 
> configuration.
>
> Ambitious future is to have the switch configure the NIC's buffer
> size and buffer mapping
> via TLV packet and this DCBNL interface. But we won't go too far here.

I think I can understand the motivation, and I think it's a nice thing
to expose!  The only questions are: does it really belong to DCBNL and
can existing API be used?
 
From the patch description it seems like your default setup is a shared
buffer split 50% (lossy)/50% (all prios), and the example you give
changes that to 25% (lossy)/25%x3 prio groups.

With existing devlink API could this be modelled by three ingress pools
with 2 TCs bound each?

> >> The buffer configuration are not tied to port like switch.  
> > It's tied to a port and TCs, you just have one port but still have 8
> > TCs exactly like a switch...  
> [HQN] No. Our buffer ties to priority not to TCs.

Right, that is a valid point.  Although TCs can be mapped to
priorities.  Some switches may tie buffers to priorities, too.  So
perhaps it's worth extending devlink?

> >> 3. Shared buffer, alpha, threshold are switch specific terms.  
> > IDK how talking about alpha is relevant, it's just one threshold
> > type the API supports.  As far as shared buffer and threshold I
> > don't know if these are switch terms (or how "switch" differs from
> > "NIC" at that level) - I personally find carving shared buffer into
> > pools very intuitive.  
> [HQN] Yes, I understand your point too. The NIC's buffer shares some 
> characteristics with the switch's buffer settings. 

Yes, and if it's not a perfect match we can extend it.

> But this DCB buffer setting is to improve the performance and work
> together with the PFC setting. We would like to keep all the qos
> setting under DCB Netlink as they are designed to be this way.

DCBNL seems to carry standards-based information, which this is not.
mlxsw supports DCBNL, will it also support this buffer configuration
mechanism?

> > Could you give examples of commands/configs one can use with your
> > new ABI?  
> [HQN] The plan is to add the support in lldptool once the kernel code
> is accepted. To test the kernel code,
> I am using small python scripts that works on top of the netlink
> library. It will be like this format which is similar to other
> options in lldptool priority2buffer: 0,2,5,7,1,2,3,6 maps priorities
> 0,1,2,3,4,5,6,7 to buffer 0,2,5,7,1,2,3,6
>      buffer_size: 87296,87296,0,87296,0,0,0,0 set receive buffer size 
> for buffer 0,1,2,3,4,5,6,7 respectively
> >    How does one query the total size of the buffer to be carved?  
> [HQN] This is not necessary. If the total size is too big, error will
> be return via DCB netlink interface.

Right, I'm not saying it's a bug :)  It's just nice when the user can be
told the total size without having to probe for it :)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-23  9:23           ` Jakub Kicinski
@ 2018-05-23  9:33             ` Jiri Pirko
  2018-05-23 15:08             ` Huy Nguyen
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 32+ messages in thread
From: Jiri Pirko @ 2018-05-23  9:33 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Huy Nguyen, Saeed Mahameed, David S. Miller, netdev, Or Gerlitz,
	Parav Pandit, Ido Schimmel

Wed, May 23, 2018 at 11:23:14AM CEST, jakub.kicinski@netronome.com wrote:
>On Tue, 22 May 2018 20:01:21 -0500, Huy Nguyen wrote:
>> On 5/22/2018 1:32 PM, Jakub Kicinski wrote:
>> > On Tue, 22 May 2018 10:36:17 -0500, Huy Nguyen wrote:  
>> >> On 5/22/2018 12:20 AM, Jakub Kicinski wrote:  
>> >>> On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:  
>> >>>> From: Huy Nguyen <huyn@mellanox.com>
>> >>>>
>> >>>> In this patch, we add a dcbnl buffer attribute to allow the user
>> >>>> to change the NIC's buffer configuration, such as the priority to
>> >>>> buffer mapping and the size of each individual buffer.
>> >>>>
>> >>>> This attribute, combined with the pfc attribute, allows an advanced
>> >>>> user to fine-tune the qos settings for a specific priority queue.
>> >>>> For example, the user can give a dedicated buffer to one or more
>> >>>> priorities, or give a larger buffer to certain priorities.
>> >>>>
>> >>>> We present a use case scenario where the dcbnl buffer attribute,
>> >>>> configured by an advanced user, helps reduce the latency of messages
>> >>>> of different sizes.
>> >>>>
>> >>>> Scenario description:
>> >>>> On ConnectX-5, we run latency-sensitive traffic with small/medium
>> >>>> message sizes ranging from 64B to 256KB and bandwidth-sensitive
>> >>>> traffic with large message sizes of 512KB and 1MB. We group the
>> >>>> small, medium, and large message sizes onto their own pfc-enabled
>> >>>> priorities as follows.
>> >>>>     Priorities 1 & 2 (64B, 256B and 1KB)
>> >>>>     Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
>> >>>>     Priorities 5 & 6 (512KB and 1MB)
>> >>>>
>> >>>> By default, ConnectX-5 maps all pfc-enabled priorities to a single
>> >>>> lossless fixed buffer sized at 50% of the total available buffer
>> >>>> space. The other 50% is assigned to the lossy buffer. Using the
>> >>>> dcbnl buffer attribute, we create three equal-size lossless buffers.
>> >>>> Each buffer has 25% of the total available buffer space. Thus, the
>> >>>> lossy buffer size is reduced to 25%. Priority to lossless buffer
>> >>>> mappings are set as follows.
>> >>>>     Priorities 1 & 2 on lossless buffer #1
>> >>>>     Priorities 3 & 4 on lossless buffer #2
>> >>>>     Priorities 5 & 6 on lossless buffer #3
>> >>>>
>> >>>> We observe latency improvements for small and medium message sizes
>> >>>> as follows. Note that bandwidth for the large message sizes is
>> >>>> reduced, but the total bandwidth remains the same.
>> >>>>     256B message size (42% latency reduction)
>> >>>>     4K message size (21% latency reduction)
>> >>>>     64K message size (16% latency reduction)
>> >>>>
>> >>>> Signed-off-by: Huy Nguyen <huyn@mellanox.com>
>> >>>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>  
>> >>> On a cursory look this bears a lot of resemblance to devlink shared
>> >>> buffer configuration ABI.  Did you look into using that?
>> >>>
>> >>> Just to be clear devlink shared buffer ABIs don't require representors
>> >>> and "switchdev mode".
>> >>> .  
>> >> [HQN] Dear Jakub, there are several reasons that devlink shared buffer
>> >> ABI cannot be used:
>> >> 1. The devlink shared buffer ABI is written based on the switch cli;
>> >> you can find out more from this link:
>> >> https://community.mellanox.com/docs/DOC-2558.  
>> > Devlink API accommodates requirements of simpler (SwitchX2?) and more
>> > advanced schemes (present in Spectrum).  The simpler/basic static
>> > threshold configuration is exactly what you are doing here, AFAIU.  
>> [HQN] The devlink API is tailored specifically for switches.
>
>I hope that is not true, since we (Netronome) are trying to use it for
>NIC configuration, too.  We should generalize the API if need be.

Sure it is not true. I have no clue why anyone thinks so :/


>
>> We don't configure thresholds explicitly. It is done via PFC. Once
>> PFC is enabled on a priority, the threshold is set up based on our
>> proprietary formula, which was tested rigorously for performance.
>
>Are you referring to XOFF/XON thresholds?  I don't think the "threshold
>type" in devlink API implies we are setting XON/XOFF thresholds
>directly :S  If PFC is enabled we may be setting them indirectly,
>obviously.
>
>My understanding is that for static threshold type the size parameter
>specifies the max amount of memory a given pool can consume.
>
>> >> 2. The dcbnl interfaces have been used for QoS settings.  
>> > QoS settings != shared buffer configuration.  
>> [HQN] I think we have different definitions of "shared buffer".
>> Please refer to the switch cli link below.
>> It explains in detail what "shared buffer" means in a switch.
>> Our NIC does not support a "shared buffer".
>> https://community.mellanox.com/docs/DOC-2591
>
>Yes, we must have different definitions of "shared buffer" :)  That
>link, however, didn't clarify much for me...  In mlx5 you seem to have a
>buffer which is shared between priorities, even if it's not what would
>be referred to as shared buffer in switch context.

We introduced "shared buffer" in devlink with a "devlink handle" because
the buffer is shared across the whole ASIC, between multiple
ports/netdevs.


>
>> >> In the NIC, the buffer configuration is tied to priority (ETS PFC).  
>> > Some customers use DCB, a lot (most?) of them don't.  I don't think
>> > the "this is a logical extension of a commonly used API" really
>> > stands here.  
>> [HQN] DCBNL is being actively used. The whole point of this patch
>> is to tie the buffer configuration to IEEE's priority and IEEE's PFC
>> configuration.
>>
>> The ambitious future is to have the switch configure the NIC's buffer
>> size and buffer mapping via a TLV packet and this DCBNL interface.
>> But we won't go too far here.
>
>I think I can understand the motivation, and I think it's a nice thing
>to expose!  The only questions are: does it really belong to DCBNL and
>can existing API be used?
> 
>From the patch description it seems like your default setup is a shared
>buffer split of 50% (lossy)/50% (all prios), and the example you give
>changes that to 25% (lossy)/25%x3 prio groups.
>
>With existing devlink API could this be modelled by three ingress pools
>with 2 TCs bound each?
>
>> >> The buffer configuration is not tied to a port like in a switch.  
>> > It's tied to a port and TCs, you just have one port but still have 8
>> > TCs exactly like a switch...  
>> [HQN] No. Our buffers are tied to priority, not to TCs.
>
>Right, that is a valid point.  Although TCs can be mapped to
>priorities.  Some switches may tie buffers to priorities, too.  So
>perhaps it's worth extending devlink?
>
>> >> 3. Shared buffer, alpha, threshold are switch specific terms.  
>> > IDK how talking about alpha is relevant, it's just one threshold
>> > type the API supports.  As far as shared buffer and threshold I
>> > don't know if these are switch terms (or how "switch" differs from
>> > "NIC" at that level) - I personally find carving shared buffer into
>> > pools very intuitive.  
>> [HQN] Yes, I understand your point too. The NIC's buffer shares some 
>> characteristics with the switch's buffer settings. 
>
>Yes, and if it's not a perfect match we can extend it.
>
>> But this DCB buffer setting is to improve performance and to work
>> together with the PFC setting. We would like to keep all the qos
>> settings under DCB Netlink, as they are designed to be this way.
>
>DCBNL seems to carry standards-based information, which this is not.
>mlxsw supports DCBNL; will it also support this buffer configuration
>mechanism?

Ido can provide you with more accurate info. Basically, in mlxsw we use
dcbnl for the things it can cover and was designed for. And for those
things, the netdev is the handle: the config is specific to the netdev.
On the other hand, the devlink shared buffer is used for the buffer
shared between all netdevs.



>
>> > Could you give examples of commands/configs one can use with your
>> > new ABI?  
>> [HQN] The plan is to add the support in lldptool once the kernel code
>> is accepted. To test the kernel code, I am using small python scripts
>> that work on top of the netlink library. The format will be similar to
>> other options in lldptool:
>>      priority2buffer: 0,2,5,7,1,2,3,6 maps priorities 0,1,2,3,4,5,6,7
>> to buffers 0,2,5,7,1,2,3,6
>>      buffer_size: 87296,87296,0,87296,0,0,0,0 sets the receive buffer
>> size for buffers 0,1,2,3,4,5,6,7 respectively
>> >    How does one query the total size of the buffer to be carved?  
>> [HQN] This is not necessary. If the total size is too big, an error
>> will be returned via the DCB netlink interface.
>
>Right, I'm not saying it's a bug :)  It's just nice when the user can be
>told the total size without having to probe for it :)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-22  5:20   ` Jakub Kicinski
  2018-05-22 15:36     ` Huy Nguyen
@ 2018-05-23  9:43     ` Jiri Pirko
  2018-05-23 13:52       ` John Fastabend
  1 sibling, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2018-05-23  9:43 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, netdev, Huy Nguyen, Or Gerlitz

Tue, May 22, 2018 at 07:20:26AM CEST, jakub.kicinski@netronome.com wrote:
>On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:
>> From: Huy Nguyen <huyn@mellanox.com>
>> 
>> In this patch, we add a dcbnl buffer attribute to allow the user
>> to change the NIC's buffer configuration, such as the priority to
>> buffer mapping and the size of each individual buffer.
>> 
>> This attribute, combined with the pfc attribute, allows an advanced
>> user to fine-tune the qos settings for a specific priority queue.
>> For example, the user can give a dedicated buffer to one or more
>> priorities, or give a larger buffer to certain priorities.
>> 
>> We present a use case scenario where the dcbnl buffer attribute,
>> configured by an advanced user, helps reduce the latency of messages
>> of different sizes.
>> 
>> Scenario description:
>> On ConnectX-5, we run latency-sensitive traffic with small/medium
>> message sizes ranging from 64B to 256KB and bandwidth-sensitive
>> traffic with large message sizes of 512KB and 1MB. We group the
>> small, medium, and large message sizes onto their own pfc-enabled
>> priorities as follows.
>>   Priorities 1 & 2 (64B, 256B and 1KB)
>>   Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
>>   Priorities 5 & 6 (512KB and 1MB)
>> 
>> By default, ConnectX-5 maps all pfc-enabled priorities to a single
>> lossless fixed buffer sized at 50% of the total available buffer
>> space. The other 50% is assigned to the lossy buffer. Using the
>> dcbnl buffer attribute, we create three equal-size lossless buffers.
>> Each buffer has 25% of the total available buffer space. Thus, the
>> lossy buffer size is reduced to 25%. Priority to lossless buffer
>> mappings are set as follows.
>>   Priorities 1 & 2 on lossless buffer #1
>>   Priorities 3 & 4 on lossless buffer #2
>>   Priorities 5 & 6 on lossless buffer #3
>> 
>> We observe latency improvements for small and medium message sizes
>> as follows. Note that bandwidth for the large message sizes is
>> reduced, but the total bandwidth remains the same.
>>   256B message size (42% latency reduction)
>>   4K message size (21% latency reduction)
>>   64K message size (16% latency reduction)
>> 
>> Signed-off-by: Huy Nguyen <huyn@mellanox.com>
>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
>
>On a cursory look this bears a lot of resemblance to devlink shared
>buffer configuration ABI.  Did you look into using that?  
>
>Just to be clear devlink shared buffer ABIs don't require representors
>and "switchdev mode".

If the CX5 buffer they are trying to utilize here is per port and not a
shared one, it would seem ok to me not to have it in "devlink sb".

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-23  9:43     ` Jiri Pirko
@ 2018-05-23 13:52       ` John Fastabend
  2018-05-23 15:37         ` Huy Nguyen
  2018-05-23 20:13         ` Jakub Kicinski
  0 siblings, 2 replies; 32+ messages in thread
From: John Fastabend @ 2018-05-23 13:52 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, netdev, Huy Nguyen, Or Gerlitz

On 05/23/2018 02:43 AM, Jiri Pirko wrote:
> Tue, May 22, 2018 at 07:20:26AM CEST, jakub.kicinski@netronome.com wrote:
>> On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:
>>> From: Huy Nguyen <huyn@mellanox.com>
>>>
>>> In this patch, we add a dcbnl buffer attribute to allow the user
>>> to change the NIC's buffer configuration, such as the priority to
>>> buffer mapping and the size of each individual buffer.
>>>
>>> This attribute, combined with the pfc attribute, allows an advanced
>>> user to fine-tune the qos settings for a specific priority queue.
>>> For example, the user can give a dedicated buffer to one or more
>>> priorities, or give a larger buffer to certain priorities.
>>>
>>> We present a use case scenario where the dcbnl buffer attribute,
>>> configured by an advanced user, helps reduce the latency of messages
>>> of different sizes.
>>>
>>> Scenario description:
>>> On ConnectX-5, we run latency-sensitive traffic with small/medium
>>> message sizes ranging from 64B to 256KB and bandwidth-sensitive
>>> traffic with large message sizes of 512KB and 1MB. We group the
>>> small, medium, and large message sizes onto their own pfc-enabled
>>> priorities as follows.
>>>   Priorities 1 & 2 (64B, 256B and 1KB)
>>>   Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
>>>   Priorities 5 & 6 (512KB and 1MB)
>>>
>>> By default, ConnectX-5 maps all pfc-enabled priorities to a single
>>> lossless fixed buffer sized at 50% of the total available buffer
>>> space. The other 50% is assigned to the lossy buffer. Using the
>>> dcbnl buffer attribute, we create three equal-size lossless buffers.
>>> Each buffer has 25% of the total available buffer space. Thus, the
>>> lossy buffer size is reduced to 25%. Priority to lossless buffer
>>> mappings are set as follows.
>>>   Priorities 1 & 2 on lossless buffer #1
>>>   Priorities 3 & 4 on lossless buffer #2
>>>   Priorities 5 & 6 on lossless buffer #3
>>>
>>> We observe latency improvements for small and medium message sizes
>>> as follows. Note that bandwidth for the large message sizes is
>>> reduced, but the total bandwidth remains the same.
>>>   256B message size (42% latency reduction)
>>>   4K message size (21% latency reduction)
>>>   64K message size (16% latency reduction)
>>>
>>> Signed-off-by: Huy Nguyen <huyn@mellanox.com>
>>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
>>
>> On a cursory look this bears a lot of resemblance to devlink shared
>> buffer configuration ABI.  Did you look into using that?  
>>
>> Just to be clear devlink shared buffer ABIs don't require representors
>> and "switchdev mode".
> 
> If the CX5 buffer they are trying to utilize here is per port and not a
> shared one, it would seem ok to me not to have it in "devlink sb".
> 

+1 I think it's probably reasonable to let devlink manage the global
(device layer) buffers and then have dcbnl partition the buffer up
further per netdev. Notice there is already a partitioning of the
buffers happening when DCB is enabled and/or parameters are changed.
So giving explicit control over this seems OK to me.

It would be nice though if the API gave us some hint on max/min/stride
of allowed values. Could the get API return these along with current
value? Presumably the allowed max size could change with devlink buffer
changes in how the global buffer is divided up as well.

The argument against allowing this API is it doesn't have anything to
do with the 802.1Q standard, but that is fine IMO.

.John

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-23  9:23           ` Jakub Kicinski
  2018-05-23  9:33             ` Jiri Pirko
@ 2018-05-23 15:08             ` Huy Nguyen
  2018-05-23 15:27             ` Huy Nguyen
  2018-05-24 17:13             ` Ido Schimmel
  3 siblings, 0 replies; 32+ messages in thread
From: Huy Nguyen @ 2018-05-23 15:08 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, netdev, Jiri Pirko, Or Gerlitz,
	Parav Pandit, Ido Schimmel

> I hope that is not true, since we (Netronome) are trying to use it for
> NIC configuration, too.  We should generalize the API if need be.
Yes, it is up to your company. devlink is a static tool. DCBNL is
intended to be dynamically configured by the switch. In the real world,
not many people configure the NIC's qos from the host.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-23  9:23           ` Jakub Kicinski
  2018-05-23  9:33             ` Jiri Pirko
  2018-05-23 15:08             ` Huy Nguyen
@ 2018-05-23 15:27             ` Huy Nguyen
  2018-05-24 17:13             ` Ido Schimmel
  3 siblings, 0 replies; 32+ messages in thread
From: Huy Nguyen @ 2018-05-23 15:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, netdev, Jiri Pirko, Or Gerlitz,
	Parav Pandit, Ido Schimmel



On 5/23/2018 4:23 AM, Jakub Kicinski wrote:
> >From the patch description it seems like your default setup is a shared
> buffer split of 50% (lossy)/50% (all prios), and the example you give
> changes that to 25% (lossy)/25%x3 prio groups.
>
> With existing devlink API could this be modelled by three ingress pools
> with 2 TCs bound each?
Yes, it is possible when you map prio to tc. Please be careful with the
prio term in a switch, since a switch has more than 8 prios.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-23 13:52       ` John Fastabend
@ 2018-05-23 15:37         ` Huy Nguyen
  2018-05-23 16:03           ` John Fastabend
  2018-05-23 20:13         ` Jakub Kicinski
  1 sibling, 1 reply; 32+ messages in thread
From: Huy Nguyen @ 2018-05-23 15:37 UTC (permalink / raw)
  To: John Fastabend, Jiri Pirko, Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, netdev, Or Gerlitz



On 5/23/2018 8:52 AM, John Fastabend wrote:
> It would be nice though if the API gave us some hint on max/min/stride
> of allowed values. Could the get API return these along with current
> value? Presumably the allowed max size could change with devlink buffer
> changes in how the global buffer is divided up as well.
Acked. I will add Max. Let's skip min/stride since it is too hardware 
specific.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-23 15:37         ` Huy Nguyen
@ 2018-05-23 16:03           ` John Fastabend
  2018-05-23 20:28             ` Jakub Kicinski
  2018-05-24 14:37             ` Huy Nguyen
  0 siblings, 2 replies; 32+ messages in thread
From: John Fastabend @ 2018-05-23 16:03 UTC (permalink / raw)
  To: Huy Nguyen, Jiri Pirko, Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, netdev, Or Gerlitz

On 05/23/2018 08:37 AM, Huy Nguyen wrote:
> 
> 
> On 5/23/2018 8:52 AM, John Fastabend wrote:
>> It would be nice though if the API gave us some hint on max/min/stride
>> of allowed values. Could the get API return these along with current
>> value? Presumably the allowed max size could change with devlink buffer
>> changes in how the global buffer is divided up as well.
> Acked. I will add Max. Let's skip min/stride since it is too hardware specific.

At minimum then we need to document for driver writers what to do
with a value that falls between strides. Round-up or round-down.

.John

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-23 13:52       ` John Fastabend
  2018-05-23 15:37         ` Huy Nguyen
@ 2018-05-23 20:13         ` Jakub Kicinski
  1 sibling, 0 replies; 32+ messages in thread
From: Jakub Kicinski @ 2018-05-23 20:13 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jiri Pirko, Saeed Mahameed, David S. Miller, netdev, Huy Nguyen,
	Or Gerlitz

On Wed, 23 May 2018 06:52:33 -0700, John Fastabend wrote:
> On 05/23/2018 02:43 AM, Jiri Pirko wrote:
> > Tue, May 22, 2018 at 07:20:26AM CEST, jakub.kicinski@netronome.com wrote:  
> >> On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:  
> >>> From: Huy Nguyen <huyn@mellanox.com>
> >>>
> >>> In this patch, we add a dcbnl buffer attribute to allow the user
> >>> to change the NIC's buffer configuration, such as the priority to
> >>> buffer mapping and the size of each individual buffer.
> >>>
> >>> This attribute, combined with the pfc attribute, allows an advanced
> >>> user to fine-tune the qos settings for a specific priority queue.
> >>> For example, the user can give a dedicated buffer to one or more
> >>> priorities, or give a larger buffer to certain priorities.
> >>>
> >>> We present a use case scenario where the dcbnl buffer attribute,
> >>> configured by an advanced user, helps reduce the latency of messages
> >>> of different sizes.
> >>>
> >>> Scenario description:
> >>> On ConnectX-5, we run latency-sensitive traffic with small/medium
> >>> message sizes ranging from 64B to 256KB and bandwidth-sensitive
> >>> traffic with large message sizes of 512KB and 1MB. We group the
> >>> small, medium, and large message sizes onto their own pfc-enabled
> >>> priorities as follows.
> >>>   Priorities 1 & 2 (64B, 256B and 1KB)
> >>>   Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
> >>>   Priorities 5 & 6 (512KB and 1MB)
> >>>
> >>> By default, ConnectX-5 maps all pfc-enabled priorities to a single
> >>> lossless fixed buffer sized at 50% of the total available buffer
> >>> space. The other 50% is assigned to the lossy buffer. Using the
> >>> dcbnl buffer attribute, we create three equal-size lossless buffers.
> >>> Each buffer has 25% of the total available buffer space. Thus, the
> >>> lossy buffer size is reduced to 25%. Priority to lossless buffer
> >>> mappings are set as follows.
> >>>   Priorities 1 & 2 on lossless buffer #1
> >>>   Priorities 3 & 4 on lossless buffer #2
> >>>   Priorities 5 & 6 on lossless buffer #3
> >>>
> >>> We observe latency improvements for small and medium message sizes
> >>> as follows. Note that bandwidth for the large message sizes is
> >>> reduced, but the total bandwidth remains the same.
> >>>   256B message size (42% latency reduction)
> >>>   4K message size (21% latency reduction)
> >>>   64K message size (16% latency reduction)
> >>>
> >>> Signed-off-by: Huy Nguyen <huyn@mellanox.com>
> >>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>  
> >>
> >> On a cursory look this bears a lot of resemblance to devlink shared
> >> buffer configuration ABI.  Did you look into using that?  
> >>
> >> Just to be clear devlink shared buffer ABIs don't require representors
> >> and "switchdev mode".  
> > 
> > If the CX5 buffer they are trying to utilize here is per port and not a
> > shared one, it would seem ok to me not to have it in "devlink sb".

What I meant is that it may be shared between VFs and PF contexts.  But
if it's purely an ingress per-prio FIFO without any advanced configuration
capabilities, then perhaps this API is a better match.

> +1 I think it's probably reasonable to let devlink manage the global
> (device layer) buffers and then have dcbnl partition the buffer up
> further per netdev. Notice there is already a partitioning of the
> buffers happening when DCB is enabled and/or parameters are changed.
> So giving explicit control over this seems OK to me.

Okay, thanks for the discussion! :)

> It would be nice though if the API gave us some hint on max/min/stride
> of allowed values. Could the get API return these along with current
> value? Presumably the allowed max size could change with devlink
> buffer changes in how the global buffer is divided up as well.
> 
> The argument against allowing this API is it doesn't have anything to
> do with the 802.1Q standard, but that is fine IMO.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-21 21:04 ` [net-next 1/6] net/dcb: Add dcbnl buffer attribute Saeed Mahameed
  2018-05-22  5:20   ` Jakub Kicinski
@ 2018-05-23 20:19   ` Jakub Kicinski
  2018-05-24 14:11     ` Huy Nguyen
  1 sibling, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2018-05-23 20:19 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: David S. Miller, netdev, Huy Nguyen

On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:
> diff --git a/include/uapi/linux/dcbnl.h b/include/uapi/linux/dcbnl.h
> index 2c0c6453c3f4..1ddc0a44c172 100644
> --- a/include/uapi/linux/dcbnl.h
> +++ b/include/uapi/linux/dcbnl.h
> @@ -163,6 +163,15 @@ struct ieee_pfc {
>  	__u64	indications[IEEE_8021QAZ_MAX_TCS];
>  };
>  
> +#define IEEE_8021Q_MAX_PRIORITIES 8
> +#define DCBX_MAX_BUFFERS  8
> +struct dcbnl_buffer {
> +	/* priority to buffer mapping */
> +	__u8    prio2buffer[IEEE_8021Q_MAX_PRIORITIES];
> +	/* buffer size in Bytes */
> +	__u32   buffer_size[DCBX_MAX_BUFFERS];

Could you use IEEE_8021Q_MAX_PRIORITIES to size this array?  The DCBX in
the define name sort of implies this is coming from the standard which
it isn't.

> +};
> +
>  /* CEE DCBX std supported values */
>  #define CEE_DCBX_MAX_PGS	8
>  #define CEE_DCBX_MAX_PRIO	8

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-23 16:03           ` John Fastabend
@ 2018-05-23 20:28             ` Jakub Kicinski
  2018-05-24 14:37             ` Huy Nguyen
  1 sibling, 0 replies; 32+ messages in thread
From: Jakub Kicinski @ 2018-05-23 20:28 UTC (permalink / raw)
  To: John Fastabend
  Cc: Huy Nguyen, Jiri Pirko, Saeed Mahameed, David S. Miller, netdev,
	Or Gerlitz

On Wed, 23 May 2018 09:03:53 -0700, John Fastabend wrote:
> On 05/23/2018 08:37 AM, Huy Nguyen wrote:
> > 
> > 
> > On 5/23/2018 8:52 AM, John Fastabend wrote:  
> >> It would be nice though if the API gave us some hint on max/min/stride
> >> of allowed values. Could the get API return these along with current
> >> value? Presumably the allowed max size could change with devlink buffer
> >> changes in how the global buffer is divided up as well.  
> > Acked. I will add Max. Let's skip min/stride since it is too hardware specific.  
> 
> At minimum then we need to document for driver writers what to do
> with a value that falls between strides. Round-up or round-down.

BTW I feel like stride would be a good addition to devlink-sb, too!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-23 20:19   ` Jakub Kicinski
@ 2018-05-24 14:11     ` Huy Nguyen
  0 siblings, 0 replies; 32+ messages in thread
From: Huy Nguyen @ 2018-05-24 14:11 UTC (permalink / raw)
  To: Jakub Kicinski, Saeed Mahameed; +Cc: David S. Miller, netdev



On 5/23/2018 3:19 PM, Jakub Kicinski wrote:
> On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:
>> diff --git a/include/uapi/linux/dcbnl.h b/include/uapi/linux/dcbnl.h
>> index 2c0c6453c3f4..1ddc0a44c172 100644
>> --- a/include/uapi/linux/dcbnl.h
>> +++ b/include/uapi/linux/dcbnl.h
>> @@ -163,6 +163,15 @@ struct ieee_pfc {
>>   	__u64	indications[IEEE_8021QAZ_MAX_TCS];
>>   };
>>   
>> +#define IEEE_8021Q_MAX_PRIORITIES 8
>> +#define DCBX_MAX_BUFFERS  8
>> +struct dcbnl_buffer {
>> +	/* priority to buffer mapping */
>> +	__u8    prio2buffer[IEEE_8021Q_MAX_PRIORITIES];
>> +	/* buffer size in Bytes */
>> +	__u32   buffer_size[DCBX_MAX_BUFFERS];
> Could you use IEEE_8021Q_MAX_PRIORITIES to size this array?  The DCBX in
> the define name sort of implies this is coming from the standard which
> it isn't.
>
I agree with your comment about the standard. But since priority is
mapped to buffer, I think it is okay to reuse the #define. Let's not
have a duplicate #define with the same meaning.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-23 16:03           ` John Fastabend
  2018-05-23 20:28             ` Jakub Kicinski
@ 2018-05-24 14:37             ` Huy Nguyen
  1 sibling, 0 replies; 32+ messages in thread
From: Huy Nguyen @ 2018-05-24 14:37 UTC (permalink / raw)
  To: John Fastabend, Jiri Pirko, Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, netdev, Or Gerlitz

On 5/23/2018 11:03 AM, John Fastabend wrote:
> On 05/23/2018 08:37 AM, Huy Nguyen wrote:
>>
>> On 5/23/2018 8:52 AM, John Fastabend wrote:
>>> It would be nice though if the API gave us some hint on max/min/stride
>>> of allowed values. Could the get API return these along with current
>>> value? Presumably the allowed max size could change with devlink buffer
>>> changes in how the global buffer is divided up as well.
>> Acked. I will add Max. Let's skip min/stride since it is too hardware specific.
> At minimum then we need to document for driver writers what to do
> with a value that falls between strides. Round-up or round-down.
>
> .John
V2 is still under internal review. But here are the changes in patch #1
and patch #6.

patch #1
Changes in V2:
     Add total_size in dcbnl_buffer to report the total available
buffer size of the netdev.
     Code changes are in patch #1 and #6.

patch #6 commit message
Changes in V2:
     Report the total available buffer size of the netdev.
     Comment on buffer stride:
     The Mellanox HCA buffer stride is 128 Bytes. If the buffer size is
not a multiple of 128, it will be rounded down to the nearest multiple
of 128.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
  2018-05-23  9:23           ` Jakub Kicinski
                               ` (2 preceding siblings ...)
  2018-05-23 15:27             ` Huy Nguyen
@ 2018-05-24 17:13             ` Ido Schimmel
  3 siblings, 0 replies; 32+ messages in thread
From: Ido Schimmel @ 2018-05-24 17:13 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Huy Nguyen, Saeed Mahameed, David S. Miller, netdev, Jiri Pirko,
	Or Gerlitz, Parav Pandit, Ido Schimmel

Hi Jakub,

On Wed, May 23, 2018 at 02:23:14AM -0700, Jakub Kicinski wrote:
> Are you referring to XOFF/XON thresholds?  I don't think the "threshold
> type" in devlink API implies we are setting XON/XOFF thresholds
> directly :S  If PFC is enabled we may be setting them indirectly,
> obviously.
> 
> My understanding is that for static threshold type the size parameter
> specifies the max amount of memory a given pool can consume.

Correct.

> Yes, we must have different definitions of "shared buffer" :)  That
> link, however, didn't clarify much for me...  In mlx5 you seem to have a
> buffer which is shared between priorities, even if it's not what would
> be referred to as shared buffer in switch context.

The following link is my attempt at explaining the above concepts:
https://github.com/Mellanox/mlxsw/wiki/Quality-of-Service

Please let me know if something is not clear.

Basically, we use devlink-sb and dcbnl to configure two different
buffers:

* devlink-sb is used to configure the switch's shared buffer, which is
shared between all the ports and thus can't take a netdev as a handle

* dcbnl is used to configure per-port buffers (also called headroom
buffers) where received packets are stored while going through the
switch's pipeline before being admitted to the shared buffer and
awaiting transmission

Note that in Huy's case the buffers are of the second type (per-port)
and thus using dcbnl instead of devlink-sb makes sense.

> DCBNL seems to carry standards-based information, which this is not.
> mlxsw supports DCBNL; will it also support this buffer configuration
> mechanism?

I believe so, it's just a matter of doing the work. The hardware
supports this and the interface is identical to the NIC (same
registers).

> > >    How does one query the total size of the buffer to be carved?  
> > [HQN] This is not necessary. If the total size is too big, an error
> > will be returned via the DCB netlink interface.
> 
> Right, I'm not saying it's a bug :)  It's just nice when the user can be
> told the total size without having to probe for it :)

+1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 2/6] net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask
  2018-05-22 16:01       ` Saeed Mahameed
@ 2018-05-24 21:21         ` Or Gerlitz
  2018-05-24 21:28           ` Saeed Mahameed
  0 siblings, 1 reply; 32+ messages in thread
From: Or Gerlitz @ 2018-05-24 21:21 UTC (permalink / raw)
  To: Saeed Mahameed, Huy Nguyen
  Cc: David S. Miller, Linux Netdev List, Saeed Mahameed

On Tue, May 22, 2018 at 7:01 PM, Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
> On Tue, May 22, 2018 at 3:21 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Tue, May 22, 2018 at 1:19 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Tue, May 22, 2018 at 12:04 AM, Saeed Mahameed <saeedm@mellanox.com> wrote:
>>>> From: Huy Nguyen <huyn@mellanox.com>
>>>>
>>>> Add pbmc and pptb in the port_access_reg_cap_mask. These two
>>>> bits determine if device supports receive buffer configuration.
>>>>
>>>> Signed-off-by: Huy Nguyen <huyn@mellanox.com>
>>>
>>> Huy, Parav reviewed your code to death (but he's still alive and kicking!),
>>> go ahead and add his Reviewed-by tag to the entire series.

Just wanted to make sure you didn't miss this one, ack?



>> when you fix that, also address checkpatch's scream on
>>
>> WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
>>
>> in four cases along the series
>>
>
> We are going to do this once for all mlx5 files soon, i don't want to
> have two types of license headers in the meanwhile.
> let's keep this as is until then.
>
>>
>>>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


* Re: [net-next 2/6] net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask
  2018-05-24 21:21         ` Or Gerlitz
@ 2018-05-24 21:28           ` Saeed Mahameed
  0 siblings, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-05-24 21:28 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Huy Nguyen, David S. Miller, Linux Netdev List, Saeed Mahameed

On Thu, May 24, 2018 at 2:21 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Tue, May 22, 2018 at 7:01 PM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Tue, May 22, 2018 at 3:21 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Tue, May 22, 2018 at 1:19 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>> On Tue, May 22, 2018 at 12:04 AM, Saeed Mahameed <saeedm@mellanox.com> wrote:
>>>>> From: Huy Nguyen <huyn@mellanox.com>
>>>>>
>>>>> Add pbmc and pptb in the port_access_reg_cap_mask. These two
>>>>> bits determine if device supports receive buffer configuration.
>>>>>
>>>>> Signed-off-by: Huy Nguyen <huyn@mellanox.com>
>>>>
>>>> Huy, Parav reviewed your code to death (but he's still alive and kicking!),
>>>> go ahead and add his Reviewed-by tag to the entire series.
>
> Just wanted to make sure you didn't miss this one, ack?
>
ack


end of thread, other threads:[~2018-05-24 21:28 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-21 21:04 [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19 Saeed Mahameed
2018-05-21 21:04 ` [net-next 1/6] net/dcb: Add dcbnl buffer attribute Saeed Mahameed
2018-05-22  5:20   ` Jakub Kicinski
2018-05-22 15:36     ` Huy Nguyen
2018-05-22 18:32       ` Jakub Kicinski
2018-05-23  1:01         ` Huy Nguyen
2018-05-23  6:15           ` Or Gerlitz
2018-05-23  9:23           ` Jakub Kicinski
2018-05-23  9:33             ` Jiri Pirko
2018-05-23 15:08             ` Huy Nguyen
2018-05-23 15:27             ` Huy Nguyen
2018-05-24 17:13             ` Ido Schimmel
2018-05-23  9:43     ` Jiri Pirko
2018-05-23 13:52       ` John Fastabend
2018-05-23 15:37         ` Huy Nguyen
2018-05-23 16:03           ` John Fastabend
2018-05-23 20:28             ` Jakub Kicinski
2018-05-24 14:37             ` Huy Nguyen
2018-05-23 20:13         ` Jakub Kicinski
2018-05-23 20:19   ` Jakub Kicinski
2018-05-24 14:11     ` Huy Nguyen
2018-05-21 21:04 ` [net-next 2/6] net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask Saeed Mahameed
2018-05-22 10:19   ` Or Gerlitz
2018-05-22 10:21     ` Or Gerlitz
2018-05-22 16:01       ` Saeed Mahameed
2018-05-24 21:21         ` Or Gerlitz
2018-05-24 21:28           ` Saeed Mahameed
2018-05-21 21:04 ` [net-next 3/6] net/mlx5e: Move port speed code from en_ethtool.c to en/port.c Saeed Mahameed
2018-05-21 21:05 ` [net-next 4/6] net/mlx5e: PPTB and PBMC register firmware command support Saeed Mahameed
2018-05-21 21:05 ` [net-next 5/6] net/mlx5e: Receive buffer configuration Saeed Mahameed
2018-05-21 21:05 ` [net-next 6/6] net/mlx5e: Receive buffer support for DCBX Saeed Mahameed
2018-05-22 19:38 ` [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19 David Miller
