netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC net-next 0/8] net: dsa: Multi-queue awareness
@ 2017-08-31  0:18 Florian Fainelli
  2017-08-31  0:18 ` [RFC net-next 1/8] net: dsa: Allow switch drivers to indicate number of RX/TX queues Florian Fainelli
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli

This patch series is sent as reference, especially because the last patch
is trying not to be creating too many layer violations, but clearly there
are a little bit being created here anyways.

Essentially what I am trying to achieve is that you have a stacked device which
is multi-queue aware, that applications will be using, and for which they can
control the queue selection (using mq) the way they want. Each of each stacked
network devices are created for each port of the switch (this is what DSA
does). When a skb is submitted from say net_device X, we can derive its port
number and look at the queue_mapping value to determine which port of the
switch and queue we should be sending this to. The information is embedded in a
tag (4 bytes) and is used by the switch to steer the transmission.

These stacked devices will actually transmit using a "master" or conduit
network device which has a number of queues as well. In one version of the
hardware that I work with, we have up to 4 ports, each with 8 queues, and the
master device has a total of 32 hardware queues, so a 1:1 mapping is easy. With
another version of the hardware, same number of ports and queues, but only 16
hardware queues, so only a 2:1 mapping is possible.

In order for congestion information to work properly, I need to establish a
mapping, preferably before transmission starts (but reconfiguration while
interfaces are running would be possible too) between these stacked device's
queue and the conduit interface's queue.

Comments, flames, rotten tomatoes, anything!

Florian Fainelli (8):
  net: dsa: Allow switch drivers to indicate number of RX/TX queues
  net: dsa: tag_brcm: Set output queue from skb queue mapping
  net: dsa: bcm_sf2: Advertise number of egress queues
  net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
  net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278
  net: dsa: Expose dsa_slave_dev_check and dsa_slave_dev_port_num
  net: dsa: tag_brcm: Indicate to master netdevice port + queue
  net: systemport: Establish DSA network device queue mapping

 drivers/net/dsa/bcm_sf2.c                  |  16 +++++
 drivers/net/dsa/bcm_sf2.h                  |   1 +
 drivers/net/dsa/bcm_sf2_cfp.c              |   8 +--
 drivers/net/ethernet/broadcom/bcmsysport.c | 100 +++++++++++++++++++++++++++--
 drivers/net/ethernet/broadcom/bcmsysport.h |  11 +++-
 include/net/dsa.h                          |  19 ++++++
 net/dsa/slave.c                            |  22 +++++--
 net/dsa/tag_brcm.c                         |   8 ++-
 8 files changed, 170 insertions(+), 15 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC net-next 1/8] net: dsa: Allow switch drivers to indicate number of RX/TX queues
  2017-08-31  0:18 [RFC net-next 0/8] net: dsa: Multi-queue awareness Florian Fainelli
@ 2017-08-31  0:18 ` Florian Fainelli
  2017-08-31 23:44   ` Andrew Lunn
  2017-08-31  0:18 ` [RFC net-next 2/8] net: dsa: tag_brcm: Set output queue from skb queue mapping Florian Fainelli
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli

Let switch drivers indicate how many RX and TX queues they support. Some
switches, such as Broadcom Starfighter 2 are resigned with 8 egress
queues. Future changes will allow us to leverage the queue mapping and
direct the transmission towards a particular queue.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 include/net/dsa.h |  4 ++++
 net/dsa/slave.c   | 10 ++++++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 398ca8d70ccd..b10e8da3f8d7 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -243,6 +243,10 @@ struct dsa_switch {
 	/* devlink used to represent this switch device */
 	struct devlink		*devlink;
 
+	/* Number of switch port queues */
+	unsigned int		num_rx_queues;
+	unsigned int		num_tx_queues;
+
 	/* Dynamically allocated ports, keep last */
 	size_t num_ports;
 	struct dsa_port ports[];
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 78e78a6e6833..bfd7173a3c6a 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1259,8 +1259,14 @@ int dsa_slave_create(struct dsa_port *port, const char *name)
 	cpu_dp = ds->dst->cpu_dp;
 	master = cpu_dp->netdev;
 
-	slave_dev = alloc_netdev(sizeof(struct dsa_slave_priv), name,
-				 NET_NAME_UNKNOWN, ether_setup);
+	if (!ds->num_rx_queues)
+		ds->num_rx_queues = 1;
+	if (!ds->num_tx_queues)
+		ds->num_tx_queues = 1;
+
+	slave_dev = alloc_netdev_mqs(sizeof(struct dsa_slave_priv), name,
+				     NET_NAME_UNKNOWN, ether_setup,
+				     ds->num_tx_queues, ds->num_rx_queues);
 	if (slave_dev == NULL)
 		return -ENOMEM;
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC net-next 2/8] net: dsa: tag_brcm: Set output queue from skb queue mapping
  2017-08-31  0:18 [RFC net-next 0/8] net: dsa: Multi-queue awareness Florian Fainelli
  2017-08-31  0:18 ` [RFC net-next 1/8] net: dsa: Allow switch drivers to indicate number of RX/TX queues Florian Fainelli
@ 2017-08-31  0:18 ` Florian Fainelli
  2017-08-31  0:18 ` [RFC net-next 3/8] net: dsa: bcm_sf2: Advertise number of egress queues Florian Fainelli
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli

We originally used skb->priority but that was not quite correct as this
bitfield needs to contain the egress switch queue we intend to send this
SKB to.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 net/dsa/tag_brcm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index de74c3f77818..dbb016434ace 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -62,6 +62,7 @@
 static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
+	u16 queue = skb_get_queue_mapping(skb);
 	u8 *brcm_tag;
 
 	if (skb_cow_head(skb, BRCM_TAG_LEN) < 0)
@@ -78,7 +79,7 @@ static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, struct net_device *dev
 	 * deprecated
 	 */
 	brcm_tag[0] = (1 << BRCM_OPCODE_SHIFT) |
-			((skb->priority << BRCM_IG_TC_SHIFT) & BRCM_IG_TC_MASK);
+		       ((queue & BRCM_IG_TC_MASK) << BRCM_IG_TC_SHIFT);
 	brcm_tag[1] = 0;
 	brcm_tag[2] = 0;
 	if (p->dp->index == 8)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC net-next 3/8] net: dsa: bcm_sf2: Advertise number of egress queues
  2017-08-31  0:18 [RFC net-next 0/8] net: dsa: Multi-queue awareness Florian Fainelli
  2017-08-31  0:18 ` [RFC net-next 1/8] net: dsa: Allow switch drivers to indicate number of RX/TX queues Florian Fainelli
  2017-08-31  0:18 ` [RFC net-next 2/8] net: dsa: tag_brcm: Set output queue from skb queue mapping Florian Fainelli
@ 2017-08-31  0:18 ` Florian Fainelli
  2017-08-31  0:18 ` [RFC net-next 4/8] net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping Florian Fainelli
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli

The switch supports 8 egress queues per port, so indicate that such that
net/dsa/slave.c::dsa_slave_create can allocate the right number of TX
queues.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/dsa/bcm_sf2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 8492c9d64004..3f1ad9d5d7c5 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1147,6 +1147,9 @@ static int bcm_sf2_sw_probe(struct platform_device *pdev)
 	ds = dev->ds;
 	ds->ops = &bcm_sf2_ops;
 
+	/* Advertise the 8 egress queues */
+	ds->num_tx_queues = 8;
+
 	dev_set_drvdata(&pdev->dev, priv);
 
 	spin_lock_init(&priv->indir_lock);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC net-next 4/8] net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
  2017-08-31  0:18 [RFC net-next 0/8] net: dsa: Multi-queue awareness Florian Fainelli
                   ` (2 preceding siblings ...)
  2017-08-31  0:18 ` [RFC net-next 3/8] net: dsa: bcm_sf2: Advertise number of egress queues Florian Fainelli
@ 2017-08-31  0:18 ` Florian Fainelli
  2017-08-31  0:18 ` [RFC net-next 5/8] net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278 Florian Fainelli
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli

Even though TC2QOS mapping is for switch egress queues, we need to
configure it correclty in order for the Broadcom tag ingress (CPU ->
switch) queue selection to work correctly since there is a 1:1 mapping
between switch egress queues and ingress queues.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/dsa/bcm_sf2.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 3f1ad9d5d7c5..fc9f9f171e55 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -103,6 +103,7 @@ static void bcm_sf2_brcm_hdr_setup(struct bcm_sf2_priv *priv, int port)
 static void bcm_sf2_imp_setup(struct dsa_switch *ds, int port)
 {
 	struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
+	unsigned int i;
 	u32 reg, offset;
 
 	if (priv->type == BCM7445_DEVICE_ID)
@@ -129,6 +130,14 @@ static void bcm_sf2_imp_setup(struct dsa_switch *ds, int port)
 	reg |= MII_DUMB_FWDG_EN;
 	core_writel(priv, reg, CORE_SWITCH_CTRL);
 
+	/* Configure Traffic Class to QoS mapping, allow each priority to map
+	 * to a different queue number
+	 */
+	reg = core_readl(priv, CORE_PORT_TC2_QOS_MAP_PORT(port));
+	for (i = 0; i < 8; i++)
+		reg |= i << (PRT_TO_QID_SHIFT * i);
+	core_writel(priv, reg, CORE_PORT_TC2_QOS_MAP_PORT(port));
+
 	bcm_sf2_brcm_hdr_setup(priv, port);
 
 	/* Force link status for IMP port */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC net-next 5/8] net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278
  2017-08-31  0:18 [RFC net-next 0/8] net: dsa: Multi-queue awareness Florian Fainelli
                   ` (3 preceding siblings ...)
  2017-08-31  0:18 ` [RFC net-next 4/8] net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping Florian Fainelli
@ 2017-08-31  0:18 ` Florian Fainelli
  2017-08-31  0:18 ` [RFC net-next 6/8] net: dsa: Expose dsa_slave_dev_check and dsa_slave_dev_port_num Florian Fainelli
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli

BCM7278 has only 128 entries while BCM7445 has the full 256 entries set,
fix that.

Fixes: 7318166cacad ("net: dsa: bcm_sf2: Add support for ethtool::rxnfc")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/dsa/bcm_sf2.c     | 4 ++++
 drivers/net/dsa/bcm_sf2.h     | 1 +
 drivers/net/dsa/bcm_sf2_cfp.c | 8 ++++----
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index fc9f9f171e55..af299fe343cc 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1043,6 +1043,7 @@ struct bcm_sf2_of_data {
 	u32 type;
 	const u16 *reg_offsets;
 	unsigned int core_reg_align;
+	unsigned int num_cfp_rules;
 };
 
 /* Register offsets for the SWITCH_REG_* block */
@@ -1066,6 +1067,7 @@ struct bcm_sf2_of_data {
 	.type		= BCM7445_DEVICE_ID,
 	.core_reg_align	= 0,
 	.reg_offsets	= bcm_sf2_7445_reg_offsets,
+	.num_cfp_rules	= 256,
 };
 
 static const u16 bcm_sf2_7278_reg_offsets[] = {
@@ -1088,6 +1090,7 @@ struct bcm_sf2_of_data {
 	.type		= BCM7278_DEVICE_ID,
 	.core_reg_align	= 1,
 	.reg_offsets	= bcm_sf2_7278_reg_offsets,
+	.num_cfp_rules	= 128,
 };
 
 static const struct of_device_id bcm_sf2_of_match[] = {
@@ -1144,6 +1147,7 @@ static int bcm_sf2_sw_probe(struct platform_device *pdev)
 	priv->type = data->type;
 	priv->reg_offsets = data->reg_offsets;
 	priv->core_reg_align = data->core_reg_align;
+	priv->num_cfp_rules = data->num_cfp_rules;
 
 	/* Auto-detection using standard registers will not work, so
 	 * provide an indication of what kind of device we are for
diff --git a/drivers/net/dsa/bcm_sf2.h b/drivers/net/dsa/bcm_sf2.h
index d9c96b281fc0..02c499f9c56b 100644
--- a/drivers/net/dsa/bcm_sf2.h
+++ b/drivers/net/dsa/bcm_sf2.h
@@ -72,6 +72,7 @@ struct bcm_sf2_priv {
 	u32 				type;
 	const u16			*reg_offsets;
 	unsigned int			core_reg_align;
+	unsigned int			num_cfp_rules;
 
 	/* spinlock protecting access to the indirect registers */
 	spinlock_t			indir_lock;
diff --git a/drivers/net/dsa/bcm_sf2_cfp.c b/drivers/net/dsa/bcm_sf2_cfp.c
index 2fb32d67065f..8a1da7e67707 100644
--- a/drivers/net/dsa/bcm_sf2_cfp.c
+++ b/drivers/net/dsa/bcm_sf2_cfp.c
@@ -98,7 +98,7 @@ static inline void bcm_sf2_cfp_rule_addr_set(struct bcm_sf2_priv *priv,
 {
 	u32 reg;
 
-	WARN_ON(addr >= CFP_NUM_RULES);
+	WARN_ON(addr >= priv->num_cfp_rules);
 
 	reg = core_readl(priv, CORE_CFP_ACC);
 	reg &= ~(XCESS_ADDR_MASK << XCESS_ADDR_SHIFT);
@@ -109,7 +109,7 @@ static inline void bcm_sf2_cfp_rule_addr_set(struct bcm_sf2_priv *priv,
 static inline unsigned int bcm_sf2_cfp_rule_size(struct bcm_sf2_priv *priv)
 {
 	/* Entry #0 is reserved */
-	return CFP_NUM_RULES - 1;
+	return priv->num_cfp_rules - 1;
 }
 
 static int bcm_sf2_cfp_rule_set(struct dsa_switch *ds, int port,
@@ -523,7 +523,7 @@ static int bcm_sf2_cfp_rule_get_all(struct bcm_sf2_priv *priv,
 		if (!(reg & OP_STR_DONE))
 			break;
 
-	} while (index < CFP_NUM_RULES);
+	} while (index < priv->num_cfp_rules);
 
 	/* Put the TCAM size here */
 	nfc->data = bcm_sf2_cfp_rule_size(priv);
@@ -544,7 +544,7 @@ int bcm_sf2_get_rxnfc(struct dsa_switch *ds, int port,
 	case ETHTOOL_GRXCLSRLCNT:
 		/* Subtract the default, unusable rule */
 		nfc->rule_cnt = bitmap_weight(priv->cfp.used,
-					      CFP_NUM_RULES) - 1;
+					      priv->num_cfp_rules) - 1;
 		/* We support specifying rule locations */
 		nfc->data |= RX_CLS_LOC_SPECIAL;
 		break;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC net-next 6/8] net: dsa: Expose dsa_slave_dev_check and dsa_slave_dev_port_num
  2017-08-31  0:18 [RFC net-next 0/8] net: dsa: Multi-queue awareness Florian Fainelli
                   ` (4 preceding siblings ...)
  2017-08-31  0:18 ` [RFC net-next 5/8] net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278 Florian Fainelli
@ 2017-08-31  0:18 ` Florian Fainelli
  2017-08-31  0:18 ` [RFC net-next 7/8] net: dsa: tag_brcm: Indicate to master netdevice port + queue Florian Fainelli
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli

Expose two helper functions:
* one to verify if a net_device is a DSA slave network device
* one to obtain the physical port number associated with a DSA slave
  network device

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 include/net/dsa.h | 15 +++++++++++++++
 net/dsa/slave.c   | 12 +++++++++---
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index b10e8da3f8d7..649bd06f9fe4 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -459,6 +459,21 @@ static inline bool netdev_uses_dsa(struct net_device *dev)
 	return false;
 }
 
+#if IS_ENABLED(CONFIG_NET_DSA)
+bool dsa_slave_dev_check(struct net_device *dev);
+unsigned int dsa_slave_dev_port_num(struct net_device *dev);
+#else
+static inline bool dsa_slave_dev_check(struct net_device *dev)
+{
+	return false;
+}
+
+static inline dsa_slave_dev_port_num(struct net_device *dev)
+{
+	return DSA_MAX_PORTS;
+}
+#endif
+
 struct dsa_switch *dsa_switch_alloc(struct device *dev, size_t n);
 void dsa_unregister_switch(struct dsa_switch *ds);
 int dsa_register_switch(struct dsa_switch *ds);
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index bfd7173a3c6a..302ae3326e3a 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -25,8 +25,6 @@
 
 #include "dsa_priv.h"
 
-static bool dsa_slave_dev_check(struct net_device *dev);
-
 /* slave mii_bus handling ***************************************************/
 static int dsa_slave_phy_read(struct mii_bus *bus, int addr, int reg)
 {
@@ -1346,10 +1344,18 @@ void dsa_slave_destroy(struct net_device *slave_dev)
 	free_netdev(slave_dev);
 }
 
-static bool dsa_slave_dev_check(struct net_device *dev)
+bool dsa_slave_dev_check(struct net_device *dev)
 {
 	return dev->netdev_ops == &dsa_slave_netdev_ops;
 }
+EXPORT_SYMBOL_GPL(dsa_slave_dev_check);
+
+unsigned int dsa_slave_dev_port_num(struct net_device *dev)
+{
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	return p->dp->index;
+}
+EXPORT_SYMBOL_GPL(dsa_slave_dev_port_num);
 
 static int dsa_slave_changeupper(struct net_device *dev,
 				 struct netdev_notifier_changeupper_info *info)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC net-next 7/8] net: dsa: tag_brcm: Indicate to master netdevice port + queue
  2017-08-31  0:18 [RFC net-next 0/8] net: dsa: Multi-queue awareness Florian Fainelli
                   ` (5 preceding siblings ...)
  2017-08-31  0:18 ` [RFC net-next 6/8] net: dsa: Expose dsa_slave_dev_check and dsa_slave_dev_port_num Florian Fainelli
@ 2017-08-31  0:18 ` Florian Fainelli
  2017-08-31  0:18 ` [RFC net-next 8/8] net: systemport: Establish DSA network device queue mapping Florian Fainelli
  2017-09-01  0:05 ` [RFC net-next 0/8] net: dsa: Multi-queue awareness Andrew Lunn
  8 siblings, 0 replies; 20+ messages in thread
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli

We need to tell the DSA master network device doing the actual
transmission what the desired switch port and queue number is for it to
resolve that to the internal transmit queue it is mapped to.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 net/dsa/tag_brcm.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index dbb016434ace..19a617b09b3c 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -86,6 +86,11 @@ static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, struct net_device *dev
 		brcm_tag[2] = BRCM_IG_DSTMAP2_MASK;
 	brcm_tag[3] = (1 << p->dp->index) & BRCM_IG_DSTMAP1_MASK;
 
+	/* Now tell the master network device about the desired output queue
+	 * as well
+	 */
+	skb_set_queue_mapping(skb, p->dp->index << dev->num_tx_queues | queue);
+
 	return skb;
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC net-next 8/8] net: systemport: Establish DSA network device queue mapping
  2017-08-31  0:18 [RFC net-next 0/8] net: dsa: Multi-queue awareness Florian Fainelli
                   ` (6 preceding siblings ...)
  2017-08-31  0:18 ` [RFC net-next 7/8] net: dsa: tag_brcm: Indicate to master netdevice port + queue Florian Fainelli
@ 2017-08-31  0:18 ` Florian Fainelli
  2017-09-01  0:05 ` [RFC net-next 0/8] net: dsa: Multi-queue awareness Andrew Lunn
  8 siblings, 0 replies; 20+ messages in thread
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli

Establish a queue mapping between the DSA slave network device queues
created, and the transmit queue that SYSTEMPORT manages.

We need to configure the SYSTEMPORT transmit queue with the switch port
number and switch port queue number in order for the switch and
SYSTEMPORT hardware to utilize the out of band congestion notification.
This hardware mechanism works by looking at the switch port egress queue
and determines whether there is enough buffers for this queue, with that
class of service for a successful transmission and if not, backpressures
the SYSTEMPORT queue that is being used.

For this to work, we register a network device notifier that listens for
the registration of DSA network devices, and when that happens, extracts
the number of queues for these devices and their associated port number,
remembers that in the driver private structure and linearly maps those
queues to TX rings/queues that we manage.

This scheme works because DSA slave network deviecs always transmit
through SYSTEMPORT so when DSA slave network devices are
destroyed/brought down, the corresponding SYSTEMPORT queues are no
longer used. Also, by design of the DSA framework, the master network
device (SYSTEMPORT) is registered first.

For faster lookups we use an array of up to DSA_MAX_PORTS * number
of queues per port, and then map pointers to bcm_sysport_tx_ring such
that our ndo_select_queue() implementation can just index into that
array to locate the corresponding ring index.

Here is an example mapping with this code:

P0,Q0 -> Q0
..
P0,Q7 -> Q7
P1,Q0 -> Q8
..
P1,Q7 -> Q15
P2,Q0 -> Q16
..
P2,Q7 -> Q23
P7,Q0 -> Q24
..
P7,Q7 -> Q31

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 100 +++++++++++++++++++++++++++--
 drivers/net/ethernet/broadcom/bcmsysport.h |  11 +++-
 2 files changed, 106 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 931751e4f369..eed4c3f672d7 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1387,7 +1387,14 @@ static int bcm_sysport_init_tx_ring(struct bcm_sysport_priv *priv,
 	tdma_writel(priv, 0, TDMA_DESC_RING_COUNT(index));
 	tdma_writel(priv, 1, TDMA_DESC_RING_INTR_CONTROL(index));
 	tdma_writel(priv, 0, TDMA_DESC_RING_PROD_CONS_INDEX(index));
-	tdma_writel(priv, RING_IGNORE_STATUS, TDMA_DESC_RING_MAPPING(index));
+
+	/* Configure QID and port mapping */
+	reg = tdma_readl(priv, TDMA_DESC_RING_MAPPING(index));
+	reg &= ~(RING_QID_MASK | RING_PORT_ID_MASK << RING_PORT_ID_SHIFT);
+	reg |= ring->switch_queue & RING_QID_MASK;
+	reg |= ring->switch_port << RING_PORT_ID_SHIFT;
+	reg |= RING_IGNORE_STATUS;
+	tdma_writel(priv, reg, TDMA_DESC_RING_MAPPING(index));
 	tdma_writel(priv, 0, TDMA_DESC_RING_PCP_DEI_VID(index));
 
 	/* Program the number of descriptors as MAX_THRESHOLD and half of
@@ -1405,8 +1412,9 @@ static int bcm_sysport_init_tx_ring(struct bcm_sysport_priv *priv,
 	napi_enable(&ring->napi);
 
 	netif_dbg(priv, hw, priv->netdev,
-		  "TDMA cfg, size=%d, desc_cpu=%p\n",
-		  ring->size, ring->desc_cpu);
+		  "TDMA cfg, size=%d, desc_cpu=%p switch q=%d,port=%d\n",
+		  ring->size, ring->desc_cpu, ring->switch_queue,
+		  ring->switch_port);
 
 	return 0;
 }
@@ -1987,6 +1995,78 @@ static int bcm_sysport_stop(struct net_device *dev)
 	.set_link_ksettings     = phy_ethtool_set_link_ksettings,
 };
 
+static u16 bcm_sysport_select_queue(struct net_device *dev, struct sk_buff *skb,
+				    void *accel_priv,
+				    select_queue_fallback_t fallback)
+{
+	struct bcm_sysport_priv *priv = netdev_priv(dev);
+	u16 queue = skb_get_queue_mapping(skb);
+	struct bcm_sysport_tx_ring *tx_ring;
+	unsigned int q, port;
+
+	if (!netdev_uses_dsa(dev))
+		return fallback(dev, skb);
+
+	/* DSA tagging layer will have configured the correct queue */
+	q = queue & 0xff;
+	port = queue >> priv->per_port_num_tx_queues;
+	tx_ring = priv->ring_map[q + port * priv->per_port_num_tx_queues];
+
+	return tx_ring->index;
+}
+
+static int bcm_sysport_map_queues(struct bcm_sysport_priv *priv,
+				  struct net_device *slave_dev)
+{
+	struct net_device *dev = priv->netdev;
+	struct bcm_sysport_tx_ring *ring;
+	unsigned int num_tx_queues;
+	unsigned int q, start, port;
+
+	port = dsa_slave_dev_port_num(slave_dev);
+	num_tx_queues = slave_dev->num_tx_queues;
+
+	if (priv->per_port_num_tx_queues &&
+	    priv->per_port_num_tx_queues != num_tx_queues)
+		netdev_warn(slave_dev, "asymetric number of per-port queues\n");
+
+	priv->per_port_num_tx_queues = num_tx_queues;
+
+	start = find_first_zero_bit(&priv->queue_bitmap, dev->num_tx_queues);
+	for (q = 0; q < num_tx_queues; q++) {
+		ring = &priv->tx_rings[q + start];
+
+		/* Just remember the mapping actual programming done
+		 * during bcm_sysport_init_tx_ring
+		 */
+		ring->switch_queue = q;
+		ring->switch_port = port;
+		priv->ring_map[q + port * num_tx_queues] = ring;
+
+		/* Set all queues as being used now */
+		set_bit(q + start, &priv->queue_bitmap);
+	}
+
+	return NOTIFY_OK;
+}
+
+static int bcm_sysport_notifier_event(struct notifier_block *nb,
+				      unsigned long event, void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct bcm_sysport_priv *priv;
+
+	priv = container_of(nb, struct bcm_sysport_priv, queue_nb);
+
+	if (!dsa_slave_dev_check(dev))
+		return NOTIFY_DONE;
+
+	if (event == NETDEV_REGISTER)
+		return bcm_sysport_map_queues(priv, dev);
+
+	return NOTIFY_DONE;
+}
+
 static const struct net_device_ops bcm_sysport_netdev_ops = {
 	.ndo_start_xmit		= bcm_sysport_xmit,
 	.ndo_tx_timeout		= bcm_sysport_tx_timeout,
@@ -1999,6 +2079,7 @@ static int bcm_sysport_stop(struct net_device *dev)
 	.ndo_poll_controller	= bcm_sysport_poll_controller,
 #endif
 	.ndo_get_stats64	= bcm_sysport_get_stats64,
+	.ndo_select_queue	= bcm_sysport_select_queue,
 };
 
 #define REV_FMT	"v%2x.%02x"
@@ -2148,10 +2229,17 @@ static int bcm_sysport_probe(struct platform_device *pdev)
 
 	u64_stats_init(&priv->syncp);
 
+	priv->queue_nb.notifier_call = bcm_sysport_notifier_event;
+	ret = register_netdevice_notifier(&priv->queue_nb);
+	if (ret) {
+		dev_err(&pdev->dev, "failed to register notifier\n");
+		goto err_deregister_fixed_link;
+	}
+
 	ret = register_netdev(dev);
 	if (ret) {
 		dev_err(&pdev->dev, "failed to register net_device\n");
-		goto err_deregister_fixed_link;
+		goto err_deregister_nb;
 	}
 
 	priv->rev = topctrl_readl(priv, REV_CNTL) & REV_MASK;
@@ -2164,6 +2252,8 @@ static int bcm_sysport_probe(struct platform_device *pdev)
 
 	return 0;
 
+err_deregister_nb:
+	unregister_netdevice_notifier(&priv->queue_nb);
 err_deregister_fixed_link:
 	if (of_phy_is_fixed_link(dn))
 		of_phy_deregister_fixed_link(dn);
@@ -2175,6 +2265,7 @@ static int bcm_sysport_probe(struct platform_device *pdev)
 static int bcm_sysport_remove(struct platform_device *pdev)
 {
 	struct net_device *dev = dev_get_drvdata(&pdev->dev);
+	struct bcm_sysport_priv *priv = netdev_priv(dev);
 	struct device_node *dn = pdev->dev.of_node;
 
 	/* Not much to do, ndo_close has been called
@@ -2183,6 +2274,7 @@ static int bcm_sysport_remove(struct platform_device *pdev)
 	unregister_netdev(dev);
 	if (of_phy_is_fixed_link(dn))
 		of_phy_deregister_fixed_link(dn);
+	unregister_netdevice_notifier(&priv->queue_nb);
 	free_netdev(dev);
 	dev_set_drvdata(&pdev->dev, NULL);
 
diff --git a/drivers/net/ethernet/broadcom/bcmsysport.h b/drivers/net/ethernet/broadcom/bcmsysport.h
index 80b4ffff63b7..8913cb0cace6 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.h
+++ b/drivers/net/ethernet/broadcom/bcmsysport.h
@@ -404,7 +404,7 @@ struct bcm_rsb {
 #define  RING_CONS_INDEX_MASK		0xffff
 
 #define RING_MAPPING			0x14
-#define  RING_QID_MASK			0x3
+#define  RING_QID_MASK			0x7
 #define  RING_PORT_ID_SHIFT		3
 #define  RING_PORT_ID_MASK		0x7
 #define  RING_IGNORE_STATUS		(1 << 6)
@@ -711,6 +711,8 @@ struct bcm_sysport_tx_ring {
 	struct bcm_sysport_priv *priv;	/* private context backpointer */
 	unsigned long	packets;	/* packets statistics */
 	unsigned long	bytes;		/* bytes statistics */
+	unsigned int	switch_queue;	/* switch port queue number */
+	unsigned int	switch_port;	/* switch port queue number */
 };
 
 /* Driver private structure */
@@ -764,5 +766,12 @@ struct bcm_sysport_priv {
 
 	/* For atomic update generic 64bit value on 32bit Machine */
 	struct u64_stats_sync	syncp;
+
+	/* netdev notifier to map switch port queues */
+	struct notifier_block	queue_nb;
+	unsigned int		per_port_num_tx_queues;
+	unsigned long		queue_bitmap;
+	struct bcm_sysport_tx_ring *ring_map[DSA_MAX_PORTS * 8];
+
 };
 #endif /* __BCM_SYSPORT_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC net-next 1/8] net: dsa: Allow switch drivers to indicate number of RX/TX queues
  2017-08-31  0:18 ` [RFC net-next 1/8] net: dsa: Allow switch drivers to indicate number of RX/TX queues Florian Fainelli
@ 2017-08-31 23:44   ` Andrew Lunn
  2017-09-01  4:00     ` Florian Fainelli
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Lunn @ 2017-08-31 23:44 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, jiri, jhs, davem, xiyou.wangcong, vivien.didelot

On Wed, Aug 30, 2017 at 05:18:45PM -0700, Florian Fainelli wrote:
> Let switch drivers indicate how many RX and TX queues they support. Some
> switches, such as Broadcom Starfighter 2 are resigned with 8 egress
> queues.

Marvell switches also have egress queue.

Does the SF2 have ingress queues? Marvel don't as far as i known.  So
i wounder if num_rx_queues is useful?

Do switches in general have ingress queues? 

   Andrew

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC net-next 0/8] net: dsa: Multi-queue awareness
  2017-08-31  0:18 [RFC net-next 0/8] net: dsa: Multi-queue awareness Florian Fainelli
                   ` (7 preceding siblings ...)
  2017-08-31  0:18 ` [RFC net-next 8/8] net: systemport: Establish DSA network device queue mapping Florian Fainelli
@ 2017-09-01  0:05 ` Andrew Lunn
  2017-09-01  4:10   ` Florian Fainelli
  8 siblings, 1 reply; 20+ messages in thread
From: Andrew Lunn @ 2017-09-01  0:05 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, jiri, jhs, davem, xiyou.wangcong, vivien.didelot

On Wed, Aug 30, 2017 at 05:18:44PM -0700, Florian Fainelli wrote:
> This patch series is sent as reference, especially because the last patch
> is trying not to be creating too many layer violations, but clearly there
> are a little bit being created here anyways.
> 
> Essentially what I am trying to achieve is that you have a stacked device which
> is multi-queue aware, that applications will be using, and for which they can
> control the queue selection (using mq) the way they want. Each of each stacked
> network devices are created for each port of the switch (this is what DSA
> does). When a skb is submitted from say net_device X, we can derive its port
> number and look at the queue_mapping value to determine which port of the
> switch and queue we should be sending this to. The information is embedded in a
> tag (4 bytes) and is used by the switch to steer the transmission.
> 
> These stacked devices will actually transmit using a "master" or conduit
> network device which has a number of queues as well. In one version of the
> hardware that I work with, we have up to 4 ports, each with 8 queues, and the
> master device has a total of 32 hardware queues, so a 1:1 mapping is easy. With
> another version of the hardware, same number of ports and queues, but only 16
> hardware queues, so only a 2:1 mapping is possible.
> 
> In order for congestion information to work properly, I need to establish a
> mapping, preferably before transmission starts (but reconfiguration while
> interfaces are running would be possible too) between these stacked device's
> queue and the conduit interface's queue.
> 
> Comments, flames, rotten tomatoes, anything!

Right, i think i understand.

This works just for traffic between the host and ports.  The host can
set the egress queue. And i assume the queues are priorities, either
absolute or weighted round robin, etc.

But this has no effect on traffic going from port to port. At some
point, i expect you will want to offload TC for that.

How will the two interact? Could the TC rules also act on traffic from
the host to a port? Would it be simpler in the long run to just
implement TC rules?

	  Andrew

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC net-next 1/8] net: dsa: Allow switch drivers to indicate number of RX/TX queues
  2017-08-31 23:44   ` Andrew Lunn
@ 2017-09-01  4:00     ` Florian Fainelli
  0 siblings, 0 replies; 20+ messages in thread
From: Florian Fainelli @ 2017-09-01  4:00 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, jiri, jhs, davem, xiyou.wangcong, vivien.didelot



On 08/31/2017 04:44 PM, Andrew Lunn wrote:
> On Wed, Aug 30, 2017 at 05:18:45PM -0700, Florian Fainelli wrote:
>> Let switch drivers indicate how many RX and TX queues they support. Some
>> switches, such as Broadcom Starfighter 2 are resigned with 8 egress
>> queues.
> 
> Marvell switches also have egress queue.
> 
> Does the SF2 have ingress queues? Marvel don't as far as i known.  So
> i wounder if num_rx_queues is useful?

At the moment probably not, since we are not doing anything useful other
than creating the network devices with the indicated number of queues.

> 
> Do switches in general have ingress queues? 

They do, at least the Starfigther 2 has, and from the Broadcom tag you
can get such information (BRCM_EG_TC_SHIFT) and you could presumably
record that queue on the SKB. I don't have an use case for that (yet?).
-- 
Florian

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC net-next 0/8] net: dsa: Multi-queue awareness
  2017-09-01  0:05 ` [RFC net-next 0/8] net: dsa: Multi-queue awareness Andrew Lunn
@ 2017-09-01  4:10   ` Florian Fainelli
  2017-09-01 13:29     ` Andrew Lunn
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Fainelli @ 2017-09-01  4:10 UTC (permalink / raw)
  To: Andrew Lunn, jiri, jhs; +Cc: netdev, davem, xiyou.wangcong, vivien.didelot



On 08/31/2017 05:05 PM, Andrew Lunn wrote:
> On Wed, Aug 30, 2017 at 05:18:44PM -0700, Florian Fainelli wrote:
>> This patch series is sent as reference, especially because the last patch
>> is trying not to be creating too many layer violations, but clearly there
>> are a little bit being created here anyways.
>>
>> Essentially what I am trying to achieve is that you have a stacked device which
>> is multi-queue aware, that applications will be using, and for which they can
>> control the queue selection (using mq) the way they want. Each of each stacked
>> network devices are created for each port of the switch (this is what DSA
>> does). When a skb is submitted from say net_device X, we can derive its port
>> number and look at the queue_mapping value to determine which port of the
>> switch and queue we should be sending this to. The information is embedded in a
>> tag (4 bytes) and is used by the switch to steer the transmission.
>>
>> These stacked devices will actually transmit using a "master" or conduit
>> network device which has a number of queues as well. In one version of the
>> hardware that I work with, we have up to 4 ports, each with 8 queues, and the
>> master device has a total of 32 hardware queues, so a 1:1 mapping is easy. With
>> another version of the hardware, same number of ports and queues, but only 16
>> hardware queues, so only a 2:1 mapping is possible.
>>
>> In order for congestion information to work properly, I need to establish a
>> mapping, preferably before transmission starts (but reconfiguration while
>> interfaces are running would be possible too) between these stacked device's
>> queue and the conduit interface's queue.
>>
>> Comments, flames, rotten tomatoes, anything!
> 
> Right, i think i understand.
> 
> This works just for traffic between the host and ports.  The host can
> set the egress queue. And i assume the queues are priorities, either
> absolute or weighted round robin, etc.
> 
> But this has no effect on traffic going from port to port. At some
> point, i expect you will want to offload TC for that.

You are absolutely right, this patch series aims at having the host be
able to steer traffic towards particular switch port egress queues which
are configured with specific priorities. At the moment it really is
mapping one priority value (in the 802.1p sense) to one queue number and
let the switch scheduler figure things out.

With this patch set you can now use the multiq filter of tc and do
exactly what is documented under Documentation/networking/multiqueue.txt
and get the desired matches to be steered towards the queue you defined.

> 
> How will the two interact? Could the TC rules also act on traffic from
> the host to a port? Would it be simpler in the long run to just
> implement TC rules?

I suppose that you could somehow use TC to influence how the traffic
from host to CPU works, but without a "CPU" port representor the
question is how do we get that done? If we used "eth0" we need to
callback into the switch driver for programming..

Regarding the last patch in this series, what I would ideally to replace
it with is something along the lines of:

tc bind dev sw0p0 queue 0 dev eth0 queue 16

I am not sure if this is an action, or a filter, or something else...
-- 
Florian

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC net-next 0/8] net: dsa: Multi-queue awareness
  2017-09-01  4:10   ` Florian Fainelli
@ 2017-09-01 13:29     ` Andrew Lunn
  2017-09-01 16:46       ` Florian Fainelli
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Lunn @ 2017-09-01 13:29 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: jiri, jhs, netdev, davem, xiyou.wangcong, vivien.didelot

> I suppose that you could somehow use TC to influence how the traffic
> from host to CPU works, but without a "CPU" port representor the
> question is how do we get that done? If we used "eth0" we need to
> callback into the switch driver for programming..

We need to compare how the different switches work with respect to
QoS. Marvell switches do a lot of the classification on the ingress
port where it determines what queue the frame should be placed in on
the egress port. The egress port then schedules its queues.

This does not map to TC too well.

> Regarding the last patch in this series, what I would ideally to replace
> it with is something along the lines of:
> 
> tc bind dev sw0p0 queue 0 dev eth0 queue 16

Why do you need this? sw0p0 has 8 queues? So i assume you use TC on
sw0p0 to place frames into these queues? The queue then gets passed
transparently down through the conduit interface and then used by the
tagger. I don't see why you need eth0 here? We try our best to avoid
eth0 wherever possible, it causes confusion. So i would prefer not to
have to use eth0 with TC commands.

     Andrew

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC net-next 0/8] net: dsa: Multi-queue awareness
  2017-09-01 13:29     ` Andrew Lunn
@ 2017-09-01 16:46       ` Florian Fainelli
  2017-09-01 17:55         ` Andrew Lunn
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Fainelli @ 2017-09-01 16:46 UTC (permalink / raw)
  To: Andrew Lunn, jiri, jhs; +Cc: netdev, davem, xiyou.wangcong, vivien.didelot

On 09/01/2017 06:29 AM, Andrew Lunn wrote:
>> I suppose that you could somehow use TC to influence how the traffic
>> from host to CPU works, but without a "CPU" port representor the
>> question is how do we get that done? If we used "eth0" we need to
>> callback into the switch driver for programming..
> 
> We need to compare how the different switches work with respect to
> QoS. Marvell switches do a lot of the classification on the ingress
> port where it determines what queue the frame should be placed in on
> the egress port. The egress port then schedules its queues.
> 
> This does not map to TC too well.
> 
>> Regarding the last patch in this series, what I would ideally to replace
>> it with is something along the lines of:
>>
>> tc bind dev sw0p0 queue 0 dev eth0 queue 16
> 
> Why do you need this? sw0p0 has 8 queues? So i assume you use TC on
> sw0p0 to place frames into these queues? The queue then gets passed
> transparently down through the conduit interface and then used by the
> tagger. I don't see why you need eth0 here? We try our best to avoid
> eth0 wherever possible, it causes confusion. So i would prefer not to
> have to use eth0 with TC commands.

Well, if you read through patch 8 maybe this is explained. The dynamic
queue selection is working fine through the use of the DSA network
device's queue being passed to the Broadcom tag. If there was just I
would agree with you, but here is the catch below.

We also have this unique (AFAICT) hardware feature called Advanced
Congestion Buffering (ACB) where the CPU Ethernet MAC can receive
congestion information from the switch queues directly, but out of band
from specifically added HW logic and signals. This is not using pause
frames.

This is useful for instance when your CPU is linking at 1Gbits/sec (or
more) internally with the switch, but you have connected external hosts
that are only 10/100 capable. When you push 1Gbits/sec of traffic
towards such hosts, you would get severe packet loss, unless you have
pause frames enabled. The problem with Pause frames within the SF2
switch design is that they are not per-flow though, so you can't resolve
from the pause frames which egress queue to backpressure. With ACB
though, you get end-to-end backpressure between say, Port 8 and Port 0.
In order for this to work though, you need the CPU MAC to be inspecting
or rather receiving congestion notification from the switch port and
queues directly.

This is why I need to define a mapping between switch (port P,queue Q)
and CPU MAC queues (Q'). In the first generation HW, we have up to 4
ports exposed, each with 8 queues, and we have 32 CPU queues, 1:1
mapping is possible. On 2nd generation hardware, same number of ports
and queues per port, but only 16 queues, so a 2:1 mapping is possible only.

If I do not establish the mapping, several problems can occur but the
most severe is that congestion notification logic gets its congestion
information from the wrong port and queue, so it can be permanently
backpressuring the CPU queue based on e.g: a disabled port. This results
in the transmit queue being disabled, and so you get the netdev watchdog
to kick in and scream.

As you can see from the last patch, I used a notifier to receive
information when DSA slave network devices are added, from which I
extract their port number and set up an internal mapping to the CPU
queues, but this is creating a layering violation since I have now the
CPU driver extracting DSA specific net_device information.

The idea behind exposing a command like the proposed "tc bind" is to let
users define the mapping as they see fit. 1:1 or 2:1 mapping is fine as
a default, but may not be satisfactory for some use cases.

Hope this helps understand the bigger picture ;)
-- 
Florian

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC net-next 0/8] net: dsa: Multi-queue awareness
  2017-09-01 16:46       ` Florian Fainelli
@ 2017-09-01 17:55         ` Andrew Lunn
  2017-09-01 18:27           ` Florian Fainelli
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Lunn @ 2017-09-01 17:55 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: jiri, jhs, netdev, davem, xiyou.wangcong, vivien.didelot

Hi Florian

> >> tc bind dev sw0p0 queue 0 dev eth0 queue 16

It this the eth0 i don't like here. Why not in the implementation just
use something like netdev_master_upper_dev_get('sw0p0')? Or does

tc bind dev sw0p0 queue 0 dev lo queue 16

make sense?

     Andrew

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC net-next 0/8] net: dsa: Multi-queue awareness
  2017-09-01 17:55         ` Andrew Lunn
@ 2017-09-01 18:27           ` Florian Fainelli
  2017-09-01 18:50             ` Andrew Lunn
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Fainelli @ 2017-09-01 18:27 UTC (permalink / raw)
  To: Andrew Lunn, jiri; +Cc: jhs, netdev, davem, xiyou.wangcong, vivien.didelot

On 09/01/2017 10:55 AM, Andrew Lunn wrote:
> Hi Florian
> 
>>>> tc bind dev sw0p0 queue 0 dev eth0 queue 16
> 
> It this the eth0 i don't like here. Why not in the implementation just
> use something like netdev_master_upper_dev_get('sw0p0')? Or does
> 
> tc bind dev sw0p0 queue 0 dev lo queue 16
> 
> make sense?

Last I brought this up with Jiri that we should link DSA network devices
to their master network deviecs with netdev_upper_dev_link() he said
this was not appropriate for DSA slave network devices, but I can't
remember why, I would assume that any stacked device set up would do that.

In any case, we need to establish a mapping so we have to specify at
least the target device's queue number. It is quite similar in premise
to e.g: enslaving a network device to a bridge port:

ip link set dev eth0 master br0

Thanks
-- 
Florian

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC net-next 0/8] net: dsa: Multi-queue awareness
  2017-09-01 18:27           ` Florian Fainelli
@ 2017-09-01 18:50             ` Andrew Lunn
  2017-09-01 19:21               ` Florian Fainelli
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Lunn @ 2017-09-01 18:50 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: jiri, jhs, netdev, davem, xiyou.wangcong, vivien.didelot

On Fri, Sep 01, 2017 at 11:27:43AM -0700, Florian Fainelli wrote:
> On 09/01/2017 10:55 AM, Andrew Lunn wrote:
> > Hi Florian
> > 
> >>>> tc bind dev sw0p0 queue 0 dev eth0 queue 16
> > 
> > It this the eth0 i don't like here. Why not in the implementation just
> > use something like netdev_master_upper_dev_get('sw0p0')? Or does
> 
> Last I brought this up with Jiri that we should link DSA network devices
> to their master network deviecs with netdev_upper_dev_link() he said
> this was not appropriate for DSA slave network devices, but I can't
> remember why, I would assume that any stacked device set up would do that.

There is some form a linking going, our device names show that:

9: lan5@eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether da:87:2a:03:cf:16 brd ff:ff:ff:ff:ff:ff

> In any case, we need to establish a mapping so we have to specify at
> least the target device's queue number. It is quite similar in premise
> to e.g: enslaving a network device to a bridge port:
> 
> ip link set dev eth0 master br0

But here br0 is absolutely required, we have to say which bridge the
slave port should be a member of.

But what good is eth0 in

tc bind dev sw0p0 queue 0 dev eth0 queue 16

As i said suggesting, you have to somehow verify that eth0 is the
conduit interface sw0p0 is using. Which makes the parameter pointless.
Determine it from the sw0p0 somehow.

	  Andrew

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC net-next 0/8] net: dsa: Multi-queue awareness
  2017-09-01 18:50             ` Andrew Lunn
@ 2017-09-01 19:21               ` Florian Fainelli
  2017-09-01 19:44                 ` Andrew Lunn
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Fainelli @ 2017-09-01 19:21 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: jiri, jhs, netdev, davem, xiyou.wangcong, vivien.didelot

On 09/01/2017 11:50 AM, Andrew Lunn wrote:
> On Fri, Sep 01, 2017 at 11:27:43AM -0700, Florian Fainelli wrote:
>> On 09/01/2017 10:55 AM, Andrew Lunn wrote:
>>> Hi Florian
>>>
>>>>>> tc bind dev sw0p0 queue 0 dev eth0 queue 16
>>>
>>> It this the eth0 i don't like here. Why not in the implementation just
>>> use something like netdev_master_upper_dev_get('sw0p0')? Or does
>>
>> Last I brought this up with Jiri that we should link DSA network devices
>> to their master network deviecs with netdev_upper_dev_link() he said
>> this was not appropriate for DSA slave network devices, but I can't
>> remember why, I would assume that any stacked device set up would do that.
> 
> There is some form a linking going, our device names show that:
> 
> 9: lan5@eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
>     link/ether da:87:2a:03:cf:16 brd ff:ff:ff:ff:ff:ff

This is because iproute2 is linking the devices based on what
ndo_get_iflink() returns.

> 
>> In any case, we need to establish a mapping so we have to specify at
>> least the target device's queue number. It is quite similar in premise
>> to e.g: enslaving a network device to a bridge port:
>>
>> ip link set dev eth0 master br0
> 
> But here br0 is absolutely required, we have to say which bridge the
> slave port should be a member of.

Right,

> 
> But what good is eth0 in
> 
> tc bind dev sw0p0 queue 0 dev eth0 queue 16
> 
> As i said suggesting, you have to somehow verify that eth0 is the
> conduit interface sw0p0 is using. Which makes the parameter pointless.
> Determine it from the sw0p0 somehow.

I see what you mean, so something along the lines of just:

tc bind dev swp0p0 queue 0 master queue 16

without having to specify the master network device since it's implicit,
I kind of like that.
-- 
Florian

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC net-next 0/8] net: dsa: Multi-queue awareness
  2017-09-01 19:21               ` Florian Fainelli
@ 2017-09-01 19:44                 ` Andrew Lunn
  0 siblings, 0 replies; 20+ messages in thread
From: Andrew Lunn @ 2017-09-01 19:44 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: jiri, jhs, netdev, davem, xiyou.wangcong, vivien.didelot

> I see what you mean, so something along the lines of just:
> 
> tc bind dev swp0p0 queue 0 master queue 16
> 
> without having to specify the master network device since it's implicit,
> I kind of like that.

Yes, that is better.

     Andrew

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-09-01 19:44 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-31  0:18 [RFC net-next 0/8] net: dsa: Multi-queue awareness Florian Fainelli
2017-08-31  0:18 ` [RFC net-next 1/8] net: dsa: Allow switch drivers to indicate number of RX/TX queues Florian Fainelli
2017-08-31 23:44   ` Andrew Lunn
2017-09-01  4:00     ` Florian Fainelli
2017-08-31  0:18 ` [RFC net-next 2/8] net: dsa: tag_brcm: Set output queue from skb queue mapping Florian Fainelli
2017-08-31  0:18 ` [RFC net-next 3/8] net: dsa: bcm_sf2: Advertise number of egress queues Florian Fainelli
2017-08-31  0:18 ` [RFC net-next 4/8] net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping Florian Fainelli
2017-08-31  0:18 ` [RFC net-next 5/8] net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278 Florian Fainelli
2017-08-31  0:18 ` [RFC net-next 6/8] net: dsa: Expose dsa_slave_dev_check and dsa_slave_dev_port_num Florian Fainelli
2017-08-31  0:18 ` [RFC net-next 7/8] net: dsa: tag_brcm: Indicate to master netdevice port + queue Florian Fainelli
2017-08-31  0:18 ` [RFC net-next 8/8] net: systemport: Establish DSA network device queue mapping Florian Fainelli
2017-09-01  0:05 ` [RFC net-next 0/8] net: dsa: Multi-queue awareness Andrew Lunn
2017-09-01  4:10   ` Florian Fainelli
2017-09-01 13:29     ` Andrew Lunn
2017-09-01 16:46       ` Florian Fainelli
2017-09-01 17:55         ` Andrew Lunn
2017-09-01 18:27           ` Florian Fainelli
2017-09-01 18:50             ` Andrew Lunn
2017-09-01 19:21               ` Florian Fainelli
2017-09-01 19:44                 ` Andrew Lunn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).