netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 00/13] Generic TX reallocation for DSA
@ 2020-10-17 21:35 Vladimir Oltean
  2020-10-17 21:35 ` [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics Vladimir Oltean
                   ` (13 more replies)
  0 siblings, 14 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 21:35 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

Christian has reported buggy usage of skb_put() in tag_ksz.c, which is
only triggerable in real life using his not-yet-published patches for
IEEE 1588 timestamping on Micrel KSZ switches.

The concrete problem there is that the driver can end up calling
skb_put() and exceed the end of the skb data area, because even though
it had reallocated the frame once before, it hadn't reallocated it large
enough. Christian explained it in more detail here:

https://lore.kernel.org/netdev/20201014161719.30289-1-ceggers@arri.de/
https://lore.kernel.org/netdev/20201016200226.23994-1-ceggers@arri.de/

But actually there's a bigger problem, which is that some taggers which
get more rarely tested tend to do some shenanigans which are uncaught
for the longest time, and in the meanwhile, their code gets copy-pasted
into other taggers, creating a mess. For example, the tail tagging
driver for Marvell 88E6060 currently reallocates _every_single_frame_ on
TX. Is that an obvious indication that nobody is using it? Sure. Is it a
good model to follow when developing a new tail tagging driver? No.

DSA has all the information it needs in order to simplify the job of a
tagger on TX. It knows whether it's a normal or a tail tagger, and what
is the protocol overhead it incurs. So why not just perform the
reallocation centrally, which also has the benefit of being able to
introduce a common ethtool statistics counter for number of TX reallocs.
With the latter, performance issues due to this particular reason are
easy to track down.

Christian Eggers (2):
  net: dsa: tag_ksz: don't allocate additional memory for
    padding/tagging
  net: dsa: trailer: don't allocate additional memory for
    padding/tagging

Vladimir Oltean (11):
  net: dsa: add plumbing for custom netdev statistics
  net: dsa: implement a central TX reallocation procedure
  net: dsa: tag_qca: let DSA core deal with TX reallocation
  net: dsa: tag_ocelot: let DSA core deal with TX reallocation
  net: dsa: tag_mtk: let DSA core deal with TX reallocation
  net: dsa: tag_lan9303: let DSA core deal with TX reallocation
  net: dsa: tag_edsa: let DSA core deal with TX reallocation
  net: dsa: tag_brcm: let DSA core deal with TX reallocation
  net: dsa: tag_dsa: let DSA core deal with TX reallocation
  net: dsa: tag_gswip: let DSA core deal with TX reallocation
  net: dsa: tag_ar9331: let DSA core deal with TX reallocation

 net/dsa/dsa_priv.h    |  9 ++++++
 net/dsa/slave.c       | 68 ++++++++++++++++++++++++++++++++++++++--
 net/dsa/tag_ar9331.c  |  3 --
 net/dsa/tag_brcm.c    |  3 --
 net/dsa/tag_dsa.c     |  5 ---
 net/dsa/tag_edsa.c    |  4 ---
 net/dsa/tag_gswip.c   |  4 ---
 net/dsa/tag_ksz.c     | 73 ++++++-------------------------------------
 net/dsa/tag_lan9303.c |  9 ------
 net/dsa/tag_mtk.c     |  3 --
 net/dsa/tag_ocelot.c  |  7 -----
 net/dsa/tag_qca.c     |  3 --
 net/dsa/tag_trailer.c | 31 ++----------------
 13 files changed, 85 insertions(+), 137 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-17 21:35 [RFC PATCH 00/13] Generic TX reallocation for DSA Vladimir Oltean
@ 2020-10-17 21:35 ` Vladimir Oltean
  2020-10-17 22:13   ` Florian Fainelli
                     ` (3 more replies)
  2020-10-17 21:36 ` [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure Vladimir Oltean
                   ` (12 subsequent siblings)
  13 siblings, 4 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 21:35 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

DSA needs to push a header onto every packet on TX, and this might cause
reallocation under certain scenarios, which might affect, for example,
performance.

But reallocated packets are not standardized in struct pcpu_sw_netstats,
struct net_device_stats or anywhere else, it seems, so we need to roll
our own extra netdevice statistics and expose them to ethtool.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/dsa_priv.h |  9 +++++++++
 net/dsa/slave.c    | 25 ++++++++++++++++++++++---
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 12998bf04e55..d39db7500cdd 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -73,12 +73,21 @@ struct dsa_notifier_mtu_info {
 	int mtu;
 };
 
+/* Driver statistics, other than those in struct rtnl_link_stats64.
+ * These are collected per-CPU and aggregated by ethtool.
+ */
+struct dsa_slave_stats {
+	__u64			tx_reallocs;
+	struct u64_stats_sync	syncp;
+} __aligned(1 * sizeof(u64));
+
 struct dsa_slave_priv {
 	/* Copy of CPU port xmit for faster access in slave transmit hot path */
 	struct sk_buff *	(*xmit)(struct sk_buff *skb,
 					struct net_device *dev);
 
 	struct pcpu_sw_netstats	__percpu *stats64;
+	struct dsa_slave_stats	__percpu *extra_stats;
 
 	struct gro_cells	gcells;
 
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 3bc5ca40c9fb..d4326940233c 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -668,9 +668,10 @@ static void dsa_slave_get_strings(struct net_device *dev,
 		strncpy(data + len, "tx_bytes", len);
 		strncpy(data + 2 * len, "rx_packets", len);
 		strncpy(data + 3 * len, "rx_bytes", len);
+		strncpy(data + 4 * len, "tx_reallocs", len);
 		if (ds->ops->get_strings)
 			ds->ops->get_strings(ds, dp->index, stringset,
-					     data + 4 * len);
+					     data + 5 * len);
 	}
 }
 
@@ -682,11 +683,13 @@ static void dsa_slave_get_ethtool_stats(struct net_device *dev,
 	struct dsa_slave_priv *p = netdev_priv(dev);
 	struct dsa_switch *ds = dp->ds;
 	struct pcpu_sw_netstats *s;
+	struct dsa_slave_stats *e;
 	unsigned int start;
 	int i;
 
 	for_each_possible_cpu(i) {
 		u64 tx_packets, tx_bytes, rx_packets, rx_bytes;
+		u64 tx_reallocs;
 
 		s = per_cpu_ptr(p->stats64, i);
 		do {
@@ -696,13 +699,21 @@ static void dsa_slave_get_ethtool_stats(struct net_device *dev,
 			rx_packets = s->rx_packets;
 			rx_bytes = s->rx_bytes;
 		} while (u64_stats_fetch_retry_irq(&s->syncp, start));
+
+		e = per_cpu_ptr(p->extra_stats, i);
+		do {
+			start = u64_stats_fetch_begin_irq(&e->syncp);
+			tx_reallocs	= e->tx_reallocs;
+		} while (u64_stats_fetch_retry_irq(&e->syncp, start));
+
 		data[0] += tx_packets;
 		data[1] += tx_bytes;
 		data[2] += rx_packets;
 		data[3] += rx_bytes;
+		data[4] += tx_reallocs;
 	}
 	if (ds->ops->get_ethtool_stats)
-		ds->ops->get_ethtool_stats(ds, dp->index, data + 4);
+		ds->ops->get_ethtool_stats(ds, dp->index, data + 5);
 }
 
 static int dsa_slave_get_sset_count(struct net_device *dev, int sset)
@@ -713,7 +724,7 @@ static int dsa_slave_get_sset_count(struct net_device *dev, int sset)
 	if (sset == ETH_SS_STATS) {
 		int count;
 
-		count = 4;
+		count = 5;
 		if (ds->ops->get_sset_count)
 			count += ds->ops->get_sset_count(ds, dp->index, sset);
 
@@ -1806,6 +1817,12 @@ int dsa_slave_create(struct dsa_port *port)
 		free_netdev(slave_dev);
 		return -ENOMEM;
 	}
+	p->extra_stats = netdev_alloc_pcpu_stats(struct dsa_slave_stats);
+	if (!p->extra_stats) {
+		free_percpu(p->stats64);
+		free_netdev(slave_dev);
+		return -ENOMEM;
+	}
 
 	ret = gro_cells_init(&p->gcells, slave_dev);
 	if (ret)
@@ -1864,6 +1881,7 @@ int dsa_slave_create(struct dsa_port *port)
 out_gcells:
 	gro_cells_destroy(&p->gcells);
 out_free:
+	free_percpu(p->extra_stats);
 	free_percpu(p->stats64);
 	free_netdev(slave_dev);
 	port->slave = NULL;
@@ -1886,6 +1904,7 @@ void dsa_slave_destroy(struct net_device *slave_dev)
 	dsa_slave_notify(slave_dev, DSA_PORT_UNREGISTER);
 	phylink_destroy(dp->pl);
 	gro_cells_destroy(&p->gcells);
+	free_percpu(p->extra_stats);
 	free_percpu(p->stats64);
 	free_netdev(slave_dev);
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure
  2020-10-17 21:35 [RFC PATCH 00/13] Generic TX reallocation for DSA Vladimir Oltean
  2020-10-17 21:35 ` [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics Vladimir Oltean
@ 2020-10-17 21:36 ` Vladimir Oltean
  2020-10-17 22:01   ` Vladimir Oltean
  2020-10-17 21:36 ` [RFC PATCH 03/13] net: dsa: tag_ksz: don't allocate additional memory for padding/tagging Vladimir Oltean
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 21:36 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

At the moment, taggers are left with the task of ensuring that the skb
headers are writable (which they aren't, if the frames were cloned for
TX timestamping, for flooding by the bridge, etc), and that there is
enough space in the skb data area for the DSA tag to be pushed.

Moreover, the life of tail taggers is even harder, because they need to
ensure that short frames have enough padding, a problem that normal
taggers don't have.

The principle of the DSA framework is that everything except for the
most intimate hardware specifics (like in this case, the actual packing
of the DSA tag bits) should be done inside the core, to avoid having
code paths that are very rarely tested.

So provide a TX reallocation procedure that should cover the known needs
of DSA today.

Note that this patch also gives the network stack a good hint about the
headroom/tailroom it's going to need. Up till now it wasn't doing that.
So the reallocation procedure should really be there only for the
exceptional cases, and for cloned packets which need to be unshared.
The tx_reallocs counter should prove that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/slave.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index d4326940233c..790f5c8deb13 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -548,6 +548,36 @@ netdev_tx_t dsa_enqueue_skb(struct sk_buff *skb, struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(dsa_enqueue_skb);
 
+static int dsa_realloc_skb(struct sk_buff *skb, struct net_device *dev)
+{
+	struct net_device *master = dsa_slave_to_master(dev);
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_slave_stats *e;
+	int headroom, tailroom;
+
+	headroom = dev->needed_headroom;
+	tailroom = dev->needed_tailroom;
+	/* For tail taggers, we need to pad short frames ourselves, to ensure
+	 * that the tail tag does not fail at its role of being at the end of
+	 * the packet, once the master interface pads the frame.
+	 */
+	if (unlikely(tailroom && skb->len < ETH_ZLEN))
+		tailroom += ETH_ZLEN - skb->len;
+
+	if (likely(skb_headroom(skb) >= headroom &&
+		   skb_tailroom(skb) >= tailroom) &&
+		   !skb_cloned(skb))
+		/* No reallocation needed, yay! */
+		return 0;
+
+	e = this_cpu_ptr(p->extra_stats);
+	u64_stats_update_begin(&e->syncp);
+	e->tx_reallocs++;
+	u64_stats_update_end(&e->syncp);
+
+	return pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC);
+}
+
 static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
@@ -567,6 +597,11 @@ static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev)
 	 */
 	dsa_skb_tx_timestamp(p, skb);
 
+	if (dsa_realloc_skb(skb, dev)) {
+		kfree_skb(skb);
+		return NETDEV_TX_OK;
+	}
+
 	/* Transmit function may have to reallocate the original SKB,
 	 * in which case it must have freed it. Only free it here on error.
 	 */
@@ -1802,6 +1837,14 @@ int dsa_slave_create(struct dsa_port *port)
 	slave_dev->netdev_ops = &dsa_slave_netdev_ops;
 	if (ds->ops->port_max_mtu)
 		slave_dev->max_mtu = ds->ops->port_max_mtu(ds, port->index);
+	/* Try to save one extra realloc later in the TX path (in the master)
+	 * by also inheriting the master's needed headroom and tailroom.
+	 * The 8021q driver also does this.
+	 */
+	if (cpu_dp->tag_ops->tail_tag)
+		slave_dev->needed_tailroom = cpu_dp->tag_ops->overhead;
+	else
+		slave_dev->needed_headroom = cpu_dp->tag_ops->overhead;
 	SET_NETDEV_DEVTYPE(slave_dev, &dsa_type);
 
 	netdev_for_each_tx_queue(slave_dev, dsa_slave_set_lockdep_class_one,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 03/13] net: dsa: tag_ksz: don't allocate additional memory for padding/tagging
  2020-10-17 21:35 [RFC PATCH 00/13] Generic TX reallocation for DSA Vladimir Oltean
  2020-10-17 21:35 ` [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics Vladimir Oltean
  2020-10-17 21:36 ` [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure Vladimir Oltean
@ 2020-10-17 21:36 ` Vladimir Oltean
  2020-10-17 21:36 ` [RFC PATCH 04/13] net: dsa: trailer: " Vladimir Oltean
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 21:36 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

From: Christian Eggers <ceggers@arri.de>

The caller (dsa_slave_xmit) guarantees that the frame length is at least
ETH_ZLEN and that enough memory for tail tagging is available.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/tag_ksz.c | 73 ++++++-----------------------------------------
 1 file changed, 9 insertions(+), 64 deletions(-)

diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index 945a9bd5ba35..e78a783bb841 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -14,46 +14,6 @@
 #define KSZ_EGRESS_TAG_LEN		1
 #define KSZ_INGRESS_TAG_LEN		1
 
-static struct sk_buff *ksz_common_xmit(struct sk_buff *skb,
-				       struct net_device *dev, int len)
-{
-	struct sk_buff *nskb;
-	int padlen;
-
-	padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len;
-
-	if (skb_tailroom(skb) >= padlen + len) {
-		/* Let dsa_slave_xmit() free skb */
-		if (__skb_put_padto(skb, skb->len + padlen, false))
-			return NULL;
-
-		nskb = skb;
-	} else {
-		nskb = alloc_skb(NET_IP_ALIGN + skb->len +
-				 padlen + len, GFP_ATOMIC);
-		if (!nskb)
-			return NULL;
-		skb_reserve(nskb, NET_IP_ALIGN);
-
-		skb_reset_mac_header(nskb);
-		skb_set_network_header(nskb,
-				       skb_network_header(skb) - skb->head);
-		skb_set_transport_header(nskb,
-					 skb_transport_header(skb) - skb->head);
-		skb_copy_and_csum_dev(skb, skb_put(nskb, skb->len));
-
-		/* Let skb_put_padto() free nskb, and let dsa_slave_xmit() free
-		 * skb
-		 */
-		if (skb_put_padto(nskb, nskb->len + padlen))
-			return NULL;
-
-		consume_skb(skb);
-	}
-
-	return nskb;
-}
-
 static struct sk_buff *ksz_common_rcv(struct sk_buff *skb,
 				      struct net_device *dev,
 				      unsigned int port, unsigned int len)
@@ -90,23 +50,18 @@ static struct sk_buff *ksz_common_rcv(struct sk_buff *skb,
 static struct sk_buff *ksz8795_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct dsa_port *dp = dsa_slave_to_port(dev);
-	struct sk_buff *nskb;
 	u8 *tag;
 	u8 *addr;
 
-	nskb = ksz_common_xmit(skb, dev, KSZ_INGRESS_TAG_LEN);
-	if (!nskb)
-		return NULL;
-
 	/* Tag encoding */
-	tag = skb_put(nskb, KSZ_INGRESS_TAG_LEN);
-	addr = skb_mac_header(nskb);
+	tag = skb_put(skb, KSZ_INGRESS_TAG_LEN);
+	addr = skb_mac_header(skb);
 
 	*tag = 1 << dp->index;
 	if (is_link_local_ether_addr(addr))
 		*tag |= KSZ8795_TAIL_TAG_OVERRIDE;
 
-	return nskb;
+	return skb;
 }
 
 static struct sk_buff *ksz8795_rcv(struct sk_buff *skb, struct net_device *dev,
@@ -155,18 +110,13 @@ static struct sk_buff *ksz9477_xmit(struct sk_buff *skb,
 				    struct net_device *dev)
 {
 	struct dsa_port *dp = dsa_slave_to_port(dev);
-	struct sk_buff *nskb;
 	__be16 *tag;
 	u8 *addr;
 	u16 val;
 
-	nskb = ksz_common_xmit(skb, dev, KSZ9477_INGRESS_TAG_LEN);
-	if (!nskb)
-		return NULL;
-
 	/* Tag encoding */
-	tag = skb_put(nskb, KSZ9477_INGRESS_TAG_LEN);
-	addr = skb_mac_header(nskb);
+	tag = skb_put(skb, KSZ9477_INGRESS_TAG_LEN);
+	addr = skb_mac_header(skb);
 
 	val = BIT(dp->index);
 
@@ -175,7 +125,7 @@ static struct sk_buff *ksz9477_xmit(struct sk_buff *skb,
 
 	*tag = cpu_to_be16(val);
 
-	return nskb;
+	return skb;
 }
 
 static struct sk_buff *ksz9477_rcv(struct sk_buff *skb, struct net_device *dev,
@@ -211,24 +161,19 @@ static struct sk_buff *ksz9893_xmit(struct sk_buff *skb,
 				    struct net_device *dev)
 {
 	struct dsa_port *dp = dsa_slave_to_port(dev);
-	struct sk_buff *nskb;
 	u8 *addr;
 	u8 *tag;
 
-	nskb = ksz_common_xmit(skb, dev, KSZ_INGRESS_TAG_LEN);
-	if (!nskb)
-		return NULL;
-
 	/* Tag encoding */
-	tag = skb_put(nskb, KSZ_INGRESS_TAG_LEN);
-	addr = skb_mac_header(nskb);
+	tag = skb_put(skb, KSZ_INGRESS_TAG_LEN);
+	addr = skb_mac_header(skb);
 
 	*tag = BIT(dp->index);
 
 	if (is_link_local_ether_addr(addr))
 		*tag |= KSZ9893_TAIL_TAG_OVERRIDE;
 
-	return nskb;
+	return skb;
 }
 
 static const struct dsa_device_ops ksz9893_netdev_ops = {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 04/13] net: dsa: trailer: don't allocate additional memory for padding/tagging
  2020-10-17 21:35 [RFC PATCH 00/13] Generic TX reallocation for DSA Vladimir Oltean
                   ` (2 preceding siblings ...)
  2020-10-17 21:36 ` [RFC PATCH 03/13] net: dsa: tag_ksz: don't allocate additional memory for padding/tagging Vladimir Oltean
@ 2020-10-17 21:36 ` Vladimir Oltean
  2020-10-17 21:36 ` [RFC PATCH 05/13] net: dsa: tag_qca: let DSA core deal with TX reallocation Vladimir Oltean
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 21:36 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

From: Christian Eggers <ceggers@arri.de>

The caller (dsa_slave_xmit) guarantees that the frame length is at least
ETH_ZLEN and that enough memory for tail tagging is available.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/tag_trailer.c | 31 ++-----------------------------
 1 file changed, 2 insertions(+), 29 deletions(-)

diff --git a/net/dsa/tag_trailer.c b/net/dsa/tag_trailer.c
index 3a1cc24a4f0a..5b97ede56a0f 100644
--- a/net/dsa/tag_trailer.c
+++ b/net/dsa/tag_trailer.c
@@ -13,42 +13,15 @@
 static struct sk_buff *trailer_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct dsa_port *dp = dsa_slave_to_port(dev);
-	struct sk_buff *nskb;
-	int padlen;
 	u8 *trailer;
 
-	/*
-	 * We have to make sure that the trailer ends up as the very
-	 * last 4 bytes of the packet.  This means that we have to pad
-	 * the packet to the minimum ethernet frame size, if necessary,
-	 * before adding the trailer.
-	 */
-	padlen = 0;
-	if (skb->len < 60)
-		padlen = 60 - skb->len;
-
-	nskb = alloc_skb(NET_IP_ALIGN + skb->len + padlen + 4, GFP_ATOMIC);
-	if (!nskb)
-		return NULL;
-	skb_reserve(nskb, NET_IP_ALIGN);
-
-	skb_reset_mac_header(nskb);
-	skb_set_network_header(nskb, skb_network_header(skb) - skb->head);
-	skb_set_transport_header(nskb, skb_transport_header(skb) - skb->head);
-	skb_copy_and_csum_dev(skb, skb_put(nskb, skb->len));
-	consume_skb(skb);
-
-	if (padlen) {
-		skb_put_zero(nskb, padlen);
-	}
-
-	trailer = skb_put(nskb, 4);
+	trailer = skb_put(skb, 4);
 	trailer[0] = 0x80;
 	trailer[1] = 1 << dp->index;
 	trailer[2] = 0x10;
 	trailer[3] = 0x00;
 
-	return nskb;
+	return skb;
 }
 
 static struct sk_buff *trailer_rcv(struct sk_buff *skb, struct net_device *dev,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 05/13] net: dsa: tag_qca: let DSA core deal with TX reallocation
  2020-10-17 21:35 [RFC PATCH 00/13] Generic TX reallocation for DSA Vladimir Oltean
                   ` (3 preceding siblings ...)
  2020-10-17 21:36 ` [RFC PATCH 04/13] net: dsa: trailer: " Vladimir Oltean
@ 2020-10-17 21:36 ` Vladimir Oltean
  2020-10-17 21:36 ` [RFC PATCH 06/13] net: dsa: tag_ocelot: " Vladimir Oltean
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 21:36 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach, John Crispin, Alexander Lobakin

Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Cc: John Crispin <john@phrozen.org>
Cc: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/tag_qca.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/dsa/tag_qca.c b/net/dsa/tag_qca.c
index 1b9e8507112b..88181b52f480 100644
--- a/net/dsa/tag_qca.c
+++ b/net/dsa/tag_qca.c
@@ -34,9 +34,6 @@ static struct sk_buff *qca_tag_xmit(struct sk_buff *skb, struct net_device *dev)
 	__be16 *phdr;
 	u16 hdr;
 
-	if (skb_cow_head(skb, QCA_HDR_LEN) < 0)
-		return NULL;
-
 	skb_push(skb, QCA_HDR_LEN);
 
 	memmove(skb->data, skb->data + QCA_HDR_LEN, 2 * ETH_ALEN);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 06/13] net: dsa: tag_ocelot: let DSA core deal with TX reallocation
  2020-10-17 21:35 [RFC PATCH 00/13] Generic TX reallocation for DSA Vladimir Oltean
                   ` (4 preceding siblings ...)
  2020-10-17 21:36 ` [RFC PATCH 05/13] net: dsa: tag_qca: let DSA core deal with TX reallocation Vladimir Oltean
@ 2020-10-17 21:36 ` Vladimir Oltean
  2020-10-17 21:36 ` [RFC PATCH 07/13] net: dsa: tag_mtk: " Vladimir Oltean
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 21:36 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/tag_ocelot.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c
index 3b468aca5c53..16a1afd5b8e1 100644
--- a/net/dsa/tag_ocelot.c
+++ b/net/dsa/tag_ocelot.c
@@ -143,13 +143,6 @@ static struct sk_buff *ocelot_xmit(struct sk_buff *skb,
 	struct ocelot_port *ocelot_port;
 	u8 *prefix, *injection;
 	u64 qos_class, rew_op;
-	int err;
-
-	err = skb_cow_head(skb, OCELOT_TOTAL_TAG_LEN);
-	if (unlikely(err < 0)) {
-		netdev_err(netdev, "Cannot make room for tag.\n");
-		return NULL;
-	}
 
 	ocelot_port = ocelot->ports[dp->index];
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 07/13] net: dsa: tag_mtk: let DSA core deal with TX reallocation
  2020-10-17 21:35 [RFC PATCH 00/13] Generic TX reallocation for DSA Vladimir Oltean
                   ` (5 preceding siblings ...)
  2020-10-17 21:36 ` [RFC PATCH 06/13] net: dsa: tag_ocelot: " Vladimir Oltean
@ 2020-10-17 21:36 ` Vladimir Oltean
  2020-10-17 21:36 ` [RFC PATCH 08/13] net: dsa: tag_lan9303: " Vladimir Oltean
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 21:36 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach, DENG Qingfang, Sean Wang, John Crispin

Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/tag_mtk.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
index 4cdd9cf428fb..38dcdded74c0 100644
--- a/net/dsa/tag_mtk.c
+++ b/net/dsa/tag_mtk.c
@@ -34,9 +34,6 @@ static struct sk_buff *mtk_tag_xmit(struct sk_buff *skb,
 	 * table with VID.
 	 */
 	if (!skb_vlan_tagged(skb)) {
-		if (skb_cow_head(skb, MTK_HDR_LEN) < 0)
-			return NULL;
-
 		skb_push(skb, MTK_HDR_LEN);
 		memmove(skb->data, skb->data + MTK_HDR_LEN, 2 * ETH_ALEN);
 		is_vlan_skb = false;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 08/13] net: dsa: tag_lan9303: let DSA core deal with TX reallocation
  2020-10-17 21:35 [RFC PATCH 00/13] Generic TX reallocation for DSA Vladimir Oltean
                   ` (6 preceding siblings ...)
  2020-10-17 21:36 ` [RFC PATCH 07/13] net: dsa: tag_mtk: " Vladimir Oltean
@ 2020-10-17 21:36 ` Vladimir Oltean
  2020-10-17 21:36 ` [RFC PATCH 09/13] net: dsa: tag_edsa: " Vladimir Oltean
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 21:36 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/tag_lan9303.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/net/dsa/tag_lan9303.c b/net/dsa/tag_lan9303.c
index ccfb6f641bbf..aa1318dccaf0 100644
--- a/net/dsa/tag_lan9303.c
+++ b/net/dsa/tag_lan9303.c
@@ -58,15 +58,6 @@ static struct sk_buff *lan9303_xmit(struct sk_buff *skb, struct net_device *dev)
 	__be16 *lan9303_tag;
 	u16 tag;
 
-	/* insert a special VLAN tag between the MAC addresses
-	 * and the current ethertype field.
-	 */
-	if (skb_cow_head(skb, LAN9303_TAG_LEN) < 0) {
-		dev_dbg(&dev->dev,
-			"Cannot make room for the special tag. Dropping packet\n");
-		return NULL;
-	}
-
 	/* provide 'LAN9303_TAG_LEN' bytes additional space */
 	skb_push(skb, LAN9303_TAG_LEN);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 09/13] net: dsa: tag_edsa: let DSA core deal with TX reallocation
  2020-10-17 21:35 [RFC PATCH 00/13] Generic TX reallocation for DSA Vladimir Oltean
                   ` (7 preceding siblings ...)
  2020-10-17 21:36 ` [RFC PATCH 08/13] net: dsa: tag_lan9303: " Vladimir Oltean
@ 2020-10-17 21:36 ` Vladimir Oltean
  2020-10-17 21:36 ` [RFC PATCH 10/13] net: dsa: tag_brcm: " Vladimir Oltean
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 21:36 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Note that the VLAN code path needs a smaller extra headroom than the
regular EtherType DSA path. That isn't a problem, because this tagger
declares the larger tag length (8 bytes vs 4) as the protocol overhead,
so we are covered in both cases.

Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/tag_edsa.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c
index 120614240319..abf70a29deb4 100644
--- a/net/dsa/tag_edsa.c
+++ b/net/dsa/tag_edsa.c
@@ -35,8 +35,6 @@ static struct sk_buff *edsa_xmit(struct sk_buff *skb, struct net_device *dev)
 	 * current ethertype field if the packet is untagged.
 	 */
 	if (skb->protocol == htons(ETH_P_8021Q)) {
-		if (skb_cow_head(skb, DSA_HLEN) < 0)
-			return NULL;
 		skb_push(skb, DSA_HLEN);
 
 		memmove(skb->data, skb->data + DSA_HLEN, 2 * ETH_ALEN);
@@ -60,8 +58,6 @@ static struct sk_buff *edsa_xmit(struct sk_buff *skb, struct net_device *dev)
 			edsa_header[6] &= ~0x10;
 		}
 	} else {
-		if (skb_cow_head(skb, EDSA_HLEN) < 0)
-			return NULL;
 		skb_push(skb, EDSA_HLEN);
 
 		memmove(skb->data, skb->data + EDSA_HLEN, 2 * ETH_ALEN);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 10/13] net: dsa: tag_brcm: let DSA core deal with TX reallocation
  2020-10-17 21:35 [RFC PATCH 00/13] Generic TX reallocation for DSA Vladimir Oltean
                   ` (8 preceding siblings ...)
  2020-10-17 21:36 ` [RFC PATCH 09/13] net: dsa: tag_edsa: " Vladimir Oltean
@ 2020-10-17 21:36 ` Vladimir Oltean
  2020-10-17 21:36 ` [RFC PATCH 11/13] net: dsa: tag_dsa: " Vladimir Oltean
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 21:36 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/tag_brcm.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index ad72dff8d524..e934dace3922 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -66,9 +66,6 @@ static struct sk_buff *brcm_tag_xmit_ll(struct sk_buff *skb,
 	u16 queue = skb_get_queue_mapping(skb);
 	u8 *brcm_tag;
 
-	if (skb_cow_head(skb, BRCM_TAG_LEN) < 0)
-		return NULL;
-
 	/* The Ethernet switch we are interfaced with needs packets to be at
 	 * least 64 bytes (including FCS) otherwise they will be discarded when
 	 * they enter the switch port logic. When Broadcom tags are enabled, we
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 11/13] net: dsa: tag_dsa: let DSA core deal with TX reallocation
  2020-10-17 21:35 [RFC PATCH 00/13] Generic TX reallocation for DSA Vladimir Oltean
                   ` (9 preceding siblings ...)
  2020-10-17 21:36 ` [RFC PATCH 10/13] net: dsa: tag_brcm: " Vladimir Oltean
@ 2020-10-17 21:36 ` Vladimir Oltean
  2020-10-17 21:36 ` [RFC PATCH 12/13] net: dsa: tag_gswip: " Vladimir Oltean
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 21:36 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Similar to the EtherType DSA tagger, the old Marvell tagger can
transform an 802.1Q header if present into a DSA tag, so there is no
headroom required in that case. But we are ensuring that it exists,
regardless (practically speaking, the headroom must be 4 bytes larger
than it needs to be).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/tag_dsa.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index 0b756fae68a5..63d690a0fca6 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -23,9 +23,6 @@ static struct sk_buff *dsa_xmit(struct sk_buff *skb, struct net_device *dev)
 	 * the ethertype field for untagged packets.
 	 */
 	if (skb->protocol == htons(ETH_P_8021Q)) {
-		if (skb_cow_head(skb, 0) < 0)
-			return NULL;
-
 		/*
 		 * Construct tagged FROM_CPU DSA tag from 802.1q tag.
 		 */
@@ -41,8 +38,6 @@ static struct sk_buff *dsa_xmit(struct sk_buff *skb, struct net_device *dev)
 			dsa_header[2] &= ~0x10;
 		}
 	} else {
-		if (skb_cow_head(skb, DSA_HLEN) < 0)
-			return NULL;
 		skb_push(skb, DSA_HLEN);
 
 		memmove(skb->data, skb->data + DSA_HLEN, 2 * ETH_ALEN);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 12/13] net: dsa: tag_gswip: let DSA core deal with TX reallocation
  2020-10-17 21:35 [RFC PATCH 00/13] Generic TX reallocation for DSA Vladimir Oltean
                   ` (10 preceding siblings ...)
  2020-10-17 21:36 ` [RFC PATCH 11/13] net: dsa: tag_dsa: " Vladimir Oltean
@ 2020-10-17 21:36 ` Vladimir Oltean
  2020-10-17 22:19   ` Vladimir Oltean
  2020-10-17 21:36 ` [RFC PATCH 13/13] net: dsa: tag_ar9331: " Vladimir Oltean
  2020-10-17 23:07 ` [RFC PATCH 00/13] Generic TX reallocation for DSA Andrew Lunn
  13 siblings, 1 reply; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 21:36 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach, Hauke Mehrtens

Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

This one is interesting, the DSA tag is 8 bytes on RX and 4 bytes on TX.
Because DSA is unaware of asymmetrical tag lengths, the overhead/needed
headroom is declared as 8 bytes and therefore 4 bytes larger than it
needs to be. If this becomes a problem, and the GSWIP driver can't be
converted to a uniform header length, we might need to make DSA aware of
separate RX/TX overhead values.

Cc: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/tag_gswip.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/net/dsa/tag_gswip.c b/net/dsa/tag_gswip.c
index 408d4af390a0..cde93ccb21ac 100644
--- a/net/dsa/tag_gswip.c
+++ b/net/dsa/tag_gswip.c
@@ -63,10 +63,6 @@ static struct sk_buff *gswip_tag_xmit(struct sk_buff *skb,
 	int err;
 	u8 *gswip_tag;
 
-	err = skb_cow_head(skb, GSWIP_TX_HEADER_LEN);
-	if (err)
-		return NULL;
-
 	skb_push(skb, GSWIP_TX_HEADER_LEN);
 
 	gswip_tag = skb->data;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 13/13] net: dsa: tag_ar9331: let DSA core deal with TX reallocation
  2020-10-17 21:35 [RFC PATCH 00/13] Generic TX reallocation for DSA Vladimir Oltean
                   ` (11 preceding siblings ...)
  2020-10-17 21:36 ` [RFC PATCH 12/13] net: dsa: tag_gswip: " Vladimir Oltean
@ 2020-10-17 21:36 ` Vladimir Oltean
  2020-10-17 23:07 ` [RFC PATCH 00/13] Generic TX reallocation for DSA Andrew Lunn
  13 siblings, 0 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 21:36 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach, Per Forlin, Oleksij Rempel

Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.

Cc: Per Forlin <per.forlin@axis.com>
Cc: Oleksij Rempel <linux@rempel-privat.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/tag_ar9331.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/dsa/tag_ar9331.c b/net/dsa/tag_ar9331.c
index 55b00694cdba..002cf7f952e2 100644
--- a/net/dsa/tag_ar9331.c
+++ b/net/dsa/tag_ar9331.c
@@ -31,9 +31,6 @@ static struct sk_buff *ar9331_tag_xmit(struct sk_buff *skb,
 	__le16 *phdr;
 	u16 hdr;
 
-	if (skb_cow_head(skb, AR9331_HDR_LEN) < 0)
-		return NULL;
-
 	phdr = skb_push(skb, AR9331_HDR_LEN);
 
 	hdr = FIELD_PREP(AR9331_HDR_VERSION_MASK, AR9331_HDR_VERSION);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure
  2020-10-17 21:36 ` [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure Vladimir Oltean
@ 2020-10-17 22:01   ` Vladimir Oltean
  2020-10-17 22:11     ` Florian Fainelli
  2020-10-17 22:31     ` Vladimir Oltean
  0 siblings, 2 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 22:01 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

On Sun, Oct 18, 2020 at 12:36:00AM +0300, Vladimir Oltean wrote:
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index d4326940233c..790f5c8deb13 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -548,6 +548,36 @@ netdev_tx_t dsa_enqueue_skb(struct sk_buff *skb, struct net_device *dev)
>  }
>  EXPORT_SYMBOL_GPL(dsa_enqueue_skb);
>  
> +static int dsa_realloc_skb(struct sk_buff *skb, struct net_device *dev)

I forgot to actually pad the skb here, if it's a tail tagger, silly me.
The following changes should do the trick.

> +{
> +	struct net_device *master = dsa_slave_to_master(dev);

The addition of master->needed_headroom and master->needed_tailroom used
to be here, that's why this unused variable is here.

> +	struct dsa_slave_priv *p = netdev_priv(dev);
> +	struct dsa_slave_stats *e;
> +	int headroom, tailroom;
	int padlen = 0, err;
> +
> +	headroom = dev->needed_headroom;
> +	tailroom = dev->needed_tailroom;
> +	/* For tail taggers, we need to pad short frames ourselves, to ensure
> +	 * that the tail tag does not fail at its role of being at the end of
> +	 * the packet, once the master interface pads the frame.
> +	 */
> +	if (unlikely(tailroom && skb->len < ETH_ZLEN))
> +		tailroom += ETH_ZLEN - skb->len;
		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
		padlen = ETH_ZLEN - skb->len;
	tailroom += padlen;
> +
> +	if (likely(skb_headroom(skb) >= headroom &&
> +		   skb_tailroom(skb) >= tailroom) &&
> +		   !skb_cloned(skb))
> +		/* No reallocation needed, yay! */
> +		return 0;
> +
> +	e = this_cpu_ptr(p->extra_stats);
> +	u64_stats_update_begin(&e->syncp);
> +	e->tx_reallocs++;
> +	u64_stats_update_end(&e->syncp);
> +
> +	return pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC);
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	err = pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC);
	if (err < 0 || !padlen)
		return err;

	return __skb_put_padto(skb, padlen, false);
> +}
> +
>  static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	struct dsa_slave_priv *p = netdev_priv(dev);
> @@ -567,6 +597,11 @@ static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev)
>  	 */
>  	dsa_skb_tx_timestamp(p, skb);
>  
> +	if (dsa_realloc_skb(skb, dev)) {
> +		kfree_skb(skb);
> +		return NETDEV_TX_OK;
> +	}
> +
>  	/* Transmit function may have to reallocate the original SKB,
>  	 * in which case it must have freed it. Only free it here on error.
>  	 */
> @@ -1802,6 +1837,14 @@ int dsa_slave_create(struct dsa_port *port)
>  	slave_dev->netdev_ops = &dsa_slave_netdev_ops;
>  	if (ds->ops->port_max_mtu)
>  		slave_dev->max_mtu = ds->ops->port_max_mtu(ds, port->index);
> +	/* Try to save one extra realloc later in the TX path (in the master)
> +	 * by also inheriting the master's needed headroom and tailroom.
> +	 * The 8021q driver also does this.
> +	 */

Also, this comment is bogus given the current code. It should be removed
from here, and...

> +	if (cpu_dp->tag_ops->tail_tag)
> +		slave_dev->needed_tailroom = cpu_dp->tag_ops->overhead;
> +	else
> +		slave_dev->needed_headroom = cpu_dp->tag_ops->overhead;
...put here, along with:
	slave_dev->needed_headroom += master->needed_headroom;
	slave_dev->needed_tailroom += master->needed_tailroom;
>  	SET_NETDEV_DEVTYPE(slave_dev, &dsa_type);
>  
>  	netdev_for_each_tx_queue(slave_dev, dsa_slave_set_lockdep_class_one,
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure
  2020-10-17 22:01   ` Vladimir Oltean
@ 2020-10-17 22:11     ` Florian Fainelli
  2020-10-17 22:17       ` Vladimir Oltean
  2020-10-19  8:33       ` David Laight
  2020-10-17 22:31     ` Vladimir Oltean
  1 sibling, 2 replies; 49+ messages in thread
From: Florian Fainelli @ 2020-10-17 22:11 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: andrew, vivien.didelot, kuba, Christian Eggers, Kurt Kanzenbach



On 10/17/2020 3:01 PM, Vladimir Oltean wrote:
> On Sun, Oct 18, 2020 at 12:36:00AM +0300, Vladimir Oltean wrote:
>> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
>> index d4326940233c..790f5c8deb13 100644
>> --- a/net/dsa/slave.c
>> +++ b/net/dsa/slave.c
>> @@ -548,6 +548,36 @@ netdev_tx_t dsa_enqueue_skb(struct sk_buff *skb, struct net_device *dev)
>>   }
>>   EXPORT_SYMBOL_GPL(dsa_enqueue_skb);
>>   
>> +static int dsa_realloc_skb(struct sk_buff *skb, struct net_device *dev)
> 
> I forgot to actually pad the skb here, if it's a tail tagger, silly me.
> The following changes should do the trick.
> 
>> +{
>> +	struct net_device *master = dsa_slave_to_master(dev);
> 
> The addition of master->needed_headroom and master->needed_tailroom used
> to be here, that's why this unused variable is here.
> 
>> +	struct dsa_slave_priv *p = netdev_priv(dev);
>> +	struct dsa_slave_stats *e;
>> +	int headroom, tailroom;
> 	int padlen = 0, err;
>> +
>> +	headroom = dev->needed_headroom;
>> +	tailroom = dev->needed_tailroom;
>> +	/* For tail taggers, we need to pad short frames ourselves, to ensure
>> +	 * that the tail tag does not fail at its role of being at the end of
>> +	 * the packet, once the master interface pads the frame.
>> +	 */
>> +	if (unlikely(tailroom && skb->len < ETH_ZLEN))
>> +		tailroom += ETH_ZLEN - skb->len;
> 		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 		padlen = ETH_ZLEN - skb->len;
> 	tailroom += padlen;
>> +
>> +	if (likely(skb_headroom(skb) >= headroom &&
>> +		   skb_tailroom(skb) >= tailroom) &&
>> +		   !skb_cloned(skb))
>> +		/* No reallocation needed, yay! */
>> +		return 0;
>> +
>> +	e = this_cpu_ptr(p->extra_stats);
>> +	u64_stats_update_begin(&e->syncp);
>> +	e->tx_reallocs++;
>> +	u64_stats_update_end(&e->syncp);
>> +
>> +	return pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC);
> 	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 	err = pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC);
> 	if (err < 0 || !padlen)
> 		return err;
> 
> 	return __skb_put_padto(skb, padlen, false);
>> +}
>> +
>>   static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev)
>>   {
>>   	struct dsa_slave_priv *p = netdev_priv(dev);
>> @@ -567,6 +597,11 @@ static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev)
>>   	 */
>>   	dsa_skb_tx_timestamp(p, skb);
>>   
>> +	if (dsa_realloc_skb(skb, dev)) {
>> +		kfree_skb(skb);
>> +		return NETDEV_TX_OK;
>> +	}
>> +
>>   	/* Transmit function may have to reallocate the original SKB,
>>   	 * in which case it must have freed it. Only free it here on error.
>>   	 */
>> @@ -1802,6 +1837,14 @@ int dsa_slave_create(struct dsa_port *port)
>>   	slave_dev->netdev_ops = &dsa_slave_netdev_ops;
>>   	if (ds->ops->port_max_mtu)
>>   		slave_dev->max_mtu = ds->ops->port_max_mtu(ds, port->index);
>> +	/* Try to save one extra realloc later in the TX path (in the master)
>> +	 * by also inheriting the master's needed headroom and tailroom.
>> +	 * The 8021q driver also does this.
>> +	 */
> 
> Also, this comment is bogus given the current code. It should be removed
> from here, and...
> 
>> +	if (cpu_dp->tag_ops->tail_tag)
>> +		slave_dev->needed_tailroom = cpu_dp->tag_ops->overhead;
>> +	else
>> +		slave_dev->needed_headroom = cpu_dp->tag_ops->overhead;
> ...put here, along with:
> 	slave_dev->needed_headroom += master->needed_headroom;
> 	slave_dev->needed_tailroom += master->needed_tailroom;

Not positive you need that because you may be account for more head or 
tail room than necessary.

For instance with tag_brcm.c and systemport.c we need 4 bytes of head 
room for the Broadcom tag and an additional 8 bytes for pushing the 
transmit status block descriptor in front of the Ethernet frame about to 
be transmitted. These additional 8 bytes are a requirement of the DSA 
master here and exist regardless of DSA being used, but we should not be 
propagating them to the DSA slave.
-- 
Florian

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-17 21:35 ` [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics Vladimir Oltean
@ 2020-10-17 22:13   ` Florian Fainelli
  2020-10-17 23:15   ` Andrew Lunn
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 49+ messages in thread
From: Florian Fainelli @ 2020-10-17 22:13 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: andrew, vivien.didelot, kuba, Christian Eggers, Kurt Kanzenbach



On 10/17/2020 2:35 PM, Vladimir Oltean wrote:
> DSA needs to push a header onto every packet on TX, and this might cause
> reallocation under certain scenarios, which might affect, for example,
> performance.
> 
> But reallocated packets are not standardized in struct pcpu_sw_netstats,
> struct net_device_stats or anywhere else, it seems, so we need to roll
> our own extra netdevice statistics and expose them to ethtool.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure
  2020-10-17 22:11     ` Florian Fainelli
@ 2020-10-17 22:17       ` Vladimir Oltean
  2020-10-18  0:37         ` Florian Fainelli
  2020-10-19  8:33       ` David Laight
  1 sibling, 1 reply; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 22:17 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, andrew, vivien.didelot, kuba, Christian Eggers, Kurt Kanzenbach

On Sat, Oct 17, 2020 at 03:11:52PM -0700, Florian Fainelli wrote:
> > 	slave_dev->needed_headroom += master->needed_headroom;
> > 	slave_dev->needed_tailroom += master->needed_tailroom;
> 
> Not positive you need that because you may be account for more head or tail
> room than necessary.
> 
> For instance with tag_brcm.c and systemport.c we need 4 bytes of head room
> for the Broadcom tag and an additional 8 bytes for pushing the transmit
> status block descriptor in front of the Ethernet frame about to be
> transmitted. These additional 8 bytes are a requirement of the DSA master
> here and exist regardless of DSA being used, but we should not be
> propagating them to the DSA slave.

And that's exactly what I'm trying to do here, do you see any problem
with it? Basically I'm telling the network stack to allocate skbs with
large enough headroom and tailroom so that reallocations will not be
necessary for its entire TX journey. Not in DSA and not in the
systemport either. That's the exact reason why the VLAN driver does this
too, as far as I understand. Doing this trick also has the benefit that
it works with stacked DSA devices too. The real master has a headroom
of, say, 16 bytes, the first-level switch has 16 bytes, and the
second-level switch has 16 more bytes. So when you inject an skb into
the second-level switch (the one with the user ports that applications
will use), the skb will be reallocated only once, with a new headroom of
16 * 3 bytes, instead of potentially 3 times (incrementally, first for
16, then for 32, then for 48). Am I missing something?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 12/13] net: dsa: tag_gswip: let DSA core deal with TX reallocation
  2020-10-17 21:36 ` [RFC PATCH 12/13] net: dsa: tag_gswip: " Vladimir Oltean
@ 2020-10-17 22:19   ` Vladimir Oltean
  0 siblings, 0 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 22:19 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach, Hauke Mehrtens

On Sun, Oct 18, 2020 at 12:36:10AM +0300, Vladimir Oltean wrote:
> diff --git a/net/dsa/tag_gswip.c b/net/dsa/tag_gswip.c
> index 408d4af390a0..cde93ccb21ac 100644
> --- a/net/dsa/tag_gswip.c
> +++ b/net/dsa/tag_gswip.c
> @@ -63,10 +63,6 @@ static struct sk_buff *gswip_tag_xmit(struct sk_buff *skb,
>  	int err;

Unused variable "err" left behind, will delete.

>  	u8 *gswip_tag;
>
> -	err = skb_cow_head(skb, GSWIP_TX_HEADER_LEN);
> -	if (err)
> -		return NULL;
> -
>  	skb_push(skb, GSWIP_TX_HEADER_LEN);
>
>  	gswip_tag = skb->data;
> --

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure
  2020-10-17 22:01   ` Vladimir Oltean
  2020-10-17 22:11     ` Florian Fainelli
@ 2020-10-17 22:31     ` Vladimir Oltean
  2020-10-18  0:13       ` Vladimir Oltean
  1 sibling, 1 reply; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-17 22:31 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

On Sun, Oct 18, 2020 at 01:01:04AM +0300, Vladimir Oltean wrote:
> > +	return pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC);
> 	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 	err = pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC);
> 	if (err < 0 || !padlen)
> 		return err;
> 
> 	return __skb_put_padto(skb, padlen, false);

Oops, another one here. Should be:

	return __skb_put_padto(skb, skb->len + padlen, false);
> > +}

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 00/13] Generic TX reallocation for DSA
  2020-10-17 21:35 [RFC PATCH 00/13] Generic TX reallocation for DSA Vladimir Oltean
                   ` (12 preceding siblings ...)
  2020-10-17 21:36 ` [RFC PATCH 13/13] net: dsa: tag_ar9331: " Vladimir Oltean
@ 2020-10-17 23:07 ` Andrew Lunn
  13 siblings, 0 replies; 49+ messages in thread
From: Andrew Lunn @ 2020-10-17 23:07 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

Hi Vladimir

> For example, the tail tagging driver for Marvell 88E6060 currently
> reallocates _every_single_frame_ on TX. Is that an obvious
> indication that nobody is using it?

We have had very few patches for that driver, so i would agree with
you, it is probably not used any more. It could even be something we
consider moving to staging and then out of the kernel.

> DSA has all the information it needs in order to simplify the job of a
> tagger on TX. It knows whether it's a normal or a tail tagger, and what
> is the protocol overhead it incurs. So why not just perform the
> reallocation centrally, which also has the benefit of being able to
> introduce a common ethtool statistics counter for number of TX reallocs.
> With the latter, performance issues due to this particular reason are
> easy to track down.

All sounds good to me, in principle. But the devil is in the details.

    Andrew

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-17 21:35 ` [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics Vladimir Oltean
  2020-10-17 22:13   ` Florian Fainelli
@ 2020-10-17 23:15   ` Andrew Lunn
  2020-10-18  0:23     ` Vladimir Oltean
  2020-10-18 12:02   ` Heiner Kallweit
  2020-10-18 16:49   ` Jakub Kicinski
  3 siblings, 1 reply; 49+ messages in thread
From: Andrew Lunn @ 2020-10-17 23:15 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

> +/* Driver statistics, other than those in struct rtnl_link_stats64.
> + * These are collected per-CPU and aggregated by ethtool.
> + */
> +struct dsa_slave_stats {
> +	__u64			tx_reallocs;
> +	struct u64_stats_sync	syncp;
> +} __aligned(1 * sizeof(u64));

The convention seems to be to use a prefix of pcpu_,
e.g. pcpu_sw_netstats, pcpu_lstats.

Also, why __u64? Neither pcpu_sw_netstats nor pcpu_lstats use __u64.

      Andrew

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure
  2020-10-17 22:31     ` Vladimir Oltean
@ 2020-10-18  0:13       ` Vladimir Oltean
  2020-10-18 10:36         ` Christian Eggers
  0 siblings, 1 reply; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-18  0:13 UTC (permalink / raw)
  To: netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

On Sun, Oct 18, 2020 at 01:31:20AM +0300, Vladimir Oltean wrote:
> On Sun, Oct 18, 2020 at 01:01:04AM +0300, Vladimir Oltean wrote:
> > > +	return pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC);
> > 	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > 	err = pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC);
> > 	if (err < 0 || !padlen)
> > 		return err;
> > 
> > 	return __skb_put_padto(skb, padlen, false);
> 
> Oops, another one here. Should be:
> 
> 	return __skb_put_padto(skb, skb->len + padlen, false);
> > > +}

Last one for today. This should actually be correct now, and not
allocate double the needed headroom size.

static int dsa_realloc_skb(struct sk_buff *skb, struct net_device *dev)
{
	struct dsa_slave_priv *p = netdev_priv(dev);
	struct dsa_slave_stats *e;
	int needed_headroom;
	int needed_tailroom;
	int padlen = 0, err;

	needed_headroom = dev->needed_headroom;
	needed_tailroom = dev->needed_tailroom;
	/* For tail taggers, we need to pad short frames ourselves, to ensure
	 * that the tail tag does not fail at its role of being at the end of
	 * the packet, once the master interface pads the frame.
	 */
	if (unlikely(needed_tailroom && skb->len < ETH_ZLEN))
		padlen = ETH_ZLEN - skb->len;
	needed_tailroom += padlen;
	needed_headroom -= skb_headroom(skb);
	needed_tailroom -= skb_tailroom(skb);

	if (likely(needed_headroom <= 0 && needed_tailroom <= 0 &&
		   !skb_cloned(skb)))
		/* No reallocation needed, yay! */
		return 0;

	e = this_cpu_ptr(p->extra_stats);
	u64_stats_update_begin(&e->syncp);
	e->tx_reallocs++;
	u64_stats_update_end(&e->syncp);

	err = pskb_expand_head(skb, max(needed_headroom, 0),
			       max(needed_tailroom, 0), GFP_ATOMIC);
	if (err < 0 || !padlen)
		return err;

	return __skb_put_padto(skb, ETH_ZLEN, false);
}

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-17 23:15   ` Andrew Lunn
@ 2020-10-18  0:23     ` Vladimir Oltean
  0 siblings, 0 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-18  0:23 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

On Sun, Oct 18, 2020 at 01:15:51AM +0200, Andrew Lunn wrote:
> > +/* Driver statistics, other than those in struct rtnl_link_stats64.
> > + * These are collected per-CPU and aggregated by ethtool.
> > + */
> > +struct dsa_slave_stats {
> > +	__u64			tx_reallocs;
> > +	struct u64_stats_sync	syncp;
> > +} __aligned(1 * sizeof(u64));
> 
> The convention seems to be to use a prefix of pcpu_,
> e.g. pcpu_sw_netstats, pcpu_lstats.

I find the "pcpu_sw_netstats" to be long and useless. I can easily see
it's percpu based on its usage, I don't need to have it in its name.

> Also, why __u64? Neither pcpu_sw_netstats nor pcpu_lstats use __u64.

Ok, I'll confess I stole the beginning from the dpaa2-eth driver prior
to commit d70446ee1f40 ("dpaa2-eth: send a scatter-gather FD instead of
realloc-ing"), since I knew it used to implement this counter. Then I
combined with what was already there for the standard statistics in DSA.

But to be honest, what I dislike a little bit is that we have 2
structures now. For example, netronome nfp has created one struct
nfp_repr_pcpu_stats that holds everything, even if it duplicates some
counters found elsewhere. But I think that's a bit easier to digest from
a maintainability point of view, what do you think?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure
  2020-10-17 22:17       ` Vladimir Oltean
@ 2020-10-18  0:37         ` Florian Fainelli
  0 siblings, 0 replies; 49+ messages in thread
From: Florian Fainelli @ 2020-10-18  0:37 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, andrew, vivien.didelot, kuba, Christian Eggers, Kurt Kanzenbach



On 10/17/2020 3:17 PM, Vladimir Oltean wrote:
> On Sat, Oct 17, 2020 at 03:11:52PM -0700, Florian Fainelli wrote:
>>> 	slave_dev->needed_headroom += master->needed_headroom;
>>> 	slave_dev->needed_tailroom += master->needed_tailroom;
>>
>> Not positive you need that because you may be account for more head or tail
>> room than necessary.
>>
>> For instance with tag_brcm.c and systemport.c we need 4 bytes of head room
>> for the Broadcom tag and an additional 8 bytes for pushing the transmit
>> status block descriptor in front of the Ethernet frame about to be
>> transmitted. These additional 8 bytes are a requirement of the DSA master
>> here and exist regardless of DSA being used, but we should not be
>> propagating them to the DSA slave.
> 
> And that's exactly what I'm trying to do here, do you see any problem
> with it? Basically I'm telling the network stack to allocate skbs with
> large enough headroom and tailroom so that reallocations will not be
> necessary for its entire TX journey. Not in DSA and not in the
> systemport either. That's the exact reason why the VLAN driver does this
> too, as far as I understand. Doing this trick also has the benefit that
> it works with stacked DSA devices too. The real master has a headroom
> of, say, 16 bytes, the first-level switch has 16 bytes, and the
> second-level switch has 16 more bytes. So when you inject an skb into
> the second-level switch (the one with the user ports that applications
> will use), the skb will be reallocated only once, with a new headroom of
> 16 * 3 bytes, instead of potentially 3 times (incrementally, first for
> 16, then for 32, then for 48). Am I missing something?
> 

That is fine with me, given that we can resolve most of the TX path 
ahead of time, I suppose we should indeed take advantage of that 
knowledge. Thanks!
-- 
Florian

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure
  2020-10-18  0:13       ` Vladimir Oltean
@ 2020-10-18 10:36         ` Christian Eggers
  2020-10-18 11:42           ` Vladimir Oltean
  0 siblings, 1 reply; 49+ messages in thread
From: Christian Eggers @ 2020-10-18 10:36 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, andrew, f.fainelli, vivien.didelot, kuba, Kurt Kanzenbach

Hi Vladimir,

thank you very much for bringing some progress into this.

I tried to test on top of latest net-next, but I currently get a linker error (not  related to this):
arch/arm/vfp/vfphw.o: in function `vfp_support_entry':
arch/arm/vfp/vfphw.S:85:(.text+0xa): relocation truncated to fit: R_ARM_THM_JUMP19 against symbol `vfp_kmode_exception' defined in .text.unlikely section in arch/arm/vfp/vfpmodule.o

So continued testing of your series (with all updates) and my PTP work on top
of net-next from 2020-10-14.

On Sunday, 18 October 2020, 02:13:27 CEST, Vladimir Oltean wrote:
> Last one for today. This should actually be correct now, and not
> allocate double the needed headroom size.
> 
> static int dsa_realloc_skb(struct sk_buff *skb, struct net_device *dev)
> {
> 	struct dsa_slave_priv *p = netdev_priv(dev);
> 	struct dsa_slave_stats *e;
> 	int needed_headroom;
> 	int needed_tailroom;
> 	int padlen = 0, err;
> 
> 	needed_headroom = dev->needed_headroom;
> 	needed_tailroom = dev->needed_tailroom;
Debugging shows that these values are correct for my device (0/5).

> 	/* For tail taggers, we need to pad short frames ourselves, to ensure
> 	 * that the tail tag does not fail at its role of being at the end of
> 	 * the packet, once the master interface pads the frame.
> 	 */
> 	if (unlikely(needed_tailroom && skb->len < ETH_ZLEN))
Can "needed_tailroom" be used equivalently for dsa_device_ops::tail_tag?

> 		padlen = ETH_ZLEN - skb->len;
> 	needed_tailroom += padlen;
> 	needed_headroom -= skb_headroom(skb);
> 	needed_tailroom -= skb_tailroom(skb);
> 
> 	if (likely(needed_headroom <= 0 && needed_tailroom <= 0 &&
> 		   !skb_cloned(skb)))
> 		/* No reallocation needed, yay! */
> 		return 0;
> 
> 	e = this_cpu_ptr(p->extra_stats);
> 	u64_stats_update_begin(&e->syncp);
> 	e->tx_reallocs++;
> 	u64_stats_update_end(&e->syncp);
> 
> 	err = pskb_expand_head(skb, max(needed_headroom, 0),
> 			       max(needed_tailroom, 0), GFP_ATOMIC);
You may remove the second max() statement (around needed_tailroom). This would
size the reallocated skb more exactly to the size actually required an may save
some RAM (already tested too).

Alternatively I suggest moving the max() statements up in order to completely
avoid negative values for needed_headroom / needed_tailroom:

	needed_headroom = max(needed_headroom - skb_headroom(skb), 0);
	needed_tailroom = max(needed_tailroom - skb_tailroom(skb), 0);

	if (likely(needed_headroom == 0 && needed_tailroom == 0 &&
		   !skb_cloned(skb)))
		/* No reallocation needed, yay! */
		return 0;
	...
	err = pskb_expand_head(skb, needed_headroom, needed_tailroom, GFP_ATOMIC);

> 	if (err < 0 || !padlen)
> 		return err;
This is correct but looks misleading for me. What about...
	if (err < 0)
		return err;

	return padlen ? __skb_put_padto(skb, ETH_ZLEN, false) : 0;


FYI two dumps of a padded skb (before/after) dsa_realloc_skb():
[ 1983.621180] old:skb len=58 headroom=2 headlen=58 tailroom=68
[ 1983.621180] mac=(2,14) net=(16,-1) trans=-1
[ 1983.621180] shinfo(txflags=1 nr_frags=0 gso(size=0 type=0 segs=0))
[ 1983.621180] csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
[ 1983.621180] hash(0x0 sw=0 l4=0) proto=0x88f7 pkttype=0 iif=0
[ 1983.635575] old:dev name=lan0 feat=0x0x0002000000005220
[ 1983.638205] old:sk family=17 type=3 proto=0
[ 1983.640323] old:skb linear:   00000000: 01 1b 19 00 00 00 ee 97 1f aa 93 21 88 f7 01 02
[ 1983.644416] old:skb linear:   00000010: 00 2c 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1983.648656] old:skb linear:   00000020: 00 00 ee 97 1f ff fe aa 93 21 00 01 06 79 01 7f
[ 1983.652726] old:skb linear:   00000030: 00 00 00 00 00 00 00 00 00 00

[ 1983.656040] new:skb len=60 headroom=2 headlen=60 tailroom=66
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ok
[ 1983.656040] mac=(2,14) net=(16,-1) trans=-1
[ 1983.656040] shinfo(txflags=1 nr_frags=0 gso(size=0 type=0 segs=0))
[ 1983.656040] csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
[ 1983.656040] hash(0x0 sw=0 l4=0) proto=0x88f7 pkttype=0 iif=0
[ 1983.670427] new:dev name=lan0 feat=0x0x0002000000005220
[ 1983.673082] new:sk family=17 type=3 proto=0
[ 1983.675233] new:skb linear:   00000000: 01 1b 19 00 00 00 ee 97 1f aa 93 21 88 f7 01 02
[ 1983.679329] new:skb linear:   00000010: 00 2c 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1983.683411] new:skb linear:   00000020: 00 00 ee 97 1f ff fe aa 93 21 00 01 06 79 01 7f
[ 1983.687506] new:skb linear:   00000030: 00 00 00 00 00 00 00 00 00 00 00 00

Tested-by: Christian Eggers <ceggers@arri.de>  # For tail taggers only

Best regards
Christian Eggers




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure
  2020-10-18 10:36         ` Christian Eggers
@ 2020-10-18 11:42           ` Vladimir Oltean
  2020-10-18 11:59             ` Christian Eggers
  0 siblings, 1 reply; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-18 11:42 UTC (permalink / raw)
  To: Christian Eggers
  Cc: netdev, andrew, f.fainelli, vivien.didelot, kuba, Kurt Kanzenbach

On Sun, Oct 18, 2020 at 12:36:03PM +0200, Christian Eggers wrote:
> >       /* For tail taggers, we need to pad short frames ourselves, to ensure
> >        * that the tail tag does not fail at its role of being at the end of
> >        * the packet, once the master interface pads the frame.
> >        */
> >       if (unlikely(needed_tailroom && skb->len < ETH_ZLEN))
> Can "needed_tailroom" be used equivalently for dsa_device_ops::tail_tag?

For what I care, it's "equivalent enough".
Since needed_tailroom comes from slave->needed_tailroom + master->needed_tailroom,
there might be a situation when slave->needed_tailroom is 0, but padding
is performed nonetheless. I am not sure that this is something that will
occur in practice. If you grep drivers/net/ethernet, only Sun Virtual Network
devices set dev->needed_tailroom. I would prefer avoiding to touch any
other cache line, and not duplicating the tail_tag or overhead members
if I can avoid it. If it becomes a problem, I'll make that change.

> >               padlen = ETH_ZLEN - skb->len;
> >       needed_tailroom += padlen;
> >       needed_headroom -= skb_headroom(skb);
> >       needed_tailroom -= skb_tailroom(skb);
> >
> >       if (likely(needed_headroom <= 0 && needed_tailroom <= 0 &&
> >                  !skb_cloned(skb)))
> >               /* No reallocation needed, yay! */
> >               return 0;
> >
> >       e = this_cpu_ptr(p->extra_stats);
> >       u64_stats_update_begin(&e->syncp);
> >       e->tx_reallocs++;
> >       u64_stats_update_end(&e->syncp);
> >
> >       err = pskb_expand_head(skb, max(needed_headroom, 0),
> >                              max(needed_tailroom, 0), GFP_ATOMIC);
> You may remove the second max() statement (around needed_tailroom). This would
> size the reallocated skb more exactly to the size actually required an may save
> some RAM (already tested too).

Please explain more. needed_tailroom can be negative, why should I
shrink the tailroom?

> Alternatively I suggest moving the max() statements up in order to completely
> avoid negative values for needed_headroom / needed_tailroom:
> 
>         needed_headroom = max(needed_headroom - skb_headroom(skb), 0);
>         needed_tailroom = max(needed_tailroom - skb_tailroom(skb), 0);
> 
>         if (likely(needed_headroom == 0 && needed_tailroom == 0 &&
>                    !skb_cloned(skb)))
>                 /* No reallocation needed, yay! */
>                 return 0;
>         ...
>         err = pskb_expand_head(skb, needed_headroom, needed_tailroom, GFP_ATOMIC);
> 

Ok, this looks better, thanks.

> >       if (err < 0 || !padlen)
> >               return err;
> This is correct but looks misleading for me. What about...
>         if (err < 0)
>                 return err;
> 
>         return padlen ? __skb_put_padto(skb, ETH_ZLEN, false) : 0;
> 

Ok, I suppose it can be misleading. Will do this even if it's one more
branch. It's in the unlikely path anyway.

> FYI two dumps of a padded skb (before/after) dsa_realloc_skb():
> [ 1983.621180] old:skb len=58 headroom=2 headlen=58 tailroom=68
> [ 1983.621180] mac=(2,14) net=(16,-1) trans=-1
> [ 1983.621180] shinfo(txflags=1 nr_frags=0 gso(size=0 type=0 segs=0))
> [ 1983.621180] csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
> [ 1983.621180] hash(0x0 sw=0 l4=0) proto=0x88f7 pkttype=0 iif=0
> [ 1983.635575] old:dev name=lan0 feat=0x0x0002000000005220
> [ 1983.638205] old:sk family=17 type=3 proto=0
> [ 1983.640323] old:skb linear:   00000000: 01 1b 19 00 00 00 ee 97 1f aa 93 21 88 f7 01 02
> [ 1983.644416] old:skb linear:   00000010: 00 2c 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 1983.648656] old:skb linear:   00000020: 00 00 ee 97 1f ff fe aa 93 21 00 01 06 79 01 7f
> [ 1983.652726] old:skb linear:   00000030: 00 00 00 00 00 00 00 00 00 00
> 
> [ 1983.656040] new:skb len=60 headroom=2 headlen=60 tailroom=66
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ok
> [ 1983.656040] mac=(2,14) net=(16,-1) trans=-1
> [ 1983.656040] shinfo(txflags=1 nr_frags=0 gso(size=0 type=0 segs=0))
> [ 1983.656040] csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
> [ 1983.656040] hash(0x0 sw=0 l4=0) proto=0x88f7 pkttype=0 iif=0
> [ 1983.670427] new:dev name=lan0 feat=0x0x0002000000005220
> [ 1983.673082] new:sk family=17 type=3 proto=0
> [ 1983.675233] new:skb linear:   00000000: 01 1b 19 00 00 00 ee 97 1f aa 93 21 88 f7 01 02
> [ 1983.679329] new:skb linear:   00000010: 00 2c 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 1983.683411] new:skb linear:   00000020: 00 00 ee 97 1f ff fe aa 93 21 00 01 06 79 01 7f
> [ 1983.687506] new:skb linear:   00000030: 00 00 00 00 00 00 00 00 00 00 00 00

The dumps look ok, and in line with what I saw.
For what it's worth, anybody can test with any tagger, you don' need
dedicated hardware. You just need to replace the value returned by your
dsa_switch_ops::get_tag_protocol method. This is enough to get skb dumps.
For more complicated things like ensuring 1588 timestamping
works, it won't be enough, of course, so your testing is still very
valuable to ensure that keeps working for you (it does for me).

> 
> Tested-by: Christian Eggers <ceggers@arri.de>  # For tail taggers only

Thanks, I'll resend this in about 2 weeks. In the meantime I'll update
this branch:
https://github.com/vladimiroltean/linux/commits/dsa-tx-realloc

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure
  2020-10-18 11:42           ` Vladimir Oltean
@ 2020-10-18 11:59             ` Christian Eggers
  2020-10-18 12:15               ` Vladimir Oltean
  0 siblings, 1 reply; 49+ messages in thread
From: Christian Eggers @ 2020-10-18 11:59 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, andrew, f.fainelli, vivien.didelot, kuba, Kurt Kanzenbach

On Sunday, 18 October 2020, 13:42:06 CEST, Vladimir Oltean wrote:
> On Sun, Oct 18, 2020 at 12:36:03PM +0200, Christian Eggers wrote:
> > >       err = pskb_expand_head(skb, max(needed_headroom, 0),
> > >       
> > >                              max(needed_tailroom, 0), GFP_ATOMIC);
> > 
> > You may remove the second max() statement (around needed_tailroom). This
> > would size the reallocated skb more exactly to the size actually required
> > an may save some RAM (already tested too).
> 
> Please explain more. needed_tailroom can be negative, why should I
> shrink the tailroom?
Because it will not be required anymore. This may lead to smaller memory 
allocations or the excess tailroom can be reused for headroom if needed. If 
none of both applies, the tailroom will not be changed.

regards
Christian




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-17 21:35 ` [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics Vladimir Oltean
  2020-10-17 22:13   ` Florian Fainelli
  2020-10-17 23:15   ` Andrew Lunn
@ 2020-10-18 12:02   ` Heiner Kallweit
  2020-10-18 12:16     ` Vladimir Oltean
  2020-10-18 16:49   ` Jakub Kicinski
  3 siblings, 1 reply; 49+ messages in thread
From: Heiner Kallweit @ 2020-10-18 12:02 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: andrew, f.fainelli, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

On 17.10.2020 23:35, Vladimir Oltean wrote:
> DSA needs to push a header onto every packet on TX, and this might cause
> reallocation under certain scenarios, which might affect, for example,
> performance.
> 
> But reallocated packets are not standardized in struct pcpu_sw_netstats,
> struct net_device_stats or anywhere else, it seems, so we need to roll
> our own extra netdevice statistics and expose them to ethtool.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  net/dsa/dsa_priv.h |  9 +++++++++
>  net/dsa/slave.c    | 25 ++++++++++++++++++++++---
>  2 files changed, 31 insertions(+), 3 deletions(-)
> 
> diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
> index 12998bf04e55..d39db7500cdd 100644
> --- a/net/dsa/dsa_priv.h
> +++ b/net/dsa/dsa_priv.h
> @@ -73,12 +73,21 @@ struct dsa_notifier_mtu_info {
>  	int mtu;
>  };
>  
> +/* Driver statistics, other than those in struct rtnl_link_stats64.
> + * These are collected per-CPU and aggregated by ethtool.
> + */
> +struct dsa_slave_stats {
> +	__u64			tx_reallocs;
> +	struct u64_stats_sync	syncp;
> +} __aligned(1 * sizeof(u64));
> +

Wouldn't a simple unsigned long (like in struct net_device_stats) be
sufficient here? This would make handling the counter much simpler.
And as far as I understand we talk about a packet counter that is
touched in certain scenarios only.

>  struct dsa_slave_priv {
>  	/* Copy of CPU port xmit for faster access in slave transmit hot path */
>  	struct sk_buff *	(*xmit)(struct sk_buff *skb,
>  					struct net_device *dev);
>  
>  	struct pcpu_sw_netstats	__percpu *stats64;
> +	struct dsa_slave_stats	__percpu *extra_stats;
>  
>  	struct gro_cells	gcells;
>  
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index 3bc5ca40c9fb..d4326940233c 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -668,9 +668,10 @@ static void dsa_slave_get_strings(struct net_device *dev,
>  		strncpy(data + len, "tx_bytes", len);
>  		strncpy(data + 2 * len, "rx_packets", len);
>  		strncpy(data + 3 * len, "rx_bytes", len);
> +		strncpy(data + 4 * len, "tx_reallocs", len);
>  		if (ds->ops->get_strings)
>  			ds->ops->get_strings(ds, dp->index, stringset,
> -					     data + 4 * len);
> +					     data + 5 * len);
>  	}
>  }
>  
> @@ -682,11 +683,13 @@ static void dsa_slave_get_ethtool_stats(struct net_device *dev,
>  	struct dsa_slave_priv *p = netdev_priv(dev);
>  	struct dsa_switch *ds = dp->ds;
>  	struct pcpu_sw_netstats *s;
> +	struct dsa_slave_stats *e;
>  	unsigned int start;
>  	int i;
>  
>  	for_each_possible_cpu(i) {
>  		u64 tx_packets, tx_bytes, rx_packets, rx_bytes;
> +		u64 tx_reallocs;
>  
>  		s = per_cpu_ptr(p->stats64, i);
>  		do {
> @@ -696,13 +699,21 @@ static void dsa_slave_get_ethtool_stats(struct net_device *dev,
>  			rx_packets = s->rx_packets;
>  			rx_bytes = s->rx_bytes;
>  		} while (u64_stats_fetch_retry_irq(&s->syncp, start));
> +
> +		e = per_cpu_ptr(p->extra_stats, i);
> +		do {
> +			start = u64_stats_fetch_begin_irq(&e->syncp);
> +			tx_reallocs	= e->tx_reallocs;
> +		} while (u64_stats_fetch_retry_irq(&e->syncp, start));
> +
>  		data[0] += tx_packets;
>  		data[1] += tx_bytes;
>  		data[2] += rx_packets;
>  		data[3] += rx_bytes;
> +		data[4] += tx_reallocs;
>  	}
>  	if (ds->ops->get_ethtool_stats)
> -		ds->ops->get_ethtool_stats(ds, dp->index, data + 4);
> +		ds->ops->get_ethtool_stats(ds, dp->index, data + 5);
>  }
>  
>  static int dsa_slave_get_sset_count(struct net_device *dev, int sset)
> @@ -713,7 +724,7 @@ static int dsa_slave_get_sset_count(struct net_device *dev, int sset)
>  	if (sset == ETH_SS_STATS) {
>  		int count;
>  
> -		count = 4;
> +		count = 5;
>  		if (ds->ops->get_sset_count)
>  			count += ds->ops->get_sset_count(ds, dp->index, sset);
>  
> @@ -1806,6 +1817,12 @@ int dsa_slave_create(struct dsa_port *port)
>  		free_netdev(slave_dev);
>  		return -ENOMEM;
>  	}
> +	p->extra_stats = netdev_alloc_pcpu_stats(struct dsa_slave_stats);
> +	if (!p->extra_stats) {
> +		free_percpu(p->stats64);
> +		free_netdev(slave_dev);
> +		return -ENOMEM;
> +	}
>  
>  	ret = gro_cells_init(&p->gcells, slave_dev);
>  	if (ret)
> @@ -1864,6 +1881,7 @@ int dsa_slave_create(struct dsa_port *port)
>  out_gcells:
>  	gro_cells_destroy(&p->gcells);
>  out_free:
> +	free_percpu(p->extra_stats);
>  	free_percpu(p->stats64);
>  	free_netdev(slave_dev);
>  	port->slave = NULL;
> @@ -1886,6 +1904,7 @@ void dsa_slave_destroy(struct net_device *slave_dev)
>  	dsa_slave_notify(slave_dev, DSA_PORT_UNREGISTER);
>  	phylink_destroy(dp->pl);
>  	gro_cells_destroy(&p->gcells);
> +	free_percpu(p->extra_stats);
>  	free_percpu(p->stats64);
>  	free_netdev(slave_dev);
>  }
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure
  2020-10-18 11:59             ` Christian Eggers
@ 2020-10-18 12:15               ` Vladimir Oltean
  0 siblings, 0 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-18 12:15 UTC (permalink / raw)
  To: Christian Eggers
  Cc: netdev, andrew, f.fainelli, vivien.didelot, kuba, Kurt Kanzenbach

On Sun, Oct 18, 2020 at 01:59:43PM +0200, Christian Eggers wrote:
> On Sunday, 18 October 2020, 13:42:06 CEST, Vladimir Oltean wrote:
> > On Sun, Oct 18, 2020 at 12:36:03PM +0200, Christian Eggers wrote:
> > > >       err = pskb_expand_head(skb, max(needed_headroom, 0),
> > > >
> > > >                              max(needed_tailroom, 0), GFP_ATOMIC);
> > >
> > > You may remove the second max() statement (around needed_tailroom). This
> > > would size the reallocated skb more exactly to the size actually required
> > > an may save some RAM (already tested too).
> >
> > Please explain more. needed_tailroom can be negative, why should I
> > shrink the tailroom?
> Because it will not be required anymore. This may lead to smaller memory
> allocations or the excess tailroom can be reused for headroom if needed. If
> none of both applies, the tailroom will not be changed.

Understand now, you're talking about the case where the tailroom in the
skb is already larger than what we estimate the packet will need through
its entire journey in the TX path. I still won't shrink it though, I'll
keep using the second approach you suggested.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-18 12:02   ` Heiner Kallweit
@ 2020-10-18 12:16     ` Vladimir Oltean
  2020-10-18 13:09       ` Heiner Kallweit
  0 siblings, 1 reply; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-18 12:16 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: netdev, andrew, f.fainelli, vivien.didelot, kuba,
	Christian Eggers, Kurt Kanzenbach

On Sun, Oct 18, 2020 at 02:02:46PM +0200, Heiner Kallweit wrote:
> Wouldn't a simple unsigned long (like in struct net_device_stats) be
> sufficient here? This would make handling the counter much simpler.
> And as far as I understand we talk about a packet counter that is
> touched in certain scenarios only.

I don't understand, in what sense 'sufficient'? This counter is exported
to ethtool which works with u64 values, how would an unsigned long,
which is u32 on 32-bit systems, help?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-18 12:16     ` Vladimir Oltean
@ 2020-10-18 13:09       ` Heiner Kallweit
  2020-10-18 13:48         ` Vladimir Oltean
  0 siblings, 1 reply; 49+ messages in thread
From: Heiner Kallweit @ 2020-10-18 13:09 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, andrew, f.fainelli, vivien.didelot, kuba,
	Christian Eggers, Kurt Kanzenbach

On 18.10.2020 14:16, Vladimir Oltean wrote:
> On Sun, Oct 18, 2020 at 02:02:46PM +0200, Heiner Kallweit wrote:
>> Wouldn't a simple unsigned long (like in struct net_device_stats) be
>> sufficient here? This would make handling the counter much simpler.
>> And as far as I understand we talk about a packet counter that is
>> touched in certain scenarios only.
> 
> I don't understand, in what sense 'sufficient'? This counter is exported
> to ethtool which works with u64 values, how would an unsigned long,
> which is u32 on 32-bit systems, help?
> 
Sufficient for me means that it's unlikely that a 32 bit counter will
overflow. Many drivers use the 32 bit counters (on a 32bit system) in
net_device_stats for infrequent events like rx/tx errors, and 64bit
counters only for things like rx/tx bytes, which are more likely to
overflow.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-18 13:09       ` Heiner Kallweit
@ 2020-10-18 13:48         ` Vladimir Oltean
  2020-10-18 14:13           ` Heiner Kallweit
  0 siblings, 1 reply; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-18 13:48 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: netdev, andrew, f.fainelli, vivien.didelot, kuba,
	Christian Eggers, Kurt Kanzenbach

On Sun, Oct 18, 2020 at 03:09:32PM +0200, Heiner Kallweit wrote:
> On 18.10.2020 14:16, Vladimir Oltean wrote:
> > On Sun, Oct 18, 2020 at 02:02:46PM +0200, Heiner Kallweit wrote:
> >> Wouldn't a simple unsigned long (like in struct net_device_stats) be
> >> sufficient here? This would make handling the counter much simpler.
> >> And as far as I understand we talk about a packet counter that is
> >> touched in certain scenarios only.
> > 
> > I don't understand, in what sense 'sufficient'? This counter is exported
> > to ethtool which works with u64 values, how would an unsigned long,
> > which is u32 on 32-bit systems, help?
> > 
> Sufficient for me means that it's unlikely that a 32 bit counter will
> overflow. Many drivers use the 32 bit counters (on a 32bit system) in
> net_device_stats for infrequent events like rx/tx errors, and 64bit
> counters only for things like rx/tx bytes, which are more likely to
> overflow.

2^32 = 4,294,967,296 = 4 billion packets
Considering that every packet that needs TX timestamping must be
reallocated, protocols such as IEEE 802.1AS will trigger 5 reallocs per
second. So time synchronization alone (background traffic, by all
accounts) will make this counter overflow in 27 years.
Every packet flooded or multicast by the bridge will also trigger
reallocs. In this case it is not difficult to imagine that overflows
might occur sooner.

Even if the above wasn't true. What becomes easier when I make the
counter an unsigned long? I need to make arch-dependent casts between an
unsigned long an an u64 when I expose the counter to ethtool, and there
it ends up being u64 too, doesn't it?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-18 13:48         ` Vladimir Oltean
@ 2020-10-18 14:13           ` Heiner Kallweit
  2020-10-18 22:58             ` Vladimir Oltean
  0 siblings, 1 reply; 49+ messages in thread
From: Heiner Kallweit @ 2020-10-18 14:13 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, andrew, f.fainelli, vivien.didelot, kuba,
	Christian Eggers, Kurt Kanzenbach

On 18.10.2020 15:48, Vladimir Oltean wrote:
> On Sun, Oct 18, 2020 at 03:09:32PM +0200, Heiner Kallweit wrote:
>> On 18.10.2020 14:16, Vladimir Oltean wrote:
>>> On Sun, Oct 18, 2020 at 02:02:46PM +0200, Heiner Kallweit wrote:
>>>> Wouldn't a simple unsigned long (like in struct net_device_stats) be
>>>> sufficient here? This would make handling the counter much simpler.
>>>> And as far as I understand we talk about a packet counter that is
>>>> touched in certain scenarios only.
>>>
>>> I don't understand, in what sense 'sufficient'? This counter is exported
>>> to ethtool which works with u64 values, how would an unsigned long,
>>> which is u32 on 32-bit systems, help?
>>>
>> Sufficient for me means that it's unlikely that a 32 bit counter will
>> overflow. Many drivers use the 32 bit counters (on a 32bit system) in
>> net_device_stats for infrequent events like rx/tx errors, and 64bit
>> counters only for things like rx/tx bytes, which are more likely to
>> overflow.
> 
> 2^32 = 4,294,967,296 = 4 billion packets
> Considering that every packet that needs TX timestamping must be
> reallocated, protocols such as IEEE 802.1AS will trigger 5 reallocs per
> second. So time synchronization alone (background traffic, by all
> accounts) will make this counter overflow in 27 years.
> Every packet flooded or multicast by the bridge will also trigger
> reallocs. In this case it is not difficult to imagine that overflows
> might occur sooner.
> 
> Even if the above wasn't true. What becomes easier when I make the
> counter an unsigned long? I need to make arch-dependent casts between an
> unsigned long an an u64 when I expose the counter to ethtool, and there
> it ends up being u64 too, doesn't it?
> 

Access to unsigned long should be atomic, so you could avoid the
following and access the counter directly (like other drivers do it
with the net_device_stats members). On a side note, because I'm also
just dealing with it: this_cpu_ptr() is safe only if preemption is
disabled. Is this the case here? Else there's get_cpu_ptr/put_cpu_ptr.
Also, if irq's aren't disabled there might be a need to use
u64_stats_update_begin_irqsave() et al. See:
2695578b896a ("net: usbnet: fix potential deadlock on 32bit hosts")
But I don't know dsa good enough to say whether this is applicable
here.

+	e = this_cpu_ptr(p->extra_stats);
+	u64_stats_update_begin(&e->syncp);
+	e->tx_reallocs++;
+	u64_stats_update_end(&e->syncp);

+		e = per_cpu_ptr(p->extra_stats, i);
+		do {
+			start = u64_stats_fetch_begin_irq(&e->syncp);
+			tx_reallocs	= e->tx_reallocs;
+		} while (u64_stats_fetch_retry_irq(&e->syncp, start));


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-17 21:35 ` [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics Vladimir Oltean
                     ` (2 preceding siblings ...)
  2020-10-18 12:02   ` Heiner Kallweit
@ 2020-10-18 16:49   ` Jakub Kicinski
  2020-10-18 17:36     ` Andrew Lunn
  3 siblings, 1 reply; 49+ messages in thread
From: Jakub Kicinski @ 2020-10-18 16:49 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, andrew, f.fainelli, vivien.didelot, Christian Eggers,
	Kurt Kanzenbach

On Sun, 18 Oct 2020 00:35:59 +0300 Vladimir Oltean wrote:
> DSA needs to push a header onto every packet on TX, and this might cause
> reallocation under certain scenarios, which might affect, for example,
> performance.
> 
> But reallocated packets are not standardized in struct pcpu_sw_netstats,
> struct net_device_stats or anywhere else, it seems, so we need to roll
> our own extra netdevice statistics and expose them to ethtool.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Could you consider adding "driver" stats under RTM_GETSTATS, 
or a similar new structured interface over ethtool?

Looks like the statistic in question has pretty clear semantics,
and may be more broadly useful.

> +/* Driver statistics, other than those in struct rtnl_link_stats64.
> + * These are collected per-CPU and aggregated by ethtool.
> + */
> +struct dsa_slave_stats {
> +	__u64			tx_reallocs;

s/__u/u/

> +	struct u64_stats_sync	syncp;
> +} __aligned(1 * sizeof(u64));

Why aligned to u64? Compiler should pick a reasonable alignment here 
by itself.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-18 16:49   ` Jakub Kicinski
@ 2020-10-18 17:36     ` Andrew Lunn
  2020-10-18 18:26       ` Jakub Kicinski
  0 siblings, 1 reply; 49+ messages in thread
From: Andrew Lunn @ 2020-10-18 17:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vladimir Oltean, netdev, f.fainelli, vivien.didelot,
	Christian Eggers, Kurt Kanzenbach

On Sun, Oct 18, 2020 at 09:49:51AM -0700, Jakub Kicinski wrote:
> > +	struct u64_stats_sync	syncp;
> > +} __aligned(1 * sizeof(u64));
> 
> Why aligned to u64? Compiler should pick a reasonable alignment here 
> by itself.

Hi Jakub

I wondered that as well. But:

struct gnet_stats_basic_cpu {
        struct gnet_stats_basic_packed bstats;
        struct u64_stats_sync syncp;
} __aligned(2 * sizeof(u64));

/* often modified stats are per-CPU, other are shared (netdev->stats) */
struct pcpu_sw_netstats {
        u64     rx_packets;
        u64     rx_bytes;
        u64     tx_packets;
        u64     tx_bytes;
        struct u64_stats_sync   syncp;
} __aligned(4 * sizeof(u64));

struct pcpu_lstats {
        u64_stats_t packets;
        u64_stats_t bytes;
        struct u64_stats_sync syncp;
} __aligned(2 * sizeof(u64));

Cargo cult or is there a real need?

      Andrew

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-18 17:36     ` Andrew Lunn
@ 2020-10-18 18:26       ` Jakub Kicinski
  2020-10-18 18:33         ` Jakub Kicinski
  0 siblings, 1 reply; 49+ messages in thread
From: Jakub Kicinski @ 2020-10-18 18:26 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vladimir Oltean, netdev, f.fainelli, vivien.didelot,
	Christian Eggers, Kurt Kanzenbach

On Sun, 18 Oct 2020 19:36:49 +0200 Andrew Lunn wrote:
> On Sun, Oct 18, 2020 at 09:49:51AM -0700, Jakub Kicinski wrote:
> > > +	struct u64_stats_sync	syncp;
> > > +} __aligned(1 * sizeof(u64));  
> > 
> > Why aligned to u64? Compiler should pick a reasonable alignment here 
> > by itself.  
> 
> Hi Jakub
> 
> I wondered that as well. But:
> 
> struct gnet_stats_basic_cpu {
>         struct gnet_stats_basic_packed bstats;
>         struct u64_stats_sync syncp;
> } __aligned(2 * sizeof(u64));
> 
> /* often modified stats are per-CPU, other are shared (netdev->stats) */
> struct pcpu_sw_netstats {
>         u64     rx_packets;
>         u64     rx_bytes;
>         u64     tx_packets;
>         u64     tx_bytes;
>         struct u64_stats_sync   syncp;
> } __aligned(4 * sizeof(u64));
> 
> struct pcpu_lstats {
>         u64_stats_t packets;
>         u64_stats_t bytes;
>         struct u64_stats_sync syncp;
> } __aligned(2 * sizeof(u64));
> 
> Cargo cult or is there a real need?

Hm, looks like the intent is to enforce power of two alignment 
to prevent the structure from spanning cache lines. Doesn't make 
any difference for 1 counter, but I guess we can keep the style 
for consistency.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-18 18:26       ` Jakub Kicinski
@ 2020-10-18 18:33         ` Jakub Kicinski
  0 siblings, 0 replies; 49+ messages in thread
From: Jakub Kicinski @ 2020-10-18 18:33 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vladimir Oltean, netdev, f.fainelli, vivien.didelot,
	Christian Eggers, Kurt Kanzenbach

On Sun, 18 Oct 2020 11:26:53 -0700 Jakub Kicinski wrote:
> Hm, looks like the intent is to enforce power of two alignment 
> to prevent the structure from spanning cache lines. Doesn't make 
> any difference for 1 counter, but I guess we can keep the style 
> for consistency.

I take that back, looks like seqcount_t is 4B, so it will make
a difference, don't mind me :)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-18 14:13           ` Heiner Kallweit
@ 2020-10-18 22:58             ` Vladimir Oltean
  2020-10-18 23:11               ` Florian Fainelli
  0 siblings, 1 reply; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-18 22:58 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: netdev, andrew, f.fainelli, vivien.didelot, kuba,
	Christian Eggers, Kurt Kanzenbach

On Sun, Oct 18, 2020 at 04:13:28PM +0200, Heiner Kallweit wrote:
> On a side note, because I'm also just dealing with it: this_cpu_ptr()
> is safe only if preemption is disabled. Is this the case here? Else
> there's get_cpu_ptr/put_cpu_ptr.

lockdep would shout about using smp_processor_id() in preemptible
context? Because it doesn't do that, and never did, but I don't really
understand why, coming to think of it.

> Also, if irq's aren't disabled there might be a need to use
> u64_stats_update_begin_irqsave() et al. See: 2695578b896a ("net:
> usbnet: fix potential deadlock on 32bit hosts") But I don't know dsa
> good enough to say whether this is applicable here.

DSA xmit and receive path do not run from hardirq context, just softirq,
so I believe the issue reported to Eric there does not apply here.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-18 22:58             ` Vladimir Oltean
@ 2020-10-18 23:11               ` Florian Fainelli
  2020-10-19  0:21                 ` Vladimir Oltean
  0 siblings, 1 reply; 49+ messages in thread
From: Florian Fainelli @ 2020-10-18 23:11 UTC (permalink / raw)
  To: Vladimir Oltean, Heiner Kallweit
  Cc: netdev, andrew, vivien.didelot, kuba, Christian Eggers, Kurt Kanzenbach



On 10/18/2020 3:58 PM, Vladimir Oltean wrote:
> On Sun, Oct 18, 2020 at 04:13:28PM +0200, Heiner Kallweit wrote:
>> On a side note, because I'm also just dealing with it: this_cpu_ptr()
>> is safe only if preemption is disabled. Is this the case here? Else
>> there's get_cpu_ptr/put_cpu_ptr.
> 
> lockdep would shout about using smp_processor_id() in preemptible
> context? Because it doesn't do that, and never did, but I don't really
> understand why, coming to think of it.
> 
>> Also, if irq's aren't disabled there might be a need to use
>> u64_stats_update_begin_irqsave() et al. See: 2695578b896a ("net:
>> usbnet: fix potential deadlock on 32bit hosts") But I don't know dsa
>> good enough to say whether this is applicable here.
> 
> DSA xmit and receive path do not run from hardirq context, just softirq,
> so I believe the issue reported to Eric there does not apply here.

How about when used as a netconsole? We do support netconsole over DSA 
interfaces.
-- 
Florian

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-18 23:11               ` Florian Fainelli
@ 2020-10-19  0:21                 ` Vladimir Oltean
  2020-10-19  3:49                   ` Florian Fainelli
  0 siblings, 1 reply; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-19  0:21 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Heiner Kallweit, netdev, andrew, vivien.didelot, kuba,
	Christian Eggers, Kurt Kanzenbach

On Sun, Oct 18, 2020 at 04:11:14PM -0700, Florian Fainelli wrote:
> How about when used as a netconsole? We do support netconsole over DSA
> interfaces.

How? Who is supposed to bring up the master interface, and when?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-19  0:21                 ` Vladimir Oltean
@ 2020-10-19  3:49                   ` Florian Fainelli
  2020-10-19 12:05                     ` Andrew Lunn
  0 siblings, 1 reply; 49+ messages in thread
From: Florian Fainelli @ 2020-10-19  3:49 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Heiner Kallweit, netdev, andrew, vivien.didelot, kuba,
	Christian Eggers, Kurt Kanzenbach



On 10/18/2020 5:21 PM, Vladimir Oltean wrote:
> On Sun, Oct 18, 2020 at 04:11:14PM -0700, Florian Fainelli wrote:
>> How about when used as a netconsole? We do support netconsole over DSA
>> interfaces.
> 
> How? Who is supposed to bring up the master interface, and when?
> 

You are right that this appears not to work when configured on the 
kernel command line:

[    6.836910] netpoll: netconsole: local port 4444
[    6.841553] netpoll: netconsole: local IPv4 address 192.168.1.10
[    6.847582] netpoll: netconsole: interface 'gphy'
[    6.852305] netpoll: netconsole: remote port 9353
[    6.857030] netpoll: netconsole: remote IPv4 address 192.168.1.254
[    6.863233] netpoll: netconsole: remote ethernet address 
b8:ac:6f:80:af:7e
[    6.870134] netpoll: netconsole: device gphy not up yet, forcing it
[    6.876428] netpoll: netconsole: failed to open gphy
[    6.881412] netconsole: cleaning up

looking at my test notes from 2015 when it was added, I had only tested 
dynamic netconsole while the network devices have already been brought 
up which is why I did not catch it. Let me see if I can fix that somehow.
-- 
Florian

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure
  2020-10-17 22:11     ` Florian Fainelli
  2020-10-17 22:17       ` Vladimir Oltean
@ 2020-10-19  8:33       ` David Laight
  2020-10-19 10:30         ` Vladimir Oltean
  1 sibling, 1 reply; 49+ messages in thread
From: David Laight @ 2020-10-19  8:33 UTC (permalink / raw)
  To: 'Florian Fainelli', Vladimir Oltean, netdev
  Cc: andrew, vivien.didelot, kuba, Christian Eggers, Kurt Kanzenbach

From: Florian Fainelli>
> Sent: 17 October 2020 23:12
..
> Not positive you need that because you may be account for more head or
> tail room than necessary.
> 
> For instance with tag_brcm.c and systemport.c we need 4 bytes of head
> room for the Broadcom tag and an additional 8 bytes for pushing the
> transmit status block descriptor in front of the Ethernet frame about to
> be transmitted. These additional 8 bytes are a requirement of the DSA
> master here and exist regardless of DSA being used, but we should not be
> propagating them to the DSA slave.

Is it possible to send the extra bytes from a separate buffer fragment?
The entire area could be allocated (coherent) when the rings are
allocated.
That would save having to modify the skb at all.

Even if some bytes of the frame header need 'adjusting' transmitting
from a copy may be faster - especially on systems with an iommu.

Many (many) moons ago we found the cutoff point for copying frames
on a system with an iommu to be around 1k bytes.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure
  2020-10-19  8:33       ` David Laight
@ 2020-10-19 10:30         ` Vladimir Oltean
  2020-10-19 11:14           ` David Laight
  2020-10-19 12:19           ` Andrew Lunn
  0 siblings, 2 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-19 10:30 UTC (permalink / raw)
  To: David Laight
  Cc: 'Florian Fainelli',
	netdev, andrew, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

On Mon, Oct 19, 2020 at 08:33:27AM +0000, David Laight wrote:
> Is it possible to send the extra bytes from a separate buffer fragment?
> The entire area could be allocated (coherent) when the rings are
> allocated.
> That would save having to modify the skb at all.
> 
> Even if some bytes of the frame header need 'adjusting' transmitting
> from a copy may be faster - especially on systems with an iommu.
> 
> Many (many) moons ago we found the cutoff point for copying frames
> on a system with an iommu to be around 1k bytes.

Please help me understand better how to implement what you're suggesting.
DSA switches have 3 places where they might insert a tag:
1. Between the source MAC address and the EtherType (this is the most
   common)
2. Before the destination MAC address
3. Before the FCS

I imagine that the most common scenario (1) is also the most difficult
to implement using fragments, since I would need to split the Ethernet
header from the rest of the skb data area, which might defeat the
purpose.

Also, simply integrating these 3 code paths into something generic will
bring challenges of its own.

Lastly, I fully expect the buffers to have proper headroom and tailroom
now (if they don't, then it's worth investigating what's the code path
that doesn't observe our dev->needed_headroom and dev->needed_tailroom).
The reallocation code is there just for clones (and as far as I
understand, fragments won't save us from the need of reallocating the
data areas of clones), and for short frames with DSA switches in case
(3).

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure
  2020-10-19 10:30         ` Vladimir Oltean
@ 2020-10-19 11:14           ` David Laight
  2020-10-19 11:41             ` Vladimir Oltean
  2020-10-19 12:19           ` Andrew Lunn
  1 sibling, 1 reply; 49+ messages in thread
From: David Laight @ 2020-10-19 11:14 UTC (permalink / raw)
  To: 'Vladimir Oltean'
  Cc: 'Florian Fainelli',
	netdev, andrew, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

From: Vladimir Oltean
> Sent: 19 October 2020 11:31
> 
> On Mon, Oct 19, 2020 at 08:33:27AM +0000, David Laight wrote:
> > Is it possible to send the extra bytes from a separate buffer fragment?
> > The entire area could be allocated (coherent) when the rings are
> > allocated.
> > That would save having to modify the skb at all.
> >
> > Even if some bytes of the frame header need 'adjusting' transmitting
> > from a copy may be faster - especially on systems with an iommu.
> >
> > Many (many) moons ago we found the cutoff point for copying frames
> > on a system with an iommu to be around 1k bytes.
> 
> Please help me understand better how to implement what you're suggesting.
> DSA switches have 3 places where they might insert a tag:
> 1. Between the source MAC address and the EtherType (this is the most
>    common)
> 2. Before the destination MAC address
> 3. Before the FCS
> 
> I imagine that the most common scenario (1) is also the most difficult
> to implement using fragments, since I would need to split the Ethernet
> header from the rest of the skb data area, which might defeat the
> purpose.
> 
> Also, simply integrating these 3 code paths into something generic will
> bring challenges of its own.
> 
> Lastly, I fully expect the buffers to have proper headroom and tailroom
> now (if they don't, then it's worth investigating what's the code path
> that doesn't observe our dev->needed_headroom and dev->needed_tailroom).
> The reallocation code is there just for clones (and as far as I
> understand, fragments won't save us from the need of reallocating the
> data areas of clones), and for short frames with DSA switches in case
> (3).

If the skb are 'cloned' then I suspect you can't even put the MAC addresses
into the skb because they might be being transmitted on another interface.

OTOH TCP will have cloned the skb for retransmit so may ensure than there
isn't (still) a second reference (from an old transmit) before doing the
transmit - in which case you can write into the head/tail space.

Hmmm... I was thinking you were doing something for a specific driver.
But it looks more like it depends on what the interface is connected to.

If the MAC addresses (and ethertype) can be written into the skb head
space then it must be valid to rewrite the header containing the tag.
(With the proviso that none of the MAC drivers try to decode it again.)

One thing I have noticed is that the size of the 'headroom' for UDP
and RAWIP (where I was looking) packets depends on the final 'dev'.
This means that you can't copy the frame into an skb until you know
the 'dev' - but that might depend on the frame data.
This is a 'catch-22' problem.

I actually wonder how much the headroom varies.
It might be worth having a system-wide 'headroom' value.
A few extra bytes aren't really going to make any difference.

That might make it far less likely that there isn't the available
headroom for the tag - in which case you can just log once and discard.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure
  2020-10-19 11:14           ` David Laight
@ 2020-10-19 11:41             ` Vladimir Oltean
  0 siblings, 0 replies; 49+ messages in thread
From: Vladimir Oltean @ 2020-10-19 11:41 UTC (permalink / raw)
  To: David Laight
  Cc: 'Florian Fainelli',
	netdev, andrew, vivien.didelot, kuba, Christian Eggers,
	Kurt Kanzenbach

On Mon, Oct 19, 2020 at 11:14:07AM +0000, David Laight wrote:
> If the skb are 'cloned' then I suspect you can't even put the MAC addresses
> into the skb because they might be being transmitted on another interface.

No, that's not right. The bridge floods frames by cloning them, and L2
forwarding doesn't need to modify the L2 header in any way. So when you
have a bridge upper, you will get cloned skbs with a valid MAC header in
them.

> OTOH TCP will have cloned the skb for retransmit so may ensure than there
> isn't (still) a second reference (from an old transmit) before doing the
> transmit - in which case you can write into the head/tail space.

In the case of TCP, I suppose the DSA layer will never see cloned skbs
for the reason you mention, true.

> Hmmm... I was thinking you were doing something for a specific driver.
> But it looks more like it depends on what the interface is connected to.

I'm not sure what you mean here.

> If the MAC addresses (and ethertype) can be written into the skb head
> space then it must be valid to rewrite the header containing the tag.
> (With the proviso that none of the MAC drivers try to decode it again.)

This phrase is almost correct.
Hardware TX timestamping also clones the skb, because the clone needs to
be ultimately reinjected into the socket's error queue, for the user
space program to retrieve the timestamp via a cmsg.

> One thing I have noticed is that the size of the 'headroom' for UDP
> and RAWIP (where I was looking) packets depends on the final 'dev'.
> This means that you can't copy the frame into an skb until you know
> the 'dev' - but that might depend on the frame data.
> This is a 'catch-22' problem.
> I actually wonder how much the headroom varies.
> It might be worth having a system-wide 'headroom' value.
> A few extra bytes aren't really going to make any difference.
>
> That might make it far less likely that there isn't the available
> headroom for the tag - in which case you can just log once and discard.

Again, the case of the bridge, and of TX timestamping, and of tail
taggers that need to perform padding, is enough that we need to
reallocate skbs, and can't just drop them when they're not writable or
there isn't enough head/tailroom.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-19  3:49                   ` Florian Fainelli
@ 2020-10-19 12:05                     ` Andrew Lunn
  2020-10-19 16:27                       ` Florian Fainelli
  0 siblings, 1 reply; 49+ messages in thread
From: Andrew Lunn @ 2020-10-19 12:05 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Vladimir Oltean, Heiner Kallweit, netdev, vivien.didelot, kuba,
	Christian Eggers, Kurt Kanzenbach

On Sun, Oct 18, 2020 at 08:49:31PM -0700, Florian Fainelli wrote:
> 
> 
> On 10/18/2020 5:21 PM, Vladimir Oltean wrote:
> > On Sun, Oct 18, 2020 at 04:11:14PM -0700, Florian Fainelli wrote:
> > > How about when used as a netconsole? We do support netconsole over DSA
> > > interfaces.
> > 
> > How? Who is supposed to bring up the master interface, and when?
> > 
> 
> You are right that this appears not to work when configured on the kernel
> command line:
> 
> [    6.836910] netpoll: netconsole: local port 4444
> [    6.841553] netpoll: netconsole: local IPv4 address 192.168.1.10
> [    6.847582] netpoll: netconsole: interface 'gphy'
> [    6.852305] netpoll: netconsole: remote port 9353
> [    6.857030] netpoll: netconsole: remote IPv4 address 192.168.1.254
> [    6.863233] netpoll: netconsole: remote ethernet address
> b8:ac:6f:80:af:7e
> [    6.870134] netpoll: netconsole: device gphy not up yet, forcing it
> [    6.876428] netpoll: netconsole: failed to open gphy
> [    6.881412] netconsole: cleaning up
> 
> looking at my test notes from 2015 when it was added, I had only tested
> dynamic netconsole while the network devices have already been brought up
> which is why I did not catch it. Let me see if I can fix that somehow.

Hi Florian

NFS root used to work, so there must be some code in the kernel to
bring the master interface up. Might just need copy/pasting.

      Andrew

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure
  2020-10-19 10:30         ` Vladimir Oltean
  2020-10-19 11:14           ` David Laight
@ 2020-10-19 12:19           ` Andrew Lunn
  1 sibling, 0 replies; 49+ messages in thread
From: Andrew Lunn @ 2020-10-19 12:19 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: David Laight, 'Florian Fainelli',
	netdev, vivien.didelot, kuba, Christian Eggers, Kurt Kanzenbach

On Mon, Oct 19, 2020 at 10:30:47AM +0000, Vladimir Oltean wrote:
> On Mon, Oct 19, 2020 at 08:33:27AM +0000, David Laight wrote:
> > Is it possible to send the extra bytes from a separate buffer fragment?
> > The entire area could be allocated (coherent) when the rings are
> > allocated.
> > That would save having to modify the skb at all.
> > 
> > Even if some bytes of the frame header need 'adjusting' transmitting
> > from a copy may be faster - especially on systems with an iommu.
> > 
> > Many (many) moons ago we found the cutoff point for copying frames
> > on a system with an iommu to be around 1k bytes.
> 
> Please help me understand better how to implement what you're suggesting.
> DSA switches have 3 places where they might insert a tag:
> 1. Between the source MAC address and the EtherType (this is the most
>    common)
> 2. Before the destination MAC address
> 3. Before the FCS
> 
> I imagine that the most common scenario (1) is also the most difficult
> to implement using fragments, since I would need to split the Ethernet
> header from the rest of the skb data area, which might defeat the
> purpose.

We also have length issues. Most scatter/gather DMA engines require
the fragments are multiple of 4 bytes. Only the last segment does not
have this length restriction. And some of the DSA tag headers are 2
bytes, or 1 byte. So some master devices are going to have to convert
the fragments back to a linear buffer.

    Andrew

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics
  2020-10-19 12:05                     ` Andrew Lunn
@ 2020-10-19 16:27                       ` Florian Fainelli
  0 siblings, 0 replies; 49+ messages in thread
From: Florian Fainelli @ 2020-10-19 16:27 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vladimir Oltean, Heiner Kallweit, netdev, vivien.didelot, kuba,
	Christian Eggers, Kurt Kanzenbach

On 10/19/20 5:05 AM, Andrew Lunn wrote:
> On Sun, Oct 18, 2020 at 08:49:31PM -0700, Florian Fainelli wrote:
>>
>>
>> On 10/18/2020 5:21 PM, Vladimir Oltean wrote:
>>> On Sun, Oct 18, 2020 at 04:11:14PM -0700, Florian Fainelli wrote:
>>>> How about when used as a netconsole? We do support netconsole over DSA
>>>> interfaces.
>>>
>>> How? Who is supposed to bring up the master interface, and when?
>>>
>>
>> You are right that this appears not to work when configured on the kernel
>> command line:
>>
>> [    6.836910] netpoll: netconsole: local port 4444
>> [    6.841553] netpoll: netconsole: local IPv4 address 192.168.1.10
>> [    6.847582] netpoll: netconsole: interface 'gphy'
>> [    6.852305] netpoll: netconsole: remote port 9353
>> [    6.857030] netpoll: netconsole: remote IPv4 address 192.168.1.254
>> [    6.863233] netpoll: netconsole: remote ethernet address
>> b8:ac:6f:80:af:7e
>> [    6.870134] netpoll: netconsole: device gphy not up yet, forcing it
>> [    6.876428] netpoll: netconsole: failed to open gphy
>> [    6.881412] netconsole: cleaning up
>>
>> looking at my test notes from 2015 when it was added, I had only tested
>> dynamic netconsole while the network devices have already been brought up
>> which is why I did not catch it. Let me see if I can fix that somehow.
> 
> Hi Florian
> 
> NFS root used to work, so there must be some code in the kernel to
> bring the master interface up. Might just need copy/pasting.

This is a tiny bit different because netconsole goes through netpoll
which is responsible for doing the interface configuration. Unlike root
over NFS, this does not utilize net/ipv4/ipconfig.c, so the existing DSA
checks in that file are not used. The same "cure" could be applied, but
I am not sure if it will be accepted, we shall see.
-- 
Florian

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2020-10-19 16:27 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-17 21:35 [RFC PATCH 00/13] Generic TX reallocation for DSA Vladimir Oltean
2020-10-17 21:35 ` [RFC PATCH 01/13] net: dsa: add plumbing for custom netdev statistics Vladimir Oltean
2020-10-17 22:13   ` Florian Fainelli
2020-10-17 23:15   ` Andrew Lunn
2020-10-18  0:23     ` Vladimir Oltean
2020-10-18 12:02   ` Heiner Kallweit
2020-10-18 12:16     ` Vladimir Oltean
2020-10-18 13:09       ` Heiner Kallweit
2020-10-18 13:48         ` Vladimir Oltean
2020-10-18 14:13           ` Heiner Kallweit
2020-10-18 22:58             ` Vladimir Oltean
2020-10-18 23:11               ` Florian Fainelli
2020-10-19  0:21                 ` Vladimir Oltean
2020-10-19  3:49                   ` Florian Fainelli
2020-10-19 12:05                     ` Andrew Lunn
2020-10-19 16:27                       ` Florian Fainelli
2020-10-18 16:49   ` Jakub Kicinski
2020-10-18 17:36     ` Andrew Lunn
2020-10-18 18:26       ` Jakub Kicinski
2020-10-18 18:33         ` Jakub Kicinski
2020-10-17 21:36 ` [RFC PATCH 02/13] net: dsa: implement a central TX reallocation procedure Vladimir Oltean
2020-10-17 22:01   ` Vladimir Oltean
2020-10-17 22:11     ` Florian Fainelli
2020-10-17 22:17       ` Vladimir Oltean
2020-10-18  0:37         ` Florian Fainelli
2020-10-19  8:33       ` David Laight
2020-10-19 10:30         ` Vladimir Oltean
2020-10-19 11:14           ` David Laight
2020-10-19 11:41             ` Vladimir Oltean
2020-10-19 12:19           ` Andrew Lunn
2020-10-17 22:31     ` Vladimir Oltean
2020-10-18  0:13       ` Vladimir Oltean
2020-10-18 10:36         ` Christian Eggers
2020-10-18 11:42           ` Vladimir Oltean
2020-10-18 11:59             ` Christian Eggers
2020-10-18 12:15               ` Vladimir Oltean
2020-10-17 21:36 ` [RFC PATCH 03/13] net: dsa: tag_ksz: don't allocate additional memory for padding/tagging Vladimir Oltean
2020-10-17 21:36 ` [RFC PATCH 04/13] net: dsa: trailer: " Vladimir Oltean
2020-10-17 21:36 ` [RFC PATCH 05/13] net: dsa: tag_qca: let DSA core deal with TX reallocation Vladimir Oltean
2020-10-17 21:36 ` [RFC PATCH 06/13] net: dsa: tag_ocelot: " Vladimir Oltean
2020-10-17 21:36 ` [RFC PATCH 07/13] net: dsa: tag_mtk: " Vladimir Oltean
2020-10-17 21:36 ` [RFC PATCH 08/13] net: dsa: tag_lan9303: " Vladimir Oltean
2020-10-17 21:36 ` [RFC PATCH 09/13] net: dsa: tag_edsa: " Vladimir Oltean
2020-10-17 21:36 ` [RFC PATCH 10/13] net: dsa: tag_brcm: " Vladimir Oltean
2020-10-17 21:36 ` [RFC PATCH 11/13] net: dsa: tag_dsa: " Vladimir Oltean
2020-10-17 21:36 ` [RFC PATCH 12/13] net: dsa: tag_gswip: " Vladimir Oltean
2020-10-17 22:19   ` Vladimir Oltean
2020-10-17 21:36 ` [RFC PATCH 13/13] net: dsa: tag_ar9331: " Vladimir Oltean
2020-10-17 23:07 ` [RFC PATCH 00/13] Generic TX reallocation for DSA Andrew Lunn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).