* [PATCH net-next v8 0/5] net: ipqess: introduce Qualcomm IPQESS driver
@ 2022-11-04 17:41 Maxime Chevallier
  2022-11-04 17:41 ` [PATCH net-next v8 1/5] net: dt-bindings: Introduce the Qualcomm IPQESS Ethernet controller Maxime Chevallier
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Maxime Chevallier @ 2022-11-04 17:41 UTC (permalink / raw)
  To: davem, Rob Herring, Krzysztof Kozlowski
  Cc: Maxime Chevallier, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	netdev, linux-kernel, devicetree, thomas.petazzoni, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit, Russell King,
	linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko,
	Andy Gross, Bjorn Andersson, Konrad Dybcio

Hello everyone,

This is the 8th iteration of the IPQESS driver, which includes a new
DSA tagger that lets the MAC convey the tag to the switch through an
out-of-band medium, in this case DMA descriptors.

Notable changes in v8:
 - Added fixed-link in the SoC dtsi file
 - Removed the ethernet0 alias from the dtsi
 - Added a missing blank line in the tagger driver

Notable changes in v7:
 - Fixed sparse warnings
 - Fixed a typo in the bindings
 - Added missing maintainers in CC

Notable changes in v6:
 - Cleaned up unused helpers and fields in the tagger
 - Cleaned up ordering in various files
 - Added more documentation on the tagger
 - Fixed the CHANGEUPPER caching
 - Various cleanups in the IPQESS driver

Notable changes in v5:
 - Fix caching of CHANGEUPPER events
 - Use a skb extension-based tagger
 - Rename the binding file
 - Some cleanups in the ipqess driver itself

Notable changes in v4:
 - Cache the uses_dsa info from CHANGEUPPER events
 - Use better string handling helpers for ethtool stats
 - Rename the ethtool callbacks
 - Fix a binding typo

Notable changes in v3:
 - Took Russell's review into account on the ioctl handler and the
   missing MAC capabilities
 - Took Andrew's reviews into account by reworking the NAPI RX loop,
   removing stray "inline" keywords and useless warnings
 - Took Vlad's reviews into account by reworking a few macros
 - Took Christophe's review into account by removing an extra GFP_ZERO
 - Took Rob's review into account by simplifying the binding

Notable changes in v2:
 - Put the DSA tag in the skb itself instead of using skb->shinfo
 - Fixed the initialisation sequence based on Andrew's comments
 - Reworked the error paths in the init sequence
 - Added support for the clock and reset lines of this controller
 - Fixed and updated the binding

The driver itself is pretty straightforward, but has lived out-of-tree
for a while. I've done my best to clean up some outdated API calls, but
some might remain.

This controller is somewhat special, since it's part of the IPQ4019 SoC,
which also includes a QCA8K switch that uses the IPQESS controller for
its CPU port. The switch is so tightly integrated with the MAC that it
is connected to it through an internal link (hence the fact that we only
support PHY_INTERFACE_MODE_INTERNAL), and this has some consequences on
the DSA side.

The tagging for the switch isn't done in-band as most switches do, but
out-of-band, with the DSA tag being included in the DMA descriptor.

This series includes a new out-of-band tagger that uses skb extensions
to convey the tag between the tagger and the MAC driver.
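
To give a rough idea of how the pieces fit together, here is a minimal
sketch of that mechanism. The extension ID, structure layout and field
names below are assumptions made purely for illustration; the real
definitions are added by patch 3 (include/linux/dsa/oob.h,
net/dsa/tag_oob.c) and the descriptor encoding by patch 4:

#include <linux/bits.h>
#include <linux/skbuff.h>

/* Illustrative only: stands in for the definitions added by patch 3 */
struct dsa_oob_tag_info {
	u16 port;	/* destination switch port (assumed layout) */
};

/* From ipqess.h in this series: TPD word3 bits 18-28 hold the port bitmap */
#define IPQESS_TPD_PORT_BITMAP_SHIFT	18

/* Tagger side: instead of prepending tag bytes to the frame, stash the
 * destination port in an skb extension for the MAC driver to consume.
 */
static struct sk_buff *oob_tag_xmit_sketch(struct sk_buff *skb, u16 port)
{
	struct dsa_oob_tag_info *tag;

	tag = skb_ext_add(skb, SKB_EXT_DSA_OOB);	/* ID name assumed */
	if (!tag)
		return NULL;	/* no memory for the extension: drop */

	tag->port = port;
	return skb;
}

/* MAC driver side, on xmit: look the extension up and encode the port
 * into the TX DMA descriptor instead of into the packet data. The bit
 * layout here is an assumption, not the actual patch 4 encoding.
 */
static void oob_tag_fill_desc_sketch(struct sk_buff *skb, __le32 *word3)
{
	struct dsa_oob_tag_info *tag = skb_ext_find(skb, SKB_EXT_DSA_OOB);

	if (tag)
		*word3 |= cpu_to_le32(BIT(tag->port) << IPQESS_TPD_PORT_BITMAP_SHIFT);
}

The receive path would mirror this, with the MAC driver pushing the
source port it reads from the RX descriptor and the tagger popping it to
steer the skb to the right DSA port.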

Thanks to the Sartura folks who worked on a base version of this driver,
and provided test hardware.

Best regards,

Maxime

Maxime Chevallier (5):
  net: dt-bindings: Introduce the Qualcomm IPQESS Ethernet controller
  net: ipqess: introduce the Qualcomm IPQESS driver
  net: dsa: add out-of-band tagging protocol
  net: ipqess: Add out-of-band DSA tagging support
  ARM: dts: qcom: ipq4019: Add description for the IPQESS Ethernet
    controller

 .../bindings/net/qcom,ipq4019-ess-edma.yaml   |   94 ++
 Documentation/networking/dsa/dsa.rst          |   13 +-
 MAINTAINERS                                   |    8 +
 arch/arm/boot/dts/qcom-ipq4019.dtsi           |   48 +
 drivers/net/ethernet/qualcomm/Kconfig         |   12 +
 drivers/net/ethernet/qualcomm/Makefile        |    2 +
 drivers/net/ethernet/qualcomm/ipqess/Makefile |    8 +
 drivers/net/ethernet/qualcomm/ipqess/ipqess.c | 1308 +++++++++++++++++
 drivers/net/ethernet/qualcomm/ipqess/ipqess.h |  522 +++++++
 .../ethernet/qualcomm/ipqess/ipqess_ethtool.c |  164 +++
 include/linux/dsa/oob.h                       |   16 +
 include/linux/skbuff.h                        |    3 +
 include/net/dsa.h                             |    2 +
 net/core/skbuff.c                             |   10 +
 net/dsa/Kconfig                               |    9 +
 net/dsa/Makefile                              |    1 +
 net/dsa/tag_oob.c                             |   49 +
 17 files changed, 2268 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/net/qcom,ipq4019-ess-edma.yaml
 create mode 100644 drivers/net/ethernet/qualcomm/ipqess/Makefile
 create mode 100644 drivers/net/ethernet/qualcomm/ipqess/ipqess.c
 create mode 100644 drivers/net/ethernet/qualcomm/ipqess/ipqess.h
 create mode 100644 drivers/net/ethernet/qualcomm/ipqess/ipqess_ethtool.c
 create mode 100644 include/linux/dsa/oob.h
 create mode 100644 net/dsa/tag_oob.c

-- 
2.37.3


* [PATCH net-next v8 1/5] net: dt-bindings: Introduce the Qualcomm IPQESS Ethernet controller
  2022-11-04 17:41 [PATCH net-next v8 0/5] net: ipqess: introduce Qualcomm IPQESS driver Maxime Chevallier
@ 2022-11-04 17:41 ` Maxime Chevallier
  2022-11-04 17:41 ` [PATCH net-next v8 2/5] net: ipqess: introduce the Qualcomm IPQESS driver Maxime Chevallier
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Maxime Chevallier @ 2022-11-04 17:41 UTC (permalink / raw)
  To: davem, Rob Herring, Krzysztof Kozlowski
  Cc: Maxime Chevallier, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	netdev, linux-kernel, devicetree, thomas.petazzoni, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit, Russell King,
	linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko,
	Andy Gross, Bjorn Andersson, Konrad Dybcio, Krzysztof Kozlowski

Add the DT binding for the IPQESS Ethernet Controller. This is a simple
controller, only requiring the phy-mode, interrupts, clocks, and
possibly a MAC address setting.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
---
V7->V8:
 - No changes
V6->V7:
 - Fixed a typo (firts -> first)
V5->V6:
 - Fixed the $id that used the wrong compatible
 - Force passing all the per-queue interrupts
V4->V5:
 - Remove stray quotes around the ref property
 - Rename the binding to match the compatible string
V3->V4:
 - Fix a binding typo in the compatible string
V2->V3:
 - Cleanup on reset and clock names
V1->V2:
 - Fixed the example
 - Added reset and clocks
 - Removed generic ethernet attributes

 .../bindings/net/qcom,ipq4019-ess-edma.yaml   | 94 +++++++++++++++++++
 1 file changed, 94 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/qcom,ipq4019-ess-edma.yaml

diff --git a/Documentation/devicetree/bindings/net/qcom,ipq4019-ess-edma.yaml b/Documentation/devicetree/bindings/net/qcom,ipq4019-ess-edma.yaml
new file mode 100644
index 000000000000..a16e707a4694
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/qcom,ipq4019-ess-edma.yaml
@@ -0,0 +1,94 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/qcom,ipq4019-ess-edma.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Qualcomm IPQ ESS EDMA Ethernet Controller
+
+maintainers:
+  - Maxime Chevallier <maxime.chevallier@bootlin.com>
+
+allOf:
+  - $ref: ethernet-controller.yaml#
+
+properties:
+  compatible:
+    const: qcom,ipq4019-ess-edma
+
+  reg:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 32
+    description: One interrupt per tx and rx queue, the first 16 being for the
+                 rx queues and the last 16 for the tx queues
+
+  clocks:
+    maxItems: 1
+
+  resets:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+  - resets
+  - phy-mode
+
+unevaluatedProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/clock/qcom,gcc-ipq4019.h>
+    #include <dt-bindings/interrupt-controller/arm-gic.h>
+    #include <dt-bindings/interrupt-controller/irq.h>
+    gmac: ethernet@c080000 {
+        compatible = "qcom,ipq4019-ess-edma";
+        reg = <0xc080000 0x8000>;
+        resets = <&gcc ESS_RESET>;
+        clocks = <&gcc GCC_ESS_CLK>;
+        interrupts = <GIC_SPI  65 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  66 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  67 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  68 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  69 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  70 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  71 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  72 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  73 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  74 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  75 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  76 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  77 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  78 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  79 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  80 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 240 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 241 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 242 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 243 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 244 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 245 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 246 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 247 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 248 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 249 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 250 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 251 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 252 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 253 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 254 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 255 IRQ_TYPE_EDGE_RISING>;
+
+        phy-mode = "internal";
+        fixed-link {
+            speed = <1000>;
+            full-duplex;
+            pause;
+        };
+    };
+
+...
-- 
2.37.3


* [PATCH net-next v8 2/5] net: ipqess: introduce the Qualcomm IPQESS driver
  2022-11-04 17:41 [PATCH net-next v8 0/5] net: ipqess: introduce Qualcomm IPQESS driver Maxime Chevallier
  2022-11-04 17:41 ` [PATCH net-next v8 1/5] net: dt-bindings: Introduce the Qualcomm IPQESS Ethernet controller Maxime Chevallier
@ 2022-11-04 17:41 ` Maxime Chevallier
  2022-11-04 17:41 ` [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol Maxime Chevallier
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Maxime Chevallier @ 2022-11-04 17:41 UTC (permalink / raw)
  To: davem, Rob Herring, Krzysztof Kozlowski
  Cc: Maxime Chevallier, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	netdev, linux-kernel, devicetree, thomas.petazzoni, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit, Russell King,
	linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko,
	Andy Gross, Bjorn Andersson, Konrad Dybcio

The Qualcomm IPQESS controller is a simple 1G Ethernet controller found
on the IPQ4019 chip. This controller has one specificity, in that the
IPQ4019 platform that includes it also has an internal switch, based on
the QCA8K IP.

It is connected to that switch through an internal link, and doesn't
directly expose any external interface, hence it only supports
PHY_INTERFACE_MODE_INTERNAL for now.

It has 16 RX and TX queues, with a very basic RSS fanout configured at
init time.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---

V7->V8:
 - No changes
V6->V7:
 - Added proper endianness conversion to access descriptor fields
V5->V6:
 - Use the devm_platform_get_and_ioremap_resource() helper
 - Use the correct endianness for TX descriptors
 - Cleaned up the de-init sequence
V4->V5:
 - Reworked the NAPI loops
 - Used sizeof(*var) when possible
V3->V4:
 - Renamed the ethtool ksettings callbacks
 - Use ethtool_sprintf
 - Cache uses_dsa on CHANGEUPPER events
V2->V3:
 - Cleaned up ioctls and phylink support
 - Reworked the NAPI loop
 - Cleaned up some macros and warnings
V1->V2:
 - Reworked the init sequence, following Andrew's comments
 - Added clock and reset support
 - Reworked the error paths
 - Added extra endianness wrappers to fix sparse warnings


 MAINTAINERS                                   |    7 +
 drivers/net/ethernet/qualcomm/Kconfig         |   11 +
 drivers/net/ethernet/qualcomm/Makefile        |    2 +
 drivers/net/ethernet/qualcomm/ipqess/Makefile |    8 +
 drivers/net/ethernet/qualcomm/ipqess/ipqess.c | 1246 +++++++++++++++++
 drivers/net/ethernet/qualcomm/ipqess/ipqess.h |  518 +++++++
 .../ethernet/qualcomm/ipqess/ipqess_ethtool.c |  164 +++
 7 files changed, 1956 insertions(+)
 create mode 100644 drivers/net/ethernet/qualcomm/ipqess/Makefile
 create mode 100644 drivers/net/ethernet/qualcomm/ipqess/ipqess.c
 create mode 100644 drivers/net/ethernet/qualcomm/ipqess/ipqess.h
 create mode 100644 drivers/net/ethernet/qualcomm/ipqess/ipqess_ethtool.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 95fc5e1b4548..47588d4b1657 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17049,6 +17049,13 @@ L:	netdev@vger.kernel.org
 S:	Maintained
 F:	drivers/net/ethernet/qualcomm/emac/
 
+QUALCOMM IPQESS ETHERNET DRIVER
+M:	Maxime Chevallier <maxime.chevallier@bootlin.com>
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	Documentation/devicetree/bindings/net/qcom,ipq4019-ess-edma.yaml
+F:	drivers/net/ethernet/qualcomm/ipqess/
+
 QUALCOMM ETHQOS ETHERNET DRIVER
 M:	Vinod Koul <vkoul@kernel.org>
 R:	Bhupesh Sharma <bhupesh.sharma@linaro.org>
diff --git a/drivers/net/ethernet/qualcomm/Kconfig b/drivers/net/ethernet/qualcomm/Kconfig
index a4434eb38950..28861bca5a5b 100644
--- a/drivers/net/ethernet/qualcomm/Kconfig
+++ b/drivers/net/ethernet/qualcomm/Kconfig
@@ -60,6 +60,17 @@ config QCOM_EMAC
 	  low power, Receive-Side Scaling (RSS), and IEEE 1588-2008
 	  Precision Clock Synchronization Protocol.
 
+config QCOM_IPQ4019_ESS_EDMA
+	tristate "Qualcomm Atheros IPQ4019 ESS EDMA support"
+	depends on (OF && ARCH_QCOM) || COMPILE_TEST
+	select PHYLINK
+	help
+	  This driver supports the Qualcomm Atheros IPQ40xx built-in
+	  ESS EDMA ethernet controller.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called ipqess.
+
 source "drivers/net/ethernet/qualcomm/rmnet/Kconfig"
 
 endif # NET_VENDOR_QUALCOMM
diff --git a/drivers/net/ethernet/qualcomm/Makefile b/drivers/net/ethernet/qualcomm/Makefile
index 9250976dd884..db463c9ea1f9 100644
--- a/drivers/net/ethernet/qualcomm/Makefile
+++ b/drivers/net/ethernet/qualcomm/Makefile
@@ -11,4 +11,6 @@ qcauart-objs := qca_uart.o
 
 obj-y += emac/
 
+obj-$(CONFIG_QCOM_IPQ4019_ESS_EDMA) += ipqess/
+
 obj-$(CONFIG_RMNET) += rmnet/
diff --git a/drivers/net/ethernet/qualcomm/ipqess/Makefile b/drivers/net/ethernet/qualcomm/ipqess/Makefile
new file mode 100644
index 000000000000..4f2db7283ebf
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/ipqess/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the IPQ ESS driver
+#
+
+obj-$(CONFIG_QCOM_IPQ4019_ESS_EDMA) += ipq_ess.o
+
+ipq_ess-objs := ipqess.o ipqess_ethtool.o
diff --git a/drivers/net/ethernet/qualcomm/ipqess/ipqess.c b/drivers/net/ethernet/qualcomm/ipqess/ipqess.c
new file mode 100644
index 000000000000..df3f2ce77065
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/ipqess/ipqess.c
@@ -0,0 +1,1246 @@
+// SPDX-License-Identifier: GPL-2.0 OR ISC
+/* Copyright (c) 2014 - 2017, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2017 - 2018, John Crispin <john@phrozen.org>
+ * Copyright (c) 2018 - 2019, Christian Lamparter <chunkeey@gmail.com>
+ * Copyright (c) 2020 - 2021, Gabor Juhos <j4g8y7@gmail.com>
+ * Copyright (c) 2021 - 2022, Maxime Chevallier <maxime.chevallier@bootlin.com>
+ *
+ */
+
+#include <linux/bitfield.h>
+#include <linux/clk.h>
+#include <linux/if_vlan.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/phylink.h>
+#include <linux/platform_device.h>
+#include <linux/reset.h>
+#include <linux/skbuff.h>
+#include <linux/vmalloc.h>
+#include <net/checksum.h>
+#include <net/ip6_checksum.h>
+
+#include "ipqess.h"
+
+#define IPQESS_RRD_SIZE		16
+#define IPQESS_NEXT_IDX(X, Y)  (((X) + 1) & ((Y) - 1))
+#define IPQESS_TX_DMA_BUF_LEN	0x3fff
+
+static void ipqess_w32(struct ipqess *ess, u32 reg, u32 val)
+{
+	writel(val, ess->hw_addr + reg);
+}
+
+static u32 ipqess_r32(struct ipqess *ess, u16 reg)
+{
+	return readl(ess->hw_addr + reg);
+}
+
+static void ipqess_m32(struct ipqess *ess, u32 mask, u32 val, u16 reg)
+{
+	u32 _val = ipqess_r32(ess, reg);
+
+	_val &= ~mask;
+	_val |= val;
+
+	ipqess_w32(ess, reg, _val);
+}
+
+void ipqess_update_hw_stats(struct ipqess *ess)
+{
+	u32 *p;
+	u32 stat;
+	int i;
+
+	lockdep_assert_held(&ess->stats_lock);
+
+	p = (u32 *)&ess->ipqess_stats;
+	for (i = 0; i < IPQESS_MAX_TX_QUEUE; i++) {
+		stat = ipqess_r32(ess, IPQESS_REG_TX_STAT_PKT_Q(i));
+		*p += stat;
+		p++;
+	}
+
+	for (i = 0; i < IPQESS_MAX_TX_QUEUE; i++) {
+		stat = ipqess_r32(ess, IPQESS_REG_TX_STAT_BYTE_Q(i));
+		*p += stat;
+		p++;
+	}
+
+	for (i = 0; i < IPQESS_MAX_RX_QUEUE; i++) {
+		stat = ipqess_r32(ess, IPQESS_REG_RX_STAT_PKT_Q(i));
+		*p += stat;
+		p++;
+	}
+
+	for (i = 0; i < IPQESS_MAX_RX_QUEUE; i++) {
+		stat = ipqess_r32(ess, IPQESS_REG_RX_STAT_BYTE_Q(i));
+		*p += stat;
+		p++;
+	}
+}
+
+static int ipqess_tx_ring_alloc(struct ipqess *ess)
+{
+	struct device *dev = &ess->pdev->dev;
+	int i;
+
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		struct ipqess_tx_ring *tx_ring = &ess->tx_ring[i];
+		size_t size;
+		u32 idx;
+
+		tx_ring->ess = ess;
+		tx_ring->ring_id = i;
+		tx_ring->idx = i * 4;
+		tx_ring->count = IPQESS_TX_RING_SIZE;
+		tx_ring->nq = netdev_get_tx_queue(ess->netdev, i);
+
+		size = sizeof(struct ipqess_buf) * IPQESS_TX_RING_SIZE;
+		tx_ring->buf = devm_kzalloc(dev, size, GFP_KERNEL);
+		if (!tx_ring->buf)
+			return -ENOMEM;
+
+		size = sizeof(struct ipqess_tx_desc) * IPQESS_TX_RING_SIZE;
+		tx_ring->hw_desc = dmam_alloc_coherent(dev, size, &tx_ring->dma,
+						       GFP_KERNEL);
+		if (!tx_ring->hw_desc)
+			return -ENOMEM;
+
+		ipqess_w32(ess, IPQESS_REG_TPD_BASE_ADDR_Q(tx_ring->idx),
+			   (u32)tx_ring->dma);
+
+		idx = ipqess_r32(ess, IPQESS_REG_TPD_IDX_Q(tx_ring->idx));
+		idx >>= IPQESS_TPD_CONS_IDX_SHIFT; /* need u32 here */
+		idx &= 0xffff;
+		tx_ring->head = idx;
+		tx_ring->tail = idx;
+
+		ipqess_m32(ess, IPQESS_TPD_PROD_IDX_MASK << IPQESS_TPD_PROD_IDX_SHIFT,
+			   idx, IPQESS_REG_TPD_IDX_Q(tx_ring->idx));
+		ipqess_w32(ess, IPQESS_REG_TX_SW_CONS_IDX_Q(tx_ring->idx), idx);
+		ipqess_w32(ess, IPQESS_REG_TPD_RING_SIZE, IPQESS_TX_RING_SIZE);
+	}
+
+	return 0;
+}
+
+static int ipqess_tx_unmap_and_free(struct device *dev, struct ipqess_buf *buf)
+{
+	int len = 0;
+
+	if (buf->flags & IPQESS_DESC_SINGLE)
+		dma_unmap_single(dev, buf->dma,	buf->length, DMA_TO_DEVICE);
+	else if (buf->flags & IPQESS_DESC_PAGE)
+		dma_unmap_page(dev, buf->dma, buf->length, DMA_TO_DEVICE);
+
+	if (buf->flags & IPQESS_DESC_LAST) {
+		len = buf->skb->len;
+		dev_kfree_skb_any(buf->skb);
+	}
+
+	buf->flags = 0;
+
+	return len;
+}
+
+static void ipqess_tx_ring_free(struct ipqess *ess)
+{
+	int i;
+
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		int j;
+
+		if (ess->tx_ring[i].hw_desc)
+			continue;
+
+		for (j = 0; j < IPQESS_TX_RING_SIZE; j++) {
+			struct ipqess_buf *buf = &ess->tx_ring[i].buf[j];
+
+			ipqess_tx_unmap_and_free(&ess->pdev->dev, buf);
+		}
+
+		ess->tx_ring[i].buf = NULL;
+	}
+}
+
+static int ipqess_rx_buf_prepare(struct ipqess_buf *buf,
+				 struct ipqess_rx_ring *rx_ring)
+{
+	memset(buf->skb->data, 0, sizeof(struct ipqess_rx_desc));
+
+	buf->dma = dma_map_single(rx_ring->ppdev, buf->skb->data,
+				  IPQESS_RX_HEAD_BUFF_SIZE, DMA_FROM_DEVICE);
+	if (dma_mapping_error(rx_ring->ppdev, buf->dma)) {
+		dev_kfree_skb_any(buf->skb);
+		buf->skb = NULL;
+		return -EFAULT;
+	}
+
+	buf->length = IPQESS_RX_HEAD_BUFF_SIZE;
+	rx_ring->hw_desc[rx_ring->head] = (struct ipqess_rx_desc *)buf->dma;
+	rx_ring->head = (rx_ring->head + 1) % IPQESS_RX_RING_SIZE;
+
+	ipqess_m32(rx_ring->ess, IPQESS_RFD_PROD_IDX_BITS,
+		   (rx_ring->head + IPQESS_RX_RING_SIZE - 1) % IPQESS_RX_RING_SIZE,
+		   IPQESS_REG_RFD_IDX_Q(rx_ring->idx));
+
+	return 0;
+}
+
+/* locking is handled by the caller */
+static int ipqess_rx_buf_alloc_napi(struct ipqess_rx_ring *rx_ring)
+{
+	struct ipqess_buf *buf = &rx_ring->buf[rx_ring->head];
+
+	buf->skb = napi_alloc_skb(&rx_ring->napi_rx, IPQESS_RX_HEAD_BUFF_SIZE);
+	if (!buf->skb)
+		return -ENOMEM;
+
+	return ipqess_rx_buf_prepare(buf, rx_ring);
+}
+
+static int ipqess_rx_buf_alloc(struct ipqess_rx_ring *rx_ring)
+{
+	struct ipqess_buf *buf = &rx_ring->buf[rx_ring->head];
+
+	buf->skb = netdev_alloc_skb_ip_align(rx_ring->ess->netdev,
+					     IPQESS_RX_HEAD_BUFF_SIZE);
+
+	if (!buf->skb)
+		return -ENOMEM;
+
+	return ipqess_rx_buf_prepare(buf, rx_ring);
+}
+
+static void ipqess_refill_work(struct work_struct *work)
+{
+	struct ipqess_rx_ring_refill *rx_refill = container_of(work,
+		struct ipqess_rx_ring_refill, refill_work);
+	struct ipqess_rx_ring *rx_ring = rx_refill->rx_ring;
+	int refill = 0;
+
+	/* don't let this loop by accident. */
+	while (atomic_dec_and_test(&rx_ring->refill_count)) {
+		napi_disable(&rx_ring->napi_rx);
+		if (ipqess_rx_buf_alloc(rx_ring)) {
+			refill++;
+			dev_dbg(rx_ring->ppdev,
+				"Not all buffers were reallocated");
+		}
+		napi_enable(&rx_ring->napi_rx);
+	}
+
+	if (atomic_add_return(refill, &rx_ring->refill_count))
+		schedule_work(&rx_refill->refill_work);
+}
+
+static int ipqess_rx_ring_alloc(struct ipqess *ess)
+{
+	int i;
+
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		int j;
+
+		ess->rx_ring[i].ess = ess;
+		ess->rx_ring[i].ppdev = &ess->pdev->dev;
+		ess->rx_ring[i].ring_id = i;
+		ess->rx_ring[i].idx = i * 2;
+
+		ess->rx_ring[i].buf = devm_kzalloc(&ess->pdev->dev,
+						   sizeof(struct ipqess_buf) * IPQESS_RX_RING_SIZE,
+						   GFP_KERNEL);
+
+		if (!ess->rx_ring[i].buf)
+			return -ENOMEM;
+
+		ess->rx_ring[i].hw_desc =
+			dmam_alloc_coherent(&ess->pdev->dev,
+					    sizeof(struct ipqess_rx_desc) * IPQESS_RX_RING_SIZE,
+					    &ess->rx_ring[i].dma, GFP_KERNEL);
+
+		if (!ess->rx_ring[i].hw_desc)
+			return -ENOMEM;
+
+		for (j = 0; j < IPQESS_RX_RING_SIZE; j++)
+			if (ipqess_rx_buf_alloc(&ess->rx_ring[i]) < 0)
+				return -ENOMEM;
+
+		ess->rx_refill[i].rx_ring = &ess->rx_ring[i];
+		INIT_WORK(&ess->rx_refill[i].refill_work, ipqess_refill_work);
+
+		ipqess_w32(ess, IPQESS_REG_RFD_BASE_ADDR_Q(ess->rx_ring[i].idx),
+			   (u32)(ess->rx_ring[i].dma));
+	}
+
+	ipqess_w32(ess, IPQESS_REG_RX_DESC0,
+		   (IPQESS_RX_HEAD_BUFF_SIZE << IPQESS_RX_BUF_SIZE_SHIFT) |
+		   (IPQESS_RX_RING_SIZE << IPQESS_RFD_RING_SIZE_SHIFT));
+
+	return 0;
+}
+
+static void ipqess_rx_ring_free(struct ipqess *ess)
+{
+	int i;
+
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		int j;
+
+		cancel_work_sync(&ess->rx_refill[i].refill_work);
+		atomic_set(&ess->rx_ring[i].refill_count, 0);
+
+		for (j = 0; j < IPQESS_RX_RING_SIZE; j++) {
+			dma_unmap_single(&ess->pdev->dev,
+					 ess->rx_ring[i].buf[j].dma,
+					 ess->rx_ring[i].buf[j].length,
+					 DMA_FROM_DEVICE);
+			dev_kfree_skb_any(ess->rx_ring[i].buf[j].skb);
+		}
+	}
+}
+
+static struct net_device_stats *ipqess_get_stats(struct net_device *netdev)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+
+	spin_lock(&ess->stats_lock);
+	ipqess_update_hw_stats(ess);
+	spin_unlock(&ess->stats_lock);
+
+	return &ess->stats;
+}
+
+static int ipqess_rx_poll(struct ipqess_rx_ring *rx_ring, int budget)
+{
+	u32 length = 0, num_desc, tail, rx_ring_tail;
+	int done = 0;
+
+	rx_ring_tail = rx_ring->tail;
+
+	tail = ipqess_r32(rx_ring->ess, IPQESS_REG_RFD_IDX_Q(rx_ring->idx));
+	tail >>= IPQESS_RFD_CONS_IDX_SHIFT;
+	tail &= IPQESS_RFD_CONS_IDX_MASK;
+
+	while (done < budget) {
+		struct ipqess_rx_desc *rd;
+		struct sk_buff *skb;
+
+		if (rx_ring_tail == tail)
+			break;
+
+		dma_unmap_single(rx_ring->ppdev,
+				 rx_ring->buf[rx_ring_tail].dma,
+				 rx_ring->buf[rx_ring_tail].length,
+				 DMA_FROM_DEVICE);
+
+		skb = xchg(&rx_ring->buf[rx_ring_tail].skb, NULL);
+		rd = (struct ipqess_rx_desc *)skb->data;
+		rx_ring_tail = IPQESS_NEXT_IDX(rx_ring_tail, IPQESS_RX_RING_SIZE);
+
+		/* Check if RRD is valid */
+		if (!(rd->rrd7 & cpu_to_le16(IPQESS_RRD_DESC_VALID))) {
+			num_desc = 1;
+			dev_kfree_skb_any(skb);
+			goto skip;
+		}
+
+		num_desc = le16_to_cpu(rd->rrd1) & IPQESS_RRD_NUM_RFD_MASK;
+		length = le16_to_cpu(rd->rrd6) & IPQESS_RRD_PKT_SIZE_MASK;
+
+		skb_reserve(skb, IPQESS_RRD_SIZE);
+		if (num_desc > 1) {
+			struct sk_buff *skb_prev = NULL;
+			int size_remaining;
+			int i;
+
+			skb->data_len = 0;
+			skb->tail += (IPQESS_RX_HEAD_BUFF_SIZE - IPQESS_RRD_SIZE);
+			skb->len = length;
+			skb->truesize = length;
+			size_remaining = length - (IPQESS_RX_HEAD_BUFF_SIZE - IPQESS_RRD_SIZE);
+
+			for (i = 1; i < num_desc; i++) {
+				struct sk_buff *skb_temp = rx_ring->buf[rx_ring_tail].skb;
+
+				dma_unmap_single(rx_ring->ppdev,
+						 rx_ring->buf[rx_ring_tail].dma,
+						 rx_ring->buf[rx_ring_tail].length,
+						 DMA_FROM_DEVICE);
+
+				skb_put(skb_temp, min(size_remaining, IPQESS_RX_HEAD_BUFF_SIZE));
+				if (skb_prev)
+					skb_prev->next = rx_ring->buf[rx_ring_tail].skb;
+				else
+					skb_shinfo(skb)->frag_list = rx_ring->buf[rx_ring_tail].skb;
+				skb_prev = rx_ring->buf[rx_ring_tail].skb;
+				rx_ring->buf[rx_ring_tail].skb->next = NULL;
+
+				skb->data_len += rx_ring->buf[rx_ring_tail].skb->len;
+				size_remaining -= rx_ring->buf[rx_ring_tail].skb->len;
+
+				rx_ring_tail = IPQESS_NEXT_IDX(rx_ring_tail, IPQESS_RX_RING_SIZE);
+			}
+
+		} else {
+			skb_put(skb, length);
+		}
+
+		skb->dev = rx_ring->ess->netdev;
+		skb->protocol = eth_type_trans(skb, rx_ring->ess->netdev);
+		skb_record_rx_queue(skb, rx_ring->ring_id);
+
+		if (rd->rrd6 & cpu_to_le16(IPQESS_RRD_CSUM_FAIL_MASK))
+			skb_checksum_none_assert(skb);
+		else
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+		if (rd->rrd7 & cpu_to_le16(IPQESS_RRD_CVLAN))
+			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
+					       le16_to_cpu(rd->rrd4));
+		else if (rd->rrd1 & cpu_to_le16(IPQESS_RRD_SVLAN))
+			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021AD),
+					       le16_to_cpu(rd->rrd4));
+
+		napi_gro_receive(&rx_ring->napi_rx, skb);
+
+		rx_ring->ess->stats.rx_packets++;
+		rx_ring->ess->stats.rx_bytes += length;
+
+		done++;
+skip:
+
+		num_desc += atomic_xchg(&rx_ring->refill_count, 0);
+		while (num_desc) {
+			if (ipqess_rx_buf_alloc_napi(rx_ring)) {
+				num_desc = atomic_add_return(num_desc,
+							     &rx_ring->refill_count);
+				if (num_desc >= DIV_ROUND_UP(IPQESS_RX_RING_SIZE * 4, 7))
+					schedule_work(&rx_ring->ess->rx_refill[rx_ring->ring_id].refill_work);
+				break;
+			}
+			num_desc--;
+		}
+	}
+
+	ipqess_w32(rx_ring->ess, IPQESS_REG_RX_SW_CONS_IDX_Q(rx_ring->idx),
+		   rx_ring_tail);
+	rx_ring->tail = rx_ring_tail;
+
+	return done;
+}
+
+static int ipqess_tx_complete(struct ipqess_tx_ring *tx_ring, int budget)
+{
+	int total = 0, ret;
+	int done = 0;
+	u32 tail;
+
+	tail = ipqess_r32(tx_ring->ess, IPQESS_REG_TPD_IDX_Q(tx_ring->idx));
+	tail >>= IPQESS_TPD_CONS_IDX_SHIFT;
+	tail &= IPQESS_TPD_CONS_IDX_MASK;
+
+	do {
+		ret = ipqess_tx_unmap_and_free(&tx_ring->ess->pdev->dev,
+					       &tx_ring->buf[tx_ring->tail]);
+		tx_ring->tail = IPQESS_NEXT_IDX(tx_ring->tail, tx_ring->count);
+
+		total += ret;
+	} while ((++done < budget) && (tx_ring->tail != tail));
+
+	ipqess_w32(tx_ring->ess, IPQESS_REG_TX_SW_CONS_IDX_Q(tx_ring->idx),
+		   tx_ring->tail);
+
+	if (netif_tx_queue_stopped(tx_ring->nq)) {
+		netdev_dbg(tx_ring->ess->netdev, "waking up tx queue %d\n",
+			   tx_ring->idx);
+		netif_tx_wake_queue(tx_ring->nq);
+	}
+
+	netdev_tx_completed_queue(tx_ring->nq, done, total);
+
+	return done;
+}
+
+static int ipqess_tx_napi(struct napi_struct *napi, int budget)
+{
+	struct ipqess_tx_ring *tx_ring = container_of(napi, struct ipqess_tx_ring,
+						    napi_tx);
+	int work_done = 0;
+	u32 tx_status;
+
+	tx_status = ipqess_r32(tx_ring->ess, IPQESS_REG_TX_ISR);
+	tx_status &= BIT(tx_ring->idx);
+
+	work_done = ipqess_tx_complete(tx_ring, budget);
+
+	ipqess_w32(tx_ring->ess, IPQESS_REG_TX_ISR, tx_status);
+
+	if (likely(work_done < budget)) {
+		if (napi_complete_done(napi, work_done))
+			ipqess_w32(tx_ring->ess,
+				   IPQESS_REG_TX_INT_MASK_Q(tx_ring->idx), 0x1);
+	}
+
+	return work_done;
+}
+
+static int ipqess_rx_napi(struct napi_struct *napi, int budget)
+{
+	struct ipqess_rx_ring *rx_ring = container_of(napi, struct ipqess_rx_ring,
+						    napi_rx);
+	struct ipqess *ess = rx_ring->ess;
+	u32 rx_mask = BIT(rx_ring->idx);
+	int remaining_budget = budget;
+	int rx_done;
+	u32 status;
+
+	do {
+		ipqess_w32(ess, IPQESS_REG_RX_ISR, rx_mask);
+		rx_done = ipqess_rx_poll(rx_ring, remaining_budget);
+		remaining_budget -= rx_done;
+
+		status = ipqess_r32(ess, IPQESS_REG_RX_ISR);
+	} while (remaining_budget > 0 && (status & rx_mask));
+
+	if (remaining_budget <= 0)
+		return budget;
+
+	if (napi_complete_done(napi, budget - remaining_budget))
+		ipqess_w32(ess, IPQESS_REG_RX_INT_MASK_Q(rx_ring->idx), 0x1);
+
+	return budget - remaining_budget;
+}
+
+static irqreturn_t ipqess_interrupt_tx(int irq, void *priv)
+{
+	struct ipqess_tx_ring *tx_ring = (struct ipqess_tx_ring *)priv;
+
+	if (likely(napi_schedule_prep(&tx_ring->napi_tx))) {
+		__napi_schedule(&tx_ring->napi_tx);
+		ipqess_w32(tx_ring->ess, IPQESS_REG_TX_INT_MASK_Q(tx_ring->idx),
+			   0x0);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t ipqess_interrupt_rx(int irq, void *priv)
+{
+	struct ipqess_rx_ring *rx_ring = (struct ipqess_rx_ring *)priv;
+
+	if (likely(napi_schedule_prep(&rx_ring->napi_rx))) {
+		__napi_schedule(&rx_ring->napi_rx);
+		ipqess_w32(rx_ring->ess, IPQESS_REG_RX_INT_MASK_Q(rx_ring->idx),
+			   0x0);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void ipqess_irq_enable(struct ipqess *ess)
+{
+	int i;
+
+	ipqess_w32(ess, IPQESS_REG_RX_ISR, 0xff);
+	ipqess_w32(ess, IPQESS_REG_TX_ISR, 0xffff);
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		ipqess_w32(ess, IPQESS_REG_RX_INT_MASK_Q(ess->rx_ring[i].idx), 1);
+		ipqess_w32(ess, IPQESS_REG_TX_INT_MASK_Q(ess->tx_ring[i].idx), 1);
+	}
+}
+
+static void ipqess_irq_disable(struct ipqess *ess)
+{
+	int i;
+
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		ipqess_w32(ess, IPQESS_REG_RX_INT_MASK_Q(ess->rx_ring[i].idx), 0);
+		ipqess_w32(ess, IPQESS_REG_TX_INT_MASK_Q(ess->tx_ring[i].idx), 0);
+	}
+}
+
+static int __init ipqess_init(struct net_device *netdev)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+	struct device_node *of_node = ess->pdev->dev.of_node;
+	int ret;
+
+	ret = of_get_ethdev_address(of_node, netdev);
+	if (ret)
+		eth_hw_addr_random(netdev);
+
+	return phylink_of_phy_connect(ess->phylink, of_node, 0);
+}
+
+static void ipqess_uninit(struct net_device *netdev)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+
+	phylink_disconnect_phy(ess->phylink);
+}
+
+static int ipqess_open(struct net_device *netdev)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+	int i, err;
+
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		int qid;
+
+		qid = ess->tx_ring[i].idx;
+		err = devm_request_irq(&netdev->dev, ess->tx_irq[qid],
+				       ipqess_interrupt_tx, 0,
+				       ess->tx_irq_names[qid],
+				       &ess->tx_ring[i]);
+		if (err)
+			return err;
+
+		qid = ess->rx_ring[i].idx;
+		err = devm_request_irq(&netdev->dev, ess->rx_irq[qid],
+				       ipqess_interrupt_rx, 0,
+				       ess->rx_irq_names[qid],
+				       &ess->rx_ring[i]);
+		if (err)
+			return err;
+
+		napi_enable(&ess->tx_ring[i].napi_tx);
+		napi_enable(&ess->rx_ring[i].napi_rx);
+	}
+
+	ipqess_irq_enable(ess);
+	phylink_start(ess->phylink);
+	netif_tx_start_all_queues(netdev);
+
+	return 0;
+}
+
+static int ipqess_stop(struct net_device *netdev)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+	int i;
+
+	netif_tx_stop_all_queues(netdev);
+	phylink_stop(ess->phylink);
+	ipqess_irq_disable(ess);
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		napi_disable(&ess->tx_ring[i].napi_tx);
+		napi_disable(&ess->rx_ring[i].napi_rx);
+	}
+
+	return 0;
+}
+
+static int ipqess_do_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+
+	return phylink_mii_ioctl(ess->phylink, ifr, cmd);
+}
+
+static u16 ipqess_tx_desc_available(struct ipqess_tx_ring *tx_ring)
+{
+	u16 count = 0;
+
+	if (tx_ring->tail <= tx_ring->head)
+		count = IPQESS_TX_RING_SIZE;
+
+	count += tx_ring->tail - tx_ring->head - 1;
+
+	return count;
+}
+
+static int ipqess_cal_txd_req(struct sk_buff *skb)
+{
+	int tpds;
+
+	/* one TPD for the header, and one for each fragment */
+	tpds = 1 + skb_shinfo(skb)->nr_frags;
+	if (skb_is_gso(skb) && skb_is_gso_v6(skb)) {
+		/* for LSOv2 one extra TPD is needed */
+		tpds++;
+	}
+
+	return tpds;
+}
+
+static struct ipqess_buf *ipqess_get_tx_buffer(struct ipqess_tx_ring *tx_ring,
+					       struct ipqess_tx_desc *desc)
+{
+	return &tx_ring->buf[desc - tx_ring->hw_desc];
+}
+
+static struct ipqess_tx_desc *ipqess_tx_desc_next(struct ipqess_tx_ring *tx_ring)
+{
+	struct ipqess_tx_desc *desc;
+
+	desc = &tx_ring->hw_desc[tx_ring->head];
+	tx_ring->head = IPQESS_NEXT_IDX(tx_ring->head, tx_ring->count);
+
+	return desc;
+}
+
+static void ipqess_rollback_tx(struct ipqess *eth,
+			       struct ipqess_tx_desc *first_desc, int ring_id)
+{
+	struct ipqess_tx_ring *tx_ring = &eth->tx_ring[ring_id];
+	struct ipqess_tx_desc *desc = NULL;
+	struct ipqess_buf *buf;
+	u16 start_index, index;
+
+	start_index = first_desc - tx_ring->hw_desc;
+
+	index = start_index;
+	while (index != tx_ring->head) {
+		desc = &tx_ring->hw_desc[index];
+		buf = &tx_ring->buf[index];
+		ipqess_tx_unmap_and_free(&eth->pdev->dev, buf);
+		memset(desc, 0, sizeof(*desc));
+		if (++index == tx_ring->count)
+			index = 0;
+	}
+	tx_ring->head = start_index;
+}
+
+static int ipqess_tx_map_and_fill(struct ipqess_tx_ring *tx_ring,
+				  struct sk_buff *skb)
+{
+	struct ipqess_tx_desc *desc = NULL, *first_desc = NULL;
+	u32 word1 = 0, word3 = 0, lso_word1 = 0, svlan_tag = 0;
+	struct platform_device *pdev = tx_ring->ess->pdev;
+	struct ipqess_buf *buf = NULL;
+	u16 len;
+	int i;
+
+	if (skb_is_gso(skb)) {
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
+			lso_word1 |= IPQESS_TPD_IPV4_EN;
+			ip_hdr(skb)->check = 0;
+			tcp_hdr(skb)->check = ~csum_tcpudp_magic(ip_hdr(skb)->saddr,
+								 ip_hdr(skb)->daddr,
+								 0, IPPROTO_TCP, 0);
+		} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
+			lso_word1 |= IPQESS_TPD_LSO_V2_EN;
+			ipv6_hdr(skb)->payload_len = 0;
+			tcp_hdr(skb)->check = ~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
+							       &ipv6_hdr(skb)->daddr,
+							       0, IPPROTO_TCP, 0);
+		}
+
+		lso_word1 |= IPQESS_TPD_LSO_EN |
+			     ((skb_shinfo(skb)->gso_size & IPQESS_TPD_MSS_MASK) <<
+							   IPQESS_TPD_MSS_SHIFT) |
+			     (skb_transport_offset(skb) << IPQESS_TPD_HDR_SHIFT);
+	} else if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
+		u8 css, cso;
+
+		cso = skb_checksum_start_offset(skb);
+		css = cso + skb->csum_offset;
+
+		word1 |= (IPQESS_TPD_CUSTOM_CSUM_EN);
+		word1 |= (cso >> 1) << IPQESS_TPD_HDR_SHIFT;
+		word1 |= ((css >> 1) << IPQESS_TPD_CUSTOM_CSUM_SHIFT);
+	}
+
+	if (skb_vlan_tag_present(skb)) {
+		switch (skb->vlan_proto) {
+		case htons(ETH_P_8021Q):
+			word3 |= BIT(IPQESS_TX_INS_CVLAN);
+			word3 |= skb_vlan_tag_get(skb) << IPQESS_TX_CVLAN_TAG_SHIFT;
+			break;
+		case htons(ETH_P_8021AD):
+			word1 |= BIT(IPQESS_TX_INS_SVLAN);
+			svlan_tag = skb_vlan_tag_get(skb);
+			break;
+		default:
+			dev_err(&pdev->dev, "no ctag or stag present\n");
+			goto vlan_tag_error;
+		}
+	}
+
+	if (eth_type_vlan(skb->protocol))
+		word1 |= IPQESS_TPD_VLAN_TAGGED;
+
+	if (skb->protocol == htons(ETH_P_PPP_SES))
+		word1 |= IPQESS_TPD_PPPOE_EN;
+
+	len = skb_headlen(skb);
+
+	first_desc = ipqess_tx_desc_next(tx_ring);
+	desc = first_desc;
+	if (lso_word1 & IPQESS_TPD_LSO_V2_EN) {
+		desc->addr = cpu_to_le32(skb->len);
+		desc->word1 = cpu_to_le32(word1 | lso_word1);
+		desc->svlan_tag = cpu_to_le16(svlan_tag);
+		desc->word3 = cpu_to_le32(word3);
+		desc = ipqess_tx_desc_next(tx_ring);
+	}
+
+	buf = ipqess_get_tx_buffer(tx_ring, desc);
+	buf->length = len;
+	buf->dma = dma_map_single(&pdev->dev, skb->data, len, DMA_TO_DEVICE);
+
+	if (dma_mapping_error(&pdev->dev, buf->dma))
+		goto dma_error;
+
+	desc->addr = cpu_to_le32(buf->dma);
+	desc->len  = cpu_to_le16(len);
+
+	buf->flags |= IPQESS_DESC_SINGLE;
+	desc->word1 = cpu_to_le32(word1 | lso_word1);
+	desc->svlan_tag = cpu_to_le16(svlan_tag);
+	desc->word3 = cpu_to_le32(word3);
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+		len = skb_frag_size(frag);
+		desc = ipqess_tx_desc_next(tx_ring);
+		buf = ipqess_get_tx_buffer(tx_ring, desc);
+		buf->length = len;
+		buf->flags |= IPQESS_DESC_PAGE;
+		buf->dma = skb_frag_dma_map(&pdev->dev, frag, 0, len,
+					    DMA_TO_DEVICE);
+
+		if (dma_mapping_error(&pdev->dev, buf->dma))
+			goto dma_error;
+
+		desc->addr = cpu_to_le32(buf->dma);
+		desc->len  = cpu_to_le16(len);
+		desc->svlan_tag = cpu_to_le16(svlan_tag);
+		desc->word1 = cpu_to_le32(word1 | lso_word1);
+		desc->word3 = cpu_to_le32(word3);
+	}
+	desc->word1 |= cpu_to_le32(1 << IPQESS_TPD_EOP_SHIFT);
+	buf->skb = skb;
+	buf->flags |= IPQESS_DESC_LAST;
+
+	return 0;
+
+dma_error:
+	ipqess_rollback_tx(tx_ring->ess, first_desc, tx_ring->ring_id);
+	dev_err(&pdev->dev, "TX DMA map failed\n");
+
+vlan_tag_error:
+	return -ENOMEM;
+}
+
+static void ipqess_kick_tx(struct ipqess_tx_ring *tx_ring)
+{
+	/* Ensure that all TPDs have been written completely */
+	dma_wmb();
+
+	/* update software producer index */
+	ipqess_w32(tx_ring->ess, IPQESS_REG_TPD_IDX_Q(tx_ring->idx),
+		   tx_ring->head);
+}
+
+static netdev_tx_t ipqess_xmit(struct sk_buff *skb, struct net_device *netdev)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+	struct ipqess_tx_ring *tx_ring;
+	int avail;
+	int tx_num;
+	int ret;
+
+	tx_ring = &ess->tx_ring[skb_get_queue_mapping(skb)];
+	tx_num = ipqess_cal_txd_req(skb);
+	avail = ipqess_tx_desc_available(tx_ring);
+	if (avail < tx_num) {
+		netdev_dbg(netdev,
+			   "stopping tx queue %d, avail=%d req=%d im=%x\n",
+			   tx_ring->idx, avail, tx_num,
+			   ipqess_r32(tx_ring->ess,
+				      IPQESS_REG_TX_INT_MASK_Q(tx_ring->idx)));
+		netif_tx_stop_queue(tx_ring->nq);
+		ipqess_w32(tx_ring->ess, IPQESS_REG_TX_INT_MASK_Q(tx_ring->idx), 0x1);
+		ipqess_kick_tx(tx_ring);
+		return NETDEV_TX_BUSY;
+	}
+
+	ret = ipqess_tx_map_and_fill(tx_ring, skb);
+	if (ret) {
+		dev_kfree_skb_any(skb);
+		ess->stats.tx_errors++;
+		goto err_out;
+	}
+
+	ess->stats.tx_packets++;
+	ess->stats.tx_bytes += skb->len;
+	netdev_tx_sent_queue(tx_ring->nq, skb->len);
+
+	if (!netdev_xmit_more() || netif_xmit_stopped(tx_ring->nq))
+		ipqess_kick_tx(tx_ring);
+
+err_out:
+	return NETDEV_TX_OK;
+}
+
+static int ipqess_set_mac_address(struct net_device *netdev, void *p)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+	const char *macaddr = netdev->dev_addr;
+	int ret = eth_mac_addr(netdev, p);
+
+	if (ret)
+		return ret;
+
+	ipqess_w32(ess, IPQESS_REG_MAC_CTRL1, (macaddr[0] << 8) | macaddr[1]);
+	ipqess_w32(ess, IPQESS_REG_MAC_CTRL0,
+		   (macaddr[2] << 24) | (macaddr[3] << 16) | (macaddr[4] << 8) |
+		    macaddr[5]);
+
+	return 0;
+}
+
+static void ipqess_tx_timeout(struct net_device *netdev, unsigned int txq_id)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+	struct ipqess_tx_ring *tr = &ess->tx_ring[txq_id];
+
+	netdev_warn(netdev, "TX timeout on queue %d\n", tr->idx);
+}
+
+static const struct net_device_ops ipqess_axi_netdev_ops = {
+	.ndo_init		= ipqess_init,
+	.ndo_uninit		= ipqess_uninit,
+	.ndo_open		= ipqess_open,
+	.ndo_stop		= ipqess_stop,
+	.ndo_do_ioctl		= ipqess_do_ioctl,
+	.ndo_start_xmit		= ipqess_xmit,
+	.ndo_get_stats		= ipqess_get_stats,
+	.ndo_set_mac_address	= ipqess_set_mac_address,
+	.ndo_tx_timeout		= ipqess_tx_timeout,
+};
+
+static void ipqess_hw_stop(struct ipqess *ess)
+{
+	int i;
+
+	/* disable all RX queue IRQs */
+	for (i = 0; i < IPQESS_MAX_RX_QUEUE; i++)
+		ipqess_w32(ess, IPQESS_REG_RX_INT_MASK_Q(i), 0);
+
+	/* disable all TX queue IRQs */
+	for (i = 0; i < IPQESS_MAX_TX_QUEUE; i++)
+		ipqess_w32(ess, IPQESS_REG_TX_INT_MASK_Q(i), 0);
+
+	/* disable all other IRQs */
+	ipqess_w32(ess, IPQESS_REG_MISC_IMR, 0);
+	ipqess_w32(ess, IPQESS_REG_WOL_IMR, 0);
+
+	/* clear the IRQ status registers */
+	ipqess_w32(ess, IPQESS_REG_RX_ISR, 0xff);
+	ipqess_w32(ess, IPQESS_REG_TX_ISR, 0xffff);
+	ipqess_w32(ess, IPQESS_REG_MISC_ISR, 0x1fff);
+	ipqess_w32(ess, IPQESS_REG_WOL_ISR, 0x1);
+	ipqess_w32(ess, IPQESS_REG_WOL_CTRL, 0);
+
+	/* disable RX and TX queues */
+	ipqess_m32(ess, IPQESS_RXQ_CTRL_EN_MASK, 0, IPQESS_REG_RXQ_CTRL);
+	ipqess_m32(ess, IPQESS_TXQ_CTRL_TXQ_EN, 0, IPQESS_REG_TXQ_CTRL);
+}
+
+static int ipqess_hw_init(struct ipqess *ess)
+{
+	int i, err;
+	u32 tmp;
+
+	ipqess_hw_stop(ess);
+
+	ipqess_m32(ess, BIT(IPQESS_INTR_SW_IDX_W_TYP_SHIFT),
+		   IPQESS_INTR_SW_IDX_W_TYPE << IPQESS_INTR_SW_IDX_W_TYP_SHIFT,
+		   IPQESS_REG_INTR_CTRL);
+
+	/* enable IRQ delay slot */
+	ipqess_w32(ess, IPQESS_REG_IRQ_MODRT_TIMER_INIT,
+		   (IPQESS_TX_IMT << IPQESS_IRQ_MODRT_TX_TIMER_SHIFT) |
+		   (IPQESS_RX_IMT << IPQESS_IRQ_MODRT_RX_TIMER_SHIFT));
+
+	/* Set Customer and Service VLAN TPIDs */
+	ipqess_w32(ess, IPQESS_REG_VLAN_CFG,
+		   (ETH_P_8021Q << IPQESS_VLAN_CFG_CVLAN_TPID_SHIFT) |
+		   (ETH_P_8021AD << IPQESS_VLAN_CFG_SVLAN_TPID_SHIFT));
+
+	/* Configure the TX Queue bursting */
+	ipqess_w32(ess, IPQESS_REG_TXQ_CTRL,
+		   (IPQESS_TPD_BURST << IPQESS_TXQ_NUM_TPD_BURST_SHIFT) |
+		   (IPQESS_TXF_BURST << IPQESS_TXQ_TXF_BURST_NUM_SHIFT) |
+		   IPQESS_TXQ_CTRL_TPD_BURST_EN);
+
+	/* Set RSS type */
+	ipqess_w32(ess, IPQESS_REG_RSS_TYPE,
+		   IPQESS_RSS_TYPE_IPV4TCP | IPQESS_RSS_TYPE_IPV6_TCP |
+		   IPQESS_RSS_TYPE_IPV4_UDP | IPQESS_RSS_TYPE_IPV6UDP |
+		   IPQESS_RSS_TYPE_IPV4 | IPQESS_RSS_TYPE_IPV6);
+
+	/* Set RFD ring burst and threshold */
+	ipqess_w32(ess, IPQESS_REG_RX_DESC1,
+		   (IPQESS_RFD_BURST << IPQESS_RXQ_RFD_BURST_NUM_SHIFT) |
+		   (IPQESS_RFD_THR << IPQESS_RXQ_RFD_PF_THRESH_SHIFT) |
+		   (IPQESS_RFD_LTHR << IPQESS_RXQ_RFD_LOW_THRESH_SHIFT));
+
+	/* Set Rx FIFO
+	 * - threshold to start to DMA data to host
+	 */
+	ipqess_w32(ess, IPQESS_REG_RXQ_CTRL,
+		   IPQESS_FIFO_THRESH_128_BYTE | IPQESS_RXQ_CTRL_RMV_VLAN);
+
+	err = ipqess_rx_ring_alloc(ess);
+	if (err)
+		return err;
+
+	err = ipqess_tx_ring_alloc(ess);
+	if (err)
+		goto err_rx_ring_free;
+
+	/* Load all of ring base addresses above into the dma engine */
+	ipqess_m32(ess, 0, BIT(IPQESS_LOAD_PTR_SHIFT), IPQESS_REG_TX_SRAM_PART);
+
+	/* Disable TX FIFO low watermark and high watermark */
+	ipqess_w32(ess, IPQESS_REG_TXF_WATER_MARK, 0);
+
+	/* Configure RSS indirection table.
+	 * 128 hash will be configured in the following
+	 * pattern: hash{0,1,2,3} = {Q0,Q2,Q4,Q6} respectively
+	 * and so on
+	 */
+	for (i = 0; i < IPQESS_NUM_IDT; i++)
+		ipqess_w32(ess, IPQESS_REG_RSS_IDT(i), IPQESS_RSS_IDT_VALUE);
+
+	/* Configure load balance mapping table.
+	 * 4 table entry will be configured according to the
+	 * following pattern: load_balance{0,1,2,3} = {Q0,Q1,Q3,Q4}
+	 * respectively.
+	 */
+	ipqess_w32(ess, IPQESS_REG_LB_RING, IPQESS_LB_REG_VALUE);
+
+	/* Configure Virtual queue for Tx rings */
+	ipqess_w32(ess, IPQESS_REG_VQ_CTRL0, IPQESS_VQ_REG_VALUE);
+	ipqess_w32(ess, IPQESS_REG_VQ_CTRL1, IPQESS_VQ_REG_VALUE);
+
+	/* Configure Max AXI Burst write size to 128 bytes*/
+	ipqess_w32(ess, IPQESS_REG_AXIW_CTRL_MAXWRSIZE,
+		   IPQESS_AXIW_MAXWRSIZE_VALUE);
+
+	/* Enable TX queues */
+	ipqess_m32(ess, 0, IPQESS_TXQ_CTRL_TXQ_EN, IPQESS_REG_TXQ_CTRL);
+
+	/* Enable RX queues */
+	tmp = 0;
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++)
+		tmp |= IPQESS_RXQ_CTRL_EN(ess->rx_ring[i].idx);
+
+	ipqess_m32(ess, IPQESS_RXQ_CTRL_EN_MASK, tmp, IPQESS_REG_RXQ_CTRL);
+
+	return 0;
+
+err_rx_ring_free:
+
+	ipqess_rx_ring_free(ess);
+	return err;
+}
+
+static void ipqess_mac_config(struct phylink_config *config, unsigned int mode,
+			      const struct phylink_link_state *state)
+{
+	/* Nothing to do, use fixed Internal mode */
+}
+
+static void ipqess_mac_link_down(struct phylink_config *config,
+				 unsigned int mode,
+				 phy_interface_t interface)
+{
+	/* Nothing to do, use fixed Internal mode */
+}
+
+static void ipqess_mac_link_up(struct phylink_config *config,
+			       struct phy_device *phy, unsigned int mode,
+			       phy_interface_t interface,
+			       int speed, int duplex,
+			       bool tx_pause, bool rx_pause)
+{
+	/* Nothing to do, use fixed Internal mode */
+}
+
+static struct phylink_mac_ops ipqess_phylink_mac_ops = {
+	.validate		= phylink_generic_validate,
+	.mac_config		= ipqess_mac_config,
+	.mac_link_up		= ipqess_mac_link_up,
+	.mac_link_down		= ipqess_mac_link_down,
+};
+
+static void ipqess_reset(struct ipqess *ess)
+{
+	reset_control_assert(ess->ess_rst);
+
+	mdelay(10);
+
+	reset_control_deassert(ess->ess_rst);
+
+	/* Waiting for all inner tables to be flushed and reinitialized.
+	 * This takes between 5 and 10 ms
+	 */
+
+	mdelay(10);
+}
+
+static int ipqess_axi_probe(struct platform_device *pdev)
+{
+	struct device_node *np = pdev->dev.of_node;
+	struct net_device *netdev;
+	phy_interface_t phy_mode;
+	struct ipqess *ess;
+	int i, err = 0;
+
+	netdev = devm_alloc_etherdev_mqs(&pdev->dev, sizeof(*ess),
+					 IPQESS_NETDEV_QUEUES,
+					 IPQESS_NETDEV_QUEUES);
+	if (!netdev)
+		return -ENOMEM;
+
+	ess = netdev_priv(netdev);
+	ess->netdev = netdev;
+	ess->pdev = pdev;
+	spin_lock_init(&ess->stats_lock);
+	SET_NETDEV_DEV(netdev, &pdev->dev);
+	platform_set_drvdata(pdev, netdev);
+
+	ess->hw_addr = devm_platform_get_and_ioremap_resource(pdev, 0, NULL);
+	if (IS_ERR(ess->hw_addr))
+		return PTR_ERR(ess->hw_addr);
+
+	err = of_get_phy_mode(np, &phy_mode);
+	if (err) {
+		dev_err(&pdev->dev, "incorrect phy-mode\n");
+		return err;
+	}
+
+	ess->ess_clk = devm_clk_get(&pdev->dev, NULL);
+	if (!IS_ERR(ess->ess_clk))
+		clk_prepare_enable(ess->ess_clk);
+
+	ess->ess_rst = devm_reset_control_get(&pdev->dev, NULL);
+	if (IS_ERR(ess->ess_rst))
+		goto err_clk;
+
+	ipqess_reset(ess);
+
+	ess->phylink_config.dev = &netdev->dev;
+	ess->phylink_config.type = PHYLINK_NETDEV;
+	ess->phylink_config.mac_capabilities = MAC_SYM_PAUSE | MAC_10 |
+					       MAC_100 | MAC_1000FD;
+
+	__set_bit(PHY_INTERFACE_MODE_INTERNAL,
+		  ess->phylink_config.supported_interfaces);
+
+	ess->phylink = phylink_create(&ess->phylink_config,
+				      of_fwnode_handle(np), phy_mode,
+				      &ipqess_phylink_mac_ops);
+	if (IS_ERR(ess->phylink)) {
+		err = PTR_ERR(ess->phylink);
+		goto err_clk;
+	}
+
+	for (i = 0; i < IPQESS_MAX_TX_QUEUE; i++) {
+		ess->tx_irq[i] = platform_get_irq(pdev, i);
+		scnprintf(ess->tx_irq_names[i], sizeof(ess->tx_irq_names[i]),
+			  "%s:txq%d", pdev->name, i);
+	}
+
+	for (i = 0; i < IPQESS_MAX_RX_QUEUE; i++) {
+		ess->rx_irq[i] = platform_get_irq(pdev, i + IPQESS_MAX_TX_QUEUE);
+		scnprintf(ess->rx_irq_names[i], sizeof(ess->rx_irq_names[i]),
+			  "%s:rxq%d", pdev->name, i);
+	}
+
+	netdev->netdev_ops = &ipqess_axi_netdev_ops;
+	netdev->features = NETIF_F_HW_CSUM | NETIF_F_RXCSUM |
+			   NETIF_F_HW_VLAN_CTAG_RX |
+			   NETIF_F_HW_VLAN_CTAG_TX |
+			   NETIF_F_TSO | NETIF_F_GRO | NETIF_F_SG;
+	/* feature change is not supported yet */
+	netdev->hw_features = 0;
+	netdev->vlan_features = NETIF_F_HW_CSUM | NETIF_F_SG | NETIF_F_RXCSUM |
+				NETIF_F_TSO |
+				NETIF_F_GRO;
+	netdev->watchdog_timeo = 5 * HZ;
+	netdev->base_addr = (u32)ess->hw_addr;
+	netdev->max_mtu = 9000;
+	netdev->gso_max_segs = IPQESS_TX_RING_SIZE / 2;
+
+	ipqess_set_ethtool_ops(netdev);
+
+	err = ipqess_hw_init(ess);
+	if (err)
+		goto err_phylink;
+
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		netif_napi_add_tx(netdev, &ess->tx_ring[i].napi_tx, ipqess_tx_napi);
+		netif_napi_add(netdev, &ess->rx_ring[i].napi_rx, ipqess_rx_napi);
+	}
+
+	err = register_netdev(netdev);
+	if (err)
+		goto err_hw_stop;
+
+	return 0;
+
+err_hw_stop:
+	ipqess_hw_stop(ess);
+
+	ipqess_tx_ring_free(ess);
+	ipqess_rx_ring_free(ess);
+err_phylink:
+	phylink_destroy(ess->phylink);
+
+err_clk:
+	clk_disable_unprepare(ess->ess_clk);
+
+	return err;
+}
+
+static int ipqess_axi_remove(struct platform_device *pdev)
+{
+	const struct net_device *netdev = platform_get_drvdata(pdev);
+	struct ipqess *ess = netdev_priv(netdev);
+
+	unregister_netdev(ess->netdev);
+	ipqess_hw_stop(ess);
+
+	ipqess_tx_ring_free(ess);
+	ipqess_rx_ring_free(ess);
+
+	phylink_destroy(ess->phylink);
+	clk_disable_unprepare(ess->ess_clk);
+
+	return 0;
+}
+
+static const struct of_device_id ipqess_of_mtable[] = {
+	{.compatible = "qcom,ipq4019-ess-edma" },
+	{}
+};
+MODULE_DEVICE_TABLE(of, ipqess_of_mtable);
+
+static struct platform_driver ipqess_axi_driver = {
+	.driver = {
+		.name    = "ipqess-edma",
+		.of_match_table = ipqess_of_mtable,
+	},
+	.probe    = ipqess_axi_probe,
+	.remove   = ipqess_axi_remove,
+};
+
+module_platform_driver(ipqess_axi_driver);
+
+MODULE_AUTHOR("Qualcomm Atheros Inc");
+MODULE_AUTHOR("John Crispin <john@phrozen.org>");
+MODULE_AUTHOR("Christian Lamparter <chunkeey@gmail.com>");
+MODULE_AUTHOR("Gabor Juhos <j4g8y7@gmail.com>");
+MODULE_AUTHOR("Maxime Chevallier <maxime.chevallier@bootlin.com>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/net/ethernet/qualcomm/ipqess/ipqess.h b/drivers/net/ethernet/qualcomm/ipqess/ipqess.h
new file mode 100644
index 000000000000..dae17f3a23e4
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/ipqess/ipqess.h
@@ -0,0 +1,518 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR ISC) */
+/* Copyright (c) 2014 - 2016, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2017 - 2018, John Crispin <john@phrozen.org>
+ * Copyright (c) 2018 - 2019, Christian Lamparter <chunkeey@gmail.com>
+ * Copyright (c) 2020 - 2021, Gabor Juhos <j4g8y7@gmail.com>
+ * Copyright (c) 2021 - 2022, Maxime Chevallier <maxime.chevallier@bootlin.com>
+ *
+ */
+
+#ifndef _IPQESS_H_
+#define _IPQESS_H_
+
+#define IPQESS_NETDEV_QUEUES	4
+
+#define IPQESS_TPD_EOP_SHIFT 31
+
+#define IPQESS_PORT_ID_SHIFT 12
+#define IPQESS_PORT_ID_MASK 0x7
+
+/* tpd word 3 bit 18-28 */
+#define IPQESS_TPD_PORT_BITMAP_SHIFT 18
+
+#define IPQESS_TPD_FROM_CPU_SHIFT 25
+
+#define IPQESS_RX_RING_SIZE 128
+#define IPQESS_RX_HEAD_BUFF_SIZE 1540
+#define IPQESS_TX_RING_SIZE 128
+#define IPQESS_MAX_RX_QUEUE 8
+#define IPQESS_MAX_TX_QUEUE 16
+
+/* Configurations */
+#define IPQESS_INTR_CLEAR_TYPE 0
+#define IPQESS_INTR_SW_IDX_W_TYPE 0
+#define IPQESS_FIFO_THRESH_TYPE 0
+#define IPQESS_RSS_TYPE 0
+#define IPQESS_RX_IMT 0x0020
+#define IPQESS_TX_IMT 0x0050
+#define IPQESS_TPD_BURST 5
+#define IPQESS_TXF_BURST 0x100
+#define IPQESS_RFD_BURST 8
+#define IPQESS_RFD_THR 16
+#define IPQESS_RFD_LTHR 0
+
+/* Flags used in transmit direction */
+#define IPQESS_DESC_LAST 0x1
+#define IPQESS_DESC_SINGLE 0x2
+#define IPQESS_DESC_PAGE 0x4
+
+struct ipqess_statistics {
+	u32 tx_q0_pkt;
+	u32 tx_q1_pkt;
+	u32 tx_q2_pkt;
+	u32 tx_q3_pkt;
+	u32 tx_q4_pkt;
+	u32 tx_q5_pkt;
+	u32 tx_q6_pkt;
+	u32 tx_q7_pkt;
+	u32 tx_q8_pkt;
+	u32 tx_q9_pkt;
+	u32 tx_q10_pkt;
+	u32 tx_q11_pkt;
+	u32 tx_q12_pkt;
+	u32 tx_q13_pkt;
+	u32 tx_q14_pkt;
+	u32 tx_q15_pkt;
+	u32 tx_q0_byte;
+	u32 tx_q1_byte;
+	u32 tx_q2_byte;
+	u32 tx_q3_byte;
+	u32 tx_q4_byte;
+	u32 tx_q5_byte;
+	u32 tx_q6_byte;
+	u32 tx_q7_byte;
+	u32 tx_q8_byte;
+	u32 tx_q9_byte;
+	u32 tx_q10_byte;
+	u32 tx_q11_byte;
+	u32 tx_q12_byte;
+	u32 tx_q13_byte;
+	u32 tx_q14_byte;
+	u32 tx_q15_byte;
+	u32 rx_q0_pkt;
+	u32 rx_q1_pkt;
+	u32 rx_q2_pkt;
+	u32 rx_q3_pkt;
+	u32 rx_q4_pkt;
+	u32 rx_q5_pkt;
+	u32 rx_q6_pkt;
+	u32 rx_q7_pkt;
+	u32 rx_q0_byte;
+	u32 rx_q1_byte;
+	u32 rx_q2_byte;
+	u32 rx_q3_byte;
+	u32 rx_q4_byte;
+	u32 rx_q5_byte;
+	u32 rx_q6_byte;
+	u32 rx_q7_byte;
+	u32 tx_desc_error;
+};
+
+struct ipqess_tx_desc {
+	__le16  len;
+	__le16  svlan_tag;
+	__le32  word1;
+	__le32  addr;
+	__le32  word3;
+} __aligned(16) __packed;
+
+struct ipqess_rx_desc {
+	__le16 rrd0;
+	__le16 rrd1;
+	__le16 rrd2;
+	__le16 rrd3;
+	__le16 rrd4;
+	__le16 rrd5;
+	__le16 rrd6;
+	__le16 rrd7;
+} __aligned(16) __packed;
+
+struct ipqess_buf {
+	struct sk_buff *skb;
+	dma_addr_t dma;
+	u32 flags;
+	u16 length;
+};
+
+struct ipqess_tx_ring {
+	struct napi_struct napi_tx;
+	u32 idx;
+	int ring_id;
+	struct ipqess *ess;
+	struct netdev_queue *nq;
+	struct ipqess_tx_desc *hw_desc;
+	struct ipqess_buf *buf;
+	dma_addr_t dma;
+	u16 count;
+	u16 head;
+	u16 tail;
+};
+
+struct ipqess_rx_ring {
+	struct napi_struct napi_rx;
+	u32 idx;
+	int ring_id;
+	struct ipqess *ess;
+	struct device *ppdev;
+	struct ipqess_rx_desc **hw_desc;
+	struct ipqess_buf *buf;
+	dma_addr_t dma;
+	u16 head;
+	u16 tail;
+	atomic_t refill_count;
+};
+
+struct ipqess_rx_ring_refill {
+	struct ipqess_rx_ring *rx_ring;
+	struct work_struct refill_work;
+};
+
+#define IPQESS_IRQ_NAME_LEN	32
+
+struct ipqess {
+	struct net_device *netdev;
+	void __iomem *hw_addr;
+
+	struct clk *ess_clk;
+	struct reset_control *ess_rst;
+
+	struct ipqess_rx_ring rx_ring[IPQESS_NETDEV_QUEUES];
+
+	struct platform_device *pdev;
+	struct phylink *phylink;
+	struct phylink_config phylink_config;
+	struct ipqess_tx_ring tx_ring[IPQESS_NETDEV_QUEUES];
+
+	struct ipqess_statistics ipqess_stats;
+
+	/* Protects stats */
+	spinlock_t stats_lock;
+	struct net_device_stats stats;
+
+	struct ipqess_rx_ring_refill rx_refill[IPQESS_NETDEV_QUEUES];
+	u32 tx_irq[IPQESS_MAX_TX_QUEUE];
+	char tx_irq_names[IPQESS_MAX_TX_QUEUE][IPQESS_IRQ_NAME_LEN];
+	u32 rx_irq[IPQESS_MAX_RX_QUEUE];
+	char rx_irq_names[IPQESS_MAX_TX_QUEUE][IPQESS_IRQ_NAME_LEN];
+};
+
+void ipqess_set_ethtool_ops(struct net_device *netdev);
+void ipqess_update_hw_stats(struct ipqess *ess);
+
+/* register definition */
+#define IPQESS_REG_MAS_CTRL 0x0
+#define IPQESS_REG_TIMEOUT_CTRL 0x004
+#define IPQESS_REG_DBG0 0x008
+#define IPQESS_REG_DBG1 0x00C
+#define IPQESS_REG_SW_CTRL0 0x100
+#define IPQESS_REG_SW_CTRL1 0x104
+
+/* Interrupt Status Register */
+#define IPQESS_REG_RX_ISR 0x200
+#define IPQESS_REG_TX_ISR 0x208
+#define IPQESS_REG_MISC_ISR 0x210
+#define IPQESS_REG_WOL_ISR 0x218
+
+#define IPQESS_MISC_ISR_RX_URG_Q(x) (1 << (x))
+
+#define IPQESS_MISC_ISR_AXIR_TIMEOUT 0x00000100
+#define IPQESS_MISC_ISR_AXIR_ERR 0x00000200
+#define IPQESS_MISC_ISR_TXF_DEAD 0x00000400
+#define IPQESS_MISC_ISR_AXIW_ERR 0x00000800
+#define IPQESS_MISC_ISR_AXIW_TIMEOUT 0x00001000
+
+#define IPQESS_WOL_ISR 0x00000001
+
+/* Interrupt Mask Register */
+#define IPQESS_REG_MISC_IMR 0x214
+#define IPQESS_REG_WOL_IMR 0x218
+
+#define IPQESS_RX_IMR_NORMAL_MASK 0x1
+#define IPQESS_TX_IMR_NORMAL_MASK 0x1
+#define IPQESS_MISC_IMR_NORMAL_MASK 0x80001FFF
+#define IPQESS_WOL_IMR_NORMAL_MASK 0x1
+
+/* Edma receive consumer index */
+#define IPQESS_REG_RX_SW_CONS_IDX_Q(x) (0x220 + ((x) << 2)) /* x is the queue id */
+
+/* Edma transmit consumer index */
+#define IPQESS_REG_TX_SW_CONS_IDX_Q(x) (0x240 + ((x) << 2)) /* x is the queue id */
+
+/* IRQ Moderator Initial Timer Register */
+#define IPQESS_REG_IRQ_MODRT_TIMER_INIT 0x280
+#define IPQESS_IRQ_MODRT_TIMER_MASK 0xFFFF
+#define IPQESS_IRQ_MODRT_RX_TIMER_SHIFT 0
+#define IPQESS_IRQ_MODRT_TX_TIMER_SHIFT 16
+
+/* Interrupt Control Register */
+#define IPQESS_REG_INTR_CTRL 0x284
+#define IPQESS_INTR_CLR_TYP_SHIFT 0
+#define IPQESS_INTR_SW_IDX_W_TYP_SHIFT 1
+#define IPQESS_INTR_CLEAR_TYPE_W1 0
+#define IPQESS_INTR_CLEAR_TYPE_R 1
+
+/* RX Interrupt Mask Register */
+#define IPQESS_REG_RX_INT_MASK_Q(x) (0x300 + ((x) << 2)) /* x = queue id */
+
+/* TX Interrupt mask register */
+#define IPQESS_REG_TX_INT_MASK_Q(x) (0x340 + ((x) << 2)) /* x = queue id */
+
+/* Load Ptr Register
+ * Software sets this bit after the initialization of the head and tail
+ */
+#define IPQESS_REG_TX_SRAM_PART 0x400
+#define IPQESS_LOAD_PTR_SHIFT 16
+
+/* TXQ Control Register */
+#define IPQESS_REG_TXQ_CTRL 0x404
+#define IPQESS_TXQ_CTRL_IP_OPTION_EN 0x10
+#define IPQESS_TXQ_CTRL_TXQ_EN 0x20
+#define IPQESS_TXQ_CTRL_ENH_MODE 0x40
+#define IPQESS_TXQ_CTRL_LS_8023_EN 0x80
+#define IPQESS_TXQ_CTRL_TPD_BURST_EN 0x100
+#define IPQESS_TXQ_CTRL_LSO_BREAK_EN 0x200
+#define IPQESS_TXQ_NUM_TPD_BURST_MASK 0xF
+#define IPQESS_TXQ_TXF_BURST_NUM_MASK 0xFFFF
+#define IPQESS_TXQ_NUM_TPD_BURST_SHIFT 0
+#define IPQESS_TXQ_TXF_BURST_NUM_SHIFT 16
+
+#define IPQESS_REG_TXF_WATER_MARK 0x408 /* In 8-bytes */
+#define IPQESS_TXF_WATER_MARK_MASK 0x0FFF
+#define IPQESS_TXF_LOW_WATER_MARK_SHIFT 0
+#define IPQESS_TXF_HIGH_WATER_MARK_SHIFT 16
+#define IPQESS_TXQ_CTRL_BURST_MODE_EN 0x80000000
+
+/* WRR Control Register */
+#define IPQESS_REG_WRR_CTRL_Q0_Q3 0x40c
+#define IPQESS_REG_WRR_CTRL_Q4_Q7 0x410
+#define IPQESS_REG_WRR_CTRL_Q8_Q11 0x414
+#define IPQESS_REG_WRR_CTRL_Q12_Q15 0x418
+
+/* Weight round robin(WRR), it takes queue as input, and computes
+ * starting bits where we need to write the weight for a particular
+ * queue
+ */
+#define IPQESS_WRR_SHIFT(x) (((x) * 5) % 20)
+
+/* Tx Descriptor Control Register */
+#define IPQESS_REG_TPD_RING_SIZE 0x41C
+#define IPQESS_TPD_RING_SIZE_SHIFT 0
+#define IPQESS_TPD_RING_SIZE_MASK 0xFFFF
+
+/* Transmit descriptor base address */
+#define IPQESS_REG_TPD_BASE_ADDR_Q(x) (0x420 + ((x) << 2)) /* x = queue id */
+
+/* TPD Index Register */
+#define IPQESS_REG_TPD_IDX_Q(x) (0x460 + ((x) << 2)) /* x = queue id */
+
+#define IPQESS_TPD_PROD_IDX_BITS 0x0000FFFF
+#define IPQESS_TPD_CONS_IDX_BITS 0xFFFF0000
+#define IPQESS_TPD_PROD_IDX_MASK 0xFFFF
+#define IPQESS_TPD_CONS_IDX_MASK 0xFFFF
+#define IPQESS_TPD_PROD_IDX_SHIFT 0
+#define IPQESS_TPD_CONS_IDX_SHIFT 16
+
+/* TX Virtual Queue Mapping Control Register */
+#define IPQESS_REG_VQ_CTRL0 0x4A0
+#define IPQESS_REG_VQ_CTRL1 0x4A4
+
+/* Virtual QID shift, it takes queue as input, and computes
+ * Virtual QID position in virtual qid control register
+ */
+#define IPQESS_VQ_ID_SHIFT(i) (((i) * 3) % 24)
+
+/* Virtual Queue Default Value */
+#define IPQESS_VQ_REG_VALUE 0x240240
+
+/* Tx side Port Interface Control Register */
+#define IPQESS_REG_PORT_CTRL 0x4A8
+#define IPQESS_PAD_EN_SHIFT 15
+
+/* Tx side VLAN Configuration Register */
+#define IPQESS_REG_VLAN_CFG 0x4AC
+
+#define IPQESS_VLAN_CFG_SVLAN_TPID_SHIFT 0
+#define IPQESS_VLAN_CFG_SVLAN_TPID_MASK 0xffff
+#define IPQESS_VLAN_CFG_CVLAN_TPID_SHIFT 16
+#define IPQESS_VLAN_CFG_CVLAN_TPID_MASK 0xffff
+
+#define IPQESS_TX_CVLAN 16
+#define IPQESS_TX_INS_CVLAN 17
+#define IPQESS_TX_CVLAN_TAG_SHIFT 0
+
+#define IPQESS_TX_SVLAN 14
+#define IPQESS_TX_INS_SVLAN 15
+#define IPQESS_TX_SVLAN_TAG_SHIFT 16
+
+/* Tx Queue Packet Statistic Register */
+#define IPQESS_REG_TX_STAT_PKT_Q(x) (0x700 + ((x) << 3)) /* x = queue id */
+
+#define IPQESS_TX_STAT_PKT_MASK 0xFFFFFF
+
+/* Tx Queue Byte Statistic Register */
+#define IPQESS_REG_TX_STAT_BYTE_Q(x) (0x704 + ((x) << 3)) /* x = queue id */
+
+/* Load Balance Based Ring Offset Register */
+#define IPQESS_REG_LB_RING 0x800
+#define IPQESS_LB_RING_ENTRY_MASK 0xff
+#define IPQESS_LB_RING_ID_MASK 0x7
+#define IPQESS_LB_RING_PROFILE_ID_MASK 0x3
+#define IPQESS_LB_RING_ENTRY_BIT_OFFSET 8
+#define IPQESS_LB_RING_ID_OFFSET 0
+#define IPQESS_LB_RING_PROFILE_ID_OFFSET 3
+#define IPQESS_LB_REG_VALUE 0x6040200
+
+/* Load Balance Priority Mapping Register */
+#define IPQESS_REG_LB_PRI_START 0x804
+#define IPQESS_REG_LB_PRI_END 0x810
+#define IPQESS_LB_PRI_REG_INC 4
+#define IPQESS_LB_PRI_ENTRY_BIT_OFFSET 4
+#define IPQESS_LB_PRI_ENTRY_MASK 0xf
+
+/* RSS Priority Mapping Register */
+#define IPQESS_REG_RSS_PRI 0x820
+#define IPQESS_RSS_PRI_ENTRY_MASK 0xf
+#define IPQESS_RSS_RING_ID_MASK 0x7
+#define IPQESS_RSS_PRI_ENTRY_BIT_OFFSET 4
+
+/* RSS Indirection Register */
+#define IPQESS_REG_RSS_IDT(x) (0x840 + ((x) << 2)) /* x = No. of indirection table */
+#define IPQESS_NUM_IDT 16
+#define IPQESS_RSS_IDT_VALUE 0x64206420
+
+/* Default RSS Ring Register */
+#define IPQESS_REG_DEF_RSS 0x890
+#define IPQESS_DEF_RSS_MASK 0x7
+
+/* RSS Hash Function Type Register */
+#define IPQESS_REG_RSS_TYPE 0x894
+#define IPQESS_RSS_TYPE_NONE 0x01
+#define IPQESS_RSS_TYPE_IPV4TCP 0x02
+#define IPQESS_RSS_TYPE_IPV6_TCP 0x04
+#define IPQESS_RSS_TYPE_IPV4_UDP 0x08
+#define IPQESS_RSS_TYPE_IPV6UDP 0x10
+#define IPQESS_RSS_TYPE_IPV4 0x20
+#define IPQESS_RSS_TYPE_IPV6 0x40
+#define IPQESS_RSS_HASH_MODE_MASK 0x7f
+
+#define IPQESS_REG_RSS_HASH_VALUE 0x8C0
+
+#define IPQESS_REG_RSS_TYPE_RESULT 0x8C4
+
+#define IPQESS_HASH_TYPE_START 0
+#define IPQESS_HASH_TYPE_END 5
+#define IPQESS_HASH_TYPE_SHIFT 12
+
+#define IPQESS_RFS_FLOW_ENTRIES 1024
+#define IPQESS_RFS_FLOW_ENTRIES_MASK (IPQESS_RFS_FLOW_ENTRIES - 1)
+#define IPQESS_RFS_EXPIRE_COUNT_PER_CALL 128
+
+/* RFD Base Address Register */
+#define IPQESS_REG_RFD_BASE_ADDR_Q(x) (0x950 + ((x) << 2)) /* x = queue id */
+
+/* RFD Index Register */
+#define IPQESS_REG_RFD_IDX_Q(x) (0x9B0 + ((x) << 2)) /* x = queue id */
+
+#define IPQESS_RFD_PROD_IDX_BITS 0x00000FFF
+#define IPQESS_RFD_CONS_IDX_BITS 0x0FFF0000
+#define IPQESS_RFD_PROD_IDX_MASK 0xFFF
+#define IPQESS_RFD_CONS_IDX_MASK 0xFFF
+#define IPQESS_RFD_PROD_IDX_SHIFT 0
+#define IPQESS_RFD_CONS_IDX_SHIFT 16
+
+/* Rx Descriptor Control Register */
+#define IPQESS_REG_RX_DESC0 0xA10
+#define IPQESS_RFD_RING_SIZE_MASK 0xFFF
+#define IPQESS_RX_BUF_SIZE_MASK 0xFFFF
+#define IPQESS_RFD_RING_SIZE_SHIFT 0
+#define IPQESS_RX_BUF_SIZE_SHIFT 16
+
+#define IPQESS_REG_RX_DESC1 0xA14
+#define IPQESS_RXQ_RFD_BURST_NUM_MASK 0x3F
+#define IPQESS_RXQ_RFD_PF_THRESH_MASK 0x1F
+#define IPQESS_RXQ_RFD_LOW_THRESH_MASK 0xFFF
+#define IPQESS_RXQ_RFD_BURST_NUM_SHIFT 0
+#define IPQESS_RXQ_RFD_PF_THRESH_SHIFT 8
+#define IPQESS_RXQ_RFD_LOW_THRESH_SHIFT 16
+
+/* RXQ Control Register */
+#define IPQESS_REG_RXQ_CTRL 0xA18
+#define IPQESS_FIFO_THRESH_TYPE_SHIF 0
+#define IPQESS_FIFO_THRESH_128_BYTE 0x0
+#define IPQESS_FIFO_THRESH_64_BYTE 0x1
+#define IPQESS_RXQ_CTRL_RMV_VLAN 0x00000002
+#define IPQESS_RXQ_CTRL_EN_MASK			GENMASK(15, 8)
+#define IPQESS_RXQ_CTRL_EN(__qid)		BIT(8 + (__qid))
+
+/* AXI Burst Size Config */
+#define IPQESS_REG_AXIW_CTRL_MAXWRSIZE 0xA1C
+#define IPQESS_AXIW_MAXWRSIZE_VALUE 0x0
+
+/* Rx Statistics Register */
+#define IPQESS_REG_RX_STAT_BYTE_Q(x) (0xA30 + ((x) << 2)) /* x = queue id */
+#define IPQESS_REG_RX_STAT_PKT_Q(x) (0xA50 + ((x) << 2)) /* x = queue id */
+
+/* WoL Pattern Length Register */
+#define IPQESS_REG_WOL_PATTERN_LEN0 0xC00
+#define IPQESS_WOL_PT_LEN_MASK 0xFF
+#define IPQESS_WOL_PT0_LEN_SHIFT 0
+#define IPQESS_WOL_PT1_LEN_SHIFT 8
+#define IPQESS_WOL_PT2_LEN_SHIFT 16
+#define IPQESS_WOL_PT3_LEN_SHIFT 24
+
+#define IPQESS_REG_WOL_PATTERN_LEN1 0xC04
+#define IPQESS_WOL_PT4_LEN_SHIFT 0
+#define IPQESS_WOL_PT5_LEN_SHIFT 8
+#define IPQESS_WOL_PT6_LEN_SHIFT 16
+
+/* WoL Control Register */
+#define IPQESS_REG_WOL_CTRL 0xC08
+#define IPQESS_WOL_WK_EN 0x00000001
+#define IPQESS_WOL_MG_EN 0x00000002
+#define IPQESS_WOL_PT0_EN 0x00000004
+#define IPQESS_WOL_PT1_EN 0x00000008
+#define IPQESS_WOL_PT2_EN 0x00000010
+#define IPQESS_WOL_PT3_EN 0x00000020
+#define IPQESS_WOL_PT4_EN 0x00000040
+#define IPQESS_WOL_PT5_EN 0x00000080
+#define IPQESS_WOL_PT6_EN 0x00000100
+
+/* MAC Control Register */
+#define IPQESS_REG_MAC_CTRL0 0xC20
+#define IPQESS_REG_MAC_CTRL1 0xC24
+
+/* WoL Pattern Register */
+#define IPQESS_REG_WOL_PATTERN_START 0x5000
+#define IPQESS_PATTERN_PART_REG_OFFSET 0x40
+
+/* TX descriptor fields */
+#define IPQESS_TPD_HDR_SHIFT 0
+#define IPQESS_TPD_PPPOE_EN 0x00000100
+#define IPQESS_TPD_IP_CSUM_EN 0x00000200
+#define IPQESS_TPD_TCP_CSUM_EN 0x00000400
+#define IPQESS_TPD_UDP_CSUM_EN 0x00000800
+#define IPQESS_TPD_CUSTOM_CSUM_EN 0x00000C00
+#define IPQESS_TPD_LSO_EN 0x00001000
+#define IPQESS_TPD_LSO_V2_EN 0x00002000
+/* The VLAN_TAGGED bit is not used in the publicly available
+ * drivers. The definition has been stolen from the Atheros
+ * 'alx' driver (drivers/net/ethernet/atheros/alx/hw.h). It
+ * seems that it has the same meaning in regard to the EDMA
+ * hardware.
+ */
+#define IPQESS_TPD_VLAN_TAGGED 0x00004000
+#define IPQESS_TPD_IPV4_EN 0x00010000
+#define IPQESS_TPD_MSS_MASK 0x1FFF
+#define IPQESS_TPD_MSS_SHIFT 18
+#define IPQESS_TPD_CUSTOM_CSUM_SHIFT 18
+
+/* RRD descriptor fields */
+#define IPQESS_RRD_NUM_RFD_MASK 0x000F
+#define IPQESS_RRD_PKT_SIZE_MASK 0x3FFF
+#define IPQESS_RRD_SRC_PORT_NUM_MASK 0x4000
+#define IPQESS_RRD_SVLAN 0x8000
+#define IPQESS_RRD_FLOW_COOKIE_MASK 0x07FF
+
+#define IPQESS_RRD_PKT_SIZE_MASK 0x3FFF
+#define IPQESS_RRD_CSUM_FAIL_MASK 0xC000
+#define IPQESS_RRD_CVLAN 0x0001
+#define IPQESS_RRD_DESC_VALID 0x8000
+
+#define IPQESS_RRD_PRIORITY_SHIFT 4
+#define IPQESS_RRD_PRIORITY_MASK 0x7
+#define IPQESS_RRD_PORT_TYPE_SHIFT 7
+#define IPQESS_RRD_PORT_TYPE_MASK 0x1F
+
+#define IPQESS_RRD_PORT_ID_MASK 0x7000
+
+#endif
diff --git a/drivers/net/ethernet/qualcomm/ipqess/ipqess_ethtool.c b/drivers/net/ethernet/qualcomm/ipqess/ipqess_ethtool.c
new file mode 100644
index 000000000000..07527b3a7e62
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/ipqess/ipqess_ethtool.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0 OR ISC
+/* Copyright (c) 2015 - 2016, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2017 - 2018, John Crispin <john@phrozen.org>
+ * Copyright (c) 2021 - 2022, Maxime Chevallier <maxime.chevallier@bootlin.com>
+ *
+ */
+
+#include <linux/ethtool.h>
+#include <linux/netdevice.h>
+#include <linux/string.h>
+#include <linux/phylink.h>
+
+#include "ipqess.h"
+
+struct ipqess_ethtool_stats {
+	u8 string[ETH_GSTRING_LEN];
+	u32 offset;
+};
+
+#define IPQESS_STAT(m)    offsetof(struct ipqess_statistics, m)
+#define DRVINFO_LEN	32
+
+static const struct ipqess_ethtool_stats ipqess_stats[] = {
+	{"tx_q0_pkt", IPQESS_STAT(tx_q0_pkt)},
+	{"tx_q1_pkt", IPQESS_STAT(tx_q1_pkt)},
+	{"tx_q2_pkt", IPQESS_STAT(tx_q2_pkt)},
+	{"tx_q3_pkt", IPQESS_STAT(tx_q3_pkt)},
+	{"tx_q4_pkt", IPQESS_STAT(tx_q4_pkt)},
+	{"tx_q5_pkt", IPQESS_STAT(tx_q5_pkt)},
+	{"tx_q6_pkt", IPQESS_STAT(tx_q6_pkt)},
+	{"tx_q7_pkt", IPQESS_STAT(tx_q7_pkt)},
+	{"tx_q8_pkt", IPQESS_STAT(tx_q8_pkt)},
+	{"tx_q9_pkt", IPQESS_STAT(tx_q9_pkt)},
+	{"tx_q10_pkt", IPQESS_STAT(tx_q10_pkt)},
+	{"tx_q11_pkt", IPQESS_STAT(tx_q11_pkt)},
+	{"tx_q12_pkt", IPQESS_STAT(tx_q12_pkt)},
+	{"tx_q13_pkt", IPQESS_STAT(tx_q13_pkt)},
+	{"tx_q14_pkt", IPQESS_STAT(tx_q14_pkt)},
+	{"tx_q15_pkt", IPQESS_STAT(tx_q15_pkt)},
+	{"tx_q0_byte", IPQESS_STAT(tx_q0_byte)},
+	{"tx_q1_byte", IPQESS_STAT(tx_q1_byte)},
+	{"tx_q2_byte", IPQESS_STAT(tx_q2_byte)},
+	{"tx_q3_byte", IPQESS_STAT(tx_q3_byte)},
+	{"tx_q4_byte", IPQESS_STAT(tx_q4_byte)},
+	{"tx_q5_byte", IPQESS_STAT(tx_q5_byte)},
+	{"tx_q6_byte", IPQESS_STAT(tx_q6_byte)},
+	{"tx_q7_byte", IPQESS_STAT(tx_q7_byte)},
+	{"tx_q8_byte", IPQESS_STAT(tx_q8_byte)},
+	{"tx_q9_byte", IPQESS_STAT(tx_q9_byte)},
+	{"tx_q10_byte", IPQESS_STAT(tx_q10_byte)},
+	{"tx_q11_byte", IPQESS_STAT(tx_q11_byte)},
+	{"tx_q12_byte", IPQESS_STAT(tx_q12_byte)},
+	{"tx_q13_byte", IPQESS_STAT(tx_q13_byte)},
+	{"tx_q14_byte", IPQESS_STAT(tx_q14_byte)},
+	{"tx_q15_byte", IPQESS_STAT(tx_q15_byte)},
+	{"rx_q0_pkt", IPQESS_STAT(rx_q0_pkt)},
+	{"rx_q1_pkt", IPQESS_STAT(rx_q1_pkt)},
+	{"rx_q2_pkt", IPQESS_STAT(rx_q2_pkt)},
+	{"rx_q3_pkt", IPQESS_STAT(rx_q3_pkt)},
+	{"rx_q4_pkt", IPQESS_STAT(rx_q4_pkt)},
+	{"rx_q5_pkt", IPQESS_STAT(rx_q5_pkt)},
+	{"rx_q6_pkt", IPQESS_STAT(rx_q6_pkt)},
+	{"rx_q7_pkt", IPQESS_STAT(rx_q7_pkt)},
+	{"rx_q0_byte", IPQESS_STAT(rx_q0_byte)},
+	{"rx_q1_byte", IPQESS_STAT(rx_q1_byte)},
+	{"rx_q2_byte", IPQESS_STAT(rx_q2_byte)},
+	{"rx_q3_byte", IPQESS_STAT(rx_q3_byte)},
+	{"rx_q4_byte", IPQESS_STAT(rx_q4_byte)},
+	{"rx_q5_byte", IPQESS_STAT(rx_q5_byte)},
+	{"rx_q6_byte", IPQESS_STAT(rx_q6_byte)},
+	{"rx_q7_byte", IPQESS_STAT(rx_q7_byte)},
+	{"tx_desc_error", IPQESS_STAT(tx_desc_error)},
+};
+
+static int ipqess_get_strset_count(struct net_device *netdev, int sset)
+{
+	switch (sset) {
+	case ETH_SS_STATS:
+		return ARRAY_SIZE(ipqess_stats);
+	default:
+		netdev_dbg(netdev, "%s: Unsupported string set\n", __func__);
+		return -EOPNOTSUPP;
+	}
+}
+
+static void ipqess_get_strings(struct net_device *netdev, u32 stringset,
+			       u8 *data)
+{
+	u8 *p = data;
+	u32 i;
+
+	switch (stringset) {
+	case ETH_SS_STATS:
+		for (i = 0; i < ARRAY_SIZE(ipqess_stats); i++)
+			ethtool_sprintf(&p, ipqess_stats[i].string);
+		break;
+	}
+}
+
+static void ipqess_get_ethtool_stats(struct net_device *netdev,
+				     struct ethtool_stats *stats,
+				     uint64_t *data)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+	u32 *essstats = (u32 *)&ess->ipqess_stats;
+	int i;
+
+	spin_lock(&ess->stats_lock);
+
+	ipqess_update_hw_stats(ess);
+
+	for (i = 0; i < ARRAY_SIZE(ipqess_stats); i++)
+		data[i] = *(u32 *)(essstats + (ipqess_stats[i].offset / sizeof(u32)));
+
+	spin_unlock(&ess->stats_lock);
+}
+
+static void ipqess_get_drvinfo(struct net_device *dev,
+			       struct ethtool_drvinfo *info)
+{
+	strscpy(info->driver, "qca_ipqess", DRVINFO_LEN);
+	strscpy(info->bus_info, "axi", ETHTOOL_BUSINFO_LEN);
+}
+
+static int ipqess_get_link_ksettings(struct net_device *netdev,
+				     struct ethtool_link_ksettings *cmd)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+
+	return phylink_ethtool_ksettings_get(ess->phylink, cmd);
+}
+
+static int ipqess_set_link_ksettings(struct net_device *netdev,
+				     const struct ethtool_link_ksettings *cmd)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+
+	return phylink_ethtool_ksettings_set(ess->phylink, cmd);
+}
+
+static void ipqess_get_ringparam(struct net_device *netdev,
+				 struct ethtool_ringparam *ring,
+				 struct kernel_ethtool_ringparam *kernel_ering,
+				 struct netlink_ext_ack *extack)
+{
+	ring->tx_max_pending = IPQESS_TX_RING_SIZE;
+	ring->rx_max_pending = IPQESS_RX_RING_SIZE;
+}
+
+static const struct ethtool_ops ipqesstool_ops = {
+	.get_drvinfo = ipqess_get_drvinfo,
+	.get_link = ethtool_op_get_link,
+	.get_link_ksettings = ipqess_get_link_ksettings,
+	.set_link_ksettings = ipqess_set_link_ksettings,
+	.get_strings = ipqess_get_strings,
+	.get_sset_count = ipqess_get_strset_count,
+	.get_ethtool_stats = ipqess_get_ethtool_stats,
+	.get_ringparam = ipqess_get_ringparam,
+};
+
+void ipqess_set_ethtool_ops(struct net_device *netdev)
+{
+	netdev->ethtool_ops = &ipqesstool_ops;
+}
-- 
2.37.3


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol
  2022-11-04 17:41 [PATCH net-next v8 0/5] net: ipqess: introduce Qualcomm IPQESS driver Maxime Chevallier
  2022-11-04 17:41 ` [PATCH net-next v8 1/5] net: dt-bindings: Introduce the Qualcomm IPQESS Ethernet controller Maxime Chevallier
  2022-11-04 17:41 ` [PATCH net-next v8 2/5] net: ipqess: introduce the Qualcomm IPQESS driver Maxime Chevallier
@ 2022-11-04 17:41 ` Maxime Chevallier
  2022-11-05  3:05   ` Jakub Kicinski
  2022-11-04 17:41 ` [PATCH net-next v8 4/5] net: ipqess: Add out-of-band DSA tagging support Maxime Chevallier
  2022-11-04 17:41 ` [PATCH net-next v8 5/5] ARM: dts: qcom: ipq4019: Add description for the IPQESS Ethernet controller Maxime Chevallier
  4 siblings, 1 reply; 19+ messages in thread
From: Maxime Chevallier @ 2022-11-04 17:41 UTC (permalink / raw)
  To: davem, Rob Herring, Krzysztof Kozlowski
  Cc: Maxime Chevallier, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	netdev, linux-kernel, devicetree, thomas.petazzoni, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit, Russell King,
	linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko,
	Andy Gross, Bjorn Andersson, Konrad Dybcio

This tagging protocol is designed for the situation where the link
between the MAC and the switch is such that the Destination Port, which
is usually embedded in some part of the Ethernet header, is sent
out-of-band and isn't present at all in the Ethernet frame.

This can happen when the MAC and Switch are tightly integrated on an
SoC, as is the case with the Qualcomm IPQ4019 for example, where the DSA
tag is inserted directly into the DMA descriptors. In that case,
the MAC driver is responsible for sending the tag to the switch using
the out-of-band medium. To do so, the MAC driver needs to have the
information of the destination port for that skb.

Add a new tagging protocol based on SKB extensions to convey the
information about the destination port to the MAC driver.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---

V7->V8:
 - Added a missing blank line after declaration
V6->V7:
 - Fixed a sparse warning by making the dsa ops static
V5->V6:
 - Added some documentation
 - Removed the pop/push helpers
 - Removed unused fields
V4->V5
 - Use SKB extensions to convey the tag
V3->V4 
 - No changes
V3->V2:
 - No changes, as the discussion is ongoing
V1->V2:
 - Reworked the tagging method, putting the tag at skb->head instead
   of putting it into skb->shinfo, as per Andrew, Florian and Vlad's
   reviews


 Documentation/networking/dsa/dsa.rst | 13 +++++++-
 MAINTAINERS                          |  1 +
 include/linux/dsa/oob.h              | 16 +++++++++
 include/linux/skbuff.h               |  3 ++
 include/net/dsa.h                    |  2 ++
 net/core/skbuff.c                    | 10 ++++++
 net/dsa/Kconfig                      |  9 +++++
 net/dsa/Makefile                     |  1 +
 net/dsa/tag_oob.c                    | 49 ++++++++++++++++++++++++++++
 9 files changed, 103 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/dsa/oob.h
 create mode 100644 net/dsa/tag_oob.c

diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst
index a94ddf83348a..2909ed5f00f6 100644
--- a/Documentation/networking/dsa/dsa.rst
+++ b/Documentation/networking/dsa/dsa.rst
@@ -66,7 +66,8 @@ Switch tagging protocols
 ------------------------
 
 DSA supports many vendor-specific tagging protocols, one software-defined
-tagging protocol, and a tag-less mode as well (``DSA_TAG_PROTO_NONE``).
+tagging protocol, a tag-less mode (``DSA_TAG_PROTO_NONE``), as well as an
+out-of-band tagging protocol (``DSA_TAG_PROTO_OOB``).
 
 The exact format of the tag protocol is vendor specific, but in general, they
 all contain something which:
@@ -217,6 +218,16 @@ receive all frames regardless of the value of the MAC DA. This can be done by
 setting the ``promisc_on_master`` property of the ``struct dsa_device_ops``.
 Note that this assumes a DSA-unaware master driver, which is the norm.
 
+Some SoCs have a tight integration between the conduit network interface and
+the embedded switch, such that the DSA tag isn't transmitted in the packet
+data, but through another medium, using so-called out-of-band tagging. In that
+case, the host MAC driver is in charge of transmitting the tag to the switch.
+An example is the IPQ4019 SoC, which transmits the tag between the IPQESS
+Ethernet controller and the qca8k switch using the DMA descriptor. In that
+configuration, tag chaining is permitted, but the OOB tag always belongs to
+the top-most switch in the tree. The tagger (``DSA_TAG_PROTO_OOB``) uses skb
+extensions to transmit the tag to and from the MAC driver.
+
 Master network devices
 ----------------------
 
diff --git a/MAINTAINERS b/MAINTAINERS
index 47588d4b1657..bdf716128058 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17055,6 +17055,7 @@ L:	netdev@vger.kernel.org
 S:	Maintained
 F:	Documentation/devicetree/bindings/net/qcom,ipq4019-ess-edma.yaml
 F:	drivers/net/ethernet/qualcomm/ipqess/
+F:	net/dsa/tag_oob.c
 
 QUALCOMM ETHQOS ETHERNET DRIVER
 M:	Vinod Koul <vkoul@kernel.org>
diff --git a/include/linux/dsa/oob.h b/include/linux/dsa/oob.h
new file mode 100644
index 000000000000..b5683a9a647d
--- /dev/null
+++ b/include/linux/dsa/oob.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ * Copyright (C) 2022 Maxime Chevallier <maxime.chevallier@bootlin.com>
+ */
+
+#ifndef _NET_DSA_OOB_H
+#define _NET_DSA_OOB_H
+
+#include <linux/skbuff.h>
+
+struct dsa_oob_tag_info {
+	u16 port;
+};
+
+int dsa_oob_tag_push(struct sk_buff *skb, struct dsa_oob_tag_info *ti);
+int dsa_oob_tag_pop(struct sk_buff *skb, struct dsa_oob_tag_info *ti);
+#endif
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 59c9fd55699d..ace765ae56b3 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4573,6 +4573,9 @@ enum skb_ext_id {
 #endif
 #if IS_ENABLED(CONFIG_MCTP_FLOWS)
 	SKB_EXT_MCTP,
+#endif
+#if IS_ENABLED(CONFIG_NET_DSA_TAG_OOB)
+	SKB_EXT_DSA_OOB,
 #endif
 	SKB_EXT_NUM, /* must be last */
 };
diff --git a/include/net/dsa.h b/include/net/dsa.h
index ee369670e20e..114176efacc9 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -55,6 +55,7 @@ struct phylink_link_state;
 #define DSA_TAG_PROTO_RTL8_4T_VALUE		25
 #define DSA_TAG_PROTO_RZN1_A5PSW_VALUE		26
 #define DSA_TAG_PROTO_LAN937X_VALUE		27
+#define DSA_TAG_PROTO_OOB_VALUE			28
 
 enum dsa_tag_protocol {
 	DSA_TAG_PROTO_NONE		= DSA_TAG_PROTO_NONE_VALUE,
@@ -85,6 +86,7 @@ enum dsa_tag_protocol {
 	DSA_TAG_PROTO_RTL8_4T		= DSA_TAG_PROTO_RTL8_4T_VALUE,
 	DSA_TAG_PROTO_RZN1_A5PSW	= DSA_TAG_PROTO_RZN1_A5PSW_VALUE,
 	DSA_TAG_PROTO_LAN937X		= DSA_TAG_PROTO_LAN937X_VALUE,
+	DSA_TAG_PROTO_OOB		= DSA_TAG_PROTO_OOB_VALUE,
 };
 
 struct dsa_switch;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 42a35b59fb1e..571ef7fd95b4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -61,8 +61,12 @@
 #include <linux/if_vlan.h>
 #include <linux/mpls.h>
 #include <linux/kcov.h>
+#ifdef CONFIG_NET_DSA_TAG_OOB
+#include <linux/dsa/oob.h>
+#endif
 
 #include <net/protocol.h>
+#include <net/dsa.h>
 #include <net/dst.h>
 #include <net/sock.h>
 #include <net/checksum.h>
@@ -4487,6 +4491,9 @@ static const u8 skb_ext_type_len[] = {
 #if IS_ENABLED(CONFIG_MCTP_FLOWS)
 	[SKB_EXT_MCTP] = SKB_EXT_CHUNKSIZEOF(struct mctp_flow),
 #endif
+#if IS_ENABLED(CONFIG_NET_DSA_TAG_OOB)
+	[SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
+#endif
 };
 
 static __always_inline unsigned int skb_ext_total_length(void)
@@ -4506,6 +4513,9 @@ static __always_inline unsigned int skb_ext_total_length(void)
 #endif
 #if IS_ENABLED(CONFIG_MCTP_FLOWS)
 		skb_ext_type_len[SKB_EXT_MCTP] +
+#endif
+#if IS_ENABLED(CONFIG_NET_DSA_TAG_OOB)
+		skb_ext_type_len[SKB_EXT_DSA_OOB] +
 #endif
 		0;
 }
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index 3eef72ce99a4..2ba4bbe07df1 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -113,6 +113,15 @@ config NET_DSA_TAG_OCELOT_8021Q
 	  this mode, less TCAM resources (VCAP IS1, IS2, ES0) are available for
 	  use with tc-flower.
 
+config NET_DSA_TAG_OOB
+	tristate "Tag driver for out-of-band tagging drivers"
+	select SKB_EXTENSIONS
+	help
+	  Say Y or M if you want to enable support for pairs of embedded
+	  switches and host MAC drivers which perform demultiplexing and
+	  packet steering to ports using out of band metadata processed
+	  by the DSA master, rather than tags present in the packets.
+
 config NET_DSA_TAG_QCA
 	tristate "Tag driver for Qualcomm Atheros QCA8K switches"
 	help
diff --git a/net/dsa/Makefile b/net/dsa/Makefile
index bf57ef3bce2a..b11c24c969ee 100644
--- a/net/dsa/Makefile
+++ b/net/dsa/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_NET_DSA_TAG_LAN9303) += tag_lan9303.o
 obj-$(CONFIG_NET_DSA_TAG_MTK) += tag_mtk.o
 obj-$(CONFIG_NET_DSA_TAG_OCELOT) += tag_ocelot.o
 obj-$(CONFIG_NET_DSA_TAG_OCELOT_8021Q) += tag_ocelot_8021q.o
+obj-$(CONFIG_NET_DSA_TAG_OOB) += tag_oob.o
 obj-$(CONFIG_NET_DSA_TAG_QCA) += tag_qca.o
 obj-$(CONFIG_NET_DSA_TAG_RTL4_A) += tag_rtl4_a.o
 obj-$(CONFIG_NET_DSA_TAG_RTL8_4) += tag_rtl8_4.o
diff --git a/net/dsa/tag_oob.c b/net/dsa/tag_oob.c
new file mode 100644
index 000000000000..e328a1f4e38d
--- /dev/null
+++ b/net/dsa/tag_oob.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2022, Maxime Chevallier <maxime.chevallier@bootlin.com> */
+
+#include <linux/bitfield.h>
+#include <linux/dsa/oob.h>
+#include <linux/skbuff.h>
+
+#include "dsa_priv.h"
+
+static struct sk_buff *oob_tag_xmit(struct sk_buff *skb,
+				    struct net_device *dev)
+{
+	struct dsa_oob_tag_info *tag_info = skb_ext_add(skb, SKB_EXT_DSA_OOB);
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+
+	tag_info->port = dp->index;
+
+	return skb;
+}
+
+static struct sk_buff *oob_tag_rcv(struct sk_buff *skb,
+				   struct net_device *dev)
+{
+	struct dsa_oob_tag_info *tag_info = skb_ext_find(skb, SKB_EXT_DSA_OOB);
+
+	if (!tag_info)
+		return NULL;
+
+	skb->dev = dsa_master_find_slave(dev, 0, tag_info->port);
+	if (!skb->dev)
+		return NULL;
+
+	return skb;
+}
+
+static const struct dsa_device_ops oob_tag_dsa_ops = {
+	.name	= "oob",
+	.proto	= DSA_TAG_PROTO_OOB,
+	.xmit	= oob_tag_xmit,
+	.rcv	= oob_tag_rcv,
+};
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("DSA tag driver for out-of-band tagging");
+MODULE_AUTHOR("Maxime Chevallier <maxime.chevallier@bootlin.com>");
+MODULE_ALIAS_DSA_TAG_DRIVER(DSA_TAG_PROTO_OOB);
+
+module_dsa_tag_driver(oob_tag_dsa_ops);
-- 
2.37.3


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v8 4/5] net: ipqess: Add out-of-band DSA tagging support
  2022-11-04 17:41 [PATCH net-next v8 0/5] net: ipqess: introduce Qualcomm IPQESS driver Maxime Chevallier
                   ` (2 preceding siblings ...)
  2022-11-04 17:41 ` [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol Maxime Chevallier
@ 2022-11-04 17:41 ` Maxime Chevallier
  2022-11-04 17:41 ` [PATCH net-next v8 5/5] ARM: dts: qcom: ipq4019: Add description for the IPQESS Ethernet controller Maxime Chevallier
  4 siblings, 0 replies; 19+ messages in thread
From: Maxime Chevallier @ 2022-11-04 17:41 UTC (permalink / raw)
  To: davem, Rob Herring, Krzysztof Kozlowski
  Cc: Maxime Chevallier, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	netdev, linux-kernel, devicetree, thomas.petazzoni, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit, Russell King,
	linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko,
	Andy Gross, Bjorn Andersson, Konrad Dybcio

On the IPQ4019, there's a 5-port switch connected to the CPU through
the IPQESS Ethernet controller. The DSA tag is sent out to that switch
through the DMA descriptor, due to how tightly the controller is
integrated with the switch.

We use the out-of-band tagging protocol by getting the source
port from the descriptor, pushing it into an skb extension, and having
the tagger pull it to infer the destination netdev. The reverse process is
done on the TX side, where the driver pulls the tag from the skb and
builds the descriptor accordingly.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---

V7->V8:
 - No changes
V6->V7:
 - Added proper endianness conversion for descriptor accesses
V5->V6:
 - Fixed the CHANGEUPPER event handling
 - removed pop/push helpers
V4->V5:
 - Rework the CHANGEUPPER event handling
V3->V4:
 - No changes
V2->V3:
 - No changes
V1->V2:
 - Use the new tagger, and the dsa_oob_tag_* helpers
 

 drivers/net/ethernet/qualcomm/Kconfig         |  1 +
 drivers/net/ethernet/qualcomm/ipqess/ipqess.c | 64 ++++++++++++++++++-
 drivers/net/ethernet/qualcomm/ipqess/ipqess.h |  4 ++
 3 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qualcomm/Kconfig b/drivers/net/ethernet/qualcomm/Kconfig
index 28861bca5a5b..7eda94131cb1 100644
--- a/drivers/net/ethernet/qualcomm/Kconfig
+++ b/drivers/net/ethernet/qualcomm/Kconfig
@@ -64,6 +64,7 @@ config QCOM_IPQ4019_ESS_EDMA
 	tristate "Qualcomm Atheros IPQ4019 ESS EDMA support"
 	depends on (OF && ARCH_QCOM) || COMPILE_TEST
 	select PHYLINK
+	select NET_DSA_TAG_OOB
 	help
 	  This driver supports the Qualcomm Atheros IPQ40xx built-in
 	  ESS EDMA ethernet controller.
diff --git a/drivers/net/ethernet/qualcomm/ipqess/ipqess.c b/drivers/net/ethernet/qualcomm/ipqess/ipqess.c
index df3f2ce77065..a2385d6407b3 100644
--- a/drivers/net/ethernet/qualcomm/ipqess/ipqess.c
+++ b/drivers/net/ethernet/qualcomm/ipqess/ipqess.c
@@ -9,6 +9,7 @@
 
 #include <linux/bitfield.h>
 #include <linux/clk.h>
+#include <linux/dsa/oob.h>
 #include <linux/if_vlan.h>
 #include <linux/interrupt.h>
 #include <linux/module.h>
@@ -22,6 +23,7 @@
 #include <linux/skbuff.h>
 #include <linux/vmalloc.h>
 #include <net/checksum.h>
+#include <net/dsa.h>
 #include <net/ip6_checksum.h>
 
 #include "ipqess.h"
@@ -327,6 +329,7 @@ static int ipqess_rx_poll(struct ipqess_rx_ring *rx_ring, int budget)
 	tail &= IPQESS_RFD_CONS_IDX_MASK;
 
 	while (done < budget) {
+		struct dsa_oob_tag_info *tag_info;
 		struct ipqess_rx_desc *rd;
 		struct sk_buff *skb;
 
@@ -406,6 +409,12 @@ static int ipqess_rx_poll(struct ipqess_rx_ring *rx_ring, int budget)
 			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021AD),
 					       le16_to_cpu(rd->rrd4));
 
+		if (likely(rx_ring->ess->dsa_ports)) {
+			tag_info = skb_ext_add(skb, SKB_EXT_DSA_OOB);
+			tag_info->port = FIELD_GET(IPQESS_RRD_PORT_ID_MASK,
+						   le16_to_cpu(rd->rrd1));
+		}
+
 		napi_gro_receive(&rx_ring->napi_rx, skb);
 
 		rx_ring->ess->stats.rx_packets++;
@@ -706,6 +715,23 @@ static void ipqess_rollback_tx(struct ipqess *eth,
 	tx_ring->head = start_index;
 }
 
+static void ipqess_process_dsa_tag_sh(struct ipqess *ess, struct sk_buff *skb,
+				      u32 *word3)
+{
+	struct dsa_oob_tag_info *tag_info;
+
+	if (unlikely(!ess->dsa_ports))
+		return;
+
+	tag_info = skb_ext_find(skb, SKB_EXT_DSA_OOB);
+	if (!tag_info)
+		return;
+
+	*word3 |= tag_info->port << IPQESS_TPD_PORT_BITMAP_SHIFT;
+	*word3 |= BIT(IPQESS_TPD_FROM_CPU_SHIFT);
+	*word3 |= 0x3e << IPQESS_TPD_PORT_BITMAP_SHIFT;
+}
+
 static int ipqess_tx_map_and_fill(struct ipqess_tx_ring *tx_ring,
 				  struct sk_buff *skb)
 {
@@ -716,6 +742,8 @@ static int ipqess_tx_map_and_fill(struct ipqess_tx_ring *tx_ring,
 	u16 len;
 	int i;
 
+	ipqess_process_dsa_tag_sh(tx_ring->ess, skb, &word3);
+
 	if (skb_is_gso(skb)) {
 		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
 			lso_word1 |= IPQESS_TPD_IPV4_EN;
@@ -917,6 +945,33 @@ static const struct net_device_ops ipqess_axi_netdev_ops = {
 	.ndo_tx_timeout		= ipqess_tx_timeout,
 };
 
+static int ipqess_netdevice_event(struct notifier_block *nb,
+				  unsigned long event, void *ptr)
+{
+	struct ipqess *ess = container_of(nb, struct ipqess, netdev_notifier);
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct netdev_notifier_changeupper_info *info;
+
+	if (dev != ess->netdev)
+		return NOTIFY_DONE;
+
+	switch (event) {
+	case NETDEV_CHANGEUPPER:
+		info = ptr;
+
+		if (!dsa_slave_dev_check(info->upper_dev))
+			return NOTIFY_DONE;
+
+		if (info->linking)
+			ess->dsa_ports++;
+		else
+			ess->dsa_ports--;
+
+		return NOTIFY_DONE;
+	}
+	return NOTIFY_OK;
+}
+
 static void ipqess_hw_stop(struct ipqess *ess)
 {
 	int i;
@@ -1184,12 +1239,19 @@ static int ipqess_axi_probe(struct platform_device *pdev)
 		netif_napi_add(netdev, &ess->rx_ring[i].napi_rx, ipqess_rx_napi);
 	}
 
-	err = register_netdev(netdev);
+	ess->netdev_notifier.notifier_call = ipqess_netdevice_event;
+	err = register_netdevice_notifier(&ess->netdev_notifier);
 	if (err)
 		goto err_hw_stop;
 
+	err = register_netdev(netdev);
+	if (err)
+		goto err_notifier_unregister;
+
 	return 0;
 
+err_notifier_unregister:
+	unregister_netdevice_notifier(&ess->netdev_notifier);
 err_hw_stop:
 	ipqess_hw_stop(ess);
 
diff --git a/drivers/net/ethernet/qualcomm/ipqess/ipqess.h b/drivers/net/ethernet/qualcomm/ipqess/ipqess.h
index dae17f3a23e4..5999a3b26235 100644
--- a/drivers/net/ethernet/qualcomm/ipqess/ipqess.h
+++ b/drivers/net/ethernet/qualcomm/ipqess/ipqess.h
@@ -171,6 +171,10 @@ struct ipqess {
 	struct platform_device *pdev;
 	struct phylink *phylink;
 	struct phylink_config phylink_config;
+
+	struct notifier_block netdev_notifier;
+	int dsa_ports;
+
 	struct ipqess_tx_ring tx_ring[IPQESS_NETDEV_QUEUES];
 
 	struct ipqess_statistics ipqess_stats;
-- 
2.37.3


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v8 5/5] ARM: dts: qcom: ipq4019: Add description for the IPQESS Ethernet controller
  2022-11-04 17:41 [PATCH net-next v8 0/5] net: ipqess: introduce Qualcomm IPQESS driver Maxime Chevallier
                   ` (3 preceding siblings ...)
  2022-11-04 17:41 ` [PATCH net-next v8 4/5] net: ipqess: Add out-of-band DSA tagging support Maxime Chevallier
@ 2022-11-04 17:41 ` Maxime Chevallier
  4 siblings, 0 replies; 19+ messages in thread
From: Maxime Chevallier @ 2022-11-04 17:41 UTC (permalink / raw)
  To: davem, Rob Herring, Krzysztof Kozlowski
  Cc: Maxime Chevallier, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	netdev, linux-kernel, devicetree, thomas.petazzoni, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit, Russell King,
	linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko,
	Andy Gross, Bjorn Andersson, Konrad Dybcio, Krzysztof Kozlowski

The Qualcomm IPQ4019 includes an internal 5-port switch, which is
connected to the CPU through the internal IPQESS Ethernet controller.

Add support for this interface, which is internally connected to a
modified version of the QCA8K Ethernet switch.

This Ethernet controller only supports a specific internal interface mode
for connection to the switch.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
---

V7->V8:
 - Added fixed-link
 - Removed ethernet0 alias
V6->V7:
 - No Changes
V5->V6:
 - Removed extra blank lines
 - Put the status property last
V4->V5:
 - Reword the commit log
V3->V4:
 - No Changes
V2->V3:
 - No Changes
V1->V2:
 - Added clock and resets

 arch/arm/boot/dts/qcom-ipq4019.dtsi | 48 +++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/arch/arm/boot/dts/qcom-ipq4019.dtsi b/arch/arm/boot/dts/qcom-ipq4019.dtsi
index b23591110bd2..c681b13aa3d9 100644
--- a/arch/arm/boot/dts/qcom-ipq4019.dtsi
+++ b/arch/arm/boot/dts/qcom-ipq4019.dtsi
@@ -591,6 +591,54 @@ wifi1: wifi@a800000 {
 			status = "disabled";
 		};
 
+		gmac: ethernet@c080000 {
+			compatible = "qcom,ipq4019-ess-edma";
+			reg = <0xc080000 0x8000>;
+			resets = <&gcc ESS_RESET>;
+			reset-names = "ess";
+			clocks = <&gcc GCC_ESS_CLK>;
+			clock-names = "ess";
+			interrupts = <GIC_SPI  65 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  66 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  67 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  68 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  69 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  70 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  71 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  72 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  73 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  74 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  75 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  76 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  77 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  78 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  79 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  80 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 240 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 241 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 242 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 243 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 244 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 245 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 246 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 247 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 248 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 249 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 250 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 251 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 252 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 253 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 254 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 255 IRQ_TYPE_EDGE_RISING>;
+			phy-mode = "internal";
+			status = "disabled";
+			fixed-link {
+				speed = <1000>;
+				full-duplex;
+				pause;
+			};
+		};
+
 		mdio: mdio@90000 {
 			#address-cells = <1>;
 			#size-cells = <0>;
-- 
2.37.3


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol
  2022-11-04 17:41 ` [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol Maxime Chevallier
@ 2022-11-05  3:05   ` Jakub Kicinski
  2022-11-07  8:39     ` Maxime Chevallier
  2022-11-07 11:27     ` Vladimir Oltean
  0 siblings, 2 replies; 19+ messages in thread
From: Jakub Kicinski @ 2022-11-05  3:05 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Rob Herring, Krzysztof Kozlowski, Eric Dumazet,
	Paolo Abeni, netdev, linux-kernel, devicetree, thomas.petazzoni,
	Andrew Lunn, Florian Fainelli, Heiner Kallweit, Russell King,
	linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko,
	Andy Gross, Bjorn Andersson, Konrad Dybcio

On Fri,  4 Nov 2022 18:41:49 +0100 Maxime Chevallier wrote:
> This tagging protocol is designed for the situation where the link
> between the MAC and the Switch is designed such that the Destination
> Port, which is usually embedded in some part of the Ethernet Header, is
> sent out-of-band, and isn't present at all in the Ethernet frame.
> 
> This can happen when the MAC and Switch are tightly integrated on an
> SoC, as is the case with the Qualcomm IPQ4019 for example, where the DSA
> tag is inserted directly into the DMA descriptors. In that case,
> the MAC driver is responsible for sending the tag to the switch using
> the out-of-band medium. To do so, the MAC driver needs to have the
> information of the destination port for that skb.
> 
> Add a new tagging protocol based on SKB extensions to convey the
> information about the destination port to the MAC driver

This is what METADATA_HW_PORT_MUX is for, you shouldn't have 
to allocate a piece of memory for every single packet.
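For reference, that pattern preallocates one metadata dst per port and only
takes a reference on it in the fast path. A minimal sketch, modelled on
existing METADATA_HW_PORT_MUX users such as the nfp representors; the
ipqess_* names are assumptions for illustration, not code from this series:

#include <linux/skbuff.h>
#include <net/dst_metadata.h>

/* Hypothetical per-port state; only the dst_metadata API calls are real. */
struct ipqess_port_md {
	struct metadata_dst *md_dst;	/* allocated once, not per packet */
};

static int ipqess_port_md_init(struct ipqess_port_md *pm,
			       struct net_device *lower_dev, u32 port_id)
{
	pm->md_dst = metadata_dst_alloc(0, METADATA_HW_PORT_MUX, GFP_KERNEL);
	if (!pm->md_dst)
		return -ENOMEM;

	pm->md_dst->u.port_info.lower_dev = lower_dev;
	pm->md_dst->u.port_info.port_id = port_id;

	return 0;
}

/* Fast path: take a reference on the preallocated dst and attach it to the
 * skb, instead of allocating per-packet metadata.
 */
static void ipqess_port_md_attach(struct ipqess_port_md *pm,
				  struct sk_buff *skb)
{
	dst_hold(&pm->md_dst->dst);
	skb_dst_drop(skb);
	skb_dst_set(skb, &pm->md_dst->dst);
}

/* Consumer side: recover the port id from the metadata dst. */
static u32 ipqess_port_md_get(const struct sk_buff *skb)
{
	struct metadata_dst *md_dst = skb_metadata_dst(skb);

	if (md_dst && md_dst->type == METADATA_HW_PORT_MUX)
		return md_dst->u.port_info.port_id;

	return 0;
}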

Also the series doesn't build.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol
  2022-11-05  3:05   ` Jakub Kicinski
@ 2022-11-07  8:39     ` Maxime Chevallier
  2022-11-07 16:25       ` Jakub Kicinski
  2022-11-08 12:22       ` Felix Fietkau
  2022-11-07 11:27     ` Vladimir Oltean
  1 sibling, 2 replies; 19+ messages in thread
From: Maxime Chevallier @ 2022-11-07  8:39 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, Rob Herring, Krzysztof Kozlowski, Eric Dumazet,
	Paolo Abeni, netdev, linux-kernel, devicetree, thomas.petazzoni,
	Andrew Lunn, Florian Fainelli, Heiner Kallweit, Russell King,
	linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko,
	Andy Gross, Bjorn Andersson, Konrad Dybcio

Hello Jakub,

On Fri, 4 Nov 2022 20:05:30 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> On Fri,  4 Nov 2022 18:41:49 +0100 Maxime Chevallier wrote:
> > This tagging protocol is designed for the situation where the link
> > between the MAC and the Switch is designed such that the Destination
> > Port, which is usually embedded in some part of the Ethernet
> > Header, is sent out-of-band, and isn't present at all in the
> > Ethernet frame.
> > 
> > This can happen when the MAC and Switch are tightly integrated on an
> > SoC, as is the case with the Qualcomm IPQ4019 for example, where
> > the DSA tag is inserted directly into the DMA descriptors. In that
> > case, the MAC driver is responsible for sending the tag to the
> > switch using the out-of-band medium. To do so, the MAC driver needs
> > to have the information of the destination port for that skb.
> > 
> > Add a new tagging protocol based on SKB extensions to convey the
> > information about the destination port to the MAC driver  
> 
> This is what METADATA_HW_PORT_MUX is for, you shouldn't have 
> to allocate a piece of memory for every single packet.

Does this work with DSA? The information conveyed in the extension is
the DSA port identifier. I'm not familiar at all with
METADATA_HW_PORT_MUX; should we extend that mechanism to convey the
DSA port id?

I also agree that allocating data isn't the best way to go, but from
the history of this series, we've tried 3 approaches so far:

 - Adding a new field to struct sk_buff, which isn't a good idea
 - Using the skb headroom, but then we can't know for sure if the skb
   contains a DSA tag or not
 - Using skb extensions, which comes with the cost of this memory
   allocation. Is this approach also incorrect then?

> Also the series doesn't build.

Can you elaborate more? I can't reproduce the build failure on my
side, and I didn't get any reports from the kbuild bot. Are you using a
specific config file?

Thanks,

Maxime

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol
  2022-11-05  3:05   ` Jakub Kicinski
  2022-11-07  8:39     ` Maxime Chevallier
@ 2022-11-07 11:27     ` Vladimir Oltean
  2022-11-07 12:51       ` Vladimir Oltean
       [not found]       ` <20221107084535.61317862@kernel.org>
  1 sibling, 2 replies; 19+ messages in thread
From: Vladimir Oltean @ 2022-11-07 11:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Maxime Chevallier, davem, Rob Herring, Krzysztof Kozlowski,
	Eric Dumazet, Paolo Abeni, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Luka Perkov, Robert Marko,
	Andy Gross, Bjorn Andersson, Konrad Dybcio

Hi Jakub,

On Fri, Nov 04, 2022 at 08:05:30PM -0700, Jakub Kicinski wrote:
> On Fri,  4 Nov 2022 18:41:49 +0100 Maxime Chevallier wrote:
> > Add a new tagging protocol based on SKB extensions to convey the
> > information about the destination port to the MAC driver
>
> This is what METADATA_HW_PORT_MUX is for, you shouldn't have
> to allocate a piece of memory for every single packet.

Since this is the model that skb extensions propose and not something
that Maxime invented for this series, I presume that's not such a big
deal? What's more, couldn't this specific limitation of skb extensions
be addressed in a targeted way, via one-time calls to __skb_ext_alloc()
and fast-path calls to __skb_ext_set()?
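To make that concrete, a rough sketch of such a scheme, reusing the
SKB_EXT_DSA_OOB extension from this series; the ipqess_port structure and
helper names are assumptions, and whether sharing one preallocated extension
across in-flight skbs like this is acceptable is exactly the refcounting
question discussed below:

#include <linux/skbuff.h>
#include <linux/dsa/oob.h>

/* Hypothetical per-port state; only the __skb_ext_* calls are real API. */
struct ipqess_port {
	int index;
	struct skb_ext *oob_ext;	/* allocated once, e.g. at probe time */
};

static int ipqess_port_oob_init(struct ipqess_port *port)
{
	port->oob_ext = __skb_ext_alloc(GFP_KERNEL);

	return port->oob_ext ? 0 : -ENOMEM;
}

/* RX fast path: reuse the preallocated extension instead of calling
 * skb_ext_add(), which allocates for every packet.
 */
static void ipqess_port_oob_set(struct ipqess_port *port, struct sk_buff *skb)
{
	struct dsa_oob_tag_info *ti;

	refcount_inc(&port->oob_ext->refcnt);
	ti = __skb_ext_set(skb, SKB_EXT_DSA_OOB, port->oob_ext);
	ti->port = port->index;	/* always the same value for a given port */
}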

I'm unfamiliar with the concept of destination cache entries and even more
so with the concept of struct dst_entry * carrying metadata. I suppose the
latter were introduced for lack of space in struct sk_buff, to carry
metadata between layers that aren't L3/L4 (where normal dst_entry structs
are used)? What makes metadata dst's preferable to skb extensions?
The latter are more general; AFAIK they can be used between any layer
and any other layer, like for example between RX and TX in the
forwarding path. Side note, I am not exactly clear on what the lifetime
guarantees of a metadata dst entry are, and whether DSA's use would be 100% safe
(DSA is kind of L3, since it has an ETH_P_XDSA packet_type handler, not
an rx_handler).

More importantly, what happens if a DSA switch is used together with a
SRIOV-capable DSA master which already uses METADATA_HW_PORT_MUX for
PF-VF communication? (if I understood the commit message of 3fcece12bc1b
("net: store port/representator id in metadata_dst") correctly)
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol
  2022-11-07 11:27     ` Vladimir Oltean
@ 2022-11-07 12:51       ` Vladimir Oltean
       [not found]         ` <20221107084934.157becba@kernel.org>
       [not found]       ` <20221107084535.61317862@kernel.org>
  1 sibling, 1 reply; 19+ messages in thread
From: Vladimir Oltean @ 2022-11-07 12:51 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Maxime Chevallier, davem, Rob Herring, Krzysztof Kozlowski,
	Eric Dumazet, Paolo Abeni, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Luka Perkov, Robert Marko,
	Andy Gross, Bjorn Andersson, Konrad Dybcio

On Mon, Nov 07, 2022 at 01:27:36PM +0200, Vladimir Oltean wrote:
> Hi Jakub,
> (...)

There is also another problem having to do with future extensibility of
METADATA_HW_PORT_MUX for DSA. I don't know how much of this is going to
be applicable for qca8k, but DSA tags might also carry such information
as trap reason (RX) or injection type (into forwarding plane or control
packet; the latter bypasses port STP state) and the FID to which the
packet should be classified by the hardware (TX). If we're going to
design a mechanism which only preallocates metadata dst's for ports,
it's going to be difficult to make that work for more information later on.
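As a purely hypothetical illustration, carrying that extra metadata over the
skb-extension route would only mean growing the structure; none of these
extra fields exist in the posted patches:

struct dsa_oob_tag_info {
	u16 port;		/* as in the posted patches */
	u16 trap_reason;	/* RX: why the packet was trapped to the CPU */
	u16 fid;		/* TX: FID the hardware should classify the packet to */
	u8  inject_type;	/* TX: forwarding-plane vs. control-packet injection */
};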
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol
  2022-11-07  8:39     ` Maxime Chevallier
@ 2022-11-07 16:25       ` Jakub Kicinski
  2022-11-08 12:22       ` Felix Fietkau
  1 sibling, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2022-11-07 16:25 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Rob Herring, Krzysztof Kozlowski, Eric Dumazet,
	Paolo Abeni, netdev, linux-kernel, devicetree, thomas.petazzoni,
	Andrew Lunn, Florian Fainelli, Heiner Kallweit, Russell King,
	linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko,
	Andy Gross, Bjorn Andersson, Konrad Dybcio

On Mon, 7 Nov 2022 09:39:50 +0100 Maxime Chevallier wrote:
> > Also the series doesn't build.  
> 
> Can you elaborate more ? I can't reproduce the build failure on my
> side, and I didn't get any reports from the kbuild bot, are you using a
> specific config file ?

../net/core/skbuff.c:4495:49: error: invalid application of ‘sizeof’ to incomplete type ‘struct dsa_oob_tag_info’
 4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
      |                                                 ^~~~~~
../include/uapi/linux/const.h:32:44: note: in definition of macro ‘__ALIGN_KERNEL_MASK’
   32 | #define __ALIGN_KERNEL_MASK(x, mask)    (((x) + (mask)) & ~(mask))
      |                                            ^
../include/linux/align.h:8:33: note: in expansion of macro ‘__ALIGN_KERNEL’
    8 | #define ALIGN(x, a)             __ALIGN_KERNEL((x), (a))
      |                                 ^~~~~~~~~~~~~~
../net/core/skbuff.c:4476:34: note: in expansion of macro ‘ALIGN’
 4476 | #define SKB_EXT_CHUNKSIZEOF(x)  (ALIGN((sizeof(x)), SKB_EXT_ALIGN_VALUE) / SKB_EXT_ALIGN_VALUE)
      |                                  ^~~~~
../net/core/skbuff.c:4495:29: note: in expansion of macro ‘SKB_EXT_CHUNKSIZEOF’
 4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
      |                             ^~~~~~~~~~~~~~~~~~~
../net/core/skbuff.c:4495:49: error: invalid application of ‘sizeof’ to incomplete type ‘struct dsa_oob_tag_info’
 4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
      |                                                 ^~~~~~
../include/uapi/linux/const.h:32:50: note: in definition of macro ‘__ALIGN_KERNEL_MASK’
   32 | #define __ALIGN_KERNEL_MASK(x, mask)    (((x) + (mask)) & ~(mask))
      |                                                  ^~~~
../include/linux/align.h:8:33: note: in expansion of macro ‘__ALIGN_KERNEL’
    8 | #define ALIGN(x, a)             __ALIGN_KERNEL((x), (a))
      |                                 ^~~~~~~~~~~~~~
../net/core/skbuff.c:4476:34: note: in expansion of macro ‘ALIGN’
 4476 | #define SKB_EXT_CHUNKSIZEOF(x)  (ALIGN((sizeof(x)), SKB_EXT_ALIGN_VALUE) / SKB_EXT_ALIGN_VALUE)
      |                                  ^~~~~
../net/core/skbuff.c:4495:29: note: in expansion of macro ‘SKB_EXT_CHUNKSIZEOF’
 4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
      |                             ^~~~~~~~~~~~~~~~~~~
../net/core/skbuff.c:4495:49: error: invalid application of ‘sizeof’ to incomplete type ‘struct dsa_oob_tag_info’
 4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
      |                                                 ^~~~~~
../include/uapi/linux/const.h:32:61: note: in definition of macro ‘__ALIGN_KERNEL_MASK’
   32 | #define __ALIGN_KERNEL_MASK(x, mask)    (((x) + (mask)) & ~(mask))
      |                                                             ^~~~
../include/linux/align.h:8:33: note: in expansion of macro ‘__ALIGN_KERNEL’
    8 | #define ALIGN(x, a)             __ALIGN_KERNEL((x), (a))
      |                                 ^~~~~~~~~~~~~~
../net/core/skbuff.c:4476:34: note: in expansion of macro ‘ALIGN’
 4476 | #define SKB_EXT_CHUNKSIZEOF(x)  (ALIGN((sizeof(x)), SKB_EXT_ALIGN_VALUE) / SKB_EXT_ALIGN_VALUE)
      |                                  ^~~~~
../net/core/skbuff.c:4495:29: note: in expansion of macro ‘SKB_EXT_CHUNKSIZEOF’
 4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
      |                             ^~~~~~~~~~~~~~~~~~~


Also this:

drivers/net/ethernet/qualcomm/ipqess/ipqess.c:1172:22: warning: cast to smaller integer type 'u32' (aka 'unsigned int') from 'void *' [-Wvoid-pointer-to-int-cast]
        netdev->base_addr = (u32)ess->hw_addr;
                            ^~~~~~~~~~~~~~~~~

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol
       [not found]         ` <20221107084934.157becba@kernel.org>
@ 2022-11-07 17:04           ` Vladimir Oltean
  0 siblings, 0 replies; 19+ messages in thread
From: Vladimir Oltean @ 2022-11-07 17:04 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Maxime Chevallier, davem, Rob Herring, Krzysztof Kozlowski,
	Eric Dumazet, Paolo Abeni, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Luka Perkov, Robert Marko,
	Andy Gross, Bjorn Andersson, Konrad Dybcio

On Mon, Nov 07, 2022 at 08:49:34AM -0800, Jakub Kicinski wrote:
> On Mon, 7 Nov 2022 12:51:26 +0000 Vladimir Oltean wrote:
> > There is also another problem having to do with future extensibility of
> > METADATA_HW_PORT_MUX for DSA. I don't know how much of this is going to
> > be applicable for qca8k, but DSA tags might also carry such information
> > as trap reason (RX) or injection type (into forwarding plane or control
> > packet; the latter bypasses port STP state) and the FID to which the
> > packet should be classified by the hardware (TX). If we're going to
> > design a mechanism which only preallocates metadata dst's for ports,
> > it's going to be difficult to make that work for more information later on.
>
> The entire patch we're commenting on is 100 LoC. Seems like a small
> thing, which can be rewritten later as needed. I don't think hand wave-y
> arguments are sufficient to go with a much heavier solution from the
> start.

I don't think it's as hand wavey as you think. Maxime did not present
the switch-side changes or device tree in this patch set. If it's going
to be based on drivers/net/dsa/qca/qca8k-common.c as I suspect, then it
might have some obscure features which are already supported by 'normal'
QCA8K DSA switches, like register read/write over Ethernet, and MIB
autocasting. If these features exist in hardware (they aren't exposed by
this patch set for sure), you'd be hard-pressed to fit them into the
METADATA_HW_PORT_MUX model, since it's pure management traffic consumed
by the switch driver and not delivered to the network stack, as opposed
to packets sent/received on behalf of any switch port.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol
       [not found]       ` <20221107084535.61317862@kernel.org>
@ 2022-11-07 17:28         ` Vladimir Oltean
       [not found]           ` <20221107102440.1aecdbdb@kernel.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Vladimir Oltean @ 2022-11-07 17:28 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Maxime Chevallier, davem, Rob Herring, Krzysztof Kozlowski,
	Eric Dumazet, Paolo Abeni, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Luka Perkov, Robert Marko,
	Andy Gross, Bjorn Andersson, Konrad Dybcio

On Mon, Nov 07, 2022 at 08:45:35AM -0800, Jakub Kicinski wrote:
> On Mon, 7 Nov 2022 11:27:37 +0000 Vladimir Oltean wrote:
> > Since this is the model that skb extensions propose and not something
> > that Maxime invented for this series, I presume that's not such a big
> > deal?
> 
> It's not a generic "do whatever you want" with it feature. The more
> people use it the less possible it is to have it disabled in a host-
> -centric kernel. 

We were talking about "the model" being "the model where you allocate
the extension for each packet", no?

> > What's more, couldn't this specific limitation of skb extensions
> > be addressed in a punctual way, via one-time calls to __skb_ext_alloc()
> > and fast path calls to __skb_ext_set()?
> 
> Are you suggesting we add refcounting to the skb ext?

idk, is it such a big offence? :)

Actually, my previous paragraph, to which you replied with an apparently
unrelated comment, was saying that I think we're okay with allocating an
skb extension for each packet, if that's what the skb extension usage
model proposes.

> > I'm unfamiliar to the concept of destination cache entries and even more
> > so to the concept of struct dst_entry * carrying metadata. I suppose the
> > latter were introduced for lack of space in struct sk_buff, to carry
> > metadata between layers that aren't L3/L4 (where normal dst_entry structs
> > are used)? What makes metadata dst's preferable to skb extensions?
> 
> It's much less invasive.

Don't get me wrong, I don't oppose a dst_metadata solution as long as I
think that I understand it and that I can maintain/extend it as needed
going forward (which I clearly think I do about skb extensions, they
seem simple to use to the naive reader). No need to get territorial
about it, better to arm yourself with a bit of patience.

> 
> > The latter are more general; AFAIK they can be used between any layer
> > and any other layer, like for example between RX and TX in the
> > forwarding path.
> 
> You can't be using lower-dev / upper-dev metadata across forwarding,
> how would that ever work?

What makes metadata dst's preferable to skb extensions?
           ~~~~~~~~~~~~                 ~~~~~~~~~~~~~~
           former                       latter

I said: "The latter [aka skb extensions, not metadata dst's] are more general".
I did not say that you can use metadata dst's across forwarding, quite
the opposite.

> 
> > Side note, I am not exactly clear what are the lifetime
> > guarantees of a metadata dst entry, and if DSA's use would be 100% safe
> > (DSA is kind of L3, since it has an ETH_P_XDSA packet_type handler, not
> > an rx_handler).
> 
> It's just a refcounted object. I presume the DSA uppers can't get
> spawned before the lower is spawned already?

By lifetime guarantees, I actually meant: what is the latest point
during the RX path that skb_dst() will still point to the metadata dst,
and not get replaced with the real destination cache entry?

I think we're okay, because although DSA presents itself as an L3
protocol in the RX path, the 'real' L3 protocol handler will surely not
have run earlier than DSA, due to how eth_type_trans() was patched to
return ETH_P_XDSA.
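
The check I'm relying on in eth_type_trans() is roughly the following
(quoting from memory, so treat it as approximate):

	/* net/ethernet/eth.c, eth_type_trans(): any interface with a DSA
	 * switch attached hands all frames to the ETH_P_XDSA packet_type
	 * handler, so the 'real' L3 handler cannot run before DSA does.
	 */
	if (unlikely(netdev_uses_dsa(dev)))
		return htons(ETH_P_XDSA);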

Or I might be reading things completely wrong. Again, I have no
experience with destination cache entry structures or their metadata
carrying kind. Or with skb extensions, for that matter, other than
noticing that they exist.

> 
> > More importantly, what happens if a DSA switch is used together with a
> > SRIOV-capable DSA master which already uses METADATA_HW_PORT_MUX for
> > PF-VF communication? (if I understood the commit message of 3fcece12bc1b
> > ("net: store port/representator id in metadata_dst") correctly)
> 
> Let's be clear that the OOB metadata model only works if both upper and 
> lower are aware of the metadata. In fact they are pretty tightly bound.
> So chances of a mismatch are extremely low and theorizing about them is
> academic.

Legally I'm not allowed to say too much, but let's say I've heard about
something which makes the above not theoretical. Anyway, let's assume
it's not a concern.

> 
> In general, I'm not sure if pretending this is DSA is not an unnecessary
> complication which will end up hurting both ends of the equation.

This is a valid point. We've refused wacky "not DSA, not switchdev"
hardware before:
https://lore.kernel.org/netdev/20201125232459.378-1-lukma@denx.de/
There's also the option of doing what I did with ocelot/felix: a common
switch lib and 2 distinct front-ends, one switchdev and one DSA.

Not a lot of people seem to be willing to put that effort in, though.
The imx28 patch set was eventually abandoned. I thought I'd try a
different approach this time. Idk, maybe it's a waste of time.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol
       [not found]           ` <20221107102440.1aecdbdb@kernel.org>
@ 2022-11-07 18:40             ` Florian Fainelli
  2022-11-07 20:07             ` Vladimir Oltean
  1 sibling, 0 replies; 19+ messages in thread
From: Florian Fainelli @ 2022-11-07 18:40 UTC (permalink / raw)
  To: Jakub Kicinski, Vladimir Oltean
  Cc: Maxime Chevallier, davem, Rob Herring, Krzysztof Kozlowski,
	Eric Dumazet, Paolo Abeni, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Heiner Kallweit, Russell King,
	linux-arm-kernel@lists.infradead.org,
	Luka Perkov, Robert Marko, Andy Gross, Bjorn Andersson,
	Konrad Dybcio

On 11/7/22 10:24, Jakub Kicinski wrote:

[snip]

> Yeah, it's a balancing act. Please explore the metadata option, I think
> most people jump to the skb extension because they don't know about
> metadata. If you still want skb extension after, I'll look away.

It seems to me like we are trying too hard to have a generic out of band 
solution to provide tagger information coming from a DMA descriptor as 
opposed to just introducing a DSA tagger variant specific to the format 
being used and specific to the switch + integrated MAC. Something like 
DSA_TAG_IPQDMA or whatever the name chosen would be, may be fine.

The only value I see at this point is just in telling me that the tagger
format is coming from a DMA descriptor, but other than that, it is just
a middle layer that requires marshalling of data on both sides. Sure,
the idea behind DSA was to be able to mix and match any Ethernet MAC
with any discrete switch, but integrating both into the same ASIC does
nullify that design goal.
-- 
Florian



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol
       [not found]           ` <20221107102440.1aecdbdb@kernel.org>
  2022-11-07 18:40             ` Florian Fainelli
@ 2022-11-07 20:07             ` Vladimir Oltean
  1 sibling, 0 replies; 19+ messages in thread
From: Vladimir Oltean @ 2022-11-07 20:07 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Maxime Chevallier, davem, Rob Herring, Krzysztof Kozlowski,
	Eric Dumazet, Paolo Abeni, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Luka Perkov, Robert Marko,
	Andy Gross, Bjorn Andersson, Konrad Dybcio

On Mon, Nov 07, 2022 at 10:24:40AM -0800, Jakub Kicinski wrote:
> IIRC the skb extensions were initially proposed as a way to handle rare
> exception packets (e.g. first packet of a connection-tracked flow in OvS
> offloads). Also MPTCP but that's also edge / slow path (sorry MPTCP).
>
> Now the usage is spreading and I have to keep fighting to keep them out
> of the datacenter production kernel I co-maintain.
>
> So, yeah, I hate them :)

To be fair, most people see CPU-terminated traffic on a DSA switch
(i.e. what you see in net/dsa/tag_*.c) as slow path which needs no
optimization. It's not that I condone this, but it's factually true.
If it wasn't the case, then out of the drivers I maintain, a control
packet wouldn't be delivered via SPI on SJA1105, and flow control
wouldn't be broken on the CPU port of switches from the Ocelot family.
And more importantly, software bridging between a switchdev and a
non-switchdev port wouldn't be such an oversight for more than 3/4 of
all switch drivers.

Also, I didn't really get *why* you hate them, just that you do.
Seems circular: slow => hate; hate => slow?

I don't think that skb->_skb_refdst is the hallmark of clean or simple
designs either, a pointer and a refcount bit squashed into a single
sk_buff field that is also in a union with 2 other things, and which is
reused in other network layers for purposes that have nothing to do with
L3 routing. Nope, sorry, this is highly optimized design at its finest,
true, but I have no interest in doing mental gymnastics in order to
maintain such a thing, just because some hardware manufacturer thought
that it would be a smart idea to split up device ownership in this way,
and neither build a 'switch with rings' nor a 'switch with tags', but
rather 'a switch with somebody else's rings'. The people who built this
monstrosity should step in and maintain the software architecture that's
a direct consequence of their design choices. Otherwise I'm going to opt
for the simplest thing to maintain that works. It's unfair to not care
about software support for your own hardware enough to study frameworks
beforehand, *and* to complain about performance.

> > > > The latter are more general; AFAIK they can be used between any layer
> > > > and any other layer, like for example between RX and TX in the
> > > > forwarding path.
> > >
> > > You can't be using lower-dev / upper-dev metadata across forwarding,
> > > how would that ever work?
> >
> > What makes metadata dst's preferable to skb extensions?
> >            ~~~~~~~~~~~~                 ~~~~~~~~~~~~~~
> >            former                       latter
> >
> > I said: "The latter [aka skb extensions, not metadata dst's] are more general".
> > I did not say that you can use metadata dst's across forwarding, quite
> > the opposite.
>
> No, no, I'm asking how you'd use either. I'm questioning the entire
> flow, not whether either mechanism can be used to fulfill it.

Well, we are probably talking about different things. I said that skb
extensions are a more general concept which *allows* you to pass metadata
from the iif to the oif. Metadata dst's don't. So what is the need for
metadata dst's, if skb extensions can do what those can do, and more.
Not that this use case is particularly relevant to DSA OOB. Just that I
think a reasonable expectation would have been to make skb extensions
more performant than to introduce a parallel mechanism.

> TBH I mostly have experience on the Tx side, given that on the Rx side
> there is no queuing so the entire abstraction of tag implementation
> being separate is not strictly necessary. But if you find that the Rx
> doesn't work, and you really want the skb extensions - then, well,
> I acquiesce. And hope the Meta prod kernel never needs OOB DSA :)

I would never enable this feature, either. I would love not having to
see it.

> > > > More importantly, what happens if a DSA switch is used together with a
> > > > SRIOV-capable DSA master which already uses METADATA_HW_PORT_MUX for
> > > > PF-VF communication? (if I understood the commit message of 3fcece12bc1b
> > > > ("net: store port/representator id in metadata_dst") correctly)
> > >
> > > Let's be clear that the OOB metadata model only works if both upper and
> > > lower are aware of the metadata. In fact they are pretty tightly bound.
> > > So chances of a mismatch are extremely low and theorizing about them is
> > > academic.
> >
> > Legally I'm not allowed to say too much, but let's say I've heard about
> > something which makes the above not theoretical. Anyway, let's assume
> > it's not a concern.
>
> But in that case the same vendor designs both ends, right?

Yes.

> So there should be no conflict between the metadata assigned for reprs
> vs dsa ports.

Can't say more, sorry.

> > > In general, I'm not sure if pretending this is DSA is not an unnecessary
> > > complication which will end up hurting both ends of the equation.
> >
> > This is a valid point. We've refused wacky "not DSA, not switchdev"
> > hardware before:
> > https://lore.kernel.org/netdev/20201125232459.378-1-lukma@denx.de/
> > There's also the option of doing what I did with ocelot/felix: a common
> > switch lib and 2 distinct front-ends, one switchdev and one DSA.
>
> Exactly.
>
> > Not a lot of people seem to be willing to put that effort in, though.
> > The imx28 patch set was eventually abandoned. I thought I'd try a
> > different approach this time. Idk, maybe it's a waste of time.
>
> Yeah, it's a balancing act. Please explore the metadata option, I think
> most people jump to the skb extension because they don't know about
> metadata. If you still want skb extension after, I'll look away.

Well, I guess I'm still not really convinced about metadata_dst, you're
still not really convinced about skb extensions, but what we have in
common is that "one switch lib, two front ends" is an alternative worth
exploring as a design that's both clean and efficient? :)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol
  2022-11-07  8:39     ` Maxime Chevallier
  2022-11-07 16:25       ` Jakub Kicinski
@ 2022-11-08 12:22       ` Felix Fietkau
  2022-11-15  9:29         ` Maxime Chevallier
  1 sibling, 1 reply; 19+ messages in thread
From: Felix Fietkau @ 2022-11-08 12:22 UTC (permalink / raw)
  To: Maxime Chevallier, Jakub Kicinski
  Cc: davem, Rob Herring, Krzysztof Kozlowski, Eric Dumazet,
	Paolo Abeni, netdev, linux-kernel, devicetree, thomas.petazzoni,
	Andrew Lunn, Florian Fainelli, Heiner Kallweit, Russell King,
	linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko,
	Andy Gross, Bjorn Andersson, Konrad Dybcio

On 07.11.22 09:39, Maxime Chevallier wrote:
>> On Fri,  4 Nov 2022 18:41:49 +0100 Maxime Chevallier wrote:
>> > This tagging protocol is designed for the situation where the link
>> > between the MAC and the Switch is designed such that the Destination
>> > Port, which is usually embedded in some part of the Ethernet
>> > Header, is sent out-of-band, and isn't present at all in the
>> > Ethernet frame.
>> > 
>> > This can happen when the MAC and Switch are tightly integrated on an
>> > SoC, as is the case with the Qualcomm IPQ4019 for example, where
>> > the DSA tag is inserted directly into the DMA descriptors. In that
>> > case, the MAC driver is responsible for sending the tag to the
>> > switch using the out-of-band medium. To do so, the MAC driver needs
>> > to have the information of the destination port for that skb.
>> > 
>> > Add a new tagging protocol based on SKB extensions to convey the
>> > information about the destination port to the MAC driver  
>> 
>> This is what METADATA_HW_PORT_MUX is for, you shouldn't have 
>> to allocate a piece of memory for every single packet.
> 
> Does this work with DSA ? The information conveyed in the extension is
> the DSA port identifier. I'm not familiar at all with
> METADATA_HW_PORT_MUX, should we extend that mechanism to convey the
> DSA port id ?
> 
> I also agree that allocating data isn't the best way to go, but from
> the history of this series, we've tried 3 approaches so far :
> 
>   - Adding a new field to struct sk_buff, which isn't a good idea
>   - Using the skb headroom, but then we can't know for sure if the skb
>     contains a DSA tag or not
>   - Using skb extensions, that comes with the cost of this memory
>     allocation. Is this approach also incorrect then ?
FYI, I'm currently working on hardware DSA untagging on the mediatek
mtk_eth_soc driver. On this hardware, I definitely need to keep the
custom DSA tag driver, as hardware untagging is not always available.
For the receive side, I came up with this patch (still untested) for
using METADATA_HW_PORT_MUX.
It has the advantage of being able to skip the tag protocol rcv ops
call for offload-enabled packets.

Maybe for the transmit side we could have some kind of netdev feature
or capability that indicates offload support and allows skipping the
tag xmit function as well.
In that case, ipqess could simply use a no-op tag driver.

What do you think?

---
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -972,11 +972,13 @@ bool __skb_flow_dissect(const struct net *net,
  		if (unlikely(skb->dev && netdev_uses_dsa(skb->dev) &&
  			     skb->protocol == htons(ETH_P_XDSA))) {
  			const struct dsa_device_ops *ops;
+			struct metadata_dst *md_dst = skb_metadata_dst(skb);
  			int offset = 0;
  
  			ops = skb->dev->dsa_ptr->tag_ops;
  			/* Only DSA header taggers break flow dissection */
-			if (ops->needed_headroom) {
+			if (ops->needed_headroom &&
+			    (!md_dst || md_dst->type != METADATA_HW_PORT_MUX)) {
  				if (ops->flow_dissect)
  					ops->flow_dissect(skb, &proto, &offset);
  				else
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -11,6 +11,7 @@
  #include <linux/netdevice.h>
  #include <linux/sysfs.h>
  #include <linux/ptp_classify.h>
+#include <net/dst_metadata.h>
  
  #include "dsa_priv.h"
  
@@ -216,6 +217,7 @@ static bool dsa_skb_defer_rx_timestamp(struct dsa_slave_priv *p,
  static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
  			  struct packet_type *pt, struct net_device *unused)
  {
+	struct metadata_dst *md_dst = skb_metadata_dst(skb);
  	struct dsa_port *cpu_dp = dev->dsa_ptr;
  	struct sk_buff *nskb = NULL;
  	struct dsa_slave_priv *p;
@@ -229,7 +231,21 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
  	if (!skb)
  		return 0;
  
-	nskb = cpu_dp->rcv(skb, dev);
+	if (md_dst && md_dst->type == METADATA_HW_PORT_MUX) {
+		unsigned int port = md_dst->u.port_info.port_id;
+
+		dsa_default_offload_fwd_mark(skb);
+		skb_dst_set(skb, NULL);
+		if (!skb_has_extensions(skb))
+			skb->slow_gro = 0;
+
+		skb->dev = dsa_master_find_slave(dev, 0, port);
+		if (skb->dev)
+			nskb = skb;
+	} else {
+		nskb = cpu_dp->rcv(skb, dev);
+	}
+
  	if (!nskb) {
  		kfree_skb(skb);
  		return 0;
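
The producer side in the MAC driver would then be something along these
lines (a rough sketch with made-up names; the point is one metadata_dst
per port, allocated once at probe time, so nothing is allocated per
packet):

#include <net/dst_metadata.h>

#define OOB_NUM_PORTS	5	/* placeholder for the real port count */

static struct metadata_dst *oob_md_dst[OOB_NUM_PORTS];

static int oob_md_dst_init(void)
{
	int i;

	for (i = 0; i < OOB_NUM_PORTS; i++) {
		oob_md_dst[i] = metadata_dst_alloc(0, METADATA_HW_PORT_MUX,
						   GFP_KERNEL);
		if (!oob_md_dst[i])
			return -ENOMEM;

		oob_md_dst[i]->u.port_info.port_id = i;
	}

	return 0;
}

/* Called from the NAPI poll loop, with the source port parsed out of the
 * RX DMA descriptor by the MAC driver.
 */
static void oob_md_dst_attach(struct sk_buff *skb, unsigned int port)
{
	/* noref is fine here: we run in softirq context and the
	 * metadata_dst lives for as long as the driver does
	 */
	skb_dst_set_noref(skb, &oob_md_dst[port]->dst);
}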



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol
  2022-11-08 12:22       ` Felix Fietkau
@ 2022-11-15  9:29         ` Maxime Chevallier
  2022-11-15 11:50           ` Vladimir Oltean
  0 siblings, 1 reply; 19+ messages in thread
From: Maxime Chevallier @ 2022-11-15  9:29 UTC (permalink / raw)
  To: Felix Fietkau
  Cc: Jakub Kicinski, davem, Rob Herring, Krzysztof Kozlowski,
	Eric Dumazet, Paolo Abeni, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Vladimir Oltean, Luka Perkov,
	Robert Marko, Andy Gross, Bjorn Andersson, Konrad Dybcio

Hello everyone,

Felix, thanks for the feedback !

On Tue, 8 Nov 2022 13:22:17 +0100
Felix Fietkau <nbd@nbd.name> wrote:

[...]

> FYI, I'm currently working on hardware DSA untagging on the mediatek
> mtk_eth_soc driver. On this hardware, I definitely need to keep the
> custom DSA tag driver, as hardware untagging is not always available.
> For the receive side, I came up with this patch (still untested) for
> using METADATA_HW_PORT_MUX.
> It has the advantage of being able to skip the tag protocol rcv ops
> call for offload-enabled packets.
> 
> Maybe for the transmit side we could have some kind of netdev feature
> or capability that indicates offload support and allows skipping the
> tag xmit function as well.
> In that case, ipqess could simply use a no-op tag driver.

If I'm not mistaken, Florian also proposed a while ago an offload
mechanism for tagging/untagging:

https://lore.kernel.org/lkml/1438322920.20182.144.camel@edumazet-glaptop2.roam.corp.google.com/T/

It uses some of the points you're mentioning, such as the netdev
feature :)

All in all, I'm still a bit confused about the next steps. If I can
summarize a bit, we have a lot of approaches, all with advantages and
drawbacks. I'll try to summarize the state:

 - We could simply use the skb extensions as-is, rename the tagger
   something like "DSA_TAG_IPQDMA" and consider this a way to perform
   tagging on this specific class of hardware, without trying too hard
   to make it generic.

 - We could try to move forward with this mechanism of offloading
   tagging and untagging from the MAC driver, this would address
   Florian's first try at this, Felix's use-case and would fit well the
   IPQESS case

 - There's the option discussed by Vlad and Jakub to add several
   frontends, one being a switchdev driver, here I'm a bit lost TBH, if
   we go this way I could definitely use a few pointers from Vlad :)

When looking at it from this point of view, option 2 looks pretty
promising, but I would like to make sure we're all on the same page at
this point. On my side, I've tried several approaches for this tagging
and so far none have been acceptable, for good reasons. I would like to
make sure that I grasp the full picture and didn't miss other possible
ways of addressing this.

Thanks everyone for your help !

Maxime


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol
  2022-11-15  9:29         ` Maxime Chevallier
@ 2022-11-15 11:50           ` Vladimir Oltean
  2023-05-23 12:34             ` Maxime Chevallier
  0 siblings, 1 reply; 19+ messages in thread
From: Vladimir Oltean @ 2022-11-15 11:50 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: Felix Fietkau, Jakub Kicinski, davem, Rob Herring,
	Krzysztof Kozlowski, Eric Dumazet, Paolo Abeni, netdev,
	linux-kernel, devicetree, thomas.petazzoni, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit, Russell King,
	linux-arm-kernel, Luka Perkov, Robert Marko, Andy Gross,
	Bjorn Andersson, Konrad Dybcio

On Tue, Nov 15, 2022 at 10:29:24AM +0100, Maxime Chevallier wrote:
> Hello everyone,
> 
> Felix, thanks for the feedback !
> 
> On Tue, 8 Nov 2022 13:22:17 +0100
> Felix Fietkau <nbd@nbd.name> wrote:
> 
> [...]
> 
> > FYI, I'm currently working on hardware DSA untagging on the mediatek
> > mtk_eth_soc driver. On this hardware, I definitely need to keep the
> > custom DSA tag driver, as hardware untagging is not always available.
> > For the receive side, I came up with this patch (still untested) for
> > using METADATA_HW_PORT_MUX.
> > It has the advantage of being able to skip the tag protocol rcv ops
> > call for offload-enabled packets.
> > 
> > Maybe for the transmit side we could have some kind of netdev feature
> > or capability that indicates offload support and allows skipping the
> > tag xmit function as well.
> > In that case, ipqess could simply use a no-op tag driver.
> 
> If I'm not mistaken, Florian also proposed a while ago an offload
> mechanism for tagging/untagging:
> 
> https://lore.kernel.org/lkml/1438322920.20182.144.camel@edumazet-glaptop2.roam.corp.google.com/T/
> 
> It uses some of the points you're mentioning, such as the netdev
> feature :)
> 
> All in all, I'm still a bit confused about the next steps. If I can
> summarize a bit, we have a lot of approaches, all with advantages and
> drawbacks. I'll try to summarize the state:
> 
>  - We could simply use the skb extensions as-is, rename the tagger
>    something like "DSA_TAG_IPQDMA" and consider this a way to perform
>    tagging on this specific class of hardware, without trying too hard
>    to make it generic.

For Felix, using skb extensions would be inconvenient, since it would
involve per packet allocations which are now avoided with the metadata
dsts.

>  - We could try to move forward with this mechanism of offloading
>    tagging and untagging from the MAC driver, this would address
>    Florian's first try at this, Felix's use-case and would fit well the
>    IPQESS case

Someone would need to take things from where Felix left them:
https://patchwork.kernel.org/project/netdevbpf/patch/20221114124214.58199-2-nbd@nbd.name/
and add TX tag offloading support as well. Here there would need to be
a mechanism through which DSA asks "hey, this is my tagging protocol,
can the master offload it in the TX direction or am I just going to push
the tag into the packet?". I tried to sketch here something along those
lines:
https://patchwork.kernel.org/project/netdevbpf/patch/20221109163426.76164-10-nbd@nbd.name/#25084481
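
To make the shape of that concrete, the TX side could end up looking
roughly like the sketch below. Everything here is invented for
illustration: dsa_master_offloads_tag() and dsa_oob_set_port() do not
exist, they just stand in for the "can the master offload my tagging
protocol?" negotiation:

static netdev_tx_t dsa_slave_xmit_sketch(struct sk_buff *skb,
					 struct net_device *dev)
{
	struct dsa_port *dp = dsa_slave_to_port(dev);
	struct sk_buff *nskb;

	if (dsa_master_offloads_tag(dp->cpu_dp->master,
				    dp->cpu_dp->tag_ops)) {
		/* the master conveys dp->index out of band (DMA
		 * descriptor, metadata_dst, ...), no tag in the frame
		 */
		dsa_oob_set_port(skb, dp->index);
		nskb = skb;
	} else {
		/* classic path: push the tag into the packet in software */
		nskb = dp->cpu_dp->tag_ops->xmit(skb, dev);
	}

	if (!nskb) {
		kfree_skb(skb);
		return NETDEV_TX_OK;
	}

	return dsa_enqueue_skb(nskb, dev);
}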

>  - There's the option discussed by Vlad and Jakub to add several
>    frontends, one being a switchdev driver, here I'm a bit lost TBH, if
>    we go this way I could definitely use a few pointers from Vlad :)

The assumption here being that there is more functionality to cover by
the metadata dst than a port mux. I'm really not clear on what the
hardware design truly is; hopefully you could give more details about that.

The mechanism is quite simple, it's not rocket science. Take something
like a bridge join operation, the proposal is to do something like this:

    dsa_slave_netdevice_event
        (net/dsa/slave.c)
               |
               v
      dsa_slave_changeupper
       (net/dsa/slave.c)
               |
               v
       dsa_port_bridge_join                         ocelot_netdevice_event
        (net/dsa/port.c)                  (drivers/net/ethernet/mscc/ocelot_net.c)
               |                                           |
               v                                           v
     dsa_switch_bridge_join                     ocelot_netdevice_changeupper
       (net/dsa/switch.c)                 (drivers/net/ethernet/mscc/ocelot_net.c)
               |                                           |
               v                                           v
       felix_bridge_join                        ocelot_netdevice_bridge_join
(drivers/net/dsa/ocelot/felix.c)          (drivers/net/ethernet/mscc/ocelot_net.c)
               |                                           |
               |                                           |
               +---------------------+---------------------+
                                     |
                                     v
                           ocelot_port_bridge_join
                      (drivers/net/ethernet/mscc/ocelot.c)

with you maintaining the entire right branch that represents the switchdev frontend,
and more or less duplicates part of DSA.

The advantage of this approach is that you can register your own NAPI
handler where you can treat packets in whichever way you like, and have
your own ndo_start_xmit. This driver would treat the aggregate of the
ess DMA engine and the ipq switch as a single device, and expose it as a
switch with DMA, basically.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol
  2022-11-15 11:50           ` Vladimir Oltean
@ 2023-05-23 12:34             ` Maxime Chevallier
  0 siblings, 0 replies; 19+ messages in thread
From: Maxime Chevallier @ 2023-05-23 12:34 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Felix Fietkau, Jakub Kicinski, davem, Rob Herring,
	Krzysztof Kozlowski, Eric Dumazet, Paolo Abeni, netdev,
	linux-kernel, devicetree, thomas.petazzoni, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit, Russell King,
	linux-arm-kernel, Luka Perkov, Robert Marko, Andy Gross,
	Bjorn Andersson, Konrad Dybcio, romain.gantois

Hello everyone,

I'm digging this topic up, as we'd like to move forward with the
upstreaming of this, and before trying any new approach, I'd like to
see if we can settle on one of the two choices that were expressed so
far.

To summarize the issue, this hardware platform (IPQ4019 from Qualcomm)
uses an internal switch that's a modified QCA8K, for which there
already is a DSA driver. On that platform, there's a MAC (ipqess)
connected to the switch, which passes the dst/src port id through the
DMA descriptor, whereas a typical DSA switch would pass that
information in the frame itself.

There have been a few approaches to try and reuse DSA as-is with a
custom tagger, but all of them eventually got rejected, for a good
reason.

Two approaches remain, as discussed in that thread (hence the
top-posting, sorry about that): either implementing DSA tagging
offload support in RX/TX, or having a DSA frontend for the switch (the
current QCA8K driver) plus a switchdev frontend, reusing the qca8k
logic with the ESS driver handling transfers for the CPU port.

As both approaches make sense but are quite opposed, I'd like to make
sure we go in the right direction. The switchdev approach definitely
makes a lot of sense, but DSA tagging offload has been under
discussion for quite a while, starting with Florian's series, followed
by Felix's. This could also be a good occasion to move forward with
it, and it would only involve a minimal rework of the current ipqess
driver.

Any pointer would help,

Thanks everyone,

Maxime

On Tue, 15 Nov 2022 11:50:23 +0000
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> On Tue, Nov 15, 2022 at 10:29:24AM +0100, Maxime Chevallier wrote:
> > Hello everyone,
> > 
> > Felix, thanks for the feedback !
> > 
> > On Tue, 8 Nov 2022 13:22:17 +0100
> > Felix Fietkau <nbd@nbd.name> wrote:
> > 
> > [...]
> >   
> > > FYI, I'm currently working on hardware DSA untagging on the
> > > mediatek mtk_eth_soc driver. On this hardware, I definitely need
> > > to keep the custom DSA tag driver, as hardware untagging is not
> > > always available. For the receive side, I came up with this patch
> > > (still untested) for using METADATA_HW_PORT_MUX.
> > > It has the advantage of being able to skip the tag protocol rcv
> > > ops call for offload-enabled packets.
> > > 
> > > Maybe for the transmit side we could have some kind of netdev
> > > feature or capability that indicates offload support and allows
> > > skipping the tag xmit function as well.
> > > In that case, ipqess could simply use a no-op tag driver.  
> > 
> > If I'm not mistaken, Florian also proposed a while ago an offload
> > mechanism for tagging/untagging:
> > 
> > https://lore.kernel.org/lkml/1438322920.20182.144.camel@edumazet-glaptop2.roam.corp.google.com/T/
> > 
> > It uses some of the points you're mentioning, such as the netdev
> > feature :)
> > 
> > All in all, I'm still a bit confused about the next steps. If I can
> > summarize a bit, we have a lot of approaches, all with advantages
> > and drawbacks. I'll try to summarize the state:
> > 
> >  - We could simply use the skb extensions as-is, rename the tagger
> >    something like "DSA_TAG_IPQDMA" and consider this a way to
> > perform tagging on this specific class of hardware, without trying
> > too hard to make it generic.  
> 
> For Felix, using skb extensions would be inconvenient, since it would
> involve per packet allocations which are now avoided with the metadata
> dsts.
> 
> >  - We could try to move forward with this mechanism of offloading
> >    tagging and untagging from the MAC driver, this would address
> >    Florian's first try at this, Felix's use-case and would fit well
> > the IPQESS case  
> 
> Someone would need to take things from where Felix left them:
> https://patchwork.kernel.org/project/netdevbpf/patch/20221114124214.58199-2-nbd@nbd.name/
> and add TX tag offloading support as well. Here there would need to be
> a mechanism through which DSA asks "hey, this is my tagging protocol,
> can the master offload it in the TX direction or am I just going to
> push the tag into the packet?". I tried to sketch here something
> along those lines:
> https://patchwork.kernel.org/project/netdevbpf/patch/20221109163426.76164-10-nbd@nbd.name/#25084481
> 
> >  - There's the option discussed by Vlad and Jakub to add several
> >    frontends, one being a switchdev driver, here I'm a bit lost TBH,
> > if we go this way I could definitely use a few pointers from Vlad
> > :)  
> 
> The assumption here being that there is more functionality to cover by
> the metadata dst than a port mux. I'm really not clear on what the
> hardware design truly is; hopefully you could give more details about
> that.

TBH the documentation I have is pretty limited, I don't actually know
what else can go in the metadata attached to the descriptor :(

> The mechanism is quite simple, it's not rocket science. Take something
> like a bridge join operation, the proposal is to do something like
> this:
> 
>     dsa_slave_netdevice_event
>         (net/dsa/slave.c)
>                |
>                v
>       dsa_slave_changeupper
>        (net/dsa/slave.c)
>                |
>                v
>        dsa_port_bridge_join                         ocelot_netdevice_event
>         (net/dsa/port.c)                  (drivers/net/ethernet/mscc/ocelot_net.c)
>                |                                           |
>                v                                           v
>      dsa_switch_bridge_join                     ocelot_netdevice_changeupper
>        (net/dsa/switch.c)                 (drivers/net/ethernet/mscc/ocelot_net.c)
>                |                                           |
>                v                                           v
>        felix_bridge_join                        ocelot_netdevice_bridge_join
> (drivers/net/dsa/ocelot/felix.c)          (drivers/net/ethernet/mscc/ocelot_net.c)
>                |                                           |
>                |                                           |
>                +---------------------+---------------------+
>                                      |
>                                      v
>                            ocelot_port_bridge_join
>                       (drivers/net/ethernet/mscc/ocelot.c)
> 
> with you maintaining the entire right branch that represents the
> switchdev frontend, and more or less duplicates part of DSA.
> 
> The advantage of this approach is that you can register your own NAPI
> handler where you can treat packets in whichever way you like, and
> have your own ndo_start_xmit. This driver would treat the aggregate
> of the ess DMA engine and the ipq switch as a single device, and
> expose it as a switch with DMA, basically.


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-05-23 12:35 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-04 17:41 [PATCH net-next v8 0/5] net: ipqess: introduce Qualcomm IPQESS driver Maxime Chevallier
2022-11-04 17:41 ` [PATCH net-next v8 1/5] net: dt-bindings: Introduce the Qualcomm IPQESS Ethernet controller Maxime Chevallier
2022-11-04 17:41 ` [PATCH net-next v8 2/5] net: ipqess: introduce the Qualcomm IPQESS driver Maxime Chevallier
2022-11-04 17:41 ` [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol Maxime Chevallier
2022-11-05  3:05   ` Jakub Kicinski
2022-11-07  8:39     ` Maxime Chevallier
2022-11-07 16:25       ` Jakub Kicinski
2022-11-08 12:22       ` Felix Fietkau
2022-11-15  9:29         ` Maxime Chevallier
2022-11-15 11:50           ` Vladimir Oltean
2023-05-23 12:34             ` Maxime Chevallier
2022-11-07 11:27     ` Vladimir Oltean
2022-11-07 12:51       ` Vladimir Oltean
     [not found]         ` <20221107084934.157becba@kernel.org>
2022-11-07 17:04           ` Vladimir Oltean
     [not found]       ` <20221107084535.61317862@kernel.org>
2022-11-07 17:28         ` Vladimir Oltean
     [not found]           ` <20221107102440.1aecdbdb@kernel.org>
2022-11-07 18:40             ` Florian Fainelli
2022-11-07 20:07             ` Vladimir Oltean
2022-11-04 17:41 ` [PATCH net-next v8 4/5] net: ipqess: Add out-of-band DSA tagging support Maxime Chevallier
2022-11-04 17:41 ` [PATCH net-next v8 5/5] ARM: dts: qcom: ipq4019: Add description for the IPQESS Ethernet controller Maxime Chevallier
