netdev.vger.kernel.org archive mirror
* [PATCH net-next v2 0/5] net: ipqess: introduce Qualcomm IPQESS driver
@ 2022-05-14 15:06 Maxime Chevallier
  2022-05-14 15:06 ` [PATCH net-next v2 1/5] net: ipqess: introduce the " Maxime Chevallier
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Maxime Chevallier @ 2022-05-14 15:06 UTC (permalink / raw)
  To: davem, Rob Herring
  Cc: Maxime Chevallier, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Vladimir Oltean, Luka Perkov,
	Robert Marko

Hello everyone,

This is the 2nd iteration of a series that introduces a new driver for
the Qualcomm IPQESS Ethernet Controller, found on the IPQ4019.

Notable changes in V2:
 - Put the DSA tag in the skb itself instead of using skb->shinfo
 - Fixed the initialisation sequence based on Andrew's comments
 - Reworked the error paths in the init sequence
 - Added support for the clock and reset lines on that controller
 - Fixed and updated the binding

The driver itself is pretty straightforward, but has lived out-of-tree
for a while. I've done my best to clean up outdated API calls, but
some might remain.

This controller is somewhat special, since it's part of the IPQ4019 SoC
which also includes a QCA8K switch, and uses the IPQESS controller for
the CPU port. The switch is so tightly integrated with the MAC that it
is connected to the MAC using an internal link (hence the fact that we
only support PHY_INTERFACE_MODE_INTERNAL), and this has some
consequences on the DSA side.

The tagging for the switch isn't done in-band as most switches do, but
out-of-band, the DSA tag being included in the DMA descriptor.

This series includes a new out-of-band tagger that uses the skb headroom
to convey the tag between the tagger and the MAC driver.
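
As a rough illustration of the idea (not the code from this series), the
hand-off through the headroom can be modelled in plain userspace C. All
names below (struct oob_tag, struct pkt, struct tx_desc, tagger_xmit,
mac_xmit) are hypothetical, and the descriptor layout is invented for the
example: the tagger prepends a small tag in front of the frame, and the
MAC driver strips it again and copies the port into its DMA descriptor.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical out-of-band tag: only the destination switch port. */
struct oob_tag {
	uint16_t port;
};

/* Toy packet buffer: 'data' points into 'storage', leaving headroom
 * in front of the frame, similar in spirit to an skb.
 */
struct pkt {
	uint8_t storage[256];
	uint8_t *data;
	size_t len;
};

/* Hypothetical DMA descriptor with a field for the destination port. */
struct tx_desc {
	uint32_t addr;
	uint16_t len;
	uint16_t dst_port;
};

/* Place 'payload' in the buffer, reserving 'headroom' bytes before it. */
static void pkt_init(struct pkt *p, const void *payload, size_t len,
		     size_t headroom)
{
	p->data = p->storage + headroom;
	memcpy(p->data, payload, len);
	p->len = len;
}

/* Tagger side: push the tag into the headroom. Returns 0 on success,
 * -1 if there is not enough headroom left.
 */
static int tagger_xmit(struct pkt *p, uint16_t port)
{
	struct oob_tag tag = { .port = port };

	if ((size_t)(p->data - p->storage) < sizeof(tag))
		return -1;

	p->data -= sizeof(tag);
	p->len += sizeof(tag);
	memcpy(p->data, &tag, sizeof(tag));

	return 0;
}

/* MAC driver side: pull the tag back out, so that the frame on the wire
 * is untagged, and carry the port in the descriptor instead.
 */
static void mac_xmit(struct pkt *p, struct tx_desc *desc)
{
	struct oob_tag tag;

	memcpy(&tag, p->data, sizeof(tag));
	p->data += sizeof(tag);
	p->len -= sizeof(tag);

	desc->dst_port = tag.port;
	desc->len = (uint16_t)p->len;
}
```

The point of the model is that the frame handed to the hardware never
contains the tag; only the descriptor does.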

Thanks to the Sartura folks who worked on a base version of this driver,
and provided test hardware.

Best regards,

Maxime Chevallier

Maxime Chevallier (5):
  net: ipqess: introduce the Qualcomm IPQESS driver
  net: dsa: add out-of-band tagging protocol
  net: ipqess: Add out-of-band DSA tagging support
  net: dt-bindings: Introduce the Qualcomm IPQESS Ethernet controller
  ARM: dts: qcom: ipq4019: Add description for the IPQESS Ethernet
    controller

 .../devicetree/bindings/net/qcom,ipqess.yaml  |  104 ++
 MAINTAINERS                                   |    6 +
 arch/arm/boot/dts/qcom-ipq4019.dtsi           |   46 +
 drivers/net/ethernet/qualcomm/Kconfig         |   12 +
 drivers/net/ethernet/qualcomm/Makefile        |    2 +
 drivers/net/ethernet/qualcomm/ipqess/Makefile |    8 +
 drivers/net/ethernet/qualcomm/ipqess/ipqess.c | 1296 +++++++++++++++++
 drivers/net/ethernet/qualcomm/ipqess/ipqess.h |  518 +++++++
 .../ethernet/qualcomm/ipqess/ipqess_ethtool.c |  168 +++
 include/linux/dsa/oob.h                       |   17 +
 include/net/dsa.h                             |    2 +
 net/dsa/Kconfig                               |    7 +
 net/dsa/Makefile                              |    1 +
 net/dsa/tag_oob.c                             |   84 ++
 14 files changed, 2271 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/qcom,ipqess.yaml
 create mode 100644 drivers/net/ethernet/qualcomm/ipqess/Makefile
 create mode 100644 drivers/net/ethernet/qualcomm/ipqess/ipqess.c
 create mode 100644 drivers/net/ethernet/qualcomm/ipqess/ipqess.h
 create mode 100644 drivers/net/ethernet/qualcomm/ipqess/ipqess_ethtool.c
 create mode 100644 include/linux/dsa/oob.h
 create mode 100644 net/dsa/tag_oob.c

-- 
2.36.1



* [PATCH net-next v2 1/5] net: ipqess: introduce the Qualcomm IPQESS driver
  2022-05-14 15:06 [PATCH net-next v2 0/5] net: ipqess: introduce Qualcomm IPQESS driver Maxime Chevallier
@ 2022-05-14 15:06 ` Maxime Chevallier
  2022-05-14 17:18   ` Russell King (Oracle)
                     ` (3 more replies)
  2022-05-14 15:06 ` [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol Maxime Chevallier
                   ` (3 subsequent siblings)
  4 siblings, 4 replies; 24+ messages in thread
From: Maxime Chevallier @ 2022-05-14 15:06 UTC (permalink / raw)
  To: davem, Rob Herring
  Cc: Maxime Chevallier, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Vladimir Oltean, Luka Perkov,
	Robert Marko

The Qualcomm IPQESS controller is a simple 1G Ethernet controller found
on the IPQ4019 chip. This controller is somewhat unusual, in that the
IPQ4019 platform that includes it also has an internal switch, based on
the QCA8K IP.

It is connected to that switch through an internal link, and doesn't
directly expose any external interface, hence it only supports
PHY_INTERFACE_MODE_INTERNAL for now.

It has 16 RX and TX queues, with a very basic RSS fanout configured at
init time.
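
For readers who want the ring arithmetic at a glance, here is a small
userspace sketch of how the driver below spreads its netdev queues over
the hardware queues (i * 4 for TX, i * 2 for RX) and how it walks a
power-of-two ring. NETDEV_QUEUES and RING_SIZE are assumed values chosen
for the example; the real constants live in ipqess.h. The helper names
are hypothetical.

```c
#include <stdint.h>

#define NETDEV_QUEUES	4	/* assumption, see ipqess.h for the real value */
#define RING_SIZE	128	/* assumption, must be a power of two */

/* Same shape as IPQESS_NEXT_IDX: wrap-around increment that relies on
 * the ring size being a power of two.
 */
#define NEXT_IDX(X, Y)	(((X) + 1) & ((Y) - 1))

/* Hardware queue used by netdev queue i, as done at ring allocation. */
static unsigned int tx_hw_queue(unsigned int i)
{
	return i * 4;
}

static unsigned int rx_hw_queue(unsigned int i)
{
	return i * 2;
}

/* Free TX descriptors, mirroring ipqess_tx_desc_available(): one slot
 * is always kept unused so that head == tail means "empty".
 */
static uint16_t tx_desc_available(uint16_t head, uint16_t tail)
{
	uint16_t count = 0;

	if (tail <= head)
		count = RING_SIZE;

	count += tail - head - 1;

	return count;
}
```

With these assumed values, an empty ring reports RING_SIZE - 1 free
descriptors, and the last netdev queue maps to TX hardware queue 12 and
RX hardware queue 6.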

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---
V1->V2 :
 - Reworked the init sequence, following Andrew's comments
 - Added clock and reset support
 - Reworked the error paths
 - Added extra endianness wrappers to fix sparse warnings

 MAINTAINERS                                   |    6 +
 drivers/net/ethernet/qualcomm/Kconfig         |   11 +
 drivers/net/ethernet/qualcomm/Makefile        |    2 +
 drivers/net/ethernet/qualcomm/ipqess/Makefile |    8 +
 drivers/net/ethernet/qualcomm/ipqess/ipqess.c | 1269 +++++++++++++++++
 drivers/net/ethernet/qualcomm/ipqess/ipqess.h |  518 +++++++
 .../ethernet/qualcomm/ipqess/ipqess_ethtool.c |  168 +++
 7 files changed, 1982 insertions(+)
 create mode 100644 drivers/net/ethernet/qualcomm/ipqess/Makefile
 create mode 100644 drivers/net/ethernet/qualcomm/ipqess/ipqess.c
 create mode 100644 drivers/net/ethernet/qualcomm/ipqess/ipqess.h
 create mode 100644 drivers/net/ethernet/qualcomm/ipqess/ipqess_ethtool.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 9b0480f1b153..29e6ec4f975a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16308,6 +16308,12 @@ L:	netdev@vger.kernel.org
 S:	Maintained
 F:	drivers/net/ethernet/qualcomm/emac/
 
+QUALCOMM IPQESS ETHERNET DRIVER
+M:	Maxime Chevallier <maxime.chevallier@bootlin.com>
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	drivers/net/ethernet/qualcomm/ipqess/
+
 QUALCOMM ETHQOS ETHERNET DRIVER
 M:	Vinod Koul <vkoul@kernel.org>
 L:	netdev@vger.kernel.org
diff --git a/drivers/net/ethernet/qualcomm/Kconfig b/drivers/net/ethernet/qualcomm/Kconfig
index a4434eb38950..a723ddbea248 100644
--- a/drivers/net/ethernet/qualcomm/Kconfig
+++ b/drivers/net/ethernet/qualcomm/Kconfig
@@ -60,6 +60,17 @@ config QCOM_EMAC
 	  low power, Receive-Side Scaling (RSS), and IEEE 1588-2008
 	  Precision Clock Synchronization Protocol.
 
+config QCOM_IPQ4019_ESS_EDMA
+	tristate "Qualcomm Atheros IPQ4019 ESS EDMA support"
+	depends on OF
+	select PHYLINK
+	help
+	  This driver supports the Qualcomm Atheros IPQ40xx built-in
+	  ESS EDMA ethernet controller.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called ipqess.
+
 source "drivers/net/ethernet/qualcomm/rmnet/Kconfig"
 
 endif # NET_VENDOR_QUALCOMM
diff --git a/drivers/net/ethernet/qualcomm/Makefile b/drivers/net/ethernet/qualcomm/Makefile
index 9250976dd884..db463c9ea1f9 100644
--- a/drivers/net/ethernet/qualcomm/Makefile
+++ b/drivers/net/ethernet/qualcomm/Makefile
@@ -11,4 +11,6 @@ qcauart-objs := qca_uart.o
 
 obj-y += emac/
 
+obj-$(CONFIG_QCOM_IPQ4019_ESS_EDMA) += ipqess/
+
 obj-$(CONFIG_RMNET) += rmnet/
diff --git a/drivers/net/ethernet/qualcomm/ipqess/Makefile b/drivers/net/ethernet/qualcomm/ipqess/Makefile
new file mode 100644
index 000000000000..4f2db7283ebf
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/ipqess/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the IPQ ESS driver
+#
+
+obj-$(CONFIG_QCOM_IPQ4019_ESS_EDMA) += ipq_ess.o
+
+ipq_ess-objs := ipqess.o ipqess_ethtool.o
diff --git a/drivers/net/ethernet/qualcomm/ipqess/ipqess.c b/drivers/net/ethernet/qualcomm/ipqess/ipqess.c
new file mode 100644
index 000000000000..b11f11f23c11
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/ipqess/ipqess.c
@@ -0,0 +1,1269 @@
+// SPDX-License-Identifier: GPL-2.0 OR ISC
+/* Copyright (c) 2014 - 2017, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2017 - 2018, John Crispin <john@phrozen.org>
+ * Copyright (c) 2018 - 2019, Christian Lamparter <chunkeey@gmail.com>
+ * Copyright (c) 2020 - 2021, Gabor Juhos <j4g8y7@gmail.com>
+ * Copyright (c) 2021 - 2022, Maxime Chevallier <maxime.chevallier@bootlin.com>
+ *
+ */
+
+#include <linux/bitfield.h>
+#include <linux/clk.h>
+#include <linux/if_vlan.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/phylink.h>
+#include <linux/platform_device.h>
+#include <linux/reset.h>
+#include <linux/skbuff.h>
+#include <linux/vmalloc.h>
+#include <net/checksum.h>
+#include <net/ip6_checksum.h>
+
+#include "ipqess.h"
+
+#define IPQESS_RRD_SIZE		16
+#define IPQESS_NEXT_IDX(X, Y)  (((X) + 1) & ((Y) - 1))
+#define IPQESS_TX_DMA_BUF_LEN	0x3fff
+
+static void ipqess_w32(struct ipqess *ess, u32 reg, u32 val)
+{
+	writel(val, ess->hw_addr + reg);
+}
+
+static u32 ipqess_r32(struct ipqess *ess, u16 reg)
+{
+	return readl(ess->hw_addr + reg);
+}
+
+static void ipqess_m32(struct ipqess *ess, u32 mask, u32 val, u16 reg)
+{
+	u32 _val = ipqess_r32(ess, reg);
+
+	_val &= ~mask;
+	_val |= val;
+
+	ipqess_w32(ess, reg, _val);
+}
+
+void ipqess_update_hw_stats(struct ipqess *ess)
+{
+	u32 *p;
+	u32 stat;
+	int i;
+
+	lockdep_assert_held(&ess->stats_lock);
+
+	p = (u32 *)&ess->ipqess_stats;
+	for (i = 0; i < IPQESS_MAX_TX_QUEUE; i++) {
+		stat = ipqess_r32(ess, IPQESS_REG_TX_STAT_PKT_Q(i));
+		*p += stat;
+		p++;
+	}
+
+	for (i = 0; i < IPQESS_MAX_TX_QUEUE; i++) {
+		stat = ipqess_r32(ess, IPQESS_REG_TX_STAT_BYTE_Q(i));
+		*p += stat;
+		p++;
+	}
+
+	for (i = 0; i < IPQESS_MAX_RX_QUEUE; i++) {
+		stat = ipqess_r32(ess, IPQESS_REG_RX_STAT_PKT_Q(i));
+		*p += stat;
+		p++;
+	}
+
+	for (i = 0; i < IPQESS_MAX_RX_QUEUE; i++) {
+		stat = ipqess_r32(ess, IPQESS_REG_RX_STAT_BYTE_Q(i));
+		*p += stat;
+		p++;
+	}
+}
+
+static int ipqess_tx_ring_alloc(struct ipqess *ess)
+{
+	struct device *dev = &ess->pdev->dev;
+	int i;
+
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		struct ipqess_tx_ring *tx_ring = &ess->tx_ring[i];
+		size_t size;
+		u32 idx;
+
+		tx_ring->ess = ess;
+		tx_ring->ring_id = i;
+		tx_ring->idx = i * 4;
+		tx_ring->count = IPQESS_TX_RING_SIZE;
+		tx_ring->nq = netdev_get_tx_queue(ess->netdev, i);
+
+		size = sizeof(struct ipqess_buf) * IPQESS_TX_RING_SIZE;
+		tx_ring->buf = devm_kzalloc(dev, size, GFP_KERNEL);
+		if (!tx_ring->buf) {
+			netdev_err(ess->netdev, "buffer alloc of tx ring failed");
+			return -ENOMEM;
+		}
+
+		size = sizeof(struct ipqess_tx_desc) * IPQESS_TX_RING_SIZE;
+		tx_ring->hw_desc = dmam_alloc_coherent(dev, size, &tx_ring->dma,
+						       GFP_KERNEL | __GFP_ZERO);
+		if (!tx_ring->hw_desc) {
+			netdev_err(ess->netdev, "descriptor allocation for tx ring failed");
+			return -ENOMEM;
+		}
+
+		ipqess_w32(ess, IPQESS_REG_TPD_BASE_ADDR_Q(tx_ring->idx),
+			   (u32)tx_ring->dma);
+
+		idx = ipqess_r32(ess, IPQESS_REG_TPD_IDX_Q(tx_ring->idx));
+		idx >>= IPQESS_TPD_CONS_IDX_SHIFT; /* need u32 here */
+		idx &= 0xffff;
+		tx_ring->head = idx;
+		tx_ring->tail = idx;
+
+		ipqess_m32(ess, IPQESS_TPD_PROD_IDX_MASK << IPQESS_TPD_PROD_IDX_SHIFT,
+			   idx, IPQESS_REG_TPD_IDX_Q(tx_ring->idx));
+		ipqess_w32(ess, IPQESS_REG_TX_SW_CONS_IDX_Q(tx_ring->idx), idx);
+		ipqess_w32(ess, IPQESS_REG_TPD_RING_SIZE, IPQESS_TX_RING_SIZE);
+	}
+
+	return 0;
+}
+
+static int ipqess_tx_unmap_and_free(struct device *dev, struct ipqess_buf *buf)
+{
+	int len = 0;
+
+	if (buf->flags & IPQESS_DESC_SINGLE)
+		dma_unmap_single(dev, buf->dma,	buf->length, DMA_TO_DEVICE);
+	else if (buf->flags & IPQESS_DESC_PAGE)
+		dma_unmap_page(dev, buf->dma, buf->length, DMA_TO_DEVICE);
+
+	if (buf->flags & IPQESS_DESC_LAST) {
+		len = buf->skb->len;
+		dev_kfree_skb_any(buf->skb);
+	}
+
+	buf->flags = 0;
+
+	return len;
+}
+
+static void ipqess_tx_ring_free(struct ipqess *ess)
+{
+	int i;
+
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		int j;
+
+		if (ess->tx_ring[i].hw_desc)
+			continue;
+
+		for (j = 0; j < IPQESS_TX_RING_SIZE; j++) {
+			struct ipqess_buf *buf = &ess->tx_ring[i].buf[j];
+
+			ipqess_tx_unmap_and_free(&ess->pdev->dev, buf);
+		}
+
+		ess->tx_ring[i].buf = NULL;
+	}
+}
+
+static int ipqess_rx_buf_prepare(struct ipqess_buf *buf,
+				 struct ipqess_rx_ring *rx_ring)
+{
+	memset(buf->skb->data, 0, sizeof(struct ipqess_rx_desc));
+
+	buf->dma = dma_map_single(rx_ring->ppdev, buf->skb->data,
+				  IPQESS_RX_HEAD_BUFF_SIZE, DMA_FROM_DEVICE);
+	if (dma_mapping_error(rx_ring->ppdev, buf->dma)) {
+		dev_err_once(rx_ring->ppdev,
+			     "IPQESS DMA mapping failed for linear address %x",
+			     buf->dma);
+		dev_kfree_skb_any(buf->skb);
+		buf->skb = NULL;
+		return -EFAULT;
+	}
+
+	buf->length = IPQESS_RX_HEAD_BUFF_SIZE;
+	rx_ring->hw_desc[rx_ring->head] = (struct ipqess_rx_desc *)buf->dma;
+	rx_ring->head = (rx_ring->head + 1) % IPQESS_RX_RING_SIZE;
+
+	ipqess_m32(rx_ring->ess, IPQESS_RFD_PROD_IDX_BITS,
+		   (rx_ring->head + IPQESS_RX_RING_SIZE - 1) % IPQESS_RX_RING_SIZE,
+		   IPQESS_REG_RFD_IDX_Q(rx_ring->idx));
+
+	return 0;
+}
+
+/* locking is handled by the caller */
+static int ipqess_rx_buf_alloc_napi(struct ipqess_rx_ring *rx_ring)
+{
+	struct ipqess_buf *buf = &rx_ring->buf[rx_ring->head];
+
+	buf->skb = napi_alloc_skb(&rx_ring->napi_rx, IPQESS_RX_HEAD_BUFF_SIZE);
+	if (!buf->skb)
+		return -ENOMEM;
+
+	return ipqess_rx_buf_prepare(buf, rx_ring);
+}
+
+static int ipqess_rx_buf_alloc(struct ipqess_rx_ring *rx_ring)
+{
+	struct ipqess_buf *buf = &rx_ring->buf[rx_ring->head];
+
+	buf->skb = netdev_alloc_skb_ip_align(rx_ring->ess->netdev,
+					     IPQESS_RX_HEAD_BUFF_SIZE);
+
+	if (!buf->skb)
+		return -ENOMEM;
+
+	return ipqess_rx_buf_prepare(buf, rx_ring);
+}
+
+static void ipqess_refill_work(struct work_struct *work)
+{
+	struct ipqess_rx_ring_refill *rx_refill = container_of(work,
+		struct ipqess_rx_ring_refill, refill_work);
+	struct ipqess_rx_ring *rx_ring = rx_refill->rx_ring;
+	int refill = 0;
+
+	/* don't let this loop by accident. */
+	while (atomic_dec_and_test(&rx_ring->refill_count)) {
+		napi_disable(&rx_ring->napi_rx);
+		if (ipqess_rx_buf_alloc(rx_ring)) {
+			refill++;
+			dev_dbg(rx_ring->ppdev,
+				"Not all buffers were reallocated");
+		}
+		napi_enable(&rx_ring->napi_rx);
+	}
+
+	if (atomic_add_return(refill, &rx_ring->refill_count))
+		schedule_work(&rx_refill->refill_work);
+}
+
+static int ipqess_rx_ring_alloc(struct ipqess *ess)
+{
+	int i;
+
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		int j;
+
+		ess->rx_ring[i].ess = ess;
+		ess->rx_ring[i].ppdev = &ess->pdev->dev;
+		ess->rx_ring[i].ring_id = i;
+		ess->rx_ring[i].idx = i * 2;
+
+		ess->rx_ring[i].buf = devm_kzalloc(&ess->pdev->dev,
+						   sizeof(struct ipqess_buf) * IPQESS_RX_RING_SIZE,
+						   GFP_KERNEL);
+
+		if (!ess->rx_ring[i].buf)
+			return -ENOMEM;
+
+		ess->rx_ring[i].hw_desc =
+			dmam_alloc_coherent(&ess->pdev->dev,
+					    sizeof(struct ipqess_rx_desc) * IPQESS_RX_RING_SIZE,
+					    &ess->rx_ring[i].dma, GFP_KERNEL);
+
+		if (!ess->rx_ring[i].hw_desc)
+			return -ENOMEM;
+
+		for (j = 0; j < IPQESS_RX_RING_SIZE; j++)
+			if (ipqess_rx_buf_alloc(&ess->rx_ring[i]) < 0)
+				return -ENOMEM;
+
+		ess->rx_refill[i].rx_ring = &ess->rx_ring[i];
+		INIT_WORK(&ess->rx_refill[i].refill_work, ipqess_refill_work);
+
+		ipqess_w32(ess, IPQESS_REG_RFD_BASE_ADDR_Q(ess->rx_ring[i].idx),
+			   (u32)(ess->rx_ring[i].dma));
+	}
+
+	ipqess_w32(ess, IPQESS_REG_RX_DESC0,
+		   (IPQESS_RX_HEAD_BUFF_SIZE << IPQESS_RX_BUF_SIZE_SHIFT) |
+		   (IPQESS_RX_RING_SIZE << IPQESS_RFD_RING_SIZE_SHIFT));
+
+	return 0;
+}
+
+static void ipqess_rx_ring_free(struct ipqess *ess)
+{
+	int i;
+
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		int j;
+
+		atomic_set(&ess->rx_ring[i].refill_count, 0);
+		cancel_work_sync(&ess->rx_refill[i].refill_work);
+
+		for (j = 0; j < IPQESS_RX_RING_SIZE; j++) {
+			dma_unmap_single(&ess->pdev->dev,
+					 ess->rx_ring[i].buf[j].dma,
+					 ess->rx_ring[i].buf[j].length,
+					 DMA_FROM_DEVICE);
+			dev_kfree_skb_any(ess->rx_ring[i].buf[j].skb);
+		}
+	}
+}
+
+static struct net_device_stats *ipqess_get_stats(struct net_device *netdev)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+
+	spin_lock(&ess->stats_lock);
+	ipqess_update_hw_stats(ess);
+	spin_unlock(&ess->stats_lock);
+
+	return &ess->stats;
+}
+
+static int ipqess_rx_poll(struct ipqess_rx_ring *rx_ring, int budget)
+{
+	u32 length = 0, num_desc, tail, rx_ring_tail;
+	int done = 0;
+
+	rx_ring_tail = rx_ring->tail;
+
+	tail = ipqess_r32(rx_ring->ess, IPQESS_REG_RFD_IDX_Q(rx_ring->idx));
+	tail >>= IPQESS_RFD_CONS_IDX_SHIFT;
+	tail &= IPQESS_RFD_CONS_IDX_MASK;
+
+	while (done < budget) {
+		struct ipqess_rx_desc *rd;
+		struct sk_buff *skb;
+
+		if (rx_ring_tail == tail)
+			break;
+
+		dma_unmap_single(rx_ring->ppdev,
+				 rx_ring->buf[rx_ring_tail].dma,
+				 rx_ring->buf[rx_ring_tail].length,
+				 DMA_FROM_DEVICE);
+
+		skb = xchg(&rx_ring->buf[rx_ring_tail].skb, NULL);
+		rd = (struct ipqess_rx_desc *)skb->data;
+		rx_ring_tail = IPQESS_NEXT_IDX(rx_ring_tail, IPQESS_RX_RING_SIZE);
+
+		/* Check if RRD is valid */
+		if (!(rd->rrd7 & IPQESS_RRD_DESC_VALID)) {
+			num_desc = 1;
+			dev_kfree_skb_any(skb);
+			goto skip;
+		}
+
+		num_desc = rd->rrd1 & IPQESS_RRD_NUM_RFD_MASK;
+		length = rd->rrd6 & IPQESS_RRD_PKT_SIZE_MASK;
+
+		skb_reserve(skb, IPQESS_RRD_SIZE);
+		if (num_desc > 1) {
+			struct sk_buff *skb_prev = NULL;
+			int size_remaining;
+			int i;
+
+			skb->data_len = 0;
+			skb->tail += (IPQESS_RX_HEAD_BUFF_SIZE - IPQESS_RRD_SIZE);
+			skb->len = length;
+			skb->truesize = length;
+			size_remaining = length - (IPQESS_RX_HEAD_BUFF_SIZE - IPQESS_RRD_SIZE);
+
+			for (i = 1; i < num_desc; i++) {
+				struct sk_buff *skb_temp = rx_ring->buf[rx_ring_tail].skb;
+
+				dma_unmap_single(rx_ring->ppdev,
+						 rx_ring->buf[rx_ring_tail].dma,
+						 rx_ring->buf[rx_ring_tail].length,
+						 DMA_FROM_DEVICE);
+
+				skb_put(skb_temp, min(size_remaining, IPQESS_RX_HEAD_BUFF_SIZE));
+				if (skb_prev)
+					skb_prev->next = rx_ring->buf[rx_ring_tail].skb;
+				else
+					skb_shinfo(skb)->frag_list = rx_ring->buf[rx_ring_tail].skb;
+				skb_prev = rx_ring->buf[rx_ring_tail].skb;
+				rx_ring->buf[rx_ring_tail].skb->next = NULL;
+
+				skb->data_len += rx_ring->buf[rx_ring_tail].skb->len;
+				size_remaining -= rx_ring->buf[rx_ring_tail].skb->len;
+
+				rx_ring_tail = IPQESS_NEXT_IDX(rx_ring_tail, IPQESS_RX_RING_SIZE);
+			}
+
+		} else {
+			skb_put(skb, length);
+		}
+
+		skb->dev = rx_ring->ess->netdev;
+		skb->protocol = eth_type_trans(skb, rx_ring->ess->netdev);
+		skb_record_rx_queue(skb, rx_ring->ring_id);
+
+		if (rd->rrd6 & IPQESS_RRD_CSUM_FAIL_MASK)
+			skb_checksum_none_assert(skb);
+		else
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+		if (rd->rrd7 & IPQESS_RRD_CVLAN)
+			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
+					       rd->rrd4);
+		else if (rd->rrd1 & IPQESS_RRD_SVLAN)
+			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021AD),
+					       rd->rrd4);
+
+		napi_gro_receive(&rx_ring->napi_rx, skb);
+
+		rx_ring->ess->stats.rx_packets++;
+		rx_ring->ess->stats.rx_bytes += length;
+
+		done++;
+skip:
+
+		num_desc += atomic_xchg(&rx_ring->refill_count, 0);
+		while (num_desc) {
+			if (ipqess_rx_buf_alloc_napi(rx_ring)) {
+				num_desc = atomic_add_return(num_desc,
+							     &rx_ring->refill_count);
+				if (num_desc >= ((4 * IPQESS_RX_RING_SIZE + 6) / 7))
+					schedule_work(&rx_ring->ess->rx_refill[rx_ring->ring_id].refill_work);
+				break;
+			}
+			num_desc--;
+		}
+	}
+
+	ipqess_w32(rx_ring->ess, IPQESS_REG_RX_SW_CONS_IDX_Q(rx_ring->idx),
+		   rx_ring_tail);
+	rx_ring->tail = rx_ring_tail;
+
+	return done;
+}
+
+static int ipqess_tx_complete(struct ipqess_tx_ring *tx_ring, int budget)
+{
+	int total = 0, ret;
+	int done = 0;
+	u32 tail;
+
+	tail = ipqess_r32(tx_ring->ess, IPQESS_REG_TPD_IDX_Q(tx_ring->idx));
+	tail >>= IPQESS_TPD_CONS_IDX_SHIFT;
+	tail &= IPQESS_TPD_CONS_IDX_MASK;
+
+	while ((tx_ring->tail != tail) && (done < budget)) {
+		ret = ipqess_tx_unmap_and_free(&tx_ring->ess->pdev->dev,
+					       &tx_ring->buf[tx_ring->tail]);
+		tx_ring->tail = IPQESS_NEXT_IDX(tx_ring->tail, tx_ring->count);
+
+		if (ret) {
+			total += ret;
+			done++;
+		}
+	}
+
+	ipqess_w32(tx_ring->ess, IPQESS_REG_TX_SW_CONS_IDX_Q(tx_ring->idx),
+		   tx_ring->tail);
+
+	if (netif_tx_queue_stopped(tx_ring->nq)) {
+		netdev_dbg(tx_ring->ess->netdev, "waking up tx queue %d\n",
+			   tx_ring->idx);
+		netif_tx_wake_queue(tx_ring->nq);
+	}
+
+	netdev_tx_completed_queue(tx_ring->nq, done, total);
+
+	return done;
+}
+
+static int ipqess_tx_napi(struct napi_struct *napi, int budget)
+{
+	struct ipqess_tx_ring *tx_ring = container_of(napi, struct ipqess_tx_ring,
+						    napi_tx);
+	int work_done = 0;
+	u32 tx_status;
+
+	tx_status = ipqess_r32(tx_ring->ess, IPQESS_REG_TX_ISR);
+	tx_status &= BIT(tx_ring->idx);
+
+	work_done = ipqess_tx_complete(tx_ring, budget);
+
+	ipqess_w32(tx_ring->ess, IPQESS_REG_TX_ISR, tx_status);
+
+	if (likely(work_done < budget)) {
+		if (napi_complete_done(napi, work_done))
+			ipqess_w32(tx_ring->ess,
+				   IPQESS_REG_TX_INT_MASK_Q(tx_ring->idx), 0x1);
+	}
+
+	return work_done;
+}
+
+static int ipqess_rx_napi(struct napi_struct *napi, int budget)
+{
+	struct ipqess_rx_ring *rx_ring = container_of(napi, struct ipqess_rx_ring,
+						    napi_rx);
+	struct ipqess *ess = rx_ring->ess;
+	u32 rx_mask = BIT(rx_ring->idx);
+	int remain_budget = budget;
+	int rx_done;
+	u32 status;
+
+poll_again:
+	ipqess_w32(ess, IPQESS_REG_RX_ISR, rx_mask);
+	rx_done = ipqess_rx_poll(rx_ring, remain_budget);
+
+	if (rx_done == remain_budget)
+		return budget;
+
+	status = ipqess_r32(ess, IPQESS_REG_RX_ISR);
+	if (status & rx_mask) {
+		remain_budget -= rx_done;
+		goto poll_again;
+	}
+
+	if (napi_complete_done(napi, rx_done + budget - remain_budget))
+		ipqess_w32(ess, IPQESS_REG_RX_INT_MASK_Q(rx_ring->idx), 0x1);
+
+	return rx_done + budget - remain_budget;
+}
+
+static irqreturn_t ipqess_interrupt_tx(int irq, void *priv)
+{
+	struct ipqess_tx_ring *tx_ring = (struct ipqess_tx_ring *)priv;
+
+	if (likely(napi_schedule_prep(&tx_ring->napi_tx))) {
+		__napi_schedule(&tx_ring->napi_tx);
+		ipqess_w32(tx_ring->ess, IPQESS_REG_TX_INT_MASK_Q(tx_ring->idx),
+			   0x0);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t ipqess_interrupt_rx(int irq, void *priv)
+{
+	struct ipqess_rx_ring *rx_ring = (struct ipqess_rx_ring *)priv;
+
+	if (likely(napi_schedule_prep(&rx_ring->napi_rx))) {
+		__napi_schedule(&rx_ring->napi_rx);
+		ipqess_w32(rx_ring->ess, IPQESS_REG_RX_INT_MASK_Q(rx_ring->idx),
+			   0x0);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void ipqess_irq_enable(struct ipqess *ess)
+{
+	int i;
+
+	ipqess_w32(ess, IPQESS_REG_RX_ISR, 0xff);
+	ipqess_w32(ess, IPQESS_REG_TX_ISR, 0xffff);
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		ipqess_w32(ess, IPQESS_REG_RX_INT_MASK_Q(ess->rx_ring[i].idx), 1);
+		ipqess_w32(ess, IPQESS_REG_TX_INT_MASK_Q(ess->tx_ring[i].idx), 1);
+	}
+}
+
+static void ipqess_irq_disable(struct ipqess *ess)
+{
+	int i;
+
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		ipqess_w32(ess, IPQESS_REG_RX_INT_MASK_Q(ess->rx_ring[i].idx), 0);
+		ipqess_w32(ess, IPQESS_REG_TX_INT_MASK_Q(ess->tx_ring[i].idx), 0);
+	}
+}
+
+static int __init ipqess_init(struct net_device *netdev)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+	struct device_node *of_node = ess->pdev->dev.of_node;
+	int ret;
+
+	ret = of_get_ethdev_address(of_node, netdev);
+	if (ret)
+		eth_hw_addr_random(netdev);
+
+	return phylink_of_phy_connect(ess->phylink, of_node, 0);
+}
+
+static void ipqess_uninit(struct net_device *netdev)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+
+	phylink_disconnect_phy(ess->phylink);
+}
+
+static int ipqess_open(struct net_device *netdev)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+	int i, err;
+
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		int qid;
+
+		qid = ess->tx_ring[i].idx;
+		err = devm_request_irq(&netdev->dev, ess->tx_irq[qid],
+				       ipqess_interrupt_tx, 0,
+				       ess->tx_irq_names[qid],
+				       &ess->tx_ring[i]);
+		if (err)
+			return err;
+
+		qid = ess->rx_ring[i].idx;
+		err = devm_request_irq(&netdev->dev, ess->rx_irq[qid],
+				       ipqess_interrupt_rx, 0,
+				       ess->rx_irq_names[qid],
+				       &ess->rx_ring[i]);
+		if (err)
+			return err;
+
+		napi_enable(&ess->tx_ring[i].napi_tx);
+		napi_enable(&ess->rx_ring[i].napi_rx);
+	}
+
+	ipqess_irq_enable(ess);
+	phylink_start(ess->phylink);
+	netif_tx_start_all_queues(netdev);
+
+	return 0;
+}
+
+static int ipqess_stop(struct net_device *netdev)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+	int i;
+
+	netif_tx_stop_all_queues(netdev);
+	phylink_stop(ess->phylink);
+	ipqess_irq_disable(ess);
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		napi_disable(&ess->tx_ring[i].napi_tx);
+		napi_disable(&ess->rx_ring[i].napi_rx);
+	}
+
+	return 0;
+}
+
+static int ipqess_do_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+
+	switch (cmd) {
+	case SIOCGMIIPHY:
+	case SIOCGMIIREG:
+	case SIOCSMIIREG:
+		return phylink_mii_ioctl(ess->phylink, ifr, cmd);
+	default:
+		break;
+	}
+
+	return -EOPNOTSUPP;
+}
+
+static inline u16 ipqess_tx_desc_available(struct ipqess_tx_ring *tx_ring)
+{
+	u16 count = 0;
+
+	if (tx_ring->tail <= tx_ring->head)
+		count = IPQESS_TX_RING_SIZE;
+
+	count += tx_ring->tail - tx_ring->head - 1;
+
+	return count;
+}
+
+static inline int ipqess_cal_txd_req(struct sk_buff *skb)
+{
+	int tpds;
+
+	/* one TPD for the header, and one for each fragments */
+	tpds = 1 + skb_shinfo(skb)->nr_frags;
+	if (skb_is_gso(skb) && skb_is_gso_v6(skb)) {
+		/* for LSOv2 one extra TPD is needed */
+		tpds++;
+	}
+
+	return tpds;
+}
+
+static struct ipqess_buf *ipqess_get_tx_buffer(struct ipqess_tx_ring *tx_ring,
+					       struct ipqess_tx_desc *desc)
+{
+	return &tx_ring->buf[desc - tx_ring->hw_desc];
+}
+
+static struct ipqess_tx_desc *ipqess_tx_desc_next(struct ipqess_tx_ring *tx_ring)
+{
+	struct ipqess_tx_desc *desc;
+
+	desc = &tx_ring->hw_desc[tx_ring->head];
+	tx_ring->head = IPQESS_NEXT_IDX(tx_ring->head, tx_ring->count);
+
+	return desc;
+}
+
+static void ipqess_rollback_tx(struct ipqess *eth,
+			       struct ipqess_tx_desc *first_desc, int ring_id)
+{
+	struct ipqess_tx_ring *tx_ring = &eth->tx_ring[ring_id];
+	struct ipqess_tx_desc *desc = NULL;
+	struct ipqess_buf *buf;
+	u16 start_index, index;
+
+	start_index = first_desc - tx_ring->hw_desc;
+
+	index = start_index;
+	while (index != tx_ring->head) {
+		desc = &tx_ring->hw_desc[index];
+		buf = &tx_ring->buf[index];
+		ipqess_tx_unmap_and_free(&eth->pdev->dev, buf);
+		memset(desc, 0, sizeof(struct ipqess_tx_desc));
+		if (++index == tx_ring->count)
+			index = 0;
+	}
+	tx_ring->head = start_index;
+}
+
+static int ipqess_tx_map_and_fill(struct ipqess_tx_ring *tx_ring,
+				  struct sk_buff *skb)
+{
+	struct ipqess_tx_desc *desc = NULL, *first_desc = NULL;
+	u32 word1 = 0, word3 = 0, lso_word1 = 0, svlan_tag = 0;
+	struct platform_device *pdev = tx_ring->ess->pdev;
+	struct ipqess_buf *buf = NULL;
+	u16 len;
+	int i;
+
+	if (skb_is_gso(skb)) {
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
+			lso_word1 |= IPQESS_TPD_IPV4_EN;
+			ip_hdr(skb)->check = 0;
+			tcp_hdr(skb)->check = ~csum_tcpudp_magic(ip_hdr(skb)->saddr,
+								 ip_hdr(skb)->daddr,
+								 0, IPPROTO_TCP, 0);
+		} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
+			lso_word1 |= IPQESS_TPD_LSO_V2_EN;
+			ipv6_hdr(skb)->payload_len = 0;
+			tcp_hdr(skb)->check = ~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
+							       &ipv6_hdr(skb)->daddr,
+							       0, IPPROTO_TCP, 0);
+		}
+
+		lso_word1 |= IPQESS_TPD_LSO_EN |
+			     ((skb_shinfo(skb)->gso_size & IPQESS_TPD_MSS_MASK) <<
+							   IPQESS_TPD_MSS_SHIFT) |
+			     (skb_transport_offset(skb) << IPQESS_TPD_HDR_SHIFT);
+	} else if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
+		u8 css, cso;
+
+		cso = skb_checksum_start_offset(skb);
+		css = cso + skb->csum_offset;
+
+		word1 |= (IPQESS_TPD_CUSTOM_CSUM_EN);
+		word1 |= (cso >> 1) << IPQESS_TPD_HDR_SHIFT;
+		word1 |= ((css >> 1) << IPQESS_TPD_CUSTOM_CSUM_SHIFT);
+	}
+
+	if (skb_vlan_tag_present(skb)) {
+		switch (skb->vlan_proto) {
+		case htons(ETH_P_8021Q):
+			word3 |= BIT(IPQESS_TX_INS_CVLAN);
+			word3 |= skb_vlan_tag_get(skb) << IPQESS_TX_CVLAN_TAG_SHIFT;
+			break;
+		case htons(ETH_P_8021AD):
+			word1 |= BIT(IPQESS_TX_INS_SVLAN);
+			svlan_tag = skb_vlan_tag_get(skb);
+			break;
+		default:
+			dev_err(&pdev->dev, "no ctag or stag present\n");
+			goto vlan_tag_error;
+		}
+	}
+
+	if (eth_type_vlan(skb->protocol))
+		word1 |= IPQESS_TPD_VLAN_TAGGED;
+
+	if (skb->protocol == htons(ETH_P_PPP_SES))
+		word1 |= IPQESS_TPD_PPPOE_EN;
+
+	len = skb_headlen(skb);
+
+	first_desc = ipqess_tx_desc_next(tx_ring);
+	desc = first_desc;
+	if (lso_word1 & IPQESS_TPD_LSO_V2_EN) {
+		desc->addr = cpu_to_le32(skb->len);
+		desc->word1 = cpu_to_le32(word1 | lso_word1);
+		desc->svlan_tag = cpu_to_le16(svlan_tag);
+		desc->word3 = cpu_to_le32(word3);
+		desc = ipqess_tx_desc_next(tx_ring);
+	}
+
+	buf = ipqess_get_tx_buffer(tx_ring, desc);
+	buf->length = len;
+	buf->dma = dma_map_single(&pdev->dev, skb->data, len, DMA_TO_DEVICE);
+
+	if (dma_mapping_error(&pdev->dev, buf->dma))
+		goto dma_error;
+
+	desc->addr = cpu_to_le32(buf->dma);
+	desc->len  = cpu_to_le16(len);
+
+	buf->flags |= IPQESS_DESC_SINGLE;
+	desc->word1 = cpu_to_le32(word1 | lso_word1);
+	desc->svlan_tag = cpu_to_le16(svlan_tag);
+	desc->word3 = cpu_to_le32(word3);
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+		len = skb_frag_size(frag);
+		desc = ipqess_tx_desc_next(tx_ring);
+		buf = ipqess_get_tx_buffer(tx_ring, desc);
+		buf->length = len;
+		buf->flags |= IPQESS_DESC_PAGE;
+		buf->dma = skb_frag_dma_map(&pdev->dev, frag, 0, len,
+					    DMA_TO_DEVICE);
+
+		if (dma_mapping_error(&pdev->dev, buf->dma))
+			goto dma_error;
+
+		desc->addr = cpu_to_le32(buf->dma);
+		desc->len  = cpu_to_le16(len);
+		desc->svlan_tag = cpu_to_le16(svlan_tag);
+		desc->word1 = cpu_to_le32(word1 | lso_word1);
+		desc->word3 = cpu_to_le32(word3);
+	}
+	desc->word1 |= cpu_to_le32(1 << IPQESS_TPD_EOP_SHIFT);
+	buf->skb = skb;
+	buf->flags |= IPQESS_DESC_LAST;
+
+	return 0;
+
+dma_error:
+	ipqess_rollback_tx(tx_ring->ess, first_desc, tx_ring->ring_id);
+	dev_err(&pdev->dev, "TX DMA map failed\n");
+
+vlan_tag_error:
+	return -ENOMEM;
+}
+
+static inline void ipqess_kick_tx(struct ipqess_tx_ring *tx_ring)
+{
+	/* Ensure that all TPDs has been written completely */
+	dma_wmb();
+
+	/* update software producer index */
+	ipqess_w32(tx_ring->ess, IPQESS_REG_TPD_IDX_Q(tx_ring->idx),
+		   tx_ring->head);
+}
+
+static netdev_tx_t ipqess_xmit(struct sk_buff *skb, struct net_device *netdev)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+	struct ipqess_tx_ring *tx_ring;
+	int avail;
+	int tx_num;
+	int ret;
+
+	tx_ring = &ess->tx_ring[skb_get_queue_mapping(skb)];
+	tx_num = ipqess_cal_txd_req(skb);
+	avail = ipqess_tx_desc_available(tx_ring);
+	if (avail < tx_num) {
+		netdev_dbg(netdev,
+			   "stopping tx queue %d, avail=%d req=%d im=%x\n",
+			   tx_ring->idx, avail, tx_num,
+			   ipqess_r32(tx_ring->ess,
+				      IPQESS_REG_TX_INT_MASK_Q(tx_ring->idx)));
+		netif_tx_stop_queue(tx_ring->nq);
+		ipqess_w32(tx_ring->ess, IPQESS_REG_TX_INT_MASK_Q(tx_ring->idx), 0x1);
+		ipqess_kick_tx(tx_ring);
+		return NETDEV_TX_BUSY;
+	}
+
+	ret = ipqess_tx_map_and_fill(tx_ring, skb);
+	if (ret) {
+		dev_kfree_skb_any(skb);
+		ess->stats.tx_errors++;
+		goto err_out;
+	}
+
+	ess->stats.tx_packets++;
+	ess->stats.tx_bytes += skb->len;
+	netdev_tx_sent_queue(tx_ring->nq, skb->len);
+
+	if (!netdev_xmit_more() || netif_xmit_stopped(tx_ring->nq))
+		ipqess_kick_tx(tx_ring);
+
+err_out:
+	return NETDEV_TX_OK;
+}
+
+static int ipqess_set_mac_address(struct net_device *netdev, void *p)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+	const u8 *macaddr = netdev->dev_addr;
+	int ret = eth_mac_addr(netdev, p);
+
+	if (ret)
+		return ret;
+
+	ipqess_w32(ess, IPQESS_REG_MAC_CTRL1, (macaddr[0] << 8) | macaddr[1]);
+	ipqess_w32(ess, IPQESS_REG_MAC_CTRL0,
+		   (macaddr[2] << 24) | (macaddr[3] << 16) | (macaddr[4] << 8) |
+		    macaddr[5]);
+
+	return 0;
+}
+
+static void ipqess_tx_timeout(struct net_device *netdev, unsigned int txq_id)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+	struct ipqess_tx_ring *tr = &ess->tx_ring[txq_id];
+
+	netdev_warn(netdev, "TX timeout on queue %d\n", tr->idx);
+}
+
+static const struct net_device_ops ipqess_axi_netdev_ops = {
+	.ndo_init		= ipqess_init,
+	.ndo_uninit		= ipqess_uninit,
+	.ndo_open		= ipqess_open,
+	.ndo_stop		= ipqess_stop,
+	.ndo_do_ioctl		= ipqess_do_ioctl,
+	.ndo_start_xmit		= ipqess_xmit,
+	.ndo_get_stats		= ipqess_get_stats,
+	.ndo_set_mac_address	= ipqess_set_mac_address,
+	.ndo_tx_timeout		= ipqess_tx_timeout,
+};
+
+static void ipqess_hw_stop(struct ipqess *ess)
+{
+	int i;
+
+	/* disable all RX queue IRQs */
+	for (i = 0; i < IPQESS_MAX_RX_QUEUE; i++)
+		ipqess_w32(ess, IPQESS_REG_RX_INT_MASK_Q(i), 0);
+
+	/* disable all TX queue IRQs */
+	for (i = 0; i < IPQESS_MAX_TX_QUEUE; i++)
+		ipqess_w32(ess, IPQESS_REG_TX_INT_MASK_Q(i), 0);
+
+	/* disable all other IRQs */
+	ipqess_w32(ess, IPQESS_REG_MISC_IMR, 0);
+	ipqess_w32(ess, IPQESS_REG_WOL_IMR, 0);
+
+	/* clear the IRQ status registers */
+	ipqess_w32(ess, IPQESS_REG_RX_ISR, 0xff);
+	ipqess_w32(ess, IPQESS_REG_TX_ISR, 0xffff);
+	ipqess_w32(ess, IPQESS_REG_MISC_ISR, 0x1fff);
+	ipqess_w32(ess, IPQESS_REG_WOL_ISR, 0x1);
+	ipqess_w32(ess, IPQESS_REG_WOL_CTRL, 0);
+
+	/* disable RX and TX queues */
+	ipqess_m32(ess, IPQESS_RXQ_CTRL_EN_MASK, 0, IPQESS_REG_RXQ_CTRL);
+	ipqess_m32(ess, IPQESS_TXQ_CTRL_TXQ_EN, 0, IPQESS_REG_TXQ_CTRL);
+}
+
+static int ipqess_hw_init(struct ipqess *ess)
+{
+	int i, err;
+	u32 tmp;
+
+	ipqess_hw_stop(ess);
+
+	ipqess_m32(ess, BIT(IPQESS_INTR_SW_IDX_W_TYP_SHIFT),
+		   IPQESS_INTR_SW_IDX_W_TYPE << IPQESS_INTR_SW_IDX_W_TYP_SHIFT,
+		   IPQESS_REG_INTR_CTRL);
+
+	/* enable IRQ delay slot */
+	ipqess_w32(ess, IPQESS_REG_IRQ_MODRT_TIMER_INIT,
+		   (IPQESS_TX_IMT << IPQESS_IRQ_MODRT_TX_TIMER_SHIFT) |
+		   (IPQESS_RX_IMT << IPQESS_IRQ_MODRT_RX_TIMER_SHIFT));
+
+	/* Set Customer and Service VLAN TPIDs */
+	ipqess_w32(ess, IPQESS_REG_VLAN_CFG,
+		   (ETH_P_8021Q << IPQESS_VLAN_CFG_CVLAN_TPID_SHIFT) |
+		   (ETH_P_8021AD << IPQESS_VLAN_CFG_SVLAN_TPID_SHIFT));
+
+	/* Configure the TX Queue bursting */
+	ipqess_w32(ess, IPQESS_REG_TXQ_CTRL,
+		   (IPQESS_TPD_BURST << IPQESS_TXQ_NUM_TPD_BURST_SHIFT) |
+		   (IPQESS_TXF_BURST << IPQESS_TXQ_TXF_BURST_NUM_SHIFT) |
+		   IPQESS_TXQ_CTRL_TPD_BURST_EN);
+
+	/* Set RSS type */
+	ipqess_w32(ess, IPQESS_REG_RSS_TYPE,
+		   IPQESS_RSS_TYPE_IPV4TCP | IPQESS_RSS_TYPE_IPV6_TCP |
+		   IPQESS_RSS_TYPE_IPV4_UDP | IPQESS_RSS_TYPE_IPV6UDP |
+		   IPQESS_RSS_TYPE_IPV4 | IPQESS_RSS_TYPE_IPV6);
+
+	/* Set RFD ring burst and threshold */
+	ipqess_w32(ess, IPQESS_REG_RX_DESC1,
+		   (IPQESS_RFD_BURST << IPQESS_RXQ_RFD_BURST_NUM_SHIFT) |
+		   (IPQESS_RFD_THR << IPQESS_RXQ_RFD_PF_THRESH_SHIFT) |
+		   (IPQESS_RFD_LTHR << IPQESS_RXQ_RFD_LOW_THRESH_SHIFT));
+
+	/* Set Rx FIFO
+	 * - threshold to start to DMA data to host
+	 */
+	ipqess_w32(ess, IPQESS_REG_RXQ_CTRL,
+		   IPQESS_FIFO_THRESH_128_BYTE | IPQESS_RXQ_CTRL_RMV_VLAN);
+
+	err = ipqess_rx_ring_alloc(ess);
+	if (err)
+		return err;
+
+	err = ipqess_tx_ring_alloc(ess);
+	if (err)
+		goto err_rx_ring_free;
+
+	/* Load all of ring base addresses above into the dma engine */
+	ipqess_m32(ess, 0, BIT(IPQESS_LOAD_PTR_SHIFT), IPQESS_REG_TX_SRAM_PART);
+
+	/* Disable TX FIFO low watermark and high watermark */
+	ipqess_w32(ess, IPQESS_REG_TXF_WATER_MARK, 0);
+
+	/* Configure the RSS indirection table.
+	 * The 128 hash entries are configured in the following
+	 * repeating pattern: hash{0,1,2,3} = {Q0,Q2,Q4,Q6}
+	 * respectively, and so on.
+	 */
+	for (i = 0; i < IPQESS_NUM_IDT; i++)
+		ipqess_w32(ess, IPQESS_REG_RSS_IDT(i), IPQESS_RSS_IDT_VALUE);
+
+	/* Configure the load balance mapping table.
+	 * The 4 table entries are configured according to the
+	 * following pattern: load_balance{0,1,2,3} = {Q0,Q2,Q4,Q6}
+	 * respectively.
+	 */
+	ipqess_w32(ess, IPQESS_REG_LB_RING, IPQESS_LB_REG_VALUE);
+
+	/* Configure Virtual queue for Tx rings */
+	ipqess_w32(ess, IPQESS_REG_VQ_CTRL0, IPQESS_VQ_REG_VALUE);
+	ipqess_w32(ess, IPQESS_REG_VQ_CTRL1, IPQESS_VQ_REG_VALUE);
+
+	/* Configure Max AXI Burst write size to 128 bytes */
+	ipqess_w32(ess, IPQESS_REG_AXIW_CTRL_MAXWRSIZE,
+		   IPQESS_AXIW_MAXWRSIZE_VALUE);
+
+	/* Enable TX queues */
+	ipqess_m32(ess, 0, IPQESS_TXQ_CTRL_TXQ_EN, IPQESS_REG_TXQ_CTRL);
+
+	/* Enable RX queues */
+	tmp = 0;
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++)
+		tmp |= IPQESS_RXQ_CTRL_EN(ess->rx_ring[i].idx);
+
+	ipqess_m32(ess, IPQESS_RXQ_CTRL_EN_MASK, tmp, IPQESS_REG_RXQ_CTRL);
+
+	return 0;
+
+err_rx_ring_free:
+
+	ipqess_rx_ring_free(ess);
+	return err;
+}
+
+static void ipqess_mac_config(struct phylink_config *config, unsigned int mode,
+			      const struct phylink_link_state *state)
+{
+	/* Nothing to do, use fixed Internal mode */
+}
+
+static void ipqess_mac_link_down(struct phylink_config *config,
+				 unsigned int mode,
+				 phy_interface_t interface)
+{
+	/* Nothing to do, use fixed Internal mode */
+}
+
+static void ipqess_mac_link_up(struct phylink_config *config,
+			       struct phy_device *phy, unsigned int mode,
+			       phy_interface_t interface,
+			       int speed, int duplex,
+			       bool tx_pause, bool rx_pause)
+{
+	/* Nothing to do, use fixed Internal mode */
+}
+
+static const struct phylink_mac_ops ipqess_phylink_mac_ops = {
+	.validate		= phylink_generic_validate,
+	.mac_config		= ipqess_mac_config,
+	.mac_link_up		= ipqess_mac_link_up,
+	.mac_link_down		= ipqess_mac_link_down,
+};
+
+static void ipqess_reset(struct ipqess *ess)
+{
+	reset_control_assert(ess->ess_rst);
+
+	mdelay(10);
+
+	reset_control_deassert(ess->ess_rst);
+
+	/* Waiting for all inner tables to be flushed and reinitialized.
+	 * This takes between 5 and 10 ms
+	 */
+
+	mdelay(10);
+}
+
+static int ipqess_axi_probe(struct platform_device *pdev)
+{
+	struct device_node *np = pdev->dev.of_node;
+	struct net_device *netdev;
+	phy_interface_t phy_mode;
+	struct resource *res;
+	struct ipqess *ess;
+	int i, err = 0;
+
+	netdev = devm_alloc_etherdev_mqs(&pdev->dev, sizeof(struct ipqess),
+					 IPQESS_NETDEV_QUEUES,
+					 IPQESS_NETDEV_QUEUES);
+	if (!netdev)
+		return -ENOMEM;
+
+	ess = netdev_priv(netdev);
+	ess->netdev = netdev;
+	ess->pdev = pdev;
+	spin_lock_init(&ess->stats_lock);
+	SET_NETDEV_DEV(netdev, &pdev->dev);
+	platform_set_drvdata(pdev, netdev);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	ess->hw_addr = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(ess->hw_addr))
+		return PTR_ERR(ess->hw_addr);
+
+	err = of_get_phy_mode(np, &phy_mode);
+	if (err) {
+		dev_err(&pdev->dev, "incorrect phy-mode\n");
+		return err;
+	}
+
+	ess->ess_clk = devm_clk_get(&pdev->dev, "ess");
+	if (!IS_ERR(ess->ess_clk))
+		clk_prepare_enable(ess->ess_clk);
+
+	ess->ess_rst = devm_reset_control_get(&pdev->dev, "ess");
+	err = PTR_ERR_OR_ZERO(ess->ess_rst);
+	if (err)
+		goto err_clk;
+	ipqess_reset(ess);
+
+	ess->phylink_config.dev = &netdev->dev;
+	ess->phylink_config.type = PHYLINK_NETDEV;
+
+	__set_bit(PHY_INTERFACE_MODE_INTERNAL,
+		  ess->phylink_config.supported_interfaces);
+
+	ess->phylink = phylink_create(&ess->phylink_config,
+				      of_fwnode_handle(np), phy_mode,
+				      &ipqess_phylink_mac_ops);
+	if (IS_ERR(ess->phylink)) {
+		err = PTR_ERR(ess->phylink);
+		goto err_clk;
+	}
+
+	for (i = 0; i < IPQESS_MAX_TX_QUEUE; i++) {
+		ess->tx_irq[i] = platform_get_irq(pdev, i);
+		scnprintf(ess->tx_irq_names[i], sizeof(ess->tx_irq_names[i]),
+			  "%s:txq%d", pdev->name, i);
+	}
+
+	for (i = 0; i < IPQESS_MAX_RX_QUEUE; i++) {
+		ess->rx_irq[i] = platform_get_irq(pdev, i + IPQESS_MAX_TX_QUEUE);
+		scnprintf(ess->rx_irq_names[i], sizeof(ess->rx_irq_names[i]),
+			  "%s:rxq%d", pdev->name, i);
+	}
+
+	netdev->netdev_ops = &ipqess_axi_netdev_ops;
+	netdev->features = NETIF_F_HW_CSUM | NETIF_F_RXCSUM |
+			   NETIF_F_HW_VLAN_CTAG_RX |
+			   NETIF_F_HW_VLAN_CTAG_TX |
+			   NETIF_F_TSO | NETIF_F_GRO | NETIF_F_SG;
+	/* feature change is not supported yet */
+	netdev->hw_features = 0;
+	netdev->vlan_features = NETIF_F_HW_CSUM | NETIF_F_SG | NETIF_F_RXCSUM |
+				NETIF_F_TSO |
+				NETIF_F_GRO;
+	netdev->watchdog_timeo = 5 * HZ;
+	netdev->base_addr = (u32)ess->hw_addr;
+	netdev->max_mtu = 9000;
+	netdev->gso_max_segs = IPQESS_TX_RING_SIZE / 2;
+
+	ipqess_set_ethtool_ops(netdev);
+
+	err = ipqess_hw_init(ess);
+	if (err)
+		goto err_phylink;
+
+	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
+		netif_tx_napi_add(netdev, &ess->tx_ring[i].napi_tx,
+				  ipqess_tx_napi, 64);
+		netif_napi_add(netdev, &ess->rx_ring[i].napi_rx, ipqess_rx_napi,
+			       64);
+	}
+
+	err = register_netdev(netdev);
+	if (err)
+		goto err_hw_stop;
+
+	return 0;
+
+err_hw_stop:
+	ipqess_hw_stop(ess);
+
+	ipqess_tx_ring_free(ess);
+	ipqess_rx_ring_free(ess);
+err_phylink:
+	phylink_destroy(ess->phylink);
+
+err_clk:
+	clk_disable_unprepare(ess->ess_clk);
+
+	return err;
+}
+
+static int ipqess_axi_remove(struct platform_device *pdev)
+{
+	const struct net_device *netdev = platform_get_drvdata(pdev);
+	struct ipqess *ess = netdev_priv(netdev);
+
+	ipqess_hw_stop(ess);
+	unregister_netdev(ess->netdev);
+
+	ipqess_tx_ring_free(ess);
+	ipqess_rx_ring_free(ess);
+
+	phylink_destroy(ess->phylink);
+	clk_disable_unprepare(ess->ess_clk);
+
+	return 0;
+}
+
+static const struct of_device_id ipqess_of_mtable[] = {
+	{.compatible = "qcom,ipq4019-ess-edma" },
+	{}
+};
+MODULE_DEVICE_TABLE(of, ipqess_of_mtable);
+
+static struct platform_driver ipqess_axi_driver = {
+	.driver = {
+		.name    = "ipqess-edma",
+		.of_match_table = ipqess_of_mtable,
+	},
+	.probe    = ipqess_axi_probe,
+	.remove   = ipqess_axi_remove,
+};
+
+module_platform_driver(ipqess_axi_driver);
+
+MODULE_AUTHOR("Qualcomm Atheros Inc");
+MODULE_AUTHOR("John Crispin <john@phrozen.org>");
+MODULE_AUTHOR("Christian Lamparter <chunkeey@gmail.com>");
+MODULE_AUTHOR("Gabor Juhos <j4g8y7@gmail.com>");
+MODULE_AUTHOR("Maxime Chevallier <maxime.chevallier@bootlin.com>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/net/ethernet/qualcomm/ipqess/ipqess.h b/drivers/net/ethernet/qualcomm/ipqess/ipqess.h
new file mode 100644
index 000000000000..9a4ab6ce282a
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/ipqess/ipqess.h
@@ -0,0 +1,518 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR ISC) */
+/* Copyright (c) 2014 - 2016, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2017 - 2018, John Crispin <john@phrozen.org>
+ * Copyright (c) 2018 - 2019, Christian Lamparter <chunkeey@gmail.com>
+ * Copyright (c) 2020 - 2021, Gabor Juhos <j4g8y7@gmail.com>
+ * Copyright (c) 2021 - 2022, Maxime Chevallier <maxime.chevallier@bootlin.com>
+ *
+ */
+
+#ifndef _IPQESS_H_
+#define _IPQESS_H_
+
+#define IPQESS_NETDEV_QUEUES	4
+
+#define IPQESS_TPD_EOP_SHIFT 31
+
+#define IPQESS_PORT_ID_SHIFT 12
+#define IPQESS_PORT_ID_MASK 0x7
+
+/* tpd word 3 bit 18-28 */
+#define IPQESS_TPD_PORT_BITMAP_SHIFT 18
+
+#define IPQESS_TPD_FROM_CPU_SHIFT 25
+
+#define IPQESS_RX_RING_SIZE 128
+#define IPQESS_RX_HEAD_BUFF_SIZE 1540
+#define IPQESS_TX_RING_SIZE 128
+#define IPQESS_MAX_RX_QUEUE 8
+#define IPQESS_MAX_TX_QUEUE 16
+
+/* Configurations */
+#define IPQESS_INTR_CLEAR_TYPE 0
+#define IPQESS_INTR_SW_IDX_W_TYPE 0
+#define IPQESS_FIFO_THRESH_TYPE 0
+#define IPQESS_RSS_TYPE 0
+#define IPQESS_RX_IMT 0x0020
+#define IPQESS_TX_IMT 0x0050
+#define IPQESS_TPD_BURST 5
+#define IPQESS_TXF_BURST 0x100
+#define IPQESS_RFD_BURST 8
+#define IPQESS_RFD_THR 16
+#define IPQESS_RFD_LTHR 0
+
+/* Flags used in transmit direction */
+#define IPQESS_DESC_LAST 0x1
+#define IPQESS_DESC_SINGLE 0x2
+#define IPQESS_DESC_PAGE 0x4
+
+struct ipqess_statistics {
+	u32 tx_q0_pkt;
+	u32 tx_q1_pkt;
+	u32 tx_q2_pkt;
+	u32 tx_q3_pkt;
+	u32 tx_q4_pkt;
+	u32 tx_q5_pkt;
+	u32 tx_q6_pkt;
+	u32 tx_q7_pkt;
+	u32 tx_q8_pkt;
+	u32 tx_q9_pkt;
+	u32 tx_q10_pkt;
+	u32 tx_q11_pkt;
+	u32 tx_q12_pkt;
+	u32 tx_q13_pkt;
+	u32 tx_q14_pkt;
+	u32 tx_q15_pkt;
+	u32 tx_q0_byte;
+	u32 tx_q1_byte;
+	u32 tx_q2_byte;
+	u32 tx_q3_byte;
+	u32 tx_q4_byte;
+	u32 tx_q5_byte;
+	u32 tx_q6_byte;
+	u32 tx_q7_byte;
+	u32 tx_q8_byte;
+	u32 tx_q9_byte;
+	u32 tx_q10_byte;
+	u32 tx_q11_byte;
+	u32 tx_q12_byte;
+	u32 tx_q13_byte;
+	u32 tx_q14_byte;
+	u32 tx_q15_byte;
+	u32 rx_q0_pkt;
+	u32 rx_q1_pkt;
+	u32 rx_q2_pkt;
+	u32 rx_q3_pkt;
+	u32 rx_q4_pkt;
+	u32 rx_q5_pkt;
+	u32 rx_q6_pkt;
+	u32 rx_q7_pkt;
+	u32 rx_q0_byte;
+	u32 rx_q1_byte;
+	u32 rx_q2_byte;
+	u32 rx_q3_byte;
+	u32 rx_q4_byte;
+	u32 rx_q5_byte;
+	u32 rx_q6_byte;
+	u32 rx_q7_byte;
+	u32 tx_desc_error;
+};
+
+struct ipqess_tx_desc {
+	__le16  len;
+	__le16  svlan_tag;
+	__le32  word1;
+	__le32  addr;
+	__le32  word3;
+} __aligned(16) __packed;
+
+struct ipqess_rx_desc {
+	u16 rrd0;
+	u16 rrd1;
+	u16 rrd2;
+	u16 rrd3;
+	u16 rrd4;
+	u16 rrd5;
+	u16 rrd6;
+	u16 rrd7;
+} __aligned(16) __packed;
+
+struct ipqess_buf {
+	struct sk_buff *skb;
+	dma_addr_t dma;
+	u32 flags;
+	u16 length;
+};
+
+struct ipqess_tx_ring {
+	struct napi_struct napi_tx;
+	u32 idx;
+	int ring_id;
+	struct ipqess *ess;
+	struct netdev_queue *nq;
+	struct ipqess_tx_desc *hw_desc;
+	struct ipqess_buf *buf;
+	dma_addr_t dma;
+	u16 count;
+	u16 head;
+	u16 tail;
+};
+
+struct ipqess_rx_ring {
+	struct napi_struct napi_rx;
+	u32 idx;
+	int ring_id;
+	struct ipqess *ess;
+	struct device *ppdev;
+	struct ipqess_rx_desc **hw_desc;
+	struct ipqess_buf *buf;
+	dma_addr_t dma;
+	u16 head;
+	u16 tail;
+	atomic_t refill_count;
+};
+
+struct ipqess_rx_ring_refill {
+	struct ipqess_rx_ring *rx_ring;
+	struct work_struct refill_work;
+};
+
+#define IPQESS_IRQ_NAME_LEN	32
+
+struct ipqess {
+	struct net_device *netdev;
+	void __iomem *hw_addr;
+
+	struct clk *ess_clk;
+	struct reset_control *ess_rst;
+
+	struct ipqess_rx_ring rx_ring[IPQESS_NETDEV_QUEUES];
+
+	struct platform_device *pdev;
+	struct phylink *phylink;
+	struct phylink_config phylink_config;
+	struct ipqess_tx_ring tx_ring[IPQESS_NETDEV_QUEUES];
+
+	struct ipqess_statistics ipqess_stats;
+
+	/* Protects stats */
+	spinlock_t stats_lock;
+	struct net_device_stats stats;
+
+	struct ipqess_rx_ring_refill rx_refill[IPQESS_NETDEV_QUEUES];
+	u32 tx_irq[IPQESS_MAX_TX_QUEUE];
+	char tx_irq_names[IPQESS_MAX_TX_QUEUE][IPQESS_IRQ_NAME_LEN];
+	u32 rx_irq[IPQESS_MAX_RX_QUEUE];
+	char rx_irq_names[IPQESS_MAX_RX_QUEUE][IPQESS_IRQ_NAME_LEN];
+};
+
+void ipqess_set_ethtool_ops(struct net_device *netdev);
+void ipqess_update_hw_stats(struct ipqess *ess);
+
+/* register definition */
+#define IPQESS_REG_MAS_CTRL 0x0
+#define IPQESS_REG_TIMEOUT_CTRL 0x004
+#define IPQESS_REG_DBG0 0x008
+#define IPQESS_REG_DBG1 0x00C
+#define IPQESS_REG_SW_CTRL0 0x100
+#define IPQESS_REG_SW_CTRL1 0x104
+
+/* Interrupt Status Register */
+#define IPQESS_REG_RX_ISR 0x200
+#define IPQESS_REG_TX_ISR 0x208
+#define IPQESS_REG_MISC_ISR 0x210
+#define IPQESS_REG_WOL_ISR 0x218
+
+#define IPQESS_MISC_ISR_RX_URG_Q(x) (1 << (x))
+
+#define IPQESS_MISC_ISR_AXIR_TIMEOUT 0x00000100
+#define IPQESS_MISC_ISR_AXIR_ERR 0x00000200
+#define IPQESS_MISC_ISR_TXF_DEAD 0x00000400
+#define IPQESS_MISC_ISR_AXIW_ERR 0x00000800
+#define IPQESS_MISC_ISR_AXIW_TIMEOUT 0x00001000
+
+#define IPQESS_WOL_ISR 0x00000001
+
+/* Interrupt Mask Register */
+#define IPQESS_REG_MISC_IMR 0x214
+#define IPQESS_REG_WOL_IMR 0x218
+
+#define IPQESS_RX_IMR_NORMAL_MASK 0x1
+#define IPQESS_TX_IMR_NORMAL_MASK 0x1
+#define IPQESS_MISC_IMR_NORMAL_MASK 0x80001FFF
+#define IPQESS_WOL_IMR_NORMAL_MASK 0x1
+
+/* Edma receive consumer index */
+#define IPQESS_REG_RX_SW_CONS_IDX_Q(x) (0x220 + ((x) << 2)) /* x is the queue id */
+
+/* Edma transmit consumer index */
+#define IPQESS_REG_TX_SW_CONS_IDX_Q(x) (0x240 + ((x) << 2)) /* x is the queue id */
+
+/* IRQ Moderator Initial Timer Register */
+#define IPQESS_REG_IRQ_MODRT_TIMER_INIT 0x280
+#define IPQESS_IRQ_MODRT_TIMER_MASK 0xFFFF
+#define IPQESS_IRQ_MODRT_RX_TIMER_SHIFT 0
+#define IPQESS_IRQ_MODRT_TX_TIMER_SHIFT 16
+
+/* Interrupt Control Register */
+#define IPQESS_REG_INTR_CTRL 0x284
+#define IPQESS_INTR_CLR_TYP_SHIFT 0
+#define IPQESS_INTR_SW_IDX_W_TYP_SHIFT 1
+#define IPQESS_INTR_CLEAR_TYPE_W1 0
+#define IPQESS_INTR_CLEAR_TYPE_R 1
+
+/* RX Interrupt Mask Register */
+#define IPQESS_REG_RX_INT_MASK_Q(x) (0x300 + ((x) << 2)) /* x = queue id */
+
+/* TX Interrupt mask register */
+#define IPQESS_REG_TX_INT_MASK_Q(x) (0x340 + ((x) << 2)) /* x = queue id */
+
+/* Load Ptr Register
+ * Software sets this bit after the initialization of the head and tail
+ */
+#define IPQESS_REG_TX_SRAM_PART 0x400
+#define IPQESS_LOAD_PTR_SHIFT 16
+
+/* TXQ Control Register */
+#define IPQESS_REG_TXQ_CTRL 0x404
+#define IPQESS_TXQ_CTRL_IP_OPTION_EN 0x10
+#define IPQESS_TXQ_CTRL_TXQ_EN 0x20
+#define IPQESS_TXQ_CTRL_ENH_MODE 0x40
+#define IPQESS_TXQ_CTRL_LS_8023_EN 0x80
+#define IPQESS_TXQ_CTRL_TPD_BURST_EN 0x100
+#define IPQESS_TXQ_CTRL_LSO_BREAK_EN 0x200
+#define IPQESS_TXQ_NUM_TPD_BURST_MASK 0xF
+#define IPQESS_TXQ_TXF_BURST_NUM_MASK 0xFFFF
+#define IPQESS_TXQ_NUM_TPD_BURST_SHIFT 0
+#define IPQESS_TXQ_TXF_BURST_NUM_SHIFT 16
+
+#define IPQESS_REG_TXF_WATER_MARK 0x408 /* In 8-bytes */
+#define IPQESS_TXF_WATER_MARK_MASK 0x0FFF
+#define IPQESS_TXF_LOW_WATER_MARK_SHIFT 0
+#define IPQESS_TXF_HIGH_WATER_MARK_SHIFT 16
+#define IPQESS_TXQ_CTRL_BURST_MODE_EN 0x80000000
+
+/* WRR Control Register */
+#define IPQESS_REG_WRR_CTRL_Q0_Q3 0x40c
+#define IPQESS_REG_WRR_CTRL_Q4_Q7 0x410
+#define IPQESS_REG_WRR_CTRL_Q8_Q11 0x414
+#define IPQESS_REG_WRR_CTRL_Q12_Q15 0x418
+
+/* Weight round robin(WRR), it takes queue as input, and computes
+ * starting bits where we need to write the weight for a particular
+ * queue
+ */
+#define IPQESS_WRR_SHIFT(x) (((x) * 5) % 20)
+
+/* Tx Descriptor Control Register */
+#define IPQESS_REG_TPD_RING_SIZE 0x41C
+#define IPQESS_TPD_RING_SIZE_SHIFT 0
+#define IPQESS_TPD_RING_SIZE_MASK 0xFFFF
+
+/* Transmit descriptor base address */
+#define IPQESS_REG_TPD_BASE_ADDR_Q(x) (0x420 + ((x) << 2)) /* x = queue id */
+
+/* TPD Index Register */
+#define IPQESS_REG_TPD_IDX_Q(x) (0x460 + ((x) << 2)) /* x = queue id */
+
+#define IPQESS_TPD_PROD_IDX_BITS 0x0000FFFF
+#define IPQESS_TPD_CONS_IDX_BITS 0xFFFF0000
+#define IPQESS_TPD_PROD_IDX_MASK 0xFFFF
+#define IPQESS_TPD_CONS_IDX_MASK 0xFFFF
+#define IPQESS_TPD_PROD_IDX_SHIFT 0
+#define IPQESS_TPD_CONS_IDX_SHIFT 16
+
+/* TX Virtual Queue Mapping Control Register */
+#define IPQESS_REG_VQ_CTRL0 0x4A0
+#define IPQESS_REG_VQ_CTRL1 0x4A4
+
+/* Virtual QID shift, it takes queue as input, and computes
+ * Virtual QID position in virtual qid control register
+ */
+#define IPQESS_VQ_ID_SHIFT(i) (((i) * 3) % 24)
+
+/* Virtual Queue Default Value */
+#define IPQESS_VQ_REG_VALUE 0x240240
+
+/* Tx side Port Interface Control Register */
+#define IPQESS_REG_PORT_CTRL 0x4A8
+#define IPQESS_PAD_EN_SHIFT 15
+
+/* Tx side VLAN Configuration Register */
+#define IPQESS_REG_VLAN_CFG 0x4AC
+
+#define IPQESS_VLAN_CFG_SVLAN_TPID_SHIFT 0
+#define IPQESS_VLAN_CFG_SVLAN_TPID_MASK 0xffff
+#define IPQESS_VLAN_CFG_CVLAN_TPID_SHIFT 16
+#define IPQESS_VLAN_CFG_CVLAN_TPID_MASK 0xffff
+
+#define IPQESS_TX_CVLAN 16
+#define IPQESS_TX_INS_CVLAN 17
+#define IPQESS_TX_CVLAN_TAG_SHIFT 0
+
+#define IPQESS_TX_SVLAN 14
+#define IPQESS_TX_INS_SVLAN 15
+#define IPQESS_TX_SVLAN_TAG_SHIFT 16
+
+/* Tx Queue Packet Statistic Register */
+#define IPQESS_REG_TX_STAT_PKT_Q(x) (0x700 + ((x) << 3)) /* x = queue id */
+
+#define IPQESS_TX_STAT_PKT_MASK 0xFFFFFF
+
+/* Tx Queue Byte Statistic Register */
+#define IPQESS_REG_TX_STAT_BYTE_Q(x) (0x704 + ((x) << 3)) /* x = queue id */
+
+/* Load Balance Based Ring Offset Register */
+#define IPQESS_REG_LB_RING 0x800
+#define IPQESS_LB_RING_ENTRY_MASK 0xff
+#define IPQESS_LB_RING_ID_MASK 0x7
+#define IPQESS_LB_RING_PROFILE_ID_MASK 0x3
+#define IPQESS_LB_RING_ENTRY_BIT_OFFSET 8
+#define IPQESS_LB_RING_ID_OFFSET 0
+#define IPQESS_LB_RING_PROFILE_ID_OFFSET 3
+#define IPQESS_LB_REG_VALUE 0x6040200
+
+/* Load Balance Priority Mapping Register */
+#define IPQESS_REG_LB_PRI_START 0x804
+#define IPQESS_REG_LB_PRI_END 0x810
+#define IPQESS_LB_PRI_REG_INC 4
+#define IPQESS_LB_PRI_ENTRY_BIT_OFFSET 4
+#define IPQESS_LB_PRI_ENTRY_MASK 0xf
+
+/* RSS Priority Mapping Register */
+#define IPQESS_REG_RSS_PRI 0x820
+#define IPQESS_RSS_PRI_ENTRY_MASK 0xf
+#define IPQESS_RSS_RING_ID_MASK 0x7
+#define IPQESS_RSS_PRI_ENTRY_BIT_OFFSET 4
+
+/* RSS Indirection Register */
+#define IPQESS_REG_RSS_IDT(x) (0x840 + ((x) << 2)) /* x = No. of indirection table */
+#define IPQESS_NUM_IDT 16
+#define IPQESS_RSS_IDT_VALUE 0x64206420
+
+/* Default RSS Ring Register */
+#define IPQESS_REG_DEF_RSS 0x890
+#define IPQESS_DEF_RSS_MASK 0x7
+
+/* RSS Hash Function Type Register */
+#define IPQESS_REG_RSS_TYPE 0x894
+#define IPQESS_RSS_TYPE_NONE 0x01
+#define IPQESS_RSS_TYPE_IPV4TCP 0x02
+#define IPQESS_RSS_TYPE_IPV6_TCP 0x04
+#define IPQESS_RSS_TYPE_IPV4_UDP 0x08
+#define IPQESS_RSS_TYPE_IPV6UDP 0x10
+#define IPQESS_RSS_TYPE_IPV4 0x20
+#define IPQESS_RSS_TYPE_IPV6 0x40
+#define IPQESS_RSS_HASH_MODE_MASK 0x7f
+
+#define IPQESS_REG_RSS_HASH_VALUE 0x8C0
+
+#define IPQESS_REG_RSS_TYPE_RESULT 0x8C4
+
+#define IPQESS_HASH_TYPE_START 0
+#define IPQESS_HASH_TYPE_END 5
+#define IPQESS_HASH_TYPE_SHIFT 12
+
+#define IPQESS_RFS_FLOW_ENTRIES 1024
+#define IPQESS_RFS_FLOW_ENTRIES_MASK (IPQESS_RFS_FLOW_ENTRIES - 1)
+#define IPQESS_RFS_EXPIRE_COUNT_PER_CALL 128
+
+/* RFD Base Address Register */
+#define IPQESS_REG_RFD_BASE_ADDR_Q(x) (0x950 + ((x) << 2)) /* x = queue id */
+
+/* RFD Index Register */
+#define IPQESS_REG_RFD_IDX_Q(x) (0x9B0 + ((x) << 2)) /* x = queue id */
+
+#define IPQESS_RFD_PROD_IDX_BITS 0x00000FFF
+#define IPQESS_RFD_CONS_IDX_BITS 0x0FFF0000
+#define IPQESS_RFD_PROD_IDX_MASK 0xFFF
+#define IPQESS_RFD_CONS_IDX_MASK 0xFFF
+#define IPQESS_RFD_PROD_IDX_SHIFT 0
+#define IPQESS_RFD_CONS_IDX_SHIFT 16
+
+/* Rx Descriptor Control Register */
+#define IPQESS_REG_RX_DESC0 0xA10
+#define IPQESS_RFD_RING_SIZE_MASK 0xFFF
+#define IPQESS_RX_BUF_SIZE_MASK 0xFFFF
+#define IPQESS_RFD_RING_SIZE_SHIFT 0
+#define IPQESS_RX_BUF_SIZE_SHIFT 16
+
+#define IPQESS_REG_RX_DESC1 0xA14
+#define IPQESS_RXQ_RFD_BURST_NUM_MASK 0x3F
+#define IPQESS_RXQ_RFD_PF_THRESH_MASK 0x1F
+#define IPQESS_RXQ_RFD_LOW_THRESH_MASK 0xFFF
+#define IPQESS_RXQ_RFD_BURST_NUM_SHIFT 0
+#define IPQESS_RXQ_RFD_PF_THRESH_SHIFT 8
+#define IPQESS_RXQ_RFD_LOW_THRESH_SHIFT 16
+
+/* RXQ Control Register */
+#define IPQESS_REG_RXQ_CTRL 0xA18
+#define IPQESS_FIFO_THRESH_TYPE_SHIF 0
+#define IPQESS_FIFO_THRESH_128_BYTE 0x0
+#define IPQESS_FIFO_THRESH_64_BYTE 0x1
+#define IPQESS_RXQ_CTRL_RMV_VLAN 0x00000002
+#define IPQESS_RXQ_CTRL_EN_MASK			GENMASK(15, 8)
+#define IPQESS_RXQ_CTRL_EN(__qid)		BIT(8 + (__qid))
+
+/* AXI Burst Size Config */
+#define IPQESS_REG_AXIW_CTRL_MAXWRSIZE 0xA1C
+#define IPQESS_AXIW_MAXWRSIZE_VALUE 0x0
+
+/* Rx Statistics Register */
+#define IPQESS_REG_RX_STAT_BYTE_Q(x) (0xA30 + ((x) << 2)) /* x = queue id */
+#define IPQESS_REG_RX_STAT_PKT_Q(x) (0xA50 + ((x) << 2)) /* x = queue id */
+
+/* WoL Pattern Length Register */
+#define IPQESS_REG_WOL_PATTERN_LEN0 0xC00
+#define IPQESS_WOL_PT_LEN_MASK 0xFF
+#define IPQESS_WOL_PT0_LEN_SHIFT 0
+#define IPQESS_WOL_PT1_LEN_SHIFT 8
+#define IPQESS_WOL_PT2_LEN_SHIFT 16
+#define IPQESS_WOL_PT3_LEN_SHIFT 24
+
+#define IPQESS_REG_WOL_PATTERN_LEN1 0xC04
+#define IPQESS_WOL_PT4_LEN_SHIFT 0
+#define IPQESS_WOL_PT5_LEN_SHIFT 8
+#define IPQESS_WOL_PT6_LEN_SHIFT 16
+
+/* WoL Control Register */
+#define IPQESS_REG_WOL_CTRL 0xC08
+#define IPQESS_WOL_WK_EN 0x00000001
+#define IPQESS_WOL_MG_EN 0x00000002
+#define IPQESS_WOL_PT0_EN 0x00000004
+#define IPQESS_WOL_PT1_EN 0x00000008
+#define IPQESS_WOL_PT2_EN 0x00000010
+#define IPQESS_WOL_PT3_EN 0x00000020
+#define IPQESS_WOL_PT4_EN 0x00000040
+#define IPQESS_WOL_PT5_EN 0x00000080
+#define IPQESS_WOL_PT6_EN 0x00000100
+
+/* MAC Control Register */
+#define IPQESS_REG_MAC_CTRL0 0xC20
+#define IPQESS_REG_MAC_CTRL1 0xC24
+
+/* WoL Pattern Register */
+#define IPQESS_REG_WOL_PATTERN_START 0x5000
+#define IPQESS_PATTERN_PART_REG_OFFSET 0x40
+
+/* TX descriptor fields */
+#define IPQESS_TPD_HDR_SHIFT 0
+#define IPQESS_TPD_PPPOE_EN 0x00000100
+#define IPQESS_TPD_IP_CSUM_EN 0x00000200
+#define IPQESS_TPD_TCP_CSUM_EN 0x00000400
+#define IPQESS_TPD_UDP_CSUM_EN 0x00000800
+#define IPQESS_TPD_CUSTOM_CSUM_EN 0x00000C00
+#define IPQESS_TPD_LSO_EN 0x00001000
+#define IPQESS_TPD_LSO_V2_EN 0x00002000
+/* The VLAN_TAGGED bit is not used in the publicly available
+ * drivers. The definition has been stolen from the Atheros
+ * 'alx' driver (drivers/net/ethernet/atheros/alx/hw.h). It
+ * seems that it has the same meaning in regard to the EDMA
+ * hardware.
+ */
+#define IPQESS_TPD_VLAN_TAGGED 0x00004000
+#define IPQESS_TPD_IPV4_EN 0x00010000
+#define IPQESS_TPD_MSS_MASK 0x1FFF
+#define IPQESS_TPD_MSS_SHIFT 18
+#define IPQESS_TPD_CUSTOM_CSUM_SHIFT 18
+
+/* RRD descriptor fields */
+#define IPQESS_RRD_NUM_RFD_MASK 0x000F
+#define IPQESS_RRD_PKT_SIZE_MASK 0x3FFF
+#define IPQESS_RRD_SRC_PORT_NUM_MASK 0x4000
+#define IPQESS_RRD_SVLAN 0x8000
+#define IPQESS_RRD_FLOW_COOKIE_MASK 0x07FF
+
+#define IPQESS_RRD_PKT_SIZE_MASK 0x3FFF
+#define IPQESS_RRD_CSUM_FAIL_MASK 0xC000
+#define IPQESS_RRD_CVLAN 0x0001
+#define IPQESS_RRD_DESC_VALID 0x8000
+
+#define IPQESS_RRD_PRIORITY_SHIFT 4
+#define IPQESS_RRD_PRIORITY_MASK 0x7
+#define IPQESS_RRD_PORT_TYPE_SHIFT 7
+#define IPQESS_RRD_PORT_TYPE_MASK 0x1F
+
+#define IPQESS_RRD_PORT_ID_MASK 0x7000
+
+#endif
diff --git a/drivers/net/ethernet/qualcomm/ipqess/ipqess_ethtool.c b/drivers/net/ethernet/qualcomm/ipqess/ipqess_ethtool.c
new file mode 100644
index 000000000000..95fb7e2418d1
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/ipqess/ipqess_ethtool.c
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: GPL-2.0 OR ISC
+/* Copyright (c) 2015 - 2016, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2017 - 2018, John Crispin <john@phrozen.org>
+ * Copyright (c) 2021 - 2022, Maxime Chevallier <maxime.chevallier@bootlin.com>
+ *
+ */
+
+#include <linux/ethtool.h>
+#include <linux/netdevice.h>
+#include <linux/string.h>
+#include <linux/phylink.h>
+
+#include "ipqess.h"
+
+struct ipqess_ethtool_stats {
+	u8 string[ETH_GSTRING_LEN];
+	u32 offset;
+};
+
+#define IPQESS_STAT(m)    offsetof(struct ipqess_statistics, m)
+#define DRVINFO_LEN	32
+
+static const struct ipqess_ethtool_stats ipqess_stats[] = {
+	{"tx_q0_pkt", IPQESS_STAT(tx_q0_pkt)},
+	{"tx_q1_pkt", IPQESS_STAT(tx_q1_pkt)},
+	{"tx_q2_pkt", IPQESS_STAT(tx_q2_pkt)},
+	{"tx_q3_pkt", IPQESS_STAT(tx_q3_pkt)},
+	{"tx_q4_pkt", IPQESS_STAT(tx_q4_pkt)},
+	{"tx_q5_pkt", IPQESS_STAT(tx_q5_pkt)},
+	{"tx_q6_pkt", IPQESS_STAT(tx_q6_pkt)},
+	{"tx_q7_pkt", IPQESS_STAT(tx_q7_pkt)},
+	{"tx_q8_pkt", IPQESS_STAT(tx_q8_pkt)},
+	{"tx_q9_pkt", IPQESS_STAT(tx_q9_pkt)},
+	{"tx_q10_pkt", IPQESS_STAT(tx_q10_pkt)},
+	{"tx_q11_pkt", IPQESS_STAT(tx_q11_pkt)},
+	{"tx_q12_pkt", IPQESS_STAT(tx_q12_pkt)},
+	{"tx_q13_pkt", IPQESS_STAT(tx_q13_pkt)},
+	{"tx_q14_pkt", IPQESS_STAT(tx_q14_pkt)},
+	{"tx_q15_pkt", IPQESS_STAT(tx_q15_pkt)},
+	{"tx_q0_byte", IPQESS_STAT(tx_q0_byte)},
+	{"tx_q1_byte", IPQESS_STAT(tx_q1_byte)},
+	{"tx_q2_byte", IPQESS_STAT(tx_q2_byte)},
+	{"tx_q3_byte", IPQESS_STAT(tx_q3_byte)},
+	{"tx_q4_byte", IPQESS_STAT(tx_q4_byte)},
+	{"tx_q5_byte", IPQESS_STAT(tx_q5_byte)},
+	{"tx_q6_byte", IPQESS_STAT(tx_q6_byte)},
+	{"tx_q7_byte", IPQESS_STAT(tx_q7_byte)},
+	{"tx_q8_byte", IPQESS_STAT(tx_q8_byte)},
+	{"tx_q9_byte", IPQESS_STAT(tx_q9_byte)},
+	{"tx_q10_byte", IPQESS_STAT(tx_q10_byte)},
+	{"tx_q11_byte", IPQESS_STAT(tx_q11_byte)},
+	{"tx_q12_byte", IPQESS_STAT(tx_q12_byte)},
+	{"tx_q13_byte", IPQESS_STAT(tx_q13_byte)},
+	{"tx_q14_byte", IPQESS_STAT(tx_q14_byte)},
+	{"tx_q15_byte", IPQESS_STAT(tx_q15_byte)},
+	{"rx_q0_pkt", IPQESS_STAT(rx_q0_pkt)},
+	{"rx_q1_pkt", IPQESS_STAT(rx_q1_pkt)},
+	{"rx_q2_pkt", IPQESS_STAT(rx_q2_pkt)},
+	{"rx_q3_pkt", IPQESS_STAT(rx_q3_pkt)},
+	{"rx_q4_pkt", IPQESS_STAT(rx_q4_pkt)},
+	{"rx_q5_pkt", IPQESS_STAT(rx_q5_pkt)},
+	{"rx_q6_pkt", IPQESS_STAT(rx_q6_pkt)},
+	{"rx_q7_pkt", IPQESS_STAT(rx_q7_pkt)},
+	{"rx_q0_byte", IPQESS_STAT(rx_q0_byte)},
+	{"rx_q1_byte", IPQESS_STAT(rx_q1_byte)},
+	{"rx_q2_byte", IPQESS_STAT(rx_q2_byte)},
+	{"rx_q3_byte", IPQESS_STAT(rx_q3_byte)},
+	{"rx_q4_byte", IPQESS_STAT(rx_q4_byte)},
+	{"rx_q5_byte", IPQESS_STAT(rx_q5_byte)},
+	{"rx_q6_byte", IPQESS_STAT(rx_q6_byte)},
+	{"rx_q7_byte", IPQESS_STAT(rx_q7_byte)},
+	{"tx_desc_error", IPQESS_STAT(tx_desc_error)},
+};
+
+static int ipqess_get_strset_count(struct net_device *netdev, int sset)
+{
+	switch (sset) {
+	case ETH_SS_STATS:
+		return ARRAY_SIZE(ipqess_stats);
+	default:
+		netdev_dbg(netdev, "%s: Unsupported string set\n", __func__);
+		return -EOPNOTSUPP;
+	}
+}
+
+static void ipqess_get_strings(struct net_device *netdev, u32 stringset,
+			       u8 *data)
+{
+	u8 *p = data;
+	u32 i;
+
+	switch (stringset) {
+	case ETH_SS_STATS:
+		for (i = 0; i < ARRAY_SIZE(ipqess_stats); i++) {
+			memcpy(p, ipqess_stats[i].string,
+			       min((size_t)ETH_GSTRING_LEN,
+				   strlen(ipqess_stats[i].string) + 1));
+			p += ETH_GSTRING_LEN;
+		}
+		break;
+	}
+}
+
+static void ipqess_get_ethtool_stats(struct net_device *netdev,
+				     struct ethtool_stats *stats,
+				     uint64_t *data)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+	u32 *essstats = (u32 *)&ess->ipqess_stats;
+	int i;
+
+	spin_lock(&ess->stats_lock);
+
+	ipqess_update_hw_stats(ess);
+
+	for (i = 0; i < ARRAY_SIZE(ipqess_stats); i++)
+		data[i] = *(u32 *)(essstats + (ipqess_stats[i].offset / sizeof(u32)));
+
+	spin_unlock(&ess->stats_lock);
+}
+
+static void ipqess_get_drvinfo(struct net_device *dev,
+			       struct ethtool_drvinfo *info)
+{
+	strscpy(info->driver, "qca_ipqess", DRVINFO_LEN);
+	strscpy(info->bus_info, "axi", ETHTOOL_BUSINFO_LEN);
+}
+
+static int ipqess_get_settings(struct net_device *netdev,
+			       struct ethtool_link_ksettings *cmd)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+
+	return phylink_ethtool_ksettings_get(ess->phylink, cmd);
+}
+
+static int ipqess_set_settings(struct net_device *netdev,
+			       const struct ethtool_link_ksettings *cmd)
+{
+	struct ipqess *ess = netdev_priv(netdev);
+
+	return phylink_ethtool_ksettings_set(ess->phylink, cmd);
+}
+
+static void ipqess_get_ringparam(struct net_device *netdev,
+				 struct ethtool_ringparam *ring,
+				 struct kernel_ethtool_ringparam *kernel_ering,
+				 struct netlink_ext_ack *extack)
+{
+	ring->tx_max_pending = IPQESS_TX_RING_SIZE;
+	ring->rx_max_pending = IPQESS_RX_RING_SIZE;
+}
+
+static const struct ethtool_ops ipqesstool_ops = {
+	.get_drvinfo = &ipqess_get_drvinfo,
+	.get_link = &ethtool_op_get_link,
+	.get_link_ksettings = &ipqess_get_settings,
+	.set_link_ksettings = &ipqess_set_settings,
+	.get_strings = &ipqess_get_strings,
+	.get_sset_count = &ipqess_get_strset_count,
+	.get_ethtool_stats = &ipqess_get_ethtool_stats,
+	.get_ringparam = ipqess_get_ringparam,
+};
+
+void ipqess_set_ethtool_ops(struct net_device *netdev)
+{
+	netdev->ethtool_ops = &ipqesstool_ops;
+}
-- 
2.36.1


* [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol
  2022-05-14 15:06 [PATCH net-next v2 0/5] net: ipqess: introduce Qualcomm IPQESS driver Maxime Chevallier
  2022-05-14 15:06 ` [PATCH net-next v2 1/5] net: ipqess: introduce the " Maxime Chevallier
@ 2022-05-14 15:06 ` Maxime Chevallier
  2022-05-14 16:33   ` Florian Fainelli
                     ` (2 more replies)
  2022-05-14 15:06 ` [PATCH net-next v2 3/5] net: ipqess: Add out-of-band DSA tagging support Maxime Chevallier
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 24+ messages in thread
From: Maxime Chevallier @ 2022-05-14 15:06 UTC (permalink / raw)
  To: davem, Rob Herring
  Cc: Maxime Chevallier, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Vladimir Oltean, Luka Perkov,
	Robert Marko

This tagging protocol is designed for the situation where the link
between the MAC and the Switch is designed such that the Destination
Port, which is usually embedded in some part of the Ethernet Header, is
sent out-of-band, and isn't present at all in the Ethernet frame.

This can happen when the MAC and Switch are tightly integrated on an
SoC, as is the case with the Qualcomm IPQ4019 for example, where the DSA
tag is inserted directly into the DMA descriptors. In that case,
the MAC driver is responsible for sending the tag to the switch using
the out-of-band medium. To do so, the MAC driver needs to have the
information of the destination port for that skb.

This out-of-band tagging protocol uses the very beginning of the skb
headroom to store the tag. The drawback of this approach is that the
headroom isn't initialized at allocation time, so there is a chance
that the garbage data lying there actually resembles a valid oob tag.
This is only problematic if we are sending/receiving traffic on the
master port, which isn't a valid DSA use-case to begin with. When
dealing with traffic to/from a slave port, the oob tag is initialized
properly by the tagger or the MAC driver through the
dsa_oob_tag_push() call.
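
For illustration only, here is a minimal userspace sketch of the
push/pop scheme described above. It is not the kernel code: struct
model_skb, model_tag_push() and model_tag_pop() are hypothetical
stand-ins, and the headroom check in the push helper is an assumption
that the patch itself does not make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for DSA_TAG_PROTO_OOB (value 26 in this patch). */
#define MODEL_TAG_PROTO_OOB 26

struct oob_tag_info {
	uint16_t proto;
	uint16_t dp;
};

/* Hypothetical model of an skb: buf[0] plays the role of skb->head and
 * the packet data would start 'headroom' bytes later.
 */
struct model_skb {
	unsigned char buf[64];
	size_t headroom;
};

/* Store the tag at the very start of the buffer (the headroom). */
static int model_tag_push(struct model_skb *skb, const struct oob_tag_info *ti)
{
	if (skb->headroom < sizeof(*ti))
		return -1; /* assumption: refuse if the tag does not fit */
	memcpy(skb->buf, ti, sizeof(*ti));
	return 0;
}

/* Read the tag back, rejecting anything that is not marked as oob. */
static int model_tag_pop(const struct model_skb *skb, struct oob_tag_info *ti)
{
	struct oob_tag_info tmp;

	memcpy(&tmp, skb->buf, sizeof(tmp));
	if (tmp.proto != MODEL_TAG_PROTO_OOB)
		return -1;
	*ti = tmp;
	return 0;
}

static void model_tag_selftest(void)
{
	struct oob_tag_info in = { .proto = MODEL_TAG_PROTO_OOB, .dp = 3 };
	struct oob_tag_info out;
	struct model_skb skb;

	/* Uninitialized headroom is modelled as 0xff filler bytes. */
	memset(&skb, 0xff, sizeof(skb));
	skb.headroom = 16;

	assert(model_tag_pop(&skb, &out) != 0); /* filler is not a tag */
	assert(model_tag_push(&skb, &in) == 0);
	assert(model_tag_pop(&skb, &out) == 0);
	assert(out.dp == 3);
}
```

The proto check only rejects filler that happens not to match; as noted
above, garbage that matches the protocol value would still be accepted.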

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---
V1->V2:
 - Reworked the tagging method, putting the tag at skb->head instead
   of putting it into skb->shinfo, as per Andrew, Florian and Vlad's
   reviews

 include/linux/dsa/oob.h | 17 +++++++++
 include/net/dsa.h       |  2 +
 net/dsa/Kconfig         |  7 ++++
 net/dsa/Makefile        |  1 +
 net/dsa/tag_oob.c       | 84 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 111 insertions(+)
 create mode 100644 include/linux/dsa/oob.h
 create mode 100644 net/dsa/tag_oob.c

diff --git a/include/linux/dsa/oob.h b/include/linux/dsa/oob.h
new file mode 100644
index 000000000000..dbb4a6fb1ce4
--- /dev/null
+++ b/include/linux/dsa/oob.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ * Copyright (C) 2022 Maxime Chevallier <maxime.chevallier@bootlin.com>
+ */
+
+#ifndef _NET_DSA_OOB_H
+#define _NET_DSA_OOB_H
+
+#include <linux/skbuff.h>
+
+struct dsa_oob_tag_info {
+	u16 proto;
+	u16 dp;
+};
+
+int dsa_oob_tag_push(struct sk_buff *skb, struct dsa_oob_tag_info *ti);
+int dsa_oob_tag_pop(struct sk_buff *skb, struct dsa_oob_tag_info *ti);
+#endif
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 14e10cda7267..9951df858912 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -53,6 +53,7 @@ struct phylink_link_state;
 #define DSA_TAG_PROTO_SJA1110_VALUE		23
 #define DSA_TAG_PROTO_RTL8_4_VALUE		24
 #define DSA_TAG_PROTO_RTL8_4T_VALUE		25
+#define DSA_TAG_PROTO_OOB_VALUE			26
 
 enum dsa_tag_protocol {
 	DSA_TAG_PROTO_NONE		= DSA_TAG_PROTO_NONE_VALUE,
@@ -81,6 +82,7 @@ enum dsa_tag_protocol {
 	DSA_TAG_PROTO_SJA1110		= DSA_TAG_PROTO_SJA1110_VALUE,
 	DSA_TAG_PROTO_RTL8_4		= DSA_TAG_PROTO_RTL8_4_VALUE,
 	DSA_TAG_PROTO_RTL8_4T		= DSA_TAG_PROTO_RTL8_4T_VALUE,
+	DSA_TAG_PROTO_OOB		= DSA_TAG_PROTO_OOB_VALUE,
 };
 
 struct dsa_switch;
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index 8cb87b5067ee..b7aa4d8552b2 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -57,6 +57,13 @@ config NET_DSA_TAG_HELLCREEK
 	  Say Y or M if you want to enable support for tagging frames
 	  for the Hirschmann Hellcreek TSN switches.
 
+config NET_DSA_TAG_OOB
+	tristate "Tag driver for Out-of-band tagging drivers"
+	help
+	  Say Y or M if you want to enable support for tagging out-of-band. In
+	  that case, the MAC driver becomes responsible for sending the tag to
+	  the switch, outside the inband data.
+
 config NET_DSA_TAG_GSWIP
 	tristate "Tag driver for Lantiq / Intel GSWIP switches"
 	help
diff --git a/net/dsa/Makefile b/net/dsa/Makefile
index 9f75820e7c98..b156e20f9c0a 100644
--- a/net/dsa/Makefile
+++ b/net/dsa/Makefile
@@ -9,6 +9,7 @@ obj-$(CONFIG_NET_DSA_TAG_BRCM_COMMON) += tag_brcm.o
 obj-$(CONFIG_NET_DSA_TAG_DSA_COMMON) += tag_dsa.o
 obj-$(CONFIG_NET_DSA_TAG_GSWIP) += tag_gswip.o
 obj-$(CONFIG_NET_DSA_TAG_HELLCREEK) += tag_hellcreek.o
+obj-$(CONFIG_NET_DSA_TAG_OOB) += tag_oob.o
 obj-$(CONFIG_NET_DSA_TAG_KSZ) += tag_ksz.o
 obj-$(CONFIG_NET_DSA_TAG_LAN9303) += tag_lan9303.o
 obj-$(CONFIG_NET_DSA_TAG_MTK) += tag_mtk.o
diff --git a/net/dsa/tag_oob.c b/net/dsa/tag_oob.c
new file mode 100644
index 000000000000..45ee3df5a7f9
--- /dev/null
+++ b/net/dsa/tag_oob.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2022, Maxime Chevallier <maxime.chevallier@bootlin.com> */
+
+#include <linux/bitfield.h>
+#include <linux/dsa/oob.h>
+
+#include "dsa_priv.h"
+
+#define DSA_OOB_TAG_LEN 4
+
+int dsa_oob_tag_push(struct sk_buff *skb, struct dsa_oob_tag_info *ti)
+{
+	struct dsa_oob_tag_info *tag_info;
+
+	tag_info = (struct dsa_oob_tag_info *)skb->head;
+
+	tag_info->proto = ti->proto;
+	tag_info->dp = ti->dp;
+
+	return 0;
+}
+EXPORT_SYMBOL(dsa_oob_tag_push);
+
+int dsa_oob_tag_pop(struct sk_buff *skb, struct dsa_oob_tag_info *ti)
+{
+	struct dsa_oob_tag_info *tag_info;
+
+	tag_info = (struct dsa_oob_tag_info *)skb->head;
+
+	if (tag_info->proto != DSA_TAG_PROTO_OOB)
+		return -EINVAL;
+
+	ti->proto = tag_info->proto;
+	ti->dp = tag_info->dp;
+
+	return 0;
+}
+EXPORT_SYMBOL(dsa_oob_tag_pop);
+
+static struct sk_buff *oob_tag_xmit(struct sk_buff *skb,
+				    struct net_device *dev)
+{
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct dsa_oob_tag_info tag_info;
+
+	tag_info.dp = dp->index;
+	tag_info.proto = DSA_TAG_PROTO_OOB;
+
+	if (dsa_oob_tag_push(skb, &tag_info))
+		return NULL;
+
+	return skb;
+}
+
+static struct sk_buff *oob_tag_rcv(struct sk_buff *skb,
+				   struct net_device *dev)
+{
+	struct dsa_oob_tag_info tag_info;
+
+	if (dsa_oob_tag_pop(skb, &tag_info))
+		return NULL;
+
+	skb->dev = dsa_master_find_slave(dev, 0, tag_info.dp);
+	if (!skb->dev)
+		return NULL;
+
+	return skb;
+}
+
+const struct dsa_device_ops oob_tag_dsa_ops = {
+	.name	= "oob",
+	.proto	= DSA_TAG_PROTO_OOB,
+	.xmit	= oob_tag_xmit,
+	.rcv	= oob_tag_rcv,
+	.needed_headroom = DSA_OOB_TAG_LEN,
+};
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("DSA tag driver for out-of-band tagging");
+MODULE_AUTHOR("Maxime Chevallier <maxime.chevallier@bootlin.com>");
+MODULE_ALIAS_DSA_TAG_DRIVER(DSA_TAG_PROTO_OOB);
+
+module_dsa_tag_driver(oob_tag_dsa_ops);
-- 
2.36.1



* [PATCH net-next v2 3/5] net: ipqess: Add out-of-band DSA tagging support
  2022-05-14 15:06 [PATCH net-next v2 0/5] net: ipqess: introduce Qualcomm IPQESS driver Maxime Chevallier
  2022-05-14 15:06 ` [PATCH net-next v2 1/5] net: ipqess: introduce the " Maxime Chevallier
  2022-05-14 15:06 ` [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol Maxime Chevallier
@ 2022-05-14 15:06 ` Maxime Chevallier
  2022-05-14 15:06 ` [PATCH net-next v2 4/5] net: dt-bindings: Introduce the Qualcomm IPQESS Ethernet controller Maxime Chevallier
  2022-05-14 15:06 ` [PATCH net-next v2 5/5] ARM: dts: qcom: ipq4019: Add description for the " Maxime Chevallier
  4 siblings, 0 replies; 24+ messages in thread
From: Maxime Chevallier @ 2022-05-14 15:06 UTC (permalink / raw)
  To: davem, Rob Herring
  Cc: Maxime Chevallier, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Vladimir Oltean, Luka Perkov,
	Robert Marko

On the IPQ4019, there's a 5-port switch connected to the CPU through
the IPQESS Ethernet controller. The DSA tag is sent out to that switch
through the DMA descriptor, due to how tightly the controller is
integrated with the switch.

This commit uses the out-of-band tagging protocol: on the RX side, the
driver gets the source port from the descriptor and pushes it into the
skb, and the tagger then pulls it to infer the destination netdev. The
reverse process is done on the TX side, where the driver pulls the tag
from the skb and builds the descriptor accordingly.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---
V1->V2:
 - Use the new tagger, and the dsa_oob_tag_* helpers

 drivers/net/ethernet/qualcomm/Kconfig         |  3 ++-
 drivers/net/ethernet/qualcomm/ipqess/ipqess.c | 27 +++++++++++++++++++
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qualcomm/Kconfig b/drivers/net/ethernet/qualcomm/Kconfig
index a723ddbea248..eeb2c608d6b9 100644
--- a/drivers/net/ethernet/qualcomm/Kconfig
+++ b/drivers/net/ethernet/qualcomm/Kconfig
@@ -62,8 +62,9 @@ config QCOM_EMAC
 
 config QCOM_IPQ4019_ESS_EDMA
 	tristate "Qualcomm Atheros IPQ4019 ESS EDMA support"
-	depends on OF
+	depends on OF && NET_DSA
 	select PHYLINK
+	select NET_DSA_TAG_OOB
 	help
 	  This driver supports the Qualcomm Atheros IPQ40xx built-in
 	  ESS EDMA ethernet controller.
diff --git a/drivers/net/ethernet/qualcomm/ipqess/ipqess.c b/drivers/net/ethernet/qualcomm/ipqess/ipqess.c
index b11f11f23c11..a068dff19943 100644
--- a/drivers/net/ethernet/qualcomm/ipqess/ipqess.c
+++ b/drivers/net/ethernet/qualcomm/ipqess/ipqess.c
@@ -9,6 +9,7 @@
 
 #include <linux/bitfield.h>
 #include <linux/clk.h>
+#include <linux/dsa/oob.h>
 #include <linux/if_vlan.h>
 #include <linux/interrupt.h>
 #include <linux/module.h>
@@ -22,6 +23,7 @@
 #include <linux/skbuff.h>
 #include <linux/vmalloc.h>
 #include <net/checksum.h>
+#include <net/dsa.h>
 #include <net/ip6_checksum.h>
 
 #include "ipqess.h"
@@ -334,6 +336,7 @@ static int ipqess_rx_poll(struct ipqess_rx_ring *rx_ring, int budget)
 	tail &= IPQESS_RFD_CONS_IDX_MASK;
 
 	while (done < budget) {
+		struct dsa_oob_tag_info tag_info;
 		struct ipqess_rx_desc *rd;
 		struct sk_buff *skb;
 
@@ -413,6 +416,12 @@ static int ipqess_rx_poll(struct ipqess_rx_ring *rx_ring, int budget)
 			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021AD),
 					       rd->rrd4);
 
+		if (netdev_uses_dsa(rx_ring->ess->netdev)) {
+			tag_info.dp = FIELD_GET(IPQESS_RRD_PORT_ID_MASK, rd->rrd1);
+			tag_info.proto = DSA_TAG_PROTO_OOB;
+			dsa_oob_tag_push(skb, &tag_info);
+		}
+
 		napi_gro_receive(&rx_ring->napi_rx, skb);
 
 		rx_ring->ess->stats.rx_packets++;
@@ -727,6 +736,22 @@ static void ipqess_rollback_tx(struct ipqess *eth,
 	tx_ring->head = start_index;
 }
 
+static void ipqess_process_dsa_tag_sh(struct ipqess *ess, struct sk_buff *skb,
+				      u32 *word3)
+{
+	struct dsa_oob_tag_info tag_info;
+
+	if (!netdev_uses_dsa(ess->netdev))
+		return;
+
+	if (dsa_oob_tag_pop(skb, &tag_info))
+		return;
+
+	*word3 |= tag_info.dp << IPQESS_TPD_PORT_BITMAP_SHIFT;
+	*word3 |= BIT(IPQESS_TPD_FROM_CPU_SHIFT);
+	*word3 |= 0x3e << IPQESS_TPD_PORT_BITMAP_SHIFT;
+}
+
 static int ipqess_tx_map_and_fill(struct ipqess_tx_ring *tx_ring,
 				  struct sk_buff *skb)
 {
@@ -737,6 +762,8 @@ static int ipqess_tx_map_and_fill(struct ipqess_tx_ring *tx_ring,
 	u16 len;
 	int i;
 
+	ipqess_process_dsa_tag_sh(tx_ring->ess, skb, &word3);
+
 	if (skb_is_gso(skb)) {
 		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
 			lso_word1 |= IPQESS_TPD_IPV4_EN;
-- 
2.36.1



* [PATCH net-next v2 4/5] net: dt-bindings: Introduce the Qualcomm IPQESS Ethernet controller
  2022-05-14 15:06 [PATCH net-next v2 0/5] net: ipqess: introduce Qualcomm IPQESS driver Maxime Chevallier
                   ` (2 preceding siblings ...)
  2022-05-14 15:06 ` [PATCH net-next v2 3/5] net: ipqess: Add out-of-band DSA tagging support Maxime Chevallier
@ 2022-05-14 15:06 ` Maxime Chevallier
  2022-05-18  0:52   ` Rob Herring
  2022-05-14 15:06 ` [PATCH net-next v2 5/5] ARM: dts: qcom: ipq4019: Add description for the " Maxime Chevallier
  4 siblings, 1 reply; 24+ messages in thread
From: Maxime Chevallier @ 2022-05-14 15:06 UTC (permalink / raw)
  To: davem, Rob Herring
  Cc: Maxime Chevallier, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Vladimir Oltean, Luka Perkov,
	Robert Marko

Add the DT binding for the IPQESS Ethernet Controller. This is a simple
controller, only requiring the phy-mode, interrupts, clocks, and
possibly a MAC address setting.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---
V1->V2:
 - Fixed the example
 - Added reset and clocks
 - Removed generic ethernet attributes

 .../devicetree/bindings/net/qcom,ipqess.yaml  | 104 ++++++++++++++++++
 1 file changed, 104 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/qcom,ipqess.yaml

diff --git a/Documentation/devicetree/bindings/net/qcom,ipqess.yaml b/Documentation/devicetree/bindings/net/qcom,ipqess.yaml
new file mode 100644
index 000000000000..ea0023509737
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/qcom,ipqess.yaml
@@ -0,0 +1,104 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/qcom,ipqess.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Qualcomm IPQ ESS EDMA Ethernet Controller
+
+maintainers:
+  - Maxime Chevallier <maxime.chevallier@bootlin.com>
+
+allOf:
+  - $ref: "ethernet-controller.yaml#"
+
+properties:
+  compatible:
+    const: qcom,ipq4019-ess-edma
+
+  reg:
+    maxItems: 1
+
+  interrupts:
+    minItems: 2
+    maxItems: 32
+    description: One interrupt per tx and rx queue, with up to 16 queues.
+
+  clocks:
+    maxItems: 1
+
+  clock-names:
+    const: ess
+
+  resets:
+    maxItems: 1
+
+  reset-names:
+    const: ess
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+  - clock-names
+  - resets
+  - phy-mode
+
+unevaluatedProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/clock/qcom,gcc-ipq4019.h>
+    #include <dt-bindings/interrupt-controller/arm-gic.h>
+    #include <dt-bindings/interrupt-controller/irq.h>
+    gmac: ethernet@c080000 {
+        compatible = "qcom,ipq4019-ess-edma";
+        reg = <0xc080000 0x8000>;
+        resets = <&gcc ESS_RESET>;
+        reset-names = "ess";
+        clocks = <&gcc GCC_ESS_CLK>;
+        clock-names = "ess";
+        interrupts = <GIC_SPI  65 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  66 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  67 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  68 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  69 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  70 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  71 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  72 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  73 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  74 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  75 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  76 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  77 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  78 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  79 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI  80 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 240 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 241 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 242 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 243 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 244 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 245 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 246 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 247 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 248 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 249 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 250 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 251 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 252 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 253 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 254 IRQ_TYPE_EDGE_RISING>,
+                     <GIC_SPI 255 IRQ_TYPE_EDGE_RISING>;
+
+        phy-mode = "internal";
+        fixed-link {
+            speed = <1000>;
+            full-duplex;
+            pause;
+            asym-pause;
+        };
+    };
+
+...
-- 
2.36.1



* [PATCH net-next v2 5/5] ARM: dts: qcom: ipq4019: Add description for the IPQESS Ethernet controller
  2022-05-14 15:06 [PATCH net-next v2 0/5] net: ipqess: introduce Qualcomm IPQESS driver Maxime Chevallier
                   ` (3 preceding siblings ...)
  2022-05-14 15:06 ` [PATCH net-next v2 4/5] net: dt-bindings: Introduce the Qualcomm IPQESS Ethernet controller Maxime Chevallier
@ 2022-05-14 15:06 ` Maxime Chevallier
  4 siblings, 0 replies; 24+ messages in thread
From: Maxime Chevallier @ 2022-05-14 15:06 UTC (permalink / raw)
  To: davem, Rob Herring
  Cc: Maxime Chevallier, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Vladimir Oltean, Luka Perkov,
	Robert Marko

The Qualcomm IPQ4019 includes an internal 5-port switch, which is
connected to the CPU through the internal IPQESS Ethernet controller.

This commit adds support for this internal interface, which is
internally connected to a modified version of the QCA8K Ethernet switch.

This Ethernet controller only supports a specific internal interface mode
for connection to the switch.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---
V1->V2:
 - Added clock and resets

 arch/arm/boot/dts/qcom-ipq4019.dtsi | 46 +++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/arch/arm/boot/dts/qcom-ipq4019.dtsi b/arch/arm/boot/dts/qcom-ipq4019.dtsi
index cac92dde040f..1afabee37fc6 100644
--- a/arch/arm/boot/dts/qcom-ipq4019.dtsi
+++ b/arch/arm/boot/dts/qcom-ipq4019.dtsi
@@ -38,6 +38,7 @@ aliases {
 		spi1 = &blsp1_spi2;
 		i2c0 = &blsp1_i2c3;
 		i2c1 = &blsp1_i2c4;
+		ethernet0 = &gmac;
 	};
 
 	cpus {
@@ -668,6 +669,51 @@ swport5: port@5 { /* MAC5 */
 			};
 		};
 
+		gmac: ethernet@c080000 {
+			compatible = "qcom,ipq4019-ess-edma";
+			reg = <0xc080000 0x8000>;
+			resets = <&gcc ESS_RESET>;
+			reset-names = "ess";
+			clocks = <&gcc GCC_ESS_CLK>;
+			clock-names = "ess";
+			interrupts = <GIC_SPI  65 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  66 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  67 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  68 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  69 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  70 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  71 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  72 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  73 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  74 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  75 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  76 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  77 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  78 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  79 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI  80 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 240 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 241 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 242 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 243 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 244 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 245 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 246 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 247 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 248 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 249 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 250 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 251 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 252 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 253 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 254 IRQ_TYPE_EDGE_RISING>,
+				     <GIC_SPI 255 IRQ_TYPE_EDGE_RISING>;
+
+			status = "disabled";
+
+			phy-mode = "internal";
+		};
+
 		mdio: mdio@90000 {
 			#address-cells = <1>;
 			#size-cells = <0>;
-- 
2.36.1



* Re: [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol
  2022-05-14 15:06 ` [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol Maxime Chevallier
@ 2022-05-14 16:33   ` Florian Fainelli
  2022-05-17  7:06     ` Maxime Chevallier
  2022-05-14 22:40   ` Vladimir Oltean
  2022-05-16 19:20   ` Jakub Kicinski
  2 siblings, 1 reply; 24+ messages in thread
From: Florian Fainelli @ 2022-05-14 16:33 UTC (permalink / raw)
  To: Maxime Chevallier, davem, Rob Herring
  Cc: netdev, linux-kernel, devicetree, thomas.petazzoni, Andrew Lunn,
	Heiner Kallweit, Russell King, linux-arm-kernel, Vladimir Oltean,
	Luka Perkov, Robert Marko

Hi Maxime,

On 5/14/2022 8:06 AM, Maxime Chevallier wrote:
> This tagging protocol is designed for the situation where the link
> between the MAC and the Switch is designed such that the Destination
> Port, which is usually embedded in some part of the Ethernet Header, is
> sent out-of-band, and isn't present at all in the Ethernet frame.
> 
> This can happen when the MAC and Switch are tightly integrated on an
> SoC, as is the case with the Qualcomm IPQ4019 for example, where the DSA
> tag is inserted directly into the DMA descriptors. In that case,
> the MAC driver is responsible for sending the tag to the switch using
> the out-of-band medium. To do so, the MAC driver needs to have the
> information of the destination port for that skb.
> 
> This out-of-band tagging protocol uses the very beginning of the skb
> headroom to store the tag. The drawback of this approach is that the
> headroom isn't initialized at allocation time, so there is a chance
> that the garbage data lying there actually resembles a valid oob tag.
> This is only problematic if we are sending/receiving traffic on the
> master port, which isn't a valid DSA use-case to begin with. When
> dealing with traffic to/from a slave port, the oob tag is initialized
> properly by the tagger or the MAC driver through the
> dsa_oob_tag_push() call.

What I like about your approach is that you have aligned the way an out 
of band switch tag is communicated to the networking stack the same way 
that an "in-band" switch tag would be communicated. I think this is a 
good way forward to provide the out of band tag and I don't think it 
creates a performance problem because the Ethernet frame is hot in the 
cache (dma_unmap_single()) and we already have an "expensive" read of 
the DMA descriptor in coherent memory anyway.

You could possibly optimize the data flow a bit to limit the amount of
sk_buff data movement by asking your Ethernet controller to DMA into the
data buffer N bytes past its beginning. That way, if you have reserved,
say, 2 bytes at the front of the data buffer, you can deposit the QCA tag
there and you do not need to push, process the tag, then pop it; you just
process and pop. Consider using the 2-byte stuffing that the Ethernet
controller might be adding to the beginning of the Ethernet frame to
align the IP header on a 4-byte boundary to provide the tag in there?

If we want to have a generic out of band tagger like you propose, it
seems to me that we will need to invent a synthetic DSA tagging format
which is the largest common denominator of the out of band tags that we
want to support. We could imagine being more compact in the
representation, for instance by using a u8 for storing a bitmask of
ports (works for both RX and TX then) and another u8 for various packet
forwarding reasons.

Then we would request the various Ethernet MAC drivers to marshall their 
proprietary tag into the DSA synthetic one on receive, and unmarshall it 
on transmit.
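
To make the idea concrete, a rough userspace sketch of such a synthetic
tag could look like the following. Everything here is hypothetical: the
struct synth_oob_tag layout, the VENDOR_* bit positions and both helper
names are made up for illustration and do not describe the IPQESS
descriptor format or any existing DSA API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical compact synthetic tag: one byte of port bitmask (usable
 * for both RX and TX) and one byte of forwarding reason.
 */
struct synth_oob_tag {
	uint8_t port_mask;
	uint8_t reason;
};

/* Made-up vendor descriptor layout, for illustration only. */
#define VENDOR_RX_PORT_SHIFT	12
#define VENDOR_RX_PORT_MASK	(0x7u << VENDOR_RX_PORT_SHIFT)
#define VENDOR_TX_BITMAP_SHIFT	18
#define VENDOR_TX_BITMAP_MASK	(0x3fu << VENDOR_TX_BITMAP_SHIFT)

/* Receive: marshal the vendor source port index into the synthetic tag. */
static struct synth_oob_tag synth_tag_from_rx(uint32_t rx_word)
{
	unsigned int port = (rx_word & VENDOR_RX_PORT_MASK) >>
			    VENDOR_RX_PORT_SHIFT;
	struct synth_oob_tag tag = { .port_mask = 0, .reason = 0 };

	tag.port_mask = (uint8_t)(1u << port);
	return tag;
}

/* Transmit: unmarshal the synthetic tag into the vendor TX bitmap field. */
static uint32_t synth_tag_to_tx(struct synth_oob_tag tag)
{
	return ((uint32_t)tag.port_mask << VENDOR_TX_BITMAP_SHIFT) &
	       VENDOR_TX_BITMAP_MASK;
}

static void synth_tag_selftest(void)
{
	struct synth_oob_tag tag = synth_tag_from_rx(3u << VENDOR_RX_PORT_SHIFT);

	assert(tag.port_mask == 0x08);
	assert(synth_tag_to_tx(tag) == (0x08u << VENDOR_TX_BITMAP_SHIFT));
}
```

With something along these lines, only the two conversion helpers would
need to know about the vendor format.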

Another approach IMHO which maybe helps the maintainability of the code 
moving forward as well as ensuring that all Ethernet switch tagging code 
lives in one place, is to teach each tagger driver how to optimize their 
data paths to minimize the amount of data movements and checksum 
re-calculations, this is what I had in mind a few years ago:

https://lore.kernel.org/lkml/1438322920.20182.144.camel@edumazet-glaptop2.roam.corp.google.com/T/

This might scale a little less well, and maybe this makes too many 
assumptions as to where and how the checksums are calculated on the 
packet contents, but at least, you don't have logic processing the same 
type of switch tag scattered between the Ethernet MAC drivers (beyond 
copying/pushing) and DSA switch taggers.

I would like to hear others' opinions on this.
-- 
Florian


* Re: [PATCH net-next v2 1/5] net: ipqess: introduce the Qualcomm IPQESS driver
  2022-05-14 15:06 ` [PATCH net-next v2 1/5] net: ipqess: introduce the " Maxime Chevallier
@ 2022-05-14 17:18   ` Russell King (Oracle)
  2022-05-17  7:09     ` Maxime Chevallier
  2022-05-14 20:44   ` Vladimir Oltean
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Russell King (Oracle) @ 2022-05-14 17:18 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Rob Herring, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko

On Sat, May 14, 2022 at 05:06:52PM +0200, Maxime Chevallier wrote:
> +static int ipqess_do_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
> +{
> +	struct ipqess *ess = netdev_priv(netdev);
> +
> +	switch (cmd) {
> +	case SIOCGMIIPHY:
> +	case SIOCGMIIREG:
> +	case SIOCSMIIREG:
> +		return phylink_mii_ioctl(ess->phylink, ifr, cmd);
> +	default:
> +		break;
> +	}
> +
> +	return -EOPNOTSUPP;
> +}

Is there a reason this isn't just:

	return phylink_mii_ioctl(ess->phylink, ifr, cmd);

?

> +static int ipqess_axi_probe(struct platform_device *pdev)
> +{
> +	struct device_node *np = pdev->dev.of_node;
> +	struct net_device *netdev;
> +	phy_interface_t phy_mode;
> +	struct resource *res;
> +	struct ipqess *ess;
> +	int i, err = 0;
> +
> +	netdev = devm_alloc_etherdev_mqs(&pdev->dev, sizeof(struct ipqess),
> +					 IPQESS_NETDEV_QUEUES,
> +					 IPQESS_NETDEV_QUEUES);
> +	if (!netdev)
> +		return -ENOMEM;
> +
> +	ess = netdev_priv(netdev);
> +	ess->netdev = netdev;
> +	ess->pdev = pdev;
> +	spin_lock_init(&ess->stats_lock);
> +	SET_NETDEV_DEV(netdev, &pdev->dev);
> +	platform_set_drvdata(pdev, netdev);
> +
> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +	ess->hw_addr = devm_ioremap_resource(&pdev->dev, res);
> +	if (IS_ERR(ess->hw_addr))
> +		return PTR_ERR(ess->hw_addr);
> +
> +	err = of_get_phy_mode(np, &phy_mode);
> +	if (err) {
> +		dev_err(&pdev->dev, "incorrect phy-mode\n");
> +		return err;
> +	}
> +
> +	ess->ess_clk = devm_clk_get(&pdev->dev, "ess");
> +	if (!IS_ERR(ess->ess_clk))
> +		clk_prepare_enable(ess->ess_clk);
> +
> +	ess->ess_rst = devm_reset_control_get(&pdev->dev, "ess");
> +	if (IS_ERR(ess->ess_rst))
> +		goto err_clk;
> +
> +	ipqess_reset(ess);
> +
> +	ess->phylink_config.dev = &netdev->dev;
> +	ess->phylink_config.type = PHYLINK_NETDEV;
> +
> +	__set_bit(PHY_INTERFACE_MODE_INTERNAL,
> +		  ess->phylink_config.supported_interfaces);

No mac capabilities?

> +
> +	ess->phylink = phylink_create(&ess->phylink_config,
> +				      of_fwnode_handle(np), phy_mode,
> +				      &ipqess_phylink_mac_ops);

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


* Re: [PATCH net-next v2 1/5] net: ipqess: introduce the Qualcomm IPQESS driver
  2022-05-14 15:06 ` [PATCH net-next v2 1/5] net: ipqess: introduce the " Maxime Chevallier
  2022-05-14 17:18   ` Russell King (Oracle)
@ 2022-05-14 20:44   ` Vladimir Oltean
  2022-05-17  7:11     ` Maxime Chevallier
  2022-05-16  2:51   ` Andrew Lunn
  2022-05-17 21:03   ` Christophe JAILLET
  3 siblings, 1 reply; 24+ messages in thread
From: Vladimir Oltean @ 2022-05-14 20:44 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Rob Herring, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Luka Perkov, Robert Marko

On Sat, May 14, 2022 at 05:06:52PM +0200, Maxime Chevallier wrote:
> +/* locking is handled by the caller */
> +static int ipqess_rx_buf_alloc_napi(struct ipqess_rx_ring *rx_ring)
> +{
> +	struct ipqess_buf *buf = &rx_ring->buf[rx_ring->head];
> +
> +	buf->skb = napi_alloc_skb(&rx_ring->napi_rx, IPQESS_RX_HEAD_BUFF_SIZE);
> +	if (!buf->skb)
> +		return -ENOMEM;
> +
> +	return ipqess_rx_buf_prepare(buf, rx_ring);
> +}
> +
> +static int ipqess_rx_buf_alloc(struct ipqess_rx_ring *rx_ring)
> +{
> +	struct ipqess_buf *buf = &rx_ring->buf[rx_ring->head];
> +
> +	buf->skb = netdev_alloc_skb_ip_align(rx_ring->ess->netdev,
> +					     IPQESS_RX_HEAD_BUFF_SIZE);
> +
> +	if (!buf->skb)
> +		return -ENOMEM;
> +
> +	return ipqess_rx_buf_prepare(buf, rx_ring);
> +}
> +
> +static void ipqess_refill_work(struct work_struct *work)
> +{
> +	struct ipqess_rx_ring_refill *rx_refill = container_of(work,
> +		struct ipqess_rx_ring_refill, refill_work);
> +	struct ipqess_rx_ring *rx_ring = rx_refill->rx_ring;
> +	int refill = 0;
> +
> +	/* don't let this loop by accident. */
> +	while (atomic_dec_and_test(&rx_ring->refill_count)) {
> +		napi_disable(&rx_ring->napi_rx);
> +		if (ipqess_rx_buf_alloc(rx_ring)) {
> +			refill++;
> +			dev_dbg(rx_ring->ppdev,
> +				"Not all buffers were reallocated");
> +		}
> +		napi_enable(&rx_ring->napi_rx);
> +	}
> +
> +	if (atomic_add_return(refill, &rx_ring->refill_count))
> +		schedule_work(&rx_refill->refill_work);
> +}
> +
> +static int ipqess_rx_poll(struct ipqess_rx_ring *rx_ring, int budget)
> +{

> +	while (done < budget) {

> +		num_desc += atomic_xchg(&rx_ring->refill_count, 0);
> +		while (num_desc) {
> +			if (ipqess_rx_buf_alloc_napi(rx_ring)) {
> +				num_desc = atomic_add_return(num_desc,
> +							     &rx_ring->refill_count);
> +				if (num_desc >= ((4 * IPQESS_RX_RING_SIZE + 6) / 7))

DIV_ROUND_UP(IPQESS_RX_RING_SIZE * 4, 7)
Also, why this number?
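
As a quick sanity check that the two forms agree, here is a small
userspace sketch. DIV_ROUND_UP is redefined locally with the usual
kernel definition, and the ring sizes used are arbitrary sample values,
not the driver's IPQESS_RX_RING_SIZE.

```c
#include <assert.h>

/* Local copy of the usual kernel helper, for userspace testing. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Threshold as open-coded in the patch. */
static unsigned int refill_threshold_open_coded(unsigned int ring_size)
{
	return (4 * ring_size + 6) / 7;
}

/* Threshold written with DIV_ROUND_UP, as suggested. */
static unsigned int refill_threshold_macro(unsigned int ring_size)
{
	return DIV_ROUND_UP(ring_size * 4, 7);
}

static void refill_threshold_selftest(void)
{
	unsigned int n;

	/* Both forms compute ceil(4 * n / 7) for every sample size. */
	for (n = 0; n <= 4096; n++)
		assert(refill_threshold_open_coded(n) ==
		       refill_threshold_macro(n));
}
```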

> +					schedule_work(&rx_ring->ess->rx_refill[rx_ring->ring_id].refill_work);
> +				break;
> +			}
> +			num_desc--;
> +		}
> +	}
> +
> +	ipqess_w32(rx_ring->ess, IPQESS_REG_RX_SW_CONS_IDX_Q(rx_ring->idx),
> +		   rx_ring_tail);
> +	rx_ring->tail = rx_ring_tail;
> +
> +	return done;
> +}

> +static void ipqess_rx_ring_free(struct ipqess *ess)
> +{
> +	int i;
> +
> +	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
> +		int j;
> +
> +		atomic_set(&ess->rx_ring[i].refill_count, 0);
> +		cancel_work_sync(&ess->rx_refill[i].refill_work);

When refill_work is currently scheduled and executing the while loop,
will refill_count underflow due to the possibility of calling
atomic_dec_and_test(0)?
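
The arithmetic I am worried about can be modelled in userspace with C11
atomics. This is only a sketch: model_dec_and_test() is a stand-in for
the kernel's atomic_dec_and_test(), and it says nothing about the actual
concurrency between the work item and ipqess_rx_ring_free().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for atomic_dec_and_test(): decrement, then report whether
 * the new value is zero.
 */
static bool model_dec_and_test(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) - 1 == 0;
}

/* Model of the refill_work loop condition; returns how many times the
 * loop body would run.
 */
static int model_refill_loop(atomic_int *refill_count)
{
	int iterations = 0;

	while (model_dec_and_test(refill_count))
		iterations++;

	return iterations;
}

static void model_refill_selftest(void)
{
	atomic_int count;

	/* Counter already reset to 0: the body never runs, but the
	 * counter is left at -1.
	 */
	atomic_init(&count, 0);
	assert(model_refill_loop(&count) == 0);
	assert(atomic_load(&count) == -1);
}
```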

> +
> +		for (j = 0; j < IPQESS_RX_RING_SIZE; j++) {
> +			dma_unmap_single(&ess->pdev->dev,
> +					 ess->rx_ring[i].buf[j].dma,
> +					 ess->rx_ring[i].buf[j].length,
> +					 DMA_FROM_DEVICE);
> +			dev_kfree_skb_any(ess->rx_ring[i].buf[j].skb);
> +		}
> +	}
> +}


* Re: [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol
  2022-05-14 15:06 ` [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol Maxime Chevallier
  2022-05-14 16:33   ` Florian Fainelli
@ 2022-05-14 22:40   ` Vladimir Oltean
  2022-05-17  7:01     ` Maxime Chevallier
  2022-05-16 19:20   ` Jakub Kicinski
  2 siblings, 1 reply; 24+ messages in thread
From: Vladimir Oltean @ 2022-05-14 22:40 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Rob Herring, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Luka Perkov, Robert Marko

On Sat, May 14, 2022 at 05:06:53PM +0200, Maxime Chevallier wrote:
> This tagging protocol is designed for the situation where the link
> between the MAC and the Switch is designed such that the Destination
> Port, which is usually embedded in some part of the Ethernet Header, is
> sent out-of-band, and isn't present at all in the Ethernet frame.
> 
> This can happen when the MAC and Switch are tightly integrated on an
> SoC, as is the case with the Qualcomm IPQ4019 for example, where the DSA
> tag is inserted directly into the DMA descriptors. In that case,
> the MAC driver is responsible for sending the tag to the switch using
> the out-of-band medium. To do so, the MAC driver needs to have the
> information of the destination port for that skb.
> 
> This out-of-band tagging protocol is using the very beginning of the skb
> headroom to store the tag. The drawback of this approach is that the
> headroom isn't initialized upon allocating it, therefore we have a
> chance that the garbage data that lies there at allocation time actually
> resembles a valid oob tag. This is only problematic if we are
> sending/receiving traffic on the master port, which isn't a valid DSA
> use-case from the beginning. When dealing with traffic to/from a slave
> port, then the oob tag will be initialized properly by the tagger or the
> mac driver through the use of the dsa_oob_tag_push() call.
> 
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
> ---

Why put the DSA pseudo-header at skb->head rather than push it using
skb_push()? I thought you were going to check for the presence of a DSA
header using something like skb->mac_len == ETH_HLEN + tag len, but
right now it sounds like treating garbage in the headroom as a valid DSA
tag is indeed a potential problem. If you can't sort that out using
information from the header offsets alone, maybe an skb extension is
required?


* Re: [PATCH net-next v2 1/5] net: ipqess: introduce the Qualcomm IPQESS driver
  2022-05-14 15:06 ` [PATCH net-next v2 1/5] net: ipqess: introduce the " Maxime Chevallier
  2022-05-14 17:18   ` Russell King (Oracle)
  2022-05-14 20:44   ` Vladimir Oltean
@ 2022-05-16  2:51   ` Andrew Lunn
  2022-05-17  7:13     ` Maxime Chevallier
  2022-05-17 21:03   ` Christophe JAILLET
  3 siblings, 1 reply; 24+ messages in thread
From: Andrew Lunn @ 2022-05-16  2:51 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Rob Herring, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Vladimir Oltean, Luka Perkov,
	Robert Marko

> +static int ipqess_tx_ring_alloc(struct ipqess *ess)
> +{
> +	struct device *dev = &ess->pdev->dev;
> +	int i;
> +
> +	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
> +		struct ipqess_tx_ring *tx_ring = &ess->tx_ring[i];
> +		size_t size;
> +		u32 idx;
> +
> +		tx_ring->ess = ess;
> +		tx_ring->ring_id = i;
> +		tx_ring->idx = i * 4;
> +		tx_ring->count = IPQESS_TX_RING_SIZE;
> +		tx_ring->nq = netdev_get_tx_queue(ess->netdev, i);
> +
> +		size = sizeof(struct ipqess_buf) * IPQESS_TX_RING_SIZE;
> +		tx_ring->buf = devm_kzalloc(dev, size, GFP_KERNEL);
> +		if (!tx_ring->buf) {
> +			netdev_err(ess->netdev, "buffer alloc of tx ring failed");
> +			return -ENOMEM;
> +		}

kzalloc() is pretty loud when there is no memory. So you see patches
removing such warning messages.

> +static int ipqess_rx_napi(struct napi_struct *napi, int budget)
> +{
> +	struct ipqess_rx_ring *rx_ring = container_of(napi, struct ipqess_rx_ring,
> +						    napi_rx);
> +	struct ipqess *ess = rx_ring->ess;
> +	u32 rx_mask = BIT(rx_ring->idx);
> +	int remain_budget = budget;
> +	int rx_done;
> +	u32 status;
> +
> +poll_again:
> +	ipqess_w32(ess, IPQESS_REG_RX_ISR, rx_mask);
> +	rx_done = ipqess_rx_poll(rx_ring, remain_budget);
> +
> +	if (rx_done == remain_budget)
> +		return budget;
> +
> +	status = ipqess_r32(ess, IPQESS_REG_RX_ISR);
> +	if (status & rx_mask) {
> +		remain_budget -= rx_done;
> +		goto poll_again;
> +	}

Could this be turned into a do while() loop?

> +static void ipqess_irq_enable(struct ipqess *ess)
> +{
> +	int i;
> +
> +	ipqess_w32(ess, IPQESS_REG_RX_ISR, 0xff);
> +	ipqess_w32(ess, IPQESS_REG_TX_ISR, 0xffff);
> +	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
> +		ipqess_w32(ess, IPQESS_REG_RX_INT_MASK_Q(ess->rx_ring[i].idx), 1);
> +		ipqess_w32(ess, IPQESS_REG_TX_INT_MASK_Q(ess->tx_ring[i].idx), 1);
> +	}
> +}
> +
> +static void ipqess_irq_disable(struct ipqess *ess)
> +{
> +	int i;
> +
> +	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
> +		ipqess_w32(ess, IPQESS_REG_RX_INT_MASK_Q(ess->rx_ring[i].idx), 0);
> +		ipqess_w32(ess, IPQESS_REG_TX_INT_MASK_Q(ess->tx_ring[i].idx), 0);
> +	}
> +}

Enable and disable are not symmetric?


> +static inline void ipqess_kick_tx(struct ipqess_tx_ring *tx_ring)

No inline functions please in .c files. Let the compiler decide.

   Andrew


* Re: [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol
  2022-05-14 15:06 ` [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol Maxime Chevallier
  2022-05-14 16:33   ` Florian Fainelli
  2022-05-14 22:40   ` Vladimir Oltean
@ 2022-05-16 19:20   ` Jakub Kicinski
  2022-05-17  6:53     ` Maxime Chevallier
  2 siblings, 1 reply; 24+ messages in thread
From: Jakub Kicinski @ 2022-05-16 19:20 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Rob Herring, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Vladimir Oltean, Luka Perkov,
	Robert Marko

On Sat, 14 May 2022 17:06:53 +0200 Maxime Chevallier wrote:
> This tagging protocol is designed for the situation where the link
> between the MAC and the Switch is designed such that the Destination
> Port, which is usually embedded in some part of the Ethernet Header, is
> sent out-of-band, and isn't present at all in the Ethernet frame.
> 
> This can happen when the MAC and Switch are tightly integrated on an
> SoC, as is the case with the Qualcomm IPQ4019 for example, where the DSA
> tag is inserted directly into the DMA descriptors. In that case,
> the MAC driver is responsible for sending the tag to the switch using
> the out-of-band medium. To do so, the MAC driver needs to have the
> information of the destination port for that skb.
> 
> This out-of-band tagging protocol is using the very beginning of the skb
> headroom to store the tag. The drawback of this approach is that the
> headroom isn't initialized upon allocating it, therefore we have a
> chance that the garbage data that lies there at allocation time actually
> resembles a valid oob tag. This is only problematic if we are
> sending/receiving traffic on the master port, which isn't a valid DSA
> use-case from the beginning. When dealing with traffic to/from a slave
> port, then the oob tag will be initialized properly by the tagger or the
> mac driver through the use of the dsa_oob_tag_push() call.
> 
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

This must have been asked on v1 but there's no trace of it in the
current submission afaict...

If the tag is passed in the descriptor how is this not a pure switchdev
driver? The explanation must be preserved somehow.


* Re: [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol
  2022-05-16 19:20   ` Jakub Kicinski
@ 2022-05-17  6:53     ` Maxime Chevallier
  2022-05-17 20:58       ` Jakub Kicinski
  0 siblings, 1 reply; 24+ messages in thread
From: Maxime Chevallier @ 2022-05-17  6:53 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, Rob Herring, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Vladimir Oltean, Luka Perkov,
	Robert Marko

Hello Jakub,

On Mon, 16 May 2022 12:20:48 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> On Sat, 14 May 2022 17:06:53 +0200 Maxime Chevallier wrote:
> > This tagging protocol is designed for the situation where the link
> > between the MAC and the Switch is designed such that the Destination
> > Port, which is usually embedded in some part of the Ethernet
> > Header, is sent out-of-band, and isn't present at all in the
> > Ethernet frame.
> > 
> > This can happen when the MAC and Switch are tightly integrated on an
> > SoC, as is the case with the Qualcomm IPQ4019 for example, where
> > the DSA tag is inserted directly into the DMA descriptors. In that
> > case, the MAC driver is responsible for sending the tag to the
> > switch using the out-of-band medium. To do so, the MAC driver needs
> > to have the information of the destination port for that skb.
> > 
> > This out-of-band tagging protocol is using the very beginning of
> > the skb headroom to store the tag. The drawback of this approach is
> > that the headroom isn't initialized upon allocating it, therefore
> > we have a chance that the garbage data that lies there at
> > allocation time actually resembles a valid oob tag. This is only
> > problematic if we are sending/receiving traffic on the master port,
> > which isn't a valid DSA use-case from the beginning. When dealing
> > with traffic to/from a slave port, then the oob tag will be
> > initialized properly by the tagger or the mac driver through the
> > use of the dsa_oob_tag_push() call.
> > 
> > Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>  
> 
> This must have been asked on v1 but there's no trace of it in the
> current submission afaict...

No you're correct, this wasn't explained.

> If the tag is passed in the descriptor how is this not a pure
> switchdev driver? The explanation must be preserved somehow.

The main reason is that although the MAC and switch are tightly coupled
on that platform, the switch is actually a QCA8K that can live on its
own, as an external switch. Here, it's just a slightly modified version
of this IP.

The same goes for the MAC IP, but so far we don't support any other
platform that has the MAC as a standalone controller. As far as we can
tell, platforms that have this MAC also include a QCA8K, but the
datasheet also mentions other modes (like outputting RGMII).

Is it valid to have it as a standalone ethernet driver in that
situation?

Thanks,

Maxime


* Re: [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol
  2022-05-14 22:40   ` Vladimir Oltean
@ 2022-05-17  7:01     ` Maxime Chevallier
  2022-05-19 14:52       ` Vladimir Oltean
  0 siblings, 1 reply; 24+ messages in thread
From: Maxime Chevallier @ 2022-05-17  7:01 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: davem, Rob Herring, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Luka Perkov, Robert Marko

Hi Vlad,

On Sat, 14 May 2022 22:40:03 +0000
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> On Sat, May 14, 2022 at 05:06:53PM +0200, Maxime Chevallier wrote:
> > This tagging protocol is designed for the situation where the link
> > between the MAC and the Switch is designed such that the Destination
> > Port, which is usually embedded in some part of the Ethernet
> > Header, is sent out-of-band, and isn't present at all in the
> > Ethernet frame.
> > 
> > This can happen when the MAC and Switch are tightly integrated on an
> > SoC, as is the case with the Qualcomm IPQ4019 for example, where
> > the DSA tag is inserted directly into the DMA descriptors. In that
> > case, the MAC driver is responsible for sending the tag to the
> > switch using the out-of-band medium. To do so, the MAC driver needs
> > to have the information of the destination port for that skb.
> > 
> > This out-of-band tagging protocol is using the very beginning of
> > the skb headroom to store the tag. The drawback of this approach is
> > that the headroom isn't initialized upon allocating it, therefore
> > we have a chance that the garbage data that lies there at
> > allocation time actually resembles a valid oob tag. This is only
> > problematic if we are sending/receiving traffic on the master port,
> > which isn't a valid DSA use-case from the beginning. When dealing
> > with traffic to/from a slave port, then the oob tag will be
> > initialized properly by the tagger or the mac driver through the
> > use of the dsa_oob_tag_push() call.
> > 
> > Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
> > ---  
> 
> Why put the DSA pseudo-header at skb->head rather than push it using
> skb_push()? I thought you were going to check for the presence of a
> DSA header using something like skb->mac_len == ETH_HLEN + tag len,
> but right now it sounds like treating garbage in the headroom as a
> valid DSA tag is indeed a potential problem. If you can't sort that
> out using information from the header offsets alone, maybe an skb
> extension is required?

Indeed, I thought of that. The main reason is that pushing/popping in
itself is not enough: you also have to move the whole mac_header to
leave room for the tag, and then restore it to its original location.
There's nothing wrong with this, but it looked a bit cumbersome just to
insert a dummy tag that gets removed right away. Does that make sense?

But yes, I would really like to have a way to know whether the tag is
there or not. I'll dig a bit more to see if I can find a way to get
this info from the various skb offsets in a reliable way.

Thanks,

Maxime


* Re: [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol
  2022-05-14 16:33   ` Florian Fainelli
@ 2022-05-17  7:06     ` Maxime Chevallier
  0 siblings, 0 replies; 24+ messages in thread
From: Maxime Chevallier @ 2022-05-17  7:06 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: davem, Rob Herring, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Heiner Kallweit, Russell King,
	linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko

Hi Florian,

On Sat, 14 May 2022 09:33:44 -0700
Florian Fainelli <f.fainelli@gmail.com> wrote:

> Hi Maxime,
> 
> On 5/14/2022 8:06 AM, Maxime Chevallier wrote:
> > This tagging protocol is designed for the situation where the link
> > between the MAC and the Switch is designed such that the Destination
> > Port, which is usually embedded in some part of the Ethernet
> > Header, is sent out-of-band, and isn't present at all in the
> > Ethernet frame.
> > 
> > This can happen when the MAC and Switch are tightly integrated on an
> > SoC, as is the case with the Qualcomm IPQ4019 for example, where
> > the DSA tag is inserted directly into the DMA descriptors. In that
> > case, the MAC driver is responsible for sending the tag to the
> > switch using the out-of-band medium. To do so, the MAC driver needs
> > to have the information of the destination port for that skb.
> > 
> > This out-of-band tagging protocol is using the very beginning of
> > the skb headroom to store the tag. The drawback of this approach is
> > that the headroom isn't initialized upon allocating it, therefore
> > we have a chance that the garbage data that lies there at
> > allocation time actually resembles a valid oob tag. This is only
> > problematic if we are sending/receiving traffic on the master port,
> > which isn't a valid DSA use-case from the beginning. When dealing
> > with traffic to/from a slave port, then the oob tag will be
> > initialized properly by the tagger or the mac driver through the
> > use of the dsa_oob_tag_push() call.
> 
> What I like about your approach is that you have aligned the way an
> out of band switch tag is communicated to the networking stack the
> same way that an "in-band" switch tag would be communicated. I think
> this is a good way forward to provide the out of band tag and I don't
> think it creates a performance problem because the Ethernet frame is
> hot in the cache (dma_unmap_single()) and we already have an
> "expensive" read of the DMA descriptor in coherent memory anyway.
> 
> You could possibly optimize the data flow a bit to limit the amount
> of sk_buff data movement by asking your Ethernet controller to DMA
> into the data buffer N bytes into the beginning of the data buffer.
> That way, if you have reserved say, 2 bytes at the front data buffer
> you can deposit the QCA tag there and you do not need to push,
> process the tag, then pop it, just process and pop. Consider using
> the 2byte stuffing that the Ethernet controller might be adding to
> the beginning of the Ethernet frame to align the IP header on a
> 4-byte boundary to provide the tag in there?
> 
> If we want to have a generic out of band tagger like you propose, it 
> seems to me that we will need to invent a synthetic DSA tagging
> format which is the largest common denominator of the out of band
> tags that we want to support. We could imagine being more compact in
> the representation for instance by using an u8 for storing a bitmask
> of ports (works for both RX and TX then) and another u8 for various
> packet forwarding reasons.

Thanks, that was my initial idea indeed. Having a generic tagger that
can be re-used would be great IMO. I'll modify the format as you
propose, and also try your approach of DMA'ing 2 bytes forward
so that the tag location is already allocated, that's a nice idea.

> Then we would request the various Ethernet MAC drivers to marshall
> their proprietary tag into the DSA synthetic one on receive, and
> unmarshall it on transmit.
> 
> Another approach IMHO which maybe helps the maintainability of the
> code moving forward as well as ensuring that all Ethernet switch
> tagging code lives in one place, is to teach each tagger driver how
> to optimize their data paths to minimize the amount of data movements
> and checksum re-calculations, this is what I had in mind a few years
> ago:
> 
> https://lore.kernel.org/lkml/1438322920.20182.144.camel@edumazet-glaptop2.roam.corp.google.com/T/
> 
> This might scale a little less well, and maybe this makes too many 
> assumptions as to where and how the checksums are calculated on the 
> packet contents, but at least, you don't have logic processing the
> same type of switch tag scattered between the Ethernet MAC drivers
> (beyond copying/pushing) and DSA switch taggers.

That would definitely fit well with this tagger, I didn't know about
that series!

Thanks for the review,

Maxime

> I would like to hear other's opinion on this.



* Re: [PATCH net-next v2 1/5] net: ipqess: introduce the Qualcomm IPQESS driver
  2022-05-14 17:18   ` Russell King (Oracle)
@ 2022-05-17  7:09     ` Maxime Chevallier
  0 siblings, 0 replies; 24+ messages in thread
From: Maxime Chevallier @ 2022-05-17  7:09 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: davem, Rob Herring, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko

Hello Russell,

Thanks for the review.

On Sat, 14 May 2022 18:18:17 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Sat, May 14, 2022 at 05:06:52PM +0200, Maxime Chevallier wrote:
> > +static int ipqess_do_ioctl(struct net_device *netdev, struct ifreq
> > *ifr, int cmd) +{
> > +	struct ipqess *ess = netdev_priv(netdev);
> > +
> > +	switch (cmd) {
> > +	case SIOCGMIIPHY:
> > +	case SIOCGMIIREG:
> > +	case SIOCSMIIREG:
> > +		return phylink_mii_ioctl(ess->phylink, ifr, cmd);
> > +	default:
> > +		break;
> > +	}
> > +
> > +	return -EOPNOTSUPP;
> > +}  
> 
> Is there a reason this isn't just:
> 
> 	return phylink_mii_ioctl(ess->phylink, ifr, cmd);

Not really, an oversight on my part, I'll address that in v3

> ?
> 
> > +static int ipqess_axi_probe(struct platform_device *pdev)
> > +{
> > +	struct device_node *np = pdev->dev.of_node;
> > +	struct net_device *netdev;
> > +	phy_interface_t phy_mode;
> > +	struct resource *res;
> > +	struct ipqess *ess;
> > +	int i, err = 0;
> > +
> > +	netdev = devm_alloc_etherdev_mqs(&pdev->dev, sizeof(struct
> > ipqess),
> > +					 IPQESS_NETDEV_QUEUES,
> > +					 IPQESS_NETDEV_QUEUES);
> > +	if (!netdev)
> > +		return -ENOMEM;
> > +
> > +	ess = netdev_priv(netdev);
> > +	ess->netdev = netdev;
> > +	ess->pdev = pdev;
> > +	spin_lock_init(&ess->stats_lock);
> > +	SET_NETDEV_DEV(netdev, &pdev->dev);
> > +	platform_set_drvdata(pdev, netdev);
> > +
> > +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > +	ess->hw_addr = devm_ioremap_resource(&pdev->dev, res);
> > +	if (IS_ERR(ess->hw_addr))
> > +		return PTR_ERR(ess->hw_addr);
> > +
> > +	err = of_get_phy_mode(np, &phy_mode);
> > +	if (err) {
> > +		dev_err(&pdev->dev, "incorrect phy-mode\n");
> > +		return err;
> > +	}
> > +
> > +	ess->ess_clk = devm_clk_get(&pdev->dev, "ess");
> > +	if (!IS_ERR(ess->ess_clk))
> > +		clk_prepare_enable(ess->ess_clk);
> > +
> > +	ess->ess_rst = devm_reset_control_get(&pdev->dev, "ess");
> > +	if (IS_ERR(ess->ess_rst))
> > +		goto err_clk;
> > +
> > +	ipqess_reset(ess);
> > +
> > +	ess->phylink_config.dev = &netdev->dev;
> > +	ess->phylink_config.type = PHYLINK_NETDEV;
> > +
> > +	__set_bit(PHY_INTERFACE_MODE_INTERNAL,
> > +		  ess->phylink_config.supported_interfaces);  
> 
> No mac capabilities?

My bad too, I also missed that. I'll also address that in v3.

> > +
> > +	ess->phylink = phylink_create(&ess->phylink_config,
> > +				      of_fwnode_handle(np),
> > phy_mode,
> > +				      &ipqess_phylink_mac_ops);  
> 

Thanks,

Maxime


* Re: [PATCH net-next v2 1/5] net: ipqess: introduce the Qualcomm IPQESS driver
  2022-05-14 20:44   ` Vladimir Oltean
@ 2022-05-17  7:11     ` Maxime Chevallier
  0 siblings, 0 replies; 24+ messages in thread
From: Maxime Chevallier @ 2022-05-17  7:11 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: davem, Rob Herring, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Luka Perkov, Robert Marko

Hello Vlad,

On Sat, 14 May 2022 20:44:38 +0000
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> On Sat, May 14, 2022 at 05:06:52PM +0200, Maxime Chevallier wrote:
> > +/* locking is handled by the caller */
> > +static int ipqess_rx_buf_alloc_napi(struct ipqess_rx_ring *rx_ring)
> > +{
> > +	struct ipqess_buf *buf = &rx_ring->buf[rx_ring->head];
> > +
> > +	buf->skb = napi_alloc_skb(&rx_ring->napi_rx,
> > IPQESS_RX_HEAD_BUFF_SIZE);
> > +	if (!buf->skb)
> > +		return -ENOMEM;
> > +
> > +	return ipqess_rx_buf_prepare(buf, rx_ring);
> > +}
> > +
> > +static int ipqess_rx_buf_alloc(struct ipqess_rx_ring *rx_ring)
> > +{
> > +	struct ipqess_buf *buf = &rx_ring->buf[rx_ring->head];
> > +
> > +	buf->skb = netdev_alloc_skb_ip_align(rx_ring->ess->netdev,
> > +
> > IPQESS_RX_HEAD_BUFF_SIZE); +
> > +	if (!buf->skb)
> > +		return -ENOMEM;
> > +
> > +	return ipqess_rx_buf_prepare(buf, rx_ring);
> > +}
> > +
> > +static void ipqess_refill_work(struct work_struct *work)
> > +{
> > +	struct ipqess_rx_ring_refill *rx_refill =
> > container_of(work,
> > +		struct ipqess_rx_ring_refill, refill_work);
> > +	struct ipqess_rx_ring *rx_ring = rx_refill->rx_ring;
> > +	int refill = 0;
> > +
> > +	/* don't let this loop by accident. */
> > +	while (atomic_dec_and_test(&rx_ring->refill_count)) {
> > +		napi_disable(&rx_ring->napi_rx);
> > +		if (ipqess_rx_buf_alloc(rx_ring)) {
> > +			refill++;
> > +			dev_dbg(rx_ring->ppdev,
> > +				"Not all buffers were
> > reallocated");
> > +		}
> > +		napi_enable(&rx_ring->napi_rx);
> > +	}
> > +
> > +	if (atomic_add_return(refill, &rx_ring->refill_count))
> > +		schedule_work(&rx_refill->refill_work);
> > +}
> > +
> > +static int ipqess_rx_poll(struct ipqess_rx_ring *rx_ring, int
> > budget) +{  
> 
> > +	while (done < budget) {  
> 
> > +		num_desc += atomic_xchg(&rx_ring->refill_count, 0);
> > +		while (num_desc) {
> > +			if (ipqess_rx_buf_alloc_napi(rx_ring)) {
> > +				num_desc =
> > atomic_add_return(num_desc,
> > +
> > &rx_ring->refill_count);
> > +				if (num_desc >= ((4 *
> > IPQESS_RX_RING_SIZE + 6) / 7))  
> 
> DIV_ROUND_UP(IPQESS_RX_RING_SIZE * 4, 7)
> Also, why this number?

Ah, this was from the original out-of-tree driver... I'll try to figure
out what's going on and replace that with a #define that would make
more sense.

> > +
> > schedule_work(&rx_ring->ess->rx_refill[rx_ring->ring_id].refill_work);
> > +				break;
> > +			}
> > +			num_desc--;
> > +		}
> > +	}
> > +
> > +	ipqess_w32(rx_ring->ess,
> > IPQESS_REG_RX_SW_CONS_IDX_Q(rx_ring->idx),
> > +		   rx_ring_tail);
> > +	rx_ring->tail = rx_ring_tail;
> > +
> > +	return done;
> > +}  
> 
> > +static void ipqess_rx_ring_free(struct ipqess *ess)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
> > +		int j;
> > +
> > +		atomic_set(&ess->rx_ring[i].refill_count, 0);
> > +		cancel_work_sync(&ess->rx_refill[i].refill_work);  
> 
> When refill_work is currently scheduled and executing the while loop,
> will refill_count underflow due to the possibility of calling
> atomic_dec_and_test(0)?

Good question, I'll double-check, you might be correct. Nice catch

> > +
> > +		for (j = 0; j < IPQESS_RX_RING_SIZE; j++) {
> > +			dma_unmap_single(&ess->pdev->dev,
> > +
> > ess->rx_ring[i].buf[j].dma,
> > +
> > ess->rx_ring[i].buf[j].length,
> > +					 DMA_FROM_DEVICE);
> > +
> > dev_kfree_skb_any(ess->rx_ring[i].buf[j].skb);
> > +		}
> > +	}
> > +  

Thanks,

Maxime


* Re: [PATCH net-next v2 1/5] net: ipqess: introduce the Qualcomm IPQESS driver
  2022-05-16  2:51   ` Andrew Lunn
@ 2022-05-17  7:13     ` Maxime Chevallier
  0 siblings, 0 replies; 24+ messages in thread
From: Maxime Chevallier @ 2022-05-17  7:13 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: davem, Rob Herring, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Vladimir Oltean, Luka Perkov,
	Robert Marko

Hi Andrew,

On Mon, 16 May 2022 04:51:03 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> > +static int ipqess_tx_ring_alloc(struct ipqess *ess)
> > +{
> > +	struct device *dev = &ess->pdev->dev;
> > +	int i;
> > +
> > +	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
> > +		struct ipqess_tx_ring *tx_ring = &ess->tx_ring[i];
> > +		size_t size;
> > +		u32 idx;
> > +
> > +		tx_ring->ess = ess;
> > +		tx_ring->ring_id = i;
> > +		tx_ring->idx = i * 4;
> > +		tx_ring->count = IPQESS_TX_RING_SIZE;
> > +		tx_ring->nq = netdev_get_tx_queue(ess->netdev, i);
> > +
> > +		size = sizeof(struct ipqess_buf) *
> > IPQESS_TX_RING_SIZE;
> > +		tx_ring->buf = devm_kzalloc(dev, size, GFP_KERNEL);
> > +		if (!tx_ring->buf) {
> > +			netdev_err(ess->netdev, "buffer alloc of
> > tx ring failed");
> > +			return -ENOMEM;
> > +		}  
> 
> kzalloc() is pretty loud when there is no memory. So you see patches
> removing such warning messages.

Ack, I'll remove that

> > +static int ipqess_rx_napi(struct napi_struct *napi, int budget)
> > +{
> > +	struct ipqess_rx_ring *rx_ring = container_of(napi, struct
> > ipqess_rx_ring,
> > +						    napi_rx);
> > +	struct ipqess *ess = rx_ring->ess;
> > +	u32 rx_mask = BIT(rx_ring->idx);
> > +	int remain_budget = budget;
> > +	int rx_done;
> > +	u32 status;
> > +
> > +poll_again:
> > +	ipqess_w32(ess, IPQESS_REG_RX_ISR, rx_mask);
> > +	rx_done = ipqess_rx_poll(rx_ring, remain_budget);
> > +
> > +	if (rx_done == remain_budget)
> > +		return budget;
> > +
> > +	status = ipqess_r32(ess, IPQESS_REG_RX_ISR);
> > +	if (status & rx_mask) {
> > +		remain_budget -= rx_done;
> > +		goto poll_again;
> > +	}  
> 
> Could this be turned into a do while() loop?

Yes indeed, I'll fix this for v3

> > +static void ipqess_irq_enable(struct ipqess *ess)
> > +{
> > +	int i;
> > +
> > +	ipqess_w32(ess, IPQESS_REG_RX_ISR, 0xff);
> > +	ipqess_w32(ess, IPQESS_REG_TX_ISR, 0xffff);
> > +	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
> > +		ipqess_w32(ess,
> > IPQESS_REG_RX_INT_MASK_Q(ess->rx_ring[i].idx), 1);
> > +		ipqess_w32(ess,
> > IPQESS_REG_TX_INT_MASK_Q(ess->tx_ring[i].idx), 1);
> > +	}
> > +}
> > +
> > +static void ipqess_irq_disable(struct ipqess *ess)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
> > +		ipqess_w32(ess,
> > IPQESS_REG_RX_INT_MASK_Q(ess->rx_ring[i].idx), 0);
> > +		ipqess_w32(ess,
> > IPQESS_REG_TX_INT_MASK_Q(ess->tx_ring[i].idx), 0);
> > +	}
> > +}  
> 
> Enable and disable are not symmetric?

Ah nice catch too, I'll dig into this, either to make it symmetric or
to explain with a comment why it isn't

> 
> > +static inline void ipqess_kick_tx(struct ipqess_tx_ring *tx_ring)  
> 
> No inline functions please in .c files. Let the compiler decide.

Ack, I'll address that.

>    Andrew

Thanks again for the review

Maxime


* Re: [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol
  2022-05-17  6:53     ` Maxime Chevallier
@ 2022-05-17 20:58       ` Jakub Kicinski
  0 siblings, 0 replies; 24+ messages in thread
From: Jakub Kicinski @ 2022-05-17 20:58 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Rob Herring, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Vladimir Oltean, Luka Perkov,
	Robert Marko

On Tue, 17 May 2022 08:53:55 +0200 Maxime Chevallier wrote:
> > This must have been asked on v1 but there's no trace of it in the
> > current submission afaict...  
> 
> No you're correct, this wasn't explained.
> 
> > If the tag is passed in the descriptor how is this not a pure
> > switchdev driver? The explanation must be preserved somehow.  
> 
> The main reason is that although the MAC and switch are tightly coupled
> on that platform, the switch is actually a QCA8K that can live on its
> own, as an external switch. Here, it's just a slightly modified version
> of this IP.
> 
> The same goes for the MAC IP, but so far we don't support any other
> platform that has the MAC as a standalone controller. As far as we can
> tell, platforms that have this MAC also include a QCA8K, but the
> datasheet also mentions other modes (like outputting RGMII).

Got it, thanks! Please weave this justification more explicitly into 
the cover letter.

> Is this valid to have it as a standalone ethernet driver in that
> situation ?

Quite possibly.. I won't pretend I've looked at the code, I defer 
to the reviewers :)


* Re: [PATCH net-next v2 1/5] net: ipqess: introduce the Qualcomm IPQESS driver
  2022-05-14 15:06 ` [PATCH net-next v2 1/5] net: ipqess: introduce the " Maxime Chevallier
                     ` (2 preceding siblings ...)
  2022-05-16  2:51   ` Andrew Lunn
@ 2022-05-17 21:03   ` Christophe JAILLET
  3 siblings, 0 replies; 24+ messages in thread
From: Christophe JAILLET @ 2022-05-17 21:03 UTC (permalink / raw)
  To: Maxime Chevallier, davem, Rob Herring
  Cc: netdev, linux-kernel, devicetree, thomas.petazzoni, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit, Russell King,
	linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko

Le 14/05/2022 à 17:06, Maxime Chevallier a écrit :
> The Qualcomm IPQESS controller is a simple 1G Ethernet controller found
> on the IPQ4019 chip. This controller has some specificities, in that the
> IPQ4019 platform that includes that controller also has an internal
> switch, based on the QCA8K IP.
> 
> It is connected to that switch through an internal link, and doesn't
> expose directly any external interface, hence it only supports the
> PHY_INTERFACE_MODE_INTERNAL for now.
> 
> It has 16 RX and TX queues, with a very basic RSS fanout configured at
> init time.
> 
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
> ---
> V1->V2 :
>   - Reworked the init sequence, following Andrew's comments
>   - Added clock and reset support
>   - Reworked the error paths
>   - Added extra endianness wrappers to fix sparse warnings
> 
>   MAINTAINERS                                   |    6 +
>   drivers/net/ethernet/qualcomm/Kconfig         |   11 +
>   drivers/net/ethernet/qualcomm/Makefile        |    2 +
>   drivers/net/ethernet/qualcomm/ipqess/Makefile |    8 +
>   drivers/net/ethernet/qualcomm/ipqess/ipqess.c | 1269 +++++++++++++++++
>   drivers/net/ethernet/qualcomm/ipqess/ipqess.h |  518 +++++++
>   .../ethernet/qualcomm/ipqess/ipqess_ethtool.c |  168 +++
>   7 files changed, 1982 insertions(+)
>   create mode 100644 drivers/net/ethernet/qualcomm/ipqess/Makefile
>   create mode 100644 drivers/net/ethernet/qualcomm/ipqess/ipqess.c
>   create mode 100644 drivers/net/ethernet/qualcomm/ipqess/ipqess.h
>   create mode 100644 drivers/net/ethernet/qualcomm/ipqess/ipqess_ethtool.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 9b0480f1b153..29e6ec4f975a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16308,6 +16308,12 @@ L:	netdev@vger.kernel.org
>   S:	Maintained
>   F:	drivers/net/ethernet/qualcomm/emac/
>   
> +QUALCOMM IPQESS ETHERNET DRIVER
> +M:	Maxime Chevallier <maxime.chevallier@bootlin.com>
> +L:	netdev@vger.kernel.org
> +S:	Maintained
> +F:	drivers/net/ethernet/qualcomm/ipqess/
> +
>   QUALCOMM ETHQOS ETHERNET DRIVER
>   M:	Vinod Koul <vkoul@kernel.org>
>   L:	netdev@vger.kernel.org
> diff --git a/drivers/net/ethernet/qualcomm/Kconfig b/drivers/net/ethernet/qualcomm/Kconfig
> index a4434eb38950..a723ddbea248 100644
> --- a/drivers/net/ethernet/qualcomm/Kconfig
> +++ b/drivers/net/ethernet/qualcomm/Kconfig
> @@ -60,6 +60,17 @@ config QCOM_EMAC
>   	  low power, Receive-Side Scaling (RSS), and IEEE 1588-2008
>   	  Precision Clock Synchronization Protocol.
>   
> +config QCOM_IPQ4019_ESS_EDMA
> +	tristate "Qualcomm Atheros IPQ4019 ESS EDMA support"
> +	depends on OF
> +	select PHYLINK
> +	help
> +	  This driver supports the Qualcomm Atheros IPQ40xx built-in
> +	  ESS EDMA ethernet controller.
> +
> +	  To compile this driver as a module, choose M here: the
> +	  module will be called ipqess.
> +
>   source "drivers/net/ethernet/qualcomm/rmnet/Kconfig"
>   
>   endif # NET_VENDOR_QUALCOMM
> diff --git a/drivers/net/ethernet/qualcomm/Makefile b/drivers/net/ethernet/qualcomm/Makefile
> index 9250976dd884..db463c9ea1f9 100644
> --- a/drivers/net/ethernet/qualcomm/Makefile
> +++ b/drivers/net/ethernet/qualcomm/Makefile
> @@ -11,4 +11,6 @@ qcauart-objs := qca_uart.o
>   
>   obj-y += emac/
>   
> +obj-$(CONFIG_QCOM_IPQ4019_ESS_EDMA) += ipqess/
> +
>   obj-$(CONFIG_RMNET) += rmnet/
> diff --git a/drivers/net/ethernet/qualcomm/ipqess/Makefile b/drivers/net/ethernet/qualcomm/ipqess/Makefile
> new file mode 100644
> index 000000000000..4f2db7283ebf
> --- /dev/null
> +++ b/drivers/net/ethernet/qualcomm/ipqess/Makefile
> @@ -0,0 +1,8 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Makefile for the IPQ ESS driver
> +#
> +
> +obj-$(CONFIG_QCOM_IPQ4019_ESS_EDMA) += ipq_ess.o
> +
> +ipq_ess-objs := ipqess.o ipqess_ethtool.o
> diff --git a/drivers/net/ethernet/qualcomm/ipqess/ipqess.c b/drivers/net/ethernet/qualcomm/ipqess/ipqess.c
> new file mode 100644
> index 000000000000..b11f11f23c11
> --- /dev/null
> +++ b/drivers/net/ethernet/qualcomm/ipqess/ipqess.c
> @@ -0,0 +1,1269 @@
> +// SPDX-License-Identifier: GPL-2.0 OR ISC
> +/* Copyright (c) 2014 - 2017, The Linux Foundation. All rights reserved.
> + * Copyright (c) 2017 - 2018, John Crispin <john@phrozen.org>
> + * Copyright (c) 2018 - 2019, Christian Lamparter <chunkeey@gmail.com>
> + * Copyright (c) 2020 - 2021, Gabor Juhos <j4g8y7@gmail.com>
> + * Copyright (c) 2021 - 2022, Maxime Chevallier <maxime.chevallier@bootlin.com>
> + *
> + */
> +
> +#include <linux/bitfield.h>
> +#include <linux/clk.h>
> +#include <linux/if_vlan.h>
> +#include <linux/interrupt.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/of_device.h>
> +#include <linux/of_mdio.h>
> +#include <linux/of_net.h>
> +#include <linux/phylink.h>
> +#include <linux/platform_device.h>
> +#include <linux/reset.h>
> +#include <linux/skbuff.h>
> +#include <linux/vmalloc.h>
> +#include <net/checksum.h>
> +#include <net/ip6_checksum.h>
> +
> +#include "ipqess.h"
> +
> +#define IPQESS_RRD_SIZE		16
> +#define IPQESS_NEXT_IDX(X, Y)  (((X) + 1) & ((Y) - 1))
> +#define IPQESS_TX_DMA_BUF_LEN	0x3fff
> +
> +static void ipqess_w32(struct ipqess *ess, u32 reg, u32 val)
> +{
> +	writel(val, ess->hw_addr + reg);
> +}
> +
> +static u32 ipqess_r32(struct ipqess *ess, u16 reg)
> +{
> +	return readl(ess->hw_addr + reg);
> +}
> +
> +static void ipqess_m32(struct ipqess *ess, u32 mask, u32 val, u16 reg)
> +{
> +	u32 _val = ipqess_r32(ess, reg);
> +
> +	_val &= ~mask;
> +	_val |= val;
> +
> +	ipqess_w32(ess, reg, _val);
> +}
> +
> +void ipqess_update_hw_stats(struct ipqess *ess)
> +{
> +	u32 *p;
> +	u32 stat;
> +	int i;
> +
> +	lockdep_assert_held(&ess->stats_lock);
> +
> +	p = (u32 *)&ess->ipqess_stats;
> +	for (i = 0; i < IPQESS_MAX_TX_QUEUE; i++) {
> +		stat = ipqess_r32(ess, IPQESS_REG_TX_STAT_PKT_Q(i));
> +		*p += stat;
> +		p++;
> +	}
> +
> +	for (i = 0; i < IPQESS_MAX_TX_QUEUE; i++) {
> +		stat = ipqess_r32(ess, IPQESS_REG_TX_STAT_BYTE_Q(i));
> +		*p += stat;
> +		p++;
> +	}
> +
> +	for (i = 0; i < IPQESS_MAX_RX_QUEUE; i++) {
> +		stat = ipqess_r32(ess, IPQESS_REG_RX_STAT_PKT_Q(i));
> +		*p += stat;
> +		p++;
> +	}
> +
> +	for (i = 0; i < IPQESS_MAX_RX_QUEUE; i++) {
> +		stat = ipqess_r32(ess, IPQESS_REG_RX_STAT_BYTE_Q(i));
> +		*p += stat;
> +		p++;
> +	}
> +}
> +
> +static int ipqess_tx_ring_alloc(struct ipqess *ess)
> +{
> +	struct device *dev = &ess->pdev->dev;
> +	int i;
> +
> +	for (i = 0; i < IPQESS_NETDEV_QUEUES; i++) {
> +		struct ipqess_tx_ring *tx_ring = &ess->tx_ring[i];
> +		size_t size;
> +		u32 idx;
> +
> +		tx_ring->ess = ess;
> +		tx_ring->ring_id = i;
> +		tx_ring->idx = i * 4;
> +		tx_ring->count = IPQESS_TX_RING_SIZE;
> +		tx_ring->nq = netdev_get_tx_queue(ess->netdev, i);
> +
> +		size = sizeof(struct ipqess_buf) * IPQESS_TX_RING_SIZE;
> +		tx_ring->buf = devm_kzalloc(dev, size, GFP_KERNEL);
> +		if (!tx_ring->buf) {
> +			netdev_err(ess->netdev, "buffer alloc of tx ring failed");
> +			return -ENOMEM;
> +		}
> +
> +		size = sizeof(struct ipqess_tx_desc) * IPQESS_TX_RING_SIZE;
> +		tx_ring->hw_desc = dmam_alloc_coherent(dev, size, &tx_ring->dma,
> +						       GFP_KERNEL | __GFP_ZERO);

Hi,

Nitpicking: I think that __GFP_ZERO is useless (and harmless) because 
dma_alloc_coherent() always zeroes the memory that is allocated.

CJ



* Re: [PATCH net-next v2 4/5] net: dt-bindings: Introduce the Qualcomm IPQESS Ethernet controller
  2022-05-14 15:06 ` [PATCH net-next v2 4/5] net: dt-bindings: Introduce the Qualcomm IPQESS Ethernet controller Maxime Chevallier
@ 2022-05-18  0:52   ` Rob Herring
  0 siblings, 0 replies; 24+ messages in thread
From: Rob Herring @ 2022-05-18  0:52 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, netdev, linux-kernel, devicetree, thomas.petazzoni,
	Andrew Lunn, Florian Fainelli, Heiner Kallweit, Russell King,
	linux-arm-kernel, Vladimir Oltean, Luka Perkov, Robert Marko

On Sat, May 14, 2022 at 05:06:55PM +0200, Maxime Chevallier wrote:
> Add the DT binding for the IPQESS Ethernet Controller. This is a simple
> controller, only requiring the phy-mode, interrupts, clocks, and
> possibly a MAC address setting.
> 
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
> ---
> V1->V2:
>  - Fixed the example
>  - Added reset and clocks
>  - Removed generic ethernet attributes
> 
>  .../devicetree/bindings/net/qcom,ipqess.yaml  | 104 ++++++++++++++++++
>  1 file changed, 104 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/qcom,ipqess.yaml
> 
> diff --git a/Documentation/devicetree/bindings/net/qcom,ipqess.yaml b/Documentation/devicetree/bindings/net/qcom,ipqess.yaml
> new file mode 100644
> index 000000000000..ea0023509737
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/qcom,ipqess.yaml
> @@ -0,0 +1,104 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/net/qcom,ipqess.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Qualcomm IPQ ESS EDMA Ethernet Controller
> +
> +maintainers:
> +  - Maxime Chevallier <maxime.chevallier@bootlin.com>
> +
> +allOf:
> +  - $ref: "ethernet-controller.yaml#"
> +
> +properties:
> +  compatible:
> +    const: qcom,ipq4019-ess-edma
> +
> +  reg:
> +    maxItems: 1
> +
> +  interrupts:
> +    minItems: 2
> +    maxItems: 32
> +    description: One interrupt per tx and rx queue, with up to 16 queues.
> +
> +  clocks:
> +    maxItems: 1
> +
> +  clock-names:
> +    const: ess

Always kind of pointless to have a single *-names entry when it's just 
the block name.

> +
> +  resets:
> +    maxItems: 1
> +
> +  reset-names:
> +    const: ess

ditto

> +
> +required:
> +  - compatible
> +  - reg
> +  - interrupts
> +  - clocks
> +  - clock-names
> +  - resets
> +  - phy-mode
> +
> +unevaluatedProperties: false
> +
> +examples:
> +  - |
> +    #include <dt-bindings/clock/qcom,gcc-ipq4019.h>
> +    #include <dt-bindings/interrupt-controller/arm-gic.h>
> +    #include <dt-bindings/interrupt-controller/irq.h>
> +    gmac: ethernet@c080000 {
> +        compatible = "qcom,ipq4019-ess-edma";
> +        reg = <0xc080000 0x8000>;
> +        resets = <&gcc ESS_RESET>;
> +        reset-names = "ess";
> +        clocks = <&gcc GCC_ESS_CLK>;
> +        clock-names = "ess";
> +        interrupts = <GIC_SPI  65 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI  66 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI  67 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI  68 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI  69 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI  70 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI  71 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI  72 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI  73 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI  74 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI  75 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI  76 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI  77 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI  78 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI  79 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI  80 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI 240 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI 241 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI 242 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI 243 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI 244 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI 245 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI 246 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI 247 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI 248 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI 249 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI 250 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI 251 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI 252 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI 253 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI 254 IRQ_TYPE_EDGE_RISING>,
> +                     <GIC_SPI 255 IRQ_TYPE_EDGE_RISING>;
> +
> +        phy-mode = "internal";
> +        fixed-link {
> +            speed = <1000>;
> +            full-duplex;
> +            pause;
> +            asym-pause;
> +        };
> +    };
> +
> +...
> -- 
> 2.36.1
> 
> 


* Re: [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol
  2022-05-17  7:01     ` Maxime Chevallier
@ 2022-05-19 14:52       ` Vladimir Oltean
  2022-05-19 17:11         ` Florian Fainelli
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir Oltean @ 2022-05-19 14:52 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Rob Herring, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Luka Perkov, Robert Marko

On Tue, May 17, 2022 at 09:01:56AM +0200, Maxime Chevallier wrote:
> Hi Vlad,
> 
> On Sat, 14 May 2022 22:40:03 +0000
> Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> 
> > On Sat, May 14, 2022 at 05:06:53PM +0200, Maxime Chevallier wrote:
> > > This tagging protocol is designed for the situation where the link
> > > between the MAC and the Switch is designed such that the Destination
> > > Port, which is usually embedded in some part of the Ethernet
> > > Header, is sent out-of-band, and isn't present at all in the
> > > Ethernet frame.
> > > 
> > > This can happen when the MAC and Switch are tightly integrated on an
> > > SoC, as is the case with the Qualcomm IPQ4019 for example, where
> > > the DSA tag is inserted directly into the DMA descriptors. In that
> > > case, the MAC driver is responsible for sending the tag to the
> > > switch using the out-of-band medium. To do so, the MAC driver needs
> > > to have the information of the destination port for that skb.
> > > 
> > > > This out-of-band tagging protocol is using the very beginning of
> > > > the skb headroom to store the tag. The drawback of this approach is
> > > > that the headroom isn't initialized upon allocating it, therefore
> > > > we have a chance that the garbage data that lies there at
> > > > allocation time actually resembles a valid oob tag. This is only
> > > > problematic if we are sending/receiving traffic on the master port,
> > > > which isn't a valid DSA use-case from the beginning. When dealing
> > > > with traffic to/from a slave port, then the oob tag will be
> > > > initialized properly by the tagger or the mac driver through the
> > > > use of the dsa_oob_tag_push() call.
> > > 
> > > Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
> > > ---  
> > 
> > Why put the DSA pseudo-header at skb->head rather than push it using
> > skb_push()? I thought you were going to check for the presence of a
> > DSA header using something like skb->mac_len == ETH_HLEN + tag len,
> > but right now it sounds like treating garbage in the headroom as a
> > valid DSA tag is indeed a potential problem. If you can't sort that
> > out using information from the header offsets alone, maybe an skb
> > extension is required?
> 
> Indeed, I thought of that, the main reason is that pushing/popping in
> itself is not enough, you also have to move the whole mac_header to
> leave room for the tag, and then re-set it in its original location.
> There's nothing wrong with this, but it looked a bit cumbersome just to
> insert a dummy tag that gets removed right away. Does that make sense?

You're thinking about inserting a header before the EtherType. But what
has been said was to _prepend_ a header, i.e. put it before the Ethernet
MAC DA. That way you don't need to move the Ethernet header.
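To make the difference between the two placements concrete, here is a
minimal userspace sketch. It is not kernel code: struct toy_buf and
toy_push() are hypothetical stand-ins for an skb and skb_push(), and
the 4-byte TAG_LEN is an assumption chosen only for illustration.
Prepending leaves the Ethernet header where it is, while inserting in
front of the EtherType requires moving the DA and SA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6	/* length of one MAC address */
#define ETH_HLEN 14	/* DA + SA + EtherType */
#define TAG_LEN  4	/* hypothetical tag size, assumption for this sketch */

/* Toy stand-in for an skb: a flat buffer with headroom before the data. */
struct toy_buf {
	uint8_t mem[64];
	size_t data;	/* offset of the first valid byte */
	size_t len;	/* number of valid bytes */
};

/* Grow the valid area towards the front, in the spirit of skb_push(). */
static uint8_t *toy_push(struct toy_buf *b, size_t n)
{
	assert(b->data >= n);	/* enough headroom must be available */
	b->data -= n;
	b->len += n;
	return b->mem + b->data;
}

/* Prepend: the tag lands before the MAC DA, the Ethernet header stays put. */
static void tag_prepend(struct toy_buf *b, const uint8_t *tag)
{
	memcpy(toy_push(b, TAG_LEN), tag, TAG_LEN);
}

/* In-header insert: DA and SA are moved forward to open a gap
 * right in front of the EtherType, then the tag fills the gap.
 */
static void tag_insert_before_ethertype(struct toy_buf *b, const uint8_t *tag)
{
	uint8_t *p = toy_push(b, TAG_LEN);

	memmove(p, p + TAG_LEN, 2 * ETH_ALEN);
	memcpy(p + 2 * ETH_ALEN, tag, TAG_LEN);
}
```

In the prepend case the receiver only has to pull TAG_LEN bytes off the
front again; no header bytes are copied in either direction.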

But anyway, too much talk for mostly nothing, see below.

> But yes I would really like to get a way to know whether the tag is
> there or not, I'll dig a bit more to see if I can find a way to get
> this info from the various skb offsets in a reliable way.

Without an skb extension, this seems like an impossible task to me
(which should also answer Florian's request for feedback on the proposal
to share skb->cb with GRO, the qdisc, and whomever else there might be
in the path between the DSA master and the switch).


* Re: [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol
  2022-05-19 14:52       ` Vladimir Oltean
@ 2022-05-19 17:11         ` Florian Fainelli
  2022-05-19 17:34           ` Vladimir Oltean
  0 siblings, 1 reply; 24+ messages in thread
From: Florian Fainelli @ 2022-05-19 17:11 UTC (permalink / raw)
  To: Vladimir Oltean, Maxime Chevallier
  Cc: davem, Rob Herring, netdev, linux-kernel, devicetree,
	thomas.petazzoni, Andrew Lunn, Heiner Kallweit, Russell King,
	linux-arm-kernel, Luka Perkov, Robert Marko



On 5/19/2022 7:52 AM, Vladimir Oltean wrote:
> On Tue, May 17, 2022 at 09:01:56AM +0200, Maxime Chevallier wrote:
>> Hi Vlad,
>>
>> On Sat, 14 May 2022 22:40:03 +0000
>> Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>>
>>> On Sat, May 14, 2022 at 05:06:53PM +0200, Maxime Chevallier wrote:
>>>> This tagging protocol is designed for the situation where the link
>>>> between the MAC and the Switch is designed such that the Destination
>>>> Port, which is usually embedded in some part of the Ethernet
>>>> Header, is sent out-of-band, and isn't present at all in the
>>>> Ethernet frame.
>>>>
>>>> This can happen when the MAC and Switch are tightly integrated on an
>>>> SoC, as is the case with the Qualcomm IPQ4019 for example, where
>>>> the DSA tag is inserted directly into the DMA descriptors. In that
>>>> case, the MAC driver is responsible for sending the tag to the
>>>> switch using the out-of-band medium. To do so, the MAC driver needs
>>>> to have the information of the destination port for that skb.
>>>>
>>>> This out-of-band tagging protocol is using the very beginning of
>>>> the skb headroom to store the tag. The drawback of this approach is
>>>> that the headroom isn't initialized upon allocating it, therefore
>>>> we have a chance that the garbage data that lies there at
>>>> allocation time actually resembles a valid oob tag. This is only
>>>> problematic if we are sending/receiving traffic on the master port,
>>>> which isn't a valid DSA use-case from the beginning. When dealing
>>>> with traffic to/from a slave port, then the oob tag will be
>>>> initialized properly by the tagger or the mac driver through the
>>>> use of the dsa_oob_tag_push() call.
>>>>
>>>> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
>>>> ---
>>>
>>> Why put the DSA pseudo-header at skb->head rather than push it using
>>> skb_push()? I thought you were going to check for the presence of a
>>> DSA header using something like skb->mac_len == ETH_HLEN + tag len,
>>> but right now it sounds like treating garbage in the headroom as a
>>> valid DSA tag is indeed a potential problem. If you can't sort that
>>> out using information from the header offsets alone, maybe an skb
>>> extension is required?
>>
>> Indeed, I thought of that, the main reason is that pushing/popping in
>> itself is not enough, you also have to move the whole mac_header to
>> leave room for the tag, and then re-set it in its original location.
>> There's nothing wrong with this, but it looked a bit cumbersome just to
>> insert a dummy tag that gets removed right away. Does that make sense?
> 
> You're thinking about inserting a header before the EtherType. But what
> has been said was to _prepend_ a header, i.e. put it before the Ethernet
> MAC DA. That way you don't need to move the Ethernet header.
> 
> But anyway, too much talk for mostly nothing, see below.
> 
>> But yes I would really like to get a way to know whether the tag is
>> there or not, I'll dig a bit more to see if I can find a way to get
>> this info from the various skb offsets in a reliable way.
> 
> Without an skb extension, this seems like an impossible task to me
> (which should also answer Florian's request for feedback on the proposal
> to share skb->cb with GRO, the qdisc, and whomever else there might be
> in the path between the DSA master and the switch).

Sorry I should have been clearer, the patch series that I pointed Maxime 
at earlier:

https://lore.kernel.org/lkml/1438322920.20182.144.camel@edumazet-glaptop2.roam.corp.google.com/T/

was initially accepted, only to be reverted later on because on 64-bit 
hosts there was not enough room in skb->cb[] to insert 4 bytes.

So yes, I think we need to allocate a custom SKB extension if we want to 
convey the tag, unless we somehow manage to put it in the linear portion 
of the SKB to avoid using any control buffer or extension.
-- 
Florian


* Re: [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol
  2022-05-19 17:11         ` Florian Fainelli
@ 2022-05-19 17:34           ` Vladimir Oltean
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2022-05-19 17:34 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Maxime Chevallier, davem, Rob Herring, netdev, linux-kernel,
	devicetree, thomas.petazzoni, Andrew Lunn, Heiner Kallweit,
	Russell King, linux-arm-kernel, Luka Perkov, Robert Marko

On Thu, May 19, 2022 at 10:11:13AM -0700, Florian Fainelli wrote:
> unless we somehow manage to put it in the linear portion of
> the SKB to avoid using any control buffer or extension.

But how? Essentially the DSA master has to look at a packet and
determine whether it came from DSA based on something which non-DSA
code could not have done. In fact, I'm looking at the calls to
skb_reset_mac_{header,len} from net/core/skbuff.c, specifically at VLAN
and MPLS, and I believe (but haven't tested) that pushing such headers
would also alter skb->mac_len to some value != ETH_HLEN. So simply
having the DSA master check whether DSA was there by checking whether
skb->mac_len is ETH_HLEN + DSA tag len could easily confuse DSA with
some other protocol of same header size.
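
The ambiguity can be shown with a small userspace sketch. It assumes,
as suspected above but not tested, that pushing a VLAN header leaves
mac_len at ETH_HLEN + VLAN_HLEN. The 4-byte OOB_TAG_LEN and the helper
looks_like_dsa() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define ETH_HLEN    14	/* Ethernet header length */
#define VLAN_HLEN    4	/* 802.1Q tag length */
#define OOB_TAG_LEN  4	/* hypothetical DSA tag length, assumption */

/* Naive detection: "a DSA tag is present if mac_len is ETH_HLEN + tag len".
 * The check only sees a length, not who produced the extra bytes.
 */
static bool looks_like_dsa(unsigned int mac_len)
{
	return mac_len == ETH_HLEN + OOB_TAG_LEN;
}
```

With these values, looks_like_dsa(ETH_HLEN + VLAN_HLEN) is true as
well, so a VLAN-tagged frame from non-DSA code would be taken for a
DSA-tagged one.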


Thread overview: 24+ messages
2022-05-14 15:06 [PATCH net-next v2 0/5] net: ipqess: introduce Qualcomm IPQESS driver Maxime Chevallier
2022-05-14 15:06 ` [PATCH net-next v2 1/5] net: ipqess: introduce the " Maxime Chevallier
2022-05-14 17:18   ` Russell King (Oracle)
2022-05-17  7:09     ` Maxime Chevallier
2022-05-14 20:44   ` Vladimir Oltean
2022-05-17  7:11     ` Maxime Chevallier
2022-05-16  2:51   ` Andrew Lunn
2022-05-17  7:13     ` Maxime Chevallier
2022-05-17 21:03   ` Christophe JAILLET
2022-05-14 15:06 ` [PATCH net-next v2 2/5] net: dsa: add out-of-band tagging protocol Maxime Chevallier
2022-05-14 16:33   ` Florian Fainelli
2022-05-17  7:06     ` Maxime Chevallier
2022-05-14 22:40   ` Vladimir Oltean
2022-05-17  7:01     ` Maxime Chevallier
2022-05-19 14:52       ` Vladimir Oltean
2022-05-19 17:11         ` Florian Fainelli
2022-05-19 17:34           ` Vladimir Oltean
2022-05-16 19:20   ` Jakub Kicinski
2022-05-17  6:53     ` Maxime Chevallier
2022-05-17 20:58       ` Jakub Kicinski
2022-05-14 15:06 ` [PATCH net-next v2 3/5] net: ipqess: Add out-of-band DSA tagging support Maxime Chevallier
2022-05-14 15:06 ` [PATCH net-next v2 4/5] net: dt-bindings: Introduce the Qualcomm IPQESS Ethernet controller Maxime Chevallier
2022-05-18  0:52   ` Rob Herring
2022-05-14 15:06 ` [PATCH net-next v2 5/5] ARM: dts: qcom: ipq4019: Add description for the " Maxime Chevallier
