Netdev Archive on lore.kernel.org
* [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
@ 2020-07-29 18:29 David Thompson
  2020-07-29 19:41 ` David Thompson
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: David Thompson @ 2020-07-29 18:29 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, jiri, David Thompson, Asmaa Mnebhi

This patch adds build and driver logic for the "mlxbf_gige"
Ethernet driver from Mellanox Technologies. The second
generation BlueField SoC from Mellanox supports an
out-of-band Gigabit Ethernet management port to the Arm
subsystem.  This driver supports TCP/IP network connectivity
for that port, and provides back-end routines to handle
basic ethtool requests.

The logic in "mlxbf_gige_main.c" is the core of the
driver, performing packet processing and handling
ethtool management requests.  The driver interfaces to
the Gigabit Ethernet block of the BlueField SoC via MMIO
accesses to registers, which contain control information
or pointers describing transmit and receive resources.
There is a single transmit queue, and the port supports
transmit ring sizes of 4 to 256 entries.  There is a
single receive queue, and the port supports receive
ring sizes of 32 to 32K entries.  The transmit and
receive rings are allocated from DMA coherent memory.
Each ring has a 16-bit producer index and a 16-bit
consumer index denoting software ownership and hardware
ownership, respectively.  The main driver supports the
handling of some basic ethtool requests: get driver
info, get/set ring parameters, get registers, and get
statistics.

The logic in "mlxbf_gige_mdio.c" implements the MDIO bus
driver, controlling the Mellanox BlueField hardware that
interacts with a PHY device via the MDIO/MDC pins.  This
driver does the following:
  - At driver probe time, it configures several BlueField
    MDIO parameters such as sample rate, full drive,
    voltage, and MDC, based on values read from the ACPI
    table.
  - It defines functions to read and write MDIO registers
    and registers the MDIO bus.
  - It defines the PHY interrupt handler that reports a
    link up/down status change.
  - This driver's probe is invoked from the main driver
    logic, while the PHY interrupt handler is registered
    in ndo_open.

Driver limitations
  - Only supports 1Gbps speed
  - Only supports GMII protocol
  - Supports maximum packet size of 2KB
  - Does not support scatter-gather buffering

Testing
  - Successful build of kernel for ARM64, ARM32, X86_64
  - Tested ARM64 build on FastModels & Palladium

Signed-off-by: David Thompson <dthompson@mellanox.com>
Signed-off-by: Asmaa Mnebhi <asmaa@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/Kconfig              |    1 +
 drivers/net/ethernet/mellanox/Makefile             |    1 +
 drivers/net/ethernet/mellanox/mlxbf_gige/Kconfig   |   13 +
 drivers/net/ethernet/mellanox/mlxbf_gige/Makefile  |    5 +
 .../net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h  |  156 +++
 .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c | 1277 ++++++++++++++++++++
 .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_mdio.c |  423 +++++++
 .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h |   73 ++
 8 files changed, 1949 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlxbf_gige/Kconfig
 create mode 100644 drivers/net/ethernet/mellanox/mlxbf_gige/Makefile
 create mode 100644 drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_mdio.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h

diff --git a/drivers/net/ethernet/mellanox/Kconfig b/drivers/net/ethernet/mellanox/Kconfig
index ff6613a..b4f66eb 100644
--- a/drivers/net/ethernet/mellanox/Kconfig
+++ b/drivers/net/ethernet/mellanox/Kconfig
@@ -22,5 +22,6 @@ source "drivers/net/ethernet/mellanox/mlx4/Kconfig"
 source "drivers/net/ethernet/mellanox/mlx5/core/Kconfig"
 source "drivers/net/ethernet/mellanox/mlxsw/Kconfig"
 source "drivers/net/ethernet/mellanox/mlxfw/Kconfig"
+source "drivers/net/ethernet/mellanox/mlxbf_gige/Kconfig"
 
 endif # NET_VENDOR_MELLANOX
diff --git a/drivers/net/ethernet/mellanox/Makefile b/drivers/net/ethernet/mellanox/Makefile
index 79773ac..d4b5f54 100644
--- a/drivers/net/ethernet/mellanox/Makefile
+++ b/drivers/net/ethernet/mellanox/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_MLX4_CORE) += mlx4/
 obj-$(CONFIG_MLX5_CORE) += mlx5/core/
 obj-$(CONFIG_MLXSW_CORE) += mlxsw/
 obj-$(CONFIG_MLXFW) += mlxfw/
+obj-$(CONFIG_MLXBF_GIGE) += mlxbf_gige/
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/Kconfig b/drivers/net/ethernet/mellanox/mlxbf_gige/Kconfig
new file mode 100644
index 0000000..73c5d74
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/Kconfig
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0-only OR Linux-OpenIB
+#
+# Mellanox GigE driver configuration
+#
+
+config MLXBF_GIGE
+	tristate "Mellanox Technologies BlueField Gigabit Ethernet support"
+	depends on (ARM64 || COMPILE_TEST) && ACPI && INET
+	select PHYLIB
+	help
+	  The second generation BlueField SoC from Mellanox Technologies
+	  supports an out-of-band Gigabit Ethernet management port to the
+	  Arm subsystem.
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/Makefile b/drivers/net/ethernet/mellanox/mlxbf_gige/Makefile
new file mode 100644
index 0000000..f6be6c6
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only OR Linux-OpenIB
+
+obj-$(CONFIG_MLXBF_GIGE) += mlxbf_gige.o
+
+mlxbf_gige-y := mlxbf_gige_main.o mlxbf_gige_mdio.o
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h
new file mode 100644
index 0000000..f89199d
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h
@@ -0,0 +1,156 @@
+/* SPDX-License-Identifier: GPL-2.0-only OR Linux-OpenIB */
+
+/* Header file for Gigabit Ethernet driver for Mellanox BlueField SoC
+ * - this file contains software data structures and any chip-specific
+ *   data structures (e.g. TX WQE format) that are memory resident.
+ *
+ * Copyright (c) 2020 Mellanox Technologies Ltd.
+ */
+
+#ifndef __MLXBF_GIGE_H__
+#define __MLXBF_GIGE_H__
+
+#include <linux/irqreturn.h>
+#include <linux/netdevice.h>
+
+/* The silicon design supports a maximum RX ring size of
+ * 32K entries. Based on current testing this maximum size
+ * is not required to be supported.  Instead the RX ring
+ * will be capped at a realistic value of 1024 entries.
+ */
+#define MLXBF_GIGE_MIN_RXQ_SZ     32
+#define MLXBF_GIGE_MAX_RXQ_SZ     1024
+#define MLXBF_GIGE_DEFAULT_RXQ_SZ 128
+
+#define MLXBF_GIGE_MIN_TXQ_SZ     4
+#define MLXBF_GIGE_MAX_TXQ_SZ     256
+#define MLXBF_GIGE_DEFAULT_TXQ_SZ 128
+
+#define MLXBF_GIGE_DEFAULT_BUF_SZ 2048
+
+/* There are four individual MAC RX filters. Currently
+ * two of them are being used: one for the broadcast MAC
+ * (index 0) and one for local MAC (index 1)
+ */
+#define MLXBF_GIGE_BCAST_MAC_FILTER_IDX 0
+#define MLXBF_GIGE_LOCAL_MAC_FILTER_IDX 1
+
+/* Define for broadcast MAC literal */
+#define BCAST_MAC_ADDR 0xFFFFFFFFFFFF
+
+/* There are three individual interrupts:
+ *   1) Errors, "OOB" interrupt line
+ *   2) Receive Packet, "OOB_LLU" interrupt line
+ *   3) LLU and PLU Events, "OOB_PLU" interrupt line
+ */
+#define MLXBF_GIGE_ERROR_INTR_IDX       0
+#define MLXBF_GIGE_RECEIVE_PKT_INTR_IDX 1
+#define MLXBF_GIGE_LLU_PLU_INTR_IDX     2
+#define MLXBF_GIGE_PHY_INT_N            3
+
+#define MLXBF_GIGE_MDIO_DEFAULT_PHY_ADDR 0x3
+
+struct mlxbf_gige_stats {
+	u64 hw_access_errors;
+	u64 tx_invalid_checksums;
+	u64 tx_small_frames;
+	u64 tx_index_errors;
+	u64 sw_config_errors;
+	u64 sw_access_errors;
+	u64 rx_truncate_errors;
+	u64 rx_mac_errors;
+	u64 rx_din_dropped_pkts;
+	u64 tx_fifo_full;
+	u64 rx_filter_passed_pkts;
+	u64 rx_filter_discard_pkts;
+};
+
+struct mlxbf_gige {
+	void __iomem *base;
+	void __iomem *llu_base;
+	void __iomem *plu_base;
+	struct device *dev;
+	struct net_device *netdev;
+	struct platform_device *pdev;
+	void __iomem *mdio_io;
+	struct mii_bus *mdiobus;
+	void __iomem *gpio_io;
+	void __iomem *cause_rsh_coalesce0_io;
+	void __iomem *cause_gpio_arm_coalesce0_io;
+	u32 phy_int_gpio_mask;
+	spinlock_t lock;
+	spinlock_t gpio_lock;
+	u16 rx_q_entries;
+	u16 tx_q_entries;
+	u64 *tx_wqe_base;
+	dma_addr_t tx_wqe_base_dma;
+	u64 *tx_wqe_next;
+	u64 *tx_cc;
+	dma_addr_t tx_cc_dma;
+	dma_addr_t *rx_wqe_base;
+	dma_addr_t rx_wqe_base_dma;
+	u64 *rx_cqe_base;
+	dma_addr_t rx_cqe_base_dma;
+	u16 tx_pi;
+	u16 prev_tx_ci;
+	u64 error_intr_count;
+	u64 rx_intr_count;
+	u64 llu_plu_intr_count;
+	u8 *rx_buf[MLXBF_GIGE_MAX_RXQ_SZ];
+	u8 *tx_buf[MLXBF_GIGE_MAX_TXQ_SZ];
+	int error_irq;
+	int rx_irq;
+	int llu_plu_irq;
+	bool promisc_enabled;
+	struct napi_struct napi;
+	struct mlxbf_gige_stats stats;
+};
+
+/* Rx Work Queue Element definitions */
+#define MLXBF_GIGE_RX_WQE_SZ                   8
+
+/* Rx Completion Queue Element definitions */
+#define MLXBF_GIGE_RX_CQE_SZ                   8
+#define MLXBF_GIGE_RX_CQE_PKT_LEN_MASK         GENMASK(10, 0)
+#define MLXBF_GIGE_RX_CQE_VALID_MASK           GENMASK(11, 11)
+#define MLXBF_GIGE_RX_CQE_PKT_STATUS_MASK      GENMASK(15, 12)
+#define MLXBF_GIGE_RX_CQE_PKT_STATUS_MAC_ERR   GENMASK(12, 12)
+#define MLXBF_GIGE_RX_CQE_PKT_STATUS_TRUNCATED GENMASK(13, 13)
+#define MLXBF_GIGE_RX_CQE_CHKSUM_MASK          GENMASK(31, 16)
+
+/* Tx Work Queue Element definitions */
+#define MLXBF_GIGE_TX_WQE_SZ_QWORDS            2
+#define MLXBF_GIGE_TX_WQE_SZ                   16
+#define MLXBF_GIGE_TX_WQE_PKT_LEN_MASK         GENMASK(10, 0)
+#define MLXBF_GIGE_TX_WQE_UPDATE_MASK          GENMASK(31, 31)
+#define MLXBF_GIGE_TX_WQE_CHKSUM_LEN_MASK      GENMASK(42, 32)
+#define MLXBF_GIGE_TX_WQE_CHKSUM_START_MASK    GENMASK(55, 48)
+#define MLXBF_GIGE_TX_WQE_CHKSUM_OFFSET_MASK   GENMASK(63, 56)
+
+/* Macro to return packet length of specified TX WQE */
+#define MLXBF_GIGE_TX_WQE_PKT_LEN(tx_wqe_addr) \
+	(*(tx_wqe_addr + 1) & MLXBF_GIGE_TX_WQE_PKT_LEN_MASK)
+
+/* Tx Completion Count */
+#define MLXBF_GIGE_TX_CC_SZ                    8
+
+/* List of resources in ACPI table */
+enum mlxbf_gige_res {
+	MLXBF_GIGE_RES_MAC,
+	MLXBF_GIGE_RES_MDIO9,
+	MLXBF_GIGE_RES_GPIO0,
+	MLXBF_GIGE_RES_CAUSE_RSH_COALESCE0,
+	MLXBF_GIGE_RES_CAUSE_GPIO_ARM_COALESCE0,
+	MLXBF_GIGE_RES_LLU,
+	MLXBF_GIGE_RES_PLU
+};
+
+/* Version of register data returned by mlxbf_gige_get_regs() */
+#define MLXBF_GIGE_REGS_VERSION 1
+
+int mlxbf_gige_mdio_probe(struct platform_device *pdev,
+			  struct mlxbf_gige *priv);
+void mlxbf_gige_mdio_remove(struct mlxbf_gige *priv);
+irqreturn_t mlxbf_gige_mdio_handle_phy_interrupt(struct mlxbf_gige *priv);
+
+#endif /* !defined(__MLXBF_GIGE_H__) */
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
new file mode 100644
index 0000000..a02e7a4
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
@@ -0,0 +1,1277 @@
+// SPDX-License-Identifier: GPL-2.0-only OR Linux-OpenIB
+
+/* Gigabit Ethernet driver for Mellanox BlueField SoC
+ *
+ * Copyright (c) 2020 Mellanox Technologies Ltd.
+ */
+
+#include <linux/acpi.h>
+#include <linux/device.h>
+#include <linux/etherdevice.h>
+#include <linux/interrupt.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/module.h>
+#include <linux/phy.h>
+#include <linux/platform_device.h>
+
+#include "mlxbf_gige.h"
+#include "mlxbf_gige_regs.h"
+
+#define DRV_NAME    "mlxbf_gige"
+
+static void mlxbf_gige_set_mac_rx_filter(struct mlxbf_gige *priv,
+					 unsigned int index, u64 dmac)
+{
+	void __iomem *base = priv->base;
+	u64 control;
+
+	/* Write destination MAC to specified MAC RX filter */
+	writeq(dmac, base + MLXBF_GIGE_RX_MAC_FILTER +
+	       (index * MLXBF_GIGE_RX_MAC_FILTER_STRIDE));
+
+	/* Enable MAC receive filter mask for specified index */
+	control = readq(base + MLXBF_GIGE_CONTROL);
+	control |= (MLXBF_GIGE_CONTROL_EN_SPECIFIC_MAC << index);
+	writeq(control, base + MLXBF_GIGE_CONTROL);
+}
+
+static int mlxbf_gige_get_mac_rx_filter(struct mlxbf_gige *priv,
+					unsigned int index, u64 *dmac)
+{
+	void __iomem *base = priv->base;
+
+	/* Read destination MAC from specified MAC RX filter */
+	*dmac = readq(base + MLXBF_GIGE_RX_MAC_FILTER +
+		      (index * MLXBF_GIGE_RX_MAC_FILTER_STRIDE));
+
+	return 0;
+}
+
+static void mlxbf_gige_enable_promisc(struct mlxbf_gige *priv)
+{
+	void __iomem *base = priv->base;
+	u64 control;
+
+	/* Enable MAC_ID_RANGE match functionality */
+	control = readq(base + MLXBF_GIGE_CONTROL);
+	control |= MLXBF_GIGE_CONTROL_MAC_ID_RANGE_EN;
+	writeq(control, base + MLXBF_GIGE_CONTROL);
+
+	/* Set start of destination MAC range check to 0 */
+	writeq(0, base + MLXBF_GIGE_RX_MAC_FILTER_DMAC_RANGE_START);
+
+	/* Set end of destination MAC range check to all FFs */
+	writeq(0xFFFFFFFFFFFF, base + MLXBF_GIGE_RX_MAC_FILTER_DMAC_RANGE_END);
+}
+
+static void mlxbf_gige_disable_promisc(struct mlxbf_gige *priv)
+{
+	void __iomem *base = priv->base;
+	u64 control;
+
+	/* Disable MAC_ID_RANGE match functionality */
+	control = readq(base + MLXBF_GIGE_CONTROL);
+	control &= ~MLXBF_GIGE_CONTROL_MAC_ID_RANGE_EN;
+	writeq(control, base + MLXBF_GIGE_CONTROL);
+
+	/* NOTE: no need to change DMAC_RANGE_START or END;
+	 * those values are ignored since MAC_ID_RANGE_EN=0
+	 */
+}
+
+/* Receive Initialization
+ * 1) Configures RX MAC filters via MMIO registers
+ * 2) Allocates RX WQE array using coherent DMA mapping
+ * 3) Initializes each element of RX WQE array with a receive
+ *    buffer pointer (also using coherent DMA mapping)
+ * 4) Allocates RX CQE array using coherent DMA mapping
+ * 5) Completes other misc receive initialization
+ */
+static int mlxbf_gige_rx_init(struct mlxbf_gige *priv)
+{
+	size_t wq_size, cq_size;
+	dma_addr_t *rx_wqe_ptr;
+	dma_addr_t rx_buf_dma;
+	u64 data;
+	int i, j;
+
+	/* Configure MAC RX filter #0 to allow RX of broadcast pkts */
+	mlxbf_gige_set_mac_rx_filter(priv, MLXBF_GIGE_BCAST_MAC_FILTER_IDX,
+				     BCAST_MAC_ADDR);
+
+	wq_size = MLXBF_GIGE_RX_WQE_SZ * priv->rx_q_entries;
+	priv->rx_wqe_base = dma_alloc_coherent(priv->dev, wq_size,
+					       &priv->rx_wqe_base_dma,
+					       GFP_KERNEL);
+	if (!priv->rx_wqe_base)
+		return -ENOMEM;
+
+	/* Initialize 'rx_wqe_ptr' to point to first RX WQE in array
+	 * Each RX WQE is simply a receive buffer pointer, so walk
+	 * the entire array, allocating a 2KB buffer for each element
+	 */
+	rx_wqe_ptr = priv->rx_wqe_base;
+
+	for (i = 0; i < priv->rx_q_entries; i++) {
+		/* Allocate a receive buffer for this RX WQE. The DMA
+		 * form (dma_addr_t) of the receive buffer address is
+		 * stored in the RX WQE array (via 'rx_wqe_ptr') where
+		 * it is accessible by the GigE device. The VA form of
+		 * the receive buffer is stored in 'rx_buf[]' array in
+		 * the driver private storage for housekeeping.
+		 */
+		priv->rx_buf[i] = dma_alloc_coherent(priv->dev,
+						     MLXBF_GIGE_DEFAULT_BUF_SZ,
+						     &rx_buf_dma,
+						     GFP_KERNEL);
+		if (!priv->rx_buf[i])
+			goto free_wqe_and_buf;
+
+		*rx_wqe_ptr++ = rx_buf_dma;
+	}
+
+	/* Write RX WQE base address into MMIO reg */
+	writeq(priv->rx_wqe_base_dma, priv->base + MLXBF_GIGE_RX_WQ_BASE);
+
+	cq_size = MLXBF_GIGE_RX_CQE_SZ * priv->rx_q_entries;
+	priv->rx_cqe_base = dma_alloc_coherent(priv->dev, cq_size,
+					       &priv->rx_cqe_base_dma,
+					       GFP_KERNEL);
+	if (!priv->rx_cqe_base)
+		goto free_wqe_and_buf;
+
+	/* Write RX CQE base address into MMIO reg */
+	writeq(priv->rx_cqe_base_dma, priv->base + MLXBF_GIGE_RX_CQ_BASE);
+
+	/* Write RX_WQE_PI with current number of replenished buffers */
+	writeq(priv->rx_q_entries, priv->base + MLXBF_GIGE_RX_WQE_PI);
+
+	/* Enable RX DMA to write new packets to memory */
+	writeq(MLXBF_GIGE_RX_DMA_EN, priv->base + MLXBF_GIGE_RX_DMA);
+
+	/* Enable removal of CRC during RX */
+	data = readq(priv->base + MLXBF_GIGE_RX);
+	data |= MLXBF_GIGE_RX_STRIP_CRC_EN;
+	writeq(data, priv->base + MLXBF_GIGE_RX);
+
+	/* Enable RX MAC filter pass and discard counters */
+	writeq(MLXBF_GIGE_RX_MAC_FILTER_COUNT_DISC_EN,
+	       priv->base + MLXBF_GIGE_RX_MAC_FILTER_COUNT_DISC);
+	writeq(MLXBF_GIGE_RX_MAC_FILTER_COUNT_PASS_EN,
+	       priv->base + MLXBF_GIGE_RX_MAC_FILTER_COUNT_PASS);
+
+	/* Clear MLXBF_GIGE_INT_MASK 'receive pkt' bit to
+	 * indicate readiness to receive pkts
+	 */
+	data = readq(priv->base + MLXBF_GIGE_INT_MASK);
+	data &= ~MLXBF_GIGE_INT_MASK_RX_RECEIVE_PACKET;
+	writeq(data, priv->base + MLXBF_GIGE_INT_MASK);
+
+	writeq(ilog2(priv->rx_q_entries),
+	       priv->base + MLXBF_GIGE_RX_WQE_SIZE_LOG2);
+
+	return 0;
+
+free_wqe_and_buf:
+	rx_wqe_ptr = priv->rx_wqe_base;
+	for (j = 0; j < i; j++) {
+		dma_free_coherent(priv->dev, MLXBF_GIGE_DEFAULT_BUF_SZ,
+				  priv->rx_buf[j], *rx_wqe_ptr);
+		rx_wqe_ptr++;
+	}
+	dma_free_coherent(priv->dev, wq_size,
+			  priv->rx_wqe_base, priv->rx_wqe_base_dma);
+	return -ENOMEM;
+}
+
+/* Transmit Initialization
+ * 1) Allocates TX WQE array using coherent DMA mapping
+ * 2) Allocates TX completion counter using coherent DMA mapping
+ */
+static int mlxbf_gige_tx_init(struct mlxbf_gige *priv)
+{
+	size_t size;
+
+	size = MLXBF_GIGE_TX_WQE_SZ * priv->tx_q_entries;
+	priv->tx_wqe_base = dma_alloc_coherent(priv->dev, size,
+					       &priv->tx_wqe_base_dma,
+					       GFP_KERNEL);
+	if (!priv->tx_wqe_base)
+		return -ENOMEM;
+
+	priv->tx_wqe_next = priv->tx_wqe_base;
+
+	/* Write TX WQE base address into MMIO reg */
+	writeq(priv->tx_wqe_base_dma, priv->base + MLXBF_GIGE_TX_WQ_BASE);
+
+	/* Allocate address for TX completion count */
+	priv->tx_cc = dma_alloc_coherent(priv->dev, MLXBF_GIGE_TX_CC_SZ,
+					 &priv->tx_cc_dma, GFP_KERNEL);
+
+	if (!priv->tx_cc) {
+		dma_free_coherent(priv->dev, size,
+				  priv->tx_wqe_base, priv->tx_wqe_base_dma);
+		return -ENOMEM;
+	}
+
+	/* Write TX CC base address into MMIO reg */
+	writeq(priv->tx_cc_dma, priv->base + MLXBF_GIGE_TX_CI_UPDATE_ADDRESS);
+
+	writeq(ilog2(priv->tx_q_entries),
+	       priv->base + MLXBF_GIGE_TX_WQ_SIZE_LOG2);
+
+	priv->prev_tx_ci = 0;
+	priv->tx_pi = 0;
+
+	return 0;
+}
+
+/* Receive Deinitialization
+ * This routine will free allocations done by mlxbf_gige_rx_init(),
+ * namely the RX WQE and RX CQE arrays, as well as all RX buffers
+ */
+static void mlxbf_gige_rx_deinit(struct mlxbf_gige *priv)
+{
+	dma_addr_t *rx_wqe_ptr;
+	size_t size;
+	int i;
+
+	rx_wqe_ptr = priv->rx_wqe_base;
+
+	for (i = 0; i < priv->rx_q_entries; i++) {
+		dma_free_coherent(priv->dev, MLXBF_GIGE_DEFAULT_BUF_SZ,
+				  priv->rx_buf[i], *rx_wqe_ptr);
+		priv->rx_buf[i] = NULL;
+		rx_wqe_ptr++;
+	}
+
+	size = MLXBF_GIGE_RX_WQE_SZ * priv->rx_q_entries;
+	dma_free_coherent(priv->dev, size,
+			  priv->rx_wqe_base, priv->rx_wqe_base_dma);
+
+	size = MLXBF_GIGE_RX_CQE_SZ * priv->rx_q_entries;
+	dma_free_coherent(priv->dev, size,
+			  priv->rx_cqe_base, priv->rx_cqe_base_dma);
+
+	priv->rx_wqe_base = 0;
+	priv->rx_wqe_base_dma = 0;
+	priv->rx_cqe_base = 0;
+	priv->rx_cqe_base_dma = 0;
+	writeq(0, priv->base + MLXBF_GIGE_RX_WQ_BASE);
+	writeq(0, priv->base + MLXBF_GIGE_RX_CQ_BASE);
+}
+
+/* Transmit Deinitialization
+ * This routine will free allocations done by mlxbf_gige_tx_init(),
+ * namely the TX WQE array and the TX completion counter
+ */
+static void mlxbf_gige_tx_deinit(struct mlxbf_gige *priv)
+{
+	u64 *tx_wqe_ptr;
+	size_t size;
+	int i;
+
+	tx_wqe_ptr = priv->tx_wqe_base;
+
+	for (i = 0; i < priv->tx_q_entries; i++) {
+		if (priv->tx_buf[i]) {
+			dma_free_coherent(priv->dev, MLXBF_GIGE_DEFAULT_BUF_SZ,
+					  priv->tx_buf[i], *tx_wqe_ptr);
+			priv->tx_buf[i] = NULL;
+		}
+		tx_wqe_ptr += 2;
+	}
+
+	size = MLXBF_GIGE_TX_WQE_SZ * priv->tx_q_entries;
+	dma_free_coherent(priv->dev, size,
+			  priv->tx_wqe_base, priv->tx_wqe_base_dma);
+
+	dma_free_coherent(priv->dev, MLXBF_GIGE_TX_CC_SZ,
+			  priv->tx_cc, priv->tx_cc_dma);
+
+	priv->tx_wqe_base = 0;
+	priv->tx_wqe_base_dma = 0;
+	priv->tx_cc = 0;
+	priv->tx_cc_dma = 0;
+	priv->tx_wqe_next = 0;
+	writeq(0, priv->base + MLXBF_GIGE_TX_WQ_BASE);
+	writeq(0, priv->base + MLXBF_GIGE_TX_CI_UPDATE_ADDRESS);
+}
+
+/* Start of struct ethtool_ops functions */
+static int mlxbf_gige_get_regs_len(struct net_device *netdev)
+{
+	/* Return size of MMIO register space (in bytes).
+	 *
+	 * NOTE: MLXBF_GIGE_MAC_CFG is the last defined register offset,
+	 * so use that plus size of single register to derive total size
+	 */
+	return MLXBF_GIGE_MAC_CFG + 8;
+}
+
+static void mlxbf_gige_get_regs(struct net_device *netdev,
+				struct ethtool_regs *regs, void *p)
+{
+	struct mlxbf_gige *priv = netdev_priv(netdev);
+	u64 *buff = p;
+	int reg;
+
+	regs->version = MLXBF_GIGE_REGS_VERSION;
+
+	/* Read entire MMIO register space and store results
+	 * into the provided buffer. Each 64-bit word is converted
+	 * to big-endian to make the output more readable.
+	 *
+	 * NOTE: by design, a read to an offset without an existing
+	 *       register will be acknowledged and return zero.
+	 */
+	for (reg = 0; reg <= MLXBF_GIGE_MAC_CFG; reg += 8)
+		*buff++ = cpu_to_be64(readq(priv->base + reg));
+}
+
+static void mlxbf_gige_get_ringparam(struct net_device *netdev,
+				     struct ethtool_ringparam *ering)
+{
+	struct mlxbf_gige *priv = netdev_priv(netdev);
+
+	memset(ering, 0, sizeof(*ering));
+	ering->rx_max_pending = MLXBF_GIGE_MAX_RXQ_SZ;
+	ering->tx_max_pending = MLXBF_GIGE_MAX_TXQ_SZ;
+	ering->rx_pending = priv->rx_q_entries;
+	ering->tx_pending = priv->tx_q_entries;
+}
+
+static int mlxbf_gige_set_ringparam(struct net_device *netdev,
+				    struct ethtool_ringparam *ering)
+{
+	const struct net_device_ops *ops = netdev->netdev_ops;
+	struct mlxbf_gige *priv = netdev_priv(netdev);
+	int new_rx_q_entries, new_tx_q_entries;
+
+	/* Device does not have separate queues for small/large frames */
+	if (ering->rx_mini_pending || ering->rx_jumbo_pending)
+		return -EINVAL;
+
+	/* Round up to supported values */
+	new_rx_q_entries = roundup_pow_of_two(ering->rx_pending);
+	new_tx_q_entries = roundup_pow_of_two(ering->tx_pending);
+
+	/* Range check the new values */
+	if (new_tx_q_entries < MLXBF_GIGE_MIN_TXQ_SZ ||
+	    new_tx_q_entries > MLXBF_GIGE_MAX_TXQ_SZ ||
+	    new_rx_q_entries < MLXBF_GIGE_MIN_RXQ_SZ ||
+	    new_rx_q_entries > MLXBF_GIGE_MAX_RXQ_SZ)
+		return -EINVAL;
+
+	/* If queue sizes did not change, exit now */
+	if (new_rx_q_entries == priv->rx_q_entries &&
+	    new_tx_q_entries == priv->tx_q_entries)
+		return 0;
+
+	if (netif_running(netdev))
+		ops->ndo_stop(netdev);
+
+	priv->rx_q_entries = new_rx_q_entries;
+	priv->tx_q_entries = new_tx_q_entries;
+
+	if (netif_running(netdev))
+		ops->ndo_open(netdev);
+
+	return 0;
+}
+
+static void mlxbf_gige_get_drvinfo(struct net_device *netdev,
+				   struct ethtool_drvinfo *info)
+{
+	strlcpy(info->driver, DRV_NAME, sizeof(info->driver));
+	strlcpy(info->bus_info, dev_name(&netdev->dev), sizeof(info->bus_info));
+}
+
+static const struct {
+	const char string[ETH_GSTRING_LEN];
+} mlxbf_gige_ethtool_stats_keys[] = {
+	{ "rx_bytes" },
+	{ "rx_packets" },
+	{ "tx_bytes" },
+	{ "tx_packets" },
+	{ "hw_access_errors" },
+	{ "tx_invalid_checksums" },
+	{ "tx_small_frames" },
+	{ "tx_index_errors" },
+	{ "sw_config_errors" },
+	{ "sw_access_errors" },
+	{ "rx_truncate_errors" },
+	{ "rx_mac_errors" },
+	{ "rx_din_dropped_pkts" },
+	{ "tx_fifo_full" },
+	{ "rx_filter_passed_pkts" },
+	{ "rx_filter_discard_pkts" },
+};
+
+static int mlxbf_gige_get_sset_count(struct net_device *netdev, int stringset)
+{
+	if (stringset != ETH_SS_STATS)
+		return -EOPNOTSUPP;
+	return ARRAY_SIZE(mlxbf_gige_ethtool_stats_keys);
+}
+
+static void mlxbf_gige_get_strings(struct net_device *netdev, u32 stringset,
+				   u8 *buf)
+{
+	if (stringset != ETH_SS_STATS)
+		return;
+	memcpy(buf, &mlxbf_gige_ethtool_stats_keys,
+	       sizeof(mlxbf_gige_ethtool_stats_keys));
+}
+
+static void mlxbf_gige_get_ethtool_stats(struct net_device *netdev,
+					 struct ethtool_stats *estats,
+					 u64 *data)
+{
+	struct mlxbf_gige *priv = netdev_priv(netdev);
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->lock, flags);
+
+	/* Fill data array with interface statistics
+	 *
+	 * NOTE: the data writes must be in
+	 *       sync with the strings shown in
+	 *       the mlxbf_gige_ethtool_stats_keys[] array
+	 *
+	 * NOTE2: certain statistics below are zeroed upon
+	 *        port disable, so the calculation below
+	 *        must include the "cached" value of the stat
+	 *        plus the value read directly from hardware.
+	 *        Cached statistics are currently:
+	 *          rx_din_dropped_pkts
+	 *          rx_filter_passed_pkts
+	 *          rx_filter_discard_pkts
+	 */
+	*data++ = netdev->stats.rx_bytes;
+	*data++ = netdev->stats.rx_packets;
+	*data++ = netdev->stats.tx_bytes;
+	*data++ = netdev->stats.tx_packets;
+	*data++ = priv->stats.hw_access_errors;
+	*data++ = priv->stats.tx_invalid_checksums;
+	*data++ = priv->stats.tx_small_frames;
+	*data++ = priv->stats.tx_index_errors;
+	*data++ = priv->stats.sw_config_errors;
+	*data++ = priv->stats.sw_access_errors;
+	*data++ = priv->stats.rx_truncate_errors;
+	*data++ = priv->stats.rx_mac_errors;
+	*data++ = (priv->stats.rx_din_dropped_pkts +
+		   readq(priv->base + MLXBF_GIGE_RX_DIN_DROP_COUNTER));
+	*data++ = priv->stats.tx_fifo_full;
+	*data++ = (priv->stats.rx_filter_passed_pkts +
+		   readq(priv->base + MLXBF_GIGE_RX_PASS_COUNTER_ALL));
+	*data++ = (priv->stats.rx_filter_discard_pkts +
+		   readq(priv->base + MLXBF_GIGE_RX_DISC_COUNTER_ALL));
+
+	spin_unlock_irqrestore(&priv->lock, flags);
+}
+
+static void mlxbf_gige_get_pauseparam(struct net_device *netdev,
+				      struct ethtool_pauseparam *pause)
+{
+	pause->autoneg = AUTONEG_ENABLE;
+	pause->rx_pause = 1;
+	pause->tx_pause = 1;
+}
+
+static int mlxbf_gige_get_link_ksettings(struct net_device *netdev,
+					 struct ethtool_link_ksettings *link_ksettings)
+{
+	struct phy_device *phydev = netdev->phydev;
+	u32 supported, advertising;
+	u32 lp_advertising = 0;
+	int status;
+
+	supported = SUPPORTED_TP | SUPPORTED_1000baseT_Full |
+		    SUPPORTED_Autoneg | SUPPORTED_Pause;
+
+	advertising = ADVERTISED_1000baseT_Full | ADVERTISED_Autoneg |
+		      ADVERTISED_Pause;
+
+	status = phy_read(phydev, MII_LPA);
+	if (status >= 0)
+		lp_advertising = mii_lpa_to_ethtool_lpa_t(status & 0xffff);
+
+	status = phy_read(phydev, MII_STAT1000);
+	if (status >= 0)
+		lp_advertising |= mii_stat1000_to_ethtool_lpa_t(status & 0xffff);
+
+	ethtool_convert_legacy_u32_to_link_mode(link_ksettings->link_modes.supported,
+						supported);
+	ethtool_convert_legacy_u32_to_link_mode(link_ksettings->link_modes.advertising,
+						advertising);
+	ethtool_convert_legacy_u32_to_link_mode(link_ksettings->link_modes.lp_advertising,
+						lp_advertising);
+
+	link_ksettings->base.autoneg = AUTONEG_ENABLE;
+	link_ksettings->base.speed = SPEED_1000;
+	link_ksettings->base.duplex = DUPLEX_FULL;
+	link_ksettings->base.port = PORT_TP;
+	link_ksettings->base.phy_address = MLXBF_GIGE_MDIO_DEFAULT_PHY_ADDR;
+	link_ksettings->base.transceiver = XCVR_INTERNAL;
+	link_ksettings->base.mdio_support = ETH_MDIO_SUPPORTS_C22;
+	link_ksettings->base.eth_tp_mdix = ETH_TP_MDI_INVALID;
+	link_ksettings->base.eth_tp_mdix_ctrl = ETH_TP_MDI_INVALID;
+
+	return 0;
+}
+
+static const struct ethtool_ops mlxbf_gige_ethtool_ops = {
+	.get_drvinfo		= mlxbf_gige_get_drvinfo,
+	.get_link		= ethtool_op_get_link,
+	.get_ringparam		= mlxbf_gige_get_ringparam,
+	.set_ringparam		= mlxbf_gige_set_ringparam,
+	.get_regs_len           = mlxbf_gige_get_regs_len,
+	.get_regs               = mlxbf_gige_get_regs,
+	.get_strings            = mlxbf_gige_get_strings,
+	.get_sset_count         = mlxbf_gige_get_sset_count,
+	.get_ethtool_stats      = mlxbf_gige_get_ethtool_stats,
+	.nway_reset		= phy_ethtool_nway_reset,
+	.get_pauseparam		= mlxbf_gige_get_pauseparam,
+	.get_link_ksettings	= mlxbf_gige_get_link_ksettings,
+};
+
+static void mlxbf_gige_handle_link_change(struct net_device *netdev)
+{
+	struct mlxbf_gige *priv = netdev_priv(netdev);
+	struct phy_device *phydev = netdev->phydev;
+	irqreturn_t ret;
+
+	ret = mlxbf_gige_mdio_handle_phy_interrupt(priv);
+	if (ret != IRQ_HANDLED)
+		return;
+
+	/* print new link status only if the interrupt came from the PHY */
+	phy_print_status(phydev);
+}
+
+/* Start of struct net_device_ops functions */
+static irqreturn_t mlxbf_gige_error_intr(int irq, void *dev_id)
+{
+	struct mlxbf_gige *priv;
+	u64 int_status;
+
+	priv = dev_id;
+
+	priv->error_intr_count++;
+
+	int_status = readq(priv->base + MLXBF_GIGE_INT_STATUS);
+
+	if (int_status & MLXBF_GIGE_INT_STATUS_HW_ACCESS_ERROR)
+		priv->stats.hw_access_errors++;
+
+	if (int_status & MLXBF_GIGE_INT_STATUS_TX_CHECKSUM_INPUTS) {
+		priv->stats.tx_invalid_checksums++;
+		/* This error condition is latched into MLXBF_GIGE_INT_STATUS
+		 * when the GigE silicon operates on the offending
+		 * TX WQE. The write to MLXBF_GIGE_INT_STATUS at the bottom
+		 * of this routine clears this error condition.
+		 */
+	}
+
+	if (int_status & MLXBF_GIGE_INT_STATUS_TX_SMALL_FRAME_SIZE) {
+		priv->stats.tx_small_frames++;
+		/* This condition happens when the networking stack invokes
+		 * this driver's "start_xmit()" method with a packet whose
+		 * size < 60 bytes.  The GigE silicon will automatically pad
+		 * this small frame up to a minimum-sized frame before it is
+		 * sent. The "tx_small_frame" condition is latched into the
+		 * MLXBF_GIGE_INT_STATUS register when the GigE silicon
+		 * operates on the offending TX WQE. The write to
+		 * MLXBF_GIGE_INT_STATUS at the bottom of this routine
+		 * clears this condition.
+		 */
+	}
+
+	if (int_status & MLXBF_GIGE_INT_STATUS_TX_PI_CI_EXCEED_WQ_SIZE)
+		priv->stats.tx_index_errors++;
+
+	if (int_status & MLXBF_GIGE_INT_STATUS_SW_CONFIG_ERROR)
+		priv->stats.sw_config_errors++;
+
+	if (int_status & MLXBF_GIGE_INT_STATUS_SW_ACCESS_ERROR)
+		priv->stats.sw_access_errors++;
+
+	/* Clear all error interrupts by writing '1' back to
+	 * all the asserted bits in INT_STATUS.  Do not write
+	 * '1' back to 'receive packet' bit, since that is
+	 * managed separately.
+	 */
+
+	int_status &= ~MLXBF_GIGE_INT_STATUS_RX_RECEIVE_PACKET;
+
+	writeq(int_status, priv->base + MLXBF_GIGE_INT_STATUS);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t mlxbf_gige_rx_intr(int irq, void *dev_id)
+{
+	struct mlxbf_gige *priv;
+
+	priv = dev_id;
+
+	priv->rx_intr_count++;
+
+	/* Driver has been interrupted because a new packet is available,
+	 * but do not process packets at this time.  Instead, disable any
+	 * further "packet rx" interrupts and tell the networking subsystem
+	 * to poll the driver to pick up all available packets.
+	 *
+	 * NOTE: GigE silicon automatically disables "packet rx" interrupt by
+	 *       setting MLXBF_GIGE_INT_MASK bit0 upon triggering the interrupt
+	 *       to the ARM cores.  Software needs to re-enable "packet rx"
+	 *       interrupts by clearing MLXBF_GIGE_INT_MASK bit0.
+	 */
+
+	/* Tell networking subsystem to poll GigE driver */
+	napi_schedule(&priv->napi);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t mlxbf_gige_llu_plu_intr(int irq, void *dev_id)
+{
+	struct mlxbf_gige *priv;
+
+	priv = dev_id;
+	priv->llu_plu_intr_count++;
+
+	return IRQ_HANDLED;
+}
+
+/* Function that returns status of TX ring:
+ *          0: TX ring is full, i.e. there are no
+ *             unused entries in the TX ring.
+ *   non-zero: TX ring is not full, i.e. there are
+ *             some available entries in the TX ring.
+ *             The non-zero value is a measure of
+ *             how many TX entries are available, but
+ *             it is not the exact number of available
+ *             entries (see below).
+ *
+ * The algorithm makes the assumption that if
+ * (prev_tx_ci == tx_pi) then the TX ring is empty.
+ * An empty ring actually has (tx_q_entries-1)
+ * usable entries, which allows the algorithm to
+ * differentiate the case of an empty ring vs. a full ring.
+ */
+static u16 mlxbf_gige_tx_buffs_avail(struct mlxbf_gige *priv)
+{
+	unsigned long flags;
+	u16 avail;
+
+	spin_lock_irqsave(&priv->lock, flags);
+
+	if (priv->prev_tx_ci == priv->tx_pi)
+		avail = priv->tx_q_entries - 1;
+	else
+		avail = ((priv->tx_q_entries + priv->prev_tx_ci - priv->tx_pi)
+			  % priv->tx_q_entries) - 1;
+
+	spin_unlock_irqrestore(&priv->lock, flags);
+
+	return avail;
+}
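[Editor's note, not part of the patch: the modular arithmetic above can be sketched as a standalone C snippet, with `prev_ci`, `pi`, and `q_entries` standing in for the driver fields.]

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the availability math in mlxbf_gige_tx_buffs_avail():
 * one ring entry is sacrificed so that (prev_ci == pi) can only
 * mean "empty", never "full".
 */
static uint16_t tx_buffs_avail(uint16_t prev_ci, uint16_t pi,
			       uint16_t q_entries)
{
	if (prev_ci == pi)
		return q_entries - 1;

	return ((q_entries + prev_ci - pi) % q_entries) - 1;
}
```

For example, with a 128-entry ring an empty ring reports 127 available entries, and a ring whose producer index is one ahead of the consumer reports 126.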
+
+static bool mlxbf_gige_handle_tx_complete(struct mlxbf_gige *priv)
+{
+	struct net_device_stats *stats;
+	u16 tx_wqe_index;
+	u64 *tx_wqe_addr;
+	u64 tx_status;
+	u16 tx_ci;
+
+	tx_status = readq(priv->base + MLXBF_GIGE_TX_STATUS);
+	if (tx_status & MLXBF_GIGE_TX_STATUS_DATA_FIFO_FULL)
+		priv->stats.tx_fifo_full++;
+	tx_ci = readq(priv->base + MLXBF_GIGE_TX_CONSUMER_INDEX);
+	stats = &priv->netdev->stats;
+
+	/* Transmit completion logic needs to loop until the completion
+	 * index (in SW) equals TX consumer index (from HW).  These
+	 * parameters are unsigned 16-bit values and the wrap case needs
+	 * to be supported, that is TX consumer index wrapped from 0xFFFF
+	 * to 0 while TX completion index is still < 0xFFFF.
+	 */
+	for (; priv->prev_tx_ci != tx_ci; priv->prev_tx_ci++) {
+		tx_wqe_index = priv->prev_tx_ci % priv->tx_q_entries;
+		/* Each TX WQE is 16 bytes.  The first 64-bit word
+		 * stores the 2KB TX buffer address and the second
+		 * 64-bit word contains metadata about the TX WQE.
+		 */
+		tx_wqe_addr = priv->tx_wqe_base +
+			       (tx_wqe_index * MLXBF_GIGE_TX_WQE_SZ_QWORDS);
+
+		stats->tx_packets++;
+		stats->tx_bytes += MLXBF_GIGE_TX_WQE_PKT_LEN(tx_wqe_addr);
+		dma_free_coherent(priv->dev, MLXBF_GIGE_DEFAULT_BUF_SZ,
+				  priv->tx_buf[tx_wqe_index], *tx_wqe_addr);
+		priv->tx_buf[tx_wqe_index] = NULL;
+	}
+
+	/* Since the TX ring was likely just drained, check if TX queue
+	 * had previously been stopped and now that there are TX buffers
+	 * available the TX queue can be awakened.
+	 */
+	if (netif_queue_stopped(priv->netdev) &&
+	    mlxbf_gige_tx_buffs_avail(priv)) {
+		netif_wake_queue(priv->netdev);
+	}
+
+	return true;
+}
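[Editor's note, not part of the patch: the 16-bit wrap handling in the completion loop above can be demonstrated in isolation; iterating `for (; ci != target; ci++)` on a u16 works across the 0xFFFF to 0 wrap because the increment itself wraps modulo 2^16.]

```c
#include <assert.h>
#include <stdint.h>

/* Count how many completions the loop in
 * mlxbf_gige_handle_tx_complete() would process, including
 * the case where the hardware consumer index has wrapped.
 */
static unsigned int completions(uint16_t ci, uint16_t target)
{
	unsigned int n = 0;

	for (; ci != target; ci++)
		n++;
	return n;
}
```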
+
+static bool mlxbf_gige_rx_packet(struct mlxbf_gige *priv, int *rx_pkts)
+{
+	struct net_device *netdev = priv->netdev;
+	u16 rx_pi_rem, rx_ci_rem;
+	struct sk_buff *skb;
+	u64 *rx_cqe_addr;
+	u64 datalen;
+	u64 rx_cqe;
+	u16 rx_ci;
+	u16 rx_pi;
+	u8 *pktp;
+
+	/* Index into RX buffer array is rx_pi w/wrap based on rx_q_entries */
+	rx_pi = readq(priv->base + MLXBF_GIGE_RX_WQE_PI);
+	rx_pi_rem = rx_pi % priv->rx_q_entries;
+	pktp = priv->rx_buf[rx_pi_rem];
+	rx_cqe_addr = priv->rx_cqe_base + rx_pi_rem;
+	rx_cqe = *rx_cqe_addr;
+	datalen = rx_cqe & MLXBF_GIGE_RX_CQE_PKT_LEN_MASK;
+
+	if ((rx_cqe & MLXBF_GIGE_RX_CQE_PKT_STATUS_MASK) == 0) {
+		/* Packet is OK, increment stats */
+		netdev->stats.rx_packets++;
+		netdev->stats.rx_bytes += datalen;
+
+		skb = dev_alloc_skb(datalen);
+		if (!skb) {
+			netdev->stats.rx_dropped++;
+			return false;
+		}
+
+		memcpy(skb_put(skb, datalen), pktp, datalen);
+
+		skb->dev = netdev;
+		skb->protocol = eth_type_trans(skb, netdev);
+		skb->ip_summed = CHECKSUM_NONE; /* device did not checksum packet */
+
+		netif_receive_skb(skb);
+	} else if (rx_cqe & MLXBF_GIGE_RX_CQE_PKT_STATUS_MAC_ERR) {
+		priv->stats.rx_mac_errors++;
+	} else if (rx_cqe & MLXBF_GIGE_RX_CQE_PKT_STATUS_TRUNCATED) {
+		priv->stats.rx_truncate_errors++;
+	}
+
+	/* Let hardware know we've replenished one buffer */
+	writeq(rx_pi + 1, priv->base + MLXBF_GIGE_RX_WQE_PI);
+
+	(*rx_pkts)++;
+	rx_pi = readq(priv->base + MLXBF_GIGE_RX_WQE_PI);
+	rx_pi_rem = rx_pi % priv->rx_q_entries;
+	rx_ci = readq(priv->base + MLXBF_GIGE_RX_CQE_PACKET_CI);
+	rx_ci_rem = rx_ci % priv->rx_q_entries;
+
+	return rx_pi_rem != rx_ci_rem;
+}
+
+/* Driver poll() function called by NAPI infrastructure */
+static int mlxbf_gige_poll(struct napi_struct *napi, int budget)
+{
+	struct mlxbf_gige *priv;
+	bool remaining_pkts;
+	int work_done = 0;
+	u64 data;
+
+	priv = container_of(napi, struct mlxbf_gige, napi);
+
+	mlxbf_gige_handle_tx_complete(priv);
+
+	do {
+		remaining_pkts = mlxbf_gige_rx_packet(priv, &work_done);
+	} while (remaining_pkts && work_done < budget);
+
+	/* If amount of work done < budget, turn off NAPI polling
+	 * via napi_complete_done(napi, work_done) and then
+	 * re-enable interrupts.
+	 */
+	if (work_done < budget && napi_complete_done(napi, work_done)) {
+		/* Clear MLXBF_GIGE_INT_MASK 'receive pkt' bit to
+		 * indicate receive readiness
+		 */
+		data = readq(priv->base + MLXBF_GIGE_INT_MASK);
+		data &= ~MLXBF_GIGE_INT_MASK_RX_RECEIVE_PACKET;
+		writeq(data, priv->base + MLXBF_GIGE_INT_MASK);
+	}
+
+	return work_done;
+}
+
+static int mlxbf_gige_request_irqs(struct mlxbf_gige *priv)
+{
+	int err;
+
+	err = devm_request_irq(priv->dev, priv->error_irq,
+			       mlxbf_gige_error_intr, 0, "mlxbf_gige_error",
+			       priv);
+	if (err) {
+		dev_err(priv->dev, "Request error_irq failure\n");
+		return err;
+	}
+
+	err = devm_request_irq(priv->dev, priv->rx_irq,
+			       mlxbf_gige_rx_intr, 0, "mlxbf_gige_rx",
+			       priv);
+	if (err) {
+		dev_err(priv->dev, "Request rx_irq failure\n");
+		return err;
+	}
+
+	err = devm_request_irq(priv->dev, priv->llu_plu_irq,
+			       mlxbf_gige_llu_plu_intr, 0, "mlxbf_gige_llu_plu",
+			       priv);
+	if (err) {
+		dev_err(priv->dev, "Request llu_plu_irq failure\n");
+		return err;
+	}
+
+	return 0;
+}
+
+static void mlxbf_gige_free_irqs(struct mlxbf_gige *priv)
+{
+	devm_free_irq(priv->dev, priv->error_irq, priv);
+	devm_free_irq(priv->dev, priv->rx_irq, priv);
+	devm_free_irq(priv->dev, priv->llu_plu_irq, priv);
+}
+
+static void mlxbf_gige_cache_stats(struct mlxbf_gige *priv)
+{
+	struct mlxbf_gige_stats *p;
+
+	/* Cache stats that will be cleared by clean port operation */
+	p = &priv->stats;
+	p->rx_din_dropped_pkts += readq(priv->base +
+					MLXBF_GIGE_RX_DIN_DROP_COUNTER);
+	p->rx_filter_passed_pkts += readq(priv->base +
+					  MLXBF_GIGE_RX_PASS_COUNTER_ALL);
+	p->rx_filter_discard_pkts += readq(priv->base +
+					   MLXBF_GIGE_RX_DISC_COUNTER_ALL);
+}
+
+static void mlxbf_gige_clean_port(struct mlxbf_gige *priv)
+{
+	u64 control, status;
+	int cnt;
+
+	/* Set the CLEAN_PORT_EN bit to trigger SW reset */
+	control = readq(priv->base + MLXBF_GIGE_CONTROL);
+	control |= MLXBF_GIGE_CONTROL_CLEAN_PORT_EN;
+	writeq(control, priv->base + MLXBF_GIGE_CONTROL);
+
+	/* Loop waiting for status ready bit to assert */
+	cnt = 1000;
+	do {
+		status = readq(priv->base + MLXBF_GIGE_STATUS);
+		if (status & MLXBF_GIGE_STATUS_READY)
+			break;
+		usleep_range(50, 100);
+	} while (--cnt > 0);
+
+	/* Clear the CLEAN_PORT_EN bit at end of this loop */
+	control = readq(priv->base + MLXBF_GIGE_CONTROL);
+	control &= ~MLXBF_GIGE_CONTROL_CLEAN_PORT_EN;
+	writeq(control, priv->base + MLXBF_GIGE_CONTROL);
+}
+
+static int mlxbf_gige_open(struct net_device *netdev)
+{
+	struct mlxbf_gige *priv = netdev_priv(netdev);
+	struct phy_device *phydev = netdev->phydev;
+	u64 int_en;
+	int err;
+
+	mlxbf_gige_cache_stats(priv);
+	mlxbf_gige_clean_port(priv);
+	mlxbf_gige_rx_init(priv);
+	mlxbf_gige_tx_init(priv);
+	netif_napi_add(netdev, &priv->napi, mlxbf_gige_poll, NAPI_POLL_WEIGHT);
+	napi_enable(&priv->napi);
+	netif_start_queue(netdev);
+
+	err = mlxbf_gige_request_irqs(priv);
+	if (err)
+		return err;
+
+	phy_start(phydev);
+
+	/* Set bits in INT_EN that we care about */
+	int_en = MLXBF_GIGE_INT_EN_HW_ACCESS_ERROR |
+		 MLXBF_GIGE_INT_EN_TX_CHECKSUM_INPUTS |
+		 MLXBF_GIGE_INT_EN_TX_SMALL_FRAME_SIZE |
+		 MLXBF_GIGE_INT_EN_TX_PI_CI_EXCEED_WQ_SIZE |
+		 MLXBF_GIGE_INT_EN_SW_CONFIG_ERROR |
+		 MLXBF_GIGE_INT_EN_SW_ACCESS_ERROR |
+		 MLXBF_GIGE_INT_EN_RX_RECEIVE_PACKET;
+	writeq(int_en, priv->base + MLXBF_GIGE_INT_EN);
+
+	return 0;
+}
+
+static int mlxbf_gige_stop(struct net_device *netdev)
+{
+	struct mlxbf_gige *priv = netdev_priv(netdev);
+
+	writeq(0, priv->base + MLXBF_GIGE_INT_EN);
+	netif_stop_queue(netdev);
+	napi_disable(&priv->napi);
+	netif_napi_del(&priv->napi);
+	mlxbf_gige_free_irqs(priv);
+
+	if (netdev->phydev)
+		phy_stop(netdev->phydev);
+
+	mlxbf_gige_rx_deinit(priv);
+	mlxbf_gige_tx_deinit(priv);
+	mlxbf_gige_cache_stats(priv);
+	mlxbf_gige_clean_port(priv);
+
+	return 0;
+}
+
+/* Function to advance the tx_wqe_next pointer to next TX WQE */
+static void mlxbf_gige_update_tx_wqe_next(struct mlxbf_gige *priv)
+{
+	/* Advance tx_wqe_next pointer */
+	priv->tx_wqe_next += MLXBF_GIGE_TX_WQE_SZ_QWORDS;
+
+	/* If 'next' pointer is beyond end of TX ring,
+	 * set it back to 'base' pointer of ring
+	 */
+	if (priv->tx_wqe_next == (priv->tx_wqe_base +
+				  (priv->tx_q_entries * MLXBF_GIGE_TX_WQE_SZ_QWORDS)))
+		priv->tx_wqe_next = priv->tx_wqe_base;
+}
+
+static netdev_tx_t mlxbf_gige_start_xmit(struct sk_buff *skb,
+					 struct net_device *netdev)
+{
+	struct mlxbf_gige *priv = netdev_priv(netdev);
+	dma_addr_t tx_buf_dma;
+	u8 *tx_buf = NULL;
+	u64 *tx_wqe_addr;
+	u64 word2;
+
+	/* Check that there is room left in TX ring */
+	if (!mlxbf_gige_tx_buffs_avail(priv)) {
+		/* TX ring is full; inform the stack but do not free
+		 * the skb, since the stack will retry it later.  Do
+		 * not count this as a drop for the same reason.
+		 */
+		netif_stop_queue(netdev);
+		return NETDEV_TX_BUSY;
+	}
+
+	/* Allocate ptr for buffer */
+	if (skb->len < MLXBF_GIGE_DEFAULT_BUF_SZ)
+		tx_buf = dma_alloc_coherent(priv->dev, MLXBF_GIGE_DEFAULT_BUF_SZ,
+					    &tx_buf_dma, GFP_KERNEL);
+
+	if (!tx_buf) {
+		/* Free incoming skb, could not alloc TX buffer */
+		dev_kfree_skb(skb);
+		netdev->stats.tx_dropped++;
+		return NETDEV_TX_OK;
+	}
+
+	priv->tx_buf[priv->tx_pi % priv->tx_q_entries] = tx_buf;
+
+	/* Copy data from skb to allocated TX buffer
+	 *
+	 * NOTE: GigE silicon will automatically pad up to
+	 *       minimum packet length if needed.
+	 */
+	skb_copy_bits(skb, 0, tx_buf, skb->len);
+
+	/* Get address of TX WQE */
+	tx_wqe_addr = priv->tx_wqe_next;
+
+	mlxbf_gige_update_tx_wqe_next(priv);
+
+	/* Put PA of buffer address into first 64-bit word of TX WQE */
+	*tx_wqe_addr = tx_buf_dma;
+
+	/* Set TX WQE pkt_len appropriately */
+	word2 = skb->len & MLXBF_GIGE_TX_WQE_PKT_LEN_MASK;
+
+	/* Write entire 2nd word of TX WQE */
+	*(tx_wqe_addr + 1) = word2;
+
+	priv->tx_pi++;
+
+	/* Create memory barrier before write to TX PI */
+	wmb();
+
+	writeq(priv->tx_pi, priv->base + MLXBF_GIGE_TX_PRODUCER_INDEX);
+
+	/* Free incoming skb, contents already copied to HW */
+	dev_kfree_skb(skb);
+
+	return NETDEV_TX_OK;
+}
+
+static int mlxbf_gige_do_ioctl(struct net_device *netdev,
+			       struct ifreq *ifr, int cmd)
+{
+	if (!netif_running(netdev))
+		return -EINVAL;
+
+	return phy_mii_ioctl(netdev->phydev, ifr, cmd);
+}
+
+static void mlxbf_gige_set_rx_mode(struct net_device *netdev)
+{
+	struct mlxbf_gige *priv = netdev_priv(netdev);
+	bool new_promisc_enabled;
+
+	new_promisc_enabled = netdev->flags & IFF_PROMISC;
+
+	/* Only write to the hardware registers if the new setting
+	 * of promiscuous mode is different from the current one.
+	 */
+	if (new_promisc_enabled != priv->promisc_enabled) {
+		priv->promisc_enabled = new_promisc_enabled;
+
+		if (new_promisc_enabled)
+			mlxbf_gige_enable_promisc(priv);
+		else
+			mlxbf_gige_disable_promisc(priv);
+	}
+}
+
+static const struct net_device_ops mlxbf_gige_netdev_ops = {
+	.ndo_open		= mlxbf_gige_open,
+	.ndo_stop		= mlxbf_gige_stop,
+	.ndo_start_xmit		= mlxbf_gige_start_xmit,
+	.ndo_set_mac_address	= eth_mac_addr,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_do_ioctl		= mlxbf_gige_do_ioctl,
+	.ndo_set_rx_mode        = mlxbf_gige_set_rx_mode,
+};
+
+static u64 mlxbf_gige_mac_to_u64(u8 *addr)
+{
+	u64 mac = 0;
+	int i;
+
+	for (i = 0; i < ETH_ALEN; i++) {
+		mac <<= 8;
+		mac |= addr[i];
+	}
+	return mac;
+}
+
+static void mlxbf_gige_u64_to_mac(u8 *addr, u64 mac)
+{
+	int i;
+
+	for (i = ETH_ALEN; i > 0; i--) {
+		addr[i - 1] = mac & 0xFF;
+		mac >>= 8;
+	}
+}
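[Editor's note, not part of the patch: the two conversions above pack the MAC address big-endian into the low 48 bits of a u64, with `addr[0]` becoming the most significant of the six bytes. A standalone round-trip sketch:]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Pack a 6-byte MAC into the low 48 bits of a u64, addr[0] first */
static uint64_t mac_to_u64(const uint8_t *addr)
{
	uint64_t mac = 0;
	int i;

	for (i = 0; i < ETH_ALEN; i++) {
		mac <<= 8;
		mac |= addr[i];
	}
	return mac;
}

/* Unpack the low 48 bits of a u64 back into a 6-byte MAC */
static void u64_to_mac(uint8_t *addr, uint64_t mac)
{
	int i;

	for (i = ETH_ALEN; i > 0; i--) {
		addr[i - 1] = mac & 0xFF;
		mac >>= 8;
	}
}
```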
+
+static void mlxbf_gige_initial_mac(struct mlxbf_gige *priv)
+{
+	u8 mac[ETH_ALEN];
+	u64 local_mac;
+	int status;
+
+	status = mlxbf_gige_get_mac_rx_filter(priv, MLXBF_GIGE_LOCAL_MAC_FILTER_IDX,
+					      &local_mac);
+	mlxbf_gige_u64_to_mac(mac, local_mac);
+
+	if (is_valid_ether_addr(mac)) {
+		ether_addr_copy(priv->netdev->dev_addr, mac);
+	} else {
+		/* Provide a random MAC if for some reason the device has
+		 * not been configured with a valid MAC address already.
+		 */
+		eth_hw_addr_random(priv->netdev);
+	}
+
+	local_mac = mlxbf_gige_mac_to_u64(priv->netdev->dev_addr);
+	mlxbf_gige_set_mac_rx_filter(priv, MLXBF_GIGE_LOCAL_MAC_FILTER_IDX,
+				     local_mac);
+}
+
+static int mlxbf_gige_probe(struct platform_device *pdev)
+{
+	struct phy_device *phydev;
+	struct net_device *netdev;
+	struct resource *mac_res;
+	struct resource *llu_res;
+	struct resource *plu_res;
+	struct mlxbf_gige *priv;
+	void __iomem *llu_base;
+	void __iomem *plu_base;
+	void __iomem *base;
+	u64 control;
+	int err = 0;
+
+	mac_res = platform_get_resource(pdev, IORESOURCE_MEM, MLXBF_GIGE_RES_MAC);
+	if (!mac_res)
+		return -ENXIO;
+
+	base = devm_ioremap_resource(&pdev->dev, mac_res);
+	if (IS_ERR(base))
+		return PTR_ERR(base);
+
+	llu_res = platform_get_resource(pdev, IORESOURCE_MEM, MLXBF_GIGE_RES_LLU);
+	if (!llu_res)
+		return -ENXIO;
+
+	llu_base = devm_ioremap_resource(&pdev->dev, llu_res);
+	if (IS_ERR(llu_base))
+		return PTR_ERR(llu_base);
+
+	plu_res = platform_get_resource(pdev, IORESOURCE_MEM, MLXBF_GIGE_RES_PLU);
+	if (!plu_res)
+		return -ENXIO;
+
+	plu_base = devm_ioremap_resource(&pdev->dev, plu_res);
+	if (IS_ERR(plu_base))
+		return PTR_ERR(plu_base);
+
+	/* Perform general init of GigE block */
+	control = readq(base + MLXBF_GIGE_CONTROL);
+	control |= MLXBF_GIGE_CONTROL_PORT_EN;
+	writeq(control, base + MLXBF_GIGE_CONTROL);
+
+	netdev = devm_alloc_etherdev(&pdev->dev, sizeof(*priv));
+	if (!netdev)
+		return -ENOMEM;
+
+	SET_NETDEV_DEV(netdev, &pdev->dev);
+	netdev->netdev_ops = &mlxbf_gige_netdev_ops;
+	netdev->ethtool_ops = &mlxbf_gige_ethtool_ops;
+	priv = netdev_priv(netdev);
+	priv->netdev = netdev;
+
+	platform_set_drvdata(pdev, priv);
+	priv->dev = &pdev->dev;
+	priv->pdev = pdev;
+
+	spin_lock_init(&priv->lock);
+	spin_lock_init(&priv->gpio_lock);
+
+	/* Attach MDIO device */
+	err = mlxbf_gige_mdio_probe(pdev, priv);
+	if (err)
+		return err;
+
+	priv->base = base;
+	priv->llu_base = llu_base;
+	priv->plu_base = plu_base;
+
+	priv->rx_q_entries = MLXBF_GIGE_DEFAULT_RXQ_SZ;
+	priv->tx_q_entries = MLXBF_GIGE_DEFAULT_TXQ_SZ;
+
+	/* Write initial MAC address to hardware */
+	mlxbf_gige_initial_mac(priv);
+
+	err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+	if (err) {
+		dev_err(&pdev->dev, "DMA configuration failed: %d\n", err);
+		return err;
+	}
+
+	priv->error_irq = platform_get_irq(pdev, MLXBF_GIGE_ERROR_INTR_IDX);
+	priv->rx_irq = platform_get_irq(pdev, MLXBF_GIGE_RECEIVE_PKT_INTR_IDX);
+	priv->llu_plu_irq = platform_get_irq(pdev, MLXBF_GIGE_LLU_PLU_INTR_IDX);
+
+	phydev = phy_find_first(priv->mdiobus);
+	if (!phydev)
+		return -EIO;
+
+	/* Sets netdev->phydev to phydev; which will eventually
+	 * be used in ioctl calls.
+	 */
+	err = phy_connect_direct(netdev, phydev,
+				 mlxbf_gige_handle_link_change,
+				 PHY_INTERFACE_MODE_GMII);
+	if (err) {
+		dev_err(&pdev->dev, "Could not attach to PHY\n");
+		return err;
+	}
+
+	/* MAC only supports 1000T full duplex mode */
+	phy_remove_link_mode(phydev, ETHTOOL_LINK_MODE_1000baseT_Half_BIT);
+	phy_remove_link_mode(phydev, ETHTOOL_LINK_MODE_100baseT_Full_BIT);
+	phy_remove_link_mode(phydev, ETHTOOL_LINK_MODE_100baseT_Half_BIT);
+	phy_remove_link_mode(phydev, ETHTOOL_LINK_MODE_10baseT_Full_BIT);
+	phy_remove_link_mode(phydev, ETHTOOL_LINK_MODE_10baseT_Half_BIT);
+
+	/* MAC supports symmetric flow control */
+	phy_support_sym_pause(phydev);
+
+	/* Display information about attached PHY device */
+	phy_attached_info(phydev);
+
+	err = register_netdev(netdev);
+	if (err) {
+		dev_err(&pdev->dev, "Failed to register netdev\n");
+		phy_disconnect(phydev);
+		return err;
+	}
+
+	return 0;
+}
+
+static int mlxbf_gige_remove(struct platform_device *pdev)
+{
+	struct mlxbf_gige *priv = platform_get_drvdata(pdev);
+
+	unregister_netdev(priv->netdev);
+	phy_disconnect(priv->netdev->phydev);
+	mlxbf_gige_mdio_remove(priv);
+	platform_set_drvdata(pdev, NULL);
+
+	return 0;
+}
+
+static void mlxbf_gige_shutdown(struct platform_device *pdev)
+{
+	struct mlxbf_gige *priv = platform_get_drvdata(pdev);
+
+	writeq(0, priv->base + MLXBF_GIGE_INT_EN);
+	mlxbf_gige_clean_port(priv);
+}
+
+static const struct acpi_device_id mlxbf_gige_acpi_match[] = {
+	{ "MLNXBF17", 0 },
+	{},
+};
+MODULE_DEVICE_TABLE(acpi, mlxbf_gige_acpi_match);
+
+static struct platform_driver mlxbf_gige_driver = {
+	.probe = mlxbf_gige_probe,
+	.remove = mlxbf_gige_remove,
+	.shutdown = mlxbf_gige_shutdown,
+	.driver = {
+		.name = DRV_NAME,
+		.acpi_match_table = ACPI_PTR(mlxbf_gige_acpi_match),
+	},
+};
+
+module_platform_driver(mlxbf_gige_driver);
+
+MODULE_DESCRIPTION("Mellanox BlueField SoC Gigabit Ethernet Driver");
+MODULE_AUTHOR("David Thompson <dthompson@mellanox.com>");
+MODULE_AUTHOR("Asmaa Mnebhi <asmaa@mellanox.com>");
+MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_mdio.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_mdio.c
new file mode 100644
index 0000000..bb848b4
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_mdio.c
@@ -0,0 +1,423 @@
+// SPDX-License-Identifier: GPL-2.0-only OR Linux-OpenIB
+/*  MDIO support for Mellanox GigE driver
+ *
+ *  Copyright (c) 2020 Mellanox Technologies Ltd.
+ */
+
+#include <linux/acpi.h>
+#include <linux/bitfield.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/ioport.h>
+#include <linux/irqreturn.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/mod_devicetable.h>
+#include <linux/phy.h>
+#include <linux/platform_device.h>
+#include <linux/property.h>
+
+#include "mlxbf_gige.h"
+
+#define MLXBF_GIGE_MDIO_POLL_BUSY_TIMEOUT	100 /* ms */
+#define MLXBF_GIGE_MDIO_POLL_DELAY_USEC		100 /* us */
+
+#define MLXBF_GIGE_MDIO_GW_OFFSET	0x0
+#define MLXBF_GIGE_MDIO_CFG_OFFSET	0x4
+
+/* Support clause 22 */
+#define MLXBF_GIGE_MDIO_CL22_ST1	0x1
+#define MLXBF_GIGE_MDIO_CL22_WRITE	0x1
+#define MLXBF_GIGE_MDIO_CL22_READ	0x2
+
+/* Busy bit is set by software and cleared by hardware */
+#define MLXBF_GIGE_MDIO_SET_BUSY	0x1
+/* Lock bit should be set/cleared by software */
+#define MLXBF_GIGE_MDIO_SET_LOCK	0x1
+
+/* MDIO GW register bits */
+#define MLXBF_GIGE_MDIO_GW_AD_MASK	GENMASK(15, 0)
+#define MLXBF_GIGE_MDIO_GW_DEVAD_MASK	GENMASK(20, 16)
+#define MLXBF_GIGE_MDIO_GW_PARTAD_MASK	GENMASK(25, 21)
+#define MLXBF_GIGE_MDIO_GW_OPCODE_MASK	GENMASK(27, 26)
+#define MLXBF_GIGE_MDIO_GW_ST1_MASK	GENMASK(28, 28)
+#define MLXBF_GIGE_MDIO_GW_BUSY_MASK	GENMASK(30, 30)
+#define MLXBF_GIGE_MDIO_GW_LOCK_MASK	GENMASK(31, 31)
+
+/* MDIO config register bits */
+#define MLXBF_GIGE_MDIO_CFG_MDIO_MODE_MASK		GENMASK(1, 0)
+#define MLXBF_GIGE_MDIO_CFG_MDIO3_3_MASK		GENMASK(2, 2)
+#define MLXBF_GIGE_MDIO_CFG_MDIO_FULL_DRIVE_MASK	GENMASK(4, 4)
+#define MLXBF_GIGE_MDIO_CFG_MDC_PERIOD_MASK		GENMASK(15, 8)
+#define MLXBF_GIGE_MDIO_CFG_MDIO_IN_SAMP_MASK		GENMASK(23, 16)
+#define MLXBF_GIGE_MDIO_CFG_MDIO_OUT_SAMP_MASK		GENMASK(31, 24)
+
+/* Formula for encoding the MDIO period. The encoded value is
+ * passed to the MDIO config register.
+ *
+ * mdc_clk = 2*(val + 1)*i1clk
+ *
+ * 400 ns = 2*(val + 1)*(((1/430)*1000) ns)
+ *
+ * val = (((400 * 430 / 1000) / 2) - 1)
+ */
+#define MLXBF_GIGE_I1CLK_MHZ		430
+#define MLXBF_GIGE_MDC_CLK_NS		400
+
+#define MLXBF_GIGE_MDIO_PERIOD	(((MLXBF_GIGE_MDC_CLK_NS * MLXBF_GIGE_I1CLK_MHZ / 1000) / 2) - 1)
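[Editor's note, not part of the patch: evaluating the macro above with integer arithmetic, a 400 ns MDC period at a 430 MHz i1clk encodes to 85.]

```c
#include <assert.h>

/* Worked evaluation of the MDC period encoding:
 * val = (((mdc_clk_ns * i1clk_mhz / 1000) / 2) - 1)
 */
static int mdio_period(int mdc_clk_ns, int i1clk_mhz)
{
	return ((mdc_clk_ns * i1clk_mhz / 1000) / 2) - 1;
}
```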
+
+/* PHY should operate in master mode only */
+#define MLXBF_GIGE_MDIO_MODE_MASTER	1
+
+/* PHY input voltage has to be 3.3V */
+#define MLXBF_GIGE_MDIO3_3		1
+
+/* Operate in full drive mode */
+#define MLXBF_GIGE_MDIO_FULL_DRIVE	1
+
+/* 6 cycles before the i1clk (core clock) rising edge that triggers the mdc */
+#define MLXBF_GIGE_MDIO_IN_SAMP		6
+
+/* 13 cycles after the i1clk (core clock) rising edge that triggers the mdc */
+#define MLXBF_GIGE_MDIO_OUT_SAMP	13
+
+/* The PHY interrupt line is shared with other interrupt lines such
+ * as GPIO and SMBus. So use YU registers to determine whether the
+ * interrupt comes from the PHY.
+ */
+#define MLXBF_GIGE_CAUSE_RSH_COALESCE0_GPIO_CAUSE_MASK	0x10
+#define MLXBF_GIGE_GPIO_CAUSE_IRQ_IS_SET(val) \
+	((val) & MLXBF_GIGE_CAUSE_RSH_COALESCE0_GPIO_CAUSE_MASK)
+
+#define MLXBF_GIGE_GPIO_BLOCK0_MASK	BIT(0)
+
+#define MLXBF_GIGE_GPIO_CAUSE_FALL_EN		0x48
+#define MLXBF_GIGE_GPIO_CAUSE_OR_CAUSE_EVTEN0	0x80
+#define MLXBF_GIGE_GPIO_CAUSE_OR_EVTEN0		0x94
+#define MLXBF_GIGE_GPIO_CAUSE_OR_CLRCAUSE	0x98
+
+#define MLXBF_GIGE_GPIO12_BIT			12
+
+static u32 mlxbf_gige_mdio_create_cmd(u16 data, int phy_add,
+				      int phy_reg, u32 opcode)
+{
+	u32 gw_reg = 0;
+
+	gw_reg |= FIELD_PREP(MLXBF_GIGE_MDIO_GW_AD_MASK, data);
+	gw_reg |= FIELD_PREP(MLXBF_GIGE_MDIO_GW_DEVAD_MASK, phy_reg);
+	gw_reg |= FIELD_PREP(MLXBF_GIGE_MDIO_GW_PARTAD_MASK, phy_add);
+	gw_reg |= FIELD_PREP(MLXBF_GIGE_MDIO_GW_OPCODE_MASK, opcode);
+	gw_reg |= FIELD_PREP(MLXBF_GIGE_MDIO_GW_ST1_MASK,
+			     MLXBF_GIGE_MDIO_CL22_ST1);
+	gw_reg |= FIELD_PREP(MLXBF_GIGE_MDIO_GW_BUSY_MASK,
+			     MLXBF_GIGE_MDIO_SET_BUSY);
+
+	/* Hold the lock until the read/write is completed so that no other
+	 * program accesses the mdio bus.
+	 */
+	gw_reg |= FIELD_PREP(MLXBF_GIGE_MDIO_GW_LOCK_MASK,
+			     MLXBF_GIGE_MDIO_SET_LOCK);
+
+	return gw_reg;
+}
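[Editor's note, not part of the patch: the GW command layout built above can be sketched with plain shifts in place of the kernel's FIELD_PREP, using the bit positions from the GENMASK definitions: AD in bits [15:0], DEVAD in [20:16], PARTAD in [25:21], opcode in [27:26], ST1 in bit 28, BUSY in bit 30, LOCK in bit 31.]

```c
#include <assert.h>
#include <stdint.h>

/* Standalone sketch of mlxbf_gige_mdio_create_cmd() bit packing */
static uint32_t mdio_create_cmd(uint16_t data, int phy_add, int phy_reg,
				uint32_t opcode)
{
	uint32_t gw = 0;

	gw |= (uint32_t)data;		/* AD: write data, or 0 for reads */
	gw |= (uint32_t)phy_reg << 16;	/* DEVAD: register address */
	gw |= (uint32_t)phy_add << 21;	/* PARTAD: PHY address */
	gw |= opcode << 26;		/* opcode: 1 = write, 2 = read */
	gw |= 1u << 28;			/* ST1: clause 22 */
	gw |= 1u << 30;			/* BUSY: set by SW, cleared by HW */
	gw |= 1u << 31;			/* LOCK: held until access completes */

	return gw;
}
```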
+
+static int mlxbf_gige_mdio_poll_bit(struct mlxbf_gige *priv, u32 bit_mask)
+{
+	unsigned long timeout;
+	u32 val;
+
+	timeout = jiffies + msecs_to_jiffies(MLXBF_GIGE_MDIO_POLL_BUSY_TIMEOUT);
+	do {
+		val = readl(priv->mdio_io + MLXBF_GIGE_MDIO_GW_OFFSET);
+		if (!(val & bit_mask))
+			return 0;
+		udelay(MLXBF_GIGE_MDIO_POLL_DELAY_USEC);
+	} while (time_before(jiffies, timeout));
+
+	return -ETIME;
+}
+
+static int mlxbf_gige_mdio_read(struct mii_bus *bus, int phy_add, int phy_reg)
+{
+	struct mlxbf_gige *priv = bus->priv;
+	u32 cmd;
+	u32 val;
+	int ret;
+
+	/* If the lock is held by something else, drop the request.
+	 * If the lock is cleared, that means the busy bit was cleared.
+	 */
+	ret = mlxbf_gige_mdio_poll_bit(priv, MLXBF_GIGE_MDIO_GW_LOCK_MASK);
+	if (ret)
+		return -EBUSY;
+
+	/* Send mdio read request */
+	cmd = mlxbf_gige_mdio_create_cmd(0, phy_add, phy_reg, MLXBF_GIGE_MDIO_CL22_READ);
+
+	writel(cmd, priv->mdio_io + MLXBF_GIGE_MDIO_GW_OFFSET);
+
+	ret = mlxbf_gige_mdio_poll_bit(priv, MLXBF_GIGE_MDIO_GW_BUSY_MASK);
+	if (ret) {
+		writel(0, priv->mdio_io + MLXBF_GIGE_MDIO_GW_OFFSET);
+		return -EBUSY;
+	}
+
+	/* Only return the ad bits of the gw register */
+	val = readl(priv->mdio_io + MLXBF_GIGE_MDIO_GW_OFFSET);
+	val &= MLXBF_GIGE_MDIO_GW_AD_MASK;
+
+	/* To release the YU MDIO lock, clear gw register,
+	 * so that the YU does not confuse this write with a new
+	 * MDIO read/write request.
+	 */
+	writel(0, priv->mdio_io + MLXBF_GIGE_MDIO_GW_OFFSET);
+
+	return val;
+}
+
+static int mlxbf_gige_mdio_write(struct mii_bus *bus, int phy_add,
+				 int phy_reg, u16 val)
+{
+	struct mlxbf_gige *priv = bus->priv;
+	u32 cmd;
+	int ret;
+
+	/* If the lock is held by something else, drop the request.
+	 * If the lock is cleared, that means the busy bit was cleared.
+	 */
+	ret = mlxbf_gige_mdio_poll_bit(priv, MLXBF_GIGE_MDIO_GW_LOCK_MASK);
+	if (ret)
+		return -EBUSY;
+
+	/* Send mdio write request */
+	cmd = mlxbf_gige_mdio_create_cmd(val, phy_add, phy_reg,
+					 MLXBF_GIGE_MDIO_CL22_WRITE);
+	writel(cmd, priv->mdio_io + MLXBF_GIGE_MDIO_GW_OFFSET);
+
+	/* If the poll timed out, drop the request */
+	ret = mlxbf_gige_mdio_poll_bit(priv, MLXBF_GIGE_MDIO_GW_BUSY_MASK);
+
+	/* To release the YU MDIO lock, clear gw register,
+	 * so that the YU does not confuse this write with a new
+	 * MDIO read/write request.
+	 */
+	writel(0, priv->mdio_io + MLXBF_GIGE_MDIO_GW_OFFSET);
+
+	return ret;
+}
+
+static void mlxbf_gige_mdio_disable_phy_int(struct mlxbf_gige *priv)
+{
+	unsigned long flags;
+	u32 val;
+
+	spin_lock_irqsave(&priv->gpio_lock, flags);
+	val = readl(priv->gpio_io + MLXBF_GIGE_GPIO_CAUSE_OR_EVTEN0);
+	val &= ~priv->phy_int_gpio_mask;
+	writel(val, priv->gpio_io + MLXBF_GIGE_GPIO_CAUSE_OR_EVTEN0);
+	spin_unlock_irqrestore(&priv->gpio_lock, flags);
+}
+
+static void mlxbf_gige_mdio_enable_phy_int(struct mlxbf_gige *priv)
+{
+	unsigned long flags;
+	u32 val;
+
+	spin_lock_irqsave(&priv->gpio_lock, flags);
+	/* The INT_N interrupt level is active low.
+	 * So enable cause fall bit to detect when GPIO
+	 * state goes low.
+	 */
+	val = readl(priv->gpio_io + MLXBF_GIGE_GPIO_CAUSE_FALL_EN);
+	val |= priv->phy_int_gpio_mask;
+	writel(val, priv->gpio_io + MLXBF_GIGE_GPIO_CAUSE_FALL_EN);
+
+	/* Enable PHY interrupt by setting the priority level */
+	val = readl(priv->gpio_io +
+			MLXBF_GIGE_GPIO_CAUSE_OR_EVTEN0);
+	val |= priv->phy_int_gpio_mask;
+	writel(val, priv->gpio_io +
+			MLXBF_GIGE_GPIO_CAUSE_OR_EVTEN0);
+	spin_unlock_irqrestore(&priv->gpio_lock, flags);
+}
+
+/* Interrupt handler is called from mlxbf_gige_main.c
+ * driver whenever a phy interrupt is received.
+ */
+irqreturn_t mlxbf_gige_mdio_handle_phy_interrupt(struct mlxbf_gige *priv)
+{
+	u32 val;
+
+	/* The YU interrupt is shared between SMBus and GPIOs.
+	 * So first, determine whether this is a GPIO interrupt.
+	 */
+	val = readl(priv->cause_rsh_coalesce0_io);
+	if (!MLXBF_GIGE_GPIO_CAUSE_IRQ_IS_SET(val)) {
+		/* Nothing to do here, not a GPIO interrupt */
+		return IRQ_NONE;
+	}
+	/* Then determine which gpio register this interrupt is for.
+	 * Return if the interrupt is not for gpio block 0.
+	 */
+	val = readl(priv->cause_gpio_arm_coalesce0_io);
+	if (!(val & MLXBF_GIGE_GPIO_BLOCK0_MASK))
+		return IRQ_NONE;
+
+	/* Finally check if this interrupt is from PHY device.
+	 * Return if it is not.
+	 */
+	val = readl(priv->gpio_io +
+			MLXBF_GIGE_GPIO_CAUSE_OR_CAUSE_EVTEN0);
+	if (!(val & priv->phy_int_gpio_mask))
+		return IRQ_NONE;
+
+	/* Clear interrupt when done, otherwise, no further interrupt
+	 * will be triggered.
+	 * Writing 0x1 to the clear cause register also clears the
+	 * following registers:
+	 * cause_gpio_arm_coalesce0
+	 * cause_rsh_coalesce0
+	 */
+	val = readl(priv->gpio_io +
+			MLXBF_GIGE_GPIO_CAUSE_OR_CLRCAUSE);
+	val |= priv->phy_int_gpio_mask;
+	writel(val, priv->gpio_io +
+			MLXBF_GIGE_GPIO_CAUSE_OR_CLRCAUSE);
+
+	return IRQ_HANDLED;
+}
+
+static void mlxbf_gige_mdio_init_config(struct mlxbf_gige *priv)
+{
+	struct device *dev = priv->dev;
+	u32 mdio_full_drive;
+	u32 mdio_out_sample;
+	u32 mdio_in_sample;
+	u32 mdio_voltage;
+	u32 mdc_period;
+	u32 mdio_mode;
+	u32 mdio_cfg;
+	int ret;
+
+	ret = device_property_read_u32(dev, "mdio-mode", &mdio_mode);
+	if (ret < 0)
+		mdio_mode = MLXBF_GIGE_MDIO_MODE_MASTER;
+
+	ret = device_property_read_u32(dev, "mdio-voltage", &mdio_voltage);
+	if (ret < 0)
+		mdio_voltage = MLXBF_GIGE_MDIO3_3;
+
+	ret = device_property_read_u32(dev, "mdio-full-drive", &mdio_full_drive);
+	if (ret < 0)
+		mdio_full_drive = MLXBF_GIGE_MDIO_FULL_DRIVE;
+
+	ret = device_property_read_u32(dev, "mdc-period", &mdc_period);
+	if (ret < 0)
+		mdc_period = MLXBF_GIGE_MDIO_PERIOD;
+
+	ret = device_property_read_u32(dev, "mdio-in-sample", &mdio_in_sample);
+	if (ret < 0)
+		mdio_in_sample = MLXBF_GIGE_MDIO_IN_SAMP;
+
+	ret = device_property_read_u32(dev, "mdio-out-sample", &mdio_out_sample);
+	if (ret < 0)
+		mdio_out_sample = MLXBF_GIGE_MDIO_OUT_SAMP;
+
+	mdio_cfg = FIELD_PREP(MLXBF_GIGE_MDIO_CFG_MDIO_MODE_MASK, mdio_mode) |
+		   FIELD_PREP(MLXBF_GIGE_MDIO_CFG_MDIO3_3_MASK, mdio_voltage) |
+		   FIELD_PREP(MLXBF_GIGE_MDIO_CFG_MDIO_FULL_DRIVE_MASK, mdio_full_drive) |
+		   FIELD_PREP(MLXBF_GIGE_MDIO_CFG_MDC_PERIOD_MASK, mdc_period) |
+		   FIELD_PREP(MLXBF_GIGE_MDIO_CFG_MDIO_IN_SAMP_MASK, mdio_in_sample) |
+		   FIELD_PREP(MLXBF_GIGE_MDIO_CFG_MDIO_OUT_SAMP_MASK, mdio_out_sample);
+
+	writel(mdio_cfg, priv->mdio_io + MLXBF_GIGE_MDIO_CFG_OFFSET);
+}
+
+int mlxbf_gige_mdio_probe(struct platform_device *pdev, struct mlxbf_gige *priv)
+{
+	struct device *dev = &pdev->dev;
+	struct resource *res;
+	u32 phy_int_gpio;
+	u32 phy_addr;
+	int ret;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, MLXBF_GIGE_RES_MDIO9);
+	if (!res)
+		return -ENODEV;
+
+	priv->mdio_io = devm_ioremap_resource(dev, res);
+	if (IS_ERR(priv->mdio_io))
+		return PTR_ERR(priv->mdio_io);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, MLXBF_GIGE_RES_GPIO0);
+	if (!res)
+		return -ENODEV;
+
+	priv->gpio_io = devm_ioremap(dev, res->start, resource_size(res));
+	if (!priv->gpio_io)
+		return -ENOMEM;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM,
+				    MLXBF_GIGE_RES_CAUSE_RSH_COALESCE0);
+	if (!res)
+		return -ENODEV;
+
+	priv->cause_rsh_coalesce0_io =
+		devm_ioremap(dev, res->start, resource_size(res));
+	if (!priv->cause_rsh_coalesce0_io)
+		return -ENOMEM;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM,
+				    MLXBF_GIGE_RES_CAUSE_GPIO_ARM_COALESCE0);
+	if (!res)
+		return -ENODEV;
+
+	priv->cause_gpio_arm_coalesce0_io =
+		devm_ioremap(dev, res->start, resource_size(res));
+	if (!priv->cause_gpio_arm_coalesce0_io)
+		return -ENOMEM;
+
+	mlxbf_gige_mdio_init_config(priv);
+
+	ret = device_property_read_u32(dev, "phy-int-gpio", &phy_int_gpio);
+	if (ret < 0)
+		phy_int_gpio = MLXBF_GIGE_GPIO12_BIT;
+	priv->phy_int_gpio_mask = BIT(phy_int_gpio);
+
+	mlxbf_gige_mdio_enable_phy_int(priv);
+
+	priv->mdiobus = devm_mdiobus_alloc(dev);
+	if (!priv->mdiobus) {
+		dev_err(dev, "Failed to alloc MDIO bus\n");
+		return -ENOMEM;
+	}
+
+	priv->mdiobus->name = "mlxbf-mdio";
+	priv->mdiobus->read = mlxbf_gige_mdio_read;
+	priv->mdiobus->write = mlxbf_gige_mdio_write;
+	priv->mdiobus->parent = dev;
+	priv->mdiobus->priv = priv;
+	snprintf(priv->mdiobus->id, MII_BUS_ID_SIZE, "%s",
+		 dev_name(dev));
+
+	ret = device_property_read_u32(dev, "phy-addr", &phy_addr);
+	if (ret < 0)
+		phy_addr = MLXBF_GIGE_MDIO_DEFAULT_PHY_ADDR;
+
+	priv->mdiobus->irq[phy_addr] = PHY_POLL;
+
+	/* Auto probe PHY at the corresponding address */
+	priv->mdiobus->phy_mask = ~(1 << phy_addr);
+	ret = mdiobus_register(priv->mdiobus);
+	if (ret)
+		dev_err(dev, "Failed to register MDIO bus\n");
+
+	return ret;
+}
+
+void mlxbf_gige_mdio_remove(struct mlxbf_gige *priv)
+{
+	mlxbf_gige_mdio_disable_phy_int(priv);
+	mdiobus_unregister(priv->mdiobus);
+}
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h
new file mode 100644
index 0000000..9c7af82
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: GPL-2.0-only OR Linux-OpenIB */
+
+/* Header file for Mellanox BlueField GigE register defines
+ *
+ * Copyright (c) 2020 Mellanox Technologies Ltd.
+ */
+
+#ifndef __MLXBF_GIGE_REGS_H__
+#define __MLXBF_GIGE_REGS_H__
+
+#define MLXBF_GIGE_STATUS                             0x0010
+#define MLXBF_GIGE_STATUS_READY                       BIT(0)
+#define MLXBF_GIGE_INT_STATUS                         0x0028
+#define MLXBF_GIGE_INT_STATUS_RX_RECEIVE_PACKET       BIT(0)
+#define MLXBF_GIGE_INT_STATUS_RX_MAC_ERROR            BIT(1)
+#define MLXBF_GIGE_INT_STATUS_RX_TRN_ERROR            BIT(2)
+#define MLXBF_GIGE_INT_STATUS_SW_ACCESS_ERROR         BIT(3)
+#define MLXBF_GIGE_INT_STATUS_SW_CONFIG_ERROR         BIT(4)
+#define MLXBF_GIGE_INT_STATUS_TX_PI_CI_EXCEED_WQ_SIZE BIT(5)
+#define MLXBF_GIGE_INT_STATUS_TX_SMALL_FRAME_SIZE     BIT(6)
+#define MLXBF_GIGE_INT_STATUS_TX_CHECKSUM_INPUTS      BIT(7)
+#define MLXBF_GIGE_INT_STATUS_HW_ACCESS_ERROR         BIT(8)
+#define MLXBF_GIGE_INT_EN                             0x0030
+#define MLXBF_GIGE_INT_EN_RX_RECEIVE_PACKET           BIT(0)
+#define MLXBF_GIGE_INT_EN_RX_MAC_ERROR                BIT(1)
+#define MLXBF_GIGE_INT_EN_RX_TRN_ERROR                BIT(2)
+#define MLXBF_GIGE_INT_EN_SW_ACCESS_ERROR             BIT(3)
+#define MLXBF_GIGE_INT_EN_SW_CONFIG_ERROR             BIT(4)
+#define MLXBF_GIGE_INT_EN_TX_PI_CI_EXCEED_WQ_SIZE     BIT(5)
+#define MLXBF_GIGE_INT_EN_TX_SMALL_FRAME_SIZE         BIT(6)
+#define MLXBF_GIGE_INT_EN_TX_CHECKSUM_INPUTS          BIT(7)
+#define MLXBF_GIGE_INT_EN_HW_ACCESS_ERROR             BIT(8)
+#define MLXBF_GIGE_INT_MASK                           0x0038
+#define MLXBF_GIGE_INT_MASK_RX_RECEIVE_PACKET         BIT(0)
+#define MLXBF_GIGE_CONTROL                            0x0040
+#define MLXBF_GIGE_CONTROL_PORT_EN                    BIT(0)
+#define MLXBF_GIGE_CONTROL_MAC_ID_RANGE_EN            BIT(1)
+#define MLXBF_GIGE_CONTROL_EN_SPECIFIC_MAC            BIT(4)
+#define MLXBF_GIGE_CONTROL_CLEAN_PORT_EN              BIT(31)
+#define MLXBF_GIGE_RX_WQ_BASE                         0x0200
+#define MLXBF_GIGE_RX_WQE_SIZE_LOG2                   0x0208
+#define MLXBF_GIGE_RX_WQE_SIZE_LOG2_RESET_VAL         7
+#define MLXBF_GIGE_RX_CQ_BASE                         0x0210
+#define MLXBF_GIGE_TX_WQ_BASE                         0x0218
+#define MLXBF_GIGE_TX_WQ_SIZE_LOG2                    0x0220
+#define MLXBF_GIGE_TX_WQ_SIZE_LOG2_RESET_VAL          7
+#define MLXBF_GIGE_TX_CI_UPDATE_ADDRESS               0x0228
+#define MLXBF_GIGE_RX_WQE_PI                          0x0230
+#define MLXBF_GIGE_TX_PRODUCER_INDEX                  0x0238
+#define MLXBF_GIGE_RX_MAC_FILTER                      0x0240
+#define MLXBF_GIGE_RX_MAC_FILTER_STRIDE               0x0008
+#define MLXBF_GIGE_RX_DIN_DROP_COUNTER                0x0260
+#define MLXBF_GIGE_TX_CONSUMER_INDEX                  0x0310
+#define MLXBF_GIGE_TX_CONTROL                         0x0318
+#define MLXBF_GIGE_TX_CONTROL_GRACEFUL_STOP           BIT(0)
+#define MLXBF_GIGE_TX_STATUS                          0x0388
+#define MLXBF_GIGE_TX_STATUS_DATA_FIFO_FULL           BIT(1)
+#define MLXBF_GIGE_RX_MAC_FILTER_DMAC_RANGE_START     0x0520
+#define MLXBF_GIGE_RX_MAC_FILTER_DMAC_RANGE_END       0x0528
+#define MLXBF_GIGE_RX_MAC_FILTER_COUNT_DISC           0x0540
+#define MLXBF_GIGE_RX_MAC_FILTER_COUNT_DISC_EN        BIT(0)
+#define MLXBF_GIGE_RX_MAC_FILTER_COUNT_PASS           0x0548
+#define MLXBF_GIGE_RX_MAC_FILTER_COUNT_PASS_EN        BIT(0)
+#define MLXBF_GIGE_RX_PASS_COUNTER_ALL                0x0550
+#define MLXBF_GIGE_RX_DISC_COUNTER_ALL                0x0560
+#define MLXBF_GIGE_RX                                 0x0578
+#define MLXBF_GIGE_RX_STRIP_CRC_EN                    BIT(1)
+#define MLXBF_GIGE_RX_DMA                             0x0580
+#define MLXBF_GIGE_RX_DMA_EN                          BIT(0)
+#define MLXBF_GIGE_RX_CQE_PACKET_CI                   0x05b0
+#define MLXBF_GIGE_MAC_CFG                            0x05e8
+
+#endif /* !defined(__MLXBF_GIGE_REGS_H__) */
-- 
2.1.2
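As a reader's aid (not part of the patch), the interrupt-cause bit layout defined in mlxbf_gige_regs.h can be decoded with a short Python sketch; the dictionary and helper names below are illustrative, not from the driver:

```python
# Bit positions taken from the MLXBF_GIGE_INT_STATUS_* defines above
INT_STATUS_BITS = {
    0: "RX_RECEIVE_PACKET",
    1: "RX_MAC_ERROR",
    2: "RX_TRN_ERROR",
    3: "SW_ACCESS_ERROR",
    4: "SW_CONFIG_ERROR",
    5: "TX_PI_CI_EXCEED_WQ_SIZE",
    6: "TX_SMALL_FRAME_SIZE",
    7: "TX_CHECKSUM_INPUTS",
    8: "HW_ACCESS_ERROR",
}

def decode_int_status(val):
    """Return the names of all interrupt causes set in 'val'."""
    return [name for bit, name in INT_STATUS_BITS.items() if val & (1 << bit)]

# Example: RX packet received and HW access error asserted together
assert decode_int_status((1 << 0) | (1 << 8)) == \
    ["RX_RECEIVE_PACKET", "HW_ACCESS_ERROR"]
```

In the driver itself the same decoding is done in C by testing the MLXBF_GIGE_INT_STATUS_* masks against the value read from the MLXBF_GIGE_INT_STATUS register.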


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
  2020-07-29 18:29 [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver David Thompson
@ 2020-07-29 19:41 ` David Thompson
  2020-07-29 20:31   ` David Miller
  2020-07-29 20:49 ` Jakub Kicinski
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: David Thompson @ 2020-07-29 19:41 UTC (permalink / raw)
  To: David Thompson, netdev; +Cc: davem, kuba, Jiri Pirko, Asmaa Mnebhi



> -----Original Message-----
> From: David Thompson <dthompson@mellanox.com>
> Sent: Wednesday, July 29, 2020 2:29 PM
> To: netdev@vger.kernel.org
> Cc: davem@davemloft.net; kuba@kernel.org; Jiri Pirko <jiri@mellanox.com>;
> David Thompson <dthompson@mellanox.com>; Asmaa Mnebhi
> <Asmaa@mellanox.com>
> Subject: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
> 
> This patch adds build and driver logic for the "mlxbf_gige"
> Ethernet driver from Mellanox Technologies. The second
> generation BlueField SoC from Mellanox supports an
> out-of-band Gigabit Ethernet management port to the Arm
> subsystem.  This driver supports TCP/IP network connectivity
> for that port, and provides back-end routines to handle
> basic ethtool requests.
> 
> The logic in "mlxbf_gige_main.c" is the driver performing
> packet processing and handling ethtool management requests.
> The driver interfaces to the Gigabit Ethernet block of
> BlueField SoC via MMIO accesses to registers, which contain
> control information or pointers describing transmit and
> receive resources.  There is a single transmit queue, and
> the port supports transmit ring sizes of 4 to 256 entries.
> There is a single receive queue, and the port supports
> receive ring sizes of 32 to 32K entries. The transmit and
> receive rings are allocated from DMA coherent memory. There
> is a 16-bit producer and consumer index per ring to denote
> software ownership and hardware ownership, respectively.
> The main driver supports the handling of some basic ethtool
> requests: get driver info, get/set ring parameters, get
> registers, and get statistics.
> 
> The logic in "mlxbf_gige_mdio.c" is the driver controlling
> the Mellanox BlueField hardware that interacts with a PHY
> device via MDIO/MDC pins.  This driver does the following:
>   - At driver probe time, it configures several BlueField MDIO
>     parameters such as sample rate, full drive, voltage and MDC
>     based on values read from ACPI table.
>   - It defines functions to read and write MDIO registers and
>     registers the MDIO bus.
>   - It defines the phy interrupt handler reporting a
>     link up/down status change.
>   - This driver's probe is invoked from the main driver logic
>     while the phy interrupt handler is registered in ndo_open.
> 
> Driver limitations
>   - Only supports 1Gbps speed
>   - Only supports GMII protocol
>   - Supports maximum packet size of 2KB
>   - Does not support scatter-gather buffering
> 
> Testing
>   - Successful build of kernel for ARM64, ARM32, X86_64
>   - Tested ARM64 build on FastModels & Palladium
> 

It's been pointed out to me that this section is incomplete, and I apologize.

The "Testing" section should include the following information:
- Tested ARM64 build on several Mellanox boards that are built with
   the BlueField-2 SoC.  The testing includes coverage in the areas of
   networking (e.g. ping, iperf, ifconfig, route), file transfers (e.g. SCP),
   and various ethtool options relevant to this driver.
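The 16-bit producer/consumer index scheme described in the patch text (one PI/CI pair per ring, with the producer owned by software and the consumer updated on completion) can be sketched as follows; the helper names are illustrative and do not appear in the driver:

```python
INDEX_MASK = 0xFFFF  # indices are 16-bit and wrap modulo 2**16

def ring_occupancy(pi, ci):
    """Entries still outstanding: producer minus consumer, mod 2^16."""
    return (pi - ci) & INDEX_MASK

def ring_free(pi, ci, q_entries):
    """Slots software may still post to a ring of q_entries elements."""
    return q_entries - ring_occupancy(pi, ci)

# Wraparound case: PI has wrapped past 0xFFFF while CI has not yet
assert ring_occupancy(0x0003, 0xFFFE) == 5
assert ring_free(0x0003, 0xFFFE, 256) == 251
```

Because the subtraction is done modulo 2^16, the arithmetic stays correct across index wraparound as long as a ring never holds more than 2^16 outstanding entries, which the 256-entry TX and 32K-entry RX limits guarantee.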

> Signed-off-by: David Thompson <dthompson@mellanox.com>
> Signed-off-by: Asmaa Mnebhi <asmaa@mellanox.com>
> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/Kconfig              |    1 +
>  drivers/net/ethernet/mellanox/Makefile             |    1 +
>  drivers/net/ethernet/mellanox/mlxbf_gige/Kconfig   |   13 +
>  drivers/net/ethernet/mellanox/mlxbf_gige/Makefile  |    5 +
>  .../net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h  |  156 +++
>  .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c | 1277 ++++++++++++++++++++
>  .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_mdio.c |  423 +++++++
>  .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h |   73 ++
>  8 files changed, 1949 insertions(+)
>  create mode 100644 drivers/net/ethernet/mellanox/mlxbf_gige/Kconfig
>  create mode 100644 drivers/net/ethernet/mellanox/mlxbf_gige/Makefile
>  create mode 100644 drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h
>  create mode 100644 drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
>  create mode 100644 drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_mdio.c
>  create mode 100644 drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h
> 
> diff --git a/drivers/net/ethernet/mellanox/Kconfig b/drivers/net/ethernet/mellanox/Kconfig
> index ff6613a..b4f66eb 100644
> --- a/drivers/net/ethernet/mellanox/Kconfig
> +++ b/drivers/net/ethernet/mellanox/Kconfig
> @@ -22,5 +22,6 @@ source "drivers/net/ethernet/mellanox/mlx4/Kconfig"
>  source "drivers/net/ethernet/mellanox/mlx5/core/Kconfig"
>  source "drivers/net/ethernet/mellanox/mlxsw/Kconfig"
>  source "drivers/net/ethernet/mellanox/mlxfw/Kconfig"
> +source "drivers/net/ethernet/mellanox/mlxbf_gige/Kconfig"
> 
>  endif # NET_VENDOR_MELLANOX
> diff --git a/drivers/net/ethernet/mellanox/Makefile b/drivers/net/ethernet/mellanox/Makefile
> index 79773ac..d4b5f54 100644
> --- a/drivers/net/ethernet/mellanox/Makefile
> +++ b/drivers/net/ethernet/mellanox/Makefile
> @@ -7,3 +7,4 @@ obj-$(CONFIG_MLX4_CORE) += mlx4/
>  obj-$(CONFIG_MLX5_CORE) += mlx5/core/
>  obj-$(CONFIG_MLXSW_CORE) += mlxsw/
>  obj-$(CONFIG_MLXFW) += mlxfw/
> +obj-$(CONFIG_MLXBF_GIGE) += mlxbf_gige/
> diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/Kconfig b/drivers/net/ethernet/mellanox/mlxbf_gige/Kconfig
> new file mode 100644
> index 0000000..73c5d74
> --- /dev/null
> +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/Kconfig
> @@ -0,0 +1,13 @@
> +# SPDX-License-Identifier: GPL-2.0-only OR Linux-OpenIB
> +#
> +# Mellanox GigE driver configuration
> +#
> +
> +config MLXBF_GIGE
> +	tristate "Mellanox Technologies BlueField Gigabit Ethernet support"
> +	depends on (ARM64 || COMPILE_TEST) && ACPI && INET
> +	select PHYLIB
> +	help
> +	  The second generation BlueField SoC from Mellanox Technologies
> +	  supports an out-of-band Gigabit Ethernet management port to the
> +	  Arm subsystem.
> diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/Makefile b/drivers/net/ethernet/mellanox/mlxbf_gige/Makefile
> new file mode 100644
> index 0000000..f6be6c6
> --- /dev/null
> +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0-only OR Linux-OpenIB
> +
> +obj-$(CONFIG_MLXBF_GIGE) += mlxbf_gige.o
> +
> +mlxbf_gige-y := mlxbf_gige_main.o mlxbf_gige_mdio.o
> diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h
> new file mode 100644
> index 0000000..f89199d
> --- /dev/null
> +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h
> @@ -0,0 +1,156 @@
> +/* SPDX-License-Identifier: GPL-2.0-only OR Linux-OpenIB */
> +
> +/* Header file for Gigabit Ethernet driver for Mellanox BlueField SoC
> + * - this file contains software data structures and any chip-specific
> + *   data structures (e.g. TX WQE format) that are memory resident.
> + *
> + * Copyright (c) 2020 Mellanox Technologies Ltd.
> + */
> +
> +#ifndef __MLXBF_GIGE_H__
> +#define __MLXBF_GIGE_H__
> +
> +#include <linux/irqreturn.h>
> +#include <linux/netdevice.h>
> +
> +/* The silicon design supports a maximum RX ring size of
> + * 32K entries. Based on current testing this maximum size
> + * is not required to be supported.  Instead the RX ring
> + * will be capped at a realistic value of 1024 entries.
> + */
> +#define MLXBF_GIGE_MIN_RXQ_SZ     32
> +#define MLXBF_GIGE_MAX_RXQ_SZ     1024
> +#define MLXBF_GIGE_DEFAULT_RXQ_SZ 128
> +
> +#define MLXBF_GIGE_MIN_TXQ_SZ     4
> +#define MLXBF_GIGE_MAX_TXQ_SZ     256
> +#define MLXBF_GIGE_DEFAULT_TXQ_SZ 128
> +
> +#define MLXBF_GIGE_DEFAULT_BUF_SZ 2048
> +
> +/* There are four individual MAC RX filters. Currently
> + * two of them are being used: one for the broadcast MAC
> + * (index 0) and one for local MAC (index 1)
> + */
> +#define MLXBF_GIGE_BCAST_MAC_FILTER_IDX 0
> +#define MLXBF_GIGE_LOCAL_MAC_FILTER_IDX 1
> +
> +/* Define for broadcast MAC literal */
> +#define BCAST_MAC_ADDR 0xFFFFFFFFFFFF
> +
> +/* There are three individual interrupts:
> + *   1) Errors, "OOB" interrupt line
> + *   2) Receive Packet, "OOB_LLU" interrupt line
> + *   3) LLU and PLU Events, "OOB_PLU" interrupt line
> + */
> +#define MLXBF_GIGE_ERROR_INTR_IDX       0
> +#define MLXBF_GIGE_RECEIVE_PKT_INTR_IDX 1
> +#define MLXBF_GIGE_LLU_PLU_INTR_IDX     2
> +#define MLXBF_GIGE_PHY_INT_N            3
> +
> +#define MLXBF_GIGE_MDIO_DEFAULT_PHY_ADDR 0x3
> +
> +struct mlxbf_gige_stats {
> +	u64 hw_access_errors;
> +	u64 tx_invalid_checksums;
> +	u64 tx_small_frames;
> +	u64 tx_index_errors;
> +	u64 sw_config_errors;
> +	u64 sw_access_errors;
> +	u64 rx_truncate_errors;
> +	u64 rx_mac_errors;
> +	u64 rx_din_dropped_pkts;
> +	u64 tx_fifo_full;
> +	u64 rx_filter_passed_pkts;
> +	u64 rx_filter_discard_pkts;
> +};
> +
> +struct mlxbf_gige {
> +	void __iomem *base;
> +	void __iomem *llu_base;
> +	void __iomem *plu_base;
> +	struct device *dev;
> +	struct net_device *netdev;
> +	struct platform_device *pdev;
> +	void __iomem *mdio_io;
> +	struct mii_bus *mdiobus;
> +	void __iomem *gpio_io;
> +	void __iomem *cause_rsh_coalesce0_io;
> +	void __iomem *cause_gpio_arm_coalesce0_io;
> +	u32 phy_int_gpio_mask;
> +	spinlock_t lock;
> +	spinlock_t gpio_lock;
> +	u16 rx_q_entries;
> +	u16 tx_q_entries;
> +	u64 *tx_wqe_base;
> +	dma_addr_t tx_wqe_base_dma;
> +	u64 *tx_wqe_next;
> +	u64 *tx_cc;
> +	dma_addr_t tx_cc_dma;
> +	dma_addr_t *rx_wqe_base;
> +	dma_addr_t rx_wqe_base_dma;
> +	u64 *rx_cqe_base;
> +	dma_addr_t rx_cqe_base_dma;
> +	u16 tx_pi;
> +	u16 prev_tx_ci;
> +	u64 error_intr_count;
> +	u64 rx_intr_count;
> +	u64 llu_plu_intr_count;
> +	u8 *rx_buf[MLXBF_GIGE_MAX_RXQ_SZ];
> +	u8 *tx_buf[MLXBF_GIGE_MAX_TXQ_SZ];
> +	int error_irq;
> +	int rx_irq;
> +	int llu_plu_irq;
> +	bool promisc_enabled;
> +	struct napi_struct napi;
> +	struct mlxbf_gige_stats stats;
> +};
> +
> +/* Rx Work Queue Element definitions */
> +#define MLXBF_GIGE_RX_WQE_SZ                   8
> +
> +/* Rx Completion Queue Element definitions */
> +#define MLXBF_GIGE_RX_CQE_SZ                   8
> +#define MLXBF_GIGE_RX_CQE_PKT_LEN_MASK         GENMASK(10, 0)
> +#define MLXBF_GIGE_RX_CQE_VALID_MASK           GENMASK(11, 11)
> +#define MLXBF_GIGE_RX_CQE_PKT_STATUS_MASK      GENMASK(15, 12)
> +#define MLXBF_GIGE_RX_CQE_PKT_STATUS_MAC_ERR   GENMASK(12, 12)
> +#define MLXBF_GIGE_RX_CQE_PKT_STATUS_TRUNCATED GENMASK(13, 13)
> +#define MLXBF_GIGE_RX_CQE_CHKSUM_MASK          GENMASK(31, 16)
> +
> +/* Tx Work Queue Element definitions */
> +#define MLXBF_GIGE_TX_WQE_SZ_QWORDS            2
> +#define MLXBF_GIGE_TX_WQE_SZ                   16
> +#define MLXBF_GIGE_TX_WQE_PKT_LEN_MASK         GENMASK(10, 0)
> +#define MLXBF_GIGE_TX_WQE_UPDATE_MASK          GENMASK(31, 31)
> +#define MLXBF_GIGE_TX_WQE_CHKSUM_LEN_MASK      GENMASK(42, 32)
> +#define MLXBF_GIGE_TX_WQE_CHKSUM_START_MASK    GENMASK(55, 48)
> +#define MLXBF_GIGE_TX_WQE_CHKSUM_OFFSET_MASK   GENMASK(63, 56)
> +
> +/* Macro to return packet length of specified TX WQE */
> +#define MLXBF_GIGE_TX_WQE_PKT_LEN(tx_wqe_addr) \
> +	(*(tx_wqe_addr + 1) & MLXBF_GIGE_TX_WQE_PKT_LEN_MASK)
> +
> +/* Tx Completion Count */
> +#define MLXBF_GIGE_TX_CC_SZ                    8
> +
> +/* List of resources in ACPI table */
> +enum mlxbf_gige_res {
> +	MLXBF_GIGE_RES_MAC,
> +	MLXBF_GIGE_RES_MDIO9,
> +	MLXBF_GIGE_RES_GPIO0,
> +	MLXBF_GIGE_RES_CAUSE_RSH_COALESCE0,
> +	MLXBF_GIGE_RES_CAUSE_GPIO_ARM_COALESCE0,
> +	MLXBF_GIGE_RES_LLU,
> +	MLXBF_GIGE_RES_PLU
> +};
> +
> +/* Version of register data returned by mlxbf_gige_get_regs() */
> +#define MLXBF_GIGE_REGS_VERSION 1
> +
> +int mlxbf_gige_mdio_probe(struct platform_device *pdev,
> +			  struct mlxbf_gige *priv);
> +void mlxbf_gige_mdio_remove(struct mlxbf_gige *priv);
> +irqreturn_t mlxbf_gige_mdio_handle_phy_interrupt(struct mlxbf_gige *priv);
> +
> +#endif /* !defined(__MLXBF_GIGE_H__) */
> diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
> new file mode 100644
> index 0000000..a02e7a4
> --- /dev/null
> +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
> @@ -0,0 +1,1277 @@
> +// SPDX-License-Identifier: GPL-2.0-only OR Linux-OpenIB
> +
> +/* Gigabit Ethernet driver for Mellanox BlueField SoC
> + *
> + * Copyright (c) 2020 Mellanox Technologies Ltd.
> + */
> +
> +#include <linux/acpi.h>
> +#include <linux/device.h>
> +#include <linux/etherdevice.h>
> +#include <linux/interrupt.h>
> +#include <linux/io-64-nonatomic-lo-hi.h>
> +#include <linux/module.h>
> +#include <linux/phy.h>
> +#include <linux/platform_device.h>
> +
> +#include "mlxbf_gige.h"
> +#include "mlxbf_gige_regs.h"
> +
> +#define DRV_NAME    "mlxbf_gige"
> +
> +static void mlxbf_gige_set_mac_rx_filter(struct mlxbf_gige *priv,
> +					 unsigned int index, u64 dmac)
> +{
> +	void __iomem *base = priv->base;
> +	u64 control;
> +
> +	/* Write destination MAC to specified MAC RX filter */
> +	writeq(dmac, base + MLXBF_GIGE_RX_MAC_FILTER +
> +	       (index * MLXBF_GIGE_RX_MAC_FILTER_STRIDE));
> +
> +	/* Enable MAC receive filter mask for specified index */
> +	control = readq(base + MLXBF_GIGE_CONTROL);
> +	control |= (MLXBF_GIGE_CONTROL_EN_SPECIFIC_MAC << index);
> +	writeq(control, base + MLXBF_GIGE_CONTROL);
> +}
> +
> +static int mlxbf_gige_get_mac_rx_filter(struct mlxbf_gige *priv,
> +					unsigned int index, u64 *dmac)
> +{
> +	void __iomem *base = priv->base;
> +
> +	/* Read destination MAC from specified MAC RX filter */
> +	*dmac = readq(base + MLXBF_GIGE_RX_MAC_FILTER +
> +		      (index * MLXBF_GIGE_RX_MAC_FILTER_STRIDE));
> +
> +	return 0;
> +}
> +
> +static void mlxbf_gige_enable_promisc(struct mlxbf_gige *priv)
> +{
> +	void __iomem *base = priv->base;
> +	u64 control;
> +
> +	/* Enable MAC_ID_RANGE match functionality */
> +	control = readq(base + MLXBF_GIGE_CONTROL);
> +	control |= MLXBF_GIGE_CONTROL_MAC_ID_RANGE_EN;
> +	writeq(control, base + MLXBF_GIGE_CONTROL);
> +
> +	/* Set start of destination MAC range check to 0 */
> +	writeq(0, base + MLXBF_GIGE_RX_MAC_FILTER_DMAC_RANGE_START);
> +
> +	/* Set end of destination MAC range check to all FFs */
> +	writeq(0xFFFFFFFFFFFF, base + MLXBF_GIGE_RX_MAC_FILTER_DMAC_RANGE_END);
> +}
> +
> +static void mlxbf_gige_disable_promisc(struct mlxbf_gige *priv)
> +{
> +	void __iomem *base = priv->base;
> +	u64 control;
> +
> +	/* Disable MAC_ID_RANGE match functionality */
> +	control = readq(base + MLXBF_GIGE_CONTROL);
> +	control &= ~MLXBF_GIGE_CONTROL_MAC_ID_RANGE_EN;
> +	writeq(control, base + MLXBF_GIGE_CONTROL);
> +
> +	/* NOTE: no need to change DMAC_RANGE_START or END;
> +	 * those values are ignored since MAC_ID_RANGE_EN=0
> +	 */
> +}
> +
> +/* Receive Initialization
> + * 1) Configures RX MAC filters via MMIO registers
> + * 2) Allocates RX WQE array using coherent DMA mapping
> + * 3) Initializes each element of RX WQE array with a receive
> + *    buffer pointer (also using coherent DMA mapping)
> + * 4) Allocates RX CQE array using coherent DMA mapping
> + * 5) Completes other misc receive initialization
> + */
> +static int mlxbf_gige_rx_init(struct mlxbf_gige *priv)
> +{
> +	size_t wq_size, cq_size;
> +	dma_addr_t *rx_wqe_ptr;
> +	dma_addr_t rx_buf_dma;
> +	u64 data;
> +	int i, j;
> +
> +	/* Configure MAC RX filter #0 to allow RX of broadcast pkts */
> +	mlxbf_gige_set_mac_rx_filter(priv, MLXBF_GIGE_BCAST_MAC_FILTER_IDX,
> +				     BCAST_MAC_ADDR);
> +
> +	wq_size = MLXBF_GIGE_RX_WQE_SZ * priv->rx_q_entries;
> +	priv->rx_wqe_base = dma_alloc_coherent(priv->dev, wq_size,
> +					       &priv->rx_wqe_base_dma,
> +					       GFP_KERNEL);
> +	if (!priv->rx_wqe_base)
> +		return -ENOMEM;
> +
> +	/* Initialize 'rx_wqe_ptr' to point to first RX WQE in array
> +	 * Each RX WQE is simply a receive buffer pointer, so walk
> +	 * the entire array, allocating a 2KB buffer for each element
> +	 */
> +	rx_wqe_ptr = priv->rx_wqe_base;
> +
> +	for (i = 0; i < priv->rx_q_entries; i++) {
> +		/* Allocate a receive buffer for this RX WQE. The DMA
> +		 * form (dma_addr_t) of the receive buffer address is
> +		 * stored in the RX WQE array (via 'rx_wqe_ptr') where
> +		 * it is accessible by the GigE device. The VA form of
> +		 * the receive buffer is stored in 'rx_buf[]' array in
> +		 * the driver private storage for housekeeping.
> +		 */
> +		priv->rx_buf[i] = dma_alloc_coherent(priv->dev,
> +						     MLXBF_GIGE_DEFAULT_BUF_SZ,
> +						     &rx_buf_dma,
> +						     GFP_KERNEL);
> +		if (!priv->rx_buf[i])
> +			goto free_wqe_and_buf;
> +
> +		*rx_wqe_ptr++ = rx_buf_dma;
> +	}
> +
> +	/* Write RX WQE base address into MMIO reg */
> +	writeq(priv->rx_wqe_base_dma, priv->base + MLXBF_GIGE_RX_WQ_BASE);
> +
> +	cq_size = MLXBF_GIGE_RX_CQE_SZ * priv->rx_q_entries;
> +	priv->rx_cqe_base = dma_alloc_coherent(priv->dev, cq_size,
> +					       &priv->rx_cqe_base_dma,
> +					       GFP_KERNEL);
> +	if (!priv->rx_cqe_base)
> +		goto free_wqe_and_buf;
> +
> +	/* Write RX CQE base address into MMIO reg */
> +	writeq(priv->rx_cqe_base_dma, priv->base + MLXBF_GIGE_RX_CQ_BASE);
> +
> +	/* Write RX_WQE_PI with current number of replenished buffers */
> +	writeq(priv->rx_q_entries, priv->base + MLXBF_GIGE_RX_WQE_PI);
> +
> +	/* Enable RX DMA to write new packets to memory */
> +	writeq(MLXBF_GIGE_RX_DMA_EN, priv->base + MLXBF_GIGE_RX_DMA);
> +
> +	/* Enable removal of CRC during RX */
> +	data = readq(priv->base + MLXBF_GIGE_RX);
> +	data |= MLXBF_GIGE_RX_STRIP_CRC_EN;
> +	writeq(data, priv->base + MLXBF_GIGE_RX);
> +
> +	/* Enable RX MAC filter pass and discard counters */
> +	writeq(MLXBF_GIGE_RX_MAC_FILTER_COUNT_DISC_EN,
> +	       priv->base + MLXBF_GIGE_RX_MAC_FILTER_COUNT_DISC);
> +	writeq(MLXBF_GIGE_RX_MAC_FILTER_COUNT_PASS_EN,
> +	       priv->base + MLXBF_GIGE_RX_MAC_FILTER_COUNT_PASS);
> +
> +	/* Clear MLXBF_GIGE_INT_MASK 'receive pkt' bit to
> +	 * indicate readiness to receive pkts
> +	 */
> +	data = readq(priv->base + MLXBF_GIGE_INT_MASK);
> +	data &= ~MLXBF_GIGE_INT_MASK_RX_RECEIVE_PACKET;
> +	writeq(data, priv->base + MLXBF_GIGE_INT_MASK);
> +
> +	writeq(ilog2(priv->rx_q_entries),
> +	       priv->base + MLXBF_GIGE_RX_WQE_SIZE_LOG2);
> +
> +	return 0;
> +
> +free_wqe_and_buf:
> +	rx_wqe_ptr = priv->rx_wqe_base;
> +	for (j = 0; j < i; j++) {
> +		dma_free_coherent(priv->dev, MLXBF_GIGE_DEFAULT_BUF_SZ,
> +				  priv->rx_buf[j], *rx_wqe_ptr);
> +		rx_wqe_ptr++;
> +	}
> +	dma_free_coherent(priv->dev, wq_size,
> +			  priv->rx_wqe_base, priv->rx_wqe_base_dma);
> +	return -ENOMEM;
> +}
> +
> +/* Transmit Initialization
> + * 1) Allocates TX WQE array using coherent DMA mapping
> + * 2) Allocates TX completion counter using coherent DMA mapping
> + */
> +static int mlxbf_gige_tx_init(struct mlxbf_gige *priv)
> +{
> +	size_t size;
> +
> +	size = MLXBF_GIGE_TX_WQE_SZ * priv->tx_q_entries;
> +	priv->tx_wqe_base = dma_alloc_coherent(priv->dev, size,
> +					       &priv->tx_wqe_base_dma,
> +					       GFP_KERNEL);
> +	if (!priv->tx_wqe_base)
> +		return -ENOMEM;
> +
> +	priv->tx_wqe_next = priv->tx_wqe_base;
> +
> +	/* Write TX WQE base address into MMIO reg */
> +	writeq(priv->tx_wqe_base_dma, priv->base + MLXBF_GIGE_TX_WQ_BASE);
> +
> +	/* Allocate address for TX completion count */
> +	priv->tx_cc = dma_alloc_coherent(priv->dev, MLXBF_GIGE_TX_CC_SZ,
> +					 &priv->tx_cc_dma, GFP_KERNEL);
> +
> +	if (!priv->tx_cc) {
> +		dma_free_coherent(priv->dev, size,
> +				  priv->tx_wqe_base, priv->tx_wqe_base_dma);
> +		return -ENOMEM;
> +	}
> +
> +	/* Write TX CC base address into MMIO reg */
> +	writeq(priv->tx_cc_dma, priv->base + MLXBF_GIGE_TX_CI_UPDATE_ADDRESS);
> +
> +	writeq(ilog2(priv->tx_q_entries),
> +	       priv->base + MLXBF_GIGE_TX_WQ_SIZE_LOG2);
> +
> +	priv->prev_tx_ci = 0;
> +	priv->tx_pi = 0;
> +
> +	return 0;
> +}
> +
> +/* Receive Deinitialization
> + * This routine will free allocations done by mlxbf_gige_rx_init(),
> + * namely the RX WQE and RX CQE arrays, as well as all RX buffers
> + */
> +static void mlxbf_gige_rx_deinit(struct mlxbf_gige *priv)
> +{
> +	dma_addr_t *rx_wqe_ptr;
> +	size_t size;
> +	int i;
> +
> +	rx_wqe_ptr = priv->rx_wqe_base;
> +
> +	for (i = 0; i < priv->rx_q_entries; i++) {
> +		dma_free_coherent(priv->dev, MLXBF_GIGE_DEFAULT_BUF_SZ,
> +				  priv->rx_buf[i], *rx_wqe_ptr);
> +		priv->rx_buf[i] = NULL;
> +		rx_wqe_ptr++;
> +	}
> +
> +	size = MLXBF_GIGE_RX_WQE_SZ * priv->rx_q_entries;
> +	dma_free_coherent(priv->dev, size,
> +			  priv->rx_wqe_base, priv->rx_wqe_base_dma);
> +
> +	size = MLXBF_GIGE_RX_CQE_SZ * priv->rx_q_entries;
> +	dma_free_coherent(priv->dev, size,
> +			  priv->rx_cqe_base, priv->rx_cqe_base_dma);
> +
> +	priv->rx_wqe_base = 0;
> +	priv->rx_wqe_base_dma = 0;
> +	priv->rx_cqe_base = 0;
> +	priv->rx_cqe_base_dma = 0;
> +	writeq(0, priv->base + MLXBF_GIGE_RX_WQ_BASE);
> +	writeq(0, priv->base + MLXBF_GIGE_RX_CQ_BASE);
> +}
> +
> +/* Transmit Deinitialization
> + * This routine will free allocations done by mlxbf_gige_tx_init(),
> + * namely the TX WQE array and the TX completion counter
> + */
> +static void mlxbf_gige_tx_deinit(struct mlxbf_gige *priv)
> +{
> +	u64 *tx_wqe_ptr;
> +	size_t size;
> +	int i;
> +
> +	tx_wqe_ptr = priv->tx_wqe_base;
> +
> +	for (i = 0; i < priv->tx_q_entries; i++) {
> +		if (priv->tx_buf[i]) {
> +			dma_free_coherent(priv->dev, MLXBF_GIGE_DEFAULT_BUF_SZ,
> +					  priv->tx_buf[i], *tx_wqe_ptr);
> +			priv->tx_buf[i] = NULL;
> +		}
> +		tx_wqe_ptr += 2;
> +	}
> +
> +	size = MLXBF_GIGE_TX_WQE_SZ * priv->tx_q_entries;
> +	dma_free_coherent(priv->dev, size,
> +			  priv->tx_wqe_base, priv->tx_wqe_base_dma);
> +
> +	dma_free_coherent(priv->dev, MLXBF_GIGE_TX_CC_SZ,
> +			  priv->tx_cc, priv->tx_cc_dma);
> +
> +	priv->tx_wqe_base = 0;
> +	priv->tx_wqe_base_dma = 0;
> +	priv->tx_cc = 0;
> +	priv->tx_cc_dma = 0;
> +	priv->tx_wqe_next = 0;
> +	writeq(0, priv->base + MLXBF_GIGE_TX_WQ_BASE);
> +	writeq(0, priv->base + MLXBF_GIGE_TX_CI_UPDATE_ADDRESS);
> +}
> +
> +/* Start of struct ethtool_ops functions */
> +static int mlxbf_gige_get_regs_len(struct net_device *netdev)
> +{
> +	/* Return size of MMIO register space (in bytes).
> +	 *
> +	 * NOTE: MLXBF_GIGE_MAC_CFG is the last defined register offset,
> +	 * so use that plus size of single register to derive total size
> +	 */
> +	return MLXBF_GIGE_MAC_CFG + 8;
> +}
> +
> +static void mlxbf_gige_get_regs(struct net_device *netdev,
> +				struct ethtool_regs *regs, void *p)
> +{
> +	struct mlxbf_gige *priv = netdev_priv(netdev);
> +	u64 *buff = p;
> +	int reg;
> +
> +	regs->version = MLXBF_GIGE_REGS_VERSION;
> +
> +	/* Read entire MMIO register space and store results
> +	 * into the provided buffer. Each 64-bit word is converted
> +	 * to big-endian to make the output more readable.
> +	 *
> +	 * NOTE: by design, a read to an offset without an existing
> +	 *       register will be acknowledged and return zero.
> +	 */
> +	for (reg = 0; reg <= MLXBF_GIGE_MAC_CFG; reg += 8)
> +		*buff++ = cpu_to_be64(readq(priv->base + reg));
> +}
> +
> +static void mlxbf_gige_get_ringparam(struct net_device *netdev,
> +				     struct ethtool_ringparam *ering)
> +{
> +	struct mlxbf_gige *priv = netdev_priv(netdev);
> +
> +	memset(ering, 0, sizeof(*ering));
> +	ering->rx_max_pending = MLXBF_GIGE_MAX_RXQ_SZ;
> +	ering->tx_max_pending = MLXBF_GIGE_MAX_TXQ_SZ;
> +	ering->rx_pending = priv->rx_q_entries;
> +	ering->tx_pending = priv->tx_q_entries;
> +}
> +
> +static int mlxbf_gige_set_ringparam(struct net_device *netdev,
> +				    struct ethtool_ringparam *ering)
> +{
> +	const struct net_device_ops *ops = netdev->netdev_ops;
> +	struct mlxbf_gige *priv = netdev_priv(netdev);
> +	int new_rx_q_entries, new_tx_q_entries;
> +
> +	/* Device does not have separate queues for small/large frames */
> +	if (ering->rx_mini_pending || ering->rx_jumbo_pending)
> +		return -EINVAL;
> +
> +	/* Round up to supported values */
> +	new_rx_q_entries = roundup_pow_of_two(ering->rx_pending);
> +	new_tx_q_entries = roundup_pow_of_two(ering->tx_pending);
> +
> +	/* Range check the new values */
> +	if (new_tx_q_entries < MLXBF_GIGE_MIN_TXQ_SZ ||
> +	    new_tx_q_entries > MLXBF_GIGE_MAX_TXQ_SZ ||
> +	    new_rx_q_entries < MLXBF_GIGE_MIN_RXQ_SZ ||
> +	    new_rx_q_entries > MLXBF_GIGE_MAX_RXQ_SZ)
> +		return -EINVAL;
> +
> +	/* If queue sizes did not change, exit now */
> +	if (new_rx_q_entries == priv->rx_q_entries &&
> +	    new_tx_q_entries == priv->tx_q_entries)
> +		return 0;
> +
> +	if (netif_running(netdev))
> +		ops->ndo_stop(netdev);
> +
> +	priv->rx_q_entries = new_rx_q_entries;
> +	priv->tx_q_entries = new_tx_q_entries;
> +
> +	if (netif_running(netdev))
> +		ops->ndo_open(netdev);
> +
> +	return 0;
> +}
> +
> +static void mlxbf_gige_get_drvinfo(struct net_device *netdev,
> +				   struct ethtool_drvinfo *info)
> +{
> +	strlcpy(info->driver, DRV_NAME, sizeof(info->driver));
> +	strlcpy(info->bus_info, dev_name(&netdev->dev), sizeof(info->bus_info));
> +}
> +
> +static const struct {
> +	const char string[ETH_GSTRING_LEN];
> +} mlxbf_gige_ethtool_stats_keys[] = {
> +	{ "rx_bytes" },
> +	{ "rx_packets" },
> +	{ "tx_bytes" },
> +	{ "tx_packets" },
> +	{ "hw_access_errors" },
> +	{ "tx_invalid_checksums" },
> +	{ "tx_small_frames" },
> +	{ "tx_index_errors" },
> +	{ "sw_config_errors" },
> +	{ "sw_access_errors" },
> +	{ "rx_truncate_errors" },
> +	{ "rx_mac_errors" },
> +	{ "rx_din_dropped_pkts" },
> +	{ "tx_fifo_full" },
> +	{ "rx_filter_passed_pkts" },
> +	{ "rx_filter_discard_pkts" },
> +};
> +
> +static int mlxbf_gige_get_sset_count(struct net_device *netdev, int stringset)
> +{
> +	if (stringset != ETH_SS_STATS)
> +		return -EOPNOTSUPP;
> +	return ARRAY_SIZE(mlxbf_gige_ethtool_stats_keys);
> +}
> +
> +static void mlxbf_gige_get_strings(struct net_device *netdev, u32 stringset,
> +				   u8 *buf)
> +{
> +	if (stringset != ETH_SS_STATS)
> +		return;
> +	memcpy(buf, &mlxbf_gige_ethtool_stats_keys,
> +	       sizeof(mlxbf_gige_ethtool_stats_keys));
> +}
> +
> +static void mlxbf_gige_get_ethtool_stats(struct net_device *netdev,
> +					 struct ethtool_stats *estats,
> +					 u64 *data)
> +{
> +	struct mlxbf_gige *priv = netdev_priv(netdev);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&priv->lock, flags);
> +
> +	/* Fill data array with interface statistics
> +	 *
> +	 * NOTE: the data writes must be in
> +	 *       sync with the strings shown in
> +	 *       the mlxbf_gige_ethtool_stats_keys[] array
> +	 *
> +	 * NOTE2: certain statistics below are zeroed upon
> +	 *        port disable, so the calculation below
> +	 *        must include the "cached" value of the stat
> +	 *        plus the value read directly from hardware.
> +	 *        Cached statistics are currently:
> +	 *          rx_din_dropped_pkts
> +	 *          rx_filter_passed_pkts
> +	 *          rx_filter_discard_pkts
> +	 */
> +	*data++ = netdev->stats.rx_bytes;
> +	*data++ = netdev->stats.rx_packets;
> +	*data++ = netdev->stats.tx_bytes;
> +	*data++ = netdev->stats.tx_packets;
> +	*data++ = priv->stats.hw_access_errors;
> +	*data++ = priv->stats.tx_invalid_checksums;
> +	*data++ = priv->stats.tx_small_frames;
> +	*data++ = priv->stats.tx_index_errors;
> +	*data++ = priv->stats.sw_config_errors;
> +	*data++ = priv->stats.sw_access_errors;
> +	*data++ = priv->stats.rx_truncate_errors;
> +	*data++ = priv->stats.rx_mac_errors;
> +	*data++ = (priv->stats.rx_din_dropped_pkts +
> +		   readq(priv->base + MLXBF_GIGE_RX_DIN_DROP_COUNTER));
> +	*data++ = priv->stats.tx_fifo_full;
> +	*data++ = (priv->stats.rx_filter_passed_pkts +
> +		   readq(priv->base + MLXBF_GIGE_RX_PASS_COUNTER_ALL));
> +	*data++ = (priv->stats.rx_filter_discard_pkts +
> +		   readq(priv->base + MLXBF_GIGE_RX_DISC_COUNTER_ALL));
> +
> +	spin_unlock_irqrestore(&priv->lock, flags);
> +}
> +
> +static void mlxbf_gige_get_pauseparam(struct net_device *netdev,
> +				      struct ethtool_pauseparam *pause)
> +{
> +	pause->autoneg = AUTONEG_ENABLE;
> +	pause->rx_pause = 1;
> +	pause->tx_pause = 1;
> +}
> +
> +static int mlxbf_gige_get_link_ksettings(struct net_device *netdev,
> +					 struct ethtool_link_ksettings *link_ksettings)
> +{
> +	struct phy_device *phydev = netdev->phydev;
> +	u32 supported, advertising;
> +	u32 lp_advertising = 0;
> +	int status;
> +
> +	supported = SUPPORTED_TP | SUPPORTED_1000baseT_Full |
> +		    SUPPORTED_Autoneg | SUPPORTED_Pause;
> +
> +	advertising = ADVERTISED_1000baseT_Full | ADVERTISED_Autoneg |
> +		      ADVERTISED_Pause;
> +
> +	status = phy_read(phydev, MII_LPA);
> +	if (status >= 0)
> +		lp_advertising = mii_lpa_to_ethtool_lpa_t(status & 0xffff);
> +
> +	status = phy_read(phydev, MII_STAT1000);
> +	if (status >= 0)
> +		lp_advertising |= mii_stat1000_to_ethtool_lpa_t(status & 0xffff);
> +
> +	ethtool_convert_legacy_u32_to_link_mode(link_ksettings->link_modes.supported,
> +						supported);
> +	ethtool_convert_legacy_u32_to_link_mode(link_ksettings->link_modes.advertising,
> +						advertising);
> +	ethtool_convert_legacy_u32_to_link_mode(link_ksettings->link_modes.lp_advertising,
> +						lp_advertising);
> +
> +	link_ksettings->base.autoneg = AUTONEG_ENABLE;
> +	link_ksettings->base.speed = SPEED_1000;
> +	link_ksettings->base.duplex = DUPLEX_FULL;
> +	link_ksettings->base.port = PORT_TP;
> +	link_ksettings->base.phy_address = MLXBF_GIGE_MDIO_DEFAULT_PHY_ADDR;
> +	link_ksettings->base.transceiver = XCVR_INTERNAL;
> +	link_ksettings->base.mdio_support = ETH_MDIO_SUPPORTS_C22;
> +	link_ksettings->base.eth_tp_mdix = ETH_TP_MDI_INVALID;
> +	link_ksettings->base.eth_tp_mdix_ctrl = ETH_TP_MDI_INVALID;
> +
> +	return 0;
> +}
> +
> +static const struct ethtool_ops mlxbf_gige_ethtool_ops = {
> +	.get_drvinfo		= mlxbf_gige_get_drvinfo,
> +	.get_link		= ethtool_op_get_link,
> +	.get_ringparam		= mlxbf_gige_get_ringparam,
> +	.set_ringparam		= mlxbf_gige_set_ringparam,
> +	.get_regs_len           = mlxbf_gige_get_regs_len,
> +	.get_regs               = mlxbf_gige_get_regs,
> +	.get_strings            = mlxbf_gige_get_strings,
> +	.get_sset_count         = mlxbf_gige_get_sset_count,
> +	.get_ethtool_stats      = mlxbf_gige_get_ethtool_stats,
> +	.nway_reset		= phy_ethtool_nway_reset,
> +	.get_pauseparam		= mlxbf_gige_get_pauseparam,
> +	.get_link_ksettings	= mlxbf_gige_get_link_ksettings,
> +};
> +
> +static void mlxbf_gige_handle_link_change(struct net_device *netdev)
> +{
> +	struct mlxbf_gige *priv = netdev_priv(netdev);
> +	struct phy_device *phydev = netdev->phydev;
> +	irqreturn_t ret;
> +
> +	ret = mlxbf_gige_mdio_handle_phy_interrupt(priv);
> +	if (ret != IRQ_HANDLED)
> +		return;
> +
> +	/* print new link status only if the interrupt came from the PHY */
> +	phy_print_status(phydev);
> +}
> +
> +/* Start of struct net_device_ops functions */
> +static irqreturn_t mlxbf_gige_error_intr(int irq, void *dev_id)
> +{
> +	struct mlxbf_gige *priv;
> +	u64 int_status;
> +
> +	priv = dev_id;
> +
> +	priv->error_intr_count++;
> +
> +	int_status = readq(priv->base + MLXBF_GIGE_INT_STATUS);
> +
> +	if (int_status & MLXBF_GIGE_INT_STATUS_HW_ACCESS_ERROR)
> +		priv->stats.hw_access_errors++;
> +
> +	if (int_status & MLXBF_GIGE_INT_STATUS_TX_CHECKSUM_INPUTS) {
> +		priv->stats.tx_invalid_checksums++;
> +		/* This error condition is latched into MLXBF_GIGE_INT_STATUS
> +		 * when the GigE silicon operates on the offending
> +		 * TX WQE. The write to MLXBF_GIGE_INT_STATUS at the bottom
> +		 * of this routine clears this error condition.
> +		 */
> +	}
> +
> +	if (int_status & MLXBF_GIGE_INT_STATUS_TX_SMALL_FRAME_SIZE) {
> +		priv->stats.tx_small_frames++;
> +		/* This condition happens when the networking stack invokes
> +		 * this driver's "start_xmit()" method with a packet whose
> +		 * size < 60 bytes.  The GigE silicon will automatically pad
> +		 * this small frame up to a minimum-sized frame before it is
> +		 * sent. The "tx_small_frame" condition is latched into the
> +		 * MLXBF_GIGE_INT_STATUS register when the GigE silicon
> +		 * operates on the offending TX WQE. The write to
> +		 * MLXBF_GIGE_INT_STATUS at the bottom of this routine
> +		 * clears this condition.
> +		 */
> +	}
> +
> +	if (int_status & MLXBF_GIGE_INT_STATUS_TX_PI_CI_EXCEED_WQ_SIZE)
> +		priv->stats.tx_index_errors++;
> +
> +	if (int_status & MLXBF_GIGE_INT_STATUS_SW_CONFIG_ERROR)
> +		priv->stats.sw_config_errors++;
> +
> +	if (int_status & MLXBF_GIGE_INT_STATUS_SW_ACCESS_ERROR)
> +		priv->stats.sw_access_errors++;
> +
> +	/* Clear all error interrupts by writing '1' back to
> +	 * all the asserted bits in INT_STATUS.  Do not write
> +	 * '1' back to 'receive packet' bit, since that is
> +	 * managed separately.
> +	 */
> +
> +	int_status &= ~MLXBF_GIGE_INT_STATUS_RX_RECEIVE_PACKET;
> +
> +	writeq(int_status, priv->base + MLXBF_GIGE_INT_STATUS);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t mlxbf_gige_rx_intr(int irq, void *dev_id)
> +{
> +	struct mlxbf_gige *priv;
> +
> +	priv = dev_id;
> +
> +	priv->rx_intr_count++;
> +
> +	/* Driver has been interrupted because a new packet is available,
> +	 * but do not process packets at this time.  Instead, disable any
> +	 * further "packet rx" interrupts and tell the networking subsystem
> +	 * to poll the driver to pick up all available packets.
> +	 *
> +	 * NOTE: GigE silicon automatically disables "packet rx" interrupt by
> +	 *       setting MLXBF_GIGE_INT_MASK bit0 upon triggering the interrupt
> +	 *       to the ARM cores.  Software needs to re-enable "packet rx"
> +	 *       interrupts by clearing MLXBF_GIGE_INT_MASK bit0.
> +	 */
> +
> +	/* Tell networking subsystem to poll GigE driver */
> +	napi_schedule(&priv->napi);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t mlxbf_gige_llu_plu_intr(int irq, void *dev_id)
> +{
> +	struct mlxbf_gige *priv;
> +
> +	priv = dev_id;
> +	priv->llu_plu_intr_count++;
> +
> +	return IRQ_HANDLED;
> +}
> +
> +/* Function that returns status of TX ring:
> + *          0: TX ring is full, i.e. there are no
> + *             available unused entries in TX ring.
> + *   non-zero: TX ring is not full, i.e. there are
> + *             some available entries in TX ring.
> + *             The non-zero value is a measure of
> + *             how many TX entries are available, but
> + *             it is not the exact number of available
> + *             entries (see below).
> + *
> + * The algorithm makes the assumption that if
> + * (prev_tx_ci == tx_pi) then the TX ring is empty.
> + * An empty ring actually has (tx_q_entries-1)
> + * entries, which allows the algorithm to differentiate
> + * the case of an empty ring vs. a full ring.
> + */
> +static u16 mlxbf_gige_tx_buffs_avail(struct mlxbf_gige *priv)
> +{
> +	unsigned long flags;
> +	u16 avail;
> +
> +	spin_lock_irqsave(&priv->lock, flags);
> +
> +	if (priv->prev_tx_ci == priv->tx_pi)
> +		avail = priv->tx_q_entries - 1;
> +	else
> +		avail = ((priv->tx_q_entries + priv->prev_tx_ci - priv->tx_pi)
> +			  % priv->tx_q_entries) - 1;
> +
> +	spin_unlock_irqrestore(&priv->lock, flags);
> +
> +	return avail;
> +}
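The ring accounting above is worth spelling out: one slot is always sacrificed so that `prev_tx_ci == tx_pi` can unambiguously mean "empty" rather than "full". A minimal standalone model of the same arithmetic (hypothetical helper name; free-running 16-bit indices as in the driver, ring size a power of two):

```c
#include <stdint.h>

/* Standalone sketch of the math in mlxbf_gige_tx_buffs_avail().
 * prev_tx_ci and tx_pi are free-running 16-bit counters; because
 * q_entries is a power of two, the modulo also behaves across the
 * u16 wrap of the indices. */
static uint16_t tx_avail(uint16_t prev_tx_ci, uint16_t tx_pi,
			 uint16_t q_entries)
{
	if (prev_tx_ci == tx_pi)
		return q_entries - 1;	/* empty ring */
	return ((q_entries + prev_tx_ci - tx_pi) % q_entries) - 1;
}
```

With an 8-entry ring, three in-flight WQEs leave four usable slots, and a completely full ring reports zero.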
> +
> +static bool mlxbf_gige_handle_tx_complete(struct mlxbf_gige *priv)
> +{
> +	struct net_device_stats *stats;
> +	u16 tx_wqe_index;
> +	u64 *tx_wqe_addr;
> +	u64 tx_status;
> +	u16 tx_ci;
> +
> +	tx_status = readq(priv->base + MLXBF_GIGE_TX_STATUS);
> +	if (tx_status & MLXBF_GIGE_TX_STATUS_DATA_FIFO_FULL)
> +		priv->stats.tx_fifo_full++;
> +	tx_ci = readq(priv->base + MLXBF_GIGE_TX_CONSUMER_INDEX);
> +	stats = &priv->netdev->stats;
> +
> +	/* Transmit completion logic needs to loop until the completion
> +	 * index (in SW) equals TX consumer index (from HW).  These
> +	 * parameters are unsigned 16-bit values and the wrap case needs
> +	 * to be supported, that is TX consumer index wrapped from 0xFFFF
> +	 * to 0 while TX completion index is still < 0xFFFF.
> +	 */
> +	for (; priv->prev_tx_ci != tx_ci; priv->prev_tx_ci++) {
> +		tx_wqe_index = priv->prev_tx_ci % priv->tx_q_entries;
> +		/* Each TX WQE is 16 bytes. The 8 MSB store the 2KB TX
> +		 * buffer address and the 8 LSB contain information
> +		 * about the TX WQE.
> +		 */
> +		tx_wqe_addr = priv->tx_wqe_base +
> +			       (tx_wqe_index * MLXBF_GIGE_TX_WQE_SZ_QWORDS);
> +
> +		stats->tx_packets++;
> +		stats->tx_bytes += MLXBF_GIGE_TX_WQE_PKT_LEN(tx_wqe_addr);
> +		dma_free_coherent(priv->dev, MLXBF_GIGE_DEFAULT_BUF_SZ,
> +				  priv->tx_buf[tx_wqe_index], *tx_wqe_addr);
> +		priv->tx_buf[tx_wqe_index] = NULL;
> +	}
> +
> +	/* Since the TX ring was likely just drained, check if TX queue
> +	 * had previously been stopped and now that there are TX buffers
> +	 * available the TX queue can be awakened.
> +	 */
> +	if (netif_queue_stopped(priv->netdev) &&
> +	    mlxbf_gige_tx_buffs_avail(priv)) {
> +		netif_wake_queue(priv->netdev);
> +	}
> +
> +	return true;
> +}
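The wrap case called out in the comment above is easy to get wrong, so it may help to model it separately: because both indices are free-running `u16` values, a simple inequality test plus increment walks correctly across the 0xFFFF to 0 boundary. A hypothetical standalone check:

```c
#include <stdint.h>

/* Counts how many WQEs the completion loop in
 * mlxbf_gige_handle_tx_complete() would retire, walking the cached
 * completion index up to the hardware consumer index. The uint16_t
 * increment wraps naturally at 0xFFFF. */
static unsigned int tx_completions(uint16_t prev_ci, uint16_t hw_ci)
{
	unsigned int n = 0;

	for (; prev_ci != hw_ci; prev_ci++)
		n++;
	return n;
}
```

The second assertion below exercises the 0xFFFE -> 0x0002 wrap, which retires four entries.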
> +
> +static bool mlxbf_gige_rx_packet(struct mlxbf_gige *priv, int *rx_pkts)
> +{
> +	struct net_device *netdev = priv->netdev;
> +	u16 rx_pi_rem, rx_ci_rem;
> +	struct sk_buff *skb;
> +	u64 *rx_cqe_addr;
> +	u64 datalen;
> +	u64 rx_cqe;
> +	u16 rx_ci;
> +	u16 rx_pi;
> +	u8 *pktp;
> +
> +	/* Index into RX buffer array is rx_pi w/wrap based on RX_CQE_SIZE */
> +	rx_pi = readq(priv->base + MLXBF_GIGE_RX_WQE_PI);
> +	rx_pi_rem = rx_pi % priv->rx_q_entries;
> +	pktp = priv->rx_buf[rx_pi_rem];
> +	rx_cqe_addr = priv->rx_cqe_base + rx_pi_rem;
> +	rx_cqe = *rx_cqe_addr;
> +	datalen = rx_cqe & MLXBF_GIGE_RX_CQE_PKT_LEN_MASK;
> +
> +	if ((rx_cqe & MLXBF_GIGE_RX_CQE_PKT_STATUS_MASK) == 0) {
> +		/* Packet is OK, increment stats */
> +		netdev->stats.rx_packets++;
> +		netdev->stats.rx_bytes += datalen;
> +
> +		skb = dev_alloc_skb(datalen);
> +		if (!skb) {
> +			netdev->stats.rx_dropped++;
> +			return false;
> +		}
> +
> +		memcpy(skb_put(skb, datalen), pktp, datalen);
> +
> +		skb->dev = netdev;
> +		skb->protocol = eth_type_trans(skb, netdev);
> +		skb->ip_summed = CHECKSUM_NONE; /* device did not checksum packet */
> +
> +		netif_receive_skb(skb);
> +	} else if (rx_cqe & MLXBF_GIGE_RX_CQE_PKT_STATUS_MAC_ERR) {
> +		priv->stats.rx_mac_errors++;
> +	} else if (rx_cqe & MLXBF_GIGE_RX_CQE_PKT_STATUS_TRUNCATED) {
> +		priv->stats.rx_truncate_errors++;
> +	}
> +
> +	/* Let hardware know we've replenished one buffer */
> +	writeq(rx_pi + 1, priv->base + MLXBF_GIGE_RX_WQE_PI);
> +
> +	(*rx_pkts)++;
> +	rx_pi = readq(priv->base + MLXBF_GIGE_RX_WQE_PI);
> +	rx_pi_rem = rx_pi % priv->rx_q_entries;
> +	rx_ci = readq(priv->base + MLXBF_GIGE_RX_CQE_PACKET_CI);
> +	rx_ci_rem = rx_ci % priv->rx_q_entries;
> +
> +	return rx_pi_rem != rx_ci_rem;
> +}
> +
> +/* Driver poll() function called by NAPI infrastructure */
> +static int mlxbf_gige_poll(struct napi_struct *napi, int budget)
> +{
> +	struct mlxbf_gige *priv;
> +	bool remaining_pkts;
> +	int work_done = 0;
> +	u64 data;
> +
> +	priv = container_of(napi, struct mlxbf_gige, napi);
> +
> +	mlxbf_gige_handle_tx_complete(priv);
> +
> +	do {
> +		remaining_pkts = mlxbf_gige_rx_packet(priv, &work_done);
> +	} while (remaining_pkts && work_done < budget);
> +
> +	/* If amount of work done < budget, turn off NAPI polling
> +	 * via napi_complete_done(napi, work_done) and then
> +	 * re-enable interrupts.
> +	 */
> +	if (work_done < budget && napi_complete_done(napi, work_done)) {
> +		/* Clear MLXBF_GIGE_INT_MASK 'receive pkt' bit to
> +		 * indicate receive readiness
> +		 */
> +		data = readq(priv->base + MLXBF_GIGE_INT_MASK);
> +		data &= ~MLXBF_GIGE_INT_MASK_RX_RECEIVE_PACKET;
> +		writeq(data, priv->base + MLXBF_GIGE_INT_MASK);
> +	}
> +
> +	return work_done;
> +}
> +
> +static int mlxbf_gige_request_irqs(struct mlxbf_gige *priv)
> +{
> +	int err;
> +
> +	err = devm_request_irq(priv->dev, priv->error_irq,
> +			       mlxbf_gige_error_intr, 0, "mlxbf_gige_error",
> +			       priv);
> +	if (err) {
> +		dev_err(priv->dev, "Request error_irq failure\n");
> +		return err;
> +	}
> +
> +	err = devm_request_irq(priv->dev, priv->rx_irq,
> +			       mlxbf_gige_rx_intr, 0, "mlxbf_gige_rx",
> +			       priv);
> +	if (err) {
> +		dev_err(priv->dev, "Request rx_irq failure\n");
> +		return err;
> +	}
> +
> +	err = devm_request_irq(priv->dev, priv->llu_plu_irq,
> +			       mlxbf_gige_llu_plu_intr, 0, "mlxbf_gige_llu_plu",
> +			       priv);
> +	if (err) {
> +		dev_err(priv->dev, "Request llu_plu_irq failure\n");
> +		return err;
> +	}
> +
> +	return 0;
> +}
> +
> +static void mlxbf_gige_free_irqs(struct mlxbf_gige *priv)
> +{
> +	devm_free_irq(priv->dev, priv->error_irq, priv);
> +	devm_free_irq(priv->dev, priv->rx_irq, priv);
> +	devm_free_irq(priv->dev, priv->llu_plu_irq, priv);
> +}
> +
> +static void mlxbf_gige_cache_stats(struct mlxbf_gige *priv)
> +{
> +	struct mlxbf_gige_stats *p;
> +
> +	/* Cache stats that will be cleared by clean port operation */
> +	p = &priv->stats;
> +	p->rx_din_dropped_pkts += readq(priv->base +
> +					MLXBF_GIGE_RX_DIN_DROP_COUNTER);
> +	p->rx_filter_passed_pkts += readq(priv->base +
> +					  MLXBF_GIGE_RX_PASS_COUNTER_ALL);
> +	p->rx_filter_discard_pkts += readq(priv->base +
> +					   MLXBF_GIGE_RX_DISC_COUNTER_ALL);
> +}
> +
> +static void mlxbf_gige_clean_port(struct mlxbf_gige *priv)
> +{
> +	u64 control, status;
> +	int cnt;
> +
> +	/* Set the CLEAN_PORT_EN bit to trigger SW reset */
> +	control = readq(priv->base + MLXBF_GIGE_CONTROL);
> +	control |= MLXBF_GIGE_CONTROL_CLEAN_PORT_EN;
> +	writeq(control, priv->base + MLXBF_GIGE_CONTROL);
> +
> +	/* Loop waiting for status ready bit to assert */
> +	cnt = 1000;
> +	do {
> +		status = readq(priv->base + MLXBF_GIGE_STATUS);
> +		if (status & MLXBF_GIGE_STATUS_READY)
> +			break;
> +		usleep_range(50, 100);
> +	} while (--cnt > 0);
> +
> +	/* Clear the CLEAN_PORT_EN bit at end of this loop */
> +	control = readq(priv->base + MLXBF_GIGE_CONTROL);
> +	control &= ~MLXBF_GIGE_CONTROL_CLEAN_PORT_EN;
> +	writeq(control, priv->base + MLXBF_GIGE_CONTROL);
> +}
> +
> +static int mlxbf_gige_open(struct net_device *netdev)
> +{
> +	struct mlxbf_gige *priv = netdev_priv(netdev);
> +	struct phy_device *phydev = netdev->phydev;
> +	u64 int_en;
> +	int err;
> +
> +	mlxbf_gige_cache_stats(priv);
> +	mlxbf_gige_clean_port(priv);
> +	mlxbf_gige_rx_init(priv);
> +	mlxbf_gige_tx_init(priv);
> +	netif_napi_add(netdev, &priv->napi, mlxbf_gige_poll, NAPI_POLL_WEIGHT);
> +	napi_enable(&priv->napi);
> +	netif_start_queue(netdev);
> +
> +	err = mlxbf_gige_request_irqs(priv);
> +	if (err)
> +		return err;
> +
> +	phy_start(phydev);
> +
> +	/* Set bits in INT_EN that we care about */
> +	int_en = MLXBF_GIGE_INT_EN_HW_ACCESS_ERROR |
> +		 MLXBF_GIGE_INT_EN_TX_CHECKSUM_INPUTS |
> +		 MLXBF_GIGE_INT_EN_TX_SMALL_FRAME_SIZE |
> +		 MLXBF_GIGE_INT_EN_TX_PI_CI_EXCEED_WQ_SIZE |
> +		 MLXBF_GIGE_INT_EN_SW_CONFIG_ERROR |
> +		 MLXBF_GIGE_INT_EN_SW_ACCESS_ERROR |
> +		 MLXBF_GIGE_INT_EN_RX_RECEIVE_PACKET;
> +	writeq(int_en, priv->base + MLXBF_GIGE_INT_EN);
> +
> +	return 0;
> +}
> +
> +static int mlxbf_gige_stop(struct net_device *netdev)
> +{
> +	struct mlxbf_gige *priv = netdev_priv(netdev);
> +
> +	writeq(0, priv->base + MLXBF_GIGE_INT_EN);
> +	netif_stop_queue(netdev);
> +	napi_disable(&priv->napi);
> +	netif_napi_del(&priv->napi);
> +	mlxbf_gige_free_irqs(priv);
> +
> +	if (netdev->phydev)
> +		phy_stop(netdev->phydev);
> +
> +	mlxbf_gige_rx_deinit(priv);
> +	mlxbf_gige_tx_deinit(priv);
> +	mlxbf_gige_cache_stats(priv);
> +	mlxbf_gige_clean_port(priv);
> +
> +	return 0;
> +}
> +
> +/* Function to advance the tx_wqe_next pointer to next TX WQE */
> +static void mlxbf_gige_update_tx_wqe_next(struct mlxbf_gige *priv)
> +{
> +	/* Advance tx_wqe_next pointer */
> +	priv->tx_wqe_next += MLXBF_GIGE_TX_WQE_SZ_QWORDS;
> +
> +	/* Check if 'next' pointer is beyond end of TX ring */
> +	/* If so, set 'next' back to 'base' pointer of ring */
> +	if (priv->tx_wqe_next == (priv->tx_wqe_base +
> +				  (priv->tx_q_entries * MLXBF_GIGE_TX_WQE_SZ_QWORDS)))
> +		priv->tx_wqe_next = priv->tx_wqe_base;
> +}
> +
> +static netdev_tx_t mlxbf_gige_start_xmit(struct sk_buff *skb,
> +					 struct net_device *netdev)
> +{
> +	struct mlxbf_gige *priv = netdev_priv(netdev);
> +	dma_addr_t tx_buf_dma;
> +	u8 *tx_buf = NULL;
> +	u64 *tx_wqe_addr;
> +	u64 word2;
> +
> +	/* Check that there is room left in TX ring */
> +	if (!mlxbf_gige_tx_buffs_avail(priv)) {
> +		/* TX ring is full, inform stack but do not free SKB;
> +		 * the stack will retry this packet, so it is not a drop.
> +		 */
> +		netif_stop_queue(netdev);
> +		return NETDEV_TX_BUSY;
> +	}
> +
> +	/* Allocate ptr for buffer */
> +	if (skb->len < MLXBF_GIGE_DEFAULT_BUF_SZ)
> +		tx_buf = dma_alloc_coherent(priv->dev, MLXBF_GIGE_DEFAULT_BUF_SZ,
> +					    &tx_buf_dma, GFP_KERNEL);
> +
> +	if (!tx_buf) {
> +		/* Free incoming skb, could not alloc TX buffer */
> +		dev_kfree_skb(skb);
> +		netdev->stats.tx_dropped++;
> +		return NETDEV_TX_OK;
> +	}
> +
> +	priv->tx_buf[priv->tx_pi % priv->tx_q_entries] = tx_buf;
> +
> +	/* Copy data from skb to allocated TX buffer
> +	 *
> +	 * NOTE: GigE silicon will automatically pad up to
> +	 *       minimum packet length if needed.
> +	 */
> +	skb_copy_bits(skb, 0, tx_buf, skb->len);
> +
> +	/* Get address of TX WQE */
> +	tx_wqe_addr = priv->tx_wqe_next;
> +
> +	mlxbf_gige_update_tx_wqe_next(priv);
> +
> +	/* Put PA of buffer address into first 64-bit word of TX WQE */
> +	*tx_wqe_addr = tx_buf_dma;
> +
> +	/* Set TX WQE pkt_len appropriately */
> +	word2 = skb->len & MLXBF_GIGE_TX_WQE_PKT_LEN_MASK;
> +
> +	/* Write entire 2nd word of TX WQE */
> +	*(tx_wqe_addr + 1) = word2;
> +
> +	priv->tx_pi++;
> +
> +	/* Create memory barrier before write to TX PI */
> +	wmb();
> +
> +	writeq(priv->tx_pi, priv->base + MLXBF_GIGE_TX_PRODUCER_INDEX);
> +
> +	/* Free incoming skb, contents already copied to HW */
> +	dev_kfree_skb(skb);
> +
> +	return NETDEV_TX_OK;
> +}
> +
> +static int mlxbf_gige_do_ioctl(struct net_device *netdev,
> +			       struct ifreq *ifr, int cmd)
> +{
> +	if (!netif_running(netdev))
> +		return -EINVAL;
> +
> +	return phy_mii_ioctl(netdev->phydev, ifr, cmd);
> +}
> +
> +static void mlxbf_gige_set_rx_mode(struct net_device *netdev)
> +{
> +	struct mlxbf_gige *priv = netdev_priv(netdev);
> +	bool new_promisc_enabled;
> +
> +	new_promisc_enabled = netdev->flags & IFF_PROMISC;
> +
> +	/* Only write to the hardware registers if the new setting
> +	 * of promiscuous mode is different from the current one.
> +	 */
> +	if (new_promisc_enabled != priv->promisc_enabled) {
> +		priv->promisc_enabled = new_promisc_enabled;
> +
> +		if (new_promisc_enabled)
> +			mlxbf_gige_enable_promisc(priv);
> +		else
> +			mlxbf_gige_disable_promisc(priv);
> +	}
> +}
> +
> +static const struct net_device_ops mlxbf_gige_netdev_ops = {
> +	.ndo_open		= mlxbf_gige_open,
> +	.ndo_stop		= mlxbf_gige_stop,
> +	.ndo_start_xmit		= mlxbf_gige_start_xmit,
> +	.ndo_set_mac_address	= eth_mac_addr,
> +	.ndo_validate_addr	= eth_validate_addr,
> +	.ndo_do_ioctl		= mlxbf_gige_do_ioctl,
> +	.ndo_set_rx_mode        = mlxbf_gige_set_rx_mode,
> +};
> +
> +static u64 mlxbf_gige_mac_to_u64(u8 *addr)
> +{
> +	u64 mac = 0;
> +	int i;
> +
> +	for (i = 0; i < ETH_ALEN; i++) {
> +		mac <<= 8;
> +		mac |= addr[i];
> +	}
> +	return mac;
> +}
> +
> +static void mlxbf_gige_u64_to_mac(u8 *addr, u64 mac)
> +{
> +	int i;
> +
> +	for (i = ETH_ALEN; i > 0; i--) {
> +		addr[i - 1] = mac & 0xFF;
> +		mac >>= 8;
> +	}
> +}
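These two helpers are inverses: the first packs the six bytes big-endian into the low 48 bits of a u64 (so `addr[0]` lands in bits 47:40), and the second unpacks them in the same order. A quick standalone round-trip using the same logic (hypothetical names, outside the driver):

```c
#include <stdint.h>

#define ETH_ALEN 6

/* Pack a MAC address big-endian into the low 48 bits of a u64. */
static uint64_t mac_to_u64(const uint8_t *addr)
{
	uint64_t mac = 0;
	int i;

	for (i = 0; i < ETH_ALEN; i++) {
		mac <<= 8;
		mac |= addr[i];
	}
	return mac;
}

/* Unpack the low 48 bits of a u64 back into a 6-byte MAC address. */
static void u64_to_mac(uint8_t *addr, uint64_t mac)
{
	int i;

	for (i = ETH_ALEN; i > 0; i--) {
		addr[i - 1] = mac & 0xFF;
		mac >>= 8;
	}
}
```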
> +
> +static void mlxbf_gige_initial_mac(struct mlxbf_gige *priv)
> +{
> +	u8 mac[ETH_ALEN];
> +	u64 local_mac;
> +	int status;
> +
> +	status = mlxbf_gige_get_mac_rx_filter(priv, MLXBF_GIGE_LOCAL_MAC_FILTER_IDX,
> +					      &local_mac);
> +	mlxbf_gige_u64_to_mac(mac, local_mac);
> +
> +	if (is_valid_ether_addr(mac)) {
> +		ether_addr_copy(priv->netdev->dev_addr, mac);
> +	} else {
> +		/* Provide a random MAC if for some reason the device has
> +		 * not been configured with a valid MAC address already.
> +		 */
> +		eth_hw_addr_random(priv->netdev);
> +	}
> +
> +	local_mac = mlxbf_gige_mac_to_u64(priv->netdev->dev_addr);
> +	mlxbf_gige_set_mac_rx_filter(priv, MLXBF_GIGE_LOCAL_MAC_FILTER_IDX,
> +				     local_mac);
> +}
> +
> +static int mlxbf_gige_probe(struct platform_device *pdev)
> +{
> +	struct phy_device *phydev;
> +	struct net_device *netdev;
> +	struct resource *mac_res;
> +	struct resource *llu_res;
> +	struct resource *plu_res;
> +	struct mlxbf_gige *priv;
> +	void __iomem *llu_base;
> +	void __iomem *plu_base;
> +	void __iomem *base;
> +	u64 control;
> +	int err = 0;
> +
> +	mac_res = platform_get_resource(pdev, IORESOURCE_MEM, MLXBF_GIGE_RES_MAC);
> +	if (!mac_res)
> +		return -ENXIO;
> +
> +	base = devm_ioremap_resource(&pdev->dev, mac_res);
> +	if (IS_ERR(base))
> +		return PTR_ERR(base);
> +
> +	llu_res = platform_get_resource(pdev, IORESOURCE_MEM, MLXBF_GIGE_RES_LLU);
> +	if (!llu_res)
> +		return -ENXIO;
> +
> +	llu_base = devm_ioremap_resource(&pdev->dev, llu_res);
> +	if (IS_ERR(llu_base))
> +		return PTR_ERR(llu_base);
> +
> +	plu_res = platform_get_resource(pdev, IORESOURCE_MEM, MLXBF_GIGE_RES_PLU);
> +	if (!plu_res)
> +		return -ENXIO;
> +
> +	plu_base = devm_ioremap_resource(&pdev->dev, plu_res);
> +	if (IS_ERR(plu_base))
> +		return PTR_ERR(plu_base);
> +
> +	/* Perform general init of GigE block */
> +	control = readq(base + MLXBF_GIGE_CONTROL);
> +	control |= MLXBF_GIGE_CONTROL_PORT_EN;
> +	writeq(control, base + MLXBF_GIGE_CONTROL);
> +
> +	netdev = devm_alloc_etherdev(&pdev->dev, sizeof(*priv));
> +	if (!netdev)
> +		return -ENOMEM;
> +
> +	SET_NETDEV_DEV(netdev, &pdev->dev);
> +	netdev->netdev_ops = &mlxbf_gige_netdev_ops;
> +	netdev->ethtool_ops = &mlxbf_gige_ethtool_ops;
> +	priv = netdev_priv(netdev);
> +	priv->netdev = netdev;
> +
> +	platform_set_drvdata(pdev, priv);
> +	priv->dev = &pdev->dev;
> +	priv->pdev = pdev;
> +
> +	spin_lock_init(&priv->lock);
> +	spin_lock_init(&priv->gpio_lock);
> +
> +	/* Attach MDIO device */
> +	err = mlxbf_gige_mdio_probe(pdev, priv);
> +	if (err)
> +		return err;
> +
> +	priv->base = base;
> +	priv->llu_base = llu_base;
> +	priv->plu_base = plu_base;
> +
> +	priv->rx_q_entries = MLXBF_GIGE_DEFAULT_RXQ_SZ;
> +	priv->tx_q_entries = MLXBF_GIGE_DEFAULT_TXQ_SZ;
> +
> +	/* Write initial MAC address to hardware */
> +	mlxbf_gige_initial_mac(priv);
> +
> +	err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> +	if (err) {
> +		dev_err(&pdev->dev, "DMA configuration failed: 0x%x\n", err);
> +		return err;
> +	}
> +
> +	priv->error_irq = platform_get_irq(pdev, MLXBF_GIGE_ERROR_INTR_IDX);
> +	priv->rx_irq = platform_get_irq(pdev, MLXBF_GIGE_RECEIVE_PKT_INTR_IDX);
> +	priv->llu_plu_irq = platform_get_irq(pdev, MLXBF_GIGE_LLU_PLU_INTR_IDX);
> +
> +	phydev = phy_find_first(priv->mdiobus);
> +	if (!phydev)
> +		return -EIO;
> +
> +	/* Sets netdev->phydev to phydev; which will eventually
> +	 * be used in ioctl calls.
> +	 */
> +	err = phy_connect_direct(netdev, phydev,
> +				 mlxbf_gige_handle_link_change,
> +				 PHY_INTERFACE_MODE_GMII);
> +	if (err) {
> +		dev_err(&pdev->dev, "Could not attach to PHY\n");
> +		return err;
> +	}
> +
> +	/* MAC only supports 1000T full duplex mode */
> +	phy_remove_link_mode(phydev, ETHTOOL_LINK_MODE_1000baseT_Half_BIT);
> +	phy_remove_link_mode(phydev, ETHTOOL_LINK_MODE_100baseT_Full_BIT);
> +	phy_remove_link_mode(phydev, ETHTOOL_LINK_MODE_100baseT_Half_BIT);
> +	phy_remove_link_mode(phydev, ETHTOOL_LINK_MODE_10baseT_Full_BIT);
> +	phy_remove_link_mode(phydev, ETHTOOL_LINK_MODE_10baseT_Half_BIT);
> +
> +	/* MAC supports symmetric flow control */
> +	phy_support_sym_pause(phydev);
> +
> +	/* Display information about attached PHY device */
> +	phy_attached_info(phydev);
> +
> +	err = register_netdev(netdev);
> +	if (err) {
> +		dev_err(&pdev->dev, "Failed to register netdev\n");
> +		phy_disconnect(phydev);
> +		return err;
> +	}
> +
> +	return 0;
> +}
> +
> +static int mlxbf_gige_remove(struct platform_device *pdev)
> +{
> +	struct mlxbf_gige *priv = platform_get_drvdata(pdev);
> +
> +	unregister_netdev(priv->netdev);
> +	phy_disconnect(priv->netdev->phydev);
> +	mlxbf_gige_mdio_remove(priv);
> +	platform_set_drvdata(pdev, NULL);
> +
> +	return 0;
> +}
> +
> +static void mlxbf_gige_shutdown(struct platform_device *pdev)
> +{
> +	struct mlxbf_gige *priv = platform_get_drvdata(pdev);
> +
> +	writeq(0, priv->base + MLXBF_GIGE_INT_EN);
> +	mlxbf_gige_clean_port(priv);
> +}
> +
> +static const struct acpi_device_id mlxbf_gige_acpi_match[] = {
> +	{ "MLNXBF17", 0 },
> +	{},
> +};
> +MODULE_DEVICE_TABLE(acpi, mlxbf_gige_acpi_match);
> +
> +static struct platform_driver mlxbf_gige_driver = {
> +	.probe = mlxbf_gige_probe,
> +	.remove = mlxbf_gige_remove,
> +	.shutdown = mlxbf_gige_shutdown,
> +	.driver = {
> +		.name = DRV_NAME,
> +		.acpi_match_table = ACPI_PTR(mlxbf_gige_acpi_match),
> +	},
> +};
> +
> +module_platform_driver(mlxbf_gige_driver);
> +
> +MODULE_DESCRIPTION("Mellanox BlueField SoC Gigabit Ethernet Driver");
> +MODULE_AUTHOR("David Thompson <dthompson@mellanox.com>");
> +MODULE_AUTHOR("Asmaa Mnebhi <asmaa@mellanox.com>");
> +MODULE_LICENSE("Dual BSD/GPL");
> diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_mdio.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_mdio.c
> new file mode 100644
> index 0000000..bb848b4
> --- /dev/null
> +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_mdio.c
> @@ -0,0 +1,423 @@
> +// SPDX-License-Identifier: GPL-2.0-only OR Linux-OpenIB
> +/*  MDIO support for Mellanox GigE driver
> + *
> + *  Copyright (c) 2020 Mellanox Technologies Ltd.
> + */
> +
> +#include <linux/acpi.h>
> +#include <linux/bitfield.h>
> +#include <linux/delay.h>
> +#include <linux/device.h>
> +#include <linux/err.h>
> +#include <linux/io.h>
> +#include <linux/ioport.h>
> +#include <linux/irqreturn.h>
> +#include <linux/jiffies.h>
> +#include <linux/module.h>
> +#include <linux/mod_devicetable.h>
> +#include <linux/phy.h>
> +#include <linux/platform_device.h>
> +#include <linux/property.h>
> +
> +#include "mlxbf_gige.h"
> +
> +#define MLXBF_GIGE_MDIO_POLL_BUSY_TIMEOUT	100 /* ms */
> +#define MLXBF_GIGE_MDIO_POLL_DELAY_USEC		100 /* us */
> +
> +#define MLXBF_GIGE_MDIO_GW_OFFSET	0x0
> +#define MLXBF_GIGE_MDIO_CFG_OFFSET	0x4
> +
> +/* Support clause 22 */
> +#define MLXBF_GIGE_MDIO_CL22_ST1	0x1
> +#define MLXBF_GIGE_MDIO_CL22_WRITE	0x1
> +#define MLXBF_GIGE_MDIO_CL22_READ	0x2
> +
> +/* Busy bit is set by software and cleared by hardware */
> +#define MLXBF_GIGE_MDIO_SET_BUSY	0x1
> +/* Lock bit should be set/cleared by software */
> +#define MLXBF_GIGE_MDIO_SET_LOCK	0x1
> +
> +/* MDIO GW register bits */
> +#define MLXBF_GIGE_MDIO_GW_AD_MASK	GENMASK(15, 0)
> +#define MLXBF_GIGE_MDIO_GW_DEVAD_MASK	GENMASK(20, 16)
> +#define MLXBF_GIGE_MDIO_GW_PARTAD_MASK	GENMASK(25, 21)
> +#define MLXBF_GIGE_MDIO_GW_OPCODE_MASK	GENMASK(27, 26)
> +#define MLXBF_GIGE_MDIO_GW_ST1_MASK	GENMASK(28, 28)
> +#define MLXBF_GIGE_MDIO_GW_BUSY_MASK	GENMASK(30, 30)
> +#define MLXBF_GIGE_MDIO_GW_LOCK_MASK	GENMASK(31, 31)
> +
> +/* MDIO config register bits */
> +#define MLXBF_GIGE_MDIO_CFG_MDIO_MODE_MASK		GENMASK(1, 0)
> +#define MLXBF_GIGE_MDIO_CFG_MDIO3_3_MASK		GENMASK(2, 2)
> +#define MLXBF_GIGE_MDIO_CFG_MDIO_FULL_DRIVE_MASK	GENMASK(4, 4)
> +#define MLXBF_GIGE_MDIO_CFG_MDC_PERIOD_MASK		GENMASK(15, 8)
> +#define MLXBF_GIGE_MDIO_CFG_MDIO_IN_SAMP_MASK		GENMASK(23, 16)
> +#define MLXBF_GIGE_MDIO_CFG_MDIO_OUT_SAMP_MASK		GENMASK(31, 24)
> +
> +/* Formula for encoding the MDIO period. The encoded value is
> + * passed to the MDIO config register.
> + *
> + * mdc_clk = 2*(val + 1)*i1clk
> + *
> + * 400 ns = 2*(val + 1)*(((1/430)*1000) ns)
> + *
> + * val = (((400 * 430 / 1000) / 2) - 1)
> + */
> +#define MLXBF_GIGE_I1CLK_MHZ		430
> +#define MLXBF_GIGE_MDC_CLK_NS		400
> +
> +#define MLXBF_GIGE_MDIO_PERIOD	(((MLXBF_GIGE_MDC_CLK_NS * MLXBF_GIGE_I1CLK_MHZ / 1000) / 2) - 1)
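Plugging the numbers in: with a 430 MHz i1clk and a 400 ns MDC target, the integer arithmetic yields (172 / 2) - 1 = 85 for the encoded period. A small sketch of the same calculation (hypothetical helper name, same formula as the macro above):

```c
/* Integer form of the MDC period encoding described above:
 * mdc_clk = 2 * (val + 1) * i1clk period, solved for val. */
static unsigned int mdio_period_encode(unsigned int mdc_clk_ns,
				       unsigned int i1clk_mhz)
{
	return ((mdc_clk_ns * i1clk_mhz / 1000) / 2) - 1;
}
```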
> +
> +/* PHY should operate in master mode only */
> +#define MLXBF_GIGE_MDIO_MODE_MASTER	1
> +
> +/* PHY input voltage has to be 3.3V */
> +#define MLXBF_GIGE_MDIO3_3		1
> +
> +/* Operate in full drive mode */
> +#define MLXBF_GIGE_MDIO_FULL_DRIVE	1
> +
> +/* 6 cycles before the i1clk (core clock) rising edge that triggers the mdc */
> +#define MLXBF_GIGE_MDIO_IN_SAMP		6
> +
> +/* 13 cycles after the i1clk (core clock) rising edge that triggers the mdc */
> +#define MLXBF_GIGE_MDIO_OUT_SAMP	13
> +
> +/* The PHY interrupt line is shared with other interrupt lines such
> + * as GPIO and SMBus. So use YU registers to determine whether the
> + * interrupt comes from the PHY.
> + */
> +#define MLXBF_GIGE_CAUSE_RSH_COALESCE0_GPIO_CAUSE_MASK	0x10
> +#define MLXBF_GIGE_GPIO_CAUSE_IRQ_IS_SET(val) \
> +	((val) & MLXBF_GIGE_CAUSE_RSH_COALESCE0_GPIO_CAUSE_MASK)
> +
> +#define MLXBF_GIGE_GPIO_BLOCK0_MASK	BIT(0)
> +
> +#define MLXBF_GIGE_GPIO_CAUSE_FALL_EN		0x48
> +#define MLXBF_GIGE_GPIO_CAUSE_OR_CAUSE_EVTEN0	0x80
> +#define MLXBF_GIGE_GPIO_CAUSE_OR_EVTEN0		0x94
> +#define MLXBF_GIGE_GPIO_CAUSE_OR_CLRCAUSE	0x98
> +
> +#define MLXBF_GIGE_GPIO12_BIT			12
> +
> +static u32 mlxbf_gige_mdio_create_cmd(u16 data, int phy_add,
> +				      int phy_reg, u32 opcode)
> +{
> +	u32 gw_reg = 0;
> +
> +	gw_reg |= FIELD_PREP(MLXBF_GIGE_MDIO_GW_AD_MASK, data);
> +	gw_reg |= FIELD_PREP(MLXBF_GIGE_MDIO_GW_DEVAD_MASK, phy_reg);
> +	gw_reg |= FIELD_PREP(MLXBF_GIGE_MDIO_GW_PARTAD_MASK, phy_add);
> +	gw_reg |= FIELD_PREP(MLXBF_GIGE_MDIO_GW_OPCODE_MASK, opcode);
> +	gw_reg |= FIELD_PREP(MLXBF_GIGE_MDIO_GW_ST1_MASK,
> +			     MLXBF_GIGE_MDIO_CL22_ST1);
> +	gw_reg |= FIELD_PREP(MLXBF_GIGE_MDIO_GW_BUSY_MASK,
> +			     MLXBF_GIGE_MDIO_SET_BUSY);
> +
> +	/* Hold the lock until the read/write is completed so that no other
> +	 * program accesses the mdio bus.
> +	 */
> +	gw_reg |= FIELD_PREP(MLXBF_GIGE_MDIO_GW_LOCK_MASK,
> +			     MLXBF_GIGE_MDIO_SET_LOCK);
> +
> +	return gw_reg;
> +}
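For review convenience, the GW word assembled above can be reproduced with plain shifts, which makes the bit layout explicit (a hypothetical standalone helper; `FIELD_PREP` with the GENMASK definitions earlier produces the same value):

```c
#include <stdint.h>

/* Shift-based equivalent of mlxbf_gige_mdio_create_cmd(): AD in
 * bits 15:0, DEVAD in 20:16, PARTAD in 25:21, OPCODE in 27:26,
 * then the ST1, BUSY and LOCK single bits at 28, 30 and 31. */
static uint32_t mdio_gw_cmd(uint16_t data, uint32_t phy_add,
			    uint32_t phy_reg, uint32_t opcode)
{
	return (uint32_t)data |
	       ((phy_reg & 0x1f) << 16) |
	       ((phy_add & 0x1f) << 21) |
	       ((opcode & 0x3) << 26) |
	       (1u << 28) |	/* ST1: clause 22 start */
	       (1u << 30) |	/* BUSY: set by SW, cleared by HW */
	       (1u << 31);	/* LOCK: held until the op completes */
}
```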
> +
> +static int mlxbf_gige_mdio_poll_bit(struct mlxbf_gige *priv, u32 bit_mask)
> +{
> +	unsigned long timeout;
> +	u32 val;
> +
> +	timeout = jiffies + msecs_to_jiffies(MLXBF_GIGE_MDIO_POLL_BUSY_TIMEOUT);
> +	do {
> +		val = readl(priv->mdio_io + MLXBF_GIGE_MDIO_GW_OFFSET);
> +		if (!(val & bit_mask))
> +			return 0;
> +		udelay(MLXBF_GIGE_MDIO_POLL_DELAY_USEC);
> +	} while (time_before(jiffies, timeout));
> +
> +	return -ETIME;
> +}
> +
> +static int mlxbf_gige_mdio_read(struct mii_bus *bus, int phy_add, int phy_reg)
> +{
> +	struct mlxbf_gige *priv = bus->priv;
> +	u32 cmd;
> +	u32 ret;
> +
> +	/* If the lock is held by something else, drop the request.
> +	 * If the lock is cleared, that means the busy bit was cleared.
> +	 */
> +	ret = mlxbf_gige_mdio_poll_bit(priv, MLXBF_GIGE_MDIO_GW_LOCK_MASK);
> +	if (ret)
> +		return -EBUSY;
> +
> +	/* Send mdio read request */
> +	cmd = mlxbf_gige_mdio_create_cmd(0, phy_add, phy_reg,
> +					 MLXBF_GIGE_MDIO_CL22_READ);
> +
> +	writel(cmd, priv->mdio_io + MLXBF_GIGE_MDIO_GW_OFFSET);
> +
> +	ret = mlxbf_gige_mdio_poll_bit(priv, MLXBF_GIGE_MDIO_GW_BUSY_MASK);
> +	if (ret) {
> +		writel(0, priv->mdio_io + MLXBF_GIGE_MDIO_GW_OFFSET);
> +		return -EBUSY;
> +	}
> +
> +	ret = readl(priv->mdio_io + MLXBF_GIGE_MDIO_GW_OFFSET);
> +	/* Only return ad bits of the gw register */
> +	ret &= MLXBF_GIGE_MDIO_GW_AD_MASK;
> +
> +	/* To release the YU MDIO lock, clear gw register,
> +	 * so that the YU does not confuse this write with a new
> +	 * MDIO read/write request.
> +	 */
> +	writel(0, priv->mdio_io + MLXBF_GIGE_MDIO_GW_OFFSET);
> +
> +	return ret;
> +}
> +
> +static int mlxbf_gige_mdio_write(struct mii_bus *bus, int phy_add,
> +				 int phy_reg, u16 val)
> +{
> +	struct mlxbf_gige *priv = bus->priv;
> +	u32 cmd;
> +	int ret;
> +
> +	/* If the lock is held by something else, drop the request.
> +	 * If the lock is cleared, that means the busy bit was cleared.
> +	 */
> +	ret = mlxbf_gige_mdio_poll_bit(priv, MLXBF_GIGE_MDIO_GW_LOCK_MASK);
> +	if (ret)
> +		return -EBUSY;
> +
> +	/* Send mdio write request */
> +	cmd = mlxbf_gige_mdio_create_cmd(val, phy_add, phy_reg,
> +					 MLXBF_GIGE_MDIO_CL22_WRITE);
> +	writel(cmd, priv->mdio_io + MLXBF_GIGE_MDIO_GW_OFFSET);
> +
> +	/* If the poll timed out, drop the request */
> +	ret = mlxbf_gige_mdio_poll_bit(priv, MLXBF_GIGE_MDIO_GW_BUSY_MASK);
> +
> +	/* To release the YU MDIO lock, clear gw register,
> +	 * so that the YU does not confuse this write as a new
> +	 * MDIO read/write request.
> +	 */
> +	writel(0, priv->mdio_io + MLXBF_GIGE_MDIO_GW_OFFSET);
> +
> +	return ret;
> +}
> +
> +static void mlxbf_gige_mdio_disable_phy_int(struct mlxbf_gige *priv)
> +{
> +	unsigned long flags;
> +	u32 val;
> +
> +	spin_lock_irqsave(&priv->gpio_lock, flags);
> +	val = readl(priv->gpio_io + MLXBF_GIGE_GPIO_CAUSE_OR_EVTEN0);
> +	val &= ~priv->phy_int_gpio_mask;
> +	writel(val, priv->gpio_io + MLXBF_GIGE_GPIO_CAUSE_OR_EVTEN0);
> +	spin_unlock_irqrestore(&priv->gpio_lock, flags);
> +}
> +
> +static void mlxbf_gige_mdio_enable_phy_int(struct mlxbf_gige *priv)
> +{
> +	unsigned long flags;
> +	u32 val;
> +
> +	spin_lock_irqsave(&priv->gpio_lock, flags);
> +	/* The INT_N interrupt level is active low.
> +	 * So enable cause fall bit to detect when GPIO
> +	 * state goes low.
> +	 */
> +	val = readl(priv->gpio_io + MLXBF_GIGE_GPIO_CAUSE_FALL_EN);
> +	val |= priv->phy_int_gpio_mask;
> +	writel(val, priv->gpio_io + MLXBF_GIGE_GPIO_CAUSE_FALL_EN);
> +
> +	/* Enable PHY interrupt by setting the priority level */
> +	val = readl(priv->gpio_io +
> +			MLXBF_GIGE_GPIO_CAUSE_OR_EVTEN0);
> +	val |= priv->phy_int_gpio_mask;
> +	writel(val, priv->gpio_io +
> +			MLXBF_GIGE_GPIO_CAUSE_OR_EVTEN0);
> +	spin_unlock_irqrestore(&priv->gpio_lock, flags);
> +}
> +
> +/* Interrupt handler is called from mlxbf_gige_main.c
> + * driver whenever a phy interrupt is received.
> + */
> +irqreturn_t mlxbf_gige_mdio_handle_phy_interrupt(struct mlxbf_gige *priv)
> +{
> +	u32 val;
> +
> +	/* The YU interrupt is shared between SMBus and GPIOs.
> +	 * So first, determine whether this is a GPIO interrupt.
> +	 */
> +	val = readl(priv->cause_rsh_coalesce0_io);
> +	if (!MLXBF_GIGE_GPIO_CAUSE_IRQ_IS_SET(val)) {
> +		/* Nothing to do here, not a GPIO interrupt */
> +		return IRQ_NONE;
> +	}
> +	/* Then determine which gpio register this interrupt is for.
> +	 * Return if the interrupt is not for gpio block 0.
> +	 */
> +	val = readl(priv->cause_gpio_arm_coalesce0_io);
> +	if (!(val & MLXBF_GIGE_GPIO_BLOCK0_MASK))
> +		return IRQ_NONE;
> +
> +	/* Finally check if this interrupt is from PHY device.
> +	 * Return if it is not.
> +	 */
> +	val = readl(priv->gpio_io +
> +			MLXBF_GIGE_GPIO_CAUSE_OR_CAUSE_EVTEN0);
> +	if (!(val & priv->phy_int_gpio_mask))
> +		return IRQ_NONE;
> +
> +	/* Clear interrupt when done, otherwise, no further interrupt
> +	 * will be triggered.
> +	 * Writing 0x1 to the clear cause register also clears the
> +	 * following registers:
> +	 * cause_gpio_arm_coalesce0
> +	 * cause_rsh_coalesce0
> +	 */
> +	val = readl(priv->gpio_io +
> +			MLXBF_GIGE_GPIO_CAUSE_OR_CLRCAUSE);
> +	val |= priv->phy_int_gpio_mask;
> +	writel(val, priv->gpio_io +
> +			MLXBF_GIGE_GPIO_CAUSE_OR_CLRCAUSE);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static void mlxbf_gige_mdio_init_config(struct mlxbf_gige *priv)
> +{
> +	struct device *dev = priv->dev;
> +	u32 mdio_full_drive;
> +	u32 mdio_out_sample;
> +	u32 mdio_in_sample;
> +	u32 mdio_voltage;
> +	u32 mdc_period;
> +	u32 mdio_mode;
> +	u32 mdio_cfg;
> +	int ret;
> +
> +	ret = device_property_read_u32(dev, "mdio-mode", &mdio_mode);
> +	if (ret < 0)
> +		mdio_mode = MLXBF_GIGE_MDIO_MODE_MASTER;
> +
> +	ret = device_property_read_u32(dev, "mdio-voltage", &mdio_voltage);
> +	if (ret < 0)
> +		mdio_voltage = MLXBF_GIGE_MDIO3_3;
> +
> +	ret = device_property_read_u32(dev, "mdio-full-drive", &mdio_full_drive);
> +	if (ret < 0)
> +		mdio_full_drive = MLXBF_GIGE_MDIO_FULL_DRIVE;
> +
> +	ret = device_property_read_u32(dev, "mdc-period", &mdc_period);
> +	if (ret < 0)
> +		mdc_period = MLXBF_GIGE_MDIO_PERIOD;
> +
> +	ret = device_property_read_u32(dev, "mdio-in-sample", &mdio_in_sample);
> +	if (ret < 0)
> +		mdio_in_sample = MLXBF_GIGE_MDIO_IN_SAMP;
> +
> +	ret = device_property_read_u32(dev, "mdio-out-sample", &mdio_out_sample);
> +	if (ret < 0)
> +		mdio_out_sample = MLXBF_GIGE_MDIO_OUT_SAMP;
> +
> +	mdio_cfg = FIELD_PREP(MLXBF_GIGE_MDIO_CFG_MDIO_MODE_MASK, mdio_mode) |
> +		   FIELD_PREP(MLXBF_GIGE_MDIO_CFG_MDIO3_3_MASK, mdio_voltage) |
> +		   FIELD_PREP(MLXBF_GIGE_MDIO_CFG_MDIO_FULL_DRIVE_MASK, mdio_full_drive) |
> +		   FIELD_PREP(MLXBF_GIGE_MDIO_CFG_MDC_PERIOD_MASK, mdc_period) |
> +		   FIELD_PREP(MLXBF_GIGE_MDIO_CFG_MDIO_IN_SAMP_MASK, mdio_in_sample) |
> +		   FIELD_PREP(MLXBF_GIGE_MDIO_CFG_MDIO_OUT_SAMP_MASK, mdio_out_sample);
> +
> +	writel(mdio_cfg, priv->mdio_io + MLXBF_GIGE_MDIO_CFG_OFFSET);
> +}
> +
> +int mlxbf_gige_mdio_probe(struct platform_device *pdev, struct mlxbf_gige *priv)
> +{
> +	struct device *dev = &pdev->dev;
> +	struct resource *res;
> +	u32 phy_int_gpio;
> +	u32 phy_addr;
> +	int ret;
> +
> +	res = platform_get_resource(pdev, IORESOURCE_MEM, MLXBF_GIGE_RES_MDIO9);
> +	if (!res)
> +		return -ENODEV;
> +
> +	priv->mdio_io = devm_ioremap_resource(dev, res);
> +	if (IS_ERR(priv->mdio_io))
> +		return PTR_ERR(priv->mdio_io);
> +
> +	res = platform_get_resource(pdev, IORESOURCE_MEM, MLXBF_GIGE_RES_GPIO0);
> +	if (!res)
> +		return -ENODEV;
> +
> +	priv->gpio_io = devm_ioremap(dev, res->start, resource_size(res));
> +	if (!priv->gpio_io)
> +		return -ENOMEM;
> +
> +	res = platform_get_resource(pdev, IORESOURCE_MEM,
> +				    MLXBF_GIGE_RES_CAUSE_RSH_COALESCE0);
> +	if (!res)
> +		return -ENODEV;
> +
> +	priv->cause_rsh_coalesce0_io =
> +		devm_ioremap(dev, res->start, resource_size(res));
> +	if (!priv->cause_rsh_coalesce0_io)
> +		return -ENOMEM;
> +
> +	res = platform_get_resource(pdev, IORESOURCE_MEM,
> +				    MLXBF_GIGE_RES_CAUSE_GPIO_ARM_COALESCE0);
> +	if (!res)
> +		return -ENODEV;
> +
> +	priv->cause_gpio_arm_coalesce0_io =
> +		devm_ioremap(dev, res->start, resource_size(res));
> +	if (!priv->cause_gpio_arm_coalesce0_io)
> +		return -ENOMEM;
> +
> +	mlxbf_gige_mdio_init_config(priv);
> +
> +	ret = device_property_read_u32(dev, "phy-int-gpio", &phy_int_gpio);
> +	if (ret < 0)
> +		phy_int_gpio = MLXBF_GIGE_GPIO12_BIT;
> +	priv->phy_int_gpio_mask = BIT(phy_int_gpio);
> +
> +	mlxbf_gige_mdio_enable_phy_int(priv);
> +
> +	priv->mdiobus = devm_mdiobus_alloc(dev);
> +	if (!priv->mdiobus) {
> +		dev_err(dev, "Failed to alloc MDIO bus\n");
> +		return -ENOMEM;
> +	}
> +
> +	priv->mdiobus->name = "mlxbf-mdio";
> +	priv->mdiobus->read = mlxbf_gige_mdio_read;
> +	priv->mdiobus->write = mlxbf_gige_mdio_write;
> +	priv->mdiobus->parent = dev;
> +	priv->mdiobus->priv = priv;
> +	snprintf(priv->mdiobus->id, MII_BUS_ID_SIZE, "%s",
> +		 dev_name(dev));
> +
> +	ret = device_property_read_u32(dev, "phy-addr", &phy_addr);
> +	if (ret < 0)
> +		phy_addr = MLXBF_GIGE_MDIO_DEFAULT_PHY_ADDR;
> +
> +	priv->mdiobus->irq[phy_addr] = PHY_POLL;
> +
> +	/* Auto probe PHY at the corresponding address */
> +	priv->mdiobus->phy_mask = ~(1 << phy_addr);
> +	ret = mdiobus_register(priv->mdiobus);
> +	if (ret)
> +		dev_err(dev, "Failed to register MDIO bus\n");
> +
> +	return ret;
> +}
> +
> +void mlxbf_gige_mdio_remove(struct mlxbf_gige *priv)
> +{
> +	mlxbf_gige_mdio_disable_phy_int(priv);
> +	mdiobus_unregister(priv->mdiobus);
> +}
> diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h
> new file mode 100644
> index 0000000..9c7af82
> --- /dev/null
> +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h
> @@ -0,0 +1,73 @@
> +/* SPDX-License-Identifier: GPL-2.0-only OR Linux-OpenIB */
> +
> +/* Header file for Mellanox BlueField GigE register defines
> + *
> + * Copyright (c) 2020 Mellanox Technologies Ltd.
> + */
> +
> +#ifndef __MLXBF_GIGE_REGS_H__
> +#define __MLXBF_GIGE_REGS_H__
> +
> +#define MLXBF_GIGE_STATUS                             0x0010
> +#define MLXBF_GIGE_STATUS_READY                       BIT(0)
> +#define MLXBF_GIGE_INT_STATUS                         0x0028
> +#define MLXBF_GIGE_INT_STATUS_RX_RECEIVE_PACKET       BIT(0)
> +#define MLXBF_GIGE_INT_STATUS_RX_MAC_ERROR            BIT(1)
> +#define MLXBF_GIGE_INT_STATUS_RX_TRN_ERROR            BIT(2)
> +#define MLXBF_GIGE_INT_STATUS_SW_ACCESS_ERROR         BIT(3)
> +#define MLXBF_GIGE_INT_STATUS_SW_CONFIG_ERROR         BIT(4)
> +#define MLXBF_GIGE_INT_STATUS_TX_PI_CI_EXCEED_WQ_SIZE BIT(5)
> +#define MLXBF_GIGE_INT_STATUS_TX_SMALL_FRAME_SIZE     BIT(6)
> +#define MLXBF_GIGE_INT_STATUS_TX_CHECKSUM_INPUTS      BIT(7)
> +#define MLXBF_GIGE_INT_STATUS_HW_ACCESS_ERROR         BIT(8)
> +#define MLXBF_GIGE_INT_EN                             0x0030
> +#define MLXBF_GIGE_INT_EN_RX_RECEIVE_PACKET           BIT(0)
> +#define MLXBF_GIGE_INT_EN_RX_MAC_ERROR                BIT(1)
> +#define MLXBF_GIGE_INT_EN_RX_TRN_ERROR                BIT(2)
> +#define MLXBF_GIGE_INT_EN_SW_ACCESS_ERROR             BIT(3)
> +#define MLXBF_GIGE_INT_EN_SW_CONFIG_ERROR             BIT(4)
> +#define MLXBF_GIGE_INT_EN_TX_PI_CI_EXCEED_WQ_SIZE     BIT(5)
> +#define MLXBF_GIGE_INT_EN_TX_SMALL_FRAME_SIZE         BIT(6)
> +#define MLXBF_GIGE_INT_EN_TX_CHECKSUM_INPUTS          BIT(7)
> +#define MLXBF_GIGE_INT_EN_HW_ACCESS_ERROR             BIT(8)
> +#define MLXBF_GIGE_INT_MASK                           0x0038
> +#define MLXBF_GIGE_INT_MASK_RX_RECEIVE_PACKET         BIT(0)
> +#define MLXBF_GIGE_CONTROL                            0x0040
> +#define MLXBF_GIGE_CONTROL_PORT_EN                    BIT(0)
> +#define MLXBF_GIGE_CONTROL_MAC_ID_RANGE_EN            BIT(1)
> +#define MLXBF_GIGE_CONTROL_EN_SPECIFIC_MAC            BIT(4)
> +#define MLXBF_GIGE_CONTROL_CLEAN_PORT_EN              BIT(31)
> +#define MLXBF_GIGE_RX_WQ_BASE                         0x0200
> +#define MLXBF_GIGE_RX_WQE_SIZE_LOG2                   0x0208
> +#define MLXBF_GIGE_RX_WQE_SIZE_LOG2_RESET_VAL         7
> +#define MLXBF_GIGE_RX_CQ_BASE                         0x0210
> +#define MLXBF_GIGE_TX_WQ_BASE                         0x0218
> +#define MLXBF_GIGE_TX_WQ_SIZE_LOG2                    0x0220
> +#define MLXBF_GIGE_TX_WQ_SIZE_LOG2_RESET_VAL          7
> +#define MLXBF_GIGE_TX_CI_UPDATE_ADDRESS               0x0228
> +#define MLXBF_GIGE_RX_WQE_PI                          0x0230
> +#define MLXBF_GIGE_TX_PRODUCER_INDEX                  0x0238
> +#define MLXBF_GIGE_RX_MAC_FILTER                      0x0240
> +#define MLXBF_GIGE_RX_MAC_FILTER_STRIDE               0x0008
> +#define MLXBF_GIGE_RX_DIN_DROP_COUNTER                0x0260
> +#define MLXBF_GIGE_TX_CONSUMER_INDEX                  0x0310
> +#define MLXBF_GIGE_TX_CONTROL                         0x0318
> +#define MLXBF_GIGE_TX_CONTROL_GRACEFUL_STOP           BIT(0)
> +#define MLXBF_GIGE_TX_STATUS                          0x0388
> +#define MLXBF_GIGE_TX_STATUS_DATA_FIFO_FULL           BIT(1)
> +#define MLXBF_GIGE_RX_MAC_FILTER_DMAC_RANGE_START     0x0520
> +#define MLXBF_GIGE_RX_MAC_FILTER_DMAC_RANGE_END       0x0528
> +#define MLXBF_GIGE_RX_MAC_FILTER_COUNT_DISC           0x0540
> +#define MLXBF_GIGE_RX_MAC_FILTER_COUNT_DISC_EN        BIT(0)
> +#define MLXBF_GIGE_RX_MAC_FILTER_COUNT_PASS           0x0548
> +#define MLXBF_GIGE_RX_MAC_FILTER_COUNT_PASS_EN        BIT(0)
> +#define MLXBF_GIGE_RX_PASS_COUNTER_ALL                0x0550
> +#define MLXBF_GIGE_RX_DISC_COUNTER_ALL                0x0560
> +#define MLXBF_GIGE_RX                                 0x0578
> +#define MLXBF_GIGE_RX_STRIP_CRC_EN                    BIT(1)
> +#define MLXBF_GIGE_RX_DMA                             0x0580
> +#define MLXBF_GIGE_RX_DMA_EN                          BIT(0)
> +#define MLXBF_GIGE_RX_CQE_PACKET_CI                   0x05b0
> +#define MLXBF_GIGE_MAC_CFG                            0x05e8
> +
> +#endif /* !defined(__MLXBF_GIGE_REGS_H__) */
> --
> 2.1.2


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
  2020-07-29 19:41 ` David Thompson
@ 2020-07-29 20:31   ` David Miller
  0 siblings, 0 replies; 17+ messages in thread
From: David Miller @ 2020-07-29 20:31 UTC (permalink / raw)
  To: dthompson; +Cc: netdev, kuba, jiri, Asmaa

From: David Thompson <dthompson@mellanox.com>
Date: Wed, 29 Jul 2020 19:41:30 +0000

> It's been pointed out to me that this section is incomplete, and I apologize.
 ...

David, do you have any idea what kind of burden you create by quoting
an entire HUGE patch just to add some commentary to a small portion?

Take a look at:

	https://patchwork.ozlabs.org/project/netdev/patch/1596047355-28777-1-git-send-email-dthompson@mellanox.com/

I have to scroll through all of that quoted text just to get to the
patch itself.

Do NOT ever do this please!

I hate to beat on a dead horse, but I am continually surprised at how
lazy people are when quoting text.  Is it really that hard in modern
email clients to edit out the quoted text appropriately?



* Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
  2020-07-29 18:29 [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver David Thompson
  2020-07-29 19:41 ` David Thompson
@ 2020-07-29 20:49 ` Jakub Kicinski
  2020-07-30  4:03 ` kernel test robot
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2020-07-29 20:49 UTC (permalink / raw)
  To: David Thompson; +Cc: netdev, davem, jiri, Asmaa Mnebhi

On Wed, 29 Jul 2020 14:29:15 -0400 David Thompson wrote:
> This patch adds build and driver logic for the "mlxbf_gige"
> Ethernet driver from Mellanox Technologies.

Please fix these W=1 C=1 warnings:

drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c:256:29: warning: Using plain integer as NULL pointer
drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c:258:29: warning: Using plain integer as NULL pointer
drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c:292:29: warning: Using plain integer as NULL pointer
drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c:294:23: warning: Using plain integer as NULL pointer
drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c:296:29: warning: Using plain integer as NULL pointer
drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c:329:25: warning: incorrect type in assignment (different base types)
drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c:329:25:    expected unsigned long long [usertype]
drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c:329:25:    got restricted __be64 [usertype]
drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c: In function ‘mlxbf_gige_initial_mac’:
drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c:1093:6: warning: variable ‘status’ set but not used [-Wunused-but-set-variable]
 1093 |  int status;
      |      ^~~~~~


* Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
  2020-07-29 18:29 [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver David Thompson
  2020-07-29 19:41 ` David Thompson
  2020-07-29 20:49 ` Jakub Kicinski
@ 2020-07-30  4:03 ` kernel test robot
  2020-07-31 17:42 ` Andrew Lunn
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2020-07-30  4:03 UTC (permalink / raw)
  To: David Thompson, netdev
  Cc: kbuild-all, davem, kuba, jiri, David Thompson, Asmaa Mnebhi


[-- Attachment #1: Type: text/plain, Size: 2036 bytes --]

Hi David,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/David-Thompson/Add-Mellanox-BlueField-Gigabit-Ethernet-driver/20200730-023011
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 490ed0b908d371cd9ab63fc142213e5d02d810ee
config: i386-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-14) 9.3.0
reproduce (this is a W=1 build):
        # save the attached .config to linux build tree
        make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c: In function 'mlxbf_gige_initial_mac':
>> drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c:1093:6: warning: variable 'status' set but not used [-Wunused-but-set-variable]
    1093 |  int status;
         |      ^~~~~~

vim +/status +1093 drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c

  1088	
  1089	static void mlxbf_gige_initial_mac(struct mlxbf_gige *priv)
  1090	{
  1091		u8 mac[ETH_ALEN];
  1092		u64 local_mac;
> 1093		int status;
  1094	
  1095		status = mlxbf_gige_get_mac_rx_filter(priv, MLXBF_GIGE_LOCAL_MAC_FILTER_IDX,
  1096						      &local_mac);
  1097		mlxbf_gige_u64_to_mac(mac, local_mac);
  1098	
  1099		if (is_valid_ether_addr(mac)) {
  1100			ether_addr_copy(priv->netdev->dev_addr, mac);
  1101		} else {
  1102			/* Provide a random MAC if for some reason the device has
  1103			 * not been configured with a valid MAC address already.
  1104			 */
  1105			eth_hw_addr_random(priv->netdev);
  1106		}
  1107	
  1108		local_mac = mlxbf_gige_mac_to_u64(priv->netdev->dev_addr);
  1109		mlxbf_gige_set_mac_rx_filter(priv, MLXBF_GIGE_LOCAL_MAC_FILTER_IDX,
  1110					     local_mac);
  1111	}
  1112	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 74148 bytes --]


* Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
  2020-07-29 18:29 [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver David Thompson
                   ` (2 preceding siblings ...)
  2020-07-30  4:03 ` kernel test robot
@ 2020-07-31 17:42 ` Andrew Lunn
       [not found]   ` <VI1PR05MB4110070900CF42CB3E18983EDA4E0@VI1PR05MB4110.eurprd05.prod.outlook.com>
  2020-08-11 19:53   ` Asmaa Mnebhi
  2020-07-31 18:37 ` Andrew Lunn
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 17+ messages in thread
From: Andrew Lunn @ 2020-07-31 17:42 UTC (permalink / raw)
  To: David Thompson; +Cc: netdev, davem, kuba, jiri, Asmaa Mnebhi

Hi David

> +static int mlxbf_gige_mdio_poll_bit(struct mlxbf_gige *priv, u32 bit_mask)
> +{
> +	unsigned long timeout;
> +	u32 val;
> +
> +	timeout = jiffies + msecs_to_jiffies(MLXBF_GIGE_MDIO_POLL_BUSY_TIMEOUT);
> +	do {
> +		val = readl(priv->mdio_io + MLXBF_GIGE_MDIO_GW_OFFSET);
> +		if (!(val & bit_mask))
> +			return 0;
> +		udelay(MLXBF_GIGE_MDIO_POLL_DELAY_USEC);
> +	} while (time_before(jiffies, timeout));

Please use one of the include/linux/iopoll.h macros. 

> +
> +	return -ETIME;

ETIMEDOUT, not ETIME. But that will automatically be fixed when you
use iopoll.h. Core code has fewer bugs, which is why you should use it.
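The conversion Andrew is asking for could look roughly like this (a sketch only: it assumes `<linux/iopoll.h>` is included and that MLXBF_GIGE_MDIO_POLL_BUSY_TIMEOUT is expressed in milliseconds, as its use with msecs_to_jiffies() in the original loop suggests):

```c
/* Sketch: the hand-rolled jiffies loop replaced by the iopoll.h helper.
 * readl_poll_timeout_atomic() returns 0 once the condition is true and
 * -ETIMEDOUT if the timeout expires, so the correct errno comes for free.
 */
static int mlxbf_gige_mdio_poll_bit(struct mlxbf_gige *priv, u32 bit_mask)
{
	u32 val;

	return readl_poll_timeout_atomic(priv->mdio_io + MLXBF_GIGE_MDIO_GW_OFFSET,
					 val, !(val & bit_mask),
					 MLXBF_GIGE_MDIO_POLL_DELAY_USEC,
					 MLXBF_GIGE_MDIO_POLL_BUSY_TIMEOUT * USEC_PER_MSEC);
}
```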

> +}
> +
> +static int mlxbf_gige_mdio_read(struct mii_bus *bus, int phy_add, int phy_reg)
> +{
> +	struct mlxbf_gige *priv = bus->priv;
> +	u32 cmd;
> +	u32 ret;
> +
> +	/* If the lock is held by something else, drop the request.
> +	 * If the lock is cleared, that means the busy bit was cleared.
> +	 */

How can this happen? The mdio core has a mutex which prevents parallel
access?

> +	ret = mlxbf_gige_mdio_poll_bit(priv, MLXBF_GIGE_MDIO_GW_LOCK_MASK);
> +	if (ret)
> +		return -EBUSY;

PHY drivers are not going to like that. They are not going to
retry. What is likely to happen is that phylib moves into the ERROR
state, and the PHY driver grinds to a halt.

> +static void mlxbf_gige_mdio_disable_phy_int(struct mlxbf_gige *priv)
> +{
> +	unsigned long flags;
> +	u32 val;
> +
> +	spin_lock_irqsave(&priv->gpio_lock, flags);
> +	val = readl(priv->gpio_io + MLXBF_GIGE_GPIO_CAUSE_OR_EVTEN0);
> +	val &= ~priv->phy_int_gpio_mask;
> +	writel(val, priv->gpio_io + MLXBF_GIGE_GPIO_CAUSE_OR_EVTEN0);
> +	spin_unlock_irqrestore(&priv->gpio_lock, flags);
> +}
> +
> +static void mlxbf_gige_mdio_enable_phy_int(struct mlxbf_gige *priv)
> +{
> +	unsigned long flags;
> +	u32 val;
> +
> +	spin_lock_irqsave(&priv->gpio_lock, flags);
> +	/* The INT_N interrupt level is active low.
> +	 * So enable cause fall bit to detect when GPIO
> +	 * state goes low.
> +	 */
> +	val = readl(priv->gpio_io + MLXBF_GIGE_GPIO_CAUSE_FALL_EN);
> +	val |= priv->phy_int_gpio_mask;
> +	writel(val, priv->gpio_io + MLXBF_GIGE_GPIO_CAUSE_FALL_EN);
> +
> +	/* Enable PHY interrupt by setting the priority level */
> +	val = readl(priv->gpio_io +
> +			MLXBF_GIGE_GPIO_CAUSE_OR_EVTEN0);
> +	val |= priv->phy_int_gpio_mask;
> +	writel(val, priv->gpio_io +
> +			MLXBF_GIGE_GPIO_CAUSE_OR_EVTEN0);
> +	spin_unlock_irqrestore(&priv->gpio_lock, flags);
> +}
> +
> +/* Interrupt handler is called from mlxbf_gige_main.c
> + * driver whenever a phy interrupt is received.
> + */
> +irqreturn_t mlxbf_gige_mdio_handle_phy_interrupt(struct mlxbf_gige *priv)
> +{
> +	u32 val;
> +
> +	/* The YU interrupt is shared between SMBus and GPIOs.
> +	 * So first, determine whether this is a GPIO interrupt.
> +	 */
> +	val = readl(priv->cause_rsh_coalesce0_io);
> +	if (!MLXBF_GIGE_GPIO_CAUSE_IRQ_IS_SET(val)) {
> +		/* Nothing to do here, not a GPIO interrupt */
> +		return IRQ_NONE;
> +	}
> +	/* Then determine which gpio register this interrupt is for.
> +	 * Return if the interrupt is not for gpio block 0.
> +	 */
> +	val = readl(priv->cause_gpio_arm_coalesce0_io);
> +	if (!(val & MLXBF_GIGE_GPIO_BLOCK0_MASK))
> +		return IRQ_NONE;
> +
> +	/* Finally check if this interrupt is from PHY device.
> +	 * Return if it is not.
> +	 */
> +	val = readl(priv->gpio_io +
> +			MLXBF_GIGE_GPIO_CAUSE_OR_CAUSE_EVTEN0);
> +	if (!(val & priv->phy_int_gpio_mask))
> +		return IRQ_NONE;
> +
> +	/* Clear interrupt when done, otherwise, no further interrupt
> +	 * will be triggered.
> +	 * Writing 0x1 to the clear cause register also clears the
> +	 * following registers:
> +	 * cause_gpio_arm_coalesce0
> +	 * cause_rsh_coalesce0
> +	 */
> +	val = readl(priv->gpio_io +
> +			MLXBF_GIGE_GPIO_CAUSE_OR_CLRCAUSE);
> +	val |= priv->phy_int_gpio_mask;
> +	writel(val, priv->gpio_io +
> +			MLXBF_GIGE_GPIO_CAUSE_OR_CLRCAUSE);

Shouldn't there be a call into the PHY driver at this point?

> +
> +	return IRQ_HANDLED;
> +}

So these last three functions seem to be an interrupt controller?  So
why not model it as a Linux interrupt controller? 

> +static void mlxbf_gige_mdio_init_config(struct mlxbf_gige *priv)
> +{
> +	struct device *dev = priv->dev;
> +	u32 mdio_full_drive;
> +	u32 mdio_out_sample;
> +	u32 mdio_in_sample;
> +	u32 mdio_voltage;
> +	u32 mdc_period;
> +	u32 mdio_mode;
> +	u32 mdio_cfg;
> +	int ret;
> +
> +	ret = device_property_read_u32(dev, "mdio-mode", &mdio_mode);
> +	if (ret < 0)
> +		mdio_mode = MLXBF_GIGE_MDIO_MODE_MASTER;
> +
> +	ret = device_property_read_u32(dev, "mdio-voltage", &mdio_voltage);
> +	if (ret < 0)
> +		mdio_voltage = MLXBF_GIGE_MDIO3_3;
> +
> +	ret = device_property_read_u32(dev, "mdio-full-drive", &mdio_full_drive);
> +	if (ret < 0)
> +		mdio_full_drive = MLXBF_GIGE_MDIO_FULL_DRIVE;
> +
> +	ret = device_property_read_u32(dev, "mdc-period", &mdc_period);
> +	if (ret < 0)
> +		mdc_period = MLXBF_GIGE_MDIO_PERIOD;
> +
> +	ret = device_property_read_u32(dev, "mdio-in-sample", &mdio_in_sample);
> +	if (ret < 0)
> +		mdio_in_sample = MLXBF_GIGE_MDIO_IN_SAMP;
> +
> +	ret = device_property_read_u32(dev, "mdio-out-sample", &mdio_out_sample);
> +	if (ret < 0)
> +		mdio_out_sample = MLXBF_GIGE_MDIO_OUT_SAMP;

Please see the discussion going on in the thread:

https://lore.kernel.org/linux-acpi/20200715090400.4733-1-calvin.johnson@oss.nxp.com/T/#t

and in particular

https://lore.kernel.org/linux-acpi/20200715090400.4733-1-calvin.johnson@oss.nxp.com/T/#t

My reading of this is you need to provide a specification of these
properties, and show they really are being used. Please join in on
that thread. Until we make progress on how ACPI should be used, you
might want to drop all these properties and just hard code it as a
standard 2.5Mhz MDIO bus.

> +int mlxbf_gige_mdio_probe(struct platform_device *pdev, struct mlxbf_gige *priv)
> +{
> +
> +	ret = device_property_read_u32(dev, "phy-addr", &phy_addr);
> +	if (ret < 0)
> +		phy_addr = MLXBF_GIGE_MDIO_DEFAULT_PHY_ADDR;

This is going to be problematic. See above.

> +
> +	priv->mdiobus->irq[phy_addr] = PHY_POLL;

That is the default anyway. You can skip this.  But why do you have
interrupt handling code, and then poll it? Maybe just delete all the
interrupt code?

> +
> +	/* Auto probe PHY at the corresponding address */
> +	priv->mdiobus->phy_mask = ~(1 << phy_addr);
> +	ret = mdiobus_register(priv->mdiobus);
> +	if (ret)
> +		dev_err(dev, "Failed to register MDIO bus\n");

Does it break if you scan the whole bus? It would allow you to avoid
some of the ACPI issues.

> +
> +	return ret;
> +}
> +

  Andrew


* Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
  2020-07-29 18:29 [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver David Thompson
                   ` (3 preceding siblings ...)
  2020-07-31 17:42 ` Andrew Lunn
@ 2020-07-31 18:37 ` Andrew Lunn
  2020-07-31 18:38 ` Andrew Lunn
  2020-07-31 18:41 ` Andrew Lunn
  6 siblings, 0 replies; 17+ messages in thread
From: Andrew Lunn @ 2020-07-31 18:37 UTC (permalink / raw)
  To: David Thompson; +Cc: netdev, davem, kuba, jiri, Asmaa Mnebhi

On Wed, Jul 29, 2020 at 02:29:15PM -0400, David Thompson wrote:

Hi David

> +static void mlxbf_gige_get_pauseparam(struct net_device *netdev,
> +				      struct ethtool_pauseparam *pause)
> +{
> +	pause->autoneg = AUTONEG_ENABLE;
> +	pause->rx_pause = 1;
> +	pause->tx_pause = 1;

This is incorrect. You say autoneg is supported. So you should be
returning the result of the autoneg. But what is also wrong is you
don't appear to be programming the MAC with the result of the autoneg.
mlxbf_gige_handle_link_change() should be doing this.

> +}
> +
> +static int mlxbf_gige_get_link_ksettings(struct net_device *netdev,
> +					 struct ethtool_link_ksettings *link_ksettings)
> +{
> +	struct phy_device *phydev = netdev->phydev;
> +	u32 supported, advertising;
> +	u32 lp_advertising = 0;
> +	int status;

phy_ethtool_ksettings_get() and maybe phy_ethtool_ksettings_set().

> +
> +	supported = SUPPORTED_TP | SUPPORTED_1000baseT_Full |
> +		    SUPPORTED_Autoneg | SUPPORTED_Pause;
> +
> +	advertising = ADVERTISED_1000baseT_Full | ADVERTISED_Autoneg |
> +		      ADVERTISED_Pause;
> +
> +	status = phy_read(phydev, MII_LPA);
> +	if (status >= 0)
> +		lp_advertising = mii_lpa_to_ethtool_lpa_t(status & 0xffff);
> +
> +	status = phy_read(phydev, MII_STAT1000);
> +	if (status >= 0)
> +		lp_advertising |= mii_stat1000_to_ethtool_lpa_t(status & 0xffff);
> +

The MAC driver has no business poking around in PHY registers. Call
into phylib.
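With phylib doing the work, the whole callback can collapse to something like this sketch:

```c
static int mlxbf_gige_get_link_ksettings(struct net_device *netdev,
					 struct ethtool_link_ksettings *link_ksettings)
{
	/* phylib caches supported/advertising/lp_advertising from the
	 * last autoneg, so no raw phy_read() calls are needed here.
	 */
	phy_ethtool_ksettings_get(netdev->phydev, link_ksettings);

	return 0;
}
```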

> +static void mlxbf_gige_handle_link_change(struct net_device *netdev)
> +{
> +	struct mlxbf_gige *priv = netdev_priv(netdev);
> +	struct phy_device *phydev = netdev->phydev;
> +	irqreturn_t ret;
> +
> +	ret = mlxbf_gige_mdio_handle_phy_interrupt(priv);

You are polling the PHY. I don't see anywhere you link the interrupt
to phylib.

> +	if (ret != IRQ_HANDLED)
> +		return;
> +
> +	/* print new link status only if the interrupt came from the PHY */
> +	phy_print_status(phydev);
> +}

> +static int mlxbf_gige_open(struct net_device *netdev)
> +{
> +	struct mlxbf_gige *priv = netdev_priv(netdev);
> +	struct phy_device *phydev = netdev->phydev;
> +	u64 int_en;
> +	int err;
> +
> +	mlxbf_gige_cache_stats(priv);
> +	mlxbf_gige_clean_port(priv);
> +	mlxbf_gige_rx_init(priv);
> +	mlxbf_gige_tx_init(priv);
> +	netif_napi_add(netdev, &priv->napi, mlxbf_gige_poll, NAPI_POLL_WEIGHT);
> +	napi_enable(&priv->napi);
> +	netif_start_queue(netdev);
> +
> +	err = mlxbf_gige_request_irqs(priv);
> +	if (err)
> +		return err;
> +
> +	phy_start(phydev);
> +
> +	/* Set bits in INT_EN that we care about */
> +	int_en = MLXBF_GIGE_INT_EN_HW_ACCESS_ERROR |
> +		 MLXBF_GIGE_INT_EN_TX_CHECKSUM_INPUTS |
> +		 MLXBF_GIGE_INT_EN_TX_SMALL_FRAME_SIZE |
> +		 MLXBF_GIGE_INT_EN_TX_PI_CI_EXCEED_WQ_SIZE |
> +		 MLXBF_GIGE_INT_EN_SW_CONFIG_ERROR |
> +		 MLXBF_GIGE_INT_EN_SW_ACCESS_ERROR |
> +		 MLXBF_GIGE_INT_EN_RX_RECEIVE_PACKET;
> +	writeq(int_en, priv->base + MLXBF_GIGE_INT_EN);
> +
> +	return 0;
> +}
> +
> +static int mlxbf_gige_stop(struct net_device *netdev)
> +{
> +	struct mlxbf_gige *priv = netdev_priv(netdev);
> +
> +	writeq(0, priv->base + MLXBF_GIGE_INT_EN);
> +	netif_stop_queue(netdev);
> +	napi_disable(&priv->napi);
> +	netif_napi_del(&priv->napi);
> +	mlxbf_gige_free_irqs(priv);
> +
> +	if (netdev->phydev)
> +		phy_stop(netdev->phydev);

In open() you unconditionally start the phy. Do you expect the PHY to
disappear between open and stop?

> +static int mlxbf_gige_probe(struct platform_device *pdev)
> +{

> +	phydev = phy_find_first(priv->mdiobus);
> +	if (!phydev)
> +		return -EIO;

-ENODEV would seem more appropriate.

	Andrew


* Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
  2020-07-29 18:29 [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver David Thompson
                   ` (4 preceding siblings ...)
  2020-07-31 18:37 ` Andrew Lunn
@ 2020-07-31 18:38 ` Andrew Lunn
  2020-07-31 18:41 ` Andrew Lunn
  6 siblings, 0 replies; 17+ messages in thread
From: Andrew Lunn @ 2020-07-31 18:38 UTC (permalink / raw)
  To: David Thompson; +Cc: netdev, davem, kuba, jiri, Asmaa Mnebhi

> +config MLXBF_GIGE
> +	tristate "Mellanox Technologies BlueField Gigabit Ethernet support"
> +	depends on (ARM64 || COMPILE_TEST) && ACPI && INET
> +	select PHYLIB
> +	help
> +	  The second generation BlueField SoC from Mellanox Technologies
> +	  supports an out-of-band Gigabit Ethernet management port to the
> +	  Arm subsystem.

You might want to additionally select the PHY driver you are using.

    Andrew


* Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
  2020-07-29 18:29 [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver David Thompson
                   ` (5 preceding siblings ...)
  2020-07-31 18:38 ` Andrew Lunn
@ 2020-07-31 18:41 ` Andrew Lunn
  6 siblings, 0 replies; 17+ messages in thread
From: Andrew Lunn @ 2020-07-31 18:41 UTC (permalink / raw)
  To: David Thompson, f; +Cc: netdev, davem, kuba, jiri, Asmaa Mnebhi

> +static int mlxbf_gige_mdio_read(struct mii_bus *bus, int phy_add, int phy_reg)
> +{
> +	struct mlxbf_gige *priv = bus->priv;
> +	u32 cmd;
> +	u32 ret;

Here and in write, please check if it is a C45 transaction request and
return -EOPNOTSUPP.
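
A minimal form of that guard, placed at the top of both accessors (sketch; MII_ADDR_C45 marks a clause-45 register address):

```c
	/* The GW register only encodes clause-22 cycles, so refuse
	 * clause-45 requests rather than mangling them.
	 */
	if (phy_reg & MII_ADDR_C45)
		return -EOPNOTSUPP;
```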

       Andrew


* Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
       [not found]   ` <VI1PR05MB4110070900CF42CB3E18983EDA4E0@VI1PR05MB4110.eurprd05.prod.outlook.com>
@ 2020-07-31 19:54     ` Andrew Lunn
  2020-07-31 21:38       ` Asmaa Mnebhi
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Lunn @ 2020-07-31 19:54 UTC (permalink / raw)
  To: Asmaa Mnebhi; +Cc: David Thompson, netdev, davem, kuba, Jiri Pirko

On Fri, Jul 31, 2020 at 06:54:04PM +0000, Asmaa Mnebhi wrote:

Hi Asmaa

Please don't send HTML obfuscated emails to mailing lists.

> > +static int mlxbf_gige_mdio_read(struct mii_bus *bus, int phy_add, int
> 
> > +phy_reg) {
> 
> > +         struct mlxbf_gige *priv = bus->priv;
> 
> > +         u32 cmd;
> 
> > +         u32 ret;
> 
> > +
> 
> > +         /* If the lock is held by something else, drop the request.
> 
> > +         * If the lock is cleared, that means the busy bit was cleared.
> 
> > +         */
> 
>  
> 
> How can this happen? The mdio core has a mutex which prevents parallel access?
> 
>  
> 
> This is a HW Lock. It is an actual register. So another HW entity can be
> holding that lock and reading/changing the values in the HW registers.

You have not explained how that can happen. Is there something in the
driver I missed which takes a backdoor to read/write MDIO
transactions?

> > +         ret = mlxbf_gige_mdio_poll_bit(priv, MLXBF_GIGE_MDIO_GW_LOCK_MASK);
> 
> > +         if (ret)
> 
> > +                       return -EBUSY;
> 
>  
> 
> PHY drivers are not going to like that. They are not going to retry. What is
> likely to happen is that phylib moves into the ERROR state, and the PHY driver
> grinds to a halt.
> 
>  
> 
> This is a fairly quick HW transaction. So I don’t think it would cause and
> issue for the PHY drivers. In this case, we use the micrel KSZ9031. We haven’t
> seen issues.

So you have happy to debug hard to find and reproduce issues when it
does happen? Or would you like to spend a little bit of time now and
just prevent it happening at all?

     Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
  2020-07-31 19:54     ` Andrew Lunn
@ 2020-07-31 21:38       ` Asmaa Mnebhi
  2020-08-01  1:14         ` Andrew Lunn
  0 siblings, 1 reply; 17+ messages in thread
From: Asmaa Mnebhi @ 2020-07-31 21:38 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: David Thompson, netdev, davem, kuba, Jiri Pirko



> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: Friday, July 31, 2020 3:55 PM
> To: Asmaa Mnebhi <Asmaa@mellanox.com>
> Cc: David Thompson <dthompson@mellanox.com>;
> netdev@vger.kernel.org; davem@davemloft.net; kuba@kernel.org; Jiri
> Pirko <jiri@mellanox.com>
> Subject: Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet
> driver
> 
> On Fri, Jul 31, 2020 at 06:54:04PM +0000, Asmaa Mnebhi wrote:
> 
> Hi Asmaa
> 
> Please don't send HTML-obfuscated emails to mailing lists.

My apologies! 

> 
> > > +static int mlxbf_gige_mdio_read(struct mii_bus *bus, int phy_add, int phy_reg)
> > > +{
> > > +	struct mlxbf_gige *priv = bus->priv;
> > > +	u32 cmd;
> > > +	u32 ret;
> > > +
> > > +	/* If the lock is held by something else, drop the request.
> > > +	 * If the lock is cleared, that means the busy bit was cleared.
> > > +	 */
> >
> > How can this happen? The mdio core has a mutex which prevents parallel
> > access?
> >
> > This is a HW Lock. It is an actual register. So another HW entity can
> > be holding that lock and reading/changing the values in the HW registers.
>
> You have not explained how that can happen? Is there something in the driver
> I missed which takes a backdoor to read/write MDIO transactions?

Ah ok! There is a HW entity (called YU) within the BlueField which is
connected to the PHY device. I think the YU is what you are calling a
"backdoor" here. The YU contains several registers which control
reads/writes to the PHY, so it is like an extra layer for reading MDIO
registers. One of the YU registers is the gateway register (aka GW, or
MLXBF_GIGE_MDIO_GW_OFFSET in the code). If the GW register's LOCK bit
is not cleared, we cannot write anything to the actual PHY MDIO
registers. Did I answer your question?

> 
> > > +	ret = mlxbf_gige_mdio_poll_bit(priv,
> > > +				       MLXBF_GIGE_MDIO_GW_LOCK_MASK);
> > > +	if (ret)
> > > +		return -EBUSY;
> >
> > PHY drivers are not going to like that. They are not going to retry.
> > What is likely to happen is that phylib moves into the ERROR state,
> > and the PHY driver grinds to a halt.
> >
> > This is a fairly quick HW transaction. So I don't think it would cause
> > an issue for the PHY drivers. In this case, we use the Micrel
> > KSZ9031. We haven't seen issues.
>
> So you are happy to debug hard-to-find and hard-to-reproduce issues when it
> does happen? Or would you like to spend a little bit of time now and just
> prevent it from happening at all?

I think I misunderstood your comment. Did you ask why we are polling here? Or that we shouldn't be returning -EBUSY?
> 
>      Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
  2020-07-31 21:38       ` Asmaa Mnebhi
@ 2020-08-01  1:14         ` Andrew Lunn
  2020-08-03 14:23           ` Asmaa Mnebhi
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Lunn @ 2020-08-01  1:14 UTC (permalink / raw)
  To: Asmaa Mnebhi; +Cc: David Thompson, netdev, davem, kuba, Jiri Pirko

> > > > +static int mlxbf_gige_mdio_read(struct mii_bus *bus, int phy_add, int phy_reg)
> > > > +{
> > > > +	struct mlxbf_gige *priv = bus->priv;
> > > > +	u32 cmd;
> > > > +	u32 ret;
> > > > +
> > > > +	/* If the lock is held by something else, drop the request.
> > > > +	 * If the lock is cleared, that means the busy bit was cleared.
> > > > +	 */
> > >
> > > How can this happen? The mdio core has a mutex which prevents parallel
> > > access?
> > >
> > > This is a HW Lock. It is an actual register. So another HW entity can
> > > be holding that lock and reading/changing the values in the HW registers.
> >
> > You have not explained how that can happen? Is there something in the driver
> > I missed which takes a backdoor to read/write MDIO transactions?
>
> Ah ok! There is a HW entity (called YU) within the BlueField which is
> connected to the PHY device. I think the YU is what you are calling a
> "backdoor" here. The YU contains several registers which control
> reads/writes to the PHY, so it is like an extra layer for reading MDIO
> registers. One of the YU registers is the gateway register (aka GW, or
> MLXBF_GIGE_MDIO_GW_OFFSET in the code). If the GW register's LOCK bit
> is not cleared, we cannot write anything to the actual PHY MDIO
> registers. Did I answer your question?

Nope.

How can two transactions happen at the same time, causing this lock
bit to be locked, given that the MDIO core has a mutex and serialises
all transactions? How can the lock bit ever be set?

> > > > +	ret = mlxbf_gige_mdio_poll_bit(priv,
> > > > +				       MLXBF_GIGE_MDIO_GW_LOCK_MASK);
> > > > +	if (ret)
> > > > +		return -EBUSY;
> > >
> > > PHY drivers are not going to like that. They are not going to retry.
> > > What is likely to happen is that phylib moves into the ERROR state,
> > > and the PHY driver grinds to a halt.
> > >
> > > This is a fairly quick HW transaction. So I don't think it would cause
> > > an issue for the PHY drivers. In this case, we use the Micrel
> > > KSZ9031. We haven't seen issues.
> >
> > So you are happy to debug hard-to-find and hard-to-reproduce issues when
> > it does happen? Or would you like to spend a little bit of time now and
> > just prevent it from happening at all?
>
> I think I misunderstood your comment. Did you ask why we are polling
> here? Or that we shouldn't be returning -EBUSY?

I think you should not be returning -EBUSY. If it ever happens, bad
things will happen.

This lock bit seems to serve no purpose. Software will ensure that
transactions are serialized. If it serves no purpose, just ensure it
is unlocked at probe time, and then ignore it. If you ignore it, you
will never return -EBUSY and so bad things will never happen.

Just because hardware exists does not mean you have to use it or that
it adds any value.

       Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
  2020-08-01  1:14         ` Andrew Lunn
@ 2020-08-03 14:23           ` Asmaa Mnebhi
  0 siblings, 0 replies; 17+ messages in thread
From: Asmaa Mnebhi @ 2020-08-03 14:23 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: David Thompson, netdev, davem, kuba, Jiri Pirko



> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: Friday, July 31, 2020 9:15 PM
> To: Asmaa Mnebhi <Asmaa@mellanox.com>
> Cc: David Thompson <dthompson@mellanox.com>;
> netdev@vger.kernel.org; davem@davemloft.net; kuba@kernel.org; Jiri Pirko
> <jiri@mellanox.com>
> Subject: Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
> 
> > > > > +static int mlxbf_gige_mdio_read(struct mii_bus *bus, int phy_add, int phy_reg)
> > > > > +{
> > > > > +	struct mlxbf_gige *priv = bus->priv;
> > > > > +	u32 cmd;
> > > > > +	u32 ret;
> > > > > +
> > > > > +	/* If the lock is held by something else, drop the request.
> > > > > +	 * If the lock is cleared, that means the busy bit was cleared.
> > > > > +	 */
> > > >
> > > > How can this happen? The mdio core has a mutex which prevents
> > > > parallel access?
> > > >
> > > > This is a HW Lock. It is an actual register. So another HW entity
> > > > can be holding that lock and reading/changing the values in the HW
> > > > registers.
> > >
> > > You have not explained how that can happen? Is there something in the
> > > driver I missed which takes a backdoor to read/write MDIO transactions?
> >
> > Ah ok! There is a HW entity (called YU) within the BlueField which is
> > connected to the PHY device. I think the YU is what you are calling a
> > "backdoor" here. The YU contains several registers which control
> > reads/writes to the PHY, so it is like an extra layer for reading MDIO
> > registers. One of the YU registers is the gateway register (aka GW, or
> > MLXBF_GIGE_MDIO_GW_OFFSET in the code). If the GW register's LOCK bit
> > is not cleared, we cannot write anything to the actual PHY MDIO
> > registers. Did I answer your question?
>
> Nope.
>
> How can two transactions happen at the same time, causing this lock bit to
> be locked? Given that the MDIO core has a mutex and serialises all
> transactions. How can the lock bit ever be set?

Ah, I see what you are saying. SW takes care of it, so HW would never
fall into this scenario. That will make things cleaner and faster,
then! OK, I will change it, test it, and report back.

> 
> > > > > +	ret = mlxbf_gige_mdio_poll_bit(priv,
> > > > > +				       MLXBF_GIGE_MDIO_GW_LOCK_MASK);
> > > > > +	if (ret)
> > > > > +		return -EBUSY;
> > > >
> > > > PHY drivers are not going to like that. They are not going to retry.
> > > > What is likely to happen is that phylib moves into the ERROR
> > > > state, and the PHY driver grinds to a halt.
> > > >
> > > > This is a fairly quick HW transaction. So I don't think it would
> > > > cause an issue for the PHY drivers. In this case, we use the
> > > > Micrel KSZ9031. We haven't seen issues.
> > >
> > > So you are happy to debug hard-to-find and hard-to-reproduce issues
> > > when it does happen? Or would you like to spend a little bit of time
> > > now and just prevent it happening at all?
> >
> > I think I misunderstood your comment. Did you ask why we are polling
> > here? Or that we shouldn't be returning -EBUSY?
>
> I think you should not be returning -EBUSY. If it ever happens, bad things
> will happen.
>
> This lock bit seems to serve no purpose. Software will ensure that
> transactions are serialized. If it serves no purpose, just ensure it is
> unlocked at probe time, and then ignore it. If you ignore it, you will
> never return -EBUSY and so bad things will never happen.
>
> Just because hardware exists does not mean you have to use it or that it
> adds any value.

Sounds good.
> 
>        Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
  2020-07-31 17:42 ` Andrew Lunn
       [not found]   ` <VI1PR05MB4110070900CF42CB3E18983EDA4E0@VI1PR05MB4110.eurprd05.prod.outlook.com>
@ 2020-08-11 19:53   ` Asmaa Mnebhi
  2020-08-11 20:06     ` Andrew Lunn
  1 sibling, 1 reply; 17+ messages in thread
From: Asmaa Mnebhi @ 2020-08-11 19:53 UTC (permalink / raw)
  To: Andrew Lunn, David Thompson; +Cc: netdev, davem, kuba, Jiri Pirko, Asmaa Mnebhi

Hi Andrew,

Thanks again for your feedback.

> > +	/* Finally check if this interrupt is from PHY device.
> > +	 * Return if it is not.
> > +	 */
> > +	val = readl(priv->gpio_io +
> > +			MLXBF_GIGE_GPIO_CAUSE_OR_CAUSE_EVTEN0);
> > +	if (!(val & priv->phy_int_gpio_mask))
> > +		return IRQ_NONE;
> > +
> > +	/* Clear interrupt when done, otherwise, no further interrupt
> > +	 * will be triggered.
> > +	 * Writing 0x1 to the clear cause register also clears the
> > +	 * following registers:
> > +	 * cause_gpio_arm_coalesce0
> > +	 * cause_rsh_coalesce0
> > +	 */
> > +	val = readl(priv->gpio_io +
> > +			MLXBF_GIGE_GPIO_CAUSE_OR_CLRCAUSE);
> > +	val |= priv->phy_int_gpio_mask;
> > +	writel(val, priv->gpio_io +
> > +			MLXBF_GIGE_GPIO_CAUSE_OR_CLRCAUSE);
> 
> Shouldn't there be a call into the PHY driver at this point?
> 
> > +
> > +	return IRQ_HANDLED;
> > +}
> 
> So these last three functions seem to be an interrupt controller?  So why not
> model it as a Linux interrupt controller?

Apologies for the confusion. The plan is to remove support for polling
and instead support the HW interrupt as follows (from the probe):

	irq = platform_get_irq(pdev, MLXBF_GIGE_PHY_INT_N);
	if (irq < 0) {
		dev_err(dev, "Failed to retrieve irq 0x%x\n", irq);
		return -ENODEV;
	}
	priv->mdiobus->irq[phy_addr] = irq;

This HW interrupt is the PHY interrupt which indicates link up/link
down. The MAC driver calls phy_connect_direct, which I thought was
sufficient to handle the interrupt since it calls
phy_request_interrupt. phy_request_interrupt calls
request_threaded_irq, which registers phy_interrupt as a callback.
phy_interrupt triggers the PHY state machine, which checks the link
status. The state machine goes into phy_check_link_status, which
eventually calls mlxbf_gige_handle_link_change.

I guess my question is: should we model it as a Linux interrupt
controller rather than use phy_connect_direct?

Using phy_connect_direct to register my interrupt handler, I have
encountered a particular issue where the PHY interrupt is triggered
before the PHY link status bit (reg 0x1 of the PHY device) is set to 1
(indicating link is up). So the PHY interrupt triggers the PHY state
machine, which checks the link status, sees that it is still 0, and
keeps the link state DOWN. Adding a delay to wait for the register to
be set accordingly fixes this "race condition", but it doesn't look
nice.

Thank you.
Asmaa

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
  2020-08-11 19:53   ` Asmaa Mnebhi
@ 2020-08-11 20:06     ` Andrew Lunn
  2020-08-12 20:37       ` Asmaa Mnebhi
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Lunn @ 2020-08-11 20:06 UTC (permalink / raw)
  To: Asmaa Mnebhi
  Cc: David Thompson, netdev, davem, kuba, Jiri Pirko, Asmaa Mnebhi

On Tue, Aug 11, 2020 at 07:53:35PM +0000, Asmaa Mnebhi wrote:
> Hi Andrew,
> 
> Thanks again for your feedback.
> 
> > > +	/* Finally check if this interrupt is from PHY device.
> > > +	 * Return if it is not.
> > > +	 */
> > > +	val = readl(priv->gpio_io +
> > > +			MLXBF_GIGE_GPIO_CAUSE_OR_CAUSE_EVTEN0);
> > > +	if (!(val & priv->phy_int_gpio_mask))
> > > +		return IRQ_NONE;
> > > +
> > > +	/* Clear interrupt when done, otherwise, no further interrupt
> > > +	 * will be triggered.
> > > +	 * Writing 0x1 to the clear cause register also clears the
> > > +	 * following registers:
> > > +	 * cause_gpio_arm_coalesce0
> > > +	 * cause_rsh_coalesce0
> > > +	 */
> > > +	val = readl(priv->gpio_io +
> > > +			MLXBF_GIGE_GPIO_CAUSE_OR_CLRCAUSE);
> > > +	val |= priv->phy_int_gpio_mask;
> > > +	writel(val, priv->gpio_io +
> > > +			MLXBF_GIGE_GPIO_CAUSE_OR_CLRCAUSE);
> > 
> > Shouldn't there be a call into the PHY driver at this point?
> > 
> > > +
> > > +	return IRQ_HANDLED;
> > > +}
> > 
> > So these last three functions seem to be an interrupt controller?  So why not
> > model it as a Linux interrupt controller?
> 
> Apologies for the confusion. The plan is to remove support to the polling and instead support the HW interrupt as follows (from the probe):
> irq = platform_get_irq(pdev, MLXBF_GIGE_PHY_INT_N);
>          if (irq < 0) {
>                  dev_err(dev, "Failed to retrieve irq 0x%x\n", irq);
>                  return -ENODEV;
>          }
>          priv->mdiobus->irq[phy_addr] = irq;

O.K, that is one way to do it. The other is via the MAC driver calling
phy_mac_interrupt().

> I guess my question is should we model it as a linux interrupt
> controller rather than use phy_connect_direct ?

It seems like there are other interrupt sources, not just the PHY. Do
you plan to use any of them? It can be easier to debug issues if you
have an interrupt controller, can see counters in /proc/interrupts,
etc. Also, if you need to export the lines to some other driver,
e.g. SFP, it is easier to do when there is an interrupt controller.

> Using phy_connect_direct to register my interrupt handler, I have
> encountered a particular issue where the PHY interrupt is triggered
> before the phy link status bit (reg 0x1 of the PHY device) is set to
> 1 (indicating link is up).

So the hardware is broken :-(

What about the other way, link down? Same problem?

Polling is probably your best bet, since it is robust against broken
interrupts. If I remember correctly, this is an off-the-shelf 1G PHY?
Microchip? Is there an errata for this? Maybe the errata suggests a
workaround?

     Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
  2020-08-11 20:06     ` Andrew Lunn
@ 2020-08-12 20:37       ` Asmaa Mnebhi
  2020-08-12 21:34         ` Andrew Lunn
  0 siblings, 1 reply; 17+ messages in thread
From: Asmaa Mnebhi @ 2020-08-12 20:37 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: David Thompson, netdev, davem, kuba, Jiri Pirko, Asmaa Mnebhi



> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: Tuesday, August 11, 2020 4:07 PM
> To: Asmaa Mnebhi <asmaa@nvidia.com>
> Cc: David Thompson <dthompson@mellanox.com>;
> netdev@vger.kernel.org; davem@davemloft.net; kuba@kernel.org; Jiri
> Pirko <jiri@mellanox.com>; Asmaa Mnebhi <Asmaa@mellanox.com>
> Subject: Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet
> driver
> 
> On Tue, Aug 11, 2020 at 07:53:35PM +0000, Asmaa Mnebhi wrote:
> > Hi Andrew,
> >
> > Thanks again for your feedback.
> >
> > > > +	/* Finally check if this interrupt is from PHY device.
> > > > +	 * Return if it is not.
> > > > +	 */
> > > > +	val = readl(priv->gpio_io +
> > > > +			MLXBF_GIGE_GPIO_CAUSE_OR_CAUSE_EVTEN0);
> > > > +	if (!(val & priv->phy_int_gpio_mask))
> > > > +		return IRQ_NONE;
> > > > +
> > > > +	/* Clear interrupt when done, otherwise, no further interrupt
> > > > +	 * will be triggered.
> > > > +	 * Writing 0x1 to the clear cause register also clears the
> > > > +	 * following registers:
> > > > +	 * cause_gpio_arm_coalesce0
> > > > +	 * cause_rsh_coalesce0
> > > > +	 */
> > > > +	val = readl(priv->gpio_io +
> > > > +			MLXBF_GIGE_GPIO_CAUSE_OR_CLRCAUSE);
> > > > +	val |= priv->phy_int_gpio_mask;
> > > > +	writel(val, priv->gpio_io +
> > > > +			MLXBF_GIGE_GPIO_CAUSE_OR_CLRCAUSE);
> > >
> > > Shouldn't there be a call into the PHY driver at this point?
> > >
> > > > +
> > > > +	return IRQ_HANDLED;
> > > > +}
> > >
> > > So these last three functions seem to be an interrupt controller?
> > > So why not model it as a Linux interrupt controller?
> >
> > Apologies for the confusion. The plan is to remove support to the polling
> and instead support the HW interrupt as follows (from the probe):
> > irq = platform_get_irq(pdev, MLXBF_GIGE_PHY_INT_N);
> >          if (irq < 0) {
> >                  dev_err(dev, "Failed to retrieve irq 0x%x\n", irq);
> >                  return -ENODEV;
> >          }
> >          priv->mdiobus->irq[phy_addr] = irq;
> 
> O.K, that is one way to do it. The other is via the MAC driver calling
> phy_mac_interrupt().
> 
> > I guess my question is should we model it as a linux interrupt
> > controller rather than use phy_connect_direct ?
> 
> It seems like there are other interrupt sources, not just the PHY. Do you plan
> to use any of them? It can be easier to debug issues if you have an interrupt
> controller, can see counters in /proc/interrupts, etc. Also, if you need to
> export the lines to some other driver, e.g. SFP, it is easier to do when there is
> an interrupt controller.
> 
> > Using phy_connect_direct to register my interrupt handler, I have
> > encountered a particular issue where the PHY interrupt is triggered
> > before the phy link status bit (reg 0x1 of the PHY device) is set to
> > 1 (indicating link is up).
> 
> So the hardware is broken :-(
> 
> What about the other way, link down? Same problem?
> 
> Polling is probably your best bet, since it is robust against broken interrupts.
> If i remember correctly, this is an off the shelf 1G PHY?
> Microchip? Is there an errata for this? Maybe the errata suggests a work
> around?

So let me explain further; I would greatly appreciate your input.
Technically, when this driver gets loaded, we shouldn't need the
interrupt when bringing up the link for the first time, should we?
Correct me if I am wrong: phy_start should bring up the link.
phy_start calls phy_start_aneg, which eventually calls
phy_check_link_status. phy_check_link_status reads the link state bit
of the BMSR register (only twice) and, based on that, determines
whether to bring the link up or down. In our case, that bit is still 0
when the read is done; a little bit later, it gets set to 1.

This is why polling works in this case: phy_start fails to bring up
the link, but the polling eventually brings it up. If we choose to use
the interrupt, we should make sure that the interrupt is enabled a
little bit after phy_start; otherwise, it would just be wasted.

Best,
Asmaa

> 
>      Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
  2020-08-12 20:37       ` Asmaa Mnebhi
@ 2020-08-12 21:34         ` Andrew Lunn
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Lunn @ 2020-08-12 21:34 UTC (permalink / raw)
  To: Asmaa Mnebhi
  Cc: David Thompson, netdev, davem, kuba, Jiri Pirko, Asmaa Mnebhi

Hi Asmaa

Please wrap your emails at about 75 characters.

> So let me explain further; I would greatly appreciate your input.
>
> Technically, when this driver gets loaded, we shouldn't need the
> interrupt when bringing up the link for the first time, should we?
> Correct me if I am wrong: phy_start should bring up the link.
> phy_start calls phy_start_aneg, which eventually calls
> phy_check_link_status. phy_check_link_status reads the link state bit
> of the BMSR register (only twice) and, based on that, determines
> whether to bring the link up or down. In our case, that bit is still
> 0 when the read is done; a little bit later, it gets set to 1.
>
> This is why polling works in this case: phy_start fails to bring up
> the link, but the polling eventually brings it up. If we choose to
> use the interrupt, we should make sure that the interrupt is enabled
> a little bit after phy_start; otherwise, it would just be wasted.

When the PHY is connected to the MAC, phy_request_interrupt() is
called. That sets up the SoC side of the interrupt, so that
phy_interrupt() will be called on interrupt. It then calls
phy_enable_interrupts()->phy_config_interrupt() which calls into the
PHY driver to enable interrupts within the PHY.  It is then expected
that the PHY interrupts whenever there is a change in link status.

Sometime later phy_start() will be called which should kick off an
autoneg. That should result in the link going up, maybe 1.5 seconds
later if the link peer is present, maybe later if there is no
peer. The link up will trigger an interrupt, and the new status will
be read. If the link goes down, the interrupt should also trigger, and
the status will be updated.

   Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, back to index

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-29 18:29 [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver David Thompson
2020-07-29 19:41 ` David Thompson
2020-07-29 20:31   ` David Miller
2020-07-29 20:49 ` Jakub Kicinski
2020-07-30  4:03 ` kernel test robot
2020-07-31 17:42 ` Andrew Lunn
     [not found]   ` <VI1PR05MB4110070900CF42CB3E18983EDA4E0@VI1PR05MB4110.eurprd05.prod.outlook.com>
2020-07-31 19:54     ` Andrew Lunn
2020-07-31 21:38       ` Asmaa Mnebhi
2020-08-01  1:14         ` Andrew Lunn
2020-08-03 14:23           ` Asmaa Mnebhi
2020-08-11 19:53   ` Asmaa Mnebhi
2020-08-11 20:06     ` Andrew Lunn
2020-08-12 20:37       ` Asmaa Mnebhi
2020-08-12 21:34         ` Andrew Lunn
2020-07-31 18:37 ` Andrew Lunn
2020-07-31 18:38 ` Andrew Lunn
2020-07-31 18:41 ` Andrew Lunn
