* [PATCH net-next v1 0/5] add ethernet driver for Tehuti Networks TN40xx chips
@ 2024-04-15 10:43 FUJITA Tomonori
  2024-04-15 10:43 ` [PATCH net-next v1 1/5] net: tn40xx: add pci " FUJITA Tomonori
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: FUJITA Tomonori @ 2024-04-15 10:43 UTC (permalink / raw)
  To: netdev; +Cc: andrew

This patchset adds a new 10G Ethernet driver for Tehuti Networks
TN40xx chips. Note that mainline already has a driver for Tehuti
Networks (drivers/net/ethernet/tehuti/tehuti.[hc]), which supports the
older TN30xx chips.

Multiple vendors (DLink, Asus, Edimax, QNAP, etc.) developed adapters
based on TN40xx chips. Tehuti Networks went out of business, but the
drivers are still distributed with some of the hardware (and are also
available on some sites). With some changes, I'm trying to upstream
this driver together with a new PHY driver written in Rust.

The major change is replacing the original driver's own PHY
abstraction layer with PHYLIB. TN40xx chips are used with various PHY
hardware (AMCC QT2025, TI TLK10232, Aquantia AQR105, and Marvell
MV88X3120, MV88X3310, and MV88E2010), so the original driver carries
its own PHY abstraction layer to handle all of them.
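For illustration, a PHYLIB-based MAC driver delegates PHY handling to
the kernel's phylib instead of carrying its own abstraction layer. A
minimal, non-authoritative sketch of that pattern follows; the MDIO bus
id, the tn40_* names, and the link-change handler are hypothetical and
not taken from this patchset:

```c
#include <linux/phy.h>
#include <linux/netdevice.h>

/* Hypothetical link-change handler; phylib invokes it on PHY state changes. */
static void tn40_link_changed(struct net_device *ndev)
{
	struct phy_device *phydev = ndev->phydev;

	if (phydev->link)
		netif_carrier_on(ndev);
	else
		netif_carrier_off(ndev);
}

/* Attach the PHY found on a (hypothetical) MDIO bus id, replacing the
 * driver-private PHY abstraction layer with phylib.
 */
static int tn40_phy_attach(struct net_device *ndev)
{
	struct phy_device *phydev;

	phydev = phy_connect(ndev, "tn40-mdio:01", tn40_link_changed,
			     PHY_INTERFACE_MODE_XAUI);
	if (IS_ERR(phydev))
		return PTR_ERR(phydev);

	phy_start(phydev);	/* start the phylib state machine */
	return 0;
}
```

With this shape, any in-tree PHY driver (QT2025, AQR105, Marvell) can be
picked up by phylib without MAC-driver changes.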

I'll submit a new PHY driver for the QT2025 in Rust shortly. For now,
the PCI ID table of this driver enables only adapters using the QT2025
PHY. I've tested this driver and the QT2025 PHY driver with an Edimax
EN-9320 10G adapter. Mainline already has PHY drivers for the AQR105
and the Marvell PHYs, which could work for some TN40xx adapters with
this driver.

The other changes are replacing the firmware embedded in a header file
with the kernel firmware API, handling DMA mapping errors, removing
many ifdefs, fixing lots of style issues, etc.
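For reference, loading the firmware through the kernel firmware API
instead of compiling a blob into a header follows a well-known pattern.
A hedged sketch is below; the firmware file name and the tn40_download()
helper are hypothetical stand-ins, not names from this patchset:

```c
#include <linux/firmware.h>
#include <linux/pci.h>

/* Chip-specific download loop, elided here; hypothetical prototype. */
static int tn40_download(struct pci_dev *pdev, const u8 *data, size_t size);

static int tn40_load_firmware(struct pci_dev *pdev)
{
	const struct firmware *fw;
	int ret;

	/* Ask the firmware loader for the blob; the name is illustrative. */
	ret = request_firmware(&fw, "tehuti/tn40xx.bin", &pdev->dev);
	if (ret)
		return ret;

	/* Feed the blob to the chip via the device-specific loop. */
	ret = tn40_download(pdev, fw->data, fw->size);

	release_firmware(fw);	/* drop the loader's copy in either case */
	return ret;
}
```

This keeps the blob in /lib/firmware (or the linux-firmware tree) rather
than in driver source.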

To make reviewing easier, this patchset implements only basic
functionality. Once merged, I'll submit more features, such as ethtool
support.


FUJITA Tomonori (5):
  net: tn40xx: add pci driver for Tehuti Networks TN40xx chips
  net: tn40xx: add register defines
  net: tn40xx: add basic Tx handling
  net: tn40xx: add basic Rx handling
  net: tn40xx: add PHYLIB support

 MAINTAINERS                             |    8 +-
 drivers/net/ethernet/tehuti/Kconfig     |   14 +
 drivers/net/ethernet/tehuti/Makefile    |    3 +
 drivers/net/ethernet/tehuti/tn40.c      | 1981 +++++++++++++++++++++++
 drivers/net/ethernet/tehuti/tn40.h      |  291 ++++
 drivers/net/ethernet/tehuti/tn40_mdio.c |  141 ++
 drivers/net/ethernet/tehuti/tn40_regs.h |  279 ++++
 7 files changed, 2716 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/tehuti/tn40.c
 create mode 100644 drivers/net/ethernet/tehuti/tn40.h
 create mode 100644 drivers/net/ethernet/tehuti/tn40_mdio.c
 create mode 100644 drivers/net/ethernet/tehuti/tn40_regs.h


base-commit: 32affa5578f0e6b9abef3623d3976395afbd265c
-- 
2.34.1



* [PATCH net-next v1 1/5] net: tn40xx: add pci driver for Tehuti Networks TN40xx chips
  2024-04-15 10:43 [PATCH net-next v1 0/5] add ethernet driver for Tehuti Networks TN40xx chips FUJITA Tomonori
@ 2024-04-15 10:43 ` FUJITA Tomonori
  2024-04-15 12:39   ` kernel test robot
  2024-04-15 14:24   ` Andrew Lunn
  2024-04-15 10:43 ` [PATCH net-next v1 2/5] net: tn40xx: add register defines FUJITA Tomonori
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 22+ messages in thread
From: FUJITA Tomonori @ 2024-04-15 10:43 UTC (permalink / raw)
  To: netdev; +Cc: andrew

This just adds the scaffolding for an ethernet driver for Tehuti
Networks TN40xx chips.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
---
 MAINTAINERS                          |  8 +++-
 drivers/net/ethernet/tehuti/Kconfig  | 12 ++++++
 drivers/net/ethernet/tehuti/Makefile |  3 ++
 drivers/net/ethernet/tehuti/tn40.c   | 57 ++++++++++++++++++++++++++++
 drivers/net/ethernet/tehuti/tn40.h   | 17 +++++++++
 5 files changed, 96 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/tehuti/tn40.c
 create mode 100644 drivers/net/ethernet/tehuti/tn40.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 5ba3fe6ac09c..3bd757e803d3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -21818,7 +21818,13 @@ TEHUTI ETHERNET DRIVER
 M:	Andy Gospodarek <andy@greyhouse.net>
 L:	netdev@vger.kernel.org
 S:	Supported
-F:	drivers/net/ethernet/tehuti/*
+F:	drivers/net/ethernet/tehuti/tehuti.*
+
+TEHUTI TN40XX ETHERNET DRIVER
+M:	FUJITA Tomonori <fujita.tomonori@gmail.com>
+L:	netdev@vger.kernel.org
+S:	Supported
+F:	drivers/net/ethernet/tehuti/tn40*
 
 TELECOM CLOCK DRIVER FOR MCPL0010
 M:	Mark Gross <markgross@kernel.org>
diff --git a/drivers/net/ethernet/tehuti/Kconfig b/drivers/net/ethernet/tehuti/Kconfig
index 8735633765a1..849e3b4a71c1 100644
--- a/drivers/net/ethernet/tehuti/Kconfig
+++ b/drivers/net/ethernet/tehuti/Kconfig
@@ -23,4 +23,16 @@ config TEHUTI
 	help
 	  Tehuti Networks 10G Ethernet NIC
 
+config TEHUTI_TN40
+	tristate "Tehuti Networks TN40xx 10G Ethernet adapters"
+	depends on PCI
+	help
+	  This driver supports 10G Ethernet adapters using Tehuti Networks
+	  TN40xx chips. Currently, adapters with the Applied Micro Circuits
+	  Corporation QT2025 PHY are supported: the Tehuti Networks TN9310,
+	  DLink DXE-810S, ASUS XG-C100F, and Edimax EN-9320.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called tn40xx.
+
 endif # NET_VENDOR_TEHUTI
diff --git a/drivers/net/ethernet/tehuti/Makefile b/drivers/net/ethernet/tehuti/Makefile
index 13a0ddd62088..1c468d99e476 100644
--- a/drivers/net/ethernet/tehuti/Makefile
+++ b/drivers/net/ethernet/tehuti/Makefile
@@ -4,3 +4,6 @@
 #
 
 obj-$(CONFIG_TEHUTI) += tehuti.o
+
+tn40xx-y := tn40.o
+obj-$(CONFIG_TEHUTI_TN40) += tn40xx.o
diff --git a/drivers/net/ethernet/tehuti/tn40.c b/drivers/net/ethernet/tehuti/tn40.c
new file mode 100644
index 000000000000..368a73300f8a
--- /dev/null
+++ b/drivers/net/ethernet/tehuti/tn40.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0+
+/* Copyright (c) Tehuti Networks Ltd. */
+
+#include "tn40.h"
+
+static int bdx_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
+{
+	int ret;
+
+	ret = pci_enable_device(pdev);
+	if (ret)
+		return ret;
+
+	if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64))) {
+		ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
+		if (ret) {
+			dev_err(&pdev->dev, "failed to set DMA mask.\n");
+			goto err_disable_device;
+		}
+	}
+	return 0;
+err_disable_device:
+	pci_disable_device(pdev);
+	return ret;
+}
+
+static void bdx_remove(struct pci_dev *pdev)
+{
+	pci_disable_device(pdev);
+}
+
+static const struct pci_device_id bdx_id_table[] = {
+	{ PCI_DEVICE_SUB(PCI_VENDOR_ID_TEHUTI, 0x4022,
+			 PCI_VENDOR_ID_TEHUTI, 0x3015) },
+	{ PCI_DEVICE_SUB(PCI_VENDOR_ID_TEHUTI, 0x4022,
+			 PCI_VENDOR_ID_DLINK, 0x4d00) },
+	{ PCI_DEVICE_SUB(PCI_VENDOR_ID_TEHUTI, 0x4022,
+			 PCI_VENDOR_ID_ASUSTEK, 0x8709) },
+	{ PCI_DEVICE_SUB(PCI_VENDOR_ID_TEHUTI, 0x4022,
+			 PCI_VENDOR_ID_EDIMAX, 0x8103) },
+	{ }
+};
+
+static struct pci_driver bdx_driver = {
+	.name = BDX_DRV_NAME,
+	.id_table = bdx_id_table,
+	.probe = bdx_probe,
+	.remove = bdx_remove,
+};
+
+module_pci_driver(bdx_driver);
+
+MODULE_DEVICE_TABLE(pci, bdx_id_table);
+MODULE_AUTHOR("Tehuti Networks");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Tehuti Networks TN40xx Driver");
+MODULE_VERSION(BDX_DRV_VERSION);
diff --git a/drivers/net/ethernet/tehuti/tn40.h b/drivers/net/ethernet/tehuti/tn40.h
new file mode 100644
index 000000000000..8507c8f7bc6a
--- /dev/null
+++ b/drivers/net/ethernet/tehuti/tn40.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/* Copyright (c) Tehuti Networks Ltd. */
+
+#ifndef _TN40_H_
+#define _TN40_H_
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/version.h>
+
+#define BDX_DRV_NAME "tn40xx"
+#define BDX_DRV_VERSION "0.3.6.17.2"
+
+#define PCI_VENDOR_ID_EDIMAX 0x1432
+
+#endif /* _TN40_H_ */
-- 
2.34.1



* [PATCH net-next v1 2/5] net: tn40xx: add register defines
  2024-04-15 10:43 [PATCH net-next v1 0/5] add ethernet driver for Tehuti Networks TN40xx chips FUJITA Tomonori
  2024-04-15 10:43 ` [PATCH net-next v1 1/5] net: tn40xx: add pci " FUJITA Tomonori
@ 2024-04-15 10:43 ` FUJITA Tomonori
  2024-04-15 10:43 ` [PATCH net-next v1 3/5] net: tn40xx: add basic Tx handling FUJITA Tomonori
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: FUJITA Tomonori @ 2024-04-15 10:43 UTC (permalink / raw)
  To: netdev; +Cc: andrew

This adds several register defines for Tehuti Networks TN40xx chips,
to be used by later patches.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
---
 drivers/net/ethernet/tehuti/tn40.h      |   2 +
 drivers/net/ethernet/tehuti/tn40_regs.h | 279 ++++++++++++++++++++++++
 2 files changed, 281 insertions(+)
 create mode 100644 drivers/net/ethernet/tehuti/tn40_regs.h

diff --git a/drivers/net/ethernet/tehuti/tn40.h b/drivers/net/ethernet/tehuti/tn40.h
index 8507c8f7bc6a..ed43ba027dc5 100644
--- a/drivers/net/ethernet/tehuti/tn40.h
+++ b/drivers/net/ethernet/tehuti/tn40.h
@@ -9,6 +9,8 @@
 #include <linux/pci.h>
 #include <linux/version.h>
 
+#include "tn40_regs.h"
+
 #define BDX_DRV_NAME "tn40xx"
 #define BDX_DRV_VERSION "0.3.6.17.2"
 
diff --git a/drivers/net/ethernet/tehuti/tn40_regs.h b/drivers/net/ethernet/tehuti/tn40_regs.h
new file mode 100644
index 000000000000..5e5cb0c49fdf
--- /dev/null
+++ b/drivers/net/ethernet/tehuti/tn40_regs.h
@@ -0,0 +1,279 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/* Copyright (c) Tehuti Networks Ltd. */
+
+#ifndef _TN40_REGS_H_
+#define _TN40_REGS_H_
+
+/* Register region size */
+#define BDX_REGS_SIZE 0x10000
+
+/* Registers from 0x0000-0x00fc were remapped to 0x4000-0x40fc */
+#define REG_TXD_CFG1_0 0x4000
+#define REG_TXD_CFG1_1 0x4004
+#define REG_TXD_CFG1_2 0x4008
+#define REG_TXD_CFG1_3 0x400C
+
+#define REG_RXF_CFG1_0 0x4010
+#define REG_RXF_CFG1_1 0x4014
+#define REG_RXF_CFG1_2 0x4018
+#define REG_RXF_CFG1_3 0x401C
+
+#define REG_RXD_CFG1_0 0x4020
+#define REG_RXD_CFG1_1 0x4024
+#define REG_RXD_CFG1_2 0x4028
+#define REG_RXD_CFG1_3 0x402C
+
+#define REG_TXF_CFG1_0 0x4030
+#define REG_TXF_CFG1_1 0x4034
+#define REG_TXF_CFG1_2 0x4038
+#define REG_TXF_CFG1_3 0x403C
+
+#define REG_TXD_CFG0_0 0x4040
+#define REG_TXD_CFG0_1 0x4044
+#define REG_TXD_CFG0_2 0x4048
+#define REG_TXD_CFG0_3 0x404C
+
+#define REG_RXF_CFG0_0 0x4050
+#define REG_RXF_CFG0_1 0x4054
+#define REG_RXF_CFG0_2 0x4058
+#define REG_RXF_CFG0_3 0x405C
+
+#define REG_RXD_CFG0_0 0x4060
+#define REG_RXD_CFG0_1 0x4064
+#define REG_RXD_CFG0_2 0x4068
+#define REG_RXD_CFG0_3 0x406C
+
+#define REG_TXF_CFG0_0 0x4070
+#define REG_TXF_CFG0_1 0x4074
+#define REG_TXF_CFG0_2 0x4078
+#define REG_TXF_CFG0_3 0x407C
+
+#define REG_TXD_WPTR_0 0x4080
+#define REG_TXD_WPTR_1 0x4084
+#define REG_TXD_WPTR_2 0x4088
+#define REG_TXD_WPTR_3 0x408C
+
+#define REG_RXF_WPTR_0 0x4090
+#define REG_RXF_WPTR_1 0x4094
+#define REG_RXF_WPTR_2 0x4098
+#define REG_RXF_WPTR_3 0x409C
+
+#define REG_RXD_WPTR_0 0x40A0
+#define REG_RXD_WPTR_1 0x40A4
+#define REG_RXD_WPTR_2 0x40A8
+#define REG_RXD_WPTR_3 0x40AC
+
+#define REG_TXF_WPTR_0 0x40B0
+#define REG_TXF_WPTR_1 0x40B4
+#define REG_TXF_WPTR_2 0x40B8
+#define REG_TXF_WPTR_3 0x40BC
+
+#define REG_TXD_RPTR_0 0x40C0
+#define REG_TXD_RPTR_1 0x40C4
+#define REG_TXD_RPTR_2 0x40C8
+#define REG_TXD_RPTR_3 0x40CC
+
+#define REG_RXF_RPTR_0 0x40D0
+#define REG_RXF_RPTR_1 0x40D4
+#define REG_RXF_RPTR_2 0x40D8
+#define REG_RXF_RPTR_3 0x40DC
+
+#define REG_RXD_RPTR_0 0x40E0
+#define REG_RXD_RPTR_1 0x40E4
+#define REG_RXD_RPTR_2 0x40E8
+#define REG_RXD_RPTR_3 0x40EC
+
+#define REG_TXF_RPTR_0 0x40F0
+#define REG_TXF_RPTR_1 0x40F4
+#define REG_TXF_RPTR_2 0x40F8
+#define REG_TXF_RPTR_3 0x40FC
+
+/* Hardware versioning */
+#define FW_VER 0x5010
+#define SROM_VER 0x5020
+#define FPGA_VER 0x5030
+#define FPGA_SEED 0x5040
+
+/* Registers from 0x0100-0x0150 were remapped to 0x5100-0x5150 */
+#define REG_ISR REG_ISR0
+#define REG_ISR0 0x5100
+
+#define REG_IMR REG_IMR0
+#define REG_IMR0 0x5110
+
+#define REG_RDINTCM0 0x5120
+#define REG_RDINTCM2 0x5128
+
+#define REG_TDINTCM0 0x5130
+
+#define REG_ISR_MSK0 0x5140
+
+#define REG_INIT_SEMAPHORE 0x5170
+#define REG_INIT_STATUS 0x5180
+
+#define REG_MAC_LNK_STAT 0x0200
+#define MAC_LINK_STAT 0x0004 /* Link state */
+
+#define REG_BLNK_LED 0x0210
+
+#define REG_GMAC_RXF_A 0x1240
+
+#define REG_UNC_MAC0_A 0x1250
+#define REG_UNC_MAC1_A 0x1260
+#define REG_UNC_MAC2_A 0x1270
+
+#define REG_VLAN_0 0x1800
+
+#define REG_MAX_FRAME_A 0x12C0
+
+#define REG_RX_MAC_MCST0 0x1A80
+#define REG_RX_MAC_MCST1 0x1A84
+#define MAC_MCST_NUM 15
+#define REG_RX_MCST_HASH0 0x1A00
+#define MAC_MCST_HASH_NUM 8
+
+#define REG_VPC 0x2300
+#define REG_VIC 0x2320
+#define REG_VGLB 0x2340
+
+#define REG_CLKPLL 0x5000
+
+/* MDIO interface */
+
+#define REG_MDIO_CMD_STAT 0x6030
+#define REG_MDIO_CMD 0x6034
+#define REG_MDIO_DATA 0x6038
+#define REG_MDIO_ADDR 0x603C
+#define GET_MDIO_BUSY(x) GET_BITS_SHIFT(x, 1, 0)
+#define GET_MDIO_RD_ERR(x) GET_BITS_SHIFT(x, 1, 1)
+
+/* For 10G only */
+#define REG_RSS_CNG 0x000000b0
+
+#define RSS_ENABLED 0x00000001
+#define RSS_HFT_TOEPLITZ 0x00000002
+#define RSS_HASH_IPV4 0x00000100
+#define RSS_HASH_TCP_IPV4 0x00000200
+#define RSS_HASH_IPV6 0x00000400
+#define RSS_HASH_IPV6_EX 0x00000800
+#define RSS_HASH_TCP_IPV6 0x00001000
+#define RSS_HASH_TCP_IPV6_EX 0x00002000
+
+#define REG_RSS_HASH_BASE 0x0400
+#define RSS_HASH_LEN 40
+#define REG_RSS_INDT_BASE 0x0600
+#define RSS_INDT_LEN 256
+
+#define REG_REVISION 0x6000
+#define REG_SCRATCH 0x6004
+#define REG_CTRLST 0x6008
+#define REG_MAC_ADDR_0 0x600C
+#define REG_MAC_ADDR_1 0x6010
+#define REG_FRM_LENGTH 0x6014
+#define REG_PAUSE_QUANT 0x6054
+#define REG_RX_FIFO_SECTION 0x601C
+#define REG_TX_FIFO_SECTION 0x6020
+#define REG_RX_FULLNESS 0x6024
+#define REG_TX_FULLNESS 0x6028
+#define REG_HASHTABLE 0x602C
+
+#define REG_RST_PORT 0x7000
+#define REG_DIS_PORT 0x7010
+#define REG_RST_QU 0x7020
+#define REG_DIS_QU 0x7030
+
+#define REG_CTRLST_TX_ENA 0x0001
+#define REG_CTRLST_RX_ENA 0x0002
+#define REG_CTRLST_PRM_ENA 0x0010
+#define REG_CTRLST_PAD_ENA 0x0020
+
+#define REG_CTRLST_BASE (REG_CTRLST_PAD_ENA | REG_CTRLST_PRM_ENA)
+
+#define REG_RX_FLT 0x1400
+
+/* TXD TXF RXF RXD  CONFIG 0x0000 --- 0x007c */
+#define TX_RX_CFG1_BASE 0xffffffff /*31:0 */
+#define TX_RX_CFG0_BASE 0xfffff000 /*31:12 */
+#define TX_RX_CFG0_RSVD 0x00000ffc /*11:2 */
+#define TX_RX_CFG0_SIZE 0x00000003 /*1:0 */
+
+/* TXD TXF RXF RXD  WRITE 0x0080 --- 0x00BC */
+#define TXF_WPTR_WR_PTR 0x00007ff8 /*14:3 */
+
+/* TXD TXF RXF RXD  READ  0x00C0 --- 0x00FC */
+#define TXF_RPTR_RD_PTR 0x00007ff8 /*14:3 */
+
+/* The last 4 bits are dropped; the size is rounded to 16 */
+#define TXF_WPTR_MASK 0x7ff0
+
+/* regISR 0x0100 */
+/* regIMR 0x0110 */
+#define IMR_INPROG 0x80000000 /*31 */
+#define IR_LNKCHG1 0x10000000 /*28 */
+#define IR_LNKCHG0 0x08000000 /*27 */
+#define IR_GPIO 0x04000000 /*26 */
+#define IR_RFRSH 0x02000000 /*25 */
+#define IR_RSVD 0x01000000 /*24 */
+#define IR_SWI 0x00800000 /*23 */
+#define IR_RX_FREE_3 0x00400000 /*22 */
+#define IR_RX_FREE_2 0x00200000 /*21 */
+#define IR_RX_FREE_1 0x00100000 /*20 */
+#define IR_RX_FREE_0 0x00080000 /*19 */
+#define IR_TX_FREE_3 0x00040000 /*18 */
+#define IR_TX_FREE_2 0x00020000 /*17 */
+#define IR_TX_FREE_1 0x00010000 /*16 */
+#define IR_TX_FREE_0 0x00008000 /*15 */
+#define IR_RX_DESC_3 0x00004000 /*14 */
+#define IR_RX_DESC_2 0x00002000 /*13 */
+#define IR_RX_DESC_1 0x00001000 /*12 */
+#define IR_RX_DESC_0 0x00000800 /*11 */
+#define IR_PSE 0x00000400 /*10 */
+#define IR_TMR3 0x00000200 /* 9 */
+#define IR_TMR2 0x00000100 /* 8 */
+#define IR_TMR1 0x00000080 /* 7 */
+#define IR_TMR0 0x00000040 /* 6 */
+#define IR_VNT 0x00000020 /* 5 */
+#define IR_RxFL 0x00000010 /* 4 */
+#define IR_SDPERR 0x00000008 /* 3 */
+#define IR_TR 0x00000004 /* 2 */
+#define IR_PCIE_LINK 0x00000002 /* 1 */
+#define IR_PCIE_TOUT 0x00000001 /* 0 */
+
+#define IR_EXTRA                                                     \
+	(IR_RX_FREE_0 | IR_LNKCHG0 | IR_LNKCHG1 | IR_PSE | IR_TMR0 | \
+	 IR_PCIE_LINK | IR_PCIE_TOUT)
+
+#define GMAC_RX_FILTER_OSEN 0x1000 /* shared OS enable */
+#define GMAC_RX_FILTER_TXFC 0x0400 /* Tx flow control */
+#define GMAC_RX_FILTER_RSV0 0x0200 /* reserved */
+#define GMAC_RX_FILTER_FDA 0x0100 /* filter out direct address */
+#define GMAC_RX_FILTER_AOF 0x0080 /* accept over run */
+#define GMAC_RX_FILTER_ACF 0x0040 /* accept control frames */
+#define GMAC_RX_FILTER_ARUNT 0x0020 /* accept under run */
+#define GMAC_RX_FILTER_ACRC 0x0010 /* accept crc error */
+#define GMAC_RX_FILTER_AM 0x0008 /* accept multicast */
+#define GMAC_RX_FILTER_AB 0x0004 /* accept broadcast */
+#define GMAC_RX_FILTER_PRM 0x0001 /* [0:1] promiscuous mode */
+
+#define MAX_FRAME_AB_VAL 0x3fff /* 13:0 */
+
+#define CLKPLL_PLLLKD 0x0200 /* 9 */
+#define CLKPLL_RSTEND 0x0100 /* 8 */
+#define CLKPLL_SFTRST 0x0001 /* 0 */
+
+#define CLKPLL_LKD (CLKPLL_PLLLKD | CLKPLL_RSTEND)
+
+/* PCI-E Device Control Register (Offset 0x88) Source: Luxor Data
+ * Sheet, 7.1.3.3.3
+ */
+#define PCI_DEV_CTRL_REG 0x88
+#define GET_DEV_CTRL_MAXPL(x) GET_BITS_SHIFT(x, 3, 5)
+#define GET_DEV_CTRL_MRRS(x) GET_BITS_SHIFT(x, 3, 12)
+
+/* PCI-E Link Status Register (Offset 0x92) Source: Luxor Data
+ * Sheet, 7.1.3.3.7
+ */
+#define PCI_LINK_STATUS_REG 0x92
+#define GET_LINK_STATUS_LANES(x) GET_BITS_SHIFT(x, 6, 4)
+
+#endif
-- 
2.34.1



* [PATCH net-next v1 3/5] net: tn40xx: add basic Tx handling
  2024-04-15 10:43 [PATCH net-next v1 0/5] add ethernet driver for Tehuti Networks TN40xx chips FUJITA Tomonori
  2024-04-15 10:43 ` [PATCH net-next v1 1/5] net: tn40xx: add pci " FUJITA Tomonori
  2024-04-15 10:43 ` [PATCH net-next v1 2/5] net: tn40xx: add register defines FUJITA Tomonori
@ 2024-04-15 10:43 ` FUJITA Tomonori
  2024-04-15 22:29   ` kernel test robot
  2024-04-18 17:23   ` Simon Horman
  2024-04-15 10:43 ` [PATCH net-next v1 4/5] net: tn40xx: add basic Rx handling FUJITA Tomonori
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 22+ messages in thread
From: FUJITA Tomonori @ 2024-04-15 10:43 UTC (permalink / raw)
  To: netdev; +Cc: andrew

This patch adds the device-specific structures and initializes the
hardware, with basic Tx handling. The original driver loads firmware
embedded in a header file; this driver uses the kernel firmware API
instead.

The Tx logic uses three major data structures: two ring buffers shared
with the NIC and one database. One ring buffer is used to pass
information about packets to be sent to the NIC. The other is used to
get information back from the NIC about packets that have been sent.
The database keeps the information about the DMA mappings. After a
packet is sent, the db is used to free the resources used for that
packet.
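The db described above is a cyclic buffer that keeps one slot empty so
that rptr == wptr unambiguously means "empty". That invariant can be
illustrated with a small user-space sketch (plain C, independent of the
driver; all names here are illustrative only):

```c
#include <assert.h>

/* Illustrative cyclic db: DB_SLOTS total elements, one always kept
 * empty so that rptr == wptr means "empty", never "full".
 */
#define DB_SLOTS 8

struct db {
	int buf[DB_SLOTS];
	int rptr, wptr;		/* indices into buf[] */
};

/* Free slots left; mirrors the driver's "size - taken" calculation. */
static int db_free(const struct db *d)
{
	int taken = d->wptr - d->rptr;

	if (taken < 0)
		taken += DB_SLOTS;
	return DB_SLOTS - 1 - taken;	/* one slot always left empty */
}

static void db_push(struct db *d, int v)
{
	d->buf[d->wptr] = v;
	d->wptr = (d->wptr + 1) % DB_SLOTS;
}

static int db_pop(struct db *d)
{
	int v = d->buf[d->rptr];

	d->rptr = (d->rptr + 1) % DB_SLOTS;
	return v;
}
```

In the driver, the pushed elements are DMA mappings (and, for the last
element of a packet, the skb pointer itself), popped when the TXF ack
arrives.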

Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
---
 drivers/net/ethernet/tehuti/Kconfig |    1 +
 drivers/net/ethernet/tehuti/tn40.c  | 1251 +++++++++++++++++++++++++++
 drivers/net/ethernet/tehuti/tn40.h  |  192 +++-
 3 files changed, 1443 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/tehuti/Kconfig b/drivers/net/ethernet/tehuti/Kconfig
index 849e3b4a71c1..4198fd59e42e 100644
--- a/drivers/net/ethernet/tehuti/Kconfig
+++ b/drivers/net/ethernet/tehuti/Kconfig
@@ -26,6 +26,7 @@ config TEHUTI
 config TEHUTI_TN40
 	tristate "Tehuti Networks TN40xx 10G Ethernet adapters"
 	depends on PCI
+	select FW_LOADER
 	help
 	  This driver supports 10G Ethernet adapters using Tehuti Networks
 	  TN40xx chips. Currently, adapters with Applied Micro Circuits
diff --git a/drivers/net/ethernet/tehuti/tn40.c b/drivers/net/ethernet/tehuti/tn40.c
index 368a73300f8a..0798df468fc3 100644
--- a/drivers/net/ethernet/tehuti/tn40.c
+++ b/drivers/net/ethernet/tehuti/tn40.c
@@ -3,9 +3,1170 @@
 
 #include "tn40.h"
 
+#define SHORT_PACKET_SIZE 60
+
+static void bdx_enable_interrupts(struct bdx_priv *priv)
+{
+	write_reg(priv, REG_IMR, priv->isr_mask);
+}
+
+static void bdx_disable_interrupts(struct bdx_priv *priv)
+{
+	write_reg(priv, REG_IMR, 0);
+}
+
+static int bdx_fifo_alloc(struct bdx_priv *priv, struct fifo *f, int fsz_type,
+			  u16 reg_cfg0, u16 reg_cfg1, u16 reg_rptr, u16 reg_wptr)
+{
+	u16 memsz = FIFO_SIZE * (1 << fsz_type);
+
+	memset(f, 0, sizeof(struct fifo));
+	/* 1K extra space is allocated at the end of the fifo to simplify
+	 * processing of descriptors that wraps around fifo's end.
+	 */
+	f->va = dma_alloc_coherent(&priv->pdev->dev,
+				   memsz + FIFO_EXTRA_SPACE, &f->da, GFP_KERNEL);
+	if (!f->va)
+		return -ENOMEM;
+
+	f->reg_cfg0 = reg_cfg0;
+	f->reg_cfg1 = reg_cfg1;
+	f->reg_rptr = reg_rptr;
+	f->reg_wptr = reg_wptr;
+	f->rptr = 0;
+	f->wptr = 0;
+	f->memsz = memsz;
+	f->size_mask = memsz - 1;
+	write_reg(priv, reg_cfg0, (u32)((f->da & TX_RX_CFG0_BASE) | fsz_type));
+	write_reg(priv, reg_cfg1, H32_64(f->da));
+	return 0;
+}
+
+static void bdx_fifo_free(struct bdx_priv *priv, struct fifo *f)
+{
+	dma_free_coherent(&priv->pdev->dev,
+			  f->memsz + FIFO_EXTRA_SPACE, f->va, f->da);
+}
+
+/* TX HW/SW interaction overview
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * There are 2 types of TX communication channels between driver and NIC.
+ * 1) TX Free Fifo - TXF - Holds ack descriptors for sent packets.
+ * 2) TX Data Fifo - TXD - Holds descriptors of full buffers.
+ *
+ * Currently the NIC supports TSO, checksumming and gather DMA.
+ * UFO and IP fragmentation are on the way.
+ *
+ * TX SW Data Structures
+ * ~~~~~~~~~~~~~~~~~~~~~
+ * TXDB is used to keep track of all skbs owned by SW and their DMA addresses.
+ * For TX case, ownership lasts from getting the packet via hard_xmit and
+ * until the HW acknowledges sending the packet by TXF descriptors.
+ * TXDB is implemented as a cyclic buffer.
+ *
+ * FIFO objects keep info about the fifo's size and location, relevant HW
+ * registers, usage and skb db. Each TXD and TXF fifo has its own fifo
+ * structure. Implemented as simple struct.
+ *
+ * TX SW Execution Flow
+ * ~~~~~~~~~~~~~~~~~~~~
+ * OS calls the driver's hard_xmit method with a packet to send. The driver
+ * creates DMA mappings, builds TXD descriptors and kicks the HW by updating
+ * TXD WPTR.
+ *
+ * When a packet is sent, the HW writes a TXF descriptor and the SW
+ * frees the original skb. To prevent TXD fifo overflow without
+ * reading HW registers every time, the SW deploys "tx level"
+ * technique. Upon startup, the tx level is initialized to TXD fifo
+ * length. For every sent packet, the SW gets its TXD descriptor size
+ * (from a pre-calculated array) and subtracts it from tx level.  The
+ * size is also stored in txdb. When a TXF ack arrives, the SW fetches
+ * the size of the original TXD descriptor from the txdb and adds it
+ * to the tx level. When the Tx level drops below some predefined
+ * threshold, the driver stops the TX queue. When the TX level rises
+ * above that level, the tx queue is enabled again.
+ *
+ * This technique avoids excessive reading of RPTR and WPTR registers.
+ * As our benchmarks show, it adds 1.5 Gbit/sec to the NIC's throughput.
+ */
+static inline int bdx_tx_db_size(struct txdb *db)
+{
+	int taken = db->wptr - db->rptr;
+
+	if (taken < 0)
+		taken = db->size + 1 + taken;	/* (size + 1) equals memsz */
+	return db->size - taken;
+}
+
+static inline void __bdx_tx_db_ptr_next(struct txdb *db, struct tx_map **pptr)
+{
+	++*pptr;
+	if (unlikely(*pptr == db->end))
+		*pptr = db->start;
+}
+
+static inline void bdx_tx_db_inc_rptr(struct txdb *db)
+{
+	__bdx_tx_db_ptr_next(db, &db->rptr);
+}
+
+static inline void bdx_tx_db_inc_wptr(struct txdb *db)
+{
+	__bdx_tx_db_ptr_next(db, &db->wptr);
+}
+
+static int bdx_tx_db_init(struct txdb *d, int sz_type)
+{
+	int memsz = FIFO_SIZE * (1 << (sz_type + 1));
+
+	d->start = vzalloc(memsz);
+	if (!d->start)
+		return -ENOMEM;
+	/* In order to differentiate between an empty db state and a full db
+	 * state at least one element should always be empty in order to
+	 * avoid rptr == wptr, which means that the db is empty.
+	 */
+	d->size = memsz / sizeof(struct tx_map) - 1;
+	d->end = d->start + d->size + 1;	/* just after last element */
+
+	/* All dbs are created empty */
+	d->rptr = d->start;
+	d->wptr = d->start;
+	return 0;
+}
+
+static void bdx_tx_db_close(struct txdb *d)
+{
+	if (d->start) {
+		vfree(d->start);
+		d->start = NULL;
+	}
+}
+
+/* Sizes of tx desc (including padding if needed) as function of the SKB's
+ * frag number
+ */
+static struct {
+	u16 bytes;
+	u16 qwords;		/* qword = 64 bit */
+} txd_sizes[MAX_PBL];
+
+static inline void bdx_set_pbl(struct pbl *pbl, dma_addr_t dma, int len)
+{
+	pbl->len = cpu_to_le32(len);
+	pbl->pa_lo = cpu_to_le32(L32_64(dma));
+	pbl->pa_hi = cpu_to_le32(H32_64(dma));
+}
+
+static inline void bdx_set_txdb(struct txdb *db, dma_addr_t dma, int len)
+{
+	db->wptr->len = len;
+	db->wptr->addr.dma = dma;
+}
+
+struct mapping_info {
+	dma_addr_t dma;
+	size_t size;
+};
+
+/**
+ * bdx_tx_map_skb - create and store DMA mappings for skb's data blocks
+ * @priv: NIC private structure
+ * @skb: socket buffer to map
+ * @txdd: pointer to tx descriptor to be updated
+ * @pkt_len: pointer that receives the total mapped packet length
+ *
+ * This function creates DMA mappings for skb's data blocks and writes them to
+ * PBL of a new tx descriptor. It also stores them in the tx db, so they could
+ * be unmapped after the data has been sent. It is the responsibility of the
+ * caller to make sure that there is enough space in the txdb. The last
+ * element holds a pointer to skb itself and is marked with a zero length.
+ *
+ * Return: 0 on success and negative value on error.
+ */
+static inline int bdx_tx_map_skb(struct bdx_priv *priv, struct sk_buff *skb,
+				 struct txd_desc *txdd, unsigned int *pkt_len)
+{
+	dma_addr_t dma;
+	int i, len, err;
+	struct txdb *db = &priv->txdb;
+	struct pbl *pbl = &txdd->pbl[0];
+	int nr_frags = skb_shinfo(skb)->nr_frags;
+	unsigned int size;
+	struct mapping_info info[MAX_PBL];
+
+	netdev_dbg(priv->ndev, "TX skb %p skbLen %d dataLen %d frags %d\n", skb,
+		   skb->len, skb->data_len, nr_frags);
+	if (nr_frags > MAX_PBL - 1) {
+		err = skb_linearize(skb);
+		if (err)
+			return -1;
+		nr_frags = skb_shinfo(skb)->nr_frags;
+	}
+	/* initial skb */
+	len = skb->len - skb->data_len;
+	dma = dma_map_single(&priv->pdev->dev, skb->data, len,
+			     DMA_TO_DEVICE);
+	if (dma_mapping_error(&priv->pdev->dev, dma))
+		return -1;
+
+	bdx_set_txdb(db, dma, len);
+	bdx_set_pbl(pbl++, db->wptr->addr.dma, db->wptr->len);
+	*pkt_len = db->wptr->len;
+
+	for (i = 0; i < nr_frags; i++) {
+		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+		size = skb_frag_size(frag);
+		dma = skb_frag_dma_map(&priv->pdev->dev, frag, 0,
+				       size, DMA_TO_DEVICE);
+
+		if (dma_mapping_error(&priv->pdev->dev, dma))
+			goto mapping_error;
+		info[i].dma = dma;
+		info[i].size = size;
+	}
+
+	for (i = 0; i < nr_frags; i++) {
+		bdx_tx_db_inc_wptr(db);
+		bdx_set_txdb(db, info[i].dma, info[i].size);
+		bdx_set_pbl(pbl++, db->wptr->addr.dma, db->wptr->len);
+		*pkt_len += db->wptr->len;
+	}
+
+	/* SHORT_PKT_FIX */
+	if (skb->len < SHORT_PACKET_SIZE)
+		++nr_frags;
+
+	/* Add skb clean up info. */
+	bdx_tx_db_inc_wptr(db);
+	db->wptr->len = -txd_sizes[nr_frags].bytes;
+	db->wptr->addr.skb = skb;
+	bdx_tx_db_inc_wptr(db);
+
+	return 0;
+ mapping_error:
+	dma_unmap_page(&priv->pdev->dev, db->wptr->addr.dma, db->wptr->len, DMA_TO_DEVICE);
+	for (; i > 0; i--)
+		dma_unmap_page(&priv->pdev->dev, info[i - 1].dma, info[i - 1].size, DMA_TO_DEVICE);
+	return -1;
+}
+
+static void init_txd_sizes(void)
+{
+	int i, lwords;
+
+	if (txd_sizes[0].bytes)
+		return;
+
+	/* 7 - is number of lwords in txd with one phys buffer
+	 * 3 - is number of lwords used for every additional phys buffer
+	 */
+	for (i = 0; i < MAX_PBL; i++) {
+		lwords = 7 + (i * 3);
+		if (lwords & 1)
+			lwords++;	/* pad it with 1 lword */
+		txd_sizes[i].qwords = lwords >> 1;
+		txd_sizes[i].bytes = lwords << 2;
+	}
+}
+
+static int create_tx_ring(struct bdx_priv *priv)
+{
+	int ret;
+
+	ret = bdx_fifo_alloc(priv, &priv->txd_fifo0.m, priv->txd_size,
+			     REG_TXD_CFG0_0, REG_TXD_CFG1_0,
+			     REG_TXD_RPTR_0, REG_TXD_WPTR_0);
+	if (ret)
+		return ret;
+
+	ret = bdx_fifo_alloc(priv, &priv->txf_fifo0.m, priv->txf_size,
+			     REG_TXF_CFG0_0, REG_TXF_CFG1_0,
+			     REG_TXF_RPTR_0, REG_TXF_WPTR_0);
+	if (ret)
+		goto err_free_txd;
+
+	/* The TX db has to keep mappings for all packets sent (on
+	 * TxD) and not yet reclaimed (on TxF).
+	 */
+	ret = bdx_tx_db_init(&priv->txdb, max(priv->txd_size, priv->txf_size));
+	if (ret)
+		goto err_free_txf;
+
+	/* SHORT_PKT_FIX */
+	priv->b0_len = 64;
+	priv->b0_va = dma_alloc_coherent(&priv->pdev->dev, priv->b0_len,
+					 &priv->b0_dma, GFP_KERNEL);
+	if (!priv->b0_va) {
+		ret = -ENOMEM;
+		goto err_free_db;
+	}
+
+	priv->tx_level = BDX_MAX_TX_LEVEL;
+	priv->tx_update_mark = priv->tx_level - 1024;
+	return 0;
+err_free_db:
+	bdx_tx_db_close(&priv->txdb);
+err_free_txf:
+	bdx_fifo_free(priv, &priv->txf_fifo0.m);
+err_free_txd:
+	bdx_fifo_free(priv, &priv->txd_fifo0.m);
+	return ret;
+}
+
+/**
+ * bdx_tx_space - Calculate the available space in the TX fifo.
+ * @priv: NIC private structure
+ *
+ * Return: available space in the TX fifo in bytes
+ */
+static inline int bdx_tx_space(struct bdx_priv *priv)
+{
+	struct txd_fifo *f = &priv->txd_fifo0;
+	int fsize;
+
+	f->m.rptr = read_reg(priv, f->m.reg_rptr) & TXF_WPTR_WR_PTR;
+	fsize = f->m.rptr - f->m.wptr;
+	if (fsize <= 0)
+		fsize = f->m.memsz + fsize;
+	return fsize;
+}
+
+static int bdx_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct bdx_priv *priv = netdev_priv(ndev);
+	struct txd_fifo *f = &priv->txd_fifo0;
+	int txd_checksum = 7;	/* full checksum */
+	int txd_lgsnd = 0;
+	int txd_vlan_id = 0;
+	int txd_vtag = 0;
+	int txd_mss = 0;
+	unsigned int pkt_len;
+	struct txd_desc *txdd;
+	int nr_frags, len, err;
+
+	/* Build tx descriptor */
+	txdd = (struct txd_desc *)(f->m.va + f->m.wptr);
+	err = bdx_tx_map_skb(priv, skb, txdd, &pkt_len);
+	if (err) {
+		dev_kfree_skb(skb);
+		return NETDEV_TX_OK;
+	}
+	nr_frags = skb_shinfo(skb)->nr_frags;
+	if (unlikely(skb->ip_summed != CHECKSUM_PARTIAL))
+		txd_checksum = 0;
+
+	if (skb_shinfo(skb)->gso_size) {
+		txd_mss = skb_shinfo(skb)->gso_size;
+		txd_lgsnd = 1;
+		netdev_dbg(priv->ndev, "skb %p pkt len %d gso size = %d\n", skb,
+			   pkt_len, txd_mss);
+	}
+	if (skb_vlan_tag_present(skb)) {
+		/* Don't cut VLAN ID to 12 bits */
+		txd_vlan_id = skb_vlan_tag_get(skb);
+		txd_vtag = 1;
+	}
+	txdd->va_hi = 0;
+	txdd->va_lo = (u32)((u64)skb);
+	txdd->length = cpu_to_le16(pkt_len);
+	txdd->mss = cpu_to_le16(txd_mss);
+	txdd->txd_val1 =
+		cpu_to_le32(TXD_W1_VAL
+			    (txd_sizes[nr_frags].qwords, txd_checksum,
+			     txd_vtag, txd_lgsnd, txd_vlan_id));
+	netdev_dbg(priv->ndev, "=== w1 qwords[%d] %d =====\n", nr_frags,
+		   txd_sizes[nr_frags].qwords);
+	netdev_dbg(priv->ndev, "=== TxD desc =====================\n");
+	netdev_dbg(priv->ndev, "=== w1: 0x%x ================\n", txdd->txd_val1);
+	netdev_dbg(priv->ndev, "=== w2: mss 0x%x len 0x%x\n", txdd->mss,
+		   txdd->length);
+	/* SHORT_PKT_FIX */
+	if (pkt_len < SHORT_PACKET_SIZE) {
+		struct pbl *pbl = &txdd->pbl[++nr_frags];
+
+		txdd->length = cpu_to_le16(SHORT_PACKET_SIZE);
+		txdd->txd_val1 =
+			cpu_to_le32(TXD_W1_VAL
+				    (txd_sizes[nr_frags].qwords,
+				     txd_checksum, txd_vtag, txd_lgsnd,
+				     txd_vlan_id));
+		pbl->len = cpu_to_le32(SHORT_PACKET_SIZE - pkt_len);
+		pbl->pa_lo = cpu_to_le32(L32_64(priv->b0_dma));
+		pbl->pa_hi = cpu_to_le32(H32_64(priv->b0_dma));
+		netdev_dbg(priv->ndev, "=== SHORT_PKT_FIX   ================\n");
+		netdev_dbg(priv->ndev, "=== nr_frags : %d   ================\n",
+			   nr_frags);
+	}
+
+	/* Increment the TXD write pointer. In case of fifo wrapping,
+	 * copy the remainder of the descriptor to the beginning.
+	 */
+	f->m.wptr += txd_sizes[nr_frags].bytes;
+	len = f->m.wptr - f->m.memsz;
+	if (unlikely(len >= 0)) {
+		f->m.wptr = len;
+		if (len > 0)
+			memcpy(f->m.va, f->m.va + f->m.memsz, len);
+	}
+	/* Force memory writes to complete before letting the HW know
+	 * there are new descriptors to fetch.
+	 */
+	wmb();
+
+	priv->tx_level -= txd_sizes[nr_frags].bytes;
+	if (priv->tx_level > priv->tx_update_mark) {
+		write_reg(priv, f->m.reg_wptr,
+			  f->m.wptr & TXF_WPTR_WR_PTR);
+	} else {
+		if (priv->tx_noupd++ > BDX_NO_UPD_PACKETS) {
+			priv->tx_noupd = 0;
+			write_reg(priv, f->m.reg_wptr,
+				  f->m.wptr & TXF_WPTR_WR_PTR);
+		}
+	}
+
+	netif_trans_update(ndev);
+	priv->net_stats.tx_packets++;
+	priv->net_stats.tx_bytes += pkt_len;
+	if (priv->tx_level < BDX_MIN_TX_LEVEL) {
+		netdev_dbg(priv->ndev, "TX Q STOP level %d\n", priv->tx_level);
+		netif_stop_queue(ndev);
+	}
+
+	return NETDEV_TX_OK;
+}
+
+static void bdx_tx_free_skbs(struct bdx_priv *priv)
+{
+	struct txdb *db = &priv->txdb;
+
+	while (db->rptr != db->wptr) {
+		if (likely(db->rptr->len))
+			dma_unmap_page(&priv->pdev->dev, db->rptr->addr.dma,
+				       db->rptr->len, DMA_TO_DEVICE);
+		else
+			dev_kfree_skb(db->rptr->addr.skb);
+		bdx_tx_db_inc_rptr(db);
+	}
+}
+
+static void destroy_tx_ring(struct bdx_priv *priv)
+{
+	bdx_tx_free_skbs(priv);
+	bdx_fifo_free(priv, &priv->txd_fifo0.m);
+	bdx_fifo_free(priv, &priv->txf_fifo0.m);
+	bdx_tx_db_close(&priv->txdb);
+	/* SHORT_PKT_FIX */
+	if (priv->b0_len) {
+		dma_free_coherent(&priv->pdev->dev, priv->b0_len, priv->b0_va,
+				  priv->b0_dma);
+		priv->b0_len = 0;
+	}
+}
+
+/**
+ * bdx_tx_push_desc - Push a descriptor to TxD fifo.
+ *
+ * @priv: NIC private structure
+ * @data: desc's data
+ * @size: desc's size
+ *
+ * This function pushes desc to TxD fifo and overlaps it if needed.
+ *
+ * This function does not check for available space, nor does it check
+ * that the data size is smaller than the fifo size. Checking for
+ * space is the responsibility of the caller.
+ */
+static void bdx_tx_push_desc(struct bdx_priv *priv, void *data, int size)
+{
+	struct txd_fifo *f = &priv->txd_fifo0;
+	int i = f->m.memsz - f->m.wptr;
+
+	if (size == 0)
+		return;
+
+	if (i > size) {
+		memcpy(f->m.va + f->m.wptr, data, size);
+		f->m.wptr += size;
+	} else {
+		memcpy(f->m.va + f->m.wptr, data, i);
+		f->m.wptr = size - i;
+		memcpy(f->m.va, data + i, f->m.wptr);
+	}
+	write_reg(priv, f->m.reg_wptr, f->m.wptr & TXF_WPTR_WR_PTR);
+}
+
+/**
+ * bdx_tx_push_desc_safe - push descriptor to TxD fifo in a safe way.
+ *
+ * @priv: NIC private structure
+ * @data: descriptor data
+ * @size: descriptor size
+ *
+ * This function does check for available space and, if necessary,
+ * waits for the NIC to read existing data before writing new data.
+ */
+static void bdx_tx_push_desc_safe(struct bdx_priv *priv, void *data, int size)
+{
+	int timer = 0;
+
+	while (size > 0) {
+		/* Subtract 8 because a completely full fifo would have
+		 * rptr == wptr, which the HW could not distinguish from
+		 * an empty fifo.
+		 */
+		int avail = bdx_tx_space(priv) - 8;
+
+		if (avail <= 0) {
+			if (timer++ > 300) {	/* Prevent endless loop */
+				netdev_dbg(priv->ndev, "timeout while writing desc to TxD fifo\n");
+				break;
+			}
+			udelay(50);	/* Give the HW a chance to clean the fifo */
+			continue;
+		}
+		avail = min(avail, size);
+		netdev_dbg(priv->ndev, "about to push %d bytes starting %p size %d\n",
+			   avail, data, size);
+		bdx_tx_push_desc(priv, data, avail);
+		size -= avail;
+		data += avail;
+	}
+}
+
+static int bdx_set_link_speed(struct bdx_priv *priv, u32 speed)
+{
+	int i;
+	u32 val;
+
+	netdev_dbg(priv->ndev, "speed %d\n", speed);
+
+	switch (speed) {
+	case SPEED_10000:
+	case SPEED_5000:
+	case SPEED_2500:
+		netdev_dbg(priv->ndev, "link_speed %d\n", speed);
+
+		write_reg(priv, 0x1010, 0x217);	/*ETHSD.REFCLK_CONF  */
+		write_reg(priv, 0x104c, 0x4c);	/*ETHSD.L0_RX_PCNT  */
+		write_reg(priv, 0x1050, 0x4c);	/*ETHSD.L1_RX_PCNT  */
+		write_reg(priv, 0x1054, 0x4c);	/*ETHSD.L2_RX_PCNT  */
+		write_reg(priv, 0x1058, 0x4c);	/*ETHSD.L3_RX_PCNT  */
+		write_reg(priv, 0x102c, 0x434);	/*ETHSD.L0_TX_PCNT  */
+		write_reg(priv, 0x1030, 0x434);	/*ETHSD.L1_TX_PCNT  */
+		write_reg(priv, 0x1034, 0x434);	/*ETHSD.L2_TX_PCNT  */
+		write_reg(priv, 0x1038, 0x434);	/*ETHSD.L3_TX_PCNT  */
+		write_reg(priv, 0x6300, 0x0400);	/*MAC.PCS_CTRL */
+
+		write_reg(priv, 0x1018, 0x00);	/*Mike2 */
+		udelay(5);
+		write_reg(priv, 0x1018, 0x04);	/*Mike2 */
+		udelay(5);
+		write_reg(priv, 0x1018, 0x06);	/*Mike2 */
+		udelay(5);
+		/*MikeFix1 */
+		/*L0: 0x103c , L1: 0x1040 , L2: 0x1044 , L3: 0x1048 =0x81644 */
+		write_reg(priv, 0x103c, 0x81644);	/*ETHSD.L0_TX_DCNT  */
+		write_reg(priv, 0x1040, 0x81644);	/*ETHSD.L1_TX_DCNT  */
+		write_reg(priv, 0x1044, 0x81644);	/*ETHSD.L2_TX_DCNT  */
+		write_reg(priv, 0x1048, 0x81644);	/*ETHSD.L3_TX_DCNT  */
+		write_reg(priv, 0x1014, 0x043);	/*ETHSD.INIT_STAT */
+		for (i = 1000; i; i--) {
+			udelay(50);
+			val = read_reg(priv, 0x1014);	/*ETHSD.INIT_STAT */
+			if (val & (1 << 9)) {
+				write_reg(priv, 0x1014, 0x3);	/*ETHSD.INIT_STAT */
+				val = read_reg(priv, 0x1014);	/*ETHSD.INIT_STAT */
+
+				break;
+			}
+		}
+		if (!i)
+			netdev_err(priv->ndev, "MAC init timeout!\n");
+
+		write_reg(priv, 0x6350, 0x0);	/*MAC.PCS_IF_MODE */
+		write_reg(priv, REG_CTRLST, 0xC13);	/*0x93//0x13 */
+		write_reg(priv, 0x111c, 0x7ff);	/*MAC.MAC_RST_CNT */
+		for (i = 40; i--;)
+			udelay(50);
+
+		write_reg(priv, 0x111c, 0x0);	/*MAC.MAC_RST_CNT */
+		break;
+
+	case SPEED_1000:
+	case SPEED_100:
+		write_reg(priv, 0x1010, 0x613);	/*ETHSD.REFCLK_CONF  */
+		write_reg(priv, 0x104c, 0x4d);	/*ETHSD.L0_RX_PCNT  */
+		write_reg(priv, 0x1050, 0x0);	/*ETHSD.L1_RX_PCNT  */
+		write_reg(priv, 0x1054, 0x0);	/*ETHSD.L2_RX_PCNT  */
+		write_reg(priv, 0x1058, 0x0);	/*ETHSD.L3_RX_PCNT  */
+		write_reg(priv, 0x102c, 0x35);	/*ETHSD.L0_TX_PCNT  */
+		write_reg(priv, 0x1030, 0x0);	/*ETHSD.L1_TX_PCNT  */
+		write_reg(priv, 0x1034, 0x0);	/*ETHSD.L2_TX_PCNT  */
+		write_reg(priv, 0x1038, 0x0);	/*ETHSD.L3_TX_PCNT  */
+		write_reg(priv, 0x6300, 0x01140);	/*MAC.PCS_CTRL */
+
+		write_reg(priv, 0x1014, 0x043);	/*ETHSD.INIT_STAT */
+		for (i = 1000; i; i--) {
+			udelay(50);
+			val = read_reg(priv, 0x1014);	/*ETHSD.INIT_STAT */
+			if (val & (1 << 9)) {
+				write_reg(priv, 0x1014, 0x3);	/*ETHSD.INIT_STAT */
+				val = read_reg(priv, 0x1014);	/*ETHSD.INIT_STAT */
+
+				break;
+			}
+		}
+		if (!i)
+			netdev_err(priv->ndev, "MAC init timeout!\n");
+
+		write_reg(priv, 0x6350, 0x2b);	/*MAC.PCS_IF_MODE 1g */
+		write_reg(priv, 0x6310, 0x9801);	/*MAC.PCS_DEV_AB */
+
+		write_reg(priv, 0x6314, 0x1);	/*MAC.PCS_PART_AB */
+		write_reg(priv, 0x6348, 0xc8);	/*MAC.PCS_LINK_LO */
+		write_reg(priv, 0x634c, 0xc8);	/*MAC.PCS_LINK_HI */
+		udelay(50);
+		write_reg(priv, REG_CTRLST, 0xC13);	/*0x93//0x13 */
+		write_reg(priv, 0x111c, 0x7ff);	/*MAC.MAC_RST_CNT */
+		for (i = 40; i--;)
+			udelay(50);
+
+		write_reg(priv, 0x111c, 0x0);	/*MAC.MAC_RST_CNT */
+		write_reg(priv, 0x6300, 0x1140);	/*MAC.PCS_CTRL */
+		break;
+
+	case 0:		/* Link down */
+		write_reg(priv, 0x104c, 0x0);	/*ETHSD.L0_RX_PCNT  */
+		write_reg(priv, 0x1050, 0x0);	/*ETHSD.L1_RX_PCNT  */
+		write_reg(priv, 0x1054, 0x0);	/*ETHSD.L2_RX_PCNT  */
+		write_reg(priv, 0x1058, 0x0);	/*ETHSD.L3_RX_PCNT  */
+		write_reg(priv, 0x102c, 0x0);	/*ETHSD.L0_TX_PCNT  */
+		write_reg(priv, 0x1030, 0x0);	/*ETHSD.L1_TX_PCNT  */
+		write_reg(priv, 0x1034, 0x0);	/*ETHSD.L2_TX_PCNT  */
+		write_reg(priv, 0x1038, 0x0);	/*ETHSD.L3_TX_PCNT  */
+
+		write_reg(priv, REG_CTRLST, 0x800);
+		write_reg(priv, 0x111c, 0x7ff);	/*MAC.MAC_RST_CNT */
+		for (i = 40; i--;)
+			udelay(50);
+		write_reg(priv, 0x111c, 0x0);	/*MAC.MAC_RST_CNT */
+		break;
+
+	default:
+		netdev_err(priv->ndev, "Link speed was not identified yet (%d)\n", speed);
+		speed = 0;
+		break;
+	}
+
+	return speed;
+}
+
+#define LINK_LOOP_MAX 10
+
+static void bdx_link_changed(struct bdx_priv *priv)
+{
+	u32 link = read_reg(priv, REG_MAC_LNK_STAT) & MAC_LINK_STAT;
+
+	if (!link) {
+		if (netif_carrier_ok(priv->ndev) && priv->link) {
+			netif_stop_queue(priv->ndev);
+			netif_carrier_off(priv->ndev);
+			netdev_info(priv->ndev, "Device link is down.\n");
+		}
+		priv->link = 0;
+		netif_carrier_off(priv->ndev);
+		if (priv->link_loop_cnt++ > LINK_LOOP_MAX) {
+			/* MAC reset */
+			bdx_set_link_speed(priv, 0);
+			priv->link_loop_cnt = 0;
+		}
+		write_reg(priv, 0x5150, 1000000);
+		return;
+	}
+	priv->link = link;
+}
+
+static inline void bdx_isr_extra(struct bdx_priv *priv, u32 isr)
+{
+	if (isr & (IR_LNKCHG0 | IR_LNKCHG1 | IR_TMR0)) {
+		netdev_dbg(priv->ndev, "isr = 0x%x\n", isr);
+		bdx_link_changed(priv);
+	}
+}
+
+static irqreturn_t bdx_isr_napi(int irq, void *dev)
+{
+	struct net_device *ndev = dev;
+	struct bdx_priv *priv = netdev_priv(ndev);
+	u32 isr;
+
+	isr = read_reg(priv, REG_ISR_MSK0);
+
+	if (unlikely(!isr)) {
+		bdx_enable_interrupts(priv);
+		return IRQ_NONE;	/* Not our interrupt */
+	}
+
+	if (isr & IR_EXTRA)
+		bdx_isr_extra(priv, isr);
+
+	if (isr & (IR_RX_DESC_0 | IR_TX_FREE_0 | IR_TMR1)) {
+		/* We get here if an interrupt arrived in the small
+		 * window between these lines in bdx_poll:
+		 * bdx_enable_interrupts(priv); return 0;
+		 *
+		 * Interrupts are currently disabled (we have just read
+		 * the ISR register) and the next poll has not been
+		 * scheduled, so read the registers to trigger the chip
+		 * and allow further interrupts.
+		 */
+		read_reg(priv, REG_TXF_WPTR_0);
+		read_reg(priv, REG_RXD_WPTR_0);
+	}
+
+	bdx_enable_interrupts(priv);
+	return IRQ_HANDLED;
+}
+
+static int bdx_fw_load(struct bdx_priv *priv)
+{
+	int master, i, ret;
+	const struct firmware *fw = NULL;
+
+	ret = request_firmware(&fw, "tn40xx-14.fw", &priv->pdev->dev);
+	if (ret)
+		return ret;
+
+	master = read_reg(priv, REG_INIT_SEMAPHORE);
+	if (!read_reg(priv, REG_INIT_STATUS) && master) {
+		netdev_dbg(priv->ndev, "Loading FW...\n");
+		bdx_tx_push_desc_safe(priv, (void *)fw->data, fw->size);
+		mdelay(100);
+	}
+	for (i = 0; i < 200; i++) {
+		if (read_reg(priv, REG_INIT_STATUS))
+			break;
+		mdelay(2);
+	}
+	if (master)
+		write_reg(priv, REG_INIT_SEMAPHORE, 1);
+
+	if (i == 200) {
+		netdev_err(priv->ndev, "firmware loading failed\n");
+		netdev_dbg(priv->ndev, "VPC = 0x%x VIC = 0x%x INIT_STATUS = 0x%x i = %d\n",
+			   read_reg(priv, REG_VPC),
+			   read_reg(priv, REG_VIC), read_reg(priv, REG_INIT_STATUS), i);
+		ret = -EIO;
+	} else {
+		netdev_dbg(priv->ndev, "firmware loading success\n");
+	}
+	release_firmware(fw);
+	return ret;
+}
+
+static void bdx_restore_mac(struct net_device *ndev, struct bdx_priv *priv)
+{
+	u32 val;
+
+	netdev_dbg(priv->ndev, "mac0 =%x mac1 =%x mac2 =%x\n",
+		   read_reg(priv, REG_UNC_MAC0_A),
+		   read_reg(priv, REG_UNC_MAC1_A), read_reg(priv, REG_UNC_MAC2_A));
+
+	val = (ndev->dev_addr[0] << 8) | (ndev->dev_addr[1]);
+	write_reg(priv, REG_UNC_MAC2_A, val);
+	val = (ndev->dev_addr[2] << 8) | (ndev->dev_addr[3]);
+	write_reg(priv, REG_UNC_MAC1_A, val);
+	val = (ndev->dev_addr[4] << 8) | (ndev->dev_addr[5]);
+	write_reg(priv, REG_UNC_MAC0_A, val);
+
+	/* More than the IP MAC address */
+	write_reg(priv, REG_MAC_ADDR_0,
+		  (ndev->dev_addr[3] << 24) | (ndev->dev_addr[2] << 16) |
+		  (ndev->dev_addr[1]
+		   << 8) | (ndev->dev_addr[0]));
+	write_reg(priv, REG_MAC_ADDR_1,
+		  (ndev->dev_addr[5] << 8) | (ndev->dev_addr[4]));
+
+	netdev_dbg(priv->ndev, "mac0 =%x mac1 =%x mac2 =%x\n",
+		   read_reg(priv, REG_UNC_MAC0_A),
+		   read_reg(priv, REG_UNC_MAC1_A), read_reg(priv, REG_UNC_MAC2_A));
+}
+
+static int bdx_hw_start(struct bdx_priv *priv)
+{
+	write_reg(priv, REG_FRM_LENGTH, 0X3FE0);
+	write_reg(priv, REG_GMAC_RXF_A, 0X10fd);
+	/*MikeFix1 */
+	/*L0: 0x103c , L1: 0x1040 , L2: 0x1044 , L3: 0x1048 =0x81644 */
+	write_reg(priv, 0x103c, 0x81644);	/*ETHSD.L0_TX_DCNT  */
+	write_reg(priv, 0x1040, 0x81644);	/*ETHSD.L1_TX_DCNT  */
+	write_reg(priv, 0x1044, 0x81644);	/*ETHSD.L2_TX_DCNT  */
+	write_reg(priv, 0x1048, 0x81644);	/*ETHSD.L3_TX_DCNT  */
+	write_reg(priv, REG_RX_FIFO_SECTION, 0x10);
+	write_reg(priv, REG_TX_FIFO_SECTION, 0xE00010);
+	write_reg(priv, REG_RX_FULLNESS, 0);
+	write_reg(priv, REG_TX_FULLNESS, 0);
+
+	write_reg(priv, REG_VGLB, 0);
+	write_reg(priv, REG_RDINTCM0, priv->rdintcm);
+	write_reg(priv, REG_RDINTCM2, 0);
+
+	write_reg(priv, REG_TDINTCM0, priv->tdintcm);	/* old val = 0x300064 */
+
+	/* Enable timer interrupt once in 2 secs. */
+	bdx_restore_mac(priv->ndev, priv);
+
+	/* Pause frame */
+	write_reg(priv, 0x12E0, 0x28);
+	write_reg(priv, REG_PAUSE_QUANT, 0xFFFF);
+	write_reg(priv, 0x6064, 0xF);
+
+	write_reg(priv, REG_GMAC_RXF_A,
+		  GMAC_RX_FILTER_OSEN | GMAC_RX_FILTER_TXFC | GMAC_RX_FILTER_AM
+		  | GMAC_RX_FILTER_AB);
+
+	bdx_link_changed(priv);
+	bdx_enable_interrupts(priv);
+	return 0;
+}
+
+static int bdx_hw_reset(struct bdx_priv *priv)
+{
+	u32 val, i;
+
+	/* Reset sequence: read, write 1, read, write 0 */
+	val = read_reg(priv, REG_CLKPLL);
+	write_reg(priv, REG_CLKPLL, (val | CLKPLL_SFTRST) + 0x8);
+	udelay(50);
+	val = read_reg(priv, REG_CLKPLL);
+	write_reg(priv, REG_CLKPLL, val & ~CLKPLL_SFTRST);
+
+	/* Check that the PLLs are locked and reset ended */
+	for (i = 0; i < 70; i++, mdelay(10)) {
+		if ((read_reg(priv, REG_CLKPLL) & CLKPLL_LKD) == CLKPLL_LKD) {
+			udelay(50);
+			/* Do any PCI-E read transaction */
+			read_reg(priv, REG_RXD_CFG0_0);
+			return 0;
+		}
+	}
+	return -EIO;
+}
+
+static int bdx_sw_reset(struct bdx_priv *priv)
+{
+	int i;
+
+	/* 1. load MAC (obsolete) */
+	/* 2. disable Rx (and Tx) */
+	write_reg(priv, REG_GMAC_RXF_A, 0);
+	mdelay(100);
+	/* 3. Disable port */
+	write_reg(priv, REG_DIS_PORT, 1);
+	/* 4. Disable queue */
+	write_reg(priv, REG_DIS_QU, 1);
+	/* 5. Wait until hw is disabled */
+	for (i = 0; i < 50; i++) {
+		if (read_reg(priv, REG_RST_PORT) & 1)
+			break;
+		mdelay(10);
+	}
+	if (i == 50)
+		netdev_err(priv->ndev, "SW reset timeout, continuing anyway\n");
+	/* 6. Disable interrupts */
+	write_reg(priv, REG_RDINTCM0, 0);
+	write_reg(priv, REG_TDINTCM0, 0);
+	write_reg(priv, REG_IMR, 0);
+	read_reg(priv, REG_ISR);
+
+	/* 7. Reset queue */
+	write_reg(priv, REG_RST_QU, 1);
+	/* 8. Reset port */
+	write_reg(priv, REG_RST_PORT, 1);
+	/* 9. Zero all read and write pointers */
+	for (i = REG_TXD_WPTR_0; i <= REG_TXF_RPTR_3; i += 0x10)
+		write_reg(priv, i, 0);
+	/* 10. Unset port disable */
+	write_reg(priv, REG_DIS_PORT, 0);
+	/* 11. Unset queue disable */
+	write_reg(priv, REG_DIS_QU, 0);
+	/* 12. Unset queue reset */
+	write_reg(priv, REG_RST_QU, 0);
+	/* 13. Unset port reset */
+	write_reg(priv, REG_RST_PORT, 0);
+	/* 14. Enable Rx */
+	/* Skipped. will be done later */
+	return 0;
+}
+
+static int bdx_start(struct bdx_priv *priv)
+{
+	int ret;
+
+	ret = create_tx_ring(priv);
+	if (ret) {
+		netdev_err(priv->ndev, "failed to create tx ring %d\n", ret);
+		return ret;
+	}
+
+	ret = request_irq(priv->pdev->irq, &bdx_isr_napi, IRQF_SHARED,
+			  priv->ndev->name, priv->ndev);
+	if (ret) {
+		netdev_err(priv->ndev, "failed to request irq %d\n", ret);
+		goto err_tx_ring;
+	}
+
+	ret = bdx_hw_start(priv);
+	if (ret) {
+		netdev_err(priv->ndev, "failed to start HW %d\n", ret);
+		goto err_free_irq;
+	}
+	return 0;
+err_free_irq:
+	free_irq(priv->pdev->irq, priv->ndev);
+err_tx_ring:
+	destroy_tx_ring(priv);
+	return ret;
+}
+
+static int bdx_close(struct net_device *ndev)
+{
+	struct bdx_priv *priv = netdev_priv(ndev);
+
+	netif_carrier_off(ndev);
+
+	bdx_disable_interrupts(priv);
+	free_irq(priv->pdev->irq, priv->ndev);
+	bdx_sw_reset(priv);
+	destroy_tx_ring(priv);
+	return 0;
+}
+
+static int bdx_open(struct net_device *dev)
+{
+	struct bdx_priv *priv = netdev_priv(dev);
+	int ret;
+
+	bdx_sw_reset(priv);
+	ret = bdx_start(priv);
+	if (ret) {
+		netdev_err(dev, "failed to start %d\n", ret);
+		return ret;
+	}
+	return 0;
+}
+
+static void __bdx_vlan_rx_vid(struct net_device *ndev, uint16_t vid, int enable)
+{
+	struct bdx_priv *priv = netdev_priv(ndev);
+	u32 reg, bit, val;
+
+	netdev_dbg(priv->ndev, "vid =%d value =%d\n", (int)vid, enable);
+	if (unlikely(vid >= 4096)) {
+		netdev_err(priv->ndev, "invalid VID: %u (>= 4096)\n", vid);
+		return;
+	}
+	reg = REG_VLAN_0 + (vid / 32) * 4;
+	bit = 1 << vid % 32;
+	val = read_reg(priv, reg);
+	netdev_dbg(priv->ndev, "reg =%x, val =%x, bit =%d\n", reg, val, bit);
+	if (enable)
+		val |= bit;
+	else
+		val &= ~bit;
+	netdev_dbg(priv->ndev, "new val %x\n", val);
+	write_reg(priv, reg, val);
+}
+
+static int bdx_vlan_rx_add_vid(struct net_device *ndev,
+			       __always_unused __be16 proto, u16 vid)
+{
+	__bdx_vlan_rx_vid(ndev, vid, 1);
+	return 0;
+}
+
+static int bdx_vlan_rx_kill_vid(struct net_device *ndev,
+				__always_unused __be16 proto, u16 vid)
+{
+	__bdx_vlan_rx_vid(ndev, vid, 0);
+	return 0;
+}
+
+static void bdx_setmulti(struct net_device *ndev)
+{
+	struct bdx_priv *priv = netdev_priv(ndev);
+
+	u32 rxf_val =
+	    GMAC_RX_FILTER_AM | GMAC_RX_FILTER_AB | GMAC_RX_FILTER_OSEN |
+	    GMAC_RX_FILTER_TXFC;
+	int i;
+
+	/* IMF - imperfect (hash) rx multicast filter */
+	/* PMF - perfect rx multicast filter */
+
+	/* FIXME: RXE(OFF) */
+	if (ndev->flags & IFF_PROMISC) {
+		rxf_val |= GMAC_RX_FILTER_PRM;
+	} else if (ndev->flags & IFF_ALLMULTI) {
+		/* set IMF to accept all multicast frames */
+		for (i = 0; i < MAC_MCST_HASH_NUM; i++)
+			write_reg(priv, REG_RX_MCST_HASH0 + i * 4, ~0);
+	} else if (netdev_mc_count(ndev)) {
+		u8 hash;
+		struct netdev_hw_addr *mclist;
+		u32 reg, val;
+
+		/* Set IMF to deny all multicast frames */
+		for (i = 0; i < MAC_MCST_HASH_NUM; i++)
+			write_reg(priv, REG_RX_MCST_HASH0 + i * 4, 0);
+
+		/* Set PMF to deny all multicast frames */
+		for (i = 0; i < MAC_MCST_NUM; i++) {
+			write_reg(priv, REG_RX_MAC_MCST0 + i * 8, 0);
+			write_reg(priv, REG_RX_MAC_MCST1 + i * 8, 0);
+		}
+		/* Use PMF to accept first MAC_MCST_NUM (15) addresses */
+
+		/* TBD: Sort the addresses and write them in ascending
+		 * order into the RX_MAC_MCST regs, accepting the rest
+		 * of the addresses through the IMF. We skip this phase
+		 * for now and accept ALL multicast frames through the
+		 * IMF.
+		 */
+		netdev_for_each_mc_addr(mclist, ndev) {
+			hash = 0;
+			for (i = 0; i < ETH_ALEN; i++)
+				hash ^= mclist->addr[i];
+
+			reg = REG_RX_MCST_HASH0 + ((hash >> 5) << 2);
+			val = read_reg(priv, reg);
+			val |= (1 << (hash % 32));
+			write_reg(priv, reg, val);
+		}
+	} else {
+		rxf_val |= GMAC_RX_FILTER_AB;
+	}
+	write_reg(priv, REG_GMAC_RXF_A, rxf_val);
+	/* Enable RX */
+	/* FIXME: RXE(ON) */
+}
+
+static int bdx_set_mac(struct net_device *ndev, void *p)
+{
+	struct bdx_priv *priv = netdev_priv(ndev);
+	struct sockaddr *addr = p;
+
+	eth_hw_addr_set(ndev, addr->sa_data);
+	bdx_restore_mac(ndev, priv);
+	return 0;
+}
+
+static void bdx_mac_init(struct bdx_priv *priv)
+{
+	u8 addr[ETH_ALEN];
+	u64 val;
+
+	val = (u64)read_reg(priv, REG_UNC_MAC0_A);
+	val |= (u64)read_reg(priv, REG_UNC_MAC1_A) << 16;
+	val |= (u64)read_reg(priv, REG_UNC_MAC2_A) << 32;
+
+	u64_to_ether_addr(val, addr);
+	eth_hw_addr_set(priv->ndev, addr);
+}
+
+static struct net_device_stats *bdx_get_stats(struct net_device *ndev)
+{
+	struct bdx_priv *priv = netdev_priv(ndev);
+
+	return &priv->net_stats;
+}
+
+static const struct net_device_ops bdx_netdev_ops = {
+	.ndo_open = bdx_open,
+	.ndo_stop = bdx_close,
+	.ndo_start_xmit = bdx_start_xmit,
+	.ndo_validate_addr = eth_validate_addr,
+	.ndo_set_rx_mode = bdx_setmulti,
+	.ndo_get_stats = bdx_get_stats,
+	.ndo_set_mac_address = bdx_set_mac,
+	.ndo_vlan_rx_add_vid = bdx_vlan_rx_add_vid,
+	.ndo_vlan_rx_kill_vid = bdx_vlan_rx_kill_vid,
+};
+
+static int bdx_priv_init(struct bdx_priv *priv)
+{
+	int ret;
+
+	ret = bdx_hw_reset(priv);
+	if (ret)
+		return ret;
+
+	/* Set GPIO[9:0] to output 0 */
+	write_reg(priv, 0x51E0, 0x30010006);	/* GPIO_OE_ WR CMD */
+	write_reg(priv, 0x51F0, 0x0);	/* GPIO_OE_ DATA */
+	write_reg(priv, REG_MDIO_CMD_STAT, 0x3ec8);
+
+	/* We use tx descriptors to load the firmware. */
+	ret = create_tx_ring(priv);
+	if (ret)
+		return ret;
+	ret = bdx_fw_load(priv);
+	destroy_tx_ring(priv);
+	return ret;
+}
+
+static struct net_device *bdx_netdev_alloc(struct pci_dev *pdev)
+{
+	struct net_device *ndev;
+
+	ndev = alloc_etherdev(sizeof(struct bdx_priv));
+	if (!ndev)
+		return NULL;
+	ndev->netdev_ops = &bdx_netdev_ops;
+	ndev->tx_queue_len = BDX_NDEV_TXQ_LEN;
+	ndev->mem_start = pci_resource_start(pdev, 0);
+	ndev->mem_end = pci_resource_end(pdev, 0);
+	ndev->min_mtu = ETH_ZLEN;
+	ndev->max_mtu = BDX_MAX_MTU;
+
+	ndev->features = NETIF_F_IP_CSUM |
+		NETIF_F_SG |
+		NETIF_F_FRAGLIST |
+		NETIF_F_TSO | NETIF_F_GRO |
+		NETIF_F_RXCSUM |
+		NETIF_F_RXHASH |
+		NETIF_F_HW_VLAN_CTAG_TX |
+		NETIF_F_HW_VLAN_CTAG_RX |
+		NETIF_F_HW_VLAN_CTAG_FILTER;
+	ndev->vlan_features = NETIF_F_IP_CSUM |
+			       NETIF_F_SG |
+			       NETIF_F_TSO | NETIF_F_GRO | NETIF_F_RXHASH;
+
+	if (dma_get_mask(&pdev->dev) == DMA_BIT_MASK(64)) {
+		ndev->features |= NETIF_F_HIGHDMA;
+		ndev->vlan_features |= NETIF_F_HIGHDMA;
+	}
+	ndev->hw_features |= ndev->features;
+
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+	netif_carrier_off(ndev);
+	netif_stop_queue(ndev);
+
+	return ndev;
+}
+
 static int bdx_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
+	struct net_device *ndev;
+	struct bdx_priv *priv;
 	int ret;
+	unsigned int nvec = 1;
+	void __iomem *regs;
+
+	init_txd_sizes();
 
 	ret = pci_enable_device(pdev);
 	if (ret)
@@ -18,7 +1179,87 @@ static int bdx_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 			goto err_disable_device;
 		}
 	}
+
+	ret = pci_request_regions(pdev, BDX_DRV_NAME);
+	if (ret) {
+		dev_err(&pdev->dev, "failed to request PCI regions.\n");
+		goto err_disable_device;
+	}
+
+	pci_set_master(pdev);
+
+	regs = pci_iomap(pdev, 0, BDX_REGS_SIZE);
+	if (!regs) {
+		ret = -EIO;
+		dev_err(&pdev->dev, "failed to map PCI bar.\n");
+		goto err_free_regions;
+	}
+
+	ndev = bdx_netdev_alloc(pdev);
+	if (!ndev) {
+		ret = -ENOMEM;
+		dev_err(&pdev->dev, "failed to allocate netdev.\n");
+		goto err_iounmap;
+	}
+
+	priv = netdev_priv(ndev);
+	pci_set_drvdata(pdev, priv);
+
+	priv->regs = regs;
+	priv->pdev = pdev;
+	priv->ndev = ndev;
+	/* Initialize fifo sizes. */
+	priv->txd_size = 3;
+	priv->txf_size = 3;
+	priv->rxd_size = 3;
+	priv->rxf_size = 3;
+	/* Initialize the initial coalescing registers. */
+	priv->rdintcm = INT_REG_VAL(0x20, 1, 4, 12);
+	priv->tdintcm = INT_REG_VAL(0x20, 1, 0, 12);
+
+	ret = bdx_hw_reset(priv);
+	if (ret) {
+		dev_err(&pdev->dev, "failed to reset HW.\n");
+		goto err_free_netdev;
+	}
+
+	ret = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSI);
+	if (ret < 0) {
+		dev_err(&pdev->dev, "failed to allocate irq.\n");
+		goto err_free_netdev;
+	}
+
+	priv->stats_flag = ((read_reg(priv, FPGA_VER) & 0xFFF) != 308);
+
+	priv->isr_mask =
+	    IR_RX_FREE_0 | IR_LNKCHG0 | IR_PSE | IR_TMR0 | IR_RX_DESC_0 |
+	    IR_TX_FREE_0 | IR_TMR1;
+
+	bdx_mac_init(priv);
+	ret = register_netdev(ndev);
+	if (ret) {
+		dev_err(&pdev->dev, "failed to register netdev.\n");
+		goto err_free_irq;
+	}
+
+	ret = bdx_priv_init(priv);
+	if (ret) {
+		dev_err(&pdev->dev, "failed to initialize bdx_priv.\n");
+		goto err_unregister_netdev;
+	}
+
 	return 0;
+err_unregister_netdev:
+	unregister_netdev(ndev);
+err_free_irq:
+	pci_free_irq_vectors(pdev);
+err_free_netdev:
+	pci_set_drvdata(pdev, NULL);
+	free_netdev(ndev);
+err_iounmap:
+	iounmap(regs);
+err_free_regions:
+	pci_release_regions(pdev);
 err_disable_device:
 	pci_disable_device(pdev);
 	return ret;
@@ -26,7 +1267,17 @@ static int bdx_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 static void bdx_remove(struct pci_dev *pdev)
 {
+	struct bdx_priv *priv = pci_get_drvdata(pdev);
+	struct net_device *ndev = priv->ndev;
+
+	unregister_netdev(ndev);
+
+	pci_free_irq_vectors(priv->pdev);
+	pci_set_drvdata(pdev, NULL);
+	iounmap(priv->regs);
+	pci_release_regions(pdev);
 	pci_disable_device(pdev);
+	free_netdev(ndev);
 }
 
 static const struct pci_device_id bdx_id_table[] = {
diff --git a/drivers/net/ethernet/tehuti/tn40.h b/drivers/net/ethernet/tehuti/tn40.h
index ed43ba027dc5..e8044e9d06eb 100644
--- a/drivers/net/ethernet/tehuti/tn40.h
+++ b/drivers/net/ethernet/tehuti/tn40.h
@@ -4,9 +4,21 @@
 #ifndef _TN40_H_
 #define _TN40_H_
 
+#include <linux/crc32.h>
+#include <linux/delay.h>
+#include <linux/etherdevice.h>
+#include <linux/firmware.h>
+#include <linux/if_ether.h>
+#include <linux/if_vlan.h>
+#include <linux/in.h>
+#include <linux/interrupt.h>
+#include <linux/ip.h>
 #include <linux/module.h>
-#include <linux/kernel.h>
+#include <linux/netdevice.h>
 #include <linux/pci.h>
+#include <linux/phy.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
 #include <linux/version.h>
 
 #include "tn40_regs.h"
@@ -16,4 +28,182 @@
 
 #define PCI_VENDOR_ID_EDIMAX 0x1432
 
+#define MDIO_SPEED_1MHZ (1)
+#define MDIO_SPEED_6MHZ (6)
+
+/* netdev tx queue length for Luxor. The kernel default is 1000.
+ * Run 'ifconfig eth1 txqueuelen 3000' to change it at runtime.
+ */
+#define BDX_NDEV_TXQ_LEN 3000
+
+#define FIFO_SIZE 4096
+#define FIFO_EXTRA_SPACE 1024
+
+#if BITS_PER_LONG == 64
+#define H32_64(x) ((u32)((u64)(x) >> 32))
+#define L32_64(x) ((u32)((u64)(x) & 0xffffffff))
+#elif BITS_PER_LONG == 32
+#define H32_64(x) 0
+#define L32_64(x) ((u32)(x))
+#else /* BITS_PER_LONG == ?? */
+#error BITS_PER_LONG is undefined. Must be 64 or 32
+#endif /* BITS_PER_LONG */
+
+#define BDX_TXF_DESC_SZ 16
+#define BDX_MAX_TX_LEVEL (priv->txd_fifo0.m.memsz - 16)
+#define BDX_MIN_TX_LEVEL 256
+#define BDX_NO_UPD_PACKETS 40
+#define BDX_MAX_MTU BIT(14)
+
+#define PCK_TH_MULT 128
+#define INT_COAL_MULT 2
+
+#define BITS_MASK(nbits) ((1 << (nbits)) - 1)
+#define GET_BITS_SHIFT(x, nbits, nshift) (((x) >> (nshift)) & BITS_MASK(nbits))
+#define BITS_SHIFT_MASK(nbits, nshift) (BITS_MASK(nbits) << (nshift))
+#define BITS_SHIFT_VAL(x, nbits, nshift) (((x) & BITS_MASK(nbits)) << (nshift))
+#define BITS_SHIFT_CLEAR(x, nbits, nshift) \
+	((x) & (~BITS_SHIFT_MASK(nbits, (nshift))))
+
+#define GET_INT_COAL(x) GET_BITS_SHIFT(x, 15, 0)
+#define GET_INT_COAL_RC(x) GET_BITS_SHIFT(x, 1, 15)
+#define GET_RXF_TH(x) GET_BITS_SHIFT(x, 4, 16)
+#define GET_PCK_TH(x) GET_BITS_SHIFT(x, 4, 20)
+
+#define INT_REG_VAL(coal, coal_rc, rxf_th, pck_th) \
+	((coal) | ((coal_rc) << 15) | ((rxf_th) << 16) | ((pck_th) << 20))
+
+struct fifo {
+	dma_addr_t da; /* Physical address of fifo (used by HW) */
+	char *va; /* Virtual address of fifo (used by SW) */
+	/* Cached values of the RPTR and WPTR registers; they are 32 bits
+	 * wide on both 32-bit and 64-bit archs.
+	 */
+	u32 rptr, wptr;
+	u16 reg_cfg0;
+	u16 reg_cfg1;
+	u16 reg_rptr;
+	u16 reg_wptr;
+	u16 memsz; /* Memory size allocated for fifo */
+	u16 size_mask;
+	u16 pktsz; /* Skb packet size to allocate */
+	u16 rcvno; /* Number of buffers that come from this RXF */
+};
+
+struct txf_fifo {
+	struct fifo m; /* The minimal set of variables used by all fifos */
+};
+
+struct txd_fifo {
+	struct fifo m; /* The minimal set of variables used by all fifos */
+};
+
+union bdx_dma_addr {
+	dma_addr_t dma;
+	struct sk_buff *skb;
+};
+
+/* Entry in the db.
+ * if len == 0 addr is dma
+ * if len != 0 addr is skb
+ */
+struct tx_map {
+	union bdx_dma_addr addr;
+	int len;
+};
+
+/* tx database - implemented as circular fifo buffer */
+struct txdb {
+	struct tx_map *start; /* Points to the first element */
+	struct tx_map *end; /* Points just AFTER the last element */
+	struct tx_map *rptr; /* Points to the next element to read */
+	struct tx_map *wptr; /* Points to the next element to write */
+	int size; /* Number of elements in the db */
+};
+
+struct bdx_priv {
+	struct net_device *ndev;
+	struct pci_dev *pdev;
+
+	/* Tx FIFOs: 1 for data desc, 1 for empty (acks) desc */
+	struct txd_fifo txd_fifo0;
+	struct txf_fifo txf_fifo0;
+	struct txdb txdb;
+	int tx_level;
+	int tx_update_mark;
+	int tx_noupd;
+
+	int stats_flag;
+	struct net_device_stats net_stats;
+
+	u8 txd_size;
+	u8 txf_size;
+	u8 rxd_size;
+	u8 rxf_size;
+	u32 rdintcm;
+	u32 tdintcm;
+
+	u32 isr_mask;
+	int link;
+	u32 link_loop_cnt;
+
+	void __iomem *regs;
+
+	/* SHORT_PKT_FIX */
+	u32 b0_len;
+	dma_addr_t b0_dma; /* Physical address of buffer */
+	char *b0_va; /* Virtual address of buffer */
+};
+
+#define MAX_PBL (19)
+/* PBL describes each virtual buffer to be transmitted from the host. */
+struct pbl {
+	u32 pa_lo;
+	u32 pa_hi;
+	u32 len;
+};
+
+/* First word for TXD descriptor. It means: type = 3 for regular Tx packet,
+ * hw_csum = 7 for IP+UDP+TCP HW checksums.
+ */
+#define TXD_W1_VAL(bc, checksum, vtag, lgsnd, vlan_id)               \
+	((bc) | ((checksum) << 5) | ((vtag) << 8) | ((lgsnd) << 9) | \
+	 (0x30000) | (((vlan_id) & 0x0fff) << 20) |                  \
+	 ((((vlan_id) >> 13) & 7) << 13))
+
+struct txd_desc {
+	u32 txd_val1;
+	u16 mss;
+	u16 length;
+	u32 va_lo;
+	u32 va_hi;
+	struct pbl pbl[]; /* Fragments */
+} __packed;
+
+struct txf_desc {
+	u32 status;
+	u32 va_lo; /* VAdr[31:0] */
+	u32 va_hi; /* VAdr[63:32] */
+	u32 pad;
+} __packed;
+
+/* 32 bit kernels use 16 bits for page_offset. Do not increase
+ * LUXOR__MAX_PAGE_SIZE beyond 64K!
+ */
+#if BITS_PER_LONG > 32
+#define LUXOR__MAX_PAGE_SIZE 0x40000
+#else
+#define LUXOR__MAX_PAGE_SIZE 0x10000
+#endif
+
+static inline u32 read_reg(struct bdx_priv *priv, u32 reg)
+{
+	return readl(priv->regs + reg);
+}
+
+static inline void write_reg(struct bdx_priv *priv, u32 reg, u32 val)
+{
+	writel(val, priv->regs + reg);
+}
+
 #endif /* _TN40_H_ */
-- 
2.34.1



* [PATCH net-next v1 4/5] net: tn40xx: add basic Rx handling
  2024-04-15 10:43 [PATCH net-next v1 0/5] add ethernet driver for Tehuti Networks TN40xx chips FUJITA Tomonori
                   ` (2 preceding siblings ...)
  2024-04-15 10:43 ` [PATCH net-next v1 3/5] net: tn40xx: add basic Tx handling FUJITA Tomonori
@ 2024-04-15 10:43 ` FUJITA Tomonori
  2024-04-16  0:38   ` kernel test robot
  2024-04-15 10:43 ` [PATCH net-next v1 5/5] net: tn40xx: add PHYLIB support FUJITA Tomonori
  2024-04-15 23:21 ` [PATCH net-next v1 0/5] add ethernet driver for Tehuti Networks TN40xx chips Florian Fainelli
  5 siblings, 1 reply; 22+ messages in thread
From: FUJITA Tomonori @ 2024-04-15 10:43 UTC (permalink / raw)
  To: netdev; +Cc: andrew

This patch adds basic Rx handling. The Rx logic uses three major data
structures: two ring buffers shared with the NIC and one database. One
ring buffer is used to pass information to the NIC about the memory in
which received packets are to be stored. The other is used to get
information from the NIC about received packets. The database keeps
track of the DMA mappings. After a packet arrives, the database is
used to pass the packet to the network stack.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
---
 drivers/net/ethernet/tehuti/tn40.c | 643 ++++++++++++++++++++++++++++-
 drivers/net/ethernet/tehuti/tn40.h |  77 ++++
 2 files changed, 719 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/tehuti/tn40.c b/drivers/net/ethernet/tehuti/tn40.c
index 0798df468fc3..c8ed9b743753 100644
--- a/drivers/net/ethernet/tehuti/tn40.c
+++ b/drivers/net/ethernet/tehuti/tn40.c
@@ -48,6 +48,559 @@ static void bdx_fifo_free(struct bdx_priv *priv, struct fifo *f)
 			  f->memsz + FIFO_EXTRA_SPACE, f->va, f->da);
 }
 
+static struct rxdb *bdx_rxdb_alloc(int nelem)
+{
+	struct rxdb *db;
+	int i;
+	size_t size = sizeof(struct rxdb) + (nelem * sizeof(int)) +
+	    (nelem * sizeof(struct rx_map));
+
+	db = vzalloc(size);
+	if (db) {
+		db->stack = (int *)(db + 1);
+		db->elems = (void *)(db->stack + nelem);
+		db->nelem = nelem;
+		db->top = nelem;
+		/* make the first alloc close to db struct */
+		for (i = 0; i < nelem; i++)
+			db->stack[i] = nelem - i - 1;
+	}
+	return db;
+}
+
+static void bdx_rxdb_free(struct rxdb *db)
+{
+	vfree(db);
+}
+
+static inline int bdx_rxdb_alloc_elem(struct rxdb *db)
+{
+	return db->stack[--(db->top)];
+}
+
+static inline void *bdx_rxdb_addr_elem(struct rxdb *db, unsigned int n)
+{
+	return db->elems + n;
+}
+
+static inline int bdx_rxdb_available(struct rxdb *db)
+{
+	return db->top;
+}
+
+static inline void bdx_rxdb_free_elem(struct rxdb *db, unsigned int n)
+{
+	db->stack[(db->top)++] = n;
+}
+
+static void bdx_rx_vlan(struct bdx_priv *priv, struct sk_buff *skb,
+			u32 rxd_val1, u16 rxd_vlan)
+{
+	if (GET_RXD_VTAG(rxd_val1))	/* Vlan case */
+		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
+				       le16_to_cpu(GET_RXD_VLAN_TCI(rxd_vlan)));
+}
+
+static inline struct bdx_page *bdx_rx_page(struct rx_map *dm)
+{
+	return &dm->bdx_page;
+}
+
+static struct bdx_page *bdx_rx_get_page(struct bdx_priv *priv)
+{
+	gfp_t gfp_mask;
+	int page_size = priv->rx_page_table.page_size;
+	struct bdx_page *bdx_page = &priv->rx_page_table.bdx_pages;
+	struct page *page;
+	dma_addr_t dma;
+
+	gfp_mask = GFP_ATOMIC | __GFP_NOWARN;
+	if (page_size > PAGE_SIZE)
+		gfp_mask |= __GFP_COMP;
+
+	page = alloc_pages(gfp_mask, get_order(page_size));
+	if (likely(page)) {
+		netdev_dbg(priv->ndev, "map page %p size %d\n", page, page_size);
+		dma = dma_map_page(&priv->pdev->dev, page, 0, page_size,
+				   DMA_FROM_DEVICE);
+		if (unlikely(dma_mapping_error(&priv->pdev->dev, dma))) {
+			netdev_err(priv->ndev, "failed to map page %d\n", page_size);
+			__free_pages(page, get_order(page_size));
+			return NULL;
+		}
+	} else {
+		return NULL;
+	}
+
+	bdx_page->page = page;
+	bdx_page->dma = dma;
+	return bdx_page;
+}
+
+static int bdx_rx_get_page_size(struct bdx_priv *priv)
+{
+	struct rxdb *db = priv->rxdb0;
+	int dno = bdx_rxdb_available(db) - 1;
+
+	priv->rx_page_table.page_size =
+	    min(LUXOR__MAX_PAGE_SIZE, dno * priv->rx_page_table.buf_size);
+
+	return priv->rx_page_table.page_size;
+}
+
+static void bdx_rx_reuse_page(struct bdx_priv *priv, struct rx_map *dm)
+{
+	netdev_dbg(priv->ndev, "dm size %d off %d dma %p\n", dm->size, dm->off,
+		   (void *)dm->dma);
+	if (dm->off == 0) {
+		netdev_dbg(priv->ndev, "unmap page %p size %d\n", (void *)dm->dma, dm->size);
+		dma_unmap_page(&priv->pdev->dev, dm->dma, dm->size,
+			       DMA_FROM_DEVICE);
+	}
+}
+
+static void bdx_rx_ref_page(struct bdx_page *bdx_page)
+{
+	get_page(bdx_page->page);
+}
+
+static void bdx_rx_put_page(struct bdx_priv *priv, struct rx_map *dm)
+{
+	if (dm->off == 0)
+		dma_unmap_page(&priv->pdev->dev, dm->dma, dm->size,
+			       DMA_FROM_DEVICE);
+	put_page(dm->bdx_page.page);
+}
+
+static void bdx_rx_set_dm_page(register struct rx_map *dm,
+			       struct bdx_page *bdx_page)
+{
+	dm->bdx_page.page = bdx_page->page;
+}
+
+/**
+ * create_rx_ring - Initialize all Rx related HW and SW resources
+ * @priv: NIC private structure
+ *
+ * create_rx_ring creates the rxf and rxd fifos, updates the relevant HW
+ * registers and allocates the rx buffer database. It assumes that Rx is
+ * disabled in HW. Functions are grouped for better cache usage.
+ *
+ * RxD fifo is smaller than RxF fifo by design. Upon high load, RxD will be
+ * filled and packets will be dropped by the NIC without getting into the host
+ * or generating interrupts. In this situation the host has no chance of
+ * processing all the packets. Dropping packets by the NIC is cheaper, since it
+ * takes 0 CPU cycles.
+ *
+ * Return: 0 on success and negative value on error.
+ */
+static int create_rx_ring(struct bdx_priv *priv)
+{
+	int ret, pkt_size;
+
+	ret = bdx_fifo_alloc(priv, &priv->rxd_fifo0.m, priv->rxd_size,
+			     REG_RXD_CFG0_0, REG_RXD_CFG1_0,
+			     REG_RXD_RPTR_0, REG_RXD_WPTR_0);
+	if (ret)
+		return ret;
+
+	ret = bdx_fifo_alloc(priv, &priv->rxf_fifo0.m, priv->rxf_size,
+			     REG_RXF_CFG0_0, REG_RXF_CFG1_0,
+			     REG_RXF_RPTR_0, REG_RXF_WPTR_0);
+	if (ret)
+		goto err_free_rxd;
+
+	pkt_size = priv->ndev->mtu + VLAN_ETH_HLEN;
+	priv->rxf_fifo0.m.pktsz = pkt_size;
+	priv->rxdb0 =
+		bdx_rxdb_alloc(priv->rxf_fifo0.m.memsz / sizeof(struct rxf_desc));
+	if (!priv->rxdb0)
+		goto err_free_rxf;
+
+	priv->rx_page_table.buf_size = round_up(pkt_size, SMP_CACHE_BYTES);
+	return 0;
+err_free_rxf:
+	bdx_fifo_free(priv, &priv->rxf_fifo0.m);
+err_free_rxd:
+	bdx_fifo_free(priv, &priv->rxd_fifo0.m);
+	return ret;
+}
+
+static void bdx_rx_free_buffers(struct bdx_priv *priv, struct rxdb *db,
+				struct rxf_fifo *f)
+{
+	struct rx_map *dm;
+	u16 i;
+
+	netdev_dbg(priv->ndev, "total =%d free =%d busy =%d\n", db->nelem,
+		   bdx_rxdb_available(db), db->nelem - bdx_rxdb_available(db));
+	while (bdx_rxdb_available(db) > 0) {
+		i = bdx_rxdb_alloc_elem(db);
+		dm = bdx_rxdb_addr_elem(db, i);
+		dm->dma = 0;
+	}
+	for (i = 0; i < db->nelem; i++) {
+		dm = bdx_rxdb_addr_elem(db, i);
+		if (dm->dma && dm->bdx_page.page)
+			bdx_rx_put_page(priv, dm);
+	}
+}
+
+static void destroy_rx_ring(struct bdx_priv *priv)
+{
+	if (priv->rxdb0) {
+		bdx_rx_free_buffers(priv, priv->rxdb0, &priv->rxf_fifo0);
+		bdx_rxdb_free(priv->rxdb0);
+		priv->rxdb0 = NULL;
+	}
+	bdx_fifo_free(priv, &priv->rxf_fifo0.m);
+	bdx_fifo_free(priv, &priv->rxd_fifo0.m);
+}
+
+/**
+ * bdx_rx_alloc_buffers - Fill rxf fifo with new page buffers
+ *
+ * @priv: NIC's private structure
+ *
+ * bdx_rx_alloc_buffers allocates page buffers, builds rxf descriptors and
+ * pushes them into the rxf fifo. The buffers' virtual and physical addresses
+ * are stored in the rx db.
+ * To calculate the free space, we use the cached values of RPTR and WPTR
+ * when needed. This function also updates RPTR and WPTR.
+ */
+static void bdx_rx_alloc_buffers(struct bdx_priv *priv)
+{
+	int dno, delta, idx;
+	struct rxf_desc *rxfd;
+	struct rx_map *dm;
+	int page_size;
+	struct rxdb *db = priv->rxdb0;
+	struct rxf_fifo *f = &priv->rxf_fifo0;
+	int n_pages = 0;
+	struct bdx_page *bdx_page = NULL;
+	int buf_size = priv->rx_page_table.buf_size;
+	int page_off = -1;
+	u64 dma = 0ULL;
+
+	dno = bdx_rxdb_available(db) - 1;
+	page_size = bdx_rx_get_page_size(priv);
+	netdev_dbg(priv->ndev, "dno %d page_size %d buf_size %d\n", dno, page_size,
+		   priv->rx_page_table.buf_size);
+	while (dno > 0) {
+		/* We allocate large pages (i.e. 64KB) and store
+		 * multiple packet buffers in each page. The packet
+		 * buffers are stored backwards in each page (starting
+		 * from the highest address). We utilize the fact that
+		 * the last buffer in each page has a 0 offset to
+		 * detect that all the buffers were processed in order
+		 * to unmap the page.
+		 */
+		if (unlikely(page_off < 0)) {
+			bdx_page = bdx_rx_get_page(priv);
+			if (!bdx_page) {
+				u32 timeout = 1000000;	/* 1/5 sec */
+
+				write_reg(priv, 0x5154, timeout);
+				netdev_dbg(priv->ndev, "system memory is temporary low\n");
+				break;
+			}
+			page_off = ((page_size / buf_size) - 1) * buf_size;
+			dma = bdx_page->dma;
+			n_pages += 1;
+		} else {
+			bdx_rx_ref_page(bdx_page);
+			/* Page is already allocated and mapped, just
+			 * increment the page usage count.
+			 */
+		}
+		rxfd = (struct rxf_desc *)(f->m.va + f->m.wptr);
+		idx = bdx_rxdb_alloc_elem(db);
+		dm = bdx_rxdb_addr_elem(db, idx);
+		dm->size = page_size;
+		bdx_rx_set_dm_page(dm, bdx_page);
+		dm->off = page_off;
+		dm->dma = dma + page_off;
+		netdev_dbg(priv->ndev, "dm size %d off %d dma %p\n", dm->size, dm->off,
+			   (void *)dm->dma);
+		page_off -= buf_size;
+
+		rxfd->info = cpu_to_le32(0x10003);	/* INFO =1 BC =3 */
+		rxfd->va_lo = idx;
+		rxfd->pa_lo = cpu_to_le32(L32_64(dm->dma));
+		rxfd->pa_hi = cpu_to_le32(H32_64(dm->dma));
+		rxfd->len = cpu_to_le32(f->m.pktsz);
+		f->m.wptr += sizeof(struct rxf_desc);
+		delta = f->m.wptr - f->m.memsz;
+		if (unlikely(delta >= 0)) {
+			f->m.wptr = delta;
+			if (delta > 0) {
+				memcpy(f->m.va, f->m.va + f->m.memsz, delta);
+				netdev_dbg(priv->ndev, "wrapped rxd descriptor\n");
+			}
+		}
+		dno--;
+	}
+	netdev_dbg(priv->ndev, "n_pages %d\n", n_pages);
+	/* TBD: Do not update WPTR if no desc were written */
+	write_reg(priv, f->m.reg_wptr, f->m.wptr & TXF_WPTR_WR_PTR);
+	netdev_dbg(priv->ndev, "write_reg 0x%04x f->m.reg_wptr 0x%x\n", f->m.reg_wptr,
+		   f->m.wptr & TXF_WPTR_WR_PTR);
+	netdev_dbg(priv->ndev, "read_reg  0x%04x f->m.reg_rptr=0x%x\n", f->m.reg_rptr,
+		   read_reg(priv, f->m.reg_rptr));
+	netdev_dbg(priv->ndev, "write_reg 0x%04x f->m.reg_wptr=0x%x\n", f->m.reg_wptr,
+		   read_reg(priv, f->m.reg_wptr));
+}
+
+static void bdx_recycle_skb(struct bdx_priv *priv, struct rxd_desc *rxdd)
+{
+	struct rxdb *db = priv->rxdb0;
+	struct rx_map *dm = bdx_rxdb_addr_elem(db, rxdd->va_lo);
+	struct rxf_fifo *f = &priv->rxf_fifo0;
+	struct rxf_desc *rxfd = (struct rxf_desc *)(f->m.va + f->m.wptr);
+	int delta;
+
+	rxfd->info = cpu_to_le32(0x10003);	/* INFO=1 BC=3 */
+	rxfd->va_lo = rxdd->va_lo;
+	rxfd->pa_lo = cpu_to_le32(L32_64(dm->dma));
+	rxfd->pa_hi = cpu_to_le32(H32_64(dm->dma));
+	rxfd->len = cpu_to_le32(f->m.pktsz);
+	f->m.wptr += sizeof(struct rxf_desc);
+	delta = f->m.wptr - f->m.memsz;
+	if (unlikely(delta >= 0)) {
+		f->m.wptr = delta;
+		if (delta > 0) {
+			memcpy(f->m.va, f->m.va + f->m.memsz, delta);
+			netdev_dbg(priv->ndev, "wrapped rxf descriptor\n");
+		}
+	}
+}
+
+static inline u16 checksum(u16 *buf, u16 len, u16 *saddr, u16 *daddr, u16 proto)
+{
+	u32 sum;
+	u16 j = len;
+
+	sum = 0;
+	while (j > 1) {
+		sum += *buf++;
+		if (sum & 0x80000000)
+			sum = (sum & 0xFFFF) + (sum >> 16);
+
+		j -= 2;
+	}
+	if (j & 1)
+		sum += *((u8 *)buf);
+
+	/* Add the tcp pseudo-header */
+	sum += *(saddr++);
+	sum += *saddr;
+	sum += *(daddr++);
+	sum += *daddr;
+	sum += htons(proto);
+	sum += htons(len);
+	/* Fold 32-bit sum to 16 bits */
+	while (sum >> 16)
+		sum = (sum & 0xFFFF) + (sum >> 16);
+
+	/* One's complement of sum */
+	return ((u16)(sum));
+}
+
+static void bdx_skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page,
+				int off, int len)
+{
+	skb_add_rx_frag(skb, 0, page, off, len, SKB_TRUESIZE(len));
+}
+
+#define PKT_ERR_LEN		(70)
+
+static int bdx_rx_error(struct bdx_priv *priv, char *pkt, u32 rxd_err, u16 len)
+{
+	struct ethhdr *eth = (struct ethhdr *)pkt;
+	struct iphdr *iph =
+	    (struct iphdr *)(pkt + sizeof(struct ethhdr) +
+			     ((eth->h_proto ==
+			       htons(ETH_P_8021Q)) ? VLAN_HLEN : 0));
+	int ret = 1;
+
+	if (rxd_err == 0x8) {	/* UDP checksum error */
+		struct udphdr *udp =
+		    (struct udphdr *)((u8 *)iph + sizeof(struct iphdr));
+		if (udp->check == 0) {
+			netdev_dbg(priv->ndev, "false rxd_err = 0x%x\n", rxd_err);
+			ret = 0;	/* Work around H/W false error indication */
+		} else if (len < PKT_ERR_LEN) {
+			u16 sum = checksum((u16 *)udp,
+					   htons(iph->tot_len) -
+					   (iph->ihl * sizeof(u32)),
+					   (u16 *)&iph->saddr,
+					   (u16 *)&iph->daddr, IPPROTO_UDP);
+			if (sum == 0xFFFF) {
+				netdev_dbg(priv->ndev, "false rxd_err = 0x%x\n", rxd_err);
+				ret = 0;	/* Work around H/W false error indication */
+			}
+		}
+	} else if ((rxd_err == 0x10) && (len < PKT_ERR_LEN)) {	/* TCP checksum error */
+		u16 sum;
+		struct tcphdr *tcp =
+		    (struct tcphdr *)((u8 *)iph + sizeof(struct iphdr));
+		sum = checksum((u16 *)tcp,
+			       htons(iph->tot_len) - (iph->ihl * sizeof(u32)),
+			       (u16 *)&iph->saddr, (u16 *)&iph->daddr,
+			       IPPROTO_TCP);
+		if (sum == 0xFFFF) {
+			netdev_dbg(priv->ndev, "false rxd_err = 0x%x\n", rxd_err);
+			ret = 0;	/* Work around H/W false error indication */
+		}
+	}
+	return ret;
+}
+
+static int bdx_rx_receive(struct bdx_priv *priv, struct rxd_fifo *f, int budget)
+{
+	struct sk_buff *skb;
+	struct rxd_desc *rxdd;
+	struct rx_map *dm;
+	struct bdx_page *bdx_page;
+	struct rxf_fifo *rxf_fifo;
+	u32 rxd_val1, rxd_err;
+	u16 len;
+	u16 rxd_vlan;
+	u32 pkt_id;
+	int tmp_len, size;
+	char *pkt;
+	int done = 0;
+	struct rxdb *db = NULL;
+
+	f->m.wptr = read_reg(priv, f->m.reg_wptr) & TXF_WPTR_WR_PTR;
+	size = f->m.wptr - f->m.rptr;
+	if (size < 0)
+		size += f->m.memsz;	/* Size is negative :-) */
+
+	while (size > 0) {
+		rxdd = (struct rxd_desc *)(f->m.va + f->m.rptr);
+		db = priv->rxdb0;
+
+		/* We have a chicken and egg problem here. If the
+		 * descriptor is wrapped we first need to copy the tail
+		 * of the descriptor to the end of the buffer before
+		 * extracting values from the descriptor. However in
+		 * order to know if the descriptor is wrapped we need to
+		 * obtain the length of the descriptor from (the
+		 * wrapped) descriptor. Luckily the length is the first
+		 * word of the descriptor. Descriptor lengths are
+		 * multiples of 8 bytes so in case of a wrapped
+		 * descriptor the first 8 bytes are guaranteed to appear
+		 * before the end of the buffer. We first obtain the
+		 * length, we then copy the rest of the descriptor if
+		 * needed and then extract the rest of the values from
+		 * the descriptor.
+		 *
+		 * Do not change the order of operations as it will
+		 * break the code!!!
+		 */
+		rxd_val1 = le32_to_cpu(rxdd->rxd_val1);
+		tmp_len = GET_RXD_BC(rxd_val1) << 3;
+		pkt_id = GET_RXD_PKT_ID(rxd_val1);
+		size -= tmp_len;
+		/* CHECK FOR A PARTIALLY ARRIVED DESCRIPTOR */
+		if (size < 0) {
+			netdev_dbg(priv->ndev,
+				   "%s partially arrived desc tmp_len %d\n",
+				   __func__, tmp_len);
+			break;
+		}
+		/* Make sure that the descriptor has fully arrived
+		 * before reading the rest of it.
+		 */
+		rmb();
+
+		/* A special treatment is given to non-contiguous
+		 * descriptors that start near the end, wraps around
+		 * and continue at the beginning. The second part is
+		 * copied right after the first, and then descriptor
+		 * is interpreted as normal. The fifo has an extra
+		 * space to allow such operations.
+		 */
+
+		/* HAVE WE REACHED THE END OF THE QUEUE? */
+		f->m.rptr += tmp_len;
+		tmp_len = f->m.rptr - f->m.memsz;
+		if (unlikely(tmp_len >= 0)) {
+			f->m.rptr = tmp_len;
+			if (tmp_len > 0) {
+				/* COPY PARTIAL DESCRIPTOR TO THE END OF THE QUEUE */
+				netdev_dbg(priv->ndev, "wrapped desc rptr=%d tmp_len=%d\n",
+					   f->m.rptr, tmp_len);
+				memcpy(f->m.va + f->m.memsz, f->m.va, tmp_len);
+			}
+		}
+		dm = bdx_rxdb_addr_elem(db, rxdd->va_lo);
+		prefetch(dm);
+		bdx_page = bdx_rx_page(dm);
+
+		len = le16_to_cpu(rxdd->len);
+		rxd_vlan = le16_to_cpu(rxdd->rxd_vlan);
+		/* CHECK FOR ERRORS */
+		rxd_err = GET_RXD_ERR(rxd_val1);
+		if (unlikely(rxd_err)) {
+			int ret = 1;
+
+			/* NOT CRC error */
+			if (!(rxd_err & 0x4) &&
+			    /* UDP checksum error */
+			    ((rxd_err == 0x8 && pkt_id == 2) ||
+			     /* TCP checksum error */
+			     (rxd_err == 0x10 && len < PKT_ERR_LEN && pkt_id == 1))) {
+				pkt = ((char *)page_address(bdx_page->page) +
+				       dm->off);
+				ret = bdx_rx_error(priv, pkt, rxd_err, len);
+			}
+			if (ret) {
+				netdev_err(priv->ndev, "rxd_err = 0x%x\n", rxd_err);
+				priv->net_stats.rx_errors++;
+				bdx_recycle_skb(priv, rxdd);
+				continue;
+			}
+		}
+		rxf_fifo = &priv->rxf_fifo0;
+
+		/* In this case we obtain a pre-allocated skb from
+		 * napi. We add a frag with the page/off/len tuple of
+		 * the buffer that we have just read and then call
+		 * vlan_gro_frags()/napi_gro_frags() to process the
+		 * packet. The same skb is used again and again to
+		 * handle all packets, which eliminates the need to
+		 * allocate an skb for each packet.
+		 */
+		skb = napi_get_frags(&priv->napi);
+		if (!skb) {
+			netdev_err(priv->ndev, "napi_get_frags failed\n");
+			break;
+		}
+		skb->ip_summed =
+		    (pkt_id == 0) ? CHECKSUM_NONE : CHECKSUM_UNNECESSARY;
+		bdx_skb_add_rx_frag(skb, 0, bdx_page->page, dm->off, len);
+		bdx_rxdb_free_elem(db, rxdd->va_lo);
+
+		/* PROCESS PACKET */
+		bdx_rx_vlan(priv, skb, rxd_val1, rxd_vlan);
+		napi_gro_frags(&priv->napi);
+
+		bdx_rx_reuse_page(priv, dm);
+		priv->net_stats.rx_bytes += len;
+
+		if (unlikely(++done >= budget))
+			break;
+	}
+
+	priv->net_stats.rx_packets += done;
+	/* FIXME: Do something to minimize pci accesses */
+	write_reg(priv, f->m.reg_rptr, f->m.rptr & TXF_WPTR_WR_PTR);
+	bdx_rx_alloc_buffers(priv);
+	return done;
+}
+
 /* TX HW/SW interaction overview
  * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  * There are 2 types of TX communication channels between driver and NIC.
@@ -436,6 +989,58 @@ static int bdx_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 	return NETDEV_TX_OK;
 }
 
+static void bdx_tx_cleanup(struct bdx_priv *priv)
+{
+	struct txf_fifo *f = &priv->txf_fifo0;
+	struct txdb *db = &priv->txdb;
+	int tx_level = 0;
+
+	f->m.wptr = read_reg(priv, f->m.reg_wptr) & TXF_WPTR_MASK;
+
+	netif_tx_lock(priv->ndev);
+	while (f->m.wptr != f->m.rptr) {
+		f->m.rptr += BDX_TXF_DESC_SZ;
+		f->m.rptr &= f->m.size_mask;
+		/* Unmap all fragments */
+		/* First has to come tx_maps containing DMA */
+		do {
+			netdev_dbg(priv->ndev, "pci_unmap_page 0x%llx len %d\n",
+				   db->rptr->addr.dma, db->rptr->len);
+			dma_unmap_page(&priv->pdev->dev, db->rptr->addr.dma,
+				       db->rptr->len, DMA_TO_DEVICE);
+			bdx_tx_db_inc_rptr(db);
+		} while (db->rptr->len > 0);
+		tx_level -= db->rptr->len;	/* '-' Because the len is negative */
+
+		/* Now should come skb pointer - free it */
+		dev_kfree_skb_any(db->rptr->addr.skb);
+		netdev_dbg(priv->ndev, "dev_kfree_skb_any %p %d\n", db->rptr->addr.skb,
+			   -db->rptr->len);
+		bdx_tx_db_inc_rptr(db);
+	}
+
+	/* Let the HW know which TXF descriptors were cleaned */
+	write_reg(priv, f->m.reg_rptr, f->m.rptr & TXF_WPTR_WR_PTR);
+
+	/* We reclaimed resources, so in case the Q is stopped by xmit
+	 * callback, we resume the transmission and use tx_lock to
+	 * synchronize with xmit.
+	 */
+	priv->tx_level += tx_level;
+	if (priv->tx_noupd) {
+		priv->tx_noupd = 0;
+		write_reg(priv, priv->txd_fifo0.m.reg_wptr,
+			  priv->txd_fifo0.m.wptr & TXF_WPTR_WR_PTR);
+	}
+	if (unlikely(netif_queue_stopped(priv->ndev) &&
+		     netif_carrier_ok(priv->ndev) &&
+		     (priv->tx_level >= BDX_MAX_TX_LEVEL / 2))) {
+		netdev_dbg(priv->ndev, "TX Q WAKE level %d\n", priv->tx_level);
+		netif_wake_queue(priv->ndev);
+	}
+	netif_tx_unlock(priv->ndev);
+}
+
 static void bdx_tx_free_skbs(struct bdx_priv *priv)
 {
 	struct txdb *db = &priv->txdb;
@@ -713,6 +1318,11 @@ static irqreturn_t bdx_isr_napi(int irq, void *dev)
 		bdx_isr_extra(priv, isr);
 
 	if (isr & (IR_RX_DESC_0 | IR_TX_FREE_0 | IR_TMR1)) {
+		if (likely(napi_schedule_prep(&priv->napi))) {
+			__napi_schedule(&priv->napi);
+			return IRQ_HANDLED;
+		}
+
 		/* We get here if an interrupt has slept into the
 		 * small time window between these lines in
 		 * bdx_poll: bdx_enable_interrupts(priv); return 0;
@@ -730,6 +1340,21 @@ static irqreturn_t bdx_isr_napi(int irq, void *dev)
 	return IRQ_HANDLED;
 }
 
+static int bdx_poll(struct napi_struct *napi, int budget)
+{
+	struct bdx_priv *priv = container_of(napi, struct bdx_priv, napi);
+	int work_done;
+
+	bdx_tx_cleanup(priv);
+
+	work_done = bdx_rx_receive(priv, &priv->rxd_fifo0, budget);
+	if (work_done < budget) {
+		napi_complete(napi);
+		bdx_enable_interrupts(priv);
+	}
+	return work_done;
+}
+
 static int bdx_fw_load(struct bdx_priv *priv)
 {
 	int master, i, ret;
@@ -810,6 +1435,8 @@ static int bdx_hw_start(struct bdx_priv *priv)
 	write_reg(priv, REG_TX_FULLNESS, 0);
 
 	write_reg(priv, REG_VGLB, 0);
+	write_reg(priv, REG_MAX_FRAME_A,
+		  priv->rxf_fifo0.m.pktsz & MAX_FRAME_AB_VAL);
 	write_reg(priv, REG_RDINTCM0, priv->rdintcm);
 	write_reg(priv, REG_RDINTCM2, 0);
 
@@ -913,11 +1540,19 @@ static int bdx_start(struct bdx_priv *priv)
 		return ret;
 	}
 
+	ret = create_rx_ring(priv);
+	if (ret) {
+		netdev_err(priv->ndev, "failed to rx init %d\n", ret);
+		goto err_tx_ring;
+	}
+
+	bdx_rx_alloc_buffers(priv);
+
 	ret = request_irq(priv->pdev->irq, &bdx_isr_napi, IRQF_SHARED,
 			  priv->ndev->name, priv->ndev);
 	if (ret) {
 		netdev_err(priv->ndev, "failed to request irq %d\n", ret);
-		goto err_tx_ring;
+		goto err_rx_ring;
 	}
 
 	ret = bdx_hw_start(priv);
@@ -928,6 +1563,8 @@ static int bdx_start(struct bdx_priv *priv)
 	return 0;
 err_free_irq:
 	free_irq(priv->pdev->irq, priv->ndev);
+err_rx_ring:
+	destroy_rx_ring(priv);
 err_tx_ring:
 	destroy_tx_ring(priv);
 	return ret;
@@ -938,10 +1575,13 @@ static int bdx_close(struct net_device *ndev)
 	struct bdx_priv *priv = netdev_priv(ndev);
 
 	netif_carrier_off(ndev);
+	napi_disable(&priv->napi);
+	netif_napi_del(&priv->napi);
 
 	bdx_disable_interrupts(priv);
 	free_irq(priv->pdev->irq, priv->ndev);
 	bdx_sw_reset(priv);
+	destroy_rx_ring(priv);
 	destroy_tx_ring(priv);
 	return 0;
 }
@@ -1204,6 +1844,7 @@ static int bdx_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	priv = netdev_priv(ndev);
 	pci_set_drvdata(pdev, priv);
+	netif_napi_add(ndev, &priv->napi, bdx_poll);
 
 	priv->regs = regs;
 	priv->pdev = pdev;
diff --git a/drivers/net/ethernet/tehuti/tn40.h b/drivers/net/ethernet/tehuti/tn40.h
index e8044e9d06eb..fb43ebb5911f 100644
--- a/drivers/net/ethernet/tehuti/tn40.h
+++ b/drivers/net/ethernet/tehuti/tn40.h
@@ -7,6 +7,7 @@
 #include <linux/crc32.h>
 #include <linux/delay.h>
 #include <linux/etherdevice.h>
+#include <linux/ethtool.h>
 #include <linux/firmware.h>
 #include <linux/if_ether.h>
 #include <linux/if_vlan.h>
@@ -19,6 +20,7 @@
 #include <linux/phy.h>
 #include <linux/tcp.h>
 #include <linux/udp.h>
+#include <linux/vmalloc.h>
 #include <linux/version.h>
 
 #include "tn40_regs.h"
@@ -98,6 +100,33 @@ struct txd_fifo {
 	struct fifo m; /* The minimal set of variables used by all fifos */
 };
 
+struct rxf_fifo {
+	struct fifo m; /* The minimal set of variables used by all fifos */
+};
+
+struct rxd_fifo {
+	struct fifo m; /* The minimal set of variables used by all fifos */
+};
+
+struct bdx_page {
+	struct page *page;
+	u64 dma;
+};
+
+struct rx_map {
+	struct bdx_page bdx_page;
+	u64 dma;
+	u32 off;
+	u32 size; /* Mapped area (i.e. page) size */
+};
+
+struct rxdb {
+	int *stack;
+	struct rx_map *elems;
+	int nelem;
+	int top;
+};
+
 union bdx_dma_addr {
 	dma_addr_t dma;
 	struct sk_buff *skb;
@@ -121,10 +150,23 @@ struct txdb {
 	int size; /* Number of elements in the db */
 };
 
+struct bdx_rx_page_table {
+	int page_size;
+	int buf_size;
+	struct bdx_page bdx_pages;
+};
+
 struct bdx_priv {
 	struct net_device *ndev;
 	struct pci_dev *pdev;
 
+	struct napi_struct napi;
+	/* Rx FIFOs: 1 for data (full) descs and 1 for free descs */
+	struct rxd_fifo rxd_fifo0;
+	struct rxf_fifo rxf_fifo0;
+	struct rxdb *rxdb0; /* Rx db to store DMA mapping info */
+	int napi_stop;
+	struct vlan_group *vlgrp;
 	/* Tx FIFOs: 1 for data desc, 1 for empty (acks) desc */
 	struct txd_fifo txd_fifo0;
 	struct txf_fifo txf_fifo0;
@@ -153,6 +195,41 @@ struct bdx_priv {
 	u32 b0_len;
 	dma_addr_t b0_dma; /* Physical address of buffer */
 	char *b0_va; /* Virtual address of buffer */
+
+	struct bdx_rx_page_table rx_page_table;
+};
+
+/* RX FREE descriptor - 64bit */
+struct rxf_desc {
+	u32 info; /* Buffer Count + Info - described below */
+	u32 va_lo; /* VAdr[31:0] */
+	u32 va_hi; /* VAdr[63:32] */
+	u32 pa_lo; /* PAdr[31:0] */
+	u32 pa_hi; /* PAdr[63:32] */
+	u32 len; /* Buffer Length */
+};
+
+#define GET_RXD_BC(x) GET_BITS_SHIFT((x), 5, 0)
+#define GET_RXD_RXFQ(x) GET_BITS_SHIFT((x), 2, 8)
+#define GET_RXD_TO(x) GET_BITS_SHIFT((x), 1, 15)
+#define GET_RXD_TYPE(x) GET_BITS_SHIFT((x), 4, 16)
+#define GET_RXD_ERR(x) GET_BITS_SHIFT((x), 6, 21)
+#define GET_RXD_RXP(x) GET_BITS_SHIFT((x), 1, 27)
+#define GET_RXD_PKT_ID(x) GET_BITS_SHIFT((x), 3, 28)
+#define GET_RXD_VTAG(x) GET_BITS_SHIFT((x), 1, 31)
+#define GET_RXD_VLAN_ID(x) GET_BITS_SHIFT((x), 12, 0)
+#define GET_RXD_VLAN_TCI(x) GET_BITS_SHIFT((x), 16, 0)
+#define GET_RXD_CFI(x) GET_BITS_SHIFT((x), 1, 12)
+#define GET_RXD_PRIO(x) GET_BITS_SHIFT((x), 3, 13)
+
+struct rxd_desc {
+	u32 rxd_val1;
+	u16 len;
+	u16 rxd_vlan;
+	u32 va_lo;
+	u32 va_hi;
+	u32 rss_lo;
+	u32 rss_hash;
 };
 
 #define MAX_PBL (19)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v1 5/5] net: tn40xx: add PHYLIB support
  2024-04-15 10:43 [PATCH net-next v1 0/5] add ethernet driver for Tehuti Networks TN40xx chips FUJITA Tomonori
                   ` (3 preceding siblings ...)
  2024-04-15 10:43 ` [PATCH net-next v1 4/5] net: tn40xx: add basic Rx handling FUJITA Tomonori
@ 2024-04-15 10:43 ` FUJITA Tomonori
  2024-04-15 14:44   ` Andrew Lunn
  2024-04-15 23:21 ` [PATCH net-next v1 0/5] add ethernet driver for Tehuti Networks TN40xx chips Florian Fainelli
  5 siblings, 1 reply; 22+ messages in thread
From: FUJITA Tomonori @ 2024-04-15 10:43 UTC (permalink / raw)
  To: netdev; +Cc: andrew

This patch adds support for multiple PHY hardware with PHYLIB. The
adapters with TN40xx chips use various PHY hardware: AMCC QT2025, TI
TLK10232, Aqrate AQR105, and Marvell 88X3120, 88X3310, and MV88E2010.

For now, the PCI ID table of this driver enables only adapters using
the QT2025 PHY. I've tested this driver and the QT2025 PHY driver with
an Edimax EN-9320 10G adapter.
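Each MDIO command in the new tn40_mdio.c below is gated by a bounded
busy-wait on a status register. As a rough, hardware-free sketch of
that polling pattern — the busy-bit position and loop bound here are
illustrative assumptions, and the register reader is injected so the
sketch can run without hardware:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_MDIO_BUSY_LOOPS 1024
#define MDIO_BUSY_BIT (1u << 30) /* assumed busy-bit position, for illustration */

/* Poll the MDIO status register until the busy bit clears, or give up. */
static uint32_t mdio_wait_idle(uint32_t (*reg_read)(void *ctx), void *ctx)
{
	for (int tries = 0; tries < MAX_MDIO_BUSY_LOOPS; tries++) {
		uint32_t stat = reg_read(ctx);

		if (!(stat & MDIO_BUSY_BIT))
			return stat; /* idle: the next command may be issued */
	}
	return 0xFFFFFFFFu; /* still busy after all retries: treat as error */
}

/* Test helpers: a fake register that frees up after two polls, and one
 * that is stuck busy forever.
 */
static uint32_t fake_busy_then_idle(void *ctx)
{
	int *calls = ctx;

	return (++(*calls) < 3) ? MDIO_BUSY_BIT : 0x1234;
}

static uint32_t fake_always_busy(void *ctx)
{
	(void)ctx;
	return MDIO_BUSY_BIT;
}
```

The driver's bdx_mdio_get() follows the same shape, reading the real
REG_MDIO_CMD_STAT register instead of a callback.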

Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
---
 drivers/net/ethernet/tehuti/Kconfig     |   1 +
 drivers/net/ethernet/tehuti/Makefile    |   2 +-
 drivers/net/ethernet/tehuti/tn40.c      |  32 ++++++
 drivers/net/ethernet/tehuti/tn40.h      |   5 +
 drivers/net/ethernet/tehuti/tn40_mdio.c | 141 ++++++++++++++++++++++++
 5 files changed, 180 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/tehuti/tn40_mdio.c

diff --git a/drivers/net/ethernet/tehuti/Kconfig b/drivers/net/ethernet/tehuti/Kconfig
index 4198fd59e42e..71f22471f9a0 100644
--- a/drivers/net/ethernet/tehuti/Kconfig
+++ b/drivers/net/ethernet/tehuti/Kconfig
@@ -27,6 +27,7 @@ config TEHUTI_TN40
 	tristate "Tehuti Networks TN40xx 10G Ethernet adapters"
 	depends on PCI
 	select FW_LOADER
+	select AMCC_QT2025_PHY
 	help
 	  This driver supports 10G Ethernet adapters using Tehuti Networks
 	  TN40xx chips. Currently, adapters with Applied Micro Circuits
diff --git a/drivers/net/ethernet/tehuti/Makefile b/drivers/net/ethernet/tehuti/Makefile
index 1c468d99e476..7a0fe586a243 100644
--- a/drivers/net/ethernet/tehuti/Makefile
+++ b/drivers/net/ethernet/tehuti/Makefile
@@ -5,5 +5,5 @@
 
 obj-$(CONFIG_TEHUTI) += tehuti.o
 
-tn40xx-y := tn40.o
+tn40xx-y := tn40.o tn40_mdio.o
 obj-$(CONFIG_TEHUTI_TN40) += tn40xx.o
diff --git a/drivers/net/ethernet/tehuti/tn40.c b/drivers/net/ethernet/tehuti/tn40.c
index c8ed9b743753..2c50295f4e68 100644
--- a/drivers/net/ethernet/tehuti/tn40.c
+++ b/drivers/net/ethernet/tehuti/tn40.c
@@ -1285,18 +1285,26 @@ static void bdx_link_changed(struct bdx_priv *priv)
 		if (priv->link_loop_cnt++ > LINK_LOOP_MAX) {
 			/* MAC reset */
 			bdx_set_link_speed(priv, 0);
+			bdx_set_link_speed(priv, priv->phydev->speed);
 			priv->link_loop_cnt = 0;
 		}
 		write_reg(priv, 0x5150, 1000000);
 		return;
 	}
+
+	if (!netif_carrier_ok(priv->ndev)) {
+		netif_wake_queue(priv->ndev);
+		phy_print_status(priv->phydev);
+	}
 	priv->link = link;
+	netif_carrier_on(priv->ndev);
 }
 
 static inline void bdx_isr_extra(struct bdx_priv *priv, u32 isr)
 {
 	if (isr & (IR_LNKCHG0 | IR_LNKCHG1 | IR_TMR0)) {
 		netdev_dbg(priv->ndev, "isr = 0x%x\n", isr);
+		phy_mac_interrupt(priv->phydev);
 		bdx_link_changed(priv);
 	}
 }
@@ -1580,23 +1588,42 @@ static int bdx_close(struct net_device *ndev)
 
 	bdx_disable_interrupts(priv);
 	free_irq(priv->pdev->irq, priv->ndev);
+	phy_stop(priv->phydev);
+	phy_disconnect(priv->phydev);
 	bdx_sw_reset(priv);
 	destroy_rx_ring(priv);
 	destroy_tx_ring(priv);
 	return 0;
 }
 
+/* PHYLIB requires an adjust_link callback; link changes are handled in the
+ * driver's interrupt handler instead, so this callback is intentionally
+ * empty.
+ */
+static void phy_handler(struct net_device *_dev)
+{
+}
+
 static int bdx_open(struct net_device *dev)
 {
 	struct bdx_priv *priv = netdev_priv(dev);
 	int ret;
 
 	bdx_sw_reset(priv);
+
+	ret = phy_connect_direct(priv->ndev, priv->phydev, phy_handler, PHY_INTERFACE_MODE_XAUI);
+	if (ret) {
+		netdev_err(dev, "failed to connect to phy %d\n", ret);
+		return ret;
+	}
+	phy_attached_info(priv->phydev);
+	phy_start(priv->phydev);
+
 	ret = bdx_start(priv);
 	if (ret) {
 		netdev_err(dev, "failed to start %d\n", ret);
+		phy_stop(priv->phydev);
+		phy_disconnect(priv->phydev);
 		return ret;
 	}
+	napi_enable(&priv->napi);
+	netif_start_queue(priv->ndev);
 	return 0;
 }
 
@@ -1872,6 +1899,11 @@ static int bdx_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	priv->stats_flag = ((read_reg(priv, FPGA_VER) & 0xFFF) != 308);
 
+	ret = bdx_mdiobus_init(priv);
+	if (ret) {
+		dev_err(&pdev->dev, "failed to find PHY.\n");
+		goto err_free_irq;
+	}
 	priv->isr_mask =
 	    IR_RX_FREE_0 | IR_LNKCHG0 | IR_PSE | IR_TMR0 | IR_RX_DESC_0 |
 	    IR_TX_FREE_0 | IR_TMR1;
diff --git a/drivers/net/ethernet/tehuti/tn40.h b/drivers/net/ethernet/tehuti/tn40.h
index fb43ebb5911f..06ab9a2cb42d 100644
--- a/drivers/net/ethernet/tehuti/tn40.h
+++ b/drivers/net/ethernet/tehuti/tn40.h
@@ -197,6 +197,9 @@ struct bdx_priv {
 	char *b0_va; /* Virtual address of buffer */
 
 	struct bdx_rx_page_table rx_page_table;
+
+	struct mii_bus *mdio;
+	struct phy_device *phydev;
 };
 
 /* RX FREE descriptor - 64bit */
@@ -283,4 +286,6 @@ static inline void write_reg(struct bdx_priv *priv, u32 reg, u32 val)
 	writel(val, priv->regs + reg);
 }
 
+int bdx_mdiobus_init(struct bdx_priv *priv);
+
 #endif /* _TN40XX_H */
diff --git a/drivers/net/ethernet/tehuti/tn40_mdio.c b/drivers/net/ethernet/tehuti/tn40_mdio.c
new file mode 100644
index 000000000000..f7f83c77e8b2
--- /dev/null
+++ b/drivers/net/ethernet/tehuti/tn40_mdio.c
@@ -0,0 +1,141 @@
+// SPDX-License-Identifier: GPL-2.0+
+/* Copyright (c) Tehuti Networks Ltd. */
+
+#include "tn40.h"
+
+static u32 bdx_mdio_get(struct bdx_priv *priv)
+{
+	void __iomem *regs = priv->regs;
+
+#define BDX_MAX_MDIO_BUSY_LOOPS 1024
+	int tries = 0;
+
+	while (++tries < BDX_MAX_MDIO_BUSY_LOOPS) {
+		u32 mdio_cmd_stat = readl(regs + REG_MDIO_CMD_STAT);
+
+		if (GET_MDIO_BUSY(mdio_cmd_stat) == 0)
+			return mdio_cmd_stat;
+	}
+	dev_err(&priv->pdev->dev, "MDIO busy!\n");
+	return 0xFFFFFFFF;
+}
+
+static int bdx_mdio_read(struct bdx_priv *priv, int device, int port, u16 addr)
+{
+	void __iomem *regs = priv->regs;
+	u32 tmp_reg, i;
+	/* wait until MDIO is not busy */
+	if (bdx_mdio_get(priv) == 0xFFFFFFFF)
+		return -1;
+
+	i = ((device & 0x1F) | ((port & 0x1F) << 5));
+	writel(i, regs + REG_MDIO_CMD);
+	writel((u32)addr, regs + REG_MDIO_ADDR);
+	tmp_reg = bdx_mdio_get(priv);
+	if (tmp_reg == 0xFFFFFFFF)
+		return -1;
+
+	writel(((1 << 15) | i), regs + REG_MDIO_CMD);
+	/* read CMD_STAT until not busy */
+	tmp_reg = bdx_mdio_get(priv);
+	if (tmp_reg == 0xFFFFFFFF)
+		return -1;
+
+	if (GET_MDIO_RD_ERR(tmp_reg)) {
+		dev_dbg(&priv->pdev->dev, "MDIO error after read command\n");
+		return -1;
+	}
+	tmp_reg = readl(regs + REG_MDIO_DATA);
+
+	return (tmp_reg & 0xFFFF);
+}
+
+static int bdx_mdio_write(struct bdx_priv *priv, int device, int port, u16 addr,
+			  u16 data)
+{
+	void __iomem *regs = priv->regs;
+	u32 tmp_reg;
+
+	/* wait until MDIO is not busy */
+	if (bdx_mdio_get(priv) == 0xFFFFFFFF)
+		return -1;
+	writel(((device & 0x1F) | ((port & 0x1F) << 5)), regs + REG_MDIO_CMD);
+	writel((u32)addr, regs + REG_MDIO_ADDR);
+	if (bdx_mdio_get(priv) == 0xFFFFFFFF)
+		return -1;
+	writel((u32)data, regs + REG_MDIO_DATA);
+	/* read CMD_STAT until not busy */
+	tmp_reg = bdx_mdio_get(priv);
+	if (tmp_reg == 0xFFFFFFFF)
+		return -1;
+
+	if (GET_MDIO_RD_ERR(tmp_reg)) {
+		dev_err(&priv->pdev->dev, "MDIO error after write command\n");
+		return -1;
+	}
+	return 0;
+}
+
+static void bdx_mdio_set_speed(struct bdx_priv *priv, u32 speed)
+{
+	void __iomem *regs = priv->regs;
+	int mdio_cfg;
+
+	mdio_cfg = readl(regs + REG_MDIO_CMD_STAT);
+	if (speed == 1)
+		mdio_cfg = (0x7d << 7) | 0x08;	/* 1MHz */
+	else
+		mdio_cfg = 0xA08;	/* 6MHz */
+	mdio_cfg |= (1 << 6);
+	writel(mdio_cfg, regs + REG_MDIO_CMD_STAT);
+	msleep(100);
+}
+
+static int mdio_read_reg(struct mii_bus *mii_bus, int addr, int devnum, int regnum)
+{
+	return bdx_mdio_read(mii_bus->priv, devnum, addr, regnum);
+}
+
+static int mdio_write_reg(struct mii_bus *mii_bus, int addr, int devnum, int regnum, u16 val)
+{
+	return bdx_mdio_write(mii_bus->priv, devnum, addr, regnum, val);
+}
+
+int bdx_mdiobus_init(struct bdx_priv *priv)
+{
+	struct pci_dev *pdev = priv->pdev;
+	struct mii_bus *bus;
+	struct phy_device *phydev;
+	int ret;
+
+	bus = devm_mdiobus_alloc(&pdev->dev);
+	if (!bus)
+		return -ENOMEM;
+
+	bus->name = BDX_DRV_NAME;
+	bus->parent = &pdev->dev;
+	snprintf(bus->id, MII_BUS_ID_SIZE, "tn40xx-%x-%x",
+		 pci_domain_nr(pdev->bus), pci_dev_id(pdev));
+	bus->priv = priv;
+
+	bus->read_c45 = mdio_read_reg;
+	bus->write_c45 = mdio_write_reg;
+
+	ret = devm_mdiobus_register(&pdev->dev, bus);
+	if (ret) {
+		dev_err(&pdev->dev, "failed to register mdiobus %d %u %u\n",
+			ret, bus->state, MDIOBUS_UNREGISTERED);
+		return ret;
+	}
+
+	phydev = phy_find_first(bus);
+	if (!phydev) {
+		dev_err(&pdev->dev, "failed to find phy\n");
+		return -1;
+	}
+	phydev->irq = PHY_MAC_INTERRUPT;
+	priv->mdio = bus;
+	priv->phydev = phydev;
+	bdx_mdio_set_speed(priv, MDIO_SPEED_6MHZ);
+	return 0;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v1 1/5] net: tn40xx: add pci driver for Tehuti Networks TN40xx chips
  2024-04-15 10:43 ` [PATCH net-next v1 1/5] net: tn40xx: add pci " FUJITA Tomonori
@ 2024-04-15 12:39   ` kernel test robot
  2024-04-15 14:24   ` Andrew Lunn
  1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2024-04-15 12:39 UTC (permalink / raw)
  To: FUJITA Tomonori, netdev; +Cc: oe-kbuild-all, andrew

Hi FUJITA,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 32affa5578f0e6b9abef3623d3976395afbd265c]

url:    https://github.com/intel-lab-lkp/linux/commits/FUJITA-Tomonori/net-tn40xx-add-pci-driver-for-Tehuti-Networks-TN40xx-chips/20240415-185416
base:   32affa5578f0e6b9abef3623d3976395afbd265c
patch link:    https://lore.kernel.org/r/20240415104352.4685-2-fujita.tomonori%40gmail.com
patch subject: [PATCH net-next v1 1/5] net: tn40xx: add pci driver for Tehuti Networks TN40xx chips
reproduce: (https://download.01.org/0day-ci/archive/20240415/202404152042.fOlsMNTW-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404152042.fOlsMNTW-lkp@intel.com/

versioncheck warnings: (new ones prefixed by >>)
   INFO PATH=/opt/cross/rustc-1.76.0-bindgen-0.65.1/cargo/bin:/opt/cross/clang-17/bin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
   /usr/bin/timeout -k 100 3h /usr/bin/make KCFLAGS= -Wtautological-compare -Wno-error=return-type -Wreturn-type -Wcast-function-type -funsigned-char -Wundef -fstrict-flex-arrays=3 -Wenum-conversion W=1 --keep-going LLVM=1 -j32 ARCH=x86_64 versioncheck
   find ./* \( -name SCCS -o -name BitKeeper -o -name .svn -o -name CVS -o -name .pc -o -name .hg -o -name .git \) -prune -o \
   	-name '*.[hcS]' -type f -print | sort \
   	| xargs perl -w ./scripts/checkversion.pl
   ./drivers/accessibility/speakup/genmap.c: 13 linux/version.h not needed.
   ./drivers/accessibility/speakup/makemapdata.c: 13 linux/version.h not needed.
>> ./drivers/net/ethernet/tehuti/tn40.h: 10 linux/version.h not needed.
   ./drivers/staging/media/atomisp/include/linux/atomisp.h: 23 linux/version.h not needed.
   ./samples/bpf/spintest.bpf.c: 8 linux/version.h not needed.
   ./samples/trace_events/trace_custom_sched.c: 11 linux/version.h not needed.
   ./sound/soc/codecs/cs42l42.c: 14 linux/version.h not needed.
   ./tools/lib/bpf/bpf_helpers.h: 410: need linux/version.h
   ./tools/testing/selftests/bpf/progs/dev_cgroup.c: 9 linux/version.h not needed.
   ./tools/testing/selftests/bpf/progs/netcnt_prog.c: 3 linux/version.h not needed.
   ./tools/testing/selftests/bpf/progs/test_map_lock.c: 4 linux/version.h not needed.
   ./tools/testing/selftests/bpf/progs/test_send_signal_kern.c: 4 linux/version.h not needed.
   ./tools/testing/selftests/bpf/progs/test_spin_lock.c: 4 linux/version.h not needed.
   ./tools/testing/selftests/bpf/progs/test_tcp_estats.c: 37 linux/version.h not needed.
   ./tools/testing/selftests/wireguard/qemu/init.c: 27 linux/version.h not needed.

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH net-next v1 1/5] net: tn40xx: add pci driver for Tehuti Networks TN40xx chips
  2024-04-15 10:43 ` [PATCH net-next v1 1/5] net: tn40xx: add pci " FUJITA Tomonori
  2024-04-15 12:39   ` kernel test robot
@ 2024-04-15 14:24   ` Andrew Lunn
  2024-04-21  2:28     ` FUJITA Tomonori
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Lunn @ 2024-04-15 14:24 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: netdev

> +#include "tn40.h"
> +
> +static int bdx_probe(struct pci_dev *pdev, const struct pci_device_id *ent)

The namespace prefix generally has something to do with the driver
name. What does bdx have to do with Tehuti Networks TN40xx?

      Andrew


* Re: [PATCH net-next v1 5/5] net: tn40xx: add PHYLIB support
  2024-04-15 10:43 ` [PATCH net-next v1 5/5] net: tn40xx: add PHYLIB support FUJITA Tomonori
@ 2024-04-15 14:44   ` Andrew Lunn
  2024-04-16 12:19     ` FUJITA Tomonori
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Lunn @ 2024-04-15 14:44 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: netdev

On Mon, Apr 15, 2024 at 07:43:52PM +0900, FUJITA Tomonori wrote:
> This patch adds supports for multiple PHY hardware with PHYLIB. The
> adapters with TN40xx chips use multiple PHY hardware; AMCC QT2025, TI
> TLK10232, Aqrate AQR105, and Marvell 88X3120, 88X3310, and MV88E2010.
> 
> For now, the PCI ID table of this driver enables adapters using only
> QT2025 PHY. I've tested this driver and the QT2025 PHY driver with
> Edimax EN-9320 10G adapter.

Please split this up. Add the MDIO bus master in one patch. Then add
support for phylib in a second patch. They are logically different
things.

Are there variants of this device using SFP? It might be you actually
want to use phylink, not phylib. That is a bit messy for a PCI device,
look at drivers/net/ethernet/wangxun.

> diff --git a/drivers/net/ethernet/tehuti/Kconfig b/drivers/net/ethernet/tehuti/Kconfig
> index 4198fd59e42e..71f22471f9a0 100644
> --- a/drivers/net/ethernet/tehuti/Kconfig
> +++ b/drivers/net/ethernet/tehuti/Kconfig
> @@ -27,6 +27,7 @@ config TEHUTI_TN40
>  	tristate "Tehuti Networks TN40xx 10G Ethernet adapters"
>  	depends on PCI
>  	select FW_LOADER
> +	select AMCC_QT2025_PHY

That is pretty unusual, especially when you say there are a few
different choices.

> +static u32 bdx_mdio_get(struct bdx_priv *priv)
> +{
> +	void __iomem *regs = priv->regs;
> +
> +#define BDX_MAX_MDIO_BUSY_LOOPS 1024
> +	int tries = 0;
> +
> +	while (++tries < BDX_MAX_MDIO_BUSY_LOOPS) {
> +		u32 mdio_cmd_stat = readl(regs + REG_MDIO_CMD_STAT);
> +
> +		if (GET_MDIO_BUSY(mdio_cmd_stat) == 0)
> +			return mdio_cmd_stat;
> +	}
> +	dev_err(&priv->pdev->dev, "MDIO busy!\n");

include/linux/iopoll.h

> +	return 0xFFFFFFFF;

It is always better to use standard error codes. In this case,
-ETIMEDOUT.
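A possible shape for that rework, for illustration only (a sketch, not the
driver's actual code; the 10 us poll interval and 10 ms timeout below are
guesses, and the helper comes from include/linux/iopoll.h):

```c
/* Hypothetical rework of bdx_mdio_get(): poll the status register with
 * the iopoll helper and return a standard error code instead of the
 * 0xFFFFFFFF magic value. The successful status is passed back via *val. */
static int bdx_mdio_get(struct bdx_priv *priv, u32 *val)
{
	void __iomem *regs = priv->regs;
	u32 stat;
	int ret;

	ret = readl_poll_timeout_atomic(regs + REG_MDIO_CMD_STAT, stat,
					GET_MDIO_BUSY(stat) == 0, 10, 10000);
	if (ret) {
		dev_err(&priv->pdev->dev, "MDIO busy!\n");
		return -ETIMEDOUT;
	}
	if (val)
		*val = stat;
	return 0;
}
```

Callers would then check for a negative return instead of comparing
against 0xFFFFFFFF.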

> +static u16 bdx_mdio_read(struct bdx_priv *priv, int device, int port, u16 addr)
> +{
> +	void __iomem *regs = priv->regs;
> +	u32 tmp_reg, i;
> +	/* wait until MDIO is not busy */
> +	if (bdx_mdio_get(priv) == 0xFFFFFFFF)
> +		return -1;
> +
> +	i = ((device & 0x1F) | ((port & 0x1F) << 5));
> +	writel(i, regs + REG_MDIO_CMD);
> +	writel((u32)addr, regs + REG_MDIO_ADDR);
> +	tmp_reg = bdx_mdio_get(priv);
> +	if (tmp_reg == 0xFFFFFFFF)
> +		return -1;

This function has a return type of u16. So returning -1 makes no sense.

> +static int mdio_read_reg(struct mii_bus *mii_bus, int addr, int devnum, int regnum)
> +{
> +	return bdx_mdio_read(mii_bus->priv, devnum, addr, regnum);

I would probably change bdx_mdio_read() so that it takes the
parameters in the same order as mdio_read_reg().

There is also a reasonably common convention that the functions
performing C45 bus protocol operations have c45 in their name. It
appears this hardware does not support C22 at all. That makes it
unusual, and little hints like this are useful.
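Concretely, the two conventions mentioned here might combine into
something like the following (a sketch only; the names and the reordered
helper signature are hypothetical):

```c
/* Hypothetical C45 callback carrying "c45" in its name and taking the
 * parameters in the same (addr, devnum, regnum) order as mii_bus. */
static int bdx_mdio_read_c45(struct mii_bus *bus, int addr, int devnum,
			     int regnum)
{
	struct bdx_priv *priv = bus->priv;

	/* Assumes bdx_mdio_read() is itself reordered to take
	 * (priv, addr, devnum, regnum), so no argument swapping is
	 * needed at this boundary. */
	return bdx_mdio_read(priv, addr, devnum, regnum);
}
```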

	 Andrew


* Re: [PATCH net-next v1 3/5] net: tn40xx: add basic Tx handling
  2024-04-15 10:43 ` [PATCH net-next v1 3/5] net: tn40xx: add basic Tx handling FUJITA Tomonori
@ 2024-04-15 22:29   ` kernel test robot
  2024-04-18 17:23   ` Simon Horman
  1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2024-04-15 22:29 UTC (permalink / raw)
  To: FUJITA Tomonori, netdev; +Cc: oe-kbuild-all, andrew

Hi FUJITA,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 32affa5578f0e6b9abef3623d3976395afbd265c]

url:    https://github.com/intel-lab-lkp/linux/commits/FUJITA-Tomonori/net-tn40xx-add-pci-driver-for-Tehuti-Networks-TN40xx-chips/20240415-185416
base:   32affa5578f0e6b9abef3623d3976395afbd265c
patch link:    https://lore.kernel.org/r/20240415104352.4685-4-fujita.tomonori%40gmail.com
patch subject: [PATCH net-next v1 3/5] net: tn40xx: add basic Tx handling
config: microblaze-allmodconfig (https://download.01.org/0day-ci/archive/20240416/202404160600.TORqdc7K-lkp@intel.com/config)
compiler: microblaze-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240416/202404160600.TORqdc7K-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404160600.TORqdc7K-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/tehuti/tn40.c: In function 'bdx_start_xmit':
>> drivers/net/ethernet/tehuti/tn40.c:370:29: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
     370 |         txdd->va_lo = (u32)((u64)skb);
         |                             ^
--
>> drivers/net/ethernet/tehuti/tn40.c:189: warning: expecting prototype for txdb_map_skb(). Prototype was for bdx_tx_map_skb() instead
>> drivers/net/ethernet/tehuti/tn40.c:323: warning: Function parameter or struct member 'priv' not described in 'bdx_tx_space'


vim +370 drivers/net/ethernet/tehuti/tn40.c

   171	
   172	/**
   173	 * txdb_map_skb - create and store DMA mappings for skb's data blocks
   174	 * @priv: NIC private structure
   175	 * @skb: socket buffer to map
   176	 * @txdd: pointer to tx descriptor to be updated
   177	 * @pkt_len: pointer to unsigned long value
   178	 *
   179	 * This function creates DMA mappings for skb's data blocks and writes them to
   180	 * PBL of a new tx descriptor. It also stores them in the tx db, so they could
   181	 * be unmapped after the data has been sent. It is the responsibility of the
   182	 * caller to make sure that there is enough space in the txdb. The last
   183	 * element holds a pointer to skb itself and is marked with a zero length.
   184	 *
   185	 * Return: 0 on success and negative value on error.
   186	 */
   187	static inline int bdx_tx_map_skb(struct bdx_priv *priv, struct sk_buff *skb,
   188					 struct txd_desc *txdd, unsigned int *pkt_len)
 > 189	{
   190		dma_addr_t dma;
   191		int i, len, err;
   192		struct txdb *db = &priv->txdb;
   193		struct pbl *pbl = &txdd->pbl[0];
   194		int nr_frags = skb_shinfo(skb)->nr_frags;
   195		unsigned int size;
   196		struct mapping_info info[MAX_PBL];
   197	
   198		netdev_dbg(priv->ndev, "TX skb %p skbLen %d dataLen %d frags %d\n", skb,
   199			   skb->len, skb->data_len, nr_frags);
   200		if (nr_frags > MAX_PBL - 1) {
   201			err = skb_linearize(skb);
   202			if (err)
   203				return -1;
   204			nr_frags = skb_shinfo(skb)->nr_frags;
   205		}
   206		/* initial skb */
   207		len = skb->len - skb->data_len;
   208		dma = dma_map_single(&priv->pdev->dev, skb->data, len,
   209				     DMA_TO_DEVICE);
   210		if (dma_mapping_error(&priv->pdev->dev, dma))
   211			return -1;
   212	
   213		bdx_set_txdb(db, dma, len);
   214		bdx_set_pbl(pbl++, db->wptr->addr.dma, db->wptr->len);
   215		*pkt_len = db->wptr->len;
   216	
   217		for (i = 0; i < nr_frags; i++) {
   218			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
   219	
   220			size = skb_frag_size(frag);
   221			dma = skb_frag_dma_map(&priv->pdev->dev, frag, 0,
   222					       size, DMA_TO_DEVICE);
   223	
   224			if (dma_mapping_error(&priv->pdev->dev, dma))
   225				goto mapping_error;
   226			info[i].dma = dma;
   227			info[i].size = size;
   228		}
   229	
   230		for (i = 0; i < nr_frags; i++) {
   231			bdx_tx_db_inc_wptr(db);
   232			bdx_set_txdb(db, info[i].dma, info[i].size);
   233			bdx_set_pbl(pbl++, db->wptr->addr.dma, db->wptr->len);
   234			*pkt_len += db->wptr->len;
   235		}
   236	
   237		/* SHORT_PKT_FIX */
   238		if (skb->len < SHORT_PACKET_SIZE)
   239			++nr_frags;
   240	
   241		/* Add skb clean up info. */
   242		bdx_tx_db_inc_wptr(db);
   243		db->wptr->len = -txd_sizes[nr_frags].bytes;
   244		db->wptr->addr.skb = skb;
   245		bdx_tx_db_inc_wptr(db);
   246	
   247		return 0;
   248	 mapping_error:
   249		dma_unmap_page(&priv->pdev->dev, db->wptr->addr.dma, db->wptr->len, DMA_TO_DEVICE);
   250		for (; i > 0; i--)
   251			dma_unmap_page(&priv->pdev->dev, info[i - 1].dma, info[i - 1].size, DMA_TO_DEVICE);
   252		return -1;
   253	}
   254	
   255	static void init_txd_sizes(void)
   256	{
   257		int i, lwords;
   258	
   259		if (txd_sizes[0].bytes)
   260			return;
   261	
   262		/* 7 - is number of lwords in txd with one phys buffer
   263		 * 3 - is number of lwords used for every additional phys buffer
   264		 */
   265		for (i = 0; i < MAX_PBL; i++) {
   266			lwords = 7 + (i * 3);
   267			if (lwords & 1)
   268				lwords++;	/* pad it with 1 lword */
   269			txd_sizes[i].qwords = lwords >> 1;
   270			txd_sizes[i].bytes = lwords << 2;
   271		}
   272	}
   273	
   274	static int create_tx_ring(struct bdx_priv *priv)
   275	{
   276		int ret;
   277	
   278		ret = bdx_fifo_alloc(priv, &priv->txd_fifo0.m, priv->txd_size,
   279				     REG_TXD_CFG0_0, REG_TXD_CFG1_0,
   280				     REG_TXD_RPTR_0, REG_TXD_WPTR_0);
   281		if (ret)
   282			return ret;
   283	
   284		ret = bdx_fifo_alloc(priv, &priv->txf_fifo0.m, priv->txf_size,
   285				     REG_TXF_CFG0_0, REG_TXF_CFG1_0,
   286				     REG_TXF_RPTR_0, REG_TXF_WPTR_0);
   287		if (ret)
   288			goto err_free_txd;
   289	
   290		/* The TX db has to keep mappings for all packets sent (on
   291		 * TxD) and not yet reclaimed (on TxF).
   292		 */
   293		ret = bdx_tx_db_init(&priv->txdb, max(priv->txd_size, priv->txf_size));
   294		if (ret)
   295			goto err_free_txf;
   296	
   297		/* SHORT_PKT_FIX */
   298		priv->b0_len = 64;
   299		priv->b0_va =
   300			dma_alloc_coherent(&priv->pdev->dev, priv->b0_len, &priv->b0_dma, GFP_KERNEL);
   301		if (!priv->b0_va)
   302			goto err_free_db;
   303	
   304		priv->tx_level = BDX_MAX_TX_LEVEL;
   305		priv->tx_update_mark = priv->tx_level - 1024;
   306		return 0;
   307	err_free_db:
   308		bdx_tx_db_close(&priv->txdb);
   309	err_free_txf:
   310		bdx_fifo_free(priv, &priv->txf_fifo0.m);
   311	err_free_txd:
   312		bdx_fifo_free(priv, &priv->txd_fifo0.m);
   313		return ret;
   314	}
   315	
   316	/**
   317	 * bdx_tx_space - Calculate the available space in the TX fifo.
   318	 *
   319	 * @priv - NIC private structure
   320	 * Return: available space in TX fifo in bytes
   321	 */
   322	static inline int bdx_tx_space(struct bdx_priv *priv)
 > 323	{
   324		struct txd_fifo *f = &priv->txd_fifo0;
   325		int fsize;
   326	
   327		f->m.rptr = read_reg(priv, f->m.reg_rptr) & TXF_WPTR_WR_PTR;
   328		fsize = f->m.rptr - f->m.wptr;
   329		if (fsize <= 0)
   330			fsize = f->m.memsz + fsize;
   331		return fsize;
   332	}
   333	
   334	static int bdx_start_xmit(struct sk_buff *skb, struct net_device *ndev)
   335	{
   336		struct bdx_priv *priv = netdev_priv(ndev);
   337		struct txd_fifo *f = &priv->txd_fifo0;
   338		int txd_checksum = 7;	/* full checksum */
   339		int txd_lgsnd = 0;
   340		int txd_vlan_id = 0;
   341		int txd_vtag = 0;
   342		int txd_mss = 0;
   343		unsigned int pkt_len;
   344		struct txd_desc *txdd;
   345		int nr_frags, len, err;
   346	
   347		/* Build tx descriptor */
   348		txdd = (struct txd_desc *)(f->m.va + f->m.wptr);
   349		err = bdx_tx_map_skb(priv, skb, txdd, &pkt_len);
   350		if (err) {
   351			dev_kfree_skb(skb);
   352			return NETDEV_TX_OK;
   353		}
   354		nr_frags = skb_shinfo(skb)->nr_frags;
   355		if (unlikely(skb->ip_summed != CHECKSUM_PARTIAL))
   356			txd_checksum = 0;
   357	
   358		if (skb_shinfo(skb)->gso_size) {
   359			txd_mss = skb_shinfo(skb)->gso_size;
   360			txd_lgsnd = 1;
   361			netdev_dbg(priv->ndev, "skb %p pkt len %d gso size = %d\n", skb,
   362				   pkt_len, txd_mss);
   363		}
   364		if (skb_vlan_tag_present(skb)) {
   365			/* Don't cut VLAN ID to 12 bits */
   366			txd_vlan_id = skb_vlan_tag_get(skb);
   367			txd_vtag = 1;
   368		}
   369		txdd->va_hi = 0;
 > 370		txdd->va_lo = (u32)((u64)skb);
   371		txdd->length = cpu_to_le16(pkt_len);
   372		txdd->mss = cpu_to_le16(txd_mss);
   373		txdd->txd_val1 =
   374			cpu_to_le32(TXD_W1_VAL
   375				    (txd_sizes[nr_frags].qwords, txd_checksum,
   376				     txd_vtag, txd_lgsnd, txd_vlan_id));
   377		netdev_dbg(priv->ndev, "=== w1 qwords[%d] %d =====\n", nr_frags,
   378			   txd_sizes[nr_frags].qwords);
   379		netdev_dbg(priv->ndev, "=== TxD desc =====================\n");
   380		netdev_dbg(priv->ndev, "=== w1: 0x%x ================\n", txdd->txd_val1);
   381		netdev_dbg(priv->ndev, "=== w2: mss 0x%x len 0x%x\n", txdd->mss,
   382			   txdd->length);
   383		/* SHORT_PKT_FIX */
   384		if (pkt_len < SHORT_PACKET_SIZE) {
   385			struct pbl *pbl = &txdd->pbl[++nr_frags];
   386	
   387			txdd->length = cpu_to_le16(SHORT_PACKET_SIZE);
   388			txdd->txd_val1 =
   389				cpu_to_le32(TXD_W1_VAL
   390					    (txd_sizes[nr_frags].qwords,
   391					     txd_checksum, txd_vtag, txd_lgsnd,
   392					     txd_vlan_id));
   393			pbl->len = cpu_to_le32(SHORT_PACKET_SIZE - pkt_len);
   394			pbl->pa_lo = cpu_to_le32(L32_64(priv->b0_dma));
   395			pbl->pa_hi = cpu_to_le32(H32_64(priv->b0_dma));
   396			netdev_dbg(priv->ndev, "=== SHORT_PKT_FIX   ================\n");
   397			netdev_dbg(priv->ndev, "=== nr_frags : %d   ================\n",
   398				   nr_frags);
   399		}
   400	
   401		/* Increment TXD write pointer. In case of fifo wrapping copy
   402		 * reminder of the descriptor to the beginning.
   403		 */
   404		f->m.wptr += txd_sizes[nr_frags].bytes;
   405		len = f->m.wptr - f->m.memsz;
   406		if (unlikely(len >= 0)) {
   407			f->m.wptr = len;
   408			if (len > 0)
   409				memcpy(f->m.va, f->m.va + f->m.memsz, len);
   410		}
   411		/* Force memory writes to complete before letting the HW know
   412		 * there are new descriptors to fetch.
   413		 */
   414		wmb();
   415	
   416		priv->tx_level -= txd_sizes[nr_frags].bytes;
   417		if (priv->tx_level > priv->tx_update_mark) {
   418			write_reg(priv, f->m.reg_wptr,
   419				  f->m.wptr & TXF_WPTR_WR_PTR);
   420		} else {
   421			if (priv->tx_noupd++ > BDX_NO_UPD_PACKETS) {
   422				priv->tx_noupd = 0;
   423				write_reg(priv, f->m.reg_wptr,
   424					  f->m.wptr & TXF_WPTR_WR_PTR);
   425			}
   426		}
   427	
   428		netif_trans_update(ndev);
   429		priv->net_stats.tx_packets++;
   430		priv->net_stats.tx_bytes += pkt_len;
   431		if (priv->tx_level < BDX_MIN_TX_LEVEL) {
   432			netdev_dbg(priv->ndev, "TX Q STOP level %d\n", priv->tx_level);
   433			netif_stop_queue(ndev);
   434		}
   435	
   436		return NETDEV_TX_OK;
   437	}
   438	
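The descriptor sizing in init_txd_sizes() and the fifo-space arithmetic in
bdx_tx_space() quoted above can be modelled in plain standalone C (a
sketch for illustration; the function names here are mine, not the
driver's):

```c
#include <assert.h>

/* Free bytes in a circular fifo, following bdx_tx_space(): the gap from
 * write pointer to read pointer, wrapped by the fifo size when the write
 * pointer has caught up with or passed the read pointer. */
static int tx_space(int rptr, int wptr, int memsz)
{
	int fsize = rptr - wptr;

	if (fsize <= 0)
		fsize += memsz;
	return fsize;
}

/* Byte size of a Tx descriptor with nr_extra additional buffers,
 * following init_txd_sizes(): 7 lwords for the first physical buffer,
 * 3 per additional one, padded to an even lword count (4 bytes/lword). */
static int txd_bytes(int nr_extra)
{
	int lwords = 7 + nr_extra * 3;

	if (lwords & 1)
		lwords++;
	return lwords << 2;
}
```

Note that with rptr == wptr the model reports the whole fifo as free,
matching the driver's convention that an empty fifo has memsz bytes of
space.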

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH net-next v1 0/5] add ethernet driver for Tehuti Networks TN40xx chips
  2024-04-15 10:43 [PATCH net-next v1 0/5] add ethernet driver for Tehuti Networks TN40xx chips FUJITA Tomonori
                   ` (4 preceding siblings ...)
  2024-04-15 10:43 ` [PATCH net-next v1 5/5] net: tn40xx: add PHYLIB support FUJITA Tomonori
@ 2024-04-15 23:21 ` Florian Fainelli
  2024-04-16 10:59   ` FUJITA Tomonori
  5 siblings, 1 reply; 22+ messages in thread
From: Florian Fainelli @ 2024-04-15 23:21 UTC (permalink / raw)
  To: FUJITA Tomonori, netdev; +Cc: andrew

Hi Fujita-san,

On 4/15/2024 3:43 AM, FUJITA Tomonori wrote:
> This patchset adds a new 10G ethernet driver for Tehuti Networks
> TN40xx chips. Note in mainline, there is a driver for Tehuti Networks
> (drivers/net/ethernet/tehuti/tehuti.[hc]), which supports TN30xx
> chips.
> 
> Multiple vendors (DLink, Asus, Edimax, QNAP, etc) developed adapters
> based on TN40xx chips. Tehuti Networks went out of business but the
> drivers are still distributed with some of the hardware (and also
> available on some sites). With some changes, I try to upstream this
> driver with a new PHY driver in Rust.
> 
> The major change is replacing a PHY abstraction layer with
> PHYLIB. TN40xx chips are used with various PHY hardware (AMCC QT2025,
> TI TLK10232, Aqrate AQR105, and Marvell MV88X3120, MV88X3310, and
> MV88E2010). So the original driver has the own PHY abstraction layer
> to handle them.
> 
> I'll submit a new PHY driver for QT2025 in Rust shortly. For now, I
> enable only adapters using QT2025 PHY in the PCI ID table of this
> driver. I've tested this driver and the QT2025 PHY driver with Edimax
> EN-9320 10G adapter. In mainline, there are PHY drivers for AQR105 and
> Marvell PHYs, which could work for some TN40xx adapters with this
> driver.
> 
> The other changes are replacing the embedded firmware in a header file
> with the firmware APIs, handling dma mapping errors, removing many
> ifdef, fixing lots of style issues, etc.
> 
> To make reviewing easier, this patchset has only basic functions. Once
> merged, I'll submit features like ethtool support.

Thanks a lot for attempting to upstream support for the TN40xx chips,
those are extremely popular 10GbE PCIe cards under USD 100. Last I
checked the vendor driver, it was not entirely clear whether the
various PHY firmwares could be included in linux-firmware: the
licensing was not specific either way about whether redistribution
is allowed.
-- 
Florian


* Re: [PATCH net-next v1 4/5] net: tn40xx: add basic Rx handling
  2024-04-15 10:43 ` [PATCH net-next v1 4/5] net: tn40xx: add basic Rx handling FUJITA Tomonori
@ 2024-04-16  0:38   ` kernel test robot
  0 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2024-04-16  0:38 UTC (permalink / raw)
  To: FUJITA Tomonori, netdev; +Cc: oe-kbuild-all, andrew

Hi FUJITA,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 32affa5578f0e6b9abef3623d3976395afbd265c]

url:    https://github.com/intel-lab-lkp/linux/commits/FUJITA-Tomonori/net-tn40xx-add-pci-driver-for-Tehuti-Networks-TN40xx-chips/20240415-185416
base:   32affa5578f0e6b9abef3623d3976395afbd265c
patch link:    https://lore.kernel.org/r/20240415104352.4685-5-fujita.tomonori%40gmail.com
patch subject: [PATCH net-next v1 4/5] net: tn40xx: add basic Rx handling
config: microblaze-allmodconfig (https://download.01.org/0day-ci/archive/20240416/202404160840.Me9Gpj7a-lkp@intel.com/config)
compiler: microblaze-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240416/202404160840.Me9Gpj7a-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404160840.Me9Gpj7a-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from include/linux/printk.h:566,
                    from include/asm-generic/bug.h:22,
                    from ./arch/microblaze/include/generated/asm/bug.h:1,
                    from include/linux/bug.h:5,
                    from include/linux/thread_info.h:13,
                    from include/linux/sched.h:14,
                    from include/linux/delay.h:23,
                    from drivers/net/ethernet/tehuti/tn40.h:8,
                    from drivers/net/ethernet/tehuti/tn40.c:4:
   drivers/net/ethernet/tehuti/tn40.c: In function 'bdx_rx_reuse_page':
>> drivers/net/ethernet/tehuti/tn40.c:154:20: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     154 |                    (void *)dm->dma);
         |                    ^
   include/linux/dynamic_debug.h:224:29: note: in definition of macro '__dynamic_func_call_cls'
     224 |                 func(&id, ##__VA_ARGS__);                       \
         |                             ^~~~~~~~~~~
   include/linux/dynamic_debug.h:250:9: note: in expansion of macro '_dynamic_func_call_cls'
     250 |         _dynamic_func_call_cls(_DPRINTK_CLASS_DFLT, fmt, func, ##__VA_ARGS__)
         |         ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/dynamic_debug.h:277:9: note: in expansion of macro '_dynamic_func_call'
     277 |         _dynamic_func_call(fmt, __dynamic_netdev_dbg,           \
         |         ^~~~~~~~~~~~~~~~~~
   include/net/net_debug.h:57:9: note: in expansion of macro 'dynamic_netdev_dbg'
      57 |         dynamic_netdev_dbg(__dev, format, ##args);              \
         |         ^~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/tehuti/tn40.c:153:9: note: in expansion of macro 'netdev_dbg'
     153 |         netdev_dbg(priv->ndev, "dm size %d off %d dma %p\n", dm->size, dm->off,
         |         ^~~~~~~~~~
   drivers/net/ethernet/tehuti/tn40.c:156:67: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     156 |                 netdev_dbg(priv->ndev, "unmap page %p size %d\n", (void *)dm->dma, dm->size);
         |                                                                   ^
   include/linux/dynamic_debug.h:224:29: note: in definition of macro '__dynamic_func_call_cls'
     224 |                 func(&id, ##__VA_ARGS__);                       \
         |                             ^~~~~~~~~~~
   include/linux/dynamic_debug.h:250:9: note: in expansion of macro '_dynamic_func_call_cls'
     250 |         _dynamic_func_call_cls(_DPRINTK_CLASS_DFLT, fmt, func, ##__VA_ARGS__)
         |         ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/dynamic_debug.h:277:9: note: in expansion of macro '_dynamic_func_call'
     277 |         _dynamic_func_call(fmt, __dynamic_netdev_dbg,           \
         |         ^~~~~~~~~~~~~~~~~~
   include/net/net_debug.h:57:9: note: in expansion of macro 'dynamic_netdev_dbg'
      57 |         dynamic_netdev_dbg(__dev, format, ##args);              \
         |         ^~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/tehuti/tn40.c:156:17: note: in expansion of macro 'netdev_dbg'
     156 |                 netdev_dbg(priv->ndev, "unmap page %p size %d\n", (void *)dm->dma, dm->size);
         |                 ^~~~~~~~~~
   drivers/net/ethernet/tehuti/tn40.c: In function 'bdx_rx_alloc_buffers':
   drivers/net/ethernet/tehuti/tn40.c:324:28: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     324 |                            (void *)dm->dma);
         |                            ^
   include/linux/dynamic_debug.h:224:29: note: in definition of macro '__dynamic_func_call_cls'
     224 |                 func(&id, ##__VA_ARGS__);                       \
         |                             ^~~~~~~~~~~
   include/linux/dynamic_debug.h:250:9: note: in expansion of macro '_dynamic_func_call_cls'
     250 |         _dynamic_func_call_cls(_DPRINTK_CLASS_DFLT, fmt, func, ##__VA_ARGS__)
         |         ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/dynamic_debug.h:277:9: note: in expansion of macro '_dynamic_func_call'
     277 |         _dynamic_func_call(fmt, __dynamic_netdev_dbg,           \
         |         ^~~~~~~~~~~~~~~~~~
   include/net/net_debug.h:57:9: note: in expansion of macro 'dynamic_netdev_dbg'
      57 |         dynamic_netdev_dbg(__dev, format, ##args);              \
         |         ^~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/tehuti/tn40.c:323:17: note: in expansion of macro 'netdev_dbg'
     323 |                 netdev_dbg(priv->ndev, "dm size %d off %d dma %p\n", dm->size, dm->off,
         |                 ^~~~~~~~~~
   drivers/net/ethernet/tehuti/tn40.c: In function 'bdx_rx_receive':
>> drivers/net/ethernet/tehuti/tn40.c:465:26: warning: variable 'rxf_fifo' set but not used [-Wunused-but-set-variable]
     465 |         struct rxf_fifo *rxf_fifo;
         |                          ^~~~~~~~
   drivers/net/ethernet/tehuti/tn40.c: In function 'bdx_start_xmit':
   drivers/net/ethernet/tehuti/tn40.c:923:29: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
     923 |         txdd->va_lo = (u32)((u64)skb);
         |                             ^
   drivers/net/ethernet/tehuti/tn40.c: In function 'bdx_tx_cleanup':
>> drivers/net/ethernet/tehuti/tn40.c:1007:48: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 4 has type 'dma_addr_t' {aka 'unsigned int'} [-Wformat=]
    1007 |                         netdev_dbg(priv->ndev, "pci_unmap_page 0x%llx len %d\n",
         |                                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1008 |                                    db->rptr->addr.dma, db->rptr->len);
         |                                    ~~~~~~~~~~~~~~~~~~
         |                                                  |
         |                                                  dma_addr_t {aka unsigned int}
   include/linux/dynamic_debug.h:224:29: note: in definition of macro '__dynamic_func_call_cls'
     224 |                 func(&id, ##__VA_ARGS__);                       \
         |                             ^~~~~~~~~~~
   include/linux/dynamic_debug.h:250:9: note: in expansion of macro '_dynamic_func_call_cls'
     250 |         _dynamic_func_call_cls(_DPRINTK_CLASS_DFLT, fmt, func, ##__VA_ARGS__)
         |         ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/dynamic_debug.h:277:9: note: in expansion of macro '_dynamic_func_call'
     277 |         _dynamic_func_call(fmt, __dynamic_netdev_dbg,           \
         |         ^~~~~~~~~~~~~~~~~~
   include/net/net_debug.h:57:9: note: in expansion of macro 'dynamic_netdev_dbg'
      57 |         dynamic_netdev_dbg(__dev, format, ##args);              \
         |         ^~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/tehuti/tn40.c:1007:25: note: in expansion of macro 'netdev_dbg'
    1007 |                         netdev_dbg(priv->ndev, "pci_unmap_page 0x%llx len %d\n",
         |                         ^~~~~~~~~~
   drivers/net/ethernet/tehuti/tn40.c:1007:69: note: format string is defined here
    1007 |                         netdev_dbg(priv->ndev, "pci_unmap_page 0x%llx len %d\n",
         |                                                                  ~~~^
         |                                                                     |
         |                                                                     long long unsigned int
         |                                                                  %x


vim +154 drivers/net/ethernet/tehuti/tn40.c

   150	
   151	static void bdx_rx_reuse_page(struct bdx_priv *priv, struct rx_map *dm)
   152	{
   153		netdev_dbg(priv->ndev, "dm size %d off %d dma %p\n", dm->size, dm->off,
 > 154			   (void *)dm->dma);
   155		if (dm->off == 0) {
   156			netdev_dbg(priv->ndev, "unmap page %p size %d\n", (void *)dm->dma, dm->size);
   157			dma_unmap_page(&priv->pdev->dev, dm->dma, dm->size,
   158				       DMA_FROM_DEVICE);
   159		}
   160	}
   161	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v1 0/5] add ethernet driver for Tehuti Networks TN40xx chips
  2024-04-15 23:21 ` [PATCH net-next v1 0/5] add ethernet driver for Tehuti Networks TN40xx chips Florian Fainelli
@ 2024-04-16 10:59   ` FUJITA Tomonori
  0 siblings, 0 replies; 22+ messages in thread
From: FUJITA Tomonori @ 2024-04-16 10:59 UTC (permalink / raw)
  To: f.fainelli; +Cc: fujita.tomonori, netdev, andrew

Hi,

On Mon, 15 Apr 2024 16:21:35 -0700
Florian Fainelli <f.fainelli@gmail.com> wrote:

> Thanks a lot for attempting to upstream support for the TN40xx chips,
> those are extremely popular 10GbE PCIe cards under USD 100. Last I
> checked the vendor driver, it was not entirely clear however whether
> it would be possible to get the various PHY firmwares included in
> linux-firmware, that is the licensing was not very specific either way
> (redistribution allowed or not).

The code of the original driver has MODULE_LICENSE("GPL"). The QT2025
PHY firmware is embedded in binary form in a header file in the source
code, so I assume that it's also under the GPL.

Unlike the QT2025 firmware, the firmware files for the Marvell PHYs
aren't included in the original driver code. The README says you need
to obtain the Marvell firmware files and compile the code with them,
so I guess they can't be redistributed.


* Re: [PATCH net-next v1 5/5] net: tn40xx: add PHYLIB support
  2024-04-15 14:44   ` Andrew Lunn
@ 2024-04-16 12:19     ` FUJITA Tomonori
  2024-04-16 12:57       ` Andrew Lunn
  0 siblings, 1 reply; 22+ messages in thread
From: FUJITA Tomonori @ 2024-04-16 12:19 UTC (permalink / raw)
  To: andrew; +Cc: fujita.tomonori, netdev

Hi,

On Mon, 15 Apr 2024 16:44:31 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> On Mon, Apr 15, 2024 at 07:43:52PM +0900, FUJITA Tomonori wrote:
>> This patch adds support for multiple PHY hardware with PHYLIB. The
>> adapters with TN40xx chips use various PHY hardware: AMCC QT2025, TI
>> TLK10232, Aquantia AQR105, and Marvell 88X3120, 88X3310, and MV88E2010.
>> 
>> For now, the PCI ID table of this driver enables only adapters using
>> the QT2025 PHY. I've tested this driver and the QT2025 PHY driver with
>> an Edimax EN-9320 10G adapter.
> 
> Please split this up. Add the MDIO bus master in one patch. Then add
> support for phylib in a second patch. They are logically different
> things.

Understood, I'll split this in v2.


> Are there variants of this device using SFP? It might be you actually
> want to use phylink, not phylib. That is a bit messy for a PCI device,
> look at drivers/net/ethernet/wangxun.

phylink is necessary if the PHY is hot-pluggable, right? If so, the
driver doesn't need it. The PHYs used by adapters with TN40xx chips are:

AMCC QT2025 PHY (SFP+)
- Tehuti TN9310
- DLink DXE-810S
- Asus XG-C100F
- Edimax EN-9320

Marvell MV88x3120 (10GBase-T)
- Tehuti TN9210

Marvell MV88X3310 (10GBase-T)
- Tehuti TN9710
- Edimax EN-9320TX-E
- Buffalo LGY-PCIE-MG
- IOI GE10
- LR-Link LREC6860BT
- QNAP PCIe Expansion Card

Marvell MV88E2010 (5GBase-T)
- Tehuti TN9710Q

TI TLK10232 (SFP+)
- Tehuti TN9610
- LR-Link LREC6860AF

Aquantia AQR105 (10GBase-T)
- Tehuti TN9510
- DLink DXE-810T
- Edimax EN-9320TX-E


>> diff --git a/drivers/net/ethernet/tehuti/Kconfig b/drivers/net/ethernet/tehuti/Kconfig
>> index 4198fd59e42e..71f22471f9a0 100644
>> --- a/drivers/net/ethernet/tehuti/Kconfig
>> +++ b/drivers/net/ethernet/tehuti/Kconfig
>> @@ -27,6 +27,7 @@ config TEHUTI_TN40
>>  	tristate "Tehuti Networks TN40xx 10G Ethernet adapters"
>>  	depends on PCI
>>  	select FW_LOADER
>> +	select AMCC_QT2025_PHY
> 
> That is pretty unusual, especially when you say there are a few
> different choices.

I should not put any 'select *_PHY' here?


>> +static u32 bdx_mdio_get(struct bdx_priv *priv)
>> +{
>> +	void __iomem *regs = priv->regs;
>> +
>> +#define BDX_MAX_MDIO_BUSY_LOOPS 1024
>> +	int tries = 0;
>> +
>> +	while (++tries < BDX_MAX_MDIO_BUSY_LOOPS) {
>> +		u32 mdio_cmd_stat = readl(regs + REG_MDIO_CMD_STAT);
>> +
>> +		if (GET_MDIO_BUSY(mdio_cmd_stat) == 0)
>> +			return mdio_cmd_stat;
>> +	}
>> +	dev_err(&priv->pdev->dev, "MDIO busy!\n");
> 
> include/linux/iopoll.h
> 
>> +	return 0xFFFFFFFF;
> 
> It is always better to use standard error codes. In this case,
> -ETIMEDOUT.

I'll do that.


>> +static u16 bdx_mdio_read(struct bdx_priv *priv, int device, int port, u16 addr)
>> +{
>> +	void __iomem *regs = priv->regs;
>> +	u32 tmp_reg, i;
>> +	/* wait until MDIO is not busy */
>> +	if (bdx_mdio_get(priv) == 0xFFFFFFFF)
>> +		return -1;
>> +
>> +	i = ((device & 0x1F) | ((port & 0x1F) << 5));
>> +	writel(i, regs + REG_MDIO_CMD);
>> +	writel((u32)addr, regs + REG_MDIO_ADDR);
>> +	tmp_reg = bdx_mdio_get(priv);
>> +	if (tmp_reg == 0xFFFFFFFF)
>> +		return -1;
> 
> This function has a return type of u16. So returning -1 makes no sense.

Yeah, I thought the same but left it alone. I'll change it in v2.


>> +static int mdio_read_reg(struct mii_bus *mii_bus, int addr, int devnum, int regnum)
>> +{
>> +	return bdx_mdio_read(mii_bus->priv, devnum, addr, regnum);
> 
> I would probably change bdx_mdio_read() so that it takes the
> parameters in the same order as mdio_read_reg().

Sure, I'll.


> There is also a reasonably common convention that the functions
> performing C45 bus protocol operations have c45 in their name. It
> appears this hardware does not support C22 at all. That makes it
> unusual, and little hits like this are useful.

I'm not sure whether the adapters support C22 (they probably do, I
guess), but the original driver uses only C45 bus protocol operations.


* Re: [PATCH net-next v1 5/5] net: tn40xx: add PHYLIB support
  2024-04-16 12:19     ` FUJITA Tomonori
@ 2024-04-16 12:57       ` Andrew Lunn
  2024-04-25  1:24         ` FUJITA Tomonori
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Lunn @ 2024-04-16 12:57 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: netdev

> > Are there variants of this device using SFP? It might be you actually
> > want to use phylink, not phylib. That is a bit messy for a PCI device,
> > look at drivers/net/ethernet/wangxun.
> 
> phylink is necessary if PHY is hot-pluggable, right? if so, the driver
> doesn't need it. The PHYs that adapters with TN40XX use are

There is more to it than that. phylib has problems when the bandwidth
is > 1G, where the MAC/PHY link becomes more problematic. Often the PHY
will change this link depending on what the media side is doing. If
you have a 1G SFP inserted, the QT2025 will change the MAC/PHY link to
1000BaseX. If it has a 10G SFP it will use XAUI. phylink knows how to
decode the SFP EEPROM to determine what sort of module it is, and how
the PHY should be configured.

To fully support this hardware you are going to need to use phylink.

> >> diff --git a/drivers/net/ethernet/tehuti/Kconfig b/drivers/net/ethernet/tehuti/Kconfig
> >> index 4198fd59e42e..71f22471f9a0 100644
> >> --- a/drivers/net/ethernet/tehuti/Kconfig
> >> +++ b/drivers/net/ethernet/tehuti/Kconfig
> >> @@ -27,6 +27,7 @@ config TEHUTI_TN40
> >>  	tristate "Tehuti Networks TN40xx 10G Ethernet adapters"
> >>  	depends on PCI
> >>  	select FW_LOADER
> >> +	select AMCC_QT2025_PHY
> > 
> > That is pretty unusual, especially when you say there are a few
> > different choices.
> 
> I should not put any 'select *_PHY' here?

Correct. Most distributions just package everything.

We are going to get into an odd corner case: since Rust is still
experimental, I doubt distributions are building Rust modules. So they
will end up with a MAC driver but no PHY driver, at least not for the
QT2025. The Marvell and Aquantia PHYs should just work.

Anybody who does want to use the QT2025 will either need to
build their own kernel, or blacklist the in-kernel MAC driver and use
the out-of-tree driver. But eventually, Rust will start to be
packaged, and then it should work out OK.
 
	Andrew


* Re: [PATCH net-next v1 3/5] net: tn40xx: add basic Tx handling
  2024-04-15 10:43 ` [PATCH net-next v1 3/5] net: tn40xx: add basic Tx handling FUJITA Tomonori
  2024-04-15 22:29   ` kernel test robot
@ 2024-04-18 17:23   ` Simon Horman
  2024-04-18 17:24     ` Simon Horman
  2024-04-22  7:29     ` FUJITA Tomonori
  1 sibling, 2 replies; 22+ messages in thread
From: Simon Horman @ 2024-04-18 17:23 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: netdev, andrew

On Mon, Apr 15, 2024 at 07:43:50PM +0900, FUJITA Tomonori wrote:
> This patch adds device specific structures to initialize the hardware
> with basic Tx handling. The original driver loads the embedded
> firmware in the header file. This driver is implemented to use the
> firmware APIs.
> 
> The Tx logic uses three major data structures: two ring buffers shared
> with the NIC and one database. One ring buffer is used to send the NIC
> information about packets to be sent. The other is used to get
> information from the NIC about packets that have been sent. The
> database is used to keep the information about DMA mappings. After a
> packet is sent, the db is used to free the resources used for the
> packet.
> 
> Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>

Hi Fujita-san,

Some review from my side.

> ---
>  drivers/net/ethernet/tehuti/Kconfig |    1 +
>  drivers/net/ethernet/tehuti/tn40.c  | 1251 +++++++++++++++++++++++++++
>  drivers/net/ethernet/tehuti/tn40.h  |  192 +++-
>  3 files changed, 1443 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/tehuti/Kconfig b/drivers/net/ethernet/tehuti/Kconfig
> index 849e3b4a71c1..4198fd59e42e 100644
> --- a/drivers/net/ethernet/tehuti/Kconfig
> +++ b/drivers/net/ethernet/tehuti/Kconfig
> @@ -26,6 +26,7 @@ config TEHUTI
>  config TEHUTI_TN40
>  	tristate "Tehuti Networks TN40xx 10G Ethernet adapters"
>  	depends on PCI
> +	select FW_LOADER
>  	help
>  	  This driver supports 10G Ethernet adapters using Tehuti Networks
>  	  TN40xx chips. Currently, adapters with Applied Micro Circuits

Not strictly related to this patch, but did you consider
adding an entry in the MAINTAINERS file for this driver?

> diff --git a/drivers/net/ethernet/tehuti/tn40.c b/drivers/net/ethernet/tehuti/tn40.c
> index 368a73300f8a..0798df468fc3 100644
> --- a/drivers/net/ethernet/tehuti/tn40.c
> +++ b/drivers/net/ethernet/tehuti/tn40.c
> @@ -3,9 +3,1170 @@
>  
>  #include "tn40.h"
>  
> +#define SHORT_PACKET_SIZE 60
> +
> +static void bdx_enable_interrupts(struct bdx_priv *priv)
> +{
> +	write_reg(priv, REG_IMR, priv->isr_mask);
> +}
> +
> +static void bdx_disable_interrupts(struct bdx_priv *priv)
> +{
> +	write_reg(priv, REG_IMR, 0);
> +}
> +
> +static int bdx_fifo_alloc(struct bdx_priv *priv, struct fifo *f, int fsz_type,
> +			  u16 reg_cfg0, u16 reg_cfg1, u16 reg_rptr, u16 reg_wptr)

Please consider using a soft limit on line length of 80 characters
in Networking code.

> +{
> +	u16 memsz = FIFO_SIZE * (1 << fsz_type);

I'm not sure if fsz_type has a meaning here - perhaps it comes from the
datasheet. But if not, perhaps 'order' would be a more intuitive
name for this parameter. Similarly for the txd_size and txf_size
fields of struct bdx_priv, the sz_type field of bdx_tx_db_init(),
and possibly elsewhere.

> +
> +	memset(f, 0, sizeof(struct fifo));
> +	/* 1K extra space is allocated at the end of the fifo to simplify
> +	 * processing of descriptors that wraps around fifo's end.
> +	 */
> +	f->va = dma_alloc_coherent(&priv->pdev->dev,
> +				   memsz + FIFO_EXTRA_SPACE, &f->da, GFP_KERNEL);
> +	if (!f->va)
> +		return -ENOMEM;
> +
> +	f->reg_cfg0 = reg_cfg0;
> +	f->reg_cfg1 = reg_cfg1;
> +	f->reg_rptr = reg_rptr;
> +	f->reg_wptr = reg_wptr;
> +	f->rptr = 0;
> +	f->wptr = 0;
> +	f->memsz = memsz;
> +	f->size_mask = memsz - 1;
> +	write_reg(priv, reg_cfg0, (u32)((f->da & TX_RX_CFG0_BASE) | fsz_type));

For consistency should this be use H32_64()?:

		H32_64((f->da & TX_RX_CFG0_BASE) | fsz_type)

> +	write_reg(priv, reg_cfg1, H32_64(f->da));
> +	return 0;
> +}
> +
> +static void bdx_fifo_free(struct bdx_priv *priv, struct fifo *f)
> +{
> +	dma_free_coherent(&priv->pdev->dev,
> +			  f->memsz + FIFO_EXTRA_SPACE, f->va, f->da);
> +}
> +
> +/* TX HW/SW interaction overview
> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> + * There are 2 types of TX communication channels between driver and NIC.
> + * 1) TX Free Fifo - TXF - Holds ack descriptors for sent packets.
> + * 2) TX Data Fifo - TXD - Holds descriptors of full buffers.
> + *
> + * Currently the NIC supports TSO, checksumming and gather DMA
> + * UFO and IP fragmentation is on the way.
> + *
> + * RX SW Data Structures
> + * ~~~~~~~~~~~~~~~~~~~~~
> + * TXDB is used to keep track of all skbs owned by SW and their DMA addresses.
> + * For TX case, ownership lasts from getting the packet via hard_xmit and
> + * until the HW acknowledges sending the packet by TXF descriptors.
> + * TXDB is implemented as a cyclic buffer.
> + *
> + * FIFO objects keep info about the fifo's size and location, relevant HW
> + * registers, usage and skb db. Each RXD and RXF fifo has their own fifo
> + * structure. Implemented as simple struct.
> + *
> + * TX SW Execution Flow
> + * ~~~~~~~~~~~~~~~~~~~~
> + * OS calls the driver's hard_xmit method with a packet to send. The driver
> + * creates DMA mappings, builds TXD descriptors and kicks the HW by updating
> + * TXD WPTR.
> + *
> + * When a packet is sent, the HW writes a TXF descriptor and the SW
> + * frees the original skb. To prevent TXD fifo overflow without
> + * reading HW registers every time, the SW deploys a "tx level"
> + * technique. Upon startup, the tx level is initialized to the TXD fifo
> + * length. For every sent packet, the SW gets its TXD descriptor size
> + * (from a pre-calculated array) and subtracts it from the tx level. The
> + * size is also stored in the txdb. When a TXF ack arrives, the SW fetches
> + * the size of the original TXD descriptor from the txdb and adds it
> + * to the tx level. When the Tx level drops below some predefined
> + * threshold, the driver stops the TX queue. When the TX level rises
> + * above that level, the tx queue is enabled again.
> + *
> + * This technique avoids excessive reading of RPTR and WPTR registers.
> + * As our benchmarks show, it adds 1.5 Gbit/sec to the NIC's throughput.
> + */
> +static inline int bdx_tx_db_size(struct txdb *db)
> +{
> +	int taken = db->wptr - db->rptr;
> +
> +	if (taken < 0)
> +		taken = db->size + 1 + taken;	/* (size + 1) equals memsz */
> +	return db->size - taken;
> +}

bdx_tx_db_size seems to be unused. Perhaps it can be removed.

Flagged by W=1 build with clang-18.

...

> +/* Sizes of tx desc (including padding if needed) as function of the SKB's
> + * frag number
> + */
> +static struct {
> +	u16 bytes;
> +	u16 qwords;		/* qword = 64 bit */
> +} txd_sizes[MAX_PBL];
> +
> +inline void bdx_set_pbl(struct pbl *pbl, dma_addr_t dma, int len)
> +{
> +	pbl->len = cpu_to_le32(len);
> +	pbl->pa_lo = cpu_to_le32(L32_64(dma));
> +	pbl->pa_hi = cpu_to_le32(H32_64(dma));
> +}

The type of the pa_lo and pa_high fields of struct pbl are both u32.
But here they are assigned little-endian values. This doesn't seem right
(I expect the types of the fields should be changed to __le32).

This was flagged by Sparse, along with a number of other problems
(which I didn't look into). Please ensure that patches do
not introduce new Sparse warnings.

...

> +/**
> + * txdb_map_skb - create and store DMA mappings for skb's data blocks

nit: bdx_tx_map_skb

Flagged by ./scripts/kernel-doc -Wall -none

> + * @priv: NIC private structure
> + * @skb: socket buffer to map
> + * @txdd: pointer to tx descriptor to be updated
> + * @pkt_len: pointer to unsigned long value
> + *
> + * This function creates DMA mappings for skb's data blocks and writes them to
> + * PBL of a new tx descriptor. It also stores them in the tx db, so they could
> + * be unmapped after the data has been sent. It is the responsibility of the
> + * caller to make sure that there is enough space in the txdb. The last
> + * element holds a pointer to skb itself and is marked with a zero length.
> + *
> + * Return: 0 on success and negative value on error.
> + */
> +static inline int bdx_tx_map_skb(struct bdx_priv *priv, struct sk_buff *skb,
> +				 struct txd_desc *txdd, unsigned int *pkt_len)
> +{
> +	dma_addr_t dma;
> +	int i, len, err;
> +	struct txdb *db = &priv->txdb;
> +	struct pbl *pbl = &txdd->pbl[0];
> +	int nr_frags = skb_shinfo(skb)->nr_frags;
> +	unsigned int size;
> +	struct mapping_info info[MAX_PBL];

nit: Please arrange local variables in new Networking code in reverse
     xmas tree order - longest line to shortest.

     This tool is your friend: https://github.com/ecree-solarflare/xmastree

> +
> +	netdev_dbg(priv->ndev, "TX skb %p skbLen %d dataLen %d frags %d\n", skb,
> +		   skb->len, skb->data_len, nr_frags);
> +	if (nr_frags > MAX_PBL - 1) {
> +		err = skb_linearize(skb);
> +		if (err)
> +			return -1;
> +		nr_frags = skb_shinfo(skb)->nr_frags;
> +	}
> +	/* initial skb */
> +	len = skb->len - skb->data_len;
> +	dma = dma_map_single(&priv->pdev->dev, skb->data, len,
> +			     DMA_TO_DEVICE);
> +	if (dma_mapping_error(&priv->pdev->dev, dma))
> +		return -1;

As I believe Andrew mentioned elsewhere, it's best
to use standard error values. Perhaps here:

	ret = dma_mapping_error(...);
	if (ret)
		return ret;

> +
> +	bdx_set_txdb(db, dma, len);
> +	bdx_set_pbl(pbl++, db->wptr->addr.dma, db->wptr->len);
> +	*pkt_len = db->wptr->len;
> +
> +	for (i = 0; i < nr_frags; i++) {
> +		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
> +
> +		size = skb_frag_size(frag);
> +		dma = skb_frag_dma_map(&priv->pdev->dev, frag, 0,
> +				       size, DMA_TO_DEVICE);
> +
> +		if (dma_mapping_error(&priv->pdev->dev, dma))
> +			goto mapping_error;
> +		info[i].dma = dma;
> +		info[i].size = size;
> +	}
> +
> +	for (i = 0; i < nr_frags; i++) {
> +		bdx_tx_db_inc_wptr(db);
> +		bdx_set_txdb(db, info[i].dma, info[i].size);
> +		bdx_set_pbl(pbl++, db->wptr->addr.dma, db->wptr->len);
> +		*pkt_len += db->wptr->len;
> +	}
> +
> +	/* SHORT_PKT_FIX */
> +	if (skb->len < SHORT_PACKET_SIZE)
> +		++nr_frags;
> +
> +	/* Add skb clean up info. */
> +	bdx_tx_db_inc_wptr(db);
> +	db->wptr->len = -txd_sizes[nr_frags].bytes;
> +	db->wptr->addr.skb = skb;
> +	bdx_tx_db_inc_wptr(db);
> +
> +	return 0;
> + mapping_error:
> +	dma_unmap_page(&priv->pdev->dev, db->wptr->addr.dma, db->wptr->len, DMA_TO_DEVICE);
> +	for (; i > 0; i--)
> +		dma_unmap_page(&priv->pdev->dev, info[i - 1].dma, info[i - 1].size, DMA_TO_DEVICE);
> +	return -1;

And here:

	return -ENOMEM;

> +}

...

> +static int create_tx_ring(struct bdx_priv *priv)
> +{
> +	int ret;
> +
> +	ret = bdx_fifo_alloc(priv, &priv->txd_fifo0.m, priv->txd_size,
> +			     REG_TXD_CFG0_0, REG_TXD_CFG1_0,
> +			     REG_TXD_RPTR_0, REG_TXD_WPTR_0);
> +	if (ret)
> +		return ret;
> +
> +	ret = bdx_fifo_alloc(priv, &priv->txf_fifo0.m, priv->txf_size,
> +			     REG_TXF_CFG0_0, REG_TXF_CFG1_0,
> +			     REG_TXF_RPTR_0, REG_TXF_WPTR_0);
> +	if (ret)
> +		goto err_free_txd;
> +
> +	/* The TX db has to keep mappings for all packets sent (on
> +	 * TxD) and not yet reclaimed (on TxF).
> +	 */
> +	ret = bdx_tx_db_init(&priv->txdb, max(priv->txd_size, priv->txf_size));
> +	if (ret)
> +		goto err_free_txf;
> +
> +	/* SHORT_PKT_FIX */
> +	priv->b0_len = 64;
> +	priv->b0_va =
> +		dma_alloc_coherent(&priv->pdev->dev, priv->b0_len, &priv->b0_dma, GFP_KERNEL);

nit: I think this indentation would be more in keeping with normal practices:

	priv->b0_va = dma_alloc_coherent(&priv->pdev->dev, priv->b0_len,
					 &priv->b0_dma, GFP_KERNEL);

> +	if (!priv->b0_va)
> +		goto err_free_db;

The goto above will result in the function returning ret.
But ret is 0 here. Should it be set to a negative error value,
f.e. -ENOMEM?

Flagged by Smatch.

> +
> +	priv->tx_level = BDX_MAX_TX_LEVEL;
> +	priv->tx_update_mark = priv->tx_level - 1024;
> +	return 0;
> +err_free_db:
> +	bdx_tx_db_close(&priv->txdb);
> +err_free_txf:
> +	bdx_fifo_free(priv, &priv->txf_fifo0.m);
> +err_free_txd:
> +	bdx_fifo_free(priv, &priv->txd_fifo0.m);
> +	return ret;
> +}
> +
> +/**
> + * bdx_tx_space - Calculate the available space in the TX fifo.
> + *
> + * @priv - NIC private structure

nit: '@priv: NIC private structure'

> + * Return: available space in TX fifo in bytes
> + */

...

> +static int bdx_set_link_speed(struct bdx_priv *priv, u32 speed)
> +{
> +	int i;
> +	u32 val;
> +
> +	netdev_dbg(priv->ndev, "speed %d\n", speed);
> +
> +	switch (speed) {
> +	case SPEED_10000:
> +	case SPEED_5000:
> +	case SPEED_2500:
> +		netdev_dbg(priv->ndev, "link_speed %d\n", speed);
> +

There are a lot of magic numbers below.
Could these be converted into #defines with meaningful names?

> +		write_reg(priv, 0x1010, 0x217);	/*ETHSD.REFCLK_CONF  */
> +		write_reg(priv, 0x104c, 0x4c);	/*ETHSD.L0_RX_PCNT  */
> +		write_reg(priv, 0x1050, 0x4c);	/*ETHSD.L1_RX_PCNT  */
> +		write_reg(priv, 0x1054, 0x4c);	/*ETHSD.L2_RX_PCNT  */
> +		write_reg(priv, 0x1058, 0x4c);	/*ETHSD.L3_RX_PCNT  */
> +		write_reg(priv, 0x102c, 0x434);	/*ETHSD.L0_TX_PCNT  */
> +		write_reg(priv, 0x1030, 0x434);	/*ETHSD.L1_TX_PCNT  */
> +		write_reg(priv, 0x1034, 0x434);	/*ETHSD.L2_TX_PCNT  */
> +		write_reg(priv, 0x1038, 0x434);	/*ETHSD.L3_TX_PCNT  */
> +		write_reg(priv, 0x6300, 0x0400);	/*MAC.PCS_CTRL */

...

> +static int bdx_hw_reset(struct bdx_priv *priv)
> +{
> +	u32 val, i;
> +
> +	/* Reset sequences: read, write 1, read, write 0 */
> +	val = read_reg(priv, REG_CLKPLL);
> +	write_reg(priv, REG_CLKPLL, (val | CLKPLL_SFTRST) + 0x8);
> +	udelay(50);

Checkpatch recommends using usleep_range() here
after consulting with Documentation/timers/timers-howto.rst.
TBH, I'm unsure of the merit of this advice.

> +	val = read_reg(priv, REG_CLKPLL);
> +	write_reg(priv, REG_CLKPLL, val & ~CLKPLL_SFTRST);
> +
> +	/* Check that the PLLs are locked and reset ended */
> +	for (i = 0; i < 70; i++, mdelay(10)) {
> +		if ((read_reg(priv, REG_CLKPLL) & CLKPLL_LKD) == CLKPLL_LKD) {
> +			udelay(50);

Ditto.

> +			/* Do any PCI-E read transaction */
> +			read_reg(priv, REG_RXD_CFG0_0);
> +			return 0;
> +		}
> +	}
> +	return 1;
> +}
> +
> +static int bdx_sw_reset(struct bdx_priv *priv)

nit: This function always returns zero, and the caller ignores the
     return value. It's return type could be void.

> +{
> +	int i;
> +
> +	/* 1. load MAC (obsolete) */
> +	/* 2. disable Rx (and Tx) */
> +	write_reg(priv, REG_GMAC_RXF_A, 0);
> +	mdelay(100);
> +	/* 3. Disable port */
> +	write_reg(priv, REG_DIS_PORT, 1);
> +	/* 4. Disable queue */
> +	write_reg(priv, REG_DIS_QU, 1);
> +	/* 5. Wait until hw is disabled */
> +	for (i = 0; i < 50; i++) {
> +		if (read_reg(priv, REG_RST_PORT) & 1)
> +			break;
> +		mdelay(10);
> +	}
> +	if (i == 50) {
> +		netdev_err(priv->ndev, "%s SW reset timeout. continuing anyway\n",
> +			   priv->ndev->name);
> +	}
> +	/* 6. Disable interrupts */
> +	write_reg(priv, REG_RDINTCM0, 0);
> +	write_reg(priv, REG_TDINTCM0, 0);
> +	write_reg(priv, REG_IMR, 0);
> +	read_reg(priv, REG_ISR);
> +
> +	/* 7. Reset queue */
> +	write_reg(priv, REG_RST_QU, 1);
> +	/* 8. Reset port */
> +	write_reg(priv, REG_RST_PORT, 1);
> +	/* 9. Zero all read and write pointers */
> +	for (i = REG_TXD_WPTR_0; i <= REG_TXF_RPTR_3; i += 0x10)
> +		write_reg(priv, i, 0);
> +	/* 10. Unset port disable */
> +	write_reg(priv, REG_DIS_PORT, 0);
> +	/* 11. Unset queue disable */
> +	write_reg(priv, REG_DIS_QU, 0);
> +	/* 12. Unset queue reset */
> +	write_reg(priv, REG_RST_QU, 0);
> +	/* 13. Unset port reset */
> +	write_reg(priv, REG_RST_PORT, 0);
> +	/* 14. Enable Rx */
> +	/* Skipped. will be done later */
> +	return 0;
> +}

...

> +static void bdx_setmulti(struct net_device *ndev)
> +{
> +	struct bdx_priv *priv = netdev_priv(ndev);
> +
> +	u32 rxf_val =
> +	    GMAC_RX_FILTER_AM | GMAC_RX_FILTER_AB | GMAC_RX_FILTER_OSEN |
> +	    GMAC_RX_FILTER_TXFC;
> +	int i;
> +
> +	/* IMF - imperfect (hash) rx multicast filter */
> +	/* PMF - perfect rx multicast filter */
> +
> +	/* FIXME: RXE(OFF) */

Is there a plan to fix this, and the TBD below?

> +	if (ndev->flags & IFF_PROMISC) {
> +		rxf_val |= GMAC_RX_FILTER_PRM;
> +	} else if (ndev->flags & IFF_ALLMULTI) {
> +		/* set IMF to accept all multicast frames */
> +		for (i = 0; i < MAC_MCST_HASH_NUM; i++)
> +			write_reg(priv, REG_RX_MCST_HASH0 + i * 4, ~0);
> +	} else if (netdev_mc_count(ndev)) {
> +		u8 hash;
> +		struct netdev_hw_addr *mclist;
> +		u32 reg, val;
> +
> +		/* Set IMF to deny all multicast frames */
> +		for (i = 0; i < MAC_MCST_HASH_NUM; i++)
> +			write_reg(priv, REG_RX_MCST_HASH0 + i * 4, 0);
> +
> +		/* Set PMF to deny all multicast frames */
> +		for (i = 0; i < MAC_MCST_NUM; i++) {
> +			write_reg(priv, REG_RX_MAC_MCST0 + i * 8, 0);
> +			write_reg(priv, REG_RX_MAC_MCST1 + i * 8, 0);
> +		}
> +		/* Use PMF to accept first MAC_MCST_NUM (15) addresses */
> +
> +		/* TBD: Sort the addresses and write them in ascending
> +		 * order into RX_MAC_MCST regs. we skip this phase now
> +		 * and accept ALL multicast frames through IMF. Accept
> +		 * the rest of addresses throw IMF.
> +		 */
> +		netdev_for_each_mc_addr(mclist, ndev) {
> +			hash = 0;
> +			for (i = 0; i < ETH_ALEN; i++)
> +				hash ^= mclist->addr[i];
> +
> +			reg = REG_RX_MCST_HASH0 + ((hash >> 5) << 2);
> +			val = read_reg(priv, reg);
> +			val |= (1 << (hash % 32));
> +			write_reg(priv, reg, val);
> +		}
> +	} else {
> +		rxf_val |= GMAC_RX_FILTER_AB;
> +	}
> +	write_reg(priv, REG_GMAC_RXF_A, rxf_val);
> +	/* Enable RX */
> +	/* FIXME: RXE(ON) */
> +}

...

> diff --git a/drivers/net/ethernet/tehuti/tn40.h b/drivers/net/ethernet/tehuti/tn40.h

...

> +#if BITS_PER_LONG == 64
> +#define H32_64(x) ((u32)((u64)(x) >> 32))
> +#define L32_64(x) ((u32)((u64)(x) & 0xffffffff))
> +#elif BITS_PER_LONG == 32
> +#define H32_64(x) 0
> +#define L32_64(x) ((u32)(x))
> +#else /* BITS_PER_LONG == ?? */
> +#error BITS_PER_LONG is undefined. Must be 64 or 32
> +#endif /* BITS_PER_LONG */

I am curious to know why it is valid to mask off the upper 64 bits
on systems with 32 bit longs. As far as I see this is used
in conjunction for supplying DMA addresses to the NIC.
But it's not obvious how that relates to the length
of longs on the host.

Probably I'm missing something very obvious here.
But if not, my follow-up would be to suggest using
upper_32_bits() and lower_32_bits().

> +
> +#define BDX_TXF_DESC_SZ 16
> +#define BDX_MAX_TX_LEVEL (priv->txd_fifo0.m.memsz - 16)
> +#define BDX_MIN_TX_LEVEL 256
> +#define BDX_NO_UPD_PACKETS 40
> +#define BDX_MAX_MTU BIT(14)
> +
> +#define PCK_TH_MULT 128
> +#define INT_COAL_MULT 2
> +
> +#define BITS_MASK(nbits) ((1 << (nbits)) - 1)

> +#define GET_BITS_SHIFT(x, nbits, nshift) (((x) >> (nshift)) & BITS_MASK(nbits))
> +#define BITS_SHIFT_MASK(nbits, nshift) (BITS_MASK(nbits) << (nshift))
> +#define BITS_SHIFT_VAL(x, nbits, nshift) (((x) & BITS_MASK(nbits)) << (nshift))
> +#define BITS_SHIFT_CLEAR(x, nbits, nshift) \
> +	((x) & (~BITS_SHIFT_MASK(nbits, (nshift))))
> +
> +#define GET_INT_COAL(x) GET_BITS_SHIFT(x, 15, 0)
> +#define GET_INT_COAL_RC(x) GET_BITS_SHIFT(x, 1, 15)
> +#define GET_RXF_TH(x) GET_BITS_SHIFT(x, 4, 16)
> +#define GET_PCK_TH(x) GET_BITS_SHIFT(x, 4, 20)

It feels like using of GENMASK and FIELD_GET is appropriate here.

> +
> +#define INT_REG_VAL(coal, coal_rc, rxf_th, pck_th) \
> +	((coal) | ((coal_rc) << 15) | ((rxf_th) << 16) | ((pck_th) << 20))

This looks like a candidate for GENMASK and FIELD_PREP.

> +#define MAX_PBL (19)
> +/* PBL describes each virtual buffer to be transmitted from the host. */
> +struct pbl {
> +	u32 pa_lo;
> +	u32 pa_hi;
> +	u32 len;
> +};
> +
> +/* First word for TXD descriptor. It means: type = 3 for regular Tx packet,
> + * hw_csum = 7 for IP+UDP+TCP HW checksums.
> + */
> +#define TXD_W1_VAL(bc, checksum, vtag, lgsnd, vlan_id)               \
> +	((bc) | ((checksum) << 5) | ((vtag) << 8) | ((lgsnd) << 9) | \
> +	 (0x30000) | ((vlan_id & 0x0fff) << 20) |                    \
> +	 (((vlan_id >> 13) & 7) << 13))

Likewise, here.

Also, it would be nice to use a #define for 0x30000
(or 0x3 used with FIELD_PREP with an appropriate mask).

> +
> +struct txd_desc {
> +	u32 txd_val1;
> +	u16 mss;
> +	u16 length;
> +	u32 va_lo;
> +	u32 va_hi;
> +	struct pbl pbl[0]; /* Fragments */

I think it is more appropriate to use a flex array here.

	struct pbl pbl[]; /* Fragments */

Flagged by gcc-13 W=1 allmodconfig build (on x86_64).
Please make sure each patch does not introduce new warnings
for such builds.

> +} __packed;
> +
> +struct txf_desc {
> +	u32 status;
> +	u32 va_lo; /* VAdr[31:0] */
> +	u32 va_hi; /* VAdr[63:32] */
> +	u32 pad;
> +} __packed;
> +
> +/* 32 bit kernels use 16 bits for page_offset. Do not increase
> + * LUXOR__MAX_PAGE_SIZE beyind 64K!

nit: beyond

> + */
> +#if BITS_PER_LONG > 32
> +#define LUXOR__MAX_PAGE_SIZE 0x40000
> +#else
> +#define LUXOR__MAX_PAGE_SIZE 0x10000
> +#endif

...


* Re: [PATCH net-next v1 3/5] net: tn40xx: add basic Tx handling
  2024-04-18 17:23   ` Simon Horman
@ 2024-04-18 17:24     ` Simon Horman
  2024-04-22  7:29     ` FUJITA Tomonori
  1 sibling, 0 replies; 22+ messages in thread
From: Simon Horman @ 2024-04-18 17:24 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: netdev, andrew

On Thu, Apr 18, 2024 at 06:23:06PM +0100, Simon Horman wrote:
> On Mon, Apr 15, 2024 at 07:43:50PM +0900, FUJITA Tomonori wrote:
> > This patch adds device specific structures to initialize the hardware
> > with basic Tx handling. The original driver loads the embedded
> > firmware in the header file. This driver is implemented to use the
> > firmware APIs.
> > 
> > The Tx logic uses three major data structures: two ring buffers shared
> > with the NIC and one database. One ring buffer is used to send the NIC
> > information about packets to be sent. The other is used to get
> > information from the NIC about packets that have been sent. The
> > database is used to keep the information about DMA mappings. After a
> > packet is sent, the db is used to free the resources used for the
> > packet.
> > 
> > Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
> 
> Hi Fujita-san,
> 
> Some review from my side.
> 
> > ---
> >  drivers/net/ethernet/tehuti/Kconfig |    1 +
> >  drivers/net/ethernet/tehuti/tn40.c  | 1251 +++++++++++++++++++++++++++
> >  drivers/net/ethernet/tehuti/tn40.h  |  192 +++-
> >  3 files changed, 1443 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/tehuti/Kconfig b/drivers/net/ethernet/tehuti/Kconfig
> > index 849e3b4a71c1..4198fd59e42e 100644
> > --- a/drivers/net/ethernet/tehuti/Kconfig
> > +++ b/drivers/net/ethernet/tehuti/Kconfig
> > @@ -26,6 +26,7 @@ config TEHUTI
> >  config TEHUTI_TN40
> >  	tristate "Tehuti Networks TN40xx 10G Ethernet adapters"
> >  	depends on PCI
> > +	select FW_LOADER
> >  	help
> >  	  This driver supports 10G Ethernet adapters using Tehuti Networks
> >  	  TN40xx chips. Currently, adapters with Applied Micro Circuits
> 
> Not strictly related to this patch, but did you consider
> adding an entry in the MAINTAINERS file for this driver?

Sorry, somehow I missed that you have done that elsewhere
in this patchset.


* Re: [PATCH net-next v1 1/5] net: tn40xx: add pci driver for Tehuti Networks TN40xx chips
  2024-04-15 14:24   ` Andrew Lunn
@ 2024-04-21  2:28     ` FUJITA Tomonori
  0 siblings, 0 replies; 22+ messages in thread
From: FUJITA Tomonori @ 2024-04-21  2:28 UTC (permalink / raw)
  To: andrew; +Cc: fujita.tomonori, netdev

Hi,

On Mon, 15 Apr 2024 16:24:37 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

>> +#include "tn40.h"
>> +
>> +static int bdx_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> 
> The name space prefix generally has something to do with the driver
> name. What does bdx have to do with Tehuti Network TN40xx?

Not sure. Tehuti TN30xx driver also uses bdx_ prefix. I'll use tn40_
prefix in v2.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v1 3/5] net: tn40xx: add basic Tx handling
  2024-04-18 17:23   ` Simon Horman
  2024-04-18 17:24     ` Simon Horman
@ 2024-04-22  7:29     ` FUJITA Tomonori
  2024-04-22 10:38       ` Simon Horman
  1 sibling, 1 reply; 22+ messages in thread
From: FUJITA Tomonori @ 2024-04-22  7:29 UTC (permalink / raw)
  To: horms; +Cc: fujita.tomonori, netdev, andrew

Hi,

On Thu, 18 Apr 2024 18:23:06 +0100
Simon Horman <horms@kernel.org> wrote:

> On Mon, Apr 15, 2024 at 07:43:50PM +0900, FUJITA Tomonori wrote:
>> This patch adds device specific structures to initialize the hardware
>> with basic Tx handling. The original driver loads the embedded
>> firmware in the header file. This driver is implemented to use the
>> firmware APIs.
>> 
>> The Tx logic uses three major data structures; two ring buffers with
>> NIC and one database. One ring buffer is used to send information
>> about packets to be sent for NIC. The other is used to get information
>> from NIC about packet that are sent. The database is used to keep the
>> information about DMA mapping. After a packet is sent, the db is used
>> to free the resource used for the packet.
>> 
>> Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
> 
> Hi Fujita-san,
> 
> Some review from my side.

Thanks a lot!

>> +static int bdx_fifo_alloc(struct bdx_priv *priv, struct fifo *f, int fsz_type,
>> +			  u16 reg_cfg0, u16 reg_cfg1, u16 reg_rptr, u16 reg_wptr)
> 
> Please consider using a soft limit on line length of 80 characters
> in Networking code.

Sure, fixed.

>> +{
>> +	u16 memsz = FIFO_SIZE * (1 << fsz_type);
> 
> I'm not sure if fsz_type has a meaning here - perhaps it comes from the
> datasheet. But if not, perhaps 'order' would be a more intuitive
> name for this parameter. Similarly for the txd_size and txf_size
> fields of struct bdx_priv, the sz_type field of bdx_tx_db_init(),
> and possibly elsewhere.

I don't have the datasheet of this hardware (so I have to leave alone
many magic values from the original driver).

The driver writes this 'fsz_type' to a register so I guess this is
called something like fifo_size_type for the hardware. I'll rename if
you prefer.


>> +
>> +	memset(f, 0, sizeof(struct fifo));
>> +	/* 1K extra space is allocated at the end of the fifo to simplify
>> +	 * processing of descriptors that wraps around fifo's end.
>> +	 */
>> +	f->va = dma_alloc_coherent(&priv->pdev->dev,
>> +				   memsz + FIFO_EXTRA_SPACE, &f->da, GFP_KERNEL);
>> +	if (!f->va)
>> +		return -ENOMEM;
>> +
>> +	f->reg_cfg0 = reg_cfg0;
>> +	f->reg_cfg1 = reg_cfg1;
>> +	f->reg_rptr = reg_rptr;
>> +	f->reg_wptr = reg_wptr;
>> +	f->rptr = 0;
>> +	f->wptr = 0;
>> +	f->memsz = memsz;
>> +	f->size_mask = memsz - 1;
>> +	write_reg(priv, reg_cfg0, (u32)((f->da & TX_RX_CFG0_BASE) | fsz_type));
> 
> For consistency should this be use H32_64()?:
> 
> 		H32_64((f->da & TX_RX_CFG0_BASE) | fsz_type)

Did you mean L32_64() here?

The driver splits the 64-bit value (f->da) and writes the two halves
to reg_cfg0 and reg_cfg1, right?

>> +	write_reg(priv, reg_cfg1, H32_64(f->da));
>> +	return 0;
>> +}

(snip)

>> +static inline int bdx_tx_db_size(struct txdb *db)
>> +{
>> +	int taken = db->wptr - db->rptr;
>> +
>> +	if (taken < 0)
>> +		taken = db->size + 1 + taken;	/* (size + 1) equals memsz */
>> +	return db->size - taken;
>> +}
> 
> bdx_tx_db_size seems to be unused. Perhaps it can be removed.
> 
> Flagged by W=1 build with clang-18.

My bad, removed.

> ...
> 
>> +/* Sizes of tx desc (including padding if needed) as function of the SKB's
>> + * frag number
>> + */
>> +static struct {
>> +	u16 bytes;
>> +	u16 qwords;		/* qword = 64 bit */
>> +} txd_sizes[MAX_PBL];
>> +
>> +inline void bdx_set_pbl(struct pbl *pbl, dma_addr_t dma, int len)
>> +{
>> +	pbl->len = cpu_to_le32(len);
>> +	pbl->pa_lo = cpu_to_le32(L32_64(dma));
>> +	pbl->pa_hi = cpu_to_le32(H32_64(dma));
>> +}
> 
> The type of the pa_lo and pa_high fields of struct pbl are both u32.
> But here they are assigned little-endian values. This doesn't seem right
> (I expect the types of the fields should be changed to __le32).

Right, fixed. I use __le* for all the members in txd_desc and pbl.

> This was flagged by Sparse, along with a number of other problems
> (which I didn't look into). Please ensure that patches do
> not introduce new Sparse warnings.

Sorry, I'll try.

> ...
> 
>> +/**
>> + * txdb_map_skb - create and store DMA mappings for skb's data blocks
> 
> nit: bdx_tx_map_skb
> 
> Flagged by ./scripts/kernel-doc -Wall -none

Oops, fixed.

>> + * @priv: NIC private structure
>> + * @skb: socket buffer to map
>> + * @txdd: pointer to tx descriptor to be updated
>> + * @pkt_len: pointer to unsigned long value
>> + *
>> + * This function creates DMA mappings for skb's data blocks and writes them to
>> + * PBL of a new tx descriptor. It also stores them in the tx db, so they could
>> + * be unmapped after the data has been sent. It is the responsibility of the
>> + * caller to make sure that there is enough space in the txdb. The last
>> + * element holds a pointer to skb itself and is marked with a zero length.
>> + *
>> + * Return: 0 on success and negative value on error.
>> + */
>> +static inline int bdx_tx_map_skb(struct bdx_priv *priv, struct sk_buff *skb,
>> +				 struct txd_desc *txdd, unsigned int *pkt_len)
>> +{
>> +	dma_addr_t dma;
>> +	int i, len, err;
>> +	struct txdb *db = &priv->txdb;
>> +	struct pbl *pbl = &txdd->pbl[0];
>> +	int nr_frags = skb_shinfo(skb)->nr_frags;
>> +	unsigned int size;
>> +	struct mapping_info info[MAX_PBL];
> 
> nit: Please arrange local variables in new Networking code in reverse
>      xmas tree order - longest line to shortest.
> 
>      This tool is your friend: https://github.com/ecree-solarflare/xmastree

Fixed all the places.

>> +
>> +	netdev_dbg(priv->ndev, "TX skb %p skbLen %d dataLen %d frags %d\n", skb,
>> +		   skb->len, skb->data_len, nr_frags);
>> +	if (nr_frags > MAX_PBL - 1) {
>> +		err = skb_linearize(skb);
>> +		if (err)
>> +			return -1;
>> +		nr_frags = skb_shinfo(skb)->nr_frags;
>> +	}
>> +	/* initial skb */
>> +	len = skb->len - skb->data_len;
>> +	dma = dma_map_single(&priv->pdev->dev, skb->data, len,
>> +			     DMA_TO_DEVICE);
>> +	if (dma_mapping_error(&priv->pdev->dev, dma))
>> +		return -1;
> 
> As I believe Andrew mentioned elsewhere, it's best
> to use standard error values. Perhaps here:
> 
> 	ret = dma_mapping_error(...);
> 	if (ret)
> 		return ret;

Fixed.

>> +
>> +	bdx_set_txdb(db, dma, len);
>> +	bdx_set_pbl(pbl++, db->wptr->addr.dma, db->wptr->len);
>> +	*pkt_len = db->wptr->len;
>> +
>> +	for (i = 0; i < nr_frags; i++) {
>> +		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
>> +
>> +		size = skb_frag_size(frag);
>> +		dma = skb_frag_dma_map(&priv->pdev->dev, frag, 0,
>> +				       size, DMA_TO_DEVICE);
>> +
>> +		if (dma_mapping_error(&priv->pdev->dev, dma))
>> +			goto mapping_error;
>> +		info[i].dma = dma;
>> +		info[i].size = size;
>> +	}
>> +
>> +	for (i = 0; i < nr_frags; i++) {
>> +		bdx_tx_db_inc_wptr(db);
>> +		bdx_set_txdb(db, info[i].dma, info[i].size);
>> +		bdx_set_pbl(pbl++, db->wptr->addr.dma, db->wptr->len);
>> +		*pkt_len += db->wptr->len;
>> +	}
>> +
>> +	/* SHORT_PKT_FIX */
>> +	if (skb->len < SHORT_PACKET_SIZE)
>> +		++nr_frags;
>> +
>> +	/* Add skb clean up info. */
>> +	bdx_tx_db_inc_wptr(db);
>> +	db->wptr->len = -txd_sizes[nr_frags].bytes;
>> +	db->wptr->addr.skb = skb;
>> +	bdx_tx_db_inc_wptr(db);
>> +
>> +	return 0;
>> + mapping_error:
>> +	dma_unmap_page(&priv->pdev->dev, db->wptr->addr.dma, db->wptr->len, DMA_TO_DEVICE);
>> +	for (; i > 0; i--)
>> +		dma_unmap_page(&priv->pdev->dev, info[i - 1].dma, info[i - 1].size, DMA_TO_DEVICE);
>> +	return -1;
> 
> And here:
> 
> 	return -ENOMEM;

Fixed.

>> +static int create_tx_ring(struct bdx_priv *priv)
>> +{
>> +	int ret;
>> +
>> +	ret = bdx_fifo_alloc(priv, &priv->txd_fifo0.m, priv->txd_size,
>> +			     REG_TXD_CFG0_0, REG_TXD_CFG1_0,
>> +			     REG_TXD_RPTR_0, REG_TXD_WPTR_0);
>> +	if (ret)
>> +		return ret;
>> +
>> +	ret = bdx_fifo_alloc(priv, &priv->txf_fifo0.m, priv->txf_size,
>> +			     REG_TXF_CFG0_0, REG_TXF_CFG1_0,
>> +			     REG_TXF_RPTR_0, REG_TXF_WPTR_0);
>> +	if (ret)
>> +		goto err_free_txd;
>> +
>> +	/* The TX db has to keep mappings for all packets sent (on
>> +	 * TxD) and not yet reclaimed (on TxF).
>> +	 */
>> +	ret = bdx_tx_db_init(&priv->txdb, max(priv->txd_size, priv->txf_size));
>> +	if (ret)
>> +		goto err_free_txf;
>> +
>> +	/* SHORT_PKT_FIX */
>> +	priv->b0_len = 64;
>> +	priv->b0_va =
>> +		dma_alloc_coherent(&priv->pdev->dev, priv->b0_len, &priv->b0_dma, GFP_KERNEL);
> 
> nit: I think this indentation would be more in keeping with normal practices:
> 
> 	priv->b0_va = dma_alloc_coherent(&priv->pdev->dev, priv->b0_len,
> 					 &priv->b0_dma, GFP_KERNEL);

Fixed.

>> +	if (!priv->b0_va)
>> +		goto err_free_db;
> 
> The goto above will result in the function returning ret.
> But ret is 0 here. Should it be set to a negative error value,
> f.e. -ENOMEM?
> 
> Flagged by Smatch.

Oops, fixed.

>> +
>> +	priv->tx_level = BDX_MAX_TX_LEVEL;
>> +	priv->tx_update_mark = priv->tx_level - 1024;
>> +	return 0;
>> +err_free_db:
>> +	bdx_tx_db_close(&priv->txdb);
>> +err_free_txf:
>> +	bdx_fifo_free(priv, &priv->txf_fifo0.m);
>> +err_free_txd:
>> +	bdx_fifo_free(priv, &priv->txd_fifo0.m);
>> +	return ret;
>> +}
>> +
>> +/**
>> + * bdx_tx_space - Calculate the available space in the TX fifo.
>> + *
>> + * @priv - NIC private structure
> 
> nit: '@priv: NIC private structure'

Fixed.

>> + * Return: available space in TX fifo in bytes
>> + */
> 
> ...
> 
>> +static int bdx_set_link_speed(struct bdx_priv *priv, u32 speed)
>> +{
>> +	int i;
>> +	u32 val;
>> +
>> +	netdev_dbg(priv->ndev, "speed %d\n", speed);
>> +
>> +	switch (speed) {
>> +	case SPEED_10000:
>> +	case SPEED_5000:
>> +	case SPEED_2500:
>> +		netdev_dbg(priv->ndev, "link_speed %d\n", speed);
>> +
> 
> There are a lot of magic numbers below.
> Could these be converted into #defines with meaningful names?

Without the datasheet, I'm not sure what names are appropriate.

>> +		write_reg(priv, 0x1010, 0x217);	/*ETHSD.REFCLK_CONF  */
>> +		write_reg(priv, 0x104c, 0x4c);	/*ETHSD.L0_RX_PCNT  */
>> +		write_reg(priv, 0x1050, 0x4c);	/*ETHSD.L1_RX_PCNT  */
>> +		write_reg(priv, 0x1054, 0x4c);	/*ETHSD.L2_RX_PCNT  */
>> +		write_reg(priv, 0x1058, 0x4c);	/*ETHSD.L3_RX_PCNT  */
>> +		write_reg(priv, 0x102c, 0x434);	/*ETHSD.L0_TX_PCNT  */
>> +		write_reg(priv, 0x1030, 0x434);	/*ETHSD.L1_TX_PCNT  */
>> +		write_reg(priv, 0x1034, 0x434);	/*ETHSD.L2_TX_PCNT  */
>> +		write_reg(priv, 0x1038, 0x434);	/*ETHSD.L3_TX_PCNT  */
>> +		write_reg(priv, 0x6300, 0x0400);	/*MAC.PCS_CTRL */
> 
> ...
> 
>> +static int bdx_hw_reset(struct bdx_priv *priv)
>> +{
>> +	u32 val, i;
>> +
>> +	/* Reset sequences: read, write 1, read, write 0 */
>> +	val = read_reg(priv, REG_CLKPLL);
>> +	write_reg(priv, REG_CLKPLL, (val | CLKPLL_SFTRST) + 0x8);
>> +	udelay(50);
> 
> Checkpatch recommends using usleep_range() here
> after consulting with Documentation/timers/timers-howto.rst.
> TBH, I'm unsure of the merit of this advice.

Yeah, I ran checkpatch but I don't have the datasheet, so I'm not sure
what ranges are appropriate.


>> +	val = read_reg(priv, REG_CLKPLL);
>> +	write_reg(priv, REG_CLKPLL, val & ~CLKPLL_SFTRST);
>> +
>> +	/* Check that the PLLs are locked and reset ended */
>> +	for (i = 0; i < 70; i++, mdelay(10)) {
>> +		if ((read_reg(priv, REG_CLKPLL) & CLKPLL_LKD) == CLKPLL_LKD) {
>> +			udelay(50);
> 
> Ditto.
> 
>> +			/* Do any PCI-E read transaction */
>> +			read_reg(priv, REG_RXD_CFG0_0);
>> +			return 0;
>> +		}
>> +	}
>> +	return 1;
>> +}
>> +
>> +static int bdx_sw_reset(struct bdx_priv *priv)
> 
> nit: This function always returns zero, and the caller ignores the
>      return value. It's return type could be void.

Fixed.

>> +static void bdx_setmulti(struct net_device *ndev)
>> +{
>> +	struct bdx_priv *priv = netdev_priv(ndev);
>> +
>> +	u32 rxf_val =
>> +	    GMAC_RX_FILTER_AM | GMAC_RX_FILTER_AB | GMAC_RX_FILTER_OSEN |
>> +	    GMAC_RX_FILTER_TXFC;
>> +	int i;
>> +
>> +	/* IMF - imperfect (hash) rx multicast filter */
>> +	/* PMF - perfect rx multicast filter */
>> +
>> +	/* FIXME: RXE(OFF) */
> 
> Is there a plan to fix this, and the TBD below?

I just left the original code comments alone. Not sure what I should do
here; better to remove them completely?

>> diff --git a/drivers/net/ethernet/tehuti/tn40.h b/drivers/net/ethernet/tehuti/tn40.h
> 
> ...
> 
>> +#if BITS_PER_LONG == 64
>> +#define H32_64(x) ((u32)((u64)(x) >> 32))
>> +#define L32_64(x) ((u32)((u64)(x) & 0xffffffff))
>> +#elif BITS_PER_LONG == 32
>> +#define H32_64(x) 0
>> +#define L32_64(x) ((u32)(x))
>> +#else /* BITS_PER_LONG == ?? */
>> +#error BITS_PER_LONG is undefined. Must be 64 or 32
>> +#endif /* BITS_PER_LONG */
> 
> I am curious to know why it is valid to mask off the upper 64 bits
> on systems with 32 bit longs. As far as I see this is used
> in conjunction for supplying DMA addresses to the NIC.
> But it's not obvious how that relates to the length
> of longs on the host.

I suppose that the original driver tries to use the length of
dma_addr_t (CONFIG_ARCH_DMA_ADDR_T_64BIT?) here.

> Probably I'm missing something very obvious here.
> But if not, my follow-up would be to suggest using
> upper_32_bits() and lower_32_bits().

Do you prefer to remove the ifdef

#define H32_64(x) upper_32_bits(x)
#define L32_64(x) lower_32_bits(x)

?

or replace H32_64 and L32_64 with upper_32_bits and lower_32_bits
respectively?

>> +#define BDX_TXF_DESC_SZ 16
>> +#define BDX_MAX_TX_LEVEL (priv->txd_fifo0.m.memsz - 16)
>> +#define BDX_MIN_TX_LEVEL 256
>> +#define BDX_NO_UPD_PACKETS 40
>> +#define BDX_MAX_MTU BIT(14)
>> +
>> +#define PCK_TH_MULT 128
>> +#define INT_COAL_MULT 2
>> +
>> +#define BITS_MASK(nbits) ((1 << (nbits)) - 1)
> 
>> +#define GET_BITS_SHIFT(x, nbits, nshift) (((x) >> (nshift)) & BITS_MASK(nbits))
>> +#define BITS_SHIFT_MASK(nbits, nshift) (BITS_MASK(nbits) << (nshift))
>> +#define BITS_SHIFT_VAL(x, nbits, nshift) (((x) & BITS_MASK(nbits)) << (nshift))
>> +#define BITS_SHIFT_CLEAR(x, nbits, nshift) \
>> +	((x) & (~BITS_SHIFT_MASK(nbits, (nshift))))
>> +
>> +#define GET_INT_COAL(x) GET_BITS_SHIFT(x, 15, 0)
>> +#define GET_INT_COAL_RC(x) GET_BITS_SHIFT(x, 1, 15)
>> +#define GET_RXF_TH(x) GET_BITS_SHIFT(x, 4, 16)
>> +#define GET_PCK_TH(x) GET_BITS_SHIFT(x, 4, 20)
> 
> It feels like using of GENMASK and FIELD_GET is appropriate here.

Sure, I'll replace the above macros with GENMASK and FIELD_GET. 

>> +#define INT_REG_VAL(coal, coal_rc, rxf_th, pck_th) \
>> +	((coal) | ((coal_rc) << 15) | ((rxf_th) << 16) | ((pck_th) << 20))
> 
> This looks like a candidate for GENMASK and FIELD_PREP.

like the following?

#define INT_REG_VAL(coal, coal_rc, rxf_th, pck_th)      \
	FIELD_PREP(GENMASK(14, 0), (coal)) |            \
	FIELD_PREP(GENMASK(15, 15), (coal_rc)) |        \
	FIELD_PREP(GENMASK(19, 16), (rxf_th)) |         \
	FIELD_PREP(GENMASK(31, 20), (pck_th))


>> +#define MAX_PBL (19)
>> +/* PBL describes each virtual buffer to be transmitted from the host. */
>> +struct pbl {
>> +	u32 pa_lo;
>> +	u32 pa_hi;
>> +	u32 len;
>> +};
>> +
>> +/* First word for TXD descriptor. It means: type = 3 for regular Tx packet,
>> + * hw_csum = 7 for IP+UDP+TCP HW checksums.
>> + */
>> +#define TXD_W1_VAL(bc, checksum, vtag, lgsnd, vlan_id)               \
>> +	((bc) | ((checksum) << 5) | ((vtag) << 8) | ((lgsnd) << 9) | \
>> +	 (0x30000) | ((vlan_id & 0x0fff) << 20) |                    \
>> +	 (((vlan_id >> 13) & 7) << 13))
> 
> Likewise, here.
> 
> Also, it would be nice to use a #define for 0x3000
> (or 0x3 used with FIELD_PREP with a mask of 0xffff).

Understood, I'll try.


>> +struct txd_desc {
>> +	u32 txd_val1;
>> +	u16 mss;
>> +	u16 length;
>> +	u32 va_lo;
>> +	u32 va_hi;
>> +	struct pbl pbl[0]; /* Fragments */
> 
> I think it is more appropriate to use a flex array here.
> 
> 	struct pbl pbl[]; /* Fragments */
> 
> Flagged by gcc-13 W=1 allmodconfig build (on x86_64).
> Please make sure each patch does not introduce new warnings
> for such builds.

Fixed. Seems that gcc-12 doesn't complain. I'll set up and try gcc-13.


>> +} __packed;
>> +
>> +struct txf_desc {
>> +	u32 status;
>> +	u32 va_lo; /* VAdr[31:0] */
>> +	u32 va_hi; /* VAdr[63:32] */
>> +	u32 pad;
>> +} __packed;
>> +
>> +/* 32 bit kernels use 16 bits for page_offset. Do not increase
>> + * LUXOR__MAX_PAGE_SIZE beyind 64K!
> 
> nit: beyond

Oops, fixed.


Thanks a lot!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v1 3/5] net: tn40xx: add basic Tx handling
  2024-04-22  7:29     ` FUJITA Tomonori
@ 2024-04-22 10:38       ` Simon Horman
  0 siblings, 0 replies; 22+ messages in thread
From: Simon Horman @ 2024-04-22 10:38 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: netdev, andrew

On Mon, Apr 22, 2024 at 04:29:13PM +0900, FUJITA Tomonori wrote:
> Hi,
> 
> On Thu, 18 Apr 2024 18:23:06 +0100
> Simon Horman <horms@kernel.org> wrote:
> 
> > On Mon, Apr 15, 2024 at 07:43:50PM +0900, FUJITA Tomonori wrote:
> >> This patch adds device specific structures to initialize the hardware
> >> with basic Tx handling. The original driver loads the embedded
> >> firmware in the header file. This driver is implemented to use the
> >> firmware APIs.
> >> 
> >> The Tx logic uses three major data structures; two ring buffers with
> >> NIC and one database. One ring buffer is used to send information
> >> about packets to be sent for NIC. The other is used to get information
> >> from NIC about packet that are sent. The database is used to keep the
> >> information about DMA mapping. After a packet is sent, the db is used
> >> to free the resource used for the packet.
> >> 
> >> Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
> > 
> > Hi Fujita-san,
> > 
> > Some review from my side.
> 
> Thanks a lot!

Likewise, thanks for addressing most of my concerns.

> >> +static int bdx_fifo_alloc(struct bdx_priv *priv, struct fifo *f, int fsz_type,
> >> +			  u16 reg_cfg0, u16 reg_cfg1, u16 reg_rptr, u16 reg_wptr)
> > 
> > Please consider using a soft limit on line length of 80 characters
> > in Networking code.
> 
> Sure, fixed.
> 
> >> +{
> >> +	u16 memsz = FIFO_SIZE * (1 << fsz_type);
> > 
> > I'm not sure if fsz_type has a meaning here - perhaps it comes from the
> > datasheet. But if not, perhaps 'order' would be a more intuitive
> > name for this parameter. Similarly for the txd_size and txf_size
> > fields of struct bdx_priv, the sz_type field of bdx_tx_db_init(),
> > and possibly elsewhere.
> 
> I don't have the datasheet of this hardware (so I have to leave alone
> many magic values from the original driver).
> 
> The driver writes this 'fsz_type' to a register so I guess this is
> called something like fifo_size_type for the hardware. I'll rename if
> you prefer.

No strong preference here.

> >> +
> >> +	memset(f, 0, sizeof(struct fifo));
> >> +	/* 1K extra space is allocated at the end of the fifo to simplify
> >> +	 * processing of descriptors that wraps around fifo's end.
> >> +	 */
> >> +	f->va = dma_alloc_coherent(&priv->pdev->dev,
> >> +				   memsz + FIFO_EXTRA_SPACE, &f->da, GFP_KERNEL);
> >> +	if (!f->va)
> >> +		return -ENOMEM;
> >> +
> >> +	f->reg_cfg0 = reg_cfg0;
> >> +	f->reg_cfg1 = reg_cfg1;
> >> +	f->reg_rptr = reg_rptr;
> >> +	f->reg_wptr = reg_wptr;
> >> +	f->rptr = 0;
> >> +	f->wptr = 0;
> >> +	f->memsz = memsz;
> >> +	f->size_mask = memsz - 1;
> >> +	write_reg(priv, reg_cfg0, (u32)((f->da & TX_RX_CFG0_BASE) | fsz_type));
> > 
> > For consistency should this be use H32_64()?:
> > 
> > 		H32_64((f->da & TX_RX_CFG0_BASE) | fsz_type)
> 
> Did you mean L32_64() here?
>
> The driver splits the 64-bit value (f->da) and writes the two halves
> to reg_cfg0 and reg_cfg1, right?

Yes, my mistake. L32_64() seems appropriate here.

...

> > There are a lot of magic numbers below.
> > Could these be converted into #defines with meaningful names?
> 
> Without the datasheet, I'm not sure what names are appropriate.

Ok, understood.

> >> +		write_reg(priv, 0x1010, 0x217);	/*ETHSD.REFCLK_CONF  */
> >> +		write_reg(priv, 0x104c, 0x4c);	/*ETHSD.L0_RX_PCNT  */
> >> +		write_reg(priv, 0x1050, 0x4c);	/*ETHSD.L1_RX_PCNT  */
> >> +		write_reg(priv, 0x1054, 0x4c);	/*ETHSD.L2_RX_PCNT  */
> >> +		write_reg(priv, 0x1058, 0x4c);	/*ETHSD.L3_RX_PCNT  */
> >> +		write_reg(priv, 0x102c, 0x434);	/*ETHSD.L0_TX_PCNT  */
> >> +		write_reg(priv, 0x1030, 0x434);	/*ETHSD.L1_TX_PCNT  */
> >> +		write_reg(priv, 0x1034, 0x434);	/*ETHSD.L2_TX_PCNT  */
> >> +		write_reg(priv, 0x1038, 0x434);	/*ETHSD.L3_TX_PCNT  */
> >> +		write_reg(priv, 0x6300, 0x0400);	/*MAC.PCS_CTRL */
> > 
> > ...
> > 
> >> +static int bdx_hw_reset(struct bdx_priv *priv)
> >> +{
> >> +	u32 val, i;
> >> +
> >> +	/* Reset sequences: read, write 1, read, write 0 */
> >> +	val = read_reg(priv, REG_CLKPLL);
> >> +	write_reg(priv, REG_CLKPLL, (val | CLKPLL_SFTRST) + 0x8);
> >> +	udelay(50);
> > 
> > Checkpatch recommends using usleep_range() here
> > after consulting with Documentation/timers/timers-howto.rst.
> > TBH, I'm unsure of the merit of this advice.
> 
> Yeah, I ran checkpatch but I don't have the datasheet, so I'm not sure
> what ranges are appropriate.

I'd guess that a range of 50 - 100 would be fine.
But I take your point about not having the datasheet,
so perhaps it is safest to just keep the udelay() for now.

> 
> 
> >> +	val = read_reg(priv, REG_CLKPLL);
> >> +	write_reg(priv, REG_CLKPLL, val & ~CLKPLL_SFTRST);
> >> +
> >> +	/* Check that the PLLs are locked and reset ended */
> >> +	for (i = 0; i < 70; i++, mdelay(10)) {
> >> +		if ((read_reg(priv, REG_CLKPLL) & CLKPLL_LKD) == CLKPLL_LKD) {
> >> +			udelay(50);
> > 
> > Ditto.
> > 
> >> +			/* Do any PCI-E read transaction */
> >> +			read_reg(priv, REG_RXD_CFG0_0);
> >> +			return 0;
> >> +		}
> >> +	}
> >> +	return 1;
> >> +}
> >> +
> >> +static int bdx_sw_reset(struct bdx_priv *priv)
> > 
> > nit: This function always returns zero, and the caller ignores the
> >      return value. It's return type could be void.
> 
> Fixed.
> 
> >> +static void bdx_setmulti(struct net_device *ndev)
> >> +{
> >> +	struct bdx_priv *priv = netdev_priv(ndev);
> >> +
> >> +	u32 rxf_val =
> >> +	    GMAC_RX_FILTER_AM | GMAC_RX_FILTER_AB | GMAC_RX_FILTER_OSEN |
> >> +	    GMAC_RX_FILTER_TXFC;
> >> +	int i;
> >> +
> >> +	/* IMF - imperfect (hash) rx multicast filter */
> >> +	/* PMF - perfect rx multicast filter */
> >> +
> >> +	/* FIXME: RXE(OFF) */
> > 
> > Is there a plan to fix this, and the TBD below?
> 
> I just left the original code comments alone. Not sure what I should do
> here; better to remove them completely?

Usually it's best not to have such comments.
But if it's a carry-over from existing code,
then I suppose it is best to leave it as is.

> 
> >> diff --git a/drivers/net/ethernet/tehuti/tn40.h b/drivers/net/ethernet/tehuti/tn40.h
> > 
> > ...
> > 
> >> +#if BITS_PER_LONG == 64
> >> +#define H32_64(x) ((u32)((u64)(x) >> 32))
> >> +#define L32_64(x) ((u32)((u64)(x) & 0xffffffff))
> >> +#elif BITS_PER_LONG == 32
> >> +#define H32_64(x) 0
> >> +#define L32_64(x) ((u32)(x))
> >> +#else /* BITS_PER_LONG == ?? */
> >> +#error BITS_PER_LONG is undefined. Must be 64 or 32
> >> +#endif /* BITS_PER_LONG */
> > 
> > I am curious to know why it is valid to mask off the upper 64 bits
> > on systems with 32 bit longs. As far as I see this is used
> > in conjunction for supplying DMA addresses to the NIC.
> > But it's not obvious how that relates to the length
> > of longs on the host.
> 
> I suppose that the original driver tries to use the length of
> dma_addr_t (CONFIG_ARCH_DMA_ADDR_T_64BIT?) here.
> 
> > Probably I'm missing something very obvious here.
> > But if not, my follow-up would be to suggest using
> > upper_32_bits() and lower_32_bits().
> 
> Do you prefer to remove the ifdef 
> 
> #define H32_64(x) upper_32_bits(x)
> #define L32_64(x) lower_32_bits(x)
> 
> ?
> 
> or replace H32_64 and L32_64 with upper_32_bits and lower_32_bits
> respectively?

I'd go with the last option if you think it is safe to do so.
But if you think it's a bit risky, perhaps it's best to keep
the code as is for now.

> >> +#define BDX_TXF_DESC_SZ 16
> >> +#define BDX_MAX_TX_LEVEL (priv->txd_fifo0.m.memsz - 16)
> >> +#define BDX_MIN_TX_LEVEL 256
> >> +#define BDX_NO_UPD_PACKETS 40
> >> +#define BDX_MAX_MTU BIT(14)
> >> +
> >> +#define PCK_TH_MULT 128
> >> +#define INT_COAL_MULT 2
> >> +
> >> +#define BITS_MASK(nbits) ((1 << (nbits)) - 1)
> > 
> >> +#define GET_BITS_SHIFT(x, nbits, nshift) (((x) >> (nshift)) & BITS_MASK(nbits))
> >> +#define BITS_SHIFT_MASK(nbits, nshift) (BITS_MASK(nbits) << (nshift))
> >> +#define BITS_SHIFT_VAL(x, nbits, nshift) (((x) & BITS_MASK(nbits)) << (nshift))
> >> +#define BITS_SHIFT_CLEAR(x, nbits, nshift) \
> >> +	((x) & (~BITS_SHIFT_MASK(nbits, (nshift))))
> >> +
> >> +#define GET_INT_COAL(x) GET_BITS_SHIFT(x, 15, 0)
> >> +#define GET_INT_COAL_RC(x) GET_BITS_SHIFT(x, 1, 15)
> >> +#define GET_RXF_TH(x) GET_BITS_SHIFT(x, 4, 16)
> >> +#define GET_PCK_TH(x) GET_BITS_SHIFT(x, 4, 20)
> > 
> > It feels like using of GENMASK and FIELD_GET is appropriate here.
> 
> Sure, I'll replace the above macros with GENMASK and FIELD_GET. 
> 
> >> +#define INT_REG_VAL(coal, coal_rc, rxf_th, pck_th) \
> >> +	((coal) | ((coal_rc) << 15) | ((rxf_th) << 16) | ((pck_th) << 20))
> > 
> > This looks like a candidate for GENMASK and FIELD_PREP.
> 
> like the following?
> 
> #define INT_REG_VAL(coal, coal_rc, rxf_th, pck_th)      \
> 	FIELD_PREP(GENMASK(14, 0), (coal)) |            \
> 	FIELD_PREP(GENMASK(15, 15), (coal_rc)) |        \
> 	FIELD_PREP(GENMASK(19, 16), (rxf_th)) |         \
> 	FIELD_PREP(GENMASK(31, 20), (pck_th))

Yes, I think so.
I think you can use BIT(15) in place of GENMASK(15, 15).

...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v1 5/5] net: tn40xx: add PHYLIB support
  2024-04-16 12:57       ` Andrew Lunn
@ 2024-04-25  1:24         ` FUJITA Tomonori
  2024-04-25 14:37           ` Andrew Lunn
  0 siblings, 1 reply; 22+ messages in thread
From: FUJITA Tomonori @ 2024-04-25  1:24 UTC (permalink / raw)
  To: andrew; +Cc: fujita.tomonori, netdev

Hi,

On Tue, 16 Apr 2024 14:57:58 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

>> > Are there variants of this device using SFP? It might be you actually
>> > want to use phylink, not phylib. That is a bit messy for a PCI device,
>> > look at drivers/net/ethernet/wangxun.
>> 
>> phylink is necessary if PHY is hot-pluggable, right? if so, the driver
>> doesn't need it. The PHYs that adapters with TN40XX use are
> 
> There is more to it than that. phylib has problems when the bandwidth
> is > 1G and the MAC/PHY link becomes more problematic. Often the PHY
> will change this link depending on what the media side is doing. If
> you have a 1G SFP inserted, the QT2025 will change the MAC/PHY link to
> 1000BaseX. If it has a 10G SFP it will use XAUI. phylink knows how to
> decode the SFP EEPROM to determine what sort of module it is, and how
> the PHY should be configured.
> 
> To fully support this hardware you are going to need to use phylink.

I updated the code to use phylink and posted v2. At least it seems to
work with 10G SFP+ as before.

I suppose that more changes are necessary for full support. For
example, with 1G SFP inserted, the MAC driver has to configure the
hardware for 1G. I'll investigate once I get 1G SFP.

Note that the original driver supports only 10G SFP+.


>> >> diff --git a/drivers/net/ethernet/tehuti/Kconfig b/drivers/net/ethernet/tehuti/Kconfig
>> >> index 4198fd59e42e..71f22471f9a0 100644
>> >> --- a/drivers/net/ethernet/tehuti/Kconfig
>> >> +++ b/drivers/net/ethernet/tehuti/Kconfig
>> >> @@ -27,6 +27,7 @@ config TEHUTI_TN40
>> >>  	tristate "Tehuti Networks TN40xx 10G Ethernet adapters"
>> >>  	depends on PCI
>> >>  	select FW_LOADER
>> >> +	select AMCC_QT2025_PHY
>> > 
>> > That is pretty unusual, especially when you say there are a few
>> > different choices.
>> 
>> I should not put any 'select *_PHY' here?
> 
> Correct. Most distributions just package everything.
> 
> We are going to get into an odd corner case that since Rust is still
> experimental, i doubt distributions are building Rust modules. So they
> will end up with a MAC driver but no PHY driver, at least not for the
> QT2025. The Marvell and Aquantia PHY should just work.
> 
> Anybody who does want to use the QT2025 will either need to
> build there own kernel, or black list the in kernel MAC driver and use
> the out of tree driver. But eventually, Rust will start to be
> packaged, and then it should work out O.K.

Sure, I dropped the above in v2.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v1 5/5] net: tn40xx: add PHYLIB support
  2024-04-25  1:24         ` FUJITA Tomonori
@ 2024-04-25 14:37           ` Andrew Lunn
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Lunn @ 2024-04-25 14:37 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: netdev

> I updated the code to use phylink and posted v2. At least it seems to
> work with 10G SFP+ as before.
> 
> I suppose that more changes are necessary for full support. For
> example, with 1G SFP inserted, the MAC driver has to configure the
> hardware for 1G. I'll investigate once I get 1G SFP.
> 
> Note that the original driver supports only 10G SFP+.

This is where the Rust PHY driver gets more interesting.

A PHY which is being used as a media converter needs to support a few
additional things. The marvel10g driver is a good example to follow
for some things. Look at mv3310_sfp_ops.

But there is more to it than that. phylink assumes it has access to
the i2c bus the module is on. The datasheet for the QT2025, shows a
couple of diagrams how an SFP is connected to it. The QT2025 has an
I2C bus which should be connected to the SFP cage. So you need to
export this I2C bus master in the PHY driver. So a Rust I2C bus
driver. Has anybody done that yet? However, it is not clear from my
reading of the datasheet if you get true access to the I2C bus. It
says:

XFP/SFP+ Module Access through MDIO

The MDIO interface can be used to access an XFP or
SFP+ module. The XFP/SFP+ module 2-wire interface
must be connected to the UC_SCL and UC_SDA clock
and data lines. The XFP module address is 1010000,
while the SFP+ module uses memory at addresses
1010000 and 1010001. The entire module address
space will be automatically read upon powerup, reset
or module hotplug by detection of the MOD_ABS
signal. A 400ms delay is observed before upload to
allow the module to initialize.
The memory at module address 1010000 is mapped to
MDIO register range 3.D000 - 3.D0FFh. Read/write
access to the module memory is controlled by MDIO
register 3.D100h. This applies to both module types.

The memory at module address 1010001 is mapped to
MDIO register range 1.8007 - 1.8106h. No read/write
access to the module memory is provided.

DOM Memory Access
The SFP+ DOM memory (A2) is mapped to MDIO
registers 1.8007-1.8106h (the NVR address space) or
alternatively to 1.A000-1.A0FF (this is firmware load
dependent; it may be configurable). If mapped to
1.A000-1.A0FF, DOM-related alarms in the SFP+
module will feed into the LASI alarm tree (see “Link
Alarm Status Interrupt Pin (LASI)” on page 74 for
details).

Later firmware versions implement a DOM periodic
polling feature, where the DOM memory is read at
every 1s. Optical alarms will then automatically alert
the host system through the LASI interrupt pins. Only a
subset of registers containing dynamically changing
values are polled on each update. Consult AMCC for
details on this feature.

So I guess you are going to need to fake the I2C bus, mapping I2C
transfer requests into MDIO reads of the mapped memory.

Additionally, phylink expects a few GPIOs for Loss of Signal, TX
Enable, TX Fault, etc. The PHY has these signals, so the PHY driver
needs to export a GPIO driver as well.

And then you need some glue, to bring all the parts together. The
wangxun Ethernet driver has mostly solved this, so you can take
inspiration from there.

You picked an interesting device to add Rust support for.

	Andrew


end of thread, other threads:[~2024-04-25 14:37 UTC | newest]

Thread overview: 22+ messages
2024-04-15 10:43 [PATCH net-next v1 0/5] add ethernet driver for Tehuti Networks TN40xx chips FUJITA Tomonori
2024-04-15 10:43 ` [PATCH net-next v1 1/5] net: tn40xx: add pci " FUJITA Tomonori
2024-04-15 12:39   ` kernel test robot
2024-04-15 14:24   ` Andrew Lunn
2024-04-21  2:28     ` FUJITA Tomonori
2024-04-15 10:43 ` [PATCH net-next v1 2/5] net: tn40xx: add register defines FUJITA Tomonori
2024-04-15 10:43 ` [PATCH net-next v1 3/5] net: tn40xx: add basic Tx handling FUJITA Tomonori
2024-04-15 22:29   ` kernel test robot
2024-04-18 17:23   ` Simon Horman
2024-04-18 17:24     ` Simon Horman
2024-04-22  7:29     ` FUJITA Tomonori
2024-04-22 10:38       ` Simon Horman
2024-04-15 10:43 ` [PATCH net-next v1 4/5] net: tn40xx: add basic Rx handling FUJITA Tomonori
2024-04-16  0:38   ` kernel test robot
2024-04-15 10:43 ` [PATCH net-next v1 5/5] net: tn40xx: add PHYLIB support FUJITA Tomonori
2024-04-15 14:44   ` Andrew Lunn
2024-04-16 12:19     ` FUJITA Tomonori
2024-04-16 12:57       ` Andrew Lunn
2024-04-25  1:24         ` FUJITA Tomonori
2024-04-25 14:37           ` Andrew Lunn
2024-04-15 23:21 ` [PATCH net-next v1 0/5] add ethernet driver for Tehuti Networks TN40xx chips Florian Fainelli
2024-04-16 10:59   ` FUJITA Tomonori
