* [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
@ 2024-04-03 20:08 Alexander Duyck
  2024-04-03 20:08 ` [net-next PATCH 01/15] PCI: Add Meta Platforms vendor ID Alexander Duyck
                   ` (18 more replies)
  0 siblings, 19 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 20:08 UTC (permalink / raw)
  To: netdev; +Cc: bhelgaas, linux-pci, Alexander Duyck, kuba, davem, pabeni

This patch set includes the necessary patches to enable basic Tx and Rx
over the Meta Platforms Host Network Interface. To do this we introduce a
new driver and directory in the form of
"drivers/net/ethernet/meta/fbnic".

Due to submission limits, the plan is to submit a minimal driver for now,
roughly equivalent to a UEFI driver in functionality, and then follow up
over the coming weeks to enable additional offloads and more features for
the device.

The next set of patches will look at adding support for ethtool and
statistics, and start work on offloads.

---

Alexander Duyck (15):
      PCI: Add Meta Platforms vendor ID
      eth: fbnic: add scaffolding for Meta's NIC driver
      eth: fbnic: Allocate core device specific structures and devlink interface
      eth: fbnic: Add register init to set PCIe/Ethernet device config
      eth: fbnic: add message parsing for FW messages
      eth: fbnic: add FW communication mechanism
      eth: fbnic: allocate a netdevice and napi vectors with queues
      eth: fbnic: implement Tx queue alloc/start/stop/free
      eth: fbnic: implement Rx queue alloc/start/stop/free
      eth: fbnic: Add initial messaging to notify FW of our presence
      eth: fbnic: Enable Ethernet link setup
      eth: fbnic: add basic Tx handling
      eth: fbnic: add basic Rx handling
      eth: fbnic: add L2 address programming
      eth: fbnic: write the TCAM tables used for RSS control and Rx to host


 MAINTAINERS                                   |    7 +
 drivers/net/ethernet/Kconfig                  |    1 +
 drivers/net/ethernet/Makefile                 |    1 +
 drivers/net/ethernet/meta/Kconfig             |   29 +
 drivers/net/ethernet/meta/Makefile            |    6 +
 drivers/net/ethernet/meta/fbnic/Makefile      |   18 +
 drivers/net/ethernet/meta/fbnic/fbnic.h       |  148 ++
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h   |  912 ++++++++
 .../net/ethernet/meta/fbnic/fbnic_devlink.c   |   86 +
 .../net/ethernet/meta/fbnic/fbnic_drvinfo.h   |    5 +
 drivers/net/ethernet/meta/fbnic/fbnic_fw.c    |  823 ++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_fw.h    |  133 ++
 drivers/net/ethernet/meta/fbnic/fbnic_irq.c   |  251 +++
 drivers/net/ethernet/meta/fbnic/fbnic_mac.c   | 1025 +++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_mac.h   |   83 +
 .../net/ethernet/meta/fbnic/fbnic_netdev.c    |  470 +++++
 .../net/ethernet/meta/fbnic/fbnic_netdev.h    |   59 +
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c   |  633 ++++++
 drivers/net/ethernet/meta/fbnic/fbnic_rpc.c   |  709 +++++++
 drivers/net/ethernet/meta/fbnic/fbnic_rpc.h   |  189 ++
 drivers/net/ethernet/meta/fbnic/fbnic_tlv.c   |  529 +++++
 drivers/net/ethernet/meta/fbnic/fbnic_tlv.h   |  175 ++
 drivers/net/ethernet/meta/fbnic/fbnic_txrx.c  | 1873 +++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_txrx.h  |  125 ++
 include/linux/pci_ids.h                       |    2 +
 25 files changed, 8292 insertions(+)
 create mode 100644 drivers/net/ethernet/meta/Kconfig
 create mode 100644 drivers/net/ethernet/meta/Makefile
 create mode 100644 drivers/net/ethernet/meta/fbnic/Makefile
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic.h
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_csr.h
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_devlink.c
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_drvinfo.h
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_fw.c
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_fw.h
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_irq.c
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_mac.c
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_mac.h
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_pci.c
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_rpc.c
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_rpc.h
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_tlv.c
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_tlv.h
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_txrx.h

--



* [net-next PATCH 01/15] PCI: Add Meta Platforms vendor ID
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
@ 2024-04-03 20:08 ` Alexander Duyck
  2024-04-03 20:20   ` Bjorn Helgaas
  2024-04-03 20:08 ` [net-next PATCH 02/15] eth: fbnic: add scaffolding for Meta's NIC driver Alexander Duyck
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 20:08 UTC (permalink / raw)
  To: netdev; +Cc: bhelgaas, linux-pci, Alexander Duyck, kuba, davem, pabeni

From: Alexander Duyck <alexanderduyck@fb.com>

Add Meta as a vendor ID for PCI devices so that future drivers can use the
macro.

CC: bhelgaas@google.com
CC: linux-pci@vger.kernel.org
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 include/linux/pci_ids.h |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index a0c75e467df3..e5a1d5e9930b 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2598,6 +2598,8 @@
 
 #define PCI_VENDOR_ID_HYGON		0x1d94
 
+#define PCI_VENDOR_ID_META		0x1d9b
+
 #define PCI_VENDOR_ID_FUNGIBLE		0x1dad
 
 #define PCI_VENDOR_ID_HXT		0x1dbf




* [net-next PATCH 02/15] eth: fbnic: add scaffolding for Meta's NIC driver
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
  2024-04-03 20:08 ` [net-next PATCH 01/15] PCI: Add Meta Platforms vendor ID Alexander Duyck
@ 2024-04-03 20:08 ` Alexander Duyck
  2024-04-03 20:33   ` Andrew Lunn
  2024-04-03 20:08 ` [net-next PATCH 03/15] eth: fbnic: Allocate core device specific structures and devlink interface Alexander Duyck
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 20:08 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Duyck, kuba, davem, pabeni

From: Alexander Duyck <alexanderduyck@fb.com>

Create a bare-bones PCI driver for Meta's NIC.
Subsequent changes will flesh it out.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 MAINTAINERS                                     |    7 +
 drivers/net/ethernet/Kconfig                    |    1 
 drivers/net/ethernet/Makefile                   |    1 
 drivers/net/ethernet/meta/Kconfig               |   29 +++
 drivers/net/ethernet/meta/Makefile              |    6 +
 drivers/net/ethernet/meta/fbnic/Makefile        |   10 +
 drivers/net/ethernet/meta/fbnic/fbnic.h         |   19 ++
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h     |    9 +
 drivers/net/ethernet/meta/fbnic/fbnic_drvinfo.h |    5 +
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c     |  196 +++++++++++++++++++++++
 10 files changed, 283 insertions(+)
 create mode 100644 drivers/net/ethernet/meta/Kconfig
 create mode 100644 drivers/net/ethernet/meta/Makefile
 create mode 100644 drivers/net/ethernet/meta/fbnic/Makefile
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic.h
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_csr.h
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_drvinfo.h
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_pci.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 6a233e1a3cf2..77efffbd23f9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14307,6 +14307,13 @@ T:	git git://linuxtv.org/media_tree.git
 F:	Documentation/devicetree/bindings/media/amlogic,gx-vdec.yaml
 F:	drivers/staging/media/meson/vdec/
 
+META ETHERNET DRIVERS
+M:	Alexander Duyck <alexanderduyck@fb.com>
+M:	Jakub Kicinski <kuba@kernel.org>
+R:	kernel-team@meta.com
+S:	Maintained
+F:	drivers/net/ethernet/meta/
+
 METHODE UDPU SUPPORT
 M:	Robert Marko <robert.marko@sartura.hr>
 S:	Maintained
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index 6a19b5393ed1..0baac25db4f8 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -122,6 +122,7 @@ source "drivers/net/ethernet/litex/Kconfig"
 source "drivers/net/ethernet/marvell/Kconfig"
 source "drivers/net/ethernet/mediatek/Kconfig"
 source "drivers/net/ethernet/mellanox/Kconfig"
+source "drivers/net/ethernet/meta/Kconfig"
 source "drivers/net/ethernet/micrel/Kconfig"
 source "drivers/net/ethernet/microchip/Kconfig"
 source "drivers/net/ethernet/mscc/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 0d872d4efcd1..c03203439c0e 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_NET_VENDOR_LITEX) += litex/
 obj-$(CONFIG_NET_VENDOR_MARVELL) += marvell/
 obj-$(CONFIG_NET_VENDOR_MEDIATEK) += mediatek/
 obj-$(CONFIG_NET_VENDOR_MELLANOX) += mellanox/
+obj-$(CONFIG_NET_VENDOR_META) += meta/
 obj-$(CONFIG_NET_VENDOR_MICREL) += micrel/
 obj-$(CONFIG_NET_VENDOR_MICROCHIP) += microchip/
 obj-$(CONFIG_NET_VENDOR_MICROSEMI) += mscc/
diff --git a/drivers/net/ethernet/meta/Kconfig b/drivers/net/ethernet/meta/Kconfig
new file mode 100644
index 000000000000..8949ab15a02e
--- /dev/null
+++ b/drivers/net/ethernet/meta/Kconfig
@@ -0,0 +1,29 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Meta Platforms network device configuration
+#
+
+config NET_VENDOR_META
+	bool "Meta Platforms devices"
+	default y
+	help
+	  If you have a network (Ethernet) card designed by Meta, say Y.
+	  That's Meta as in the parent company of Facebook.
+
+	  Note that the answer to this question doesn't directly affect the
+	  kernel: saying N will just cause the configurator to skip all
+	  the questions about Meta cards. If you say Y, you will be asked for
+	  your specific card in the following questions.
+
+if NET_VENDOR_META
+
+config FBNIC
+	tristate "Meta Platforms Host Network Interface"
+	depends on PCI_MSI
+	help
+	  This driver supports Meta Platforms Host Network Interface.
+
+	  To compile this driver as a module, choose M here. The module
+	  will be called fbnic.  MSI-X interrupt support is required.
+
+endif # NET_VENDOR_META
diff --git a/drivers/net/ethernet/meta/Makefile b/drivers/net/ethernet/meta/Makefile
new file mode 100644
index 000000000000..88804f3de963
--- /dev/null
+++ b/drivers/net/ethernet/meta/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the Meta Platforms network device drivers.
+#
+
+obj-$(CONFIG_FBNIC) += fbnic/
diff --git a/drivers/net/ethernet/meta/fbnic/Makefile b/drivers/net/ethernet/meta/fbnic/Makefile
new file mode 100644
index 000000000000..ce277fec875f
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+
+#
+# Makefile for the Meta(R) Host Network Interface
+#
+
+obj-$(CONFIG_FBNIC) += fbnic.o
+
+fbnic-y := fbnic_pci.o
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic.h b/drivers/net/ethernet/meta/fbnic/fbnic.h
new file mode 100644
index 000000000000..25702dab8d66
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#ifndef _FBNIC_H_
+#define _FBNIC_H_
+
+#include "fbnic_csr.h"
+
+extern char fbnic_driver_name[];
+
+enum fbnic_boards {
+	fbnic_board_asic
+};
+
+struct fbnic_info {
+	unsigned int bar_mask;
+};
+
+#endif /* _FBNIC_H_ */
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
new file mode 100644
index 000000000000..72e89c07bf54
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#ifndef _FBNIC_CSR_H_
+#define _FBNIC_CSR_H_
+
+#define PCI_DEVICE_ID_META_FBNIC_ASIC		0x0013
+
+#endif /* _FBNIC_CSR_H_ */
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_drvinfo.h b/drivers/net/ethernet/meta/fbnic/fbnic_drvinfo.h
new file mode 100644
index 000000000000..809ba6729442
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_drvinfo.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#define DRV_NAME "fbnic"
+#define DRV_SUMMARY "Meta(R) Host Network Interface Driver"
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
new file mode 100644
index 000000000000..1cb71cb1de14
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/types.h>
+
+#include "fbnic.h"
+#include "fbnic_drvinfo.h"
+
+char fbnic_driver_name[] = DRV_NAME;
+
+MODULE_DESCRIPTION(DRV_SUMMARY);
+MODULE_LICENSE("GPL");
+
+static const struct fbnic_info fbnic_asic_info = {
+	.bar_mask = BIT(0) | BIT(4)
+};
+
+static const struct fbnic_info *fbnic_info_tbl[] = {
+	[fbnic_board_asic] = &fbnic_asic_info,
+};
+
+static const struct pci_device_id fbnic_pci_tbl[] = {
+	{ PCI_DEVICE_DATA(META, FBNIC_ASIC, fbnic_board_asic) },
+	/* required last entry */
+	{0, }
+};
+MODULE_DEVICE_TABLE(pci, fbnic_pci_tbl);
+
+/**
+ *  fbnic_probe - Device Initialization Routine
+ *  @pdev: PCI device information struct
+ *  @ent: entry in fbnic_pci_tbl
+ *
+ *  Returns 0 on success, negative on failure
+ *
+ *  Initializes a PCI device identified by a pci_dev structure.
+ *  The OS initialization, configuring of the adapter private structure,
+ *  and a hardware reset occur.
+ **/
+static int fbnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
+{
+	const struct fbnic_info *info = fbnic_info_tbl[ent->driver_data];
+	int err;
+
+	if (pdev->error_state != pci_channel_io_normal) {
+		dev_err(&pdev->dev,
+			"PCI device still in an error state. Unable to load...\n");
+		return -EIO;
+	}
+
+	err = pcim_enable_device(pdev);
+	if (err) {
+		dev_err(&pdev->dev, "PCI enable device failed: %d\n", err);
+		return err;
+	}
+
+	err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(46));
+	if (err)
+		err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
+	if (err) {
+		dev_err(&pdev->dev, "DMA configuration failed: %d\n", err);
+		return err;
+	}
+
+	err = pcim_iomap_regions(pdev, info->bar_mask, fbnic_driver_name);
+	if (err) {
+		dev_err(&pdev->dev,
+			"pci_request_selected_regions failed: %d\n", err);
+		return err;
+	}
+
+	pci_set_master(pdev);
+	pci_save_state(pdev);
+
+	return 0;
+}
+
+/**
+ * fbnic_remove - Device Removal Routine
+ * @pdev: PCI device information struct
+ *
+ * Called by the PCI subsystem to alert the driver that it should release
+ * a PCI device.  This could be caused by a Hot-Plug event, or because the
+ * driver is going to be removed from memory.
+ **/
+static void fbnic_remove(struct pci_dev *pdev)
+{
+}
+
+static int fbnic_pm_suspend(struct device *dev)
+{
+	return 0;
+}
+
+static int __fbnic_pm_resume(struct device *dev)
+{
+	return 0;
+}
+
+static int __maybe_unused fbnic_pm_resume(struct device *dev)
+{
+	int err;
+
+	err = __fbnic_pm_resume(dev);
+
+	return err;
+}
+
+static const struct dev_pm_ops fbnic_pm_ops = {
+	SET_SYSTEM_SLEEP_PM_OPS(fbnic_pm_suspend, fbnic_pm_resume)
+};
+
+static void fbnic_shutdown(struct pci_dev *pdev)
+{
+	fbnic_pm_suspend(&pdev->dev);
+}
+
+static pci_ers_result_t fbnic_err_error_detected(struct pci_dev *pdev,
+						 pci_channel_state_t state)
+{
+	/* disconnect device if failure is not recoverable via reset */
+	if (state == pci_channel_io_perm_failure)
+		return PCI_ERS_RESULT_DISCONNECT;
+
+	fbnic_pm_suspend(&pdev->dev);
+
+	/* Request a slot reset */
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t fbnic_err_slot_reset(struct pci_dev *pdev)
+{
+	pci_set_power_state(pdev, PCI_D0);
+	pci_restore_state(pdev);
+	pci_save_state(pdev);
+
+	if (pci_enable_device_mem(pdev)) {
+		dev_err(&pdev->dev,
+			"Cannot re-enable PCI device after reset.\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void fbnic_err_resume(struct pci_dev *pdev)
+{
+}
+
+static const struct pci_error_handlers fbnic_err_handler = {
+	.error_detected	= fbnic_err_error_detected,
+	.slot_reset	= fbnic_err_slot_reset,
+	.resume		= fbnic_err_resume,
+};
+
+static struct pci_driver fbnic_driver = {
+	.name		= fbnic_driver_name,
+	.id_table	= fbnic_pci_tbl,
+	.probe		= fbnic_probe,
+	.remove		= fbnic_remove,
+	.driver.pm	= &fbnic_pm_ops,
+	.shutdown	= fbnic_shutdown,
+	.err_handler	= &fbnic_err_handler,
+};
+
+/**
+ * fbnic_init_module - Driver Registration Routine
+ *
+ * The first routine called when the driver is loaded.  All it does is
+ * register with the PCI subsystem.
+ **/
+static int __init fbnic_init_module(void)
+{
+	int err;
+
+	pr_info(DRV_SUMMARY " (%s)", fbnic_driver.name);
+
+	err = pci_register_driver(&fbnic_driver);
+
+	return err;
+}
+module_init(fbnic_init_module);
+
+/**
+ * fbnic_exit_module - Driver Exit Cleanup Routine
+ *
+ * Called just before the driver is removed from memory.
+ **/
+static void __exit fbnic_exit_module(void)
+{
+	pci_unregister_driver(&fbnic_driver);
+}
+module_exit(fbnic_exit_module);




* [net-next PATCH 03/15] eth: fbnic: Allocate core device specific structures and devlink interface
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
  2024-04-03 20:08 ` [net-next PATCH 01/15] PCI: Add Meta Platforms vendor ID Alexander Duyck
  2024-04-03 20:08 ` [net-next PATCH 02/15] eth: fbnic: add scaffolding for Meta's NIC driver Alexander Duyck
@ 2024-04-03 20:08 ` Alexander Duyck
  2024-04-03 20:35   ` Bjorn Helgaas
  2024-04-03 20:08 ` [net-next PATCH 04/15] eth: fbnic: Add register init to set PCIe/Ethernet device config Alexander Duyck
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 20:08 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Duyck, kuba, davem, pabeni

From: Alexander Duyck <alexanderduyck@fb.com>

At the core of the fbnic device will be the devlink interface. This
interface will eventually provide basic functionality in the event that
there are any issues with the network interface.

Add support for allocating the MSI-X vectors and setting up the BAR
mapping. With this we can start enabling various subsystems and start
bringing up additional interfaces such as the AXI fabric and the firmware
mailbox.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 drivers/net/ethernet/meta/fbnic/Makefile        |    4 +
 drivers/net/ethernet/meta/fbnic/fbnic.h         |   28 ++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_devlink.c |   84 +++++++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_irq.c     |   52 ++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c     |   72 +++++++++++++++++++-
 5 files changed, 238 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_devlink.c
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_irq.c

diff --git a/drivers/net/ethernet/meta/fbnic/Makefile b/drivers/net/ethernet/meta/fbnic/Makefile
index ce277fec875f..c06041e70bc5 100644
--- a/drivers/net/ethernet/meta/fbnic/Makefile
+++ b/drivers/net/ethernet/meta/fbnic/Makefile
@@ -7,4 +7,6 @@
 
 obj-$(CONFIG_FBNIC) += fbnic.o
 
-fbnic-y := fbnic_pci.o
+fbnic-y := fbnic_devlink.o \
+	   fbnic_irq.o \
+	   fbnic_pci.o
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic.h b/drivers/net/ethernet/meta/fbnic/fbnic.h
index 25702dab8d66..f322cea4ce22 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic.h
@@ -6,8 +6,36 @@
 
 #include "fbnic_csr.h"
 
+struct fbnic_dev {
+	struct device *dev;
+
+	u32 __iomem *uc_addr0;
+	u32 __iomem *uc_addr4;
+	struct msix_entry *msix_entries;
+	unsigned short num_irqs;
+
+	u64 dsn;
+};
+
+/* Reserve entry 0 in the MSI-X "others" array until we have filled all
+ * 32 of the possible interrupt slots. By doing this we can avoid any
+ * potential conflicts should we need to enable one of the debug interrupt
+ * causes later.
+ */
+enum {
+	FBNIC_NON_NAPI_VECTORS
+};
+
 extern char fbnic_driver_name[];
 
+void fbnic_devlink_free(struct fbnic_dev *fbd);
+struct fbnic_dev *fbnic_devlink_alloc(struct pci_dev *pdev);
+void fbnic_devlink_register(struct fbnic_dev *fbd);
+void fbnic_devlink_unregister(struct fbnic_dev *fbd);
+
+void fbnic_free_irqs(struct fbnic_dev *fbd);
+int fbnic_alloc_irqs(struct fbnic_dev *fbd);
+
 enum fbnic_boards {
 	fbnic_board_asic
 };
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_devlink.c b/drivers/net/ethernet/meta/fbnic/fbnic_devlink.c
new file mode 100644
index 000000000000..91e8135410df
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_devlink.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#include <asm/unaligned.h>
+#include <linux/pci.h>
+#include <linux/types.h>
+#include <net/devlink.h>
+
+#include "fbnic.h"
+
+#define FBNIC_SN_STR_LEN	24
+
+static int fbnic_devlink_info_get(struct devlink *devlink,
+				  struct devlink_info_req *req,
+				  struct netlink_ext_ack *extack)
+{
+	struct fbnic_dev *fbd = devlink_priv(devlink);
+	int err;
+
+	if (fbd->dsn) {
+		unsigned char serial[FBNIC_SN_STR_LEN];
+		u8 dsn[8];
+
+		put_unaligned_be64(fbd->dsn, dsn);
+		err = snprintf(serial, FBNIC_SN_STR_LEN, "%8phD", dsn);
+		if (err < 0)
+			return err;
+
+		err = devlink_info_serial_number_put(req, serial);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static const struct devlink_ops fbnic_devlink_ops = {
+	.info_get = fbnic_devlink_info_get,
+};
+
+void fbnic_devlink_free(struct fbnic_dev *fbd)
+{
+	struct devlink *devlink = priv_to_devlink(fbd);
+
+	devlink_free(devlink);
+}
+
+struct fbnic_dev *fbnic_devlink_alloc(struct pci_dev *pdev)
+{
+	void __iomem * const *iomap_table;
+	struct devlink *devlink;
+	struct fbnic_dev *fbd;
+
+	devlink = devlink_alloc(&fbnic_devlink_ops, sizeof(struct fbnic_dev),
+				&pdev->dev);
+	if (!devlink)
+		return NULL;
+
+	fbd = devlink_priv(devlink);
+	pci_set_drvdata(pdev, fbd);
+	fbd->dev = &pdev->dev;
+
+	iomap_table = pcim_iomap_table(pdev);
+	fbd->uc_addr0 = iomap_table[0];
+	fbd->uc_addr4 = iomap_table[4];
+
+	fbd->dsn = pci_get_dsn(pdev);
+
+	return fbd;
+}
+
+void fbnic_devlink_register(struct fbnic_dev *fbd)
+{
+	struct devlink *devlink = priv_to_devlink(fbd);
+
+	devlink_register(devlink);
+}
+
+void fbnic_devlink_unregister(struct fbnic_dev *fbd)
+{
+	struct devlink *devlink = priv_to_devlink(fbd);
+
+	devlink_unregister(devlink);
+}
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_irq.c b/drivers/net/ethernet/meta/fbnic/fbnic_irq.c
new file mode 100644
index 000000000000..d2fdc51704b9
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_irq.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#include <linux/pci.h>
+#include <linux/types.h>
+
+#include "fbnic.h"
+
+void fbnic_free_irqs(struct fbnic_dev *fbd)
+{
+	struct pci_dev *pdev = to_pci_dev(fbd->dev);
+
+	fbd->num_irqs = 0;
+
+	pci_disable_msix(pdev);
+
+	kfree(fbd->msix_entries);
+	fbd->msix_entries = NULL;
+}
+
+int fbnic_alloc_irqs(struct fbnic_dev *fbd)
+{
+	unsigned int wanted_irqs = FBNIC_NON_NAPI_VECTORS;
+	struct pci_dev *pdev = to_pci_dev(fbd->dev);
+	struct msix_entry *msix_entries;
+	int i, num_irqs;
+
+	msix_entries = kcalloc(wanted_irqs, sizeof(*msix_entries), GFP_KERNEL);
+	if (!msix_entries)
+		return -ENOMEM;
+
+	for (i = 0; i < wanted_irqs; i++)
+		msix_entries[i].entry = i;
+
+	num_irqs = pci_enable_msix_range(pdev, msix_entries,
+					 FBNIC_NON_NAPI_VECTORS + 1,
+					 wanted_irqs);
+	if (num_irqs < 0) {
+		dev_err(fbd->dev, "Failed to allocate MSI-X entries\n");
+		kfree(msix_entries);
+		return num_irqs;
+	}
+
+	if (num_irqs < wanted_irqs)
+		dev_warn(fbd->dev, "Allocated %d IRQs, expected %d\n",
+			 num_irqs, wanted_irqs);
+
+	fbd->msix_entries = msix_entries;
+	fbd->num_irqs = num_irqs;
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
index 1cb71cb1de14..596151396eac 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
@@ -43,6 +43,7 @@ MODULE_DEVICE_TABLE(pci, fbnic_pci_tbl);
 static int fbnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	const struct fbnic_info *info = fbnic_info_tbl[ent->driver_data];
+	struct fbnic_dev *fbd;
 	int err;
 
 	if (pdev->error_state != pci_channel_io_normal) {
@@ -72,10 +73,41 @@ static int fbnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		return err;
 	}
 
+	fbd = fbnic_devlink_alloc(pdev);
+	if (!fbd) {
+		dev_err(&pdev->dev, "Devlink allocation failed\n");
+		return -ENOMEM;
+	}
+
 	pci_set_master(pdev);
 	pci_save_state(pdev);
 
+	fbnic_devlink_register(fbd);
+
+	err = fbnic_alloc_irqs(fbd);
+	if (err)
+		goto free_fbd;
+
+	if (!fbd->dsn) {
+		dev_warn(&pdev->dev, "Reading serial number failed\n");
+		goto init_failure_mode;
+	}
+
 	return 0;
+
+init_failure_mode:
+	dev_warn(&pdev->dev, "Probe error encountered, entering init failure mode. Normal networking functionality will not be available.\n");
+	 /* Always return 0 even on error so devlink is registered to allow
+	  * firmware updates for fixes.
+	  */
+	return 0;
+free_fbd:
+	pci_disable_device(pdev);
+
+	fbnic_devlink_unregister(fbd);
+	fbnic_devlink_free(fbd);
+
+	return err;
 }
 
 /**
@@ -88,16 +120,49 @@ static int fbnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
  **/
 static void fbnic_remove(struct pci_dev *pdev)
 {
+	struct fbnic_dev *fbd = pci_get_drvdata(pdev);
+
+	fbnic_free_irqs(fbd);
+
+	fbnic_devlink_unregister(fbd);
+	fbnic_devlink_free(fbd);
 }
 
 static int fbnic_pm_suspend(struct device *dev)
 {
+	struct fbnic_dev *fbd = dev_get_drvdata(dev);
+
+	/* Free the IRQs so they aren't trying to occupy sleeping CPUs */
+	fbnic_free_irqs(fbd);
+
+	/* Hardware is about to go away, so switch off MMIO access internally */
+	WRITE_ONCE(fbd->uc_addr0, NULL);
+	WRITE_ONCE(fbd->uc_addr4, NULL);
+
 	return 0;
 }
 
 static int __fbnic_pm_resume(struct device *dev)
 {
+	struct fbnic_dev *fbd = dev_get_drvdata(dev);
+	void __iomem * const *iomap_table;
+	int err;
+
+	/* restore MMIO access */
+	iomap_table = pcim_iomap_table(to_pci_dev(dev));
+	fbd->uc_addr0 = iomap_table[0];
+	fbd->uc_addr4 = iomap_table[4];
+
+	/* rerequest the IRQs */
+	err = fbnic_alloc_irqs(fbd);
+	if (err)
+		goto err_invalidate_uc_addr;
+
 	return 0;
+err_invalidate_uc_addr:
+	WRITE_ONCE(fbd->uc_addr0, NULL);
+	WRITE_ONCE(fbd->uc_addr4, NULL);
+	return err;
 }
 
 static int __maybe_unused fbnic_pm_resume(struct device *dev)
@@ -133,6 +198,8 @@ static pci_ers_result_t fbnic_err_error_detected(struct pci_dev *pdev,
 
 static pci_ers_result_t fbnic_err_slot_reset(struct pci_dev *pdev)
 {
+	int err;
+
 	pci_set_power_state(pdev, PCI_D0);
 	pci_restore_state(pdev);
 	pci_save_state(pdev);
@@ -143,7 +210,10 @@ static pci_ers_result_t fbnic_err_slot_reset(struct pci_dev *pdev)
 		return PCI_ERS_RESULT_DISCONNECT;
 	}
 
-	return PCI_ERS_RESULT_RECOVERED;
+	/* restore device to previous state */
+	err = __fbnic_pm_resume(&pdev->dev);
+
+	return err ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
 }
 
 static void fbnic_err_resume(struct pci_dev *pdev)




* [net-next PATCH 04/15] eth: fbnic: Add register init to set PCIe/Ethernet device config
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
                   ` (2 preceding siblings ...)
  2024-04-03 20:08 ` [net-next PATCH 03/15] eth: fbnic: Allocate core device specific structures and devlink interface Alexander Duyck
@ 2024-04-03 20:08 ` Alexander Duyck
  2024-04-03 20:46   ` Andrew Lunn
  2024-04-03 20:08 ` [net-next PATCH 05/15] eth: fbnic: add message parsing for FW messages Alexander Duyck
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 20:08 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Duyck, kuba, davem, pabeni

From: Alexander Duyck <alexanderduyck@fb.com>

As a part of enabling the device, the first step is to configure the AXI
and Ethernet interfaces to allow for basic traffic. This consists of
configuring several registers related to the PCIe and Ethernet MAC, as well
as configuring the handlers for moving traffic between entities.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 drivers/net/ethernet/meta/fbnic/Makefile    |    1 
 drivers/net/ethernet/meta/fbnic/fbnic.h     |   36 ++
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h |  312 +++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_mac.c |  438 +++++++++++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_mac.h |   25 ++
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c |   39 ++
 6 files changed, 851 insertions(+)
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_mac.c
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_mac.h

diff --git a/drivers/net/ethernet/meta/fbnic/Makefile b/drivers/net/ethernet/meta/fbnic/Makefile
index c06041e70bc5..b8f4511440dc 100644
--- a/drivers/net/ethernet/meta/fbnic/Makefile
+++ b/drivers/net/ethernet/meta/fbnic/Makefile
@@ -9,4 +9,5 @@ obj-$(CONFIG_FBNIC) += fbnic.o
 
 fbnic-y := fbnic_devlink.o \
 	   fbnic_irq.o \
+	   fbnic_mac.o \
 	   fbnic_pci.o
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic.h b/drivers/net/ethernet/meta/fbnic/fbnic.h
index f322cea4ce22..63ec82a830cd 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic.h
@@ -4,17 +4,23 @@
 #ifndef _FBNIC_H_
 #define _FBNIC_H_
 
+#include <linux/io.h>
+
 #include "fbnic_csr.h"
+#include "fbnic_mac.h"
 
 struct fbnic_dev {
 	struct device *dev;
 
 	u32 __iomem *uc_addr0;
 	u32 __iomem *uc_addr4;
+	const struct fbnic_mac *mac;
 	struct msix_entry *msix_entries;
 	unsigned short num_irqs;
 
 	u64 dsn;
+	u32 mps;
+	u32 readrq;
 };
 
 /* Reserve entry 0 in the MSI-X "others" array until we have filled all
@@ -26,6 +32,36 @@ enum {
 	FBNIC_NON_NAPI_VECTORS
 };
 
+static inline bool fbnic_present(struct fbnic_dev *fbd)
+{
+	return !!READ_ONCE(fbd->uc_addr0);
+}
+
+static inline void fbnic_wr32(struct fbnic_dev *fbd, u32 reg, u32 val)
+{
+	u32 __iomem *csr = READ_ONCE(fbd->uc_addr0);
+
+	if (csr)
+		writel(val, csr + reg);
+}
+
+u32 fbnic_rd32(struct fbnic_dev *fbd, u32 reg);
+
+static inline void
+fbnic_rmw32(struct fbnic_dev *fbd, u32 reg, u32 mask, u32 val)
+{
+	u32 v;
+
+	v = fbnic_rd32(fbd, reg);
+	v &= ~mask;
+	v |= val;
+	fbnic_wr32(fbd, reg, v);
+}
+
+#define wr32(reg, val)	fbnic_wr32(fbd, reg, val)
+#define rd32(reg)	fbnic_rd32(fbd, reg)
+#define wrfl()		fbnic_rd32(fbd, FBNIC_MASTER_SPARE_0)
+
 extern char fbnic_driver_name[];
 
 void fbnic_devlink_free(struct fbnic_dev *fbd);
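As an aside for readers less familiar with the kernel's bitfield helpers, the read-modify-write pattern behind `fbnic_rmw32()` above can be sketched in plain userspace C. Everything here is illustrative only and not part of the patch: `fake_csr`/`fake_rmw32` are invented names, and the local `CSR_GENMASK`/`FIELD_PREP` macros are simplified stand-ins for the kernel's versions.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's GENMASK()/FIELD_PREP() */
#define CSR_GENMASK(h, l)	(((~0u) >> (31 - (h))) & ((~0u) << (l)))
#define FIELD_PREP(mask, val)	(((uint32_t)(val) << __builtin_ctz(mask)) & (mask))

/* Model the CSR as a plain variable instead of a u32 __iomem pointer */
static uint32_t fake_csr;

/* Same shape as fbnic_rmw32(): read, clear the masked field, set it */
static void fake_rmw32(uint32_t mask, uint32_t val)
{
	uint32_t v = fake_csr;	/* stands in for fbnic_rd32() */

	v &= ~mask;
	v |= val;
	fake_csr = v;		/* stands in for fbnic_wr32() */
}
```

The point of the pattern is that only the bits covered by `mask` change; all other bits of the register are preserved.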
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
index 72e89c07bf54..eb37e5981e69 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
@@ -4,6 +4,318 @@
 #ifndef _FBNIC_CSR_H_
 #define _FBNIC_CSR_H_
 
+#include <linux/bitops.h>
+
+#define CSR_BIT(nr)		(1u << (nr))
+#define CSR_GENMASK(h, l)	GENMASK(h, l)
+
 #define PCI_DEVICE_ID_META_FBNIC_ASIC		0x0013
 
+#define FBNIC_CLOCK_FREQ	(600 * (1000 * 1000))
+
+/* Global QM Tx registers */
+#define FBNIC_CSR_START_QM_TX		0x00800	/* CSR section delimiter */
+#define FBNIC_QM_TWQ_DEFAULT_META_L	0x00818		/* 0x02060 */
+#define FBNIC_QM_TWQ_DEFAULT_META_H	0x00819		/* 0x02064 */
+
+#define FBNIC_QM_TQS_CTL0		0x0081b		/* 0x0206c */
+#define FBNIC_QM_TQS_CTL0_LSO_TS_MASK	CSR_BIT(0)
+enum {
+	FBNIC_QM_TQS_CTL0_LSO_TS_FIRST	= 0,
+	FBNIC_QM_TQS_CTL0_LSO_TS_LAST	= 1,
+};
+
+#define FBNIC_QM_TQS_CTL0_PREFETCH_THRESH	CSR_GENMASK(7, 1)
+enum {
+	FBNIC_QM_TQS_CTL0_PREFETCH_THRESH_MIN	= 16,
+};
+
+#define FBNIC_QM_TQS_CTL1		0x0081c		/* 0x02070 */
+#define FBNIC_QM_TQS_CTL1_MC_MAX_CREDITS	CSR_GENMASK(7, 0)
+#define FBNIC_QM_TQS_CTL1_BULK_MAX_CREDITS	CSR_GENMASK(15, 8)
+#define FBNIC_QM_TQS_MTU_CTL0		0x0081d		/* 0x02074 */
+#define FBNIC_QM_TQS_MTU_CTL1		0x0081e		/* 0x02078 */
+#define FBNIC_QM_TQS_MTU_CTL1_BULK		CSR_GENMASK(13, 0)
+#define FBNIC_QM_TCQ_CTL0		0x0082d		/* 0x020b4 */
+#define FBNIC_QM_TCQ_CTL0_COAL_WAIT		CSR_GENMASK(15, 0)
+#define FBNIC_QM_TCQ_CTL0_TICK_CYCLES		CSR_GENMASK(26, 16)
+#define FBNIC_QM_TQS_EDT_TS_RANGE	0x00849		/* 0x2124 */
+#define FBNIC_QM_TNI_TDF_CTL		0x0086c		/* 0x021b0 */
+#define FBNIC_QM_TNI_TDF_CTL_MRRS		CSR_GENMASK(1, 0)
+#define FBNIC_QM_TNI_TDF_CTL_CLS		CSR_GENMASK(3, 2)
+#define FBNIC_QM_TNI_TDF_CTL_MAX_OT		CSR_GENMASK(11, 4)
+#define FBNIC_QM_TNI_TDF_CTL_MAX_OB		CSR_GENMASK(23, 12)
+#define FBNIC_QM_TNI_TDE_CTL		0x0086d		/* 0x021b4 */
+#define FBNIC_QM_TNI_TDE_CTL_MRRS		CSR_GENMASK(1, 0)
+#define FBNIC_QM_TNI_TDE_CTL_CLS		CSR_GENMASK(3, 2)
+#define FBNIC_QM_TNI_TDE_CTL_MAX_OT		CSR_GENMASK(11, 4)
+#define FBNIC_QM_TNI_TDE_CTL_MAX_OB		CSR_GENMASK(24, 12)
+#define FBNIC_QM_TNI_TDE_CTL_MRRS_1K		CSR_BIT(25)
+#define FBNIC_QM_TNI_TCM_CTL		0x0086e		/* 0x021b8 */
+#define FBNIC_QM_TNI_TCM_CTL_MPS		CSR_GENMASK(1, 0)
+#define FBNIC_QM_TNI_TCM_CTL_CLS		CSR_GENMASK(3, 2)
+#define FBNIC_QM_TNI_TCM_CTL_MAX_OT		CSR_GENMASK(11, 4)
+#define FBNIC_QM_TNI_TCM_CTL_MAX_OB		CSR_GENMASK(23, 12)
+#define FBNIC_CSR_END_QM_TX		0x00873	/* CSR section delimiter */
+
+/* Global QM Rx registers */
+#define FBNIC_CSR_START_QM_RX		0x00c00	/* CSR section delimiter */
+#define FBNIC_QM_RCQ_CTL0		0x00c0c		/* 0x03030 */
+#define FBNIC_QM_RCQ_CTL0_COAL_WAIT		CSR_GENMASK(15, 0)
+#define FBNIC_QM_RCQ_CTL0_TICK_CYCLES		CSR_GENMASK(26, 16)
+#define FBNIC_QM_RNI_RBP_CTL		0x00c2d		/* 0x030b4 */
+#define FBNIC_QM_RNI_RBP_CTL_MRRS		CSR_GENMASK(1, 0)
+#define FBNIC_QM_RNI_RBP_CTL_CLS		CSR_GENMASK(3, 2)
+#define FBNIC_QM_RNI_RBP_CTL_MAX_OT		CSR_GENMASK(11, 4)
+#define FBNIC_QM_RNI_RBP_CTL_MAX_OB		CSR_GENMASK(23, 12)
+#define FBNIC_QM_RNI_RDE_CTL		0x00c2e		/* 0x030b8 */
+#define FBNIC_QM_RNI_RDE_CTL_MPS		CSR_GENMASK(1, 0)
+#define FBNIC_QM_RNI_RDE_CTL_CLS		CSR_GENMASK(3, 2)
+#define FBNIC_QM_RNI_RDE_CTL_MAX_OT		CSR_GENMASK(11, 4)
+#define FBNIC_QM_RNI_RDE_CTL_MAX_OB		CSR_GENMASK(23, 12)
+#define FBNIC_QM_RNI_RCM_CTL		0x00c2f		/* 0x030bc */
+#define FBNIC_QM_RNI_RCM_CTL_MPS		CSR_GENMASK(1, 0)
+#define FBNIC_QM_RNI_RCM_CTL_CLS		CSR_GENMASK(3, 2)
+#define FBNIC_QM_RNI_RCM_CTL_MAX_OT		CSR_GENMASK(11, 4)
+#define FBNIC_QM_RNI_RCM_CTL_MAX_OB		CSR_GENMASK(23, 12)
+#define FBNIC_CSR_END_QM_RX		0x00c34	/* CSR section delimiter */
+
+/* TCE registers */
+#define FBNIC_CSR_START_TCE		0x04000	/* CSR section delimiter */
+#define FBNIC_TCE_REG_BASE		0x04000		/* 0x10000 */
+
+#define FBNIC_TCE_LSO_CTRL		0x04000		/* 0x10000 */
+#define FBNIC_TCE_LSO_CTRL_TCPF_CLR_1ST		CSR_GENMASK(8, 0)
+#define FBNIC_TCE_LSO_CTRL_TCPF_CLR_MID		CSR_GENMASK(17, 9)
+#define FBNIC_TCE_LSO_CTRL_TCPF_CLR_END		CSR_GENMASK(26, 18)
+#define FBNIC_TCE_LSO_CTRL_IPID_MODE_INC	CSR_BIT(27)
+
+#define FBNIC_TCE_CSO_CTRL		0x04001		/* 0x10004 */
+#define FBNIC_TCE_CSO_CTRL_TCP_ZERO_CSUM	CSR_BIT(0)
+
+#define FBNIC_TCE_TXB_CTRL		0x04002		/* 0x10008 */
+#define FBNIC_TCE_TXB_CTRL_LOAD			CSR_BIT(0)
+#define FBNIC_TCE_TXB_CTRL_TCAM_ENABLE		CSR_BIT(1)
+#define FBNIC_TCE_TXB_CTRL_DISABLE		CSR_BIT(2)
+
+#define FBNIC_TCE_TXB_ENQ_WRR_CTRL	0x04003		/* 0x1000c */
+#define FBNIC_TCE_TXB_ENQ_WRR_CTRL_WEIGHT0	CSR_GENMASK(7, 0)
+#define FBNIC_TCE_TXB_ENQ_WRR_CTRL_WEIGHT1	CSR_GENMASK(15, 8)
+#define FBNIC_TCE_TXB_ENQ_WRR_CTRL_WEIGHT2	CSR_GENMASK(23, 16)
+
+#define FBNIC_TCE_TXB_TEI_Q0_CTRL	0x04004		/* 0x10010 */
+#define FBNIC_TCE_TXB_TEI_Q1_CTRL	0x04005		/* 0x10014 */
+#define FBNIC_TCE_TXB_MC_Q_CTRL		0x04006		/* 0x10018 */
+#define FBNIC_TCE_TXB_RX_TEI_Q_CTRL	0x04007		/* 0x1001c */
+#define FBNIC_TCE_TXB_RX_BMC_Q_CTRL	0x04008		/* 0x10020 */
+#define FBNIC_TCE_TXB_Q_CTRL_START		CSR_GENMASK(10, 0)
+#define FBNIC_TCE_TXB_Q_CTRL_SIZE		CSR_GENMASK(22, 11)
+
+#define FBNIC_TCE_TXB_TEI_DWRR_CTRL	0x04009		/* 0x10024 */
+#define FBNIC_TCE_TXB_TEI_DWRR_CTRL_QUANTUM0	CSR_GENMASK(7, 0)
+#define FBNIC_TCE_TXB_TEI_DWRR_CTRL_QUANTUM1	CSR_GENMASK(15, 8)
+#define FBNIC_TCE_TXB_NTWRK_DWRR_CTRL	0x0400a		/* 0x10028 */
+#define FBNIC_TCE_TXB_NTWRK_DWRR_CTRL_QUANTUM0	CSR_GENMASK(7, 0)
+#define FBNIC_TCE_TXB_NTWRK_DWRR_CTRL_QUANTUM1	CSR_GENMASK(15, 8)
+#define FBNIC_TCE_TXB_NTWRK_DWRR_CTRL_QUANTUM2	CSR_GENMASK(23, 16)
+
+#define FBNIC_TCE_TXB_CLDR_CFG		0x0400b		/* 0x1002c */
+#define FBNIC_TCE_TXB_CLDR_CFG_NUM_SLOT		CSR_GENMASK(5, 0)
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG(n)	(0x0400c + (n))	/* 0x10030 + 4*n */
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_CNT		16
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_DEST_ID_0_0	CSR_GENMASK(1, 0)
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_DEST_ID_0_1	CSR_GENMASK(3, 2)
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_DEST_ID_0_2	CSR_GENMASK(5, 4)
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_DEST_ID_0_3	CSR_GENMASK(7, 6)
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_DEST_ID_1_0	CSR_GENMASK(9, 8)
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_DEST_ID_1_1	CSR_GENMASK(11, 10)
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_DEST_ID_1_2	CSR_GENMASK(13, 12)
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_DEST_ID_1_3	CSR_GENMASK(15, 14)
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_DEST_ID_2_0	CSR_GENMASK(17, 16)
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_DEST_ID_2_1	CSR_GENMASK(19, 18)
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_DEST_ID_2_2	CSR_GENMASK(21, 20)
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_DEST_ID_2_3	CSR_GENMASK(23, 22)
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_DEST_ID_3_0	CSR_GENMASK(25, 24)
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_DEST_ID_3_1	CSR_GENMASK(27, 26)
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_DEST_ID_3_2	CSR_GENMASK(29, 28)
+#define FBNIC_TCE_TXB_CLDR_SLOT_CFG_DEST_ID_3_3	CSR_GENMASK(31, 30)
+
+#define FBNIC_TCE_BMC_MAX_PKTSZ		0x0403a		/* 0x100e8 */
+#define FBNIC_TCE_BMC_MAX_PKTSZ_TX		CSR_GENMASK(13, 0)
+#define FBNIC_TCE_BMC_MAX_PKTSZ_RX		CSR_GENMASK(27, 14)
+#define FBNIC_TCE_MC_MAX_PKTSZ		0x0403b		/* 0x100ec */
+#define FBNIC_TCE_MC_MAX_PKTSZ_TMI		CSR_GENMASK(13, 0)
+
+#define FBNIC_TCE_SOP_PROT_CTRL		0x0403c		/* 0x100f0 */
+#define FBNIC_TCE_SOP_PROT_CTRL_TBI		CSR_GENMASK(7, 0)
+#define FBNIC_TCE_SOP_PROT_CTRL_TTI_FRM		CSR_GENMASK(14, 8)
+#define FBNIC_TCE_SOP_PROT_CTRL_TTI_CM		CSR_GENMASK(18, 15)
+
+#define FBNIC_TCE_DROP_CTRL		0x0403d		/* 0x100f4 */
+#define FBNIC_TCE_DROP_CTRL_TTI_CM_DROP_EN	CSR_BIT(0)
+#define FBNIC_TCE_DROP_CTRL_TTI_FRM_DROP_EN	CSR_BIT(1)
+#define FBNIC_TCE_DROP_CTRL_TTI_TBI_DROP_EN	CSR_BIT(2)
+
+#define FBNIC_TCE_TXB_TX_BMC_Q_CTRL	0x0404B		/* 0x1012c */
+#define FBNIC_TCE_TXB_BMC_DWRR_CTRL	0x0404C		/* 0x10130 */
+#define FBNIC_TCE_TXB_BMC_DWRR_CTRL_QUANTUM0	CSR_GENMASK(7, 0)
+#define FBNIC_TCE_TXB_BMC_DWRR_CTRL_QUANTUM1	CSR_GENMASK(15, 8)
+#define FBNIC_TCE_TXB_TEI_DWRR_CTRL_EXT	0x0404D		/* 0x10134 */
+#define FBNIC_TCE_TXB_NTWRK_DWRR_CTRL_EXT \
+					0x0404E		/* 0x10138 */
+#define FBNIC_TCE_TXB_BMC_DWRR_CTRL_EXT	0x0404F		/* 0x1013c */
+#define FBNIC_CSR_END_TCE		0x04050	/* CSR section delimiter */
+
+/* TMI registers */
+#define FBNIC_CSR_START_TMI		0x04400	/* CSR section delimiter */
+#define FBNIC_TMI_SOP_PROT_CTRL		0x04400		/* 0x11000 */
+#define FBNIC_CSR_END_TMI		0x0443f	/* CSR section delimiter */
+/* Rx Buffer Registers */
+#define FBNIC_CSR_START_RXB		0x08000	/* CSR section delimiter */
+enum {
+	FBNIC_RXB_FIFO_MC		= 0,
+	/* Unused */
+	/* Unused */
+	FBNIC_RXB_FIFO_NET_TO_BMC	= 3,
+	FBNIC_RXB_FIFO_HOST		= 4,
+	/* Unused */
+	FBNIC_RXB_FIFO_BMC_TO_HOST	= 6,
+	/* Unused */
+	FBNIC_RXB_FIFO_INDICES		= 8
+};
+
+#define FBNIC_RXB_CT_SIZE(n)		(0x08000 + (n))	/* 0x20000 + 4*n */
+#define FBNIC_RXB_CT_SIZE_CNT			8
+#define FBNIC_RXB_CT_SIZE_HEADER		CSR_GENMASK(5, 0)
+#define FBNIC_RXB_CT_SIZE_PAYLOAD		CSR_GENMASK(11, 6)
+#define FBNIC_RXB_CT_SIZE_ENABLE		CSR_BIT(12)
+#define FBNIC_RXB_PAUSE_DROP_CTRL	0x08008		/* 0x20020 */
+#define FBNIC_RXB_PAUSE_DROP_CTRL_DROP_ENABLE	CSR_GENMASK(7, 0)
+#define FBNIC_RXB_PAUSE_DROP_CTRL_PAUSE_ENABLE	CSR_GENMASK(15, 8)
+#define FBNIC_RXB_PAUSE_DROP_CTRL_ECN_ENABLE	CSR_GENMASK(23, 16)
+#define FBNIC_RXB_PAUSE_DROP_CTRL_PS_ENABLE	CSR_GENMASK(27, 24)
+#define FBNIC_RXB_PAUSE_THLD(n)		(0x08009 + (n)) /* 0x20024 + 4*n */
+#define FBNIC_RXB_PAUSE_THLD_CNT		8
+#define FBNIC_RXB_PAUSE_THLD_ON			CSR_GENMASK(12, 0)
+#define FBNIC_RXB_PAUSE_THLD_OFF		CSR_GENMASK(25, 13)
+#define FBNIC_RXB_DROP_THLD(n)		(0x08011 + (n)) /* 0x20044 + 4*n */
+#define FBNIC_RXB_DROP_THLD_CNT			8
+#define FBNIC_RXB_DROP_THLD_ON			CSR_GENMASK(12, 0)
+#define FBNIC_RXB_DROP_THLD_OFF			CSR_GENMASK(25, 13)
+#define FBNIC_RXB_ECN_THLD(n)		(0x0801e + (n)) /* 0x20078 + 4*n */
+#define FBNIC_RXB_ECN_THLD_CNT			8
+#define FBNIC_RXB_ECN_THLD_ON			CSR_GENMASK(12, 0)
+#define FBNIC_RXB_ECN_THLD_OFF			CSR_GENMASK(25, 13)
+#define FBNIC_RXB_PBUF_CFG(n)		(0x08027 + (n))	/* 0x2009c + 4*n */
+#define FBNIC_RXB_PBUF_CFG_CNT			8
+#define FBNIC_RXB_PBUF_BASE_ADDR		CSR_GENMASK(12, 0)
+#define FBNIC_RXB_PBUF_SIZE			CSR_GENMASK(21, 13)
+#define FBNIC_RXB_DWRR_RDE_WEIGHT0	0x0802f		/* 0x200bc */
+#define FBNIC_RXB_DWRR_RDE_WEIGHT0_QUANTUM0	CSR_GENMASK(7, 0)
+#define FBNIC_RXB_DWRR_RDE_WEIGHT0_QUANTUM1	CSR_GENMASK(15, 8)
+#define FBNIC_RXB_DWRR_RDE_WEIGHT0_QUANTUM2	CSR_GENMASK(23, 16)
+#define FBNIC_RXB_DWRR_RDE_WEIGHT0_QUANTUM3	CSR_GENMASK(31, 24)
+#define FBNIC_RXB_DWRR_RDE_WEIGHT1	0x08030		/* 0x200c0 */
+#define FBNIC_RXB_DWRR_RDE_WEIGHT1_QUANTUM4	CSR_GENMASK(7, 0)
+#define FBNIC_RXB_DWRR_BMC_WEIGHT	0x08031		/* 0x200c4 */
+#define FBNIC_RXB_CLDR_PRIO_CFG(n)	(0x8034 + (n))	/* 0x200d0 + 4*n */
+#define FBNIC_RXB_CLDR_PRIO_CFG_CNT		16
+#define FBNIC_RXB_ENDIAN_FCS		0x08044		/* 0x20110 */
+enum {
+	/* Unused */
+	/* Unused */
+	FBNIC_RXB_DEQUEUE_BMC		= 2,
+	FBNIC_RXB_DEQUEUE_HOST		= 3,
+	FBNIC_RXB_DEQUEUE_INDICES	= 4
+};
+
+#define FBNIC_RXB_PBUF_CREDIT(n)	(0x08047 + (n))	/* 0x2011C + 4*n */
+#define FBNIC_RXB_PBUF_CREDIT_CNT		8
+#define FBNIC_RXB_PBUF_CREDIT_MASK		CSR_GENMASK(13, 0)
+#define FBNIC_RXB_INTF_CREDIT		0x0804f		/* 0x2013C */
+#define FBNIC_RXB_INTF_CREDIT_MASK0		CSR_GENMASK(3, 0)
+#define FBNIC_RXB_INTF_CREDIT_MASK1		CSR_GENMASK(7, 4)
+#define FBNIC_RXB_INTF_CREDIT_MASK2		CSR_GENMASK(11, 8)
+#define FBNIC_RXB_INTF_CREDIT_MASK3		CSR_GENMASK(15, 12)
+
+#define FBNIC_RXB_PAUSE_EVENT_CNT(n)	(0x08053 + (n))	/* 0x2014c + 4*n */
+#define FBNIC_RXB_DROP_FRMS_STS(n)	(0x08057 + (n))	/* 0x2015c + 4*n */
+#define FBNIC_RXB_DROP_BYTES_STS_L(n) \
+				(0x08080 + 2 * (n))	/* 0x20200 + 8*n */
+#define FBNIC_RXB_DROP_BYTES_STS_H(n) \
+				(0x08081 + 2 * (n))	/* 0x20204 + 8*n */
+#define FBNIC_RXB_TRUN_FRMS_STS(n)	(0x08091 + (n))	/* 0x20244 + 4*n */
+#define FBNIC_RXB_TRUN_BYTES_STS_L(n) \
+				(0x080c0 + 2 * (n))	/* 0x20300 + 8*n */
+#define FBNIC_RXB_TRUN_BYTES_STS_H(n) \
+				(0x080c1 + 2 * (n))	/* 0x20304 + 8*n */
+#define FBNIC_RXB_TRANS_PAUSE_STS(n)	(0x080d1 + (n))	/* 0x20344 + 4*n */
+#define FBNIC_RXB_TRANS_DROP_STS(n)	(0x080d9 + (n))	/* 0x20364 + 4*n */
+#define FBNIC_RXB_TRANS_ECN_STS(n)	(0x080e1 + (n))	/* 0x20384 + 4*n */
+enum {
+	FBNIC_RXB_ENQUEUE_NET		= 0,
+	FBNIC_RXB_ENQUEUE_BMC		= 1,
+	/* Unused */
+	/* Unused */
+	FBNIC_RXB_ENQUEUE_INDICES	= 4
+};
+
+#define FBNIC_RXB_DRBO_FRM_CNT_SRC(n)	(0x080f9 + (n))	/* 0x203e4 + 4*n */
+#define FBNIC_RXB_DRBO_BYTE_CNT_SRC_L(n) \
+					(0x080fd + (n))	/* 0x203f4 + 4*n */
+#define FBNIC_RXB_DRBO_BYTE_CNT_SRC_H(n) \
+					(0x08101 + (n))	/* 0x20404 + 4*n */
+#define FBNIC_RXB_INTF_FRM_CNT_DST(n)	(0x08105 + (n))	/* 0x20414 + 4*n */
+#define FBNIC_RXB_INTF_BYTE_CNT_DST_L(n) \
+					(0x08109 + (n))	/* 0x20424 + 4*n */
+#define FBNIC_RXB_INTF_BYTE_CNT_DST_H(n) \
+					(0x0810d + (n))	/* 0x20434 + 4*n */
+#define FBNIC_RXB_PBUF_FRM_CNT_DST(n)	(0x08111 + (n))	/* 0x20444 + 4*n */
+#define FBNIC_RXB_PBUF_BYTE_CNT_DST_L(n) \
+					(0x08115 + (n))	/* 0x20454 + 4*n */
+#define FBNIC_RXB_PBUF_BYTE_CNT_DST_H(n) \
+					(0x08119 + (n))	/* 0x20464 + 4*n */
+
+#define FBNIC_RXB_PBUF_FIFO_LEVEL(n)	(0x0811d + (n)) /* 0x20474 + 4*n */
+
+#define FBNIC_RXB_INTEGRITY_ERR(n)	(0x0812f + (n))	/* 0x204bc + 4*n */
+#define FBNIC_RXB_MAC_ERR(n)		(0x08133 + (n))	/* 0x204cc + 4*n */
+#define FBNIC_RXB_PARSER_ERR(n)		(0x08137 + (n))	/* 0x204dc + 4*n */
+#define FBNIC_RXB_FRM_ERR(n)		(0x0813b + (n))	/* 0x204ec + 4*n */
+
+#define FBNIC_RXB_DWRR_RDE_WEIGHT0_EXT	0x08143		/* 0x2050c */
+#define FBNIC_RXB_DWRR_RDE_WEIGHT1_EXT	0x08144		/* 0x20510 */
+#define FBNIC_CSR_END_RXB		0x081b1	/* CSR section delimiter */
+
+/* Rx Parser and Classifier Registers */
+#define FBNIC_CSR_START_RPC		0x08400	/* CSR section delimiter */
+#define FBNIC_RPC_RMI_CONFIG		0x08400		/* 0x21000 */
+#define FBNIC_RPC_RMI_CONFIG_OH_BYTES		CSR_GENMASK(4, 0)
+#define FBNIC_RPC_RMI_CONFIG_FCS_PRESENT	CSR_BIT(8)
+#define FBNIC_RPC_RMI_CONFIG_ENABLE		CSR_BIT(12)
+#define FBNIC_RPC_RMI_CONFIG_MTU		CSR_GENMASK(31, 16)
+#define FBNIC_CSR_END_RPC		0x0856b	/* CSR section delimiter */
+
+/* Fab Registers */
+#define FBNIC_CSR_START_FAB		0x0C000 /* CSR section delimiter */
+#define FBNIC_FAB_AXI4_AR_SPACER_2_CFG		0x0C005		/* 0x30014 */
+#define FBNIC_FAB_AXI4_AR_SPACER_MASK		CSR_BIT(16)
+#define FBNIC_FAB_AXI4_AR_SPACER_THREADSHOLD	CSR_GENMASK(15, 0)
+#define FBNIC_CSR_END_FAB		0x0C020	    /* CSR section delimiter */
+
+/* Master Registers */
+#define FBNIC_CSR_START_MASTER		0x0C400	/* CSR section delimiter */
+#define FBNIC_MASTER_SPARE_0		0x0C41B		/* 0x3106c */
+#define FBNIC_CSR_END_MASTER		0x0C41E	/* CSR section delimiter */
+
+/* PUL User Registers */
+#define FBNIC_CSR_START_PUL_USER	0x31000	/* CSR section delimiter */
+#define FBNIC_PUL_OB_TLP_HDR_AW_CFG	0x3103d		/* 0xc40f4 */
+#define FBNIC_PUL_OB_TLP_HDR_AW_CFG_BME		CSR_BIT(18)
+#define FBNIC_PUL_OB_TLP_HDR_AR_CFG	0x3103e		/* 0xc40f8 */
+#define FBNIC_PUL_OB_TLP_HDR_AR_CFG_BME		CSR_BIT(18)
+#define FBNIC_CSR_END_PUL_USER	0x31080	/* CSR section delimiter */
+
+#define FBNIC_MAX_QUEUES		128
+
 #endif /* _FBNIC_CSR_H_ */
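One convention in the register block above is worth spelling out: the `#define` values are u32 word indices into the mapped BAR (the `wr32`/`rd32` helpers index a `u32 __iomem` pointer), while the trailing comment on each line is the corresponding byte address. A small standalone check of that relationship, using two defines copied from the patch (the `csr_byte_addr()` helper is invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Word-index register definitions, copied from the patch */
#define FBNIC_QM_TWQ_DEFAULT_META_L	0x00818		/* 0x02060 */
#define FBNIC_QM_RCQ_CTL0		0x00c0c		/* 0x03030 */

/* The CSR space is addressed as 32-bit words, so the byte address in
 * each trailing comment is simply the word index shifted left by 2.
 */
static uint32_t csr_byte_addr(uint32_t reg_idx)
{
	return reg_idx << 2;
}
```

This also explains the error message in `fbnic_rd32()` later in the patch, which prints both `reg` and `reg << 2`.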
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_mac.c b/drivers/net/ethernet/meta/fbnic/fbnic_mac.c
new file mode 100644
index 000000000000..dbbfdc649f37
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_mac.c
@@ -0,0 +1,438 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#include <linux/bitfield.h>
+#include <net/tcp.h>
+
+#include "fbnic.h"
+#include "fbnic_mac.h"
+
+static void fbnic_init_readrq(struct fbnic_dev *fbd, unsigned int offset,
+			      unsigned int cls, unsigned int readrq)
+{
+	u32 val = rd32(offset);
+
+	/* The TDF_CTL masks are a superset of the RNI_RBP ones, so we can
+	 * use them when setting either the TDF_CTL or RNI_RBP registers.
+	 */
+	val &= FBNIC_QM_TNI_TDF_CTL_MAX_OT | FBNIC_QM_TNI_TDF_CTL_MAX_OB;
+
+	val |= FIELD_PREP(FBNIC_QM_TNI_TDF_CTL_MRRS, readrq) |
+	       FIELD_PREP(FBNIC_QM_TNI_TDF_CTL_CLS, cls);
+
+	wr32(offset, val);
+}
+
+static void fbnic_init_mps(struct fbnic_dev *fbd, unsigned int offset,
+			   unsigned int cls, unsigned int mps)
+{
+	u32 val = rd32(offset);
+
+	/* Currently all MPS masks are identical so just use the first one */
+	val &= ~(FBNIC_QM_TNI_TCM_CTL_MPS | FBNIC_QM_TNI_TCM_CTL_CLS);
+
+	val |= FIELD_PREP(FBNIC_QM_TNI_TCM_CTL_MPS, mps) |
+	       FIELD_PREP(FBNIC_QM_TNI_TCM_CTL_CLS, cls);
+
+	wr32(offset, val);
+}
+
+static void fbnic_mac_init_axi(struct fbnic_dev *fbd)
+{
+	bool override_1k = false;
+	int readrq, mps, cls;
+
+	/* All of the values are based on being a power of 2 starting
+	 * with 64 == 0. Therefore we can either divide by 64 in the
+	 * case of constants, or just subtract 6 from the log2 of the value
+	 * in order to get the value we will be programming into the
+	 * registers.
+	 */
+	readrq = ilog2(fbd->readrq) - 6;
+	if (readrq > 3)
+		override_1k = true;
+	readrq = clamp(readrq, 0, 3);
+
+	mps = ilog2(fbd->mps) - 6;
+	mps = clamp(mps, 0, 3);
+
+	cls = ilog2(L1_CACHE_BYTES) - 6;
+	cls = clamp(cls, 0, 3);
+
+	/* Configure Tx/Rx AXI Paths w/ Read Request and Max Payload sizes */
+	fbnic_init_readrq(fbd, FBNIC_QM_TNI_TDF_CTL, cls, readrq);
+	fbnic_init_mps(fbd, FBNIC_QM_TNI_TCM_CTL, cls, mps);
+
+	/* Configure QM TNI TDE:
+	 * - Max outstanding AXI beats to 704 (768 - 64) - guarantees 8% of
+	 *   buffer capacity to descriptors.
+	 * - Max outstanding transactions to 128
+	 */
+	wr32(FBNIC_QM_TNI_TDE_CTL,
+	     FIELD_PREP(FBNIC_QM_TNI_TDE_CTL_MRRS_1K, override_1k ? 1 : 0) |
+	     FIELD_PREP(FBNIC_QM_TNI_TDE_CTL_MAX_OB, 704) |
+	     FIELD_PREP(FBNIC_QM_TNI_TDE_CTL_MAX_OT, 128) |
+	     FIELD_PREP(FBNIC_QM_TNI_TDE_CTL_MRRS, readrq) |
+	     FIELD_PREP(FBNIC_QM_TNI_TDE_CTL_CLS, cls));
+
+	fbnic_init_readrq(fbd, FBNIC_QM_RNI_RBP_CTL, cls, readrq);
+	fbnic_init_mps(fbd, FBNIC_QM_RNI_RDE_CTL, cls, mps);
+	fbnic_init_mps(fbd, FBNIC_QM_RNI_RCM_CTL, cls, mps);
+
+	/* Enable XALI AR/AW outbound */
+	wr32(FBNIC_PUL_OB_TLP_HDR_AW_CFG,
+	     FBNIC_PUL_OB_TLP_HDR_AW_CFG_BME);
+	wr32(FBNIC_PUL_OB_TLP_HDR_AR_CFG,
+	     FBNIC_PUL_OB_TLP_HDR_AR_CFG_BME);
+}
+
+static void fbnic_mac_init_qm(struct fbnic_dev *fbd)
+{
+	u32 clock_freq;
+
+	/* Configure TSO behavior */
+	fbnic_wr32(fbd, FBNIC_QM_TQS_CTL0,
+		   FIELD_PREP(FBNIC_QM_TQS_CTL0_LSO_TS_MASK,
+			      FBNIC_QM_TQS_CTL0_LSO_TS_LAST) |
+		   FIELD_PREP(FBNIC_QM_TQS_CTL0_PREFETCH_THRESH,
+			      FBNIC_QM_TQS_CTL0_PREFETCH_THRESH_MIN));
+
+	/* Limit EDT to INT_MAX as this is the limit of the EDT Qdisc */
+	fbnic_wr32(fbd, FBNIC_QM_TQS_EDT_TS_RANGE, INT_MAX);
+
+	/* Configure MTU
+	 * Due to a known HW issue we cannot set the MTU to within 16 octets
+	 * of a 64 octet aligned boundary. So we will set the TQS_MTU(s) to
+	 * MTU + 1.
+	 */
+	fbnic_wr32(fbd, FBNIC_QM_TQS_MTU_CTL0, FBNIC_MAX_JUMBO_FRAME_SIZE + 1);
+	fbnic_wr32(fbd, FBNIC_QM_TQS_MTU_CTL1,
+		   FIELD_PREP(FBNIC_QM_TQS_MTU_CTL1_BULK,
+			      FBNIC_MAX_JUMBO_FRAME_SIZE + 1));
+
+	clock_freq = FBNIC_CLOCK_FREQ;
+
+	/* Be aggressive on the timings. We will have the interrupt
+	 * threshold timer tick once every 1 usec and coalesce writes for
+	 * up to 80 usecs.
+	 */
+	fbnic_wr32(fbd, FBNIC_QM_TCQ_CTL0,
+		   FIELD_PREP(FBNIC_QM_TCQ_CTL0_TICK_CYCLES,
+			      clock_freq / 1000000) |
+		   FIELD_PREP(FBNIC_QM_TCQ_CTL0_COAL_WAIT,
+			      clock_freq / 12500));
+
+	/* We will have the interrupt threshold timer tick once every
+	 * 1 usec and coalesce writes for up to 2 usecs.
+	 */
+	fbnic_wr32(fbd, FBNIC_QM_RCQ_CTL0,
+		   FIELD_PREP(FBNIC_QM_RCQ_CTL0_TICK_CYCLES,
+			      clock_freq / 1000000) |
+		   FIELD_PREP(FBNIC_QM_RCQ_CTL0_COAL_WAIT,
+			      clock_freq / 500000));
+
+	/* Configure spacer control to 64 beats. */
+	fbnic_wr32(fbd, FBNIC_FAB_AXI4_AR_SPACER_2_CFG,
+		   FBNIC_FAB_AXI4_AR_SPACER_MASK |
+		   FIELD_PREP(FBNIC_FAB_AXI4_AR_SPACER_THREADSHOLD, 2));
+}
+
+#define FBNIC_DROP_EN_MASK	0x7d
+#define FBNIC_PAUSE_EN_MASK	0x14
+#define FBNIC_ECN_EN_MASK	0x10
+
+struct fbnic_fifo_config {
+	unsigned int addr;
+	unsigned int size;
+};
+
+/* Rx FIFO Configuration
+ * The table consists of 8 entries, of which only 4 are currently used.
+ * The starting addr is in units of 64B and the size is in 2KB units.
+ * Below is the human-readable version of the table:
+ * Function		Addr	Size
+ * ----------------------------------
+ * network to Host/BMC	384K	64K
+ * Unused
+ * Unused
+ * network to BMC	448K	32K
+ * network to Host	0	384K
+ * Unused
+ * BMC to Host		480K	32K
+ * Unused
+ */
+static const struct fbnic_fifo_config fifo_config[] = {
+	{ .addr = 0x1800, .size = 0x20 },	/* network to Host/BMC */
+	{ },					/* not used */
+	{ },					/* not used */
+	{ .addr = 0x1c00, .size = 0x10 },	/* network to BMC */
+	{ .addr = 0x0000, .size = 0xc0 },	/* network to Host */
+	{ },					/* not used */
+	{ .addr = 0x1e00, .size = 0x10 },	/* BMC to Host */
+	{ }					/* not used */
+};
+
+static void fbnic_mac_init_rxb(struct fbnic_dev *fbd)
+{
+	bool rx_enable;
+	int i;
+
+	rx_enable = !!(fbnic_rd32(fbd, FBNIC_RPC_RMI_CONFIG) &
+		       FBNIC_RPC_RMI_CONFIG_ENABLE);
+
+	for (i = 0; i < 8; i++) {
+		unsigned int size = fifo_config[i].size;
+
+		/* If we are coming up on a system that already has the
+		 * Rx data path enabled we don't need to reconfigure the
+		 * FIFOs. Instead we can check to verify the values are
+		 * large enough to meet our needs, and use the values to
+		 * populate the flow control, ECN, and drop thresholds.
+		 */
+		if (rx_enable) {
+			size = FIELD_GET(FBNIC_RXB_PBUF_SIZE,
+					 fbnic_rd32(fbd,
+						    FBNIC_RXB_PBUF_CFG(i)));
+			if (size < fifo_config[i].size)
+				dev_warn(fbd->dev,
+					 "fifo%d size of %d smaller than expected value of %d\n",
+					 i, size << 11,
+					 fifo_config[i].size << 11);
+		} else {
+			/* Program RXB cut-through */
+			fbnic_wr32(fbd, FBNIC_RXB_CT_SIZE(i),
+				   FIELD_PREP(FBNIC_RXB_CT_SIZE_HEADER, 4) |
+				   FIELD_PREP(FBNIC_RXB_CT_SIZE_PAYLOAD, 2));
+
+			/* The packet buffer size has a 2KB granularity,
+			 * while the packet buffer base address has only a
+			 * 64B granularity.
+			 */
+			fbnic_wr32(fbd, FBNIC_RXB_PBUF_CFG(i),
+				   FIELD_PREP(FBNIC_RXB_PBUF_BASE_ADDR,
+					      fifo_config[i].addr) |
+				   FIELD_PREP(FBNIC_RXB_PBUF_SIZE, size));
+
+			/* The credits have a 64B granularity, so each 2KB
+			 * unit of RXB_PBUF_SIZE provides 32 credits, plus 4.
+			 */
+			fbnic_wr32(fbd, FBNIC_RXB_PBUF_CREDIT(i),
+				   FIELD_PREP(FBNIC_RXB_PBUF_CREDIT_MASK,
+					      size ? size * 32 + 4 : 0));
+		}
+
+		if (!size)
+			continue;
+
+		/* Pause thresholds: FIFO size with a 56KB skid to start/stop */
+		fbnic_wr32(fbd, FBNIC_RXB_PAUSE_THLD(i),
+			   !(FBNIC_PAUSE_EN_MASK & (1u << i)) ? 0x1fff :
+			   FIELD_PREP(FBNIC_RXB_PAUSE_THLD_ON,
+				      size * 32 - 0x380) |
+			   FIELD_PREP(FBNIC_RXB_PAUSE_THLD_OFF, 0x380));
+
+		/* Enable Drop when only one packet is left in the FIFO */
+		fbnic_wr32(fbd, FBNIC_RXB_DROP_THLD(i),
+			   !(FBNIC_DROP_EN_MASK & (1u << i)) ? 0x1fff :
+			   FIELD_PREP(FBNIC_RXB_DROP_THLD_ON,
+				      size * 32 -
+				      FBNIC_MAX_JUMBO_FRAME_SIZE / 64) |
+			   FIELD_PREP(FBNIC_RXB_DROP_THLD_OFF,
+				      size * 32 -
+				      FBNIC_MAX_JUMBO_FRAME_SIZE / 64));
+
+		/* Assert the ECN bit when 1/4 of the RXB is filled,
+		 * requiring at least one full jumbo frame's worth of
+		 * data before ECN is set.
+		 */
+		fbnic_wr32(fbd, FBNIC_RXB_ECN_THLD(i),
+			   !(FBNIC_ECN_EN_MASK & (1u << i)) ? 0x1fff :
+			   FIELD_PREP(FBNIC_RXB_ECN_THLD_ON,
+				      max_t(unsigned int,
+					    size * 32 / 4,
+					    FBNIC_MAX_JUMBO_FRAME_SIZE / 64)) |
+			   FIELD_PREP(FBNIC_RXB_ECN_THLD_OFF,
+				      max_t(unsigned int,
+					    size * 32 / 4,
+					    FBNIC_MAX_JUMBO_FRAME_SIZE / 64)));
+	}
+
+	/* For now only enable drop and ECN. We need to add driver/kernel
+	 * interfaces for configuring pause.
+	 */
+	fbnic_wr32(fbd, FBNIC_RXB_PAUSE_DROP_CTRL,
+		   FIELD_PREP(FBNIC_RXB_PAUSE_DROP_CTRL_DROP_ENABLE,
+			      FBNIC_DROP_EN_MASK) |
+		   FIELD_PREP(FBNIC_RXB_PAUSE_DROP_CTRL_ECN_ENABLE,
+			      FBNIC_ECN_EN_MASK));
+
+	/* Program INTF credits */
+	fbnic_wr32(fbd, FBNIC_RXB_INTF_CREDIT,
+		   FBNIC_RXB_INTF_CREDIT_MASK0 |
+		   FBNIC_RXB_INTF_CREDIT_MASK1 |
+		   FBNIC_RXB_INTF_CREDIT_MASK2 |
+		   FIELD_PREP(FBNIC_RXB_INTF_CREDIT_MASK3, 8));
+
+	/* Configure calendar slots.
+	 * Rx: 0 - 62	RDE 1st, BMC 2nd
+	 *     63	BMC 1st, RDE 2nd
+	 */
+	for (i = 0; i < 16; i++) {
+		u32 calendar_val = (i == 15) ? 0x1e1b1b1b : 0x1b1b1b1b;
+
+		fbnic_wr32(fbd, FBNIC_RXB_CLDR_PRIO_CFG(i), calendar_val);
+	}
+
+	/* Split the credits for the DRR up as follows:
+	 * Quantum0: 8000	Network to Host
+	 * Quantum1: 0		Not used
+	 * Quantum2: 80		BMC to Host
+	 * Quantum3: 0		Not used
+	 * Quantum4: 8000	Multicast to Host and BMC
+	 */
+	fbnic_wr32(fbd, FBNIC_RXB_DWRR_RDE_WEIGHT0,
+		   FIELD_PREP(FBNIC_RXB_DWRR_RDE_WEIGHT0_QUANTUM0, 0x40) |
+		   FIELD_PREP(FBNIC_RXB_DWRR_RDE_WEIGHT0_QUANTUM2, 0x50));
+	fbnic_wr32(fbd, FBNIC_RXB_DWRR_RDE_WEIGHT0_EXT,
+		   FIELD_PREP(FBNIC_RXB_DWRR_RDE_WEIGHT0_QUANTUM0, 0x1f));
+	fbnic_wr32(fbd, FBNIC_RXB_DWRR_RDE_WEIGHT1,
+		   FIELD_PREP(FBNIC_RXB_DWRR_RDE_WEIGHT1_QUANTUM4, 0x40));
+	fbnic_wr32(fbd, FBNIC_RXB_DWRR_RDE_WEIGHT1_EXT,
+		   FIELD_PREP(FBNIC_RXB_DWRR_RDE_WEIGHT1_QUANTUM4, 0x1f));
+
+	/* Program RXB FCS Endian register */
+	fbnic_wr32(fbd, FBNIC_RXB_ENDIAN_FCS, 0x0aaaaaa0);
+}
+
+static void fbnic_mac_init_txb(struct fbnic_dev *fbd)
+{
+	int i;
+
+	fbnic_wr32(fbd, FBNIC_TCE_TXB_CTRL, 0);
+
+	/* Configure Tx QM Credits */
+	fbnic_wr32(fbd, FBNIC_QM_TQS_CTL1,
+		   FIELD_PREP(FBNIC_QM_TQS_CTL1_MC_MAX_CREDITS, 0x40) |
+		   FIELD_PREP(FBNIC_QM_TQS_CTL1_BULK_MAX_CREDITS, 0x20));
+
+	/* Initialize internal Tx queues */
+	fbnic_wr32(fbd, FBNIC_TCE_TXB_TEI_Q0_CTRL, 0);
+	fbnic_wr32(fbd, FBNIC_TCE_TXB_TEI_Q1_CTRL, 0);
+	fbnic_wr32(fbd, FBNIC_TCE_TXB_MC_Q_CTRL,
+		   FIELD_PREP(FBNIC_TCE_TXB_Q_CTRL_SIZE, 0x400) |
+		   FIELD_PREP(FBNIC_TCE_TXB_Q_CTRL_START, 0x000));
+	fbnic_wr32(fbd, FBNIC_TCE_TXB_RX_TEI_Q_CTRL, 0);
+	fbnic_wr32(fbd, FBNIC_TCE_TXB_TX_BMC_Q_CTRL,
+		   FIELD_PREP(FBNIC_TCE_TXB_Q_CTRL_SIZE, 0x200) |
+		   FIELD_PREP(FBNIC_TCE_TXB_Q_CTRL_START, 0x400));
+	fbnic_wr32(fbd, FBNIC_TCE_TXB_RX_BMC_Q_CTRL,
+		   FIELD_PREP(FBNIC_TCE_TXB_Q_CTRL_SIZE, 0x200) |
+		   FIELD_PREP(FBNIC_TCE_TXB_Q_CTRL_START, 0x600));
+
+	fbnic_wr32(fbd, FBNIC_TCE_LSO_CTRL,
+		   FBNIC_TCE_LSO_CTRL_IPID_MODE_INC |
+		   FIELD_PREP(FBNIC_TCE_LSO_CTRL_TCPF_CLR_1ST, TCPHDR_PSH |
+							       TCPHDR_FIN) |
+		   FIELD_PREP(FBNIC_TCE_LSO_CTRL_TCPF_CLR_MID, TCPHDR_PSH |
+							       TCPHDR_CWR |
+							       TCPHDR_FIN) |
+		   FIELD_PREP(FBNIC_TCE_LSO_CTRL_TCPF_CLR_END, TCPHDR_CWR));
+	fbnic_wr32(fbd, FBNIC_TCE_CSO_CTRL, 0);
+
+	fbnic_wr32(fbd, FBNIC_TCE_BMC_MAX_PKTSZ,
+		   FIELD_PREP(FBNIC_TCE_BMC_MAX_PKTSZ_TX,
+			      FBNIC_MAX_JUMBO_FRAME_SIZE) |
+		   FIELD_PREP(FBNIC_TCE_BMC_MAX_PKTSZ_RX,
+			      FBNIC_MAX_JUMBO_FRAME_SIZE));
+	fbnic_wr32(fbd, FBNIC_TCE_MC_MAX_PKTSZ,
+		   FIELD_PREP(FBNIC_TCE_MC_MAX_PKTSZ_TMI,
+			      FBNIC_MAX_JUMBO_FRAME_SIZE));
+
+	/* Enable Drops in Tx path, needed for FPGA only */
+	fbnic_wr32(fbd, FBNIC_TCE_DROP_CTRL,
+		   FBNIC_TCE_DROP_CTRL_TTI_CM_DROP_EN |
+		   FBNIC_TCE_DROP_CTRL_TTI_FRM_DROP_EN |
+		   FBNIC_TCE_DROP_CTRL_TTI_TBI_DROP_EN);
+
+	/* Configure calendar slots.
+	 * Tx: 0 - 62	TMI 1st, BMC 2nd
+	 *     63	BMC 1st, TMI 2nd
+	 */
+	for (i = 0; i < 16; i++) {
+		u32 calendar_val = (i == 15) ? 0x1e1b1b1b : 0x1b1b1b1b;
+
+		fbnic_wr32(fbd, FBNIC_TCE_TXB_CLDR_SLOT_CFG(i), calendar_val);
+	}
+
+	/* Configure DWRR */
+	fbnic_wr32(fbd, FBNIC_TCE_TXB_ENQ_WRR_CTRL,
+		   FIELD_PREP(FBNIC_TCE_TXB_ENQ_WRR_CTRL_WEIGHT0, 0x64) |
+		   FIELD_PREP(FBNIC_TCE_TXB_ENQ_WRR_CTRL_WEIGHT2, 0x04));
+	fbnic_wr32(fbd, FBNIC_TCE_TXB_TEI_DWRR_CTRL, 0);
+	fbnic_wr32(fbd, FBNIC_TCE_TXB_TEI_DWRR_CTRL_EXT, 0);
+	fbnic_wr32(fbd, FBNIC_TCE_TXB_BMC_DWRR_CTRL,
+		   FIELD_PREP(FBNIC_TCE_TXB_BMC_DWRR_CTRL_QUANTUM0, 0x50) |
+		   FIELD_PREP(FBNIC_TCE_TXB_BMC_DWRR_CTRL_QUANTUM1, 0x82));
+	fbnic_wr32(fbd, FBNIC_TCE_TXB_BMC_DWRR_CTRL_EXT, 0);
+	fbnic_wr32(fbd, FBNIC_TCE_TXB_NTWRK_DWRR_CTRL,
+		   FIELD_PREP(FBNIC_TCE_TXB_NTWRK_DWRR_CTRL_QUANTUM1, 0x50) |
+		   FIELD_PREP(FBNIC_TCE_TXB_NTWRK_DWRR_CTRL_QUANTUM2, 0x20));
+	fbnic_wr32(fbd, FBNIC_TCE_TXB_NTWRK_DWRR_CTRL_EXT,
+		   FIELD_PREP(FBNIC_TCE_TXB_NTWRK_DWRR_CTRL_QUANTUM2, 0x03));
+
+	/* Configure SOP protocol protection */
+	fbnic_wr32(fbd, FBNIC_TCE_SOP_PROT_CTRL,
+		   FIELD_PREP(FBNIC_TCE_SOP_PROT_CTRL_TBI, 0x78) |
+		   FIELD_PREP(FBNIC_TCE_SOP_PROT_CTRL_TTI_FRM, 0x40) |
+		   FIELD_PREP(FBNIC_TCE_SOP_PROT_CTRL_TTI_CM, 0x0c));
+
+	/* Conservative configuration on MAC interface Start of Packet
+	 * protection FIFO. This sets the minimum depth of the FIFO before
+	 * we start sending packets to the MAC measured in 64B units and
+	 * up to 160 entries deep.
+	 *
+	 * For the ASIC the clock is fast enough that we will likely fill
+	 * the SOP FIFO before the MAC can drain it. So just use a minimum
+	 * value of 8.
+	 *
+	 * For the FPGA we have a clock that is about 3/5 of the MAC clock.
+	 * As such we will need to account for adding more runway before
+	 * transmitting the frames.
+	 * SOP = (9230 / 64) * 2/5 + 8
+	 * SOP = 66
+	 */
+	fbnic_wr32(fbd, FBNIC_TMI_SOP_PROT_CTRL, 8);
+
+	wrfl();
+	fbnic_wr32(fbd, FBNIC_TCE_TXB_CTRL, FBNIC_TCE_TXB_CTRL_TCAM_ENABLE |
+					    FBNIC_TCE_TXB_CTRL_LOAD);
+}
+
+static void fbnic_mac_init_regs(struct fbnic_dev *fbd)
+{
+	fbnic_mac_init_axi(fbd);
+	fbnic_mac_init_qm(fbd);
+	fbnic_mac_init_rxb(fbd);
+	fbnic_mac_init_txb(fbd);
+}
+
+static const struct fbnic_mac fbnic_mac_asic = {
+	.init_regs = fbnic_mac_init_regs,
+};
+
+/**
+ * fbnic_mac_init - Assign a MAC type and initialize the fbnic device
+ * @fbd: Device pointer to device to initialize
+ *
+ * Returns 0 on success, negative on failure
+ *
+ * Initialize the MAC function pointers and then use them to initialize
+ * the MAC of the device.
+ **/
+int fbnic_mac_init(struct fbnic_dev *fbd)
+{
+	fbd->mac = &fbnic_mac_asic;
+
+	fbd->mac->init_regs(fbd);
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_mac.h b/drivers/net/ethernet/meta/fbnic/fbnic_mac.h
new file mode 100644
index 000000000000..e78a92338a62
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_mac.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#ifndef _FBNIC_MAC_H_
+#define _FBNIC_MAC_H_
+
+#include <linux/types.h>
+
+struct fbnic_dev;
+
+#define FBNIC_MAX_JUMBO_FRAME_SIZE	9742
+
+/* This structure defines the interface hooks for the MAC. The MAC hooks
+ * will be configured as a const struct provided with a set of function
+ * pointers.
+ *
+ * void (*init_regs)(struct fbnic_dev *fbd);
+ *	Initialize MAC registers to enable Tx/Rx paths and FIFOs.
+ */
+struct fbnic_mac {
+	void (*init_regs)(struct fbnic_dev *fbd);
+};
+
+int fbnic_mac_init(struct fbnic_dev *fbd);
+#endif /* _FBNIC_MAC_H_ */
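To make the units in the `fifo_config` table in fbnic_mac.c above concrete, here is a standalone sketch converting the table entries (addr in 64B units, size in 2KB units) back to the byte values listed in the human-readable comment. The helper names are invented for illustration and appear nowhere in the patch:

```c
#include <assert.h>

/* Units from the fifo_config comment: addr in 64B units, size in 2KB units */
static unsigned int fifo_addr_bytes(unsigned int addr)
{
	return addr * 64;
}

static unsigned int fifo_size_bytes(unsigned int size)
{
	return size * 2048;
}
```

For example, the "network to Host/BMC" entry `{ .addr = 0x1800, .size = 0x20 }` decodes to a 64KB region starting at 384K, matching the comment table.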
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
index 596151396eac..c860516eb23a 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
@@ -29,6 +29,35 @@ static const struct pci_device_id fbnic_pci_tbl[] = {
 };
 MODULE_DEVICE_TABLE(pci, fbnic_pci_tbl);
 
+u32 fbnic_rd32(struct fbnic_dev *fbd, u32 reg)
+{
+	u32 __iomem *csr = READ_ONCE(fbd->uc_addr0);
+	u32 value;
+
+	if (!csr)
+		return ~0U;
+
+	value = readl(csr + reg);
+
+	/* If any bit is 0, the value should be valid */
+	if (~value)
+		return value;
+
+	/* All 1's may be valid if the spare "zeros" register still works */
+	if (reg != FBNIC_MASTER_SPARE_0 && ~readl(csr + FBNIC_MASTER_SPARE_0))
+		return value;
+
+	/* Hardware is giving us all 1's reads, assume it is gone */
+	WRITE_ONCE(fbd->uc_addr0, NULL);
+	WRITE_ONCE(fbd->uc_addr4, NULL);
+
+	dev_err(fbd->dev,
+		"Failed read (idx 0x%x AKA addr 0x%x), disabled CSR access, awaiting reset\n",
+		reg, reg << 2);
+
+	return ~0U;
+}
+
 /**
  *  fbnic_probe - Device Initialization Routine
  *  @pdev: PCI device information struct
@@ -88,6 +117,12 @@ static int fbnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (err)
 		goto free_fbd;
 
+	err = fbnic_mac_init(fbd);
+	if (err) {
+		dev_err(&pdev->dev, "Failed to initialize MAC: %d\n", err);
+		goto free_irqs;
+	}
+
 	if (!fbd->dsn) {
 		dev_warn(&pdev->dev, "Reading serial number failed\n");
 		goto init_failure_mode;
@@ -101,6 +136,8 @@ static int fbnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	  * firmware updates for fixes.
 	  */
 	return 0;
+free_irqs:
+	fbnic_free_irqs(fbd);
 free_fbd:
 	pci_disable_device(pdev);
 
@@ -158,6 +195,8 @@ static int __fbnic_pm_resume(struct device *dev)
 	if (err)
 		goto err_invalidate_uc_addr;
 
+	fbd->mac->init_regs(fbd);
+
 	return 0;
 err_invalidate_uc_addr:
 	WRITE_ONCE(fbd->uc_addr0, NULL);



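The fbnic_rd32() path above treats an all-1's read as a possible sign that the device has fallen off the bus, and confirms by re-reading a register known to read as zero before disabling CSR access. A minimal user-space sketch of that idiom (the `fake_dev` struct and `fake_rd32()` helper are hypothetical stand-ins for the driver's MMIO accessors):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the device CSR space; in the driver this
 * is an ioremap()'d region accessed with readl(). */
struct fake_dev {
	uint32_t *csr;   /* NULL once the device is marked gone */
	size_t zero_reg; /* register wired to read 0 (FBNIC_MASTER_SPARE_0) */
};

/* Mirror of the fbnic_rd32() logic: an all-1's read is only trusted
 * if the known-zero scratch register does not also read all 1's. */
static uint32_t fake_rd32(struct fake_dev *d, size_t reg)
{
	uint32_t value;

	if (!d->csr)
		return ~0U;

	value = d->csr[reg];
	if (~value)		/* any 0 bit means the read is valid */
		return value;

	if (reg != d->zero_reg && ~d->csr[d->zero_reg])
		return value;	/* register legitimately reads all 1's */

	d->csr = NULL;		/* device gone, disable CSR access */
	return ~0U;
}
```

Once the zero register also reads all 1's, every subsequent read short-circuits to ~0U until the pointer is restored on reset, matching the "awaiting reset" behavior in the patch.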
^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [net-next PATCH 05/15] eth: fbnic: add message parsing for FW messages
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
                   ` (3 preceding siblings ...)
  2024-04-03 20:08 ` [net-next PATCH 04/15] eth: fbnic: Add register init to set PCIe/Ethernet device config Alexander Duyck
@ 2024-04-03 20:08 ` Alexander Duyck
  2024-04-03 21:07   ` Jeff Johnson
  2024-04-03 20:08 ` [net-next PATCH 06/15] eth: fbnic: add FW communication mechanism Alexander Duyck
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 20:08 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Duyck, kuba, davem, pabeni

From: Alexander Duyck <alexanderduyck@fb.com>

Add FW message formatting and parsing. The TLV format should
look very familiar to anyone who has worked with netlink.
Since we don't have to deal with backward compatibility we
tweaked the format a little to make it easier to work with,
and more appropriate for tightly coupled interfaces like
driver<>FW communication.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 drivers/net/ethernet/meta/fbnic/Makefile    |    3 
 drivers/net/ethernet/meta/fbnic/fbnic_tlv.c |  529 +++++++++++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_tlv.h |  175 +++++++++
 3 files changed, 706 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_tlv.c
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_tlv.h

diff --git a/drivers/net/ethernet/meta/fbnic/Makefile b/drivers/net/ethernet/meta/fbnic/Makefile
index b8f4511440dc..0434ee0b3069 100644
--- a/drivers/net/ethernet/meta/fbnic/Makefile
+++ b/drivers/net/ethernet/meta/fbnic/Makefile
@@ -10,4 +10,5 @@ obj-$(CONFIG_FBNIC) += fbnic.o
 fbnic-y := fbnic_devlink.o \
 	   fbnic_irq.o \
 	   fbnic_mac.o \
-	   fbnic_pci.o
+	   fbnic_pci.o \
+	   fbnic_tlv.o
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_tlv.c b/drivers/net/ethernet/meta/fbnic/fbnic_tlv.c
new file mode 100644
index 000000000000..88d7b15fa798
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_tlv.c
@@ -0,0 +1,529 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/once.h>
+#include <linux/random.h>
+#include <linux/string.h>
+#include <uapi/linux/if_ether.h>
+
+#include "fbnic_tlv.h"
+
+/**
+ *  fbnic_tlv_msg_alloc - Allocate page and initialize FW message header
+ *  @msg_id: Identifier for new message we are starting
+ *
+ *  Returns pointer to start of message, or NULL on failure.
+ *
+ *  Allocates a page and initializes message header at start of page.
+ *  Initial message size is 1 DWORD which is just the header.
+ **/
+struct fbnic_tlv_msg *fbnic_tlv_msg_alloc(u16 msg_id)
+{
+	struct fbnic_tlv_hdr hdr = { 0 };
+	struct fbnic_tlv_msg *msg;
+
+	msg = (struct fbnic_tlv_msg *)__get_free_page(GFP_KERNEL);
+	if (!msg)
+		return NULL;
+
+	/* Start with zero filled header and then back fill with data */
+	hdr.type = msg_id;
+	hdr.is_msg = 1;
+	hdr.len = cpu_to_le16(1);
+
+	/* Copy header into start of message */
+	msg->hdr = hdr;
+
+	return msg;
+}
+
+/**
+ *  fbnic_tlv_attr_put_flag - Add flag value to message
+ *  @msg: Message header we are adding flag attribute to
+ *  @attr_id: ID of flag attribute we are adding to message
+ *
+ *  Returns -ENOSPC if there is no room for the attribute. Otherwise 0.
+ *
+ *  Adds a 1 DWORD flag attribute to the message. The presence of this
+ *  attribute can be used as a boolean value indicating true, otherwise the
+ *  value is considered false.
+ **/
+int fbnic_tlv_attr_put_flag(struct fbnic_tlv_msg *msg, const u16 attr_id)
+{
+	int attr_max_len = PAGE_SIZE - offset_in_page(msg) - sizeof(*msg);
+	struct fbnic_tlv_hdr hdr = { 0 };
+	struct fbnic_tlv_msg *attr;
+
+	attr_max_len -= le16_to_cpu(msg->hdr.len) * sizeof(u32);
+	if (attr_max_len < sizeof(*attr))
+		return -ENOSPC;
+
+	/* get header pointer and bump attr to start of data */
+	attr = &msg[le16_to_cpu(msg->hdr.len)];
+
+	/* Record attribute type and size */
+	hdr.type = attr_id;
+	hdr.len = cpu_to_le16(sizeof(hdr));
+
+	attr->hdr = hdr;
+	le16_add_cpu(&msg->hdr.len,
+		     FBNIC_TLV_MSG_SIZE(le16_to_cpu(hdr.len)));
+
+	return 0;
+}
+
+/**
+ *  fbnic_tlv_attr_put_value - Add data to message
+ *  @msg: Message header we are adding the attribute to
+ *  @attr_id: ID of attribute we are adding to message
+ *  @value: Pointer to data to be stored
+ *  @len: Size of data to be stored.
+ *
+ *  Returns -ENOSPC if there is no room for the attribute. Otherwise 0.
+ *
+ *  Adds header and copies data pointed to by value into the message. The
+ *  result is rounded up to the nearest DWORD for sizing so that the
+ *  headers remain aligned.
+ *
+ *  The assumption is that the value field is in a format where byte
+ *  ordering can be guaranteed such as a byte array or a little endian
+ *  format.
+ **/
+int fbnic_tlv_attr_put_value(struct fbnic_tlv_msg *msg, const u16 attr_id,
+			     const void *value, const int len)
+{
+	int attr_max_len = PAGE_SIZE - offset_in_page(msg) - sizeof(*msg);
+	struct fbnic_tlv_hdr hdr = { 0 };
+	struct fbnic_tlv_msg *attr;
+
+	attr_max_len -= le16_to_cpu(msg->hdr.len) * sizeof(u32);
+	if (attr_max_len < sizeof(*attr) + len)
+		return -ENOSPC;
+
+	/* get header pointer and bump attr to start of data */
+	attr = &msg[le16_to_cpu(msg->hdr.len)];
+
+	/* Record attribute type and size */
+	hdr.type = attr_id;
+	hdr.len = cpu_to_le16(sizeof(hdr) + len);
+
+	/* Zero pad end of region to be written if we aren't aligned */
+	if (len % sizeof(hdr))
+		attr->value[len / sizeof(hdr)] = 0;
+
+	/* Copy data over */
+	memcpy(attr->value, value, len);
+
+	attr->hdr = hdr;
+	le16_add_cpu(&msg->hdr.len,
+		     FBNIC_TLV_MSG_SIZE(le16_to_cpu(hdr.len)));
+
+	return 0;
+}
+
+/**
+ *  __fbnic_tlv_attr_put_int - Add integer to message
+ *  @msg: Message header we are adding the attribute to
+ *  @attr_id: ID of attribute we are adding to message
+ *  @value: Data to be stored
+ *  @len: Size of data to be stored, either 4 or 8 bytes.
+ *
+ *  Returns -ENOSPC if there is no room for the attribute. Otherwise 0.
+ *
+ *  Adds header and copies data pointed to by value into the message. Will
+ *  format the data as little endian.
+ **/
+int __fbnic_tlv_attr_put_int(struct fbnic_tlv_msg *msg, const u16 attr_id,
+			     s64 value, const int len)
+{
+	__le64 le64_value = cpu_to_le64(value);
+
+	return fbnic_tlv_attr_put_value(msg, attr_id, &le64_value, len);
+}
+
+/**
+ *  fbnic_tlv_attr_put_mac_addr - Add mac_addr to message
+ *  @msg: Message header we are adding the attribute to
+ *  @attr_id: ID of attribute we are adding to message
+ *  @mac_addr: Byte pointer to MAC address to be stored
+ *
+ *  Returns -ENOSPC if there is no room for the attribute. Otherwise 0.
+ *
+ *  Adds header and copies data pointed to by mac_addr into the message. Will
+ *  copy the address raw so it will be in big endian with start of MAC
+ *  address at start of attribute.
+ **/
+int fbnic_tlv_attr_put_mac_addr(struct fbnic_tlv_msg *msg, const u16 attr_id,
+				const u8 *mac_addr)
+{
+	return fbnic_tlv_attr_put_value(msg, attr_id, mac_addr, ETH_ALEN);
+}
+
+/**
+ *  fbnic_tlv_attr_put_string - Add string to message
+ *  @msg: Message header we are adding the attribute to
+ *  @attr_id: ID of attribute we are adding to message
+ *  @string: Byte pointer to null terminated string to be stored
+ *
+ *  Returns -ENOSPC if there is no room for the attribute. Otherwise 0.
+ *
+ *  Adds header and copies the null terminated string pointed to by string
+ *  into the message. The bytes are copied raw, so no byte order
+ *  conversion is performed.
+ **/
+int fbnic_tlv_attr_put_string(struct fbnic_tlv_msg *msg, u16 attr_id,
+			      const char *string)
+{
+	int attr_max_len = PAGE_SIZE - sizeof(*msg);
+	int str_len = 1;
+
+	/* The max length is the page size minus the existing message length
+	 * and the new attribute header. Since the message length is measured
+	 * in DWORDs we have to multiply it by 4.
+	 *
+	 * The string length doesn't include the \0 so we have to add one to
+	 * the final value, so start with that as our initial value.
+	 *
+	 * We will verify if the string will fit in fbnic_tlv_attr_put_value()
+	 */
+	attr_max_len -= le16_to_cpu(msg->hdr.len) * sizeof(u32);
+	str_len += strnlen(string, attr_max_len);
+
+	return fbnic_tlv_attr_put_value(msg, attr_id, string, str_len);
+}
+
+/**
+ *  fbnic_tlv_attr_get_unsigned - Retrieve unsigned value from result
+ *  @attr: Attribute to retrieve data from
+ *
+ *  Returns unsigned 64b value containing integer value
+ **/
+u64 fbnic_tlv_attr_get_unsigned(struct fbnic_tlv_msg *attr)
+{
+	__le64 le64_value = 0;
+
+	memcpy(&le64_value, &attr->value[0],
+	       le16_to_cpu(attr->hdr.len) - sizeof(*attr));
+
+	return le64_to_cpu(le64_value);
+}
+
+/**
+ *  fbnic_tlv_attr_get_signed - Retrieve signed value from result
+ *  @attr: Attribute to retrieve data from
+ *
+ *  Returns signed 64b value containing integer value
+ **/
+s64 fbnic_tlv_attr_get_signed(struct fbnic_tlv_msg *attr)
+{
+	int shift = (8 + sizeof(*attr) - le16_to_cpu(attr->hdr.len)) * 8;
+	__le64 le64_value = 0;
+	s64 value;
+
+	/* Copy the value and adjust for byte ordering */
+	memcpy(&le64_value, &attr->value[0],
+	       le16_to_cpu(attr->hdr.len) - sizeof(*attr));
+	value = le64_to_cpu(le64_value);
+
+	/* Sign extend the return value by using a pair of shifts */
+	return (value << shift) >> shift;
+}
+
+/**
+ * fbnic_tlv_attr_get_string - Retrieve string value from result
+ * @attr: Attribute to retrieve data from
+ * @str: Pointer to an allocated string to store the data
+ * @max_size: The maximum size which can be in str
+ *
+ * Returns the size of the string read from firmware
+ **/
+size_t fbnic_tlv_attr_get_string(struct fbnic_tlv_msg *attr, char *str,
+				 size_t max_size)
+{
+	max_size = min_t(size_t, max_size,
+			 (le16_to_cpu(attr->hdr.len) * 4) - sizeof(*attr));
+	memcpy(str, &attr->value, max_size);
+
+	return max_size;
+}
+
+/**
+ *  fbnic_tlv_attr_nest_start - Add nested attribute header to message
+ *  @msg: Message header we are adding the nested attribute to
+ *  @attr_id: ID of nested attribute we are adding to message
+ *
+ *  Returns NULL if there is no room for the attribute. Otherwise a pointer
+ *  to the new attribute header.
+ *
+ *  New header length is stored initially in DWORDs.
+ **/
+struct fbnic_tlv_msg *fbnic_tlv_attr_nest_start(struct fbnic_tlv_msg *msg,
+						u16 attr_id)
+{
+	int attr_max_len = PAGE_SIZE - offset_in_page(msg) - sizeof(*msg);
+	struct fbnic_tlv_msg *attr = &msg[le16_to_cpu(msg->hdr.len)];
+	struct fbnic_tlv_hdr hdr = { 0 };
+
+	/* Make sure we have space for at least the nest header plus one more */
+	attr_max_len -= le16_to_cpu(msg->hdr.len) * sizeof(u32);
+	if (attr_max_len < sizeof(*attr) * 2)
+		return NULL;
+
+	/* Record attribute type and size */
+	hdr.type = attr_id;
+
+	/* Add current message length to account for consumption within the
+	 * page and leave it as a multiple of DWORDs, we will shift to
+	 * bytes when we close it out.
+	 */
+	hdr.len = cpu_to_le16(1);
+
+	attr->hdr = hdr;
+
+	return attr;
+}
+
+/**
+ *  fbnic_tlv_attr_nest_stop - Close out nested attribute and add it to message
+ *  @msg: Message header we are closing out the nested attribute in
+ *
+ *  Closes out nested attribute, adds length to message, and then bumps
+ *  length from DWORDs to bytes to match other attributes.
+ **/
+void fbnic_tlv_attr_nest_stop(struct fbnic_tlv_msg *msg)
+{
+	struct fbnic_tlv_msg *attr = &msg[le16_to_cpu(msg->hdr.len)];
+	u16 len = le16_to_cpu(attr->hdr.len);
+
+	/* Add attribute to message if there is more than just a header */
+	if (len <= 1)
+		return;
+
+	le16_add_cpu(&msg->hdr.len, len);
+
+	/* convert from DWORDs to bytes */
+	attr->hdr.len = cpu_to_le16(len * sizeof(u32));
+}
+
+static int
+fbnic_tlv_attr_validate(struct fbnic_tlv_msg *attr,
+			const struct fbnic_tlv_index *tlv_index)
+{
+	u16 len = le16_to_cpu(attr->hdr.len) - sizeof(*attr);
+	u16 attr_id = attr->hdr.type;
+	__le32 *value = &attr->value[0];
+
+	if (attr->hdr.is_msg)
+		return -EINVAL;
+
+	if (attr_id >= FBNIC_TLV_RESULTS_MAX)
+		return -EINVAL;
+
+	while (tlv_index->id != attr_id) {
+		if  (tlv_index->id == FBNIC_TLV_ATTR_ID_UNKNOWN) {
+			if (attr->hdr.cannot_ignore)
+				return -ENOENT;
+			return le16_to_cpu(attr->hdr.len);
+		}
+
+		tlv_index++;
+	}
+
+	if (offset_in_page(attr) + len > PAGE_SIZE - sizeof(*attr))
+		return -E2BIG;
+
+	switch (tlv_index->type) {
+	case FBNIC_TLV_STRING:
+		if (!len || len > tlv_index->len)
+			return -EINVAL;
+		if (((char *)value)[len - 1])
+			return -EINVAL;
+		break;
+	case FBNIC_TLV_FLAG:
+		if (len)
+			return -EINVAL;
+		break;
+	case FBNIC_TLV_UNSIGNED:
+	case FBNIC_TLV_SIGNED:
+		if (tlv_index->len > sizeof(__le64))
+			return -EINVAL;
+		fallthrough;
+	case FBNIC_TLV_BINARY:
+		if (!len || len > tlv_index->len)
+			return -EINVAL;
+		break;
+	case FBNIC_TLV_NESTED:
+	case FBNIC_TLV_ARRAY:
+		if (len % 4)
+			return -EINVAL;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/**
+ *  fbnic_tlv_attr_parse_array - Parse array of attributes into results array
+ *  @attr: Start of attributes in the message
+ *  @len: Length of attributes in the message
+ *  @results: Array of pointers to store the results of parsing
+ *  @tlv_index: List of TLV attributes to be parsed from message
+ *  @tlv_attr_id: Specific ID that is repeated in array
+ *  @array_len: Number of results to store in results array
+ *
+ *  Returns 0 on success, or negative value on error.
+ *
+ *  Will take a list of attributes and a parser definition and will capture
+ *  the results in the results array to have the data extracted later.
+ **/
+int fbnic_tlv_attr_parse_array(struct fbnic_tlv_msg *attr, int len,
+			       struct fbnic_tlv_msg **results,
+			       const struct fbnic_tlv_index *tlv_index,
+			       u16 tlv_attr_id, size_t array_len)
+{
+	int i = 0;
+
+	/* Initialize results table to NULL. */
+	memset(results, 0, array_len * sizeof(results[0]));
+
+	/* Nothing to parse if header was only thing there */
+	if (!len)
+		return 0;
+
+	/* Work through list of attributes, parsing them as necessary */
+	while (len > 0) {
+		u16 attr_id = attr->hdr.type;
+		u16 attr_len;
+		int err;
+
+		if (tlv_attr_id != attr_id)
+			return -EINVAL;
+
+		/* stop parsing on full error */
+		err = fbnic_tlv_attr_validate(attr, tlv_index);
+		if (err < 0)
+			return err;
+
+		if (i >= array_len)
+			return -ENOSPC;
+
+		results[i++] = attr;
+
+		attr_len = FBNIC_TLV_MSG_SIZE(le16_to_cpu(attr->hdr.len));
+		len -= attr_len;
+		attr += attr_len;
+	}
+
+	return len == 0 ? 0 : -EINVAL;
+}
+
+/**
+ *  fbnic_tlv_attr_parse - Parse attributes into a list of attribute results
+ *  @attr: Start of attributes in the message
+ *  @len: Length of attributes in the message
+ *  @results: Array of pointers to store the results of parsing
+ *  @tlv_index: List of TLV attributes to be parsed from message
+ *
+ *  Returns 0 on success, or negative value on error.
+ *
+ *  Will take a list of attributes and a parser definition and will capture
+ *  the results in the results array to have the data extracted later.
+ **/
+int fbnic_tlv_attr_parse(struct fbnic_tlv_msg *attr, int len,
+			 struct fbnic_tlv_msg **results,
+			 const struct fbnic_tlv_index *tlv_index)
+{
+	/* Initialize results table to NULL. */
+	memset(results, 0, sizeof(results[0]) * FBNIC_TLV_RESULTS_MAX);
+
+	/* Nothing to parse if header was only thing there */
+	if (!len)
+		return 0;
+
+	/* Work through list of attributes, parsing them as necessary */
+	while (len > 0) {
+		int err = fbnic_tlv_attr_validate(attr, tlv_index);
+		u16 attr_id = attr->hdr.type;
+		u16 attr_len;
+
+		/* stop parsing on full error */
+		if (err < 0)
+			return err;
+
+		/* Ignore results for unsupported values */
+		if (!err) {
+			/* Do not overwrite existing entries */
+			if (results[attr_id])
+				return -EADDRINUSE;
+
+			results[attr_id] = attr;
+		}
+
+		attr_len = FBNIC_TLV_MSG_SIZE(le16_to_cpu(attr->hdr.len));
+		len -= attr_len;
+		attr += attr_len;
+	}
+
+	return len == 0 ? 0 : -EINVAL;
+}
+
+/**
+ *  fbnic_tlv_msg_parse - Parse message and process via predetermined functions
+ *  @opaque: Value passed to parser function to enable driver access
+ *  @msg: Message to be parsed.
+ *  @parser: TLV message parser definition.
+ *
+ *  Returns 0 on success, or negative value on error.
+ *
+ *  Matches the message against the parser array by message ID, parses its
+ *  attributes using the parser's attribute definitions, and then passes
+ *  the results to the parser's handler function.
+ **/
+int fbnic_tlv_msg_parse(void *opaque, struct fbnic_tlv_msg *msg,
+			const struct fbnic_tlv_parser *parser)
+{
+	struct fbnic_tlv_msg *results[FBNIC_TLV_RESULTS_MAX];
+	u16 msg_id = msg->hdr.type;
+	int err;
+
+	if (!msg->hdr.is_msg)
+		return -EINVAL;
+
+	if (le16_to_cpu(msg->hdr.len) > PAGE_SIZE / sizeof(u32))
+		return -E2BIG;
+
+	while (parser->id != msg_id) {
+		if (parser->id == FBNIC_TLV_MSG_ID_UNKNOWN)
+			return -ENOENT;
+		parser++;
+	}
+
+	err = fbnic_tlv_attr_parse(&msg[1], le16_to_cpu(msg->hdr.len) - 1,
+				   results, parser->attr);
+	if (err)
+		return err;
+
+	return parser->func(opaque, results);
+}
+
+/**
+ *  fbnic_tlv_parser_error - called if message doesn't match known type
+ *  @opaque: (unused)
+ *  @results: (unused)
+ *
+ *  Returns -EBADMSG to indicate the message is an unsupported type
+ **/
+int fbnic_tlv_parser_error(void *opaque, struct fbnic_tlv_msg **results)
+{
+	return -EBADMSG;
+}
+
+void fbnic_tlv_attr_addr_copy(u8 *dest, struct fbnic_tlv_msg *src)
+{
+	u8 *mac_addr;
+
+	mac_addr = fbnic_tlv_attr_get_value_ptr(src);
+	memcpy(dest, mac_addr, ETH_ALEN);
+}
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_tlv.h b/drivers/net/ethernet/meta/fbnic/fbnic_tlv.h
new file mode 100644
index 000000000000..67300ab44353
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_tlv.h
@@ -0,0 +1,175 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#ifndef _FBNIC_TLV_H_
+#define _FBNIC_TLV_H_
+
+#include <asm/byteorder.h>
+#include <linux/bits.h>
+#include <linux/const.h>
+#include <linux/types.h>
+
+#define FBNIC_TLV_MSG_ALIGN(len)	ALIGN(len, sizeof(u32))
+#define FBNIC_TLV_MSG_SIZE(len)		\
+		(FBNIC_TLV_MSG_ALIGN(len) / sizeof(u32))
+
+/* TLV Header Format
+ *    3			  2		      1
+ *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |		Length		   |M|I|RSV|	   Type / ID	   |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * The TLV header format described above will be used for transferring
+ * messages between the host and the firmware. To ensure byte ordering
+ * we have defined all fields as being little endian.
+ * Type/ID: Identifier for message and/or attribute
+ * RSV: Reserved field for future use, likely as additional flags
+ * I: cannot_ignore flag, identifies if unrecognized attribute can be ignored
+ * M: is_msg, indicates that this is the start of a new message
+ * Length: Total length of message in dwords including header
+ *		or
+ *	   Total length of attribute in bytes including header
+ */
+struct fbnic_tlv_hdr {
+#if defined(__LITTLE_ENDIAN_BITFIELD)
+	u16 type		: 12; /* 0 .. 11  Type / ID */
+	u16 rsvd		: 2;  /* 12 .. 13 Reserved for future use */
+	u16 cannot_ignore	: 1;  /* 14	  Attribute cannot be ignored */
+	u16 is_msg		: 1;  /* 15	  Header belongs to message */
+#elif defined(__BIG_ENDIAN_BITFIELD)
+	u16 is_msg		: 1;  /* 15	  Header belongs to message */
+	u16 cannot_ignore	: 1;  /* 14	  Attribute cannot be ignored */
+	u16 rsvd		: 2;  /* 13 .. 12 Reserved for future use */
+	u16 type		: 12; /* 11 .. 0  Type / ID */
+#else
+#error "Missing defines from byteorder.h"
+#endif
+	__le16 len;		/* 16 .. 31	length including TLV header */
+};
+
+#define FBNIC_TLV_RESULTS_MAX		32
+
+struct fbnic_tlv_msg {
+	struct fbnic_tlv_hdr	hdr;
+	__le32			value[];
+};
+
+#define FBNIC_TLV_MSG_ID_UNKNOWN		USHRT_MAX
+
+enum fbnic_tlv_type {
+	FBNIC_TLV_STRING,
+	FBNIC_TLV_FLAG,
+	FBNIC_TLV_UNSIGNED,
+	FBNIC_TLV_SIGNED,
+	FBNIC_TLV_BINARY,
+	FBNIC_TLV_NESTED,
+	FBNIC_TLV_ARRAY,
+	__FBNIC_TLV_MAX_TYPE
+};
+
+/* TLV Index
+ * Defines the relationship between the attribute IDs and their types.
+ * For each entry in the index there will be a size and type associated
+ * with it so that we can use this to parse the data and verify it matches
+ * the expected layout.
+ */
+struct fbnic_tlv_index {
+	u16			id;
+	u16			len;
+	enum fbnic_tlv_type	type;
+};
+
+#define TLV_MAX_DATA			(PAGE_SIZE - 512)
+#define FBNIC_TLV_ATTR_ID_UNKNOWN	USHRT_MAX
+#define FBNIC_TLV_ATTR_STRING(id, len)	{ id, len, FBNIC_TLV_STRING }
+#define FBNIC_TLV_ATTR_FLAG(id)		{ id, 0, FBNIC_TLV_FLAG }
+#define FBNIC_TLV_ATTR_U32(id)		{ id, sizeof(u32), FBNIC_TLV_UNSIGNED }
+#define FBNIC_TLV_ATTR_U64(id)		{ id, sizeof(u64), FBNIC_TLV_UNSIGNED }
+#define FBNIC_TLV_ATTR_S32(id)		{ id, sizeof(s32), FBNIC_TLV_SIGNED }
+#define FBNIC_TLV_ATTR_S64(id)		{ id, sizeof(s64), FBNIC_TLV_SIGNED }
+#define FBNIC_TLV_ATTR_MAC_ADDR(id)	{ id, ETH_ALEN, FBNIC_TLV_BINARY }
+#define FBNIC_TLV_ATTR_NESTED(id)	{ id, 0, FBNIC_TLV_NESTED }
+#define FBNIC_TLV_ATTR_ARRAY(id)	{ id, 0, FBNIC_TLV_ARRAY }
+#define FBNIC_TLV_ATTR_RAW_DATA(id)	{ id, TLV_MAX_DATA, FBNIC_TLV_BINARY }
+#define FBNIC_TLV_ATTR_LAST		{ FBNIC_TLV_ATTR_ID_UNKNOWN, 0, 0 }
+
+struct fbnic_tlv_parser {
+	u16				id;
+	const struct fbnic_tlv_index	*attr;
+	int				(*func)(void *opaque,
+						struct fbnic_tlv_msg **results);
+};
+
+#define FBNIC_TLV_PARSER(id, attr, func) { FBNIC_TLV_MSG_ID_##id, attr, func }
+
+static inline void *
+fbnic_tlv_attr_get_value_ptr(struct fbnic_tlv_msg *attr)
+{
+	return (void *)&attr->value[0];
+}
+
+static inline bool fbnic_tlv_attr_get_bool(struct fbnic_tlv_msg *attr)
+{
+	return !!attr;
+}
+
+u64 fbnic_tlv_attr_get_unsigned(struct fbnic_tlv_msg *attr);
+s64 fbnic_tlv_attr_get_signed(struct fbnic_tlv_msg *attr);
+size_t fbnic_tlv_attr_get_string(struct fbnic_tlv_msg *attr, char *str,
+				 size_t max_size);
+
+#define get_unsigned_result(id, location) \
+do { \
+	struct fbnic_tlv_msg *result = results[id]; \
+	if (result) \
+		location = fbnic_tlv_attr_get_unsigned(result); \
+} while (0)
+
+#define get_signed_result(id, location) \
+do { \
+	struct fbnic_tlv_msg *result = results[id]; \
+	if (result) \
+		location = fbnic_tlv_attr_get_signed(result); \
+} while (0)
+
+#define get_string_result(id, size, str, max_size) \
+do { \
+	struct fbnic_tlv_msg *result = results[id]; \
+	if (result) \
+		size = fbnic_tlv_attr_get_string(result, str, max_size); \
+} while (0)
+
+#define get_bool(id) (!!(results[id]))
+
+struct fbnic_tlv_msg *fbnic_tlv_msg_alloc(u16 msg_id);
+int fbnic_tlv_attr_put_flag(struct fbnic_tlv_msg *msg, const u16 attr_id);
+int fbnic_tlv_attr_put_value(struct fbnic_tlv_msg *msg, const u16 attr_id,
+			     const void *value, const int len);
+int __fbnic_tlv_attr_put_int(struct fbnic_tlv_msg *msg, const u16 attr_id,
+			     s64 value, const int len);
+#define fbnic_tlv_attr_put_int(msg, attr_id, value) \
+	__fbnic_tlv_attr_put_int(msg, attr_id, value, \
+				 FBNIC_TLV_MSG_ALIGN(sizeof(value)))
+int fbnic_tlv_attr_put_mac_addr(struct fbnic_tlv_msg *msg, const u16 attr_id,
+				const u8 *mac_addr);
+int fbnic_tlv_attr_put_string(struct fbnic_tlv_msg *msg, u16 attr_id,
+			      const char *string);
+struct fbnic_tlv_msg *fbnic_tlv_attr_nest_start(struct fbnic_tlv_msg *msg,
+						u16 attr_id);
+void fbnic_tlv_attr_nest_stop(struct fbnic_tlv_msg *msg);
+void fbnic_tlv_attr_addr_copy(u8 *dest, struct fbnic_tlv_msg *src);
+int fbnic_tlv_attr_parse_array(struct fbnic_tlv_msg *attr, int len,
+			       struct fbnic_tlv_msg **results,
+			       const struct fbnic_tlv_index *tlv_index,
+			       u16 tlv_attr_id, size_t array_len);
+int fbnic_tlv_attr_parse(struct fbnic_tlv_msg *attr, int len,
+			 struct fbnic_tlv_msg **results,
+			 const struct fbnic_tlv_index *tlv_index);
+int fbnic_tlv_msg_parse(void *opaque, struct fbnic_tlv_msg *msg,
+			const struct fbnic_tlv_parser *parser);
+int fbnic_tlv_parser_error(void *opaque, struct fbnic_tlv_msg **results);
+
+#define FBNIC_TLV_MSG_ERROR \
+	FBNIC_TLV_PARSER(UNKNOWN, NULL, fbnic_tlv_parser_error)
+#endif /* _FBNIC_TLV_H_ */

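The 32-bit TLV header described in fbnic_tlv.h above carries the type in bits 0..11, the cannot_ignore and is_msg flags in bits 14 and 15, and the length in bits 16..31. A small standalone sketch of packing and unpacking that layout with plain masks (macro and helper names here are illustrative, not from the driver, and byte-order handling is omitted):

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian view of the 32-bit TLV header: bits 0..11 type,
 * 12..13 reserved, 14 cannot_ignore, 15 is_msg, 16..31 length. */
#define TLV_TYPE_MASK		0x0fffu
#define TLV_CANNOT_IGNORE	(1u << 14)
#define TLV_IS_MSG		(1u << 15)
#define TLV_LEN_SHIFT		16

static uint32_t tlv_hdr_pack(uint16_t type, int is_msg, uint16_t len)
{
	return (type & TLV_TYPE_MASK) |
	       (is_msg ? TLV_IS_MSG : 0) |
	       ((uint32_t)len << TLV_LEN_SHIFT);
}

static uint16_t tlv_hdr_type(uint32_t hdr)
{
	return hdr & TLV_TYPE_MASK;
}

static uint16_t tlv_hdr_len(uint32_t hdr)
{
	return (uint16_t)(hdr >> TLV_LEN_SHIFT);
}
```

Note the dual meaning of the length field called out in the patch: for a message header it counts DWORDs including the header, while for an attribute it counts bytes including the header.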


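The fbnic_tlv_attr_get_signed() helper in the patch above sign-extends a variable-width little-endian integer with a pair of shifts. The same idea in a standalone sketch (the `tlv_sext()` name is made up; the left shift is done on an unsigned type to stay well-defined, and the arithmetic right shift on int64_t is implementation-defined in ISO C but is what the kernel, gcc, and clang all guarantee):

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low nbytes of raw, mirroring the pair-of-shifts
 * trick in fbnic_tlv_attr_get_signed(): shift the sign bit up to
 * bit 63, then arithmetic-shift it back down. */
static int64_t tlv_sext(uint64_t raw, unsigned int nbytes)
{
	unsigned int shift = (8 - nbytes) * 8;

	return (int64_t)(raw << shift) >> shift;
}
```

For example, a 1-byte attribute value of 0xFF decodes as -1 rather than 255, and a 4-byte 0x80000000 decodes as INT32_MIN widened to 64 bits.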

* [net-next PATCH 06/15] eth: fbnic: add FW communication mechanism
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
                   ` (4 preceding siblings ...)
  2024-04-03 20:08 ` [net-next PATCH 05/15] eth: fbnic: add message parsing for FW messages Alexander Duyck
@ 2024-04-03 20:08 ` Alexander Duyck
  2024-04-03 20:08 ` [net-next PATCH 07/15] eth: fbnic: allocate a netdevice and napi vectors with queues Alexander Duyck
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 20:08 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Duyck, kuba, davem, pabeni

From: Alexander Duyck <alexanderduyck@fb.com>

Add a mechanism for sending messages to and receiving messages
from the FW. The FW has fairly limited functionality, so the
mechanism doesn't have to support a high message rate.

Use device mailbox registers to form two rings, one "to" and
one "from" the device. The rings are just a convention between
driver and FW, not a HW construct. We don't expect messages
larger than 4k so use page-sized buffers.
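Each ring entry is a 64-bit descriptor split across two mailbox registers, packed per the FBNIC_IPC_MBX_DESC_* layout defined in fbnic_csr.h later in this patch: length in bits 63:48, end-of-message in bit 46, DMA address in bits 45:3, and completion flags in bits 1:0. A rough sketch of packing such a descriptor (the `mbx_desc()` helper is illustrative, not a function in the driver):

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout from the FBNIC_IPC_MBX_DESC_* definitions. */
#define DESC_LEN_SHIFT	48
#define DESC_EOM	(1ULL << 46)
#define DESC_ADDR_MASK	0x00003ffffffffff8ULL	/* bits 45:3 */
#define DESC_HOST_CMPL	(1ULL << 0)

/* Pack one 64-bit IPC mailbox descriptor, marking it host-completed
 * (i.e. posted by the host side). */
static uint64_t mbx_desc(uint64_t dma_addr, uint16_t len, int eom)
{
	return ((uint64_t)len << DESC_LEN_SHIFT) |
	       (eom ? DESC_EOM : 0) |
	       (dma_addr & DESC_ADDR_MASK) |
	       DESC_HOST_CMPL;
}
```

The address mask keeping bits 45:3 reflects the 8-byte alignment of the page-sized DMA buffers the descriptors point at.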

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 drivers/net/ethernet/meta/fbnic/Makefile    |    1 
 drivers/net/ethernet/meta/fbnic/fbnic.h     |   18 +
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h |   79 ++++++
 drivers/net/ethernet/meta/fbnic/fbnic_fw.c  |  354 +++++++++++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_fw.h  |   26 ++
 drivers/net/ethernet/meta/fbnic/fbnic_irq.c |   79 ++++++
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c |   59 +++++
 7 files changed, 616 insertions(+)
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_fw.c
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_fw.h

diff --git a/drivers/net/ethernet/meta/fbnic/Makefile b/drivers/net/ethernet/meta/fbnic/Makefile
index 0434ee0b3069..7b63cd5b09d4 100644
--- a/drivers/net/ethernet/meta/fbnic/Makefile
+++ b/drivers/net/ethernet/meta/fbnic/Makefile
@@ -8,6 +8,7 @@
 obj-$(CONFIG_FBNIC) += fbnic.o
 
 fbnic-y := fbnic_devlink.o \
+	   fbnic_fw.o \
 	   fbnic_irq.o \
 	   fbnic_mac.o \
 	   fbnic_pci.o \
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic.h b/drivers/net/ethernet/meta/fbnic/fbnic.h
index 63ec82a830cd..28d7f8a880da 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic.h
@@ -7,6 +7,7 @@
 #include <linux/io.h>
 
 #include "fbnic_csr.h"
+#include "fbnic_fw.h"
 #include "fbnic_mac.h"
 
 struct fbnic_dev {
@@ -16,8 +17,13 @@ struct fbnic_dev {
 	u32 __iomem *uc_addr4;
 	const struct fbnic_mac *mac;
 	struct msix_entry *msix_entries;
+	unsigned int fw_msix_vector;
 	unsigned short num_irqs;
 
+	struct fbnic_fw_mbx mbx[FBNIC_IPC_MBX_INDICES];
+	/* Lock protecting Tx Mailbox queue to prevent possible races */
+	spinlock_t fw_tx_lock;
+
 	u64 dsn;
 	u32 mps;
 	u32 readrq;
@@ -29,6 +35,7 @@ struct fbnic_dev {
  * causes later.
  */
 enum {
+	FBNIC_FW_MSIX_ENTRY,
 	FBNIC_NON_NAPI_VECTORS
 };
 
@@ -62,6 +69,14 @@ fbnic_rmw32(struct fbnic_dev *fbd, u32 reg, u32 mask, u32 val)
 #define rd32(reg)	fbnic_rd32(fbd, reg)
 #define wrfl()		fbnic_rd32(fbd, FBNIC_MASTER_SPARE_0)
 
+bool fbnic_fw_present(struct fbnic_dev *fbd);
+u32 fbnic_fw_rd32(struct fbnic_dev *fbd, u32 reg);
+void fbnic_fw_wr32(struct fbnic_dev *fbd, u32 reg, u32 val);
+
+#define fw_rd32(reg)		fbnic_fw_rd32(fbd, reg)
+#define fw_wr32(reg, val)	fbnic_fw_wr32(fbd, reg, val)
+#define fw_wrfl()		fbnic_fw_rd32(fbd, FBNIC_FW_ZERO_REG)
+
 extern char fbnic_driver_name[];
 
 void fbnic_devlink_free(struct fbnic_dev *fbd);
@@ -69,6 +84,9 @@ struct fbnic_dev *fbnic_devlink_alloc(struct pci_dev *pdev);
 void fbnic_devlink_register(struct fbnic_dev *fbd);
 void fbnic_devlink_unregister(struct fbnic_dev *fbd);
 
+int fbnic_fw_enable_mbx(struct fbnic_dev *fbd);
+void fbnic_fw_disable_mbx(struct fbnic_dev *fbd);
+
 void fbnic_free_irqs(struct fbnic_dev *fbd);
 int fbnic_alloc_irqs(struct fbnic_dev *fbd);
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
index eb37e5981e69..6755af1c948f 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
@@ -9,10 +9,60 @@
 #define CSR_BIT(nr)		(1u << (nr))
 #define CSR_GENMASK(h, l)	GENMASK(h, l)
 
+#define DESC_BIT(nr)		BIT_ULL(nr)
+#define DESC_GENMASK(h, l)	GENMASK_ULL(h, l)
+
 #define PCI_DEVICE_ID_META_FBNIC_ASIC		0x0013
 
 #define FBNIC_CLOCK_FREQ	(600 * (1000 * 1000))
 
+/* Register Definitions
+ *
+ * The registers are laid out as indexes into an le32 array. As such the
+ * actual address is 4 times the index value. Below, each register is
+ * defined by three fields: name, index, and address.
+ *
+ *      Name				Index		Address
+ *************************************************************************/
+/* Interrupt Registers */
+#define FBNIC_CSR_START_INTR		0x00000	/* CSR section delimiter */
+#define FBNIC_INTR_STATUS(n)		(0x00000 + (n))	/* 0x00000 + 4*n */
+#define FBNIC_INTR_STATUS_CNT			8
+#define FBNIC_INTR_MASK(n)		(0x00008 + (n)) /* 0x00020 + 4*n */
+#define FBNIC_INTR_MASK_CNT			8
+#define FBNIC_INTR_SET(n)		(0x00010 + (n))	/* 0x00040 + 4*n */
+#define FBNIC_INTR_SET_CNT			8
+#define FBNIC_INTR_CLEAR(n)		(0x00018 + (n))	/* 0x00060 + 4*n */
+#define FBNIC_INTR_CLEAR_CNT			8
+#define FBNIC_INTR_SW_STATUS(n)		(0x00020 + (n)) /* 0x00080 + 4*n */
+#define FBNIC_INTR_SW_STATUS_CNT		8
+#define FBNIC_INTR_SW_AC_MODE(n)	(0x00028 + (n)) /* 0x000a0 + 4*n */
+#define FBNIC_INTR_SW_AC_MODE_CNT		8
+#define FBNIC_INTR_MASK_SET(n)		(0x00030 + (n)) /* 0x000c0 + 4*n */
+#define FBNIC_INTR_MASK_SET_CNT			8
+#define FBNIC_INTR_MASK_CLEAR(n)	(0x00038 + (n)) /* 0x000e0 + 4*n */
+#define FBNIC_INTR_MASK_CLEAR_CNT		8
+#define FBNIC_MAX_MSIX_VECS		256U
+#define FBNIC_INTR_MSIX_CTRL(n)		(0x00040 + (n)) /* 0x00100 + 4*n */
+#define FBNIC_INTR_MSIX_CTRL_VECTOR_MASK	CSR_GENMASK(7, 0)
+#define FBNIC_INTR_MSIX_CTRL_ENABLE		CSR_BIT(31)
+
+#define FBNIC_CSR_END_INTR		0x0005f	/* CSR section delimiter */
+
+/* Interrupt MSIX Registers */
+#define FBNIC_CSR_START_INTR_CQ		0x00400	/* CSR section delimiter */
+#define FBNIC_INTR_CQ_REARM(n) \
+				(0x00400 + ((n) * 4))	/* 0x01000 + 0x10*n */
+#define FBNIC_INTR_CQ_REARM_CNT			256
+#define FBNIC_INTR_CQ_REARM_RCQ_TIMEOUT		CSR_GENMASK(13, 0)
+#define FBNIC_INTR_CQ_REARM_RCQ_TIMEOUT_UPD_EN	CSR_BIT(14)
+#define FBNIC_INTR_CQ_REARM_TCQ_TIMEOUT		CSR_GENMASK(28, 15)
+#define FBNIC_INTR_CQ_REARM_TCQ_TIMEOUT_UPD_EN	CSR_BIT(29)
+#define FBNIC_INTR_CQ_REARM_INTR_RELOAD		CSR_BIT(30)
+#define FBNIC_INTR_CQ_REARM_INTR_UNMASK		CSR_BIT(31)
+
+#define FBNIC_CSR_END_INTR_CQ		0x007fe	/* CSR section delimiter */
+
 /* Global QM Tx registers */
 #define FBNIC_CSR_START_QM_TX		0x00800	/* CSR section delimiter */
 #define FBNIC_QM_TWQ_DEFAULT_META_L	0x00818		/* 0x02060 */
@@ -318,4 +368,33 @@ enum {
 
 #define FBNIC_MAX_QUEUES		128
 
+/* BAR 4 CSRs */
+
+/* The IPC mailbox consists of 32 mailboxes, with each mailbox consisting
+ * of 32 4-byte registers. We use 2 registers per descriptor, so the
+ * effective length of each mailbox is reduced to 16 descriptors.
+ *
+ * Currently we use an offset of 0x6000 on BAR4 for the mailbox so we just
+ * have to do the math and determine the offset based on the mailbox
+ * direction and index inside that mailbox.
+ */
+#define FBNIC_IPC_MBX_DESC_LEN	16
+#define FBNIC_IPC_MBX(mbx_idx, desc_idx)	\
+	((((mbx_idx) * FBNIC_IPC_MBX_DESC_LEN + (desc_idx)) * 2) + 0x6000)
+
+/* Use first register in mailbox to flush writes */
+#define FBNIC_FW_ZERO_REG	FBNIC_IPC_MBX(0, 0)
+
+enum {
+	FBNIC_IPC_MBX_RX_IDX,
+	FBNIC_IPC_MBX_TX_IDX,
+	FBNIC_IPC_MBX_INDICES,
+};
+
+#define FBNIC_IPC_MBX_DESC_LEN_MASK	DESC_GENMASK(63, 48)
+#define FBNIC_IPC_MBX_DESC_EOM		DESC_BIT(46)
+#define FBNIC_IPC_MBX_DESC_ADDR_MASK	DESC_GENMASK(45, 3)
+#define FBNIC_IPC_MBX_DESC_FW_CMPL	DESC_BIT(1)
+#define FBNIC_IPC_MBX_DESC_HOST_CMPL	DESC_BIT(0)
+
 #endif /* _FBNIC_CSR_H_ */
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_fw.c b/drivers/net/ethernet/meta/fbnic/fbnic_fw.c
new file mode 100644
index 000000000000..71647044aa23
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_fw.c
@@ -0,0 +1,354 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#include <linux/bitfield.h>
+#include <linux/delay.h>
+#include <linux/dev_printk.h>
+#include <linux/dma-mapping.h>
+#include <linux/gfp.h>
+#include <linux/types.h>
+
+#include "fbnic.h"
+#include "fbnic_tlv.h"
+
+static void __fbnic_mbx_wr_desc(struct fbnic_dev *fbd, int mbx_idx,
+				int desc_idx, u64 desc)
+{
+	fw_wr32(FBNIC_IPC_MBX(mbx_idx, desc_idx) + 1, upper_32_bits(desc));
+	fw_wrfl();
+	fw_wr32(FBNIC_IPC_MBX(mbx_idx, desc_idx), lower_32_bits(desc));
+}
+
+static u64 __fbnic_mbx_rd_desc(struct fbnic_dev *fbd, int mbx_idx, int desc_idx)
+{
+	u64 ret_val;
+
+	ret_val = fw_rd32(FBNIC_IPC_MBX(mbx_idx, desc_idx));
+	ret_val += (u64)fw_rd32(FBNIC_IPC_MBX(mbx_idx, desc_idx) + 1) << 32;
+
+	return ret_val;
+}
+
+static void fbnic_mbx_init_desc_ring(struct fbnic_dev *fbd, int mbx_idx)
+{
+	int desc_idx;
+
+	/* Initialize first descriptor to all 0s. Doing this gives us a
+	 * solid stop for the firmware to hit when it is done looping
+	 * through the ring.
+	 */
+	__fbnic_mbx_wr_desc(fbd, mbx_idx, 0, 0);
+
+	fw_wrfl();
+
+	/* We then fill the rest of the ring starting at the end and moving
+	 * back toward descriptor 0 with skip descriptors that have no
+	 * length or address, and tell the firmware that it can skip
+	 * them and just move past them to the one we initialized to 0.
+	 */
+	for (desc_idx = FBNIC_IPC_MBX_DESC_LEN; --desc_idx;) {
+		__fbnic_mbx_wr_desc(fbd, mbx_idx, desc_idx,
+				    FBNIC_IPC_MBX_DESC_FW_CMPL |
+				    FBNIC_IPC_MBX_DESC_HOST_CMPL);
+		fw_wrfl();
+	}
+}
+
+void fbnic_mbx_init(struct fbnic_dev *fbd)
+{
+	int i;
+
+	/* Initialize lock to protect Tx ring */
+	spin_lock_init(&fbd->fw_tx_lock);
+
+	/* reinitialize mailbox memory */
+	for (i = 0; i < FBNIC_IPC_MBX_INDICES; i++)
+		memset(&fbd->mbx[i], 0, sizeof(struct fbnic_fw_mbx));
+
+	/* Clear any stale causes in vector 0 as that is used for doorbell */
+	wr32(FBNIC_INTR_CLEAR(0), 1u << FBNIC_FW_MSIX_ENTRY);
+
+	for (i = 0; i < FBNIC_IPC_MBX_INDICES; i++)
+		fbnic_mbx_init_desc_ring(fbd, i);
+}
+
+static int fbnic_mbx_map_msg(struct fbnic_dev *fbd, int mbx_idx,
+			     struct fbnic_tlv_msg *msg, u16 length, u8 eom)
+{
+	struct fbnic_fw_mbx *mbx = &fbd->mbx[mbx_idx];
+	u8 tail = mbx->tail;
+	dma_addr_t addr;
+	int direction;
+
+	if (!mbx->ready || !fbnic_fw_present(fbd))
+		return -ENODEV;
+
+	direction = (mbx_idx == FBNIC_IPC_MBX_RX_IDX) ? DMA_FROM_DEVICE :
+							DMA_TO_DEVICE;
+
+	if (mbx->head == ((tail + 1) % FBNIC_IPC_MBX_DESC_LEN))
+		return -EBUSY;
+
+	addr = dma_map_single(fbd->dev, msg, PAGE_SIZE, direction);
+	if (dma_mapping_error(fbd->dev, addr)) {
+		free_page((unsigned long)msg);
+
+		return -ENOSPC;
+	}
+
+	mbx->buf_info[tail].msg = msg;
+	mbx->buf_info[tail].addr = addr;
+
+	mbx->tail = (tail + 1) % FBNIC_IPC_MBX_DESC_LEN;
+
+	fw_wr32(FBNIC_IPC_MBX(mbx_idx, mbx->tail), 0);
+
+	__fbnic_mbx_wr_desc(fbd, mbx_idx, tail,
+			    FIELD_PREP(FBNIC_IPC_MBX_DESC_LEN_MASK, length) |
+			    (addr & FBNIC_IPC_MBX_DESC_ADDR_MASK) |
+			    (eom ? FBNIC_IPC_MBX_DESC_EOM : 0) |
+			    FBNIC_IPC_MBX_DESC_HOST_CMPL);
+
+	return 0;
+}
+
+static void fbnic_mbx_unmap_and_free_msg(struct fbnic_dev *fbd, int mbx_idx,
+					 int desc_idx)
+{
+	struct fbnic_fw_mbx *mbx = &fbd->mbx[mbx_idx];
+	int direction;
+
+	if (!mbx->buf_info[desc_idx].msg)
+		return;
+
+	direction = (mbx_idx == FBNIC_IPC_MBX_RX_IDX) ? DMA_FROM_DEVICE :
+							DMA_TO_DEVICE;
+	dma_unmap_single(fbd->dev, mbx->buf_info[desc_idx].addr,
+			 PAGE_SIZE, direction);
+
+	free_page((unsigned long)mbx->buf_info[desc_idx].msg);
+	mbx->buf_info[desc_idx].msg = NULL;
+}
+
+static void fbnic_mbx_clean_desc_ring(struct fbnic_dev *fbd, int mbx_idx)
+{
+	int i;
+
+	fbnic_mbx_init_desc_ring(fbd, mbx_idx);
+
+	for (i = FBNIC_IPC_MBX_DESC_LEN; i--;)
+		fbnic_mbx_unmap_and_free_msg(fbd, mbx_idx, i);
+}
+
+void fbnic_mbx_clean(struct fbnic_dev *fbd)
+{
+	int i;
+
+	for (i = 0; i < FBNIC_IPC_MBX_INDICES; i++)
+		fbnic_mbx_clean_desc_ring(fbd, i);
+}
+
+#define FBNIC_MBX_MAX_PAGE_SIZE	FIELD_MAX(FBNIC_IPC_MBX_DESC_LEN_MASK)
+#define FBNIC_RX_PAGE_SIZE	min_t(int, PAGE_SIZE, FBNIC_MBX_MAX_PAGE_SIZE)
+
+static int fbnic_mbx_alloc_rx_msgs(struct fbnic_dev *fbd)
+{
+	struct fbnic_fw_mbx *rx_mbx = &fbd->mbx[FBNIC_IPC_MBX_RX_IDX];
+	u8 tail = rx_mbx->tail, head = rx_mbx->head, count;
+	int err = 0;
+
+	/* Do nothing if mailbox is not ready, or we already have pages on
+	 * the ring that can be used by the firmware
+	 */
+	if (!rx_mbx->ready)
+		return -ENODEV;
+
+	/* Fill all but one of the unused descriptors in the Rx queue. */
+	count = (head - tail - 1) % FBNIC_IPC_MBX_DESC_LEN;
+	while (!err && count--) {
+		struct fbnic_tlv_msg *msg;
+
+		msg = (struct fbnic_tlv_msg *)__get_free_page(GFP_ATOMIC |
+							      __GFP_NOWARN);
+		if (!msg) {
+			err = -ENOMEM;
+			break;
+		}
+
+		err = fbnic_mbx_map_msg(fbd, FBNIC_IPC_MBX_RX_IDX, msg,
+					FBNIC_RX_PAGE_SIZE, 0);
+		if (err)
+			free_page((unsigned long)msg);
+	}
+
+	return err;
+}
+
+static void fbnic_mbx_process_tx_msgs(struct fbnic_dev *fbd)
+{
+	struct fbnic_fw_mbx *tx_mbx = &fbd->mbx[FBNIC_IPC_MBX_TX_IDX];
+	u8 head = tx_mbx->head;
+	u64 desc;
+
+	while (head != tx_mbx->tail) {
+		desc = __fbnic_mbx_rd_desc(fbd, FBNIC_IPC_MBX_TX_IDX, head);
+		if (!(desc & FBNIC_IPC_MBX_DESC_FW_CMPL))
+			break;
+
+		fbnic_mbx_unmap_and_free_msg(fbd, FBNIC_IPC_MBX_TX_IDX, head);
+
+		head++;
+		head %= FBNIC_IPC_MBX_DESC_LEN;
+	}
+
+	/* record head for next interrupt */
+	tx_mbx->head = head;
+}
+
+static void fbnic_mbx_postinit_desc_ring(struct fbnic_dev *fbd, int mbx_idx)
+{
+	struct fbnic_fw_mbx *mbx = &fbd->mbx[mbx_idx];
+
+	/* This is a one-time init, so just exit if it is completed */
+	if (mbx->ready)
+		return;
+
+	mbx->ready = true;
+
+	switch (mbx_idx) {
+	case FBNIC_IPC_MBX_RX_IDX:
+		/* Make sure we have a page for the FW to write to */
+		fbnic_mbx_alloc_rx_msgs(fbd);
+		break;
+	}
+}
+
+static void fbnic_mbx_postinit(struct fbnic_dev *fbd)
+{
+	int i;
+
+	/* We only need to do this on the first interrupt following init.
+	 * This primes the mailbox so that we will have cleared all the
+	 * skip descriptors.
+	 */
+	if (!(rd32(FBNIC_INTR_STATUS(0)) & (1u << FBNIC_FW_MSIX_ENTRY)))
+		return;
+
+	wr32(FBNIC_INTR_CLEAR(0), 1u << FBNIC_FW_MSIX_ENTRY);
+
+	for (i = 0; i < FBNIC_IPC_MBX_INDICES; i++)
+		fbnic_mbx_postinit_desc_ring(fbd, i);
+}
+
+static const struct fbnic_tlv_parser fbnic_fw_tlv_parser[] = {
+	FBNIC_TLV_MSG_ERROR
+};
+
+static void fbnic_mbx_process_rx_msgs(struct fbnic_dev *fbd)
+{
+	struct fbnic_fw_mbx *rx_mbx = &fbd->mbx[FBNIC_IPC_MBX_RX_IDX];
+	u8 head = rx_mbx->head;
+	u64 desc, length;
+
+	while (head != rx_mbx->tail) {
+		struct fbnic_tlv_msg *msg;
+		int err;
+
+		desc = __fbnic_mbx_rd_desc(fbd, FBNIC_IPC_MBX_RX_IDX, head);
+		if (!(desc & FBNIC_IPC_MBX_DESC_FW_CMPL))
+			break;
+
+		dma_unmap_single(fbd->dev, rx_mbx->buf_info[head].addr,
+				 PAGE_SIZE, DMA_FROM_DEVICE);
+
+		msg = rx_mbx->buf_info[head].msg;
+
+		length = FIELD_GET(FBNIC_IPC_MBX_DESC_LEN_MASK, desc);
+
+		if (le16_to_cpu(msg->hdr.len) * sizeof(u32) > length)
+			dev_warn(fbd->dev, "Mailbox message length mismatch\n");
+
+		/* If parsing fails, dump contents of message to dmesg */
+		err = fbnic_tlv_msg_parse(fbd, msg, fbnic_fw_tlv_parser);
+		if (err) {
+			dev_warn(fbd->dev, "Unable to process message: %d\n",
+				 err);
+			print_hex_dump(KERN_WARNING, "fbnic:",
+				       DUMP_PREFIX_OFFSET, 16, 2,
+				       msg, length, true);
+		}
+
+		dev_dbg(fbd->dev, "Parsed msg type %d\n", msg->hdr.type);
+
+		free_page((unsigned long)rx_mbx->buf_info[head].msg);
+		rx_mbx->buf_info[head].msg = NULL;
+
+		head++;
+		head %= FBNIC_IPC_MBX_DESC_LEN;
+	}
+
+	/* record head for next interrupt */
+	rx_mbx->head = head;
+
+	/* Make sure we have at least one page for the FW to write to */
+	fbnic_mbx_alloc_rx_msgs(fbd);
+}
+
+void fbnic_mbx_poll(struct fbnic_dev *fbd)
+{
+	fbnic_mbx_postinit(fbd);
+
+	fbnic_mbx_process_tx_msgs(fbd);
+	fbnic_mbx_process_rx_msgs(fbd);
+}
+
+int fbnic_mbx_poll_tx_ready(struct fbnic_dev *fbd)
+{
+	struct fbnic_fw_mbx *tx_mbx;
+	int attempts = 50;
+
+	/* Immediate fail if BAR4 isn't there */
+	if (!fbnic_fw_present(fbd))
+		return -ENODEV;
+
+	tx_mbx = &fbd->mbx[FBNIC_IPC_MBX_TX_IDX];
+	while (!tx_mbx->ready && --attempts) {
+		msleep(200);
+		fbnic_mbx_poll(fbd);
+	}
+
+	return attempts ? 0 : -ETIMEDOUT;
+}
+
+void fbnic_mbx_flush_tx(struct fbnic_dev *fbd)
+{
+	struct fbnic_fw_mbx *tx_mbx;
+	int attempts = 50;
+	u8 count = 0;
+
+	/* Nothing to do if there is no mailbox */
+	if (!fbnic_fw_present(fbd))
+		return;
+
+	/* Record current Tx mailbox state */
+	tx_mbx = &fbd->mbx[FBNIC_IPC_MBX_TX_IDX];
+
+	/* Nothing to do if mailbox never got to ready */
+	if (!tx_mbx->ready)
+		return;
+
+	/* Give the firmware time to process the packets. We will wait up
+	 * to 10 seconds, which is 50 waits of 200ms each.
+	 */
+	do {
+		u8 head = tx_mbx->head;
+
+		if (head == tx_mbx->tail)
+			break;
+
+		msleep(200);
+		fbnic_mbx_process_tx_msgs(fbd);
+
+		count += (tx_mbx->head - head) % FBNIC_IPC_MBX_DESC_LEN;
+	} while (count < FBNIC_IPC_MBX_DESC_LEN && --attempts);
+}
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_fw.h b/drivers/net/ethernet/meta/fbnic/fbnic_fw.h
new file mode 100644
index 000000000000..c143079f881c
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_fw.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#ifndef _FBNIC_FW_H_
+#define _FBNIC_FW_H_
+
+#include <linux/types.h>
+
+struct fbnic_dev;
+struct fbnic_tlv_msg;
+
+struct fbnic_fw_mbx {
+	u8 ready, head, tail;
+	struct {
+		struct fbnic_tlv_msg	*msg;
+		dma_addr_t		addr;
+	} buf_info[FBNIC_IPC_MBX_DESC_LEN];
+};
+
+void fbnic_mbx_init(struct fbnic_dev *fbd);
+void fbnic_mbx_clean(struct fbnic_dev *fbd);
+void fbnic_mbx_poll(struct fbnic_dev *fbd);
+int fbnic_mbx_poll_tx_ready(struct fbnic_dev *fbd);
+void fbnic_mbx_flush_tx(struct fbnic_dev *fbd);
+
+#endif /* _FBNIC_FW_H_ */
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_irq.c b/drivers/net/ethernet/meta/fbnic/fbnic_irq.c
index d2fdc51704b9..d5b08910577c 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_irq.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_irq.c
@@ -6,10 +6,88 @@
 
 #include "fbnic.h"
 
+static irqreturn_t fbnic_fw_msix_intr(int __always_unused irq, void *data)
+{
+	struct fbnic_dev *fbd = (struct fbnic_dev *)data;
+
+	fbnic_mbx_poll(fbd);
+
+	wr32(FBNIC_INTR_MASK_CLEAR(0), 1u << FBNIC_FW_MSIX_ENTRY);
+
+	return IRQ_HANDLED;
+}
+
+/**
+ * fbnic_fw_enable_mbx - Configure and initialize Firmware Mailbox
+ * @fbd: Pointer to device to initialize
+ *
+ * This function will initialize the firmware mailbox rings, enable the IRQ
+ * and initialize the communication between the Firmware and the host. The
+ * firmware is expected to respond to the initialization by sending an
+ * interrupt essentially notifying the host that it has seen the
+ * initialization and is now synced up.
+ **/
+int fbnic_fw_enable_mbx(struct fbnic_dev *fbd)
+{
+	u32 vector;
+	int err;
+
+	vector = fbd->fw_msix_vector;
+
+	/* Request the IRQ for the FW mailbox vector.
+	 * Map the mailbox cause to it, and unmask it.
+	 */
+	err = request_threaded_irq(vector, NULL, &fbnic_fw_msix_intr, 0,
+				   dev_name(fbd->dev), fbd);
+	if (err)
+		return err;
+
+	/* Initialize mailbox and attempt to poll it into ready state */
+	fbnic_mbx_init(fbd);
+	err = fbnic_mbx_poll_tx_ready(fbd);
+	if (err)
+		dev_warn(fbd->dev, "FW mailbox did not enter ready state\n");
+
+	/* Enable interrupts */
+	wr32(FBNIC_INTR_SW_AC_MODE(0), ~(1u << FBNIC_FW_MSIX_ENTRY));
+	wr32(FBNIC_INTR_MASK_CLEAR(0), 1u << FBNIC_FW_MSIX_ENTRY);
+
+	return 0;
+}
+
+/**
+ * fbnic_fw_disable_mbx - Disable mailbox and place it in standby state
+ * @fbd: Pointer to device to disable
+ *
+ * This function will disable the mailbox interrupt, free any messages still
+ * in the mailbox and place it into a standby state. The firmware is
+ * expected to see the update and assume that the host is in the reset state.
+ **/
+void fbnic_fw_disable_mbx(struct fbnic_dev *fbd)
+{
+	/* Disable interrupt and free vector */
+	wr32(FBNIC_INTR_MASK_SET(0), 1u << FBNIC_FW_MSIX_ENTRY);
+
+	/* Re-enable auto-clear for the mailbox register */
+	wr32(FBNIC_INTR_SW_AC_MODE(0), ~0);
+
+	/* Free the vector */
+	free_irq(fbd->fw_msix_vector, fbd);
+
+	/* Make sure the message disabling logs is sent. This must be done
+	 * here to avoid the risk of completing without a running interrupt.
+	 */
+	fbnic_mbx_flush_tx(fbd);
+
+	/* Flush any remaining entries */
+	fbnic_mbx_clean(fbd);
+}
+
 void fbnic_free_irqs(struct fbnic_dev *fbd)
 {
 	struct pci_dev *pdev = to_pci_dev(fbd->dev);
 
+	fbd->fw_msix_vector = 0;
 	fbd->num_irqs = 0;
 
 	pci_disable_msix(pdev);
@@ -48,5 +126,6 @@ int fbnic_alloc_irqs(struct fbnic_dev *fbd)
 	fbd->msix_entries = msix_entries;
 	fbd->num_irqs = num_irqs;
 
+	fbd->fw_msix_vector = msix_entries[FBNIC_FW_MSIX_ENTRY].vector;
 	return 0;
 }
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
index c860516eb23a..1873fba9c6c2 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
@@ -58,6 +58,48 @@ u32 fbnic_rd32(struct fbnic_dev *fbd, u32 reg)
 	return ~0U;
 }
 
+bool fbnic_fw_present(struct fbnic_dev *fbd)
+{
+	return !!READ_ONCE(fbd->uc_addr4);
+}
+
+void fbnic_fw_wr32(struct fbnic_dev *fbd, u32 reg, u32 val)
+{
+	u32 __iomem *csr = READ_ONCE(fbd->uc_addr4);
+
+	if (csr)
+		writel(val, csr + reg);
+}
+
+u32 fbnic_fw_rd32(struct fbnic_dev *fbd, u32 reg)
+{
+	u32 __iomem *csr = READ_ONCE(fbd->uc_addr4);
+	u32 value;
+
+	if (!csr)
+		return ~0U;
+
+	value = readl(csr + reg);
+
+	/* If any bits are 0, the value should be valid */
+	if (~value)
+		return value;
+
+	/* All 1's may be valid if ZEROs register still works */
+	if (reg != FBNIC_FW_ZERO_REG && ~readl(csr + FBNIC_FW_ZERO_REG))
+		return value;
+
+	/* Hardware is giving us all 1's reads, assume it is gone */
+	WRITE_ONCE(fbd->uc_addr0, NULL);
+	WRITE_ONCE(fbd->uc_addr4, NULL);
+
+	dev_err(fbd->dev,
+		"Failed read (idx 0x%x AKA addr 0x%x), disabled CSR access, awaiting reset\n",
+		reg, reg << 2);
+
+	return ~0U;
+}
+
 /**
  *  fbnic_probe - Device Initialization Routine
  *  @pdev: PCI device information struct
@@ -123,6 +165,13 @@ static int fbnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto free_irqs;
 	}
 
+	err = fbnic_fw_enable_mbx(fbd);
+	if (err) {
+		dev_err(&pdev->dev,
+			"Firmware mailbox initialization failure\n");
+		goto free_irqs;
+	}
+
 	if (!fbd->dsn) {
 		dev_warn(&pdev->dev, "Reading serial number failed\n");
 		goto init_failure_mode;
@@ -159,6 +208,7 @@ static void fbnic_remove(struct pci_dev *pdev)
 {
 	struct fbnic_dev *fbd = pci_get_drvdata(pdev);
 
+	fbnic_fw_disable_mbx(fbd);
 	fbnic_free_irqs(fbd);
 
 	fbnic_devlink_unregister(fbd);
@@ -169,6 +219,8 @@ static int fbnic_pm_suspend(struct device *dev)
 {
 	struct fbnic_dev *fbd = dev_get_drvdata(dev);
 
+	fbnic_fw_disable_mbx(fbd);
+
 	/* Free the IRQs so they aren't trying to occupy sleeping CPUs */
 	fbnic_free_irqs(fbd);
 
@@ -197,7 +249,14 @@ static int __fbnic_pm_resume(struct device *dev)
 
 	fbd->mac->init_regs(fbd);
 
+	/* Reenable mailbox */
+	err = fbnic_fw_enable_mbx(fbd);
+	if (err)
+		goto err_free_irqs;
+
 	return 0;
+err_free_irqs:
+	fbnic_free_irqs(fbd);
 err_invalidate_uc_addr:
 	WRITE_ONCE(fbd->uc_addr0, NULL);
 	WRITE_ONCE(fbd->uc_addr4, NULL);



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [net-next PATCH 07/15] eth: fbnic: allocate a netdevice and napi vectors with queues
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
                   ` (5 preceding siblings ...)
  2024-04-03 20:08 ` [net-next PATCH 06/15] eth: fbnic: add FW communication mechanism Alexander Duyck
@ 2024-04-03 20:08 ` Alexander Duyck
  2024-04-03 20:58   ` Andrew Lunn
  2024-04-03 20:08 ` [net-next PATCH 08/15] eth: fbnic: implement Tx queue alloc/start/stop/free Alexander Duyck
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 20:08 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Duyck, kuba, davem, pabeni

From: Alexander Duyck <alexanderduyck@fb.com>

Allocate a netdev and figure out basics like how many queues
we need, MAC address, MTU bounds. Kick off a service task
to do various periodic things like health checking.
The service task only runs when the device is open.

We have four levels of objects here:

 - ring - a HW ring with head / tail pointers,
 - triad - two submission rings and one completion ring,
 - NAPI - one IRQ with any number of Rx and Tx triads,
 - netdev - the ultimate container of the rings and NAPI vectors.

The "triad" is the only less-than-usual construct. On Rx we have
two "free buffer" submission rings, one for packet headers and
one for packet data. On Tx we have separate rings for XDP Tx
and normal Tx. So we ended up with ring triplets in both
directions.

We keep NAPIs on a local list, even though the core already maintains one.
Having a separate list will matter later for live reconfig, so we
introduce it now; adding it later would not be worth the churn.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 drivers/net/ethernet/meta/fbnic/Makefile       |    4 
 drivers/net/ethernet/meta/fbnic/fbnic.h        |   13 +
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h    |   26 ++
 drivers/net/ethernet/meta/fbnic/fbnic_irq.c    |    2 
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c |  180 ++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.h |   37 +++
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c    |  141 +++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_txrx.c   |  268 ++++++++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_txrx.h   |   55 +++++
 9 files changed, 724 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_txrx.h

diff --git a/drivers/net/ethernet/meta/fbnic/Makefile b/drivers/net/ethernet/meta/fbnic/Makefile
index 7b63cd5b09d4..f2ea90e0c14f 100644
--- a/drivers/net/ethernet/meta/fbnic/Makefile
+++ b/drivers/net/ethernet/meta/fbnic/Makefile
@@ -11,5 +11,7 @@ fbnic-y := fbnic_devlink.o \
 	   fbnic_fw.o \
 	   fbnic_irq.o \
 	   fbnic_mac.o \
+	   fbnic_netdev.o \
 	   fbnic_pci.o \
-	   fbnic_tlv.o
+	   fbnic_tlv.o \
+	   fbnic_txrx.o
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic.h b/drivers/net/ethernet/meta/fbnic/fbnic.h
index 28d7f8a880da..92a36959547c 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic.h
@@ -5,6 +5,7 @@
 #define _FBNIC_H_
 
 #include <linux/io.h>
+#include <linux/workqueue.h>
 
 #include "fbnic_csr.h"
 #include "fbnic_fw.h"
@@ -12,6 +13,7 @@
 
 struct fbnic_dev {
 	struct device *dev;
+	struct net_device *netdev;
 
 	u32 __iomem *uc_addr0;
 	u32 __iomem *uc_addr4;
@@ -20,6 +22,8 @@ struct fbnic_dev {
 	unsigned int fw_msix_vector;
 	unsigned short num_irqs;
 
+	struct delayed_work service_task;
+
 	struct fbnic_fw_mbx mbx[FBNIC_IPC_MBX_INDICES];
 	/* Lock protecting Tx Mailbox queue to prevent possible races */
 	spinlock_t fw_tx_lock;
@@ -27,6 +31,9 @@ struct fbnic_dev {
 	u64 dsn;
 	u32 mps;
 	u32 readrq;
+
+	/* Number of TCQs/RCQs available on hardware */
+	u16 max_num_queues;
 };
 
 /* Reserve entry 0 in the MSI-X "others" array until we have filled all
@@ -77,6 +84,11 @@ void fbnic_fw_wr32(struct fbnic_dev *fbd, u32 reg, u32 val);
 #define fw_wr32(reg, val)	fbnic_fw_wr32(fbd, reg, val)
 #define fw_wrfl()		fbnic_fw_rd32(fbd, FBNIC_FW_ZERO_REG)
 
+static inline bool fbnic_init_failure(struct fbnic_dev *fbd)
+{
+	return !fbd->netdev;
+}
+
 extern char fbnic_driver_name[];
 
 void fbnic_devlink_free(struct fbnic_dev *fbd);
@@ -95,6 +107,7 @@ enum fbnic_boards {
 };
 
 struct fbnic_info {
+	unsigned int max_num_queues;
 	unsigned int bar_mask;
 };
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
index 6755af1c948f..39980974e21f 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
@@ -366,7 +366,33 @@ enum {
 #define FBNIC_PUL_OB_TLP_HDR_AR_CFG_BME		CSR_BIT(18)
 #define FBNIC_CSR_END_PUL_USER	0x31080	/* CSR section delimiter */
 
+/* Queue Registers
+ *
+ * The queue register offsets are specific for a given queue grouping. So to
+ * find the actual register offset it is necessary to combine FBNIC_QUEUE(n)
+ * with the register to get the actual register offset like so:
+ *   FBNIC_QUEUE_TWQ0_CTL(n) == FBNIC_QUEUE(n) + FBNIC_QUEUE_TWQ0_CTL
+ */
+#define FBNIC_CSR_START_QUEUE		0x40000	/* CSR section delimiter */
+#define FBNIC_QUEUE_STRIDE		0x400		/* 0x1000 */
+#define FBNIC_QUEUE(n)\
+	(0x40000 + FBNIC_QUEUE_STRIDE * (n))	/* 0x100000 + 0x1000*n */
+
+#define FBNIC_QUEUE_TWQ0_TAIL		0x002		/* 0x008 */
+#define FBNIC_QUEUE_TWQ1_TAIL		0x003		/* 0x00c */
+
+/* Tx Completion Queue Registers */
+#define FBNIC_QUEUE_TCQ_HEAD		0x081		/* 0x204 */
+
+/* Rx Completion Queue Registers */
+#define FBNIC_QUEUE_RCQ_HEAD		0x201		/* 0x804 */
+
+/* Rx Buffer Descriptor Queue Registers */
+#define FBNIC_QUEUE_BDQ_HPQ_TAIL	0x241		/* 0x904 */
+#define FBNIC_QUEUE_BDQ_PPQ_TAIL	0x242		/* 0x908 */
+
 #define FBNIC_MAX_QUEUES		128
+#define FBNIC_CSR_END_QUEUE	(0x40000 + 0x400 * FBNIC_MAX_QUEUES - 1)
 
 /* BAR 4 CSRs */
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_irq.c b/drivers/net/ethernet/meta/fbnic/fbnic_irq.c
index d5b08910577c..a20070683f48 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_irq.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_irq.c
@@ -5,6 +5,7 @@
 #include <linux/types.h>
 
 #include "fbnic.h"
+#include "fbnic_txrx.h"
 
 static irqreturn_t fbnic_fw_msix_intr(int __always_unused irq, void *data)
 {
@@ -103,6 +104,7 @@ int fbnic_alloc_irqs(struct fbnic_dev *fbd)
 	struct msix_entry *msix_entries;
 	int i, num_irqs;
 
+	wanted_irqs += min_t(unsigned int, num_online_cpus(), FBNIC_MAX_RXQS);
 	msix_entries = kcalloc(wanted_irqs, sizeof(*msix_entries), GFP_KERNEL);
 	if (!msix_entries)
 		return -ENOMEM;
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
new file mode 100644
index 000000000000..82c21bcb9c3f
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#include <linux/etherdevice.h>
+#include <linux/ipv6.h>
+#include <linux/types.h>
+
+#include "fbnic.h"
+#include "fbnic_netdev.h"
+#include "fbnic_txrx.h"
+
+int __fbnic_open(struct fbnic_net *fbn)
+{
+	int err;
+
+	err = fbnic_alloc_napi_vectors(fbn);
+	if (err)
+		return err;
+
+	err = netif_set_real_num_tx_queues(fbn->netdev,
+					   fbn->num_tx_queues);
+	if (err)
+		goto free_resources;
+
+	err = netif_set_real_num_rx_queues(fbn->netdev,
+					   fbn->num_rx_queues);
+	if (err)
+		goto free_resources;
+
+	return 0;
+free_resources:
+	fbnic_free_napi_vectors(fbn);
+	return err;
+}
+
+static int fbnic_open(struct net_device *netdev)
+{
+	struct fbnic_net *fbn = netdev_priv(netdev);
+	int err;
+
+	err = __fbnic_open(fbn);
+	if (!err)
+		fbnic_up(fbn);
+
+	return err;
+}
+
+static int fbnic_stop(struct net_device *netdev)
+{
+	struct fbnic_net *fbn = netdev_priv(netdev);
+
+	fbnic_down(fbn);
+
+	fbnic_free_napi_vectors(fbn);
+
+	return 0;
+}
+
+static const struct net_device_ops fbnic_netdev_ops = {
+	.ndo_open		= fbnic_open,
+	.ndo_stop		= fbnic_stop,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_start_xmit		= fbnic_xmit_frame,
+};
+
+void fbnic_reset_queues(struct fbnic_net *fbn,
+			unsigned int tx, unsigned int rx)
+{
+	struct fbnic_dev *fbd = fbn->fbd;
+	unsigned int max_napis;
+
+	max_napis = fbd->num_irqs - FBNIC_NON_NAPI_VECTORS;
+
+	tx = min(tx, max_napis);
+	fbn->num_tx_queues = tx;
+
+	rx = min(rx, max_napis);
+	fbn->num_rx_queues = rx;
+
+	fbn->num_napi = max(tx, rx);
+}
+
+/**
+ * fbnic_netdev_alloc - Allocate a netdev and associate with fbnic
+ * @fbd: Driver specific structure to associate netdev with
+ *
+ * Allocate and initialize the netdev and netdev private structure. Bind
+ * together the hardware, netdev, and pci data structures.
+ **/
+struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd)
+{
+	struct net_device *netdev;
+	struct fbnic_net *fbn;
+	int default_queues;
+
+	netdev = alloc_etherdev_mq(sizeof(*fbn), FBNIC_MAX_RXQS);
+	if (!netdev)
+		return NULL;
+
+	SET_NETDEV_DEV(netdev, fbd->dev);
+	fbd->netdev = netdev;
+
+	netdev->netdev_ops = &fbnic_netdev_ops;
+
+	fbn = netdev_priv(netdev);
+	fbn->netdev = netdev;
+	fbn->fbd = fbd;
+	INIT_LIST_HEAD(&fbn->napis);
+
+	default_queues = netif_get_num_default_rss_queues();
+	if (default_queues > fbd->max_num_queues)
+		default_queues = fbd->max_num_queues;
+
+	fbnic_reset_queues(fbn, default_queues, default_queues);
+
+	netdev->min_mtu = IPV6_MIN_MTU;
+	netdev->max_mtu = FBNIC_MAX_JUMBO_FRAME_SIZE - ETH_HLEN;
+
+	netif_carrier_off(netdev);
+
+	netif_tx_stop_all_queues(netdev);
+
+	return netdev;
+}
+
+/**
+ * fbnic_netdev_free - Free the netdev associated with fbnic
+ * @fbd: Driver specific structure to free netdev from
+ *
+ * Free the netdev and drop the driver's reference to it, undoing the
+ * binding done in fbnic_netdev_alloc.
+ **/
+void fbnic_netdev_free(struct fbnic_dev *fbd)
+{
+	free_netdev(fbd->netdev);
+	fbd->netdev = NULL;
+}
+
+static int fbnic_dsn_to_mac_addr(u64 dsn, char *addr)
+{
+	addr[0] = (dsn >> 56) & 0xFF;
+	addr[1] = (dsn >> 48) & 0xFF;
+	addr[2] = (dsn >> 40) & 0xFF;
+	addr[3] = (dsn >> 16) & 0xFF;
+	addr[4] = (dsn >> 8) & 0xFF;
+	addr[5] = dsn & 0xFF;
+
+	return is_valid_ether_addr(addr) ? 0 : -EINVAL;
+}
+
+/**
+ * fbnic_netdev_register - Initialize general software structures
+ * @netdev: Netdev containing structure to initialize and register
+ *
+ * Initialize the MAC address for the netdev and register it.
+ **/
+int fbnic_netdev_register(struct net_device *netdev)
+{
+	struct fbnic_net *fbn = netdev_priv(netdev);
+	struct fbnic_dev *fbd = fbn->fbd;
+	u64 dsn = fbd->dsn;
+	u8 addr[ETH_ALEN];
+	int err;
+
+	err = fbnic_dsn_to_mac_addr(dsn, addr);
+	if (!err) {
+		ether_addr_copy(netdev->perm_addr, addr);
+		eth_hw_addr_set(netdev, addr);
+	} else {
+		dev_err(fbd->dev, "MAC addr %pM invalid\n", addr);
+		return err;
+	}
+
+	return register_netdev(netdev);
+}
+
+void fbnic_netdev_unregister(struct net_device *netdev)
+{
+	unregister_netdev(netdev);
+}
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
new file mode 100644
index 000000000000..8d12abe5fb57
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#ifndef _FBNIC_NETDEV_H_
+#define _FBNIC_NETDEV_H_
+
+#include <linux/types.h>
+
+#include "fbnic_txrx.h"
+
+struct fbnic_net {
+	struct fbnic_ring *tx[FBNIC_MAX_TXQS];
+	struct fbnic_ring *rx[FBNIC_MAX_RXQS];
+
+	struct net_device *netdev;
+	struct fbnic_dev *fbd;
+
+	u16 num_napi;
+
+	u16 num_tx_queues;
+	u16 num_rx_queues;
+
+	struct list_head napis;
+};
+
+int __fbnic_open(struct fbnic_net *fbn);
+void fbnic_up(struct fbnic_net *fbn);
+void fbnic_down(struct fbnic_net *fbn);
+
+struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd);
+void fbnic_netdev_free(struct fbnic_dev *fbd);
+int fbnic_netdev_register(struct net_device *netdev);
+void fbnic_netdev_unregister(struct net_device *netdev);
+void fbnic_reset_queues(struct fbnic_net *fbn,
+			unsigned int tx, unsigned int rx);
+
+#endif /* _FBNIC_NETDEV_H_ */
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
index 1873fba9c6c2..67f6112ec5f0 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
@@ -4,10 +4,12 @@
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/pci.h>
+#include <linux/rtnetlink.h>
 #include <linux/types.h>
 
 #include "fbnic.h"
 #include "fbnic_drvinfo.h"
+#include "fbnic_netdev.h"
 
 char fbnic_driver_name[] = DRV_NAME;
 
@@ -15,6 +17,7 @@ MODULE_DESCRIPTION(DRV_SUMMARY);
 MODULE_LICENSE("GPL");
 
 static const struct fbnic_info fbnic_asic_info = {
+	.max_num_queues = FBNIC_MAX_QUEUES,
 	.bar_mask = BIT(0) | BIT(4)
 };
 
@@ -55,6 +58,10 @@ u32 fbnic_rd32(struct fbnic_dev *fbd, u32 reg)
 		"Failed read (idx 0x%x AKA addr 0x%x), disabled CSR access, awaiting reset\n",
 		reg, reg << 2);
 
+	/* Notify stack that device has lost (PCIe) link */
+	if (!fbnic_init_failure(fbd))
+		netif_device_detach(fbd->netdev);
+
 	return ~0U;
 }
 
@@ -97,9 +104,56 @@ u32 fbnic_fw_rd32(struct fbnic_dev *fbd, u32 reg)
 		"Failed read (idx 0x%x AKA addr 0x%x), disabled CSR access, awaiting reset\n",
 		reg, reg << 2);
 
+	/* Notify stack that device has lost (PCIe) link */
+	if (!fbnic_init_failure(fbd))
+		netif_device_detach(fbd->netdev);
+
 	return ~0U;
 }
 
+static void fbnic_service_task_start(struct fbnic_net *fbn)
+{
+	struct fbnic_dev *fbd = fbn->fbd;
+
+	schedule_delayed_work(&fbd->service_task, HZ);
+}
+
+static void fbnic_service_task_stop(struct fbnic_net *fbn)
+{
+	struct net_device *netdev = fbn->netdev;
+	struct fbnic_dev *fbd = fbn->fbd;
+
+	cancel_delayed_work(&fbd->service_task);
+	netif_carrier_off(netdev);
+}
+
+void fbnic_up(struct fbnic_net *fbn)
+{
+	netif_tx_start_all_queues(fbn->netdev);
+
+	fbnic_service_task_start(fbn);
+}
+
+void fbnic_down(struct fbnic_net *fbn)
+{
+	fbnic_service_task_stop(fbn);
+
+	netif_tx_disable(fbn->netdev);
+}
+
+static void fbnic_service_task(struct work_struct *work)
+{
+	struct fbnic_dev *fbd = container_of(to_delayed_work(work),
+					     struct fbnic_dev, service_task);
+
+	rtnl_lock();
+
+	if (netif_running(fbd->netdev))
+		schedule_delayed_work(&fbd->service_task, HZ);
+
+	rtnl_unlock();
+}
+
 /**
  *  fbnic_probe - Device Initialization Routine
  *  @pdev: PCI device information struct
@@ -114,6 +168,7 @@ u32 fbnic_fw_rd32(struct fbnic_dev *fbd, u32 reg)
 static int fbnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	const struct fbnic_info *info = fbnic_info_tbl[ent->driver_data];
+	struct net_device *netdev;
 	struct fbnic_dev *fbd;
 	int err;
 
@@ -150,11 +205,16 @@ static int fbnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		return -ENOMEM;
 	}
 
+	/* populate driver with hardware-specific info and handlers */
+	fbd->max_num_queues = info->max_num_queues;
+
 	pci_set_master(pdev);
 	pci_save_state(pdev);
 
 	fbnic_devlink_register(fbd);
 
+	INIT_DELAYED_WORK(&fbd->service_task, fbnic_service_task);
+
 	err = fbnic_alloc_irqs(fbd);
 	if (err)
 		goto free_fbd;
@@ -177,8 +237,22 @@ static int fbnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto init_failure_mode;
 	}
 
+	netdev = fbnic_netdev_alloc(fbd);
+	if (!netdev) {
+		dev_err(&pdev->dev, "Netdev allocation failed\n");
+		goto init_failure_mode;
+	}
+
+	err = fbnic_netdev_register(netdev);
+	if (err) {
+		dev_err(&pdev->dev, "Netdev registration failed: %d\n", err);
+		goto ifm_free_netdev;
+	}
+
 	return 0;
 
+ifm_free_netdev:
+	fbnic_netdev_free(fbd);
 init_failure_mode:
 	dev_warn(&pdev->dev, "Probe error encountered, entering init failure mode. Normal networking functionality will not be available.\n");
 	 /* Always return 0 even on error so devlink is registered to allow
@@ -192,7 +266,6 @@ static int fbnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	fbnic_devlink_unregister(fbd);
 	fbnic_devlink_free(fbd);
-
 	return err;
 }
 
@@ -208,6 +281,14 @@ static void fbnic_remove(struct pci_dev *pdev)
 {
 	struct fbnic_dev *fbd = pci_get_drvdata(pdev);
 
+	if (!fbnic_init_failure(fbd)) {
+		struct net_device *netdev = fbd->netdev;
+
+		fbnic_netdev_unregister(netdev);
+		cancel_delayed_work_sync(&fbd->service_task);
+		fbnic_netdev_free(fbd);
+	}
+
 	fbnic_fw_disable_mbx(fbd);
 	fbnic_free_irqs(fbd);
 
@@ -218,7 +299,21 @@ static void fbnic_remove(struct pci_dev *pdev)
 static int fbnic_pm_suspend(struct device *dev)
 {
 	struct fbnic_dev *fbd = dev_get_drvdata(dev);
+	struct net_device *netdev = fbd->netdev;
 
+	if (fbnic_init_failure(fbd))
+		goto null_uc_addr;
+
+	rtnl_lock();
+
+	netif_device_detach(netdev);
+
+	if (netif_running(netdev))
+		netdev->netdev_ops->ndo_stop(netdev);
+
+	rtnl_unlock();
+
+null_uc_addr:
 	fbnic_fw_disable_mbx(fbd);
 
 	/* Free the IRQs so they aren't trying to occupy sleeping CPUs */
@@ -234,7 +329,9 @@ static int fbnic_pm_suspend(struct device *dev)
 static int __fbnic_pm_resume(struct device *dev)
 {
 	struct fbnic_dev *fbd = dev_get_drvdata(dev);
+	struct net_device *netdev = fbd->netdev;
 	void __iomem * const *iomap_table;
+	struct fbnic_net *fbn;
 	int err;
 
 	/* restore MMIO access */
@@ -254,7 +351,29 @@ static int __fbnic_pm_resume(struct device *dev)
 	if (err)
 		goto err_free_irqs;
 
+	/* no netdev means there isn't a network interface to bring up */
+	if (fbnic_init_failure(fbd))
+		return 0;
+
+	fbn = netdev_priv(netdev);
+
+	/* reset the queues if needed */
+	fbnic_reset_queues(fbn, fbn->num_tx_queues, fbn->num_rx_queues);
+
+	rtnl_lock();
+
+	if (netif_running(netdev)) {
+		err = __fbnic_open(fbn);
+		if (err)
+			goto err_disable_mbx;
+	}
+
+	rtnl_unlock();
+
 	return 0;
+err_disable_mbx:
+	rtnl_unlock();
+	fbnic_fw_disable_mbx(fbd);
 err_free_irqs:
 	fbnic_free_irqs(fbd);
 err_invalidate_uc_addr:
@@ -263,11 +382,30 @@ static int __fbnic_pm_resume(struct device *dev)
 	return err;
 }
 
+static void __fbnic_pm_attach(struct device *dev)
+{
+	struct fbnic_dev *fbd = dev_get_drvdata(dev);
+	struct net_device *netdev = fbd->netdev;
+	struct fbnic_net *fbn;
+
+	if (fbnic_init_failure(fbd))
+		return;
+
+	fbn = netdev_priv(netdev);
+
+	if (netif_running(netdev))
+		fbnic_up(fbn);
+
+	netif_device_attach(netdev);
+}
+
 static int __maybe_unused fbnic_pm_resume(struct device *dev)
 {
 	int err;
 
 	err = __fbnic_pm_resume(dev);
+	if (!err)
+		__fbnic_pm_attach(dev);
 
 	return err;
 }
@@ -316,6 +454,7 @@ static pci_ers_result_t fbnic_err_slot_reset(struct pci_dev *pdev)
 
 static void fbnic_err_resume(struct pci_dev *pdev)
 {
+	__fbnic_pm_attach(&pdev->dev);
 }
 
 static const struct pci_error_handlers fbnic_err_handler = {
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
new file mode 100644
index 000000000000..366386a9721a
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
@@ -0,0 +1,268 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#include <linux/pci.h>
+
+#include "fbnic_netdev.h"
+#include "fbnic_txrx.h"
+#include "fbnic.h"
+
+netdev_tx_t fbnic_xmit_frame(struct sk_buff *skb, struct net_device *dev)
+{
+	dev_kfree_skb_any(skb);
+	return NETDEV_TX_OK;
+}
+
+static int fbnic_poll(struct napi_struct *napi, int budget)
+{
+	return 0;
+}
+
+static irqreturn_t fbnic_msix_clean_rings(int __always_unused irq, void *data)
+{
+	struct fbnic_napi_vector *nv = data;
+
+	napi_schedule_irqoff(&nv->napi);
+
+	return IRQ_HANDLED;
+}
+
+static void fbnic_remove_tx_ring(struct fbnic_net *fbn,
+				 struct fbnic_ring *txr)
+{
+	if (!(txr->flags & FBNIC_RING_F_STATS))
+		return;
+
+	/* Remove pointer to the Tx ring */
+	WARN_ON(fbn->tx[txr->q_idx] && fbn->tx[txr->q_idx] != txr);
+	fbn->tx[txr->q_idx] = NULL;
+}
+
+static void fbnic_remove_rx_ring(struct fbnic_net *fbn,
+				 struct fbnic_ring *rxr)
+{
+	if (!(rxr->flags & FBNIC_RING_F_STATS))
+		return;
+
+	/* Remove pointer to the Rx ring */
+	WARN_ON(fbn->rx[rxr->q_idx] && fbn->rx[rxr->q_idx] != rxr);
+	fbn->rx[rxr->q_idx] = NULL;
+}
+
+static void fbnic_free_napi_vector(struct fbnic_net *fbn,
+				   struct fbnic_napi_vector *nv)
+{
+	struct fbnic_dev *fbd = nv->fbd;
+	u32 v_idx = nv->v_idx;
+	int i, j;
+
+	for (i = 0; i < nv->txt_count; i++) {
+		fbnic_remove_tx_ring(fbn, &nv->qt[i].sub0);
+		fbnic_remove_tx_ring(fbn, &nv->qt[i].cmpl);
+	}
+
+	for (j = 0; j < nv->rxt_count; j++, i++) {
+		fbnic_remove_rx_ring(fbn, &nv->qt[i].sub0);
+		fbnic_remove_rx_ring(fbn, &nv->qt[i].sub1);
+		fbnic_remove_rx_ring(fbn, &nv->qt[i].cmpl);
+	}
+
+	free_irq(fbd->msix_entries[v_idx].vector, nv);
+	netif_napi_del(&nv->napi);
+	list_del(&nv->napis);
+	kfree(nv);
+}
+
+void fbnic_free_napi_vectors(struct fbnic_net *fbn)
+{
+	struct fbnic_napi_vector *nv, *temp;
+
+	list_for_each_entry_safe(nv, temp, &fbn->napis, napis)
+		fbnic_free_napi_vector(fbn, nv);
+}
+
+static void fbnic_name_napi_vector(struct fbnic_napi_vector *nv)
+{
+	unsigned char *dev_name = nv->napi.dev->name;
+
+	if (!nv->rxt_count)
+		snprintf(nv->name, sizeof(nv->name), "%s-Tx-%u", dev_name,
+			 nv->v_idx - FBNIC_NON_NAPI_VECTORS);
+	else
+		snprintf(nv->name, sizeof(nv->name), "%s-TxRx-%u", dev_name,
+			 nv->v_idx - FBNIC_NON_NAPI_VECTORS);
+}
+
+static void fbnic_ring_init(struct fbnic_ring *ring, u32 __iomem *doorbell,
+			    int q_idx, u8 flags)
+{
+	ring->doorbell = doorbell;
+	ring->q_idx = q_idx;
+	ring->flags = flags;
+}
+
+static int fbnic_alloc_napi_vector(struct fbnic_dev *fbd, struct fbnic_net *fbn,
+				   unsigned int v_count, unsigned int v_idx,
+				   unsigned int txq_count, unsigned int txq_idx,
+				   unsigned int rxq_count, unsigned int rxq_idx)
+{
+	int txt_count = txq_count, rxt_count = rxq_count;
+	u32 __iomem *uc_addr = fbd->uc_addr0;
+	struct fbnic_napi_vector *nv;
+	struct fbnic_q_triad *qt;
+	int qt_count, err;
+	u32 __iomem *db;
+	u32 vector;
+
+	qt_count = txt_count + rxt_count;
+	if (!qt_count)
+		return -EINVAL;
+
+	/* If MMIO has already failed there are no rings to initialize */
+	if (!uc_addr)
+		return -EIO;
+
+	/* allocate NAPI vector and queue triads */
+	nv = kzalloc(struct_size(nv, qt, qt_count), GFP_KERNEL);
+	if (!nv)
+		return -ENOMEM;
+
+	/* record queue triad counts */
+	nv->txt_count = txt_count;
+	nv->rxt_count = rxt_count;
+
+	/* provide pointer back to fbnic and MSI-X vectors */
+	nv->fbd = fbd;
+	nv->v_idx = v_idx;
+
+	/* tie napi to netdev */
+	list_add(&nv->napis, &fbn->napis);
+	netif_napi_add(fbn->netdev, &nv->napi, fbnic_poll);
+
+	/* tie nv back to PCIe dev */
+	nv->dev = fbd->dev;
+
+	/* initialize vector name */
+	fbnic_name_napi_vector(nv);
+
+	/* request the IRQ for napi vector */
+	vector = fbd->msix_entries[v_idx].vector;
+	err = request_irq(vector, &fbnic_msix_clean_rings, IRQF_SHARED,
+			  nv->name, nv);
+	if (err)
+		goto napi_del;
+
+	/* Initialize queue triads */
+	qt = nv->qt;
+
+	while (txt_count) {
+		/* Configure Tx queue */
+		db = &uc_addr[FBNIC_QUEUE(txq_idx) + FBNIC_QUEUE_TWQ0_TAIL];
+
+		/* Assign Tx queue to netdev if applicable */
+		if (txq_count > 0) {
+			u8 flags = FBNIC_RING_F_CTX | FBNIC_RING_F_STATS;
+
+			fbnic_ring_init(&qt->sub0, db, txq_idx, flags);
+			fbn->tx[txq_idx] = &qt->sub0;
+			txq_count--;
+		} else {
+			fbnic_ring_init(&qt->sub0, db, 0,
+					FBNIC_RING_F_DISABLED);
+		}
+
+		/* Configure Tx completion queue */
+		db = &uc_addr[FBNIC_QUEUE(txq_idx) + FBNIC_QUEUE_TCQ_HEAD];
+		fbnic_ring_init(&qt->cmpl, db, 0, 0);
+
+		/* Update Tx queue index */
+		txt_count--;
+		txq_idx += v_count;
+
+		/* move to next queue triad */
+		qt++;
+	}
+
+	while (rxt_count) {
+		/* Configure header queue */
+		db = &uc_addr[FBNIC_QUEUE(rxq_idx) + FBNIC_QUEUE_BDQ_HPQ_TAIL];
+		fbnic_ring_init(&qt->sub0, db, 0, FBNIC_RING_F_CTX);
+
+		/* Configure payload queue */
+		db = &uc_addr[FBNIC_QUEUE(rxq_idx) + FBNIC_QUEUE_BDQ_PPQ_TAIL];
+		fbnic_ring_init(&qt->sub1, db, 0, FBNIC_RING_F_CTX);
+
+		/* Configure Rx completion queue */
+		db = &uc_addr[FBNIC_QUEUE(rxq_idx) + FBNIC_QUEUE_RCQ_HEAD];
+		fbnic_ring_init(&qt->cmpl, db, rxq_idx, FBNIC_RING_F_STATS);
+		fbn->rx[rxq_idx] = &qt->cmpl;
+
+		/* Update Rx queue index */
+		rxt_count--;
+		rxq_idx += v_count;
+
+		/* move to next queue triad */
+		qt++;
+	}
+
+	return 0;
+
+napi_del:
+	netif_napi_del(&nv->napi);
+	list_del(&nv->napis);
+	kfree(nv);
+	return err;
+}
+
+int fbnic_alloc_napi_vectors(struct fbnic_net *fbn)
+{
+	unsigned int txq_idx = 0, rxq_idx = 0, v_idx = FBNIC_NON_NAPI_VECTORS;
+	unsigned int num_tx = fbn->num_tx_queues;
+	unsigned int num_rx = fbn->num_rx_queues;
+	unsigned int num_napi = fbn->num_napi;
+	struct fbnic_dev *fbd = fbn->fbd;
+	int err;
+
+	/* Allocate 1 Tx queue per napi vector */
+	if (num_napi < FBNIC_MAX_TXQS && num_napi == num_tx + num_rx) {
+		while (num_tx) {
+			err = fbnic_alloc_napi_vector(fbd, fbn,
+						      num_napi, v_idx,
+						      1, txq_idx, 0, 0);
+			if (err)
+				goto free_vectors;
+
+			/* update counts and index */
+			num_tx--;
+			txq_idx++;
+
+			v_idx++;
+		}
+	}
+
+	/* Allocate Tx/Rx queue pairs per vector, or allocate remaining Rx */
+	while (num_rx | num_tx) {
+		int tqpv = DIV_ROUND_UP(num_tx, num_napi - txq_idx);
+		int rqpv = DIV_ROUND_UP(num_rx, num_napi - rxq_idx);
+
+		err = fbnic_alloc_napi_vector(fbd, fbn, num_napi, v_idx,
+					      tqpv, txq_idx, rqpv, rxq_idx);
+		if (err)
+			goto free_vectors;
+
+		/* update counts and index */
+		num_tx -= tqpv;
+		txq_idx++;
+
+		num_rx -= rqpv;
+		rxq_idx++;
+
+		v_idx++;
+	}
+
+	return 0;
+
+free_vectors:
+	fbnic_free_napi_vectors(fbn);
+	return -ENOMEM;
+}
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
new file mode 100644
index 000000000000..e7f1208a3543
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#ifndef _FBNIC_TXRX_H_
+#define _FBNIC_TXRX_H_
+
+#include <linux/netdevice.h>
+#include <linux/types.h>
+
+struct fbnic_net;
+
+#define FBNIC_MAX_TXQS			128u
+#define FBNIC_MAX_RXQS			128u
+
+#define FBNIC_RING_F_DISABLED		BIT(0)
+#define FBNIC_RING_F_CTX		BIT(1)
+#define FBNIC_RING_F_STATS		BIT(2)	/* ring's stats may be used */
+
+struct fbnic_ring {
+	u32 __iomem *doorbell;		/* pointer to CSR space for ring */
+	u16 size_mask;			/* size of ring in descriptors - 1 */
+	u8 q_idx;			/* logical netdev ring index */
+	u8 flags;			/* ring flags (FBNIC_RING_F_*) */
+
+	u32 head, tail;			/* head/tail of ring */
+};
+
+struct fbnic_q_triad {
+	struct fbnic_ring sub0, sub1, cmpl;
+};
+
+struct fbnic_napi_vector {
+	struct napi_struct napi;
+	struct device *dev;		/* Device for DMA unmapping */
+	struct fbnic_dev *fbd;
+	char name[IFNAMSIZ + 9];
+
+	u16 v_idx;
+	u8 txt_count;
+	u8 rxt_count;
+
+	struct list_head napis;
+
+	struct fbnic_q_triad qt[];
+};
+
+netdev_tx_t fbnic_xmit_frame(struct sk_buff *skb, struct net_device *dev);
+
+int fbnic_alloc_napi_vectors(struct fbnic_net *fbn);
+void fbnic_free_napi_vectors(struct fbnic_net *fbn);
+
+#endif /* _FBNIC_TXRX_H_ */




* [net-next PATCH 08/15] eth: fbnic: implement Tx queue alloc/start/stop/free
From: Alexander Duyck @ 2024-04-03 20:08 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Duyck, kuba, davem, pabeni

From: Alexander Duyck <alexanderduyck@fb.com>

Implement basic management operations for Tx queues.
Allocate memory for submission and completion rings.
Learn how to start the queues, stop them, and wait for HW
to be idle.

We call HW rings "descriptor rings" (stored in ring->desc),
and SW context rings "buffer rings" (stored in ring->*_buf union).

This is the first patch that actually touches CSRs, so add CSR
helpers.

No actual datapath / packet handling here, yet.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h    |   66 ++++
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c |    9 +
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.h |    2 
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c    |   19 +
 drivers/net/ethernet/meta/fbnic/fbnic_txrx.c   |  425 ++++++++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_txrx.h   |   22 +
 6 files changed, 542 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
index 39980974e21f..e50c2827590b 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
@@ -61,10 +61,18 @@
 #define FBNIC_INTR_CQ_REARM_INTR_RELOAD		CSR_BIT(30)
 #define FBNIC_INTR_CQ_REARM_INTR_UNMASK		CSR_BIT(31)
 
+#define FBNIC_INTR_RCQ_TIMEOUT(n) \
+				(0x00401 + ((n) * 4))	/* 0x01004 + 0x10*n */
+#define FBNIC_INTR_RCQ_TIMEOUT_CNT		256
+#define FBNIC_INTR_TCQ_TIMEOUT(n) \
+				(0x00402 + ((n) * 4))	/* 0x01008 + 0x10*n */
+#define FBNIC_INTR_TCQ_TIMEOUT_CNT		256
 #define FBNIC_CSR_END_INTR_CQ		0x007fe	/* CSR section delimiter */
 
 /* Global QM Tx registers */
 #define FBNIC_CSR_START_QM_TX		0x00800	/* CSR section delimiter */
+#define FBNIC_QM_TWQ_IDLE(n)		(0x00800 + (n)) /* 0x02000 + 0x4*n */
+#define FBNIC_QM_TWQ_IDLE_CNT			8
 #define FBNIC_QM_TWQ_DEFAULT_META_L	0x00818		/* 0x02060 */
 #define FBNIC_QM_TWQ_DEFAULT_META_H	0x00819		/* 0x02064 */
 
@@ -86,10 +94,16 @@ enum {
 #define FBNIC_QM_TQS_MTU_CTL0		0x0081d		/* 0x02074 */
 #define FBNIC_QM_TQS_MTU_CTL1		0x0081e		/* 0x02078 */
 #define FBNIC_QM_TQS_MTU_CTL1_BULK		CSR_GENMASK(13, 0)
+#define FBNIC_QM_TCQ_IDLE(n)		(0x00821 + (n)) /* 0x02084 + 0x4*n */
+#define FBNIC_QM_TCQ_IDLE_CNT			4
 #define FBNIC_QM_TCQ_CTL0		0x0082d		/* 0x020b4 */
 #define FBNIC_QM_TCQ_CTL0_COAL_WAIT		CSR_GENMASK(15, 0)
 #define FBNIC_QM_TCQ_CTL0_TICK_CYCLES		CSR_GENMASK(26, 16)
+#define FBNIC_QM_TQS_IDLE(n)		(0x00830 + (n)) /* 0x020c0 + 0x4*n */
+#define FBNIC_QM_TQS_IDLE_CNT			4
 #define FBNIC_QM_TQS_EDT_TS_RANGE	0x00849		/* 0x2124 */
+#define FBNIC_QM_TDE_IDLE(n)		(0x00853 + (n)) /* 0x0214c + 0x4*n */
+#define FBNIC_QM_TDE_IDLE_CNT			8
 #define FBNIC_QM_TNI_TDF_CTL		0x0086c		/* 0x021b0 */
 #define FBNIC_QM_TNI_TDF_CTL_MRRS		CSR_GENMASK(1, 0)
 #define FBNIC_QM_TNI_TDF_CTL_CLS		CSR_GENMASK(3, 2)
@@ -110,9 +124,15 @@ enum {
 
 /* Global QM Rx registers */
 #define FBNIC_CSR_START_QM_RX		0x00c00	/* CSR section delimiter */
+#define FBNIC_QM_RCQ_IDLE(n)		(0x00c00 + (n)) /* 0x03000 + 0x4*n */
+#define FBNIC_QM_RCQ_IDLE_CNT			4
 #define FBNIC_QM_RCQ_CTL0		0x00c0c		/* 0x03030 */
 #define FBNIC_QM_RCQ_CTL0_COAL_WAIT		CSR_GENMASK(15, 0)
 #define FBNIC_QM_RCQ_CTL0_TICK_CYCLES		CSR_GENMASK(26, 16)
+#define FBNIC_QM_HPQ_IDLE(n)		(0x00c0f + (n)) /* 0x0303c + 0x4*n */
+#define FBNIC_QM_HPQ_IDLE_CNT			4
+#define FBNIC_QM_PPQ_IDLE(n)		(0x00c13 + (n)) /* 0x0304c + 0x4*n */
+#define FBNIC_QM_PPQ_IDLE_CNT			4
 #define FBNIC_QM_RNI_RBP_CTL		0x00c2d		/* 0x030b4 */
 #define FBNIC_QM_RNI_RBP_CTL_MRRS		CSR_GENMASK(1, 0)
 #define FBNIC_QM_RNI_RBP_CTL_CLS		CSR_GENMASK(3, 2)
@@ -219,6 +239,8 @@ enum {
 /* TMI registers */
 #define FBNIC_CSR_START_TMI		0x04400	/* CSR section delimiter */
 #define FBNIC_TMI_SOP_PROT_CTRL		0x04400		/* 0x11000 */
+#define FBNIC_TMI_DROP_CTRL		0x04401		/* 0x11004 */
+#define FBNIC_TMI_DROP_CTRL_EN			CSR_BIT(0)
 #define FBNIC_CSR_END_TMI		0x0443f	/* CSR section delimiter */
 /* Rx Buffer Registers */
 #define FBNIC_CSR_START_RXB		0x08000	/* CSR section delimiter */
@@ -378,12 +400,56 @@ enum {
 #define FBNIC_QUEUE(n)\
 	(0x40000 + FBNIC_QUEUE_STRIDE * (n))	/* 0x100000 + 0x1000*n */
 
+#define FBNIC_QUEUE_TWQ0_CTL		0x000		/* 0x000 */
+#define FBNIC_QUEUE_TWQ1_CTL		0x001		/* 0x004 */
+#define FBNIC_QUEUE_TWQ_CTL_RESET		CSR_BIT(0)
+#define FBNIC_QUEUE_TWQ_CTL_ENABLE		CSR_BIT(1)
 #define FBNIC_QUEUE_TWQ0_TAIL		0x002		/* 0x008 */
 #define FBNIC_QUEUE_TWQ1_TAIL		0x003		/* 0x00c */
 
+#define FBNIC_QUEUE_TWQ0_SIZE		0x00a		/* 0x028 */
+#define FBNIC_QUEUE_TWQ1_SIZE		0x00b		/* 0x02c */
+#define FBNIC_QUEUE_TWQ_SIZE_MASK		CSR_GENMASK(3, 0)
+
+#define FBNIC_QUEUE_TWQ0_BAL		0x020		/* 0x080 */
+#define FBNIC_QUEUE_BAL_MASK			CSR_GENMASK(31, 7)
+#define FBNIC_QUEUE_TWQ0_BAH		0x021		/* 0x084 */
+#define FBNIC_QUEUE_TWQ1_BAL		0x022		/* 0x088 */
+#define FBNIC_QUEUE_TWQ1_BAH		0x023		/* 0x08c */
+
 /* Tx Completion Queue Registers */
+#define FBNIC_QUEUE_TCQ_CTL		0x080		/* 0x200 */
+#define FBNIC_QUEUE_TCQ_CTL_RESET		CSR_BIT(0)
+#define FBNIC_QUEUE_TCQ_CTL_ENABLE		CSR_BIT(1)
+
 #define FBNIC_QUEUE_TCQ_HEAD		0x081		/* 0x204 */
 
+#define FBNIC_QUEUE_TCQ_SIZE		0x084		/* 0x210 */
+#define FBNIC_QUEUE_TCQ_SIZE_MASK		CSR_GENMASK(3, 0)
+
+#define FBNIC_QUEUE_TCQ_BAL		0x0a0		/* 0x280 */
+#define FBNIC_QUEUE_TCQ_BAH		0x0a1		/* 0x284 */
+
+/* Tx Interrupt Manager Registers */
+#define FBNIC_QUEUE_TIM_CTL		0x0c0		/* 0x300 */
+#define FBNIC_QUEUE_TIM_CTL_MSIX_MASK		CSR_GENMASK(7, 0)
+
+#define FBNIC_QUEUE_TIM_THRESHOLD	0x0c1		/* 0x304 */
+#define FBNIC_QUEUE_TIM_THRESHOLD_TWD_MASK	CSR_GENMASK(14, 0)
+
+#define FBNIC_QUEUE_TIM_CLEAR		0x0c2		/* 0x308 */
+#define FBNIC_QUEUE_TIM_CLEAR_MASK		CSR_BIT(0)
+#define FBNIC_QUEUE_TIM_SET		0x0c3		/* 0x30c */
+#define FBNIC_QUEUE_TIM_SET_MASK		CSR_BIT(0)
+#define FBNIC_QUEUE_TIM_MASK		0x0c4		/* 0x310 */
+#define FBNIC_QUEUE_TIM_MASK_MASK		CSR_BIT(0)
+
+#define FBNIC_QUEUE_TIM_TIMER		0x0c5		/* 0x314 */
+
+#define FBNIC_QUEUE_TIM_COUNTS		0x0c6		/* 0x318 */
+#define FBNIC_QUEUE_TIM_COUNTS_CNT1_MASK	CSR_GENMASK(30, 16)
+#define FBNIC_QUEUE_TIM_COUNTS_CNT0_MASK	CSR_GENMASK(14, 0)
+
 /* Rx Completion Queue Registers */
 #define FBNIC_QUEUE_RCQ_HEAD		0x201		/* 0x804 */
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
index 82c21bcb9c3f..dce3827d4398 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
@@ -17,6 +17,10 @@ int __fbnic_open(struct fbnic_net *fbn)
 	if (err)
 		return err;
 
+	err = fbnic_alloc_resources(fbn);
+	if (err)
+		goto free_napi_vectors;
+
 	err = netif_set_real_num_tx_queues(fbn->netdev,
 					   fbn->num_tx_queues);
 	if (err)
@@ -29,6 +33,8 @@ int __fbnic_open(struct fbnic_net *fbn)
 
 	return 0;
 free_resources:
+	fbnic_free_resources(fbn);
+free_napi_vectors:
 	fbnic_free_napi_vectors(fbn);
 	return err;
 }
@@ -51,6 +57,7 @@ static int fbnic_stop(struct net_device *netdev)
 
 	fbnic_down(fbn);
 
+	fbnic_free_resources(fbn);
 	fbnic_free_napi_vectors(fbn);
 
 	return 0;
@@ -107,6 +114,8 @@ struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd)
 	fbn->fbd = fbd;
 	INIT_LIST_HEAD(&fbn->napis);
 
+	fbn->txq_size = FBNIC_TXQ_SIZE_DEFAULT;
+
 	default_queues = netif_get_num_default_rss_queues();
 	if (default_queues > fbd->max_num_queues)
 		default_queues = fbd->max_num_queues;
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
index 8d12abe5fb57..b3c39c10c3f7 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
@@ -15,6 +15,8 @@ struct fbnic_net {
 	struct net_device *netdev;
 	struct fbnic_dev *fbd;
 
+	u32 txq_size;
+
 	u16 num_napi;
 
 	u16 num_tx_queues;
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
index 67f6112ec5f0..12d7fbf22d27 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
@@ -129,16 +129,33 @@ static void fbnic_service_task_stop(struct fbnic_net *fbn)
 
 void fbnic_up(struct fbnic_net *fbn)
 {
+	fbnic_enable(fbn);
+
+	/* Enable Tx/Rx processing */
+	fbnic_napi_enable(fbn);
 	netif_tx_start_all_queues(fbn->netdev);
 
 	fbnic_service_task_start(fbn);
 }
 
-void fbnic_down(struct fbnic_net *fbn)
+static void fbnic_down_noidle(struct fbnic_net *fbn)
 {
 	fbnic_service_task_stop(fbn);
 
+	/* Disable Tx/Rx Processing */
+	fbnic_napi_disable(fbn);
 	netif_tx_disable(fbn->netdev);
+
+	fbnic_disable(fbn);
+}
+
+void fbnic_down(struct fbnic_net *fbn)
+{
+	fbnic_down_noidle(fbn);
+
+	fbnic_wait_all_queues_idle(fbn->fbd, false);
+
+	fbnic_flush(fbn);
 }
 
 static void fbnic_service_task(struct work_struct *work)
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
index 366386a9721a..dd05ed96d8fc 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
@@ -1,18 +1,50 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) Meta Platforms, Inc. and affiliates. */
 
+#include <linux/iopoll.h>
 #include <linux/pci.h>
 
 #include "fbnic_netdev.h"
 #include "fbnic_txrx.h"
 #include "fbnic.h"
 
+static u32 __iomem *fbnic_ring_csr_base(const struct fbnic_ring *ring)
+{
+	unsigned long csr_base = (unsigned long)ring->doorbell;
+
+	csr_base &= ~(FBNIC_QUEUE_STRIDE * sizeof(u32) - 1);
+
+	return (u32 __iomem *)csr_base;
+}
+
+static u32 fbnic_ring_rd32(struct fbnic_ring *ring, unsigned int csr)
+{
+	u32 __iomem *csr_base = fbnic_ring_csr_base(ring);
+
+	return readl(csr_base + csr);
+}
+
+static void fbnic_ring_wr32(struct fbnic_ring *ring, unsigned int csr, u32 val)
+{
+	u32 __iomem *csr_base = fbnic_ring_csr_base(ring);
+
+	writel(val, csr_base + csr);
+}
+
 netdev_tx_t fbnic_xmit_frame(struct sk_buff *skb, struct net_device *dev)
 {
 	dev_kfree_skb_any(skb);
 	return NETDEV_TX_OK;
 }
 
+static void fbnic_nv_irq_disable(struct fbnic_napi_vector *nv)
+{
+	struct fbnic_dev *fbd = nv->fbd;
+	u32 v_idx = nv->v_idx;
+
+	wr32(FBNIC_INTR_MASK_SET(v_idx / 32), 1 << (v_idx % 32));
+}
+
 static int fbnic_poll(struct napi_struct *napi, int budget)
 {
 	return 0;
@@ -266,3 +298,396 @@ int fbnic_alloc_napi_vectors(struct fbnic_net *fbn)
 	fbnic_free_napi_vectors(fbn);
 	return -ENOMEM;
 }
+
+static void fbnic_free_ring_resources(struct device *dev,
+				      struct fbnic_ring *ring)
+{
+	kvfree(ring->buffer);
+	ring->buffer = NULL;
+
+	/* If size is not set there are no descriptors present */
+	if (!ring->size)
+		return;
+
+	dma_free_coherent(dev, ring->size, ring->desc, ring->dma);
+	ring->size_mask = 0;
+	ring->size = 0;
+}
+
+static int fbnic_alloc_tx_ring_desc(struct fbnic_net *fbn,
+				    struct fbnic_ring *txr)
+{
+	struct device *dev = fbn->netdev->dev.parent;
+	size_t size;
+
+	/* round size up to nearest 4K */
+	size = ALIGN(array_size(sizeof(*txr->desc), fbn->txq_size), 4096);
+
+	txr->desc = dma_alloc_coherent(dev, size, &txr->dma,
+				       GFP_KERNEL | __GFP_NOWARN);
+	if (!txr->desc)
+		return -ENOMEM;
+
+	/* txq_size should be a power of 2, so mask is just that -1 */
+	txr->size_mask = fbn->txq_size - 1;
+	txr->size = size;
+
+	return 0;
+}
+
+static int fbnic_alloc_tx_ring_buffer(struct fbnic_ring *txr)
+{
+	size_t size = array_size(sizeof(*txr->tx_buf), txr->size_mask + 1);
+
+	txr->tx_buf = kvzalloc(size, GFP_KERNEL | __GFP_NOWARN);
+
+	return txr->tx_buf ? 0 : -ENOMEM;
+}
+
+static int fbnic_alloc_tx_ring_resources(struct fbnic_net *fbn,
+					 struct fbnic_ring *txr)
+{
+	struct device *dev = fbn->netdev->dev.parent;
+	int err;
+
+	if (txr->flags & FBNIC_RING_F_DISABLED)
+		return 0;
+
+	err = fbnic_alloc_tx_ring_desc(fbn, txr);
+	if (err)
+		return err;
+
+	if (!(txr->flags & FBNIC_RING_F_CTX))
+		return 0;
+
+	err = fbnic_alloc_tx_ring_buffer(txr);
+	if (err)
+		goto free_desc;
+
+	return 0;
+
+free_desc:
+	fbnic_free_ring_resources(dev, txr);
+	return err;
+}
+
+static void fbnic_free_qt_resources(struct fbnic_net *fbn,
+				    struct fbnic_q_triad *qt)
+{
+	struct device *dev = fbn->netdev->dev.parent;
+
+	fbnic_free_ring_resources(dev, &qt->cmpl);
+	fbnic_free_ring_resources(dev, &qt->sub1);
+	fbnic_free_ring_resources(dev, &qt->sub0);
+}
+
+static int fbnic_alloc_tx_qt_resources(struct fbnic_net *fbn,
+				       struct fbnic_q_triad *qt)
+{
+	struct device *dev = fbn->netdev->dev.parent;
+	int err;
+
+	err = fbnic_alloc_tx_ring_resources(fbn, &qt->sub0);
+	if (err)
+		return err;
+
+	err = fbnic_alloc_tx_ring_resources(fbn, &qt->cmpl);
+	if (err)
+		goto free_sub0;
+
+	return 0;
+
+free_sub0:
+	fbnic_free_ring_resources(dev, &qt->sub0);
+	return err;
+}
+
+static void fbnic_free_nv_resources(struct fbnic_net *fbn,
+				    struct fbnic_napi_vector *nv)
+{
+	int i;
+
+	/* Free Tx Resources  */
+	for (i = 0; i < nv->txt_count; i++)
+		fbnic_free_qt_resources(fbn, &nv->qt[i]);
+}
+
+static int fbnic_alloc_nv_resources(struct fbnic_net *fbn,
+				    struct fbnic_napi_vector *nv)
+{
+	int i, err;
+
+	/* Allocate Tx Resources */
+	for (i = 0; i < nv->txt_count; i++) {
+		err = fbnic_alloc_tx_qt_resources(fbn, &nv->qt[i]);
+		if (err)
+			goto free_resources;
+	}
+
+	return 0;
+
+free_resources:
+	while (i--)
+		fbnic_free_qt_resources(fbn, &nv->qt[i]);
+	return err;
+}
+
+void fbnic_free_resources(struct fbnic_net *fbn)
+{
+	struct fbnic_napi_vector *nv;
+
+	list_for_each_entry(nv, &fbn->napis, napis)
+		fbnic_free_nv_resources(fbn, nv);
+}
+
+int fbnic_alloc_resources(struct fbnic_net *fbn)
+{
+	struct fbnic_napi_vector *nv;
+	int err = -ENODEV;
+
+	list_for_each_entry(nv, &fbn->napis, napis) {
+		err = fbnic_alloc_nv_resources(fbn, nv);
+		if (err)
+			goto free_resources;
+	}
+
+	return 0;
+
+free_resources:
+	list_for_each_entry_continue_reverse(nv, &fbn->napis, napis)
+		fbnic_free_nv_resources(fbn, nv);
+
+	return err;
+}
+
+static void fbnic_disable_twq0(struct fbnic_ring *txr)
+{
+	u32 twq_ctl = fbnic_ring_rd32(txr, FBNIC_QUEUE_TWQ0_CTL);
+
+	twq_ctl &= ~FBNIC_QUEUE_TWQ_CTL_ENABLE;
+
+	fbnic_ring_wr32(txr, FBNIC_QUEUE_TWQ0_CTL, twq_ctl);
+}
+
+static void fbnic_disable_tcq(struct fbnic_ring *txr)
+{
+	fbnic_ring_wr32(txr, FBNIC_QUEUE_TCQ_CTL, 0);
+	fbnic_ring_wr32(txr, FBNIC_QUEUE_TIM_MASK, FBNIC_QUEUE_TIM_MASK_MASK);
+}
+
+void fbnic_napi_disable(struct fbnic_net *fbn)
+{
+	struct fbnic_napi_vector *nv;
+
+	list_for_each_entry(nv, &fbn->napis, napis) {
+		napi_disable(&nv->napi);
+
+		fbnic_nv_irq_disable(nv);
+	}
+}
+
+void fbnic_disable(struct fbnic_net *fbn)
+{
+	struct fbnic_dev *fbd = fbn->fbd;
+	struct fbnic_napi_vector *nv;
+	int i;
+
+	list_for_each_entry(nv, &fbn->napis, napis) {
+		/* disable Tx Queue Triads */
+		for (i = 0; i < nv->txt_count; i++) {
+			struct fbnic_q_triad *qt = &nv->qt[i];
+
+			fbnic_disable_twq0(&qt->sub0);
+			fbnic_disable_tcq(&qt->cmpl);
+		}
+	}
+
+	wrfl();
+}
+
+static void fbnic_tx_flush(struct fbnic_dev *fbd)
+{
+	netdev_warn(fbd->netdev, "triggering Tx flush\n");
+
+	fbnic_rmw32(fbd, FBNIC_TMI_DROP_CTRL, FBNIC_TMI_DROP_CTRL_EN,
+		    FBNIC_TMI_DROP_CTRL_EN);
+}
+
+static void fbnic_tx_flush_off(struct fbnic_dev *fbd)
+{
+	fbnic_rmw32(fbd, FBNIC_TMI_DROP_CTRL, FBNIC_TMI_DROP_CTRL_EN, 0);
+}
+
+struct fbnic_idle_regs {
+	u32 reg_base;
+	u8 reg_cnt;
+};
+
+static bool fbnic_all_idle(struct fbnic_dev *fbd,
+			   const struct fbnic_idle_regs *regs,
+			   unsigned int nregs)
+{
+	unsigned int i, j;
+
+	for (i = 0; i < nregs; i++) {
+		for (j = 0; j < regs[i].reg_cnt; j++) {
+			if (fbnic_rd32(fbd, regs[i].reg_base + j) != ~0U)
+				return false;
+		}
+	}
+	return true;
+}
+
+static void fbnic_idle_dump(struct fbnic_dev *fbd,
+			    const struct fbnic_idle_regs *regs,
+			    unsigned int nregs, const char *dir, int err)
+{
+	unsigned int i, j;
+
+	netdev_err(fbd->netdev, "error waiting for %s idle %d\n", dir, err);
+	for (i = 0; i < nregs; i++)
+		for (j = 0; j < regs[i].reg_cnt; j++)
+			netdev_err(fbd->netdev, "0x%04x: %08x\n",
+				   regs[i].reg_base + j,
+				   fbnic_rd32(fbd, regs[i].reg_base + j));
+}
+
+int fbnic_wait_all_queues_idle(struct fbnic_dev *fbd, bool may_fail)
+{
+	static const struct fbnic_idle_regs tx[] = {
+		{ FBNIC_QM_TWQ_IDLE(0),	FBNIC_QM_TWQ_IDLE_CNT, },
+		{ FBNIC_QM_TQS_IDLE(0),	FBNIC_QM_TQS_IDLE_CNT, },
+		{ FBNIC_QM_TDE_IDLE(0),	FBNIC_QM_TDE_IDLE_CNT, },
+		{ FBNIC_QM_TCQ_IDLE(0),	FBNIC_QM_TCQ_IDLE_CNT, },
+	};
+	bool idle;
+	int err;
+
+	err = read_poll_timeout_atomic(fbnic_all_idle, idle, idle, 2, 500000,
+				       false, fbd, tx, ARRAY_SIZE(tx));
+	if (err == -ETIMEDOUT) {
+		fbnic_tx_flush(fbd);
+		err = read_poll_timeout_atomic(fbnic_all_idle, idle, idle,
+					       2, 500000, false,
+					       fbd, tx, ARRAY_SIZE(tx));
+		fbnic_tx_flush_off(fbd);
+	}
+	if (err) {
+		fbnic_idle_dump(fbd, tx, ARRAY_SIZE(tx), "Tx", err);
+		if (may_fail)
+			return err;
+	}
+
+	return err;
+}
+
+void fbnic_flush(struct fbnic_net *fbn)
+{
+	struct fbnic_napi_vector *nv;
+
+	list_for_each_entry(nv, &fbn->napis, napis) {
+		int i;
+
+		/* Flush any processed Tx Queue Triads and drop the rest */
+		for (i = 0; i < nv->txt_count; i++) {
+			struct fbnic_q_triad *qt = &nv->qt[i];
+			struct netdev_queue *tx_queue;
+
+			/* Reset completion queue descriptor ring */
+			memset(qt->cmpl.desc, 0, qt->cmpl.size);
+
+			/* Reset BQL associated with Tx queue */
+			tx_queue = netdev_get_tx_queue(nv->napi.dev,
+						       qt->sub0.q_idx);
+			netdev_tx_reset_queue(tx_queue);
+		}
+	}
+}
+
+static void fbnic_enable_twq0(struct fbnic_ring *twq)
+{
+	u32 log_size = fls(twq->size_mask);
+
+	if (!twq->size_mask)
+		return;
+
+	/* reset head/tail */
+	fbnic_ring_wr32(twq, FBNIC_QUEUE_TWQ0_CTL, FBNIC_QUEUE_TWQ_CTL_RESET);
+	twq->tail = 0;
+	twq->head = 0;
+
+	/* Store descriptor ring address and size */
+	fbnic_ring_wr32(twq, FBNIC_QUEUE_TWQ0_BAL, lower_32_bits(twq->dma));
+	fbnic_ring_wr32(twq, FBNIC_QUEUE_TWQ0_BAH, upper_32_bits(twq->dma));
+
+	/* write lower 4 bits of log size as 64K ring size is 0 */
+	fbnic_ring_wr32(twq, FBNIC_QUEUE_TWQ0_SIZE, log_size & 0xf);
+
+	fbnic_ring_wr32(twq, FBNIC_QUEUE_TWQ0_CTL, FBNIC_QUEUE_TWQ_CTL_ENABLE);
+}
+
+static void fbnic_enable_tcq(struct fbnic_napi_vector *nv,
+			     struct fbnic_ring *tcq)
+{
+	u32 log_size = fls(tcq->size_mask);
+
+	if (!tcq->size_mask)
+		return;
+
+	/* reset head/tail */
+	fbnic_ring_wr32(tcq, FBNIC_QUEUE_TCQ_CTL, FBNIC_QUEUE_TCQ_CTL_RESET);
+	tcq->tail = 0;
+	tcq->head = 0;
+
+	/* Store descriptor ring address and size */
+	fbnic_ring_wr32(tcq, FBNIC_QUEUE_TCQ_BAL, lower_32_bits(tcq->dma));
+	fbnic_ring_wr32(tcq, FBNIC_QUEUE_TCQ_BAH, upper_32_bits(tcq->dma));
+
+	/* write lower 4 bits of log size as 64K ring size is 0 */
+	fbnic_ring_wr32(tcq, FBNIC_QUEUE_TCQ_SIZE, log_size & 0xf);
+
+	/* Store interrupt information for the completion queue */
+	fbnic_ring_wr32(tcq, FBNIC_QUEUE_TIM_CTL, nv->v_idx);
+	fbnic_ring_wr32(tcq, FBNIC_QUEUE_TIM_THRESHOLD, tcq->size_mask / 2);
+	fbnic_ring_wr32(tcq, FBNIC_QUEUE_TIM_MASK, 0);
+
+	/* Enable queue */
+	fbnic_ring_wr32(tcq, FBNIC_QUEUE_TCQ_CTL, FBNIC_QUEUE_TCQ_CTL_ENABLE);
+}
+
+void fbnic_enable(struct fbnic_net *fbn)
+{
+	struct fbnic_dev *fbd = fbn->fbd;
+	struct fbnic_napi_vector *nv;
+	int i;
+
+	list_for_each_entry(nv, &fbn->napis, napis) {
+		/* Setup Tx Queue Triads */
+		for (i = 0; i < nv->txt_count; i++) {
+			struct fbnic_q_triad *qt = &nv->qt[i];
+
+			fbnic_enable_twq0(&qt->sub0);
+			fbnic_enable_tcq(nv, &qt->cmpl);
+		}
+	}
+
+	wrfl();
+}
+
+static void fbnic_nv_irq_enable(struct fbnic_napi_vector *nv)
+{
+	struct fbnic_dev *fbd = nv->fbd;
+
+	wr32(FBNIC_INTR_CQ_REARM(nv->v_idx), FBNIC_INTR_CQ_REARM_INTR_UNMASK);
+}
+
+void fbnic_napi_enable(struct fbnic_net *fbn)
+{
+	struct fbnic_napi_vector *nv;
+
+	list_for_each_entry(nv, &fbn->napis, napis) {
+		napi_enable(&nv->napi);
+
+		fbnic_nv_irq_enable(nv);
+	}
+}
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
index e7f1208a3543..2898e0dccf7a 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
@@ -12,17 +12,30 @@ struct fbnic_net;
 #define FBNIC_MAX_TXQS			128u
 #define FBNIC_MAX_RXQS			128u
 
+#define FBNIC_TXQ_SIZE_DEFAULT		1024
+
 #define FBNIC_RING_F_DISABLED		BIT(0)
 #define FBNIC_RING_F_CTX		BIT(1)
 #define FBNIC_RING_F_STATS		BIT(2)	/* ring's stats may be used */
 
 struct fbnic_ring {
+	/* Pointer to buffer specific info */
+	union {
+		void *buffer;			/* Generic pointer */
+		void **tx_buf;			/* TWQ */
+	};
+
 	u32 __iomem *doorbell;		/* pointer to CSR space for ring */
+	__le64 *desc;			/* descriptor ring memory */
 	u16 size_mask;			/* size of ring in descriptors - 1 */
 	u8 q_idx;			/* logical netdev ring index */
 	u8 flags;			/* ring flags (FBNIC_RING_F_*) */
 
 	u32 head, tail;			/* head/tail of ring */
+
+	/* slow path fields follow */
+	dma_addr_t dma;			/* phys addr of descriptor memory */
+	size_t size;			/* size of descriptor ring in memory */
 };
 
 struct fbnic_q_triad {
@@ -51,5 +64,14 @@ netdev_tx_t fbnic_xmit_frame(struct sk_buff *skb, struct net_device *dev);
 
 int fbnic_alloc_napi_vectors(struct fbnic_net *fbn);
 void fbnic_free_napi_vectors(struct fbnic_net *fbn);
+int fbnic_alloc_resources(struct fbnic_net *fbn);
+void fbnic_free_resources(struct fbnic_net *fbn);
+void fbnic_napi_enable(struct fbnic_net *fbn);
+void fbnic_napi_disable(struct fbnic_net *fbn);
+void fbnic_enable(struct fbnic_net *fbn);
+void fbnic_disable(struct fbnic_net *fbn);
+void fbnic_flush(struct fbnic_net *fbn);
+
+int fbnic_wait_all_queues_idle(struct fbnic_dev *fbd, bool may_fail);
 
 #endif /* _FBNIC_TXRX_H_ */



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [net-next PATCH 09/15] eth: fbnic: implement Rx queue alloc/start/stop/free
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
                   ` (7 preceding siblings ...)
  2024-04-03 20:08 ` [net-next PATCH 08/15] eth: fbnic: implement Tx queue alloc/start/stop/free Alexander Duyck
@ 2024-04-03 20:09 ` Alexander Duyck
  2024-04-04 11:42   ` kernel test robot
  2024-04-03 20:09 ` [net-next PATCH 10/15] eth: fbnic: Add initial messaging to notify FW of our presence Alexander Duyck
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 20:09 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Duyck, kuba, davem, pabeni

From: Alexander Duyck <alexanderduyck@fb.com>

Implement control path parts of Rx queue handling.

The NIC consumes memory in pages. It takes a full page and places
packets into it in a configurable manner (with the ability to define
headroom / tailroom as well as head alignment requirements).
As mentioned in prior patches there are two page submission queues,
one for packet headers and a second (optional) one for packet payloads.
For now, feed both queues from a single page pool.

Use the page pool "fragment" API, as we can't predict upfront
how the page will be sliced.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h    |   70 ++++
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c |    3 
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.h |    3 
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c    |    2 
 drivers/net/ethernet/meta/fbnic/fbnic_txrx.c   |  466 ++++++++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_txrx.h   |   30 ++
 6 files changed, 568 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
index e50c2827590b..33832a4f78ea 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
@@ -16,6 +16,10 @@
 
 #define FBNIC_CLOCK_FREQ	(600 * (1000 * 1000))
 
+/* Rx Buffer Descriptor Format */
+#define FBNIC_BD_PAGE_ADDR_MASK			DESC_GENMASK(45, 12)
+#define FBNIC_BD_PAGE_ID_MASK			DESC_GENMASK(63, 48)
+
 /* Register Definitions
  *
  * The registers are laid as indexes into an le32 array. As such the actual
@@ -451,12 +455,78 @@ enum {
 #define FBNIC_QUEUE_TIM_COUNTS_CNT0_MASK	CSR_GENMASK(14, 0)
 
 /* Rx Completion Queue Registers */
+#define FBNIC_QUEUE_RCQ_CTL		0x200		/* 0x800 */
+#define FBNIC_QUEUE_RCQ_CTL_RESET		CSR_BIT(0)
+#define FBNIC_QUEUE_RCQ_CTL_ENABLE		CSR_BIT(1)
+
 #define FBNIC_QUEUE_RCQ_HEAD		0x201		/* 0x804 */
 
+#define FBNIC_QUEUE_RCQ_SIZE		0x204		/* 0x810 */
+#define FBNIC_QUEUE_RCQ_SIZE_MASK		CSR_GENMASK(3, 0)
+
+#define FBNIC_QUEUE_RCQ_BAL		0x220		/* 0x880 */
+#define FBNIC_QUEUE_RCQ_BAH		0x221		/* 0x884 */
+
 /* Rx Buffer Descriptor Queue Registers */
+#define FBNIC_QUEUE_BDQ_CTL		0x240		/* 0x900 */
+#define FBNIC_QUEUE_BDQ_CTL_RESET		CSR_BIT(0)
+#define FBNIC_QUEUE_BDQ_CTL_ENABLE		CSR_BIT(1)
+#define FBNIC_QUEUE_BDQ_CTL_PPQ_ENABLE		CSR_BIT(30)
+
 #define FBNIC_QUEUE_BDQ_HPQ_TAIL	0x241		/* 0x904 */
 #define FBNIC_QUEUE_BDQ_PPQ_TAIL	0x242		/* 0x908 */
 
+#define FBNIC_QUEUE_BDQ_HPQ_SIZE	0x247		/* 0x91c */
+#define FBNIC_QUEUE_BDQ_PPQ_SIZE	0x248		/* 0x920 */
+#define FBNIC_QUEUE_BDQ_SIZE_MASK		CSR_GENMASK(3, 0)
+
+#define FBNIC_QUEUE_BDQ_HPQ_BAL		0x260		/* 0x980 */
+#define FBNIC_QUEUE_BDQ_HPQ_BAH		0x261		/* 0x984 */
+#define FBNIC_QUEUE_BDQ_PPQ_BAL		0x262		/* 0x988 */
+#define FBNIC_QUEUE_BDQ_PPQ_BAH		0x263		/* 0x98c */
+
+/* Rx DMA Engine Configuration */
+#define FBNIC_QUEUE_RDE_CTL0		0x2a0		/* 0xa80 */
+#define FBNIC_QUEUE_RDE_CTL0_EN_HDR_SPLIT	CSR_BIT(31)
+#define FBNIC_QUEUE_RDE_CTL0_DROP_MODE_MASK	CSR_GENMASK(30, 29)
+enum {
+	FBNIC_QUEUE_RDE_CTL0_DROP_IMMEDIATE	= 0,
+	FBNIC_QUEUE_RDE_CTL0_DROP_WAIT		= 1,
+	FBNIC_QUEUE_RDE_CTL0_DROP_NEVER		= 2,
+};
+
+#define FBNIC_QUEUE_RDE_CTL0_MIN_HROOM_MASK	CSR_GENMASK(28, 20)
+#define FBNIC_QUEUE_RDE_CTL0_MIN_TROOM_MASK	CSR_GENMASK(19, 11)
+
+#define FBNIC_QUEUE_RDE_CTL1		0x2a1		/* 0xa84 */
+#define FBNIC_QUEUE_RDE_CTL1_MAX_HDR_MASK	CSR_GENMASK(24, 12)
+#define FBNIC_QUEUE_RDE_CTL1_PAYLD_OFF_MASK	CSR_GENMASK(11, 9)
+#define FBNIC_QUEUE_RDE_CTL1_PAYLD_PG_CL_MASK	CSR_GENMASK(8, 6)
+#define FBNIC_QUEUE_RDE_CTL1_PADLEN_MASK	CSR_GENMASK(5, 2)
+#define FBNIC_QUEUE_RDE_CTL1_PAYLD_PACK_MASK	CSR_GENMASK(1, 0)
+enum {
+	FBNIC_QUEUE_RDE_CTL1_PAYLD_PACK_NONE	= 0,
+	FBNIC_QUEUE_RDE_CTL1_PAYLD_PACK_ALL	= 1,
+	FBNIC_QUEUE_RDE_CTL1_PAYLD_PACK_RSS	= 2,
+};
+
+/* Rx Interrupt Manager Registers */
+#define FBNIC_QUEUE_RIM_CTL		0x2c0		/* 0xb00 */
+#define FBNIC_QUEUE_RIM_CTL_MSIX_MASK		CSR_GENMASK(7, 0)
+
+#define FBNIC_QUEUE_RIM_THRESHOLD	0x2c1		/* 0xb04 */
+#define FBNIC_QUEUE_RIM_THRESHOLD_RCD_MASK	CSR_GENMASK(14, 0)
+
+#define FBNIC_QUEUE_RIM_CLEAR		0x2c2		/* 0xb08 */
+#define FBNIC_QUEUE_RIM_CLEAR_MASK		CSR_BIT(0)
+#define FBNIC_QUEUE_RIM_SET		0x2c3		/* 0xb0c */
+#define FBNIC_QUEUE_RIM_SET_MASK		CSR_BIT(0)
+#define FBNIC_QUEUE_RIM_MASK		0x2c4		/* 0xb10 */
+#define FBNIC_QUEUE_RIM_MASK_MASK		CSR_BIT(0)
+
+#define FBNIC_QUEUE_RIM_COAL_STATUS	0x2c5		/* 0xb14 */
+#define FBNIC_QUEUE_RIM_RCD_COUNT_MASK		CSR_GENMASK(30, 16)
+#define FBNIC_QUEUE_RIM_TIMER_MASK		CSR_GENMASK(13, 0)
 #define FBNIC_MAX_QUEUES		128
 #define FBNIC_CSR_END_QUEUE	(0x40000 + 0x400 * FBNIC_MAX_QUEUES - 1)
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
index dce3827d4398..171b159cc006 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
@@ -115,6 +115,9 @@ struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd)
 	INIT_LIST_HEAD(&fbn->napis);
 
 	fbn->txq_size = FBNIC_TXQ_SIZE_DEFAULT;
+	fbn->hpq_size = FBNIC_HPQ_SIZE_DEFAULT;
+	fbn->ppq_size = FBNIC_PPQ_SIZE_DEFAULT;
+	fbn->rcq_size = FBNIC_RCQ_SIZE_DEFAULT;
 
 	default_queues = netif_get_num_default_rss_queues();
 	if (default_queues > fbd->max_num_queues)
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
index b3c39c10c3f7..18f93e9431cc 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
@@ -16,6 +16,9 @@ struct fbnic_net {
 	struct fbnic_dev *fbd;
 
 	u32 txq_size;
+	u32 hpq_size;
+	u32 ppq_size;
+	u32 rcq_size;
 
 	u16 num_napi;
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
index 12d7fbf22d27..d6598c81a5f9 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
@@ -131,6 +131,8 @@ void fbnic_up(struct fbnic_net *fbn)
 {
 	fbnic_enable(fbn);
 
+	fbnic_fill(fbn);
+
 	/* Enable Tx/Rx processing */
 	fbnic_napi_enable(fbn);
 	netif_tx_start_all_queues(fbn->netdev);
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
index dd05ed96d8fc..484cab7342da 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
@@ -3,6 +3,8 @@
 
 #include <linux/iopoll.h>
 #include <linux/pci.h>
+#include <net/netdev_queues.h>
+#include <net/page_pool/helpers.h>
 
 #include "fbnic_netdev.h"
 #include "fbnic_txrx.h"
@@ -31,12 +33,128 @@ static void fbnic_ring_wr32(struct fbnic_ring *ring, unsigned int csr, u32 val)
 	writel(val, csr_base + csr);
 }
 
+static inline unsigned int fbnic_desc_unused(struct fbnic_ring *ring)
+{
+	return (ring->head - ring->tail - 1) & ring->size_mask;
+}
+
 netdev_tx_t fbnic_xmit_frame(struct sk_buff *skb, struct net_device *dev)
 {
 	dev_kfree_skb_any(skb);
 	return NETDEV_TX_OK;
 }
 
+static void fbnic_page_pool_init(struct fbnic_ring *ring, unsigned int idx,
+				 struct page *page)
+{
+	struct fbnic_rx_buf *rx_buf = &ring->rx_buf[idx];
+
+	page_pool_fragment_page(page, PAGECNT_BIAS_MAX);
+	rx_buf->pagecnt_bias = PAGECNT_BIAS_MAX;
+	rx_buf->page = page;
+}
+
+static void fbnic_page_pool_drain(struct fbnic_ring *ring, unsigned int idx,
+				  struct fbnic_napi_vector *nv, int budget)
+{
+	struct fbnic_rx_buf *rx_buf = &ring->rx_buf[idx];
+	struct page *page = rx_buf->page;
+
+	if (!page_pool_unref_page(page, rx_buf->pagecnt_bias))
+		page_pool_put_unrefed_page(nv->page_pool, page, -1, !!budget);
+
+	rx_buf->page = NULL;
+}
+
+static void fbnic_clean_bdq(struct fbnic_napi_vector *nv, int napi_budget,
+			    struct fbnic_ring *ring, unsigned int hw_head)
+{
+	unsigned int head = ring->head;
+
+	if (head == hw_head)
+		return;
+
+	do {
+		fbnic_page_pool_drain(ring, head, nv, napi_budget);
+
+		head++;
+		head &= ring->size_mask;
+	} while (head != hw_head);
+
+	ring->head = head;
+}
+
+static __le64 fbnic_bd_prep(struct page *page, u16 id)
+{
+	dma_addr_t dma = page_pool_get_dma_addr(page);
+	u64 bd;
+
+	bd = (FBNIC_BD_PAGE_ADDR_MASK & dma) |
+	     FIELD_PREP(FBNIC_BD_PAGE_ID_MASK, id);
+
+	return cpu_to_le64(bd);
+}
+
+static void fbnic_fill_bdq(struct fbnic_napi_vector *nv, struct fbnic_ring *bdq)
+{
+	unsigned int count = fbnic_desc_unused(bdq);
+	unsigned int i = bdq->tail;
+
+	if (!count)
+		return;
+
+	do {
+		struct page *page;
+		__le64 *bd;
+
+		page = page_pool_dev_alloc_pages(nv->page_pool);
+		if (!page)
+			break;
+
+		fbnic_page_pool_init(bdq, i, page);
+
+		bd = &bdq->desc[i];
+		*bd = fbnic_bd_prep(page, i);
+
+		i++;
+		i &= bdq->size_mask;
+
+		count--;
+	} while (count);
+
+	if (bdq->tail != i) {
+		bdq->tail = i;
+
+		/* Force DMA writes to flush before writing to tail */
+		dma_wmb();
+
+		writel(i, bdq->doorbell);
+	}
+}
+
+static void fbnic_put_pkt_buff(struct fbnic_napi_vector *nv,
+			       struct fbnic_pkt_buff *pkt, int budget)
+{
+	struct skb_shared_info *shinfo;
+	struct page *page;
+	int nr_frags;
+
+	if (!pkt->buff.data_hard_start)
+		return;
+
+	shinfo = xdp_get_shared_info_from_buff(&pkt->buff);
+	nr_frags = pkt->nr_frags;
+
+	while (nr_frags--) {
+		page = skb_frag_page(&shinfo->frags[nr_frags]);
+		page_pool_put_full_page(nv->page_pool, page, !!budget);
+	}
+
+	page = virt_to_page(pkt->buff.data_hard_start);
+	page_pool_put_full_page(nv->page_pool, page, !!budget);
+	pkt->buff.data_hard_start = NULL;
+}
+
 static void fbnic_nv_irq_disable(struct fbnic_napi_vector *nv)
 {
 	struct fbnic_dev *fbd = nv->fbd;
@@ -100,6 +218,7 @@ static void fbnic_free_napi_vector(struct fbnic_net *fbn,
 	}
 
 	free_irq(fbd->msix_entries[v_idx].vector, nv);
+	page_pool_destroy(nv->page_pool);
 	netif_napi_del(&nv->napi);
 	list_del(&nv->napis);
 	kfree(nv);
@@ -125,6 +244,42 @@ static void fbnic_name_napi_vector(struct fbnic_napi_vector *nv)
 			 nv->v_idx - FBNIC_NON_NAPI_VECTORS);
 }
 
+static int fbnic_alloc_nv_page_pool(struct fbnic_net *fbn,
+				    struct fbnic_napi_vector *nv)
+{
+	struct page_pool_params pp_params = {
+		.order = 0,
+		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
+		.pool_size = (fbn->hpq_size + fbn->ppq_size) * nv->rxt_count,
+		.nid = NUMA_NO_NODE,
+		.dev = nv->dev,
+		.dma_dir = DMA_BIDIRECTIONAL,
+		.offset = 0,
+		.max_len = PAGE_SIZE
+	};
+	struct page_pool *pp;
+
+	/* Page pool cannot exceed a size of 32768. This doesn't limit the
+	 * pages on the ring but the number we can have cached waiting on
+	 * the next use.
+	 *
+	 * TBD: Can this be reduced further? Would a multiple of
+	 * NAPI_POLL_WEIGHT possibly make more sense? The question is how
+	 * many pages do we need to hold in reserve to get the best return
+	 * without hogging too much system memory.
+	 */
+	if (pp_params.pool_size > 32768)
+		pp_params.pool_size = 32768;
+
+	pp = page_pool_create(&pp_params);
+	if (IS_ERR(pp))
+		return PTR_ERR(pp);
+
+	nv->page_pool = pp;
+
+	return 0;
+}
+
 static void fbnic_ring_init(struct fbnic_ring *ring, u32 __iomem *doorbell,
 			    int q_idx, u8 flags)
 {
@@ -174,6 +329,13 @@ static int fbnic_alloc_napi_vector(struct fbnic_dev *fbd, struct fbnic_net *fbn,
 	/* tie nv back to PCIe dev */
 	nv->dev = fbd->dev;
 
+	/* allocate page pool */
+	if (rxq_count) {
+		err = fbnic_alloc_nv_page_pool(fbn, nv);
+		if (err)
+			goto napi_del;
+	}
+
 	/* initialize vector name */
 	fbnic_name_napi_vector(nv);
 
@@ -182,7 +344,7 @@ static int fbnic_alloc_napi_vector(struct fbnic_dev *fbd, struct fbnic_net *fbn,
 	err = request_irq(vector, &fbnic_msix_clean_rings, IRQF_SHARED,
 			  nv->name, nv);
 	if (err)
-		goto napi_del;
+		goto pp_destroy;
 
 	/* Initialize queue triads */
 	qt = nv->qt;
@@ -239,6 +401,8 @@ static int fbnic_alloc_napi_vector(struct fbnic_dev *fbd, struct fbnic_net *fbn,
 
 	return 0;
 
+pp_destroy:
+	page_pool_destroy(nv->page_pool);
 napi_del:
 	netif_napi_del(&nv->napi);
 	list_del(&nv->napis);
@@ -371,6 +535,77 @@ static int fbnic_alloc_tx_ring_resources(struct fbnic_net *fbn,
 	return err;
 }
 
+static int fbnic_alloc_rx_ring_desc(struct fbnic_net *fbn,
+				    struct fbnic_ring *rxr)
+{
+	struct device *dev = fbn->netdev->dev.parent;
+	u32 rxq_size;
+	size_t size;
+
+	switch (rxr->doorbell - fbnic_ring_csr_base(rxr)) {
+	case FBNIC_QUEUE_BDQ_HPQ_TAIL:
+		rxq_size = fbn->hpq_size;
+		break;
+	case FBNIC_QUEUE_BDQ_PPQ_TAIL:
+		rxq_size = fbn->ppq_size;
+		break;
+	case FBNIC_QUEUE_RCQ_HEAD:
+		rxq_size = fbn->rcq_size;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	/* round size up to nearest 4K */
+	size = ALIGN(array_size(sizeof(*rxr->desc), rxq_size), 4096);
+
+	rxr->desc = dma_alloc_coherent(dev, size, &rxr->dma,
+				       GFP_KERNEL | __GFP_NOWARN);
+	if (!rxr->desc)
+		return -ENOMEM;
+
+	/* rxq_size should be a power of 2, so mask is just that -1 */
+	rxr->size_mask = rxq_size - 1;
+	rxr->size = size;
+
+	return 0;
+}
+
+static int fbnic_alloc_rx_ring_buffer(struct fbnic_ring *rxr)
+{
+	size_t size;
+
+	if (rxr->flags & FBNIC_RING_F_CTX)
+		size = array_size(sizeof(*rxr->rx_buf), rxr->size_mask + 1);
+	else
+		size = sizeof(*rxr->pkt);
+
+	rxr->rx_buf = kvzalloc(size, GFP_KERNEL | __GFP_NOWARN);
+
+	return rxr->rx_buf ? 0 : -ENOMEM;
+}
+
+static int fbnic_alloc_rx_ring_resources(struct fbnic_net *fbn,
+					 struct fbnic_ring *rxr)
+{
+	struct device *dev = fbn->netdev->dev.parent;
+	int err;
+
+	err = fbnic_alloc_rx_ring_desc(fbn, rxr);
+	if (err)
+		return err;
+
+	err = fbnic_alloc_rx_ring_buffer(rxr);
+	if (err)
+		goto free_desc;
+
+	return 0;
+
+free_desc:
+	fbnic_free_ring_resources(dev, rxr);
+	return err;
+}
+
 static void fbnic_free_qt_resources(struct fbnic_net *fbn,
 				    struct fbnic_q_triad *qt)
 {
@@ -402,20 +637,50 @@ static int fbnic_alloc_tx_qt_resources(struct fbnic_net *fbn,
 	return err;
 }
 
+static int fbnic_alloc_rx_qt_resources(struct fbnic_net *fbn,
+				       struct fbnic_q_triad *qt)
+{
+	struct device *dev = fbn->netdev->dev.parent;
+	int err;
+
+	err = fbnic_alloc_rx_ring_resources(fbn, &qt->sub0);
+	if (err)
+		return err;
+
+	err = fbnic_alloc_rx_ring_resources(fbn, &qt->sub1);
+	if (err)
+		goto free_sub0;
+
+	err = fbnic_alloc_rx_ring_resources(fbn, &qt->cmpl);
+	if (err)
+		goto free_sub1;
+
+	return 0;
+
+free_sub1:
+	fbnic_free_ring_resources(dev, &qt->sub1);
+free_sub0:
+	fbnic_free_ring_resources(dev, &qt->sub0);
+	return err;
+}
+
 static void fbnic_free_nv_resources(struct fbnic_net *fbn,
 				    struct fbnic_napi_vector *nv)
 {
-	int i;
+	int i, j;
 
 	/* Free Tx Resources  */
 	for (i = 0; i < nv->txt_count; i++)
 		fbnic_free_qt_resources(fbn, &nv->qt[i]);
+
+	for (j = 0; j < nv->rxt_count; j++, i++)
+		fbnic_free_qt_resources(fbn, &nv->qt[i]);
 }
 
 static int fbnic_alloc_nv_resources(struct fbnic_net *fbn,
 				    struct fbnic_napi_vector *nv)
 {
-	int i, err;
+	int i, j, err;
 
 	/* Allocate Tx Resources */
 	for (i = 0; i < nv->txt_count; i++) {
@@ -424,6 +689,13 @@ static int fbnic_alloc_nv_resources(struct fbnic_net *fbn,
 			goto free_resources;
 	}
 
+	/* Allocate Rx Resources */
+	for (j = 0; j < nv->rxt_count; j++, i++) {
+		err = fbnic_alloc_rx_qt_resources(fbn, &nv->qt[i]);
+		if (err)
+			goto free_resources;
+	}
+
 	return 0;
 
 free_resources:
@@ -475,6 +747,21 @@ static void fbnic_disable_tcq(struct fbnic_ring *txr)
 	fbnic_ring_wr32(txr, FBNIC_QUEUE_TIM_MASK, FBNIC_QUEUE_TIM_MASK_MASK);
 }
 
+static void fbnic_disable_bdq(struct fbnic_ring *hpq, struct fbnic_ring *ppq)
+{
+	u32 bdq_ctl = fbnic_ring_rd32(hpq, FBNIC_QUEUE_BDQ_CTL);
+
+	bdq_ctl &= ~FBNIC_QUEUE_BDQ_CTL_ENABLE;
+
+	fbnic_ring_wr32(hpq, FBNIC_QUEUE_BDQ_CTL, bdq_ctl);
+}
+
+static void fbnic_disable_rcq(struct fbnic_ring *rxr)
+{
+	fbnic_ring_wr32(rxr, FBNIC_QUEUE_RCQ_CTL, 0);
+	fbnic_ring_wr32(rxr, FBNIC_QUEUE_RIM_MASK, FBNIC_QUEUE_RIM_MASK_MASK);
+}
+
 void fbnic_napi_disable(struct fbnic_net *fbn)
 {
 	struct fbnic_napi_vector *nv;
@@ -490,7 +777,7 @@ void fbnic_disable(struct fbnic_net *fbn)
 {
 	struct fbnic_dev *fbd = fbn->fbd;
 	struct fbnic_napi_vector *nv;
-	int i;
+	int i, j;
 
 	list_for_each_entry(nv, &fbn->napis, napis) {
 		/* disable Tx Queue Triads */
@@ -500,6 +787,14 @@ void fbnic_disable(struct fbnic_net *fbn)
 			fbnic_disable_twq0(&qt->sub0);
 			fbnic_disable_tcq(&qt->cmpl);
 		}
+
+		/* disable Rx Queue Triads */
+		for (j = 0; j < nv->rxt_count; j++, i++) {
+			struct fbnic_q_triad *qt = &nv->qt[i];
+
+			fbnic_disable_bdq(&qt->sub0, &qt->sub1);
+			fbnic_disable_rcq(&qt->cmpl);
+		}
 	}
 
 	wrfl();
@@ -559,6 +854,10 @@ int fbnic_wait_all_queues_idle(struct fbnic_dev *fbd, bool may_fail)
 		{ FBNIC_QM_TQS_IDLE(0),	FBNIC_QM_TQS_IDLE_CNT, },
 		{ FBNIC_QM_TDE_IDLE(0),	FBNIC_QM_TDE_IDLE_CNT, },
 		{ FBNIC_QM_TCQ_IDLE(0),	FBNIC_QM_TCQ_IDLE_CNT, },
+	}, rx[] = {
+		{ FBNIC_QM_HPQ_IDLE(0),	FBNIC_QM_HPQ_IDLE_CNT, },
+		{ FBNIC_QM_PPQ_IDLE(0),	FBNIC_QM_PPQ_IDLE_CNT, },
+		{ FBNIC_QM_RCQ_IDLE(0),	FBNIC_QM_RCQ_IDLE_CNT, },
 	};
 	bool idle;
 	int err;
@@ -578,6 +877,10 @@ int fbnic_wait_all_queues_idle(struct fbnic_dev *fbd, bool may_fail)
 			return err;
 	}
 
+	err = read_poll_timeout_atomic(fbnic_all_idle, idle, idle, 2, 500000,
+				       false, fbd, rx, ARRAY_SIZE(rx));
+	if (err)
+		fbnic_idle_dump(fbd, rx, ARRAY_SIZE(rx), "Rx", err);
 	return err;
 }
 
@@ -586,7 +889,7 @@ void fbnic_flush(struct fbnic_net *fbn)
 	struct fbnic_napi_vector *nv;
 
 	list_for_each_entry(nv, &fbn->napis, napis) {
-		int i;
+		int i, j;
 
 		/* Flush any processed Tx Queue Triads and drop the rest */
 		for (i = 0; i < nv->txt_count; i++) {
@@ -601,6 +904,38 @@ void fbnic_flush(struct fbnic_net *fbn)
 						       qt->sub0.q_idx);
 			netdev_tx_reset_queue(tx_queue);
 		}
+
+		/* Flush any processed Rx Queue Triads and drop the rest */
+		for (j = 0; j < nv->rxt_count; j++, i++) {
+			struct fbnic_q_triad *qt = &nv->qt[i];
+
+			/* Clean the work queues of unprocessed work */
+			fbnic_clean_bdq(nv, 0, &qt->sub0, qt->sub0.tail);
+			fbnic_clean_bdq(nv, 0, &qt->sub1, qt->sub1.tail);
+
+			/* Reset completion queue descriptor ring */
+			memset(qt->cmpl.desc, 0, qt->cmpl.size);
+
+			fbnic_put_pkt_buff(nv, qt->cmpl.pkt, 0);
+		}
+	}
+}
+
+void fbnic_fill(struct fbnic_net *fbn)
+{
+	struct fbnic_napi_vector *nv;
+
+	list_for_each_entry(nv, &fbn->napis, napis) {
+		int i, j;
+
+		/* Populate pages in the BDQ rings to use for Rx */
+		for (j = 0, i = nv->txt_count; j < nv->rxt_count; j++, i++) {
+			struct fbnic_q_triad *qt = &nv->qt[i];
+
+			/* populate the header and payload BDQs */
+			fbnic_fill_bdq(nv, &qt->sub0);
+			fbnic_fill_bdq(nv, &qt->sub1);
+		}
 	}
 }
 
@@ -655,11 +990,102 @@ static void fbnic_enable_tcq(struct fbnic_napi_vector *nv,
 	fbnic_ring_wr32(tcq, FBNIC_QUEUE_TCQ_CTL, FBNIC_QUEUE_TCQ_CTL_ENABLE);
 }
 
+static void fbnic_enable_bdq(struct fbnic_ring *hpq, struct fbnic_ring *ppq)
+{
+	u32 bdq_ctl = FBNIC_QUEUE_BDQ_CTL_ENABLE;
+	u32 log_size;
+
+	/* reset head/tail */
+	fbnic_ring_wr32(hpq, FBNIC_QUEUE_BDQ_CTL, FBNIC_QUEUE_BDQ_CTL_RESET);
+	ppq->tail = 0;
+	ppq->head = 0;
+	hpq->tail = 0;
+	hpq->head = 0;
+
+	log_size = fls(hpq->size_mask);
+
+	/* Store descriptor ring address and size */
+	fbnic_ring_wr32(hpq, FBNIC_QUEUE_BDQ_HPQ_BAL, lower_32_bits(hpq->dma));
+	fbnic_ring_wr32(hpq, FBNIC_QUEUE_BDQ_HPQ_BAH, upper_32_bits(hpq->dma));
+
+	/* write lower 4 bits of log size as 64K ring size is 0 */
+	fbnic_ring_wr32(hpq, FBNIC_QUEUE_BDQ_HPQ_SIZE, log_size & 0xf);
+
+	if (!ppq->size_mask)
+		goto write_ctl;
+
+	log_size = fls(ppq->size_mask);
+
+	/* Add enabling of PPQ to BDQ control */
+	bdq_ctl |= FBNIC_QUEUE_BDQ_CTL_PPQ_ENABLE;
+
+	/* Store descriptor ring address and size */
+	fbnic_ring_wr32(ppq, FBNIC_QUEUE_BDQ_PPQ_BAL, lower_32_bits(ppq->dma));
+	fbnic_ring_wr32(ppq, FBNIC_QUEUE_BDQ_PPQ_BAH, upper_32_bits(ppq->dma));
+	fbnic_ring_wr32(ppq, FBNIC_QUEUE_BDQ_PPQ_SIZE, log_size & 0xf);
+
+write_ctl:
+	fbnic_ring_wr32(hpq, FBNIC_QUEUE_BDQ_CTL, bdq_ctl);
+}
+
+static void fbnic_config_drop_mode_rcq(struct fbnic_napi_vector *nv,
+				       struct fbnic_ring *rcq)
+{
+	u32 drop_mode, rcq_ctl;
+
+	drop_mode = FBNIC_QUEUE_RDE_CTL0_DROP_IMMEDIATE;
+
+	/* Specify packet layout */
+	rcq_ctl = FIELD_PREP(FBNIC_QUEUE_RDE_CTL0_DROP_MODE_MASK, drop_mode) |
+	    FIELD_PREP(FBNIC_QUEUE_RDE_CTL0_MIN_HROOM_MASK, FBNIC_RX_HROOM) |
+	    FIELD_PREP(FBNIC_QUEUE_RDE_CTL0_MIN_TROOM_MASK, FBNIC_RX_TROOM);
+
+	fbnic_ring_wr32(rcq, FBNIC_QUEUE_RDE_CTL0, rcq_ctl);
+}
+
+static void fbnic_enable_rcq(struct fbnic_napi_vector *nv,
+			     struct fbnic_ring *rcq)
+{
+	u32 log_size = fls(rcq->size_mask);
+	u32 rcq_ctl;
+
+	fbnic_config_drop_mode_rcq(nv, rcq);
+
+	rcq_ctl = FIELD_PREP(FBNIC_QUEUE_RDE_CTL1_PADLEN_MASK, FBNIC_RX_PAD) |
+		   FIELD_PREP(FBNIC_QUEUE_RDE_CTL1_MAX_HDR_MASK,
+			      FBNIC_RX_MAX_HDR) |
+		   FIELD_PREP(FBNIC_QUEUE_RDE_CTL1_PAYLD_OFF_MASK,
+			      FBNIC_RX_PAYLD_OFFSET) |
+		   FIELD_PREP(FBNIC_QUEUE_RDE_CTL1_PAYLD_PG_CL_MASK,
+			      FBNIC_RX_PAYLD_PG_CL);
+	fbnic_ring_wr32(rcq, FBNIC_QUEUE_RDE_CTL1, rcq_ctl);
+
+	/* reset head/tail */
+	fbnic_ring_wr32(rcq, FBNIC_QUEUE_RCQ_CTL, FBNIC_QUEUE_RCQ_CTL_RESET);
+	rcq->head = 0;
+	rcq->tail = 0;
+
+	/* Store descriptor ring address and size */
+	fbnic_ring_wr32(rcq, FBNIC_QUEUE_RCQ_BAL, lower_32_bits(rcq->dma));
+	fbnic_ring_wr32(rcq, FBNIC_QUEUE_RCQ_BAH, upper_32_bits(rcq->dma));
+
+	/* write lower 4 bits of log size as 64K ring size is 0 */
+	fbnic_ring_wr32(rcq, FBNIC_QUEUE_RCQ_SIZE, log_size & 0xf);
+
+	/* Store interrupt information for the completion queue */
+	fbnic_ring_wr32(rcq, FBNIC_QUEUE_RIM_CTL, nv->v_idx);
+	fbnic_ring_wr32(rcq, FBNIC_QUEUE_RIM_THRESHOLD, rcq->size_mask / 2);
+	fbnic_ring_wr32(rcq, FBNIC_QUEUE_RIM_MASK, 0);
+
+	/* Enable queue */
+	fbnic_ring_wr32(rcq, FBNIC_QUEUE_RCQ_CTL, FBNIC_QUEUE_RCQ_CTL_ENABLE);
+}
+
 void fbnic_enable(struct fbnic_net *fbn)
 {
 	struct fbnic_dev *fbd = fbn->fbd;
 	struct fbnic_napi_vector *nv;
-	int i;
+	int i, j;
 
 	list_for_each_entry(nv, &fbn->napis, napis) {
 		/* Setup Tx Queue Triads */
@@ -669,6 +1095,15 @@ void fbnic_enable(struct fbnic_net *fbn)
 			fbnic_enable_twq0(&qt->sub0);
 			fbnic_enable_tcq(nv, &qt->cmpl);
 		}
+
+		/* Setup Rx Queue Triads */
+		for (j = 0; j < nv->rxt_count; j++, i++) {
+			struct fbnic_q_triad *qt = &nv->qt[i];
+
+			fbnic_enable_bdq(&qt->sub0, &qt->sub1);
+			fbnic_config_drop_mode_rcq(nv, &qt->cmpl);
+			fbnic_enable_rcq(nv, &qt->cmpl);
+		}
 	}
 
 	wrfl();
@@ -683,11 +1118,30 @@ static void fbnic_nv_irq_enable(struct fbnic_napi_vector *nv)
 
 void fbnic_napi_enable(struct fbnic_net *fbn)
 {
+	u32 irqs[FBNIC_MAX_MSIX_VECS / 32] = {};
+	struct fbnic_dev *fbd = fbn->fbd;
 	struct fbnic_napi_vector *nv;
+	int i;
 
 	list_for_each_entry(nv, &fbn->napis, napis) {
 		napi_enable(&nv->napi);
 
 		fbnic_nv_irq_enable(nv);
+
+		/* Record bit used for NAPI IRQs so we can
+		 * set the mask appropriately
+		 */
+		irqs[nv->v_idx / 32] |= BIT(nv->v_idx % 32);
 	}
+
+	/* Force the first interrupt on the device to guarantee
+	 * that any packets that may have been enqueued during the
+	 * bringup are processed.
+	 */
+	for (i = 0; i < ARRAY_SIZE(irqs); i++) {
+		if (!irqs[i])
+			continue;
+		wr32(FBNIC_INTR_SET(i), irqs[i]);
+	}
+	wrfl();
 }
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
index 2898e0dccf7a..200f3b893d02 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
@@ -6,6 +6,7 @@
 
 #include <linux/netdevice.h>
 #include <linux/types.h>
+#include <net/xdp.h>
 
 struct fbnic_net;
 
@@ -13,16 +14,43 @@ struct fbnic_net;
 #define FBNIC_MAX_RXQS			128u
 
 #define FBNIC_TXQ_SIZE_DEFAULT		1024
+#define FBNIC_HPQ_SIZE_DEFAULT		256
+#define FBNIC_PPQ_SIZE_DEFAULT		256
+#define FBNIC_RCQ_SIZE_DEFAULT		1024
+
+#define FBNIC_RX_TROOM \
+	SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
+#define FBNIC_RX_HROOM \
+	(ALIGN(FBNIC_RX_TROOM + NET_SKB_PAD, 128) - FBNIC_RX_TROOM)
+#define FBNIC_RX_PAD			0
+#define FBNIC_RX_MAX_HDR		(1536 - FBNIC_RX_PAD)
+#define FBNIC_RX_PAYLD_OFFSET		0
+#define FBNIC_RX_PAYLD_PG_CL		0
 
 #define FBNIC_RING_F_DISABLED		BIT(0)
 #define FBNIC_RING_F_CTX		BIT(1)
 #define FBNIC_RING_F_STATS		BIT(2)	/* ring's stats may be used */
 
+struct fbnic_pkt_buff {
+	struct xdp_buff buff;
+	u32 data_truesize;
+	u16 data_len;
+	u16 nr_frags;
+};
+
+#define PAGECNT_BIAS_MAX	USHRT_MAX
+struct fbnic_rx_buf {
+	struct page *page;
+	unsigned int pagecnt_bias;
+};
+
 struct fbnic_ring {
 	/* Pointer to buffer specific info */
 	union {
 		void *buffer;			/* Generic pointer */
 		void **tx_buf;			/* TWQ */
+		struct fbnic_pkt_buff *pkt;	/* RCQ */
+		struct fbnic_rx_buf *rx_buf;	/* BDQ */
 	};
 
 	u32 __iomem *doorbell;		/* pointer to CSR space for ring */
@@ -45,6 +73,7 @@ struct fbnic_q_triad {
 struct fbnic_napi_vector {
 	struct napi_struct napi;
 	struct device *dev;		/* Device for DMA unmapping */
+	struct page_pool *page_pool;
 	struct fbnic_dev *fbd;
 	char name[IFNAMSIZ + 9];
 
@@ -71,6 +100,7 @@ void fbnic_napi_disable(struct fbnic_net *fbn);
 void fbnic_enable(struct fbnic_net *fbn);
 void fbnic_disable(struct fbnic_net *fbn);
 void fbnic_flush(struct fbnic_net *fbn);
+void fbnic_fill(struct fbnic_net *fbn);
 
 int fbnic_wait_all_queues_idle(struct fbnic_dev *fbd, bool may_fail);
 



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [net-next PATCH 10/15] eth: fbnic: Add initial messaging to notify FW of our presence
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
                   ` (8 preceding siblings ...)
  2024-04-03 20:09 ` [net-next PATCH 09/15] eth: fbnic: implement Rx " Alexander Duyck
@ 2024-04-03 20:09 ` Alexander Duyck
  2024-04-03 20:09 ` [net-next PATCH 11/15] eth: fbnic: Enable Ethernet link setup Alexander Duyck
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 20:09 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Duyck, kuba, davem, pabeni

From: Alexander Duyck <alexanderduyck@fb.com>

After the driver loads we need to get some initial capabilities from the
firmware to determine what the device is capable of and what functionality
needs to be enabled. Specifically we receive information about the current
state of the link and if a BMC is present.

After that when we bring the interface up we will need the ability to take
ownership from the FW. To do that we will need to notify it that we are
taking control before we start configuring the traffic classifier and MAC.

Once we have ownership we need to notify the firmware that we are still
present and active. To do that we will send a regular heartbeat to the FW.
If the FW doesn't receive the heartbeat in a timely fashion it will retake
control of the RPC and MAC and assume that the host has gone offline.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 drivers/net/ethernet/meta/fbnic/fbnic.h        |    5 
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h    |    8 
 drivers/net/ethernet/meta/fbnic/fbnic_fw.c     |  409 ++++++++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_fw.h     |   85 +++++
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c |   18 +
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c    |   28 ++
 6 files changed, 553 insertions(+)

diff --git a/drivers/net/ethernet/meta/fbnic/fbnic.h b/drivers/net/ethernet/meta/fbnic/fbnic.h
index 92a36959547c..4f18d703dae8 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic.h
@@ -25,9 +25,14 @@ struct fbnic_dev {
 	struct delayed_work service_task;
 
 	struct fbnic_fw_mbx mbx[FBNIC_IPC_MBX_INDICES];
+	struct fbnic_fw_cap fw_cap;
 	/* Lock protecting Tx Mailbox queue to prevent possible races */
 	spinlock_t fw_tx_lock;
 
+	unsigned long last_heartbeat_request;
+	unsigned long last_heartbeat_response;
+	u8 fw_heartbeat_enabled;
+
 	u64 dsn;
 	u32 mps;
 	u32 readrq;
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
index 33832a4f78ea..8b035c4e068e 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
@@ -12,6 +12,14 @@
 #define DESC_BIT(nr)		BIT_ULL(nr)
 #define DESC_GENMASK(h, l)	GENMASK_ULL(h, l)
 
+/* Defines the minimum firmware version required by the driver */
+#define MIN_FW_MAJOR_VERSION    0
+#define MIN_FW_MINOR_VERSION    10
+#define MIN_FW_BUILD_VERSION    6
+#define MIN_FW_VERSION_CODE     (MIN_FW_MAJOR_VERSION * (1u << 24) + \
+				 MIN_FW_MINOR_VERSION * (1u << 16) + \
+				 MIN_FW_BUILD_VERSION)
+
 #define PCI_DEVICE_ID_META_FBNIC_ASIC		0x0013
 
 #define FBNIC_CLOCK_FREQ	(600 * (1000 * 1000))
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_fw.c b/drivers/net/ethernet/meta/fbnic/fbnic_fw.c
index 71647044aa23..4c3098364fed 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_fw.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_fw.c
@@ -2,6 +2,7 @@
 /* Copyright (c) Meta Platforms, Inc. and affiliates. */
 
 #include <linux/bitfield.h>
+#include <linux/etherdevice.h>
 #include <linux/delay.h>
 #include <linux/dev_printk.h>
 #include <linux/dma-mapping.h>
@@ -184,6 +185,22 @@ static int fbnic_mbx_alloc_rx_msgs(struct fbnic_dev *fbd)
 	return err;
 }
 
+static int fbnic_mbx_map_tlv_msg(struct fbnic_dev *fbd,
+				 struct fbnic_tlv_msg *msg)
+{
+	unsigned long flags;
+	int err;
+
+	spin_lock_irqsave(&fbd->fw_tx_lock, flags);
+
+	err = fbnic_mbx_map_msg(fbd, FBNIC_IPC_MBX_TX_IDX, msg,
+				le16_to_cpu(msg->hdr.len) * sizeof(u32), 1);
+
+	spin_unlock_irqrestore(&fbd->fw_tx_lock, flags);
+
+	return err;
+}
+
 static void fbnic_mbx_process_tx_msgs(struct fbnic_dev *fbd)
 {
 	struct fbnic_fw_mbx *tx_mbx = &fbd->mbx[FBNIC_IPC_MBX_TX_IDX];
@@ -205,6 +222,60 @@ static void fbnic_mbx_process_tx_msgs(struct fbnic_dev *fbd)
 	tx_mbx->head = head;
 }
 
+/**
+ * fbnic_fw_xmit_simple_msg - Transmit a simple single TLV message w/o data
+ * @fbd: FBNIC device structure
+ * @msg_type: ENUM value indicating message type to send
+ *
+ * Returns one of the following values:
+ * -EOPNOTSUPP: Not an ASIC, so the mailbox is not supported
+ * -ENODEV: Device I/O error
+ * -ENOMEM: Failed to allocate message
+ * -EBUSY: No space in mailbox
+ * -ENOSPC: DMA mapping failed
+ *
+ * This function sends a single TLV header indicating the host wants to take
+ * some action. However, there are no other side effects, which means that
+ * any response will need to be caught via a completion if this action is
+ * expected to kick off a resultant action.
+ */
+static int fbnic_fw_xmit_simple_msg(struct fbnic_dev *fbd, u32 msg_type)
+{
+	struct fbnic_tlv_msg *msg;
+	int err = 0;
+
+	if (!fbnic_fw_present(fbd))
+		return -ENODEV;
+
+	msg = fbnic_tlv_msg_alloc(msg_type);
+	if (!msg)
+		return -ENOMEM;
+
+	err = fbnic_mbx_map_tlv_msg(fbd, msg);
+	if (err)
+		free_page((unsigned long)msg);
+
+	return err;
+}
+
+/**
+ * fbnic_fw_xmit_cap_msg - Allocate and populate a FW capabilities message
+ * @fbd: FBNIC device structure
+ *
+ * Returns 0 on success, or a negative error code on failure.
+ *
+ * Sends a single TLV header indicating the host wants the firmware to
+ * confirm the capabilities and version.
+ **/
+static int fbnic_fw_xmit_cap_msg(struct fbnic_dev *fbd)
+{
+	int err = fbnic_fw_xmit_simple_msg(fbd, FBNIC_TLV_MSG_ID_HOST_CAP_REQ);
+
+	/* return 0 if we are not calling this on ASIC */
+	return (err == -EOPNOTSUPP) ? 0 : err;
+}
+
 static void fbnic_mbx_postinit_desc_ring(struct fbnic_dev *fbd, int mbx_idx)
 {
 	struct fbnic_fw_mbx *mbx = &fbd->mbx[mbx_idx];
@@ -220,6 +291,16 @@ static void fbnic_mbx_postinit_desc_ring(struct fbnic_dev *fbd, int mbx_idx)
 		/* Make sure we have a page for the FW to write to */
 		fbnic_mbx_alloc_rx_msgs(fbd);
 		break;
+	case FBNIC_IPC_MBX_TX_IDX:
+		/* Force version to 1 if we successfully requested an update
+		 * from the firmware. This should be overwritten once we get
+		 * the actual version from the firmware in the capabilities
+		 * request message.
+		 */
+		if (!fbnic_fw_xmit_cap_msg(fbd) &&
+		    !fbd->fw_cap.running.mgmt.version)
+			fbd->fw_cap.running.mgmt.version = 1;
+		break;
 	}
 }
 
@@ -240,7 +321,335 @@ static void fbnic_mbx_postinit(struct fbnic_dev *fbd)
 		fbnic_mbx_postinit_desc_ring(fbd, i);
 }
 
+/**
+ * fbnic_fw_xmit_ownership_msg - Create and transmit a host ownership message
+ * to FW mailbox
+ *
+ * @fbd: FBNIC device structure
+ * @take_ownership: take/release the ownership
+ *
+ * Returns 0 on success, negative value on failure
+ *
+ * Notifies the firmware that the driver either takes ownership of the NIC
+ * (when @take_ownership is true) or releases it.
+ */
+int fbnic_fw_xmit_ownership_msg(struct fbnic_dev *fbd, bool take_ownership)
+{
+	unsigned long req_time = jiffies;
+	struct fbnic_tlv_msg *msg;
+	int err = 0;
+
+	if (!fbnic_fw_present(fbd))
+		return -ENODEV;
+
+	msg = fbnic_tlv_msg_alloc(FBNIC_TLV_MSG_ID_OWNERSHIP_REQ);
+	if (!msg)
+		return -ENOMEM;
+
+	if (take_ownership) {
+		err = fbnic_tlv_attr_put_flag(msg, FBNIC_FW_OWNERSHIP_FLAG);
+		if (err)
+			goto free_message;
+	}
+
+	err = fbnic_mbx_map_tlv_msg(fbd, msg);
+	if (err)
+		goto free_message;
+
+	/* Initialize heartbeat, set last response to 1 second in the past
+	 * so that we will trigger a timeout if the firmware doesn't respond
+	 */
+	fbd->last_heartbeat_response = req_time - HZ;
+
+	fbd->last_heartbeat_request = req_time;
+
+	/* set heartbeat detection based on if we are taking ownership */
+	fbd->fw_heartbeat_enabled = take_ownership;
+
+	return err;
+
+free_message:
+	free_page((unsigned long)msg);
+	return err;
+}
+
+static const struct fbnic_tlv_index fbnic_fw_cap_resp_index[] = {
+	FBNIC_TLV_ATTR_U32(FBNIC_FW_CAP_RESP_VERSION),
+	FBNIC_TLV_ATTR_FLAG(FBNIC_FW_CAP_RESP_BMC_PRESENT),
+	FBNIC_TLV_ATTR_MAC_ADDR(FBNIC_FW_CAP_RESP_BMC_MAC_ADDR),
+	FBNIC_TLV_ATTR_ARRAY(FBNIC_FW_CAP_RESP_BMC_MAC_ARRAY),
+	FBNIC_TLV_ATTR_U32(FBNIC_FW_CAP_RESP_STORED_VERSION),
+	FBNIC_TLV_ATTR_U32(FBNIC_FW_CAP_RESP_ACTIVE_FW_SLOT),
+	FBNIC_TLV_ATTR_STRING(FBNIC_FW_CAP_RESP_VERSION_COMMIT_STR,
+			      FBNIC_FW_CAP_RESP_COMMIT_MAX_SIZE),
+	FBNIC_TLV_ATTR_U32(FBNIC_FW_CAP_RESP_BMC_ALL_MULTI),
+	FBNIC_TLV_ATTR_U32(FBNIC_FW_CAP_RESP_FW_LINK_SPEED),
+	FBNIC_TLV_ATTR_U32(FBNIC_FW_CAP_RESP_FW_LINK_FEC),
+	FBNIC_TLV_ATTR_STRING(FBNIC_FW_CAP_RESP_STORED_COMMIT_STR,
+			      FBNIC_FW_CAP_RESP_COMMIT_MAX_SIZE),
+	FBNIC_TLV_ATTR_U32(FBNIC_FW_CAP_RESP_CMRT_VERSION),
+	FBNIC_TLV_ATTR_U32(FBNIC_FW_CAP_RESP_STORED_CMRT_VERSION),
+	FBNIC_TLV_ATTR_STRING(FBNIC_FW_CAP_RESP_CMRT_COMMIT_STR,
+			      FBNIC_FW_CAP_RESP_COMMIT_MAX_SIZE),
+	FBNIC_TLV_ATTR_STRING(FBNIC_FW_CAP_RESP_STORED_CMRT_COMMIT_STR,
+			      FBNIC_FW_CAP_RESP_COMMIT_MAX_SIZE),
+	FBNIC_TLV_ATTR_U32(FBNIC_FW_CAP_RESP_UEFI_VERSION),
+	FBNIC_TLV_ATTR_STRING(FBNIC_FW_CAP_RESP_UEFI_COMMIT_STR,
+			      FBNIC_FW_CAP_RESP_COMMIT_MAX_SIZE),
+	FBNIC_TLV_ATTR_LAST
+};
+
+static int fbnic_fw_parse_bmc_addrs(u8 bmc_mac_addr[][ETH_ALEN],
+				    struct fbnic_tlv_msg *attr, int len)
+{
+	int attr_len = le16_to_cpu(attr->hdr.len) / sizeof(u32) - 1;
+	struct fbnic_tlv_msg *mac_results[8];
+	int err, i = 0;
+
+	/* make sure we have enough room to process all the MAC addresses */
+	if (len > 8)
+		return -ENOSPC;
+
+	/* Parse the array */
+	err = fbnic_tlv_attr_parse_array(&attr[1], attr_len, mac_results,
+					 fbnic_fw_cap_resp_index,
+					 FBNIC_FW_CAP_RESP_BMC_MAC_ADDR, len);
+	if (err)
+		return err;
+
+	/* Copy results into MAC addr array */
+	for (i = 0; i < len && mac_results[i]; i++)
+		fbnic_tlv_attr_addr_copy(bmc_mac_addr[i], mac_results[i]);
+
+	/* Zero remaining unused addresses */
+	while (i < len)
+		eth_zero_addr(bmc_mac_addr[i++]);
+
+	return 0;
+}
+
+static int fbnic_fw_parse_cap_resp(void *opaque, struct fbnic_tlv_msg **results)
+{
+	u32 active_slot = 0, all_multi = 0;
+	struct fbnic_dev *fbd = opaque;
+	u32 speed = 0, fec = 0;
+	size_t commit_size = 0;
+	bool bmc_present;
+	int err;
+
+	get_unsigned_result(FBNIC_FW_CAP_RESP_VERSION,
+			    fbd->fw_cap.running.mgmt.version);
+
+	if (!fbd->fw_cap.running.mgmt.version)
+		return -EINVAL;
+
+	if (fbd->fw_cap.running.mgmt.version < MIN_FW_VERSION_CODE) {
+		char running_ver[FBNIC_FW_VER_MAX_SIZE];
+
+		fbnic_mk_fw_ver_str(fbd->fw_cap.running.mgmt.version,
+				    running_ver);
+		dev_err(fbd->dev, "Device firmware version(%s) is older than minimum required version(%02d.%02d.%02d)\n",
+			running_ver,
+			MIN_FW_MAJOR_VERSION,
+			MIN_FW_MINOR_VERSION,
+			MIN_FW_BUILD_VERSION);
+		/* Disable TX mailbox to prevent card use until firmware is
+		 * updated.
+		 */
+		fbd->mbx[FBNIC_IPC_MBX_TX_IDX].ready = false;
+		return -EINVAL;
+	}
+
+	get_string_result(FBNIC_FW_CAP_RESP_VERSION_COMMIT_STR, commit_size,
+			  fbd->fw_cap.running.mgmt.commit,
+			  FBNIC_FW_CAP_RESP_COMMIT_MAX_SIZE);
+	if (!commit_size)
+		dev_warn(fbd->dev, "Firmware did not send mgmt commit!\n");
+
+	get_unsigned_result(FBNIC_FW_CAP_RESP_STORED_VERSION,
+			    fbd->fw_cap.stored.mgmt.version);
+	get_string_result(FBNIC_FW_CAP_RESP_STORED_COMMIT_STR, commit_size,
+			  fbd->fw_cap.stored.mgmt.commit,
+			  FBNIC_FW_CAP_RESP_COMMIT_MAX_SIZE);
+
+	get_unsigned_result(FBNIC_FW_CAP_RESP_CMRT_VERSION,
+			    fbd->fw_cap.running.bootloader.version);
+	get_string_result(FBNIC_FW_CAP_RESP_CMRT_COMMIT_STR, commit_size,
+			  fbd->fw_cap.running.bootloader.commit,
+			  FBNIC_FW_CAP_RESP_COMMIT_MAX_SIZE);
+
+	get_unsigned_result(FBNIC_FW_CAP_RESP_STORED_CMRT_VERSION,
+			    fbd->fw_cap.stored.bootloader.version);
+	get_string_result(FBNIC_FW_CAP_RESP_STORED_CMRT_COMMIT_STR, commit_size,
+			  fbd->fw_cap.stored.bootloader.commit,
+			  FBNIC_FW_CAP_RESP_COMMIT_MAX_SIZE);
+
+	get_unsigned_result(FBNIC_FW_CAP_RESP_UEFI_VERSION,
+			    fbd->fw_cap.stored.undi.version);
+	get_string_result(FBNIC_FW_CAP_RESP_UEFI_COMMIT_STR, commit_size,
+			  fbd->fw_cap.stored.undi.commit,
+			  FBNIC_FW_CAP_RESP_COMMIT_MAX_SIZE);
+
+	get_unsigned_result(FBNIC_FW_CAP_RESP_ACTIVE_FW_SLOT, active_slot);
+	fbd->fw_cap.active_slot = active_slot;
+
+	get_unsigned_result(FBNIC_FW_CAP_RESP_FW_LINK_SPEED, speed);
+	get_unsigned_result(FBNIC_FW_CAP_RESP_FW_LINK_FEC, fec);
+	fbd->fw_cap.link_speed = speed;
+	fbd->fw_cap.link_fec = fec;
+
+	bmc_present = !!results[FBNIC_FW_CAP_RESP_BMC_PRESENT];
+	if (bmc_present) {
+		struct fbnic_tlv_msg *attr;
+
+		attr = results[FBNIC_FW_CAP_RESP_BMC_MAC_ARRAY];
+		if (!attr)
+			return -EINVAL;
+
+		err = fbnic_fw_parse_bmc_addrs(fbd->fw_cap.bmc_mac_addr,
+					       attr, 4);
+		if (err)
+			return err;
+
+		get_unsigned_result(FBNIC_FW_CAP_RESP_BMC_ALL_MULTI, all_multi);
+	} else {
+		memset(fbd->fw_cap.bmc_mac_addr, 0,
+		       sizeof(fbd->fw_cap.bmc_mac_addr));
+	}
+
+	fbd->fw_cap.bmc_present = bmc_present;
+
+	if (results[FBNIC_FW_CAP_RESP_BMC_ALL_MULTI] || !bmc_present)
+		fbd->fw_cap.all_multi = all_multi;
+
+	return 0;
+}
+
+static const struct fbnic_tlv_index fbnic_ownership_resp_index[] = {
+	FBNIC_TLV_ATTR_LAST
+};
+
+static int fbnic_fw_parse_ownership_resp(void *opaque,
+					 struct fbnic_tlv_msg **results)
+{
+	struct fbnic_dev *fbd = (struct fbnic_dev *)opaque;
+
+	/* Count the ownership response as a heartbeat reply */
+	fbd->last_heartbeat_response = jiffies;
+
+	return 0;
+}
+
+static const struct fbnic_tlv_index fbnic_heartbeat_resp_index[] = {
+	FBNIC_TLV_ATTR_LAST
+};
+
+static int fbnic_fw_parse_heartbeat_resp(void *opaque,
+					 struct fbnic_tlv_msg **results)
+{
+	struct fbnic_dev *fbd = (struct fbnic_dev *)opaque;
+
+	fbd->last_heartbeat_response = jiffies;
+
+	return 0;
+}
+
+static int fbnic_fw_xmit_heartbeat_message(struct fbnic_dev *fbd)
+{
+	unsigned long req_time = jiffies;
+	struct fbnic_tlv_msg *msg;
+	int err = 0;
+
+	if (!fbnic_fw_present(fbd))
+		return -ENODEV;
+
+	msg = fbnic_tlv_msg_alloc(FBNIC_TLV_MSG_ID_HEARTBEAT_REQ);
+	if (!msg)
+		return -ENOMEM;
+
+	err = fbnic_mbx_map_tlv_msg(fbd, msg);
+	if (err)
+		goto free_message;
+
+	fbd->last_heartbeat_request = req_time;
+
+	return err;
+
+free_message:
+	free_page((unsigned long)msg);
+	return err;
+}
+
+static bool fbnic_fw_heartbeat_current(struct fbnic_dev *fbd)
+{
+	unsigned long last_response = fbd->last_heartbeat_response;
+	unsigned long last_request = fbd->last_heartbeat_request;
+
+	return !time_before(last_response, last_request);
+}
+
+int fbnic_fw_init_heartbeat(struct fbnic_dev *fbd, bool poll)
+{
+	int err = -ETIMEDOUT;
+	int attempts = 50;
+
+	if (!fbnic_fw_present(fbd))
+		return -ENODEV;
+
+	while (attempts--) {
+		msleep(200);
+		if (poll)
+			fbnic_mbx_poll(fbd);
+
+		if (!fbnic_fw_heartbeat_current(fbd))
+			continue;
+
+		/* Place new message on mailbox to elicit a response */
+		err = fbnic_fw_xmit_heartbeat_message(fbd);
+		if (err)
+			dev_warn(fbd->dev,
+				 "Failed to send heartbeat message\n");
+		break;
+	}
+
+	return err;
+}
+
+void fbnic_fw_check_heartbeat(struct fbnic_dev *fbd)
+{
+	unsigned long last_request = fbd->last_heartbeat_request;
+	int err;
+
+	/* Do not check heartbeat or send another request until current
+	 * period has expired. Otherwise we might start spamming requests.
+	 */
+	if (time_is_after_jiffies(last_request + FW_HEARTBEAT_PERIOD))
+		return;
+
+	/* We already reported a missed heartbeat. Wait for ownership
+	 * to be retaken before checking again.
+	 */
+	if (!fbd->fw_heartbeat_enabled)
+		return;
+
+	/* Was the last heartbeat response a long time ago? */
+	if (!fbnic_fw_heartbeat_current(fbd)) {
+		dev_warn(fbd->dev,
+			 "Firmware did not respond to heartbeat message\n");
+		fbd->fw_heartbeat_enabled = false;
+	}
+
+	/* Place new message on mailbox to elicit a response */
+	err = fbnic_fw_xmit_heartbeat_message(fbd);
+	if (err)
+		dev_warn(fbd->dev, "Failed to send heartbeat message\n");
+}
+
 static const struct fbnic_tlv_parser fbnic_fw_tlv_parser[] = {
+	FBNIC_TLV_PARSER(FW_CAP_RESP, fbnic_fw_cap_resp_index,
+			 fbnic_fw_parse_cap_resp),
+	FBNIC_TLV_PARSER(OWNERSHIP_RESP, fbnic_ownership_resp_index,
+			 fbnic_fw_parse_ownership_resp),
+	FBNIC_TLV_PARSER(HEARTBEAT_RESP, fbnic_heartbeat_resp_index,
+			 fbnic_fw_parse_heartbeat_resp),
 	FBNIC_TLV_MSG_ERROR
 };
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_fw.h b/drivers/net/ethernet/meta/fbnic/fbnic_fw.h
index c143079f881c..40d314f963ea 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_fw.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_fw.h
@@ -4,6 +4,7 @@
 #ifndef _FBNIC_FW_H_
 #define _FBNIC_FW_H_
 
+#include <linux/if_ether.h>
 #include <linux/types.h>
 
 struct fbnic_dev;
@@ -17,10 +18,94 @@ struct fbnic_fw_mbx {
 	} buf_info[FBNIC_IPC_MBX_DESC_LEN];
 };
 
+/* FBNIC_FW_VER_MAX_SIZE must match ETHTOOL_FWVERS_LEN */
+#define FBNIC_FW_VER_MAX_SIZE	                32
+/* Formatted version is in the format XX.YY.ZZ_RRR_COMMIT */
+#define FBNIC_FW_CAP_RESP_COMMIT_MAX_SIZE	(FBNIC_FW_VER_MAX_SIZE - 13)
+#define FBNIC_FW_LOG_MAX_SIZE	                256
+
+struct fbnic_fw_ver {
+	u32 version;
+	char commit[FBNIC_FW_CAP_RESP_COMMIT_MAX_SIZE];
+};
+
+struct fbnic_fw_cap {
+	struct {
+		struct fbnic_fw_ver mgmt, bootloader;
+	} running;
+	struct {
+		struct fbnic_fw_ver mgmt, bootloader, undi;
+	} stored;
+	u8	active_slot;
+	u8	bmc_mac_addr[4][ETH_ALEN];
+	u8	bmc_present	: 1;
+	u8	all_multi	: 1;
+	u8	link_speed;
+	u8	link_fec;
+};
+
 void fbnic_mbx_init(struct fbnic_dev *fbd);
 void fbnic_mbx_clean(struct fbnic_dev *fbd);
 void fbnic_mbx_poll(struct fbnic_dev *fbd);
 int fbnic_mbx_poll_tx_ready(struct fbnic_dev *fbd);
 void fbnic_mbx_flush_tx(struct fbnic_dev *fbd);
+int fbnic_fw_xmit_ownership_msg(struct fbnic_dev *fbd, bool take_ownership);
+int fbnic_fw_init_heartbeat(struct fbnic_dev *fbd, bool poll);
+void fbnic_fw_check_heartbeat(struct fbnic_dev *fbd);
+
+#define fbnic_mk_full_fw_ver_str(_rev_id, _delim, _commit, _str)	\
+do {									\
+	const u32 __rev_id = _rev_id;					\
+	snprintf(_str, sizeof(_str), "%02lu.%02lu.%02lu-%03lu%s%s",	\
+		 FIELD_GET(FBNIC_FW_CAP_RESP_VERSION_MAJOR, __rev_id),	\
+		 FIELD_GET(FBNIC_FW_CAP_RESP_VERSION_MINOR, __rev_id),	\
+		 FIELD_GET(FBNIC_FW_CAP_RESP_VERSION_PATCH, __rev_id),	\
+		 FIELD_GET(FBNIC_FW_CAP_RESP_VERSION_BUILD, __rev_id),	\
+		 _delim, _commit);					\
+} while (0)
 
+#define fbnic_mk_fw_ver_str(_rev_id, _str) \
+	fbnic_mk_full_fw_ver_str(_rev_id, "", "", _str)
+
+#define FW_HEARTBEAT_PERIOD		(10 * HZ)
+
+enum {
+	FBNIC_TLV_MSG_ID_HOST_CAP_REQ			= 0x10,
+	FBNIC_TLV_MSG_ID_FW_CAP_RESP			= 0x11,
+	FBNIC_TLV_MSG_ID_OWNERSHIP_REQ			= 0x12,
+	FBNIC_TLV_MSG_ID_OWNERSHIP_RESP			= 0x13,
+	FBNIC_TLV_MSG_ID_HEARTBEAT_REQ			= 0x14,
+	FBNIC_TLV_MSG_ID_HEARTBEAT_RESP			= 0x15,
+};
+
+#define FBNIC_FW_CAP_RESP_VERSION_MAJOR		CSR_GENMASK(31, 24)
+#define FBNIC_FW_CAP_RESP_VERSION_MINOR		CSR_GENMASK(23, 16)
+#define FBNIC_FW_CAP_RESP_VERSION_PATCH		CSR_GENMASK(15, 8)
+#define FBNIC_FW_CAP_RESP_VERSION_BUILD		CSR_GENMASK(7, 0)
+enum {
+	FBNIC_FW_CAP_RESP_VERSION			= 0x0,
+	FBNIC_FW_CAP_RESP_BMC_PRESENT			= 0x1,
+	FBNIC_FW_CAP_RESP_BMC_MAC_ADDR			= 0x2,
+	FBNIC_FW_CAP_RESP_BMC_MAC_ARRAY			= 0x3,
+	FBNIC_FW_CAP_RESP_STORED_VERSION		= 0x4,
+	FBNIC_FW_CAP_RESP_ACTIVE_FW_SLOT		= 0x5,
+	FBNIC_FW_CAP_RESP_VERSION_COMMIT_STR		= 0x6,
+	FBNIC_FW_CAP_RESP_BMC_ALL_MULTI			= 0x8,
+	FBNIC_FW_CAP_RESP_FW_STATE			= 0x9,
+	FBNIC_FW_CAP_RESP_FW_LINK_SPEED			= 0xa,
+	FBNIC_FW_CAP_RESP_FW_LINK_FEC			= 0xb,
+	FBNIC_FW_CAP_RESP_STORED_COMMIT_STR		= 0xc,
+	FBNIC_FW_CAP_RESP_CMRT_VERSION			= 0xd,
+	FBNIC_FW_CAP_RESP_STORED_CMRT_VERSION		= 0xe,
+	FBNIC_FW_CAP_RESP_CMRT_COMMIT_STR		= 0xf,
+	FBNIC_FW_CAP_RESP_STORED_CMRT_COMMIT_STR	= 0x10,
+	FBNIC_FW_CAP_RESP_UEFI_VERSION			= 0x11,
+	FBNIC_FW_CAP_RESP_UEFI_COMMIT_STR		= 0x12,
+	FBNIC_FW_CAP_RESP_MSG_MAX
+};
+
+enum {
+	FBNIC_FW_OWNERSHIP_FLAG			= 0x0,
+	FBNIC_FW_OWNERSHIP_MSG_MAX
+};
 #endif /* _FBNIC_FW_H_ */
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
index 171b159cc006..bbc2f21060dc 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
@@ -11,6 +11,7 @@
 
 int __fbnic_open(struct fbnic_net *fbn)
 {
+	struct fbnic_dev *fbd = fbn->fbd;
 	int err;
 
 	err = fbnic_alloc_napi_vectors(fbn);
@@ -31,7 +32,22 @@ int __fbnic_open(struct fbnic_net *fbn)
 	if (err)
 		goto free_resources;
 
+	/* Send ownership message and flush to verify FW has seen it */
+	err = fbnic_fw_xmit_ownership_msg(fbd, true);
+	if (err) {
+		dev_warn(fbd->dev,
+			 "Error %d sending host ownership message to the firmware\n",
+			 err);
+		goto free_resources;
+	}
+
+	err = fbnic_fw_init_heartbeat(fbd, false);
+	if (err)
+		goto release_ownership;
+
 	return 0;
+release_ownership:
+	fbnic_fw_xmit_ownership_msg(fbn->fbd, false);
 free_resources:
 	fbnic_free_resources(fbn);
 free_napi_vectors:
@@ -57,6 +73,8 @@ static int fbnic_stop(struct net_device *netdev)
 
 	fbnic_down(fbn);
 
+	fbnic_fw_xmit_ownership_msg(fbn->fbd, false);
+
 	fbnic_free_resources(fbn);
 	fbnic_free_napi_vectors(fbn);
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
index d6598c81a5f9..8408f0d5f54a 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
@@ -160,6 +160,30 @@ void fbnic_down(struct fbnic_net *fbn)
 	fbnic_flush(fbn);
 }
 
+static void fbnic_health_check(struct fbnic_dev *fbd)
+{
+	struct fbnic_fw_mbx *tx_mbx = &fbd->mbx[FBNIC_IPC_MBX_TX_IDX];
+
+	/* As long as the heart is beating the FW is healthy */
+	if (fbd->fw_heartbeat_enabled)
+		return;
+
+	/* If the Tx mailbox still has messages sitting in it then there likely
+	 * isn't anything we can do. We will wait until the mailbox is empty to
+	 * report the fault so we can collect the crashlog.
+	 */
+	if (tx_mbx->head != tx_mbx->tail)
+		return;
+
+	/* TBD: Need to add a more thorough recovery here.
+	 *	Specifically we need to verify what the firmware may have
+	 *	changed between our setup and its reboot. It may be enough
+	 *	to perform a down/up. For now we will just reclaim ownership
+	 *	so the heartbeat can catch the next fault.
+	 */
+	fbnic_fw_xmit_ownership_msg(fbd, true);
+}
+
 static void fbnic_service_task(struct work_struct *work)
 {
 	struct fbnic_dev *fbd = container_of(to_delayed_work(work),
@@ -167,6 +191,10 @@ static void fbnic_service_task(struct work_struct *work)
 
 	rtnl_lock();
 
+	fbnic_fw_check_heartbeat(fbd);
+
+	fbnic_health_check(fbd);
+
 	if (netif_running(fbd->netdev))
 		schedule_delayed_work(&fbd->service_task, HZ);
 




* [net-next PATCH 11/15] eth: fbnic: Enable Ethernet link setup
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
                   ` (9 preceding siblings ...)
  2024-04-03 20:09 ` [net-next PATCH 10/15] eth: fbnic: Add initial messaging to notify FW of our presence Alexander Duyck
@ 2024-04-03 20:09 ` Alexander Duyck
  2024-04-03 21:11   ` Andrew Lunn
  2024-04-05 21:51   ` Andrew Lunn
  2024-04-03 20:09 ` [net-next PATCH 12/15] eth: fbnic: add basic Tx handling Alexander Duyck
                   ` (7 subsequent siblings)
  18 siblings, 2 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 20:09 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Duyck, kuba, davem, pabeni

From: Alexander Duyck <alexanderduyck@fb.com>

Add the logic needed to configure the Ethernet link and detect its state.
We have to partially rely on the FW for this, as parts of the MAC
configuration are shared between multiple ports, so we ask the firmware to
complete those pieces on our behalf.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 drivers/net/ethernet/meta/fbnic/fbnic.h        |   18 +
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h    |  143 ++++++
 drivers/net/ethernet/meta/fbnic/fbnic_fw.c     |   60 ++
 drivers/net/ethernet/meta/fbnic/fbnic_fw.h     |   22 +
 drivers/net/ethernet/meta/fbnic/fbnic_irq.c    |  118 +++++
 drivers/net/ethernet/meta/fbnic/fbnic_mac.c    |  587 ++++++++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_mac.h    |   58 ++
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c |   12 
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.h |    7 
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c    |   73 +++
 drivers/net/ethernet/meta/fbnic/fbnic_txrx.c   |   21 +
 drivers/net/ethernet/meta/fbnic/fbnic_txrx.h   |    1 
 12 files changed, 1119 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/meta/fbnic/fbnic.h b/drivers/net/ethernet/meta/fbnic/fbnic.h
index 4f18d703dae8..202f005e1cfd 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic.h
@@ -20,6 +20,7 @@ struct fbnic_dev {
 	const struct fbnic_mac *mac;
 	struct msix_entry *msix_entries;
 	unsigned int fw_msix_vector;
+	unsigned int mac_msix_vector;
 	unsigned short num_irqs;
 
 	struct delayed_work service_task;
@@ -37,6 +38,13 @@ struct fbnic_dev {
 	u32 mps;
 	u32 readrq;
 
+	/* Tri-state value indicating state of link.
+	 *  0 - Up
+	 *  1 - Down
+	 *  2 - Event - Requires checking as link state may have changed
+	 */
+	s8 link_state;
+
 	/* Number of TCQs/RCQs available on hardware */
 	u16 max_num_queues;
 };
@@ -48,6 +56,7 @@ struct fbnic_dev {
  */
 enum {
 	FBNIC_FW_MSIX_ENTRY,
+	FBNIC_MAC_MSIX_ENTRY,
 	FBNIC_NON_NAPI_VECTORS
 };
 
@@ -89,6 +98,11 @@ void fbnic_fw_wr32(struct fbnic_dev *fbd, u32 reg, u32 val);
 #define fw_wr32(reg, val)	fbnic_fw_wr32(fbd, reg, val)
 #define fw_wrfl()		fbnic_fw_rd32(fbd, FBNIC_FW_ZERO_REG)
 
+static inline bool fbnic_bmc_present(struct fbnic_dev *fbd)
+{
+	return fbd->fw_cap.bmc_present;
+}
+
 static inline bool fbnic_init_failure(struct fbnic_dev *fbd)
 {
 	return !fbd->netdev;
@@ -104,6 +118,10 @@ void fbnic_devlink_unregister(struct fbnic_dev *fbd);
 int fbnic_fw_enable_mbx(struct fbnic_dev *fbd);
 void fbnic_fw_disable_mbx(struct fbnic_dev *fbd);
 
+int fbnic_mac_get_link(struct fbnic_dev *fbd, bool *link);
+int fbnic_mac_enable(struct fbnic_dev *fbd);
+void fbnic_mac_disable(struct fbnic_dev *fbd);
+
 void fbnic_free_irqs(struct fbnic_dev *fbd);
 int fbnic_alloc_irqs(struct fbnic_dev *fbd);
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
index 8b035c4e068e..39c98d2dce12 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
@@ -58,6 +58,10 @@
 #define FBNIC_INTR_MSIX_CTRL(n)		(0x00040 + (n)) /* 0x00100 + 4*n */
 #define FBNIC_INTR_MSIX_CTRL_VECTOR_MASK	CSR_GENMASK(7, 0)
 #define FBNIC_INTR_MSIX_CTRL_ENABLE		CSR_BIT(31)
+enum {
+	FBNIC_INTR_MSIX_CTRL_MAC_IDX	= 6,
+	FBNIC_INTR_MSIX_CTRL_PCS_IDX	= 34,
+};
 
 #define FBNIC_CSR_END_INTR		0x0005f	/* CSR section delimiter */
 
@@ -392,6 +396,145 @@ enum {
 #define FBNIC_MASTER_SPARE_0		0x0C41B		/* 0x3106c */
 #define FBNIC_CSR_END_MASTER		0x0C41E	/* CSR section delimiter */
 
+/* MAC PCS registers */
+#define FBNIC_CSR_START_PCS		0x10000 /* CSR section delimiter */
+#define FBNIC_PCS_CONTROL1_0		0x10000		/* 0x40000 */
+#define FBNIC_PCS_CONTROL1_RESET		CSR_BIT(15)
+#define FBNIC_PCS_CONTROL1_LOOPBACK		CSR_BIT(14)
+#define FBNIC_PCS_CONTROL1_SPEED_SELECT_ALWAYS	CSR_BIT(13)
+#define FBNIC_PCS_CONTROL1_SPEED_ALWAYS		CSR_BIT(6)
+#define FBNIC_PCS_VENDOR_VL_INTVL_0	0x10202		/* 0x40808 */
+#define FBNIC_PCS_VL0_0_CHAN_0		0x10208		/* 0x40820 */
+#define FBNIC_PCS_VL0_1_CHAN_0		0x10209		/* 0x40824 */
+#define FBNIC_PCS_VL1_0_CHAN_0		0x1020a		/* 0x40828 */
+#define FBNIC_PCS_VL1_1_CHAN_0		0x1020b		/* 0x4082c */
+#define FBNIC_PCS_VL2_0_CHAN_0		0x1020c		/* 0x40830 */
+#define FBNIC_PCS_VL2_1_CHAN_0		0x1020d		/* 0x40834 */
+#define FBNIC_PCS_VL3_0_CHAN_0		0x1020e		/* 0x40838 */
+#define FBNIC_PCS_VL3_1_CHAN_0		0x1020f		/* 0x4083c */
+#define FBNIC_PCS_MODE_VL_CHAN_0	0x10210		/* 0x40840 */
+#define FBNIC_PCS_MODE_HI_BER25			CSR_BIT(2)
+#define FBNIC_PCS_MODE_DISABLE_MLD		CSR_BIT(1)
+#define FBNIC_PCS_MODE_ENA_CLAUSE49		CSR_BIT(0)
+#define FBNIC_PCS_CONTROL1_1		0x10400		/* 0x41000 */
+#define FBNIC_PCS_VENDOR_VL_INTVL_1	0x10602		/* 0x41808 */
+#define FBNIC_PCS_VL0_0_CHAN_1		0x10608		/* 0x41820 */
+#define FBNIC_PCS_VL0_1_CHAN_1		0x10609		/* 0x41824 */
+#define FBNIC_PCS_VL1_0_CHAN_1		0x1060a		/* 0x41828 */
+#define FBNIC_PCS_VL1_1_CHAN_1		0x1060b		/* 0x4182c */
+#define FBNIC_PCS_VL2_0_CHAN_1		0x1060c		/* 0x41830 */
+#define FBNIC_PCS_VL2_1_CHAN_1		0x1060d		/* 0x41834 */
+#define FBNIC_PCS_VL3_0_CHAN_1		0x1060e		/* 0x41838 */
+#define FBNIC_PCS_VL3_1_CHAN_1		0x1060f		/* 0x4183c */
+#define FBNIC_PCS_MODE_VL_CHAN_1	0x10610		/* 0x41840 */
+#define FBNIC_CSR_END_PCS		0x10668 /* CSR section delimiter */
+
+#define FBNIC_CSR_START_RSFEC		0x10800 /* CSR section delimiter */
+#define FBNIC_RSFEC_CONTROL(n)\
+				(0x10800 + 8 * (n))	/* 0x42000 + 32*n */
+#define FBNIC_RSFEC_CONTROL_AM16_COPY_DIS	CSR_BIT(3)
+#define FBNIC_RSFEC_CONTROL_KP_ENABLE		CSR_BIT(8)
+#define FBNIC_RSFEC_CONTROL_TC_PAD_ALTER	CSR_BIT(10)
+#define FBNIC_RSFEC_MAX_LANES			4
+#define FBNIC_RSFEC_CCW_LO(n) \
+				(0x10802 + 8 * (n))	/* 0x42008 + 32*n */
+#define FBNIC_RSFEC_CCW_HI(n) \
+				(0x10803 + 8 * (n))	/* 0x4200c + 32*n */
+#define FBNIC_RSFEC_NCCW_LO(n) \
+				(0x10804 + 8 * (n))	/* 0x42010 + 32*n */
+#define FBNIC_RSFEC_NCCW_HI(n) \
+				(0x10805 + 8 * (n))	/* 0x42014 + 32*n */
+#define FBNIC_RSFEC_SYMBLERR_LO(n) \
+				(0x10880 + 8 * (n))	/* 0x42200 + 32*n */
+#define FBNIC_RSFEC_SYMBLERR_HI(n) \
+				(0x10881 + 8 * (n))	/* 0x42204 + 32*n */
+#define FBNIC_CSR_END_RSFEC		0x108c8 /* CSR section delimiter */
+
+/* MAC MAC registers */
+#define FBNIC_CSR_START_MAC_MAC		0x11000 /* CSR section delimiter */
+#define FBNIC_MAC_COMMAND_CONFIG	0x11002		/* 0x44008 */
+#define FBNIC_MAC_COMMAND_CONFIG_RX_PAUSE_DIS	CSR_BIT(29)
+#define FBNIC_MAC_COMMAND_CONFIG_TX_PAUSE_DIS	CSR_BIT(28)
+#define FBNIC_MAC_COMMAND_CONFIG_FLT_HDL_DIS	CSR_BIT(27)
+#define FBNIC_MAC_COMMAND_CONFIG_TX_PAD_EN	CSR_BIT(11)
+#define FBNIC_MAC_COMMAND_CONFIG_LOOPBACK_EN	CSR_BIT(10)
+#define FBNIC_MAC_COMMAND_CONFIG_PROMISC_EN	CSR_BIT(4)
+#define FBNIC_MAC_COMMAND_CONFIG_RX_ENA		CSR_BIT(1)
+#define FBNIC_MAC_COMMAND_CONFIG_TX_ENA		CSR_BIT(0)
+#define FBNIC_MAC_FRM_LENGTH		0x11005		/* 0x44014 */
+#define FBNIC_MAC_TX_IPG_LENGTH		0x11011		/* 0x44044 */
+#define FBNIC_MAC_TX_IPG_LENGTH_COMP		CSR_GENMASK(31, 16)
+#define FBNIC_MAC_TX_IPG_LENGTH_TXIPG		CSR_GENMASK(5, 3)
+#define FBNIC_MAC_CL01_PAUSE_QUANTA	0x11015		/* 0x44054 */
+#define FBNIC_MAC_CL01_QUANTA_THRESH	0x11019		/* 0x44064 */
+#define FBNIC_MAC_XIF_MODE		0x11020		/* 0x44080 */
+#define FBNIC_MAC_XIF_MODE_TX_MAC_RS_ERR	CSR_BIT(8)
+#define FBNIC_MAC_XIF_MODE_XGMII		CSR_BIT(0)
+#define FBNIC_CSR_END_MAC_MAC		0x11028 /* CSR section delimiter */
+
+/* MAC CSR registers */
+#define FBNIC_CSR_START_MAC_CSR		0x11800 /* CSR section delimiter */
+#define FBNIC_MAC_CTRL			0x11800		/* 0x46000 */
+#define FBNIC_MAC_CTRL_RESET_FF_TX_CLK		CSR_BIT(14)
+#define FBNIC_MAC_CTRL_RESET_FF_RX_CLK		CSR_BIT(13)
+#define FBNIC_MAC_CTRL_RESET_TX_CLK		CSR_BIT(12)
+#define FBNIC_MAC_CTRL_RESET_RX_CLK		CSR_BIT(11)
+#define FBNIC_MAC_CTRL_TX_CRC			CSR_BIT(8)
+#define FBNIC_MAC_CTRL_CFG_MODE128		CSR_BIT(10)
+#define FBNIC_MAC_SERDES_CTRL		0x11807		/* 0x4601c */
+#define FBNIC_MAC_SERDES_CTRL_RESET_PCS_REF_CLK	CSR_BIT(26)
+#define FBNIC_MAC_SERDES_CTRL_RESET_F91_REF_CLK	CSR_BIT(25)
+#define FBNIC_MAC_SERDES_CTRL_RESET_SD_TX_CLK	CSR_GENMASK(24, 23)
+#define FBNIC_MAC_SERDES_CTRL_RESET_SD_RX_CLK	CSR_GENMASK(22, 21)
+#define FBNIC_MAC_SERDES_CTRL_SD_8X             CSR_GENMASK(18, 17)
+#define FBNIC_MAC_SERDES_CTRL_F91_1LANE_IN0	CSR_BIT(9)
+#define FBNIC_MAC_SERDES_CTRL_RXLAUI_ENA_IN0	CSR_BIT(7)
+#define FBNIC_MAC_SERDES_CTRL_PCS100_ENA_IN0    CSR_BIT(6)
+#define FBNIC_MAC_SERDES_CTRL_PACER_10G_MASK	CSR_GENMASK(1, 0)
+#define FBNIC_MAC_PCS_STS0		0x11808		/* 0x46020 */
+#define FBNIC_MAC_PCS_STS0_LINK			CSR_BIT(27)
+#define FBNIC_MAC_PCS_STS0_BLOCK_LOCK		CSR_GENMASK(24, 5)
+#define FBNIC_MAC_PCS_STS0_AMPS_LOCK		CSR_GENMASK(4, 1)
+#define FBNIC_MAC_PCS_STS1		0x11809		/* 0x46024 */
+#define FBNIC_MAC_PCS_STS1_FCFEC_LOCK		CSR_GENMASK(11, 8)
+#define FBNIC_MAC_PCS_INTR_STS		0x11814		/* 0x46050 */
+#define FBNIC_MAC_PCS_INTR_LINK_DOWN		CSR_BIT(1)
+#define FBNIC_MAC_PCS_INTR_LINK_UP		CSR_BIT(0)
+#define FBNIC_MAC_PCS_INTR_MASK		0x11816		/* 0x46058 */
+#define FBNIC_MAC_ENET_LED		0x11820		/* 0x46080 */
+#define FBNIC_MAC_ENET_LED_OVERRIDE_EN		CSR_GENMASK(2, 0)
+#define FBNIC_MAC_ENET_LED_OVERRIDE_VAL		CSR_GENMASK(6, 4)
+enum {
+	FBNIC_MAC_ENET_LED_OVERRIDE_ACTIVITY	= 0x1,
+	FBNIC_MAC_ENET_LED_OVERRIDE_AMBER	= 0x2,
+	FBNIC_MAC_ENET_LED_OVERRIDE_BLUE	= 0x4,
+};
+
+#define FBNIC_MAC_ENET_LED_BLINK_RATE_MASK	CSR_GENMASK(11, 8)
+enum {
+	FBNIC_MAC_ENET_LED_BLINK_RATE_5HZ	= 0xf,
+};
+
+#define FBNIC_MAC_ENET_LED_BLUE_MASK		CSR_GENMASK(18, 16)
+enum {
+	FBNIC_MAC_ENET_LED_BLUE_50G		= 0x2,
+	FBNIC_MAC_ENET_LED_BLUE_100G		= 0x4,
+};
+
+#define FBNIC_MAC_ENET_LED_AMBER_MASK		CSR_GENMASK(21, 20)
+enum {
+	FBNIC_MAC_ENET_LED_AMBER_25G		= 0x1,
+	FBNIC_MAC_ENET_LED_AMBER_50G		= 0x2,
+};
+
+#define FBNIC_MAC_ENET_SIG_DETECT	0x11824		/* 0x46090 */
+#define FBNIC_MAC_ENET_SIG_DETECT_PCS_MASK	CSR_GENMASK(1, 0)
+#define FBNIC_MAC_ENET_FEC_CTRL		0x11825		/* 0x46094 */
+#define FBNIC_MAC_ENET_FEC_CTRL_FEC_ENA		CSR_GENMASK(27, 24)
+#define FBNIC_MAC_ENET_FEC_CTRL_KP_MODE_ENA	CSR_GENMASK(11, 8)
+#define FBNIC_MAC_ENET_FEC_CTRL_F91_ENA		CSR_GENMASK(3, 0)
+#define FBNIC_CSR_END_MAC_CSR		0x1184e /* CSR section delimiter */
+
 /* PUL User Registers */
 #define FBNIC_CSR_START_PUL_USER	0x31000	/* CSR section delimiter */
 #define FBNIC_PUL_OB_TLP_HDR_AW_CFG	0x3103d		/* 0xc40f4 */
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_fw.c b/drivers/net/ethernet/meta/fbnic/fbnic_fw.c
index 4c3098364fed..af38d5934bbf 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_fw.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_fw.c
@@ -643,6 +643,63 @@ void fbnic_fw_check_heartbeat(struct fbnic_dev *fbd)
 		dev_warn(fbd->dev, "Failed to send heartbeat message\n");
 }
 
+/**
+ * fbnic_fw_xmit_comphy_set_msg - Create and transmit a comphy set request
+ *
+ * @fbd: FBNIC device structure
+ * @speed: Indicates link speed, composed of modulation and number of lanes
+ *
+ * Asks the firmware to reconfigure the comphy for this slice to the target
+ * speed.
+ *
+ * Return: 0 on success, negative value on failure
+ */
+int fbnic_fw_xmit_comphy_set_msg(struct fbnic_dev *fbd, u32 speed)
+{
+	struct fbnic_tlv_msg *msg;
+	int err = 0;
+
+	if (!fbnic_fw_present(fbd))
+		return -ENODEV;
+
+	msg = fbnic_tlv_msg_alloc(FBNIC_TLV_MSG_ID_COMPHY_SET_REQ);
+	if (!msg)
+		return -ENOMEM;
+
+	err = fbnic_tlv_attr_put_int(msg, FBNIC_COMPHY_SET_PAM4,
+				     !!(speed & FBNIC_LINK_MODE_PAM4));
+	if (err)
+		goto free_message;
+
+	err = fbnic_mbx_map_tlv_msg(fbd, msg);
+	if (err)
+		goto free_message;
+
+	return 0;
+
+free_message:
+	free_page((unsigned long)msg);
+	return err;
+}
+
+static const struct fbnic_tlv_index fbnic_comphy_set_resp_index[] = {
+	FBNIC_TLV_ATTR_S32(FBNIC_COMPHY_SET_ERROR),
+	FBNIC_TLV_ATTR_LAST
+};
+
+static int fbnic_fw_parse_comphy_set_resp(void *opaque,
+					  struct fbnic_tlv_msg **results)
+{
+	struct fbnic_dev *fbd = (struct fbnic_dev *)opaque;
+	int err_resp = 0;
+
+	get_signed_result(FBNIC_COMPHY_SET_ERROR, err_resp);
+	if (err_resp)
+		dev_err(fbd->dev, "COMPHY_SET returned %d\n", err_resp);
+
+	return 0;
+}
+
 static const struct fbnic_tlv_parser fbnic_fw_tlv_parser[] = {
 	FBNIC_TLV_PARSER(FW_CAP_RESP, fbnic_fw_cap_resp_index,
 			 fbnic_fw_parse_cap_resp),
@@ -650,6 +707,9 @@ static const struct fbnic_tlv_parser fbnic_fw_tlv_parser[] = {
 			 fbnic_fw_parse_ownership_resp),
 	FBNIC_TLV_PARSER(HEARTBEAT_RESP, fbnic_heartbeat_resp_index,
 			 fbnic_fw_parse_heartbeat_resp),
+	FBNIC_TLV_PARSER(COMPHY_SET_RESP,
+			 fbnic_comphy_set_resp_index,
+			 fbnic_fw_parse_comphy_set_resp),
 	FBNIC_TLV_MSG_ERROR
 };
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_fw.h b/drivers/net/ethernet/meta/fbnic/fbnic_fw.h
index 40d314f963ea..ea4802537d31 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_fw.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_fw.h
@@ -52,6 +52,7 @@ void fbnic_mbx_flush_tx(struct fbnic_dev *fbd);
 int fbnic_fw_xmit_ownership_msg(struct fbnic_dev *fbd, bool take_ownership);
 int fbnic_fw_init_heartbeat(struct fbnic_dev *fbd, bool poll);
 void fbnic_fw_check_heartbeat(struct fbnic_dev *fbd);
+int fbnic_fw_xmit_comphy_set_msg(struct fbnic_dev *fbd, u32 speed);
 
 #define fbnic_mk_full_fw_ver_str(_rev_id, _delim, _commit, _str)	\
 do {									\
@@ -76,6 +77,8 @@ enum {
 	FBNIC_TLV_MSG_ID_OWNERSHIP_RESP			= 0x13,
 	FBNIC_TLV_MSG_ID_HEARTBEAT_REQ			= 0x14,
 	FBNIC_TLV_MSG_ID_HEARTBEAT_RESP			= 0x15,
+	FBNIC_TLV_MSG_ID_COMPHY_SET_REQ			= 0x3E,
+	FBNIC_TLV_MSG_ID_COMPHY_SET_RESP		= 0x3F,
 };
 
 #define FBNIC_FW_CAP_RESP_VERSION_MAJOR		CSR_GENMASK(31, 24)
@@ -104,8 +107,27 @@ enum {
 	FBNIC_FW_CAP_RESP_MSG_MAX
 };
 
+enum {
+	FBNIC_FW_LINK_SPEED_25R1		= 1,
+	FBNIC_FW_LINK_SPEED_50R2		= 2,
+	FBNIC_FW_LINK_SPEED_50R1		= 3,
+	FBNIC_FW_LINK_SPEED_100R2		= 4,
+};
+
+enum {
+	FBNIC_FW_LINK_FEC_NONE			= 1,
+	FBNIC_FW_LINK_FEC_RS			= 2,
+	FBNIC_FW_LINK_FEC_BASER			= 3,
+};
+
 enum {
 	FBNIC_FW_OWNERSHIP_FLAG			= 0x0,
 	FBNIC_FW_OWNERSHIP_MSG_MAX
 };
+
+enum {
+	FBNIC_COMPHY_SET_PAM4			= 0x0,
+	FBNIC_COMPHY_SET_ERROR			= 0x1,
+	FBNIC_COMPHY_SET_MSG_MAX
+};
 #endif /* _FBNIC_FW_H_ */
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_irq.c b/drivers/net/ethernet/meta/fbnic/fbnic_irq.c
index a20070683f48..33b5f15e2c40 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_irq.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_irq.c
@@ -84,11 +84,127 @@ void fbnic_fw_disable_mbx(struct fbnic_dev *fbd)
 	fbnic_mbx_clean(fbd);
 }
 
+static irqreturn_t fbnic_mac_msix_intr(int __always_unused irq, void *data)
+{
+	struct fbnic_dev *fbd = data;
+
+	if (fbd->mac->get_link_event(fbd))
+		fbd->link_state = FBNIC_LINK_EVENT;
+	else
+		wr32(FBNIC_INTR_MASK_CLEAR(0), 1u << FBNIC_MAC_MSIX_ENTRY);
+
+	return IRQ_HANDLED;
+}
+
+/**
+ * fbnic_mac_get_link - Retrieve the current link state of the MAC
+ * @fbd: Device to retrieve the link state of
+ * @link: pointer to boolean value that will store link state
+ *
+ * This function queries the hardware to determine the link status of the
+ * device. If it is unable to communicate with the device it returns
+ * -ENODEV and sets *link to false, indicating the link is down.
+ **/
+int fbnic_mac_get_link(struct fbnic_dev *fbd, bool *link)
+{
+	const struct fbnic_mac *mac = fbd->mac;
+
+	*link = true;
+
+	/* In an interrupt-driven setup we can skip the check while the
+	 * link is up, as the interrupt will have moved us to the EVENT
+	 * state if the link changed at any time since the last check.
+	 */
+	if (fbd->link_state == FBNIC_LINK_UP)
+		goto skip_check;
+
+	*link = mac->get_link(fbd);
+
+	wr32(FBNIC_INTR_MASK_CLEAR(0), 1u << FBNIC_MAC_MSIX_ENTRY);
+skip_check:
+	if (!fbnic_present(fbd)) {
+		*link = false;
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+/**
+ * fbnic_mac_enable - Configure the MAC to enable it to advertise link
+ * @fbd: Pointer to device to initialize
+ *
+ * This function provides basic bringup for the CMAC and sets the link
+ * state to FBNIC_LINK_EVENT which tells the link state check that the
+ * current state is unknown and that interrupts must be enabled after the
+ * check is completed.
+ **/
+int fbnic_mac_enable(struct fbnic_dev *fbd)
+{
+	const struct fbnic_mac *mac = fbd->mac;
+	u32 vector = fbd->mac_msix_vector;
+	int err;
+
+	/* Request the IRQ for MAC link vector.
+	 * Map MAC cause to it, and unmask it
+	 */
+	err = request_irq(vector, &fbnic_mac_msix_intr, 0,
+			  fbd->netdev->name, fbd);
+	if (err)
+		return err;
+
+	wr32(FBNIC_INTR_MSIX_CTRL(FBNIC_INTR_MSIX_CTRL_PCS_IDX),
+	     FBNIC_MAC_MSIX_ENTRY | FBNIC_INTR_MSIX_CTRL_ENABLE);
+
+	err = mac->enable(fbd);
+	if (err) {
+		/* Disable interrupt */
+		wr32(FBNIC_INTR_MSIX_CTRL(FBNIC_INTR_MSIX_CTRL_PCS_IDX),
+		     FBNIC_MAC_MSIX_ENTRY);
+		wr32(FBNIC_INTR_MASK_SET(0), 1u << FBNIC_MAC_MSIX_ENTRY);
+
+		/* Free the vector */
+		free_irq(fbd->mac_msix_vector, fbd);
+	}
+
+	return err;
+}
+
+/**
+ * fbnic_mac_disable - Teardown the MAC to prepare for stopping
+ * @fbd: Pointer to device that is stopping
+ *
+ * This function undoes the work done in fbnic_mac_enable and prepares the
+ * device to no longer receive traffic on the host interface.
+ **/
+void fbnic_mac_disable(struct fbnic_dev *fbd)
+{
+	const struct fbnic_mac *mac = fbd->mac;
+
+	/* Nothing to do if link is already disabled */
+	if (fbd->link_state == FBNIC_LINK_DISABLED)
+		return;
+
+	mac->disable(fbd);
+
+	/* Disable interrupt */
+	wr32(FBNIC_INTR_MSIX_CTRL(FBNIC_INTR_MSIX_CTRL_PCS_IDX),
+	     FBNIC_MAC_MSIX_ENTRY);
+	wr32(FBNIC_INTR_MASK_SET(0), 1u << FBNIC_MAC_MSIX_ENTRY);
+
+	/* Free the vector */
+	free_irq(fbd->mac_msix_vector, fbd);
+}
+
 void fbnic_free_irqs(struct fbnic_dev *fbd)
 {
 	struct pci_dev *pdev = to_pci_dev(fbd->dev);
 
+	fbd->mac_msix_vector = 0;
 	fbd->fw_msix_vector = 0;
+
 	fbd->num_irqs = 0;
 
 	pci_disable_msix(pdev);
@@ -128,6 +244,8 @@ int fbnic_alloc_irqs(struct fbnic_dev *fbd)
 	fbd->msix_entries = msix_entries;
 	fbd->num_irqs = num_irqs;
 
+	fbd->mac_msix_vector = msix_entries[FBNIC_MAC_MSIX_ENTRY].vector;
 	fbd->fw_msix_vector = msix_entries[FBNIC_FW_MSIX_ENTRY].vector;
+
 	return 0;
 }
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_mac.c b/drivers/net/ethernet/meta/fbnic/fbnic_mac.c
index dbbfdc649f37..64c4dde30b9d 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_mac.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_mac.c
@@ -2,10 +2,12 @@
 /* Copyright (c) Meta Platforms, Inc. and affiliates. */
 
 #include <linux/bitfield.h>
+#include <linux/iopoll.h>
 #include <net/tcp.h>
 
 #include "fbnic.h"
 #include "fbnic_mac.h"
+#include "fbnic_netdev.h"
 
 static void fbnic_init_readrq(struct fbnic_dev *fbd, unsigned int offset,
 			      unsigned int cls, unsigned int readrq)
@@ -415,8 +417,593 @@ static void fbnic_mac_init_regs(struct fbnic_dev *fbd)
 	fbnic_mac_init_txb(fbd);
 }
 
+static int fbnic_mac_get_link_event_asic(struct fbnic_dev *fbd)
+{
+	u32 pcs_intr_mask = rd32(FBNIC_MAC_PCS_INTR_STS);
+
+	if (pcs_intr_mask & FBNIC_MAC_PCS_INTR_LINK_DOWN)
+		return -1;
+
+	return (pcs_intr_mask & FBNIC_MAC_PCS_INTR_LINK_UP) ? 1 : 0;
+}
+
+static u32 __fbnic_mac_config_asic(struct fbnic_dev *fbd)
+{
+	/* Enable MAC Promiscuous mode and Tx padding */
+	u32 command_config = FBNIC_MAC_COMMAND_CONFIG_TX_PAD_EN |
+			     FBNIC_MAC_COMMAND_CONFIG_PROMISC_EN;
+	struct fbnic_net *fbn = netdev_priv(fbd->netdev);
+	u32 rxb_pause_ctrl;
+
+	/* Set class 0 Quanta and refresh */
+	wr32(FBNIC_MAC_CL01_PAUSE_QUANTA, 0xffff);
+	wr32(FBNIC_MAC_CL01_QUANTA_THRESH, 0x7fff);
+
+	/* Enable generation of pause frames if enabled */
+	rxb_pause_ctrl = rd32(FBNIC_RXB_PAUSE_DROP_CTRL);
+	rxb_pause_ctrl &= ~FBNIC_RXB_PAUSE_DROP_CTRL_PAUSE_ENABLE;
+	if (!fbn->tx_pause)
+		command_config |= FBNIC_MAC_COMMAND_CONFIG_TX_PAUSE_DIS;
+	else
+		rxb_pause_ctrl |=
+			FIELD_PREP(FBNIC_RXB_PAUSE_DROP_CTRL_PAUSE_ENABLE,
+				   FBNIC_PAUSE_EN_MASK);
+	wr32(FBNIC_RXB_PAUSE_DROP_CTRL, rxb_pause_ctrl);
+
+	if (!fbn->rx_pause)
+		command_config |= FBNIC_MAC_COMMAND_CONFIG_RX_PAUSE_DIS;
+
+	/* Disable fault handling if no FEC is requested */
+	if ((fbn->fec & FBNIC_FEC_MODE_MASK) == FBNIC_FEC_OFF)
+		command_config |= FBNIC_MAC_COMMAND_CONFIG_FLT_HDL_DIS;
+
+	return command_config;
+}
+
+static bool fbnic_mac_get_pcs_link_status(struct fbnic_dev *fbd)
+{
+	struct fbnic_net *fbn = netdev_priv(fbd->netdev);
+	u32 pcs_status, lane_mask = ~0;
+
+	pcs_status = rd32(FBNIC_MAC_PCS_STS0);
+	if (!(pcs_status & FBNIC_MAC_PCS_STS0_LINK))
+		return false;
+
+	/* Define the expected lane mask for the status bits we need to check */
+	switch (fbn->link_mode & FBNIC_LINK_MODE_MASK) {
+	case FBNIC_LINK_100R2:
+		lane_mask = 0xf;
+		break;
+	case FBNIC_LINK_50R1:
+		lane_mask = 3;
+		break;
+	case FBNIC_LINK_50R2:
+		switch (fbn->fec & FBNIC_FEC_MODE_MASK) {
+		case FBNIC_FEC_OFF:
+			lane_mask = 0x63;
+			break;
+		case FBNIC_FEC_RS:
+			lane_mask = 5;
+			break;
+		case FBNIC_FEC_BASER:
+			lane_mask = 0xf;
+			break;
+		}
+		break;
+	case FBNIC_LINK_25R1:
+		lane_mask = 1;
+		break;
+	}
+
+	/* Use an XOR to remove the bits we expect to see set */
+	switch (fbn->fec & FBNIC_FEC_MODE_MASK) {
+	case FBNIC_FEC_OFF:
+		lane_mask ^= FIELD_GET(FBNIC_MAC_PCS_STS0_BLOCK_LOCK,
+				       pcs_status);
+		break;
+	case FBNIC_FEC_RS:
+		lane_mask ^= FIELD_GET(FBNIC_MAC_PCS_STS0_AMPS_LOCK,
+				       pcs_status);
+		break;
+	case FBNIC_FEC_BASER:
+		lane_mask ^= FIELD_GET(FBNIC_MAC_PCS_STS1_FCFEC_LOCK,
+				       rd32(FBNIC_MAC_PCS_STS1));
+		break;
+	}
+
+	/* If all expected bits cancelled out then we have lock on all lanes */
+	return !lane_mask;
+}
+
+#define FBNIC_MAC_ENET_LED_DEFAULT				\
+	(FIELD_PREP(FBNIC_MAC_ENET_LED_AMBER_MASK,		\
+		    FBNIC_MAC_ENET_LED_AMBER_50G |		\
+		    FBNIC_MAC_ENET_LED_AMBER_25G) |		\
+	 FIELD_PREP(FBNIC_MAC_ENET_LED_BLUE_MASK,		\
+		    FBNIC_MAC_ENET_LED_BLUE_100G |		\
+		    FBNIC_MAC_ENET_LED_BLUE_50G))
+#define FBNIC_MAC_ENET_LED_ACTIVITY_DEFAULT			\
+	FIELD_PREP(FBNIC_MAC_ENET_LED_BLINK_RATE_MASK,		\
+		   FBNIC_MAC_ENET_LED_BLINK_RATE_5HZ)
+#define FBNIC_MAC_ENET_LED_ACTIVITY_ON				\
+	FIELD_PREP(FBNIC_MAC_ENET_LED_OVERRIDE_EN,		\
+		   FBNIC_MAC_ENET_LED_OVERRIDE_ACTIVITY)
+#define FBNIC_MAC_ENET_LED_AMBER				\
+	(FIELD_PREP(FBNIC_MAC_ENET_LED_OVERRIDE_EN,		\
+		    FBNIC_MAC_ENET_LED_OVERRIDE_BLUE |		\
+		    FBNIC_MAC_ENET_LED_OVERRIDE_AMBER) |	\
+	 FIELD_PREP(FBNIC_MAC_ENET_LED_OVERRIDE_VAL,		\
+		    FBNIC_MAC_ENET_LED_OVERRIDE_AMBER))
+#define FBNIC_MAC_ENET_LED_BLUE					\
+	(FIELD_PREP(FBNIC_MAC_ENET_LED_OVERRIDE_EN,		\
+		    FBNIC_MAC_ENET_LED_OVERRIDE_BLUE |		\
+		    FBNIC_MAC_ENET_LED_OVERRIDE_AMBER) |	\
+	 FIELD_PREP(FBNIC_MAC_ENET_LED_OVERRIDE_VAL,		\
+		    FBNIC_MAC_ENET_LED_OVERRIDE_BLUE))
+
+static void fbnic_set_led_state_asic(struct fbnic_dev *fbd, int state)
+{
+	struct fbnic_net *fbn = netdev_priv(fbd->netdev);
+	u32 led_csr = FBNIC_MAC_ENET_LED_DEFAULT;
+
+	switch (state) {
+	case FBNIC_LED_OFF:
+		led_csr |= FBNIC_MAC_ENET_LED_AMBER |
+			   FBNIC_MAC_ENET_LED_ACTIVITY_ON;
+		break;
+	case FBNIC_LED_ON:
+		led_csr |= FBNIC_MAC_ENET_LED_BLUE |
+			   FBNIC_MAC_ENET_LED_ACTIVITY_ON;
+		break;
+	case FBNIC_LED_RESTORE:
+		led_csr |= FBNIC_MAC_ENET_LED_ACTIVITY_DEFAULT;
+
+		/* Don't set LEDs on if link isn't up */
+		if (fbd->link_state != FBNIC_LINK_UP)
+			break;
+		/* Don't set LEDs for supported autoneg modes */
+		if ((fbn->link_mode & FBNIC_LINK_AUTO) &&
+		    (fbn->link_mode & FBNIC_LINK_MODE_MASK) != FBNIC_LINK_50R2)
+			break;
+
+		/* Set LEDs based on link speed:
+		 * 100G	Blue
+		 * 50G	Blue & Amber
+		 * 25G	Amber
+		 */
+		switch (fbn->link_mode & FBNIC_LINK_MODE_MASK) {
+		case FBNIC_LINK_100R2:
+			led_csr |= FBNIC_MAC_ENET_LED_BLUE;
+			break;
+		case FBNIC_LINK_50R1:
+		case FBNIC_LINK_50R2:
+			led_csr |= FBNIC_MAC_ENET_LED_BLUE;
+			fallthrough;
+		case FBNIC_LINK_25R1:
+			led_csr |= FBNIC_MAC_ENET_LED_AMBER;
+			break;
+		}
+		break;
+	default:
+		return;
+	}
+
+	wr32(FBNIC_MAC_ENET_LED, led_csr);
+}
+
+static bool fbnic_mac_get_link_asic(struct fbnic_dev *fbd)
+{
+	u32 cmd_cfg, mac_ctrl;
+	int link_direction;
+	bool link;
+
+	/* If disabled do not update link_state nor change settings */
+	if (fbd->link_state == FBNIC_LINK_DISABLED)
+		return false;
+
+	link_direction = fbnic_mac_get_link_event_asic(fbd);
+
+	/* Clear interrupt state due to recent changes. */
+	wr32(FBNIC_MAC_PCS_INTR_STS,
+	     FBNIC_MAC_PCS_INTR_LINK_DOWN | FBNIC_MAC_PCS_INTR_LINK_UP);
+
+	/* If link bounced down clear the PCS_STS bit related to link */
+	if (link_direction < 0) {
+		wr32(FBNIC_MAC_PCS_STS0, FBNIC_MAC_PCS_STS0_LINK |
+					 FBNIC_MAC_PCS_STS0_BLOCK_LOCK |
+					 FBNIC_MAC_PCS_STS0_AMPS_LOCK);
+		wr32(FBNIC_MAC_PCS_STS1, FBNIC_MAC_PCS_STS1_FCFEC_LOCK);
+	}
+
+	link = fbnic_mac_get_pcs_link_status(fbd);
+	cmd_cfg = __fbnic_mac_config_asic(fbd);
+	mac_ctrl = rd32(FBNIC_MAC_CTRL);
+
+	/* Depending on the event we will unmask the cause that will force a
+	 * transition, and update the Tx to reflect our status to the remote
+	 * link partner.
+	 */
+	if (link) {
+		mac_ctrl &= ~(FBNIC_MAC_CTRL_RESET_FF_TX_CLK |
+			      FBNIC_MAC_CTRL_RESET_TX_CLK |
+			      FBNIC_MAC_CTRL_RESET_FF_RX_CLK |
+			      FBNIC_MAC_CTRL_RESET_RX_CLK);
+		cmd_cfg |= FBNIC_MAC_COMMAND_CONFIG_RX_ENA |
+			   FBNIC_MAC_COMMAND_CONFIG_TX_ENA;
+		fbd->link_state = FBNIC_LINK_UP;
+	} else {
+		mac_ctrl |= FBNIC_MAC_CTRL_RESET_FF_TX_CLK |
+			    FBNIC_MAC_CTRL_RESET_TX_CLK |
+			    FBNIC_MAC_CTRL_RESET_FF_RX_CLK |
+			    FBNIC_MAC_CTRL_RESET_RX_CLK;
+		fbd->link_state = FBNIC_LINK_DOWN;
+	}
+
+	wr32(FBNIC_MAC_CTRL, mac_ctrl);
+	wr32(FBNIC_MAC_COMMAND_CONFIG, cmd_cfg);
+
+	/* Toggle LED settings to enable LEDs manually if necessary */
+	fbnic_set_led_state_asic(fbd, FBNIC_LED_RESTORE);
+
+	if (link_direction)
+		wr32(FBNIC_MAC_PCS_INTR_MASK,
+		     link ?  ~FBNIC_MAC_PCS_INTR_LINK_DOWN :
+			     ~FBNIC_MAC_PCS_INTR_LINK_UP);
+
+	return link;
+}
+
+static void fbnic_mac_pre_config(struct fbnic_dev *fbd)
+{
+	u32 serdes_ctrl, mac_ctrl, xif_mode, enet_fec_ctrl = 0;
+	struct fbnic_net *fbn = netdev_priv(fbd->netdev);
+
+	/* set reset bits and enable appending of Tx CRC */
+	mac_ctrl = FBNIC_MAC_CTRL_RESET_FF_TX_CLK |
+		   FBNIC_MAC_CTRL_RESET_FF_RX_CLK |
+		   FBNIC_MAC_CTRL_RESET_TX_CLK |
+		   FBNIC_MAC_CTRL_RESET_RX_CLK |
+		   FBNIC_MAC_CTRL_TX_CRC;
+	serdes_ctrl = FBNIC_MAC_SERDES_CTRL_RESET_PCS_REF_CLK |
+		      FBNIC_MAC_SERDES_CTRL_RESET_F91_REF_CLK |
+		      FBNIC_MAC_SERDES_CTRL_RESET_SD_TX_CLK |
+		      FBNIC_MAC_SERDES_CTRL_RESET_SD_RX_CLK;
+	xif_mode = FBNIC_MAC_XIF_MODE_TX_MAC_RS_ERR;
+
+	switch (fbn->link_mode & FBNIC_LINK_MODE_MASK) {
+	case FBNIC_LINK_25R1:
+		/* Enable XGMII to run w/ 10G pacer */
+		xif_mode |= FBNIC_MAC_XIF_MODE_XGMII;
+		serdes_ctrl |= FBNIC_MAC_SERDES_CTRL_PACER_10G_MASK;
+		if (fbn->fec & FBNIC_FEC_RS)
+			serdes_ctrl |= FBNIC_MAC_SERDES_CTRL_F91_1LANE_IN0;
+		break;
+	case FBNIC_LINK_50R2:
+		if (!(fbn->fec & FBNIC_FEC_RS))
+			serdes_ctrl |= FBNIC_MAC_SERDES_CTRL_RXLAUI_ENA_IN0;
+		break;
+	case FBNIC_LINK_100R2:
+		mac_ctrl |= FBNIC_MAC_CTRL_CFG_MODE128;
+		serdes_ctrl |= FBNIC_MAC_SERDES_CTRL_PCS100_ENA_IN0;
+		enet_fec_ctrl |= FBNIC_MAC_ENET_FEC_CTRL_KP_MODE_ENA;
+		fallthrough;
+	case FBNIC_LINK_50R1:
+		serdes_ctrl |= FBNIC_MAC_SERDES_CTRL_SD_8X;
+		if (fbn->fec & FBNIC_FEC_AUTO)
+			fbn->fec = FBNIC_FEC_AUTO | FBNIC_FEC_RS;
+		break;
+	}
+
+	switch (fbn->fec & FBNIC_FEC_MODE_MASK) {
+	case FBNIC_FEC_RS:
+		enet_fec_ctrl |= FBNIC_MAC_ENET_FEC_CTRL_F91_ENA;
+		break;
+	case FBNIC_FEC_BASER:
+		enet_fec_ctrl |= FBNIC_MAC_ENET_FEC_CTRL_FEC_ENA;
+		break;
+	case FBNIC_FEC_OFF:
+		break;
+	default:
+		dev_err(fbd->dev, "Unsupported FEC mode detected\n");
+	}
+
+	/* Store updated config to MAC */
+	wr32(FBNIC_MAC_CTRL, mac_ctrl);
+	wr32(FBNIC_MAC_SERDES_CTRL, serdes_ctrl);
+	wr32(FBNIC_MAC_XIF_MODE, xif_mode);
+	wr32(FBNIC_MAC_ENET_FEC_CTRL, enet_fec_ctrl);
+
+	/* flush writes to allow time for MAC to go into resets */
+	wrfl();
+
+	/* Set signal detect for all lanes */
+	wr32(FBNIC_MAC_ENET_SIG_DETECT, FBNIC_MAC_ENET_SIG_DETECT_PCS_MASK);
+}
+
+static void fbnic_mac_pcs_config(struct fbnic_dev *fbd)
+{
+	u32 pcs_mode = 0, rsfec_ctrl = 0, vl_intvl = 0;
+	struct fbnic_net *fbn = netdev_priv(fbd->netdev);
+	int i;
+
+	/* Set link mode specific lane and FEC values */
+	switch (fbn->link_mode & FBNIC_LINK_MODE_MASK) {
+	case FBNIC_LINK_25R1:
+		if (fbn->fec & FBNIC_FEC_RS)
+			vl_intvl = 20479;
+		else
+			pcs_mode |= FBNIC_PCS_MODE_DISABLE_MLD;
+		pcs_mode |= FBNIC_PCS_MODE_HI_BER25 |
+			    FBNIC_PCS_MODE_ENA_CLAUSE49;
+		break;
+	case FBNIC_LINK_50R1:
+		rsfec_ctrl |= FBNIC_RSFEC_CONTROL_KP_ENABLE;
+		fallthrough;
+	case FBNIC_LINK_50R2:
+		rsfec_ctrl |= FBNIC_RSFEC_CONTROL_TC_PAD_ALTER;
+		vl_intvl = 20479;
+		break;
+	case FBNIC_LINK_100R2:
+		rsfec_ctrl |= FBNIC_RSFEC_CONTROL_AM16_COPY_DIS |
+			      FBNIC_RSFEC_CONTROL_KP_ENABLE;
+		pcs_mode |= FBNIC_PCS_MODE_DISABLE_MLD;
+		vl_intvl = 16383;
+		break;
+	}
+
+	for (i = 0; i < 4; i++)
+		wr32(FBNIC_RSFEC_CONTROL(i), rsfec_ctrl);
+
+	wr32(FBNIC_PCS_MODE_VL_CHAN_0, pcs_mode);
+	wr32(FBNIC_PCS_MODE_VL_CHAN_1, pcs_mode);
+
+	wr32(FBNIC_PCS_VENDOR_VL_INTVL_0, vl_intvl);
+	wr32(FBNIC_PCS_VENDOR_VL_INTVL_1, vl_intvl);
+
+	/* Update IPG to account for vl_intvl */
+	wr32(FBNIC_MAC_TX_IPG_LENGTH,
+	     FIELD_PREP(FBNIC_MAC_TX_IPG_LENGTH_COMP, vl_intvl) | 0xc);
+
+	/* Program lane markers indicating which lanes are in use
+	 * and what speeds we are transmitting at.
+	 */
+	switch (fbn->link_mode & FBNIC_LINK_MODE_MASK) {
+	case FBNIC_LINK_100R2:
+		wr32(FBNIC_PCS_VL0_0_CHAN_0, 0x68c1);
+		wr32(FBNIC_PCS_VL0_1_CHAN_0, 0x21);
+		wr32(FBNIC_PCS_VL1_0_CHAN_0, 0x719d);
+		wr32(FBNIC_PCS_VL1_1_CHAN_0, 0x8e);
+		wr32(FBNIC_PCS_VL2_0_CHAN_0, 0x4b59);
+		wr32(FBNIC_PCS_VL2_1_CHAN_0, 0xe8);
+		wr32(FBNIC_PCS_VL3_0_CHAN_0, 0x954d);
+		wr32(FBNIC_PCS_VL3_1_CHAN_0, 0x7b);
+		wr32(FBNIC_PCS_VL0_0_CHAN_1, 0x68c1);
+		wr32(FBNIC_PCS_VL0_1_CHAN_1, 0x21);
+		wr32(FBNIC_PCS_VL1_0_CHAN_1, 0x719d);
+		wr32(FBNIC_PCS_VL1_1_CHAN_1, 0x8e);
+		wr32(FBNIC_PCS_VL2_0_CHAN_1, 0x4b59);
+		wr32(FBNIC_PCS_VL2_1_CHAN_1, 0xe8);
+		wr32(FBNIC_PCS_VL3_0_CHAN_1, 0x954d);
+		wr32(FBNIC_PCS_VL3_1_CHAN_1, 0x7b);
+		break;
+	case FBNIC_LINK_50R2:
+		wr32(FBNIC_PCS_VL0_0_CHAN_1, 0x7690);
+		wr32(FBNIC_PCS_VL0_1_CHAN_1, 0x47);
+		wr32(FBNIC_PCS_VL1_0_CHAN_1, 0xc4f0);
+		wr32(FBNIC_PCS_VL1_1_CHAN_1, 0xe6);
+		wr32(FBNIC_PCS_VL2_0_CHAN_1, 0x65c5);
+		wr32(FBNIC_PCS_VL2_1_CHAN_1, 0x9b);
+		wr32(FBNIC_PCS_VL3_0_CHAN_1, 0x79a2);
+		wr32(FBNIC_PCS_VL3_1_CHAN_1, 0x3d);
+		fallthrough;
+	case FBNIC_LINK_50R1:
+		wr32(FBNIC_PCS_VL0_0_CHAN_0, 0x7690);
+		wr32(FBNIC_PCS_VL0_1_CHAN_0, 0x47);
+		wr32(FBNIC_PCS_VL1_0_CHAN_0, 0xc4f0);
+		wr32(FBNIC_PCS_VL1_1_CHAN_0, 0xe6);
+		wr32(FBNIC_PCS_VL2_0_CHAN_0, 0x65c5);
+		wr32(FBNIC_PCS_VL2_1_CHAN_0, 0x9b);
+		wr32(FBNIC_PCS_VL3_0_CHAN_0, 0x79a2);
+		wr32(FBNIC_PCS_VL3_1_CHAN_0, 0x3d);
+		break;
+	case FBNIC_LINK_25R1:
+		wr32(FBNIC_PCS_VL0_0_CHAN_0, 0x68c1);
+		wr32(FBNIC_PCS_VL0_1_CHAN_0, 0x21);
+		wr32(FBNIC_PCS_VL1_0_CHAN_0, 0xc4f0);
+		wr32(FBNIC_PCS_VL1_1_CHAN_0, 0xe6);
+		wr32(FBNIC_PCS_VL2_0_CHAN_0, 0x65c5);
+		wr32(FBNIC_PCS_VL2_1_CHAN_0, 0x9b);
+		wr32(FBNIC_PCS_VL3_0_CHAN_0, 0x79a2);
+		wr32(FBNIC_PCS_VL3_1_CHAN_0, 0x3d);
+		break;
+	}
+}
+
+static bool fbnic_mac_pcs_reset_complete(struct fbnic_dev *fbd)
+{
+	return !(rd32(FBNIC_PCS_CONTROL1_0) & FBNIC_PCS_CONTROL1_RESET) &&
+	       !(rd32(FBNIC_PCS_CONTROL1_1) & FBNIC_PCS_CONTROL1_RESET);
+}
+
+static int fbnic_mac_post_config(struct fbnic_dev *fbd)
+{
+	struct fbnic_net *fbn = netdev_priv(fbd->netdev);
+	u32 serdes_ctrl, reset_complete, lane_mask;
+	int err;
+
+	/* Clear resets for XPCS and F91 reference clocks */
+	serdes_ctrl = rd32(FBNIC_MAC_SERDES_CTRL);
+	serdes_ctrl &= ~FBNIC_MAC_SERDES_CTRL_RESET_PCS_REF_CLK;
+	if (fbn->fec & FBNIC_FEC_RS)
+		serdes_ctrl &= ~FBNIC_MAC_SERDES_CTRL_RESET_F91_REF_CLK;
+	wr32(FBNIC_MAC_SERDES_CTRL, serdes_ctrl);
+
+	/* Reset PCS and flush reset value */
+	wr32(FBNIC_PCS_CONTROL1_0,
+	     FBNIC_PCS_CONTROL1_RESET |
+	     FBNIC_PCS_CONTROL1_SPEED_SELECT_ALWAYS |
+	     FBNIC_PCS_CONTROL1_SPEED_ALWAYS);
+	wr32(FBNIC_PCS_CONTROL1_1,
+	     FBNIC_PCS_CONTROL1_RESET |
+	     FBNIC_PCS_CONTROL1_SPEED_SELECT_ALWAYS |
+	     FBNIC_PCS_CONTROL1_SPEED_ALWAYS);
+
+	/* poll for completion of reset */
+	err = readx_poll_timeout(fbnic_mac_pcs_reset_complete, fbd,
+				 reset_complete, reset_complete,
+				 1000, 150000);
+	if (err)
+		return err;
+
+	/* Flush any stale link status info */
+	wr32(FBNIC_MAC_PCS_STS0, FBNIC_MAC_PCS_STS0_LINK |
+				 FBNIC_MAC_PCS_STS0_BLOCK_LOCK |
+				 FBNIC_MAC_PCS_STS0_AMPS_LOCK);
+
+	/* Report starting state as "Link Event" to force detection of link */
+	fbd->link_state = FBNIC_LINK_EVENT;
+
+	/* Force link down to allow for link detection */
+	netif_carrier_off(fbn->netdev);
+
+	/* create simple bitmask for 2 or 1 lane setups */
+	lane_mask = (fbn->link_mode & FBNIC_LINK_MODE_R2) ? 3 : 1;
+
+	/* release the brakes and allow Tx/Rx to come out of reset */
+	serdes_ctrl &=
+	     ~(FIELD_PREP(FBNIC_MAC_SERDES_CTRL_RESET_SD_TX_CLK, lane_mask) |
+	       FIELD_PREP(FBNIC_MAC_SERDES_CTRL_RESET_SD_RX_CLK, lane_mask));
+	wr32(FBNIC_MAC_SERDES_CTRL, serdes_ctrl);
+
+	fbn->link_mode &= ~FBNIC_LINK_AUTO;
+
+	/* Ask firmware to configure the PHY for the correct encoding mode */
+	return fbnic_fw_xmit_comphy_set_msg(fbd,
+					    fbn->link_mode &
+					    FBNIC_LINK_MODE_MASK);
+}
+
+static void fbnic_mac_get_fw_settings(struct fbnic_dev *fbd)
+{
+	struct fbnic_net *fbn = netdev_priv(fbd->netdev);
+	u8 fec = fbn->fec;
+	u8 link_mode;
+
+	/* Update FEC first to reflect FW current mode */
+	if (fbn->fec & FBNIC_FEC_AUTO) {
+		switch (fbd->fw_cap.link_fec) {
+		case FBNIC_FW_LINK_FEC_NONE:
+			fec = FBNIC_FEC_OFF;
+			break;
+		case FBNIC_FW_LINK_FEC_RS:
+			fec = FBNIC_FEC_RS;
+			break;
+		case FBNIC_FW_LINK_FEC_BASER:
+			fec = FBNIC_FEC_BASER;
+			break;
+		default:
+			return;
+		}
+	}
+
+	/* Do nothing if AUTO mode is not engaged */
+	if (fbn->link_mode & FBNIC_LINK_AUTO) {
+		switch (fbd->fw_cap.link_speed) {
+		case FBNIC_FW_LINK_SPEED_25R1:
+			link_mode = FBNIC_LINK_25R1;
+			break;
+		case FBNIC_FW_LINK_SPEED_50R2:
+			link_mode = FBNIC_LINK_50R2;
+			break;
+		case FBNIC_FW_LINK_SPEED_50R1:
+			link_mode = FBNIC_LINK_50R1;
+			fec = FBNIC_FEC_RS;
+			break;
+		case FBNIC_FW_LINK_SPEED_100R2:
+			link_mode = FBNIC_LINK_100R2;
+			fec = FBNIC_FEC_RS;
+			break;
+		default:
+			return;
+		}
+
+		fbn->link_mode = link_mode;
+		fbn->fec = fec;
+	}
+}
+
+static int fbnic_mac_enable_asic(struct fbnic_dev *fbd)
+{
+	/* Mask and clear the PCS interrupt, will be enabled by link handler */
+	wr32(FBNIC_MAC_PCS_INTR_MASK, ~0);
+	wr32(FBNIC_MAC_PCS_INTR_STS, ~0);
+
+	/* Pull in settings from FW */
+	fbnic_mac_get_fw_settings(fbd);
+
+	/* Configure MAC registers */
+	fbnic_mac_pre_config(fbd);
+
+	/* Configure PCS block */
+	fbnic_mac_pcs_config(fbd);
+
+	/* Configure flow control and error correction */
+	wr32(FBNIC_MAC_COMMAND_CONFIG, __fbnic_mac_config_asic(fbd));
+
+	/* Configure maximum frame size */
+	wr32(FBNIC_MAC_FRM_LENGTH, FBNIC_MAX_JUMBO_FRAME_SIZE);
+
+	/* Configure LED defaults */
+	fbnic_set_led_state_asic(fbd, FBNIC_LED_RESTORE);
+
+	return fbnic_mac_post_config(fbd);
+}
+
+static void fbnic_mac_disable_asic(struct fbnic_dev *fbd)
+{
+	u32 mask = FBNIC_MAC_COMMAND_CONFIG_LOOPBACK_EN;
+	u32 cmd_cfg = rd32(FBNIC_MAC_COMMAND_CONFIG);
+	u32 mac_ctrl = rd32(FBNIC_MAC_CTRL);
+
+	/* Clear link state to disable any further transitions */
+	fbd->link_state = FBNIC_LINK_DISABLED;
+
+	/* Clear Tx and Rx enable bits to disable MAC, ignore other values */
+	if (!fbnic_bmc_present(fbd)) {
+		mask |= FBNIC_MAC_COMMAND_CONFIG_RX_ENA |
+			FBNIC_MAC_COMMAND_CONFIG_TX_ENA;
+		mac_ctrl |= FBNIC_MAC_CTRL_RESET_FF_TX_CLK |
+			    FBNIC_MAC_CTRL_RESET_TX_CLK |
+			    FBNIC_MAC_CTRL_RESET_FF_RX_CLK |
+			    FBNIC_MAC_CTRL_RESET_RX_CLK;
+
+		/* Restore LED defaults */
+		fbnic_set_led_state_asic(fbd, FBNIC_LED_RESTORE);
+	}
+
+	/* Check mask for enabled bits, if any set clear and write back */
+	if (mask & cmd_cfg) {
+		wr32(FBNIC_MAC_COMMAND_CONFIG, cmd_cfg & ~mask);
+		wr32(FBNIC_MAC_CTRL, mac_ctrl);
+	}
+
+	/* Disable loopback, and flush write */
+	wr32(FBNIC_PCS_CONTROL1_0,
+	     FBNIC_PCS_CONTROL1_RESET |
+	     FBNIC_PCS_CONTROL1_SPEED_SELECT_ALWAYS |
+	     FBNIC_PCS_CONTROL1_SPEED_ALWAYS);
+	wr32(FBNIC_PCS_CONTROL1_1,
+	     FBNIC_PCS_CONTROL1_RESET |
+	     FBNIC_PCS_CONTROL1_SPEED_SELECT_ALWAYS |
+	     FBNIC_PCS_CONTROL1_SPEED_ALWAYS);
+}
+
 static const struct fbnic_mac fbnic_mac_asic = {
+	.enable = fbnic_mac_enable_asic,
+	.disable = fbnic_mac_disable_asic,
 	.init_regs = fbnic_mac_init_regs,
+	.get_link = fbnic_mac_get_link_asic,
+	.get_link_event = fbnic_mac_get_link_event_asic,
 };
 
 /**
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_mac.h b/drivers/net/ethernet/meta/fbnic/fbnic_mac.h
index e78a92338a62..5aa089093206 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_mac.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_mac.h
@@ -10,14 +10,72 @@ struct fbnic_dev;
 
 #define FBNIC_MAX_JUMBO_FRAME_SIZE	9742
 
+enum {
+	FBNIC_LINK_DISABLED	= 0,
+	FBNIC_LINK_DOWN		= 1,
+	FBNIC_LINK_UP		= 2,
+	FBNIC_LINK_EVENT	= 3,
+};
+
+enum {
+	FBNIC_LED_STROBE_INIT,
+	FBNIC_LED_ON,
+	FBNIC_LED_OFF,
+	FBNIC_LED_RESTORE,
+};
+
+/* Treat the FEC bits as a bitmask laid out as follows:
+ * Bit 0: RS Enabled
+ * Bit 1: BASER(Firecode) Enabled
+ * Bit 2: Autoneg FEC
+ */
+enum {
+	FBNIC_FEC_OFF		= 0,
+	FBNIC_FEC_RS		= 1,
+	FBNIC_FEC_BASER		= 2,
+	FBNIC_FEC_AUTO		= 4,
+};
+
+#define FBNIC_FEC_MODE_MASK	(FBNIC_FEC_AUTO - 1)
+
+/* Treat the link modes as a modulation/lanes bitmask laid out as follows:
+ * Bit 0: Lane Count, 0 = R1, 1 = R2
+ * Bit 1: Modulation, 0 = NRZ, 1 = PAM4
+ * Bit 2: Autoneg Modulation/Lane Configuration
+ */
+enum {
+	FBNIC_LINK_25R1		= 0,
+	FBNIC_LINK_50R2		= 1,
+	FBNIC_LINK_50R1		= 2,
+	FBNIC_LINK_100R2	= 3,
+	FBNIC_LINK_AUTO		= 4,
+};
+
+#define FBNIC_LINK_MODE_R2	(FBNIC_LINK_50R2)
+#define FBNIC_LINK_MODE_PAM4	(FBNIC_LINK_50R1)
+#define FBNIC_LINK_MODE_MASK	(FBNIC_LINK_AUTO - 1)
+
 /* This structure defines the interface hooks for the MAC. The MAC hooks
  * will be configured as a const struct provided with a set of function
  * pointers.
  *
+ * bool (*get_link)(struct fbnic_dev *fbd);
+ *	Get the current link state for the MAC.
+ * int (*get_link_event)(struct fbnic_dev *fbd);
+ *	Get the current link event status; returns 1 if the link has come
+ *	up, -1 if it has gone down, and 0 if unchanged.
+ * int (*enable)(struct fbnic_dev *fbd);
+ *	Configure and enable the MAC to bring up the link if not already
+ *	enabled.
+ * void (*disable)(struct fbnic_dev *fbd);
+ *	Shut down the link if we are the only consumer of it.
  * void (*init_regs)(struct fbnic_dev *fbd);
  *	Initialize MAC registers to enable Tx/Rx paths and FIFOs.
  */
 struct fbnic_mac {
+	bool (*get_link)(struct fbnic_dev *fbd);
+	int (*get_link_event)(struct fbnic_dev *fbd);
+	int (*enable)(struct fbnic_dev *fbd);
+	void (*disable)(struct fbnic_dev *fbd);
 	void (*init_regs)(struct fbnic_dev *fbd);
 };
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
index bbc2f21060dc..c49ace7f2156 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
@@ -45,6 +45,10 @@ int __fbnic_open(struct fbnic_net *fbn)
 	if (err)
 		goto release_ownership;
 
+	err = fbnic_mac_enable(fbd);
+	if (err)
+		goto release_ownership;
+
 	return 0;
 release_ownership:
 	fbnic_fw_xmit_ownership_msg(fbn->fbd, false);
@@ -72,6 +76,7 @@ static int fbnic_stop(struct net_device *netdev)
 	struct fbnic_net *fbn = netdev_priv(netdev);
 
 	fbnic_down(fbn);
+	fbnic_mac_disable(fbn->fbd);
 
 	fbnic_fw_xmit_ownership_msg(fbn->fbd, false);
 
@@ -146,6 +151,13 @@ struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd)
 	netdev->min_mtu = IPV6_MIN_MTU;
 	netdev->max_mtu = FBNIC_MAX_JUMBO_FRAME_SIZE - ETH_HLEN;
 
+	/* Default to accepting pause frames, and attempt to autoneg the value */
+	fbn->autoneg_pause = true;
+	fbn->rx_pause = true;
+	fbn->tx_pause = false;
+
+	fbn->fec = FBNIC_FEC_AUTO | FBNIC_FEC_RS;
+	fbn->link_mode = FBNIC_LINK_AUTO | FBNIC_LINK_50R2;
 	netif_carrier_off(netdev);
 
 	netif_tx_stop_all_queues(netdev);
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
index 18f93e9431cc..3976fb1a0eac 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
@@ -22,9 +22,16 @@ struct fbnic_net {
 
 	u16 num_napi;
 
+	u8 autoneg_pause;
+	u8 tx_pause;
+	u8 rx_pause;
+	u8 fec;
+	u8 link_mode;
+
 	u16 num_tx_queues;
 	u16 num_rx_queues;
 
+	u64 link_down_events;
 	struct list_head napis;
 };
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
index 8408f0d5f54a..f243950c68bb 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
@@ -160,6 +160,78 @@ void fbnic_down(struct fbnic_net *fbn)
 	fbnic_flush(fbn);
 }
 
+static char *fbnic_report_fec(struct fbnic_dev *fbd)
+{
+	struct fbnic_net *fbn = netdev_priv(fbd->netdev);
+
+	if (fbn->link_mode & FBNIC_LINK_MODE_PAM4)
+		return "Clause 91 RS(544,514)";
+
+	switch (fbn->fec & FBNIC_FEC_MODE_MASK) {
+	case FBNIC_FEC_OFF:
+		return "Off";
+	case FBNIC_FEC_BASER:
+		return "Clause 74 BaseR";
+	case FBNIC_FEC_RS:
+		return "Clause 91 RS(528,514)";
+	}
+
+	return "Unknown";
+}
+
+static void fbnic_link_check(struct fbnic_dev *fbd)
+{
+	struct net_device *netdev = fbd->netdev;
+	bool link_found = false;
+	int err;
+
+	err = fbnic_mac_get_link(fbd, &link_found);
+	if (err) {
+		/* TBD: For now we do nothing. In the future we should have
+		 * the link_check function request a reset.
+		 *
+		 * We would do this here as the reset will likely involve
+		 * us having to tear down the interface which will require
+		 * us taking the RTNL lock in order to coordinate the
+		 * teardown and bringup before and after the reset.
+		 */
+		return;
+	}
+
+	if (!link_found) {
+		if (netif_carrier_ok(netdev)) {
+			struct fbnic_net *fbn = netdev_priv(netdev);
+
+			netdev_err(netdev, "NIC Link is Down\n");
+			fbn->link_down_events++;
+		}
+		netif_carrier_off(netdev);
+		return;
+	}
+
+	if (!netif_carrier_ok(netdev)) {
+		struct fbnic_net *fbn = netdev_priv(netdev);
+
+		netdev_info(netdev,
+			    "NIC Link is Up, %d Mbps (%s), Flow control: %s\n",
+			    ((fbn->link_mode & FBNIC_LINK_MODE_PAM4) ?
+			     50000 : 25000) *
+			    ((fbn->link_mode & FBNIC_LINK_MODE_R2) ?
+			     2 : 1),
+			    (fbn->link_mode & FBNIC_LINK_MODE_PAM4) ?
+			    "PAM4" : "NRZ",
+			    (fbn->rx_pause ?
+			     (fbn->tx_pause ? "ON - Tx/Rx" : "ON - Rx") :
+			     (fbn->tx_pause ? "ON - Tx" : "OFF")));
+		netdev_info(netdev, "FEC autoselect %s encoding: %s\n",
+			    (fbn->fec & FBNIC_FEC_AUTO) ?
+			    "enabled" : "disabled",
+			    fbnic_report_fec(fbd));
+		fbnic_config_drop_mode(fbn);
+	}
+	netif_carrier_on(netdev);
+}
+
 static void fbnic_health_check(struct fbnic_dev *fbd)
 {
 	struct fbnic_fw_mbx *tx_mbx = &fbd->mbx[FBNIC_IPC_MBX_TX_IDX];
@@ -192,6 +264,7 @@ static void fbnic_service_task(struct work_struct *work)
 	rtnl_lock();
 
 	fbnic_fw_check_heartbeat(fbd);
+	fbnic_link_check(fbd);
 
 	fbnic_health_check(fbd);
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
index 484cab7342da..2967ff53305a 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
@@ -1031,9 +1031,14 @@ static void fbnic_enable_bdq(struct fbnic_ring *hpq, struct fbnic_ring *ppq)
 static void fbnic_config_drop_mode_rcq(struct fbnic_napi_vector *nv,
 				       struct fbnic_ring *rcq)
 {
+	struct fbnic_net *fbn = netdev_priv(nv->napi.dev);
 	u32 drop_mode, rcq_ctl;
 
-	drop_mode = FBNIC_QUEUE_RDE_CTL0_DROP_IMMEDIATE;
+	/* Drop mode is only supported when flow control is disabled */
+	if (!fbn->tx_pause)
+		drop_mode = FBNIC_QUEUE_RDE_CTL0_DROP_IMMEDIATE;
+	else
+		drop_mode = FBNIC_QUEUE_RDE_CTL0_DROP_NEVER;
 
 	/* Specify packet layout */
 	rcq_ctl = FIELD_PREP(FBNIC_QUEUE_RDE_CTL0_DROP_MODE_MASK, drop_mode) |
@@ -1043,6 +1048,20 @@ static void fbnic_config_drop_mode_rcq(struct fbnic_napi_vector *nv,
 	fbnic_ring_wr32(rcq, FBNIC_QUEUE_RDE_CTL0, rcq_ctl);
 }
 
+void fbnic_config_drop_mode(struct fbnic_net *fbn)
+{
+	struct fbnic_napi_vector *nv;
+	int i;
+
+	list_for_each_entry(nv, &fbn->napis, napis) {
+		for (i = 0; i < nv->rxt_count; i++) {
+			struct fbnic_q_triad *qt = &nv->qt[nv->txt_count + i];
+
+			fbnic_config_drop_mode_rcq(nv, &qt->cmpl);
+		}
+	}
+}
+
 static void fbnic_enable_rcq(struct fbnic_napi_vector *nv,
 			     struct fbnic_ring *rcq)
 {
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
index 200f3b893d02..812e4bb245db 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
@@ -99,6 +99,7 @@ void fbnic_napi_enable(struct fbnic_net *fbn);
 void fbnic_napi_disable(struct fbnic_net *fbn);
 void fbnic_enable(struct fbnic_net *fbn);
 void fbnic_disable(struct fbnic_net *fbn);
+void fbnic_config_drop_mode(struct fbnic_net *fbn);
 void fbnic_flush(struct fbnic_net *fbn);
 void fbnic_fill(struct fbnic_net *fbn);
 



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [net-next PATCH 12/15] eth: fbnic: add basic Tx handling
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
                   ` (10 preceding siblings ...)
  2024-04-03 20:09 ` [net-next PATCH 11/15] eth: fbnic: Enable Ethernet link setup Alexander Duyck
@ 2024-04-03 20:09 ` Alexander Duyck
  2024-04-03 20:09 ` [net-next PATCH 13/15] eth: fbnic: add basic Rx handling Alexander Duyck
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 20:09 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Duyck, kuba, davem, pabeni

From: Alexander Duyck <alexanderduyck@fb.com>

Handle Tx of simple packets. Support checksum offload and gather.
Use .ndo_features_check to make sure packet geometry will be
supported by the HW, i.e. we can fit the header lengths into
the descriptor fields.

The device writes the position of its tail (consumer) pointer back to
the completion ring. Read all those writebacks; the last one is the
most recent, so complete skbs up to that point.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h    |   66 ++++
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c |    9 +
 drivers/net/ethernet/meta/fbnic/fbnic_txrx.c   |  395 ++++++++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_txrx.h   |   15 +
 4 files changed, 483 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
index 39c98d2dce12..0819ddc1dcc8 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
@@ -24,6 +24,72 @@
 
 #define FBNIC_CLOCK_FREQ	(600 * (1000 * 1000))
 
+/* Transmit Work Descriptor Format */
+/* Length, Type, Offset Masks and Shifts */
+#define FBNIC_TWD_L2_HLEN_MASK			DESC_GENMASK(5, 0)
+
+#define FBNIC_TWD_L3_TYPE_MASK			DESC_GENMASK(7, 6)
+enum {
+	FBNIC_TWD_L3_TYPE_OTHER	= 0,
+	FBNIC_TWD_L3_TYPE_IPV4	= 1,
+	FBNIC_TWD_L3_TYPE_IPV6	= 2,
+	FBNIC_TWD_L3_TYPE_V6V6	= 3,
+};
+
+#define FBNIC_TWD_L3_OHLEN_MASK			DESC_GENMASK(15, 8)
+#define FBNIC_TWD_L3_IHLEN_MASK			DESC_GENMASK(23, 16)
+
+enum {
+	FBNIC_TWD_L4_TYPE_OTHER	= 0,
+	FBNIC_TWD_L4_TYPE_TCP	= 1,
+	FBNIC_TWD_L4_TYPE_UDP	= 2,
+};
+
+#define FBNIC_TWD_CSUM_OFFSET_MASK		DESC_GENMASK(27, 24)
+#define FBNIC_TWD_L4_HLEN_MASK			DESC_GENMASK(31, 28)
+
+/* Flags and Type */
+#define FBNIC_TWD_L4_TYPE_MASK			DESC_GENMASK(33, 32)
+#define FBNIC_TWD_FLAG_REQ_TS			DESC_BIT(34)
+#define FBNIC_TWD_FLAG_REQ_LSO			DESC_BIT(35)
+#define FBNIC_TWD_FLAG_REQ_CSO			DESC_BIT(36)
+#define FBNIC_TWD_FLAG_REQ_COMPLETION		DESC_BIT(37)
+#define FBNIC_TWD_FLAG_DEST_MAC			DESC_BIT(43)
+#define FBNIC_TWD_FLAG_DEST_BMC			DESC_BIT(44)
+#define FBNIC_TWD_FLAG_DEST_FW			DESC_BIT(45)
+#define FBNIC_TWD_TYPE_MASK			DESC_GENMASK(47, 46)
+enum {
+	FBNIC_TWD_TYPE_META	= 0,
+	FBNIC_TWD_TYPE_OPT_META	= 1,
+	FBNIC_TWD_TYPE_AL	= 2,
+	FBNIC_TWD_TYPE_LAST_AL	= 3,
+};
+
+/* MSS and Completion Req */
+#define FBNIC_TWD_MSS_MASK			DESC_GENMASK(61, 48)
+
+#define FBNIC_TWD_TS_MASK			DESC_GENMASK(39, 0)
+#define FBNIC_TWD_ADDR_MASK			DESC_GENMASK(45, 0)
+#define FBNIC_TWD_LEN_MASK			DESC_GENMASK(63, 48)
+
+/* Tx Completion Descriptor Format */
+#define FBNIC_TCD_TYPE0_HEAD0_MASK		DESC_GENMASK(15, 0)
+#define FBNIC_TCD_TYPE0_HEAD1_MASK		DESC_GENMASK(31, 16)
+
+#define FBNIC_TCD_TYPE1_TS_MASK			DESC_GENMASK(39, 0)
+
+#define FBNIC_TCD_STATUS_MASK			DESC_GENMASK(59, 48)
+#define FBNIC_TCD_STATUS_TS_INVALID		DESC_BIT(48)
+#define FBNIC_TCD_STATUS_ILLEGAL_TS_REQ		DESC_BIT(49)
+#define FBNIC_TCD_TWQ1				DESC_BIT(60)
+#define FBNIC_TCD_TYPE_MASK			DESC_GENMASK(62, 61)
+enum {
+	FBNIC_TCD_TYPE_0	= 0,
+	FBNIC_TCD_TYPE_1	= 1,
+};
+
+#define FBNIC_TCD_DONE				DESC_BIT(63)
+
 /* Rx Buffer Descriptor Format */
 #define FBNIC_BD_PAGE_ADDR_MASK			DESC_GENMASK(45, 12)
 #define FBNIC_BD_PAGE_ID_MASK			DESC_GENMASK(63, 48)
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
index c49ace7f2156..91d4ea2bfb29 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
@@ -91,6 +91,7 @@ static const struct net_device_ops fbnic_netdev_ops = {
 	.ndo_stop		= fbnic_stop,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_start_xmit		= fbnic_xmit_frame,
+	.ndo_features_check	= fbnic_features_check,
 };
 
 void fbnic_reset_queues(struct fbnic_net *fbn,
@@ -148,6 +149,14 @@ struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd)
 
 	fbnic_reset_queues(fbn, default_queues, default_queues);
 
+	netdev->features |=
+		NETIF_F_SG |
+		NETIF_F_HW_CSUM;
+
+	netdev->hw_features |= netdev->features;
+	netdev->vlan_features |= netdev->features;
+	netdev->hw_enc_features |= netdev->features;
+
 	netdev->min_mtu = IPV6_MIN_MTU;
 	netdev->max_mtu = FBNIC_MAX_JUMBO_FRAME_SIZE - ETH_HLEN;
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
index 2967ff53305a..ad4cb059c959 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) Meta Platforms, Inc. and affiliates. */
 
+#include <linux/bitfield.h>
 #include <linux/iopoll.h>
 #include <linux/pci.h>
 #include <net/netdev_queues.h>
@@ -10,6 +11,14 @@
 #include "fbnic_txrx.h"
 #include "fbnic.h"
 
+struct fbnic_xmit_cb {
+	u32 bytecount;
+	u8 desc_count;
+	int hw_head;
+};
+
+#define FBNIC_XMIT_CB(_skb) ((struct fbnic_xmit_cb *)((_skb)->cb))
+
 static u32 __iomem *fbnic_ring_csr_base(const struct fbnic_ring *ring)
 {
 	unsigned long csr_base = (unsigned long)ring->doorbell;
@@ -38,12 +47,313 @@ static inline unsigned int fbnic_desc_unused(struct fbnic_ring *ring)
 	return (ring->head - ring->tail - 1) & ring->size_mask;
 }
 
-netdev_tx_t fbnic_xmit_frame(struct sk_buff *skb, struct net_device *dev)
+static inline unsigned int fbnic_desc_used(struct fbnic_ring *ring)
+{
+	return (ring->tail - ring->head) & ring->size_mask;
+}
+
+static inline struct netdev_queue *txring_txq(const struct net_device *dev,
+					      const struct fbnic_ring *ring)
+{
+	return netdev_get_tx_queue(dev, ring->q_idx);
+}
+
+static inline int fbnic_maybe_stop_tx(const struct net_device *dev,
+				      struct fbnic_ring *ring,
+				      const unsigned int size)
+{
+	struct netdev_queue *txq = txring_txq(dev, ring);
+	int res;
+
+	res = netif_txq_maybe_stop(txq, fbnic_desc_unused(ring), size,
+				   FBNIC_TX_DESC_WAKEUP);
+
+	return !res;
+}
+
+static inline bool fbnic_tx_sent_queue(struct sk_buff *skb,
+				       struct fbnic_ring *ring)
+{
+	struct netdev_queue *dev_queue = txring_txq(skb->dev, ring);
+	unsigned int bytecount = FBNIC_XMIT_CB(skb)->bytecount;
+	bool xmit_more = netdev_xmit_more();
+
+	/* TBD: Request completion more often if xmit_more becomes large */
+
+	return __netdev_tx_sent_queue(dev_queue, bytecount, xmit_more);
+}
+
+static void fbnic_unmap_single_twd(struct device *dev, __le64 *twd)
+{
+	u64 raw_twd = le64_to_cpu(*twd);
+	unsigned int len;
+	dma_addr_t dma;
+
+	dma = FIELD_GET(FBNIC_TWD_ADDR_MASK, raw_twd);
+	len = FIELD_GET(FBNIC_TWD_LEN_MASK, raw_twd);
+
+	dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
+}
+
+static void fbnic_unmap_page_twd(struct device *dev, __le64 *twd)
+{
+	u64 raw_twd = le64_to_cpu(*twd);
+	unsigned int len;
+	dma_addr_t dma;
+
+	dma = FIELD_GET(FBNIC_TWD_ADDR_MASK, raw_twd);
+	len = FIELD_GET(FBNIC_TWD_LEN_MASK, raw_twd);
+
+	dma_unmap_page(dev, dma, len, DMA_TO_DEVICE);
+}
+
+#define FBNIC_TWD_TYPE(_type) \
+	cpu_to_le64(FIELD_PREP(FBNIC_TWD_TYPE_MASK, FBNIC_TWD_TYPE_##_type))
+
+static bool
+fbnic_tx_offloads(struct fbnic_ring *ring, struct sk_buff *skb, __le64 *meta)
+{
+	unsigned int l2len, i3len;
+
+	if (unlikely(skb->ip_summed != CHECKSUM_PARTIAL))
+		return false;
+
+	l2len = skb_mac_header_len(skb);
+	i3len = skb_checksum_start(skb) - skb_network_header(skb);
+
+	*meta |= cpu_to_le64(FIELD_PREP(FBNIC_TWD_CSUM_OFFSET_MASK,
+					skb->csum_offset / 2));
+
+	*meta |= cpu_to_le64(FBNIC_TWD_FLAG_REQ_CSO);
+
+	*meta |= cpu_to_le64(FIELD_PREP(FBNIC_TWD_L2_HLEN_MASK, l2len / 2) |
+			     FIELD_PREP(FBNIC_TWD_L3_IHLEN_MASK, i3len / 2));
+	return false;
+}
+
+static bool
+fbnic_tx_map(struct fbnic_ring *ring, struct sk_buff *skb, __le64 *meta)
 {
+	struct device *dev = skb->dev->dev.parent;
+	unsigned int tail = ring->tail, first;
+	unsigned int size, data_len;
+	skb_frag_t *frag;
+	dma_addr_t dma;
+	__le64 *twd;
+
+	ring->tx_buf[tail] = skb;
+
+	tail++;
+	tail &= ring->size_mask;
+	first = tail;
+
+	size = skb_headlen(skb);
+	data_len = skb->data_len;
+
+	if (size > FIELD_MAX(FBNIC_TWD_LEN_MASK))
+		goto dma_error;
+
+	dma = dma_map_single(dev, skb->data, size, DMA_TO_DEVICE);
+
+	for (frag = &skb_shinfo(skb)->frags[0];; frag++) {
+		twd = &ring->desc[tail];
+
+		if (dma_mapping_error(dev, dma))
+			goto dma_error;
+
+		*twd = cpu_to_le64(FIELD_PREP(FBNIC_TWD_ADDR_MASK, dma) |
+				   FIELD_PREP(FBNIC_TWD_LEN_MASK, size) |
+				   FIELD_PREP(FBNIC_TWD_TYPE_MASK,
+					      FBNIC_TWD_TYPE_AL));
+
+		tail++;
+		tail &= ring->size_mask;
+
+		if (!data_len)
+			break;
+
+		size = skb_frag_size(frag);
+		data_len -= size;
+
+		if (size > FIELD_MAX(FBNIC_TWD_LEN_MASK))
+			goto dma_error;
+
+		dma = skb_frag_dma_map(dev, frag, 0, size, DMA_TO_DEVICE);
+	}
+
+	*twd |= FBNIC_TWD_TYPE(LAST_AL);
+
+	FBNIC_XMIT_CB(skb)->desc_count = ((twd - meta) + 1) & ring->size_mask;
+
+	ring->tail = tail;
+
+	/* Verify there is room for another packet */
+	fbnic_maybe_stop_tx(skb->dev, ring, FBNIC_MAX_SKB_DESC);
+
+	if (fbnic_tx_sent_queue(skb, ring)) {
+		*meta |= cpu_to_le64(FBNIC_TWD_FLAG_REQ_COMPLETION);
+
+		/* Force DMA writes to flush before writing to tail */
+		dma_wmb();
+
+		writel(tail, ring->doorbell);
+	}
+
+	return false;
+dma_error:
+	if (net_ratelimit())
+		netdev_err(skb->dev, "TX DMA map failed\n");
+
+	while (tail != first) {
+		tail--;
+		tail &= ring->size_mask;
+		twd = &ring->desc[tail];
+		if (tail == first)
+			fbnic_unmap_single_twd(dev, twd);
+		else
+			fbnic_unmap_page_twd(dev, twd);
+	}
+
+	return true;
+}
+
+#define FBNIC_MIN_FRAME_LEN	60
+
+static netdev_tx_t
+fbnic_xmit_frame_ring(struct sk_buff *skb, struct fbnic_ring *ring)
+{
+	__le64 *meta = &ring->desc[ring->tail];
+	u16 desc_needed;
+
+	if (skb_put_padto(skb, FBNIC_MIN_FRAME_LEN))
+		goto err_count;
+
+	/* need: 1 descriptor per page,
+	 *       + 1 desc for skb_head,
+	 *       + 2 desc for metadata and timestamp metadata
+	 *       + 7 desc gap to keep tail from touching head
+	 * otherwise try next time
+	 */
+	desc_needed = skb_shinfo(skb)->nr_frags + 10;
+	if (fbnic_maybe_stop_tx(skb->dev, ring, desc_needed))
+		return NETDEV_TX_BUSY;
+
+	*meta = cpu_to_le64(FBNIC_TWD_FLAG_DEST_MAC);
+
+	/* Write all members within DWORD to condense this into 2 4B writes */
+	FBNIC_XMIT_CB(skb)->bytecount = skb->len;
+	FBNIC_XMIT_CB(skb)->desc_count = 0;
+
+	if (fbnic_tx_offloads(ring, skb, meta))
+		goto err_free;
+
+	if (fbnic_tx_map(ring, skb, meta))
+		goto err_free;
+
+	return NETDEV_TX_OK;
+
+err_free:
 	dev_kfree_skb_any(skb);
+err_count:
 	return NETDEV_TX_OK;
 }
 
+netdev_tx_t fbnic_xmit_frame(struct sk_buff *skb, struct net_device *dev)
+{
+	struct fbnic_net *fbn = netdev_priv(dev);
+	unsigned int q_map = skb->queue_mapping;
+
+	return fbnic_xmit_frame_ring(skb, fbn->tx[q_map]);
+}
+
+netdev_features_t
+fbnic_features_check(struct sk_buff *skb, struct net_device *dev,
+		     netdev_features_t features)
+{
+	unsigned int l2len, l3len;
+
+	if (unlikely(skb->ip_summed != CHECKSUM_PARTIAL))
+		return features;
+
+	l2len = skb_mac_header_len(skb);
+	l3len = skb_checksum_start(skb) - skb_network_header(skb);
+
+	/* Check header lengths are multiple of 2.
+	 * In case of 6in6 we support longer headers (IHLEN + OHLEN)
+	 * but keep things simple for now, 512B is plenty.
+	 */
+	if ((l2len | l3len | skb->csum_offset) % 2 ||
+	    !FIELD_FIT(FBNIC_TWD_L2_HLEN_MASK, l2len / 2) ||
+	    !FIELD_FIT(FBNIC_TWD_L3_IHLEN_MASK, l3len / 2) ||
+	    !FIELD_FIT(FBNIC_TWD_CSUM_OFFSET_MASK, skb->csum_offset / 2))
+		return features & ~NETIF_F_CSUM_MASK;
+
+	return features;
+}
+
+static void fbnic_clean_twq0(struct fbnic_napi_vector *nv, int napi_budget,
+			     struct fbnic_ring *ring, bool discard,
+			     unsigned int hw_head)
+{
+	u64 total_bytes = 0, total_packets = 0;
+	unsigned int head = ring->head;
+	struct netdev_queue *txq;
+	unsigned int clean_desc;
+
+	clean_desc = (hw_head - head) & ring->size_mask;
+
+	while (clean_desc) {
+		struct sk_buff *skb = ring->tx_buf[head];
+		unsigned int desc_cnt;
+
+		desc_cnt = FBNIC_XMIT_CB(skb)->desc_count;
+		if (desc_cnt > clean_desc)
+			break;
+
+		ring->tx_buf[head] = NULL;
+
+		clean_desc -= desc_cnt;
+
+		while (!(ring->desc[head] & FBNIC_TWD_TYPE(AL))) {
+			head++;
+			head &= ring->size_mask;
+			desc_cnt--;
+		}
+
+		fbnic_unmap_single_twd(nv->dev, &ring->desc[head]);
+		head++;
+		head &= ring->size_mask;
+		desc_cnt--;
+
+		while (desc_cnt--) {
+			fbnic_unmap_page_twd(nv->dev, &ring->desc[head]);
+			head++;
+			head &= ring->size_mask;
+		}
+
+		total_bytes += FBNIC_XMIT_CB(skb)->bytecount;
+		total_packets += 1;
+
+		napi_consume_skb(skb, napi_budget);
+	}
+
+	if (!total_bytes)
+		return;
+
+	ring->head = head;
+
+	txq = txring_txq(nv->napi.dev, ring);
+
+	if (unlikely(discard)) {
+		netdev_tx_completed_queue(txq, total_packets, total_bytes);
+		return;
+	}
+
+	netif_txq_completed_wake(txq, total_packets, total_bytes,
+				 fbnic_desc_unused(ring),
+				 FBNIC_TX_DESC_WAKEUP);
+}
+
 static void fbnic_page_pool_init(struct fbnic_ring *ring, unsigned int idx,
 				 struct page *page)
 {
@@ -66,6 +376,65 @@ static void fbnic_page_pool_drain(struct fbnic_ring *ring, unsigned int idx,
 	rx_buf->page = NULL;
 }
 
+static void fbnic_clean_twq(struct fbnic_napi_vector *nv, int napi_budget,
+			    struct fbnic_q_triad *qt, s32 head0)
+{
+	if (head0 >= 0)
+		fbnic_clean_twq0(nv, napi_budget, &qt->sub0, false, head0);
+}
+
+static void
+fbnic_clean_tcq(struct fbnic_napi_vector *nv, struct fbnic_q_triad *qt,
+		int napi_budget)
+{
+	struct fbnic_ring *cmpl = &qt->cmpl;
+	__le64 *raw_tcd, done;
+	u32 head = cmpl->head;
+	s32 head0 = -1;
+
+	done = (head & (cmpl->size_mask + 1)) ? 0 : cpu_to_le64(FBNIC_TCD_DONE);
+	raw_tcd = &cmpl->desc[head & cmpl->size_mask];
+
+	/* Walk the completion queue collecting the heads reported by NIC */
+	while ((*raw_tcd & cpu_to_le64(FBNIC_TCD_DONE)) == done) {
+		u64 tcd;
+
+		dma_rmb();
+
+		tcd = le64_to_cpu(*raw_tcd);
+
+		switch (FIELD_GET(FBNIC_TCD_TYPE_MASK, tcd)) {
+		case FBNIC_TCD_TYPE_0:
+			if (!(tcd & FBNIC_TCD_TWQ1))
+				head0 = FIELD_GET(FBNIC_TCD_TYPE0_HEAD0_MASK,
+						  tcd);
+			/* Currently all err status bits are related to
+			 * timestamps and as those have yet to be added
+			 * they are skipped for now.
+			 */
+			break;
+		default:
+			break;
+		}
+
+		raw_tcd++;
+		head++;
+		if (!(head & cmpl->size_mask)) {
+			done ^= cpu_to_le64(FBNIC_TCD_DONE);
+			raw_tcd = &cmpl->desc[0];
+		}
+	}
+
+	/* Record the current head/tail of the queue */
+	if (cmpl->head != head) {
+		cmpl->head = head;
+		writel(head & cmpl->size_mask, cmpl->doorbell);
+	}
+
+	/* Unmap and free processed buffers */
+	fbnic_clean_twq(nv, napi_budget, qt, head0);
+}
+
 static void fbnic_clean_bdq(struct fbnic_napi_vector *nv, int napi_budget,
 			    struct fbnic_ring *ring, unsigned int hw_head)
 {
@@ -153,7 +522,7 @@ static void fbnic_put_pkt_buff(struct fbnic_napi_vector *nv,
 	page = virt_to_page(pkt->buff.data_hard_start);
 	page_pool_put_full_page(nv->page_pool, page, !!budget);
 	pkt->buff.data_hard_start = NULL;
-}
+};
 
 static void fbnic_nv_irq_disable(struct fbnic_napi_vector *nv)
 {
@@ -163,8 +532,27 @@ static void fbnic_nv_irq_disable(struct fbnic_napi_vector *nv)
 	wr32(FBNIC_INTR_MASK_SET(v_idx / 32), 1 << (v_idx % 32));
 }
 
+static void fbnic_nv_irq_rearm(struct fbnic_napi_vector *nv)
+{
+	struct fbnic_dev *fbd = nv->fbd;
+	u32 v_idx = nv->v_idx;
+
+	wr32(FBNIC_INTR_CQ_REARM(v_idx), FBNIC_INTR_CQ_REARM_INTR_UNMASK);
+}
+
 static int fbnic_poll(struct napi_struct *napi, int budget)
 {
+	struct fbnic_napi_vector *nv = container_of(napi,
+						    struct fbnic_napi_vector,
+						    napi);
+	int i;
+
+	for (i = 0; i < nv->txt_count; i++)
+		fbnic_clean_tcq(nv, &nv->qt[i], budget);
+
+	if (likely(napi_complete_done(napi, 0)))
+		fbnic_nv_irq_rearm(nv);
+
 	return 0;
 }
 
@@ -896,6 +1284,9 @@ void fbnic_flush(struct fbnic_net *fbn)
 			struct fbnic_q_triad *qt = &nv->qt[i];
 			struct netdev_queue *tx_queue;
 
+			/* Clean the work queues of unprocessed work */
+			fbnic_clean_twq0(nv, 0, &qt->sub0, true, qt->sub0.tail);
+
 			/* Reset completion queue descriptor ring */
 			memset(qt->cmpl.desc, 0, qt->cmpl.size);
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
index 812e4bb245db..0c424c49866d 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
@@ -10,6 +10,18 @@
 
 struct fbnic_net;
 
+/* Guarantee we have space needed for storing the buffer
+ * To store the buffer we need:
+ *	1 descriptor per page
+ *	+ 1 descriptor for skb head
+ *	+ 2 descriptors for metadata and optional metadata
+ *	+ 7 descriptors to keep tail out of the same cacheline as head
+ * If we cannot guarantee that then we should return TX_BUSY
+ */
+#define FBNIC_MAX_SKB_DESC	(MAX_SKB_FRAGS + 10)
+#define FBNIC_TX_DESC_WAKEUP	(FBNIC_MAX_SKB_DESC * 2)
+#define FBNIC_TX_DESC_MIN	roundup_pow_of_two(FBNIC_TX_DESC_WAKEUP)
+
 #define FBNIC_MAX_TXQS			128u
 #define FBNIC_MAX_RXQS			128u
 
@@ -90,6 +102,9 @@ struct fbnic_napi_vector {
 #define FBNIC_MAX_RXQS			128u
 
 netdev_tx_t fbnic_xmit_frame(struct sk_buff *skb, struct net_device *dev);
+netdev_features_t
+fbnic_features_check(struct sk_buff *skb, struct net_device *dev,
+		     netdev_features_t features);
 
 int fbnic_alloc_napi_vectors(struct fbnic_net *fbn);
 void fbnic_free_napi_vectors(struct fbnic_net *fbn);



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
                   ` (11 preceding siblings ...)
  2024-04-03 20:09 ` [net-next PATCH 12/15] eth: fbnic: add basic Tx handling Alexander Duyck
@ 2024-04-03 20:09 ` Alexander Duyck
  2024-04-09 11:47   ` Yunsheng Lin
  2024-04-03 20:09 ` [net-next PATCH 14/15] eth: fbnic: add L2 address programming Alexander Duyck
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 20:09 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Duyck, kuba, davem, pabeni

From: Alexander Duyck <alexanderduyck@fb.com>

Handle Rx packets with basic csum and Rx hash offloads.

NIC writes back to the completion ring a head buffer descriptor
(data buffer allocated from header pages), variable number of payload
descriptors (data buffers in payload pages), an optional metadata
descriptor (type 3) and finally the primary metadata descriptor
(type 2).

This format makes scatter support fairly easy: start gathering pages
when we see the head page, gather until we see the primary metadata
descriptor, then do the processing. Use the XDP infra to collect the
packet fragments as we traverse the descriptors. XDP itself is not
supported yet, but it will be soon.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h    |   60 ++++
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c |    4 
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c    |    3 
 drivers/net/ethernet/meta/fbnic/fbnic_txrx.c   |  322 ++++++++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_txrx.h   |    2 
 5 files changed, 387 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
index 0819ddc1dcc8..f61b401fdd5c 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
@@ -90,6 +90,66 @@ enum {
 
 #define FBNIC_TCD_DONE				DESC_BIT(63)
 
+/* Rx Completion Queue Descriptors */
+#define FBNIC_RCD_TYPE_MASK			DESC_GENMASK(62, 61)
+enum {
+	FBNIC_RCD_TYPE_HDR_AL	= 0,
+	FBNIC_RCD_TYPE_PAY_AL	= 1,
+	FBNIC_RCD_TYPE_OPT_META	= 2,
+	FBNIC_RCD_TYPE_META	= 3,
+};
+
+#define FBNIC_RCD_DONE				DESC_BIT(63)
+
+/* Address/Length Completion Descriptors */
+#define FBNIC_RCD_AL_BUFF_ID_MASK		DESC_GENMASK(15, 0)
+#define FBNIC_RCD_AL_BUFF_LEN_MASK		DESC_GENMASK(28, 16)
+#define FBNIC_RCD_AL_BUFF_OFF_MASK		DESC_GENMASK(43, 32)
+#define FBNIC_RCD_AL_PAGE_FIN			DESC_BIT(60)
+
+/* Header AL specific values */
+#define FBNIC_RCD_HDR_AL_OVERFLOW		DESC_BIT(53)
+#define FBNIC_RCD_HDR_AL_DMA_HINT_MASK		DESC_GENMASK(59, 54)
+enum {
+	FBNIC_RCD_HDR_AL_DMA_HINT_NONE  = 0,
+	FBNIC_RCD_HDR_AL_DMA_HINT_L2	= 1,
+	FBNIC_RCD_HDR_AL_DMA_HINT_L3	= 2,
+	FBNIC_RCD_HDR_AL_DMA_HINT_L4	= 4,
+};
+
+/* Optional Metadata Completion Descriptors */
+#define FBNIC_RCD_OPT_META_TS_MASK		DESC_GENMASK(39, 0)
+#define FBNIC_RCD_OPT_META_ACTION_MASK		DESC_GENMASK(45, 40)
+#define FBNIC_RCD_OPT_META_ACTION		DESC_BIT(57)
+#define FBNIC_RCD_OPT_META_TS			DESC_BIT(58)
+#define FBNIC_RCD_OPT_META_TYPE_MASK		DESC_GENMASK(60, 59)
+
+/* Metadata Completion Descriptors */
+#define FBNIC_RCD_META_RSS_HASH_MASK		DESC_GENMASK(31, 0)
+#define FBNIC_RCD_META_L2_CSUM_MASK		DESC_GENMASK(47, 32)
+#define FBNIC_RCD_META_L3_TYPE_MASK		DESC_GENMASK(49, 48)
+enum {
+	FBNIC_RCD_META_L3_TYPE_OTHER	= 0,
+	FBNIC_RCD_META_L3_TYPE_IPV4	= 1,
+	FBNIC_RCD_META_L3_TYPE_IPV6	= 2,
+	FBNIC_RCD_META_L3_TYPE_V6V6	= 3,
+};
+
+#define FBNIC_RCD_META_L4_TYPE_MASK		DESC_GENMASK(51, 50)
+enum {
+	FBNIC_RCD_META_L4_TYPE_OTHER	= 0,
+	FBNIC_RCD_META_L4_TYPE_TCP	= 1,
+	FBNIC_RCD_META_L4_TYPE_UDP	= 2,
+};
+
+#define FBNIC_RCD_META_L4_CSUM_UNNECESSARY	DESC_BIT(52)
+#define FBNIC_RCD_META_ERR_MAC_EOP		DESC_BIT(53)
+#define FBNIC_RCD_META_ERR_TRUNCATED_FRAME	DESC_BIT(54)
+#define FBNIC_RCD_META_ERR_PARSER		DESC_BIT(55)
+#define FBNIC_RCD_META_UNCORRECTABLE_ERR_MASK	\
+	(FBNIC_RCD_META_ERR_MAC_EOP | FBNIC_RCD_META_ERR_TRUNCATED_FRAME)
+#define FBNIC_RCD_META_ECN			DESC_BIT(60)
+
 /* Rx Buffer Descriptor Format */
 #define FBNIC_BD_PAGE_ADDR_MASK			DESC_GENMASK(45, 12)
 #define FBNIC_BD_PAGE_ID_MASK			DESC_GENMASK(63, 48)
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
index 91d4ea2bfb29..792bdfa7429d 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
@@ -150,8 +150,10 @@ struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd)
 	fbnic_reset_queues(fbn, default_queues, default_queues);
 
 	netdev->features |=
+		NETIF_F_RXHASH |
 		NETIF_F_SG |
-		NETIF_F_HW_CSUM;
+		NETIF_F_HW_CSUM |
+		NETIF_F_RXCSUM;
 
 	netdev->hw_features |= netdev->features;
 	netdev->vlan_features |= netdev->features;
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
index f243950c68bb..d897b0d65abf 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
@@ -268,6 +268,9 @@ static void fbnic_service_task(struct work_struct *work)
 
 	fbnic_health_check(fbd);
 
+	if (netif_carrier_ok(fbd->netdev))
+		fbnic_napi_depletion_check(fbd->netdev);
+
 	if (netif_running(fbd->netdev))
 		schedule_delayed_work(&fbd->service_task, HZ);
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
index ad4cb059c959..fe35112b9075 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
@@ -131,6 +131,24 @@ fbnic_tx_offloads(struct fbnic_ring *ring, struct sk_buff *skb, __le64 *meta)
 	return false;
 }
 
+static void
+fbnic_rx_csum(u64 rcd, struct sk_buff *skb, struct fbnic_ring *rcq)
+{
+	skb_checksum_none_assert(skb);
+
+	if (unlikely(!(skb->dev->features & NETIF_F_RXCSUM)))
+		return;
+
+	if (FIELD_GET(FBNIC_RCD_META_L4_CSUM_UNNECESSARY, rcd)) {
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+	} else {
+		u16 csum = FIELD_GET(FBNIC_RCD_META_L2_CSUM_MASK, rcd);
+
+		skb->ip_summed = CHECKSUM_COMPLETE;
+		skb->csum = (__force __wsum)csum;
+	}
+}
+
 static bool
 fbnic_tx_map(struct fbnic_ring *ring, struct sk_buff *skb, __le64 *meta)
 {
@@ -364,6 +382,16 @@ static void fbnic_page_pool_init(struct fbnic_ring *ring, unsigned int idx,
 	rx_buf->page = page;
 }
 
+static struct page *fbnic_page_pool_get(struct fbnic_ring *ring,
+					unsigned int idx)
+{
+	struct fbnic_rx_buf *rx_buf = &ring->rx_buf[idx];
+
+	rx_buf->pagecnt_bias--;
+
+	return rx_buf->page;
+}
+
 static void fbnic_page_pool_drain(struct fbnic_ring *ring, unsigned int idx,
 				  struct fbnic_napi_vector *nv, int budget)
 {
@@ -501,6 +529,98 @@ static void fbnic_fill_bdq(struct fbnic_napi_vector *nv, struct fbnic_ring *bdq)
 	}
 }
 
+static unsigned int fbnic_hdr_pg_start(unsigned int pg_off)
+{
+	/* The headroom of the first header may be larger than FBNIC_RX_HROOM
+	 * due to alignment. So account for that by just making the page
+	 * offset 0 if we are starting at the first header.
+	 */
+	if (ALIGN(FBNIC_RX_HROOM, 128) > FBNIC_RX_HROOM &&
+	    pg_off == ALIGN(FBNIC_RX_HROOM, 128))
+		return 0;
+
+	return pg_off - FBNIC_RX_HROOM;
+}
+
+static unsigned int fbnic_hdr_pg_end(unsigned int pg_off, unsigned int len)
+{
+	/* Determine the end of the buffer by finding the start of the next
+	 * and then subtracting the headroom from that frame.
+	 */
+	pg_off += len + FBNIC_RX_TROOM + FBNIC_RX_HROOM;
+
+	return ALIGN(pg_off, 128) - FBNIC_RX_HROOM;
+}
+
+static void fbnic_pkt_prepare(struct fbnic_napi_vector *nv, u64 rcd,
+			      struct fbnic_pkt_buff *pkt,
+			      struct fbnic_q_triad *qt)
+{
+	unsigned int hdr_pg_off = FIELD_GET(FBNIC_RCD_AL_BUFF_OFF_MASK, rcd);
+	unsigned int hdr_pg_idx = FIELD_GET(FBNIC_RCD_AL_BUFF_ID_MASK, rcd);
+	struct page *page = fbnic_page_pool_get(&qt->sub0, hdr_pg_idx);
+	unsigned int len = FIELD_GET(FBNIC_RCD_AL_BUFF_LEN_MASK, rcd);
+	unsigned int frame_sz, hdr_pg_start, hdr_pg_end, headroom;
+	unsigned char *hdr_start;
+
+	/* data_hard_start should always be NULL when this is called */
+	WARN_ON_ONCE(pkt->buff.data_hard_start);
+
+	/* Short-cut the end calculation if we know page is fully consumed */
+	hdr_pg_end = FIELD_GET(FBNIC_RCD_AL_PAGE_FIN, rcd) ?
+		     PAGE_SIZE : fbnic_hdr_pg_end(hdr_pg_off, len);
+	hdr_pg_start = fbnic_hdr_pg_start(hdr_pg_off);
+
+	frame_sz = hdr_pg_end - hdr_pg_start;
+	xdp_init_buff(&pkt->buff, frame_sz, NULL);
+
+	/* Sync DMA buffer */
+	dma_sync_single_range_for_cpu(nv->dev, page_pool_get_dma_addr(page),
+				      hdr_pg_start, frame_sz,
+				      DMA_BIDIRECTIONAL);
+
+	/* Build frame around buffer */
+	hdr_start = page_address(page) + hdr_pg_start;
+	headroom = hdr_pg_off - hdr_pg_start + FBNIC_RX_PAD;
+
+	xdp_prepare_buff(&pkt->buff, hdr_start, headroom,
+			 len - FBNIC_RX_PAD, true);
+
+	pkt->data_truesize = 0;
+	pkt->data_len = 0;
+	pkt->nr_frags = 0;
+}
+
+static void fbnic_add_rx_frag(struct fbnic_napi_vector *nv, u64 rcd,
+			      struct fbnic_pkt_buff *pkt,
+			      struct fbnic_q_triad *qt)
+{
+	unsigned int pg_off = FIELD_GET(FBNIC_RCD_AL_BUFF_OFF_MASK, rcd);
+	unsigned int pg_idx = FIELD_GET(FBNIC_RCD_AL_BUFF_ID_MASK, rcd);
+	unsigned int len = FIELD_GET(FBNIC_RCD_AL_BUFF_LEN_MASK, rcd);
+	struct page *page = fbnic_page_pool_get(&qt->sub1, pg_idx);
+	struct skb_shared_info *shinfo;
+	unsigned int truesize;
+
+	truesize = FIELD_GET(FBNIC_RCD_AL_PAGE_FIN, rcd) ? PAGE_SIZE - pg_off :
+		   ALIGN(len, 128);
+
+	/* Sync DMA buffer */
+	dma_sync_single_range_for_cpu(nv->dev, page_pool_get_dma_addr(page),
+				      pg_off, truesize, DMA_BIDIRECTIONAL);
+
+	/* Add page to xdp shared info */
+	shinfo = xdp_get_shared_info_from_buff(&pkt->buff);
+
+	/* Accumulate truesize of the data buffers */
+	pkt->data_truesize += truesize;
+
+	__skb_fill_page_desc_noacc(shinfo, pkt->nr_frags++, page, pg_off, len);
+
+	/* Accumulate data_len of the data buffers */
+	pkt->data_len += len;
+}
+
 static void fbnic_put_pkt_buff(struct fbnic_napi_vector *nv,
 			       struct fbnic_pkt_buff *pkt, int budget)
 {
@@ -522,7 +642,167 @@ static void fbnic_put_pkt_buff(struct fbnic_napi_vector *nv,
 	page = virt_to_page(pkt->buff.data_hard_start);
 	page_pool_put_full_page(nv->page_pool, page, !!budget);
 	pkt->buff.data_hard_start = NULL;
-};
+}
+
+static struct sk_buff *fbnic_build_skb(struct fbnic_napi_vector *nv,
+				       struct fbnic_pkt_buff *pkt)
+{
+	unsigned int nr_frags = pkt->nr_frags;
+	struct skb_shared_info *shinfo;
+	unsigned int truesize;
+	struct sk_buff *skb;
+
+	truesize = xdp_data_hard_end(&pkt->buff) + FBNIC_RX_TROOM -
+		   pkt->buff.data_hard_start;
+
+	/* Build frame around buffer */
+	skb = napi_build_skb(pkt->buff.data_hard_start, truesize);
+	if (unlikely(!skb))
+		return NULL;
+
+	/* Push data pointer to start of data, put tail to end of data */
+	skb_reserve(skb, pkt->buff.data - pkt->buff.data_hard_start);
+	__skb_put(skb, pkt->buff.data_end - pkt->buff.data);
+
+	/* Add tracking for metadata at the start of the frame */
+	skb_metadata_set(skb, pkt->buff.data - pkt->buff.data_meta);
+
+	/* Add Rx frags */
+	if (nr_frags) {
+		/* Verify that shared info didn't move */
+		shinfo = xdp_get_shared_info_from_buff(&pkt->buff);
+		WARN_ON(skb_shinfo(skb) != shinfo);
+
+		skb->truesize += pkt->data_truesize;
+		skb->data_len += pkt->data_len;
+		shinfo->nr_frags = nr_frags;
+		skb->len += pkt->data_len;
+	}
+
+	skb_mark_for_recycle(skb);
+
+	/* Set MAC header specific fields */
+	skb->protocol = eth_type_trans(skb, nv->napi.dev);
+
+	return skb;
+}
+
+static enum pkt_hash_types fbnic_skb_hash_type(u64 rcd)
+{
+	return (FBNIC_RCD_META_L4_TYPE_MASK & rcd) ? PKT_HASH_TYPE_L4 :
+	       (FBNIC_RCD_META_L3_TYPE_MASK & rcd) ? PKT_HASH_TYPE_L3 :
+						     PKT_HASH_TYPE_L2;
+}
+
+static void fbnic_populate_skb_fields(struct fbnic_napi_vector *nv,
+				      u64 rcd, struct sk_buff *skb,
+				      struct fbnic_q_triad *qt)
+{
+	struct net_device *netdev = nv->napi.dev;
+	struct fbnic_ring *rcq = &qt->cmpl;
+
+	fbnic_rx_csum(rcd, skb, rcq);
+
+	if (netdev->features & NETIF_F_RXHASH)
+		skb_set_hash(skb,
+			     FIELD_GET(FBNIC_RCD_META_RSS_HASH_MASK, rcd),
+			     fbnic_skb_hash_type(rcd));
+
+	skb_record_rx_queue(skb, rcq->q_idx);
+}
+
+static bool fbnic_rcd_metadata_err(u64 rcd)
+{
+	return !!(FBNIC_RCD_META_UNCORRECTABLE_ERR_MASK & rcd);
+}
+
+static int fbnic_clean_rcq(struct fbnic_napi_vector *nv,
+			   struct fbnic_q_triad *qt, int budget)
+{
+	struct fbnic_ring *rcq = &qt->cmpl;
+	struct fbnic_pkt_buff *pkt;
+	s32 head0 = -1, head1 = -1;
+	__le64 *raw_rcd, done;
+	u32 head = rcq->head;
+	u64 packets = 0;
+
+	done = (head & (rcq->size_mask + 1)) ? cpu_to_le64(FBNIC_RCD_DONE) : 0;
+	raw_rcd = &rcq->desc[head & rcq->size_mask];
+	pkt = rcq->pkt;
+
+	/* Walk the completion queue collecting the heads reported by NIC */
+	while (likely(packets < budget)) {
+		struct sk_buff *skb = ERR_PTR(-EINVAL);
+		u64 rcd;
+
+		if ((*raw_rcd & cpu_to_le64(FBNIC_RCD_DONE)) == done)
+			break;
+
+		dma_rmb();
+
+		rcd = le64_to_cpu(*raw_rcd);
+
+		switch (FIELD_GET(FBNIC_RCD_TYPE_MASK, rcd)) {
+		case FBNIC_RCD_TYPE_HDR_AL:
+			head0 = FIELD_GET(FBNIC_RCD_AL_BUFF_ID_MASK, rcd);
+			fbnic_pkt_prepare(nv, rcd, pkt, qt);
+
+			break;
+		case FBNIC_RCD_TYPE_PAY_AL:
+			head1 = FIELD_GET(FBNIC_RCD_AL_BUFF_ID_MASK, rcd);
+			fbnic_add_rx_frag(nv, rcd, pkt, qt);
+
+			break;
+		case FBNIC_RCD_TYPE_OPT_META:
+			/* Only type 0 is currently supported */
+			if (FIELD_GET(FBNIC_RCD_OPT_META_TYPE_MASK, rcd))
+				break;
+
+			/* We currently ignore the action table index */
+			break;
+		case FBNIC_RCD_TYPE_META:
+			if (likely(!fbnic_rcd_metadata_err(rcd)))
+				skb = fbnic_build_skb(nv, pkt);
+
+			/* populate skb and invalidate XDP */
+			if (!IS_ERR_OR_NULL(skb)) {
+				fbnic_populate_skb_fields(nv, rcd, skb, qt);
+
+				packets++;
+
+				napi_gro_receive(&nv->napi, skb);
+			}
+
+			pkt->buff.data_hard_start = NULL;
+
+			break;
+		}
+
+		raw_rcd++;
+		head++;
+		if (!(head & rcq->size_mask)) {
+			done ^= cpu_to_le64(FBNIC_RCD_DONE);
+			raw_rcd = &rcq->desc[0];
+		}
+	}
+
+	/* Unmap and free processed buffers */
+	if (head0 >= 0)
+		fbnic_clean_bdq(nv, budget, &qt->sub0, head0);
+	fbnic_fill_bdq(nv, &qt->sub0);
+
+	if (head1 >= 0)
+		fbnic_clean_bdq(nv, budget, &qt->sub1, head1);
+	fbnic_fill_bdq(nv, &qt->sub1);
+
+	/* Record the current head/tail of the queue */
+	if (rcq->head != head) {
+		rcq->head = head;
+		writel(head & rcq->size_mask, rcq->doorbell);
+	}
+
+	return packets;
+}
 
 static void fbnic_nv_irq_disable(struct fbnic_napi_vector *nv)
 {
@@ -545,12 +825,19 @@ static int fbnic_poll(struct napi_struct *napi, int budget)
 	struct fbnic_napi_vector *nv = container_of(napi,
 						    struct fbnic_napi_vector,
 						    napi);
-	int i;
+	int i, j, work_done = 0;
 
 	for (i = 0; i < nv->txt_count; i++)
 		fbnic_clean_tcq(nv, &nv->qt[i], budget);
 
-	if (likely(napi_complete_done(napi, 0)))
+	if (budget)
+		for (j = 0; j < nv->rxt_count; j++, i++)
+			work_done += fbnic_clean_rcq(nv, &nv->qt[i], budget);
+
+	if (work_done >= budget)
+		return budget;
+
+	if (likely(napi_complete_done(napi, work_done)))
 		fbnic_nv_irq_rearm(nv);
 
 	return 0;
@@ -1555,3 +1842,32 @@ void fbnic_napi_enable(struct fbnic_net *fbn)
 	}
 	wrfl();
 }
+
+void fbnic_napi_depletion_check(struct net_device *netdev)
+{
+	struct fbnic_net *fbn = netdev_priv(netdev);
+	u32 irqs[FBNIC_MAX_MSIX_VECS / 32] = {};
+	struct fbnic_dev *fbd = fbn->fbd;
+	struct fbnic_napi_vector *nv;
+	int i, j;
+
+	list_for_each_entry(nv, &fbn->napis, napis) {
+		/* Find RQs which are completely out of pages */
+		for (i = nv->txt_count, j = 0; j < nv->rxt_count; j++, i++) {
+			/* Assume 4 pages is always enough to fit a packet
+			 * and therefore generate a completion and an IRQ.
+			 */
+			if (fbnic_desc_used(&nv->qt[i].sub0) < 4 ||
+			    fbnic_desc_used(&nv->qt[i].sub1) < 4)
+				irqs[nv->v_idx / 32] |= BIT(nv->v_idx % 32);
+		}
+	}
+
+	for (i = 0; i < ARRAY_SIZE(irqs); i++) {
+		if (!irqs[i])
+			continue;
+		wr32(FBNIC_INTR_MASK_CLEAR(i), irqs[i]);
+		wr32(FBNIC_INTR_SET(i), irqs[i]);
+	}
+	wrfl();
+}
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
index 0c424c49866d..4e43d41c781a 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
@@ -5,6 +5,7 @@
 #define _FBNIC_TXRX_H_
 
 #include <linux/netdevice.h>
+#include <linux/skbuff.h>
 #include <linux/types.h>
 #include <net/xdp.h>
 
@@ -118,6 +119,7 @@ void fbnic_config_drop_mode(struct fbnic_net *fbn);
 void fbnic_flush(struct fbnic_net *fbn);
 void fbnic_fill(struct fbnic_net *fbn);
 
+void fbnic_napi_depletion_check(struct net_device *netdev);
 int fbnic_wait_all_queues_idle(struct fbnic_dev *fbd, bool may_fail);
 
 #endif /* _FBNIC_TXRX_H_ */
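The completion-queue walk in fbnic_clean_rcq() above uses a DONE bit whose expected value toggles on every lap of the ring, so the consumer never has to zero out consumed descriptors. A minimal userspace sketch of that convention (ring size and names here are illustrative, not the driver's):

```c
#include <stdint.h>

#define RING_SIZE 8			/* power of two, like the real rings */
#define DESC_DONE (1ull << 63)		/* stand-in for FBNIC_RCD_DONE */

/* Count descriptors ready at 'head' using the toggled-DONE convention:
 * the expected DONE value flips each time 'head' wraps the ring, so a
 * descriptor left over from the previous lap compares equal to the
 * expected value and stops the walk.
 */
static unsigned int ready_descs(const uint64_t *ring, uint32_t head)
{
	uint64_t done = (head & RING_SIZE) ? DESC_DONE : 0;
	unsigned int count = 0;

	while (count < RING_SIZE &&
	       (ring[head & (RING_SIZE - 1)] & DESC_DONE) != done) {
		count++;
		head++;
		/* Flip the expected DONE value on wrap */
		if (!(head & (RING_SIZE - 1)))
			done ^= DESC_DONE;
	}

	return count;
}
```

On the first lap the producer sets DONE and the consumer expects it set; after a wrap the producer clears it, which is why the stale lap-one descriptors need no explicit invalidation.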




* [net-next PATCH 14/15] eth: fbnic: add L2 address programming
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
                   ` (12 preceding siblings ...)
  2024-04-03 20:09 ` [net-next PATCH 13/15] eth: fbnic: add basic Rx handling Alexander Duyck
@ 2024-04-03 20:09 ` Alexander Duyck
  2024-04-03 20:09 ` [net-next PATCH 15/15] eth: fbnic: write the TCAM tables used for RSS control and Rx to host Alexander Duyck
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 20:09 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Duyck, kuba, davem, pabeni

From: Alexander Duyck <alexanderduyck@fb.com>

Program the Rx TCAM to control L2 forwarding. Since we are in full
control of the NIC we need to make sure we include BMC forwarding in
the rules. When the host is not present the BMC will program the TCAM
to get itself onto the network, but once we take ownership it is up to
the Linux driver to make sure the BMC L2 addresses are handled
correctly.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
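As a side note on the register layout used below: each MACDA TCAM entry spans several 32-bit registers, with the mask in the high 16 bits and the value in the low 16 bits of each (FBNIC_RPC_TCAM_MACDA_MASK/VALUE), written least-significant address word first. A hypothetical userspace sketch of that packing (helper and constant names are illustrative, not the driver's API):

```c
#include <stdint.h>
#include <stdio.h>

#define MACDA_VALUE(x) ((uint32_t)(x) & 0xffff)		/* bits 15:0 */
#define MACDA_MASK(x)  (((uint32_t)(x) & 0xffff) << 16)	/* bits 31:16 */

/* Pack a 6-byte address and mask (network byte order) into three
 * 32-bit register words, walking from the least-significant 16-bit
 * word up, the same order fbnic_write_macda_entry() writes them.
 */
static void pack_macda(const uint8_t addr[6], const uint8_t mask[6],
		       uint32_t regs[3])
{
	int i;

	for (i = 0; i < 3; i++) {
		/* Word 0 holds addr bytes 4-5, word 1 bytes 2-3, ... */
		uint16_t v = (uint16_t)(addr[4 - 2 * i] << 8) | addr[5 - 2 * i];
		uint16_t m = (uint16_t)(mask[4 - 2 * i] << 8) | mask[5 - 2 * i];

		regs[i] = MACDA_MASK(m) | MACDA_VALUE(v);
	}
}
```

An all-zero mask corresponds to an exact-match entry, matching how __fbnic_uc_sync() zeroes the mask when it claims a slot.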
 drivers/net/ethernet/meta/fbnic/Makefile        |    1 
 drivers/net/ethernet/meta/fbnic/fbnic.h         |   10 +
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h     |   14 +
 drivers/net/ethernet/meta/fbnic/fbnic_devlink.c |    2 
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c  |  231 ++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.h  |    3 
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c     |    3 
 drivers/net/ethernet/meta/fbnic/fbnic_rpc.c     |  338 +++++++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_rpc.h     |  139 +++++++++
 9 files changed, 741 insertions(+)
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_rpc.c
 create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_rpc.h
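The new fbnic_rpc.h documents an expected ordering for the TCAM entry states (Disabled -> Add -> Valid -> Add/Update or Delete -> Disabled). A hypothetical model of those transitions, mirroring the FBNIC_TCAM_S_* values but not itself part of the driver:

```c
#include <stdbool.h>

enum tcam_state {
	TCAM_S_DISABLED	= 0,
	TCAM_S_VALID	= 1,
	TCAM_S_ADD	= 2,	/* S_UPDATE aliases S_ADD */
	TCAM_S_DELETE	= 3,
};

/* Return true if moving from 'from' to 'to' follows the ordering
 * described in fbnic_rpc.h: entries start Disabled, are staged with
 * Add, become Valid once written, and from Valid may be re-staged
 * (Add/Update) or marked Delete, which returns them to Disabled
 * once cleared in hardware.
 */
static bool tcam_transition_ok(enum tcam_state from, enum tcam_state to)
{
	switch (from) {
	case TCAM_S_DISABLED:
		return to == TCAM_S_ADD;
	case TCAM_S_ADD:
		return to == TCAM_S_VALID;
	case TCAM_S_VALID:
		return to == TCAM_S_ADD || to == TCAM_S_DELETE;
	case TCAM_S_DELETE:
		return to == TCAM_S_DISABLED;
	}
	return false;
}
```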

diff --git a/drivers/net/ethernet/meta/fbnic/Makefile b/drivers/net/ethernet/meta/fbnic/Makefile
index f2ea90e0c14f..5210844ebe63 100644
--- a/drivers/net/ethernet/meta/fbnic/Makefile
+++ b/drivers/net/ethernet/meta/fbnic/Makefile
@@ -13,5 +13,6 @@ fbnic-y := fbnic_devlink.o \
 	   fbnic_mac.o \
 	   fbnic_netdev.o \
 	   fbnic_pci.o \
+	   fbnic_rpc.o \
 	   fbnic_tlv.o \
 	   fbnic_txrx.o
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic.h b/drivers/net/ethernet/meta/fbnic/fbnic.h
index 202f005e1cfd..0a62dc129d7e 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic.h
@@ -10,6 +10,7 @@
 #include "fbnic_csr.h"
 #include "fbnic_fw.h"
 #include "fbnic_mac.h"
+#include "fbnic_rpc.h"
 
 struct fbnic_dev {
 	struct device *dev;
@@ -38,6 +39,10 @@ struct fbnic_dev {
 	u32 mps;
 	u32 readrq;
 
+	/* Local copy of the devices TCAM */
+	struct fbnic_mac_addr mac_addr[FBNIC_RPC_TCAM_MACDA_NUM_ENTRIES];
+	u8 mac_addr_boundary;
+
 	/* Tri-state value indicating state of link.
 	 *  0 - Up
 	 *  1 - Down
@@ -103,6 +108,11 @@ static inline bool fbnic_bmc_present(struct fbnic_dev *fbd)
 	return fbd->fw_cap.bmc_present;
 }
 
+static inline void fbnic_bmc_set_present(struct fbnic_dev *fbd, bool present)
+{
+	fbd->fw_cap.bmc_present = present;
+}
+
 static inline bool fbnic_init_failure(struct fbnic_dev *fbd)
 {
 	return !fbd->netdev;
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
index f61b401fdd5c..613b50bf829c 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
@@ -508,8 +508,22 @@ enum {
 #define FBNIC_RPC_RMI_CONFIG_FCS_PRESENT	CSR_BIT(8)
 #define FBNIC_RPC_RMI_CONFIG_ENABLE		CSR_BIT(12)
 #define FBNIC_RPC_RMI_CONFIG_MTU		CSR_GENMASK(31, 16)
+#define FBNIC_RPC_TCAM_MACDA_VALIDATE	0x0852d		/* 0x214b4 */
 #define FBNIC_CSR_END_RPC		0x0856b	/* CSR section delimiter */
 
+/* RPC RAM Registers */
+
+#define FBNIC_CSR_START_RPC_RAM		0x08800	/* CSR section delimiter */
+#define FBNIC_RPC_ACT_TBL_NUM_ENTRIES		64
+
+/* TCAM Tables */
+#define FBNIC_RPC_TCAM_VALIDATE			CSR_BIT(31)
+#define FBNIC_RPC_TCAM_MACDA(m, n) \
+	(0x08b80 + ((n) * 0x20) + (m))		/* 0x022e00 + 0x80*n + 4*m */
+#define FBNIC_RPC_TCAM_MACDA_VALUE		CSR_GENMASK(15, 0)
+#define FBNIC_RPC_TCAM_MACDA_MASK		CSR_GENMASK(31, 16)
+#define FBNIC_CSR_END_RPC_RAM		0x08f1f	/* CSR section delimiter */
+
 /* Fab Registers */
 #define FBNIC_CSR_START_FAB		0x0C000 /* CSR section delimiter */
 #define FBNIC_FAB_AXI4_AR_SPACER_2_CFG		0x0C005		/* 0x30014 */
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_devlink.c b/drivers/net/ethernet/meta/fbnic/fbnic_devlink.c
index 91e8135410df..b007e7bddf81 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_devlink.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_devlink.c
@@ -66,6 +66,8 @@ struct fbnic_dev *fbnic_devlink_alloc(struct pci_dev *pdev)
 
 	fbd->dsn = pci_get_dsn(pdev);
 
+	fbd->mac_addr_boundary = FBNIC_RPC_TCAM_MACDA_DEFAULT_BOUNDARY;
+
 	return fbd;
 }
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
index 792bdfa7429d..349560821435 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
@@ -7,6 +7,7 @@
 
 #include "fbnic.h"
 #include "fbnic_netdev.h"
+#include "fbnic_rpc.h"
 #include "fbnic_txrx.h"
 
 int __fbnic_open(struct fbnic_net *fbn)
@@ -48,6 +49,8 @@ int __fbnic_open(struct fbnic_net *fbn)
 	err = fbnic_mac_enable(fbd);
 	if (err)
 		goto release_ownership;
+	/* Pull the BMC config and initialize the RPC */
+	fbnic_bmc_rpc_init(fbd);
 
 	return 0;
 release_ownership:
@@ -86,12 +89,240 @@ static int fbnic_stop(struct net_device *netdev)
 	return 0;
 }
 
+static int fbnic_uc_sync(struct net_device *netdev, const unsigned char *addr)
+{
+	struct fbnic_net *fbn = netdev_priv(netdev);
+	struct fbnic_mac_addr *avail_addr;
+
+	if (WARN_ON(!is_valid_ether_addr(addr)))
+		return -EADDRNOTAVAIL;
+
+	avail_addr = __fbnic_uc_sync(fbn->fbd, addr);
+	if (!avail_addr)
+		return -ENOSPC;
+
+	/* Add type flag indicating this address is in use by the host */
+	set_bit(FBNIC_MAC_ADDR_T_UNICAST, avail_addr->act_tcam);
+
+	return 0;
+}
+
+static int fbnic_uc_unsync(struct net_device *netdev, const unsigned char *addr)
+{
+	struct fbnic_net *fbn = netdev_priv(netdev);
+	struct fbnic_dev *fbd = fbn->fbd;
+	int i, ret;
+
+	/* Scan from middle of list to bottom, filling bottom up.
+	 * Skip the first entry which is reserved for dev_addr and
+	 * leave the last entry to use for promiscuous filtering.
+	 */
+	for (i = fbd->mac_addr_boundary, ret = -ENOENT;
+	     i < FBNIC_RPC_TCAM_MACDA_HOST_ADDR_IDX && ret; i++) {
+		struct fbnic_mac_addr *mac_addr = &fbd->mac_addr[i];
+
+		if (!ether_addr_equal(mac_addr->value.addr8, addr))
+			continue;
+
+		ret = __fbnic_uc_unsync(mac_addr);
+	}
+
+	return ret;
+}
+
+static int fbnic_mc_sync(struct net_device *netdev, const unsigned char *addr)
+{
+	struct fbnic_net *fbn = netdev_priv(netdev);
+	struct fbnic_mac_addr *avail_addr;
+
+	if (WARN_ON(!is_multicast_ether_addr(addr)))
+		return -EADDRNOTAVAIL;
+
+	avail_addr = __fbnic_mc_sync(fbn->fbd, addr);
+	if (!avail_addr)
+		return -ENOSPC;
+
+	/* Add type flag indicating this address is in use by the host */
+	set_bit(FBNIC_MAC_ADDR_T_MULTICAST, avail_addr->act_tcam);
+
+	return 0;
+}
+
+static int fbnic_mc_unsync(struct net_device *netdev, const unsigned char *addr)
+{
+	struct fbnic_net *fbn = netdev_priv(netdev);
+	struct fbnic_dev *fbd = fbn->fbd;
+	int i, ret;
+
+	/* Scan from middle of list to top, filling top down.
+	 * Skip over the address reserved for the BMC MAC and
+	 * exclude index 0 as that belongs to the broadcast address
+	 */
+	for (i = fbd->mac_addr_boundary, ret = -ENOENT;
+	     --i > FBNIC_RPC_TCAM_MACDA_BROADCAST_IDX && ret;) {
+		struct fbnic_mac_addr *mac_addr = &fbd->mac_addr[i];
+
+		if (!ether_addr_equal(mac_addr->value.addr8, addr))
+			continue;
+
+		ret = __fbnic_mc_unsync(mac_addr);
+	}
+
+	return ret;
+}
+
+void __fbnic_set_rx_mode(struct net_device *netdev)
+{
+	struct fbnic_net *fbn = netdev_priv(netdev);
+	bool uc_promisc = false, mc_promisc = false;
+	struct fbnic_dev *fbd = fbn->fbd;
+	struct fbnic_mac_addr *mac_addr;
+	int err;
+
+	/* Populate host address from dev_addr */
+	mac_addr = &fbd->mac_addr[FBNIC_RPC_TCAM_MACDA_HOST_ADDR_IDX];
+	if (!ether_addr_equal(mac_addr->value.addr8, netdev->dev_addr) ||
+	    mac_addr->state != FBNIC_TCAM_S_VALID) {
+		ether_addr_copy(mac_addr->value.addr8, netdev->dev_addr);
+		mac_addr->state = FBNIC_TCAM_S_UPDATE;
+		set_bit(FBNIC_MAC_ADDR_T_UNICAST, mac_addr->act_tcam);
+	}
+
+	/* Populate broadcast address if broadcast is enabled */
+	mac_addr = &fbd->mac_addr[FBNIC_RPC_TCAM_MACDA_BROADCAST_IDX];
+	if (netdev->flags & IFF_BROADCAST) {
+		if (!is_broadcast_ether_addr(mac_addr->value.addr8) ||
+		    mac_addr->state != FBNIC_TCAM_S_VALID) {
+			eth_broadcast_addr(mac_addr->value.addr8);
+			mac_addr->state = FBNIC_TCAM_S_ADD;
+		}
+		set_bit(FBNIC_MAC_ADDR_T_BROADCAST, mac_addr->act_tcam);
+	} else if (mac_addr->state == FBNIC_TCAM_S_VALID) {
+		__fbnic_xc_unsync(mac_addr, FBNIC_MAC_ADDR_T_BROADCAST);
+	}
+
+	/* synchronize unicast and multicast address lists */
+	err = __dev_uc_sync(netdev, fbnic_uc_sync, fbnic_uc_unsync);
+	if (err == -ENOSPC)
+		uc_promisc = true;
+	err = __dev_mc_sync(netdev, fbnic_mc_sync, fbnic_mc_unsync);
+	if (err == -ENOSPC)
+		mc_promisc = true;
+
+	uc_promisc |= !!(netdev->flags & IFF_PROMISC);
+	mc_promisc |= !!(netdev->flags & IFF_ALLMULTI) || uc_promisc;
+
+	/* Populate last TCAM entry with promiscuous entry and 0/1 bit mask */
+	mac_addr = &fbd->mac_addr[FBNIC_RPC_TCAM_MACDA_PROMISC_IDX];
+	if (uc_promisc) {
+		if (!is_zero_ether_addr(mac_addr->value.addr8) ||
+		    mac_addr->state != FBNIC_TCAM_S_VALID) {
+			eth_zero_addr(mac_addr->value.addr8);
+			eth_broadcast_addr(mac_addr->mask.addr8);
+			clear_bit(FBNIC_MAC_ADDR_T_ALLMULTI,
+				  mac_addr->act_tcam);
+			set_bit(FBNIC_MAC_ADDR_T_PROMISC,
+				mac_addr->act_tcam);
+			mac_addr->state = FBNIC_TCAM_S_ADD;
+		}
+	} else if (mc_promisc) {
+		/* We have to add a special handler for multicast as the
+		 * BMC may have an all-multi rule already in place. As such
+		 * adding a rule ourselves won't do any good so we will have
+		 * to modify the rules for the ALL MULTI below if the BMC
+		 * already has the rule in place.
+		 */
+		if (!fbd->fw_cap.all_multi &&
+		    (!is_multicast_ether_addr(mac_addr->value.addr8) ||
+		     mac_addr->state != FBNIC_TCAM_S_VALID)) {
+			eth_zero_addr(mac_addr->value.addr8);
+			eth_broadcast_addr(mac_addr->mask.addr8);
+			mac_addr->value.addr8[0] ^= 1;
+			mac_addr->mask.addr8[0] ^= 1;
+			set_bit(FBNIC_MAC_ADDR_T_ALLMULTI,
+				mac_addr->act_tcam);
+			clear_bit(FBNIC_MAC_ADDR_T_PROMISC,
+				  mac_addr->act_tcam);
+			mac_addr->state = FBNIC_TCAM_S_ADD;
+		}
+	} else if (mac_addr->state == FBNIC_TCAM_S_VALID) {
+		if (test_bit(FBNIC_MAC_ADDR_T_BMC, mac_addr->act_tcam)) {
+			clear_bit(FBNIC_MAC_ADDR_T_ALLMULTI,
+				  mac_addr->act_tcam);
+			clear_bit(FBNIC_MAC_ADDR_T_PROMISC,
+				  mac_addr->act_tcam);
+		} else {
+			mac_addr->state = FBNIC_TCAM_S_DELETE;
+		}
+	}
+
+	/* Add rules for BMC all multicast if it is enabled */
+	fbnic_bmc_rpc_all_multi_config(fbd, mc_promisc);
+
+	/* sift out any unshared BMC rules and place them in BMC only section */
+	fbnic_sift_macda(fbd);
+
+	/* Write updates to hardware */
+	fbnic_write_macda(fbd);
+}
+
+static void fbnic_set_rx_mode(struct net_device *netdev)
+{
+	/* no need to update the hardware if we are not running */
+	if (netif_running(netdev))
+		__fbnic_set_rx_mode(netdev);
+}
+
+static int fbnic_set_mac(struct net_device *netdev, void *p)
+{
+	struct sockaddr *addr = p;
+
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EADDRNOTAVAIL;
+
+	eth_hw_addr_set(netdev, addr->sa_data);
+
+	fbnic_set_rx_mode(netdev);
+
+	return 0;
+}
+
+void fbnic_clear_rx_mode(struct net_device *netdev)
+{
+	struct fbnic_net *fbn = netdev_priv(netdev);
+	struct fbnic_dev *fbd = fbn->fbd;
+	int idx;
+
+	for (idx = ARRAY_SIZE(fbd->mac_addr); idx--;) {
+		struct fbnic_mac_addr *mac_addr = &fbd->mac_addr[idx];
+
+		if (mac_addr->state != FBNIC_TCAM_S_VALID)
+			continue;
+
+		bitmap_clear(mac_addr->act_tcam,
+			     FBNIC_MAC_ADDR_T_HOST_START,
+			     FBNIC_MAC_ADDR_T_HOST_LEN);
+
+		if (bitmap_empty(mac_addr->act_tcam,
+				 FBNIC_RPC_TCAM_ACT_NUM_ENTRIES))
+			mac_addr->state = FBNIC_TCAM_S_DELETE;
+	}
+
+	/* Write updates to hardware */
+	fbnic_write_macda(fbd);
+
+	__dev_uc_unsync(netdev, NULL);
+	__dev_mc_unsync(netdev, NULL);
+}
+
 static const struct net_device_ops fbnic_netdev_ops = {
 	.ndo_open		= fbnic_open,
 	.ndo_stop		= fbnic_stop,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_start_xmit		= fbnic_xmit_frame,
 	.ndo_features_check	= fbnic_features_check,
+	.ndo_set_mac_address	= fbnic_set_mac,
+	.ndo_set_rx_mode	= fbnic_set_rx_mode,
 };
 
 void fbnic_reset_queues(struct fbnic_net *fbn,
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
index 3976fb1a0eac..40e155cf1865 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
@@ -46,4 +46,7 @@ void fbnic_netdev_unregister(struct net_device *netdev);
 void fbnic_reset_queues(struct fbnic_net *fbn,
 			unsigned int tx, unsigned int rx);
 
+void __fbnic_set_rx_mode(struct net_device *netdev);
+void fbnic_clear_rx_mode(struct net_device *netdev);
+
 #endif /* _FBNIC_NETDEV_H_ */
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
index d897b0d65abf..fbd2c15c9a99 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
@@ -133,6 +133,8 @@ void fbnic_up(struct fbnic_net *fbn)
 
 	fbnic_fill(fbn);
 
+	__fbnic_set_rx_mode(fbn->netdev);
+
 	/* Enable Tx/Rx processing */
 	fbnic_napi_enable(fbn);
 	netif_tx_start_all_queues(fbn->netdev);
@@ -148,6 +150,7 @@ static void fbnic_down_noidle(struct fbnic_net *fbn)
 	fbnic_napi_disable(fbn);
 	netif_tx_disable(fbn->netdev);
 
+	fbnic_clear_rx_mode(fbn->netdev);
 	fbnic_disable(fbn);
 }
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c
new file mode 100644
index 000000000000..ac8814778919
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c
@@ -0,0 +1,338 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#include <linux/etherdevice.h>
+
+#include "fbnic.h"
+#include "fbnic_netdev.h"
+#include "fbnic_rpc.h"
+
+static int fbnic_read_macda_entry(struct fbnic_dev *fbd, unsigned int idx,
+				  struct fbnic_mac_addr *mac_addr)
+{
+	__be16 *mask, *value;
+	int i;
+
+	mask = &mac_addr->mask.addr16[FBNIC_RPC_TCAM_MACDA_WORD_LEN - 1];
+	value = &mac_addr->value.addr16[FBNIC_RPC_TCAM_MACDA_WORD_LEN - 1];
+
+	for (i = 0; i < FBNIC_RPC_TCAM_MACDA_WORD_LEN; i++) {
+		u32 macda = rd32(FBNIC_RPC_TCAM_MACDA(idx, i));
+
+		*mask-- = htons(FIELD_GET(FBNIC_RPC_TCAM_MACDA_MASK, macda));
+		*value-- = htons(FIELD_GET(FBNIC_RPC_TCAM_MACDA_VALUE, macda));
+	}
+
+	return (rd32(FBNIC_RPC_TCAM_MACDA(idx, i)) &
+		FBNIC_RPC_TCAM_VALIDATE) ? 0 : -EINVAL;
+}
+
+void fbnic_bmc_rpc_all_multi_config(struct fbnic_dev *fbd,
+				    bool enable_host)
+{
+	struct fbnic_mac_addr *mac_addr;
+
+	/* We need to add the all multicast filter at the end of the
+	 * multicast address list. This way if there are any that are
+	 * shared between the host and the BMC they can be directed to
+	 * both. Otherwise the remainder just get sent directly to the
+	 * BMC.
+	 */
+	mac_addr = &fbd->mac_addr[fbd->mac_addr_boundary - 1];
+	if (fbnic_bmc_present(fbd) && fbd->fw_cap.all_multi) {
+		if (mac_addr->state != FBNIC_TCAM_S_VALID) {
+			eth_zero_addr(mac_addr->value.addr8);
+			eth_broadcast_addr(mac_addr->mask.addr8);
+			mac_addr->value.addr8[0] ^= 1;
+			mac_addr->mask.addr8[0] ^= 1;
+			set_bit(FBNIC_MAC_ADDR_T_BMC, mac_addr->act_tcam);
+			mac_addr->state = FBNIC_TCAM_S_ADD;
+		}
+		if (enable_host)
+			set_bit(FBNIC_MAC_ADDR_T_ALLMULTI,
+				mac_addr->act_tcam);
+		else
+			clear_bit(FBNIC_MAC_ADDR_T_ALLMULTI,
+				  mac_addr->act_tcam);
+	} else if (!test_bit(FBNIC_MAC_ADDR_T_BMC, mac_addr->act_tcam) &&
+		   !is_zero_ether_addr(mac_addr->mask.addr8) &&
+		   mac_addr->state == FBNIC_TCAM_S_VALID) {
+		clear_bit(FBNIC_MAC_ADDR_T_ALLMULTI, mac_addr->act_tcam);
+		clear_bit(FBNIC_MAC_ADDR_T_BMC, mac_addr->act_tcam);
+		mac_addr->state = FBNIC_TCAM_S_DELETE;
+	}
+}
+
+void fbnic_bmc_rpc_init(struct fbnic_dev *fbd)
+{
+	int i = FBNIC_RPC_TCAM_MACDA_BMC_ADDR_IDX;
+	struct fbnic_mac_addr *mac_addr;
+	u32 macda_validate;
+
+	/* Verify that RPC is already enabled, if not abort */
+	macda_validate = rd32(FBNIC_RPC_TCAM_MACDA_VALIDATE);
+	if (!(macda_validate & (1u << i)))
+		return;
+
+	/* Read BMC MACDA entry and validate it */
+	mac_addr = &fbd->mac_addr[i];
+	if (fbnic_read_macda_entry(fbd, i, mac_addr))
+		return;
+
+	/* If BMC MAC addr is valid then record it and flag it as valid */
+	if (!is_valid_ether_addr(mac_addr->value.addr8))
+		return;
+
+	set_bit(FBNIC_MAC_ADDR_T_BMC, mac_addr->act_tcam);
+	mac_addr->state = FBNIC_TCAM_S_VALID;
+
+	/* Record the BMC Multicast addresses */
+	for (i++; i < FBNIC_RPC_TCAM_MACDA_BROADCAST_IDX; i++) {
+		if (!(macda_validate & (1u << i)))
+			continue;
+
+		mac_addr = &fbd->mac_addr[i];
+		if (fbnic_read_macda_entry(fbd, i, mac_addr))
+			continue;
+
+		if (is_broadcast_ether_addr(mac_addr->value.addr8)) {
+			mac_addr->state = FBNIC_TCAM_S_DELETE;
+			continue;
+		}
+
+		if (!is_multicast_ether_addr(mac_addr->value.addr8))
+			continue;
+
+		set_bit(FBNIC_MAC_ADDR_T_BMC, mac_addr->act_tcam);
+		mac_addr->state = FBNIC_TCAM_S_VALID;
+	}
+
+	/* Validate Broadcast is also present, record it and tag it */
+	if (macda_validate & (1u << i)) {
+		mac_addr = &fbd->mac_addr[i];
+		eth_broadcast_addr(mac_addr->value.addr8);
+		set_bit(FBNIC_MAC_ADDR_T_BMC, mac_addr->act_tcam);
+		mac_addr->state = FBNIC_TCAM_S_ADD;
+	}
+
+	/* Record the shared BMC Multicast addresses */
+	for (i++; i <= FBNIC_RPC_TCAM_MACDA_PROMISC_IDX; i++) {
+		if (!(macda_validate & (1u << i)))
+			continue;
+
+		mac_addr = &fbd->mac_addr[i];
+		if (fbnic_read_macda_entry(fbd, i, mac_addr))
+			continue;
+
+		if (!is_multicast_ether_addr(mac_addr->value.addr8))
+			continue;
+
+		/* If it isn't an exact match filter it must be an all-multi */
+		if (!is_zero_ether_addr(mac_addr->mask.addr8)) {
+			fbd->fw_cap.all_multi = 1;
+
+			/* If it isn't in the correct spot don't record it */
+			if (i != fbd->mac_addr_boundary - 1)
+				continue;
+		}
+
+		set_bit(FBNIC_MAC_ADDR_T_BMC, mac_addr->act_tcam);
+		mac_addr->state = FBNIC_TCAM_S_VALID;
+	}
+
+	fbnic_bmc_rpc_all_multi_config(fbd, false);
+
+	fbnic_bmc_set_present(fbd, true);
+}
+
+struct fbnic_mac_addr *__fbnic_uc_sync(struct fbnic_dev *fbd,
+				       const unsigned char *addr)
+{
+	struct fbnic_mac_addr *avail_addr = NULL;
+	unsigned int i;
+
+	/* Scan from middle of list to bottom, filling bottom up.
+	 * Skip the first entry which is reserved for dev_addr and
+	 * leave the last entry to use for promiscuous filtering.
+	 */
+	for (i = fbd->mac_addr_boundary - 1;
+	     i < FBNIC_RPC_TCAM_MACDA_HOST_ADDR_IDX; i++) {
+		struct fbnic_mac_addr *mac_addr = &fbd->mac_addr[i];
+
+		if (mac_addr->state == FBNIC_TCAM_S_DISABLED) {
+			avail_addr = mac_addr;
+		} else if (ether_addr_equal(mac_addr->value.addr8, addr)) {
+			avail_addr = mac_addr;
+			break;
+		}
+	}
+
+	if (avail_addr && avail_addr->state == FBNIC_TCAM_S_DISABLED) {
+		ether_addr_copy(avail_addr->value.addr8, addr);
+		eth_zero_addr(avail_addr->mask.addr8);
+		avail_addr->state = FBNIC_TCAM_S_ADD;
+	}
+
+	return avail_addr;
+}
+
+struct fbnic_mac_addr *__fbnic_mc_sync(struct fbnic_dev *fbd,
+				       const unsigned char *addr)
+{
+	struct fbnic_mac_addr *avail_addr = NULL;
+	unsigned int i;
+
+	/* Scan from middle of list to top, filling top down.
+	 * Skip over the address reserved for the BMC MAC and
+	 * exclude index 0 as that belongs to the broadcast address
+	 */
+	for (i = fbd->mac_addr_boundary;
+	     --i > FBNIC_RPC_TCAM_MACDA_BROADCAST_IDX;) {
+		struct fbnic_mac_addr *mac_addr = &fbd->mac_addr[i];
+
+		if (mac_addr->state == FBNIC_TCAM_S_DISABLED) {
+			avail_addr = mac_addr;
+		} else if (ether_addr_equal(mac_addr->value.addr8, addr)) {
+			avail_addr = mac_addr;
+			break;
+		}
+	}
+
+	/* Scan the BMC addresses to see if it may have already
+	 * reserved the address.
+	 */
+	while (--i) {
+		struct fbnic_mac_addr *mac_addr = &fbd->mac_addr[i];
+
+		if (!is_zero_ether_addr(mac_addr->mask.addr8))
+			continue;
+
+		/* Only move on if we find a match */
+		if (!ether_addr_equal(mac_addr->value.addr8, addr))
+			continue;
+
+		/* We need to pull this address to the shared area */
+		if (avail_addr) {
+			memcpy(avail_addr, mac_addr, sizeof(*mac_addr));
+			mac_addr->state = FBNIC_TCAM_S_DELETE;
+			avail_addr->state = FBNIC_TCAM_S_ADD;
+		}
+
+		break;
+	}
+
+	if (avail_addr && avail_addr->state == FBNIC_TCAM_S_DISABLED) {
+		ether_addr_copy(avail_addr->value.addr8, addr);
+		eth_zero_addr(avail_addr->mask.addr8);
+		avail_addr->state = FBNIC_TCAM_S_ADD;
+	}
+
+	return avail_addr;
+}
+
+int __fbnic_xc_unsync(struct fbnic_mac_addr *mac_addr, unsigned int tcam_idx)
+{
+	if (!test_and_clear_bit(tcam_idx, mac_addr->act_tcam))
+		return -ENOENT;
+
+	if (bitmap_empty(mac_addr->act_tcam, FBNIC_RPC_TCAM_ACT_NUM_ENTRIES))
+		mac_addr->state = FBNIC_TCAM_S_DELETE;
+
+	return 0;
+}
+
+void fbnic_sift_macda(struct fbnic_dev *fbd)
+{
+	int dest, src;
+
+	/* move BMC only addresses back into BMC region */
+	for (dest = FBNIC_RPC_TCAM_MACDA_BMC_ADDR_IDX,
+	     src = FBNIC_RPC_TCAM_MACDA_MULTICAST_IDX;
+	     ++dest < FBNIC_RPC_TCAM_MACDA_BROADCAST_IDX &&
+	     src < fbd->mac_addr_boundary;) {
+		struct fbnic_mac_addr *dest_addr = &fbd->mac_addr[dest];
+
+		if (dest_addr->state != FBNIC_TCAM_S_DISABLED)
+			continue;
+
+		while (src < fbd->mac_addr_boundary) {
+			struct fbnic_mac_addr *src_addr = &fbd->mac_addr[src++];
+
+			/* Verify BMC bit is set */
+			if (!test_bit(FBNIC_MAC_ADDR_T_BMC, src_addr->act_tcam))
+				continue;
+
+			/* Verify filter isn't already disabled */
+			if (src_addr->state == FBNIC_TCAM_S_DISABLED ||
+			    src_addr->state == FBNIC_TCAM_S_DELETE)
+				continue;
+
+			/* Verify only BMC bit is set */
+			if (bitmap_weight(src_addr->act_tcam,
+					  FBNIC_RPC_TCAM_ACT_NUM_ENTRIES) != 1)
+				continue;
+
+			/* Verify we are not moving wildcard address */
+			if (!is_zero_ether_addr(src_addr->mask.addr8))
+				continue;
+
+			memcpy(dest_addr, src_addr, sizeof(*src_addr));
+			src_addr->state = FBNIC_TCAM_S_DELETE;
+			dest_addr->state = FBNIC_TCAM_S_ADD;
+		}
+	}
+}
+
+static void fbnic_clear_macda_entry(struct fbnic_dev *fbd, unsigned int idx)
+{
+	int i;
+
+	/* invalidate entry and clear addr state info */
+	for (i = 0; i <= FBNIC_RPC_TCAM_MACDA_WORD_LEN; i++)
+		wr32(FBNIC_RPC_TCAM_MACDA(idx, i), 0);
+}
+
+static void fbnic_write_macda_entry(struct fbnic_dev *fbd, unsigned int idx,
+				    struct fbnic_mac_addr *mac_addr)
+{
+	__be16 *mask, *value;
+	int i;
+
+	mask = &mac_addr->mask.addr16[FBNIC_RPC_TCAM_MACDA_WORD_LEN - 1];
+	value = &mac_addr->value.addr16[FBNIC_RPC_TCAM_MACDA_WORD_LEN - 1];
+
+	for (i = 0; i < FBNIC_RPC_TCAM_MACDA_WORD_LEN; i++)
+		wr32(FBNIC_RPC_TCAM_MACDA(idx, i),
+		     FIELD_PREP(FBNIC_RPC_TCAM_MACDA_MASK, ntohs(*mask--)) |
+		     FIELD_PREP(FBNIC_RPC_TCAM_MACDA_VALUE, ntohs(*value--)));
+
+	wrfl();
+
+	wr32(FBNIC_RPC_TCAM_MACDA(idx, i), FBNIC_RPC_TCAM_VALIDATE);
+}
+
+void fbnic_write_macda(struct fbnic_dev *fbd)
+{
+	int idx;
+
+	for (idx = ARRAY_SIZE(fbd->mac_addr); idx--;) {
+		struct fbnic_mac_addr *mac_addr = &fbd->mac_addr[idx];
+
+		/* Check if update flag is set else exit. */
+		if (!(mac_addr->state & FBNIC_TCAM_S_UPDATE))
+			continue;
+
+		/* Clear by writing 0s. */
+		if (mac_addr->state == FBNIC_TCAM_S_DELETE) {
+			/* invalidate entry and clear addr state info */
+			fbnic_clear_macda_entry(fbd, idx);
+			memset(mac_addr, 0, sizeof(*mac_addr));
+
+			continue;
+		}
+
+		fbnic_write_macda_entry(fbd, idx, mac_addr);
+
+		mac_addr->state = FBNIC_TCAM_S_VALID;
+	}
+}
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_rpc.h b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.h
new file mode 100644
index 000000000000..1b59b10ba677
--- /dev/null
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.h
@@ -0,0 +1,139 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#ifndef _FBNIC_RPC_H_
+#define _FBNIC_RPC_H_
+
+#include <uapi/linux/in6.h>
+#include <linux/bitfield.h>
+
+/*  The TCAM state definitions follow an expected ordering.
+ *  They start out disabled, then move through the following states:
+ *  Disabled  0	-> Add	      2
+ *  Add	      2	-> Valid      1
+ *
+ *  Valid     1	-> Add/Update 2
+ *  Add	      2	-> Valid      1
+ *
+ *  Valid     1	-> Delete     3
+ *  Delete    3	-> Disabled   0
+ */
+enum {
+	FBNIC_TCAM_S_DISABLED	= 0,
+	FBNIC_TCAM_S_VALID	= 1,
+	FBNIC_TCAM_S_ADD	= 2,
+	FBNIC_TCAM_S_UPDATE	= FBNIC_TCAM_S_ADD,
+	FBNIC_TCAM_S_DELETE	= 3,
+};
+
+/* 32 MAC Destination Address TCAM Entries
+ * 4 registers DA[1:0], DA[3:2], DA[5:4], Validate
+ */
+#define FBNIC_RPC_TCAM_MACDA_WORD_LEN		3
+#define FBNIC_RPC_TCAM_MACDA_NUM_ENTRIES	32
+
+#define FBNIC_RPC_TCAM_ACT_WORD_LEN		11
+#define FBNIC_RPC_TCAM_ACT_NUM_ENTRIES		64
+
+struct fbnic_mac_addr {
+	union {
+		unsigned char addr8[ETH_ALEN];
+		__be16 addr16[FBNIC_RPC_TCAM_MACDA_WORD_LEN];
+	} mask, value;
+	unsigned char state;
+	DECLARE_BITMAP(act_tcam, FBNIC_RPC_TCAM_ACT_NUM_ENTRIES);
+};
+
+struct fbnic_act_tcam {
+	struct {
+		u16 tcam[FBNIC_RPC_TCAM_ACT_WORD_LEN];
+	} mask, value;
+	unsigned char state;
+	u16 rss_en_mask;
+	u32 dest;
+};
+
+enum {
+	FBNIC_RSS_EN_HOST_ETHER,
+	FBNIC_RSS_EN_XCAST_ETHER,
+#define FBNIC_RSS_EN_NUM_UNICAST FBNIC_RSS_EN_XCAST_ETHER
+	FBNIC_RSS_EN_NUM_ENTRIES
+};
+
+/* Reserve the first 2 entries for use by the BMC so that we can
+ * avoid rules getting in the way of BMC unicast traffic.
+ */
+#define FBNIC_RPC_ACT_TBL_BMC_OFFSET		0
+#define FBNIC_RPC_ACT_TBL_BMC_ALL_MULTI_OFFSET	1
+
+/* We reserve the last 14 entries for RSS rules on the host. The BMC
+ * unicast rule will need to be populated above these and is expected to
+ * use MACDA TCAM entry 23 to store the BMC MAC address.
+ */
+#define FBNIC_RPC_ACT_TBL_RSS_OFFSET \
+	(FBNIC_RPC_ACT_TBL_NUM_ENTRIES - FBNIC_RSS_EN_NUM_ENTRIES)
+
+/* Flags used to identify the owner of this MAC filter. Note that any
+ * flags set for Broadcast through Promisc indicate that the rule belongs
+ * to the RSS filters for the host.
+ */
+enum {
+	FBNIC_MAC_ADDR_T_BMC            = 0,
+	FBNIC_MAC_ADDR_T_BROADCAST	= FBNIC_RPC_ACT_TBL_RSS_OFFSET,
+#define FBNIC_MAC_ADDR_T_HOST_START	FBNIC_MAC_ADDR_T_BROADCAST
+	FBNIC_MAC_ADDR_T_MULTICAST,
+	FBNIC_MAC_ADDR_T_UNICAST,
+	FBNIC_MAC_ADDR_T_ALLMULTI,	/* BROADCAST ... MULTICAST */
+	FBNIC_MAC_ADDR_T_PROMISC,	/* BROADCAST ... UNICAST */
+	FBNIC_MAC_ADDR_T_HOST_LAST
+};
+
+#define FBNIC_MAC_ADDR_T_HOST_LEN \
+	(FBNIC_MAC_ADDR_T_HOST_LAST - FBNIC_MAC_ADDR_T_HOST_START)
+
+#define FBNIC_RPC_TCAM_ACT1_L2_MACDA_IDX	CSR_GENMASK(9, 5)
+#define FBNIC_RPC_TCAM_ACT1_L2_MACDA_VALID	CSR_BIT(10)
+
+/* TCAM 0 - 3 reserved for BMC MAC addresses */
+#define FBNIC_RPC_TCAM_MACDA_BMC_ADDR_IDX	0
+/* TCAM 4 reserved for broadcast MAC address */
+#define FBNIC_RPC_TCAM_MACDA_BROADCAST_IDX	4
+/* TCAMs 5 - 30 will be used for multicast and unicast addresses. The
+ * boundary between the two is variable; it is currently set to 24,
+ * which is where the unicast addresses start. The general idea is that
+ * we will always go top-down with unicast, and bottom-up with multicast,
+ * so that there should be free space in the middle between the two.
+ *
+ * The entry at MACDA_DEFAULT_BOUNDARY is a special case as it can be
+ * used for the ALL MULTI address if the list is full, or if the BMC has
+ * requested it.
+ */
+#define FBNIC_RPC_TCAM_MACDA_MULTICAST_IDX	5
+#define FBNIC_RPC_TCAM_MACDA_DEFAULT_BOUNDARY	24
+#define FBNIC_RPC_TCAM_MACDA_HOST_ADDR_IDX	30
+/* Reserved for use to record Multicast promisc, or Promiscuous */
+#define FBNIC_RPC_TCAM_MACDA_PROMISC_IDX	31
+
+struct fbnic_dev;
+
+void fbnic_bmc_rpc_init(struct fbnic_dev *fbd);
+void fbnic_bmc_rpc_all_multi_config(struct fbnic_dev *fbd, bool enable_host);
+
+int __fbnic_xc_unsync(struct fbnic_mac_addr *mac_addr, unsigned int tcam_idx);
+struct fbnic_mac_addr *__fbnic_uc_sync(struct fbnic_dev *fbd,
+				       const unsigned char *addr);
+struct fbnic_mac_addr *__fbnic_mc_sync(struct fbnic_dev *fbd,
+				       const unsigned char *addr);
+void fbnic_sift_macda(struct fbnic_dev *fbd);
+void fbnic_write_macda(struct fbnic_dev *fbd);
+
+static inline int __fbnic_uc_unsync(struct fbnic_mac_addr *mac_addr)
+{
+	return __fbnic_xc_unsync(mac_addr, FBNIC_MAC_ADDR_T_UNICAST);
+}
+
+static inline int __fbnic_mc_unsync(struct fbnic_mac_addr *mac_addr)
+{
+	return __fbnic_xc_unsync(mac_addr, FBNIC_MAC_ADDR_T_MULTICAST);
+}
+#endif /* _FBNIC_RPC_H_ */
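Throughout this file the TCAM entries carry a value/mask pair per 16-bit word, with set mask bits treated as "don't care" (which is why unused words are masked with 0xffff in the driver). A userspace sketch of that matching semantic, under that assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A TCAM word matches when the lookup key agrees with the programmed
 * value on every bit that is NOT set in the mask; set mask bits are
 * ignored ("don't care").
 */
static bool tcam_word_match(uint16_t key, uint16_t value, uint16_t mask)
{
	return ((uint16_t)(key ^ value) & (uint16_t)~mask) == 0;
}
```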



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [net-next PATCH 15/15] eth: fbnic: write the TCAM tables used for RSS control and Rx to host
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
                   ` (13 preceding siblings ...)
  2024-04-03 20:09 ` [net-next PATCH 14/15] eth: fbnic: add L2 address programming Alexander Duyck
@ 2024-04-03 20:09 ` Alexander Duyck
  2024-04-03 20:42 ` [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Bjorn Helgaas
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 20:09 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Duyck, kuba, davem, pabeni

From: Alexander Duyck <alexanderduyck@fb.com>

RSS is controlled by the Rx filter tables. Program rules matching
on appropriate traffic types and set hashing fields using actions.
We need a separate set of rules for broadcast and multicast
because the action there needs to include forwarding to BMC.

This patch only initializes the default settings; control of the
configuration using ethtool will come soon.

With this, the necessary rules are in place to enable Rx of packets by
the host.
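The default RSS setup fills both indirection tables with ethtool_rxfh_indir_default(), which spreads table entries round-robin across the Rx queues. Assuming the kernel's definition (entry index modulo queue count), the effect can be sketched in userspace:

```c
#include <assert.h>

/* Models the kernel's ethtool_rxfh_indir_default(): entry i of the
 * indirection table points at Rx queue i modulo the queue count, so
 * hash buckets are spread evenly across all Rx queues.
 */
static unsigned int rxfh_indir_default(unsigned int index,
				       unsigned int n_rx_rings)
{
	return index % n_rx_rings;
}
```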

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 drivers/net/ethernet/meta/fbnic/fbnic.h        |    1 
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h    |   59 ++++
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c |    6 
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.h |    7 
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c    |    4 
 drivers/net/ethernet/meta/fbnic/fbnic_rpc.c    |  371 ++++++++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_rpc.h    |   52 +++
 7 files changed, 499 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/meta/fbnic/fbnic.h b/drivers/net/ethernet/meta/fbnic/fbnic.h
index 0a62dc129d7e..5186d097cb8b 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic.h
@@ -40,6 +40,7 @@ struct fbnic_dev {
 	u32 readrq;
 
 	/* Local copy of the devices TCAM */
+	struct fbnic_act_tcam act_tcam[FBNIC_RPC_TCAM_ACT_NUM_ENTRIES];
 	struct fbnic_mac_addr mac_addr[FBNIC_RPC_TCAM_MACDA_NUM_ENTRIES];
 	u8 mac_addr_boundary;
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
index 613b50bf829c..4fb572421202 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
@@ -508,20 +508,79 @@ enum {
 #define FBNIC_RPC_RMI_CONFIG_FCS_PRESENT	CSR_BIT(8)
 #define FBNIC_RPC_RMI_CONFIG_ENABLE		CSR_BIT(12)
 #define FBNIC_RPC_RMI_CONFIG_MTU		CSR_GENMASK(31, 16)
+
+#define FBNIC_RPC_ACT_TBL0_DEFAULT	0x0840a		/* 0x21028 */
+#define FBNIC_RPC_ACT_TBL0_DROP			CSR_BIT(0)
+#define FBNIC_RPC_ACT_TBL0_DEST_MASK		CSR_GENMASK(3, 1)
+enum {
+	FBNIC_RPC_ACT_TBL0_DEST_HOST	= 1,
+	FBNIC_RPC_ACT_TBL0_DEST_BMC	= 2,
+	FBNIC_RPC_ACT_TBL0_DEST_EI	= 4,
+};
+
+#define FBNIC_RPC_ACT_TBL0_DMA_HINT		CSR_GENMASK(24, 16)
+#define FBNIC_RPC_ACT_TBL0_RSS_CTXT_ID		CSR_BIT(30)
+
+#define FBNIC_RPC_ACT_TBL1_DEFAULT	0x0840b		/* 0x2102c */
+#define FBNIC_RPC_ACT_TBL1_RSS_ENA_MASK		CSR_GENMASK(15, 0)
+enum {
+	FBNIC_RPC_ACT_TBL1_RSS_ENA_IP_SRC	= 1,
+	FBNIC_RPC_ACT_TBL1_RSS_ENA_IP_DST	= 2,
+	FBNIC_RPC_ACT_TBL1_RSS_ENA_L4_SRC	= 4,
+	FBNIC_RPC_ACT_TBL1_RSS_ENA_L4_DST	= 8,
+	FBNIC_RPC_ACT_TBL1_RSS_ENA_L2_DA	= 16,
+	FBNIC_RPC_ACT_TBL1_RSS_ENA_L4_RSS_BYTE	= 32,
+	FBNIC_RPC_ACT_TBL1_RSS_ENA_IV6_FL_LBL	= 64,
+	FBNIC_RPC_ACT_TBL1_RSS_ENA_OV6_FL_LBL	= 128,
+	FBNIC_RPC_ACT_TBL1_RSS_ENA_DSCP		= 256,
+	FBNIC_RPC_ACT_TBL1_RSS_ENA_L3_PROT	= 512,
+	FBNIC_RPC_ACT_TBL1_RSS_ENA_L4_PROT	= 1024,
+};
+
+#define FBNIC_RPC_RSS_KEY(n)		(0x0840c + (n))	/* 0x21030 + 4*n */
+#define FBNIC_RPC_RSS_KEY_BIT_LEN		425
+#define FBNIC_RPC_RSS_KEY_BYTE_LEN \
+	DIV_ROUND_UP(FBNIC_RPC_RSS_KEY_BIT_LEN, 8)
+#define FBNIC_RPC_RSS_KEY_DWORD_LEN \
+	DIV_ROUND_UP(FBNIC_RPC_RSS_KEY_BIT_LEN, 32)
+#define FBNIC_RPC_RSS_KEY_LAST_IDX \
+	(FBNIC_RPC_RSS_KEY_DWORD_LEN - 1)
+#define FBNIC_RPC_RSS_KEY_LAST_MASK \
+	CSR_GENMASK(31, \
+		    FBNIC_RPC_RSS_KEY_DWORD_LEN * 32 - \
+		    FBNIC_RPC_RSS_KEY_BIT_LEN)
+
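The RSS key macros above derive all storage sizes from the 425-bit key length: 14 dwords, with the last register holding only the top 9 bits. A userspace sanity check of that arithmetic, using plain shifts in place of the CSR_GENMASK helper:

```c
#include <assert.h>
#include <stdint.h>

#define RSS_KEY_BIT_LEN		425
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* 425 bits span 14 32-bit registers; the last register only holds
 * 425 - 13 * 32 = 9 bits, packed into its most-significant end,
 * i.e. GENMASK(31, 23).
 */
static uint32_t rss_key_last_mask(void)
{
	unsigned int dwords = DIV_ROUND_UP(RSS_KEY_BIT_LEN, 32);
	unsigned int low_bit = dwords * 32 - RSS_KEY_BIT_LEN;

	return ~0u << low_bit;
}
```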
 #define FBNIC_RPC_TCAM_MACDA_VALIDATE	0x0852d		/* 0x214b4 */
 #define FBNIC_CSR_END_RPC		0x0856b	/* CSR section delimiter */
 
 /* RPC RAM Registers */
 
 #define FBNIC_CSR_START_RPC_RAM		0x08800	/* CSR section delimiter */
+#define FBNIC_RPC_ACT_TBL0(n)		(0x08800 + (n))	/* 0x22000 + 4*n */
+#define FBNIC_RPC_ACT_TBL1(n)		(0x08840 + (n))	/* 0x22100 + 4*n */
 #define FBNIC_RPC_ACT_TBL_NUM_ENTRIES		64
 
 /* TCAM Tables */
 #define FBNIC_RPC_TCAM_VALIDATE			CSR_BIT(31)
+
+/* 64 Action TCAM Entries, 12 registers
+ * 3 mixed, src port, dst port, 6 L4 words, and Validate
+ */
+#define FBNIC_RPC_TCAM_ACT(m, n) \
+	(0x08880 + ((n) * 0x40) + (m))		/* 0x22200 + 0x100*n + 4*m */
+
+#define FBNIC_RPC_TCAM_ACT_VALUE		CSR_GENMASK(15, 0)
+#define FBNIC_RPC_TCAM_ACT_MASK			CSR_GENMASK(31, 16)
+
 #define FBNIC_RPC_TCAM_MACDA(m, n) \
 	(0x08b80 + ((n) * 0x20) + (m))		/* 0x022e00 + 0x80*n + 4*m */
 #define FBNIC_RPC_TCAM_MACDA_VALUE		CSR_GENMASK(15, 0)
 #define FBNIC_RPC_TCAM_MACDA_MASK		CSR_GENMASK(31, 16)
+
+#define FBNIC_RPC_RSS_TBL(n, m) \
+	(0x08d20 + ((n) * 0x100) + (m))		/* 0x023480 + 0x400*n + 4*m */
+#define FBNIC_RPC_RSS_TBL_COUNT			2
+#define FBNIC_RPC_RSS_TBL_SIZE			256
 #define FBNIC_CSR_END_RPC_RAM		0x08f1f	/* CSR section delimiter */
 
 /* Fab Registers */
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
index 349560821435..2844f0bfb9c4 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
@@ -51,6 +51,7 @@ int __fbnic_open(struct fbnic_net *fbn)
 		goto release_ownership;
 	/* Pull the BMC config and initialize the RPC */
 	fbnic_bmc_rpc_init(fbd);
+	fbnic_rss_reinit(fbd, fbn);
 
 	return 0;
 release_ownership:
@@ -263,6 +264,7 @@ void __fbnic_set_rx_mode(struct net_device *netdev)
 	fbnic_sift_macda(fbd);
 
 	/* Write updates to hardware */
+	fbnic_write_rules(fbd);
 	fbnic_write_macda(fbd);
 }
 
@@ -380,6 +382,10 @@ struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd)
 
 	fbnic_reset_queues(fbn, default_queues, default_queues);
 
+	fbnic_reset_indir_tbl(fbn);
+	fbnic_rss_key_fill(fbn->rss_key);
+	fbnic_rss_init_en_mask(fbn);
+
 	netdev->features |=
 		NETIF_F_RXHASH |
 		NETIF_F_SG |
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
index 40e155cf1865..6cfe820e4bba 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
@@ -6,6 +6,8 @@
 
 #include <linux/types.h>
 
+#include "fbnic_csr.h"
+#include "fbnic_rpc.h"
 #include "fbnic_txrx.h"
 
 struct fbnic_net {
@@ -31,7 +33,12 @@ struct fbnic_net {
 	u16 num_tx_queues;
 	u16 num_rx_queues;
 
+	u8 indir_tbl[FBNIC_RPC_RSS_TBL_COUNT][FBNIC_RPC_RSS_TBL_SIZE];
+	u32 rss_key[FBNIC_RPC_RSS_KEY_DWORD_LEN];
+	u32 rss_flow_hash[FBNIC_NUM_HASH_OPT];
+
 	u64 link_down_events;
+
 	struct list_head napis;
 };
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
index fbd2c15c9a99..82f1afd7ae4b 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
@@ -133,6 +133,8 @@ void fbnic_up(struct fbnic_net *fbn)
 
 	fbnic_fill(fbn);
 
+	fbnic_rss_reinit_hw(fbn->fbd, fbn);
+
 	__fbnic_set_rx_mode(fbn->netdev);
 
 	/* Enable Tx/Rx processing */
@@ -151,6 +153,8 @@ static void fbnic_down_noidle(struct fbnic_net *fbn)
 	netif_tx_disable(fbn->netdev);
 
 	fbnic_clear_rx_mode(fbn->netdev);
+	fbnic_clear_rules(fbn->fbd);
+	fbnic_rss_disable_hw(fbn->fbd);
 	fbnic_disable(fbn);
 }
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c
index ac8814778919..09b06c4a8153 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c
@@ -2,11 +2,102 @@
 /* Copyright (c) Meta Platforms, Inc. and affiliates. */
 
 #include <linux/etherdevice.h>
+#include <linux/ethtool.h>
 
 #include "fbnic.h"
 #include "fbnic_netdev.h"
 #include "fbnic_rpc.h"
 
+void fbnic_reset_indir_tbl(struct fbnic_net *fbn)
+{
+	unsigned int num_rx = fbn->num_rx_queues;
+	unsigned int i;
+
+	for (i = 0; i < FBNIC_RPC_RSS_TBL_SIZE; i++) {
+		fbn->indir_tbl[0][i] = ethtool_rxfh_indir_default(i, num_rx);
+		fbn->indir_tbl[1][i] = ethtool_rxfh_indir_default(i, num_rx);
+	}
+}
+
+void fbnic_rss_key_fill(u32 *buffer)
+{
+	static u32 rss_key[FBNIC_RPC_RSS_KEY_DWORD_LEN];
+
+	net_get_random_once(rss_key, sizeof(rss_key));
+	rss_key[FBNIC_RPC_RSS_KEY_LAST_IDX] &= FBNIC_RPC_RSS_KEY_LAST_MASK;
+
+	memcpy(buffer, rss_key, sizeof(rss_key));
+}
+
+#define RX_HASH_OPT_L4 \
+	(RXH_IP_SRC | RXH_IP_DST | RXH_L4_B_0_1 | RXH_L4_B_2_3)
+#define RX_HASH_OPT_L3 \
+	(RXH_IP_SRC | RXH_IP_DST)
+#define RX_HASH_OPT_L2 RXH_L2DA
+
+void fbnic_rss_init_en_mask(struct fbnic_net *fbn)
+{
+	fbn->rss_flow_hash[FBNIC_TCP4_HASH_OPT] = RX_HASH_OPT_L4;
+	fbn->rss_flow_hash[FBNIC_TCP6_HASH_OPT] = RX_HASH_OPT_L4;
+
+	fbn->rss_flow_hash[FBNIC_UDP4_HASH_OPT] = RX_HASH_OPT_L3;
+	fbn->rss_flow_hash[FBNIC_UDP6_HASH_OPT] = RX_HASH_OPT_L3;
+	fbn->rss_flow_hash[FBNIC_IPV4_HASH_OPT] = RX_HASH_OPT_L3;
+	fbn->rss_flow_hash[FBNIC_IPV6_HASH_OPT] = RX_HASH_OPT_L3;
+
+	fbn->rss_flow_hash[FBNIC_ETHER_HASH_OPT] = RX_HASH_OPT_L2;
+}
+
+void fbnic_rss_disable_hw(struct fbnic_dev *fbd)
+{
+	/* Disable RPC by clearing enable bit and configuration */
+	if (!fbnic_bmc_present(fbd))
+		wr32(FBNIC_RPC_RMI_CONFIG,
+		     FIELD_PREP(FBNIC_RPC_RMI_CONFIG_OH_BYTES, 20));
+}
+
+#define FBNIC_FH_2_RSSEM_BIT(_fh, _rssem, _val)		\
+	FIELD_PREP(FBNIC_RPC_ACT_TBL1_RSS_ENA_##_rssem,	\
+		   FIELD_GET(RXH_##_fh, _val))
+static u16 fbnic_flow_hash_2_rss_en_mask(struct fbnic_net *fbn, int flow_type)
+{
+	u32 flow_hash = fbn->rss_flow_hash[flow_type];
+	u32 rss_en_mask = 0;
+
+	rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(L2DA, L2_DA, flow_hash);
+	rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(IP_SRC, IP_SRC, flow_hash);
+	rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(IP_DST, IP_DST, flow_hash);
+	rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(L4_B_0_1, L4_SRC, flow_hash);
+	rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(L4_B_2_3, L4_DST, flow_hash);
+
+	return rss_en_mask;
+}
+
+void fbnic_rss_reinit_hw(struct fbnic_dev *fbd, struct fbnic_net *fbn)
+{
+	unsigned int i;
+
+	for (i = 0; i < FBNIC_RPC_RSS_TBL_SIZE; i++) {
+		fbnic_wr32(fbd, FBNIC_RPC_RSS_TBL(0, i), fbn->indir_tbl[0][i]);
+		fbnic_wr32(fbd, FBNIC_RPC_RSS_TBL(1, i), fbn->indir_tbl[1][i]);
+	}
+
+	for (i = 0; i < FBNIC_RPC_RSS_KEY_DWORD_LEN; i++)
+		wr32(FBNIC_RPC_RSS_KEY(i), fbn->rss_key[i]);
+
+	/* Default action is to drop w/ no destination */
+	wr32(FBNIC_RPC_ACT_TBL0_DEFAULT, FBNIC_RPC_ACT_TBL0_DROP);
+	wrfl();
+
+	wr32(FBNIC_RPC_ACT_TBL1_DEFAULT, 0);
+
+	/* If it isn't already enabled, set the RMI Config value to enable RPC */
+	wr32(FBNIC_RPC_RMI_CONFIG,
+	     FIELD_PREP(FBNIC_RPC_RMI_CONFIG_MTU, FBNIC_MAX_JUMBO_FRAME_SIZE) |
+	     FIELD_PREP(FBNIC_RPC_RMI_CONFIG_OH_BYTES, 20) |
+	     FBNIC_RPC_RMI_CONFIG_ENABLE);
+}
+
 static int fbnic_read_macda_entry(struct fbnic_dev *fbd, unsigned int idx,
 				  struct fbnic_mac_addr *mac_addr)
 {
@@ -30,7 +121,9 @@ static int fbnic_read_macda_entry(struct fbnic_dev *fbd, unsigned int idx,
 void fbnic_bmc_rpc_all_multi_config(struct fbnic_dev *fbd,
 				    bool enable_host)
 {
+	struct fbnic_act_tcam *act_tcam;
 	struct fbnic_mac_addr *mac_addr;
+	int j;
 
 	/* We need to add the all multicast filter at the end of the
 	 * multicast address list. This way if there are any that are
@@ -61,13 +154,51 @@ void fbnic_bmc_rpc_all_multi_config(struct fbnic_dev *fbd,
 		clear_bit(FBNIC_MAC_ADDR_T_BMC, mac_addr->act_tcam);
 		mac_addr->state = FBNIC_TCAM_S_DELETE;
 	}
+
+	/* We have to handle multicast specially, as the BMC may
+	 * already have an all-multi rule in place. In that case
+	 * adding a rule ourselves won't do any good, so we instead
+	 * modify the ALL MULTI rule below.
+	 */
+	act_tcam = &fbd->act_tcam[FBNIC_RPC_ACT_TBL_BMC_ALL_MULTI_OFFSET];
+
+	/* If we are not enabling the rule just delete it. We will fall
+	 * back to the RSS rules that support the multicast addresses.
+	 */
+	if (!fbnic_bmc_present(fbd) || !fbd->fw_cap.all_multi || enable_host) {
+		if (act_tcam->state == FBNIC_TCAM_S_VALID)
+			act_tcam->state = FBNIC_TCAM_S_DELETE;
+		return;
+	}
+
+	/* Rewrite TCAM rule 23 to handle BMC all-multi traffic */
+	act_tcam->dest = FIELD_PREP(FBNIC_RPC_ACT_TBL0_DEST_MASK,
+				    FBNIC_RPC_ACT_TBL0_DEST_BMC);
+	act_tcam->mask.tcam[0] = 0xffff;
+
+	/* MACDA entries 0 - 3 are reserved for the BMC MAC address */
+	act_tcam->value.tcam[1] =
+			FIELD_PREP(FBNIC_RPC_TCAM_ACT1_L2_MACDA_IDX,
+				   fbd->mac_addr_boundary - 1) |
+			FBNIC_RPC_TCAM_ACT1_L2_MACDA_VALID;
+	act_tcam->mask.tcam[1] = 0xffff &
+			 ~FBNIC_RPC_TCAM_ACT1_L2_MACDA_IDX &
+			 ~FBNIC_RPC_TCAM_ACT1_L2_MACDA_VALID;
+
+	for (j = 2; j < FBNIC_RPC_TCAM_ACT_WORD_LEN; j++)
+		act_tcam->mask.tcam[j] = 0xffff;
+
+	act_tcam->state = FBNIC_TCAM_S_UPDATE;
 }
 
 void fbnic_bmc_rpc_init(struct fbnic_dev *fbd)
 {
 	int i = FBNIC_RPC_TCAM_MACDA_BMC_ADDR_IDX;
+	struct fbnic_act_tcam *act_tcam;
 	struct fbnic_mac_addr *mac_addr;
 	u32 macda_validate;
+	int j;
 
 	/* Verify that RPC is already enabled, if not abort */
 	macda_validate = rd32(FBNIC_RPC_TCAM_MACDA_VALIDATE);
@@ -140,11 +271,116 @@ void fbnic_bmc_rpc_init(struct fbnic_dev *fbd)
 		mac_addr->state = FBNIC_TCAM_S_VALID;
 	}
 
+	/* Rewrite TCAM rule 0 if it isn't present to relocate BMC rules */
+	act_tcam = &fbd->act_tcam[FBNIC_RPC_ACT_TBL_BMC_OFFSET];
+	act_tcam->dest = FIELD_PREP(FBNIC_RPC_ACT_TBL0_DEST_MASK,
+				    FBNIC_RPC_ACT_TBL0_DEST_BMC);
+	act_tcam->mask.tcam[0] = 0xffff;
+
+	/* MACDA entries 0 - 3 are reserved for the BMC MAC address;
+	 * to account for that we mask out the lower 2 bits of the
+	 * MACDA index by performing an &= with 0x1c.
+	 */
+	act_tcam->value.tcam[1] = FBNIC_RPC_TCAM_ACT1_L2_MACDA_VALID;
+	act_tcam->mask.tcam[1] = 0xffff &
+			~FIELD_PREP(FBNIC_RPC_TCAM_ACT1_L2_MACDA_IDX, 0x1c) &
+			~FBNIC_RPC_TCAM_ACT1_L2_MACDA_VALID;
+
+	for (j = 2; j < FBNIC_RPC_TCAM_ACT_WORD_LEN; j++)
+		act_tcam->mask.tcam[j] = 0xffff;
+
+	act_tcam->state = FBNIC_TCAM_S_UPDATE;
+
 	fbnic_bmc_rpc_all_multi_config(fbd, false);
 
 	fbnic_bmc_set_present(fbd, true);
 }
 
+#define FBNIC_ACT1_INIT(_l4, _udp, _ip, _v6)		\
+	(((_l4) ? FBNIC_RPC_TCAM_ACT1_L4_VALID : 0) |	\
+	 ((_udp) ? FBNIC_RPC_TCAM_ACT1_L4_IS_UDP : 0) |	\
+	 ((_ip) ? FBNIC_RPC_TCAM_ACT1_IP_VALID : 0) |	\
+	 ((_v6) ? FBNIC_RPC_TCAM_ACT1_IP_IS_V6 : 0))
+
+void fbnic_rss_reinit(struct fbnic_dev *fbd, struct fbnic_net *fbn)
+{
+	static const u32 act1_value[FBNIC_NUM_HASH_OPT] = {
+		FBNIC_ACT1_INIT(1, 1, 1, 1),	/* UDP6 */
+		FBNIC_ACT1_INIT(1, 1, 1, 0),	/* UDP4 */
+		FBNIC_ACT1_INIT(1, 0, 1, 1),	/* TCP6 */
+		FBNIC_ACT1_INIT(1, 0, 1, 0),	/* TCP4 */
+		FBNIC_ACT1_INIT(0, 0, 1, 1),	/* IP6 */
+		FBNIC_ACT1_INIT(0, 0, 1, 0),	/* IP4 */
+		0				/* Ether */
+	};
+	unsigned int i;
+
+	/* To support scenarios where a BMC is present we must write the
+	 * rules twice, once for the unicast cases, and once again for
+	 * the broadcast/multicast cases as we have to support 2 destinations.
+	 */
+	BUILD_BUG_ON(FBNIC_RSS_EN_NUM_UNICAST * 2 != FBNIC_RSS_EN_NUM_ENTRIES);
+	BUILD_BUG_ON(ARRAY_SIZE(act1_value) != FBNIC_NUM_HASH_OPT);
+
+	/* Program RSS hash enable mask for host in action TCAM/table. */
+	for (i = fbnic_bmc_present(fbd) ? 0 : FBNIC_RSS_EN_NUM_UNICAST;
+	     i < FBNIC_RSS_EN_NUM_ENTRIES; i++) {
+		unsigned int idx = i + FBNIC_RPC_ACT_TBL_RSS_OFFSET;
+		struct fbnic_act_tcam *act_tcam = &fbd->act_tcam[idx];
+		u32 flow_hash, dest, rss_en_mask;
+		int flow_type, j;
+		u16 value = 0;
+
+		flow_type = i % FBNIC_RSS_EN_NUM_UNICAST;
+		flow_hash = fbn->rss_flow_hash[flow_type];
+
+		/* set DEST_HOST based on absence of RXH_DISCARD */
+		dest = FIELD_PREP(FBNIC_RPC_ACT_TBL0_DEST_MASK,
+				  !(RXH_DISCARD & flow_hash) ?
+				  FBNIC_RPC_ACT_TBL0_DEST_HOST : 0);
+
+		if (i >= FBNIC_RSS_EN_NUM_UNICAST && fbnic_bmc_present(fbd))
+			dest |= FIELD_PREP(FBNIC_RPC_ACT_TBL0_DEST_MASK,
+					   FBNIC_RPC_ACT_TBL0_DEST_BMC);
+
+		if (!dest)
+			dest = FBNIC_RPC_ACT_TBL0_DROP;
+
+		if (act1_value[flow_type] & FBNIC_RPC_TCAM_ACT1_L4_VALID)
+			dest |= FIELD_PREP(FBNIC_RPC_ACT_TBL0_DMA_HINT,
+					   FBNIC_RCD_HDR_AL_DMA_HINT_L4);
+
+		rss_en_mask = fbnic_flow_hash_2_rss_en_mask(fbn, flow_type);
+
+		act_tcam->dest = dest;
+		act_tcam->rss_en_mask = rss_en_mask;
+		act_tcam->state = FBNIC_TCAM_S_UPDATE;
+
+		act_tcam->mask.tcam[0] = 0xffff;
+
+		/* We reserve the upper 8 MACDA TCAM entries for host
+		 * unicast. So we set the value to 24, and mask the
+		 * lower bits so that the lower entries can be used as
+		 * multicast or BMC addresses.
+		 */
+		if (i < FBNIC_RSS_EN_NUM_UNICAST)
+			value = FIELD_PREP(FBNIC_RPC_TCAM_ACT1_L2_MACDA_IDX,
+					   fbd->mac_addr_boundary);
+		value |= FBNIC_RPC_TCAM_ACT1_L2_MACDA_VALID;
+
+		flow_type = i % FBNIC_RSS_EN_NUM_UNICAST;
+		value |= act1_value[flow_type];
+
+		act_tcam->value.tcam[1] = value;
+		act_tcam->mask.tcam[1] = ~value;
+
+		for (j = 2; j < FBNIC_RPC_TCAM_ACT_WORD_LEN; j++)
+			act_tcam->mask.tcam[j] = 0xffff;
+
+		act_tcam->state = FBNIC_TCAM_S_UPDATE;
+	}
+}
+
 struct fbnic_mac_addr *__fbnic_uc_sync(struct fbnic_dev *fbd,
 				       const unsigned char *addr)
 {
@@ -292,6 +528,38 @@ static void fbnic_clear_macda_entry(struct fbnic_dev *fbd, unsigned int idx)
 		wr32(FBNIC_RPC_TCAM_MACDA(idx, i), 0);
 }
 
+static void fbnic_clear_macda(struct fbnic_dev *fbd)
+{
+	int idx;
+
+	for (idx = ARRAY_SIZE(fbd->mac_addr); idx--;) {
+		struct fbnic_mac_addr *mac_addr = &fbd->mac_addr[idx];
+
+		if (mac_addr->state == FBNIC_TCAM_S_DISABLED)
+			continue;
+
+		if (test_bit(FBNIC_MAC_ADDR_T_BMC, mac_addr->act_tcam)) {
+			if (fbnic_bmc_present(fbd))
+				continue;
+			dev_warn_once(fbd->dev,
+				      "Found BMC MAC address w/ BMC not present\n");
+		}
+
+		fbnic_clear_macda_entry(fbd, idx);
+
+		/* If rule was already destined for deletion just wipe it now */
+		if (mac_addr->state == FBNIC_TCAM_S_DELETE) {
+			memset(mac_addr, 0, sizeof(*mac_addr));
+			continue;
+		}
+
+		/* Change state to update so that we will rewrite
+		 * this tcam the next time fbnic_write_macda is called.
+		 */
+		mac_addr->state = FBNIC_TCAM_S_UPDATE;
+	}
+}
+
 static void fbnic_write_macda_entry(struct fbnic_dev *fbd, unsigned int idx,
 				    struct fbnic_mac_addr *mac_addr)
 {
@@ -336,3 +604,106 @@ void fbnic_write_macda(struct fbnic_dev *fbd)
 		mac_addr->state = FBNIC_TCAM_S_VALID;
 	}
 }
+
+static void fbnic_clear_act_tcam(struct fbnic_dev *fbd, unsigned int idx)
+{
+	int i;
+
+	/* Invalidate the entry and clear its state info */
+	for (i = 0; i <= FBNIC_RPC_TCAM_ACT_WORD_LEN; i++)
+		wr32(FBNIC_RPC_TCAM_ACT(idx, i), 0);
+}
+
+void fbnic_clear_rules(struct fbnic_dev *fbd)
+{
+	u32 dest = FIELD_PREP(FBNIC_RPC_ACT_TBL0_DEST_MASK,
+			      FBNIC_RPC_ACT_TBL0_DEST_BMC);
+	int i = FBNIC_RPC_TCAM_ACT_NUM_ENTRIES - 1;
+	struct fbnic_act_tcam *act_tcam;
+
+	/* Clear MAC rules */
+	fbnic_clear_macda(fbd);
+
+	/* If a BMC is present we need to preserve the last rule, which
+	 * is used to route received traffic to the BMC.
+	 *
+	 * At this point it should be the only MAC address in the MACDA
+	 * so any unicast or multicast traffic received should be routed
+	 * to it. So leave the last rule in place.
+	 *
+	 * It will be rewritten to add the host again when we bring
+	 * the interface back up.
+	 */
+	if (fbnic_bmc_present(fbd)) {
+		act_tcam = &fbd->act_tcam[i];
+
+		if (act_tcam->state == FBNIC_TCAM_S_VALID &&
+		    (act_tcam->dest & dest)) {
+			wr32(FBNIC_RPC_ACT_TBL0(i), dest);
+			wr32(FBNIC_RPC_ACT_TBL1(i), 0);
+
+			act_tcam->state = FBNIC_TCAM_S_UPDATE;
+
+			i--;
+		}
+	}
+
+	/* Work from the bottom up deleting all other rules from hardware */
+	do {
+		act_tcam = &fbd->act_tcam[i];
+
+		if (act_tcam->state != FBNIC_TCAM_S_VALID)
+			continue;
+
+		fbnic_clear_act_tcam(fbd, i);
+		act_tcam->state = FBNIC_TCAM_S_UPDATE;
+	} while (i--);
+}
+
+static void fbnic_delete_act_tcam(struct fbnic_dev *fbd, unsigned int idx)
+{
+	fbnic_clear_act_tcam(fbd, idx);
+	memset(&fbd->act_tcam[idx], 0, sizeof(struct fbnic_act_tcam));
+}
+
+static void fbnic_update_act_tcam(struct fbnic_dev *fbd, unsigned int idx)
+{
+	struct fbnic_act_tcam *act_tcam = &fbd->act_tcam[idx];
+	int i;
+
+	/* Update entry by writing the destination and RSS mask */
+	wr32(FBNIC_RPC_ACT_TBL0(idx), act_tcam->dest);
+	wr32(FBNIC_RPC_ACT_TBL1(idx), act_tcam->rss_en_mask);
+
+	/* Write new TCAM rule to hardware */
+	for (i = 0; i < FBNIC_RPC_TCAM_ACT_WORD_LEN; i++)
+		wr32(FBNIC_RPC_TCAM_ACT(idx, i),
+		     FIELD_PREP(FBNIC_RPC_TCAM_ACT_MASK,
+				act_tcam->mask.tcam[i]) |
+		     FIELD_PREP(FBNIC_RPC_TCAM_ACT_VALUE,
+				act_tcam->value.tcam[i]));
+
+	wrfl();
+
+	wr32(FBNIC_RPC_TCAM_ACT(idx, i), FBNIC_RPC_TCAM_VALIDATE);
+	act_tcam->state = FBNIC_TCAM_S_VALID;
+}
+
+void fbnic_write_rules(struct fbnic_dev *fbd)
+{
+	int i;
+
+	/* Flush any pending action table rules */
+	for (i = 0; i < FBNIC_RPC_ACT_TBL_NUM_ENTRIES; i++) {
+		struct fbnic_act_tcam *act_tcam = &fbd->act_tcam[i];
+
+		/* Skip the entry if no update is pending */
+		if (!(act_tcam->state & FBNIC_TCAM_S_UPDATE))
+			continue;
+
+		if (act_tcam->state == FBNIC_TCAM_S_DELETE)
+			fbnic_delete_act_tcam(fbd, i);
+		else
+			fbnic_update_act_tcam(fbd, i);
+	}
+}
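fbnic_flow_hash_2_rss_en_mask() above translates ethtool RXH_* flow-hash bits into the device's ACT_TBL1 RSS-enable bits. A userspace model of that mapping (RXH_* positions taken from the kernel's ethtool UAPI, enable-bit values from the ACT_TBL1 enum in this patch):

```c
#include <assert.h>
#include <stdint.h>

/* ethtool flow-hash bits (as defined in <linux/ethtool.h>) */
#define RXH_L2DA	(1u << 1)
#define RXH_IP_SRC	(1u << 4)
#define RXH_IP_DST	(1u << 5)
#define RXH_L4_B_0_1	(1u << 6)
#define RXH_L4_B_2_3	(1u << 7)

/* ACT_TBL1 RSS enable bits from this patch */
#define RSS_ENA_IP_SRC	1u
#define RSS_ENA_IP_DST	2u
#define RSS_ENA_L4_SRC	4u
#define RSS_ENA_L4_DST	8u
#define RSS_ENA_L2_DA	16u

/* Translate each requested RXH_* hash field into the matching
 * hardware RSS enable bit, as FBNIC_FH_2_RSSEM_BIT does with
 * FIELD_GET/FIELD_PREP in the driver.
 */
static uint16_t flow_hash_to_rss_en_mask(uint32_t flow_hash)
{
	uint16_t mask = 0;

	if (flow_hash & RXH_L2DA)
		mask |= RSS_ENA_L2_DA;
	if (flow_hash & RXH_IP_SRC)
		mask |= RSS_ENA_IP_SRC;
	if (flow_hash & RXH_IP_DST)
		mask |= RSS_ENA_IP_DST;
	if (flow_hash & RXH_L4_B_0_1)
		mask |= RSS_ENA_L4_SRC;
	if (flow_hash & RXH_L4_B_2_3)
		mask |= RSS_ENA_L4_DST;

	return mask;
}
```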
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_rpc.h b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.h
index 1b59b10ba677..d62935f722a2 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_rpc.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.h
@@ -54,9 +54,21 @@ struct fbnic_act_tcam {
 };
 
 enum {
+	FBNIC_RSS_EN_HOST_UDP6,
+	FBNIC_RSS_EN_HOST_UDP4,
+	FBNIC_RSS_EN_HOST_TCP6,
+	FBNIC_RSS_EN_HOST_TCP4,
+	FBNIC_RSS_EN_HOST_IP6,
+	FBNIC_RSS_EN_HOST_IP4,
 	FBNIC_RSS_EN_HOST_ETHER,
+	FBNIC_RSS_EN_XCAST_UDP6,
+#define FBNIC_RSS_EN_NUM_UNICAST FBNIC_RSS_EN_XCAST_UDP6
+	FBNIC_RSS_EN_XCAST_UDP4,
+	FBNIC_RSS_EN_XCAST_TCP6,
+	FBNIC_RSS_EN_XCAST_TCP4,
+	FBNIC_RSS_EN_XCAST_IP6,
+	FBNIC_RSS_EN_XCAST_IP4,
 	FBNIC_RSS_EN_XCAST_ETHER,
-#define FBNIC_RSS_EN_NUM_UNICAST FBNIC_RSS_EN_XCAST_ETHER
 	FBNIC_RSS_EN_NUM_ENTRIES
 };
 
@@ -91,8 +103,22 @@ enum {
 #define FBNIC_MAC_ADDR_T_HOST_LEN \
 	(FBNIC_MAC_ADDR_T_HOST_LAST - FBNIC_MAC_ADDR_T_HOST_START)
 
+#define FBNIC_RPC_TCAM_ACT0_IPSRC_IDX		CSR_GENMASK(2, 0)
+#define FBNIC_RPC_TCAM_ACT0_IPSRC_VALID		CSR_BIT(3)
+#define FBNIC_RPC_TCAM_ACT0_IPDST_IDX		CSR_GENMASK(6, 4)
+#define FBNIC_RPC_TCAM_ACT0_IPDST_VALID		CSR_BIT(7)
+#define FBNIC_RPC_TCAM_ACT0_OUTER_IPSRC_IDX	CSR_GENMASK(10, 8)
+#define FBNIC_RPC_TCAM_ACT0_OUTER_IPSRC_VALID	CSR_BIT(11)
+#define FBNIC_RPC_TCAM_ACT0_OUTER_IPDST_IDX	CSR_GENMASK(14, 12)
+#define FBNIC_RPC_TCAM_ACT0_OUTER_IPDST_VALID	CSR_BIT(15)
+
 #define FBNIC_RPC_TCAM_ACT1_L2_MACDA_IDX	CSR_GENMASK(9, 5)
 #define FBNIC_RPC_TCAM_ACT1_L2_MACDA_VALID	CSR_BIT(10)
+#define FBNIC_RPC_TCAM_ACT1_IP_IS_V6		CSR_BIT(11)
+#define FBNIC_RPC_TCAM_ACT1_IP_VALID		CSR_BIT(12)
+#define FBNIC_RPC_TCAM_ACT1_OUTER_IP_VALID	CSR_BIT(13)
+#define FBNIC_RPC_TCAM_ACT1_L4_IS_UDP		CSR_BIT(14)
+#define FBNIC_RPC_TCAM_ACT1_L4_VALID		CSR_BIT(15)
 
 /* TCAM 0 - 3 reserved for BMC MAC addresses */
 #define FBNIC_RPC_TCAM_MACDA_BMC_ADDR_IDX	0
@@ -114,11 +140,32 @@ enum {
 /* Reserved for use to record Multicast promisc, or Promiscuous */
 #define FBNIC_RPC_TCAM_MACDA_PROMISC_IDX	31
 
+enum {
+	FBNIC_UDP6_HASH_OPT,
+	FBNIC_UDP4_HASH_OPT,
+	FBNIC_TCP6_HASH_OPT,
+	FBNIC_TCP4_HASH_OPT,
+#define FBNIC_L4_HASH_OPT FBNIC_TCP4_HASH_OPT
+	FBNIC_IPV6_HASH_OPT,
+	FBNIC_IPV4_HASH_OPT,
+#define FBNIC_IP_HASH_OPT FBNIC_IPV4_HASH_OPT
+	FBNIC_ETHER_HASH_OPT,
+	FBNIC_NUM_HASH_OPT,
+};
+
 struct fbnic_dev;
+struct fbnic_net;
 
 void fbnic_bmc_rpc_init(struct fbnic_dev *fbd);
 void fbnic_bmc_rpc_all_multi_config(struct fbnic_dev *fbd, bool enable_host);
 
+void fbnic_reset_indir_tbl(struct fbnic_net *fbn);
+void fbnic_rss_key_fill(u32 *buffer);
+void fbnic_rss_init_en_mask(struct fbnic_net *fbn);
+void fbnic_rss_disable_hw(struct fbnic_dev *fbd);
+void fbnic_rss_reinit_hw(struct fbnic_dev *fbd, struct fbnic_net *fbn);
+void fbnic_rss_reinit(struct fbnic_dev *fbd, struct fbnic_net *fbn);
+
 int __fbnic_xc_unsync(struct fbnic_mac_addr *mac_addr, unsigned int tcam_idx);
 struct fbnic_mac_addr *__fbnic_uc_sync(struct fbnic_dev *fbd,
 				       const unsigned char *addr);
@@ -136,4 +183,7 @@ static inline int __fbnic_mc_unsync(struct fbnic_mac_addr *mac_addr)
 {
 	return __fbnic_xc_unsync(mac_addr, FBNIC_MAC_ADDR_T_MULTICAST);
 }
+
+void fbnic_clear_rules(struct fbnic_dev *fbd);
+void fbnic_write_rules(struct fbnic_dev *fbd);
 #endif /* _FBNIC_RPC_H_ */




* Re: [net-next PATCH 01/15] PCI: Add Meta Platforms vendor ID
  2024-04-03 20:08 ` [net-next PATCH 01/15] PCI: Add Meta Platforms vendor ID Alexander Duyck
@ 2024-04-03 20:20   ` Bjorn Helgaas
  0 siblings, 0 replies; 163+ messages in thread
From: Bjorn Helgaas @ 2024-04-03 20:20 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, bhelgaas, linux-pci, Alexander Duyck, kuba, davem, pabeni

On Wed, Apr 03, 2024 at 01:08:28PM -0700, Alexander Duyck wrote:
> From: Alexander Duyck <alexanderduyck@fb.com>
> 
> Add Meta as a vendor ID for PCI devices so we can use the macro for future
> drivers.
> 
> CC: bhelgaas@google.com
> CC: linux-pci@vger.kernel.org
> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> ---
>  include/linux/pci_ids.h |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index a0c75e467df3..e5a1d5e9930b 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -2598,6 +2598,8 @@
>  
>  #define PCI_VENDOR_ID_HYGON		0x1d94
>  
> +#define PCI_VENDOR_ID_META		0x1d9b
> +
>  #define PCI_VENDOR_ID_FUNGIBLE		0x1dad
>  
>  #define PCI_VENDOR_ID_HXT		0x1dbf
> 
> 


* Re: [net-next PATCH 02/15] eth: fbnic: add scaffolding for Meta's NIC driver
  2024-04-03 20:08 ` [net-next PATCH 02/15] eth: fbnic: add scaffolding for Meta's NIC driver Alexander Duyck
@ 2024-04-03 20:33   ` Andrew Lunn
  2024-04-03 20:47     ` Alexander Duyck
  0 siblings, 1 reply; 163+ messages in thread
From: Andrew Lunn @ 2024-04-03 20:33 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

> + * fbnic_init_module - Driver Registration Routine
> + *
> + * The first routine called when the driver is loaded.  All it does is
> + * register with the PCI subsystem.
> + **/
> +static int __init fbnic_init_module(void)
> +{
> +	int err;
> +
> +	pr_info(DRV_SUMMARY " (%s)", fbnic_driver.name);

Please don't spam the kernel log like this. Drivers should only report
when something goes wrong.

     Andrew


* Re: [net-next PATCH 03/15] eth: fbnic: Allocate core device specific structures and devlink interface
  2024-04-03 20:08 ` [net-next PATCH 03/15] eth: fbnic: Allocate core device specific structures and devlink interface Alexander Duyck
@ 2024-04-03 20:35   ` Bjorn Helgaas
  0 siblings, 0 replies; 163+ messages in thread
From: Bjorn Helgaas @ 2024-04-03 20:35 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On Wed, Apr 03, 2024 at 01:08:37PM -0700, Alexander Duyck wrote:
> From: Alexander Duyck <alexanderduyck@fb.com>
> 
> At the core of the fbnic device will be the devlink interface. This
> interface will eventually provide basic functionality in the event that
> there are any issues with the network interface.
> 
> Add support for allocating the MSI-X vectors and setting up the BAR
> mapping. With this we can start enabling various subsystems and start
> bringing up additional interfaces such as the AXI fabric and the firmware
> mailbox.

> +int fbnic_alloc_irqs(struct fbnic_dev *fbd)
> +{
> +	unsigned int wanted_irqs = FBNIC_NON_NAPI_VECTORS;
> +	struct pci_dev *pdev = to_pci_dev(fbd->dev);
> +	struct msix_entry *msix_entries;
> +	int i, num_irqs;
> +
> +	msix_entries = kcalloc(wanted_irqs, sizeof(*msix_entries), GFP_KERNEL);
> +	if (!msix_entries)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < wanted_irqs; i++)
> +		msix_entries[i].entry = i;
> +
> +	num_irqs = pci_enable_msix_range(pdev, msix_entries,
> +					 FBNIC_NON_NAPI_VECTORS + 1,
> +					 wanted_irqs);

FWIW, deprecated in favor of pci_alloc_irq_vectors().

> +	if (num_irqs < 0) {
> +		dev_err(fbd->dev, "Failed to allocate MSI-X entries\n");
> +		kfree(msix_entries);
> +		return num_irqs;
> +	}
> +
> +	if (num_irqs < wanted_irqs)
> +		dev_warn(fbd->dev, "Allocated %d IRQs, expected %d\n",
> +			 num_irqs, wanted_irqs);
> +
> +	fbd->msix_entries = msix_entries;
> +	fbd->num_irqs = num_irqs;
> +
> +	return 0;
> +}
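
For reference, the replacement Bjorn points at collapses most of this function; a sketch of what fbnic_alloc_irqs() might look like with pci_alloc_irq_vectors() follows (an illustration only, not the driver's actual code; the driver-maintained msix_entries[] table goes away because pci_irq_vector() can look up each Linux IRQ number on demand):

```c
/* Hypothetical rework using pci_alloc_irq_vectors() instead of the
 * deprecated pci_enable_msix_range(). PCI_IRQ_MSIX restricts the
 * allocation to MSI-X, matching the original behavior.
 */
static int fbnic_alloc_irqs(struct fbnic_dev *fbd)
{
	unsigned int wanted_irqs = FBNIC_NON_NAPI_VECTORS;
	struct pci_dev *pdev = to_pci_dev(fbd->dev);
	int num_irqs;

	num_irqs = pci_alloc_irq_vectors(pdev, FBNIC_NON_NAPI_VECTORS + 1,
					 wanted_irqs, PCI_IRQ_MSIX);
	if (num_irqs < 0) {
		dev_err(fbd->dev, "Failed to allocate MSI-X vectors\n");
		return num_irqs;
	}

	if (num_irqs < wanted_irqs)
		dev_warn(fbd->dev, "Allocated %d IRQs, expected %d\n",
			 num_irqs, wanted_irqs);

	fbd->num_irqs = num_irqs;

	return 0;
}
```

Individual vectors would then come from pci_irq_vector(pdev, i), and teardown from pci_free_irq_vectors(pdev), so no kcalloc()/kfree() of an entry table is needed.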


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
                   ` (14 preceding siblings ...)
  2024-04-03 20:09 ` [net-next PATCH 15/15] eth: fbnic: write the TCAM tables used for RSS control and Rx to host Alexander Duyck
@ 2024-04-03 20:42 ` Bjorn Helgaas
  2024-04-04 11:37 ` Jiri Pirko
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 163+ messages in thread
From: Bjorn Helgaas @ 2024-04-03 20:42 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, bhelgaas, linux-pci, Alexander Duyck, kuba, davem, pabeni

On Wed, Apr 03, 2024 at 01:08:24PM -0700, Alexander Duyck wrote:
> This patch set includes the necessary patches to enable basic Tx and Rx
> over the Meta Platforms Host Network Interface. To do this we introduce a
> new driver and directories in the form of
> "drivers/net/ethernet/meta/fbnic".

>       PCI: Add Meta Platforms vendor ID
>       eth: fbnic: add scaffolding for Meta's NIC driver
>       eth: fbnic: Allocate core device specific structures and devlink interface
>       eth: fbnic: Add register init to set PCIe/Ethernet device config
>       eth: fbnic: add message parsing for FW messages
>       eth: fbnic: add FW communication mechanism
>       eth: fbnic: allocate a netdevice and napi vectors with queues
>       eth: fbnic: implement Tx queue alloc/start/stop/free
>       eth: fbnic: implement Rx queue alloc/start/stop/free
>       eth: fbnic: Add initial messaging to notify FW of our presence
>       eth: fbnic: Enable Ethernet link setup
>       eth: fbnic: add basic Tx handling
>       eth: fbnic: add basic Rx handling
>       eth: fbnic: add L2 address programming
>       eth: fbnic: write the TCAM tables used for RSS control and Rx to host

Random mix of initial caps in subjects.  Also kind of a mix of initial
caps in comments, e.g.,

  $ grep -Er "^\s+/\*" drivers/net/ethernet/meta/fbnic/

I didn't bother to figure out which patch these typos were in:

  $ codespell drivers/net/ethernet/meta/fbnic/
  drivers/net/ethernet/meta/fbnic/fbnic_pci.c:452: ot ==> to, of, or
  drivers/net/ethernet/meta/fbnic/fbnic_pci.c:479: Reenable ==> Re-enable
  drivers/net/ethernet/meta/fbnic/fbnic_txrx.c:569: caclulation ==> calculation
  drivers/net/ethernet/meta/fbnic/fbnic_fw.c:740: conents ==> contents
  drivers/net/ethernet/meta/fbnic/fbnic_txrx.h:19: cachline ==> cacheline


* Re: [net-next PATCH 04/15] eth: fbnic: Add register init to set PCIe/Ethernet device config
  2024-04-03 20:08 ` [net-next PATCH 04/15] eth: fbnic: Add register init to set PCIe/Ethernet device config Alexander Duyck
@ 2024-04-03 20:46   ` Andrew Lunn
  2024-04-10 20:31     ` Jacob Keller
  0 siblings, 1 reply; 163+ messages in thread
From: Andrew Lunn @ 2024-04-03 20:46 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

> +#define wr32(reg, val)	fbnic_wr32(fbd, reg, val)
> +#define rd32(reg)	fbnic_rd32(fbd, reg)
> +#define wrfl()		fbnic_rd32(fbd, FBNIC_MASTER_SPARE_0)

I don't think that is considered best practices, using variables not
passed to the macro.

	Andrew


* Re: [net-next PATCH 02/15] eth: fbnic: add scaffolding for Meta's NIC driver
  2024-04-03 20:33   ` Andrew Lunn
@ 2024-04-03 20:47     ` Alexander Duyck
  2024-04-03 21:17       ` Andrew Lunn
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 20:47 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On Wed, Apr 3, 2024 at 1:33 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > + * fbnic_init_module - Driver Registration Routine
> > + *
> > + * The first routine called when the driver is loaded.  All it does is
> > + * register with the PCI subsystem.
> > + **/
> > +static int __init fbnic_init_module(void)
> > +{
> > +     int err;
> > +
> > +     pr_info(DRV_SUMMARY " (%s)", fbnic_driver.name);
>
> Please don't spam the kernel log like this. Drivers should only report
> when something goes wrong.
>
>      Andrew

Really? I have always used something like this to determine that the
driver isn't there when a user complains that the driver didn't load
on a given device. It isn't as though it would be super spammy as this
is something that is normally only run once when the module is loaded
during early boot, and there isn't a good way to say the module isn't
loaded if the driver itself isn't there.

For example if somebody adds the driver, but forgets to update the
initramfs I can easily call out that it isn't there when I ask for the
logs from the system.

Although I suppose I could meet you halfway. Currently I am always
printing the message here. If you would prefer, I can only display it if
the driver is successfully registered.


* Re: [net-next PATCH 07/15] eth: fbnic: allocate a netdevice and napi vectors with queues
  2024-04-03 20:08 ` [net-next PATCH 07/15] eth: fbnic: allocate a netdevice and napi vectors with queues Alexander Duyck
@ 2024-04-03 20:58   ` Andrew Lunn
  2024-04-03 22:15     ` Alexander Duyck
  0 siblings, 1 reply; 163+ messages in thread
From: Andrew Lunn @ 2024-04-03 20:58 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

> +static int fbnic_dsn_to_mac_addr(u64 dsn, char *addr)
> +{
> +	addr[0] = (dsn >> 56) & 0xFF;
> +	addr[1] = (dsn >> 48) & 0xFF;
> +	addr[2] = (dsn >> 40) & 0xFF;
> +	addr[3] = (dsn >> 16) & 0xFF;
> +	addr[4] = (dsn >> 8) & 0xFF;
> +	addr[5] = dsn & 0xFF;

u64_to_ether_addr() might work here.

> +
> +	return is_valid_ether_addr(addr) ? 0 : -EINVAL;
> +}
> +
> +/**
> + * fbnic_netdev_register - Initialize general software structures
> + * @netdev: Netdev containing structure to initialize and register
> + *
> + * Initialize the MAC address for the netdev and register it.
> + **/
> +int fbnic_netdev_register(struct net_device *netdev)
> +{
> +	struct fbnic_net *fbn = netdev_priv(netdev);
> +	struct fbnic_dev *fbd = fbn->fbd;
> +	u64 dsn = fbd->dsn;
> +	u8 addr[ETH_ALEN];
> +	int err;
> +
> +	err = fbnic_dsn_to_mac_addr(dsn, addr);
> +	if (!err) {
> +		ether_addr_copy(netdev->perm_addr, addr);
> +		eth_hw_addr_set(netdev, addr);
> +	} else {
> +		dev_err(fbd->dev, "MAC addr %pM invalid\n", addr);

Rather than fail, it is more normal to allocate a random MAC address.

> @@ -192,7 +266,6 @@ static int fbnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  
>  	fbnic_devlink_unregister(fbd);
>  	fbnic_devlink_free(fbd);
> -
>  	return err;
>  }

That hunk should be somewhere else.

     Andrew


* Re: [net-next PATCH 05/15] eth: fbnic: add message parsing for FW messages
  2024-04-03 20:08 ` [net-next PATCH 05/15] eth: fbnic: add message parsing for FW messages Alexander Duyck
@ 2024-04-03 21:07   ` Jeff Johnson
  0 siblings, 0 replies; 163+ messages in thread
From: Jeff Johnson @ 2024-04-03 21:07 UTC (permalink / raw)
  To: Alexander Duyck, netdev; +Cc: Alexander Duyck, kuba, davem, pabeni

On 4/3/2024 1:08 PM, Alexander Duyck wrote:
> From: Alexander Duyck <alexanderduyck@fb.com>
> 
> Add FW message formatting and parsing. The TLV format should
> look very familiar to anyone who has worked with netlink.
> Since we don't have to deal with backward compatibility
> we tweaked the format a little to make it easier to deal
> with, and more appropriate for tightly coupled interfaces
> like driver<>FW communication.
> 
> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
> ---
>  drivers/net/ethernet/meta/fbnic/Makefile    |    3 
>  drivers/net/ethernet/meta/fbnic/fbnic_tlv.c |  529 +++++++++++++++++++++++++++
>  drivers/net/ethernet/meta/fbnic/fbnic_tlv.h |  175 +++++++++
>  3 files changed, 706 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_tlv.c
>  create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_tlv.h
[...]
> +/**
> + *  fbnic_tlv_msg_alloc - Allocate page and initialize FW message header
> + *  @msg_id: Identifier for new message we are starting
> + *
> + *  Returns pointer to start of message, or NULL on failure.

should use Return: tag as documented at https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html#function-documentation

(although the kernel-doc script will accept a few variations such as Returns: or @Returns:)

currently scripts/kernel-doc -Wall -Werror -none $files reports:
drivers/net/ethernet/meta/fbnic/fbnic_fw.c:244: warning: No description found for return value of 'fbnic_fw_xmit_simple_msg'
drivers/net/ethernet/meta/fbnic/fbnic_fw.c:273: warning: No description found for return value of 'fbnic_fw_xmit_cap_msg'
drivers/net/ethernet/meta/fbnic/fbnic_fw.c:338: warning: No description found for return value of 'fbnic_fw_xmit_ownership_msg'
drivers/net/ethernet/meta/fbnic/fbnic_fw.c:659: warning: No description found for return value of 'fbnic_fw_xmit_comphy_set_msg'
drivers/net/ethernet/meta/fbnic/fbnic_irq.c:33: warning: No description found for return value of 'fbnic_fw_enable_mbx'
drivers/net/ethernet/meta/fbnic/fbnic_irq.c:111: warning: No description found for return value of 'fbnic_mac_get_link'
drivers/net/ethernet/meta/fbnic/fbnic_irq.c:146: warning: No description found for return value of 'fbnic_mac_enable'
drivers/net/ethernet/meta/fbnic/fbnic_mac.c:1020: warning: No description found for return value of 'fbnic_mac_init'
drivers/net/ethernet/meta/fbnic/fbnic_netdev.c:356: warning: No description found for return value of 'fbnic_netdev_alloc'
drivers/net/ethernet/meta/fbnic/fbnic_netdev.c:449: warning: No description found for return value of 'fbnic_netdev_register'
drivers/net/ethernet/meta/fbnic/fbnic_pci.c:300: warning: No description found for return value of 'fbnic_probe'
drivers/net/ethernet/meta/fbnic/fbnic_pci.c:614: warning: No description found for return value of 'fbnic_init_module'
drivers/net/ethernet/meta/fbnic/fbnic_tlv.c:24: warning: No description found for return value of 'fbnic_tlv_msg_alloc'
drivers/net/ethernet/meta/fbnic/fbnic_tlv.c:55: warning: No description found for return value of 'fbnic_tlv_attr_put_flag'
drivers/net/ethernet/meta/fbnic/fbnic_tlv.c:97: warning: No description found for return value of 'fbnic_tlv_attr_put_value'
drivers/net/ethernet/meta/fbnic/fbnic_tlv.c:141: warning: No description found for return value of '__fbnic_tlv_attr_put_int'
drivers/net/ethernet/meta/fbnic/fbnic_tlv.c:161: warning: No description found for return value of 'fbnic_tlv_attr_put_mac_addr'
drivers/net/ethernet/meta/fbnic/fbnic_tlv.c:178: warning: No description found for return value of 'fbnic_tlv_attr_put_string'
drivers/net/ethernet/meta/fbnic/fbnic_tlv.c:204: warning: No description found for return value of 'fbnic_tlv_attr_get_unsigned'
drivers/net/ethernet/meta/fbnic/fbnic_tlv.c:220: warning: No description found for return value of 'fbnic_tlv_attr_get_signed'
drivers/net/ethernet/meta/fbnic/fbnic_tlv.c:244: warning: No description found for return value of 'fbnic_tlv_attr_get_string'
drivers/net/ethernet/meta/fbnic/fbnic_tlv.c:264: warning: No description found for return value of 'fbnic_tlv_attr_nest_start'
drivers/net/ethernet/meta/fbnic/fbnic_tlv.c:387: warning: No description found for return value of 'fbnic_tlv_attr_parse_array'
drivers/net/ethernet/meta/fbnic/fbnic_tlv.c:439: warning: No description found for return value of 'fbnic_tlv_attr_parse'
drivers/net/ethernet/meta/fbnic/fbnic_tlv.c:487: warning: No description found for return value of 'fbnic_tlv_msg_parse'
drivers/net/ethernet/meta/fbnic/fbnic_tlv.c:520: warning: No description found for return value of 'fbnic_tlv_parser_error'
26 warnings as Errors

> + *
> + *  Allocates a page and initializes message header at start of page.
> + *  Initial message size is 1 DWORD which is just the header.
> + **/



* Re: [net-next PATCH 11/15] eth: fbnic: Enable Ethernet link setup
  2024-04-03 20:09 ` [net-next PATCH 11/15] eth: fbnic: Enable Ethernet link setup Alexander Duyck
@ 2024-04-03 21:11   ` Andrew Lunn
  2024-04-05 21:51   ` Andrew Lunn
  1 sibling, 0 replies; 163+ messages in thread
From: Andrew Lunn @ 2024-04-03 21:11 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

> +/* MAC PCS registers */
> +#define FBNIC_CSR_START_PCS		0x10000 /* CSR section delimiter */
> +#define FBNIC_PCS_CONTROL1_0		0x10000		/* 0x40000 */
> +#define FBNIC_PCS_CONTROL1_RESET		CSR_BIT(15)
> +#define FBNIC_PCS_CONTROL1_LOOPBACK		CSR_BIT(14)
> +#define FBNIC_PCS_CONTROL1_SPEED_SELECT_ALWAYS	CSR_BIT(13)
> +#define FBNIC_PCS_CONTROL1_SPEED_ALWAYS		CSR_BIT(6)
> +#define FBNIC_PCS_VENDOR_VL_INTVL_0	0x10202		/* 0x40808 */
> +#define FBNIC_PCS_VL0_0_CHAN_0		0x10208		/* 0x40820 */
> +#define FBNIC_PCS_VL0_1_CHAN_0		0x10209		/* 0x40824 */
> +#define FBNIC_PCS_VL1_0_CHAN_0		0x1020a		/* 0x40828 */
> +#define FBNIC_PCS_VL1_1_CHAN_0		0x1020b		/* 0x4082c */
> +#define FBNIC_PCS_VL2_0_CHAN_0		0x1020c		/* 0x40830 */
> +#define FBNIC_PCS_VL2_1_CHAN_0		0x1020d		/* 0x40834 */
> +#define FBNIC_PCS_VL3_0_CHAN_0		0x1020e		/* 0x40838 */
> +#define FBNIC_PCS_VL3_1_CHAN_0		0x1020f		/* 0x4083c */

Is this a licensed PCS? Synopsys DesignWare?

> +static void fbnic_set_led_state_asic(struct fbnic_dev *fbd, int state)
> +{
> +	struct fbnic_net *fbn = netdev_priv(fbd->netdev);
> +	u32 led_csr = FBNIC_MAC_ENET_LED_DEFAULT;
> +
> +	switch (state) {
> +	case FBNIC_LED_OFF:
> +		led_csr |= FBNIC_MAC_ENET_LED_AMBER |
> +			   FBNIC_MAC_ENET_LED_ACTIVITY_ON;
> +		break;
> +	case FBNIC_LED_ON:
> +		led_csr |= FBNIC_MAC_ENET_LED_BLUE |
> +			   FBNIC_MAC_ENET_LED_ACTIVITY_ON;
> +		break;
> +	case FBNIC_LED_RESTORE:
> +		led_csr |= FBNIC_MAC_ENET_LED_ACTIVITY_DEFAULT;
> +
> +		/* Don't set LEDs on if link isn't up */
> +		if (fbd->link_state != FBNIC_LINK_UP)
> +			break;
> +		/* Don't set LEDs for supported autoneg modes */
> +		if ((fbn->link_mode & FBNIC_LINK_AUTO) &&
> +		    (fbn->link_mode & FBNIC_LINK_MODE_MASK) != FBNIC_LINK_50R2)
> +			break;
> +
> +		/* Set LEDs based on link speed
> +		 * 100G	Blue,
> +		 * 50G	Blue & Amber
> +		 * 25G	Amber
> +		 */
> +		switch (fbn->link_mode & FBNIC_LINK_MODE_MASK) {
> +		case FBNIC_LINK_100R2:
> +			led_csr |= FBNIC_MAC_ENET_LED_BLUE;
> +			break;
> +		case FBNIC_LINK_50R1:
> +		case FBNIC_LINK_50R2:
> +			led_csr |= FBNIC_MAC_ENET_LED_BLUE;
> +			fallthrough;
> +		case FBNIC_LINK_25R1:
> +			led_csr |= FBNIC_MAC_ENET_LED_AMBER;
> +			break;
> +		}
> +		break;
> +	default:
> +		return;
> +	}
> +
> +	wr32(FBNIC_MAC_ENET_LED, led_csr);
> +}

Seems like you should be using /sys/class/leds and the netdev trigger.

      Andrew
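
For context, the netdev LED trigger Andrew is referring to is configured entirely through sysfs, roughly along these lines (the LED name "fbnic:blue:link" is hypothetical; the actual paths depend on how the driver would register its LEDs with the leds class):

```shell
# Bind the hypothetical LED to the netdev trigger
echo netdev > /sys/class/leds/fbnic:blue:link/trigger
# Tie it to the interface, then light on link and blink on activity
echo eth0 > /sys/class/leds/fbnic:blue:link/device_name
echo 1 > /sys/class/leds/fbnic:blue:link/link
echo 1 > /sys/class/leds/fbnic:blue:link/rx
echo 1 > /sys/class/leds/fbnic:blue:link/tx
```

Registering the LEDs this way hands the on/off/blink policy to userspace and the trigger core instead of hard-coding speed-to-color mappings in the driver.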


* Re: [net-next PATCH 02/15] eth: fbnic: add scaffolding for Meta's NIC driver
  2024-04-03 20:47     ` Alexander Duyck
@ 2024-04-03 21:17       ` Andrew Lunn
  2024-04-03 21:51         ` Alexander Duyck
  0 siblings, 1 reply; 163+ messages in thread
From: Andrew Lunn @ 2024-04-03 21:17 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On Wed, Apr 03, 2024 at 01:47:18PM -0700, Alexander Duyck wrote:
> On Wed, Apr 3, 2024 at 1:33 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > + * fbnic_init_module - Driver Registration Routine
> > > + *
> > > + * The first routine called when the driver is loaded.  All it does is
> > > + * register with the PCI subsystem.
> > > + **/
> > > +static int __init fbnic_init_module(void)
> > > +{
> > > +     int err;
> > > +
> > > +     pr_info(DRV_SUMMARY " (%s)", fbnic_driver.name);
> >
> > Please don't spam the kernel log like this. Drivers should only report
> > when something goes wrong.
> >
> >      Andrew
> 
> Really?

I think if you look around, GregKH has said this.

lsmod | wc
    167     585    6814

Do i really want my kernel log spammed with 167 'Hello world'
messages?

> I have always used something like this to determine that the
> driver isn't there when a user complains that the driver didn't load
> on a given device. It isn't as though it would be super spammy as this
> is something that is normally only run once when the module is loaded
> during early boot, and there isn't a good way to say the module isn't
> loaded if the driver itself isn't there.

lsmod

	Andrew


* Re: [net-next PATCH 02/15] eth: fbnic: add scaffolding for Meta's NIC driver
  2024-04-03 21:17       ` Andrew Lunn
@ 2024-04-03 21:51         ` Alexander Duyck
  2024-04-03 22:20           ` Andrew Lunn
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 21:51 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On Wed, Apr 3, 2024 at 2:17 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Wed, Apr 03, 2024 at 01:47:18PM -0700, Alexander Duyck wrote:
> > On Wed, Apr 3, 2024 at 1:33 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > > + * fbnic_init_module - Driver Registration Routine
> > > > + *
> > > > + * The first routine called when the driver is loaded.  All it does is
> > > > + * register with the PCI subsystem.
> > > > + **/
> > > > +static int __init fbnic_init_module(void)
> > > > +{
> > > > +     int err;
> > > > +
> > > > +     pr_info(DRV_SUMMARY " (%s)", fbnic_driver.name);
> > >
> > > Please don't spam the kernel log like this. Drivers should only report
> > > when something goes wrong.
> > >
> > >      Andrew
> >
> > Really?
>
> I think if you look around, GregKH has said this.
>
> lsmod | wc
>     167     585    6814
>
> Do i really want my kernel log spammed with 167 'Hello world'
> messages?

I would say it depends. Are you trying to boot off of all 167 devices?
The issue I run into is that I have to support boot scenarios where
the driver has to load as early as possible in order to mount a boot
image copied over the network. In many cases if something fails we
won't have access to something like lsmod since this is being used in
fairly small monolithic kernel images used for provisioning systems.

> > I have always used something like this to determine that the
> > driver isn't there when a user complains that the driver didn't load
> > on a given device. It isn't as though it would be super spammy as this
> > is something that is normally only run once when the module is loaded
> > during early boot, and there isn't a good way to say the module isn't
> > loaded if the driver itself isn't there.
>
> lsmod
>
>         Andrew

That assumes you have access to the system and aren't looking at logs
after the fact. In addition that assumes the module isn't built into
the kernel as well. Having the one line in the log provides a single
point of truth that is easily searchable without having to resort to
one of several different ways of trying to figure out if it is there:
[root@localhost ~]# dmesg | grep "Meta(R) Host Network Interface Driver"
[   11.890979] Meta(R) Host Network Interface Driver (fbnic)

Otherwise we are having to go searching in sysfs if it is there, or
lsmod, or whatever is your preferred way and that only works if we
have login access to the system and it isn't just doing something like
writing the log to a file and rebooting.

Thanks,

- Alex


* Re: [net-next PATCH 07/15] eth: fbnic: allocate a netdevice and napi vectors with queues
  2024-04-03 20:58   ` Andrew Lunn
@ 2024-04-03 22:15     ` Alexander Duyck
  2024-04-03 22:26       ` Andrew Lunn
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 22:15 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On Wed, Apr 3, 2024 at 1:58 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > +static int fbnic_dsn_to_mac_addr(u64 dsn, char *addr)
> > +{
> > +     addr[0] = (dsn >> 56) & 0xFF;
> > +     addr[1] = (dsn >> 48) & 0xFF;
> > +     addr[2] = (dsn >> 40) & 0xFF;
> > +     addr[3] = (dsn >> 16) & 0xFF;
> > +     addr[4] = (dsn >> 8) & 0xFF;
> > +     addr[5] = dsn & 0xFF;
>
> u64_to_ether_addr() might work here.

Actually I think it is the opposite byte order. In addition we have to
skip over bytes 3 and 4 in the center of this as those are just {
0xff, 0xff } assuming the DSN is properly formed.

> > +
> > +     return is_valid_ether_addr(addr) ? 0 : -EINVAL;
> > +}
> > +
> > +/**
> > + * fbnic_netdev_register - Initialize general software structures
> > + * @netdev: Netdev containing structure to initialize and register
> > + *
> > + * Initialize the MAC address for the netdev and register it.
> > + **/
> > +int fbnic_netdev_register(struct net_device *netdev)
> > +{
> > +     struct fbnic_net *fbn = netdev_priv(netdev);
> > +     struct fbnic_dev *fbd = fbn->fbd;
> > +     u64 dsn = fbd->dsn;
> > +     u8 addr[ETH_ALEN];
> > +     int err;
> > +
> > +     err = fbnic_dsn_to_mac_addr(dsn, addr);
> > +     if (!err) {
> > +             ether_addr_copy(netdev->perm_addr, addr);
> > +             eth_hw_addr_set(netdev, addr);
> > +     } else {
> > +             dev_err(fbd->dev, "MAC addr %pM invalid\n", addr);
>
> Rather than fail, it is more normal to allocate a random MAC address.

If the MAC address is invalid we are probably looking at an EEPROM
corruption. If requested we could port over a module parameter we have
that enables fallback as you are mentioning. However for us it is
better to default to failing since the MAC address is used to identify
the system within the datacenter and if it is randomly assigned it
will make it hard to correctly provision the system anyway.

> > @@ -192,7 +266,6 @@ static int fbnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> >
> >       fbnic_devlink_unregister(fbd);
> >       fbnic_devlink_free(fbd);
> > -
> >       return err;
> >  }
>
> That hunk should be somewhere else.
>
>      Andrew

Good catch. That wasn't supposed to be there. Must have accidentally
dropped that line.

Thanks,

- Alex


* Re: [net-next PATCH 02/15] eth: fbnic: add scaffolding for Meta's NIC driver
  2024-04-03 21:51         ` Alexander Duyck
@ 2024-04-03 22:20           ` Andrew Lunn
  2024-04-03 23:27             ` Alexander Duyck
  0 siblings, 1 reply; 163+ messages in thread
From: Andrew Lunn @ 2024-04-03 22:20 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

> I would say it depends. Are you trying to boot off of all 167 devices?

If i'm doing some sort of odd boot setup, i generally TFTP boot the
kernel, and then use NFS root. And i have everything built in. It not
finding the root fs because networking is FUBAR is pretty obvious. Bin
there, done that.

Please keep in mind the users here. This is a data centre NIC, not a
'grandma and grandpa' device which as the designated IT expert of the
family i need to help make work. Can the operators of Meta data
centres really not understand lsmod? Cannot look in /sys?

       Andrew



* Re: [net-next PATCH 07/15] eth: fbnic: allocate a netdevice and napi vectors with queues
  2024-04-03 22:15     ` Alexander Duyck
@ 2024-04-03 22:26       ` Andrew Lunn
  0 siblings, 0 replies; 163+ messages in thread
From: Andrew Lunn @ 2024-04-03 22:26 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On Wed, Apr 03, 2024 at 03:15:25PM -0700, Alexander Duyck wrote:
> On Wed, Apr 3, 2024 at 1:58 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > +static int fbnic_dsn_to_mac_addr(u64 dsn, char *addr)
> > > +{
> > > +     addr[0] = (dsn >> 56) & 0xFF;
> > > +     addr[1] = (dsn >> 48) & 0xFF;
> > > +     addr[2] = (dsn >> 40) & 0xFF;
> > > +     addr[3] = (dsn >> 16) & 0xFF;
> > > +     addr[4] = (dsn >> 8) & 0xFF;
> > > +     addr[5] = dsn & 0xFF;
> >
> > u64_to_ether_addr() might work here.
> 
> Actually I think it is the opposite byte order. In addition we have to
> skip over bytes 3 and 4 in the center of this as those are just {
> 0xff, 0xff } assuming the DSN is properly formed.

O.K. That is why i used 'might'

> > > +/**
> > > + * fbnic_netdev_register - Initialize general software structures
> > > + * @netdev: Netdev containing structure to initialize and register
> > > + *
> > > + * Initialize the MAC address for the netdev and register it.
> > > + **/
> > > +int fbnic_netdev_register(struct net_device *netdev)
> > > +{
> > > +     struct fbnic_net *fbn = netdev_priv(netdev);
> > > +     struct fbnic_dev *fbd = fbn->fbd;
> > > +     u64 dsn = fbd->dsn;
> > > +     u8 addr[ETH_ALEN];
> > > +     int err;
> > > +
> > > +     err = fbnic_dsn_to_mac_addr(dsn, addr);
> > > +     if (!err) {
> > > +             ether_addr_copy(netdev->perm_addr, addr);
> > > +             eth_hw_addr_set(netdev, addr);
> > > +     } else {
> > > +             dev_err(fbd->dev, "MAC addr %pM invalid\n", addr);
> >
> > Rather than fail, it is more normal to allocate a random MAC address.
> 
> If the MAC address is invalid we are probably looking at an EEPROM
> corruption. If requested we could port over a module parameter we have
> that enables fallback as you are mentioning. However for us it is
> better to default to failing since the MAC address is used to identify
> the system within the datacenter and if it is randomly assigned it
> will make it hard to correctly provision the system anyway.

So maybe add a comment about that.

But i would also expect your DHCP server to be helping out here. If it
gets a request with a MAC it does not know, it could allocate an IP
address from a pool for devices which are FUBAR. You can then at least
ssh into it and try to figure out what has gone wrong?

    Andrew


* Re: [net-next PATCH 02/15] eth: fbnic: add scaffolding for Meta's NIC driver
  2024-04-03 22:20           ` Andrew Lunn
@ 2024-04-03 23:27             ` Alexander Duyck
  0 siblings, 0 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-03 23:27 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On Wed, Apr 3, 2024 at 3:20 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > I would say it depends. Are you trying to boot off of all 167 devices?
>
> If i'm doing some sort of odd boot setup, i generally TFTP boot the
> kernel, and then use NFS root. And i have everything built in. It not
> finding the root fs because networking is FUBAR is pretty obvious. Bin
> there, done that.
>
> Please keep in mind the users here. This is a data centre NIC, not a
> 'grandma and grandpa' device which as the designated IT expert of the
> family i need to help make work. Can the operators of Meta data
> centres really not understand lsmod? Cannot look in /sys?
>
>        Andrew

At datacenter scales stopping to triage individual issues becomes
quite expensive. It implies that you are leaving the device in the
failed state while you take the time to come back around and figure
out what is going on. That is why in many cases we are not able to
stop and run an lsmod. Usually the error is recorded, the system
reset, and nothing comes of it unless it is a repeated issue.

Also it seems like this messaging is still being added for new
drivers. A quick search through the code for an example comes up with
cb7dd712189f ("octeon_ep_vf: Add driver framework and device
initialization") which is doing the same exact thing and is even a bit
noisier.

Anyway I partially agree that we do need to reduce the scope of the
noise, so I will update it to only print the message if we actually
register the driver. We can probably discuss this further for v2 when
I get it submitted.

Thanks,

- Alex


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
                   ` (15 preceding siblings ...)
  2024-04-03 20:42 ` [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Bjorn Helgaas
@ 2024-04-04 11:37 ` Jiri Pirko
  2024-04-04 14:45   ` Alexander Duyck
  2024-04-05 14:01 ` Przemek Kitszel
  2024-04-09 20:51 ` Jakub Kicinski
  18 siblings, 1 reply; 163+ messages in thread
From: Jiri Pirko @ 2024-04-04 11:37 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, bhelgaas, linux-pci, Alexander Duyck, kuba, davem, pabeni

Wed, Apr 03, 2024 at 10:08:24PM CEST, alexander.duyck@gmail.com wrote:
>This patch set includes the necessary patches to enable basic Tx and Rx
>over the Meta Platforms Host Network Interface. To do this we introduce a
>new driver and directories in the form of
>"drivers/net/ethernet/meta/fbnic".
>
>Due to submission limits the general plan is to submit a minimal driver for
>now almost equivalent to a UEFI driver in functionality, and then follow up
>over the coming weeks enabling additional offloads and more features for
>the device.
>
>The general plan is to look at adding support for ethtool, statistics, and
>start work on offloads in the next set of patches.

Could you please shed some light on the motivation to introduce this
driver in the community kernel? Is this device something people can
obtain in a shop, or is it rather something to be seen in Meta
datacenters only? If the latter is the case, why exactly would we need
this driver?



>
>---
>
>Alexander Duyck (15):
>      PCI: Add Meta Platforms vendor ID
>      eth: fbnic: add scaffolding for Meta's NIC driver
>      eth: fbnic: Allocate core device specific structures and devlink interface
>      eth: fbnic: Add register init to set PCIe/Ethernet device config
>      eth: fbnic: add message parsing for FW messages
>      eth: fbnic: add FW communication mechanism
>      eth: fbnic: allocate a netdevice and napi vectors with queues
>      eth: fbnic: implement Tx queue alloc/start/stop/free
>      eth: fbnic: implement Rx queue alloc/start/stop/free
>      eth: fbnic: Add initial messaging to notify FW of our presence
>      eth: fbnic: Enable Ethernet link setup
>      eth: fbnic: add basic Tx handling
>      eth: fbnic: add basic Rx handling
>      eth: fbnic: add L2 address programming
>      eth: fbnic: write the TCAM tables used for RSS control and Rx to host
>
>
> MAINTAINERS                                   |    7 +
> drivers/net/ethernet/Kconfig                  |    1 +
> drivers/net/ethernet/Makefile                 |    1 +
> drivers/net/ethernet/meta/Kconfig             |   29 +
> drivers/net/ethernet/meta/Makefile            |    6 +
> drivers/net/ethernet/meta/fbnic/Makefile      |   18 +
> drivers/net/ethernet/meta/fbnic/fbnic.h       |  148 ++
> drivers/net/ethernet/meta/fbnic/fbnic_csr.h   |  912 ++++++++
> .../net/ethernet/meta/fbnic/fbnic_devlink.c   |   86 +
> .../net/ethernet/meta/fbnic/fbnic_drvinfo.h   |    5 +
> drivers/net/ethernet/meta/fbnic/fbnic_fw.c    |  823 ++++++++
> drivers/net/ethernet/meta/fbnic/fbnic_fw.h    |  133 ++
> drivers/net/ethernet/meta/fbnic/fbnic_irq.c   |  251 +++
> drivers/net/ethernet/meta/fbnic/fbnic_mac.c   | 1025 +++++++++
> drivers/net/ethernet/meta/fbnic/fbnic_mac.h   |   83 +
> .../net/ethernet/meta/fbnic/fbnic_netdev.c    |  470 +++++
> .../net/ethernet/meta/fbnic/fbnic_netdev.h    |   59 +
> drivers/net/ethernet/meta/fbnic/fbnic_pci.c   |  633 ++++++
> drivers/net/ethernet/meta/fbnic/fbnic_rpc.c   |  709 +++++++
> drivers/net/ethernet/meta/fbnic/fbnic_rpc.h   |  189 ++
> drivers/net/ethernet/meta/fbnic/fbnic_tlv.c   |  529 +++++
> drivers/net/ethernet/meta/fbnic/fbnic_tlv.h   |  175 ++
> drivers/net/ethernet/meta/fbnic/fbnic_txrx.c  | 1873 +++++++++++++++++
> drivers/net/ethernet/meta/fbnic/fbnic_txrx.h  |  125 ++
> include/linux/pci_ids.h                       |    2 +
> 25 files changed, 8292 insertions(+)
> create mode 100644 drivers/net/ethernet/meta/Kconfig
> create mode 100644 drivers/net/ethernet/meta/Makefile
> create mode 100644 drivers/net/ethernet/meta/fbnic/Makefile
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic.h
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_csr.h
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_devlink.c
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_drvinfo.h
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_fw.c
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_fw.h
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_irq.c
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_mac.c
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_mac.h
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_netdev.h
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_pci.c
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_rpc.c
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_rpc.h
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_tlv.c
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_tlv.h
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
> create mode 100644 drivers/net/ethernet/meta/fbnic/fbnic_txrx.h
>
>--
>
>

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 09/15] eth: fbnic: implement Rx queue alloc/start/stop/free
  2024-04-03 20:09 ` [net-next PATCH 09/15] eth: fbnic: implement Rx " Alexander Duyck
@ 2024-04-04 11:42   ` kernel test robot
  0 siblings, 0 replies; 163+ messages in thread
From: kernel test robot @ 2024-04-04 11:42 UTC (permalink / raw)
  To: Alexander Duyck, netdev
  Cc: llvm, oe-kbuild-all, Alexander Duyck, kuba, davem, pabeni

Hi Alexander,

kernel test robot noticed the following build errors:

[auto build test ERROR on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Alexander-Duyck/PCI-Add-Meta-Platforms-vendor-ID/20240404-041319
base:   net-next/main
patch link:    https://lore.kernel.org/r/171217494276.1598374.468010123854919775.stgit%40ahduyck-xeon-server.home.arpa
patch subject: [net-next PATCH 09/15] eth: fbnic: implement Rx queue alloc/start/stop/free
config: s390-allmodconfig (https://download.01.org/0day-ci/archive/20240404/202404041954.giczimoH-lkp@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 546dc2245ffc4cccd0b05b58b7a5955e355a3b27)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240404/202404041954.giczimoH-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404041954.giczimoH-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/net/ethernet/meta/fbnic/fbnic_txrx.c:4:
   In file included from include/linux/iopoll.h:14:
   In file included from include/linux/io.h:13:
   In file included from arch/s390/include/asm/io.h:78:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     547 |         val = __raw_readb(PCI_IOBASE + addr);
         |                           ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     560 |         val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
      37 | #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
         |                                                           ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
     102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
         |                                                      ^
   In file included from drivers/net/ethernet/meta/fbnic/fbnic_txrx.c:4:
   In file included from include/linux/iopoll.h:14:
   In file included from include/linux/io.h:13:
   In file included from arch/s390/include/asm/io.h:78:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     573 |         val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro '__le32_to_cpu'
      35 | #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
         |                                                           ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
     115 | #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
         |                                                      ^
   In file included from drivers/net/ethernet/meta/fbnic/fbnic_txrx.c:4:
   In file included from include/linux/iopoll.h:14:
   In file included from include/linux/io.h:13:
   In file included from arch/s390/include/asm/io.h:78:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     584 |         __raw_writeb(value, PCI_IOBASE + addr);
         |                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     594 |         __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     604 |         __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:692:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     692 |         readsb(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:700:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     700 |         readsw(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:708:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     708 |         readsl(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:717:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     717 |         writesb(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:726:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     726 |         writesw(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:735:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     735 |         writesl(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   In file included from drivers/net/ethernet/meta/fbnic/fbnic_txrx.c:5:
   In file included from include/linux/pci.h:37:
   In file included from include/linux/device.h:32:
   In file included from include/linux/device/driver.h:21:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/s390/include/asm/elf.h:173:
   In file included from arch/s390/include/asm/mmu_context.h:11:
   In file included from arch/s390/include/asm/pgalloc.h:18:
   In file included from include/linux/mm.h:2208:
   include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     508 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     509 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:515:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     515 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     516 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     522 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:527:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     527 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     528 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:536:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     536 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     537 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
>> drivers/net/ethernet/meta/fbnic/fbnic_txrx.c:1041:6: error: call to '__compiletime_assert_733' declared with 'error' attribute: FIELD_PREP: value too large for the field
    1041 |             FIELD_PREP(FBNIC_QUEUE_RDE_CTL0_MIN_TROOM_MASK, FBNIC_RX_TROOM);
         |             ^
   include/linux/bitfield.h:115:3: note: expanded from macro 'FIELD_PREP'
     115 |                 __BF_FIELD_CHECK(_mask, 0ULL, _val, "FIELD_PREP: ");    \
         |                 ^
   include/linux/bitfield.h:68:3: note: expanded from macro '__BF_FIELD_CHECK'
      68 |                 BUILD_BUG_ON_MSG(__builtin_constant_p(_val) ?           \
         |                 ^
   include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
      39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
         |                                     ^
   note: (skipping 1 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
   include/linux/compiler_types.h:448:2: note: expanded from macro '_compiletime_assert'
     448 |         __compiletime_assert(condition, msg, prefix, suffix)
         |         ^
   include/linux/compiler_types.h:441:4: note: expanded from macro '__compiletime_assert'
     441 |                         prefix ## suffix();                             \
         |                         ^
   <scratch space>:54:1: note: expanded from here
      54 | __compiletime_assert_733
         | ^
   17 warnings and 1 error generated.


vim +1041 drivers/net/ethernet/meta/fbnic/fbnic_txrx.c

  1030	
  1031	static void fbnic_config_drop_mode_rcq(struct fbnic_napi_vector *nv,
  1032					       struct fbnic_ring *rcq)
  1033	{
  1034		u32 drop_mode, rcq_ctl;
  1035	
  1036		drop_mode = FBNIC_QUEUE_RDE_CTL0_DROP_IMMEDIATE;
  1037	
  1038		/* Specify packet layout */
  1039		rcq_ctl = FIELD_PREP(FBNIC_QUEUE_RDE_CTL0_DROP_MODE_MASK, drop_mode) |
  1040		    FIELD_PREP(FBNIC_QUEUE_RDE_CTL0_MIN_HROOM_MASK, FBNIC_RX_HROOM) |
> 1041		    FIELD_PREP(FBNIC_QUEUE_RDE_CTL0_MIN_TROOM_MASK, FBNIC_RX_TROOM);
  1042	
  1043		fbnic_ring_wr32(rcq, FBNIC_QUEUE_RDE_CTL0, rcq_ctl);
  1044	}
  1045	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-04 11:37 ` Jiri Pirko
@ 2024-04-04 14:45   ` Alexander Duyck
  2024-04-04 15:24     ` Andrew Lunn
  2024-04-04 15:36     ` Jiri Pirko
  0 siblings, 2 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-04 14:45 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, bhelgaas, linux-pci, Alexander Duyck, kuba, davem, pabeni

On Thu, Apr 4, 2024 at 4:37 AM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Wed, Apr 03, 2024 at 10:08:24PM CEST, alexander.duyck@gmail.com wrote:
> >This patch set includes the necessary patches to enable basic Tx and Rx
> >over the Meta Platforms Host Network Interface. To do this we introduce a
> >new driver and directories in the form of
> >"drivers/net/ethernet/meta/fbnic".
> >
> >Due to submission limits, the general plan is to submit a minimal driver for
> >now almost equivalent to a UEFI driver in functionality, and then follow up
> >over the coming weeks enabling additional offloads and more features for
> >the device.
> >
> >The general plan is to look at adding support for ethtool and
> >statistics, and to start work on offloads in the next set of patches.
>
> Could you please shed some light on the motivation to introduce this
> driver in the community kernel? Is this device something people can
> obtain in a shop, or is it rather something to be seen in the Meta
> datacenter only? If the second is the case, why exactly would we need
> this driver?

For now this is Meta only. However there are several reasons for
wanting to include this in the upstream kernel.

First is the fact that from a maintenance standpoint it is easier to
avoid drifting from the upstream APIs and such if we are in the
kernel. It makes things much easier to maintain, as we can just pull
in patches without having to add onto that work by crafting backports
around code that isn't already upstream.

Second is the fact that as we introduce new features with our driver
it is much easier to show a proof of concept with the driver being in
the kernel than not. It makes it much harder to work with the
community on offloads and such if we don't have a good vehicle to use
for that. What this driver will provide is an opportunity to push
changes that would be beneficial to us, and likely the rest of the
community without being constrained by what vendors decide they want
to enable or not. The general idea is that if we can show benefit with
our NIC then other vendors would be more likely to follow in our path.

Lastly, there is a good chance that we may end up opening up more than
just the driver code for this project assuming we can get past these
initial hurdles. I don't know if you have noticed but Meta is pretty
involved in the Open Compute Project. So if we want to work with third
parties on things like firmware and hardware it makes it much easier
to do so if the driver is already open and publicly available in the
Linux kernel.

Thanks,

- Alex


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-04 14:45   ` Alexander Duyck
@ 2024-04-04 15:24     ` Andrew Lunn
  2024-04-04 15:37       ` Jakub Kicinski
  2024-04-04 15:36     ` Jiri Pirko
  1 sibling, 1 reply; 163+ messages in thread
From: Andrew Lunn @ 2024-04-04 15:24 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jiri Pirko, netdev, bhelgaas, linux-pci, Alexander Duyck, kuba,
	davem, pabeni

> For now this is Meta only. However there are several reasons for
> wanting to include this in the upstream kernel.
> 
> First is the fact that from a maintenance standpoint it is easier to
> avoid drifting from the upstream APIs and such if we are in the
> kernel. It makes things much easier to maintain, as we can just pull
> in patches without having to add onto that work by crafting backports
> around code that isn't already upstream.
> 
> Second is the fact that as we introduce new features with our driver
> it is much easier to show a proof of concept with the driver being in
> the kernel than not. It makes it much harder to work with the
> community on offloads and such if we don't have a good vehicle to use
> for that. What this driver will provide is an opportunity to push
> changes that would be beneficial to us, and likely the rest of the
> community without being constrained by what vendors decide they want
> to enable or not. The general idea is that if we can show benefit with
> our NIC then other vendors would be more likely to follow in our path.
 
Given the discussion going on in the thread "mlx5 ConnectX control
misc driver", do you also plan to show you don't need such a misc
driver?

	Andrew


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-04 14:45   ` Alexander Duyck
  2024-04-04 15:24     ` Andrew Lunn
@ 2024-04-04 15:36     ` Jiri Pirko
  2024-04-04 18:35       ` Andrew Lunn
  2024-04-04 19:22       ` Alexander Duyck
  1 sibling, 2 replies; 163+ messages in thread
From: Jiri Pirko @ 2024-04-04 15:36 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, bhelgaas, linux-pci, Alexander Duyck, kuba, davem, pabeni

Thu, Apr 04, 2024 at 04:45:14PM CEST, alexander.duyck@gmail.com wrote:
>On Thu, Apr 4, 2024 at 4:37 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Wed, Apr 03, 2024 at 10:08:24PM CEST, alexander.duyck@gmail.com wrote:
>> >This patch set includes the necessary patches to enable basic Tx and Rx
>> >over the Meta Platforms Host Network Interface. To do this we introduce a
>> >new driver and directories in the form of
>> >"drivers/net/ethernet/meta/fbnic".
>> >
>> >Due to submission limits, the general plan is to submit a minimal driver for
>> >now almost equivalent to a UEFI driver in functionality, and then follow up
>> >over the coming weeks enabling additional offloads and more features for
>> >the device.
>> >
>> >The general plan is to look at adding support for ethtool and
>> >statistics, and to start work on offloads in the next set of patches.
>>
>> Could you please shed some light on the motivation to introduce this
>> driver in the community kernel? Is this device something people can
>> obtain in a shop, or is it rather something to be seen in the Meta
>> datacenter only? If the second is the case, why exactly would we need
>> this driver?
>
>For now this is Meta only. However there are several reasons for
>wanting to include this in the upstream kernel.
>
>First is the fact that from a maintenance standpoint it is easier to
>avoid drifting from the upstream APIs and such if we are in the
>kernel. It makes things much easier to maintain, as we can just pull
>in patches without having to add onto that work by crafting backports
>around code that isn't already upstream.

That is making life easier for you while making it harder for the
community. Of no relevance.


>
>Second is the fact that as we introduce new features with our driver
>it is much easier to show a proof of concept with the driver being in
>the kernel than not. It makes it much harder to work with the
>community on offloads and such if we don't have a good vehicle to use
>for that. What this driver will provide is an opportunity to push
>changes that would be beneficial to us, and likely the rest of the
>community without being constrained by what vendors decide they want
>to enable or not. The general idea is that if we can show benefit with
>our NIC then other vendors would be more likely to follow in our path.

Yeah, so not only would we have to maintain a driver nobody (outside
Meta) uses or cares about, you say that we will likely maintain even
more dead code related to that. I think that in the Linux kernel,
there are many examples of similarly dead code that causes a lot of
headaches to maintain.

You just want to make your life easier here again. Don't drag the
community into this, please.


>
>Lastly, there is a good chance that we may end up opening up more than
>just the driver code for this project assuming we can get past these
>initial hurdles. I don't know if you have noticed but Meta is pretty
>involved in the Open Compute Project. So if we want to work with third
>parties on things like firmware and hardware it makes it much easier
>to do so if the driver is already open and publicly available in the
>Linux kernel.

OCP, ehm, let's say I have my reservations...
Okay, what motivation would anyone have to touch the fw of hardware
running inside the Meta datacenter only? It does not make any sense.

I'd say come again when your HW is not limited to the Meta datacenter.
For the record and FWIW, I NACK this.


>
>Thanks,
>
>- Alex


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-04 15:24     ` Andrew Lunn
@ 2024-04-04 15:37       ` Jakub Kicinski
  2024-04-05  3:08         ` David Ahern
  0 siblings, 1 reply; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-04 15:37 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Alexander Duyck, Jiri Pirko, netdev, bhelgaas, linux-pci,
	Alexander Duyck, davem, pabeni

On Thu, 4 Apr 2024 17:24:03 +0200 Andrew Lunn wrote:
> Given the discussion going on in the thread "mlx5 ConnectX control
> misc driver", do you also plan to show you don't need such a misc
> driver?

To the extent to which it is possible to prove the non-existence
of something :) Since my argument that Meta can use _vendor_ devices
without resorting to "misc drivers" doesn't convince them, I doubt
this data point would help.


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-04 15:36     ` Jiri Pirko
@ 2024-04-04 18:35       ` Andrew Lunn
  2024-04-04 19:05         ` Leon Romanovsky
  2024-04-04 19:22       ` Alexander Duyck
  1 sibling, 1 reply; 163+ messages in thread
From: Andrew Lunn @ 2024-04-04 18:35 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Alexander Duyck, netdev, bhelgaas, linux-pci, Alexander Duyck,
	kuba, davem, pabeni

> OCP, ehm, let's say I have my reservations...
> Okay, what motivation would anyone have to touch the fw of hardware
> running inside the Meta datacenter only? It does not make any sense.
> 
> I'd say come again when your HW is not limited to the Meta datacenter.
> For the record and FWIW, I NACK this.

Is ENA used outside of Amazon data centres?

Is FUNGIBLE used outside of Microsoft data centres?

Is gVNIC used outside of Google data centres?

I don't actually know; I'm more of an embedded developer.

  Andrew



* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-04 18:35       ` Andrew Lunn
@ 2024-04-04 19:05         ` Leon Romanovsky
  0 siblings, 0 replies; 163+ messages in thread
From: Leon Romanovsky @ 2024-04-04 19:05 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jiri Pirko, Alexander Duyck, netdev, bhelgaas, linux-pci,
	Alexander Duyck, kuba, davem, pabeni

On Thu, Apr 04, 2024 at 08:35:18PM +0200, Andrew Lunn wrote:
> > OCP, ehm, let's say I have my reservations...
> > Okay, what motivation would anyone have to touch the fw of hardware
> > running inside the Meta datacenter only? It does not make any sense.
> > 
> > I'd say come again when your HW is not limited to the Meta datacenter.
> > For the record and FWIW, I NACK this.
> 
> Is ENA used outside of Amazon data centres?

At least this driver is needed when you use a cloud instance in
Amazon and want to install your own kernel/OS.

> 
> Is FUNGIBLE used outside of Microsoft data centres?

Have no idea

> 
> Is gVNIC used outside of Google data centres?

You need this driver too.
https://cloud.google.com/compute/docs/networking/using-gvnic

> 
> I don't actually know; I'm more of an embedded developer.
> 
>   Andrew
> 
> 


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-04 15:36     ` Jiri Pirko
  2024-04-04 18:35       ` Andrew Lunn
@ 2024-04-04 19:22       ` Alexander Duyck
  2024-04-04 20:25         ` Jakub Kicinski
  2024-04-08 10:54         ` Jiri Pirko
  1 sibling, 2 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-04 19:22 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, bhelgaas, linux-pci, Alexander Duyck, kuba, davem, pabeni

On Thu, Apr 4, 2024 at 8:36 AM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Thu, Apr 04, 2024 at 04:45:14PM CEST, alexander.duyck@gmail.com wrote:
> >On Thu, Apr 4, 2024 at 4:37 AM Jiri Pirko <jiri@resnulli.us> wrote:
> >>
> >> Wed, Apr 03, 2024 at 10:08:24PM CEST, alexander.duyck@gmail.com wrote:

<...>

> >> Could you please shed some light on the motivation to introduce this
> >> driver in the community kernel? Is this device something people can
> >> obtain in a shop, or is it rather something to be seen in the Meta
> >> datacenter only? If the second is the case, why exactly would we need
> >> this driver?
> >
> >For now this is Meta only. However there are several reasons for
> >wanting to include this in the upstream kernel.
> >
> >First is the fact that from a maintenance standpoint it is easier to
> >avoid drifting from the upstream APIs and such if we are in the
> >kernel. It makes things much easier to maintain, as we can just pull
> >in patches without having to add onto that work by crafting backports
> >around code that isn't already upstream.
>
> That is making life easier for you, making it harder for the community.
> O relevance.
>
>
> >
> >Second is the fact that as we introduce new features with our driver
> >it is much easier to show a proof of concept with the driver being in
> >the kernel than not. It makes it much harder to work with the
> >community on offloads and such if we don't have a good vehicle to use
> >for that. What this driver will provide is an opportunity to push
> >changes that would be beneficial to us, and likely the rest of the
> >community without being constrained by what vendors decide they want
> >to enable or not. The general idea is that if we can show benefit with
> >our NIC then other vendors would be more likely to follow in our path.
>
> Yeah, so not even we would have to maintain driver nobody (outside Meta)
> uses or cares about, you say that we will likely maintain more of a dead
> code related to that. I think that in Linux kernel, there any many
> examples of similarly dead code that causes a lot of headaches to
> maintain.
>
> You just want to make your life easier here again. Don't drag community
> into this please.

The argument itself doesn't really hold water. The fact is the Meta
data centers are not an insignificant consumer of Linux, so it isn't
as if the driver isn't going to be used. The argument also implies
some lack of good faith from Meta. I don't understand that, as we are
contributing across multiple areas of the kernel, including networking
and eBPF. Is
Meta expected to start pulling time from our upstream maintainers to
have them update out-of-tree kernel modules since the community isn't
willing to let us maintain it in the kernel? Is the message that the
kernel is expected to get value from Meta, but that value is not meant
to be reciprocated? Would you really rather have us start maintaining
our own internal kernel with our own "proprietary goodness", and ask
other NIC vendors to have to maintain their drivers against yet
another kernel if they want to be used in our data centers?

As pointed out by Andrew, we aren't the first data center to push a
driver for our own proprietary device. The fact is there have been
drivers added for devices that were for purely emulated devices with
no actual customers such as rocker. Should the switch vendors at the
time have pushed back on it stating it wasn't a real "for sale"
device? The whole argument seems counter to what is expected. When a
vendor creates a new device and will likely be enabling new kernel
features my understanding is that it is better to be in the kernel
than not.

Putting a criteria on it that it must be "for sale" seems rather
arbitrary and capricious, especially given that most drivers have to
be pushed out long before they are available in the market in order to
meet deadlines to get the driver into OSV releases such as Red Hat when
it hits the market. By that logic should we block all future drivers
until we can find them for sale somewhere? That way we don't run the
risk of adding a vendor driver for a product that might be scrapped
due to a last minute bug that will cause it to never be released.


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-04 19:22       ` Alexander Duyck
@ 2024-04-04 20:25         ` Jakub Kicinski
  2024-04-04 21:59           ` John Fastabend
  2024-04-08 10:54         ` Jiri Pirko
  1 sibling, 1 reply; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-04 20:25 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jiri Pirko, netdev, bhelgaas, linux-pci, Alexander Duyck, davem, pabeni

On Thu, 4 Apr 2024 12:22:02 -0700 Alexander Duyck wrote:
> The argument itself doesn't really hold water. The fact is the Meta
> data centers are not an insignificant consumer of Linux, 

customer or beneficiary?

> so it isn't as if the driver isn't going to be used. This implies
> some lack of good faith from Meta.

"Good faith" is not a sufficient foundation for a community consisting
of volunteers and commercial entities (with the xz debacle maybe even
less today than it was a month ago). As a maintainer I really don't
want to be in the position of judging the "good faith" of corporate
actors.

> I don't understand that as we are
> contributing across multiple areas in the kernel including networking
> and ebpf. Is Meta expected to start pulling time from our upstream
> maintainers to have them update out-of-tree kernel modules since the
> community isn't willing to let us maintain it in the kernel? Is the
> message that the kernel is expected to get value from Meta, but that
> value is not meant to be reciprocated? Would you really rather have
> us start maintaining our own internal kernel with our own
> "proprietary goodness", and ask other NIC vendors to have to maintain
> their drivers against yet another kernel if they want to be used in
> our data centers?

Please allow the community to make rational choices in the interest of
the project and more importantly the interest of its broader user base.

Google would also claim "good faith" -- it undoubtedly supports
the kernel, and lets some of its best engineers contribute.
Did that make them stop trying to build Fuchsia? The "good faith" of
companies operates within the limits of what they consider rational
and beneficial.

I don't want to put my thumb on the scale (yet?), but (with my
maintainer hat on) please don't use the "Meta is good" argument, because
someone will send a similar driver from a less involved company later on
and we'll be accused of playing favorites :( Plus companies can change
their approach to open source from "inclusive" to "extractive" 
(to borrow the economic terminology) rather quickly.

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-04 20:25         ` Jakub Kicinski
@ 2024-04-04 21:59           ` John Fastabend
  2024-04-04 23:50             ` Jakub Kicinski
                               ` (2 more replies)
  0 siblings, 3 replies; 163+ messages in thread
From: John Fastabend @ 2024-04-04 21:59 UTC (permalink / raw)
  To: Jakub Kicinski, Alexander Duyck
  Cc: Jiri Pirko, netdev, bhelgaas, linux-pci, Alexander Duyck, davem, pabeni

Jakub Kicinski wrote:
> On Thu, 4 Apr 2024 12:22:02 -0700 Alexander Duyck wrote:
> > The argument itself doesn't really hold water. The fact is the Meta
> > data centers are not an insignificant consumer of Linux, 
> 
> customer or beneficiary ?
> 
> > so it isn't as if the driver isn't going to be used. This implies
> > some lack of good faith from Meta.
> 
> "Good faith" is not a sufficient foundation for a community consisting
> of volunteers, and commercial entities (with the xz debacle maybe even
> less today than it was a month ago). As a maintainer I really don't want
> to be in position of judging the "good faith" of corporate actors.
> 
> > I don't understand that as we are
> > contributing across multiple areas in the kernel including networking
> > and ebpf. Is Meta expected to start pulling time from our upstream
> > maintainers to have them update out-of-tree kernel modules since the
> > community isn't willing to let us maintain it in the kernel? Is the
> > message that the kernel is expected to get value from Meta, but that
> > value is not meant to be reciprocated? Would you really rather have
> > us start maintaining our own internal kernel with our own
> > "proprietary goodness", and ask other NIC vendors to have to maintain
> > their drivers against yet another kernel if they want to be used in
> > our data centers?
> 
> Please allow the community to make rational choices in the interest of
> the project and more importantly the interest of its broader user base.
> 
> Google would also claim "good faith" -- undoubtedly is supports 
> the kernel, and lets some of its best engineers contribute.
> Did that make them stop trying to build Fuchsia? The "good faith" of
> companies operates with the limits of margin of error of they consider
> rational and beneficial.
> 
> I don't want to put my thumb on the scale (yet?), but (with my
> maintainer hat on) please don't use the "Meta is good" argument, because
> someone will send a similar driver from a less involved company later on
> and we'll be accused of playing favorites :( Plus companies can change
> their approach to open source from "inclusive" to "extractive" 
> (to borrow the economic terminology) rather quickly.
> 

I'll throw my $.02 in. In this case you have a driver that I have only
scanned so far, but it looks well done. Alex has written lots of
drivers; I trust he will not just abandon it. And if it does end up
abandoned and no one supports it at some future point we can deprecate
it the same as any other driver in the networking tree. All the
feedback is being answered and debate is happening, so I expect we
will get a v2, v3 or so. All good signs in my view.

Back to your point about faith in a company. I don't think we even
need to care about a company's business plans. The author could have
submitted with their personal address for what it's worth and called
it drivers/alexware/duyck.o. A bit extreme, and I would have called
him on it, but hopefully the point is clear.

We have lots of drivers in the tree that are hard to physically get ahold
of. Or otherwise gated by paying some vendor for compute time, etc. to
use. We even have some drivers where the hardware itself never made
it out into the wild, or only a single customer used it before the
seller burned it for commercial reasons, the hw wasn't workable, the
team was cut, etc.

I can't see how whether I have a physical NIC for it on my desk here
makes much difference one way or the other.

The alternative is much worse: someone builds a team of engineers,
locks them up, they build some interesting pieces, and we never get to
see it because we tried to block someone from open-sourcing their
driver? Eventually they need some kernel changes and then we block
those too because we didn't allow the driver that was the use case?
This seems wrong to me.

Anyways, we have zero ways to enforce such a policy. Have vendors
ship a NIC to somebody with the v0 of the patch set? Attach a picture?
Even if vendor X claims they will have a product in N months and
then only sells it to qualified customers, what do we do then?
The driver author could even believe the hardware will be available
when they post the driver, but the business may change out of the
hands of the developer.

I'm 100% on letting this through assuming Alex is on top of feedback
and the code is good. I think any other policy would be very ugly
to enforce, prove, and even understand. Obviously code and architecture
debates I'm all for. Ensuring we have a trusted, experienced person
signed up to review code, address feedback, fix whatever syzbot finds
and so on is also a must I think. I'm sure Alex will take care of
it.

Thanks,
John

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-04 21:59           ` John Fastabend
@ 2024-04-04 23:50             ` Jakub Kicinski
  2024-04-05  0:11               ` Alexander Duyck
  2024-04-04 23:50             ` Alexander Duyck
  2024-04-08 11:05             ` Jiri Pirko
  2 siblings, 1 reply; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-04 23:50 UTC (permalink / raw)
  To: John Fastabend
  Cc: Alexander Duyck, Jiri Pirko, netdev, bhelgaas, linux-pci,
	Alexander Duyck, davem, pabeni

On Thu, 04 Apr 2024 14:59:33 -0700 John Fastabend wrote:
> The alternative is much worse someone builds a team of engineers locks
> them up they build some interesting pieces and we never get to see it
> because we tried to block someone from opensourcing their driver?

Opensourcing is just one push to github.
There are guarantees we give to upstream drivers.

> Eventually they need some kernel changes and than we block those too
> because we didn't allow the driver that was the use case? This seems
> wrong to me.

The flip side of the argument is: what if we allow some device we
don't have access to to make changes to the core for its benefit? The
owner reports that some changes broke the kernel for them. Kernel
rules: regression, we have to revert. This is not a hypothetical;
"less than cooperative users" demanding reverts and "reporting us to
Linus" are a reality :(

Technical solution? Maybe if it's not a public device, regression
rules don't apply? Seems fairly reasonable.

> Anyways we have zero ways to enforce such a policy. Have vendors
> ship a NIC to somebody with the v0 of the patch set? Attach a picture? 

GenAI world, pictures mean nothing :) We do have a CI in netdev, which
is all ready to ingest external results, and a (currently tiny?) set
of tests for NICs. Prove that you care about the device by running the
upstream tests and reporting results? Seems fairly reasonable.

> Even if vendor X claims they will have a product in N months and
> than only sells it to qualified customers what to do we do then.
> Driver author could even believe the hardware will be available
> when they post the driver, but business may change out of hands
> of the developer.
> 
> I'm 100% on letting this through assuming Alex is on top of feedback
> and the code is good.

I'd strongly prefer if we detach our trust and respect for Alex
from whatever precedent we make here. I can't stress this enough.
IDK if I'm exaggerating or it's hard to appreciate the challenges 
of maintainership without living it, but I really don't like being
accused of playing favorites or big companies buying their way in :(

> I think any other policy would be very ugly to enforce, prove, and
> even understand. Obviously code and architecture debates I'm all for.
> Ensuring we have a trusted, experienced person signed up to review
> code, address feedback, fix whatever syzbot finds and so on is also a
> must I think. I'm sure Alex will take care of it.

"Whatever syzbot finds" may be slightly moot for a private device ;)
but otherwise 100%! These are exactly the kind of points I think we
should enumerate. I started writing a list of expectations a while back:

Documentation/maintainer/feature-and-driver-maintainers.rst

I think we just need something like this, maybe just a step up, for
non-public devices..

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-04 21:59           ` John Fastabend
  2024-04-04 23:50             ` Jakub Kicinski
@ 2024-04-04 23:50             ` Alexander Duyck
  2024-04-08 11:05             ` Jiri Pirko
  2 siblings, 0 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-04 23:50 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jakub Kicinski, Jiri Pirko, netdev, bhelgaas, linux-pci,
	Alexander Duyck, davem, pabeni

On Thu, Apr 4, 2024 at 2:59 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Jakub Kicinski wrote:
> > On Thu, 4 Apr 2024 12:22:02 -0700 Alexander Duyck wrote:
> > > I don't understand that as we are
> > > contributing across multiple areas in the kernel including networking
> > > and ebpf. Is Meta expected to start pulling time from our upstream
> > > maintainers to have them update out-of-tree kernel modules since the
> > > community isn't willing to let us maintain it in the kernel? Is the
> > > message that the kernel is expected to get value from Meta, but that
> > > value is not meant to be reciprocated? Would you really rather have
> > > us start maintaining our own internal kernel with our own
> > > "proprietary goodness", and ask other NIC vendors to have to maintain
> > > their drivers against yet another kernel if they want to be used in
> > > our data centers?
> >
> > Please allow the community to make rational choices in the interest of
> > the project and more importantly the interest of its broader user base.
> >
> > Google would also claim "good faith" -- undoubtedly is supports
> > the kernel, and lets some of its best engineers contribute.
> > Did that make them stop trying to build Fuchsia? The "good faith" of
> > companies operates with the limits of margin of error of they consider
> > rational and beneficial.
> >
> > I don't want to put my thumb on the scale (yet?), but (with my
> > maintainer hat on) please don't use the "Meta is good" argument, because
> > someone will send a similar driver from a less involved company later on
> > and we'll be accused of playing favorites :( Plus companies can change
> > their approach to open source from "inclusive" to "extractive"
> > (to borrow the economic terminology) rather quickly.
> >
>
> I'll throw my $.02 in. In this case you have a driver that I only scanned
> so far, but looks well done. Alex has written lots of drivers I trust he
> will not just abondon it. And if it does end up abondoned and no one
> supports it at some future point we can deprecate it same as any other
> driver in the networking tree. All the feedback is being answered and
> debate is happening so I expect will get a v2, v3 or so. All good signs
> in my point.
>
> Back to your point about faith in a company. I don't think we even need
> to care about whatever companies business plans. The author could have
> submitted with their personal address for what its worth and called it
> drivers/alexware/duyck.o Bit extreme and I would have called him on it,
> but hopefully the point is clear.
>
> We have lots of drivers in the tree that are hard to physically get ahold
> of. Or otherwise gated by paying some vendor for compute time, etc. to
> use. We even have some drivers where the hardware itself never made
> it out into the wild or only a single customer used it before sellers
> burned it for commercial reasons or hw wasn't workable, team was cut, etc.
>
> I can't see how if I have a physical NIC for it on my desk here makes
> much difference one way or the other.

The advantage of Meta not selling it publicly at this time is that
Meta is both the consumer and the maintainer, so if a kernel API
change broke something Meta would be responsible for fixing it. It
would be in Meta's own interest to maintain the part, and if it breaks
Meta would be the only one impacted, assuming the breaking change at
least builds. So rather than "good faith", maybe I should have said
"motivated self-interest".

It seems like the worst case scenario would actually be some
commercial product being sold. Worse yet, one with proprietary bits
such as firmware on the device. Should commercial vendors be required
to provide some sort of documentation proving they are dedicated to
their product and financially stable enough to maintain it for the
entire product life cycle? What if the vendor sells some significant
number of units to Linux users out there, and then either goes under,
gets acquired, and/or just decides to go in a new direction? In that
scenario we have the driver unmaintained and consumers impacted, but
the company responsible for it has no motivation to address it other
than maybe negative PR.

> The alternative is much worse someone builds a team of engineers locks
> them up they build some interesting pieces and we never get to see it
> because we tried to block someone from opensourcing their driver?
> Eventually they need some kernel changes and than we block those too
> because we didn't allow the driver that was the use case? This seems
> wrong to me.
>
> Anyways we have zero ways to enforce such a policy. Have vendors
> ship a NIC to somebody with the v0 of the patch set? Attach a picture?
> Even if vendor X claims they will have a product in N months and
> than only sells it to qualified customers what to do we do then.
> Driver author could even believe the hardware will be available
> when they post the driver, but business may change out of hands
> of the developer.

This is what I was referring to as being "arbitrary and capricious".
The issue is what we would define as a NIC being for sale
commercially. Do we have to support a certain form factor, sell it for
a certain price, or sell a certain quantity?

If anything, maybe we should look more at something like a blast
radius in terms of inclusion. If this were to go wrong, how could it
go wrong, and who would be impacted?

> I'm 100% on letting this through assuming Alex is on top of feedback
> and the code is good. I think any other policy would be very ugly
> to enforce, prove, and even understand. Obviously code and architecture
> debates I'm all for. Ensuring we have a trusted, experienced person
> signed up to review code, address feedback, fix whatever syzbot finds
> and so on is also a must I think. I'm sure Alex will take care of
> it.
>
> Thanks,
> John

Thanks for your reply. I had started a reply, but you probably worded
this better than I could have.

Thanks

- Alex

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-04 23:50             ` Jakub Kicinski
@ 2024-04-05  0:11               ` Alexander Duyck
  2024-04-05  2:38                 ` Jakub Kicinski
  2024-04-05  7:11                 ` Paolo Abeni
  0 siblings, 2 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-05  0:11 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: John Fastabend, Jiri Pirko, netdev, bhelgaas, linux-pci,
	Alexander Duyck, davem, pabeni

On Thu, Apr 4, 2024 at 4:50 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 04 Apr 2024 14:59:33 -0700 John Fastabend wrote:
> > The alternative is much worse someone builds a team of engineers locks
> > them up they build some interesting pieces and we never get to see it
> > because we tried to block someone from opensourcing their driver?
>
> Opensourcing is just one push to github.
> There are guarantees we give to upstream drivers.

Are there? Do we have them documented somewhere?

> > Eventually they need some kernel changes and than we block those too
> > because we didn't allow the driver that was the use case? This seems
> > wrong to me.
>
> The flip side of the argument is, what if we allow some device we don't
> have access to to make changes to the core for its benefit. Owner
> reports that some changes broke the kernel for them. Kernel rules,
> regression, we have to revert. This is not a hypothetical, "less than
> cooperative users" demanding reverts, and "reporting us to Linus"
> is a reality :(
>
> Technical solution? Maybe if it's not a public device regression rules
> don't apply? Seems fairly reasonable.

This is a hypothetical. This driver currently isn't changing anything
outside of itself. At this point the driver would only be build tested
by everyone else. They could just not include it in their Kconfig and
then out-of-sight, out-of-mind.

> > Anyways we have zero ways to enforce such a policy. Have vendors
> > ship a NIC to somebody with the v0 of the patch set? Attach a picture?
>
> GenAI world, pictures mean nothing :) We do have a CI in netdev, which
> is all ready to ingest external results, and a (currently tiny amount?)
> of test for NICs. Prove that you care about the device by running the
> upstream tests and reporting results? Seems fairly reasonable.

That seems like an opportunity to be exploited though. Are the
results going to be verified in any way? Maybe cryptographically
signed? It seems like it would be easy enough to fake the results.
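
Just to make the faking concern concrete: even if reporters were
asked to HMAC-sign their result blobs (a hypothetical scheme, not an
existing netdev CI mechanism), a signature only proves the key holder
produced the blob, not that the tests actually ran. A minimal sketch
using only the Python standard library:

```python
import hashlib
import hmac
import json

# Hypothetical shared secret, distributed to reporters out of band.
SECRET_KEY = b"shared-ci-secret"

def sign_results(results: dict, key: bytes = SECRET_KEY) -> dict:
    """Attach an HMAC-SHA256 tag over a canonical encoding of the results."""
    payload = json.dumps(results, sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"results": results, "hmac": tag}

def verify_results(signed: dict, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    payload = json.dumps(signed["results"], sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["hmac"])

signed = sign_results({"test": "fbnic_tx_basic", "result": "pass"})
assert verify_results(signed)

# Third-party tampering with the blob is detected...
signed["results"]["result"] = "fail"
assert not verify_results(signed)
```

...but whoever holds the key can sign fabricated results just as
easily, which is exactly my point about faking.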

> > Even if vendor X claims they will have a product in N months and
> > than only sells it to qualified customers what to do we do then.
> > Driver author could even believe the hardware will be available
> > when they post the driver, but business may change out of hands
> > of the developer.
> >
> > I'm 100% on letting this through assuming Alex is on top of feedback
> > and the code is good.
>
> I'd strongly prefer if we detach our trust and respect for Alex
> from whatever precedent we make here. I can't stress this enough.
> IDK if I'm exaggerating or it's hard to appreciate the challenges
> of maintainership without living it, but I really don't like being
> accused of playing favorites or big companies buying their way in :(

Again, I would say we look at the blast radius. That is how we should
be measuring any change. At this point the driver is self contained
into /drivers/net/ethernet/meta/fbnic/. It isn't exporting anything
outside that directory, and it can be switched off via Kconfig.
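
To be concrete about the gating, this is the standard vendor Kconfig
pattern; a rough sketch (symbol names illustrative, not a verbatim
copy of the patches) looks like:

```kconfig
# drivers/net/ethernet/meta/Kconfig (illustrative sketch)
config NET_VENDOR_META
	bool "Meta devices"
	default y
	help
	  If you do not have a Meta network device, saying N here just
	  causes the configurator to skip all the questions about Meta
	  cards; nothing else in the kernel is affected.

config FBNIC
	tristate "Meta Platforms Host Network Interface support"
	depends on NET_VENDOR_META && PCI_MSI
	help
	  This driver supports the Meta Platforms Host Network
	  Interface. The device is not publicly available, so unless
	  you are building a kernel for a Meta data center host you
	  can safely say N.
```

Anyone who never sees the hardware leaves FBNIC unset and the driver
is never built, let alone loaded.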

When the time comes to start adding new features we can probably start
by looking at how to add either generic offloads like was done for
GSO, CSO, etc., or how they can also be implemented on another
vendor's NIC.

At this point the only risk the driver presents is that it is yet
another driver, done in the same style I did the other Intel drivers,
and so any kernel API changes will end up needing to be applied to it
just like the other drivers.

> > I think any other policy would be very ugly to enforce, prove, and
> > even understand. Obviously code and architecture debates I'm all for.
> > Ensuring we have a trusted, experienced person signed up to review
> > code, address feedback, fix whatever syzbot finds and so on is also a
> > must I think. I'm sure Alex will take care of it.
>
> "Whatever syzbot finds" may be slightly moot for a private device ;)
> but otherwise 100%! These are exactly the kind of points I think we
> should enumerate. I started writing a list of expectations a while back:
>
> Documentation/maintainer/feature-and-driver-maintainers.rst
>
> I think we just need something like this, maybe just a step up, for
> non-public devices..

I honestly think we are getting the cart ahead of the horse. When we
start talking about kernel API changes then we can probably get into
the whole "private" versus "publicly available" argument. A good
example of the kind of thing I am thinking of is GSO partial where I
ended up with Mellanox and Intel sending me 40G and 100G NICs and
cables to implement it on their devices as all I had was essentially
igb and ixgbe based NICs.

Odds are when we start getting to those kinds of things maybe we need
to look at having a few systems available for developer use, but until
then I am not sure it makes sense to focus on whether the device is
publicly available or not.

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-05  0:11               ` Alexander Duyck
@ 2024-04-05  2:38                 ` Jakub Kicinski
  2024-04-05 15:41                   ` Alexander Duyck
  2024-04-05  7:11                 ` Paolo Abeni
  1 sibling, 1 reply; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-05  2:38 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: John Fastabend, Jiri Pirko, netdev, bhelgaas, linux-pci,
	Alexander Duyck, davem, pabeni

On Thu, 4 Apr 2024 17:11:47 -0700 Alexander Duyck wrote:
> > Opensourcing is just one push to github.
> > There are guarantees we give to upstream drivers.  
> 
> Are there? Do we have them documented somewhere?

I think they are somewhere in Documentation/
To some extent this question in itself supports my point that written
down rules, as out of date as they may be, seem to carry more respect
than what a maintainer says :S

> > > Eventually they need some kernel changes and than we block those too
> > > because we didn't allow the driver that was the use case? This seems
> > > wrong to me.  
> >
> > The flip side of the argument is, what if we allow some device we don't
> > have access to to make changes to the core for its benefit. Owner
> > reports that some changes broke the kernel for them. Kernel rules,
> > regression, we have to revert. This is not a hypothetical, "less than
> > cooperative users" demanding reverts, and "reporting us to Linus"
> > is a reality :(
> >
> > Technical solution? Maybe if it's not a public device regression rules
> > don't apply? Seems fairly reasonable.  
> 
> This is a hypothetical. This driver currently isn't changing anything
> outside of itself. At this point the driver would only be build tested
> by everyone else. They could just not include it in their Kconfig and
> then out-of-sight, out-of-mind.

Not changing does not mean not depending on existing behavior.
Investigating and fixing properly even the hardest regressions in 
the stack is a bar that Meta can so easily clear. I don't understand
why you are arguing.

> > > Anyways we have zero ways to enforce such a policy. Have vendors
> > > ship a NIC to somebody with the v0 of the patch set? Attach a picture?  
> >
> > GenAI world, pictures mean nothing :) We do have a CI in netdev, which
> > is all ready to ingest external results, and a (currently tiny amount?)
> > of test for NICs. Prove that you care about the device by running the
> > upstream tests and reporting results? Seems fairly reasonable.  
> 
> That seems like an opportunity to be exploited through. Are the
> results going to be verified in any way? Maybe cryptographically
> signed? Seems like it would be easy enough to fake the results.

I think it's much easier to just run the tests than write a system
which will competently lie. But even if we completely suspend trust,
someone lying is of no cost to the community in this case.

> > > Even if vendor X claims they will have a product in N months and
> > > than only sells it to qualified customers what to do we do then.
> > > Driver author could even believe the hardware will be available
> > > when they post the driver, but business may change out of hands
> > > of the developer.
> > >
> > > I'm 100% on letting this through assuming Alex is on top of feedback
> > > and the code is good.  
> >
> > I'd strongly prefer if we detach our trust and respect for Alex
> > from whatever precedent we make here. I can't stress this enough.
> > IDK if I'm exaggerating or it's hard to appreciate the challenges
> > of maintainership without living it, but I really don't like being
> > accused of playing favorites or big companies buying their way in :(  
> 
> Again, I would say we look at the blast radius. That is how we should
> be measuring any change. At this point the driver is self contained
> into /drivers/net/ethernet/meta/fbnic/. It isn't exporting anything
> outside that directory, and it can be switched off via Kconfig.

It is not practical to ponder every change case by case. Maintainers
are overworked. How long until we send the uAPI patch for RSS on the
flow label? I'd rather not re-litigate this every time someone posts
a slightly different feature. Let's cover the obvious points from 
the beginning while everyone is paying attention. We can amend later
as need be.

> When the time comes to start adding new features we can probably start
> by looking at how to add either generic offloads like was done for
> GSO, CSO, ect or how it can also be implemented on another vendor's
> NIC.
> 
> At this point the only risk the driver presents is that it is yet
> another driver, done in the same style I did the other Intel drivers,
> and so any kernel API changes will end up needing to be applied to it
> just like the other drivers.

The risk is we'll have a fight every time there is a disagreement about
the expectations.

> > > I think any other policy would be very ugly to enforce, prove, and
> > > even understand. Obviously code and architecture debates I'm all for.
> > > Ensuring we have a trusted, experienced person signed up to review
> > > code, address feedback, fix whatever syzbot finds and so on is also a
> > > must I think. I'm sure Alex will take care of it.  
> >
> > "Whatever syzbot finds" may be slightly moot for a private device ;)
> > but otherwise 100%! These are exactly the kind of points I think we
> > should enumerate. I started writing a list of expectations a while back:
> >
> > Documentation/maintainer/feature-and-driver-maintainers.rst
> >
> > I think we just need something like this, maybe just a step up, for
> > non-public devices..  
> 
> I honestly think we are getting the cart ahead of the horse. When we
> start talking about kernel API changes then we can probably get into
> the whole "private" versus "publicly available" argument. A good
> example of the kind of thing I am thinking of is GSO partial where I
> ended up with Mellanox and Intel sending me 40G and 100G NICs and
> cables to implement it on their devices as all I had was essentially
> igb and ixgbe based NICs.

That'd be great. Maybe even more than I'd expect. So why not write
it down? In case the person doing the coding is not Alex Duyck, and
just wants to get it done for their narrow use case, get a promo,
go work on something else?

> Odds are when we start getting to those kind of things maybe we need
> to look at having a few systems available for developer use, but until
> then I am not sure it makes sense to focus on if the device is
> publicly available or not.

Developer access would be huge.
A mirage of developer access? Immaterial :)

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-04 15:37       ` Jakub Kicinski
@ 2024-04-05  3:08         ` David Ahern
  0 siblings, 0 replies; 163+ messages in thread
From: David Ahern @ 2024-04-05  3:08 UTC (permalink / raw)
  To: Jakub Kicinski, Andrew Lunn
  Cc: Alexander Duyck, Jiri Pirko, netdev, bhelgaas, linux-pci,
	Alexander Duyck, davem, pabeni

On 4/4/24 9:37 AM, Jakub Kicinski wrote:
> On Thu, 4 Apr 2024 17:24:03 +0200 Andrew Lunn wrote:
>> Given the discussion going on in the thread "mlx5 ConnectX control
>> misc driver", you also plan to show you don't need such a misc driver?
> 
> To the extent to which it is possible to prove an in-existence
> of something :) Since my argument that Meta can use _vendor_ devices
> without resorting to "misc drivers" doesn't convince them, I doubt
> this data point would help.
> 

When this device gets widely deployed and you have the inevitable
production problems (inevitable in the sense that this is designed,
implemented and deployed by humans, who make mistakes, and then S/W
has to compensate for the quirks), you can see whether it is easy to
completely, sanely and efficiently debug those problems solely with
extensions to open source tools.

But I am guessing that is still 1+ years away before you will know.

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-05  0:11               ` Alexander Duyck
  2024-04-05  2:38                 ` Jakub Kicinski
@ 2024-04-05  7:11                 ` Paolo Abeni
  2024-04-05 12:26                   ` Jason Gunthorpe
  2024-04-08 11:37                   ` Jiri Pirko
  1 sibling, 2 replies; 163+ messages in thread
From: Paolo Abeni @ 2024-04-05  7:11 UTC (permalink / raw)
  To: Alexander Duyck, Jakub Kicinski
  Cc: John Fastabend, Jiri Pirko, netdev, bhelgaas, linux-pci,
	Alexander Duyck, davem

On Thu, 2024-04-04 at 17:11 -0700, Alexander Duyck wrote:
> Again, I would say we look at the blast radius. That is how we should
> be measuring any change. At this point the driver is self contained
> into /drivers/net/ethernet/meta/fbnic/. It isn't exporting anything
> outside that directory, and it can be switched off via Kconfig.

I personally think this is the most relevant point. This is just a new
NIC driver, completely self-contained. I quickly glanced over the
code and it looks like it's not doing anything obviously bad. It
really looks like a usual, legit NIC driver.

I don't think the fact that the NIC itself is hard to come by for
anyone outside <organization> makes a difference. A long time ago Greg
noted that drivers have been merged for H/W known to have a _single_
existing instance (IIRC; I can't find the reference off the top of my
head, but back then it was quite well known, I hope some other old guy
remembers).

To me, the maintainership burden is on Meta: Alex/Meta will have to
handle bug reports, breakages, and user complaints (I guess this last
would be the easier part ;). If he/they do not cope with the process
we can simply revert the driver. I would be quite surprised if such a
situation were to happen, but the impact from my PoV looks minimal.

TL;DR: I don't see any good reason not to accept this, unless my quick
glance was too quick and very wrong, but it looks like others have a
similar view.

Cheers,

Paolo



* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-05  7:11                 ` Paolo Abeni
@ 2024-04-05 12:26                   ` Jason Gunthorpe
  2024-04-05 13:06                     ` Daniel Borkmann
  2024-04-05 14:24                     ` Alexander Duyck
  2024-04-08 11:37                   ` Jiri Pirko
  1 sibling, 2 replies; 163+ messages in thread
From: Jason Gunthorpe @ 2024-04-05 12:26 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Alexander Duyck, Jakub Kicinski, John Fastabend, Jiri Pirko,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem,
	Christoph Hellwig

On Fri, Apr 05, 2024 at 09:11:19AM +0200, Paolo Abeni wrote:
> On Thu, 2024-04-04 at 17:11 -0700, Alexander Duyck wrote:
> > Again, I would say we look at the blast radius. That is how we should
> > be measuring any change. At this point the driver is self contained
> > into /drivers/net/ethernet/meta/fbnic/. It isn't exporting anything
> > outside that directory, and it can be switched off via Kconfig.
> 
> I personally think this is the most relevant point. This is just a new
> NIC driver, completely self-encapsulated. I quickly glanced over the
> code and it looks like it's not doing anything obviously bad. It really
> looks like a usual, legit NIC driver.

This is completely true, and as I've said many times the kernel as a
project is substantially about supporting the HW that people actually
build. There is no reason not to merge yet another basic netdev
driver.

However, there is also a pretty strong red line in Linux where people
believe, with strong conviction, that kernel code should not be merged
only to support a proprietary userspace. This submission is clearly
blurring that line. This driver will only run in Meta's proprietary
kernel fork on servers running Meta's proprietary userspace.

At this point perhaps it is OK, a basic NIC driver is not really an
issue, but Jiri is also very correct to point out that this is heading
in a very concerning direction.

Alex already indicated new features are coming, changes to the core
code will be proposed. How should those be evaluated? Hypothetically
should fbnic be allowed to be the first implementation of something
invasive like Mina's DMABUF work? Google published an open userspace
for NCCL that people can (in theory at least) actually run. Meta would
not be able to do that. I would say that clearly crosses the line and
should not be accepted.

So I think there should be an expectation that technically sound things
Meta may propose must not be accepted because they cross the
ideological red line into enabling only proprietary software.

To me it sets up a fairly common anti-pattern where a vendor starts
out with good intentions, reaches community pushback and falls back to
their downstream fork. Once forking occurs it becomes self-reinforcing
as built up infrastructure like tests and CI will only run correctly
on the fork and the fork grows. Then eventually the upstream code is
abandoned. This has happened many times before in Linux..

IMHO from a community perspective I feel like we should expect Meta to
fail and end up with a fork. The community should warn them. However
if they really want to try anyhow then I'm not sure it would be
appropriate to stop them at this point. Meta will just end up being a
"bad vendor".

I think the best thing the netdev community could do is come up with
some more clear guidelines what Meta could use fbnic to justify and
what would be rejected (ideologically) and Meta can decide on their
own if they want to continue.

Jason


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-05 12:26                   ` Jason Gunthorpe
@ 2024-04-05 13:06                     ` Daniel Borkmann
  2024-04-05 14:24                     ` Alexander Duyck
  1 sibling, 0 replies; 163+ messages in thread
From: Daniel Borkmann @ 2024-04-05 13:06 UTC (permalink / raw)
  To: Jason Gunthorpe, Paolo Abeni
  Cc: Alexander Duyck, Jakub Kicinski, John Fastabend, Jiri Pirko,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem,
	Christoph Hellwig

On 4/5/24 2:26 PM, Jason Gunthorpe wrote:
> On Fri, Apr 05, 2024 at 09:11:19AM +0200, Paolo Abeni wrote:
>> On Thu, 2024-04-04 at 17:11 -0700, Alexander Duyck wrote:
>>> Again, I would say we look at the blast radius. That is how we should
>>> be measuring any change. At this point the driver is self contained
>>> into /drivers/net/ethernet/meta/fbnic/. It isn't exporting anything
>>> outside that directory, and it can be switched off via Kconfig.
>>
>> I personally think this is the most relevant point. This is just a new
>> NIC driver, completely self-encapsulated. I quickly glanced over the
>> code and it looks like it's not doing anything obviously bad. It really
>> looks like a usual, legit NIC driver.
> 
> This is completely true, and as I've said many times the kernel as a
> project is substantially about supporting the HW that people actually
> build. There is no reason not to merge yet another basic netdev
> driver.
> 
> However, there is also a pretty strong red line in Linux where people
> believe, with strong conviction, that kernel code should not be merged
> only to support a proprietary userspace. This submission is clearly
> blurring that line. This driver will only run in Meta's proprietary
> kernel fork on servers running Meta's proprietary userspace.
> 
> At this point perhaps it is OK, a basic NIC driver is not really an
> issue, but Jiri is also very correct to point out that this is heading
> in a very concerning direction.
> 
> Alex already indicated new features are coming, changes to the core
> code will be proposed. How should those be evaluated? Hypothetically
> should fbnic be allowed to be the first implementation of something
> invasive like Mina's DMABUF work?

My $0.02 from only reading this thread on the side: when it comes to
extending and integrating with core networking code (e.g. larger features
like offloads, xdp/af_xdp, etc.) the networking community has always
requested at least two driver implementations to showcase that the code
extensions touching core code are not unique to just a single
driver/NIC/vendor. I'd expect this to hold true here as well.


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
                   ` (16 preceding siblings ...)
  2024-04-04 11:37 ` Jiri Pirko
@ 2024-04-05 14:01 ` Przemek Kitszel
  2024-04-06 16:53   ` Alexander Duyck
  2024-04-09 20:51 ` Jakub Kicinski
  18 siblings, 1 reply; 163+ messages in thread
From: Przemek Kitszel @ 2024-04-05 14:01 UTC (permalink / raw)
  To: Alexander Duyck, netdev
  Cc: bhelgaas, linux-pci, Alexander Duyck, kuba, davem, pabeni

On 4/3/24 22:08, Alexander Duyck wrote:
> This patch set includes the necessary patches to enable basic Tx and Rx
> over the Meta Platforms Host Network Interface. To do this we introduce a
> new driver and directories in the form of
> "drivers/net/ethernet/meta/fbnic".
> 
> Due to submission limits the general plan is to submit a minimal driver,
> for now almost equivalent to a UEFI driver in functionality, and then follow up
> over the coming weeks enabling additional offloads and more features for
> the device.
> 
> The general plan is to look at adding support for ethtool, statistics, and
> start work on offloads in the next set of patches.
> 
> ---
> 
> Alexander Duyck (15):
>        PCI: Add Meta Platforms vendor ID
>        eth: fbnic: add scaffolding for Meta's NIC driver
>        eth: fbnic: Allocate core device specific structures and devlink interface
>        eth: fbnic: Add register init to set PCIe/Ethernet device config
>        eth: fbnic: add message parsing for FW messages
>        eth: fbnic: add FW communication mechanism
>        eth: fbnic: allocate a netdevice and napi vectors with queues
>        eth: fbnic: implement Tx queue alloc/start/stop/free
>        eth: fbnic: implement Rx queue alloc/start/stop/free
>        eth: fbnic: Add initial messaging to notify FW of our presence
>        eth: fbnic: Enable Ethernet link setup
>        eth: fbnic: add basic Tx handling
>        eth: fbnic: add basic Rx handling
>        eth: fbnic: add L2 address programming
>        eth: fbnic: write the TCAM tables used for RSS control and Rx to host
> 
> 
>   MAINTAINERS                                   |    7 +
>   drivers/net/ethernet/Kconfig                  |    1 +
>   drivers/net/ethernet/Makefile                 |    1 +
>   drivers/net/ethernet/meta/Kconfig             |   29 +
>   drivers/net/ethernet/meta/Makefile            |    6 +
>   drivers/net/ethernet/meta/fbnic/Makefile      |   18 +
>   drivers/net/ethernet/meta/fbnic/fbnic.h       |  148 ++
>   drivers/net/ethernet/meta/fbnic/fbnic_csr.h   |  912 ++++++++
>   .../net/ethernet/meta/fbnic/fbnic_devlink.c   |   86 +
>   .../net/ethernet/meta/fbnic/fbnic_drvinfo.h   |    5 +
>   drivers/net/ethernet/meta/fbnic/fbnic_fw.c    |  823 ++++++++
>   drivers/net/ethernet/meta/fbnic/fbnic_fw.h    |  133 ++
>   drivers/net/ethernet/meta/fbnic/fbnic_irq.c   |  251 +++
>   drivers/net/ethernet/meta/fbnic/fbnic_mac.c   | 1025 +++++++++
>   drivers/net/ethernet/meta/fbnic/fbnic_mac.h   |   83 +
>   .../net/ethernet/meta/fbnic/fbnic_netdev.c    |  470 +++++
>   .../net/ethernet/meta/fbnic/fbnic_netdev.h    |   59 +
>   drivers/net/ethernet/meta/fbnic/fbnic_pci.c   |  633 ++++++
>   drivers/net/ethernet/meta/fbnic/fbnic_rpc.c   |  709 +++++++
>   drivers/net/ethernet/meta/fbnic/fbnic_rpc.h   |  189 ++
>   drivers/net/ethernet/meta/fbnic/fbnic_tlv.c   |  529 +++++
>   drivers/net/ethernet/meta/fbnic/fbnic_tlv.h   |  175 ++
>   drivers/net/ethernet/meta/fbnic/fbnic_txrx.c  | 1873 +++++++++++++++++
>   drivers/net/ethernet/meta/fbnic/fbnic_txrx.h  |  125 ++
>   include/linux/pci_ids.h                       |    2 +
>   25 files changed, 8292 insertions(+)

Even if this is just basic scaffolding for what will come, it's hard
to believe that no patch was co-developed, or should be marked as
authored by some other developer.

[...]


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-05 12:26                   ` Jason Gunthorpe
  2024-04-05 13:06                     ` Daniel Borkmann
@ 2024-04-05 14:24                     ` Alexander Duyck
  2024-04-05 15:17                       ` Jason Gunthorpe
  2024-04-09 16:53                       ` Edward Cree
  1 sibling, 2 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-05 14:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Paolo Abeni, Jakub Kicinski, John Fastabend, Jiri Pirko, netdev,
	bhelgaas, linux-pci, Alexander Duyck, davem, Christoph Hellwig

On Fri, Apr 5, 2024 at 5:26 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Fri, Apr 05, 2024 at 09:11:19AM +0200, Paolo Abeni wrote:
> > On Thu, 2024-04-04 at 17:11 -0700, Alexander Duyck wrote:
> > > Again, I would say we look at the blast radius. That is how we should
> > > be measuring any change. At this point the driver is self contained
> > > into /drivers/net/ethernet/meta/fbnic/. It isn't exporting anything
> > > outside that directory, and it can be switched off via Kconfig.
> >
> > I personally think this is the most relevant point. This is just a new
> > NIC driver, completely self-encapsulated. I quickly glanced over the
> > code and it looks like it's not doing anything obviously bad. It really
> > looks like a usual, legit NIC driver.
>
> This is completely true, and as I've said many times the kernel as a
> project is substantially about supporting the HW that people actually
> build. There is no reason not to merge yet another basic netdev
> driver.
>
> However, there is also a pretty strong red line in Linux where people
> believe, with strong conviction, that kernel code should not be merged
> only to support a proprietary userspace. This submission is clearly
> blurring that line. This driver will only run in Meta's proprietary
> kernel fork on servers running Meta's proprietary userspace.
>
> At this point perhaps it is OK, a basic NIC driver is not really an
> issue, but Jiri is also very correct to point out that this is heading
> in a very concerning direction.
>
> Alex already indicated new features are coming, changes to the core
> code will be proposed. How should those be evaluated? Hypothetically
> should fbnic be allowed to be the first implementation of something
> invasive like Mina's DMABUF work? Google published an open userspace
> for NCCL that people can (in theory at least) actually run. Meta would
> not be able to do that. I would say that clearly crosses the line and
> should not be accepted.

Why not? Just because we are not commercially selling it doesn't mean
we couldn't look at other solutions such as QEMU. If we were to
provide a github repo with an emulation of the NIC, would that be
enough to satisfy the "commercial" requirement?

The fact is I already have an implementation, but I would probably
need to clean up a few things as the current setup requires 3 QEMU
instances to emulate the full setup with host, firmware, and BMC. It
wouldn't be as performant as the actual hardware but it is more than
enough for us to test code with. If we need to look at publishing
something like that to github in order to address the lack of user
availability I could start looking at getting the approvals for that.

> So I think there should be an expectation that technically sound things
> Meta may propose must not be accepted because they cross the
> ideological red line into enabling only proprietary software.

That is a faulty argument. That is like saying we should kick the
nouveau driver out of Linux just because it supports Nvidia graphics
cards that happen to also have a proprietary out-of-tree driver out
there, or maybe we need to kick all the Intel NIC drivers out for
DPDK? I can't think of many NIC vendors that don't have their own
out-of-tree drivers floating around with their own kernel bypass
solutions to support proprietary software.

> To me it sets up a fairly common anti-pattern where a vendor starts
> out with good intentions, reaches community pushback and falls back to
> their downstream fork. Once forking occurs it becomes self-reinforcing
> as built up infrastructure like tests and CI will only run correctly
> on the fork and the fork grows. Then eventually the upstream code is
> abandoned. This has happened many times before in Linux..
>
> IMHO from a community perspective I feel like we should expect Meta to
> fail and end up with a fork. The community should warn them. However
> if they really want to try anyhow then I'm not sure it would be
> appropriate to stop them at this point. Meta will just end up being a
> "bad vendor".
>
> I think the best thing the netdev community could do is come up with
> some more clear guidelines what Meta could use fbnic to justify and
> what would be rejected (ideologically) and Meta can decide on their
> own if they want to continue.

I agree. We need a consistent set of standards. I just strongly
believe commercial availability shouldn't be one of them.


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-05 14:24                     ` Alexander Duyck
@ 2024-04-05 15:17                       ` Jason Gunthorpe
  2024-04-05 18:38                         ` Alexander Duyck
  2024-04-09 16:53                       ` Edward Cree
  1 sibling, 1 reply; 163+ messages in thread
From: Jason Gunthorpe @ 2024-04-05 15:17 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Paolo Abeni, Jakub Kicinski, John Fastabend, Jiri Pirko, netdev,
	bhelgaas, linux-pci, Alexander Duyck, davem, Christoph Hellwig

On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
> > Alex already indicated new features are coming, changes to the core
> > code will be proposed. How should those be evaluated? Hypothetically
> > should fbnic be allowed to be the first implementation of something
> > invasive like Mina's DMABUF work? Google published an open userspace
> > for NCCL that people can (in theory at least) actually run. Meta would
> > not be able to do that. I would say that clearly crosses the line and
> > should not be accepted.
> 
> Why not? Just because we are not commercially selling it doesn't mean
> we couldn't look at other solutions such as QEMU. If we were to
> provide a github repo with an emulation of the NIC would that be
> enough to satisfy the "commercial" requirement?

My test is not "commercial", it is enabling open source ecosystem vs
benefiting only proprietary software.

In my hypothetical you'd need to do something like open source Meta's
implementation of the AI networking that the DMABUF patches enable,
and even then, since nobody could run it at full performance, the thing
is pretty questionable.

IMHO publishing a qemu chip emulator would not advance the open source
ecosystem around building a DMABUF AI networking scheme.

> > So I think there should be an expectation that technically sound things
> > Meta may propose must not be accepted because they cross the
> > ideological red line into enabling only proprietary software.
> 
> That is a faulty argument. That is like saying we should kick the
> nouveau driver out of Linux just because it supports Nvidia graphics
> cards that happen to also have a proprietary out-of-tree driver out
> there,

Huh? nouveau supports a fully open source mesa graphics stack in
Linux. How is that remotely similar to what I said? No issue.

> or maybe we need to kick all the Intel NIC drivers out for
> DPDK? 

DPDK is fully open source, again no issue.

You pointed at two things that I would consider to be exemplar open
source projects and said their existence somehow means we should be
purging drivers from the kernel???

I really don't understand what you are trying to say at all.

The kernel standard is that good quality open source *does* exist, we
tend to not care what proprietary things people create beyond that.

> I can't think of many NIC vendors that don't have their own
> out-of-tree drivers floating around with their own kernel bypass
> solutions to support proprietary software.

Most of those are also open source, and we can't say much about what
people do out of tree, obviously.

> I agree. We need a consistent set of standards. I just strongly
> believe commercial availability shouldn't be one of them.

I never said commercial availability. I talked about open source vs
proprietary userspace. This is very standard kernel stuff.

You have an unavailable NIC, so we know it is only ever operated with
Meta's proprietary kernel fork, supporting Meta's proprietary
userspace software. Where exactly is the open source?

Why should someone working to improve only their proprietary
environment be welcomed in the same way as someone working to improve
the open source ecosystem? That has never been the kernel community's
position.

If you want to propose things to the kernel that can only be
meaningfully used by your proprietary software then you should not
expect to succeed. No one should be surprised to hear this.

Jason


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-05  2:38                 ` Jakub Kicinski
@ 2024-04-05 15:41                   ` Alexander Duyck
  2024-04-08  6:18                     ` Leon Romanovsky
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-05 15:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: John Fastabend, Jiri Pirko, netdev, bhelgaas, linux-pci,
	Alexander Duyck, davem, pabeni

On Thu, Apr 4, 2024 at 7:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 4 Apr 2024 17:11:47 -0700 Alexander Duyck wrote:
> > > Opensourcing is just one push to github.
> > > There are guarantees we give to upstream drivers.
> >
> > Are there? Do we have them documented somewhere?
>
> I think they are somewhere in Documentation/
> To some extent this question in itself supports my point that written
> down rules, as out of date as they may be, seem to carry more respect
> than what a maintainer says :S

I think the problem is there are multiple maintainers and they all
have different ways of doing things. As a submitter over the years I
have had to deal with probably over a half dozen different maintainers
and each experience has been different. I have always argued that the
netdev tree is one of the better maintained setups.

> > > > Eventually they need some kernel changes and than we block those too
> > > > because we didn't allow the driver that was the use case? This seems
> > > > wrong to me.
> > >
> > > The flip side of the argument is, what if we allow some device we don't
> > > have access to to make changes to the core for its benefit. Owner
> > > reports that some changes broke the kernel for them. Kernel rules,
> > > regression, we have to revert. This is not a hypothetical, "less than
> > > cooperative users" demanding reverts, and "reporting us to Linus"
> > > is a reality :(
> > >
> > > Technical solution? Maybe if it's not a public device regression rules
> > > don't apply? Seems fairly reasonable.
> >
> > This is a hypothetical. This driver currently isn't changing anything
> > outside of itself. At this point the driver would only be build tested
> > by everyone else. They could just not include it in their Kconfig and
> > then out-of-sight, out-of-mind.
>
> Not changing does not mean not depending on existing behavior.
> Investigating and fixing properly even the hardest regressions in
> the stack is a bar that Meta can so easily clear. I don't understand
> why you are arguing.

I wasn't saying the driver wouldn't be dependent on existing behavior.
I was saying that it was a hypothetical that Meta would be a "less
than cooperative user" and demand a revert. It is also a hypothetical
that Linus wouldn't just propose a revert of the fbnic driver instead
of the API for the crime of being a "less than cooperative maintainer"
and then give Meta the Nvidia treatment.

> > > > Anyways we have zero ways to enforce such a policy. Have vendors
> > > > ship a NIC to somebody with the v0 of the patch set? Attach a picture?
> > >
> > > GenAI world, pictures mean nothing :) We do have a CI in netdev, which
> > > is all ready to ingest external results, and a (currently tiny amount?)
> > > of test for NICs. Prove that you care about the device by running the
> > > upstream tests and reporting results? Seems fairly reasonable.
> >
> > That seems like an opportunity to be exploited through. Are the
> > results going to be verified in any way? Maybe cryptographically
> > signed? Seems like it would be easy enough to fake the results.
>
> I think it's much easier to just run the tests than write a system
> which will competently lie. But even if we completely suspend trust,
> someone lying is of no cost to the community in this case.

I don't get this part. You are paranoid about bad actors until it
comes to accepting the test results? So write a broken API, "prove" it
works by running it on your broken test setup, and then get it
upstream after establishing a false base for trust. Seems like a
perfect setup for an exploit like what happened with xz.

> > > > Even if vendor X claims they will have a product in N months and
> > > > than only sells it to qualified customers what to do we do then.
> > > > Driver author could even believe the hardware will be available
> > > > when they post the driver, but business may change out of hands
> > > > of the developer.
> > > >
> > > > I'm 100% on letting this through assuming Alex is on top of feedback
> > > > and the code is good.
> > >
> > > I'd strongly prefer if we detach our trust and respect for Alex
> > > from whatever precedent we make here. I can't stress this enough.
> > > IDK if I'm exaggerating or it's hard to appreciate the challenges
> > > of maintainership without living it, but I really don't like being
> > > accused of playing favorites or big companies buying their way in :(
> >
> > Again, I would say we look at the blast radius. That is how we should
> > be measuring any change. At this point the driver is self contained
> > into /drivers/net/ethernet/meta/fbnic/. It isn't exporting anything
> > outside that directory, and it can be switched off via Kconfig.
>
> It is not practical to ponder every change case by case. Maintainers
> are overworked. How long until we send the uAPI patch for RSS on the
> flow label? I'd rather not re-litigate this every time someone posts
> a slightly different feature. Let's cover the obvious points from
> the beginning while everyone is paying attention. We can amend later
> as need be.

Isn't that what we are doing right now? We are basically refusing this
patch set not based on its own merits but on a "what-if" scenario for
a patch set that might come at some point in the future and conjecture
that somehow it is going to be able to add features for just itself
when we haven't allowed that in the past, for example with things
like GSO partial.

> > When the time comes to start adding new features we can probably start
> > by looking at how to add either generic offloads like was done for
> GSO, CSO, etc., or how it can also be implemented on another vendor's
> > NIC.
> >
> > At this point the only risk the driver presents is that it is yet
> > another driver, done in the same style I did the other Intel drivers,
> > and so any kernel API changes will end up needing to be applied to it
> > just like the other drivers.
>
> The risk is we'll have a fight every time there is a disagreement about
> the expectations.

We always do. I am not sure why you would expect that would change by
blocking this patch set. If anything it sounds like maybe we need to
document some requirements for availability for testing, and
expectations for what it would mean to be a one-off device where only
one entity has access to it. However that is a process problem, and
not so much a patch or driver issue.

> > > > I think any other policy would be very ugly to enforce, prove, and
> > > > even understand. Obviously code and architecture debates I'm all for.
> > > > Ensuring we have a trusted, experienced person signed up to review
> > > > code, address feedback, fix whatever syzbot finds and so on is also a
> > > > must I think. I'm sure Alex will take care of it.
> > >
> > > "Whatever syzbot finds" may be slightly moot for a private device ;)
> > > but otherwise 100%! These are exactly the kind of points I think we
> > > should enumerate. I started writing a list of expectations a while back:
> > >
> > > Documentation/maintainer/feature-and-driver-maintainers.rst
> > >
> > > I think we just need something like this, maybe just a step up, for
> > > non-public devices..
> >
> > I honestly think we are getting the cart ahead of the horse. When we
> > start talking about kernel API changes then we can probably get into
> > the whole "private" versus "publicly available" argument. A good
> > example of the kind of thing I am thinking of is GSO partial where I
> > ended up with Mellanox and Intel sending me 40G and 100G NICs and
> > cables to implement it on their devices as all I had was essentially
> > igb and ixgbe based NICs.
>
> That'd be great. Maybe even more than I'd expect. So why not write
> it down? In case the person doing the coding is not Alex Duyck, and
> just wants to get it done for their narrow use case, get a promo,
> go work on something else?

Write what down? That the vendors didn't like me harassing them to
test my code so they shipped me the NICs and asked me to just test it
myself?

That worked in my scenario as I had a server level system in my home
lab to make that work. We cannot expect everyone to have that kind of
setup for their own development. That is why I am considering the QEMU
approach as that might make this a bit more accessible. I could then
look at enabling the QEMU at the same time I enable the driver.

> > Odds are when we start getting to those kind of things maybe we need
> > to look at having a few systems available for developer use, but until
> > then I am not sure it makes sense to focus on if the device is
> > publicly available or not.
>
> Developer access would be huge.
> A mirage of developer access? immaterial :)

If nothing else, maybe we need a writeup somewhere of the level of
support a driver should expect from the Linux community if the device
is not "easily available". We could probably define that in terms of
what a reasonable expectation would be for a developer to have access
to it.

I would say that "commercial sales" is not a good metric and shouldn't
come into it. If anything it would be about device availability for
development and testing.

In addition I would be good with a definition of "support" for the
case of a device that isn't publicly available being quite limited, as
those with access would have to have active involvement in the
community to enable support. Without that it might as well be
considered orphaned and the driver dropped.


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-05 15:17                       ` Jason Gunthorpe
@ 2024-04-05 18:38                         ` Alexander Duyck
  2024-04-05 19:02                           ` Jason Gunthorpe
  2024-04-08 11:50                           ` Jiri Pirko
  0 siblings, 2 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-05 18:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Paolo Abeni, Jakub Kicinski, John Fastabend, Jiri Pirko, netdev,
	bhelgaas, linux-pci, Alexander Duyck, davem, Christoph Hellwig

On Fri, Apr 5, 2024 at 8:17 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
> > > Alex already indicated new features are coming, changes to the core
> > > code will be proposed. How should those be evaluated? Hypothetically
> > > should fbnic be allowed to be the first implementation of something
> > > invasive like Mina's DMABUF work? Google published an open userspace
> > > for NCCL that people can (in theory at least) actually run. Meta would
> > > not be able to do that. I would say that clearly crosses the line and
> > > should not be accepted.
> >
> > Why not? Just because we are not commercially selling it doesn't mean
> > we couldn't look at other solutions such as QEMU. If we were to
> > provide a github repo with an emulation of the NIC would that be
> > enough to satisfy the "commercial" requirement?
>
> My test is not "commercial", it is enabling open source ecosystem vs
> benefiting only proprietary software.

Sorry, that was where this started: Jiri was stating that we had
to be selling this.

> In my hypothetical you'd need to do something like open source Meta's
> implementation of the AI networking that the DMABUF patches enable,
> and even then since nobody could run it at performance the thing is
> pretty questionable.
>
> IMHO publishing a qemu chip emulator would not advance the open source
> ecosystem around building a DMABUF AI networking scheme.

Well not too many will be able to afford getting the types of systems
and hardware needed for this in the first place. Primarily just your
large data center companies can afford it.

I never said this hardware is about enabling DMABUF. You implied that.
The fact is that this driver is meant to be a pretty basic speeds and
feeds device. We support header split and network flow classification
so I suppose it could be used for DMABUF but by that logic so could a
number of other drivers.

> > > So I think there should be an expectation that technically sound things
> > > Meta may propose must not be accepted because they cross the
> > > ideological red line into enabling only proprietary software.
> >
> > That is a faulty argument. That is like saying we should kick the
> > nouveau driver out of Linux just because it supports Nvidia graphics
> > cards that happen to also have a proprietary out-of-tree driver out
> > there,
>
> Huh? nouveau supports a fully open source mesa graphics stack in
> Linux. How is that remotely similar to what I said? No issue.

Right, nouveau is fully open source. That is what I am trying to do
with fbnic. That is what I am getting at. This isn't connecting to
some proprietary stack or engaging in any sort of bypass. It is going
through the standard networking stack. If there were some other
out-of-tree driver for this to support some other use case how would
that impact the upstream patch submission?

This driver is being NAKed for enabling stuff that hasn't even been
presented. It is barely enough driver to handle PXE booting, which is
needed to be able to even load an OS on the system. Yet somehow,
because you are expecting a fork to come in at some point to support
DMABUF, you want to block it outright. How about, rather than doing
that, we wait until there is something objectionable before we start
speculating on what may be coming.

> You pointed at two things that I would consider to be exemplar open
> source projects and said their existance somehow means we should be
> purging drivers from the kernel???
>
> I really don't understand what you are trying to say at all.

I'm trying to say that both those projects are essentially doing the
same thing you are accusing fbnic of doing, even though I am exposing
no non-standard API(s) and everything is open source. You are
projecting onto this driver future changes that don't currently exist
and may never exist.

> The kernel standard is that good quality open source *does* exist, we
> tend to not care what proprietary things people create beyond that.

Now I am confused. You say you don't care what happens later, but you
seem to be insisting you care about what proprietary things will be
done with it after it is upstreamed.

> > I can't think of many NIC vendors that don't have their own
> > out-of-tree drivers floating around with their own kernel bypass
> > solutions to support proprietary software.
>
> Most of those are also open source, and we can't say much about what
> people do out of tree, obviously.

Isn't that exactly what you are doing though with all your
"proprietary" comments?

> > I agree. We need a consistent set of standards. I just strongly
> > believe commercial availability shouldn't be one of them.
>
> I never said commercial availability. I talked about open source vs
> proprietary userspace. This is very standard kernel stuff.
>
> You have an unavailable NIC, so we know it is only ever operated with
> Meta's proprietary kernel fork, supporting Meta's proprietary
> userspace software. Where exactly is the open source?

It depends on your definition of "unavailable". I could argue that
many of the Mellanox NICs also have limited availability, as they
aren't exactly easy to get hold of without paying a hefty ransom.

The NIC is currently available to developers within Meta. As such I
know no small number of kernel developers could get access to it if
they asked for a login to one of our test and development systems.
Also, I offered to provide the QEMU repo, but you said you had no
interest in that option.

> Why should someone working to improve only their proprietary
> environment be welcomed in the same way as someone working to improve
> the open source ecosystem? That has never been the kernel communities
> position.

To quote Linus `I do not see open source as some big goody-goody
"let's all sing kumbaya around the campfire and make the world a
better place". No, open source only really works if everybody is
contributing for their own selfish reasons.`[1]

How is us using our own NIC any different than if one of the vendors
were to make a NIC exclusively for us or any other large data center?
The only reason why this is coming up is because Meta is not a typical
NIC vendor but normally a consumer. The fact that we will be
dogfooding our own NIC seems to be at the heart of the issue here.

Haven't there been a number of maintainers who end up maintaining code
bases in the kernel for platforms and/or devices where they own one of
the few devices available in the world? How would this be any
different? Given enough time it is likely this will end up in the
hands of those outside Meta anyway; at that point the argument would
be moot.

> If you want to propose things to the kernel that can only be
> meaningfully used by your proprietary software then you should not
> expect to succeed. No one should be surprised to hear this.

If the whole idea is to get us to run a non-proprietary stack, nothing
sends the opposite message quite like telling us we cannot upstream a
simple network driver because of a "what if" about some DMABUF patch
set from Google. All I am asking for is the ability to net install a
system with this device. That requires the driver to be available in
the provisioning kernel image, which is why I am working to upstream
it; I would rather not have to maintain an out-of-tree kernel driver.

The argument here isn't about proprietary software. It is the
proprietary hardware that seems to be the issue, or at least that is
where it started. The driver itself anyone could load, build, or even
run on QEMU as I mentioned. It is open source and not exposing any new
APIs. The issue seems to be that the NIC can't be bought from a vendor
and that instead Meta is building the NIC for its own consumption.

As far as the software stack the concern about DMABUF seems like an
orthogonal argument that should be had at the userspace/API level and
doesn't directly relate to any specific driver. As has been pointed
out enabling anything like that wouldn't be a single NIC solution and
to be accepted upstream it should be implemented on at least 2
different vendor drivers.  Additionally, there isn't anything unique
about this hardware that would make it more capable of enabling that
than any other device.

Thanks,

- Alex

[1]: https://www.bbc.com/news/technology-18419231

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-05 18:38                         ` Alexander Duyck
@ 2024-04-05 19:02                           ` Jason Gunthorpe
  2024-04-06 16:05                             ` Alexander Duyck
  2024-04-08 11:50                           ` Jiri Pirko
  1 sibling, 1 reply; 163+ messages in thread
From: Jason Gunthorpe @ 2024-04-05 19:02 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Paolo Abeni, Jakub Kicinski, John Fastabend, Jiri Pirko, netdev,
	bhelgaas, linux-pci, Alexander Duyck, davem, Christoph Hellwig

On Fri, Apr 05, 2024 at 11:38:25AM -0700, Alexander Duyck wrote:

> > In my hypothetical you'd need to do something like open source Meta's
> > implementation of the AI networking that the DMABUF patches enable,
> > and even then since nobody could run it at performance the thing is
> > pretty questionable.
> >
> > IMHO publishing a qemu chip emulator would not advance the open source
> > ecosystem around building a DMABUF AI networking scheme.
> 
> Well not too many will be able to afford getting the types of systems
> and hardware needed for this in the first place. Primarily just your
> large data center companies can afford it.
> 
> I never said this hardware is about enabling DMABUF.

I presented a hypothetical to be able to illustrate a scenario where
this driver should not be used to justify invasive core kernel
changes.

I have no idea what future things you have in mind, or if any will
reach a threshold where I would expect they should not be
included. You were the one saying a key reason you wanted this driver
was to push core changes and you said you imagine changes that are
unique to fbnic that "others might like to follow".

I'm being very clear to say that there are some core changes that
should not be accepted due to the kernel's open source ideology.

> Right, nouveau is fully open source. That is what I am trying to do
> with fbnic. That is what I am getting at. This isn't connecting to
> some proprietary stack or engaging in any sort of bypass.

The basic driver presented here is not; a future driver that justifies
unknown changes to the core may be.

This is why my message was pretty clear. IMHO there is nothing wrong
with this series, but I do not expect you will get everything you want
in future due to this issue.

I said decide if you want to continue. I'm not NAKing anything on this
series.

> I'm trying to say that both those projects are essentially doing the
> same thing you are accusing fbnic of doing, 

Not even close. Both those projects support open source ecosystems and
have wide cross-vendor participation. fbnic isn't even going to be
enabled in any distribution.

> > You have an unavailable NIC, so we know it is only ever operated with
> > Meta's proprietary kernel fork, supporting Meta's proprietary
> > userspace software. Where exactly is the open source?
> 
> It depends on your definition of "unavailable". I could argue that
> many of the Mellanox NICs also have limited availability, as they
> aren't exactly easy to get hold of without paying a hefty ransom.

And GNICs that run Mina's series are completely unavailable right
now. That is still a big difference: a temporary issue versus a
permanent structural intention of the manufacturer.

> > Why should someone working to improve only their proprietary
> > environment be welcomed in the same way as someone working to improve
> > the open source ecosystem? That has never been the kernel communities
> > position.
> 
> To quote Linus `I do not see open source as some big goody-goody
> "let's all sing kumbaya around the campfire and make the world a
> better place". No, open source only really works if everybody is
> contributing for their own selfish reasons.`[1]

I think that stance has evolved and the consensus position toward uapi
is stronger.

> different. Given enough time it is likely this will end up in the
> hands of those outside Meta anyway, at that point the argument would
> be moot.

Oh, I'm skeptical about that.

You seem to have taken my original email in a strange direction. I
said this series was fine but cautioned that if you proceed you should
be expecting an eventual feature rejection for ideological reasons, and
gave a hypothetical example of what that would look like.

If you want to continue or not is up to Meta, in my view.

Jason

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 11/15] eth: fbnic: Enable Ethernet link setup
  2024-04-03 20:09 ` [net-next PATCH 11/15] eth: fbnic: Enable Ethernet link setup Alexander Duyck
  2024-04-03 21:11   ` Andrew Lunn
@ 2024-04-05 21:51   ` Andrew Lunn
  2024-04-21 23:21     ` Alexander Duyck
  1 sibling, 1 reply; 163+ messages in thread
From: Andrew Lunn @ 2024-04-05 21:51 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

> +#define FBNIC_CSR_START_PCS		0x10000 /* CSR section delimiter */
> +#define FBNIC_PCS_CONTROL1_0		0x10000		/* 0x40000 */
> +#define FBNIC_PCS_CONTROL1_RESET		CSR_BIT(15)
> +#define FBNIC_PCS_CONTROL1_LOOPBACK		CSR_BIT(14)
> +#define FBNIC_PCS_CONTROL1_SPEED_SELECT_ALWAYS	CSR_BIT(13)
> +#define FBNIC_PCS_CONTROL1_SPEED_ALWAYS		CSR_BIT(6)

This appears to be PCS control register 1, defined in 45.2.3.1. Since
this is a standard register, please add it to mdio.h.

> +#define FBNIC_PCS_VENDOR_VL_INTVL_0	0x10202		/* 0x40808 */

Could you explain how these registers map to 802.3 clause 45? Would
that be 3.1002? That would however put it in the reserved range 3.812
through 3.1799. The vendor range is 3.32768 through 3.65535. 

> +#define FBNIC_PCS_VL0_0_CHAN_0		0x10208		/* 0x40820 */
> +#define FBNIC_PCS_VL0_1_CHAN_0		0x10209		/* 0x40824 */
> +#define FBNIC_PCS_VL1_0_CHAN_0		0x1020a		/* 0x40828 */
> +#define FBNIC_PCS_VL1_1_CHAN_0		0x1020b		/* 0x4082c */
> +#define FBNIC_PCS_VL2_0_CHAN_0		0x1020c		/* 0x40830 */
> +#define FBNIC_PCS_VL2_1_CHAN_0		0x1020d		/* 0x40834 */
> +#define FBNIC_PCS_VL3_0_CHAN_0		0x1020e		/* 0x40838 */
> +#define FBNIC_PCS_VL3_1_CHAN_0		0x1020f		/* 0x4083c */
> +#define FBNIC_PCS_MODE_VL_CHAN_0	0x10210		/* 0x40840 */
> +#define FBNIC_PCS_MODE_HI_BER25			CSR_BIT(2)
> +#define FBNIC_PCS_MODE_DISABLE_MLD		CSR_BIT(1)
> +#define FBNIC_PCS_MODE_ENA_CLAUSE49		CSR_BIT(0)
> +#define FBNIC_PCS_CONTROL1_1		0x10400		/* 0x41000 */
> +#define FBNIC_PCS_VENDOR_VL_INTVL_1	0x10602		/* 0x41808 */
> +#define FBNIC_PCS_VL0_0_CHAN_1		0x10608		/* 0x41820 */
> +#define FBNIC_PCS_VL0_1_CHAN_1		0x10609		/* 0x41824 */
> +#define FBNIC_PCS_VL1_0_CHAN_1		0x1060a		/* 0x41828 */
> +#define FBNIC_PCS_VL1_1_CHAN_1		0x1060b		/* 0x4182c */
> +#define FBNIC_PCS_VL2_0_CHAN_1		0x1060c		/* 0x41830 */
> +#define FBNIC_PCS_VL2_1_CHAN_1		0x1060d		/* 0x41834 */
> +#define FBNIC_PCS_VL3_0_CHAN_1		0x1060e		/* 0x41838 */
> +#define FBNIC_PCS_VL3_1_CHAN_1		0x1060f		/* 0x4183c */
> +#define FBNIC_PCS_MODE_VL_CHAN_1	0x10610		/* 0x41840 */
> +#define FBNIC_CSR_END_PCS		0x10668 /* CSR section delimiter */
> +
> +#define FBNIC_CSR_START_RSFEC		0x10800 /* CSR section delimiter */
> +#define FBNIC_RSFEC_CONTROL(n)\
> +				(0x10800 + 8 * (n))	/* 0x42000 + 32*n */
> +#define FBNIC_RSFEC_CONTROL_AM16_COPY_DIS	CSR_BIT(3)
> +#define FBNIC_RSFEC_CONTROL_KP_ENABLE		CSR_BIT(8)
> +#define FBNIC_RSFEC_CONTROL_TC_PAD_ALTER	CSR_BIT(10)
> +#define FBNIC_RSFEC_MAX_LANES			4
> +#define FBNIC_RSFEC_CCW_LO(n) \
> +				(0x10802 + 8 * (n))	/* 0x42008 + 32*n */
> +#define FBNIC_RSFEC_CCW_HI(n) \
> +				(0x10803 + 8 * (n))	/* 0x4200c + 32*n */

Is this Corrected Code Words Lower/Upper? 1.202 and 1.203?

> +#define FBNIC_RSFEC_NCCW_LO(n) \
> +				(0x10804 + 8 * (n))	/* 0x42010 + 32*n */
> +#define FBNIC_RSFEC_NCCW_HI(n) \
> +				(0x10805 + 8 * (n))	/* 0x42014 + 32*n */

Which suggests this is Uncorrected Code Words? 1.204, 1.205? I guess
the N is for Not?

> +#define FBNIC_RSFEC_SYMBLERR_LO(n) \
> +				(0x10880 + 8 * (n))	/* 0x42200 + 32*n */
> +#define FBNIC_RSFEC_SYMBLERR_HI(n) \
> +				(0x10881 + 8 * (n))	/* 0x42204 + 32*n */

And these are symbol count errors, 1.210 and 1.211?

If there are other registers which follow 802.3 it would be good to
add them to mdio.h, so others can share them.

    Andrew

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-05 19:02                           ` Jason Gunthorpe
@ 2024-04-06 16:05                             ` Alexander Duyck
  2024-04-06 16:49                               ` Andrew Lunn
                                                 ` (2 more replies)
  0 siblings, 3 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-06 16:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Paolo Abeni, Jakub Kicinski, John Fastabend, Jiri Pirko, netdev,
	bhelgaas, linux-pci, Alexander Duyck, davem, Christoph Hellwig

On Fri, Apr 5, 2024 at 12:02 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Fri, Apr 05, 2024 at 11:38:25AM -0700, Alexander Duyck wrote:
>
> > > In my hypothetical you'd need to do something like open source Meta's
> > > implementation of the AI networking that the DMABUF patches enable,
> > > and even then since nobody could run it at performance the thing is
> > > pretty questionable.
> > >
> > > IMHO publishing a qemu chip emulator would not advance the open source
> > > ecosystem around building a DMABUF AI networking scheme.
> >
> > Well not too many will be able to afford getting the types of systems
> > and hardware needed for this in the first place. Primarily just your
> > large data center companies can afford it.
> >
> > I never said this hardware is about enabling DMABUF.
>
> I presented a hypothetical to be able to illustrate a scenario where
> this driver should not be used to justify invasive core kernel
> changes.
>
> I have no idea what future things you have in mind, or if any will
> reach a threshold where I would expect they should not be
> included. You were the one saying a key reason you wanted this driver
> was to push core changes and you said you imagine changes that are
> unique to fbnic that "others might like to follow".
>
> I'm being very clear to say that there are some core changes that
> should not be accepted due to the kernel's open source ideology.

Okay, on core changes I 100% agree. That is one of the reasons why we
have the whole thing about any feature really needing to be enabled on
at least 2 different vendor devices.

> > Right, nouveau is fully open source. That is what I am trying to do
> > with fbnic. That is what I am getting at. This isn't connecting to
> > some proprietary stack or engaging in any sort of bypass.
>
> The basic driver presented here is not; a future driver that justifies
> unknown changes to the core may be.
>
> This is why my message was pretty clear. IMHO there is nothing wrong
> with this series, but I do not expect you will get everything you want
> in future due to this issue.
>
> I said decide if you want to continue. I'm not NAKing anything on this
> series.

My apologies. I had interpreted your statement as essentially agreeing
with Jiri and NAKing the patches simply for not being commercially
available.

> > I'm trying to say that both those projects are essentially doing the
> > same thing you are accusing fbnic of doing,
>
> Not even close. Both those projects support open source ecosystems and
> have wide cross-vendor participation. fbnic isn't even going to be
> enabled in any distribution.

I can't say for certain if that will be the case going forward or not.
I know we haven't reached out to any distros and currently don't need
to. With that said though, having the driver available as a module
shouldn't cause any harm either so I don't really have a strong
opinion about it either way.

Seeing as we aren't in the NIC business, I am not sure how restrictive
we would be about licensing out the IP. I could see Meta potentially
trying to open up the specs just to take the manufacturing burden off
of us and enable more competition, though I am on the engineering side
and not sourcing, so I am just speculating.

> > > You have an unavailable NIC, so we know it is only ever operated with
> > > Meta's proprietary kernel fork, supporting Meta's proprietary
> > > userspace software. Where exactly is the open source?
> >
> > It depends on your definition of "unavailable". I could argue that
> > many of the Mellanox NICs also have limited availability, as they
> > aren't exactly easy to get hold of without paying a hefty ransom.
>
> And GNICs that run Mina's series are completely unavailable right
> now. That is still a big difference: a temporary issue versus a
> permanent structural intention of the manufacturer.

I'm assuming it is some sort of firmware functionality that is needed
to enable it? One thing with our design is that the firmware actually
has minimal functionality. Basically it is the liaison between the
BMC, Host, and the MAC. Otherwise it has no role to play in the
control path so when the driver is loaded it is running the show.

> > > Why should someone working to improve only their proprietary
> > > environment be welcomed in the same way as someone working to improve
> > > the open source ecosystem? That has never been the kernel communities
> > > position.
> >
> > To quote Linus `I do not see open source as some big goody-goody
> > "let's all sing kumbaya around the campfire and make the world a
> > better place". No, open source only really works if everybody is
> > contributing for their own selfish reasons.`[1]
>
> I think that stance has evolved and the consensus position toward uapi
> is stronger.

I assume you are talking about the need to have an open source
consumer for any exposed uapi? That makes sense from the test and
development standpoint as you need to have some way to exercise and
test any interface that is available in the kernel.

> > different. Given enough time it is likely this will end up in the
> > hands of those outside Meta anyway, at that point the argument would
> > be moot.
>
> Oh, I'm skeptical about that.

I didn't say how widely or when. I got my introduction to Linux by
buying used server systems and trying to get something maintainable in
terms of OS on them. From my experience you end up with all sorts of
odd-ball proprietary parts that eventually end up leaking out to the
public. Though back then it was more likely to be a proprietary spin
of some known silicon with a few tweaks that had to be accounted for.

> You seem to have taken my original email in a strange direction. I
> said this series was fine but cautioned that if you proceed you should
> be expecting an eventual feature rejection for ideological reasons, and
> gave a hypothetical example of what that would look like.
>
> If you want to continue or not is up to Meta, in my view.
>
> Jason

Yeah, I think I had misunderstood your original comment as being in
support of Jiri's position. I hadn't fully grokked that you were doing
more of a "yes, but" versus a "yes, and".

For this driver I don't see there being too many complications, as
there shouldn't be any sort of kernel bypass that would be applicable
to this driver uniquely.

Thanks,

- Alex

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-06 16:05                             ` Alexander Duyck
@ 2024-04-06 16:49                               ` Andrew Lunn
  2024-04-06 17:16                                 ` Alexander Duyck
  2024-04-08 15:04                               ` Jakub Kicinski
  2024-04-08 19:50                               ` Mina Almasry
  2 siblings, 1 reply; 163+ messages in thread
From: Andrew Lunn @ 2024-04-06 16:49 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jason Gunthorpe, Paolo Abeni, Jakub Kicinski, John Fastabend,
	Jiri Pirko, netdev, bhelgaas, linux-pci, Alexander Duyck, davem,
	Christoph Hellwig

> I'm assuming it is some sort of firmware functionality that is needed
> to enable it? One thing with our design is that the firmware actually
> has minimal functionality. Basically it is the liaison between the
> BMC, Host, and the MAC. Otherwise it has no role to play in the
> control path so when the driver is loaded it is running the show.

Which I personally feel is great. In an odd way, this to me indicates
this is a commodity product, or at least leading the way towards
commodity 100G products. Looking at the embedded SoC NIC market, which
is pretty much about commodity, few 1G Ethernet NICs have firmware.
Most 10G NICs also have no firmware. Linux is driving the hardware.

Much of the current Linux infrastructure is limited to 10G, because
currently everything faster than that hides away in firmware; Linux
does not get to drive it. This driver could help push Linux
controlling the hardware forward, to the benefit of us all. It would be
great if this driver used phylink to manage the PCS and the SFP cage,
that the PCS code is moved into drivers/net/pcs, etc. Assuming this
PCS follows the standards, it would be great to add helpers like we
have for clause 37, clause 73, to help support other future PCS
drivers which will appear. 100G in SoCs is probably not going to
appear too soon, but single channel 25G is probably the next step
after 10G. And what is added for this device will probably also work
for 25G. 40G via 4 channels is probably not too far away either.

Our Linux SFP driver is also currently limited to 10G. It would be
great if this driver could push that forwards to support faster SFP
cages and devices, support splitting and merging, etc.

None of this requires new kAPIs; they all already exist. There is
nothing controversial here. Everything follows standards. So if Meta
were to abandon the MAC driver, it would not matter; it's not dead
infrastructure code, future drivers would make use of it as this
technology becomes more and more commodity.

	Andrew

	   

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-05 14:01 ` Przemek Kitszel
@ 2024-04-06 16:53   ` Alexander Duyck
  0 siblings, 0 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-06 16:53 UTC (permalink / raw)
  To: Przemek Kitszel
  Cc: netdev, bhelgaas, linux-pci, Alexander Duyck, kuba, davem, pabeni

On Fri, Apr 5, 2024 at 7:01 AM Przemek Kitszel
<przemyslaw.kitszel@intel.com> wrote:
>
> On 4/3/24 22:08, Alexander Duyck wrote:
> > This patch set includes the necessary patches to enable basic Tx and Rx
> > over the Meta Platforms Host Network Interface. To do this we introduce a
> > new driver and directories in the form of
> > "drivers/net/ethernet/meta/fbnic".
> >
> > Due to submission limits the general plan is to submit a minimal driver for
> > now, almost equivalent to a UEFI driver in functionality, and then follow up
> > over the coming weeks enabling additional offloads and more features for
> > the device.
> >
> > The general plan is to look at adding support for ethtool, statistics, and
> > start work on offloads in the next set of patches.
> >
> > ---
> >
> > Alexander Duyck (15):
> >        PCI: Add Meta Platforms vendor ID
> >        eth: fbnic: add scaffolding for Meta's NIC driver
> >        eth: fbnic: Allocate core device specific structures and devlink interface
> >        eth: fbnic: Add register init to set PCIe/Ethernet device config
> >        eth: fbnic: add message parsing for FW messages
> >        eth: fbnic: add FW communication mechanism
> >        eth: fbnic: allocate a netdevice and napi vectors with queues
> >        eth: fbnic: implement Tx queue alloc/start/stop/free
> >        eth: fbnic: implement Rx queue alloc/start/stop/free
> >        eth: fbnic: Add initial messaging to notify FW of our presence
> >        eth: fbnic: Enable Ethernet link setup
> >        eth: fbnic: add basic Tx handling
> >        eth: fbnic: add basic Rx handling
> >        eth: fbnic: add L2 address programming
> >        eth: fbnic: write the TCAM tables used for RSS control and Rx to host
> >
> >
> >   MAINTAINERS                                   |    7 +
> >   drivers/net/ethernet/Kconfig                  |    1 +
> >   drivers/net/ethernet/Makefile                 |    1 +
> >   drivers/net/ethernet/meta/Kconfig             |   29 +
> >   drivers/net/ethernet/meta/Makefile            |    6 +
> >   drivers/net/ethernet/meta/fbnic/Makefile      |   18 +
> >   drivers/net/ethernet/meta/fbnic/fbnic.h       |  148 ++
> >   drivers/net/ethernet/meta/fbnic/fbnic_csr.h   |  912 ++++++++
> >   .../net/ethernet/meta/fbnic/fbnic_devlink.c   |   86 +
> >   .../net/ethernet/meta/fbnic/fbnic_drvinfo.h   |    5 +
> >   drivers/net/ethernet/meta/fbnic/fbnic_fw.c    |  823 ++++++++
> >   drivers/net/ethernet/meta/fbnic/fbnic_fw.h    |  133 ++
> >   drivers/net/ethernet/meta/fbnic/fbnic_irq.c   |  251 +++
> >   drivers/net/ethernet/meta/fbnic/fbnic_mac.c   | 1025 +++++++++
> >   drivers/net/ethernet/meta/fbnic/fbnic_mac.h   |   83 +
> >   .../net/ethernet/meta/fbnic/fbnic_netdev.c    |  470 +++++
> >   .../net/ethernet/meta/fbnic/fbnic_netdev.h    |   59 +
> >   drivers/net/ethernet/meta/fbnic/fbnic_pci.c   |  633 ++++++
> >   drivers/net/ethernet/meta/fbnic/fbnic_rpc.c   |  709 +++++++
> >   drivers/net/ethernet/meta/fbnic/fbnic_rpc.h   |  189 ++
> >   drivers/net/ethernet/meta/fbnic/fbnic_tlv.c   |  529 +++++
> >   drivers/net/ethernet/meta/fbnic/fbnic_tlv.h   |  175 ++
> >   drivers/net/ethernet/meta/fbnic/fbnic_txrx.c  | 1873 +++++++++++++++++
> >   drivers/net/ethernet/meta/fbnic/fbnic_txrx.h  |  125 ++
> >   include/linux/pci_ids.h                       |    2 +
> >   25 files changed, 8292 insertions(+)
>
> Even if this is just a basic scaffolding for what will come, it's hard
> to believe that no patch was co-developed, or should be marked as
> authored-by some other developer.
>
> [...]

I don't want to come across as snarky, but you must be new to Intel?
If nothing else you might ask a few people there about the history of
the fm10k drivers. I think I did most of the Linux and FreeBSD fm10k
drivers in about 2 to 3 years. Typically getting basic Tx and Rx up
and running on a driver only takes a few weeks, and it is pretty
straightforward when you are implementing the QEMU emulation at the
same time to test it on. From my experience driver development really
follows the Pareto principle, where getting basic Tx/Rx up and running
is usually a quick task. What takes forever is enabling all the other
offloads and functions.

As far as this driver goes I would say this is something similar, only
this time I have worked on a Linux and UEFI driver, both of which I am
hoping to get upstreamed. With that said I can go through the bits for
the yet to be upstreamed parts that weren't done by me.

We had tried to bring a few people onto the team early on, none of
whom are with the team anymore, but a couple are still with the
company so I can reach out to them and see if they are okay with the
Co-author attribution before I submit those patches. I have a few
people who worked on the PTP and debugfs support, one who enabled Tx
offloads and the ethtool ring configuration, and another who had just
started work on the UEFI driver before he left. In addition there was
an intern who did most of the work on the ethtool loopback test.

When the layoffs hit in late 2022 the team was basically reduced to
just myself and a firmware developer. Neither of us really strayed too
much into the other's code. Basically I defined the mailbox interface
and messages and went on our separate ways from there. In the last 6
months our team started hiring again. The new hires are currently
working in areas that aren't in this set such as devlink firmware
update, ethtool register dump, and various other interactions with the
firmware.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-06 16:49                               ` Andrew Lunn
@ 2024-04-06 17:16                                 ` Alexander Duyck
  0 siblings, 0 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-06 17:16 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jason Gunthorpe, Paolo Abeni, Jakub Kicinski, John Fastabend,
	Jiri Pirko, netdev, bhelgaas, linux-pci, Alexander Duyck, davem,
	Christoph Hellwig

On Sat, Apr 6, 2024 at 9:49 AM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > I'm assuming it is some sort of firmware functionality that is needed
> > to enable it? One thing with our design is that the firmware actually
> > has minimal functionality. Basically it is the liaison between the
> > BMC, Host, and the MAC. Otherwise it has no role to play in the
> > control path so when the driver is loaded it is running the show.
>
> Which I personally feel is great. In an odd way, this to me indicates
> this is a commodity product, or at least, leading the way towards
> commodity 100G products. Looking at the embedded SoC NIC market, which
> is pretty much about commodity, few 1G Ethernet NICs have firmware.
> Most 10G NICs also have no firmware. Linux is driving the hardware.
>
> Much of the current Linux infrastructure is limited to 10G, because
> currently everything faster than that hides away in firmware; Linux
> does not get to drive it. This driver could help push Linux
> controlling the hardware forward, to the benefit of us all. It would be
> great if this driver used phylink to manage the PCS and the SFP cage,
> that the PCS code is moved into drivers/net/pcs, etc. Assuming this
> PCS follows the standards, it would be great to add helpers like we
> have for clause 37, clause 73, to help support other future PCS
> drivers which will appear. 100G in SoCs is probably not going to
> appear too soon, but single channel 25G is probably the next step
> after 10G. And what is added for this device will probably also work
> for 25G. 40G via 4 channels is probably not too far away either.
>
> Our Linux SFP driver is also currently limited to 10G. It would be
> great if this driver could push that forwards to support faster SFP
> cages and devices, support splitting and merging, etc.
>
> None of this requires new kAPIs; they all already exist. There is
> nothing controversial here. Everything follows standards. So if Meta
> were to abandon the MAC driver, it would not matter: it's not dead
> infrastructure code, future drivers would make use of it, as this
> technology becomes more and more commodity.
>
>         Andrew

As far as the MAC/PCS code goes I will have to see what I can do. I
think I have to check with our sourcing team to figure out what
contracts are in place for whatever IP we are currently using before I
can share any additional info beyond the code here.

One other complication I can think of in terms of switching things
over as you have requested is that we will probably need to look at
splitting up the fbnic_mac.c file, as it is currently used for both the
UEFI driver and the Linux driver, so I will need a solution for the
UEFI driver, which wouldn't have the advantage of phylink.

Thanks,

- Alex


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-05 15:41                   ` Alexander Duyck
@ 2024-04-08  6:18                     ` Leon Romanovsky
  2024-04-08 15:26                       ` Alexander Duyck
  0 siblings, 1 reply; 163+ messages in thread
From: Leon Romanovsky @ 2024-04-08  6:18 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jakub Kicinski, John Fastabend, Jiri Pirko, netdev, bhelgaas,
	linux-pci, Alexander Duyck, davem, pabeni

On Fri, Apr 05, 2024 at 08:41:11AM -0700, Alexander Duyck wrote:
> On Thu, Apr 4, 2024 at 7:38 PM Jakub Kicinski <kuba@kernel.org> wrote:

<...>

> > > > Technical solution? Maybe if it's not a public device regression rules
> > > > don't apply? Seems fairly reasonable.
> > >
> > > This is a hypothetical. This driver currently isn't changing anything
> > > outside of itself. At this point the driver would only be build tested
> > > by everyone else. They could just not include it in their Kconfig and
> > > then out-of-sight, out-of-mind.
> >
> > Not changing does not mean not depending on existing behavior.
> > Investigating and fixing properly even the hardest regressions in
> > the stack is a bar that Meta can so easily clear. I don't understand
> > why you are arguing.
> 
> I wasn't saying the driver wouldn't be dependent on existing behavior.
> I was saying that it was a hypothetical that Meta would be a "less
> than cooperative user" and demand a revert.  It is also a hypothetical
> that Linus wouldn't just propose a revert of the fbnic driver instead
> of the API for the crime of being a "less than cooperative maintainer"
> and then give Meta the Nvidia treatment.

It is very easy to be a "less than cooperative maintainer" in the netdev world.
1. Be a vendor.
2. Propose ideas which are different.
3. Report a user-visible regression.
4. Ask for a fix from the patch author or demand a revert according to netdev rules/practice.

And voilà, you are a "less than cooperative maintainer".

So in reality, the "hypothetical" is very close to reality, unless
Meta's contribution is treated as a special case.

Thanks


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-04 19:22       ` Alexander Duyck
  2024-04-04 20:25         ` Jakub Kicinski
@ 2024-04-08 10:54         ` Jiri Pirko
  1 sibling, 0 replies; 163+ messages in thread
From: Jiri Pirko @ 2024-04-08 10:54 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, bhelgaas, linux-pci, Alexander Duyck, kuba, davem, pabeni

Thu, Apr 04, 2024 at 09:22:02PM CEST, alexander.duyck@gmail.com wrote:
>On Thu, Apr 4, 2024 at 8:36 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Thu, Apr 04, 2024 at 04:45:14PM CEST, alexander.duyck@gmail.com wrote:
>> >On Thu, Apr 4, 2024 at 4:37 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> >>
>> >> Wed, Apr 03, 2024 at 10:08:24PM CEST, alexander.duyck@gmail.com wrote:
>
><...>
>
>> >> Could you please shed some light for the motivation to introduce this
>> >> driver in the community kernel? Is this device something people can
>> >> obtain in a shop, or is it rather something to be seen in Meta
>> >> datacenter only? If the second is the case, why exactly would we need
>> >> this driver?
>> >
>> >For now this is Meta only. However there are several reasons for
>> >wanting to include this in the upstream kernel.
>> >
>> >First is the fact that, from a maintenance standpoint, it is easier to
>> >avoid drifting from the upstream APIs if we are in the kernel. It
>> >makes things much easier to maintain, as we can just pull in patches
>> >without having to craft backports around code that isn't already
>> >upstream.
>>
>> That is making life easier for you, making it harder for the community.
>> Zero relevance.
>>
>>
>> >
>> >Second is the fact that as we introduce new features with our driver
>> >it is much easier to show a proof of concept with the driver being in
>> >the kernel than not. It makes it much harder to work with the
>> >community on offloads and such if we don't have a good vehicle to use
>> >for that. What this driver will provide is an opportunity to push
>> >changes that would be beneficial to us, and likely the rest of the
>> >community without being constrained by what vendors decide they want
>> >to enable or not. The general idea is that if we can show benefit with
>> >our NIC then other vendors would be more likely to follow in our path.
>>
>> Yeah, so not only would we have to maintain a driver nobody (outside Meta)
>> uses or cares about, you say that we will likely maintain even more
>> dead code related to it. I think that in the Linux kernel there are many
>> examples of similarly dead code that causes a lot of headaches to
>> maintain.
>>
>> You just want to make your life easier here again. Don't drag community
>> into this please.
>
>The argument itself doesn't really hold water. The fact is the Meta
>data centers are not an insignificant consumer of Linux, so it isn't
>as if the driver isn't going to be used. This implies some lack of

Used by one user. Consider a person creating some custom proprietary
FPGA-based pet project for himself, trying to add a driver for it to the
mainline kernel. Why? Nobody else will ever see the device, so why
should the community be involved at all? It does not make sense. Keep
the driver for your internal cook-ups internal.


>good faith from Meta. I don't understand that as we are contributing
>across multiple areas in the kernel including networking and ebpf. Is
>Meta expected to start pulling time from our upstream maintainers to
>have them update out-of-tree kernel modules since the community isn't
>willing to let us maintain it in the kernel? Is the message that the

If Meta contributes whatever may be useful for somebody else, it is
completely fine. This driver is not useful for anyone, except Meta.


>kernel is expected to get value from Meta, but that value is not meant
>to be reciprocated? Would you really rather have us start maintaining
>our own internal kernel with our own "proprietary goodness", and ask

I don't care; maintain whatever you want internally. Totally up to you.
Just try to understand my POV. I may believe you have good faith and
everything. But still, I think that the community has to be selfish.


>other NIC vendors to have to maintain their drivers against yet
>another kernel if they want to be used in our data centers?
>
>As pointed out by Andrew, we aren't the first data center to push a
>driver for our own proprietary device. The fact is there have been

If your proprietary device is used by other people running virtual
machines on your systems, that is completely fine. But that is an
incorrect analogy for your NIC: no outside-Meta person will ever see it!


>drivers added for devices that were for purely emulated devices with
>no actual customers such as rocker. Should the switch vendors at the

This is a completely faulty analogy. Rocker was introduced to solve a
chicken-and-egg problem, to add switch device support to the kernel. It
served that purpose quite well. Let it rot now.



>time have pushed back on it stating it wasn't a real "for sale"
>device? The whole argument seems counter to what is expected. When a
>vendor creates a new device and will likely be enabling new kernel
>features my understanding is that it is better to be in the kernel
>than not.
>
>Putting a criteria on it that it must be "for sale" seems rather

Not "for sale", but "available to the outside person".


>arbitrary and capricious, especially given that most drivers have to

Not capricious at all; I'm sorry you feel that way. You pursue your
company's goals; my position here is to defend the community against
the unnecessary and pointless burden you are putting on it.


>be pushed out long before they are available in the market in order to
>meet deadlines to get the driver into OSV releases such as Redhat when
>it hits the market. By that logic should we block all future drivers
>until we can find them for sale somewhere? That way we don't run the

That is of course obviously a completely faulty analogy again. You never
plan to introduce your device to the public. Big difference. Don't you see it?


>risk of adding a vendor driver for a product that might be scrapped
>due to a last minute bug that will cause it to never be released.


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-04 21:59           ` John Fastabend
  2024-04-04 23:50             ` Jakub Kicinski
  2024-04-04 23:50             ` Alexander Duyck
@ 2024-04-08 11:05             ` Jiri Pirko
  2 siblings, 0 replies; 163+ messages in thread
From: Jiri Pirko @ 2024-04-08 11:05 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jakub Kicinski, Alexander Duyck, netdev, bhelgaas, linux-pci,
	Alexander Duyck, davem, pabeni

Thu, Apr 04, 2024 at 11:59:33PM CEST, john.fastabend@gmail.com wrote:
>Jakub Kicinski wrote:
>> On Thu, 4 Apr 2024 12:22:02 -0700 Alexander Duyck wrote:
>> > The argument itself doesn't really hold water. The fact is the Meta
>> > data centers are not an insignificant consumer of Linux, 
>> 
>> customer or beneficiary?
>> 
>> > so it isn't as if the driver isn't going to be used. This implies
>> > some lack of good faith from Meta.
>> 
>> "Good faith" is not a sufficient foundation for a community consisting
>> of volunteers, and commercial entities (with the xz debacle maybe even
>> less today than it was a month ago). As a maintainer I really don't want
>> to be in position of judging the "good faith" of corporate actors.
>> 
>> > I don't understand that as we are
>> > contributing across multiple areas in the kernel including networking
>> > and ebpf. Is Meta expected to start pulling time from our upstream
>> > maintainers to have them update out-of-tree kernel modules since the
>> > community isn't willing to let us maintain it in the kernel? Is the
>> > message that the kernel is expected to get value from Meta, but that
>> > value is not meant to be reciprocated? Would you really rather have
>> > us start maintaining our own internal kernel with our own
>> > "proprietary goodness", and ask other NIC vendors to have to maintain
>> > their drivers against yet another kernel if they want to be used in
>> > our data centers?
>> 
>> Please allow the community to make rational choices in the interest of
>> the project and more importantly the interest of its broader user base.
>> 
>> Google would also claim "good faith" -- undoubtedly it supports
>> the kernel, and lets some of its best engineers contribute.
>> Did that make them stop trying to build Fuchsia? The "good faith" of
>> companies operates within the margin of error of what they consider
>> rational and beneficial.
>> 
>> I don't want to put my thumb on the scale (yet?), but (with my
>> maintainer hat on) please don't use the "Meta is good" argument, because
>> someone will send a similar driver from a less involved company later on
>> and we'll be accused of playing favorites :( Plus companies can change
>> their approach to open source from "inclusive" to "extractive" 
>> (to borrow the economic terminology) rather quickly.
>> 
>
>I'll throw my $.02 in. In this case you have a driver that I have only
>scanned so far, but it looks well done. Alex has written lots of drivers;
>I trust he will not just abandon it. And if it does end up abandoned and
>no one supports it at some future point, we can deprecate it the same as
>any other driver in the networking tree. All the feedback is being
>answered and debate is happening, so I expect we will get a v2, v3 or so.
>All good signs in my view.
>
>Back to your point about faith in a company. I don't think we even need
>to care about a company's business plans. The author could have
>submitted with his personal address, for what it's worth, and called it
>drivers/alexware/duyck.o. A bit extreme, and I would have called him on
>it, but hopefully the point is clear.
>
>We have lots of drivers in the tree that are hard to physically get ahold
>of, or otherwise gated by paying some vendor for compute time, etc. to
>use. We even have some drivers where the hardware itself never made
>it out into the wild, or only a single customer used it before sellers
>burned it for commercial reasons, the hw wasn't workable, the team was
>cut, etc.
>
>I can't see how if I have a physical NIC for it on my desk here makes
>much difference one way or the other.
>
>The alternative is much worse: someone builds a team of engineers, locks
>them up, they build some interesting pieces, and we never get to see them
>because we tried to block someone from open-sourcing their driver?
>Eventually they need some kernel changes and then we block those too
>because we didn't allow the driver that was the use case? This seems
>wrong to me.
>
>Anyways we have zero ways to enforce such a policy. Have vendors
>ship a NIC to somebody with the v0 of the patch set? Attach a picture? 

Come on. Are you kidding? Isn't this case crystal clear?


>Even if vendor X claims they will have a product in N months and
>then only sells it to qualified customers, what do we do then?
>The driver author could even believe the hardware will be available
>when they post the driver, but the business may change out of the
>hands of the developer.
>
>I'm 100% on letting this through assuming Alex is on top of feedback
>and the code is good. I think any other policy would be very ugly
>to enforce, prove, and even understand. Obviously code and architecture
>debates I'm all for. Ensuring we have a trusted, experienced person
>signed up to review code, address feedback, fix whatever syzbot finds
>and so on is also a must I think. I'm sure Alex will take care of
>it.

You are for some reason making this submission very personal about Alex.
Just to be clear, this has nothing to do with Alex.


>
>Thanks,
>John


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-05  7:11                 ` Paolo Abeni
  2024-04-05 12:26                   ` Jason Gunthorpe
@ 2024-04-08 11:37                   ` Jiri Pirko
  1 sibling, 0 replies; 163+ messages in thread
From: Jiri Pirko @ 2024-04-08 11:37 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Alexander Duyck, Jakub Kicinski, John Fastabend, netdev,
	bhelgaas, linux-pci, Alexander Duyck, davem

Fri, Apr 05, 2024 at 09:11:19AM CEST, pabeni@redhat.com wrote:
>On Thu, 2024-04-04 at 17:11 -0700, Alexander Duyck wrote:
>> Again, I would say we look at the blast radius. That is how we should
>> be measuring any change. At this point the driver is self contained
>> into /drivers/net/ethernet/meta/fbnic/. It isn't exporting anything
>> outside that directory, and it can be switched off via Kconfig.
>
>I personally think this is the most relevant point. This is just a new
>NIC driver, completely self-encapsulated. I quickly glanced over the

What do you mean by "self contained/encapsulated"? You are not using
any API outside the driver? Every driver API change that this NIC
is going to use is a burden. I did my share of changes like that in
the past, so I have a pretty good notion of how painful it often is.


>code and it looks like it's not doing anything obviously bad. It really
>looks like an usual, legit, NIC driver.
>
>I don't think the fact that the NIC itself is hard to grasp for anyone

Distinguish "hard"/"impossible".


>outside <organization> makes a difference. A long time ago Greg noted
>that drivers have been merged for H/W known to have a _single_ existing
>instance (IIRC; I can't find the reference off the top of my head, but
>back then it was quite popular, I hope some other old-timer remembers).
>
>To me, the maintainership burden is on Meta: Alex/Meta will have to
>handle bug reports, breakages, and user complaints (I guess this last
>would be the easier part ;). If he/they do not cope with the process, we
>can simply revert the driver. I would be quite surprised if such a
>situation happened, but the impact from my PoV looks minimal.
>
>TL;DR: I don't see any good reason not to accept this - unless my quick
>glance was too quick and very wrong, but it looks like others have a
>similar view.

Do you actually see any good reason to accept this? I mean, really,
could you spell out at least one benefit it brings for a non-Meta user?
I see only gains for Meta and losses for the community.



* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-05 18:38                         ` Alexander Duyck
  2024-04-05 19:02                           ` Jason Gunthorpe
@ 2024-04-08 11:50                           ` Jiri Pirko
  2024-04-08 15:46                             ` Alexander Duyck
  1 sibling, 1 reply; 163+ messages in thread
From: Jiri Pirko @ 2024-04-08 11:50 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jason Gunthorpe, Paolo Abeni, Jakub Kicinski, John Fastabend,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem,
	Christoph Hellwig

Fri, Apr 05, 2024 at 08:38:25PM CEST, alexander.duyck@gmail.com wrote:
>On Fri, Apr 5, 2024 at 8:17 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>>
>> On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
>> > > Alex already indicated new features are coming, changes to the core
>> > > code will be proposed. How should those be evaluated? Hypothetically
>> > > should fbnic be allowed to be the first implementation of something
>> > > invasive like Mina's DMABUF work? Google published an open userspace
>> > > for NCCL that people can (in theory at least) actually run. Meta would
>> > > not be able to do that. I would say that clearly crosses the line and
>> > > should not be accepted.
>> >
>> > Why not? Just because we are not commercially selling it doesn't mean
>> > we couldn't look at other solutions such as QEMU. If we were to
>> > provide a github repo with an emulation of the NIC would that be
>> > enough to satisfy the "commercial" requirement?
>>
>> My test is not "commercial", it is enabling open source ecosystem vs
>> benefiting only proprietary software.
>
>Sorry, that was where this started where Jiri was stating that we had
>to be selling this.

For the record, I never wrote that. Not sure why you repeat this over
this thread.

And for the record, I don't share Jason's concern about proprietary
userspace. From what I see, whoever is consuming the KAPI is free to do
that however he pleases.

But, this is completely distant from my concerns about this driver.


[...]


>> > I agree. We need a consistent set of standards. I just strongly
>> > believe commercial availability shouldn't be one of them.
>>
>> I never said commercial availability. I talked about open source vs
>> proprietary userspace. This is very standard kernel stuff.
>>
>> You have an unavailable NIC, so we know it is only ever operated with
>> Meta's proprietary kernel fork, supporting Meta's proprietary
>> userspace software. Where exactly is the open source?
>
>It depends on your definition of "unavailable". I could argue that
>many of the Mellanox NICs also have limited availability, as
>they aren't exactly easy to get a hold of without paying a hefty
>ransom.

Sorry, but I have to say this is a ridiculous argument, really Alex.
Apples and oranges.

[...]



* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-06 16:05                             ` Alexander Duyck
  2024-04-06 16:49                               ` Andrew Lunn
@ 2024-04-08 15:04                               ` Jakub Kicinski
  2024-04-08 19:50                               ` Mina Almasry
  2 siblings, 0 replies; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-08 15:04 UTC (permalink / raw)
  To: netdev
  Cc: Alexander Duyck, Jason Gunthorpe, Paolo Abeni, John Fastabend,
	Jiri Pirko, bhelgaas, linux-pci, Alexander Duyck, davem,
	Christoph Hellwig

On Sat, 6 Apr 2024 09:05:01 -0700 Alexander Duyck wrote:
> > I'm being very clear to say that there are some core changes that
> > should not be accepted due to the kernel's open source ideology.
> 
> Okay, on core changes I 100% agree. That is one of the reasons why we
> have the whole thing about any feature really needing to be enabled on
> at least 2 different vendor devices.

The "2 different vendor"/implementation thing came up before so
I wanted to provide more context for the less initiated readers.
We try  to judge features in terms of how reasonable the APIs are,
overall system design and how easy they will be to modify later
(e.g. uAPI, depth of core changes).

Risks are usually more pronounced for stack features like GSO partial,
XDP or AF_XDP. Although my (faulty) memory is that we started with
just mlx4 for XDP and other drivers quickly followed. But we did not
wait for more than an ACK from other vendors.

We almost never have a second implementation for HW-heavy features.
TLS immediately comes to mind, and merging it was probably the right
call given how many implementations were added since. For "full" IPsec
offload we still only have one vendor. Existing TCP ZC Rx (page
flipping) was presented as possible with two NICs but mlx5 was hacked
up and still doesn't support real HDS.

Most (if not all) of recent uAPI we added in netdev netlink were
accepted with a single implementation (be it Intel's work + drivers
or my work, and I often provide just a bnxt implementation).

Summary -> the number of implementations we require is decided on a
case by case basis, depending on our level of confidence.

I don't know if this "2 implementations" rule is just a "mental
ellipsis" everyone understands to be a more nuanced rule in practice.
But to an outsider it would seem very selectively enforced. In the
worst case it is a fake rule that gives us an excuse to nack stuff.


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-08  6:18                     ` Leon Romanovsky
@ 2024-04-08 15:26                       ` Alexander Duyck
  2024-04-08 18:41                         ` Leon Romanovsky
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-08 15:26 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jakub Kicinski, John Fastabend, Jiri Pirko, netdev, bhelgaas,
	linux-pci, Alexander Duyck, davem, pabeni

On Sun, Apr 7, 2024 at 11:18 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Fri, Apr 05, 2024 at 08:41:11AM -0700, Alexander Duyck wrote:
> > On Thu, Apr 4, 2024 at 7:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> <...>
>
> > > > > Technical solution? Maybe if it's not a public device regression rules
> > > > > don't apply? Seems fairly reasonable.
> > > >
> > > > This is a hypothetical. This driver currently isn't changing anything
> > > > outside of itself. At this point the driver would only be build tested
> > > > by everyone else. They could just not include it in their Kconfig and
> > > > then out-of-sight, out-of-mind.
> > >
> > > Not changing does not mean not depending on existing behavior.
> > > Investigating and fixing properly even the hardest regressions in
> > > the stack is a bar that Meta can so easily clear. I don't understand
> > > why you are arguing.
> >
> > I wasn't saying the driver wouldn't be dependent on existing behavior.
> > I was saying that it was a hypothetical that Meta would be a "less
> > than cooperative user" and demand a revert.  It is also a hypothetical
> > that Linus wouldn't just propose a revert of the fbnic driver instead
> > of the API for the crime of being a "less than cooperative maintainer"
> > and then give Meta the Nvidia treatment.
>
> It is very easy to be a "less than cooperative maintainer" in the netdev world.
> 1. Be a vendor.
> 2. Propose ideas which are different.
> 3. Report a user-visible regression.
> 4. Ask for a fix from the patch author or demand a revert according to netdev rules/practice.
>
> And voilà, you are a "less than cooperative maintainer".
>
> So in reality, the "hypothetical" is very close to reality, unless
> Meta's contribution is treated as a special case.
>
> Thanks

How many cases of that have we had in the past? I'm honestly curious,
as I don't actually have any reference.

Also, as far as item 3 goes, isn't it hard for there to be a "user-visible"
regression if there are no users outside of the vendor that is
maintaining the driver to report it? Again, I am assuming that the same
rules wouldn't necessarily apply in the case where the vendor and
consumer are one entity.

Also, from my past experience, the community doesn't give a damn about
1. It is only if 3 is being reported by actual users that somebody
would care. The fact is, if vendors held that much power they would
have run roughshod over the community long ago. I know there are
vendors who love to provide one-off projects outside of the kernel and
usually have to work to get things into upstream later, and no
amount of complaining about "the users" will get their code accepted.
The users may complain, but it is the vendor's fault, so the
community doesn't have to take action.


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-08 11:50                           ` Jiri Pirko
@ 2024-04-08 15:46                             ` Alexander Duyck
  2024-04-08 16:51                               ` Jiri Pirko
  2024-04-08 18:16                               ` Jason Gunthorpe
  0 siblings, 2 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-08 15:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jason Gunthorpe, Paolo Abeni, Jakub Kicinski, John Fastabend,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem,
	Christoph Hellwig

On Mon, Apr 8, 2024 at 4:51 AM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Fri, Apr 05, 2024 at 08:38:25PM CEST, alexander.duyck@gmail.com wrote:
> >On Fri, Apr 5, 2024 at 8:17 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >>
> >> On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
> >> > > Alex already indicated new features are coming, changes to the core
> >> > > code will be proposed. How should those be evaluated? Hypothetically
> >> > > should fbnic be allowed to be the first implementation of something
> >> > > invasive like Mina's DMABUF work? Google published an open userspace
> >> > > for NCCL that people can (in theory at least) actually run. Meta would
> >> > > not be able to do that. I would say that clearly crosses the line and
> >> > > should not be accepted.
> >> >
> >> > Why not? Just because we are not commercially selling it doesn't mean
> >> > we couldn't look at other solutions such as QEMU. If we were to
> >> > provide a github repo with an emulation of the NIC would that be
> >> > enough to satisfy the "commercial" requirement?
> >>
> >> My test is not "commercial", it is enabling open source ecosystem vs
> >> benefiting only proprietary software.
> >
> >Sorry, that was where this started where Jiri was stating that we had
> >to be selling this.
>
> For the record, I never wrote that. Not sure why you repeat this over
> this thread.

Because you seem to be implying that the Meta NIC driver shouldn't be
included simply because it isn't going to be available outside of Meta.
The fact is Meta employs a number of kernel developers, and as a result
there will be a number of kernel developers who will have access to
this NIC and will likely do development on systems containing it.
In addition, simply due to the size of the datacenters that we will be
populating, there is actually a strong likelihood that there will be
more instances of this NIC running on Linux than there are of some
other vendor devices that have been allowed to have drivers in the
kernel.

So from what I can tell the only difference is whether we are
manufacturing this for sale or for personal use. That is why I mention
"commercial", since the only difference from my perspective is the fact
that we are making it for our own use instead of selling it.

[...]

> >> > I agree. We need a consistent set of standards. I just strongly
> >> > believe commercial availability shouldn't be one of them.
> >>
> >> I never said commercial availability. I talked about open source vs
> >> proprietary userspace. This is very standard kernel stuff.
> >>
> >> You have an unavailable NIC, so we know it is only ever operated with
> >> Meta's proprietary kernel fork, supporting Meta's proprietary
> >> userspace software. Where exactly is the open source?
> >
> >It depends on your definition of "unavailable". I could argue that
> >many of the Mellanox NICs also have limited availability, as
> >they aren't exactly easy to get a hold of without paying a hefty
> >ransom.
>
> Sorry, but I have to say this is a ridiculous argument, really Alex.
> Apples and oranges.

Really? So would you be making the same argument if it were
Nvidia/Mellanox pushing the driver and they were making it exclusively
for Meta, Google, or some other big cloud provider? I suspect not. If
nothing else, they likely wouldn't disclose the plan for exclusive
sales to get around this sort of thing. The fact is I know many of the
vendors make proprietary spins of their firmware and hardware for
specific customers. The way I see it, this patchset is being rejected
because I was too honest about the general plan and use case for it.

This is what I am getting at. It just seems like we are playing games
with semantics where if it is a vendor making the arrangement then it
is okay for them to make hardware that is inaccessible to most, but if
it is Meta then somehow it isn't.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-08 15:46                             ` Alexander Duyck
@ 2024-04-08 16:51                               ` Jiri Pirko
  2024-04-08 17:32                                 ` John Fastabend
  2024-04-08 21:36                                 ` Florian Fainelli
  2024-04-08 18:16                               ` Jason Gunthorpe
  1 sibling, 2 replies; 163+ messages in thread
From: Jiri Pirko @ 2024-04-08 16:51 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jason Gunthorpe, Paolo Abeni, Jakub Kicinski, John Fastabend,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem,
	Christoph Hellwig

Mon, Apr 08, 2024 at 05:46:35PM CEST, alexander.duyck@gmail.com wrote:
>On Mon, Apr 8, 2024 at 4:51 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Fri, Apr 05, 2024 at 08:38:25PM CEST, alexander.duyck@gmail.com wrote:
>> >On Fri, Apr 5, 2024 at 8:17 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>> >>
>> >> On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
>> >> > > Alex already indicated new features are coming, changes to the core
>> >> > > code will be proposed. How should those be evaluated? Hypothetically
>> >> > > should fbnic be allowed to be the first implementation of something
>> >> > > invasive like Mina's DMABUF work? Google published an open userspace
>> >> > > for NCCL that people can (in theory at least) actually run. Meta would
>> >> > > not be able to do that. I would say that clearly crosses the line and
>> >> > > should not be accepted.
>> >> >
>> >> > Why not? Just because we are not commercially selling it doesn't mean
>> >> > we couldn't look at other solutions such as QEMU. If we were to
>> >> > provide a github repo with an emulation of the NIC would that be
>> >> > enough to satisfy the "commercial" requirement?
>> >>
>> >> My test is not "commercial", it is enabling open source ecosystem vs
>> >> benefiting only proprietary software.
>> >
>> >Sorry, that was where this started where Jiri was stating that we had
>> >to be selling this.
>>
>> For the record, I never wrote that. Not sure why you repeat this over
>> this thread.
>
>Because you seem to be implying that the Meta NIC driver shouldn't be
>included simply since it isn't going to be available outside of Meta.
>The fact is Meta employs a number of kernel developers and as a result
>of that there will be a number of kernel developers that will have
>access to this NIC and likely do development on systems containing it.
>In addition simply due to the size of the datacenters that we will be
>populating there is actually a strong likelihood that there will be
>more instances of this NIC running on Linux than there are of some
>other vendor devices that have been allowed to have drivers in the
>kernel.

So? The gain for the community is still 0, no matter how many instances
of private hw you privately have. Just have a private driver.


>
>So from what I can tell the only difference is if we are manufacturing
>this for sale, or for personal use. Thus why I mention "commercial"
>since the only difference from my perspective is the fact that we are
>making it for our own use instead of selling it.

Give it for free.


>
>[...]
>
>> >> > I agree. We need a consistent set of standards. I just strongly
>> >> > believe commercial availability shouldn't be one of them.
>> >>
>> >> I never said commercial availability. I talked about open source vs
>> >> proprietary userspace. This is very standard kernel stuff.
>> >>
>> >> You have an unavailable NIC, so we know it is only ever operated with
>> >> Meta's proprietary kernel fork, supporting Meta's proprietary
>> >> userspace software. Where exactly is the open source?
>> >
>> >It depends on your definition of "unavailable". I could argue that
>> >many of the Mellanox NICs also have limited availability, as they
>> >aren't exactly easy to get a hold of without paying a hefty ransom.
>>
>> Sorry, but I have to say this is ridiculous argument, really Alex.
>> Apples and oranges.
>
>Really? So would you be making the same argument if it was
>Nvidia/Mellanox pushing the driver and they were exclusively making it
>just for Meta, Google, or some other big cloud provider? I suspect

Heh, what ifs :) Anyway, chance that happens is very close to 0.


>not. If nothing else they likely wouldn't disclose the plan for
>exclusive sales to get around this sort of thing. The fact is I know
>many of the vendors make proprietary spins of their firmware and
>hardware for specific customers. The way I see it this patchset is
>being rejected as I was too honest about the general plan and use case
>for it.
>
>This is what I am getting at. It just seems like we are playing games
>with semantics where if it is a vendor making the arrangement then it
>is okay for them to make hardware that is inaccessible to most, but if
>it is Meta then somehow it isn't.


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-08 16:51                               ` Jiri Pirko
@ 2024-04-08 17:32                                 ` John Fastabend
  2024-04-09 11:01                                   ` Jiri Pirko
  2024-04-08 21:36                                 ` Florian Fainelli
  1 sibling, 1 reply; 163+ messages in thread
From: John Fastabend @ 2024-04-08 17:32 UTC (permalink / raw)
  To: Jiri Pirko, Alexander Duyck
  Cc: Jason Gunthorpe, Paolo Abeni, Jakub Kicinski, John Fastabend,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem,
	Christoph Hellwig

Jiri Pirko wrote:
> Mon, Apr 08, 2024 at 05:46:35PM CEST, alexander.duyck@gmail.com wrote:
> >On Mon, Apr 8, 2024 at 4:51 AM Jiri Pirko <jiri@resnulli.us> wrote:
> >>
> >> Fri, Apr 05, 2024 at 08:38:25PM CEST, alexander.duyck@gmail.com wrote:
> >> >On Fri, Apr 5, 2024 at 8:17 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >> >>
> >> >> On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
> >> >> > > Alex already indicated new features are coming, changes to the core
> >> >> > > code will be proposed. How should those be evaluated? Hypothetically
> >> >> > > should fbnic be allowed to be the first implementation of something
> >> >> > > invasive like Mina's DMABUF work? Google published an open userspace
> >> >> > > for NCCL that people can (in theory at least) actually run. Meta would
> >> >> > > not be able to do that. I would say that clearly crosses the line and
> >> >> > > should not be accepted.
> >> >> >
> >> >> > Why not? Just because we are not commercially selling it doesn't mean
> >> >> > we couldn't look at other solutions such as QEMU. If we were to
> >> >> > provide a github repo with an emulation of the NIC would that be
> >> >> > enough to satisfy the "commercial" requirement?
> >> >>
> >> >> My test is not "commercial", it is enabling open source ecosystem vs
> >> >> benefiting only proprietary software.
> >> >
> >> >Sorry, that was where this started where Jiri was stating that we had
> >> >to be selling this.
> >>
> >> For the record, I never wrote that. Not sure why you repeat this over
> >> this thread.
> >
> >Because you seem to be implying that the Meta NIC driver shouldn't be
> >included simply since it isn't going to be available outside of Meta.
> >The fact is Meta employs a number of kernel developers and as a result
> >of that there will be a number of kernel developers that will have
> >access to this NIC and likely do development on systems containing it.
> >In addition simply due to the size of the datacenters that we will be
> >populating there is actually a strong likelihood that there will be
> >more instances of this NIC running on Linux than there are of some
> >other vendor devices that have been allowed to have drivers in the
> >kernel.
> 
> So? The gain for the community is still 0, no matter how many instances
> of private hw you privately have. Just have a private driver.

The gain is the same as if company X makes a card and sells it
exclusively to datacenter provider Y. We know this happens.
Vendors would happily spin up a NIC if a DC with scale like this
would pay for it. They just don't advertise it in patch 0/X,
"adding device for cloud provider foo".

There is no difference here. We gain developers, we gain insights and
learnings, and Linux and OSS drivers are running on another big DC.
They improve things, find bugs, and upstream the fixes; it's a win.

The opposite is also true: if we exclude a driver/NIC HW that is
running on major DCs, we lose a lot of insight, experience, and value.
DCs are all starting to build their own hardware; if we lose this
section of HW we lose those developers too. We are less likely to get
any advances they come up with. I think you have it backwards.
Eventually Linux networking becomes commodity and irrelevant for DC
deployments.

So I strongly disagree: we lose by excluding drivers and win by
bringing them in.

> 
> 
> >
> >So from what I can tell the only difference is if we are manufacturing
> >this for sale, or for personal use. Thus why I mention "commercial"
> >since the only difference from my perspective is the fact that we are
> >making it for our own use instead of selling it.
> 
> Give it for free.

Huh?

> 
> 
> >
> >[...]
> >
> >> >> > I agree. We need a consistent set of standards. I just strongly
> >> >> > believe commercial availability shouldn't be one of them.
> >> >>
> >> >> I never said commercial availability. I talked about open source vs
> >> >> proprietary userspace. This is very standard kernel stuff.
> >> >>
> >> >> You have an unavailable NIC, so we know it is only ever operated with
> >> >> Meta's proprietary kernel fork, supporting Meta's proprietary
> >> >> userspace software. Where exactly is the open source?
> >> >
> >> >It depends on your definition of "unavailable". I could argue that
> >> >many of the Mellanox NICs also have limited availability, as they
> >> >aren't exactly easy to get a hold of without paying a hefty ransom.
> >>
> >> Sorry, but I have to say this is ridiculous argument, really Alex.
> >> Apples and oranges.
> >
> >Really? So would you be making the same argument if it was
> >Nvidia/Mellanox pushing the driver and they were exclusively making it
> >just for Meta, Google, or some other big cloud provider? I suspect
> 
> Heh, what ifs :) Anyway, chance that happens is very close to 0.
> 
> 
> >not. If nothing else they likely wouldn't disclose the plan for
> >exclusive sales to get around this sort of thing. The fact is I know
> >many of the vendors make proprietary spins of their firmware and
> >hardware for specific customers. The way I see it this patchset is
> >being rejected as I was too honest about the general plan and use case
> >for it.
> >
> >This is what I am getting at. It just seems like we are playing games
> >with semantics where if it is a vendor making the arrangement then it
> >is okay for them to make hardware that is inaccessible to most, but if
> >it is Meta then somehow it isn't.




* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-08 15:46                             ` Alexander Duyck
  2024-04-08 16:51                               ` Jiri Pirko
@ 2024-04-08 18:16                               ` Jason Gunthorpe
  1 sibling, 0 replies; 163+ messages in thread
From: Jason Gunthorpe @ 2024-04-08 18:16 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jiri Pirko, Paolo Abeni, Jakub Kicinski, John Fastabend, netdev,
	bhelgaas, linux-pci, Alexander Duyck, davem, Christoph Hellwig

On Mon, Apr 08, 2024 at 08:46:35AM -0700, Alexander Duyck wrote:

> Really? So would you be making the same argument if it was
> Nvidia/Mellanox pushing the driver and they were exclusively making it
> just for Meta, Google, or some other big cloud provider? 

At least I would, yes.

> I suspect not. If nothing else they likely wouldn't disclose the
> plan for exclusive sales to get around this sort of thing. The fact
> is I know many of the vendors make proprietary spins of their
> firmware and hardware for specific customers. The way I see it this
> patchset is being rejected as I was too honest about the general
> plan and use case for it.

Regrettably this does happen quietly in the kernel. If you know the
right behind the scenes stuff you can start to be aware. That doesn't
mean it is aligned with community values or should be done/encouraged.

> This is what I am getting at. It just seems like we are playing games
> with semantics where if it is a vendor making the arrangement then it
> is okay for them to make hardware that is inaccessible to most, but if
> it is Meta then somehow it isn't.

With Meta it is obvious what is happening, and what is benefiting. If
a COTS vendor does it then we have to take a leap of faith that a
unique feature will have wider applications - and many would require
seeing an open source userspace to bootstrap that. I don't think we
always get it right. Value judgements are often a bit murky like that.

Jason


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-08 15:26                       ` Alexander Duyck
@ 2024-04-08 18:41                         ` Leon Romanovsky
  2024-04-08 20:43                           ` Alexander Duyck
  0 siblings, 1 reply; 163+ messages in thread
From: Leon Romanovsky @ 2024-04-08 18:41 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jakub Kicinski, John Fastabend, Jiri Pirko, netdev, bhelgaas,
	linux-pci, Alexander Duyck, davem, pabeni

On Mon, Apr 08, 2024 at 08:26:33AM -0700, Alexander Duyck wrote:
> On Sun, Apr 7, 2024 at 11:18 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Fri, Apr 05, 2024 at 08:41:11AM -0700, Alexander Duyck wrote:
> > > On Thu, Apr 4, 2024 at 7:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > <...>
> >
> > > > > > Technical solution? Maybe if it's not a public device regression rules
> > > > > > don't apply? Seems fairly reasonable.
> > > > >
> > > > > This is a hypothetical. This driver currently isn't changing anything
> > > > > outside of itself. At this point the driver would only be build tested
> > > > > by everyone else. They could just not include it in their Kconfig and
> > > > > then out-of-sight, out-of-mind.
> > > >
> > > > Not changing does not mean not depending on existing behavior.
> > > > Investigating and fixing properly even the hardest regressions in
> > > > the stack is a bar that Meta can so easily clear. I don't understand
> > > > why you are arguing.
> > >
> > > I wasn't saying the driver wouldn't be dependent on existing behavior.
> > > I was saying that it was a hypothetical that Meta would be a "less
> > > than cooperative user" and demand a revert.  It is also a hypothetical
> > > that Linus wouldn't just propose a revert of the fbnic driver instead
> > > of the API for the crime of being a "less than cooperative maintainer"
> > > and  then give Meta the Nvidia treatment.
> >
> > It is very easy to be "less than cooperative maintainer" in netdev world.
> > 1. Be vendor.
> > 2. Propose ideas which are different.
> > 3. Report for user-visible regression.
> > 4. Ask for a fix from the patch author or demand a revert according to netdev rules/practice.
> >
> > And voilà, you are "less than cooperative maintainer".
> >
> > So in reality, the "hypothetical" is very close to the reality, unless
> > Meta contribution will be treated as a special case.
> >
> > Thanks
> 
> How many cases of that have we had in the past? I'm honestly curious
> as I don't actually have any reference.

And this is the problem: you don't have "any reference" or accurate
knowledge of what happened, but you are saying "less than cooperative
maintainer".

> 
> Also, as far as item 3, isn't it hard for it to be a "user-visible"
> regression if there are no users outside of the vendor that is
> maintaining the driver to report it?

This wasn't the case. It was a change in core code, which broke a
specific version of Vagrant. The vendor caught it simply by luck.

> Again I am assuming that the same rules wouldn't necessarily apply
> in the vendor/consumer being one entity case.
> 
> Also from my past experience the community doesn't give a damn about
> 1. It is only if 3 is being reported by actual users that somebody
> would care. The fact is if vendors held that much power they would
> have run roughshod over the community long ago as I know there are
> vendors who love to provide one-off projects outside of the kernel and
> usually have to work to get things into the upstream later and no
> amount of complaining about "the users" will get their code accepted.
> The users may complain but it is the vendors fault for that so the
> community doesn't have to take action.

You are taking it to completely wrong direction with your assumptions.
The reality is that regression was reported by real user without any
vendor code involved. This is why the end result was so bad for all parties.

So no, you can get "less than cooperative maintainer" label really easy in
current environment.

Thanks


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-06 16:05                             ` Alexander Duyck
  2024-04-06 16:49                               ` Andrew Lunn
  2024-04-08 15:04                               ` Jakub Kicinski
@ 2024-04-08 19:50                               ` Mina Almasry
  2 siblings, 0 replies; 163+ messages in thread
From: Mina Almasry @ 2024-04-08 19:50 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jason Gunthorpe, Paolo Abeni, Jakub Kicinski, John Fastabend,
	Jiri Pirko, netdev, bhelgaas, linux-pci, Alexander Duyck, davem,
	Christoph Hellwig, Eric Dumazet, Willem de Bruijn

On Sat, Apr 6, 2024 at 9:05 AM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> > > > You have an unavailable NIC, so we know it is only ever operated with
> > > > Meta's proprietary kernel fork, supporting Meta's proprietary
> > > > userspace software. Where exactly is the open source?
> > >
> > > It depends on your definition of "unavailable". I could argue that
> > > many of the Mellanox NICs also have limited availability, as they
> > > aren't exactly easy to get a hold of without paying a hefty ransom.
> >
> > And GNIC's that run Mina's series are completely unavailable right
> > now. That is still a big different from a temporary issue to a
> > permanent structural intention of the manufacturer.
>
> I'm assuming it is some sort of firmware functionality that is needed
> to enable it? One thing with our design is that the firmware actually
> has minimal functionality. Basically it is the liaison between the
> BMC, Host, and the MAC. Otherwise it has no role to play in the
> control path so when the driver is loaded it is running the show.
>

Sorry, I didn't realize our devmem TCP work was mentioned in this
context. Just jumping in to say, no, this is not the case, devmem TCP
does not require firmware functionality AFAICT. The selftest provided
with the devmem TCP series should work on any driver that:

1. supports header split/flow steering/rss/page pool (I guess this
support may need firmware changes...).

2. supports the new queue configuration ndos:
https://patchwork.kernel.org/project/netdevbpf/patch/20240403002053.2376017-2-almasrymina@google.com/

3. supports the new netmem page_pool APIs:
https://patchwork.kernel.org/project/netdevbpf/patch/20240403002053.2376017-8-almasrymina@google.com/

No firmware changes specific to devmem TCP are needed, AFAICT. All
these are driver changes. I also always publish a full branch with all
the GVE changes so reviewers can check if there is anything too
specific to GVE that we're doing; so far there have been no issues, and
to be honest I can't see anything specific that we do with GVE for
devmem TCP:

https://github.com/mina/linux/commits/tcpdevmem-v7/

In fact, GVE is IMO a relatively feature light driver, and the fact
that GVE can do devmem TCP IMO makes it easier for fancier NICs to
also do devmem TCP.

I'm working with folks interested in extending devmem TCP to their
drivers, and they may follow up with patches after the series is
merged (or before). The only reason I haven't implemented devmem TCP
for multiple different drivers is a logistical one. I don't have
access to hardware that supports all these prerequisite features other
than GVE.

-- 
Thanks,
Mina


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-08 18:41                         ` Leon Romanovsky
@ 2024-04-08 20:43                           ` Alexander Duyck
  2024-04-08 21:49                             ` Florian Fainelli
  2024-04-09  8:18                             ` Leon Romanovsky
  0 siblings, 2 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-08 20:43 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jakub Kicinski, John Fastabend, Jiri Pirko, netdev, bhelgaas,
	linux-pci, Alexander Duyck, davem, pabeni

On Mon, Apr 8, 2024 at 11:41 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Apr 08, 2024 at 08:26:33AM -0700, Alexander Duyck wrote:
> > On Sun, Apr 7, 2024 at 11:18 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Fri, Apr 05, 2024 at 08:41:11AM -0700, Alexander Duyck wrote:
> > > > On Thu, Apr 4, 2024 at 7:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > <...>
> > >
> > > > > > > Technical solution? Maybe if it's not a public device regression rules
> > > > > > > don't apply? Seems fairly reasonable.
> > > > > >
> > > > > > This is a hypothetical. This driver currently isn't changing anything
> > > > > > outside of itself. At this point the driver would only be build tested
> > > > > > by everyone else. They could just not include it in their Kconfig and
> > > > > > then out-of-sight, out-of-mind.
> > > > >
> > > > > Not changing does not mean not depending on existing behavior.
> > > > > Investigating and fixing properly even the hardest regressions in
> > > > > the stack is a bar that Meta can so easily clear. I don't understand
> > > > > why you are arguing.
> > > >
> > > > I wasn't saying the driver wouldn't be dependent on existing behavior.
> > > > I was saying that it was a hypothetical that Meta would be a "less
> > > > than cooperative user" and demand a revert.  It is also a hypothetical
> > > > that Linus wouldn't just propose a revert of the fbnic driver instead
> > > > of the API for the crime of being a "less than cooperative maintainer"
> > > > and  then give Meta the Nvidia treatment.
> > >
> > > It is very easy to be "less than cooperative maintainer" in netdev world.
> > > 1. Be vendor.
> > > 2. Propose ideas which are different.
> > > 3. Report for user-visible regression.
> > > 4. Ask for a fix from the patch author or demand a revert according to netdev rules/practice.
> > >
> > > And voilà, you are "less than cooperative maintainer".
> > >
> > > So in reality, the "hypothetical" is very close to the reality, unless
> > > Meta contribution will be treated as a special case.
> > >
> > > Thanks
> >
> > How many cases of that have we had in the past? I'm honestly curious
> > as I don't actually have any reference.
>
> And this is the problem: you don't have "any reference" or accurate
> knowledge of what happened, but you are saying "less than cooperative
> maintainer".

By "less than cooperative maintainer" I was referring to the scenario
where somebody is maintaining something unique to them, such as the
Meta Host NIC, and not willing to work with the community to fix it
and instead just demanding a revert of a change. It doesn't seem like
it would be too much to ask to work with the author on a fix for the
problem as long as the maintainer is willing to work with the author
on putting together and testing the fix.

With that said, if the upstream version of things isn't broken then it
doesn't matter. It shouldn't be expected of the community to maintain
any proprietary code that wasn't accepted upstream.

> >
> > Also, as far as item 3, isn't it hard for it to be a "user-visible"
> > regression if there are no users outside of the vendor that is
> > maintaining the driver to report it?
>
> This wasn't the case. It was a change in core code, which broke a
> specific version of Vagrant. The vendor caught it simply by luck.

Any more info on this? Without context it is hard to say one way or the other.

I know I have seen my fair share of hot issues such as when the
introduction of the tracing framework was corrupting the NVRAM on
e1000e NICs.[1] It got everyone's attention when it essentially
bricked one of Linus's systems. I don't recall us doing a full revert
on function tracing as a result, but I believe it was flagged as
broken until it could be resolved. So depending on the situation there
are cases where asking for a fix or revert might be appropriate.

> > Again I am assuming that the same rules wouldn't necessarily apply
> > in the vendor/consumer being one entity case.
> >
> > Also from my past experience the community doesn't give a damn about
> > 1. It is only if 3 is being reported by actual users that somebody
> > would care. The fact is if vendors held that much power they would
> > have run roughshod over the community long ago as I know there are
> > vendors who love to provide one-off projects outside of the kernel and
> > usually have to work to get things into the upstream later and no
> > amount of complaining about "the users" will get their code accepted.
> > The users may complain but it is the vendors fault for that so the
> > community doesn't have to take action.
>
> You are taking it to completely wrong direction with your assumptions.
> The reality is that regression was reported by real user without any
> vendor code involved. This is why the end result was so bad for all parties.

Okay, but that doesn't tie into what is going on here. In this case
"vendor" == "user". Like I was saying the community generally cares
about the user so 3 would be the important case assuming they are
using a stock kernel and driver and not hiding behind the vendor
expecting some sort of proprietary fix. If they are using some
proprietary stuff behind the scenes, then tough luck.

> So no, you can get "less than cooperative maintainer" label really easy in
> current environment.

I didn't say you couldn't. Without context I cannot say if it was
deserved or not. I know in the example I cited above Intel had to add
changes to the e1000e driver to make the NVRAM non-writable until the
problem patch was found. So Intel was having to patch to fix an issue
it didn't introduce and deal with the negative press and blow-back
from a function tracing patch that was damaging NICs.

The point I was trying to make is that if you are the only owner of
something, and not willing to work with the community as a maintainer,
it becomes much easier for the community to just revert the driver
than to try to change the code. Thus the "less than cooperative" part.
The argument being made seems to be that once something is in the
kernel it is there forever, and that if we get it in and then refuse
to work with the community it couldn't be reverted. I am arguing that
isn't the case, especially if Meta were to become a "less than
cooperative maintainer" for a device that is primarily only going to
be available in Meta data centers.

Thanks,

- Alex

[1]: https://lwn.net/Articles/304105/


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-08 16:51                               ` Jiri Pirko
  2024-04-08 17:32                                 ` John Fastabend
@ 2024-04-08 21:36                                 ` Florian Fainelli
  2024-04-09 10:56                                   ` Jiri Pirko
  1 sibling, 1 reply; 163+ messages in thread
From: Florian Fainelli @ 2024-04-08 21:36 UTC (permalink / raw)
  To: Jiri Pirko, Alexander Duyck
  Cc: Jason Gunthorpe, Paolo Abeni, Jakub Kicinski, John Fastabend,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem,
	Christoph Hellwig

On 4/8/24 09:51, Jiri Pirko wrote:
> Mon, Apr 08, 2024 at 05:46:35PM CEST, alexander.duyck@gmail.com wrote:
>> On Mon, Apr 8, 2024 at 4:51 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>>
>>> Fri, Apr 05, 2024 at 08:38:25PM CEST, alexander.duyck@gmail.com wrote:
>>>> On Fri, Apr 5, 2024 at 8:17 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>>>>>
>>>>> On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
>>>>>>> Alex already indicated new features are coming, changes to the core
>>>>>>> code will be proposed. How should those be evaluated? Hypothetically
>>>>>>> should fbnic be allowed to be the first implementation of something
>>>>>>> invasive like Mina's DMABUF work? Google published an open userspace
>>>>>>> for NCCL that people can (in theory at least) actually run. Meta would
>>>>>>> not be able to do that. I would say that clearly crosses the line and
>>>>>>> should not be accepted.
>>>>>>
>>>>>> Why not? Just because we are not commercially selling it doesn't mean
>>>>>> we couldn't look at other solutions such as QEMU. If we were to
>>>>>> provide a github repo with an emulation of the NIC would that be
>>>>>> enough to satisfy the "commercial" requirement?
>>>>>
>>>>> My test is not "commercial", it is enabling open source ecosystem vs
>>>>> benefiting only proprietary software.
>>>>
>>>> Sorry, that was where this started where Jiri was stating that we had
>>>> to be selling this.
>>>
>>> For the record, I never wrote that. Not sure why you repeat this over
>>> this thread.
>>
>> Because you seem to be implying that the Meta NIC driver shouldn't be
>> included simply since it isn't going to be available outside of Meta.
>> The fact is Meta employs a number of kernel developers and as a result
>> of that there will be a number of kernel developers that will have
>> access to this NIC and likely do development on systems containing it.
>> In addition simply due to the size of the datacenters that we will be
>> populating there is actually a strong likelihood that there will be
>> more instances of this NIC running on Linux than there are of some
>> other vendor devices that have been allowed to have drivers in the
>> kernel.
> 
> So? The gain for the community is still 0, no matter how many instances
> of private hw you privately have. Just have a private driver.

I am amazed and not in a good way at how far this has gone, truly.

This really is akin to saying that any non-zero driver count to maintain 
is a burden on the community. Which is true, by definition, but if the 
goal was to build something for no users, then clearly this is the wrong 
place to be in, or too late. The systems with no users are the best to 
maintain, that is for sure.

If the practical concern is that when you make a tree-wide API change 
that fbnic happens to use, you have yet another driver (fbnic) to 
convert, so what? Work with Alex ahead of time, get his driver modified, 
post the patch series. Even if Alex happens to move on and stops being 
responsible and there is no maintainer, so what? Give the driver a 
deprecation window for someone to step in, rip it out, end of story. 
Nothing new, so what specifically has changed as of April 4th, 2024 to 
warrant such a strong rejection?

As was said, there are tons of drivers in the Linux kernel that have a 
single user; this one might have a few more than that, and that should 
be good enough.

What the heck is going on?
-- 
Florian


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-08 20:43                           ` Alexander Duyck
@ 2024-04-08 21:49                             ` Florian Fainelli
  2024-04-08 21:52                               ` Florian Fainelli
  2024-04-09  8:18                             ` Leon Romanovsky
  1 sibling, 1 reply; 163+ messages in thread
From: Florian Fainelli @ 2024-04-08 21:49 UTC (permalink / raw)
  To: Alexander Duyck, Leon Romanovsky
  Cc: Jakub Kicinski, John Fastabend, Jiri Pirko, netdev, bhelgaas,
	linux-pci, Alexander Duyck, davem, pabeni

On 4/8/24 13:43, Alexander Duyck wrote:
>>>
>>> Also as far as item 3 isn't hard for it to be a "user-visible"
>>> regression if there are no users outside of the vendor that is
>>> maintaining the driver to report it?
>>
>> This wasn't the case. It was change in core code, which broke specific
>> version of vagrant. Vendor caught it simply by luck.
> 
> Any more info on this? Without context it is hard to say one way or the other.

Believe this is the thread in question:

https://lore.kernel.org/netdev/MN2PR12MB44863139E562A59329E89DBEB982A@MN2PR12MB4486.namprd12.prod.outlook.com/
-- 
Florian


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-08 21:49                             ` Florian Fainelli
@ 2024-04-08 21:52                               ` Florian Fainelli
  0 siblings, 0 replies; 163+ messages in thread
From: Florian Fainelli @ 2024-04-08 21:52 UTC (permalink / raw)
  To: Alexander Duyck, Leon Romanovsky
  Cc: Jakub Kicinski, John Fastabend, Jiri Pirko, netdev, bhelgaas,
	linux-pci, Alexander Duyck, davem, pabeni

On 4/8/24 14:49, Florian Fainelli wrote:
> On 4/8/24 13:43, Alexander Duyck wrote:
>>>>
>>>> Also as far as item 3 isn't hard for it to be a "user-visible"
>>>> regression if there are no users outside of the vendor that is
>>>> maintaining the driver to report it?
>>>
>>> This wasn't the case. It was change in core code, which broke specific
>>> version of vagrant. Vendor caught it simply by luck.
>>
>> Any more info on this? Without context it is hard to say one way or 
>> the other.
> 
> Believe this is the thread in question:
> 
> https://lore.kernel.org/netdev/MN2PR12MB44863139E562A59329E89DBEB982A@MN2PR12MB4486.namprd12.prod.outlook.com/

And the follow up:

https://lore.kernel.org/netdev/14459261ea9f9c7d7dfb28eb004ce8734fa83ade.1704185904.git.leonro@nvidia.com/
-- 
Florian


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-08 20:43                           ` Alexander Duyck
  2024-04-08 21:49                             ` Florian Fainelli
@ 2024-04-09  8:18                             ` Leon Romanovsky
  2024-04-09 14:43                               ` Alexander Duyck
  1 sibling, 1 reply; 163+ messages in thread
From: Leon Romanovsky @ 2024-04-09  8:18 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jakub Kicinski, John Fastabend, Jiri Pirko, netdev, bhelgaas,
	linux-pci, Alexander Duyck, davem, pabeni

On Mon, Apr 08, 2024 at 01:43:28PM -0700, Alexander Duyck wrote:
> On Mon, Apr 8, 2024 at 11:41 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Mon, Apr 08, 2024 at 08:26:33AM -0700, Alexander Duyck wrote:
> > > On Sun, Apr 7, 2024 at 11:18 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Fri, Apr 05, 2024 at 08:41:11AM -0700, Alexander Duyck wrote:
> > > > > On Thu, Apr 4, 2024 at 7:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > >
> > > > <...>
> > > >
> > > > > > > > Technical solution? Maybe if it's not a public device regression rules
> > > > > > > > don't apply? Seems fairly reasonable.
> > > > > > >
> > > > > > > This is a hypothetical. This driver currently isn't changing anything
> > > > > > > outside of itself. At this point the driver would only be build tested
> > > > > > > by everyone else. They could just not include it in their Kconfig and
> > > > > > > then out-of-sight, out-of-mind.
> > > > > >
> > > > > > Not changing does not mean not depending on existing behavior.
> > > > > > Investigating and fixing properly even the hardest regressions in
> > > > > > the stack is a bar that Meta can so easily clear. I don't understand
> > > > > > why you are arguing.
> > > > >
> > > > > I wasn't saying the driver wouldn't be dependent on existing behavior.
> > > > > I was saying that it was a hypothetical that Meta would be a "less
> > > > > than cooperative user" and demand a revert.  It is also a hypothetical
> > > > > that Linus wouldn't just propose a revert of the fbnic driver instead
> > > > > of the API for the crime of being a "less than cooperative maintainer"
> > > > > and  then give Meta the Nvidia treatment.
> > > >
> > > > It is very easy to be "less than cooperative maintainer" in netdev world.
> > > > 1. Be vendor.
> > > > 2. Propose ideas which are different.
> > > > 3. Report for user-visible regression.
> > > > 4. Ask for a fix from the patch author or demand a revert according to netdev rules/practice.
> > > >
> > > > And voilà, you are "less than cooperative maintainer".
> > > >
> > > > So in reality, the "hypothetical" is very close to the reality, unless
> > > > Meta contribution will be treated as a special case.
> > > >
> > > > Thanks
> > >
> > > How many cases of that have we had in the past? I'm honestly curious
> > > as I don't actually have any reference.
> >
> > And this is the problem, you don't have "any reference" and accurate
> > knowledge what happened, but you are saying "less than cooperative
> > maintainer".

<...>

> Any more info on this? Without context it is hard to say one way or the other.

<...>

> I didn't say you couldn't. Without context I cannot say if it was
> deserved or not. 

Florian gave links to the context, so I'll skip this part.

In this thread, Jakub tried to revive the discussion about it.
https://lore.kernel.org/netdev/20240326133412.47cf6d99@kernel.org/

<...>

> The point I was trying to make is that if you are the only owner of
> something, and not willing to work with the community as a maintainer

Like Jakub, I don't understand why you are talking about regressions in
the driver, when you brought up the label of "less than cooperative
maintainer" and asked to "give Meta the Nvidia treatment".

I don't want to get into the discussion about if this driver should be
accepted or not.

I'm just asking you to stop labeling people and companies based on
descriptions from other people, and to rely on facts instead.

Thanks

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-08 21:36                                 ` Florian Fainelli
@ 2024-04-09 10:56                                   ` Jiri Pirko
  2024-04-09 13:05                                     ` Florian Fainelli
  0 siblings, 1 reply; 163+ messages in thread
From: Jiri Pirko @ 2024-04-09 10:56 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Alexander Duyck, Jason Gunthorpe, Paolo Abeni, Jakub Kicinski,
	John Fastabend, netdev, bhelgaas, linux-pci, Alexander Duyck,
	davem, Christoph Hellwig

Mon, Apr 08, 2024 at 11:36:42PM CEST, f.fainelli@gmail.com wrote:
>On 4/8/24 09:51, Jiri Pirko wrote:
>> Mon, Apr 08, 2024 at 05:46:35PM CEST, alexander.duyck@gmail.com wrote:
>> > On Mon, Apr 8, 2024 at 4:51 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> > > 
>> > > Fri, Apr 05, 2024 at 08:38:25PM CEST, alexander.duyck@gmail.com wrote:
>> > > > On Fri, Apr 5, 2024 at 8:17 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>> > > > > 
>> > > > > On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
>> > > > > > > Alex already indicated new features are coming, changes to the core
>> > > > > > > code will be proposed. How should those be evaluated? Hypothetically
>> > > > > > > should fbnic be allowed to be the first implementation of something
>> > > > > > > invasive like Mina's DMABUF work? Google published an open userspace
>> > > > > > > for NCCL that people can (in theory at least) actually run. Meta would
>> > > > > > > not be able to do that. I would say that clearly crosses the line and
>> > > > > > > should not be accepted.
>> > > > > > 
>> > > > > > Why not? Just because we are not commercially selling it doesn't mean
>> > > > > > we couldn't look at other solutions such as QEMU. If we were to
>> > > > > > provide a github repo with an emulation of the NIC would that be
>> > > > > > enough to satisfy the "commercial" requirement?
>> > > > > 
>> > > > > My test is not "commercial", it is enabling open source ecosystem vs
>> > > > > benefiting only proprietary software.
>> > > > 
>> > > > Sorry, that was where this started where Jiri was stating that we had
>> > > > to be selling this.
>> > > 
>> > > For the record, I never wrote that. Not sure why you repeat this over
>> > > this thread.
>> > 
>> > Because you seem to be implying that the Meta NIC driver shouldn't be
>> > included simply since it isn't going to be available outside of Meta.
>> > The fact is Meta employs a number of kernel developers and as a result
>> > of that there will be a number of kernel developers that will have
>> > access to this NIC and likely do development on systems containing it.
>> > In addition simply due to the size of the datacenters that we will be
>> > populating there is actually a strong likelihood that there will be
>> > more instances of this NIC running on Linux than there are of some
>> > other vendor devices that have been allowed to have drivers in the
>> > kernel.
>> 
>> So? The gain for community is still 0. No matter how many instances is
>> private hw you privately have. Just have a private driver.
>
>I am amazed and not in a good way at how far this has gone, truly.
>
>This really is akin to saying that any non-zero driver count to maintain is a
>burden on the community. Which is true, by definition, but if the goal was to
>build something for no users, then clearly this is the wrong place to be in,
>or too late. The systems with no users are the best to maintain, that is for
>sure.
>
>If the practical concern is wen you make tree wide API change that fbnic
>happens to use, and you have yet another driver (fbnic) to convert, so what?
>Work with Alex ahead of time, get his driver to be modified, post the patch
>series. Even if Alex happens to move on and stop being responsible and there
>is no maintainer, so what? Give the driver a depreciation window for someone
>to step in, rip it, end of story. Nothing new, so what has specifically
>changed as of April 4th 2024 to oppose such strong rejection?

The way you describe the flow of an internal API change is totally
distant from reality. Really, no part of it is correct:
1) An API change is the responsibility of the person doing it. Imagine
   working with 40 driver maintainers for every API change. I did my
   share of API changes in the past; maintainers were only involved to
   be cc'ed.
2) Deprecating a driver because the maintainer is not responsible. Can
   you please show me one example of that happening in the past?


>
>Like it was said, there are tons of drivers in the Linux kernel that have a
>single user, this one might have a few more than a single one, that should be
>good enough.

This one will have exactly 0. That is my point. Why merge something
nobody will ever use?


>
>What the heck is going on?
>-- 
>Florian
>

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-08 17:32                                 ` John Fastabend
@ 2024-04-09 11:01                                   ` Jiri Pirko
  2024-04-09 13:11                                     ` Alexander Lobakin
  0 siblings, 1 reply; 163+ messages in thread
From: Jiri Pirko @ 2024-04-09 11:01 UTC (permalink / raw)
  To: John Fastabend
  Cc: Alexander Duyck, Jason Gunthorpe, Paolo Abeni, Jakub Kicinski,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem,
	Christoph Hellwig

Mon, Apr 08, 2024 at 07:32:59PM CEST, john.fastabend@gmail.com wrote:
>Jiri Pirko wrote:
>> Mon, Apr 08, 2024 at 05:46:35PM CEST, alexander.duyck@gmail.com wrote:
>> >On Mon, Apr 8, 2024 at 4:51 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> >>
>> >> Fri, Apr 05, 2024 at 08:38:25PM CEST, alexander.duyck@gmail.com wrote:
>> >> >On Fri, Apr 5, 2024 at 8:17 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>> >> >>
>> >> >> On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
>> >> >> > > Alex already indicated new features are coming, changes to the core
>> >> >> > > code will be proposed. How should those be evaluated? Hypothetically
>> >> >> > > should fbnic be allowed to be the first implementation of something
>> >> >> > > invasive like Mina's DMABUF work? Google published an open userspace
>> >> >> > > for NCCL that people can (in theory at least) actually run. Meta would
>> >> >> > > not be able to do that. I would say that clearly crosses the line and
>> >> >> > > should not be accepted.
>> >> >> >
>> >> >> > Why not? Just because we are not commercially selling it doesn't mean
>> >> >> > we couldn't look at other solutions such as QEMU. If we were to
>> >> >> > provide a github repo with an emulation of the NIC would that be
>> >> >> > enough to satisfy the "commercial" requirement?
>> >> >>
>> >> >> My test is not "commercial", it is enabling open source ecosystem vs
>> >> >> benefiting only proprietary software.
>> >> >
>> >> >Sorry, that was where this started where Jiri was stating that we had
>> >> >to be selling this.
>> >>
>> >> For the record, I never wrote that. Not sure why you repeat this over
>> >> this thread.
>> >
>> >Because you seem to be implying that the Meta NIC driver shouldn't be
>> >included simply since it isn't going to be available outside of Meta.
>> >The fact is Meta employs a number of kernel developers and as a result
>> >of that there will be a number of kernel developers that will have
>> >access to this NIC and likely do development on systems containing it.
>> >In addition simply due to the size of the datacenters that we will be
>> >populating there is actually a strong likelihood that there will be
>> >more instances of this NIC running on Linux than there are of some
>> >other vendor devices that have been allowed to have drivers in the
>> >kernel.
>> 
>> So? The gain for community is still 0. No matter how many instances is
>> private hw you privately have. Just have a private driver.
>
>The gain is the same as if company X makes a card and sells it
>exclusively to datacenter provider Y. We know this happens.

Different story. The driver is still the same. Perhaps only some parts
of it are tailored to fit one person's need, maybe. But here, the whole
thing is obviously targeted at one person. Can't you see how different
in scale these are?


>Vendors would happily spin up a NIC if a DC with scale like this
>would pay for it. They just don't advertise it in patch 0/X,
>"adding device for cloud provider foo".
>
>There is no difference here. We gain developers, we gain insights,
>learnings and Linux and OSS drivers are running on another big
>DC. They improve things and find bugs they upstream them its a win.
>
>The opposite is also true if we exclude a driver/NIC HW that is
>running on major DCs we lose a lot of insight, experience, value.

Could you please describe in detail, with examples, what exactly we
are about to lose? I don't see it.


>DCs are all starting to build their own hardware if we lose this
>section of HW we lose those developers too. We are less likely
>to get any advances they come up with. I think you have it backwards.
>Eventually Linux networking becomes either commodity and irrelevant
>for DC deployments.
>
>So I strongly disagree we lose by excluding drivers and win by
>bringing it in.
>
>> 
>> 
>> >
>> >So from what I can tell the only difference is if we are manufacturing
>> >this for sale, or for personal use. Thus why I mention "commercial"
>> >since the only difference from my perspective is the fact that we are
>> >making it for our own use instead of selling it.
>> 
>> Give it for free.
>
>Huh?
>
>> 
>> 
>> >
>> >[...]
>> >
>> >> >> > I agree. We need a consistent set of standards. I just strongly
>> >> >> > believe commercial availability shouldn't be one of them.
>> >> >>
>> >> >> I never said commercial availability. I talked about open source vs
>> >> >> proprietary userspace. This is very standard kernel stuff.
>> >> >>
>> >> >> You have an unavailable NIC, so we know it is only ever operated with
>> >> >> Meta's proprietary kernel fork, supporting Meta's proprietary
>> >> >> userspace software. Where exactly is the open source?
>> >> >
>> >> >It depends on your definition of "unavailable". I could argue that for
>> >> >many most of the Mellanox NICs are also have limited availability as
>> >> >they aren't exactly easy to get a hold of without paying a hefty
>> >> >ransom.
>> >>
>> >> Sorry, but I have to say this is ridiculous argument, really Alex.
>> >> Apples and oranges.
>> >
>> >Really? So would you be making the same argument if it was
>> >Nvidia/Mellanox pushing the driver and they were exclusively making it
>> >just for Meta, Google, or some other big cloud provider? I suspect
>> 
>> Heh, what ifs :) Anyway, chance that happens is very close to 0.
>> 
>> 
>> >not. If nothing else they likely wouldn't disclose the plan for
>> >exclusive sales to get around this sort of thing. The fact is I know
>> >many of the vendors make proprietary spins of their firmware and
>> >hardware for specific customers. The way I see it this patchset is
>> >being rejected as I was too honest about the general plan and use case
>> >for it.
>> >
>> >This is what I am getting at. It just seems like we are playing games
>> >with semantics where if it is a vendor making the arrangement then it
>> >is okay for them to make hardware that is inaccessible to most, but if
>> >it is Meta then somehow it isn't.
>
>

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-03 20:09 ` [net-next PATCH 13/15] eth: fbnic: add basic Rx handling Alexander Duyck
@ 2024-04-09 11:47   ` Yunsheng Lin
  2024-04-09 15:08     ` Alexander Duyck
  0 siblings, 1 reply; 163+ messages in thread
From: Yunsheng Lin @ 2024-04-09 11:47 UTC (permalink / raw)
  To: Alexander Duyck, netdev; +Cc: Alexander Duyck, kuba, davem, pabeni

On 2024/4/4 4:09, Alexander Duyck wrote:
> From: Alexander Duyck <alexanderduyck@fb.com>

...

> +
> +static int fbnic_clean_rcq(struct fbnic_napi_vector *nv,
> +			   struct fbnic_q_triad *qt, int budget)
> +{
> +	struct fbnic_ring *rcq = &qt->cmpl;
> +	struct fbnic_pkt_buff *pkt;
> +	s32 head0 = -1, head1 = -1;
> +	__le64 *raw_rcd, done;
> +	u32 head = rcq->head;
> +	u64 packets = 0;
> +
> +	done = (head & (rcq->size_mask + 1)) ? cpu_to_le64(FBNIC_RCD_DONE) : 0;
> +	raw_rcd = &rcq->desc[head & rcq->size_mask];
> +	pkt = rcq->pkt;
> +
> +	/* Walk the completion queue collecting the heads reported by NIC */
> +	while (likely(packets < budget)) {
> +		struct sk_buff *skb = ERR_PTR(-EINVAL);
> +		u64 rcd;
> +
> +		if ((*raw_rcd & cpu_to_le64(FBNIC_RCD_DONE)) == done)
> +			break;
> +
> +		dma_rmb();
> +
> +		rcd = le64_to_cpu(*raw_rcd);
> +
> +		switch (FIELD_GET(FBNIC_RCD_TYPE_MASK, rcd)) {
> +		case FBNIC_RCD_TYPE_HDR_AL:
> +			head0 = FIELD_GET(FBNIC_RCD_AL_BUFF_ID_MASK, rcd);
> +			fbnic_pkt_prepare(nv, rcd, pkt, qt);
> +
> +			break;
> +		case FBNIC_RCD_TYPE_PAY_AL:
> +			head1 = FIELD_GET(FBNIC_RCD_AL_BUFF_ID_MASK, rcd);
> +			fbnic_add_rx_frag(nv, rcd, pkt, qt);
> +
> +			break;
> +		case FBNIC_RCD_TYPE_OPT_META:
> +			/* Only type 0 is currently supported */
> +			if (FIELD_GET(FBNIC_RCD_OPT_META_TYPE_MASK, rcd))
> +				break;
> +
> +			/* We currently ignore the action table index */
> +			break;
> +		case FBNIC_RCD_TYPE_META:
> +			if (likely(!fbnic_rcd_metadata_err(rcd)))
> +				skb = fbnic_build_skb(nv, pkt);
> +
> +			/* populate skb and invalidate XDP */
> +			if (!IS_ERR_OR_NULL(skb)) {
> +				fbnic_populate_skb_fields(nv, rcd, skb, qt);
> +
> +				packets++;
> +
> +				napi_gro_receive(&nv->napi, skb);
> +			}
> +
> +			pkt->buff.data_hard_start = NULL;
> +
> +			break;
> +		}
> +
> +		raw_rcd++;
> +		head++;
> +		if (!(head & rcq->size_mask)) {
> +			done ^= cpu_to_le64(FBNIC_RCD_DONE);
> +			raw_rcd = &rcq->desc[0];
> +		}
> +	}
> +
> +	/* Unmap and free processed buffers */
> +	if (head0 >= 0)
> +		fbnic_clean_bdq(nv, budget, &qt->sub0, head0);
> +	fbnic_fill_bdq(nv, &qt->sub0);
> +
> +	if (head1 >= 0)
> +		fbnic_clean_bdq(nv, budget, &qt->sub1, head1);
> +	fbnic_fill_bdq(nv, &qt->sub1);

I am not sure how complicated the Rx handling will be for the advanced
features. For the current code, each entry/desc in both qt->sub0 and
qt->sub1 needs at least one page, and the page seems to be used only
once, no matter how little of the page is actually consumed?

I am assuming you want to do a 'tightly optimized' operation here by
calling page_pool_fragment_page(), but manipulating page->pp_ref_count
directly does not seem to add any value for the current code, and it
seems to waste a lot of memory by not using the frag API, especially
with PAGE_SIZE > 4K?
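To illustrate the concern, here is a toy userspace sketch of the
fragmentation idea (hypothetical names and sizes, not the actual
page_pool internals): with the frag API, one page can back many small
Rx buffers, and a reference count decides when the page may be
recycled, instead of dedicating a whole page per descriptor:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of page_pool_alloc_frag(): carve fixed-size buffers out of
 * one "page" and track outstanding references, so the page is recycled
 * only after every fragment has been returned.  Sizes and struct names
 * are illustrative only.
 */
#define PAGE_SZ 65536   /* e.g. a 64K page, where the waste is worst */
#define BUF_SZ  2048    /* a typical Rx buffer for a 1500-byte MTU */

struct toy_page {
	long refcount;  /* models page->pp_ref_count */
	size_t offset;  /* next free byte in the page */
};

/* Returns the offset of a new fragment, or -1 when the page is full. */
static long frag_alloc(struct toy_page *p, size_t sz)
{
	if (p->offset + sz > PAGE_SZ)
		return -1;
	p->refcount++;
	p->offset += sz;
	return (long)(p->offset - sz);
}

/* Returns 1 when the last reference is dropped and the page may be
 * recycled back to the pool.
 */
static int frag_free(struct toy_page *p)
{
	return --p->refcount == 0;
}
```

With a page-per-buffer scheme the 64K page above would carry one 2K
buffer; with fragmentation it carries 32 before the pool needs another
page.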

> +
> +	/* Record the current head/tail of the queue */
> +	if (rcq->head != head) {
> +		rcq->head = head;
> +		writel(head & rcq->size_mask, rcq->doorbell);
> +	}
> +
> +	return packets;
> +}
>  
> 

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 10:56                                   ` Jiri Pirko
@ 2024-04-09 13:05                                     ` Florian Fainelli
  2024-04-09 14:28                                       ` Jiri Pirko
  0 siblings, 1 reply; 163+ messages in thread
From: Florian Fainelli @ 2024-04-09 13:05 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Alexander Duyck, Jason Gunthorpe, Paolo Abeni, Jakub Kicinski,
	John Fastabend, netdev, bhelgaas, linux-pci, Alexander Duyck,
	davem, Christoph Hellwig



On 4/9/2024 3:56 AM, Jiri Pirko wrote:
> Mon, Apr 08, 2024 at 11:36:42PM CEST, f.fainelli@gmail.com wrote:
>> On 4/8/24 09:51, Jiri Pirko wrote:
>>> Mon, Apr 08, 2024 at 05:46:35PM CEST, alexander.duyck@gmail.com wrote:
>>>> On Mon, Apr 8, 2024 at 4:51 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>>>>
>>>>> Fri, Apr 05, 2024 at 08:38:25PM CEST, alexander.duyck@gmail.com wrote:
>>>>>> On Fri, Apr 5, 2024 at 8:17 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>>>>>>>
>>>>>>> On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
>>>>>>>>> Alex already indicated new features are coming, changes to the core
>>>>>>>>> code will be proposed. How should those be evaluated? Hypothetically
>>>>>>>>> should fbnic be allowed to be the first implementation of something
>>>>>>>>> invasive like Mina's DMABUF work? Google published an open userspace
>>>>>>>>> for NCCL that people can (in theory at least) actually run. Meta would
>>>>>>>>> not be able to do that. I would say that clearly crosses the line and
>>>>>>>>> should not be accepted.
>>>>>>>>
>>>>>>>> Why not? Just because we are not commercially selling it doesn't mean
>>>>>>>> we couldn't look at other solutions such as QEMU. If we were to
>>>>>>>> provide a github repo with an emulation of the NIC would that be
>>>>>>>> enough to satisfy the "commercial" requirement?
>>>>>>>
>>>>>>> My test is not "commercial", it is enabling open source ecosystem vs
>>>>>>> benefiting only proprietary software.
>>>>>>
>>>>>> Sorry, that was where this started where Jiri was stating that we had
>>>>>> to be selling this.
>>>>>
>>>>> For the record, I never wrote that. Not sure why you repeat this over
>>>>> this thread.
>>>>
>>>> Because you seem to be implying that the Meta NIC driver shouldn't be
>>>> included simply since it isn't going to be available outside of Meta.
>>>> The fact is Meta employs a number of kernel developers and as a result
>>>> of that there will be a number of kernel developers that will have
>>>> access to this NIC and likely do development on systems containing it.
>>>> In addition simply due to the size of the datacenters that we will be
>>>> populating there is actually a strong likelihood that there will be
>>>> more instances of this NIC running on Linux than there are of some
>>>> other vendor devices that have been allowed to have drivers in the
>>>> kernel.
>>>
>>> So? The gain for community is still 0. No matter how many instances is
>>> private hw you privately have. Just have a private driver.
>>
>> I am amazed and not in a good way at how far this has gone, truly.
>>
>> This really is akin to saying that any non-zero driver count to maintain is a
>> burden on the community. Which is true, by definition, but if the goal was to
>> build something for no users, then clearly this is the wrong place to be in,
>> or too late. The systems with no users are the best to maintain, that is for
>> sure.
>>
>> If the practical concern is wen you make tree wide API change that fbnic
>> happens to use, and you have yet another driver (fbnic) to convert, so what?
>> Work with Alex ahead of time, get his driver to be modified, post the patch
>> series. Even if Alex happens to move on and stop being responsible and there
>> is no maintainer, so what? Give the driver a depreciation window for someone
>> to step in, rip it, end of story. Nothing new, so what has specifically
>> changed as of April 4th 2024 to oppose such strong rejection?
> 
> How you describe the flow of internal API change is totally distant from
> reality. Really, like no part is correct:
> 1) API change is responsibility of the person doing it. Imagine working
>     with 40 driver maintainers for every API change. I did my share of
>     API changes in the past, maintainer were only involved to be cced.

As a submitter you propose changes, and silence is acknowledgement. If 
one of your API changes broke someone's driver and they did not notify 
you of the breakage during the review cycle, it falls on their 
shoulders to fix it themselves, and they should not be holding back 
your work; that would not be fair. If you know about the breakage and 
there is still no fix, that is an indication the driver is not actively 
used and maintained.

This also does not mean you have to do the entire API change to a 
driver you do not know on your own. Nothing prevents you from posting 
the patches as an RFC and saying: "here is how I would go about 
changing your driver, please review and help me make corrections". If 
the driver maintainers do not respond, there is no reason their lack of 
involvement should hold back your work, and so your proposed changes 
will be merged eventually.

Isn't this the whole point of being a community: being able to delegate 
and mitigate the risk of large-scale changes?

> 2) To deprecate driver because the maintainer is not responsible. Can
>     you please show me one example when that happened in the past?

I cannot show you an example because we never had to go that far and I 
did not say that this is an established practice, but that we *could* do 
that if we ever reached that point.

> 
> 
>>
>> Like it was said, there are tons of drivers in the Linux kernel that have a
>> single user, this one might have a few more than a single one, that should be
>> good enough.
> 
> This will have exactly 0. That is my point. Why to merge something
> nobody will ever use?

Even if Alex and his firmware colleague end up being the only two 
people using this driver, if the decision is to take it upstream 
because this is the desired distribution and development model for the 
driver, we should respect that.

And just to be clear, we should not be respecting that because Meta, or 
Alex, or anyone decided that they were doing the world a favor by 
working in the open rather than behind closed doors, but simply because 
we cannot *presume* anything about their intentions or the future.

For drivers specifically, yes, there is a question of to what degree we 
can scale horizontally, and I do not think there is ever going to be an 
answer to that, as we will continue to see new drivers emerge, 
possibly with few users, for some definition of few.
-- 
Florian

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 11:01                                   ` Jiri Pirko
@ 2024-04-09 13:11                                     ` Alexander Lobakin
  2024-04-09 13:18                                       ` Jason Gunthorpe
                                                         ` (2 more replies)
  0 siblings, 3 replies; 163+ messages in thread
From: Alexander Lobakin @ 2024-04-09 13:11 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: John Fastabend, Alexander Duyck, Jason Gunthorpe, Paolo Abeni,
	Jakub Kicinski, netdev, bhelgaas, linux-pci, Alexander Duyck,
	davem, Christoph Hellwig

From: Jiri Pirko <jiri@resnulli.us>
Date: Tue, 9 Apr 2024 13:01:51 +0200

> Mon, Apr 08, 2024 at 07:32:59PM CEST, john.fastabend@gmail.com wrote:
>> Jiri Pirko wrote:
>>> Mon, Apr 08, 2024 at 05:46:35PM CEST, alexander.duyck@gmail.com wrote:
>>>> On Mon, Apr 8, 2024 at 4:51 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>>>>
>>>>> Fri, Apr 05, 2024 at 08:38:25PM CEST, alexander.duyck@gmail.com wrote:
>>>>>> On Fri, Apr 5, 2024 at 8:17 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>>>>>>>
>>>>>>> On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
>>>>>>>>> Alex already indicated new features are coming, changes to the core
>>>>>>>>> code will be proposed. How should those be evaluated? Hypothetically
>>>>>>>>> should fbnic be allowed to be the first implementation of something
>>>>>>>>> invasive like Mina's DMABUF work? Google published an open userspace
>>>>>>>>> for NCCL that people can (in theory at least) actually run. Meta would
>>>>>>>>> not be able to do that. I would say that clearly crosses the line and
>>>>>>>>> should not be accepted.
>>>>>>>>
>>>>>>>> Why not? Just because we are not commercially selling it doesn't mean
>>>>>>>> we couldn't look at other solutions such as QEMU. If we were to
>>>>>>>> provide a github repo with an emulation of the NIC would that be
>>>>>>>> enough to satisfy the "commercial" requirement?
>>>>>>>
>>>>>>> My test is not "commercial", it is enabling open source ecosystem vs
>>>>>>> benefiting only proprietary software.
>>>>>>
>>>>>> Sorry, that was where this started where Jiri was stating that we had
>>>>>> to be selling this.
>>>>>
>>>>> For the record, I never wrote that. Not sure why you repeat this over
>>>>> this thread.
>>>>
>>>> Because you seem to be implying that the Meta NIC driver shouldn't be
>>>> included simply since it isn't going to be available outside of Meta.

BTW idpf is also not something you can go and buy in a store, but it's
here in the kernel. Anyway, see below.

>>>> The fact is Meta employs a number of kernel developers and as a result
>>>> of that there will be a number of kernel developers that will have
>>>> access to this NIC and likely do development on systems containing it.

[...]

>> Vendors would happily spin up a NIC if a DC with scale like this
>> would pay for it. They just don't advertise it in patch 0/X,
>> "adding device for cloud provider foo".
>>
>> There is no difference here. We gain developers, we gain insights,
>> learnings and Linux and OSS drivers are running on another big
>> DC. They improve things and find bugs they upstream them its a win.
>>
>> The opposite is also true if we exclude a driver/NIC HW that is
>> running on major DCs we lose a lot of insight, experience, value.
> 
> Could you please describe in details and examples what exactly is we
> are about to loose? I don't see it.

As long as driver A introduces new features / improvements / API /
whatever to the core kernel, we benefit from this no matter whether I'm
actually able to run this driver on my system.

Some drivers even benefit us simply by being of good quality (I
don't speak for this driver, just some hypothetical) and/or by having
interesting design / code / API / etc. choices. The drivers I work on
gained a lot just from me reading new commits / lore threads and
looking at changes in other drivers.

I saw enough situations where driver A started using/doing something in
a way it wasn't ever done anywhere before, and then more and more
drivers started doing the same thing, and in the end it became sort of
a standard.

I didn't read this patchset and thus can't say if it will bring us good
immediately or some time later, but I believe there's no reason to
reject the driver only because you can't buy a board for it in your
gadget store next door.

[...]

Thanks,
Olek

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 13:11                                     ` Alexander Lobakin
@ 2024-04-09 13:18                                       ` Jason Gunthorpe
  2024-04-09 14:08                                       ` Jakub Kicinski
  2024-04-09 14:41                                       ` Jiri Pirko
  2 siblings, 0 replies; 163+ messages in thread
From: Jason Gunthorpe @ 2024-04-09 13:18 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Jiri Pirko, John Fastabend, Alexander Duyck, Paolo Abeni,
	Jakub Kicinski, netdev, bhelgaas, linux-pci, Alexander Duyck,
	davem, Christoph Hellwig

On Tue, Apr 09, 2024 at 03:11:21PM +0200, Alexander Lobakin wrote:

> BTW idpf is also not something you can go and buy in a store, but it's
> here in the kernel. Anyway, see below.

That is really disappointing to hear :(

Jason

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 13:11                                     ` Alexander Lobakin
  2024-04-09 13:18                                       ` Jason Gunthorpe
@ 2024-04-09 14:08                                       ` Jakub Kicinski
  2024-04-09 14:27                                         ` Jakub Kicinski
  2024-04-09 14:41                                       ` Jiri Pirko
  2 siblings, 1 reply; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-09 14:08 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Jiri Pirko, John Fastabend, Alexander Duyck, Paolo Abeni, netdev,
	bhelgaas, linux-pci, Alexander Duyck, davem

On Tue, 9 Apr 2024 15:11:21 +0200 Alexander Lobakin wrote:
> BTW idpf is also not something you can go and buy in a store, but it's
> here in the kernel. Anyway, see below.

For some definition of "a store" :)

> > Could you please describe in details and examples what exactly is we
> > are about to loose? I don't see it.  
> 
> As long as driver A introduces new features / improvements / API /
> whatever to the core kernel, we benefit from this no matter whether I'm
> actually able to run this driver on my system.
> 
> Some drivers even give us benefit by that they are of good quality (I
> don't speak for this driver, just some hypothetical) and/or have
> interesting design / code / API / etc. choices. The drivers I work on
> did gain a lot just from that I was reading new commits / lore threads
> and look at changes in other drivers.

Another point along these lines is worth bringing up. Companies which
build their own kernels probably have little reason to distribute
drivers out of tree. Vendors unfortunately are forced by some of their
customers and/or sales department to provide out of tree drivers. Which
in turn distinctiveness them from implementing shared core
infrastructure. The queue API is a good example of that. Number of
vendors implement pre-allocate and swap for reconfiguration but it's
not controlled by the core. So after 5+ years (look at netconf 2019
slides) of violently agreeing that we need queue alloc we made little
progress :( I don't think that it's a coincidence that it's Mina
(Google) and David (Meta) who picked up this work. And it's really hard
to implement that in an "off the shelf device", where queues are fully
controlled by FW (no documentation available), and without breaking
something (no access to vendor's CI/tests). IOW while modifying core
for a single private driver is a concern there's also a ton of work
we all agree needs to be done in the core, that we need help with.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 14:08                                       ` Jakub Kicinski
@ 2024-04-09 14:27                                         ` Jakub Kicinski
  0 siblings, 0 replies; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-09 14:27 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Jiri Pirko, John Fastabend, Alexander Duyck, Paolo Abeni, netdev,
	bhelgaas, linux-pci, Alexander Duyck, davem

On Tue, 9 Apr 2024 07:08:58 -0700 Jakub Kicinski wrote:
> distinctiveness

Too trusting of the spellcheck, I meant disincentivizes

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 13:05                                     ` Florian Fainelli
@ 2024-04-09 14:28                                       ` Jiri Pirko
  2024-04-09 17:42                                         ` Florian Fainelli
  0 siblings, 1 reply; 163+ messages in thread
From: Jiri Pirko @ 2024-04-09 14:28 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Alexander Duyck, Jason Gunthorpe, Paolo Abeni, Jakub Kicinski,
	John Fastabend, netdev, bhelgaas, linux-pci, Alexander Duyck,
	davem, Christoph Hellwig

Tue, Apr 09, 2024 at 03:05:47PM CEST, f.fainelli@gmail.com wrote:
>
>
>On 4/9/2024 3:56 AM, Jiri Pirko wrote:
>> Mon, Apr 08, 2024 at 11:36:42PM CEST, f.fainelli@gmail.com wrote:
>> > On 4/8/24 09:51, Jiri Pirko wrote:
>> > > Mon, Apr 08, 2024 at 05:46:35PM CEST, alexander.duyck@gmail.com wrote:
>> > > > On Mon, Apr 8, 2024 at 4:51 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> > > > > 
>> > > > > Fri, Apr 05, 2024 at 08:38:25PM CEST, alexander.duyck@gmail.com wrote:
>> > > > > > On Fri, Apr 5, 2024 at 8:17 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>> > > > > > > 
>> > > > > > > On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
>> > > > > > > > > Alex already indicated new features are coming, changes to the core
>> > > > > > > > > code will be proposed. How should those be evaluated? Hypothetically
>> > > > > > > > > should fbnic be allowed to be the first implementation of something
>> > > > > > > > > invasive like Mina's DMABUF work? Google published an open userspace
>> > > > > > > > > for NCCL that people can (in theory at least) actually run. Meta would
>> > > > > > > > > not be able to do that. I would say that clearly crosses the line and
>> > > > > > > > > should not be accepted.
>> > > > > > > > 
>> > > > > > > > Why not? Just because we are not commercially selling it doesn't mean
>> > > > > > > > we couldn't look at other solutions such as QEMU. If we were to
>> > > > > > > > provide a github repo with an emulation of the NIC would that be
>> > > > > > > > enough to satisfy the "commercial" requirement?
>> > > > > > > 
>> > > > > > > My test is not "commercial", it is enabling open source ecosystem vs
>> > > > > > > benefiting only proprietary software.
>> > > > > > 
>> > > > > > Sorry, that was where this started where Jiri was stating that we had
>> > > > > > to be selling this.
>> > > > > 
>> > > > > For the record, I never wrote that. Not sure why you repeat this over
>> > > > > this thread.
>> > > > 
>> > > > Because you seem to be implying that the Meta NIC driver shouldn't be
>> > > > included simply since it isn't going to be available outside of Meta.
>> > > > The fact is Meta employs a number of kernel developers and as a result
>> > > > of that there will be a number of kernel developers that will have
>> > > > access to this NIC and likely do development on systems containing it.
>> > > > In addition simply due to the size of the datacenters that we will be
>> > > > populating there is actually a strong likelihood that there will be
>> > > > more instances of this NIC running on Linux than there are of some
>> > > > other vendor devices that have been allowed to have drivers in the
>> > > > kernel.
>> > > 
>> > > So? The gain for community is still 0. No matter how many instances is
>> > > private hw you privately have. Just have a private driver.
>> > 
>> > I am amazed and not in a good way at how far this has gone, truly.
>> > 
>> > This really is akin to saying that any non-zero driver count to maintain is a
>> > burden on the community. Which is true, by definition, but if the goal was to
>> > build something for no users, then clearly this is the wrong place to be in,
>> > or too late. The systems with no users are the best to maintain, that is for
>> > sure.
>> > 
>> > If the practical concern is wen you make tree wide API change that fbnic
>> > happens to use, and you have yet another driver (fbnic) to convert, so what?
>> > Work with Alex ahead of time, get his driver to be modified, post the patch
>> > series. Even if Alex happens to move on and stop being responsible and there
>> > is no maintainer, so what? Give the driver a depreciation window for someone
>> > to step in, rip it, end of story. Nothing new, so what has specifically
>> > changed as of April 4th 2024 to oppose such strong rejection?
>> 
>> How you describe the flow of internal API change is totally distant from
>> reality. Really, like no part is correct:
>> 1) API change is responsibility of the person doing it. Imagine working
>>     with 40 driver maintainers for every API change. I did my share of
>>     API changes in the past, maintainer were only involved to be cced.
>
>As a submitter you propose changes and silence is acknowledgement. If one of
>your API changes broke someone's driver and they did not notify you of the
>breakage during the review cycle, it falls on their shoulder to fix it for
>themselves and they should not be holding back your work, that would not be

Does it? I don't think so. If you break something, better try to fix it
before somebody else has to.


>fair. If you know about the breakage, and there is still no fix, that is an
>indication the driver is not actively used and maintained.

So? That is not my point. If I break something in fbnic, why does anyone
care? Nobody is ever going to hit that bug, only Meta's DC.


>
>This also does not mean you have to do the entire API changes to a driver you
>do not know about on your own. Nothing ever prevents you from posting the
>patches as RFC and say: "here is how I would go about changing your driver,
>please review and help me make corrections". If the driver maintainers do not
>respond there is no reason their lack of involvement should refrain your
>work, and so your proposed changes will be merged eventually.

Realistically, did you ever see that happen? I can't recall.


>
>Is not this the whole point of being a community and be able to delegate and
>mitigate the risk of large scale changes?
>
>> 2) To deprecate driver because the maintainer is not responsible. Can
>>     you please show me one example when that happened in the past?
>
>I cannot show you an example because we never had to go that far and I did
>not say that this is an established practice, but that we *could* do that if
>we ever reached that point.

You are talking about a flow that does not exist. I don't understand how
that is related to this discussion then.


>
>> 
>> 
>> > 
>> > Like it was said, there are tons of drivers in the Linux kernel that have a
>> > single user, this one might have a few more than a single one, that should be
>> > good enough.
>> 
>> This will have exactly 0. That is my point. Why to merge something
>> nobody will ever use?
>
>Even if Alex and his firmware colleague end up being the only two people
>using this driver if the decision is to make it upstream because this is the
>desired distribution and development model of the driver we should respect
>that.
>
>And just to be clear, we should not be respecting that because Meta, or Alex
>or anyone decided that they were doing the world a favor by working in the
>open rather than being closed door, but simply because we cannot *presume*

I don't see any favor for the community. What's the favor exactly?
The only favor I see is in the opposite direction: the community giving
Meta free cycles, saving them backporting costs. Why?


>about their intentions and the future.

Heh, the intention is pretty clear from this discussion, isn't it? If
they ever by any chance decide to go public with their device, driver
for that could be submitted at a time. But this is totally hypothetical.


>
>For drivers specifically, yes, there is a question of to which degree can we
>scale horizontally, and I do not think there is ever going to be an answer to
>that, as we will continue to see new drivers emerge, possibly with few users,
>for some definition of few.
>-- 
>Florian

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 13:11                                     ` Alexander Lobakin
  2024-04-09 13:18                                       ` Jason Gunthorpe
  2024-04-09 14:08                                       ` Jakub Kicinski
@ 2024-04-09 14:41                                       ` Jiri Pirko
  2024-04-10 11:45                                         ` Alexander Lobakin
  2 siblings, 1 reply; 163+ messages in thread
From: Jiri Pirko @ 2024-04-09 14:41 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: John Fastabend, Alexander Duyck, Jason Gunthorpe, Paolo Abeni,
	Jakub Kicinski, netdev, bhelgaas, linux-pci, Alexander Duyck,
	davem, Christoph Hellwig

Tue, Apr 09, 2024 at 03:11:21PM CEST, aleksander.lobakin@intel.com wrote:
>From: Jiri Pirko <jiri@resnulli.us>
>Date: Tue, 9 Apr 2024 13:01:51 +0200
>
>> Mon, Apr 08, 2024 at 07:32:59PM CEST, john.fastabend@gmail.com wrote:
>>> Jiri Pirko wrote:
>>>> Mon, Apr 08, 2024 at 05:46:35PM CEST, alexander.duyck@gmail.com wrote:
>>>>> On Mon, Apr 8, 2024 at 4:51 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>>>>>
>>>>>> Fri, Apr 05, 2024 at 08:38:25PM CEST, alexander.duyck@gmail.com wrote:
>>>>>>> On Fri, Apr 5, 2024 at 8:17 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>>>>>>>>
>>>>>>>> On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
>>>>>>>>>> Alex already indicated new features are coming, changes to the core
>>>>>>>>>> code will be proposed. How should those be evaluated? Hypothetically
>>>>>>>>>> should fbnic be allowed to be the first implementation of something
>>>>>>>>>> invasive like Mina's DMABUF work? Google published an open userspace
>>>>>>>>>> for NCCL that people can (in theory at least) actually run. Meta would
>>>>>>>>>> not be able to do that. I would say that clearly crosses the line and
>>>>>>>>>> should not be accepted.
>>>>>>>>>
>>>>>>>>> Why not? Just because we are not commercially selling it doesn't mean
>>>>>>>>> we couldn't look at other solutions such as QEMU. If we were to
>>>>>>>>> provide a github repo with an emulation of the NIC would that be
>>>>>>>>> enough to satisfy the "commercial" requirement?
>>>>>>>>
>>>>>>>> My test is not "commercial", it is enabling open source ecosystem vs
>>>>>>>> benefiting only proprietary software.
>>>>>>>
>>>>>>> Sorry, that was where this started where Jiri was stating that we had
>>>>>>> to be selling this.
>>>>>>
>>>>>> For the record, I never wrote that. Not sure why you repeat this over
>>>>>> this thread.
>>>>>
>>>>> Because you seem to be implying that the Meta NIC driver shouldn't be
>>>>> included simply since it isn't going to be available outside of Meta.
>
>BTW idpf is also not something you can go and buy in a store, but it's
>here in the kernel. Anyway, see below.

IDK why so many people in this thread are so focused on "buying" a NIC.
The IDPF device is something I assume one may see on a VM hosted in some
cloud, isn't it? If yes, it is completely legit to have it in. Do I miss
something?


>
>>>>> The fact is Meta employs a number of kernel developers and as a result
>>>>> of that there will be a number of kernel developers that will have
>>>>> access to this NIC and likely do development on systems containing it.
>
>[...]
>
>>> Vendors would happily spin up a NIC if a DC with scale like this
>>> would pay for it. They just don't advertise it in patch 0/X,
>>> "adding device for cloud provider foo".
>>>
>>> There is no difference here. We gain developers, we gain insights,
>>> learnings and Linux and OSS drivers are running on another big
>>> DC. They improve things and find bugs they upstream them its a win.
>>>
>>> The opposite is also true if we exclude a driver/NIC HW that is
>>> running on major DCs we lose a lot of insight, experience, value.
>> 
>> Could you please describe in details and examples what exactly is we
>> are about to loose? I don't see it.
>
>As long as driver A introduces new features / improvements / API /
>whatever to the core kernel, we benefit from this no matter whether I'm
>actually able to run this driver on my system.
>
>Some drivers even give us benefit by that they are of good quality (I
>don't speak for this driver, just some hypothetical) and/or have
>interesting design / code / API / etc. choices. The drivers I work on
>did gain a lot just from that I was reading new commits / lore threads
>and look at changes in other drivers.
>
>I saw enough situations when driver A started using/doing something the
>way it wasn't ever done anywhere before, and then more and more drivers
>stated doing the same thing and at the end it became sorta standard.

So bottom line is, the unused driver *may* introduce some features and
*may* serve as an example of how to do things for other people.
Is this really so beneficial for the community that it outweighs
the obvious cons (not going to repeat them)?

Like with any other patch/set we merge in, we always look at the pros
and cons. I'm honestly surprised that so many people here want to make
an exception for Meta's internal toy project.


>
>I didn't read this patchset and thus can't say if it will bring us good
>immediately or some time later, but I believe there's no reason to
>reject the driver only because you can't buy a board for it in your
>gadget store next door.

Again with "buying", uff.


>
>[...]
>
>Thanks,
>Olek

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09  8:18                             ` Leon Romanovsky
@ 2024-04-09 14:43                               ` Alexander Duyck
  2024-04-09 15:39                                 ` Jason Gunthorpe
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-09 14:43 UTC (permalink / raw)
  To: Leon Romanovsky, Jason Gunthorpe
  Cc: Jakub Kicinski, John Fastabend, Jiri Pirko, netdev, bhelgaas,
	linux-pci, Alexander Duyck, davem, pabeni

On Tue, Apr 9, 2024 at 1:19 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Apr 08, 2024 at 01:43:28PM -0700, Alexander Duyck wrote:
> > On Mon, Apr 8, 2024 at 11:41 AM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Mon, Apr 08, 2024 at 08:26:33AM -0700, Alexander Duyck wrote:
> > > > On Sun, Apr 7, 2024 at 11:18 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > > >
> > > > > On Fri, Apr 05, 2024 at 08:41:11AM -0700, Alexander Duyck wrote:
> > > > > > On Thu, Apr 4, 2024 at 7:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > >
> > > > > <...>
> > > > >
> > > > > > > > > Technical solution? Maybe if it's not a public device regression rules
> > > > > > > > > don't apply? Seems fairly reasonable.
> > > > > > > >
> > > > > > > > This is a hypothetical. This driver currently isn't changing anything
> > > > > > > > outside of itself. At this point the driver would only be build tested
> > > > > > > > by everyone else. They could just not include it in their Kconfig and
> > > > > > > > then out-of-sight, out-of-mind.
> > > > > > >
> > > > > > > Not changing does not mean not depending on existing behavior.
> > > > > > > Investigating and fixing properly even the hardest regressions in
> > > > > > > the stack is a bar that Meta can so easily clear. I don't understand
> > > > > > > why you are arguing.
> > > > > >
> > > > > > I wasn't saying the driver wouldn't be dependent on existing behavior.
> > > > > > I was saying that it was a hypothetical that Meta would be a "less
> > > > > > than cooperative user" and demand a revert.  It is also a hypothetical
> > > > > > that Linus wouldn't just propose a revert of the fbnic driver instead
> > > > > > of the API for the crime of being a "less than cooperative maintainer"
> > > > > > and  then give Meta the Nvidia treatment.
> > > > >
> > > > > It is very easy to be "less than cooperative maintainer" in netdev world.
> > > > > 1. Be vendor.
> > > > > 2. Propose ideas which are different.
> > > > > 3. Report for user-visible regression.
> > > > > 4. Ask for a fix from the patch author or demand a revert according to netdev rules/practice.
> > > > >
> > > > > And voilà, you are "less than cooperative maintainer".
> > > > >
> > > > > So in reality, the "hypothetical" is very close to the reality, unless
> > > > > Meta contribution will be treated as a special case.
> > > > >
> > > > > Thanks
> > > >
> > > > How many cases of that have we had in the past? I'm honestly curious
> > > > as I don't actually have any reference.
> > >
> > > And this is the problem, you don't have "any reference" and accurate
> > > knowledge what happened, but you are saying "less than cooperative
> > > maintainer".
>
> <...>
>
> > Any more info on this? Without context it is hard to say one way or the other.
>
> <...>
>
> > I didn't say you couldn't. Without context I cannot say if it was
> > deserved or not.
>
> Florian gave links to the context, so I'll skip this part.
>
> In this thread, Jakub tried to revive the discussion about it.
> https://lore.kernel.org/netdev/20240326133412.47cf6d99@kernel.org/
>
> <...>

I see. So this is what you were referencing. Arguably I can see both
sides of the issue. Ideally what should have been presented would have
been the root cause of why the diff was breaking things and then it
could have been fixed. However instead what was presented was
essentially a bisect with a request to revert.

Ultimately Eric accepted the revert since there was an issue that
needed to be fixed. However I can't tell what went on in terms of
trying to get to the root cause as that was taken offline for
discussion so I can't say what role Mellanox played in either good or
bad other than at least performing the bisect.

Ultimately I think this kind of comes down to the hobbyist versus
commercial interests issue that I brought up earlier. The hobbyist
side should at least be curious about why the Vagrant implementation
was not RFC compliant, which the changes supposedly were, thus the
interest in getting a root cause. However, that said, it is broken and
needs to be fixed, so curiosity be damned: we cannot break userspace
or fail to interoperate with other TCP implementations.

> > The point I was trying to make is that if you are the only owner of
> > something, and not willing to work with the community as a maintainer
>
> Like Jakub, I don't understand why you are talking about regressions in
> the driver, while you brought the label of "less than cooperative maintainer"
> and asked for "then give Meta the Nvidia treatment".

Because I have been trying to keep the whole discussion about the
fbnic driver that is presented in this patch set. When I was referring
to a "less than cooperative maintainer" it was in response to the
hypothetical about what if Meta started refusing to work with the
community after this was accepted, and the "Nvidia treatment" I was
referring to was the graphics side about 10 years ago[1], as the
question was about somebody running to Linus to complain that their
proprietary hardware got broken by a kernel change. The general idea
being that if we are a proprietary NIC with ourselves as the only
customer, Linus would be more likely to give Meta a similar message.


> I don't want to get into the discussion about if this driver should be
> accepted or not.
>
> I'm just asking to stop label people and companies based on descriptions
> from other people, but rely on facts.

Sorry, it wasn't meant to be any sort of attack on Nvidia/Mellanox.
The Nvidia I was referencing was the graphics side which had a bad
reputation with the community long before Mellanox got involved.

> Thanks

Thank you. I understand now that you and Jason were just trying to
warn me about what the community will and won't accept. Like I
mentioned before, I had just misconstrued Jason's comments as initially
backing Jiri in this. In my mind I was braced for the Nvidia/Mellanox
folks dog-piling me, so I was prepared for attacks.

Just for the record this will be the third NIC driver I have added to
the kernel following igbvf and fm10k, and years maintaining some of
the other Intel network drivers. So I am well aware of the
expectations of a maintainer. I might be a bit rusty due to a couple
years of being focused on this project and not being able to post as
much upstream, but as the expression goes "This isn't my first rodeo".

- Alex

[1]: https://arstechnica.com/information-technology/2012/06/linus-torvalds-says-f-k-you-to-nvidia/

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-09 11:47   ` Yunsheng Lin
@ 2024-04-09 15:08     ` Alexander Duyck
  2024-04-10 11:54       ` Yunsheng Lin
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-09 15:08 UTC (permalink / raw)
  To: Yunsheng Lin; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On Tue, Apr 9, 2024 at 4:47 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2024/4/4 4:09, Alexander Duyck wrote:
> > From: Alexander Duyck <alexanderduyck@fb.com>

[...]

> > +     /* Unmap and free processed buffers */
> > +     if (head0 >= 0)
> > +             fbnic_clean_bdq(nv, budget, &qt->sub0, head0);
> > +     fbnic_fill_bdq(nv, &qt->sub0);
> > +
> > +     if (head1 >= 0)
> > +             fbnic_clean_bdq(nv, budget, &qt->sub1, head1);
> > +     fbnic_fill_bdq(nv, &qt->sub1);
>
> I am not sure how complicated the rx handling will be for the advanced
> feature. For the current code, for each entry/desc in both qt->sub0 and
> qt->sub1 at least need one page, and the page seems to be only used once
> no matter however small the page is used?
>
> I am assuming you want to do 'tightly optimized' operation for this by
> calling page_pool_fragment_page(), but manipulating page->pp_ref_count
> directly does not seems to add any value for the current code, but seem
> to waste a lot of memory by not using the frag API, especially PAGE_SIZE
> > 4K?

On this hardware both the header and payload buffers are fragmentable.
The hardware decides the partitioning and we just follow it. So for
example it wouldn't be uncommon to have a jumbo frame split up such
that the header is less than 128B plus SKB overhead while the actual
data in the payload is just over 1400. So for us fragmenting the pages
is a very likely case especially with smaller packets.

It is better for us to optimize for the small packet scenario than
optimize for the case where 4K slices are getting taken. That way when
we are CPU constrained handling small packets we are the most
optimized whereas for the larger frames we can spare a few cycles to
account for the extra overhead. The result should be a higher overall
packets per second.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 14:43                               ` Alexander Duyck
@ 2024-04-09 15:39                                 ` Jason Gunthorpe
  2024-04-09 16:31                                   ` Alexander Duyck
  0 siblings, 1 reply; 163+ messages in thread
From: Jason Gunthorpe @ 2024-04-09 15:39 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Leon Romanovsky, Jakub Kicinski, John Fastabend, Jiri Pirko,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem, pabeni

On Tue, Apr 09, 2024 at 07:43:07AM -0700, Alexander Duyck wrote:

> I see. So this is what you were referencing. Arguably I can see both
> sides of the issue. Ideally what should have been presented would have
> been the root cause of why the diff

Uh, that almost never happens in the kernel world. Someone does a
great favour to us all to test rc kernels and finds bugs. The
expectation is generally things like:

 - The bug is fixed immediately because the issue is obvious to the
   author
 - Iteration and rapid progress is seen toward enlightening the author
 - The patch is reverted, often rapidly, try again later with a good
   patch

Unsophisticated reporters should not experience regressions,
period. Unsophisticated reporters should not be expected to debug
things on their own (though it sure is nice if they can!). We really
like it and appreciate it if reporters can run experiments!

In this particular instance there was some resistance to getting a fix
quickly. I think a revert for something like this that could not be
immediately fixed is the correct thing, especially when it affects
significant work within the community. It gives the submitter time to
find out how to solve the regression.

That there is now so much ongoing bad blood over such an ordinary
matter is what is really distressing here.

I think Leon's point is broadly that those on the "vendor" side seem
to often be accused of being a "bad vendor". I couldn't help but
notice the language from Meta on this thread seemed to place Meta
outside of being a vendor, despite having always very much been doing
typical vendor activities like downstream forks, proprietary userspace
and now drivers for their own devices.

In my view the vendor/!vendor distinction is really toxic and should
stop.

Jason

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 15:39                                 ` Jason Gunthorpe
@ 2024-04-09 16:31                                   ` Alexander Duyck
  2024-04-09 17:12                                     ` Jason Gunthorpe
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-09 16:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Jakub Kicinski, John Fastabend, Jiri Pirko,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem, pabeni

On Tue, Apr 9, 2024 at 8:39 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Apr 09, 2024 at 07:43:07AM -0700, Alexander Duyck wrote:
>
> > I see. So this is what you were referencing. Arguably I can see both
> > sides of the issue. Ideally what should have been presented would have
> > been the root cause of why the diff
>
> Uh, that almost never happens in the kernel world. Someone does us
> all a great favour by testing rc kernels and finding bugs. The

That is why I mentioned "Ideally". Most often that cannot be the case
for various reasons. That said, it would have been the ideal
solution, not the practical one.

> expectation is generally things like:
>
>  - The bug is fixed immediately because the issue is obvious to the
>    author
>  - Iteration and rapid progress is seen toward enlightening the author
>  - The patch is reverted, often rapidly, try again later with a good
>    patch

When working on a development branch that shouldn't be the
expectation. I suspect that is why the revert was pushed back on
initially. The developer wanted a chance to try to debug and resolve
the issue with root cause.

Honestly, what I probably would have proposed was a build flag that
would have allowed the code to stay but be disabled with a "Broken"
label, allowing both developers to work on their own thing. Then if
people complained about the RFC non-compliance issue but didn't care
about the Vagrant setup, they could have just turned it on to test and
verify it fixed their issue, and we would get additional testing. However, I
assume that would have introduced additional maintenance overhead.
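
For reference, the usual in-tree mechanism for this kind of "Broken"
label is a Kconfig symbol gated on BROKEN; the symbol and prompt below
are purely hypothetical, not taken from any actual patch:

```kconfig
# Hypothetical sketch only: a config symbol that keeps disputed code
# in tree but unbuildable by default. "depends on BROKEN" cannot be
# satisfied through normal configuration, so testers must opt in
# deliberately by editing their config.
config NET_EXAMPLE_RFC_STRICT
	bool "Strict RFC-compliant behaviour (known to break some setups)"
	depends on BROKEN
	help
	  Illustrative example of the "Broken" label described above.
	  Users hitting the RFC non-compliance issue could enable this
	  to verify the fix while the regression is root-caused.
```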

> Unsophisticated reporters should not experience regressions,
> period. Unsophisticated reporters should not be expected to debug
> things on their own (though it sure is nice if they can!). We really
> like it and appreciate it if reporters can run experiments!

Unsophisticated reporters/users shouldn't be running net-next. If this
has made it to or is about to go into Linus's tree then I would agree
the regression needs to be resolved ASAP as that stuff shouldn't exist
past rc1 at the latest.

> In this particular instance there was some resistance getting to a fix
> quickly. I think a revert for something like this that could not be
> immediately fixed is the correct thing, especially when it affects
> significant work within the community. It gives the submitter time to
> find out how to solve the regression.
>
> That there is now so much ongoing bad blood over such an ordinary
> matter is what is really distressing here.

Well much of it has to do with the fact that this is supposed to be a
community. Generally I help you, you help me and together we both make
progress. So within the community people tend to build up what we
could call karma. Generally I think some of the messages sent seemed
to make it come across that the Mellanox/Nvidia folks felt it "wasn't
their problem" so they elicited a bit of frustration from the other
maintainers and built up some negative karma.

As I mentioned in the case of the e1000e NVRAM corruption, it
wasn't an Intel issue that caused the problem, but Intel had to jump in
to address it until the root cause was found to be function
tracing. Unfortunately, one thing that tends to happen with upstream is
that we get asked to do things that aren't directly related to the
project we are working on. We saw that at Intel quite often. I
referred to it at one point as the "you stepped in it, you own it"
phenomenon, where if we even brushed against a block of upstream code
that wasn't being well maintained we would be asked to fix it up and
address existing issues before we could upstream any patches.

> I think Leon's point is broadly that those on the "vendor" side seem
> to often be accused of being a "bad vendor". I couldn't help but
> notice the language from Meta on this thread seemed to place Meta
> outside of being a vendor, despite having always very much been doing
> typical vendor activities like downstream forks, proprietary userspace
> and now drivers for their own devices.

I wouldn't disagree that we are doing "vendor" things. Up until about
4 years ago I was on the "vendor" side at Intel. One difference is
that Meta is also the "consumer". So if I report an issue it is me
complaining about something as a sophisticated user instead of an
unsophisticated one. So hopefully we have gone through and done some
triage to at least bisect it down to a patch and are willing to work
with the community as you guys did. If we can work with the other
maintainers to enable them to debug and root cause the issue then even
better. The revert is normally the weapon of last resort to be broken
out before the merge window opens, or if an issue is caught in Linus's
tree.
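
The triage flow described above, bisecting an issue down to a patch, is
plain git bisect; a sketch, where the known-good tag and the reproducer
script are assumptions for illustration only:

```shell
# Hypothetical bisect session; v6.8 and the repro script are placeholders.
git bisect start
git bisect bad HEAD              # tip of net-next showing the regression
git bisect good v6.8             # last point known to work
# Let git drive the search with an automated reproducer that exits
# 0 on good kernels and non-zero on bad ones:
git bisect run ./reproduce_regression.sh
git bisect reset                 # return to the original branch
```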

> In my view the vendor/!vendor distinction is really toxic and should
> stop.

I agree. However that was essentially what started all this when Jiri
pointed out that we weren't selling the NIC to anyone else. That made
this all about vendor vs !vendor, and his suggestion of just giving
the NICs away isn't exactly practical, at least not at any sort of
large scale. Maybe we should start coming up with a new term for the
!vendor case. How about "prosumer", as in "producer" and "consumer"?


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-05 14:24                     ` Alexander Duyck
  2024-04-05 15:17                       ` Jason Gunthorpe
@ 2024-04-09 16:53                       ` Edward Cree
  1 sibling, 0 replies; 163+ messages in thread
From: Edward Cree @ 2024-04-09 16:53 UTC (permalink / raw)
  To: Alexander Duyck, Jason Gunthorpe
  Cc: Paolo Abeni, Jakub Kicinski, John Fastabend, Jiri Pirko, netdev,
	bhelgaas, linux-pci, Alexander Duyck, davem, Christoph Hellwig

On 05/04/2024 15:24, Alexander Duyck wrote:
> Why not? Just because we are not commercially selling it doesn't mean
> we couldn't look at other solutions such as QEMU. If we were to
> provide a github repo with an emulation of the NIC would that be
> enough to satisfy the "commercial" requirement?
> 
> The fact is I already have an implementation, but I would probably
> need to clean up a few things as the current setup requires 3 QEMU
> instances to emulate the full setup with host, firmware, and BMC. It
> wouldn't be as performant as the actual hardware but it is more than
> enough for us to test code with. If we need to look at publishing
> something like that to github in order to address the lack of user
> availability I could start looking at getting the approvals for that.
Personally I think that this would vitiate any legitimate objections
 anyone could have to this driver.  The emulation would be a functional
 spec for the device, and (assuming it's open source, including the
 firmware) would provide a basis for anyone attempting to build their
 own hardware to the same interface.  As long as clones aren't
 prevented by some kind of patent encumbrance or whatever, this would
 be more 'open' than many of the devices users _can_ get their hands on
 today.
The way this suggestion/offer/proposal got dismissed and ignored in
 favour of spurious arguments about DMABUF speaks volumes.

-e


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 16:31                                   ` Alexander Duyck
@ 2024-04-09 17:12                                     ` Jason Gunthorpe
  2024-04-09 18:38                                       ` Alexander Duyck
  0 siblings, 1 reply; 163+ messages in thread
From: Jason Gunthorpe @ 2024-04-09 17:12 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Leon Romanovsky, Jakub Kicinski, John Fastabend, Jiri Pirko,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem, pabeni

On Tue, Apr 09, 2024 at 09:31:06AM -0700, Alexander Duyck wrote:

> > expectation is generally things like:
> >
> >  - The bug is fixed immediately because the issue is obvious to the
> >    author
> >  - Iteration and rapid progress is seen toward enlightening the author
> >  - The patch is reverted, often rapidly, try again later with a good
> >    patch
> 
> When working on a development branch that shouldn't be the
> expectation. I suspect that is why the revert was pushed back on
> initially. The developer wanted a chance to try to debug and resolve
> the issue with root cause.

Even mm-unstable drops patches on a hair trigger, as an example.

You can't have an orderly development process if your development tree
is broken in your CI. Personally, I'm grateful for the people who test
linux-next (or the various constituent sub trees), it really helps.

> Well much of it has to do with the fact that this is supposed to be a
> community. Generally I help you, you help me and together we both make
> progress. So within the community people tend to build up what we
> could call karma. Generally I think some of the messages sent seemed
> to make it come across that the Mellanox/Nvidia folks felt it "wasn't
> their problem" so they elicited a bit of frustration from the other
> maintainers and built up some negative karma.

How could it be NVIDIA folks problem? They are not experts in TCP and
can't debug it. The engineer running the CI systems did what he was
asked by Eric from what I can tell.

> phenomenon where if we even brushed against a block of upstream code
> that wasn't being well maintained we would be asked to fix it up and
> address existing issues before we could upstream any patches.

Well, Intel has its own karma problems in the kernel community. :(

> > In my view the vendor/!vendor distinction is really toxic and should
> > stop.
> 
> I agree. However that was essentially what started all this when Jiri
> pointed out that we weren't selling the NIC to anyone else. That made
> this all about vendor vs !vendor, 

That is not how I would sum up Jiri's position.

By my read he is saying that contributing code to the kernel that only
Meta can actually use is purely extractive. It is not about vendor or
!vendor, it is taking-free-forwardporting or not. You have argued,
and I would agree, that there is a grey scale between
extractive/collaborative - but I also agree with Jiri that fbnic is
undeniably far toward the extractive side.

If being extractive is a problem in this case or not is another
question, but I would say Jiri's objection is definitely not about
selling or vendor vs !vendor.

Jason


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 14:28                                       ` Jiri Pirko
@ 2024-04-09 17:42                                         ` Florian Fainelli
  2024-04-09 18:38                                           ` Leon Romanovsky
  0 siblings, 1 reply; 163+ messages in thread
From: Florian Fainelli @ 2024-04-09 17:42 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Alexander Duyck, Jason Gunthorpe, Paolo Abeni, Jakub Kicinski,
	John Fastabend, netdev, bhelgaas, linux-pci, Alexander Duyck,
	davem, Christoph Hellwig

On 4/9/24 07:28, Jiri Pirko wrote:
> Tue, Apr 09, 2024 at 03:05:47PM CEST, f.fainelli@gmail.com wrote:
>>
>>
>> On 4/9/2024 3:56 AM, Jiri Pirko wrote:
>>> Mon, Apr 08, 2024 at 11:36:42PM CEST, f.fainelli@gmail.com wrote:
>>>> On 4/8/24 09:51, Jiri Pirko wrote:
>>>>> Mon, Apr 08, 2024 at 05:46:35PM CEST, alexander.duyck@gmail.com wrote:
>>>>>> On Mon, Apr 8, 2024 at 4:51 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>>>>>>
>>>>>>> Fri, Apr 05, 2024 at 08:38:25PM CEST, alexander.duyck@gmail.com wrote:
>>>>>>>> On Fri, Apr 5, 2024 at 8:17 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>>>>>>>>>
>>>>>>>>> On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
>>>>>>>>>>> Alex already indicated new features are coming, changes to the core
>>>>>>>>>>> code will be proposed. How should those be evaluated? Hypothetically
>>>>>>>>>>> should fbnic be allowed to be the first implementation of something
>>>>>>>>>>> invasive like Mina's DMABUF work? Google published an open userspace
>>>>>>>>>>> for NCCL that people can (in theory at least) actually run. Meta would
>>>>>>>>>>> not be able to do that. I would say that clearly crosses the line and
>>>>>>>>>>> should not be accepted.
>>>>>>>>>>
>>>>>>>>>> Why not? Just because we are not commercially selling it doesn't mean
>>>>>>>>>> we couldn't look at other solutions such as QEMU. If we were to
>>>>>>>>>> provide a github repo with an emulation of the NIC would that be
>>>>>>>>>> enough to satisfy the "commercial" requirement?
>>>>>>>>>
>>>>>>>>> My test is not "commercial", it is enabling open source ecosystem vs
>>>>>>>>> benefiting only proprietary software.
>>>>>>>>
>>>>>>>> Sorry, that was where this started where Jiri was stating that we had
>>>>>>>> to be selling this.
>>>>>>>
>>>>>>> For the record, I never wrote that. Not sure why you repeat this over
>>>>>>> this thread.
>>>>>>
>>>>>> Because you seem to be implying that the Meta NIC driver shouldn't be
>>>>>> included simply since it isn't going to be available outside of Meta.
>>>>>> The fact is Meta employs a number of kernel developers and as a result
>>>>>> of that there will be a number of kernel developers that will have
>>>>>> access to this NIC and likely do development on systems containing it.
>>>>>> In addition simply due to the size of the datacenters that we will be
>>>>>> populating there is actually a strong likelihood that there will be
>>>>>> more instances of this NIC running on Linux than there are of some
>>>>>> other vendor devices that have been allowed to have drivers in the
>>>>>> kernel.
>>>>>
>>>>> So? The gain for community is still 0. No matter how many instances is
>>>>> private hw you privately have. Just have a private driver.
>>>>
>>>> I am amazed and not in a good way at how far this has gone, truly.
>>>>
>>>> This really is akin to saying that any non-zero driver count to maintain is a
>>>> burden on the community. Which is true, by definition, but if the goal was to
>>>> build something for no users, then clearly this is the wrong place to be in,
>>>> or too late. The systems with no users are the best to maintain, that is for
>>>> sure.
>>>>
>>>> If the practical concern is when you make a tree-wide API change that fbnic
>>>> happens to use, and you have yet another driver (fbnic) to convert, so what?
>>>> Work with Alex ahead of time, get his driver to be modified, post the patch
>>>> series. Even if Alex happens to move on and stop being responsible and there
>>>> is no maintainer, so what? Give the driver a depreciation window for someone
>>>> to step in, rip it, end of story. Nothing new, so what has specifically
>>>> changed as of April 4th 2024 to oppose such strong rejection?
>>>
>>> How you describe the flow of internal API change is totally distant from
>>> reality. Really, like no part is correct:
>>> 1) API change is responsibility of the person doing it. Imagine working
>>>      with 40 driver maintainers for every API change. I did my share of
>>>      API changes in the past, maintainers were only involved to be cc'ed.
>>
>> As a submitter you propose changes and silence is acknowledgement. If one of
>> your API changes broke someone's driver and they did not notify you of the
>> breakage during the review cycle, it falls on their shoulder to fix it for
>> themselves and they should not be holding back your work, that would not be
> 
> Does it? I don't think so. If you break something, better try to fix it
> before somebody else has to.



> 
> 
>> fair. If you know about the breakage, and there is still no fix, that is an
>> indication the driver is not actively used and maintained.
> 
> So? That is not my point. If I break something in fbnic, why does anyone
> care? Nobody is ever to hit that bug, only Meta DC.

They care, and they will jump in to fix it. There is no expectation that 
as a community member you should be able to make 100% correct patches; 
this is absolutely not humanly possible, even less so with scarce access 
to the hardware. All you can hope for is that your changes work, and 
that someone catches it, sooner rather than later.

> 
> 
>>
>> This also does not mean you have to do the entire API changes to a driver you
>> do not know about on your own. Nothing ever prevents you from posting the
>> patches as RFC and say: "here is how I would go about changing your driver,
>> please review and help me make corrections". If the driver maintainers do not
>> respond there is no reason their lack of involvement should refrain your
>> work, and so your proposed changes will be merged eventually.
> 
> Realistically, did you see that ever happen. I can't recall.

This happens all of the time. If you make a netdev tree-wide change, how 
many maintainers' Acked-by tags do we collect before merging those 
changes? None, typically, because the netdev maintainers are just quicker 
than reviewers could be. In other subsystems we might actually wait for 
people to have a chance to give their A-b or R-b tags, though not always.

> 
> 
>>
>> Is not this the whole point of being a community and be able to delegate and
>> mitigate the risk of large scale changes?
>>
>>> 2) To deprecate driver because the maintainer is not responsible. Can
>>>      you please show me one example when that happened in the past?
>>
>> I cannot show you an example because we never had to go that far and I did
>> not say that this is an established practice, but that we *could* do that if
>> we ever reached that point.
> 
> You are talking about a flow that does not exist. I don't understand how
> is that related to this discussion then.

I was trying to appease your concerns about additional maintenance 
burden. If the burden becomes real, we ditch it. We can dismiss this 
point as being not relevant if you want.

> 
> 
>>
>>>
>>>
>>>>
>>>> Like it was said, there are tons of drivers in the Linux kernel that have a
>>>> single user, this one might have a few more than a single one, that should be
>>>> good enough.
>>>
>>> This will have exactly 0. That is my point. Why to merge something
>>> nobody will ever use?
>>
>> Even if Alex and his firmware colleague end up being the only two people
>> using this driver if the decision is to make it upstream because this is the
>> desired distribution and development model of the driver we should respect
>> that.
>>
>> And just to be clear, we should not be respecting that because Meta, or Alex
>> or anyone decided that they were doing the world a favor by working in the
>> open rather than being closed door, but simply because we cannot *presume*
> 
> I don't see any favor for the community. What's the favor exactly?

There is no exchange of favors or "this" for "that", this is not how a 
community works. You bring your code to the table, solicit review 
feedback, then go on to maintain it within your bounds, and, time 
permitting, beyond your driver. What we gain as a community is 
additional visibility, more API users (eventually real world users, 
too), and therefore a somewhat more objective way of coming up with new 
APIs and features, and just a broader understanding of what is out 
there. This is better than speculation since that creates a less skewed 
mental model.

Let us say that someone at Meta wanted to get this core netdev feature 
that could be super cool for others included in the upstream kernel, we 
would shut it down on the basis that no user exists, and we would be 
right to do so. It turns out there is a user, but the driver 
lives out of tree, and now we also reject that driver? Who benefits from 
doing that: nobody.

You need a membership card to join the club that you can only enter if 
you have a membership card already? No thank you.

> The only favor I see is the in the opposite direction, community giving
> Meta free cycles saving their backporting costs. Why?

Technically it would be both forward porting cycles, since they would no 
longer need to rebase the driver against their most recent kernel used, 
and backporting cycles for the first kernel including fbnic onwards.

That comes almost for free these days anyway, thanks to static analysis 
tools. The overwhelming cost of the maintenance remains on Meta 
engineers, being the only ones with access to the hardware. If they end 
up with customers in the future, they can offload some of that to their 
customers, too.

Let's just look at a few high profile drivers by lines changed:

Summary for: drivers/net/ethernet/mellanox/mlxsw
Total: 133422 (+), 44952 (-)
Own: 131180 (+), 42725 (-)
Community: 2242 (+) (1.680 %), 2227 (-) (4.954 %)

Summary for: drivers/net/ethernet/mellanox/mlx5
Total: 265368 (+), 107690 (-)
Own: 259213 (+), 100328 (-)
Community: 6155 (+) (2.319 %), 7362 (-) (6.836 %)%

Summary for: drivers/net/ethernet/broadcom/bnxt
Total: 70355 (+), 25402 (-)
Own: 68458 (+), 23358 (-)
Community: 1897 (+) (2.696 %), 2044 (-) (8.047 %)

Summary for: drivers/net/ethernet/intel/e1000e/
Total: 39760 (+), 9924 (-)
Own: 38514 (+), 8905 (-)
Community: 1246 (+) (3.134 %), 1019 (-) (10.268 %)

I admit this is simplistic because both mlxsw and mlx5 drivers helped 
greatly improve the networking stack in parts that I benefited directly 
from within DSA for instance.

The point is, you paid the maintenance price though, the community did not.
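
For context, per-driver totals like the ones quoted above can be
approximated from git history; below is a sketch of the computation.
The directory path and the awk pass are illustrative assumptions, not
the exact script used to produce the numbers in this mail:

```shell
# Illustrative only: sum additions/deletions for one driver directory.
# In a real tree the input would come from something like:
#   git log --numstat --format= -- drivers/net/ethernet/meta/fbnic
# Here a small hard-coded sample stands in for that output
# (columns: added, deleted, path).
numstat='10	2	drivers/net/ethernet/meta/fbnic/fbnic.h
300	40	drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
5	1	drivers/net/ethernet/meta/fbnic/fbnic_pci.c'

printf '%s\n' "$numstat" | awk '
	{ add += $1; del += $2 }
	END { printf "Total: %d (+), %d (-)\n", add, del }'
# prints: Total: 315 (+), 43 (-)
```

Splitting "Own" from "Community" lines would additionally need an
author filter (e.g. `--author=`) on the git invocation.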

> 
> 
>> about their intentions and the future.
> 
> Heh, the intention is pretty clear from this discussion, isn't it? If
> they ever by any chance decide to go public with their device, driver
> for that could be submitted at a time. But this is totally hypothetical.

I think your opposition is unreasonable and unfair. Taking your 
argument to its extreme, I might go as far as saying that it encourages 
working out of tree, rather than in tree. This is the exact opposite of 
what made Linux successful as an OS.

Can I buy a Spectrum switch off Amazon?
-- 
Florian



* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 17:42                                         ` Florian Fainelli
@ 2024-04-09 18:38                                           ` Leon Romanovsky
  0 siblings, 0 replies; 163+ messages in thread
From: Leon Romanovsky @ 2024-04-09 18:38 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Jiri Pirko, Alexander Duyck, Jason Gunthorpe, Paolo Abeni,
	Jakub Kicinski, John Fastabend, netdev, bhelgaas, linux-pci,
	Alexander Duyck, davem, Christoph Hellwig

On Tue, Apr 09, 2024 at 10:42:44AM -0700, Florian Fainelli wrote:
> On 4/9/24 07:28, Jiri Pirko wrote:
> > Tue, Apr 09, 2024 at 03:05:47PM CEST, f.fainelli@gmail.com wrote:
> > > 

<...>

> Can I buy a Spectrum switch off Amazon?

You can buy latest Spectrum generation from eBay.
https://www.ebay.com/itm/145138400557?itmmeta=01HV2247YN9ENJHP6T4YSN2HP7&hash=item21caec3d2d:g:qWoAAOSwzEZjiq2z&itmprp=enc%3AAQAJAAAA4CjGKSBxVTaO07qnpLBQGBwBJdGVCYhu730MrI5AC6E%2BERJLxS0EdlgE2gKtElk%2FZUj6A9DQR69IXIpTk%2FJbqEGKgCNR4d6TMz6i%2BogO02ZZBCkpLkPOYpOvDJV1jRv2KVOBt7i5k5pUMRJpwPKzAH%2Fwf6tglPOId2d9fSy%2BBM3MbDcbZfkv4V%2FNbItTgspvDnnMKAzUmR3Rs9%2FoHVDVbU4ZsnfRKFOMaKbGH5j%2Fani2jAqtbPEIA3H8nUcdpkxo4I61N5w9peLlN6Hkj8E8irdQY4TTzSTdYZ7EC9JG09cQ%7Ctkp%3ABk9SR8T_kMLYYw

Thanks

> -- 
> Florian
> 
> 


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 17:12                                     ` Jason Gunthorpe
@ 2024-04-09 18:38                                       ` Alexander Duyck
  2024-04-09 18:54                                         ` Jason Gunthorpe
  2024-04-09 19:15                                         ` Leon Romanovsky
  0 siblings, 2 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-09 18:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Jakub Kicinski, John Fastabend, Jiri Pirko,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem, pabeni

On Tue, Apr 9, 2024 at 10:12 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Apr 09, 2024 at 09:31:06AM -0700, Alexander Duyck wrote:
>
> > > expectation is generally things like:
> > >
> > >  - The bug is fixed immediately because the issue is obvious to the
> > >    author
> > >  - Iteration and rapid progress is seen toward enlightening the author
> > >  - The patch is reverted, often rapidly, try again later with a good
> > >    patch
> >
> > When working on a development branch that shouldn't be the
> > expectation. I suspect that is why the revert was pushed back on
> > initially. The developer wanted a chance to try to debug and resolve
> > the issue with root cause.
>
> Even mm-unstable drops patches on a hair trigger, as an example.
>
> You can't have an orderly development process if your development tree
> is broken in your CI.. Personally I'm grateful for the people who test
> linux-next (or the various constituent sub trees), it really helps.
>
> > Well much of it has to do with the fact that this is supposed to be a
> > community. Generally I help you, you help me and together we both make
> > progress. So within the community people tend to build up what we
> > could call karma. Generally I think some of the messages sent seemed
> > to make it come across that the Mellanox/Nvidia folks felt it "wasn't
> > their problem" so they elicited a bit of frustration from the other
> > maintainers and built up some negative karma.
>
> How could it be NVIDIA folks problem? They are not experts in TCP and
> can't debug it. The engineer running the CI systems did what he was
> asked by Eric from what I can tell.

No, I get your message. I wasn't saying it was your problem. All that
can be asked for is such cooperation. Like I said I think some of the
problem was the messaging more than the process.

> > phenomenon where if we even brushed against a block of upstream code
> > that wasn't being well maintained we would be asked to fix it up and
> > address existing issues before we could upstream any patches.
>
> Well, Intel has it's own karma problems in the kernel community. :(

Oh, I know. I resisted the urge to push out the driver as "idgaf:
Internal Device Generated at Facebook" on April 1st instead of "fbnic"
to poke fun at the presentation they did at Netdev 0x16 where they
were trying to say all the vendors should be implementing "idpf" since
they made it a standard.

> > > In my view the vendor/!vendor distinction is really toxic and should
> > > stop.
> >
> > I agree. However that was essentially what started all this when Jiri
> > pointed out that we weren't selling the NIC to anyone else. That made
> > this all about vendor vs !vendor,
>
> That is not how I would sum up Jiri's position.
>
> By my read he is saying that contributing code to the kernel that only
> Meta can actually use is purely extractive. It is not about vendor or
> !vendor, it is taking-free-forwardporting or not. You have argued,
> and I would agree, that there is a grey scale between
> extractive/collaborative - but I also agree with Jiri that fbnic is
> undeniably far toward the extractive side.
>
> If being extractive is a problem in this case or not is another
> question, but I would say Jiri's objection is definitely not about
> selling or vendor vs !vendor.
>
> Jason

It all depends on your definition of being extractive. I would assume
a "consumer" that is running a large number of systems and is capable
of providing sophisticated feedback on issues found within the kernel,
in many cases providing fixes for said issues, or working with
maintainers on resolution of said issues, is not extractive.

The fact that said "consumer" then decides to produce their own device,
becoming a "prosumer", means they are now able to more accurately and
quickly diagnose issues when they see them. They can design things
such that there isn't some black box of firmware, or a third party
driver, in the datapath that prevents them from quickly diagnosing the
issue. So if anything I would think that is a net positive for the
community as it allows the "prosumer" to provide much more quick and
useful feedback on issues found in the kernel rather than having to
wait on a third party vendor to provide additional input.

Note I am not going after any particular vendor with my comments. This
applies to all vendors. The problem as a customer is that you are
limited on what you can do once you find an issue. Quite often you are
at the mercy of the vendor in such cases, especially when there seems
to be either firmware or "security" issues involved.

Thanks,

- Alex


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 18:38                                       ` Alexander Duyck
@ 2024-04-09 18:54                                         ` Jason Gunthorpe
  2024-04-09 20:03                                           ` Alexander Duyck
  2024-04-09 19:15                                         ` Leon Romanovsky
  1 sibling, 1 reply; 163+ messages in thread
From: Jason Gunthorpe @ 2024-04-09 18:54 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Leon Romanovsky, Jakub Kicinski, John Fastabend, Jiri Pirko,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem, pabeni

On Tue, Apr 09, 2024 at 11:38:59AM -0700, Alexander Duyck wrote:
> > > phenomenon where if we even brushed against a block of upstream code
> > > that wasn't being well maintained we would be asked to fix it up and
> > > address existing issues before we could upstream any patches.
> >
> > Well, Intel has its own karma problems in the kernel community. :(
> 
> Oh, I know. I resisted the urge to push out the driver as "idgaf:
> Internal Device Generated at Facebook" on April 1st instead of
> "fbnic"

That would have been hilarious!

> to poke fun at the presentation they did at Netdev 0x16 where they
> were trying to say all the vendors should be implementing "idpf" since
> they made it a standard.

Yes, I noticed this also. For all the worries I've heard lately about
lack of commonality/etc it seems like a major missed ecosystem
opportunity to have not invested in an industry standard. From what I
can see fbnic has no hope of being anything more than a one-off
generation for Meta. Too many silicon design micro-details are exposed
to the OS.

> It all depends on your definition of being extractive. I would assume
> a "consumer" that is running a large number of systems and is capable
> of providing sophisticated feedback on issues found within the kernel,
> in many cases providing fixes for said issues, or working with
> maintainers on resolution of said issues, is not extractive.

I don't know, as I said there is some grey scale.

IMHO it is not appropriate to make such decisions based on some
company-wide metric. The fbnic team alone should be judged and shouldn't
get a free ride based on the other good work Meta is doing. Otherwise
it turns into a thing where bigger/richer companies just get to do
whatever they want because they do the most "good" in aggregate.

Jason


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 18:38                                       ` Alexander Duyck
  2024-04-09 18:54                                         ` Jason Gunthorpe
@ 2024-04-09 19:15                                         ` Leon Romanovsky
  1 sibling, 0 replies; 163+ messages in thread
From: Leon Romanovsky @ 2024-04-09 19:15 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jason Gunthorpe, Jakub Kicinski, John Fastabend, Jiri Pirko,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem, pabeni

On Tue, Apr 09, 2024 at 11:38:59AM -0700, Alexander Duyck wrote:
> On Tue, Apr 9, 2024 at 10:12 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Tue, Apr 09, 2024 at 09:31:06AM -0700, Alexander Duyck wrote:
> >
> > > > expectation is generally things like:
> > > >
> > > >  - The bug is fixed immediately because the issue is obvious to the
> > > >    author
> > > >  - Iteration and rapid progress is seen toward enlightening the author
> > > >  - The patch is reverted, often rapidly, try again later with a good
> > > >    patch
> > >
> > > When working on a development branch that shouldn't be the
> > > expectation. I suspect that is why the revert was pushed back on
> > > initially. The developer wanted a chance to try to debug and resolve
> > > the issue with root cause.
> >
> > Even mm-unstable drops patches on a hair trigger, as an example.
> >
> > You can't have an orderly development process if your development tree
> > is broken in your CI.. Personally I'm grateful for the people who test
> > linux-next (or the various constituent sub trees), it really helps.
> >
> > > Well much of it has to do with the fact that this is supposed to be a
> > > community. Generally I help you, you help me and together we both make
> > > progress. So within the community people tend to build up what we
> > > could call karma. Generally I think some of the messages sent seemed
> > > to make it come across that the Mellanox/Nvidia folks felt it "wasn't
> > > their problem" so they elicited a bit of frustration from the other
> > > maintainers and built up some negative karma.
> >
> > How could it be the NVIDIA folks' problem? They are not experts in TCP and
> > can't debug it. The engineer running the CI systems did what he was
> > asked by Eric from what I can tell.
> 
> No, I get your message. I wasn't saying it was your problem. All that
> can be asked for is such cooperation. Like I said, I think some of the
> problem was the messaging more than the process.

The patch with the revert came a month+ after we reported the issue and
were ready to do anything to find the root cause, so it is not a messaging
issue, it was an exclusion-from-the-process issue.

I tried to avoid writing the below, but because Jason brought it up
already, I'll write my feelings.

The current netdev has a very toxic environment, with a binary separation
into vendors and not-vendors.

Vendors are the bad guys who day and night try to cheat and sneak their
dirty hacks into the kernel. Their contributions are negligible and
can't be trusted by definition.

Luckily enough, there are some "not-vendors", and they are the good
guys who know what is best for the community and all the rest of the world.

Thanks

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 18:54                                         ` Jason Gunthorpe
@ 2024-04-09 20:03                                           ` Alexander Duyck
  2024-04-09 23:11                                             ` Jason Gunthorpe
  2024-04-10  9:37                                             ` Jiri Pirko
  0 siblings, 2 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-09 20:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Jakub Kicinski, John Fastabend, Jiri Pirko,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem, pabeni

On Tue, Apr 9, 2024 at 11:55 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Apr 09, 2024 at 11:38:59AM -0700, Alexander Duyck wrote:
> > > > phenomenon where if we even brushed against a block of upstream code
> > > > that wasn't being well maintained we would be asked to fix it up and
> > > > address existing issues before we could upstream any patches.
> > >
> > > Well, Intel has its own karma problems in the kernel community. :(
> >
> > Oh, I know. I resisted the urge to push out the driver as "idgaf:
> > Internal Device Generated at Facebook" on April 1st instead of
> > "fbnic"
>
> That would have been hilarious!
>
> > to poke fun at the presentation they did at Netdev 0x16 where they
> > were trying to say all the vendors should be implementing "idpf" since
> > they made it a standard.
>
> Yes, I noticed this also. For all the worries I've heard lately about
> lack of commonality/etc it seems like a major missed ecosystem
> opportunity to have not invested in an industry standard. From what I
> can see fbnic has no hope of being anything more than a one-off
> generation for Meta. Too many silicon design micro-details are exposed
> to the OS.

I know. The fact is we aren't trying to abstract away anything, as that
would mean a larger firmware blob. That is the problem with an
abstraction like idpf: it just adds more overhead, as you have to have
the firmware manage more of the control plane.

> > It all depends on your definition of being extractive. I would assume
> > a "consumer" that is running a large number of systems and is capable
> > of providing sophisticated feedback on issues found within the kernel,
> > in many cases providing fixes for said issues, or working with
> > maintainers on resolution of said issues, is not extractive.
>
> I don't know, as I said there is some grey scale.
>
> IMHO it is not appropriate to make such decisions based on some
> company wide metric. fbnic team alone should be judged and shouldn't
> get a free ride based on the other good work Meta is doing. Otherwise
> it turns into a thing where bigger/richer companies just get to do
> whatever they want because they do the most "good" in aggregate.

The problem in this case is that I am pretty much the heart of
the software driver team, with a few new hires onboarding over the next
couple of months. People were asking why others were jumping to my
defense; well, if we are going to judge the team, they are mostly
judging me. I'm just hoping my reputation speaks for itself,
considering I kept making significant contributions to the drivers even
after going through several changes of employer.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
                   ` (17 preceding siblings ...)
  2024-04-05 14:01 ` Przemek Kitszel
@ 2024-04-09 20:51 ` Jakub Kicinski
  2024-04-09 21:06   ` Willem de Bruijn
                     ` (2 more replies)
  18 siblings, 3 replies; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-09 20:51 UTC (permalink / raw)
  To: pabeni, John Fastabend, Alexander Lobakin, Florian Fainelli,
	Andrew Lunn, Daniel Borkmann, Edward Cree
  Cc: Alexander Duyck, netdev, bhelgaas, linux-pci, Alexander Duyck,
	Willem de Bruijn

On Wed, 03 Apr 2024 13:08:24 -0700 Alexander Duyck wrote:
> This patch set includes the necessary patches to enable basic Tx and Rx
> over the Meta Platforms Host Network Interface. To do this we introduce a
> new driver and directories in the form of
> "drivers/net/ethernet/meta/fbnic".

Let me try to restate some takeaways and ask for further clarification
on the main question...

First, I think there's broad support for merging the driver itself.

IIUC there is also broad support to raise the expectations from
maintainers of drivers for private devices, specifically that they will:
 - receive weaker "no regression" guarantees
 - help with refactoring / adapting their drivers more actively
 - not get upset when we delete those drivers if they stop participating

If you think that the drivers should be merged *without* setting these
expectations, please speak up.

Nobody picked me up on the suggestion to use the CI as a proactive
check whether the maintainer / owner is still paying attention, 
but okay :(


What is less clear to me is what we do about uAPI / core changes.
Of those who touched on the subject - a few people seem to be curious /
welcoming to any reasonable features coming out for private devices
(John, Olek, Florian)? Others are more cautious, focusing on blast
radius and referring to the "two driver rule" (Daniel, Paolo)?
Whether that means an outright ban on touching common code or uAPI
in ways which aren't exercised by commercial NICs is unclear.
Andrew and Ed did not address the question directly AFAICT.

Is my reading correct? Does anyone have an opinion on whether we should
try to dig more into this question prior to merging the driver, and
set some ground rules? Or proceed and learn by doing?

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 20:51 ` Jakub Kicinski
@ 2024-04-09 21:06   ` Willem de Bruijn
  2024-04-10  7:26     ` Jiri Pirko
  2024-04-09 23:42   ` Andrew Lunn
  2024-04-10  7:42   ` Jiri Pirko
  2 siblings, 1 reply; 163+ messages in thread
From: Willem de Bruijn @ 2024-04-09 21:06 UTC (permalink / raw)
  To: Jakub Kicinski, pabeni, John Fastabend, Alexander Lobakin,
	Florian Fainelli, Andrew Lunn, Daniel Borkmann, Edward Cree
  Cc: Alexander Duyck, netdev, bhelgaas, linux-pci, Alexander Duyck,
	Willem de Bruijn

Jakub Kicinski wrote:
> On Wed, 03 Apr 2024 13:08:24 -0700 Alexander Duyck wrote:
> > This patch set includes the necessary patches to enable basic Tx and Rx
> > over the Meta Platforms Host Network Interface. To do this we introduce a
> > new driver and directories in the form of
> > "drivers/net/ethernet/meta/fbnic".
> 
> Let me try to restate some takeaways and ask for further clarification
> on the main question...
> 
> First, I think there's broad support for merging the driver itself.
> 
> IIUC there is also broad support to raise the expectations from
> maintainers of drivers for private devices, specifically that they will:
>  - receive weaker "no regression" guarantees
>  - help with refactoring / adapting their drivers more actively
>  - not get upset when we delete those drivers if they stop participating
> 
> If you think that the drivers should be merged *without* setting these
> expectations, please speak up.
> 
> Nobody picked me up on the suggestion to use the CI as a proactive
> check whether the maintainer / owner is still paying attention, 
> but okay :(
> 
> 
> What is less clear to me is what do we do about uAPI / core changes.
> Of those who touched on the subject - few people seem to be curious /
> welcoming to any reasonable features coming out for private devices
> (John, Olek, Florian)? Others are more cautious focusing on blast
> radius and referring to the "two driver rule" (Daniel, Paolo)?
> Whether that means outright ban on touching common code or uAPI
> in ways which aren't exercised by commercial NICs, is unclear. 
> Andrew and Ed did not address the question directly AFAICT.
> 
> Is my reading correct? Does anyone have an opinion on whether we should
> try to dig more into this question prior to merging the driver, and
> set some ground rules? Or proceed and learn by doing?

Thanks for summarizing. That was my reading too.

Two distinct questions:

1. whether a standard driver is admissible if the device is not
   available on the open market.

2. whether new device features can be supported without at least
   two available devices supporting it.

FWIW, +1 for 1 from me. Any serious device that exists in quantity
and is properly maintained should be in-tree.

In terms of trusting Meta, it is less about karma and more about an
indication of these two requirements when the driver first appears. We
would not want to merge vaporware drivers from unknown sources or
university research projects.

2 is out of scope for this series. But I would always want to hear
about potential new features that an organization finds valuable
enough to implement, rather than have a blanket rule against them.


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 20:03                                           ` Alexander Duyck
@ 2024-04-09 23:11                                             ` Jason Gunthorpe
  2024-04-10  9:37                                             ` Jiri Pirko
  1 sibling, 0 replies; 163+ messages in thread
From: Jason Gunthorpe @ 2024-04-09 23:11 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Leon Romanovsky, Jakub Kicinski, John Fastabend, Jiri Pirko,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem, pabeni

On Tue, Apr 09, 2024 at 01:03:06PM -0700, Alexander Duyck wrote:
> People were asking why others were jumping to my defense, well if we
> are going to judge the team they are mostly judging me. I'm just
> hoping my reputation has spoken for itself considering I was making
> significant contributions to the drivers even after I have gone
> through several changes of employer.

I don't think anything in this thread is about you personally. You
were just given a "bad vendor" task by your employer and you carried
it out. As is usual with these things there is legitimate disagreement
on what it means to be a "bad vendor".

Jason

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 20:51 ` Jakub Kicinski
  2024-04-09 21:06   ` Willem de Bruijn
@ 2024-04-09 23:42   ` Andrew Lunn
  2024-04-10 15:56     ` Alexander Duyck
  2024-04-10  7:42   ` Jiri Pirko
  2 siblings, 1 reply; 163+ messages in thread
From: Andrew Lunn @ 2024-04-09 23:42 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: pabeni, John Fastabend, Alexander Lobakin, Florian Fainelli,
	Daniel Borkmann, Edward Cree, Alexander Duyck, netdev, bhelgaas,
	linux-pci, Alexander Duyck, Willem de Bruijn

> What is less clear to me is what do we do about uAPI / core changes.

I would differentiate between core changes and core additions. If there
is very limited firmware on this device, i assume Linux is managing
the SFP cage, and to some extent the PCS. Extending the core to handle
these at higher speeds than currently supported would be one such core
addition. I've no problem with this. And i doubt it will be a single
NIC using such additions for too long. It looks like the ClearFog CX LX2
could make use of such extensions as well, and there are probably
other boards and devices, maybe the Zynq 7000?

> Is my reading correct? Does anyone have an opinion on whether we should
> try to dig more into this question prior to merging the driver, and
> set some ground rules? Or proceed and learn by doing?

I'm not too keen on keeping potentially shareable code in the driver
just because of UEFI. It has long been the norm that you should not
have wrappers so you can reuse code in different OSes. And UEFI is
just another OS. So i really would like to see a Linux I2C bus master
driver, a Linux GPIO driver if appropriate, and phylink being used, just
as i've pushed wangxun to do, and to some extent nvidia with their
GPIO controller embedded in their NIC. The nice thing is, the
developers for wangxun have mostly solved all this for a PCIe device,
so their code can be copied.

Do we need to set some ground rules? No. I can give similar feedback
to what i gave the wangxun developers if Linux subsystems are not used
appropriately.

       Andrew

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 21:06   ` Willem de Bruijn
@ 2024-04-10  7:26     ` Jiri Pirko
  2024-04-10 21:30       ` Jacob Keller
  0 siblings, 1 reply; 163+ messages in thread
From: Jiri Pirko @ 2024-04-10  7:26 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Jakub Kicinski, pabeni, John Fastabend, Alexander Lobakin,
	Florian Fainelli, Andrew Lunn, Daniel Borkmann, Edward Cree,
	Alexander Duyck, netdev, bhelgaas, linux-pci, Alexander Duyck

Tue, Apr 09, 2024 at 11:06:05PM CEST, willemdebruijn.kernel@gmail.com wrote:
>Jakub Kicinski wrote:
>> On Wed, 03 Apr 2024 13:08:24 -0700 Alexander Duyck wrote:

[...]

>
>2. whether new device features can be supported without at least
>   two available devices supporting it.
>

[...]

>
>2 is out of scope for this series. But I would always want to hear
>about potential new features that an organization finds valuable
>enough to implement. Rather than a blanket rule against them.

This appears out of nowhere. In the past, I would say the vast majority
of features were merged with a single device implementation. Often, it
is the only device out there at the time that supports the feature.
This limitation would put a brake on feature additions. I could put
together a long list of features that would not be here otherwise (like
50% of the mlxsw driver).

>
>

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 20:51 ` Jakub Kicinski
  2024-04-09 21:06   ` Willem de Bruijn
  2024-04-09 23:42   ` Andrew Lunn
@ 2024-04-10  7:42   ` Jiri Pirko
  2024-04-10 12:50     ` Przemek Kitszel
  2024-04-10 13:46     ` Jakub Kicinski
  2 siblings, 2 replies; 163+ messages in thread
From: Jiri Pirko @ 2024-04-10  7:42 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: pabeni, John Fastabend, Alexander Lobakin, Florian Fainelli,
	Andrew Lunn, Daniel Borkmann, Edward Cree, Alexander Duyck,
	netdev, bhelgaas, linux-pci, Alexander Duyck, Willem de Bruijn

Tue, Apr 09, 2024 at 10:51:42PM CEST, kuba@kernel.org wrote:
>On Wed, 03 Apr 2024 13:08:24 -0700 Alexander Duyck wrote:
>> This patch set includes the necessary patches to enable basic Tx and Rx
>> over the Meta Platforms Host Network Interface. To do this we introduce a
>> new driver and directories in the form of
>> "drivers/net/ethernet/meta/fbnic".
>
>Let me try to restate some takeaways and ask for further clarification
>on the main question...
>
>First, I think there's broad support for merging the driver itself.
>
>IIUC there is also broad support to raise the expectations from
>maintainers of drivers for private devices, specifically that they will:
> - receive weaker "no regression" guarantees
> - help with refactoring / adapting their drivers more actively

:)


> - not get upset when we delete those drivers if they stop participating

Sorry for being a pain, but I would still like to see some summarization
of what the community actually gains by merging this unused driver.
So far, I don't recall reading anything solid.

btw:
The Kconfig description should contain:
 Say N here, you can't ever see this device in the real world.


>
>If you think that the drivers should be merged *without* setting these
>expectations, please speak up.
>
>Nobody picked me up on the suggestion to use the CI as a proactive
>check whether the maintainer / owner is still paying attention, 
>but okay :(
>
>
>What is less clear to me is what do we do about uAPI / core changes.
>Of those who touched on the subject - few people seem to be curious /
>welcoming to any reasonable features coming out for private devices
>(John, Olek, Florian)? Others are more cautious focusing on blast
>radius and referring to the "two driver rule" (Daniel, Paolo)?
>Whether that means outright ban on touching common code or uAPI
>in ways which aren't exercised by commercial NICs, is unclear. 

For this kind of unused driver, I think it would be legit to
disallow any internal/external API changes. Just do that for some
normal driver first, then let the unused driver benefit from the changes.

Now the question is, how do we distinguish these 2 kinds of driver? Maybe
put them under some directory so it is clear?
drivers/net/unused/ethernet/meta/fbnic/


>Andrew and Ed did not address the question directly AFAICT.
>
>Is my reading correct? Does anyone have an opinion on whether we should
>try to dig more into this question prior to merging the driver, and
>set some ground rules? Or proceed and learn by doing?
>

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 20:03                                           ` Alexander Duyck
  2024-04-09 23:11                                             ` Jason Gunthorpe
@ 2024-04-10  9:37                                             ` Jiri Pirko
  1 sibling, 0 replies; 163+ messages in thread
From: Jiri Pirko @ 2024-04-10  9:37 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jason Gunthorpe, Leon Romanovsky, Jakub Kicinski, John Fastabend,
	netdev, bhelgaas, linux-pci, Alexander Duyck, davem, pabeni

Tue, Apr 09, 2024 at 10:03:06PM CEST, alexander.duyck@gmail.com wrote:
>On Tue, Apr 9, 2024 at 11:55 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>>
>> On Tue, Apr 09, 2024 at 11:38:59AM -0700, Alexander Duyck wrote:
>> > > > phenomenon where if we even brushed against a block of upstream code
>> > > > that wasn't being well maintained we would be asked to fix it up and
>> > > > address existing issues before we could upstream any patches.
>> > >
>> > > Well, Intel has its own karma problems in the kernel community. :(
>> >
>> > Oh, I know. I resisted the urge to push out the driver as "idgaf:
>> > Internal Device Generated at Facebook" on April 1st instead of
>> > "fbnic"
>>
>> That would have been hilarious!
>>
>> > to poke fun at the presentation they did at Netdev 0x16 where they
>> > were trying to say all the vendors should be implementing "idpf" since
>> > they made it a standard.
>>
>> Yes, I noticed this also. For all the worries I've heard lately about
>> lack of commonality/etc it seems like a major missed ecosystem
>> opportunity to have not invested in an industry standard. From what I
>> can see fbnic has no hope of being anything more than a one-off
>> generation for Meta. Too many silicon design micro-details are exposed
>> to the OS.
>
>I know. The fact is we aren't trying to abstract away anything as that
>would mean a larger firmware blob. That is the problem with an
>abstraction like idpf is that it just adds more overhead as you have
>to have the firmware manage more of the control plane.
>
>> > It all depends on your definition of being extractive. I would assume
>> > a "consumer" that is running a large number of systems and is capable
>> > of providing sophisticated feedback on issues found within the kernel,
>> > in many cases providing fixes for said issues, or working with
>> > maintainers on resolution of said issues, is not extractive.
>>
>> I don't know, as I said there is some grey scale.
>>
>> IMHO it is not appropriate to make such decisions based on some
>> company wide metric. fbnic team alone should be judged and shouldn't
>> get a free ride based on the other good work Meta is doing. Otherwise
>> it turns into a thing where bigger/richer companies just get to do
>> whatever they want because they do the most "good" in aggregate.
>
>The problem here in this case is that I am pretty much the heart of
>the software driver team with a few new hires onboarding the next
>couple months. People were asking why others were jumping to my
>defense, well if we are going to judge the team they are mostly
>judging me. I'm just hoping my reputation has spoken for itself
>considering I was making significant contributions to the drivers even
>after I have gone through several changes of employer.

Let me clearly state two things this thread keeps running around all the
time:
1) This has nothing to do with you, Alex, at all. If John Doe was the one
   pushing this, from my perspective, everything would be exactly the same.
2) This is not about selling devices. I already stressed that
   multiple times, yet people are still talking about selling. This is
   about the possibility of a person outside Meta meeting the device in the
   real world with a real usecase (not emulated, that does not make any
   sense).

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 14:41                                       ` Jiri Pirko
@ 2024-04-10 11:45                                         ` Alexander Lobakin
  2024-04-10 12:12                                           ` Jiri Pirko
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Lobakin @ 2024-04-10 11:45 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: John Fastabend, Alexander Duyck, Jason Gunthorpe, Paolo Abeni,
	Jakub Kicinski, netdev, bhelgaas, linux-pci, Alexander Duyck,
	davem, Christoph Hellwig

From: Jiri Pirko <jiri@resnulli.us>
Date: Tue, 9 Apr 2024 16:41:10 +0200

> Tue, Apr 09, 2024 at 03:11:21PM CEST, aleksander.lobakin@intel.com wrote:
>> From: Jiri Pirko <jiri@resnulli.us>
>> Date: Tue, 9 Apr 2024 13:01:51 +0200
>>
>>> Mon, Apr 08, 2024 at 07:32:59PM CEST, john.fastabend@gmail.com wrote:
>>>> Jiri Pirko wrote:
>>>>> Mon, Apr 08, 2024 at 05:46:35PM CEST, alexander.duyck@gmail.com wrote:
>>>>>> On Mon, Apr 8, 2024 at 4:51 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>>>>>>
>>>>>>> Fri, Apr 05, 2024 at 08:38:25PM CEST, alexander.duyck@gmail.com wrote:
>>>>>>>> On Fri, Apr 5, 2024 at 8:17 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>>>>>>>>>
>>>>>>>>> On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
>>>>>>>>>>> Alex already indicated new features are coming, changes to the core
>>>>>>>>>>> code will be proposed. How should those be evaluated? Hypothetically
>>>>>>>>>>> should fbnic be allowed to be the first implementation of something
>>>>>>>>>>> invasive like Mina's DMABUF work? Google published an open userspace
>>>>>>>>>>> for NCCL that people can (in theory at least) actually run. Meta would
>>>>>>>>>>> not be able to do that. I would say that clearly crosses the line and
>>>>>>>>>>> should not be accepted.
>>>>>>>>>>
>>>>>>>>>> Why not? Just because we are not commercially selling it doesn't mean
>>>>>>>>>> we couldn't look at other solutions such as QEMU. If we were to
>>>>>>>>>> provide a github repo with an emulation of the NIC would that be
>>>>>>>>>> enough to satisfy the "commercial" requirement?
>>>>>>>>>
>>>>>>>>> My test is not "commercial", it is enabling open source ecosystem vs
>>>>>>>>> benefiting only proprietary software.
>>>>>>>>
>>>>>>>> Sorry, that was where this started where Jiri was stating that we had
>>>>>>>> to be selling this.
>>>>>>>
>>>>>>> For the record, I never wrote that. Not sure why you repeat this over
>>>>>>> this thread.
>>>>>>
>>>>>> Because you seem to be implying that the Meta NIC driver shouldn't be
>>>>>> included simply since it isn't going to be available outside of Meta.
>>
>> BTW idpf is also not something you can go and buy in a store, but it's
>> here in the kernel. Anyway, see below.
> 
> IDK why so many people in this thread are so focused on "buying" a NIC.
> An IDPF device is something I assume one may see on a VM hosted in some
> cloud, isn't it? If yes, it is completely legit to have it in. Am I
> missing something?

Anyhow, we want the upstream Linux kernel to work out of the box on most
systems. Rejecting this driver basically encourages people to keep
preferring OOT/proprietary crap.

> 
> 
>>
>>>>>> The fact is Meta employs a number of kernel developers and as a result
>>>>>> of that there will be a number of kernel developers that will have
>>>>>> access to this NIC and likely do development on systems containing it.
>>
>> [...]
>>
>>>> Vendors would happily spin up a NIC if a DC with scale like this
>>>> would pay for it. They just don't advertise it in patch 0/X,
>>>> "adding device for cloud provider foo".
>>>>
>>>> There is no difference here. We gain developers, we gain insights,
>>>> learnings and Linux and OSS drivers are running on another big
>>>> DC. They improve things and find bugs they upstream them its a win.
>>>>
>>>> The opposite is also true if we exclude a driver/NIC HW that is
>>>> running on major DCs we lose a lot of insight, experience, value.
>>>
>>> Could you please describe in detail, with examples, what exactly we
>>> are about to lose? I don't see it.
>>
>> As long as driver A introduces new features / improvements / API /
>> whatever to the core kernel, we benefit from this no matter whether I'm
>> actually able to run this driver on my system.
>>
>> Some drivers even give us benefit by that they are of good quality (I
>> don't speak for this driver, just some hypothetical) and/or have
>> interesting design / code / API / etc. choices. The drivers I work on
>> did gain a lot just from that I was reading new commits / lore threads
>> and look at changes in other drivers.
>>
>> I have seen enough situations where driver A started doing something in
>> a way it had never been done anywhere before, and then more and more
>> drivers started doing the same thing, and in the end it became sort of a
>> standard.
> 
> So the bottom line is, the unused driver *may* introduce some features and
> *may* serve as an example of how to do things for other people.
> Is this really so beneficial for the community that it outweighs
> the obvious cons (not going to repeat them)?
> 
> Like with any other patch/set we merge in, we always look at the cons
> and pros. I'm honestly surprised that so many people here
> want to make an exception for Meta's internal toy project.

It's not me who wants to make an exception. I haven't ever seen a driver
rejected because "it can be used only somewhere I can't go", so it looks
like it's you who wants to make an exception :>

Thanks,
Olek

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-09 15:08     ` Alexander Duyck
@ 2024-04-10 11:54       ` Yunsheng Lin
  2024-04-10 15:03         ` Alexander Duyck
  0 siblings, 1 reply; 163+ messages in thread
From: Yunsheng Lin @ 2024-04-10 11:54 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On 2024/4/9 23:08, Alexander Duyck wrote:
> On Tue, Apr 9, 2024 at 4:47 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2024/4/4 4:09, Alexander Duyck wrote:
>>> From: Alexander Duyck <alexanderduyck@fb.com>
> 
> [...]
> 
>>> +     /* Unmap and free processed buffers */
>>> +     if (head0 >= 0)
>>> +             fbnic_clean_bdq(nv, budget, &qt->sub0, head0);
>>> +     fbnic_fill_bdq(nv, &qt->sub0);
>>> +
>>> +     if (head1 >= 0)
>>> +             fbnic_clean_bdq(nv, budget, &qt->sub1, head1);
>>> +     fbnic_fill_bdq(nv, &qt->sub1);
>>
>> I am not sure how complicated the Rx handling will be for the advanced
>> features. For the current code, each entry/desc in both qt->sub0 and
>> qt->sub1 needs at least one page, and the page seems to be used only
>> once no matter how little of the page is used?
>>
>> I am assuming you want to do a 'tightly optimized' operation for this by
>> calling page_pool_fragment_page(), but manipulating page->pp_ref_count
>> directly does not seem to add any value for the current code, and seems
>> to waste a lot of memory by not using the frag API, especially with
>> PAGE_SIZE > 4K?
> 
> On this hardware both the header and payload buffers are fragmentable.
> The hardware decides the partitioning and we just follow it. So for
> example it wouldn't be uncommon to have a jumbo frame split up such
> that the header is less than 128B plus SKB overhead while the actual
> data in the payload is just over 1400. So for us fragmenting the pages
> is a very likely case especially with smaller packets.

I understand that is what you are trying to do, but the code above does
not seem to match the description: fbnic_clean_bdq() and fbnic_fill_bdq()
are called for both qt->sub0 and qt->sub1, so the pages just cleaned from
qt->sub0 and qt->sub1 are drained and each sub ring is refilled with new
pages, which does not seem to involve any fragmenting?

The fragmenting can only happen when there is a continuous stream of small
packets coming from the wire, so that the hw can report the same pg_id
with different pg_offsets for different packets before fbnic_clean_bdq()
and fbnic_fill_bdq() are called? I am not sure how to ensure that,
considering that we might break out of the while loop in fbnic_clean_rcq()
because of the 'packets < budget' check.

> 
> It is better for us to optimize for the small packet scenario than
> optimize for the case where 4K slices are getting taken. That way when
> we are CPU constrained handling small packets we are the most
> optimized whereas for the larger frames we can spare a few cycles to
> account for the extra overhead. The result should be a higher overall
> packets per second.

The problem is that small packets mean low utilization of the bandwidth,
as more of it is spent on headers instead of payload that is useful to
the user, so the question seems to be how often small packets are seen
on the wire?

> .
> 

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 11:45                                         ` Alexander Lobakin
@ 2024-04-10 12:12                                           ` Jiri Pirko
  0 siblings, 0 replies; 163+ messages in thread
From: Jiri Pirko @ 2024-04-10 12:12 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: John Fastabend, Alexander Duyck, Jason Gunthorpe, Paolo Abeni,
	Jakub Kicinski, netdev, bhelgaas, linux-pci, Alexander Duyck,
	davem, Christoph Hellwig

Wed, Apr 10, 2024 at 01:45:54PM CEST, aleksander.lobakin@intel.com wrote:
>From: Jiri Pirko <jiri@resnulli.us>
>Date: Tue, 9 Apr 2024 16:41:10 +0200
>
>> Tue, Apr 09, 2024 at 03:11:21PM CEST, aleksander.lobakin@intel.com wrote:
>>> From: Jiri Pirko <jiri@resnulli.us>
>>> Date: Tue, 9 Apr 2024 13:01:51 +0200
>>>
>>>> Mon, Apr 08, 2024 at 07:32:59PM CEST, john.fastabend@gmail.com wrote:
>>>>> Jiri Pirko wrote:
>>>>>> Mon, Apr 08, 2024 at 05:46:35PM CEST, alexander.duyck@gmail.com wrote:
>>>>>>> On Mon, Apr 8, 2024 at 4:51 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>>>>>>>
>>>>>>>> Fri, Apr 05, 2024 at 08:38:25PM CEST, alexander.duyck@gmail.com wrote:
>>>>>>>>> On Fri, Apr 5, 2024 at 8:17 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
>>>>>>>>>>>> Alex already indicated new features are coming, changes to the core
>>>>>>>>>>>> code will be proposed. How should those be evaluated? Hypothetically
>>>>>>>>>>>> should fbnic be allowed to be the first implementation of something
>>>>>>>>>>>> invasive like Mina's DMABUF work? Google published an open userspace
>>>>>>>>>>>> for NCCL that people can (in theory at least) actually run. Meta would
>>>>>>>>>>>> not be able to do that. I would say that clearly crosses the line and
>>>>>>>>>>>> should not be accepted.
>>>>>>>>>>>
>>>>>>>>>>> Why not? Just because we are not commercially selling it doesn't mean
>>>>>>>>>>> we couldn't look at other solutions such as QEMU. If we were to
>>>>>>>>>>> provide a github repo with an emulation of the NIC would that be
>>>>>>>>>>> enough to satisfy the "commercial" requirement?
>>>>>>>>>>
>>>>>>>>>> My test is not "commercial", it is enabling open source ecosystem vs
>>>>>>>>>> benefiting only proprietary software.
>>>>>>>>>
>>>>>>>>> Sorry, that was where this started where Jiri was stating that we had
>>>>>>>>> to be selling this.
>>>>>>>>
>>>>>>>> For the record, I never wrote that. Not sure why you repeat this over
>>>>>>>> this thread.
>>>>>>>
>>>>>>> Because you seem to be implying that the Meta NIC driver shouldn't be
>>>>>>> included simply since it isn't going to be available outside of Meta.
>>>
>>> BTW idpf is also not something you can go and buy in a store, but it's
>>> here in the kernel. Anyway, see below.
>> 
>> IDK why so many people in this thread are so focused on "buying" a nic.
>> An IDPF device is something I assume one may see on a VM hosted in some
>> cloud, isn't it? If yes, it is completely legit to have it in. Do I miss
>> something?
>
>Anyhow, we want the upstream Linux kernel to work out of the box on most
>systems. Rejecting this driver basically encourages people to still
>prefer OOT/proprietary crap.

Totally true. Out of the box on as many systems as possible. This is not
the case here. Can you show me an example of how a person outside of Meta
can benefit from using this driver out of the box?


>
>> 
>> 
>>>
>>>>>>> The fact is Meta employs a number of kernel developers and as a result
>>>>>>> of that there will be a number of kernel developers that will have
>>>>>>> access to this NIC and likely do development on systems containing it.
>>>
>>> [...]
>>>
>>>>> Vendors would happily spin up a NIC if a DC with scale like this
>>>>> would pay for it. They just don't advertise it in patch 0/X,
>>>>> "adding device for cloud provider foo".
>>>>>
>>>>> There is no difference here. We gain developers, we gain insights and
>>>>> learnings, and Linux and OSS drivers are running on another big
>>>>> DC. They improve things and find bugs, they upstream them; it's a win.
>>>>>
>>>>> The opposite is also true: if we exclude a driver/NIC HW that is
>>>>> running on major DCs we lose a lot of insight, experience, and value.
>>>>
>>>> Could you please describe in detail, with examples, what exactly we
>>>> are about to lose? I don't see it.
>>>
>>> As long as driver A introduces new features / improvements / API /
>>> whatever to the core kernel, we benefit from this no matter whether I'm
>>> actually able to run this driver on my system.
>>>
>>> Some drivers even give us benefit by that they are of good quality (I
>>> don't speak for this driver, just some hypothetical) and/or have
>>> interesting design / code / API / etc. choices. The drivers I work on
>>> did gain a lot just from that I was reading new commits / lore threads
>>> and look at changes in other drivers.
>>>
>>> I've seen enough situations where driver A started using/doing something
>>> in a way it wasn't ever done anywhere before, and then more and more
>>> drivers started doing the same thing until in the end it became sorta
>>> standard.
>> 
>> So the bottom line is, the unused driver *may* introduce some features
>> and *may* serve as an example of how to do things for other people.
>> Is this really so beneficial for the community that it outweighs
>> the obvious cons (not going to repeat them)?
>> 
>> Like with any other patch/set we merge in, we always look at the pros
>> and cons. I'm honestly surprised that so many people here want to make
>> an exception for Meta's internal toy project.
>
>It's not me who wants to make an exception. I haven't ever seen a driver
>rejected due to "it can be used only somewhere where I can't go", so
>looks like it's you who wants to make an exception :>

Could you please point me to some other existing driver for a device
that does not exist (/exists only in some person's backyard)?


>
>Thanks,
>Olek

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10  7:42   ` Jiri Pirko
@ 2024-04-10 12:50     ` Przemek Kitszel
  2024-04-10 13:46     ` Jakub Kicinski
  1 sibling, 0 replies; 163+ messages in thread
From: Przemek Kitszel @ 2024-04-10 12:50 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski
  Cc: pabeni, John Fastabend, Alexander Lobakin, Florian Fainelli,
	Andrew Lunn, Daniel Borkmann, Edward Cree, Alexander Duyck,
	netdev, bhelgaas, linux-pci, Alexander Duyck, Willem de Bruijn

On 4/10/24 09:42, Jiri Pirko wrote:
> Tue, Apr 09, 2024 at 10:51:42PM CEST, kuba@kernel.org wrote:
>> On Wed, 03 Apr 2024 13:08:24 -0700 Alexander Duyck wrote:
>>> This patch set includes the necessary patches to enable basic Tx and Rx
>>> over the Meta Platforms Host Network Interface. To do this we introduce a
>>> new driver and driver and directories in the form of
>>> "drivers/net/ethernet/meta/fbnic".
>>
>> Let me try to restate some takeaways and ask for further clarification
>> on the main question...
>>
>> First, I think there's broad support for merging the driver itself.
>>
>> IIUC there is also broad support to raise the expectations from
>> maintainers of drivers for private devices, specifically that they will:
>> - receive weaker "no regression" guarantees
>> - help with refactoring / adapting their drivers more actively
> 
> :)
> 
> 
>> - not get upset when we delete those drivers if they stop participating
> 
> Sorry for being a pain, but I would still like to see some summarization
> of what the community actually gains from merging this unused driver.
> So far, I don't recall reading anything solid.

For me personally, both as a developer and as an user, any movement into
lean-FW direction is a breeze of fresh air.

And nobody is stopping Nvidia or any other vendor from manufacturing an
Advanced FBNIC Accelerator TM that uses the driver as-is, but makes it
faster, better, and cheaper than anything you could buy right now.

> 
> btw:
> Kconfig description should contain:
>   Say N here, you can't ever see this device in real world.
> 

Thank you for keeping this entertaining :)

> 
>>
>> If you think that the drivers should be merged *without* setting these
>> expectations, please speak up.
>>
>> Nobody picked me up on the suggestion to use the CI as a proactive
>> check whether the maintainer / owner is still paying attention,
>> but okay :(
>>
>>
>> What is less clear to me is what do we do about uAPI / core changes.
>> Of those who touched on the subject - few people seem to be curious /
>> welcoming to any reasonable features coming out for private devices
>> (John, Olek, Florian)? Others are more cautious focusing on blast
>> radius and referring to the "two driver rule" (Daniel, Paolo)?
>> Whether that means outright ban on touching common code or uAPI
>> in ways which aren't exercised by commercial NICs, is unclear.
> 
> For these kinds of unused drivers, I think it would be legit to
> disallow any internal/external API changes. Just make the changes in
> some normal driver first, then let the unused driver benefit from them.
> 
> Now the question is, how to distinguish these 2 driver kinds? Maybe to
> put them under some directory so it is clear?
> drivers/net/unused/ethernet/meta/fbnic/
> 
> 
>> Andrew and Ed did not address the question directly AFAICT.
>>
>> Is my reading correct? Does anyone have an opinion on whether we should
>> try to dig more into this question prior to merging the driver, and
>> set some ground rules? Or proceed and learn by doing?
>>
> 


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10  7:42   ` Jiri Pirko
  2024-04-10 12:50     ` Przemek Kitszel
@ 2024-04-10 13:46     ` Jakub Kicinski
  2024-04-10 15:12       ` Jiri Pirko
  1 sibling, 1 reply; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-10 13:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: pabeni, John Fastabend, Alexander Lobakin, Florian Fainelli,
	Andrew Lunn, Daniel Borkmann, Edward Cree, Alexander Duyck,
	netdev, bhelgaas, linux-pci, Alexander Duyck, Willem de Bruijn

On Wed, 10 Apr 2024 09:42:14 +0200 Jiri Pirko wrote:
> > - not get upset when we delete those drivers if they stop participating  
> 
> Sorry for being a pain, but I would still like to see some summarization
> of what the community actually gains from merging this unused driver.
> So far, I don't recall reading anything solid.

From the discussion I think some folks made the point that it's
educational to see what big companies do, and seeing the work
may lead to reuse and other people adopting features / ideas.

> btw:
> Kconfig description should contain:
>  Say N here, you can't ever see this device in real world.

We do use standard distro kernels in some corners of the DC, AFAIU.

> >If you think that the drivers should be merged *without* setting these
> >expectations, please speak up.
> >
> >Nobody picked me up on the suggestion to use the CI as a proactive
> >check whether the maintainer / owner is still paying attention, 
> >but okay :(
> >
> >
> >What is less clear to me is what do we do about uAPI / core changes.
> >Of those who touched on the subject - few people seem to be curious /
> >welcoming to any reasonable features coming out for private devices
> >(John, Olek, Florian)? Others are more cautious focusing on blast
> >radius and referring to the "two driver rule" (Daniel, Paolo)?
> >Whether that means outright ban on touching common code or uAPI
> >in ways which aren't exercised by commercial NICs, is unclear.   
> 
> For these kinds of unused drivers, I think it would be legit to
> disallow any internal/external API changes. Just make the changes in
> some normal driver first, then let the unused driver benefit from them.

Unused is a bit strong, and we didn't put netdevsim in a special
directory. Let's see if more such drivers appear and if there
are practical uses for the separation for scripts etc?

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-10 11:54       ` Yunsheng Lin
@ 2024-04-10 15:03         ` Alexander Duyck
  2024-04-12  8:43           ` Yunsheng Lin
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-10 15:03 UTC (permalink / raw)
  To: Yunsheng Lin; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On Wed, Apr 10, 2024 at 4:54 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2024/4/9 23:08, Alexander Duyck wrote:
> > On Tue, Apr 9, 2024 at 4:47 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>
> >> On 2024/4/4 4:09, Alexander Duyck wrote:
> >>> From: Alexander Duyck <alexanderduyck@fb.com>
> >
> > [...]
> >
> >>> +     /* Unmap and free processed buffers */
> >>> +     if (head0 >= 0)
> >>> +             fbnic_clean_bdq(nv, budget, &qt->sub0, head0);
> >>> +     fbnic_fill_bdq(nv, &qt->sub0);
> >>> +
> >>> +     if (head1 >= 0)
> >>> +             fbnic_clean_bdq(nv, budget, &qt->sub1, head1);
> >>> +     fbnic_fill_bdq(nv, &qt->sub1);
> >>
> >> I am not sure how complicated the rx handling will be for the advanced
> >> features. For the current code, each entry/desc in both qt->sub0 and
> >> qt->sub1 needs at least one page, and each page seems to be used only
> >> once, no matter how little of it is actually consumed?
> >>
> >> I am assuming you want to do a 'tightly optimized' operation for this by
> >> calling page_pool_fragment_page(), but manipulating page->pp_ref_count
> >> directly does not seem to add any value for the current code, and seems
> >> to waste a lot of memory by not using the frag API, especially with
> >> PAGE_SIZE > 4K?
> >
> > On this hardware both the header and payload buffers are fragmentable.
> > The hardware decides the partitioning and we just follow it. So for
> > example it wouldn't be uncommon to have a jumbo frame split up such
> > that the header is less than 128B plus SKB overhead while the actual
> > data in the payload is just over 1400. So for us fragmenting the pages
> > is a very likely case especially with smaller packets.
>
> I understand that is what you are trying to do, but the code above does
> not seem to match the description: fbnic_clean_bdq() and fbnic_fill_bdq()
> are called for both qt->sub0 and qt->sub1, so the pages just cleaned from
> qt->sub0 and qt->sub1 are drained and each sub ring is refilled with new
> pages, which does not seem to involve any fragmenting?

That is because it is all taken care of by the completion queue. Take
a look in fbnic_pkt_prepare. We are taking the buffer from the header
descriptor and taking a slice out of it there via fbnic_page_pool_get.
Basically we store the fragment count locally in the rx_buf and then
subtract what is leftover when the device is done with it.

> The fragmenting can only happen when there is a continuous stream of small
> packets coming from the wire, so that the hw can report the same pg_id
> with different pg_offsets for different packets before fbnic_clean_bdq()
> and fbnic_fill_bdq() are called? I am not sure how to ensure that,
> considering that we might break out of the while loop in fbnic_clean_rcq()
> because of the 'packets < budget' check.

We don't free the page until we have moved one past it, or the
hardware has indicated it will take no more slices via a PAGE_FIN bit
in the descriptor.

> > It is better for us to optimize for the small packet scenario than
> > optimize for the case where 4K slices are getting taken. That way when
> > we are CPU constrained handling small packets we are the most
> > optimized whereas for the larger frames we can spare a few cycles to
> > account for the extra overhead. The result should be a higher overall
> > packets per second.
>
> The problem is that small packets mean low utilization of the bandwidth,
> as more of it is spent on headers instead of payload that is useful to
> the user, so the question seems to be how often small packets are seen
> on the wire?

Very often. Especially when you are running something like servers
where the flow usually consists of an incoming request which is often
only a few hundred bytes, followed by us sending a response which then
leads to a flow of control frames for it.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 13:46     ` Jakub Kicinski
@ 2024-04-10 15:12       ` Jiri Pirko
  2024-04-10 17:35         ` Jakub Kicinski
  0 siblings, 1 reply; 163+ messages in thread
From: Jiri Pirko @ 2024-04-10 15:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: pabeni, John Fastabend, Alexander Lobakin, Florian Fainelli,
	Andrew Lunn, Daniel Borkmann, Edward Cree, Alexander Duyck,
	netdev, bhelgaas, linux-pci, Alexander Duyck, Willem de Bruijn

Wed, Apr 10, 2024 at 03:46:11PM CEST, kuba@kernel.org wrote:
>On Wed, 10 Apr 2024 09:42:14 +0200 Jiri Pirko wrote:
>> > - not get upset when we delete those drivers if they stop participating  
>> 
>> Sorry for being a pain, but I would still like to see some summarization
>> of what the community actually gains from merging this unused driver.
>> So far, I don't recall reading anything solid.
>
>From the discussion I think some folks made the point that it's
>educational to see what big companies do, and seeing the work
>may lead to reuse and other people adopting features / ideas.

Okay, if that's all, does it justify the cons? Will someone put this on
the scales?


>
>> btw:
>> Kconfig description should contain:
>>  Say N here, you can't ever see this device in real world.
>
>We do use standard distro kernels in some corners of the DC, AFAIU.

I find it amusing to think about a distro vendor, for example Red Hat,
supporting a driver for a proprietary private device.


>
>> >If you think that the drivers should be merged *without* setting these
>> >expectations, please speak up.
>> >
>> >Nobody picked me up on the suggestion to use the CI as a proactive
>> >check whether the maintainer / owner is still paying attention, 
>> >but okay :(
>> >
>> >
>> >What is less clear to me is what do we do about uAPI / core changes.
>> >Of those who touched on the subject - few people seem to be curious /
>> >welcoming to any reasonable features coming out for private devices
>> >(John, Olek, Florian)? Others are more cautious focusing on blast
>> >radius and referring to the "two driver rule" (Daniel, Paolo)?
>> >Whether that means outright ban on touching common code or uAPI
>> >in ways which aren't exercised by commercial NICs, is unclear.   
>> 
>> For these kinds of unused drivers, I think it would be legit to
>> disallow any internal/external API changes. Just make the changes in
>> some normal driver first, then let the unused driver benefit from them.
>
>Unused is a bit strong, and we didn't put netdevsim in a special
>directory. Let's see if more such drivers appear and if there
>are practical uses for the separation for scripts etc?

The practical use I see is that a reviewer would spot right away when
someone pushes a feature implemented only in this unused driver. It
would be a clear mark for a driver of a lower category. For the person
doing an API change it would be an indication that they do not have to
be as cautious about not breaking anything in this driver; the driver
maintainer should be the one to deal with potential issues.

With this clear marking and Documentation to describe it, I think I
would be ok to let this in, FWIW.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-09 23:42   ` Andrew Lunn
@ 2024-04-10 15:56     ` Alexander Duyck
  2024-04-10 20:01       ` Andrew Lunn
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-10 15:56 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, pabeni, John Fastabend, Alexander Lobakin,
	Florian Fainelli, Daniel Borkmann, Edward Cree, netdev, bhelgaas,
	linux-pci, Alexander Duyck, Willem de Bruijn

On Tue, Apr 9, 2024 at 4:42 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > What is less clear to me is what do we do about uAPI / core changes.
>
> I would differentiate between core changes and core additions. If there
> is very limited firmware on this device, i assume Linux is managing
> the SFP cage, and to some extent the PCS. Extending the core to handle
> these at higher speeds than currently supported would be one such core
> addition. I've no problem with this. And i doubt it will be a single
> NIC using such additions for too long. It looks like the ClearFog CX LX2
> could make use of such extensions as well, and there are probably
> other boards and devices, maybe the Zynq 7000?

The driver on this device doesn't have full access over the PHY.
Basically we control everything from the PCS north, and the firmware
controls everything from the PMA south as the physical connection is
MUXed between 4 slices. So this means the firmware also controls all
the I2C and the QSFP and EEPROM. The main reason for this is that
those blocks are shared resources between the slices, as such the
firmware acts as the arbitrator for 4 slices and the BMC.

> > Is my reading correct? Does anyone have an opinion on whether we should
> > try to dig more into this question prior to merging the driver, and
> > set some ground rules? Or proceed and learn by doing?
>
> I'm not too keen on keeping potentially shareable code in the driver
> just because of UEFI. It has long been the norm that you should not
> have wrappers so you can reuse code in different OSes. And UEFI is
> just another OS. So i really would like to see a Linux I2C bus master
> driver, a Linux GPIO driver if appropriate, and use of phylink, just as
> i've pushed wangxun to do, and to some extent nvidia with their
> GPIO controller embedded in their NIC. The nice thing is, the
> developers for wangxun have mostly solved all this for a PCIe device,
> so their code can be copied.

Well, when you are a one-man driver development team, sharing code
definitely makes one's life much easier when you have to maintain
multiple drivers. That said, I will see what I can do to comply with
your requests while hopefully not increasing my maintenance burden too
much. If nothing else I might just write myself a kernel compatibility
shim in UEFI so I can reuse the code that way.

The driver has no access to I2C or GPIO. The QSFP and EEPROM are
shared so I don't have access directly from the driver and I will have
to send any messages over the Host/FW mailbox to change any of that.

> Do we need to set some ground rules? No. I can give similar feedback
> as i gave the wangxun developers, if Linux subsystems are not used
> appropriately.
>
>        Andrew

Yeah, I kind of assume that will always be the case. Rules only mean
something if they are enforced anyway. Although having the rules
documented does help in terms of making them known to all those
involved.

Anyway, I have already incorporated most of the feedback from the
other maintainers. The two requests you had were a bit more difficult
as they are in areas I am not entirely familiar with so I thought I
would check in with you before I dive into them.

One request I recall you having was to look at using the LED framework
for the LEDs. I think I have enough info to chase that down and get it
resolved for v2. However if you have some examples you would prefer I
follow I can look into that.

As far as the PCS block goes, does it matter whether or not I expose
what the actual underlying IP is? My request for info on what I can
disclose seems to be moving at the speed of bureaucracy, so I don't
know how long it would take if I were to try to write something truly
generic. That said, if I just put together something that referred to
this as pcs-fbnic for now, would that work for making this acceptable
for upstream?

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 15:12       ` Jiri Pirko
@ 2024-04-10 17:35         ` Jakub Kicinski
  2024-04-10 17:39           ` Florian Fainelli
  0 siblings, 1 reply; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-10 17:35 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: pabeni, John Fastabend, Alexander Lobakin, Florian Fainelli,
	Andrew Lunn, Daniel Borkmann, Edward Cree, Alexander Duyck,
	netdev, bhelgaas, linux-pci, Alexander Duyck, Willem de Bruijn

On Wed, 10 Apr 2024 17:12:18 +0200 Jiri Pirko wrote:
> >> For these kind of unused drivers, I think it would be legit to
> >> disallow any internal/external api changes. Just do that for some
> >> normal driver, then benefit from the changes in the unused driver.  
> >
> >Unused is a bit strong, and we didn't put netdevsim in a special
> >directory. Let's see if more such drivers appear and if there
> >are practical uses for the separation for scripts etc?  
> 
> The practical use I see is that a reviewer would spot right away when
> someone pushes a feature implemented only in this unused driver. It
> would be a clear mark for a driver of a lower category. For the person
> doing an API change it would be an indication that they do not have to
> be as cautious about not breaking anything in this driver; the driver
> maintainer should be the one to deal with potential issues.

Hm, we currently group by vendor but the fact it's a private device
is probably more important indeed. For example if Google submits
a driver for a private device it may be confusing what's public
cloud (which I think/hope GVE is) and what's fully private.

So we could categorize by the characteristic rather than vendor:

drivers/net/ethernet/${term}/fbnic/

I'm afraid it may be hard for us to agree on an accurate term, tho.
"Unused" sounds.. odd, we don't keep unused code, "private"
sounds like we granted someone special right not took some away,
maybe "exclusive"? Or "besteffort"? Or "staging" :D  IDK.

> With this clear marking and Documentation to describe it, I think I
> would be ok to let this in, FWIW.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 17:35         ` Jakub Kicinski
@ 2024-04-10 17:39           ` Florian Fainelli
  2024-04-10 17:56             ` Jakub Kicinski
  0 siblings, 1 reply; 163+ messages in thread
From: Florian Fainelli @ 2024-04-10 17:39 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko
  Cc: pabeni, John Fastabend, Alexander Lobakin, Andrew Lunn,
	Daniel Borkmann, Edward Cree, Alexander Duyck, netdev, bhelgaas,
	linux-pci, Alexander Duyck, Willem de Bruijn



On 4/10/2024 10:35 AM, Jakub Kicinski wrote:
> On Wed, 10 Apr 2024 17:12:18 +0200 Jiri Pirko wrote:
>>>> For these kind of unused drivers, I think it would be legit to
>>>> disallow any internal/external api changes. Just do that for some
>>>> normal driver, then benefit from the changes in the unused driver.
>>>
>>> Unused is a bit strong, and we didn't put netdevsim in a special
>>> directory. Let's see if more such drivers appear and if there
>>> are practical uses for the separation for scripts etc?
>>
>> The practical use I see is that a reviewer would spot right away when
>> someone pushes a feature implemented only in this unused driver. It
>> would be a clear mark for a driver of a lower category. For the person
>> doing an API change it would be an indication that they do not have to
>> be as cautious about not breaking anything in this driver; the driver
>> maintainer should be the one to deal with potential issues.
> 
> Hm, we currently group by vendor but the fact it's a private device
> is probably more important indeed. For example if Google submits
> a driver for a private device it may be confusing what's public
> cloud (which I think/hope GVE is) and what's fully private.
> 
> So we could categorize by the characteristic rather than vendor:
> 
> drivers/net/ethernet/${term}/fbnic/
> 
> I'm afraid it may be hard for us to agree on an accurate term, tho.
> "Unused" sounds.. odd, we don't keep unused code, "private"
> sounds like we granted someone special right not took some away,
> maybe "exclusive"? Or "besteffort"? Or "staging" :D  IDK.

Do we really need that categorization at the directory/filesystem level?
Cannot we just document it clearly in the Kconfig help text and under
Documentation/networking/?
-- 
Florian

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 17:39           ` Florian Fainelli
@ 2024-04-10 17:56             ` Jakub Kicinski
  2024-04-10 18:00               ` Florian Fainelli
  2024-04-10 18:01               ` Alexander Duyck
  0 siblings, 2 replies; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-10 17:56 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Jiri Pirko, pabeni, John Fastabend, Alexander Lobakin,
	Andrew Lunn, Daniel Borkmann, Edward Cree, Alexander Duyck,
	netdev, bhelgaas, linux-pci, Alexander Duyck, Willem de Bruijn

On Wed, 10 Apr 2024 10:39:11 -0700 Florian Fainelli wrote:
> > Hm, we currently group by vendor but the fact it's a private device
> > is probably more important indeed. For example if Google submits
> > a driver for a private device it may be confusing what's public
> > cloud (which I think/hope GVE is) and what's fully private.
> > 
> > So we could categorize by the characteristic rather than vendor:
> > 
> > drivers/net/ethernet/${term}/fbnic/
> > 
> > I'm afraid it may be hard for us to agree on an accurate term, tho.
> > "Unused" sounds.. odd, we don't keep unused code, "private"
> > sounds like we granted someone special right not took some away,
> > maybe "exclusive"? Or "besteffort"? Or "staging" :D  IDK.  
> 
> Do we really need that categorization at the directory/filesystem level? 
> cannot we just document it clearly in the Kconfig help text and under 
> Documentation/networking/?

From the reviewer perspective I think we will just remember.
If some newcomer tries to do refactoring they may benefit from seeing
this is a special device and more help is offered. Dunno if a newcomer
would look at the right docs.

Whether it's more "paperwork" than we'll actually gain, I have no idea.
I may not be the best person to comment.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 17:56             ` Jakub Kicinski
@ 2024-04-10 18:00               ` Florian Fainelli
  2024-04-10 20:03                 ` Jakub Kicinski
  2024-04-10 18:01               ` Alexander Duyck
  1 sibling, 1 reply; 163+ messages in thread
From: Florian Fainelli @ 2024-04-10 18:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, pabeni, John Fastabend, Alexander Lobakin,
	Andrew Lunn, Daniel Borkmann, Edward Cree, Alexander Duyck,
	netdev, bhelgaas, linux-pci, Alexander Duyck, Willem de Bruijn



On 4/10/2024 10:56 AM, Jakub Kicinski wrote:
> On Wed, 10 Apr 2024 10:39:11 -0700 Florian Fainelli wrote:
>>> Hm, we currently group by vendor but the fact it's a private device
>>> is probably more important indeed. For example if Google submits
>>> a driver for a private device it may be confusing what's public
>>> cloud (which I think/hope GVE is) and what's fully private.
>>>
>>> So we could categorize by the characteristic rather than vendor:
>>>
>>> drivers/net/ethernet/${term}/fbnic/
>>>
>>> I'm afraid it may be hard for us to agree on an accurate term, tho.
>>> "Unused" sounds.. odd, we don't keep unused code, "private"
>>> sounds like we granted someone special right not took some away,
>>> maybe "exclusive"? Or "besteffort"? Or "staging" :D  IDK.
>>
>> Do we really need that categorization at the directory/filesystem level?
>> cannot we just document it clearly in the Kconfig help text and under
>> Documentation/networking/?
> 
>  From the reviewer perspective I think we will just remember.
> If some newcomer tries to do refactoring they may benefit from seeing
> this is a special device and more help is offered. Dunno if a newcomer
> would look at the right docs.
> 
> Whether it's more "paperwork" than we'll actually gain, I have no idea.
> I may not be the best person to comment.

To me it is starting to feel like more paperwork than warranted, 
although I cannot really think of an "implied" metric that we could 
track, short of monitoring patches/bug reports coming from outside of 
the original driver authors/owners as an indication of how widely 
utilized a given driver is.

The number of changes to a driver between release cycles is not a good 
indication either: a driver with few users may see many changes simply 
because it presents the most agile configuration, but similarly, a very 
actively used driver with real-world users may see a large number of 
changes between releases driven by that use.

What we need is some sort of popularity contest tracking :)
-- 
Florian


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 17:56             ` Jakub Kicinski
  2024-04-10 18:00               ` Florian Fainelli
@ 2024-04-10 18:01               ` Alexander Duyck
  2024-04-10 18:29                 ` Florian Fainelli
  2024-04-11  6:34                 ` Jiri Pirko
  1 sibling, 2 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-10 18:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Florian Fainelli, Jiri Pirko, pabeni, John Fastabend,
	Alexander Lobakin, Andrew Lunn, Daniel Borkmann, Edward Cree,
	netdev, bhelgaas, linux-pci, Alexander Duyck, Willem de Bruijn

On Wed, Apr 10, 2024 at 10:56 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 10 Apr 2024 10:39:11 -0700 Florian Fainelli wrote:
> > > Hm, we currently group by vendor but the fact it's a private device
> > > is probably more important indeed. For example if Google submits
> > > a driver for a private device it may be confusing what's public
> > > cloud (which I think/hope GVE is) and what's fully private.
> > >
> > > So we could categorize by the characteristic rather than vendor:
> > >
> > > drivers/net/ethernet/${term}/fbnic/
> > >
> > > I'm afraid it may be hard for us to agree on an accurate term, tho.
> > > "Unused" sounds.. odd, we don't keep unused code, "private"
> > > sounds like we granted someone special right not took some away,
> > > maybe "exclusive"? Or "besteffort"? Or "staging" :D  IDK.
> >
> > Do we really need that categorization at the directory/filesystem level?
> > cannot we just document it clearly in the Kconfig help text and under
> > Documentation/networking/?
>
> From the reviewer perspective I think we will just remember.
> If some newcomer tries to do refactoring they may benefit from seeing
> this is a special device and more help is offered. Dunno if a newcomer
> would look at the right docs.
>
> Whether it's more "paperwork" than we'll actually gain, I have no idea.
> I may not be the best person to comment.

Are we going to go through and retroactively move some of the drivers
that are already there that are exclusive to specific companies? That
is the bigger issue as I see it. It has already been brought up that
idpf is exclusive. In addition, several other people have reached out
to me about other devices that are exclusive to other organizations.

I don't see any value in it as it would just encourage people to lie
in order to avoid being put in what would essentially become a
blacklisted directory.

If we are going to be trying to come up with some special status maybe
it makes sense to have some status in the MAINTAINERS file that would
indicate that this driver is exclusive to some organization and not
publicly available so any maintenance would have to be proprietary.
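
As a purely illustrative sketch (the status wording below is hypothetical; MAINTAINERS today defines no such designation, and the entry fields are guesses modeled on existing entries), such a MAINTAINERS entry might look like:

```
FBNIC ETHERNET DRIVER
M:	Alexander Duyck <alexander.duyck@gmail.com>
L:	netdev@vger.kernel.org
S:	Supported (exclusive hardware, not publicly available)
F:	drivers/net/ethernet/meta/fbnic/
```

The only change from today's convention would be extending the `S:` (status) line to flag that outside contributors cannot obtain the device.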


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 18:01               ` Alexander Duyck
@ 2024-04-10 18:29                 ` Florian Fainelli
  2024-04-10 19:58                   ` Jakub Kicinski
  2024-04-11  6:34                 ` Jiri Pirko
  1 sibling, 1 reply; 163+ messages in thread
From: Florian Fainelli @ 2024-04-10 18:29 UTC (permalink / raw)
  To: Alexander Duyck, Jakub Kicinski
  Cc: Jiri Pirko, pabeni, John Fastabend, Alexander Lobakin,
	Andrew Lunn, Daniel Borkmann, Edward Cree, netdev, bhelgaas,
	linux-pci, Alexander Duyck, Willem de Bruijn



On 4/10/2024 11:01 AM, Alexander Duyck wrote:
> On Wed, Apr 10, 2024 at 10:56 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Wed, 10 Apr 2024 10:39:11 -0700 Florian Fainelli wrote:
>>>> Hm, we currently group by vendor but the fact it's a private device
>>>> is probably more important indeed. For example if Google submits
>>>> a driver for a private device it may be confusing what's public
>>>> cloud (which I think/hope GVE is) and what's fully private.
>>>>
>>>> So we could categorize by the characteristic rather than vendor:
>>>>
>>>> drivers/net/ethernet/${term}/fbnic/
>>>>
>>>> I'm afraid it may be hard for us to agree on an accurate term, tho.
>>>> "Unused" sounds.. odd, we don't keep unused code, "private"
>>>> sounds like we granted someone special right not took some away,
>>>> maybe "exclusive"? Or "besteffort"? Or "staging" :D  IDK.
>>>
>>> Do we really need that categorization at the directory/filesystem level?
>>> cannot we just document it clearly in the Kconfig help text and under
>>> Documentation/networking/?
>>
>>  From the reviewer perspective I think we will just remember.
>> If some newcomer tries to do refactoring they may benefit from seeing
>> this is a special device and more help is offered. Dunno if a newcomer
>> would look at the right docs.
>>
>> Whether it's more "paperwork" than we'll actually gain, I have no idea.
>> I may not be the best person to comment.
> 
> Are we going to go through and retro-actively move some of the drivers
> that are already there that are exclusive to specific companies? That
> is the bigger issue as I see it. It has already been brought up that
> idpf is exclusive. In addition several other people have reached out
> to me about other devices that are exclusive to other organizations.
> 
> I don't see any value in it as it would just encourage people to lie
> in order to avoid being put in what would essentially become a
> blacklisted directory.

Agreed.

> 
> If we are going to be trying to come up with some special status maybe
> it makes sense to have some status in the MAINTAINERS file that would
> indicate that this driver is exclusive to some organization and not
> publicly available so any maintenance would have to be proprietary.

I like that idea.
-- 
Florian


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 18:29                 ` Florian Fainelli
@ 2024-04-10 19:58                   ` Jakub Kicinski
  2024-04-10 22:03                     ` Jacob Keller
  0 siblings, 1 reply; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-10 19:58 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Alexander Duyck, Jiri Pirko, pabeni, John Fastabend,
	Alexander Lobakin, Andrew Lunn, Daniel Borkmann, Edward Cree,
	netdev, bhelgaas, linux-pci, Alexander Duyck, Willem de Bruijn

On Wed, 10 Apr 2024 11:29:57 -0700 Florian Fainelli wrote:
> > If we are going to be trying to come up with some special status maybe
> > it makes sense to have some status in the MAINTAINERS file that would
> > indicate that this driver is exclusive to some organization and not
> > publicly available so any maintenance would have to be proprietary.  
> 
> I like that idea.

+1, also first idea that came to mind but I was too afraid 
of bike shedding to mention it :) Fingers crossed? :)


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 15:56     ` Alexander Duyck
@ 2024-04-10 20:01       ` Andrew Lunn
  2024-04-10 21:07         ` Alexander Duyck
  0 siblings, 1 reply; 163+ messages in thread
From: Andrew Lunn @ 2024-04-10 20:01 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jakub Kicinski, pabeni, John Fastabend, Alexander Lobakin,
	Florian Fainelli, Daniel Borkmann, Edward Cree, netdev, bhelgaas,
	linux-pci, Alexander Duyck, Willem de Bruijn

On Wed, Apr 10, 2024 at 08:56:31AM -0700, Alexander Duyck wrote:
> On Tue, Apr 9, 2024 at 4:42 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > What is less clear to me is what do we do about uAPI / core changes.
> >
> > I would differentiate between core change and core additions. If there
> > is very limited firmware on this device, i assume Linux is managing
> > the SFP cage, and to some extend the PCS. Extending the core to handle
> > these at higher speeds than currently supported would be one such core
> > addition. I've no problem with this. And i doubt it will be a single
> > NIC using such additions for too long. It looks like ClearFog CX LX2
> > could make use of such extensions as well, and there are probably
> > other boards and devices, maybe the Zynq 7000?
> 
> The driver on this device doesn't have full access over the PHY.
> Basically we control everything from the PCS north, and the firmware
> controls everything from the PMA south as the physical connection is
> MUXed between 4 slices. So this means the firmware also controls all
> the I2C and the QSFP and EEPROM. The main reason for this is that
> those blocks are shared resources between the slices, as such the
> firmware acts as the arbitrator for 4 slices and the BMC.

Ah, shame. You took what is probably the least valuable intellectual
property, and most shareable with the community and locked it up in
firmware where nobody can use it.

You should probably stop saying there is not much firmware with this
device, and that Linux controls it. It clearly does not...

	Andrew


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 18:00               ` Florian Fainelli
@ 2024-04-10 20:03                 ` Jakub Kicinski
  0 siblings, 0 replies; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-10 20:03 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Jiri Pirko, pabeni, John Fastabend, Alexander Lobakin,
	Andrew Lunn, Daniel Borkmann, Edward Cree, Alexander Duyck,
	netdev, bhelgaas, linux-pci, Alexander Duyck, Willem de Bruijn

On Wed, 10 Apr 2024 11:00:35 -0700 Florian Fainelli wrote:
> although I cannot really think about an "implied" metric that we could 
> track, short of monitoring patches/bug reports coming from outside of 
> the original driver authors/owners as an indication of how widely 
> utilized a given driver is.

Not a metric, just to clarify. I think the discussion started from 
my email saying:

 - help with refactoring / adapting their drivers more actively

and that may be an empty promise if the person doing the refactoring
does not know they could ask. It's not uncommon for a relative
newcomer to redo some internal API. Not that it's usually a hard
transformation.. Dunno, a bit hypothetical.


* Re: [net-next PATCH 04/15] eth: fbnic: Add register init to set PCIe/Ethernet device config
  2024-04-03 20:46   ` Andrew Lunn
@ 2024-04-10 20:31     ` Jacob Keller
  0 siblings, 0 replies; 163+ messages in thread
From: Jacob Keller @ 2024-04-10 20:31 UTC (permalink / raw)
  To: Andrew Lunn, Alexander Duyck; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni



On 4/3/2024 1:46 PM, Andrew Lunn wrote:
>> +#define wr32(reg, val)	fbnic_wr32(fbd, reg, val)
>> +#define rd32(reg)	fbnic_rd32(fbd, reg)
>> +#define wrfl()		fbnic_rd32(fbd, FBNIC_MASTER_SPARE_0)
> 
> I don't think that is considered best practices, using variables not
> passed to the macro.
> 
> 	Andrew
> 

Yea, please avoid this, it will only cause pain later when debugging why
something doesn't work.


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 20:01       ` Andrew Lunn
@ 2024-04-10 21:07         ` Alexander Duyck
  2024-04-10 22:37           ` Andrew Lunn
  2024-04-11  6:39           ` Jiri Pirko
  0 siblings, 2 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-10 21:07 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, pabeni, John Fastabend, Alexander Lobakin,
	Florian Fainelli, Daniel Borkmann, Edward Cree, netdev, bhelgaas,
	linux-pci, Alexander Duyck, Willem de Bruijn

On Wed, Apr 10, 2024 at 1:01 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Wed, Apr 10, 2024 at 08:56:31AM -0700, Alexander Duyck wrote:
> > On Tue, Apr 9, 2024 at 4:42 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > > What is less clear to me is what do we do about uAPI / core changes.
> > >
> > > I would differentiate between core change and core additions. If there
> > > is very limited firmware on this device, i assume Linux is managing
> > > the SFP cage, and to some extend the PCS. Extending the core to handle
> > > these at higher speeds than currently supported would be one such core
> > > addition. I've no problem with this. And i doubt it will be a single
> > > NIC using such additions for too long. It looks like ClearFog CX LX2
> > > could make use of such extensions as well, and there are probably
> > > other boards and devices, maybe the Zynq 7000?
> >
> > The driver on this device doesn't have full access over the PHY.
> > Basically we control everything from the PCS north, and the firmware
> > controls everything from the PMA south as the physical connection is
> > MUXed between 4 slices. So this means the firmware also controls all
> > the I2C and the QSFP and EEPROM. The main reason for this is that
> > those blocks are shared resources between the slices, as such the
> > firmware acts as the arbitrator for 4 slices and the BMC.
>
> Ah, shame. You took what is probably the least valuable intellectual
> property, and most shareable with the community and locked it up in
> firmware where nobody can use it.
>
> You should probably stop saying there is not much firmware with this
> device, and that Linux controls it. It clearly does not...
>
>         Andrew

Well, I was referring more to the data path level than to the PHY
configuration. I suspect different people have different levels of
expectations on what minimal firmware is. With this hardware we at
least don't need to use firmware commands to enable or disable queues,
get the device stats, or update a MAC address.

When it comes to multi-host NICs I am not sure there are going to be
any solutions that don't have some level of firmware due to the fact
that the cable is physically shared with multiple slots.

I am assuming we still want to do the PCS driver. So I will still see
what I can do to get that setup.

Thanks,

- Alex


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10  7:26     ` Jiri Pirko
@ 2024-04-10 21:30       ` Jacob Keller
  2024-04-10 22:19         ` Andrew Lunn
  0 siblings, 1 reply; 163+ messages in thread
From: Jacob Keller @ 2024-04-10 21:30 UTC (permalink / raw)
  To: Jiri Pirko, Willem de Bruijn
  Cc: Jakub Kicinski, pabeni, John Fastabend, Alexander Lobakin,
	Florian Fainelli, Andrew Lunn, Daniel Borkmann, Edward Cree,
	Alexander Duyck, netdev, bhelgaas, linux-pci, Alexander Duyck



On 4/10/2024 12:26 AM, Jiri Pirko wrote:
> Tue, Apr 09, 2024 at 11:06:05PM CEST, willemdebruijn.kernel@gmail.com wrote:
>> Jakub Kicinski wrote:
>>> On Wed, 03 Apr 2024 13:08:24 -0700 Alexander Duyck wrote:
> 
> [...]
> 
>>
>> 2. whether new device features can be supported without at least
>>   two available devices supporting it.
>>
> 
> [...]
> 
>>
>> 2 is out of scope for this series. But I would always want to hear
>> about potential new features that an organization finds valuable
>> enough to implement. Rather than a blanket rule against them.
> 
This appears out of nowhere. In the past, I would say the vast majority
of features were merged with a single device implementation. Often, it
is the only device out there at the time that supports the feature.
This limitation would put a brake on feature additions. I can name a long
list of features that would never have made it in (like 50% of the mlxsw
driver).
> 
>>
>>
> 

Jakub already mentioned this being nuanced in a previous part of the
thread. In reality, lots of features do get implemented by only one
driver first.

I think it's good practice to ensure multiple vendors/drivers can use
whatever common uAPI or kernel API exists. It can be frustrating when
some new API gets introduced but then can't be used by another device.
In most cases that's on the vendors for being slow to respond or to work
with each other when developing the new API.


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 19:58                   ` Jakub Kicinski
@ 2024-04-10 22:03                     ` Jacob Keller
  2024-04-11  6:31                       ` Jiri Pirko
  0 siblings, 1 reply; 163+ messages in thread
From: Jacob Keller @ 2024-04-10 22:03 UTC (permalink / raw)
  To: Jakub Kicinski, Florian Fainelli
  Cc: Alexander Duyck, Jiri Pirko, pabeni, John Fastabend,
	Alexander Lobakin, Andrew Lunn, Daniel Borkmann, Edward Cree,
	netdev, bhelgaas, linux-pci, Alexander Duyck, Willem de Bruijn



On 4/10/2024 12:58 PM, Jakub Kicinski wrote:
> On Wed, 10 Apr 2024 11:29:57 -0700 Florian Fainelli wrote:
>>> If we are going to be trying to come up with some special status maybe
>>> it makes sense to have some status in the MAINTAINERS file that would
>>> indicate that this driver is exclusive to some organization and not
>>> publicly available so any maintenance would have to be proprietary.  
>>
>> I like that idea.
> 
> +1, also first idea that came to mind but I was too afraid 
> of bike shedding to mention it :) Fingers crossed? :)
> 

+1, I think putting it in MAINTAINERS makes a lot of sense.


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 21:30       ` Jacob Keller
@ 2024-04-10 22:19         ` Andrew Lunn
  2024-04-11  0:31           ` Jacob Keller
  0 siblings, 1 reply; 163+ messages in thread
From: Andrew Lunn @ 2024-04-10 22:19 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Jiri Pirko, Willem de Bruijn, Jakub Kicinski, pabeni,
	John Fastabend, Alexander Lobakin, Florian Fainelli,
	Daniel Borkmann, Edward Cree, Alexander Duyck, netdev, bhelgaas,
	linux-pci, Alexander Duyck

> I think its good practice to ensure multiple vendors/drivers can use
> whatever common uAPI or kernel API exists. It can be frustrating when
> some new API gets introduced but then can't be used by another device..
> In most cases thats on the vendors for being slow to respond or work
> with each other when developing the new API.

I tend to agree with the last part. Vendors tend not to review other
vendors' patches, and so often don't notice a new API being added which
they could use if it were a little bit more generic. Also, vendors
often seem to focus on their device/firmware requirements, not an
abstract device, and so end up with something not generic.

As a reviewer, I try to take more notice of new APIs than of most other
things, and ideally it is something we should all do.

	 Andrew





* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 21:07         ` Alexander Duyck
@ 2024-04-10 22:37           ` Andrew Lunn
  2024-04-11 16:00             ` Alexander Duyck
  2024-04-11  6:39           ` Jiri Pirko
  1 sibling, 1 reply; 163+ messages in thread
From: Andrew Lunn @ 2024-04-10 22:37 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jakub Kicinski, pabeni, John Fastabend, Alexander Lobakin,
	Florian Fainelli, Daniel Borkmann, Edward Cree, netdev, bhelgaas,
	linux-pci, Alexander Duyck, Willem de Bruijn

> Well I was referring more to the data path level more than the phy
> configuration. I suspect different people have different levels of
> expectations on what minimal firmware is. With this hardware we at
> least don't need to use firmware commands to enable or disable queues,
> get the device stats, or update a MAC address.
> 
> When it comes to multi-host NICs I am not sure there are going to be
> any solutions that don't have some level of firmware due to the fact
> that the cable is physically shared with multiple slots.

This is something Russell King at least considered. I don't really
know enough to know why it's impossible for Linux to deal with multiple
slots.

> I am assuming we still want to do the PCS driver. So I will still see
> what I can do to get that setup.

You should look at the API offered by the drivers in drivers/net/pcs. It
is designed to be used with drivers which actually drive the hardware
and use phylink. Who is responsible for configuring and looking at the
results of autonegotiation? Who is responsible for putting the PCS
into the correct mode depending on the SFP module's capabilities?
Because you seem to have split the PCS into two and hidden some of it
away, I don't know if it makes sense to try to shoehorn what is left
into a Linux driver.

     Andrew


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 22:19         ` Andrew Lunn
@ 2024-04-11  0:31           ` Jacob Keller
  0 siblings, 0 replies; 163+ messages in thread
From: Jacob Keller @ 2024-04-11  0:31 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jiri Pirko, Willem de Bruijn, Jakub Kicinski, pabeni,
	John Fastabend, Alexander Lobakin, Florian Fainelli,
	Daniel Borkmann, Edward Cree, Alexander Duyck, netdev, bhelgaas,
	linux-pci, Alexander Duyck



On 4/10/2024 3:19 PM, Andrew Lunn wrote:
>> I think its good practice to ensure multiple vendors/drivers can use
>> whatever common uAPI or kernel API exists. It can be frustrating when
>> some new API gets introduced but then can't be used by another device..
>> In most cases thats on the vendors for being slow to respond or work
>> with each other when developing the new API.
> 
> I tend to agree with the last part. Vendors tend not to reviewer other
> vendors patches, and so often don't notice a new API being added which
> they could use, if it was a little bit more generic. Also vendors
> often seem to focus on their devices/firmware requirements, not an
> abstract device, and so end up with something not generic.
> 
> As a reviewer, i try to take more notice of new APIs than most other
> things, and ideally it is something we should all do.
> 
> 	 Andrew
> 
> 
> 

Agreed. It can be challenging when you're in the vendor space though, as
you get handed priorities.


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 22:03                     ` Jacob Keller
@ 2024-04-11  6:31                       ` Jiri Pirko
  2024-04-11 16:22                         ` Jacob Keller
  0 siblings, 1 reply; 163+ messages in thread
From: Jiri Pirko @ 2024-04-11  6:31 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Jakub Kicinski, Florian Fainelli, Alexander Duyck, pabeni,
	John Fastabend, Alexander Lobakin, Andrew Lunn, Daniel Borkmann,
	Edward Cree, netdev, bhelgaas, linux-pci, Alexander Duyck,
	Willem de Bruijn

Thu, Apr 11, 2024 at 12:03:54AM CEST, jacob.e.keller@intel.com wrote:
>
>
>On 4/10/2024 12:58 PM, Jakub Kicinski wrote:
>> On Wed, 10 Apr 2024 11:29:57 -0700 Florian Fainelli wrote:
>>>> If we are going to be trying to come up with some special status maybe
>>>> it makes sense to have some status in the MAINTAINERS file that would
>>>> indicate that this driver is exclusive to some organization and not
>>>> publicly available so any maintenance would have to be proprietary.  
>>>
>>> I like that idea.
>> 
>> +1, also first idea that came to mind but I was too afraid 
>> of bike shedding to mention it :) Fingers crossed? :)
>> 
>
>+1, I think putting it in MAINTAINERS makes a lot of sense.

Well, how exactly do you imagine doing this? I have no problem using
MAINTAINERS for this, I was thinking about that too, but I could not
figure out how it would work. Having a dedicated driver directory is
much more obvious: a person cooking up a patch sees it immediately. Do
you look at the MAINTAINERS file when you do some driver-API-changing
patch, or any patch? I certainly don't (not counting the get_maintainers
script).


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 18:01               ` Alexander Duyck
  2024-04-10 18:29                 ` Florian Fainelli
@ 2024-04-11  6:34                 ` Jiri Pirko
  1 sibling, 0 replies; 163+ messages in thread
From: Jiri Pirko @ 2024-04-11  6:34 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jakub Kicinski, Florian Fainelli, pabeni, John Fastabend,
	Alexander Lobakin, Andrew Lunn, Daniel Borkmann, Edward Cree,
	netdev, bhelgaas, linux-pci, Alexander Duyck, Willem de Bruijn

Wed, Apr 10, 2024 at 08:01:44PM CEST, alexander.duyck@gmail.com wrote:
>On Wed, Apr 10, 2024 at 10:56 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Wed, 10 Apr 2024 10:39:11 -0700 Florian Fainelli wrote:
>> > > Hm, we currently group by vendor but the fact it's a private device
>> > > is probably more important indeed. For example if Google submits
>> > > a driver for a private device it may be confusing what's public
>> > > cloud (which I think/hope GVE is) and what's fully private.
>> > >
>> > > So we could categorize by the characteristic rather than vendor:
>> > >
>> > > drivers/net/ethernet/${term}/fbnic/
>> > >
>> > > I'm afraid it may be hard for us to agree on an accurate term, tho.
>> > > "Unused" sounds.. odd, we don't keep unused code, "private"
>> > > sounds like we granted someone special right not took some away,
>> > > maybe "exclusive"? Or "besteffort"? Or "staging" :D  IDK.
>> >
>> > Do we really need that categorization at the directory/filesystem level?
>> > cannot we just document it clearly in the Kconfig help text and under
>> > Documentation/networking/?
>>
>> From the reviewer perspective I think we will just remember.
>> If some newcomer tries to do refactoring they may benefit from seeing
>> this is a special device and more help is offered. Dunno if a newcomer
>> would look at the right docs.
>>
>> Whether it's more "paperwork" than we'll actually gain, I have no idea.
>> I may not be the best person to comment.
>
>Are we going to go through and retro-actively move some of the drivers
>that are already there that are exclusive to specific companies? That
>is the bigger issue as I see it. It has already been brought up that

Why is it an issue? Very easy to move drivers to this new directory.


>idpf is exclusive. In addition several other people have reached out
>to me about other devices that are exclusive to other organizations.
>
>I don't see any value in it as it would just encourage people to lie
>in order to avoid being put in what would essentially become a
>blacklisted directory.

You are thinking all or nothing. I'd say that if we have 80% of such
drivers in the correct place/directory, it's a win. The rest will lie.
Shame on them when it is discovered.


>
>If we are going to be trying to come up with some special status maybe
>it makes sense to have some status in the MAINTAINERS file that would
>indicate that this driver is exclusive to some organization and not
>publicly available so any maintenance would have to be proprietary.


* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 21:07         ` Alexander Duyck
  2024-04-10 22:37           ` Andrew Lunn
@ 2024-04-11  6:39           ` Jiri Pirko
  2024-04-11 16:46             ` Alexander Duyck
  1 sibling, 1 reply; 163+ messages in thread
From: Jiri Pirko @ 2024-04-11  6:39 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Andrew Lunn, Jakub Kicinski, pabeni, John Fastabend,
	Alexander Lobakin, Florian Fainelli, Daniel Borkmann,
	Edward Cree, netdev, bhelgaas, linux-pci, Alexander Duyck,
	Willem de Bruijn

Wed, Apr 10, 2024 at 11:07:02PM CEST, alexander.duyck@gmail.com wrote:
>On Wed, Apr 10, 2024 at 1:01 PM Andrew Lunn <andrew@lunn.ch> wrote:
>>
>> On Wed, Apr 10, 2024 at 08:56:31AM -0700, Alexander Duyck wrote:
>> > On Tue, Apr 9, 2024 at 4:42 PM Andrew Lunn <andrew@lunn.ch> wrote:
>> > >
>> > > > What is less clear to me is what do we do about uAPI / core changes.
>> > >
>> > > I would differentiate between core change and core additions. If there
>> > > is very limited firmware on this device, i assume Linux is managing
>> > > the SFP cage, and to some extend the PCS. Extending the core to handle
>> > > these at higher speeds than currently supported would be one such core
>> > > addition. I've no problem with this. And i doubt it will be a single
>> > > NIC using such additions for too long. It looks like ClearFog CX LX2
>> > > could make use of such extensions as well, and there are probably
>> > > other boards and devices, maybe the Zynq 7000?
>> >
>> > The driver on this device doesn't have full access over the PHY.
>> > Basically we control everything from the PCS north, and the firmware
>> > controls everything from the PMA south as the physical connection is
>> > MUXed between 4 slices. So this means the firmware also controls all
>> > the I2C and the QSFP and EEPROM. The main reason for this is that
>> > those blocks are shared resources between the slices, as such the
>> > firmware acts as the arbitrator for 4 slices and the BMC.
>>
>> Ah, shame. You took what is probably the least valuable intellectual
>> property, and most shareable with the community and locked it up in
>> firmware where nobody can use it.
>>
>> You should probably stop saying there is not much firmware with this
>> device, and that Linux controls it. It clearly does not...
>>
>>         Andrew
>
>Well I was referring more to the data path level more than the phy
>configuration. I suspect different people have different levels of
>expectations on what minimal firmware is. With this hardware we at
>least don't need to use firmware commands to enable or disable queues,
>get the device stats, or update a MAC address.
>
>When it comes to multi-host NICs I am not sure there are going to be
>any solutions that don't have some level of firmware due to the fact

A small linux host on the nic that controls the eswitch perhaps? I mean,
a multi-PF NIC without a host in charge of the physical port and the
switching between it and the PF is simply a broken design. And yeah, you
would probably now want to argue others are doing it already in the same
way :) True that.


>that the cable is physically shared with multiple slots.
>
>I am assuming we still want to do the PCS driver. So I will still see
>what I can do to get that setup.
>
>Thanks,
>
>- Alex
>

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-10 22:37           ` Andrew Lunn
@ 2024-04-11 16:00             ` Alexander Duyck
  2024-04-11 17:32               ` Andrew Lunn
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-11 16:00 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, pabeni, John Fastabend, Alexander Lobakin,
	Florian Fainelli, Daniel Borkmann, Edward Cree, netdev, bhelgaas,
	linux-pci, Alexander Duyck, Willem de Bruijn

On Wed, Apr 10, 2024 at 3:37 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > Well I was referring more to the data path level more than the phy
> > configuration. I suspect different people have different levels of
> > expectations on what minimal firmware is. With this hardware we at
> > least don't need to use firmware commands to enable or disable queues,
> > get the device stats, or update a MAC address.
> >
> > When it comes to multi-host NICs I am not sure there are going to be
> > any solutions that don't have some level of firmware due to the fact
> > that the cable is physically shared with multiple slots.
>
> This is something Russell King at least considered. I don't really
> know enough to know why its impossible for Linux to deal with multiple
> slots.

It mostly has to do with the arbitration between them. It is a matter
of having to pass a TON of info to the individual slice, and then the
problem is it would have to do things correctly and not manage to take
out its neighbor or the BMC.

> > I am assuming we still want to do the PCS driver. So I will still see
> > what I can do to get that setup.
>
> You should look at the API offered by drivers in drivers/net/pcs. It
> is designed to be used with drivers which actually drive the hardware,
> and use phylink. Who is responsible for configuring and looking at the
> results of auto negotiation? Who is responsible for putting the PCS
> into the correct mode depending on the SFP modules capabilities?
> Because you seemed to have split the PCS into two, and hidden some of it
> away, i don't know if it makes sense to try to shoehorn what is left
> into a Linux driver.

We have control of the auto negotiation as that is north of the PMA
and is configured per host. We should support clause 73 autoneg,
although we haven't done much with it as most of our use cases are
just fixed-speed setups to the switch over either 25G-CR1, 50G-CR2,
50G-CR1, or 100G-CR2. So odds are we aren't going to be doing anything
too terribly exciting.

As far as the QSFP setup goes, the FW is responsible for any
communication with it. I suspect that the expectation is that we aren't
going to need much in the way of config since we are just using direct
attach cables. So we are the only ones driving the PCS, assuming we
aren't talking about power-on init where the FW is setting up for the
BMC to have access.

We will have to see. The PCS drivers in that directory mostly make
sense to me and don't look like too much of a departure from my
current code. It will just be a matter of splitting up the fbnic_mac.c
file and adding the PCS logic as a separate block, or at least I hope
that is mostly all that is involved. It will probably take me a week
or two to get it coded up and push out the v2.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-11  6:31                       ` Jiri Pirko
@ 2024-04-11 16:22                         ` Jacob Keller
  0 siblings, 0 replies; 163+ messages in thread
From: Jacob Keller @ 2024-04-11 16:22 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jakub Kicinski, Florian Fainelli, Alexander Duyck, pabeni,
	John Fastabend, Alexander Lobakin, Andrew Lunn, Daniel Borkmann,
	Edward Cree, netdev, bhelgaas, linux-pci, Alexander Duyck,
	Willem de Bruijn



On 4/10/2024 11:31 PM, Jiri Pirko wrote:
> Thu, Apr 11, 2024 at 12:03:54AM CEST, jacob.e.keller@intel.com wrote:
>>
>>
>> On 4/10/2024 12:58 PM, Jakub Kicinski wrote:
>>> On Wed, 10 Apr 2024 11:29:57 -0700 Florian Fainelli wrote:
>>>>> If we are going to be trying to come up with some special status maybe
>>>>> it makes sense to have some status in the MAINTAINERS file that would
>>>>> indicate that this driver is exclusive to some organization and not
>>>>> publicly available so any maintenance would have to be proprietary.  
>>>>
>>>> I like that idea.
>>>
>>> +1, also first idea that came to mind but I was too afraid 
>>> of bike shedding to mention it :) Fingers crossed? :)
>>>
>>
>> +1, I think putting it in MAINTAINERS makes a lot of sense.
> 
> Well, how exactly you imagine to do this? I have no problem using
> MAINTAINERS for this, I was thinking about that too, but I could not
> figure out the way it would work. Having driver directory is much more
> obvious, person cooking up a patch sees that immediately. Do you look
> at MAINTAINERS file when you do some driver API changing patch/ any
> patch? I certainly don't (not counting get_maintainers script).

I use MAINTAINERS (along with get_maintainers) to figure out who to CC
when dealing with a driver. I guess I probably don't do so before making
a change.

I guess it depends on what the intent of documenting it is.
Presumably it would be a hint and reminder to future maintainers that if
the folks listed as MAINTAINERS are no longer responsive then this
driver can be reverted.

Or do you push more strongly towards "it's up to them to do all
maintenance", i.e. we don't make API changes for them, etc.? I didn't
get the sense that was the consensus.

The consensus I got from reading this thread is that most people are ok
with merging the driver. Some have reservations about the future and any
API changes made specifically for the one driver, and some have
reservations about the extra maintenance burden.

Several others pointed out example cases which are similar in
availability, perhaps not quite as obvious as this case, where the
device is produced and consumed by the same group only.

The practical reality is that many of these devices are effectively
exclusive, if not definitionally so.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-11  6:39           ` Jiri Pirko
@ 2024-04-11 16:46             ` Alexander Duyck
  0 siblings, 0 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-11 16:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Andrew Lunn, Jakub Kicinski, pabeni, John Fastabend,
	Alexander Lobakin, Florian Fainelli, Daniel Borkmann,
	Edward Cree, netdev, bhelgaas, linux-pci, Alexander Duyck,
	Willem de Bruijn

On Wed, Apr 10, 2024 at 11:39 PM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Wed, Apr 10, 2024 at 11:07:02PM CEST, alexander.duyck@gmail.com wrote:
> >On Wed, Apr 10, 2024 at 1:01 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >>
> >> On Wed, Apr 10, 2024 at 08:56:31AM -0700, Alexander Duyck wrote:
> >> > On Tue, Apr 9, 2024 at 4:42 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >> > >
> >> > > > What is less clear to me is what do we do about uAPI / core changes.
> >> > >
> >> > > I would differentiate between core change and core additions. If there
> >> > > is very limited firmware on this device, i assume Linux is managing
> >> > > the SFP cage, and to some extent the PCS. Extending the core to handle
> >> > > these at higher speeds than currently supported would be one such core
> >> > > addition. I've no problem with this. And i doubt it will be a single
> >> > > NIC using such additions for too long. It looks like ClearFog CX LX2
> >> > > could make use of such extensions as well, and there are probably
> >> > > other boards and devices, maybe the Zynq 7000?
> >> >
> >> > The driver on this device doesn't have full access over the PHY.
> >> > Basically we control everything from the PCS north, and the firmware
> >> > controls everything from the PMA south as the physical connection is
> >> > MUXed between 4 slices. So this means the firmware also controls all
> >> > the I2C and the QSFP and EEPROM. The main reason for this is that
> >> > those blocks are shared resources between the slices, as such the
> >> > firmware acts as the arbitrator for 4 slices and the BMC.
> >>
> >> Ah, shame. You took what is probably the least valuable intellectual
> >> property, and most shareable with the community and locked it up in
> >> firmware where nobody can use it.
> >>
> >> You should probably stop saying there is not much firmware with this
> >> device, and that Linux controls it. It clearly does not...
> >>
> >>         Andrew
> >
> >Well I was referring more to the data path level more than the phy
> >configuration. I suspect different people have different levels of
> >expectations on what minimal firmware is. With this hardware we at
> >least don't need to use firmware commands to enable or disable queues,
> >get the device stats, or update a MAC address.
> >
> >When it comes to multi-host NICs I am not sure there are going to be
> >any solutions that don't have some level of firmware due to the fact
>
> A small linux host on the nic that controls the eswitch perhaps? I mean,
> a multi-PF NIC without a host in charge of the physical port and the
> switching between it and the PF is simply a broken design. And yeah, you
> would probably now want to argue others are doing it already in the same way :)
> True that.

Well, in our case there isn't an eswitch. The issue is more that the
logic for the Ethernet PHY isn't set up to run only one port. Instead
the PHY is MUXed over 2 ports per interface, and then the QSFP
interface itself is spread over 4 ports.

What you end up with is something like the second-to-last image in
this article[1], where you have a MAC/PCS pair per host sitting on top
of one PMA, with some blocks that are shared between the hosts and some
that are not. The issue becomes management of access to the QSFP and
PHY, and how to prevent one host from monopolizing the PHY/QSFP or
crashing the others if something goes sideways. Then you have to add
in the BMC management on top of that.

[1]: https://semiengineering.com/integrated-ethernet-pcs-and-phy-ip-for-400g-800g-hyperscale-data-centers/

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-11 16:00             ` Alexander Duyck
@ 2024-04-11 17:32               ` Andrew Lunn
  2024-04-11 23:12                 ` Alexander Duyck
  0 siblings, 1 reply; 163+ messages in thread
From: Andrew Lunn @ 2024-04-11 17:32 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jakub Kicinski, pabeni, John Fastabend, Alexander Lobakin,
	Florian Fainelli, Daniel Borkmann, Edward Cree, netdev, bhelgaas,
	linux-pci, Alexander Duyck, Willem de Bruijn

On Thu, Apr 11, 2024 at 09:00:17AM -0700, Alexander Duyck wrote:
> On Wed, Apr 10, 2024 at 3:37 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > Well I was referring more to the data path level more than the phy
> > > configuration. I suspect different people have different levels of
> > > expectations on what minimal firmware is. With this hardware we at
> > > least don't need to use firmware commands to enable or disable queues,
> > > get the device stats, or update a MAC address.
> > >
> > > When it comes to multi-host NICs I am not sure there are going to be
> > > any solutions that don't have some level of firmware due to the fact
> > > that the cable is physically shared with multiple slots.
> >
> > This is something Russell King at least considered. I don't really
> > know enough to know why its impossible for Linux to deal with multiple
> > slots.
> 
> It mostly has to do with the arbitration between them. It is a matter
> of having to pass a TON of info to the individual slice and then the
> problem is it would have to do things correctly and not manage to take
> > out its neighbor or the BMC.

How much is specific to your device? How much is just following 802.3
and the CMIS standards? I assume anything which is just following
802.3 and CMIS could actually be re-used? And you have some glue to
combine them in a way that is specific to your device?
 
> > > I am assuming we still want to do the PCS driver. So I will still see
> > > what I can do to get that setup.
> >
> > You should look at the API offered by drivers in drivers/net/pcs. It
> > is designed to be used with drivers which actually drive the hardware,
> > and use phylink. Who is responsible for configuring and looking at the
> > results of auto negotiation? Who is responsible for putting the PCS
> > into the correct mode depending on the SFP modules capabilities?
> > Because you seemed to have split the PCS into two, and hidden some of it
> > away, i don't know if it makes sense to try to shoehorn what is left
> > into a Linux driver.
> 
> We have control of the auto negotiation as that is north of the PMA
> and is configured per host. We should support clause 73 autoneg.
> Although we haven't done much with it as most of our use cases are
> just fixed speed setups to the switch over either 25G-CR1, 50G-CR2,
> 50G-CR1, or 100G-CR2. So odds are we aren't going to be doing anything
> too terribly exciting.

Maybe not, but you might have gained from the community here, if others
could have adopted this code for their devices. You might not need
clause 73, but phylink provides helpers to implement it, so it is
pretty easy to add. Maybe your initial PCS driver does not support it,
but later adopters who also license this PCS might add it, and you get
the feature for free. The corrected/uncorrected counters I asked
about are something you might not export in your current code via
ethtool. But again, this is something which somebody else could add a
helper for, and you would get it nearly for free.

> As far as the QSFP setup the FW is responsible for any communication
> with it. I suspect that the expectation is that we aren't going to
> need much in the way of config since we are just using direct attach
> cables.

Another place you might have gotten features for free. The Linux SFP
driver exports HWMON values for temperature, power, received power,
etc., but for 1G. The QSFP+ standard Versatile Diagnostics Monitoring
is different, but I could see somebody adding a generic implementation
in the Linux SFP driver, so that the HWMON support is just free. Same
goes for the error performance statistics. Parts of power management
could easily be generic. It might be possible to use Linux regulators
to describe what your board is capable of, and the SFP core could then
implement the ethtool ops, checking with the regulator to see if the
power is actually available, and then talking to the SFP to tell it to
change its power class.

Florian posted some interesting statistics showing that vendors tend to
maintain their own drivers and don't get much support from the
community. However, I suspect it is a different story for shared
infrastructure like PCS drivers, PHY drivers, and SFP drivers. That is
where you get the most community support and the most stuff for free.
But you actually have to use it to benefit from it.

	Andrew

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface
  2024-04-11 17:32               ` Andrew Lunn
@ 2024-04-11 23:12                 ` Alexander Duyck
  0 siblings, 0 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-11 23:12 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, pabeni, John Fastabend, Alexander Lobakin,
	Florian Fainelli, Daniel Borkmann, Edward Cree, netdev, bhelgaas,
	linux-pci, Alexander Duyck, Willem de Bruijn

On Thu, Apr 11, 2024 at 10:32 AM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Thu, Apr 11, 2024 at 09:00:17AM -0700, Alexander Duyck wrote:
> > On Wed, Apr 10, 2024 at 3:37 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > > Well I was referring more to the data path level more than the phy
> > > > configuration. I suspect different people have different levels of
> > > > expectations on what minimal firmware is. With this hardware we at
> > > > least don't need to use firmware commands to enable or disable queues,
> > > > get the device stats, or update a MAC address.
> > > >
> > > > When it comes to multi-host NICs I am not sure there are going to be
> > > > any solutions that don't have some level of firmware due to the fact
> > > > that the cable is physically shared with multiple slots.
> > >
> > > This is something Russell King at least considered. I don't really
> > > know enough to know why its impossible for Linux to deal with multiple
> > > slots.
> >
> > It mostly has to do with the arbitration between them. It is a matter
> > of having to pass a TON of info to the individual slice and then the
> > problem is it would have to do things correctly and not manage to take
> > out it's neighbor or the BMC.
>
> How much is specific to your device? How much is just following 802.3
> and the CMIS standards? I assume anything which is just following
> 802.3 and CMIS could actually be re-used? And you have some glue to
> combine them in a way that is specific to your device?
>
> > > > I am assuming we still want to do the PCS driver. So I will still see
> > > > what I can do to get that setup.
> > >
> > > You should look at the API offered by drivers in drivers/net/pcs. It
> > > is designed to be used with drivers which actually drive the hardware,
> > > and use phylink. Who is responsible for configuring and looking at the
> > > results of auto negotiation? Who is responsible for putting the PCS
> > > into the correct mode depending on the SFP modules capabilities?
> > > Because you seemed to have split the PCS into two, and hidden some of it
> > > away, i don't know if it makes sense to try to shoehorn what is left
> > > into a Linux driver.
> >
> > We have control of the auto negotiation as that is north of the PMA
> > and is configured per host. We should support clause 73 autoneg.
> > Although we haven't done much with it as most of our use cases are
> > just fixed speed setups to the switch over either 25G-CR1, 50G-CR2,
> > 50G-CR1, or 100G-CR2. So odds are we aren't going to be doing anything
> > too terribly exciting.
>
> Maybe not, but you might have gained from the community here, if others
> could have adopted this code for their devices. You might not need
> clause 73, but phylink provides helpers to implement it, so it is
> pretty easy to add. Maybe your initial PCS driver does not support it,
> but later adopters who also licence this PCS might add it, and you get
> the feature for free. The corrected/uncorrected counters i asked
> about, are something you might not export in your current code via
> ethtool. But again, this is something which somebody else could add a
> helper for, and you would get it nearly for free.

You don't have to sell me on the reuse advantages of open source. I
will probably look at adding autoneg at some point in the future, but
for our main use case it wasn't needed. If nothing else I will
probably hand it off to one of the new hires on the team when I get
some time.

The counters are exported. We just haven't gotten far enough to show
the ethtool patches yet. :-)

> > As far as the QSFP setup the FW is responsible for any communication
> > with it. I suspect that the expectation is that we aren't going to
> > need much in the way of config since we are just using direct attach
> > cables.
>
> Another place you might have gotten features for free. The Linux SFP driver
> exports HWMON values for temperature, power, received power, etc, but
> for 1G. The QSFP+ standard Versatile Diagnostics Monitoring is
> different, but i could see somebody adding a generic implementation in
> the Linux SFP driver, so that the HWMON support is just free. Same
> goes for the error performance statistics. Parts of power management
> could easily be generic. It might be possible to use Linux regulators
> to describe what your board is capable of, and the SFP core could then
> implement the ethtool ops, checking with the regulator to see if the
> power is actually available, and then talking to the SFP to tell it to
> change its power class?

Again, for us there ends up not being much value in adding additional
QSFP logic because we aren't using anything fancy. It is all just
direct attach cables.

> Florian posted some interesting statistics, that vendors tend to
> maintain their own drivers, and don't get much support from the
> community. However I suspect it is a different story for shared
> infrastructure like PCS drivers, PHY drivers, SFP drivers. That is
> where you get the most community support and the most stuff for free.
> But you actually have to use it to benefit from it.

I'll probably get started on the PCS drivers for this next week. I
will follow up with questions if I run into any issues.

Thanks,

- Alex

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-10 15:03         ` Alexander Duyck
@ 2024-04-12  8:43           ` Yunsheng Lin
  2024-04-12  9:47             ` Yunsheng Lin
  2024-04-12 15:05             ` Alexander Duyck
  0 siblings, 2 replies; 163+ messages in thread
From: Yunsheng Lin @ 2024-04-12  8:43 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On 2024/4/10 23:03, Alexander Duyck wrote:
> On Wed, Apr 10, 2024 at 4:54 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2024/4/9 23:08, Alexander Duyck wrote:
>>> On Tue, Apr 9, 2024 at 4:47 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>>>
>>>> On 2024/4/4 4:09, Alexander Duyck wrote:
>>>>> From: Alexander Duyck <alexanderduyck@fb.com>
>>>
>>> [...]
>>>
>>>>> +     /* Unmap and free processed buffers */
>>>>> +     if (head0 >= 0)
>>>>> +             fbnic_clean_bdq(nv, budget, &qt->sub0, head0);
>>>>> +     fbnic_fill_bdq(nv, &qt->sub0);
>>>>> +
>>>>> +     if (head1 >= 0)
>>>>> +             fbnic_clean_bdq(nv, budget, &qt->sub1, head1);
>>>>> +     fbnic_fill_bdq(nv, &qt->sub1);
>>>>
>>>> I am not sure how complicated the rx handling will be for the advanced
>>>> feature. For the current code, for each entry/desc in both qt->sub0 and
>>>> qt->sub1 at least need one page, and the page seems to be only used once
>>>> no matter however small the page is used?
>>>>
>>>> I am assuming you want to do 'tightly optimized' operation for this by
>>>> calling page_pool_fragment_page(), but manipulating page->pp_ref_count
>>>> directly does not seems to add any value for the current code, but seem
>>>> to waste a lot of memory by not using the frag API, especially PAGE_SIZE
>>>>> 4K?
>>>
>>> On this hardware both the header and payload buffers are fragmentable.
>>> The hardware decides the partitioning and we just follow it. So for
>>> example it wouldn't be uncommon to have a jumbo frame split up such
>>> that the header is less than 128B plus SKB overhead while the actual
>>> data in the payload is just over 1400. So for us fragmenting the pages
>>> is a very likely case especially with smaller packets.
>>
>> I understand that is what you are trying to do, but the code above does
>> not seems to match the description, as the fbnic_clean_bdq() and
>> fbnic_fill_bdq() are called for qt->sub0 and qt->sub1, so the old pages
>> of qt->sub0 and qt->sub1 just cleaned are drained and refill each sub
>> with new pages, which does not seems to have any fragmenting?
> 
> That is because it is all taken care of by the completion queue. Take
> a look in fbnic_pkt_prepare. We are taking the buffer from the header
> descriptor and taking a slice out of it there via fbnic_page_pool_get.
> Basically we store the fragment count locally in the rx_buf and then
> subtract what is leftover when the device is done with it.

The above looks a lot like the prepare/commit API in [1]: the prepare
is done in fbnic_fill_bdq() and the commit is done by
fbnic_page_pool_get() in fbnic_pkt_prepare() and fbnic_add_rx_frag().

If page_pool were able to provide a central place for the pagecnt_bias
of all the fragments of the same page, we might be able to provide a
similar prepare/commit API for the frag API; I am not sure how to
handle it for now.

From the below macro, this hw seems to be able to handle only 4K of
memory for each entry/desc in qt->sub0 and qt->sub1, so there seems to
be a lot of memory that goes unused for PAGE_SIZE > 4K, as memory is
allocated at page granularity for each rx_buf in qt->sub0 and qt->sub1.

+#define FBNIC_RCD_AL_BUFF_OFF_MASK		DESC_GENMASK(43, 32)

It is still possible to reserve enough pagecnt_bias for each fragment,
so that the caller can still do its own fragmenting at fragment
granularity, as we seem to have enough pagecnt_bias for each page.

If we provide a proper frag API to reserve enough pagecnt_bias for the
caller to do its own fragmenting, then the memory waste may be avoided
for this hw on systems with PAGE_SIZE > 4K.

1. https://lore.kernel.org/lkml/20240407130850.19625-10-linyunsheng@huawei.com/

> 
>> The fragmenting can only happen when there is continuous small packet
>> coming from wire so that hw can report the same pg_id for different
>> packet with pg_offset before fbnic_clean_bdq() and fbnic_fill_bdq()
>> is called? I am not sure how to ensure that considering that we might
>> break out of while loop in fbnic_clean_rcq() because of 'packets < budget'
>> checking.
> 
> We don't free the page until we have moved one past it, or the
> hardware has indicated it will take no more slices via a PAGE_FIN bit
> in the descriptor.


I looked more closely at it, and I am not able to figure out how it is
done yet, as the PAGE_FIN bit mentioned above seems to be used only to
calculate hdr_pg_end and truesize in fbnic_pkt_prepare() and
fbnic_add_rx_frag().

For the below flow in fbnic_clean_rcq(), fbnic_clean_bdq() will be
called to drain the page in the rx_buf just cleaned when
head0/head1 >= 0, so I am not sure how it does the fragmenting yet;
am I missing something obvious here?

	while (likely(packets < budget)) {
		switch (FIELD_GET(FBNIC_RCD_TYPE_MASK, rcd)) {
		case FBNIC_RCD_TYPE_HDR_AL:
			head0 = FIELD_GET(FBNIC_RCD_AL_BUFF_ID_MASK, rcd);
			fbnic_pkt_prepare(nv, rcd, pkt, qt);

			break;
		case FBNIC_RCD_TYPE_PAY_AL:
			head1 = FIELD_GET(FBNIC_RCD_AL_BUFF_ID_MASK, rcd);
			fbnic_add_rx_frag(nv, rcd, pkt, qt);

			break;

		case FBNIC_RCD_TYPE_META:
			if (likely(!fbnic_rcd_metadata_err(rcd)))
				skb = fbnic_build_skb(nv, pkt);

			/* populate skb and invalidate XDP */
			if (!IS_ERR_OR_NULL(skb)) {
				fbnic_populate_skb_fields(nv, rcd, skb, qt);

				packets++;

				napi_gro_receive(&nv->napi, skb);
			}

			pkt->buff.data_hard_start = NULL;

			break;
		}

		/* Unmap and free processed buffers */
		if (head0 >= 0)
			fbnic_clean_bdq(nv, budget, &qt->sub0, head0);
		fbnic_fill_bdq(nv, &qt->sub0);

		if (head1 >= 0)
			fbnic_clean_bdq(nv, budget, &qt->sub1, head1);
		fbnic_fill_bdq(nv, &qt->sub1);
	}

> 
>>> It is better for us to optimize for the small packet scenario than
>>> optimize for the case where 4K slices are getting taken. That way when
>>> we are CPU constrained handling small packets we are the most
>>> optimized whereas for the larger frames we can spare a few cycles to
>>> account for the extra overhead. The result should be a higher overall
>>> packets per second.
>>
>> The problem is that small packet means low utilization of the bandwidth
>> as more bandwidth is used to send header instead of payload that is useful
>> for the user, so the question seems to be how often the small packet is
>> seen in the wire?
> 
> Very often. Especially when you are running something like servers
> where the flow usually consists of an incoming request which is often
> only a few hundred bytes, followed by us sending a response which then
> leads to a flow of control frames for it.

I think this depends on the use case; if it is a video streaming
server, I guess most of the packets are MTU-sized?

> .
> 

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-12  8:43           ` Yunsheng Lin
@ 2024-04-12  9:47             ` Yunsheng Lin
  2024-04-12 15:05             ` Alexander Duyck
  1 sibling, 0 replies; 163+ messages in thread
From: Yunsheng Lin @ 2024-04-12  9:47 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On 2024/4/12 16:43, Yunsheng Lin wrote:
> On 2024/4/10 23:03, Alexander Duyck wrote:

> 
> If we provide a proper frag API to reserve enough pagecnt_bias for caller to
> do its own fragmenting, then the memory waste may be avoided for this hw in
> system with PAGE_SIZE > 4K.
> 

Something like the page_pool_alloc_frag_bias() API below:

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 1d397c1a0043..018943307e68 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -92,6 +92,14 @@ static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
        return page_pool_alloc_pages(pool, gfp);
 }

+static inline struct page *page_pool_alloc_frag(struct page_pool *pool,
+                                               unsigned int *offset,
+                                               unsigned int size,
+                                               gfp_t gfp)
+{
+       return page_pool_alloc_frag_bias(pool, offset, size, 1U, gfp);
+}
+
 /**
  * page_pool_dev_alloc_frag() - allocate a page fragment.
  * @pool: pool from which to allocate
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 5e43a08d3231..2847b96264e5 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -202,8 +202,9 @@ struct page_pool {
 };

 struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
-struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset,
-                                 unsigned int size, gfp_t gfp);
+struct page *page_pool_alloc_frag_bias(struct page_pool *pool,
+                                      unsigned int *offset, unsigned int size,
+                                      unsigned int bias, gfp_t gfp);
 struct page_pool *page_pool_create(const struct page_pool_params *params);
 struct page_pool *page_pool_create_percpu(const struct page_pool_params *params,
                                          int cpuid);
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 4c175091fc0a..441b54473c35 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -850,9 +850,11 @@ static void page_pool_free_frag(struct page_pool *pool)
        page_pool_return_page(pool, page);
 }

-struct page *page_pool_alloc_frag(struct page_pool *pool,
-                                 unsigned int *offset,
-                                 unsigned int size, gfp_t gfp)
+struct page *page_pool_alloc_frag_bias(struct page_pool *pool,
+                                      unsigned int *offset,
+                                      unsigned int size,
+                                      unsigned int bias,
+                                      gfp_t gfp)
 {
        unsigned int max_size = PAGE_SIZE << pool->p.order;
        struct page *page = pool->frag_page;
@@ -881,19 +883,19 @@ struct page *page_pool_alloc_frag(struct page_pool *pool,
                pool->frag_page = page;

 frag_reset:
-               pool->frag_users = 1;
+               pool->frag_users = bias;
                *offset = 0;
                pool->frag_offset = size;
                page_pool_fragment_page(page, BIAS_MAX);
                return page;
        }

-       pool->frag_users++;
+       pool->frag_users += bias;
        pool->frag_offset = *offset + size;
        alloc_stat_inc(pool, fast);
        return page;
 }
-EXPORT_SYMBOL(page_pool_alloc_frag);
+EXPORT_SYMBOL(page_pool_alloc_frag_bias);

 static void page_pool_empty_ring(struct page_pool *pool)
 {


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-12  8:43           ` Yunsheng Lin
  2024-04-12  9:47             ` Yunsheng Lin
@ 2024-04-12 15:05             ` Alexander Duyck
  2024-04-15 13:19               ` Yunsheng Lin
  1 sibling, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-12 15:05 UTC (permalink / raw)
  To: Yunsheng Lin; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On Fri, Apr 12, 2024 at 1:43 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2024/4/10 23:03, Alexander Duyck wrote:
> > On Wed, Apr 10, 2024 at 4:54 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>
> >> On 2024/4/9 23:08, Alexander Duyck wrote:
> >>> On Tue, Apr 9, 2024 at 4:47 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>>>
> >>>> On 2024/4/4 4:09, Alexander Duyck wrote:
> >>>>> From: Alexander Duyck <alexanderduyck@fb.com>
> >>>
> >>> [...]
> >>>
> >>>>> +     /* Unmap and free processed buffers */
> >>>>> +     if (head0 >= 0)
> >>>>> +             fbnic_clean_bdq(nv, budget, &qt->sub0, head0);
> >>>>> +     fbnic_fill_bdq(nv, &qt->sub0);
> >>>>> +
> >>>>> +     if (head1 >= 0)
> >>>>> +             fbnic_clean_bdq(nv, budget, &qt->sub1, head1);
> >>>>> +     fbnic_fill_bdq(nv, &qt->sub1);
> >>>>
> >>>> I am not sure how complicated the rx handling will be for the advanced
> >>>> features. For the current code, each entry/desc in both qt->sub0 and
> >>>> qt->sub1 needs at least one page, and the page seems to be used only
> >>>> once no matter how small a portion of it is used?
> >>>>
> >>>> I am assuming you want to do a 'tightly optimized' operation for this by
> >>>> calling page_pool_fragment_page(), but manipulating page->pp_ref_count
> >>>> directly does not seem to add any value for the current code, and seems
> >>>> to waste a lot of memory by not using the frag API, especially with
> >>>> PAGE_SIZE > 4K?
> >>>
> >>> On this hardware both the header and payload buffers are fragmentable.
> >>> The hardware decides the partitioning and we just follow it. So for
> >>> example it wouldn't be uncommon to have a jumbo frame split up such
> >>> that the header is less than 128B plus SKB overhead while the actual
> >>> data in the payload is just over 1400. So for us fragmenting the pages
> >>> is a very likely case especially with smaller packets.
> >>
> >> I understand that is what you are trying to do, but the code above does
> >> not seem to match the description, as fbnic_clean_bdq() and
> >> fbnic_fill_bdq() are called for qt->sub0 and qt->sub1, so the old pages
> >> of qt->sub0 and qt->sub1 that were just cleaned are drained, and each sub
> >> is refilled with new pages, which does not seem to involve any fragmenting?
> >
> > That is because it is all taken care of by the completion queue. Take
> > a look in fbnic_pkt_prepare. We are taking the buffer from the header
> > descriptor and taking a slice out of it there via fbnic_page_pool_get.
> > Basically we store the fragment count locally in the rx_buf and then
> > subtract what is leftover when the device is done with it.
>
> The above seems look a lot like the prepare/commit API in [1], the prepare
> is done in fbnic_fill_bdq() and commit is done by fbnic_page_pool_get() in
> fbnic_pkt_prepare() and fbnic_add_rx_frag().
>
> If page_pool is able to provide a central place for the pagecnt_bias of all
> the fragments of the same page, we may provide a similar prepare/commit API
> for the frag API; I am not sure how to handle it for now.
>
> From the below macro, this hw seems to be able to handle only 4K of memory
> for each entry/desc in qt->sub0 and qt->sub1, so there seems to be a lot of
> memory that is unused for PAGE_SIZE > 4K, as it is allocating memory at page
> granularity for each rx_buf in qt->sub0 and qt->sub1.
>
> +#define FBNIC_RCD_AL_BUFF_OFF_MASK             DESC_GENMASK(43, 32)

The advantage of being a purpose built driver is that we aren't
running on any architectures where the PAGE_SIZE > 4K. If it came to
that we could probably look at splitting the pages within the
descriptors by simply having a single page span multiple descriptors.

> It is still possible to reserve enough pagecnt_bias for each fragment, so
> that the caller can still do its own fragmenting at fragment granularity,
> as we seem to have enough pagecnt_bias for each page.
>
> If we provide a proper frag API to reserve enough pagecnt_bias for caller to
> do its own fragmenting, then the memory waste may be avoided for this hw in
> system with PAGE_SIZE > 4K.
>
> 1. https://lore.kernel.org/lkml/20240407130850.19625-10-linyunsheng@huawei.com/

That isn't a concern for us as we are only using the device on x86
systems at this time.

> >
> >> The fragmenting can only happen when there are continuous small packets
> >> coming from the wire, so that the hw can report the same pg_id with a
> >> different pg_offset for each packet before fbnic_clean_bdq() and
> >> fbnic_fill_bdq() are called? I am not sure how to ensure that, considering
> >> that we might break out of the while loop in fbnic_clean_rcq() because of
> >> the 'packets < budget' check.
> >
> > We don't free the page until we have moved one past it, or the
> > hardware has indicated it will take no more slices via a PAGE_FIN bit
> > in the descriptor.
>
>
> I looked more closely at it, and I am not able to figure out how it is done
> yet, as the PAGE_FIN bit mentioned above seems to be only used to calculate
> the hdr_pg_end and truesize in fbnic_pkt_prepare() and fbnic_add_rx_frag().
>
> For the below flow in fbnic_clean_rcq(), fbnic_clean_bdq() will be called
> to drain the page in the rx_buf just cleaned when head0/head1 >= 0, so I am
> not sure how it does the fragmenting yet; am I missing something obvious here?
>
>         while (likely(packets < budget)) {
>                 switch (FIELD_GET(FBNIC_RCD_TYPE_MASK, rcd)) {
>                 case FBNIC_RCD_TYPE_HDR_AL:
>                         head0 = FIELD_GET(FBNIC_RCD_AL_BUFF_ID_MASK, rcd);
>                         fbnic_pkt_prepare(nv, rcd, pkt, qt);
>
>                         break;
>                 case FBNIC_RCD_TYPE_PAY_AL:
>                         head1 = FIELD_GET(FBNIC_RCD_AL_BUFF_ID_MASK, rcd);
>                         fbnic_add_rx_frag(nv, rcd, pkt, qt);
>
>                         break;
>
>                 case FBNIC_RCD_TYPE_META:
>                         if (likely(!fbnic_rcd_metadata_err(rcd)))
>                                 skb = fbnic_build_skb(nv, pkt);
>
>                         /* populate skb and invalidate XDP */
>                         if (!IS_ERR_OR_NULL(skb)) {
>                                 fbnic_populate_skb_fields(nv, rcd, skb, qt);
>
>                                 packets++;
>
>                                 napi_gro_receive(&nv->napi, skb);
>                         }
>
>                         pkt->buff.data_hard_start = NULL;
>
>                         break;
>                 }
>
>         /* Unmap and free processed buffers */
>         if (head0 >= 0)
>                 fbnic_clean_bdq(nv, budget, &qt->sub0, head0);
>         fbnic_fill_bdq(nv, &qt->sub0);
>
>         if (head1 >= 0)
>                 fbnic_clean_bdq(nv, budget, &qt->sub1, head1);
>         fbnic_fill_bdq(nv, &qt->sub1);
>
>         }

The cleanup logic cleans everything up to but not including the
head0/head1 offsets. So the pages are left on the ring until they are
fully consumed.
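
As a rough model of that bookkeeping (illustrative user-space C with invented
names and an invented MODEL_BIAS_MAX, not the driver's actual code): the page
is seeded with a large reference bias up front, each slice the hardware hands
to the stack costs one reference, and when the page is fully consumed the
leftover references are returned in a single operation rather than one atomic
per slice.

```c
#include <assert.h>

/* Hypothetical stand-in for the driver's BIAS_MAX seeding. */
#define MODEL_BIAS_MAX 32

struct model_rx_buf {
	int pagecnt_bias;	/* references the driver still holds */
};

/* Refill: fragment the page once, driver owns all references. */
static void model_fill(struct model_rx_buf *buf)
{
	buf->pagecnt_bias = MODEL_BIAS_MAX;
}

/* Completion reported a slice in use: one reference goes to the stack. */
static void model_hw_used_slice(struct model_rx_buf *buf)
{
	buf->pagecnt_bias--;
}

/* Page retired (e.g. PAGE_FIN): return the unused remainder in one put.
 * Returns the number of references released. */
static int model_clean(struct model_rx_buf *buf)
{
	int leftover = buf->pagecnt_bias;

	buf->pagecnt_bias = 0;
	return leftover;
}
```

The point of the pattern is that only the final `model_clean()` step would
touch the shared page refcount; the per-slice accounting stays local to the
ring.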

> >
> >>> It is better for us to optimize for the small packet scenario than
> >>> optimize for the case where 4K slices are getting taken. That way when
> >>> we are CPU constrained handling small packets we are the most
> >>> optimized whereas for the larger frames we can spare a few cycles to
> >>> account for the extra overhead. The result should be a higher overall
> >>> packets per second.
> >>
> >> The problem is that small packets mean low utilization of the bandwidth,
> >> as more bandwidth is used to send headers instead of payload that is
> >> useful for the user, so the question seems to be how often small packets
> >> are seen on the wire?
> >
> > Very often. Especially when you are running something like servers
> > where the flow usually consists of an incoming request which is often
> > only a few hundred bytes, followed by us sending a response which then
> > leads to a flow of control frames for it.
>
> I think this depends on the use case; if it is a video streaming server,
> I guess most of the packets are mtu-sized?

For the transmit side, yes. For the server side, no. A typical TCP flow
has two sides to it: one sending SYN/ACK/FIN segments and the initial
GET request, and the other basically sending the response data.


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-12 15:05             ` Alexander Duyck
@ 2024-04-15 13:19               ` Yunsheng Lin
  2024-04-15 15:03                 ` Alexander Duyck
  0 siblings, 1 reply; 163+ messages in thread
From: Yunsheng Lin @ 2024-04-15 13:19 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On 2024/4/12 23:05, Alexander Duyck wrote:

...

>>
>> From the below macro, this hw seems to be only able to handle 4K memory for
>> each entry/desc in qt->sub0 and qt->sub1, so there seems to be a lot of memory
>> that is unused for PAGE_SIZE > 4K as it is allocating memory based on page
>> granularity for each rx_buf in qt->sub0 and qt->sub1.
>>
>> +#define FBNIC_RCD_AL_BUFF_OFF_MASK             DESC_GENMASK(43, 32)
> 
> The advantage of being a purpose built driver is that we aren't
> running on any architectures where the PAGE_SIZE > 4K. If it came to

I am not sure the 'being a purpose built driver' argument is strong enough
here; at least the Kconfig does not seem to be suggesting it is a purpose
built driver, perhaps add a 'depend on' to suggest that?

> that we could probably look at splitting the pages within the
> descriptors by simply having a single page span multiple descriptors.

My point is that we might be able to meet the above use case with a proper
API without the driver manipulating the reference counting by calling
page_pool_fragment_page() directly.


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-15 13:19               ` Yunsheng Lin
@ 2024-04-15 15:03                 ` Alexander Duyck
  2024-04-15 17:11                   ` Jakub Kicinski
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-15 15:03 UTC (permalink / raw)
  To: Yunsheng Lin; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On Mon, Apr 15, 2024 at 6:19 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2024/4/12 23:05, Alexander Duyck wrote:
>
> ...
>
> >>
> >> From the below macro, this hw seems to be only able to handle 4K memory for
> >> each entry/desc in qt->sub0 and qt->sub1, so there seems to be a lot of memory
> >> that is unused for PAGE_SIZE > 4K as it is allocating memory based on page
> >> granularity for each rx_buf in qt->sub0 and qt->sub1.
> >>
> >> +#define FBNIC_RCD_AL_BUFF_OFF_MASK             DESC_GENMASK(43, 32)
> >
> > The advantage of being a purpose built driver is that we aren't
> > running on any architectures where the PAGE_SIZE > 4K. If it came to
>
> I am not sure if 'being a purpose built driver' argument is strong enough
> here, at least the Kconfig does not seems to be suggesting it is a purpose
> built driver, perhaps add a 'depend on' to suggest that?

I'm not sure if you have been following the other threads. One of the
general thoughts of pushback against this driver was that Meta is
currently the only company that will have possession of this NIC. As
such Meta will be deciding what systems it goes into and as a result
of that we aren't likely to be running it on systems with 64K pages.

> > that we could probably look at splitting the pages within the
> > descriptors by simply having a single page span multiple descriptors.
>
> My point is that we might be able to meet the above use case with a proper
> API without driver manipulating the reference counting by calling
> page_pool_fragment_page() directly.

My suggestion would be to look at putting your proposed API together
as something that can be used by another driver. Once we hit that I
can then look at incorporating it into fbnic. One issue right now is
that the current patch set is meant to make use of existing APIs
instead of relying on creating new ones; since this isn't a device
others will have access to, it will be harder to test any proposed
API based only on fbnic.


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-15 15:03                 ` Alexander Duyck
@ 2024-04-15 17:11                   ` Jakub Kicinski
  2024-04-15 18:03                     ` Alexander Duyck
  0 siblings, 1 reply; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-15 17:11 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Yunsheng Lin, netdev, Alexander Duyck, davem, pabeni

On Mon, 15 Apr 2024 08:03:38 -0700 Alexander Duyck wrote:
> > > The advantage of being a purpose built driver is that we aren't
> > > running on any architectures where the PAGE_SIZE > 4K. If it came to  
> >
> > I am not sure if 'being a purpose built driver' argument is strong enough
> > here, at least the Kconfig does not seems to be suggesting it is a purpose
> > built driver, perhaps add a 'depend on' to suggest that?  
> 
> I'm not sure if you have been following the other threads. One of the
> general thoughts of pushback against this driver was that Meta is
> currently the only company that will have possession of this NIC. As
> such Meta will be deciding what systems it goes into and as a result
> of that we aren't likely to be running it on systems with 64K pages.

Didn't take long for this argument to float to the surface..

We tried to write some rules with Paolo but haven't published them, yet.
Here is one that may be relevant:

  3. External contributions
  -------------------------

  Owners of drivers for private devices must not exhibit a stronger
  sense of ownership or push back on accepting code changes from
  members of the community. 3rd party contributions should be evaluated
  and eventually accepted, or challenged only on technical arguments
  based on the code itself. In particular, the argument that the owner
  is the only user and therefore knows best should not be used.

Not exactly a contribution, but we predicted the "we know best"
tone of the argument :(


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-15 17:11                   ` Jakub Kicinski
@ 2024-04-15 18:03                     ` Alexander Duyck
  2024-04-15 18:19                       ` Jakub Kicinski
  2024-04-16 14:05                       ` Alexander Lobakin
  0 siblings, 2 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-15 18:03 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Yunsheng Lin, netdev, Alexander Duyck, davem, pabeni

On Mon, Apr 15, 2024 at 10:11 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 15 Apr 2024 08:03:38 -0700 Alexander Duyck wrote:
> > > > The advantage of being a purpose built driver is that we aren't
> > > > running on any architectures where the PAGE_SIZE > 4K. If it came to
> > >
> > > I am not sure if 'being a purpose built driver' argument is strong enough
> > > here, at least the Kconfig does not seems to be suggesting it is a purpose
> > > built driver, perhaps add a 'depend on' to suggest that?
> >
> > I'm not sure if you have been following the other threads. One of the
> > general thoughts of pushback against this driver was that Meta is
> > currently the only company that will have possession of this NIC. As
> > such Meta will be deciding what systems it goes into and as a result
> > of that we aren't likely to be running it on systems with 64K pages.
>
> Didn't take long for this argument to float to the surface..

This wasn't my full argument. You truncated the part where I
specifically called out that it is hard to justify us pushing a
proprietary API that is only used by our driver.

> We tried to write some rules with Paolo but haven't published them, yet.
> Here is one that may be relevant:
>
>   3. External contributions
>   -------------------------
>
>   Owners of drivers for private devices must not exhibit a stronger
>   sense of ownership or push back on accepting code changes from
>   members of the community. 3rd party contributions should be evaluated
>   and eventually accepted, or challenged only on technical arguments
>   based on the code itself. In particular, the argument that the owner
>   is the only user and therefore knows best should not be used.
>
> Not exactly a contribution, but we predicted the "we know best"
> tone of the argument :(

The "we know best" is more of an "I know best" as someone who has
worked with page pool and the page fragment API since well before it
existed. My push back is based on the fact that we don't want to
allocate fragments, we want to allocate pages and fragment them
ourselves after the fact. As such it doesn't make much sense to add an
API that will have us trying to use the page fragment API which holds
onto the page when the expectation is that we will take the whole
thing and just fragment it ourselves.

If we are going to use the page fragment API we need the ability to
convert a page pool page to a fragment in place, not have it get
pulled into the page fragment API and then immediately yanked right
back out. On top of that we don't need to be making significant
changes to the API that will slow down all the other users to
accommodate a driver that will not be used by most users.

This is a case where I agree with Jiri. It doesn't make sense to slow
down all the other users of the page fragment API by making it so that
we can pull variable sliced batches from the page pool fragment
interface. It makes much more sense for us to just fragment in place
and add as little overhead to the other users of page pool APIs as
possible.


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-15 18:03                     ` Alexander Duyck
@ 2024-04-15 18:19                       ` Jakub Kicinski
  2024-04-15 18:55                         ` Alexander Duyck
  2024-04-16 14:05                       ` Alexander Lobakin
  1 sibling, 1 reply; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-15 18:19 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Yunsheng Lin, netdev, Alexander Duyck, davem, pabeni

On Mon, 15 Apr 2024 11:03:13 -0700 Alexander Duyck wrote:
> This wasn't my full argument. You truncated the part where I
> specifically called out that it is hard to justify us pushing a
> proprietary API that is only used by our driver.

I see. Please be careful when making such arguments, tho.

> The "we know best" is more of an "I know best" as someone who has
> worked with page pool and the page fragment API since well before it
> existed. My push back is based on the fact that we don't want to
> allocate fragments, we want to allocate pages and fragment them
> ourselves after the fact. As such it doesn't make much sense to add an
> API that will have us trying to use the page fragment API which holds
> onto the page when the expectation is that we will take the whole
> thing and just fragment it ourselves.

To be clear I'm not arguing for the exact use of the API as suggested.
Or even that we should support this in the shared API. One would
probably have to take a stab at coding it up to find out what works
best. My first try FWIW would be to mask off the low bits of the
page index, eg. for 64k page making entries 0-15 all use rx_buf 
index 0...
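
For what it's worth, that masking could look something like the sketch below
(the MODEL_* names and shift values are invented for illustration, assuming
64K pages over 4K hardware buffer units):

```c
/* With 64K pages each page backs 16 4K buffer units, so shifting off
 * the low bits of the descriptor index maps entries 0-15 to rx_buf
 * slot 0, entries 16-31 to slot 1, and so on. */
#define MODEL_PAGE_SHIFT 16				/* 64K pages */
#define MODEL_BUF_SHIFT  12				/* 4K buffer units */
#define MODEL_LOSE_BITS  (MODEL_PAGE_SHIFT - MODEL_BUF_SHIFT)

static unsigned int model_rx_buf_slot(unsigned int desc_idx)
{
	return desc_idx >> MODEL_LOSE_BITS;
}
```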


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-15 18:19                       ` Jakub Kicinski
@ 2024-04-15 18:55                         ` Alexander Duyck
  2024-04-15 22:01                           ` Jakub Kicinski
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-15 18:55 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Yunsheng Lin, netdev, Alexander Duyck, davem, pabeni

On Mon, Apr 15, 2024 at 11:19 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 15 Apr 2024 11:03:13 -0700 Alexander Duyck wrote:
> > This wasn't my full argument. You truncated the part where I
> > specifically called out that it is hard to justify us pushing a
> > proprietary API that is only used by our driver.
>
> I see. Please be careful when making such arguments, tho.
>
> > The "we know best" is more of an "I know best" as someone who has
> > worked with page pool and the page fragment API since well before it
> > existed. My push back is based on the fact that we don't want to
> > allocate fragments, we want to allocate pages and fragment them
> > ourselves after the fact. As such it doesn't make much sense to add an
> > API that will have us trying to use the page fragment API which holds
> > onto the page when the expectation is that we will take the whole
> > thing and just fragment it ourselves.
>
> To be clear I'm not arguing for the exact use of the API as suggested.
> Or even that we should support this in the shared API. One would
> probably have to take a stab at coding it up to find out what works
> best. My first try FWIW would be to mask off the low bits of the
> page index, eg. for 64k page making entries 0-15 all use rx_buf
> index 0...

It would take a few more changes to make it all work. Basically we
would need to map the page into every descriptor entry since the worst
case scenario would be that somehow we end up with things getting so
tight that the page is only partially mapped and we are working
through it as a subset of 4K slices with some at the beginning being
unmapped from the descriptor ring while some are still waiting to be
assigned to a descriptor and used. What I would probably have to look
at doing is adding some sort of cache on the ring to hold onto it
while we dole it out 4K at a time to the descriptors. Either that or
enforce a hard 16 descriptor limit where we have to assign a full page
with every allocation meaning we are at a higher risk for starving the
device for memory.

The bigger issue would be how could we test it? This is an OCP NIC and
as far as I am aware we don't have any systems available that would
support a 64K page. I suppose I could rebuild the QEMU for an
architecture that supports 64K pages and test it. It would just be
painful to have to set up a virtual system to test code that would
literally never be used again. I am not sure QEMU can generate enough
stress to really test the page allocator and make sure all corner
cases are covered.


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-15 18:55                         ` Alexander Duyck
@ 2024-04-15 22:01                           ` Jakub Kicinski
  2024-04-15 23:57                             ` Alexander Duyck
  2024-04-16 13:25                             ` Yunsheng Lin
  0 siblings, 2 replies; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-15 22:01 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Yunsheng Lin, netdev, Alexander Duyck, davem, pabeni

On Mon, 15 Apr 2024 11:55:37 -0700 Alexander Duyck wrote:
> It would take a few more changes to make it all work. Basically we
> would need to map the page into every descriptor entry since the worst
> case scenario would be that somehow we end up with things getting so
> tight that the page is only partially mapped and we are working
> through it as a subset of 4K slices with some at the beginning being
> unmapped from the descriptor ring while some are still waiting to be
> assigned to a descriptor and used. What I would probably have to look
> at doing is adding some sort of cache on the ring to hold onto it
> while we dole it out 4K at a time to the descriptors. Either that or
> enforce a hard 16 descriptor limit where we have to assign a full page
> with every allocation meaning we are at a higher risk for starving the
> device for memory.

Hm, that would be more work, indeed, but potentially beneficial. I was
thinking of separating the page allocation and draining logic a bit
from the fragment handling logic.

#define RXPAGE_IDX(idx)		((idx) >> PAGE_SHIFT - 12)

in fbnic_clean_bdq():

	while (RXPAGE_IDX(head) != RXPAGE_IDX(hw_head))

refer to rx_buf as:

	struct fbnic_rx_buf *rx_buf = &ring->rx_buf[idx >> LOSE_BITS];

Refill always works in batches of multiple of PAGE_SIZE / 4k.
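
A toy version of that cleaning loop, under the same assumptions (hypothetical
names, 64K pages split into 4K descriptor units; not proposed driver code):

```c
#define MODEL_PAGE_SHIFT	16	/* 64K pages, 4K descriptor units */
#define MODEL_RXPAGE_IDX(idx)	((idx) >> (MODEL_PAGE_SHIFT - 12))

/* Count how many whole pages can be retired as head catches up to
 * hw_head: nothing is released until every descriptor backed by a
 * page has been consumed, i.e. until head crosses a page boundary. */
static unsigned int model_pages_to_free(unsigned int head,
					unsigned int hw_head)
{
	unsigned int pages = 0;

	while (MODEL_RXPAGE_IDX(head) != MODEL_RXPAGE_IDX(hw_head)) {
		head += 1U << (MODEL_PAGE_SHIFT - 12);	/* skip one page */
		pages++;
	}
	return pages;
}
```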

> The bigger issue would be how could we test it? This is an OCP NIC and
> as far as I am aware we don't have any systems available that would
> support a 64K page. I suppose I could rebuild the QEMU for an
> architecture that supports 64K pages and test it. It would just be
> painful to have to set up a virtual system to test code that would
> literally never be used again. I am not sure QEMU can generate enough
> stress to really test the page allocator and make sure all corner
> cases are covered.

The testing may be tricky. We could possibly test by hacking up the
driver to use compound pages (say always allocate 16k) and making sure
we don't refer to PAGE_SIZE directly in the test.

BTW I have a spreadsheet of "promises", I'd be fine if we set a
deadline for FBNIC to gain support for PAGE_SIZE != 4k and Kconfig 
to x86-only for now..


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-15 22:01                           ` Jakub Kicinski
@ 2024-04-15 23:57                             ` Alexander Duyck
  2024-04-16  0:24                               ` Jakub Kicinski
  2024-04-16 13:25                             ` Yunsheng Lin
  1 sibling, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-15 23:57 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Yunsheng Lin, netdev, Alexander Duyck, davem, pabeni

On Mon, Apr 15, 2024 at 3:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 15 Apr 2024 11:55:37 -0700 Alexander Duyck wrote:
> > It would take a few more changes to make it all work. Basically we
> > would need to map the page into every descriptor entry since the worst
> > case scenario would be that somehow we end up with things getting so
> > tight that the page is only partially mapped and we are working
> > through it as a subset of 4K slices with some at the beginning being
> > unmapped from the descriptor ring while some are still waiting to be
> > assigned to a descriptor and used. What I would probably have to look
> > at doing is adding some sort of cache on the ring to hold onto it
> > while we dole it out 4K at a time to the descriptors. Either that or
> > enforce a hard 16 descriptor limit where we have to assign a full page
> > with every allocation meaning we are at a higher risk for starving the
> > device for memory.
>
> Hm, that would be more work, indeed, but potentially beneficial. I was
> thinking of separating the page allocation and draining logic a bit
> from the fragment handling logic.
>
> #define RXPAGE_IDX(idx)         ((idx) >> PAGE_SHIFT - 12)
>
> in fbnic_clean_bdq():
>
>         while (RXPAGE_IDX(head) != RXPAGE_IDX(hw_head))
>
> refer to rx_buf as:
>
>         struct fbnic_rx_buf *rx_buf = &ring->rx_buf[idx >> LOSE_BITS];
>
> Refill always works in batches of multiple of PAGE_SIZE / 4k.
>
> > The bigger issue would be how could we test it? This is an OCP NIC and
> > as far as I am aware we don't have any systems available that would
> > support a 64K page. I suppose I could rebuild the QEMU for an
> > architecture that supports 64K pages and test it. It would just be
> > painful to have to set up a virtual system to test code that would
> > literally never be used again. I am not sure QEMU can generate enough
> > stress to really test the page allocator and make sure all corner
> > cases are covered.
>
> The testing may be tricky. We could possibly test with hacking up the
> driver to use compound pages (say always allocate 16k) and making sure
> we don't refer to PAGE_SIZE directly in the test.
>
> BTW I have a spreadsheet of "promises", I'd be fine if we set a
> deadline for FBNIC to gain support for PAGE_SIZE != 4k and Kconfig
> to x86-only for now..

Why set a deadline? It doesn't make sense to add it as a feature for now.

I would be fine with limiting it to x86-only and then stating that if
we need to add support for an architecture with a !4K page size we can
cross that bridge when we get there, as it is much more likely that we
would have access to a platform to test it on by then, rather than
adding overhead to the code now to support a setup that this device may
never see in its lifetime.


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-15 23:57                             ` Alexander Duyck
@ 2024-04-16  0:24                               ` Jakub Kicinski
  0 siblings, 0 replies; 163+ messages in thread
From: Jakub Kicinski @ 2024-04-16  0:24 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Yunsheng Lin, netdev, Alexander Duyck, davem, pabeni

On Mon, 15 Apr 2024 16:57:54 -0700 Alexander Duyck wrote:
> > The testing may be tricky. We could possibly test with hacking up the
> > driver to use compound pages (say always allocate 16k) and making sure
> > we don't refer to PAGE_SIZE directly in the test.
> >
> > BTW I have a spreadsheet of "promises", I'd be fine if we set a
> > deadline for FBNIC to gain support for PAGE_SIZE != 4k and Kconfig
> > to x86-only for now..  
> 
> Why set a deadline? It doesn't make sense to add as a feature for now.

Okay, maybe I'm trying to be too nice. Please have all the feedback
addressed in v2. 


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-15 22:01                           ` Jakub Kicinski
  2024-04-15 23:57                             ` Alexander Duyck
@ 2024-04-16 13:25                             ` Yunsheng Lin
  2024-04-16 14:35                               ` Alexander Duyck
  1 sibling, 1 reply; 163+ messages in thread
From: Yunsheng Lin @ 2024-04-16 13:25 UTC (permalink / raw)
  To: Jakub Kicinski, Alexander Duyck; +Cc: netdev, Alexander Duyck, davem, pabeni

On 2024/4/16 6:01, Jakub Kicinski wrote:
> On Mon, 15 Apr 2024 11:55:37 -0700 Alexander Duyck wrote:
>> It would take a few more changes to make it all work. Basically we
>> would need to map the page into every descriptor entry since the worst
>> case scenario would be that somehow we end up with things getting so
>> tight that the page is only partially mapped and we are working
>> through it as a subset of 4K slices with some at the beginning being
>> unmapped from the descriptor ring while some are still waiting to be
>> assigned to a descriptor and used. What I would probably have to look
>> at doing is adding some sort of cache on the ring to hold onto it
>> while we dole it out 4K at a time to the descriptors. Either that or
>> enforce a hard 16 descriptor limit where we have to assign a full page
>> with every allocation meaning we are at a higher risk for starving the
>> device for memory.
> 
> Hm, that would be more work, indeed, but potentially beneficial. I was
> thinking of separating the page allocation and draining logic a bit
> from the fragment handling logic.
> 
> #define RXPAGE_IDX(idx)		((idx) >> PAGE_SHIFT - 12)
> 
> in fbnic_clean_bdq():
> 
> 	while (RXPAGE_IDX(head) != RXPAGE_IDX(hw_head))
> 
> refer to rx_buf as:
> 
> 	struct fbnic_rx_buf *rx_buf = &ring->rx_buf[idx >> LOSE_BITS];
> 
> Refill always works in batches of multiple of PAGE_SIZE / 4k.

Are we expecting drivers that want the best possible performance to do
the above duplicated trick?

"grep -rn '_reuse_' drivers/net/ethernet/" suggests that we already
have a similar trick to do the page splitting in a lot of drivers; I
would rather we do not duplicate the above trick again.


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-15 18:03                     ` Alexander Duyck
  2024-04-15 18:19                       ` Jakub Kicinski
@ 2024-04-16 14:05                       ` Alexander Lobakin
  2024-04-16 14:46                         ` Alexander Duyck
  1 sibling, 1 reply; 163+ messages in thread
From: Alexander Lobakin @ 2024-04-16 14:05 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jakub Kicinski, Yunsheng Lin, netdev, Alexander Duyck, davem, pabeni

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Mon, 15 Apr 2024 11:03:13 -0700

> On Mon, Apr 15, 2024 at 10:11 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Mon, 15 Apr 2024 08:03:38 -0700 Alexander Duyck wrote:
>>>>> The advantage of being a purpose built driver is that we aren't
>>>>> running on any architectures where the PAGE_SIZE > 4K. If it came to
>>>>
>>>> I am not sure if 'being a purpose built driver' argument is strong enough
>>>> here, at least the Kconfig does not seems to be suggesting it is a purpose
>>>> built driver, perhaps add a 'depend on' to suggest that?
>>>
>>> I'm not sure if you have been following the other threads. One of the
>>> general thoughts of pushback against this driver was that Meta is
>>> currently the only company that will have possession of this NIC. As
>>> such Meta will be deciding what systems it goes into and as a result
>>> of that we aren't likely to be running it on systems with 64K pages.
>>
>> Didn't take long for this argument to float to the surface..
> 
> This wasn't my full argument. You truncated the part where I
> specifically called out that it is hard to justify us pushing a
> proprietary API that is only used by our driver.
> 
>> We tried to write some rules with Paolo but haven't published them, yet.
>> Here is one that may be relevant:
>>
>>   3. External contributions
>>   -------------------------
>>
>>   Owners of drivers for private devices must not exhibit a stronger
>>   sense of ownership or push back on accepting code changes from
>>   members of the community. 3rd party contributions should be evaluated
>>   and eventually accepted, or challenged only on technical arguments
>>   based on the code itself. In particular, the argument that the owner
>>   is the only user and therefore knows best should not be used.
>>
>> Not exactly a contribution, but we predicted the "we know best"
>> tone of the argument :(
> 
> The "we know best" is more of an "I know best" as someone who has
> worked with page pool and the page fragment API since well before it
> existed. My push back is based on the fact that we don't want to

I still strongly believe Jesper-style arguments like "I've been working
with this for aeons", "I invented the Internet", "I was born 3 decades
before this API was introduced" are not valid arguments.

> allocate fragments, we want to allocate pages and fragment them
> ourselves after the fact. As such it doesn't make much sense to add an
> API that will have us trying to use the page fragment API which holds
> onto the page when the expectation is that we will take the whole
> thing and just fragment it ourselves.

[...]

Re "this HW works only on x86, why bother" -- I still believe there
shouldn't be any hardcodes in any driver based on the fact that the HW
is deployed only on particular systems. Page sizes, Endianness,
32/64-bit... It's not difficult to make a driver look like it's
universal and could work anywhere, really.

Thanks,
Olek


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-16 13:25                             ` Yunsheng Lin
@ 2024-04-16 14:35                               ` Alexander Duyck
  0 siblings, 0 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-16 14:35 UTC (permalink / raw)
  To: Yunsheng Lin; +Cc: Jakub Kicinski, netdev, Alexander Duyck, davem, pabeni

On Tue, Apr 16, 2024 at 6:25 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2024/4/16 6:01, Jakub Kicinski wrote:
> > On Mon, 15 Apr 2024 11:55:37 -0700 Alexander Duyck wrote:
> >> It would take a few more changes to make it all work. Basically we
> >> would need to map the page into every descriptor entry since the worst
> >> case scenario would be that somehow we end up with things getting so
> >> tight that the page is only partially mapped and we are working
> >> through it as a subset of 4K slices with some at the beginning being
> >> unmapped from the descriptor ring while some are still waiting to be
> >> assigned to a descriptor and used. What I would probably have to look
> >> at doing is adding some sort of cache on the ring to hold onto it
> >> while we dole it out 4K at a time to the descriptors. Either that or
> >> enforce a hard 16 descriptor limit where we have to assign a full page
> >> with every allocation meaning we are at a higher risk for starving the
> >> device for memory.
> >
> > Hm, that would be more work, indeed, but potentially beneficial. I was
> > thinking of separating the page allocation and draining logic a bit
> > from the fragment handling logic.
> >
> > #define RXPAGE_IDX(idx)               ((idx) >> PAGE_SHIFT - 12)
> >
> > in fbnic_clean_bdq():
> >
> >       while (RXPAGE_IDX(head) != RXPAGE_IDX(hw_head))
> >
> > refer to rx_buf as:
> >
> >       struct fbnic_rx_buf *rx_buf = &ring->rx_buf[idx >> LOSE_BITS];
> >
> > Refill always works in batches of multiple of PAGE_SIZE / 4k.
>
> Are we expecting drivers that want the best possible performance to do
> the above duplicated trick?
>
> "grep -rn '_reuse_' drivers/net/ethernet/" suggests that we already
> have a similar trick to do the page splitting in a lot of drivers; I
> would rather we do not duplicate the above trick again.

Then why not focus on those drivers? You may have missed the whole
point but it isn't possible to test this device on a system with 64K
pages currently. There aren't any platforms we can drop the device
into that support that.


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-16 14:05                       ` Alexander Lobakin
@ 2024-04-16 14:46                         ` Alexander Duyck
  2024-04-16 18:26                           ` Andrew Lunn
                                             ` (2 more replies)
  0 siblings, 3 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-16 14:46 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Jakub Kicinski, Yunsheng Lin, netdev, Alexander Duyck, davem, pabeni

On Tue, Apr 16, 2024 at 7:05 AM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> From: Alexander Duyck <alexander.duyck@gmail.com>
> Date: Mon, 15 Apr 2024 11:03:13 -0700
>
> > On Mon, Apr 15, 2024 at 10:11 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >>
> >> On Mon, 15 Apr 2024 08:03:38 -0700 Alexander Duyck wrote:
> >>>>> The advantage of being a purpose built driver is that we aren't
> >>>>> running on any architectures where the PAGE_SIZE > 4K. If it came to
> >>>>
> >>>> I am not sure if 'being a purpose built driver' argument is strong enough
> >>>> here, at least the Kconfig does not seems to be suggesting it is a purpose
> >>>> built driver, perhaps add a 'depend on' to suggest that?
> >>>
> >>> I'm not sure if you have been following the other threads. One of the
> >>> general thoughts of pushback against this driver was that Meta is
> >>> currently the only company that will have possession of this NIC. As
> >>> such Meta will be deciding what systems it goes into and as a result
> >>> of that we aren't likely to be running it on systems with 64K pages.
> >>
> >> Didn't take long for this argument to float to the surface..
> >
> > This wasn't my full argument. You truncated the part where I
> > specifically called out that it is hard to justify us pushing a
> > proprietary API that is only used by our driver.
> >
> >> We tried to write some rules with Paolo but haven't published them, yet.
> >> Here is one that may be relevant:
> >>
> >>   3. External contributions
> >>   -------------------------
> >>
> >>   Owners of drivers for private devices must not exhibit a stronger
> >>   sense of ownership or push back on accepting code changes from
> >>   members of the community. 3rd party contributions should be evaluated
> >>   and eventually accepted, or challenged only on technical arguments
> >>   based on the code itself. In particular, the argument that the owner
> >>   is the only user and therefore knows best should not be used.
> >>
> >> Not exactly a contribution, but we predicted the "we know best"
> >> tone of the argument :(
> >
> > The "we know best" is more of an "I know best" as someone who has
> > worked with page pool and the page fragment API since well before it
> > existed. My push back is based on the fact that we don't want to
>
> I still strongly believe Jesper-style arguments like "I've been working
> with this for aeons", "I invented the Internet", "I was born 3 decades
> before this API was introduced" are not valid arguments.

Sorry that is a bit of my frustration with Yunsheng coming through. He
has another patch set that mostly just moves my code and made himself
the maintainer. Admittedly I am a bit annoyed with that. Especially
since the main drive seems to be to force everything to use that one
approach and then optimize for his use case for vhost net over all
others most likely at the expense of everything else.

It seems like it is the very thing we were complaining about in patch
0 with other drivers getting penalized at the cost of optimizing for
one specific driver.

> > allocate fragments, we want to allocate pages and fragment them
> > ourselves after the fact. As such it doesn't make much sense to add an
> > API that will have us trying to use the page fragment API which holds
> > onto the page when the expectation is that we will take the whole
> > thing and just fragment it ourselves.
>
> [...]
>
> Re "this HW works only on x86, why bother" -- I still believe there
> shouldn't be any hardcodes in any driver based on the fact that the HW
> is deployed only on particular systems. Page sizes, Endianness,
> 32/64-bit... It's not difficult to make a driver look like it's
> universal and could work anywhere, really.

It isn't that this only works on x86. It is that we can only test it
on x86. The biggest issue right now is that I wouldn't have any
systems w/ 64K pages that I could test on, and the worst that could
happen based on the current code is that the NIC driver will be a
memory hog.

I would much prefer the potential of being a memory hog on an untested
hardware over implementing said code untested and introducing
something like a memory leak or page fault issue.

That is why I am more than willing to make this an x86_64 only driver
for now and we can look at expanding out as I get time and get
equipment to add support and test for other architectures.


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-16 14:46                         ` Alexander Duyck
@ 2024-04-16 18:26                           ` Andrew Lunn
  2024-04-17  8:14                           ` Leon Romanovsky
  2024-04-17 10:39                           ` Alexander Lobakin
  2 siblings, 0 replies; 163+ messages in thread
From: Andrew Lunn @ 2024-04-16 18:26 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Alexander Lobakin, Jakub Kicinski, Yunsheng Lin, netdev,
	Alexander Duyck, davem, pabeni

> Sorry that is a bit of my frustration with Yunsheng coming through. He
> has another patch set that mostly just moves my code and made himself
> the maintainer. Admittedly I am a bit annoyed with that. Especially
> since the main drive seems to be to force everything to use that one
> approach and then optimize for his use case for vhost net over all
> others most likely at the expense of everything else.

That is why we ask for benchmarks for "optimization patches". If they
don't actually provide any improvements, or degrade other use cases,
they get rejected.

> That is why I am more than willing to make this an x86_64 only driver
> for now and we can look at expanding out as I get time and get
> equipment to to add support and test for other architectures.

That sounds reasonable to me. But I would allow COMPILE_TEST for other
architectures, just to keep the number of surprises low when you do
have other architectures to test with.

     Andrew


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-16 14:46                         ` Alexander Duyck
  2024-04-16 18:26                           ` Andrew Lunn
@ 2024-04-17  8:14                           ` Leon Romanovsky
  2024-04-17 16:09                             ` Alexander Duyck
  2024-04-17 10:39                           ` Alexander Lobakin
  2 siblings, 1 reply; 163+ messages in thread
From: Leon Romanovsky @ 2024-04-17  8:14 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Alexander Lobakin, Jakub Kicinski, Yunsheng Lin, netdev,
	Alexander Duyck, davem, pabeni

On Tue, Apr 16, 2024 at 07:46:06AM -0700, Alexander Duyck wrote:
> On Tue, Apr 16, 2024 at 7:05 AM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
> >
> > From: Alexander Duyck <alexander.duyck@gmail.com>
> > Date: Mon, 15 Apr 2024 11:03:13 -0700
> >
> > > On Mon, Apr 15, 2024 at 10:11 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > >>
> > >> On Mon, 15 Apr 2024 08:03:38 -0700 Alexander Duyck wrote:
> > >>>>> The advantage of being a purpose built driver is that we aren't
> > >>>>> running on any architectures where the PAGE_SIZE > 4K. If it came to
> > >>>>
> > >>>> I am not sure if 'being a purpose built driver' argument is strong enough
> > >>>> here, at least the Kconfig does not seems to be suggesting it is a purpose
> > >>>> built driver, perhaps add a 'depend on' to suggest that?
> > >>>
> > >>> I'm not sure if you have been following the other threads. One of the
> > >>> general thoughts of pushback against this driver was that Meta is
> > >>> currently the only company that will have possession of this NIC. As
> > >>> such Meta will be deciding what systems it goes into and as a result
> > >>> of that we aren't likely to be running it on systems with 64K pages.
> > >>
> > >> Didn't take long for this argument to float to the surface..
> > >
> > > This wasn't my full argument. You truncated the part where I
> > > specifically called out that it is hard to justify us pushing a
> > > proprietary API that is only used by our driver.
> > >
> > >> We tried to write some rules with Paolo but haven't published them, yet.
> > >> Here is one that may be relevant:
> > >>
> > >>   3. External contributions
> > >>   -------------------------
> > >>
> > >>   Owners of drivers for private devices must not exhibit a stronger
> > >>   sense of ownership or push back on accepting code changes from
> > >>   members of the community. 3rd party contributions should be evaluated
> > >>   and eventually accepted, or challenged only on technical arguments
> > >>   based on the code itself. In particular, the argument that the owner
> > >>   is the only user and therefore knows best should not be used.
> > >>
> > >> Not exactly a contribution, but we predicted the "we know best"
> > >> tone of the argument :(
> > >
> > > The "we know best" is more of an "I know best" as someone who has
> > > worked with page pool and the page fragment API since well before it
> > > existed. My push back is based on the fact that we don't want to
> >
> > I still strongly believe Jesper-style arguments like "I've been working
> > with this for aeons", "I invented the Internet", "I was born 3 decades
> > before this API was introduced" are not valid arguments.
> 
> Sorry that is a bit of my frustration with Yunsheng coming through. He
> has another patch set that mostly just moves my code and made himself
> the maintainer. Admittedly I am a bit annoyed with that. Especially
> since the main drive seems to be to force everything to use that one
> approach and then optimize for his use case for vhost net over all
> others most likely at the expense of everything else.
> 
> It seems like it is the very thing we were complaining about in patch
> 0 with other drivers getting penalized at the cost of optimizing for
> one specific driver.
> 
> > > allocate fragments, we want to allocate pages and fragment them
> > > ourselves after the fact. As such it doesn't make much sense to add an
> > > API that will have us trying to use the page fragment API which holds
> > > onto the page when the expectation is that we will take the whole
> > > thing and just fragment it ourselves.
> >
> > [...]
> >
> > Re "this HW works only on x86, why bother" -- I still believe there
> > shouldn't be any hardcodes in any driver based on the fact that the HW
> > is deployed only on particular systems. Page sizes, Endianness,
> > 32/64-bit... It's not difficult to make a driver look like it's
> > universal and could work anywhere, really.
> 
> It isn't that this only works on x86. It is that we can only test it
> on x86. The biggest issue right now is that I wouldn't have any
> systems w/ 64K pages that I could test on.

Didn't you write that you will provide QEMU emulation for this device?

Thanks


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-16 14:46                         ` Alexander Duyck
  2024-04-16 18:26                           ` Andrew Lunn
  2024-04-17  8:14                           ` Leon Romanovsky
@ 2024-04-17 10:39                           ` Alexander Lobakin
  2 siblings, 0 replies; 163+ messages in thread
From: Alexander Lobakin @ 2024-04-17 10:39 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jakub Kicinski, Yunsheng Lin, netdev, Alexander Duyck, davem, pabeni

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Tue, 16 Apr 2024 07:46:06 -0700

> On Tue, Apr 16, 2024 at 7:05 AM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
>>
>> From: Alexander Duyck <alexander.duyck@gmail.com>
>> Date: Mon, 15 Apr 2024 11:03:13 -0700
>>
>>> On Mon, Apr 15, 2024 at 10:11 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>
>>>> On Mon, 15 Apr 2024 08:03:38 -0700 Alexander Duyck wrote:
>>>>>>> The advantage of being a purpose built driver is that we aren't
>>>>>>> running on any architectures where the PAGE_SIZE > 4K. If it came to
>>>>>>
>>>>>> I am not sure if 'being a purpose built driver' argument is strong enough
>>>>>> here, at least the Kconfig does not seems to be suggesting it is a purpose
>>>>>> built driver, perhaps add a 'depend on' to suggest that?
>>>>>
>>>>> I'm not sure if you have been following the other threads. One of the
>>>>> general thoughts of pushback against this driver was that Meta is
>>>>> currently the only company that will have possession of this NIC. As
>>>>> such Meta will be deciding what systems it goes into and as a result
>>>>> of that we aren't likely to be running it on systems with 64K pages.
>>>>
>>>> Didn't take long for this argument to float to the surface..
>>>
>>> This wasn't my full argument. You truncated the part where I
>>> specifically called out that it is hard to justify us pushing a
>>> proprietary API that is only used by our driver.
>>>
>>>> We tried to write some rules with Paolo but haven't published them, yet.
>>>> Here is one that may be relevant:
>>>>
>>>>   3. External contributions
>>>>   -------------------------
>>>>
>>>>   Owners of drivers for private devices must not exhibit a stronger
>>>>   sense of ownership or push back on accepting code changes from
>>>>   members of the community. 3rd party contributions should be evaluated
>>>>   and eventually accepted, or challenged only on technical arguments
>>>>   based on the code itself. In particular, the argument that the owner
>>>>   is the only user and therefore knows best should not be used.
>>>>
>>>> Not exactly a contribution, but we predicted the "we know best"
>>>> tone of the argument :(
>>>
>>> The "we know best" is more of an "I know best" as someone who has
>>> worked with page pool and the page fragment API since well before it
>>> existed. My push back is based on the fact that we don't want to
>>
>> I still strongly believe Jesper-style arguments like "I've been working
>> with this for aeons", "I invented the Internet", "I was born 3 decades
>> before this API was introduced" are not valid arguments.
> 
> Sorry that is a bit of my frustration with Yunsheng coming through. He
> has another patch set that mostly just moves my code and made himself
> the maintainer. Admittedly I am a bit annoyed with that. Especially
> since the main drive seems to be to force everything to use that one
> approach and then optimize for his use case for vhost net over all
> others most likely at the expense of everything else.
> 
> It seems like it is the very thing we were complaining about in patch
> 0 with other drivers getting penalized at the cost of optimizing for
> one specific driver.
> 
>>> allocate fragments, we want to allocate pages and fragment them
>>> ourselves after the fact. As such it doesn't make much sense to add an
>>> API that will have us trying to use the page fragment API which holds
>>> onto the page when the expectation is that we will take the whole
>>> thing and just fragment it ourselves.
>>
>> [...]
>>
>> Re "this HW works only on x86, why bother" -- I still believe there
>> shouldn't be any hardcodes in any driver based on the fact that the HW
>> is deployed only on particular systems. Page sizes, Endianness,
>> 32/64-bit... It's not difficult to make a driver look like it's
>> universal and could work anywhere, really.
> 
> It isn't that this only works on x86. It is that we can only test it
> on x86. The biggest issue right now is that I wouldn't have any
> systems w/ 64K pages that I could test on, and the worst that could
> happen based on the current code is that the NIC driver will be a
> memory hog.
> 
> I would much prefer the potential of being a memory hog on an untested
> hardware over implementing said code untested and introducing
> something like a memory leak or page fault issue.
> 
> That is why I am more than willing to make this an x86_64 only driver
> for now and we can look at expanding out as I get time and get
> equipment to add support and test for other architectures.

I don't see any issue with not limiting it to x86_64 only. It compiles just
fine and if you run it later on a non-x86_64 system, you'll test it
then. I don't think anyone will run it on a different platform prior to
that.

Thanks,
Olek


* Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling
  2024-04-17  8:14                           ` Leon Romanovsky
@ 2024-04-17 16:09                             ` Alexander Duyck
  0 siblings, 0 replies; 163+ messages in thread
From: Alexander Duyck @ 2024-04-17 16:09 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Alexander Lobakin, Jakub Kicinski, Yunsheng Lin, netdev,
	Alexander Duyck, davem, pabeni

On Wed, Apr 17, 2024 at 1:14 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Apr 16, 2024 at 07:46:06AM -0700, Alexander Duyck wrote:
> > On Tue, Apr 16, 2024 at 7:05 AM Alexander Lobakin
> > <aleksander.lobakin@intel.com> wrote:
> > >
> > > From: Alexander Duyck <alexander.duyck@gmail.com>
> > > Date: Mon, 15 Apr 2024 11:03:13 -0700
> > >
> > > > On Mon, Apr 15, 2024 at 10:11 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > >>
> > > >> On Mon, 15 Apr 2024 08:03:38 -0700 Alexander Duyck wrote:
> > > >>>>> The advantage of being a purpose built driver is that we aren't
> > > >>>>> running on any architectures where the PAGE_SIZE > 4K. If it came to
> > > >>>>
> > > >>>> I am not sure if 'being a purpose built driver' argument is strong enough
> > > >>>> here, at least the Kconfig does not seems to be suggesting it is a purpose
> > > >>>> built driver, perhaps add a 'depend on' to suggest that?
> > > >>>
> > > >>> I'm not sure if you have been following the other threads. One of the
> > > >>> general thoughts of pushback against this driver was that Meta is
> > > >>> currently the only company that will have possession of this NIC. As
> > > >>> such Meta will be deciding what systems it goes into and as a result
> > > >>> of that we aren't likely to be running it on systems with 64K pages.
> > > >>
> > > >> Didn't take long for this argument to float to the surface..
> > > >
> > > > This wasn't my full argument. You truncated the part where I
> > > > specifically called out that it is hard to justify us pushing a
> > > > proprietary API that is only used by our driver.
> > > >
> > > >> We tried to write some rules with Paolo but haven't published them, yet.
> > > >> Here is one that may be relevant:
> > > >>
> > > >>   3. External contributions
> > > >>   -------------------------
> > > >>
> > > >>   Owners of drivers for private devices must not exhibit a stronger
> > > >>   sense of ownership or push back on accepting code changes from
> > > >>   members of the community. 3rd party contributions should be evaluated
> > > >>   and eventually accepted, or challenged only on technical arguments
> > > >>   based on the code itself. In particular, the argument that the owner
> > > >>   is the only user and therefore knows best should not be used.
> > > >>
> > > >> Not exactly a contribution, but we predicted the "we know best"
> > > >> tone of the argument :(
> > > >
> > > > The "we know best" is more of an "I know best" as someone who has
> > > > worked with page pool and the page fragment API since well before it
> > > > existed. My push back is based on the fact that we don't want to
> > >
> > > I still strongly believe Jesper-style arguments like "I've been working
> > > with this for aeons", "I invented the Internet", "I was born 3 decades
> > > before this API was introduced" are not valid arguments.
> >
> > Sorry that is a bit of my frustration with Yunsheng coming through. He
> > has another patch set that mostly just moves my code and made himself
> > the maintainer. Admittedly I am a bit annoyed with that. Especially
> > since the main drive seems to be to force everything to use that one
> > approach and then optimize for his use case for vhost net over all
> > others most likely at the expense of everything else.
> >
> > It seems like it is the very thing we were complaining about in patch
> > 0 with other drivers getting penalized at the cost of optimizing for
> > one specific driver.
> >
> > > > allocate fragments, we want to allocate pages and fragment them
> > > > ourselves after the fact. As such it doesn't make much sense to add an
> > > > API that will have us trying to use the page fragment API which holds
> > > > onto the page when the expectation is that we will take the whole
> > > > thing and just fragment it ourselves.
> > >
> > > [...]
> > >
> > > Re "this HW works only on x86, why bother" -- I still believe there
> > > shouldn't be any hardcodes in any driver based on the fact that the HW
> > > is deployed only on particular systems. Page sizes, Endianness,
> > > 32/64-bit... It's not difficult to make a driver look like it's
> > > universal and could work anywhere, really.
> >
> > It isn't that this only works on x86. It is that we can only test it
> > on x86. The biggest issue right now is that I wouldn't have any
> > systems w/ 64K pages that I could test on.
>
> Didn't you write that you will provide QEMU emulation for this device?
>
> Thanks

Yes. I had already mentioned the possibility of testing it this way. I
am just not sure it adds much value to test the already limited
hardware and a limited platform setup in emulation. The issue is that
it will be hard to generate any stress in the QEMU environment since
it maxes out at only about 1 or 2 Gbps as the overhead for providing
TCAMs in software is not insignificant. To top it off, emulating a
non-native architecture will slow things down further. It would be
like asking someone to test a 100G nic on a system that only has PCIe
gen1.

I will probably just go the suggested route of enabling compile
testing on all platforms, and only support loading it on X86_64. As
time permits I can probably assign somebody the job of exploring the
larger-than-4K page size issue; however, it will be a matter of
weighing the trade-off of adding technical debt for a use case that
may not be applicable.


* Re: [net-next PATCH 11/15] eth: fbnic: Enable Ethernet link setup
  2024-04-05 21:51   ` Andrew Lunn
@ 2024-04-21 23:21     ` Alexander Duyck
  2024-04-22 15:52       ` Andrew Lunn
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Duyck @ 2024-04-21 23:21 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On Fri, Apr 5, 2024 at 2:51 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > +#define FBNIC_CSR_START_PCS          0x10000 /* CSR section delimiter */
> > +#define FBNIC_PCS_CONTROL1_0         0x10000         /* 0x40000 */
> > +#define FBNIC_PCS_CONTROL1_RESET             CSR_BIT(15)
> > +#define FBNIC_PCS_CONTROL1_LOOPBACK          CSR_BIT(14)
> > +#define FBNIC_PCS_CONTROL1_SPEED_SELECT_ALWAYS       CSR_BIT(13)
> > +#define FBNIC_PCS_CONTROL1_SPEED_ALWAYS              CSR_BIT(6)
>
> This appears to be PCS control register 1, define in 45.2.3.1. Since
> this is a standard register, please add it to mdio.h.

Actually all these bits are essentially there in the forms of:
MDIO_CTRL1_RESET, MDIO_PCS_CTRL1_LOOPBACK, and MDIO_CTRL1_SPEEDSELEXT.
I will base the driver on these values.

> > +#define FBNIC_PCS_VENDOR_VL_INTVL_0  0x10202         /* 0x40808 */
>
> Could you explain how these registers map to 802.3 clause 45? Would
> that be 3.1002? That would however put it in the reserved range 3.812
> through 3.1799. The vendor range is 3.32768 through 3.65535.

So from what I can tell, the vendor-specific registers are mapped into
the middle of the range, starting at register 512 instead of at 32768.
So essentially offsets 512 - 612 and 1544 - 1639 appear to be used for
the vendor-specific registers. In addition, there is an unused block
of PCS registers from 1024 to 1536, which were there for an
unsupported speed configuration.

> > +#define FBNIC_CSR_START_RSFEC                0x10800 /* CSR section delimiter */
> > +#define FBNIC_RSFEC_CONTROL(n)\
> > +                             (0x10800 + 8 * (n))     /* 0x42000 + 32*n */
> > +#define FBNIC_RSFEC_CONTROL_AM16_COPY_DIS    CSR_BIT(3)
> > +#define FBNIC_RSFEC_CONTROL_KP_ENABLE                CSR_BIT(8)
> > +#define FBNIC_RSFEC_CONTROL_TC_PAD_ALTER     CSR_BIT(10)
> > +#define FBNIC_RSFEC_MAX_LANES                        4
> > +#define FBNIC_RSFEC_CCW_LO(n) \
> > +                             (0x10802 + 8 * (n))     /* 0x42008 + 32*n */
> > +#define FBNIC_RSFEC_CCW_HI(n) \
> > +                             (0x10803 + 8 * (n))     /* 0x4200c + 32*n */
>
> Is this Corrected Code Words Lower/Upper? 1.202 and 1.203?

Yes and no, this is 3.802 and 3.803 which I assume is more or less the
same thing but the PCS variant.

> > +#define FBNIC_RSFEC_NCCW_LO(n) \
> > +                             (0x10804 + 8 * (n))     /* 0x42010 + 32*n */
> > +#define FBNIC_RSFEC_NCCW_HI(n) \
> > +                             (0x10805 + 8 * (n))     /* 0x42014 + 32*n */
>
> Which suggests this is Uncorrected code Words? 1.204, 1.205? I guess
> the N is for Not?

These are 3.804 and 3.805.

From what I can tell, the first 6 registers for the RSFEC are laid out
in the same order as in the spec. However, we have 4 of these blocks
to work with, and they are tightly packed such that the second block
starts at offset 8 following the start of the first block.
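To make the packing concrete, here is a sketch of the addressing
(helper names are illustrative, not from the driver). The CSR values
are 32-bit register indices, so byte addresses are index * 4, which is
what the /* 0x42000 + 32*n */ comments in the hunk above encode:

```c
#include <assert.h>
#include <stdint.h>

/* Four interleaved RSFEC lane blocks, each 8 registers wide; the
 * first six registers of each block follow the 802.3 ordering. */
#define FBNIC_CSR_START_RSFEC 0x10800u

static uint32_t fbnic_rsfec_reg(unsigned int lane, unsigned int reg)
{
	return FBNIC_CSR_START_RSFEC + 8 * lane + reg;
}

/* CSR offsets are 32-bit word indices; byte address = index * 4. */
static uint32_t fbnic_csr_byte_addr(uint32_t reg_index)
{
	return reg_index * 4;
}
```

With this, fbnic_rsfec_reg(n, 2) reproduces FBNIC_RSFEC_CCW_LO(n) and
the 32-byte stride between lane blocks falls out of the 8-register
interleave.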

> > +#define FBNIC_RSFEC_SYMBLERR_LO(n) \
> > +                             (0x10880 + 8 * (n))     /* 0x42200 + 32*n */
> > +#define FBNIC_RSFEC_SYMBLERR_HI(n) \
> > +                             (0x10881 + 8 * (n))     /* 0x42204 + 32*n */
>
> And these are symbol count errors, 1.210 and 1.211?

I think this is 3.600 and 3.601.  However we only have 8 sets of
registers instead of 16.

> If there are other registers which follow 802.3 it would be good to
> add them to mdio.h, so others can share them.

I will see what I can do. I will try to sort out what is device
specific, such as our register layout, versus what is shared, such as
the PCS register layouts.

Thanks,

- Alex


* Re: [net-next PATCH 11/15] eth: fbnic: Enable Ethernet link setup
  2024-04-21 23:21     ` Alexander Duyck
@ 2024-04-22 15:52       ` Andrew Lunn
  2024-04-22 18:59         ` Alexander H Duyck
  0 siblings, 1 reply; 163+ messages in thread
From: Andrew Lunn @ 2024-04-22 15:52 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On Sun, Apr 21, 2024 at 04:21:57PM -0700, Alexander Duyck wrote:
> On Fri, Apr 5, 2024 at 2:51 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > +#define FBNIC_CSR_START_PCS          0x10000 /* CSR section delimiter */
> > > +#define FBNIC_PCS_CONTROL1_0         0x10000         /* 0x40000 */
> > > +#define FBNIC_PCS_CONTROL1_RESET             CSR_BIT(15)
> > > +#define FBNIC_PCS_CONTROL1_LOOPBACK          CSR_BIT(14)
> > > +#define FBNIC_PCS_CONTROL1_SPEED_SELECT_ALWAYS       CSR_BIT(13)
> > > +#define FBNIC_PCS_CONTROL1_SPEED_ALWAYS              CSR_BIT(6)
> >
> > This appears to be PCS control register 1, define in 45.2.3.1. Since
> > this is a standard register, please add it to mdio.h.
> 
> Actually all these bits are essentially there in the forms of:
> MDIO_CTRL1_RESET, MDIO_PCS_CTRL1_LOOPBACK, and MDIO_CTRL1_SPEEDSELEXT.
> I will base the driver on these values.

Great, thanks.

> > > +#define FBNIC_PCS_VENDOR_VL_INTVL_0  0x10202         /* 0x40808 */
> >
> > Could you explain how these registers map to 802.3 clause 45? Would
> > that be 3.1002? That would however put it in the reserved range 3.812
> > through 3.1799. The vendor range is 3.32768 through 3.65535.
> 
> So from what I can tell the vendor specific registers are mapped into
> the middle of the range starting at register 512 instead of starting
> at 32768.

802.3, clause 1.4.512:

  reserved: A key word indicating an object (bit, register, connector
  pin, encoding, interface signal, enumeration, etc.) to be defined
  only by this standard. A reserved object shall not be used for any
  user- defined purpose such as a user- or device-specific function;
  and such use of a reserved object shall render the implementation
  noncompliant with this standard.

It is surprising how many vendors like to make their devices
noncompliant by not following simple things like this. Anyway, nothing
you can do. Please put _VEND_ in the #define names to make it clear
these are vendor registers, even if they are not in the vendor space.

> > > +#define FBNIC_CSR_START_RSFEC                0x10800 /* CSR section delimiter */
> > > +#define FBNIC_RSFEC_CONTROL(n)\
> > > +                             (0x10800 + 8 * (n))     /* 0x42000 + 32*n */
> > > +#define FBNIC_RSFEC_CONTROL_AM16_COPY_DIS    CSR_BIT(3)
> > > +#define FBNIC_RSFEC_CONTROL_KP_ENABLE                CSR_BIT(8)
> > > +#define FBNIC_RSFEC_CONTROL_TC_PAD_ALTER     CSR_BIT(10)
> > > +#define FBNIC_RSFEC_MAX_LANES                        4
> > > +#define FBNIC_RSFEC_CCW_LO(n) \
> > > +                             (0x10802 + 8 * (n))     /* 0x42008 + 32*n */
> > > +#define FBNIC_RSFEC_CCW_HI(n) \
> > > +                             (0x10803 + 8 * (n))     /* 0x4200c + 32*n */
> >
> > Is this Corrected Code Words Lower/Upper? 1.202 and 1.203?
> 
> Yes and no, this is 3.802 and 3.803 which I assume is more or less the
> same thing but the PCS variant.

Have you figured out how to map the 802.3 register number to the value
you need here? 0x42008 + 32*n? Ideally we should list the registers in
the common header file with their 802.3-defined value. Your driver can
then massage the value to what you need for your memory-mapped device.

If you really want to go the whole hog, you might be able to extend
mdio-regmap.c to support memory mapped C45 registers, so your driver
can then use mdiodev_c45_read()/mdiodev_c45_write(). We have a few PCS
drivers for licensed IP which appear on both MDIO busses and memory
mapped. mdio-regmap.c hides away the access details.

> I will try to see what I can do. I will try to sort out what is device
> device specific such as our register layout versus what is shared such
> as PCS register layouts and such.

That would be great.

	Andrew


* Re: [net-next PATCH 11/15] eth: fbnic: Enable Ethernet link setup
  2024-04-22 15:52       ` Andrew Lunn
@ 2024-04-22 18:59         ` Alexander H Duyck
  0 siblings, 0 replies; 163+ messages in thread
From: Alexander H Duyck @ 2024-04-22 18:59 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Alexander Duyck, kuba, davem, pabeni

On Mon, 2024-04-22 at 17:52 +0200, Andrew Lunn wrote:
> On Sun, Apr 21, 2024 at 04:21:57PM -0700, Alexander Duyck wrote:
> > On Fri, Apr 5, 2024 at 2:51 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > > 
> > > > +#define FBNIC_CSR_START_PCS          0x10000 /* CSR section delimiter */
> > > > +#define FBNIC_PCS_CONTROL1_0         0x10000         /* 0x40000 */
> > > > +#define FBNIC_PCS_CONTROL1_RESET             CSR_BIT(15)
> > > > +#define FBNIC_PCS_CONTROL1_LOOPBACK          CSR_BIT(14)
> > > > +#define FBNIC_PCS_CONTROL1_SPEED_SELECT_ALWAYS       CSR_BIT(13)
> > > > +#define FBNIC_PCS_CONTROL1_SPEED_ALWAYS              CSR_BIT(6)
> > > 
> > > This appears to be PCS control register 1, define in 45.2.3.1. Since
> > > this is a standard register, please add it to mdio.h.
> > 
> > Actually all these bits are essentially there in the forms of:
> > MDIO_CTRL1_RESET, MDIO_PCS_CTRL1_LOOPBACK, and MDIO_CTRL1_SPEEDSELEXT.
> > I will base the driver on these values.
> 
> Great, thanks.
> 
> > > > +#define FBNIC_PCS_VENDOR_VL_INTVL_0  0x10202         /* 0x40808 */
> > > 
> > > Could you explain how these registers map to 802.3 clause 45? Would
> > > that be 3.1002? That would however put it in the reserved range 3.812
> > > through 3.1799. The vendor range is 3.32768 through 3.65535.
> > 
> > So from what I can tell the vendor specific registers are mapped into
> > the middle of the range starting at register 512 instead of starting
> > at 32768.
> 
> 802.3, clause 1.4.512:
> 
>   reserved: A key word indicating an object (bit, register, connector
>   pin, encoding, interface signal, enumeration, etc.) to be defined
>   only by this standard. A reserved object shall not be used for any
>   user- defined purpose such as a user- or device-specific function;
>   and such use of a reserved object shall render the implementation
>   noncompliant with this standard.
> 
> It is surprising how many vendors like to make their devices
> noncompliant by not following simple things like this. Anyway, nothing
> you can do. Please put _VEND_ in the #define names to make it clear
> these are vendor registers, even if they are not in the vendor space.

Yeah, I am not sure how much of this is down to the synthesis of the
IP versus the register-mapping functionality of our device in terms of
how the registers got ordered. I'm thinking that, if nothing else,
there may be a need to break this up into logical "pages".

From what I can tell the layout is something like:
CSR Range	Register Block
==================================
   0 -  511	PCS Channel 0 (within spec)
 512 - 1023	PCS Channel 0 Vendor Registers
1024 - 1535	PCS Channel 1 (within spec)
1536 - 2047	PCS Channel 1 Vendor Registers
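Capturing that table as a translation helper (purely a sketch derived
from the layout above; the function name is hypothetical and not in
the driver):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Each PCS channel owns a 1024-register page split into a standard
 * half (802.3 register numbers 0..511) and a vendor half. The 0x10000
 * base is FBNIC_CSR_START_PCS from the hunk quoted earlier. */
static uint32_t fbnic_pcs_csr(unsigned int channel, unsigned int regnum,
			      bool vendor)
{
	return 0x10000u + channel * 1024 + (vendor ? 512 : 0) + regnum;
}
```

With this view, FBNIC_PCS_VENDOR_VL_INTVL_0 (0x10202) is simply vendor
register 2 of channel 0.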

I said we weren't using the channel 1 registers, but after looking
back through the code and starting the refactoring I believe I was
thinking of channels 2 and 3, which would be used for 100-R4.
Basically channel 1 is used in the 50-R2 and 100-R2 use cases.

As far as the vendor registers themselves go, most of this block is
about the virtual lane alignment markers. As such I may want to export
the values to a shared header file, since they should be common per
the spec, but the means of programming them would be vendor specific.

> > > > +#define FBNIC_CSR_START_RSFEC                0x10800 /* CSR section delimiter */
> > > > +#define FBNIC_RSFEC_CONTROL(n)\
> > > > +                             (0x10800 + 8 * (n))     /* 0x42000 + 32*n */
> > > > +#define FBNIC_RSFEC_CONTROL_AM16_COPY_DIS    CSR_BIT(3)
> > > > +#define FBNIC_RSFEC_CONTROL_KP_ENABLE                CSR_BIT(8)
> > > > +#define FBNIC_RSFEC_CONTROL_TC_PAD_ALTER     CSR_BIT(10)
> > > > +#define FBNIC_RSFEC_MAX_LANES                        4
> > > > +#define FBNIC_RSFEC_CCW_LO(n) \
> > > > +                             (0x10802 + 8 * (n))     /* 0x42008 + 32*n */
> > > > +#define FBNIC_RSFEC_CCW_HI(n) \
> > > > +                             (0x10803 + 8 * (n))     /* 0x4200c + 32*n */
> > > 
> > > Is this Corrected Code Words Lower/Upper? 1.202 and 1.203?
> > 
> > Yes and no, this is 3.802 and 3.803 which I assume is more or less the
> > same thing but the PCS variant.
> 
> Have you figured out how to map the 802.3 register number to the value
> you need here? 0x42008 + 32*n? Ideally we should list the registers in
> the common header file with their 802.3-defined value. Your driver can
> then massage the value to what you need for your memory-mapped device.

Similarly for the RSFEC portion, things seem to have been broken out
into 4 blocks with multiple sets of registers. The first 6 are laid
out in the same order as the spec, but they start at an offset of
(2048 + 8 * (n)) instead of 800. So I suppose the big question would
be how to convert the standard addressing scheme into something that
would get us to the right page for the right interface.
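As a strawman for that conversion, limited to the six registers that
do follow the spec (the helper name and error convention are mine, not
the driver's):

```c
#include <assert.h>
#include <stdint.h>

/* Map an 802.3 clause 45 MMD 3 register number (3.800..3.805) plus a
 * lane index to a CSR offset per the layout described above: the block
 * starts at 2048 within the PCS space instead of at 800, with a
 * stride of 8 registers per lane. */
static int fbnic_rsfec_c45_to_csr(unsigned int c45_reg, unsigned int lane,
				  uint32_t *csr)
{
	if (c45_reg < 800 || c45_reg > 805)
		return -1;	/* only the first six follow the spec */

	*csr = 0x10000u + 2048 + 8 * lane + (c45_reg - 800);
	return 0;
}
```

So 3.802 on lane 0 lands at 0x10802, which matches FBNIC_RSFEC_CCW_LO(0)
from the hunk above; anything outside the six spec-ordered registers
would need a per-register table instead.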

> If you really want to go the whole hog, you might be able to extend
> mdio-regmap.c to support memory mapped C45 registers, so your driver
> can then use mdiodev_c45_read()/mdiodev_c45_write(). We have a few PCS
> drivers for licensed IP which appear on both MDIO busses and memory
> mapped. mdio-regmap.c hides away the access details.

The big issue as I see it is the fact that we have multiple copies of
things that are interleaved together. So for example with the RSFEC we
have 4 blocks of 8 registers that are all interleaved with the first 6
matching the layout, but the last 2 being something different.

Since I am not accessing these via MDIO I am not sure what the expected
layout should be in terms of deciding what should be a device, channel,
or register address and how that would map to a page.


end of thread

Thread overview: 163+ messages
2024-04-03 20:08 [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Alexander Duyck
2024-04-03 20:08 ` [net-next PATCH 01/15] PCI: Add Meta Platforms vendor ID Alexander Duyck
2024-04-03 20:20   ` Bjorn Helgaas
2024-04-03 20:08 ` [net-next PATCH 02/15] eth: fbnic: add scaffolding for Meta's NIC driver Alexander Duyck
2024-04-03 20:33   ` Andrew Lunn
2024-04-03 20:47     ` Alexander Duyck
2024-04-03 21:17       ` Andrew Lunn
2024-04-03 21:51         ` Alexander Duyck
2024-04-03 22:20           ` Andrew Lunn
2024-04-03 23:27             ` Alexander Duyck
2024-04-03 20:08 ` [net-next PATCH 03/15] eth: fbnic: Allocate core device specific structures and devlink interface Alexander Duyck
2024-04-03 20:35   ` Bjorn Helgaas
2024-04-03 20:08 ` [net-next PATCH 04/15] eth: fbnic: Add register init to set PCIe/Ethernet device config Alexander Duyck
2024-04-03 20:46   ` Andrew Lunn
2024-04-10 20:31     ` Jacob Keller
2024-04-03 20:08 ` [net-next PATCH 05/15] eth: fbnic: add message parsing for FW messages Alexander Duyck
2024-04-03 21:07   ` Jeff Johnson
2024-04-03 20:08 ` [net-next PATCH 06/15] eth: fbnic: add FW communication mechanism Alexander Duyck
2024-04-03 20:08 ` [net-next PATCH 07/15] eth: fbnic: allocate a netdevice and napi vectors with queues Alexander Duyck
2024-04-03 20:58   ` Andrew Lunn
2024-04-03 22:15     ` Alexander Duyck
2024-04-03 22:26       ` Andrew Lunn
2024-04-03 20:08 ` [net-next PATCH 08/15] eth: fbnic: implement Tx queue alloc/start/stop/free Alexander Duyck
2024-04-03 20:09 ` [net-next PATCH 09/15] eth: fbnic: implement Rx " Alexander Duyck
2024-04-04 11:42   ` kernel test robot
2024-04-03 20:09 ` [net-next PATCH 10/15] eth: fbnic: Add initial messaging to notify FW of our presence Alexander Duyck
2024-04-03 20:09 ` [net-next PATCH 11/15] eth: fbnic: Enable Ethernet link setup Alexander Duyck
2024-04-03 21:11   ` Andrew Lunn
2024-04-05 21:51   ` Andrew Lunn
2024-04-21 23:21     ` Alexander Duyck
2024-04-22 15:52       ` Andrew Lunn
2024-04-22 18:59         ` Alexander H Duyck
2024-04-03 20:09 ` [net-next PATCH 12/15] eth: fbnic: add basic Tx handling Alexander Duyck
2024-04-03 20:09 ` [net-next PATCH 13/15] eth: fbnic: add basic Rx handling Alexander Duyck
2024-04-09 11:47   ` Yunsheng Lin
2024-04-09 15:08     ` Alexander Duyck
2024-04-10 11:54       ` Yunsheng Lin
2024-04-10 15:03         ` Alexander Duyck
2024-04-12  8:43           ` Yunsheng Lin
2024-04-12  9:47             ` Yunsheng Lin
2024-04-12 15:05             ` Alexander Duyck
2024-04-15 13:19               ` Yunsheng Lin
2024-04-15 15:03                 ` Alexander Duyck
2024-04-15 17:11                   ` Jakub Kicinski
2024-04-15 18:03                     ` Alexander Duyck
2024-04-15 18:19                       ` Jakub Kicinski
2024-04-15 18:55                         ` Alexander Duyck
2024-04-15 22:01                           ` Jakub Kicinski
2024-04-15 23:57                             ` Alexander Duyck
2024-04-16  0:24                               ` Jakub Kicinski
2024-04-16 13:25                             ` Yunsheng Lin
2024-04-16 14:35                               ` Alexander Duyck
2024-04-16 14:05                       ` Alexander Lobakin
2024-04-16 14:46                         ` Alexander Duyck
2024-04-16 18:26                           ` Andrew Lunn
2024-04-17  8:14                           ` Leon Romanovsky
2024-04-17 16:09                             ` Alexander Duyck
2024-04-17 10:39                           ` Alexander Lobakin
2024-04-03 20:09 ` [net-next PATCH 14/15] eth: fbnic: add L2 address programming Alexander Duyck
2024-04-03 20:09 ` [net-next PATCH 15/15] eth: fbnic: write the TCAM tables used for RSS control and Rx to host Alexander Duyck
2024-04-03 20:42 ` [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface Bjorn Helgaas
2024-04-04 11:37 ` Jiri Pirko
2024-04-04 14:45   ` Alexander Duyck
2024-04-04 15:24     ` Andrew Lunn
2024-04-04 15:37       ` Jakub Kicinski
2024-04-05  3:08         ` David Ahern
2024-04-04 15:36     ` Jiri Pirko
2024-04-04 18:35       ` Andrew Lunn
2024-04-04 19:05         ` Leon Romanovsky
2024-04-04 19:22       ` Alexander Duyck
2024-04-04 20:25         ` Jakub Kicinski
2024-04-04 21:59           ` John Fastabend
2024-04-04 23:50             ` Jakub Kicinski
2024-04-05  0:11               ` Alexander Duyck
2024-04-05  2:38                 ` Jakub Kicinski
2024-04-05 15:41                   ` Alexander Duyck
2024-04-08  6:18                     ` Leon Romanovsky
2024-04-08 15:26                       ` Alexander Duyck
2024-04-08 18:41                         ` Leon Romanovsky
2024-04-08 20:43                           ` Alexander Duyck
2024-04-08 21:49                             ` Florian Fainelli
2024-04-08 21:52                               ` Florian Fainelli
2024-04-09  8:18                             ` Leon Romanovsky
2024-04-09 14:43                               ` Alexander Duyck
2024-04-09 15:39                                 ` Jason Gunthorpe
2024-04-09 16:31                                   ` Alexander Duyck
2024-04-09 17:12                                     ` Jason Gunthorpe
2024-04-09 18:38                                       ` Alexander Duyck
2024-04-09 18:54                                         ` Jason Gunthorpe
2024-04-09 20:03                                           ` Alexander Duyck
2024-04-09 23:11                                             ` Jason Gunthorpe
2024-04-10  9:37                                             ` Jiri Pirko
2024-04-09 19:15                                         ` Leon Romanovsky
2024-04-05  7:11                 ` Paolo Abeni
2024-04-05 12:26                   ` Jason Gunthorpe
2024-04-05 13:06                     ` Daniel Borkmann
2024-04-05 14:24                     ` Alexander Duyck
2024-04-05 15:17                       ` Jason Gunthorpe
2024-04-05 18:38                         ` Alexander Duyck
2024-04-05 19:02                           ` Jason Gunthorpe
2024-04-06 16:05                             ` Alexander Duyck
2024-04-06 16:49                               ` Andrew Lunn
2024-04-06 17:16                                 ` Alexander Duyck
2024-04-08 15:04                               ` Jakub Kicinski
2024-04-08 19:50                               ` Mina Almasry
2024-04-08 11:50                           ` Jiri Pirko
2024-04-08 15:46                             ` Alexander Duyck
2024-04-08 16:51                               ` Jiri Pirko
2024-04-08 17:32                                 ` John Fastabend
2024-04-09 11:01                                   ` Jiri Pirko
2024-04-09 13:11                                     ` Alexander Lobakin
2024-04-09 13:18                                       ` Jason Gunthorpe
2024-04-09 14:08                                       ` Jakub Kicinski
2024-04-09 14:27                                         ` Jakub Kicinski
2024-04-09 14:41                                       ` Jiri Pirko
2024-04-10 11:45                                         ` Alexander Lobakin
2024-04-10 12:12                                           ` Jiri Pirko
2024-04-08 21:36                                 ` Florian Fainelli
2024-04-09 10:56                                   ` Jiri Pirko
2024-04-09 13:05                                     ` Florian Fainelli
2024-04-09 14:28                                       ` Jiri Pirko
2024-04-09 17:42                                         ` Florian Fainelli
2024-04-09 18:38                                           ` Leon Romanovsky
2024-04-08 18:16                               ` Jason Gunthorpe
2024-04-09 16:53                       ` Edward Cree
2024-04-08 11:37                   ` Jiri Pirko
2024-04-04 23:50             ` Alexander Duyck
2024-04-08 11:05             ` Jiri Pirko
2024-04-08 10:54         ` Jiri Pirko
2024-04-05 14:01 ` Przemek Kitszel
2024-04-06 16:53   ` Alexander Duyck
2024-04-09 20:51 ` Jakub Kicinski
2024-04-09 21:06   ` Willem de Bruijn
2024-04-10  7:26     ` Jiri Pirko
2024-04-10 21:30       ` Jacob Keller
2024-04-10 22:19         ` Andrew Lunn
2024-04-11  0:31           ` Jacob Keller
2024-04-09 23:42   ` Andrew Lunn
2024-04-10 15:56     ` Alexander Duyck
2024-04-10 20:01       ` Andrew Lunn
2024-04-10 21:07         ` Alexander Duyck
2024-04-10 22:37           ` Andrew Lunn
2024-04-11 16:00             ` Alexander Duyck
2024-04-11 17:32               ` Andrew Lunn
2024-04-11 23:12                 ` Alexander Duyck
2024-04-11  6:39           ` Jiri Pirko
2024-04-11 16:46             ` Alexander Duyck
2024-04-10  7:42   ` Jiri Pirko
2024-04-10 12:50     ` Przemek Kitszel
2024-04-10 13:46     ` Jakub Kicinski
2024-04-10 15:12       ` Jiri Pirko
2024-04-10 17:35         ` Jakub Kicinski
2024-04-10 17:39           ` Florian Fainelli
2024-04-10 17:56             ` Jakub Kicinski
2024-04-10 18:00               ` Florian Fainelli
2024-04-10 20:03                 ` Jakub Kicinski
2024-04-10 18:01               ` Alexander Duyck
2024-04-10 18:29                 ` Florian Fainelli
2024-04-10 19:58                   ` Jakub Kicinski
2024-04-10 22:03                     ` Jacob Keller
2024-04-11  6:31                       ` Jiri Pirko
2024-04-11 16:22                         ` Jacob Keller
2024-04-11  6:34                 ` Jiri Pirko
