linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V1 0/9] PCIe TPH and cache direct injection support
@ 2024-05-09 16:27 Wei Huang
  2024-05-09 16:27 ` [PATCH V1 1/9] PCI: Introduce PCIe TPH support framework Wei Huang
                   ` (8 more replies)
  0 siblings, 9 replies; 25+ messages in thread
From: Wei Huang @ 2024-05-09 16:27 UTC (permalink / raw)
  To: linux-pci, linux-kernel, linux-doc, netdev
  Cc: bhelgaas, corbet, davem, edumazet, kuba, pabeni, alex.williamson,
	gospo, michael.chan, ajit.khaparde, manoj.panicker2,
	Eric.VanTassell, wei.huang2

Hi All,

TPH (TLP Processing Hints) is a PCIe feature that allows endpoint devices
to provide optimization hints for requests that target memory space. These
hints, in a format called steering tag (ST), are provided in the requester's
TLP headers and allow the system hardware, including the Root Complex, to
optimize the utilization of platform resources for the requests.

Upcoming AMD hardware implement a new Cache Injection feature that leverages
TPH. Cache Injection allows PCIe endpoints to inject I/O Coherent DMA writes
directly into an L2 within the CCX (core complex) closest to the CPU core
that will consume it. The technology is targeted at applications whose
performance is sensitive to the latency of inbound writes as seen by a CPU
core. The applications include networking and storage applications.

This series implements generic TPH support in Linux. It allows STs to be
retrieved from ACPI _DSM (defined by ACPI) and used by PCIe end-point
drivers as needed. As a demo, it includes an usage example in Broadcom BNXT
driver. When running on Broadcom NICs with proper firmware, Cache Injection
shows substantial memory bandwidth saving using real-world benchmarks.

Manoj Panicker (1):
  bnxt_en: Add TPH support in BNXT driver

Michael Chan (1):
  bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG rings

Wei Huang (7):
  PCI: Introduce PCIe TPH support framework
  PCI: Add TPH related register definition
  PCI/TPH: Implement a command line option to disable TPH
  PCI/TPH: Implement a command line option to force No ST Mode
  PCI/TPH: Introduce API functions to get/set steering tags
  PCI/TPH: Retrieve steering tag from ACPI _DSM
  PCI/TPH: Add TPH documentation

 Documentation/PCI/index.rst                   |   1 +
 Documentation/PCI/tph.rst                     |  57 ++
 .../admin-guide/kernel-parameters.txt         |   2 +
 Documentation/driver-api/pci/pci.rst          |   3 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  59 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   4 +
 drivers/pci/pci-driver.c                      |  12 +-
 drivers/pci/pci.c                             |  24 +
 drivers/pci/pci.h                             |   6 +
 drivers/pci/pcie/Kconfig                      |  10 +
 drivers/pci/pcie/Makefile                     |   1 +
 drivers/pci/pcie/tph.c                        | 563 ++++++++++++++++++
 drivers/pci/probe.c                           |   1 +
 drivers/vfio/pci/vfio_pci_config.c            |   7 +-
 include/linux/pci-tph.h                       |  75 +++
 include/linux/pci.h                           |   6 +
 include/uapi/linux/pci_regs.h                 |  35 +-
 17 files changed, 856 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/PCI/tph.rst
 create mode 100644 drivers/pci/pcie/tph.c
 create mode 100644 include/linux/pci-tph.h

-- 
2.44.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH V1 1/9] PCI: Introduce PCIe TPH support framework
  2024-05-09 16:27 [PATCH V1 0/9] PCIe TPH and cache direct injection support Wei Huang
@ 2024-05-09 16:27 ` Wei Huang
  2024-05-09 16:27 ` [PATCH V1 2/9] PCI: Add TPH related register definition Wei Huang
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Wei Huang @ 2024-05-09 16:27 UTC (permalink / raw)
  To: linux-pci, linux-kernel, linux-doc, netdev
  Cc: bhelgaas, corbet, davem, edumazet, kuba, pabeni, alex.williamson,
	gospo, michael.chan, ajit.khaparde, manoj.panicker2,
	Eric.VanTassell, wei.huang2

This patch implements the framework for PCIe TPH support. It introduces
tph.c source file, along with CONFIG_PCIE_TPH, to Linux PCIe subsystem.
A new member, named tph_cap, is also introduced in pci_dev to cache TPH
capability offset.

Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
---
 drivers/pci/pci.h         |  6 ++++++
 drivers/pci/pcie/Kconfig  | 10 ++++++++++
 drivers/pci/pcie/Makefile |  1 +
 drivers/pci/pcie/tph.c    | 28 ++++++++++++++++++++++++++++
 drivers/pci/probe.c       |  1 +
 include/linux/pci.h       |  4 ++++
 6 files changed, 50 insertions(+)
 create mode 100644 drivers/pci/pcie/tph.c

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 17fed1846847..6f1d35a68126 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -508,6 +508,12 @@ static inline int pci_iov_bus_range(struct pci_bus *bus)
 
 #endif /* CONFIG_PCI_IOV */
 
+#ifdef CONFIG_PCIE_TPH
+void pcie_tph_init(struct pci_dev *dev);
+#else
+static inline void pcie_tph_init(struct pci_dev *dev) {}
+#endif
+
 #ifdef CONFIG_PCIE_PTM
 void pci_ptm_init(struct pci_dev *dev);
 void pci_save_ptm_state(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index 8999fcebde6a..a4940e2af9b1 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -155,3 +155,13 @@ config PCIE_EDR
 	  the PCI Firmware Specification r3.2.  Enable this if you want to
 	  support hybrid DPC model which uses both firmware and OS to
 	  implement DPC.
+
+config PCIE_TPH
+	bool "TLP Processing Hints"
+	default n
+	help
+	  This option adds support for PCIE TLP Processing Hints (TPH).
+	  TPH allows endpoint devices to provide optimization hints, such as
+	  desired caching behavior, for requests that target memory space.
+	  These hints, called steering tags, can empower the system hardware
+	  to optimize the utilization of platform resources.
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 6461aa93fe76..3542b42ea0b9 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -13,3 +13,4 @@ obj-$(CONFIG_PCIE_PME)		+= pme.o
 obj-$(CONFIG_PCIE_DPC)		+= dpc.o
 obj-$(CONFIG_PCIE_PTM)		+= ptm.o
 obj-$(CONFIG_PCIE_EDR)		+= edr.o
+obj-$(CONFIG_PCIE_TPH)		+= tph.o
diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
new file mode 100644
index 000000000000..5f0cc06b74bb
--- /dev/null
+++ b/drivers/pci/pcie/tph.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * TPH (TLP Processing Hints) support
+ *
+ * Copyright (C) 2024 Advanced Micro Devices, Inc.
+ *     Eric Van Tassell <Eric.VanTassell@amd.com>
+ *     Wei Huang <wei.huang2@amd.com>
+ */
+
+#define pr_fmt(fmt) "TPH: " fmt
+#define dev_fmt pr_fmt
+
+#include <linux/acpi.h>
+#include <uapi/linux/pci_regs.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/msi.h>
+#include <linux/pci.h>
+#include <linux/msi.h>
+#include <linux/pci-acpi.h>
+
+#include "../pci.h"
+
+void pcie_tph_init(struct pci_dev *dev)
+{
+	dev->tph_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_TPH);
+}
+
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 1325fbae2f28..9ac511032639 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2481,6 +2481,7 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	pci_dpc_init(dev);		/* Downstream Port Containment */
 	pci_rcec_init(dev);		/* Root Complex Event Collector */
 	pci_doe_init(dev);		/* Data Object Exchange */
+	pcie_tph_init(dev);             /* TLP Processing Hints */
 
 	pcie_report_downtraining(dev);
 	pci_init_reset_methods(dev);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 16493426a04f..73d92c7d2c5b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -529,6 +529,10 @@ struct pci_dev {
 
 	/* These methods index pci_reset_fn_methods[] */
 	u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
+
+#ifdef CONFIG_PCIE_TPH
+	u16 tph_cap; /* TPH capability offset */
+#endif
 };
 
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH V1 2/9] PCI: Add TPH related register definition
  2024-05-09 16:27 [PATCH V1 0/9] PCIe TPH and cache direct injection support Wei Huang
  2024-05-09 16:27 ` [PATCH V1 1/9] PCI: Introduce PCIe TPH support framework Wei Huang
@ 2024-05-09 16:27 ` Wei Huang
  2024-05-09 16:27 ` [PATCH V1 3/9] PCI/TPH: Implement a command line option to disable TPH Wei Huang
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Wei Huang @ 2024-05-09 16:27 UTC (permalink / raw)
  To: linux-pci, linux-kernel, linux-doc, netdev
  Cc: bhelgaas, corbet, davem, edumazet, kuba, pabeni, alex.williamson,
	gospo, michael.chan, ajit.khaparde, manoj.panicker2,
	Eric.VanTassell, wei.huang2

Linux has some basic, but incomplete, definition for the TPH Requester
capability registers. Also the control registers of TPH Requester and
the TPH Completer are missing. This patch adds all required definitions
to support TPH enablement.

Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
---
 drivers/vfio/pci/vfio_pci_config.c |  7 +++---
 include/uapi/linux/pci_regs.h      | 35 ++++++++++++++++++++++++++----
 2 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 97422aafaa7b..de622cdfc2a4 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1434,14 +1434,15 @@ static int vfio_ext_cap_len(struct vfio_pci_core_device *vdev, u16 ecap, u16 epo
 		if (ret)
 			return pcibios_err_to_errno(ret);
 
-		if ((dword & PCI_TPH_CAP_LOC_MASK) == PCI_TPH_LOC_CAP) {
+		if (((dword & PCI_TPH_CAP_LOC_MASK) >> PCI_TPH_CAP_LOC_SHIFT)
+			== PCI_TPH_LOC_CAP) {
 			int sts;
 
 			sts = dword & PCI_TPH_CAP_ST_MASK;
 			sts >>= PCI_TPH_CAP_ST_SHIFT;
-			return PCI_TPH_BASE_SIZEOF + (sts * 2) + 2;
+			return PCI_TPH_ST_TABLE + (sts * 2) + 2;
 		}
-		return PCI_TPH_BASE_SIZEOF;
+		return PCI_TPH_ST_TABLE;
 	case PCI_EXT_CAP_ID_DVSEC:
 		ret = pci_read_config_dword(pdev, epos + PCI_DVSEC_HEADER1, &dword);
 		if (ret)
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index a39193213ff2..bf5dcd626613 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -657,6 +657,7 @@
 #define  PCI_EXP_DEVCAP2_ATOMIC_COMP64	0x00000100 /* 64b AtomicOp completion */
 #define  PCI_EXP_DEVCAP2_ATOMIC_COMP128	0x00000200 /* 128b AtomicOp completion */
 #define  PCI_EXP_DEVCAP2_LTR		0x00000800 /* Latency tolerance reporting */
+#define  PCI_EXP_DEVCAP2_TPH_COMP	0x00003000 /* TPH completer support */
 #define  PCI_EXP_DEVCAP2_OBFF_MASK	0x000c0000 /* OBFF support mechanism */
 #define  PCI_EXP_DEVCAP2_OBFF_MSG	0x00040000 /* New message signaling */
 #define  PCI_EXP_DEVCAP2_OBFF_WAKE	0x00080000 /* Re-use WAKE# for OBFF */
@@ -1020,15 +1021,41 @@
 #define  PCI_DPA_CAP_SUBSTATE_MASK	0x1F	/* # substates - 1 */
 #define PCI_DPA_BASE_SIZEOF	16	/* size with 0 substates */
 
+/* TPH Completer Support */
+#define PCI_EXP_DEVCAP2_TPH_COMP_SHIFT		12
+#define PCI_EXP_DEVCAP2_TPH_COMP_NONE		0x0 /* None */
+#define PCI_EXP_DEVCAP2_TPH_COMP_TPH_ONLY	0x1 /* TPH only */
+#define PCI_EXP_DEVCAP2_TPH_COMP_TPH_AND_EXT	0x3 /* TPH and Extended TPH */
+
 /* TPH Requester */
 #define PCI_TPH_CAP		4	/* capability register */
+#define  PCI_TPH_CAP_NO_ST	0x1	/* no ST mode supported */
+#define  PCI_TPH_CAP_NO_ST_SHIFT	0x0	/* no ST mode supported shift */
+#define  PCI_TPH_CAP_INT_VEC	0x2	/* interrupt vector mode supported */
+#define  PCI_TPH_CAP_INT_VEC_SHIFT	0x1	/* interrupt vector mode supported shift */
+#define  PCI_TPH_CAP_DS		0x4	/* device specific mode supported */
+#define  PCI_TPH_CAP_DS_SHIFT	0x4	/* device specific mode supported shift */
 #define  PCI_TPH_CAP_LOC_MASK	0x600	/* location mask */
-#define   PCI_TPH_LOC_NONE	0x000	/* no location */
-#define   PCI_TPH_LOC_CAP	0x200	/* in capability */
-#define   PCI_TPH_LOC_MSIX	0x400	/* in MSI-X */
+#define  PCI_TPH_CAP_LOC_SHIFT	9	/* location shift */
+#define   PCI_TPH_LOC_NONE	0x0	/*  no ST Table */
+#define   PCI_TPH_LOC_CAP	0x1	/*  ST Table in extended capability */
+#define   PCI_TPH_LOC_MSIX	0x2	/*  ST table in MSI-X table */
 #define PCI_TPH_CAP_ST_MASK	0x07FF0000	/* ST table mask */
 #define PCI_TPH_CAP_ST_SHIFT	16	/* ST table shift */
-#define PCI_TPH_BASE_SIZEOF	0xc	/* size with no ST table */
+
+#define PCI_TPH_CTRL		0x8	/* control register */
+#define  PCI_TPH_CTRL_MODE_SEL_MASK	0x7	/* ST Model Select mask */
+#define  PCI_TPH_CTRL_MODE_SEL_SHIFT	0x0	/* ST Model Select shift */
+#define   PCI_TPH_NO_ST_MODE		0x0	/*  No ST Mode */
+#define   PCI_TPH_INT_VEC_MODE		0x1	/*  Interrupt Vector Mode */
+#define   PCI_TPH_DEV_SPEC_MODE		0x2	/*  Device Specific Mode */
+#define  PCI_TPH_CTRL_REQ_EN_MASK	0x300	/* TPH Requester mask */
+#define  PCI_TPH_CTRL_REQ_EN_SHIFT	8	/* TPH Requester shift */
+#define   PCI_TPH_REQ_DISABLE		0x0	/*  No TPH request allowed */
+#define   PCI_TPH_REQ_TPH_ONLY		0x1	/*  8-bit TPH tags allowed */
+#define   PCI_TPH_REQ_EXT_TPH		0x3	/*  16-bit TPH tags allowed */
+
+#define PCI_TPH_ST_TABLE	0xc	/* base of ST table */
 
 /* Downstream Port Containment */
 #define PCI_EXP_DPC_CAP			0x04	/* DPC Capability */
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH V1 3/9] PCI/TPH: Implement a command line option to disable TPH
  2024-05-09 16:27 [PATCH V1 0/9] PCIe TPH and cache direct injection support Wei Huang
  2024-05-09 16:27 ` [PATCH V1 1/9] PCI: Introduce PCIe TPH support framework Wei Huang
  2024-05-09 16:27 ` [PATCH V1 2/9] PCI: Add TPH related register definition Wei Huang
@ 2024-05-09 16:27 ` Wei Huang
  2024-05-09 16:27 ` [PATCH V1 4/9] PCI/TPH: Implement a command line option to force No ST Mode Wei Huang
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Wei Huang @ 2024-05-09 16:27 UTC (permalink / raw)
  To: linux-pci, linux-kernel, linux-doc, netdev
  Cc: bhelgaas, corbet, davem, edumazet, kuba, pabeni, alex.williamson,
	gospo, michael.chan, ajit.khaparde, manoj.panicker2,
	Eric.VanTassell, wei.huang2

Provide a kernel option, with related helper functions, to completely
disable TPH so that no TPH headers are generated.

Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
---
 .../admin-guide/kernel-parameters.txt         |  1 +
 drivers/pci/pci-driver.c                      |  7 ++++-
 drivers/pci/pci.c                             | 12 ++++++++
 drivers/pci/pcie/tph.c                        | 30 +++++++++++++++++++
 include/linux/pci-tph.h                       | 19 ++++++++++++
 include/linux/pci.h                           |  1 +
 6 files changed, 69 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/pci-tph.h

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 396137ee018d..5681600c6941 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4593,6 +4593,7 @@
 		nomio		[S390] Do not use MIO instructions.
 		norid		[S390] ignore the RID field and force use of
 				one PCI domain per PCI function
+		notph		[PCIE] Do not use PCIe TPH
 
 	pcie_aspm=	[PCIE] Forcibly enable or ignore PCIe Active State Power
 			Management.
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index af2996d0d17f..9722d070c0ca 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -21,6 +21,7 @@
 #include <linux/acpi.h>
 #include <linux/dma-map-ops.h>
 #include <linux/iommu.h>
+#include <linux/pci-tph.h>
 #include "pci.h"
 #include "pcie/portdrv.h"
 
@@ -322,8 +323,12 @@ static long local_pci_probe(void *_ddi)
 	pm_runtime_get_sync(dev);
 	pci_dev->driver = pci_drv;
 	rc = pci_drv->probe(pci_dev, ddi->id);
-	if (!rc)
+	if (!rc) {
+		if (pci_tph_disabled())
+			pcie_tph_disable(pci_dev);
+
 		return rc;
+	}
 	if (rc < 0) {
 		pci_dev->driver = NULL;
 		pm_runtime_put_sync(dev);
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e5f243dd4288..06f9656f95bf 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -157,6 +157,9 @@ static bool pcie_ari_disabled;
 /* If set, the PCIe ATS capability will not be used. */
 static bool pcie_ats_disabled;
 
+/* If set, the PCIe TPH capability will not be used. */
+static bool pcie_tph_disabled;
+
 /* If set, the PCI config space of each device is printed during boot. */
 bool pci_early_dump;
 
@@ -166,6 +169,12 @@ bool pci_ats_disabled(void)
 }
 EXPORT_SYMBOL_GPL(pci_ats_disabled);
 
+bool pci_tph_disabled(void)
+{
+	return pcie_tph_disabled;
+}
+EXPORT_SYMBOL_GPL(pci_tph_disabled);
+
 /* Disable bridge_d3 for all PCIe ports */
 static bool pci_bridge_d3_disable;
 /* Force bridge_d3 for all PCIe ports */
@@ -6707,6 +6716,9 @@ static int __init pci_setup(char *str)
 				pci_no_domains();
 			} else if (!strncmp(str, "noari", 5)) {
 				pcie_ari_disabled = true;
+			} else if (!strcmp(str, "notph")) {
+				pr_info("PCIe: TPH is disabled\n");
+				pcie_tph_disabled = true;
 			} else if (!strncmp(str, "cbiosize=", 9)) {
 				pci_cardbus_io_size = memparse(str + 9, &str);
 			} else if (!strncmp(str, "cbmemsize=", 10)) {
diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
index 5f0cc06b74bb..5dc533b89a33 100644
--- a/drivers/pci/pcie/tph.c
+++ b/drivers/pci/pcie/tph.c
@@ -16,11 +16,41 @@
 #include <linux/errno.h>
 #include <linux/msi.h>
 #include <linux/pci.h>
+#include <linux/pci-tph.h>
 #include <linux/msi.h>
 #include <linux/pci-acpi.h>
 
 #include "../pci.h"
 
+static int tph_set_reg_field_u32(struct pci_dev *dev, u8 offset, u32 mask,
+				 u8 shift, u32 field)
+{
+	u32 reg_val;
+	int ret;
+
+	if (!dev->tph_cap)
+		return -EINVAL;
+
+	ret = pci_read_config_dword(dev, dev->tph_cap + offset, &reg_val);
+	if (ret)
+		return ret;
+
+	reg_val &= ~mask;
+	reg_val |= (field << shift) & mask;
+
+	ret = pci_write_config_dword(dev, dev->tph_cap + offset, reg_val);
+
+	return ret;
+}
+
+int pcie_tph_disable(struct pci_dev *dev)
+{
+	return  tph_set_reg_field_u32(dev, PCI_TPH_CTRL,
+				      PCI_TPH_CTRL_REQ_EN_MASK,
+				      PCI_TPH_CTRL_REQ_EN_SHIFT,
+				      PCI_TPH_REQ_DISABLE);
+}
+
 void pcie_tph_init(struct pci_dev *dev)
 {
 	dev->tph_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_TPH);
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
new file mode 100644
index 000000000000..e187d7e89e8c
--- /dev/null
+++ b/include/linux/pci-tph.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * TPH (TLP Processing Hints)
+ *
+ * Copyright (C) 2024 Advanced Micro Devices, Inc.
+ *     Eric Van Tassell <Eric.VanTassell@amd.com>
+ *     Wei Huang <wei.huang2@amd.com>
+ */
+#ifndef LINUX_PCI_TPH_H
+#define LINUX_PCI_TPH_H
+
+#ifdef CONFIG_PCIE_TPH
+int pcie_tph_disable(struct pci_dev *dev);
+#else
+static inline int pcie_tph_disable(struct pci_dev *dev)
+{ return -EOPNOTSUPP; }
+#endif
+
+#endif /* LINUX_PCI_TPH_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 73d92c7d2c5b..63aa6f888c90 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1866,6 +1866,7 @@ static inline bool pci_aer_available(void) { return false; }
 #endif
 
 bool pci_ats_disabled(void);
+bool pci_tph_disabled(void);
 
 #ifdef CONFIG_PCIE_PTM
 int pci_enable_ptm(struct pci_dev *dev, u8 *granularity);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH V1 4/9] PCI/TPH: Implement a command line option to force No ST Mode
  2024-05-09 16:27 [PATCH V1 0/9] PCIe TPH and cache direct injection support Wei Huang
                   ` (2 preceding siblings ...)
  2024-05-09 16:27 ` [PATCH V1 3/9] PCI/TPH: Implement a command line option to disable TPH Wei Huang
@ 2024-05-09 16:27 ` Wei Huang
  2024-05-09 16:27 ` [PATCH V1 5/9] PCI/TPH: Introduce API functions to get/set steering tags Wei Huang
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Wei Huang @ 2024-05-09 16:27 UTC (permalink / raw)
  To: linux-pci, linux-kernel, linux-doc, netdev
  Cc: bhelgaas, corbet, davem, edumazet, kuba, pabeni, alex.williamson,
	gospo, michael.chan, ajit.khaparde, manoj.panicker2,
	Eric.VanTassell, wei.huang2

When "No ST mode" is enabled, end-point devices can generate TPH headers
but with all steering tags treated as zero. A steering tag of zero is
interpreted as "using the default policy" by the root complex. This is
essential to quantify the benefit of steering tags for some given
workloads.

Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
---
 .../admin-guide/kernel-parameters.txt         |  1 +
 drivers/pci/pci-driver.c                      |  7 ++++++-
 drivers/pci/pci.c                             | 12 +++++++++++
 drivers/pci/pcie/tph.c                        | 21 +++++++++++++++++++
 include/linux/pci-tph.h                       |  3 +++
 include/linux/pci.h                           |  1 +
 6 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 5681600c6941..0adbbe291783 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4594,6 +4594,7 @@
 		norid		[S390] ignore the RID field and force use of
 				one PCI domain per PCI function
 		notph		[PCIE] Do not use PCIe TPH
+		nostmode	[PCIE] Force TPH to use No ST Mode
 
 	pcie_aspm=	[PCIE] Forcibly enable or ignore PCIe Active State Power
 			Management.
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 9722d070c0ca..aa98843d9884 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -324,8 +324,13 @@ static long local_pci_probe(void *_ddi)
 	pci_dev->driver = pci_drv;
 	rc = pci_drv->probe(pci_dev, ddi->id);
 	if (!rc) {
-		if (pci_tph_disabled())
+		if (pci_tph_disabled()) {
 			pcie_tph_disable(pci_dev);
+			return rc;
+		}
+
+		if (pci_tph_nostmode())
+			tph_set_dev_nostmode(pci_dev);
 
 		return rc;
 	}
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 06f9656f95bf..ba9ec6f1b51e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -160,6 +160,9 @@ static bool pcie_ats_disabled;
 /* If set, the PCIe TPH capability will not be used. */
 static bool pcie_tph_disabled;
 
+/* If TPH is enabled, "No ST Mode" will be enforced. */
+static bool pcie_tph_nostmode;
+
 /* If set, the PCI config space of each device is printed during boot. */
 bool pci_early_dump;
 
@@ -175,6 +178,12 @@ bool pci_tph_disabled(void)
 }
 EXPORT_SYMBOL_GPL(pci_tph_disabled);
 
+bool pci_tph_nostmode(void)
+{
+	return pcie_tph_nostmode;
+}
+EXPORT_SYMBOL_GPL(pci_tph_nostmode);
+
 /* Disable bridge_d3 for all PCIe ports */
 static bool pci_bridge_d3_disable;
 /* Force bridge_d3 for all PCIe ports */
@@ -6719,6 +6728,9 @@ static int __init pci_setup(char *str)
 			} else if (!strcmp(str, "notph")) {
 				pr_info("PCIe: TPH is disabled\n");
 				pcie_tph_disabled = true;
+			} else if (!strcmp(str, "nostmode")) {
+				pr_info("PCIe: TPH No ST Mode is enabled\n");
+				pcie_tph_nostmode = true;
 			} else if (!strncmp(str, "cbiosize=", 9)) {
 				pci_cardbus_io_size = memparse(str + 9, &str);
 			} else if (!strncmp(str, "cbmemsize=", 10)) {
diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
index 5dc533b89a33..d5f7309fdf52 100644
--- a/drivers/pci/pcie/tph.c
+++ b/drivers/pci/pcie/tph.c
@@ -43,6 +43,27 @@ static int tph_set_reg_field_u32(struct pci_dev *dev, u8 offset, u32 mask,
 	return ret;
 }
 
+int tph_set_dev_nostmode(struct pci_dev *dev)
+{
+	int ret;
+
+	/* set ST Mode Select to "No ST Mode" */
+	ret = tph_set_reg_field_u32(dev, PCI_TPH_CTRL,
+				    PCI_TPH_CTRL_MODE_SEL_MASK,
+				    PCI_TPH_CTRL_MODE_SEL_SHIFT,
+				    PCI_TPH_NO_ST_MODE);
+	if (ret)
+		return ret;
+
+	/* set "TPH Requester Enable" to "TPH only" */
+	ret = tph_set_reg_field_u32(dev, PCI_TPH_CTRL,
+				    PCI_TPH_CTRL_REQ_EN_MASK,
+				    PCI_TPH_CTRL_REQ_EN_SHIFT,
+				    PCI_TPH_REQ_TPH_ONLY);
+
+	return ret;
+}
+
 int pcie_tph_disable(struct pci_dev *dev)
 {
 	return  tph_set_reg_field_u32(dev, PCI_TPH_CTRL,
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index e187d7e89e8c..95269afc8b7d 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -11,9 +11,12 @@
 
 #ifdef CONFIG_PCIE_TPH
 int pcie_tph_disable(struct pci_dev *dev);
+int tph_set_dev_nostmode(struct pci_dev *dev);
 #else
 static inline int pcie_tph_disable(struct pci_dev *dev)
 { return -EOPNOTSUPP; }
+static inline int tph_set_dev_nostmode(struct pci_dev *dev)
+{ return -EOPNOTSUPP; }
 #endif
 
 #endif /* LINUX_PCI_TPH_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 63aa6f888c90..6781a1bd28c5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1867,6 +1867,7 @@ static inline bool pci_aer_available(void) { return false; }
 
 bool pci_ats_disabled(void);
 bool pci_tph_disabled(void);
+bool pci_tph_nostmode(void);
 
 #ifdef CONFIG_PCIE_PTM
 int pci_enable_ptm(struct pci_dev *dev, u8 *granularity);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH V1 5/9] PCI/TPH: Introduce API functions to get/set steering tags
  2024-05-09 16:27 [PATCH V1 0/9] PCIe TPH and cache direct injection support Wei Huang
                   ` (3 preceding siblings ...)
  2024-05-09 16:27 ` [PATCH V1 4/9] PCI/TPH: Implement a command line option to force No ST Mode Wei Huang
@ 2024-05-09 16:27 ` Wei Huang
  2024-05-10  3:07   ` kernel test robot
  2024-05-11 20:15   ` Simon Horman
  2024-05-09 16:27 ` [PATCH V1 6/9] PCI/TPH: Retrieve steering tag from ACPI _DSM Wei Huang
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 25+ messages in thread
From: Wei Huang @ 2024-05-09 16:27 UTC (permalink / raw)
  To: linux-pci, linux-kernel, linux-doc, netdev
  Cc: bhelgaas, corbet, davem, edumazet, kuba, pabeni, alex.williamson,
	gospo, michael.chan, ajit.khaparde, manoj.panicker2,
	Eric.VanTassell, wei.huang2

This patch introduces two API functions, pcie_tph_get_st() and
pcie_tph_set_st(), for a driver to retrieve or configure device's
steering tags. There are two possible locations for steering tag
table and the code automatically figure out the right location to
set the tags if pcie_tph_set_st() is called. Note the tag value is
always zero currently and will be extended in the follow-up patches.

Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
---
 drivers/pci/pcie/tph.c  | 383 ++++++++++++++++++++++++++++++++++++++++
 include/linux/pci-tph.h |  19 ++
 2 files changed, 402 insertions(+)

diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
index d5f7309fdf52..50451a0a32ff 100644
--- a/drivers/pci/pcie/tph.c
+++ b/drivers/pci/pcie/tph.c
@@ -43,6 +43,336 @@ static int tph_set_reg_field_u32(struct pci_dev *dev, u8 offset, u32 mask,
 	return ret;
 }
 
+static int tph_get_reg_field_u32(struct pci_dev *dev, u8 offset, u32 mask,
+				 u8 shift, u32 *field)
+{
+	u32 reg_val;
+	int ret;
+
+	if (!dev->tph_cap)
+		return -EINVAL;
+
+	ret = pci_read_config_dword(dev, dev->tph_cap + offset, &reg_val);
+	if (ret)
+		return ret;
+
+	*field = (reg_val & mask) >> shift;
+
+	return 0;
+}
+
+static int tph_get_table_size(struct pci_dev *dev, u16 *size_out)
+{
+	int ret;
+	u32 tmp;
+
+	ret = tph_get_reg_field_u32(dev, PCI_TPH_CAP,
+				    PCI_TPH_CAP_ST_MASK,
+				    PCI_TPH_CAP_ST_SHIFT, &tmp);
+
+	if (ret)
+		return ret;
+
+	*size_out = (u16)tmp;
+
+	return 0;
+}
+
+/*
+ * For a given device, return a pointer to the MSI table entry at msi_index.
+ */
+static void __iomem *tph_msix_table_entry(struct pci_dev *dev,
+					  __le16 msi_index)
+{
+	void *entry;
+	u16 tbl_sz;
+	int ret;
+
+	ret = tph_get_table_size(dev, &tbl_sz);
+	if (ret || msi_index > tbl_sz)
+		return NULL;
+
+	entry = dev->msix_base + msi_index * PCI_MSIX_ENTRY_SIZE;
+
+	return entry;
+}
+
+/*
+ * For a given device, return a pointer to the vector control register at
+ * offset 0xc of MSI table entry at msi_index.
+ */
+static void __iomem *tph_msix_vector_control(struct pci_dev *dev,
+					     __le16 msi_index)
+{
+	void __iomem *vec_ctrl_addr = tph_msix_table_entry(dev, msi_index);
+
+	if (vec_ctrl_addr)
+		vec_ctrl_addr += PCI_MSIX_ENTRY_VECTOR_CTRL;
+
+	return vec_ctrl_addr;
+}
+
+/*
+ * Translate from MSI-X interrupt index to struct msi_desc *
+ */
+static struct msi_desc *tph_msix_index_to_desc(struct pci_dev *dev, int index)
+{
+	struct msi_desc *entry;
+
+	msi_lock_descs(&dev->dev);
+	msi_for_each_desc(entry, &dev->dev, MSI_DESC_ASSOCIATED) {
+		if (entry->msi_index == index)
+			return entry;
+	}
+	msi_unlock_descs(&dev->dev);
+
+	return NULL;
+}
+
+static bool tph_int_vec_mode_supported(struct pci_dev *dev)
+{
+	u32 mode = 0;
+	int ret;
+
+	ret = tph_get_reg_field_u32(dev, PCI_TPH_CAP,
+				    PCI_TPH_CAP_INT_VEC,
+				    PCI_TPH_CAP_INT_VEC_SHIFT, &mode);
+	if (ret)
+		return false;
+
+	return !!mode;
+}
+
+static int tph_get_table_location(struct pci_dev *dev, u8 *loc_out)
+{
+	u32 loc;
+	int ret;
+
+	ret = tph_get_reg_field_u32(dev, PCI_TPH_CAP, PCI_TPH_CAP_LOC_MASK,
+				    PCI_TPH_CAP_LOC_SHIFT, &loc);
+	if (ret)
+		return ret;
+
+	*loc_out = (u8)loc;
+
+	return 0;
+}
+
+static bool msix_nr_in_bounds(struct pci_dev *dev, int msix_nr)
+{
+	u16 tbl_sz;
+
+	if (tph_get_table_size(dev, &tbl_sz))
+		return false;
+
+	return msix_nr <= tbl_sz;
+}
+
+/* Return root port capability - 0 means none */
+static int get_root_port_completer_cap(struct pci_dev *dev)
+{
+	struct pci_dev *rp;
+	int ret;
+	int val;
+
+	rp = pcie_find_root_port(dev);
+	if (!rp) {
+		pr_err("cannot find root port of %s\n", dev_name(&dev->dev));
+		return 0;
+	}
+
+	ret = pcie_capability_read_dword(rp, PCI_EXP_DEVCAP2, &val);
+	if (ret) {
+		pr_err("cannot read device capabilities 2 of %s\n",
+		       dev_name(&dev->dev));
+		return 0;
+	}
+
+	val &= PCI_EXP_DEVCAP2_TPH_COMP;
+
+	return val >> PCI_EXP_DEVCAP2_TPH_COMP_SHIFT;
+}
+
+/*
+ * TPH device needs to be below a rootport with the TPH Completer and
+ * the completer must offer a compatible level of completer support to that
+ * requested by the device driver.
+ */
+static bool completer_support_ok(struct pci_dev *dev, u8 req)
+{
+	int rp_cap;
+
+	rp_cap = get_root_port_completer_cap(dev);
+
+	if (req > rp_cap) {
+		pr_err("root port lacks proper TPH completer capability\n");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * The PCI Specification version 5.0 requires the "No ST Mode" mode
+ * be supported by any compatible device.
+ */
+static bool no_st_mode_supported(struct pci_dev *dev)
+{
+	bool no_st;
+	int ret;
+	u32 tmp;
+
+	ret = tph_get_reg_field_u32(dev, PCI_TPH_CAP, PCI_TPH_CAP_NO_ST,
+				    PCI_TPH_CAP_NO_ST_SHIFT, &tmp);
+	if (ret)
+		return false;
+
+	no_st = !!tmp;
+
+	if (!no_st) {
+		pr_err("TPH devices must support no ST mode\n");
+		return false;
+	}
+
+	return true;
+}
+
+static int tph_write_ctrl_reg(struct pci_dev *dev, u32 value)
+{
+	int ret;
+
+	ret = tph_set_reg_field_u32(dev, PCI_TPH_CTRL, ~0L, 0, value);
+
+	if (ret)
+		goto err_out;
+
+	return 0;
+
+err_out:
+	/* minimizing possible harm by disabling TPH */
+	pcie_tph_disable(dev);
+	return ret;
+}
+
+/* Update the ST Mode Select field of the TPH Control Register */
+static int tph_set_ctrl_reg_mode_sel(struct pci_dev *dev, u8 st_mode)
+{
+	int ret;
+	u32 ctrl_reg;
+
+	ret = tph_get_reg_field_u32(dev, PCI_TPH_CTRL, ~0L, 0, &ctrl_reg);
+	if (ret)
+		return ret;
+
+	/* clear the mode select and enable fields */
+	ctrl_reg &= ~(PCI_TPH_CTRL_MODE_SEL_MASK);
+	ctrl_reg |= ((u32)(st_mode << PCI_TPH_CTRL_MODE_SEL_SHIFT) &
+		     PCI_TPH_CTRL_MODE_SEL_MASK);
+
+	ret = tph_write_ctrl_reg(dev, ctrl_reg);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+/* Write the steering tag to MSI-X vector control register */
+static void tph_write_tag_to_msix(struct pci_dev *dev, int msix_nr, u16 tag)
+{
+	u32 val;
+	void __iomem *vec_ctrl;
+	struct msi_desc *msi_desc;
+
+	msi_desc = tph_msix_index_to_desc(dev, msix_nr);
+	if (!msi_desc) {
+		pr_err("MSI-X descriptor for #%d not found\n", msix_nr);
+		return;
+	}
+
+	vec_ctrl = tph_msix_vector_control(dev, msi_desc->msi_index);
+
+	val = readl(vec_ctrl);
+	val &= 0xffff;
+	val |= (tag << 16);
+	writel(val, vec_ctrl);
+
+	/* read back to flush the update */
+	val = readl(vec_ctrl);
+	msi_unlock_descs(&dev->dev);
+}
+
+/* Update the TPH Requester Enable field of the TPH Control Register */
+static int tph_set_ctrl_reg_en(struct pci_dev *dev, u8 req_type)
+{
+	int ret;
+	u32 ctrl_reg;
+
+	ret = tph_get_reg_field_u32(dev, PCI_TPH_CTRL, ~0L, 0,
+				    &ctrl_reg);
+	if (ret)
+		return ret;
+
+	/* clear the mode select and enable fields and set new values*/
+	ctrl_reg &= ~(PCI_TPH_CTRL_REQ_EN_MASK);
+	ctrl_reg |= (((u32)req_type << PCI_TPH_CTRL_REQ_EN_SHIFT) &
+			PCI_TPH_CTRL_REQ_EN_MASK);
+
+	ret = tph_write_ctrl_reg(dev, ctrl_reg);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static bool pcie_tph_write_st(struct pci_dev *dev, unsigned int msix_nr,
+			      u8 req_type, u16 tag)
+{
+	int offset;
+	u8  loc;
+	int ret;
+
+	/* setting ST isn't needed - not an error, just return true */
+	if (!dev->tph_cap || pci_tph_disabled() || pci_tph_nostmode() ||
+	    !dev->msix_enabled || !tph_int_vec_mode_supported(dev))
+		return true;
+
+	/* setting ST is incorrect in the following cases - return error */
+	if (!no_st_mode_supported(dev) || !msix_nr_in_bounds(dev, msix_nr) ||
+	    !completer_support_ok(dev, req_type))
+		return false;
+
+	/*
+	 * disable TPH before updating the tag to avoid potential instability
+	 * as cautioned about in the "ST Table Programming" of PCI-E spec
+	 */
+	pcie_tph_disable(dev);
+
+	ret = tph_get_table_location(dev, &loc);
+	if (ret)
+		return false;
+
+	switch (loc) {
+	case PCI_TPH_LOC_MSIX:
+		tph_write_tag_to_msix(dev, msix_nr, tag);
+		break;
+	case PCI_TPH_LOC_CAP:
+		offset = dev->tph_cap + PCI_TPH_ST_TABLE
+			  + msix_nr * sizeof(u16);
+		pci_write_config_word(dev, offset, tag);
+		break;
+	default:
+		pr_err("unable to write steering tag for device %s\n",
+		       dev_name(&dev->dev));
+		return false;
+	}
+
+	/* select interrupt vector mode */
+	tph_set_ctrl_reg_mode_sel(dev, PCI_TPH_INT_VEC_MODE);
+	tph_set_ctrl_reg_en(dev, req_type);
+
+	return true;
+}
+
 int tph_set_dev_nostmode(struct pci_dev *dev)
 {
 	int ret;
@@ -77,3 +407,56 @@ void pcie_tph_init(struct pci_dev *dev)
 	dev->tph_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_TPH);
 }
 
+/**
+ * pcie_tph_get_st() - Retrieve steering tag for a specific CPU
+ * @dev: pci device
+ * @cpu: the acpi cpu_uid.
+ * @mem_type: memory type (vram, nvram)
+ * @req_type: request type (disable, tph, extended tph)
+ * @tag: steering tag return value
+ *
+ * Return:
+ *        true : success
+ *        false: failed
+ */
+bool pcie_tph_get_st(struct pci_dev *dev, unsigned int cpu,
+		    enum tph_mem_type mem_type, u8 req_type,
+		    u16 *tag)
+{
+	*tag = 0;
+
+	return true;
+}
+
+/**
+ * pcie_tph_set_st() - Set steering tag in ST table entry
+ * @dev: pci device
+ * @msix_nr: ordinal number of msix interrupt.
+ * @cpu: the acpi cpu_uid.
+ * @mem_type: memory type (vram, nvram)
+ * @req_type: request type (disable, tph, extended tph)
+ *
+ * Return:
+ *        true : success
+ *        false: failed
+ */
+bool pcie_tph_set_st(struct pci_dev *dev, unsigned int msix_nr,
+		     unsigned int cpu, enum tph_mem_type mem_type,
+		     u8 req_type)
+{
+	u16 tag;
+	bool ret = true;
+
+	ret = pcie_tph_get_st(dev, cpu, mem_type, req_type, &tag);
+
+	if (!ret)
+		return false;
+
+	pr_debug("%s: writing tag %d for msi-x intr %d (cpu: %d)\n",
+		 __func__, tag, msix_nr, cpu);
+
+	ret = pcie_tph_write_st(dev, msix_nr, req_type, tag);
+
+	return ret;
+}
+EXPORT_SYMBOL(pcie_tph_set_st);
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index 95269afc8b7d..42ecd6192e69 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -9,14 +9,33 @@
 #ifndef LINUX_PCI_TPH_H
 #define LINUX_PCI_TPH_H
 
+enum tph_mem_type {
+	TPH_MEM_TYPE_VM,	/* volatile memory type */
+	TPH_MEM_TYPE_PM		/* persistent memory type */
+};
+
 #ifdef CONFIG_PCIE_TPH
 int pcie_tph_disable(struct pci_dev *dev);
 int tph_set_dev_nostmode(struct pci_dev *dev);
+bool pcie_tph_get_st(struct pci_dev *dev, unsigned int cpu,
+		     enum tph_mem_type tag_type, u8 req_enable,
+		     u16 *tag);
+bool pcie_tph_set_st(struct pci_dev *dev, unsigned int msix_nr,
+		     unsigned int cpu, enum tph_mem_type tag_type,
+		     u8 req_enable);
 #else
 static inline int pcie_tph_disable(struct pci_dev *dev)
 { return -EOPNOTSUPP; }
 static inline int tph_set_dev_nostmode(struct pci_dev *dev)
 { return -EOPNOTSUPP; }
+static inline bool pcie_tph_get_st(struct pci_dev *dev, unsigned int cpu,
+				   enum tph_mem_type tag_type, u8 req_enable,
+				   u16 *tag)
+{ return false; }
+static inline bool pcie_tph_set_st(struct pci_dev *dev, unsigned int msix_nr,
+				   unsigned int cpu, enum tph_mem_type tag_type,
+				   u8 req_enable)
+{ return true; }
 #endif
 
 #endif /* LINUX_PCI_TPH_H */
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH V1 6/9] PCI/TPH: Retrieve steering tag from ACPI _DSM
  2024-05-09 16:27 [PATCH V1 0/9] PCIe TPH and cache direct injection support Wei Huang
                   ` (4 preceding siblings ...)
  2024-05-09 16:27 ` [PATCH V1 5/9] PCI/TPH: Introduce API functions to get/set steering tags Wei Huang
@ 2024-05-09 16:27 ` Wei Huang
  2024-05-10  4:20   ` kernel test robot
  2024-05-10  5:24   ` kernel test robot
  2024-05-09 16:27 ` [PATCH V1 7/9] PCI/TPH: Add TPH documentation Wei Huang
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 25+ messages in thread
From: Wei Huang @ 2024-05-09 16:27 UTC (permalink / raw)
  To: linux-pci, linux-kernel, linux-doc, netdev
  Cc: bhelgaas, corbet, davem, edumazet, kuba, pabeni, alex.williamson,
	gospo, michael.chan, ajit.khaparde, manoj.panicker2,
	Eric.VanTassell, wei.huang2

According to PCI SIG ECN, calling the _DSM firmware method for a given
CPU_UID returns the steering tags for different types of memory
(volatile, non-volatile). These tags are supposed to be used in ST
table entry for optimal results.

Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
---
 drivers/pci/pcie/tph.c  | 103 +++++++++++++++++++++++++++++++++++++++-
 include/linux/pci-tph.h |  34 +++++++++++++
 2 files changed, 136 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
index 50451a0a32ff..b9d61e1cfd88 100644
--- a/drivers/pci/pcie/tph.c
+++ b/drivers/pci/pcie/tph.c
@@ -158,6 +158,98 @@ static int tph_get_table_location(struct pci_dev *dev, u8 *loc_out)
 	return 0;
 }
 
+static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
+			   union st_info *st_tag)
+{
+	switch (req_type) {
+	case PCI_TPH_REQ_TPH_ONLY: /* 8 bit tags */
+		switch (mem_type) {
+		case TPH_MEM_TYPE_VM:
+			if (st_tag->vm_st_valid)
+				return st_tag->vm_st;
+			break;
+		case TPH_MEM_TYPE_PM:
+			if (st_tag->pm_st_valid)
+				return st_tag->pm_st;
+			break;
+		}
+		break;
+	case PCI_TPH_REQ_EXT_TPH: /* 16 bit tags */
+		switch (mem_type) {
+		case TPH_MEM_TYPE_VM:
+			if (st_tag->vm_xst_valid)
+				return st_tag->vm_xst;
+			break;
+		case TPH_MEM_TYPE_PM:
+			if (st_tag->pm_xst_valid)
+				return st_tag->pm_xst;
+			break;
+		}
+		break;
+	default:
+		pr_err("invalid steering tag in ACPI _DSM\n");
+		return 0;
+	}
+
+	return 0;
+}
+
+#define MIN_ST_DSM_REV		7
+#define ST_DSM_FUNC_INDEX	0xf
+static bool invoke_dsm(acpi_handle handle, u32 cpu_uid, u8 ph,
+		       u8 target_type, bool cache_ref_valid,
+		       u64 cache_ref, union st_info *st_out)
+{
+	union acpi_object in_obj, in_buf[3], *out_obj;
+
+	in_buf[0].integer.type = ACPI_TYPE_INTEGER;
+	in_buf[0].integer.value = 0; /* 0 => processor cache steering tags */
+
+	in_buf[1].integer.type = ACPI_TYPE_INTEGER;
+	in_buf[1].integer.value = cpu_uid;
+
+	in_buf[2].integer.type = ACPI_TYPE_INTEGER;
+	in_buf[2].integer.value = ph & 3;
+	in_buf[2].integer.value |= (target_type & 1) << 2;
+	in_buf[2].integer.value |= (cache_ref_valid & 1) << 3;
+	in_buf[2].integer.value |= (cache_ref << 32);
+
+	in_obj.type = ACPI_TYPE_PACKAGE;
+	in_obj.package.count = ARRAY_SIZE(in_buf);
+	in_obj.package.elements = in_buf;
+
+	out_obj = acpi_evaluate_dsm(handle, &pci_acpi_dsm_guid, MIN_ST_DSM_REV,
+				    ST_DSM_FUNC_INDEX, &in_obj);
+
+	if (!out_obj)
+		return false;
+
+	if (out_obj->type != ACPI_TYPE_BUFFER) {
+		pr_err("invalid return type %d from TPH _DSM\n",
+		       out_obj->type);
+		ACPI_FREE(out_obj);
+		return false;
+	}
+
+	st_out->value = *((u64 *)(out_obj->buffer.pointer));
+
+	ACPI_FREE(out_obj);
+
+	return true;
+}
+
+static acpi_handle root_complex_acpi_handle(struct pci_dev *dev)
+{
+	struct pci_dev *root_port;
+
+	root_port = pcie_find_root_port(dev);
+
+	if (!root_port || !root_port->bus || !root_port->bus->bridge)
+		return NULL;
+
+	return ACPI_HANDLE(root_port->bus->bridge);
+}
+
 static bool msix_nr_in_bounds(struct pci_dev *dev, int msix_nr)
 {
 	u16 tbl_sz;
@@ -423,7 +515,16 @@ bool pcie_tph_get_st(struct pci_dev *dev, unsigned int cpu,
 		    enum tph_mem_type mem_type, u8 req_type,
 		    u16 *tag)
 {
-	*tag = 0;
+	union st_info info;
+
+	if (!invoke_dsm(root_complex_acpi_handle(dev), cpu, 0, 0, false, 0,
+			&info)) {
+		*tag = 0;
+		return false;
+	}
+
+	*tag = tph_extract_tag(mem_type, req_type, &info);
+	pr_debug("%s: cpu=%d tag=%d\n", __func__, cpu, *tag);
 
 	return true;
 }
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index 42ecd6192e69..ed5299a831cb 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -14,6 +14,40 @@ enum tph_mem_type {
 	TPH_MEM_TYPE_PM		/* persistent memory type */
 };
 
+/*
+ * The st_info struct defines the steering tag returned by the firmware _DSM
+ * method defined in PCI SIG ECN. The specification is available at:
+ * https://members.pcisig.com/wg/PCI-SIG/document/15470.
+
+ * @vm_st_valid:  8 bit tag for volatile memory is valid
+ * @vm_xst_valid: 16 bit tag for volatile memory is valid
+ * @vm_ignore:    1 => was and will be ignored, 0 => ph should be supplied
+ * @vm_st:        8 bit steering tag for volatile mem
+ * @vm_xst:       16 bit steering tag for volatile mem
+ * @pm_st_valid:  8 bit tag for persistent memory is valid
+ * @pm_xst_valid: 16 bit tag for persistent memory is valid
+ * @pm_ignore:    1 => was and will be ignore, 0 => ph should be supplied
+ * @pm_st:        8 bit steering tag for persistent mem
+ * @pm_xst:       16 bit steering tag for persistent mem
+ */
+union st_info {
+	struct {
+		u64 vm_st_valid:1,
+		vm_xst_valid:1,
+		vm_ph_ignore:1,
+		rsvd1:5,
+		vm_st:8,
+		vm_xst:16,
+		pm_st_valid:1,
+		pm_xst_valid:1,
+		pm_ph_ignore:1,
+		rsvd2:5,
+		pm_st:8,
+		pm_xst:16;
+	};
+	u64 value;
+};
+
 #ifdef CONFIG_PCIE_TPH
 int pcie_tph_disable(struct pci_dev *dev);
 int tph_set_dev_nostmode(struct pci_dev *dev);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH V1 7/9] PCI/TPH: Add TPH documentation
  2024-05-09 16:27 [PATCH V1 0/9] PCIe TPH and cache direct injection support Wei Huang
                   ` (5 preceding siblings ...)
  2024-05-09 16:27 ` [PATCH V1 6/9] PCI/TPH: Retrieve steering tag from ACPI _DSM Wei Huang
@ 2024-05-09 16:27 ` Wei Huang
  2024-05-15 12:11   ` Bagas Sanjaya
  2024-05-09 16:27 ` [PATCH V1 8/9] bnxt_en: Add TPH support in BNXT driver Wei Huang
  2024-05-09 16:27 ` [PATCH V1 9/9] bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG rings Wei Huang
  8 siblings, 1 reply; 25+ messages in thread
From: Wei Huang @ 2024-05-09 16:27 UTC (permalink / raw)
  To: linux-pci, linux-kernel, linux-doc, netdev
  Cc: bhelgaas, corbet, davem, edumazet, kuba, pabeni, alex.williamson,
	gospo, michael.chan, ajit.khaparde, manoj.panicker2,
	Eric.VanTassell, wei.huang2

Provide a document for TPH feature, including the description of
kernel options and driver API interface.

Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
---
 Documentation/PCI/index.rst          |  1 +
 Documentation/PCI/tph.rst            | 57 ++++++++++++++++++++++++++++
 Documentation/driver-api/pci/pci.rst |  3 ++
 3 files changed, 61 insertions(+)
 create mode 100644 Documentation/PCI/tph.rst

diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst
index e73f84aebde3..5e7c4e6e726b 100644
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@@ -18,3 +18,4 @@ PCI Bus Subsystem
    pcieaer-howto
    endpoint/index
    boot-interrupts
+   tph
diff --git a/Documentation/PCI/tph.rst b/Documentation/PCI/tph.rst
new file mode 100644
index 000000000000..ea9c8313f3e4
--- /dev/null
+++ b/Documentation/PCI/tph.rst
@@ -0,0 +1,57 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========
+TPH Support
+===========
+
+
+:Copyright: |copy| 2024 Advanced Micro Devices, Inc.
+:Authors: - Eric van Tassell <eric.vantassell@amd.com>
+          - Wei Huang <wei.huang2@amd.com>
+
+Overview
+========
+TPH (TLP Processing Hints) is a PCIe feature that allows endpoint devices
+to provide optimization hints, such as desired caching behavior, for
+requests that target memory space. These hints, in a format called steering
+tags, are provided in the requester's TLP headers and can empower the system
+hardware, including the Root Complex, to optimize the utilization of platform
+resources for the requests.
+
+User Guide
+==========
+
+Kernel Options
+--------------
+There are two kernel command line options available to control TPH feature
+
+   * "notph": TPH will be disabled for all endpoint devices.
+   * "nostmode": TPH will be enabled but the ST Mode will be forced to "No ST Mode".
+
+Device Driver API
+-----------------
+In brief, an endpoint device driver using the TPH interface to configure
+Interrupt Vector Mode will call pcie_tph_set_st() when setting up MSI-X
+interrupts as shown below:
+
+.. code-block:: c
+
+    for (i = 0, j = 0; i < nr_rings; i++) {
+        ...
+        rc = request_irq(irq->vector, irq->handler, flags, irq->name, NULL);
+        ...
+        if (!pcie_tph_set_st(pdev, i, cpumask_first(irq->cpu_mask),
+                             TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY))
+               pr_err("Error in configuring steering tag\n");
+        ...
+    }
+
+If a device only supports TPH vendor specific mode, its driver can call
+pcie_tph_get_st() to retrieve the steering tag for a specific CPU and uses
+the tag to control TPH behavior.
+
+.. kernel-doc:: drivers/pci/pcie/tph.c
+   :export:
+
+.. kernel-doc:: drivers/pci/pcie/tph.c
+   :identifiers: pcie_tph_set_st
diff --git a/Documentation/driver-api/pci/pci.rst b/Documentation/driver-api/pci/pci.rst
index aa40b1cc243b..3d896b2cf16e 100644
--- a/Documentation/driver-api/pci/pci.rst
+++ b/Documentation/driver-api/pci/pci.rst
@@ -46,6 +46,9 @@ PCI Support Library
 .. kernel-doc:: drivers/pci/pci-sysfs.c
    :internal:
 
+.. kernel-doc:: drivers/pci/pcie/tph.c
+   :export:
+
 PCI Hotplug Support Library
 ---------------------------
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH V1 8/9] bnxt_en: Add TPH support in BNXT driver
  2024-05-09 16:27 [PATCH V1 0/9] PCIe TPH and cache direct injection support Wei Huang
                   ` (6 preceding siblings ...)
  2024-05-09 16:27 ` [PATCH V1 7/9] PCI/TPH: Add TPH documentation Wei Huang
@ 2024-05-09 16:27 ` Wei Huang
  2024-05-09 21:50   ` Vadim Fedorenko
  2024-05-10  3:10   ` Somnath Kotur
  2024-05-09 16:27 ` [PATCH V1 9/9] bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG rings Wei Huang
  8 siblings, 2 replies; 25+ messages in thread
From: Wei Huang @ 2024-05-09 16:27 UTC (permalink / raw)
  To: linux-pci, linux-kernel, linux-doc, netdev
  Cc: bhelgaas, corbet, davem, edumazet, kuba, pabeni, alex.williamson,
	gospo, michael.chan, ajit.khaparde, manoj.panicker2,
	Eric.VanTassell, wei.huang2

From: Manoj Panicker <manoj.panicker2@amd.com>

As a usage example, this patch implements TPH support in Broadcom BNXT
device driver by invoking pcie_tph_set_st() function when interrupt
affinity is changed.

Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 51 +++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 ++
 2 files changed, 55 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 2c2ee79c4d77..be9c17566fb4 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -55,6 +55,7 @@
 #include <net/page_pool/helpers.h>
 #include <linux/align.h>
 #include <net/netdev_queues.h>
+#include <linux/pci-tph.h>
 
 #include "bnxt_hsi.h"
 #include "bnxt.h"
@@ -10491,6 +10492,7 @@ static void bnxt_free_irq(struct bnxt *bp)
 				free_cpumask_var(irq->cpu_mask);
 				irq->have_cpumask = 0;
 			}
+			irq_set_affinity_notifier(irq->vector, NULL);
 			free_irq(irq->vector, bp->bnapi[i]);
 		}
 
@@ -10498,6 +10500,45 @@ static void bnxt_free_irq(struct bnxt *bp)
 	}
 }
 
+static void bnxt_rtnl_lock_sp(struct bnxt *bp);
+static void bnxt_rtnl_unlock_sp(struct bnxt *bp);
+static void bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
+				     const cpumask_t *mask)
+{
+	struct bnxt_irq *irq;
+
+	irq = container_of(notify, struct bnxt_irq, affinity_notify);
+	cpumask_copy(irq->cpu_mask, mask);
+
+	if (!pcie_tph_set_st(irq->bp->pdev, irq->msix_nr,
+			     cpumask_first(irq->cpu_mask),
+			     TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY))
+		pr_err("error in configuring steering tag\n");
+
+	if (netif_running(irq->bp->dev)) {
+		rtnl_lock();
+		bnxt_close_nic(irq->bp, false, false);
+		bnxt_open_nic(irq->bp, false, false);
+		rtnl_unlock();
+	}
+}
+
+static void bnxt_irq_affinity_release(struct kref __always_unused *ref)
+{
+}
+
+static inline void __bnxt_register_notify_irqchanges(struct bnxt_irq *irq)
+{
+	struct irq_affinity_notify *notify;
+
+	notify = &irq->affinity_notify;
+	notify->irq = irq->vector;
+	notify->notify = bnxt_irq_affinity_notify;
+	notify->release = bnxt_irq_affinity_release;
+
+	irq_set_affinity_notifier(irq->vector, notify);
+}
+
 static int bnxt_request_irq(struct bnxt *bp)
 {
 	int i, j, rc = 0;
@@ -10543,6 +10584,7 @@ static int bnxt_request_irq(struct bnxt *bp)
 			int numa_node = dev_to_node(&bp->pdev->dev);
 
 			irq->have_cpumask = 1;
+			irq->msix_nr = map_idx;
 			cpumask_set_cpu(cpumask_local_spread(i, numa_node),
 					irq->cpu_mask);
 			rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask);
@@ -10552,6 +10594,15 @@ static int bnxt_request_irq(struct bnxt *bp)
 					    irq->vector);
 				break;
 			}
+
+			if (!pcie_tph_set_st(bp->pdev, i,
+					     cpumask_first(irq->cpu_mask),
+					     TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY)) {
+				netdev_err(bp->dev, "error in setting steering tag\n");
+			} else {
+				irq->bp = bp;
+				__bnxt_register_notify_irqchanges(irq);
+			}
 		}
 	}
 	return rc;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index dd849e715c9b..0d3442590bb4 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1195,6 +1195,10 @@ struct bnxt_irq {
 	u8		have_cpumask:1;
 	char		name[IFNAMSIZ + 2];
 	cpumask_var_t	cpu_mask;
+
+	int		msix_nr;
+	struct bnxt	*bp;
+	struct irq_affinity_notify affinity_notify;
 };
 
 #define HWRM_RING_ALLOC_TX	0x1
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH V1 9/9] bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG rings
  2024-05-09 16:27 [PATCH V1 0/9] PCIe TPH and cache direct injection support Wei Huang
                   ` (7 preceding siblings ...)
  2024-05-09 16:27 ` [PATCH V1 8/9] bnxt_en: Add TPH support in BNXT driver Wei Huang
@ 2024-05-09 16:27 ` Wei Huang
  8 siblings, 0 replies; 25+ messages in thread
From: Wei Huang @ 2024-05-09 16:27 UTC (permalink / raw)
  To: linux-pci, linux-kernel, linux-doc, netdev
  Cc: bhelgaas, corbet, davem, edumazet, kuba, pabeni, alex.williamson,
	gospo, michael.chan, ajit.khaparde, manoj.panicker2,
	Eric.VanTassell, wei.huang2

From: Michael Chan <michael.chan@broadcom.com>

Newer firmware can use the NQ ring ID associated with each RX/RX AGG
ring to enable PCIe steering tag.  Older firmware will just ignore the
information.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index be9c17566fb4..2b5bedb47a27 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6620,10 +6620,12 @@ static int hwrm_ring_alloc_send_msg(struct bnxt *bp,
 
 			/* Association of rx ring with stats context */
 			grp_info = &bp->grp_info[ring->grp_idx];
+			req->nq_ring_id = cpu_to_le16(grp_info->cp_fw_ring_id);
 			req->rx_buf_size = cpu_to_le16(bp->rx_buf_use_size);
 			req->stat_ctx_id = cpu_to_le32(grp_info->fw_stats_ctx);
 			req->enables |= cpu_to_le32(
-				RING_ALLOC_REQ_ENABLES_RX_BUF_SIZE_VALID);
+				RING_ALLOC_REQ_ENABLES_RX_BUF_SIZE_VALID |
+				RING_ALLOC_REQ_ENABLES_NQ_RING_ID_VALID);
 			if (NET_IP_ALIGN == 2)
 				flags = RING_ALLOC_REQ_FLAGS_RX_SOP_PAD;
 			req->flags = cpu_to_le16(flags);
@@ -6635,11 +6637,13 @@ static int hwrm_ring_alloc_send_msg(struct bnxt *bp,
 			/* Association of agg ring with rx ring */
 			grp_info = &bp->grp_info[ring->grp_idx];
 			req->rx_ring_id = cpu_to_le16(grp_info->rx_fw_ring_id);
+			req->nq_ring_id = cpu_to_le16(grp_info->cp_fw_ring_id);
 			req->rx_buf_size = cpu_to_le16(BNXT_RX_PAGE_SIZE);
 			req->stat_ctx_id = cpu_to_le32(grp_info->fw_stats_ctx);
 			req->enables |= cpu_to_le32(
 				RING_ALLOC_REQ_ENABLES_RX_RING_ID_VALID |
-				RING_ALLOC_REQ_ENABLES_RX_BUF_SIZE_VALID);
+				RING_ALLOC_REQ_ENABLES_RX_BUF_SIZE_VALID |
+				RING_ALLOC_REQ_ENABLES_NQ_RING_ID_VALID);
 		} else {
 			req->ring_type = RING_ALLOC_REQ_RING_TYPE_RX;
 		}
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH V1 8/9] bnxt_en: Add TPH support in BNXT driver
  2024-05-09 16:27 ` [PATCH V1 8/9] bnxt_en: Add TPH support in BNXT driver Wei Huang
@ 2024-05-09 21:50   ` Vadim Fedorenko
  2024-05-10  3:55     ` Ajit Khaparde
  2024-05-10  3:10   ` Somnath Kotur
  1 sibling, 1 reply; 25+ messages in thread
From: Vadim Fedorenko @ 2024-05-09 21:50 UTC (permalink / raw)
  To: Wei Huang, linux-pci, linux-kernel, linux-doc, netdev
  Cc: bhelgaas, corbet, davem, edumazet, kuba, pabeni, alex.williamson,
	gospo, michael.chan, ajit.khaparde, manoj.panicker2,
	Eric.VanTassell

On 09/05/2024 17:27, Wei Huang wrote:
> From: Manoj Panicker <manoj.panicker2@amd.com>
> 
> As a usage example, this patch implements TPH support in Broadcom BNXT
> device driver by invoking pcie_tph_set_st() function when interrupt
> affinity is changed.
> 
> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> Reviewed-by: Wei Huang <wei.huang2@amd.com>
> Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com>
> ---
>   drivers/net/ethernet/broadcom/bnxt/bnxt.c | 51 +++++++++++++++++++++++
>   drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 ++
>   2 files changed, 55 insertions(+)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 2c2ee79c4d77..be9c17566fb4 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -55,6 +55,7 @@
>   #include <net/page_pool/helpers.h>
>   #include <linux/align.h>
>   #include <net/netdev_queues.h>
> +#include <linux/pci-tph.h>
>   
>   #include "bnxt_hsi.h"
>   #include "bnxt.h"
> @@ -10491,6 +10492,7 @@ static void bnxt_free_irq(struct bnxt *bp)
>   				free_cpumask_var(irq->cpu_mask);
>   				irq->have_cpumask = 0;
>   			}
> +			irq_set_affinity_notifier(irq->vector, NULL);
>   			free_irq(irq->vector, bp->bnapi[i]);
>   		}
>   
> @@ -10498,6 +10500,45 @@ static void bnxt_free_irq(struct bnxt *bp)
>   	}
>   }
>   
> +static void bnxt_rtnl_lock_sp(struct bnxt *bp);
> +static void bnxt_rtnl_unlock_sp(struct bnxt *bp);
> +static void bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
> +				     const cpumask_t *mask)
> +{
> +	struct bnxt_irq *irq;
> +
> +	irq = container_of(notify, struct bnxt_irq, affinity_notify);
> +	cpumask_copy(irq->cpu_mask, mask);
> +
> +	if (!pcie_tph_set_st(irq->bp->pdev, irq->msix_nr,
> +			     cpumask_first(irq->cpu_mask),
> +			     TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY))
> +		pr_err("error in configuring steering tag\n");
> +
> +	if (netif_running(irq->bp->dev)) {
> +		rtnl_lock();
> +		bnxt_close_nic(irq->bp, false, false);
> +		bnxt_open_nic(irq->bp, false, false);
> +		rtnl_unlock();
> +	}

Is it really needed? It will cause link flap and pause in the traffic
service for the device. Why the device needs full restart in this case?

> +}
> +
> +static void bnxt_irq_affinity_release(struct kref __always_unused *ref)
> +{
> +}
> +
> +static inline void __bnxt_register_notify_irqchanges(struct bnxt_irq *irq)

No inlines in .c files, please. Let compiler decide what to inline.

> +{
> +	struct irq_affinity_notify *notify;
> +
> +	notify = &irq->affinity_notify;
> +	notify->irq = irq->vector;
> +	notify->notify = bnxt_irq_affinity_notify;
> +	notify->release = bnxt_irq_affinity_release;
> +
> +	irq_set_affinity_notifier(irq->vector, notify);
> +}
> +
>   static int bnxt_request_irq(struct bnxt *bp)
>   {
>   	int i, j, rc = 0;
> @@ -10543,6 +10584,7 @@ static int bnxt_request_irq(struct bnxt *bp)
>   			int numa_node = dev_to_node(&bp->pdev->dev);
>   
>   			irq->have_cpumask = 1;
> +			irq->msix_nr = map_idx;
>   			cpumask_set_cpu(cpumask_local_spread(i, numa_node),
>   					irq->cpu_mask);
>   			rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask);
> @@ -10552,6 +10594,15 @@ static int bnxt_request_irq(struct bnxt *bp)
>   					    irq->vector);
>   				break;
>   			}
> +
> +			if (!pcie_tph_set_st(bp->pdev, i,
> +					     cpumask_first(irq->cpu_mask),
> +					     TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY)) {
> +				netdev_err(bp->dev, "error in setting steering tag\n");
> +			} else {
> +				irq->bp = bp;
> +				__bnxt_register_notify_irqchanges(irq);
> +			}
>   		}
>   	}
>   	return rc;
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> index dd849e715c9b..0d3442590bb4 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> @@ -1195,6 +1195,10 @@ struct bnxt_irq {
>   	u8		have_cpumask:1;
>   	char		name[IFNAMSIZ + 2];
>   	cpumask_var_t	cpu_mask;
> +
> +	int		msix_nr;
> +	struct bnxt	*bp;
> +	struct irq_affinity_notify affinity_notify;
>   };
>   
>   #define HWRM_RING_ALLOC_TX	0x1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V1 5/9] PCI/TPH: Introduce API functions to get/set steering tags
  2024-05-09 16:27 ` [PATCH V1 5/9] PCI/TPH: Introduce API functions to get/set steering tags Wei Huang
@ 2024-05-10  3:07   ` kernel test robot
  2024-05-11 20:15   ` Simon Horman
  1 sibling, 0 replies; 25+ messages in thread
From: kernel test robot @ 2024-05-10  3:07 UTC (permalink / raw)
  To: Wei Huang, linux-pci, linux-kernel, linux-doc, netdev
  Cc: oe-kbuild-all, bhelgaas, corbet, davem, edumazet, kuba, pabeni,
	alex.williamson, gospo, michael.chan, ajit.khaparde,
	manoj.panicker2, Eric.VanTassell, wei.huang2

Hi Wei,

kernel test robot noticed the following build errors:

[auto build test ERROR on pci/for-linus]
[also build test ERROR on awilliam-vfio/next linus/master awilliam-vfio/for-linus v6.9-rc7 next-20240509]
[cannot apply to pci/next horms-ipvs/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Wei-Huang/PCI-Introduce-PCIe-TPH-support-framework/20240510-003504
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git for-linus
patch link:    https://lore.kernel.org/r/20240509162741.1937586-6-wei.huang2%40amd.com
patch subject: [PATCH V1 5/9] PCI/TPH: Introduce API functions to get/set steering tags
config: parisc-randconfig-r081-20240510 (https://download.01.org/0day-ci/archive/20240510/202405101033.2xLAqHIx-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240510/202405101033.2xLAqHIx-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405101033.2xLAqHIx-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/pci/pcie/tph.c: In function 'tph_msix_table_entry':
>> drivers/pci/pcie/tph.c:95:22: error: 'struct pci_dev' has no member named 'msix_base'; did you mean 'msix_cap'?
      95 |         entry = dev->msix_base + msi_index * PCI_MSIX_ENTRY_SIZE;
         |                      ^~~~~~~~~
         |                      msix_cap


vim +95 drivers/pci/pcie/tph.c

    80	
    81	/*
    82	 * For a given device, return a pointer to the MSI table entry at msi_index.
    83	 */
    84	static void __iomem *tph_msix_table_entry(struct pci_dev *dev,
    85						  __le16 msi_index)
    86	{
    87		void *entry;
    88		u16 tbl_sz;
    89		int ret;
    90	
    91		ret = tph_get_table_size(dev, &tbl_sz);
    92		if (ret || msi_index > tbl_sz)
    93			return NULL;
    94	
  > 95		entry = dev->msix_base + msi_index * PCI_MSIX_ENTRY_SIZE;
    96	
    97		return entry;
    98	}
    99	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V1 8/9] bnxt_en: Add TPH support in BNXT driver
  2024-05-09 16:27 ` [PATCH V1 8/9] bnxt_en: Add TPH support in BNXT driver Wei Huang
  2024-05-09 21:50   ` Vadim Fedorenko
@ 2024-05-10  3:10   ` Somnath Kotur
  1 sibling, 0 replies; 25+ messages in thread
From: Somnath Kotur @ 2024-05-10  3:10 UTC (permalink / raw)
  To: Wei Huang
  Cc: linux-pci, linux-kernel, linux-doc, netdev, bhelgaas, corbet,
	davem, edumazet, kuba, pabeni, alex.williamson, gospo,
	michael.chan, ajit.khaparde, manoj.panicker2, Eric.VanTassell

[-- Attachment #1: Type: text/plain, Size: 5069 bytes --]

On Thu, May 9, 2024 at 10:02 PM Wei Huang <wei.huang2@amd.com> wrote:
>
> From: Manoj Panicker <manoj.panicker2@amd.com>
>
> As a usage example, this patch implements TPH support in Broadcom BNXT
> device driver by invoking pcie_tph_set_st() function when interrupt
> affinity is changed.
>
> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> Reviewed-by: Wei Huang <wei.huang2@amd.com>
> Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com>
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c | 51 +++++++++++++++++++++++
>  drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 ++
>  2 files changed, 55 insertions(+)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 2c2ee79c4d77..be9c17566fb4 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -55,6 +55,7 @@
>  #include <net/page_pool/helpers.h>
>  #include <linux/align.h>
>  #include <net/netdev_queues.h>
> +#include <linux/pci-tph.h>
>
>  #include "bnxt_hsi.h"
>  #include "bnxt.h"
> @@ -10491,6 +10492,7 @@ static void bnxt_free_irq(struct bnxt *bp)
>                                 free_cpumask_var(irq->cpu_mask);
>                                 irq->have_cpumask = 0;
>                         }
> +                       irq_set_affinity_notifier(irq->vector, NULL);
>                         free_irq(irq->vector, bp->bnapi[i]);
>                 }
>
> @@ -10498,6 +10500,45 @@ static void bnxt_free_irq(struct bnxt *bp)
>         }
>  }
>
> +static void bnxt_rtnl_lock_sp(struct bnxt *bp);
> +static void bnxt_rtnl_unlock_sp(struct bnxt *bp);
> +static void bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
> +                                    const cpumask_t *mask)
> +{
> +       struct bnxt_irq *irq;
> +
> +       irq = container_of(notify, struct bnxt_irq, affinity_notify);
> +       cpumask_copy(irq->cpu_mask, mask);
> +
> +       if (!pcie_tph_set_st(irq->bp->pdev, irq->msix_nr,
> +                            cpumask_first(irq->cpu_mask),
> +                            TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY))
> +               pr_err("error in configuring steering tag\n");
> +
> +       if (netif_running(irq->bp->dev)) {
> +               rtnl_lock();
> +               bnxt_close_nic(irq->bp, false, false);
> +               bnxt_open_nic(irq->bp, false, false);
> +               rtnl_unlock();
> +       }
> +}
> +
> +static void bnxt_irq_affinity_release(struct kref __always_unused *ref)
> +{
> +}
> +
> +static inline void __bnxt_register_notify_irqchanges(struct bnxt_irq *irq)
> +{
> +       struct irq_affinity_notify *notify;
> +
> +       notify = &irq->affinity_notify;
> +       notify->irq = irq->vector;
> +       notify->notify = bnxt_irq_affinity_notify;
> +       notify->release = bnxt_irq_affinity_release;
> +
> +       irq_set_affinity_notifier(irq->vector, notify);
> +}
> +
>  static int bnxt_request_irq(struct bnxt *bp)
>  {
>         int i, j, rc = 0;
> @@ -10543,6 +10584,7 @@ static int bnxt_request_irq(struct bnxt *bp)
>                         int numa_node = dev_to_node(&bp->pdev->dev);
>
>                         irq->have_cpumask = 1;
> +                       irq->msix_nr = map_idx;
>                         cpumask_set_cpu(cpumask_local_spread(i, numa_node),
>                                         irq->cpu_mask);
>                         rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask);
> @@ -10552,6 +10594,15 @@ static int bnxt_request_irq(struct bnxt *bp)
>                                             irq->vector);
>                                 break;
>                         }
> +
> +                       if (!pcie_tph_set_st(bp->pdev, i,
> +                                            cpumask_first(irq->cpu_mask),
> +                                            TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY)) {
> +                               netdev_err(bp->dev, "error in setting steering tag\n");
> +                       } else {
> +                               irq->bp = bp;
> +                               __bnxt_register_notify_irqchanges(irq);
> +                       }
>                 }
>         }
>         return rc;
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> index dd849e715c9b..0d3442590bb4 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> @@ -1195,6 +1195,10 @@ struct bnxt_irq {
>         u8              have_cpumask:1;
>         char            name[IFNAMSIZ + 2];
>         cpumask_var_t   cpu_mask;
> +
> +       int             msix_nr;
> +       struct bnxt     *bp;
> +       struct irq_affinity_notify affinity_notify;
>  };
>
>  #define HWRM_RING_ALLOC_TX     0x1
> --
> 2.44.0
>
>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4212 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V1 8/9] bnxt_en: Add TPH support in BNXT driver
  2024-05-09 21:50   ` Vadim Fedorenko
@ 2024-05-10  3:55     ` Ajit Khaparde
  2024-05-10 10:35       ` Vadim Fedorenko
  0 siblings, 1 reply; 25+ messages in thread
From: Ajit Khaparde @ 2024-05-10  3:55 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Wei Huang, linux-pci, linux-kernel, linux-doc, netdev, bhelgaas,
	corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
	michael.chan, manoj.panicker2, Eric.VanTassell

[-- Attachment #1: Type: text/plain, Size: 5711 bytes --]

On Thu, May 9, 2024 at 2:50 PM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
>
> On 09/05/2024 17:27, Wei Huang wrote:
> > From: Manoj Panicker <manoj.panicker2@amd.com>
> >
> > As a usage example, this patch implements TPH support in Broadcom BNXT
> > device driver by invoking pcie_tph_set_st() function when interrupt
> > affinity is changed.
> >
> > Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> > Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> > Reviewed-by: Wei Huang <wei.huang2@amd.com>
> > Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com>
> > ---
> >   drivers/net/ethernet/broadcom/bnxt/bnxt.c | 51 +++++++++++++++++++++++
> >   drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 ++
> >   2 files changed, 55 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > index 2c2ee79c4d77..be9c17566fb4 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > @@ -55,6 +55,7 @@
> >   #include <net/page_pool/helpers.h>
> >   #include <linux/align.h>
> >   #include <net/netdev_queues.h>
> > +#include <linux/pci-tph.h>
> >
> >   #include "bnxt_hsi.h"
> >   #include "bnxt.h"
> > @@ -10491,6 +10492,7 @@ static void bnxt_free_irq(struct bnxt *bp)
> >                               free_cpumask_var(irq->cpu_mask);
> >                               irq->have_cpumask = 0;
> >                       }
> > +                     irq_set_affinity_notifier(irq->vector, NULL);
> >                       free_irq(irq->vector, bp->bnapi[i]);
> >               }
> >
> > @@ -10498,6 +10500,45 @@ static void bnxt_free_irq(struct bnxt *bp)
> >       }
> >   }
> >
> > +static void bnxt_rtnl_lock_sp(struct bnxt *bp);
> > +static void bnxt_rtnl_unlock_sp(struct bnxt *bp);
> > +static void bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
> > +                                  const cpumask_t *mask)
> > +{
> > +     struct bnxt_irq *irq;
> > +
> > +     irq = container_of(notify, struct bnxt_irq, affinity_notify);
> > +     cpumask_copy(irq->cpu_mask, mask);
> > +
> > +     if (!pcie_tph_set_st(irq->bp->pdev, irq->msix_nr,
> > +                          cpumask_first(irq->cpu_mask),
> > +                          TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY))
> > +             pr_err("error in configuring steering tag\n");
> > +
> > +     if (netif_running(irq->bp->dev)) {
> > +             rtnl_lock();
> > +             bnxt_close_nic(irq->bp, false, false);
> > +             bnxt_open_nic(irq->bp, false, false);
> > +             rtnl_unlock();
> > +     }
>
> Is it really needed? It will cause link flap and pause in the traffic
> service for the device. Why the device needs full restart in this case?

In that sequence only the rings are recreated for the hardware to sync
up the tags.

Actually its not a full restart. There is no link reinit or other
heavy lifting in this sequence.
The pause in traffic may be momentary. Do IRQ/CPU affinities change frequently?
Probably not?

>
>
> > +}
> > +
> > +static void bnxt_irq_affinity_release(struct kref __always_unused *ref)
> > +{
> > +}
> > +
> > +static inline void __bnxt_register_notify_irqchanges(struct bnxt_irq *irq)
>
> No inlines in .c files, please. Let compiler decide what to inline.
>
> > +{
> > +     struct irq_affinity_notify *notify;
> > +
> > +     notify = &irq->affinity_notify;
> > +     notify->irq = irq->vector;
> > +     notify->notify = bnxt_irq_affinity_notify;
> > +     notify->release = bnxt_irq_affinity_release;
> > +
> > +     irq_set_affinity_notifier(irq->vector, notify);
> > +}
> > +
> >   static int bnxt_request_irq(struct bnxt *bp)
> >   {
> >       int i, j, rc = 0;
> > @@ -10543,6 +10584,7 @@ static int bnxt_request_irq(struct bnxt *bp)
> >                       int numa_node = dev_to_node(&bp->pdev->dev);
> >
> >                       irq->have_cpumask = 1;
> > +                     irq->msix_nr = map_idx;
> >                       cpumask_set_cpu(cpumask_local_spread(i, numa_node),
> >                                       irq->cpu_mask);
> >                       rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask);
> > @@ -10552,6 +10594,15 @@ static int bnxt_request_irq(struct bnxt *bp)
> >                                           irq->vector);
> >                               break;
> >                       }
> > +
> > +                     if (!pcie_tph_set_st(bp->pdev, i,
> > +                                          cpumask_first(irq->cpu_mask),
> > +                                          TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY)) {
> > +                             netdev_err(bp->dev, "error in setting steering tag\n");
> > +                     } else {
> > +                             irq->bp = bp;
> > +                             __bnxt_register_notify_irqchanges(irq);
> > +                     }
> >               }
> >       }
> >       return rc;
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> > index dd849e715c9b..0d3442590bb4 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> > @@ -1195,6 +1195,10 @@ struct bnxt_irq {
> >       u8              have_cpumask:1;
> >       char            name[IFNAMSIZ + 2];
> >       cpumask_var_t   cpu_mask;
> > +
> > +     int             msix_nr;
> > +     struct bnxt     *bp;
> > +     struct irq_affinity_notify affinity_notify;
> >   };
> >
> >   #define HWRM_RING_ALLOC_TX  0x1
>

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4218 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V1 6/9] PCI/TPH: Retrieve steering tag from ACPI _DSM
  2024-05-09 16:27 ` [PATCH V1 6/9] PCI/TPH: Retrieve steering tag from ACPI _DSM Wei Huang
@ 2024-05-10  4:20   ` kernel test robot
  2024-05-10  5:24   ` kernel test robot
  1 sibling, 0 replies; 25+ messages in thread
From: kernel test robot @ 2024-05-10  4:20 UTC (permalink / raw)
  To: Wei Huang, linux-pci, linux-kernel, linux-doc, netdev
  Cc: oe-kbuild-all, bhelgaas, corbet, davem, edumazet, kuba, pabeni,
	alex.williamson, gospo, michael.chan, ajit.khaparde,
	manoj.panicker2, Eric.VanTassell, wei.huang2

Hi Wei,

kernel test robot noticed the following build errors:

[auto build test ERROR on pci/for-linus]
[also build test ERROR on awilliam-vfio/next linus/master awilliam-vfio/for-linus v6.9-rc7 next-20240509]
[cannot apply to pci/next horms-ipvs/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Wei-Huang/PCI-Introduce-PCIe-TPH-support-framework/20240510-003504
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git for-linus
patch link:    https://lore.kernel.org/r/20240509162741.1937586-7-wei.huang2%40amd.com
patch subject: [PATCH V1 6/9] PCI/TPH: Retrieve steering tag from ACPI _DSM
config: parisc-randconfig-r081-20240510 (https://download.01.org/0day-ci/archive/20240510/202405101200.FPuliW1p-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240510/202405101200.FPuliW1p-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405101200.FPuliW1p-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/pci/pcie/tph.c: In function 'tph_msix_table_entry':
   drivers/pci/pcie/tph.c:95:22: error: 'struct pci_dev' has no member named 'msix_base'; did you mean 'msix_cap'?
      95 |         entry = dev->msix_base + msi_index * PCI_MSIX_ENTRY_SIZE;
         |                      ^~~~~~~~~
         |                      msix_cap
   drivers/pci/pcie/tph.c: In function 'invoke_dsm':
>> drivers/pci/pcie/tph.c:221:46: error: 'pci_acpi_dsm_guid' undeclared (first use in this function)
     221 |         out_obj = acpi_evaluate_dsm(handle, &pci_acpi_dsm_guid, MIN_ST_DSM_REV,
         |                                              ^~~~~~~~~~~~~~~~~
   drivers/pci/pcie/tph.c:221:46: note: each undeclared identifier is reported only once for each function it appears in


vim +/pci_acpi_dsm_guid +221 drivers/pci/pcie/tph.c

   196	
   197	#define MIN_ST_DSM_REV		7
   198	#define ST_DSM_FUNC_INDEX	0xf
   199	static bool invoke_dsm(acpi_handle handle, u32 cpu_uid, u8 ph,
   200			       u8 target_type, bool cache_ref_valid,
   201			       u64 cache_ref, union st_info *st_out)
   202	{
   203		union acpi_object in_obj, in_buf[3], *out_obj;
   204	
   205		in_buf[0].integer.type = ACPI_TYPE_INTEGER;
   206		in_buf[0].integer.value = 0; /* 0 => processor cache steering tags */
   207	
   208		in_buf[1].integer.type = ACPI_TYPE_INTEGER;
   209		in_buf[1].integer.value = cpu_uid;
   210	
   211		in_buf[2].integer.type = ACPI_TYPE_INTEGER;
   212		in_buf[2].integer.value = ph & 3;
   213		in_buf[2].integer.value |= (target_type & 1) << 2;
   214		in_buf[2].integer.value |= (cache_ref_valid & 1) << 3;
   215		in_buf[2].integer.value |= (cache_ref << 32);
   216	
   217		in_obj.type = ACPI_TYPE_PACKAGE;
   218		in_obj.package.count = ARRAY_SIZE(in_buf);
   219		in_obj.package.elements = in_buf;
   220	
 > 221		out_obj = acpi_evaluate_dsm(handle, &pci_acpi_dsm_guid, MIN_ST_DSM_REV,
   222					    ST_DSM_FUNC_INDEX, &in_obj);
   223	
   224		if (!out_obj)
   225			return false;
   226	
   227		if (out_obj->type != ACPI_TYPE_BUFFER) {
   228			pr_err("invalid return type %d from TPH _DSM\n",
   229			       out_obj->type);
   230			ACPI_FREE(out_obj);
   231			return false;
   232		}
   233	
   234		st_out->value = *((u64 *)(out_obj->buffer.pointer));
   235	
   236		ACPI_FREE(out_obj);
   237	
   238		return true;
   239	}
   240	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V1 6/9] PCI/TPH: Retrieve steering tag from ACPI _DSM
  2024-05-09 16:27 ` [PATCH V1 6/9] PCI/TPH: Retrieve steering tag from ACPI _DSM Wei Huang
  2024-05-10  4:20   ` kernel test robot
@ 2024-05-10  5:24   ` kernel test robot
  1 sibling, 0 replies; 25+ messages in thread
From: kernel test robot @ 2024-05-10  5:24 UTC (permalink / raw)
  To: Wei Huang, linux-pci, linux-kernel, linux-doc, netdev
  Cc: llvm, oe-kbuild-all, bhelgaas, corbet, davem, edumazet, kuba,
	pabeni, alex.williamson, gospo, michael.chan, ajit.khaparde,
	manoj.panicker2, Eric.VanTassell, wei.huang2

Hi Wei,

kernel test robot noticed the following build errors:

[auto build test ERROR on pci/for-linus]
[also build test ERROR on awilliam-vfio/next linus/master awilliam-vfio/for-linus v6.9-rc7 next-20240509]
[cannot apply to pci/next horms-ipvs/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Wei-Huang/PCI-Introduce-PCIe-TPH-support-framework/20240510-003504
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git for-linus
patch link:    https://lore.kernel.org/r/20240509162741.1937586-7-wei.huang2%40amd.com
patch subject: [PATCH V1 6/9] PCI/TPH: Retrieve steering tag from ACPI _DSM
config: s390-allmodconfig (https://download.01.org/0day-ci/archive/20240510/202405101330.7jDvJ4Jc-lkp@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project b910bebc300dafb30569cecc3017b446ea8eafa0)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240510/202405101330.7jDvJ4Jc-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405101330.7jDvJ4Jc-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/pci/pcie/tph.c:13:
   In file included from include/linux/acpi.h:14:
   In file included from include/linux/device.h:32:
   In file included from include/linux/device/driver.h:21:
   In file included from include/linux/module.h:19:
   In file included from include/linux/elf.h:6:
   In file included from arch/s390/include/asm/elf.h:173:
   In file included from arch/s390/include/asm/mmu_context.h:11:
   In file included from arch/s390/include/asm/pgalloc.h:18:
   In file included from include/linux/mm.h:2208:
   include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     508 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     509 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:515:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     515 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     516 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     522 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:527:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     527 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     528 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:536:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     536 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     537 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   In file included from drivers/pci/pcie/tph.c:17:
   In file included from include/linux/msi.h:27:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/s390/include/asm/io.h:78:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     547 |         val = __raw_readb(PCI_IOBASE + addr);
         |                           ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     560 |         val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
      37 | #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
         |                                                           ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
     102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
         |                                                      ^
   In file included from drivers/pci/pcie/tph.c:17:
   In file included from include/linux/msi.h:27:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/s390/include/asm/io.h:78:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     573 |         val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro '__le32_to_cpu'
      35 | #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
         |                                                           ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
     115 | #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
         |                                                      ^
   In file included from drivers/pci/pcie/tph.c:17:
   In file included from include/linux/msi.h:27:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/s390/include/asm/io.h:78:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     584 |         __raw_writeb(value, PCI_IOBASE + addr);
         |                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     594 |         __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     604 |         __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:692:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     692 |         readsb(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:700:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     700 |         readsw(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:708:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     708 |         readsl(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:717:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     717 |         writesb(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:726:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     726 |         writesw(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:735:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     735 |         writesl(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
>> drivers/pci/pcie/tph.c:221:39: error: use of undeclared identifier 'pci_acpi_dsm_guid'
     221 |         out_obj = acpi_evaluate_dsm(handle, &pci_acpi_dsm_guid, MIN_ST_DSM_REV,
         |                                              ^
   17 warnings and 1 error generated.


vim +/pci_acpi_dsm_guid +221 drivers/pci/pcie/tph.c

   196	
   197	#define MIN_ST_DSM_REV		7
   198	#define ST_DSM_FUNC_INDEX	0xf
   199	static bool invoke_dsm(acpi_handle handle, u32 cpu_uid, u8 ph,
   200			       u8 target_type, bool cache_ref_valid,
   201			       u64 cache_ref, union st_info *st_out)
   202	{
   203		union acpi_object in_obj, in_buf[3], *out_obj;
   204	
   205		in_buf[0].integer.type = ACPI_TYPE_INTEGER;
   206		in_buf[0].integer.value = 0; /* 0 => processor cache steering tags */
   207	
   208		in_buf[1].integer.type = ACPI_TYPE_INTEGER;
   209		in_buf[1].integer.value = cpu_uid;
   210	
   211		in_buf[2].integer.type = ACPI_TYPE_INTEGER;
   212		in_buf[2].integer.value = ph & 3;
   213		in_buf[2].integer.value |= (target_type & 1) << 2;
   214		in_buf[2].integer.value |= (cache_ref_valid & 1) << 3;
   215		in_buf[2].integer.value |= (cache_ref << 32);
   216	
   217		in_obj.type = ACPI_TYPE_PACKAGE;
   218		in_obj.package.count = ARRAY_SIZE(in_buf);
   219		in_obj.package.elements = in_buf;
   220	
 > 221		out_obj = acpi_evaluate_dsm(handle, &pci_acpi_dsm_guid, MIN_ST_DSM_REV,
   222					    ST_DSM_FUNC_INDEX, &in_obj);
   223	
   224		if (!out_obj)
   225			return false;
   226	
   227		if (out_obj->type != ACPI_TYPE_BUFFER) {
   228			pr_err("invalid return type %d from TPH _DSM\n",
   229			       out_obj->type);
   230			ACPI_FREE(out_obj);
   231			return false;
   232		}
   233	
   234		st_out->value = *((u64 *)(out_obj->buffer.pointer));
   235	
   236		ACPI_FREE(out_obj);
   237	
   238		return true;
   239	}
   240	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V1 8/9] bnxt_en: Add TPH support in BNXT driver
  2024-05-10  3:55     ` Ajit Khaparde
@ 2024-05-10 10:35       ` Vadim Fedorenko
  2024-05-10 15:23         ` Andy Gospodarek
  0 siblings, 1 reply; 25+ messages in thread
From: Vadim Fedorenko @ 2024-05-10 10:35 UTC (permalink / raw)
  To: Ajit Khaparde
  Cc: Wei Huang, linux-pci, linux-kernel, linux-doc, netdev, bhelgaas,
	corbet, davem, edumazet, kuba, pabeni, alex.williamson, gospo,
	michael.chan, manoj.panicker2, Eric.VanTassell

On 10.05.2024 04:55, Ajit Khaparde wrote:
> On Thu, May 9, 2024 at 2:50 PM Vadim Fedorenko
> <vadim.fedorenko@linux.dev> wrote:
>>
>> On 09/05/2024 17:27, Wei Huang wrote:
>>> From: Manoj Panicker <manoj.panicker2@amd.com>
>>>
>>> As a usage example, this patch implements TPH support in Broadcom BNXT
>>> device driver by invoking pcie_tph_set_st() function when interrupt
>>> affinity is changed.
>>>
>>> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
>>> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
>>> Reviewed-by: Wei Huang <wei.huang2@amd.com>
>>> Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com>
>>> ---
>>>    drivers/net/ethernet/broadcom/bnxt/bnxt.c | 51 +++++++++++++++++++++++
>>>    drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 ++
>>>    2 files changed, 55 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>> index 2c2ee79c4d77..be9c17566fb4 100644
>>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>> @@ -55,6 +55,7 @@
>>>    #include <net/page_pool/helpers.h>
>>>    #include <linux/align.h>
>>>    #include <net/netdev_queues.h>
>>> +#include <linux/pci-tph.h>
>>>
>>>    #include "bnxt_hsi.h"
>>>    #include "bnxt.h"
>>> @@ -10491,6 +10492,7 @@ static void bnxt_free_irq(struct bnxt *bp)
>>>                                free_cpumask_var(irq->cpu_mask);
>>>                                irq->have_cpumask = 0;
>>>                        }
>>> +                     irq_set_affinity_notifier(irq->vector, NULL);
>>>                        free_irq(irq->vector, bp->bnapi[i]);
>>>                }
>>>
>>> @@ -10498,6 +10500,45 @@ static void bnxt_free_irq(struct bnxt *bp)
>>>        }
>>>    }
>>>
>>> +static void bnxt_rtnl_lock_sp(struct bnxt *bp);
>>> +static void bnxt_rtnl_unlock_sp(struct bnxt *bp);
>>> +static void bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
>>> +                                  const cpumask_t *mask)
>>> +{
>>> +     struct bnxt_irq *irq;
>>> +
>>> +     irq = container_of(notify, struct bnxt_irq, affinity_notify);
>>> +     cpumask_copy(irq->cpu_mask, mask);
>>> +
>>> +     if (!pcie_tph_set_st(irq->bp->pdev, irq->msix_nr,
>>> +                          cpumask_first(irq->cpu_mask),
>>> +                          TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY))
>>> +             pr_err("error in configuring steering tag\n");
>>> +
>>> +     if (netif_running(irq->bp->dev)) {
>>> +             rtnl_lock();
>>> +             bnxt_close_nic(irq->bp, false, false);
>>> +             bnxt_open_nic(irq->bp, false, false);
>>> +             rtnl_unlock();
>>> +     }
>>
>> Is it really needed? It will cause link flap and pause in the traffic
>> service for the device. Why the device needs full restart in this case?
> 
> In that sequence only the rings are recreated for the hardware to sync
> up the tags.
> 
> Actually its not a full restart. There is no link reinit or other
> heavy lifting in this sequence.
> The pause in traffic may be momentary. Do IRQ/CPU affinities change frequently?
> Probably not?

 From what I can see in bnxt_en, proper validation of link_re_init parameter is
not (yet?) implemented, __bnxt_open_nic will unconditionally call 
netif_carrier_off() which will be treated as loss of carrier with counters
increment and proper events posted. Changes to CPU affinities were 
non-distruptive before the patch, but now it may break user-space assumptions.

Does FW need full rings re-init to update target value, which is one u32 write?
It looks like overkill TBH.

And yes, affinities can be change on fly according to the changes of the
workload on the host.

>>
>>
>>> +}
>>> +
>>> +static void bnxt_irq_affinity_release(struct kref __always_unused *ref)
>>> +{
>>> +}
>>> +
>>> +static inline void __bnxt_register_notify_irqchanges(struct bnxt_irq *irq)
>>
>> No inlines in .c files, please. Let compiler decide what to inline.
>>
>>> +{
>>> +     struct irq_affinity_notify *notify;
>>> +
>>> +     notify = &irq->affinity_notify;
>>> +     notify->irq = irq->vector;
>>> +     notify->notify = bnxt_irq_affinity_notify;
>>> +     notify->release = bnxt_irq_affinity_release;
>>> +
>>> +     irq_set_affinity_notifier(irq->vector, notify);
>>> +}
>>> +
>>>    static int bnxt_request_irq(struct bnxt *bp)
>>>    {
>>>        int i, j, rc = 0;
>>> @@ -10543,6 +10584,7 @@ static int bnxt_request_irq(struct bnxt *bp)
>>>                        int numa_node = dev_to_node(&bp->pdev->dev);
>>>
>>>                        irq->have_cpumask = 1;
>>> +                     irq->msix_nr = map_idx;
>>>                        cpumask_set_cpu(cpumask_local_spread(i, numa_node),
>>>                                        irq->cpu_mask);
>>>                        rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask);
>>> @@ -10552,6 +10594,15 @@ static int bnxt_request_irq(struct bnxt *bp)
>>>                                            irq->vector);
>>>                                break;
>>>                        }
>>> +
>>> +                     if (!pcie_tph_set_st(bp->pdev, i,
>>> +                                          cpumask_first(irq->cpu_mask),
>>> +                                          TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY)) {
>>> +                             netdev_err(bp->dev, "error in setting steering tag\n");
>>> +                     } else {
>>> +                             irq->bp = bp;
>>> +                             __bnxt_register_notify_irqchanges(irq);
>>> +                     }
>>>                }
>>>        }
>>>        return rc;
>>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>>> index dd849e715c9b..0d3442590bb4 100644
>>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>>> @@ -1195,6 +1195,10 @@ struct bnxt_irq {
>>>        u8              have_cpumask:1;
>>>        char            name[IFNAMSIZ + 2];
>>>        cpumask_var_t   cpu_mask;
>>> +
>>> +     int             msix_nr;
>>> +     struct bnxt     *bp;
>>> +     struct irq_affinity_notify affinity_notify;
>>>    };
>>>
>>>    #define HWRM_RING_ALLOC_TX  0x1
>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V1 8/9] bnxt_en: Add TPH support in BNXT driver
  2024-05-10 10:35       ` Vadim Fedorenko
@ 2024-05-10 15:23         ` Andy Gospodarek
  2024-05-10 20:03           ` David Wei
  2024-05-10 20:33           ` Vadim Fedorenko
  0 siblings, 2 replies; 25+ messages in thread
From: Andy Gospodarek @ 2024-05-10 15:23 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Ajit Khaparde, Wei Huang, linux-pci, linux-kernel, linux-doc,
	netdev, bhelgaas, corbet, davem, edumazet, kuba, pabeni,
	alex.williamson, michael.chan, manoj.panicker2, Eric.VanTassell

On Fri, May 10, 2024 at 11:35:35AM +0100, Vadim Fedorenko wrote:
> On 10.05.2024 04:55, Ajit Khaparde wrote:
> > On Thu, May 9, 2024 at 2:50 PM Vadim Fedorenko
> > <vadim.fedorenko@linux.dev> wrote:
> > > 
> > > On 09/05/2024 17:27, Wei Huang wrote:
> > > > From: Manoj Panicker <manoj.panicker2@amd.com>
> > > > 
> > > > As a usage example, this patch implements TPH support in Broadcom BNXT
> > > > device driver by invoking pcie_tph_set_st() function when interrupt
> > > > affinity is changed.
> > > > 
> > > > Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> > > > Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> > > > Reviewed-by: Wei Huang <wei.huang2@amd.com>
> > > > Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com>
> > > > ---
> > > >    drivers/net/ethernet/broadcom/bnxt/bnxt.c | 51 +++++++++++++++++++++++
> > > >    drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 ++
> > > >    2 files changed, 55 insertions(+)
> > > > 
> > > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > > index 2c2ee79c4d77..be9c17566fb4 100644
> > > > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > > @@ -55,6 +55,7 @@
> > > >    #include <net/page_pool/helpers.h>
> > > >    #include <linux/align.h>
> > > >    #include <net/netdev_queues.h>
> > > > +#include <linux/pci-tph.h>
> > > > 
> > > >    #include "bnxt_hsi.h"
> > > >    #include "bnxt.h"
> > > > @@ -10491,6 +10492,7 @@ static void bnxt_free_irq(struct bnxt *bp)
> > > >                                free_cpumask_var(irq->cpu_mask);
> > > >                                irq->have_cpumask = 0;
> > > >                        }
> > > > +                     irq_set_affinity_notifier(irq->vector, NULL);
> > > >                        free_irq(irq->vector, bp->bnapi[i]);
> > > >                }
> > > > 
> > > > @@ -10498,6 +10500,45 @@ static void bnxt_free_irq(struct bnxt *bp)
> > > >        }
> > > >    }
> > > > 
> > > > +static void bnxt_rtnl_lock_sp(struct bnxt *bp);
> > > > +static void bnxt_rtnl_unlock_sp(struct bnxt *bp);
> > > > +static void bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
> > > > +                                  const cpumask_t *mask)
> > > > +{
> > > > +     struct bnxt_irq *irq;
> > > > +
> > > > +     irq = container_of(notify, struct bnxt_irq, affinity_notify);
> > > > +     cpumask_copy(irq->cpu_mask, mask);
> > > > +
> > > > +     if (!pcie_tph_set_st(irq->bp->pdev, irq->msix_nr,
> > > > +                          cpumask_first(irq->cpu_mask),
> > > > +                          TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY))
> > > > +             pr_err("error in configuring steering tag\n");
> > > > +
> > > > +     if (netif_running(irq->bp->dev)) {
> > > > +             rtnl_lock();
> > > > +             bnxt_close_nic(irq->bp, false, false);
> > > > +             bnxt_open_nic(irq->bp, false, false);
> > > > +             rtnl_unlock();
> > > > +     }
> > > 
> > > Is it really needed? It will cause link flap and pause in the traffic
> > > service for the device. Why the device needs full restart in this case?
> > 
> > In that sequence only the rings are recreated for the hardware to sync
> > up the tags.
> > 
> > Actually its not a full restart. There is no link reinit or other
> > heavy lifting in this sequence.
> > The pause in traffic may be momentary. Do IRQ/CPU affinities change frequently?
> > Probably not?
> 
> From what I can see in bnxt_en, proper validation of link_re_init parameter is
> not (yet?) implemented, __bnxt_open_nic will unconditionally call
> netif_carrier_off() which will be treated as loss of carrier with counters
> increment and proper events posted. Changes to CPU affinities were
> non-disruptive before the patch, but now it may break user-space
> assumptions.

From my testing the link should not flap.  I just fired up a recent net-next
and confirmed the same by calling $ ethtool -G ens7f0np0 rx 1024 which does a
similar bnxt_close_nic(bp, false, false)/bnxt_open_nic(bp, false, false) as
this patch.  Link remained up -- even with a non-Broadocm link-partner.

> Does FW need full rings re-init to update target value, which is one u32 write?
> It looks like overkill TBH.

Full rings do not, but the initialization of that particular ring associated
with this irq does need to be done.  On my list of things we need to do in
bnxt_en is implement the new ndo_queue_stop/start and ndo_queue_mem_alloc/free
operations and once those are done we could make a switch as that may be less
disruptive.

> And yes, affinities can be change on fly according to the changes of the
> workload on the host.
> 
> > > 
> > > 
> > > > +}
> > > > +
> > > > +static void bnxt_irq_affinity_release(struct kref __always_unused *ref)
> > > > +{
> > > > +}
> > > > +
> > > > +static inline void __bnxt_register_notify_irqchanges(struct bnxt_irq *irq)
> > > 
> > > No inlines in .c files, please. Let compiler decide what to inline.
> > > 
> > > > +{
> > > > +     struct irq_affinity_notify *notify;
> > > > +
> > > > +     notify = &irq->affinity_notify;
> > > > +     notify->irq = irq->vector;
> > > > +     notify->notify = bnxt_irq_affinity_notify;
> > > > +     notify->release = bnxt_irq_affinity_release;
> > > > +
> > > > +     irq_set_affinity_notifier(irq->vector, notify);
> > > > +}
> > > > +
> > > >    static int bnxt_request_irq(struct bnxt *bp)
> > > >    {
> > > >        int i, j, rc = 0;
> > > > @@ -10543,6 +10584,7 @@ static int bnxt_request_irq(struct bnxt *bp)
> > > >                        int numa_node = dev_to_node(&bp->pdev->dev);
> > > > 
> > > >                        irq->have_cpumask = 1;
> > > > +                     irq->msix_nr = map_idx;
> > > >                        cpumask_set_cpu(cpumask_local_spread(i, numa_node),
> > > >                                        irq->cpu_mask);
> > > >                        rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask);
> > > > @@ -10552,6 +10594,15 @@ static int bnxt_request_irq(struct bnxt *bp)
> > > >                                            irq->vector);
> > > >                                break;
> > > >                        }
> > > > +
> > > > +                     if (!pcie_tph_set_st(bp->pdev, i,
> > > > +                                          cpumask_first(irq->cpu_mask),
> > > > +                                          TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY)) {
> > > > +                             netdev_err(bp->dev, "error in setting steering tag\n");
> > > > +                     } else {
> > > > +                             irq->bp = bp;
> > > > +                             __bnxt_register_notify_irqchanges(irq);
> > > > +                     }
> > > >                }
> > > >        }
> > > >        return rc;
> > > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> > > > index dd849e715c9b..0d3442590bb4 100644
> > > > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> > > > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> > > > @@ -1195,6 +1195,10 @@ struct bnxt_irq {
> > > >        u8              have_cpumask:1;
> > > >        char            name[IFNAMSIZ + 2];
> > > >        cpumask_var_t   cpu_mask;
> > > > +
> > > > +     int             msix_nr;
> > > > +     struct bnxt     *bp;
> > > > +     struct irq_affinity_notify affinity_notify;
> > > >    };
> > > > 
> > > >    #define HWRM_RING_ALLOC_TX  0x1
> > > 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V1 8/9] bnxt_en: Add TPH support in BNXT driver
  2024-05-10 15:23         ` Andy Gospodarek
@ 2024-05-10 20:03           ` David Wei
  2024-05-10 20:33             ` Andy Gospodarek
  2024-05-10 20:33           ` Vadim Fedorenko
  1 sibling, 1 reply; 25+ messages in thread
From: David Wei @ 2024-05-10 20:03 UTC (permalink / raw)
  To: Andy Gospodarek, Vadim Fedorenko
  Cc: Ajit Khaparde, Wei Huang, linux-pci, linux-kernel, linux-doc,
	netdev, bhelgaas, corbet, davem, edumazet, kuba, pabeni,
	alex.williamson, michael.chan, manoj.panicker2, Eric.VanTassell

On 2024-05-10 08:23, Andy Gospodarek wrote:
> On Fri, May 10, 2024 at 11:35:35AM +0100, Vadim Fedorenko wrote:
>> On 10.05.2024 04:55, Ajit Khaparde wrote:
>>> On Thu, May 9, 2024 at 2:50 PM Vadim Fedorenko
>>> <vadim.fedorenko@linux.dev> wrote:
>>>>
>>>> On 09/05/2024 17:27, Wei Huang wrote:
>>>>> From: Manoj Panicker <manoj.panicker2@amd.com>
>>>>>
>>>>> As a usage example, this patch implements TPH support in Broadcom BNXT
>>>>> device driver by invoking pcie_tph_set_st() function when interrupt
>>>>> affinity is changed.
>>>>>
>>>>> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
>>>>> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
>>>>> Reviewed-by: Wei Huang <wei.huang2@amd.com>
>>>>> Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com>
>>>>> ---
>>>>>    drivers/net/ethernet/broadcom/bnxt/bnxt.c | 51 +++++++++++++++++++++++
>>>>>    drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 ++
>>>>>    2 files changed, 55 insertions(+)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>>>> index 2c2ee79c4d77..be9c17566fb4 100644
>>>>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>>>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>>>> @@ -55,6 +55,7 @@
>>>>>    #include <net/page_pool/helpers.h>
>>>>>    #include <linux/align.h>
>>>>>    #include <net/netdev_queues.h>
>>>>> +#include <linux/pci-tph.h>
>>>>>
>>>>>    #include "bnxt_hsi.h"
>>>>>    #include "bnxt.h"
>>>>> @@ -10491,6 +10492,7 @@ static void bnxt_free_irq(struct bnxt *bp)
>>>>>                                free_cpumask_var(irq->cpu_mask);
>>>>>                                irq->have_cpumask = 0;
>>>>>                        }
>>>>> +                     irq_set_affinity_notifier(irq->vector, NULL);
>>>>>                        free_irq(irq->vector, bp->bnapi[i]);
>>>>>                }
>>>>>
>>>>> @@ -10498,6 +10500,45 @@ static void bnxt_free_irq(struct bnxt *bp)
>>>>>        }
>>>>>    }
>>>>>
>>>>> +static void bnxt_rtnl_lock_sp(struct bnxt *bp);
>>>>> +static void bnxt_rtnl_unlock_sp(struct bnxt *bp);
>>>>> +static void bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
>>>>> +                                  const cpumask_t *mask)
>>>>> +{
>>>>> +     struct bnxt_irq *irq;
>>>>> +
>>>>> +     irq = container_of(notify, struct bnxt_irq, affinity_notify);
>>>>> +     cpumask_copy(irq->cpu_mask, mask);
>>>>> +
>>>>> +     if (!pcie_tph_set_st(irq->bp->pdev, irq->msix_nr,
>>>>> +                          cpumask_first(irq->cpu_mask),
>>>>> +                          TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY))
>>>>> +             pr_err("error in configuring steering tag\n");
>>>>> +
>>>>> +     if (netif_running(irq->bp->dev)) {
>>>>> +             rtnl_lock();
>>>>> +             bnxt_close_nic(irq->bp, false, false);
>>>>> +             bnxt_open_nic(irq->bp, false, false);
>>>>> +             rtnl_unlock();
>>>>> +     }
>>>>
>>>> Is it really needed? It will cause link flap and pause in the traffic
>>>> service for the device. Why the device needs full restart in this case?
>>>
>>> In that sequence only the rings are recreated for the hardware to sync
>>> up the tags.
>>>
>>> Actually its not a full restart. There is no link reinit or other
>>> heavy lifting in this sequence.
>>> The pause in traffic may be momentary. Do IRQ/CPU affinities change frequently?
>>> Probably not?
>>
>> From what I can see in bnxt_en, proper validation of link_re_init parameter is
>> not (yet?) implemented, __bnxt_open_nic will unconditionally call
>> netif_carrier_off() which will be treated as loss of carrier with counters
>> increment and proper events posted. Changes to CPU affinities were
>> non-disruptive before the patch, but now it may break user-space
>> assumptions.
> 
> From my testing the link should not flap.  I just fired up a recent net-next
> and confirmed the same by calling $ ethtool -G ens7f0np0 rx 1024 which does a
> similar bnxt_close_nic(bp, false, false)/bnxt_open_nic(bp, false, false) as
> this patch.  Link remained up -- even with a non-Broadocm link-partner.
> 
>> Does FW need full rings re-init to update target value, which is one u32 write?
>> It looks like overkill TBH.
> 
> Full rings do not, but the initialization of that particular ring associated
> with this irq does need to be done.  On my list of things we need to do in
> bnxt_en is implement the new ndo_queue_stop/start and ndo_queue_mem_alloc/free
> operations and once those are done we could make a switch as that may be less
> disruptive.

Hi Andy, I have an implementation of the new ndo_queue_stop/start() API
[1] and would appreciate comments. I've been trying to test it but
without avail due to FW issues.

[1]: https://lore.kernel.org/netdev/20240502045410.3524155-1-dw@davidwei.uk/

> 
>> And yes, affinities can be change on fly according to the changes of the
>> workload on the host.
>>
>>>>
>>>>
>>>>> +}
>>>>> +
>>>>> +static void bnxt_irq_affinity_release(struct kref __always_unused *ref)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static inline void __bnxt_register_notify_irqchanges(struct bnxt_irq *irq)
>>>>
>>>> No inlines in .c files, please. Let compiler decide what to inline.
>>>>
>>>>> +{
>>>>> +     struct irq_affinity_notify *notify;
>>>>> +
>>>>> +     notify = &irq->affinity_notify;
>>>>> +     notify->irq = irq->vector;
>>>>> +     notify->notify = bnxt_irq_affinity_notify;
>>>>> +     notify->release = bnxt_irq_affinity_release;
>>>>> +
>>>>> +     irq_set_affinity_notifier(irq->vector, notify);
>>>>> +}
>>>>> +
>>>>>    static int bnxt_request_irq(struct bnxt *bp)
>>>>>    {
>>>>>        int i, j, rc = 0;
>>>>> @@ -10543,6 +10584,7 @@ static int bnxt_request_irq(struct bnxt *bp)
>>>>>                        int numa_node = dev_to_node(&bp->pdev->dev);
>>>>>
>>>>>                        irq->have_cpumask = 1;
>>>>> +                     irq->msix_nr = map_idx;
>>>>>                        cpumask_set_cpu(cpumask_local_spread(i, numa_node),
>>>>>                                        irq->cpu_mask);
>>>>>                        rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask);
>>>>> @@ -10552,6 +10594,15 @@ static int bnxt_request_irq(struct bnxt *bp)
>>>>>                                            irq->vector);
>>>>>                                break;
>>>>>                        }
>>>>> +
>>>>> +                     if (!pcie_tph_set_st(bp->pdev, i,
>>>>> +                                          cpumask_first(irq->cpu_mask),
>>>>> +                                          TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY)) {
>>>>> +                             netdev_err(bp->dev, "error in setting steering tag\n");
>>>>> +                     } else {
>>>>> +                             irq->bp = bp;
>>>>> +                             __bnxt_register_notify_irqchanges(irq);
>>>>> +                     }
>>>>>                }
>>>>>        }
>>>>>        return rc;
>>>>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>>>>> index dd849e715c9b..0d3442590bb4 100644
>>>>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>>>>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>>>>> @@ -1195,6 +1195,10 @@ struct bnxt_irq {
>>>>>        u8              have_cpumask:1;
>>>>>        char            name[IFNAMSIZ + 2];
>>>>>        cpumask_var_t   cpu_mask;
>>>>> +
>>>>> +     int             msix_nr;
>>>>> +     struct bnxt     *bp;
>>>>> +     struct irq_affinity_notify affinity_notify;
>>>>>    };
>>>>>
>>>>>    #define HWRM_RING_ALLOC_TX  0x1
>>>>
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V1 8/9] bnxt_en: Add TPH support in BNXT driver
  2024-05-10 20:03           ` David Wei
@ 2024-05-10 20:33             ` Andy Gospodarek
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Gospodarek @ 2024-05-10 20:33 UTC (permalink / raw)
  To: David Wei
  Cc: Andy Gospodarek, Vadim Fedorenko, Ajit Khaparde, Wei Huang,
	linux-pci, linux-kernel, linux-doc, netdev, bhelgaas, corbet,
	davem, edumazet, kuba, pabeni, alex.williamson, michael.chan,
	manoj.panicker2, Eric.VanTassell

On Fri, May 10, 2024 at 01:03:50PM -0700, David Wei wrote:
> On 2024-05-10 08:23, Andy Gospodarek wrote:
> > On Fri, May 10, 2024 at 11:35:35AM +0100, Vadim Fedorenko wrote:
> >> On 10.05.2024 04:55, Ajit Khaparde wrote:
> >>> On Thu, May 9, 2024 at 2:50 PM Vadim Fedorenko
> >>> <vadim.fedorenko@linux.dev> wrote:
> >>>>
> >>>> On 09/05/2024 17:27, Wei Huang wrote:
> >>>>> From: Manoj Panicker <manoj.panicker2@amd.com>
> >>>>>
> >>>>> As a usage example, this patch implements TPH support in Broadcom BNXT
> >>>>> device driver by invoking pcie_tph_set_st() function when interrupt
> >>>>> affinity is changed.
> >>>>>
> >>>>> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> >>>>> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> >>>>> Reviewed-by: Wei Huang <wei.huang2@amd.com>
> >>>>> Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com>
> >>>>> ---
> >>>>>    drivers/net/ethernet/broadcom/bnxt/bnxt.c | 51 +++++++++++++++++++++++
> >>>>>    drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 ++
> >>>>>    2 files changed, 55 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> >>>>> index 2c2ee79c4d77..be9c17566fb4 100644
> >>>>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> >>>>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> >>>>> @@ -55,6 +55,7 @@
> >>>>>    #include <net/page_pool/helpers.h>
> >>>>>    #include <linux/align.h>
> >>>>>    #include <net/netdev_queues.h>
> >>>>> +#include <linux/pci-tph.h>
> >>>>>
> >>>>>    #include "bnxt_hsi.h"
> >>>>>    #include "bnxt.h"
> >>>>> @@ -10491,6 +10492,7 @@ static void bnxt_free_irq(struct bnxt *bp)
> >>>>>                                free_cpumask_var(irq->cpu_mask);
> >>>>>                                irq->have_cpumask = 0;
> >>>>>                        }
> >>>>> +                     irq_set_affinity_notifier(irq->vector, NULL);
> >>>>>                        free_irq(irq->vector, bp->bnapi[i]);
> >>>>>                }
> >>>>>
> >>>>> @@ -10498,6 +10500,45 @@ static void bnxt_free_irq(struct bnxt *bp)
> >>>>>        }
> >>>>>    }
> >>>>>
> >>>>> +static void bnxt_rtnl_lock_sp(struct bnxt *bp);
> >>>>> +static void bnxt_rtnl_unlock_sp(struct bnxt *bp);
> >>>>> +static void bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
> >>>>> +                                  const cpumask_t *mask)
> >>>>> +{
> >>>>> +     struct bnxt_irq *irq;
> >>>>> +
> >>>>> +     irq = container_of(notify, struct bnxt_irq, affinity_notify);
> >>>>> +     cpumask_copy(irq->cpu_mask, mask);
> >>>>> +
> >>>>> +     if (!pcie_tph_set_st(irq->bp->pdev, irq->msix_nr,
> >>>>> +                          cpumask_first(irq->cpu_mask),
> >>>>> +                          TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY))
> >>>>> +             pr_err("error in configuring steering tag\n");
> >>>>> +
> >>>>> +     if (netif_running(irq->bp->dev)) {
> >>>>> +             rtnl_lock();
> >>>>> +             bnxt_close_nic(irq->bp, false, false);
> >>>>> +             bnxt_open_nic(irq->bp, false, false);
> >>>>> +             rtnl_unlock();
> >>>>> +     }
> >>>>
> >>>> Is it really needed? It will cause link flap and pause in the traffic
> >>>> service for the device. Why the device needs full restart in this case?
> >>>
> >>> In that sequence only the rings are recreated for the hardware to sync
> >>> up the tags.
> >>>
> >>> Actually its not a full restart. There is no link reinit or other
> >>> heavy lifting in this sequence.
> >>> The pause in traffic may be momentary. Do IRQ/CPU affinities change frequently?
> >>> Probably not?
> >>
> >> From what I can see in bnxt_en, proper validation of link_re_init parameter is
> >> not (yet?) implemented, __bnxt_open_nic will unconditionally call
> >> netif_carrier_off() which will be treated as loss of carrier with counters
> >> increment and proper events posted. Changes to CPU affinities were
> >> non-disruptive before the patch, but now it may break user-space
> >> assumptions.
> > 
> > From my testing the link should not flap.  I just fired up a recent net-next
> > and confirmed the same by calling $ ethtool -G ens7f0np0 rx 1024 which does a
> > similar bnxt_close_nic(bp, false, false)/bnxt_open_nic(bp, false, false) as
> > this patch.  Link remained up -- even with a non-Broadocm link-partner.
> > 
> >> Does FW need full rings re-init to update target value, which is one u32 write?
> >> It looks like overkill TBH.
> > 
> > Full rings do not, but the initialization of that particular ring associated
> > with this irq does need to be done.  On my list of things we need to do in
> > bnxt_en is implement the new ndo_queue_stop/start and ndo_queue_mem_alloc/free
> > operations and once those are done we could make a switch as that may be less
> > disruptive.
> 
> Hi Andy, I have an implementation of the new ndo_queue_stop/start() API
> [1] and would appreciate comments. I've been trying to test it but
> without avail due to FW issues.
> 
> [1]: https://lore.kernel.org/netdev/20240502045410.3524155-1-dw@davidwei.uk/
> 

David, I will take a look at those in more detail over the weekend or on Monday
(they are sitting in my inbox).

The overall structure looks good, but I do have at least one concern that is
related to what would need to be done in the hardware pipeline to be sure it is
safe to free packet buffers.

> > 
> >> And yes, affinities can be change on fly according to the changes of the
> >> workload on the host.
> >>
> >>>>
> >>>>
> >>>>> +}
> >>>>> +
> >>>>> +static void bnxt_irq_affinity_release(struct kref __always_unused *ref)
> >>>>> +{
> >>>>> +}
> >>>>> +
> >>>>> +static inline void __bnxt_register_notify_irqchanges(struct bnxt_irq *irq)
> >>>>
> >>>> No inlines in .c files, please. Let compiler decide what to inline.
> >>>>
> >>>>> +{
> >>>>> +     struct irq_affinity_notify *notify;
> >>>>> +
> >>>>> +     notify = &irq->affinity_notify;
> >>>>> +     notify->irq = irq->vector;
> >>>>> +     notify->notify = bnxt_irq_affinity_notify;
> >>>>> +     notify->release = bnxt_irq_affinity_release;
> >>>>> +
> >>>>> +     irq_set_affinity_notifier(irq->vector, notify);
> >>>>> +}
> >>>>> +
> >>>>>    static int bnxt_request_irq(struct bnxt *bp)
> >>>>>    {
> >>>>>        int i, j, rc = 0;
> >>>>> @@ -10543,6 +10584,7 @@ static int bnxt_request_irq(struct bnxt *bp)
> >>>>>                        int numa_node = dev_to_node(&bp->pdev->dev);
> >>>>>
> >>>>>                        irq->have_cpumask = 1;
> >>>>> +                     irq->msix_nr = map_idx;
> >>>>>                        cpumask_set_cpu(cpumask_local_spread(i, numa_node),
> >>>>>                                        irq->cpu_mask);
> >>>>>                        rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask);
> >>>>> @@ -10552,6 +10594,15 @@ static int bnxt_request_irq(struct bnxt *bp)
> >>>>>                                            irq->vector);
> >>>>>                                break;
> >>>>>                        }
> >>>>> +
> >>>>> +                     if (!pcie_tph_set_st(bp->pdev, i,
> >>>>> +                                          cpumask_first(irq->cpu_mask),
> >>>>> +                                          TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY)) {
> >>>>> +                             netdev_err(bp->dev, "error in setting steering tag\n");
> >>>>> +                     } else {
> >>>>> +                             irq->bp = bp;
> >>>>> +                             __bnxt_register_notify_irqchanges(irq);
> >>>>> +                     }
> >>>>>                }
> >>>>>        }
> >>>>>        return rc;
> >>>>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> >>>>> index dd849e715c9b..0d3442590bb4 100644
> >>>>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> >>>>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> >>>>> @@ -1195,6 +1195,10 @@ struct bnxt_irq {
> >>>>>        u8              have_cpumask:1;
> >>>>>        char            name[IFNAMSIZ + 2];
> >>>>>        cpumask_var_t   cpu_mask;
> >>>>> +
> >>>>> +     int             msix_nr;
> >>>>> +     struct bnxt     *bp;
> >>>>> +     struct irq_affinity_notify affinity_notify;
> >>>>>    };
> >>>>>
> >>>>>    #define HWRM_RING_ALLOC_TX  0x1
> >>>>
> >>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V1 8/9] bnxt_en: Add TPH support in BNXT driver
  2024-05-10 15:23         ` Andy Gospodarek
  2024-05-10 20:03           ` David Wei
@ 2024-05-10 20:33           ` Vadim Fedorenko
  2024-05-10 20:37             ` Andy Gospodarek
  1 sibling, 1 reply; 25+ messages in thread
From: Vadim Fedorenko @ 2024-05-10 20:33 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Ajit Khaparde, Wei Huang, linux-pci, linux-kernel, linux-doc,
	netdev, bhelgaas, corbet, davem, edumazet, kuba, pabeni,
	alex.williamson, michael.chan, manoj.panicker2, Eric.VanTassell

On 10/05/2024 16:23, Andy Gospodarek wrote:
> On Fri, May 10, 2024 at 11:35:35AM +0100, Vadim Fedorenko wrote:
>> On 10.05.2024 04:55, Ajit Khaparde wrote:
>>> On Thu, May 9, 2024 at 2:50 PM Vadim Fedorenko
>>> <vadim.fedorenko@linux.dev> wrote:
>>>>
>>>> On 09/05/2024 17:27, Wei Huang wrote:
>>>>> From: Manoj Panicker <manoj.panicker2@amd.com>
>>>>>
>>>>> As a usage example, this patch implements TPH support in Broadcom BNXT
>>>>> device driver by invoking pcie_tph_set_st() function when interrupt
>>>>> affinity is changed.
>>>>>
>>>>> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
>>>>> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
>>>>> Reviewed-by: Wei Huang <wei.huang2@amd.com>
>>>>> Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com>
>>>>> ---
>>>>>     drivers/net/ethernet/broadcom/bnxt/bnxt.c | 51 +++++++++++++++++++++++
>>>>>     drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 ++
>>>>>     2 files changed, 55 insertions(+)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>>>> index 2c2ee79c4d77..be9c17566fb4 100644
>>>>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>>>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>>>> @@ -55,6 +55,7 @@
>>>>>     #include <net/page_pool/helpers.h>
>>>>>     #include <linux/align.h>
>>>>>     #include <net/netdev_queues.h>
>>>>> +#include <linux/pci-tph.h>
>>>>>
>>>>>     #include "bnxt_hsi.h"
>>>>>     #include "bnxt.h"
>>>>> @@ -10491,6 +10492,7 @@ static void bnxt_free_irq(struct bnxt *bp)
>>>>>                                 free_cpumask_var(irq->cpu_mask);
>>>>>                                 irq->have_cpumask = 0;
>>>>>                         }
>>>>> +                     irq_set_affinity_notifier(irq->vector, NULL);
>>>>>                         free_irq(irq->vector, bp->bnapi[i]);
>>>>>                 }
>>>>>
>>>>> @@ -10498,6 +10500,45 @@ static void bnxt_free_irq(struct bnxt *bp)
>>>>>         }
>>>>>     }
>>>>>
>>>>> +static void bnxt_rtnl_lock_sp(struct bnxt *bp);
>>>>> +static void bnxt_rtnl_unlock_sp(struct bnxt *bp);
>>>>> +static void bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
>>>>> +                                  const cpumask_t *mask)
>>>>> +{
>>>>> +     struct bnxt_irq *irq;
>>>>> +
>>>>> +     irq = container_of(notify, struct bnxt_irq, affinity_notify);
>>>>> +     cpumask_copy(irq->cpu_mask, mask);
>>>>> +
>>>>> +     if (!pcie_tph_set_st(irq->bp->pdev, irq->msix_nr,
>>>>> +                          cpumask_first(irq->cpu_mask),
>>>>> +                          TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY))
>>>>> +             pr_err("error in configuring steering tag\n");
>>>>> +
>>>>> +     if (netif_running(irq->bp->dev)) {
>>>>> +             rtnl_lock();
>>>>> +             bnxt_close_nic(irq->bp, false, false);
>>>>> +             bnxt_open_nic(irq->bp, false, false);
>>>>> +             rtnl_unlock();
>>>>> +     }
>>>>
>>>> Is it really needed? It will cause link flap and pause in the traffic
>>>> service for the device. Why the device needs full restart in this case?
>>>
>>> In that sequence only the rings are recreated for the hardware to sync
>>> up the tags.
>>>
>>> Actually its not a full restart. There is no link reinit or other
>>> heavy lifting in this sequence.
>>> The pause in traffic may be momentary. Do IRQ/CPU affinities change frequently?
>>> Probably not?
>>
>>  From what I can see in bnxt_en, proper validation of link_re_init parameter is
>> not (yet?) implemented, __bnxt_open_nic will unconditionally call
>> netif_carrier_off() which will be treated as loss of carrier with counters
>> increment and proper events posted. Changes to CPU affinities were
>> non-disruptive before the patch, but now it may break user-space
>> assumptions.
> 
>  From my testing the link should not flap.  I just fired up a recent net-next
> and confirmed the same by calling $ ethtool -G ens7f0np0 rx 1024 which does a
> similar bnxt_close_nic(bp, false, false)/bnxt_open_nic(bp, false, false) as
> this patch.  Link remained up -- even with a non-Broadocm link-partner.

Hi Andy!

Well, it might be that from phy PoV the link didn't flap, but from 
network subsystem it does flap:

[root@host ~]# ethtool -G eth0 rx 512
[root@host ~]# cat /sys/class/net/eth0/carrier_changes
6
[root@host ~]# ethtool -G eth0 rx 1024
[root@host ~]# cat /sys/class/net/eth0/carrier_changes
8

And this is what I'm referring to when talking about user-space 
experience. But I would like to see new ndo_queue_stop/start 
implementation, it may help in this situation.

>> Does FW need full rings re-init to update target value, which is one u32 write?
>> It looks like overkill TBH.
> 
> Full rings do not, but the initialization of that particular ring associated
> with this irq does need to be done.  On my list of things we need to do in
> bnxt_en is implement the new ndo_queue_stop/start and ndo_queue_mem_alloc/free
> operations and once those are done we could make a switch as that may be less
> disruptive.
> 
>> And yes, affinities can be change on fly according to the changes of the
>> workload on the host.
>>
>>>>
>>>>
>>>>> +}
>>>>> +
>>>>> +static void bnxt_irq_affinity_release(struct kref __always_unused *ref)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static inline void __bnxt_register_notify_irqchanges(struct bnxt_irq *irq)
>>>>
>>>> No inlines in .c files, please. Let compiler decide what to inline.
>>>>
>>>>> +{
>>>>> +     struct irq_affinity_notify *notify;
>>>>> +
>>>>> +     notify = &irq->affinity_notify;
>>>>> +     notify->irq = irq->vector;
>>>>> +     notify->notify = bnxt_irq_affinity_notify;
>>>>> +     notify->release = bnxt_irq_affinity_release;
>>>>> +
>>>>> +     irq_set_affinity_notifier(irq->vector, notify);
>>>>> +}
>>>>> +
>>>>>     static int bnxt_request_irq(struct bnxt *bp)
>>>>>     {
>>>>>         int i, j, rc = 0;
>>>>> @@ -10543,6 +10584,7 @@ static int bnxt_request_irq(struct bnxt *bp)
>>>>>                         int numa_node = dev_to_node(&bp->pdev->dev);
>>>>>
>>>>>                         irq->have_cpumask = 1;
>>>>> +                     irq->msix_nr = map_idx;
>>>>>                         cpumask_set_cpu(cpumask_local_spread(i, numa_node),
>>>>>                                         irq->cpu_mask);
>>>>>                         rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask);
>>>>> @@ -10552,6 +10594,15 @@ static int bnxt_request_irq(struct bnxt *bp)
>>>>>                                             irq->vector);
>>>>>                                 break;
>>>>>                         }
>>>>> +
>>>>> +                     if (!pcie_tph_set_st(bp->pdev, i,
>>>>> +                                          cpumask_first(irq->cpu_mask),
>>>>> +                                          TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY)) {
>>>>> +                             netdev_err(bp->dev, "error in setting steering tag\n");
>>>>> +                     } else {
>>>>> +                             irq->bp = bp;
>>>>> +                             __bnxt_register_notify_irqchanges(irq);
>>>>> +                     }
>>>>>                 }
>>>>>         }
>>>>>         return rc;
>>>>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>>>>> index dd849e715c9b..0d3442590bb4 100644
>>>>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>>>>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>>>>> @@ -1195,6 +1195,10 @@ struct bnxt_irq {
>>>>>         u8              have_cpumask:1;
>>>>>         char            name[IFNAMSIZ + 2];
>>>>>         cpumask_var_t   cpu_mask;
>>>>> +
>>>>> +     int             msix_nr;
>>>>> +     struct bnxt     *bp;
>>>>> +     struct irq_affinity_notify affinity_notify;
>>>>>     };
>>>>>
>>>>>     #define HWRM_RING_ALLOC_TX  0x1
>>>>
>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V1 8/9] bnxt_en: Add TPH support in BNXT driver
  2024-05-10 20:33           ` Vadim Fedorenko
@ 2024-05-10 20:37             ` Andy Gospodarek
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Gospodarek @ 2024-05-10 20:37 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Andy Gospodarek, Ajit Khaparde, Wei Huang, linux-pci,
	linux-kernel, linux-doc, netdev, bhelgaas, corbet, davem,
	edumazet, kuba, pabeni, alex.williamson, michael.chan,
	manoj.panicker2, Eric.VanTassell

On Fri, May 10, 2024 at 09:33:46PM +0100, Vadim Fedorenko wrote:
> On 10/05/2024 16:23, Andy Gospodarek wrote:
> > On Fri, May 10, 2024 at 11:35:35AM +0100, Vadim Fedorenko wrote:
> > > On 10.05.2024 04:55, Ajit Khaparde wrote:
> > > > On Thu, May 9, 2024 at 2:50 PM Vadim Fedorenko
> > > > <vadim.fedorenko@linux.dev> wrote:
> > > > > 
> > > > > On 09/05/2024 17:27, Wei Huang wrote:
> > > > > > From: Manoj Panicker <manoj.panicker2@amd.com>
> > > > > > 
> > > > > > As a usage example, this patch implements TPH support in Broadcom BNXT
> > > > > > device driver by invoking pcie_tph_set_st() function when interrupt
> > > > > > affinity is changed.
> > > > > > 
> > > > > > Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> > > > > > Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> > > > > > Reviewed-by: Wei Huang <wei.huang2@amd.com>
> > > > > > Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com>
> > > > > > ---
> > > > > >     drivers/net/ethernet/broadcom/bnxt/bnxt.c | 51 +++++++++++++++++++++++
> > > > > >     drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 ++
> > > > > >     2 files changed, 55 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > > > > index 2c2ee79c4d77..be9c17566fb4 100644
> > > > > > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > > > > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > > > > @@ -55,6 +55,7 @@
> > > > > >     #include <net/page_pool/helpers.h>
> > > > > >     #include <linux/align.h>
> > > > > >     #include <net/netdev_queues.h>
> > > > > > +#include <linux/pci-tph.h>
> > > > > > 
> > > > > >     #include "bnxt_hsi.h"
> > > > > >     #include "bnxt.h"
> > > > > > @@ -10491,6 +10492,7 @@ static void bnxt_free_irq(struct bnxt *bp)
> > > > > >                                 free_cpumask_var(irq->cpu_mask);
> > > > > >                                 irq->have_cpumask = 0;
> > > > > >                         }
> > > > > > +                     irq_set_affinity_notifier(irq->vector, NULL);
> > > > > >                         free_irq(irq->vector, bp->bnapi[i]);
> > > > > >                 }
> > > > > > 
> > > > > > @@ -10498,6 +10500,45 @@ static void bnxt_free_irq(struct bnxt *bp)
> > > > > >         }
> > > > > >     }
> > > > > > 
> > > > > > +static void bnxt_rtnl_lock_sp(struct bnxt *bp);
> > > > > > +static void bnxt_rtnl_unlock_sp(struct bnxt *bp);
> > > > > > +static void bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
> > > > > > +                                  const cpumask_t *mask)
> > > > > > +{
> > > > > > +     struct bnxt_irq *irq;
> > > > > > +
> > > > > > +     irq = container_of(notify, struct bnxt_irq, affinity_notify);
> > > > > > +     cpumask_copy(irq->cpu_mask, mask);
> > > > > > +
> > > > > > +     if (!pcie_tph_set_st(irq->bp->pdev, irq->msix_nr,
> > > > > > +                          cpumask_first(irq->cpu_mask),
> > > > > > +                          TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY))
> > > > > > +             pr_err("error in configuring steering tag\n");
> > > > > > +
> > > > > > +     if (netif_running(irq->bp->dev)) {
> > > > > > +             rtnl_lock();
> > > > > > +             bnxt_close_nic(irq->bp, false, false);
> > > > > > +             bnxt_open_nic(irq->bp, false, false);
> > > > > > +             rtnl_unlock();
> > > > > > +     }
> > > > > 
> > > > > Is it really needed? It will cause link flap and pause in the traffic
> > > > > service for the device. Why the device needs full restart in this case?
> > > > 
> > > > In that sequence only the rings are recreated for the hardware to sync
> > > > up the tags.
> > > > 
> > > > Actually its not a full restart. There is no link reinit or other
> > > > heavy lifting in this sequence.
> > > > The pause in traffic may be momentary. Do IRQ/CPU affinities change frequently?
> > > > Probably not?
> > > 
> > >  From what I can see in bnxt_en, proper validation of link_re_init parameter is
> > > not (yet?) implemented, __bnxt_open_nic will unconditionally call
> > > netif_carrier_off() which will be treated as loss of carrier with counters
> > > increment and proper events posted. Changes to CPU affinities were
> > > non-disruptive before the patch, but now it may break user-space
> > > assumptions.
> > 
> >  From my testing the link should not flap.  I just fired up a recent net-next
> > and confirmed the same by calling $ ethtool -G ens7f0np0 rx 1024 which does a
> > similar bnxt_close_nic(bp, false, false)/bnxt_open_nic(bp, false, false) as
> > this patch.  Link remained up -- even with a non-Broadocm link-partner.
> 
> Hi Andy!
> 
> Well, it might be that from phy PoV the link didn't flap, but from network
> subsystem it does flap:
> 
> [root@host ~]# ethtool -G eth0 rx 512
> [root@host ~]# cat /sys/class/net/eth0/carrier_changes
> 6
> [root@host ~]# ethtool -G eth0 rx 1024
> [root@host ~]# cat /sys/class/net/eth0/carrier_changes
> 8

Fair point :)

I also think that we should skip doing the bnxt_close_nic/bnxt_open_nic
except when TPH is possible on the system.  That should mitigate some of
these concerns for users who do not have it enabled.

> And this is what I'm referring to when talking about user-space experience.
> But I would like to see new ndo_queue_stop/start implementation, it may help
> in this situation.
> 
> > > Does FW need full rings re-init to update target value, which is one u32 write?
> > > It looks like overkill TBH.
> > 
> > Full rings do not, but the initialization of that particular ring associated
> > with this irq does need to be done.  On my list of things we need to do in
> > bnxt_en is implement the new ndo_queue_stop/start and ndo_queue_mem_alloc/free
> > operations and once those are done we could make a switch as that may be less
> > disruptive.
> > 
> > > And yes, affinities can be change on fly according to the changes of the
> > > workload on the host.
> > > 
> > > > > 
> > > > > 
> > > > > > +}
> > > > > > +
> > > > > > +static void bnxt_irq_affinity_release(struct kref __always_unused *ref)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +static inline void __bnxt_register_notify_irqchanges(struct bnxt_irq *irq)
> > > > > 
> > > > > No inlines in .c files, please. Let compiler decide what to inline.
> > > > > 
> > > > > > +{
> > > > > > +     struct irq_affinity_notify *notify;
> > > > > > +
> > > > > > +     notify = &irq->affinity_notify;
> > > > > > +     notify->irq = irq->vector;
> > > > > > +     notify->notify = bnxt_irq_affinity_notify;
> > > > > > +     notify->release = bnxt_irq_affinity_release;
> > > > > > +
> > > > > > +     irq_set_affinity_notifier(irq->vector, notify);
> > > > > > +}
> > > > > > +
> > > > > >     static int bnxt_request_irq(struct bnxt *bp)
> > > > > >     {
> > > > > >         int i, j, rc = 0;
> > > > > > @@ -10543,6 +10584,7 @@ static int bnxt_request_irq(struct bnxt *bp)
> > > > > >                         int numa_node = dev_to_node(&bp->pdev->dev);
> > > > > > 
> > > > > >                         irq->have_cpumask = 1;
> > > > > > +                     irq->msix_nr = map_idx;
> > > > > >                         cpumask_set_cpu(cpumask_local_spread(i, numa_node),
> > > > > >                                         irq->cpu_mask);
> > > > > >                         rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask);
> > > > > > @@ -10552,6 +10594,15 @@ static int bnxt_request_irq(struct bnxt *bp)
> > > > > >                                             irq->vector);
> > > > > >                                 break;
> > > > > >                         }
> > > > > > +
> > > > > > +                     if (!pcie_tph_set_st(bp->pdev, i,
> > > > > > +                                          cpumask_first(irq->cpu_mask),
> > > > > > +                                          TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY)) {
> > > > > > +                             netdev_err(bp->dev, "error in setting steering tag\n");
> > > > > > +                     } else {
> > > > > > +                             irq->bp = bp;
> > > > > > +                             __bnxt_register_notify_irqchanges(irq);
> > > > > > +                     }
> > > > > >                 }
> > > > > >         }
> > > > > >         return rc;
> > > > > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> > > > > > index dd849e715c9b..0d3442590bb4 100644
> > > > > > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> > > > > > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> > > > > > @@ -1195,6 +1195,10 @@ struct bnxt_irq {
> > > > > >         u8              have_cpumask:1;
> > > > > >         char            name[IFNAMSIZ + 2];
> > > > > >         cpumask_var_t   cpu_mask;
> > > > > > +
> > > > > > +     int             msix_nr;
> > > > > > +     struct bnxt     *bp;
> > > > > > +     struct irq_affinity_notify affinity_notify;
> > > > > >     };
> > > > > > 
> > > > > >     #define HWRM_RING_ALLOC_TX  0x1
> > > > > 
> > > 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V1 5/9] PCI/TPH: Introduce API functions to get/set steering tags
  2024-05-09 16:27 ` [PATCH V1 5/9] PCI/TPH: Introduce API functions to get/set steering tags Wei Huang
  2024-05-10  3:07   ` kernel test robot
@ 2024-05-11 20:15   ` Simon Horman
  2024-05-13 13:29     ` Wei Huang
  1 sibling, 1 reply; 25+ messages in thread
From: Simon Horman @ 2024-05-11 20:15 UTC (permalink / raw)
  To: Wei Huang
  Cc: linux-pci, linux-kernel, linux-doc, netdev, bhelgaas, corbet,
	davem, edumazet, kuba, pabeni, alex.williamson, gospo,
	michael.chan, ajit.khaparde, manoj.panicker2, Eric.VanTassell

On Thu, May 09, 2024 at 11:27:37AM -0500, Wei Huang wrote:
> This patch introduces two API functions, pcie_tph_get_st() and
> pcie_tph_set_st(), for a driver to retrieve or configure device's
> steering tags. There are two possible locations for steering tag
> table and the code automatically figure out the right location to
> set the tags if pcie_tph_set_st() is called. Note the tag value is
> always zero currently and will be extended in the follow-up patches.
> 
> Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
> Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
> Signed-off-by: Wei Huang <wei.huang2@amd.com>

Hi Eric and Wei,

I noticed a few minor problems flagged by Sparse
which I'd like to bring to your attention.

> ---
>  drivers/pci/pcie/tph.c  | 383 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/pci-tph.h |  19 ++
>  2 files changed, 402 insertions(+)
> 
> diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c

...

> +/*
> + * For a given device, return a pointer to the MSI table entry at msi_index.
> + */
> +static void __iomem *tph_msix_table_entry(struct pci_dev *dev,
> +					  __le16 msi_index)
> +{
> +	void *entry;
> +	u16 tbl_sz;
> +	int ret;
> +
> +	ret = tph_get_table_size(dev, &tbl_sz);
> +	if (ret || msi_index > tbl_sz)

While tbl_sz is a host-byte order integer value, msi_index is little endian.
So maths operations involving the latter doesn't seem right.

Flagged by Sparse.

> +		return NULL;
> +
> +	entry = dev->msix_base + msi_index * PCI_MSIX_ENTRY_SIZE;

Likewise, there seem to be endian problems here here.

Also, entry is used on the line above and below in a way
where an __iomem annotation is expected, but entry doesn't have one.

Also flagged by Sparse.

> +
> +	return entry;
> +}

...

> +/* Write the steering tag to MSI-X vector control register */
> +static void tph_write_tag_to_msix(struct pci_dev *dev, int msix_nr, u16 tag)
> +{
> +	u32 val;
> +	void __iomem *vec_ctrl;
> +	struct msi_desc *msi_desc;
> +
> +	msi_desc = tph_msix_index_to_desc(dev, msix_nr);
> +	if (!msi_desc) {
> +		pr_err("MSI-X descriptor for #%d not found\n", msix_nr);
> +		return;
> +	}
> +
> +	vec_ctrl = tph_msix_vector_control(dev, msi_desc->msi_index);

According to Sparse, the type of msi_desc->msi_index is unsigned short.
But tph_msix_vector_control expects it's second argument to be __le16.

> +
> +	val = readl(vec_ctrl);
> +	val &= 0xffff;
> +	val |= (tag << 16);
> +	writel(val, vec_ctrl);
> +
> +	/* read back to flush the update */
> +	val = readl(vec_ctrl);
> +	msi_unlock_descs(&dev->dev);
> +}

...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V1 5/9] PCI/TPH: Introduce API functions to get/set steering tags
  2024-05-11 20:15   ` Simon Horman
@ 2024-05-13 13:29     ` Wei Huang
  0 siblings, 0 replies; 25+ messages in thread
From: Wei Huang @ 2024-05-13 13:29 UTC (permalink / raw)
  To: Simon Horman
  Cc: linux-pci, linux-kernel, linux-doc, netdev, bhelgaas, corbet,
	davem, edumazet, kuba, pabeni, alex.williamson, gospo,
	michael.chan, ajit.khaparde, manoj.panicker2, Eric.VanTassell



On 5/11/24 15:15, Simon Horman wrote:
> On Thu, May 09, 2024 at 11:27:37AM -0500, Wei Huang wrote:
>> This patch introduces two API functions, pcie_tph_get_st() and
>> pcie_tph_set_st(), for a driver to retrieve or configure device's
>> steering tags. There are two possible locations for steering tag
>> table and the code automatically figure out the right location to
>> set the tags if pcie_tph_set_st() is called. Note the tag value is
>> always zero currently and will be extended in the follow-up patches.
>>
>> Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
>> Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
>> Signed-off-by: Wei Huang <wei.huang2@amd.com>
> 
> Hi Eric and Wei,
> 
> I noticed a few minor problems flagged by Sparse
> which I'd like to bring to your attention.
> 
>> ---
>>  drivers/pci/pcie/tph.c  | 383 ++++++++++++++++++++++++++++++++++++++++
>>  include/linux/pci-tph.h |  19 ++
>>  2 files changed, 402 insertions(+)
>>
>> diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
> 
> ...
> 
>> +/*
>> + * For a given device, return a pointer to the MSI table entry at msi_index.
>> + */
>> +static void __iomem *tph_msix_table_entry(struct pci_dev *dev,
>> +					  __le16 msi_index)
>> +{
>> +	void *entry;
>> +	u16 tbl_sz;
>> +	int ret;
>> +
>> +	ret = tph_get_table_size(dev, &tbl_sz);
>> +	if (ret || msi_index > tbl_sz)
> 
> While tbl_sz is a host-byte order integer value, msi_index is little endian.
> So maths operations involving the latter doesn't seem right.

Thanks, will take care of it in the next revision.

> 
> Flagged by Sparse.
> 
>> +		return NULL;
>> +
>> +	entry = dev->msix_base + msi_index * PCI_MSIX_ENTRY_SIZE;
> 
> Likewise, there seem to be endian problems here here.
> 
> Also, entry is used on the line above and below in a way
> where an __iomem annotation is expected, but entry doesn't have one.
> 
> Also flagged by Sparse.

Will fix

> 
>> +
>> +	return entry;
>> +}
> 
> ...
> 
>> +/* Write the steering tag to MSI-X vector control register */
>> +static void tph_write_tag_to_msix(struct pci_dev *dev, int msix_nr, u16 tag)
>> +{
>> +	u32 val;
>> +	void __iomem *vec_ctrl;
>> +	struct msi_desc *msi_desc;
>> +
>> +	msi_desc = tph_msix_index_to_desc(dev, msix_nr);
>> +	if (!msi_desc) {
>> +		pr_err("MSI-X descriptor for #%d not found\n", msix_nr);
>> +		return;
>> +	}
>> +
>> +	vec_ctrl = tph_msix_vector_control(dev, msi_desc->msi_index);
> 
> According to Sparse, the type of msi_desc->msi_index is unsigned short.
> But tph_msix_vector_control expects it's second argument to be __le16.

Will fix

> 
>> +
>> +	val = readl(vec_ctrl);
>> +	val &= 0xffff;
>> +	val |= (tag << 16);
>> +	writel(val, vec_ctrl);
>> +
>> +	/* read back to flush the update */
>> +	val = readl(vec_ctrl);
>> +	msi_unlock_descs(&dev->dev);
>> +}
> 
> ...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V1 7/9] PCI/TPH: Add TPH documentation
  2024-05-09 16:27 ` [PATCH V1 7/9] PCI/TPH: Add TPH documentation Wei Huang
@ 2024-05-15 12:11   ` Bagas Sanjaya
  0 siblings, 0 replies; 25+ messages in thread
From: Bagas Sanjaya @ 2024-05-15 12:11 UTC (permalink / raw)
  To: Wei Huang, linux-pci, linux-kernel, linux-doc, netdev
  Cc: bhelgaas, corbet, davem, edumazet, kuba, pabeni, alex.williamson,
	gospo, michael.chan, ajit.khaparde, manoj.panicker2,
	Eric.VanTassell

[-- Attachment #1: Type: text/plain, Size: 832 bytes --]

On Thu, May 09, 2024 at 11:27:39AM -0500, Wei Huang wrote:
> +:Copyright: |copy| 2024 Advanced Micro Devices, Inc.
> +:Authors: - Eric van Tassell <eric.vantassell@amd.com>
> +          - Wei Huang <wei.huang2@amd.com>

You can directly embed copyright symbol without having to pull in <isonum.txt>:

---- >8 ----
diff --git a/Documentation/PCI/tph.rst b/Documentation/PCI/tph.rst
index ea9c8313f3e4f8..d7043fb0b71b3a 100644
--- a/Documentation/PCI/tph.rst
+++ b/Documentation/PCI/tph.rst
@@ -5,7 +5,7 @@ TPH Support
 ===========
 
 
-:Copyright: |copy| 2024 Advanced Micro Devices, Inc.
+:Copyright: © 2024 Advanced Micro Devices, Inc.
 :Authors: - Eric van Tassell <eric.vantassell@amd.com>
           - Wei Huang <wei.huang2@amd.com>
 
Thanks.

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2024-05-15 12:11 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-09 16:27 [PATCH V1 0/9] PCIe TPH and cache direct injection support Wei Huang
2024-05-09 16:27 ` [PATCH V1 1/9] PCI: Introduce PCIe TPH support framework Wei Huang
2024-05-09 16:27 ` [PATCH V1 2/9] PCI: Add TPH related register definition Wei Huang
2024-05-09 16:27 ` [PATCH V1 3/9] PCI/TPH: Implement a command line option to disable TPH Wei Huang
2024-05-09 16:27 ` [PATCH V1 4/9] PCI/TPH: Implement a command line option to force No ST Mode Wei Huang
2024-05-09 16:27 ` [PATCH V1 5/9] PCI/TPH: Introduce API functions to get/set steering tags Wei Huang
2024-05-10  3:07   ` kernel test robot
2024-05-11 20:15   ` Simon Horman
2024-05-13 13:29     ` Wei Huang
2024-05-09 16:27 ` [PATCH V1 6/9] PCI/TPH: Retrieve steering tag from ACPI _DSM Wei Huang
2024-05-10  4:20   ` kernel test robot
2024-05-10  5:24   ` kernel test robot
2024-05-09 16:27 ` [PATCH V1 7/9] PCI/TPH: Add TPH documentation Wei Huang
2024-05-15 12:11   ` Bagas Sanjaya
2024-05-09 16:27 ` [PATCH V1 8/9] bnxt_en: Add TPH support in BNXT driver Wei Huang
2024-05-09 21:50   ` Vadim Fedorenko
2024-05-10  3:55     ` Ajit Khaparde
2024-05-10 10:35       ` Vadim Fedorenko
2024-05-10 15:23         ` Andy Gospodarek
2024-05-10 20:03           ` David Wei
2024-05-10 20:33             ` Andy Gospodarek
2024-05-10 20:33           ` Vadim Fedorenko
2024-05-10 20:37             ` Andy Gospodarek
2024-05-10  3:10   ` Somnath Kotur
2024-05-09 16:27 ` [PATCH V1 9/9] bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG rings Wei Huang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).