linux-pci.vger.kernel.org archive mirror
* [PATCH 0/4] Expose and manage PCI device reset
@ 2021-03-12 17:34 ameynarkhede03
  2021-03-12 17:34 ` [PATCH 1/4] PCI: Refactor pcie_flr to follow calling convention of other reset methods ameynarkhede03
                   ` (5 more replies)
  0 siblings, 6 replies; 90+ messages in thread
From: ameynarkhede03 @ 2021-03-12 17:34 UTC (permalink / raw)
  To: bhelgaas
  Cc: linux-pci, linux-kernel, alex.williamson, raphael.norwitz, Amey Narkhede

From: Amey Narkhede <ameynarkhede03@gmail.com>

PCI and PCIe devices may support a number of reset mechanisms, for
example Function Level Reset (FLR) provided via the Advanced Features or
PCIe capabilities, Power Management reset, bus reset, or device-specific
reset. Currently the PCI subsystem applies a fixed policy prioritizing
these reset methods, which provides neither visibility nor control to
userspace.

Expose the reset methods available per device to userspace via sysfs,
and allow an administrative user or device owner to manage per-device
reset method priorities or exclusions. This feature aims to allow
greater control of a device for use cases such as device assignment,
where specific device or platform issues may interact poorly with a
given reset method and for which device-specific quirks have not been
developed.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>

Amey Narkhede (4):
  PCI: Refactor pcie_flr to follow calling convention of other reset
    methods
  PCI: Add new bitmap for keeping track of supported reset mechanisms
  PCI: Remove reset_fn field from pci_dev
  PCI/sysfs: Allow userspace to query and set device reset mechanism

 Documentation/ABI/testing/sysfs-bus-pci       |  15 ++
 drivers/crypto/cavium/nitrox/nitrox_main.c    |   4 +-
 drivers/crypto/qat/qat_common/adf_aer.c       |   2 +-
 drivers/infiniband/hw/hfi1/chip.c             |   4 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   2 +-
 .../ethernet/cavium/liquidio/lio_vf_main.c    |   4 +-
 .../ethernet/cavium/liquidio/octeon_mailbox.c |   2 +-
 drivers/net/ethernet/freescale/enetc/enetc.c  |   2 +-
 .../ethernet/freescale/enetc/enetc_pci_mdio.c |   2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   4 +-
 drivers/pci/pci-sysfs.c                       |  68 +++++++-
 drivers/pci/pci.c                             | 160 ++++++++++--------
 drivers/pci/pci.h                             |  11 +-
 drivers/pci/pcie/aer.c                        |  12 +-
 drivers/pci/probe.c                           |   4 +-
 drivers/pci/quirks.c                          |  17 +-
 include/linux/pci.h                           |  17 +-
 17 files changed, 213 insertions(+), 117 deletions(-)

--
2.30.2


* [PATCH 1/4] PCI: Refactor pcie_flr to follow calling convention of other reset methods
  2021-03-12 17:34 [PATCH 0/4] Expose and manage PCI device reset ameynarkhede03
@ 2021-03-12 17:34 ` ameynarkhede03
  2021-03-12 17:34 ` [PATCH 2/4] PCI: Add new bitmap for keeping track of supported reset mechanisms ameynarkhede03
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 90+ messages in thread
From: ameynarkhede03 @ 2021-03-12 17:34 UTC (permalink / raw)
  To: bhelgaas
  Cc: linux-pci, linux-kernel, alex.williamson, raphael.norwitz, Amey Narkhede

From: Amey Narkhede <ameynarkhede03@gmail.com>

Currently a separate function, pcie_has_flr(), probes whether the
device supports PCIe FLR. This does not match the calling convention
of the other reset methods, which take a second argument that decides
whether to only probe or to actually reset. Refactor pcie_flr() to
follow that convention and remove the now superfluous pcie_has_flr().
Keep the unconditional reset as pcie_reset_flr() for callers that must
issue an FLR without checking any flags or DEVCAP.
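
For readers unfamiliar with the probe convention, here is a minimal
userspace sketch of it. It is not kernel code: struct mock_dev, its
fields and the mock_* function names are hypothetical stand-ins chosen
for illustration, and only the return-value contract (-ENOTTY when the
method does not apply, 0 on a successful probe or reset) is taken from
this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; not the kernel structure. */
struct mock_dev {
	bool no_flr_quirk;	/* plays the role of PCI_DEV_FLAGS_NO_FLR_RESET */
	bool devcap_flr;	/* plays the role of the PCI_EXP_DEVCAP_FLR bit */
	int resets_done;	/* counts real resets issued on this mock device */
};

/* Unconditional reset, analogous in role to pcie_reset_flr(). */
static int mock_reset_flr(struct mock_dev *dev)
{
	assert(dev != NULL);
	dev->resets_done++;
	return 0;
}

/*
 * Probe-aware reset, analogous in shape to pcie_flr(dev, probe):
 *   -ENOTTY	the method does not apply to this device
 *   0		probe != 0: the method is usable
 *		probe == 0: the reset was performed
 */
static int mock_flr(struct mock_dev *dev, int probe)
{
	if (dev->no_flr_quirk)
		return -ENOTTY;

	if (!dev->devcap_flr)
		return -ENOTTY;

	if (probe)
		return 0;

	return mock_reset_flr(dev);
}
```

With this shape a caller no longer needs a separate "has" helper:
mock_flr(dev, 1) answers whether the method is usable without touching
the device, and mock_flr(dev, 0) performs the same checks before
resetting.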

Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
---
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>

 drivers/crypto/cavium/nitrox/nitrox_main.c    |  4 +-
 drivers/crypto/qat/qat_common/adf_aer.c       |  2 +-
 drivers/infiniband/hw/hfi1/chip.c             |  4 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  2 +-
 .../ethernet/cavium/liquidio/lio_vf_main.c    |  2 +-
 .../ethernet/cavium/liquidio/octeon_mailbox.c |  2 +-
 drivers/net/ethernet/freescale/enetc/enetc.c  |  2 +-
 .../ethernet/freescale/enetc/enetc_pci_mdio.c |  2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  4 +-
 drivers/pci/pci.c                             | 65 ++++++++++---------
 drivers/pci/pcie/aer.c                        | 12 ++--
 drivers/pci/quirks.c                          | 15 ++---
 include/linux/pci.h                           |  4 +-
 13 files changed, 58 insertions(+), 62 deletions(-)

diff --git a/drivers/crypto/cavium/nitrox/nitrox_main.c b/drivers/crypto/cavium/nitrox/nitrox_main.c
index facc8e6bc..dbf9499f4 100644
--- a/drivers/crypto/cavium/nitrox/nitrox_main.c
+++ b/drivers/crypto/cavium/nitrox/nitrox_main.c
@@ -306,9 +306,7 @@ static int nitrox_device_flr(struct pci_dev *pdev)
 		return -ENOMEM;
 	}

-	/* check flr support */
-	if (pcie_has_flr(pdev))
-		pcie_flr(pdev);
+	pcie_flr(pdev, 0);

 	pci_restore_state(pdev);

diff --git a/drivers/crypto/qat/qat_common/adf_aer.c b/drivers/crypto/qat/qat_common/adf_aer.c
index d2ae293d0..7716a6b8b 100644
--- a/drivers/crypto/qat/qat_common/adf_aer.c
+++ b/drivers/crypto/qat/qat_common/adf_aer.c
@@ -65,7 +65,7 @@ EXPORT_SYMBOL_GPL(adf_reset_sbr);

 void adf_reset_flr(struct adf_accel_dev *accel_dev)
 {
-	pcie_flr(accel_to_pci_dev(accel_dev));
+	pcie_flr(accel_to_pci_dev(accel_dev), 0);
 }
 EXPORT_SYMBOL_GPL(adf_reset_flr);

diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index 993cbf37e..b2cc0dd9b 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -14099,7 +14099,7 @@ static int init_chip(struct hfi1_devdata *dd)
 		dd_dev_info(dd, "Resetting CSRs with FLR\n");

 		/* do the FLR, the DC reset will remain */
-		pcie_flr(dd->pcidev);
+		pcie_flr(dd->pcidev, 0);

 		/* restore command and BARs */
 		ret = restore_pci_variables(dd);
@@ -14111,7 +14111,7 @@ static int init_chip(struct hfi1_devdata *dd)

 		if (is_ax(dd)) {
 			dd_dev_info(dd, "Resetting CSRs with FLR\n");
-			pcie_flr(dd->pcidev);
+			pcie_flr(dd->pcidev, 0);
 			ret = restore_pci_variables(dd);
 			if (ret) {
 				dd_dev_err(dd, "%s: Could not restore PCI variables\n",
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index a680fd9c6..dd2b539c7 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -12750,7 +12750,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	 */
 	if (is_kdump_kernel()) {
 		pci_clear_master(pdev);
-		pcie_flr(pdev);
+		pcie_flr(pdev, 0);
 	}

 	max_irqs = bnxt_get_max_irq(pdev);
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 516f166ce..9b9d305c6 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -429,7 +429,7 @@ static void octeon_pci_flr(struct octeon_device *oct)
 	pci_write_config_word(oct->pci_dev, PCI_COMMAND,
 			      PCI_COMMAND_INTX_DISABLE);

-	pcie_flr(oct->pci_dev);
+	pcie_flr(oct->pci_dev, 0);

 	pci_cfg_access_unlock(oct->pci_dev);

diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
index ad685f5d0..ed9e68a4b 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
@@ -260,7 +260,7 @@ static int octeon_mbox_process_cmd(struct octeon_mbox *mbox,
 		dev_info(&oct->pci_dev->dev,
 			 "got a request for FLR from VF that owns DPI ring %u\n",
 			 mbox->q_no);
-		pcie_flr(oct->sriov_info.dpiring_to_vfpcidev_lut[mbox->q_no]);
+		pcie_flr(oct->sriov_info.dpiring_to_vfpcidev_lut[mbox->q_no], 0);
 		break;

 	case OCTEON_PF_CHANGED_VF_MACADDR:
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index c78d12229..8fb11c63c 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1895,7 +1895,7 @@ int enetc_pci_probe(struct pci_dev *pdev, const char *name, int sizeof_priv)
 	size_t alloc_size;
 	int err, len;

-	pcie_flr(pdev);
+	pcie_flr(pdev, 0);
 	err = pci_enable_device_mem(pdev);
 	if (err) {
 		dev_err(&pdev->dev, "device enable failed\n");
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pci_mdio.c b/drivers/net/ethernet/freescale/enetc/enetc_pci_mdio.c
index 15f37c5b8..7cd6bf124 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pci_mdio.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pci_mdio.c
@@ -47,7 +47,7 @@ static int enetc_pci_mdio_probe(struct pci_dev *pdev,
 	mdio_priv->mdio_base = ENETC_EMDIO_BASE;
 	snprintf(bus->id, MII_BUS_ID_SIZE, "%s", dev_name(dev));

-	pcie_flr(pdev);
+	pcie_flr(pdev, 0);
 	err = pci_enable_device_mem(pdev);
 	if (err) {
 		dev_err(dev, "device enable failed\n");
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index fae84202d..c638fb650 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7624,7 +7624,7 @@ static void ixgbe_check_for_bad_vf(struct ixgbe_adapter *adapter)
 		pci_read_config_word(vfdev, PCI_STATUS, &status_reg);
 		if (status_reg != IXGBE_FAILED_READ_CFG_WORD &&
 		    status_reg & PCI_STATUS_REC_MASTER_ABORT)
-			pcie_flr(vfdev);
+			pcie_flr(vfdev, 0);
 	}
 }

@@ -11241,7 +11241,7 @@ static pci_ers_result_t ixgbe_io_error_detected(struct pci_dev *pdev,
 		 * VFLR.  Just clean up the AER in that case.
 		 */
 		if (vfdev) {
-			pcie_flr(vfdev);
+			pcie_flr(vfdev, 0);
 			/* Free device reference count */
 			pci_dev_put(vfdev);
 		}
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 16a17215f..4a7c084a3 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4574,33 +4574,13 @@ int pci_wait_for_pending_transaction(struct pci_dev *dev)
 EXPORT_SYMBOL(pci_wait_for_pending_transaction);

 /**
- * pcie_has_flr - check if a device supports function level resets
- * @dev: device to check
- *
- * Returns true if the device advertises support for PCIe function level
- * resets.
- */
-bool pcie_has_flr(struct pci_dev *dev)
-{
-	u32 cap;
-
-	if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
-		return false;
-
-	pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &cap);
-	return cap & PCI_EXP_DEVCAP_FLR;
-}
-EXPORT_SYMBOL_GPL(pcie_has_flr);
-
-/**
- * pcie_flr - initiate a PCIe function level reset
+ * pcie_reset_flr - initiate a PCIe function level reset
  * @dev: device to reset
  *
- * Initiate a function level reset on @dev.  The caller should ensure the
- * device supports FLR before calling this function, e.g. by using the
- * pcie_has_flr() helper.
+ * Initiate a function level reset unconditionally on @dev without
+ * checking any flags and DEVCAP
  */
-int pcie_flr(struct pci_dev *dev)
+int pcie_reset_flr(struct pci_dev *dev)
 {
 	if (!pci_wait_for_pending_transaction(dev))
 		pci_err(dev, "timed out waiting for pending transaction; performing function level reset anyway\n");
@@ -4619,6 +4599,30 @@ int pcie_flr(struct pci_dev *dev)

 	return pci_dev_wait(dev, "FLR", PCIE_RESET_READY_POLL_MS);
 }
+
+/**
+ * pcie_flr - initiate a PCIe function level reset
+ * @dev: device to reset
+ * @probe: If set, only check if the device can be reset this way.
+ *
+ * Initiate a function level reset on @dev.
+ */
+int pcie_flr(struct pci_dev *dev, int probe)
+{
+	u32 cap;
+
+	if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
+		return -ENOTTY;
+
+	pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &cap);
+	if (!(cap & PCI_EXP_DEVCAP_FLR))
+		return -ENOTTY;
+
+	if (probe)
+		return 0;
+
+	return pcie_reset_flr(dev);
+}
 EXPORT_SYMBOL_GPL(pcie_flr);

 static int pci_af_flr(struct pci_dev *dev, int probe)
@@ -5091,11 +5095,9 @@ int __pci_reset_function_locked(struct pci_dev *dev)
 	rc = pci_dev_specific_reset(dev, 0);
 	if (rc != -ENOTTY)
 		return rc;
-	if (pcie_has_flr(dev)) {
-		rc = pcie_flr(dev);
-		if (rc != -ENOTTY)
-			return rc;
-	}
+	rc = pcie_flr(dev, 0);
+	if (rc != -ENOTTY)
+		return rc;
 	rc = pci_af_flr(dev, 0);
 	if (rc != -ENOTTY)
 		return rc;
@@ -5129,8 +5131,9 @@ int pci_probe_reset_function(struct pci_dev *dev)
 	rc = pci_dev_specific_reset(dev, 1);
 	if (rc != -ENOTTY)
 		return rc;
-	if (pcie_has_flr(dev))
-		return 0;
+	rc = pcie_flr(dev, 1);
+	if (rc != -ENOTTY)
+		return rc;
 	rc = pci_af_flr(dev, 1);
 	if (rc != -ENOTTY)
 		return rc;
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index ba2238834..57a8806a9 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1405,13 +1405,11 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
 	}

 	if (type == PCI_EXP_TYPE_RC_EC || type == PCI_EXP_TYPE_RC_END) {
-		if (pcie_has_flr(dev)) {
-			rc = pcie_flr(dev);
-			pci_info(dev, "has been reset (%d)\n", rc);
-		} else {
-			pci_info(dev, "not reset (no FLR support)\n");
-			rc = -ENOTTY;
-		}
+		rc = pcie_flr(dev, 0);
+		if (!rc)
+			pci_info(dev, "has been reset\n");
+		else
+			pci_info(dev, "not reset (no FLR support: %d)\n", rc);
 	} else {
 		rc = pci_bus_error_reset(dev);
 		pci_info(dev, "%s Port link has been reset (%d)\n",
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 653660e3b..0a3df84c9 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3692,7 +3692,7 @@ static int reset_intel_82599_sfp_virtfn(struct pci_dev *dev, int probe)
 	 * supported.
 	 */
 	if (!probe)
-		pcie_flr(dev);
+		pcie_reset_flr(dev);
 	return 0;
 }

@@ -3795,7 +3795,7 @@ static int reset_chelsio_generic_dev(struct pci_dev *dev, int probe)
 				      PCI_MSIX_FLAGS_ENABLE |
 				      PCI_MSIX_FLAGS_MASKALL);

-	pcie_flr(dev);
+	pcie_flr(dev, 0);

 	/*
 	 * Restore the configuration information (BAR values, etc.) including
@@ -3831,7 +3831,7 @@ static int nvme_disable_and_flr(struct pci_dev *dev, int probe)
 	u32 cfg;

 	if (dev->class != PCI_CLASS_STORAGE_EXPRESS ||
-	    !pcie_has_flr(dev) || !pci_resource_start(dev, 0))
+	    pcie_flr(dev, 1) || !pci_resource_start(dev, 0))
 		return -ENOTTY;

 	if (probe)
@@ -3887,7 +3887,7 @@ static int nvme_disable_and_flr(struct pci_dev *dev, int probe)

 	pci_iounmap(dev, bar);

-	pcie_flr(dev);
+	pcie_flr(dev, 0);

 	return 0;
 }
@@ -3900,13 +3900,10 @@ static int nvme_disable_and_flr(struct pci_dev *dev, int probe)
  */
 static int delay_250ms_after_flr(struct pci_dev *dev, int probe)
 {
-	if (!pcie_has_flr(dev))
-		return -ENOTTY;
+	int ret = pcie_flr(dev, probe);

 	if (probe)
-		return 0;
-
-	pcie_flr(dev);
+		return ret;

 	msleep(250);

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 86c799c97..621ff5224 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1217,8 +1217,8 @@ u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
 			     enum pci_bus_speed *speed,
 			     enum pcie_link_width *width);
 void pcie_print_link_status(struct pci_dev *dev);
-bool pcie_has_flr(struct pci_dev *dev);
-int pcie_flr(struct pci_dev *dev);
+int pcie_reset_flr(struct pci_dev *dev);
+int pcie_flr(struct pci_dev *dev, int probe);
 int __pci_reset_function_locked(struct pci_dev *dev);
 int pci_reset_function(struct pci_dev *dev);
 int pci_reset_function_locked(struct pci_dev *dev);
--
2.30.2


* [PATCH 2/4] PCI: Add new bitmap for keeping track of supported reset mechanisms
  2021-03-12 17:34 [PATCH 0/4] Expose and manage PCI device reset ameynarkhede03
  2021-03-12 17:34 ` [PATCH 1/4] PCI: Refactor pcie_flr to follow calling convention of other reset methods ameynarkhede03
@ 2021-03-12 17:34 ` ameynarkhede03
  2021-03-14 23:51   ` Pali Rohár
  2021-03-12 17:34 ` [PATCH 3/4] PCI: Remove reset_fn field from pci_dev ameynarkhede03
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 90+ messages in thread
From: ameynarkhede03 @ 2021-03-12 17:34 UTC (permalink / raw)
  To: bhelgaas
  Cc: linux-pci, linux-kernel, alex.williamson, raphael.norwitz, Amey Narkhede

From: Amey Narkhede <ameynarkhede03@gmail.com>

Introduce a new bitmap, reset_methods, in struct pci_dev
to keep track of the reset mechanisms supported by the
device. Also refactor the probing and reset functions
to take advantage of the common calling convention of
the reset functions.
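
The table-plus-bitmap idea can be sketched in userspace as follows.
This is an illustrative mock, not the kernel implementation: struct
mock_dev, the "supported" field, the DEFINE_MOCK_METHOD macro and the
four-entry table are assumptions made to keep the example short. What
it mirrors from the patch is that bit i of the bitmap corresponds to
entry i of the table, that probing fills the bitmap, and that a reset
walks the table in order and stops at the first result other than
-ENOTTY.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev. */
struct mock_dev {
	uint8_t supported;	/* what the mock "hardware" can do (assumption) */
	uint8_t reset_methods;	/* bitmap filled in by probing */
	const char *last_used;	/* name of the method that did the last reset */
};

typedef int (*reset_fn_t)(struct mock_dev *, int);

struct reset_method {
	reset_fn_t reset_fn;
	const char *name;
};

/* Generate a mock method that honours the (dev, probe) convention. */
#define DEFINE_MOCK_METHOD(fn, bit, label)			\
	static int fn(struct mock_dev *dev, int probe)		\
	{							\
		if (!(dev->supported & (1u << (bit))))		\
			return -ENOTTY;				\
		if (!probe)					\
			dev->last_used = (label);		\
		return 0;					\
	}

DEFINE_MOCK_METHOD(mock_dev_specific, 0, "device_specific")
DEFINE_MOCK_METHOD(mock_flr, 1, "flr")
DEFINE_MOCK_METHOD(mock_pm, 2, "pm")
DEFINE_MOCK_METHOD(mock_bus, 3, "bus")

/* Table order defines both the priority and the bit positions. */
static const struct reset_method methods[] = {
	{ mock_dev_specific, "device_specific" },
	{ mock_flr, "flr" },
	{ mock_pm, "pm" },
	{ mock_bus, "bus" },
	{ NULL, NULL },
};

/* The bitmap is 8 bits wide, so the table may hold at most 8 methods. */
static_assert(sizeof(methods) / sizeof(methods[0]) - 1 <= 8,
	      "too many reset methods for a u8 bitmap");

/* Probe every method and record the usable ones in the bitmap. */
static void init_reset_methods(struct mock_dev *dev)
{
	const struct reset_method *m;
	int i, rc;

	dev->reset_methods = 0;
	for (i = 0, m = methods; m->reset_fn; i++, m++) {
		rc = m->reset_fn(dev, 1);
		if (!rc)
			dev->reset_methods |= (uint8_t)(1u << i);
		else if (rc != -ENOTTY)
			break;
	}
}

/* Try the recorded methods in table order; stop at the first real result. */
static int reset_function(struct mock_dev *dev)
{
	const struct reset_method *m;
	int i, rc = -ENOTTY;

	for (i = 0, m = methods; m->reset_fn; i++, m++) {
		if (!(dev->reset_methods & (1u << i)))
			continue;
		rc = m->reset_fn(dev, 0);
		if (rc != -ENOTTY)
			return rc;
	}
	return rc;
}
```

One consequence of this design is that clearing a bit in the bitmap is
enough to exclude a method from the reset path without touching the
dispatch loop.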

Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
---
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>

 drivers/pci/pci.c   | 106 ++++++++++++++++++++++++--------------------
 drivers/pci/pci.h   |  11 ++++-
 drivers/pci/probe.c |   5 +--
 include/linux/pci.h |  10 +++++
 4 files changed, 79 insertions(+), 53 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 4a7c084a3..407b44e85 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -40,6 +40,26 @@ const char *pci_power_names[] = {
 };
 EXPORT_SYMBOL_GPL(pci_power_names);

+static int pci_af_flr(struct pci_dev *dev, int probe);
+static int pci_pm_reset(struct pci_dev *dev, int probe);
+static int pci_dev_reset_slot_function(struct pci_dev *dev, int probe);
+static int pci_parent_bus_reset(struct pci_dev *dev, int probe);
+
+/*
+ * The ordering for functions in pci_reset_fn_methods
+ * is required for bitmap positions defined
+ * in reset_methods in struct pci_dev
+ */
+const struct pci_reset_fn_method pci_reset_fn_methods[] = {
+	{ .reset_fn = &pci_dev_specific_reset, .name = "device_specific" },
+	{ .reset_fn = &pcie_flr, .name = "flr" },
+	{ .reset_fn = &pci_af_flr, .name = "af_flr" },
+	{ .reset_fn = &pci_pm_reset, .name = "pm" },
+	{ .reset_fn = &pci_dev_reset_slot_function, .name = "slot" },
+	{ .reset_fn = &pci_parent_bus_reset, .name = "bus" },
+	{ 0 },
+};
+
 int isa_dma_bridge_buggy;
 EXPORT_SYMBOL(isa_dma_bridge_buggy);

@@ -5080,71 +5100,59 @@ static void pci_dev_restore(struct pci_dev *dev)
  */
 int __pci_reset_function_locked(struct pci_dev *dev)
 {
-	int rc;
+	int i, rc = -ENOTTY;
+	const struct pci_reset_fn_method *reset;

 	might_sleep();

-	/*
-	 * A reset method returns -ENOTTY if it doesn't support this device
-	 * and we should try the next method.
-	 *
-	 * If it returns 0 (success), we're finished.  If it returns any
-	 * other error, we're also finished: this indicates that further
-	 * reset mechanisms might be broken on the device.
-	 */
-	rc = pci_dev_specific_reset(dev, 0);
-	if (rc != -ENOTTY)
-		return rc;
-	rc = pcie_flr(dev, 0);
-	if (rc != -ENOTTY)
-		return rc;
-	rc = pci_af_flr(dev, 0);
-	if (rc != -ENOTTY)
-		return rc;
-	rc = pci_pm_reset(dev, 0);
-	if (rc != -ENOTTY)
-		return rc;
-	rc = pci_dev_reset_slot_function(dev, 0);
-	if (rc != -ENOTTY)
-		return rc;
-	return pci_parent_bus_reset(dev, 0);
+	for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, reset++) {
+		if (!(dev->reset_methods & (1 << i)))
+			continue;
+
+		/*
+		 * A reset method returns -ENOTTY if it doesn't support this device
+		 * and we should try the next method.
+		 *
+		 * If it returns 0 (success), we're finished.  If it returns any
+		 * other error, we're also finished: this indicates that further
+		 * reset mechanisms might be broken on the device.
+		 */
+		rc = reset->reset_fn(dev, 0);
+		if (rc != -ENOTTY)
+			return rc;
+	}
+	return rc;
 }
 EXPORT_SYMBOL_GPL(__pci_reset_function_locked);

 /**
- * pci_probe_reset_function - check whether the device can be safely reset
- * @dev: PCI device to reset
+ * pci_init_reset_methods - check whether device can be safely reset
+ * and store supported reset mechanisms.
+ * @dev: PCI device to check for reset mechanisms
  *
  * Some devices allow an individual function to be reset without affecting
  * other functions in the same device.  The PCI device must be responsive
- * to PCI config space in order to use this function.
+ * to reads and writes to its PCI config space in order to use this function.
  *
- * Returns 0 if the device function can be reset or negative if the
- * device doesn't support resetting a single function.
+ * Stores reset mechanisms supported by device in reset_methods bitmap
+ * field of struct pci_dev
  */
-int pci_probe_reset_function(struct pci_dev *dev)
+void pci_init_reset_methods(struct pci_dev *dev)
 {
-	int rc;
+	int i, rc;
+	const struct pci_reset_fn_method *reset;

-	might_sleep();
+	dev->reset_methods = 0;

-	rc = pci_dev_specific_reset(dev, 1);
-	if (rc != -ENOTTY)
-		return rc;
-	rc = pcie_flr(dev, 1);
-	if (rc != -ENOTTY)
-		return rc;
-	rc = pci_af_flr(dev, 1);
-	if (rc != -ENOTTY)
-		return rc;
-	rc = pci_pm_reset(dev, 1);
-	if (rc != -ENOTTY)
-		return rc;
-	rc = pci_dev_reset_slot_function(dev, 1);
-	if (rc != -ENOTTY)
-		return rc;
+	might_sleep();

-	return pci_parent_bus_reset(dev, 1);
+	for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, reset++) {
+		rc = reset->reset_fn(dev, 1);
+		if (!rc)
+			dev->reset_methods |= (1 << i);
+		else if (rc != -ENOTTY)
+			break;
+	}
 }

 /**
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index ef7c46613..ec093efdc 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -39,7 +39,7 @@ enum pci_mmap_api {
 int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vmai,
 		  enum pci_mmap_api mmap_api);

-int pci_probe_reset_function(struct pci_dev *dev);
+void pci_init_reset_methods(struct pci_dev *dev);
 int pci_bridge_secondary_bus_reset(struct pci_dev *dev);
 int pci_bus_error_reset(struct pci_dev *dev);

@@ -612,6 +612,15 @@ struct pci_dev_reset_methods {
 	int (*reset)(struct pci_dev *dev, int probe);
 };

+typedef int (*pci_reset_fn_t)(struct pci_dev *, int);
+
+struct pci_reset_fn_method {
+	pci_reset_fn_t reset_fn;
+	char *name;
+};
+
+extern const struct pci_reset_fn_method pci_reset_fn_methods[];
+
 #ifdef CONFIG_PCI_QUIRKS
 int pci_dev_specific_reset(struct pci_dev *dev, int probe);
 #else
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 953f15abc..01dd037bd 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2403,9 +2403,8 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	pci_rcec_init(dev);		/* Root Complex Event Collector */

 	pcie_report_downtraining(dev);
-
-	if (pci_probe_reset_function(dev) == 0)
-		dev->reset_fn = 1;
+	pci_init_reset_methods(dev);
+	dev->reset_fn = !!dev->reset_methods;
 }

 /*
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 621ff5224..56d6e4750 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -325,6 +325,16 @@ struct pci_dev {
 	unsigned int	class;		/* 3 bytes: (base,sub,prog-if) */
 	u8		revision;	/* PCI revision, low byte of class word */
 	u8		hdr_type;	/* PCI header type (`multi' flag masked out) */
+	/*
+	 * bit 0 -> dev_specific
+	 * bit 1 -> flr
+	 * bit 2 -> af_flr
+	 * bit 3 -> pm
+	 * bit 4 -> slot
+	 * bit 5 -> bus
+	 * See pci_reset_fn_methods array in pci.c
+	 */
+	u8 __bitwise reset_methods;		/* bitmap for device supported reset capabilities */
 #ifdef CONFIG_PCIEAER
 	u16		aer_cap;	/* AER capability offset */
 	struct aer_stats *aer_stats;	/* AER stats for this device */
--
2.30.2


* [PATCH 3/4] PCI: Remove reset_fn field from pci_dev
  2021-03-12 17:34 [PATCH 0/4] Expose and manage PCI device reset ameynarkhede03
  2021-03-12 17:34 ` [PATCH 1/4] PCI: Refactor pcie_flr to follow calling convention of other reset methods ameynarkhede03
  2021-03-12 17:34 ` [PATCH 2/4] PCI: Add new bitmap for keeping track of supported reset mechanisms ameynarkhede03
@ 2021-03-12 17:34 ` ameynarkhede03
  2021-03-14 23:52   ` Pali Rohár
  2021-03-12 17:34 ` [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism ameynarkhede03
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 90+ messages in thread
From: ameynarkhede03 @ 2021-03-12 17:34 UTC (permalink / raw)
  To: bhelgaas
  Cc: linux-pci, linux-kernel, alex.williamson, raphael.norwitz, Amey Narkhede

From: Amey Narkhede <ameynarkhede03@gmail.com>

The reset_fn field is used to indicate whether the
device supports any reset mechanism. Remove reset_fn
in favor of the new reset_methods bitmap, which keeps
track of all supported reset mechanisms of a device;
a non-zero bitmap carries the same information.

Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
---
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>

 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 2 +-
 drivers/pci/pci-sysfs.c                            | 6 ++----
 drivers/pci/pci.c                                  | 6 +++---
 drivers/pci/probe.c                                | 1 -
 drivers/pci/quirks.c                               | 2 +-
 include/linux/pci.h                                | 1 -
 6 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 9b9d305c6..3e2c49e08 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -526,7 +526,7 @@ static void octeon_destroy_resources(struct octeon_device *oct)
 			oct->irq_name_storage = NULL;
 		}
 		/* Soft reset the octeon device before exiting */
-		if (oct->pci_dev->reset_fn)
+		if (oct->pci_dev->reset_methods)
 			octeon_pci_flr(oct);
 		else
 			cn23xx_vf_ask_pf_to_do_flr(oct);
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index f8afd54ca..78d2c130c 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1334,7 +1334,7 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)

 	pcie_vpd_create_sysfs_dev_files(dev);

-	if (dev->reset_fn) {
+	if (dev->reset_methods) {
 		retval = device_create_file(&dev->dev, &dev_attr_reset);
 		if (retval)
 			goto error;
@@ -1417,10 +1417,8 @@ int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev)
 static void pci_remove_capabilities_sysfs(struct pci_dev *dev)
 {
 	pcie_vpd_remove_sysfs_dev_files(dev);
-	if (dev->reset_fn) {
+	if (dev->reset_methods)
 		device_remove_file(&dev->dev, &dev_attr_reset);
-		dev->reset_fn = 0;
-	}
 }

 /**
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 407b44e85..b7f6c6588 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5175,7 +5175,7 @@ int pci_reset_function(struct pci_dev *dev)
 {
 	int rc;

-	if (!dev->reset_fn)
+	if (!dev->reset_methods)
 		return -ENOTTY;

 	pci_dev_lock(dev);
@@ -5211,7 +5211,7 @@ int pci_reset_function_locked(struct pci_dev *dev)
 {
 	int rc;

-	if (!dev->reset_fn)
+	if (!dev->reset_methods)
 		return -ENOTTY;

 	pci_dev_save_and_disable(dev);
@@ -5234,7 +5234,7 @@ int pci_try_reset_function(struct pci_dev *dev)
 {
 	int rc;

-	if (!dev->reset_fn)
+	if (!dev->reset_methods)
 		return -ENOTTY;

 	if (!pci_dev_trylock(dev))
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 01dd037bd..4764e031a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2404,7 +2404,6 @@ static void pci_init_capabilities(struct pci_dev *dev)

 	pcie_report_downtraining(dev);
 	pci_init_reset_methods(dev);
-	dev->reset_fn = !!dev->reset_methods;
 }

 /*
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 0a3df84c9..20a81b1bc 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5535,7 +5535,7 @@ static void quirk_reset_lenovo_thinkpad_p50_nvgpu(struct pci_dev *pdev)

 	if (pdev->subsystem_vendor != PCI_VENDOR_ID_LENOVO ||
 	    pdev->subsystem_device != 0x222e ||
-	    !pdev->reset_fn)
+	    !pdev->reset_methods)
 		return;

 	if (pci_enable_device_mem(pdev))
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 56d6e4750..a2f003f4e 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -437,7 +437,6 @@ struct pci_dev {
 	unsigned int	state_saved:1;
 	unsigned int	is_physfn:1;
 	unsigned int	is_virtfn:1;
-	unsigned int	reset_fn:1;
 	unsigned int	is_hotplug_bridge:1;
 	unsigned int	shpc_managed:1;		/* SHPC owned by shpchp */
 	unsigned int	is_thunderbolt:1;	/* Thunderbolt controller */
--
2.30.2


* [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-12 17:34 [PATCH 0/4] Expose and manage PCI device reset ameynarkhede03
                   ` (2 preceding siblings ...)
  2021-03-12 17:34 ` [PATCH 3/4] PCI: Remove reset_fn field from pci_dev ameynarkhede03
@ 2021-03-12 17:34 ` ameynarkhede03
  2021-03-14 23:55   ` Pali Rohár
       [not found] ` <20210312112043.3f2954e3@omen.home.shazbot.org>
  2021-03-14 12:09 ` Leon Romanovsky
  5 siblings, 1 reply; 90+ messages in thread
From: ameynarkhede03 @ 2021-03-12 17:34 UTC (permalink / raw)
  To: bhelgaas
  Cc: linux-pci, linux-kernel, alex.williamson, raphael.norwitz, Amey Narkhede

From: Amey Narkhede <ameynarkhede03@gmail.com>

Add a reset_methods_enabled bitmap to struct pci_dev to
keep track of the user-preferred device reset mechanisms.
Add a reset_method sysfs attribute to query and set the
user-preferred device reset mechanisms.
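
The write side of the attribute can be sketched in userspace as below.
This is a mock under stated assumptions, not the kernel handler:
parse_reset_method() and streq_nl() are hypothetical names, and
streq_nl() only approximates the kernel's sysfs_streq() by ignoring a
single trailing newline. The method names and the accepted keywords
"none" and "default" are taken from this series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Index i in this table corresponds to bit i of the bitmaps. */
static const char *const method_names[] = {
	"device_specific", "flr", "af_flr", "pm", "slot", "bus", NULL,
};

/* Approximation of sysfs_streq(): compare, ignoring one trailing newline. */
static int streq_nl(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b);

	if (la && a[la - 1] == '\n')
		la--;
	if (lb && b[lb - 1] == '\n')
		lb--;
	return la == lb && memcmp(a, b, la) == 0;
}

/*
 * Parse a value written to the attribute.
 * On success store the new enabled bitmap in *enabled and return 0.
 * Return -EINVAL for an unknown name or a method the device lacks;
 * *enabled is left unchanged in that case.
 */
static int parse_reset_method(const char *buf, uint8_t supported,
			      uint8_t *enabled)
{
	uint8_t mask = 0;
	int i;

	assert(buf != NULL && enabled != NULL);

	if (streq_nl(buf, "none")) {
		*enabled = 0;		/* disable reset entirely */
		return 0;
	}
	if (streq_nl(buf, "default")) {
		*enabled = supported;	/* back to everything probed */
		return 0;
	}

	for (i = 0; method_names[i]; i++) {
		if (streq_nl(buf, method_names[i])) {
			mask = (uint8_t)(1u << i);
			break;
		}
	}
	if (!mask || !(supported & mask))
		return -EINVAL;

	*enabled = mask;		/* exactly one method selected */
	return 0;
}
```

For a device whose probed bitmap has only the "flr" and "bus" bits set,
writing "flr" would select that single method, while writing "pm"
would be rejected because the device does not support it.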

Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
---
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>

 Documentation/ABI/testing/sysfs-bus-pci | 15 ++++++
 drivers/pci/pci-sysfs.c                 | 66 +++++++++++++++++++++++--
 drivers/pci/pci.c                       |  3 +-
 include/linux/pci.h                     |  2 +
 4 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index 25c9c3977..ae53ecd2e 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -121,6 +121,21 @@ Description:
 		child buses, and re-discover devices removed earlier
 		from this part of the device tree.

+What:		/sys/bus/pci/devices/.../reset_method
+Date:		March 2021
+Contact:	Amey Narkhede <ameynarkhede03@gmail.com>
+Description:
+		Some devices allow an individual function to be reset
+		without affecting other functions in the same slot.
+		For devices that have this support, a file named reset_method
+		will be present in sysfs. Reading this file lists the names
+		of the reset methods the device supports. Currently enabled
+		methods are enclosed in brackets. Writing the name of any
+		supported reset method to this file selects the method to
+		be used when resetting the device. Writing "none" to this file
+		disables the ability to reset the device, and writing "default"
+		restores the original value.
+
 What:		/sys/bus/pci/devices/.../reset
 Date:		July 2009
 Contact:	Michael S. Tsirkin <mst@redhat.com>
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 78d2c130c..3cd06d1c0 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1304,6 +1304,59 @@ static const struct bin_attribute pcie_config_attr = {
 	.write = pci_write_config,
 };

+static ssize_t reset_method_show(struct device *dev,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	const struct pci_reset_fn_method *reset;
+	struct pci_dev *pdev = to_pci_dev(dev);
+	ssize_t len = 0;
+	int i;
+
+	for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, reset++) {
+		if (pdev->reset_methods_enabled & (1 << i))
+			len += sysfs_emit_at(buf, len, "[%s] ", reset->name);
+		else if (pdev->reset_methods & (1 << i))
+			len += sysfs_emit_at(buf, len, "%s ", reset->name);
+	}
+
+	return len;
+}
+
+static ssize_t reset_method_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count)
+{
+	const struct pci_reset_fn_method *reset = pci_reset_fn_methods;
+	struct pci_dev *pdev = to_pci_dev(dev);
+	u8 reset_mechanism;
+	int i = 0;
+
+	/* Writing none disables reset */
+	if (sysfs_streq(buf, "none")) {
+		reset_mechanism = 0;
+	} else if (sysfs_streq(buf, "default")) {
+		/* Writing default returns to initial value */
+		reset_mechanism = pdev->reset_methods;
+	} else {
+		reset_mechanism = 0;
+		for (; reset->reset_fn; i++, reset++) {
+			if (sysfs_streq(buf, reset->name)) {
+				reset_mechanism = 1 << i;
+				break;
+			}
+		}
+		if (!reset_mechanism || !(pdev->reset_methods & reset_mechanism))
+			return -EINVAL;
+	}
+
+	pdev->reset_methods_enabled = reset_mechanism;
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(reset_method);
+
 static ssize_t reset_store(struct device *dev, struct device_attribute *attr,
 			   const char *buf, size_t count)
 {
@@ -1337,11 +1390,16 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)
 	if (dev->reset_methods) {
 		retval = device_create_file(&dev->dev, &dev_attr_reset);
 		if (retval)
-			goto error;
+			goto err_reset;
+		retval = device_create_file(&dev->dev, &dev_attr_reset_method);
+		if (retval)
+			goto err_method;
 	}
 	return 0;

-error:
+err_method:
+	device_remove_file(&dev->dev, &dev_attr_reset);
+err_reset:
 	pcie_vpd_remove_sysfs_dev_files(dev);
 	return retval;
 }
@@ -1417,8 +1475,10 @@ int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev)
 static void pci_remove_capabilities_sysfs(struct pci_dev *dev)
 {
 	pcie_vpd_remove_sysfs_dev_files(dev);
-	if (dev->reset_methods)
+	if (dev->reset_methods) {
 		device_remove_file(&dev->dev, &dev_attr_reset);
+		device_remove_file(&dev->dev, &dev_attr_reset_method);
+	}
 }

 /**
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b7f6c6588..81cebea56 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5106,7 +5106,7 @@ int __pci_reset_function_locked(struct pci_dev *dev)
 	might_sleep();

 	for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, reset++) {
-		if (!(dev->reset_methods & (1 << i)))
+		if (!(dev->reset_methods_enabled & (1 << i)))
 			continue;

 		/*
@@ -5153,6 +5153,7 @@ void pci_init_reset_methods(struct pci_dev *dev)
 		else if (rc != -ENOTTY)
 			break;
 	}
+	dev->reset_methods_enabled = dev->reset_methods;
 }

 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index a2f003f4e..400f614e0 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -335,6 +335,8 @@ struct pci_dev {
 	 * See pci_reset_fn_methods array in pci.c
 	 */
 	u8 __bitwise reset_methods;		/* bitmap for device supported reset capabilities */
+	/* bitmap for user enabled and device supported reset capabilities */
+	u8 __bitwise reset_methods_enabled;
 #ifdef CONFIG_PCIEAER
 	u16		aer_cap;	/* AER capability offset */
 	struct aer_stats *aer_stats;	/* AER stats for this device */
--
2.30.2
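
For readers who want to experiment with the semantics outside the kernel,
the read/write behaviour of reset_method described in this patch can be
modelled in userspace. This is only a sketch under stated assumptions:
struct reset_state, model_show() and model_store() are hypothetical names
that do not appear in the series, plain strcmp() stands in for
sysfs_streq() (so a trailing newline is not tolerated here), and
snprintf() stands in for sysfs_emit_at().

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Method names in the order of pci_reset_fn_methods[] from patch 2/4. */
static const char *const method_names[] = {
	"device_specific", "flr", "af_flr", "pm", "slot", "bus",
};
#define NUM_METHODS (sizeof(method_names) / sizeof(method_names[0]))

/* Hypothetical stand-in for the two bitmaps kept in struct pci_dev. */
struct reset_state {
	uint8_t supported;	/* models reset_methods */
	uint8_t enabled;	/* models reset_methods_enabled */
};

/*
 * Models reset_method_show(): enabled methods are printed in brackets,
 * supported-but-disabled methods are printed bare.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
static int model_show(const struct reset_state *st, char *buf, size_t size)
{
	size_t len = 0;
	size_t i;
	int n;

	if (size == 0)
		return -1;
	buf[0] = '\0';

	for (i = 0; i < NUM_METHODS; i++) {
		if (st->enabled & (1u << i))
			n = snprintf(buf + len, size - len, "[%s] ",
				     method_names[i]);
		else if (st->supported & (1u << i))
			n = snprintf(buf + len, size - len, "%s ",
				     method_names[i]);
		else
			continue;

		if (n < 0 || (size_t)n >= size - len)
			return -1;
		len += (size_t)n;
	}
	return (int)len;
}

/*
 * Models reset_method_store(): "none" clears the enabled bitmap,
 * "default" restores the probed bitmap, and a method name enables
 * exactly that method if the device supports it.
 * Returns 0 on success and -1 where the kernel code returns -EINVAL.
 */
static int model_store(struct reset_state *st, const char *input)
{
	uint8_t mask = 0;
	size_t i;

	if (strcmp(input, "none") == 0) {
		mask = 0;
	} else if (strcmp(input, "default") == 0) {
		mask = st->supported;
	} else {
		for (i = 0; i < NUM_METHODS; i++) {
			if (strcmp(input, method_names[i]) == 0) {
				mask = (uint8_t)(1u << i);
				break;
			}
		}
		if (!mask || !(st->supported & mask))
			return -1;
	}

	st->enabled = mask;
	return 0;
}
```

With a device that supports flr, pm and bus (bitmap 0x2a), storing "pm"
in this model leaves only bit 3 enabled, and storing an unsupported name
such as "af_flr" is rejected without changing the enabled bitmap.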

* Re: [PATCH 0/4] Expose and manage PCI device reset
       [not found] ` <20210312112043.3f2954e3@omen.home.shazbot.org>
@ 2021-03-12 18:40   ` Amey Narkhede
  2021-03-12 18:58     ` Krzysztof Wilczyński
  2021-03-13  2:02     ` Raphael Norwitz
  0 siblings, 2 replies; 90+ messages in thread
From: Amey Narkhede @ 2021-03-12 18:40 UTC (permalink / raw)
  To: Alex Williamson; +Cc: bhelgaas, linux-pci, linux-kernel, raphael.norwitz

On 21/03/12 11:20AM, Alex Williamson wrote:
> On Fri, 12 Mar 2021 23:04:48 +0530
> ameynarkhede03@gmail.com wrote:
>
> > From: Amey Narkhede <ameynarkhede03@gmail.com>
> >
> > PCI and PCIe devices may support a number of possible reset mechanisms
> > for example Function Level Reset (FLR) provided via Advanced Feature or
> > PCIe capabilities, Power Management reset, bus reset, or device specific reset.
> > Currently the PCI subsystem creates a policy prioritizing these reset methods
> > which provides neither visibility nor control to userspace.
> >
> > Expose the reset methods available per device to userspace, via sysfs
> > and allow an administrative user or device owner to have ability to
> > manage per device reset method priorities or exclusions.
> > This feature aims to allow greater control of a device for use cases
> > as device assignment, where specific device or platform issues may
> > interact poorly with a given reset method, and for which device specific
> > quirks have not been developed.
> >
> > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
> > Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
>
> Reviews/Acks/Sign-off-by from others (aside from Tested/Reported-by)
> really need to be explicit, IMO.  This is a common issue for new
> developers, but it really needs to be more formal.  I wouldn't claim to
> be able to speak for Raphael and interpret his comments so far as his
> final seal of approval.
>
> Also in the patches, all Sign-offs/Reviews/Acks need to be above the
> triple dash '---' line.  Anything between that line and the beginning
> of the diff is discarded by tools.  People will often use that for
> differences between versions since it will be discarded on commit.
> Likewise, the cover letter is not committed, so Review-by there are
> generally not done.  I generally make my Sign-off last in the chain and
> maintainers will generally add theirs after that.  This makes for a
> chain where someone can read up from the bottom to see how this commit
> entered the kernel.  Reviews, Acks, and whatnot will therefore usually
> be collected above the author posting the patch.
>
> Since this is a v1 patch and it's likely there will be more revisions,
> rather than send a v2 immediately with corrections, I'd probably just
> reply to the cover letter retracting Raphael's Review-by for him to
> send his own and noting that you'll fix the commit reviews formatting,
> but will wait for a bit for further comments before sending a new
> version.
>
> No big deal, nice work getting it sent out.  Thanks,
>
> Alex
>
Raphael sent me an email with
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>, which
is why I included it.
So basically, in v2 I should reorder the tags so that my Sign-off comes
last. Did I get that right, or am I missing something?

Thanks,
Amey

> > Amey Narkhede (4):
> >   PCI: Refactor pcie_flr to follow calling convention of other reset
> >     methods
> >   PCI: Add new bitmap for keeping track of supported reset mechanisms
> >   PCI: Remove reset_fn field from pci_dev
> >   PCI/sysfs: Allow userspace to query and set device reset mechanism
> >
> >  Documentation/ABI/testing/sysfs-bus-pci       |  15 ++
> >  drivers/crypto/cavium/nitrox/nitrox_main.c    |   4 +-
> >  drivers/crypto/qat/qat_common/adf_aer.c       |   2 +-
> >  drivers/infiniband/hw/hfi1/chip.c             |   4 +-
> >  drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   2 +-
> >  .../ethernet/cavium/liquidio/lio_vf_main.c    |   4 +-
> >  .../ethernet/cavium/liquidio/octeon_mailbox.c |   2 +-
> >  drivers/net/ethernet/freescale/enetc/enetc.c  |   2 +-
> >  .../ethernet/freescale/enetc/enetc_pci_mdio.c |   2 +-
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   4 +-
> >  drivers/pci/pci-sysfs.c                       |  68 +++++++-
> >  drivers/pci/pci.c                             | 160 ++++++++++--------
> >  drivers/pci/pci.h                             |  11 +-
> >  drivers/pci/pcie/aer.c                        |  12 +-
> >  drivers/pci/probe.c                           |   4 +-
> >  drivers/pci/quirks.c                          |  17 +-
> >  include/linux/pci.h                           |  17 +-
> >  17 files changed, 213 insertions(+), 117 deletions(-)
> >
> > --
> > 2.30.2
> >
>

* Re: [PATCH 0/4] Expose and manage PCI device reset
  2021-03-12 18:40   ` [PATCH 0/4] Expose and manage PCI device reset Amey Narkhede
@ 2021-03-12 18:58     ` Krzysztof Wilczyński
  2021-03-12 19:06       ` Amey Narkhede
  2021-03-13  2:02     ` Raphael Norwitz
  1 sibling, 1 reply; 90+ messages in thread
From: Krzysztof Wilczyński @ 2021-03-12 18:58 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: Alex Williamson, bhelgaas, linux-pci, linux-kernel, raphael.norwitz

Hi Amey,

Thank you for sending the series over!

[...]
> > Reviews/Acks/Sign-off-by from others (aside from Tested/Reported-by)
> > really need to be explicit, IMO.  This is a common issue for new
> > developers, but it really needs to be more formal.  I wouldn't claim to
> > be able to speak for Raphael and interpret his comments so far as his
> > final seal of approval.
> >
> > Also in the patches, all Sign-offs/Reviews/Acks need to be above the
> > triple dash '---' line.  Anything between that line and the beginning
> > of the diff is discarded by tools.  People will often use that for
> > differences between versions since it will be discarded on commit.
> > Likewise, the cover letter is not committed, so Review-by there are
> > generally not done.  I generally make my Sign-off last in the chain and
> > maintainers will generally add theirs after that.  This makes for a
> > chain where someone can read up from the bottom to see how this commit
> > entered the kernel.  Reviews, Acks, and whatnot will therefore usually
> > be collected above the author posting the patch.
> >
> > Since this is a v1 patch and it's likely there will be more revisions,
> > rather than send a v2 immediately with corrections, I'd probably just
> > reply to the cover letter retracting Raphael's Review-by for him to
> > send his own and noting that you'll fix the commit reviews formatting,
> > but will wait for a bit for further comments before sending a new
> > version.
> >
> > No big deal, nice work getting it sent out.  Thanks,
> >
> > Alex
> >
> Raphael sent me an email with
> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>, which
> is why I included it.
> So basically, in v2 I should reorder the tags so that my Sign-off comes
> last. Did I get that right, or am I missing something?
[...]

I am not sure about the messages outside of the mailing list between
you, Alex and Raphael, as normally conversation and any reviews would
happen here (on the mailing list, that is), but as long as everyone
involved is on the same page, then everything should be fine.

In terms of how to format the patch, have a look at the following,
especially before you send another version, as there are some good tips
and recommendations there (including how to order things):

  https://lore.kernel.org/linux-pci/20171026223701.GA25649@bhelgaas-glaptop.roam.corp.google.com/

Krzysztof

* Re: [PATCH 0/4] Expose and manage PCI device reset
  2021-03-12 18:58     ` Krzysztof Wilczyński
@ 2021-03-12 19:06       ` Amey Narkhede
  2021-03-12 19:20         ` Krzysztof Wilczyński
  0 siblings, 1 reply; 90+ messages in thread
From: Amey Narkhede @ 2021-03-12 19:06 UTC (permalink / raw)
  To: Krzysztof Wilczyński
  Cc: Alex Williamson, bhelgaas, linux-pci, linux-kernel, raphael.norwitz

On 21/03/12 07:58PM, Krzysztof Wilczyński wrote:
> Hi Amey,
>
> Thank you for sending the series over!
>
> [...]
> > > Reviews/Acks/Sign-off-by from others (aside from Tested/Reported-by)
> > > really need to be explicit, IMO.  This is a common issue for new
> > > developers, but it really needs to be more formal.  I wouldn't claim to
> > > be able to speak for Raphael and interpret his comments so far as his
> > > final seal of approval.
> > >
> > > Also in the patches, all Sign-offs/Reviews/Acks need to be above the
> > > triple dash '---' line.  Anything between that line and the beginning
> > > of the diff is discarded by tools.  People will often use that for
> > > differences between versions since it will be discarded on commit.
> > > Likewise, the cover letter is not committed, so Review-by there are
> > > generally not done.  I generally make my Sign-off last in the chain and
> > > maintainers will generally add theirs after that.  This makes for a
> > > chain where someone can read up from the bottom to see how this commit
> > > entered the kernel.  Reviews, Acks, and whatnot will therefore usually
> > > be collected above the author posting the patch.
> > >
> > > Since this is a v1 patch and it's likely there will be more revisions,
> > > rather than send a v2 immediately with corrections, I'd probably just
> > > reply to the cover letter retracting Raphael's Review-by for him to
> > > send his own and noting that you'll fix the commit reviews formatting,
> > > but will wait for a bit for further comments before sending a new
> > > version.
> > >
> > > No big deal, nice work getting it sent out.  Thanks,
> > >
> > > Alex
> > >
> > Raphael sent me an email with
> > Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>, which
> > is why I included it.
> > So basically, in v2 I should reorder the tags so that my Sign-off comes
> > last. Did I get that right, or am I missing something?
> [...]
>
> I am not sure about the messages outside of the mailing list between
> you, Alex and Raphael, as normally conversation and any reviews would
> happen here (on the mailing list, that is), but as long as everyone
> involved is on the same page, then everything should be fine.
>
> In terms of how to format the patch, have a look at the following,
> especially before you send another version, as there are some good tips
> and recommendations there (including how to order things):
>
>   https://lore.kernel.org/linux-pci/20171026223701.GA25649@bhelgaas-glaptop.roam.corp.google.com/
>
> Krzysztof
Basically, the whole thing boils down to me not being good at handling
terminal email clients. I'll be sure to keep the points mentioned by
Bjorn in mind.

Thanks,
Amey

* Re: [PATCH 0/4] Expose and manage PCI device reset
  2021-03-12 19:06       ` Amey Narkhede
@ 2021-03-12 19:20         ` Krzysztof Wilczyński
  0 siblings, 0 replies; 90+ messages in thread
From: Krzysztof Wilczyński @ 2021-03-12 19:20 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: Alex Williamson, bhelgaas, linux-pci, linux-kernel, raphael.norwitz

Hi Amey,

[...]
> Basically, the whole thing boils down to me not being good at handling
> terminal email clients. I'll be sure to keep the points mentioned by
> Bjorn in mind.
[...]

No worries.  Thunderbird works fine with Google Mail and can send plain
text e-mails too, if you get tired of Mutt etc.

By the way, don't send v2 just yet.  Allow people some time to review
the first version.  Well, unless you feel you really need to send it,
that is.

Krzysztof

* Re: [PATCH 0/4] Expose and manage PCI device reset
  2021-03-12 18:40   ` [PATCH 0/4] Expose and manage PCI device reset Amey Narkhede
  2021-03-12 18:58     ` Krzysztof Wilczyński
@ 2021-03-13  2:02     ` Raphael Norwitz
  1 sibling, 0 replies; 90+ messages in thread
From: Raphael Norwitz @ 2021-03-13  2:02 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: Alex Williamson, bhelgaas, linux-pci, linux-kernel, Raphael Norwitz

On Sat, Mar 13, 2021 at 12:10:38AM +0530, Amey Narkhede wrote:
> On 21/03/12 11:20AM, Alex Williamson wrote:
> > On Fri, 12 Mar 2021 23:04:48 +0530
> > ameynarkhede03@gmail.com wrote:
> >
> > > From: Amey Narkhede <ameynarkhede03@gmail.com>
> > >
> > > PCI and PCIe devices may support a number of possible reset mechanisms
> > > for example Function Level Reset (FLR) provided via Advanced Feature or
> > > PCIe capabilities, Power Management reset, bus reset, or device specific reset.
> > > Currently the PCI subsystem creates a policy prioritizing these reset methods
> > > which provides neither visibility nor control to userspace.
> > >
> > > Expose the reset methods available per device to userspace, via sysfs
> > > and allow an administrative user or device owner to have ability to
> > > manage per device reset method priorities or exclusions.
> > > This feature aims to allow greater control of a device for use cases
> > > as device assignment, where specific device or platform issues may
> > > interact poorly with a given reset method, and for which device specific
> > > quirks have not been developed.
> > >
> > > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > > Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
> > > Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> >
> > Reviews/Acks/Sign-off-by from others (aside from Tested/Reported-by)
> > really need to be explicit, IMO.  This is a common issue for new
> > developers, but it really needs to be more formal.  I wouldn't claim to
> > be able to speak for Raphael and interpret his comments so far as his
> > final seal of approval.
> >
> > Also in the patches, all Sign-offs/Reviews/Acks need to be above the
> > triple dash '---' line.  Anything between that line and the beginning
> > of the diff is discarded by tools.  People will often use that for
> > differences between versions since it will be discarded on commit.
> > Likewise, the cover letter is not committed, so Review-by there are
> > generally not done.  I generally make my Sign-off last in the chain and
> > maintainers will generally add theirs after that.  This makes for a
> > chain where someone can read up from the bottom to see how this commit
> > entered the kernel.  Reviews, Acks, and whatnot will therefore usually
> > be collected above the author posting the patch.
> >
> > Since this is a v1 patch and it's likely there will be more revisions,
> > rather than send a v2 immediately with corrections, I'd probably just
> > reply to the cover letter retracting Raphael's Review-by for him to
> > send his own and noting that you'll fix the commit reviews formatting,
> > but will wait for a bit for further comments before sending a new
> > version.
> >
> > No big deal, nice work getting it sent out.  Thanks,
> >
> > Alex
> >
> Raphael sent me an email with
> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>, which
> is why I included it.
> So basically, in v2 I should reorder the tags so that my Sign-off comes
> last. Did I get that right, or am I missing something?
>

Just to confirm, I did send

Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>

for the latest version and I'm happy to have it on this series.

> Thanks,
> Amey
> 
> > > Amey Narkhede (4):
> > >   PCI: Refactor pcie_flr to follow calling convention of other reset
> > >     methods
> > >   PCI: Add new bitmap for keeping track of supported reset mechanisms
> > >   PCI: Remove reset_fn field from pci_dev
> > >   PCI/sysfs: Allow userspace to query and set device reset mechanism
> > >
> > >  Documentation/ABI/testing/sysfs-bus-pci       |  15 ++
> > >  drivers/crypto/cavium/nitrox/nitrox_main.c    |   4 +-
> > >  drivers/crypto/qat/qat_common/adf_aer.c       |   2 +-
> > >  drivers/infiniband/hw/hfi1/chip.c             |   4 +-
> > >  drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   2 +-
> > >  .../ethernet/cavium/liquidio/lio_vf_main.c    |   4 +-
> > >  .../ethernet/cavium/liquidio/octeon_mailbox.c |   2 +-
> > >  drivers/net/ethernet/freescale/enetc/enetc.c  |   2 +-
> > >  .../ethernet/freescale/enetc/enetc_pci_mdio.c |   2 +-
> > >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   4 +-
> > >  drivers/pci/pci-sysfs.c                       |  68 +++++++-
> > >  drivers/pci/pci.c                             | 160 ++++++++++--------
> > >  drivers/pci/pci.h                             |  11 +-
> > >  drivers/pci/pcie/aer.c                        |  12 +-
> > >  drivers/pci/probe.c                           |   4 +-
> > >  drivers/pci/quirks.c                          |  17 +-
> > >  include/linux/pci.h                           |  17 +-
> > >  17 files changed, 213 insertions(+), 117 deletions(-)
> > >
> > > --
> > > 2.30.2
> > >
> >

* Re: [PATCH 0/4] Expose and manage PCI device reset
  2021-03-12 17:34 [PATCH 0/4] Expose and manage PCI device reset ameynarkhede03
                   ` (4 preceding siblings ...)
       [not found] ` <20210312112043.3f2954e3@omen.home.shazbot.org>
@ 2021-03-14 12:09 ` Leon Romanovsky
  5 siblings, 0 replies; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-14 12:09 UTC (permalink / raw)
  To: ameynarkhede03
  Cc: bhelgaas, linux-pci, linux-kernel, alex.williamson, raphael.norwitz

On Fri, Mar 12, 2021 at 11:04:48PM +0530, ameynarkhede03@gmail.com wrote:
> From: Amey Narkhede <ameynarkhede03@gmail.com>
>
> PCI and PCIe devices may support a number of possible reset mechanisms
> for example Function Level Reset (FLR) provided via Advanced Feature or
> PCIe capabilities, Power Management reset, bus reset, or device specific reset.
> Currently the PCI subsystem creates a policy prioritizing these reset methods
> which provides neither visibility nor control to userspace.
>
> Expose the reset methods available per device to userspace, via sysfs
> and allow an administrative user or device owner to have ability to
> manage per device reset method priorities or exclusions.
> This feature aims to allow greater control of a device for use cases
> as device assignment, where specific device or platform issues may
> interact poorly with a given reset method, and for which device specific
> quirks have not been developed.

Sorry, are we talking about specific devices/flows/applications that
must have this functionality, or about a theoretical use case?

Thanks

* Re: [PATCH 2/4] PCI: Add new bitmap for keeping track of supported reset mechanisms
  2021-03-12 17:34 ` [PATCH 2/4] PCI: Add new bitmap for keeping track of supported reset mechanisms ameynarkhede03
@ 2021-03-14 23:51   ` Pali Rohár
  0 siblings, 0 replies; 90+ messages in thread
From: Pali Rohár @ 2021-03-14 23:51 UTC (permalink / raw)
  To: ameynarkhede03
  Cc: bhelgaas, linux-pci, linux-kernel, alex.williamson, raphael.norwitz

On Friday 12 March 2021 23:04:50 ameynarkhede03@gmail.com wrote:
> From: Amey Narkhede <ameynarkhede03@gmail.com>
> 
> Introduce a new bitmap reset_methods in struct pci_dev
> to keep track of reset mechanisms supported by the
> device. Also refactor probing and reset functions
> to take advantage of calling convention of reset
> functions.
> 
> Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
> ---
> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> 
>  drivers/pci/pci.c   | 106 ++++++++++++++++++++++++--------------------
>  drivers/pci/pci.h   |  11 ++++-
>  drivers/pci/probe.c |   5 +--
>  include/linux/pci.h |  10 +++++
>  4 files changed, 79 insertions(+), 53 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 4a7c084a3..407b44e85 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -40,6 +40,26 @@ const char *pci_power_names[] = {
>  };
>  EXPORT_SYMBOL_GPL(pci_power_names);
> 
> +static int pci_af_flr(struct pci_dev *dev, int probe);
> +static int pci_pm_reset(struct pci_dev *dev, int probe);
> +static int pci_dev_reset_slot_function(struct pci_dev *dev, int probe);
> +static int pci_parent_bus_reset(struct pci_dev *dev, int probe);
> +
> +/*
> + * The ordering for functions in pci_reset_fn_methods
> + * is required for bitmap positions defined
> + * in reset_methods in struct pci_dev
> + */
> +const struct pci_reset_fn_method pci_reset_fn_methods[] = {
> +	{ .reset_fn = &pci_dev_specific_reset, .name = "device_specific" },
> +	{ .reset_fn = &pcie_flr, .name = "flr" },
> +	{ .reset_fn = &pci_af_flr, .name = "af_flr" },
> +	{ .reset_fn = &pci_pm_reset, .name = "pm" },
> +	{ .reset_fn = &pci_dev_reset_slot_function, .name = "slot" },
> +	{ .reset_fn = &pci_parent_bus_reset, .name = "bus" },

Hello Amey! PCIe Warm Reset is missing from the list of reset methods.

Could you extend the API to cover PCIe Warm Reset as well? According to
the PCI Express Mini Card and M.2 electromechanical specifications, PCIe
Warm Reset can be triggered by the PERST# signal, and a number of kernel
drivers can control PERST# internally. There is simply no common kernel
API, so handling of PCIe Warm Reset and the PERST# signal is not unified.

> +	{ 0 },
> +};
> +
>  int isa_dma_bridge_buggy;
>  EXPORT_SYMBOL(isa_dma_bridge_buggy);
> 
> @@ -5080,71 +5100,59 @@ static void pci_dev_restore(struct pci_dev *dev)
>   */
>  int __pci_reset_function_locked(struct pci_dev *dev)
>  {
> -	int rc;
> +	int i, rc = -ENOTTY;
> +	const struct pci_reset_fn_method *reset;
> 
>  	might_sleep();
> 
> -	/*
> -	 * A reset method returns -ENOTTY if it doesn't support this device
> -	 * and we should try the next method.
> -	 *
> -	 * If it returns 0 (success), we're finished.  If it returns any
> -	 * other error, we're also finished: this indicates that further
> -	 * reset mechanisms might be broken on the device.
> -	 */
> -	rc = pci_dev_specific_reset(dev, 0);
> -	if (rc != -ENOTTY)
> -		return rc;
> -	rc = pcie_flr(dev, 0);
> -	if (rc != -ENOTTY)
> -		return rc;
> -	rc = pci_af_flr(dev, 0);
> -	if (rc != -ENOTTY)
> -		return rc;
> -	rc = pci_pm_reset(dev, 0);
> -	if (rc != -ENOTTY)
> -		return rc;
> -	rc = pci_dev_reset_slot_function(dev, 0);
> -	if (rc != -ENOTTY)
> -		return rc;
> -	return pci_parent_bus_reset(dev, 0);
> +	for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, reset++) {
> +		if (!(dev->reset_methods & (1 << i)))
> +			continue;
> +
> +		/*
> +		 * A reset method returns -ENOTTY if it doesn't support this device
> +		 * and we should try the next method.
> +		 *
> +		 * If it returns 0 (success), we're finished.  If it returns any
> +		 * other error, we're also finished: this indicates that further
> +		 * reset mechanisms might be broken on the device.
> +		 */
> +		rc = reset->reset_fn(dev, 0);
> +		if (rc != -ENOTTY)
> +			return rc;
> +	}
> +	return rc;
>  }
>  EXPORT_SYMBOL_GPL(__pci_reset_function_locked);
> 
>  /**
> - * pci_probe_reset_function - check whether the device can be safely reset
> - * @dev: PCI device to reset
> + * pci_init_reset_methods - check whether device can be safely reset
> + * and store supported reset mechanisms.
> + * @dev: PCI device to check for reset mechanisms
>   *
>   * Some devices allow an individual function to be reset without affecting
>   * other functions in the same device.  The PCI device must be responsive
> - * to PCI config space in order to use this function.
> + * to reads and writes to its PCI config space in order to use this function.
>   *
> - * Returns 0 if the device function can be reset or negative if the
> - * device doesn't support resetting a single function.
> + * Stores reset mechanisms supported by device in reset_methods bitmap
> + * field of struct pci_dev
>   */
> -int pci_probe_reset_function(struct pci_dev *dev)
> +void pci_init_reset_methods(struct pci_dev *dev)
>  {
> -	int rc;
> +	int i, rc;
> +	const struct pci_reset_fn_method *reset;
> 
> -	might_sleep();
> +	dev->reset_methods = 0;
> 
> -	rc = pci_dev_specific_reset(dev, 1);
> -	if (rc != -ENOTTY)
> -		return rc;
> -	rc = pcie_flr(dev, 1);
> -	if (rc != -ENOTTY)
> -		return rc;
> -	rc = pci_af_flr(dev, 1);
> -	if (rc != -ENOTTY)
> -		return rc;
> -	rc = pci_pm_reset(dev, 1);
> -	if (rc != -ENOTTY)
> -		return rc;
> -	rc = pci_dev_reset_slot_function(dev, 1);
> -	if (rc != -ENOTTY)
> -		return rc;
> +	might_sleep();
> 
> -	return pci_parent_bus_reset(dev, 1);
> +	for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, reset++) {
> +		rc = reset->reset_fn(dev, 1);
> +		if (!rc)
> +			dev->reset_methods |= (1 << i);
> +		else if (rc != -ENOTTY)
> +			break;
> +	}
>  }
> 
>  /**
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index ef7c46613..ec093efdc 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -39,7 +39,7 @@ enum pci_mmap_api {
>  int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vmai,
>  		  enum pci_mmap_api mmap_api);
> 
> -int pci_probe_reset_function(struct pci_dev *dev);
> +void pci_init_reset_methods(struct pci_dev *dev);
>  int pci_bridge_secondary_bus_reset(struct pci_dev *dev);
>  int pci_bus_error_reset(struct pci_dev *dev);
> 
> @@ -612,6 +612,15 @@ struct pci_dev_reset_methods {
>  	int (*reset)(struct pci_dev *dev, int probe);
>  };
> 
> +typedef int (*pci_reset_fn_t)(struct pci_dev *, int);
> +
> +struct pci_reset_fn_method {
> +	pci_reset_fn_t reset_fn;
> +	char *name;
> +};
> +
> +extern const struct pci_reset_fn_method pci_reset_fn_methods[];
> +
>  #ifdef CONFIG_PCI_QUIRKS
>  int pci_dev_specific_reset(struct pci_dev *dev, int probe);
>  #else
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 953f15abc..01dd037bd 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2403,9 +2403,8 @@ static void pci_init_capabilities(struct pci_dev *dev)
>  	pci_rcec_init(dev);		/* Root Complex Event Collector */
> 
>  	pcie_report_downtraining(dev);
> -
> -	if (pci_probe_reset_function(dev) == 0)
> -		dev->reset_fn = 1;
> +	pci_init_reset_methods(dev);
> +	dev->reset_fn = !!dev->reset_methods;
>  }
> 
>  /*
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 621ff5224..56d6e4750 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -325,6 +325,16 @@ struct pci_dev {
>  	unsigned int	class;		/* 3 bytes: (base,sub,prog-if) */
>  	u8		revision;	/* PCI revision, low byte of class word */
>  	u8		hdr_type;	/* PCI header type (`multi' flag masked out) */
> +	/*
> +	 * bit 0 -> dev_specific
> +	 * bit 1 -> flr
> +	 * bit 2 -> af_flr
> +	 * bit 3 -> pm
> +	 * bit 4 -> slot
> +	 * bit 5 -> bus
> +	 * See pci_reset_fn_methods array in pci.c
> +	 */
> +	u8 __bitwise reset_methods;		/* bitmap for device supported reset capabilities */
>  #ifdef CONFIG_PCIEAER
>  	u16		aer_cap;	/* AER capability offset */
>  	struct aer_stats *aer_stats;	/* AER stats for this device */
> --
> 2.30.2
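
The probe-then-dispatch pattern in the quoted patch can be modelled in
userspace as follows. This is a sketch only: struct fake_dev, fake_flr(),
fake_bus(), model_init_reset_methods() and model_reset() are hypothetical
names, and the two fake methods stand in for the six real entries of
pci_reset_fn_methods[]. It keeps the convention from the patch that a
method returns -ENOTTY when it does not apply to the device, and that the
position of a method in the table is its bit position in the bitmap.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev; all fields are illustrative. */
struct fake_dev {
	uint8_t reset_methods;	/* bit i set: method i probed successfully */
	int has_flr;		/* pretend capability flags */
	int has_bus;
	int last_used;		/* table index of the method that did the reset */
};

typedef int (*reset_fn_t)(struct fake_dev *dev, int probe);

struct reset_method {
	reset_fn_t reset_fn;
	const char *name;
};

/* Each method returns -ENOTTY if unsupported; probe != 0 only checks. */
static int fake_flr(struct fake_dev *dev, int probe)
{
	if (!dev->has_flr)
		return -ENOTTY;
	if (!probe)
		dev->last_used = 0;
	return 0;
}

static int fake_bus(struct fake_dev *dev, int probe)
{
	if (!dev->has_bus)
		return -ENOTTY;
	if (!probe)
		dev->last_used = 1;
	return 0;
}

/* Table order defines bit positions; a NULL reset_fn terminates it. */
static const struct reset_method methods[] = {
	{ fake_flr, "flr" },
	{ fake_bus, "bus" },
	{ NULL, NULL },
};

/* Models pci_init_reset_methods(): record every method that probes OK. */
static void model_init_reset_methods(struct fake_dev *dev)
{
	const struct reset_method *m;
	int i, rc;

	dev->reset_methods = 0;
	for (i = 0, m = methods; m->reset_fn; i++, m++) {
		rc = m->reset_fn(dev, 1);
		if (!rc)
			dev->reset_methods |= (uint8_t)(1u << i);
		else if (rc != -ENOTTY)
			break;	/* unexpected error: stop probing */
	}
}

/* Models __pci_reset_function_locked(): try recorded methods in order. */
static int model_reset(struct fake_dev *dev)
{
	const struct reset_method *m;
	int i, rc = -ENOTTY;

	for (i = 0, m = methods; m->reset_fn; i++, m++) {
		if (!(dev->reset_methods & (1u << i)))
			continue;
		rc = m->reset_fn(dev, 0);
		if (rc != -ENOTTY)
			return rc;	/* success or a hard failure */
	}
	return rc;
}
```

A device for which no method probes successfully ends up with an empty
bitmap, and model_reset() then returns -ENOTTY without calling any
method, which matches the initial value of rc in the patch.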

* Re: [PATCH 3/4] PCI: Remove reset_fn field from pci_dev
  2021-03-12 17:34 ` [PATCH 3/4] PCI: Remove reset_fn field from pci_dev ameynarkhede03
@ 2021-03-14 23:52   ` Pali Rohár
  0 siblings, 0 replies; 90+ messages in thread
From: Pali Rohár @ 2021-03-14 23:52 UTC (permalink / raw)
  To: ameynarkhede03
  Cc: bhelgaas, linux-pci, linux-kernel, alex.williamson, raphael.norwitz

On Friday 12 March 2021 23:04:51 ameynarkhede03@gmail.com wrote:
> From: Amey Narkhede <ameynarkhede03@gmail.com>
> 
> reset_fn field is used to indicate whether the
> device supports any reset mechanism or not.
> Deprecate use of reset_fn in favor of new
> reset_methods bitmap which can be used to keep
> track of all supported reset mechanisms of a device.

Hello Amey!

You cannot trigger PCIe Hot Reset (PCI secondary bus reset) in this
simple way from sysfs via the new reset methods.

I proposed very similar functionality just a few days ago:
https://lore.kernel.org/linux-pci/20210301171221.3d42a55i7h5ubqsb@pali/T/#u

And I realized that more steps are needed.

At a minimum, a remove-reset-rescan procedure performed atomically is
required.

> Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
> ---
> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> 
>  drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 2 +-
>  drivers/pci/pci-sysfs.c                            | 6 ++----
>  drivers/pci/pci.c                                  | 6 +++---
>  drivers/pci/probe.c                                | 1 -
>  drivers/pci/quirks.c                               | 2 +-
>  include/linux/pci.h                                | 1 -
>  6 files changed, 7 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
> index 9b9d305c6..3e2c49e08 100644
> --- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
> +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
> @@ -526,7 +526,7 @@ static void octeon_destroy_resources(struct octeon_device *oct)
>  			oct->irq_name_storage = NULL;
>  		}
>  		/* Soft reset the octeon device before exiting */
> -		if (oct->pci_dev->reset_fn)
> +		if (oct->pci_dev->reset_methods)
>  			octeon_pci_flr(oct);
>  		else
>  			cn23xx_vf_ask_pf_to_do_flr(oct);
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index f8afd54ca..78d2c130c 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1334,7 +1334,7 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)
> 
>  	pcie_vpd_create_sysfs_dev_files(dev);
> 
> -	if (dev->reset_fn) {
> +	if (dev->reset_methods) {
>  		retval = device_create_file(&dev->dev, &dev_attr_reset);
>  		if (retval)
>  			goto error;
> @@ -1417,10 +1417,8 @@ int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev)
>  static void pci_remove_capabilities_sysfs(struct pci_dev *dev)
>  {
>  	pcie_vpd_remove_sysfs_dev_files(dev);
> -	if (dev->reset_fn) {
> +	if (dev->reset_methods)
>  		device_remove_file(&dev->dev, &dev_attr_reset);
> -		dev->reset_fn = 0;
> -	}
>  }
> 
>  /**
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 407b44e85..b7f6c6588 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5175,7 +5175,7 @@ int pci_reset_function(struct pci_dev *dev)
>  {
>  	int rc;
> 
> -	if (!dev->reset_fn)
> +	if (!dev->reset_methods)
>  		return -ENOTTY;
> 
>  	pci_dev_lock(dev);
> @@ -5211,7 +5211,7 @@ int pci_reset_function_locked(struct pci_dev *dev)
>  {
>  	int rc;
> 
> -	if (!dev->reset_fn)
> +	if (!dev->reset_methods)
>  		return -ENOTTY;
> 
>  	pci_dev_save_and_disable(dev);
> @@ -5234,7 +5234,7 @@ int pci_try_reset_function(struct pci_dev *dev)
>  {
>  	int rc;
> 
> -	if (!dev->reset_fn)
> +	if (!dev->reset_methods)
>  		return -ENOTTY;
> 
>  	if (!pci_dev_trylock(dev))
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 01dd037bd..4764e031a 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2404,7 +2404,6 @@ static void pci_init_capabilities(struct pci_dev *dev)
> 
>  	pcie_report_downtraining(dev);
>  	pci_init_reset_methods(dev);
> -	dev->reset_fn = !!dev->reset_methods;
>  }
> 
>  /*
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 0a3df84c9..20a81b1bc 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5535,7 +5535,7 @@ static void quirk_reset_lenovo_thinkpad_p50_nvgpu(struct pci_dev *pdev)
> 
>  	if (pdev->subsystem_vendor != PCI_VENDOR_ID_LENOVO ||
>  	    pdev->subsystem_device != 0x222e ||
> -	    !pdev->reset_fn)
> +	    !pdev->reset_methods)
>  		return;
> 
>  	if (pci_enable_device_mem(pdev))
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 56d6e4750..a2f003f4e 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -437,7 +437,6 @@ struct pci_dev {
>  	unsigned int	state_saved:1;
>  	unsigned int	is_physfn:1;
>  	unsigned int	is_virtfn:1;
> -	unsigned int	reset_fn:1;
>  	unsigned int	is_hotplug_bridge:1;
>  	unsigned int	shpc_managed:1;		/* SHPC owned by shpchp */
>  	unsigned int	is_thunderbolt:1;	/* Thunderbolt controller */
> --
> 2.30.2

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-12 17:34 ` [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism ameynarkhede03
@ 2021-03-14 23:55   ` Pali Rohár
  2021-03-15 13:43     ` Amey Narkhede
  2021-03-18 17:51     ` Enrico Weigelt, metux IT consult
  0 siblings, 2 replies; 90+ messages in thread
From: Pali Rohár @ 2021-03-14 23:55 UTC (permalink / raw)
  To: ameynarkhede03
  Cc: bhelgaas, linux-pci, linux-kernel, alex.williamson, raphael.norwitz

On Friday 12 March 2021 23:04:52 ameynarkhede03@gmail.com wrote:
> From: Amey Narkhede <ameynarkhede03@gmail.com>
> 
> Add reset_methods_enabled bitmap to struct pci_dev to
> keep track of user preferred device reset mechanisms.
> Add reset_method sysfs attribute to query and set
> user preferred device reset mechanisms.
> 
> Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
> ---
> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> 
>  Documentation/ABI/testing/sysfs-bus-pci | 15 ++++++
>  drivers/pci/pci-sysfs.c                 | 66 +++++++++++++++++++++++--
>  drivers/pci/pci.c                       |  3 +-
>  include/linux/pci.h                     |  2 +
>  4 files changed, 82 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> index 25c9c3977..ae53ecd2e 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -121,6 +121,21 @@ Description:
>  		child buses, and re-discover devices removed earlier
>  		from this part of the device tree.
> 
> +What:		/sys/bus/pci/devices/.../reset_method
> +Date:		March 2021
> +Contact:	Amey Narkhede <ameynarkhede03@gmail.com>
> +Description:
> +		Some devices allow an individual function to be reset
> +		without affecting other functions in the same slot.
> +		For devices that have this support, a file named reset_method
> +		will be present in sysfs. Reading this file will give names
> +		of the device supported reset methods. Currently used methods
> +		are enclosed in brackets. Writing the name of any of the device
> +		supported reset method to this file will set the reset method to
> +		be used when resetting the device. Writing "none" to this file
> +		will disable ability to reset the device and writing "default"
> +		will return to the original value.
> +

Hello Amey!

I think that this API does not work for PCIe Hot Reset (=PCI secondary
bus reset) and PCIe Warm Reset.

The first reset method is bound to the bus, not to a device, and
therefore the kernel does not have to see any registered device. So
there would be no "reset_method" sysfs file, and also no "reset" sysfs
file. But PCIe Hot Reset is in most cases needed exactly when a buggy
card is not registered on the bus. And with this API it is not possible
to trigger the reset in that case.

PCIe Warm Reset is done by the PERST# signal. While the signal is
asserted, the device is in reset state and therefore is not registered.
So again the kernel does not have to see a registered device.

Moreover, for mPCIe form factor cards, boards can share one PERST#
signal among several PCIe cards and control this signal via GPIO. So
asserting the PERST# GPIO can trigger Warm Reset for several PCIe cards,
not just one. It depends on the board or topology.

So... I do not think that the current approach with a "reset_method"
sysfs entry bound to the PCI device works for PCI secondary bus reset,
and it also cannot be used for implementing PCIe Warm Reset.

I would rather suggest re-designing this and preparing a new API which
would also work with PCIe Hot Reset and PCIe Warm Reset.

This "reset" sysfs file can work only with PCI Function Level Reset or
some PM or device specific reset. But not with reset types which are
more slot or bus oriented.

>  What:		/sys/bus/pci/devices/.../reset
>  Date:		July 2009
>  Contact:	Michael S. Tsirkin <mst@redhat.com>
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 78d2c130c..3cd06d1c0 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1304,6 +1304,59 @@ static const struct bin_attribute pcie_config_attr = {
>  	.write = pci_write_config,
>  };
> 
> +static ssize_t reset_method_show(struct device *dev,
> +				 struct device_attribute *attr,
> +				 char *buf)
> +{
> +	const struct pci_reset_fn_method *reset;
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	ssize_t len = 0;
> +	int i;
> +
> +	for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, reset++) {
> +		if (pdev->reset_methods_enabled & (1 << i))
> +			len += sysfs_emit_at(buf, len, "[%s] ", reset->name);
> +		else if (pdev->reset_methods & (1 << i))
> +			len += sysfs_emit_at(buf, len, "%s ", reset->name);
> +	}
> +
> +	return len;
> +}
> +
> +static ssize_t reset_method_store(struct device *dev,
> +				  struct device_attribute *attr,
> +				  const char *buf, size_t count)
> +{
> +	const struct pci_reset_fn_method *reset = pci_reset_fn_methods;
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	u8 reset_mechanism;
> +	int i = 0;
> +
> +	/* Writing none disables reset */
> +	if (sysfs_streq(buf, "none")) {
> +		reset_mechanism = 0;
> +	} else if (sysfs_streq(buf, "default")) {
> +		/* Writing default returns to initial value */
> +		reset_mechanism = pdev->reset_methods;
> +	} else {
> +		reset_mechanism = 0;
> +		for (; reset->reset_fn; i++, reset++) {
> +			if (sysfs_streq(buf, reset->name)) {
> +				reset_mechanism = 1 << i;
> +				break;
> +			}
> +		}
> +		if (!reset_mechanism || !(pdev->reset_methods & reset_mechanism))
> +			return -EINVAL;
> +	}
> +
> +	pdev->reset_methods_enabled = reset_mechanism;
> +
> +	return count;
> +}
> +
> +static DEVICE_ATTR_RW(reset_method);
> +
>  static ssize_t reset_store(struct device *dev, struct device_attribute *attr,
>  			   const char *buf, size_t count)
>  {
> @@ -1337,11 +1390,16 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)
>  	if (dev->reset_methods) {
>  		retval = device_create_file(&dev->dev, &dev_attr_reset);
>  		if (retval)
> -			goto error;
> +			goto err_reset;
> +		retval = device_create_file(&dev->dev, &dev_attr_reset_method);
> +		if (retval)
> +			goto err_method;
>  	}
>  	return 0;
> 
> -error:
> +err_method:
> +	device_remove_file(&dev->dev, &dev_attr_reset);
> +err_reset:
>  	pcie_vpd_remove_sysfs_dev_files(dev);
>  	return retval;
>  }
> @@ -1417,8 +1475,10 @@ int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev)
>  static void pci_remove_capabilities_sysfs(struct pci_dev *dev)
>  {
>  	pcie_vpd_remove_sysfs_dev_files(dev);
> -	if (dev->reset_methods)
> +	if (dev->reset_methods) {
>  		device_remove_file(&dev->dev, &dev_attr_reset);
> +		device_remove_file(&dev->dev, &dev_attr_reset_method);
> +	}
>  }
> 
>  /**
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b7f6c6588..81cebea56 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5106,7 +5106,7 @@ int __pci_reset_function_locked(struct pci_dev *dev)
>  	might_sleep();
> 
>  	for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, reset++) {
> -		if (!(dev->reset_methods & (1 << i)))
> +		if (!(dev->reset_methods_enabled & (1 << i)))
>  			continue;
> 
>  		/*
> @@ -5153,6 +5153,7 @@ void pci_init_reset_methods(struct pci_dev *dev)
>  		else if (rc != -ENOTTY)
>  			break;
>  	}
> +	dev->reset_methods_enabled = dev->reset_methods;
>  }
> 
>  /**
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index a2f003f4e..400f614e0 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -335,6 +335,8 @@ struct pci_dev {
>  	 * See pci_reset_fn_methods array in pci.c
>  	 */
>  	u8 __bitwise reset_methods;		/* bitmap for device supported reset capabilities */
> +	/* bitmap for user enabled and device supported reset capabilities */
> +	u8 __bitwise reset_methods_enabled;
>  #ifdef CONFIG_PCIEAER
>  	u16		aer_cap;	/* AER capability offset */
>  	struct aer_stats *aer_stats;	/* AER stats for this device */
> --
> 2.30.2
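
A minimal user-space sketch of the show/store semantics in the quoted
patch, for readers who want to experiment with the bitmap logic outside
the kernel. Everything named model_* is hypothetical, and so are the
method names in the table (the real names live in pci_reset_fn_methods[]
in drivers/pci/pci.c, which is not shown in this message). strcmp()
stands in for sysfs_streq(), so a trailing newline is not tolerated here.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative names only; bit i of each bitmap refers to entry i. */
static const char *const method_names[] = {
	"device_specific", "flr", "pm", "slot", "bus",
};
#define N_METHODS (sizeof(method_names) / sizeof(method_names[0]))

struct model_dev {
	uint8_t supported;	/* stands in for pci_dev.reset_methods */
	uint8_t enabled;	/* stands in for pci_dev.reset_methods_enabled */
};

/* Like reset_method_show(): enabled methods are printed in brackets. */
size_t model_show(const struct model_dev *d, char *buf, size_t size)
{
	size_t len = 0;
	size_t i;

	if (size == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < N_METHODS; i++) {
		int n;

		if (d->enabled & (1u << i))
			n = snprintf(buf + len, size - len, "[%s] ", method_names[i]);
		else if (d->supported & (1u << i))
			n = snprintf(buf + len, size - len, "%s ", method_names[i]);
		else
			continue;

		if (n < 0 || (size_t)n >= size - len) {
			buf[len] = '\0';	/* drop the truncated entry */
			break;
		}
		len += (size_t)n;
	}
	return len;
}

/* Like reset_method_store(): 0 on success, -EINVAL for a bad name. */
int model_store(struct model_dev *d, const char *name)
{
	uint8_t mask = 0;
	size_t i;

	if (strcmp(name, "none") == 0) {
		d->enabled = 0;			/* reset disabled */
		return 0;
	}
	if (strcmp(name, "default") == 0) {
		d->enabled = d->supported;	/* back to the initial value */
		return 0;
	}

	for (i = 0; i < N_METHODS; i++) {
		if (strcmp(name, method_names[i]) == 0) {
			mask = (uint8_t)(1u << i);
			break;
		}
	}
	/* Unknown name, or a method this device does not support. */
	if (!mask || !(d->supported & mask))
		return -EINVAL;

	d->enabled = mask;
	return 0;
}
```

With supported = enabled = 0x06 in this model, storing "pm" leaves only
that method bracketed when shown, while storing a name outside the
supported set is rejected and leaves the enabled bitmap unchanged.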

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-14 23:55   ` Pali Rohár
@ 2021-03-15 13:43     ` Amey Narkhede
  2021-03-15 13:52       ` Pali Rohár
  2021-03-18 17:51     ` Enrico Weigelt, metux IT consult
  1 sibling, 1 reply; 90+ messages in thread
From: Amey Narkhede @ 2021-03-15 13:43 UTC (permalink / raw)
  To: Pali Rohár
  Cc: bhelgaas, alex.williamson, raphael.norwitz, linux-kernel, linux-pci

On 21/03/15 12:55AM, Pali Rohár wrote:
> On Friday 12 March 2021 23:04:52 ameynarkhede03@gmail.com wrote:
> > From: Amey Narkhede <ameynarkhede03@gmail.com>
> >
> > Add reset_methods_enabled bitmap to struct pci_dev to
> > keep track of user preferred device reset mechanisms.
> > Add reset_method sysfs attribute to query and set
> > user preferred device reset mechanisms.
> >
> > Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
> > ---
> > Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
> > Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> >
> >  Documentation/ABI/testing/sysfs-bus-pci | 15 ++++++
> >  drivers/pci/pci-sysfs.c                 | 66 +++++++++++++++++++++++--
> >  drivers/pci/pci.c                       |  3 +-
> >  include/linux/pci.h                     |  2 +
> >  4 files changed, 82 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> > index 25c9c3977..ae53ecd2e 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-pci
> > +++ b/Documentation/ABI/testing/sysfs-bus-pci
> > @@ -121,6 +121,21 @@ Description:
> >  		child buses, and re-discover devices removed earlier
> >  		from this part of the device tree.
> >
> > +What:		/sys/bus/pci/devices/.../reset_method
> > +Date:		March 2021
> > +Contact:	Amey Narkhede <ameynarkhede03@gmail.com>
> > +Description:
> > +		Some devices allow an individual function to be reset
> > +		without affecting other functions in the same slot.
> > +		For devices that have this support, a file named reset_method
> > +		will be present in sysfs. Reading this file will give names
> > +		of the device supported reset methods. Currently used methods
> > +		are enclosed in brackets. Writing the name of any of the device
> > +		supported reset method to this file will set the reset method to
> > +		be used when resetting the device. Writing "none" to this file
> > +		will disable ability to reset the device and writing "default"
> > +		will return to the original value.
> > +
>
> Hello Amey!
>
> I think that this API does not work for PCIe Hot Reset (=PCI secondary
> bus reset) and PCIe Warm Reset.
>
> First reset method is bound to the bus, not device and therefore kernel
> does not have to see any registered device. So there would be no
> "reset_method" sysfs file, and also no "reset" sysfs file. But PCIe Hot
> Reset is in most cases needed when buggy card is not registered on bus,
> to trigger this reset. And with this API this is not possible.
>
> PCIe Warm Reset is done by PERST# signal. When signal is asserted then
> device is in reset state and therefore is not registered. So again
> kernel does not have to see registered device.
>
> Moreover for mPCIe form factor cards, boards can share one PERST# signal
> with more PCIe cards and control this signal via GPIO. So asserting
> PERST# GPIO can trigger Warm reset for more PCIe cards, not just one. It
> depends on board or topology.
>
> So... I do not think that current approach with "reset_method" sysfs
> entry bound to the PCI device does not work for PCI secondary bus reset
> and also cannot be used for implementing PCIe Warm Reset.
>
> I would rather suggest to re-design and prepare a new API which would
> work also with PCIe Hot Reset and PCIe Warm Reset.
>
> This "reset" sysfs file can work only with PCI Function Level Reset or
> some PM or device specific reset. But not with reset types which are
> more like slot or bus orientated.
>
The scope of this patch was to expose the current reset methods
to userspace. Also, reset methods are available
only for those devices that allow an individual function to be reset
without affecting other functions in the same device.
So if those conditions are satisfied by the device then it can
use slot reset (pci_dev_reset_slot_function) and secondary bus
reset (pci_parent_bus_reset) which I think are hot reset and
warm reset respectively.

Thanks,
Amey
[...]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-15 13:43     ` Amey Narkhede
@ 2021-03-15 13:52       ` Pali Rohár
  2021-03-15 14:34         ` Alex Williamson
  0 siblings, 1 reply; 90+ messages in thread
From: Pali Rohár @ 2021-03-15 13:52 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: bhelgaas, alex.williamson, raphael.norwitz, linux-kernel, linux-pci

On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> slot reset (pci_dev_reset_slot_function) and secondary bus
> reset(pci_parent_bus_reset) which I think are hot reset and
> warm reset respectively.

No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
type of reset, which is currently implemented only for PCIe hot plug
bridges and for the PowerPC PowerNV platform, and it just calls PCI
secondary bus reset with some other hook. PCIe Warm Reset does not have
an API in the kernel and therefore drivers do not export this type of
reset via any kernel function (yet).

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-15 13:52       ` Pali Rohár
@ 2021-03-15 14:34         ` Alex Williamson
  2021-03-15 14:52           ` Pali Rohár
  2021-03-15 15:07           ` Leon Romanovsky
  0 siblings, 2 replies; 90+ messages in thread
From: Alex Williamson @ 2021-03-15 14:34 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Amey Narkhede, bhelgaas, raphael.norwitz, linux-kernel, linux-pci

On Mon, 15 Mar 2021 14:52:26 +0100
Pali Rohár <pali@kernel.org> wrote:

> On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > slot reset (pci_dev_reset_slot_function) and secondary bus
> > reset(pci_parent_bus_reset) which I think are hot reset and
> > warm reset respectively.  
> 
> No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> type of reset, which is currently implemented only for PCIe hot plug
> bridges and for PowerPC PowerNV platform and it just call PCI secondary
> bus reset with some other hook. PCIe Warm Reset does not have API in
> kernel and therefore drivers do not export this type of reset via any
> kernel function (yet).

Warm reset is beyond the scope of this series, but could be implemented
in a compatible way to fit within the pci_reset_fn_methods[] array
defined here.  Note that with this series the resets available through
pci_reset_function() and the per device reset attribute in sysfs remain
exactly the same as they are currently.  The bus and slot reset
methods used here are limited to devices where only a single function is
affected by the reset, therefore it is not like the patch you proposed
which performed a reset irrespective of the downstream devices.  This
series only enables selection of the existing methods.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-15 14:34         ` Alex Williamson
@ 2021-03-15 14:52           ` Pali Rohár
  2021-03-15 15:03             ` Alex Williamson
  2021-03-15 15:07           ` Leon Romanovsky
  1 sibling, 1 reply; 90+ messages in thread
From: Pali Rohár @ 2021-03-15 14:52 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Amey Narkhede, bhelgaas, raphael.norwitz, linux-kernel, linux-pci

On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> On Mon, 15 Mar 2021 14:52:26 +0100
> Pali Rohár <pali@kernel.org> wrote:
> 
> > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > warm reset respectively.  
> > 
> > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > type of reset, which is currently implemented only for PCIe hot plug
> > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > bus reset with some other hook. PCIe Warm Reset does not have API in
> > kernel and therefore drivers do not export this type of reset via any
> > kernel function (yet).
> 
> Warm reset is beyond the scope of this series, but could be implemented
> in a compatible way to fit within the pci_reset_fn_methods[] array
> defined here.

Ok!

> Note that with this series the resets available through
> pci_reset_function() and the per device reset attribute is sysfs remain
> exactly the same as they are currently.  The bus and slot reset
> methods used here are limited to devices where only a single function is
> affected by the reset, therefore it is not like the patch you proposed
> which performed a reset irrespective of the downstream devices.  This
> series only enables selection of the existing methods.  Thanks,
> 
> Alex
> 

But with this patch series, there is still an issue with the PCI
secondary bus reset mechanism, as the exported sysfs attribute does not
do that remove-reset-rescan procedure. As discussed in the other thread,
this reset leaves the device in an unconfigured / broken state.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-15 14:52           ` Pali Rohár
@ 2021-03-15 15:03             ` Alex Williamson
  2021-03-17 19:02               ` Pali Rohár
  0 siblings, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-15 15:03 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Amey Narkhede, bhelgaas, raphael.norwitz, linux-kernel, linux-pci

On Mon, 15 Mar 2021 15:52:38 +0100
Pali Rohár <pali@kernel.org> wrote:

> On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> > On Mon, 15 Mar 2021 14:52:26 +0100
> > Pali Rohár <pali@kernel.org> wrote:
> >   
> > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:  
> > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > warm reset respectively.    
> > > 
> > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > type of reset, which is currently implemented only for PCIe hot plug
> > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > kernel and therefore drivers do not export this type of reset via any
> > > kernel function (yet).  
> > 
> > Warm reset is beyond the scope of this series, but could be implemented
> > in a compatible way to fit within the pci_reset_fn_methods[] array
> > defined here.  
> 
> Ok!
> 
> > Note that with this series the resets available through
> > pci_reset_function() and the per device reset attribute is sysfs remain
> > exactly the same as they are currently.  The bus and slot reset
> > methods used here are limited to devices where only a single function is
> > affected by the reset, therefore it is not like the patch you proposed
> > which performed a reset irrespective of the downstream devices.  This
> > series only enables selection of the existing methods.  Thanks,
> > 
> > Alex
> >   
> 
> But with this patch series, there is still an issue with PCI secondary
> bus reset mechanism as exported sysfs attribute does not do that
> remove-reset-rescan procedure. As discussed in other thread, this reset
> let device in unconfigured / broken state.

No, there's not:

int pci_reset_function(struct pci_dev *dev)
{
        int rc;

        if (!dev->reset_fn)
                return -ENOTTY;

        pci_dev_lock(dev);
>>>     pci_dev_save_and_disable(dev);

        rc = __pci_reset_function_locked(dev);

>>>     pci_dev_restore(dev);
        pci_dev_unlock(dev);

        return rc;
}

The remove/re-scan was discussed primarily because your patch performed
a bus reset regardless of what devices were affected by that reset and
it's difficult to manage the scope where multiple devices are affected.
Here, the bus and slot reset functions will fail unless the scope is
limited to the single device triggering this reset.  Thanks,

Alex
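
The scope rule described above can be modeled in user space as a rough
sketch. The toy_* types are hypothetical, and the "only device on the
bus" check is an assumption about what pci_parent_bus_reset() does; that
function is not shown in this thread. The save and restore steps are
reduced to a flag so the ordering from pci_reset_function() stays
visible.

```c
#include <errno.h>
#include <stddef.h>

struct toy_dev;

struct toy_bus {
	struct toy_dev **devices;	/* every device on this bus */
	size_t ndevices;
};

struct toy_dev {
	struct toy_bus *bus;
	int state_saved;	/* nonzero while config state is saved */
	int was_reset;		/* nonzero once a reset actually happened */
};

/* Refuse the bus reset if any other device shares the bus. */
int toy_parent_bus_reset(struct toy_dev *dev)
{
	size_t i;

	for (i = 0; i < dev->bus->ndevices; i++)
		if (dev->bus->devices[i] != dev)
			return -ENOTTY;

	dev->was_reset = 1;
	return 0;
}

/* Same ordering as pci_reset_function(): save, reset, restore. */
int toy_reset_function(struct toy_dev *dev)
{
	int rc;

	dev->state_saved = 1;	/* stands in for pci_dev_save_and_disable() */
	rc = toy_parent_bus_reset(dev);
	dev->state_saved = 0;	/* stands in for pci_dev_restore() */

	return rc;
}
```

In this model a device that is alone on its bus is reset and restored,
while a device that shares its bus gets -ENOTTY and is left untouched,
which is the property that limits the reset to a single function.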


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-15 14:34         ` Alex Williamson
  2021-03-15 14:52           ` Pali Rohár
@ 2021-03-15 15:07           ` Leon Romanovsky
  2021-03-15 15:33             ` Amey Narkhede
  1 sibling, 1 reply; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-15 15:07 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Pali Rohár, Amey Narkhede, bhelgaas, raphael.norwitz,
	linux-kernel, linux-pci

On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> On Mon, 15 Mar 2021 14:52:26 +0100
> Pali Rohár <pali@kernel.org> wrote:
>
> > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > warm reset respectively.
> >
> > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > type of reset, which is currently implemented only for PCIe hot plug
> > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > bus reset with some other hook. PCIe Warm Reset does not have API in
> > kernel and therefore drivers do not export this type of reset via any
> > kernel function (yet).
>
> Warm reset is beyond the scope of this series, but could be implemented
> in a compatible way to fit within the pci_reset_fn_methods[] array
> defined here.  Note that with this series the resets available through
> pci_reset_function() and the per device reset attribute is sysfs remain
> exactly the same as they are currently.  The bus and slot reset
> methods used here are limited to devices where only a single function is
> affected by the reset, therefore it is not like the patch you proposed
> which performed a reset irrespective of the downstream devices.  This
> series only enables selection of the existing methods.  Thanks,

Alex,

I asked the patch author here [1], but didn't get any response, maybe
you can answer me. What is the use case scenario for this functionality?

Thanks

[1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal

>
> Alex
>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-15 15:07           ` Leon Romanovsky
@ 2021-03-15 15:33             ` Amey Narkhede
  2021-03-15 16:29               ` Alex Williamson
  0 siblings, 1 reply; 90+ messages in thread
From: Amey Narkhede @ 2021-03-15 15:33 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: alex.williamson, linux-pci, bhelgaas, raphael.norwitz, linux-kernel

On 21/03/15 05:07PM, Leon Romanovsky wrote:
> On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > On Mon, 15 Mar 2021 14:52:26 +0100
> > Pali Rohár <pali@kernel.org> wrote:
> >
> > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > warm reset respectively.
> > >
> > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > type of reset, which is currently implemented only for PCIe hot plug
> > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > kernel and therefore drivers do not export this type of reset via any
> > > kernel function (yet).
> >
> > Warm reset is beyond the scope of this series, but could be implemented
> > in a compatible way to fit within the pci_reset_fn_methods[] array
> > defined here.  Note that with this series the resets available through
> > pci_reset_function() and the per device reset attribute is sysfs remain
> > exactly the same as they are currently.  The bus and slot reset
> > methods used here are limited to devices where only a single function is
> > affected by the reset, therefore it is not like the patch you proposed
> > which performed a reset irrespective of the downstream devices.  This
> > series only enables selection of the existing methods.  Thanks,
>
> Alex,
>
> I asked the patch author here [1], but didn't get any response, maybe
> you can answer me. What is the use case scenario for this functionality?
>
> Thanks
>
> [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal
>
Sorry for not responding immediately. There were some buggy wifi cards
which needed FLR explicitly; I am not sure if that behavior is fixed in
drivers. Also there is a use case at Nutanix, but the engineer who
is involved is on PTO, which is why I did not respond immediately, as
I don't know the details yet.

Thanks,
Amey

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-15 15:33             ` Amey Narkhede
@ 2021-03-15 16:29               ` Alex Williamson
  2021-03-15 18:32                 ` Raphael Norwitz
  0 siblings, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-15 16:29 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: Leon Romanovsky, linux-pci, bhelgaas, raphael.norwitz, linux-kernel

On Mon, 15 Mar 2021 21:03:41 +0530
Amey Narkhede <ameynarkhede03@gmail.com> wrote:

> On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:  
> > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > Pali Rohár <pali@kernel.org> wrote:
> > >  
> > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:  
> > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > warm reset respectively.  
> > > >
> > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > kernel and therefore drivers do not export this type of reset via any
> > > > kernel function (yet).  
> > >
> > > Warm reset is beyond the scope of this series, but could be implemented
> > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > defined here.  Note that with this series the resets available through
> > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > exactly the same as they are currently.  The bus and slot reset
> > > methods used here are limited to devices where only a single function is
> > > affected by the reset, therefore it is not like the patch you proposed
> > > which performed a reset irrespective of the downstream devices.  This
> > > series only enables selection of the existing methods.  Thanks,  
> >
> > Alex,
> >
> > I asked the patch author here [1], but didn't get any response, maybe
> > you can answer me. What is the use case scenario for this functionality?
> >
> > Thanks
> >
> > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal
> >  
> Sorry for not responding immediately. There were some buggy wifi cards
> which needed FLR explicitly not sure if that behavior is fixed in
> drivers. Also there is use a case at Nutanix but the engineer who
> is involved is on PTO that is why I did not respond immediately as
> I don't know the details yet.

And more generally, devices continue to have reset issues and we
impose a fixed priority in our ordering.  We can and probably should
continue to quirk devices when we find broken resets so that we have
the best default behavior, but it's currently not easy for an end user
to experiment, i.e. this reset works, that one doesn't.  We might also
have platform issues where a given reset works better on a certain
platform.  Exposing a way to test these things might lead to better
quirks.  In the case I think Pali was looking for, they wanted a
mechanism to force a bus reset.  If this was in reference to a single
function device, this could be accomplished by setting a priority for
that mechanism, which would translate to not only the sysfs reset
attribute, but also the reset mechanism used by vfio-pci.  Thanks,

Alex
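
The fixed-priority ordering described above can be modelled with a
short sketch. This is an illustrative Python model, not kernel code: the
method names ("device_specific", "flr", "pm", "bus") and the helper
reset_with_priority() are assumptions made for the example. The only
behaviour borrowed from the kernel is the convention that a method
returning -ENOTTY is treated as "not supported" and the next method in
the ordering is tried.

```python
import errno
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical model: each reset method returns 0 on success, -ENOTTY if
# the device does not support it, or another negative errno on failure.
ResetFn = Callable[[], int]

# Illustrative default ordering; the names are placeholders for this sketch.
DEFAULT_ORDER = ["device_specific", "flr", "pm", "bus"]


def reset_with_priority(
    methods: Dict[str, ResetFn], order: Iterable[str]
) -> Optional[Tuple[str, int]]:
    """Try reset methods in `order`; return (name, rc) of the first
    method that did not report "unsupported", or None if none applied."""
    for name in order:
        fn = methods.get(name)
        if fn is None:
            continue  # method not present for this device
        rc = fn()
        if rc != -errno.ENOTTY:
            return name, rc  # first supported method decides the outcome
    return None


if __name__ == "__main__":
    # A device that lacks FLR but supports PM and bus reset.
    device = {
        "flr": lambda: -errno.ENOTTY,
        "pm": lambda: 0,
        "bus": lambda: 0,
    }
    print(reset_with_priority(device, DEFAULT_ORDER))
    # A user-chosen ordering that prefers the bus reset.
    print(reset_with_priority(device, ["bus", "pm"]))
```

With a user-supplied ordering, the same loop picks the bus reset first,
which is the kind of experiment ("this reset works, that one doesn't")
the paragraph above refers to.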


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-15 16:29               ` Alex Williamson
@ 2021-03-15 18:32                 ` Raphael Norwitz
  2021-03-17  4:20                   ` Leon Romanovsky
  0 siblings, 1 reply; 90+ messages in thread
From: Raphael Norwitz @ 2021-03-15 18:32 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Amey Narkhede, Leon Romanovsky, linux-pci, bhelgaas,
	Raphael Norwitz, linux-kernel, Alay Shah, Suresh Gumpula,
	Shyam Rajendran, Felipe Franciosi

On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> On Mon, 15 Mar 2021 21:03:41 +0530
> Amey Narkhede <ameynarkhede03@gmail.com> wrote:
> 
> > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:  
> > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > Pali Rohár <pali@kernel.org> wrote:
> > > >  
> > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:  
> > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > warm reset respectively.  
> > > > >
> > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > kernel function (yet).  
> > > >
> > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > defined here.  Note that with this series the resets available through
> > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > exactly the same as they are currently.  The bus and slot reset
> > > > methods used here are limited to devices where only a single function is
> > > > affected by the reset, therefore it is not like the patch you proposed
> > > > which performed a reset irrespective of the downstream devices.  This
> > > > series only enables selection of the existing methods.  Thanks,  
> > >
> > > Alex,
> > >
> > > I asked the patch author here [1], but didn't get any response, maybe
> > > you can answer me. What is the use case scenario for this functionality?
> > >
> > > Thanks
> > >
> > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/ 
> > >  
> > Sorry for not responding immediately. There were some buggy wifi cards
> > which needed FLR explicitly not sure if that behavior is fixed in
> > drivers. Also there is use a case at Nutanix but the engineer who
> > is involved is on PTO that is why I did not respond immediately as
> > I don't know the details yet.
> 
> And more generally, devices continue to have reset issues and we
> impose a fixed priority in our ordering.  We can and probably should
> continue to quirk devices when we find broken resets so that we have
> the best default behavior, but it's currently not easy for an end user
> to experiment, ie. this reset works, that one doesn't.  We might also
> have platform issues where a given reset works better on a certain
> platform.  Exposing a way to test these things might lead to better
> quirks.  In the case I think Pali was looking for, they wanted a
> mechanism to force a bus reset, if this was in reference to a single
> function device, this could be accomplished by setting a priority for
> that mechanism, which would translate to not only the sysfs reset
> attribute, but also the reset mechanism used by vfio-pci.  Thanks,
> 
> Alex
>

To confirm from our end - we have seen many such instances where default
reset methods have not worked well on our platform. Debugging these
issues is painful in practice, and this interface would make it far
easier.

Having an interface like this would also help us better communicate the
issues we find with upstream. Allowing others to more easily test our
(or other entities') findings should give better visibility into
which issues apply to the device in general and which are platform
specific. In disambiguating the former from the latter, we should be
able to better quirk devices for everyone, and in the latter cases, this
interface allows for a safer and more elegant solution than any of the
current alternatives.

CC Alay, Suresh, Shyam and Felipe in case they have anything to add.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-15 18:32                 ` Raphael Norwitz
@ 2021-03-17  4:20                   ` Leon Romanovsky
  2021-03-17 10:24                     ` Amey Narkhede
  0 siblings, 1 reply; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-17  4:20 UTC (permalink / raw)
  To: Raphael Norwitz
  Cc: Alex Williamson, Amey Narkhede, linux-pci, bhelgaas,
	linux-kernel, Alay Shah, Suresh Gumpula, Shyam Rajendran,
	Felipe Franciosi

On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > On Mon, 15 Mar 2021 21:03:41 +0530
> > Amey Narkhede <ameynarkhede03@gmail.com> wrote:
> >
> > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > >
> > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > warm reset respectively.
> > > > > >
> > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > kernel function (yet).
> > > > >
> > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > defined here.  Note that with this series the resets available through
> > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > methods used here are limited to devices where only a single function is
> > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > series only enables selection of the existing methods.  Thanks,
> > > >
> > > > Alex,
> > > >
> > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > you can answer me. What is the use case scenario for this functionality?
> > > >
> > > > Thanks
> > > >
> > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > >
> > > Sorry for not responding immediately. There were some buggy wifi cards
> > > which needed FLR explicitly not sure if that behavior is fixed in
> > > drivers. Also there is use a case at Nutanix but the engineer who
> > > is involved is on PTO that is why I did not respond immediately as
> > > I don't know the details yet.
> >
> > And more generally, devices continue to have reset issues and we
> > impose a fixed priority in our ordering.  We can and probably should
> > continue to quirk devices when we find broken resets so that we have
> > the best default behavior, but it's currently not easy for an end user
> > to experiment, ie. this reset works, that one doesn't.  We might also
> > have platform issues where a given reset works better on a certain
> > platform.  Exposing a way to test these things might lead to better
> > quirks.  In the case I think Pali was looking for, they wanted a
> > mechanism to force a bus reset, if this was in reference to a single
> > function device, this could be accomplished by setting a priority for
> > that mechanism, which would translate to not only the sysfs reset
> > attribute, but also the reset mechanism used by vfio-pci.  Thanks,
> >
> > Alex
> >
>
> To confirm from our end - we have seen many such instances where default
> reset methods have not worked well on our platform. Debugging these
> issues is painful in practice, and this interface would make it far
> easier.
>
> Having an interface like this would also help us better communicate the
> issues we find with upstream. Allowing others to more easily test our
> (or other entities') findings should give better visibility into
> which issues apply to the device in general and which are platform
> specific. In disambiguating the former from the latter, we should be
> able to better quirk devices for everyone, and in the latter cases, this
> interface allows for a safer and more elegant solution than any of the
> current alternatives.

So to summarize, we are talking about a test and debug interface to
overcome HW bugs, am I right?

My personal experience shows that once an easy workaround exists
(and writing to a generally available sysfs file is very simple), the
vendors' and users' desire for a proper fix decreases drastically. IMHO,
we will see an increase in copy/paste in SO and blog posts, but a
reduction in quirks.

My 2-cents.

>
> CC Alay, Suresh, Shyam and Felipe in case they have anything to add.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-17  4:20                   ` Leon Romanovsky
@ 2021-03-17 10:24                     ` Amey Narkhede
  2021-03-17 11:02                       ` Leon Romanovsky
  0 siblings, 1 reply; 90+ messages in thread
From: Amey Narkhede @ 2021-03-17 10:24 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: alex.williamson, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On 21/03/17 06:20AM, Leon Romanovsky wrote:
> On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > Amey Narkhede <ameynarkhede03@gmail.com> wrote:
> > >
> > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > >
> > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > warm reset respectively.
> > > > > > >
> > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > kernel function (yet).
> > > > > >
> > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > defined here.  Note that with this series the resets available through
> > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > methods used here are limited to devices where only a single function is
> > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > series only enables selection of the existing methods.  Thanks,
> > > > >
> > > > > Alex,
> > > > >
> > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > you can answer me. What is the use case scenario for this functionality?
> > > > >
> > > > > Thanks
> > > > >
> > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > >
> > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > which needed FLR explicitly not sure if that behavior is fixed in
> > > > drivers. Also there is use a case at Nutanix but the engineer who
> > > > is involved is on PTO that is why I did not respond immediately as
> > > > I don't know the details yet.
> > >
> > > And more generally, devices continue to have reset issues and we
> > > impose a fixed priority in our ordering.  We can and probably should
> > > continue to quirk devices when we find broken resets so that we have
> > > the best default behavior, but it's currently not easy for an end user
> > > to experiment, ie. this reset works, that one doesn't.  We might also
> > > have platform issues where a given reset works better on a certain
> > > platform.  Exposing a way to test these things might lead to better
> > > quirks.  In the case I think Pali was looking for, they wanted a
> > > mechanism to force a bus reset, if this was in reference to a single
> > > function device, this could be accomplished by setting a priority for
> > > that mechanism, which would translate to not only the sysfs reset
> > > attribute, but also the reset mechanism used by vfio-pci.  Thanks,
> > >
> > > Alex
> > >
> >
> > To confirm from our end - we have seen many such instances where default
> > reset methods have not worked well on our platform. Debugging these
> > issues is painful in practice, and this interface would make it far
> > easier.
> >
> > Having an interface like this would also help us better communicate the
> > issues we find with upstream. Allowing others to more easily test our
> > (or other entities') findings should give better visibility into
> > which issues apply to the device in general and which are platform
> > specific. In disambiguating the former from the latter, we should be
> > able to better quirk devices for everyone, and in the latter cases, this
> > interface allows for a safer and more elegant solution than any of the
> > current alternatives.
>
> So to summarize, we are talking about test and debug interface to
> overcome HW bugs, am I right?
>
> My personal experience shows that once the easy workaround exists
> (and write to generally available sysfs is very simple), the vendors
> and users desire for proper fix decreases drastically. IMHO, we will
> see increase of copy/paste in SO and blog posts, but reduce in quirks.
>
> My 2-cents.
>
I agree with your point, but at least it gives userspace the ability
to use a broken device until the bug is fixed upstream.
This also applies to obscure devices without upstream drivers, for
example custom FPGA-based devices.
Another main application, which I forgot to mention, is virtualization,
where the VMM wants to reset the device when the guest is reset
to emulate a machine reboot as closely as possible.

Thanks,
Amey

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-17 10:24                     ` Amey Narkhede
@ 2021-03-17 11:02                       ` Leon Romanovsky
  2021-03-17 11:23                         ` Amey Narkhede
  0 siblings, 1 reply; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-17 11:02 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: alex.williamson, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > Amey Narkhede <ameynarkhede03@gmail.com> wrote:
> > > >
> > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > >
> > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > warm reset respectively.
> > > > > > > >
> > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > kernel function (yet).
> > > > > > >
> > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > defined here.  Note that with this series the resets available through
> > > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > >
> > > > > > Alex,
> > > > > >
> > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > >
> > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > which needed FLR explicitly not sure if that behavior is fixed in
> > > > > drivers. Also there is use a case at Nutanix but the engineer who
> > > > > is involved is on PTO that is why I did not respond immediately as
> > > > > I don't know the details yet.
> > > >
> > > > And more generally, devices continue to have reset issues and we
> > > > impose a fixed priority in our ordering.  We can and probably should
> > > > continue to quirk devices when we find broken resets so that we have
> > > > the best default behavior, but it's currently not easy for an end user
> > > > to experiment, ie. this reset works, that one doesn't.  We might also
> > > > have platform issues where a given reset works better on a certain
> > > > platform.  Exposing a way to test these things might lead to better
> > > > quirks.  In the case I think Pali was looking for, they wanted a
> > > > mechanism to force a bus reset, if this was in reference to a single
> > > > function device, this could be accomplished by setting a priority for
> > > > that mechanism, which would translate to not only the sysfs reset
> > > > attribute, but also the reset mechanism used by vfio-pci.  Thanks,
> > > >
> > > > Alex
> > > >
> > >
> > > To confirm from our end - we have seen many such instances where default
> > > reset methods have not worked well on our platform. Debugging these
> > > issues is painful in practice, and this interface would make it far
> > > easier.
> > >
> > > Having an interface like this would also help us better communicate the
> > > issues we find with upstream. Allowing others to more easily test our
> > > (or other entities') findings should give better visibility into
> > > which issues apply to the device in general and which are platform
> > > specific. In disambiguating the former from the latter, we should be
> > > able to better quirk devices for everyone, and in the latter cases, this
> > > interface allows for a safer and more elegant solution than any of the
> > > current alternatives.
> >
> > So to summarize, we are talking about test and debug interface to
> > overcome HW bugs, am I right?
> >
> > My personal experience shows that once the easy workaround exists
> > (and write to generally available sysfs is very simple), the vendors
> > and users desire for proper fix decreases drastically. IMHO, we will
> > see increase of copy/paste in SO and blog posts, but reduce in quirks.
> >
> > My 2-cents.
> >
> I agree with your point but at least it gives the userspace ability
> to use broken device until bug is fixed in upstream.

As I said, I don't expect many fixes once "userspace" is able to
use a cheap workaround. There is no incentive to fix it.

> This is also applicable for obscure devices without upstream
> drivers for example custom FPGA based devices.

This is not relevant to the upstream kernel. Those vendors ship
everything custom, they don't need upstream, we don't need them :)

> Another main application which I forgot to mention is virtualization
> where vmm wants to reset the device when the guest is reset,
> to emulate machine reboot as closely as possible.

It can work only in a very narrow case, because the reset will cause the
device to reprobe, and most likely the driver will be different from the
one that started the reset. I can imagine that net devices will lose
their state and config after such a reset too.

IMHO, it will be saner for everyone if virtualization doesn't try such resets.

Thanks

>
> Thanks,
> Amey

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-17 11:02                       ` Leon Romanovsky
@ 2021-03-17 11:23                         ` Amey Narkhede
  2021-03-17 11:47                           ` Leon Romanovsky
  0 siblings, 1 reply; 90+ messages in thread
From: Amey Narkhede @ 2021-03-17 11:23 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: alex.williamson, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On 21/03/17 01:02PM, Leon Romanovsky wrote:
> On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > Amey Narkhede <ameynarkhede03@gmail.com> wrote:
> > > > >
> > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > >
> > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > warm reset respectively.
> > > > > > > > >
> > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > kernel function (yet).
> > > > > > > >
> > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > defined here.  Note that with this series the resets available through
> > > > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > > >
> > > > > > > Alex,
> > > > > > >
> > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > >
> > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > which needed FLR explicitly not sure if that behavior is fixed in
> > > > > > drivers. Also there is use a case at Nutanix but the engineer who
> > > > > > is involved is on PTO that is why I did not respond immediately as
> > > > > > I don't know the details yet.
> > > > >
> > > > > And more generally, devices continue to have reset issues and we
> > > > > impose a fixed priority in our ordering.  We can and probably should
> > > > > continue to quirk devices when we find broken resets so that we have
> > > > > the best default behavior, but it's currently not easy for an end user
> > > > > to experiment, ie. this reset works, that one doesn't.  We might also
> > > > > have platform issues where a given reset works better on a certain
> > > > > platform.  Exposing a way to test these things might lead to better
> > > > > quirks.  In the case I think Pali was looking for, they wanted a
> > > > > mechanism to force a bus reset, if this was in reference to a single
> > > > > function device, this could be accomplished by setting a priority for
> > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > attribute, but also the reset mechanism used by vfio-pci.  Thanks,
> > > > >
> > > > > Alex
> > > > >
> > > >
> > > > To confirm from our end - we have seen many such instances where default
> > > > reset methods have not worked well on our platform. Debugging these
> > > > issues is painful in practice, and this interface would make it far
> > > > easier.
> > > >
> > > > Having an interface like this would also help us better communicate the
> > > > issues we find with upstream. Allowing others to more easily test our
> > > > (or other entities') findings should give better visibility into
> > > > which issues apply to the device in general and which are platform
> > > > specific. In disambiguating the former from the latter, we should be
> > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > interface allows for a safer and more elegant solution than any of the
> > > > current alternatives.
> > >
> > > So to summarize, we are talking about test and debug interface to
> > > overcome HW bugs, am I right?
> > >
> > > My personal experience shows that once the easy workaround exists
> > > (and write to generally available sysfs is very simple), the vendors
> > > and users desire for proper fix decreases drastically. IMHO, we will
> > > see increase of copy/paste in SO and blog posts, but reduce in quirks.
> > >
> > > My 2-cents.
> > >
> > I agree with your point but at least it gives the userspace ability
> > to use broken device until bug is fixed in upstream.
>
> As I said, I don't expect many fixes once "userspace" will be able to
> use cheap workaround. There is no incentive to fix it.
>
> > This is also applicable for obscure devices without upstream
> > drivers for example custom FPGA based devices.
>
> This is not relevant to upstream kernel. Those vendors ship everything
> custom, they don't need upstream, we don't need them :)
>
By custom I meant hobbyists who could tinker with their custom FPGA.

> > Another main application which I forgot to mention is virtualization
> > where vmm wants to reset the device when the guest is reset,
> > to emulate machine reboot as closely as possible.
>
> It can work in very narrow case, because reset will cause to device
> reprobe and most likely the driver will be different from the one that
> started reset. I can imagine that net devices will lose their state and
> config after such reset too.
>
Not sure if I got that 100% right. The pci_reset_function() function
saves and restores device state over the reset.

> IMHO, it will be saner for everyone if virtualization don't try such resets.
>
> Thanks
>
The existing reset sysfs attribute was added for exactly this case
though.

Thanks,
Amey
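
For reference, the existing per-device reset attribute mentioned above
is driven by writing "1" to the device's "reset" file under
/sys/bus/pci/devices/. The following is a minimal userspace sketch in
Python, not part of the patch series: the helper name
trigger_function_reset(), the sysfs_root parameter (added so the helper
can be exercised against a scratch directory), and the address
0000:03:00.0 are assumptions for illustration. On a real system the
write needs root privileges, and the file is only present when the
kernel found a usable reset method for the device.

```python
from pathlib import Path


def trigger_function_reset(
    bdf: str, sysfs_root: str = "/sys/bus/pci/devices"
) -> Path:
    """Write "1" to <sysfs_root>/<bdf>/reset and return the path used.

    `bdf` is the domain:bus:device.function address, e.g. "0000:03:00.0"
    (a placeholder address, not taken from the thread).
    """
    reset_attr = Path(sysfs_root) / bdf / "reset"
    if not reset_attr.exists():
        # No reset attribute: the device is absent or has no supported
        # function-level reset method.
        raise FileNotFoundError(f"no reset attribute for {bdf}")
    reset_attr.write_text("1")
    return reset_attr


if __name__ == "__main__":
    try:
        print(trigger_function_reset("0000:03:00.0"))
    except (FileNotFoundError, PermissionError) as exc:
        print(f"reset not performed: {exc}")
```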

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-17 11:23                         ` Amey Narkhede
@ 2021-03-17 11:47                           ` Leon Romanovsky
  2021-03-17 13:17                             ` Amey Narkhede
  0 siblings, 1 reply; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-17 11:47 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: alex.williamson, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > > Amey Narkhede <ameynarkhede03@gmail.com> wrote:
> > > > > >
> > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > warm reset respectively.
> > > > > > > > > >
> > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > kernel function (yet).
> > > > > > > > >
> > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > defined here.  Note that with this series the resets available through
> > > > > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > > > >
> > > > > > > > Alex,
> > > > > > > >
> > > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > > >
> > > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > > which needed FLR explicitly not sure if that behavior is fixed in
> > > > > > > drivers. Also there is use a case at Nutanix but the engineer who
> > > > > > > is involved is on PTO that is why I did not respond immediately as
> > > > > > > I don't know the details yet.
> > > > > >
> > > > > > And more generally, devices continue to have reset issues and we
> > > > > > impose a fixed priority in our ordering.  We can and probably should
> > > > > > continue to quirk devices when we find broken resets so that we have
> > > > > > the best default behavior, but it's currently not easy for an end user
> > > > > > to experiment, ie. this reset works, that one doesn't.  We might also
> > > > > > have platform issues where a given reset works better on a certain
> > > > > > platform.  Exposing a way to test these things might lead to better
> > > > > > quirks.  In the case I think Pali was looking for, they wanted a
> > > > > > mechanism to force a bus reset, if this was in reference to a single
> > > > > > function device, this could be accomplished by setting a priority for
> > > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > > attribute, but also the reset mechanism used by vfio-pci.  Thanks,
> > > > > >
> > > > > > Alex
> > > > > >
> > > > >
> > > > > To confirm from our end - we have seen many such instances where default
> > > > > reset methods have not worked well on our platform. Debugging these
> > > > > issues is painful in practice, and this interface would make it far
> > > > > easier.
> > > > >
> > > > > Having an interface like this would also help us better communicate the
> > > > > issues we find with upstream. Allowing others to more easily test our
> > > > > (or other entities') findings should give better visibility into
> > > > > which issues apply to the device in general and which are platform
> > > > > specific. In disambiguating the former from the latter, we should be
> > > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > > interface allows for a safer and more elegant solution than any of the
> > > > > current alternatives.
> > > >
> > > > So to summarize, we are talking about test and debug interface to
> > > > overcome HW bugs, am I right?
> > > >
> > > > My personal experience shows that once the easy workaround exists
> > > > (and write to generally available sysfs is very simple), the vendors
> > > > and users desire for proper fix decreases drastically. IMHO, we will
> > > > see increase of copy/paste in SO and blog posts, but reduce in quirks.
> > > >
> > > > My 2-cents.
> > > >
> > > I agree with your point but at least it gives the userspace ability
> > > to use broken device until bug is fixed in upstream.
> >
> > As I said, I don't expect many fixes once "userspace" will be able to
> > use cheap workaround. There is no incentive to fix it.
> >
> > > This is also applicable for obscure devices without upstream
> > > drivers for example custom FPGA based devices.
> >
> > This is not relevant to upstream kernel. Those vendors ship everything
> > custom, they don't need upstream, we don't need them :)
> >
> By custom I meant hobbyists who could tinker with their custom FPGA.

I invite such hobbyists to send patches and include their FPGA in
the upstream kernel.

>
> > > Another main application which I forgot to mention is virtualization
> > > where vmm wants to reset the device when the guest is reset,
> > > to emulate machine reboot as closely as possible.
> >
> > It can work in very narrow case, because reset will cause to device
> > reprobe and most likely the driver will be different from the one that
> > started reset. I can imagine that net devices will lose their state and
> > config after such reset too.
> >
> Not sure if I got that 100% right. The pci_reset_function() function
> saves and restores device state over the reset.

I'm talking about netdev state, but never mind, given that the
sysfs reset knob already exists.

>
> > IMHO, it will be saner for everyone if virtualization don't try such resets.
> >
> > Thanks
> >
> The exists reset sysfs attribute was added for exactly this case
> though.

I didn't know the rationale behind that file until you mentioned it and
I googled the libvirt discussion, so OK. Do you propose that libvirt
manage a database of devices and their working reset types?

I'm not against this patch, I just want to point out that the
outcome of this patch will be a decrease in fixes for broken devices.

Thanks

>
> Thanks,
> Amey


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-17 11:47                           ` Leon Romanovsky
@ 2021-03-17 13:17                             ` Amey Narkhede
  2021-03-17 13:58                               ` Leon Romanovsky
  0 siblings, 1 reply; 90+ messages in thread
From: Amey Narkhede @ 2021-03-17 13:17 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: alex.williamson, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On 21/03/17 01:47PM, Leon Romanovsky wrote:
> On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> > On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > > On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > > > Amey Narkhede <ameynarkhede03@gmail.com> wrote:
> > > > > > >
> > > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > >
> > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > kernel function (yet).
> > > > > > > > > >
> > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > defined here.  Note that with this series the resets available through
> > > > > > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > > > > >
> > > > > > > > > Alex,
> > > > > > > > >
> > > > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > > > >
> > > > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > > > which needed FLR explicitly not sure if that behavior is fixed in
> > > > > > > > drivers. Also there is use a case at Nutanix but the engineer who
> > > > > > > > is involved is on PTO that is why I did not respond immediately as
> > > > > > > > I don't know the details yet.
> > > > > > >
> > > > > > > And more generally, devices continue to have reset issues and we
> > > > > > > impose a fixed priority in our ordering.  We can and probably should
> > > > > > > continue to quirk devices when we find broken resets so that we have
> > > > > > > the best default behavior, but it's currently not easy for an end user
> > > > > > > to experiment, ie. this reset works, that one doesn't.  We might also
> > > > > > > have platform issues where a given reset works better on a certain
> > > > > > > platform.  Exposing a way to test these things might lead to better
> > > > > > > quirks.  In the case I think Pali was looking for, they wanted a
> > > > > > > mechanism to force a bus reset, if this was in reference to a single
> > > > > > > function device, this could be accomplished by setting a priority for
> > > > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > > > attribute, but also the reset mechanism used by vfio-pci.  Thanks,
> > > > > > >
> > > > > > > Alex
> > > > > > >
> > > > > >
> > > > > > To confirm from our end - we have seen many such instances where default
> > > > > > reset methods have not worked well on our platform. Debugging these
> > > > > > issues is painful in practice, and this interface would make it far
> > > > > > easier.
> > > > > >
> > > > > > Having an interface like this would also help us better communicate the
> > > > > > issues we find with upstream. Allowing others to more easily test our
> > > > > > (or other entities') findings should give better visibility into
> > > > > > which issues apply to the device in general and which are platform
> > > > > > specific. In disambiguating the former from the latter, we should be
> > > > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > > > interface allows for a safer and more elegant solution than any of the
> > > > > > current alternatives.
> > > > >
> > > > > So to summarize, we are talking about test and debug interface to
> > > > > overcome HW bugs, am I right?
> > > > >
> > > > > My personal experience shows that once the easy workaround exists
> > > > > (and write to generally available sysfs is very simple), the vendors
> > > > > and users desire for proper fix decreases drastically. IMHO, we will
> > > > > see increase of copy/paste in SO and blog posts, but reduce in quirks.
> > > > >
> > > > > My 2-cents.
> > > > >
> > > > I agree with your point but at least it gives the userspace ability
> > > > to use broken device until bug is fixed in upstream.
> > >
> > > As I said, I don't expect many fixes once "userspace" will be able to
> > > use cheap workaround. There is no incentive to fix it.
> > >
> > > > This is also applicable for obscure devices without upstream
> > > > drivers for example custom FPGA based devices.
> > >
> > > This is not relevant to upstream kernel. Those vendors ship everything
> > > custom, they don't need upstream, we don't need them :)
> > >
> > By custom I meant hobbyists who could tinker with their custom FPGA.
>
> I invite such hobbyists to send patches and include their FPGA in
> upstream kernel.
>
> >
> > > > Another main application which I forgot to mention is virtualization
> > > > where vmm wants to reset the device when the guest is reset,
> > > > to emulate machine reboot as closely as possible.
> > >
> > > It can work in very narrow case, because reset will cause to device
> > > reprobe and most likely the driver will be different from the one that
> > > started reset. I can imagine that net devices will lose their state and
> > > config after such reset too.
> > >
> > Not sure if I got that 100% right. The pci_reset_function() function
> > saves and restores device state over the reset.
>
> I'm talking about netdev state, but whatever given the existence of
> sysfs reset knob.
>
> >
> > > IMHO, it will be saner for everyone if virtualization don't try such resets.
> > >
> > > Thanks
> > >
> > The exists reset sysfs attribute was added for exactly this case
> > though.
>
> I didn't know the rationale behind that file till you said and I
> googled libvirt discussion, so ok. Do you propose that libvirt
> will manage database of devices and their working reset types?
>
I don't know much about the internals of libvirt, but why would
it need to manage a database of working reset types? It could just
read the new reset_methods attribute to get the list of supported
reset methods.
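
A rough userspace sketch of that idea, for illustration only. It assumes
the attribute lives at /sys/bus/pci/devices/<BDF>/reset_methods and that
it lists method names separated by commas or whitespace; both the path
and the format are assumptions here, not something this thread confirms.
The helper names are hypothetical.

```python
import re
from pathlib import Path


def parse_reset_methods(text):
    """Split attribute contents into an ordered list of method names.

    Assumes names are separated by commas and/or whitespace.
    """
    return [name for name in re.split(r"[,\s]+", text.strip()) if name]


def read_reset_methods(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the methods listed for one device, or [] if unreadable.

    The attribute name "reset_methods" is taken from the discussion;
    the exact sysfs layout is an assumption.
    """
    path = Path(sysfs_root) / bdf / "reset_methods"
    try:
        return parse_reset_methods(path.read_text())
    except OSError:
        # Missing attribute, missing device, or no permission.
        return []
```

A management tool could call read_reset_methods("0000:01:00.0") and show
the result to the administrator without keeping any per-device table.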
> I'm not against this patch, just want to raise an attention that the
> outcome of this patch will be decrease in fixes of broken devices.
>
> Thanks
>
That makes sense, but that isn't any different from the existing reset
attribute. This patch enhances it and allows selecting a device-supported
reset method instead of using the first available reset method according
to the existing hardcoded policy.
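
The difference can be modelled with a toy sketch. This is not kernel
code: the method names and the default order below are placeholders
chosen for illustration, and pick_reset_method() is a hypothetical
helper. It only shows "first supported method in a fixed order" versus
the same selection with a caller-supplied order.

```python
# Illustrative order only; these names are placeholders, not the kernel's.
DEFAULT_ORDER = ["device_specific", "flr", "pm", "bus"]


def pick_reset_method(supported, order=DEFAULT_ORDER):
    """Return the first method in `order` that the device supports.

    `supported` is a set of method names; returns None if nothing matches.
    """
    for name in order:
        if name in supported:
            return name
    return None
```

With supported = {"flr", "bus"} the default order selects "flr", while
passing order=["bus", "flr"] selects "bus" for the same device.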

Thanks,
Amey


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-17 13:17                             ` Amey Narkhede
@ 2021-03-17 13:58                               ` Leon Romanovsky
  2021-03-17 17:31                                 ` Alex Williamson
  0 siblings, 1 reply; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-17 13:58 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: alex.williamson, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On Wed, Mar 17, 2021 at 06:47:18PM +0530, Amey Narkhede wrote:
> On 21/03/17 01:47PM, Leon Romanovsky wrote:
> > On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> > > On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > > > On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > > > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > > > > Amey Narkhede <ameynarkhede03@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > >
> > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > >
> > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > defined here.  Note that with this series the resets available through
> > > > > > > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > > > > > >
> > > > > > > > > > Alex,
> > > > > > > > > >
> > > > > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > > > > >
> > > > > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > > > > which needed FLR explicitly not sure if that behavior is fixed in
> > > > > > > > > drivers. Also there is use a case at Nutanix but the engineer who
> > > > > > > > > is involved is on PTO that is why I did not respond immediately as
> > > > > > > > > I don't know the details yet.
> > > > > > > >
> > > > > > > > And more generally, devices continue to have reset issues and we
> > > > > > > > impose a fixed priority in our ordering.  We can and probably should
> > > > > > > > continue to quirk devices when we find broken resets so that we have
> > > > > > > > the best default behavior, but it's currently not easy for an end user
> > > > > > > > to experiment, ie. this reset works, that one doesn't.  We might also
> > > > > > > > have platform issues where a given reset works better on a certain
> > > > > > > > platform.  Exposing a way to test these things might lead to better
> > > > > > > > quirks.  In the case I think Pali was looking for, they wanted a
> > > > > > > > mechanism to force a bus reset, if this was in reference to a single
> > > > > > > > function device, this could be accomplished by setting a priority for
> > > > > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > > > > attribute, but also the reset mechanism used by vfio-pci.  Thanks,
> > > > > > > >
> > > > > > > > Alex
> > > > > > > >
> > > > > > >
> > > > > > > To confirm from our end - we have seen many such instances where default
> > > > > > > reset methods have not worked well on our platform. Debugging these
> > > > > > > issues is painful in practice, and this interface would make it far
> > > > > > > easier.
> > > > > > >
> > > > > > > Having an interface like this would also help us better communicate the
> > > > > > > issues we find with upstream. Allowing others to more easily test our
> > > > > > > (or other entities') findings should give better visibility into
> > > > > > > which issues apply to the device in general and which are platform
> > > > > > > specific. In disambiguating the former from the latter, we should be
> > > > > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > > > > interface allows for a safer and more elegant solution than any of the
> > > > > > > current alternatives.
> > > > > >
> > > > > > So to summarize, we are talking about test and debug interface to
> > > > > > overcome HW bugs, am I right?
> > > > > >
> > > > > > My personal experience shows that once the easy workaround exists
> > > > > > (and write to generally available sysfs is very simple), the vendors
> > > > > > and users desire for proper fix decreases drastically. IMHO, we will
> > > > > > see increase of copy/paste in SO and blog posts, but reduce in quirks.
> > > > > >
> > > > > > My 2-cents.
> > > > > >
> > > > > I agree with your point but at least it gives the userspace ability
> > > > > to use broken device until bug is fixed in upstream.
> > > >
> > > > As I said, I don't expect many fixes once "userspace" will be able to
> > > > use cheap workaround. There is no incentive to fix it.
> > > >
> > > > > This is also applicable for obscure devices without upstream
> > > > > drivers for example custom FPGA based devices.
> > > >
> > > > This is not relevant to upstream kernel. Those vendors ship everything
> > > > custom, they don't need upstream, we don't need them :)
> > > >
> > > By custom I meant hobbyists who could tinker with their custom FPGA.
> >
> > I invite such hobbyists to send patches and include their FPGA in
> > upstream kernel.
> >
> > >
> > > > > Another main application which I forgot to mention is virtualization
> > > > > where vmm wants to reset the device when the guest is reset,
> > > > > to emulate machine reboot as closely as possible.
> > > >
> > > > It can work in very narrow case, because reset will cause to device
> > > > reprobe and most likely the driver will be different from the one that
> > > > started reset. I can imagine that net devices will lose their state and
> > > > config after such reset too.
> > > >
> > > Not sure if I got that 100% right. The pci_reset_function() function
> > > saves and restores device state over the reset.
> >
> > I'm talking about netdev state, but whatever given the existence of
> > sysfs reset knob.
> >
> > >
> > > > IMHO, it will be saner for everyone if virtualization don't try such resets.
> > > >
> > > > Thanks
> > > >
> > > The exists reset sysfs attribute was added for exactly this case
> > > though.
> >
> > I didn't know the rationale behind that file till you said and I
> > googled libvirt discussion, so ok. Do you propose that libvirt
> > will manage database of devices and their working reset types?
> >
> I don't have much idea about internals of libvirt but why would
> it need to manage database of working reset types? It could just
> read new reset_methods attribute to get the list of supported reset
> methods.

Because the idea of this patch is to read all supported reset types and
allow the user to choose the working one. The user will do it with
help from StackOverflow, but libvirt will need to have some sort of
database, otherwise it won't be any different from a simple
"echo 1 > reset", which will iterate over all supported resets anyway.

> > I'm not against this patch, just want to raise an attention that the
> > outcome of this patch will be decrease in fixes of broken devices.
> >
> > Thanks
> >
> That makes sense but that isn't any different from existing reset
> attribute. This patch inhances it and allows selecting a device supported
> reset method instead of using first available reset method according to
> existing hardcoded policy.

The difference here is that this is a workaround to solve bugs that
should be fixed in the kernel.

Thanks

>
> Thanks,
> Amey


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-17 13:58                               ` Leon Romanovsky
@ 2021-03-17 17:31                                 ` Alex Williamson
  2021-03-18  9:09                                   ` Leon Romanovsky
  0 siblings, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-17 17:31 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Amey Narkhede, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On Wed, 17 Mar 2021 15:58:40 +0200
Leon Romanovsky <leon@kernel.org> wrote:

> On Wed, Mar 17, 2021 at 06:47:18PM +0530, Amey Narkhede wrote:
> > On 21/03/17 01:47PM, Leon Romanovsky wrote:  
> > > On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:  
> > > > On 21/03/17 01:02PM, Leon Romanovsky wrote:  
> > > > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:  
> > > > > > On 21/03/17 06:20AM, Leon Romanovsky wrote:  
> > > > > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:  
> > > > > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:  
> > > > > > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > > > > > Amey Narkhede <ameynarkhede03@gmail.com> wrote:
> > > > > > > > >  
> > > > > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:  
> > > > > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:  
> > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > > > > >  
> > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:  
> > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > > warm reset respectively.  
> > > > > > > > > > > > >
> > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > > kernel function (yet).  
> > > > > > > > > > > >
> > > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > > defined here.  Note that with this series the resets available through
> > > > > > > > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > > > > > series only enables selection of the existing methods.  Thanks,  
> > > > > > > > > > >
> > > > > > > > > > > Alex,
> > > > > > > > > > >
> > > > > > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > > > > > >  
> > > > > > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > > > > > which needed FLR explicitly not sure if that behavior is fixed in
> > > > > > > > > > drivers. Also there is use a case at Nutanix but the engineer who
> > > > > > > > > > is involved is on PTO that is why I did not respond immediately as
> > > > > > > > > > I don't know the details yet.  
> > > > > > > > >
> > > > > > > > > And more generally, devices continue to have reset issues and we
> > > > > > > > > impose a fixed priority in our ordering.  We can and probably should
> > > > > > > > > continue to quirk devices when we find broken resets so that we have
> > > > > > > > > the best default behavior, but it's currently not easy for an end user
> > > > > > > > > to experiment, ie. this reset works, that one doesn't.  We might also
> > > > > > > > > have platform issues where a given reset works better on a certain
> > > > > > > > > platform.  Exposing a way to test these things might lead to better
> > > > > > > > > quirks.  In the case I think Pali was looking for, they wanted a
> > > > > > > > > mechanism to force a bus reset, if this was in reference to a single
> > > > > > > > > function device, this could be accomplished by setting a priority for
> > > > > > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > > > > > attribute, but also the reset mechanism used by vfio-pci.  Thanks,
> > > > > > > > >
> > > > > > > > > Alex
> > > > > > > > >  
> > > > > > > >
> > > > > > > > To confirm from our end - we have seen many such instances where default
> > > > > > > > reset methods have not worked well on our platform. Debugging these
> > > > > > > > issues is painful in practice, and this interface would make it far
> > > > > > > > easier.
> > > > > > > >
> > > > > > > > Having an interface like this would also help us better communicate the
> > > > > > > > issues we find with upstream. Allowing others to more easily test our
> > > > > > > > (or other entities') findings should give better visibility into
> > > > > > > > which issues apply to the device in general and which are platform
> > > > > > > > specific. In disambiguating the former from the latter, we should be
> > > > > > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > > > > > interface allows for a safer and more elegant solution than any of the
> > > > > > > > current alternatives.  
> > > > > > >
> > > > > > > So to summarize, we are talking about test and debug interface to
> > > > > > > overcome HW bugs, am I right?
> > > > > > >
> > > > > > > My personal experience shows that once the easy workaround exists
> > > > > > > (and write to generally available sysfs is very simple), the vendors
> > > > > > > and users desire for proper fix decreases drastically. IMHO, we will
> > > > > > > see increase of copy/paste in SO and blog posts, but reduce in quirks.
> > > > > > >
> > > > > > > My 2-cents.
> > > > > > >  
> > > > > > I agree with your point but at least it gives the userspace ability
> > > > > > to use broken device until bug is fixed in upstream.  
> > > > >
> > > > > As I said, I don't expect many fixes once "userspace" will be able to
> > > > > use cheap workaround. There is no incentive to fix it.

We can increase the annoyance factor of using a modified set of reset
methods, but ultimately we can only control what goes into our kernel;
other kernels might take v1 of this series and incorporate it
regardless of what happens here.

> > > > > > This is also applicable for obscure devices without upstream
> > > > > > drivers for example custom FPGA based devices.  
> > > > >
> > > > > This is not relevant to upstream kernel. Those vendors ship everything
> > > > > custom, they don't need upstream, we don't need them :)
> > > > >  
> > > > By custom I meant hobbyists who could tinker with their custom FPGA.  
> > >
> > > I invite such hobbyists to send patches and include their FPGA in
> > > upstream kernel.

This is potentially another good use case: how receptive are we going
to be to an FPGA design that botches a reset?  Do they have a valid
device ID for us to base a quirk on, are they just squatting on one, or
using the default from a library?  Maybe the next bitstream will
resolve it, maybe without any external indication.  IOW, what would the
quality level be for that quirk versus using this as a workaround,
where the user probably wouldn't mind a kernel nag?

> > > > > > Another main application which I forgot to mention is virtualization
> > > > > > where vmm wants to reset the device when the guest is reset,
> > > > > > to emulate machine reboot as closely as possible.  
> > > > >
> > > > > It can work in a very narrow case, because the reset will cause the
> > > > > device to reprobe and most likely the driver will be different from the
> > > > > one that started the reset. I can imagine that net devices will lose
> > > > > their state and config after such a reset too.
> > > > >  
> > > > Not sure if I got that 100% right. The pci_reset_function() function
> > > > saves and restores device state over the reset.  
> > >
> > > I'm talking about netdev state, but whatever given the existence of
> > > sysfs reset knob.
> > >  
> > > >  
> > > > > IMHO, it will be saner for everyone if virtualization don't try such resets.

That would cause a massive regression in device assignment support.  As
with other sysfs attributes, triggering them alongside a running driver
is probably not going to end well.  However, pci_reset_function() is
extremely useful for stopping devices and returning them to a default
state, when either rebooting a VM or returning the device to the host.
The device is not removed and re-probed when this occurs, vfio-pci is
able to hold onto the device across these actions.  Sure, don't reset a
netdev device when it's in use, that's not what these are used for.

> > > > The existing reset sysfs attribute was added for exactly this case
> > > > though.  
> > >
> > > I didn't know the rationale behind that file till you said and I
> > > googled libvirt discussion, so ok. Do you propose that libvirt
> > > will manage database of devices and their working reset types?
> > >  
> > I don't have much idea about internals of libvirt but why would
> > it need to manage database of working reset types? It could just
> > read new reset_methods attribute to get the list of supported reset
> > methods.  
> 
> Because the idea of this patch is to read all supported reset types and
> allow the user to choose the working one. The user will do it with
> help from StackOverflow, but libvirt will need to have some sort of
> database, otherwise it won't be different from a simple "echo 1 > reset",
> which will iterate over all supported resets anyway.

AFAIK, libvirt no longer attempts to do resets itself, or is at least
moving in that direction.  vfio-pci will reset devices when they're
opened by a user (when available) or when triggered via the API.

> > > I'm not against this patch, just want to raise an attention that the
> > > outcome of this patch will be decrease in fixes of broken devices.
> > >
> > > Thanks
> > >  
> > That makes sense but that isn't any different from the existing reset
> > attribute. This patch enhances it and allows selecting a device-supported
> > reset method instead of using the first available reset method according to
> > the existing hardcoded policy.  
> 
> The difference here is that this is a workaround to solve bugs that
> should be fixed in the kernel.

If we want to discourage using this as a primary means to resolve reset
issues on a device then we can create log warnings any time it's used.
Downstreams that really want this functionality are going to take this
patch from the list whether we accept it or not.  As above, it seems
there are valid use cases.  Even with mainstream vfio in QEMU, I go
through some hoops trying to determine if I can do a secondary bus
reset rather than a PM reset because it's not specified anywhere what a
"soft reset" means for any given device.  This sort of interface could
make it easier to apply a system policy that a pci_reset_function()
should always perform a secondary bus reset if the only other option is
a PM reset.  Maybe that policy mostly makes sense for a VM use case, so
we'd want one policy by default and another when the device is used for
this functionality.  How could we accomplish that with a quirk?  Thanks,

Alex
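
To make the policy question above concrete, here is a minimal user-space
sketch (not kernel code) of "prefer a secondary bus reset when the only
other option is a PM reset".  It assumes reset methods are named by
hypothetical strings such as "flr", "pm" and "bus"; those names and the
helper prefer_bus_over_pm() are made up for illustration and are not
taken from the series.

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustrative only: `methods` is a priority-ordered list (highest first)
 * of hypothetical reset method names.  If both "pm" and "bus" are present
 * and "pm" comes first, move "bus" ahead of "pm".  All other entries keep
 * their relative order.  Returns 1 if the order changed, 0 otherwise.
 */
int prefer_bus_over_pm(const char *methods[], size_t n)
{
    size_t pm = n, bus = n;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(methods[i], "pm") == 0 && pm == n)
            pm = i;
        else if (strcmp(methods[i], "bus") == 0 && bus == n)
            bus = i;
    }

    /* Nothing to do if either is missing or "bus" already wins. */
    if (pm == n || bus == n || bus < pm)
        return 0;

    /* Shift entries pm..bus-1 right by one and put "bus" at pm's slot. */
    const char *b = methods[bus];
    for (size_t i = bus; i > pm; i--)
        methods[i] = methods[i - 1];
    methods[pm] = b;
    return 1;
}
```

A list of {"flr", "pm", "bus"} becomes {"flr", "bus", "pm"}, so FLR is
still tried first and the PM reset only serves as a last resort.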


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-15 15:03             ` Alex Williamson
@ 2021-03-17 19:02               ` Pali Rohár
  2021-03-17 19:15                 ` Alex Williamson
  0 siblings, 1 reply; 90+ messages in thread
From: Pali Rohár @ 2021-03-17 19:02 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Amey Narkhede, bhelgaas, raphael.norwitz, linux-kernel, linux-pci

On Monday 15 March 2021 09:03:39 Alex Williamson wrote:
> On Mon, 15 Mar 2021 15:52:38 +0100
> Pali Rohár <pali@kernel.org> wrote:
> 
> > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > Pali Rohár <pali@kernel.org> wrote:
> > >   
> > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:  
> > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > warm reset respectively.    
> > > > 
> > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > kernel and therefore drivers do not export this type of reset via any
> > > > kernel function (yet).  
> > > 
> > > Warm reset is beyond the scope of this series, but could be implemented
> > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > defined here.  
> > 
> > Ok!
> > 
> > > Note that with this series the resets available through
> > > pci_reset_function() and the per device reset attribute in sysfs remain
> > > exactly the same as they are currently.  The bus and slot reset
> > > methods used here are limited to devices where only a single function is
> > > affected by the reset, therefore it is not like the patch you proposed
> > > which performed a reset irrespective of the downstream devices.  This
> > > series only enables selection of the existing methods.  Thanks,
> > > 
> > > Alex
> > >   
> > 
> > But with this patch series, there is still an issue with the PCI secondary
> > bus reset mechanism, as the exported sysfs attribute does not do that
> > remove-reset-rescan procedure. As discussed in the other thread, this reset
> > leaves the device in an unconfigured / broken state.
> 
> No, there's not:
> 
> int pci_reset_function(struct pci_dev *dev)
> {
>         int rc;
> 
>         if (!dev->reset_fn)
>                 return -ENOTTY;
> 
>         pci_dev_lock(dev);
> >>>     pci_dev_save_and_disable(dev);
> 
>         rc = __pci_reset_function_locked(dev);
> 
> >>>     pci_dev_restore(dev);
>         pci_dev_unlock(dev);
> 
>         return rc;
> }
> 
> The remove/re-scan was discussed primarily because your patch performed
> a bus reset regardless of what devices were affected by that reset and
> it's difficult to manage the scope where multiple devices are affected.
> Here, the bus and slot reset functions will fail unless the scope is
> limited to the single device triggering this reset.  Thanks,
> 
> Alex
> 
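
The scope restriction described in the quoted text (the bus reset fails
unless the resetting device is the only one affected) can be modelled in
a few lines of user-space C.  This is only a sketch: struct model_bus,
struct model_dev and model_parent_bus_reset() are hypothetical stand-ins,
not the kernel's struct pci_bus, struct pci_dev or pci_parent_bus_reset().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct model_dev {
    int id;
};

struct model_bus {
    struct model_dev *devices;  /* devices sitting on this bus */
    size_t ndevices;
    int reset_count;            /* how many times the bus was reset */
};

/*
 * Function-scoped bus reset: refuse with -ENOTTY if any device other
 * than `dev` shares the bus, otherwise record a bus reset.
 */
int model_parent_bus_reset(struct model_bus *bus, const struct model_dev *dev)
{
    for (size_t i = 0; i < bus->ndevices; i++)
        if (&bus->devices[i] != dev)
            return -ENOTTY;

    bus->reset_count++;
    return 0;
}
```

With two devices on the bus the call returns -ENOTTY and nothing is
reset; with a single device it succeeds.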

I was thinking a bit more about it and I'm really not sure how it would
behave with a hotplugging PCIe bridge.

On the aardvark PCIe controller I have already tested that setting the
secondary bus reset bit triggers a Hot Reset event and then also a Link
Down event. These events are not handled by the aardvark driver yet
(this needs to be implemented in the kernel's emulated root bridge code).

But I'm not sure how it would behave on a real HW PCIe hotplugging bridge.
The kernel already has code which removes a PCIe device if its presence
bit changes (signalled via interrupt), and a Link Down event triggers this
change.

Can somebody test on some PCIe hotplug controller what a secondary bus
reset via sysfs would do with these changes? Currently it is not
exported as a reset method, and there could be various race conditions
and maybe an error (?) if the hotplug code is going to remove the device
on which the user triggered a bus reset via sysfs.

And in my opinion this can also happen when only one device is on the
bus, so it perfectly matches all the conditions under which sysfs can use
a bus reset for one device.

I can try to implement hotplug code in the aardvark driver and the root
bridge emulator to test how this patch would behave. But it would take
some time...

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-17 19:02               ` Pali Rohár
@ 2021-03-17 19:15                 ` Alex Williamson
  2021-03-17 19:24                   ` Pali Rohár
  0 siblings, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-17 19:15 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Amey Narkhede, bhelgaas, raphael.norwitz, linux-kernel, linux-pci

On Wed, 17 Mar 2021 20:02:06 +0100
Pali Rohár <pali@kernel.org> wrote:

> I was thinking a bit more about it and I'm really not sure how it would
> behave with a hotplugging PCIe bridge.
> 
> On the aardvark PCIe controller I have already tested that setting the
> secondary bus reset bit triggers a Hot Reset event and then also a Link
> Down event. These events are not handled by the aardvark driver yet
> (this needs to be implemented in the kernel's emulated root bridge code).
> 
> But I'm not sure how it would behave on a real HW PCIe hotplugging bridge.
> The kernel already has code which removes a PCIe device if its presence
> bit changes (signalled via interrupt), and a Link Down event triggers this
> change.

This is the difference between slot and bus resets: the slot reset is
implemented by the hotplug controller and disables presence detection
around the bus reset.  Thanks,

Alex
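
As a rough illustration of that difference, the following user-space
model treats a slot reset as a bus reset performed while presence
detection is masked.  It is a sketch under stated assumptions, not the
hotplug driver's code: struct model_port and both functions are
hypothetical, and the model simply assumes that a bus reset seen by an
active hotplug handler ends in the device being removed.

```c
#include <stdbool.h>

/* Hypothetical model of a hotplug-capable port. */
struct model_port {
    bool presence_events_enabled; /* hotplug reacts to presence/link changes */
    bool device_present;          /* device still attached after the reset */
    int ejects;                   /* how often hotplug removed the device */
};

/*
 * Bare bus reset: the link drops, and if the hotplug handler is
 * listening it treats that as a removal (the assumption in this model).
 */
void model_bus_reset(struct model_port *p)
{
    if (p->presence_events_enabled) {
        p->device_present = false;
        p->ejects++;
    }
}

/* Slot reset: the same bus reset, with presence detection masked around it. */
void model_slot_reset(struct model_port *p)
{
    bool saved = p->presence_events_enabled;

    p->presence_events_enabled = false;
    model_bus_reset(p);
    p->presence_events_enabled = saved;
}
```

In this model a slot reset leaves the device attached, while a bare bus
reset on the same port ejects it.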


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-17 19:15                 ` Alex Williamson
@ 2021-03-17 19:24                   ` Pali Rohár
  2021-03-17 19:32                     ` Alex Williamson
  0 siblings, 1 reply; 90+ messages in thread
From: Pali Rohár @ 2021-03-17 19:24 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Amey Narkhede, bhelgaas, raphael.norwitz, linux-kernel, linux-pci

On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:
> This is the difference between slot and bus resets, the slot reset is
> implemented by the hotplug controller and disables presence detection
> around the bus reset.  Thanks,

Yes, but I'm talking about bus reset, not about slot reset.

I mean: to use bus reset via sysfs on hardware which supports slots and
hotplugging.

And if I'm reading code correctly, this combination is allowed, right?
Via these new patches it is possible to disable slot reset and enable
bus reset.

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-17 19:24                   ` Pali Rohár
@ 2021-03-17 19:32                     ` Alex Williamson
  2021-03-17 19:40                       ` Pali Rohár
  0 siblings, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-17 19:32 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Amey Narkhede, bhelgaas, raphael.norwitz, linux-kernel, linux-pci

On Wed, 17 Mar 2021 20:24:24 +0100
Pali Rohár <pali@kernel.org> wrote:

> Yes, but I'm talking about bus reset, not about slot reset.
> 
> I mean: to use bus reset via sysfs on hardware which supports slots and
> hotplugging.
> 
> And if I'm reading code correctly, this combination is allowed, right?
> Via these new patches it is possible to disable slot reset and enable
> bus reset.

That's true, a slot reset is simply a bus reset wrapped in code
that prevents the device from getting ejected.  Maybe it would make
sense to combine the two as far as this interface is concerned, i.e. a
single "bus" reset method that will always use slot reset when
available.  Thanks,

Alex
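
A combined method along those lines could look roughly like the sketch
below: try the slot reset first and fall back to the plain bus reset
only when no slot reset is available.  This is a user-space model with
hypothetical names (struct model_dev, model_reset_bus_function()), not
code from the series.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model used only for this illustration. */
struct model_dev {
    bool has_slot_reset; /* a hotplug driver provides a slot reset hook */
    int slot_resets;
    int bus_resets;
};

/* Returns -ENOTTY when the device has no slot reset hook. */
static int model_slot_reset(struct model_dev *d)
{
    if (!d->has_slot_reset)
        return -ENOTTY;
    d->slot_resets++;
    return 0;
}

static int model_bus_reset(struct model_dev *d)
{
    d->bus_resets++;
    return 0;
}

/* Single "bus" method: prefer the slot reset, fall back to the bus reset. */
int model_reset_bus_function(struct model_dev *d)
{
    int rc = model_slot_reset(d);

    if (rc != -ENOTTY)
        return rc;
    return model_bus_reset(d);
}
```

With this shape userspace sees one "bus" method, and cannot disable the
slot reset while leaving the unprotected bus reset enabled.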


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-17 19:32                     ` Alex Williamson
@ 2021-03-17 19:40                       ` Pali Rohár
  2021-03-17 20:00                         ` Alex Williamson
  0 siblings, 1 reply; 90+ messages in thread
From: Pali Rohár @ 2021-03-17 19:40 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Amey Narkhede, bhelgaas, raphael.norwitz, linux-kernel, linux-pci

On Wednesday 17 March 2021 13:32:45 Alex Williamson wrote:
> That's true, a slot reset is simply a bus reset wrapped in code
> that prevents the device from getting ejected.

Yes, this makes slot reset "safe". But bus reset is "unsafe".

> Maybe it would make
> sense to combine the two as far as this interface is concerned, ie. a
> single "bus" reset method that will always use slot reset when
> available.  Thanks,

That should work when slot reset is available.

The other option is the remove-reset-rescan procedure mentioned earlier.

But a quick search in drivers/pci/hotplug/ shows that not all hotplug
drivers implement the reset_slot method.

So is there a possible issue with a hotplug driver which may eject the
device during a bus reset (e.g. because slot reset is not implemented)?

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-17 19:40                       ` Pali Rohár
@ 2021-03-17 20:00                         ` Alex Williamson
  2021-03-17 20:13                           ` Pali Rohár
  0 siblings, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-17 20:00 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Amey Narkhede, bhelgaas, raphael.norwitz, linux-kernel, linux-pci

On Wed, 17 Mar 2021 20:40:24 +0100
Pali Rohár <pali@kernel.org> wrote:

> On Wednesday 17 March 2021 13:32:45 Alex Williamson wrote:
> > On Wed, 17 Mar 2021 20:24:24 +0100
> > Pali Rohár <pali@kernel.org> wrote:
> >   
> > > On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:  
> > > > On Wed, 17 Mar 2021 20:02:06 +0100
> > > > Pali Rohár <pali@kernel.org> wrote:
> > > >     
> > > > > On Monday 15 March 2021 09:03:39 Alex Williamson wrote:    
> > > > > > On Mon, 15 Mar 2021 15:52:38 +0100
> > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > >       
> > > > > > > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:      
> > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > >         
> > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:        
> > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > warm reset respectively.          
> > > > > > > > > 
> > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > kernel function (yet).        
> > > > > > > > 
> > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > defined here.        
> > > > > > > 
> > > > > > > Ok!
> > > > > > >       
> > > > > > > > Note that with this series the resets available through
> > > > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > > > > 
> > > > > > > > Alex
> > > > > > > >         
> > > > > > > 
> > > > > > > But with this patch series, there is still an issue with PCI secondary
> > > > > > > bus reset mechanism as exported sysfs attribute does not do that
> > > > > > > remove-reset-rescan procedure. As discussed in the other thread, this reset
> > > > > > > leaves the device in an unconfigured / broken state.
> > > > > > 
> > > > > > No, there's not:
> > > > > > 
> > > > > > int pci_reset_function(struct pci_dev *dev)
> > > > > > {
> > > > > >         int rc;
> > > > > > 
> > > > > >         if (!dev->reset_fn)
> > > > > >                 return -ENOTTY;
> > > > > > 
> > > > > >         pci_dev_lock(dev);      
> > > > > > >>>     pci_dev_save_and_disable(dev);      
> > > > > > 
> > > > > >         rc = __pci_reset_function_locked(dev);
> > > > > >       
> > > > > > >>>     pci_dev_restore(dev);      
> > > > > >         pci_dev_unlock(dev);
> > > > > > 
> > > > > >         return rc;
> > > > > > }
> > > > > > 
> > > > > > The remove/re-scan was discussed primarily because your patch performed
> > > > > > a bus reset regardless of what devices were affected by that reset and
> > > > > > it's difficult to manage the scope where multiple devices are affected.
> > > > > > Here, the bus and slot reset functions will fail unless the scope is
> > > > > > limited to the single device triggering this reset.  Thanks,
> > > > > > 
> > > > > > Alex
> > > > > >       
> > > > > 
> > > > > I was thinking a bit more about it and I'm not really sure how it would
> > > > > behave with hotplugging PCIe bridge.
> > > > > 
> > > > > On aardvark PCIe controller I have already tested that secondary bus
> > > > > reset bit is triggering Hot Reset event and then also Link Down event.
> > > > > These events are not handled by aardvark driver yet (needs to
> > > > > be implemented in the kernel's emulated root bridge code).
> > > > > 
> > > > > But I'm not sure how it would behave on real HW PCIe hotplugging bridge.
> > > > > Kernel has already code which removes PCIe device if it changes presence
> > > > > bit (and inform via interrupt). And Link Down event triggers this
> > > > > change.    
> > > > 
> > > > This is the difference between slot and bus resets, the slot reset is
> > > > implemented by the hotplug controller and disables presence detection
> > > > around the bus reset.  Thanks,    
> > > 
> > > Yes, but I'm talking about bus reset, not about slot reset.
> > > 
> > > I mean: to use bus reset via sysfs on hardware which supports slots and
> > > hotplugging.
> > > 
> > > And if I'm reading code correctly, this combination is allowed, right?
> > > Via these new patches it is possible to disable slot reset and enable
> > > bus reset.  
> > 
> > That's true, a slot reset is simply a bus reset wrapped around code
> > that prevents the device from getting ejected.  
> 
> Yes, this makes slot reset "safe". But bus reset is "unsafe".
> 
> > Maybe it would make
> > sense to combine the two as far as this interface is concerned, ie. a
> > single "bus" reset method that will always use slot reset when
> > available.  Thanks,  
> 
> That should work when slot reset is available.
> 
> Other option is that mentioned remove-reset-rescan procedure.

That's not something we can introduce to the pci_reset_function() path
without a fair bit of collateral in using it through vfio-pci.

> But quick search in drivers/pci/hotplug/ results that not all hotplug
> drivers implement reset_slot method.
> 
> So there is a possible issue with hotplug driver which may eject device
> during bus reset (because e.g. slot reset is not implemented)?

People aren't reporting it, so maybe those controllers aren't being
used for this use case.  Or maybe introducing this patch will make
these reset methods more readily accessible for testing.  We can fix or
blacklist those controllers for bus reset when reports come in.  Thanks,

Alex
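
The "single bus method that always uses slot reset when available" idea
discussed above can be sketched as a small userspace model. This is only an
illustration under stated assumptions: struct model_dev and every model_*
function are hypothetical names invented for this sketch, not kernel APIs,
and the counters merely record which path was taken. The only detail taken
from the thread is the -ENOTTY convention for "this method does not apply".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_dev {
	bool has_slot_reset;   /* hotplug driver provides a slot reset hook */
	bool single_function;  /* the reset would affect only this function */
	int  protected_resets; /* resets done with presence detection masked */
	int  raw_bus_resets;   /* resets done without that protection */
};

/* Slot reset: a bus reset wrapped so that Link Down does not eject the device. */
static int model_slot_reset(struct model_dev *dev)
{
	if (!dev->has_slot_reset)
		return -ENOTTY;
	dev->protected_resets++;
	return 0;
}

/* Plain secondary bus reset, with nothing guarding against hotplug events. */
static int model_raw_bus_reset(struct model_dev *dev)
{
	dev->raw_bus_resets++;
	return 0;
}

/* Combined "bus" method: prefer the slot reset, fall back to the raw one. */
int model_bus_reset(struct model_dev *dev)
{
	int rc;

	/* Refuse when more than the requesting function would be reset. */
	if (!dev->single_function)
		return -ENOTTY;

	rc = model_slot_reset(dev);
	if (rc != -ENOTTY)
		return rc;

	return model_raw_bus_reset(dev);
}
```

In this model the unprotected path is reached only when no slot reset hook
exists, which is exactly the case the thread leaves open for hotplug drivers
that do not implement reset_slot.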



* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-17 20:00                         ` Alex Williamson
@ 2021-03-17 20:13                           ` Pali Rohár
  2021-03-18 14:31                             ` Amey Narkhede
  0 siblings, 1 reply; 90+ messages in thread
From: Pali Rohár @ 2021-03-17 20:13 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Amey Narkhede, bhelgaas, raphael.norwitz, linux-kernel, linux-pci

On Wednesday 17 March 2021 14:00:20 Alex Williamson wrote:
> On Wed, 17 Mar 2021 20:40:24 +0100
> Pali Rohár <pali@kernel.org> wrote:
> 
> > On Wednesday 17 March 2021 13:32:45 Alex Williamson wrote:
> > > On Wed, 17 Mar 2021 20:24:24 +0100
> > > Pali Rohár <pali@kernel.org> wrote:
> > >   
> > > > On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:  
> > > > > On Wed, 17 Mar 2021 20:02:06 +0100
> > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > >     
> > > > > > On Monday 15 March 2021 09:03:39 Alex Williamson wrote:    
> > > > > > > On Mon, 15 Mar 2021 15:52:38 +0100
> > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > >       
> > > > > > > > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:      
> > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > >         
> > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:        
> > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > warm reset respectively.          
> > > > > > > > > > 
> > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > kernel function (yet).        
> > > > > > > > > 
> > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > defined here.        
> > > > > > > > 
> > > > > > > > Ok!
> > > > > > > >       
> > > > > > > > > Note that with this series the resets available through
> > > > > > > > > pci_reset_function() and the per device reset attribute in sysfs remain
> > > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > > > > > 
> > > > > > > > > Alex
> > > > > > > > >         
> > > > > > > > 
> > > > > > > > But with this patch series, there is still an issue with PCI secondary
> > > > > > > > bus reset mechanism as exported sysfs attribute does not do that
> > > > > > > > remove-reset-rescan procedure. As discussed in the other thread, this reset
> > > > > > > > leaves the device in an unconfigured / broken state.
> > > > > > > 
> > > > > > > No, there's not:
> > > > > > > 
> > > > > > > int pci_reset_function(struct pci_dev *dev)
> > > > > > > {
> > > > > > >         int rc;
> > > > > > > 
> > > > > > >         if (!dev->reset_fn)
> > > > > > >                 return -ENOTTY;
> > > > > > > 
> > > > > > >         pci_dev_lock(dev);      
> > > > > > > >>>     pci_dev_save_and_disable(dev);      
> > > > > > > 
> > > > > > >         rc = __pci_reset_function_locked(dev);
> > > > > > >       
> > > > > > > >>>     pci_dev_restore(dev);      
> > > > > > >         pci_dev_unlock(dev);
> > > > > > > 
> > > > > > >         return rc;
> > > > > > > }
> > > > > > > 
> > > > > > > The remove/re-scan was discussed primarily because your patch performed
> > > > > > > a bus reset regardless of what devices were affected by that reset and
> > > > > > > it's difficult to manage the scope where multiple devices are affected.
> > > > > > > Here, the bus and slot reset functions will fail unless the scope is
> > > > > > > limited to the single device triggering this reset.  Thanks,
> > > > > > > 
> > > > > > > Alex
> > > > > > >       
> > > > > > 
> > > > > > I was thinking a bit more about it and I'm not really sure how it would
> > > > > > behave with hotplugging PCIe bridge.
> > > > > > 
> > > > > > On aardvark PCIe controller I have already tested that secondary bus
> > > > > > reset bit is triggering Hot Reset event and then also Link Down event.
> > > > > > These events are not handled by aardvark driver yet (needs to
> > > > > > be implemented in the kernel's emulated root bridge code).
> > > > > > 
> > > > > > But I'm not sure how it would behave on real HW PCIe hotplugging bridge.
> > > > > > Kernel has already code which removes PCIe device if it changes presence
> > > > > > bit (and inform via interrupt). And Link Down event triggers this
> > > > > > change.    
> > > > > 
> > > > > This is the difference between slot and bus resets, the slot reset is
> > > > > implemented by the hotplug controller and disables presence detection
> > > > > around the bus reset.  Thanks,    
> > > > 
> > > > Yes, but I'm talking about bus reset, not about slot reset.
> > > > 
> > > > I mean: to use bus reset via sysfs on hardware which supports slots and
> > > > hotplugging.
> > > > 
> > > > And if I'm reading code correctly, this combination is allowed, right?
> > > > Via these new patches it is possible to disable slot reset and enable
> > > > bus reset.  
> > > 
> > > That's true, a slot reset is simply a bus reset wrapped around code
> > > that prevents the device from getting ejected.  
> > 
> > Yes, this makes slot reset "safe". But bus reset is "unsafe".
> > 
> > > Maybe it would make
> > > sense to combine the two as far as this interface is concerned, ie. a
> > > single "bus" reset method that will always use slot reset when
> > > available.  Thanks,  
> > 
> > That should work when slot reset is available.
> > 
> > Other option is that mentioned remove-reset-rescan procedure.
> 
> That's not something we can introduce to the pci_reset_function() path
> without a fair bit of collateral in using it through vfio-pci.
> 
> > But quick search in drivers/pci/hotplug/ results that not all hotplug
> > drivers implement reset_slot method.
> > 
> > So there is a possible issue with hotplug driver which may eject device
> > during bus reset (because e.g. slot reset is not implemented)?
> 
> People aren't reporting it, so maybe those controllers aren't being
> used for this use case.  Or maybe introducing this patch will make
> these reset methods more readily accessible for testing.  We can fix or
> blacklist those controllers for bus reset when reports come in.  Thanks,

Ok! I do not know either whether those controllers are used, but it looks
like there are still changes going on in the hotplug code.

So I guess with these patches people can test it and report issues when
such a thing happens.


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-17 17:31                                 ` Alex Williamson
@ 2021-03-18  9:09                                   ` Leon Romanovsky
  2021-03-18 14:22                                     ` Amey Narkhede
  2021-03-18 16:39                                     ` Alex Williamson
  0 siblings, 2 replies; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-18  9:09 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Amey Narkhede, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> On Wed, 17 Mar 2021 15:58:40 +0200
> Leon Romanovsky <leon@kernel.org> wrote:
>
> > On Wed, Mar 17, 2021 at 06:47:18PM +0530, Amey Narkhede wrote:
> > > On 21/03/17 01:47PM, Leon Romanovsky wrote:
> > > > On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> > > > > On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > > > > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > > > > > On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > > > > > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > > > > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > > > > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > > > > > > Amey Narkhede <ameynarkhede03@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > > > >
> > > > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > > > defined here.  Note that with this series the resets available through
> > > > > > > > > > > > > pci_reset_function() and the per device reset attribute in sysfs remain
> > > > > > > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Alex,
> > > > > > > > > > > >
> > > > > > > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > > > > > > >
> > > > > > > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > > > > > > which needed FLR explicitly not sure if that behavior is fixed in
> > > > > > > > > > > drivers. Also there is a use case at Nutanix, but the engineer who
> > > > > > > > > > > is involved is on PTO that is why I did not respond immediately as
> > > > > > > > > > > I don't know the details yet.
> > > > > > > > > >
> > > > > > > > > > And more generally, devices continue to have reset issues and we
> > > > > > > > > > impose a fixed priority in our ordering.  We can and probably should
> > > > > > > > > > continue to quirk devices when we find broken resets so that we have
> > > > > > > > > > the best default behavior, but it's currently not easy for an end user
> > > > > > > > > > to experiment, ie. this reset works, that one doesn't.  We might also
> > > > > > > > > > have platform issues where a given reset works better on a certain
> > > > > > > > > > platform.  Exposing a way to test these things might lead to better
> > > > > > > > > > quirks.  In the case I think Pali was looking for, they wanted a
> > > > > > > > > > mechanism to force a bus reset, if this was in reference to a single
> > > > > > > > > > function device, this could be accomplished by setting a priority for
> > > > > > > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > > > > > > attribute, but also the reset mechanism used by vfio-pci.  Thanks,
> > > > > > > > > >
> > > > > > > > > > Alex
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > To confirm from our end - we have seen many such instances where default
> > > > > > > > > reset methods have not worked well on our platform. Debugging these
> > > > > > > > > issues is painful in practice, and this interface would make it far
> > > > > > > > > easier.
> > > > > > > > >
> > > > > > > > > Having an interface like this would also help us better communicate the
> > > > > > > > > issues we find with upstream. Allowing others to more easily test our
> > > > > > > > > (or other entities') findings should give better visibility into
> > > > > > > > > which issues apply to the device in general and which are platform
> > > > > > > > > specific. In disambiguating the former from the latter, we should be
> > > > > > > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > > > > > > interface allows for a safer and more elegant solution than any of the
> > > > > > > > > current alternatives.
> > > > > > > >
> > > > > > > > So to summarize, we are talking about test and debug interface to
> > > > > > > > overcome HW bugs, am I right?
> > > > > > > >
> > > > > > > > My personal experience shows that once the easy workaround exists
> > > > > > > > (and write to generally available sysfs is very simple), the vendors
> > > > > > > > and users desire for proper fix decreases drastically. IMHO, we will
> > > > > > > > see increase of copy/paste in SO and blog posts, but reduce in quirks.
> > > > > > > >
> > > > > > > > My 2-cents.
> > > > > > > >
> > > > > > > I agree with your point but at least it gives the userspace ability
> > > > > > > to use broken device until bug is fixed in upstream.
> > > > > >
> > > > > > As I said, I don't expect many fixes once "userspace" will be able to
> > > > > > use cheap workaround. There is no incentive to fix it.
>
> We can increase the annoyance factor of using a modified set of reset
> methods, but ultimately we can only control what goes into our kernel,
> other kernels might take v1 of this series and incorporate it
> regardless of what happens here.
>
> > > > > > > This is also applicable for obscure devices without upstream
> > > > > > > drivers for example custom FPGA based devices.
> > > > > >
> > > > > > This is not relevant to upstream kernel. Those vendors ship everything
> > > > > > custom, they don't need upstream, we don't need them :)
> > > > > >
> > > > > By custom I meant hobbyists who could tinker with their custom FPGA.
> > > >
> > > > I invite such hobbyists to send patches and include their FPGA in
> > > > upstream kernel.
>
> This is potentially another good use case, how receptive are we going
> to be to an FPGA design that botches a reset.  Do they have a valid
> device ID for us to base a quirk on, are they just squatting on one, or
> using the default from a library.  Maybe the next bitstream will
> resolve it, maybe without any external indication.  IOW, what would the
> quality level be for that quirk versus using this as a workaround,
> where the user probably wouldn't mind a kernel nag?

It is worth solving when the need arises.

>
> > > > > > > Another main application which I forgot to mention is virtualization
> > > > > > > where vmm wants to reset the device when the guest is reset,
> > > > > > > to emulate machine reboot as closely as possible.
> > > > > >
> > > > > > It can work in very narrow case, because reset will cause to device
> > > > > > reprobe and most likely the driver will be different from the one that
> > > > > > started reset. I can imagine that net devices will lose their state and
> > > > > > config after such reset too.
> > > > > >
> > > > > Not sure if I got that 100% right. The pci_reset_function() function
> > > > > saves and restores device state over the reset.
> > > >
> > > > I'm talking about netdev state, but whatever given the existence of
> > > > sysfs reset knob.
> > > >
> > > > >
> > > > > > IMHO, it will be saner for everyone if virtualization don't try such resets.
>
> That would cause a massive regression in device assignment support.  As
> with other sysfs attributes, triggering them alongside a running driver
> is probably not going to end well.  However, pci_reset_function() is
> extremely useful for stopping devices and returning them to a default
> state, when either rebooting a VM or returning the device to the host.
> The device is not removed and re-probed when this occurs, vfio-pci is
> able to hold onto the device across these actions.  Sure, don't reset a
> netdev device when it's in use, that's not what these are used for.
>
> > > > > The existing reset sysfs attribute was added for exactly this case
> > > > > though.
> > > >
> > > > I didn't know the rationale behind that file till you said and I
> > > > googled libvirt discussion, so ok. Do you propose that libvirt
> > > > will manage database of devices and their working reset types?
> > > >
> > > I don't have much idea about internals of libvirt but why would
> > > it need to manage database of working reset types? It could just
> > > read new reset_methods attribute to get the list of supported reset
> > > methods.
> >
> > Because the idea of this patch is to read all supported reset types and
> > allow to the user to chose the working one. The user will do it with
> > help from StackOverflow, but libvirt will need to have some sort of
> > database, otherwise it won't be different from simple "echo 1 > reset"
> > which will iterate over all supported resets anyway.
>
> AFAIK, libvirt no longer attempts to do resets itself, or is at least
> moving in that direction.  vfio-pci will reset devices when they're
> opened by a user (when available) or when triggered via the API.

<...>

> > The difference here is that this is a workaround to solve bugs that
> > should be fixed in the kernel.
>
> If we want to discourage using this as a primary means to resolve reset
> issues on a device then we can create log warnings any time it's used.
> Downstreams that really want this functionality are going to take this
> patch from the list whether we accept it or not.  As above, it seems
> there are valid use cases.  Even with mainstream vfio in QEMU, I go
> through some hoops trying to determine if I can do a secondary bus
> reset rather than a PM reset because it's not specified anywhere what a
> "soft reset" means for any given device.  This sort of interface could
> make it easier to apply a system policy that a pci_reset_function()
> should always perform a secondary bus reset if the only other option is
> a PM reset.  Maybe that policy mostly makes sense for a VM use case, so
> we'd want one policy by default and another when the device is used for
> this functionality.  How could we accomplish that with a quirk?  Thanks,

I'm lost here: does vfio-pci use the sysfs interface or the in-kernel API?

If it is the latter, then we don't really need sysfs; if not, we still need
some sort of DB to create the second policy, because "supported != working".
What am I missing?

Thanks

>
> Alex
>
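
To make the "priorities or exclusions" policy debated above concrete, here
is a minimal userspace sketch of a priority-ordered walk over a table of
reset methods. It is a model under stated assumptions, not the code from the
series: the model_* names, the three example methods, and the convention
that priority 0 means "excluded" and lower non-zero values are tried first
are all hypothetical choices made for this illustration. Only the -ENOTTY
convention for an unsupported method is taken from the thread.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define MODEL_NUM_METHODS 3

/* Hypothetical reset callback: 0 on success, -ENOTTY if not supported. */
typedef int (*model_reset_fn)(void);

struct model_method {
	const char *name;
	model_reset_fn reset;
};

/* Example device: FLR is not supported, PM and bus resets are. */
static int model_flr(void) { return -ENOTTY; }
static int model_pm(void)  { return 0; }
static int model_bus(void) { return 0; }

static const struct model_method model_methods[MODEL_NUM_METHODS] = {
	{ "flr", model_flr },
	{ "pm",  model_pm  },
	{ "bus", model_bus },
};

/*
 * prio[i] == 0 excludes method i; otherwise lower values are tried first.
 * Returns the name of the method that performed the reset, or NULL if
 * every enabled method reported that it is unsupported.
 */
const char *model_reset(const unsigned char prio[MODEL_NUM_METHODS])
{
	for (unsigned int p = 1; p <= UCHAR_MAX; p++) {
		for (size_t i = 0; i < MODEL_NUM_METHODS; i++) {
			if (prio[i] != p)
				continue;
			if (model_methods[i].reset() == 0)
				return model_methods[i].name;
		}
	}
	return NULL;
}
```

With priorities {1, 2, 3} this model ends up using "pm"; with {1, 0, 2} the
PM reset is excluded and "bus" is used instead, which mirrors the policy of
preferring a secondary bus reset when the only other option is a PM reset.
Note that the table only encodes what is supported and in which order to
try it; whether a supported method actually works on a given device is
outside what such a table can express.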


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-18  9:09                                   ` Leon Romanovsky
@ 2021-03-18 14:22                                     ` Amey Narkhede
  2021-03-18 14:57                                       ` Leon Romanovsky
  2021-03-18 16:39                                     ` Alex Williamson
  1 sibling, 1 reply; 90+ messages in thread
From: Amey Narkhede @ 2021-03-18 14:22 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: alex.williamson, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On 21/03/18 11:09AM, Leon Romanovsky wrote:
> On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > On Wed, 17 Mar 2021 15:58:40 +0200
> > Leon Romanovsky <leon@kernel.org> wrote:
> >
> > > On Wed, Mar 17, 2021 at 06:47:18PM +0530, Amey Narkhede wrote:
> > > > On 21/03/17 01:47PM, Leon Romanovsky wrote:
> > > > > On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> > > > > > On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > > > > > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > > > > > > On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > > > > > > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > > > > > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > > > > > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > > > > > > > Amey Narkhede <ameynarkhede03@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > > > > defined here.  Note that with this series the resets available through
> > > > > > > > > > > > > > pci_reset_function() and the per device reset attribute in sysfs remain
> > > > > > > > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Alex,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > > > > > > > >
> > > > > > > > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > > > > > > > which needed FLR explicitly not sure if that behavior is fixed in
> > > > > > > > > > > > drivers. Also there is a use case at Nutanix, but the engineer who
> > > > > > > > > > > > is involved is on PTO that is why I did not respond immediately as
> > > > > > > > > > > > I don't know the details yet.
> > > > > > > > > > >
> > > > > > > > > > > And more generally, devices continue to have reset issues and we
> > > > > > > > > > > impose a fixed priority in our ordering.  We can and probably should
> > > > > > > > > > > continue to quirk devices when we find broken resets so that we have
> > > > > > > > > > > the best default behavior, but it's currently not easy for an end user
> > > > > > > > > > > to experiment, ie. this reset works, that one doesn't.  We might also
> > > > > > > > > > > have platform issues where a given reset works better on a certain
> > > > > > > > > > > platform.  Exposing a way to test these things might lead to better
> > > > > > > > > > > quirks.  In the case I think Pali was looking for, they wanted a
> > > > > > > > > > > mechanism to force a bus reset, if this was in reference to a single
> > > > > > > > > > > function device, this could be accomplished by setting a priority for
> > > > > > > > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > > > > > > > attribute, but also the reset mechanism used by vfio-pci.  Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Alex
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > To confirm from our end - we have seen many such instances where default
> > > > > > > > > > reset methods have not worked well on our platform. Debugging these
> > > > > > > > > > issues is painful in practice, and this interface would make it far
> > > > > > > > > > easier.
> > > > > > > > > >
> > > > > > > > > > Having an interface like this would also help us better communicate the
> > > > > > > > > > issues we find with upstream. Allowing others to more easily test our
> > > > > > > > > > (or other entities') findings should give better visibility into
> > > > > > > > > > which issues apply to the device in general and which are platform
> > > > > > > > > > specific. In disambiguating the former from the latter, we should be
> > > > > > > > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > > > > > > > interface allows for a safer and more elegant solution than any of the
> > > > > > > > > > current alternatives.
> > > > > > > > >
> > > > > > > > > So to summarize, we are talking about test and debug interface to
> > > > > > > > > overcome HW bugs, am I right?
> > > > > > > > >
> > > > > > > > > My personal experience shows that once the easy workaround exists
> > > > > > > > > (and write to generally available sysfs is very simple), the vendors
> > > > > > > > > and users desire for proper fix decreases drastically. IMHO, we will
> > > > > > > > > see increase of copy/paste in SO and blog posts, but reduce in quirks.
> > > > > > > > >
> > > > > > > > > My 2-cents.
> > > > > > > > >
> > > > > > > > I agree with your point but at least it gives the userspace ability
> > > > > > > > to use broken device until bug is fixed in upstream.
> > > > > > >
> > > > > > > As I said, I don't expect many fixes once "userspace" will be able to
> > > > > > > use cheap workaround. There is no incentive to fix it.
> >
> > We can increase the annoyance factor of using a modified set of reset
> > methods, but ultimately we can only control what goes into our kernel,
> > other kernels might take v1 of this series and incorporate it
> > regardless of what happens here.
> >
> > > > > > > > This is also applicable for obscure devices without upstream
> > > > > > > > drivers for example custom FPGA based devices.
> > > > > > >
> > > > > > > This is not relevant to upstream kernel. Those vendors ship everything
> > > > > > > custom, they don't need upstream, we don't need them :)
> > > > > > >
> > > > > > By custom I meant hobbyists who could tinker with their custom FPGA.
> > > > >
> > > > > I invite such hobbyists to send patches and include their FPGA in
> > > > > upstream kernel.
> >
> > This is potentially another good use case, how receptive are we going
> > to be to an FPGA design that botches a reset.  Do they have a valid
> > device ID for us to base a quirk on, are they just squatting on one, or
> > using the default from a library.  Maybe the next bitstream will
> > resolve it, maybe without any external indication.  IOW, what would the
> > quality level be for that quirk versus using this as a workaround,
> > where the user probably wouldn't mind a kernel nag?
>
> It is worth to solve it when the need arises.
>
> >
> > > > > > > > Another main application which I forgot to mention is virtualization
> > > > > > > > where vmm wants to reset the device when the guest is reset,
> > > > > > > > to emulate machine reboot as closely as possible.
> > > > > > >
> > > > > > > It can work in very narrow case, because reset will cause to device
> > > > > > > reprobe and most likely the driver will be different from the one that
> > > > > > > started reset. I can imagine that net devices will lose their state and
> > > > > > > config after such reset too.
> > > > > > >
> > > > > > Not sure if I got that 100% right. The pci_reset_function() function
> > > > > > saves and restores device state over the reset.
> > > > >
> > > > > I'm talking about netdev state, but whatever given the existence of
> > > > > sysfs reset knob.
> > > > >
> > > > > >
> > > > > > > IMHO, it will be saner for everyone if virtualization don't try such resets.
> >
> > That would cause a massive regression in device assignment support.  As
> > with other sysfs attributes, triggering them alongside a running driver
> > is probably not going to end well.  However, pci_reset_function() is
> > extremely useful for stopping devices and returning them to a default
> > state, when either rebooting a VM or returning the device to the host.
> > The device is not removed and re-probed when this occurs, vfio-pci is
> > able to hold onto the device across these actions.  Sure, don't reset a
> > netdev device when it's in use, that's not what these are used for.
> >
> > > > > > The exists reset sysfs attribute was added for exactly this case
> > > > > > though.
> > > > >
> > > > > I didn't know the rationale behind that file till you said and I
> > > > > googled libvirt discussion, so ok. Do you propose that libvirt
> > > > > will manage database of devices and their working reset types?
> > > > >
> > > > I don't have much idea about internals of libvirt but why would
> > > > it need to manage database of working reset types? It could just
> > > > read new reset_methods attribute to get the list of supported reset
> > > > methods.
> > >
> > > Because the idea of this patch is to read all supported reset types and
> > > allow to the user to chose the working one. The user will do it with
> > > help from StackOverflow, but libvirt will need to have some sort of
> > > database, otherwise it won't be different from simple "echo 1 > reset"
> > > which will iterate over all supported resets anyway.
> >
> > AFAIK, libvirt no longer attempts to do resets itself, or is at least
> > moving in that direction.  vfio-pci will reset as device when they're
> > opened by a user (when available) or triggered via the API.
>
> <...>
>
> > > The difference here is that this is a workaround to solve bugs that
> > > should be fixed in the kernel.
> >
> > If we want to discourage using this as a primary means to resolve reset
> > issues on a device then we can create log warnings any time it's used.
> > Downstreams that really want this functionality are going to take this
> > patch from the list whether we accept it or not.  As above, it seems
> > there are valid use cases.  Even with mainstream vfio in QEMU, I go
> > through some hoops trying to determine if I can do a secondary bus
> > reset rather than a PM reset because it's not specified anywhere what a
> > "soft reset" means for any given device.  This sort of interface could
> > make it easier to apply a system policy that a pci_reset_function()
> > should always perform a secondary bus reset if the only other option is
> > a PM reset.  Maybe that policy mostly makes sense for a VM use case, so
> > we'd want one policy by default and another when the device is used for
> > this functionality.  How could we accomplish that with a quirk?  Thanks,
>
> I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
>
> If it is latter then we don't really need sysfs, if not, we still need
> some sort of DB to create second policy, because "supported != working".
> What am I missing?
>
> Thanks
>
Can you explain a bit more about why supported != working?
Why would hardware indicate that it supports a specific reset
method if it doesn't work? There is only an unusual quirk for the Intel
82599, which supports FLR but reports it only in the PF DEVCAP, not in
the VF DEVCAP, so we need to call FLR directly without checking whether
it is supported.

Thanks,
Amey


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-17 20:13                           ` Pali Rohár
@ 2021-03-18 14:31                             ` Amey Narkhede
  2021-03-23 14:34                               ` Pali Rohár
  0 siblings, 1 reply; 90+ messages in thread
From: Amey Narkhede @ 2021-03-18 14:31 UTC (permalink / raw)
  To: Pali Rohár
  Cc: alex.williamson, bhelgaas, raphael.norwitz, linux-kernel, linux-pci

On 21/03/17 09:13PM, Pali Rohár wrote:
> On Wednesday 17 March 2021 14:00:20 Alex Williamson wrote:
> > On Wed, 17 Mar 2021 20:40:24 +0100
> > Pali Rohár <pali@kernel.org> wrote:
> >
> > > On Wednesday 17 March 2021 13:32:45 Alex Williamson wrote:
> > > > On Wed, 17 Mar 2021 20:24:24 +0100
> > > > Pali Rohár <pali@kernel.org> wrote:
> > > >
> > > > > On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:
> > > > > > On Wed, 17 Mar 2021 20:02:06 +0100
> > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > >
> > > > > > > On Monday 15 March 2021 09:03:39 Alex Williamson wrote:
> > > > > > > > On Mon, 15 Mar 2021 15:52:38 +0100
> > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > >
> > > > > > > > > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > >
> > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > kernel function (yet).
> > > > > > > > > >
> > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > defined here.
> > > > > > > > >
> > > > > > > > > Ok!
> > > > > > > > >
> > > > > > > > > > Note that with this series the resets available through
> > > > > > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > > > > > >
> > > > > > > > > > Alex
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > But with this patch series, there is still an issue with PCI secondary
> > > > > > > > > bus reset mechanism as exported sysfs attribute does not do that
> > > > > > > > > remove-reset-rescan procedure. As discussed in other thread, this reset
> > > > > > > > > let device in unconfigured / broken state.
> > > > > > > >
> > > > > > > > No, there's not:
> > > > > > > >
> > > > > > > > int pci_reset_function(struct pci_dev *dev)
> > > > > > > > {
> > > > > > > >         int rc;
> > > > > > > >
> > > > > > > >         if (!dev->reset_fn)
> > > > > > > >                 return -ENOTTY;
> > > > > > > >
> > > > > > > >         pci_dev_lock(dev);
> > > > > > > > >>>     pci_dev_save_and_disable(dev);
> > > > > > > >
> > > > > > > >         rc = __pci_reset_function_locked(dev);
> > > > > > > >
> > > > > > > > >>>     pci_dev_restore(dev);
> > > > > > > >         pci_dev_unlock(dev);
> > > > > > > >
> > > > > > > >         return rc;
> > > > > > > > }
> > > > > > > >
> > > > > > > > The remove/re-scan was discussed primarily because your patch performed
> > > > > > > > a bus reset regardless of what devices were affected by that reset and
> > > > > > > > it's difficult to manage the scope where multiple devices are affected.
> > > > > > > > Here, the bus and slot reset functions will fail unless the scope is
> > > > > > > > limited to the single device triggering this reset.  Thanks,
> > > > > > > >
> > > > > > > > Alex
> > > > > > > >
> > > > > > >
> > > > > > > I was thinking a bit more about it and I'm really sure how it would
> > > > > > > behave with hotplugging PCIe bridge.
> > > > > > >
> > > > > > > On aardvark PCIe controller I have already tested that secondary bus
> > > > > > > reset bit is triggering Hot Reset event and then also Link Down event.
> > > > > > > These events are not handled by aardvark driver yet (needs to
> > > > > > > implemented into kernel's emulated root bridge code).
> > > > > > >
> > > > > > > But I'm not sure how it would behave on real HW PCIe hotplugging bridge.
> > > > > > > Kernel has already code which removes PCIe device if it changes presence
> > > > > > > bit (and inform via interrupt). And Link Down event triggers this
> > > > > > > change.
> > > > > >
> > > > > > This is the difference between slot and bus resets, the slot reset is
> > > > > > implemented by the hotplug controller and disables presence detection
> > > > > > around the bus reset.  Thanks,
> > > > >
> > > > > Yes, but I'm talking about bus reset, not about slot reset.
> > > > >
> > > > > I mean: to use bus reset via sysfs on hardware which supports slots and
> > > > > hotplugging.
> > > > >
> > > > > And if I'm reading code correctly, this combination is allowed, right?
> > > > > Via these new patches it is possible to disable slot reset and enable
> > > > > bus reset.
> > > >
> > > > That's true, a slot reset is simply a bus reset wrapped around code
> > > > that prevents the device from getting ejected.
> > >
> > > Yes, this makes slot reset "safe". But bus reset is "unsafe".
> > >
> > > > Maybe it would make
> > > > sense to combine the two as far as this interface is concerned, ie. a
> > > > single "bus" reset method that will always use slot reset when
> > > > available.  Thanks,
> > >
> > > That should work when slot reset is available.
> > >
> > > Other option is that mentioned remove-reset-rescan procedure.
> >
> > That's not something we can introduce to the pci_reset_function() path
> > without a fair bit of collateral in using it through vfio-pci.
> >
> > > But quick search in drivers/pci/hotplug/ results that not all hotplug
> > > drivers implement reset_slot method.
> > >
> > > So there is a possible issue with hotplug driver which may eject device
> > > during bus reset (because e.g. slot reset is not implemented)?
> >
> > People aren't reporting it, so maybe those controllers aren't being
> > used for this use case.  Or maybe introducing this patch will make
> > these reset methods more readily accessible for testing.  We can fix or
> > blacklist those controllers for bus reset when reports come in.  Thanks,
>
> Ok! I do not know neither if those controllers are used, but looks like
> that there are still changes in hotplug code.
>
> So I guess with these patches people can test it and report issues when
> such thing happen.

So after a bit of research, as I understand it, we need to group slot
and bus reset together in a single category of reset methods and
then implicitly use slot reset, if it is available, when bus reset is
enabled by the user.
Is that right?
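
To make sure we mean the same thing, here is a minimal userspace sketch
of that grouping. It is not kernel code: struct fake_dev and the fake_*()
helpers are made-up names for illustration only, and the two flags are
assumptions standing in for "the hotplug driver implements reset_slot"
and "the device is alone on its bus". It only models "prefer slot reset,
fall back to plain bus reset" behind a single "bus" method.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pci_dev; only what this sketch needs. */
struct fake_dev {
	bool has_slot_reset;    /* assumed: hotplug driver provides reset_slot */
	bool has_bus_reset;     /* assumed: reset affects only this device */
	const char *last_reset; /* records which path ran, for illustration */
};

/* Models a slot reset: bus reset with presence detection masked. */
static int fake_slot_reset(struct fake_dev *dev)
{
	if (!dev->has_slot_reset)
		return -ENOTTY; /* method not available */
	dev->last_reset = "slot";
	return 0;
}

/* Models a plain secondary bus reset. */
static int fake_bus_reset(struct fake_dev *dev)
{
	if (!dev->has_bus_reset)
		return -ENOTTY;
	dev->last_reset = "bus";
	return 0;
}

/* Single "bus" method: use slot reset when available, else bus reset. */
static int fake_combined_bus_reset(struct fake_dev *dev)
{
	if (fake_slot_reset(dev) == 0)
		return 0;
	return fake_bus_reset(dev);
}
```

With this shape the user only ever enables or disables one "bus" entry,
and never ends up with bus reset selected on a slot that could have used
the safer slot reset.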

Thanks,
Amey


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-18 14:22                                     ` Amey Narkhede
@ 2021-03-18 14:57                                       ` Leon Romanovsky
  2021-03-18 17:01                                         ` Amey Narkhede
  0 siblings, 1 reply; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-18 14:57 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: alex.williamson, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On Thu, Mar 18, 2021 at 07:52:52PM +0530, Amey Narkhede wrote:
> On 21/03/18 11:09AM, Leon Romanovsky wrote:
> > On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > > On Wed, 17 Mar 2021 15:58:40 +0200
> > > Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > > On Wed, Mar 17, 2021 at 06:47:18PM +0530, Amey Narkhede wrote:
> > > > > On 21/03/17 01:47PM, Leon Romanovsky wrote:
> > > > > > On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> > > > > > > On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > > > > > > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > > > > > > > On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > > > > > > > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > > > > > > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > > > > > > > > Amey Narkhede <ameynarkhede03@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > > > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > > > > > defined here.  Note that with this series the resets available through
> > > > > > > > > > > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > > > > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > > > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Alex,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > > > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > > > > > > > > >
> > > > > > > > > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > > > > > > > > which needed FLR explicitly not sure if that behavior is fixed in
> > > > > > > > > > > > > drivers. Also there is use a case at Nutanix but the engineer who
> > > > > > > > > > > > > is involved is on PTO that is why I did not respond immediately as
> > > > > > > > > > > > > I don't know the details yet.
> > > > > > > > > > > >
> > > > > > > > > > > > And more generally, devices continue to have reset issues and we
> > > > > > > > > > > > impose a fixed priority in our ordering.  We can and probably should
> > > > > > > > > > > > continue to quirk devices when we find broken resets so that we have
> > > > > > > > > > > > the best default behavior, but it's currently not easy for an end user
> > > > > > > > > > > > to experiment, ie. this reset works, that one doesn't.  We might also
> > > > > > > > > > > > have platform issues where a given reset works better on a certain
> > > > > > > > > > > > platform.  Exposing a way to test these things might lead to better
> > > > > > > > > > > > quirks.  In the case I think Pali was looking for, they wanted a
> > > > > > > > > > > > mechanism to force a bus reset, if this was in reference to a single
> > > > > > > > > > > > function device, this could be accomplished by setting a priority for
> > > > > > > > > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > > > > > > > > attribute, but also the reset mechanism used by vfio-pci.  Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Alex
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > To confirm from our end - we have seen many such instances where default
> > > > > > > > > > > reset methods have not worked well on our platform. Debugging these
> > > > > > > > > > > issues is painful in practice, and this interface would make it far
> > > > > > > > > > > easier.
> > > > > > > > > > >
> > > > > > > > > > > Having an interface like this would also help us better communicate the
> > > > > > > > > > > issues we find with upstream. Allowing others to more easily test our
> > > > > > > > > > > (or other entities') findings should give better visibility into
> > > > > > > > > > > which issues apply to the device in general and which are platform
> > > > > > > > > > > specific. In disambiguating the former from the latter, we should be
> > > > > > > > > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > > > > > > > > interface allows for a safer and more elegant solution than any of the
> > > > > > > > > > > current alternatives.
> > > > > > > > > >
> > > > > > > > > > So to summarize, we are talking about test and debug interface to
> > > > > > > > > > overcome HW bugs, am I right?
> > > > > > > > > >
> > > > > > > > > > My personal experience shows that once the easy workaround exists
> > > > > > > > > > (and write to generally available sysfs is very simple), the vendors
> > > > > > > > > > and users desire for proper fix decreases drastically. IMHO, we will
> > > > > > > > > > see increase of copy/paste in SO and blog posts, but reduce in quirks.
> > > > > > > > > >
> > > > > > > > > > My 2-cents.
> > > > > > > > > >
> > > > > > > > > I agree with your point but at least it gives the userspace ability
> > > > > > > > > to use broken device until bug is fixed in upstream.
> > > > > > > >
> > > > > > > > As I said, I don't expect many fixes once "userspace" will be able to
> > > > > > > > use cheap workaround. There is no incentive to fix it.
> > >
> > > We can increase the annoyance factor of using a modified set of reset
> > > methods, but ultimately we can only control what goes into our kernel,
> > > other kernels might take v1 of this series and incorporate it
> > > regardless of what happens here.
> > >
> > > > > > > > > This is also applicable for obscure devices without upstream
> > > > > > > > > drivers for example custom FPGA based devices.
> > > > > > > >
> > > > > > > > This is not relevant to upstream kernel. Those vendors ship everything
> > > > > > > > custom, they don't need upstream, we don't need them :)
> > > > > > > >
> > > > > > > By custom I meant hobbyists who could tinker with their custom FPGA.
> > > > > >
> > > > > > I invite such hobbyists to send patches and include their FPGA in
> > > > > > upstream kernel.
> > >
> > > This is potentially another good use case, how receptive are we going
> > > to be to an FPGA design that botches a reset.  Do they have a valid
> > > device ID for us to base a quirk on, are they just squatting on one, or
> > > using the default from a library.  Maybe the next bitstream will
> > > resolve it, maybe without any external indication.  IOW, what would the
> > > quality level be for that quirk versus using this as a workaround,
> > > where the user probably wouldn't mind a kernel nag?
> >
> > It is worth to solve it when the need arises.
> >
> > >
> > > > > > > > > Another main application which I forgot to mention is virtualization
> > > > > > > > > where vmm wants to reset the device when the guest is reset,
> > > > > > > > > to emulate machine reboot as closely as possible.
> > > > > > > >
> > > > > > > > It can work in very narrow case, because reset will cause to device
> > > > > > > > reprobe and most likely the driver will be different from the one that
> > > > > > > > started reset. I can imagine that net devices will lose their state and
> > > > > > > > config after such reset too.
> > > > > > > >
> > > > > > > Not sure if I got that 100% right. The pci_reset_function() function
> > > > > > > saves and restores device state over the reset.
> > > > > >
> > > > > > I'm talking about netdev state, but whatever given the existence of
> > > > > > sysfs reset knob.
> > > > > >
> > > > > > >
> > > > > > > > IMHO, it will be saner for everyone if virtualization don't try such resets.
> > >
> > > That would cause a massive regression in device assignment support.  As
> > > with other sysfs attributes, triggering them alongside a running driver
> > > is probably not going to end well.  However, pci_reset_function() is
> > > extremely useful for stopping devices and returning them to a default
> > > state, when either rebooting a VM or returning the device to the host.
> > > The device is not removed and re-probed when this occurs, vfio-pci is
> > > able to hold onto the device across these actions.  Sure, don't reset a
> > > netdev device when it's in use, that's not what these are used for.
> > >
> > > > > > > The exists reset sysfs attribute was added for exactly this case
> > > > > > > though.
> > > > > >
> > > > > > I didn't know the rationale behind that file till you said and I
> > > > > > googled libvirt discussion, so ok. Do you propose that libvirt
> > > > > > will manage database of devices and their working reset types?
> > > > > >
> > > > > I don't have much idea about internals of libvirt but why would
> > > > > it need to manage database of working reset types? It could just
> > > > > read new reset_methods attribute to get the list of supported reset
> > > > > methods.
> > > >
> > > > Because the idea of this patch is to read all supported reset types and
> > > > allow to the user to chose the working one. The user will do it with
> > > > help from StackOverflow, but libvirt will need to have some sort of
> > > > database, otherwise it won't be different from simple "echo 1 > reset"
> > > > which will iterate over all supported resets anyway.
> > >
> > > AFAIK, libvirt no longer attempts to do resets itself, or is at least
> > > moving in that direction.  vfio-pci will reset as device when they're
> > > opened by a user (when available) or triggered via the API.
> >
> > <...>
> >
> > > > The difference here is that this is a workaround to solve bugs that
> > > > should be fixed in the kernel.
> > >
> > > If we want to discourage using this as a primary means to resolve reset
> > > issues on a device then we can create log warnings any time it's used.
> > > Downstreams that really want this functionality are going to take this
> > > patch from the list whether we accept it or not.  As above, it seems
> > > there are valid use cases.  Even with mainstream vfio in QEMU, I go
> > > through some hoops trying to determine if I can do a secondary bus
> > > reset rather than a PM reset because it's not specified anywhere what a
> > > "soft reset" means for any given device.  This sort of interface could
> > > make it easier to apply a system policy that a pci_reset_function()
> > > should always perform a secondary bus reset if the only other option is
> > > a PM reset.  Maybe that policy mostly makes sense for a VM use case, so
> > > we'd want one policy by default and another when the device is used for
> > > this functionality.  How could we accomplish that with a quirk?  Thanks,
> >
> > I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
> >
> > If it is latter then we don't really need sysfs, if not, we still need
> > some sort of DB to create second policy, because "supported != working".
> > What am I missing?
> >
> > Thanks
> >
> Can you explain a bit more about why supported != working?

It is written in the commit message of this patch.
https://lore.kernel.org/lkml/20210312173452.3855-1-ameynarkhede03@gmail.com/
"This feature aims to allow greater control of a device for use cases
as device assignment, where specific device or platform issues may
interact poorly with a given reset method, and for which device specific
quirks have not been developed."

You wrote it and also repeated it a couple of times during the discussion.

If a device could understand that a specific reset doesn't work, it
wouldn't perform it in the first place.

Thanks


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-18  9:09                                   ` Leon Romanovsky
  2021-03-18 14:22                                     ` Amey Narkhede
@ 2021-03-18 16:39                                     ` Alex Williamson
  2021-03-18 17:22                                       ` Leon Romanovsky
  1 sibling, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-18 16:39 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Amey Narkhede, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On Thu, 18 Mar 2021 11:09:34 +0200
Leon Romanovsky <leon@kernel.org> wrote:

> On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > On Wed, 17 Mar 2021 15:58:40 +0200
> > Leon Romanovsky <leon@kernel.org> wrote:
> >  
> > > On Wed, Mar 17, 2021 at 06:47:18PM +0530, Amey Narkhede wrote:  
> > > > On 21/03/17 01:47PM, Leon Romanovsky wrote:  
> > > > > On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:  
> > > > > > On 21/03/17 01:02PM, Leon Romanovsky wrote:  
> > > > > > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:  
> > > > > > > > On 21/03/17 06:20AM, Leon Romanovsky wrote:  
> > > > > > > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:  
> > > > > > > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:  
> > > > > > > > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > > > > > > > Amey Narkhede <ameynarkhede03@gmail.com> wrote:
> > > > > > > > > > >  
> > > > > > > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:  
> > > > > > > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:  
> > > > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > > > > > > >  
> > > > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:  
> > > > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > > > > warm reset respectively.  
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > > > > kernel function (yet).  
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > > > > defined here.  Note that with this series the resets available through
> > > > > > > > > > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > > > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > > > > > > > series only enables selection of the existing methods.  Thanks,  
> > > > > > > > > > > > >
> > > > > > > > > > > > > Alex,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > > > > > > > >  
> > > > > > > > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > > > > > > > which needed FLR explicitly not sure if that behavior is fixed in
> > > > > > > > > > > > drivers. Also there is use a case at Nutanix but the engineer who
> > > > > > > > > > > > is involved is on PTO that is why I did not respond immediately as
> > > > > > > > > > > > I don't know the details yet.  
> > > > > > > > > > >
> > > > > > > > > > > And more generally, devices continue to have reset issues and we
> > > > > > > > > > > impose a fixed priority in our ordering.  We can and probably should
> > > > > > > > > > > continue to quirk devices when we find broken resets so that we have
> > > > > > > > > > > the best default behavior, but it's currently not easy for an end user
> > > > > > > > > > > to experiment, ie. this reset works, that one doesn't.  We might also
> > > > > > > > > > > have platform issues where a given reset works better on a certain
> > > > > > > > > > > platform.  Exposing a way to test these things might lead to better
> > > > > > > > > > > quirks.  In the case I think Pali was looking for, they wanted a
> > > > > > > > > > > mechanism to force a bus reset, if this was in reference to a single
> > > > > > > > > > > function device, this could be accomplished by setting a priority for
> > > > > > > > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > > > > > > > attribute, but also the reset mechanism used by vfio-pci.  Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Alex
> > > > > > > > > > >  
> > > > > > > > > >
> > > > > > > > > > To confirm from our end - we have seen many such instances where default
> > > > > > > > > > reset methods have not worked well on our platform. Debugging these
> > > > > > > > > > issues is painful in practice, and this interface would make it far
> > > > > > > > > > easier.
> > > > > > > > > >
> > > > > > > > > > Having an interface like this would also help us better communicate the
> > > > > > > > > > issues we find with upstream. Allowing others to more easily test our
> > > > > > > > > > (or other entities') findings should give better visibility into
> > > > > > > > > > which issues apply to the device in general and which are platform
> > > > > > > > > > specific. In disambiguating the former from the latter, we should be
> > > > > > > > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > > > > > > > interface allows for a safer and more elegant solution than any of the
> > > > > > > > > > current alternatives.  
> > > > > > > > >
> > > > > > > > > So to summarize, we are talking about test and debug interface to
> > > > > > > > > overcome HW bugs, am I right?
> > > > > > > > >
> > > > > > > > > My personal experience shows that once the easy workaround exists
> > > > > > > > > (and write to generally available sysfs is very simple), the vendors
> > > > > > > > > and users desire for proper fix decreases drastically. IMHO, we will
> > > > > > > > > see increase of copy/paste in SO and blog posts, but reduce in quirks.
> > > > > > > > >
> > > > > > > > > My 2-cents.
> > > > > > > > >  
> > > > > > > > I agree with your point but at least it gives the userspace ability
> > > > > > > > to use broken device until bug is fixed in upstream.  
> > > > > > >
> > > > > > > As I said, I don't expect many fixes once "userspace" will be able to
> > > > > > > use cheap workaround. There is no incentive to fix it.  
> >
> > We can increase the annoyance factor of using a modified set of reset
> > methods, but ultimately we can only control what goes into our kernel,
> > other kernels might take v1 of this series and incorporate it
> > regardless of what happens here.
> >  
> > > > > > > > This is also applicable for obscure devices without upstream
> > > > > > > > drivers for example custom FPGA based devices.  
> > > > > > >
> > > > > > > This is not relevant to upstream kernel. Those vendors ship everything
> > > > > > > custom, they don't need upstream, we don't need them :)
> > > > > > >  
> > > > > > By custom I meant hobbyists who could tinker with their custom FPGA.  
> > > > >
> > > > > I invite such hobbyists to send patches and include their FPGA in
> > > > > upstream kernel.  
> >
> > This is potentially another good use case, how receptive are we going
> > to be to an FPGA design that botches a reset.  Do they have a valid
> > device ID for us to base a quirk on, are they just squatting on one, or
> > using the default from a library.  Maybe the next bitstream will
> > resolve it, maybe without any external indication.  IOW, what would the
> > quality level be for that quirk versus using this as a workaround,
> > where the user probably wouldn't mind a kernel nag?  
> 
> It is worth to solve it when the need arises.
> 
> >  
> > > > > > > > Another main application which I forgot to mention is virtualization
> > > > > > > > where vmm wants to reset the device when the guest is reset,
> > > > > > > > to emulate machine reboot as closely as possible.  
> > > > > > >
> > > > > > > It can work in very narrow case, because reset will cause to device
> > > > > > > reprobe and most likely the driver will be different from the one that
> > > > > > > started reset. I can imagine that net devices will lose their state and
> > > > > > > config after such reset too.
> > > > > > >  
> > > > > > Not sure if I got that 100% right. The pci_reset_function() function
> > > > > > saves and restores device state over the reset.  
> > > > >
> > > > > I'm talking about netdev state, but whatever given the existence of
> > > > > sysfs reset knob.
> > > > >  
> > > > > >  
> > > > > > > IMHO, it will be saner for everyone if virtualization don't try such resets.  
> >
> > That would cause a massive regression in device assignment support.  As
> > with other sysfs attributes, triggering them alongside a running driver
> > is probably not going to end well.  However, pci_reset_function() is
> > extremely useful for stopping devices and returning them to a default
> > state, when either rebooting a VM or returning the device to the host.
> > The device is not removed and re-probed when this occurs, vfio-pci is
> > able to hold onto the device across these actions.  Sure, don't reset a
> > netdev device when it's in use, that's not what these are used for.
> >  
> > > > > > The exists reset sysfs attribute was added for exactly this case
> > > > > > though.  
> > > > >
> > > > > I didn't know the rationale behind that file till you said and I
> > > > > googled libvirt discussion, so ok. Do you propose that libvirt
> > > > > will manage database of devices and their working reset types?
> > > > >  
> > > > I don't have much idea about internals of libvirt but why would
> > > > it need to manage database of working reset types? It could just
> > > > read new reset_methods attribute to get the list of supported reset
> > > > methods.  
> > >
> > > Because the idea of this patch is to read all supported reset types and
> > > allow to the user to chose the working one. The user will do it with
> > > help from StackOverflow, but libvirt will need to have some sort of
> > > database, otherwise it won't be different from simple "echo 1 > reset"
> > > which will iterate over all supported resets anyway.  
> >
> > AFAIK, libvirt no longer attempts to do resets itself, or is at least
> > moving in that direction.  vfio-pci will reset as device when they're
> > opened by a user (when available) or triggered via the API.  
> 
> <...>
> 
> > > The difference here is that this is a workaround to solve bugs that
> > > should be fixed in the kernel.  
> >
> > If we want to discourage using this as a primary means to resolve reset
> > issues on a device then we can create log warnings any time it's used.
> > Downstreams that really want this functionality are going to take this
> > patch from the list whether we accept it or not.  As above, it seems
> > there are valid use cases.  Even with mainstream vfio in QEMU, I go
> > through some hoops trying to determine if I can do a secondary bus
> > reset rather than a PM reset because it's not specified anywhere what a
> > "soft reset" means for any given device.  This sort of interface could
> > make it easier to apply a system policy that a pci_reset_function()
> > should always perform a secondary bus reset if the only other option is
> > a PM reset.  Maybe that policy mostly makes sense for a VM use case, so
> > we'd want one policy by default and another when the device is used for
> > this functionality.  How could we accomplish that with a quirk?  Thanks,  
> 
> I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
> 
> If it is latter then we don't really need sysfs, if not, we still need
> some sort of DB to create second policy, because "supported != working".
> What am I missing?

vfio-pci uses the internal kernel API, ie. the variants of
pci_reset_function(), which is the same interface used by the existing
sysfs reset mechanism.  This proposed configuration of the reset method
would affect any driver using that same core infrastructure, and from my
perspective that's really the goal.  In the case where a supported
reset mechanism fails for a device, continuing to quirk those out for
the best default behavior makes sense; I'd be disappointed for a vendor
to not pursue improving the default behavior where it clearly makes
sense.  However, there's also a policy decision: the kernel imposes a
preferential ordering of reset mechanisms.  Is that ordering the best
case for all users?  I've presented above a case where userspace may
want a policy of preferring a bus reset to a PM reset.  So I think the
question is not only whether there are supported mechanisms that don't
work, where this interface allows userspace to more readily identify
and work around those sorts of issues.  The interface also enables user
preference, and makes it easier to evaluate whether all of the
supported reset mechanisms work rather than just the first one we
encounter in the ordering we've decided to impose today.  Thanks,
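
As a rough illustration of the policy point (preferring a bus reset to
a PM reset), here is a small userspace sketch, not code from this
series.  The enum values and model_prefer() are hypothetical names made
up for this example, and the starting order in the usage note is an
assumption chosen for illustration, not the kernel's actual table.

```c
#include <stddef.h>

/* Hypothetical method identifiers, for illustration only. */
enum model_method { MODEL_FLR, MODEL_PM, MODEL_BUS };

/*
 * Reorder order[0..n) so that `preferred` is tried before `over`,
 * keeping the relative order of every other entry unchanged.
 *
 * Returns 0 on success (including when the list is already ordered
 * that way) and -1 if either method is missing from the list.
 */
int model_prefer(enum model_method *order, size_t n,
		 enum model_method preferred, enum model_method over)
{
	size_t ip = n, io = n;	/* n means "not found" */

	for (size_t i = 0; i < n; i++) {
		if (order[i] == preferred && ip == n)
			ip = i;
		if (order[i] == over && io == n)
			io = i;
	}
	if (ip == n || io == n)
		return -1;
	if (ip < io)
		return 0;	/* already tried first, nothing to do */

	/* Shift entries io..ip-1 one slot right, then place `preferred`. */
	for (size_t i = ip; i > io; i--)
		order[i] = order[i - 1];
	order[io] = preferred;
	return 0;
}
```

Starting from an assumed order of FLR, PM, bus, calling
model_prefer(order, 3, MODEL_BUS, MODEL_PM) leaves FLR first and moves
the bus reset ahead of the PM reset.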

Alex



* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-18 14:57                                       ` Leon Romanovsky
@ 2021-03-18 17:01                                         ` Amey Narkhede
  2021-03-18 17:35                                           ` Leon Romanovsky
  0 siblings, 1 reply; 90+ messages in thread
From: Amey Narkhede @ 2021-03-18 17:01 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: alex.williamson, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On 21/03/18 04:57PM, Leon Romanovsky wrote:
> On Thu, Mar 18, 2021 at 07:52:52PM +0530, Amey Narkhede wrote:
> > On 21/03/18 11:09AM, Leon Romanovsky wrote:
> > > On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > > > On Wed, 17 Mar 2021 15:58:40 +0200
> > > > Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > > On Wed, Mar 17, 2021 at 06:47:18PM +0530, Amey Narkhede wrote:
> > > > > > On 21/03/17 01:47PM, Leon Romanovsky wrote:
> > > > > > > On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> > > > > > > > On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > > > > > > > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > > > > > > > > On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > > > > > > > > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > > > > > > > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > > > > > > > > > Amey Narkhede <ameynarkhede03@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > > > > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > > > > > > defined here.  Note that with this series the resets available through
> > > > > > > > > > > > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > > > > > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > > > > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Alex,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > > > > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > > > > > > > > > which needed FLR explicitly not sure if that behavior is fixed in
> > > > > > > > > > > > > > drivers. Also there is use a case at Nutanix but the engineer who
> > > > > > > > > > > > > > is involved is on PTO that is why I did not respond immediately as
> > > > > > > > > > > > > > I don't know the details yet.
> > > > > > > > > > > > >
> > > > > > > > > > > > > And more generally, devices continue to have reset issues and we
> > > > > > > > > > > > > impose a fixed priority in our ordering.  We can and probably should
> > > > > > > > > > > > > continue to quirk devices when we find broken resets so that we have
> > > > > > > > > > > > > the best default behavior, but it's currently not easy for an end user
> > > > > > > > > > > > > to experiment, ie. this reset works, that one doesn't.  We might also
> > > > > > > > > > > > > have platform issues where a given reset works better on a certain
> > > > > > > > > > > > > platform.  Exposing a way to test these things might lead to better
> > > > > > > > > > > > > quirks.  In the case I think Pali was looking for, they wanted a
> > > > > > > > > > > > > mechanism to force a bus reset, if this was in reference to a single
> > > > > > > > > > > > > function device, this could be accomplished by setting a priority for
> > > > > > > > > > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > > > > > > > > > attribute, but also the reset mechanism used by vfio-pci.  Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Alex
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > To confirm from our end - we have seen many such instances where default
> > > > > > > > > > > > reset methods have not worked well on our platform. Debugging these
> > > > > > > > > > > > issues is painful in practice, and this interface would make it far
> > > > > > > > > > > > easier.
> > > > > > > > > > > >
> > > > > > > > > > > > Having an interface like this would also help us better communicate the
> > > > > > > > > > > > issues we find with upstream. Allowing others to more easily test our
> > > > > > > > > > > > (or other entities') findings should give better visibility into
> > > > > > > > > > > > which issues apply to the device in general and which are platform
> > > > > > > > > > > > specific. In disambiguating the former from the latter, we should be
> > > > > > > > > > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > > > > > > > > > interface allows for a safer and more elegant solution than any of the
> > > > > > > > > > > > current alternatives.
> > > > > > > > > > >
> > > > > > > > > > > So to summarize, we are talking about test and debug interface to
> > > > > > > > > > > overcome HW bugs, am I right?
> > > > > > > > > > >
> > > > > > > > > > > My personal experience shows that once the easy workaround exists
> > > > > > > > > > > (and write to generally available sysfs is very simple), the vendors
> > > > > > > > > > > and users desire for proper fix decreases drastically. IMHO, we will
> > > > > > > > > > > see increase of copy/paste in SO and blog posts, but reduce in quirks.
> > > > > > > > > > >
> > > > > > > > > > > My 2-cents.
> > > > > > > > > > >
> > > > > > > > > > I agree with your point but at least it gives the userspace ability
> > > > > > > > > > to use broken device until bug is fixed in upstream.
> > > > > > > > >
> > > > > > > > > As I said, I don't expect many fixes once "userspace" will be able to
> > > > > > > > > use cheap workaround. There is no incentive to fix it.
> > > >
> > > > We can increase the annoyance factor of using a modified set of reset
> > > > methods, but ultimately we can only control what goes into our kernel,
> > > > other kernels might take v1 of this series and incorporate it
> > > > regardless of what happens here.
> > > >
> > > > > > > > > > This is also applicable for obscure devices without upstream
> > > > > > > > > > drivers for example custom FPGA based devices.
> > > > > > > > >
> > > > > > > > > This is not relevant to upstream kernel. Those vendors ship everything
> > > > > > > > > custom, they don't need upstream, we don't need them :)
> > > > > > > > >
> > > > > > > > By custom I meant hobbyists who could tinker with their custom FPGA.
> > > > > > >
> > > > > > > I invite such hobbyists to send patches and include their FPGA in
> > > > > > > upstream kernel.
> > > >
> > > > This is potentially another good use case, how receptive are we going
> > > > to be to an FPGA design that botches a reset.  Do they have a valid
> > > > device ID for us to base a quirk on, are they just squatting on one, or
> > > > using the default from a library.  Maybe the next bitstream will
> > > > resolve it, maybe without any external indication.  IOW, what would the
> > > > quality level be for that quirk versus using this as a workaround,
> > > > where the user probably wouldn't mind a kernel nag?
> > >
> > > It is worth to solve it when the need arises.
> > >
> > > >
> > > > > > > > > > Another main application which I forgot to mention is virtualization
> > > > > > > > > > where vmm wants to reset the device when the guest is reset,
> > > > > > > > > > to emulate machine reboot as closely as possible.
> > > > > > > > >
> > > > > > > > > It can work in very narrow case, because reset will cause to device
> > > > > > > > > reprobe and most likely the driver will be different from the one that
> > > > > > > > > started reset. I can imagine that net devices will lose their state and
> > > > > > > > > config after such reset too.
> > > > > > > > >
> > > > > > > > Not sure if I got that 100% right. The pci_reset_function() function
> > > > > > > > saves and restores device state over the reset.
> > > > > > >
> > > > > > > I'm talking about netdev state, but whatever given the existence of
> > > > > > > sysfs reset knob.
> > > > > > >
> > > > > > > >
> > > > > > > > > IMHO, it will be saner for everyone if virtualization don't try such resets.
> > > >
> > > > That would cause a massive regression in device assignment support.  As
> > > > with other sysfs attributes, triggering them alongside a running driver
> > > > is probably not going to end well.  However, pci_reset_function() is
> > > > extremely useful for stopping devices and returning them to a default
> > > > state, when either rebooting a VM or returning the device to the host.
> > > > The device is not removed and re-probed when this occurs, vfio-pci is
> > > > able to hold onto the device across these actions.  Sure, don't reset a
> > > > netdev device when it's in use, that's not what these are used for.
> > > >
> > > > > > > > The exists reset sysfs attribute was added for exactly this case
> > > > > > > > though.
> > > > > > >
> > > > > > > I didn't know the rationale behind that file till you said and I
> > > > > > > googled libvirt discussion, so ok. Do you propose that libvirt
> > > > > > > will manage database of devices and their working reset types?
> > > > > > >
> > > > > > I don't have much idea about internals of libvirt but why would
> > > > > > it need to manage database of working reset types? It could just
> > > > > > read new reset_methods attribute to get the list of supported reset
> > > > > > methods.
> > > > >
> > > > > Because the idea of this patch is to read all supported reset types and
> > > > > allow to the user to chose the working one. The user will do it with
> > > > > help from StackOverflow, but libvirt will need to have some sort of
> > > > > database, otherwise it won't be different from simple "echo 1 > reset"
> > > > > which will iterate over all supported resets anyway.
> > > >
> > > > AFAIK, libvirt no longer attempts to do resets itself, or is at least
> > > > moving in that direction.  vfio-pci will reset as device when they're
> > > > opened by a user (when available) or triggered via the API.
> > >
> > > <...>
> > >
> > > > > The difference here is that this is a workaround to solve bugs that
> > > > > should be fixed in the kernel.
> > > >
> > > > If we want to discourage using this as a primary means to resolve reset
> > > > issues on a device then we can create log warnings any time it's used.
> > > > Downstreams that really want this functionality are going to take this
> > > > patch from the list whether we accept it or not.  As above, it seems
> > > > there are valid use cases.  Even with mainstream vfio in QEMU, I go
> > > > through some hoops trying to determine if I can do a secondary bus
> > > > reset rather than a PM reset because it's not specified anywhere what a
> > > > "soft reset" means for any given device.  This sort of interface could
> > > > make it easier to apply a system policy that a pci_reset_function()
> > > > should always perform a secondary bus reset if the only other option is
> > > > a PM reset.  Maybe that policy mostly makes sense for a VM use case, so
> > > > we'd want one policy by default and another when the device is used for
> > > > this functionality.  How could we accomplish that with a quirk?  Thanks,
> > >
> > > I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
> > >
> > > If it is latter then we don't really need sysfs, if not, we still need
> > > some sort of DB to create second policy, because "supported != working".
> > > What am I missing?
> > >
> > > Thanks
> > >
> > Can you explain bit more about why supported != working?
>
> It is written in the commit message of this patch.
> https://lore.kernel.org/lkml/20210312173452.3855-1-ameynarkhede03@gmail.com/
> "This feature aims to allow greater control of a device for use cases
> as device assignment, where specific device or platform issues may
> interact poorly with a given reset method, and for which device specific
> quirks have not been developed."
>
> You wrote it and also repeated it a couple of times during the discussion.
>
> If device can understand that specific reset doesn't work, it won't
> perform it in first place.
>
> Thanks
Is it possible for a device to know whether a specific reset will work
before performing it, after it has already indicated support for that
reset method? Maybe there's a problem with that particular piece of
hardware in that machine.
How can a database be maintained if particular machines have a
particular piece of faulty HW?
If for some reason a reset doesn't work it will just give -ENOTTY.
This isn't any different from existing behavior. Actually, it informs
the user that the reset method didn't reset the device, so the user can
pick a different reset method instead of the kernel implicitly using a
different one.
If the user doesn't explicitly set a preferred reset method then
we go ahead with the existing implicit fall-through behavior, which
tries all available reset methods until one of them works.
If you have a device that doesn't support reset at all then you have
the option to disable it completely, unlike the existing reset
attribute where you cannot disable reset. So it gives greater control:
you can disable the reset altogether when a quirk isn't developed yet.
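
To make the behavior described above concrete, here is a rough
userspace model, not code from the series. All names (model_dev,
model_try_method, model_reset_function) are made up for this sketch,
and it assumes that a method which is unsupported or does not take
effect reports -ENOTTY.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model; none of these names come from the patch. */
enum { MODEL_NUM_METHODS = 4 };

struct model_dev {
	int supported[MODEL_NUM_METHODS];	/* device advertises method i */
	int works[MODEL_NUM_METHODS];		/* method i really resets it */
};

/* Try a single method; -ENOTTY if unsupported or it has no effect. */
static int model_try_method(const struct model_dev *dev, int method)
{
	if (!dev->supported[method] || !dev->works[method])
		return -ENOTTY;
	return 0;
}

/*
 * order[] holds method indices in priority order, n entries.
 *   - full default list: implicit fall-through over every method
 *   - a single entry:    only the user's preferred method is tried
 *   - n == 0:            reset is disabled for this device
 * Returns the index of the method that reset the device, or -ENOTTY.
 */
int model_reset_function(const struct model_dev *dev,
			 const int *order, size_t n)
{
	assert(dev != NULL);

	for (size_t i = 0; i < n; i++) {
		int m = order[i];

		if (m < 0 || m >= MODEL_NUM_METHODS)
			continue;	/* ignore invalid entries */
		if (model_try_method(dev, m) == 0)
			return m;
	}
	return -ENOTTY;
}
```

With a full default list and a device whose first method is broken,
the call falls through to the next working method. With only the
broken method listed it returns -ENOTTY, which is the explicit signal
to the user, and an empty list models reset being disabled.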

We can't expect to develop a quirk for every device in existence.
For example, on my laptop the Elantech touchpad still doesn't work in
2021 with a vanilla kernel; Arch Linux applies a patch which was
reverted in the mainline kernel for some reason.

Thanks,
Amey


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-18 16:39                                     ` Alex Williamson
@ 2021-03-18 17:22                                       ` Leon Romanovsky
  2021-03-18 17:38                                         ` Amey Narkhede
  2021-03-18 18:34                                         ` Enrico Weigelt, metux IT consult
  0 siblings, 2 replies; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-18 17:22 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Amey Narkhede, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On Thu, Mar 18, 2021 at 10:39:35AM -0600, Alex Williamson wrote:
> On Thu, 18 Mar 2021 11:09:34 +0200
> Leon Romanovsky <leon@kernel.org> wrote:

<...>

> > I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
> >
> > If it is latter then we don't really need sysfs, if not, we still need
> > some sort of DB to create second policy, because "supported != working".
> > What am I missing?
>
> vfio-pci uses the internal kernel API, ie. the variants of
> pci_reset_function(), which is the same interface used by the existing
> sysfs reset mechanism.  This proposed configuration of the reset method
> would affect any driver using that same core infrastructure and from my
> perspective that's really the goal.  In the case where a supported
> reset mechanism fails for a device, continuing to quirk those out for
> the best default behavior makes sense, I'd be disappointed for a vendor
> to not pursue improving the default behavior where it clearly makes
> sense.  However, there's also a policy decision, the kernel imposes a
> preferential ordering of reset mechanism.  Is that ordering the best
> case for all users?  I've presented above a case where a userspace may
> prefer a policy of preferring a bus reset to a PM reset.  So I think
> the question is not only are there supported mechanisms that don't
> work, where this interface allows userspace to more readily identify
> and work around those sorts of issues, but it also enables user
> preference and easier evaluation whether all of the supported reset
> mechanisms work rather than just the first one we encounter in the
> ordering we've decided to impose today.  Thanks,

Alex,

Which email client do you use?
Your responses are grouped as one huge block, which leaves no chance to
respond to a specific point or to answer your questions.

I see your flow and understand your position, but will repeat my
position. We need to make sure that vendors have an incentive to
supply quirks.

And regarding vendors, see Amey's response below about his touchpad
troubles. The cheap electronics vendors don't care about their users.
Thanks

>
> Alex
>


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-18 17:01                                         ` Amey Narkhede
@ 2021-03-18 17:35                                           ` Leon Romanovsky
  2021-03-18 17:43                                             ` Amey Narkhede
  2021-03-18 17:58                                             ` Enrico Weigelt, metux IT consult
  0 siblings, 2 replies; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-18 17:35 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: alex.williamson, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On Thu, Mar 18, 2021 at 10:31:43PM +0530, Amey Narkhede wrote:
> On 21/03/18 04:57PM, Leon Romanovsky wrote:
> > On Thu, Mar 18, 2021 at 07:52:52PM +0530, Amey Narkhede wrote:
> > > On 21/03/18 11:09AM, Leon Romanovsky wrote:
> > > > On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > > > > On Wed, 17 Mar 2021 15:58:40 +0200
> > > > > Leon Romanovsky <leon@kernel.org> wrote:

<...>

> > > > I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
> > > >
> > > > If it is latter then we don't really need sysfs, if not, we still need
> > > > some sort of DB to create second policy, because "supported != working".
> > > > What am I missing?
> > > >
> > > > Thanks
> > > >
> > > Can you explain bit more about why supported != working?
> >
> > It is written in the commit message of this patch.
> > https://lore.kernel.org/lkml/20210312173452.3855-1-ameynarkhede03@gmail.com/
> > "This feature aims to allow greater control of a device for use cases
> > as device assignment, where specific device or platform issues may
> > interact poorly with a given reset method, and for which device specific
> > quirks have not been developed."
> >
> > You wrote it and also repeated it a couple of times during the discussion.
> >
> > If device can understand that specific reset doesn't work, it won't
> > perform it in first place.
> >
> > Thanks
> Is it possible for device to understand whether or not specific reset
> will work or not prior to performing reset and after it indicates
> support for that reset method? Maybe theres problem with that particular
> piece of hardware in that machine.
> How can database be maintained if a particular machines have
> particular piece of faulty HW?

That is exactly the reason why I think that the VM use case you
presented is not viable.

> If for some reason reset doesn't work it will just give -ENOTTY.
> This isn't any different from existing behavior.Actually it informs user
> that the reset method didn't reset the device and user can use different
> reset method instead of implicitly using different reset method.
> If user doesn't explicitly set preferred reset method then
> we go ahead with existing implicit fall through behavior which will try all
> available reset methods until any one of them works.
> If you have device that doesn't support reset at all then you have
> option to completely disable it unlike existing reset attribute where
> you cannot disable reset. So it gives greater control where you can
> disable the reset altogether when quirk isn't developed yet.

I explicitly asked to hear a use case. So far I got an explanation from
Alex about a policy decision (which doesn't need sysfs) and from you
about overcoming HW bugs, with the expectation that the user will be a
guru of PCI reset methods.

>
> We can't expect to develop quirk for every device in existence.

That doesn't give us an excuse not to try.

> For example on my laptop elantech touchpad still doesn't work in 2021
> with vanilla kernel, arch linux applies the patch which was reverted in
> mainline kernel for some reason.

I see it as a good example of a cheap solution. The vendor won't fix
your touchpad because distros provide a workaround. The same will
happen with reset.

Thanks

>
> Thanks,
> Amey

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-18 17:22                                       ` Leon Romanovsky
@ 2021-03-18 17:38                                         ` Amey Narkhede
  2021-03-18 18:34                                         ` Enrico Weigelt, metux IT consult
  1 sibling, 0 replies; 90+ messages in thread
From: Amey Narkhede @ 2021-03-18 17:38 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: alex.williamson, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On 21/03/18 07:22PM, Leon Romanovsky wrote:
> On Thu, Mar 18, 2021 at 10:39:35AM -0600, Alex Williamson wrote:
> > On Thu, 18 Mar 2021 11:09:34 +0200
> > Leon Romanovsky <leon@kernel.org> wrote:
>
> <...>
>
> > > I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
> > >
> > > If it is latter then we don't really need sysfs, if not, we still need
> > > some sort of DB to create second policy, because "supported != working".
> > > What am I missing?
> >
> > vfio-pci uses the internal kernel API, ie. the variants of
> > pci_reset_function(), which is the same interface used by the existing
> > sysfs reset mechanism.  This proposed configuration of the reset method
> > would affect any driver using that same core infrastructure and from my
> > perspective that's really the goal.  In the case where a supported
> > reset mechanism fails for a device, continuing to quirk those out for
> > the best default behavior makes sense, I'd be disappointed for a vendor
> > to not pursue improving the default behavior where it clearly makes
> > sense.  However, there's also a policy decision, the kernel imposes a
> > preferential ordering of reset mechanism.  Is that ordering the best
> > case for all users?  I've presented above a case where a userspace may
> > prefer a policy of preferring a bus reset to a PM reset.  So I think
> > the question is not only are there supported mechanisms that don't
> > work, where this interface allows userspace to more readily identify
> > and work around those sorts of issues, but it also enables user
> > preference and easier evaluation whether all of the supported reset
> > mechanisms work rather than just the first one we encounter in the
> > ordering we've decided to impose today.  Thanks,
>
>
[...]
> And regarding vendors, see Amey response below about his touchpad troubles.
> The cheap electronics vendors don't care about their users.
>
> Thanks
>
As a side note, that vendor probably doesn't care about
Linux users, because even that reverted patch was submitted
by a community member.
Many vendors are satisfied with Windows-only drivers.
They don't have any reason to support Linux. That doesn't
mean we should also abandon those users.

Thanks,
Amey

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-18 17:35                                           ` Leon Romanovsky
@ 2021-03-18 17:43                                             ` Amey Narkhede
  2021-03-18 18:14                                               ` Enrico Weigelt, metux IT consult
  2021-03-19 13:05                                               ` Leon Romanovsky
  2021-03-18 17:58                                             ` Enrico Weigelt, metux IT consult
  1 sibling, 2 replies; 90+ messages in thread
From: Amey Narkhede @ 2021-03-18 17:43 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: raphael.norwitz, linux-pci, bhelgaas, linux-kernel, alay.shah,
	suresh.gumpula, shyam.rajendran, felipe, alex.williamson

On 21/03/18 07:35PM, Leon Romanovsky wrote:
> On Thu, Mar 18, 2021 at 10:31:43PM +0530, Amey Narkhede wrote:
> > On 21/03/18 04:57PM, Leon Romanovsky wrote:
> > > On Thu, Mar 18, 2021 at 07:52:52PM +0530, Amey Narkhede wrote:
> > > > On 21/03/18 11:09AM, Leon Romanovsky wrote:
> > > > > On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > > > > > On Wed, 17 Mar 2021 15:58:40 +0200
> > > > > > Leon Romanovsky <leon@kernel.org> wrote:
>
> <...>
>
> > > > > I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
> > > > >
> > > > > If it is latter then we don't really need sysfs, if not, we still need
> > > > > some sort of DB to create second policy, because "supported != working".
> > > > > What am I missing?
> > > > >
> > > > > Thanks
> > > > >
> > > > Can you explain bit more about why supported != working?
> > >
> > > It is written in the commit message of this patch.
> > > https://lore.kernel.org/lkml/20210312173452.3855-1-ameynarkhede03@gmail.com/
> > > "This feature aims to allow greater control of a device for use cases
> > > as device assignment, where specific device or platform issues may
> > > interact poorly with a given reset method, and for which device specific
> > > quirks have not been developed."
> > >
> > > You wrote it and also repeated it a couple of times during the discussion.
> > >
> > > If device can understand that specific reset doesn't work, it won't
> > > perform it in first place.
> > >
> > > Thanks
> > Is it possible for device to understand whether or not specific reset
> > will work or not prior to performing reset and after it indicates
> > support for that reset method? Maybe theres problem with that particular
> > piece of hardware in that machine.
> > How can database be maintained if a particular machines have
> > particular piece of faulty HW?
>
> It was exactly the reason why I think that VM usecase presented by
> you is not viable.
>
Well, I didn't present it as a new use case. I just gave an existing
use case based on the existing reset attribute. Nothing new here.
Nothing really changes wrt that use case.
> > If for some reason reset doesn't work it will just give -ENOTTY.
> > This isn't any different from existing behavior.Actually it informs user
> > that the reset method didn't reset the device and user can use different
> > reset method instead of implicitly using different reset method.
> > If user doesn't explicitly set preferred reset method then
> > we go ahead with existing implicit fall through behavior which will try all
> > available reset methods until any one of them works.
> > If you have device that doesn't support reset at all then you have
> > option to completely disable it unlike existing reset attribute where
> > you cannot disable reset. So it gives greater control where you can
> > disable the reset altogether when quirk isn't developed yet.
>
> I explicitly asked to hear usecase, right now, I got an explanation from
> Alex for policy decision (which doesn't need sysfs) and from you about
> overcoming HW bugs with expectation that user will be guru of PCI reset
> methods.
>
> >
> > We can't expect to develop quirk for every device in existence.
>
> It doesn't give us an excuse do not try.
>
> > For example on my laptop elantech touchpad still doesn't work in 2021
> > with vanilla kernel, arch linux applies the patch which was reverted in
> > mainline kernel for some reason.
>
> I see it as a good example of cheap solution. Vendor won't fix your
> touchpad because distros provide workaround. The same will be with reset.
>
> Thanks
>
As mentioned earlier, not all vendors care about Linux, and not
everyone can afford to buy new HW just to run Linux.

Thanks,
Amey

^ permalink raw reply	[flat|nested] 90+ messages in thread
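
The behaviour Amey describes in the quoted text above (implicit fall-through
when no preference is set, -ENOTTY when an explicitly chosen method does not
reset the device, and the option to disable reset altogether) can be modelled
with a short userspace sketch. This is not code from the patch series: the
method names, the reset_device() helper and the convention that an empty
preference list means "reset disabled" are assumptions made purely for
illustration.

```python
import errno


def reset_device(methods, supported, preferred=None):
    """Model of the reset policy discussed in the thread (hypothetical helper).

    methods   -- dict mapping a method name to a callable that returns 0 on
                 success or a negative errno on failure
    supported -- method names the device advertises, in default order
    preferred -- None: fall through all supported methods (existing behaviour)
                 []:   reset disabled by the user (assumed convention)
                 list: try only these methods, in the user's order
    """
    if preferred is None:
        order = list(supported)
    else:
        # Ignore anything the device does not advertise as supported.
        order = [name for name in preferred if name in supported]

    for name in order:
        if methods[name]() == 0:
            return name, 0

    # Nothing was tried, or nothing worked: report -ENOTTY to the caller.
    return None, -errno.ENOTTY
```

With no preference the first working method wins; with an explicit preference
a failing method is reported to the user instead of being silently replaced
by another one, which is the distinction the thread is arguing about.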

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-14 23:55   ` Pali Rohár
  2021-03-15 13:43     ` Amey Narkhede
@ 2021-03-18 17:51     ` Enrico Weigelt, metux IT consult
  1 sibling, 0 replies; 90+ messages in thread
From: Enrico Weigelt, metux IT consult @ 2021-03-18 17:51 UTC (permalink / raw)
  To: Pali Rohár, ameynarkhede03
  Cc: bhelgaas, linux-pci, linux-kernel, alex.williamson, raphael.norwitz

On 15.03.21 00:55, Pali Rohár wrote:

> Moreover for mPCIe form factor cards, boards can share one PERST# signal
> with more PCIe cards and control this signal via GPIO. So asserting
> PERST# GPIO can trigger Warm reset for more PCIe cards, not just one. It
> depends on board or topology.

The PC Engines apu* boards happen to be such candidates: they've got
three m.2 slots, but not all wired in the same way (depending on the actual
model, not all have PCIe wired). Reset lines are driven via GPIO, and
some devices (I recall some LTE basebands) sometimes need an explicit
reset in order to come up properly.

I have to check the schematics for the different models to see how exactly
these GPIOs are wired. (I've got reports that some production lines
don't have them wired at all, but couldn't confirm this on my own.)

BTW: any idea how to inject board-specific reset methods after the
host bridge driver is already active? In my case, apu boards, the
PCI host bridge is probed via ACPI and the apu board driver (which sets
up GPIOs, LEDs, keys, ...) comes much later.


--mtx

-- 
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-18 17:35                                           ` Leon Romanovsky
  2021-03-18 17:43                                             ` Amey Narkhede
@ 2021-03-18 17:58                                             ` Enrico Weigelt, metux IT consult
  2021-03-19 13:07                                               ` Leon Romanovsky
  1 sibling, 1 reply; 90+ messages in thread
From: Enrico Weigelt, metux IT consult @ 2021-03-18 17:58 UTC (permalink / raw)
  To: Leon Romanovsky, Amey Narkhede
  Cc: alex.williamson, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On 18.03.21 18:35, Leon Romanovsky wrote:

> I see it as a good example of cheap solution. Vendor won't fix your
> touchpad because distros provide workaround. The same will be with reset.

Usually, the vendor won't fix it anyway, regardless of any kernel
workarounds.

Most vendors are already completely overstrained w/ anything
software-related. A good reason why we should try to get rid of firmware
as much as we can.

It's really sad. A *decent* vendor would just provide a clean DT and
(actually matching!) schematics. But that's really hard to find, these
days :(


--mtx

-- 
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-18 17:43                                             ` Amey Narkhede
@ 2021-03-18 18:14                                               ` Enrico Weigelt, metux IT consult
  2021-03-19 13:05                                               ` Leon Romanovsky
  1 sibling, 0 replies; 90+ messages in thread
From: Enrico Weigelt, metux IT consult @ 2021-03-18 18:14 UTC (permalink / raw)
  To: Amey Narkhede, Leon Romanovsky
  Cc: raphael.norwitz, linux-pci, bhelgaas, linux-kernel, alay.shah,
	suresh.gumpula, shyam.rajendran, felipe, alex.williamson

On 18.03.21 18:43, Amey Narkhede wrote:

> Well I didn't present it as new use case. I just gave existing
> usecase based on existing reset attribute. Nothing new here.
> Nothing really changes wrt that use case.

As a board driver maintainer, I fully support your case, at least as a
development/debugging aid. And even if people out there play around and find
their own workarounds, these can give us maintainers valuable insights
and save us a lot of time.

> As mentioned earlier not all vendors care about Linux and not
> all of the population can afford to buy new HW just to run Linux.

At least in the x86 world (ARM is *much* better here), even the
(supposedly) Linux-friendly ones often don't really care, especially if
the board isn't the newest model anymore.

Unfortunately, what we do or don't do in the kernel has practically no
influence on board vendor decisions. The best we can practically achieve
on their side is slowing them down on smearing bullshit into FW and ACPI
tables. Even getting some useful documentation from vendors is a really
rare thing.

The ARM world with device tree, of course, is much better (except for closed
consumer devices like "smartphones" or ACPI-poisoned arm64 boxes). At
least for professional embedded boards.


--mtx

-- 
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-18 17:22                                       ` Leon Romanovsky
  2021-03-18 17:38                                         ` Amey Narkhede
@ 2021-03-18 18:34                                         ` Enrico Weigelt, metux IT consult
  2021-03-19 12:59                                           ` Leon Romanovsky
  1 sibling, 1 reply; 90+ messages in thread
From: Enrico Weigelt, metux IT consult @ 2021-03-18 18:34 UTC (permalink / raw)
  To: Leon Romanovsky, Alex Williamson
  Cc: Amey Narkhede, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On 18.03.21 18:22, Leon Romanovsky wrote:

> Which email client do you use?
> Your responses are grouped as one huge block without any chance to respond
> to you on specific point or answer to your question.

I'm reading this thread in Tbird, and threading / quoting all looks
nice.

> I see your flow and understand your position, but will repeat my
> position. We need to make sure that vendors will have incentive to
> supply quirks.

I really doubt we can influence that by any technical decision here in
the kernel.

> And regarding vendors, see Amey response below about his touchpad troubles.
> The cheap electronics vendors don't care about their users.

IMHO, the expensive ones don't care either.

Does e.g. Dell publish board schematics? Do they even publish exact part
lists (exact chipsets) along with their brochures, so customers can
check whether their HW is supported before buying and trying it out?

Doesn't seem so. I've personally seen a lot of cases where some supposedly
supported HW turned out to be some completely different and unsupported
HW that's sold under exactly the same product ID. One of many reasons
for not giving them a single penny anymore.

IMHO, there are only very few chances of convincing some HW vendor to
do a better job on the driver side:

a) product is targeted at a niche that can't live without Linux
    (e.g. embedded)
b) it's really *dangerous* for your market share if anything doesn't
    work properly on Linux (e.g. certain server machines)
c) somebody *really* big (like Google) is gun-pointing at some supplier,
    who's got a lot to lose
d) a *massive* worldwide shitstorm against the vendor

[ And often, even a combination of them isn't enough. Did you know that
   even Google doesn't get all the specs necessary to replace the ugly
   FSP blob? (It's the same w/ AMD, but meanwhile I'm pissed enough to
   reverse engineer their AGESA blob.) ]

You see, what we do here in the kernel has no practical influence on
those hw vendors.


--mtx

-- 
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-18 18:34                                         ` Enrico Weigelt, metux IT consult
@ 2021-03-19 12:59                                           ` Leon Romanovsky
  2021-03-19 13:48                                             ` Enrico Weigelt, metux IT consult
                                                               ` (2 more replies)
  0 siblings, 3 replies; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-19 12:59 UTC (permalink / raw)
  To: Enrico Weigelt, metux IT consult
  Cc: Alex Williamson, Amey Narkhede, raphael.norwitz, linux-pci,
	bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Thu, Mar 18, 2021 at 07:34:56PM +0100, Enrico Weigelt, metux IT consult wrote:
> On 18.03.21 18:22, Leon Romanovsky wrote:
> 
> > Which email client do you use?
> > Your responses are grouped as one huge block without any chance to respond
> > to you on specific point or answer to your question.
> 
> I'm reading this thread in Tbird, and threading / quoting all looks
> nice.

I'm not talking about threading or quoting but about the response itself.
See it here https://lore.kernel.org/lkml/20210318103935.2ec32302@omen.home.shazbot.org/
Alex's response is one big chunk without any separation into paragraphs.

> 
> > I see your flow and understand your position, but will repeat my
> > position. We need to make sure that vendors will have incentive to
> > supply quirks.
> 
> I really doubt we can influence that by any technical decision here in
> the kernel.

There are subsystems that have succeeded in doing it, for example netdev, RDMA, etc.

> 
> > And regarding vendors, see Amey response below about his touchpad troubles.
> > The cheap electronics vendors don't care about their users.
> 
> IMHO, the expensive ones don't care either.
> 
> Does eg. Dell publish board schematics ? Do they even publish exact part
> lists (exact chipsets) along with their brochures, so customers can
> check wether their HW is supported, before buying and trying out ?

They do it because they are allowed to do it and not because they
explicitly want to annoy their customers.

> 
> Doesn't seem so. I've personally seen a lot cases where some supposedly
> supported HW turned out to be some completely different and unsupported
> HW that's sold under exactly the same product ID. One of many reasons
> for not giving them a single penny anymore.
> 
> IMHO, there're only very few changes of convincing some HW vendor for
> doing a better job on driver side:
> 
> a) product is targeted for a niche that can't live without Linux
>    (eg. embedded)
> b) it's really *dangerous* for your market share if anything doesn't
>    work properly on Linux (eg. certan server machines)
> c) somebody *really* big (like Google) is gun-pointing at some supplier,
>    who's got a lot to loose
> d) a *massive* worldwide shitstorm against the vendor
> 
> [ And often, even a combination of them isn't enough. Did you know that
>   even Google doesn't get all specs necessary to replace away the ugly
>   FSP blob ? (it's the same w/ AMD, but meanwhile I'm pissed enought to
>   reverse engineer their AGESA blob). ]

I don't know about this specific Google case, but in my previous experience,
the reasons why a vendor says no to Google are usually licensing and legal
issues, not open source vs. proprietary.

> 
> You see, what we do here in the kernel has no practical influence on
> those hw vendors.

I see it differently, but it doesn't matter. This discussion is too
theoretical for my taste.

> 
> 
> --mtx
> 
> -- 
> ---
> Note: unencrypted e-mails can easily be intercepted and manipulated!
> For confidential communication, please send your GPG/PGP key.
> ---
> Enrico Weigelt, metux IT consult
> Free software and Linux embedded engineering
> info@metux.net -- +49-151-27565287

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-18 17:43                                             ` Amey Narkhede
  2021-03-18 18:14                                               ` Enrico Weigelt, metux IT consult
@ 2021-03-19 13:05                                               ` Leon Romanovsky
  2021-03-19 15:23                                                 ` Amey Narkhede
  1 sibling, 1 reply; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-19 13:05 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: raphael.norwitz, linux-pci, bhelgaas, linux-kernel, alay.shah,
	suresh.gumpula, shyam.rajendran, felipe, alex.williamson

On Thu, Mar 18, 2021 at 11:13:44PM +0530, Amey Narkhede wrote:
> On 21/03/18 07:35PM, Leon Romanovsky wrote:
> > On Thu, Mar 18, 2021 at 10:31:43PM +0530, Amey Narkhede wrote:
> > > On 21/03/18 04:57PM, Leon Romanovsky wrote:
> > > > On Thu, Mar 18, 2021 at 07:52:52PM +0530, Amey Narkhede wrote:
> > > > > On 21/03/18 11:09AM, Leon Romanovsky wrote:
> > > > > > On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > > > > > > On Wed, 17 Mar 2021 15:58:40 +0200
> > > > > > > Leon Romanovsky <leon@kernel.org> wrote:
> >
> > <...>
> >
> > > > > > I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
> > > > > >
> > > > > > If it is latter then we don't really need sysfs, if not, we still need
> > > > > > some sort of DB to create second policy, because "supported != working".
> > > > > > What am I missing?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > Can you explain bit more about why supported != working?
> > > >
> > > > It is written in the commit message of this patch.
> > > > https://lore.kernel.org/lkml/20210312173452.3855-1-ameynarkhede03@gmail.com/
> > > > "This feature aims to allow greater control of a device for use cases
> > > > as device assignment, where specific device or platform issues may
> > > > interact poorly with a given reset method, and for which device specific
> > > > quirks have not been developed."
> > > >
> > > > You wrote it and also repeated it a couple of times during the discussion.
> > > >
> > > > If device can understand that specific reset doesn't work, it won't
> > > > perform it in first place.
> > > >
> > > > Thanks
> > > Is it possible for device to understand whether or not specific reset
> > > will work or not prior to performing reset and after it indicates
> > > support for that reset method? Maybe theres problem with that particular
> > > piece of hardware in that machine.
> > > How can database be maintained if a particular machines have
> > > particular piece of faulty HW?
> >
> > It was exactly the reason why I think that VM usecase presented by
> > you is not viable.
> >
> Well I didn't present it as new use case. I just gave existing
> usecase based on existing reset attribute. Nothing new here.
> Nothing really changes wrt that use case.

Of course it is new. Please see Alex's response: he said that vfio uses
the in-kernel API and not sysfs.

> > > If for some reason reset doesn't work it will just give -ENOTTY.
> > > This isn't any different from existing behavior.Actually it informs user
> > > that the reset method didn't reset the device and user can use different
> > > reset method instead of implicitly using different reset method.
> > > If user doesn't explicitly set preferred reset method then
> > > we go ahead with existing implicit fall through behavior which will try all
> > > available reset methods until any one of them works.
> > > If you have device that doesn't support reset at all then you have
> > > option to completely disable it unlike existing reset attribute where
> > > you cannot disable reset. So it gives greater control where you can
> > > disable the reset altogether when quirk isn't developed yet.
> >
> > I explicitly asked to hear usecase, right now, I got an explanation from
> > Alex for policy decision (which doesn't need sysfs) and from you about
> > overcoming HW bugs with expectation that user will be guru of PCI reset
> > methods.
> >
> > >
> > > We can't expect to develop quirk for every device in existence.
> >
> > It doesn't give us an excuse do not try.
> >
> > > For example on my laptop elantech touchpad still doesn't work in 2021
> > > with vanilla kernel, arch linux applies the patch which was reverted in
> > > mainline kernel for some reason.
> >
> > I see it as a good example of cheap solution. Vendor won't fix your
> > touchpad because distros provide workaround. The same will be with reset.
> >
> > Thanks
> >
> As mentioned earlier not all vendors care about Linux and not
> all of the population can afford to buy new HW just to run Linux.

Sorry, but you are not consistent. At the beginning, we talked about new HW
that has bugs but doesn't have quirks yet. Here we are talking about old HW
that still doesn't have quirks.

Thanks

> 
> Thanks,
> Amey

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-18 17:58                                             ` Enrico Weigelt, metux IT consult
@ 2021-03-19 13:07                                               ` Leon Romanovsky
  0 siblings, 0 replies; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-19 13:07 UTC (permalink / raw)
  To: Enrico Weigelt, metux IT consult
  Cc: Amey Narkhede, alex.williamson, raphael.norwitz, linux-pci,
	bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Thu, Mar 18, 2021 at 06:58:25PM +0100, Enrico Weigelt, metux IT consult wrote:
> On 18.03.21 18:35, Leon Romanovsky wrote:
> 
> > I see it as a good example of cheap solution. Vendor won't fix your
> > touchpad because distros provide workaround. The same will be with reset.
> 
> Usually, vendor won't fix it, anyways, regardless of any kernel
> workarounds.

It is not only vendors; enthusiasts won't fix it either, because their
distro works.

Thanks

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-19 12:59                                           ` Leon Romanovsky
@ 2021-03-19 13:48                                             ` Enrico Weigelt, metux IT consult
  2021-03-19 15:51                                               ` Leon Romanovsky
  2021-03-19 15:57                                             ` Bjorn Helgaas
  2021-03-19 16:23                                             ` Alex Williamson
  2 siblings, 1 reply; 90+ messages in thread
From: Enrico Weigelt, metux IT consult @ 2021-03-19 13:48 UTC (permalink / raw)
  To: Leon Romanovsky, Enrico Weigelt, metux IT consult
  Cc: Alex Williamson, Amey Narkhede, raphael.norwitz, linux-pci,
	bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On 19.03.21 13:59, Leon Romanovsky wrote:

>> I really doubt we can influence that by any technical decision here in
>> the kernel.
> 
> There are subsystems that succeeded to do it, for example netdev, RDMA e.t.c.

I'd guess either high-end / server or embedded products. I already
mentioned that these are different fields. I've been talking about the
average consumer products.

OTOH, there are also very expensive vendors that are exceptionally bad,
e.g. National Instruments (who are even capable of breaking rpm so badly
with their proprietary packages that they open up 0day holes; I once
filed a report @FD on such a case).

>> IMHO, the expensive ones don't care either.
>>
>> Does eg. Dell publish board schematics ? Do they even publish exact part
>> lists (exact chipsets) along with their brochures, so customers can
>> check wether their HW is supported, before buying and trying out ?
> 
> They do it because they are allowed to do it and not because they
> explicitly want to annoyance their customers.

Yes, they're just ignorant. They can still do that, because people buy their
pretty expensive cheap-hardware. And that's mostly driven by purchase
people inside the customer organisations, who just don't care how much
damage they do to their own employers by dictating the purchase of
expensive broken-by-design hardware. ... but that's nothing we here have
any influence on - except for dissuasion and purchase boycott ...

In any case, I still fail to see why giving operators a debug knob
should make anything worse.

>> [ And often, even a combination of them isn't enough. Did you know that
>>    even Google doesn't get all specs necessary to replace away the ugly
>>    FSP blob ? (it's the same w/ AMD, but meanwhile I'm pissed enought to
>>    reverse engineer their AGESA blob). ]
> 
> I don't know about this specific Google case, but from my previous experience.
> The reasons why vendor says no to Google are usually due to licensing and legal
> issues and not open source vs. proprietary.

In short: Google did (still does?) build their own mainboards and
FW (IIRC that's where LinuxBoot came from), but even with their HUGE
quantities (they buy CPUs in quantities of truck loads) they still did
not manage to get any specs for writing their own early init w/o the
proprietary FSP.

The licensing / legal issues can either be:

a) we, the mighty Intel Corp., have been so extremely stupid in
    licensing some vital IP stuff (what exactly could that be, in exactly
    the prime domain of Intel?) and signing such insane contracts, that
    we're not allowed to tell anybody how to actually use our own
    products (yes: initializing the CPU and built-in interfaces belongs
    exactly in that category)
b) we, the mighty Intel Corp., couldn't build something on our own, but
    just stole IP (in our primary domain) and are scared that anybody
    could find out from just reading some early setup code.
c) we, the mighty Intel Corp., rule the world and we give a phrack on
    what some tiny customers like Google want from us.
d) we, the mighty Intel Corp., did do what our name tells: INTEL,
    and we don't want anybody to raise unpleasant questions.


choose your poison :P


--mtx

-- 
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-19 13:05                                               ` Leon Romanovsky
@ 2021-03-19 15:23                                                 ` Amey Narkhede
  2021-03-19 15:37                                                   ` Leon Romanovsky
  0 siblings, 1 reply; 90+ messages in thread
From: Amey Narkhede @ 2021-03-19 15:23 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: raphael.norwitz, linux-pci, bhelgaas, linux-kernel, alay.shah,
	suresh.gumpula, shyam.rajendran, felipe, alex.williamson

On 21/03/19 03:05PM, Leon Romanovsky wrote:
> On Thu, Mar 18, 2021 at 11:13:44PM +0530, Amey Narkhede wrote:
> > On 21/03/18 07:35PM, Leon Romanovsky wrote:
> > > On Thu, Mar 18, 2021 at 10:31:43PM +0530, Amey Narkhede wrote:
> > > > On 21/03/18 04:57PM, Leon Romanovsky wrote:
> > > > > On Thu, Mar 18, 2021 at 07:52:52PM +0530, Amey Narkhede wrote:
> > > > > > On 21/03/18 11:09AM, Leon Romanovsky wrote:
> > > > > > > On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > > > > > > > On Wed, 17 Mar 2021 15:58:40 +0200
> > > > > > > > Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > <...>
> > >
> > > > > > > I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
> > > > > > >
> > > > > > > If it is latter then we don't really need sysfs, if not, we still need
> > > > > > > some sort of DB to create second policy, because "supported != working".
> > > > > > > What am I missing?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > Can you explain bit more about why supported != working?
> > > > >
> > > > > It is written in the commit message of this patch.
> > > > > https://lore.kernel.org/lkml/20210312173452.3855-1-ameynarkhede03@gmail.com/
> > > > > "This feature aims to allow greater control of a device for use cases
> > > > > as device assignment, where specific device or platform issues may
> > > > > interact poorly with a given reset method, and for which device specific
> > > > > quirks have not been developed."
> > > > >
> > > > > You wrote it and also repeated it a couple of times during the discussion.
> > > > >
> > > > > If device can understand that specific reset doesn't work, it won't
> > > > > perform it in first place.
> > > > >
> > > > > Thanks
> > > > Is it possible for device to understand whether or not specific reset
> > > > will work or not prior to performing reset and after it indicates
> > > > support for that reset method? Maybe theres problem with that particular
> > > > piece of hardware in that machine.
> > > > How can database be maintained if a particular machines have
> > > > particular piece of faulty HW?
> > >
> > > It was exactly the reason why I think that VM usecase presented by
> > > you is not viable.
> > >
> > Well I didn't present it as new use case. I just gave existing
> > usecase based on existing reset attribute. Nothing new here.
> > Nothing really changes wrt that use case.
>
> Of course it is new, please see Alex's response, he said that vfio uses
> in-kernel API and not sysfs.
>
Still it doesn't change in-kernel API either.
> > > > If for some reason reset doesn't work it will just give -ENOTTY.
> > > > This isn't any different from existing behavior.Actually it informs user
> > > > that the reset method didn't reset the device and user can use different
> > > > reset method instead of implicitly using different reset method.
> > > > If user doesn't explicitly set preferred reset method then
> > > > we go ahead with existing implicit fall through behavior which will try all
> > > > available reset methods until any one of them works.
> > > > If you have device that doesn't support reset at all then you have
> > > > option to completely disable it unlike existing reset attribute where
> > > > you cannot disable reset. So it gives greater control where you can
> > > > disable the reset altogether when quirk isn't developed yet.
> > >
> > > I explicitly asked to hear usecase, right now, I got an explanation from
> > > Alex for policy decision (which doesn't need sysfs) and from you about
> > > overcoming HW bugs with expectation that user will be guru of PCI reset
> > > methods.
> > >
> > > >
> > > > We can't expect to develop quirk for every device in existence.
> > >
> > > It doesn't give us an excuse do not try.
> > >
> > > > For example on my laptop elantech touchpad still doesn't work in 2021
> > > > with vanilla kernel, arch linux applies the patch which was reverted in
> > > > mainline kernel for some reason.
> > >
> > > I see it as a good example of cheap solution. Vendor won't fix your
> > > touchpad because distros provide workaround. The same will be with reset.
> > >
> > > Thanks
> > >
> > As mentioned earlier not all vendors care about Linux and not
> > all of the population can afford to buy new HW just to run Linux.
>
> Sorry, but you are not consistent. At the beginning, we talked about new HW
> that has bugs but don't have quirks yet. Here we are talking about old HW
> that still doesn't have quirks.
>
> Thanks
>
Does it really matter whether the HW is old or new?
If old HW doesn't have quirks yet, how can we expect
new HW to have quirks? What if the new HW is made by the same vendors
who don't have any interest in Linux?

Thanks,
Amey

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-19 15:23                                                 ` Amey Narkhede
@ 2021-03-19 15:37                                                   ` Leon Romanovsky
  2021-03-19 15:53                                                     ` Amey Narkhede
  0 siblings, 1 reply; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-19 15:37 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: raphael.norwitz, linux-pci, bhelgaas, linux-kernel, alay.shah,
	suresh.gumpula, shyam.rajendran, felipe, alex.williamson

On Fri, Mar 19, 2021 at 08:53:17PM +0530, Amey Narkhede wrote:
> On 21/03/19 03:05PM, Leon Romanovsky wrote:

<...>

> > > > It was exactly the reason why I think that VM usecase presented by
> > > > you is not viable.
> > > >
> > > Well I didn't present it as new use case. I just gave existing
> > > usecase based on existing reset attribute. Nothing new here.
> > > Nothing really changes wrt that use case.
> >
> > Of course it is new, please see Alex's response, he said that vfio uses
> > in-kernel API and not sysfs.
> >
> Still it doesn't change in-kernel API either.

Right, but the issue is with user space part of this proposal and not
in-kernel API.


<...>

> > > As mentioned earlier not all vendors care about Linux and not
> > > all of the population can afford to buy new HW just to run Linux.
> >
> > Sorry, but you are not consistent. At the beginning, we talked about new HW
> > that has bugs but don't have quirks yet. Here we are talking about old HW
> > that still doesn't have quirks.
> >
> > Thanks
> >
> Does it really matter whether HW is old or new?
> If old HW doesn't have quirks yet how can we expect
> new one to have quirks? What if new HW is made by same vendors
> who don't have any interest in Linux?

It is pretty clear that this sysfs won't improve quirks situation but
has all potential to reduce their amount even more.

Let's stop this discussion here.

Thanks

> 
> Thanks,
> Amey

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-19 13:48                                             ` Enrico Weigelt, metux IT consult
@ 2021-03-19 15:51                                               ` Leon Romanovsky
  0 siblings, 0 replies; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-19 15:51 UTC (permalink / raw)
  To: Enrico Weigelt, metux IT consult
  Cc: Alex Williamson, Amey Narkhede, raphael.norwitz, linux-pci,
	bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Fri, Mar 19, 2021 at 02:48:12PM +0100, Enrico Weigelt, metux IT consult wrote:
> On 19.03.21 13:59, Leon Romanovsky wrote:

<...>

> In any case, I still fail to see why giving operators an debug knob
> should make anything worse.

I see this patch as a workaround that lets people stop providing quirks for
reset issues. As a way forward, we can make this sysfs visible only for
DEBUG/EXPERT .config builds. What do you think?

> 
> > > [ And often, even a combination of them isn't enough. Did you know that
> > >    even Google doesn't get all specs necessary to replace away the ugly
> > >    FSP blob ? (it's the same w/ AMD, but meanwhile I'm pissed enough to
> > >    reverse engineer their AGESA blob). ]
> > 
> > I don't know about this specific Google case, but from my previous experience.
> > The reasons why vendor says no to Google are usually due to licensing and legal
> > issues and not open source vs. proprietary.
> 
> In short words: Google did (still does?) build their own mainboards and
> FW (IIRC that's where LinuxBoot came from), but even with their HUGE
> quantities (they buy cpus in quantities of truck loads) they still did
> not manage to get any specs for writing their own early init w/o the
> proprietary FSP.
> 
> The licensing / legal issues can either be:
> 
> a) we, the mighty Intel Corp., have been so extremely stupid for
>    licensing some vital IP stuff (what exactly could that be, in exactly
>    the prime domain of Intel ?) and signing such insane contracts, that
>    we're not allowed to tell anybody how to actually use our own
>    products (yes: initializing the CPU and built-in interfaces belongs
>    exactly into that category)
> b) we, the mighty Intel Corp., couldn't build something on our own, but
>    just stole IP (in our primary domain) and are scared that anybody
>    could find out from just reading some early setup code.
> c) we, the mighty Intel Corp., rule the world and we give a phrack on
>    what some tiny Customers like Google want from us.
> d) we, the mighty Intel Corp., did do what our name tells: INTEL,
>    and we don't want anybody to raise unpleasant questions.

I would say
 e) We, Intel, have fixes and optimization logic (patented or specific to different
 customers) that is applicable to our HW and we can't open it to Google because it
 will be used against us, in procurement and development. See the recent article about
 an ex-Intel employee who used such information when placing bids at Microsoft.
 https://www.usnews.com/news/best-states/oregon/articles/2021-02-08/intel-sues-engineer-who-went-to-microsoft-over-trade-secrets

> 
> 
> choose your poison :P
> 
> 
> --mtx
> 
> -- 
> ---
> Note: unencrypted e-mails can easily be intercepted and manipulated!
> For confidential communication, please send your GPG/PGP key.
> ---
> Enrico Weigelt, metux IT consult
> Free software and Linux embedded engineering
> info@metux.net -- +49-151-27565287

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-19 15:37                                                   ` Leon Romanovsky
@ 2021-03-19 15:53                                                     ` Amey Narkhede
  0 siblings, 0 replies; 90+ messages in thread
From: Amey Narkhede @ 2021-03-19 15:53 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: raphael.norwitz, linux-pci, bhelgaas, linux-kernel, alay.shah,
	suresh.gumpula, shyam.rajendran, felipe, alex.williamson

On 21/03/19 05:37PM, Leon Romanovsky wrote:
> On Fri, Mar 19, 2021 at 08:53:17PM +0530, Amey Narkhede wrote:
> > On 21/03/19 03:05PM, Leon Romanovsky wrote:
>
> <...>
>
> > > > > It was exactly the reason why I think that VM usecase presented by
> > > > > you is not viable.
> > > > >
> > > > Well I didn't present it as new use case. I just gave existing
> > > > usecase based on existing reset attribute. Nothing new here.
> > > > Nothing really changes wrt that use case.
> > >
> > > Of course it is new, please see Alex's response, he said that vfio uses
> > > in-kernel API and not sysfs.
> > >
> > Still it doesn't change in-kernel API either.
>
> Right, but the issue is with user space part of this proposal and not
> in-kernel API.
The userspace part just enhances the existing reset attribute; still no
significant changes there.
>
>
> <...>
>
> > > > As mentioned earlier not all vendors care about Linux and not
> > > > all of the population can afford to buy new HW just to run Linux.
> > >
> > > Sorry, but you are not consistent. At the beginning, we talked about new HW
> > > that has bugs but don't have quirks yet. Here we are talking about old HW
> > > that still doesn't have quirks.
> > >
> > > Thanks
> > >
> > Does it really matter whether HW is old or new?
> > If old HW doesn't have quirks yet how can we expect
> > new one to have quirks? What if new HW is made by same vendors
> > who don't have any interest in Linux?
>
> It is pretty clear that this sysfs won't improve quirks situation but
> has all potential to reduce their amount even more.
>
> Let's stop this discussion here.
>
> Thanks
>
IMO it does improve the usability of devices, which I consider to be more
important than developing quirks, which in the end are just bandages and
not a HW fix. There's no point in using Linux if
I can't use the device in the first place, and expecting to wait
for some community member to develop a quirk without vendor support
is simply unrealistic.
So let's stop this discussion here.

Thanks,
Amey

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-19 12:59                                           ` Leon Romanovsky
  2021-03-19 13:48                                             ` Enrico Weigelt, metux IT consult
@ 2021-03-19 15:57                                             ` Bjorn Helgaas
  2021-03-19 16:24                                               ` Leon Romanovsky
  2021-03-19 16:23                                             ` Alex Williamson
  2 siblings, 1 reply; 90+ messages in thread
From: Bjorn Helgaas @ 2021-03-19 15:57 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Enrico Weigelt, metux IT consult, Alex Williamson, Amey Narkhede,
	raphael.norwitz, linux-pci, bhelgaas, linux-kernel, alay.shah,
	suresh.gumpula, shyam.rajendran, felipe

On Fri, Mar 19, 2021 at 02:59:47PM +0200, Leon Romanovsky wrote:
> On Thu, Mar 18, 2021 at 07:34:56PM +0100, Enrico Weigelt, metux IT consult wrote:
> > On 18.03.21 18:22, Leon Romanovsky wrote:
> > 
> > > Which email client do you use?  Your responses are grouped as
> > > one huge block without any chance to respond to you on specific
> > > point or answer to your question.
> > 
> > I'm reading this thread in Tbird, and threading / quoting all
> > looks nice.
> 
> I'm not talking about threading or quoting but about response
> itself.  See it here
> https://lore.kernel.org/lkml/20210318103935.2ec32302@omen.home.shazbot.org/
> Alex's response is one big chunk without any separations to
> paragraphs.

Don't make this harder than it needs to be.  I think it's totally
acceptable to just split Alex's text where you need to respond.  For
example, Alex wrote this:

  vfio-pci uses the internal kernel API, ie. the variants of
  pci_reset_function(), which is the same interface used by the existing
  sysfs reset mechanism.  This proposed configuration of the reset method
  would affect any driver using that same core infrastructure and from my
  perspective that's really the goal.  ...

If I wanted to respond to the first sentence, I would just do this:

aw> vfio-pci uses the internal kernel API, ie. the variants of
aw> pci_reset_function(), which is the same interface used by the existing
aw> sysfs reset mechanism.  

I would write my response to the above here.  The rest of the quote
continues on below.  If the rest of Alex's message isn't relevant to
my response, I would remove it completely.

aw> This proposed configuration of the reset method
aw> would affect any driver using that same core infrastructure and from my
aw> perspective that's really the goal.  ...

Bjorn

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-19 12:59                                           ` Leon Romanovsky
  2021-03-19 13:48                                             ` Enrico Weigelt, metux IT consult
  2021-03-19 15:57                                             ` Bjorn Helgaas
@ 2021-03-19 16:23                                             ` Alex Williamson
  2021-03-20  9:10                                               ` Leon Romanovsky
  2 siblings, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-19 16:23 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Enrico Weigelt, metux IT consult, Amey Narkhede, raphael.norwitz,
	linux-pci, bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Fri, 19 Mar 2021 14:59:47 +0200
Leon Romanovsky <leon@kernel.org> wrote:

> On Thu, Mar 18, 2021 at 07:34:56PM +0100, Enrico Weigelt, metux IT consult wrote:
> > On 18.03.21 18:22, Leon Romanovsky wrote:
> >   
> > > Which email client do you use?
> > > Your responses are grouped as one huge block without any chance to respond
> > > to you on specific point or answer to your question.  
> > 
> > I'm reading this thread in Tbird, and threading / quoting all looks
> > nice.  
> 
> I'm not talking about threading or quoting but about response itself.
> See it here https://lore.kernel.org/lkml/20210318103935.2ec32302@omen.home.shazbot.org/
> Alex's response is one big chunk without any separations to paragraphs.

I've never known paragraph breaks to be required to interject a reply.

Back on topic...

> >   
> > > I see your flow and understand your position, but will repeat my
> > > position. We need to make sure that vendors will have incentive to
> > > supply quirks.  

What if we taint the kernel or pci_warn() for cases where either all
the reset methods are disabled, ie. 'echo none > reset_method', or any
time a device specific method is disabled?

I'd almost go so far as to prevent disabling a device specific reset
altogether, but for example should a device specific reset that fixes
an aspect of FLR behavior prevent using a bus reset?  I'd prefer in that
case if direct FLR were disabled via a device flag introduced with the
quirk and the remaining resets can still be selected by preference.

Theoretically all the other reset methods work and are available, it's
only a policy decision which to use, right?

If a device probes for a reset that's broken and distros start
including systemd scripts to apply a preference to avoid it, (a) that
enables them to work with existing kernels, and (b) indicates to us to
add the trivial quirk to flag that reset as broken.

The other side of the argument that this discourages quirks is that
this interface actually makes it significantly easier to report specific
reset methods as broken for a given device.

Thanks,
Alex


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-19 15:57                                             ` Bjorn Helgaas
@ 2021-03-19 16:24                                               ` Leon Romanovsky
  0 siblings, 0 replies; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-19 16:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Enrico Weigelt, metux IT consult, Alex Williamson, Amey Narkhede,
	raphael.norwitz, linux-pci, bhelgaas, linux-kernel, alay.shah,
	suresh.gumpula, shyam.rajendran, felipe

On Fri, Mar 19, 2021 at 10:57:11AM -0500, Bjorn Helgaas wrote:
> On Fri, Mar 19, 2021 at 02:59:47PM +0200, Leon Romanovsky wrote:
> > On Thu, Mar 18, 2021 at 07:34:56PM +0100, Enrico Weigelt, metux IT consult wrote:
> > > On 18.03.21 18:22, Leon Romanovsky wrote:
> > > 
> > > > Which email client do you use?  Your responses are grouped as
> > > > one huge block without any chance to respond to you on specific
> > > > point or answer to your question.
> > > 
> > > I'm reading this thread in Tbird, and threading / quoting all
> > > looks nice.
> > 
> > I'm not talking about threading or quoting but about response
> > itself.  See it here
> > https://lore.kernel.org/lkml/20210318103935.2ec32302@omen.home.shazbot.org/
> > Alex's response is one big chunk without any separations to
> > paragraphs.
> 
> Don't make this harder than it needs to be.  I think it's totally
> acceptable to just split Alex's text where you need to respond.  For
> example, Alex wrote this:
> 
>   vfio-pci uses the internal kernel API, ie. the variants of
>   pci_reset_function(), which is the same interface used by the existing
>   sysfs reset mechanism.  This proposed configuration of the reset method
>   would affect any driver using that same core infrastructure and from my
>   perspective that's really the goal.  ...
> 
> If I wanted to respond to the first sentence, I would just do this:
> 
> aw> vfio-pci uses the internal kernel API, ie. the variants of
> aw> pci_reset_function(), which is the same interface used by the existing
> aw> sysfs reset mechanism.  
> 
> I would write my response to the above here.  The rest of the quote
> continues on below.  If the rest of Alex's message isn't relevant to
> my response, I would remove it completely.
> 
> aw> This proposed configuration of the reset method
> aw> would affect any driver using that same core infrastructure and from my
> aw> perspective that's really the goal.  ...
> 
> Bjorn

Thanks Bjorn, you showed me how to respond to such messages; however,
I was more concerned that my setup needs some adjustment and that I am
the only one who sees it as one chunk.

Thanks


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-19 16:23                                             ` Alex Williamson
@ 2021-03-20  9:10                                               ` Leon Romanovsky
  2021-03-20 14:59                                                 ` Alex Williamson
  0 siblings, 1 reply; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-20  9:10 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Enrico Weigelt, metux IT consult, Amey Narkhede, raphael.norwitz,
	linux-pci, bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:
> On Fri, 19 Mar 2021 14:59:47 +0200
> Leon Romanovsky <leon@kernel.org> wrote:
> 
> > On Thu, Mar 18, 2021 at 07:34:56PM +0100, Enrico Weigelt, metux IT consult wrote:
> > > On 18.03.21 18:22, Leon Romanovsky wrote:
> > >   
> > > > Which email client do you use?
> > > > Your responses are grouped as one huge block without any chance to respond
> > > > to you on specific point or answer to your question.  
> > > 
> > > I'm reading this thread in Tbird, and threading / quoting all looks
> > > nice.  
> > 
> > I'm not talking about threading or quoting but about response itself.
> > See it here https://lore.kernel.org/lkml/20210318103935.2ec32302@omen.home.shazbot.org/
> > Alex's response is one big chunk without any separations to paragraphs.
> 
> I've never known paragraph breaks to be required to interject a reply.

Of course not, but as Bjorn said, if you don't do paragraphs, we will
need to manually break your message, fix the ">" quotation marks and half
sentences.

I just wanted to be sure that this is not my mail client.

> 
> Back on topic...
> 
> > >   
> > > > I see your flow and understand your position, but will repeat my
> > > > position. We need to make sure that vendors will have incentive to
> > > > supply quirks.  
> 
> What if we taint the kernel or pci_warn() for cases where either all
> the reset methods are disabled, ie. 'echo none > reset_method', or any
> time a device specific method is disabled?

What does "none" mean? Does it mean nothing is supported? If yes, I think that
pci_warn() will be enough. At least for me, taint is useful during debug stages;
if the device doesn't crash, probably no one will look at /proc/sys/kernel/tainted.

> 
> I'd almost go so far as to prevent disabling a device specific reset
> altogether, but for example should a device specific reset that fixes
> an aspect of FLR behavior prevent using a bus reset?  I'd prefer in that
> case if direct FLR were disabled via a device flag introduced with the
> quirk and the remaining resets can still be selected by preference.

I don't know enough to discuss the PCI details, but you raised a good point.
This sysfs is a user-visible API that is presented as is from the device's point
of view. It can easily run into problems if PCI/core doesn't work with
the user's choice.

> 
> Theoretically all the other reset methods work and are available, it's
> only a policy decision which to use, right?

But this patch was presented as a way to overcome situations where
supported != working and user magically knows which reset type to set.

If you want this patch to be a policy decision tool,
it will need to accept "reset_type1,reset_type2,..." sort of input,
so fallback will work natively.

I think that it will be a much more robust and cleaner solution than it is now.
Something like this:
cat /sys/..../reset_policy
reset_type1,reset_type2,...,reset_typeX
echo "reset_type3,reset_type1" > /sys/..../reset_policy
cat /sys/..../reset_policy
reset_type3,reset_type1
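
The ordered, comma-separated input shown above could be parsed roughly as
follows. This is only a userspace sketch, not code from the patch: the
method names in reset_names[] and the helper names lookup_method() and
parse_reset_policy() are assumptions made for illustration. It rejects
unknown names, duplicates and empty tokens, and tolerates the trailing
newline that echo appends.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical method names, for illustration only. */
static const char *const reset_names[] = {
	"device_specific", "flr", "pm", "bus"
};
#define NUM_RESET_METHODS (sizeof(reset_names) / sizeof(reset_names[0]))

/* Return the index of the method named by name[0..len), or -1 if unknown. */
static int lookup_method(const char *name, size_t len)
{
	size_t i;

	for (i = 0; i < NUM_RESET_METHODS; i++)
		if (strlen(reset_names[i]) == len &&
		    !strncmp(reset_names[i], name, len))
			return (int)i;
	return -1;
}

/*
 * Parse "a,b,c" into order[], preserving the order given by the user.
 * Returns the number of entries stored, or -1 on an unknown name,
 * a duplicate, an empty token, or more than max entries.
 */
static int parse_reset_policy(const char *buf, int *order, size_t max)
{
	unsigned int seen = 0;	/* bitmask of methods already listed */
	const char *p = buf;
	size_t n = 0;

	for (;;) {
		const char *comma = strchr(p, ',');
		/* The last token ends at a newline or at the end of the string. */
		size_t len = comma ? (size_t)(comma - p) : strcspn(p, "\n");
		int idx = lookup_method(p, len);

		if (idx < 0 || (seen & (1u << idx)) || n == max)
			return -1;
		seen |= 1u << idx;
		order[n++] = idx;
		if (!comma)
			break;
		p = comma + 1;
	}
	return (int)n;
}
```

With the assumed table, parse_reset_policy("bus,flr\n", order, 4) would
store the indices of "bus" and "flr" in that order, which a reset path
could then walk as its fallback sequence.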

Thanks

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-20  9:10                                               ` Leon Romanovsky
@ 2021-03-20 14:59                                                 ` Alex Williamson
  2021-03-21  8:40                                                   ` Leon Romanovsky
  0 siblings, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-20 14:59 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Enrico Weigelt, metux IT consult, Amey Narkhede, raphael.norwitz,
	linux-pci, bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Sat, 20 Mar 2021 11:10:08 +0200
Leon Romanovsky <leon@kernel.org> wrote:
> On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote: 
> > 
> > What if we taint the kernel or pci_warn() for cases where either all
> > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > time a device specific method is disabled?  
> 
> What does it mean "none"? Does it mean nothing supported? If yes, I think that
> pci_warn() will be enough. At least for me, taint is usable during debug stages,
> probably if device doesn't crash no one will look to see /proc/sys/kernel/tainted.

"none" as implemented in this patch, clearing the enabled function
reset methods.

> > I'd almost go so far as to prevent disabling a device specific reset
> > altogether, but for example should a device specific reset that fixes
> > an aspect of FLR behavior prevent using a bus reset?  I'd prefer in that
> > case if direct FLR were disabled via a device flag introduced with the
> > quirk and the remaining resets can still be selected by preference.  
> 
> I don't know enough to discuss the PCI details, but you raised good point.
> This sysfs is user visible API that is presented as is from device point
> of view. It can be easily run into problems if PCI/core doesn't work with
> user's choice.
> 
> > 
> > Theoretically all the other reset methods work and are available, it's
> > only a policy decision which to use, right?  
> 
> But this patch was presented as a way to overcome situations where
> supported != working and user magically knows which reset type to set.

It's not magic, the new sysfs attributes expose which resets are
enabled and the order that they're used, the user can simply select the
next one.  Being able to bypass a broken reset method is a helpful side
effect of getting to select a preferred reset method.

> If you want to take this patch to be policy decision tool,
> it will need to accept "reset_type1,reset_type2,..." sort of input,
> so fallback will work natively.

I don't see that as a requirement.  We have fall-through support in the
kernel, but for a given device we're really only ever going to make use
of one of those methods.  If a user knows enough about a device to have
a preference, I think it can be singular.  That also significantly
simplifies the interface and supporting code.  Thanks,

Alex
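
For readers following the argument, the behavior discussed in this
sub-thread (a single user-selected method, otherwise the existing
fall-through) can be sketched in plain C. This is not the patch's code:
struct fake_dev, reset_fn_t, NO_PREFERENCE, the two stub methods and
do_function_reset() are all hypothetical names, and treating -ENOTTY as
"this method did not reset the device, try the next one" is an assumption
based on the description earlier in the thread.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; counts completed resets. */
struct fake_dev {
	int resets_done;
};

typedef int (*reset_fn_t)(struct fake_dev *dev);

#define NO_PREFERENCE (-1)

/* Illustrative method that reports it cannot reset this device. */
static int reset_unsupported(struct fake_dev *dev)
{
	(void)dev;
	return -ENOTTY;
}

/* Illustrative method that always succeeds. */
static int reset_works(struct fake_dev *dev)
{
	dev->resets_done++;
	return 0;
}

/*
 * If the user selected one method, use only that method and report its
 * result, so a failure is visible instead of being silently papered over.
 * Without a preference, fall through the methods in priority order until
 * one of them returns something other than -ENOTTY.
 */
static int do_function_reset(struct fake_dev *dev, const reset_fn_t *methods,
			     size_t n, int preferred)
{
	size_t i;
	int rc;

	if (preferred != NO_PREFERENCE) {
		if (preferred < 0 || (size_t)preferred >= n ||
		    !methods[preferred])
			return -ENOTTY;
		return methods[preferred](dev);
	}

	for (i = 0; i < n; i++) {
		if (!methods[i])
			continue;	/* method disabled or not probed */
		rc = methods[i](dev);
		if (rc != -ENOTTY)
			return rc;
	}
	return -ENOTTY;
}
```

In this sketch, disabling every method (the "none" case) corresponds to an
array of NULL entries, for which the function returns -ENOTTY.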


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-20 14:59                                                 ` Alex Williamson
@ 2021-03-21  8:40                                                   ` Leon Romanovsky
  2021-03-21 14:57                                                     ` Amey Narkhede
  2021-03-22 17:10                                                     ` Alex Williamson
  0 siblings, 2 replies; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-21  8:40 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Enrico Weigelt, metux IT consult, Amey Narkhede, raphael.norwitz,
	linux-pci, bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Sat, Mar 20, 2021 at 08:59:42AM -0600, Alex Williamson wrote:
> On Sat, 20 Mar 2021 11:10:08 +0200
> Leon Romanovsky <leon@kernel.org> wrote:
> > On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote: 
> > > 
> > > What if we taint the kernel or pci_warn() for cases where either all
> > > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > > time a device specific method is disabled?  
> > 
> > What does it mean "none"? Does it mean nothing supported? If yes, I think that
> > pci_warn() will be enough. At least for me, taint is usable during debug stages,
> > probably if device doesn't crash no one will look to see /proc/sys/kernel/tainted.
> 
> "none" as implemented in this patch, clearing the enabled function
> reset methods.

It is far from intuitive, the empty string will be easier to understand,
because "none" means no reset at all.

> 
> > > I'd almost go so far as to prevent disabling a device specific reset
> > > altogether, but for example should a device specific reset that fixes
> > > an aspect of FLR behavior prevent using a bus reset?  I'd prefer in that
> > > case if direct FLR were disabled via a device flag introduced with the
> > > quirk and the remaining resets can still be selected by preference.  
> > 
> > I don't know enough to discuss the PCI details, but you raised good point.
> > This sysfs is user visible API that is presented as is from device point
> > of view. It can be easily run into problems if PCI/core doesn't work with
> > user's choice.
> > 
> > > 
> > > Theoretically all the other reset methods work and are available, it's
> > > only a policy decision which to use, right?  
> > 
> > But this patch was presented as a way to overcome situations where
> > supported != working and user magically knows which reset type to set.
> 
> It's not magic, the new sysfs attributes expose which resets are
> enabled and the order that they're used, the user can simply select the
> next one.  Being able to bypass a broken reset method is a helpful side
> effect of getting to select a preferred reset method.

Magic in the sense that the user has no idea what those resets mean; the
expectation is that he will blindly iterate until something works.

> 
> > If you want to take this patch to be policy decision tool,
> > it will need to accept "reset_type1,reset_type2,..." sort of input,
> > so fallback will work natively.
> 
> I don't see that as a requirement.  We have fall-through support in the
> kernel, but for a given device we're really only ever going to make use
> of one of those methods.  If a user knows enough about a device to have
> a preference, I think it can be singular.  That also significantly
> simplifies the interface and supporting code.  Thanks,

I'm struggling to get requirements from this thread. You talked about
a policy decision to override the fallback mechanism, Amey wanted to avoid
quirks.

Do you have an example of such devices, or are we talking about
a theoretical case?

And I don't see why a simple line parser with a loop iterating over strchr()
suddenly becomes complicated code.

Thanks

> 
> Alex
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-21  8:40                                                   ` Leon Romanovsky
@ 2021-03-21 14:57                                                     ` Amey Narkhede
  2021-03-22 17:10                                                     ` Alex Williamson
  1 sibling, 0 replies; 90+ messages in thread
From: Amey Narkhede @ 2021-03-21 14:57 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: alex.williamson, info, raphael.norwitz, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On 21/03/21 10:40AM, Leon Romanovsky wrote:
> On Sat, Mar 20, 2021 at 08:59:42AM -0600, Alex Williamson wrote:
> > On Sat, 20 Mar 2021 11:10:08 +0200
> > Leon Romanovsky <leon@kernel.org> wrote:
> > > On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:
> > > >
> > > > What if we taint the kernel or pci_warn() for cases where either all
> > > > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > > > time a device specific method is disabled?
> > >
> > > What does it mean "none"? Does it mean nothing supported? If yes, I think that
> > > pci_warn() will be enough. At least for me, taint is usable during debug stages,
> > > probably if device doesn't crash no one will look to see /proc/sys/kernel/tainted.
> >
> > "none" as implemented in this patch, clearing the enabled function
> > reset methods.
>
> It is far from intuitive, the empty string will be easier to understand,
> because "none" means no reset at all.
>
> >
> > > > I'd almost go so far as to prevent disabling a device specific reset
> > > > altogether, but for example should a device specific reset that fixes
> > > > an aspect of FLR behavior prevent using a bus reset?  I'd prefer in that
> > > > case if direct FLR were disabled via a device flag introduced with the
> > > > quirk and the remaining resets can still be selected by preference.
> > >
> > > I don't know enough to discuss the PCI details, but you raised good point.
> > > This sysfs is user visible API that is presented as is from device point
> > > of view. It can be easily run into problems if PCI/core doesn't work with
> > > user's choice.
> > >
> > > >
> > > > Theoretically all the other reset methods work and are available, it's
> > > > only a policy decision which to use, right?
> > >
> > > But this patch was presented as a way to overcome situations where
> > > supported != working and user magically knows which reset type to set.
> >
> > It's not magic, the new sysfs attributes expose which resets are
> > enabled and the order that they're used, the user can simply select the
> > next one.  Being able to bypass a broken reset method is a helpful side
> > effect of getting to select a preferred reset method.
>
> Magic in a sense that user has no idea what those resets mean, the
> expectation is that he will blindly iterate till something works.
>
> >
> > > If you want to take this patch to be policy decision tool,
> > > it will need to accept "reset_type1,reset_type2,..." sort of input,
> > > so fallback will work natively.
> >
> > I don't see that as a requirement.  We have fall-through support in the
> > kernel, but for a given device we're really only ever going to make use
> > of one of those methods.  If a user knows enough about a device to have
> > a preference, I think it can be singular.  That also significantly
> > simplifies the interface and supporting code.  Thanks,
>
> I'm struggling to get requirements from this thread. You talked about
> policy decision to overtake fallback mechanism, Amey wanted to avoid
> quirks.
Just to clarify, I don't want to avoid quirks. I just want the device
to be usable even if it doesn't have a quirk, since the quirk for that
particular device may never be developed, for the various reasons
mentioned earlier.
[...]

Thanks,
Amey


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-21  8:40                                                   ` Leon Romanovsky
  2021-03-21 14:57                                                     ` Amey Narkhede
@ 2021-03-22 17:10                                                     ` Alex Williamson
  2021-03-24 10:03                                                       ` Leon Romanovsky
  1 sibling, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-22 17:10 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Enrico Weigelt, metux IT consult, Amey Narkhede, raphael.norwitz,
	linux-pci, bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Sun, 21 Mar 2021 10:40:55 +0200
Leon Romanovsky <leon@kernel.org> wrote:

> On Sat, Mar 20, 2021 at 08:59:42AM -0600, Alex Williamson wrote:
> > On Sat, 20 Mar 2021 11:10:08 +0200
> > Leon Romanovsky <leon@kernel.org> wrote:  
> > > On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:   
> > > > 
> > > > What if we taint the kernel or pci_warn() for cases where either all
> > > > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > > > time a device specific method is disabled?    
> > > 
> > > What does it mean "none"? Does it mean nothing supported? If yes, I think that
> > > pci_warn() will be enough. At least for me, taint is usable during debug stages,
> > > probably if device doesn't crash no one will look to see /proc/sys/kernel/tainted.  
> > 
> > "none" as implemented in this patch, clearing the enabled function
> > reset methods.  
> 
> It is far from intuitive, the empty string will be easier to understand,
> because "none" means no reset at all.

"No reset at all" is what "none" achieves, the
pci_dev.reset_methods_enabled bitmap is cleared.  We can use an empty
string, but I think we want a way to clear all enabled resets and a way
to return it to the default.  I could see arguments for an empty string
serving either purpose, so this version proposed explicitly using
"none" and "default", as included in the ABI update.

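A minimal sketch of the two keywords, assuming a hypothetical
struct reset_state and helper apply_keyword(), and assuming that
"default" means re-enabling every method that probed as supported (the
patch may define the default differently):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the per-device state discussed in the thread. */
struct reset_state {
	unsigned int supported;	/* methods that probed as present */
	unsigned int enabled;	/* methods currently allowed */
};

/*
 * Apply one of the two special keywords.
 * Returns 0 on success, -1 if the keyword is not recognised
 * (in which case the state is left untouched).
 */
static int apply_keyword(struct reset_state *st, const char *word)
{
	assert(st != NULL && word != NULL);

	if (strcmp(word, "none") == 0) {
		st->enabled = 0;		/* no reset at all */
		return 0;
	}
	if (strcmp(word, "default") == 0) {
		st->enabled = st->supported;	/* assumed meaning of "default" */
		return 0;
	}
	return -1;
}
```

Treating the empty string as an error here keeps the two operations
explicit, which is the reason given above for not overloading it.
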
> > > > I'd almost go so far as to prevent disabling a device specific reset
> > > > altogether, but for example should a device specific reset that fixes
> > > > an aspect of FLR behavior prevent using a bus reset?  I'd prefer in that
> > > > case if direct FLR were disabled via a device flag introduced with the
> > > > quirk and the remaining resets can still be selected by preference.    
> > > 
> > > I don't know enough to discuss the PCI details, but you raised good point.
> > > This sysfs is user visible API that is presented as is from device point
> > > of view. It can be easily run into problems if PCI/core doesn't work with
> > > user's choice.
> > >   
> > > > 
> > > > Theoretically all the other reset methods work and are available, it's
> > > > only a policy decision which to use, right?    
> > > 
> > > But this patch was presented as a way to overcome situations where
> > > supported != working and user magically knows which reset type to set.  
> > 
> > It's not magic, the new sysfs attributes expose which resets are
> > enabled and the order that they're used, the user can simply select the
> > next one.  Being able to bypass a broken reset method is a helpful side
> > effect of getting to select a preferred reset method.  
> 
> Magic in a sense that user has no idea what those resets mean, the
> expectation is that he will blindly iterate till something works.

Which ought to actually be a safe thing to do.  We should have quirks to
exclude resets that are known broken but still probe as present and I'd
be perfectly fine if we issue a warning if the user disables all resets
for a given device.
 
> > > If you want to take this patch to be policy decision tool,
> > > it will need to accept "reset_type1,reset_type2,..." sort of input,
> > > so fallback will work natively.  
> > 
> > I don't see that as a requirement.  We have fall-through support in the
> > kernel, but for a given device we're really only ever going to make use
> > of one of those methods.  If a user knows enough about a device to have
> > a preference, I think it can be singular.  That also significantly
> > simplifies the interface and supporting code.  Thanks,  
> 
> I'm struggling to get requirements from this thread. You talked about
> policy decision to overtake fallback mechanism, Amey wanted to avoid
> quirks.
> 
> Do you have an example of such devices or we are talking about
> theoretical case?

Look at any device that already has a reset quirk and the process it
took to get there.  Those are more than just theoretical cases.

For policy preference, I already described how I've configured QEMU to
prefer a bus reset rather than a PM reset due to lack of specification
regarding the scope of a PM "soft reset".  This interface would allow a
system policy to do that same thing.

I don't think anyone is suggesting this as a means to avoid quirks that
would resolve reset issues and create the best default general behavior.
This provides a mechanism to test various reset methods, and thereby
identify broken methods, and set a policy.  Sure, that policy might be
to avoid a broken reset in the interim before it gets quirked and
there's potential for abuse there, but I think the benefits outweigh
the risks.

> And I don't see why simple line parser with loop iterator over strchr()
> suddenly becomes complicated code.

Setting multiple bits in a bitmap is easy.  How do you then go on to
allow the user to specify an ordering preference?  If you have an
algorithm you'd like to propose that allows the user to manage the
ordering when enabling multiple methods without substantially
increasing the complexity, please share.  IMO, a given device will
generally use one reset method and it seems sufficient to restrict user
preference to achieve all the use cases I've noted.  Thanks,
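
To illustrate the ordering problem under stated assumptions (the method
indices below are hypothetical and not from the patch): once a list of
methods is folded into a bitmap, two different preference orders become
indistinguishable, so the order would have to be stored separately.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical method indices, used only for this illustration. */
enum { RESET_FLR, RESET_PM, RESET_BUS };

/* Fold an ordered list of method indices into a bitmap. */
static unsigned int to_bitmap(const int *methods, size_t n)
{
	unsigned int map = 0;

	assert(methods != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		map |= 1u << methods[i];
	return map;
}
```

Calling to_bitmap() on { RESET_BUS, RESET_FLR } and on
{ RESET_FLR, RESET_BUS } yields the same value, which is why a bitmap
alone cannot carry a user-specified ordering.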

Alex



* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-18 14:31                             ` Amey Narkhede
@ 2021-03-23 14:34                               ` Pali Rohár
  2021-03-23 14:44                                 ` Alex Williamson
  0 siblings, 1 reply; 90+ messages in thread
From: Pali Rohár @ 2021-03-23 14:34 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: alex.williamson, bhelgaas, raphael.norwitz, linux-kernel, linux-pci

On Thursday 18 March 2021 20:01:55 Amey Narkhede wrote:
> On 21/03/17 09:13PM, Pali Rohár wrote:
> > On Wednesday 17 March 2021 14:00:20 Alex Williamson wrote:
> > > On Wed, 17 Mar 2021 20:40:24 +0100
> > > Pali Rohár <pali@kernel.org> wrote:
> > >
> > > > On Wednesday 17 March 2021 13:32:45 Alex Williamson wrote:
> > > > > On Wed, 17 Mar 2021 20:24:24 +0100
> > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > >
> > > > > > On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:
> > > > > > > On Wed, 17 Mar 2021 20:02:06 +0100
> > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > >
> > > > > > > > On Monday 15 March 2021 09:03:39 Alex Williamson wrote:
> > > > > > > > > On Mon, 15 Mar 2021 15:52:38 +0100
> > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > > > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > >
> > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > >
> > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > defined here.
> > > > > > > > > >
> > > > > > > > > > Ok!
> > > > > > > > > >
> > > > > > > > > > > Note that with this series the resets available through
> > > > > > > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Alex
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > But with this patch series, there is still an issue with PCI secondary
> > > > > > > > > > bus reset mechanism as exported sysfs attribute does not do that
> > > > > > > > > > remove-reset-rescan procedure. As discussed in other thread, this reset
> > > > > > > > > > let device in unconfigured / broken state.
> > > > > > > > >
> > > > > > > > > No, there's not:
> > > > > > > > >
> > > > > > > > > int pci_reset_function(struct pci_dev *dev)
> > > > > > > > > {
> > > > > > > > >         int rc;
> > > > > > > > >
> > > > > > > > >         if (!dev->reset_fn)
> > > > > > > > >                 return -ENOTTY;
> > > > > > > > >
> > > > > > > > >         pci_dev_lock(dev);
> > > > > > > > > >>>     pci_dev_save_and_disable(dev);
> > > > > > > > >
> > > > > > > > >         rc = __pci_reset_function_locked(dev);
> > > > > > > > >
> > > > > > > > > >>>     pci_dev_restore(dev);
> > > > > > > > >         pci_dev_unlock(dev);
> > > > > > > > >
> > > > > > > > >         return rc;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > The remove/re-scan was discussed primarily because your patch performed
> > > > > > > > > a bus reset regardless of what devices were affected by that reset and
> > > > > > > > > it's difficult to manage the scope where multiple devices are affected.
> > > > > > > > > Here, the bus and slot reset functions will fail unless the scope is
> > > > > > > > > limited to the single device triggering this reset.  Thanks,
> > > > > > > > >
> > > > > > > > > Alex
> > > > > > > > >
> > > > > > > >
> > > > > > > > I was thinking a bit more about it and I'm really sure how it would
> > > > > > > > behave with hotplugging PCIe bridge.
> > > > > > > >
> > > > > > > > On aardvark PCIe controller I have already tested that secondary bus
> > > > > > > > reset bit is triggering Hot Reset event and then also Link Down event.
> > > > > > > > These events are not handled by aardvark driver yet (needs to
> > > > > > > > implemented into kernel's emulated root bridge code).
> > > > > > > >
> > > > > > > > But I'm not sure how it would behave on real HW PCIe hotplugging bridge.
> > > > > > > > Kernel has already code which removes PCIe device if it changes presence
> > > > > > > > bit (and inform via interrupt). And Link Down event triggers this
> > > > > > > > change.
> > > > > > >
> > > > > > > This is the difference between slot and bus resets, the slot reset is
> > > > > > > implemented by the hotplug controller and disables presence detection
> > > > > > > around the bus reset.  Thanks,
> > > > > >
> > > > > > Yes, but I'm talking about bus reset, not about slot reset.
> > > > > >
> > > > > > I mean: to use bus reset via sysfs on hardware which supports slots and
> > > > > > hotplugging.
> > > > > >
> > > > > > And if I'm reading code correctly, this combination is allowed, right?
> > > > > > Via these new patches it is possible to disable slot reset and enable
> > > > > > bus reset.
> > > > >
> > > > > That's true, a slot reset is simply a bus reset wrapped around code
> > > > > that prevents the device from getting ejected.
> > > >
> > > > Yes, this makes slot reset "safe". But bus reset is "unsafe".
> > > >
> > > > > Maybe it would make
> > > > > sense to combine the two as far as this interface is concerned, ie. a
> > > > > single "bus" reset method that will always use slot reset when
> > > > > available.  Thanks,
> > > >
> > > > That should work when slot reset is available.
> > > >
> > > > Other option is that mentioned remove-reset-rescan procedure.
> > >
> > > That's not something we can introduce to the pci_reset_function() path
> > > without a fair bit of collateral in using it through vfio-pci.
> > >
> > > > But quick search in drivers/pci/hotplug/ results that not all hotplug
> > > > drivers implement reset_slot method.
> > > >
> > > > So there is a possible issue with hotplug driver which may eject device
> > > > during bus reset (because e.g. slot reset is not implemented)?
> > >
> > > People aren't reporting it, so maybe those controllers aren't being
> > > used for this use case.  Or maybe introducing this patch will make
> > > these reset methods more readily accessible for testing.  We can fix or
> > > blacklist those controllers for bus reset when reports come in.  Thanks,
> >
> > Ok! I do not know neither if those controllers are used, but looks like
> > that there are still changes in hotplug code.
> >
> > So I guess with these patches people can test it and report issues when
> > such thing happen.
> So after a bit research as I understood we need to group slot
> and bus reset together in a single category of reset methods and
> then implicitly use slot reset if it is available when bus reset is
> enabled by the user.
> Is that right?

Yes, I understand it the same way. I just do not know which name to
choose for this reset category. In the PCI spec it is called Secondary Bus
Reset (as it resets the whole bus with all devices, but this patch series
allows it only if exactly one device is connected to the bus).
In the PCIe spec it is called Hot Reset. And if the kernel detects slot
support, it currently calls it a slot reset. But it is still the same thing.
Any opinion? I think we could call it Hot Reset, as this patch
series exports it only for a single device (so calling it _bus_ is not the
best match).


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-23 14:34                               ` Pali Rohár
@ 2021-03-23 14:44                                 ` Alex Williamson
  2021-03-23 15:32                                   ` Amey Narkhede
  0 siblings, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-23 14:44 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Amey Narkhede, bhelgaas, raphael.norwitz, linux-kernel, linux-pci

On Tue, 23 Mar 2021 15:34:19 +0100
Pali Rohár <pali@kernel.org> wrote:

> On Thursday 18 March 2021 20:01:55 Amey Narkhede wrote:
> > On 21/03/17 09:13PM, Pali Rohár wrote:  
> > > On Wednesday 17 March 2021 14:00:20 Alex Williamson wrote:  
> > > > On Wed, 17 Mar 2021 20:40:24 +0100
> > > > Pali Rohár <pali@kernel.org> wrote:
> > > >  
> > > > > On Wednesday 17 March 2021 13:32:45 Alex Williamson wrote:  
> > > > > > On Wed, 17 Mar 2021 20:24:24 +0100
> > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > >  
> > > > > > > On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:  
> > > > > > > > On Wed, 17 Mar 2021 20:02:06 +0100
> > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > >  
> > > > > > > > > On Monday 15 March 2021 09:03:39 Alex Williamson wrote:  
> > > > > > > > > > On Mon, 15 Mar 2021 15:52:38 +0100
> > > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > > >  
> > > > > > > > > > > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:  
> > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > > > > >  
> > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:  
> > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > > warm reset respectively.  
> > > > > > > > > > > > >
> > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > > kernel function (yet).  
> > > > > > > > > > > >
> > > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > > defined here.  
> > > > > > > > > > >
> > > > > > > > > > > Ok!
> > > > > > > > > > >  
> > > > > > > > > > > > Note that with this series the resets available through
> > > > > > > > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Alex
> > > > > > > > > > > >  
> > > > > > > > > > >
> > > > > > > > > > > But with this patch series, there is still an issue with PCI secondary
> > > > > > > > > > > bus reset mechanism as exported sysfs attribute does not do that
> > > > > > > > > > > remove-reset-rescan procedure. As discussed in other thread, this reset
> > > > > > > > > > > let device in unconfigured / broken state.  
> > > > > > > > > >
> > > > > > > > > > No, there's not:
> > > > > > > > > >
> > > > > > > > > > int pci_reset_function(struct pci_dev *dev)
> > > > > > > > > > {
> > > > > > > > > >         int rc;
> > > > > > > > > >
> > > > > > > > > >         if (!dev->reset_fn)
> > > > > > > > > >                 return -ENOTTY;
> > > > > > > > > >
> > > > > > > > > >         pci_dev_lock(dev);  
> > > > > > > > > > >>>     pci_dev_save_and_disable(dev);  
> > > > > > > > > >
> > > > > > > > > >         rc = __pci_reset_function_locked(dev);
> > > > > > > > > >  
> > > > > > > > > > >>>     pci_dev_restore(dev);  
> > > > > > > > > >         pci_dev_unlock(dev);
> > > > > > > > > >
> > > > > > > > > >         return rc;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > The remove/re-scan was discussed primarily because your patch performed
> > > > > > > > > > a bus reset regardless of what devices were affected by that reset and
> > > > > > > > > > it's difficult to manage the scope where multiple devices are affected.
> > > > > > > > > > Here, the bus and slot reset functions will fail unless the scope is
> > > > > > > > > > limited to the single device triggering this reset.  Thanks,
> > > > > > > > > >
> > > > > > > > > > Alex
> > > > > > > > > >  
> > > > > > > > >
> > > > > > > > > I was thinking a bit more about it and I'm really sure how it would
> > > > > > > > > behave with hotplugging PCIe bridge.
> > > > > > > > >
> > > > > > > > > On aardvark PCIe controller I have already tested that secondary bus
> > > > > > > > > reset bit is triggering Hot Reset event and then also Link Down event.
> > > > > > > > > These events are not handled by aardvark driver yet (needs to
> > > > > > > > > implemented into kernel's emulated root bridge code).
> > > > > > > > >
> > > > > > > > > But I'm not sure how it would behave on real HW PCIe hotplugging bridge.
> > > > > > > > > Kernel has already code which removes PCIe device if it changes presence
> > > > > > > > > bit (and inform via interrupt). And Link Down event triggers this
> > > > > > > > > change.  
> > > > > > > >
> > > > > > > > This is the difference between slot and bus resets, the slot reset is
> > > > > > > > implemented by the hotplug controller and disables presence detection
> > > > > > > > around the bus reset.  Thanks,  
> > > > > > >
> > > > > > > Yes, but I'm talking about bus reset, not about slot reset.
> > > > > > >
> > > > > > > I mean: to use bus reset via sysfs on hardware which supports slots and
> > > > > > > hotplugging.
> > > > > > >
> > > > > > > And if I'm reading code correctly, this combination is allowed, right?
> > > > > > > Via these new patches it is possible to disable slot reset and enable
> > > > > > > bus reset.  
> > > > > >
> > > > > > That's true, a slot reset is simply a bus reset wrapped around code
> > > > > > that prevents the device from getting ejected.  
> > > > >
> > > > > Yes, this makes slot reset "safe". But bus reset is "unsafe".
> > > > >  
> > > > > > Maybe it would make
> > > > > > sense to combine the two as far as this interface is concerned, ie. a
> > > > > > single "bus" reset method that will always use slot reset when
> > > > > > available.  Thanks,  
> > > > >
> > > > > That should work when slot reset is available.
> > > > >
> > > > > Other option is that mentioned remove-reset-rescan procedure.  
> > > >
> > > > That's not something we can introduce to the pci_reset_function() path
> > > > without a fair bit of collateral in using it through vfio-pci.
> > > >  
> > > > > But quick search in drivers/pci/hotplug/ results that not all hotplug
> > > > > drivers implement reset_slot method.
> > > > >
> > > > > So there is a possible issue with hotplug driver which may eject device
> > > > > during bus reset (because e.g. slot reset is not implemented)?  
> > > >
> > > > People aren't reporting it, so maybe those controllers aren't being
> > > > used for this use case.  Or maybe introducing this patch will make
> > > > these reset methods more readily accessible for testing.  We can fix or
> > > > blacklist those controllers for bus reset when reports come in.  Thanks,  
> > >
> > > Ok! I do not know neither if those controllers are used, but looks like
> > > that there are still changes in hotplug code.
> > >
> > > So I guess with these patches people can test it and report issues when
> > > such thing happen.  
> > So after a bit research as I understood we need to group slot
> > and bus reset together in a single category of reset methods and
> > then implicitly use slot reset if it is available when bus reset is
> > enabled by the user.
> > Is that right?  
> 
> Yes, I understand it in same way. Just I do not know which name to
> choose for this reset category. In PCI spec it is called Secondary Bus
> Reset (as it resets whole bus with all devices; but we allow this reset
> in this patch series only if on the bus is connected exactly one device).
> In PCIe spec it is called Hot Reset. And if kernel detects Slot support
> then kernel currently calls it Slot reset. But it is still same thing.
> Any opinion? I think that we could call it Hot Reset as this patch
> series exports it only for single device (so calling it _bus_ is not the
> best match).

A similar abstraction where our scope is not limited to a single
function calls this a bus reset:

int pci_reset_bus(struct pci_dev *pdev)
{
        return (!pci_probe_reset_slot(pdev->slot)) ?
            __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
}
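
The same preference could be modelled for the single-function case
roughly as follows. This is a user-space sketch with hypothetical types
and names (struct model_dev, combined_bus_reset()), not kernel code; it
only shows the dispatch of one "bus" method that uses the slot path
whenever one exists:

```c
#include <assert.h>
#include <stddef.h>

/* Records which path the model took; purely for illustration. */
enum used_path { USED_NONE, USED_SLOT, USED_BUS };

/*
 * Hypothetical model of a device: slot_reset is NULL when no hotplug
 * slot is able to handle the reset.
 */
struct model_dev {
	int (*slot_reset)(struct model_dev *dev);
	int (*bus_reset)(struct model_dev *dev);
	enum used_path last_used;
};

static int do_slot_reset(struct model_dev *dev)
{
	dev->last_used = USED_SLOT;
	return 0;
}

static int do_bus_reset(struct model_dev *dev)
{
	dev->last_used = USED_BUS;
	return 0;
}

/* Single "bus" method: slot path when available, else a plain bus reset. */
static int combined_bus_reset(struct model_dev *dev)
{
	assert(dev != NULL && dev->bus_reset != NULL);

	if (dev->slot_reset)
		return dev->slot_reset(dev);
	return dev->bus_reset(dev);
}
```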

Thanks,
Alex



* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-23 14:44                                 ` Alex Williamson
@ 2021-03-23 15:32                                   ` Amey Narkhede
  2021-03-23 16:06                                     ` Alex Williamson
  0 siblings, 1 reply; 90+ messages in thread
From: Amey Narkhede @ 2021-03-23 15:32 UTC (permalink / raw)
  To: Alex Williamson; +Cc: bhelgaas, pali, raphael.norwitz, linux-kernel, linux-pci

On 21/03/23 08:44AM, Alex Williamson wrote:
> On Tue, 23 Mar 2021 15:34:19 +0100
> Pali Rohár <pali@kernel.org> wrote:
>
> > On Thursday 18 March 2021 20:01:55 Amey Narkhede wrote:
> > > On 21/03/17 09:13PM, Pali Rohár wrote:
> > > > On Wednesday 17 March 2021 14:00:20 Alex Williamson wrote:
> > > > > On Wed, 17 Mar 2021 20:40:24 +0100
> > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > >
> > > > > > On Wednesday 17 March 2021 13:32:45 Alex Williamson wrote:
> > > > > > > On Wed, 17 Mar 2021 20:24:24 +0100
> > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > >
> > > > > > > > On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:
> > > > > > > > > On Wed, 17 Mar 2021 20:02:06 +0100
> > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > > > On Monday 15 March 2021 09:03:39 Alex Williamson wrote:
> > > > > > > > > > > On Mon, 15 Mar 2021 15:52:38 +0100
> > > > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> > > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > > > >
> > > > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > > > defined here.
> > > > > > > > > > > >
> > > > > > > > > > > > Ok!
> > > > > > > > > > > >
> > > > > > > > > > > > > Note that with this series the resets available through
> > > > > > > > > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Alex
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > But with this patch series, there is still an issue with PCI secondary
> > > > > > > > > > > > bus reset mechanism as exported sysfs attribute does not do that
> > > > > > > > > > > > remove-reset-rescan procedure. As discussed in other thread, this reset
> > > > > > > > > > > > let device in unconfigured / broken state.
> > > > > > > > > > >
> > > > > > > > > > > No, there's not:
> > > > > > > > > > >
> > > > > > > > > > > int pci_reset_function(struct pci_dev *dev)
> > > > > > > > > > > {
> > > > > > > > > > >         int rc;
> > > > > > > > > > >
> > > > > > > > > > >         if (!dev->reset_fn)
> > > > > > > > > > >                 return -ENOTTY;
> > > > > > > > > > >
> > > > > > > > > > >         pci_dev_lock(dev);
> > > > > > > > > > > >>>     pci_dev_save_and_disable(dev);
> > > > > > > > > > >
> > > > > > > > > > >         rc = __pci_reset_function_locked(dev);
> > > > > > > > > > >
> > > > > > > > > > > >>>     pci_dev_restore(dev);
> > > > > > > > > > >         pci_dev_unlock(dev);
> > > > > > > > > > >
> > > > > > > > > > >         return rc;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > The remove/re-scan was discussed primarily because your patch performed
> > > > > > > > > > > a bus reset regardless of what devices were affected by that reset and
> > > > > > > > > > > it's difficult to manage the scope where multiple devices are affected.
> > > > > > > > > > > Here, the bus and slot reset functions will fail unless the scope is
> > > > > > > > > > > limited to the single device triggering this reset.  Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Alex
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I was thinking a bit more about it and I'm not really sure how it would
> > > > > > > > > > behave with a hotplugging PCIe bridge.
> > > > > > > > > >
> > > > > > > > > > On aardvark PCIe controller I have already tested that secondary bus
> > > > > > > > > > reset bit is triggering Hot Reset event and then also Link Down event.
> > > > > > > > > > These events are not handled by the aardvark driver yet (this needs to be
> > > > > > > > > > implemented in the kernel's emulated root bridge code).
> > > > > > > > > >
> > > > > > > > > > But I'm not sure how it would behave on a real HW PCIe hotplugging bridge.
> > > > > > > > > > The kernel already has code which removes a PCIe device if its presence
> > > > > > > > > > bit changes (and informs via interrupt). And a Link Down event triggers this
> > > > > > > > > > change.
> > > > > > > > >
> > > > > > > > > This is the difference between slot and bus resets, the slot reset is
> > > > > > > > > implemented by the hotplug controller and disables presence detection
> > > > > > > > > around the bus reset.  Thanks,
> > > > > > > >
> > > > > > > > Yes, but I'm talking about bus reset, not about slot reset.
> > > > > > > >
> > > > > > > > I mean: to use bus reset via sysfs on hardware which supports slots and
> > > > > > > > hotplugging.
> > > > > > > >
> > > > > > > > And if I'm reading code correctly, this combination is allowed, right?
> > > > > > > > Via these new patches it is possible to disable slot reset and enable
> > > > > > > > bus reset.
> > > > > > >
> > > > > > > That's true, a slot reset is simply a bus reset wrapped around code
> > > > > > > that prevents the device from getting ejected.
> > > > > >
> > > > > > Yes, this makes slot reset "safe". But bus reset is "unsafe".
> > > > > >
> > > > > > > Maybe it would make
> > > > > > > sense to combine the two as far as this interface is concerned, ie. a
> > > > > > > single "bus" reset method that will always use slot reset when
> > > > > > > available.  Thanks,
> > > > > >
> > > > > > That should work when slot reset is available.
> > > > > >
> > > > > > Other option is that mentioned remove-reset-rescan procedure.
> > > > >
> > > > > That's not something we can introduce to the pci_reset_function() path
> > > > > without a fair bit of collateral in using it through vfio-pci.
> > > > >
> > > > > > But a quick search in drivers/pci/hotplug/ shows that not all hotplug
> > > > > > drivers implement the reset_slot method.
> > > > > >
> > > > > > So there is a possible issue with hotplug driver which may eject device
> > > > > > during bus reset (because e.g. slot reset is not implemented)?
> > > > >
> > > > > People aren't reporting it, so maybe those controllers aren't being
> > > > > used for this use case.  Or maybe introducing this patch will make
> > > > > these reset methods more readily accessible for testing.  We can fix or
> > > > > blacklist those controllers for bus reset when reports come in.  Thanks,
> > > >
> > > > Ok! I do not know either whether those controllers are used, but it looks like
> > > > there are still changes in the hotplug code.
> > > >
> > > > So I guess with these patches people can test it and report issues when
> > > > such things happen.
> > > So after a bit of research, as I understand it, we need to group slot
> > > and bus reset together in a single category of reset methods and
> > > then implicitly use slot reset if it is available when bus reset is
> > > enabled by the user.
> > > Is that right?
> >
> > Yes, I understand it the same way. I just do not know which name to
> > choose for this reset category. In the PCI spec it is called Secondary Bus
> > Reset (as it resets the whole bus with all devices; but this patch series
> > allows it only if exactly one device is connected to the bus).
> > In the PCIe spec it is called Hot Reset. And if the kernel detects Slot support
> > then the kernel currently calls it Slot reset. But it is still the same thing.
> > Any opinion? I think that we could call it Hot Reset, as this patch
> > series exports it only for a single device (so calling it _bus_ is not the
> > best match).
>
> A similar abstraction where our scope is not limited to a single
> function calls this a bus reset:
>
> int pci_reset_bus(struct pci_dev *pdev)
> {
>         return (!pci_probe_reset_slot(pdev->slot)) ?
>             __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
> }
>
> Thanks,
> Alex
>
I was going to use a similar function:

int pci_bus_reset(struct pci_dev *dev, int probe)
{
       return pci_dev_reset_slot_function(dev, probe) ?
               pci_parent_bus_reset(dev, probe) : 0;

}

Thanks,
Amey


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-23 15:32                                   ` Amey Narkhede
@ 2021-03-23 16:06                                     ` Alex Williamson
  2021-03-23 16:15                                       ` Alex Williamson
  0 siblings, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-23 16:06 UTC (permalink / raw)
  To: Amey Narkhede; +Cc: bhelgaas, pali, raphael.norwitz, linux-kernel, linux-pci

On Tue, 23 Mar 2021 21:02:21 +0530
Amey Narkhede <ameynarkhede03@gmail.com> wrote:

> On 21/03/23 08:44AM, Alex Williamson wrote:
> >
> > A similar abstraction where our scope is not limited to a single
> > function calls this a bus reset:
> >
> > int pci_reset_bus(struct pci_dev *pdev)
> > {
> >         return (!pci_probe_reset_slot(pdev->slot)) ?
> >             __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
> > }
> >
> > Thanks,
> > Alex
> >  
> I was going to use similar function
> 
> int pci_bus_reset(struct pci_dev *dev, int probe)
> {
>        return pci_dev_reset_slot_function(dev, probe) ?
>                pci_parent_bus_reset(dev, probe) : 0;
> 
> }

I think via the sysfs attribute we can simply call this "bus" reset,
but internally having both pci_reset_bus() and pci_bus_reset() would be
really confusing.  We're doing the same thing as pci_bus_reset() but
with a different scope, so I'd probably suggest
pci_bus_reset_function().

Also, the above ternary form isn't true to the original, only -ENOTTY
allows fall-through, so something more like:

int pci_reset_bus_function(struct pci_dev *dev, int probe)
{
	int rc = pci_dev_reset_slot_function(dev, probe);

	return (rc == -ENOTTY) ? pci_parent_bus_reset(dev, probe) : rc;
}
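
As a rough illustration of why the distinction matters, here is a
userspace sketch comparing the two forms. Everything in it is a
hypothetical stand-in: bus_reset_stub() only counts calls and is not
pci_parent_bus_reset(), and the slot reset result is passed in as a
plain integer rather than coming from pci_dev_reset_slot_function().

```c
#include <errno.h>

/* Hypothetical stand-in for the bus reset helper: counts calls, reports success. */
static int bus_reset_stub(int *calls)
{
	(*calls)++;
	return 0;
}

/* Truthiness form: any nonzero slot result falls through to the bus reset. */
static int reset_on_any_error(int slot_rc, int *bus_calls)
{
	return slot_rc ? bus_reset_stub(bus_calls) : 0;
}

/* -ENOTTY form: fall through only when the slot reset is not available. */
static int reset_on_enotty_only(int slot_rc, int *bus_calls)
{
	return (slot_rc == -ENOTTY) ? bus_reset_stub(bus_calls) : slot_rc;
}
```

With a slot result of -EIO, the first form hides the failure behind a
bus reset, while the second returns -EIO to the caller and never
touches the bus reset stub.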

Thanks,
Alex



* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-23 16:06                                     ` Alex Williamson
@ 2021-03-23 16:15                                       ` Alex Williamson
  0 siblings, 0 replies; 90+ messages in thread
From: Alex Williamson @ 2021-03-23 16:15 UTC (permalink / raw)
  To: Amey Narkhede; +Cc: bhelgaas, pali, raphael.norwitz, linux-kernel, linux-pci

On Tue, 23 Mar 2021 10:06:25 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Tue, 23 Mar 2021 21:02:21 +0530
> Amey Narkhede <ameynarkhede03@gmail.com> wrote:
> 
> > On 21/03/23 08:44AM, Alex Williamson wrote:  
> > >
> > > A similar abstraction where our scope is not limited to a single
> > > function calls this a bus reset:
> > >
> > > int pci_reset_bus(struct pci_dev *pdev)
> > > {
> > >         return (!pci_probe_reset_slot(pdev->slot)) ?
> > >             __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
> > > }
> > >
> > > Thanks,
> > > Alex
> > >    
> > I was going to use similar function
> > 
> > int pci_bus_reset(struct pci_dev *dev, int probe)
> > {
> >        return pci_dev_reset_slot_function(dev, probe) ?
> >                pci_parent_bus_reset(dev, probe) : 0;
> > 
> > }  
> 
> I think via the sysfs attribute we can simply call this "bus" reset,
> but internally having both pci_reset_bus() and pci_bus_reset() would be
> really confusing.  We're doing the same thing as pci_bus_reset() but
> with a different scope, so I'd probably suggest
> pci_bus_reset_function().

I'm already confusing them, s/bus_reset/reset_bus/ in the last sentence
above.  Thanks,

Alex

> 
> Also, the above ternary form isn't true to the original, only -ENOTTY
> allows fall-through, so something more like:
> 
> int pci_reset_bus_function(struct pci_dev *dev, int probe)
> {
> 	int rc = pci_dev_reset_slot_function(dev, probe);
> 
> 	return (rc == -ENOTTY) ? pci_parent_bus_reset(dev, probe) : rc;
> }
> 
> Thanks,
> Alex
> 



* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-22 17:10                                                     ` Alex Williamson
@ 2021-03-24 10:03                                                       ` Leon Romanovsky
  2021-03-24 14:37                                                         ` Alex Williamson
  0 siblings, 1 reply; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-24 10:03 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Enrico Weigelt, metux IT consult, Amey Narkhede, raphael.norwitz,
	linux-pci, bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Mon, Mar 22, 2021 at 11:10:03AM -0600, Alex Williamson wrote:
> On Sun, 21 Mar 2021 10:40:55 +0200
> Leon Romanovsky <leon@kernel.org> wrote:
> 
> > On Sat, Mar 20, 2021 at 08:59:42AM -0600, Alex Williamson wrote:
> > > On Sat, 20 Mar 2021 11:10:08 +0200
> > > Leon Romanovsky <leon@kernel.org> wrote:  
> > > > On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:   
> > > > > 
> > > > > What if we taint the kernel or pci_warn() for cases where either all
> > > > > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > > > > time a device specific method is disabled?    
> > > > 
> > > > What does it mean "none"? Does it mean nothing supported? If yes, I think that
> > > > pci_warn() will be enough. At least for me, taint is usable during debug stages,
> > > > probably if device doesn't crash no one will look to see /proc/sys/kernel/tainted.  
> > > 
> > > "none" as implemented in this patch, clearing the enabled function
> > > reset methods.  
> > 
> > It is far from intuitive, the empty string will be easier to understand,
> > because "none" means no reset at all.
> 
> "No reset at all" is what "none" achieves, the
> pci_dev.reset_methods_enabled bitmap is cleared.  We can use an empty
> string, but I think we want a way to clear all enabled resets and a way
> to return it to the default.  I could see arguments for an empty string
> serving either purpose, so this version proposed explicitly using
> "none" and "default", as included in the ABI update.

I will stick with "default" only and leave "none" for something else.
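
For reference, the two keywords as described in the quoted paragraph
could be sketched roughly like this in userspace C. The bit names,
RESET_DEFAULT_MASK and parse_reset_keyword() are made up for the
example and are not taken from the series; the sketch also assumes the
trailing newline of the sysfs write has already been stripped.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical bit assignments, for illustration only. */
#define RESET_BIT_FLR		(1u << 0)
#define RESET_BIT_PM		(1u << 1)
#define RESET_BIT_BUS		(1u << 2)
#define RESET_DEFAULT_MASK	(RESET_BIT_FLR | RESET_BIT_PM | RESET_BIT_BUS)

/*
 * Map a keyword onto the enabled-methods bitmap:
 * "none" clears every method, "default" restores the assumed full set.
 * Anything else is rejected and the bitmap is left untouched.
 */
static int parse_reset_keyword(const char *buf, unsigned int *enabled)
{
	if (strcmp(buf, "none") == 0) {
		*enabled = 0;
		return 0;
	}
	if (strcmp(buf, "default") == 0) {
		*enabled = RESET_DEFAULT_MASK;
		return 0;
	}
	return -EINVAL;
}
```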

> 
> > > > > I'd almost go so far as to prevent disabling a device specific reset
> > > > > altogether, but for example should a device specific reset that fixes
> > > > > an aspect of FLR behavior prevent using a bus reset?  I'd prefer in that
> > > > > case if direct FLR were disabled via a device flag introduced with the
> > > > > quirk and the remaining resets can still be selected by preference.    
> > > > 
> > > > I don't know enough to discuss the PCI details, but you raised good point.
> > > > This sysfs is user visible API that is presented as is from device point
> > > > of view. It can be easily run into problems if PCI/core doesn't work with
> > > > user's choice.
> > > >   
> > > > > 
> > > > > Theoretically all the other reset methods work and are available, it's
> > > > > only a policy decision which to use, right?    
> > > > 
> > > > But this patch was presented as a way to overcome situations where
> > > > supported != working and user magically knows which reset type to set.  
> > > 
> > > It's not magic, the new sysfs attributes expose which resets are
> > > enabled and the order that they're used, the user can simply select the
> > > next one.  Being able to bypass a broken reset method is a helpful side
> > > effect of getting to select a preferred reset method.  
> > 
> > Magic in a sense that user has no idea what those resets mean, the
> > expectation is that he will blindly iterate till something works.
> 
> Which ought to actually be a safe thing to do.  We should have quirks to
> exclude resets that are known broken but still probe as present and I'd
> be perfectly fine if we issue a warning if the user disables all resets
> for a given device.
>  
> > > > If you want to take this patch to be policy decision tool,
> > > > it will need to accept "reset_type1,reset_type2,..." sort of input,
> > > > so fallback will work natively.  
> > > 
> > > I don't see that as a requirement.  We have fall-through support in the
> > > kernel, but for a given device we're really only ever going to make use
> > > of one of those methods.  If a user knows enough about a device to have
> > > a preference, I think it can be singular.  That also significantly
> > > simplifies the interface and supporting code.  Thanks,  
> > 
> > I'm struggling to get requirements from this thread. You talked about
> > policy decision to overtake fallback mechanism, Amey wanted to avoid
> > quirks.
> > 
> > Do you have an example of such devices or we are talking about
> > theoretical case?
> 
> Look at any device that already has a reset quirk and the process it
> took to get there.  Those are more than just theoretical cases.

So let's fix the process. The long-standing kernel policy is that kernel
bugs (and a missing quirk can be seen as such a bug) should be fixed in
the kernel and not worked around by users.

> 
> For policy preference, I already described how I've configured QEMU to
> prefer a bus reset rather than a PM reset due to lack of specification
> regarding the scope of a PM "soft reset".  This interface would allow a
> system policy to do that same thing.
> 
> I don't think anyone is suggesting this as a means to avoid quirks that
> would resolve reset issues and create the best default general behavior.
> This provides a mechanism to test various reset methods, and thereby
> identify broken methods, and set a policy.  Sure, that policy might be
> to avoid a broken reset in the interim before it gets quirked and
> there's potential for abuse there, but I think the benefits outweigh
> the risks.

This interface is proposed as a first-class citizen in the general sysfs
layout. Of course, it will be seen as a way to bypass the kernel.

At least put it under the CONFIG_EXPERT option, so no distro will enable
it by default.

> 
> > And I don't see why simple line parser with loop iterator over strchr()
> > suddenly becomes complicated code.
> 
> Setting multiple bits in a bitmap is easy.  How do you then go on to
> allow the user to specify an ordering preference?  If you have an
> algorithm you'd like to propose that allows the user to manage the
> ordering when enabling multiple methods without substantially
> increasing the complexity, please share.  IMO, a given device will
> generally use one reset method and it seems sufficient to restrict user
> preference to achieve all the use cases I've noted.  Thanks,

Linked list + iterator will do the trick.
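
For illustration only, and not taken from the posted series: a minimal
user-space sketch of the kind of strchr()-based parser being discussed,
turning input such as "bus,flr" into an ordered preference list. It swaps
the linked list for a fixed-size array, since the number of reset methods
is small and bounded. The method names, the METHOD_* constants, struct
reset_order and parse_reset_order() are all hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical method identifiers and names, for illustration only. */
enum { METHOD_DEV_SPECIFIC, METHOD_FLR, METHOD_PM, METHOD_BUS, METHOD_NR };

static const char *const method_names[METHOD_NR] = {
	"device_specific", "flr", "pm", "bus",
};

/* Ordered preference list; bounded by METHOD_NR, so nothing is allocated. */
struct reset_order {
	int method[METHOD_NR];	/* method indices, highest priority first */
	int count;
};

/* Return the index of the method named by s[0..len), or -1 if unknown. */
static int lookup_method(const char *s, size_t len)
{
	for (int i = 0; i < METHOD_NR; i++) {
		if (strlen(method_names[i]) == len &&
		    strncmp(method_names[i], s, len) == 0)
			return i;
	}
	return -1;
}

/*
 * Parse a comma-separated list into an ordered array.
 * Returns 0 on success, -1 on an unknown, empty, or duplicate entry.
 */
static int parse_reset_order(const char *buf, struct reset_order *out)
{
	unsigned int seen = 0;
	const char *p = buf;

	out->count = 0;
	for (;;) {
		const char *comma = strchr(p, ',');
		/* The last token may carry the trailing newline from echo. */
		size_t len = comma ? (size_t)(comma - p) : strcspn(p, "\n");
		int m = lookup_method(p, len);

		if (m < 0 || (seen & (1u << m)))
			return -1;
		seen |= 1u << m;
		/* Duplicates are rejected above, so the array cannot overflow. */
		assert(out->count < METHOD_NR);
		out->method[out->count++] = m;
		if (!comma)
			return 0;
		p = comma + 1;
	}
}
```

Parsing "bus,flr" with this sketch yields count == 2 with bus ahead of flr.
Whether the extra state is worth carrying is exactly the point under
debate; the sketch only shows that ordering does not by itself require
dynamic allocation.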

> 
> Alex
> 


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-24 10:03                                                       ` Leon Romanovsky
@ 2021-03-24 14:37                                                         ` Alex Williamson
  2021-03-24 15:13                                                           ` Leon Romanovsky
  0 siblings, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-24 14:37 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Enrico Weigelt, metux IT consult, Amey Narkhede, raphael.norwitz,
	linux-pci, bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Wed, 24 Mar 2021 12:03:00 +0200
Leon Romanovsky <leon@kernel.org> wrote:

> On Mon, Mar 22, 2021 at 11:10:03AM -0600, Alex Williamson wrote:
> > On Sun, 21 Mar 2021 10:40:55 +0200
> > Leon Romanovsky <leon@kernel.org> wrote:
> >   
> > > On Sat, Mar 20, 2021 at 08:59:42AM -0600, Alex Williamson wrote:  
> > > > On Sat, 20 Mar 2021 11:10:08 +0200
> > > > Leon Romanovsky <leon@kernel.org> wrote:    
> > > > > On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:     
> > > > > > 
> > > > > > What if we taint the kernel or pci_warn() for cases where either all
> > > > > > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > > > > > time a device specific method is disabled?      
> > > > > 
> > > > > What does it mean "none"? Does it mean nothing supported? If yes, I think that
> > > > > pci_warn() will be enough. At least for me, taint is usable during debug stages,
> > > > > probably if device doesn't crash no one will look to see /proc/sys/kernel/tainted.    
> > > > 
> > > > "none" as implemented in this patch, clearing the enabled function
> > > > reset methods.    
> > > 
> > > It is far from intuitive, the empty string will be easier to understand,
> > > because "none" means no reset at all.  
> > 
> > "No reset at all" is what "none" achieves, the
> > pci_dev.reset_methods_enabled bitmap is cleared.  We can use an empty
> > string, but I think we want a way to clear all enabled resets and a way
> > to return it to the default.  I could see arguments for an empty string
> > serving either purpose, so this version proposed explicitly using
> > "none" and "default", as included in the ABI update.  
> 
> I will stick with "default" only and leave "none" for something else.

Are you suggesting writing "default" restores the unmodified behavior
and writing an empty string clears all enabled reset methods?
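
For illustration only, and not the code from the patch: a user-space model
of the write semantics described above, where "none" clears the enabled
bitmap, "default" restores it, and a single method name selects exactly
one method. struct fake_pci_dev, the RESET_* constants, the method name
strings and reset_method_store() are hypothetical stand-ins; only "none",
"default" and the reset_methods_enabled field name come from this thread.
The fprintf() stands in for the pci_warn() suggested earlier.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical method names and bit positions, for illustration only. */
enum { RESET_DEV_SPECIFIC, RESET_FLR, RESET_PM, RESET_BUS, RESET_NR };

static const char *const reset_names[RESET_NR] = {
	"device_specific", "flr", "pm", "bus",
};

struct fake_pci_dev {
	unsigned int reset_methods;		/* methods the device probes as supporting */
	unsigned int reset_methods_enabled;	/* methods currently allowed */
};

/*
 * Model of a write to the reset_method attribute.
 * Returns 0 on success, -1 for an unknown or unsupported method.
 */
static int reset_method_store(struct fake_pci_dev *dev, const char *buf)
{
	char word[32];
	size_t len;

	assert(dev && buf);
	len = strcspn(buf, "\n");	/* drop the newline added by echo */
	if (len >= sizeof(word))
		return -1;
	memcpy(word, buf, len);
	word[len] = '\0';

	if (strcmp(word, "none") == 0) {
		dev->reset_methods_enabled = 0;
		fprintf(stderr, "warning: all reset methods disabled\n");
		return 0;
	}
	if (strcmp(word, "default") == 0) {
		dev->reset_methods_enabled = dev->reset_methods;
		return 0;
	}
	for (int i = 0; i < RESET_NR; i++) {
		if (strcmp(word, reset_names[i]) != 0)
			continue;
		if (!(dev->reset_methods & (1u << i)))
			return -1;	/* known name, but not supported here */
		dev->reset_methods_enabled = 1u << i;
		return 0;
	}
	return -1;	/* unknown name, including the empty string */
}
```

In this model an empty string is simply rejected, which sidesteps the
question of whether it should mean "clear" or "restore".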
 
> > > > > > I'd almost go so far as to prevent disabling a device specific reset
> > > > > > altogether, but for example should a device specific reset that fixes
> > > > > > an aspect of FLR behavior prevent using a bus reset?  I'd prefer in that
> > > > > > case if direct FLR were disabled via a device flag introduced with the
> > > > > > quirk and the remaining resets can still be selected by preference.      
> > > > > 
> > > > > I don't know enough to discuss the PCI details, but you raised good point.
> > > > > This sysfs is user visible API that is presented as is from device point
> > > > > of view. It can be easily run into problems if PCI/core doesn't work with
> > > > > user's choice.
> > > > >     
> > > > > > 
> > > > > > Theoretically all the other reset methods work and are available, it's
> > > > > > only a policy decision which to use, right?      
> > > > > 
> > > > > But this patch was presented as a way to overcome situations where
> > > > > supported != working and user magically knows which reset type to set.    
> > > > 
> > > > It's not magic, the new sysfs attributes expose which resets are
> > > > enabled and the order that they're used, the user can simply select the
> > > > next one.  Being able to bypass a broken reset method is a helpful side
> > > > effect of getting to select a preferred reset method.    
> > > 
> > > Magic in a sense that user has no idea what those resets mean, the
> > > expectation is that he will blindly iterate till something works.  
> > 
> > Which ought to actually be a safe thing to do.  We should have quirks to
> > exclude resets that are known broken but still probe as present and I'd
> > be perfectly fine if we issue a warning if the user disables all resets
> > for a given device.
> >    
> > > > > If you want to take this patch to be policy decision tool,
> > > > > it will need to accept "reset_type1,reset_type2,..." sort of input,
> > > > > so fallback will work natively.    
> > > > 
> > > > I don't see that as a requirement.  We have fall-through support in the
> > > > kernel, but for a given device we're really only ever going to make use
> > > > of one of those methods.  If a user knows enough about a device to have
> > > > a preference, I think it can be singular.  That also significantly
> > > > simplifies the interface and supporting code.  Thanks,    
> > > 
> > > I'm struggling to get requirements from this thread. You talked about
> > > policy decision to overtake fallback mechanism, Amey wanted to avoid
> > > quirks.
> > > 
> > > Do you have an example of such devices or we are talking about
> > > theoretical case?  
> > 
> > Look at any device that already has a reset quirk and the process it
> > took to get there.  Those are more than just theoretical cases.  
> 
> So let's fix the process. The long standing kernel policy is that kernel
> bugs (and missing quirk can be seen as such bug) should be fixed in the
> kernel and not workaround by the users.

I don't see an actual proposal here to fix the process.  Allowing
specific reset methods to be trivially tested is a step towards fixing
the process.  Unfortunately we can't tell the difference between
someone setting a policy because they prefer a reset mechanism, are
testing a reset mechanism, or they're avoiding a broken reset mechanism.
We can't force participation if we've made it clear that the interface
should not be used long term for anything other than policy preference
and testing.

> > For policy preference, I already described how I've configured QEMU to
> > prefer a bus reset rather than a PM reset due to lack of specification
> > regarding the scope of a PM "soft reset".  This interface would allow a
> > system policy to do that same thing.
> > 
> > I don't think anyone is suggesting this as a means to avoid quirks that
> > would resolve reset issues and create the best default general behavior.
> > This provides a mechanism to test various reset methods, and thereby
> > identify broken methods, and set a policy.  Sure, that policy might be
> > to avoid a broken reset in the interim before it gets quirked and
> > there's potential for abuse there, but I think the benefits outweigh
> > the risks.  
> 
> This interface is proposed as first class citizen in the general sysfs
> layout. Of course, it will be seen as a way to bypass the kernel.
> 
> At least, put it under CONFIG_EXPERT option, so no distro will enable it
> by default.

Of course we're proposing it to be accessible, it should also require
admin privileges to modify, sysfs has lots of such things.  If it's
relegated to non-default accessibility, it won't be used for testing
and it won't be available for system policy and it's pointless.
 
> > > And I don't see why simple line parser with loop iterator over strchr()
> > > suddenly becomes complicated code.  
> > 
> > Setting multiple bits in a bitmap is easy.  How do you then go on to
> > allow the user to specify an ordering preference?  If you have an
> > algorithm you'd like to propose that allows the user to manage the
> > ordering when enabling multiple methods without substantially
> > increasing the complexity, please share.  IMO, a given device will
> > generally use one reset method and it seems sufficient to restrict user
> > preference to achieve all the use cases I've noted.  Thanks,  
> 
> Linked list + iterator will do the trick.

So you're suggesting to add potentially multiple dynamic allocations per
device and list locking and management for an unspecified use case for
an interface you seem to be opposed to anyway.  It should be pretty
clear why the keep-it-simple approach was taken in this series.  Thanks,

Alex



* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-24 14:37                                                         ` Alex Williamson
@ 2021-03-24 15:13                                                           ` Leon Romanovsky
  2021-03-24 17:17                                                             ` Alex Williamson
  0 siblings, 1 reply; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-24 15:13 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Enrico Weigelt, metux IT consult, Amey Narkhede, raphael.norwitz,
	linux-pci, bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Wed, Mar 24, 2021 at 08:37:43AM -0600, Alex Williamson wrote:
> On Wed, 24 Mar 2021 12:03:00 +0200
> Leon Romanovsky <leon@kernel.org> wrote:
> 
> > On Mon, Mar 22, 2021 at 11:10:03AM -0600, Alex Williamson wrote:
> > > On Sun, 21 Mar 2021 10:40:55 +0200
> > > Leon Romanovsky <leon@kernel.org> wrote:
> > >   
> > > > On Sat, Mar 20, 2021 at 08:59:42AM -0600, Alex Williamson wrote:  
> > > > > On Sat, 20 Mar 2021 11:10:08 +0200
> > > > > Leon Romanovsky <leon@kernel.org> wrote:    
> > > > > > On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:     
> > > > > > > 
> > > > > > > What if we taint the kernel or pci_warn() for cases where either all
> > > > > > > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > > > > > > time a device specific method is disabled?      
> > > > > > 
> > > > > > What does it mean "none"? Does it mean nothing supported? If yes, I think that
> > > > > > pci_warn() will be enough. At least for me, taint is usable during debug stages,
> > > > > > probably if device doesn't crash no one will look to see /proc/sys/kernel/tainted.    
> > > > > 
> > > > > "none" as implemented in this patch, clearing the enabled function
> > > > > reset methods.    
> > > > 
> > > > It is far from intuitive, the empty string will be easier to understand,
> > > > because "none" means no reset at all.  
> > > 
> > > "No reset at all" is what "none" achieves, the
> > > pci_dev.reset_methods_enabled bitmap is cleared.  We can use an empty
> > > string, but I think we want a way to clear all enabled resets and a way
> > > to return it to the default.  I could see arguments for an empty string
> > > serving either purpose, so this version proposed explicitly using
> > > "none" and "default", as included in the ABI update.  
> > 
> > I will stick with "default" only and leave "none" for something else.
> 
> Are you suggesting writing "default" restores the unmodified behavior
> and writing an empty string clears all enabled reset methods?
>  
> > > > > > > I'd almost go so far as to prevent disabling a device specific reset
> > > > > > > altogether, but for example should a device specific reset that fixes
> > > > > > > an aspect of FLR behavior prevent using a bus reset?  I'd prefer in that
> > > > > > > case if direct FLR were disabled via a device flag introduced with the
> > > > > > > quirk and the remaining resets can still be selected by preference.      
> > > > > > 
> > > > > > I don't know enough to discuss the PCI details, but you raised good point.
> > > > > > This sysfs is user visible API that is presented as is from device point
> > > > > > of view. It can be easily run into problems if PCI/core doesn't work with
> > > > > > user's choice.
> > > > > >     
> > > > > > > 
> > > > > > > Theoretically all the other reset methods work and are available, it's
> > > > > > > only a policy decision which to use, right?      
> > > > > > 
> > > > > > But this patch was presented as a way to overcome situations where
> > > > > > supported != working and user magically knows which reset type to set.    
> > > > > 
> > > > > It's not magic, the new sysfs attributes expose which resets are
> > > > > enabled and the order that they're used, the user can simply select the
> > > > > next one.  Being able to bypass a broken reset method is a helpful side
> > > > > effect of getting to select a preferred reset method.    
> > > > 
> > > > Magic in a sense that user has no idea what those resets mean, the
> > > > expectation is that he will blindly iterate till something works.  
> > > 
> > > Which ought to actually be a safe thing to do.  We should have quirks to
> > > exclude resets that are known broken but still probe as present and I'd
> > > be perfectly fine if we issue a warning if the user disables all resets
> > > for a given device.
> > >    
> > > > > > If you want to take this patch to be policy decision tool,
> > > > > > it will need to accept "reset_type1,reset_type2,..." sort of input,
> > > > > > so fallback will work natively.    
> > > > > 
> > > > > I don't see that as a requirement.  We have fall-through support in the
> > > > > kernel, but for a given device we're really only ever going to make use
> > > > > of one of those methods.  If a user knows enough about a device to have
> > > > > a preference, I think it can be singular.  That also significantly
> > > > > simplifies the interface and supporting code.  Thanks,    
> > > > 
> > > > I'm struggling to get requirements from this thread. You talked about
> > > > policy decision to overtake fallback mechanism, Amey wanted to avoid
> > > > quirks.
> > > > 
> > > > Do you have an example of such devices or we are talking about
> > > > theoretical case?  
> > > 
> > > Look at any device that already has a reset quirk and the process it
> > > took to get there.  Those are more than just theoretical cases.  
> > 
> > So let's fix the process. The long standing kernel policy is that kernel
> > bugs (and missing quirk can be seen as such bug) should be fixed in the
> > kernel and not workaround by the users.
> 
> I don't see an actual proposal here to fix the process.  Allowing
> specific reset methods to be trivially tested is a step towards fixing
> the process.  Unfortunately we can't tell the difference between
> someone setting a policy because they prefer a reset mechanism, are
> testing a reset mechanism, or they're avoiding a broken reset mechanism.
> We can't force participation if we've made it clear that the interface
> should not be used long term for anything other than policy preference
> and testing.

Yes, and real testing/debugging almost always requires a kernel rebuild.
Everything else is a waste of time.

> 
> > > For policy preference, I already described how I've configured QEMU to
> > > prefer a bus reset rather than a PM reset due to lack of specification
> > > regarding the scope of a PM "soft reset".  This interface would allow a
> > > system policy to do that same thing.
> > > 
> > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > would resolve reset issues and create the best default general behavior.
> > > This provides a mechanism to test various reset methods, and thereby
> > > identify broken methods, and set a policy.  Sure, that policy might be
> > > to avoid a broken reset in the interim before it gets quirked and
> > > there's potential for abuse there, but I think the benefits outweigh
> > > the risks.  
> > 
> > This interface is proposed as first class citizen in the general sysfs
> > layout. Of course, it will be seen as a way to bypass the kernel.
> > 
> > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > by default.
> 
> Of course we're proposing it to be accessible, it should also require
> admin privileges to modify, sysfs has lots of such things.  If it's
> relegated to non-default accessibility, it won't be used for testing
> and it won't be available for system policy and it's pointless.

We probably have a different view of what testing is. I expect users who
experience issues with reset to take extra steps, and one of them is
compiling their own kernel.

Root permissions don't protect from anything; SO lovers will use
root without even thinking twice.

>  
> > > > And I don't see why simple line parser with loop iterator over strchr()
> > > > suddenly becomes complicated code.  
> > > 
> > > Setting multiple bits in a bitmap is easy.  How do you then go on to
> > > allow the user to specify an ordering preference?  If you have an
> > > algorithm you'd like to propose that allows the user to manage the
> > > ordering when enabling multiple methods without substantially
> > > increasing the complexity, please share.  IMO, a given device will
> > > generally use one reset method and it seems sufficient to restrict user
> > > preference to achieve all the use cases I've noted.  Thanks,  
> > 
> > Linked list + iterator will do the trick.
> 
> So you're suggesting to add potentially multiple dynamic allocations per
> device and list locking and management for an unspecified use case for
> an interface you seem to be opposed to anyway.  It should be pretty
> clear why the keep-it-simple approach was taken in this series.  Thanks,

I'm trying to help you with your use case of providing a reset policy
mechanism, which can be done without CONFIG_EXPERT. However, if you want
to continue down the path of having a specific reset type only, please
ensure that this is not taken in the "bypass kernel" direction.

Thanks

> 
> Alex
> 


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-24 15:13                                                           ` Leon Romanovsky
@ 2021-03-24 17:17                                                             ` Alex Williamson
  2021-03-25  8:37                                                               ` Leon Romanovsky
  0 siblings, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-24 17:17 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Enrico Weigelt, metux IT consult, Amey Narkhede, raphael.norwitz,
	linux-pci, bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Wed, 24 Mar 2021 17:13:56 +0200
Leon Romanovsky <leon@kernel.org> wrote:

> On Wed, Mar 24, 2021 at 08:37:43AM -0600, Alex Williamson wrote:
> > On Wed, 24 Mar 2021 12:03:00 +0200
> > Leon Romanovsky <leon@kernel.org> wrote:
> >   
> > > On Mon, Mar 22, 2021 at 11:10:03AM -0600, Alex Williamson wrote:  
> > > > On Sun, 21 Mar 2021 10:40:55 +0200
> > > > Leon Romanovsky <leon@kernel.org> wrote:
> > > >     
> > > > > On Sat, Mar 20, 2021 at 08:59:42AM -0600, Alex Williamson wrote:    
> > > > > > On Sat, 20 Mar 2021 11:10:08 +0200
> > > > > > Leon Romanovsky <leon@kernel.org> wrote:      
> > > > > > > On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:       
> > > > > > > > 
> > > > > > > > What if we taint the kernel or pci_warn() for cases where either all
> > > > > > > > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > > > > > > > time a device specific method is disabled?        
> > > > > > > 
> > > > > > > What does it mean "none"? Does it mean nothing supported? If yes, I think that
> > > > > > > pci_warn() will be enough. At least for me, taint is usable during debug stages,
> > > > > > > probably if device doesn't crash no one will look to see /proc/sys/kernel/tainted.      
> > > > > > 
> > > > > > "none" as implemented in this patch, clearing the enabled function
> > > > > > reset methods.      
> > > > > 
> > > > > It is far from intuitive, the empty string will be easier to understand,
> > > > > because "none" means no reset at all.    
> > > > 
> > > > "No reset at all" is what "none" achieves, the
> > > > pci_dev.reset_methods_enabled bitmap is cleared.  We can use an empty
> > > > string, but I think we want a way to clear all enabled resets and a way
> > > > to return it to the default.  I could see arguments for an empty string
> > > > serving either purpose, so this version proposed explicitly using
> > > > "none" and "default", as included in the ABI update.    
> > > 
> > > I will stick with "default" only and leave "none" for something else.  
> > 
> > Are you suggesting writing "default" restores the unmodified behavior
> > and writing an empty string clears all enabled reset methods?
> >    
> > > > > > > > I'd almost go so far as to prevent disabling a device specific reset
> > > > > > > > altogether, but for example should a device specific reset that fixes
> > > > > > > > an aspect of FLR behavior prevent using a bus reset?  I'd prefer in that
> > > > > > > > case if direct FLR were disabled via a device flag introduced with the
> > > > > > > > quirk and the remaining resets can still be selected by preference.        
> > > > > > > 
> > > > > > > I don't know enough to discuss the PCI details, but you raised good point.
> > > > > > > This sysfs is user visible API that is presented as is from device point
> > > > > > > of view. It can be easily run into problems if PCI/core doesn't work with
> > > > > > > user's choice.
> > > > > > >       
> > > > > > > > 
> > > > > > > > Theoretically all the other reset methods work and are available, it's
> > > > > > > > only a policy decision which to use, right?        
> > > > > > > 
> > > > > > > But this patch was presented as a way to overcome situations where
> > > > > > > supported != working and user magically knows which reset type to set.      
> > > > > > 
> > > > > > It's not magic, the new sysfs attributes expose which resets are
> > > > > > enabled and the order that they're used, the user can simply select the
> > > > > > next one.  Being able to bypass a broken reset method is a helpful side
> > > > > > effect of getting to select a preferred reset method.      
> > > > > 
> > > > > Magic in a sense that user has no idea what those resets mean, the
> > > > > expectation is that he will blindly iterate till something works.    
> > > > 
> > > > Which ought to actually be a safe thing to do.  We should have quirks to
> > > > exclude resets that are known broken but still probe as present and I'd
> > > > be perfectly fine if we issue a warning if the user disables all resets
> > > > for a given device.
> > > >      
> > > > > > > If you want to take this patch to be policy decision tool,
> > > > > > > it will need to accept "reset_type1,reset_type2,..." sort of input,
> > > > > > > so fallback will work natively.      
> > > > > > 
> > > > > > I don't see that as a requirement.  We have fall-through support in the
> > > > > > kernel, but for a given device we're really only ever going to make use
> > > > > > of one of those methods.  If a user knows enough about a device to have
> > > > > > a preference, I think it can be singular.  That also significantly
> > > > > > simplifies the interface and supporting code.  Thanks,      
> > > > > 
> > > > > I'm struggling to get requirements from this thread. You talked about
> > > > > policy decision to overtake fallback mechanism, Amey wanted to avoid
> > > > > quirks.
> > > > > 
> > > > > Do you have an example of such devices or we are talking about
> > > > > theoretical case?    
> > > > 
> > > > Look at any device that already has a reset quirk and the process it
> > > > took to get there.  Those are more than just theoretical cases.    
> > > 
> > > So let's fix the process. The long standing kernel policy is that kernel
> > > bugs (and missing quirk can be seen as such bug) should be fixed in the
> > > kernel and not workaround by the users.  
> > 
> > I don't see an actual proposal here to fix the process.  Allowing
> > specific reset methods to be trivially tested is a step towards fixing
> > the process.  Unfortunately we can't tell the difference between
> > someone setting a policy because they prefer a reset mechanism, are
> > testing a reset mechanism, or they're avoiding a broken reset mechanism.
> > We can't force participation if we've made it clear that the interface
> > should not be used long term for anything other than policy preference
> > and testing.  
> 
> Yes, and real testing/debugging almost always requires kernel rebuild.
> Everything else is waste of time.

Sorry, this is nonsense.  Allowing users to debug issues without a full
kernel rebuild is a good thing.

> > > > For policy preference, I already described how I've configured QEMU to
> > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > regarding the scope of a PM "soft reset".  This interface would allow a
> > > > system policy to do that same thing.
> > > > 
> > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > would resolve reset issues and create the best default general behavior.
> > > > This provides a mechanism to test various reset methods, and thereby
> > > > identify broken methods, and set a policy.  Sure, that policy might be
> > > > to avoid a broken reset in the interim before it gets quirked and
> > > > there's potential for abuse there, but I think the benefits outweigh
> > > > the risks.    
> > > 
> > > This interface is proposed as first class citizen in the general sysfs
> > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > 
> > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > by default.  
> > 
> > Of course we're proposing it to be accessible, it should also require
> > admin privileges to modify, sysfs has lots of such things.  If it's
> > relegated to non-default accessibility, it won't be used for testing
> > and it won't be available for system policy and it's pointless.  
> 
> We probably have difference in view of what testing is. I expect from
> the users who experience issues with reset to do extra steps and one of
> them is to require from them to compile their kernel.

I would define the ability to generate a CI test that can pick a
device, unbind it from its driver, and iterate reset methods as a
worthwhile improvement in testing.
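
For illustration only: a rough user-space sketch of such a test loop. It
assumes the device has already been unbound from its driver, that the
device directory exposes the existing "reset" attribute plus the
"reset_method" attribute proposed in this series, and that writing
"default" restores the original policy. write_attr() and
try_reset_methods() are hypothetical helper names, and the method strings
are supplied by the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a string to <dir>/<attr>. Returns 0 on success, -1 on failure. */
static int write_attr(const char *dir, const char *attr, const char *val)
{
	char path[512];
	FILE *f;
	int ok;

	snprintf(path, sizeof(path), "%s/%s", dir, attr);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(val, f) >= 0;
	/* Buffered write errors only surface when the stream is flushed. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Select each method in turn and trigger a reset with it.
 * Returns a bitmask with bit i set if methods[i] was accepted and the
 * subsequent reset write succeeded.
 */
static unsigned int try_reset_methods(const char *dev_dir,
				      const char *const *methods, int n)
{
	unsigned int passed = 0;

	assert(n >= 0 && n <= 32);
	for (int i = 0; i < n; i++) {
		if (write_attr(dev_dir, "reset_method", methods[i]) != 0)
			continue;	/* method rejected for this device */
		if (write_attr(dev_dir, "reset", "1") == 0)
			passed |= 1u << i;
	}
	/* Put the device back on the kernel's default policy. */
	write_attr(dev_dir, "reset_method", "default");
	return passed;
}
```

On a real system dev_dir would be the device's directory under
/sys/bus/pci/devices/. A successful write to "reset" only says the kernel
reported success; checking that the device actually came back in a sane
state is left to the test that calls this.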

> The root permissions doesn't protect from anything, SO lovers will use
> root without even thinking twice.

Yes, with great power comes great responsibility.  Many admins ignore
this.  That's far beyond the scope of this series.

> > > > > And I don't see why simple line parser with loop iterator over strchr()
> > > > > suddenly becomes complicated code.    
> > > > 
> > > > Setting multiple bits in a bitmap is easy.  How do you then go on to
> > > > allow the user to specify an ordering preference?  If you have an
> > > > algorithm you'd like to propose that allows the user to manage the
> > > > ordering when enabling multiple methods without substantially
> > > > increasing the complexity, please share.  IMO, a given device will
> > > > generally use one reset method and it seems sufficient to restrict user
> > > > preference to achieve all the use cases I've noted.  Thanks,    
> > > 
> > > Linked list + iterator will do the trick.  
> > 
> > So you're suggesting to add potentially multiple dynamic allocations per
> > device and list locking and management for an unspecified use case for
> > an interface you seem to be opposed to anyway.  It should be pretty
> > clear why the keep-it-simple approach was taken in this series.  Thanks,  
> 
> I'm trying to help you with your use case of providing reset policy
> mechanism, which can be without CONFIG_EXPERT. However if you want
> to continue path of having specific reset type only, please ensure
> that this is not taken to the "bypass kernel" direction.

You've lost me, are you saying you'd be in favor of an interface that
allows an admin to specify an arbitrary list of reset methods because
that's somehow more in line with a policy choice than a userspace
workaround?  This seems like unnecessary bloat because (a) it allows
the same bypass mechanism, and (b) a given device is only going to use
a single method anyway, so the functionality is unnecessary.  Please
help me understand how this favors the policy use case.  Thanks,

Alex
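
For illustration only: a simplified model of the fall-through behavior
referred to above, in which methods are tried in priority order and the
walk stops at the first one the device actually implements. It assumes,
as a sketch of the convention rather than a copy of the kernel code, that
a method returns -ENOTTY to mean "not supported, try the next one".
reset_fn, NR_METHODS, reset_function() and the two stubs are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_METHODS 4

typedef int (*reset_fn)(void *dev);

/* Stub methods used to exercise the walk. */
static int stub_unsupported(void *dev) { (void)dev; return -ENOTTY; }
static int stub_ok(void *dev) { (void)dev; return 0; }

/*
 * Walk the enabled methods in priority order (index 0 first).
 * -ENOTTY means "not supported, keep going"; any other result, success
 * or a real error, ends the walk. *used receives the index of the method
 * that produced the result.
 */
static int reset_function(void *dev, unsigned int enabled,
			  const reset_fn fns[NR_METHODS], int *used)
{
	for (int i = 0; i < NR_METHODS; i++) {
		int rc;

		if (!(enabled & (1u << i)) || !fns[i])
			continue;
		rc = fns[i](dev);
		if (rc != -ENOTTY) {
			if (used)
				*used = i;
			return rc;
		}
	}
	return -ENOTTY;	/* nothing enabled, or nothing supported */
}
```

With every bit enabled and the first method unsupported, the walk settles
on the second method and never reaches the rest, which is the sense in
which a given device ends up using a single method.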



* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-24 17:17                                                             ` Alex Williamson
@ 2021-03-25  8:37                                                               ` Leon Romanovsky
  2021-03-25 14:55                                                                 ` Alex Williamson
  2021-03-25 16:26                                                                 ` Amey Narkhede
  0 siblings, 2 replies; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-25  8:37 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Enrico Weigelt, metux IT consult, Amey Narkhede, raphael.norwitz,
	linux-pci, bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> On Wed, 24 Mar 2021 17:13:56 +0200
> Leon Romanovsky <leon@kernel.org> wrote:

<...>

> > Yes, and real testing/debugging almost always requires kernel rebuild.
> > Everything else is waste of time.
> 
> Sorry, this is nonsense.  Allowing users to debug issues without a full
> kernel rebuild is a good thing.

This is far from debugging: the interface doesn't tell you why the
reset didn't work, it just helps you find a method that does.

Unless you believe that this information will be enough to understand
the root cause, you will need to ask the user to perform extra tests,
maybe try some quirk. All of that requires users to rebuild their
kernel.

So no, it is not debugging.

> 
> > > > > For policy preference, I already described how I've configured QEMU to
> > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > regarding the scope of a PM "soft reset".  This interface would allow a
> > > > > system policy to do that same thing.
> > > > > 
> > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > would resolve reset issues and create the best default general behavior.
> > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > identify broken methods, and set a policy.  Sure, that policy might be
> > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > the risks.    
> > > > 
> > > > This interface is proposed as first class citizen in the general sysfs
> > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > 
> > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > by default.  
> > > 
> > > Of course we're proposing it to be accessible, it should also require
> > > admin privileges to modify, sysfs has lots of such things.  If it's
> > > relegated to non-default accessibility, it won't be used for testing
> > > and it won't be available for system policy and it's pointless.  
> > 
> > We probably have difference in view of what testing is. I expect from
> > the users who experience issues with reset to do extra steps and one of
> > them is to require from them to compile their kernel.
> 
> I would define the ability to generate a CI test that can pick a
> device, unbind it from its driver, and iterate reset methods as a
> worthwhile improvement in testing.

Who is going to run this CI? At least all the kernel CIs (external and
internal to HW vendors) that I'm familiar with build the kernel themselves.

A distro kernel is too bloated to be really usable for CI.

> 
> > The root permissions doesn't protect from anything, SO lovers will use
> > root without even thinking twice.
> 
> Yes, with great power comes great responsibility.  Many admins ignore
> this.  That's far beyond the scope of this series.

<...>

> > I'm trying to help you with your use case of providing reset policy
> > mechanism, which can be without CONFIG_EXPERT. However if you want
> > to continue path of having specific reset type only, please ensure
> > that this is not taken to the "bypass kernel" direction.
> 
> You've lost me, are you saying you'd be in favor of an interface that
> allows an admin to specify an arbitrary list of reset methods because
> that's somehow more in line with a policy choice than a userspace
> workaround?  This seems like unnecessary bloat because (a) it allows
> the same bypass mechanism, and (b) a given device is only going to use
> a single method anyway, so the functionality is unnecessary.  Please
> help me understand how this favors the policy use case.  Thanks,

The policy decision is global logic that is easier to grasp. At some
point in our discussion, you presented the case where PM reset is not
well defined and you prefer to do a bus reset (something like that).

I expect QEMU to set the same reset policy for all devices at once
instead of trying to guess per device which one works.

And yes, you will be able to bypass the kernel, but at least this
interface will be broader than the initial one, which serves only SO
and workarounds.

Thanks

> 
> Alex
> 


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-25  8:37                                                               ` Leon Romanovsky
@ 2021-03-25 14:55                                                                 ` Alex Williamson
  2021-03-25 16:09                                                                   ` Leon Romanovsky
  2021-03-25 16:26                                                                 ` Amey Narkhede
  1 sibling, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-25 14:55 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Enrico Weigelt, metux IT consult, Amey Narkhede, raphael.norwitz,
	linux-pci, bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Thu, 25 Mar 2021 10:37:54 +0200
Leon Romanovsky <leon@kernel.org> wrote:

> On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> > On Wed, 24 Mar 2021 17:13:56 +0200
> > Leon Romanovsky <leon@kernel.org> wrote:  
> 
> <...>
> 
> > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > Everything else is waste of time.  
> > 
> > Sorry, this is nonsense.  Allowing users to debug issues without a full
> > kernel rebuild is a good thing.  
> 
> It is far from debug, this interface doesn't give you any answers why
> the reset didn't work, it just helps you to find the one that works.
> 
> Unless you believe that this information will be enough to understand
> the root cause, you will need to ask from the user to perform extra
> tests, maybe try some quirk. All of that requires from the users to
> rebuild their kernel.
> 
> So no, it is not debug.

It allows a user to experiment to determine (a) my device doesn't work
in a given scenario with the default configuration, but (b) if I change
the reset to this other thing it does work.  That is a step in
debugging.

It's absurd to think that a sysfs attribute could provide root cause,
but it might be enough for someone to further help that user.  It would
be a useful clue for a bug report.  Yes, reaching root cause might
involve building a kernel, but that doesn't invalidate that having a
step towards debugging in the base kernel might be a useful tool.

> > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > regarding the scope of a PM "soft reset".  This interface would allow a
> > > > > > system policy to do that same thing.
> > > > > > 
> > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > identify broken methods, and set a policy.  Sure, that policy might be
> > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > the risks.      
> > > > > 
> > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > > 
> > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > by default.    
> > > > 
> > > > Of course we're proposing it to be accessible, it should also require
> > > > admin privileges to modify, sysfs has lots of such things.  If it's
> > > > relegated to non-default accessibility, it won't be used for testing
> > > > and it won't be available for system policy and it's pointless.    
> > > 
> > > We probably have difference in view of what testing is. I expect from
> > > the users who experience issues with reset to do extra steps and one of
> > > them is to require from them to compile their kernel.  
> > 
> > I would define the ability to generate a CI test that can pick a
> > device, unbind it from its driver, and iterate reset methods as a
> > worthwhile improvement in testing.  
> 
> Who is going to run this CI? At least all kernel CIs (external and
> internal to HW vendors) that I'm familiar are building kernel themselves.
> 
> Distro kernel is too bloat to be really usable for CI.

At this point I'm suspicious you're trolling.  A distro kernel CI
certainly uses the kernel they intend to ship and support in their
environment. You're concerned about a bloated kernel, but the proposal
here adds 2 bytes per device to track reset methods and a trivial array
in text memory, meanwhile you're proposing multiple per-device memory
allocations to enhance the feature you think is too bloated for CI.

> > > The root permissions doesn't protect from anything, SO lovers will use
> > > root without even thinking twice.  
> > 
> > Yes, with great power comes great responsibility.  Many admins ignore
> > this.  That's far beyond the scope of this series.  
> 
> <...>
> 
> > > I'm trying to help you with your use case of providing reset policy
> > > mechanism, which can be without CONFIG_EXPERT. However if you want
> > > to continue path of having specific reset type only, please ensure
> > > that this is not taken to the "bypass kernel" direction.  
> > 
> > You've lost me, are you saying you'd be in favor of an interface that
> > allows an admin to specify an arbitrary list of reset methods because
> > that's somehow more in line with a policy choice than a userspace
> > workaround?  This seems like unnecessary bloat because (a) it allows
> > the same bypass mechanism, and (b) a given device is only going to use
> > a single method anyway, so the functionality is unnecessary.  Please
> > help me understand how this favors the policy use case.  Thanks,  
> 
> The policy decision is global logic that is easier to grasp. At some
> point of our discussion, you presented the case where PM reset is not
> defined well and you prefer to do bus reset (something like that).
> 
> I expect that QEMU sets same reset policy for all devices at the same
> time instead of trying per-device to guess which one works.
> 
> And yes, you will be able to bypass kernel, but at least this interface
> will be broader than initial one that serves only SO and workarounds.

I still think allocating objects for a list and managing that list is
too bloated and complicated, but I agree that being able to have more
fine grained control could be useful.  Is it necessary to be able to
re-order reset methods or might it still be better aligned to a policy
use case if we allow plus and minus operators?  For example, a device
might list:

[pm] [bus]

Indicating that PM and bus reset are both available and enabled.  The
user could do:

echo -pm > reset_methods

This would result in:

pm [bus]

Indicating that both PM and bus resets are available, but only bus reset
is enabled (note this is the identical result to "echo bus >" in the
current proposal).  "echo +pm" or "echo default" could re-enable the PM
reset.  Would something like that be satisfactory?
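
The plus/minus semantics described above can be sketched with two bitmasks
per device, one for available methods and one for enabled methods. This is
only an illustration under stated assumptions: the names rst_state,
rst_store and rst_show, the two-entry method table and the bit positions
are hypothetical and not taken from the series, and the input is assumed
to have had its trailing newline stripped already.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical method table, for illustration only. */
enum { RST_PM, RST_BUS, RST_COUNT };
static const char *const rst_names[RST_COUNT] = { "pm", "bus" };

struct rst_state {
	unsigned char avail;	/* methods the device supports */
	unsigned char enabled;	/* subset of avail currently enabled */
};

static int rst_lookup(const char *name)
{
	for (int i = 0; i < RST_COUNT; i++)
		if (strcmp(name, rst_names[i]) == 0)
			return i;
	return -1;
}

/* Apply one write such as "-pm", "+pm" or "default"; 0 on success, -1 on error. */
static int rst_store(struct rst_state *s, const char *buf)
{
	int idx;

	if (strcmp(buf, "default") == 0) {
		s->enabled = s->avail;
		return 0;
	}
	if (buf[0] != '+' && buf[0] != '-')
		return -1;
	idx = rst_lookup(buf + 1);
	if (idx < 0 || !(s->avail & (1u << idx)))
		return -1;	/* unknown or unsupported method */
	if (buf[0] == '+')
		s->enabled |= (unsigned char)(1u << idx);
	else
		s->enabled &= (unsigned char)~(1u << idx);
	return 0;
}

/* Format available methods, with the enabled ones in brackets. */
static void rst_show(const struct rst_state *s, char *out, size_t len)
{
	size_t off = 0;

	assert(len > 0);
	out[0] = '\0';
	for (int i = 0; i < RST_COUNT && off < len; i++) {
		int on = (s->enabled >> i) & 1;
		int n;

		if (!(s->avail & (1u << i)))
			continue;
		n = snprintf(out + off, len - off, "%s%s%s%s",
			     off ? " " : "", on ? "[" : "",
			     rst_names[i], on ? "]" : "");
		if (n < 0)
			break;
		off += (size_t)n;
	}
}
```

With both methods available and enabled, rst_show() produces the
"[pm] [bus]" form from the example, and rst_store(&s, "-pm") followed by
rst_show() produces "pm [bus]".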

If we need to allow re-ordering, we'd want to use a byte-array where each
byte indicates a type of reset and perhaps a non-zero value in the
array indicates the method is enabled and the value indicates priority.
For example writing "dev_spec,flr,bus" would parse to write 1 to the
byte associated with the device specific reset, 2 to flr, 3 to bus
reset, then we'd process low to high (or maybe starting at a high value
to count down to zero might be simpler).  We could do that by adding
fewer than 8 bytes, a fixed amount, per device and with no dynamic
allocation.  Thoughts?  Thanks,
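
A rough sketch of that byte-array idea, for illustration only: the names
prio_parse and prio_at_rank, the METHOD_* indices and the set of method
strings are assumptions made for this example rather than anything in the
series, and the input is assumed to be a NUL-terminated list without a
trailing newline.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical method indices and names, for illustration only. */
enum { METHOD_DEV_SPEC, METHOD_FLR, METHOD_PM, METHOD_BUS, METHOD_COUNT };
static const char *const method_names[METHOD_COUNT] = {
	"dev_spec", "flr", "pm", "bus"
};

/*
 * Parse a comma-separated list into prio[]: 0 means disabled, 1 is tried
 * first, 2 second, and so on.  prio[] is only updated if the whole list
 * is valid.  Returns 0 on success, -1 on an unknown or repeated name.
 */
static int prio_parse(const char *list, unsigned char prio[METHOD_COUNT])
{
	unsigned char tmp[METHOD_COUNT] = { 0 };
	unsigned char next = 1;
	const char *p = list;

	assert(list != NULL);
	while (*p) {
		size_t n = strcspn(p, ",");
		int idx = -1;

		for (int i = 0; i < METHOD_COUNT; i++)
			if (strlen(method_names[i]) == n &&
			    strncmp(p, method_names[i], n) == 0)
				idx = i;
		if (idx < 0 || tmp[idx])
			return -1;
		tmp[idx] = next++;
		p += n;
		if (*p == ',')
			p++;
	}
	memcpy(prio, tmp, sizeof(tmp));
	return 0;
}

/* Return the method index holding the given 1-based rank, or -1 if none. */
static int prio_at_rank(const unsigned char prio[METHOD_COUNT],
			unsigned char rank)
{
	for (int i = 0; i < METHOD_COUNT; i++)
		if (prio[i] == rank)
			return i;
	return -1;
}
```

Processing low to high then amounts to calling prio_at_rank() with 1, 2,
3, ... until it returns -1.  The per-device cost in this sketch is
METHOD_COUNT bytes with no dynamic allocation.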

Alex



* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-25 14:55                                                                 ` Alex Williamson
@ 2021-03-25 16:09                                                                   ` Leon Romanovsky
  2021-03-25 17:22                                                                     ` Amey Narkhede
  2021-03-25 17:53                                                                     ` Alex Williamson
  0 siblings, 2 replies; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-25 16:09 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Enrico Weigelt, metux IT consult, Amey Narkhede, raphael.norwitz,
	linux-pci, bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Thu, Mar 25, 2021 at 08:55:04AM -0600, Alex Williamson wrote:
> On Thu, 25 Mar 2021 10:37:54 +0200
> Leon Romanovsky <leon@kernel.org> wrote:
> 
> > On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> > > On Wed, 24 Mar 2021 17:13:56 +0200
> > > Leon Romanovsky <leon@kernel.org> wrote:  
> > 
> > <...>
> > 
> > > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > > Everything else is waste of time.  
> > > 
> > > Sorry, this is nonsense.  Allowing users to debug issues without a full
> > > kernel rebuild is a good thing.  
> > 
> > It is far from debug, this interface doesn't give you any answers why
> > the reset didn't work, it just helps you to find the one that works.
> > 
> > Unless you believe that this information will be enough to understand
> > the root cause, you will need to ask from the user to perform extra
> > tests, maybe try some quirk. All of that requires from the users to
> > rebuild their kernel.
> > 
> > So no, it is not debug.
> 
> It allows a user to experiment to determine (a) my device doesn't work
> in a given scenario with the default configuration, but (b) if I change
> the reset to this other thing it does work.  That is a step in
> debugging.
> 
> It's absurd to think that a sysfs attribute could provide root cause,
> but it might be enough for someone to further help that user.  It would
> be a useful clue for a bug report.  Yes, reaching root cause might
> involve building a kernel, but that doesn't invalidate that having a
> step towards debugging in the base kernel might be a useful tool.

Let's agree to disagree.

> 
> > > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > > regarding the scope of a PM "soft reset".  This interface would allow a
> > > > > > > system policy to do that same thing.
> > > > > > > 
> > > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > > identify broken methods, and set a policy.  Sure, that policy might be
> > > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > > the risks.      
> > > > > > 
> > > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > > > 
> > > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > > by default.    
> > > > > 
> > > > > Of course we're proposing it to be accessible, it should also require
> > > > > admin privileges to modify, sysfs has lots of such things.  If it's
> > > > > relegated to non-default accessibility, it won't be used for testing
> > > > > and it won't be available for system policy and it's pointless.    
> > > > 
> > > > We probably have difference in view of what testing is. I expect from
> > > > the users who experience issues with reset to do extra steps and one of
> > > > them is to require from them to compile their kernel.  
> > > 
> > > I would define the ability to generate a CI test that can pick a
> > > device, unbind it from its driver, and iterate reset methods as a
> > > worthwhile improvement in testing.  
> > 
> > Who is going to run this CI? At least all kernel CIs (external and
> > internal to HW vendors) that I'm familiar are building kernel themselves.
> > 
> > Distro kernel is too bloat to be really usable for CI.
> 
> At this point I'm suspicious you're trolling.  A distro kernel CI
> certainly uses the kernel they intend to ship and support in their
> environment. You're concerned about a bloated kernel, but the proposal
> here adds 2-bytes per device to track reset methods and a trivial array
> in text memory, meanwhile you're proposing multiple per-device memory
> allocations to enhance the feature you think is too bloated for CI.

I don't know why you decided to focus on memory footprint, which is not
important at all during CI runs. The bloat is in Kconfig options that
are not needed. Those extra options add significant overhead to the
builds and to the runs themselves.

And no, I'm not trolling. I represent a HW vendor that pushes its CI
and developer environments to the limit, running full kernel builds in
less than 30 seconds and boot-to-test in less than 6 seconds for a full
Fedora VM.

> 
> > > > The root permissions doesn't protect from anything, SO lovers will use
> > > > root without even thinking twice.  
> > > 
> > > Yes, with great power comes great responsibility.  Many admins ignore
> > > this.  That's far beyond the scope of this series.  
> > 
> > <...>
> > 
> > > > I'm trying to help you with your use case of providing reset policy
> > > > mechanism, which can be without CONFIG_EXPERT. However if you want
> > > > to continue path of having specific reset type only, please ensure
> > > > that this is not taken to the "bypass kernel" direction.  
> > > 
> > > You've lost me, are you saying you'd be in favor of an interface that
> > > allows an admin to specify an arbitrary list of reset methods because
> > > that's somehow more in line with a policy choice than a userspace
> > > workaround?  This seems like unnecessary bloat because (a) it allows
> > > the same bypass mechanism, and (b) a given device is only going to use
> > > a single method anyway, so the functionality is unnecessary.  Please
> > > help me understand how this favors the policy use case.  Thanks,  
> > 
> > The policy decision is global logic that is easier to grasp. At some
> > point of our discussion, you presented the case where PM reset is not
> > defined well and you prefer to do bus reset (something like that).
> > 
> > I expect that QEMU sets same reset policy for all devices at the same
> > time instead of trying per-device to guess which one works.
> > 
> > And yes, you will be able to bypass kernel, but at least this interface
> > will be broader than initial one that serves only SO and workarounds.
> 
> I still think allocating objects for a list and managing that list is
> too bloated and complicated, but I agree that being able to have more
> fine grained control could be useful.  Is it necessary to be able to
> re-order reset methods or might it still be better aligned to a policy
> use case if we allow plus and minus operators?  For example, a device
> might list:
> 
> [pm] [bus]
> 
> Indicating that PM and bus reset are both available and enabled.  The
> user could do:
> 
> echo -pm > reset_methods
> 
> This would result in:
> 
> pm [bus]
> 
> Indicating that both PM and bus resets are available, but only bus reset
> is enabled (note this is the identical result to "echo bus >" in the
> current proposal).  "echo +pm" or "echo default" could re-enable the PM
> reset.  Would something like that be satisfactory?

Yes, I actually imagined a simpler interface:
To set a specific type:
echo pm > reset_methods
To set policy:
echo "pm,bus" > reset_methods

But your proposal is nicer.

> 
> If we need to allow re-ording, we'd want to use a byte-array where each
> byte indicates a type of reset and perhaps a non-zero value in the
> array indicates the method is enabled and the value indicates priority.
> For example writing "dev_spec,flr,bus" would parse to write 1 to the
> byte associated with the device specific reset, 2 to flr, 3 to bus
> reset, then we'd process low to high (or maybe starting at a high value
> to count down to zero might be more simple).  We could do that with
> only adding less than a fixed 8-bytes per device and no dynamic
> allocation.  Thoughts?  Thanks,

Like I suggested, a linked list will be easier and the reset will be
something like:
 for_each_reset_type(device, type) {
	switch (type) {
	case PM:
		ret = do_some_reset(device);
		break;
	case BUS:
		.....
	}
	if (!ret || ret == -ENOMEM)	/* <-- go to next type in linked list */
		return ret;
 }
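
One possible reading of that loop, as a self-contained sketch and not a
definitive implementation: walk the list in order, stop at the first
method that gives a definite result, and move on when a method reports
that it is not supported.  The struct reset_node type, try_resets() and
the two stub methods are hypothetical names for this example; the choice
of -ENOTTY as the "not supported, try the next one" code is an assumption
of the sketch and differs from the -ENOMEM check in the pseudocode.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical list node: one reset method plus a link to the next one. */
struct reset_node {
	int (*reset)(void *dev);
	const struct reset_node *next;
};

/*
 * Try each method in list order.  A return of -ENOTTY is taken to mean
 * "this method does not apply", so the walk continues; any other value
 * (success or a real error) ends the walk.
 */
static int try_resets(const struct reset_node *head, void *dev)
{
	int ret = -ENOTTY;

	for (const struct reset_node *n = head; n; n = n->next) {
		assert(n->reset != NULL);
		ret = n->reset(dev);
		if (ret != -ENOTTY)
			return ret;
	}
	return ret;
}

/* Stub methods for the example; dev points at a call counter. */
static int pm_reset_stub(void *dev)
{
	++*(int *)dev;
	return -ENOTTY;		/* pretend PM reset is not supported */
}

static int bus_reset_stub(void *dev)
{
	++*(int *)dev;
	return 0;		/* pretend bus reset succeeds */
}
```

With a list of pm followed by bus, try_resets() calls both stubs and
returns 0; with an empty list it returns -ENOTTY.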
     

> 
> Alex
> 


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-25  8:37                                                               ` Leon Romanovsky
  2021-03-25 14:55                                                                 ` Alex Williamson
@ 2021-03-25 16:26                                                                 ` Amey Narkhede
  2021-03-25 16:46                                                                   ` Leon Romanovsky
  1 sibling, 1 reply; 90+ messages in thread
From: Amey Narkhede @ 2021-03-25 16:26 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: info, raphael.norwitz, alex.williamson, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On 21/03/25 10:37AM, Leon Romanovsky wrote:
> On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> > On Wed, 24 Mar 2021 17:13:56 +0200
> > Leon Romanovsky <leon@kernel.org> wrote:
>
> <...>
>
> > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > Everything else is waste of time.
> >
> > Sorry, this is nonsense.  Allowing users to debug issues without a full
> > kernel rebuild is a good thing.
>
> It is far from debug, this interface doesn't give you any answers why
> the reset didn't work, it just helps you to find the one that works.
>
> Unless you believe that this information will be enough to understand
> the root cause, you will need to ask from the user to perform extra
> tests, maybe try some quirk. All of that requires from the users to
> rebuild their kernel.
>
> So no, it is not debug.
>
> >
> > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > regarding the scope of a PM "soft reset".  This interface would allow a
> > > > > > system policy to do that same thing.
> > > > > >
> > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > identify broken methods, and set a policy.  Sure, that policy might be
> > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > the risks.
> > > > >
> > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > >
> > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > by default.
> > > >
> > > > Of course we're proposing it to be accessible, it should also require
> > > > admin privileges to modify, sysfs has lots of such things.  If it's
> > > > relegated to non-default accessibility, it won't be used for testing
> > > > and it won't be available for system policy and it's pointless.
> > >
> > > We probably have difference in view of what testing is. I expect from
> > > the users who experience issues with reset to do extra steps and one of
> > > them is to require from them to compile their kernel.
> >
> > I would define the ability to generate a CI test that can pick a
> > device, unbind it from its driver, and iterate reset methods as a
> > worthwhile improvement in testing.
>
> Who is going to run this CI? At least all kernel CIs (external and
> internal to HW vendors) that I'm familiar are building kernel themselves.
>
> Distro kernel is too bloat to be really usable for CI.
>
> >
> > > The root permissions doesn't protect from anything, SO lovers will use
> > > root without even thinking twice.
> >
> > Yes, with great power comes great responsibility.  Many admins ignore
> > this.  That's far beyond the scope of this series.
>
> <...>
>
> > > I'm trying to help you with your use case of providing reset policy
> > > mechanism, which can be without CONFIG_EXPERT. However if you want
> > > to continue path of having specific reset type only, please ensure
> > > that this is not taken to the "bypass kernel" direction.
> >
> > You've lost me, are you saying you'd be in favor of an interface that
> > allows an admin to specify an arbitrary list of reset methods because
> > that's somehow more in line with a policy choice than a userspace
> > workaround?  This seems like unnecessary bloat because (a) it allows
> > the same bypass mechanism, and (b) a given device is only going to use
> > a single method anyway, so the functionality is unnecessary.  Please
> > help me understand how this favors the policy use case.  Thanks,
>
> The policy decision is global logic that is easier to grasp. At some
> point of our discussion, you presented the case where PM reset is not
> defined well and you prefer to do bus reset (something like that).
>
> I expect that QEMU sets same reset policy for all devices at the same
> time instead of trying per-device to guess which one works.
>
The current reset attribute already does internally what you
described at the end.
> And yes, you will be able to bypass kernel, but at least this interface
> will be broader than initial one that serves only SO and workarounds.
>
What do you mean by "bypassing" the kernel?
I don't see any problem with SO and workarounds if that is the only
way a user can use their device. Why are you expecting every vendor to
develop a quirk? Also I don't see any point in using a linked list to
unnecessarily complicate a simple thing.

Thanks,
Amey


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-25 16:26                                                                 ` Amey Narkhede
@ 2021-03-25 16:46                                                                   ` Leon Romanovsky
  0 siblings, 0 replies; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-25 16:46 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: info, raphael.norwitz, alex.williamson, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On Thu, Mar 25, 2021 at 09:56:37PM +0530, Amey Narkhede wrote:
> On 21/03/25 10:37AM, Leon Romanovsky wrote:
> > On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> > > On Wed, 24 Mar 2021 17:13:56 +0200
> > > Leon Romanovsky <leon@kernel.org> wrote:

<...>

> > I expect that QEMU sets same reset policy for all devices at the same
> > time instead of trying per-device to guess which one works.
> >
> The current reset attribute does the same thing internally you described
> at the end.
> > And yes, you will be able to bypass kernel, but at least this interface
> > will be broader than initial one that serves only SO and workarounds.
> >
> What does it mean by "bypassing" kernel?
> I don't see any problem with SO and workaround if that is the only
> way an user can use their device. Why are you expecting every vendor to
> develop quirk? Also I don't see any point of using linked list to
> unnecessarily complicate a simple thing.

Please reread my conversation with Alex; it has answers to both of your
questions.

Thanks

> 
> Thanks,
> Amey


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-25 16:09                                                                   ` Leon Romanovsky
@ 2021-03-25 17:22                                                                     ` Amey Narkhede
  2021-03-25 17:36                                                                       ` Leon Romanovsky
  2021-03-25 17:53                                                                     ` Alex Williamson
  1 sibling, 1 reply; 90+ messages in thread
From: Amey Narkhede @ 2021-03-25 17:22 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: info, raphael.norwitz, alex.williamson, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On 21/03/25 06:09PM, Leon Romanovsky wrote:
> On Thu, Mar 25, 2021 at 08:55:04AM -0600, Alex Williamson wrote:
> > On Thu, 25 Mar 2021 10:37:54 +0200
> > Leon Romanovsky <leon@kernel.org> wrote:
> >
> > > On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> > > > On Wed, 24 Mar 2021 17:13:56 +0200
> > > > Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > <...>
> > >
> > > > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > > > Everything else is waste of time.
> > > >
> > > > Sorry, this is nonsense.  Allowing users to debug issues without a full
> > > > kernel rebuild is a good thing.
> > >
> > > It is far from debug, this interface doesn't give you any answers why
> > > the reset didn't work, it just helps you to find the one that works.
> > >
> > > Unless you believe that this information will be enough to understand
> > > the root cause, you will need to ask from the user to perform extra
> > > tests, maybe try some quirk. All of that requires from the users to
> > > rebuild their kernel.
> > >
> > > So no, it is not debug.
> >
> > It allows a user to experiment to determine (a) my device doesn't work
> > in a given scenario with the default configuration, but (b) if I change
> > the reset to this other thing it does work.  That is a step in
> > debugging.
> >
> > It's absurd to think that a sysfs attribute could provide root cause,
> > but it might be enough for someone to further help that user.  It would
> > be a useful clue for a bug report.  Yes, reaching root cause might
> > involve building a kernel, but that doesn't invalidate that having a
> > step towards debugging in the base kernel might be a useful tool.
>
> Let's agree to do not agree.
>
> >
> > > > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > > > regarding the scope of a PM "soft reset".  This interface would allow a
> > > > > > > > system policy to do that same thing.
> > > > > > > >
> > > > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > > > identify broken methods, and set a policy.  Sure, that policy might be
> > > > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > > > the risks.
> > > > > > >
> > > > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > > > >
> > > > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > > > by default.
> > > > > >
> > > > > > Of course we're proposing it to be accessible, it should also require
> > > > > > admin privileges to modify, sysfs has lots of such things.  If it's
> > > > > > relegated to non-default accessibility, it won't be used for testing
> > > > > > and it won't be available for system policy and it's pointless.
> > > > >
> > > > > We probably have difference in view of what testing is. I expect from
> > > > > the users who experience issues with reset to do extra steps and one of
> > > > > them is to require from them to compile their kernel.
> > > >
> > > > I would define the ability to generate a CI test that can pick a
> > > > device, unbind it from its driver, and iterate reset methods as a
> > > > worthwhile improvement in testing.
> > >
> > > Who is going to run this CI? At least all kernel CIs (external and
> > > internal to HW vendors) that I'm familiar are building kernel themselves.
> > >
> > > Distro kernel is too bloated to be really usable for CI.
> >
> > At this point I'm suspicious you're trolling.  A distro kernel CI
> > certainly uses the kernel they intend to ship and support in their
> > environment. You're concerned about a bloated kernel, but the proposal
> > here adds 2-bytes per device to track reset methods and a trivial array
> > in text memory, meanwhile you're proposing multiple per-device memory
> > allocations to enhance the feature you think is too bloated for CI.
>
> I don't know why you decided to focus on memory footprint which is not
> important at all during CI runs. The bloat is in Kconfig options that
> are not needed. Those extra options add significant overhead during
> builds and runs itself.
>
> And no, I'm not trolling, but representing a HW vendor that pushes its CI
> and developers environment to the limit, by running full kernel builds with
> less than 30 seconds and boot-to-test with less than 6 seconds for full
> Fedora VM.
>
> >
> > > > > The root permissions doesn't protect from anything, SO lovers will use
> > > > > root without even thinking twice.
> > > >
> > > > Yes, with great power comes great responsibility.  Many admins ignore
> > > > this.  That's far beyond the scope of this series.
> > >
> > > <...>
> > >
> > > > > I'm trying to help you with your use case of providing reset policy
> > > > > mechanism, which can be without CONFIG_EXPERT. However if you want
> > > > > to continue path of having specific reset type only, please ensure
> > > > > that this is not taken to the "bypass kernel" direction.
> > > >
> > > > You've lost me, are you saying you'd be in favor of an interface that
> > > > allows an admin to specify an arbitrary list of reset methods because
> > > > that's somehow more in line with a policy choice than a userspace
> > > > workaround?  This seems like unnecessary bloat because (a) it allows
> > > > the same bypass mechanism, and (b) a given device is only going to use
> > > > a single method anyway, so the functionality is unnecessary.  Please
> > > > help me understand how this favors the policy use case.  Thanks,
> > >
> > > The policy decision is global logic that is easier to grasp. At some
> > > point of our discussion, you presented the case where PM reset is not
> > > defined well and you prefer to do bus reset (something like that).
> > >
> > > I expect that QEMU sets same reset policy for all devices at the same
> > > time instead of trying per-device to guess which one works.
> > >
> > > And yes, you will be able to bypass kernel, but at least this interface
> > > will be broader than initial one that serves only SO and workarounds.
> >
> > I still think allocating objects for a list and managing that list is
> > too bloated and complicated, but I agree that being able to have more
> > fine grained control could be useful.  Is it necessary to be able to
> > re-order reset methods or might it still be better aligned to a policy
> > use case if we allow plus and minus operators?  For example, a device
> > might list:
> >
> > [pm] [bus]
> >
> > Indicating that PM and bus reset are both available and enabled.  The
> > user could do:
> >
> > echo -pm > reset_methods
> >
> > This would result in:
> >
> > pm [bus]
> >
> > Indicating that both PM and bus resets are available, but only bus reset
> > is enabled (note this is the identical result to "echo bus >" in the
> > current proposal).  "echo +pm" or "echo default" could re-enable the PM
> > reset.  Would something like that be satisfactory?
>
> Yes, I actually imagined simpler interface:
> To set specific type:
> echo pm > reset_methods
> To set policy:
> echo "pm,bus" > reset_methods
>
> But your proposal is nicer.
>
Okay, I'll include this in v2.
> >
> > If we need to allow re-ordering, we'd want to use a byte-array where each
> > byte indicates a type of reset and perhaps a non-zero value in the
> > array indicates the method is enabled and the value indicates priority.
> > For example writing "dev_spec,flr,bus" would parse to write 1 to the
> > byte associated with the device specific reset, 2 to flr, 3 to bus
> > reset, then we'd process low to high (or maybe starting at a high value
> > to count down to zero might be more simple).  We could do that with
> > only adding less than a fixed 8-bytes per device and no dynamic
> > allocation.  Thoughts?  Thanks,
>
> Like I suggested, linked list will be easier and the reset will be
> something like:
>  for_each_reset_type(device, type) {
>    switch (type) {
>    	case PM:
> 	       ret = do_some_reset(device);
> 	       break;
> 	case BUS:
> 		.....
> 	}
>    if (!ret || ret == -ENOMEM)  <-- go to next type in linked list
>      return ret;
>    }
>
Maybe we can use a byte array here. Let's consider the current
pci_reset_fn_methods array. If the input is "pm, flr", we can have a byte
array holding the index of each of those methods in pci_reset_fn_methods,
like [3, 1]. So when the user triggers a reset, we try the reset method at
index 3 (pm) first and then the one at index 1 (flr).
Does that make sense?
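
A minimal userspace sketch of that idea, for illustration only. The
name table is assumed to follow the order of pci_reset_fn_methods;
parse_order(), NO_METHOD and the rejection of repeated names are
hypothetical choices, not kernel code. The sketch takes the list
without spaces or a trailing newline ("pm,flr"):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Names only; the order is assumed to match pci_reset_fn_methods. */
static const char *const method_names[] = {
	"device_specific", "flr", "af_flr", "pm", "bus",
};
#define NUM_METHODS (sizeof(method_names) / sizeof(method_names[0]))
#define NO_METHOD 0xff	/* hypothetical marker for an unused slot */

/*
 * Parse a comma separated list such as "pm,flr" into @order, which holds
 * table indices in the order the methods should be tried. Returns the
 * number of entries, or -EINVAL for an unknown or repeated name.
 */
static int parse_order(const char *input, unsigned char order[NUM_METHODS])
{
	int n = 0;

	assert(input && order);
	memset(order, NO_METHOD, NUM_METHODS);

	while (*input) {
		size_t len = strcspn(input, ",");
		int idx = -1;

		for (size_t i = 0; i < NUM_METHODS; i++)
			if (strlen(method_names[i]) == len &&
			    !strncmp(input, method_names[i], len))
				idx = (int)i;

		/* reject unknown names and names that were already listed */
		if (idx < 0 || memchr(order, idx, (size_t)n))
			return -EINVAL;
		order[n++] = (unsigned char)idx;

		input += len;
		if (*input == ',')
			input++;
	}
	return n;
}
```

With this layout the array itself is the priority order, so a reset
would simply walk order[0], order[1], ... until it reaches NO_METHOD.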

Thanks,
Amey


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-25 17:22                                                                     ` Amey Narkhede
@ 2021-03-25 17:36                                                                       ` Leon Romanovsky
  0 siblings, 0 replies; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-25 17:36 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: info, raphael.norwitz, alex.williamson, linux-pci, bhelgaas,
	linux-kernel, alay.shah, suresh.gumpula, shyam.rajendran, felipe

On Thu, Mar 25, 2021 at 10:52:57PM +0530, Amey Narkhede wrote:
> On 21/03/25 06:09PM, Leon Romanovsky wrote:
> > On Thu, Mar 25, 2021 at 08:55:04AM -0600, Alex Williamson wrote:
> > > On Thu, 25 Mar 2021 10:37:54 +0200
> > > Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > > On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> > > > > On Wed, 24 Mar 2021 17:13:56 +0200
> > > > > Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > <...>
> > > >
> > > > > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > > > > Everything else is waste of time.
> > > > >
> > > > > Sorry, this is nonsense.  Allowing users to debug issues without a full
> > > > > kernel rebuild is a good thing.
> > > >
> > > > It is far from debug, this interface doesn't give you any answers why
> > > > the reset didn't work, it just helps you to find the one that works.
> > > >
> > > > Unless you believe that this information will be enough to understand
> > > > the root cause, you will need to ask from the user to perform extra
> > > > tests, maybe try some quirk. All of that requires from the users to
> > > > rebuild their kernel.
> > > >
> > > > So no, it is not debug.
> > >
> > > It allows a user to experiment to determine (a) my device doesn't work
> > > in a given scenario with the default configuration, but (b) if I change
> > > the reset to this other thing it does work.  That is a step in
> > > debugging.
> > >
> > > It's absurd to think that a sysfs attribute could provide root cause,
> > > but it might be enough for someone to further help that user.  It would
> > > be a useful clue for a bug report.  Yes, reaching root cause might
> > > involve building a kernel, but that doesn't invalidate that having a
> > > step towards debugging in the base kernel might be a useful tool.
> >
> > Let's agree to disagree.
> >
> > >
> > > > > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > > > > regarding the scope of a PM "soft reset".  This interface would allow a
> > > > > > > > > system policy to do that same thing.
> > > > > > > > >
> > > > > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > > > > identify broken methods, and set a policy.  Sure, that policy might be
> > > > > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > > > > the risks.
> > > > > > > >
> > > > > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > > > > >
> > > > > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > > > > by default.
> > > > > > >
> > > > > > > Of course we're proposing it to be accessible, it should also require
> > > > > > > admin privileges to modify, sysfs has lots of such things.  If it's
> > > > > > > relegated to non-default accessibility, it won't be used for testing
> > > > > > > and it won't be available for system policy and it's pointless.
> > > > > >
> > > > > > We probably have difference in view of what testing is. I expect from
> > > > > > the users who experience issues with reset to do extra steps and one of
> > > > > > them is to require from them to compile their kernel.
> > > > >
> > > > > I would define the ability to generate a CI test that can pick a
> > > > > device, unbind it from its driver, and iterate reset methods as a
> > > > > worthwhile improvement in testing.
> > > >
> > > > Who is going to run this CI? At least all kernel CIs (external and
> > > > internal to HW vendors) that I'm familiar are building kernel themselves.
> > > >
> > > > Distro kernel is too bloated to be really usable for CI.
> > >
> > > At this point I'm suspicious you're trolling.  A distro kernel CI
> > > certainly uses the kernel they intend to ship and support in their
> > > environment. You're concerned about a bloated kernel, but the proposal
> > > here adds 2-bytes per device to track reset methods and a trivial array
> > > in text memory, meanwhile you're proposing multiple per-device memory
> > > allocations to enhance the feature you think is too bloated for CI.
> >
> > I don't know why you decided to focus on memory footprint which is not
> > important at all during CI runs. The bloat is in Kconfig options that
> > are not needed. Those extra options add significant overhead during
> > builds and runs itself.
> >
> > And no, I'm not trolling, but representing a HW vendor that pushes its CI
> > and developers environment to the limit, by running full kernel builds with
> > less than 30 seconds and boot-to-test with less than 6 seconds for full
> > Fedora VM.
> >
> > >
> > > > > > The root permissions doesn't protect from anything, SO lovers will use
> > > > > > root without even thinking twice.
> > > > >
> > > > > Yes, with great power comes great responsibility.  Many admins ignore
> > > > > this.  That's far beyond the scope of this series.
> > > >
> > > > <...>
> > > >
> > > > > > I'm trying to help you with your use case of providing reset policy
> > > > > > mechanism, which can be without CONFIG_EXPERT. However if you want
> > > > > > to continue path of having specific reset type only, please ensure
> > > > > > that this is not taken to the "bypass kernel" direction.
> > > > >
> > > > > You've lost me, are you saying you'd be in favor of an interface that
> > > > > allows an admin to specify an arbitrary list of reset methods because
> > > > > that's somehow more in line with a policy choice than a userspace
> > > > > workaround?  This seems like unnecessary bloat because (a) it allows
> > > > > the same bypass mechanism, and (b) a given device is only going to use
> > > > > a single method anyway, so the functionality is unnecessary.  Please
> > > > > help me understand how this favors the policy use case.  Thanks,
> > > >
> > > > The policy decision is global logic that is easier to grasp. At some
> > > > point of our discussion, you presented the case where PM reset is not
> > > > defined well and you prefer to do bus reset (something like that).
> > > >
> > > > I expect that QEMU sets same reset policy for all devices at the same
> > > > time instead of trying per-device to guess which one works.
> > > >
> > > > And yes, you will be able to bypass kernel, but at least this interface
> > > > will be broader than initial one that serves only SO and workarounds.
> > >
> > > I still think allocating objects for a list and managing that list is
> > > too bloated and complicated, but I agree that being able to have more
> > > fine grained control could be useful.  Is it necessary to be able to
> > > re-order reset methods or might it still be better aligned to a policy
> > > use case if we allow plus and minus operators?  For example, a device
> > > might list:
> > >
> > > [pm] [bus]
> > >
> > > Indicating that PM and bus reset are both available and enabled.  The
> > > user could do:
> > >
> > > echo -pm > reset_methods
> > >
> > > This would result in:
> > >
> > > pm [bus]
> > >
> > > Indicating that both PM and bus resets are available, but only bus reset
> > > is enabled (note this is the identical result to "echo bus >" in the
> > > current proposal).  "echo +pm" or "echo default" could re-enable the PM
> > > reset.  Would something like that be satisfactory?
> >
> > Yes, I actually imagined simpler interface:
> > To set specific type:
> > echo pm > reset_methods
> > To set policy:
> > echo "pm,bus" > reset_methods
> >
> > But your proposal is nicer.
> >
> Okay, I'll include this in v2.
> > >
> > > If we need to allow re-ordering, we'd want to use a byte-array where each
> > > byte indicates a type of reset and perhaps a non-zero value in the
> > > array indicates the method is enabled and the value indicates priority.
> > > For example writing "dev_spec,flr,bus" would parse to write 1 to the
> > > byte associated with the device specific reset, 2 to flr, 3 to bus
> > > reset, then we'd process low to high (or maybe starting at a high value
> > > to count down to zero might be more simple).  We could do that with
> > > only adding less than a fixed 8-bytes per device and no dynamic
> > > allocation.  Thoughts?  Thanks,
> >
> > Like I suggested, linked list will be easier and the reset will be
> > something like:
> >  for_each_reset_type(device, type) {
> >    switch (type) {
> >    	case PM:
> > 	       ret = do_some_reset(device);
> > 	       break;
> > 	case BUS:
> > 		.....
> > 	}
> >    if (!ret || ret == -ENOMEM)  <-- go to next type in linked list
> >      return ret;
> >    }
> >
> Maybe we can use a byte array here. Let's consider the current
> pci_reset_fn_methods array. If the input is "pm, flr", we can have a byte
> array holding the index of each of those methods in pci_reset_fn_methods,
> like [3, 1]. So when the user triggers a reset, we try the reset method at
> index 3 (pm) first and then the one at index 1 (flr).
> Does that make sense?

I'm not worried about the in-kernel implementation; we can rewrite it if
needed. The most important part is the user-visible ABI, which we won't
be able to fix afterwards.

Thanks

> 
> Thanks,
> Amey


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-25 16:09                                                                   ` Leon Romanovsky
  2021-03-25 17:22                                                                     ` Amey Narkhede
@ 2021-03-25 17:53                                                                     ` Alex Williamson
  2021-03-26  6:40                                                                       ` Leon Romanovsky
  1 sibling, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-25 17:53 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Enrico Weigelt, metux IT consult, Amey Narkhede, raphael.norwitz,
	linux-pci, bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Thu, 25 Mar 2021 18:09:58 +0200
Leon Romanovsky <leon@kernel.org> wrote:

> On Thu, Mar 25, 2021 at 08:55:04AM -0600, Alex Williamson wrote:
> > On Thu, 25 Mar 2021 10:37:54 +0200
> > Leon Romanovsky <leon@kernel.org> wrote:
> >   
> > > On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:  
> > > > On Wed, 24 Mar 2021 17:13:56 +0200
> > > > Leon Romanovsky <leon@kernel.org> wrote:    
> > > 
> > > <...>
> > >   
> > > > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > > > Everything else is waste of time.    
> > > > 
> > > > Sorry, this is nonsense.  Allowing users to debug issues without a full
> > > > kernel rebuild is a good thing.    
> > > 
> > > It is far from debug, this interface doesn't give you any answers why
> > > the reset didn't work, it just helps you to find the one that works.
> > > 
> > > Unless you believe that this information will be enough to understand
> > > the root cause, you will need to ask from the user to perform extra
> > > tests, maybe try some quirk. All of that requires from the users to
> > > rebuild their kernel.
> > > 
> > > So no, it is not debug.  
> > 
> > It allows a user to experiment to determine (a) my device doesn't work
> > in a given scenario with the default configuration, but (b) if I change
> > the reset to this other thing it does work.  That is a step in
> > debugging.
> > 
> > It's absurd to think that a sysfs attribute could provide root cause,
> > but it might be enough for someone to further help that user.  It would
> > be a useful clue for a bug report.  Yes, reaching root cause might
> > involve building a kernel, but that doesn't invalidate that having a
> > step towards debugging in the base kernel might be a useful tool.  
> 
> Let's agree to disagree.
> 
> >   
> > > > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > > > regarding the scope of a PM "soft reset".  This interface would allow a
> > > > > > > > system policy to do that same thing.
> > > > > > > > 
> > > > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > > > identify broken methods, and set a policy.  Sure, that policy might be
> > > > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > > > the risks.        
> > > > > > > 
> > > > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > > > > 
> > > > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > > > by default.      
> > > > > > 
> > > > > > Of course we're proposing it to be accessible, it should also require
> > > > > > admin privileges to modify, sysfs has lots of such things.  If it's
> > > > > > relegated to non-default accessibility, it won't be used for testing
> > > > > > and it won't be available for system policy and it's pointless.      
> > > > > 
> > > > > We probably have difference in view of what testing is. I expect from
> > > > > the users who experience issues with reset to do extra steps and one of
> > > > > them is to require from them to compile their kernel.    
> > > > 
> > > > I would define the ability to generate a CI test that can pick a
> > > > device, unbind it from its driver, and iterate reset methods as a
> > > > worthwhile improvement in testing.    
> > > 
> > > Who is going to run this CI? At least all kernel CIs (external and
> > > internal to HW vendors) that I'm familiar are building kernel themselves.
> > > 
> > > Distro kernel is too bloated to be really usable for CI.  
> > 
> > At this point I'm suspicious you're trolling.  A distro kernel CI
> > certainly uses the kernel they intend to ship and support in their
> > environment. You're concerned about a bloated kernel, but the proposal
> > here adds 2-bytes per device to track reset methods and a trivial array
> > in text memory, meanwhile you're proposing multiple per-device memory
> > allocations to enhance the feature you think is too bloated for CI.  
> 
> I don't know why you decided to focus on memory footprint which is not
> important at all during CI runs. The bloat is in Kconfig options that
> are not needed. Those extra options add significant overhead during
> builds and runs itself.
> 
> And no, I'm not trolling, but representing a HW vendor that pushes its CI
> and developers environment to the limit, by running full kernel builds with
> less than 30 seconds and boot-to-test with less than 6 seconds for full
> Fedora VM.

CI is only one area where I think this interface could be useful; as
noted below, there's also a policy use case.  Therefore my inclination is
that this would be included in default kernels, where avoiding bloat is a
good thing.  A CI environment can be used in different ways; it's not
necessarily building a new kernel for every test, nor do typical users
have access to those types of environments when gathering information
for a bug report.
   
> > > > > The root permissions doesn't protect from anything, SO lovers will use
> > > > > root without even thinking twice.    
> > > > 
> > > > Yes, with great power comes great responsibility.  Many admins ignore
> > > > this.  That's far beyond the scope of this series.    
> > > 
> > > <...>
> > >   
> > > > > I'm trying to help you with your use case of providing reset policy
> > > > > mechanism, which can be without CONFIG_EXPERT. However if you want
> > > > > to continue path of having specific reset type only, please ensure
> > > > > that this is not taken to the "bypass kernel" direction.    
> > > > 
> > > > You've lost me, are you saying you'd be in favor of an interface that
> > > > allows an admin to specify an arbitrary list of reset methods because
> > > > that's somehow more in line with a policy choice than a userspace
> > > > workaround?  This seems like unnecessary bloat because (a) it allows
> > > > the same bypass mechanism, and (b) a given device is only going to use
> > > > a single method anyway, so the functionality is unnecessary.  Please
> > > > help me understand how this favors the policy use case.  Thanks,    
> > > 
> > > The policy decision is global logic that is easier to grasp. At some
> > > point of our discussion, you presented the case where PM reset is not
> > > defined well and you prefer to do bus reset (something like that).
> > > 
> > > I expect that QEMU sets same reset policy for all devices at the same
> > > time instead of trying per-device to guess which one works.
> > > 
> > > And yes, you will be able to bypass kernel, but at least this interface
> > > will be broader than initial one that serves only SO and workarounds.  
> > 
> > I still think allocating objects for a list and managing that list is
> > too bloated and complicated, but I agree that being able to have more
> > fine grained control could be useful.  Is it necessary to be able to
> > re-order reset methods or might it still be better aligned to a policy
> > use case if we allow plus and minus operators?  For example, a device
> > might list:
> > 
> > [pm] [bus]
> > 
> > Indicating that PM and bus reset are both available and enabled.  The
> > user could do:
> > 
> > echo -pm > reset_methods
> > 
> > This would result in:
> > 
> > pm [bus]
> > 
> > Indicating that both PM and bus resets are available, but only bus reset
> > is enabled (note this is the identical result to "echo bus >" in the
> > current proposal).  "echo +pm" or "echo default" could re-enable the PM
> > reset.  Would something like that be satisfactory?  
> 
> Yes, I actually imagined simpler interface:
> To set specific type:
> echo pm > reset_methods
> To set policy:
> echo "pm,bus" > reset_methods
> 
> But your proposal is nicer.

The above doesn't support re-ordering though, we'll need to parse a
comma separated list for that.
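
To illustrate the difference, here is a hedged userspace sketch of the
plus/minus form.  It only toggles membership in a set whose order is
fixed by the table, which is why it cannot express "bus before pm".
apply_op(), show() and the bit layout are hypothetical names chosen for
this example:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static const char *const method_names[] = {
	"device_specific", "flr", "af_flr", "pm", "bus",
};
#define NUM_METHODS (sizeof(method_names) / sizeof(method_names[0]))

/*
 * Apply one operation to @enabled, a bitmask with bit i set when method i
 * is enabled. @supported is the mask found at probe time. Accepted forms
 * are "+name", "-name" and "default".
 */
static int apply_op(const char *op, unsigned int supported,
		    unsigned int *enabled)
{
	assert(op && enabled);

	if (!strcmp(op, "default")) {
		*enabled = supported;
		return 0;
	}
	if (op[0] != '+' && op[0] != '-')
		return -EINVAL;

	for (size_t i = 0; i < NUM_METHODS; i++) {
		unsigned int bit = 1u << i;

		if (strcmp(op + 1, method_names[i]))
			continue;
		if (!(supported & bit))
			return -EINVAL;	/* device cannot do this reset */
		if (op[0] == '+')
			*enabled |= bit;
		else
			*enabled &= ~bit;
		return 0;
	}
	return -EINVAL;	/* unknown method name */
}

/* Render the supported methods, bracketing enabled ones: "pm [bus]". */
static void show(unsigned int supported, unsigned int enabled,
		 char *buf, size_t len)
{
	size_t off = 0;

	assert(buf && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < NUM_METHODS; i++) {
		unsigned int bit = 1u << i;

		if (!(supported & bit))
			continue;
		if (off >= len)
			break;	/* output was truncated, stop appending */
		off += (size_t)snprintf(buf + off, len - off, "%s%s%s%s",
					off ? " " : "",
					(enabled & bit) ? "[" : "",
					method_names[i],
					(enabled & bit) ? "]" : "");
	}
}
```

A single bitmask is enough state for this form, but any ordering
preference needs something like the priority array discussed below.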

> > If we need to allow re-ordering, we'd want to use a byte-array where each
> > byte indicates a type of reset and perhaps a non-zero value in the
> > array indicates the method is enabled and the value indicates priority.
> > For example writing "dev_spec,flr,bus" would parse to write 1 to the
> > byte associated with the device specific reset, 2 to flr, 3 to bus
> > reset, then we'd process low to high (or maybe starting at a high value
> > to count down to zero might be more simple).  We could do that with
> > only adding less than a fixed 8-bytes per device and no dynamic
> > allocation.  Thoughts?  Thanks,  
> 
> Like I suggested, linked list will be easier and the reset will be
> something like:
>  for_each_reset_type(device, type) { 
>    switch (type) {
>    	case PM:
> 	       ret = do_some_reset(device);
> 	       break;
> 	case BUS:
> 		.....
> 	}
>    if (!ret || ret == -ENOMEM)  <-- go to next type in linked list
>      return ret;
>    }

Perhaps Bjorn has some thoughts, but I don't like the dynamic memory
allocation and list management required for a linked list.  Once bus &
slot reset are combined, I think we're talking about potentially 5
reset methods, so if we had:

const struct pci_reset_fn_method pci_reset_fn_methods[] = {
	{ .reset_fn = &pci_dev_specific_reset, .name = "device_specific" },
	{ .reset_fn = &pcie_flr, .name = "flr" },
	{ .reset_fn = &pci_af_flr, .name = "af_flr" },
	{ .reset_fn = &pci_pm_reset, .name = "pm" },
	{ .reset_fn = &pci_reset_bus_function, .name = "bus" },
};

The pci_dev could include

	u8 reset_methods[ARRAY_SIZE(pci_reset_fn_methods)];

And we could loop as:

u8 prio;

for (prio = ARRAY_SIZE(pci_reset_fn_methods); prio; prio--) {
	int i;

	for (i = 0; i < ARRAY_SIZE(pci_reset_fn_methods); i++) {
		if (dev->reset_methods[i] == prio) {
			ret = pci_reset_fn_methods[i].reset_fn(dev, probe);
			if (ret != -ENOTTY)
				return ret;
			break;
		}
	}
	if (i == ARRAY_SIZE(pci_reset_fn_methods))
		break;
}

return -ENOTTY;

The sysfs _store function would probably do something like:

u8 reset_methods[ARRAY_SIZE(pci_reset_fn_methods)] = { 0 };
u8 prio = ARRAY_SIZE(pci_reset_fn_methods);

for each @string in comma separated list from user... {
	int i;

	for (i = 0; i < ARRAY_SIZE(pci_reset_fn_methods); i++) {
		if (!strcmp(@string, pci_reset_fn_methods[i].name)) {
			/* a repeated name would leave a gap in the priorities */
			if (reset_methods[i])
				return -EINVAL;
			reset_methods[i] = prio--;
			break;
		}
	}

	if (i == ARRAY_SIZE(pci_reset_fn_methods))
		return -EINVAL;
}

memcpy(dev->reset_methods, reset_methods, sizeof(reset_methods));

The probe would also need to fill the array in a compatible way:

u8 reset_methods[ARRAY_SIZE(pci_reset_fn_methods)] = { 0 };
u8 prio = ARRAY_SIZE(pci_reset_fn_methods);
int i;

for (i = 0; i < ARRAY_SIZE(pci_reset_fn_methods); i++) {
	int ret = pci_reset_fn_methods[i].reset_fn(dev, 1);

	if (!ret)
		reset_methods[i] = prio--;
	else if (ret != -ENOTTY)
		break;
}
	
memcpy(dev->reset_methods, reset_methods, sizeof(reset_methods));
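
The three fragments above can be exercised together in a rough
userspace model.  This is a sketch under stated assumptions, not the
kernel implementation: struct fake_dev, the stub_* reset functions, the
three-entry table and the strcspn() based tokenizing are hypothetical
stand-ins, and rejecting a repeated name in store_methods() is an
assumption made so the priorities stay contiguous:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define NUM_METHODS 3

/* Hypothetical stand-in for struct pci_dev. */
struct fake_dev {
	unsigned char reset_methods[NUM_METHODS]; /* priority, 0 = disabled */
	const char *last_reset;                   /* method that last ran */
};

/* Stub resets: "flr" is unsupported, "pm" and "bus" always succeed. */
static int stub_flr(struct fake_dev *dev, int probe)
{
	(void)dev;
	(void)probe;
	return -ENOTTY;
}

static int stub_pm(struct fake_dev *dev, int probe)
{
	if (!probe)
		dev->last_reset = "pm";
	return 0;
}

static int stub_bus(struct fake_dev *dev, int probe)
{
	if (!probe)
		dev->last_reset = "bus";
	return 0;
}

static const struct {
	int (*reset_fn)(struct fake_dev *dev, int probe);
	const char *name;
} methods[NUM_METHODS] = {
	{ stub_flr, "flr" }, { stub_pm, "pm" }, { stub_bus, "bus" },
};

/* Probe: supported methods get descending priority in table order. */
static void probe_methods(struct fake_dev *dev)
{
	unsigned char prio = NUM_METHODS;

	memset(dev->reset_methods, 0, sizeof(dev->reset_methods));
	for (size_t i = 0; i < NUM_METHODS; i++) {
		int ret = methods[i].reset_fn(dev, 1);

		if (!ret)
			dev->reset_methods[i] = prio--;
		else if (ret != -ENOTTY)
			break;
	}
}

/* Store: the first name in the comma separated list gets top priority. */
static int store_methods(struct fake_dev *dev, const char *list)
{
	unsigned char tmp[NUM_METHODS] = { 0 };
	unsigned char prio = NUM_METHODS;

	assert(dev && list);
	while (*list) {
		size_t len = strcspn(list, ",");
		size_t i;

		for (i = 0; i < NUM_METHODS; i++)
			if (strlen(methods[i].name) == len &&
			    !strncmp(list, methods[i].name, len))
				break;
		/* unknown name, or a repeat that would leave a priority gap */
		if (i == NUM_METHODS || tmp[i])
			return -EINVAL;
		tmp[i] = prio--;
		list += len;
		if (*list == ',')
			list++;
	}
	memcpy(dev->reset_methods, tmp, sizeof(tmp));
	return 0;
}

/* Reset: try methods from the highest priority down, skipping -ENOTTY. */
static int do_reset(struct fake_dev *dev)
{
	for (unsigned char prio = NUM_METHODS; prio; prio--) {
		size_t i;

		for (i = 0; i < NUM_METHODS; i++) {
			if (dev->reset_methods[i] == prio) {
				int ret = methods[i].reset_fn(dev, 0);

				if (ret != -ENOTTY)
					return ret;
				break;
			}
		}
		if (i == NUM_METHODS)
			break;	/* no method holds this priority: done */
	}
	return -ENOTTY;
}
```

In this model, probing leaves "pm" ahead of "bus"; storing "bus,pm"
swaps them, and storing "flr,pm" falls through the unsupported flr stub
to pm.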

Thanks,
Alex



* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-25 17:53                                                                     ` Alex Williamson
@ 2021-03-26  6:40                                                                       ` Leon Romanovsky
  2021-03-26  9:18                                                                         ` Krzysztof Wilczyński
  2021-03-26 14:20                                                                         ` Alex Williamson
  0 siblings, 2 replies; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-26  6:40 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Enrico Weigelt, metux IT consult, Amey Narkhede, raphael.norwitz,
	linux-pci, bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Thu, Mar 25, 2021 at 11:53:24AM -0600, Alex Williamson wrote:
> On Thu, 25 Mar 2021 18:09:58 +0200
> Leon Romanovsky <leon@kernel.org> wrote:
> 
> > On Thu, Mar 25, 2021 at 08:55:04AM -0600, Alex Williamson wrote:
> > > On Thu, 25 Mar 2021 10:37:54 +0200
> > > Leon Romanovsky <leon@kernel.org> wrote:
> > >   
> > > > On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:  
> > > > > On Wed, 24 Mar 2021 17:13:56 +0200
> > > > > Leon Romanovsky <leon@kernel.org> wrote:    
> > > > 
> > > > <...>
> > > >   
> > > > > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > > > > Everything else is waste of time.    
> > > > > 
> > > > > Sorry, this is nonsense.  Allowing users to debug issues without a full
> > > > > kernel rebuild is a good thing.    
> > > > 
> > > > It is far from debug, this interface doesn't give you any answers why
> > > > the reset didn't work, it just helps you to find the one that works.
> > > > 
> > > > Unless you believe that this information will be enough to understand
> > > > the root cause, you will need to ask from the user to perform extra
> > > > tests, maybe try some quirk. All of that requires from the users to
> > > > rebuild their kernel.
> > > > 
> > > > So no, it is not debug.  
> > > 
> > > It allows a user to experiment to determine (a) my device doesn't work
> > > in a given scenario with the default configuration, but (b) if I change
> > > the reset to this other thing it does work.  That is a step in
> > > debugging.
> > > 
> > > It's absurd to think that a sysfs attribute could provide root cause,
> > > but it might be enough for someone to further help that user.  It would
> > > be a useful clue for a bug report.  Yes, reaching root cause might
> > > involve building a kernel, but that doesn't invalidate that having a
> > > step towards debugging in the base kernel might be a useful tool.  
> > 
> > Let's agree to disagree.
> > 
> > >   
> > > > > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > > > > regarding the scope of a PM "soft reset".  This interface would allow a
> > > > > > > > > system policy to do that same thing.
> > > > > > > > > 
> > > > > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > > > > identify broken methods, and set a policy.  Sure, that policy might be
> > > > > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > > > > the risks.        
> > > > > > > > 
> > > > > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > > > > > 
> > > > > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > > > > by default.      
> > > > > > > 
> > > > > > > Of course we're proposing it to be accessible, it should also require
> > > > > > > admin privileges to modify, sysfs has lots of such things.  If it's
> > > > > > > relegated to non-default accessibility, it won't be used for testing
> > > > > > > and it won't be available for system policy and it's pointless.      
> > > > > > 
> > > > > > > We probably have different views of what testing is. I expect
> > > > > > > users who experience reset issues to take extra steps, one of
> > > > > > > which is compiling their own kernel.
> > > > > 
> > > > > I would define the ability to generate a CI test that can pick a
> > > > > device, unbind it from its driver, and iterate reset methods as a
> > > > > worthwhile improvement in testing.    
> > > > 
> > > > Who is going to run this CI? At least all the kernel CIs (external and
> > > > internal to HW vendors) that I'm familiar with build the kernel
> > > > themselves.
> > > > 
> > > > A distro kernel is too bloated to be really usable for CI.
> > > 
> > > At this point I'm suspicious you're trolling.  A distro kernel CI
> > > certainly uses the kernel they intend to ship and support in their
> > > environment. You're concerned about a bloated kernel, but the proposal
> > > here adds 2-bytes per device to track reset methods and a trivial array
> > > in text memory, meanwhile you're proposing multiple per-device memory
> > > allocations to enhance the feature you think is too bloated for CI.  
> > 
> > I don't know why you decided to focus on memory footprint which is not
> > important at all during CI runs. The bloat is in Kconfig options that
> > are not needed. Those extra options add significant overhead during
> > builds and the runs themselves.
> > 
> > And no, I'm not trolling; I represent a HW vendor that pushes its CI
> > and developer environments to the limit, running full kernel builds in
> > less than 30 seconds and boot-to-test in less than 6 seconds for a full
> > Fedora VM.
> 
> CI is only one aspect where I think this interface could be useful, as
> below there's also a policy use case.  Therefore my inclination is that
> this would be included in default kernels and avoiding bloat is a good
> thing.  A CI environment can be used in different ways, it's not
> necessarily building a new kernel for every test, nor do typical users
> have access to those types of environments to report information in a
> bug.
>    
> > > > > > > Root permissions don't protect against anything; SO lovers will
> > > > > > > use root without even thinking twice.
> > > > > 
> > > > > Yes, with great power comes great responsibility.  Many admins ignore
> > > > > this.  That's far beyond the scope of this series.    
> > > > 
> > > > <...>
> > > >   
> > > > > > > I'm trying to help you with your use case of providing a reset
> > > > > > > policy mechanism, which can exist without CONFIG_EXPERT. However,
> > > > > > > if you want to continue down the path of selecting a specific
> > > > > > > reset type only, please ensure that this is not taken in the
> > > > > > > "bypass kernel" direction.
> > > > > 
> > > > > You've lost me, are you saying you'd be in favor of an interface that
> > > > > allows an admin to specify an arbitrary list of reset methods because
> > > > > that's somehow more in line with a policy choice than a userspace
> > > > > workaround?  This seems like unnecessary bloat because (a) it allows
> > > > > the same bypass mechanism, and (b) a given device is only going to use
> > > > > a single method anyway, so the functionality is unnecessary.  Please
> > > > > help me understand how this favors the policy use case.  Thanks,    
> > > > 
> > > > The policy decision is global logic that is easier to grasp. At some
> > > > point of our discussion, you presented the case where PM reset is not
> > > > defined well and you prefer to do bus reset (something like that).
> > > > 
> > > > I expect QEMU to set the same reset policy for all devices at once
> > > > instead of trying to guess per device which one works.
> > > > 
> > > > And yes, you will be able to bypass the kernel, but at least this
> > > > interface will be broader than the initial one, which serves only SO
> > > > and workarounds.
> > > 
> > > I still think allocating objects for a list and managing that list is
> > > too bloated and complicated, but I agree that being able to have more
> > > fine grained control could be useful.  Is it necessary to be able to
> > > re-order reset methods or might it still be better aligned to a policy
> > > use case if we allow plus and minus operators?  For example, a device
> > > might list:
> > > 
> > > [pm] [bus]
> > > 
> > > Indicating that PM and bus reset are both available and enabled.  The
> > > user could do:
> > > 
> > > echo -pm > reset_methods
> > > 
> > > This would result in:
> > > 
> > > pm [bus]
> > > 
> > > Indicating that both PM and bus resets are available, but only bus reset
> > > is enabled (note this is the identical result to "echo bus >" in the
> > > current proposal).  "echo +pm" or "echo default" could re-enable the PM
> > > reset.  Would something like that be satisfactory?  
> > 
> > Yes, I actually imagined a simpler interface:
> > To set specific type:
> > echo pm > reset_methods
> > To set policy:
> > echo "pm,bus" > reset_methods
> > 
> > But your proposal is nicer.
> 
> The above doesn't support re-ordering though, we'll need to parse a
> comma separated list for that.

Re-ordering is supported by writing: echo "bus,pm" > reset_methods.
As for the comma, IMHO it is the easiest pattern to parse.

Anyway, the in-kernel implementation is not important to me.

Thanks
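
To make the comma-separated format concrete, here is a minimal userspace
sketch in Python of how a string such as "bus,pm" could be turned into an
ordered list of reset methods. It is not the kernel implementation: the
function name parse_reset_methods and the choice to reject unknown or
repeated names are assumptions made for illustration, and only the method
names "pm" and "bus" come from the discussion above.

```python
def parse_reset_methods(text, supported):
    """Return the methods named in `text`, in the order they were written.

    `text` is a comma-separated string such as "bus,pm"; `supported` is the
    collection of method names the device offers. Unknown or repeated names
    raise ValueError (an assumed policy) instead of being silently dropped.
    """
    ordered = []
    for name in text.strip().split(","):
        name = name.strip()
        if name not in supported:
            raise ValueError(f"unsupported reset method: {name!r}")
        if name in ordered:
            raise ValueError(f"duplicate reset method: {name!r}")
        ordered.append(name)
    return ordered


# The order written by the user is the order that would be tried.
print(parse_reset_methods("bus,pm", ["pm", "bus"]))  # ['bus', 'pm']
```

Writing a single name, as in "echo pm", is just the one-element case of the
same list, so one parser would cover both usages.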

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-26  6:40                                                                       ` Leon Romanovsky
@ 2021-03-26  9:18                                                                         ` Krzysztof Wilczyński
  2021-03-26 12:54                                                                           ` Leon Romanovsky
  2021-03-26 14:20                                                                         ` Alex Williamson
  1 sibling, 1 reply; 90+ messages in thread
From: Krzysztof Wilczyński @ 2021-03-26  9:18 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Alex Williamson, Enrico Weigelt, metux IT consult, Amey Narkhede,
	raphael.norwitz, linux-pci, bhelgaas, linux-kernel, alay.shah,
	suresh.gumpula, shyam.rajendran, felipe

Hello,

[...]

Aside from the sysfs interface, would this new functionality also require
anything to be overridden at boot time by passing some command-line
arguments?  Not sure how relevant such a thing would be to a device
reset, though.

I am curious whether there would be a need for anything like that.

Krzysztof

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-26  9:18                                                                         ` Krzysztof Wilczyński
@ 2021-03-26 12:54                                                                           ` Leon Romanovsky
  0 siblings, 0 replies; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-26 12:54 UTC (permalink / raw)
  To: Krzysztof Wilczyński
  Cc: Alex Williamson, Enrico Weigelt, metux IT consult, Amey Narkhede,
	raphael.norwitz, linux-pci, bhelgaas, linux-kernel, alay.shah,
	suresh.gumpula, shyam.rajendran, felipe

On Fri, Mar 26, 2021 at 10:18:25AM +0100, Krzysztof Wilczyński wrote:
> Hello,
> 
> [...]
> 
> Aside from the sysfs interface, would this new functionality also require
> anything to be overridden at boot time by passing some command-line
> arguments?  Not sure how relevant such a thing would be to a device
> reset, though.

This is a per-device property and can't be universally correct the way
kernel command-line arguments are. I don't think that we need to add such
functionality.

> 
> I am curious whether there would be a need for anything like that.

I prefer not.

> 
> Krzysztof

* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-26  6:40                                                                       ` Leon Romanovsky
  2021-03-26  9:18                                                                         ` Krzysztof Wilczyński
@ 2021-03-26 14:20                                                                         ` Alex Williamson
  2021-03-27  6:02                                                                           ` Leon Romanovsky
  1 sibling, 1 reply; 90+ messages in thread
From: Alex Williamson @ 2021-03-26 14:20 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Enrico Weigelt, metux IT consult, Amey Narkhede, raphael.norwitz,
	linux-pci, bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Fri, 26 Mar 2021 09:40:30 +0300
Leon Romanovsky <leon@kernel.org> wrote:

> On Thu, Mar 25, 2021 at 11:53:24AM -0600, Alex Williamson wrote:
> > On Thu, 25 Mar 2021 18:09:58 +0200
> > Leon Romanovsky <leon@kernel.org> wrote:
> >   
> > > On Thu, Mar 25, 2021 at 08:55:04AM -0600, Alex Williamson wrote:  
> > > > On Thu, 25 Mar 2021 10:37:54 +0200
> > > > Leon Romanovsky <leon@kernel.org> wrote:
> > > >     
> > > > > On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:    
> > > > > > On Wed, 24 Mar 2021 17:13:56 +0200
> > > > > > Leon Romanovsky <leon@kernel.org> wrote:      
> > > > > 
> > > > > <...>
> > > > >     
> > > > > > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > > > > > Everything else is waste of time.      
> > > > > > 
> > > > > > Sorry, this is nonsense.  Allowing users to debug issues without a full
> > > > > > kernel rebuild is a good thing.      
> > > > > 
> > > > > > It is far from debugging: this interface doesn't tell you why the
> > > > > > reset didn't work, it just helps you find one that does.
> > > > > > 
> > > > > > Unless you believe that this information will be enough to understand
> > > > > > the root cause, you will need to ask the user to perform extra tests,
> > > > > > maybe try some quirk. All of that requires the user to rebuild their
> > > > > > kernel.
> > > > > 
> > > > > So no, it is not debug.    
> > > > 
> > > > It allows a user to experiment to determine (a) my device doesn't work
> > > > in a given scenario with the default configuration, but (b) if I change
> > > > the reset to this other thing it does work.  That is a step in
> > > > debugging.
> > > > 
> > > > It's absurd to think that a sysfs attribute could provide root cause,
> > > > but it might be enough for someone to further help that user.  It would
> > > > be a useful clue for a bug report.  Yes, reaching root cause might
> > > > involve building a kernel, but that doesn't invalidate that having a
> > > > step towards debugging in the base kernel might be a useful tool.    
> > > 
> > > > Let's agree to disagree.
> > >   
> > > >     
> > > > > > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > > > > > regarding the scope of a PM "soft reset".  This interface would allow a
> > > > > > > > > > system policy to do that same thing.
> > > > > > > > > > 
> > > > > > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > > > > > identify broken methods, and set a policy.  Sure, that policy might be
> > > > > > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > > > > > the risks.          
> > > > > > > > > 
> > > > > > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > > > > > > 
> > > > > > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > > > > > by default.        
> > > > > > > > 
> > > > > > > > Of course we're proposing it to be accessible, it should also require
> > > > > > > > admin privileges to modify, sysfs has lots of such things.  If it's
> > > > > > > > relegated to non-default accessibility, it won't be used for testing
> > > > > > > > and it won't be available for system policy and it's pointless.        
> > > > > > > 
> > > > > > > > We probably have different views of what testing is. I expect
> > > > > > > > users who experience reset issues to take extra steps, one of
> > > > > > > > which is compiling their own kernel.
> > > > > > 
> > > > > > I would define the ability to generate a CI test that can pick a
> > > > > > device, unbind it from its driver, and iterate reset methods as a
> > > > > > worthwhile improvement in testing.      
> > > > > 
> > > > > > Who is going to run this CI? At least all the kernel CIs (external and
> > > > > > internal to HW vendors) that I'm familiar with build the kernel
> > > > > > themselves.
> > > > > > 
> > > > > > A distro kernel is too bloated to be really usable for CI.
> > > > 
> > > > At this point I'm suspicious you're trolling.  A distro kernel CI
> > > > certainly uses the kernel they intend to ship and support in their
> > > > environment. You're concerned about a bloated kernel, but the proposal
> > > > here adds 2-bytes per device to track reset methods and a trivial array
> > > > in text memory, meanwhile you're proposing multiple per-device memory
> > > > allocations to enhance the feature you think is too bloated for CI.    
> > > 
> > > I don't know why you decided to focus on memory footprint which is not
> > > important at all during CI runs. The bloat is in Kconfig options that
> > > are not needed. Those extra options add significant overhead during
> > > > builds and the runs themselves.
> > > > 
> > > > And no, I'm not trolling; I represent a HW vendor that pushes its CI
> > > > and developer environments to the limit, running full kernel builds in
> > > > less than 30 seconds and boot-to-test in less than 6 seconds for a full
> > > > Fedora VM.
> > 
> > CI is only one aspect where I think this interface could be useful, as
> > below there's also a policy use case.  Therefore my inclination is that
> > this would be included in default kernels and avoiding bloat is a good
> > thing.  A CI environment can be used in different ways, it's not
> > necessarily building a new kernel for every test, nor do typical users
> > have access to those types of environments to report information in a
> > bug.
> >      
> > > > > > > > Root permissions don't protect against anything; SO lovers will
> > > > > > > > use root without even thinking twice.
> > > > > > 
> > > > > > Yes, with great power comes great responsibility.  Many admins ignore
> > > > > > this.  That's far beyond the scope of this series.      
> > > > > 
> > > > > <...>
> > > > >     
> > > > > > > > I'm trying to help you with your use case of providing a reset
> > > > > > > > policy mechanism, which can exist without CONFIG_EXPERT. However,
> > > > > > > > if you want to continue down the path of selecting a specific
> > > > > > > > reset type only, please ensure that this is not taken in the
> > > > > > > > "bypass kernel" direction.
> > > > > > 
> > > > > > You've lost me, are you saying you'd be in favor of an interface that
> > > > > > allows an admin to specify an arbitrary list of reset methods because
> > > > > > that's somehow more in line with a policy choice than a userspace
> > > > > > workaround?  This seems like unnecessary bloat because (a) it allows
> > > > > > the same bypass mechanism, and (b) a given device is only going to use
> > > > > > a single method anyway, so the functionality is unnecessary.  Please
> > > > > > help me understand how this favors the policy use case.  Thanks,      
> > > > > 
> > > > > The policy decision is global logic that is easier to grasp. At some
> > > > > point of our discussion, you presented the case where PM reset is not
> > > > > defined well and you prefer to do bus reset (something like that).
> > > > > 
> > > > > > I expect QEMU to set the same reset policy for all devices at once
> > > > > > instead of trying to guess per device which one works.
> > > > > > 
> > > > > > And yes, you will be able to bypass the kernel, but at least this
> > > > > > interface will be broader than the initial one, which serves only SO
> > > > > > and workarounds.
> > > > 
> > > > I still think allocating objects for a list and managing that list is
> > > > too bloated and complicated, but I agree that being able to have more
> > > > fine grained control could be useful.  Is it necessary to be able to
> > > > re-order reset methods or might it still be better aligned to a policy
> > > > use case if we allow plus and minus operators?  For example, a device
> > > > might list:
> > > > 
> > > > [pm] [bus]
> > > > 
> > > > Indicating that PM and bus reset are both available and enabled.  The
> > > > user could do:
> > > > 
> > > > echo -pm > reset_methods
> > > > 
> > > > This would result in:
> > > > 
> > > > pm [bus]
> > > > 
> > > > Indicating that both PM and bus resets are available, but only bus reset
> > > > is enabled (note this is the identical result to "echo bus >" in the
> > > > current proposal).  "echo +pm" or "echo default" could re-enable the PM
> > > > reset.  Would something like that be satisfactory?    

(3) This +/- scheme, which doesn't support re-ordering.

> > > 
> > > Yes, I actually imagined a simpler interface:
> > > To set specific type:
> > > echo pm > reset_methods
> > > To set policy:
> > > echo "pm,bus" > reset_methods
> > > 
> > > But your proposal is nicer.  

(2) This, which I believe is in reference to... ^^
> > 
> > The above doesn't support re-ordering though, we'll need to parse a
> > comma separated list for that.

(1) This refers to... ^^
  
> 
> Re-ordering is supported by writing: echo "bus,pm" > reset_methods.
> As for the comma, IMHO it is the easiest pattern to parse.
> 
> Anyway, the in-kernel implementation is not important to me.

Too bad, it should have been apparent from the sample code that it was
using a comma separated list with re-ordering support.  Thanks,

Alex
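
For comparison, the +/- scheme labelled (3) above can be modelled in a few
lines of userspace Python. This is only a sketch of the semantics described
in the quoted text, not kernel code: the helper names apply_toggle and show
are hypothetical, and the assumption is that the enabled methods always keep
the device's default order, which is why this scheme cannot re-order them.

```python
def apply_toggle(command, available, enabled):
    """Apply "+name", "-name" or "default" and return the new enabled list.

    `available` lists the device's methods in default order; the result
    always follows that order, so a toggle can never re-order methods.
    """
    command = command.strip()
    if command == "default":
        return list(available)
    op, name = command[:1], command[1:]
    if op not in ("+", "-") or name not in available:
        raise ValueError(f"bad reset method command: {command!r}")
    wanted = set(enabled)
    if op == "+":
        wanted.add(name)
    else:
        wanted.discard(name)
    return [method for method in available if method in wanted]


def show(available, enabled):
    """Render the methods, bracketing the ones that are enabled."""
    return " ".join(f"[{m}]" if m in enabled else m for m in available)


available = ["pm", "bus"]
enabled = apply_toggle("-pm", available, available)
print(show(available, enabled))  # pm [bus]
```

Because the result is always filtered through the default order, "+pm"
followed by "-bus" can enable or disable methods but never place bus ahead
of pm; expressing that preference needs an ordered list such as "bus,pm".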


* Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
  2021-03-26 14:20                                                                         ` Alex Williamson
@ 2021-03-27  6:02                                                                           ` Leon Romanovsky
  0 siblings, 0 replies; 90+ messages in thread
From: Leon Romanovsky @ 2021-03-27  6:02 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Enrico Weigelt, metux IT consult, Amey Narkhede, raphael.norwitz,
	linux-pci, bhelgaas, linux-kernel, alay.shah, suresh.gumpula,
	shyam.rajendran, felipe

On Fri, Mar 26, 2021 at 08:20:07AM -0600, Alex Williamson wrote:
> On Fri, 26 Mar 2021 09:40:30 +0300
> Leon Romanovsky <leon@kernel.org> wrote:

<...>

> > 
> > Re-ordering is supported by writing: echo "bus,pm" > reset_methods.
> > As for the comma, IMHO it is the easiest pattern to parse.
> > 
> > Anyway, the in-kernel implementation is not important to me.
> 
> Too bad, it should have been apparent from the sample code that it was
> using a comma separated list with re-ordering support.  Thanks,

Excellent, both of us think that "bus,pm" is the easiest way to
implement the policy decision.

Thanks

> 
> Alex
> 

end of thread, other threads:[~2021-03-27  6:03 UTC | newest]

Thread overview: 90+ messages
2021-03-12 17:34 [PATCH 0/4] Expose and manage PCI device reset ameynarkhede03
2021-03-12 17:34 ` [PATCH 1/4] PCI: Refactor pcie_flr to follow calling convention of other reset methods ameynarkhede03
2021-03-12 17:34 ` [PATCH 2/4] PCI: Add new bitmap for keeping track of supported reset mechanisms ameynarkhede03
2021-03-14 23:51   ` Pali Rohár
2021-03-12 17:34 ` [PATCH 3/4] PCI: Remove reset_fn field from pci_dev ameynarkhede03
2021-03-14 23:52   ` Pali Rohár
2021-03-12 17:34 ` [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism ameynarkhede03
2021-03-14 23:55   ` Pali Rohár
2021-03-15 13:43     ` Amey Narkhede
2021-03-15 13:52       ` Pali Rohár
2021-03-15 14:34         ` Alex Williamson
2021-03-15 14:52           ` Pali Rohár
2021-03-15 15:03             ` Alex Williamson
2021-03-17 19:02               ` Pali Rohár
2021-03-17 19:15                 ` Alex Williamson
2021-03-17 19:24                   ` Pali Rohár
2021-03-17 19:32                     ` Alex Williamson
2021-03-17 19:40                       ` Pali Rohár
2021-03-17 20:00                         ` Alex Williamson
2021-03-17 20:13                           ` Pali Rohár
2021-03-18 14:31                             ` Amey Narkhede
2021-03-23 14:34                               ` Pali Rohár
2021-03-23 14:44                                 ` Alex Williamson
2021-03-23 15:32                                   ` Amey Narkhede
2021-03-23 16:06                                     ` Alex Williamson
2021-03-23 16:15                                       ` Alex Williamson
2021-03-15 15:07           ` Leon Romanovsky
2021-03-15 15:33             ` Amey Narkhede
2021-03-15 16:29               ` Alex Williamson
2021-03-15 18:32                 ` Raphael Norwitz
2021-03-17  4:20                   ` Leon Romanovsky
2021-03-17 10:24                     ` Amey Narkhede
2021-03-17 11:02                       ` Leon Romanovsky
2021-03-17 11:23                         ` Amey Narkhede
2021-03-17 11:47                           ` Leon Romanovsky
2021-03-17 13:17                             ` Amey Narkhede
2021-03-17 13:58                               ` Leon Romanovsky
2021-03-17 17:31                                 ` Alex Williamson
2021-03-18  9:09                                   ` Leon Romanovsky
2021-03-18 14:22                                     ` Amey Narkhede
2021-03-18 14:57                                       ` Leon Romanovsky
2021-03-18 17:01                                         ` Amey Narkhede
2021-03-18 17:35                                           ` Leon Romanovsky
2021-03-18 17:43                                             ` Amey Narkhede
2021-03-18 18:14                                               ` Enrico Weigelt, metux IT consult
2021-03-19 13:05                                               ` Leon Romanovsky
2021-03-19 15:23                                                 ` Amey Narkhede
2021-03-19 15:37                                                   ` Leon Romanovsky
2021-03-19 15:53                                                     ` Amey Narkhede
2021-03-18 17:58                                             ` Enrico Weigelt, metux IT consult
2021-03-19 13:07                                               ` Leon Romanovsky
2021-03-18 16:39                                     ` Alex Williamson
2021-03-18 17:22                                       ` Leon Romanovsky
2021-03-18 17:38                                         ` Amey Narkhede
2021-03-18 18:34                                         ` Enrico Weigelt, metux IT consult
2021-03-19 12:59                                           ` Leon Romanovsky
2021-03-19 13:48                                             ` Enrico Weigelt, metux IT consult
2021-03-19 15:51                                               ` Leon Romanovsky
2021-03-19 15:57                                             ` Bjorn Helgaas
2021-03-19 16:24                                               ` Leon Romanovsky
2021-03-19 16:23                                             ` Alex Williamson
2021-03-20  9:10                                               ` Leon Romanovsky
2021-03-20 14:59                                                 ` Alex Williamson
2021-03-21  8:40                                                   ` Leon Romanovsky
2021-03-21 14:57                                                     ` Amey Narkhede
2021-03-22 17:10                                                     ` Alex Williamson
2021-03-24 10:03                                                       ` Leon Romanovsky
2021-03-24 14:37                                                         ` Alex Williamson
2021-03-24 15:13                                                           ` Leon Romanovsky
2021-03-24 17:17                                                             ` Alex Williamson
2021-03-25  8:37                                                               ` Leon Romanovsky
2021-03-25 14:55                                                                 ` Alex Williamson
2021-03-25 16:09                                                                   ` Leon Romanovsky
2021-03-25 17:22                                                                     ` Amey Narkhede
2021-03-25 17:36                                                                       ` Leon Romanovsky
2021-03-25 17:53                                                                     ` Alex Williamson
2021-03-26  6:40                                                                       ` Leon Romanovsky
2021-03-26  9:18                                                                         ` Krzysztof Wilczyński
2021-03-26 12:54                                                                           ` Leon Romanovsky
2021-03-26 14:20                                                                         ` Alex Williamson
2021-03-27  6:02                                                                           ` Leon Romanovsky
2021-03-25 16:26                                                                 ` Amey Narkhede
2021-03-25 16:46                                                                   ` Leon Romanovsky
2021-03-18 17:51     ` Enrico Weigelt, metux IT consult
     [not found] ` <20210312112043.3f2954e3@omen.home.shazbot.org>
2021-03-12 18:40   ` [PATCH 0/4] Expose and manage PCI device reset Amey Narkhede
2021-03-12 18:58     ` Krzysztof Wilczyński
2021-03-12 19:06       ` Amey Narkhede
2021-03-12 19:20         ` Krzysztof Wilczyński
2021-03-13  2:02     ` Raphael Norwitz
2021-03-14 12:09 ` Leon Romanovsky
