linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 0/2] PCI: designware-ep: Fix DBI access before core init
@ 2023-11-20  8:40 Manivannan Sadhasivam
  2023-11-20  8:40 ` [PATCH v7 1/2] " Manivannan Sadhasivam
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Manivannan Sadhasivam @ 2023-11-20  8:40 UTC (permalink / raw)
  To: jingoohan1, gustavo.pimentel, lpieralisi, robh, kw, bhelgaas
  Cc: linux-pci, linux-arm-msm, linux-kernel, quic_bjorande,
	fancer.lancer, vidyas, Manivannan Sadhasivam

Hello,

This series is the continuation of previous work by Vidya Sagar [1] to fix the
issues related to accessing DBI register space before completing the core
initialization in some EP platforms like Tegra194/234 and Qcom SM8450.

Since Vidya is busy, I took over the series based on his consent (off-list
discussion).

I've reworked the series in v7 to make it bisect friendly, and also to avoid
build issue with the dependency. This resulted in the patches being heavily
modified, so I took over the authorship of the patches.

Testing
=======

I've tested the series on Qcom SM8450 based dev board. I also expect it to work
on Tegra platforms as well but it'd be good if Vidya or someone could test it.

- Mani

[1] https://lore.kernel.org/linux-pci/20221013175712.7539-1-vidyas@nvidia.com/
[2] https://lore.kernel.org/linux-pci/20230825123843.GD6005@thinkpad/

Changes in v7:

- Rebased on top of v6.7-rc1
- Kept the current dw_pcie_ep_init_complete() API instead of renaming it to
  dw_pcie_ep_init_late(), since changing the name causes a slight ambiguity.
- Splitted the change that moves pci_epc_init_notify() inside
  dw_pcie_ep_init_notify() to help bisecting and also to avoid build issue.
- Added a new patch that moves pci_epc_init_notify() inside
  dw_pcie_ep_init_notify().
- Took over the authorship and dropped the previous Ack as the patches are
  heavily modified.

Changes in v6:

- Rebased on top of pci/next (6e2fca71e187)
- removed ep_init_late() callback as it is no longer necessary

For previous changelog, please refer [1].


Manivannan Sadhasivam (2):
  PCI: designware-ep: Fix DBI access before core init
  PCI: designware-ep: Move pci_epc_init_notify() inside
    dw_pcie_ep_init_complete()

 .../pci/controller/dwc/pcie-designware-ep.c   | 149 +++++++++++-------
 drivers/pci/controller/dwc/pcie-designware.h  |   5 -
 drivers/pci/controller/dwc/pcie-qcom-ep.c     |   2 -
 drivers/pci/controller/dwc/pcie-tegra194.c    |   2 -
 4 files changed, 93 insertions(+), 65 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v7 1/2] PCI: designware-ep: Fix DBI access before core init
  2023-11-20  8:40 [PATCH v7 0/2] PCI: designware-ep: Fix DBI access before core init Manivannan Sadhasivam
@ 2023-11-20  8:40 ` Manivannan Sadhasivam
  2023-11-28 17:41   ` Niklas Cassel
  2024-01-09 21:12   ` Bjorn Helgaas
  2023-11-20  8:40 ` [PATCH v7 2/2] PCI: designware-ep: Move pci_epc_init_notify() inside dw_pcie_ep_init_complete() Manivannan Sadhasivam
  2024-01-07  7:27 ` [PATCH v7 0/2] PCI: designware-ep: Fix DBI access before core init Krzysztof Wilczyński
  2 siblings, 2 replies; 24+ messages in thread
From: Manivannan Sadhasivam @ 2023-11-20  8:40 UTC (permalink / raw)
  To: jingoohan1, gustavo.pimentel, lpieralisi, robh, kw, bhelgaas
  Cc: linux-pci, linux-arm-msm, linux-kernel, quic_bjorande,
	fancer.lancer, vidyas, Manivannan Sadhasivam

The drivers for platforms requiring reference clock from the PCIe host for
initializing their PCIe EP core, make use of the 'core_init_notifier'
feature exposed by the DWC common code. On these platforms, access to the
hw registers like DBI before completing the core initialization will result
in a whole system hang. But the current DWC EP driver tries to access DBI
registers during dw_pcie_ep_init() without waiting for core initialization
and it results in system hang on platforms making use of
'core_init_notifier' such as Tegra194 and Qcom SM8450.

To workaround this issue, users of the above mentioned platforms have to
maintain the dependency with the PCIe host by booting the PCIe EP after
host boot. But this won't provide a good user experience, since PCIe EP is
_one_ of the features of those platforms and it doesn't make sense to
delay the whole platform booting due to the PCIe dependency.

So to fix this issue, let's move all the DBI access during
dw_pcie_ep_init() in the DWC EP driver to the dw_pcie_ep_init_complete()
API that gets called only after core initialization on these platforms.
This makes sure that the DBI register accesses are skipped during
dw_pcie_ep_init() and accessed later once the core initialization happens.

For the rest of the platforms, DBI access happens as usual.

Co-developed-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---
 .../pci/controller/dwc/pcie-designware-ep.c   | 139 ++++++++++++------
 1 file changed, 91 insertions(+), 48 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
index f6207989fc6a..b1c79cd8e25f 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -662,14 +662,19 @@ static unsigned int dw_pcie_ep_find_ext_capability(struct dw_pcie *pci, int cap)
 	return 0;
 }
 
-int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
+static int dw_pcie_ep_late_init(struct dw_pcie_ep *ep)
 {
 	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+	struct dw_pcie_ep_func *ep_func;
+	struct device *dev = pci->dev;
+	struct pci_epc *epc = ep->epc;
 	unsigned int offset, ptm_cap_base;
 	unsigned int nbars;
 	u8 hdr_type;
+	u8 func_no;
+	int i, ret;
+	void *addr;
 	u32 reg;
-	int i;
 
 	hdr_type = dw_pcie_readb_dbi(pci, PCI_HEADER_TYPE) &
 		   PCI_HEADER_TYPE_MASK;
@@ -680,6 +685,55 @@ int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
 		return -EIO;
 	}
 
+	dw_pcie_version_detect(pci);
+
+	dw_pcie_iatu_detect(pci);
+
+	ret = dw_pcie_edma_detect(pci);
+	if (ret)
+		return ret;
+
+	if (!ep->ib_window_map) {
+		ep->ib_window_map = devm_bitmap_zalloc(dev, pci->num_ib_windows,
+						       GFP_KERNEL);
+		if (!ep->ib_window_map)
+			goto err_remove_edma;
+	}
+
+	if (!ep->ob_window_map) {
+		ep->ob_window_map = devm_bitmap_zalloc(dev, pci->num_ob_windows,
+						       GFP_KERNEL);
+		if (!ep->ob_window_map)
+			goto err_remove_edma;
+	}
+
+	if (!ep->outbound_addr) {
+		addr = devm_kcalloc(dev, pci->num_ob_windows, sizeof(phys_addr_t),
+				    GFP_KERNEL);
+		if (!addr)
+			goto err_remove_edma;
+		ep->outbound_addr = addr;
+	}
+
+	for (func_no = 0; func_no < epc->max_functions; func_no++) {
+
+		ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
+		if (ep_func)
+			continue;
+
+		ep_func = devm_kzalloc(dev, sizeof(*ep_func), GFP_KERNEL);
+		if (!ep_func)
+			goto err_remove_edma;
+
+		ep_func->func_no = func_no;
+		ep_func->msi_cap = dw_pcie_ep_find_capability(ep, func_no,
+							      PCI_CAP_ID_MSI);
+		ep_func->msix_cap = dw_pcie_ep_find_capability(ep, func_no,
+							       PCI_CAP_ID_MSIX);
+
+		list_add_tail(&ep_func->list, &ep->func_list);
+	}
+
 	offset = dw_pcie_ep_find_ext_capability(pci, PCI_EXT_CAP_ID_REBAR);
 	ptm_cap_base = dw_pcie_ep_find_ext_capability(pci, PCI_EXT_CAP_ID_PTM);
 
@@ -714,14 +768,38 @@ int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
 	dw_pcie_dbi_ro_wr_dis(pci);
 
 	return 0;
+
+err_remove_edma:
+	dw_pcie_edma_remove(pci);
+
+	return ret;
+}
+
+int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
+{
+	struct pci_epc *epc = ep->epc;
+	int ret;
+
+	ret = dw_pcie_ep_late_init(ep);
+	if (ret)
+		goto err_cleanup;
+
+	return 0;
+
+err_cleanup:
+	pci_epc_mem_free_addr(epc, ep->msi_mem_phys, ep->msi_mem,
+			      epc->mem->window.page_size);
+	pci_epc_mem_exit(epc);
+	if (ep->ops->deinit)
+		ep->ops->deinit(ep);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(dw_pcie_ep_init_complete);
 
 int dw_pcie_ep_init(struct dw_pcie_ep *ep)
 {
 	int ret;
-	void *addr;
-	u8 func_no;
 	struct resource *res;
 	struct pci_epc *epc;
 	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
@@ -729,7 +807,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
 	struct platform_device *pdev = to_platform_device(dev);
 	struct device_node *np = dev->of_node;
 	const struct pci_epc_features *epc_features;
-	struct dw_pcie_ep_func *ep_func;
 
 	INIT_LIST_HEAD(&ep->func_list);
 
@@ -747,26 +824,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
 	if (ep->ops->pre_init)
 		ep->ops->pre_init(ep);
 
-	dw_pcie_version_detect(pci);
-
-	dw_pcie_iatu_detect(pci);
-
-	ep->ib_window_map = devm_bitmap_zalloc(dev, pci->num_ib_windows,
-					       GFP_KERNEL);
-	if (!ep->ib_window_map)
-		return -ENOMEM;
-
-	ep->ob_window_map = devm_bitmap_zalloc(dev, pci->num_ob_windows,
-					       GFP_KERNEL);
-	if (!ep->ob_window_map)
-		return -ENOMEM;
-
-	addr = devm_kcalloc(dev, pci->num_ob_windows, sizeof(phys_addr_t),
-			    GFP_KERNEL);
-	if (!addr)
-		return -ENOMEM;
-	ep->outbound_addr = addr;
-
 	epc = devm_pci_epc_create(dev, &epc_ops);
 	if (IS_ERR(epc)) {
 		dev_err(dev, "Failed to create epc device\n");
@@ -780,20 +837,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
 	if (ret < 0)
 		epc->max_functions = 1;
 
-	for (func_no = 0; func_no < epc->max_functions; func_no++) {
-		ep_func = devm_kzalloc(dev, sizeof(*ep_func), GFP_KERNEL);
-		if (!ep_func)
-			return -ENOMEM;
-
-		ep_func->func_no = func_no;
-		ep_func->msi_cap = dw_pcie_ep_find_capability(ep, func_no,
-							      PCI_CAP_ID_MSI);
-		ep_func->msix_cap = dw_pcie_ep_find_capability(ep, func_no,
-							       PCI_CAP_ID_MSIX);
-
-		list_add_tail(&ep_func->list, &ep->func_list);
-	}
-
 	if (ep->ops->ep_init)
 		ep->ops->ep_init(ep);
 
@@ -812,25 +855,25 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
 		goto err_exit_epc_mem;
 	}
 
-	ret = dw_pcie_edma_detect(pci);
-	if (ret)
-		goto err_free_epc_mem;
-
 	if (ep->ops->get_features) {
 		epc_features = ep->ops->get_features(ep);
 		if (epc_features->core_init_notifier)
 			return 0;
 	}
 
-	ret = dw_pcie_ep_init_complete(ep);
+	/*
+	 * NOTE:- Avoid accessing the hardware (Ex:- DBI space) before this
+	 * step as platforms that implement 'core_init_notifier' feature may
+	 * not have the hardware ready (i.e. core initialized) for access
+	 * (Ex: tegra194). Any hardware access on such platforms result
+	 * in system hang.
+	 */
+	ret = dw_pcie_ep_late_init(ep);
 	if (ret)
-		goto err_remove_edma;
+		goto err_free_epc_mem;
 
 	return 0;
 
-err_remove_edma:
-	dw_pcie_edma_remove(pci);
-
 err_free_epc_mem:
 	pci_epc_mem_free_addr(epc, ep->msi_mem_phys, ep->msi_mem,
 			      epc->mem->window.page_size);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7 2/2] PCI: designware-ep: Move pci_epc_init_notify() inside dw_pcie_ep_init_complete()
  2023-11-20  8:40 [PATCH v7 0/2] PCI: designware-ep: Fix DBI access before core init Manivannan Sadhasivam
  2023-11-20  8:40 ` [PATCH v7 1/2] " Manivannan Sadhasivam
@ 2023-11-20  8:40 ` Manivannan Sadhasivam
  2024-01-07  7:27 ` [PATCH v7 0/2] PCI: designware-ep: Fix DBI access before core init Krzysztof Wilczyński
  2 siblings, 0 replies; 24+ messages in thread
From: Manivannan Sadhasivam @ 2023-11-20  8:40 UTC (permalink / raw)
  To: jingoohan1, gustavo.pimentel, lpieralisi, robh, kw, bhelgaas
  Cc: linux-pci, linux-arm-msm, linux-kernel, quic_bjorande,
	fancer.lancer, vidyas, Manivannan Sadhasivam

Since pci_epc_init_notify() API is getting called right after the
dw_pcie_ep_init_complete() API in the DWC glue drivers, let's move it
inside dw_pcie_ep_init_complete() API as there is no compelling reason to
call it separately.

Co-developed-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---
 drivers/pci/controller/dwc/pcie-designware-ep.c | 10 ++--------
 drivers/pci/controller/dwc/pcie-designware.h    |  5 -----
 drivers/pci/controller/dwc/pcie-qcom-ep.c       |  2 --
 drivers/pci/controller/dwc/pcie-tegra194.c      |  2 --
 4 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
index b1c79cd8e25f..63bb99d1c48f 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -22,14 +22,6 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep *ep)
 }
 EXPORT_SYMBOL_GPL(dw_pcie_ep_linkup);
 
-void dw_pcie_ep_init_notify(struct dw_pcie_ep *ep)
-{
-	struct pci_epc *epc = ep->epc;
-
-	pci_epc_init_notify(epc);
-}
-EXPORT_SYMBOL_GPL(dw_pcie_ep_init_notify);
-
 struct dw_pcie_ep_func *
 dw_pcie_ep_get_func_from_ep(struct dw_pcie_ep *ep, u8 func_no)
 {
@@ -784,6 +776,8 @@ int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
 	if (ret)
 		goto err_cleanup;
 
+	pci_epc_init_notify(epc);
+
 	return 0;
 
 err_cleanup:
diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
index 55ff76e3d384..53bf38989eea 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -578,7 +578,6 @@ static inline void __iomem *dw_pcie_own_conf_map_bus(struct pci_bus *bus,
 void dw_pcie_ep_linkup(struct dw_pcie_ep *ep);
 int dw_pcie_ep_init(struct dw_pcie_ep *ep);
 int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep);
-void dw_pcie_ep_init_notify(struct dw_pcie_ep *ep);
 void dw_pcie_ep_exit(struct dw_pcie_ep *ep);
 int dw_pcie_ep_raise_legacy_irq(struct dw_pcie_ep *ep, u8 func_no);
 int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
@@ -605,10 +604,6 @@ static inline int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
 	return 0;
 }
 
-static inline void dw_pcie_ep_init_notify(struct dw_pcie_ep *ep)
-{
-}
-
 static inline void dw_pcie_ep_exit(struct dw_pcie_ep *ep)
 {
 }
diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c
index 9e58f055199a..4a8119779a29 100644
--- a/drivers/pci/controller/dwc/pcie-qcom-ep.c
+++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c
@@ -482,8 +482,6 @@ static int qcom_pcie_perst_deassert(struct dw_pcie *pci)
 	val &= ~PARF_MSTR_AXI_CLK_EN;
 	writel_relaxed(val, pcie_ep->parf + PARF_MHI_CLOCK_RESET_CTRL);
 
-	dw_pcie_ep_init_notify(&pcie_ep->pci.ep);
-
 	/* Enable LTSSM */
 	val = readl_relaxed(pcie_ep->parf + PARF_LTSSM);
 	val |= BIT(8);
diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
index 0fe113598ebb..1126d1f5830c 100644
--- a/drivers/pci/controller/dwc/pcie-tegra194.c
+++ b/drivers/pci/controller/dwc/pcie-tegra194.c
@@ -1901,8 +1901,6 @@ static void pex_ep_event_pex_rst_deassert(struct tegra_pcie_dw *pcie)
 		goto fail_init_complete;
 	}
 
-	dw_pcie_ep_init_notify(ep);
-
 	/* Program the private control to allow sending LTR upstream */
 	if (pcie->of_data->has_ltr_req_fix) {
 		val = appl_readl(pcie, APPL_LTR_MSG_2);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/2] PCI: designware-ep: Fix DBI access before core init
  2023-11-20  8:40 ` [PATCH v7 1/2] " Manivannan Sadhasivam
@ 2023-11-28 17:41   ` Niklas Cassel
  2023-11-29 11:38     ` Niklas Cassel
  2023-11-29 14:08     ` Manivannan Sadhasivam
  2024-01-09 21:12   ` Bjorn Helgaas
  1 sibling, 2 replies; 24+ messages in thread
From: Niklas Cassel @ 2023-11-28 17:41 UTC (permalink / raw)
  To: Manivannan Sadhasivam; +Cc: linux-pci, Damien Le Moal

On Mon, Nov 20, 2023 at 02:10:13PM +0530, Manivannan Sadhasivam wrote:
> The drivers for platforms requiring reference clock from the PCIe host for
> initializing their PCIe EP core, make use of the 'core_init_notifier'
> feature exposed by the DWC common code. On these platforms, access to the
> hw registers like DBI before completing the core initialization will result
> in a whole system hang. But the current DWC EP driver tries to access DBI
> registers during dw_pcie_ep_init() without waiting for core initialization
> and it results in system hang on platforms making use of
> 'core_init_notifier' such as Tegra194 and Qcom SM8450.
> 
> To workaround this issue, users of the above mentioned platforms have to
> maintain the dependency with the PCIe host by booting the PCIe EP after
> host boot. But this won't provide a good user experience, since PCIe EP is
> _one_ of the features of those platforms and it doesn't make sense to
> delay the whole platform booting due to the PCIe dependency.
> 
> So to fix this issue, let's move all the DBI access during
> dw_pcie_ep_init() in the DWC EP driver to the dw_pcie_ep_init_complete()
> API that gets called only after core initialization on these platforms.
> This makes sure that the DBI register accesses are skipped during
> dw_pcie_ep_init() and accessed later once the core initialization happens.
> 
> For the rest of the platforms, DBI access happens as usual.
> 
> Co-developed-by: Vidya Sagar <vidyas@nvidia.com>
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> ---

Hello Mani,

I tried this patch on top of a work in progress EP driver,
which, similar to pcie-qcom-ep.c has a perst gpio as input,
and a .core_init_notifier.

What I noticed is the following every time I reboot the RC, I get:

[  604.735115] debugfs: Directory 'a40000000.pcie_ep' with parent 'dmaengine' already present!
[ 1000.713582] debugfs: Directory 'a40000000.pcie_ep' with parent 'dmaengine' already present!
[ 1000.714355] debugfs: File 'mf' in directory '/' already present!
[ 1000.714890] debugfs: File 'wr_ch_cnt' in directory '/' already present!
[ 1000.715476] debugfs: File 'rd_ch_cnt' in directory '/' already present!
[ 1000.716061] debugfs: Directory 'registers' with parent '/' already present!


Also:

# ls -al /sys/class/dma/dma*/device | grep pcie
lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan0/device -> ../../../a40000000.pcie_ep
lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan1/device -> ../../../a40000000.pcie_ep
lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan2/device -> ../../../a40000000.pcie_ep
lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan3/device -> ../../../a40000000.pcie_ep
lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan0/device -> ../../../a40000000.pcie_ep
lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan1/device -> ../../../a40000000.pcie_ep
lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan2/device -> ../../../a40000000.pcie_ep
lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan3/device -> ../../../a40000000.pcie_ep
lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan0/device -> ../../../a40000000.pcie_ep
lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan1/device -> ../../../a40000000.pcie_ep
lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan2/device -> ../../../a40000000.pcie_ep
lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan3/device -> ../../../a40000000.pcie_ep

Adds another dmaX entry for each reboot.


I'm quite sure that you will see the same with pcie-qcom-ep.

I think that either the DWC drivers using core_init (only tegra and qcom)
need to deinit the eDMA in their assert_perst() function, or this patch
needs to deinit the eDMA before initializing it.


A problem with the current code, if you do NOT have this patch, which I assume
is also problem on pcie-qcom-ep, is that since assert_perst() function performs
a core reset, all the eDMA setting written in the dbi by the eDMA driver will be
cleared, so a PERST assert + deassert by the RC will wipe the eDMA settings.
Hopefully, this will no longer be a problem after this patch has been merged.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/2] PCI: designware-ep: Fix DBI access before core init
  2023-11-28 17:41   ` Niklas Cassel
@ 2023-11-29 11:38     ` Niklas Cassel
  2023-11-29 15:51       ` Manivannan Sadhasivam
  2023-11-29 14:08     ` Manivannan Sadhasivam
  1 sibling, 1 reply; 24+ messages in thread
From: Niklas Cassel @ 2023-11-29 11:38 UTC (permalink / raw)
  To: Manivannan Sadhasivam; +Cc: linux-pci, Damien Le Moal, Vidya Sagar

On Tue, Nov 28, 2023 at 06:41:51PM +0100, Niklas Cassel wrote:
> On Mon, Nov 20, 2023 at 02:10:13PM +0530, Manivannan Sadhasivam wrote:
> > The drivers for platforms requiring reference clock from the PCIe host for
> > initializing their PCIe EP core, make use of the 'core_init_notifier'
> > feature exposed by the DWC common code. On these platforms, access to the
> > hw registers like DBI before completing the core initialization will result
> > in a whole system hang. But the current DWC EP driver tries to access DBI
> > registers during dw_pcie_ep_init() without waiting for core initialization
> > and it results in system hang on platforms making use of
> > 'core_init_notifier' such as Tegra194 and Qcom SM8450.
> > 
> > To workaround this issue, users of the above mentioned platforms have to
> > maintain the dependency with the PCIe host by booting the PCIe EP after
> > host boot. But this won't provide a good user experience, since PCIe EP is
> > _one_ of the features of those platforms and it doesn't make sense to
> > delay the whole platform booting due to the PCIe dependency.
> > 
> > So to fix this issue, let's move all the DBI access during
> > dw_pcie_ep_init() in the DWC EP driver to the dw_pcie_ep_init_complete()
> > API that gets called only after core initialization on these platforms.
> > This makes sure that the DBI register accesses are skipped during
> > dw_pcie_ep_init() and accessed later once the core initialization happens.
> > 
> > For the rest of the platforms, DBI access happens as usual.
> > 
> > Co-developed-by: Vidya Sagar <vidyas@nvidia.com>
> > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > ---
> 
> Hello Mani,
> 
> I tried this patch on top of a work in progress EP driver,
> which, similar to pcie-qcom-ep.c has a perst gpio as input,
> and a .core_init_notifier.
> 
> What I noticed is the following every time I reboot the RC, I get:
> 
> [  604.735115] debugfs: Directory 'a40000000.pcie_ep' with parent 'dmaengine' already present!
> [ 1000.713582] debugfs: Directory 'a40000000.pcie_ep' with parent 'dmaengine' already present!
> [ 1000.714355] debugfs: File 'mf' in directory '/' already present!
> [ 1000.714890] debugfs: File 'wr_ch_cnt' in directory '/' already present!
> [ 1000.715476] debugfs: File 'rd_ch_cnt' in directory '/' already present!
> [ 1000.716061] debugfs: Directory 'registers' with parent '/' already present!
> 
> 
> Also:
> 
> # ls -al /sys/class/dma/dma*/device | grep pcie
> lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan0/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan1/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan2/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan3/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan0/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan1/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan2/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan3/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan0/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan1/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan2/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan3/device -> ../../../a40000000.pcie_ep
> 
> Adds another dmaX entry for each reboot.
> 
> 
> I'm quite sure that you will see the same with pcie-qcom-ep.
> 
> I think that either the DWC drivers using core_init (only tegra and qcom)
> need to deinit the eDMA in their assert_perst() function, or this patch
> needs to deinit the eDMA before initializing it.
> 
> 
> A problem with the current code, if you do NOT have this patch, which I assume
> is also problem on pcie-qcom-ep, is that since assert_perst() function performs
> a core reset, all the eDMA setting written in the dbi by the eDMA driver will be
> cleared, so a PERST assert + deassert by the RC will wipe the eDMA settings.
> Hopefully, this will no longer be a problem after this patch has been merged.
> 
> 
> Kind regards,
> Niklas

I'm sorry that I'm just looking at this patch now (it's v7 already).
But I did notice that the DWC code is inconsistent for drivers having
a .core_init_notifier and drivers not having a .core_init_notifier.

When receiving a hot reset or link-down reset, the DWC core gets reset,
which means that most DBI settings get reset to their reset value.


Both tegra and qcom-ep does have a start_link() that is basically a no-op.
Instead, ep_init_complete() (and LTSSM enable) is called when PERST is
deasserted, so settings written by ep_init_complete() will always get set
after PERST is asserted + deasserted.

However, for a driver without a .core_init_notifier, a pci-epf-test unbind
+ bind, will currently NOT write the DBI settings written by
ep_init_complete() when starting the link the second time.

If you unbind + bind pci-epf-test (which requires stopping and starting the
link), I think that you should write all the DBI settings. Unbinding + binding
will allocate memory for all the BARs, write all the iATU settings etc.
It doesn't make sense that some DBI writes (those made by ep_init_complete())
are not redone.

The problem is that if you do not have a .core_init_notifier,
ep_init_complete() (which does DBI writes) is only called by ep_init(),
and never ever again.


Considering that .start_link() is a no-op for DWC drivers with a
.core_init_notifier (they instead call ep_init_complete() when perst is
deasserted), I think the most logical thing would be for .start_link() to
call ep_init_complete() (for drivers without a .core_init_notifier), that way,
all DBI settings (and not just some) will be written on an unbind + bind.


Something like this:

--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -465,6 +465,16 @@ static int dw_pcie_ep_start(struct pci_epc *epc)
 {
        struct dw_pcie_ep *ep = epc_get_drvdata(epc);
        struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+       const struct pci_epc_features *epc_features;
+
+       if (ep->ops->get_features) {
+               epc_features = ep->ops->get_features(ep);
+               if (!epc_features->core_init_notifier) {
+                       ret = dw_pcie_ep_init_complete(ep);
+                       if (ret)
+                               return ret;
+               }
+       }
 
        return dw_pcie_start_link(pci);
 }
@@ -729,7 +739,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
        struct device *dev = pci->dev;
        struct platform_device *pdev = to_platform_device(dev);
        struct device_node *np = dev->of_node;
-       const struct pci_epc_features *epc_features;
        struct dw_pcie_ep_func *ep_func;
 
        INIT_LIST_HEAD(&ep->func_list);
@@ -817,16 +826,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
        if (ret)
                goto err_free_epc_mem;
 
-       if (ep->ops->get_features) {
-               epc_features = ep->ops->get_features(ep);
-               if (epc_features->core_init_notifier)
-                       return 0;
-       }
-
-       ret = dw_pcie_ep_init_complete(ep);
-       if (ret)
-               goto err_remove_edma;
-
        return 0;
 
 err_remove_edma:


I could send a patch, but it would be conflicting with your patch.
And you also need to handle deiniting + initing the eDMA in a nice way,
but that seems to be a problem that also needs to be addressed with the
patch in $subject.

What do you think?


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/2] PCI: designware-ep: Fix DBI access before core init
  2023-11-28 17:41   ` Niklas Cassel
  2023-11-29 11:38     ` Niklas Cassel
@ 2023-11-29 14:08     ` Manivannan Sadhasivam
  1 sibling, 0 replies; 24+ messages in thread
From: Manivannan Sadhasivam @ 2023-11-29 14:08 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: linux-pci, Damien Le Moal

Hi Niklas,

On Tue, Nov 28, 2023 at 05:41:52PM +0000, Niklas Cassel wrote:
> On Mon, Nov 20, 2023 at 02:10:13PM +0530, Manivannan Sadhasivam wrote:
> > The drivers for platforms requiring reference clock from the PCIe host for
> > initializing their PCIe EP core, make use of the 'core_init_notifier'
> > feature exposed by the DWC common code. On these platforms, access to the
> > hw registers like DBI before completing the core initialization will result
> > in a whole system hang. But the current DWC EP driver tries to access DBI
> > registers during dw_pcie_ep_init() without waiting for core initialization
> > and it results in system hang on platforms making use of
> > 'core_init_notifier' such as Tegra194 and Qcom SM8450.
> > 
> > To workaround this issue, users of the above mentioned platforms have to
> > maintain the dependency with the PCIe host by booting the PCIe EP after
> > host boot. But this won't provide a good user experience, since PCIe EP is
> > _one_ of the features of those platforms and it doesn't make sense to
> > delay the whole platform booting due to the PCIe dependency.
> > 
> > So to fix this issue, let's move all the DBI access during
> > dw_pcie_ep_init() in the DWC EP driver to the dw_pcie_ep_init_complete()
> > API that gets called only after core initialization on these platforms.
> > This makes sure that the DBI register accesses are skipped during
> > dw_pcie_ep_init() and accessed later once the core initialization happens.
> > 
> > For the rest of the platforms, DBI access happens as usual.
> > 
> > Co-developed-by: Vidya Sagar <vidyas@nvidia.com>
> > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > ---
> 
> Hello Mani,
> 
> I tried this patch on top of a work in progress EP driver,
> which, similar to pcie-qcom-ep.c has a perst gpio as input,
> and a .core_init_notifier.
> 
> What I noticed is the following every time I reboot the RC, I get:
> 
> [  604.735115] debugfs: Directory 'a40000000.pcie_ep' with parent 'dmaengine' already present!
> [ 1000.713582] debugfs: Directory 'a40000000.pcie_ep' with parent 'dmaengine' already present!
> [ 1000.714355] debugfs: File 'mf' in directory '/' already present!
> [ 1000.714890] debugfs: File 'wr_ch_cnt' in directory '/' already present!
> [ 1000.715476] debugfs: File 'rd_ch_cnt' in directory '/' already present!
> [ 1000.716061] debugfs: Directory 'registers' with parent '/' already present!
> 
> 
> Also:
> 
> # ls -al /sys/class/dma/dma*/device | grep pcie
> lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan0/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan1/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan2/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan3/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan0/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan1/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan2/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan3/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan0/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan1/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan2/device -> ../../../a40000000.pcie_ep
> lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan3/device -> ../../../a40000000.pcie_ep
> 
> Adds another dmaX entry for each reboot.
> 
> 
> I'm quite sure that you will see the same with pcie-qcom-ep.
> 

Yes, you are right. I'm aware of this issue but don't have a handy solution atm.

> I think that either the DWC drivers using core_init (only tegra and qcom)
> need to deinit the eDMA in their assert_perst() function, or this patch
> needs to deinit the eDMA before initializing it.
> 

The problem with first option is that it is too late. The moment EP gets perst
assert IRQ, refclk will be cut off by the host. So if EP tries to access any
hardware registers (edma_core_off) it will result in hard reboot.

The second option is not a complete solution, because the EPF drivers still need
to release the DMA channels and that can only happen if the EPF drivers know
that the host is going into power down state. Even if the DWC core makes an
assumption that EPF drivers are releasing the channel, in reality it is not
happening.

> 
> A problem with the current code, if you do NOT have this patch, which I assume
> is also problem on pcie-qcom-ep, is that since assert_perst() function performs
> a core reset, all the eDMA setting written in the dbi by the eDMA driver will be
> cleared, so a PERST assert + deassert by the RC will wipe the eDMA settings.
> Hopefully, this will no longer be a problem after this patch has been merged.
> 

Yes. Since all the DBI accesses were moved to the dw_pcie_ep_late_init()
function that get's called during perst deassert with the help of
dw_pcie_ep_init_complete(), all the settings will be reconfigured again.

- Mani

> 
> Kind regards,
> Niklas

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/2] PCI: designware-ep: Fix DBI access before core init
  2023-11-29 11:38     ` Niklas Cassel
@ 2023-11-29 15:51       ` Manivannan Sadhasivam
  2023-11-29 16:36         ` Niklas Cassel
  0 siblings, 1 reply; 24+ messages in thread
From: Manivannan Sadhasivam @ 2023-11-29 15:51 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: linux-pci, Damien Le Moal, Vidya Sagar

On Wed, Nov 29, 2023 at 11:38:34AM +0000, Niklas Cassel wrote:
> On Tue, Nov 28, 2023 at 06:41:51PM +0100, Niklas Cassel wrote:
> > On Mon, Nov 20, 2023 at 02:10:13PM +0530, Manivannan Sadhasivam wrote:
> > > The drivers for platforms requiring reference clock from the PCIe host for
> > > initializing their PCIe EP core, make use of the 'core_init_notifier'
> > > feature exposed by the DWC common code. On these platforms, access to the
> > > hw registers like DBI before completing the core initialization will result
> > > in a whole system hang. But the current DWC EP driver tries to access DBI
> > > registers during dw_pcie_ep_init() without waiting for core initialization
> > > and it results in system hang on platforms making use of
> > > 'core_init_notifier' such as Tegra194 and Qcom SM8450.
> > > 
> > > To workaround this issue, users of the above mentioned platforms have to
> > > maintain the dependency with the PCIe host by booting the PCIe EP after
> > > host boot. But this won't provide a good user experience, since PCIe EP is
> > > _one_ of the features of those platforms and it doesn't make sense to
> > > delay the whole platform booting due to the PCIe dependency.
> > > 
> > > So to fix this issue, let's move all the DBI access during
> > > dw_pcie_ep_init() in the DWC EP driver to the dw_pcie_ep_init_complete()
> > > API that gets called only after core initialization on these platforms.
> > > This makes sure that the DBI register accesses are skipped during
> > > dw_pcie_ep_init() and accessed later once the core initialization happens.
> > > 
> > > For the rest of the platforms, DBI access happens as usual.
> > > 
> > > Co-developed-by: Vidya Sagar <vidyas@nvidia.com>
> > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > ---
> > 
> > Hello Mani,
> > 
> > I tried this patch on top of a work in progress EP driver,
> > which, similar to pcie-qcom-ep.c has a perst gpio as input,
> > and a .core_init_notifier.
> > 
> > What I noticed is the following every time I reboot the RC, I get:
> > 
> > [  604.735115] debugfs: Directory 'a40000000.pcie_ep' with parent 'dmaengine' already present!
> > [ 1000.713582] debugfs: Directory 'a40000000.pcie_ep' with parent 'dmaengine' already present!
> > [ 1000.714355] debugfs: File 'mf' in directory '/' already present!
> > [ 1000.714890] debugfs: File 'wr_ch_cnt' in directory '/' already present!
> > [ 1000.715476] debugfs: File 'rd_ch_cnt' in directory '/' already present!
> > [ 1000.716061] debugfs: Directory 'registers' with parent '/' already present!
> > 
> > 
> > Also:
> > 
> > # ls -al /sys/class/dma/dma*/device | grep pcie
> > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan0/device -> ../../../a40000000.pcie_ep
> > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan1/device -> ../../../a40000000.pcie_ep
> > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan2/device -> ../../../a40000000.pcie_ep
> > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan3/device -> ../../../a40000000.pcie_ep
> > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan0/device -> ../../../a40000000.pcie_ep
> > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan1/device -> ../../../a40000000.pcie_ep
> > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan2/device -> ../../../a40000000.pcie_ep
> > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan3/device -> ../../../a40000000.pcie_ep
> > lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan0/device -> ../../../a40000000.pcie_ep
> > lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan1/device -> ../../../a40000000.pcie_ep
> > lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan2/device -> ../../../a40000000.pcie_ep
> > lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan3/device -> ../../../a40000000.pcie_ep
> > 
> > Adds another dmaX entry for each reboot.
> > 
> > 
> > I'm quite sure that you will see the same with pcie-qcom-ep.
> > 
> > I think that either the DWC drivers using core_init (only tegra and qcom)
> > need to deinit the eDMA in their assert_perst() function, or this patch
> > needs to deinit the eDMA before initializing it.
> > 
> > 
> > A problem with the current code, if you do NOT have this patch, which I assume
> > is also problem on pcie-qcom-ep, is that since assert_perst() function performs
> > a core reset, all the eDMA setting written in the dbi by the eDMA driver will be
> > cleared, so a PERST assert + deassert by the RC will wipe the eDMA settings.
> > Hopefully, this will no longer be a problem after this patch has been merged.
> > 
> > 
> > Kind regards,
> > Niklas
> 
> I'm sorry that I'm just looking at this patch now (it's v7 already).
> But I did notice that the DWC code is inconsistent for drivers having
> a .core_init_notifier and drivers not having a .core_init_notifier.
> 
> When receiving a hot reset or link-down reset, the DWC core gets reset,
> which means that most DBI settings get reset to their reset value.
> 

I believe you are talking about the hardware here and not the DWC core driver.

> 
> Both tegra and qcom-ep does have a start_link() that is basically a no-op.

That's because of the fact that we cannot write to LTSSM registers before perst
deassert.

> Instead, ep_init_complete() (and LTSSM enable) is called when PERST is
> deasserted, so settings written by ep_init_complete() will always get set
> after PERST is asserted + deasserted.
> 
> However, for a driver without a .core_init_notifier, a pci-epf-test unbind
> + bind, will currently NOT write the DBI settings written by
> ep_init_complete() when starting the link the second time.
> 
> If you unbind + bind pci-epf-test (which requires stopping and starting the
> link), I think that you should write all the DBI settings. Unbinding + binding
> will allocate memory for all the BARs, write all the iATU settings etc.
> It doesn't make sense that some DBI writes (those made by ep_init_complete())
> are not redone.
> 
> The problem is that if you do not have a .core_init_notifier,
> ep_init_complete() (which does DBI writes) is only called by ep_init(),
> and never ever again.
> 

Hmm, right. I did not look close into non-core_init_notifier platforms because
of the lack of hardware.

> 
> Considering that .start_link() is a no-op for DWC drivers with a
> .core_init_notifier (they instead call ep_init_complete() when perst is
> deasserted), I think the most logical thing would be for .start_link() to
> call ep_init_complete() (for drivers without a .core_init_notifier), that way,
> all DBI settings (and not just some) will be written on an unbind + bind.
> 
> 
> Something like this:
> 
> --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> @@ -465,6 +465,16 @@ static int dw_pcie_ep_start(struct pci_epc *epc)
>  {
>         struct dw_pcie_ep *ep = epc_get_drvdata(epc);
>         struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +       const struct pci_epc_features *epc_features;
> +
> +       if (ep->ops->get_features) {
> +               epc_features = ep->ops->get_features(ep);
> +               if (!epc_features->core_init_notifier) {
> +                       ret = dw_pcie_ep_init_complete(ep);
> +                       if (ret)
> +                               return ret;
> +               }
> +       }
>  
>         return dw_pcie_start_link(pci);
>  }
> @@ -729,7 +739,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
>         struct device *dev = pci->dev;
>         struct platform_device *pdev = to_platform_device(dev);
>         struct device_node *np = dev->of_node;
> -       const struct pci_epc_features *epc_features;
>         struct dw_pcie_ep_func *ep_func;
>  
>         INIT_LIST_HEAD(&ep->func_list);
> @@ -817,16 +826,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
>         if (ret)
>                 goto err_free_epc_mem;
>  
> -       if (ep->ops->get_features) {
> -               epc_features = ep->ops->get_features(ep);
> -               if (epc_features->core_init_notifier)
> -                       return 0;
> -       }
> -
> -       ret = dw_pcie_ep_init_complete(ep);
> -       if (ret)
> -               goto err_remove_edma;
> -
>         return 0;
>  
>  err_remove_edma:
> 
> 
> I could send a patch, but it would be conflicting with your patch.
> And you also need to handle deiniting + initing the eDMA in a nice way,
> but that seems to be a problem that also needs to be addressed with the
> patch in $subject.
> 
> What do you think?
> 

Your proposed solution looks good to me. If you are OK, I can spin up a patch on
behalf of you (with your authorship ofc) and integrate it with this series.

Let me know!

- Mani

> 
> Kind regards,
> Niklas

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/2] PCI: designware-ep: Fix DBI access before core init
  2023-11-29 15:51       ` Manivannan Sadhasivam
@ 2023-11-29 16:36         ` Niklas Cassel
  2023-11-30  6:38           ` Manivannan Sadhasivam
  0 siblings, 1 reply; 24+ messages in thread
From: Niklas Cassel @ 2023-11-29 16:36 UTC (permalink / raw)
  To: Manivannan Sadhasivam; +Cc: linux-pci, Damien Le Moal, Vidya Sagar

On Wed, Nov 29, 2023 at 09:21:40PM +0530, Manivannan Sadhasivam wrote:
> On Wed, Nov 29, 2023 at 11:38:34AM +0000, Niklas Cassel wrote:
> > On Tue, Nov 28, 2023 at 06:41:51PM +0100, Niklas Cassel wrote:
> > > On Mon, Nov 20, 2023 at 02:10:13PM +0530, Manivannan Sadhasivam wrote:
> > > > The drivers for platforms requiring reference clock from the PCIe host for
> > > > initializing their PCIe EP core, make use of the 'core_init_notifier'
> > > > feature exposed by the DWC common code. On these platforms, access to the
> > > > hw registers like DBI before completing the core initialization will result
> > > > in a whole system hang. But the current DWC EP driver tries to access DBI
> > > > registers during dw_pcie_ep_init() without waiting for core initialization
> > > > and it results in system hang on platforms making use of
> > > > 'core_init_notifier' such as Tegra194 and Qcom SM8450.
> > > > 
> > > > To workaround this issue, users of the above mentioned platforms have to
> > > > maintain the dependency with the PCIe host by booting the PCIe EP after
> > > > host boot. But this won't provide a good user experience, since PCIe EP is
> > > > _one_ of the features of those platforms and it doesn't make sense to
> > > > delay the whole platform booting due to the PCIe dependency.
> > > > 
> > > > So to fix this issue, let's move all the DBI access during
> > > > dw_pcie_ep_init() in the DWC EP driver to the dw_pcie_ep_init_complete()
> > > > API that gets called only after core initialization on these platforms.
> > > > This makes sure that the DBI register accesses are skipped during
> > > > dw_pcie_ep_init() and accessed later once the core initialization happens.
> > > > 
> > > > For the rest of the platforms, DBI access happens as usual.
> > > > 
> > > > Co-developed-by: Vidya Sagar <vidyas@nvidia.com>
> > > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > ---
> > > 
> > > Hello Mani,
> > > 
> > > I tried this patch on top of a work in progress EP driver,
> > > which, similar to pcie-qcom-ep.c has a perst gpio as input,
> > > and a .core_init_notifier.
> > > 
> > > What I noticed is the following every time I reboot the RC, I get:
> > > 
> > > [  604.735115] debugfs: Directory 'a40000000.pcie_ep' with parent 'dmaengine' already present!
> > > [ 1000.713582] debugfs: Directory 'a40000000.pcie_ep' with parent 'dmaengine' already present!
> > > [ 1000.714355] debugfs: File 'mf' in directory '/' already present!
> > > [ 1000.714890] debugfs: File 'wr_ch_cnt' in directory '/' already present!
> > > [ 1000.715476] debugfs: File 'rd_ch_cnt' in directory '/' already present!
> > > [ 1000.716061] debugfs: Directory 'registers' with parent '/' already present!
> > > 
> > > 
> > > Also:
> > > 
> > > # ls -al /sys/class/dma/dma*/device | grep pcie
> > > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan0/device -> ../../../a40000000.pcie_ep
> > > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan1/device -> ../../../a40000000.pcie_ep
> > > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan2/device -> ../../../a40000000.pcie_ep
> > > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan3/device -> ../../../a40000000.pcie_ep
> > > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan0/device -> ../../../a40000000.pcie_ep
> > > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan1/device -> ../../../a40000000.pcie_ep
> > > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan2/device -> ../../../a40000000.pcie_ep
> > > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan3/device -> ../../../a40000000.pcie_ep
> > > lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan0/device -> ../../../a40000000.pcie_ep
> > > lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan1/device -> ../../../a40000000.pcie_ep
> > > lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan2/device -> ../../../a40000000.pcie_ep
> > > lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan3/device -> ../../../a40000000.pcie_ep
> > > 
> > > Adds another dmaX entry for each reboot.
> > > 
> > > 
> > > I'm quite sure that you will see the same with pcie-qcom-ep.
> > > 
> > > I think that either the DWC drivers using core_init (only tegra and qcom)
> > > need to deinit the eDMA in their assert_perst() function, or this patch
> > > needs to deinit the eDMA before initializing it.
> > > 
> > > 
> > > A problem with the current code, if you do NOT have this patch, which I assume
> > > is also problem on pcie-qcom-ep, is that since assert_perst() function performs
> > > a core reset, all the eDMA setting written in the dbi by the eDMA driver will be
> > > cleared, so a PERST assert + deassert by the RC will wipe the eDMA settings.
> > > Hopefully, this will no longer be a problem after this patch has been merged.
> > > 
> > > 
> > > Kind regards,
> > > Niklas
> > 
> > I'm sorry that I'm just looking at this patch now (it's v7 already).
> > But I did notice that the DWC code is inconsistent for drivers having
> > a .core_init_notifier and drivers not having a .core_init_notifier.
> > 
> > When receiving a hot reset or link-down reset, the DWC core gets reset,
> > which means that most DBI settings get reset to their reset value.
> > 
> 
> I believe you are talking about the hardware here and not the DWC core driver.
> 
> > 
> > Both tegra and qcom-ep does have a start_link() that is basically a no-op.
> 
> That's because of the fact that we cannot write to LTSSM registers before perst
> deassert.
> 
> > Instead, ep_init_complete() (and LTSSM enable) is called when PERST is
> > deasserted, so settings written by ep_init_complete() will always get set
> > after PERST is asserted + deasserted.
> > 
> > However, for a driver without a .core_init_notifier, a pci-epf-test unbind
> > + bind, will currently NOT write the DBI settings written by
> > ep_init_complete() when starting the link the second time.
> > 
> > If you unbind + bind pci-epf-test (which requires stopping and starting the
> > link), I think that you should write all the DBI settings. Unbinding + binding
> > will allocate memory for all the BARs, write all the iATU settings etc.
> > It doesn't make sense that some DBI writes (those made by ep_init_complete())
> > are not redone.
> > 
> > The problem is that if you do not have a .core_init_notifier,
> > ep_init_complete() (which does DBI writes) is only called by ep_init(),
> > and never ever again.
> > 
> 
> Hmm, right. I did not look close into non-core_init_notifier platforms because
> of the lack of hardware.
> 
> > 
> > Considering that .start_link() is a no-op for DWC drivers with a
> > .core_init_notifier (they instead call ep_init_complete() when perst is
> > deasserted), I think the most logical thing would be for .start_link() to
> > call ep_init_complete() (for drivers without a .core_init_notifier), that way,
> > all DBI settings (and not just some) will be written on an unbind + bind.
> > 
> > 
> > Something like this:
> > 
> > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > @@ -465,6 +465,16 @@ static int dw_pcie_ep_start(struct pci_epc *epc)
> >  {
> >         struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> >         struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > +       const struct pci_epc_features *epc_features;
> > +
> > +       if (ep->ops->get_features) {
> > +               epc_features = ep->ops->get_features(ep);
> > +               if (!epc_features->core_init_notifier) {
> > +                       ret = dw_pcie_ep_init_complete(ep);
> > +                       if (ret)
> > +                               return ret;
> > +               }
> > +       }
> >  
> >         return dw_pcie_start_link(pci);
> >  }
> > @@ -729,7 +739,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> >         struct device *dev = pci->dev;
> >         struct platform_device *pdev = to_platform_device(dev);
> >         struct device_node *np = dev->of_node;
> > -       const struct pci_epc_features *epc_features;
> >         struct dw_pcie_ep_func *ep_func;
> >  
> >         INIT_LIST_HEAD(&ep->func_list);
> > @@ -817,16 +826,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> >         if (ret)
> >                 goto err_free_epc_mem;
> >  
> > -       if (ep->ops->get_features) {
> > -               epc_features = ep->ops->get_features(ep);
> > -               if (epc_features->core_init_notifier)
> > -                       return 0;
> > -       }
> > -
> > -       ret = dw_pcie_ep_init_complete(ep);
> > -       if (ret)
> > -               goto err_remove_edma;
> > -
> >         return 0;
> >  
> >  err_remove_edma:
> > 
> > 
> > I could send a patch, but it would be conflicting with your patch.
> > And you also need to handle deiniting + initing the eDMA in a nice way,
> > but that seems to be a problem that also needs to be addressed with the
> > patch in $subject.
> > 
> > What do you think?
> > 
> 
> Your proposed solution looks good to me. If you are OK, I can spin up a patch on
> behalf of you (with your authorship ofc) and integrate it with this series.

Sounds good to me!

Since you will need to also fix the error paths (my suggestion didn't)),
I think that you deserve the authorship if you cook up a patch.

It would be nice to hear Vidya's opinion on my suggestion as well, since he
appears to be the one who added the .core_init_notifier in the first place.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/2] PCI: designware-ep: Fix DBI access before core init
  2023-11-29 16:36         ` Niklas Cassel
@ 2023-11-30  6:38           ` Manivannan Sadhasivam
  2023-11-30 11:22             ` Niklas Cassel
  0 siblings, 1 reply; 24+ messages in thread
From: Manivannan Sadhasivam @ 2023-11-30  6:38 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: linux-pci, Damien Le Moal, Vidya Sagar

On Wed, Nov 29, 2023 at 04:36:04PM +0000, Niklas Cassel wrote:
> On Wed, Nov 29, 2023 at 09:21:40PM +0530, Manivannan Sadhasivam wrote:
> > On Wed, Nov 29, 2023 at 11:38:34AM +0000, Niklas Cassel wrote:
> > > On Tue, Nov 28, 2023 at 06:41:51PM +0100, Niklas Cassel wrote:
> > > > On Mon, Nov 20, 2023 at 02:10:13PM +0530, Manivannan Sadhasivam wrote:
> > > > > The drivers for platforms requiring reference clock from the PCIe host for
> > > > > initializing their PCIe EP core, make use of the 'core_init_notifier'
> > > > > feature exposed by the DWC common code. On these platforms, access to the
> > > > > hw registers like DBI before completing the core initialization will result
> > > > > in a whole system hang. But the current DWC EP driver tries to access DBI
> > > > > registers during dw_pcie_ep_init() without waiting for core initialization
> > > > > and it results in system hang on platforms making use of
> > > > > 'core_init_notifier' such as Tegra194 and Qcom SM8450.
> > > > > 
> > > > > To workaround this issue, users of the above mentioned platforms have to
> > > > > maintain the dependency with the PCIe host by booting the PCIe EP after
> > > > > host boot. But this won't provide a good user experience, since PCIe EP is
> > > > > _one_ of the features of those platforms and it doesn't make sense to
> > > > > delay the whole platform booting due to the PCIe dependency.
> > > > > 
> > > > > So to fix this issue, let's move all the DBI access during
> > > > > dw_pcie_ep_init() in the DWC EP driver to the dw_pcie_ep_init_complete()
> > > > > API that gets called only after core initialization on these platforms.
> > > > > This makes sure that the DBI register accesses are skipped during
> > > > > dw_pcie_ep_init() and accessed later once the core initialization happens.
> > > > > 
> > > > > For the rest of the platforms, DBI access happens as usual.
> > > > > 
> > > > > Co-developed-by: Vidya Sagar <vidyas@nvidia.com>
> > > > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > > ---
> > > > 
> > > > Hello Mani,
> > > > 
> > > > I tried this patch on top of a work in progress EP driver,
> > > > which, similar to pcie-qcom-ep.c has a perst gpio as input,
> > > > and a .core_init_notifier.
> > > > 
> > > > What I noticed is the following every time I reboot the RC, I get:
> > > > 
> > > > [  604.735115] debugfs: Directory 'a40000000.pcie_ep' with parent 'dmaengine' already present!
> > > > [ 1000.713582] debugfs: Directory 'a40000000.pcie_ep' with parent 'dmaengine' already present!
> > > > [ 1000.714355] debugfs: File 'mf' in directory '/' already present!
> > > > [ 1000.714890] debugfs: File 'wr_ch_cnt' in directory '/' already present!
> > > > [ 1000.715476] debugfs: File 'rd_ch_cnt' in directory '/' already present!
> > > > [ 1000.716061] debugfs: Directory 'registers' with parent '/' already present!
> > > > 
> > > > 
> > > > Also:
> > > > 
> > > > # ls -al /sys/class/dma/dma*/device | grep pcie
> > > > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan0/device -> ../../../a40000000.pcie_ep
> > > > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan1/device -> ../../../a40000000.pcie_ep
> > > > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan2/device -> ../../../a40000000.pcie_ep
> > > > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma3chan3/device -> ../../../a40000000.pcie_ep
> > > > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan0/device -> ../../../a40000000.pcie_ep
> > > > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan1/device -> ../../../a40000000.pcie_ep
> > > > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan2/device -> ../../../a40000000.pcie_ep
> > > > lrwxrwxrwx    1 root     root             0 Jan  1 00:14 /sys/class/dma/dma4chan3/device -> ../../../a40000000.pcie_ep
> > > > lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan0/device -> ../../../a40000000.pcie_ep
> > > > lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan1/device -> ../../../a40000000.pcie_ep
> > > > lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan2/device -> ../../../a40000000.pcie_ep
> > > > lrwxrwxrwx    1 root     root             0 Jan  1 00:16 /sys/class/dma/dma5chan3/device -> ../../../a40000000.pcie_ep
> > > > 
> > > > Adds another dmaX entry for each reboot.
> > > > 
> > > > 
> > > > I'm quite sure that you will see the same with pcie-qcom-ep.
> > > > 
> > > > I think that either the DWC drivers using core_init (only tegra and qcom)
> > > > need to deinit the eDMA in their assert_perst() function, or this patch
> > > > needs to deinit the eDMA before initializing it.
> > > > 
> > > > 
> > > > A problem with the current code, if you do NOT have this patch, which I assume
> > > > is also problem on pcie-qcom-ep, is that since assert_perst() function performs
> > > > a core reset, all the eDMA setting written in the dbi by the eDMA driver will be
> > > > cleared, so a PERST assert + deassert by the RC will wipe the eDMA settings.
> > > > Hopefully, this will no longer be a problem after this patch has been merged.
> > > > 
> > > > 
> > > > Kind regards,
> > > > Niklas
> > > 
> > > I'm sorry that I'm just looking at this patch now (it's v7 already).
> > > But I did notice that the DWC code is inconsistent for drivers having
> > > a .core_init_notifier and drivers not having a .core_init_notifier.
> > > 
> > > When receiving a hot reset or link-down reset, the DWC core gets reset,
> > > which means that most DBI settings get reset to their reset value.
> > > 
> > 
> > I believe you are talking about the hardware here and not the DWC core driver.
> > 
> > > 
> > > Both tegra and qcom-ep does have a start_link() that is basically a no-op.
> > 
> > That's because of the fact that we cannot write to LTSSM registers before perst
> > deassert.
> > 
> > > Instead, ep_init_complete() (and LTSSM enable) is called when PERST is
> > > deasserted, so settings written by ep_init_complete() will always get set
> > > after PERST is asserted + deasserted.
> > > 
> > > However, for a driver without a .core_init_notifier, a pci-epf-test unbind
> > > + bind, will currently NOT write the DBI settings written by
> > > ep_init_complete() when starting the link the second time.
> > > 
> > > If you unbind + bind pci-epf-test (which requires stopping and starting the
> > > link), I think that you should write all the DBI settings. Unbinding + binding
> > > will allocate memory for all the BARs, write all the iATU settings etc.
> > > It doesn't make sense that some DBI writes (those made by ep_init_complete())
> > > are not redone.
> > > 

Looking at this issue again, I think your statement may not be correct. In the
stop_link() callback, non-core_init_notifier platforms are just disabling the
LTSSM and enabling it again in start_link(). So all the rest of the
configurations (DBI etc...) should not be affected during EPF bind/unbind.

Can you point me to the specific controller driver that is doing otherwise?

- Mani

> > > The problem is that if you do not have a .core_init_notifier,
> > > ep_init_complete() (which does DBI writes) is only called by ep_init(),
> > > and never ever again.
> > > 
> > 
> > Hmm, right. I did not look close into non-core_init_notifier platforms because
> > of the lack of hardware.
> > 
> > > 
> > > Considering that .start_link() is a no-op for DWC drivers with a
> > > .core_init_notifier (they instead call ep_init_complete() when perst is
> > > deasserted), I think the most logical thing would be for .start_link() to
> > > call ep_init_complete() (for drivers without a .core_init_notifier), that way,
> > > all DBI settings (and not just some) will be written on an unbind + bind.
> > > 
> > > 
> > > Something like this:
> > > 
> > > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > @@ -465,6 +465,16 @@ static int dw_pcie_ep_start(struct pci_epc *epc)
> > >  {
> > >         struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> > >         struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > +       const struct pci_epc_features *epc_features;
> > > +
> > > +       if (ep->ops->get_features) {
> > > +               epc_features = ep->ops->get_features(ep);
> > > +               if (!epc_features->core_init_notifier) {
> > > +                       ret = dw_pcie_ep_init_complete(ep);
> > > +                       if (ret)
> > > +                               return ret;
> > > +               }
> > > +       }
> > >  
> > >         return dw_pcie_start_link(pci);
> > >  }
> > > @@ -729,7 +739,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> > >         struct device *dev = pci->dev;
> > >         struct platform_device *pdev = to_platform_device(dev);
> > >         struct device_node *np = dev->of_node;
> > > -       const struct pci_epc_features *epc_features;
> > >         struct dw_pcie_ep_func *ep_func;
> > >  
> > >         INIT_LIST_HEAD(&ep->func_list);
> > > @@ -817,16 +826,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> > >         if (ret)
> > >                 goto err_free_epc_mem;
> > >  
> > > -       if (ep->ops->get_features) {
> > > -               epc_features = ep->ops->get_features(ep);
> > > -               if (epc_features->core_init_notifier)
> > > -                       return 0;
> > > -       }
> > > -
> > > -       ret = dw_pcie_ep_init_complete(ep);
> > > -       if (ret)
> > > -               goto err_remove_edma;
> > > -
> > >         return 0;
> > >  
> > >  err_remove_edma:
> > > 
> > > 
> > > I could send a patch, but it would be conflicting with your patch.
> > > And you also need to handle deiniting + initing the eDMA in a nice way,
> > > but that seems to be a problem that also needs to be addressed with the
> > > patch in $subject.
> > > 
> > > What do you think?
> > > 
> > 
> > Your proposed solution looks good to me. If you are OK, I can spin up a patch on
> > behalf of you (with your authorship ofc) and integrate it with this series.
> 
> Sounds good to me!
> 
> Since you will need to also fix the error paths (my suggestion didn't)),
> I think that you deserve the authorship if you cook up a patch.
> 
> It would be nice to hear Vidya's opinion on my suggestion as well, since he
> appears to be the one who added the .core_init_notifier in the first place.
> 
> 
> Kind regards,
> Niklas

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/2] PCI: designware-ep: Fix DBI access before core init
  2023-11-30  6:38           ` Manivannan Sadhasivam
@ 2023-11-30 11:22             ` Niklas Cassel
  2023-11-30 13:57               ` Manivannan Sadhasivam
  0 siblings, 1 reply; 24+ messages in thread
From: Niklas Cassel @ 2023-11-30 11:22 UTC (permalink / raw)
  To: Manivannan Sadhasivam; +Cc: linux-pci, Damien Le Moal, Vidya Sagar

On Thu, Nov 30, 2023 at 12:08:00PM +0530, Manivannan Sadhasivam wrote:
> 
> Looking at this issue again, I think your statement may not be correct. In the
> stop_link() callback, non-core_init_notifier platforms are just disabling the
> LTSSM and enabling it again in start_link(). So all the rest of the
> configurations (DBI etc...) should not be affected during EPF bind/unbind.

Sorry for confusing you.

When toggling PERST on a driver with a core_init_notifier, you will call
dw_pcie_ep_init_complete() - which will initialize DBI settings, and then
stop/start the link by deasserting/asserting LTSSM.
(perst_assert()/perst_deassert() is functionally the equivalent to start_link()/
stop_link() on non-core_init_notifier platforms.)


For drivers without a core_init_notifier, they currently don't listen to PERST
input GPIO.
Stopping and starting the link will not call dw_pcie_ep_init_complete(),
so it will not (re-)initialize DBI settings.


My problem is that a bunch of hardware registers gets reset when receiving
a link-down reset or hot reset. It would be nice to write all DBI settings
when starting the link. That way the link going down for a millisecond, and
gettting immediately reestablished, will not be so bad. A stop + start link
will rewrite the settings. (Just like toggling PERST rewrites the settings.)


Currently, doing a bind + start link + stop link + unbind + bind + start link,
will not reinitialize all DBI writes for a driver.
(This is also true for a core_init_notifier driver, but there start/stop link
is a no-op, other than enabling the GPIO irq handler.)

What this will do during the second bind:
-allocate BARs (DBI writes)
-set iATUs (DBI writes)

What it will not redo is:
-what is done by dw_pcie_ep_init() - which is fine, because this is only
 supposed to allocate stuff or get stuff from DT, not actually touch HW
 registers.
-what is done by dw_pcie_ep_init_complete() - I think this should be done
 since it does DBI writes. E.g. clearing PTM root Cap and fixing Resizable
 BAR Cap, calling dw_pcie_setup() which sets max link width etc.



I guess the counter argument could be that... if you need to re-initialize
DBI registers, you could probably do a:
stop link + unbind EPF driver + unbind PCIe-EP driver + bind PCIe-EP driver
+ bind EPF driver + start link..
(But I don't need all that if I use a .core_init_notifier and just toggle
PERST).

Of course, toggling PERST and starting/stopping the link via sysfs is not
exactly the same thing...

For me personally, the platform I use do not require an external refclk to
write DBI settings, but I would very much like the HW to be re-initialized
either when stopping/starting the link in sysfs, or by toggling PERST, or
both.

I think that I will just add a .core_init_notifier to my WIP driver,
even though I do not require an external refclk, and simply call
dw_pcie_ep_init_complete() when receiving a PERST deassert, because I want
_all_ settings (DBI writes) to be reinitialized.
(If we receive a PERST reset request, I think it does make sense to obey
and actually reset/re-write all settings.)

A sysfs stop/start link would still not reinitialize everything...
I think it would be good if the DWC EP driver actually did this too,
but I'm fine if you consider it out of scope of your patch series.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/2] PCI: designware-ep: Fix DBI access before core init
  2023-11-30 11:22             ` Niklas Cassel
@ 2023-11-30 13:57               ` Manivannan Sadhasivam
  2023-12-23  1:52                 ` Niklas Cassel
  0 siblings, 1 reply; 24+ messages in thread
From: Manivannan Sadhasivam @ 2023-11-30 13:57 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: linux-pci, Damien Le Moal, Vidya Sagar

On Thu, Nov 30, 2023 at 11:22:30AM +0000, Niklas Cassel wrote:
> On Thu, Nov 30, 2023 at 12:08:00PM +0530, Manivannan Sadhasivam wrote:
> > 
> > Looking at this issue again, I think your statement may not be correct. In the
> > stop_link() callback, non-core_init_notifier platforms are just disabling the
> > LTSSM and enabling it again in start_link(). So all the rest of the
> > configurations (DBI etc...) should not be affected during EPF bind/unbind.
> 
> Sorry for confusing you.
> 
> When toggling PERST on a driver with a core_init_notifier, you will call
> dw_pcie_ep_init_complete() - which will initialize DBI settings, and then
> stop/start the link by deasserting/asserting LTSSM.
> (perst_assert()/perst_deassert() is functionally the equivalent to start_link()/
> stop_link() on non-core_init_notifier platforms.)
> 
> 
> For drivers without a core_init_notifier, they currently don't listen to PERST
> input GPIO.
> Stopping and starting the link will not call dw_pcie_ep_init_complete(),
> so it will not (re-)initialize DBI settings.
> 
> 
> My problem is that a bunch of hardware registers gets reset when receiving
> a link-down reset or hot reset. It would be nice to write all DBI settings
> when starting the link. That way the link going down for a millisecond, and
> gettting immediately reestablished, will not be so bad. A stop + start link
> will rewrite the settings. (Just like toggling PERST rewrites the settings.)
> 

I got slightly confused by this part. So you are saying that when a user stops
the controller through configfs, EP receives the LINK_DOWN event and that causes
the EP specific registers (like DBI) to be reset?

I thought the LINK_DOWN event can only change LTSSM and some status registers.

> 
> Currently, doing a bind + start link + stop link + unbind + bind + start link,
> will not reinitialize all DBI writes for a driver.
> (This is also true for a core_init_notifier driver, but there start/stop link
> is a no-op, other than enabling the GPIO irq handler.)
> 
> What this will do during the second bind:
> -allocate BARs (DBI writes)
> -set iATUs (DBI writes)
> 
> What it will not redo is:
> -what is done by dw_pcie_ep_init() - which is fine, because this is only
>  supposed to allocate stuff or get stuff from DT, not actually touch HW
>  registers.
> -what is done by dw_pcie_ep_init_complete() - I think this should be done
>  since it does DBI writes. E.g. clearing PTM root Cap and fixing Resizable
>  BAR Cap, calling dw_pcie_setup() which sets max link width etc.
> 
> 
> 
> I guess the counter argument could be that... if you need to re-initialize
> DBI registers, you could probably do a:
> stop link + unbind EPF driver + unbind PCIe-EP driver + bind PCIe-EP driver
> + bind EPF driver + start link..
> (But I don't need all that if I use a .core_init_notifier and just toggle
> PERST).
> 
> Of course, toggling PERST and starting/stopping the link via sysfs is not
> exactly the same thing...
> 
> For me personally, the platform I use do not require an external refclk to
> write DBI settings, but I would very much like the HW to be re-initialized
> either when stopping/starting the link in sysfs, or by toggling PERST, or
> both.
> 
> I think that I will just add a .core_init_notifier to my WIP driver,
> even though I do not require an external refclk, and simply call
> dw_pcie_ep_init_complete() when receiving a PERST deassert, because I want
> _all_ settings (DBI writes) to be reinitialized.
> (If we receive a PERST reset request, I think it does make sense to obey
> and actually reset/re-write all settings.)
> 

No, you should not add core_init_notifier if your platform receives an active
refclk (although my heard wants to have an unified behavior across all
platforms).

We should fix the DWC core code if you clarify my above suspicion.

- Mani

> A sysfs stop/start link would still not reinitialize everything...
> I think it would be good if the DWC EP driver actually did this too,
> but I'm fine if you consider it out of scope of your patch series.
> 
> 
> Kind regards,
> Niklas

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/2] PCI: designware-ep: Fix DBI access before core init
  2023-11-30 13:57               ` Manivannan Sadhasivam
@ 2023-12-23  1:52                 ` Niklas Cassel
  2024-01-06 15:12                   ` Manivannan Sadhasivam
  0 siblings, 1 reply; 24+ messages in thread
From: Niklas Cassel @ 2023-12-23  1:52 UTC (permalink / raw)
  To: Manivannan Sadhasivam; +Cc: linux-pci, Damien Le Moal, Vidya Sagar

On Thu, Nov 30, 2023 at 07:27:57PM +0530, Manivannan Sadhasivam wrote:
> On Thu, Nov 30, 2023 at 11:22:30AM +0000, Niklas Cassel wrote:
> > On Thu, Nov 30, 2023 at 12:08:00PM +0530, Manivannan Sadhasivam wrote:
> > > 
> > > Looking at this issue again, I think your statement may not be correct. In the
> > > stop_link() callback, non-core_init_notifier platforms are just disabling the
> > > LTSSM and enabling it again in start_link(). So all the rest of the
> > > configurations (DBI etc...) should not be affected during EPF bind/unbind.
> > 
> > Sorry for confusing you.
> > 
> > When toggling PERST on a driver with a core_init_notifier, you will call
> > dw_pcie_ep_init_complete() - which will initialize DBI settings, and then
> > stop/start the link by deasserting/asserting LTSSM.
> > (perst_assert()/perst_deassert() is functionally the equivalent to start_link()/
> > stop_link() on non-core_init_notifier platforms.)
> > 
> > 
> > For drivers without a core_init_notifier, they currently don't listen to PERST
> > input GPIO.
> > Stopping and starting the link will not call dw_pcie_ep_init_complete(),
> > so it will not (re-)initialize DBI settings.
> > 
> > 
> > My problem is that a bunch of hardware registers gets reset when receiving
> > a link-down reset or hot reset. It would be nice to write all DBI settings
> > when starting the link. That way the link going down for a millisecond, and
> > gettting immediately reestablished, will not be so bad. A stop + start link
> > will rewrite the settings. (Just like toggling PERST rewrites the settings.)
> > 
> 
> I got slightly confused by this part. So you are saying that when a user stops
> the controller through configfs, EP receives the LINK_DOWN event and that causes
> the EP specific registers (like DBI) to be reset?

Sorry for taking time to reply.
(I wanted to make sure that I was 100% understanding what was going on.)

So to give you a clear example:
If you:
1) start the EP, start the pci-epf-test
2) start the RC
3) run pci-epf-test
4) reboot the RC

this will trigger a link-down reset IRQ on the EP side (link_req_rst_not).


> 
> I thought the LINK_DOWN event can only change LTSSM and some status registers.

link_req_rst_not will assert link_down_rst_n, which will assert
non_sticky_rst_n. So all non-sticky registers will be reset.

Some thing that has been biting me is that:
While advertized Resizable BAR sizes are sticky,
the currently used resizable BAR size is non-sticky.

So a reboot of the RC will reset the BAR sizes to reset values
(which on the SoC I'm using is 1 GB...)

So, there is an advantage of having a .core_init_notifier,
as this will allow you to reboot the RC without any problems.

I guess one solution, for DWC glue drivers that don't strictly
need a refclock is to just call dw_pcie_ep_init_complete() when
receiving the link-down IRQ (since when you get that, the core
has already reset non-sticky registers to reset values), and
then when you get a link-up again, you have already re-inited
the registers that got reset.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/2] PCI: designware-ep: Fix DBI access before core init
  2023-12-23  1:52                 ` Niklas Cassel
@ 2024-01-06 15:12                   ` Manivannan Sadhasivam
  2024-01-18 18:33                     ` Niklas Cassel
  0 siblings, 1 reply; 24+ messages in thread
From: Manivannan Sadhasivam @ 2024-01-06 15:12 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: linux-pci, Damien Le Moal, Vidya Sagar

On Sat, Dec 23, 2023 at 01:52:01AM +0000, Niklas Cassel wrote:
> On Thu, Nov 30, 2023 at 07:27:57PM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Nov 30, 2023 at 11:22:30AM +0000, Niklas Cassel wrote:
> > > On Thu, Nov 30, 2023 at 12:08:00PM +0530, Manivannan Sadhasivam wrote:
> > > > 
> > > > Looking at this issue again, I think your statement may not be correct. In the
> > > > stop_link() callback, non-core_init_notifier platforms are just disabling the
> > > > LTSSM and enabling it again in start_link(). So all the rest of the
> > > > configurations (DBI etc...) should not be affected during EPF bind/unbind.
> > > 
> > > Sorry for confusing you.
> > > 
> > > When toggling PERST on a driver with a core_init_notifier, you will call
> > > dw_pcie_ep_init_complete() - which will initialize DBI settings, and then
> > > stop/start the link by deasserting/asserting LTSSM.
> > > (perst_assert()/perst_deassert() is functionally the equivalent to start_link()/
> > > stop_link() on non-core_init_notifier platforms.)
> > > 
> > > 
> > > For drivers without a core_init_notifier, they currently don't listen to PERST
> > > input GPIO.
> > > Stopping and starting the link will not call dw_pcie_ep_init_complete(),
> > > so it will not (re-)initialize DBI settings.
> > > 
> > > 
> > > My problem is that a bunch of hardware registers gets reset when receiving
> > > a link-down reset or hot reset. It would be nice to write all DBI settings
> > > when starting the link. That way the link going down for a millisecond, and
> > > gettting immediately reestablished, will not be so bad. A stop + start link
> > > will rewrite the settings. (Just like toggling PERST rewrites the settings.)
> > > 
> > 
> > I got slightly confused by this part. So you are saying that when a user stops
> > the controller through configfs, EP receives the LINK_DOWN event and that causes
> > the EP specific registers (like DBI) to be reset?
> 
> Sorry for taking time to reply.
> (I wanted to make sure that I was 100% understanding what was going on.)
> 
> So to give you a clear example:
> If you:
> 1) start the EP, start the pci-epf-test
> 2) start the RC
> 3) run pci-epf-test
> 4) reboot the RC
> 
> this will trigger a link-down reset IRQ on the EP side (link_req_rst_not).
> 

Right. This is the sane RC reboot scenario. Although, there is no guarantee
that the EP will get LINK_DOWN event before perst_assert (I've seen this with
some RC platforms).

But can you confirm whether your EP is receiving PERST assert/deassert when RC
reboots?

- Mani

> 
> > 
> > I thought the LINK_DOWN event can only change LTSSM and some status registers.
> 
> link_req_rst_not will assert link_down_rst_n, which will assert
> non_sticky_rst_n. So all non-sticky registers will be reset.
> 
> Some thing that has been biting me is that:
> While advertized Resizable BAR sizes are sticky,
> the currently used resizable BAR size is non-sticky.
> 
> So a reboot of the RC will reset the BAR sizes to reset values
> (which on the SoC I'm using is 1 GB...)
> 
> So, there is an advantage of having a .core_init_notifier,
> as this will allow you to reboot the RC without any problems.
> 
> I guess one solution, for DWC glue drivers that don't strictly
> need a refclock is to just call dw_pcie_ep_init_complete() when
> receiving the link-down IRQ (since when you get that, the core
> has already reset non-sticky registers to reset values), and
> then when you get a link-up again, you have already re-inited
> the registers that got reset.
> 
> 
> Kind regards,
> Niklas

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 0/2] PCI: designware-ep: Fix DBI access before core init
  2023-11-20  8:40 [PATCH v7 0/2] PCI: designware-ep: Fix DBI access before core init Manivannan Sadhasivam
  2023-11-20  8:40 ` [PATCH v7 1/2] " Manivannan Sadhasivam
  2023-11-20  8:40 ` [PATCH v7 2/2] PCI: designware-ep: Move pci_epc_init_notify() inside dw_pcie_ep_init_complete() Manivannan Sadhasivam
@ 2024-01-07  7:27 ` Krzysztof Wilczyński
  2024-01-07  7:34   ` Krzysztof Wilczyński
  2024-01-09 17:58   ` Niklas Cassel
  2 siblings, 2 replies; 24+ messages in thread
From: Krzysztof Wilczyński @ 2024-01-07  7:27 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: jingoohan1, gustavo.pimentel, lpieralisi, robh, bhelgaas,
	linux-pci, linux-arm-msm, linux-kernel, quic_bjorande,
	fancer.lancer, vidyas

Hello,

> This series is the continuation of previous work by Vidya Sagar [1] to fix the
> issues related to accessing DBI register space before completing the core
> initialization in some EP platforms like Tegra194/234 and Qcom SM8450.

Applied to controller/dwc-ep, thank you!

[01/02] PCI: designware-ep: Fix DBI access before core init
        https://git.kernel.org/pci/pci/c/d3d13b00a2cf
[02/02] PCI: designware-ep: Move pci_epc_init_notify() inside dw_pcie_ep_init_complete()
        https://git.kernel.org/pci/pci/c/a171e1d60dad

	Krzysztof

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 0/2] PCI: designware-ep: Fix DBI access before core init
  2024-01-07  7:27 ` [PATCH v7 0/2] PCI: designware-ep: Fix DBI access before core init Krzysztof Wilczyński
@ 2024-01-07  7:34   ` Krzysztof Wilczyński
  2024-01-09 17:58   ` Niklas Cassel
  1 sibling, 0 replies; 24+ messages in thread
From: Krzysztof Wilczyński @ 2024-01-07  7:34 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: jingoohan1, gustavo.pimentel, lpieralisi, robh, bhelgaas,
	linux-pci, linux-arm-msm, linux-kernel, quic_bjorande,
	fancer.lancer, vidyas

Hello,

> > This series is the continuation of previous work by Vidya Sagar [1] to fix the
> > issues related to accessing DBI register space before completing the core
> > initialization in some EP platforms like Tegra194/234 and Qcom SM8450.
> 
> Applied to controller/dwc-ep, thank you!
> 
> [01/02] PCI: designware-ep: Fix DBI access before core init
>         https://git.kernel.org/pci/pci/c/d3d13b00a2cf
> [02/02] PCI: designware-ep: Move pci_epc_init_notify() inside dw_pcie_ep_init_complete()
>         https://git.kernel.org/pci/pci/c/a171e1d60dad

Manivannan, I applied this series to "controller/dwc-ep" rather than
"controller/dwc" so that I can have a look at the conflicts we have
between this series and the current "controller/dwc" branch.

There have been changes from both Shimoda-san and Frank Li that both
changed some things and moved some things around within the code base,
so this series no longer cleanly applies.

However, I wanted the CI system to pick the branch for testing, as the
sooner, the better.

Hopefully, we can resolve the conflicts before Bjorn sends his Pull
Request with 6.8 changes.

	Krzysztof

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 0/2] PCI: designware-ep: Fix DBI access before core init
  2024-01-07  7:27 ` [PATCH v7 0/2] PCI: designware-ep: Fix DBI access before core init Krzysztof Wilczyński
  2024-01-07  7:34   ` Krzysztof Wilczyński
@ 2024-01-09 17:58   ` Niklas Cassel
  2024-01-10  3:11     ` Manivannan Sadhasivam
  1 sibling, 1 reply; 24+ messages in thread
From: Niklas Cassel @ 2024-01-09 17:58 UTC (permalink / raw)
  To: Krzysztof Wilczyński
  Cc: Manivannan Sadhasivam, jingoohan1, gustavo.pimentel, lpieralisi,
	robh, bhelgaas, linux-pci, linux-arm-msm, linux-kernel,
	quic_bjorande, fancer.lancer, vidyas

On Sun, Jan 07, 2024 at 04:27:07PM +0900, Krzysztof Wilczyński wrote:
> Hello,

Hello Krzysztof, Manivannan,

> 
> > This series is the continuation of previous work by Vidya Sagar [1] to fix the
> > issues related to accessing DBI register space before completing the core
> > initialization in some EP platforms like Tegra194/234 and Qcom SM8450.
> 
> Applied to controller/dwc-ep, thank you!
> 
> [01/02] PCI: designware-ep: Fix DBI access before core init
>         https://git.kernel.org/pci/pci/c/d3d13b00a2cf
> [02/02] PCI: designware-ep: Move pci_epc_init_notify() inside dw_pcie_ep_init_complete()
>         https://git.kernel.org/pci/pci/c/a171e1d60dad
> 
> 	Krzysztof

Considering that we know that this series introduces new problems
for drivers with a .core_init_notifier (i.e. tegra and qcom), see:
https://lore.kernel.org/linux-pci/ZWYmX8Y%2F7Q9WMxES@x1-carbon/

Do we really want to apply this series as is?


Reading the patch, it appears that (at least some) tegra and
qcom boards currently causes a whole system hang, which IIUC,
renders those boards unusable.

So perhaps the new issues introduced by this series is preferable
to a whole system hang.

As such, I can understand the urgency to merge this series.
However, at the very least, I think that it might be worth amending
the commit message to mention that this will currently not deregister
the dma device in a clean way, leading to e.g. superfluous entries in
/sys/class/dma/ and debugfs warnings being printed to dmesg.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/2] PCI: designware-ep: Fix DBI access before core init
  2023-11-20  8:40 ` [PATCH v7 1/2] " Manivannan Sadhasivam
  2023-11-28 17:41   ` Niklas Cassel
@ 2024-01-09 21:12   ` Bjorn Helgaas
  1 sibling, 0 replies; 24+ messages in thread
From: Bjorn Helgaas @ 2024-01-09 21:12 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: jingoohan1, gustavo.pimentel, lpieralisi, robh, kw, bhelgaas,
	linux-pci, linux-arm-msm, linux-kernel, quic_bjorande,
	fancer.lancer, vidyas

It doesn't look to me like the issues raised by Niklas have really
been resolved:

  https://lore.kernel.org/r/ZWYmX8Y%2F7Q9WMxES@x1-carbon/
  https://lore.kernel.org/r/ZZ2JXMhdOI1Upabx@x1-carbon

so I'm doubtful that we should apply this as-is.  The spurious
/sys/class/dma/ stuff and debugfs warnings sound like things that will
annoy users.

Apart from that, this patch has been floating around a long time, but
I still think this solution is hard to maintain for the same reasons I
mentioned here:
https://lore.kernel.org/linux-pci/20220919224014.GA1030798@bhelgaas/

On Mon, Nov 20, 2023 at 02:10:13PM +0530, Manivannan Sadhasivam wrote:
> The drivers for platforms requiring reference clock from the PCIe host for
> initializing their PCIe EP core, make use of the 'core_init_notifier'
> feature exposed by the DWC common code. On these platforms, access to the
> hw registers like DBI before completing the core initialization will result
> in a whole system hang. But the current DWC EP driver tries to access DBI
> registers during dw_pcie_ep_init() without waiting for core initialization
> and it results in system hang on platforms making use of
> 'core_init_notifier' such as Tegra194 and Qcom SM8450.

I see that only qcom_pcie_epc_features and tegra_pcie_epc_features
*set* "core_init_notifier", but all platforms use it because it's only
tested in dw_pcie_ep_init() (and a test case), which is generic to all
DWC drivers.

"core_init_notifier" is not a notifier.  From reading the code, it
only means "if this is set, skip the rest of dw_pcie_ep_init()".

Based on the code, I assume it implies that drivers that set
core_init_notifier must do some additional initialization or
something, but that initialization isn't connected here.

There should be some symbol, maybe a member of pci_epc_features, that
both *does* this initialization and *tells us* that the driver needs
this initialization.

Right now, I think it's something like:

  1) this driver sets core_init_notifier
  2) that must mean that it also calls dw_pcie_ep_init_notify() somewhere
  3) we must avoid DBI access until it does

There's nothing that directly connects those three things.

> To workaround this issue, users of the above mentioned platforms have to
> maintain the dependency with the PCIe host by booting the PCIe EP after
> host boot. But this won't provide a good user experience, since PCIe EP is
> _one_ of the features of those platforms and it doesn't make sense to
> delay the whole platform booting due to the PCIe dependency.

IIUC, "have to maintain the dependency" refers to the situation
*before* this patch, right?  This patch improves the user experience
by removing the need for users to enforce this "boot host before EP"
ordering?

> So to fix this issue, let's move all the DBI access during
> dw_pcie_ep_init() in the DWC EP driver to the dw_pcie_ep_init_complete()
> API that gets called only after core initialization on these platforms.
> This makes sure that the DBI register accesses are skipped during
> dw_pcie_ep_init() and accessed later once the core initialization happens.

This patch doesn't "skip" them in dw_pcie_ep_init(); it *moves* them
completely to dw_pcie_ep_late_init() and calls that from the end of
dw_pcie_ep_init().

> For the rest of the platforms, DBI access happens as usual.

I don't really understand what "as usual" means here.  I guess it just
means "if the driver doesn't set 'core_init_notifier', nothing
changes"?  I would at least make it specific to make it clear that
"rest of the platforms" means "those that don't set
core_init_notifier".

> Co-developed-by: Vidya Sagar <vidyas@nvidia.com>
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> ---
>  .../pci/controller/dwc/pcie-designware-ep.c   | 139 ++++++++++++------
>  1 file changed, 91 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
> index f6207989fc6a..b1c79cd8e25f 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> @@ -662,14 +662,19 @@ static unsigned int dw_pcie_ep_find_ext_capability(struct dw_pcie *pci, int cap)
>  	return 0;
>  }
>  
> -int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
> +static int dw_pcie_ep_late_init(struct dw_pcie_ep *ep)
>  {
>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +	struct dw_pcie_ep_func *ep_func;
> +	struct device *dev = pci->dev;
> +	struct pci_epc *epc = ep->epc;
>  	unsigned int offset, ptm_cap_base;
>  	unsigned int nbars;
>  	u8 hdr_type;
> +	u8 func_no;
> +	int i, ret;
> +	void *addr;
>  	u32 reg;
> -	int i;
>  
>  	hdr_type = dw_pcie_readb_dbi(pci, PCI_HEADER_TYPE) &
>  		   PCI_HEADER_TYPE_MASK;
> @@ -680,6 +685,55 @@ int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
>  		return -EIO;
>  	}
>  
> +	dw_pcie_version_detect(pci);
> +
> +	dw_pcie_iatu_detect(pci);
> +
> +	ret = dw_pcie_edma_detect(pci);
> +	if (ret)
> +		return ret;
> +
> +	if (!ep->ib_window_map) {
> +		ep->ib_window_map = devm_bitmap_zalloc(dev, pci->num_ib_windows,
> +						       GFP_KERNEL);
> +		if (!ep->ib_window_map)
> +			goto err_remove_edma;
> +	}
> +
> +	if (!ep->ob_window_map) {
> +		ep->ob_window_map = devm_bitmap_zalloc(dev, pci->num_ob_windows,
> +						       GFP_KERNEL);
> +		if (!ep->ob_window_map)
> +			goto err_remove_edma;
> +	}
> +
> +	if (!ep->outbound_addr) {
> +		addr = devm_kcalloc(dev, pci->num_ob_windows, sizeof(phys_addr_t),
> +				    GFP_KERNEL);
> +		if (!addr)
> +			goto err_remove_edma;
> +		ep->outbound_addr = addr;
> +	}
> +
> +	for (func_no = 0; func_no < epc->max_functions; func_no++) {
> +
> +		ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
> +		if (ep_func)
> +			continue;
> +
> +		ep_func = devm_kzalloc(dev, sizeof(*ep_func), GFP_KERNEL);
> +		if (!ep_func)
> +			goto err_remove_edma;
> +
> +		ep_func->func_no = func_no;
> +		ep_func->msi_cap = dw_pcie_ep_find_capability(ep, func_no,
> +							      PCI_CAP_ID_MSI);
> +		ep_func->msix_cap = dw_pcie_ep_find_capability(ep, func_no,
> +							       PCI_CAP_ID_MSIX);
> +
> +		list_add_tail(&ep_func->list, &ep->func_list);
> +	}
> +
>  	offset = dw_pcie_ep_find_ext_capability(pci, PCI_EXT_CAP_ID_REBAR);
>  	ptm_cap_base = dw_pcie_ep_find_ext_capability(pci, PCI_EXT_CAP_ID_PTM);
>  
> @@ -714,14 +768,38 @@ int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
>  	dw_pcie_dbi_ro_wr_dis(pci);
>  
>  	return 0;
> +
> +err_remove_edma:
> +	dw_pcie_edma_remove(pci);
> +
> +	return ret;
> +}
> +
> +int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
> +{
> +	struct pci_epc *epc = ep->epc;
> +	int ret;
> +
> +	ret = dw_pcie_ep_late_init(ep);
> +	if (ret)
> +		goto err_cleanup;
> +
> +	return 0;
> +
> +err_cleanup:
> +	pci_epc_mem_free_addr(epc, ep->msi_mem_phys, ep->msi_mem,
> +			      epc->mem->window.page_size);
> +	pci_epc_mem_exit(epc);
> +	if (ep->ops->deinit)
> +		ep->ops->deinit(ep);
> +
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(dw_pcie_ep_init_complete);
>  
>  int dw_pcie_ep_init(struct dw_pcie_ep *ep)
>  {
>  	int ret;
> -	void *addr;
> -	u8 func_no;
>  	struct resource *res;
>  	struct pci_epc *epc;
>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> @@ -729,7 +807,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
>  	struct platform_device *pdev = to_platform_device(dev);
>  	struct device_node *np = dev->of_node;
>  	const struct pci_epc_features *epc_features;
> -	struct dw_pcie_ep_func *ep_func;
>  
>  	INIT_LIST_HEAD(&ep->func_list);
>  
> @@ -747,26 +824,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
>  	if (ep->ops->pre_init)
>  		ep->ops->pre_init(ep);
>  
> -	dw_pcie_version_detect(pci);
> -
> -	dw_pcie_iatu_detect(pci);
> -
> -	ep->ib_window_map = devm_bitmap_zalloc(dev, pci->num_ib_windows,
> -					       GFP_KERNEL);
> -	if (!ep->ib_window_map)
> -		return -ENOMEM;
> -
> -	ep->ob_window_map = devm_bitmap_zalloc(dev, pci->num_ob_windows,
> -					       GFP_KERNEL);
> -	if (!ep->ob_window_map)
> -		return -ENOMEM;
> -
> -	addr = devm_kcalloc(dev, pci->num_ob_windows, sizeof(phys_addr_t),
> -			    GFP_KERNEL);
> -	if (!addr)
> -		return -ENOMEM;
> -	ep->outbound_addr = addr;
> -
>  	epc = devm_pci_epc_create(dev, &epc_ops);
>  	if (IS_ERR(epc)) {
>  		dev_err(dev, "Failed to create epc device\n");
> @@ -780,20 +837,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
>  	if (ret < 0)
>  		epc->max_functions = 1;
>  
> -	for (func_no = 0; func_no < epc->max_functions; func_no++) {
> -		ep_func = devm_kzalloc(dev, sizeof(*ep_func), GFP_KERNEL);
> -		if (!ep_func)
> -			return -ENOMEM;
> -
> -		ep_func->func_no = func_no;
> -		ep_func->msi_cap = dw_pcie_ep_find_capability(ep, func_no,
> -							      PCI_CAP_ID_MSI);
> -		ep_func->msix_cap = dw_pcie_ep_find_capability(ep, func_no,
> -							       PCI_CAP_ID_MSIX);
> -
> -		list_add_tail(&ep_func->list, &ep->func_list);
> -	}
> -
>  	if (ep->ops->ep_init)
>  		ep->ops->ep_init(ep);
>  
> @@ -812,25 +855,25 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
>  		goto err_exit_epc_mem;
>  	}
>  
> -	ret = dw_pcie_edma_detect(pci);
> -	if (ret)
> -		goto err_free_epc_mem;
> -
>  	if (ep->ops->get_features) {
>  		epc_features = ep->ops->get_features(ep);
>  		if (epc_features->core_init_notifier)
>  			return 0;
>  	}
>  
> -	ret = dw_pcie_ep_init_complete(ep);
> +	/*
> +	 * NOTE:- Avoid accessing the hardware (Ex:- DBI space) before this
> +	 * step as platforms that implement 'core_init_notifier' feature may
> +	 * not have the hardware ready (i.e. core initialized) for access
> +	 * (Ex: tegra194). Any hardware access on such platforms result
> +	 * in system hang.

What specifically does "before this step" refer to?  I think the
intent is that it's something to do with "core_init_notifier", but
there's no *direct* connection because there's no test of
core_init_notifier except here and the test case.

> +	ret = dw_pcie_ep_late_init(ep);
>  	if (ret)
> -		goto err_remove_edma;
> +		goto err_free_epc_mem;
>  
>  	return 0;
>  
> -err_remove_edma:
> -	dw_pcie_edma_remove(pci);
> -
>  err_free_epc_mem:
>  	pci_epc_mem_free_addr(epc, ep->msi_mem_phys, ep->msi_mem,
>  			      epc->mem->window.page_size);
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 0/2] PCI: designware-ep: Fix DBI access before core init
  2024-01-09 17:58   ` Niklas Cassel
@ 2024-01-10  3:11     ` Manivannan Sadhasivam
  2024-01-10 10:01       ` Niklas Cassel
  0 siblings, 1 reply; 24+ messages in thread
From: Manivannan Sadhasivam @ 2024-01-10  3:11 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Krzysztof Wilczyński, jingoohan1, gustavo.pimentel,
	lpieralisi, robh, bhelgaas, linux-pci, linux-arm-msm,
	linux-kernel, quic_bjorande, fancer.lancer, vidyas

On Tue, Jan 09, 2024 at 05:58:53PM +0000, Niklas Cassel wrote:
> On Sun, Jan 07, 2024 at 04:27:07PM +0900, Krzysztof Wilczyński wrote:
> > Hello,
> 
> Hello Krzysztof, Manivannan,
> 
> > 
> > > This series is the continuation of previous work by Vidya Sagar [1] to fix the
> > > issues related to accessing DBI register space before completing the core
> > > initialization in some EP platforms like Tegra194/234 and Qcom SM8450.
> > 
> > Applied to controller/dwc-ep, thank you!
> > 
> > [01/02] PCI: designware-ep: Fix DBI access before core init
> >         https://git.kernel.org/pci/pci/c/d3d13b00a2cf
> > [02/02] PCI: designware-ep: Move pci_epc_init_notify() inside dw_pcie_ep_init_complete()
> >         https://git.kernel.org/pci/pci/c/a171e1d60dad
> > 
> > 	Krzysztof
> 
> Considering that we know that this series introduces new problems
> for drivers with a .core_init_notifier (i.e. tegra and qcom), see:
> https://lore.kernel.org/linux-pci/ZWYmX8Y%2F7Q9WMxES@x1-carbon/
> 
> Do we really want to apply this series as is?
> 
> 

Niklas, I think I explained it in this thread itself. Let me reiterate here
again.

The fact that you are seeing the dmaengine warnings is due to function drivers
not releasing the channels properly. It is not the job of the DWC driver to
release the channels. The channels are requested by the function drivers [1]
and they _should_ release them when the channels are no longer required.

I know that the PCI_EPF_TEST driver is not doing so and so you are seeing the
warnings. But I do not have a device to test that function driver. Qcom
platforms use a dedicated function driver and that releases the channels when it
gets the LINK_DOWN event from EPC [2].

So my conclusion is that the issue is there even without this series. If you
still want me to fix the EPF_TEST driver, I can submit a change, but someone has
to test it.

- Mani

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/endpoint/functions/pci-epf-test.c#n229
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/endpoint/functions/pci-epf-mhi.c#n563

> Reading the patch, it appears that (at least some) tegra and
> qcom boards currently causes a whole system hang, which IIUC,
> renders those boards unusable.
> 
> So perhaps the new issues introduced by this series is preferable
> to a whole system hang.
> 
> As such, I can understand the urgency to merge this series.
> However, at the very least, I think that it might be worth amending
> the commit message to mention that this will currently not deregister
> the dma device in a clean way, leading to e.g. superfluous entries in
> /sys/class/dma/ and debugfs warnings being printed to dmesg.
> 
> 
> Kind regards,
> Niklas

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 0/2] PCI: designware-ep: Fix DBI access before core init
  2024-01-10  3:11     ` Manivannan Sadhasivam
@ 2024-01-10 10:01       ` Niklas Cassel
  2024-01-10 15:27         ` Manivannan Sadhasivam
  0 siblings, 1 reply; 24+ messages in thread
From: Niklas Cassel @ 2024-01-10 10:01 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Krzysztof Wilczyński, jingoohan1, gustavo.pimentel,
	lpieralisi, robh, bhelgaas, linux-pci, linux-arm-msm,
	linux-kernel, quic_bjorande, fancer.lancer, vidyas

Hello Mani,

On Wed, Jan 10, 2024 at 08:41:37AM +0530, Manivannan Sadhasivam wrote:
> On Tue, Jan 09, 2024 at 05:58:53PM +0000, Niklas Cassel wrote:
> > On Sun, Jan 07, 2024 at 04:27:07PM +0900, Krzysztof Wilczyński wrote:
> > 
> > Considering that we know that this series introduces new problems
> > for drivers with a .core_init_notifier (i.e. tegra and qcom), see:
> > https://lore.kernel.org/linux-pci/ZWYmX8Y%2F7Q9WMxES@x1-carbon/
> > 
> > Do we really want to apply this series as is?
> > 
> > 
> 
> Niklas, I think I explained it in this thread itself. Let me reiterate here
> again.
> 
> The fact that you are seeing the dmaengine warnings is due to function drivers
> not releasing the channels properly. It is not the job of the DWC driver to
> release the channels. The channels are requested by the function drivers [1]
> and they _should_ release them when the channels are no longer required.

Sure, the function driver should release the channels.


> 
> I know that the PCI_EPF_TEST driver is not doing so and so you are seeing the
> warnings. But I do not have a device to test that function driver. Qcom
> platforms use a dedicated function driver and that releases the channels when it
> gets the LINK_DOWN event from EPC [2].
> 
> So my conclusion is that the issue is there even without this series. If you
> still want me to fix the EPF_TEST driver, I can submit a change, but someone has
> to test it.

That conclusion is not fully correct.

Let's take e.g. these error messages that this series introduces:
[ 1000.714355] debugfs: File 'mf' in directory '/' already present!
[ 1000.714890] debugfs: File 'wr_ch_cnt' in directory '/' already present!
[ 1000.715476] debugfs: File 'rd_ch_cnt' in directory '/' already present!
[ 1000.716061] debugfs: Directory 'registers' with parent '/' already present!

These come from dw_edma_core_debugfs_on(), which is called by dw_edma_probe().

This is a direct result from your patch:
https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=controller/dwc-ep&id=9ab5c8bb7a305135b1b6c65cb8db92b4acbef79d

Which moves dw_pcie_edma_detect() from dw_pcie_ep_init_complete() to
dw_pcie_ep_late_init() (since dw_pcie_edma_detect() calls dw_edma_probe()).

So without your patch, those debugfs error messages are not seen.

Thus, I do not think that it is sufficient to only modify the pci-epf-test
driver to release the dma channels, as I don't see how that will avoid e.g.
the debugfs error messages introduced by this patch.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 0/2] PCI: designware-ep: Fix DBI access before core init
  2024-01-10 10:01       ` Niklas Cassel
@ 2024-01-10 15:27         ` Manivannan Sadhasivam
  0 siblings, 0 replies; 24+ messages in thread
From: Manivannan Sadhasivam @ 2024-01-10 15:27 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Krzysztof Wilczyński, jingoohan1, gustavo.pimentel,
	lpieralisi, robh, bhelgaas, linux-pci, linux-arm-msm,
	linux-kernel, quic_bjorande, fancer.lancer, vidyas

On Wed, Jan 10, 2024 at 10:01:08AM +0000, Niklas Cassel wrote:
> Hello Mani,
> 
> On Wed, Jan 10, 2024 at 08:41:37AM +0530, Manivannan Sadhasivam wrote:
> > On Tue, Jan 09, 2024 at 05:58:53PM +0000, Niklas Cassel wrote:
> > > On Sun, Jan 07, 2024 at 04:27:07PM +0900, Krzysztof Wilczyński wrote:
> > > 
> > > Considering that we know that this series introduces new problems
> > > for drivers with a .core_init_notifier (i.e. tegra and qcom), see:
> > > https://lore.kernel.org/linux-pci/ZWYmX8Y%2F7Q9WMxES@x1-carbon/
> > > 
> > > Do we really want to apply this series as is?
> > > 
> > > 
> > 
> > Niklas, I think I explained it in this thread itself. Let me reiterate here
> > again.
> > 
> > The fact that you are seeing the dmaengine warnings is due to function drivers
> > not releasing the channels properly. It is not the job of the DWC driver to
> > release the channels. The channels are requested by the function drivers [1]
> > and they _should_ release them when the channels are no longer required.
> 
> Sure, the function driver should release the channels.
> 
> 
> > 
> > I know that the PCI_EPF_TEST driver is not doing so and so you are seeing the
> > warnings. But I do not have a device to test that function driver. Qcom
> > platforms use a dedicated function driver and that releases the channels when it
> > gets the LINK_DOWN event from EPC [2].
> > 
> > So my conclusion is that the issue is there even without this series. If you
> > still want me to fix the EPF_TEST driver, I can submit a change, but someone has
> > to test it.
> 
> That conclusion is not fully correct.
> 
> Let's take e.g. these error messages that this series introduces:
> [ 1000.714355] debugfs: File 'mf' in directory '/' already present!
> [ 1000.714890] debugfs: File 'wr_ch_cnt' in directory '/' already present!
> [ 1000.715476] debugfs: File 'rd_ch_cnt' in directory '/' already present!
> [ 1000.716061] debugfs: Directory 'registers' with parent '/' already present!
> 
> These come from dw_edma_core_debugfs_on(), which is called by dw_edma_probe().
> 
> This is a direct result from your patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=controller/dwc-ep&id=9ab5c8bb7a305135b1b6c65cb8db92b4acbef79d
> 
> Which moves dw_pcie_edma_detect() from dw_pcie_ep_init_complete() to
> dw_pcie_ep_late_init() (since dw_pcie_edma_detect() calls dw_edma_probe()).
> 
> So without your patch, those debugfs error messages are not seen.
> 
> Thus, I do not think that it is sufficient to only modify the pci-epf-test
> driver to release the dma channels, as I don't see how that will avoid e.g.
> the debugfs error messages introduced by this patch.
> 

Ah, sorry I overlooked the warnings from the edma core. I think adding
dw_pcie_edma_remove() to the perst_assert() function would fix this issue. But
I'm traveling this week, so couldn't verify it on the device.

Bjorn, Krzysztof feel free to drop this series for 6.8. I will modify this
series to address some other issues discussed so far and resubmit it for 6.9.

- Mani

> 
> Kind regards,
> Niklas

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/2] PCI: designware-ep: Fix DBI access before core init
  2024-01-06 15:12                   ` Manivannan Sadhasivam
@ 2024-01-18 18:33                     ` Niklas Cassel
  2024-01-19  7:28                       ` Manivannan Sadhasivam
  0 siblings, 1 reply; 24+ messages in thread
From: Niklas Cassel @ 2024-01-18 18:33 UTC (permalink / raw)
  To: Manivannan Sadhasivam; +Cc: linux-pci, Damien Le Moal, Vidya Sagar

Hello Mani,

On Sat, Jan 06, 2024 at 08:42:38PM +0530, Manivannan Sadhasivam wrote:
> > 
> > So to give you a clear example:
> > If you:
> > 1) start the EP, start the pci-epf-test
> > 2) start the RC
> > 3) run pci-epf-test
> > 4) reboot the RC
> > 
> > this will trigger a link-down reset IRQ on the EP side (link_req_rst_not).
> > 
> 
> Right. This is the sane RC reboot scenario. Although, there is no guarantee
> that the EP will get LINK_DOWN event before perst_assert (I've seen this with
> some RC platforms).
> 
> But can you confirm whether your EP is receiving PERST assert/deassert when RC
> reboots?

Yes, it does:

## RC side:
# reboot

## EP side
[  845.606810] pci: PERST asserted by host!
[  845.657058] pci: PCIE_CLIENT_INTR_STATUS_MISC: 0x4
[  845.657759] pci: hot reset or link-down reset (LTSSM_STATUS: 0x0)
[  852.483985] pci: PERST de-asserted by host!
[  852.503041] pci: PERST asserted by host!
[  852.521616] pci: PCIE_CLIENT_INTR_STATUS_MISC: 0x2003
[  852.522375] pci: link up! (LTSSM_STATUS: 0x230011)
[  852.610318] pci: PERST de-asserted by host!

The time between 845 and 852 is the time it takes for the RC to
boot and probe the RC PCIe driver.

Note that I currently don't do anything in the perst gpio handler,
I only print when receiving the PERST GPIO IRQ, as I wanted to be
able to answer your question.


Currently, what I do instead, in order to handle a link down
(which clears all the RESBAR registers) followed by a link up,
is to re-write the RESBAR registers when receiving the link down IRQ.
(This is only to handle the case of the RC rebooting.)

A nicer solution would probably be to modify dw_pcie_ep_set_bar() to
properly handle controllers with the Resizable BAR capability, and remove
the RESBAR related code from dw_pcie_ep_init_complete().

However, that would still require changes in pci-epf-test.c to call
set_bar() after a hot reset/link-down reset (and it is not possible
to distinguish between them), which could be done by either:
1) Making sure that the glue drivers (that implement Resizable BAR capability)
 call dw_pcie_ep_init_notify() when receiving a hot reset/link-down reset
 IRQ (or maybe instead when getting the link up IRQ), as that will
 trigger pci-epf-test.c to (once again) call set_bar().
or
2) Modifying pci-epf-test.c:pci_epf_test_link_up() to call set_bar()
 (If epc_features doesn't have a core_init_notifier, as in that case
 set_bar() is called by pci_epf_test_core_init()).
 (Since I assume that not all SoCs might have a PERST GPIO.)
or
3) We could allow glue drivers that use an internal refclk to still make
 use of the PERST GPIO IRQ, and only call dw_pcie_ep_init_notify(),
 as that would also cause pci-epf-test.c to call set_bar().
 (These glue drivers, which implement the Resizable BAR capability would
 however not need to perform a full core reset, etc. in the PERST GPIO
 IRQ handler, they only need to call dw_pcie_ep_init_notify().)

Right now, I think I prefer 1).

What do you think?

Since it seems that not many SoCs that use a DWC core, have Resizable
BAR capability implemented, I will try to see what I can come up with.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/2] PCI: designware-ep: Fix DBI access before core init
  2024-01-18 18:33                     ` Niklas Cassel
@ 2024-01-19  7:28                       ` Manivannan Sadhasivam
  2024-01-23  9:18                         ` Niklas Cassel
  0 siblings, 1 reply; 24+ messages in thread
From: Manivannan Sadhasivam @ 2024-01-19  7:28 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: linux-pci, Damien Le Moal, Vidya Sagar

On Thu, Jan 18, 2024 at 06:33:35PM +0000, Niklas Cassel wrote:
> Hello Mani,
> 
> On Sat, Jan 06, 2024 at 08:42:38PM +0530, Manivannan Sadhasivam wrote:
> > > 
> > > So to give you a clear example:
> > > If you:
> > > 1) start the EP, start the pci-epf-test
> > > 2) start the RC
> > > 3) run pci-epf-test
> > > 4) reboot the RC
> > > 
> > > this will trigger a link-down reset IRQ on the EP side (link_req_rst_not).
> > > 
> > 
> > Right. This is the sane RC reboot scenario. Although, there is no guarantee
> > that the EP will get LINK_DOWN event before perst_assert (I've seen this with
> > some RC platforms).
> > 
> > But can you confirm whether your EP is receiving PERST assert/deassert when RC
> > reboots?
> 
> Yes, it does:
> 
> ## RC side:
> # reboot
> 
> ## EP side
> [  845.606810] pci: PERST asserted by host!
> [  845.657058] pci: PCIE_CLIENT_INTR_STATUS_MISC: 0x4
> [  845.657759] pci: hot reset or link-down reset (LTSSM_STATUS: 0x0)
> [  852.483985] pci: PERST de-asserted by host!
> [  852.503041] pci: PERST asserted by host!
> [  852.521616] pci: PCIE_CLIENT_INTR_STATUS_MISC: 0x2003
> [  852.522375] pci: link up! (LTSSM_STATUS: 0x230011)
> [  852.610318] pci: PERST de-asserted by host!
> 
> The time between 845 and 852 is the time it takes for the RC to
> boot and probe the RC PCIe driver.
> 
> Note that I currently don't do anything in the perst gpio handler,
> I only print when receiving the PERST GPIO IRQ, as I wanted to be
> able to answer your question.
> 
> /
> Currently, what I do instead, in order to handle a link down
> (which clears all the RESBAR registers) followed by a link up,
> is to re-write the RESBAR registers when receiving the link down IRQ.
> (This is only to handle the case of the RC rebooting.)
> 
> A nicer solution would probably be to modify dw_pcie_ep_set_bar() to
> properly handle controllers with the Resizable BAR capability, and remove
> the RESBAR related code from dw_pcie_ep_init_complete().
> 
> However, that would still require changes in pci-epf-test.c to call
> set_bar() after a hot reset/link-down reset (and it is not possible
> to distinguish between them), which could be done by either:
> 1) Making sure that the glue drivers (that implement Resizable BAR capability)
>  call dw_pcie_ep_init_notify() when receiving a hot reset/link-down reset
>  IRQ (or maybe instead when getting the link up IRQ), as that will
>  trigger pci-epf-test.c to (once again) call set_bar().
> or
> 2) Modifying pci-epf-test.c:pci_epf_test_link_up() to call set_bar()
>  (If epc_features doesn't have a core_init_notifier, as in that case
>  set_bar() is called by pci_epf_test_core_init()).
>  (Since I assume that not all SoCs might have a PERST GPIO.)
> or
> 3) We could allow glue drivers that use an internal refclk to still make
>  use of the PERST GPIO IRQ, and only call dw_pcie_ep_init_notify(),
>  as that would also cause pci-epf-test.c to call set_bar().
>  (These glue drivers, which implement the Resizable BAR capability would
>  however not need to perform a full core reset, etc. in the PERST GPIO
>  IRQ handler, they only need to call dw_pcie_ep_init_notify().)
> 
> Right now, I think I prefer 1).
> 
> What do you think?
> 

[For this context, I'm not only focusing on REBAR but all of the non-sticky DWC
registers]

If the PCIe spec has mandated using PERST# for all endpoints, I would've
definitely went with option 3. But the spec has made PERST# as optional for the
endpoints, so we cannot force the glue drivers to make use of PERST# IRQ.

But the solution I'm thinking is, we should reconfigure all non-sticky DWC
registers during LINK_DOWN phase irrespective of whether core_init_notifier is
used or not. This should work for both cases because we can skip configuring the
DWC registers when the core_init platforms try to do it again during PERST#
deassert.

Wdyt?

- Mani

> Since it seems that not many SoCs that use a DWC core, have Resizable
> BAR capability implemented, I will try to see what I can come up with.
> 
> 
> Kind regards,
> Niklas

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/2] PCI: designware-ep: Fix DBI access before core init
  2024-01-19  7:28                       ` Manivannan Sadhasivam
@ 2024-01-23  9:18                         ` Niklas Cassel
  2024-01-23  9:59                           ` Manivannan Sadhasivam
  0 siblings, 1 reply; 24+ messages in thread
From: Niklas Cassel @ 2024-01-23  9:18 UTC (permalink / raw)
  To: Manivannan Sadhasivam; +Cc: linux-pci, Damien Le Moal, Vidya Sagar

On Fri, Jan 19, 2024 at 12:58:46PM +0530, Manivannan Sadhasivam wrote:
> On Thu, Jan 18, 2024 at 06:33:35PM +0000, Niklas Cassel wrote:

(snip)

> > A nicer solution would probably be to modify dw_pcie_ep_set_bar() to
> > properly handle controllers with the Resizable BAR capability, and remove
> > the RESBAR related code from dw_pcie_ep_init_complete().
> > 
> > However, that would still require changes in pci-epf-test.c to call
> > set_bar() after a hot reset/link-down reset (and it is not possible
> > to distinguish between them), which could be done by either:
> > 1) Making sure that the glue drivers (that implement Resizable BAR capability)
> >  call dw_pcie_ep_init_notify() when receiving a hot reset/link-down reset
> >  IRQ (or maybe instead when getting the link up IRQ), as that will
> >  trigger pci-epf-test.c to (once again) call set_bar().
> > or
> > 2) Modifying pci-epf-test.c:pci_epf_test_link_up() to call set_bar()
> >  (If epc_features doesn't have a core_init_notifier, as in that case
> >  set_bar() is called by pci_epf_test_core_init()).
> >  (Since I assume that not all SoCs might have a PERST GPIO.)
> > or
> > 3) We could allow glue drivers that use an internal refclk to still make
> >  use of the PERST GPIO IRQ, and only call dw_pcie_ep_init_notify(),
> >  as that would also cause pci-epf-test.c to call set_bar().
> >  (These glue drivers, which implement the Resizable BAR capability would
> >  however not need to perform a full core reset, etc. in the PERST GPIO
> >  IRQ handler, they only need to call dw_pcie_ep_init_notify().)
> > 
> > Right now, I think I prefer 1).
> > 
> > What do you think?
> > 
> 
> [For this context, I'm not only focusing on REBAR but all of the non-sticky DWC
> registers]
> 
> If the PCIe spec has mandated using PERST# for all endpoints, I would've
> definitely went with option 3. But the spec has made PERST# as optional for the
> endpoints, so we cannot force the glue drivers to make use of PERST# IRQ.
> 
> But the solution I'm thinking is, we should reconfigure all non-sticky DWC
> registers during LINK_DOWN phase irrespective of whether core_init_notifier is
> used or not. This should work for both cases because we can skip configuring the
> DWC registers when the core_init platforms try to do it again during PERST#
> deassert.
> 
> Wdyt?

I'm guessing you mean something like, create a dw_pcie_ep_linkdown() function,
that:
1) calls a dwc_pcie_ep_reinit_non_sticky()
2) calls pci_epc_linkdown()

If so, I like the suggestion.


The only problem I can see is that not all DWC EP glue drivers might have
an IRQ for link down. But I don't see any better way.
(I guess the glue drivers that don't have an IRQ on link down could have a
kthread that polls dw_pcie_link_up(), if they want to be able to handle the
RC/host rebooting.)

One thing comes to mind though...
Some EPF drivers might have a .link_down handler, which might e.g. dealloc the
memory backing the BARs. But I guess that is fine, as long as we have called
dwc_pcie_ep_reinit_non_sticky() before calling pci_epc_linkdown(), the
non-sticky registers have been re-initialized, so even if a EPF driver performs
a .link_down() + .link_up(), the non-sticky registers in DWC core should still
have proper values set.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/2] PCI: designware-ep: Fix DBI access before core init
  2024-01-23  9:18                         ` Niklas Cassel
@ 2024-01-23  9:59                           ` Manivannan Sadhasivam
  0 siblings, 0 replies; 24+ messages in thread
From: Manivannan Sadhasivam @ 2024-01-23  9:59 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: linux-pci, Damien Le Moal, Vidya Sagar

On Tue, Jan 23, 2024 at 09:18:31AM +0000, Niklas Cassel wrote:
> On Fri, Jan 19, 2024 at 12:58:46PM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Jan 18, 2024 at 06:33:35PM +0000, Niklas Cassel wrote:
> 
> (snip)
> 
> > > A nicer solution would probably be to modify dw_pcie_ep_set_bar() to
> > > properly handle controllers with the Resizable BAR capability, and remove
> > > the RESBAR related code from dw_pcie_ep_init_complete().
> > > 
> > > However, that would still require changes in pci-epf-test.c to call
> > > set_bar() after a hot reset/link-down reset (and it is not possible
> > > to distinguish between them), which could be done by either:
> > > 1) Making sure that the glue drivers (that implement Resizable BAR capability)
> > >  call dw_pcie_ep_init_notify() when receiving a hot reset/link-down reset
> > >  IRQ (or maybe instead when getting the link up IRQ), as that will
> > >  trigger pci-epf-test.c to (once again) call set_bar().
> > > or
> > > 2) Modifying pci-epf-test.c:pci_epf_test_link_up() to call set_bar()
> > >  (If epc_features doesn't have a core_init_notifier, as in that case
> > >  set_bar() is called by pci_epf_test_core_init()).
> > >  (Since I assume that not all SoCs might have a PERST GPIO.)
> > > or
> > > 3) We could allow glue drivers that use an internal refclk to still make
> > >  use of the PERST GPIO IRQ, and only call dw_pcie_ep_init_notify(),
> > >  as that would also cause pci-epf-test.c to call set_bar().
> > >  (These glue drivers, which implement the Resizable BAR capability would
> > >  however not need to perform a full core reset, etc. in the PERST GPIO
> > >  IRQ handler, they only need to call dw_pcie_ep_init_notify().)
> > > 
> > > Right now, I think I prefer 1).
> > > 
> > > What do you think?
> > > 
> > 
> > [For this context, I'm not only focusing on REBAR but all of the non-sticky DWC
> > registers]
> > 
> > If the PCIe spec has mandated using PERST# for all endpoints, I would've
> > definitely went with option 3. But the spec has made PERST# as optional for the
> > endpoints, so we cannot force the glue drivers to make use of PERST# IRQ.
> > 
> > But the solution I'm thinking is, we should reconfigure all non-sticky DWC
> > registers during LINK_DOWN phase irrespective of whether core_init_notifier is
> > used or not. This should work for both cases because we can skip configuring the
> > DWC registers when the core_init platforms try to do it again during PERST#
> > deassert.
> > 
> > Wdyt?
> 
> I'm guessing you mean something like, create a dw_pcie_ep_linkdown() function,
> that:
> 1) calls a dwc_pcie_ep_reinit_non_sticky()

This could be "dwc_pcie_ep_init_non_sticky" since we can reuse this function
during init and reinit phase. We can have a separate flag to check whether is
performed or not.

> 2) calls pci_epc_linkdown()
> 
> If so, I like the suggestion.
> 

Yes, this is what I meant.

> 
> The only problem I can see is that not all DWC EP glue drivers might have
> an IRQ for link down. But I don't see any better way.
> (I guess the glue drivers that don't have an IRQ on link down could have a
> kthread that polls dw_pcie_link_up(), if they want to be able to handle the
> RC/host rebooting.)
> 

Yeah, if the EPC driver doesn't catch PERST# or LINK_DOWN then I would consider
it as doomed.

> One thing comes to mind though...
> Some EPF drivers might have a .link_down handler, which might e.g. dealloc the
> memory backing the BARs. But I guess that is fine, as long as we have called
> dwc_pcie_ep_reinit_non_sticky() before calling pci_epc_linkdown(), the
> non-sticky registers have been re-initialized, so even if a EPF driver performs
> a .link_down() + .link_up(), the non-sticky registers in DWC core should still
> have proper values set.
> 

Yeah, there is no fixed rule on what the EPF drivers need to do during these
callbacks. So as long as we init the registers, it shouldn't matter.

- Mani

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2024-01-23  9:59 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-20  8:40 [PATCH v7 0/2] PCI: designware-ep: Fix DBI access before core init Manivannan Sadhasivam
2023-11-20  8:40 ` [PATCH v7 1/2] " Manivannan Sadhasivam
2023-11-28 17:41   ` Niklas Cassel
2023-11-29 11:38     ` Niklas Cassel
2023-11-29 15:51       ` Manivannan Sadhasivam
2023-11-29 16:36         ` Niklas Cassel
2023-11-30  6:38           ` Manivannan Sadhasivam
2023-11-30 11:22             ` Niklas Cassel
2023-11-30 13:57               ` Manivannan Sadhasivam
2023-12-23  1:52                 ` Niklas Cassel
2024-01-06 15:12                   ` Manivannan Sadhasivam
2024-01-18 18:33                     ` Niklas Cassel
2024-01-19  7:28                       ` Manivannan Sadhasivam
2024-01-23  9:18                         ` Niklas Cassel
2024-01-23  9:59                           ` Manivannan Sadhasivam
2023-11-29 14:08     ` Manivannan Sadhasivam
2024-01-09 21:12   ` Bjorn Helgaas
2023-11-20  8:40 ` [PATCH v7 2/2] PCI: designware-ep: Move pci_epc_init_notify() inside dw_pcie_ep_init_complete() Manivannan Sadhasivam
2024-01-07  7:27 ` [PATCH v7 0/2] PCI: designware-ep: Fix DBI access before core init Krzysztof Wilczyński
2024-01-07  7:34   ` Krzysztof Wilczyński
2024-01-09 17:58   ` Niklas Cassel
2024-01-10  3:11     ` Manivannan Sadhasivam
2024-01-10 10:01       ` Niklas Cassel
2024-01-10 15:27         ` Manivannan Sadhasivam

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).