All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/2] PCI: iproc: SOC specific fixes
@ 2017-08-04 15:48 Oza Pawandeep
  2017-08-04 15:48 ` [PATCH v6 1/2] PCI: iproc: Retry request when CRS returned from EP Oza Pawandeep
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Oza Pawandeep @ 2017-08-04 15:48 UTC (permalink / raw)
  To: Bjorn Helgaas, Ray Jui, Scott Branden, Jon Mason,
	bcm-kernel-feedback-list, Oza Pawandeep, Andy Gospodarek,
	linux-pci, linux-kernel, Oza Pawandeep

PCI: iproc: Retry request when CRS returned from EP
Above patch adds support for CRS in PCI RC driver, otherwise if not
handled at lower level, the user space PMD (poll mode drivers) can
timeout.

PCI: iproc: add device shutdown for PCI RC
This fixes the issue where certian PCI endpoints are not getting
detected on Stingray SOC after reboot.

Changes Since v6:
Bjorn's comments addressed.
Added reference to PCIe spec and iproc Controller spec in Changelog.

Changes Since v5:
Ray's comments addressed.

Changes Since v4:
Bjorn's comments addressed.

Changes Since v3:
[re-send]

Changes Since v2:
Fix compilation errors for pcie-iproc-platform.ko which was caught by
kbuild.

Oza Pawandeep (2):
  PCI: iproc: Retry request when CRS returned from EP
  PCI: iproc: add device shutdown for PCI RC

 drivers/pci/host/pcie-iproc-platform.c |   8 ++
 drivers/pci/host/pcie-iproc.c          | 139 +++++++++++++++++++++++++++++----
 drivers/pci/host/pcie-iproc.h          |   1 +
 3 files changed, 132 insertions(+), 16 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v6 1/2] PCI: iproc: Retry request when CRS returned from EP
  2017-08-04 15:48 [PATCH v6 0/2] PCI: iproc: SOC specific fixes Oza Pawandeep
@ 2017-08-04 15:48 ` Oza Pawandeep
  2017-08-19 18:26   ` Bjorn Helgaas
  2017-08-04 15:48 ` [PATCH v6 2/2] PCI: iproc: add device shutdown for PCI RC Oza Pawandeep
  2017-08-19 17:10 ` [PATCH v6 0/2] PCI: iproc: SOC specific fixes Bjorn Helgaas
  2 siblings, 1 reply; 16+ messages in thread
From: Oza Pawandeep @ 2017-08-04 15:48 UTC (permalink / raw)
  To: Bjorn Helgaas, Ray Jui, Scott Branden, Jon Mason,
	bcm-kernel-feedback-list, Oza Pawandeep, Andy Gospodarek,
	linux-pci, linux-kernel, Oza Pawandeep

PCIe spec r3.1, sec 2.3.2
If CRS software visibility is not enabled, the RC must reissue the
config request as a new request.

- If CRS software visibility is enabled,
- for a config read of Vendor ID, the RC must return 0x0001 data
- for all other config reads/writes, the RC must reissue the
  request

iproc PCIe Controller spec:
4.7.3.3. Retry Status On Configuration Cycle
Endpoints are allowed to generate retry status on configuration
cycles. In this case, the RC needs to re-issue the request. The IP
does not handle this because the number of configuration cycles needed
will probably be less than the total number of non-posted operations
needed.

When a retry status is received on the User RX interface for a
configuration request that was sent on the User TX interface,
it will be indicated with a completion with the CMPL_STATUS field set
to 2=CRS, and the user will have to find the address and data values
and send a new transaction on the User TX interface.
When the internal configuration space returns a retry status during a
configuration cycle (user_cscfg = 1) on the Command/Status interface,
the pcie_cscrs will assert with the pcie_csack signal to indicate the
CRS status.
When the CRS Software Visibility Enable register in the Root Control
register is enabled, the IP will return the data value to 0x0001 for
the Vendor ID value and 0xffff  (all 1’s) for the rest of the data in
the request for reads of offset 0 that return with CRS status.  This
is true for both the User RX Interface and for the Command/Status
interface.  When CRS Software Visibility is enabled, the CMPL_STATUS
field of the completion on the User RX Interface will not be 2=CRS and
the pcie_cscrs signal will not assert on the Command/Status interface.

Note that, neither PCIe host bridge nor PCIe core re-issues the
request for any configuration offset.

As a result of the fact, PCIe RC driver (sw) should take care of
retrying.
This patch fixes the problem, and attempts to read config space again
in case of PCIe code forwarding the CRS back to CPU.
It implements iproc_pcie_config_read which gets called for Stingray,
Otherwise it falls back to PCI generic APIs.

Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>

diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
index 0f39bd2..583cee0 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -68,6 +68,9 @@
 #define APB_ERR_EN_SHIFT             0
 #define APB_ERR_EN                   BIT(APB_ERR_EN_SHIFT)
 
+#define CFG_RETRY_STATUS             0xffff0001
+#define CFG_RETRY_STATUS_TIMEOUT_US  500000 /* 500 milli-seconds. */
+
 /* derive the enum index of the outbound/inbound mapping registers */
 #define MAP_REG(base_reg, index)      ((base_reg) + (index) * 2)
 
@@ -448,6 +451,66 @@ static inline void iproc_pcie_apb_err_disable(struct pci_bus *bus,
 	}
 }
 
+static int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
+{
+	int timeout = CFG_RETRY_STATUS_TIMEOUT_US;
+	unsigned int ret;
+
+	/*
+	 * As per PCIe spec r3.1, sec 2.3.2, CRS Software
+	 * Visibility only affects config read of the Vendor ID.
+	 * For config write or any other config read the Root must
+	 * automatically re-issue configuration request again as a
+	 * new request.
+	 *
+	 * Iproc based PCIe RC (hw) does not retry
+	 * request on its own, so handle it here.
+	 * Note that, AXI data 0xffff0001 returned by PAXB could be
+	 * valid data in configuration space, for e.g. BAR requesting
+	 * 64bit IO memory, so we retry till timeout.
+	 * And leave to be handled by upper layers.
+	 *
+	 * Also, iproc PCIe RC does not have any effect on
+	 * CRS Software visibility bit, irrespective if it
+	 * set or unset, the RC will never re-issue any configuration
+	 * access request.
+	 */
+	do {
+		ret = readl(cfg_data_p);
+		if (ret == CFG_RETRY_STATUS)
+			udelay(1);
+		else
+			return PCIBIOS_SUCCESSFUL;
+	} while (timeout--);
+
+	return PCIBIOS_DEVICE_NOT_FOUND;
+}
+
+static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie,
+					       unsigned int busno,
+					       unsigned int slot,
+					       unsigned int fn,
+					       int where)
+{
+	u16 offset;
+	u32 val;
+
+	/* EP device access */
+	val = (busno << CFG_ADDR_BUS_NUM_SHIFT) |
+		(slot << CFG_ADDR_DEV_NUM_SHIFT) |
+		(fn << CFG_ADDR_FUNC_NUM_SHIFT) |
+		(where & CFG_ADDR_REG_NUM_MASK) |
+		(1 & CFG_ADDR_CFG_TYPE_MASK);
+
+	iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val);
+	offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA);
+
+	if (iproc_pcie_reg_is_invalid(offset))
+		return NULL;
+
+	return (pcie->base + offset);
+}
+
 /**
  * Note access to the configuration registers are protected at the higher layer
  * by 'pci_lock' in drivers/pci/access.c
@@ -499,13 +562,48 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct pci_bus *bus,
 		return (pcie->base + offset);
 }
 
+static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn,
+				    int where, int size, u32 *val)
+{
+	struct iproc_pcie *pcie = iproc_data(bus);
+	unsigned int slot = PCI_SLOT(devfn);
+	unsigned int fn = PCI_FUNC(devfn);
+	unsigned int busno = bus->number;
+	void __iomem *cfg_data_p;
+	int ret;
+
+	/* root complex access. */
+	if (busno == 0)
+		return pci_generic_config_read32(bus, devfn, where, size, val);
+
+	cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where);
+
+	if (!cfg_data_p)
+		return PCIBIOS_DEVICE_NOT_FOUND;
+
+	ret = iproc_pcie_cfg_retry(cfg_data_p);
+	if (ret)
+		return ret;
+
+	*val = readl(cfg_data_p);
+
+	if (size <= 2)
+		*val = (*val >> (8 * (where & 3))) & ((1 << (size * 8)) - 1);
+
+	return PCIBIOS_SUCCESSFUL;
+}
+
 static int iproc_pcie_config_read32(struct pci_bus *bus, unsigned int devfn,
 				    int where, int size, u32 *val)
 {
 	int ret;
+	struct iproc_pcie *pcie = iproc_data(bus);
 
 	iproc_pcie_apb_err_disable(bus, true);
-	ret = pci_generic_config_read32(bus, devfn, where, size, val);
+	if (pcie->type == IPROC_PCIE_PAXB_V2)
+		ret = iproc_pcie_config_read(bus, devfn, where, size, val);
+	else
+		ret = pci_generic_config_read32(bus, devfn, where, size, val);
 	iproc_pcie_apb_err_disable(bus, false);
 
 	return ret;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v6 2/2] PCI: iproc: add device shutdown for PCI RC
  2017-08-04 15:48 [PATCH v6 0/2] PCI: iproc: SOC specific fixes Oza Pawandeep
  2017-08-04 15:48 ` [PATCH v6 1/2] PCI: iproc: Retry request when CRS returned from EP Oza Pawandeep
@ 2017-08-04 15:48 ` Oza Pawandeep
  2017-08-19 21:04   ` Bjorn Helgaas
  2017-08-19 17:10 ` [PATCH v6 0/2] PCI: iproc: SOC specific fixes Bjorn Helgaas
  2 siblings, 1 reply; 16+ messages in thread
From: Oza Pawandeep @ 2017-08-04 15:48 UTC (permalink / raw)
  To: Bjorn Helgaas, Ray Jui, Scott Branden, Jon Mason,
	bcm-kernel-feedback-list, Oza Pawandeep, Andy Gospodarek,
	linux-pci, linux-kernel, Oza Pawandeep

PERST must be asserted around ~500ms before the reboot is applied.

During soft reset (e.g., "reboot" from Linux) on some iproc based SOCs
LCPLL clock and PERST both goes off simultaneously.
This will cause certain Endpoints Intel NVMe not get detected, upon
next boot sequence.

This is specifically happening with Intel P3700 400GB series.
Endpoint is expecting the clock for some amount of time after PERST is
asserted, which is not happening in Stingray (iproc based SOC).
This causes NVMe to behave in undefined way.

On the contrary, Intel x86 boards will have ref clock running, so we
do not see this behavior there.

Besides, PCI spec does not stipulate about such timings.
In-fact it does not tell us, whether PCIe device should consider
refclk to be available or not to be.

It is completely up to vendor to design their EP the way they want
with respect to ref clock availability.

500ms is just based on the observation and taken as safe margin.
This patch adds platform shutdown where it should be
called in device_shutdown while reboot command is issued.
So in sequence first Endpoint Shutdown (e.g. nvme_shutdown)
followed by RC shutdown, which issues safe PERST assertion.

Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>

diff --git a/drivers/pci/host/pcie-iproc-platform.c b/drivers/pci/host/pcie-iproc-platform.c
index 90d2bdd..9512960 100644
--- a/drivers/pci/host/pcie-iproc-platform.c
+++ b/drivers/pci/host/pcie-iproc-platform.c
@@ -131,6 +131,13 @@ static int iproc_pcie_pltfm_remove(struct platform_device *pdev)
 	return iproc_pcie_remove(pcie);
 }
 
+static void iproc_pcie_pltfm_shutdown(struct platform_device *pdev)
+{
+	struct iproc_pcie *pcie = platform_get_drvdata(pdev);
+
+	iproc_pcie_shutdown(pcie);
+}
+
 static struct platform_driver iproc_pcie_pltfm_driver = {
 	.driver = {
 		.name = "iproc-pcie",
@@ -138,6 +145,7 @@ static int iproc_pcie_pltfm_remove(struct platform_device *pdev)
 	},
 	.probe = iproc_pcie_pltfm_probe,
 	.remove = iproc_pcie_pltfm_remove,
+	.shutdown = iproc_pcie_pltfm_shutdown,
 };
 module_platform_driver(iproc_pcie_pltfm_driver);
 
diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
index 583cee0..ee40651 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -627,7 +627,7 @@ static int iproc_pcie_config_write32(struct pci_bus *bus, unsigned int devfn,
 	.write = iproc_pcie_config_write32,
 };
 
-static void iproc_pcie_reset(struct iproc_pcie *pcie)
+static void iproc_pcie_perst_ctrl(struct iproc_pcie *pcie, bool assert)
 {
 	u32 val;
 
@@ -639,20 +639,28 @@ static void iproc_pcie_reset(struct iproc_pcie *pcie)
 	if (pcie->ep_is_internal)
 		return;
 
-	/*
-	 * Select perst_b signal as reset source. Put the device into reset,
-	 * and then bring it out of reset
-	 */
-	val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
-	val &= ~EP_PERST_SOURCE_SELECT & ~EP_MODE_SURVIVE_PERST &
-		~RC_PCIE_RST_OUTPUT;
-	iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
-	udelay(250);
-
-	val |= RC_PCIE_RST_OUTPUT;
-	iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
-	msleep(100);
+	if (assert) {
+		val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
+		val &= ~EP_PERST_SOURCE_SELECT & ~EP_MODE_SURVIVE_PERST &
+			~RC_PCIE_RST_OUTPUT;
+		iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
+		udelay(250);
+	} else {
+		val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
+		val |= RC_PCIE_RST_OUTPUT;
+		iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
+		msleep(100);
+	}
+}
+
+int iproc_pcie_shutdown(struct iproc_pcie *pcie)
+{
+	iproc_pcie_perst_ctrl(pcie, true);
+	msleep(500);
+
+	return 0;
 }
+EXPORT_SYMBOL(iproc_pcie_shutdown);
 
 static int iproc_pcie_check_link(struct iproc_pcie *pcie, struct pci_bus *bus)
 {
@@ -1329,7 +1337,8 @@ int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
 		goto err_exit_phy;
 	}
 
-	iproc_pcie_reset(pcie);
+	iproc_pcie_perst_ctrl(pcie, true);
+	iproc_pcie_perst_ctrl(pcie, false);
 
 	if (pcie->need_ob_cfg) {
 		ret = iproc_pcie_map_ranges(pcie, res);
diff --git a/drivers/pci/host/pcie-iproc.h b/drivers/pci/host/pcie-iproc.h
index 0bbe2ea..a6b55ce 100644
--- a/drivers/pci/host/pcie-iproc.h
+++ b/drivers/pci/host/pcie-iproc.h
@@ -110,6 +110,7 @@ struct iproc_pcie {
 
 int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res);
 int iproc_pcie_remove(struct iproc_pcie *pcie);
+int iproc_pcie_shutdown(struct iproc_pcie *pcie);
 
 #ifdef CONFIG_PCIE_IPROC_MSI
 int iproc_msi_init(struct iproc_pcie *pcie, struct device_node *node);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 0/2] PCI: iproc: SOC specific fixes
  2017-08-04 15:48 [PATCH v6 0/2] PCI: iproc: SOC specific fixes Oza Pawandeep
  2017-08-04 15:48 ` [PATCH v6 1/2] PCI: iproc: Retry request when CRS returned from EP Oza Pawandeep
  2017-08-04 15:48 ` [PATCH v6 2/2] PCI: iproc: add device shutdown for PCI RC Oza Pawandeep
@ 2017-08-19 17:10 ` Bjorn Helgaas
  2017-08-19 18:18   ` Oza Oza
  2 siblings, 1 reply; 16+ messages in thread
From: Bjorn Helgaas @ 2017-08-19 17:10 UTC (permalink / raw)
  To: Oza Pawandeep
  Cc: Bjorn Helgaas, Ray Jui, Scott Branden, Jon Mason,
	bcm-kernel-feedback-list, Andy Gospodarek, linux-pci,
	linux-kernel, Oza Pawandeep

On Fri, Aug 04, 2017 at 09:18:14PM +0530, Oza Pawandeep wrote:
> PCI: iproc: Retry request when CRS returned from EP
> Above patch adds support for CRS in PCI RC driver, otherwise if not
> handled at lower level, the user space PMD (poll mode drivers) can
> timeout.
> 
> PCI: iproc: add device shutdown for PCI RC
> This fixes the issue where certian PCI endpoints are not getting
> detected on Stingray SOC after reboot.
> 
> Changes Since v6:

This current series is v6, so I guess you mean "since v5".

> Bjorn's comments addressed.
> Added reference to PCIe spec and iproc Controller spec in Changelog.
> 
> Changes Since v5:
> Ray's comments addressed.
> 
> Changes Since v4:
> Bjorn's comments addressed.
> 
> Changes Since v3:
> [re-send]
> 
> Changes Since v2:
> Fix compilation errors for pcie-iproc-platform.ko which was caught by
> kbuild.
> 
> Oza Pawandeep (2):
>   PCI: iproc: Retry request when CRS returned from EP
>   PCI: iproc: add device shutdown for PCI RC
> 
>  drivers/pci/host/pcie-iproc-platform.c |   8 ++
>  drivers/pci/host/pcie-iproc.c          | 139 +++++++++++++++++++++++++++++----
>  drivers/pci/host/pcie-iproc.h          |   1 +
>  3 files changed, 132 insertions(+), 16 deletions(-)

What is this series based on?  It doesn't apply to my master branch,
my next branch, or my pci/host-iproc branch.  It's always best to base
patches on my master branch, unless they depend on something that's
not there.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 0/2] PCI: iproc: SOC specific fixes
  2017-08-19 17:10 ` [PATCH v6 0/2] PCI: iproc: SOC specific fixes Bjorn Helgaas
@ 2017-08-19 18:18   ` Oza Oza
  0 siblings, 0 replies; 16+ messages in thread
From: Oza Oza @ 2017-08-19 18:18 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Ray Jui, Scott Branden, Jon Mason,
	BCM Kernel Feedback, Andy Gospodarek, linux-pci, linux-kernel,
	Oza Pawandeep

On Sat, Aug 19, 2017 at 10:40 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Fri, Aug 04, 2017 at 09:18:14PM +0530, Oza Pawandeep wrote:
>> PCI: iproc: Retry request when CRS returned from EP
>> Above patch adds support for CRS in PCI RC driver, otherwise if not
>> handled at lower level, the user space PMD (poll mode drivers) can
>> timeout.
>>
>> PCI: iproc: add device shutdown for PCI RC
>> This fixes the issue where certian PCI endpoints are not getting
>> detected on Stingray SOC after reboot.
>>
>> Changes Since v6:
>
> This current series is v6, so I guess you mean "since v5".
>
>> Bjorn's comments addressed.
>> Added reference to PCIe spec and iproc Controller spec in Changelog.
>>
>> Changes Since v5:
>> Ray's comments addressed.
>>
>> Changes Since v4:
>> Bjorn's comments addressed.
>>
>> Changes Since v3:
>> [re-send]
>>
>> Changes Since v2:
>> Fix compilation errors for pcie-iproc-platform.ko which was caught by
>> kbuild.
>>
>> Oza Pawandeep (2):
>>   PCI: iproc: Retry request when CRS returned from EP
>>   PCI: iproc: add device shutdown for PCI RC
>>
>>  drivers/pci/host/pcie-iproc-platform.c |   8 ++
>>  drivers/pci/host/pcie-iproc.c          | 139 +++++++++++++++++++++++++++++----
>>  drivers/pci/host/pcie-iproc.h          |   1 +
>>  3 files changed, 132 insertions(+), 16 deletions(-)
>
> What is this series based on?  It doesn't apply to my master branch,
> my next branch, or my pci/host-iproc branch.  It's always best to base
> patches on my master branch, unless they depend on something that's
> not there.

I am on master branch.
It looks like following Lorenzo Pieralisi's iproc patches have gone in
breaking it.

commit 64bcd00a7ef54d3d55d7eaa8b058e5125d215129
commit 527740765629142993966afbd7a836fc47fb30ee
commit 022adcfc4666a375185d14a43d1de1cdc58d8905

I think both the patch sets were running in parallel.
I will rebase the patches.

Regards,
Oza.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 1/2] PCI: iproc: Retry request when CRS returned from EP
  2017-08-04 15:48 ` [PATCH v6 1/2] PCI: iproc: Retry request when CRS returned from EP Oza Pawandeep
@ 2017-08-19 18:26   ` Bjorn Helgaas
  2017-08-19 18:52       ` Oza Oza
  2017-08-19 19:32       ` Oza Oza
  0 siblings, 2 replies; 16+ messages in thread
From: Bjorn Helgaas @ 2017-08-19 18:26 UTC (permalink / raw)
  To: Oza Pawandeep
  Cc: Bjorn Helgaas, Ray Jui, Scott Branden, Jon Mason,
	bcm-kernel-feedback-list, Andy Gospodarek, linux-pci,
	linux-kernel, Oza Pawandeep

On Fri, Aug 04, 2017 at 09:18:15PM +0530, Oza Pawandeep wrote:
> PCIe spec r3.1, sec 2.3.2
> If CRS software visibility is not enabled, the RC must reissue the
> config request as a new request.
> 
> - If CRS software visibility is enabled,
> - for a config read of Vendor ID, the RC must return 0x0001 data
> - for all other config reads/writes, the RC must reissue the
>   request
> 
> iproc PCIe Controller spec:
> 4.7.3.3. Retry Status On Configuration Cycle
> Endpoints are allowed to generate retry status on configuration
> cycles. In this case, the RC needs to re-issue the request. The IP
> does not handle this because the number of configuration cycles needed
> will probably be less than the total number of non-posted operations
> needed.
> 
> When a retry status is received on the User RX interface for a
> configuration request that was sent on the User TX interface,
> it will be indicated with a completion with the CMPL_STATUS field set
> to 2=CRS, and the user will have to find the address and data values
> and send a new transaction on the User TX interface.
> When the internal configuration space returns a retry status during a
> configuration cycle (user_cscfg = 1) on the Command/Status interface,
> the pcie_cscrs will assert with the pcie_csack signal to indicate the
> CRS status.
> When the CRS Software Visibility Enable register in the Root Control
> register is enabled, the IP will return the data value to 0x0001 for
> the Vendor ID value and 0xffff  (all 1’s) for the rest of the data in
> the request for reads of offset 0 that return with CRS status.  This
> is true for both the User RX Interface and for the Command/Status
> interface.  When CRS Software Visibility is enabled, the CMPL_STATUS
> field of the completion on the User RX Interface will not be 2=CRS and
> the pcie_cscrs signal will not assert on the Command/Status interface.
> 
> Note that, neither PCIe host bridge nor PCIe core re-issues the
> request for any configuration offset.
> 
> As a result of the fact, PCIe RC driver (sw) should take care of
> retrying.
> This patch fixes the problem, and attempts to read config space again
> in case of PCIe code forwarding the CRS back to CPU.

I think "fixes the problem" is a little bit of an overstatement.  I
think it covers up some cases of the problem, but if I understand
correctly, we're still susceptible to data corruptions on both read
and write:

  Per PCIe r3.1, sec 2.3.2, config requests that receive completions
  with Configuration Request Retry Status (CRS) should be reissued by
  the hardware *except* reads of the Vendor ID when CRS Software
  Visibility is enabled.

  This hardware never reissues configuration requests when it receives
  CRS completions.

  For config writes, there is no way for this driver to learn about
  CRS completions.  A write that receives a CRS completion will not be
  retried and the data will be discarded.  This is a potential data
  corruption.

  For config reads, this hardware returns CFG_RETRY_STATUS data when
  it receives a CRS completion for a config read, regardless of the
  address of the read or the CRS Software Visibility Enable bit.  As a
  partial workaround for this, we retry in software any read that
  returns CFG_RETRY_STATUS.
  
  CFG_RETRY_STATUS could be valid data for other config registers.  It
  is impossible to read such registers correctly because we can't tell
  whether the hardware (1) received a CRS completion and synthesized
  data of CFG_RETRY_STATUS or (2) received a successful completion
  with data of CFG_RETRY_STATUS.

  The only thing we can really do is return 0xffffffff data, which
  indicates a config read failure in the first case but is a data
  corruption in the second case.

The write case is probably not that important because we have a
similar situation on other systems.  On other systems, the root
complex automatically reissues config writes with CRS completions, but
it may limit the number of retries.  In that case, the failed write is
probably logged as a Target Abort error, but nobody pays attention to
that, so this isn't really much worse.

> It implements iproc_pcie_config_read which gets called for Stingray,
> Otherwise it falls back to PCI generic APIs.
> 
> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
> 
> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
> index 0f39bd2..583cee0 100644
> --- a/drivers/pci/host/pcie-iproc.c
> +++ b/drivers/pci/host/pcie-iproc.c
> @@ -68,6 +68,9 @@
>  #define APB_ERR_EN_SHIFT             0
>  #define APB_ERR_EN                   BIT(APB_ERR_EN_SHIFT)
>  
> +#define CFG_RETRY_STATUS             0xffff0001
> +#define CFG_RETRY_STATUS_TIMEOUT_US  500000 /* 500 milli-seconds. */
> +
>  /* derive the enum index of the outbound/inbound mapping registers */
>  #define MAP_REG(base_reg, index)      ((base_reg) + (index) * 2)
>  
> @@ -448,6 +451,66 @@ static inline void iproc_pcie_apb_err_disable(struct pci_bus *bus,
>  	}
>  }
>  
> +static int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
> +{
> +	int timeout = CFG_RETRY_STATUS_TIMEOUT_US;
> +	unsigned int ret;
> +
> +	/*
> +	 * As per PCIe spec r3.1, sec 2.3.2, CRS Software
> +	 * Visibility only affects config read of the Vendor ID.
> +	 * For config write or any other config read the Root must
> +	 * automatically re-issue configuration request again as a
> +	 * new request.
> +	 *
> +	 * Iproc based PCIe RC (hw) does not retry
> +	 * request on its own, so handle it here.
> +	 * Note that, AXI data 0xffff0001 returned by PAXB could be
> +	 * valid data in configuration space, for e.g. BAR requesting
> +	 * 64bit IO memory, so we retry till timeout.
> +	 * And leave to be handled by upper layers.
> +	 *
> +	 * Also, iproc PCIe RC does not have any effect on
> +	 * CRS Software visibility bit, irrespective if it
> +	 * set or unset, the RC will never re-issue any configuration
> +	 * access request.
> +	 */
> +	do {
> +		ret = readl(cfg_data_p);
> +		if (ret == CFG_RETRY_STATUS)
> +			udelay(1);
> +		else
> +			return PCIBIOS_SUCCESSFUL;
> +	} while (timeout--);
> +
> +	return PCIBIOS_DEVICE_NOT_FOUND;
> +}
> +
> +static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie,
> +					       unsigned int busno,
> +					       unsigned int slot,
> +					       unsigned int fn,
> +					       int where)
> +{
> +	u16 offset;
> +	u32 val;
> +
> +	/* EP device access */
> +	val = (busno << CFG_ADDR_BUS_NUM_SHIFT) |
> +		(slot << CFG_ADDR_DEV_NUM_SHIFT) |
> +		(fn << CFG_ADDR_FUNC_NUM_SHIFT) |
> +		(where & CFG_ADDR_REG_NUM_MASK) |
> +		(1 & CFG_ADDR_CFG_TYPE_MASK);
> +
> +	iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val);
> +	offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA);
> +
> +	if (iproc_pcie_reg_is_invalid(offset))
> +		return NULL;
> +
> +	return (pcie->base + offset);
> +}

This is a duplicate of most of the code in iproc_pcie_map_cfg_bus().
Can you use that directly, or factor this out somehow so it's not
repeated?

> +
>  /**
>   * Note access to the configuration registers are protected at the higher layer
>   * by 'pci_lock' in drivers/pci/access.c
> @@ -499,13 +562,48 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct pci_bus *bus,
>  		return (pcie->base + offset);
>  }
>  
> +static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn,
> +				    int where, int size, u32 *val)
> +{
> +	struct iproc_pcie *pcie = iproc_data(bus);
> +	unsigned int slot = PCI_SLOT(devfn);
> +	unsigned int fn = PCI_FUNC(devfn);
> +	unsigned int busno = bus->number;
> +	void __iomem *cfg_data_p;
> +	int ret;
> +
> +	/* root complex access. */
> +	if (busno == 0)
> +		return pci_generic_config_read32(bus, devfn, where, size, val);
> +
> +	cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where);
> +
> +	if (!cfg_data_p)
> +		return PCIBIOS_DEVICE_NOT_FOUND;
> +
> +	ret = iproc_pcie_cfg_retry(cfg_data_p);
> +	if (ret)

This relies on the fact that PCIBIOS_SUCCESSFUL == 0, which defeats
the whole point of having a #define.  You could test for
PCIBIOS_SUCCESSFUL (but I think the restructuring below would be even
better).

> +		return ret;
> +
> +	*val = readl(cfg_data_p);

This is an extra read.  iproc_pcie_cfg_retry() read it once and got
valid data (something other than CFG_RETRY_STATUS), returned
PCIBIOS_SUCCESSFUL, and now you're reading it again.

I think you should do something like this instead so you don't do the
MMIO read any more times than necessary:

  static u32 iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
  {
    u32 data;
    int timeout = CFG_RETRY_STATUS_TIMEOUT_US;

    data = readl(cfg_data_p);
    while (data == CFG_RETRY_STATUS && timeout--) {
      udelay(1);
      data = readl(cfg_data_p);
    }

    if (data == CFG_RETRY_STATUS)
      data = 0xffffffff;
    return data;
  }

  static int iproc_pcie_config_read(...)
  {
    u32 data;

    ...
    data = iproc_pcie_cfg_retry(cfg_data_p);
    if (size <= 2)
      *val = (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1);

In the event of a timeout, this returns PCIBIOS_SUCCESSFUL and
0xffffffff data.  That's what most other platforms do, and most
callers of the PCI config accessors check for that data instead of
checking the return code to see whether the access was successful.

For example, pci_flr_wait() assumes that if a read of PCI_COMMAND
returns ~0, it's because the device isn't ready yet, and we should
wait and retry.

> +
> +	if (size <= 2)
> +		*val = (*val >> (8 * (where & 3))) & ((1 << (size * 8)) - 1);
> +
> +	return PCIBIOS_SUCCESSFUL;
> +}
> +
>  static int iproc_pcie_config_read32(struct pci_bus *bus, unsigned int devfn,
>  				    int where, int size, u32 *val)
>  {
>  	int ret;
> +	struct iproc_pcie *pcie = iproc_data(bus);
>  
>  	iproc_pcie_apb_err_disable(bus, true);
> -	ret = pci_generic_config_read32(bus, devfn, where, size, val);
> +	if (pcie->type == IPROC_PCIE_PAXB_V2)
> +		ret = iproc_pcie_config_read(bus, devfn, where, size, val);
> +	else
> +		ret = pci_generic_config_read32(bus, devfn, where, size, val);
>  	iproc_pcie_apb_err_disable(bus, false);
>  
>  	return ret;
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 1/2] PCI: iproc: Retry request when CRS returned from EP
  2017-08-19 18:26   ` Bjorn Helgaas
@ 2017-08-19 18:52       ` Oza Oza
  2017-08-19 19:32       ` Oza Oza
  1 sibling, 0 replies; 16+ messages in thread
From: Oza Oza @ 2017-08-19 18:52 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Ray Jui, Scott Branden, Jon Mason,
	BCM Kernel Feedback, Andy Gospodarek, linux-pci, linux-kernel,
	Oza Pawandeep

On Sat, Aug 19, 2017 at 11:56 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Fri, Aug 04, 2017 at 09:18:15PM +0530, Oza Pawandeep wrote:
>> PCIe spec r3.1, sec 2.3.2
>> If CRS software visibility is not enabled, the RC must reissue the
>> config request as a new request.
>>
>> - If CRS software visibility is enabled,
>> - for a config read of Vendor ID, the RC must return 0x0001 data
>> - for all other config reads/writes, the RC must reissue the
>>   request
>>
>> iproc PCIe Controller spec:
>> 4.7.3.3. Retry Status On Configuration Cycle
>> Endpoints are allowed to generate retry status on configuration
>> cycles. In this case, the RC needs to re-issue the request. The IP
>> does not handle this because the number of configuration cycles needed
>> will probably be less than the total number of non-posted operations
>> needed.
>>
>> When a retry status is received on the User RX interface for a
>> configuration request that was sent on the User TX interface,
>> it will be indicated with a completion with the CMPL_STATUS field set
>> to 2=CRS, and the user will have to find the address and data values
>> and send a new transaction on the User TX interface.
>> When the internal configuration space returns a retry status during a
>> configuration cycle (user_cscfg = 1) on the Command/Status interface,
>> the pcie_cscrs will assert with the pcie_csack signal to indicate the
>> CRS status.
>> When the CRS Software Visibility Enable register in the Root Control
>> register is enabled, the IP will return the data value to 0x0001 for
>> the Vendor ID value and 0xffff  (all 1’s) for the rest of the data in
>> the request for reads of offset 0 that return with CRS status.  This
>> is true for both the User RX Interface and for the Command/Status
>> interface.  When CRS Software Visibility is enabled, the CMPL_STATUS
>> field of the completion on the User RX Interface will not be 2=CRS and
>> the pcie_cscrs signal will not assert on the Command/Status interface.
>>
>> Note that, neither PCIe host bridge nor PCIe core re-issues the
>> request for any configuration offset.
>>
>> As a result of the fact, PCIe RC driver (sw) should take care of
>> retrying.
>> This patch fixes the problem, and attempts to read config space again
>> in case of PCIe code forwarding the CRS back to CPU.
>
> I think "fixes the problem" is a little bit of an overstatement.  I
> think it covers up some cases of the problem, but if I understand
> correctly, we're still susceptible to data corruptions on both read
> and write:
>
>   Per PCIe r3.1, sec 2.3.2, config requests that receive completions
>   with Configuration Request Retry Status (CRS) should be reissued by
>   the hardware *except* reads of the Vendor ID when CRS Software
>   Visibility is enabled.
>
>   This hardware never reissues configuration requests when it receives
>   CRS completions.
>
>   For config writes, there is no way for this driver to learn about
>   CRS completions.  A write that receives a CRS completion will not be
>   retried and the data will be discarded.  This is a potential data
>   corruption.
>
>   For config reads, this hardware returns CFG_RETRY_STATUS data when
>   it receives a CRS completion for a config read, regardless of the
>   address of the read or the CRS Software Visibility Enable bit.  As a
>   partial workaround for this, we retry in software any read that
>   returns CFG_RETRY_STATUS.
>
>   CFG_RETRY_STATUS could be valid data for other config registers.  It
>   is impossible to read such registers correctly because we can't tell
>   whether the hardware (1) received a CRS completion and synthesized
>   data of CFG_RETRY_STATUS or (2) received a successful completion
>   with data of CFG_RETRY_STATUS.
>

this will be fixed in our SOC in next revision, we will have separate register
in host bridge space to tell us, whether it has to be retried.
so there will not be any data corruption.
it will be minor change (instead of checking for CFG_RETRY_STATUS, we
will directly check that register)
so that will be clean.

>   The only thing we can really do is return 0xffffffff data, which
>   indicates a config read failure in the first case but is a data
>   corruption in the second case.

Instead of returning PCIBIOS_DEVICE_NOT_FOUND,
you want me to return 0xffffffff data, and I suppose that you have
suggsted in the code at the end also.
Is my understanding right on this comment ?

And do you want me to add all the texts in commit description,
which you have written above regarding config read and write handling ?
instead of "fixes the problem"

It will be quiet long commit description :),
but that's okay.

>
> The write case is probably not that important because we have a
> similar situation on other systems.  On other systems, the root
> complex automatically reissues config writes with CRS completions, but
> it may limit the number of retries.  In that case, the failed write is
> probably logged as a Target Abort error, but nobody pays attention to
> that, so this isn't really much worse.
>

yes, we will not be handling write.

>> It implements iproc_pcie_config_read which gets called for Stingray,
>> Otherwise it falls back to PCI generic APIs.
>>
>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>
>> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
>> index 0f39bd2..583cee0 100644
>> --- a/drivers/pci/host/pcie-iproc.c
>> +++ b/drivers/pci/host/pcie-iproc.c
>> @@ -68,6 +68,9 @@
>>  #define APB_ERR_EN_SHIFT             0
>>  #define APB_ERR_EN                   BIT(APB_ERR_EN_SHIFT)
>>
>> +#define CFG_RETRY_STATUS             0xffff0001
>> +#define CFG_RETRY_STATUS_TIMEOUT_US  500000 /* 500 milli-seconds. */
>> +
>>  /* derive the enum index of the outbound/inbound mapping registers */
>>  #define MAP_REG(base_reg, index)      ((base_reg) + (index) * 2)
>>
>> @@ -448,6 +451,66 @@ static inline void iproc_pcie_apb_err_disable(struct pci_bus *bus,
>>       }
>>  }
>>
>> +static int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
>> +{
>> +     int timeout = CFG_RETRY_STATUS_TIMEOUT_US;
>> +     unsigned int ret;
>> +
>> +     /*
>> +      * As per PCIe spec r3.1, sec 2.3.2, CRS Software
>> +      * Visibility only affects config read of the Vendor ID.
>> +      * For config write or any other config read the Root must
>> +      * automatically re-issue configuration request again as a
>> +      * new request.
>> +      *
>> +      * Iproc based PCIe RC (hw) does not retry
>> +      * request on its own, so handle it here.
>> +      * Note that, AXI data 0xffff0001 returned by PAXB could be
>> +      * valid data in configuration space, for e.g. BAR requesting
>> +      * 64bit IO memory, so we retry till timeout.
>> +      * And leave to be handled by upper layers.
>> +      *
>> +      * Also, iproc PCIe RC does not have any effect on
>> +      * CRS Software visibility bit, irrespective if it
>> +      * set or unset, the RC will never re-issue any configuration
>> +      * access request.
>> +      */
>> +     do {
>> +             ret = readl(cfg_data_p);
>> +             if (ret == CFG_RETRY_STATUS)
>> +                     udelay(1);
>> +             else
>> +                     return PCIBIOS_SUCCESSFUL;
>> +     } while (timeout--);
>> +
>> +     return PCIBIOS_DEVICE_NOT_FOUND;
>> +}
>> +
>> +static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie,
>> +                                            unsigned int busno,
>> +                                            unsigned int slot,
>> +                                            unsigned int fn,
>> +                                            int where)
>> +{
>> +     u16 offset;
>> +     u32 val;
>> +
>> +     /* EP device access */
>> +     val = (busno << CFG_ADDR_BUS_NUM_SHIFT) |
>> +             (slot << CFG_ADDR_DEV_NUM_SHIFT) |
>> +             (fn << CFG_ADDR_FUNC_NUM_SHIFT) |
>> +             (where & CFG_ADDR_REG_NUM_MASK) |
>> +             (1 & CFG_ADDR_CFG_TYPE_MASK);
>> +
>> +     iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val);
>> +     offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA);
>> +
>> +     if (iproc_pcie_reg_is_invalid(offset))
>> +             return NULL;
>> +
>> +     return (pcie->base + offset);
>> +}
>
> This is a duplicate of most of the code in iproc_pcie_map_cfg_bus().
> Can you use that directly, or factor this out somehow so it's not
> repeated?

this whole things has to be aligned to Lorenzo's patch, because I see
his patches are in.

>
>> +
>>  /**
>>   * Note access to the configuration registers are protected at the higher layer
>>   * by 'pci_lock' in drivers/pci/access.c
>> @@ -499,13 +562,48 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct pci_bus *bus,
>>               return (pcie->base + offset);
>>  }
>>
>> +static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn,
>> +                                 int where, int size, u32 *val)
>> +{
>> +     struct iproc_pcie *pcie = iproc_data(bus);
>> +     unsigned int slot = PCI_SLOT(devfn);
>> +     unsigned int fn = PCI_FUNC(devfn);
>> +     unsigned int busno = bus->number;
>> +     void __iomem *cfg_data_p;
>> +     int ret;
>> +
>> +     /* root complex access. */
>> +     if (busno == 0)
>> +             return pci_generic_config_read32(bus, devfn, where, size, val);
>> +
>> +     cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where);
>> +
>> +     if (!cfg_data_p)
>> +             return PCIBIOS_DEVICE_NOT_FOUND;
>> +
>> +     ret = iproc_pcie_cfg_retry(cfg_data_p);
>> +     if (ret)
>
> This relies on the fact that PCIBIOS_SUCCESSFUL == 0, which defeats
> the whole point of having a #define.  You could test for
> PCIBIOS_SUCCESSFUL (but I think the restructuring below would be even
> better).
>
>> +             return ret;
>> +
>> +     *val = readl(cfg_data_p);
>
> This is an extra read.  iproc_pcie_cfg_retry() read it once and got
> valid data (something other than CFG_RETRY_STATUS), returned
> PCIBIOS_SUCCESSFUL, and now you're reading it again.
>
> I think you should do something like this instead so you don't do the
> MMIO read any more times than necessary:
>
>   static u32 iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
>   {
>     u32 data;
>     int timeout = CFG_RETRY_STATUS_TIMEOUT_US;
>
>     data = readl(cfg_data_p);
>     while (data == CFG_RETRY_STATUS && timeout--) {
>       udelay(1);
>       data = readl(cfg_data_p);
>     }
>
>     if (data == CFG_RETRY_STATUS)
>       data = 0xffffffff;
>     return data;
>   }
>
>   static int iproc_pcie_config_read(...)
>   {
>     u32 data;
>
>     ...
>     data = iproc_pcie_cfg_retry(cfg_data_p);
>     if (size <= 2)
>       *val = (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1);
>
> In the event of a timeout, this returns PCIBIOS_SUCCESSFUL and
> 0xffffffff data.  That's what most other platforms do, and most
> callers of the PCI config accessors check for that data instead of
> checking the return code to see whether the access was successful.
>
> For example, pci_flr_wait() assumes that if a read of PCI_COMMAND
> returns ~0, it's because the device isn't ready yet, and we should
> wait and retry.
>
>> +
>> +     if (size <= 2)
>> +             *val = (*val >> (8 * (where & 3))) & ((1 << (size * 8)) - 1);
>> +
>> +     return PCIBIOS_SUCCESSFUL;
>> +}
>> +
>>  static int iproc_pcie_config_read32(struct pci_bus *bus, unsigned int devfn,
>>                                   int where, int size, u32 *val)
>>  {
>>       int ret;
>> +     struct iproc_pcie *pcie = iproc_data(bus);
>>
>>       iproc_pcie_apb_err_disable(bus, true);
>> -     ret = pci_generic_config_read32(bus, devfn, where, size, val);
>> +     if (pcie->type == IPROC_PCIE_PAXB_V2)
>> +             ret = iproc_pcie_config_read(bus, devfn, where, size, val);
>> +     else
>> +             ret = pci_generic_config_read32(bus, devfn, where, size, val);
>>       iproc_pcie_apb_err_disable(bus, false);
>>
>>       return ret;
>> --
>> 1.9.1
>>

ok I will take care of your comments. and change the code accordingly.
and re-submit the patch-set again.

Regards
Oza.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 1/2] PCI: iproc: Retry request when CRS returned from EP
@ 2017-08-19 18:52       ` Oza Oza
  0 siblings, 0 replies; 16+ messages in thread
From: Oza Oza @ 2017-08-19 18:52 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Ray Jui, Scott Branden, Jon Mason,
	BCM Kernel Feedback, Andy Gospodarek, linux-pci, linux-kernel,
	Oza Pawandeep

On Sat, Aug 19, 2017 at 11:56 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Fri, Aug 04, 2017 at 09:18:15PM +0530, Oza Pawandeep wrote:
>> PCIe spec r3.1, sec 2.3.2
>> If CRS software visibility is not enabled, the RC must reissue the
>> config request as a new request.
>>
>> - If CRS software visibility is enabled,
>> - for a config read of Vendor ID, the RC must return 0x0001 data
>> - for all other config reads/writes, the RC must reissue the
>>   request
>>
>> iproc PCIe Controller spec:
>> 4.7.3.3. Retry Status On Configuration Cycle
>> Endpoints are allowed to generate retry status on configuration
>> cycles. In this case, the RC needs to re-issue the request. The IP
>> does not handle this because the number of configuration cycles needed
>> will probably be less than the total number of non-posted operations
>> needed.
>>
>> When a retry status is received on the User RX interface for a
>> configuration request that was sent on the User TX interface,
>> it will be indicated with a completion with the CMPL_STATUS field set
>> to 2=3DCRS, and the user will have to find the address and data values
>> and send a new transaction on the User TX interface.
>> When the internal configuration space returns a retry status during a
>> configuration cycle (user_cscfg =3D 1) on the Command/Status interface,
>> the pcie_cscrs will assert with the pcie_csack signal to indicate the
>> CRS status.
>> When the CRS Software Visibility Enable register in the Root Control
>> register is enabled, the IP will return the data value to 0x0001 for
>> the Vendor ID value and 0xffff  (all 1=E2=80=99s) for the rest of the da=
ta in
>> the request for reads of offset 0 that return with CRS status.  This
>> is true for both the User RX Interface and for the Command/Status
>> interface.  When CRS Software Visibility is enabled, the CMPL_STATUS
>> field of the completion on the User RX Interface will not be 2=3DCRS and
>> the pcie_cscrs signal will not assert on the Command/Status interface.
>>
>> Note that, neither PCIe host bridge nor PCIe core re-issues the
>> request for any configuration offset.
>>
>> As a result of the fact, PCIe RC driver (sw) should take care of
>> retrying.
>> This patch fixes the problem, and attempts to read config space again
>> in case of PCIe code forwarding the CRS back to CPU.
>
> I think "fixes the problem" is a little bit of an overstatement.  I
> think it covers up some cases of the problem, but if I understand
> correctly, we're still susceptible to data corruptions on both read
> and write:
>
>   Per PCIe r3.1, sec 2.3.2, config requests that receive completions
>   with Configuration Request Retry Status (CRS) should be reissued by
>   the hardware *except* reads of the Vendor ID when CRS Software
>   Visibility is enabled.
>
>   This hardware never reissues configuration requests when it receives
>   CRS completions.
>
>   For config writes, there is no way for this driver to learn about
>   CRS completions.  A write that receives a CRS completion will not be
>   retried and the data will be discarded.  This is a potential data
>   corruption.
>
>   For config reads, this hardware returns CFG_RETRY_STATUS data when
>   it receives a CRS completion for a config read, regardless of the
>   address of the read or the CRS Software Visibility Enable bit.  As a
>   partial workaround for this, we retry in software any read that
>   returns CFG_RETRY_STATUS.
>
>   CFG_RETRY_STATUS could be valid data for other config registers.  It
>   is impossible to read such registers correctly because we can't tell
>   whether the hardware (1) received a CRS completion and synthesized
>   data of CFG_RETRY_STATUS or (2) received a successful completion
>   with data of CFG_RETRY_STATUS.
>

this will be fixed in our SOC in next revision, we will have separate regis=
ter
in host bridge space to tell us, whether it has to be retried.
so there will not be any data corruption.
it will be minor change (instead of checking for CFG_RETRY_STATUS, we
will directly check that register)
so that will be clean.

>   The only thing we can really do is return 0xffffffff data, which
>   indicates a config read failure in the first case but is a data
>   corruption in the second case.

Instead of returning PCIBIOS_DEVICE_NOT_FOUND,
you want me to return 0xffffffff data, and I suppose that you have
suggsted in the code at the end also.
Is my understanding right on this comment ?

And do you want me to add all the texts in commit description,
which you have written above regarding config read and write handling ?
instead of "fixes the problem"

It will be quiet long commit description :),
but that's okay.

>
> The write case is probably not that important because we have a
> similar situation on other systems.  On other systems, the root
> complex automatically reissues config writes with CRS completions, but
> it may limit the number of retries.  In that case, the failed write is
> probably logged as a Target Abort error, but nobody pays attention to
> that, so this isn't really much worse.
>

yes, we will not be handling write.

>> It implements iproc_pcie_config_read which gets called for Stingray,
>> Otherwise it falls back to PCI generic APIs.
>>
>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>
>> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc=
.c
>> index 0f39bd2..583cee0 100644
>> --- a/drivers/pci/host/pcie-iproc.c
>> +++ b/drivers/pci/host/pcie-iproc.c
>> @@ -68,6 +68,9 @@
>>  #define APB_ERR_EN_SHIFT             0
>>  #define APB_ERR_EN                   BIT(APB_ERR_EN_SHIFT)
>>
>> +#define CFG_RETRY_STATUS             0xffff0001
>> +#define CFG_RETRY_STATUS_TIMEOUT_US  500000 /* 500 milli-seconds. */
>> +
>>  /* derive the enum index of the outbound/inbound mapping registers */
>>  #define MAP_REG(base_reg, index)      ((base_reg) + (index) * 2)
>>
>> @@ -448,6 +451,66 @@ static inline void iproc_pcie_apb_err_disable(struc=
t pci_bus *bus,
>>       }
>>  }
>>
>> +static int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
>> +{
>> +     int timeout =3D CFG_RETRY_STATUS_TIMEOUT_US;
>> +     unsigned int ret;
>> +
>> +     /*
>> +      * As per PCIe spec r3.1, sec 2.3.2, CRS Software
>> +      * Visibility only affects config read of the Vendor ID.
>> +      * For config write or any other config read the Root must
>> +      * automatically re-issue configuration request again as a
>> +      * new request.
>> +      *
>> +      * Iproc based PCIe RC (hw) does not retry
>> +      * request on its own, so handle it here.
>> +      * Note that, AXI data 0xffff0001 returned by PAXB could be
>> +      * valid data in configuration space, for e.g. BAR requesting
>> +      * 64bit IO memory, so we retry till timeout.
>> +      * And leave to be handled by upper layers.
>> +      *
>> +      * Also, iproc PCIe RC does not have any effect on
>> +      * CRS Software visibility bit, irrespective if it
>> +      * set or unset, the RC will never re-issue any configuration
>> +      * access request.
>> +      */
>> +     do {
>> +             ret =3D readl(cfg_data_p);
>> +             if (ret =3D=3D CFG_RETRY_STATUS)
>> +                     udelay(1);
>> +             else
>> +                     return PCIBIOS_SUCCESSFUL;
>> +     } while (timeout--);
>> +
>> +     return PCIBIOS_DEVICE_NOT_FOUND;
>> +}
>> +
>> +static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie,
>> +                                            unsigned int busno,
>> +                                            unsigned int slot,
>> +                                            unsigned int fn,
>> +                                            int where)
>> +{
>> +     u16 offset;
>> +     u32 val;
>> +
>> +     /* EP device access */
>> +     val =3D (busno << CFG_ADDR_BUS_NUM_SHIFT) |
>> +             (slot << CFG_ADDR_DEV_NUM_SHIFT) |
>> +             (fn << CFG_ADDR_FUNC_NUM_SHIFT) |
>> +             (where & CFG_ADDR_REG_NUM_MASK) |
>> +             (1 & CFG_ADDR_CFG_TYPE_MASK);
>> +
>> +     iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val);
>> +     offset =3D iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA);
>> +
>> +     if (iproc_pcie_reg_is_invalid(offset))
>> +             return NULL;
>> +
>> +     return (pcie->base + offset);
>> +}
>
> This is a duplicate of most of the code in iproc_pcie_map_cfg_bus().
> Can you use that directly, or factor this out somehow so it's not
> repeated?

this whole things has to be aligned to Lorenzo's patch, because I see
his patches are in.

>
>> +
>>  /**
>>   * Note access to the configuration registers are protected at the high=
er layer
>>   * by 'pci_lock' in drivers/pci/access.c
>> @@ -499,13 +562,48 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct=
 pci_bus *bus,
>>               return (pcie->base + offset);
>>  }
>>
>> +static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int dev=
fn,
>> +                                 int where, int size, u32 *val)
>> +{
>> +     struct iproc_pcie *pcie =3D iproc_data(bus);
>> +     unsigned int slot =3D PCI_SLOT(devfn);
>> +     unsigned int fn =3D PCI_FUNC(devfn);
>> +     unsigned int busno =3D bus->number;
>> +     void __iomem *cfg_data_p;
>> +     int ret;
>> +
>> +     /* root complex access. */
>> +     if (busno =3D=3D 0)
>> +             return pci_generic_config_read32(bus, devfn, where, size, =
val);
>> +
>> +     cfg_data_p =3D iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, wh=
ere);
>> +
>> +     if (!cfg_data_p)
>> +             return PCIBIOS_DEVICE_NOT_FOUND;
>> +
>> +     ret =3D iproc_pcie_cfg_retry(cfg_data_p);
>> +     if (ret)
>
> This relies on the fact that PCIBIOS_SUCCESSFUL =3D=3D 0, which defeats
> the whole point of having a #define.  You could test for
> PCIBIOS_SUCCESSFUL (but I think the restructuring below would be even
> better).
>
>> +             return ret;
>> +
>> +     *val =3D readl(cfg_data_p);
>
> This is an extra read.  iproc_pcie_cfg_retry() read it once and got
> valid data (something other than CFG_RETRY_STATUS), returned
> PCIBIOS_SUCCESSFUL, and now you're reading it again.
>
> I think you should do something like this instead so you don't do the
> MMIO read any more times than necessary:
>
>   static u32 iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
>   {
>     u32 data;
>     int timeout =3D CFG_RETRY_STATUS_TIMEOUT_US;
>
>     data =3D readl(cfg_data_p);
>     while (data =3D=3D CFG_RETRY_STATUS && timeout--) {
>       udelay(1);
>       data =3D readl(cfg_data_p);
>     }
>
>     if (data =3D=3D CFG_RETRY_STATUS)
>       data =3D 0xffffffff;
>     return data;
>   }
>
>   static int iproc_pcie_config_read(...)
>   {
>     u32 data;
>
>     ...
>     data =3D iproc_pcie_cfg_retry(cfg_data_p);
>     if (size <=3D 2)
>       *val =3D (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1);
>
> In the event of a timeout, this returns PCIBIOS_SUCCESSFUL and
> 0xffffffff data.  That's what most other platforms do, and most
> callers of the PCI config accessors check for that data instead of
> checking the return code to see whether the access was successful.
>
> For example, pci_flr_wait() assumes that if a read of PCI_COMMAND
> returns ~0, it's because the device isn't ready yet, and we should
> wait and retry.
>
>> +
>> +     if (size <=3D 2)
>> +             *val =3D (*val >> (8 * (where & 3))) & ((1 << (size * 8)) =
- 1);
>> +
>> +     return PCIBIOS_SUCCESSFUL;
>> +}
>> +
>>  static int iproc_pcie_config_read32(struct pci_bus *bus, unsigned int d=
evfn,
>>                                   int where, int size, u32 *val)
>>  {
>>       int ret;
>> +     struct iproc_pcie *pcie =3D iproc_data(bus);
>>
>>       iproc_pcie_apb_err_disable(bus, true);
>> -     ret =3D pci_generic_config_read32(bus, devfn, where, size, val);
>> +     if (pcie->type =3D=3D IPROC_PCIE_PAXB_V2)
>> +             ret =3D iproc_pcie_config_read(bus, devfn, where, size, va=
l);
>> +     else
>> +             ret =3D pci_generic_config_read32(bus, devfn, where, size,=
 val);
>>       iproc_pcie_apb_err_disable(bus, false);
>>
>>       return ret;
>> --
>> 1.9.1
>>

ok I will take care of your comments. and change the code accordingly.
and re-submit the patch-set again.

Regards
Oza.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 1/2] PCI: iproc: Retry request when CRS returned from EP
  2017-08-19 18:26   ` Bjorn Helgaas
@ 2017-08-19 19:32       ` Oza Oza
  2017-08-19 19:32       ` Oza Oza
  1 sibling, 0 replies; 16+ messages in thread
From: Oza Oza @ 2017-08-19 19:32 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Ray Jui, Scott Branden, Jon Mason,
	BCM Kernel Feedback, Andy Gospodarek, linux-pci, linux-kernel,
	Oza Pawandeep

On Sat, Aug 19, 2017 at 11:56 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Fri, Aug 04, 2017 at 09:18:15PM +0530, Oza Pawandeep wrote:
>> PCIe spec r3.1, sec 2.3.2
>> If CRS software visibility is not enabled, the RC must reissue the
>> config request as a new request.
>>
>> - If CRS software visibility is enabled,
>> - for a config read of Vendor ID, the RC must return 0x0001 data
>> - for all other config reads/writes, the RC must reissue the
>>   request
>>
>> iproc PCIe Controller spec:
>> 4.7.3.3. Retry Status On Configuration Cycle
>> Endpoints are allowed to generate retry status on configuration
>> cycles. In this case, the RC needs to re-issue the request. The IP
>> does not handle this because the number of configuration cycles needed
>> will probably be less than the total number of non-posted operations
>> needed.
>>
>> When a retry status is received on the User RX interface for a
>> configuration request that was sent on the User TX interface,
>> it will be indicated with a completion with the CMPL_STATUS field set
>> to 2=CRS, and the user will have to find the address and data values
>> and send a new transaction on the User TX interface.
>> When the internal configuration space returns a retry status during a
>> configuration cycle (user_cscfg = 1) on the Command/Status interface,
>> the pcie_cscrs will assert with the pcie_csack signal to indicate the
>> CRS status.
>> When the CRS Software Visibility Enable register in the Root Control
>> register is enabled, the IP will return the data value to 0x0001 for
>> the Vendor ID value and 0xffff  (all 1’s) for the rest of the data in
>> the request for reads of offset 0 that return with CRS status.  This
>> is true for both the User RX Interface and for the Command/Status
>> interface.  When CRS Software Visibility is enabled, the CMPL_STATUS
>> field of the completion on the User RX Interface will not be 2=CRS and
>> the pcie_cscrs signal will not assert on the Command/Status interface.
>>
>> Note that, neither PCIe host bridge nor PCIe core re-issues the
>> request for any configuration offset.
>>
>> As a result of the fact, PCIe RC driver (sw) should take care of
>> retrying.
>> This patch fixes the problem, and attempts to read config space again
>> in case of PCIe code forwarding the CRS back to CPU.
>
> I think "fixes the problem" is a little bit of an overstatement.  I
> think it covers up some cases of the problem, but if I understand
> correctly, we're still susceptible to data corruptions on both read
> and write:
>
>   Per PCIe r3.1, sec 2.3.2, config requests that receive completions
>   with Configuration Request Retry Status (CRS) should be reissued by
>   the hardware *except* reads of the Vendor ID when CRS Software
>   Visibility is enabled.
>
>   This hardware never reissues configuration requests when it receives
>   CRS completions.
>
>   For config writes, there is no way for this driver to learn about
>   CRS completions.  A write that receives a CRS completion will not be
>   retried and the data will be discarded.  This is a potential data
>   corruption.
>
>   For config reads, this hardware returns CFG_RETRY_STATUS data when
>   it receives a CRS completion for a config read, regardless of the
>   address of the read or the CRS Software Visibility Enable bit.  As a
>   partial workaround for this, we retry in software any read that
>   returns CFG_RETRY_STATUS.
>
>   CFG_RETRY_STATUS could be valid data for other config registers.  It
>   is impossible to read such registers correctly because we can't tell
>   whether the hardware (1) received a CRS completion and synthesized
>   data of CFG_RETRY_STATUS or (2) received a successful completion
>   with data of CFG_RETRY_STATUS.
>
>   The only thing we can really do is return 0xffffffff data, which
>   indicates a config read failure in the first case but is a data
>   corruption in the second case.
>
> The write case is probably not that important because we have a
> similar situation on other systems.  On other systems, the root
> complex automatically reissues config writes with CRS completions, but
> it may limit the number of retries.  In that case, the failed write is
> probably logged as a Target Abort error, but nobody pays attention to
> that, so this isn't really much worse.
>
>> It implements iproc_pcie_config_read which gets called for Stingray,
>> Otherwise it falls back to PCI generic APIs.
>>
>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>
>> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
>> index 0f39bd2..583cee0 100644
>> --- a/drivers/pci/host/pcie-iproc.c
>> +++ b/drivers/pci/host/pcie-iproc.c
>> @@ -68,6 +68,9 @@
>>  #define APB_ERR_EN_SHIFT             0
>>  #define APB_ERR_EN                   BIT(APB_ERR_EN_SHIFT)
>>
>> +#define CFG_RETRY_STATUS             0xffff0001
>> +#define CFG_RETRY_STATUS_TIMEOUT_US  500000 /* 500 milli-seconds. */
>> +
>>  /* derive the enum index of the outbound/inbound mapping registers */
>>  #define MAP_REG(base_reg, index)      ((base_reg) + (index) * 2)
>>
>> @@ -448,6 +451,66 @@ static inline void iproc_pcie_apb_err_disable(struct pci_bus *bus,
>>       }
>>  }
>>
>> +static int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
>> +{
>> +     int timeout = CFG_RETRY_STATUS_TIMEOUT_US;
>> +     unsigned int ret;
>> +
>> +     /*
>> +      * As per PCIe spec r3.1, sec 2.3.2, CRS Software
>> +      * Visibility only affects config read of the Vendor ID.
>> +      * For config write or any other config read the Root must
>> +      * automatically re-issue configuration request again as a
>> +      * new request.
>> +      *
>> +      * Iproc based PCIe RC (hw) does not retry
>> +      * request on its own, so handle it here.
>> +      * Note that, AXI data 0xffff0001 returned by PAXB could be
>> +      * valid data in configuration space, for e.g. BAR requesting
>> +      * 64bit IO memory, so we retry till timeout.
>> +      * And leave to be handled by upper layers.
>> +      *
>> +      * Also, iproc PCIe RC does not have any effect on
>> +      * CRS Software visibility bit, irrespective if it
>> +      * set or unset, the RC will never re-issue any configuration
>> +      * access request.
>> +      */
>> +     do {
>> +             ret = readl(cfg_data_p);
>> +             if (ret == CFG_RETRY_STATUS)
>> +                     udelay(1);
>> +             else
>> +                     return PCIBIOS_SUCCESSFUL;
>> +     } while (timeout--);
>> +
>> +     return PCIBIOS_DEVICE_NOT_FOUND;
>> +}
>> +
>> +static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie,
>> +                                            unsigned int busno,
>> +                                            unsigned int slot,
>> +                                            unsigned int fn,
>> +                                            int where)
>> +{
>> +     u16 offset;
>> +     u32 val;
>> +
>> +     /* EP device access */
>> +     val = (busno << CFG_ADDR_BUS_NUM_SHIFT) |
>> +             (slot << CFG_ADDR_DEV_NUM_SHIFT) |
>> +             (fn << CFG_ADDR_FUNC_NUM_SHIFT) |
>> +             (where & CFG_ADDR_REG_NUM_MASK) |
>> +             (1 & CFG_ADDR_CFG_TYPE_MASK);
>> +
>> +     iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val);
>> +     offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA);
>> +
>> +     if (iproc_pcie_reg_is_invalid(offset))
>> +             return NULL;
>> +
>> +     return (pcie->base + offset);
>> +}
>
> This is a duplicate of most of the code in iproc_pcie_map_cfg_bus().
> Can you use that directly, or factor this out somehow so it's not
> repeated?
>
>> +
>>  /**
>>   * Note access to the configuration registers are protected at the higher layer
>>   * by 'pci_lock' in drivers/pci/access.c
>> @@ -499,13 +562,48 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct pci_bus *bus,
>>               return (pcie->base + offset);
>>  }
>>
>> +static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn,
>> +                                 int where, int size, u32 *val)
>> +{
>> +     struct iproc_pcie *pcie = iproc_data(bus);
>> +     unsigned int slot = PCI_SLOT(devfn);
>> +     unsigned int fn = PCI_FUNC(devfn);
>> +     unsigned int busno = bus->number;
>> +     void __iomem *cfg_data_p;
>> +     int ret;
>> +
>> +     /* root complex access. */
>> +     if (busno == 0)
>> +             return pci_generic_config_read32(bus, devfn, where, size, val);
>> +
>> +     cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where);
>> +
>> +     if (!cfg_data_p)
>> +             return PCIBIOS_DEVICE_NOT_FOUND;
>> +
>> +     ret = iproc_pcie_cfg_retry(cfg_data_p);
>> +     if (ret)
>
> This relies on the fact that PCIBIOS_SUCCESSFUL == 0, which defeats
> the whole point of having a #define.  You could test for
> PCIBIOS_SUCCESSFUL (but I think the restructuring below would be even
> better).
>
>> +             return ret;
>> +
>> +     *val = readl(cfg_data_p);
>
> This is an extra read.  iproc_pcie_cfg_retry() read it once and got
> valid data (something other than CFG_RETRY_STATUS), returned
> PCIBIOS_SUCCESSFUL, and now you're reading it again.
>
> I think you should do something like this instead so you don't do the
> MMIO read any more times than necessary:
>
>   static u32 iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
>   {
>     u32 data;
>     int timeout = CFG_RETRY_STATUS_TIMEOUT_US;
>
>     data = readl(cfg_data_p);
>     while (data == CFG_RETRY_STATUS && timeout--) {
>       udelay(1);
>       data = readl(cfg_data_p);
>     }
>
>     if (data == CFG_RETRY_STATUS)
>       data = 0xffffffff;
>     return data;
>   }
>
>   static int iproc_pcie_config_read(...)
>   {
>     u32 data;
>
>     ...
>     data = iproc_pcie_cfg_retry(cfg_data_p);
>     if (size <= 2)
>       *val = (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1);
>
> In the event of a timeout, this returns PCIBIOS_SUCCESSFUL and
> 0xffffffff data.  That's what most other platforms do, and most
> callers of the PCI config accessors check for that data instead of
> checking the return code to see whether the access was successful.
>
I see one problem with this.
we have Samsung NVMe which exposes 64-bit IO BAR.
0xffff0001.

and if you see __pci_read_base which does following in sequence.

pci_read_config_dword(dev, pos, &l);
pci_write_config_dword(dev, pos, l | mask);
pci_read_config_dword(dev, pos, &sz);
pci_write_config_dword(dev, pos, l);

returning 0xffffffff would not be correct in that case.
even if callers retry, they will end up in a loop if caller retries.

hence I was returning 0xffff0001, it is upto the upper layer to treat
it as data or not.
let me know if this sounds like a problem to you as well.

so in my opinion returning 0xffffffff is not an option.

> For example, pci_flr_wait() assumes that if a read of PCI_COMMAND
> returns ~0, it's because the device isn't ready yet, and we should
> wait and retry.
>
>> +
>> +     if (size <= 2)
>> +             *val = (*val >> (8 * (where & 3))) & ((1 << (size * 8)) - 1);
>> +
>> +     return PCIBIOS_SUCCESSFUL;
>> +}
>> +
>>  static int iproc_pcie_config_read32(struct pci_bus *bus, unsigned int devfn,
>>                                   int where, int size, u32 *val)
>>  {
>>       int ret;
>> +     struct iproc_pcie *pcie = iproc_data(bus);
>>
>>       iproc_pcie_apb_err_disable(bus, true);
>> -     ret = pci_generic_config_read32(bus, devfn, where, size, val);
>> +     if (pcie->type == IPROC_PCIE_PAXB_V2)
>> +             ret = iproc_pcie_config_read(bus, devfn, where, size, val);
>> +     else
>> +             ret = pci_generic_config_read32(bus, devfn, where, size, val);
>>       iproc_pcie_apb_err_disable(bus, false);
>>
>>       return ret;
>> --
>> 1.9.1
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 1/2] PCI: iproc: Retry request when CRS returned from EP
@ 2017-08-19 19:32       ` Oza Oza
  0 siblings, 0 replies; 16+ messages in thread
From: Oza Oza @ 2017-08-19 19:32 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Ray Jui, Scott Branden, Jon Mason,
	BCM Kernel Feedback, Andy Gospodarek, linux-pci, linux-kernel,
	Oza Pawandeep

On Sat, Aug 19, 2017 at 11:56 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Fri, Aug 04, 2017 at 09:18:15PM +0530, Oza Pawandeep wrote:
>> PCIe spec r3.1, sec 2.3.2
>> If CRS software visibility is not enabled, the RC must reissue the
>> config request as a new request.
>>
>> - If CRS software visibility is enabled,
>> - for a config read of Vendor ID, the RC must return 0x0001 data
>> - for all other config reads/writes, the RC must reissue the
>>   request
>>
>> iproc PCIe Controller spec:
>> 4.7.3.3. Retry Status On Configuration Cycle
>> Endpoints are allowed to generate retry status on configuration
>> cycles. In this case, the RC needs to re-issue the request. The IP
>> does not handle this because the number of configuration cycles needed
>> will probably be less than the total number of non-posted operations
>> needed.
>>
>> When a retry status is received on the User RX interface for a
>> configuration request that was sent on the User TX interface,
>> it will be indicated with a completion with the CMPL_STATUS field set
>> to 2=3DCRS, and the user will have to find the address and data values
>> and send a new transaction on the User TX interface.
>> When the internal configuration space returns a retry status during a
>> configuration cycle (user_cscfg =3D 1) on the Command/Status interface,
>> the pcie_cscrs will assert with the pcie_csack signal to indicate the
>> CRS status.
>> When the CRS Software Visibility Enable register in the Root Control
>> register is enabled, the IP will return the data value to 0x0001 for
>> the Vendor ID value and 0xffff  (all 1=E2=80=99s) for the rest of the da=
ta in
>> the request for reads of offset 0 that return with CRS status.  This
>> is true for both the User RX Interface and for the Command/Status
>> interface.  When CRS Software Visibility is enabled, the CMPL_STATUS
>> field of the completion on the User RX Interface will not be 2=3DCRS and
>> the pcie_cscrs signal will not assert on the Command/Status interface.
>>
>> Note that, neither PCIe host bridge nor PCIe core re-issues the
>> request for any configuration offset.
>>
>> As a result of the fact, PCIe RC driver (sw) should take care of
>> retrying.
>> This patch fixes the problem, and attempts to read config space again
>> in case of PCIe code forwarding the CRS back to CPU.
>
> I think "fixes the problem" is a little bit of an overstatement.  I
> think it covers up some cases of the problem, but if I understand
> correctly, we're still susceptible to data corruptions on both read
> and write:
>
>   Per PCIe r3.1, sec 2.3.2, config requests that receive completions
>   with Configuration Request Retry Status (CRS) should be reissued by
>   the hardware *except* reads of the Vendor ID when CRS Software
>   Visibility is enabled.
>
>   This hardware never reissues configuration requests when it receives
>   CRS completions.
>
>   For config writes, there is no way for this driver to learn about
>   CRS completions.  A write that receives a CRS completion will not be
>   retried and the data will be discarded.  This is a potential data
>   corruption.
>
>   For config reads, this hardware returns CFG_RETRY_STATUS data when
>   it receives a CRS completion for a config read, regardless of the
>   address of the read or the CRS Software Visibility Enable bit.  As a
>   partial workaround for this, we retry in software any read that
>   returns CFG_RETRY_STATUS.
>
>   CFG_RETRY_STATUS could be valid data for other config registers.  It
>   is impossible to read such registers correctly because we can't tell
>   whether the hardware (1) received a CRS completion and synthesized
>   data of CFG_RETRY_STATUS or (2) received a successful completion
>   with data of CFG_RETRY_STATUS.
>
>   The only thing we can really do is return 0xffffffff data, which
>   indicates a config read failure in the first case but is a data
>   corruption in the second case.
>
> The write case is probably not that important because we have a
> similar situation on other systems.  On other systems, the root
> complex automatically reissues config writes with CRS completions, but
> it may limit the number of retries.  In that case, the failed write is
> probably logged as a Target Abort error, but nobody pays attention to
> that, so this isn't really much worse.
>
>> It implements iproc_pcie_config_read which gets called for Stingray,
>> Otherwise it falls back to PCI generic APIs.
>>
>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>
>> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc=
.c
>> index 0f39bd2..583cee0 100644
>> --- a/drivers/pci/host/pcie-iproc.c
>> +++ b/drivers/pci/host/pcie-iproc.c
>> @@ -68,6 +68,9 @@
>>  #define APB_ERR_EN_SHIFT             0
>>  #define APB_ERR_EN                   BIT(APB_ERR_EN_SHIFT)
>>
>> +#define CFG_RETRY_STATUS             0xffff0001
>> +#define CFG_RETRY_STATUS_TIMEOUT_US  500000 /* 500 milli-seconds. */
>> +
>>  /* derive the enum index of the outbound/inbound mapping registers */
>>  #define MAP_REG(base_reg, index)      ((base_reg) + (index) * 2)
>>
>> @@ -448,6 +451,66 @@ static inline void iproc_pcie_apb_err_disable(struc=
t pci_bus *bus,
>>       }
>>  }
>>
>> +static int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
>> +{
>> +     int timeout =3D CFG_RETRY_STATUS_TIMEOUT_US;
>> +     unsigned int ret;
>> +
>> +     /*
>> +      * As per PCIe spec r3.1, sec 2.3.2, CRS Software
>> +      * Visibility only affects config read of the Vendor ID.
>> +      * For config write or any other config read the Root must
>> +      * automatically re-issue configuration request again as a
>> +      * new request.
>> +      *
>> +      * Iproc based PCIe RC (hw) does not retry
>> +      * request on its own, so handle it here.
>> +      * Note that, AXI data 0xffff0001 returned by PAXB could be
>> +      * valid data in configuration space, for e.g. BAR requesting
>> +      * 64bit IO memory, so we retry till timeout.
>> +      * And leave to be handled by upper layers.
>> +      *
>> +      * Also, iproc PCIe RC does not have any effect on
>> +      * CRS Software visibility bit, irrespective if it
>> +      * set or unset, the RC will never re-issue any configuration
>> +      * access request.
>> +      */
>> +     do {
>> +             ret =3D readl(cfg_data_p);
>> +             if (ret =3D=3D CFG_RETRY_STATUS)
>> +                     udelay(1);
>> +             else
>> +                     return PCIBIOS_SUCCESSFUL;
>> +     } while (timeout--);
>> +
>> +     return PCIBIOS_DEVICE_NOT_FOUND;
>> +}
>> +
>> +static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie,
>> +                                            unsigned int busno,
>> +                                            unsigned int slot,
>> +                                            unsigned int fn,
>> +                                            int where)
>> +{
>> +     u16 offset;
>> +     u32 val;
>> +
>> +     /* EP device access */
>> +     val =3D (busno << CFG_ADDR_BUS_NUM_SHIFT) |
>> +             (slot << CFG_ADDR_DEV_NUM_SHIFT) |
>> +             (fn << CFG_ADDR_FUNC_NUM_SHIFT) |
>> +             (where & CFG_ADDR_REG_NUM_MASK) |
>> +             (1 & CFG_ADDR_CFG_TYPE_MASK);
>> +
>> +     iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val);
>> +     offset =3D iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA);
>> +
>> +     if (iproc_pcie_reg_is_invalid(offset))
>> +             return NULL;
>> +
>> +     return (pcie->base + offset);
>> +}
>
> This is a duplicate of most of the code in iproc_pcie_map_cfg_bus().
> Can you use that directly, or factor this out somehow so it's not
> repeated?
>
>> +
>>  /**
>>   * Note access to the configuration registers are protected at the high=
er layer
>>   * by 'pci_lock' in drivers/pci/access.c
>> @@ -499,13 +562,48 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct=
 pci_bus *bus,
>>               return (pcie->base + offset);
>>  }
>>
>> +static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int dev=
fn,
>> +                                 int where, int size, u32 *val)
>> +{
>> +     struct iproc_pcie *pcie =3D iproc_data(bus);
>> +     unsigned int slot =3D PCI_SLOT(devfn);
>> +     unsigned int fn =3D PCI_FUNC(devfn);
>> +     unsigned int busno =3D bus->number;
>> +     void __iomem *cfg_data_p;
>> +     int ret;
>> +
>> +     /* root complex access. */
>> +     if (busno =3D=3D 0)
>> +             return pci_generic_config_read32(bus, devfn, where, size, =
val);
>> +
>> +     cfg_data_p =3D iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, wh=
ere);
>> +
>> +     if (!cfg_data_p)
>> +             return PCIBIOS_DEVICE_NOT_FOUND;
>> +
>> +     ret =3D iproc_pcie_cfg_retry(cfg_data_p);
>> +     if (ret)
>
> This relies on the fact that PCIBIOS_SUCCESSFUL =3D=3D 0, which defeats
> the whole point of having a #define.  You could test for
> PCIBIOS_SUCCESSFUL (but I think the restructuring below would be even
> better).
>
>> +             return ret;
>> +
>> +     *val =3D readl(cfg_data_p);
>
> This is an extra read.  iproc_pcie_cfg_retry() read it once and got
> valid data (something other than CFG_RETRY_STATUS), returned
> PCIBIOS_SUCCESSFUL, and now you're reading it again.
>
> I think you should do something like this instead so you don't do the
> MMIO read any more times than necessary:
>
>   static u32 iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
>   {
>     u32 data;
>     int timeout =3D CFG_RETRY_STATUS_TIMEOUT_US;
>
>     data =3D readl(cfg_data_p);
>     while (data =3D=3D CFG_RETRY_STATUS && timeout--) {
>       udelay(1);
>       data =3D readl(cfg_data_p);
>     }
>
>     if (data =3D=3D CFG_RETRY_STATUS)
>       data =3D 0xffffffff;
>     return data;
>   }
>
>   static int iproc_pcie_config_read(...)
>   {
>     u32 data;
>
>     ...
>     data =3D iproc_pcie_cfg_retry(cfg_data_p);
>     if (size <=3D 2)
>       *val =3D (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1);
>
> In the event of a timeout, this returns PCIBIOS_SUCCESSFUL and
> 0xffffffff data.  That's what most other platforms do, and most
> callers of the PCI config accessors check for that data instead of
> checking the return code to see whether the access was successful.
>
I see one problem with this.
we have Samsung NVMe which exposes 64-bit IO BAR.
0xffff0001.

and if you see __pci_read_base which does following in sequence.

pci_read_config_dword(dev, pos, &l);
pci_write_config_dword(dev, pos, l | mask);
pci_read_config_dword(dev, pos, &sz);
pci_write_config_dword(dev, pos, l);

returning 0xffffffff would not be correct in that case.
even if callers retry, they will end up in a loop if caller retries.

hence I was returning 0xffff0001, it is upto the upper layer to treat
it as data or not.
let me know if this sounds like a problem to you as well.

so in my opinion returning 0xffffffff is not an option.

> For example, pci_flr_wait() assumes that if a read of PCI_COMMAND
> returns ~0, it's because the device isn't ready yet, and we should
> wait and retry.
>
>> +
>> +     if (size <=3D 2)
>> +             *val =3D (*val >> (8 * (where & 3))) & ((1 << (size * 8)) =
- 1);
>> +
>> +     return PCIBIOS_SUCCESSFUL;
>> +}
>> +
>>  static int iproc_pcie_config_read32(struct pci_bus *bus, unsigned int d=
evfn,
>>                                   int where, int size, u32 *val)
>>  {
>>       int ret;
>> +     struct iproc_pcie *pcie =3D iproc_data(bus);
>>
>>       iproc_pcie_apb_err_disable(bus, true);
>> -     ret =3D pci_generic_config_read32(bus, devfn, where, size, val);
>> +     if (pcie->type =3D=3D IPROC_PCIE_PAXB_V2)
>> +             ret =3D iproc_pcie_config_read(bus, devfn, where, size, va=
l);
>> +     else
>> +             ret =3D pci_generic_config_read32(bus, devfn, where, size,=
 val);
>>       iproc_pcie_apb_err_disable(bus, false);
>>
>>       return ret;
>> --
>> 1.9.1
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 1/2] PCI: iproc: Retry request when CRS returned from EP
  2017-08-19 19:32       ` Oza Oza
  (?)
@ 2017-08-19 20:26       ` Bjorn Helgaas
  2017-08-20  3:42         ` Oza Oza
  -1 siblings, 1 reply; 16+ messages in thread
From: Bjorn Helgaas @ 2017-08-19 20:26 UTC (permalink / raw)
  To: Oza Oza
  Cc: Bjorn Helgaas, Ray Jui, Scott Branden, Jon Mason,
	BCM Kernel Feedback, Andy Gospodarek, linux-pci, linux-kernel,
	Oza Pawandeep

On Sun, Aug 20, 2017 at 01:02:09AM +0530, Oza Oza wrote:
> On Sat, Aug 19, 2017 at 11:56 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:

> > I think you should do something like this instead so you don't do the
> > MMIO read any more times than necessary:
> >
> >   static u32 iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
> >   {
> >     u32 data;
> >     int timeout = CFG_RETRY_STATUS_TIMEOUT_US;
> >
> >     data = readl(cfg_data_p);
> >     while (data == CFG_RETRY_STATUS && timeout--) {
> >       udelay(1);
> >       data = readl(cfg_data_p);
> >     }
> >
> >     if (data == CFG_RETRY_STATUS)
> >       data = 0xffffffff;
> >     return data;
> >   }
> >
> >   static int iproc_pcie_config_read(...)
> >   {
> >     u32 data;
> >
> >     ...
> >     data = iproc_pcie_cfg_retry(cfg_data_p);
> >     if (size <= 2)
> >       *val = (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1);
> >
> > In the event of a timeout, this returns PCIBIOS_SUCCESSFUL and
> > 0xffffffff data.  That's what most other platforms do, and most
> > callers of the PCI config accessors check for that data instead of
> > checking the return code to see whether the access was successful.
> >
> I see one problem with this.
> we have Samsung NVMe which exposes 64-bit IO BAR.
> 0xffff0001.
> 
> and if you see __pci_read_base which does following in sequence.
> 
> pci_read_config_dword(dev, pos, &l);
> pci_write_config_dword(dev, pos, l | mask);
> pci_read_config_dword(dev, pos, &sz);
> pci_write_config_dword(dev, pos, l);
> 
> returning 0xffffffff would not be correct in that case.
> even if callers retry, they will end up in a loop if caller retries.
> 
> hence I was returning 0xffff0001, it is upto the upper layer to treat
> it as data or not.

In your patch, I don't think the upper layer will ever see 0xffff0001.
iproc_pcie_config_read() only updates *data if iproc_pcie_cfg_retry()
returns PCIBIOS_SUCCESSFUL.  And iproc_pcie_cfg_retry() only returns
PCIBIOS_SUCCESSFUL if it reads something other than 0xffff0001.

Even if you *did* return 0xffff0001 to upper layers, I think we'd have
a problem.  Here's an example scenario.  If we do a Function-Level
Reset, pcie_flr() starts the reset, then calls pci_flr_wait() to wait
until the device becomes ready again.  pci_flr_wait() thinks that any
PCI_COMMAND value other than 0xffffffff means the device is ready.

If the device returns CRS status and you return 0xffff0001 when
pci_flr_wait() reads PCI_COMMAND, it thinks that means the device is
ready, but it's not.

> let me know if this sounds like a problem to you as well.
> 
> so in my opinion returning 0xffffffff is not an option.
> 
> > For example, pci_flr_wait() assumes that if a read of PCI_COMMAND
> > returns ~0, it's because the device isn't ready yet, and we should
> > wait and retry.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/2] PCI: iproc: add device shutdown for PCI RC
  2017-08-04 15:48 ` [PATCH v6 2/2] PCI: iproc: add device shutdown for PCI RC Oza Pawandeep
@ 2017-08-19 21:04   ` Bjorn Helgaas
  2017-08-20  3:36     ` Oza Oza
  0 siblings, 1 reply; 16+ messages in thread
From: Bjorn Helgaas @ 2017-08-19 21:04 UTC (permalink / raw)
  To: Oza Pawandeep
  Cc: Bjorn Helgaas, Ray Jui, Scott Branden, Jon Mason,
	bcm-kernel-feedback-list, Andy Gospodarek, linux-pci,
	linux-kernel, Oza Pawandeep

On Fri, Aug 04, 2017 at 09:18:16PM +0530, Oza Pawandeep wrote:
> PERST must be asserted around ~500ms before the reboot is applied.
> 
> During soft reset (e.g., "reboot" from Linux) on some iproc based SOCs
> LCPLL clock and PERST both goes off simultaneously.
> This will cause certain Endpoints Intel NVMe not get detected, upon
> next boot sequence.
> 
> This is specifically happening with Intel P3700 400GB series.
> Endpoint is expecting the clock for some amount of time after PERST is
> asserted, which is not happening in Stingray (iproc based SOC).
> This causes NVMe to behave in undefined way.
> 
> On the contrary, Intel x86 boards will have ref clock running, so we
> do not see this behavior there.
> 
> Besides, PCI spec does not stipulate about such timings.
> In-fact it does not tell us, whether PCIe device should consider
> refclk to be available or not to be.
> 
> It is completely up to vendor to design their EP the way they want
> with respect to ref clock availability.

I am unconvinced.  Designing an endpoint that relies on ref clock
timing not guaranteed by the spec does not sound like a way to build
reliable hardware.

The PCIe Card Electromechanical spec, r2.0, sec 2.2.3 says the clock
goes inactive after PERST# goes active, but doesn't specify how long.
In the absence of a spec requirement, I don't see a reason to assume
other systems preserve the ref clock after PERST#, so if the Intel
device requires clocks for 500ms after PERST#, we should see problems
on other systems.

Sec 2.2 says that on power up, the power rails must be stable at least
T(PVPERL) (100ms) and reference clocks must be stable at least
T(PERST-CLK) (100us) before PERST# is deasserted.  I think it's more
likely the problem is here.

The current iproc_pcie_reset() looks like it sleeps 100ms *after* it
deasserts PERST#.  Should that be *before* deasserting PERST#?

> 500ms is just based on the observation and taken as safe margin.
> This patch adds platform shutdown where it should be
> called in device_shutdown while reboot command is issued.
> So in sequence first Endpoint Shutdown (e.g. nvme_shutdown)
> followed by RC shutdown, which issues safe PERST assertion.
> 
> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
> 
> diff --git a/drivers/pci/host/pcie-iproc-platform.c b/drivers/pci/host/pcie-iproc-platform.c
> index 90d2bdd..9512960 100644
> --- a/drivers/pci/host/pcie-iproc-platform.c
> +++ b/drivers/pci/host/pcie-iproc-platform.c
> @@ -131,6 +131,13 @@ static int iproc_pcie_pltfm_remove(struct platform_device *pdev)
>  	return iproc_pcie_remove(pcie);
>  }
>  
> +static void iproc_pcie_pltfm_shutdown(struct platform_device *pdev)
> +{
> +	struct iproc_pcie *pcie = platform_get_drvdata(pdev);
> +
> +	iproc_pcie_shutdown(pcie);
> +}
> +
>  static struct platform_driver iproc_pcie_pltfm_driver = {
>  	.driver = {
>  		.name = "iproc-pcie",
> @@ -138,6 +145,7 @@ static int iproc_pcie_pltfm_remove(struct platform_device *pdev)
>  	},
>  	.probe = iproc_pcie_pltfm_probe,
>  	.remove = iproc_pcie_pltfm_remove,
> +	.shutdown = iproc_pcie_pltfm_shutdown,
>  };
>  module_platform_driver(iproc_pcie_pltfm_driver);
>  
> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
> index 583cee0..ee40651 100644
> --- a/drivers/pci/host/pcie-iproc.c
> +++ b/drivers/pci/host/pcie-iproc.c
> @@ -627,7 +627,7 @@ static int iproc_pcie_config_write32(struct pci_bus *bus, unsigned int devfn,
>  	.write = iproc_pcie_config_write32,
>  };
>  
> -static void iproc_pcie_reset(struct iproc_pcie *pcie)
> +static void iproc_pcie_perst_ctrl(struct iproc_pcie *pcie, bool assert)
>  {
>  	u32 val;
>  
> @@ -639,20 +639,28 @@ static void iproc_pcie_reset(struct iproc_pcie *pcie)
>  	if (pcie->ep_is_internal)
>  		return;
>  
> -	/*
> -	 * Select perst_b signal as reset source. Put the device into reset,
> -	 * and then bring it out of reset
> -	 */
> -	val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
> -	val &= ~EP_PERST_SOURCE_SELECT & ~EP_MODE_SURVIVE_PERST &
> -		~RC_PCIE_RST_OUTPUT;
> -	iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
> -	udelay(250);
> -
> -	val |= RC_PCIE_RST_OUTPUT;
> -	iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
> -	msleep(100);
> +	if (assert) {
> +		val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
> +		val &= ~EP_PERST_SOURCE_SELECT & ~EP_MODE_SURVIVE_PERST &
> +			~RC_PCIE_RST_OUTPUT;
> +		iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
> +		udelay(250);
> +	} else {
> +		val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
> +		val |= RC_PCIE_RST_OUTPUT;
> +		iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
> +		msleep(100);
> +	}
> +}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/2] PCI: iproc: add device shutdown for PCI RC
  2017-08-19 21:04   ` Bjorn Helgaas
@ 2017-08-20  3:36     ` Oza Oza
  2017-08-20 21:25       ` Bjorn Helgaas
  0 siblings, 1 reply; 16+ messages in thread
From: Oza Oza @ 2017-08-20  3:36 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Ray Jui, Scott Branden, Jon Mason,
	BCM Kernel Feedback, Andy Gospodarek, linux-pci, linux-kernel,
	Oza Pawandeep

On Sun, Aug 20, 2017 at 2:34 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Fri, Aug 04, 2017 at 09:18:16PM +0530, Oza Pawandeep wrote:
>> PERST must be asserted around ~500ms before the reboot is applied.
>>
>> During soft reset (e.g., "reboot" from Linux) on some iproc based SOCs
>> LCPLL clock and PERST both goes off simultaneously.
>> This will cause certain Endpoints Intel NVMe not get detected, upon
>> next boot sequence.
>>
>> This is specifically happening with Intel P3700 400GB series.
>> Endpoint is expecting the clock for some amount of time after PERST is
>> asserted, which is not happening in Stingray (iproc based SOC).
>> This causes NVMe to behave in undefined way.
>>
>> On the contrary, Intel x86 boards will have ref clock running, so we
>> do not see this behavior there.
>>
>> Besides, PCI spec does not stipulate about such timings.
>> In-fact it does not tell us, whether PCIe device should consider
>> refclk to be available or not to be.
>>
>> It is completely up to vendor to design their EP the way they want
>> with respect to ref clock availability.
>
> I am unconvinced.  Designing an endpoint that relies on ref clock
> timing not guaranteed by the spec does not sound like a way to build
> reliable hardware.
>
> The PCIe Card Electromechanical spec, r2.0, sec 2.2.3 says the clock
> goes inactive after PERST# goes active, but doesn't specify how long.
> In the absence of a spec requirement, I don't see a reason to assume
> other systems preserve the ref clock after PERST#, so if the Intel
> device requires clocks for 500ms after PERST#, we should see problems
> on other systems.

The reason you do not see and most likely you will not see on any
other system is
because, the ref clock is supplied by on board oscillator.
when PERST# is asserted, the ref clock is still available.

but in our case, we do not have on-board clock generator, rather we
rely on the clock
coming from SOC, so if SOC reset is issued, both PERST# and the ref
clock goes off simultaneously.

please also refer to the link below which stipulate on clk being
active after PERST# being asserted.
http://www.eetimes.com/document.asp?doc_id=1279299  [Figure 2:
Power-down waveforms]

however I am not saying that, this article has more to claim than PCIe spec.
but, I think the PCIe Card Electromechanical spec leaves the margin
for card manufactures to design the card based on the assumption
that ref clock could be available after PERST#  is asserted.

most of the cards do not assume that, but at the least we have seen that,
once particular card which behaves as per the link which I have just
pasted above.

>
> Sec 2.2 says that on power up, the power rails must be stable at least
> T(PVPERL) (100ms) and reference clocks must be stable at least
> T(PERST-CLK) (100us) before PERST# is deasserted.  I think it's more
> likely the problem is here.
>
> The current iproc_pcie_reset() looks like it sleeps 100ms *after* it
> deasserts PERST#.  Should that be *before* deasserting PERST#?
>

T(PERST-CLK) (100us) before PERST# is deasserted.
which we are already waiting for 250us

T(PVPERL) (100ms), the power rail is stable much before linux comes up.
so I still think we are meeting power up requirements.

>> 500ms is just based on the observation and taken as safe margin.
>> This patch adds platform shutdown where it should be
>> called in device_shutdown while reboot command is issued.
>> So in sequence first Endpoint Shutdown (e.g. nvme_shutdown)
>> followed by RC shutdown, which issues safe PERST assertion.
>>
>> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>
>> diff --git a/drivers/pci/host/pcie-iproc-platform.c b/drivers/pci/host/pcie-iproc-platform.c
>> index 90d2bdd..9512960 100644
>> --- a/drivers/pci/host/pcie-iproc-platform.c
>> +++ b/drivers/pci/host/pcie-iproc-platform.c
>> @@ -131,6 +131,13 @@ static int iproc_pcie_pltfm_remove(struct platform_device *pdev)
>>       return iproc_pcie_remove(pcie);
>>  }
>>
>> +static void iproc_pcie_pltfm_shutdown(struct platform_device *pdev)
>> +{
>> +     struct iproc_pcie *pcie = platform_get_drvdata(pdev);
>> +
>> +     iproc_pcie_shutdown(pcie);
>> +}
>> +
>>  static struct platform_driver iproc_pcie_pltfm_driver = {
>>       .driver = {
>>               .name = "iproc-pcie",
>> @@ -138,6 +145,7 @@ static int iproc_pcie_pltfm_remove(struct platform_device *pdev)
>>       },
>>       .probe = iproc_pcie_pltfm_probe,
>>       .remove = iproc_pcie_pltfm_remove,
>> +     .shutdown = iproc_pcie_pltfm_shutdown,
>>  };
>>  module_platform_driver(iproc_pcie_pltfm_driver);
>>
>> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
>> index 583cee0..ee40651 100644
>> --- a/drivers/pci/host/pcie-iproc.c
>> +++ b/drivers/pci/host/pcie-iproc.c
>> @@ -627,7 +627,7 @@ static int iproc_pcie_config_write32(struct pci_bus *bus, unsigned int devfn,
>>       .write = iproc_pcie_config_write32,
>>  };
>>
>> -static void iproc_pcie_reset(struct iproc_pcie *pcie)
>> +static void iproc_pcie_perst_ctrl(struct iproc_pcie *pcie, bool assert)
>>  {
>>       u32 val;
>>
>> @@ -639,20 +639,28 @@ static void iproc_pcie_reset(struct iproc_pcie *pcie)
>>       if (pcie->ep_is_internal)
>>               return;
>>
>> -     /*
>> -      * Select perst_b signal as reset source. Put the device into reset,
>> -      * and then bring it out of reset
>> -      */
>> -     val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
>> -     val &= ~EP_PERST_SOURCE_SELECT & ~EP_MODE_SURVIVE_PERST &
>> -             ~RC_PCIE_RST_OUTPUT;
>> -     iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
>> -     udelay(250);
>> -
>> -     val |= RC_PCIE_RST_OUTPUT;
>> -     iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
>> -     msleep(100);
>> +     if (assert) {
>> +             val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
>> +             val &= ~EP_PERST_SOURCE_SELECT & ~EP_MODE_SURVIVE_PERST &
>> +                     ~RC_PCIE_RST_OUTPUT;
>> +             iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
>> +             udelay(250);
>> +     } else {
>> +             val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
>> +             val |= RC_PCIE_RST_OUTPUT;
>> +             iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
>> +             msleep(100);
>> +     }
>> +}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 1/2] PCI: iproc: Retry request when CRS returned from EP
  2017-08-19 20:26       ` Bjorn Helgaas
@ 2017-08-20  3:42         ` Oza Oza
  0 siblings, 0 replies; 16+ messages in thread
From: Oza Oza @ 2017-08-20  3:42 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Ray Jui, Scott Branden, Jon Mason,
	BCM Kernel Feedback, Andy Gospodarek, linux-pci, linux-kernel,
	Oza Pawandeep

On Sun, Aug 20, 2017 at 1:56 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Sun, Aug 20, 2017 at 01:02:09AM +0530, Oza Oza wrote:
>> On Sat, Aug 19, 2017 at 11:56 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>
>> > I think you should do something like this instead so you don't do the
>> > MMIO read any more times than necessary:
>> >
>> >   static u32 iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
>> >   {
>> >     u32 data;
>> >     int timeout = CFG_RETRY_STATUS_TIMEOUT_US;
>> >
>> >     data = readl(cfg_data_p);
>> >     while (data == CFG_RETRY_STATUS && timeout--) {
>> >       udelay(1);
>> >       data = readl(cfg_data_p);
>> >     }
>> >
>> >     if (data == CFG_RETRY_STATUS)
>> >       data = 0xffffffff;
>> >     return data;
>> >   }
>> >
>> >   static int iproc_pcie_config_read(...)
>> >   {
>> >     u32 data;
>> >
>> >     ...
>> >     data = iproc_pcie_cfg_retry(cfg_data_p);
>> >     if (size <= 2)
>> >       *val = (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1);
>> >
>> > In the event of a timeout, this returns PCIBIOS_SUCCESSFUL and
>> > 0xffffffff data.  That's what most other platforms do, and most
>> > callers of the PCI config accessors check for that data instead of
>> > checking the return code to see whether the access was successful.
>> >
>> I see one problem with this.
>> we have Samsung NVMe which exposes 64-bit IO BAR.
>> 0xffff0001.
>>
>> and if you see __pci_read_base which does following in sequence.
>>
>> pci_read_config_dword(dev, pos, &l);
>> pci_write_config_dword(dev, pos, l | mask);
>> pci_read_config_dword(dev, pos, &sz);
>> pci_write_config_dword(dev, pos, l);
>>
>> returning 0xffffffff would not be correct in that case.
>> even if callers retry, they will end up in a loop if caller retries.
>>
>> hence I was returning 0xffff0001, it is upto the upper layer to treat
>> it as data or not.
>
> In your patch, I don't think the upper layer will ever see 0xffff0001.
> iproc_pcie_config_read() only updates *data if iproc_pcie_cfg_retry()
> returns PCIBIOS_SUCCESSFUL.  And iproc_pcie_cfg_retry() only returns
> PCIBIOS_SUCCESSFUL if it reads something other than 0xffff0001.
>
> Even if you *did* return 0xffff0001 to upper layers, I think we'd have
> a problem.  Here's an example scenario.  If we do a Function-Level
> Reset, pcie_flr() starts the reset, then calls pci_flr_wait() to wait
> until the device becomes ready again.  pci_flr_wait() thinks that any
> PCI_COMMAND value other than 0xffffffff means the device is ready.
>
> If the device returns CRS status and you return 0xffff0001 when
> pci_flr_wait() reads PCI_COMMAND, it thinks that means the device is
> ready, but it's not.
>
>> let me know if this sounds like a problem to you as well.
>>
>> so in my opinion returning 0xffffffff is not an option.
>>
>> > For example, pci_flr_wait() assumes that if a read of PCI_COMMAND
>> > returns ~0, it's because the device isn't ready yet, and we should
>> > wait and retry.

yes that will be a problem if I return 0xffff0001.
ok let me revise the patch by returning 0xffffffff as of now.

Regards,
Oza.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/2] PCI: iproc: add device shutdown for PCI RC
  2017-08-20  3:36     ` Oza Oza
@ 2017-08-20 21:25       ` Bjorn Helgaas
  2017-08-21  3:01         ` Oza Oza
  0 siblings, 1 reply; 16+ messages in thread
From: Bjorn Helgaas @ 2017-08-20 21:25 UTC (permalink / raw)
  To: Oza Oza
  Cc: Bjorn Helgaas, Ray Jui, Scott Branden, Jon Mason,
	BCM Kernel Feedback, Andy Gospodarek, linux-pci, linux-kernel,
	Oza Pawandeep

On Sun, Aug 20, 2017 at 09:06:51AM +0530, Oza Oza wrote:
> On Sun, Aug 20, 2017 at 2:34 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Fri, Aug 04, 2017 at 09:18:16PM +0530, Oza Pawandeep wrote:
> >> PERST must be asserted around ~500ms before the reboot is applied.
> >>
> >> During soft reset (e.g., "reboot" from Linux) on some iproc based SOCs
> >> LCPLL clock and PERST both goes off simultaneously.
> >> This will cause certain Endpoints Intel NVMe not get detected, upon
> >> next boot sequence.
> >>
> >> This is specifically happening with Intel P3700 400GB series.
> >> Endpoint is expecting the clock for some amount of time after PERST is
> >> asserted, which is not happening in Stingray (iproc based SOC).
> >> This causes NVMe to behave in undefined way.
> >>
> >> On the contrary, Intel x86 boards will have ref clock running, so we
> >> do not see this behavior there.
> >>
> >> Besides, PCI spec does not stipulate about such timings.
> >> In-fact it does not tell us, whether PCIe device should consider
> >> refclk to be available or not to be.
> >>
> >> It is completely up to vendor to design their EP the way they want
> >> with respect to ref clock availability.
> >
> > I am unconvinced.  Designing an endpoint that relies on ref clock
> > timing not guaranteed by the spec does not sound like a way to build
> > reliable hardware.
> >
> > The PCIe Card Electromechanical spec, r2.0, sec 2.2.3 says the clock
> > goes inactive after PERST# goes active, but doesn't specify how long.
> > In the absence of a spec requirement, I don't see a reason to assume
> > other systems preserve the ref clock after PERST#, so if the Intel
> > device requires clocks for 500ms after PERST#, we should see problems
> > on other systems.
> 
> The reason you do not see and most likely you will not see on any
> other system is
> because, the ref clock is supplied by on board oscillator.
> when PERST# is asserted, the ref clock is still available.
> 
> but in our case, we do not have on-board clock generator, rather we
> rely on the clock
> coming from SOC, so if SOC reset is issued, both PERST# and the ref
> clock goes off simultaneously.

OK.  I'm not a hardware person and will have to take your word for
this.

> please also refer to the link below which stipulate on clk being
> active after PERST# being asserted.
> http://www.eetimes.com/document.asp?doc_id=1279299  [Figure 2:
> Power-down waveforms]

This is just a copy of Figure 2-13 from the PCIe CEM spec I cited
above.  It's better to cite the spec itself than an article based on
the spec.

> however I am not saying that, this article has more to claim than PCIe spec.
> but, I think the PCIe Card Electromechanical spec leaves the margin
> for card manufactures to design the card based on the assumption
> that ref clock could be available after PERST#  is asserted.

The only statement in the spec I'm aware of is that "Clock and JTAG go
inactive after PERST# goes inactive."  Since there's no required time
the clock must remain active, a robust card would not assume the clock
is available at all after PERST.  500ms is a *huge* length of time;
I'd be very surprised if Intel designed a card that required that.

I don't feel like we really got to the root cause of this, but this
patch only hurts the iproc reboot time, so I'm OK with it.  I was just
hoping to avoid slowing down your reboot time.

> most of the cards do not assume that, but at the least we have seen that,
> once particular card which behaves as per the link which I have just
> pasted above.
> 
> >
> > Sec 2.2 says that on power up, the power rails must be stable at least
> > T(PVPERL) (100ms) and reference clocks must be stable at least
> > T(PERST-CLK) (100us) before PERST# is deasserted.  I think it's more
> > likely the problem is here.
> >
> > The current iproc_pcie_reset() looks like it sleeps 100ms *after* it
> > deasserts PERST#.  Should that be *before* deasserting PERST#?
> >
> 
> T(PERST-CLK) (100us) before PERST# is deasserted.
> which we are already waiting for 250us
> 
> T(PVPERL) (100ms), the power rail is stable much before linux comes up.
> so I still think we are meeting power up requirements.
> 
> >> 500ms is just based on the observation and taken as safe margin.
> >> This patch adds platform shutdown where it should be
> >> called in device_shutdown while reboot command is issued.
> >> So in sequence first Endpoint Shutdown (e.g. nvme_shutdown)
> >> followed by RC shutdown, which issues safe PERST assertion.
> >>
> >> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
> >> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
> >> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
> >>
> >> diff --git a/drivers/pci/host/pcie-iproc-platform.c b/drivers/pci/host/pcie-iproc-platform.c
> >> index 90d2bdd..9512960 100644
> >> --- a/drivers/pci/host/pcie-iproc-platform.c
> >> +++ b/drivers/pci/host/pcie-iproc-platform.c
> >> @@ -131,6 +131,13 @@ static int iproc_pcie_pltfm_remove(struct platform_device *pdev)
> >>       return iproc_pcie_remove(pcie);
> >>  }
> >>
> >> +static void iproc_pcie_pltfm_shutdown(struct platform_device *pdev)
> >> +{
> >> +     struct iproc_pcie *pcie = platform_get_drvdata(pdev);
> >> +
> >> +     iproc_pcie_shutdown(pcie);
> >> +}
> >> +
> >>  static struct platform_driver iproc_pcie_pltfm_driver = {
> >>       .driver = {
> >>               .name = "iproc-pcie",
> >> @@ -138,6 +145,7 @@ static int iproc_pcie_pltfm_remove(struct platform_device *pdev)
> >>       },
> >>       .probe = iproc_pcie_pltfm_probe,
> >>       .remove = iproc_pcie_pltfm_remove,
> >> +     .shutdown = iproc_pcie_pltfm_shutdown,
> >>  };
> >>  module_platform_driver(iproc_pcie_pltfm_driver);
> >>
> >> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
> >> index 583cee0..ee40651 100644
> >> --- a/drivers/pci/host/pcie-iproc.c
> >> +++ b/drivers/pci/host/pcie-iproc.c
> >> @@ -627,7 +627,7 @@ static int iproc_pcie_config_write32(struct pci_bus *bus, unsigned int devfn,
> >>       .write = iproc_pcie_config_write32,
> >>  };
> >>
> >> -static void iproc_pcie_reset(struct iproc_pcie *pcie)
> >> +static void iproc_pcie_perst_ctrl(struct iproc_pcie *pcie, bool assert)
> >>  {
> >>       u32 val;
> >>
> >> @@ -639,20 +639,28 @@ static void iproc_pcie_reset(struct iproc_pcie *pcie)
> >>       if (pcie->ep_is_internal)
> >>               return;
> >>
> >> -     /*
> >> -      * Select perst_b signal as reset source. Put the device into reset,
> >> -      * and then bring it out of reset
> >> -      */
> >> -     val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
> >> -     val &= ~EP_PERST_SOURCE_SELECT & ~EP_MODE_SURVIVE_PERST &
> >> -             ~RC_PCIE_RST_OUTPUT;
> >> -     iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
> >> -     udelay(250);
> >> -
> >> -     val |= RC_PCIE_RST_OUTPUT;
> >> -     iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
> >> -     msleep(100);
> >> +     if (assert) {
> >> +             val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
> >> +             val &= ~EP_PERST_SOURCE_SELECT & ~EP_MODE_SURVIVE_PERST &
> >> +                     ~RC_PCIE_RST_OUTPUT;
> >> +             iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
> >> +             udelay(250);
> >> +     } else {
> >> +             val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
> >> +             val |= RC_PCIE_RST_OUTPUT;
> >> +             iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
> >> +             msleep(100);
> >> +     }
> >> +}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/2] PCI: iproc: add device shutdown for PCI RC
  2017-08-20 21:25       ` Bjorn Helgaas
@ 2017-08-21  3:01         ` Oza Oza
  0 siblings, 0 replies; 16+ messages in thread
From: Oza Oza @ 2017-08-21  3:01 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Ray Jui, Scott Branden, Jon Mason,
	BCM Kernel Feedback, Andy Gospodarek, linux-pci, linux-kernel,
	Oza Pawandeep

On Mon, Aug 21, 2017 at 2:55 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Sun, Aug 20, 2017 at 09:06:51AM +0530, Oza Oza wrote:
>> On Sun, Aug 20, 2017 at 2:34 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> > On Fri, Aug 04, 2017 at 09:18:16PM +0530, Oza Pawandeep wrote:
>> >> PERST must be asserted around ~500ms before the reboot is applied.
>> >>
>> >> During soft reset (e.g., "reboot" from Linux) on some iproc based SOCs
>> >> LCPLL clock and PERST both goes off simultaneously.
>> >> This will cause certain Endpoints Intel NVMe not get detected, upon
>> >> next boot sequence.
>> >>
>> >> This is specifically happening with Intel P3700 400GB series.
>> >> Endpoint is expecting the clock for some amount of time after PERST is
>> >> asserted, which is not happening in Stingray (iproc based SOC).
>> >> This causes NVMe to behave in undefined way.
>> >>
>> >> On the contrary, Intel x86 boards will have ref clock running, so we
>> >> do not see this behavior there.
>> >>
>> >> Besides, PCI spec does not stipulate about such timings.
>> >> In-fact it does not tell us, whether PCIe device should consider
>> >> refclk to be available or not to be.
>> >>
>> >> It is completely up to vendor to design their EP the way they want
>> >> with respect to ref clock availability.
>> >
>> > I am unconvinced.  Designing an endpoint that relies on ref clock
>> > timing not guaranteed by the spec does not sound like a way to build
>> > reliable hardware.
>> >
>> > The PCIe Card Electromechanical spec, r2.0, sec 2.2.3 says the clock
>> > goes inactive after PERST# goes active, but doesn't specify how long.
>> > In the absence of a spec requirement, I don't see a reason to assume
>> > other systems preserve the ref clock after PERST#, so if the Intel
>> > device requires clocks for 500ms after PERST#, we should see problems
>> > on other systems.
>>
>> The reason you do not see and most likely you will not see on any
>> other system is
>> because, the ref clock is supplied by on board oscillator.
>> when PERST# is asserted, the ref clock is still available.
>>
>> but in our case, we do not have on-board clock generator, rather we
>> rely on the clock
>> coming from SOC, so if SOC reset is issued, both PERST# and the ref
>> clock goes off simultaneously.
>
> OK.  I'm not a hardware person and will have to take your word for
> this.
>
>> please also refer to the link below which stipulate on clk being
>> active after PERST# being asserted.
>> http://www.eetimes.com/document.asp?doc_id=1279299  [Figure 2:
>> Power-down waveforms]
>
> This is just a copy of Figure 2-13 from the PCIe CEM spec I cited
> above.  It's better to cite the spec itself than an article based on
> the spec.
>
>> however I am not saying that, this article has more to claim than PCIe spec.
>> but, I think the PCIe Card Electromechanical spec leaves the margin
>> for card manufactures to design the card based on the assumption
>> that ref clock could be available after PERST#  is asserted.
>
> The only statement in the spec I'm aware of is that "Clock and JTAG go
> inactive after PERST# goes inactive."  Since there's no required time
> the clock must remain active, a robust card would not assume the clock
> is available at all after PERST.  500ms is a *huge* length of time;
> I'd be very surprised if Intel designed a card that required that.
>
> I don't feel like we really got to the root cause of this, but this
> patch only hurts the iproc reboot time, so I'm OK with it.  I was just
> hoping to avoid slowing down your reboot time.
>

I appreciate your concern and valuable inputs.

However, while writing this patch I shared the similar concern.

And, after multiple discussions this is the best we could do.
reboot is less likely in data centers,
but failing to detect the NVMe after reboot has more severe business
impact than delaying reboot by 500ms.

I will rebase both the patches on top of Lorenzo's patches, and submit v7 today.

Thanks and Regards,
Oza.

>> most of the cards do not assume that, but at the least we have seen that,
>> once particular card which behaves as per the link which I have just
>> pasted above.
>>
>> >
>> > Sec 2.2 says that on power up, the power rails must be stable at least
>> > T(PVPERL) (100ms) and reference clocks must be stable at least
>> > T(PERST-CLK) (100us) before PERST# is deasserted.  I think it's more
>> > likely the problem is here.
>> >
>> > The current iproc_pcie_reset() looks like it sleeps 100ms *after* it
>> > deasserts PERST#.  Should that be *before* deasserting PERST#?
>> >
>>
>> T(PERST-CLK) (100us) before PERST# is deasserted.
>> which we are already waiting for 250us
>>
>> T(PVPERL) (100ms), the power rail is stable much before linux comes up.
>> so I still think we are meeting power up requirements.
>>
>> >> 500ms is just based on the observation and taken as safe margin.
>> >> This patch adds platform shutdown where it should be
>> >> called in device_shutdown while reboot command is issued.
>> >> So in sequence first Endpoint Shutdown (e.g. nvme_shutdown)
>> >> followed by RC shutdown, which issues safe PERST assertion.
>> >>
>> >> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
>> >> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
>> >> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>> >>
>> >> diff --git a/drivers/pci/host/pcie-iproc-platform.c b/drivers/pci/host/pcie-iproc-platform.c
>> >> index 90d2bdd..9512960 100644
>> >> --- a/drivers/pci/host/pcie-iproc-platform.c
>> >> +++ b/drivers/pci/host/pcie-iproc-platform.c
>> >> @@ -131,6 +131,13 @@ static int iproc_pcie_pltfm_remove(struct platform_device *pdev)
>> >>       return iproc_pcie_remove(pcie);
>> >>  }
>> >>
>> >> +static void iproc_pcie_pltfm_shutdown(struct platform_device *pdev)
>> >> +{
>> >> +     struct iproc_pcie *pcie = platform_get_drvdata(pdev);
>> >> +
>> >> +     iproc_pcie_shutdown(pcie);
>> >> +}
>> >> +
>> >>  static struct platform_driver iproc_pcie_pltfm_driver = {
>> >>       .driver = {
>> >>               .name = "iproc-pcie",
>> >> @@ -138,6 +145,7 @@ static int iproc_pcie_pltfm_remove(struct platform_device *pdev)
>> >>       },
>> >>       .probe = iproc_pcie_pltfm_probe,
>> >>       .remove = iproc_pcie_pltfm_remove,
>> >> +     .shutdown = iproc_pcie_pltfm_shutdown,
>> >>  };
>> >>  module_platform_driver(iproc_pcie_pltfm_driver);
>> >>
>> >> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
>> >> index 583cee0..ee40651 100644
>> >> --- a/drivers/pci/host/pcie-iproc.c
>> >> +++ b/drivers/pci/host/pcie-iproc.c
>> >> @@ -627,7 +627,7 @@ static int iproc_pcie_config_write32(struct pci_bus *bus, unsigned int devfn,
>> >>       .write = iproc_pcie_config_write32,
>> >>  };
>> >>
>> >> -static void iproc_pcie_reset(struct iproc_pcie *pcie)
>> >> +static void iproc_pcie_perst_ctrl(struct iproc_pcie *pcie, bool assert)
>> >>  {
>> >>       u32 val;
>> >>
>> >> @@ -639,20 +639,28 @@ static void iproc_pcie_reset(struct iproc_pcie *pcie)
>> >>       if (pcie->ep_is_internal)
>> >>               return;
>> >>
>> >> -     /*
>> >> -      * Select perst_b signal as reset source. Put the device into reset,
>> >> -      * and then bring it out of reset
>> >> -      */
>> >> -     val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
>> >> -     val &= ~EP_PERST_SOURCE_SELECT & ~EP_MODE_SURVIVE_PERST &
>> >> -             ~RC_PCIE_RST_OUTPUT;
>> >> -     iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
>> >> -     udelay(250);
>> >> -
>> >> -     val |= RC_PCIE_RST_OUTPUT;
>> >> -     iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
>> >> -     msleep(100);
>> >> +     if (assert) {
>> >> +             val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
>> >> +             val &= ~EP_PERST_SOURCE_SELECT & ~EP_MODE_SURVIVE_PERST &
>> >> +                     ~RC_PCIE_RST_OUTPUT;
>> >> +             iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
>> >> +             udelay(250);
>> >> +     } else {
>> >> +             val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
>> >> +             val |= RC_PCIE_RST_OUTPUT;
>> >> +             iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
>> >> +             msleep(100);
>> >> +     }
>> >> +}

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-08-21  3:01 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-04 15:48 [PATCH v6 0/2] PCI: iproc: SOC specific fixes Oza Pawandeep
2017-08-04 15:48 ` [PATCH v6 1/2] PCI: iproc: Retry request when CRS returned from EP Oza Pawandeep
2017-08-19 18:26   ` Bjorn Helgaas
2017-08-19 18:52     ` Oza Oza
2017-08-19 18:52       ` Oza Oza
2017-08-19 19:32     ` Oza Oza
2017-08-19 19:32       ` Oza Oza
2017-08-19 20:26       ` Bjorn Helgaas
2017-08-20  3:42         ` Oza Oza
2017-08-04 15:48 ` [PATCH v6 2/2] PCI: iproc: add device shutdown for PCI RC Oza Pawandeep
2017-08-19 21:04   ` Bjorn Helgaas
2017-08-20  3:36     ` Oza Oza
2017-08-20 21:25       ` Bjorn Helgaas
2017-08-21  3:01         ` Oza Oza
2017-08-19 17:10 ` [PATCH v6 0/2] PCI: iproc: SOC specific fixes Bjorn Helgaas
2017-08-19 18:18   ` Oza Oza

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.