Linux-PCI Archive on lore.kernel.org
* [PATCH] PCI: aardvark: Don't touch PCIe registers if no card connected
@ 2020-05-28 14:31 Pali Rohár
  2020-05-28 16:26 ` Bjorn Helgaas
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Pali Rohár @ 2020-05-28 14:31 UTC
  To: Thomas Petazzoni, Lorenzo Pieralisi, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium
  Cc: linux-pci, linux-arm-kernel, linux-kernel

When there is no PCIe card connected and advk_pcie_rd_conf() or
advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
root bridge, the aardvark driver throws the following error message:

  advk-pcie d0070000.pcie: config read/write timed out

Obviously accessing PCIe registers of disconnected card is not possible.

Extend check in advk_pcie_valid_device() function for validating
availability of PCIe bus. If PCIe link is down, then the device is marked
as Not Found and the driver does not try to access these registers.

Signed-off-by: Pali Rohár <pali@kernel.org>
---
 drivers/pci/controller/pci-aardvark.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index 90ff291c24f0..53a4cfd7d377 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
 	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
 		return false;
 
+	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
+		return false;
+
 	return true;
 }
 
-- 
2.20.1



* Re: [PATCH] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-05-28 14:31 [PATCH] PCI: aardvark: Don't touch PCIe registers if no card connected Pali Rohár
@ 2020-05-28 16:26 ` Bjorn Helgaas
  2020-05-28 16:38   ` Pali Rohár
  2020-07-01  8:20 ` [PATCH v2] " Pali Rohár
  2020-07-02  8:30 ` [PATCH v3] " Pali Rohár
  2 siblings, 1 reply; 32+ messages in thread
From: Bjorn Helgaas @ 2020-05-28 16:26 UTC
  To: Pali Rohár
  Cc: Thomas Petazzoni, Lorenzo Pieralisi, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium, linux-pci, linux-arm-kernel,
	linux-kernel

On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> When there is no PCIe card connected and advk_pcie_rd_conf() or
> advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> root bridge, the aardvark driver throws the following error message:
> 
>   advk-pcie d0070000.pcie: config read/write timed out
> 
> Obviously accessing PCIe registers of disconnected card is not possible.
> 
> Extend check in advk_pcie_valid_device() function for validating
> availability of PCIe bus. If PCIe link is down, then the device is marked
> as Not Found and the driver does not try to access these registers.
> 
> Signed-off-by: Pali Rohár <pali@kernel.org>
> ---
>  drivers/pci/controller/pci-aardvark.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> index 90ff291c24f0..53a4cfd7d377 100644
> --- a/drivers/pci/controller/pci-aardvark.c
> +++ b/drivers/pci/controller/pci-aardvark.c
> @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
>  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
>  		return false;
>  
> +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> +		return false;

I don't think this is the right fix.  This makes it racy because the
link may go down after we call advk_pcie_valid_device() but before we
perform the config read.

I have no objection to removing the "config read/write timed out"
message.  The "return PCIBIOS_SET_FAILED" in the read case probably
should be augmented by setting "*val = 0xffffffff".

>  	return true;
>  }
>  
> -- 
> 2.20.1
> 
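
A minimal sketch of the suggestion above, written as a standalone model
rather than as driver code. The model_* names and the example register
value are hypothetical; only the two PCIBIOS_* values are taken from the
kernel's include/linux/pci.h. It shows a config read that reports
failure both through the return status and through an all-ones value:

```c
#include <stdbool.h>
#include <stdint.h>

/* Status codes with the values used in the kernel's include/linux/pci.h. */
#define PCIBIOS_SUCCESSFUL 0x00
#define PCIBIOS_SET_FAILED 0x88

/* Hypothetical stand-in for the hardware transfer; not a driver function. */
static bool model_pio_read(bool link_up, uint32_t *data)
{
	if (!link_up)
		return false;		/* models a timed-out transfer */
	*data = 0x12345678;		/* arbitrary example register value */
	return true;
}

/* Model of a config read that signals failure in both ways. */
static int model_rd_conf(bool link_up, uint32_t *val)
{
	uint32_t data;

	if (!model_pio_read(link_up, &data)) {
		/* All-ones for callers that never look at the status. */
		*val = 0xffffffff;
		return PCIBIOS_SET_FAILED;
	}

	*val = data;
	return PCIBIOS_SUCCESSFUL;
}
```

With the link down, model_rd_conf() returns PCIBIOS_SET_FAILED and
leaves 0xffffffff in *val, so a caller that only inspects the value
still sees the failure.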


* Re: [PATCH] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-05-28 16:26 ` Bjorn Helgaas
@ 2020-05-28 16:38   ` Pali Rohár
  2020-05-28 16:49     ` Bjorn Helgaas
  2020-06-30 13:51     ` Bjorn Helgaas
  0 siblings, 2 replies; 32+ messages in thread
From: Pali Rohár @ 2020-05-28 16:38 UTC
  To: Bjorn Helgaas
  Cc: Thomas Petazzoni, Lorenzo Pieralisi, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium, linux-pci, linux-arm-kernel,
	linux-kernel

On Thursday 28 May 2020 11:26:04 Bjorn Helgaas wrote:
> On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > root bridge, the aardvark driver throws the following error message:
> > 
> >   advk-pcie d0070000.pcie: config read/write timed out
> > 
> > Obviously accessing PCIe registers of disconnected card is not possible.
> > 
> > Extend check in advk_pcie_valid_device() function for validating
> > availability of PCIe bus. If PCIe link is down, then the device is marked
> > as Not Found and the driver does not try to access these registers.
> > 
> > Signed-off-by: Pali Rohár <pali@kernel.org>
> > ---
> >  drivers/pci/controller/pci-aardvark.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > index 90ff291c24f0..53a4cfd7d377 100644
> > --- a/drivers/pci/controller/pci-aardvark.c
> > +++ b/drivers/pci/controller/pci-aardvark.c
> > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> >  		return false;
> >  
> > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > +		return false;
> 
> I don't think this is the right fix.  This makes it racy because the
> link may go down after we call advk_pcie_valid_device() but before we
> perform the config read.

Yes, it is racy, but I do not think it causes problems. Trying to read
PCIe registers when no device is connected just causes those timeouts,
the printed error message, and an increased delay in
advk_pcie_wait_pio() due to the polling loop. This patch avoids
unnecessary accesses to PCIe registers in cases where the
advk_pcie_wait_pio() polling would just fail.

I think it is a good idea not to call the blocking advk_pcie_wait_pio()
when it is not needed. We could have faster enumeration of PCIe buses
when no card is connected.

> I have no objection to removing the "config read/write timed out"
> message.  The "return PCIBIOS_SET_FAILED" in the read case probably
> should be augmented by setting "*val = 0xffffffff".
> 
> >  	return true;
> >  }
> >  
> > -- 
> > 2.20.1
> > 


* Re: [PATCH] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-05-28 16:38   ` Pali Rohár
@ 2020-05-28 16:49     ` Bjorn Helgaas
  2020-05-29  8:30       ` Pali Rohár
  2020-06-30 13:51     ` Bjorn Helgaas
  1 sibling, 1 reply; 32+ messages in thread
From: Bjorn Helgaas @ 2020-05-28 16:49 UTC
  To: Pali Rohár
  Cc: Thomas Petazzoni, Lorenzo Pieralisi, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium, linux-pci, linux-arm-kernel,
	linux-kernel

On Thu, May 28, 2020 at 06:38:09PM +0200, Pali Rohár wrote:
> On Thursday 28 May 2020 11:26:04 Bjorn Helgaas wrote:
> > On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> > > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > > root bridge, the aardvark driver throws the following error message:
> > > 
> > >   advk-pcie d0070000.pcie: config read/write timed out
> > > 
> > > Obviously accessing PCIe registers of disconnected card is not possible.
> > > 
> > > Extend check in advk_pcie_valid_device() function for validating
> > > availability of PCIe bus. If PCIe link is down, then the device is marked
> > > as Not Found and the driver does not try to access these registers.
> > > 
> > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > ---
> > >  drivers/pci/controller/pci-aardvark.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > > index 90ff291c24f0..53a4cfd7d377 100644
> > > --- a/drivers/pci/controller/pci-aardvark.c
> > > +++ b/drivers/pci/controller/pci-aardvark.c
> > > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> > >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> > >  		return false;
> > >  
> > > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > > +		return false;
> > 
> > I don't think this is the right fix.  This makes it racy because the
> > link may go down after we call advk_pcie_valid_device() but before we
> > perform the config read.
> 
> Yes, it is racy, but I do not think it cause problems. Trying to read
> PCIe registers when device is not connected cause just those timeouts,
> printing error message and increased delay in advk_pcie_wait_pio() due
> to polling loop. This patch reduce unnecessary access to PCIe registers
> when advk_pcie_wait_pio() polling just fail.
> 
> I think it is a good idea to not call blocking advk_pcie_wait_pio() when
> it is not needed. We could have faster enumeration of PCIe buses when
> card is not connected.

Maybe advk_pcie_check_pio_status() and advk_pcie_wait_pio() could be
combined so we could get the correct error status as soon as it's
available, without waiting for a timeout?

In any event, the "return PCIBIOS_SET_FAILED" needs to be fixed.  Most
callers of config read do not check for failure, but most of the ones
that do, check for "val == ~0".  Only a few check for a status of
other than PCIBIOS_SUCCESSFUL.

> > I have no objection to removing the "config read/write timed out"
> > message.  The "return PCIBIOS_SET_FAILED" in the read case probably
> > should be augmented by setting "*val = 0xffffffff".
> > 
> > >  	return true;
> > >  }
> > >  
> > > -- 
> > > 2.20.1
> > > 


* Re: [PATCH] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-05-28 16:49     ` Bjorn Helgaas
@ 2020-05-29  8:30       ` Pali Rohár
  2020-06-30 12:31         ` Pali Rohár
  0 siblings, 1 reply; 32+ messages in thread
From: Pali Rohár @ 2020-05-29  8:30 UTC
  To: Bjorn Helgaas
  Cc: Thomas Petazzoni, Lorenzo Pieralisi, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium, linux-pci, linux-arm-kernel,
	linux-kernel

On Thursday 28 May 2020 11:49:38 Bjorn Helgaas wrote:
> On Thu, May 28, 2020 at 06:38:09PM +0200, Pali Rohár wrote:
> > On Thursday 28 May 2020 11:26:04 Bjorn Helgaas wrote:
> > > On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> > > > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > > > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > > > root bridge, the aardvark driver throws the following error message:
> > > > 
> > > >   advk-pcie d0070000.pcie: config read/write timed out
> > > > 
> > > > Obviously accessing PCIe registers of disconnected card is not possible.
> > > > 
> > > > Extend check in advk_pcie_valid_device() function for validating
> > > > availability of PCIe bus. If PCIe link is down, then the device is marked
> > > > as Not Found and the driver does not try to access these registers.
> > > > 
> > > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > > ---
> > > >  drivers/pci/controller/pci-aardvark.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > > 
> > > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > > > index 90ff291c24f0..53a4cfd7d377 100644
> > > > --- a/drivers/pci/controller/pci-aardvark.c
> > > > +++ b/drivers/pci/controller/pci-aardvark.c
> > > > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> > > >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> > > >  		return false;
> > > >  
> > > > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > > > +		return false;
> > > 
> > > I don't think this is the right fix.  This makes it racy because the
> > > link may go down after we call advk_pcie_valid_device() but before we
> > > perform the config read.
> > 
> > Yes, it is racy, but I do not think it cause problems. Trying to read
> > PCIe registers when device is not connected cause just those timeouts,
> > printing error message and increased delay in advk_pcie_wait_pio() due
> > to polling loop. This patch reduce unnecessary access to PCIe registers
> > when advk_pcie_wait_pio() polling just fail.
> > 
> > I think it is a good idea to not call blocking advk_pcie_wait_pio() when
> > it is not needed. We could have faster enumeration of PCIe buses when
> > card is not connected.
> 
> Maybe advk_pcie_check_pio_status() and advk_pcie_wait_pio() could be
> combined so we could get the correct error status as soon as it's
> available, without waiting for a timeout?

Any idea how to achieve it?

The first call is the polling function advk_pcie_wait_pio() and the
second call is advk_pcie_check_pio_status(), which just reads the status
register and prints an error message to dmesg.

So to me it looks like combining these two functions into one would not
change anything. We always need to run the polling code before checking
the status register, and therefore need to wait for the timeout, unless
something like this proposed patch is used (to skip the whole register
access if it would fail).

> In any event, the "return PCIBIOS_SET_FAILED" needs to be fixed.  Most
> callers of config read do not check for failure, but most of the ones
> that do, check for "val == ~0".  Only a few check for a status of
> other than PCIBIOS_SUCCESSFUL.
> 
> > > I have no objection to removing the "config read/write timed out"
> > > message.  The "return PCIBIOS_SET_FAILED" in the read case probably
> > > should be augmented by setting "*val = 0xffffffff".

Now I see, "*val = 0xffffffff" really should be set when
advk_pcie_rd_conf() fails.

> > > >  	return true;
> > > >  }
> > > >  
> > > > -- 
> > > > 2.20.1
> > > > 


* Re: [PATCH] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-05-29  8:30       ` Pali Rohár
@ 2020-06-30 12:31         ` Pali Rohár
  0 siblings, 0 replies; 32+ messages in thread
From: Pali Rohár @ 2020-06-30 12:31 UTC
  To: Bjorn Helgaas
  Cc: Thomas Petazzoni, Lorenzo Pieralisi, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium, linux-pci, linux-arm-kernel,
	linux-kernel

Hello!

On Friday 29 May 2020 10:30:13 Pali Rohár wrote:
> On Thursday 28 May 2020 11:49:38 Bjorn Helgaas wrote:
> > On Thu, May 28, 2020 at 06:38:09PM +0200, Pali Rohár wrote:
> > > On Thursday 28 May 2020 11:26:04 Bjorn Helgaas wrote:
> > > > On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> > > > > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > > > > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > > > > root bridge, the aardvark driver throws the following error message:
> > > > > 
> > > > >   advk-pcie d0070000.pcie: config read/write timed out
> > > > > 
> > > > > Obviously accessing PCIe registers of disconnected card is not possible.
> > > > > 
> > > > > Extend check in advk_pcie_valid_device() function for validating
> > > > > availability of PCIe bus. If PCIe link is down, then the device is marked
> > > > > as Not Found and the driver does not try to access these registers.
> > > > > 
> > > > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > > > ---
> > > > >  drivers/pci/controller/pci-aardvark.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > > > > index 90ff291c24f0..53a4cfd7d377 100644
> > > > > --- a/drivers/pci/controller/pci-aardvark.c
> > > > > +++ b/drivers/pci/controller/pci-aardvark.c
> > > > > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> > > > >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> > > > >  		return false;
> > > > >  
> > > > > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > > > > +		return false;
> > > > 
> > > > I don't think this is the right fix.  This makes it racy because the
> > > > link may go down after we call advk_pcie_valid_device() but before we
> > > > perform the config read.
> > > 
> > > Yes, it is racy, but I do not think it cause problems. Trying to read
> > > PCIe registers when device is not connected cause just those timeouts,
> > > printing error message and increased delay in advk_pcie_wait_pio() due
> > > to polling loop. This patch reduce unnecessary access to PCIe registers
> > > when advk_pcie_wait_pio() polling just fail.
> > > 
> > > I think it is a good idea to not call blocking advk_pcie_wait_pio() when
> > > it is not needed. We could have faster enumeration of PCIe buses when
> > > card is not connected.
> > 
> > Maybe advk_pcie_check_pio_status() and advk_pcie_wait_pio() could be
> > combined so we could get the correct error status as soon as it's
> > available, without waiting for a timeout?
> 
> Any idea how to achieve it?
> 
> First call is polling function advk_pcie_wait_pio() and second call is
> advk_pcie_check_pio_status() which just reads status register and prints
> error message to dmesg.
> 
> So for me it looks like that combining these two functions into one does
> not change anything. We always need to call polling code prior to
> checking status register. And therefore need to wait for timeout. Unless
> something like in this proposed patch is not used (to skip whole
> register access if it would fail).

So to answer your question, the correct status can be retrieved only
after waiting for the timeout, because the status becomes available only
after the timeout expires.

Therefore my proposed patch, in this (or some other) form, is needed if
we want to avoid trying to read from registers and waiting for an answer
when no card is connected.

I would really like to see this issue fixed, so that booting the Linux
kernel on a board without a connected PCIe card is not delayed.

Thomas, Lorenzo, Bjorn: do you have any idea how to fix it differently?
Or if not, could my proposed patch be accepted in some form?

> > In any event, the "return PCIBIOS_SET_FAILED" needs to be fixed.  Most
> > callers of config read do not check for failure, but most of the ones
> > that do, check for "val == ~0".  Only a few check for a status of
> > other than PCIBIOS_SUCCESSFUL.
> > 
> > > > I have no objection to removing the "config read/write timed out"
> > > > message.  The "return PCIBIOS_SET_FAILED" in the read case probably
> > > > should be augmented by setting "*val = 0xffffffff".
> 
> Now I see, "*val = 0xffffffff" should be really set when function
> advk_pcie_rd_conf() fails.

I have already sent a separate patch which fixes this issue.

> > > > >  	return true;
> > > > >  }
> > > > >  
> > > > > -- 
> > > > > 2.20.1
> > > > > 


* Re: [PATCH] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-05-28 16:38   ` Pali Rohár
  2020-05-28 16:49     ` Bjorn Helgaas
@ 2020-06-30 13:51     ` Bjorn Helgaas
  2020-06-30 14:04       ` Pali Rohár
  1 sibling, 1 reply; 32+ messages in thread
From: Bjorn Helgaas @ 2020-06-30 13:51 UTC
  To: Pali Rohár
  Cc: Thomas Petazzoni, Lorenzo Pieralisi, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium, linux-pci, linux-arm-kernel,
	linux-kernel

On Thu, May 28, 2020 at 06:38:09PM +0200, Pali Rohár wrote:
> On Thursday 28 May 2020 11:26:04 Bjorn Helgaas wrote:
> > On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> > > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > > root bridge, the aardvark driver throws the following error message:
> > > 
> > >   advk-pcie d0070000.pcie: config read/write timed out
> > > 
> > > Obviously accessing PCIe registers of disconnected card is not possible.
> > > 
> > > Extend check in advk_pcie_valid_device() function for validating
> > > availability of PCIe bus. If PCIe link is down, then the device is marked
> > > as Not Found and the driver does not try to access these registers.
> > > 
> > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > ---
> > >  drivers/pci/controller/pci-aardvark.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > > index 90ff291c24f0..53a4cfd7d377 100644
> > > --- a/drivers/pci/controller/pci-aardvark.c
> > > +++ b/drivers/pci/controller/pci-aardvark.c
> > > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> > >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> > >  		return false;
> > >  
> > > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > > +		return false;
> > 
> > I don't think this is the right fix.  This makes it racy because the
> > link may go down after we call advk_pcie_valid_device() but before we
> > perform the config read.
> 
> Yes, it is racy, but I do not think it cause problems. Trying to read
> PCIe registers when device is not connected cause just those timeouts,
> printing error message and increased delay in advk_pcie_wait_pio() due
> to polling loop. This patch reduce unnecessary access to PCIe registers
> when advk_pcie_wait_pio() polling just fail.

What happens when the device is removed after advk_pcie_link_up()
returns true, but before we actually do the config access?


* Re: [PATCH] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-06-30 13:51     ` Bjorn Helgaas
@ 2020-06-30 14:04       ` Pali Rohár
  2020-06-30 14:58         ` Bjorn Helgaas
  0 siblings, 1 reply; 32+ messages in thread
From: Pali Rohár @ 2020-06-30 14:04 UTC
  To: Bjorn Helgaas
  Cc: Thomas Petazzoni, Lorenzo Pieralisi, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium, linux-pci, linux-arm-kernel,
	linux-kernel

On Tuesday 30 June 2020 08:51:27 Bjorn Helgaas wrote:
> On Thu, May 28, 2020 at 06:38:09PM +0200, Pali Rohár wrote:
> > On Thursday 28 May 2020 11:26:04 Bjorn Helgaas wrote:
> > > On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> > > > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > > > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > > > root bridge, the aardvark driver throws the following error message:
> > > > 
> > > >   advk-pcie d0070000.pcie: config read/write timed out
> > > > 
> > > > Obviously accessing PCIe registers of disconnected card is not possible.
> > > > 
> > > > Extend check in advk_pcie_valid_device() function for validating
> > > > availability of PCIe bus. If PCIe link is down, then the device is marked
> > > > as Not Found and the driver does not try to access these registers.
> > > > 
> > > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > > ---
> > > >  drivers/pci/controller/pci-aardvark.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > > 
> > > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > > > index 90ff291c24f0..53a4cfd7d377 100644
> > > > --- a/drivers/pci/controller/pci-aardvark.c
> > > > +++ b/drivers/pci/controller/pci-aardvark.c
> > > > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> > > >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> > > >  		return false;
> > > >  
> > > > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > > > +		return false;
> > > 
> > > I don't think this is the right fix.  This makes it racy because the
> > > link may go down after we call advk_pcie_valid_device() but before we
> > > perform the config read.
> > 
> > Yes, it is racy, but I do not think it cause problems. Trying to read
> > PCIe registers when device is not connected cause just those timeouts,
> > printing error message and increased delay in advk_pcie_wait_pio() due
> > to polling loop. This patch reduce unnecessary access to PCIe registers
> > when advk_pcie_wait_pio() polling just fail.
> 
> What happens when the device is removed after advk_pcie_link_up()
> returns true, but before we actually do the config access?

Do you mean removing the device physically at runtime? I was told that
our board would crash or issue a reset. Removing a device from a mini
PCIe slot without powering off is not supported.

Anyway, currently we try to read from device registers even when no
device is connected. So if advk_pcie_link_up() returns true and the
device is disconnected afterwards (and the board and kernel somehow stay
alive), I guess it would behave the same as without this patch: the
kernel starts reading from the register and waits until the timeout
expires. As the device is not connected there would be no answer, so the
kernel prints the error message to dmesg (same as in the commit message)
and returns an error that the read failed.


* Re: [PATCH] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-06-30 14:04       ` Pali Rohár
@ 2020-06-30 14:58         ` Bjorn Helgaas
  2020-07-01  8:08           ` Pali Rohár
  0 siblings, 1 reply; 32+ messages in thread
From: Bjorn Helgaas @ 2020-06-30 14:58 UTC
  To: Pali Rohár
  Cc: Thomas Petazzoni, Lorenzo Pieralisi, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium, linux-pci, linux-arm-kernel,
	linux-kernel

On Tue, Jun 30, 2020 at 04:04:20PM +0200, Pali Rohár wrote:
> On Tuesday 30 June 2020 08:51:27 Bjorn Helgaas wrote:
> > On Thu, May 28, 2020 at 06:38:09PM +0200, Pali Rohár wrote:
> > > On Thursday 28 May 2020 11:26:04 Bjorn Helgaas wrote:
> > > > On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> > > > > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > > > > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > > > > root bridge, the aardvark driver throws the following error message:
> > > > > 
> > > > >   advk-pcie d0070000.pcie: config read/write timed out
> > > > > 
> > > > > Obviously accessing PCIe registers of disconnected card is not possible.
> > > > > 
> > > > > Extend check in advk_pcie_valid_device() function for validating
> > > > > availability of PCIe bus. If PCIe link is down, then the device is marked
> > > > > as Not Found and the driver does not try to access these registers.
> > > > > 
> > > > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > > > ---
> > > > >  drivers/pci/controller/pci-aardvark.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > > > > index 90ff291c24f0..53a4cfd7d377 100644
> > > > > --- a/drivers/pci/controller/pci-aardvark.c
> > > > > +++ b/drivers/pci/controller/pci-aardvark.c
> > > > > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> > > > >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> > > > >  		return false;
> > > > >  
> > > > > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > > > > +		return false;
> > > > 
> > > > I don't think this is the right fix.  This makes it racy because the
> > > > link may go down after we call advk_pcie_valid_device() but before we
> > > > perform the config read.
> > > 
> > > Yes, it is racy, but I do not think it cause problems. Trying to read
> > > PCIe registers when device is not connected cause just those timeouts,
> > > printing error message and increased delay in advk_pcie_wait_pio() due
> > > to polling loop. This patch reduce unnecessary access to PCIe registers
> > > when advk_pcie_wait_pio() polling just fail.
> > 
> > What happens when the device is removed after advk_pcie_link_up()
> > returns true, but before we actually do the config access?
> 
> Do you mean to remove device physically at runtime? I was told that our
> board would crash or issue reset. Removing device from mini PCIe slot
> without power off is not supported.

Right, I don't think PCIe mini cards support hotplug.

> Anyway, currently we are trying to read from device registers even when
> no device is connected. So when advk_pcie_link_up() returns true and
> after that device is not connected (somehow board and kernel would be
> still alive) I guess that it would behave as without applying this
> patch. So kernel starts reading from register and would wait until
> timeout expires. As device is not connected there would be no answer,
> so kernel print error message to dmesg (same as in commit message) and
> returns error that read failed.

OK, so if I understand correctly, checking advk_pcie_link_up() is
strictly an optimization.  If we guess wrong (e.g., after calling
advk_pcie_link_up(), the link went down because the card was removed,
DPC triggered, etc), the only bad thing is that we wait for a timeout;
it never causes a crash.

If that's the case, I'm fine with this.  But please add a comment to
that effect.

I think several other drivers check for the link being up because we
actually crash if we try to read config space when the link is down.
That's what I was trying to avoid here.

Bjorn
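
One way the requested comment might read is sketched below as a
standalone model, not as the actual follow-up patch. The struct and the
model_* names are hypothetical simplifications of the driver's types,
and the comment wording is an assumption based on the discussion above.
MODEL_PCI_SLOT mirrors the kernel's PCI_SLOT() bit layout:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the driver's private structure. */
struct model_pcie {
	uint8_t root_bus_nr;
	bool link_up;
};

/* Same bit layout as the kernel's PCI_SLOT(): device number in bits 7:3. */
#define MODEL_PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)

static bool model_link_up(const struct model_pcie *pcie)
{
	return pcie->link_up;
}

static bool model_valid_device(const struct model_pcie *pcie,
			       uint8_t bus_nr, unsigned int devfn)
{
	if (bus_nr == pcie->root_bus_nr && MODEL_PCI_SLOT(devfn) != 0)
		return false;

	/*
	 * Optimization only: skip the access when the link is already
	 * known to be down, so the caller does not wait for a timeout.
	 * The check is racy. If the link drops after it, the access is
	 * still issued and simply times out, as it would without it.
	 */
	if (bus_nr != pcie->root_bus_nr && !model_link_up(pcie))
		return false;

	return true;
}
```

In this model the root bus stays accessible at slot 0 regardless of link
state, while any other bus is rejected early only while the link is
down.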


* Re: [PATCH] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-06-30 14:58         ` Bjorn Helgaas
@ 2020-07-01  8:08           ` Pali Rohár
  0 siblings, 0 replies; 32+ messages in thread
From: Pali Rohár @ 2020-07-01  8:08 UTC
  To: Bjorn Helgaas
  Cc: Thomas Petazzoni, Lorenzo Pieralisi, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium, linux-pci, linux-arm-kernel,
	linux-kernel

On Tuesday 30 June 2020 09:58:48 Bjorn Helgaas wrote:
> On Tue, Jun 30, 2020 at 04:04:20PM +0200, Pali Rohár wrote:
> > On Tuesday 30 June 2020 08:51:27 Bjorn Helgaas wrote:
> > > On Thu, May 28, 2020 at 06:38:09PM +0200, Pali Rohár wrote:
> > > > On Thursday 28 May 2020 11:26:04 Bjorn Helgaas wrote:
> > > > > On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> > > > > > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > > > > > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > > > > > root bridge, the aardvark driver throws the following error message:
> > > > > > 
> > > > > >   advk-pcie d0070000.pcie: config read/write timed out
> > > > > > 
> > > > > > Obviously accessing PCIe registers of disconnected card is not possible.
> > > > > > 
> > > > > > Extend check in advk_pcie_valid_device() function for validating
> > > > > > availability of PCIe bus. If PCIe link is down, then the device is marked
> > > > > > as Not Found and the driver does not try to access these registers.
> > > > > > 
> > > > > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > > > > ---
> > > > > >  drivers/pci/controller/pci-aardvark.c | 3 +++
> > > > > >  1 file changed, 3 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > > > > > index 90ff291c24f0..53a4cfd7d377 100644
> > > > > > --- a/drivers/pci/controller/pci-aardvark.c
> > > > > > +++ b/drivers/pci/controller/pci-aardvark.c
> > > > > > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> > > > > >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> > > > > >  		return false;
> > > > > >  
> > > > > > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > > > > > +		return false;
> > > > > 
> > > > > I don't think this is the right fix.  This makes it racy because the
> > > > > link may go down after we call advk_pcie_valid_device() but before we
> > > > > perform the config read.
> > > > 
> > > > Yes, it is racy, but I do not think it cause problems. Trying to read
> > > > PCIe registers when device is not connected cause just those timeouts,
> > > > printing error message and increased delay in advk_pcie_wait_pio() due
> > > > to polling loop. This patch reduce unnecessary access to PCIe registers
> > > > when advk_pcie_wait_pio() polling just fail.
> > > 
> > > What happens when the device is removed after advk_pcie_link_up()
> > > returns true, but before we actually do the config access?
> > 
> > Do you mean to remove device physically at runtime? I was told that our
> > board would crash or issue reset. Removing device from mini PCIe slot
> > without power off is not supported.
> 
> Right, I don't think PCIe mini cards support hotplug.
> 
> > Anyway, currently we are trying to read from device registers even when
> > no device is connected. So when advk_pcie_link_up() returns true and
> > after that device is not connected (somehow board and kernel would be
> > still alive) I guess that it would behave as without applying this
> > patch. So kernel starts reading from register and would wait until
> > timeout expires. As device is not connected there would be no answer,
> > so kernel print error message to dmesg (same as in commit message) and
> > returns error that read failed.
> 
> OK, so if I understand correctly, checking advk_pcie_link_up() is
> strictly an optimization.  If we guess wrong (e.g., after calling
> advk_pcie_link_up(), the link went down because the card was removed,
> DPC triggered, etc), the only bad thing is that we wait for a timeout;
> it never causes a crash.

Yes.

> If that's the case, I'm fine with this.  But please add a comment to
> that effect.

OK, I will send a V2 with an updated commit message.

> I think several other drivers check for the link being up because we
> actually crash if we try to read config space when the link is down.
> That's what I was trying to avoid here.
> 
> Bjorn


* [PATCH v2] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-05-28 14:31 [PATCH] PCI: aardvark: Don't touch PCIe registers if no card connected Pali Rohár
  2020-05-28 16:26 ` Bjorn Helgaas
@ 2020-07-01  8:20 ` Pali Rohár
  2020-07-01 21:34   ` Bjorn Helgaas
  2020-07-02  8:30 ` [PATCH v3] " Pali Rohár
  2 siblings, 1 reply; 32+ messages in thread
From: Pali Rohár @ 2020-07-01  8:20 UTC (permalink / raw)
  To: Thomas Petazzoni, Lorenzo Pieralisi, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium
  Cc: linux-pci, linux-arm-kernel, linux-kernel

When there is no PCIe card connected and advk_pcie_rd_conf() or
advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
root bridge, the aardvark driver throws the following error message:

  advk-pcie d0070000.pcie: config read/write timed out

Obviously accessing PCIe registers of disconnected card is not possible.

Extend check in advk_pcie_valid_device() function for validating
availability of PCIe bus. If PCIe link is down, then the device is marked
as Not Found and the driver does not try to access these registers.

This is just an optimization to prevent accessing PCIe registers when card
is disconnected. Trying to access PCIe registers of disconnected card does
not cause any crash, kernel just needs to wait for a timeout. So if card
disappear immediately after checking for PCIe link (before accessing PCIe
registers), it does not cause any problems.

Signed-off-by: Pali Rohár <pali@kernel.org>

---
Changes in V2:
* Update commit message, mention that this is optimization
---
 drivers/pci/controller/pci-aardvark.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index 90ff291c24f0..53a4cfd7d377 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
 	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
 		return false;
 
+	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
+		return false;
+
 	return true;
 }
 
-- 
2.20.1



* Re: [PATCH v2] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-01  8:20 ` [PATCH v2] " Pali Rohár
@ 2020-07-01 21:34   ` Bjorn Helgaas
  2020-07-02  8:23     ` Pali Rohár
  0 siblings, 1 reply; 32+ messages in thread
From: Bjorn Helgaas @ 2020-07-01 21:34 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Thomas Petazzoni, Lorenzo Pieralisi, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium, linux-pci, linux-arm-kernel,
	linux-kernel

On Wed, Jul 01, 2020 at 10:20:44AM +0200, Pali Rohár wrote:
> When there is no PCIe card connected and advk_pcie_rd_conf() or
> advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> root bridge, the aardvark driver throws the following error message:
> 
>   advk-pcie d0070000.pcie: config read/write timed out
> 
> Obviously accessing PCIe registers of disconnected card is not possible.
> 
> Extend check in advk_pcie_valid_device() function for validating
> availability of PCIe bus. If PCIe link is down, then the device is marked
> as Not Found and the driver does not try to access these registers.
> 
> This is just an optimization to prevent accessing PCIe registers when card
> is disconnected. Trying to access PCIe registers of disconnected card does
> not cause any crash, kernel just needs to wait for a timeout. So if card
> disappear immediately after checking for PCIe link (before accessing PCIe
> registers), it does not cause any problems.

Thanks, this is good.  I'd really like a short comment in the code as
well, because this sort of link-up check tends to get copied to new
drivers where it shouldn't be used, e.g., something like this:

  /*
   * If the link goes down after we check for link-up, nothing bad
   * happens but the config access times out.
   */

> Signed-off-by: Pali Rohár <pali@kernel.org>
> 
> ---
> Changes in V2:
> * Update commit message, mention that this is optimization
> ---
>  drivers/pci/controller/pci-aardvark.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> index 90ff291c24f0..53a4cfd7d377 100644
> --- a/drivers/pci/controller/pci-aardvark.c
> +++ b/drivers/pci/controller/pci-aardvark.c
> @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
>  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
>  		return false;
>  
> +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> +		return false;
> +
>  	return true;
>  }
>  
> -- 
> 2.20.1
> 


* Re: [PATCH v2] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-01 21:34   ` Bjorn Helgaas
@ 2020-07-02  8:23     ` Pali Rohár
  0 siblings, 0 replies; 32+ messages in thread
From: Pali Rohár @ 2020-07-02  8:23 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Thomas Petazzoni, Lorenzo Pieralisi, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium, linux-pci, linux-arm-kernel,
	linux-kernel

On Wednesday 01 July 2020 16:34:42 Bjorn Helgaas wrote:
> On Wed, Jul 01, 2020 at 10:20:44AM +0200, Pali Rohár wrote:
> > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > root bridge, the aardvark driver throws the following error message:
> > 
> >   advk-pcie d0070000.pcie: config read/write timed out
> > 
> > Obviously accessing PCIe registers of disconnected card is not possible.
> > 
> > Extend check in advk_pcie_valid_device() function for validating
> > availability of PCIe bus. If PCIe link is down, then the device is marked
> > as Not Found and the driver does not try to access these registers.
> > 
> > This is just an optimization to prevent accessing PCIe registers when card
> > is disconnected. Trying to access PCIe registers of disconnected card does
> > not cause any crash, kernel just needs to wait for a timeout. So if card
> > disappear immediately after checking for PCIe link (before accessing PCIe
> > registers), it does not cause any problems.
> 
> Thanks, this is good.  I'd really like a short comment in the code as
> well, because this sort of link-up check tends to get copied to new
> drivers where it shouldn't be used, e.g., something like this:
> 
>   /*
>    * If the link goes down after we check for link-up, nothing bad
>    * happens but the config access times out.
>    */

Ok, it makes sense! I will send a new patch version.

> > Signed-off-by: Pali Rohár <pali@kernel.org>
> > 
> > ---
> > Changes in V2:
> > * Update commit message, mention that this is optimization
> > ---
> >  drivers/pci/controller/pci-aardvark.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > index 90ff291c24f0..53a4cfd7d377 100644
> > --- a/drivers/pci/controller/pci-aardvark.c
> > +++ b/drivers/pci/controller/pci-aardvark.c
> > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> >  		return false;
> >  
> > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > +		return false;
> > +
> >  	return true;
> >  }
> >  
> > -- 
> > 2.20.1
> > 


* [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-05-28 14:31 [PATCH] PCI: aardvark: Don't touch PCIe registers if no card connected Pali Rohár
  2020-05-28 16:26 ` Bjorn Helgaas
  2020-07-01  8:20 ` [PATCH v2] " Pali Rohár
@ 2020-07-02  8:30 ` Pali Rohár
  2020-07-09 11:35   ` Lorenzo Pieralisi
  2 siblings, 1 reply; 32+ messages in thread
From: Pali Rohár @ 2020-07-02  8:30 UTC (permalink / raw)
  To: Thomas Petazzoni, Lorenzo Pieralisi, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium
  Cc: linux-pci, linux-arm-kernel, linux-kernel

When there is no PCIe card connected and advk_pcie_rd_conf() or
advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
root bridge, the aardvark driver throws the following error message:

  advk-pcie d0070000.pcie: config read/write timed out

Obviously accessing PCIe registers of disconnected card is not possible.

Extend check in advk_pcie_valid_device() function for validating
availability of PCIe bus. If PCIe link is down, then the device is marked
as Not Found and the driver does not try to access these registers.

This is just an optimization to prevent accessing PCIe registers when card
is disconnected. Trying to access PCIe registers of disconnected card does
not cause any crash, kernel just needs to wait for a timeout. So if card
disappear immediately after checking for PCIe link (before accessing PCIe
registers), it does not cause any problems.

Signed-off-by: Pali Rohár <pali@kernel.org>

---
Changes in V3:
* Add comment to the code
Changes in V2:
* Update commit message, mention that this is optimization
---
 drivers/pci/controller/pci-aardvark.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index 90ff291c24f0..d18f389b36a1 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -644,6 +644,13 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
 	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
 		return false;
 
+	/*
+	 * If the link goes down after we check for link-up, nothing bad
+	 * happens but the config access times out.
+	 */
+	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
+		return false;
+
 	return true;
 }
 
-- 
2.20.1
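
The decision logic of the patched advk_pcie_valid_device() can be tried
out in userspace with a minimal sketch like the one below. The names
struct vd_model, vd_valid_device() and the VD_* macros are hypothetical;
the macros are assumed to use the same bit layout as the kernel's
PCI_SLOT() and PCI_DEVFN(), and the link state is a plain flag instead of
a register read.

```c
#include <stdbool.h>

/* Assumed to mirror the bit layout of the kernel's PCI_DEVFN()/PCI_SLOT(). */
#define VD_PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define VD_PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)

/* Hypothetical reduced driver state: just what the check needs. */
struct vd_model {
	int root_bus_nr;
	bool link_up;
};

static bool vd_valid_device(const struct vd_model *pcie, int bus_nr, int devfn)
{
	/* Only slot 0 (the emulated root bridge) exists on the root bus. */
	if (bus_nr == pcie->root_bus_nr && VD_PCI_SLOT(devfn) != 0)
		return false;

	/*
	 * Buses behind the bridge are worth probing only while the link
	 * is up. If the link drops after this check, the later config
	 * access is expected to time out rather than crash.
	 */
	if (bus_nr != pcie->root_bus_nr && !pcie->link_up)
		return false;

	return true;
}
```

With the link down, slot 0 on the root bus stays valid, so the emulated
root bridge is still enumerated, while every device on a subordinate bus
is reported as not present without touching the hardware.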



* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-02  8:30 ` [PATCH v3] " Pali Rohár
@ 2020-07-09 11:35   ` Lorenzo Pieralisi
  2020-07-09 12:22     ` Pali Rohár
  0 siblings, 1 reply; 32+ messages in thread
From: Lorenzo Pieralisi @ 2020-07-09 11:35 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Thomas Petazzoni, Andrew Murray, Bjorn Helgaas, Marek Behún,
	Remi Pommarel, Tomasz Maciej Nowak, Xogium, linux-pci,
	linux-arm-kernel, linux-kernel

On Thu, Jul 02, 2020 at 10:30:36AM +0200, Pali Rohár wrote:
> When there is no PCIe card connected and advk_pcie_rd_conf() or
> advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> root bridge, the aardvark driver throws the following error message:
> 
>   advk-pcie d0070000.pcie: config read/write timed out
> 
> Obviously accessing PCIe registers of disconnected card is not possible.
> 
> Extend check in advk_pcie_valid_device() function for validating
> availability of PCIe bus. If PCIe link is down, then the device is marked
> as Not Found and the driver does not try to access these registers.
> 
> This is just an optimization to prevent accessing PCIe registers when card
> is disconnected. Trying to access PCIe registers of disconnected card does
> not cause any crash, kernel just needs to wait for a timeout. So if card
> disappear immediately after checking for PCIe link (before accessing PCIe
> registers), it does not cause any problems.
> 
> Signed-off-by: Pali Rohár <pali@kernel.org>
> 
> ---
> Changes in V3:
> * Add comment to the code
> Changes in V2:
> * Update commit message, mention that this is optimization
> ---
>  drivers/pci/controller/pci-aardvark.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> index 90ff291c24f0..d18f389b36a1 100644
> --- a/drivers/pci/controller/pci-aardvark.c
> +++ b/drivers/pci/controller/pci-aardvark.c
> @@ -644,6 +644,13 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
>  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
>  		return false;
>  
> +	/*
> +	 * If the link goes down after we check for link-up, nothing bad
> +	 * happens but the config access times out.
> +	 */
> +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> +		return false;
> +
>  	return true;
>  }

Question: this basically means that you can only effectively enumerate
bus number == root_bus_nr, and AFAICS if the link did not come up at
probe it never will, will it?

Isn't this equivalent to limiting the bus numbers the bridge is capable
of handling?

Reworded: if the link does not come up in advk_pcie_setup_hw(), what's
the point of trying to enumerate the bus hierarchy below the root bus?

Thanks,
Lorenzo


* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-09 11:35   ` Lorenzo Pieralisi
@ 2020-07-09 12:22     ` Pali Rohár
  2020-07-09 14:47       ` Lorenzo Pieralisi
  0 siblings, 1 reply; 32+ messages in thread
From: Pali Rohár @ 2020-07-09 12:22 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Thomas Petazzoni, Andrew Murray, Bjorn Helgaas, Marek Behún,
	Remi Pommarel, Tomasz Maciej Nowak, Xogium, linux-pci,
	linux-arm-kernel, linux-kernel

On Thursday 09 July 2020 12:35:09 Lorenzo Pieralisi wrote:
> On Thu, Jul 02, 2020 at 10:30:36AM +0200, Pali Rohár wrote:
> > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > root bridge, the aardvark driver throws the following error message:
> > 
> >   advk-pcie d0070000.pcie: config read/write timed out
> > 
> > Obviously accessing PCIe registers of disconnected card is not possible.
> > 
> > Extend check in advk_pcie_valid_device() function for validating
> > availability of PCIe bus. If PCIe link is down, then the device is marked
> > as Not Found and the driver does not try to access these registers.
> > 
> > This is just an optimization to prevent accessing PCIe registers when card
> > is disconnected. Trying to access PCIe registers of disconnected card does
> > not cause any crash, kernel just needs to wait for a timeout. So if card
> > disappear immediately after checking for PCIe link (before accessing PCIe
> > registers), it does not cause any problems.
> > 
> > Signed-off-by: Pali Rohár <pali@kernel.org>
> > 
> > ---
> > Changes in V3:
> > * Add comment to the code
> > Changes in V2:
> > * Update commit message, mention that this is optimization
> > ---
> >  drivers/pci/controller/pci-aardvark.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > index 90ff291c24f0..d18f389b36a1 100644
> > --- a/drivers/pci/controller/pci-aardvark.c
> > +++ b/drivers/pci/controller/pci-aardvark.c
> > @@ -644,6 +644,13 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> >  		return false;
> >  
> > +	/*
> > +	 * If the link goes down after we check for link-up, nothing bad
> > +	 * happens but the config access times out.
> > +	 */
> > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > +		return false;
> > +
> >  	return true;
> >  }
> 
> Question: this basically means that you can only effectively enumerate
> bus number == root_bus_nr and AFAICS if at probe the link did not
> come up it will never do, will it ?
> 
> Isn't this equivalent to limiting the bus numbers the bridge is capable
> of handling ?
> 
> Reworded: if in advk_pcie_setup_hw() the link does not come up, what's
> the point of trying to enumerate the bus hierarchy below the root bus ?

Hello Lorenzo!

The PCIe link can theoretically come up even after boot, but the aardvark
driver currently does not support link detection at runtime. So it checks
for and enumerates devices only at probe time.

I do not know whether the hardware has some mechanism to inform the kernel
that the PCIe link came up (or went down) and re-enumeration is required,
or whether the only option is polling via advk_pcie_link_up().

So if a device is not visible at probe time, it does not appear in the
system and cannot be used. This is the current state.

Just to note that our hardware does not support physical hotplug of
mPCIe cards. You need to connect the card while the board is powered off.

So if the PCIe link is not up at aardvark probe time, then trying to
enumerate devices under the (software) root bridge is not needed. But the
software root bridge device itself still needs to be registered and
enumerated, and currently both are done by one (recursive) call to
pci_host_probe().
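
The polling option mentioned above could look roughly like the following
sketch. It is a userspace model, not driver code: poll_for_link(), the
link_up_fn callback type and the countdown helper are hypothetical names,
and a real implementation would sleep between attempts instead of
spinning.

```c
#include <stdbool.h>

/* Hypothetical callback standing in for a link-status register read. */
typedef bool (*link_up_fn)(void *ctx);

/*
 * Poll for link-up a bounded number of times. Returns the 1-based
 * attempt on which the link was seen up, or 0 if it never came up.
 */
static unsigned int poll_for_link(link_up_fn link_up, void *ctx,
				  unsigned int max_attempts)
{
	for (unsigned int attempt = 1; attempt <= max_attempts; attempt++) {
		if (link_up(ctx))
			return attempt;
		/* A real driver would sleep here between attempts. */
	}
	return 0;
}

/* Demo helper: reports link-up once the countdown has reached zero. */
static bool countdown_link_up(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

Bounding the number of attempts keeps the cost of an empty slot
predictable, which is the same concern that motivates skipping config
accesses when the link is down.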


* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-09 12:22     ` Pali Rohár
@ 2020-07-09 14:47       ` Lorenzo Pieralisi
  2020-07-09 15:09         ` Pali Rohár
  0 siblings, 1 reply; 32+ messages in thread
From: Lorenzo Pieralisi @ 2020-07-09 14:47 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Thomas Petazzoni, Andrew Murray, Bjorn Helgaas, Marek Behún,
	Remi Pommarel, Tomasz Maciej Nowak, Xogium, linux-pci,
	linux-arm-kernel, linux-kernel

On Thu, Jul 09, 2020 at 02:22:08PM +0200, Pali Rohár wrote:
> On Thursday 09 July 2020 12:35:09 Lorenzo Pieralisi wrote:
> > On Thu, Jul 02, 2020 at 10:30:36AM +0200, Pali Rohár wrote:
> > > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > > root bridge, the aardvark driver throws the following error message:
> > > 
> > >   advk-pcie d0070000.pcie: config read/write timed out
> > > 
> > > Obviously accessing PCIe registers of disconnected card is not possible.
> > > 
> > > Extend check in advk_pcie_valid_device() function for validating
> > > availability of PCIe bus. If PCIe link is down, then the device is marked
> > > as Not Found and the driver does not try to access these registers.
> > > 
> > > This is just an optimization to prevent accessing PCIe registers when card
> > > is disconnected. Trying to access PCIe registers of disconnected card does
> > > not cause any crash, kernel just needs to wait for a timeout. So if card
> > > disappear immediately after checking for PCIe link (before accessing PCIe
> > > registers), it does not cause any problems.
> > > 
> > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > 
> > > ---
> > > Changes in V3:
> > > * Add comment to the code
> > > Changes in V2:
> > > * Update commit message, mention that this is optimization
> > > ---
> > >  drivers/pci/controller/pci-aardvark.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > > index 90ff291c24f0..d18f389b36a1 100644
> > > --- a/drivers/pci/controller/pci-aardvark.c
> > > +++ b/drivers/pci/controller/pci-aardvark.c
> > > @@ -644,6 +644,13 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> > >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> > >  		return false;
> > >  
> > > +	/*
> > > +	 * If the link goes down after we check for link-up, nothing bad
> > > +	 * happens but the config access times out.
> > > +	 */
> > > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > > +		return false;
> > > +
> > >  	return true;
> > >  }
> > 
> > Question: this basically means that you can only effectively enumerate
> > bus number == root_bus_nr and AFAICS if at probe the link did not
> > come up it will never do, will it ?
> > 
> > Isn't this equivalent to limiting the bus numbers the bridge is capable
> > of handling ?
> > 
> > Reworded: if in advk_pcie_setup_hw() the link does not come up, what's
> > the point of trying to enumerate the bus hierarchy below the root bus ?
> 
> Hello Lorenzo!
> 
> PCIe link can theoretically come up even after boot, but aardvark driver
> currently does not support link detection at runtime. So it checks and
> enumerate device only at probe time.

If the link is not up at probe, enumerating devices below the root
bus is basically useless, and that's actually what is causing the
delays you are fixing. Is this correct?

> I do not know if hardware has some mechanism to inform kernel that PCIe
> link come up (or down) and re-enumeration is required. Or the only
> option is polling via advk_pcie_link_up().
> 
> So if device is not visible at the probe time then it would not appear
> in system and cannot be used. This is current state.
> 
> Just to note that our hardware does not support physical hotplug of
> mPCIe cards. You need to connect card when board is powered off.
> 
> So if at the aardvark probe time PCIe link is not up then trying to
> enumerate devices under (software) root bridge is not needed. But it is
> needed to register/enumerate software root bridge device and currently
> both is done by one (recursive) call pci_host_probe().

I understand that, but the bridge bus resource can be trimmed to just
contain the root bus, because that's the only one where there is a
chance you can enumerate a device.

I would like to get Bjorn's opinion on this. I don't like these "link is
up" checks in config accessors: they are racy, and honestly a run-time
check does not make much sense, since either it is always true/false or
it is inevitably racy. I was wondering if we can find an alternative
solution, but I am not sure the one I suggested above is better than
this patch.

Lorenzo
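
The alternative suggested above, trimming the bus range so that nothing
below the root bus is scanned, can be modelled as follows. This is only a
sketch of the idea: struct bus_range, trim_bus_range() and bus_in_range()
are hypothetical helpers and do not use the kernel's resource API.

```c
#include <stdbool.h>

/* Hypothetical inclusive range of bus numbers owned by the host bridge. */
struct bus_range {
	int start;
	int end;
};

/*
 * If the link is down at probe time, shrink the range to the root bus
 * only, so that enumeration never descends to subordinate buses.
 */
static struct bus_range trim_bus_range(struct bus_range full,
				       bool link_up_at_probe)
{
	if (!link_up_at_probe)
		full.end = full.start;
	return full;
}

static bool bus_in_range(struct bus_range r, int bus_nr)
{
	return bus_nr >= r.start && bus_nr <= r.end;
}
```

Unlike the run-time check in the config accessors, this decision is taken
once at probe, so it is not racy, but it also cannot react to a link that
changes state later.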


* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-09 14:47       ` Lorenzo Pieralisi
@ 2020-07-09 15:09         ` Pali Rohár
  2020-07-10  9:18           ` Lorenzo Pieralisi
  0 siblings, 1 reply; 32+ messages in thread
From: Pali Rohár @ 2020-07-09 15:09 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Thomas Petazzoni, Andrew Murray, Bjorn Helgaas, Marek Behún,
	Remi Pommarel, Tomasz Maciej Nowak, Xogium, linux-pci,
	linux-arm-kernel, linux-kernel

On Thursday 09 July 2020 15:47:01 Lorenzo Pieralisi wrote:
> On Thu, Jul 09, 2020 at 02:22:08PM +0200, Pali Rohár wrote:
> > On Thursday 09 July 2020 12:35:09 Lorenzo Pieralisi wrote:
> > > On Thu, Jul 02, 2020 at 10:30:36AM +0200, Pali Rohár wrote:
> > > > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > > > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > > > root bridge, the aardvark driver throws the following error message:
> > > > 
> > > >   advk-pcie d0070000.pcie: config read/write timed out
> > > > 
> > > > Obviously accessing PCIe registers of disconnected card is not possible.
> > > > 
> > > > Extend check in advk_pcie_valid_device() function for validating
> > > > availability of PCIe bus. If PCIe link is down, then the device is marked
> > > > as Not Found and the driver does not try to access these registers.
> > > > 
> > > > This is just an optimization to prevent accessing PCIe registers when card
> > > > is disconnected. Trying to access PCIe registers of disconnected card does
> > > > not cause any crash, kernel just needs to wait for a timeout. So if card
> > > > disappear immediately after checking for PCIe link (before accessing PCIe
> > > > registers), it does not cause any problems.
> > > > 
> > > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > > 
> > > > ---
> > > > Changes in V3:
> > > > * Add comment to the code
> > > > Changes in V2:
> > > > * Update commit message, mention that this is optimization
> > > > ---
> > > >  drivers/pci/controller/pci-aardvark.c | 7 +++++++
> > > >  1 file changed, 7 insertions(+)
> > > > 
> > > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > > > index 90ff291c24f0..d18f389b36a1 100644
> > > > --- a/drivers/pci/controller/pci-aardvark.c
> > > > +++ b/drivers/pci/controller/pci-aardvark.c
> > > > @@ -644,6 +644,13 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> > > >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> > > >  		return false;
> > > >  
> > > > +	/*
> > > > +	 * If the link goes down after we check for link-up, nothing bad
> > > > +	 * happens but the config access times out.
> > > > +	 */
> > > > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > > > +		return false;
> > > > +
> > > >  	return true;
> > > >  }
> > > 
> > > Question: this basically means that you can only effectively enumerate
> > > bus number == root_bus_nr and AFAICS if at probe the link did not
> > > come up it will never do, will it ?
> > > 
> > > Isn't this equivalent to limiting the bus numbers the bridge is capable
> > > of handling ?
> > > 
> > > Reworded: if in advk_pcie_setup_hw() the link does not come up, what's
> > > the point of trying to enumerate the bus hierarchy below the root bus ?
> > 
> > Hello Lorenzo!
> > 
> > PCIe link can theoretically come up even after boot, but aardvark driver
> > currently does not support link detection at runtime. So it checks and
> > enumerate device only at probe time.
> 
> If the link is not up at probe enumerating devices below the root
> bus is basically useless and that's actually what is causing the
> delays you are fixing. Is this correct ?

Yes, this is one of the delays (but not the only one).

> > I do not know if hardware has some mechanism to inform kernel that PCIe
> > link come up (or down) and re-enumeration is required. Or the only
> > option is polling via advk_pcie_link_up().
> > 
> > So if device is not visible at the probe time then it would not appear
> > in system and cannot be used. This is current state.
> > 
> > Just to note that our hardware does not support physical hotplug of
> > mPCIe cards. You need to connect card when board is powered off.
> > 
> > So if at the aardvark probe time PCIe link is not up then trying to
> > enumerate devices under (software) root bridge is not needed. But it is
> > needed to register/enumerate software root bridge device and currently
> > both is done by one (recursive) call pci_host_probe().
> 
> I understand that but the bridge bus resource can be trimmed to just
> contain the root bus because that's the only one where there is a
> chance you can enumerate a device.

Is it possible to register only the root bridge, without an endpoint?

> I would like to get Bjorn's opinion on this, I don't like these "link is
> up" checks in config accessors (they are racy and honestly it is a
> run-time check that does not make much sense, either it is always
> true/false or it is inevitably racy)

It is a runtime check, but it does not have to be always true/false. I
have tested several Compex wifi cards, and under certain conditions they
"disappear" from the bus during usage.

So I think it still makes sense to do this "fast" check, as it is only
an optimization.

> I was wondering if we can find an
> alternative solution but I am not sure the one I suggested above is
> better than this patch.

I do not know if it helps in the situation when a card disappears from
the bus at runtime...


* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-09 15:09         ` Pali Rohár
@ 2020-07-10  9:18           ` Lorenzo Pieralisi
  2020-07-10 15:44             ` Pali Rohár
  2020-07-13  8:27             ` Pali Rohár
  0 siblings, 2 replies; 32+ messages in thread
From: Lorenzo Pieralisi @ 2020-07-10  9:18 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Thomas Petazzoni, Andrew Murray, Bjorn Helgaas, Marek Behún,
	Remi Pommarel, Tomasz Maciej Nowak, Xogium, linux-pci,
	linux-arm-kernel, linux-kernel

On Thu, Jul 09, 2020 at 05:09:59PM +0200, Pali Rohár wrote:

[...]

> > I understand that but the bridge bus resource can be trimmed to just
> > contain the root bus because that's the only one where there is a
> > chance you can enumerate a device.
> 
> It is possible to register only root bridge without endpoint?

It is possible to register the root bridge with a trimmed IORESOURCE_BUS
so that you don't enumerate anything other than the root port.

> > I would like to get Bjorn's opinion on this, I don't like these "link is
> > up" checks in config accessors (they are racy and honestly it is a
> > run-time check that does not make much sense, either it is always
> > true/false or it is inevitably racy)
> 
> It is runtime check, but does not have to be always true/false. I have
> tested more Compex wifi cards and under certain conditions they
> "disappear" from the bus during usage.

I would be very grateful if you could describe what happens in HW
when these conditions trigger - I would like to understand if this
issue is aardvark specific or it isn't.

> So I think it still make sense to do this "fast" check as it is only
> optimization.

I will merge this patch but I'd also like to understand the underlying
issue better.

Thanks,
Lorenzo

* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-10  9:18           ` Lorenzo Pieralisi
@ 2020-07-10 15:44             ` Pali Rohár
  2020-07-10 16:08               ` Bjorn Helgaas
  2020-07-13  8:27             ` Pali Rohár
  1 sibling, 1 reply; 32+ messages in thread
From: Pali Rohár @ 2020-07-10 15:44 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Thomas Petazzoni, Andrew Murray, Bjorn Helgaas, Marek Behún,
	Remi Pommarel, Tomasz Maciej Nowak, Xogium, linux-pci,
	linux-arm-kernel, linux-kernel

On Friday 10 July 2020 10:18:00 Lorenzo Pieralisi wrote:
> I would be very grateful if you could describe what happens in HW
> when these conditions trigger - I would like to understand if this
> issue is aardvark specific or it isn't.

Hello Lorenzo! We are not sure what the problem is or where it happens.

There are several issues which happen randomly or under some specific
conditions.

I can reproduce the following issue: connect a Compex WLE900VX card and
configure aardvark to gen2 mode. The card is then detected only after
the first link training. If the kernel tries to retrain the link again
(e.g. via ASPM code), the card is not detected anymore. To detect it
again, the card needs to be reset via the PERST# signal (assert PERST#,
wait, de-assert PERST#). A PCI warm, hot or function reset does not
help. When aardvark is configured in gen1 mode, the card is detected
fine even after multiple link trainings.

The above problem does not happen with Compex WLE200VX (ath9k) or Compex
WLE1216V5-20 cards.

Sometimes the WLE900VX card disappears from the bus during usage. It
just stops communicating with the ath10k driver and aardvark does not
see the link.

Another issue happens with WLE900VX, WLE600VX and WLE1216V5-20 (but not
with WLE200VX): the Linux kernel can detect these cards only if it
issues a card reset via the PERST# signal and starts link training (via
standard pcie endpoint register PCI_EXP_LNKCTL/PCI_EXP_LNKCTL_RL)
immediately after enabling link training in aardvark (via the aardvark
specific LINK_TRAINING_EN bit). If there is e.g. a 100ms delay between
enabling link training and setting the PCI_EXP_LNKCTL_RL bit, then these
cards are not detected.

Also, issuing a reset via the PERST# signal is required to detect these
cards if either the board was rebooted (not started from a cold
power-off state) or U-Boot touched/initialized PCIe aardvark.

WLE200VX also works fine after a second or third link training and
works without the need to issue a reset via the PERST# signal.

And the WLE900VX card is not detected even after resetting it via the
PERST# signal if aardvark link training (LINK_TRAINING_EN bit) was
enabled prior to toggling PERST#. The PERST# signal is controlled via
GPIO.

When I put the WLE900VX card into a board which uses the mvebu PCI
driver (not aardvark), the card works fine: there is no need to issue a
card reset via PERST#, no need to explicitly set gen mode, and the card
also works after more link trainings.

So basically I have no idea why it happens or where the problem is:
in aardvark, in the cards, or in both. As you can see, each of the
tested cards has a different set of problems.

Today I tested a card from a different vendor but with the same Qualcomm
chip as in the WLE900VX, and I observe the same behavior as with the
Compex WLE900VX. So it looks like the card vendor does not matter; what
is important is the wifi chip inside.

I read in kernel bugzilla that WLE600VX and WLE900VX cards are buggy and
that more people have problems with them. But I am not observing the
issues described in kernel bugzilla (like the card reporting an
incorrect PCI device ID).

If you have any idea how to debug these problems or where the problem
could be, please let me know.

* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-10 15:44             ` Pali Rohár
@ 2020-07-10 16:08               ` Bjorn Helgaas
  2020-07-10 19:30                 ` Pali Rohár
  0 siblings, 1 reply; 32+ messages in thread
From: Bjorn Helgaas @ 2020-07-10 16:08 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Lorenzo Pieralisi, Thomas Petazzoni, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium, linux-pci, linux-arm-kernel,
	linux-kernel

On Fri, Jul 10, 2020 at 05:44:58PM +0200, Pali Rohár wrote:
> I can reproduce following issue: Connect Compex WLE900VX card, configure
> aardvark to gen2 mode. And then card is detected only after the first
> link training. If kernel tries to retrain link again (e.g. via ASPM
> code) then card is not detected anymore. 

Somebody should go over the ASPM retrain link code and the PCIe spec
with a fine-toothed comb.  Maybe we're doing something wrong there.
Or maybe aardvark has some hardware issue and we need some sort of
quirk to work around it.

> Another issue which happens for WLE900VX, WLE600VX and WLE1216VS-20 (but
> not for WLE200VX): Linux kernel can detect these cards only if it issues
> card reset via PERST# signal and start link training (via standard pcie
> endpoint register PCI_EXP_LNKCTL/PCI_EXP_LNKCTL_RL)

I think you mean "downstream port" (not "endpoint") register?
PCI_EXP_LNKCTL_RL is only applicable to *downstream ports* (root ports
or switch downstream ports) and is reserved for endpoints.
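
To illustrate the distinction, here is a minimal userspace sketch (not
kernel code) that decides from the PCI Express Capabilities register
whether the Retrain Link bit is meaningful for a function. The constant
values follow include/uapi/linux/pci_regs.h; the helper names
pcie_port_type(), retrain_link_applicable() and lnkctl_request_retrain()
are hypothetical, and only root ports and switch downstream ports are
treated as "downstream ports", as in the sentence above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h */
#define PCI_EXP_FLAGS_TYPE	0x00f0	/* Device/Port type field */
#define PCI_EXP_TYPE_ENDPOINT	0x0	/* Express Endpoint */
#define PCI_EXP_TYPE_ROOT_PORT	0x4	/* Root Port */
#define PCI_EXP_TYPE_DOWNSTREAM	0x6	/* Switch Downstream Port */
#define PCI_EXP_LNKCTL_RL	0x0020	/* Retrain Link */

/* Extract the Device/Port type from the PCIe Capabilities register. */
static unsigned int pcie_port_type(uint16_t pcie_flags)
{
	return (pcie_flags & PCI_EXP_FLAGS_TYPE) >> 4;
}

/* Retrain Link is only defined for ports at the upstream end of a link. */
static bool retrain_link_applicable(uint16_t pcie_flags)
{
	unsigned int type = pcie_port_type(pcie_flags);

	return type == PCI_EXP_TYPE_ROOT_PORT ||
	       type == PCI_EXP_TYPE_DOWNSTREAM;
}

/* Return the Link Control value to write; unchanged where RL is reserved. */
static uint16_t lnkctl_request_retrain(uint16_t pcie_flags, uint16_t lnkctl)
{
	if (!retrain_link_applicable(pcie_flags))
		return lnkctl;
	return lnkctl | PCI_EXP_LNKCTL_RL;
}
```

In the aardvark case the register in question would therefore belong to
the (emulated) root port, not to the wifi card.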

> immediately after
> enable link training in aardvark (via aardvark specific LINK_TRAINING_EN
> bit). If there is e.g. 100ms delay between enabling link training and
> setting PCI_EXP_LNKCTL_RL bit then these cards are not detected.

This sounds problematic.  Hardware should not be dependent on the
software being "fast enough".  In general we should be able to insert
arbitrary delays at any point without breaking anything.

But I have the impression that aardvark requires more software
hand-holding than most hardware does.  If it imposes timing
requirements on the software, that *should* be documented in the
aardvark spec.

> I read in kernel bugzilla that WLE600VX and WLE900VX cards are buggy and
> more people have problems with them. But issues described in kernel
> bugzilla (like card is reporting incorrect PCI device id) I'm not
> observing.

Pointer?  Is the incorrect device ID 0xffff?  That could be a symptom
of a PCIe error.  If we read a device ID that's something other than
0, 0xffff, or the correct ID, that would be really weird.  Even 0
would be really strange.
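
As a rough illustration of these cases, the following userspace sketch
splits the first config dword into vendor and device ID and flags the
patterns that usually mean "nobody answered" rather than "a device
answered". The enum and function names are hypothetical, and treating
half-ones values as "no device" is an assumption modelled on the kind of
check the PCI core's probe code performs, not an exact copy of it.

```c
#include <stdint.h>

enum id_kind {
	ID_NO_RESPONSE,	/* all ones: typical result of a failed config read */
	ID_ZERO,	/* all zeroes: unexpected, some broken boards do this */
	ID_PARTIAL,	/* one half all ones, the other half zero */
	ID_VALUE,	/* anything else: a device supplied some ID */
};

/* Vendor ID lives in the low 16 bits of config dword 0. */
static uint16_t id_vendor(uint32_t dword)
{
	return dword & 0xffff;
}

/* Device ID lives in the high 16 bits of config dword 0. */
static uint16_t id_device(uint32_t dword)
{
	return dword >> 16;
}

static enum id_kind classify_id(uint32_t dword)
{
	if (dword == 0xffffffffu)
		return ID_NO_RESPONSE;
	if (dword == 0x00000000u)
		return ID_ZERO;
	if (dword == 0x0000ffffu || dword == 0xffff0000u)
		return ID_PARTIAL;
	return ID_VALUE;
}
```

An ID_VALUE result with the expected vendor ID but an unexpected device
ID is the "really weird" case: the device did answer, but not with the
contents the driver expects.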

I suspect these wifi cards are a little special because they probably
play unusual games with power for airplane mode and the like.

Bjorn

* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-10 16:08               ` Bjorn Helgaas
@ 2020-07-10 19:30                 ` Pali Rohár
  2020-07-10 20:08                   ` Bjorn Helgaas
  0 siblings, 1 reply; 32+ messages in thread
From: Pali Rohár @ 2020-07-10 19:30 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Lorenzo Pieralisi, Thomas Petazzoni, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium, linux-pci, linux-arm-kernel,
	linux-kernel

On Friday 10 July 2020 11:08:28 Bjorn Helgaas wrote:
> On Fri, Jul 10, 2020 at 05:44:58PM +0200, Pali Rohár wrote:
> > I can reproduce following issue: Connect Compex WLE900VX card, configure
> > aardvark to gen2 mode. And then card is detected only after the first
> > link training. If kernel tries to retrain link again (e.g. via ASPM
> > code) then card is not detected anymore. 
> 
> Somebody should go over the ASPM retrain link code and the PCIe spec
> with a fine-toothed comb.  Maybe we're doing something wrong there.

I think this is not ASPM related, as the card simply disappears just
after flipping the PCI_EXP_LNKCTL_RL bit a second time without changing
ASPM bits.

> Or maybe aardvark has some hardware issue and we need some sort of
> quirk to work around it.

It is possible that this is an aardvark issue. As I said, I really do
not know.

In the aardvark driver there is already a merged workaround for this
issue: the driver forces gen1 aardvark mode for a gen1 card.

> > Another issue which happens for WLE900VX, WLE600VX and WLE1216VS-20 (but
> > not for WLE200VX): Linux kernel can detect these cards only if it issues
> > card reset via PERST# signal and start link training (via standard pcie
> > endpoint register PCI_EXP_LNKCTL/PCI_EXP_LNKCTL_RL)
> 
> I think you mean "downstream port" (not "endpoint") register?

Yes.

> PCI_EXP_LNKCTL_RL is only applicable to *downstream ports* (root ports
> or switch downstream ports) and is reserved for endpoints.
> 
> > immediately after
> > enable link training in aardvark (via aardvark specific LINK_TRAINING_EN
> > bit). If there is e.g. 100ms delay between enabling link training and
> > setting PCI_EXP_LNKCTL_RL bit then these cards are not detected.
> 
> This sounds problematic.  Hardware should not be dependent on the
> software being "fast enough".  In general we should be able to insert
> arbitrary delays at any point without breaking anything.

Yes, it is problematic. For example, the following commit broke those
cards:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f4c7d053d7f77cd5c1a1ba7c7ce085ddba13d1d7

And this commit fixed it (the msleep was just moved to a different
stage):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6964494582f56a3882c2c53b0edbfe99eb32b2e1

But we somehow need to deal with it until we find the root cause.

Basically, an additional sleep in the aardvark init phase can break
WLE900VX cards, but not WLE200VX.

And because WLE900VX works fine with pci-mvebu and WLE200VX works fine
with pci-aardvark, we cannot deduce whether the problem with the
combination of WLE900VX and aardvark is in WLE900VX or in aardvark.

> But I have the impression that aardvark requires more software
> hand-holding that most hardware does.  If it imposes timing
> requirements on the software, that *should* be documented in the
> aardvark spec.

There is absolutely nothing regarding timings in the documentation which
I saw. The documentation contains just instructions/steps for how to
init the PCI subsystem, and that is basically the advk_pcie_setup_hw()
function.

> > I read in kernel bugzilla that WLE600VX and WLE900VX cards are buggy and
> > more people have problems with them. But issues described in kernel
> > bugzilla (like card is reporting incorrect PCI device id) I'm not
> > observing.
> 
> Pointer?

Hm... I cannot find the pointer to bugzilla right now, but I have a
pointer to the ath9k-devel mailing list with that incorrect device ID:

https://www.mail-archive.com/ath9k-devel@lists.ath9k.org/msg07529.html

> Is the incorrect device ID 0xffff?

No, the incorrect device ID in that case is 0xabcd and the vendor ID is
correct (Qualcomm).

> That could be a symptom
> of a PCIe error.  If we read a device ID that's something other than
> 0, 0xffff, or the correct ID, that would be really weird.  Even 0
> would be really strange.

It is strange, and it is also the reason why the discussion on that list
is long.

As I said, I'm not seeing that problem with the wrong device ID.

But I know people who are observing the same problem as described in the
above mailing list thread with Compex ath10k cards on different boards
(which do not use aardvark).

> I suspect these wifi cards are a little special because they probably
> play unusual games with power for airplane mode and the like.

This is another/different problem and is already "documented" in kernel
bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=84821#c52

* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-10 19:30                 ` Pali Rohár
@ 2020-07-10 20:08                   ` Bjorn Helgaas
  0 siblings, 0 replies; 32+ messages in thread
From: Bjorn Helgaas @ 2020-07-10 20:08 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Lorenzo Pieralisi, Thomas Petazzoni, Andrew Murray,
	Bjorn Helgaas, Marek Behún, Remi Pommarel,
	Tomasz Maciej Nowak, Xogium, linux-pci, linux-arm-kernel,
	linux-kernel

On Fri, Jul 10, 2020 at 09:30:03PM +0200, Pali Rohár wrote:
> On Friday 10 July 2020 11:08:28 Bjorn Helgaas wrote:
> > On Fri, Jul 10, 2020 at 05:44:58PM +0200, Pali Rohár wrote:
> > > I can reproduce following issue: Connect Compex WLE900VX card, configure
> > > aardvark to gen2 mode. And then card is detected only after the first
> > > link training. If kernel tries to retrain link again (e.g. via ASPM
> > > code) then card is not detected anymore. 
> > 
> > Somebody should go over the ASPM retrain link code and the PCIe spec
> > with a fine-toothed comb.  Maybe we're doing something wrong there.
> 
> I think this is not ASPM related as card simply disappear just after
> flipping PCI_EXP_LNKCTL_RL bit second time without changing ASPM bits.

Right.  The retrain code in aspm.c doesn't really have anything in
particular to do with ASPM and it should probably be moved elsewhere.
So I think the problem may be related to retrain and the delays after
it in general, not to ASPM.

> There is absolutely nothing regarding to timings in documentation which
> I saw. In documentation are just instructions/steps how to init PCI
> subsystem and it is basically advk_pcie_setup_hw() function.
> 
> > > I read in kernel bugzilla that WLE600VX and WLE900VX cards are buggy and
> > > more people have problems with them. But issues described in kernel
> > > bugzilla (like card is reporting incorrect PCI device id) I'm not
> > > observing.
> 
> Hm... I cannot find right now pointer to bugzilla, but I have pointer to
> ath9k-devel mailing list with that incorrect device id:
> 
> https://www.mail-archive.com/ath9k-devel@lists.ath9k.org/msg07529.html
> 
> > Is the incorrect device ID 0xffff?
> 
> No, incorrect device ID in that case is 0xabcd and vendor ID is correct
> (Qualcomm).

From a quick look at that thread, it sounds like the device isn't
quite ready yet.  In that case, it's supposed to respond with Config
Request Retry Status, and Linux is supposed to wait longer and retry.
But I don't think Linux does that quite correctly, so it could be
either a hardware problem or Linux being broken.  But I guess that's
not the current problem so I don't want to go down that rathole right
now.
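
For readers unfamiliar with that mechanism, a minimal userspace sketch
of "wait longer and retry" could look like the following. It assumes CRS
Software Visibility is enabled, so that a retry status shows up as the
reserved vendor ID 0x0001 in the first config dword. The read callback,
the fake device and all function names are hypothetical; the doubling
delay is only loosely modelled on what the PCI core does and no real
sleeping takes place.

```c
#include <stdbool.h>
#include <stdint.h>

#define ID_NO_RESPONSE	0xffffffffu
#define CRS_VENDOR_ID	0x0001u	/* vendor ID reserved to signal CRS */

/* Hypothetical callback that reads config dword 0 of one function. */
typedef uint32_t (*read_id_fn)(void *ctx);

static bool id_is_crs(uint32_t id)
{
	return (id & 0xffff) == CRS_VENDOR_ID;
}

/*
 * Re-read the ID while the device reports CRS, doubling the delay each
 * time, until the accumulated delay would exceed budget_ms.
 * Returns true if a non-CRS, non-all-ones ID was read into *id.
 */
static bool wait_while_crs(read_id_fn read_id, void *ctx,
			   unsigned int budget_ms, uint32_t *id)
{
	unsigned int delay_ms = 1;
	unsigned int waited_ms = 0;

	*id = read_id(ctx);
	while (id_is_crs(*id)) {
		if (waited_ms + delay_ms > budget_ms)
			return false;	/* device never became ready */
		/* A real implementation would sleep for delay_ms here. */
		waited_ms += delay_ms;
		delay_ms *= 2;
		*id = read_id(ctx);
	}
	return *id != ID_NO_RESPONSE;
}

/* Fake device for experiments: answers CRS a few times, then its ID. */
struct fake_dev {
	unsigned int crs_reads_left;
	uint32_t real_id;
};

static uint32_t fake_read_id(void *ctx)
{
	struct fake_dev *dev = ctx;

	if (dev->crs_reads_left > 0) {
		dev->crs_reads_left--;
		return 0xffff0000u | CRS_VENDOR_ID;
	}
	return dev->real_id;
}
```

With a generous budget the fake device is eventually identified; with a
budget that is too small the caller gives up, which from the outside
looks the same as a missing device.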

* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-10  9:18           ` Lorenzo Pieralisi
  2020-07-10 15:44             ` Pali Rohár
@ 2020-07-13  8:27             ` Pali Rohár
  2020-07-13 11:23               ` Lorenzo Pieralisi
  1 sibling, 1 reply; 32+ messages in thread
From: Pali Rohár @ 2020-07-13  8:27 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Thomas Petazzoni, Andrew Murray, Bjorn Helgaas, Marek Behún,
	Remi Pommarel, Tomasz Maciej Nowak, Xogium, linux-pci,
	linux-arm-kernel, linux-kernel

On Friday 10 July 2020 10:18:00 Lorenzo Pieralisi wrote:
> On Thu, Jul 09, 2020 at 05:09:59PM +0200, Pali Rohár wrote:
> > > I understand that but the bridge bus resource can be trimmed to just
> > > contain the root bus because that's the only one where there is a
> > > chance you can enumerate a device.
> > 
> > It is possible to register only root bridge without endpoint?
> 
> It is possible to register the root bridge with a trimmed IORESOURCE_BUS
> so that you don't enumerate anything other than the root port.

Hello Lorenzo! I really do not know how to achieve it. From the code it
looks like pci/probe.c scans child buses unconditionally.

pci-aardvark.c calls pci_host_probe(), which calls
pci_scan_root_bus_bridge(), which calls pci_scan_child_bus(), which
calls pci_scan_child_bus_extend(), which calls pci_scan_bridge_extend()
(the bridge needs to be reconfigured), which then tries to probe the
child bus via pci_scan_child_bus_extend() because the bridge is not
CardBus.

In pci_scan_bridge_extend() I do not see a way to skip probing for
child buses, which would avoid enumerating the aardvark root bridge when
a PCIe device is not connected.

dmesg output contains:

  advk-pcie d0070000.pcie: link never came up
  advk-pcie d0070000.pcie: PCI host bridge to bus 0000:00
  pci_bus 0000:00: root bus resource [bus 00-ff]
  pci_bus 0000:00: root bus resource [mem 0xe8000000-0xe8ffffff]
  pci_bus 0000:00: root bus resource [io  0x0000-0xffff] (bus address [0xe9000000-0xe900ffff])
  pci_bus 0000:00: scanning bus
  pci 0000:00:00.0: [1b4b:0100] type 01 class 0x060400
  pci 0000:00:00.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
  pci_bus 0000:00: fixups for bus
  pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
  pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
  pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
  pci_bus 0000:01: scanning bus
  advk-pcie d0070000.pcie: advk_pcie_valid_device

* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-13  8:27             ` Pali Rohár
@ 2020-07-13 11:23               ` Lorenzo Pieralisi
  2020-07-13 14:50                 ` Pali Rohár
  2020-07-15 12:17                 ` Pali Rohár
  0 siblings, 2 replies; 32+ messages in thread
From: Lorenzo Pieralisi @ 2020-07-13 11:23 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Thomas Petazzoni, Andrew Murray, Bjorn Helgaas, Marek Behún,
	Remi Pommarel, Tomasz Maciej Nowak, Xogium, linux-pci,
	linux-arm-kernel, linux-kernel

On Mon, Jul 13, 2020 at 10:27:47AM +0200, Pali Rohár wrote:
> On Friday 10 July 2020 10:18:00 Lorenzo Pieralisi wrote:
> > On Thu, Jul 09, 2020 at 05:09:59PM +0200, Pali Rohár wrote:
> > > > I understand that but the bridge bus resource can be trimmed to just
> > > > contain the root bus because that's the only one where there is a
> > > > chance you can enumerate a device.
> > > 
> > > It is possible to register only root bridge without endpoint?
> > 
> > It is possible to register the root bridge with a trimmed IORESOURCE_BUS
> > so that you don't enumerate anything other than the root port.
> 
> Hello Lorenzo! I really do not know how to achieve it. From code it
> looks like that pci/probe.c scans child buses unconditionally.
> 
> pci-aardvark.c calls pci_host_probe() which calls functions
> pci_scan_root_bus_bridge() which calls pci_scan_child_bus() which calls
> pci_scan_child_bus_extend() which calls pci_scan_bridge_extend() (bridge
> needs to be reconfigured) which then try to probe child bus via
> pci_scan_child_bus_extend() because bridge is not card bus.
> 
> In function pci_scan_bridge_extend() I do not see a way how to skip
> probing for child buses which would avoid enumerating aardvark root
> bridge when PCIe device is not connected.
> 
> dmesg output contains:
> 
>   advk-pcie d0070000.pcie: link never came up
>   advk-pcie d0070000.pcie: PCI host bridge to bus 0000:00
>   pci_bus 0000:00: root bus resource [bus 00-ff]

This resource can be limited to the root bus number only before calling
pci_host_probe() (ie see pci_parse_request_of_pci_ranges() and code in
pci_scan_bridge_extend() that programs primary/secondary/subordinate
busses) but I think that only papers over the issue, it does not fix it.

I will go over the thread again but I suspect I can merge the patch even
though I still believe there is work to be done to understand the issue
we are facing.

Lorenzo

>   pci_bus 0000:00: root bus resource [mem 0xe8000000-0xe8ffffff]
>   pci_bus 0000:00: root bus resource [io  0x0000-0xffff] (bus address [0xe9000000-0xe900ffff])
>   pci_bus 0000:00: scanning bus
>   pci 0000:00:00.0: [1b4b:0100] type 01 class 0x060400
>   pci 0000:00:00.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
>   pci_bus 0000:00: fixups for bus
>   pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
>   pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
>   pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
>   pci_bus 0000:01: scanning bus
>   advk-pcie d0070000.pcie: advk_pcie_valid_device

* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-13 11:23               ` Lorenzo Pieralisi
@ 2020-07-13 14:50                 ` Pali Rohár
  2020-07-13 16:41                   ` Lorenzo Pieralisi
  2020-07-15 12:17                 ` Pali Rohár
  1 sibling, 1 reply; 32+ messages in thread
From: Pali Rohár @ 2020-07-13 14:50 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Thomas Petazzoni, Andrew Murray, Bjorn Helgaas, Marek Behún,
	Remi Pommarel, Tomasz Maciej Nowak, Xogium, linux-pci,
	linux-arm-kernel, linux-kernel

On Monday 13 July 2020 12:23:25 Lorenzo Pieralisi wrote:
> I will go over the thread again but I suspect I can merge the patch even
> though I still believe there is work to be done to understand the issue
> we are facing.

Just to note that pci-mvebu.c also checks whether the PCIe link is up
before trying to access the real PCIe interface registers, similarly to
my patch.

* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-13 14:50                 ` Pali Rohár
@ 2020-07-13 16:41                   ` Lorenzo Pieralisi
  2020-07-14  7:38                     ` Pali Rohár
  0 siblings, 1 reply; 32+ messages in thread
From: Lorenzo Pieralisi @ 2020-07-13 16:41 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Thomas Petazzoni, Andrew Murray, Bjorn Helgaas, Marek Behún,
	Remi Pommarel, Tomasz Maciej Nowak, Xogium, linux-pci,
	linux-arm-kernel, linux-kernel

On Mon, Jul 13, 2020 at 04:50:03PM +0200, Pali Rohár wrote:
> On Monday 13 July 2020 12:23:25 Lorenzo Pieralisi wrote:
> > I will go over the thread again but I suspect I can merge the patch even
> > though I still believe there is work to be done to understand the issue
> > we are facing.
> 
> Just to note that pci-mvebu.c also checks if pcie link is up before
> trying to access the real PCIe interface registers, similarly as in my
> patch.

I understand - that does not change my opinion though, the link check
is just a workaround, it'd be best if we pinpoint the real issue which
is likely to be a HW one.

Lorenzo

* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-13 16:41                   ` Lorenzo Pieralisi
@ 2020-07-14  7:38                     ` Pali Rohár
  0 siblings, 0 replies; 32+ messages in thread
From: Pali Rohár @ 2020-07-14  7:38 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Thomas Petazzoni, Andrew Murray, Bjorn Helgaas, Marek Behún,
	Remi Pommarel, Tomasz Maciej Nowak, Xogium, linux-pci,
	linux-arm-kernel, linux-kernel

On Monday 13 July 2020 17:41:40 Lorenzo Pieralisi wrote:
> On Mon, Jul 13, 2020 at 04:50:03PM +0200, Pali Rohár wrote:
> > On Monday 13 July 2020 12:23:25 Lorenzo Pieralisi wrote:
> > > I will go over the thread again but I suspect I can merge the patch even
> > > though I still believe there is work to be done to understand the issue
> > > we are facing.
> > 
> > Just to note that pci-mvebu.c also checks if pcie link is up before
> > trying to access the real PCIe interface registers, similarly as in my
> > patch.
> 
> I understand - that does not change my opinion though, the link check
> is just a workaround, it'd be best if we pinpoint the real issue which
> is likely to a HW one.

Lorenzo, if you have an idea how to debug this issue or if you would
like to see some test results, let me know. I can do some tests, but I
currently really do not know more than what I wrote in previous emails.

In my opinion, the problem is in HW, which Marvell has neither
documented nor confirmed to exist. The other option is that the problem
is in the Compex card and can be triggered only by Marvell aardvark HW.

* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-13 11:23               ` Lorenzo Pieralisi
  2020-07-13 14:50                 ` Pali Rohár
@ 2020-07-15 12:17                 ` Pali Rohár
  2020-07-15 16:21                   ` Lorenzo Pieralisi
  1 sibling, 1 reply; 32+ messages in thread
From: Pali Rohár @ 2020-07-15 12:17 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Thomas Petazzoni, Andrew Murray, Bjorn Helgaas, Marek Behún,
	Remi Pommarel, Tomasz Maciej Nowak, Xogium, linux-pci,
	linux-arm-kernel, linux-kernel

On Monday 13 July 2020 12:23:25 Lorenzo Pieralisi wrote:
> On Mon, Jul 13, 2020 at 10:27:47AM +0200, Pali Rohár wrote:
> > On Friday 10 July 2020 10:18:00 Lorenzo Pieralisi wrote:
> > > On Thu, Jul 09, 2020 at 05:09:59PM +0200, Pali Rohár wrote:
> > > > > I understand that but the bridge bus resource can be trimmed to just
> > > > > contain the root bus because that's the only one where there is a
> > > > > chance you can enumerate a device.
> > > > 
> > > > It is possible to register only root bridge without endpoint?
> > > 
> > > It is possible to register the root bridge with a trimmed IORESOURCE_BUS
> > > so that you don't enumerate anything other than the root port.
> > 
> > Hello Lorenzo! I really do not know how to achieve it. From code it
> > looks like that pci/probe.c scans child buses unconditionally.
> > 
> > pci-aardvark.c calls pci_host_probe() which calls functions
> > pci_scan_root_bus_bridge() which calls pci_scan_child_bus() which calls
> > pci_scan_child_bus_extend() which calls pci_scan_bridge_extend() (bridge
> > needs to be reconfigured) which then try to probe child bus via
> > pci_scan_child_bus_extend() because bridge is not card bus.
> > 
> > In function pci_scan_bridge_extend() I do not see a way how to skip
> > probing for child buses which would avoid enumerating aardvark root
> > bridge when PCIe device is not connected.
> > 
> > dmesg output contains:
> > 
> >   advk-pcie d0070000.pcie: link never came up
> >   advk-pcie d0070000.pcie: PCI host bridge to bus 0000:00
> >   pci_bus 0000:00: root bus resource [bus 00-ff]
> 
> This resource can be limited to the root bus number only before calling
> pci_host_probe() (ie see pci_parse_request_of_pci_ranges() and code in
> pci_scan_bridge_extend() that programs primary/secondary/subordinate
> busses) but I think that only papers over the issue, it does not fix it.

I looked at the code in pci/probe.c again and I do not think it is
possible to avoid scanning devices. pci_scan_child_bus_extend()
unconditionally calls pci_scan_slot() for devfn=0 as the first thing.
And this function unconditionally calls pci_scan_device(), which
directly tries to read the vendor ID from the config register.

So to me it looks like the kernel expects that it can read the vendor ID
and device ID from the config register of a device which is not
connected.

And trying to read config register would cause those timeouts in
aardvark.
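
The effect of the patch on such a probe read can be sketched with a
small userspace model. The two conditions in model_valid_device() mirror
the checks in advk_pcie_valid_device() after the patch; everything else
(struct model_pcie, the hw_accesses counter, model_rd_conf() and the
placeholder value it returns) is a hypothetical stand-in and not the
driver's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_SLOT(devfn)			(((devfn) >> 3) & 0x1f)
#define PCIBIOS_SUCCESSFUL		0x00
#define PCIBIOS_DEVICE_NOT_FOUND	0x86

/* Hypothetical stand-in for the driver state, not struct advk_pcie. */
struct model_pcie {
	uint8_t root_bus_nr;
	bool link_up;
	unsigned int hw_accesses;	/* simulated PCIe register accesses */
};

static bool model_valid_device(const struct model_pcie *pcie,
			       uint8_t bus, uint8_t devfn)
{
	/* Only slot 0 exists on the root bus (the emulated root bridge). */
	if (bus == pcie->root_bus_nr && PCI_SLOT(devfn) != 0)
		return false;

	/* Nothing behind the root bridge is reachable while link is down. */
	if (bus != pcie->root_bus_nr && !pcie->link_up)
		return false;

	return true;
}

static int model_rd_conf(struct model_pcie *pcie, uint8_t bus,
			 uint8_t devfn, uint32_t *val)
{
	if (!model_valid_device(pcie, bus, devfn)) {
		*val = 0xffffffffu;	/* what the PCI core sees as "empty" */
		return PCIBIOS_DEVICE_NOT_FOUND;
	}

	pcie->hw_accesses++;	/* the real driver would touch hardware here */
	*val = 0;		/* placeholder value, not real register data */
	return PCIBIOS_SUCCESSFUL;
}
```

With link_up false, a vendor ID read on bus 1 returns all ones without
incrementing hw_accesses, so the scan of the child bus ends without the
timeout.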

* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-15 12:17                 ` Pali Rohár
@ 2020-07-15 16:21                   ` Lorenzo Pieralisi
  2020-07-21  8:57                     ` Pali Rohár
  0 siblings, 1 reply; 32+ messages in thread
From: Lorenzo Pieralisi @ 2020-07-15 16:21 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Thomas Petazzoni, Andrew Murray, Bjorn Helgaas, Marek Behún,
	Remi Pommarel, Tomasz Maciej Nowak, Xogium, linux-pci,
	linux-arm-kernel, linux-kernel

On Wed, Jul 15, 2020 at 02:17:26PM +0200, Pali Rohár wrote:
> On Monday 13 July 2020 12:23:25 Lorenzo Pieralisi wrote:
> > On Mon, Jul 13, 2020 at 10:27:47AM +0200, Pali Rohár wrote:
> > > On Friday 10 July 2020 10:18:00 Lorenzo Pieralisi wrote:
> > > > On Thu, Jul 09, 2020 at 05:09:59PM +0200, Pali Rohár wrote:
> > > > > > I understand that but the bridge bus resource can be trimmed to just
> > > > > > contain the root bus because that's the only one where there is a
> > > > > > chance you can enumerate a device.
> > > > > 
> > > > > It is possible to register only root bridge without endpoint?
> > > > 
> > > > It is possible to register the root bridge with a trimmed IORESOURCE_BUS
> > > > so that you don't enumerate anything other than the root port.
> > > 
> > > Hello Lorenzo! I really do not know how to achieve it. From code it
> > > looks like that pci/probe.c scans child buses unconditionally.
> > > 
> > > pci-aardvark.c calls pci_host_probe() which calls functions
> > > pci_scan_root_bus_bridge() which calls pci_scan_child_bus() which calls
> > > pci_scan_child_bus_extend() which calls pci_scan_bridge_extend() (bridge
> > > needs to be reconfigured) which then try to probe child bus via
> > > pci_scan_child_bus_extend() because bridge is not card bus.
> > > 
> > > In function pci_scan_bridge_extend() I do not see a way how to skip
> > > probing for child buses which would avoid enumerating aardvark root
> > > bridge when PCIe device is not connected.
> > > 
> > > dmesg output contains:
> > > 
> > >   advk-pcie d0070000.pcie: link never came up
> > >   advk-pcie d0070000.pcie: PCI host bridge to bus 0000:00
> > >   pci_bus 0000:00: root bus resource [bus 00-ff]
> > 
> > This resource can be limited to the root bus number only before calling
> > pci_host_probe() (ie see pci_parse_request_of_pci_ranges() and code in
> > pci_scan_bridge_extend() that programs primary/secondary/subordinate
> > busses) but I think that only papers over the issue, it does not fix it.
> 
> I looked at the code in pci/probe.c again and I do not think it is
> possible to avoid scanning devices. pci_scan_child_bus_extend() is
> unconditionally calling pci_scan_slot() for devfn=0 as the first thing.
> And this function unconditionally calls pci_scan_device() which is
> directly trying to read vendor id from config register.
> 
> So for me it looks like that kernel expects that can read vendor id and
> device id from config register for device which is not connected.

Not if it is connected to a bus that the root port does not decode,
that's what I am saying.

> And trying to read config register would cause those timeouts in
> aardvark.

The root port (which effectively works as PCI bridge from this
standpoint) does not issue config cycles for busses that aren't within
its decoded bus range, which in turn is determined by the firmware
IORESOURCE_BUS resource.
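
As a sketch of that decode rule (generic PCI-to-PCI bridge behaviour,
not aardvark code), the function below shows how a bridge treats a
config request depending on its secondary and subordinate bus numbers.
The enum and function names are hypothetical.

```c
#include <stdint.h>

enum cfg_route {
	CFG_NOT_FORWARDED,	/* target bus is outside the decoded range */
	CFG_TYPE0_ON_SECONDARY,	/* target is the bus directly below */
	CFG_TYPE1_FORWARDED,	/* target is further downstream */
};

/*
 * Decide what a bridge does with a config request for target_bus,
 * given the secondary and subordinate bus numbers programmed into it.
 */
static enum cfg_route bridge_route_config(uint8_t secondary,
					  uint8_t subordinate,
					  uint8_t target_bus)
{
	if (target_bus == secondary)
		return CFG_TYPE0_ON_SECONDARY;
	if (target_bus > secondary && target_bus <= subordinate)
		return CFG_TYPE1_FORWARDED;
	return CFG_NOT_FORWARDED;
}
```

For example, with secondary = subordinate = 1 a request for bus 2 is
never forwarded, whatever is or is not connected below the bridge.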

This issue is caused by devices that are connected downstream to
the root port.

Anyway - patch merged but I would be happy to keep this discussion
going, somehow.

If the LPC20 VFIO/IOMMU/PCI microconference is approved it can be a
good venue for this to happen.

Lorenzo

* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-15 16:21                   ` Lorenzo Pieralisi
@ 2020-07-21  8:57                     ` Pali Rohár
  2020-07-21 10:48                       ` Lorenzo Pieralisi
  0 siblings, 1 reply; 32+ messages in thread
From: Pali Rohár @ 2020-07-21  8:57 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Thomas Petazzoni, Andrew Murray, Bjorn Helgaas, Marek Behún,
	Remi Pommarel, Tomasz Maciej Nowak, Xogium, linux-pci,
	linux-arm-kernel, linux-kernel

On Wednesday 15 July 2020 17:21:08 Lorenzo Pieralisi wrote:
> On Wed, Jul 15, 2020 at 02:17:26PM +0200, Pali Rohár wrote:
> > On Monday 13 July 2020 12:23:25 Lorenzo Pieralisi wrote:
> > > On Mon, Jul 13, 2020 at 10:27:47AM +0200, Pali Rohár wrote:
> > > > On Friday 10 July 2020 10:18:00 Lorenzo Pieralisi wrote:
> > > > > On Thu, Jul 09, 2020 at 05:09:59PM +0200, Pali Rohár wrote:
> > > > > > > I understand that but the bridge bus resource can be trimmed to just
> > > > > > > contain the root bus because that's the only one where there is a
> > > > > > > chance you can enumerate a device.
> > > > > > 
> > > > > > It is possible to register only root bridge without endpoint?
> > > > > 
> > > > > It is possible to register the root bridge with a trimmed IORESOURCE_BUS
> > > > > so that you don't enumerate anything other than the root port.
> > > > 
> > > > Hello Lorenzo! I really do not know how to achieve it. From code it
> > > > looks like that pci/probe.c scans child buses unconditionally.
> > > > 
> > > > pci-aardvark.c calls pci_host_probe() which calls functions
> > > > pci_scan_root_bus_bridge() which calls pci_scan_child_bus() which calls
> > > > pci_scan_child_bus_extend() which calls pci_scan_bridge_extend() (bridge
> > > > needs to be reconfigured) which then try to probe child bus via
> > > > pci_scan_child_bus_extend() because bridge is not card bus.
> > > > 
> > > > In function pci_scan_bridge_extend() I do not see a way how to skip
> > > > probing for child buses which would avoid enumerating aardvark root
> > > > bridge when PCIe device is not connected.
> > > > 
> > > > dmesg output contains:
> > > > 
> > > >   advk-pcie d0070000.pcie: link never came up
> > > >   advk-pcie d0070000.pcie: PCI host bridge to bus 0000:00
> > > >   pci_bus 0000:00: root bus resource [bus 00-ff]
> > > 
> > > This resource can be limited to the root bus number only before calling
> > > pci_host_probe() (ie see pci_parse_request_of_pci_ranges() and code in
> > > pci_scan_bridge_extend() that programs primary/secondary/subordinate
> > > busses) but I think that only papers over the issue, it does not fix it.
> > 
> > I looked at the code in pci/probe.c again and I do not think it is
> > possible to avoid scanning devices. pci_scan_child_bus_extend() is
> > unconditionally calling pci_scan_slot() for devfn=0 as the first thing.
> > And this function unconditionally calls pci_scan_device() which is
> > directly trying to read vendor id from config register.
> > 
> > So for me it looks like that kernel expects that can read vendor id and
> > device id from config register for device which is not connected.
> 
> Not if it is connected to a bus that the root port does not decode,
> that's what I am saying.
> 
> > And trying to read config register would cause those timeouts in
> > aardvark.
> 
> The root port (which effectively works as PCI bridge from this
> standpoint) does not issue config cycles for busses that aren't within
> its decoded bus range, which in turn is determined by the firmware
> IORESOURCE_BUS resource.
> 
> This issue is caused by devices that are connected downstream to
> the root port.
> 
> Anyway - patch merged

Could you send me a link to the git commit? I have looked in the
lpieralisi/pci.git repository, but I do not see it there.

> but I would be happy to keep this discussion going, somehow.

Ok, no problem. As I said, if anybody has an idea or would like me to
run some tests, I can do that and provide the results.

> If the LPC20 VFIO/IOMMU/PCI microconference is approved it can be a
> good venue for this to happen.
> 
> Lorenzo


* Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card connected
  2020-07-21  8:57                     ` Pali Rohár
@ 2020-07-21 10:48                       ` Lorenzo Pieralisi
  0 siblings, 0 replies; 32+ messages in thread
From: Lorenzo Pieralisi @ 2020-07-21 10:48 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Thomas Petazzoni, Andrew Murray, Bjorn Helgaas, Marek Behún,
	Remi Pommarel, Tomasz Maciej Nowak, Xogium, linux-pci,
	linux-arm-kernel, linux-kernel

On Tue, Jul 21, 2020 at 10:57:13AM +0200, Pali Rohár wrote:
> On Wednesday 15 July 2020 17:21:08 Lorenzo Pieralisi wrote:
> > On Wed, Jul 15, 2020 at 02:17:26PM +0200, Pali Rohár wrote:
> > > On Monday 13 July 2020 12:23:25 Lorenzo Pieralisi wrote:
> > > > On Mon, Jul 13, 2020 at 10:27:47AM +0200, Pali Rohár wrote:
> > > > > On Friday 10 July 2020 10:18:00 Lorenzo Pieralisi wrote:
> > > > > > On Thu, Jul 09, 2020 at 05:09:59PM +0200, Pali Rohár wrote:
> > > > > > > > I understand that but the bridge bus resource can be trimmed to just
> > > > > > > > contain the root bus because that's the only one where there is a
> > > > > > > > chance you can enumerate a device.
> > > > > > > 
> > > > > > > It is possible to register only root bridge without endpoint?
> > > > > > 
> > > > > > It is possible to register the root bridge with a trimmed IORESOURCE_BUS
> > > > > > so that you don't enumerate anything other than the root port.
> > > > > 
> > > > > Hello Lorenzo! I really do not know how to achieve it. From code it
> > > > > looks like that pci/probe.c scans child buses unconditionally.
> > > > > 
> > > > > pci-aardvark.c calls pci_host_probe() which calls functions
> > > > > pci_scan_root_bus_bridge() which calls pci_scan_child_bus() which calls
> > > > > pci_scan_child_bus_extend() which calls pci_scan_bridge_extend() (bridge
> > > > > needs to be reconfigured) which then try to probe child bus via
> > > > > pci_scan_child_bus_extend() because bridge is not card bus.
> > > > > 
> > > > > In function pci_scan_bridge_extend() I do not see a way how to skip
> > > > > probing for child buses which would avoid enumerating aardvark root
> > > > > bridge when PCIe device is not connected.
> > > > > 
> > > > > dmesg output contains:
> > > > > 
> > > > >   advk-pcie d0070000.pcie: link never came up
> > > > >   advk-pcie d0070000.pcie: PCI host bridge to bus 0000:00
> > > > >   pci_bus 0000:00: root bus resource [bus 00-ff]
> > > > 
> > > > This resource can be limited to the root bus number only before calling
> > > > pci_host_probe() (ie see pci_parse_request_of_pci_ranges() and code in
> > > > pci_scan_bridge_extend() that programs primary/secondary/subordinate
> > > > busses) but I think that only papers over the issue, it does not fix it.
> > > 
> > > I looked at the code in pci/probe.c again and I do not think it is
> > > possible to avoid scanning devices. pci_scan_child_bus_extend() is
> > > unconditionally calling pci_scan_slot() for devfn=0 as the first thing.
> > > And this function unconditionally calls pci_scan_device() which is
> > > directly trying to read vendor id from config register.
> > > 
> > > So for me it looks like that kernel expects that can read vendor id and
> > > device id from config register for device which is not connected.
> > 
> > Not if it is connected to a bus that the root port does not decode,
> > that's what I am saying.
> > 
> > > And trying to read config register would cause those timeouts in
> > > aardvark.
> > 
> > The root port (which effectively works as PCI bridge from this
> > standpoint) does not issue config cycles for busses that aren't within
> > its decoded bus range, which in turn is determined by the firmware
> > IORESOURCE_BUS resource.
> > 
> > This issue is caused by devices that are connected downstream to
> > the root port.
> > 
> > Anyway - patch merged
> 
> Could you send me a link to the git commit? I have looked in the
> lpieralisi/pci.git repository, but I do not see it there.

Apologies - I had not pushed it out. I have now pushed it to
pci/aardvark.

> > but I would be happy to keep this discussion going, somehow.
> 
> Ok, no problem. As I said, if anybody has an idea or would like me to
> run some tests, I can do that and provide the results.

Sounds good, I will let you know, thanks.

Lorenzo

> > If the LPC20 VFIO/IOMMU/PCI microconference is approved it can be a
> > good venue for this to happen.
> > 
> > Lorenzo


Thread overview: 32+ messages
2020-05-28 14:31 [PATCH] PCI: aardvark: Don't touch PCIe registers if no card connected Pali Rohár
2020-05-28 16:26 ` Bjorn Helgaas
2020-05-28 16:38   ` Pali Rohár
2020-05-28 16:49     ` Bjorn Helgaas
2020-05-29  8:30       ` Pali Rohár
2020-06-30 12:31         ` Pali Rohár
2020-06-30 13:51     ` Bjorn Helgaas
2020-06-30 14:04       ` Pali Rohár
2020-06-30 14:58         ` Bjorn Helgaas
2020-07-01  8:08           ` Pali Rohár
2020-07-01  8:20 ` [PATCH v2] " Pali Rohár
2020-07-01 21:34   ` Bjorn Helgaas
2020-07-02  8:23     ` Pali Rohár
2020-07-02  8:30 ` [PATCH v3] " Pali Rohár
2020-07-09 11:35   ` Lorenzo Pieralisi
2020-07-09 12:22     ` Pali Rohár
2020-07-09 14:47       ` Lorenzo Pieralisi
2020-07-09 15:09         ` Pali Rohár
2020-07-10  9:18           ` Lorenzo Pieralisi
2020-07-10 15:44             ` Pali Rohár
2020-07-10 16:08               ` Bjorn Helgaas
2020-07-10 19:30                 ` Pali Rohár
2020-07-10 20:08                   ` Bjorn Helgaas
2020-07-13  8:27             ` Pali Rohár
2020-07-13 11:23               ` Lorenzo Pieralisi
2020-07-13 14:50                 ` Pali Rohár
2020-07-13 16:41                   ` Lorenzo Pieralisi
2020-07-14  7:38                     ` Pali Rohár
2020-07-15 12:17                 ` Pali Rohár
2020-07-15 16:21                   ` Lorenzo Pieralisi
2020-07-21  8:57                     ` Pali Rohár
2020-07-21 10:48                       ` Lorenzo Pieralisi
