linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RFC: Extend probing of native PCIe controllers
@ 2021-10-22 18:38 Pali Rohár
  2021-11-08  8:50 ` Luca Ceresoli
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Pali Rohár @ 2021-10-22 18:38 UTC (permalink / raw)
  To: Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Luca Ceresoli, Marek Behún, Matthew Wilcox,
	Alex Williamson, linux-pci, linux-kernel

Hello!

In this email I'm describing how I see that probing of native PCIe
controller drivers is implemented, how it should be implemented and how
to extend / simplify core code for controller drivers.


Native PCIe controller drivers needs to fill struct pci_host_bridge and
then call pci_host_probe(). Function pci_host_probe() starts probing and
enumerating buses and register PCIe devices to system.

But initialization of PCIe controller and cards on buses (other end of
PCIe link) is more complicated and basically every native PCIe
controller driver needs to do initialization PCIe link prior calling
pci_host_probe(). Steps which controller drivers are doing are de-facto
standard steps defined in PCIe base or CEM specification.

The most problematic step is to reset endpoint card and wait until
endpoint card start. Reset of endpoint card is done by standard PERST#
signal (defined in PCIe CEM spec) and in most cases board export control
of this signal to OS via GPIO (few board and drivers have this signal
connected to PCIe controller and then PCIe controller has some specific
registers to control this signal). Reset via PERST# signal is defined in
PCIe CEM and base specs as "PCIe Warm Reset".

As discussed in the following email thread, this PCIe Warm Reset should
not depend on PCIe controller as it resets card on the other end of PCIe
controller. But currently every native PCIe controller driver does PCIe
Warm Reset by its own for randomly chosen time period. There is open
question how long should be endpoint card in Warm Reset state:
https://lore.kernel.org/linux-pci/20210310110535.zh4pnn4vpmvzwl5q@pali/

Initialization of PCIe endpoint card is described in PCIe CEM spec in
Figure 2-10: Power Up. Other informations are in PCIe base spec in 6.6.1
Conventional Reset section.

If I understand specifications correctly then OS initialization steps
should be following (please correct me if I'm wrong!):

1) Put PERST# to low which enter endpoint card into reset state
2) Enable AUX power (3.3V) and wait until is stable
3) Enable main power (12V/3.3V) and wait until is stable
4) Enable refclock and wait until is stable
5) Enable LTSSM on PCIe controller to start link training
6) Put PERST# to high which exit endpoint card from reset state
7) Wait until link training completes
8) Wait another 100ms prior accessing config space of endpoint card

Minimal time period between "after step 3)" and "before step 6)" is T_PVPERL = 100ms
Minimal time period between "after step 4)" and "before step 6)" is T_PERSTCLK = 100us

After step 6) is end of Fundamental Reset and PCIe controller needs to
be in LTSSM Detect state within 20ms. So enabling it prior putting
PERST# to high should achieve it.

Competition of link training is indicated by standard DLLLA bit in Root
Port config space. Support for DLLLA bit is optional and is indicated by
DLLLARC bit in Root Port config space. Lot of PCIe controllers do not
support this standard DLLLA bit, but have their own specific register
for it.

Similarly is defined power down of PCIe card in PCIe CEM spec in Figure
2-13: Power Down. If I understand it correctly steps are:

1) Put endpoint card into D3hot state, so PCIe link goes inactive
2) Put PERST# to low, so endpoint card enters reset state
3) Disable main power (12V/3.3V)
4) Disable refclock

In case of surprise power down, PERST# needs to go low in 500ns.

In PCIe base spec in section 5.2 Link State Power Management is
described that preparation for removing the main power source can be
done also by sending PCIe PME_Turn_Off Message by Root Complex. IIRC
there is no standard way how to send PCIe PME_Turn_Off message.




I see that basically every PCIe controller driver is doing its own
implementation of PCIe Warm Reset and waiting until PCIe link is ready
prior calling pci_host_probe().

Based on all above details I would like to propose following extending
of struct pci_host_bridge and pci_host_probe() function to de-duplicate
native PCIe controller driver code:

1) extend struct pci_host_bridge to provide callbacks for:
   * enable / disable main power source
   * enable / disable refclock
   * enable / disable LTSSM link training (if PCIe link should go into Detect / Polling state)
   * enable / disable PERST# signal
   * returning boolean if endpoint card is present (physically in PCIe/mPCIe/m.2/... slot)
   * returning boolean if link training completed
   * sending PCIe PME_Turn_Off message

2) implement asynchronous initialization of PCIe link and enumeration of
   PCIe bus behind the PCIe Root Port from pci_host_probe() based on new
   callbacks added in 1)
   --> so native PCIe controller drivers do not have to do it manually
   --> during this initialization can be done also PCIe Hot Reset

3) implement PCIe Hot Reset as reset method via PERST# signal and put it
   into pci_reset_fn_methods[] array

4) implement PCIe Cold Reset as reset method via power down / up and put
   it into pci_reset_fn_methods[] array

5) as enabling / disabling power supply and toggling PERST# signal is
   implemented via GPIO, add some generic implementation for callback
   functions which will use named gpios (e.g. from DT)

This could simplify implementations of native PCIe controller drivers by
calling initialization steps in correct order with correct timeouts and
drivers do not have to do copy+paste same common code or reimplement it
with own constants and macros for timeouts, etc...

Also it should enable access to PCIe Root Port as this device is part of
Root Complex and should be available also when link is down or link
training was not completed. Currently some PCIe controllers are not
registered into system when link is down (e.g. card is disconnected or
card has some issue). Which also prevents access to PCIe Root Port
device. And in some cases it could speed up boot process as pci
controller driver does not have to actively wait for link and let kernel
do initialization of other devices.

What do you think about this idea?


PS: Excuse me if I wrote some mistakes here, I'm still learning how PCIe
is working and how is PCI subsystem implemented.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: Extend probing of native PCIe controllers
  2021-10-22 18:38 RFC: Extend probing of native PCIe controllers Pali Rohár
@ 2021-11-08  8:50 ` Luca Ceresoli
  2021-11-08 16:11   ` Pali Rohár
  2021-11-08 13:14 ` Oliver O'Halloran
  2021-12-08 15:52 ` Bjorn Helgaas
  2 siblings, 1 reply; 7+ messages in thread
From: Luca Ceresoli @ 2021-11-08  8:50 UTC (permalink / raw)
  To: Pali Rohár, Bjorn Helgaas, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Marek Behún,
	Matthew Wilcox, Alex Williamson, linux-pci, linux-kernel

Hi Pali,

On 22/10/21 20:38, Pali Rohár wrote:
> Hello!
> 
> In this email I'm describing how I see that probing of native PCIe
> controller drivers is implemented, how it should be implemented and how
> to extend / simplify core code for controller drivers.
> 
> 
> Native PCIe controller drivers needs to fill struct pci_host_bridge and
> then call pci_host_probe(). Function pci_host_probe() starts probing and
> enumerating buses and register PCIe devices to system.
> 
> But initialization of PCIe controller and cards on buses (other end of
> PCIe link) is more complicated and basically every native PCIe
> controller driver needs to do initialization PCIe link prior calling
> pci_host_probe(). Steps which controller drivers are doing are de-facto
> standard steps defined in PCIe base or CEM specification.
> 
> The most problematic step is to reset endpoint card and wait until
> endpoint card start. Reset of endpoint card is done by standard PERST#
> signal (defined in PCIe CEM spec) and in most cases board export control
> of this signal to OS via GPIO (few board and drivers have this signal
> connected to PCIe controller and then PCIe controller has some specific
> registers to control this signal). Reset via PERST# signal is defined in
> PCIe CEM and base specs as "PCIe Warm Reset".
> 
> As discussed in the following email thread, this PCIe Warm Reset should
> not depend on PCIe controller as it resets card on the other end of PCIe
> controller. But currently every native PCIe controller driver does PCIe
> Warm Reset by its own for randomly chosen time period. There is open
> question how long should be endpoint card in Warm Reset state:
> https://lore.kernel.org/linux-pci/20210310110535.zh4pnn4vpmvzwl5q@pali/
> 
> Initialization of PCIe endpoint card is described in PCIe CEM spec in
> Figure 2-10: Power Up. Other informations are in PCIe base spec in 6.6.1
> Conventional Reset section.
> 
> If I understand specifications correctly then OS initialization steps
> should be following (please correct me if I'm wrong!):
> 
> 1) Put PERST# to low which enter endpoint card into reset state
> 2) Enable AUX power (3.3V) and wait until is stable
> 3) Enable main power (12V/3.3V) and wait until is stable
> 4) Enable refclock and wait until is stable
> 5) Enable LTSSM on PCIe controller to start link training
> 6) Put PERST# to high which exit endpoint card from reset state
> 7) Wait until link training completes
> 8) Wait another 100ms prior accessing config space of endpoint card
> 
> Minimal time period between "after step 3)" and "before step 6)" is T_PVPERL = 100ms
> Minimal time period between "after step 4)" and "before step 6)" is T_PERSTCLK = 100us
> 
> After step 6) is end of Fundamental Reset and PCIe controller needs to
> be in LTSSM Detect state within 20ms. So enabling it prior putting
> PERST# to high should achieve it.
> 
> Competition of link training is indicated by standard DLLLA bit in Root
> Port config space. Support for DLLLA bit is optional and is indicated by
> DLLLARC bit in Root Port config space. Lot of PCIe controllers do not
> support this standard DLLLA bit, but have their own specific register
> for it.
> 
> Similarly is defined power down of PCIe card in PCIe CEM spec in Figure
> 2-13: Power Down. If I understand it correctly steps are:
> 
> 1) Put endpoint card into D3hot state, so PCIe link goes inactive
> 2) Put PERST# to low, so endpoint card enters reset state
> 3) Disable main power (12V/3.3V)
> 4) Disable refclock
> 
> In case of surprise power down, PERST# needs to go low in 500ns.
> 
> In PCIe base spec in section 5.2 Link State Power Management is
> described that preparation for removing the main power source can be
> done also by sending PCIe PME_Turn_Off Message by Root Complex. IIRC
> there is no standard way how to send PCIe PME_Turn_Off message.
> 
> 
> 
> 
> I see that basically every PCIe controller driver is doing its own
> implementation of PCIe Warm Reset and waiting until PCIe link is ready
> prior calling pci_host_probe().
> 
> Based on all above details I would like to propose following extending
> of struct pci_host_bridge and pci_host_probe() function to de-duplicate
> native PCIe controller driver code:
> 
> 1) extend struct pci_host_bridge to provide callbacks for:
>    * enable / disable main power source
>    * enable / disable refclock
>    * enable / disable LTSSM link training (if PCIe link should go into Detect / Polling state)
>    * enable / disable PERST# signal
>    * returning boolean if endpoint card is present (physically in PCIe/mPCIe/m.2/... slot)
>    * returning boolean if link training completed
>    * sending PCIe PME_Turn_Off message
> 
> 2) implement asynchronous initialization of PCIe link and enumeration of
>    PCIe bus behind the PCIe Root Port from pci_host_probe() based on new
>    callbacks added in 1)
>    --> so native PCIe controller drivers do not have to do it manually
>    --> during this initialization can be done also PCIe Hot Reset
> 
> 3) implement PCIe Hot Reset as reset method via PERST# signal and put it
>    into pci_reset_fn_methods[] array
> 
> 4) implement PCIe Cold Reset as reset method via power down / up and put
>    it into pci_reset_fn_methods[] array
> 
> 5) as enabling / disabling power supply and toggling PERST# signal is
>    implemented via GPIO, add some generic implementation for callback
>    functions which will use named gpios (e.g. from DT)
> 
> This could simplify implementations of native PCIe controller drivers by
> calling initialization steps in correct order with correct timeouts and
> drivers do not have to do copy+paste same common code or reimplement it
> with own constants and macros for timeouts, etc...
> 
> Also it should enable access to PCIe Root Port as this device is part of
> Root Complex and should be available also when link is down or link
> training was not completed. Currently some PCIe controllers are not
> registered into system when link is down (e.g. card is disconnected or
> card has some issue). Which also prevents access to PCIe Root Port
> device. And in some cases it could speed up boot process as pci
> controller driver does not have to actively wait for link and let kernel
> do initialization of other devices.
> 
> What do you think about this idea?

I'm afraid I know very little about PCI so I cannot give much valuable
feedback.

For what I can understand yours seems like a good analysis and the plan
seems correct: move control from drivers into the core as far as the
actions to take are standard and leave drivers the duty to implement
only hardware-specific pieces via function pointers.

We have discussed in detail the PERST# behavior in [0], and I have a
comment about that.

First, from that discussion it was clear that some drivers drive PERST#
with an inverted polarity when calling gpiod_set_value() or
devm_gpiod_get_optional(), which is "fixed" by an inverted polarity
setting in their device tree.

Second, you say "in most cases board export control of this signal to OS
via GPIO".

For these two reasons you might consider, in addition to the
pci_reset_fn_methods[] function to implement PERST#, to add a struct
gpio_desc * to be filled by the driver and used by the core. If set, it
would allow the core to assert/deassert PERST# without driver
intervention. The idea in pseudocode is:

  [in drivers/pci/probe.c]

  if (bridge->perst_gpio) {
      gpiod_set_value(reset, 1); // assert
      usleep(...);
      gpiod_set_value(reset, 0); // deassert
      usleep(...);
  } else if (bridge->pci_reset_fn_methods[PERST]) {
      bridge->pci_reset_fn_methods[PERST]();
  }

It would make drivers slightly simpler and less error prone.

[0]
https://lore.kernel.org/linux-arm-kernel/e9ab9c22-f73b-fe72-820a-4f2825c3dabc@lucaceresoli.net/T/#mc4fc8c10ebeeff8c6d3593b0072afbcf7de9f2ae

My 2c,
-- 
Luca


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: Extend probing of native PCIe controllers
  2021-10-22 18:38 RFC: Extend probing of native PCIe controllers Pali Rohár
  2021-11-08  8:50 ` Luca Ceresoli
@ 2021-11-08 13:14 ` Oliver O'Halloran
  2021-11-08 16:47   ` Pali Rohár
  2021-12-08 15:52 ` Bjorn Helgaas
  2 siblings, 1 reply; 7+ messages in thread
From: Oliver O'Halloran @ 2021-11-08 13:14 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Luca Ceresoli, Marek Behún, Matthew Wilcox,
	Alex Williamson, linux-pci, Linux Kernel Mailing List

On Sat, Oct 23, 2021 at 5:38 AM Pali Rohár <pali@kernel.org> wrote:
>
> Hello!
>
> In this email I'm describing how I see that probing of native PCIe
> controller drivers is implemented, how it should be implemented and how
> to extend / simplify core code for controller drivers.
>
> Native PCIe controller drivers needs to fill struct pci_host_bridge and
> then call pci_host_probe(). Function pci_host_probe() starts probing and
> enumerating buses and register PCIe devices to system.
>
> But initialization of PCIe controller and cards on buses (other end of
> PCIe link) is more complicated and basically every native PCIe
> controller driver needs to do initialization PCIe link prior calling
> pci_host_probe(). Steps which controller drivers are doing are de-facto
> standard steps defined in PCIe base or CEM specification.
>
> The most problematic step is to reset endpoint card and wait until
> endpoint card start. Reset of endpoint card is done by standard PERST#
> signal (defined in PCIe CEM spec) and in most cases board export control
> of this signal to OS via GPIO (few board and drivers have this signal
> connected to PCIe controller and then PCIe controller has some specific
> registers to control this signal). Reset via PERST# signal is defined in
> PCIe CEM and base specs as "PCIe Warm Reset".
>
> As discussed in the following email thread, this PCIe Warm Reset should
> not depend on PCIe controller as it resets card on the other end of PCIe
> controller. But currently every native PCIe controller driver does PCIe
> Warm Reset by its own for randomly chosen time period. There is open
> question how long should be endpoint card in Warm Reset state:
> https://lore.kernel.org/linux-pci/20210310110535.zh4pnn4vpmvzwl5q@pali/
>
> Initialization of PCIe endpoint card is described in PCIe CEM spec in
> Figure 2-10: Power Up. Other informations are in PCIe base spec in 6.6.1
> Conventional Reset section.
>
> If I understand specifications correctly then OS initialization steps
> should be following (please correct me if I'm wrong!):
>
> 1) Put PERST# to low which enter endpoint card into reset state
> 2) Enable AUX power (3.3V) and wait until is stable
> 3) Enable main power (12V/3.3V) and wait until is stable
> 4) Enable refclock and wait until is stable
> 5) Enable LTSSM on PCIe controller to start link training
> 6) Put PERST# to high which exit endpoint card from reset state
> 7) Wait until link training completes
> 8) Wait another 100ms prior accessing config space of endpoint card
>
> Minimal time period between "after step 3)" and "before step 6)" is T_PVPERL = 100ms
> Minimal time period between "after step 4)" and "before step 6)" is T_PERSTCLK = 100us
>
> After step 6) is end of Fundamental Reset and PCIe controller needs to
> be in LTSSM Detect state within 20ms. So enabling it prior putting
> PERST# to high should achieve it.

5) is a bit out of place here and it could be step 0). There's no real
requirements around when the upstream device (the controller in our
case) starts polling for downstream devices. #PERST is effectively a
power good signal so the downstream device is required to ignore the
link polls until #PERST is lifted.

> Competition of link training is indicated by standard DLLLA bit in Root
> Port config space. Support for DLLLA bit is optional and is indicated by
> DLLLARC bit in Root Port config space. Lot of PCIe controllers do not
> support this standard DLLLA bit, but have their own specific register
> for it.

I thought DLLLA reporting was made mandatory in gen2? I suppose it
doesn't really matter since we need to support gen1 devices anyway,
but still...

> Similarly is defined power down of PCIe card in PCIe CEM spec in Figure
> 2-13: Power Down. If I understand it correctly steps are:
>
> 1) Put endpoint card into D3hot state, so PCIe link goes inactive
> 2) Put PERST# to low, so endpoint card enters reset state
> 3) Disable main power (12V/3.3V)
> 4) Disable refclock
>
> In case of surprise power down, PERST# needs to go low in 500ns.
>
> In PCIe base spec in section 5.2 Link State Power Management is
> described that preparation for removing the main power source can be
> done also by sending PCIe PME_Turn_Off Message by Root Complex. IIRC
> there is no standard way how to send PCIe PME_Turn_Off message.

Is there any real difference between sending this message and placing
the device into D3hot? IIRC devices in D3hot are required to tolerate
a transition to D3cold (i.e. power being removed) anyway.

> I see that basically every PCIe controller driver is doing its own
> implementation of PCIe Warm Reset and waiting until PCIe link is ready
> prior calling pci_host_probe().
>
> Based on all above details I would like to propose following extending
> of struct pci_host_bridge and pci_host_probe() function to de-duplicate
> native PCIe controller driver code:
>
> 1) extend struct pci_host_bridge to provide callbacks for:
>    * enable / disable main power source
>    * enable / disable refclock
>    * enable / disable LTSSM link training (if PCIe link should go into Detect / Polling state)
>    * enable / disable PERST# signal
>    * returning boolean if endpoint card is present (physically in PCIe/mPCIe/m.2/... slot)
>    * returning boolean if link training completed
>    * sending PCIe PME_Turn_Off message

I don't have any real objections to adding this, but I do wonder what
code would be using these calls. If you squint a bit you've got
something that looks a lot like a pci hotplug slot driver (enable /
disable / presence check calls). It might make sense to try use some
of that infrastructure for what you want to do since the bus scanning
code already knows how to deal with slot drivers.

> 2) implement asynchronous initialization of PCIe link and enumeration of
>    PCIe bus behind the PCIe Root Port from pci_host_probe() based on new
>    callbacks added in 1)
>    --> so native PCIe controller drivers do not have to do it manually
>    --> during this initialization can be done also PCIe Hot Reset
>
> 3) implement PCIe Hot Reset as reset method via PERST# signal and put it
>    into pci_reset_fn_methods[] array

I assume you mean Warm reset here. I'll add a word of caution that
there are badly behaved devices out there which will ignore #PERST
being re-asserted after the card is powered on.

> 4) implement PCIe Cold Reset as reset method via power down / up and put
>    it into pci_reset_fn_methods[] array
>
> 5) as enabling / disabling power supply and toggling PERST# signal is
>    implemented via GPIO, add some generic implementation for callback
>    functions which will use named gpios (e.g. from DT)
>
> This could simplify implementations of native PCIe controller drivers by
> calling initialization steps in correct order with correct timeouts and
> drivers do not have to do copy+paste same common code or reimplement it
> with own constants and macros for timeouts, etc...
>
> Also it should enable access to PCIe Root Port as this device is part of
> Root Complex and should be available also when link is down or link
> training was not completed. Currently some PCIe controllers are not
> registered into system when link is down (e.g. card is disconnected or
> card has some issue). Which also prevents access to PCIe Root Port
> device.

Is there a good reason for skipping probing the controller when
there's no downstream device present? Seems kinda dumb to make a whole
pci domain disappear just because it didn't have a device at boot.

> And in some cases it could speed up boot process as pci
> controller driver does not have to actively wait for link and let kernel
> do initialization of other devices.

> What do you think about this idea?

Sounds like a good idea to me.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: Extend probing of native PCIe controllers
  2021-11-08  8:50 ` Luca Ceresoli
@ 2021-11-08 16:11   ` Pali Rohár
  2021-11-08 16:33     ` Luca Ceresoli
  0 siblings, 1 reply; 7+ messages in thread
From: Pali Rohár @ 2021-11-08 16:11 UTC (permalink / raw)
  To: Luca Ceresoli
  Cc: Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Marek Behún, Matthew Wilcox, Alex Williamson,
	linux-pci, linux-kernel

Hello!

On Monday 08 November 2021 09:50:51 Luca Ceresoli wrote:
> Hi Pali,
> 
> On 22/10/21 20:38, Pali Rohár wrote:
> > Hello!
> > 
> > In this email I'm describing how I see that probing of native PCIe
> > controller drivers is implemented, how it should be implemented and how
> > to extend / simplify core code for controller drivers.
> > 
> > 
> > Native PCIe controller drivers needs to fill struct pci_host_bridge and
> > then call pci_host_probe(). Function pci_host_probe() starts probing and
> > enumerating buses and register PCIe devices to system.
> > 
> > But initialization of PCIe controller and cards on buses (other end of
> > PCIe link) is more complicated and basically every native PCIe
> > controller driver needs to do initialization PCIe link prior calling
> > pci_host_probe(). Steps which controller drivers are doing are de-facto
> > standard steps defined in PCIe base or CEM specification.
> > 
> > The most problematic step is to reset endpoint card and wait until
> > endpoint card start. Reset of endpoint card is done by standard PERST#
> > signal (defined in PCIe CEM spec) and in most cases board export control
> > of this signal to OS via GPIO (few board and drivers have this signal
> > connected to PCIe controller and then PCIe controller has some specific
> > registers to control this signal). Reset via PERST# signal is defined in
> > PCIe CEM and base specs as "PCIe Warm Reset".
> > 
> > As discussed in the following email thread, this PCIe Warm Reset should
> > not depend on PCIe controller as it resets card on the other end of PCIe
> > controller. But currently every native PCIe controller driver does PCIe
> > Warm Reset by its own for randomly chosen time period. There is open
> > question how long should be endpoint card in Warm Reset state:
> > https://lore.kernel.org/linux-pci/20210310110535.zh4pnn4vpmvzwl5q@pali/
> > 
> > Initialization of PCIe endpoint card is described in PCIe CEM spec in
> > Figure 2-10: Power Up. Other informations are in PCIe base spec in 6.6.1
> > Conventional Reset section.
> > 
> > If I understand specifications correctly then OS initialization steps
> > should be following (please correct me if I'm wrong!):
> > 
> > 1) Put PERST# to low which enter endpoint card into reset state
> > 2) Enable AUX power (3.3V) and wait until is stable
> > 3) Enable main power (12V/3.3V) and wait until is stable
> > 4) Enable refclock and wait until is stable
> > 5) Enable LTSSM on PCIe controller to start link training
> > 6) Put PERST# to high which exit endpoint card from reset state
> > 7) Wait until link training completes
> > 8) Wait another 100ms prior accessing config space of endpoint card
> > 
> > Minimal time period between "after step 3)" and "before step 6)" is T_PVPERL = 100ms
> > Minimal time period between "after step 4)" and "before step 6)" is T_PERSTCLK = 100us
> > 
> > After step 6) is end of Fundamental Reset and PCIe controller needs to
> > be in LTSSM Detect state within 20ms. So enabling it prior putting
> > PERST# to high should achieve it.
> > 
> > Competition of link training is indicated by standard DLLLA bit in Root
> > Port config space. Support for DLLLA bit is optional and is indicated by
> > DLLLARC bit in Root Port config space. Lot of PCIe controllers do not
> > support this standard DLLLA bit, but have their own specific register
> > for it.
> > 
> > Similarly is defined power down of PCIe card in PCIe CEM spec in Figure
> > 2-13: Power Down. If I understand it correctly steps are:
> > 
> > 1) Put endpoint card into D3hot state, so PCIe link goes inactive
> > 2) Put PERST# to low, so endpoint card enters reset state
> > 3) Disable main power (12V/3.3V)
> > 4) Disable refclock
> > 
> > In case of surprise power down, PERST# needs to go low in 500ns.
> > 
> > In PCIe base spec in section 5.2 Link State Power Management is
> > described that preparation for removing the main power source can be
> > done also by sending PCIe PME_Turn_Off Message by Root Complex. IIRC
> > there is no standard way how to send PCIe PME_Turn_Off message.
> > 
> > 
> > 
> > 
> > I see that basically every PCIe controller driver is doing its own
> > implementation of PCIe Warm Reset and waiting until PCIe link is ready
> > prior calling pci_host_probe().
> > 
> > Based on all above details I would like to propose following extending
> > of struct pci_host_bridge and pci_host_probe() function to de-duplicate
> > native PCIe controller driver code:
> > 
> > 1) extend struct pci_host_bridge to provide callbacks for:
> >    * enable / disable main power source
> >    * enable / disable refclock
> >    * enable / disable LTSSM link training (if PCIe link should go into Detect / Polling state)
> >    * enable / disable PERST# signal
> >    * returning boolean if endpoint card is present (physically in PCIe/mPCIe/m.2/... slot)
> >    * returning boolean if link training completed
> >    * sending PCIe PME_Turn_Off message
> > 
> > 2) implement asynchronous initialization of PCIe link and enumeration of
> >    PCIe bus behind the PCIe Root Port from pci_host_probe() based on new
> >    callbacks added in 1)
> >    --> so native PCIe controller drivers do not have to do it manually
> >    --> during this initialization can be done also PCIe Hot Reset
> > 
> > 3) implement PCIe Hot Reset as reset method via PERST# signal and put it
> >    into pci_reset_fn_methods[] array
> > 
> > 4) implement PCIe Cold Reset as reset method via power down / up and put
> >    it into pci_reset_fn_methods[] array
> > 
> > 5) as enabling / disabling power supply and toggling PERST# signal is
> >    implemented via GPIO, add some generic implementation for callback
> >    functions which will use named gpios (e.g. from DT)
> > 
> > This could simplify implementations of native PCIe controller drivers by
> > calling initialization steps in correct order with correct timeouts and
> > drivers do not have to do copy+paste same common code or reimplement it
> > with own constants and macros for timeouts, etc...
> > 
> > Also it should enable access to PCIe Root Port as this device is part of
> > Root Complex and should be available also when link is down or link
> > training was not completed. Currently some PCIe controllers are not
> > registered into system when link is down (e.g. card is disconnected or
> > card has some issue). Which also prevents access to PCIe Root Port
> > device. And in some cases it could speed up boot process as pci
> > controller driver does not have to actively wait for link and let kernel
> > do initialization of other devices.
> > 
> > What do you think about this idea?
> 
> I'm afraid I know very little about PCI so I cannot give much valuable
> feedback.
> 
> For what I can understand yours seems like a good analysis and the plan
> seems correct: move control from drivers into the core as far as the
> actions to take are standard and leave drivers the duty to implement
> only hardware-specific pieces via function pointers.
> 
> We have discussed in detail the PERST# behavior in [0], and I have a
> comment about that.
> 
> First, from that discussion it was clear that some drivers drive PERST#
> with an inverted polarity when calling gpiod_set_value() or
> devm_gpiod_get_optional(), which is "fixed" by an inverted polarity
> setting in their device tree.

Yes, that is truth.

> Second, you say "in most cases board export control of this signal to OS
> via GPIO".

Yes.

> For these two reasons you might consider, in addition to the
> pci_reset_fn_methods[] function to implement PERST#, to add a struct
> gpio_desc * to be filled by the driver and used by the core. If set, it
> would allow the core to assert/deassert PERST# without driver
> intervention.

Controlling PERST# via pci_reset_fn_methods[] is not enough.
pci_reset_fn_methods[PERST]() would do something like this:

1. Assert PERST# (put card into reset)
2. Delay some time period (this is open question how long it should be)
3. De-assert PERST# (put card from reset into normal state)

But in some other functions it would be needed to put card into reset
state and let it in this state (e.g. in power down, driver unbind,
etc...). So pci_reset_fn_methods[PERST] would not be possible to use it
here.

I was thinking to provide just callback bridge->set_perst(enable) and
then implement pci_reset_fn_methods[PERST] via this callback.

If that callback is NULL then PCI core code can assign default
"set_perst" callback which use gpiod_set_value. Like in your pseudocode
below.

Drivers which have inverted polarity of PERTST# gpio or drivers which
have custom way for controlling this pin would have to provide
appropriate callbacks.

> The idea in pseudocode is:
> 
>   [in drivers/pci/probe.c]
> 
>   if (bridge->perst_gpio) {
>       gpiod_set_value(reset, 1); // assert
>       usleep(...);
>       gpiod_set_value(reset, 0); // deassert
>       usleep(...);
>   } else if (bridge->pci_reset_fn_methods[PERST]) {
>       bridge->pci_reset_fn_methods[PERST]();
>   }
> 
> It would make drivers slightly simpler and less error prone.
> 
> [0]
> https://lore.kernel.org/linux-arm-kernel/e9ab9c22-f73b-fe72-820a-4f2825c3dabc@lucaceresoli.net/T/#mc4fc8c10ebeeff8c6d3593b0072afbcf7de9f2ae
> 
> My 2c,
> -- 
> Luca
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: Extend probing of native PCIe controllers
  2021-11-08 16:11   ` Pali Rohár
@ 2021-11-08 16:33     ` Luca Ceresoli
  0 siblings, 0 replies; 7+ messages in thread
From: Luca Ceresoli @ 2021-11-08 16:33 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Marek Behún, Matthew Wilcox, Alex Williamson,
	linux-pci, linux-kernel

On 08/11/21 17:11, Pali Rohár wrote:
> Hello!
> 
> On Monday 08 November 2021 09:50:51 Luca Ceresoli wrote:
>> Hi Pali,
>>
>> On 22/10/21 20:38, Pali Rohár wrote:
>>> Hello!
>>>
>>> In this email I'm describing how I see that probing of native PCIe
>>> controller drivers is implemented, how it should be implemented and how
>>> to extend / simplify core code for controller drivers.
>>>
>>>
>>> Native PCIe controller drivers needs to fill struct pci_host_bridge and
>>> then call pci_host_probe(). Function pci_host_probe() starts probing and
>>> enumerating buses and register PCIe devices to system.
>>>
>>> But initialization of PCIe controller and cards on buses (other end of
>>> PCIe link) is more complicated and basically every native PCIe
>>> controller driver needs to do initialization PCIe link prior calling
>>> pci_host_probe(). Steps which controller drivers are doing are de-facto
>>> standard steps defined in PCIe base or CEM specification.
>>>
>>> The most problematic step is to reset endpoint card and wait until
>>> endpoint card start. Reset of endpoint card is done by standard PERST#
>>> signal (defined in PCIe CEM spec) and in most cases board export control
>>> of this signal to OS via GPIO (few board and drivers have this signal
>>> connected to PCIe controller and then PCIe controller has some specific
>>> registers to control this signal). Reset via PERST# signal is defined in
>>> PCIe CEM and base specs as "PCIe Warm Reset".
>>>
>>> As discussed in the following email thread, this PCIe Warm Reset should
>>> not depend on PCIe controller as it resets card on the other end of PCIe
>>> controller. But currently every native PCIe controller driver does PCIe
>>> Warm Reset by its own for randomly chosen time period. There is open
>>> question how long should be endpoint card in Warm Reset state:
>>> https://lore.kernel.org/linux-pci/20210310110535.zh4pnn4vpmvzwl5q@pali/
>>>
>>> Initialization of PCIe endpoint card is described in PCIe CEM spec in
>>> Figure 2-10: Power Up. Other informations are in PCIe base spec in 6.6.1
>>> Conventional Reset section.
>>>
>>> If I understand specifications correctly then OS initialization steps
>>> should be following (please correct me if I'm wrong!):
>>>
>>> 1) Put PERST# to low which enter endpoint card into reset state
>>> 2) Enable AUX power (3.3V) and wait until is stable
>>> 3) Enable main power (12V/3.3V) and wait until is stable
>>> 4) Enable refclock and wait until is stable
>>> 5) Enable LTSSM on PCIe controller to start link training
>>> 6) Put PERST# to high which exit endpoint card from reset state
>>> 7) Wait until link training completes
>>> 8) Wait another 100ms prior accessing config space of endpoint card
>>>
>>> Minimal time period between "after step 3)" and "before step 6)" is T_PVPERL = 100ms
>>> Minimal time period between "after step 4)" and "before step 6)" is T_PERSTCLK = 100us
>>>
>>> After step 6) is end of Fundamental Reset and PCIe controller needs to
>>> be in LTSSM Detect state within 20ms. So enabling it prior putting
>>> PERST# to high should achieve it.
>>>
>>> Competition of link training is indicated by standard DLLLA bit in Root
>>> Port config space. Support for DLLLA bit is optional and is indicated by
>>> DLLLARC bit in Root Port config space. Lot of PCIe controllers do not
>>> support this standard DLLLA bit, but have their own specific register
>>> for it.
>>>
>>> Similarly is defined power down of PCIe card in PCIe CEM spec in Figure
>>> 2-13: Power Down. If I understand it correctly steps are:
>>>
>>> 1) Put endpoint card into D3hot state, so PCIe link goes inactive
>>> 2) Put PERST# to low, so endpoint card enters reset state
>>> 3) Disable main power (12V/3.3V)
>>> 4) Disable refclock
>>>
>>> In case of surprise power down, PERST# needs to go low in 500ns.
>>>
>>> In PCIe base spec in section 5.2 Link State Power Management is
>>> described that preparation for removing the main power source can be
>>> done also by sending PCIe PME_Turn_Off Message by Root Complex. IIRC
>>> there is no standard way how to send PCIe PME_Turn_Off message.
>>>
>>>
>>>
>>>
>>> I see that basically every PCIe controller driver is doing its own
>>> implementation of PCIe Warm Reset and waiting until PCIe link is ready
>>> prior calling pci_host_probe().
>>>
>>> Based on all above details I would like to propose following extending
>>> of struct pci_host_bridge and pci_host_probe() function to de-duplicate
>>> native PCIe controller driver code:
>>>
>>> 1) extend struct pci_host_bridge to provide callbacks for:
>>>    * enable / disable main power source
>>>    * enable / disable refclock
>>>    * enable / disable LTSSM link training (if PCIe link should go into Detect / Polling state)
>>>    * enable / disable PERST# signal
>>>    * returning boolean if endpoint card is present (physically in PCIe/mPCIe/m.2/... slot)
>>>    * returning boolean if link training completed
>>>    * sending PCIe PME_Turn_Off message
>>>
>>> 2) implement asynchronous initialization of PCIe link and enumeration of
>>>    PCIe bus behind the PCIe Root Port from pci_host_probe() based on new
>>>    callbacks added in 1)
>>>    --> so native PCIe controller drivers do not have to do it manually
>>>    --> during this initialization can be done also PCIe Hot Reset
>>>
>>> 3) implement PCIe Hot Reset as reset method via PERST# signal and put it
>>>    into pci_reset_fn_methods[] array
>>>
>>> 4) implement PCIe Cold Reset as reset method via power down / up and put
>>>    it into pci_reset_fn_methods[] array
>>>
>>> 5) as enabling / disabling power supply and toggling PERST# signal is
>>>    implemented via GPIO, add some generic implementation for callback
>>>    functions which will use named gpios (e.g. from DT)
>>>
>>> This could simplify implementations of native PCIe controller drivers by
>>> calling initialization steps in correct order with correct timeouts and
>>> drivers do not have to do copy+paste same common code or reimplement it
>>> with own constants and macros for timeouts, etc...
>>>
>>> Also it should enable access to PCIe Root Port as this device is part of
>>> Root Complex and should be available also when link is down or link
>>> training was not completed. Currently some PCIe controllers are not
>>> registered into system when link is down (e.g. card is disconnected or
>>> card has some issue). Which also prevents access to PCIe Root Port
>>> device. And in some cases it could speed up boot process as pci
>>> controller driver does not have to actively wait for link and let kernel
>>> do initialization of other devices.
>>>
>>> What do you think about this idea?
>>
>> I'm afraid I know very little about PCI so I cannot give much valuable
>> feedback.
>>
>> For what I can understand yours seems like a good analysis and the plan
>> seems correct: move control from drivers into the core as far as the
>> actions to take are standard and leave drivers the duty to implement
>> only hardware-specific pieces via function pointers.
>>
>> We have discussed in detail the PERST# behavior in [0], and I have a
>> comment about that.
>>
>> First, from that discussion it was clear that some drivers drive PERST#
>> with an inverted polarity when calling gpiod_set_value() or
>> devm_gpiod_get_optional(), which is "fixed" by an inverted polarity
>> setting in their device tree.
> 
> Yes, that is truth.
> 
>> Second, you say "in most cases board export control of this signal to OS
>> via GPIO".
> 
> Yes.
> 
>> For these two reasons you might consider, in addition to the
>> pci_reset_fn_methods[] function to implement PERST#, to add a struct
>> gpio_desc * to be filled by the driver and used by the core. If set, it
>> would allow the core to assert/deassert PERST# without driver
>> intervention.
> 
> Controlling PERST# via pci_reset_fn_methods[] is not enough.
> pci_reset_fn_methods[PERST]() would do something like this:
> 
> 1. Assert PERST# (put card into reset)
> 2. Delay some time period (this is open question how long it should be)
> 3. De-assert PERST# (put card from reset into normal state)
> 
> But in some other functions it would be needed to put card into reset
> state and let it in this state (e.g. in power down, driver unbind,
> etc...). So pci_reset_fn_methods[PERST] would not be possible to use it
> here.
> 
> I was thinking to provide just callback bridge->set_perst(enable) and
> then implement pci_reset_fn_methods[PERST] via this callback.

I agree, a callback with a bool "enable" parameter is a good thing. It
also allows to move delays into the core.

> If that callback is NULL then PCI core code can assign default
> "set_perst" callback which use gpiod_set_value. Like in your pseudocode
> below.
> 
> Drivers which have inverted polarity of PERTST# gpio or drivers which
> have custom way for controlling this pin would have to provide
> appropriate callbacks.

If a gpio_desc is passed by the driver to the core, it "knows" the
polarity (from device tree) so there is no need for special handling by
the driver. So drivers which have inverted polarity won't need a
callback; drivers which need special handling (e.g. not a regular GPIO)
will need a callback.

-- 
Luca


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: Extend probing of native PCIe controllers
  2021-11-08 13:14 ` Oliver O'Halloran
@ 2021-11-08 16:47   ` Pali Rohár
  0 siblings, 0 replies; 7+ messages in thread
From: Pali Rohár @ 2021-11-08 16:47 UTC (permalink / raw)
  To: Oliver O'Halloran
  Cc: Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Luca Ceresoli, Marek Behún, Matthew Wilcox,
	Alex Williamson, linux-pci, Linux Kernel Mailing List

Hello!

On Tuesday 09 November 2021 00:14:36 Oliver O'Halloran wrote:
> On Sat, Oct 23, 2021 at 5:38 AM Pali Rohár <pali@kernel.org> wrote:
> >
> > Hello!
> >
> > In this email I'm describing how I see that probing of native PCIe
> > controller drivers is implemented, how it should be implemented and how
> > to extend / simplify core code for controller drivers.
> >
> > Native PCIe controller drivers needs to fill struct pci_host_bridge and
> > then call pci_host_probe(). Function pci_host_probe() starts probing and
> > enumerating buses and register PCIe devices to system.
> >
> > But initialization of PCIe controller and cards on buses (other end of
> > PCIe link) is more complicated and basically every native PCIe
> > controller driver needs to do initialization PCIe link prior calling
> > pci_host_probe(). Steps which controller drivers are doing are de-facto
> > standard steps defined in PCIe base or CEM specification.
> >
> > The most problematic step is to reset endpoint card and wait until
> > endpoint card start. Reset of endpoint card is done by standard PERST#
> > signal (defined in PCIe CEM spec) and in most cases board export control
> > of this signal to OS via GPIO (few board and drivers have this signal
> > connected to PCIe controller and then PCIe controller has some specific
> > registers to control this signal). Reset via PERST# signal is defined in
> > PCIe CEM and base specs as "PCIe Warm Reset".
> >
> > As discussed in the following email thread, this PCIe Warm Reset should
> > not depend on PCIe controller as it resets card on the other end of PCIe
> > controller. But currently every native PCIe controller driver does PCIe
> > Warm Reset by its own for randomly chosen time period. There is open
> > question how long should be endpoint card in Warm Reset state:
> > https://lore.kernel.org/linux-pci/20210310110535.zh4pnn4vpmvzwl5q@pali/
> >
> > Initialization of PCIe endpoint card is described in PCIe CEM spec in
> > Figure 2-10: Power Up. Other informations are in PCIe base spec in 6.6.1
> > Conventional Reset section.
> >
> > If I understand specifications correctly then OS initialization steps
> > should be following (please correct me if I'm wrong!):
> >
> > 1) Put PERST# to low which enter endpoint card into reset state
> > 2) Enable AUX power (3.3V) and wait until is stable
> > 3) Enable main power (12V/3.3V) and wait until is stable
> > 4) Enable refclock and wait until is stable
> > 5) Enable LTSSM on PCIe controller to start link training
> > 6) Put PERST# to high which exit endpoint card from reset state
> > 7) Wait until link training completes
> > 8) Wait another 100ms prior accessing config space of endpoint card
> >
> > Minimal time period between "after step 3)" and "before step 6)" is T_PVPERL = 100ms
> > Minimal time period between "after step 4)" and "before step 6)" is T_PERSTCLK = 100us
> >
> > After step 6) is end of Fundamental Reset and PCIe controller needs to
> > be in LTSSM Detect state within 20ms. So enabling it prior putting
> > PERST# to high should achieve it.
> 
> 5) is a bit out of place here and it could be step 0). There's no real
> requirements around when the upstream device (the controller in our
> case) starts polling for downstream devices. #PERST is effectively a
> power good signal so the downstream device is required to ignore the
> link polls until #PERST is lifted.

Ok, currently I do not see any issue by moving step 5) before step 1).

> > Competition of link training is indicated by standard DLLLA bit in Root
> > Port config space. Support for DLLLA bit is optional and is indicated by
> > DLLLARC bit in Root Port config space. Lot of PCIe controllers do not
> > support this standard DLLLA bit, but have their own specific register
> > for it.
> 
> I thought DLLLA reporting was made mandatory in gen2? I suppose it
> doesn't really matter since we need to support gen1 devices anyway,
> but still...

I have PCIe 2.0 and also PCIe 3.0 controllers where Root Ports do not
provide DLLLA capability. I do not know if DLLLA is mandatory in PCIe
specifications, but we need to support also these devices.

> > Similarly is defined power down of PCIe card in PCIe CEM spec in Figure
> > 2-13: Power Down. If I understand it correctly steps are:
> >
> > 1) Put endpoint card into D3hot state, so PCIe link goes inactive
> > 2) Put PERST# to low, so endpoint card enters reset state
> > 3) Disable main power (12V/3.3V)
> > 4) Disable refclock
> >
> > In case of surprise power down, PERST# needs to go low in 500ns.
> >
> > In PCIe base spec in section 5.2 Link State Power Management is
> > described that preparation for removing the main power source can be
> > done also by sending PCIe PME_Turn_Off Message by Root Complex. IIRC
> > there is no standard way how to send PCIe PME_Turn_Off message.
> 
> Is there any real difference between sending this message and placing
> the device into D3hot? IIRC devices in D3hot are required to tolerate
> a transition to D3cold (i.e. power being removed) anyway.

Hm... I do not know right now. I think that this is just another option.
Maybe somebody else know more details about it?

I will try to look into my PCIe book if there is some more explanation
about it.

> > I see that basically every PCIe controller driver is doing its own
> > implementation of PCIe Warm Reset and waiting until PCIe link is ready
> > prior calling pci_host_probe().
> >
> > Based on all above details I would like to propose following extending
> > of struct pci_host_bridge and pci_host_probe() function to de-duplicate
> > native PCIe controller driver code:
> >
> > 1) extend struct pci_host_bridge to provide callbacks for:
> >    * enable / disable main power source
> >    * enable / disable refclock
> >    * enable / disable LTSSM link training (if PCIe link should go into Detect / Polling state)
> >    * enable / disable PERST# signal
> >    * returning boolean if endpoint card is present (physically in PCIe/mPCIe/m.2/... slot)
> >    * returning boolean if link training completed
> >    * sending PCIe PME_Turn_Off message
> 
> I don't have any real objections to adding this, but I do wonder what
> code would be using these calls.

My initial idea is that native controller drivers would provide these
callbacks and they would not do these steps in their probe callback.
Instead pci core function would implement "common" probe method which
would call these callbacks in correct order and with proper delays and
checks between them.

Next pci_reset_fn_methods[PERST] code would call some of these callbacks
to implement Warm reset.

> If you squint a bit you've got
> something that looks a lot like a pci hotplug slot driver (enable /
> disable / presence check calls). It might make sense to try use some
> of that infrastructure for what you want to do since the bus scanning
> code already knows how to deal with slot drivers.

Something could be implemented in pci hotplug / slot driver.

I think that PCIe Warm Reset fits better for "slot reset" mechanism than
PCIe Hot Reset. It is because PCIe Warm Reset is done via out-of-band
signal (dedicated PERST# line) as opposite of PCIe Hot Reset which is
done via in-band PCIe protocol.

Enable / disable power source also perfectly fits for "slot power on"
mechanism.

Also hotplug / slot driver can monitor change of DLLLA state via PCIe
HotPlug interrupt. So it looks like that this could be reused too. But
in more PCIe controllers do not provide interrupt to signal link up and
some of them even DLLLA capability in Root Port (they have custom way
how to read link state).

> > 2) implement asynchronous initialization of PCIe link and enumeration of
> >    PCIe bus behind the PCIe Root Port from pci_host_probe() based on new
> >    callbacks added in 1)
> >    --> so native PCIe controller drivers do not have to do it manually
> >    --> during this initialization can be done also PCIe Hot Reset
> >
> > 3) implement PCIe Hot Reset as reset method via PERST# signal and put it
> >    into pci_reset_fn_methods[] array
> 
> I assume you mean Warm reset here.

Yes, that is typo. I really mean PCIe Warm Reset here.

> I'll add a word of caution that
> there are badly behaved devices out there which will ignore #PERST
> being re-asserted after the card is powered on.

Thanks for information! I know that there are devices which do not like
PCIe Hot Reset (done via Secondary Bus Reset bit in PCI Bridge).
And kernel has already quirks for these devices which disallow usage of
this kind of reset.

So if there are issue also with PCIe Warm Reset via PERST# pin then we
can use this kind of reset as the last one (pci_reset_fn_methods has
priority of resets). This should not break existing cards and if no
other reset is possible then card would stay in broken state like
before. Also we could add quirks which disallow this kind of reset for
broken cards. I think that there are more options how to handle it.

At least lot of native PCIe controller drivers are doing PERST# reset
during kernel boot time more of these PCIe controllers are also
initialized in bootloader U-Boot. So this issue about re-asserting
PERST# pin is already affected by more PCIe controller drivers.

> > 4) implement PCIe Cold Reset as reset method via power down / up and put
> >    it into pci_reset_fn_methods[] array
> >
> > 5) as enabling / disabling power supply and toggling PERST# signal is
> >    implemented via GPIO, add some generic implementation for callback
> >    functions which will use named gpios (e.g. from DT)
> >
> > This could simplify implementations of native PCIe controller drivers by
> > calling initialization steps in correct order with correct timeouts and
> > drivers do not have to do copy+paste same common code or reimplement it
> > with own constants and macros for timeouts, etc...
> >
> > Also it should enable access to PCIe Root Port as this device is part of
> > Root Complex and should be available also when link is down or link
> > training was not completed. Currently some PCIe controllers are not
> > registered into system when link is down (e.g. card is disconnected or
> > card has some issue). Which also prevents access to PCIe Root Port
> > device.
> 
> Is there a good reason for skipping probing the controller when
> there's no downstream device present?

Either old code or hw bugs (accessing config space when link is down
cause CPU aborts). I saw on mailing list patch which is fixing this and
registering "host controller" even when link is down. So looks like
these issues are being fixing.

> Seems kinda dumb to make a whole
> pci domain disappear just because it didn't have a device at boot.

I agree.

> > And in some cases it could speed up boot process as pci
> > controller driver does not have to actively wait for link and let kernel
> > do initialization of other devices.
> 
> > What do you think about this idea?
> 
> Sounds like a good idea to me.

Thanks for checking it!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: Extend probing of native PCIe controllers
  2021-10-22 18:38 RFC: Extend probing of native PCIe controllers Pali Rohár
  2021-11-08  8:50 ` Luca Ceresoli
  2021-11-08 13:14 ` Oliver O'Halloran
@ 2021-12-08 15:52 ` Bjorn Helgaas
  2 siblings, 0 replies; 7+ messages in thread
From: Bjorn Helgaas @ 2021-12-08 15:52 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Luca Ceresoli, Marek Behún, Matthew Wilcox,
	Alex Williamson, linux-pci, linux-kernel

On Fri, Oct 22, 2021 at 08:38:08PM +0200, Pali Rohár wrote:
> Hello!
> 
> In this email I'm describing how I see that probing of native PCIe
> controller drivers is implemented, how it should be implemented and how
> to extend / simplify core code for controller drivers.
>
> Native PCIe controller drivers needs to fill struct pci_host_bridge and
> then call pci_host_probe(). Function pci_host_probe() starts probing and
> enumerating buses and register PCIe devices to system.
> 
> But initialization of PCIe controller and cards on buses (other end of
> PCIe link) is more complicated and basically every native PCIe
> controller driver needs to do initialization PCIe link prior calling
> pci_host_probe(). Steps which controller drivers are doing are de-facto
> standard steps defined in PCIe base or CEM specification.
> 
> The most problematic step is to reset endpoint card and wait until
> endpoint card start. Reset of endpoint card is done by standard PERST#
> signal (defined in PCIe CEM spec) and in most cases board export control
> of this signal to OS via GPIO (few board and drivers have this signal
> connected to PCIe controller and then PCIe controller has some specific
> registers to control this signal). Reset via PERST# signal is defined in
> PCIe CEM and base specs as "PCIe Warm Reset".
> 
> As discussed in the following email thread, this PCIe Warm Reset should
> not depend on PCIe controller as it resets card on the other end of PCIe
> controller. But currently every native PCIe controller driver does PCIe
> Warm Reset by its own for randomly chosen time period. There is open
> question how long should be endpoint card in Warm Reset state:
> https://lore.kernel.org/linux-pci/20210310110535.zh4pnn4vpmvzwl5q@pali/
> 
> Initialization of PCIe endpoint card is described in PCIe CEM spec in
> Figure 2-10: Power Up. Other informations are in PCIe base spec in 6.6.1
> Conventional Reset section.
> 
> If I understand specifications correctly then OS initialization steps
> should be following (please correct me if I'm wrong!):
> 
> 1) Put PERST# to low which enter endpoint card into reset state
> 2) Enable AUX power (3.3V) and wait until is stable
> 3) Enable main power (12V/3.3V) and wait until is stable
> 4) Enable refclock and wait until is stable
> 5) Enable LTSSM on PCIe controller to start link training
> 6) Put PERST# to high which exit endpoint card from reset state
> 7) Wait until link training completes
> 8) Wait another 100ms prior accessing config space of endpoint card
> 
> Minimal time period between "after step 3)" and "before step 6)" is T_PVPERL = 100ms
> Minimal time period between "after step 4)" and "before step 6)" is T_PERSTCLK = 100us
> 
> After step 6) is end of Fundamental Reset and PCIe controller needs to
> be in LTSSM Detect state within 20ms. So enabling it prior putting
> PERST# to high should achieve it.
> 
> Competition of link training is indicated by standard DLLLA bit in Root
> Port config space. Support for DLLLA bit is optional and is indicated by
> DLLLARC bit in Root Port config space. Lot of PCIe controllers do not
> support this standard DLLLA bit, but have their own specific register
> for it.
> 
> Similarly is defined power down of PCIe card in PCIe CEM spec in Figure
> 2-13: Power Down. If I understand it correctly steps are:
> 
> 1) Put endpoint card into D3hot state, so PCIe link goes inactive
> 2) Put PERST# to low, so endpoint card enters reset state
> 3) Disable main power (12V/3.3V)
> 4) Disable refclock
> 
> In case of surprise power down, PERST# needs to go low in 500ns.
> 
> In PCIe base spec in section 5.2 Link State Power Management is
> described that preparation for removing the main power source can be
> done also by sending PCIe PME_Turn_Off Message by Root Complex. IIRC
> there is no standard way how to send PCIe PME_Turn_Off message.
> 
> 
> 
> 
> I see that basically every PCIe controller driver is doing its own
> implementation of PCIe Warm Reset and waiting until PCIe link is ready
> prior calling pci_host_probe().
> 
> Based on all above details I would like to propose following extending
> of struct pci_host_bridge and pci_host_probe() function to de-duplicate
> native PCIe controller driver code:
> 
> 1) extend struct pci_host_bridge to provide callbacks for:
>    * enable / disable main power source
>    * enable / disable refclock
>    * enable / disable LTSSM link training (if PCIe link should go into Detect / Polling state)
>    * enable / disable PERST# signal
>    * returning boolean if endpoint card is present (physically in PCIe/mPCIe/m.2/... slot)
>    * returning boolean if link training completed
>    * sending PCIe PME_Turn_Off message
> 
> 2) implement asynchronous initialization of PCIe link and enumeration of
>    PCIe bus behind the PCIe Root Port from pci_host_probe() based on new
>    callbacks added in 1)
>    --> so native PCIe controller drivers do not have to do it manually
>    --> during this initialization can be done also PCIe Hot Reset
> 
> 3) implement PCIe Hot Reset as reset method via PERST# signal and put it
>    into pci_reset_fn_methods[] array
> 
> 4) implement PCIe Cold Reset as reset method via power down / up and put
>    it into pci_reset_fn_methods[] array
> 
> 5) as enabling / disabling power supply and toggling PERST# signal is
>    implemented via GPIO, add some generic implementation for callback
>    functions which will use named gpios (e.g. from DT)
> 
> This could simplify implementations of native PCIe controller drivers by
> calling initialization steps in correct order with correct timeouts and
> drivers do not have to do copy+paste same common code or reimplement it
> with own constants and macros for timeouts, etc...
> 
> Also it should enable access to PCIe Root Port as this device is part of
> Root Complex and should be available also when link is down or link
> training was not completed. Currently some PCIe controllers are not
> registered into system when link is down (e.g. card is disconnected or
> card has some issue). Which also prevents access to PCIe Root Port
> device. And in some cases it could speed up boot process as pci
> controller driver does not have to actively wait for link and let kernel
> do initialization of other devices.
> 
> What do you think about this idea?

I would love to have somebody work on this because every native driver
does it differently and I'm guessing none of them really do it
correctly.

Note that struct pci_host_bridge is a 1:many situation with Root
Ports, which is where the link management is.  Unfortunately many
drivers have the single Root Port assumption baked into them.

I think I would start small, by adding some #defines for T_PVPERL,
T_PERSTCLK, etc, and using them, which will probably expose lots of
gaps and hidden assumptions.  Then maybe add the callback functions
but use them internally to the driver before moving the calls to the
shared core.  I like it best when reorganizations like moving all the
initialization steps into the core are simple "code move only, no
functional change" sort of things.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-12-08 15:52 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-22 18:38 RFC: Extend probing of native PCIe controllers Pali Rohár
2021-11-08  8:50 ` Luca Ceresoli
2021-11-08 16:11   ` Pali Rohár
2021-11-08 16:33     ` Luca Ceresoli
2021-11-08 13:14 ` Oliver O'Halloran
2021-11-08 16:47   ` Pali Rohár
2021-12-08 15:52 ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).