All of lore.kernel.org
 help / color / mirror / Atom feed
* PCI CRS Support
@ 2016-08-24 15:56 Sinan Kaya
  2016-08-24 19:10 ` Bjorn Helgaas
  0 siblings, 1 reply; 5+ messages in thread
From: Sinan Kaya @ 2016-08-24 15:56 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas, Vikram Sethi

Hi Bjorn,
I see that the kernel has support for Configuration Request Retry Status (CRS) visibility
support and it gets discovered and enabled as part of the probe function.

Let's assume a system with CRS capability and have its visibility set as above.
I do not see any code in the failure/reset path to support the CRS requests
returned by the endpoint.

An endpoint is allowed to return CRS after several reset types. I'm pasting the part of
the spec for you at 2.3.1 Request Handling Rules of 3.1 spec.

"For Configuration Requests only, following reset it is possible for a device to terminate the request 
but indicate that it is temporarily unable to process the Request, but will be able to process the Request 
in the future – in this case, the Configuration Request Retry 10 Status (CRS) Completion Status is used 
(see Section 6.6). Valid reset conditions after which a device is permitted to return CRS are:

- Cold, Warm, and Hot Resets
- FLRs
- A reset initiated in response to a D3hot to D0uninitialized device state transition."

I have identified the following functions that have problems for warm and hot resets.

Some callers of pci_reset_bridge_secondary_bus such as pciehp_reset_slot, aer_root_reset.
Other higher level callers such as pci_bus_reset, pci_try_reset_bus and their callers from VFIO.
All these places are impacted by a CRS call. They do the secondary bus reset but do not wait for the
endpoint to respond. Waiting for 1 second is not a guarantee that the endpoint will start responding
immediately. A CRS capable OS needs to interpret the incoming CRS response and poll longer
since CRS visibility is et.

All of this was warm and hot reset.

I also see another problem in the FLR path too. There is some best effort wait up to 1 second in
pci_flr_wait.

Where do we go from here? I was thinking of putting something deep down into the reset secondary
bus function but I'm afraid it will break things especially when we wait up to 60 seconds.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PCI CRS Support
  2016-08-24 15:56 PCI CRS Support Sinan Kaya
@ 2016-08-24 19:10 ` Bjorn Helgaas
  2016-08-24 19:28   ` Sinan Kaya
  0 siblings, 1 reply; 5+ messages in thread
From: Bjorn Helgaas @ 2016-08-24 19:10 UTC (permalink / raw)
  To: Sinan Kaya; +Cc: Linux PCI, Bjorn Helgaas, Vikram Sethi

Hi Sinan,

On Wed, Aug 24, 2016 at 11:56:18AM -0400, Sinan Kaya wrote:
> Hi Bjorn,
> I see that the kernel has support for Configuration Request Retry Status (CRS) visibility
> support and it gets discovered and enabled as part of the probe function.
> 
> Let's assume a system with CRS capability and have its visibility set as above.
> I do not see any code in the failure/reset path to support the CRS requests
> returned by the endpoint.
> 
> An endpoint is allowed to return CRS after several reset types. I'm pasting the part of
> the spec for you at 2.3.1 Request Handling Rules of 3.1 spec.
> 
> "For Configuration Requests only, following reset it is possible for a device to terminate the request 
> but indicate that it is temporarily unable to process the Request, but will be able to process the Request 
> in the future – in this case, the Configuration Request Retry 10 Status (CRS) Completion Status is used 
> (see Section 6.6). Valid reset conditions after which a device is permitted to return CRS are:
> 
> - Cold, Warm, and Hot Resets
> - FLRs
> - A reset initiated in response to a D3hot to D0uninitialized device state transition."
> 
> I have identified the following functions that have problems for warm and hot resets.
> 
> Some callers of pci_reset_bridge_secondary_bus such as pciehp_reset_slot, aer_root_reset.
> Other higher level callers such as pci_bus_reset, pci_try_reset_bus and their callers from VFIO.
> All these places are impacted by a CRS call. They do the secondary bus reset but do not wait for the
> endpoint to respond. Waiting for 1 second is not a guarantee that the endpoint will start responding
> immediately. A CRS capable OS needs to interpret the incoming CRS response and poll longer
> since CRS visibility is et.
> 
> All of this was warm and hot reset.
> 
> I also see another problem in the FLR path too. There is some best effort wait up to 1 second in
> pci_flr_wait.
> 
> Where do we go from here? I was thinking of putting something deep down into the reset secondary
> bus function but I'm afraid it will break things especially when we wait up to 60 seconds.

I agree CRS handling after reset is probably all broken.

I hate the fact that we reset devices without re-enumerating them.  We
have no assurance that the device is the same after reset (it could
have loaded new firmware and been completely reconfigured).

I don't have any good suggestions for you, so if you have some ideas
and want to fix it, please go ahead.

Bjorn

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PCI CRS Support
  2016-08-24 19:10 ` Bjorn Helgaas
@ 2016-08-24 19:28   ` Sinan Kaya
  2016-08-25  8:44     ` Lorenzo Pieralisi
  0 siblings, 1 reply; 5+ messages in thread
From: Sinan Kaya @ 2016-08-24 19:28 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Linux PCI, Bjorn Helgaas, Vikram Sethi

On 8/24/2016 3:10 PM, Bjorn Helgaas wrote:
>> Where do we go from here? I was thinking of putting something deep down into the reset secondary
>> > bus function but I'm afraid it will break things especially when we wait up to 60 seconds.
> I agree CRS handling after reset is probably all broken.
> 
> I hate the fact that we reset devices without re-enumerating them.  We
> have no assurance that the device is the same after reset (it could
> have loaded new firmware and been completely reconfigured).
> 
> I don't have any good suggestions for you, so if you have some ideas
> and want to fix it, please go ahead.

I think I'll make a list of paths that reach to secondary bus reset function
and try to keep CRS loop as close as possible to the caller. 

I'll focus on FLR later. I won't be heart-broken if somebody took a stab at the
FLR.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PCI CRS Support
  2016-08-24 19:28   ` Sinan Kaya
@ 2016-08-25  8:44     ` Lorenzo Pieralisi
  2016-08-28 17:57       ` Sinan Kaya
  0 siblings, 1 reply; 5+ messages in thread
From: Lorenzo Pieralisi @ 2016-08-25  8:44 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Bjorn Helgaas, Linux PCI, Bjorn Helgaas, Vikram Sethi, alex.williamson

Hi Sinan,

[+Alex]

On Wed, Aug 24, 2016 at 03:28:51PM -0400, Sinan Kaya wrote:
> On 8/24/2016 3:10 PM, Bjorn Helgaas wrote:
> >> Where do we go from here? I was thinking of putting something deep down into the reset secondary
> >> > bus function but I'm afraid it will break things especially when we wait up to 60 seconds.
> > I agree CRS handling after reset is probably all broken.
> > 
> > I hate the fact that we reset devices without re-enumerating them.  We
> > have no assurance that the device is the same after reset (it could
> > have loaded new firmware and been completely reconfigured).
> > 
> > I don't have any good suggestions for you, so if you have some ideas
> > and want to fix it, please go ahead.
> 
> I think I'll make a list of paths that reach to secondary bus reset
> function and try to keep CRS loop as close as possible to the caller. 
> 
> I'll focus on FLR later. I won't be heart-broken if somebody took a
> stab at the FLR.

Side note, taking advantage of this query: I hope this is a problem
solved by end of October, if it is not I would suggest adding a track to
tackle it at LPC16 PCI uconf, I added Alex since I know he is interested
in the topic too, it makes sense to debate a solution with all
interested people in one room.

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PCI CRS Support
  2016-08-25  8:44     ` Lorenzo Pieralisi
@ 2016-08-28 17:57       ` Sinan Kaya
  0 siblings, 0 replies; 5+ messages in thread
From: Sinan Kaya @ 2016-08-28 17:57 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Bjorn Helgaas
  Cc: Linux PCI, Bjorn Helgaas, Vikram Sethi, alex.williamson

Hi Bjorn, Lorenzo;

On 8/25/2016 4:44 AM, Lorenzo Pieralisi wrote:
> Hi Sinan,
> 
> [+Alex]
> 
> On Wed, Aug 24, 2016 at 03:28:51PM -0400, Sinan Kaya wrote:
>> On 8/24/2016 3:10 PM, Bjorn Helgaas wrote:
>>>> Where do we go from here? I was thinking of putting something deep down into the reset secondary
>>>>> bus function but I'm afraid it will break things especially when we wait up to 60 seconds.
>>> I agree CRS handling after reset is probably all broken.
>>>
>>> I hate the fact that we reset devices without re-enumerating them.  We
>>> have no assurance that the device is the same after reset (it could
>>> have loaded new firmware and been completely reconfigured).
>>>
>>> I don't have any good suggestions for you, so if you have some ideas
>>> and want to fix it, please go ahead.
>>
>> I think I'll make a list of paths that reach to secondary bus reset
>> function and try to keep CRS loop as close as possible to the caller. 
>>
>> I'll focus on FLR later. I won't be heart-broken if somebody took a
>> stab at the FLR.
> 
> Side note, taking advantage of this query: I hope this is a problem
> solved by end of October, if it is not I would suggest adding a track to
> tackle it at LPC16 PCI uconf, I added Alex since I know he is interested
> in the topic too, it makes sense to debate a solution with all
> interested people in one room.
> 

Let's see where this goes.

Here is a list of all the callers I found. Based on what I see, the code
assumes that registers are accessible immediately after pci_reset_bridge_secondary_bus
call.

pci_reset_bridge_secondary_bus also sleeps for 1 seconds. We can maybe assume
that it is OK to sleep additional potential 60 seconds. 

How do you feel about running CRS against all immediate children
of this bridge in pci_reset_secondary_bus function by calling pci_bus_read_dev_vendor_id
for every single child?

Something like:

+ u32 l;

+ list_for_each_entry(dev, &bus->devices, bus_list)
+ 	pci_bus_read_dev_vendor_id(bus, dev->devfn, &l, 60*1000);



trigger_sbr (infiniband/hw/hfi1/pcie.c)
	pci_reset_bridge_secondary_bus (export)
		pcibios_reset_secondary_bus	 (weak)
			pci_reset_secondary_bus

__pci_reset_function
	pci_dev_reset
		__pci_dev_reset
			pci_dev_reset_slot_function
				pci_reset_hotplug_slot
					reset_slot (drivers/pci/hotplug/pciehp_core.c)
						pciehp_reset_slot (drivers/pci/hotplug/pciehp_hpc.c)
							pci_reset_bridge_secondary_bus (export)
								pcibios_reset_secondary_bus	 (weak)
									pci_reset_secondary_bus

							pcie_capability_write_word


			pci_parent_bus_reset
				pci_reset_bridge_secondary_bus
					pcibios_reset_secondary_bus	 (weak)
						pci_reset_secondary_bus

pci_try_reset_function
	__pci_dev_reset
	pci_dev_restore

pcistub_put_pci_dev
	__pci_reset_function_locked
		__pci_dev_reset
	pci_restore_state

pcistub_init_device
	__pci_reset_function_locked
		__pci_dev_reset
	pci_restore_state

handle_error_source
	do_recovery
		reset_link (drivers/pci/pcie/aer/aerdrv_core.c)
			default_reset_link
				pci_reset_bridge_secondary_bus (export)
					pcibios_reset_secondary_bus	 (weak)
						pci_reset_secondary_bus


			driver->reset_link
				aer_root_reset
					pci_reset_bridge_secondary_bus (export)
						pcibios_reset_secondary_bus	 (weak)
							pci_reset_secondary_bus
					pci_read_config_dword

vfio_pci_ioctl (VFIO_DEVICE_PCI_HOT_RESET)
	pci_try_reset_bus
		pci_reset_bridge_secondary_bus
			pcibios_reset_secondary_bus	 (weak)
				pci_reset_secondary_bus

vfio_pci_disable
	vfio_pci_try_bus_reset
		pci_try_reset_bus
			pci_reset_bridge_secondary_bus
				pcibios_reset_secondary_bus	 (weak)
					pci_reset_secondary_bus

	pci_probe_reset_bus (probe =1)
		pci_bus_reset
			pci_reset_bridge_secondary_bus
				pcibios_reset_secondary_bus	 (weak)
					pci_reset_secondary_bus

vfio_pci_ioctl ( VFIO_DEVICE_GET_PCI_HOT_RESET_INFO)
	pci_probe_reset_bus (probe =1)
		pci_bus_reset
			pci_reset_bridge_secondary_bus
				pcibios_reset_secondary_bus	 (weak)
					pci_reset_secondary_bus

pci_reset_bus (no caller)
	pci_probe_reset_bus (probe =1)
		pci_bus_reset
			pci_reset_bridge_secondary_bus
				pcibios_reset_secondary_bus	 (weak)
					pci_reset_secondary_bus




-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-08-28 17:57 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-24 15:56 PCI CRS Support Sinan Kaya
2016-08-24 19:10 ` Bjorn Helgaas
2016-08-24 19:28   ` Sinan Kaya
2016-08-25  8:44     ` Lorenzo Pieralisi
2016-08-28 17:57       ` Sinan Kaya

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.