All of lore.kernel.org
 help / color / mirror / Atom feed
From: <Austin.Bolen@dell.com>
To: <helgaas@kernel.org>, <Austin.Bolen@dell.com>
Cc: <sathyanarayanan.kuppuswamy@linux.intel.com>,
	<linux-pci@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<ashok.raj@intel.com>
Subject: Re: [PATCH v17 09/12] PCI/AER: Allow clearing Error Status Register in FF mode
Date: Wed, 11 Mar 2020 15:19:44 +0000	[thread overview]
Message-ID: <0890801daa6c4564bca1690fd8439dab@AUSX13MPC107.AMER.DELL.COM> (raw)
In-Reply-To: 20200311144556.GA208157@google.com

On 3/11/2020 9:46 AM, Bjorn Helgaas wrote:
> 
> [EXTERNAL EMAIL]
> 
> On Tue, Mar 10, 2020 at 08:06:21PM +0000, Austin.Bolen@dell.com wrote:
>> On 3/10/2020 2:33 PM, Bjorn Helgaas wrote:
>>> On Tue, Mar 10, 2020 at 06:14:20PM +0000, Austin.Bolen@dell.com wrote:
>>>> On 3/9/2020 11:28 PM, Kuppuswamy, Sathyanarayanan wrote:
>>>>> On 3/9/2020 7:40 PM, Bjorn Helgaas wrote:
>>>>>> [+cc Austin, tentative Linux patches on this git branch:
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/tree/drivers/pci/pcie?h=review/edr]
>>>>>>
>>>>>> On Tue, Mar 03, 2020 at 06:36:32PM -0800, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
>>>>>>> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>>>>>>
>>>>>>> As per PCI firmware specification r3.2 System Firmware Intermediary
>>>>>>> (SFI) _OSC and DPC Updates ECR
>>>>>>> (https://members.pcisig.com/wg/PCI-SIG/document/13563), sec titled "DPC
>>>>>>> Event Handling Implementation Note", page 10, Error Disconnect Recover
>>>>>>> (EDR) support allows OS to handle error recovery and clearing Error
>>>>>>> Registers even in FF mode. So create new API pci_aer_raw_clear_status()
>>>>>>> which allows clearing AER registers without FF mode checks.
> 
>>>> OS clears the DPC Trigger Status bit which will bring port below it out
>>>> of containment. Then OS will clear the "port" error status bits (i.e.,
>>>> the AER and DPC status bits in the root port or downstream port that
>>>> triggered containment). I don't think it would hurt to do this two steps
>>>> in reverse order but don't think it is necessary.
> 
>>>> Note that error status bits for devices below the port in
>>>> containment are cleared later after f/w has a chance to log them.
> 
> Thanks for pointing out this wrinkle about devices below the port in
> containment.  I think we might have an issue here with the current
> series because evaluating _OST is the last thing the EDR notify
> handler does.  More below.
> 
>>> Maybe I'm misreading the DPC enhancements ECN.  I think it says the OS
>>> can read/write DPC registers until it clears the DPC Trigger Status.
>>> If the OS clears Trigger Status first, my understanding is that we're
>>> now out of the EDR notification processing window and the OS is not
>>> permitted to write DPC registers.
>>>
>>> If it's OK for the OS to clear Trigger Status before clearing DPC
>>> error status, what is the event that determines when the OS may no
>>> longer read/write the DPC registers?
>>
>> I think there are a few different registers to consider... DPC
>> Control, DPC Status, various AER registers, and the RP PIO
>> registers. At this point in the flow, the firmware has already had a
>> chance to read all of them and so it really doesn't matter the order
>> the OS does those two things. The firmware isn't going to get
>> notified again until _OST so by then both operation will be done and
>> system firmware will have no idea which order the OS did them in,
>> nor will it care.  But since the existing normative text specifies
>> and order, I would just follow that.
> 
> OK, this series clears DPC error status before clearing DPC Trigger
> Status, so I think we can keep that as-is.
> 
>>> There are no events after the "clear device AER status" box.  That
>>> seems to mean the OS can write the AER status registers at any
>>> time.  But the whole implementation note assumes firmware
>>> maintains control of AER.
>>
>> In this model the OS doesn't own DPC or AER but the model allows OS
>> to touch both DPC and AER registers at certain times.  I would view
>> ownership in this case as who is the primary owner and not who is
>> the sole entity allowed to access the registers.
> 
> I'm not sure how to translate the idea of primary ownership into code.

I would just add text that said when it's ok for OS to touch these bits 
even when they don't own them similar to what's done for the DPC bits.

> 
>> For the normative text describing when OS clears the AER bits
>> following the informative flow chart, it could say that OS clears
>> AER as soon as possible after OST returns and before OS processes
>> _HPX and loading drivers.  Open to other suggestions as well.
> 
> I'm not sure what to do with "as soon as possible" either.  That
> doesn't seem like something firmware and the OS can agree on.
> 

I can just state that it's done after OST returns but before _HPX or 
driver is loaded. Any time in that range is fine. I can't get super 
specific here because different OSes do different things.  Even for a 
given OS they change over time. And I need something generic enough to 
support a wide variety of OS implementations.

> For the port that triggered DPC containment, I think the easiest thing
> to understand and implement would be to allow AER access during the
> same EDR processing window where DPC access is allowed.
Agreed.

> 
> For child devices of that port, obviously it's impossible to access
> AER registers until DPC Trigger Status is cleared, and the flowchart
> says the OS shouldn't access them until after _OST.
> 
> I'm actually not sure we currently do *anything* with child device AER
> info in the EDR path.  pcie_do_recovery() does walk the sub-hierarchy
> of child devices, but it only calls error handling callbacks in the
> child drivers; it doesn't do anything with the child AER registers
> itself.  And of course, this happens before _OST, so it would be too
> early in any case.  But maybe I'm missing something here.

My understanding is that the OS read/clears AER in the case where OS has 
native control of AER.  Feedback from OSVs is they wanted to continue to 
do that to keep the native OS controlled AER and FF mechanism similar. 
The other way we could have done it would be to have the firmware 
read/clear AER and report them to OS via APEI.

> 
> BTW, if/when this is updated, I have another question: the _OSC DPC
> control bit currently allows the OS to write DPC Control during that
> window.  I understand the OS writing the RW1C *Status* bits to clear
> them, but it seems like writing the DPC Control register is likely to
> cause issues.  The same question would apply to the AER access we're
> talking about.

We could specify which particular bits can and can't be touched.  But 
it's hard to maintain as new bits are added.  Probably better to add 
some guidance that OS should read/clear error status, DPC Trigger 
Status, etc. but shouldn't change masks/severity/control bits/etc.

> 
> Bjorn
> 


  reply	other threads:[~2020-03-11 15:19 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-04  2:36 [PATCH v17 00/12] Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
2020-03-04  2:36 ` [PATCH v17 01/12] PCI/ERR: Update error status after reset_link() sathyanarayanan.kuppuswamy
2020-03-04  2:36 ` [PATCH v17 02/12] PCI/AER: Move pci_cleanup_aer_error_status_regs() declaration to pci.h sathyanarayanan.kuppuswamy
2020-03-04  2:36 ` [PATCH v17 03/12] PCI/ERR: Remove service dependency in pcie_do_recovery() sathyanarayanan.kuppuswamy
2020-03-17 14:40   ` Christoph Hellwig
2020-03-04  2:36 ` [PATCH v17 04/12] PCI: portdrv: remove unnecessary pcie_port_find_service() sathyanarayanan.kuppuswamy
2020-03-04  2:36 ` [PATCH v17 05/12] PCI: portdrv: remove reset_link member from pcie_port_service_driver sathyanarayanan.kuppuswamy
2020-03-17 14:41   ` Christoph Hellwig
2020-03-17 14:55     ` Kuppuswamy, Sathyanarayanan
2020-03-04  2:36 ` [PATCH v17 06/12] Documentation: PCI: Remove reset_link references sathyanarayanan.kuppuswamy
2020-03-17 14:42   ` Christoph Hellwig
2020-03-17 15:05     ` Kuppuswamy, Sathyanarayanan
2020-03-17 15:07       ` Christoph Hellwig
2020-03-17 16:03         ` Bjorn Helgaas
2020-03-17 17:06           ` Christoph Hellwig
2020-03-19 22:52             ` Bjorn Helgaas
2020-03-04  2:36 ` [PATCH v17 07/12] PCI/ERR: Return status of pcie_do_recovery() sathyanarayanan.kuppuswamy
2020-03-04  2:36 ` [PATCH v17 08/12] PCI/DPC: Cache DPC capabilities in pci_init_capabilities() sathyanarayanan.kuppuswamy
2020-03-04  2:36 ` [PATCH v17 09/12] PCI/AER: Allow clearing Error Status Register in FF mode sathyanarayanan.kuppuswamy
2020-03-06  5:45   ` Kuppuswamy, Sathyanarayanan
2020-03-06 16:04     ` Bjorn Helgaas
2020-03-06 16:11       ` Kuppuswamy, Sathyanarayanan
2020-03-06 16:41         ` Bjorn Helgaas
2020-03-10  2:40   ` Bjorn Helgaas
2020-03-10  4:28     ` Kuppuswamy, Sathyanarayanan
2020-03-10 18:14       ` Austin.Bolen
2020-03-10 19:32         ` Bjorn Helgaas
2020-03-10 20:06           ` Austin.Bolen
2020-03-10 20:41             ` Kuppuswamy Sathyanarayanan
2020-03-10 20:41               ` Kuppuswamy Sathyanarayanan
2020-03-10 20:49               ` Austin.Bolen
2020-03-11 14:45             ` Bjorn Helgaas
2020-03-11 15:19               ` Austin.Bolen [this message]
2020-03-11 17:12                 ` Bjorn Helgaas
2020-03-11 17:27                   ` Austin.Bolen
2020-03-11 20:33                     ` Bjorn Helgaas
2020-03-11 21:25                       ` Kuppuswamy Sathyanarayanan
2020-03-11 21:53                         ` Austin.Bolen
2020-03-11 22:11                           ` Kuppuswamy Sathyanarayanan
2020-03-11 22:23                             ` Bjorn Helgaas
2020-03-11 23:07                               ` Kuppuswamy Sathyanarayanan
2020-03-12 19:53                                 ` Bjorn Helgaas
2020-03-12 21:02                                   ` Austin.Bolen
2020-03-12 21:29                                     ` Kuppuswamy Sathyanarayanan
2020-03-12 21:52                                       ` Bjorn Helgaas
2020-03-12 22:02                                         ` Kuppuswamy Sathyanarayanan
2020-03-12 22:36                                           ` Bjorn Helgaas
2020-03-12 21:59                                   ` Kuppuswamy Sathyanarayanan
2020-03-12 22:32                                     ` Bjorn Helgaas
2020-03-13  6:22                                       ` Kuppuswamy, Sathyanarayanan
2020-03-13 19:28                                         ` Bjorn Helgaas
2020-03-13 20:26                                           ` Kuppuswamy Sathyanarayanan
2020-03-19 23:03                                             ` Bjorn Helgaas
2020-03-19 23:20                                               ` Kuppuswamy, Sathyanarayanan
2020-03-11 22:13                         ` Bjorn Helgaas
2020-03-11 22:41                           ` Kuppuswamy Sathyanarayanan
2020-03-11 18:12                   ` Kuppuswamy Sathyanarayanan
2020-03-11 22:05             ` Bjorn Helgaas
2020-03-04  2:36 ` [PATCH v17 10/12] PCI/DPC: Export DPC error recovery functions sathyanarayanan.kuppuswamy
2020-03-17 14:43   ` Christoph Hellwig
2020-03-04  2:36 ` [PATCH v17 11/12] PCI/DPC: Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
2020-03-06  3:47   ` Bjorn Helgaas
2020-03-06  6:32     ` Kuppuswamy, Sathyanarayanan
2020-03-06 21:00       ` Bjorn Helgaas
2020-03-06 22:42         ` Kuppuswamy Sathyanarayanan
2020-03-06 23:23           ` Bjorn Helgaas
2020-03-07  0:19             ` Kuppuswamy Sathyanarayanan
2020-03-04  2:36 ` [PATCH v17 12/12] PCI/ACPI: Enable EDR support sathyanarayanan.kuppuswamy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0890801daa6c4564bca1690fd8439dab@AUSX13MPC107.AMER.DELL.COM \
    --to=austin.bolen@dell.com \
    --cc=ashok.raj@intel.com \
    --cc=helgaas@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.