linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Austin.Bolen@dell.com
Cc: sathyanarayanan.kuppuswamy@linux.intel.com,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	ashok.raj@intel.com
Subject: Re: [PATCH v17 09/12] PCI/AER: Allow clearing Error Status Register in FF mode
Date: Tue, 10 Mar 2020 14:32:57 -0500	[thread overview]
Message-ID: <20200310193257.GA170043@google.com> (raw)
In-Reply-To: <0476c948e73f4c68a9bf221afccfcf7e@AUSX13MPC107.AMER.DELL.COM>

On Tue, Mar 10, 2020 at 06:14:20PM +0000, Austin.Bolen@dell.com wrote:
> On 3/9/2020 11:28 PM, Kuppuswamy, Sathyanarayanan wrote:
> > On 3/9/2020 7:40 PM, Bjorn Helgaas wrote:
> >> [+cc Austin, tentative Linux patches on this git branch:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/tree/drivers/pci/pcie?h=review/edr]
> >>
> >> On Tue, Mar 03, 2020 at 06:36:32PM -0800, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> >>> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> >>>
> >>> As per PCI firmware specification r3.2 System Firmware Intermediary
> >>> (SFI) _OSC and DPC Updates ECR
> >>> (https://members.pcisig.com/wg/PCI-SIG/document/13563), sec titled "DPC
> >>> Event Handling Implementation Note", page 10, Error Disconnect Recover
> >>> (EDR) support allows OS to handle error recovery and clearing Error
> >>> Registers even in FF mode. So create new API pci_aer_raw_clear_status()
> >>> which allows clearing AER registers without FF mode checks.
> >>
> >> I see that this ECR was released as an ECN a few days ago:
> >> https://members.pcisig.com/wg/PCI-SIG/document/14076
> >> Regrettably the title in the PDF still says "ECR" (the rendered title
> >> *page* says "ENGINEERING CHANGE NOTIFICATION", but some metadata
> >> buried in the file says "ECR - SFI _OSC Support and DPC Updates".
> 
> I'll see if PCI-SIG can update the metadata and repost.

If that's possible, it would be nice to update the metadata for the
"Downstream Port Containment related Enhancements" ECN as well.  That
one currently says "ECR - CardBus Header Proposal", which means that's
what's in the window title bar and icons in the panel.

> >> Anyway, I think I see the note you refer to (now on page 12):
> >>
> >>     IMPLEMENTATION NOTE
> >>     DPC Event Handling
> >>
> >>     The flow chart below documents the behavior when firmware maintains
> >>     control of AER and DPC and grants control of PCIe Hot-Plug to the
> >>     operating system.
> >>
> >>     ...
> >>
> >>     Capture and clear device AER status. OS may choose to offline
> >>     devices3, either via SW (not load driver) or HW (power down device,
> >>     disable Link5,6,7). Otherwise process _HPX, complete device
> >>     enumeration, load drivers
> >>
> >> This clearly suggests that the OS should clear device AER status.
> >> However, according to the intro text, firmware has retained control of
> >> AER, so what gives the OS the right to clear AER status?
> >>
> >> The Downstream Port Containment Related Enhancements ECN (sec 4.5.1,
> >> table 4-6) contains an exception that allows the OS to read/write
> >> DPC registers during recovery.  But
> >>
> >>     - that is for *DPC* registers, not for AER registers, and
> >>
> >>     - that exception only applies between OS receipt of the EDR
> >>       notification and OS release of DPC by clearing the DPC Trigger
> >>       Status bit.
> >>
> >> The flowchart in the SFI ECN shows the OS releasing DPC before
> >> clearing AER status:
> >>
> >>     - Receive EDR notification
> >>
> >>     - Cleanup - Notify and unload child drivers below Port
> >>
> >>     - Bring Port out of DPC, clear port error status, assign bus numbers
> >>       to child devices.
> >>
> >>       I assume this box includes clearing DPC error status and clearing
> >>       Trigger Status?  They seem to be out of order in the box.
> 
> OS clears the DPC Trigger Status bit which will bring port below it out 
> of containment. Then OS will clear the "port" error status bits (i.e., 
> the AER and DPC status bits in the root port or downstream port that 
> triggered containment). I don't think it would hurt to do this two steps 
> in reverse order but don't think it is necessary. Note that error status 
> bits for devices below the port in containment are cleared later after 
> f/w has a chance to log them.

Maybe I'm misreading the DPC enhancements ECN.  I think it says the OS
can read/write DPC registers until it clears the DPC Trigger Status.
If the OS clears Trigger Status first, my understanding is that we're
now out of the EDR notification processing window and the OS is not
permitted to write DPC registers.

If it's OK for the OS to clear Trigger Status before clearing DPC
error status, what is the event that determines when the OS may no
longer read/write the DPC registers?

> >>     - Evaluate _OST
> >>
> >>     - Capture and clear device AER status.
> >>
> >>       This seems suspect to me.  Where does it say the OS is
> >>       allowed to write AER status when firmware retains control
> >>       of AER?
> >>
> >> This patch series does things in this order:
> >>
> >>     - Receive EDR notification (edr_handle_event(), edr.c)
> >>
> >>     - Read, log, and clear DPC error regs (dpc_process_error(),
> >>       dpc.c).
> >>
> >>       This also clears AER uncorrectable error status when the
> >>       relevant HEST entries do not have the FIRMWARE_FIRST bit
> >>       set.  I think this is incorrect: the test should be based
> >>       the _OSC negotiation for AER ownership, not on the HEST
> >>       entries.  But this problem pre-dates this patch series.
> >>
> >>     - Clear AER status (pci_aer_raw_clear_status(), aer.c).
> >>
> >>       This is at least inside the EDR recovery window, but again,
> >>       I don't see where it says the OS is allowed to write the
> >>       AER status.
> > 
> > Implementation note is the only reference we have regarding
> > clearing the AER registers.
> > 
> > But since the spec says both DPC and AER needs to be always
> > controlled together by the either OS or firmware, and when
> > firmware relinquishes control over DPC registers in EDR
> > notification window, we can assume that we also have control over
> > AER registers.
> > 
> > But I agree that is not explicitly spelled out any where outside
> > the implementation note.

This is all quite unsatisfying since implementation notes are not
normative.  I would far rather reference actual spec text.

> > Austin,
> > 
> > May be ECN (section 4.5.1, table 4-6) needs to be updated to add
> > this clarification.
> 
> Sure we can update to section 4.5.1, table 4-6 to indicate when OS
> can clear the AER status bits. It will just follow what's done in
> the implementation note so I think it's acceptable to follow
> implementation guidance for now.

There are no events after the "clear device AER status" box.  That
seems to mean the OS can write the AER status registers at any time.
But the whole implementation note assumes firmware maintains control
of AER.

> >>     - Attempt recovery (pcie_do_recovery(), err.c)
> >>
> >>     - Clear DPC Trigger Status (dpc_reset_link(), dpc.c)
> >>
> >>     - Evaluate _OST (acpi_send_edr_status(), edr.c)
> >>
> >> What am I missing?

  reply	other threads:[~2020-03-10 19:33 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-04  2:36 [PATCH v17 00/12] Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
2020-03-04  2:36 ` [PATCH v17 01/12] PCI/ERR: Update error status after reset_link() sathyanarayanan.kuppuswamy
2020-03-04  2:36 ` [PATCH v17 02/12] PCI/AER: Move pci_cleanup_aer_error_status_regs() declaration to pci.h sathyanarayanan.kuppuswamy
2020-03-04  2:36 ` [PATCH v17 03/12] PCI/ERR: Remove service dependency in pcie_do_recovery() sathyanarayanan.kuppuswamy
2020-03-17 14:40   ` Christoph Hellwig
2020-03-04  2:36 ` [PATCH v17 04/12] PCI: portdrv: remove unnecessary pcie_port_find_service() sathyanarayanan.kuppuswamy
2020-03-04  2:36 ` [PATCH v17 05/12] PCI: portdrv: remove reset_link member from pcie_port_service_driver sathyanarayanan.kuppuswamy
2020-03-17 14:41   ` Christoph Hellwig
2020-03-17 14:55     ` Kuppuswamy, Sathyanarayanan
2020-03-04  2:36 ` [PATCH v17 06/12] Documentation: PCI: Remove reset_link references sathyanarayanan.kuppuswamy
2020-03-17 14:42   ` Christoph Hellwig
2020-03-17 15:05     ` Kuppuswamy, Sathyanarayanan
2020-03-17 15:07       ` Christoph Hellwig
2020-03-17 16:03         ` Bjorn Helgaas
2020-03-17 17:06           ` Christoph Hellwig
2020-03-19 22:52             ` Bjorn Helgaas
2020-03-04  2:36 ` [PATCH v17 07/12] PCI/ERR: Return status of pcie_do_recovery() sathyanarayanan.kuppuswamy
2020-03-04  2:36 ` [PATCH v17 08/12] PCI/DPC: Cache DPC capabilities in pci_init_capabilities() sathyanarayanan.kuppuswamy
2020-03-04  2:36 ` [PATCH v17 09/12] PCI/AER: Allow clearing Error Status Register in FF mode sathyanarayanan.kuppuswamy
2020-03-06  5:45   ` Kuppuswamy, Sathyanarayanan
2020-03-06 16:04     ` Bjorn Helgaas
2020-03-06 16:11       ` Kuppuswamy, Sathyanarayanan
2020-03-06 16:41         ` Bjorn Helgaas
2020-03-10  2:40   ` Bjorn Helgaas
2020-03-10  4:28     ` Kuppuswamy, Sathyanarayanan
2020-03-10 18:14       ` Austin.Bolen
2020-03-10 19:32         ` Bjorn Helgaas [this message]
2020-03-10 20:06           ` Austin.Bolen
2020-03-10 20:41             ` Kuppuswamy Sathyanarayanan
2020-03-10 20:49               ` Austin.Bolen
2020-03-11 14:45             ` Bjorn Helgaas
2020-03-11 15:19               ` Austin.Bolen
2020-03-11 17:12                 ` Bjorn Helgaas
2020-03-11 17:27                   ` Austin.Bolen
2020-03-11 20:33                     ` Bjorn Helgaas
2020-03-11 21:25                       ` Kuppuswamy Sathyanarayanan
2020-03-11 21:53                         ` Austin.Bolen
2020-03-11 22:11                           ` Kuppuswamy Sathyanarayanan
2020-03-11 22:23                             ` Bjorn Helgaas
2020-03-11 23:07                               ` Kuppuswamy Sathyanarayanan
2020-03-12 19:53                                 ` Bjorn Helgaas
2020-03-12 21:02                                   ` Austin.Bolen
2020-03-12 21:29                                     ` Kuppuswamy Sathyanarayanan
2020-03-12 21:52                                       ` Bjorn Helgaas
2020-03-12 22:02                                         ` Kuppuswamy Sathyanarayanan
2020-03-12 22:36                                           ` Bjorn Helgaas
2020-03-12 21:59                                   ` Kuppuswamy Sathyanarayanan
2020-03-12 22:32                                     ` Bjorn Helgaas
2020-03-13  6:22                                       ` Kuppuswamy, Sathyanarayanan
2020-03-13 19:28                                         ` Bjorn Helgaas
2020-03-13 20:26                                           ` Kuppuswamy Sathyanarayanan
2020-03-19 23:03                                             ` Bjorn Helgaas
2020-03-19 23:20                                               ` Kuppuswamy, Sathyanarayanan
2020-03-11 22:13                         ` Bjorn Helgaas
2020-03-11 22:41                           ` Kuppuswamy Sathyanarayanan
2020-03-11 18:12                   ` Kuppuswamy Sathyanarayanan
2020-03-11 22:05             ` Bjorn Helgaas
2020-03-04  2:36 ` [PATCH v17 10/12] PCI/DPC: Export DPC error recovery functions sathyanarayanan.kuppuswamy
2020-03-17 14:43   ` Christoph Hellwig
2020-03-04  2:36 ` [PATCH v17 11/12] PCI/DPC: Add Error Disconnect Recover (EDR) support sathyanarayanan.kuppuswamy
2020-03-06  3:47   ` Bjorn Helgaas
2020-03-06  6:32     ` Kuppuswamy, Sathyanarayanan
2020-03-06 21:00       ` Bjorn Helgaas
2020-03-06 22:42         ` Kuppuswamy Sathyanarayanan
2020-03-06 23:23           ` Bjorn Helgaas
2020-03-07  0:19             ` Kuppuswamy Sathyanarayanan
2020-03-04  2:36 ` [PATCH v17 12/12] PCI/ACPI: Enable EDR support sathyanarayanan.kuppuswamy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200310193257.GA170043@google.com \
    --to=helgaas@kernel.org \
    --cc=Austin.Bolen@dell.com \
    --cc=ashok.raj@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).