All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean V Kelley <sean.v.kelley@intel.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: "Zhuo, Qiuxu" <qiuxu.zhuo@intel.com>,
	"Jonathan.Cameron@huawei.com" <Jonathan.Cameron@huawei.com>,
	"Kuppuswamy,
	Sathyanarayanan" <sathyanarayanan.kuppuswamy@intel.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Luck, Tony" <tony.luck@intel.com>,
	"Wysocki, Rafael J" <rafael.j.wysocki@intel.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"seanvk.dev@oregontracks.org" <seanvk.dev@oregontracks.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>
Subject: Re: [PATCH v8 11/14] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR
Date: Tue, 13 Oct 2020 09:55:34 -0700	[thread overview]
Message-ID: <d023cc5153bf855e27d7ee098e1afa0fd5ba9761.camel@intel.com> (raw)
In-Reply-To: <20201012225848.GA3754175@bjorn-Precision-5520>

On Mon, 2020-10-12 at 17:58 -0500, Bjorn Helgaas wrote:
> On Fri, Oct 09, 2020 at 11:51:39PM +0000, Kelley, Sean V wrote:
> > On Fri, 2020-10-09 at 15:07 -0700, Sean V Kelley wrote:
> > So I tested the following out, including your moving flr to aer.c:
> > 
> > - Renamed flr_on_rciep() to flr_on_rc() for RC devices (RC_END and
> > RC_EC)
> > 
> > - Moved check on dev->rcec into aer_root_reset() including the FLR.
> > 
> > - Reworked pci_walk_bridge() to drop extra dev argument and check
> > locally for the bridge->rcec. Maybe should also check on type when
> > checking on bridge->rcec.
> > 
> > Note I didn't use the check on aer_cap existence because I think
> > you
> > had added that for simply being able to skip over for the non-
> > native
> > case and I handle that with the single goto at the beginning which
> > takes you to the FLR.
> 
> Right.  Well, my thinking was that "root" would be a device with the
> AER Root Error Command and Root Error Status registers, i.e., a Root
> Port or RCEC.  IIUC that basically means the APEI case where firmware
> gives us an error record.

Got it.

> 
> Isn't the existing v5.9 code buggy in that it unconditionally pokes
> these registers?  I think the APEI path can end up here, and firmware
> probably has not granted us control over AER.

Yes, APEI path can end up here even in the absence of AER control.

> 
> Somewhat related question: I'm a little skeptical about the fact that
> aer_root_reset() currently does:
> 
>   - clear ROOT_PORT_INTR_ON_MESG_MASK
>   - do reset
>   - clear PCI_ERR_ROOT_STATUS
>   - enable ROOT_PORT_INTR_ON_MESG_MASK

It's a bit of a mix and growing with RC_END handling.

> 
> In the APEI path all this AER register manipulation must be done by
> firmware before passing the error record to the OS.  So in the native
> case where the OS does own the AER registers, why can't the OS do
> that
> manipulation in the same order, i.e., all before doing the reset?

And you're right, the mix here imposes additional complexity for native
versus non-native. If you're not actively engaged with the code, it's
not obvious. So, yes moving it out would make more sense.


> 
> > So this is rough, compiled, tested with AER injections but that's
> > it...
> 
> I couldn't actually apply the patch below because it seems to be
> whitespace-damaged, but I think I like it.

Yes, it was a quick copy-paste to an existing email. Will work with
your branch.

> 
>   - It would be nice to be able to just call pcie_flr() and not have
>     to add flr_on_rc().  I can't remember why we need the
>     pcie_has_flr() / pcie_flr() dance.  It seems racy and ugly, but I
>     have a vague recollection that there actually is some reason for
>     it.

I'll have a look.

> 
>   - I would *rather* consolidate the AER register updates and test
> for
>     the non-native case once instead of treating it like a completely
>     separate path with a "goto".  But maybe not possible.  Not a big
>     deal either way.

Following your line of reasoning above, I think we can better
consolidate the AER register updates.

> 
>   - Getting rid of the extra "dev" argument to pci_walk_bridge() is a
>     great side-effect.  I didn't even notice that.
> 
>   - If we can simplify that "state == pci_channel_io_frozen" case as
>     this does, that is a *big* deal because there are other patches
>     just waiting to touch that reset and it will be much simpler if
>     there's only one reset_subordinate_devices() call there.

Agreed.

> 
> If you do work this up, I'd really appreciate it if you can start
> with
> my pci/err branch so I don't have to re-do all the tweaks I've
> already
> done:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/err
> 

Will do.

Thanks,

Sean



> Bjorn


  reply	other threads:[~2020-10-13 16:56 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-02 18:47 [PATCH v8 00/14] Add RCEC handling to PCI/AER Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 01/14] PCI/RCEC: Add RCEC class code and extended capability Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 02/14] PCI/RCEC: Bind RCEC devices to the Root Port driver Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 03/14] PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities() Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 04/14] PCI/ERR: Rename reset_link() to reset_subordinate_device() Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 05/14] PCI/ERR: Use "bridge" for clarity in pcie_do_recovery() Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 06/14] PCI/ERR: Add pci_walk_bridge() to pcie_do_recovery() Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 07/14] PCI/ERR: Limit AER resets in pcie_do_recovery() Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 08/14] PCI/AER: Extend AER error handling to RCECs Sean V Kelley
2020-10-09 21:27   ` Bjorn Helgaas
2020-10-09 21:54     ` Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 09/14] PCI/AER: Apply function level reset to RCiEP on fatal error Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 10/14] PCI/RCEC: Add pcie_link_rcec() to associate RCiEPs Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 11/14] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR Sean V Kelley
2020-10-09 17:57   ` Bjorn Helgaas
2020-10-09 18:26     ` Sean V Kelley
2020-10-09 18:34       ` Sean V Kelley
2020-10-09 18:53         ` Sean V Kelley
2020-10-09 19:48           ` Sean V Kelley
2020-10-09 21:30     ` Bjorn Helgaas
2020-10-09 22:07       ` Sean V Kelley
2020-10-09 23:51         ` Kelley, Sean V
2020-10-12 22:58           ` Bjorn Helgaas
2020-10-13 16:55             ` Sean V Kelley [this message]
2020-10-02 18:47 ` [PATCH v8 12/14] PCI/AER: Add pcie_walk_rcec() to RCEC AER handling Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 13/14] PCI/PME: Add pcie_walk_rcec() to RCEC PME handling Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 14/14] PCI/AER: Add RCEC AER error injection support Sean V Kelley
2020-10-09 15:53 ` [PATCH v8 00/14] Add RCEC handling to PCI/AER Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d023cc5153bf855e27d7ee098e1afa0fd5ba9761.camel@intel.com \
    --to=sean.v.kelley@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=ashok.raj@intel.com \
    --cc=bhelgaas@google.com \
    --cc=helgaas@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=qiuxu.zhuo@intel.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=sathyanarayanan.kuppuswamy@intel.com \
    --cc=seanvk.dev@oregontracks.org \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.