From: poza@codeaurora.org
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Sinan Kaya <okaya@codeaurora.org>,
	Keith Busch <keith.busch@intel.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Philippe Ombredanne <pombredanne@nexb.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Kate Stewart <kstewart@linuxfoundation.org>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	Dongdong Liu <liudongdong3@huawei.com>, Wei Zhang <wzhang@fb.com>,
	Timur Tabi <timur@codeaurora.org>,
	Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH v13 6/6] PCI/DPC: Do not do recovery for hotplug enabled system
Date: Mon, 16 Apr 2018 11:03:25 +0530
Message-ID: <da626c40564276097ab7380ead5f0238@codeaurora.org>
In-Reply-To: <20180416031726.GB158153@bhelgaas-glaptop.roam.corp.google.com>

On 2018-04-16 08:47, Bjorn Helgaas wrote:
> On Sat, Apr 14, 2018 at 11:53:17AM -0400, Sinan Kaya wrote:
> 
>> You indicated that you want to unify the AER and DPC behavior. Let's
>> settle on what we want to do one more time. We have been going forth
>> and back on the direction.
> 
> My thinking is that as much as possible, similar events should be
> handled similarly, whether the mechanism is AER, DPC, EEH, etc.
> Ideally, drivers shouldn't have to be aware of which mechanism is in
> use.
> 
> Error recovery includes conventional PCI as well, but right now I
> think we're only concerned with PCIe.  The following error types are
> from PCIe r4.0, sec 6.2.2:
> 
>   ERR_COR
>     Corrected by hardware with no software intervention.  Software
>     involved for logging only.
> 
>     Handled by AER via pci_error_handlers; DPC is never involved.
> 
>     Link is unaffected.
> 
>   ERR_NONFATAL
>     A transaction is unreliable but the link is fully functional.
> 
>     If DPC is not supported, handled by AER via pci_error_handlers and
>     the link is unaffected.
> 
>     If DPC supported, handled by DPC (because we set
>     PCI_EXP_DPC_CTL_EN_NONFATAL) via remove/re-enumerate.
> 
>   ERR_FATAL
>     The link is unreliable.
> 
>     If DPC is not supported, handled by AER via pci_error_handlers and
>     the link is reset.
> 
>     If DPC supported, handled by DPC via remove/re-enumerate.
> 
> It doesn't seem right to me that we handle both ERR_NONFATAL and
> ERR_FATAL events differently if we happen to have DPC support in a
> switch.
> 
> Maybe we should consider triggering DPC only on ERR_FATAL?  That would
> keep DPC out of the ERR_NONFATAL cases.
> 
> For ERR_FATAL, maybe we should bite the bullet and use
> remove/re-enumerate for AER as well as for DPC.  That would be painful
> for higher-level software, but if we're willing to accept that pain
> for new systems that support DPC, maybe life would be better overall
> if it worked the same way on systems without DPC?
> 
> Bjorn

This crossed my mind when I first looked at the code.
DPC is triggered for both the ERR_NONFATAL and ERR_FATAL cases.
I thought the primary purpose of DPC was to recover from fatal errors by
triggering HW recovery.
But what if some platform wants to handle both FATAL and NON_FATAL with
DPC?

As you said, AER FATAL cases and DPC FATAL cases should be handled
similarly, e.g. by removing and re-enumerating the devices.

In the NON_FATAL case, only AER would come into the picture.
If some platform would like to handle NON_FATAL with DPC, then it should
follow the AER NON_FATAL path (which does not do remove/re-enumerate).
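
To make the proposal above concrete, here is a rough sketch of the policy
as a plain C decision function. It is not kernel code: the enum names and
choose_recovery() are hypothetical, made up only for illustration. The
point it tries to show is that the action depends on the error severity
alone, not on whether AER or DPC reported it.

```c
/*
 * Illustrative sketch only. All names below are hypothetical and do
 * not exist in the kernel tree.
 */
enum err_severity {
	SEV_CORRECTABLE,	/* ERR_COR */
	SEV_NONFATAL,		/* ERR_NONFATAL */
	SEV_FATAL,		/* ERR_FATAL */
};

enum err_source {
	SRC_AER,
	SRC_DPC,
};

enum recovery_action {
	ACT_LOG_ONLY,		/* corrected by hardware, log it */
	ACT_DRIVER_CALLBACKS,	/* pci_error_handlers path, link untouched */
	ACT_REMOVE_REENUMERATE,	/* remove the devices, then re-enumerate */
};

/*
 * Pick the recovery action from the severity. The reporting mechanism
 * is deliberately ignored so that AER and DPC behave the same way.
 */
static enum recovery_action choose_recovery(enum err_severity sev,
					    enum err_source src)
{
	(void)src;	/* same handling regardless of AER or DPC */

	switch (sev) {
	case SEV_FATAL:
		return ACT_REMOVE_REENUMERATE;
	case SEV_NONFATAL:
		return ACT_DRIVER_CALLBACKS;
	case SEV_CORRECTABLE:
	default:
		return ACT_LOG_ONLY;
	}
}
```

With this, choose_recovery(SEV_NONFATAL, SRC_DPC) gives the same answer as
choose_recovery(SEV_NONFATAL, SRC_AER), which is the unification being
discussed. Whether hotplug should change the FATAL action is left out here.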

Where hotplug is enabled, remove/re-enumerate makes more sense in
the ERR_FATAL case.
Where hotplug is disabled, only re-enumeration is required
(no need to remove the devices).
But do we need to handle this case specifically? What is the harm
in removing the devices in all cases and then re-enumerating?

Regards,
Oza.
