linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sinan Kaya <okaya@codeaurora.org>
To: Keith Busch <keith.busch@intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	Oza Pawandeep <poza@codeaurora.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Philippe Ombredanne <pombredanne@nexb.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Kate Stewart <kstewart@linuxfoundation.org>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	Dongdong Liu <liudongdong3@huawei.com>, Wei Zhang <wzhang@fb.com>,
	Timur Tabi <timur@codeaurora.org>,
	Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH v13 6/6] PCI/DPC: Do not do recovery for hotplug enabled system
Date: Thu, 12 Apr 2018 12:27:20 -0400	[thread overview]
Message-ID: <ea977780-6f90-af5a-442f-575536038307@codeaurora.org> (raw)
In-Reply-To: <20180412150231.GD4810@localhost.localdomain>

On 4/12/2018 11:02 AM, Keith Busch wrote:
> On Thu, Apr 12, 2018 at 08:39:54AM -0600, Keith Busch wrote:
>> On Thu, Apr 12, 2018 at 10:34:37AM -0400, Sinan Kaya wrote:
>>> On 4/12/2018 10:06 AM, Bjorn Helgaas wrote:
>>>>
>>>> I think the scenario you are describing is two systems that are
>>>> identical except that in the first, the endpoint is below a hotplug
>>>> bridge, while in the second, it's below a non-hotplug bridge.  There's
>>>> no physical hotplug (no drive removed or inserted), and DPC is
>>>> triggered in both systems.
>>>>
>>>> I suggest that DPC should be handled identically in both systems:
>>>>
>>>>   - The PCI core should have the same view of the endpoint: it should
>>>>     be removed and re-added in both cases (or in neither case).
>>>>
>>>>   - The endpoint itself should not be able to tell the difference: it
>>>>     should see a link down event, followed by a link retrain, followed
>>>>     by the same sequence of config accesses, etc.
>>>>
>>>>   - The endpoint driver should not be able to tell the difference,
>>>>     i.e., we should be calling the same pci_error_handlers callbacks
>>>>     in both cases.
>>>>
>>>> It's true that in the non-hotplug system, pciehp probably won't start
>>>> re-enumeration, so we might need an alternate path to trigger that.
>>>>
>>>> But that's not what we're doing in this patch.  In this patch we're
>>>> adding a much bigger difference: for hotplug bridges, we stop and
>>>> remove the hierarchy below the bridge; for non-hotplug bridges, we do
>>>> the AER-style flow of calling pci_error_handlers callbacks.
>>>
>>> Our approach on V12 was to go to AER style recovery for all DPC events
>>> regardless of hotplug support or not. 
>>>
>>> Keith was not comfortable with this approach. That's why, we special cased
>>> hotplug.
>>>
>>> If we drop 6/6 on this patch on v13, we achieve this. We still have to
>>> take care of Keith's inputs on individual patches.
>>>
>>> we have been struggling with the direction for a while.
>>>
>>> Keith, what do you think?
>>
>> My only concern was for existing production environments that use DPC
>> for handling surprise removal, and I don't wish to break the existing
>> uses.
> 
> Also, I thought the plan was to keep hotplug and non-hotplug the same,
> except for the very end: if not a hotplug bridge, initiate the rescan
> automatically after releasing from containment, otherwise let pciehp
> handle it when the link reactivates.
> 

Hmm...

AER driver doesn't do stop and rescan approach for fatal errors. AER driver
makes an error callback followed by secondary bus reset and finally driver
the resume callback on the endpoint only if link recovery is successful.
Otherwise, AER driver bails out with recovery unsuccessful message.

Why do we need an additional rescan in the DPC driver if the link is up
and driver resumes operation?

If hotplug is supported and somebody removed the device, link won't come up.
The AER error recovery sequence will fail after timeout.

When the drive is inserted, hotplug driver observes a link up interrupt,
Hotplug driver does a rescan. Drive is functional one more time. 

This should satisfy both use cases, right?


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

  reply	other threads:[~2018-04-12 16:27 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-09 14:41 [PATCH v13 0/6] Address error and recovery for AER and DPC Oza Pawandeep
2018-04-09 14:41 ` [PATCH v13 1/6] PCI/AER: Rename error recovery to generic PCI naming Oza Pawandeep
2018-04-09 23:14   ` Keith Busch
2018-04-09 14:41 ` [PATCH v13 2/6] PCI/AER: Factor out error reporting from AER Oza Pawandeep
2018-04-09 23:15   ` Keith Busch
2018-04-10 11:36   ` kbuild test robot
2018-04-09 14:41 ` [PATCH v13 3/6] PCI/PORTDRV: Implement generic find service Oza Pawandeep
2018-04-09 23:15   ` Keith Busch
2018-04-09 14:41 ` [PATCH v13 4/6] PCI/DPC: Unify and plumb error handling into DPC Oza Pawandeep
2018-04-09 23:29   ` Keith Busch
2018-04-09 23:51     ` Sinan Kaya
2018-04-10  0:05       ` Sinan Kaya
2018-04-09 14:41 ` [PATCH v13 5/6] PCI: Unify wait for link active into generic PCI Oza Pawandeep
2018-04-09 23:25   ` Keith Busch
2018-04-12  8:40     ` poza
2018-04-09 14:41 ` [PATCH v13 6/6] PCI/DPC: Do not do recovery for hotplug enabled system Oza Pawandeep
2018-04-10 21:03   ` Bjorn Helgaas
2018-04-12  1:41     ` Sinan Kaya
2018-04-12 14:06       ` Bjorn Helgaas
2018-04-12 14:34         ` Sinan Kaya
2018-04-12 14:39           ` Keith Busch
2018-04-12 15:02             ` Keith Busch
2018-04-12 16:27               ` Sinan Kaya [this message]
2018-04-12 17:09                 ` Keith Busch
2018-04-12 17:41                   ` Sinan Kaya
2018-04-14 15:53                     ` Sinan Kaya
2018-04-16  3:17                       ` Bjorn Helgaas
2018-04-16  5:33                         ` poza
2018-04-16  5:51                           ` poza
2018-04-16 14:01                             ` Bjorn Helgaas
2018-04-16 14:46                         ` Sinan Kaya
2018-04-16 17:15                           ` poza
2018-04-16  3:16 ` [PATCH v13 0/6] Address error and recovery for AER and DPC Bjorn Helgaas
2018-04-16  3:53   ` Sinan Kaya
2018-04-16  6:03     ` poza
2018-04-16 13:27       ` Bjorn Helgaas
2018-04-16 14:12         ` poza
2018-04-16 14:30         ` Sinan Kaya

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ea977780-6f90-af5a-442f-575536038307@codeaurora.org \
    --to=okaya@codeaurora.org \
    --cc=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=helgaas@kernel.org \
    --cc=keith.busch@intel.com \
    --cc=kstewart@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=liudongdong3@huawei.com \
    --cc=pombredanne@nexb.com \
    --cc=poza@codeaurora.org \
    --cc=tglx@linutronix.de \
    --cc=timur@codeaurora.org \
    --cc=wzhang@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).