From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Tue, 21 Aug 2018 08:37:51 -0600
From: Keith Busch
To: Benjamin Herrenschmidt
Cc: poza@codeaurora.org, Sinan Kaya, Bjorn Helgaas, Thomas Tai,
 bhelgaas@google.com, linux-pci@vger.kernel.org,
 linux-pci-owner@vger.kernel.org, Sam Bobroff
Subject: Re: [PATCH 1/1] PCI/AER: prevent pcie_do_fatal_recovery from
 using device after it is removed
Message-ID: <20180821143751.GA18477@localhost.localdomain>
References: <908ff33ded8f31830f95a8889d8540f1@codeaurora.org>
 <5027d857bb59edfd33442003aa618ece1bc9cd52.camel@kernel.crashing.org>
 <2ecd1fd6d763810d45697f846fa876b58a193b1b.camel@kernel.crashing.org>
 <512e0e11c3ba462c1d033f8b0e768fa27489731c.camel@kernel.crashing.org>
 <2742bdba5ae8ccc420234b6e6b0224919367ed4c.camel@kernel.crashing.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <2742bdba5ae8ccc420234b6e6b0224919367ed4c.camel@kernel.crashing.org>
List-ID:

On Tue, Aug 21, 2018 at 04:06:30PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2018-08-21 at 10:44 +0530, poza@codeaurora.org wrote:
> >
> > Ok Let me summarize the so far discussed things.
> >
> > It would be nice if we all (Bjorn, Keith, Ben, Sinan) can hold consensus
> > on this.
> >
> > 1) Right now AER and DPC both calls pcie_do_fatal_recovery(), I majorly
> > see DPC as error handling and recovery agent rather than being used for
> > hotplug.
> >    so in my opinion, both AER and DPC should have same error handling
> > and recovery mechanism
>
> Yes.
>
> >    so if there is a way to figure out that in absence of pcihp, if DPC
> > is being used to support hotplug then we fall back to original DPC
> > mechanism (which is remove devices)
>
> Not exactly. If the presence detect change indicates it was a hotplug
> event rather.

The actions associated with error recovery will trigger link state
changes for a lot of existing hardware.
pciehp currently runs the same removal sequence for both link state
change (DLLSC) and presence detect change (PDC) events. It sounds like
you want pciehp to do nothing on the DLLSC events that it currently
handles, and instead do the board removal only on PDC. If that is the
case, is the desire to not remove devices downstream of a permanently
disabled link, or does that responsibility fall onto some other
component?
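
To make the question concrete, here is a rough sketch of the two
policies as I understand them. This is not the actual pciehp code: the
two helper functions are hypothetical names made up for illustration,
and the bit values are assumed to match PCI_EXP_SLTSTA_PDC and
PCI_EXP_SLTSTA_DLLSC in pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot Status event bits; assumed to mirror PCI_EXP_SLTSTA_* in pci_regs.h. */
#define SLTSTA_PDC   0x0008 /* Presence Detect Changed */
#define SLTSTA_DLLSC 0x0100 /* Data Link Layer State Changed */

/*
 * Hypothetical helper: the behaviour described above as current,
 * where either event leads to the removal sequence.
 */
static bool removes_on_either_event(uint16_t events)
{
	return (events & (SLTSTA_PDC | SLTSTA_DLLSC)) != 0;
}

/*
 * Hypothetical helper: the proposed behaviour as I read it, where
 * only a presence detect change leads to removal and a link state
 * change on its own is ignored.
 */
static bool removes_on_pdc_only(uint16_t events)
{
	return (events & SLTSTA_PDC) != 0;
}
```

With the second policy, a DLLSC with no PDC (for example a link that
went down and never came back) results in no removal at all, which is
the case I am asking about.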