From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp.codeaurora.org ([198.145.29.96]:57704 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726342AbeHPKxd (ORCPT ); Thu, 16 Aug 2018 06:53:33 -0400 MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Date: Thu, 16 Aug 2018 13:26:51 +0530 From: poza@codeaurora.org To: Benjamin Herrenschmidt Cc: Thomas Tai , bhelgaas@google.com, keith.busch@intel.com, linux-pci@vger.kernel.org, linux-pci-owner@vger.kernel.org, Sam Bobroff Subject: Re: [PATCH 1/1] PCI/AER: prevent pcie_do_fatal_recovery from using device after it is removed In-Reply-To: <3f454ec4e96842286e1bfeba960f9f37bfcfaf6f.camel@kernel.crashing.org> References: <1534179088-44219-1-git-send-email-thomas.tai@oracle.com> <1534179088-44219-2-git-send-email-thomas.tai@oracle.com> <51f4b387d9bd96a42d526a6a029fc43b@codeaurora.org> <903394c04d6ad468ed06dc0a779200e7555345a7.camel@kernel.crashing.org> <6cb069038530757f31f3dd60328c7e30@codeaurora.org> <3f454ec4e96842286e1bfeba960f9f37bfcfaf6f.camel@kernel.crashing.org> Message-ID: <96ecbc5963993703381ffcdda093d147@codeaurora.org> Sender: linux-pci-owner@vger.kernel.org List-ID: On 2018-08-16 12:45, Benjamin Herrenschmidt wrote: > On Thu, 2018-08-16 at 17:05 +1000, Benjamin Herrenschmidt wrote: >> Those changes are utterly broken. >> >> The basic premise of the design that we woudl do that unplug/replug >> trick if and ONLY IF the driver doesn't have the appropriate >> callbacks. >> >> We are also now looking at replacing this with an ubind/re-bind >> because >> in practice, the unplugging is causing us all sort of problems. Sam >> (CC) can elaborate. >> >> Bjorn, we are the main authors of that spec (Linas wrote it under my >> supervision) and created those callbacks for EEH. AER picked them up >> only later. Those changes must be at the very least acked by us before >> going upstream. > > Also I had a quick look at the DPC spec, it looks like a subset of our > EEH capability. > > Again, the code in Linux was written without concertation with us, > misunderstands/misuses the driver callbacks, does unplugs when it > shouldn't etc... > > It's all very broken. > > Please, at the very least revert the spec changes. They are utterly > wrong. > > The driver MUST remain active during the recovery process *including* > fatal errors. > > Only if the recovery fails and the driver gives us may you chose to > unplug the device (though there is little point). > > What you have designed will work fine for network drivers but will not > work for storage. > > Ben. Hi Ben, If you look at original DPC driver it was doing exactly the same as it is doing now. so with or without new changes DPC driver had the same behavior from beginning. commit 26e515713342b6f7c553aa3c66b21c6ab7cf82af Regards, Oza.