From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from bmailout1.hostsharing.net ([83.223.95.100]:46941 "EHLO bmailout1.hostsharing.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726657AbeIJWDx (ORCPT ); Mon, 10 Sep 2018 18:03:53 -0400 Date: Mon, 10 Sep 2018 19:08:48 +0200 From: Lukas Wunner To: Keith Busch Cc: Linux PCI , Bjorn Helgaas , Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , "poza@codeaurora.org" , Christoph Hellwig Subject: Re: [PATCHv2 16/20] PCI/pciehp: Implement error handling callbacks Message-ID: <20180910170848.7q2qii2mm655eghw@wunner.de> References: <20180905203546.21921-1-keith.busch@intel.com> <20180905203546.21921-17-keith.busch@intel.com> <20180910132033.ei5nk4iibt7pesd5@wunner.de> <20180910145641.GA7466@localhost.localdomain> <20180910160926.7grtfdyw3iv2xg4x@wunner.de> <20180910164528.GC7466@localhost.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <20180910164528.GC7466@localhost.localdomain> Sender: linux-pci-owner@vger.kernel.org List-ID: On Mon, Sep 10, 2018 at 10:45:28AM -0600, Keith Busch wrote: > On Mon, Sep 10, 2018 at 06:09:26PM +0200, Lukas Wunner wrote: > > On Mon, Sep 10, 2018 at 08:56:42AM -0600, Keith Busch wrote: > > > The sysfs entries still function. Their actions are only temporarily > > > stalled during error handling. Once the slot reset is called, the > > > ctrl->pending_events is queried to take requested actions. > > > > Okay I see. Still, releasing the IRQ and requesting it again seems fairly > > heavy-wheight. Why not just acquire ctrl->reset_lock? > > That was looking like a nice way to handle it, but it introduces > circular locking between ctrl->reset_lock and pci_bus_sem: > > CPU A CPU B > --------------------------------- ------------------------ > pci_walk_bus pciehp_ist > down_read(&pci_bus_sem) down_read(&ctrl->reset_lock); > pcie_portdrv_error_detected pciehp_handle_presence_or_link_change > pciehp_error_detected pciehp_unconfigure_device > down_write(&ctrl->reset_lock) pci_stop_and_remove_bus_devicea > down_write(&pci_bus_sem); Why is pciehp bringing down the slot? Is that in reaction to a Link Down caused by the error? Can this be solved with Sinan's approach to check in pciehp whether PCI_EXP_DEVSTA_FED is set and if so, waiting for it to be handled? FWIW you can use synchronize_irq() in pciehp_error_detected() if you need to wait for the IRQ thread to stop before taking the reset_lock. Thanks, Lukas