Date: Wed, 25 Mar 2020 11:40:18 +0100
From: Christoph Hellwig
To: Lukas Wunner
Cc: "Haeuptle, Michael", Christoph Hellwig, "linux-pci@vger.kernel.org", "michaelhaeuptle@gmail.com"
Subject: Re: Deadlock during PCIe hot remove
Message-ID: <20200325104018.GA30853@lst.de>
References: <20200324161534.b2u6ag6oecvcthqd@wunner.de>
In-Reply-To: <20200324161534.b2u6ag6oecvcthqd@wunner.de>
X-Mailing-List: linux-pci@vger.kernel.org

On Tue, Mar 24, 2020 at 05:15:34PM +0100, Lukas Wunner wrote:
> The pci_dev_trylock() in pci_try_reset_function() looks questionable
> to me.
> It was added by commit b014e96d1abb ("PCI: Protect
> pci_error_handlers->reset_notify() usage with device_lock()")
> with the following rationale:
>
>   Every method in struct device_driver or structures derived from it like
>   struct pci_driver MUST provide exclusion vs the driver's ->remove()
>   method, usually by using device_lock().
>   [...]
>   Without this, ->reset_notify() may race with ->remove() calls, which
>   can be easily triggered in NVMe.
>
> The intersection of drivers defining a ->reset_notify() hook and files
> invoking pci_try_reset_function() appears to be empty.  So I don't quite
> understand the problem the commit sought to address.  What am I missing?

No driver defines ->reset_notify as that has been split into
->reset_prepare and ->reset_done a while ago, and plenty of drivers
define those.  And we can't call into drivers unless we know the driver
actually still is bound to the device, which is why we need the locking.
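
To illustrate the last point, here is a rough userspace sketch of the pattern, not the kernel code itself. It assumes a pthread mutex standing in for device_lock(), and every name in it (model_dev, model_driver, model_err_handlers, model_unbind, model_try_reset, the sample_* handlers) is hypothetical. It only shows that the driver pointer is looked at, and the ->reset_prepare / ->reset_done style callbacks are invoked, while holding the same lock the unbind path takes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct model_dev;

/* Hypothetical stand-in for a ->reset_prepare / ->reset_done pair. */
struct model_err_handlers {
	void (*reset_prepare)(struct model_dev *dev);
	void (*reset_done)(struct model_dev *dev);
};

struct model_driver {
	const struct model_err_handlers *err_handler;
};

struct model_dev {
	pthread_mutex_t lock;              /* plays the role of device_lock() */
	const struct model_driver *driver; /* NULL once the driver is unbound */
	int resets;                        /* number of modelled resets */
};

/* Counters so a caller can observe whether the driver was called. */
static int prepare_calls, done_calls;

static void sample_prepare(struct model_dev *dev) { (void)dev; prepare_calls++; }
static void sample_done(struct model_dev *dev) { (void)dev; done_calls++; }

static const struct model_err_handlers sample_handlers = {
	.reset_prepare = sample_prepare,
	.reset_done = sample_done,
};

static const struct model_driver model_drv = {
	.err_handler = &sample_handlers,
};

/* Models the remove path: the driver is unbound under the device lock. */
static void model_unbind(struct model_dev *dev)
{
	assert(dev != NULL);
	pthread_mutex_lock(&dev->lock);
	dev->driver = NULL;
	pthread_mutex_unlock(&dev->lock);
}

/*
 * Models a "try" reset: give up if the lock is contended, otherwise
 * call into the driver only if one is still bound.
 * Returns 0 on success, -1 if the lock could not be taken.
 */
static int model_try_reset(struct model_dev *dev)
{
	const struct model_err_handlers *h;

	assert(dev != NULL);
	if (pthread_mutex_trylock(&dev->lock) != 0)
		return -1;

	/* Safe to inspect: unbind cannot run while we hold the lock. */
	h = dev->driver ? dev->driver->err_handler : NULL;

	if (h && h->reset_prepare)
		h->reset_prepare(dev);
	dev->resets++;                     /* the reset itself, modelled as a counter */
	if (h && h->reset_done)
		h->reset_done(dev);

	pthread_mutex_unlock(&dev->lock);
	return 0;
}
```

In this model a reset after model_unbind() still succeeds but never reaches the driver, and a reset attempted while the lock is held elsewhere returns -1 instead of blocking.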