From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 360A8C3A5A6 for ; Mon, 23 Sep 2019 08:12:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 07FDD20820 for ; Mon, 23 Sep 2019 08:12:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390703AbfIWIMm (ORCPT ); Mon, 23 Sep 2019 04:12:42 -0400 Received: from mga05.intel.com ([192.55.52.43]:11443 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388953AbfIWIMm (ORCPT ); Mon, 23 Sep 2019 04:12:42 -0400 X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 23 Sep 2019 01:12:42 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,539,1559545200"; d="scan'208";a="203076028" Received: from lahna.fi.intel.com (HELO lahna) ([10.237.72.157]) by fmsmga001.fm.intel.com with SMTP; 23 Sep 2019 01:12:38 -0700 Received: by lahna (sSMTP sendmail emulation); Mon, 23 Sep 2019 11:12:37 +0300 Date: Mon, 23 Sep 2019 11:12:37 +0300 From: Mika Westerberg To: Lukas Wunner Cc: Bjorn Helgaas , "Rafael J. Wysocki" , Keith Busch , Andy Shevchenko , Frederick Lawler , "Gustavo A . R . Silva" , Sinan Kaya , Kai-Heng Feng , linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect Message-ID: <20190923081237.GB2773@lahna.fi.intel.com> References: <20190812143133.75319-1-mika.westerberg@linux.intel.com> <20190812143133.75319-2-mika.westerberg@linux.intel.com> <20190923053403.jdjw6ed3sub6iuou@wunner.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190923053403.jdjw6ed3sub6iuou@wunner.de> Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo User-Agent: Mutt/1.12.1 (2019-06-15) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Hi Lukas, On Mon, Sep 23, 2019 at 07:34:03AM +0200, Lukas Wunner wrote: > On Mon, Aug 12, 2019 at 05:31:33PM +0300, Mika Westerberg wrote: > > If there are more than one PCIe switch with hotplug downstream ports > > hot-removing them leads to a following deadlock: > > For the record, I think my comments on v1 of this patch still apply: > > https://patchwork.ozlabs.org/patch/1117870/#2230798 Well, so do I ;-) As I tried to explain in v1 discussion, I think what we do here in this patch is correct thing to do regardless. I mean once the hardware is gone the driver should not do any decisions based on what it thinks it reads from the now missing hardware. This also makes the deadlock problem go away on all the system I've been testing. Where previously I was able to reproduce the deadlock 100% reliably I have not seen it happen once with this one applied (and haven't got reports from our internal testing either). Regarding suggestion of unbinding PCI drivers without pci_lock_rescan_remove() hold, I haven't looked it too closely but I think we need to take that lock anyway because when we are unbinding a hotplug driver it is supposed to remove the hierarchy below touching the shared structures, possibly concurrently. Unfortunately there is no documentation what data pci_lock_rescan_remove() actually protects so first one needs to understand that. I think one way to clean up this is to use finer grained locking (with documented lock ordering) for PCI bus structures that can be accessed simultaneusly by different threads. But that is not a simple task.