Date: Tue, 10 Dec 2019 09:28:00 +0200
From: "mika.westerberg@linux.intel.com"
To: Nicholas Johnson
Cc: Bjorn Helgaas, "linux-kernel@vger.kernel.org", "linux-pci@vger.kernel.org"
Subject: Re: Linux v5.5 serious PCI bug
Message-ID: <20191210072800.GY2665@lahna.fi.intel.com>
References: <20191209131239.GP2665@lahna.fi.intel.com>
Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
User-Agent: Mutt/1.12.1 (2019-06-15)
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Dec 09, 2019 at 01:33:49PM +0000, Nicholas Johnson wrote:
> On Mon, Dec 09, 2019 at 03:12:39PM +0200, mika.westerberg@linux.intel.com wrote:
> > On Mon, Dec 09, 2019 at 12:34:04PM +0000, Nicholas Johnson wrote:
> > > Hi,
> > >
> > > I have compiled Linux v5.5-rc1 and thought all was good until I
> > > hot-removed a Gigabyte Aorus eGPU from Thunderbolt. The driver for the
> > > GPU was not loaded (blacklisted) so the crash is nothing to do with the
> > > GPU driver.
> > >
> > > We had:
> > > - kernel NULL pointer dereference
> > > - refcount_t: underflow; use-after-free.
> > >
> > > Attaching dmesg for now; will bisect and come back with results.
> >
> > Looks like something related to iommu. Does it work if you disable it?
> > (intel_iommu=off in the command line).
> I thought it could be that, too.
>
> The attachment "dmesg-4" from the original email is with iommu parameters.
> The attachment "dmesg-5" from the original email is with no iommu parameters.
> Attaching here "dmesg-6" with the iommu explicitly set off like you said.
>
> No difference, still broken. Although, with iommu off, there are fewer
> stack traces.
>
> Could it be sysfs-related?

Bisecting would probably be the best way to find the culprit commit. There
are also a couple of commits that touched pciehp, so reverting them one by
one may help as well:

  87d0f2a5536f PCI: pciehp: Prevent deadlock on disconnect
  75fcc0ce72e5 PCI: pciehp: Do not disable interrupt twice on suspend
  b94ec12dfaee PCI: pciehp: Refactor infinite loop in pcie_poll_cmd()
  157c1062fcd8 PCI: pciehp: Avoid returning prematurely from sysfs requests
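
For anyone unfamiliar with why bisecting is usually quicker than reverting
suspects one at a time, here is a minimal sketch of the search that a tool
like git bisect performs. It is only an illustration: the commit names, the
history length and the is_bad() check are hypothetical stand-ins, not real
kernel commits or a real test. It also assumes the history is linear and
that every commit after the culprit shows the bug.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the newest one is known
    to be bad, and every commit after the culprit is bad as well.
    """
    if not commits:
        raise ValueError("need at least one commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid      # culprit is mid or something earlier
        else:
            lo = mid + 1  # culprit is somewhere after mid
    return commits[lo]


# Hypothetical history: placeholder names, not real kernel commits.
history = [f"commit-{i}" for i in range(16)]
culprit = "commit-11"  # pretend this commit introduced the crash

tested = []  # records which commits had to be built and tested


def is_bad(commit: str) -> bool:
    # Stand-in for "build, boot, hot-remove the eGPU, check dmesg".
    tested.append(commit)
    return history.index(commit) >= history.index(culprit)


found = first_bad_commit(history, is_bad)
print(found, "found after", len(tested), "tests")
```

Each test halves the remaining range, so the number of kernel builds grows
with the logarithm of the number of commits between the last good and the
first bad kernel. Reverting the four pciehp commits one by one is still
worth trying first if one of them is a strong suspect.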