From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755214AbeD3VRo (ORCPT ); Mon, 30 Apr 2018 17:17:44 -0400
Received: from mail.kernel.org ([198.145.29.99]:52742 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755151AbeD3VRn (ORCPT ); Mon, 30 Apr 2018 17:17:43 -0400
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9DDC4229E6
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=helgaas@kernel.org
Date: Mon, 30 Apr 2018 16:17:40 -0500
From: Bjorn Helgaas
To: Sinan Kaya
Cc: Paul Menzel , Dave Young , linux-pci@vger.kernel.org,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	Lukas Wunner , Eric Biederman , Bjorn Helgaas , Vivek Goyal
Subject: Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038
	(issued 65284 msec ago)
Message-ID: <20180430211740.GG95643@bhelgaas-glaptop.roam.corp.google.com>
References: <8770820b-85a0-172b-7230-3a44524e6c9f@molgen.mpg.de>
	<20180427192207.GG8199@bhelgaas-glaptop.roam.corp.google.com>
	<20180427211255.GI8199@bhelgaas-glaptop.roam.corp.google.com>
	<20180428005620.GB1675@dhcp-128-65.nay.redhat.com>
	<20180428011845.GC1675@dhcp-128-65.nay.redhat.com>
	<3ebc908fb196168bf0373875ffc5679e@codeaurora.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
User-Agent: Mutt/1.9.2 (2017-12-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 30, 2018 at 04:48:15PM -0400, Sinan Kaya wrote:
> Bjorn,
>
> On 4/28/2018 9:03 AM, okaya@codeaurora.org wrote:
> >> Hmm, if it is the remove() method then kexec does not use it.  kexec use
> >> the shutdown() method instead.  I missed this details when I replied.
> >
> > Portdrv hooks up remove handler to shutdown. That's why remove is getting called.
>
> What should we do about this?
>
> Since there is an actual HW errata involved, should we quirk this
> root port and not wait as if remove/shutdown doesn't exist?

I was hoping to avoid a quirk because AFAIK all Intel parts have this
issue so it will be an ongoing maintenance issue.  I tried to avoid
the timeout delays, e.g., with 40b960831cfa ("PCI: pciehp: Compute
timeout from hotplug command start time").  But we still see the
alarming messages, so we should probably add a quirk to get rid of
those.

But I haven't given up on the idea of getting rid of the
pciehp_remove() path.  I'm not convinced yet that we actually need to
do anything to shut this device down.  I don't like the assumption
that kexec requires this.  The kexec is fundamentally just a branch,
and anything we do before the branch (i.e., in the old kernel), we
should also be able to do after the branch (i.e., in the kexec-ed
kernel).

> Paul,
> You might want to file a bugzilla so that we can keep our debug
> efforts out of this list.
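
For readers following along, here is a rough userspace sketch of the idea named in the subject of 40b960831cfa: measure the wait budget from the time the command was issued, not from the time we start waiting. This is not the pciehp code. The function name, parameters, and millisecond units are hypothetical, and it ignores counter wraparound, which the kernel's jiffies helpers handle.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the kernel): how much longer to
 * wait for a command issued at cmd_started_ms, given a total budget of
 * duration_ms counted from the moment of issue. All values are
 * milliseconds on the same monotonic clock; wraparound is ignored.
 */
static uint64_t remaining_wait_ms(uint64_t cmd_started_ms,
                                  uint64_t duration_ms,
                                  uint64_t now_ms)
{
    uint64_t deadline_ms = cmd_started_ms + duration_ms;

    if (now_ms >= deadline_ms)
        return 0;   /* budget already spent: do not sleep again */
    return deadline_ms - now_ms;
}
```

With a command issued 65284 msec ago, as in the subject line, and an assumed 1000 msec budget, the remaining wait is zero, so the caller would not sit through another full delay. The timeout message can still be printed in that case, which is the part a quirk would address.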