From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
To: Bjorn Helgaas
Cc: Paul Menzel, Dave Young, linux-pci@vger.kernel.org,
 kexec@lists.infradead.org, linux-kernel@vger.kernel.org, Lukas Wunner,
 Eric Biederman, Bjorn Helgaas, Vivek Goyal, Marc Zyngier
References: <8770820b-85a0-172b-7230-3a44524e6c9f@molgen.mpg.de>
 <20180427192207.GG8199@bhelgaas-glaptop.roam.corp.google.com>
 <20180427211255.GI8199@bhelgaas-glaptop.roam.corp.google.com>
 <20180428005620.GB1675@dhcp-128-65.nay.redhat.com>
 <20180428011845.GC1675@dhcp-128-65.nay.redhat.com>
 <3ebc908fb196168bf0373875ffc5679e@codeaurora.org>
 <20180430211740.GG95643@bhelgaas-glaptop.roam.corp.google.com>
 <7285da70-2c3e-c3b7-62e1-fdbb55a77729@codeaurora.org>
From: Sinan Kaya
Message-ID: <3549ffe8-7605-d72c-5c09-1436a4288c7d@codeaurora.org>
Date: Tue, 1 May 2018 08:38:47 -0400
In-Reply-To: <7285da70-2c3e-c3b7-62e1-fdbb55a77729@codeaurora.org>
X-Mailing-List: linux-kernel@vger.kernel.org

+Marc,

On 4/30/2018 5:27 PM, Sinan Kaya wrote:
> On 4/30/2018 5:17 PM, Bjorn Helgaas wrote:
>>> What should we do about this?
>>>
>>> Since there is an actual HW errata involved, should we quirk this
>>> root port and not wait as if remove/shutdown doesn't exist?
>> I was hoping to avoid a quirk because AFAIK all Intel parts have this
>> issue so it will be an ongoing maintenance issue. I tried to avoid
>> the timeout delays, e.g., with 40b960831cfa ("PCI: pciehp: Compute
>> timeout from hotplug command start time").
>>
>> But we still see the alarming messages, so we should probably add a
>> quirk to get rid of those.
>>
>> But I haven't given up on the idea of getting rid of the
>> pciehp_remove() path. I'm not convinced yet that we actually need to
>> do anything to shut this device down. I don't like the assumption
>> that kexec requires this. The kexec is fundamentally just a branch,
>> and anything we do before the branch (i.e., in the old kernel), we
>> should also be able to do after the branch (i.e., in the kexec-ed
>> kernel).
>>
>
> In my experience with kexec, MSI type edge interrupts are harmless.
> You might just see a few unhandled interrupt messages during boot
> if something is pending from the first kernel.
>
> It is the level interrupts that are more concerning. It remains pending
> until the interrupt source is cleared. CPU never returns from the
> interrupt handler to actually continue booting the second kernel.

This makes me wonder why kexec doesn't disable all interrupt sources
by itself instead of relying on the drivers' shutdown routines. Some
drivers don't even have a shutdown callback.

As another option, kexec could do both. Something like:

1. Call shutdown for all drivers if available.
2. Disable all interrupt sources in the interrupt controller.
3. Start the new kernel.

>
> Execution doesn't reach to PCIe hp driver initialization for
> acknowledging the interrupt.
>
> How about remove() only if MSI is disabled? Most root port interrupts
> are MSI based anyhow.
>

--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.