From: Takao Indoh <indou.takao@jp.fujitsu.com>
To: trenn@suse.de
Cc: yinghai@kernel.org, muneda.takahiro@jp.fujitsu.com, linux-pci@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, andi@firstfloor.org, tokunaga.keiich@jp.fujitsu.com, kexec@lists.infradead.org, hbabu@us.ibm.com, mingo@redhat.com, ddutile@redhat.com, vgoyal@redhat.com, ishii.hironobu@jp.fujitsu.com, hpa@zytor.com, bhelgaas@google.com, tglx@linutronix.de, khalid@gonehiking.org
Subject: Re: [PATCH v7 0/5] Reset PCIe devices to address DMA problem on kdump with iommu
Date: Wed, 09 Jan 2013 13:39:03 +0900
Message-ID: <50ECF467.2040008@jp.fujitsu.com>
In-Reply-To: <3564889.4S6qWWRR6X@hammer82.arch.suse.de>

Hi Thomas,

(2013/01/09 11:32), Thomas Renninger wrote:
> On Tuesday, January 08, 2013 09:27:55 AM Yinghai Lu wrote:
>> On Tue, Jan 8, 2013 at 8:50 AM, Thomas Renninger <trenn@suse.de> wrote:
>>> megaraid_sas
>>
>> can you check if your initrd for kdump kernel has that driver and
>> module that it depends on like
>> scsi sas transport etc ?
>
> Removing the 5 patches and the disk works and the
> dump is written.
>
> I can look a bit further at the memmap=exactmap issue tomorrow.
> I can also double check above then, but I am rather sure about it
> already:
> I tried plain vanilla -> worked, dumping started

It seems that there are several disk controllers in your system.

00:1f.2 SATA controller [0106]: Intel Corporation Device [8086:1d02] (rev 05)
02:00.0 RAID bus controller [0104]: LSI Logic / Symbios Logic Device [1000:005b] (rev 01)
05:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] [1000:0064] (rev 02)

Which disk are you using to save the vmcore?

> I tried with only these 5 patches added -> no disk.
>
>
> Some questions:
>
> You try to initialize the PCI subsystem in a way the BIOS typically has
> to do it in kexec case?
These patches send a hot reset to the endpoints to reset them, which may
differ from the way the BIOS initializes them.

> Reacting and trying to handle error condtitions more gracefully
> at the place where they are caught could be another approach which
> imo makes sense to implement in parallel.
>
> In my case for example I see:
> "Present field in the IRTE entry is clear"
> DMAR errors. I expect this comes from a device which still throws
> interrupts, but irq vector got not set-up or registered in the kexec'ed
> kernel.
>
> I could imagine this is the same error which happens when an irq is
> wrongly configured and spurious interrupts happen (but in irq remapped case).
> In my case it's not sever as I only see this message once, but according
> to another report, they see about 80 of such DMAR error messages per
> second. This seem to result in endless DMAR error interrupts and finally
> a dead system.
>
> I wonder whether the DMAR error handler could already invoke a PCIe
> reset.
> I found:
> int pci_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state state)
> which unfortunatly is only implemented for PPC, but would it make sense to
> implement this one and trigger function level reset if several specific DMAR
> errors are seen (or other PCI(e) error handlers get active?)?

Alternatively, the AER framework may be able to handle this; it already
has a function to reset an endpoint when an error is detected.

Thanks,
Takao Indoh

>
> If this does not help the next step could be to stop DMAR error interrupt
> handling or other iommu commands to keep the machine alive, even if one
> device keeps firing interrupts to an unconfigured irq vector (or whatever other
> things could happen).
>
> Just some ideas...
> Comments appreciated.
>
> Thomas
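The hot reset mentioned in the reply is conventionally triggered from software by pulsing the Secondary Bus Reset bit in the upstream bridge's Bridge Control register. Every device below that bridge then sees a hot reset. The following is a minimal sketch under stated assumptions, not the code from this patch series. `struct cfg_ops` and the `toy_*` helpers are hypothetical stand-ins for real config-space accessors, and the hold/settle times are caller-supplied parameters. The register offset and bit value mirror the definitions in Linux's `pci_regs.h`.

```c
#include <stdint.h>

/* Values mirror PCI_BRIDGE_CONTROL / PCI_BRIDGE_CTL_BUS_RESET in Linux's pci_regs.h. */
#define PCI_BRIDGE_CONTROL       0x3e   /* 16-bit Bridge Control register (type-1 header) */
#define PCI_BRIDGE_CTL_BUS_RESET 0x40   /* Secondary Bus Reset bit */

/* Hypothetical config-space accessors for one bridge (assumption, not a kernel API). */
struct cfg_ops {
    uint16_t (*read16)(void *ctx, unsigned int off);
    void (*write16)(void *ctx, unsigned int off, uint16_t val);
    void (*delay_ms)(void *ctx, unsigned int ms);
    void *ctx;
};

/*
 * Pulse Secondary Bus Reset so that every device below the bridge gets a
 * hot reset. Other Bridge Control bits are preserved. hold_ms is how long
 * reset stays asserted; settle_ms is the wait before the devices are used.
 */
static void secondary_bus_reset(const struct cfg_ops *ops,
                                unsigned int hold_ms, unsigned int settle_ms)
{
    uint16_t ctrl = ops->read16(ops->ctx, PCI_BRIDGE_CONTROL);

    ops->write16(ops->ctx, PCI_BRIDGE_CONTROL,
                 (uint16_t)(ctrl | PCI_BRIDGE_CTL_BUS_RESET));   /* assert */
    ops->delay_ms(ops->ctx, hold_ms);
    ops->write16(ops->ctx, PCI_BRIDGE_CONTROL,
                 (uint16_t)(ctrl & ~PCI_BRIDGE_CTL_BUS_RESET));  /* deassert */
    ops->delay_ms(ops->ctx, settle_ms);
}

/* Toy in-memory bridge so the sequence can be exercised without hardware. */
struct toy_bridge {
    uint8_t cfg[256];        /* fake config header */
    uint16_t writes[4];      /* Bridge Control values written, in order */
    int nwrites;
    unsigned int slept_ms;   /* total simulated delay */
};

static uint16_t toy_read16(void *ctx, unsigned int off)
{
    struct toy_bridge *b = ctx;
    return (uint16_t)(b->cfg[off] | (b->cfg[off + 1] << 8));   /* little-endian */
}

static void toy_write16(void *ctx, unsigned int off, uint16_t val)
{
    struct toy_bridge *b = ctx;
    b->cfg[off] = (uint8_t)(val & 0xff);
    b->cfg[off + 1] = (uint8_t)(val >> 8);
    if (off == PCI_BRIDGE_CONTROL && b->nwrites < 4)
        b->writes[b->nwrites++] = val;
}

static void toy_delay(void *ctx, unsigned int ms)
{
    struct toy_bridge *b = ctx;
    b->slept_ms += ms;
}
```

With pre-existing control bits 0x03, the sequence writes 0x43 and then 0x03, so only the reset bit toggles. A real implementation would also need to save and restore the config space of the devices below the bridge, because a hot reset returns their registers to defaults. It must also respect the specification's reset timing, for example the 1 ms minimum assertion time (Trst) for conventional PCI.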