From: Takao Indoh <indou.takao@jp.fujitsu.com>
To: trenn@suse.de
Cc: yinghai@kernel.org, muneda.takahiro@jp.fujitsu.com,
	linux-pci@vger.kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, andi@firstfloor.org,
	tokunaga.keiich@jp.fujitsu.com, kexec@lists.infradead.org,
	hbabu@us.ibm.com, mingo@redhat.com, ddutile@redhat.com,
	vgoyal@redhat.com, ishii.hironobu@jp.fujitsu.com, hpa@zytor.com,
	bhelgaas@google.com, tglx@linutronix.de, khalid@gonehiking.org
Subject: Re: [PATCH v7 0/5] Reset PCIe devices to address DMA problem on kdump with iommu
Date: Wed, 09 Jan 2013 13:39:03 +0900
Message-ID: <50ECF467.2040008@jp.fujitsu.com>
In-Reply-To: <3564889.4S6qWWRR6X@hammer82.arch.suse.de>

Hi Thomas,

(2013/01/09 11:32), Thomas Renninger wrote:
> On Tuesday, January 08, 2013 09:27:55 AM Yinghai Lu wrote:
>> On Tue, Jan 8, 2013 at 8:50 AM, Thomas Renninger <trenn@suse.de> wrote:
>>> megaraid_sas
>>
>> can you check if your initrd for kdump kernel has that driver and
>> module that it depends on like
>> scsi sas transport etc ?
>
> Removing the 5 patches and the disk works and the
> dump is written.
>
> I can look a bit further at the memmap=exactmap issue tomorrow.
> I can also double check above then, but I am rather sure about it
> already:
> I tried plain vanilla -> worked, dumping started

It seems that there are several disk controllers in your system.

00:1f.2 SATA controller [0106]: Intel Corporation Device [8086:1d02] (rev 05)
02:00.0 RAID bus controller [0104]: LSI Logic / Symbios Logic Device [1000:005b] (rev 01)
05:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] [1000:0064] (rev 02)

Which disk are you using to save the vmcore?


> I tried with only these 5 patches added -> no disk.
>
>
> Some questions:
>
> You try to initialize the PCI subsystem in a way the BIOS typically has
> to do it in kexec case?

These patches send a hot reset to the endpoints to reset them, so this
may differ from the way the BIOS initializes them.
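
For readers unfamiliar with the mechanism: a hot reset is normally
triggered by pulsing the Secondary Bus Reset bit in the Bridge Control
register of the port above the endpoint. The following is only a rough
sketch of that sequence, not the code from the patches. The register
offset and bit are the standard PCI ones; struct fake_bridge, the
cfg_read16()/cfg_write16() helpers, delay_ms() and the delay values are
assumptions made up so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI-to-PCI bridge (type 1 header) definitions. */
#define PCI_BRIDGE_CONTROL        0x3E  /* 16-bit Bridge Control register */
#define PCI_BRIDGE_CTL_BUS_RESET  0x40  /* Secondary Bus Reset bit */

/* Hypothetical stand-in for a bridge's config space. Real early-boot
 * code would go through the platform's config accessors instead. */
struct fake_bridge {
	uint8_t cfg[256];
	int reset_pulses;	/* completed assert -> deassert transitions */
};

static uint16_t cfg_read16(const struct fake_bridge *b, int off)
{
	return (uint16_t)(b->cfg[off] | (b->cfg[off + 1] << 8));
}

static void cfg_write16(struct fake_bridge *b, int off, uint16_t val)
{
	uint16_t old = cfg_read16(b, off);

	/* Count a pulse when the reset bit goes from set to clear. */
	if (off == PCI_BRIDGE_CONTROL &&
	    (old & PCI_BRIDGE_CTL_BUS_RESET) &&
	    !(val & PCI_BRIDGE_CTL_BUS_RESET))
		b->reset_pulses++;

	b->cfg[off] = (uint8_t)(val & 0xff);
	b->cfg[off + 1] = (uint8_t)(val >> 8);
}

/* Hypothetical delay hook; a no-op here so the sketch runs anywhere. */
static void delay_ms(unsigned int ms)
{
	(void)ms;
}

/* Pulse Secondary Bus Reset on the bridge above the endpoints. */
static void hot_reset_secondary_bus(struct fake_bridge *bridge)
{
	uint16_t ctl;

	assert(bridge != NULL);

	ctl = cfg_read16(bridge, PCI_BRIDGE_CONTROL);

	/* Assert reset while preserving the other control bits. */
	cfg_write16(bridge, PCI_BRIDGE_CONTROL,
		    (uint16_t)(ctl | PCI_BRIDGE_CTL_BUS_RESET));
	delay_ms(2);		/* assumed hold time */

	/* Deassert reset, then give the devices time to come back. */
	cfg_write16(bridge, PCI_BRIDGE_CONTROL,
		    (uint16_t)(ctl & ~PCI_BRIDGE_CTL_BUS_RESET));
	delay_ms(1000);		/* assumed recovery time */
}
```

Calling hot_reset_secondary_bus() on a zero-initialized struct
fake_bridge leaves the Bridge Control register as it was and records
one reset pulse. On real hardware the devices below the bridge lose
their configuration and have to be set up again afterwards.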

> Reacting and trying to handle error conditions more gracefully
> at the place where they are caught could be another approach which
> imo makes sense to implement in parallel.
>
> In my case for example I see:
> "Present field in the IRTE entry is clear"
> DMAR errors. I expect this comes from a device which still throws
> interrupts, but irq vector got not set-up or registered in the kexec'ed
> kernel.
>
> I could imagine this is the same error which happens when an irq is
> wrongly configured and spurious interrupts happen (but in irq remapped case).
> In my case it's not severe as I only see this message once, but according
> to another report, they see about 80 of such DMAR error messages per
> second. This seems to result in endless DMAR error interrupts and finally
> a dead system.
>
> I wonder whether the DMAR error handler could already invoke a PCIe
> reset.
> I found:
> int pci_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state state)
> which unfortunately is only implemented for PPC, but would it make sense to
> implement this one and trigger function level reset if several specific DMAR
> errors are seen (or other PCI(e) error handlers get active?)?

Or the AER framework may be able to handle this. It actually has a
function to reset an endpoint when an error is detected.

Thanks,
Takao Indoh

>
> If this does not help the next step could be to stop DMAR error interrupt
> handling or other iommu commands to keep the machine alive, even if one
> device keeps firing interrupts to an unconfigured irq vector (or whatever other
> things could happen).
>
> Just some ideas...
> Comments appreciated.
>
>     Thomas
>
>

