From: Takao Indoh <indou.takao@jp.fujitsu.com>
To: ddutile@redhat.com
Cc: trenn@suse.de, yinghai@kernel.org,
	muneda.takahiro@jp.fujitsu.com, linux-pci@vger.kernel.org,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	andi@firstfloor.org, tokunaga.keiich@jp.fujitsu.com,
	kexec@lists.infradead.org, hbabu@us.ibm.com, mingo@redhat.com,
	vgoyal@redhat.com, ishii.hironobu@jp.fujitsu.com, hpa@zytor.com,
	bhelgaas@google.com, tglx@linutronix.de, khalid@gonehiking.org
Subject: Re: [PATCH v7 0/5] Reset PCIe devices to address DMA problem on kdump with iommu
Date: Tue, 05 Mar 2013 09:56:47 +0900
Message-ID: <513542CF.1010602@jp.fujitsu.com>
In-Reply-To: <5135196C.3020104@redhat.com>

(2013/03/05 7:00), Don Dutile wrote:
> On 03/03/2013 07:56 PM, Takao Indoh wrote:
>> (2013/01/23 9:47), Thomas Renninger wrote:
>>> On Monday, January 21, 2013 10:11:04 AM Takao Indoh wrote:
>>>> (2013/01/08 4:09), Thomas Renninger wrote:
>>> ...
>>>>> I tried the provided patches first on 2.6.32, then I verified with 3.8-rc2
>>>>> and in both cases the disk is not detected anymore in
>>>>> reset_devices (kexec'ed/kdump) case (but things work fine without these
>>>>> patches).
>>>>
>>>> So the problem that the disk is not detected was caused by exactmap
>>>> problem you guys are discussing? Or still not detected even if exactmap
>>>> problem is fixed?
>>> This problem is related to the 5 PCI resetting patches.
>>> Dumping worked with a 2.6.32 and a 3.8-rc2 kernel, adding the PCI resetting
>>> patches broke both. I first tried 2.6.32 and verified with 3.8-rc2 to make sure
>>> I didn't mess up the backport adjustings of the patches to 2.6.32.
>>>
>>> Unfortunately this Dell platform takes really long to boot.
>>> I can give it the one or other test, but please do not bomb me with patches.
>>>
>>> For info:
>>> About the interrupt remapping error interrupt storm in kdump case I tried to
>>> reproduce on this machine, but never could: The guys who saw that also cannot
>>> reproduce this anymore.
>>>
>>> Two ideas I had about this:
>>>     - As said already, (also) try to catch the error case and try to reset the
>>>       device when an AER/specific interrupt remapping error interrupt is caught.
>>
>> I tried this idea but it did not work on megaraid_sas.
>>
>> I made an experimental patch so that a device is reset when a DMAR error is
>> detected on it. What happened is this:
>> 1) megaraid_sas module is loaded.
>> 2) DMAR error is detected during the driver initialization.
> This driver does something bad that IOMMU code isn't designed for,
> or handle correctly -- it starts with one dma-mask, does an IOMMU mapping,
> changes its dma-mask, and that moves it into another domain that's not
> valid for the first mask.... and does occasional access with original mask.
> I have it on my to-do list to dig into the driver more to see if that
> sequence can be changed/fixed.
>
>> 3) Reset device
>> 4) kdump fails because the disk is not found.
>>
>> When I tested patches which reset all devices in early boot time, the
>> disk was recognized correctly, so it seems that device reset during its
>> driver loading does something wrong. I think we need to reset the device at
> driver reset, or master-enable turned off ?

I have another patch that turns off the busmaster bit in an early quirk, but
a DMAR error is still detected after the driver loads, as follows. This may
be a driver problem, as you said above.

Loading mptscsih.koigb: Intel(R) Gigabit Ethernet Network Driver - version 4.1.2-k
  module
Loadingigb: Copyright (c) 2007-2012 Intel Corporation.
  scsi_transport_dmar: DRHD: handling fault status reg 102
dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr ffe16000
DMAR:[fault reason 01] Present bit in root entry is clear
Uhhuh. NMI received for unknown reason 2c on CPU 0.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
fc.ko module
Loigb 0000:01:00.0: irq 76 for MSI/MSI-X
ading dm-log.ko igb 0000:01:00.0: irq 77 for MSI/MSI-X
module
Loading igb 0000:01:00.0: irq 78 for MSI/MSI-X
nf_conntrack_ipv6.ko module
Loading vhost_net.ko module
Loading igb.ko module
igb 0000:01:00.0: DCA enabled
igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:01:00.0: eth0: (PCIe:2.5Gb/s:Width x2) c8:0a:a9:9d:fa:52
igb 0000:01:00.0: eth0: PBA No: 323131-030
igb 0000:01:00.0: Using MSI-X interrupts. 1 rx queue(s), 1 tx queue(s)
dmar: DRHD: handling fault status reg 202
dmar: DMAR:[DMA Write] Request device [01:00.1] fault addr ffee9000
DMAR:[fault reason 01] Present bit in root entry is clear
igb 0000:01:00.1: irq 79 for MSI/MSI-X
igb 0000:01:00.1: irq 80 for MSI/MSI-X
igb 0000:01:00.1: irq 81 for MSI/MSI-X
(snip)
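
For readers following the log above, here is a minimal sketch of how its
fields can be decoded. It is illustrative only and is not part of any patch
in this thread. It assumes that the value after "fault status reg" is printed
in hexadecimal, that bit 0 and bit 1 of the VT-d fault status register are
the overflow and pending-fault flags with the fault record index in bits
15:8, and that the busmaster bit is bit 2 (0x4) of the PCI command register.
The function names are hypothetical.

```python
import re

# Assumed bit layout of the VT-d Fault Status Register.
FSTS_PFO = 0x1  # Primary Fault Overflow
FSTS_PPF = 0x2  # Primary Pending Fault

# Bus master enable bit in the PCI command register (config offset 0x04).
PCI_COMMAND_MASTER = 0x4

# Matches e.g. "Request device [01:00.0] fault addr ffe16000".
_FAULT_RE = re.compile(
    r"Request device \[([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])\] "
    r"fault addr ([0-9a-f]+)"
)


def decode_fault_status(reg):
    """Split a fault status register value into its flags and record index."""
    return {
        "overflow": bool(reg & FSTS_PFO),
        "pending": bool(reg & FSTS_PPF),
        "record_index": (reg >> 8) & 0xFF,
    }


def parse_fault_line(line):
    """Return (bus, device, function, fault_addr) from a DMAR fault line,
    or None if the line does not contain a fault record."""
    m = _FAULT_RE.search(line)
    if m is None:
        return None
    bus, dev, fn, addr = (int(g, 16) for g in m.groups())
    return bus, dev, fn, addr


def clear_bus_master(command):
    """Return a 16-bit PCI command value with the bus master bit cleared."""
    return command & ~PCI_COMMAND_MASTER & 0xFFFF


if __name__ == "__main__":
    print(decode_fault_status(0x102))
    print(parse_fault_line(
        "dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr ffe16000"))
    print(hex(clear_bus_master(0x0007)))
```

Under these assumptions, "reg 102" and "reg 202" both report a pending
primary fault, in fault records 1 and 2, for functions 01:00.0 and 01:00.1
respectively.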

Thanks,
Takao Indoh

  
>> least before its driver is loaded.
>>
>> Thanks,
>> Takao Indoh
>>
>>
>>>     - Have a look at coreboot, these guys should know how to initialize the PCI
>>>       subsystem from scratch and might have some well tested PCI resetting
>>>       code in place already (no idea, just a thought).
>>>
>>>       Thomas
>>>
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>


