From: David Woodhouse <dwmw2@infradead.org>
To: "Li, Zhen-Hua" <zhen-hual@hp.com>
Cc: indou.takao@jp.fujitsu.com, bhe@redhat.com, joro@8bytes.org,
	vgoyal@redhat.com, dyoung@redhat.com, tom.vaden@hp.com,
	rwright@hp.com, linux-pci@vger.kernel.org,
	kexec@lists.infradead.org, iommu@lists.linux-foundation.org,
	lisa.mitchell@hp.com, linux-kernel@vger.kernel.org,
	doug.hatch@hp.com, ishii.hironobu@jp.fujitsu.com,
	bhelgaas@google.com, billsumnerlinux@gmail.com, li.zhang6@hp.com
Subject: Re: [PATCH v10 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
Date: Thu, 11 Jun 2015 16:40:12 +0100
Message-ID: <1434037212.3907.82.camel@infradead.org>
In-Reply-To: <1428655333-19504-1-git-send-email-zhen-hual@hp.com>

On Fri, 2015-04-10 at 16:42 +0800, Li, Zhen-Hua wrote:
> This patchset is an update of Bill Sumner's patchset, implements a fix for:
> If a kernel boots with intel_iommu=on on a system that supports intel vt-d, 
> when a panic happens, the kdump kernel will boot with these faults:

But, in the general case, it *does* boot.

There are two cases where it doesn't actually boot, and those are the
interesting ones.

Firstly, a device just keeps generating faults and we die in an
interrupt storm, reporting the same fault over and over again. That can
actually happen without kdump/kexec and the correct fix for that is to
have rate-limiting, disable fault reporting for the offending device
after too many are seen, and then eventually to tie it in to the PCIe
error handling as has been discussed elsewhere.
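
As a rough illustration of the rate-limiting idea above, here is a
minimal userspace sketch in C. It is not kernel code and not taken from
any patch in this thread: the thresholds, the table size and every name
in it (fault_table, fault_should_report, and so on) are assumptions
made up for the example. Faults are counted per PCI requester ID; only
a small burst per time window is reported, and a device that exceeds a
total budget is muted for good.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limits; not taken from the kernel or from this thread. */
#define FAULT_WINDOW_TICKS 1000u /* length of one counting window */
#define FAULT_BURST        10u   /* faults reported per window */
#define FAULT_MAX_TOTAL    100u  /* beyond this, mute the device */
#define MAX_TRACKED        64u   /* devices tracked at once */

struct fault_state {
	uint16_t source_id;    /* PCI requester ID: bus << 8 | devfn */
	bool     in_use;
	bool     muted;        /* reporting disabled for this device */
	uint32_t total;        /* faults seen since tracking began */
	uint32_t in_window;    /* faults seen in the current window */
	uint64_t window_start; /* tick at which the window began */
};

struct fault_table {
	struct fault_state slot[MAX_TRACKED];
};

static void fault_table_init(struct fault_table *t)
{
	assert(t != NULL);
	memset(t, 0, sizeof(*t));
}

/* Find the slot for a requester ID, claiming a free one if needed. */
static struct fault_state *fault_lookup(struct fault_table *t, uint16_t sid)
{
	struct fault_state *free_slot = NULL;

	for (unsigned int i = 0; i < MAX_TRACKED; i++) {
		struct fault_state *s = &t->slot[i];

		if (s->in_use && s->source_id == sid)
			return s;
		if (!s->in_use && !free_slot)
			free_slot = s;
	}
	if (free_slot) {
		free_slot->in_use = true;
		free_slot->source_id = sid;
	}
	return free_slot;
}

/* Returns true if a fault from this device should be reported. */
static bool fault_should_report(struct fault_table *t, uint16_t sid,
				uint64_t now)
{
	struct fault_state *s = fault_lookup(t, sid);

	if (!s)
		return false; /* table full: stay quiet rather than storm */
	if (s->muted)
		return false;

	/* Start a fresh window once the old one has expired. */
	if (now - s->window_start >= FAULT_WINDOW_TICKS) {
		s->window_start = now;
		s->in_window = 0;
	}

	s->total++;
	s->in_window++;

	if (s->total > FAULT_MAX_TOTAL) {
		s->muted = true; /* too many in total: give up on it */
		return false;
	}
	return s->in_window <= FAULT_BURST;
}
```

A fault handler would call fault_should_report() with the requester ID
of the faulting device and the current time, and log only when it
returns true. How a muted device is actually silenced in hardware, and
how that ties in to PCIe error handling, is deliberately left out.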

Secondly, there are devices which do not correctly respond to a
hardware reset. This is broken hardware, and if we really have to copy
the old contexts from the crashed kernel to work around it then I'd
like it to be on a blacklist basis — we do it only for hardware which
is *known* to be broken in this way.
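
To show what "on a blacklist basis" could look like, here is a small
sketch in C. Everything in it is hypothetical: the table, the function
names and above all the vendor/device IDs, which are placeholders and
do not refer to any real hardware. The point is only that the context
copy would be keyed off an explicit list of known-broken devices
instead of being applied to everything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pci_id {
	uint16_t vendor;
	uint16_t device;
};

/*
 * Placeholder entries only. No real device is claimed to belong here;
 * a real list would have to come from the hardware vendor.
 */
static const struct pci_id reset_broken[] = {
	{ 0xfff0, 0x0001 },
	{ 0xfff0, 0x0002 },
};

/* Linear search of an ID list; fine for a short quirk table. */
static bool id_in_list(const struct pci_id *list, size_t n,
		       uint16_t vendor, uint16_t device)
{
	assert(list != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (list[i].vendor == vendor && list[i].device == device)
			return true;
	}
	return false;
}

/* True only for devices explicitly listed as failing to reset. */
static bool needs_context_copy(uint16_t vendor, uint16_t device)
{
	return id_in_list(reset_broken,
			  sizeof(reset_broken) / sizeof(reset_broken[0]),
			  vendor, device);
}
```

With something of this shape, a device that is not on the list gets a
clean context in the kdump kernel, so a reset problem on new hardware
shows up as a fault instead of being hidden.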

(There are also some cases where the device driver doesn't even *try* to
reset the hardware and just assumes it'll find it in a sane state, as
the BIOS or a cleanly shut down kexec would have left it. In those
cases, of course, we can just fix the driver.)

I don't much like the idea of doing this context copy for *all*
hardware. That's masking hardware issues with reset that we really
*ought* to be finding.

I believe that most of the offending hardware is HP's; they like to do
the most, erm, "interesting" things with odd hardware and RMRRs and
stuff. So, Zhen-Hua, would you be able to provide the list of broken
devices that HP has shipped, for the purpose of such a blacklist?

I assume you've already contacted the hardware folks responsible and
insisted that their devices are fixed to be resettable already, right?

-- 
dwmw2
