linux-kernel.vger.kernel.org archive mirror
From: Bjorn Helgaas <bhelgaas@google.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Takao Indoh <indou.takao@jp.fujitsu.com>,
	"Li, Zhen-Hua" <zhen-hual@hp.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linda.knippers@hp.com, jerry.hoemann@hp.com,
	lisa.mitchell@hp.com, rwright@hp.com,
	Joerg Roedel <joro@8bytes.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Tom Vaden <tom.vaden@hp.com>,
	David Woodhouse <dwmw2@infradead.org>,
	"open list:INTEL IOMMU (VT-d)" <iommu@lists.linux-foundation.org>
Subject: Re: [PATCH 1/1] pci: fix dmar fault for kdump kernel
Date: Wed, 22 Oct 2014 11:24:23 -0600
Message-ID: <CAErSpo4JhOyT=HX95uMeghAtnQC1UJQ+NkSbAoRk_2zFrh3vVA@mail.gmail.com>
In-Reply-To: <5447E12F.2030905@gmail.com>

On Wed, Oct 22, 2014 at 10:54 AM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On 10/21/2014 07:47 PM, Bjorn Helgaas wrote:
>> [+cc Joerg, Eric, Tom, David, iommu list]
>>
>> On Wed, Oct 15, 2014 at 2:14 AM, Takao Indoh <indou.takao@jp.fujitsu.com> wrote:
>>> (2014/10/14 18:34), Li, ZhenHua wrote:
>>>> I tested on the latest stable version 3.17, it works well.
>>>>
>>>> On 10/10/2014 03:13 PM, Li, Zhen-Hua wrote:

>>>>> To fix this DMAR fault, we need to reset the bus that this device is on.
>>>>> Resetting the device itself does not work.
>> You have not explained why the DMAR faults are a problem.  The fault
>> is just an indication that the IOMMU prevented a DMA from completing.
>> If the DMA is an artifact of the crashed kernel, we probably don't
>> *want* it to complete, so taking a DMAR fault seems like exactly the
>> right thing.
>>
>> If the problem is that we're being flooded with messages, it's easy
>> enough to just tone down the printks.
>
> As I recall what we have seen in the past with the network controllers
> is that they get stuck in a state where they can no longer perform any
> DMA due to the fact that some of the transactions have returned errors
> from the IOMMU being reset.  The only way out is to perform a PCIe reset
> on the part after the IOMMU has been enabled which doesn't occur
> automatically unless AER or EEH is enabled in the system.

OK, now we're talking about a real issue, the sort of thing that
should be in the changelog for a change like this.

I'm uneasy about the strategy of "it hurts when an IOMMU fault occurs,
therefore we need to avoid all IOMMU faults."  Isn't the whole *point*
of an IOMMU to generate faults?  It seems like we need to be able to
handle faults gracefully.

If having AER or EEH enabled in the kdump kernel is part of what's
required to recover, I don't see a problem with requiring that.

Don't we have to be able to recover from IOMMU faults for the device
pass-through case anyway?  If a NIC is passed through to a malicious
guest, I assume the guest can cause IOMMU faults.  I assume we handle
this today by resetting the NIC when the guest exits.

> One thought would be to take a look at the IOMMU reset code.  Is there
> any way to go through and make sure that all of the PCI devices that
> make use of the IOMMU have the bus mastering disabled prior to the IOMMU
> being reset?  For example could we suspend all of the parts in order to
> force them to hold off any transactions, and then resume them after the
> IOMMU has been reset?  If we could do at least that much that would
> prevent the errors and should allow for a graceful reset.
>
> Thanks,
>
> Alex
>
>
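
Alex's suggestion above (turn off bus mastering on every device before the
IOMMU is reset) can be illustrated with a small user-space sketch. This is
not kernel code and not anything proposed in the thread: struct fake_pci_dev,
cfg_read16(), cfg_write16(), and quiesce_bus_masters() are hypothetical names
that model config space as a byte array. Only the command register offset
(0x04) and the Bus Master Enable bit (0x0004) are standard PCI values. In the
kernel itself the per-device operation would presumably go through
pci_clear_master().

```c
#include <stddef.h>
#include <stdint.h>

/* Standard PCI config-space values. */
#define PCI_COMMAND        0x04    /* offset of the 16-bit command register */
#define PCI_COMMAND_MASTER 0x0004  /* Bus Master Enable bit */

/* Hypothetical stand-in for a device: the first 64 bytes of config space. */
struct fake_pci_dev {
    uint8_t config[64];
};

/* Read a little-endian 16-bit value from the modelled config space. */
uint16_t cfg_read16(const struct fake_pci_dev *dev, size_t off)
{
    return (uint16_t)(dev->config[off] | (dev->config[off + 1] << 8));
}

/* Write a little-endian 16-bit value to the modelled config space. */
void cfg_write16(struct fake_pci_dev *dev, size_t off, uint16_t val)
{
    dev->config[off] = (uint8_t)(val & 0xff);
    dev->config[off + 1] = (uint8_t)(val >> 8);
}

/*
 * Clear Bus Master Enable on every device so that none of them can start
 * new DMA. Returns the number of devices whose command register changed.
 */
size_t quiesce_bus_masters(struct fake_pci_dev *devs, size_t n)
{
    size_t changed = 0;

    for (size_t i = 0; i < n; i++) {
        uint16_t cmd = cfg_read16(&devs[i], PCI_COMMAND);

        if (cmd & PCI_COMMAND_MASTER) {
            cfg_write16(&devs[i], PCI_COMMAND,
                        (uint16_t)(cmd & ~PCI_COMMAND_MASTER));
            changed++;
        }
    }
    return changed;
}
```

A caller would run quiesce_bus_masters() over all devices behind the IOMMU,
reset the IOMMU, and only then let drivers re-enable bus mastering. Clearing
the bit only stops the device from issuing new requests; it does not by
itself recover a device that is already wedged, which is the case the bus
reset in the patch is aimed at.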
