Re: arm64: Getting continuous PCIe "CmpltTO" AER from network card in kdump kernel

From: Prabhakar Kushwaha <prabhakar.pkin@gmail.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Ganapatrao Prabhakerrao Kulkarni <gkulkarni@marvell.com>,
	Will Deacon <will@kernel.org>,
	will.deacon@arm.com, Bhupesh Sharma <bhsharma@redhat.com>,
	kexec mailing list <kexec@lists.infradead.org>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: arm64: Getting continuous PCIe "CmpltTO" AER from network card in kdump kernel
Date: Thu, 26 Mar 2020 22:25:38 +0530	[thread overview]
Message-ID: <CAJ2QiJJ8nB=2df2++1KahTD8Cqe7f5JY1tp=tZ07QomvgFwjnA@mail.gmail.com> (raw)
In-Reply-To: <81616336e3b8c5f083b26125bf47575c@kernel.org>

On Thu, Mar 26, 2020 at 9:43 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On 2020-03-26 15:35, Prabhakar Kushwaha wrote:
> > On Thu, Mar 26, 2020 at 7:49 PM Robin Murphy <robin.murphy@arm.com>
> > wrote:
> >>
> >> On 2020-03-26 1:36 pm, Prabhakar Kushwaha wrote:
> >> > On Mon, Mar 23, 2020 at 10:28 PM Robin Murphy <robin.murphy@arm.com> wrote:
> >> >>
> >> >> On 2020-03-23 3:21 pm, Prabhakar Kushwaha wrote:
> >> >>> Hi All,
> >> >>>
> >> >>> I am facing issue on Marvell's ARM64 Thunder X2 with kdump kernel.
> >> >>> Here network card is continuously giving following AER error
> >> >>> [  100.839168] igb 0000:09:00.1: AER: aer_status: 0x00004000,
> >> >>> aer_mask: 0x00000000
> >> >>> [  100.846463] igb 0000:09:00.1: AER:    [14] CmpltTO                (First)
> >> >>> [  100.861491] igb 0000:09:00.1: AER: aer_layer=Transaction Layer,
> >> >>> aer_agent=Requester ID
> >> >>> [  100.869400] igb 0000:09:00.1: AER: aer_uncor_severity: 0x00062011
> >> >>>
> >> >>> This error is not 100% reproducible. It happens 1 out of 4 try.
> >> >>>
> >> >>> This error goes away in following two scenarios
> >> >>> A) Set iommu in bypass mode via bootargs iommu.passthrough=1
> >> >>> B) Wait for ~100ms in arm_smmu_device_reset of  drivers/iommu/arm-smmu-v3.c
> >> >>>           if (reg & CR0_SMMUEN) {
> >> >>>                   dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
> >> >>>                   WARN_ON(is_kdump_kernel() && !disable_bypass);
> >> >>>                   mdelay(100);  <-- Added delay
> >> >>>                   arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
> >> >>>           }
> >> >>>
> >> >>>   From A), it is clear that it is related to IOMMU
> >> >>>   From B), looks like during boot of kdump kernel, network card is still
> >> >>> active and it has sent some request over PCIe.
> >> >>> as GPBA_ABORT bit is set, no response/completion coming to PCIe
> >> >>> controller hence "CmpltTO" error.
> >> >>>
> >> >>> Ideally before setting GPBA_ABORT bit, there should be some check for
> >> >>> active transaction. if it is not possible, a wait should be done to
> >> >>> assure that no more pending transaction left.
> >> >>
> >> >> In general there is no way to check for active transactions, and even if
> >> >> there were, waiting for them to finish could mean waiting forever (if,
> >> >> say, a device is continuously streaming to/from a ring buffer).
> >> >>
> >> >>> why any such delay has not been considered?
> >> >>
> >> >> The main aim here is to block any DMA left over from the crashed kernel
> >> >> as quickly as possible, to minimise any further potential corruption of
> >> >> memory (consider if a device was left writing to an IOMMU virtual
> >> >> address that happened to have the same value as some physical address in
> >> >> the crash kernel's reserved memory). The fact that an arbitrary delay
> >> >> happens to give a 'nicer' result in one particular situation on one
> >> >> particular platform is neither here nor there in general.
> >> >>
> >> >
> >> > I agree.
> >> > But we are depending upon kdump boot time and expecting devices to
> >> > reach to idle state before setting GBPA_ABORT bit.
> >>
> >> So (ideally) stop depending on that, because like I said it's fragile
> >> and doesn't generalise.
> >>
> >> > adding a delay will be fair and make it independent of kdump boot time.
> >>
> >> And what delay value is "fair" and appropriate for any device on any
> >> system in any circumstance?
> >>
> >
> >  it is tough question.  1sec can be thought of.
> >
> >> >> Besides, this is *crash* kernel, so yeah, expect errors - something's
> >> >> already gone badly wrong to get us here, and everything from then on is
> >> >> merely a best-effort attempt to salvage what we can. Does it even make
> >> >> sense to have AER enabled at this point?
> >> >>
> >> >
> >> > i tried by disabling AER in kdump kernel. but it did not helped as
> >> > network device become out of sync with respect to tx unit causing it
> >> > to be hanged and it never recovered from there.  Same can happen with
> >> > other devices like SATA etc
> >>
> >> Any devices that the kdump kernel wants to use need to be fully reset
> >> to
> >> get them into a sane state anyway, don't they? I mean, what if the
> >> crash
> >> was *caused* by once of those devices going wrong in the first place?
> >> Any devices that kdump *doesn't* care about shouldn't matter, since
> >> nothing should be unmasking their interrupts regardless of what state
> >> they're in.
> >>
> >> Assume some descriptor or pagetable entry got corrupted that caused
> >> your
> >> network device to access an invalid physical address downstream of the
> >> SMMU and get an abort from that *before* the kdump kernel starts - is
> >> waiting an extra 100ms at any point after that going to help?
> >>
> > I agree with you. in above scenaro, where device if faulty or done
> > something wrong, waiting even hours is waste.
> >
> > I was just going through other iommu drivers as this problem is
> > generic one and i found following patch
> >
> > commit 091d42e43d21b6ca7ec39bf5f9e17bc0bd8d4312
> > Author: Joerg Roedel <jroedel@suse.de>
> > Date:   Fri Jun 12 11:56:10 2015 +0200
> >     iommu/vt-d: Copy translation tables from old kernel
> >
> >     If we are in a kdump kernel and find translation enabled in
> >     the iommu, try to copy the translation tables from the old
> >     kernel to preserve the mappings until the device driver
> >     takes over.
> >
> >     This supports old and the extended root-entry and
> >     context-table formats.
> >
> >     Tested-by: ZhenHua Li <zhen-hual@hp.com>
> >     Tested-by: Baoquan He <bhe@redhat.com>
> >     Signed-off-by: Joerg Roedel <jroedel@suse.de>
> >
> > I believe, similar kind of solution is required for SMMU also.
>
> There's a much more general problem: how to preserve pre-boot DMA
> configurations because they are important to the new kernel (for
> whatever reason).
>
> And in a number of cases, it makes perfect sense: framebuffer
> scanning out boot animations, ongoing DMA for other *cough* agents
> *cough* in the system...
>
> But I really don't like the idea of preserving the page tables across
> a kdump kernel because for all we know, these page tables could be
> horribly corrupted and mostly only make sense in the context of
> the driver that created them.

If I am correct, similar approach is used in GIC-ITS for LPI tables.
Probability of corruption is still there.

> Oh wait, this driver has long died,
> along with the rest of the original kernel.
>

:(
if this is the case than chance of foolproof and generic solution is very less.

At least a delay should be considered before setting SMMU_ABORT bit
for giving a chance of DMA getting completed.

--pk

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel