From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
To: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jon Nettleton <jon@solid-run.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Diana Madalina Craciun <diana.craciun@nxp.com>,
	Ioana Ciornei <ioana.ciornei@nxp.com>,
	leoyang.li@nxp.com
Subject: Re: [PATCH 2/8] bus: fsl-mc: handle DMA config deferral in ACPI case
Date: Wed, 17 Nov 2021 17:30:32 +0200
Message-ID: <ef23386b-5b83-a791-e2f0-a72ec610836a@nxp.com>
In-Reply-To: <20211117135909.uf3pnhjorllnhcxp@maple.lan>



On 11/17/2021 3:59 PM, Daniel Thompson wrote:
> On Wed, Nov 17, 2021 at 03:07:51PM +0200, Laurentiu Tudor wrote:
>> On 11/12/2021 7:31 PM, Daniel Thompson wrote:
>>> On Thu, Nov 11, 2021 at 06:36:58PM +0100, Jon Nettleton wrote:
>>>> On Thu, Nov 11, 2021 at 6:23 PM Daniel Thompson
>>>> <daniel.thompson@linaro.org> wrote:
>>>>> Hi Laurentiu
>>>>>
>>>>> On Thu, Jul 15, 2021 at 05:07:12PM +0300, laurentiu.tudor@nxp.com wrote:
>>>>>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>>>>>>
>>>>>> ACPI DMA configure API may return a defer status code, so handle it.
>>>>>> On top of this, move the MC firmware resume after the DMA setup
>>>>>> is completed to avoid crashing due to DMA setup not being done yet or
>>>>>> being deferred.
>>>>>>
>>>>>> Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>>>>>
>>>>> I saw regressions on my Honeycomb LX2 (NXP LX2160A) when I switched to
>>>>> v5.15. It seems to result in so many SMMU errors that the system
>>>>> cannot function correctly (there is only about a 75% chance the system
>>>>> will boot to the GUI, and even if it does boot successfully, the system
>>>>> hangs soon after).
>>>>>
>>>>> Bisect took me up a couple of blind alleys (mostly due to unrelated boot
>>>>> problems in v5.14-rc2) but eventually led me to this patch as the cause.
>>>>> Applying/unapplying this patch to a v5.14-rc3 tree will provoke/fix the
>>>>> problem and reverting it against v5.15 also resolves the problem.
>>>>>
>>>>> Is there some specific firmware version required for this patch to work
>>>>> correctly?
>>>>
>>>> This patch was merged as a requirement for working on-board networking.
>>>> It was a prerequisite for landing the patches that support MDIO and
>>>> PHY initialization in general.
>>>
>>> Interesting.
>>>
>>> I assumed the change of behaviour comes from properly handling
>>> -EPROBE_DEFER (which can hardly be regarded as a fault with the patch).
>>>
>>> Having said that, the patch does not seem to be mandatory to get the 1G
>>> networking working on the Honeycomb LX2 (running ACPI). After taking v5.15
>>> and reverting the patch as I shared previously, I am still able to access
>>> the network using the 1G port on the back of the unit (although I didn't
>>> do any performance tests).
>>>
>>>
>>>> The correct solution for the problem you are seeing is the ACPI
>>>> maintainers figuring out how to land the IORT RMR patchset.  Until
>>>> that is done the only workaround is setting "arm-smmu.disable_bypass=0
>>>> iommu.passthrough=1" on the kernel command line.  The latter option has
>>>> been required since 5.15, and I haven't had the time or energy to figure out
>>>> why.  The proper solution is to just land the IORT RMR patchset and
>>>> let HoneyComb run with the SMMU enabled.
>>>
>>> Thanks for the update. I'll probably adopt iommu.passthrough=1 for now.
>>> That allows me to adopt a distro kernel when it updates to v5.15.
>>
>> The "iommu.passthrough=1" kernel arg shouldn't be needed. By chance, do
>> you remember what errors you were seeing? What was failing?
> 
> For all testing of v5.15 I had "arm-smmu.disable_bypass=0" set because I
> was guided to enable that by the error messages in older kernels ;-) .
> 
> Anyhow, without "iommu.passthrough=1" (and without reverting the patch from
> this thread), the logs are massively spammed with error messages:
> 
> ~~~
> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> arm_smmu_context_fault: 1697259 callbacks suppressed
> ~~~
> 
> This results in a relatively simple workstation (LX2 + nVidia GT-710 + USB
> for networking) becoming unresponsive. How long it takes to fail is a little
> unpredictable. I assume that the sheer density of log messages eventually
> falls into a timing pattern that prevents any useful interrupts from being
> serviced... but that is only a guess.
> 
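Since the output above is rate limited ("1697259 callbacks suppressed"), the printed lines are only a sample, but grouping them by fault signature can still show whether a single master is responsible. Below is a hypothetical Python helper (not existing kernel tooling) that assumes the exact arm-smmu message format quoted above and counts lines per (cbfrsynra, iova, cb) tuple. As we understand the SMMUv2 architecture, CBFRSYNRA carries the stream ID of the faulting transaction, so one dominant value would point at one device.

```python
import re
import sys
from collections import Counter

# Assumes the arm-smmu message format quoted above, e.g.
# "arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402,
#  iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0"
FAULT_RE = re.compile(
    r"Unhandled context fault: fsr=0x[0-9a-f]+, "
    r"iova=(?P<iova>0x[0-9a-f]+), fsynr=0x[0-9a-f]+, "
    r"cbfrsynra=(?P<cbfrsynra>0x[0-9a-f]+), cb=(?P<cb>\d+)"
)


def summarize_faults(lines):
    """Count printed context faults per (cbfrsynra, iova, cb) tuple.

    Lines that are not fault reports (such as the "callbacks suppressed"
    rate-limit notice) are skipped, so the counts only cover lines that
    actually reached the log.
    """
    counts = Counter()
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            key = (int(m["cbfrsynra"], 16), int(m["iova"], 16), int(m["cb"]))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Most frequent fault signature first.
    for (sid, iova, cb), n in summarize_faults(sys.stdin).most_common():
        print(f"cbfrsynra={sid:#x} iova={iova:#x} cb={cb}: {n} line(s)")
```

For example, `dmesg | python3 smmu_faults.py` (the file name is arbitrary) prints one line per distinct fault signature.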

A few comments here:
 - I suspect that the PCI video card is triggering the SMMU faults.
Would it be possible to try with the card removed and without
"iommu.passthrough=1"?
 - the IOVAs look weird to me; they should look something like
0xffffxxxxxx. Maybe there are issues in the nvidia driver?
 - Could you share a full boot log? It would be interesting to see
how the devices are allocated to IOMMU groups.
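To collect the IOMMU group layout even without a full boot log, a small script along these lines could be run on the board. It is a hypothetical helper that relies only on the /sys/kernel/iommu_groups/<N>/devices/ sysfs layout; the root path is a parameter so it can also be pointed at a copy of that tree.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [device names]}, sorted by group number.

    Each numbered group directory has a "devices" subdirectory whose
    entries (symlinks on a live system) are named after member devices.
    Returns an empty dict if the root directory does not exist.
    """
    groups = {}
    if not os.path.isdir(root):
        return groups
    for name in os.listdir(root):
        dev_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(dev_dir):
            groups[int(name)] = sorted(os.listdir(dev_dir))
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    for group, devices in list_iommu_groups().items():
        print(f"group {group}: {', '.join(devices)}")
```

The output shows which devices share a group with, for instance, the PCI video card.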

---
Thanks & Best Regards, Laurentiu


Thread overview: 20+ messages
2021-07-15 14:07 [PATCH 1/8] bus: fsl-mc: fix arg in call to dprc_scan_objects() laurentiu.tudor
2021-07-15 14:07 ` [PATCH 2/8] bus: fsl-mc: handle DMA config deferral in ACPI case laurentiu.tudor
2021-11-11 17:23   ` Daniel Thompson
2021-11-11 17:36     ` Jon Nettleton
2021-11-12 17:31       ` Daniel Thompson
2021-11-17 13:07         ` Laurentiu Tudor
2021-11-17 13:22           ` Jon Nettleton
2021-11-17 13:59           ` Daniel Thompson
2021-11-17 15:30             ` Laurentiu Tudor [this message]
2021-11-17 17:00               ` Daniel Thompson
2021-11-18 12:41                 ` Laurentiu Tudor
2021-11-17 13:03     ` Laurentiu Tudor
2021-11-17 14:46       ` Daniel Thompson
2021-07-15 14:07 ` [PATCH 3/8] bus: fsl-mc: fully resume the firmware laurentiu.tudor
2021-07-15 14:07 ` [PATCH 4/8] bus: fsl-mc: add .shutdown() op for the bus driver laurentiu.tudor
2021-07-15 14:07 ` [PATCH 5/8] bus: fsl-mc: pause the MC firmware before IOMMU setup laurentiu.tudor
2021-07-15 14:07 ` [PATCH 6/8] bus: fsl-mc: pause the MC firmware when unloading laurentiu.tudor
2021-07-15 14:07 ` [PATCH 7/8] bus: fsl-mc: rescan devices if endpoint not found laurentiu.tudor
2021-07-15 14:07 ` [PATCH 8/8] bus: fsl-mc: fix mmio base address for child DPRCs laurentiu.tudor
2021-07-21 16:11 ` [PATCH 1/8] bus: fsl-mc: fix arg in call to dprc_scan_objects() Diana Madalina Craciun
