All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Oliver O'Halloran" <oohall@gmail.com>
To: Forest Crossman <cyrozap@gmail.com>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	linux-pci <linux-pci@vger.kernel.org>,
	linux-usb@vger.kernel.org
Subject: Re: ASMedia USB 3.x host controllers triggering EEH on POWER9
Date: Fri, 17 Jul 2020 20:43:27 +1000	[thread overview]
Message-ID: <CAOSf1CEiCOjTQ8XX=cC10eghaC+1xaRG-bju8VBFQStOCaU_xQ@mail.gmail.com> (raw)
In-Reply-To: <CAO3ALPwZPN1vdqHaFsuSpgj63o6Z69VUg0LngmCSnvESrO4kNg@mail.gmail.com>

On Fri, Jul 17, 2020 at 8:10 PM Forest Crossman <cyrozap@gmail.com> wrote:
>
> > From word 2 of the PEST entry the faulting DMA address is:
> > 0x0000203974c00000. That address is interesting since it looks a lot
> > like a memory address on the 2nd chip, but it doesn't have bit 59 set
> > so TVE#0 is used to validate it. Obviously that address is above 2GB
> > so we get the error.
>
> Ah, I see. Do you know if the information on the PEST registers is
> documented publicly somewhere? I tried searching for what those
> registers meant in the PHB4 spec but it basically just said, "the PEST
> registers contain PEST data," which isn't particularly helpful.

Most of it is in the IODA spec. See Sec 2.2.6 "Partitionable-Endpoint
State Table", the only part that isn't documented there is the top
bits of each word which are documented in the PHB spec for some
reason. For word one the top bit (PPCBIT(0)) means MMIO is frozen and
for word two (PPCBIT(64)) the top bit indicates DMA is frozen.

> > What's probably happening is that the ASmedia controller doesn't
> > actually implement all 64 address bits and truncates the upper bits of
> > the DMA address. Doing that is a blatant violation of the PCIe (and
> > probably the xHCI) spec, but it's also pretty common since "it works
> > on x86." Something to try would be booting with the iommu=nobypass in
> > the kernel command line. That'll disable TVE#1 and force all DMAs to
> > go through TVE#0.
>
> Thanks, iommu=nobypass fixed it! Plugging in one or more USB devices
> no longer triggers any EEH errors.
>
> > Assuming the nobypass trick above works, what you really need to do is
> > have the driver report that it can't address all 64bits by setting its
> > DMA mask accordingly. For the xhci driver it looks like this is done
> > in xhci_gen_setup(), there might be a quirks-style interface for
> > working around bugs in specific controllers that you can use. Have a
> > poke around and see what you can find :)
>
> Yup, the xhci driver has a quirks system, and conveniently one of
> those is XHCI_NO_64BIT_SUPPORT. After making a 3-line patch to
> xhci-pci.c to add that quirk for this chip, the host controller is now
> able to work without setting iommu=nobypass in the kernel arguments.

Cool

> Thank you so much for your help! You've likely saved me several hours
> of reading documentation, as well as several more hours of fiddling
> around with the xhci driver.

> I'm almost disappointed the fix was so
> simple, but the time savings alone more than makes up for it. I'll
> submit the patch to the USB ML shortly.

It might be only three lines, but it doesn't mean it's trivial. Most
bugs like that require a lot of context to understand, let alone fix.
Thanks for looking into it!

Oliver

WARNING: multiple messages have this Message-ID (diff)
From: "Oliver O'Halloran" <oohall@gmail.com>
To: Forest Crossman <cyrozap@gmail.com>
Cc: linux-pci <linux-pci@vger.kernel.org>,
	linux-usb@vger.kernel.org,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: ASMedia USB 3.x host controllers triggering EEH on POWER9
Date: Fri, 17 Jul 2020 20:43:27 +1000	[thread overview]
Message-ID: <CAOSf1CEiCOjTQ8XX=cC10eghaC+1xaRG-bju8VBFQStOCaU_xQ@mail.gmail.com> (raw)
In-Reply-To: <CAO3ALPwZPN1vdqHaFsuSpgj63o6Z69VUg0LngmCSnvESrO4kNg@mail.gmail.com>

On Fri, Jul 17, 2020 at 8:10 PM Forest Crossman <cyrozap@gmail.com> wrote:
>
> > From word 2 of the PEST entry the faulting DMA address is:
> > 0x0000203974c00000. That address is interesting since it looks a lot
> > like a memory address on the 2nd chip, but it doesn't have bit 59 set
> > so TVE#0 is used to validate it. Obviously that address is above 2GB
> > so we get the error.
>
> Ah, I see. Do you know if the information on the PEST registers is
> documented publicly somewhere? I tried searching for what those
> registers meant in the PHB4 spec but it basically just said, "the PEST
> registers contain PEST data," which isn't particularly helpful.

Most of it is in the IODA spec. See Sec 2.2.6 "Partitionable-Endpoint
State Table", the only part that isn't documented there is the top
bits of each word which are documented in the PHB spec for some
reason. For word one the top bit (PPCBIT(0)) means MMIO is frozen and
for word two (PPCBIT(64)) the top bit indicates DMA is frozen.

> > What's probably happening is that the ASmedia controller doesn't
> > actually implement all 64 address bits and truncates the upper bits of
> > the DMA address. Doing that is a blatant violation of the PCIe (and
> > probably the xHCI) spec, but it's also pretty common since "it works
> > on x86." Something to try would be booting with the iommu=nobypass in
> > the kernel command line. That'll disable TVE#1 and force all DMAs to
> > go through TVE#0.
>
> Thanks, iommu=nobypass fixed it! Plugging in one or more USB devices
> no longer triggers any EEH errors.
>
> > Assuming the nobypass trick above works, what you really need to do is
> > have the driver report that it can't address all 64bits by setting its
> > DMA mask accordingly. For the xhci driver it looks like this is done
> > in xhci_gen_setup(), there might be a quirks-style interface for
> > working around bugs in specific controllers that you can use. Have a
> > poke around and see what you can find :)
>
> Yup, the xhci driver has a quirks system, and conveniently one of
> those is XHCI_NO_64BIT_SUPPORT. After making a 3-line patch to
> xhci-pci.c to add that quirk for this chip, the host controller is now
> able to work without setting iommu=nobypass in the kernel arguments.

Cool

> Thank you so much for your help! You've likely saved me several hours
> of reading documentation, as well as several more hours of fiddling
> around with the xhci driver.

> I'm almost disappointed the fix was so
> simple, but the time savings alone more than makes up for it. I'll
> submit the patch to the USB ML shortly.

It might be only three lines, but it doesn't mean it's trivial. Most
bugs like that require a lot of context to understand, let alone fix.
Thanks for looking into it!

Oliver

  reply	other threads:[~2020-07-17 10:43 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-17  4:13 ASMedia USB 3.x host controllers triggering EEH on POWER9 Forest Crossman
2020-07-17  4:13 ` Forest Crossman
2020-07-17  6:10 ` Oliver O'Halloran
2020-07-17  6:10   ` Oliver O'Halloran
2020-07-17 10:10   ` Forest Crossman
2020-07-17 10:10     ` Forest Crossman
2020-07-17 10:43     ` Oliver O'Halloran [this message]
2020-07-17 10:43       ` Oliver O'Halloran

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOSf1CEiCOjTQ8XX=cC10eghaC+1xaRG-bju8VBFQStOCaU_xQ@mail.gmail.com' \
    --to=oohall@gmail.com \
    --cc=cyrozap@gmail.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.