linux-pci.vger.kernel.org archive mirror
From: "Luís Mendes" <luis.p.mendes@gmail.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>,
	Linux PCI <linux-pci@vger.kernel.org>,
	Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
	Jason Cooper <jason@lakedaemon.net>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Todd Poynor <toddpoynor@google.com>
Subject: Re: Problem with PCIe enumeration of Google/Coral TPU Edge module on Linux
Date: Thu, 9 Apr 2020 21:07:46 +0100
Message-ID: <CAEzXK1oGTCQfQq1LFunvU53CxD+zmNcoB4B8ze6qb+LRQA1mXg@mail.gmail.com>
In-Reply-To: <20200409180840.GA23054@google.com>

Hi Bjorn,

I've just tested your patch on top of the linux-next master branch
(20200409). Both BARs now get addresses assigned; see the lspci output
below.

You can add:
Tested-by: Luis Mendes <luis.p.mendes@gmail.com>

Thanks,
Luís

01:00.0 System peripheral: Device 1ac1:089a (prog-if ff)
    Subsystem: Device 1ac1:089a
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 0
    Region 0: Memory at e8100000 (64-bit, prefetchable) [disabled] [size=16K]
    Region 2: Memory at e8000000 (64-bit, prefetchable) [disabled] [size=1M]
    Capabilities: <access denied>
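
For reference, the "System peripheral ... (prog-if ff)" header in the
lspci output above corresponds to a 24-bit class value that packs base
class, subclass and programming interface into one word. The following is
a minimal userspace sketch, not kernel code; struct class_fields and
decode_class() are hypothetical names used only for illustration, and the
value 0x0880ff is inferred from the lspci header and the quirk quoted
below rather than read from the device.

```c
#include <stdint.h>

/* Hypothetical helper type: the three fields of the 24-bit class value. */
struct class_fields {
	uint8_t base_class;	/* bits 23..16 */
	uint8_t sub_class;	/* bits 15..8  */
	uint8_t prog_if;	/* bits 7..0   */
};

/* Split a 24-bit class value (as kept in pci_dev->class) into its fields. */
static struct class_fields decode_class(uint32_t class24)
{
	struct class_fields f = {
		.base_class = (class24 >> 16) & 0xff,
		.sub_class  = (class24 >> 8) & 0xff,
		.prog_if    = class24 & 0xff,
	};
	return f;
}
```

Under that assumption, decode_class(0x0880ff) yields base class 0x08,
subclass 0x80 and prog-if 0xff, which matches what lspci prints.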

On Thu, Apr 9, 2020 at 7:08 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Todd]
>
> On Thu, Apr 09, 2020 at 11:30:10AM -0500, Bjorn Helgaas wrote:
> > On Thu, Apr 09, 2020 at 04:25:40PM +0100, Luís Mendes wrote:
> > > Hi Bjorn,
> > >
> > > I have good news. I've found the culprit, and it is a pretty simple
> > > issue; however, a good solution is not obvious to me.
> > > Can you help find the best way to patch this?
> > >
> > > First, the problem in detail: in drivers/pci/setup-bus.c there is an *if
> > > condition* that ignores resources from classless devices, and the
> > > Google/Coral Edge TPU is such a classless device (class 0, prog-if 0xff):
> > >
> > > static void __dev_sort_resources(struct pci_dev *dev, struct list_head *head)
> > > {
> > >     u16 class = dev->class >> 8;
> > >
> > >     pci_info(dev, "%s\n", __func__);
> > >     /* Don't touch classless devices or host bridges or IOAPICs */
> > >     if (class == PCI_CLASS_NOT_DEFINED || class == PCI_CLASS_BRIDGE_HOST)
> > >         return;
> > >    ....
> > >
> > > One trivial, non-generic change that works is:
> > > static void __dev_sort_resources(struct pci_dev *dev, struct list_head *head)
> > > {
> > >     u16 class = dev->class >> 8;
> > >
> > >     pci_info(dev, "%s\n", __func__);
> > >     /* Don't touch classless devices or host bridges or IOAPICs */
> > >     if ((class == PCI_CLASS_NOT_DEFINED &&
> > >          !(dev->vendor == 0x1ac1 && dev->device == 0x089a)) ||
> > >         class == PCI_CLASS_BRIDGE_HOST)
> > >         return;
> > >    ....
> > >
> > > What is your suggestion to make the solution generic? Create a
> > > whitelist? Remove this verification? I have no idea... nothing sounds
> > > good to me...
> >
> > Good detective work, thanks for chasing this down!
> >
> > I should have seen that check when adding the debug.  Guess I thought
> > "sort", hmmm, that just re-orders things without actually changing the
> > content.  But pdev_sort_resources() in fact *adds* resources to a
> > list, and if resources aren't on the list, we apparently don't assign
> > space for them.
> >
> > In any event, I would first check to see if there's an Edge TPU
> > firmware update that might set the class code.
> >
> > If not, we should probably add a quirk to override the class code,
> > similar to quirk_eisa_bridge(), fixup_rev1_53c810(),
> > fixup_ti816x_class(), quirk_tw686x_class().
>
> In fact, apex_pci_fixup_class() already exists!  But it's in
> apex_driver.c.  Do you happen to have CONFIG_STAGING_APEX_DRIVER=m
> (built as a module)?  If so, that quirk won't be run until the module
> is loaded, and that happens long after resource assignment.
>
> Building with CONFIG_STAGING_APEX_DRIVER=y (not =m) should be a
> workaround.  But I think the real fix would be moving
> apex_pci_fixup_class() from apex_driver.c to drivers/pci/quirks.c,
> like the following.  Would you mind testing it?
>
>
> commit 59f3165318b3 ("PCI: Move Apex Edge TPU class quirk to fix BAR assignment")
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Thu Apr 9 12:43:45 2020 -0500
>
>     PCI: Move Apex Edge TPU class quirk to fix BAR assignment
>
>     Some Google Apex Edge TPU devices have a class code of 0
>     (PCI_CLASS_NOT_DEFINED).  This prevents the PCI core from assigning
>     resources for the Apex BARs because __dev_sort_resources() ignores
>     classless devices, host bridges, and IOAPICs.
>
>     On x86, firmware typically assigns those resources, so this was not a
>     problem.  But on some architectures, firmware does *not* assign BARs, and
>     since the PCI core didn't do it either, the Apex device didn't work
>     correctly:
>
>       apex 0000:01:00.0: can't enable device: BAR 0 [mem 0x00000000-0x00003fff 64bit pref] not claimed
>       apex 0000:01:00.0: error enabling PCI device
>
>     f390d08d8b87 ("staging: gasket: apex: fixup undefined PCI class") added a
>     quirk to fix the class code, but it was in the apex driver, and if the
>     driver was built as a module, it was too late to help.
>
>     Move the quirk to the PCI core, where it will always run early enough that
>     the PCI core will assign resources if necessary.
>
>     Link: https://lore.kernel.org/r/CAEzXK1r0Er039iERnc2KJ4jn7ySNUOG9H=Ha8TD8XroVqiZjgg@mail.gmail.com
>     Fixes: f390d08d8b87 ("staging: gasket: apex: fixup undefined PCI class")
>     Reported-by: Luís Mendes <luis.p.mendes@gmail.com>
>     Debugged-by: Luís Mendes <luis.p.mendes@gmail.com>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 28c9a2409c50..ca9ed5774eb1 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5567,3 +5567,10 @@ static void pci_fixup_no_d0_pme(struct pci_dev *dev)
>         dev->pme_support &= ~(PCI_PM_CAP_PME_D0 >> PCI_PM_CAP_PME_SHIFT);
>  }
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x2142, pci_fixup_no_d0_pme);
> +
> +static void apex_pci_fixup_class(struct pci_dev *pdev)
> +{
> +       pdev->class = (PCI_CLASS_SYSTEM_OTHER << 8) | pdev->class;
> +}
> +DECLARE_PCI_FIXUP_CLASS_HEADER(0x1ac1, 0x089a,
> +                              PCI_CLASS_NOT_DEFINED, 8, apex_pci_fixup_class);
> diff --git a/drivers/staging/gasket/apex_driver.c b/drivers/staging/gasket/apex_driver.c
> index 46199c8ca441..f12f81c8dd2f 100644
> --- a/drivers/staging/gasket/apex_driver.c
> +++ b/drivers/staging/gasket/apex_driver.c
> @@ -570,13 +570,6 @@ static const struct pci_device_id apex_pci_ids[] = {
>         { PCI_DEVICE(APEX_PCI_VENDOR_ID, APEX_PCI_DEVICE_ID) }, { 0 }
>  };
>
> -static void apex_pci_fixup_class(struct pci_dev *pdev)
> -{
> -       pdev->class = (PCI_CLASS_SYSTEM_OTHER << 8) | pdev->class;
> -}
> -DECLARE_PCI_FIXUP_CLASS_HEADER(APEX_PCI_VENDOR_ID, APEX_PCI_DEVICE_ID,
> -                              PCI_CLASS_NOT_DEFINED, 8, apex_pci_fixup_class);
> -
>  static int apex_pci_probe(struct pci_dev *pci_dev,
>                           const struct pci_device_id *id)
>  {
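
To make the interaction between the class check and the quirk concrete,
here is a minimal userspace sketch, not kernel code. resources_skipped()
mirrors the test quoted from __dev_sort_resources(), and
apex_fixup_class() mirrors the arithmetic in apex_pci_fixup_class(); both
operate on a plain integer instead of struct pci_dev. The constants are
copied from include/linux/pci_ids.h. The pre-quirk value 0x0000ff is an
assumption, inferred from the "class code of 0" in the commit message and
the "prog-if ff" in the lspci output.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in include/linux/pci_ids.h */
#define PCI_CLASS_NOT_DEFINED	0x0000
#define PCI_CLASS_BRIDGE_HOST	0x0600
#define PCI_CLASS_SYSTEM_OTHER	0x0880

/* Same test as __dev_sort_resources(): true means the BARs are ignored. */
static bool resources_skipped(uint32_t dev_class)
{
	uint16_t cls = dev_class >> 8;	/* drop prog-if, keep base/subclass */

	return cls == PCI_CLASS_NOT_DEFINED || cls == PCI_CLASS_BRIDGE_HOST;
}

/* Same arithmetic as the quirk: set base/subclass, preserve prog-if. */
static uint32_t apex_fixup_class(uint32_t dev_class)
{
	return (PCI_CLASS_SYSTEM_OTHER << 8) | dev_class;
}
```

With the assumed 0x0000ff, resources_skipped() returns true before the
quirk and false after apex_fixup_class() has turned the value into
0x0880ff. This is why the quirk has to run before resource assignment
and not at module load time.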
