All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: "Luís Mendes" <luis.p.mendes@gmail.com>
Cc: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>,
	Linux PCI <linux-pci@vger.kernel.org>,
	Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
	Jason Cooper <jason@lakedaemon.net>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Todd Poynor <toddpoynor@google.com>
Subject: Re: Problem with PCIe enumeration of Google/Coral TPU Edge module on Linux
Date: Thu, 9 Apr 2020 13:08:40 -0500	[thread overview]
Message-ID: <20200409180840.GA23054@google.com> (raw)
In-Reply-To: <20200409163010.GA8879@google.com>

[+cc Todd]

On Thu, Apr 09, 2020 at 11:30:10AM -0500, Bjorn Helgaas wrote:
> On Thu, Apr 09, 2020 at 04:25:40PM +0100, Luís Mendes wrote:
> > Hi Bjorn,
> > 
> > I've good news. I've found the culprit and it is a pretty simple
> > issue, however the good solution is not obvious to me.
> > Can you help in finding the best way to patch this issue?
> > 
> > So first detailing the problem in file setup_bus.c there is this *if
> > condition* to ignore resources from classless devices and so
> > it is that this Google/Coral Edge TPU is a classless device with class 0xff:
> > 
> > static void __dev_sort_resources(struct pci_dev *dev, struct list_head *head)
> > {
> >     u16 class = dev->class >> 8;
> > 
> >        pci_info(dev, "%s\n", __func__);
> >     /* Don't touch classless devices or host bridges or IOAPICs */
> >     if (class == PCI_CLASS_NOT_DEFINED || class == PCI_CLASS_BRIDGE_HOST)
> >         return;
> >    ....
> > 
> > So the one possible trivial, non generic, attempt that works is to do:
> > static void __dev_sort_resources(struct pci_dev *dev, struct list_head *head)
> > {
> >     u16 class = dev->class >> 8;
> > 
> >        pci_info(dev, "%s\n", __func__);
> >     /* Don't touch classless devices or host bridges or IOAPICs */
> >     if ((class == PCI_CLASS_NOT_DEFINED &&  !(dev->vendor == 0x1ac1 &&
> > dev->device==0x089a)) || class == PCI_CLASS_BRIDGE_HOST)
> >         return;
> >    ....
> > 
> > What is your suggestion to make the solution generic? Create a
> > whitelist? Remove this verification? I have no idea... nothing sounds
> > good to me...
> 
> Good detective work, thanks for chasing this down!
> 
> I should have seen that check when adding the debug.  Guess I thought
> "sort", hmmm, that just re-orders things without actually changing the
> content.  But pdev_sort_resources() in fact *adds* resources to a
> list, and if resources aren't on the list, we apparently don't assign
> space for them.
> 
> In any event, I would first check to see if there's an Edge TPU
> firmware update that might set the class code.
> 
> If not, we should probably add a quirk to override the class code,
> similar to quirk_eisa_bridge(), fixup_rev1_53c810(),
> fixup_ti816x_class(), quirk_tw686x_class().

In fact, apex_pci_fixup_class() already exists!  But it's in
apex_driver.c.  Do you happen to have CONFIG_STAGING_APEX_DRIVER=m
(built as a module)?  If so, that quirk won't be run until the module
is loaded, and that happens long after resource assignment.

Building with CONFIG_STAGING_APEX_DRIVER=y (not =m) should be a
workaround.  But I think the real fix would be moving
apex_pci_fixup_class() from apex_driver.c to drivers/pci/quirks.c,
like the following.  Would you mind testing it?


commit 59f3165318b3 ("PCI: Move Apex Edge TPU class quirk to fix BAR assignment")
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Thu Apr 9 12:43:45 2020 -0500

    PCI: Move Apex Edge TPU class quirk to fix BAR assignment
    
    Some Google Apex Edge TPU devices have a class code of 0
    (PCI_CLASS_NOT_DEFINED).  This prevents the PCI core from assigning
    resources for the Apex BARs because __dev_sort_resources() ignores
    classless devices, host bridges, and IOAPICs.
    
    On x86, firmware typically assigns those resources, so this was not a
    problem.  But on some architectures, firmware does *not* assign BARs, and
    since the PCI core didn't do it either, the Apex device didn't work
    correctly:
    
      apex 0000:01:00.0: can't enable device: BAR 0 [mem 0x00000000-0x00003fff 64bit pref] not claimed
      apex 0000:01:00.0: error enabling PCI device
    
    f390d08d8b87 ("staging: gasket: apex: fixup undefined PCI class") added a
    quirk to fix the class code, but it was in the apex driver, and if the
    driver was built as a module, it was too late to help.
    
    Move the quirk to the PCI core, where it will always run early enough that
    the PCI core will assign resources if necessary.
    
    Link: https://lore.kernel.org/r/CAEzXK1r0Er039iERnc2KJ4jn7ySNUOG9H=Ha8TD8XroVqiZjgg@mail.gmail.com
    Fixes: f390d08d8b87 ("staging: gasket: apex: fixup undefined PCI class")
    Reported-by: Luís Mendes <luis.p.mendes@gmail.com>
    Debugged-by: Luís Mendes <luis.p.mendes@gmail.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 28c9a2409c50..ca9ed5774eb1 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5567,3 +5567,10 @@ static void pci_fixup_no_d0_pme(struct pci_dev *dev)
 	dev->pme_support &= ~(PCI_PM_CAP_PME_D0 >> PCI_PM_CAP_PME_SHIFT);
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x2142, pci_fixup_no_d0_pme);
+
+static void apex_pci_fixup_class(struct pci_dev *pdev)
+{
+	pdev->class = (PCI_CLASS_SYSTEM_OTHER << 8) | pdev->class;
+}
+DECLARE_PCI_FIXUP_CLASS_HEADER(0x1ac1, 0x089a,
+			       PCI_CLASS_NOT_DEFINED, 8, apex_pci_fixup_class);
diff --git a/drivers/staging/gasket/apex_driver.c b/drivers/staging/gasket/apex_driver.c
index 46199c8ca441..f12f81c8dd2f 100644
--- a/drivers/staging/gasket/apex_driver.c
+++ b/drivers/staging/gasket/apex_driver.c
@@ -570,13 +570,6 @@ static const struct pci_device_id apex_pci_ids[] = {
 	{ PCI_DEVICE(APEX_PCI_VENDOR_ID, APEX_PCI_DEVICE_ID) }, { 0 }
 };
 
-static void apex_pci_fixup_class(struct pci_dev *pdev)
-{
-	pdev->class = (PCI_CLASS_SYSTEM_OTHER << 8) | pdev->class;
-}
-DECLARE_PCI_FIXUP_CLASS_HEADER(APEX_PCI_VENDOR_ID, APEX_PCI_DEVICE_ID,
-			       PCI_CLASS_NOT_DEFINED, 8, apex_pci_fixup_class);
-
 static int apex_pci_probe(struct pci_dev *pci_dev,
 			  const struct pci_device_id *id)
 {

  parent reply	other threads:[~2020-04-09 18:08 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-06 14:32 Problem with PCIe enumeration of Google/Coral TPU Edge module on Linux Luís Mendes
2020-03-06 21:47 ` Bjorn Helgaas
2020-03-07 12:11   ` Luís Mendes
2020-03-07 15:26     ` Luís Mendes
2020-03-07 21:38       ` Bjorn Helgaas
2020-03-08  5:51         ` Nicholas Johnson
2020-03-09 11:21           ` Luís Mendes
2020-03-11 14:20             ` Luís Mendes
2020-03-29 22:11               ` Luís Mendes
2020-03-30 19:49                 ` Bjorn Helgaas
2020-03-31 21:28                   ` Luís Mendes
2020-04-01 18:16                     ` Bjorn Helgaas
2020-04-01 21:20                       ` Luís Mendes
2020-04-01 21:55                         ` Luís Mendes
2020-04-01 23:31                         ` Bjorn Helgaas
2020-04-02 14:13                           ` Luís Mendes
2020-04-04  1:32                             ` Bjorn Helgaas
2020-04-04 21:39                               ` Luís Mendes
2020-04-08 23:05                                 ` Luís Mendes
2020-04-09 15:25                                   ` Luís Mendes
2020-04-09 15:29                                     ` Luís Mendes
2020-04-09 16:30                                     ` Bjorn Helgaas
2020-04-09 17:32                                       ` Luís Mendes
2020-04-09 18:08                                       ` Bjorn Helgaas [this message]
2020-04-09 20:07                                         ` Luís Mendes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200409180840.GA23054@google.com \
    --to=helgaas@kernel.org \
    --cc=benh@kernel.crashing.org \
    --cc=jason@lakedaemon.net \
    --cc=linux-pci@vger.kernel.org \
    --cc=luis.p.mendes@gmail.com \
    --cc=nicholas.johnson-opensource@outlook.com.au \
    --cc=thomas.petazzoni@bootlin.com \
    --cc=toddpoynor@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.