From: Myron Stowe <myron.stowe@gmail.com>
To: Xiangliang Yu <yuxiangl@marvell.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>, yxlraid <yxlraid@gmail.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] PCI: fix system hang issue of Marvell SATA host controller
Date: Fri, 8 Mar 2013 20:18:05 -0700
Message-ID: <CAL-B5D05V124xbG7z6UK9UWJ5PqmAVkqShdrMjWSGHDpTYZLQQ@mail.gmail.com>
In-Reply-To: <F766E4F80769BD478052FB6533FA745D25F436861C@SC-VEXCH4.marvell.com>

On Thu, Mar 7, 2013 at 11:51 PM, Xiangliang Yu <yuxiangl@marvell.com> wrote:
> Hi, Bjorn
>
>> >> > Fix system hang issue: if first accessed resource file of BAR0 ~
>> >> > BAR4, system will hang after executing lspci command
>> >>
>> >> This needs more explanation.  We've already read the BARs by the time
>> >> header quirks are run, so apparently it's not just the mere act of
>> >> accessing a BAR that causes a hang.
>> >>
>> >> We need to know exactly what's going on here.  For example, do BARs
>> >> 0-4 exist?  Does the device decode accesses to the regions described
>> >> by the BARs?  The PCI core has to know what resources the device uses,
>> >> so if the device decodes accesses, we can't just throw away the
>> >> start/end information.
>> > The BARs 0-4 is exist and the PCI device is enable IO space, but user access
>> > the regions file by udevadm command with info parameter, the system will hang.
>> > Like this: udevadmin info --attribut-walk
>> > --path=/sys/device/pci-device/000:*.
>> > Because the device is just AHCI host controller, don't need the BAR0 ~ 4 region
>> > file.
>> > Is my explanation ok for the patch?
>>
>> No, I still don't know what causes the hang; I only know that udevadm
>> can trigger it.  I don't want to just paper over the problem until we
>> know what the root cause is.
>>
>> Does "lspci -H1 -vv" also cause a hang?  What about "setpci -s<dev>
>> BASE_ADDRESS_0"?  "setpci -H1 -s<dev> BASE_ADDRESS_0"?
> The commands are ok because the commands can't find the device after accessing IO port.

Xiangliang:

Sorry, but I didn't understand your response above.  Could you
elaborate a little more?


Do the first five BARs of the suspect device all map to I/O port
space, similar to the following?  (An 'lspci' capture of the suspect
device would be nice to see.)
  00:1f.2 SATA controller:
    Region 0: I/O ports at 1860 [size=8]
    Region 1: I/O ports at 1814 [size=4]
    Region 2: I/O ports at 1818 [size=8]
    Region 3: I/O ports at 1810 [size=4]
    Region 4: I/O ports at 1840 [size=32]
    Region 5: Memory at f2827000 (32-bit, non-prefetchable) [size=2K]

You have done a good job isolating the issue so far.  As Bjorn noted,
it looks as if the problem is with accessing the I/O port space
mapped by the suspect device's BAR(s), not with accessing the BAR(s)
in the device's configuration space.
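
To make that distinction concrete, here is a minimal sketch of how a raw
BAR value read from configuration space is interpreted.  Reading the
register only yields an address; the I/O port accesses that appear to
upset the chip happen later, against that address.  The helper names
are hypothetical (they are not kernel APIs), and the raw value 0x1861
used in the comment is an assumption derived from the example listing
above (base 0x1860 with the I/O space indicator bit set).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a BAR: 1 = I/O port space, 0 = memory space. */
#define BAR_SPACE_IO  0x1u
/* Low 2 bits of an I/O BAR and low 4 bits of a memory BAR are flags. */
#define BAR_IO_MASK   (~0x3u)
#define BAR_MEM_MASK  (~0xFu)

/* Hypothetical helper: does this raw BAR value describe I/O ports? */
static bool bar_is_io(uint32_t bar)
{
	return (bar & BAR_SPACE_IO) != 0;
}

/* Hypothetical helper: strip the flag bits, leaving the base address. */
static uint32_t bar_base(uint32_t bar)
{
	return bar_is_io(bar) ? (bar & BAR_IO_MASK) : (bar & BAR_MEM_MASK);
}

/* Last address of a region; BAR sizes are always powers of two. */
static uint32_t bar_end(uint32_t base, uint32_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return base + size - 1;
}

/*
 * Example (assumed raw value): 0x1861 decodes as an I/O BAR with base
 * 0x1860; with size 8 the region covers ports 0x1860 through 0x1867.
 */
```

None of these helpers touches the device's I/O ports, which is the
point: decoding the BAR is harmless bookkeeping, separate from issuing
port reads or writes to the region it describes.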

As you confirmed earlier, with the patch as proposed the suspect
device will still be actively decoding accesses to the regions
described by its BARs.  Because the device is actively decoding, the
PCI core can't just throw away the BARs' corresponding resource
regions, as the patch currently does, since another device may be
added at a later time.

If a device were added later, the core may need to allocate resources
for it and, in the worst case, could end up allocating resources that
conflict with the suspect device, because the suspect device's
original resource allocations were silently thrown away.  The result
would be both devices believing they exclusively own the same set (or
subset) of I/O port mappings, and thus both actively decoding accesses
to them.  That situation would obviously be disastrous.
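
The failure mode can be sketched with a toy first-fit allocator.  This
is not the PCI core's allocation code; 'struct io_range' and
'alloc_io()' are hypothetical stand-ins, the 0x1800-0x18ff window is an
assumption, and "thrown away" is modelled simply as the allocator
seeing an empty list of claims.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive I/O port range; a simplified stand-in for struct resource. */
struct io_range {
	uint32_t start;
	uint32_t end;
};

/* Two inclusive ranges conflict if they share at least one port. */
static bool ranges_conflict(struct io_range a, struct io_range b)
{
	return a.start <= b.end && b.start <= a.end;
}

/*
 * Toy first-fit allocator: return the lowest size-aligned range inside
 * [lo, hi] that does not conflict with any of the n claimed ranges.
 * size must be a power of two.
 */
static bool alloc_io(const struct io_range *claimed, int n,
		     uint32_t lo, uint32_t hi, uint32_t size,
		     struct io_range *out)
{
	uint64_t base;

	assert(size != 0 && (size & (size - 1)) == 0);
	base = ((uint64_t)lo + size - 1) & ~((uint64_t)size - 1);

	for (; base + size - 1 <= hi; base += size) {
		struct io_range cand = {
			(uint32_t)base, (uint32_t)(base + size - 1)
		};
		bool is_free = true;
		int i;

		for (i = 0; i < n; i++) {
			if (ranges_conflict(cand, claimed[i])) {
				is_free = false;
				break;
			}
		}
		if (is_free) {
			*out = cand;
			return true;
		}
	}
	return false;	/* window exhausted */
}
```

With the five I/O regions from the example listing recorded as claims,
a 32-port request in that window is steered to 0x1820-0x183f.  With the
claims discarded, the same request gets 0x1800-0x181f, which overlaps
ports the suspect device is still decoding.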

There is still something going on here that we do not understand.
Could you please capture the following information to help further
isolate the issue:
  A 'dmesg' log from the system booted with both the "debug" and
"ignore_loglevel" boot parameters, a 'lspci -xxx -s<dev>' capture,
and a 'lspci -vv' capture.

Thanks,
 Myron

> The root cause is that accessing of IO port will make the chip go bad. So, the point of the patch is don't export capability of the IO accessing.
>
>>
>> >>
>> >> > ---
>> >> >  drivers/pci/quirks.c |   15 +++++++++++++++
>> >> >  1 files changed, 15 insertions(+), 0 deletions(-)
>> >> >
>> >> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> >> > index 0369fb6..d49f8dc 100644
>> >> > --- a/drivers/pci/quirks.c
>> >> > +++ b/drivers/pci/quirks.c
>> >> > @@ -44,6 +44,21 @@ static void quirk_mmio_always_on(struct pci_dev *dev)
>> >> >  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_ANY_ID, PCI_ANY_ID,
>> >> >                                 PCI_CLASS_BRIDGE_HOST, 8,
>> >> quirk_mmio_always_on);
>> >> >
>> >> > +/* The BAR0 ~ BAR4 of Marvell 9125 device can't be accessed
>> >> > +*  by IO resource file, and need to skip the files
>> >> > +*/
>> >> > +static void quirk_marvell_mask_bar(struct pci_dev *dev)
>> >> > +{
>> >> > +       int i;
>> >> > +
>> >> > +       for (i = 0; i < 5; i++)
>> >> > +               if (dev->resource[i].start)
>> >> > +                       dev->resource[i].start =
>> >> > +                               dev->resource[i].end = 0;
>> >> > +}
>> >> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9125,
>> >> > +                               quirk_marvell_mask_bar);
>> >> > +
>> >> >  /* The Mellanox Tavor device gives false positive parity errors
>> >> >   * Mark this device with a broken_parity_status, to allow
>> >> >   * PCI scanning code to "skip" this now blacklisted device.
>> >> > --
>> >> > 1.7.5.4
>> >> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
