From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Mason <slash.tmp@free.fr>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Lukas Wunner <lukas@wunner.de>,
	Mathias Nyman <mathias.nyman@linux.intel.com>,
	Felipe Balbi <felipe.balbi@linux.intel.com>,
	linux-pci <linux-pci@vger.kernel.org>,
	linux-usb <linux-usb@vger.kernel.org>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Alan Stern <stern@rowland.harvard.edu>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: Possible regression between 4.9 and 4.13
Date: Wed, 30 Aug 2017 10:07:59 +0100
Message-ID: <CAKv+Gu8QSWO8jYc1L6eJyMg58gtVsoa9zYuymSc-PdEm60HzxA@mail.gmail.com>
In-Reply-To: <678490ce-9381-e63e-7a12-33d3eff7f894@free.fr>

On 30 August 2017 at 09:55, Mason <slash.tmp@free.fr> wrote:
> On 30/08/2017 08:02, Greg Kroah-Hartman wrote:
>
>> To get back to the original issue here, the hardware seems to have died,
>> the driver stops talking to it, and all is good.  The "regression" here
>> is that we now properly can determine that the hardware is crap.
>
> Before 4.12, when I unplugged my USB3 Flash drive, Linux would
> detect a few "Uncorrected Non-Fatal errors" via AER, but it was
> still possible to plug the drive back in.
>
> Since 4.12, once I unplug the drive, the whole USB3 card is marked
> as dead (all 4 ports), and I can no longer plug anything in (not even
> the USB2 drive that didn't have any issues, IIRC).
>
> It seems a bit premature to "mark as dead" something that remains
> functional, doesn't it?
>
> Disclaimer, there are many variables in this setup, and I've only
> tested a small fraction of the problem space: only one system,
> only one USB3 board, only one USB3 Flash drive.
>

Please don't forget to mention that this is quirky hardware that
depends on BROKEN because it multiplexes MMIO and config space
accesses in the same memory window without any locking whatsoever
(which would be difficult to do in the first place because we don't
use accessors for MMIO in the kernel).

So how likely is it that you are attempting to read from the xhci BAR
window while a config space access is in progress? Any way to
instrument this in your driver?
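
For illustration only, here is a minimal user-space sketch of the kind of
instrumentation being asked about. It assumes the driver can bracket its
config space accesses and route its MMIO reads through a wrapper. Every
name below (cfg_access_begin, cfg_access_end, instrumented_mmio_read,
overlap_count) is hypothetical and does not come from the host bridge or
xhci driver. A counter records each MMIO read issued while a config
access is marked as in progress:

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical instrumentation state; these names are assumptions made
 * for this sketch and are not taken from any real driver.
 */
static atomic_int cfg_in_progress;  /* nonzero while the window is switched to config space */
static atomic_ulong mmio_overlaps;  /* MMIO reads seen while a config access was in flight */

/* Call immediately before switching the shared window to config space. */
static void cfg_access_begin(void)
{
	atomic_fetch_add(&cfg_in_progress, 1);
}

/* Call immediately after switching the shared window back to MMIO. */
static void cfg_access_end(void)
{
	atomic_fetch_sub(&cfg_in_progress, 1);
}

/* MMIO read wrapper: note an overlap if one is detected, then do the read. */
static uint32_t instrumented_mmio_read(const volatile uint32_t *addr)
{
	if (atomic_load(&cfg_in_progress))
		atomic_fetch_add(&mmio_overlaps, 1);
	return *addr;
}

/* Number of overlapping MMIO reads recorded so far. */
static unsigned long overlap_count(void)
{
	return atomic_load(&mmio_overlaps);
}
```

This only detects overlaps for reads that actually go through the
wrapper, and the check itself is racy (a config access can start right
after the flag is tested), so a count of zero would not prove the
absence of collisions. A nonzero count would at least confirm that the
two kinds of access do overlap in practice.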

>> So, how do you think we should proceed, delay a bit longer before saying
>> the device is gone?  How long is "long enough"?  How many bus errors are
>> we allowed to tolerate (hint, the PCI spec says none...)
>>
>> Maybe someone wants to get to the root problem here, why is the hardware
>> suddenly reporting all 1s?
>
> I'm afraid I won't be able to make any progress on this front,
> unless I can get my hands on a PCIe packet analyzer.
>
> Regards.
