All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Oliver O'Halloran" <oohall@gmail.com>
To: "Pali Rohár" <pali@kernel.org>
Cc: "Bjorn Helgaas" <helgaas@kernel.org>,
	"Aaron Ma" <aaron.ma@canonical.com>,
	jesse.brandeburg@intel.com, anthony.l.nguyen@intel.com,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Krzysztof Wilczyński" <kw@linux.com>,
	linux-pci <linux-pci@vger.kernel.org>
Subject: Re: [PATCH 1/2] igc: don't rd/wr iomem when PCI is removed
Date: Mon, 19 Jul 2021 12:49:18 +1000	[thread overview]
Message-ID: <CAOSf1CHOrUBfibO0t6Zr2=SZ7GjLTiAzfoKBeZL8RXdcC+Ou3A@mail.gmail.com> (raw)
In-Reply-To: <20210718225059.hd3od4k4on3aopcu@pali>

On Mon, Jul 19, 2021 at 8:51 AM Pali Rohár <pali@kernel.org> wrote:
>
> And do we have some solution for this kind of issue? There are more PCIe
> controllers / platforms which do not like MMIO read/write operation when
> card / link is not connected.

Do you have some actual examples? The few times I've seen those
crashes were due to broken firmware-first error handling. The AER
notifications would be escalated into some kind of ACPI error which
the kernel didn't have a good way of dealing with so it panicked
instead.

Assuming it is a real problem then as Bjorn pointed out this sort of
hack doesn't really fix the issue because hotplug and AER
notifications are fundamentally asynchronous. If the driver is
actively using the device when the error / removal happens then the
pci_dev_is_disconnected() check will pass and the MMIO will go
through. If the MMIO is poisonous because of dumb hardware then this
sort of hack will only paper over the issue.

> If we do not provide a way how to solve these problems then we can
> expect that people would just hack ethernet / wifi / ... device drivers
> which are currently crashing by patches like in this thread.
>
> Maybe PCI subsystem could provide wrapper function which implements
> above pattern and which can be used by device drivers?

We could do that and I think there was a proposal to add some
pci_readl(pdev, <addr>) style wrappers at one point. On powerpc
there's hooks in the arch provided MMIO functions to detect error
responses and kick off the error handling machinery when a problem is
detected. Those hooks are mainly there to help the platform detect
errors though and they don't make life much easier for drivers. Due to
locking concerns the driver's .error_detected() callback cannot be
called in the MMIO hook so even when the platform detects errors
synchronously the driver notifications must happen asynchronously. In
the meanwhile the driver still needs to handle the 0xFFs response
safely and there's not much we can do from the platform side to help
there.

Oliver

WARNING: multiple messages have this Message-ID (diff)
From: Oliver O'Halloran <oohall@gmail.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [PATCH 1/2] igc: don't rd/wr iomem when PCI is removed
Date: Mon, 19 Jul 2021 12:49:18 +1000	[thread overview]
Message-ID: <CAOSf1CHOrUBfibO0t6Zr2=SZ7GjLTiAzfoKBeZL8RXdcC+Ou3A@mail.gmail.com> (raw)
In-Reply-To: <20210718225059.hd3od4k4on3aopcu@pali>

On Mon, Jul 19, 2021 at 8:51 AM Pali Roh?r <pali@kernel.org> wrote:
>
> And do we have some solution for this kind of issue? There are more PCIe
> controllers / platforms which do not like MMIO read/write operation when
> card / link is not connected.

Do you have some actual examples? The few times I've seen those
crashes were due to broken firmware-first error handling. The AER
notifications would be escalated into some kind of ACPI error which
the kernel didn't have a good way of dealing with so it panicked
instead.

Assuming it is a real problem then as Bjorn pointed out this sort of
hack doesn't really fix the issue because hotplug and AER
notifications are fundamentally asynchronous. If the driver is
actively using the device when the error / removal happens then the
pci_dev_is_disconnected() check will pass and the MMIO will go
through. If the MMIO is poisonous because of dumb hardware then this
sort of hack will only paper over the issue.

> If we do not provide a way how to solve these problems then we can
> expect that people would just hack ethernet / wifi / ... device drivers
> which are currently crashing by patches like in this thread.
>
> Maybe PCI subsystem could provide wrapper function which implements
> above pattern and which can be used by device drivers?

We could do that and I think there was a proposal to add some
pci_readl(pdev, <addr>) style wrappers at one point. On powerpc
there's hooks in the arch provided MMIO functions to detect error
responses and kick off the error handling machinery when a problem is
detected. Those hooks are mainly there to help the platform detect
errors though and they don't make life much easier for drivers. Due to
locking concerns the driver's .error_detected() callback cannot be
called in the MMIO hook so even when the platform detects errors
synchronously the driver notifications must happen asynchronously. In
the meanwhile the driver still needs to handle the 0xFFs response
safely and there's not much we can do from the platform side to help
there.

Oliver

  reply	other threads:[~2021-07-19  2:49 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-02  4:51 [PATCH 1/2] igc: don't rd/wr iomem when PCI is removed Aaron Ma
2021-07-02  4:51 ` [Intel-wired-lan] " Aaron Ma
2021-07-02  4:51 ` [PATCH 2/2] igc: wait for the MAC copy when enabled MAC passthrough Aaron Ma
2021-07-02  4:51   ` [Intel-wired-lan] " Aaron Ma
2021-07-04  5:36   ` Neftin, Sasha
2021-07-04  5:36     ` Neftin, Sasha
2021-07-05  7:38     ` Aaron Ma
2021-07-05  7:38       ` Aaron Ma
2021-07-05 11:54       ` Neftin, Sasha
2021-07-05 11:54         ` Neftin, Sasha
2021-07-06  6:46         ` Aaron Ma
2021-07-06  6:46           ` Aaron Ma
2021-07-08  4:24           ` Neftin, Sasha
2021-07-08  4:24             ` Neftin, Sasha
2021-07-13 13:45             ` Aaron Ma
2021-07-13 13:45               ` Aaron Ma
2021-07-14  9:13               ` Ruinskiy, Dima
2021-07-14  9:13                 ` Ruinskiy, Dima
2021-07-04 14:28 ` [PATCH 1/2] igc: don't rd/wr iomem when PCI is removed Pali Rohár
2021-07-04 14:28   ` [Intel-wired-lan] " Pali =?unknown-8bit?q?Roh=C3=A1r?=
2021-07-05  7:23   ` Aaron Ma
2021-07-05  7:23     ` [Intel-wired-lan] " Aaron Ma
2021-07-05 23:02   ` Krzysztof Wilczyński
2021-07-05 23:02     ` [Intel-wired-lan] " Krzysztof =?unknown-8bit?q?Wilczy=C5=84ski?=
2021-07-06 14:23     ` Pali Rohár
2021-07-06 14:23       ` [Intel-wired-lan] " Pali =?unknown-8bit?q?Roh=C3=A1r?=
2021-07-05  7:47 ` Dave Airlie
2021-07-05  7:47   ` [Intel-wired-lan] " Dave Airlie
2021-07-06  6:42   ` Aaron Ma
2021-07-06  6:42     ` [Intel-wired-lan] " Aaron Ma
2021-07-06 20:12 ` Bjorn Helgaas
2021-07-06 20:12   ` [Intel-wired-lan] " Bjorn Helgaas
2021-07-07 21:53   ` Pali Rohár
2021-07-07 21:53     ` [Intel-wired-lan] " Pali =?unknown-8bit?q?Roh=C3=A1r?=
2021-07-07 22:10     ` Bjorn Helgaas
2021-07-07 22:10       ` [Intel-wired-lan] " Bjorn Helgaas
2021-07-08  2:04       ` Oliver O'Halloran
2021-07-08  2:04         ` [Intel-wired-lan] " Oliver O'Halloran
2021-07-08 15:45         ` Bjorn Helgaas
2021-07-08 15:45           ` [Intel-wired-lan] " Bjorn Helgaas
2021-07-18 16:31           ` Oliver O'Halloran
2021-07-18 16:31             ` [Intel-wired-lan] " Oliver O'Halloran
2021-07-18 22:50             ` Pali Rohár
2021-07-18 22:50               ` [Intel-wired-lan] " Pali =?unknown-8bit?q?Roh=C3=A1r?=
2021-07-19  2:49               ` Oliver O'Halloran [this message]
2021-07-19  2:49                 ` Oliver O'Halloran
2021-07-19  8:13                 ` Pali Rohár
2021-07-19  8:13                   ` [Intel-wired-lan] " Pali =?unknown-8bit?q?Roh=C3=A1r?=
2021-07-20  0:17                 ` Bjorn Helgaas
2021-07-20  0:17                   ` [Intel-wired-lan] " Bjorn Helgaas
2021-07-13 13:00 ` [PATCH v2] igc: fix page fault when thunderbolt is unplugged Aaron Ma
2021-07-13 13:00   ` [Intel-wired-lan] " Aaron Ma
2021-08-04 12:06   ` Fuxbrumer, Dvora
2021-08-04 12:06     ` Fuxbrumer, Dvora

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOSf1CHOrUBfibO0t6Zr2=SZ7GjLTiAzfoKBeZL8RXdcC+Ou3A@mail.gmail.com' \
    --to=oohall@gmail.com \
    --cc=aaron.ma@canonical.com \
    --cc=anthony.l.nguyen@intel.com \
    --cc=davem@davemloft.net \
    --cc=helgaas@kernel.org \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=jesse.brandeburg@intel.com \
    --cc=kuba@kernel.org \
    --cc=kw@linux.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pali@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.