All of lore.kernel.org
 help / color / mirror / Atom feed
* pciehp regression (bug #79701)
@ 2014-08-29 22:36 Bjorn Helgaas
  2014-08-30  0:00 ` Rajat Jain
  2014-08-30  2:25 ` Rajat Jain
  0 siblings, 2 replies; 5+ messages in thread
From: Bjorn Helgaas @ 2014-08-29 22:36 UTC (permalink / raw)
  To: Rajat Jain; +Cc: linux-pci, Rafael J. Wysocki

Hi Rajat,

This bug report: https://bugzilla.kernel.org/show_bug.cgi?id=79701 was
bisected to 02e93a8a7c1d ("PCI: pciehp: Don't check adapter or latch
status while disabling").  Can you take a look?

Bjorn

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: pciehp regression (bug #79701)
  2014-08-29 22:36 pciehp regression (bug #79701) Bjorn Helgaas
@ 2014-08-30  0:00 ` Rajat Jain
  2014-08-30  2:25 ` Rajat Jain
  1 sibling, 0 replies; 5+ messages in thread
From: Rajat Jain @ 2014-08-30  0:00 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Rafael J. Wysocki

Hi Bjorn,

I will take a look.

Thanks,

Rajat

On Fri, Aug 29, 2014 at 3:36 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Hi Rajat,
>
> This bug report: https://bugzilla.kernel.org/show_bug.cgi?id=79701 was
> bisected to 02e93a8a7c1d ("PCI: pciehp: Don't check adapter or latch
> status while disabling").  Can you take a look?
>
> Bjorn

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: pciehp regression (bug #79701)
  2014-08-29 22:36 pciehp regression (bug #79701) Bjorn Helgaas
  2014-08-30  0:00 ` Rajat Jain
@ 2014-08-30  2:25 ` Rajat Jain
  2014-08-30  2:36   ` Guenter Roeck
  2014-09-04 13:28   ` Rajat Jain
  1 sibling, 2 replies; 5+ messages in thread
From: Rajat Jain @ 2014-08-30  2:25 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Rafael J. Wysocki, Rajat Jain, Guenter Roeck

Hello,

On Fri, Aug 29, 2014 at 3:36 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Hi Rajat,
>
> This bug report: https://bugzilla.kernel.org/show_bug.cgi?id=79701 was
> bisected to 02e93a8a7c1d ("PCI: pciehp: Don't check adapter or latch
> status while disabling").  Can you take a look?
>

Hi,

I looked at this and wanted to share by observations:

The Basic Issue
==============
There are a bunch of quick hotplug events (unplug followed by the
hot-plug) that are received by the hotplug driver. While both the
hotplug drivers (pciehp and acpiphp) are fine with it, the radeon
driver itself is probably not equipped enough to handle them so well?
[   41.224428] trying to unbind memory from uninitialized GART !

When acpiphp was being used
=======================
As Rafael mentions in this commit log, this is a problem with the VGA
subsystem, that requires the hot-plug driver to ignore such hot-plug
events associated with a slot that connects to such known Radeon
controllers. This was done for acpiphp by introducing a "no_hotplug"
flag for the ACPI:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f244d8b623dae7a7bc695b0336f67729b95a9736
The above commit would fix the problem if the acpiphp is used, by
ignoring the hot-plug events for that slot.

Switch to using pciehp
=================
1) For some reason, the system now seems to use pciehp for these slots
instead of the acpiphp (can someone please tell if this looks OK? I
only ask because I see the concerned Rafael's log getting printed that
seems to indicate that he is expecting the acpiphp to control this
slot?). But I also see that the pciehp has already grabbed the slot by
the time this messages gets printed:
[    4.419180] VGA switcheroo: detected switching method
\_SB_.PCI0.VGA_.ATPX handle

Even with using pciehp, things were still all right until the commit
02e93a8a7, beacuse the pciehp used to ignore the hot-unplug events
(including loss-of-presence-detect and link-down) if (1) SURPRISE
removal is not supported or (2) ADAPTER is not present (which is what
this commit addresses). Thus the hot unplug event used to come, the
pciehp_disable_slot() used to find no adapter and refused to do
anything.

Why problem started with pciehp
==========================
Essentially the commits 02e93a8a7 and 2b3940b60 made the pciehp handle
all hot-unplug events (loss-of-presence-detect and link-downs)
irrespective of whether the the SURPRISE removal was supported or not,
and also if ADAPTER is not present. Now, I would think that both these
commits are still valid because it makes no sense to ignore an unplug
event (and let the kernel continue with stale data structures) just
because SURPRISE is not set, or the ADAPTER is not present (The latter
is an even better reason to process the unplug event).

My recommendations / Options
========================
1) I would first like an opinion on whether it is OK to see the pciehp
handle these hotplug slots. The radeon code seems to be ACPI
intensive, and Rafael's commit also seems to say that this was
supposed to be handled by acpiphp.

2) If it is expected to continue using pciehp, may be we could handle
it in the same way as Rafael did for acpiphp. We could add a flag in
the pci_dev ("ignore_hp_events" or something) and set it for the hot
pluggable slot from radeon code, just like acpi_bus_no_hotplug() is
called today.

I'll be going out for vacation for the 3 days, and would be glad to
submit a patch if needed.

One question to the gentleman who bisected this. (SpacemanSpiff).
Would it be possible for you to look for the following messages while
trying out the image just before the commit 02e93a8a7c?
...
No adapter on slot(2)
...

Thanks & Best Regards,

Rajat Jain

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: pciehp regression (bug #79701)
  2014-08-30  2:25 ` Rajat Jain
@ 2014-08-30  2:36   ` Guenter Roeck
  2014-09-04 13:28   ` Rajat Jain
  1 sibling, 0 replies; 5+ messages in thread
From: Guenter Roeck @ 2014-08-30  2:36 UTC (permalink / raw)
  To: Bjorn Helgaas, rajatxjain; +Cc: linux-pci, Rafael J. Wysocki, Rajat Jain



Hi Rajat,

excellent summary. Another option might be to introduce a blacklist
instead of a callback.

Guenter

________________________________________
From: Rajat Jain <rajatxjain@gmail.com>
Sent: Friday, August 29, 2014 7:25 PM
To: Bjorn Helgaas
Cc: linux-pci@vger.kernel.org; Rafael J. Wysocki; Rajat Jain; Guenter Roeck
Subject: Re: pciehp regression (bug #79701)

Hello,

On Fri, Aug 29, 2014 at 3:36 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Hi Rajat,
>
> This bug report: https://bugzilla.kernel.org/show_bug.cgi?id=79701 was
> bisected to 02e93a8a7c1d ("PCI: pciehp: Don't check adapter or latch
> status while disabling").  Can you take a look?
>

Hi,

I looked at this and wanted to share by observations:

The Basic Issue
==============
There are a bunch of quick hotplug events (unplug followed by the
hot-plug) that are received by the hotplug driver. While both the
hotplug drivers (pciehp and acpiphp) are fine with it, the radeon
driver itself is probably not equipped enough to handle them so well?
[   41.224428] trying to unbind memory from uninitialized GART !

When acpiphp was being used
=======================
As Rafael mentions in this commit log, this is a problem with the VGA
subsystem, that requires the hot-plug driver to ignore such hot-plug
events associated with a slot that connects to such known Radeon
controllers. This was done for acpiphp by introducing a "no_hotplug"
flag for the ACPI:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f244d8b623dae7a7bc695b0336f67729b95a9736
The above commit would fix the problem if the acpiphp is used, by
ignoring the hot-plug events for that slot.

Switch to using pciehp
=================
1) For some reason, the system now seems to use pciehp for these slots
instead of the acpiphp (can someone please tell if this looks OK? I
only ask because I see the concerned Rafael's log getting printed that
seems to indicate that he is expecting the acpiphp to control this
slot?). But I also see that the pciehp has already grabbed the slot by
the time this messages gets printed:
[    4.419180] VGA switcheroo: detected switching method
\_SB_.PCI0.VGA_.ATPX handle

Even with using pciehp, things were still all right until the commit
02e93a8a7, beacuse the pciehp used to ignore the hot-unplug events
(including loss-of-presence-detect and link-down) if (1) SURPRISE
removal is not supported or (2) ADAPTER is not present (which is what
this commit addresses). Thus the hot unplug event used to come, the
pciehp_disable_slot() used to find no adapter and refused to do
anything.

Why problem started with pciehp
==========================
Essentially the commits 02e93a8a7 and 2b3940b60 made the pciehp handle
all hot-unplug events (loss-of-presence-detect and link-downs)
irrespective of whether the the SURPRISE removal was supported or not,
and also if ADAPTER is not present. Now, I would think that both these
commits are still valid because it makes no sense to ignore an unplug
event (and let the kernel continue with stale data structures) just
because SURPRISE is not set, or the ADAPTER is not present (The latter
is an even better reason to process the unplug event).

My recommendations / Options
========================
1) I would first like an opinion on whether it is OK to see the pciehp
handle these hotplug slots. The radeon code seems to be ACPI
intensive, and Rafael's commit also seems to say that this was
supposed to be handled by acpiphp.

2) If it is expected to continue using pciehp, may be we could handle
it in the same way as Rafael did for acpiphp. We could add a flag in
the pci_dev ("ignore_hp_events" or something) and set it for the hot
pluggable slot from radeon code, just like acpi_bus_no_hotplug() is
called today.

I'll be going out for vacation for the 3 days, and would be glad to
submit a patch if needed.

One question to the gentleman who bisected this. (SpacemanSpiff).
Would it be possible for you to look for the following messages while
trying out the image just before the commit 02e93a8a7c?
...
No adapter on slot(2)
...

Thanks & Best Regards,

Rajat Jain

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: pciehp regression (bug #79701)
  2014-08-30  2:25 ` Rajat Jain
  2014-08-30  2:36   ` Guenter Roeck
@ 2014-09-04 13:28   ` Rajat Jain
  1 sibling, 0 replies; 5+ messages in thread
From: Rajat Jain @ 2014-09-04 13:28 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Rafael J. Wysocki, Rajat Jain, Guenter Roeck

Updated

https://bugzilla.kernel.org/show_bug.cgi?id=79701

with the possible options to make forward progress.

Thanks,

Rajat


On Fri, Aug 29, 2014 at 7:25 PM, Rajat Jain <rajatxjain@gmail.com> wrote:
> Hello,
>
> On Fri, Aug 29, 2014 at 3:36 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> Hi Rajat,
>>
>> This bug report: https://bugzilla.kernel.org/show_bug.cgi?id=79701 was
>> bisected to 02e93a8a7c1d ("PCI: pciehp: Don't check adapter or latch
>> status while disabling").  Can you take a look?
>>
>
> Hi,
>
> I looked at this and wanted to share by observations:
>
> The Basic Issue
> ==============
> There are a bunch of quick hotplug events (unplug followed by the
> hot-plug) that are received by the hotplug driver. While both the
> hotplug drivers (pciehp and acpiphp) are fine with it, the radeon
> driver itself is probably not equipped enough to handle them so well?
> [   41.224428] trying to unbind memory from uninitialized GART !
>
> When acpiphp was being used
> =======================
> As Rafael mentions in this commit log, this is a problem with the VGA
> subsystem, that requires the hot-plug driver to ignore such hot-plug
> events associated with a slot that connects to such known Radeon
> controllers. This was done for acpiphp by introducing a "no_hotplug"
> flag for the ACPI:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f244d8b623dae7a7bc695b0336f67729b95a9736
> The above commit would fix the problem if the acpiphp is used, by
> ignoring the hot-plug events for that slot.
>
> Switch to using pciehp
> =================
> 1) For some reason, the system now seems to use pciehp for these slots
> instead of the acpiphp (can someone please tell if this looks OK? I
> only ask because I see the concerned Rafael's log getting printed that
> seems to indicate that he is expecting the acpiphp to control this
> slot?). But I also see that the pciehp has already grabbed the slot by
> the time this messages gets printed:
> [    4.419180] VGA switcheroo: detected switching method
> \_SB_.PCI0.VGA_.ATPX handle
>
> Even with using pciehp, things were still all right until the commit
> 02e93a8a7, beacuse the pciehp used to ignore the hot-unplug events
> (including loss-of-presence-detect and link-down) if (1) SURPRISE
> removal is not supported or (2) ADAPTER is not present (which is what
> this commit addresses). Thus the hot unplug event used to come, the
> pciehp_disable_slot() used to find no adapter and refused to do
> anything.
>
> Why problem started with pciehp
> ==========================
> Essentially the commits 02e93a8a7 and 2b3940b60 made the pciehp handle
> all hot-unplug events (loss-of-presence-detect and link-downs)
> irrespective of whether the the SURPRISE removal was supported or not,
> and also if ADAPTER is not present. Now, I would think that both these
> commits are still valid because it makes no sense to ignore an unplug
> event (and let the kernel continue with stale data structures) just
> because SURPRISE is not set, or the ADAPTER is not present (The latter
> is an even better reason to process the unplug event).
>
> My recommendations / Options
> ========================
> 1) I would first like an opinion on whether it is OK to see the pciehp
> handle these hotplug slots. The radeon code seems to be ACPI
> intensive, and Rafael's commit also seems to say that this was
> supposed to be handled by acpiphp.
>
> 2) If it is expected to continue using pciehp, may be we could handle
> it in the same way as Rafael did for acpiphp. We could add a flag in
> the pci_dev ("ignore_hp_events" or something) and set it for the hot
> pluggable slot from radeon code, just like acpi_bus_no_hotplug() is
> called today.
>
> I'll be going out for vacation for the 3 days, and would be glad to
> submit a patch if needed.
>
> One question to the gentleman who bisected this. (SpacemanSpiff).
> Would it be possible for you to look for the following messages while
> trying out the image just before the commit 02e93a8a7c?
> ...
> No adapter on slot(2)
> ...
>
> Thanks & Best Regards,
>
> Rajat Jain

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-09-04 13:28 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-29 22:36 pciehp regression (bug #79701) Bjorn Helgaas
2014-08-30  0:00 ` Rajat Jain
2014-08-30  2:25 ` Rajat Jain
2014-08-30  2:36   ` Guenter Roeck
2014-09-04 13:28   ` Rajat Jain

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.