From: Jon Hunter <jonathanh@nvidia.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Linux PCI <linux-pci@vger.kernel.org>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Linux PM <linux-pm@vger.kernel.org>,
	Linux ACPI <linux-acpi@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Mika Westerberg <mika.westerberg@linux.intel.com>,
	Keith Busch <kbusch@kernel.org>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>,
	linux-tegra <linux-tegra@vger.kernel.org>
Subject: Re: [PATCH v2] PCI: PM: Skip devices in D0 for suspend-to-idle
Date: Tue, 25 Jun 2019 13:46:04 +0100
Message-ID: <b562e5b6-68f5-d267-1529-72b881006534@nvidia.com>
In-Reply-To: <CAJZ5v0hdtXqoK84DpYtyMSCnkR9zOHFiUPAzWZDtkFmEjyWD1g@mail.gmail.com>


On 24/06/2019 22:37, Rafael J. Wysocki wrote:
> On Mon, Jun 24, 2019 at 2:43 PM Jon Hunter <jonathanh@nvidia.com> wrote:
>>
>> Hi Rafael,
>>
>> On 13/06/2019 22:59, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>
>>> Commit d491f2b75237 ("PCI: PM: Avoid possible suspend-to-idle issue")
>>> attempted to avoid a problem with devices whose drivers want them to
>>> stay in D0 over suspend-to-idle and resume, but it did not go as far
>>> as it should with that.
>>>
>>> Namely, first of all, the power state of a PCI bridge with a
>>> downstream device in D0 must be D0 (based on the PCI PM spec r1.2,
>>> sec 6, table 6-1, if the bridge is not in D0, there can be no PCI
>>> transactions on its secondary bus), but that is not actively enforced
>>> during system-wide PM transitions, so use the skip_bus_pm flag
>>> introduced by commit d491f2b75237 for that.
>>>
>>> Second, the configuration of devices left in D0 (whatever the reason)
>>> during suspend-to-idle need not be changed and attempting to put them
>>> into D0 again by force is pointless, so explicitly avoid doing that.
>>>
>>> Fixes: d491f2b75237 ("PCI: PM: Avoid possible suspend-to-idle issue")
>>> Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>>> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>
>> I have noticed a regression in both the mainline and -next branches on
>> one of our boards when testing suspend. The bisect is pointing to this
>> commit, and reverting it on top of mainline does fix the problem. So far
>> I have not looked at this in close detail, but the kernel log is showing ...
> 
> Can you please collect a log like that, but with dynamic debug in
> pci-driver.c enabled?
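
For anyone wanting to reproduce this, enabling those messages could look
roughly like the sketch below. It assumes a kernel built with
CONFIG_DYNAMIC_DEBUG and debugfs mounted at /sys/kernel/debug; the helper
names build_query() and apply_query() are hypothetical and only
illustrate writing a "file pci-driver.c +p" query to the dynamic debug
control file (which needs root).

```python
from pathlib import Path

# Assumed location of the dynamic debug control file (debugfs mounted
# at /sys/kernel/debug). Adjust if debugfs lives elsewhere.
DEFAULT_CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")


def build_query(source_file, enable=True):
    """Build a dynamic debug query for all pr_debug/dev_dbg sites in a file."""
    flag = "+p" if enable else "-p"
    return f"file {source_file} {flag}"


def apply_query(query, control=DEFAULT_CONTROL):
    """Write one query to the control file (requires root on a real system)."""
    with open(control, "w") as f:
        f.write(query + "\n")


# Typical use before triggering suspend (not run here, needs root):
#   apply_query(build_query("pci-driver.c"))
```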

Yes, here you go ...

[   52.939258] PM: suspend entry (deep)
[   52.942963] Filesystems sync: 0.000 seconds
[   52.947596] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   52.956145] OOM killer disabled.
[   52.959371] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   52.968088] printk: Suspending console(s) (use no_console_suspend to debug)
[   52.992168] r8169 0000:01:00.0 eth0: Link is Down
[   52.992245] pci_generic_config_write32: 22 callbacks suppressed
[   52.992250] pci_bus 0000:00: 2-byte config write to 0000:00:02.0 offset 0x9c may corrupt adjacent RW1C bits
[   53.204186] r8169 0000:01:00.0: PCI PM: Suspend power state: D3hot
[   53.204221] pcieport 0000:00:02.0: PCI PM: Suspend power state: D0
[   53.204224] pcieport 0000:00:02.0: PCI PM: Skipped
[   53.215716] Disabling non-boot CPUs ...
[   53.218833] Entering suspend state LP1
[   53.218860] Enabling non-boot CPUs ...
[   53.219731] CPU1 is up
[   53.220482] CPU2 is up
[   53.221289] CPU3 is up
[   53.221850] tegra-pcie 1003000.pcie: probing port 1, using 1 lanes
[   53.239925] pcieport 0000:00:02.0: nv_msi_ht_cap_quirk didn't locate host bridge
[   53.264145] r8169 0000:01:00.0: Refused to change power state, currently in D3
[   53.326969] tegra-pcie 1003000.pcie: Slot present pin change, signature: 00000004
[   53.326975] tegra-pcie 1003000.pcie: Response decoding error, signature: 10010045
[   53.326978] tegra-pcie 1003000.pcie:   FPCI address: fe10010044
[   53.327091] tegra-pcie 1003000.pcie: Response decoding error, signature: 2000000c
[   53.327095] tegra-pcie 1003000.pcie:   FPCI address:   2000000c
[   53.327099] tegra-pcie 1003000.pcie: Response decoding error, signature: 20000001
[   53.327102] tegra-pcie 1003000.pcie:   FPCI address:   20000000
[   53.347944] pci_bus 0000:00: 2-byte config write to 0000:00:02.0 offset 0x88 may corrupt adjacent RW1C bits
[   53.347955] pci_bus 0000:00: 2-byte config write to 0000:00:02.0 offset 0x90 may corrupt adjacent RW1C bits
[   53.347962] pci_bus 0000:00: 2-byte config write to 0000:00:02.0 offset 0x98 may corrupt adjacent RW1C bits
[   53.347969] pci_bus 0000:00: 2-byte config write to 0000:00:02.0 offset 0x9c may corrupt adjacent RW1C bits
[   53.347977] pci_bus 0000:00: 2-byte config write to 0000:00:02.0 offset 0xa8 may corrupt adjacent RW1C bits
[   53.347984] pci_bus 0000:00: 2-byte config write to 0000:00:02.0 offset 0xb0 may corrupt adjacent RW1C bits
[   53.348025] pci_bus 0000:00: 2-byte config write to 0000:00:02.0 offset 0x52 may corrupt adjacent RW1C bits
[   53.348033] pci_bus 0000:00: 2-byte config write to 0000:00:02.0 offset 0x52 may corrupt adjacent RW1C bits
[   53.348043] pci_bus 0000:00: 2-byte config write to 0000:00:02.0 offset 0x5c may corrupt adjacent RW1C bits
[   53.358310] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   53.358592] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.394498] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.394789] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.395072] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.395352] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.395635] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.395919] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.396209] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.396488] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.396771] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.397055] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.397330] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.397608] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.397884] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.398162] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.398441] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.398721] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.399006] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.399295] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   53.399579] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   54.234501] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   54.234819] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   54.235104] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   54.235386] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   54.235403] Generic Realtek PHY r8169-100:00: Master/Slave resolution failed, maybe conflicting manual settings?
[   54.235406] ------------[ cut here ]------------
[   54.235416] WARNING: CPU: 3 PID: 112 at /home/jonathanh/workdir/tegra/mlt-linux_torvalds/kernel/drivers/net/phy/phy.c:735 phy_error+0x1c/0x54
[   54.235419] Modules linked in: ttm
[   54.235429] CPU: 3 PID: 112 Comm: kworker/3:1 Not tainted 5.2.0-rc6-dirty #3
[   54.235431] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[   54.235441] Workqueue: events_power_efficient phy_state_machine
[   54.235455] [<c0112244>] (unwind_backtrace) from [<c010cad8>] (show_stack+0x10/0x14)
[   54.235463] [<c010cad8>] (show_stack) from [<c0a606a4>] (dump_stack+0xb4/0xc8)
[   54.235471] [<c0a606a4>] (dump_stack) from [<c0123cbc>] (__warn+0xe0/0xf8)
[   54.235477] [<c0123cbc>] (__warn) from [<c0123dec>] (warn_slowpath_null+0x40/0x48)
[   54.235482] [<c0123dec>] (warn_slowpath_null) from [<c0617470>] (phy_error+0x1c/0x54)
[   54.235488] [<c0617470>] (phy_error) from [<c0618564>] (phy_state_machine+0x64/0x1c0)
[   54.235498] [<c0618564>] (phy_state_machine) from [<c013e744>] (process_one_work+0x204/0x578)
[   54.235503] [<c013e744>] (process_one_work) from [<c013f444>] (worker_thread+0x44/0x584)
[   54.235507] [<c013f444>] (worker_thread) from [<c01445d4>] (kthread+0x148/0x150)
[   54.235512] [<c01445d4>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[   54.235515] Exception stack(0xe9ea1fb0 to 0xe9ea1ff8)
[   54.235518] 1fa0:                                     00000000 00000000 00000000 00000000
[   54.235522] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   54.235525] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   54.235528] ---[ end trace 772a7ce78ffff5e6 ]---
[   54.235551] r8169 0000:01:00.0 eth0: Link is Down
[   54.245804] r8169 0000:01:00.0 eth0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
[   54.256058] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   54.266257] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   54.276454] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   54.286656] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   54.296860] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   54.307064] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   54.317263] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   54.327464] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   54.337660] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   54.347902] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   54.358102] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   54.368303] r8169 0000:01:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   54.369471] r8169 0000:01:00.0 eth0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
[   54.370637] r8169 0000:01:00.0 eth0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
[   54.371799] r8169 0000:01:00.0 eth0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
[   54.372961] r8169 0000:01:00.0 eth0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
[   54.373416] r8169 0000:01:00.0 eth0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[   54.716510] ata1: SATA link down (SStatus 0 SControl 300)
[   56.998780] OOM killer enabled.
[   57.001909] Restarting tasks ... done.
[   57.007392] PM: suspend exit
[   73.144767] nfs: server 192.168.99.1 not responding, still trying
[   77.624567] nfs: server 192.168.99.1 not responding, still trying
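
To make the relevant lines easier to pick out of a log like the one
above, here is a small sketch that pulls out the per-device "PCI PM"
messages and the "Refused to change power state" errors. The regular
expression and the summarize() helper are my own assumptions about the
message layout, based only on the lines shown here, not on any existing
tool.

```python
import re

# Matches lines such as:
#   [   53.204186] r8169 0000:01:00.0: PCI PM: Suspend power state: D3hot
LINE_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<driver>\S+)\s+"
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+(?P<msg>.*)$"
)

STATE_PREFIX = "PCI PM: Suspend power state: "


def summarize(log_text):
    """Return {device: info} for PCI PM and power-state error messages."""
    summary = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        msg = m.group("msg")
        is_state = msg.startswith(STATE_PREFIX)
        is_skipped = msg == "PCI PM: Skipped"
        is_refused = msg.startswith("Refused to change power state")
        if not (is_state or is_skipped or is_refused):
            continue
        entry = summary.setdefault(
            m.group("dev"),
            {"driver": m.group("driver"), "suspend_state": None,
             "skipped": False, "refused": False},
        )
        if is_state:
            entry["suspend_state"] = msg[len(STATE_PREFIX):]
        elif is_skipped:
            entry["skipped"] = True
        else:
            entry["refused"] = True
    return summary


# Lines copied from the log above.
SAMPLE = """\
[   53.204186] r8169 0000:01:00.0: PCI PM: Suspend power state: D3hot
[   53.204221] pcieport 0000:00:02.0: PCI PM: Suspend power state: D0
[   53.204224] pcieport 0000:00:02.0: PCI PM: Skipped
[   53.264145] r8169 0000:01:00.0: Refused to change power state, currently in D3
"""

for dev, info in sorted(summarize(SAMPLE).items()):
    print(dev, info)
```

On the sample lines this shows the root port 0000:00:02.0 being left in
D0 and skipped, while the r8169 device behind it goes to D3hot and later
refuses to change power state on resume.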

Cheers
Jon

-- 
nvpublic
