linux-renesas-soc.vger.kernel.org archive mirror
* Re: [PATCH] scsi: sd: Revert "Rework asynchronous resume support"
From: Geert Uytterhoeven @ 2022-08-26  7:54 UTC
  To: Bart Van Assche
  Cc: Hans de Goede, Martin K . Petersen, scsi, Damien Le Moal,
	Hannes Reinecke, gzhqyz, James E.J. Bottomley, Linux-Renesas

Hi Bart,

On Tue, Aug 23, 2022 at 8:10 PM Bart Van Assche <bvanassche@acm.org> wrote:
> On 8/22/22 23:41, Geert Uytterhoeven wrote:
> > A lock-up (magic sysrq does not work) during s2idle.
> > I tried bisecting it yesterday, but failed.
> > On v6.0-rc1 (and rc2) it happens ca. 25% of the time, but the closer
> > I get to v5.19, the less likely it is to happen. Apparently 100
> > successful s2idle cycles was not enough to declare a kernel good...
> >
> >      Freezing ...
> >      Filesystems sync: 0.001 seconds
> >      Freezing user space processes ... (elapsed 0.001 seconds) done.
> >      OOM killer disabled.
> >      Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> >      sd 0:0:0:0: [sda] Synchronizing SCSI cache
> >      sd 0:0:0:0: [sda] Stopping disk
> >
> > ---> hangs here if it happens
> >
> >      ravb e6800000.ethernet eth0: Link is Down
> >      sd 0:0:0:0: [sda] Starting disk
> >      Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached PHY driver (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=186)
> >      ata1: link resume succeeded after 1 retries
> >      ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >      ata1.00: configured for UDMA/133
> >      OOM killer enabled.
> >      Restarting tasks ... done.
> >      random: crng reseeded on system resumption
> >      PM: suspend exit
> >      ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
>
> I'm not sure that is enough information to find the root cause. How

Sorry for not making it clear I didn't expect this to be enough
information.

> about enabling the tp_printk boot option and to enable tracing for
> suspend/resume operations, e.g. as follows?
>
> cd /sys/kernel/tracing &&
> echo 256 > /sys/kernel/tracing/buffer_size_kb &&
> echo nop > current_tracer &&
> echo > trace &&
> echo 1 > events/power/device_pm_callback_start/enable &&
> echo 1 > events/power/device_pm_callback_end/enable &&
> echo 1 > events/power/suspend_resume/enable &&
> echo 1 > tracing_on

Thanks, that generates lots of output (362 KiB/cycle)!
Unfortunately enabling tracing also reduces the probability of lock-ups:
combined with 'scsi: sd: Revert "Rework asynchronous resume support"',
s2idle now works almost always.

I did manage to trigger the lock-up once with tracing enabled:

     device_pm_callback_end: gpio_rcar e6055400.gpio, err=0
     device_pm_callback_start: gpio_rcar e6055800.gpio, parent: soc, noirq power domain [suspend]
     device_pm_callback_end: gpio_rcar e6055800.gpio, err=0
     device_pm_callback_start: renesas-cpg-mssr e6150000.clock-controller, parent: soc, noirq driver [suspend]
     device_pm_callback_end: renesas-cpg-mssr e6150000.clock-controller, err=0
     device_pm_callback_start: sh-pfc e6060000.pinctrl, parent: soc, noirq driver [suspend]
     device_pm_callback_end: sh-pfc e6060000.pinctrl, err=0
     suspend_resume: dpm_suspend_noirq[2] end
     suspend_resume: machine_suspend[1] begin
     suspend_resume: timekeeping_freeze[5] begin

---> hang

     suspend_resume: timekeeping_freeze[0] end
     suspend_resume: machine_suspend[1] end
     suspend_resume: dpm_resume_noirq[16] begin
     device_pm_callback_start: sh-pfc e6060000.pinctrl, parent: soc, noirq driver [resume]
     device_pm_callback_end: sh-pfc e6060000.pinctrl, err=0
     device_pm_callback_start: renesas-cpg-mssr e6150000.clock-controller, parent: soc, noirq driver [resume]
     device_pm_callback_end: renesas-cpg-mssr e6150000.clock-controller, err=0
     device_pm_callback_start: gpio_rcar e6055800.gpio, parent: soc, noirq power domain [resume]

Oops, timers...

At least it's not related to SCSI ;-)
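
For anyone who wants to pick the stuck phase out of a long tp_printk
capture mechanically, here is a minimal sketch. It assumes the
suspend_resume events look like the lines quoted in this mail
("name[N] begin" / "name[N] end") and that the bracketed number may
differ between begin and end, so phases are matched by name only. The
function name open_phases is made up for illustration; it is not part
of any kernel tooling.

```python
import re

# Matches e.g. "suspend_resume: timekeeping_freeze[5] begin".
# The format is assumed from the trace lines quoted in this mail.
EVENT = re.compile(r"suspend_resume: (\w+)\[\d+\] (begin|end)")


def open_phases(lines):
    """Return the phases that logged 'begin' but never 'end', oldest first.

    Hypothetical helper: the bracketed number is ignored because it is
    not guaranteed to be the same on the begin and end events.
    """
    stack = []
    for line in lines:
        match = EVENT.search(line)
        if match is None:
            continue  # not a suspend_resume event
        name, edge = match.groups()
        if edge == "begin":
            stack.append(name)
        elif name in stack:
            # Close the most recent still-open phase with this name.
            idx = len(stack) - 1 - stack[::-1].index(name)
            del stack[idx]
        # An 'end' without a captured 'begin' is ignored.
    return stack


if __name__ == "__main__":
    captured = [
        "suspend_resume: dpm_suspend_noirq[2] end",
        "suspend_resume: machine_suspend[1] begin",
        "suspend_resume: timekeeping_freeze[5] begin",
    ]
    print(open_phases(captured))
```

Fed with the log up to the hang, the last entry of the returned list is
the innermost phase that was entered but never left.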

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
