From: Joakim Zhang <qiangqing.zhang@nxp.com>
To: Sean Nyekjaer <sean@geanix.com>,
"mkl@pengutronix.de" <mkl@pengutronix.de>,
"linux-can@vger.kernel.org" <linux-can@vger.kernel.org>
Cc: "wg@grandegger.com" <wg@grandegger.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
dl-linux-imx <linux-imx@nxp.com>,
"Martin Hundebøll" <martin@geanix.com>
Subject: RE: [PATCH REPOST 1/2] can: flexcan: fix deadlock when using self wakeup
Date: Tue, 20 Aug 2019 11:56:07 +0000
Message-ID: <DB7PR04MB4618DBF92BD6F24AF0CBD1C0E6AB0@DB7PR04MB4618.eurprd04.prod.outlook.com>
In-Reply-To: <DB7PR04MB4618A1F984F2281C66959B06E6AB0@DB7PR04MB4618.eurprd04.prod.outlook.com>
> -----Original Message-----
> From: Joakim Zhang
> Sent: August 20, 2019 19:25
> To: Sean Nyekjaer <sean@geanix.com>; mkl@pengutronix.de;
> linux-can@vger.kernel.org
> Cc: wg@grandegger.com; netdev@vger.kernel.org; dl-linux-imx
> <linux-imx@nxp.com>; Martin Hundebøll <martin@geanix.com>
> Subject: RE: [PATCH REPOST 1/2] can: flexcan: fix deadlock when using self
> wakeup
>
>
> > -----Original Message-----
> > From: Sean Nyekjaer <sean@geanix.com>
> > Sent: August 20, 2019 18:25
> > To: Joakim Zhang <qiangqing.zhang@nxp.com>; mkl@pengutronix.de;
> > linux-can@vger.kernel.org
> > Cc: wg@grandegger.com; netdev@vger.kernel.org; dl-linux-imx
> > <linux-imx@nxp.com>; Martin Hundebøll <martin@geanix.com>
> > Subject: Re: [PATCH REPOST 1/2] can: flexcan: fix deadlock when using
> > self wakeup
> >
> >
> >
> > On 16/08/2019 10.20, Joakim Zhang wrote:
> > > > As reported by Sean Nyekjaer below:
> > > > When suspending, when there is still CAN traffic on the interfaces
> > > the flexcan immediately wakes the platform again. As it should :-).
> > > But it throws this error msg:
> > > [ 3169.378661] PM: noirq suspend of devices failed
> > >
> > > On the way down to suspend the interface that throws the error
> > > message does call flexcan_suspend but fails to call flexcan_noirq_suspend.
> > > That means the flexcan_enter_stop_mode is called, but on the way out
> > > of suspend the driver only calls flexcan_resume and skips
> > > flexcan_noirq_resume, thus it doesn't call flexcan_exit_stop_mode.
> > > This leaves the flexcan in stop mode, and with the current driver it
> > > can't recover from this even with a soft reboot, it requires a hard reboot.
> > >
> > > > The best way to exit stop mode is in Wake Up interrupt context, and
> > > > then suspend() and resume() functions can be symmetric. However, stop
> > > > mode request and ack will be controlled by SCU (System Control Unit)
> > > > firmware (manage clock, power, stop mode, etc. by Cortex-M4 core) in
> > > > coming i.MX8 (QM/QXP). And SCU firmware interface can't be available
> > > > in interrupt context.
> > >
> > > > For compatibility, the wake up mechanism can't be symmetric, so we
> > > need in_stop_mode hack.
> > >
> > > Fixes: de3578c198c6 ("can: flexcan: add self wakeup support")
> > > Reported-by: Sean Nyekjaer <sean@geanix.com>
> > > Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
> > >
> >
> > Unfortunately it's still possible to reproduce the deadlock with this patch...
> >
> > [ 689.921717] flexcan: probe of 2094000.flexcan failed with error -110
> >
> > My test setup:
> > PC with CAN-USB dongle connected to can0 and can1.
> >
> > PC:
> > $ while true; do cansend can0 '123#DEADBEEF'; done
> >
> > iMX6ull:
> > root@iwg26:~# systemctl suspend
> >
> >
> > [ 365.858054] systemd[1]: Reached target Sleep.
> > root@iwg26:~# [ 365.939826] systemd[1]: Starting Suspend...
> > [ 366.115839] systemd-sleep[248]: Suspending system...
> > [ 366.517949] dpm_run_callback(): platform_pm_suspend+0x0/0x5c returns -110
> > [ 366.518249] PM: Device 2094000.flexcan failed to suspend: error -110
> > [ 366.518406] PM: Some devices failed to suspend, or early wake event detected
> > [ 366.732162] dpm_run_callback(): platform_pm_suspend+0x0/0x5c returns -110
> > [ 366.732285] PM: Device 2090000.flexcan failed to suspend: error -110
> > [ 366.732330] PM: Some devices failed to suspend, or early wake event detected
> > [ 366.890637] systemd-sleep[248]: System resumed.
>
> CAN1 and CAN0 both failed to suspend, then CAN0 and CAN1 were resumed,
> so CAN0/CAN1 can work fine.
>
> > [ 366.923062] systemd[1]: Started Suspend.
> > [ 366.942819] systemd[1]: sleep.target: Unit not needed anymore. Stopping.
> > [ 366.954791] systemd[1]: Stopped target Sleep.
> > [ 366.962402] systemd[1]: Reached target Suspend.
> > [ 366.977546] systemd-logind[135]: Operation 'sleep' finished.
> > [ 366.979194] systemd[1]: suspend.target: Unit not needed anymore. Stopping.
> > [ 366.993831] systemd[1]: Stopped target Suspend.
> > [ 367.139972] systemd-networkd[220]: usb0: Lost carrier
> > [ 367.294077] systemd-networkd[220]: usb0: Gained carrier
> >
> > root@iwg26:~# candump can0 | head -n 2
> >
> > can0 123 [4] DE AD BE EF
> > can0 123 [4] DE AD BE EF
> > root@iwg26:~# candump can1 | head -n 2
> >
> > can1 123 [4] DE AD BE EF
> > can1 123 [4] DE AD BE EF
> > root@iwg26:~# systemctl suspend
> >
> > root@iwg26:~# [ 385.106658] systemd[1]: Reached target Sleep.
> > [ 385.147602] systemd[1]: Starting Suspend...
> > [ 385.246421] systemd-sleep[260]: Suspending system...
> > [ 385.634733] dpm_run_callback(): platform_pm_suspend+0x0/0x5c returns -110
> > [ 385.634855] PM: Device 2090000.flexcan failed to suspend: error -110
> > [ 385.634897] PM: Some devices failed to suspend, or early wake event detected
> > [ 385.856251] PM: noirq suspend of devices failed
> > [ 385.998364] systemd-sleep[260]: System resumed.
>
> CAN0 failed to suspend and CAN1 failed in noirq suspend, then CAN1 and
> CAN0 were resumed, so CAN0/CAN1 can work fine.
If CAN0 failed to suspend, the system should resume once the suspend
phase has run for all devices and should never enter the noirq suspend
phase. So why is "PM: noirq suspend of devices failed" printed here?
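
The ordering expected above can be sketched with a small userspace model
in C. This is only an illustration and not kernel code: struct model_dev,
its fields and model_suspend_cycle() are names made up for this example,
the late/early phases are left out, and it assumes the PM core stops at
the first failing callback and only unwinds devices that completed the
phase that failed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device; none of these names exist in the kernel. */
struct model_dev {
	const char *name;
	bool fail_suspend;        /* suspend() callback reports an error */
	bool fail_noirq;          /* suspend_noirq() callback reports an error */
	int suspend_calls;
	int noirq_suspend_calls;
	int noirq_resume_calls;
	int resume_calls;
};

/*
 * Model one suspend attempt over n devices.
 * Returns true if the noirq suspend phase was entered at all.
 */
bool model_suspend_cycle(struct model_dev *devs, size_t n)
{
	size_t i, done;

	/* Phase 1: suspend() per device, stop at the first failure. */
	for (done = 0; done < n; done++) {
		devs[done].suspend_calls++;
		if (devs[done].fail_suspend)
			break;
	}
	if (done < n) {
		/* Unwind only the devices whose suspend() succeeded. */
		for (i = done; i-- > 0; )
			devs[i].resume_calls++;
		return false;	/* the noirq phase is never reached */
	}

	/* Phase 2: suspend_noirq() per device, stop at the first failure. */
	for (done = 0; done < n; done++) {
		devs[done].noirq_suspend_calls++;
		if (devs[done].fail_noirq)
			break;
	}

	/* resume_noirq() only for devices that completed suspend_noirq(). */
	for (i = done; i-- > 0; )
		devs[i].noirq_resume_calls++;

	/* resume() for every device, since all of them completed suspend(). */
	for (i = n; i-- > 0; )
		devs[i].resume_calls++;

	return true;
}
```

In this model, flagging the first device with fail_suspend makes
model_suspend_cycle() return false without touching any noirq counter.
Flagging only the second device with fail_noirq leaves that device with a
resume() call but no resume_noirq() call, which is the asymmetry the patch
description talks about.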
> > [ 386.023390] systemd[1]: Started Suspend.
> > [ 386.031570] systemd[1]: sleep.target: Unit not needed anymore. Stopping.
> > [ 386.055886] systemd[1]: Stopped target Sleep.
> > [ 386.061430] systemd[1]: Reached target Suspend.
> > [ 386.066142] systemd[1]: suspend.target: Unit not needed anymore. Stopping.
> > [ 386.112575] systemd-networkd[220]: usb0: Lost carrier
> > [ 386.116797] systemd-logind[135]: Operation 'sleep' finished.
> > [ 386.146161] systemd[1]: Stopped target Suspend.
> > [ 386.260866] systemd-networkd[220]: usb0: Gained carrier
> > root@iwg26:~# candump can0 | head -n 2
> > can0 123 [4] DE AD BE EF
> > can0 123 [4] DE AD BE EF
> > root@iwg26:~# candump can1 | head -n 2
> >
> > can1 123 [4] DE AD BE EF
> > can1 123 [4] DE AD BE EF
> > root@iwg26:~# systemctl suspend
> >
> > [ 396.919303] systemd[1]: Reached target Sleep.
> > root@iwg26:~# [ 396.964722] systemd[1]: Starting Suspend...
> > [ 397.067336] systemd-sleep[268]: Suspending system...
> > [ 397.574571] PM: noirq suspend of devices failed
> > [ 397.834731] PM: noirq suspend of devices failed
> > [ 397.807996] systemd-networkd[220]: usb0: Lost carrier
> > [ 398.156295] dpm_run_callback(): platform_pm_suspend+0x0/0x5c returns -110
> > [ 398.156339] PM: Device 2094000.flexcan failed to suspend: error -110
> > [ 398.156509] PM: Some devices failed to suspend, or early wake event detected
> > [ 398.053555] systemd-sleep[268]: Failed to write /sys/power/state: Device or resource busy
>
> But the log here is very strange and chaotic. It looks like CAN0 failed
> to suspend and was then resumed, so CAN0 can work fine.
> CAN1 failed in noirq suspend but was not resumed, so CAN1 is still in
> stop mode and cannot work. I think the noirq suspend failure of some
> other device may have broken the resume of CAN1.
>
> Could you do more debug to help locate the issue?
Even stranger: why does the log here show noirq suspend being entered first?
Best Regards,
Joakim Zhang
> > [ 398.074751] systemd[1]: systemd-suspend.service: Main process exited, code=exited, status=1/FAILURE
> > [ 398.076779] systemd[1]: systemd-suspend.service: Failed with result 'exit-code'.
> > [ 398.109255] systemd[1]: Failed to start Suspend.
> > [ 398.118704] systemd[1]: Dependency failed for Suspend.
> > [ 398.136283] systemd-logind[135]: Operation 'sleep' finished.
> > [ 398.137770] systemd[1]: suspend.target: Job suspend.target/start failed with result 'dependency'.
> > [ 398.139105] systemd[1]: sleep.target: Unit not needed anymore. Stopping.
> > [ 398.167590] systemd[1]: Stopped target Sleep.
> > [ 398.201558] systemd-networkd[220]: usb0: Gained carrier
>
> The log here is also strange.
>
> Best Regards,
> Joakim Zhang
> > root@iwg26:~# candump can0 | head -n 2
> > can0 123 [4] DE AD BE EF
> > can0 123 [4] DE AD BE EF
> > root@iwg26:~# candump can1 | head -n 2
> >
> > nothing on can1 anymore :-(
> >
> > root@iwg26:~# rmmod flexcan
> > [ 622.884746] systemd-networkd[220]: can1: Lost carrier
> > [ 623.046766] systemd-networkd[220]: can0: Lost carrier
> > root@iwg26:~# insmod /mnt/flexcan.ko
> > [ 628.323981] flexcan 2094000.flexcan: registering netdev failed
> >
> > and can1 fails to register with:
> > [ 628.347485] flexcan: probe of 2094000.flexcan failed with error -110
> >
> > /Sean