* [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
@ 2017-03-30  1:17 Brian Norris
       [not found] ` <20170330011709.GA110687-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2017-04-10 23:35 ` Doug Anderson
  0 siblings, 2 replies; 18+ messages in thread
From: Brian Norris @ 2017-03-30  1:17 UTC (permalink / raw)
  To: linux-mmc, linux-rockchip
  Cc: Heiko Stuebner, amstan, Ziyuan Xu, Shawn Lin, Jaehoon Chung

Hi all,

I haven't managed to get as far as a bugfix for this, but I've bisected
some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq,
in particular). v4.9 works fine.

Issue #1 - eMMC complains periodically:

[    4.358135] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[    4.461466] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[    5.291450] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[    5.381471] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[   11.243337] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[   17.371628] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)

and if I stress it out at all (e.g., dd if=/dev/mmcblk2 bs=1M >
/dev/null), it will eventually croak:

[  359.916315] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[  360.071378] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 153
[  360.211351] mmcblk2: error -110 transferring data, sector 8644608, nr 2048, cmd response 0x900, card status 0x0
[  360.221936] mmcblk2: retrying using single block read
[  363.491362] mmcblk2: error -110 transferring data, sector 8646656, nr 2048, cmd response 0x900, card status 0x0
[  363.531569] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
[  363.596326] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[  363.612712] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 152
[  363.751351] mmcblk2: error -110 transferring data, sector 8646656, nr 2048, cmd response 0x900, card status 0x0
[  363.761938] mmcblk2: retrying using single block read
[  366.611356] INFO: task mmcqd/2boot1:92 blocked for more than 120 seconds.
[  366.618134]       Not tainted 4.10.0 #284
[  366.622146] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  366.629960] mmcqd/2boot1    D    0    92      2 0x00000000
[  366.635454] [<c07dc21c>] (__schedule) from [<c07dc4e0>] (schedule+0x90/0xa0)
[  366.642497] [<c07dc4e0>] (schedule) from [<c066e8b4>] (__mmc_claim_host+0xd4/0x19c)
[  366.650142] [<c066e8b4>] (__mmc_claim_host) from [<c066e9ac>] (mmc_get_card+0x30/0x34)
[  366.658056] [<c066e9ac>] (mmc_get_card) from [<c067fc8c>] (mmc_blk_issue_rq+0x64/0x48c)
[  366.666052] [<c067fc8c>] (mmc_blk_issue_rq) from [<c0680230>] (mmc_queue_thread+0x114/0x1b4)
[  366.674484] [<c0680230>] (mmc_queue_thread) from [<c023d1b0>] (kthread+0x128/0x144)
[  366.682134] [<c023d1b0>] (kthread) from [<c02076e8>] (ret_from_fork+0x14/0x2c)
...

Issue #2 - Wifi (via SDIO, mmc1) is completely dead:

[    1.444125] mmc_host mmc1: card is non-removable.
[    1.471368] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
[    1.619553] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[    1.881699] mmc1: new ultra high speed SDR104 SDIO card at address 0001
[   25.681172] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[   25.691666] mwifiex: rx work enabled, cpus 4
[   26.827000] mwifiex_sdio mmc1:0001:1: info: FW download over, size 800344 bytes
[   27.561352] mwifiex_sdio mmc1:0001:1: WLAN FW is active
[   33.585165] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[   37.651344] mwifiex_sdio mmc1:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id = 0xa9, act = 0x0
[   37.660122] mwifiex_sdio mmc1:0001:1: num_data_h2c_failure = 0
[   37.665951] mwifiex_sdio mmc1:0001:1: num_cmd_h2c_failure = 0
[   37.671688] mwifiex_sdio mmc1:0001:1: is_cmd_timedout = 1
[   37.677076] mwifiex_sdio mmc1:0001:1: num_tx_timeout = 0
[   37.682380] mwifiex_sdio mmc1:0001:1: last_cmd_index = 1
[   37.687681] mwifiex_sdio mmc1:0001:1: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00
[   37.695066] mwifiex_sdio mmc1:0001:1: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
[   37.702536] mwifiex_sdio mmc1:0001:1: last_cmd_resp_index = 0
[   37.708269] mwifiex_sdio mmc1:0001:1: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00
[   37.716087] mwifiex_sdio mmc1:0001:1: last_event_index = 0
[   37.721564] mwifiex_sdio mmc1:0001:1: last_event: 00 00 00 00 00 00 00 00 00 00
[   37.728857] mwifiex_sdio mmc1:0001:1: data_sent=1 cmd_sent=0
[   37.734508] mwifiex_sdio mmc1:0001:1: ps_mode=0 ps_state=0
[   37.740016] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
[   37.750268] mwifiex_sdio mmc1:0001:1: info: mwifiex_fw_dpc: unregister device

For either of these issues, if I simply revert the dw_mmc driver back to
its v4.9 version (but keep everything else at v4.10), things seem to
work fine.

At this point, I'm pretty sure that it's the runtime PM support added to
dw_mmc that causes the regression.

Any thoughts? I don't exactly plan on trying to debug a solution myself here,
but I thought I'd report it in case somebody else has ideas.
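
One way to probe the runtime PM theory without reverting anything might be to forbid runtime suspend for the controller through sysfs and see whether the failures go away. The following is only a sketch: the helper names `rpm_status` and `rpm_forbid` are made up for illustration, and the device path in the usage note is an assumption based on the `ff0f0000.dwmmc` name in the logs above. It relies on the generic `power/control` and `power/runtime_status` device attributes.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of any tool).

# Print the runtime PM control setting and current status of a device,
# given the path to its sysfs directory.
rpm_status() {
    dev=$1
    printf 'control=%s status=%s\n' \
        "$(cat "$dev/power/control")" \
        "$(cat "$dev/power/runtime_status")"
}

# Forbid runtime suspend for the device by writing "on" to power/control.
# On a real system this needs root.
rpm_forbid() {
    echo on > "$1/power/control"
}
```

On the board this would be used roughly as `rpm_forbid /sys/bus/platform/devices/ff0f0000.dwmmc` followed by `rpm_status` on the same path (path assumed, not verified). If the errors stop with runtime suspend forbidden, that would point at the runtime PM changes rather than something else in v4.10.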

Brian


* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
       [not found] ` <20170330011709.GA110687-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2017-03-30  1:32   ` Shawn Lin
  2017-03-30  1:42     ` Brian Norris
  0 siblings, 1 reply; 18+ messages in thread
From: Shawn Lin @ 2017-03-30  1:32 UTC (permalink / raw)
  To: Brian Norris, linux-mmc-u79uwXL29TY76Z2rM5mHXA,
	linux-rockchip-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: Jaehoon Chung, Ziyuan Xu, Heiko Stuebner, amstan-F7+t8E8rja9g9hUCZPvPmw

Hi Brian,

On 2017/3/30 9:17, Brian Norris wrote:
> Hi all,
>
> I haven't managed to get as far as a bugfix for this, but I've bisected
> some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq,
> in particular). v4.9 works fine.

Does your v4.10+ kernel  have these commits?

commit e9748e0364fe82dc037d22900ff13a62d04518bf
Author: Ziyuan Xu <xzy.xu-TNX95d0MmH7DzftRWevZcw@public.gmane.org>
Date:   Tue Jan 17 09:22:56 2017 +0800

     mmc: dw_mmc: force setup bus if active slots exist


commit df9bcc2bc0a1f8d2963bd916698268fb2470713b
Author: Joonyoung Shim <jy0922.shim-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
Date:   Fri Nov 25 12:47:15 2016 +0900

     mmc: dw_mmc: add missing codes for runtime resume


commit ce69e2fea093b7fa3991c87849c4955cd47796c9
Author: Shawn Lin <shawn.lin-TNX95d0MmH7DzftRWevZcw@public.gmane.org>
Date:   Tue Jan 17 09:22:55 2017 +0800

     mmc: dw_mmc: silent verbose log when calling from PM context


>
> Issue #1 - eMMC complains periodically:
>
> [    4.358135] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [    4.461466] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [    5.291450] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [    5.381471] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [   11.243337] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [   17.371628] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>
> and if I stress it out at all (e.g., dd if=/dev/mmcblk2 bs=1M >
> /dev/null), it will eventually croak:
>
> [  359.916315] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [  360.071378] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 153
> [  360.211351] mmcblk2: error -110 transferring data, sector 8644608, nr 2048, cmd response 0x900, card status 0x0
> [  360.221936] mmcblk2: retrying using single block read
> [  363.491362] mmcblk2: error -110 transferring data, sector 8646656, nr 2048, cmd response 0x900, card status 0x0
> [  363.531569] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
> [  363.596326] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [  363.612712] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 152
> [  363.751351] mmcblk2: error -110 transferring data, sector 8646656, nr 2048, cmd response 0x900, card status 0x0
> [  363.761938] mmcblk2: retrying using single block read
> [  366.611356] INFO: task mmcqd/2boot1:92 blocked for more than 120 seconds.
> [  366.618134]       Not tainted 4.10.0 #284
> [  366.622146] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  366.629960] mmcqd/2boot1    D    0    92      2 0x00000000
> [  366.635454] [<c07dc21c>] (__schedule) from [<c07dc4e0>] (schedule+0x90/0xa0)
> [  366.642497] [<c07dc4e0>] (schedule) from [<c066e8b4>] (__mmc_claim_host+0xd4/0x19c)
> [  366.650142] [<c066e8b4>] (__mmc_claim_host) from [<c066e9ac>] (mmc_get_card+0x30/0x34)
> [  366.658056] [<c066e9ac>] (mmc_get_card) from [<c067fc8c>] (mmc_blk_issue_rq+0x64/0x48c)
> [  366.666052] [<c067fc8c>] (mmc_blk_issue_rq) from [<c0680230>] (mmc_queue_thread+0x114/0x1b4)
> [  366.674484] [<c0680230>] (mmc_queue_thread) from [<c023d1b0>] (kthread+0x128/0x144)
> [  366.682134] [<c023d1b0>] (kthread) from [<c02076e8>] (ret_from_fork+0x14/0x2c)
> ...
>
> Issue #2 - Wifi (via SDIO, mmc1) is completely dead:
>
> [    1.444125] mmc_host mmc1: card is non-removable.
> [    1.471368] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
> [    1.619553] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [    1.881699] mmc1: new ultra high speed SDR104 SDIO card at address 0001
> [   25.681172] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [   25.691666] mwifiex: rx work enabled, cpus 4
> [   26.827000] mwifiex_sdio mmc1:0001:1: info: FW download over, size 800344 bytes
> [   27.561352] mwifiex_sdio mmc1:0001:1: WLAN FW is active
> [   33.585165] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [   37.651344] mwifiex_sdio mmc1:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id = 0xa9, act = 0x0
> [   37.660122] mwifiex_sdio mmc1:0001:1: num_data_h2c_failure = 0
> [   37.665951] mwifiex_sdio mmc1:0001:1: num_cmd_h2c_failure = 0
> [   37.671688] mwifiex_sdio mmc1:0001:1: is_cmd_timedout = 1
> [   37.677076] mwifiex_sdio mmc1:0001:1: num_tx_timeout = 0
> [   37.682380] mwifiex_sdio mmc1:0001:1: last_cmd_index = 1
> [   37.687681] mwifiex_sdio mmc1:0001:1: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00
> [   37.695066] mwifiex_sdio mmc1:0001:1: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
> [   37.702536] mwifiex_sdio mmc1:0001:1: last_cmd_resp_index = 0
> [   37.708269] mwifiex_sdio mmc1:0001:1: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00
> [   37.716087] mwifiex_sdio mmc1:0001:1: last_event_index = 0
> [   37.721564] mwifiex_sdio mmc1:0001:1: last_event: 00 00 00 00 00 00 00 00 00 00
> [   37.728857] mwifiex_sdio mmc1:0001:1: data_sent=1 cmd_sent=0
> [   37.734508] mwifiex_sdio mmc1:0001:1: ps_mode=0 ps_state=0
> [   37.740016] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [   37.750268] mwifiex_sdio mmc1:0001:1: info: mwifiex_fw_dpc: unregister device
>
> For either of these issues, if I simply revert the dw_mmc driver back to
> its v4.9 version (but keep everything else at v4.10), things seem to
> work fine.
>
> At this point, I'm pretty sure that it's the runtime PM support added to
> dw_mmc that cause the regression.
>
> Any thoughts? I don't exactly plan on trying to debug a solution myself here,
> but I thought I'd report it in case somebody else has ideas.
>
> Brian
>
>
>


* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
  2017-03-30  1:32   ` Shawn Lin
@ 2017-03-30  1:42     ` Brian Norris
  2017-03-30  2:18       ` Eddie Cai
  0 siblings, 1 reply; 18+ messages in thread
From: Brian Norris @ 2017-03-30  1:42 UTC (permalink / raw)
  To: Shawn Lin
  Cc: linux-mmc, linux-rockchip, Heiko Stuebner, amstan, Ziyuan Xu,
	Jaehoon Chung

On Thu, Mar 30, 2017 at 09:32:22AM +0800, Shawn Lin wrote:
> Hi Brian,
> 
> On 2017/3/30 9:17, Brian Norris wrote:
> >Hi all,
> >
> >I haven't managed to get as far as a bugfix for this, but I've bisected
> >some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq,
> >in particular). v4.9 works fine.
> 
> Does your v4.10+ kernel  have these commits?

By "4.10+", I meant that pure 4.10 is broken, as are all subsequent
versions (e.g., 4.11-rc1).

> commit e9748e0364fe82dc037d22900ff13a62d04518bf
> Author: Ziyuan Xu <xzy.xu@rock-chips.com>
> Date:   Tue Jan 17 09:22:56 2017 +0800
> 
>     mmc: dw_mmc: force setup bus if active slots exist
> 
> 
> commit df9bcc2bc0a1f8d2963bd916698268fb2470713b
> Author: Joonyoung Shim <jy0922.shim@samsung.com>
> Date:   Fri Nov 25 12:47:15 2016 +0900
> 
>     mmc: dw_mmc: add missing codes for runtime resume

'git describe' tells me these are in 4.10-rc1 and -rc6. So yes.

> commit ce69e2fea093b7fa3991c87849c4955cd47796c9
> Author: Shawn Lin <shawn.lin@rock-chips.com>
> Date:   Tue Jan 17 09:22:55 2017 +0800
> 
>     mmc: dw_mmc: silent verbose log when calling from PM context

'git describe' tells me this is in 4.11-rc1, so no.

Brian


* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
  2017-03-30  1:42     ` Brian Norris
@ 2017-03-30  2:18       ` Eddie Cai
  2017-03-30  2:53         ` Brian Norris
  0 siblings, 1 reply; 18+ messages in thread
From: Eddie Cai @ 2017-03-30  2:18 UTC (permalink / raw)
  To: Brian Norris
  Cc: Shawn Lin, Heiko Stuebner, Ziyuan Xu, linux-mmc, Jaehoon Chung,
	linux-rockchip, amstan

Hi Brian,
I tested on an rk3288 firefly reload with 4.11-rc4. It works fine.

2017-03-30 9:42 GMT+08:00 Brian Norris <briannorris@chromium.org>:
> On Thu, Mar 30, 2017 at 09:32:22AM +0800, Shawn Lin wrote:
>> Hi Brian,
>>
>> On 2017/3/30 9:17, Brian Norris wrote:
>> >Hi all,
>> >
>> >I haven't managed to get as far as a bugfix for this, but I've bisected
>> >some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq,
>> >in particular). v4.9 works fine.
>>
>> Does your v4.10+ kernel  have these commits?
>
> By "4.10+", I meant that pure 4.10 is broken, as are all subsequent
> versions (e.g., 4.11-rc1).
>
>> commit e9748e0364fe82dc037d22900ff13a62d04518bf
>> Author: Ziyuan Xu <xzy.xu@rock-chips.com>
>> Date:   Tue Jan 17 09:22:56 2017 +0800
>>
>>     mmc: dw_mmc: force setup bus if active slots exist
>>
>>
>> commit df9bcc2bc0a1f8d2963bd916698268fb2470713b
>> Author: Joonyoung Shim <jy0922.shim@samsung.com>
>> Date:   Fri Nov 25 12:47:15 2016 +0900
>>
>>     mmc: dw_mmc: add missing codes for runtime resume
>
> 'git describe' tells me these are in 4.10-rc1 and -rc6. So yes.
>
>> commit ce69e2fea093b7fa3991c87849c4955cd47796c9
>> Author: Shawn Lin <shawn.lin@rock-chips.com>
>> Date:   Tue Jan 17 09:22:55 2017 +0800
>>
>>     mmc: dw_mmc: silent verbose log when calling from PM context
>
> 'git describe' tells me this is in 4.11-rc1, so no.
>
> Brian
>


* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
  2017-03-30  2:18       ` Eddie Cai
@ 2017-03-30  2:53         ` Brian Norris
  2017-03-30  5:11           ` Jaehoon Chung
  0 siblings, 1 reply; 18+ messages in thread
From: Brian Norris @ 2017-03-30  2:53 UTC (permalink / raw)
  To: Eddie Cai
  Cc: Shawn Lin, Heiko Stuebner, Ziyuan Xu, linux-mmc, Jaehoon Chung,
	linux-rockchip, amstan, Kevin Mihelich

Hi Eddie,

On Thu, Mar 30, 2017 at 10:18:59AM +0800, Eddie Cai wrote:
> I test on rk3288 firefly reload with 4.11-rc4. It work fine.

OK, thanks for checking.

> > On Thu, Mar 30, 2017 at 09:32:22AM +0800, Shawn Lin wrote:
> >> Hi Brian,
> >>
> >> On 2017/3/30 9:17, Brian Norris wrote:
> >> >Hi all,
> >> >
> >> >I haven't managed to get as far as a bugfix for this, but I've bisected
> >> >some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq,
> >> >in particular). v4.9 works fine.

[...]

By the way, Kevin (CC'd) says he noticed similar Wifi issues on an
Exynos 5800 Peach chromebook, but not on an Exynos 5250 Snow chromebook.
I haven't picked these apart yet to see what the differences and
similarities are, but presumably it's not actually a Rockchip-specific
bug. Maybe related to the way power sequencing is plumbed for these, for
example?

Brian


* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
  2017-03-30  2:53         ` Brian Norris
@ 2017-03-30  5:11           ` Jaehoon Chung
  2017-04-06 22:04             ` Brian Norris
  0 siblings, 1 reply; 18+ messages in thread
From: Jaehoon Chung @ 2017-03-30  5:11 UTC (permalink / raw)
  To: Brian Norris, Eddie Cai
  Cc: Shawn Lin, Heiko Stuebner, Ziyuan Xu, linux-mmc, linux-rockchip,
	amstan, Kevin Mihelich

Hi,

On 03/30/2017 11:53 AM, Brian Norris wrote:
> Hi Eddie,
> 
> On Thu, Mar 30, 2017 at 10:18:59AM +0800, Eddie Cai wrote:
>> I test on rk3288 firefly reload with 4.11-rc4. It work fine.
> 
> OK, thanks for checking.
> 
>>> On Thu, Mar 30, 2017 at 09:32:22AM +0800, Shawn Lin wrote:
>>>> Hi Brian,
>>>>
>>>> On 2017/3/30 9:17, Brian Norris wrote:
>>>>> Hi all,
>>>>>
>>>>> I haven't managed to get as far as a bugfix for this, but I've bisected
>>>>> some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq,
>>>>> in particular). v4.9 works fine.
> 
> [...]
> 
> By the way, Kevin (CC'd) says he noticed similiar Wifi issues on an
> Exynos 5800 Peach chromebook, but not on an Exynos 5250 Snow chromebook.	
> I haven't picked these apart yet to see what the differences and
> similarities are, but presumably it's not actually a Rockchip-specific
> bug. Maybe related to the way power sequencing is plumbed for these, for
> example?

I'm not sure, but if card detection is done by polling, a timing issue could occur.

Best Regards,
Jaehoon Chung

> 
> Brian
> 
> 
> 



* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
  2017-03-30  5:11           ` Jaehoon Chung
@ 2017-04-06 22:04             ` Brian Norris
  2017-04-07  4:59               ` Jaehoon Chung
  2017-04-07  6:50               ` Shawn Lin
  0 siblings, 2 replies; 18+ messages in thread
From: Brian Norris @ 2017-04-06 22:04 UTC (permalink / raw)
  To: Jaehoon Chung
  Cc: Eddie Cai, Shawn Lin, Heiko Stuebner, Ziyuan Xu, linux-mmc,
	linux-rockchip, amstan, Kevin Mihelich, Doug Anderson

Hi,

On Thu, Mar 30, 2017 at 02:11:19PM +0900, Jaehoon Chung wrote:
> On 03/30/2017 11:53 AM, Brian Norris wrote:
> > On Thu, Mar 30, 2017 at 10:18:59AM +0800, Eddie Cai wrote:
> >> I test on rk3288 firefly reload with 4.11-rc4. It work fine.
> > 
> > OK, thanks for checking.
> > 
> >>> On Thu, Mar 30, 2017 at 09:32:22AM +0800, Shawn Lin wrote:
> >>>> Hi Brian,
> >>>>
> >>>> On 2017/3/30 9:17, Brian Norris wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> I haven't managed to get as far as a bugfix for this, but I've bisected
> >>>>> some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq,
> >>>>> in particular). v4.9 works fine.
> > 
> > [...]
> > 
> > By the way, Kevin (CC'd) says he noticed similiar Wifi issues on an
> > Exynos 5800 Peach chromebook, but not on an Exynos 5250 Snow chromebook.	
> > I haven't picked these apart yet to see what the differences and
> > similarities are, but presumably it's not actually a Rockchip-specific
> > bug. Maybe related to the way power sequencing is plumbed for these, for
> > example?
> 
> I'm not sure but if card-detecting is polling, the timing issue could be occurred.

I don't know much about MMC in general, nor about this driver. Any
chance you'd accept reverts of the patches in question though? This is a
huge regression, and there were only a few relevant changes that seem to
have triggered this. I can try to come up with something targeted, but
I'm not going to even try if that'd get rejected up front.

Brian


* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
  2017-04-06 22:04             ` Brian Norris
@ 2017-04-07  4:59               ` Jaehoon Chung
  2017-04-07  6:50               ` Shawn Lin
  1 sibling, 0 replies; 18+ messages in thread
From: Jaehoon Chung @ 2017-04-07  4:59 UTC (permalink / raw)
  To: Brian Norris
  Cc: Eddie Cai, Shawn Lin, Heiko Stuebner, Ziyuan Xu, linux-mmc,
	linux-rockchip, amstan, Kevin Mihelich, Doug Anderson

On 04/07/2017 07:04 AM, Brian Norris wrote:
> Hi,
> 
> On Thu, Mar 30, 2017 at 02:11:19PM +0900, Jaehoon Chung wrote:
>> On 03/30/2017 11:53 AM, Brian Norris wrote:
>>> On Thu, Mar 30, 2017 at 10:18:59AM +0800, Eddie Cai wrote:
>>>> I test on rk3288 firefly reload with 4.11-rc4. It work fine.
>>>
>>> OK, thanks for checking.
>>>
>>>>> On Thu, Mar 30, 2017 at 09:32:22AM +0800, Shawn Lin wrote:
>>>>>> Hi Brian,
>>>>>>
>>>>>> On 2017/3/30 9:17, Brian Norris wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I haven't managed to get as far as a bugfix for this, but I've bisected
>>>>>>> some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq,
>>>>>>> in particular). v4.9 works fine.
>>>
>>> [...]
>>>
>>> By the way, Kevin (CC'd) says he noticed similiar Wifi issues on an
>>> Exynos 5800 Peach chromebook, but not on an Exynos 5250 Snow chromebook.	
>>> I haven't picked these apart yet to see what the differences and
>>> similarities are, but presumably it's not actually a Rockchip-specific
>>> bug. Maybe related to the way power sequencing is plumbed for these, for
>>> example?
>>
>> I'm not sure but if card-detecting is polling, the timing issue could be occurred.
> 
> I don't know much about MMC in general, nor about this driver. Any
> chance you'd accept reverts of the patches in question though? This is a
> huge regression, and there were only a few relevant changes that seem to
> have triggered this. I can try to come up with something targeted, but
> I'm not going to even try if that'd get rejected up front.

Sure, if it's a big regression, we can revert the patches relevant to the problem.
After fixing it, we can re-apply them. Before reverting, I will try to fix it by next Wednesday.

Otherwise, I will revert them at that time.

Best Regards,
Jaehoon Chung

> 
> Brian
> 
> 
> 



* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
  2017-04-06 22:04             ` Brian Norris
  2017-04-07  4:59               ` Jaehoon Chung
@ 2017-04-07  6:50               ` Shawn Lin
  2017-04-07  7:38                 ` Jaehoon Chung
  1 sibling, 1 reply; 18+ messages in thread
From: Shawn Lin @ 2017-04-07  6:50 UTC (permalink / raw)
  To: Brian Norris, Jaehoon Chung
  Cc: Eddie Cai, Heiko Stuebner, Ziyuan Xu, linux-mmc, linux-rockchip,
	amstan, Kevin Mihelich, Doug Anderson

Hi Brian,

On 2017/4/7 6:04, Brian Norris wrote:
> Hi,
>
> On Thu, Mar 30, 2017 at 02:11:19PM +0900, Jaehoon Chung wrote:
>> On 03/30/2017 11:53 AM, Brian Norris wrote:
>>> On Thu, Mar 30, 2017 at 10:18:59AM +0800, Eddie Cai wrote:
>>>> I test on rk3288 firefly reload with 4.11-rc4. It work fine.
>>>
>>> OK, thanks for checking.
>>>
>>>>> On Thu, Mar 30, 2017 at 09:32:22AM +0800, Shawn Lin wrote:
>>>>>> Hi Brian,
>>>>>>
>>>>>> On 2017/3/30 9:17, Brian Norris wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I haven't managed to get as far as a bugfix for this, but I've bisected
>>>>>>> some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq,
>>>>>>> in particular). v4.9 works fine.
>>>
>>> [...]
>>>
>>> By the way, Kevin (CC'd) says he noticed similiar Wifi issues on an
>>> Exynos 5800 Peach chromebook, but not on an Exynos 5250 Snow chromebook.	
>>> I haven't picked these apart yet to see what the differences and
>>> similarities are, but presumably it's not actually a Rockchip-specific
>>> bug. Maybe related to the way power sequencing is plumbed for these, for
>>> example?
>>
>> I'm not sure but if card-detecting is polling, the timing issue could be occurred.
>
> I don't know much about MMC in general, nor about this driver. Any
> chance you'd accept reverts of the patches in question though? This is a
> huge regression, and there were only a few relevant changes that seem to
> have triggered this. I can try to come up with something targeted, but
> I'm not going to even try if that'd get rejected up front.

Until now, none of my (or my colleagues') Rockchip platforms have been able
to reproduce this issue, so I can't tell what exactly the problem is.
However, I noticed you mentioned that the Exynos platforms are also
affected by runtime PM in dw_mmc. I don't see dw_mmc-exynos enabling this
feature, so it looks quite odd to me!

Can I or Eddie get a Veyron board to help you debug it?

>
> Brian
>
>
>



* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
  2017-04-07  6:50               ` Shawn Lin
@ 2017-04-07  7:38                 ` Jaehoon Chung
  0 siblings, 0 replies; 18+ messages in thread
From: Jaehoon Chung @ 2017-04-07  7:38 UTC (permalink / raw)
  To: Shawn Lin, Brian Norris
  Cc: Eddie Cai, Heiko Stuebner, Ziyuan Xu, linux-mmc, linux-rockchip,
	amstan, Kevin Mihelich, Doug Anderson

On 04/07/2017 03:50 PM, Shawn Lin wrote:
> Hi Brian,
> 
> On 2017/4/7 6:04, Brian Norris wrote:
>> Hi,
>>
>> On Thu, Mar 30, 2017 at 02:11:19PM +0900, Jaehoon Chung wrote:
>>> On 03/30/2017 11:53 AM, Brian Norris wrote:
>>>> On Thu, Mar 30, 2017 at 10:18:59AM +0800, Eddie Cai wrote:
>>>>> I test on rk3288 firefly reload with 4.11-rc4. It work fine.
>>>>
>>>> OK, thanks for checking.
>>>>
>>>>>> On Thu, Mar 30, 2017 at 09:32:22AM +0800, Shawn Lin wrote:
>>>>>>> Hi Brian,
>>>>>>>
>>>>>>> On 2017/3/30 9:17, Brian Norris wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I haven't managed to get as far as a bugfix for this, but I've bisected
>>>>>>>> some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq,
>>>>>>>> in particular). v4.9 works fine.
>>>>
>>>> [...]
>>>>
>>>> By the way, Kevin (CC'd) says he noticed similiar Wifi issues on an
>>>> Exynos 5800 Peach chromebook, but not on an Exynos 5250 Snow chromebook.   
>>>> I haven't picked these apart yet to see what the differences and
>>>> similarities are, but presumably it's not actually a Rockchip-specific
>>>> bug. Maybe related to the way power sequencing is plumbed for these, for
>>>> example?
>>>
>>> I'm not sure but if card-detecting is polling, the timing issue could be occurred.
>>
>> I don't know much about MMC in general, nor about this driver. Any
>> chance you'd accept reverts of the patches in question though? This is a
>> huge regression, and there were only a few relevant changes that seem to
>> have triggered this. I can try to come up with something targeted, but
>> I'm not going to even try if that'd get rejected up front.
> 
> Untile now, none of my(and my colleagues') rockchip platforms are able
> to reproduce this issue, so I can't tell what exactly the problem is.
> However, I noticed you mentioned that the Exynos platforms are also
> affected by rpm of dwmmc. I don't see dw_mmc-exynos enable this feature,
> so it looks quite odd to me!

Well, the Exynos boards I have don't show a similar issue.
But I will test all Exynos boards to try to reproduce this.

> 
> Can I or Eddie get a Veyron board to help you debug it?
> 
>>
>> Brian
>>
>>
>>
> 
> 
> 
> 



* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
  2017-03-30  1:17 [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards Brian Norris
       [not found] ` <20170330011709.GA110687-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2017-04-10 23:35 ` Doug Anderson
  2017-04-11 10:21   ` Ulf Hansson
  2017-04-12  0:54   ` Shawn Lin
  1 sibling, 2 replies; 18+ messages in thread
From: Doug Anderson @ 2017-04-10 23:35 UTC (permalink / raw)
  To: Brian Norris
  Cc: linux-mmc, open list:ARM/Rockchip SoC...,
	Heiko Stuebner, Alexandru M Stan, Ziyuan Xu, Shawn Lin,
	Jaehoon Chung, kevin

Hi,

On Wed, Mar 29, 2017 at 6:17 PM, Brian Norris <briannorris@chromium.org> wrote:
> Hi all,
>
> I haven't managed to get as far as a bugfix for this, but I've bisected
> some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq,
> in particular). v4.9 works fine.

OK, I finally got everything up and running to test this too...


> Issue #1 - eMMC complains periodically:
>
> [    4.358135] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [    4.461466] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [    5.291450] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [    5.381471] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [   11.243337] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [   17.371628] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)

I don't believe that this is an error, actually.  Really we just need
to quiet this message or (since I've always found it useful) move it
to a different place.  I believe that with runtime PM we're
effectively turning the clock off whenever the MMC device isn't in
use.  On dw_mmc we have a "helpful" printout every time the clock is
changed, and that's what you're seeing here.

You can see with:

while true; do
  dd if=/dev/mmcblk2 of=/dev/null bs=512 count=1 iflag=direct;
  sleep .1;
done

...that you'll get a printout every 100ms.
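
To put a number on it, a small helper along these lines could tally the "Bus speed" printouts per host from a kernel log. This is a rough sketch, not anything from the driver: `count_bus_speed` is a made-up name, and it assumes the exact message format shown in the logs quoted above.

```shell
#!/bin/sh
# Hypothetical helper: count dw_mmc "Bus speed" printouts per MMC host.
# Reads a kernel log on stdin and prints "<host> <count>" lines, sorted.
count_bus_speed() {
    awk '/mmc_host mmc[0-9]+: Bus speed/ {
             # Find the "mmc_host" token; the next field is the host name
             # with a trailing colon (e.g. "mmc2:").
             for (i = 1; i <= NF; i++) {
                 if ($i == "mmc_host") {
                     host = $(i + 1)
                     sub(/:$/, "", host)
                     n[host]++
                     break
                 }
             }
         }
         END { for (h in n) print h, n[h] }' | sort
}
```

Running `dmesg | count_bus_speed` before and after the loop above should show the count for the eMMC host climbing by roughly one per iteration if the printouts really do track the clock being turned back on.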


Ah, looks like this is in:

ce69e2fea093 mmc: dw_mmc: silent verbose log when calling from PM context

...as pointed out by Shawn Lin.  So I think in 4.10 we can just ignore
those messages and they're good on 4.11.


> and if I stress it out at all (e.g., dd if=/dev/mmcblk2 bs=1M >
> /dev/null), it will eventually croak:
>
> [  359.916315] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [  360.071378] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 153
> [  360.211351] mmcblk2: error -110 transferring data, sector 8644608, nr 2048, cmd response 0x900, card status 0x0
> [  360.221936] mmcblk2: retrying using single block read
> [  363.491362] mmcblk2: error -110 transferring data, sector 8646656, nr 2048, cmd response 0x900, card status 0x0
> [  363.531569] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
> [  363.596326] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [  363.612712] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 152
> [  363.751351] mmcblk2: error -110 transferring data, sector 8646656, nr 2048, cmd response 0x900, card status 0x0
> [  363.761938] mmcblk2: retrying using single block read
> [  366.611356] INFO: task mmcqd/2boot1:92 blocked for more than 120 seconds.
> [  366.618134]       Not tainted 4.10.0 #284
> [  366.622146] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  366.629960] mmcqd/2boot1    D    0    92      2 0x00000000
> [  366.635454] [<c07dc21c>] (__schedule) from [<c07dc4e0>] (schedule+0x90/0xa0)
> [  366.642497] [<c07dc4e0>] (schedule) from [<c066e8b4>] (__mmc_claim_host+0xd4/0x19c)
> [  366.650142] [<c066e8b4>] (__mmc_claim_host) from [<c066e9ac>] (mmc_get_card+0x30/0x34)
> [  366.658056] [<c066e9ac>] (mmc_get_card) from [<c067fc8c>] (mmc_blk_issue_rq+0x64/0x48c)
> [  366.666052] [<c067fc8c>] (mmc_blk_issue_rq) from [<c0680230>] (mmc_queue_thread+0x114/0x1b4)
> [  366.674484] [<c0680230>] (mmc_queue_thread) from [<c023d1b0>] (kthread+0x128/0x144)
> [  366.682134] [<c023d1b0>] (kthread) from [<c02076e8>] (ret_from_fork+0x14/0x2c)

I'm not convinced this is a regression.

I remember Heiko saying that he's heard reports that on some boards
eMMC doesn't work with high speed, and I believe that's what you're
seeing here.  It would be interesting to try to debug this.  I can't
personally reproduce, though.

I think veyron_minnie already has UHS turned off for eMMC upstream.  I
guess we could do it for other veyron boards too until someone can
debug?


> Issue #2 - Wifi (via SDIO, mmc1) is completely dead:
>
> [    1.444125] mmc_host mmc1: card is non-removable.
> [    1.471368] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
> [    1.619553] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [    1.881699] mmc1: new ultra high speed SDR104 SDIO card at address 0001
> [   25.681172] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [   25.691666] mwifiex: rx work enabled, cpus 4
> [   26.827000] mwifiex_sdio mmc1:0001:1: info: FW download over, size 800344 bytes
> [   27.561352] mwifiex_sdio mmc1:0001:1: WLAN FW is active
> [   33.585165] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [   37.651344] mwifiex_sdio mmc1:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id = 0xa9, act = 0x0
> [   37.660122] mwifiex_sdio mmc1:0001:1: num_data_h2c_failure = 0
> [   37.665951] mwifiex_sdio mmc1:0001:1: num_cmd_h2c_failure = 0
> [   37.671688] mwifiex_sdio mmc1:0001:1: is_cmd_timedout = 1
> [   37.677076] mwifiex_sdio mmc1:0001:1: num_tx_timeout = 0
> [   37.682380] mwifiex_sdio mmc1:0001:1: last_cmd_index = 1
> [   37.687681] mwifiex_sdio mmc1:0001:1: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00
> [   37.695066] mwifiex_sdio mmc1:0001:1: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
> [   37.702536] mwifiex_sdio mmc1:0001:1: last_cmd_resp_index = 0
> [   37.708269] mwifiex_sdio mmc1:0001:1: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00
> [   37.716087] mwifiex_sdio mmc1:0001:1: last_event_index = 0
> [   37.721564] mwifiex_sdio mmc1:0001:1: last_event: 00 00 00 00 00 00 00 00 00 00
> [   37.728857] mwifiex_sdio mmc1:0001:1: data_sent=1 cmd_sent=0
> [   37.734508] mwifiex_sdio mmc1:0001:1: ps_mode=0 ps_state=0
> [   37.740016] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
> [   37.750268] mwifiex_sdio mmc1:0001:1: info: mwifiex_fw_dpc: unregister device

This doesn't surprise me at all.  What surprises me, though, is that
nobody else seems to be able to reproduce this.

On veyron, WiFi is connected via SDIO.  For good speed, it uses SDIO
Interrupts.  See this bit in the device tree:

  cap-sdio-irq;

SDIO interrupts (in 4-bit mode) specifically need the card clock to be
running all the time to work.  I can reproduce your regression (on
veyron-jerry, which also has Marvell WiFi) and I can also find that
the regression is "gone" if I take out the "cap-sdio-irq" in the
veyron device tree.  Ah, interestingly enough, turning off SDIO
interrupts has the side effect of sending enough (polling) traffic
that we never seem to runtime suspend, either.  ;-P

In general I'd question whether dw_mmc actually gets much power
benefit from Runtime PM in Linux.  The dw_mmc IP blocks already have a
feature in them to automatically stop and restart the card clock.  See
SDMMC_CLKEN_LOW_PWR.  Maybe you're getting the benefit of turning off
VMMC or VQMMC?  Is that really a lot of power?  Presumably those power
savings would be for eMMC or normal SD cards (not SDIO).

Maybe someone else on this thread knows how Runtime PM is supposed to
work in general for SDIO?  I notice that in sdio.c the
mmc_sdio_runtime_suspend() unconditionally calls mmc_power_off().
That seems odd since the main mmc_sdio_suspend() _doesn't_ call it if
mmc_card_keep_power().  Hrmmm...

OK, so I just tried this on veyron-minnie.  On minnie we have Broadcom
WiFi.  That actually works (!).  Presumably this is because
brcmf_sdiod_host_fixup() calls pm_runtime_forbid().  Commenting that
out breaks things.

OK, and I can make Marvell work by adding
"pm_runtime_forbid(func->card->host->parent);" to the end of
mwifiex_sdio_probe().
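
For a quick experiment without rebuilding a driver, roughly the same effect should be reachable from user space: per the kernel's runtime PM documentation, writing "on" to a device's power/control attribute causes pm_runtime_forbid() to be called for it, and writing "auto" allows runtime PM again. A minimal sketch, assuming the caller passes the sysfs directory of the SDIO host controller (the helper names are hypothetical, and the real device path on veyron is not given in this thread):

```shell
#!/bin/sh
# Forbid or re-allow runtime PM for one device through sysfs.
# $1: sysfs directory of the device (assumption: the dw_mmc host used for SDIO)
# $2: "on" (forbid runtime suspend) or "auto" (allow it again)
set_runtime_pm_control() {
    case "$2" in
        on|auto) ;;
        *) echo "usage: set_runtime_pm_control <sysfs-dir> on|auto" >&2
           return 2 ;;
    esac
    printf '%s\n' "$2" > "$1/power/control"
}

# Print the device's current runtime PM status (e.g. active, suspended).
show_runtime_pm_status() {
    cat "$1/power/runtime_status"
}
```

Checking runtime_status while WiFi is failing would show whether the host is actually suspended at that moment.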

--

So where does that leave us?

A) Technically we can fix Marvell's driver to work like Broadcom's.
One could possibly assert that this is the wrong fix because
technically we could make Runtime PM work with SDIO with enough work.
We could theoretically move into 1-bit mode and there (I think) you
can get interrupts with the clock off.  ...or we could have a
dedicated SDIO Interrupt pin (for the embedded case), which is talked
about in the SDIO spec.

B) Technically we could hack this in the dw_mmc code to disable
Runtime PM if we see that an SDIO interrupt is used.  One advantage of
doing it here is that if we ever add support in dw_mmc for the
external SDIO interrupt we could allow Runtime PM in that case.  In
theory the dw_mmc IP block has some basic support for a dedicated SDIO
interrupt pin, but there's no code to support it.

C) Technically we could add this into the MMC core.

D) Technically we could remove Runtime PM support from dw_mmc for now
until someone can address all these issues (and ideally show a real
power savings).


I'd tend to vote for D, but I've been pretty absent from dw_mmc for a
long time, so probably my vote isn't worth that much...

Shawn: I think you actually enabled runtime PM.  Did you really see
power savings, or did it just seem like enabling Runtime PM would be a
neat thing to do?


-Doug


* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
  2017-04-10 23:35 ` Doug Anderson
@ 2017-04-11 10:21   ` Ulf Hansson
  2017-04-11 22:57     ` Doug Anderson
  2017-04-12  0:54   ` Shawn Lin
  1 sibling, 1 reply; 18+ messages in thread
From: Ulf Hansson @ 2017-04-11 10:21 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Brian Norris, linux-mmc, open list:ARM/Rockchip SoC...,
	Heiko Stuebner, Alexandru M Stan, Ziyuan Xu, Shawn Lin,
	Jaehoon Chung, kevin

[...]

>
>> Issue #2 - Wifi (via SDIO, mmc1) is completely dead:
>>
>> [    1.444125] mmc_host mmc1: card is non-removable.
>> [    1.471368] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
>> [    1.619553] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>> [    1.881699] mmc1: new ultra high speed SDR104 SDIO card at address 0001
>> [   25.681172] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>> [   25.691666] mwifiex: rx work enabled, cpus 4
>> [   26.827000] mwifiex_sdio mmc1:0001:1: info: FW download over, size 800344 bytes
>> [   27.561352] mwifiex_sdio mmc1:0001:1: WLAN FW is active
>> [   33.585165] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>> [   37.651344] mwifiex_sdio mmc1:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id = 0xa9, act = 0x0
>> [   37.660122] mwifiex_sdio mmc1:0001:1: num_data_h2c_failure = 0
>> [   37.665951] mwifiex_sdio mmc1:0001:1: num_cmd_h2c_failure = 0
>> [   37.671688] mwifiex_sdio mmc1:0001:1: is_cmd_timedout = 1
>> [   37.677076] mwifiex_sdio mmc1:0001:1: num_tx_timeout = 0
>> [   37.682380] mwifiex_sdio mmc1:0001:1: last_cmd_index = 1
>> [   37.687681] mwifiex_sdio mmc1:0001:1: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00
>> [   37.695066] mwifiex_sdio mmc1:0001:1: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
>> [   37.702536] mwifiex_sdio mmc1:0001:1: last_cmd_resp_index = 0
>> [   37.708269] mwifiex_sdio mmc1:0001:1: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00
>> [   37.716087] mwifiex_sdio mmc1:0001:1: last_event_index = 0
>> [   37.721564] mwifiex_sdio mmc1:0001:1: last_event: 00 00 00 00 00 00 00 00 00 00
>> [   37.728857] mwifiex_sdio mmc1:0001:1: data_sent=1 cmd_sent=0
>> [   37.734508] mwifiex_sdio mmc1:0001:1: ps_mode=0 ps_state=0
>> [   37.740016] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>> [   37.750268] mwifiex_sdio mmc1:0001:1: info: mwifiex_fw_dpc: unregister device
>
> This doesn't surprise me at all.  What surprises me, though, is that
> nobody else seems to be able to reproduce this.

Me too.

As I have stated several times, the PM code for SDIO is fragile/broken
for many scenarios. This is just one case.

>
> On veyron, WiFi is connected via SDIO.  For good speed, it uses SDIO
> Interrupts.  See this bit in the device tree:
>
>   cap-sdio-irq;
>
> SDIO interrupts (in 4-bit mode) specifically need the card clock to be
> running all the time to work.  I can reproduce your regression (on
> veryron-jerry, which also has Marvell WiFi) and I can also find that
> the regression is "gone' if I take out the "cap-sdio-irq" in the
> veyron device tree.  Ah, interestingly enough, turning off SDIO
> interrupts has the side effect of sending enough (polling) traffic
> that we never seem to runtime suspend, either.  ;-P

We did a similar fix for sdhci recently. Simply, in cases when sdio
IRQ is turned on, we call pm_runtime_get_noresume() to prevent the
device from being runtime suspended.

Unless the SoC supports re-routing the SDIO IRQ to a wakeup IRQ at
runtime suspend, there is no other solution. However, the re-routing to
a wakeup IRQ should be done once the switch to 1-bit mode has
completed. This is currently not supported by the mmc core.

>
> In general I'd question whether dw_mmc actually gets much power
> benefit from Runtime PM in Linux.  The dw_mmc IP blocks already have a
> feature in them to automatically stop and restart the card clock.  See
> SDMMC_CLKEN_LOW_PWR.  Maybe you're getting the benefit of turning off
> VMMC or VQMMC?  Is that really a lot of power?  Presumably those power
> savings would be for eMMC or normal SD cards (not SDIO).

I think that depends on the PM topology of the SoC. Perhaps the dw_mmc
devices are in PM domains sharing power rails etc, and preventing
runtime PM could be very costly, as it then prevents those shared
resources from being put into a low power state.

I think the best we can do at this point is something similar as we do
for sdhci.

>
> Maybe someone else on this thread knows how Runtime PM is supposed to
> work in general for SDIO?  I notice that in sdio.c the
> mmc_sdio_runtime_suspend() unconditionally calls mmc_power_off().
> That seems odd since the main mmc_sdio_suspend() _doesn't_ call it if
> mmc_card_keep_power().  Hrmmm...

As stated above, the system PM and runtime PM code for SDIO is fragile
and needs an update. I know how to do it, but it requires some work. I
have started to hack on it several times, maybe I just need to put
everything else aside and focus on this. :-)

Moreover, I would really like to add a feature that defers system PM
resume of the SDIO card (in cases where it means a full re-init of the
SDIO card) to runtime PM resume instead. Why? It would save several
hundred milliseconds of system PM resume time. We already have a very
similar feature for SD/(e)MMC.

>
> OK, so I just tried this on veyron-minnie.  On minnie we have Broadcom
> WiFi.  That actually works (!).  Presumably this is because
> brcmf_sdiod_host_fixup() calls pm_runtime_forbid().  Commenting that
> out breaks things.

I think this should be managed in the dw_mmc driver instead, as it is
there the problem lies.

>
> OK, and I can make Marvell work by adding
> "pm_runtime_forbid(func->card->host->parent);" to the end of
> mwifiex_sdio_probe().

Again, dw_mmc is the correct place.

>
> --
>
> So where does that leave us?
>
> A) Technically we can fix Marvell's driver to work like Broadcom's.
> One could possibly assert that this is the wrong fix because
> technically we could make Runtime PM work with SDIO with enough work.
> We could theoretically move into 1-bit mode and there (I think) you
> can get interrupts with the clock off.  ...or we could have a
> dedicated SDIO Interrupt pin (for the embedded case), which is talked
> about in the SDIO spec.

If the WIFI chip supports an external SDIO irq pin, that is very much
preferred, both from a PM point of view and from a performance point
of view (mainly because it's faster to ack the IRQ).

That said, for these scenarios, I assume the switching to 1-bit mode
isn't necessary before gating the clock, as the IRQ is driven
completely separately from the SDIO bus. From my own experience, this
is how cw1200 WIFI chip behaves on ux500.

>
> B) Technically we could hack this in the dw_mmc code to disable
> Runtime PM if we see that an SDIO interrupt is used.  One advantage of
> doing it here is that if we ever add support in dw_mmc for the
> external SDIO interrupt we could allow Runtime PM in that case.  In
> theory the dw_mmc IP block has some basic support for a dedicated SDIO
> interrupt pin, but there's no code to support it.

Right.

>
> C) Technically we could add this into the MMC core.

Perhaps the MMC core needs to play a role, not sure exactly how yet.

>
> D) Technically we could remove Runtime PM support from dw_mmc for now
> until someone can address all these issues (and ideally show a real
> power savings).

No. Then it's better to just prevent runtime suspend when SDIO irq
becomes enabled.

>
>
> I'd tend to vote for D, but I've been pretty absent from dw_mmc for a
> long time, so probably my vote isn't worth that much...
>
> Shawn: I think you actually enabled runtime PM.  Did you really see
> power savings, or did it just seem like enabling Runtime PM would be a
> neat thing to do?
>
>
> -Doug

Doug, I really appreciate your efforts in testing this and the detailed
way you describe the problem.

Kind regards
Uffe


* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
  2017-04-11 10:21   ` Ulf Hansson
@ 2017-04-11 22:57     ` Doug Anderson
  0 siblings, 0 replies; 18+ messages in thread
From: Doug Anderson @ 2017-04-11 22:57 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Brian Norris, linux-mmc, open list:ARM/Rockchip SoC...,
	Heiko Stuebner, Alexandru M Stan, Ziyuan Xu, Shawn Lin,
	Jaehoon Chung, kevin

Hi,

On Tue, Apr 11, 2017 at 3:21 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> [...]
>
>>
>>> Issue #2 - Wifi (via SDIO, mmc1) is completely dead:
>>>
>>> [    1.444125] mmc_host mmc1: card is non-removable.
>>> [    1.471368] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
>>> [    1.619553] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>>> [    1.881699] mmc1: new ultra high speed SDR104 SDIO card at address 0001
>>> [   25.681172] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>>> [   25.691666] mwifiex: rx work enabled, cpus 4
>>> [   26.827000] mwifiex_sdio mmc1:0001:1: info: FW download over, size 800344 bytes
>>> [   27.561352] mwifiex_sdio mmc1:0001:1: WLAN FW is active
>>> [   33.585165] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>>> [   37.651344] mwifiex_sdio mmc1:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id = 0xa9, act = 0x0
>>> [   37.660122] mwifiex_sdio mmc1:0001:1: num_data_h2c_failure = 0
>>> [   37.665951] mwifiex_sdio mmc1:0001:1: num_cmd_h2c_failure = 0
>>> [   37.671688] mwifiex_sdio mmc1:0001:1: is_cmd_timedout = 1
>>> [   37.677076] mwifiex_sdio mmc1:0001:1: num_tx_timeout = 0
>>> [   37.682380] mwifiex_sdio mmc1:0001:1: last_cmd_index = 1
>>> [   37.687681] mwifiex_sdio mmc1:0001:1: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00
>>> [   37.695066] mwifiex_sdio mmc1:0001:1: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
>>> [   37.702536] mwifiex_sdio mmc1:0001:1: last_cmd_resp_index = 0
>>> [   37.708269] mwifiex_sdio mmc1:0001:1: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00
>>> [   37.716087] mwifiex_sdio mmc1:0001:1: last_event_index = 0
>>> [   37.721564] mwifiex_sdio mmc1:0001:1: last_event: 00 00 00 00 00 00 00 00 00 00
>>> [   37.728857] mwifiex_sdio mmc1:0001:1: data_sent=1 cmd_sent=0
>>> [   37.734508] mwifiex_sdio mmc1:0001:1: ps_mode=0 ps_state=0
>>> [   37.740016] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>>> [   37.750268] mwifiex_sdio mmc1:0001:1: info: mwifiex_fw_dpc: unregister device
>>
>> This doesn't surprise me at all.  What surprises me, though, is that
>> nobody else seems to be able to reproduce this.
>
> Me too.
>
> As I have stated several times, the PM code for SDIO is fragile/broken
> for many scenarios. This is just one case.
>
>>
>> On veyron, WiFi is connected via SDIO.  For good speed, it uses SDIO
>> Interrupts.  See this bit in the device tree:
>>
>>   cap-sdio-irq;
>>
>> SDIO interrupts (in 4-bit mode) specifically need the card clock to be
>> running all the time to work.  I can reproduce your regression (on
>> veryron-jerry, which also has Marvell WiFi) and I can also find that
>> the regression is "gone' if I take out the "cap-sdio-irq" in the
>> veyron device tree.  Ah, interestingly enough, turning off SDIO
>> interrupts has the side effect of sending enough (polling) traffic
>> that we never seem to runtime suspend, either.  ;-P
>
> We did a similar fix for sdhci recently. Simply, in cases when sdio
> IRQ is turned on, we call pm_runtime_get_noresume() to prevent the
> device from being runtime suspended.

I tried a similar mechanism for dw_mmc and it mostly worked.  Until I
stressed it out.  I stressed it out by running:

while true; do
  ifconfig mlan0 up;
  ifconfig mlan0 down;
done

When I did this long enough, I somehow managed to get into a state
where the card allowed itself to temporarily be runtime suspended.
This caused communication errors and eventually the mwifiex driver did
a full reset of itself.  I found that when the communication errors
were happening, we were runtime suspended.

For whatever reason this was easiest to reproduce when I added a
printk to a serial console in the enable_sdio_irq() callback, but I
could also reproduce by setting the autosuspend_delay_ms to 1 and
using pm_runtime_put() instead of pm_runtime_put_noidle() in my patch.


It looks like on dw_mmc enable_sdio_irq(0) / enable_sdio_irq(1) is
called almost constantly.  Without having boards with SDHCI and SDIO
to test with, it appears that this is different than how things work
with SDHCI.  For dw_mmc enable/disable is called in sdio_irq_thread()
to mask new interrupts while processing the current one.  SDHCI
doesn't use sdio_irq_thread() because it sets
MMC_CAP2_SDIO_IRQ_NOTHREAD.

Because of this constant stream of disable / enable calls we spend
some amount of time with Runtime PM enabled.  If we happen to have
enough delays (or printks) and we happen to get lucky then we can end
up running the PM Runtime suspend code for dw_mmc.  This is bad
because dw_mci_runtime_resume() fully resets the host controller and
clears all interrupts.  That doesn't seem so great, but I believe the
specific problem is that we might be clearing the next SDIO interrupt
which might have already come in (I haven't proven this).
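
The window described above can be illustrated with a toy model of the runtime PM usage counter. This is plain shell, not kernel code: the function and variable names are hypothetical, and it assumes the SDHCI-style scheme in which enabling the SDIO IRQ takes a usage reference and disabling it drops the reference again.

```shell
#!/bin/sh
# Toy model: a usage reference is held only while the SDIO IRQ is enabled.
usage_count=0       # models the runtime PM usage counter of the host
suspend_windows=0   # how often the counter dropped to zero

model_enable_sdio_irq() {
    if [ "$1" -eq 1 ]; then
        usage_count=$((usage_count + 1))   # models pm_runtime_get_noresume()
    else
        usage_count=$((usage_count - 1))   # models dropping the reference
        # With the counter at zero, nothing blocks a runtime suspend
        # until the IRQ is enabled again.
        if [ "$usage_count" -eq 0 ]; then
            suspend_windows=$((suspend_windows + 1))
        fi
    fi
}

# Models the mask / unmask pair around each handled interrupt.
# $1: number of interrupts to handle
model_handle_interrupts() {
    n=$1
    while [ "$n" -gt 0 ]; do
        model_enable_sdio_irq 0
        model_enable_sdio_irq 1
        n=$((n - 1))
    done
}
```

In this model every handled interrupt opens one window in which the counter is zero, which is consistent with the failure only showing up under stress or with extra delays.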

Overall we really just don't want any Runtime PM at all when SDIO
Interrupts are being used.  There's currently no callback into dw_mmc
that gets called when someone holds a SDIO IRQ.  I suppose we could
add one from sdio_claim_irq() / sdio_release_irq() if that was
desired...


We could also (in theory) get the core SDIO code to get / put Runtime
PM whenever it's currently processing an interrupt.  I coded this up
and I can post it if you want, but it feels a bit complicated.


Instead I'm thinking of using the same solution I came up with for
DW_MMC_CARD_NO_LOW_PWR in dw_mmc: putting this in dw_mci_init_card().
It's not 100% perfect if there are use cases where the SDIO interrupt
is really disabled for long periods of time (and we want to save
power) but seems like a good stopgap and eliminates this particular
regression quickly.

OK, I've posted that now.  https://patchwork.kernel.org/patch/9676197/

Whew, that took a lot longer to dig into than I originally thought it
would.  :-P


>> In general I'd question whether dw_mmc actually gets much power
>> benefit from Runtime PM in Linux.  The dw_mmc IP blocks already have a
>> feature in them to automatically stop and restart the card clock.  See
>> SDMMC_CLKEN_LOW_PWR.  Maybe you're getting the benefit of turning off
>> VMMC or VQMMC?  Is that really a lot of power?  Presumably those power
>> savings would be for eMMC or normal SD cards (not SDIO).
>
> I think that depends on the PM topology of the SoC. Perhaps the dw_mmc
> devices are in PM domains sharing power rails etc, and preventing
> runtime PM could be very costly as it then prevent those shared
> resources to be put into low power state.

Ah, good point.  I hadn't thought about the shared power domain case.



-Doug


* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
  2017-04-10 23:35 ` Doug Anderson
  2017-04-11 10:21   ` Ulf Hansson
@ 2017-04-12  0:54   ` Shawn Lin
  2017-04-12 16:12     ` Doug Anderson
  1 sibling, 1 reply; 18+ messages in thread
From: Shawn Lin @ 2017-04-12  0:54 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Brian Norris, linux-mmc, open list:ARM/Rockchip SoC...,
	Heiko Stuebner, Alexandru M Stan, Ziyuan Xu, Jaehoon Chung,
	kevin

Hi Doug,

On 2017/4/11 7:35, Doug Anderson wrote:
> Hi,
>
> On Wed, Mar 29, 2017 at 6:17 PM, Brian Norris <briannorris@chromium.org> wrote:
>> Hi all,
>>
>> I haven't managed to get as far as a bugfix for this, but I've bisected
>> some issues seen on v4.10+ with a Chromebook of the Veyron family (Jaq,
>> in particular). v4.9 works fine.
>
> OK, I finally got everything up and running to test this too...
>
>
>> Issue #1 - eMMC complains periodically:
>>
>> [    4.358135] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>> [    4.461466] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>> [    5.291450] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>> [    5.381471] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>> [   11.243337] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>> [   17.371628] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>
> I don't believe that this is an error, actually.  Really we just need
> to quiet this message or (since I've always found it useful) move it
> to a different place.  I believe that with runtime PM we're
> effectively turning the clock off whenever the MMC device isn't in
> use.  On dw_mmc we have a "helpful" printout every time the clock is
> changed, and that's what you're seeing here.
>
> You can see with:
>
> while true; do
>   dd if=/dev/mmcblk2 of=/dev/null bs=512 count=1 iflag=direct;
>   sleep .1;
> done
>
> ...that you'll get a printout every 100ms.
>
>
> Ah, looks like this is in:
>
> ce69e2fea093 mmc: dw_mmc: silent verbose log when calling from PM context
>
> ...as pointed out by Shawn Lin.  So I think in 4.10 we can just ignore
> those messages and they're good on 4.11.
>
>
>> and if I stress it out at all (e.g., dd if=/dev/mmcblk2 bs=1M >
>> /dev/null), it will eventually croak:
>>
>> [  359.916315] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>> [  360.071378] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 153
>> [  360.211351] mmcblk2: error -110 transferring data, sector 8644608, nr 2048, cmd response 0x900, card status 0x0
>> [  360.221936] mmcblk2: retrying using single block read
>> [  363.491362] mmcblk2: error -110 transferring data, sector 8646656, nr 2048, cmd response 0x900, card status 0x0
>> [  363.531569] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
>> [  363.596326] mmc_host mmc2: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>> [  363.612712] dwmmc_rockchip ff0f0000.dwmmc: Successfully tuned phase to 152
>> [  363.751351] mmcblk2: error -110 transferring data, sector 8646656, nr 2048, cmd response 0x900, card status 0x0
>> [  363.761938] mmcblk2: retrying using single block read
>> [  366.611356] INFO: task mmcqd/2boot1:92 blocked for more than 120 seconds.
>> [  366.618134]       Not tainted 4.10.0 #284
>> [  366.622146] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [  366.629960] mmcqd/2boot1    D    0    92      2 0x00000000
>> [  366.635454] [<c07dc21c>] (__schedule) from [<c07dc4e0>] (schedule+0x90/0xa0)
>> [  366.642497] [<c07dc4e0>] (schedule) from [<c066e8b4>] (__mmc_claim_host+0xd4/0x19c)
>> [  366.650142] [<c066e8b4>] (__mmc_claim_host) from [<c066e9ac>] (mmc_get_card+0x30/0x34)
>> [  366.658056] [<c066e9ac>] (mmc_get_card) from [<c067fc8c>] (mmc_blk_issue_rq+0x64/0x48c)
>> [  366.666052] [<c067fc8c>] (mmc_blk_issue_rq) from [<c0680230>] (mmc_queue_thread+0x114/0x1b4)
>> [  366.674484] [<c0680230>] (mmc_queue_thread) from [<c023d1b0>] (kthread+0x128/0x144)
>> [  366.682134] [<c023d1b0>] (kthread) from [<c02076e8>] (ret_from_fork+0x14/0x2c)
>
> I'm not convinced this is a regression.
>
> I remember Heiko saying that he's heard reports that on some boards
> eMMC doesn't work with high speed, and I'be believe that's what you're
> seeing here.  It would be interesting to try to debug this.  I can't
> personally reproduce, though.
>
> I think veyron_minnie already has UHS turned off for eMMC upstream.  I
> guess we could do it for other veyron boards too until someone can
> debug?
>
>
>> Issue #2 - Wifi (via SDIO, mmc1) is completely dead:
>>
>> [    1.444125] mmc_host mmc1: card is non-removable.
>> [    1.471368] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
>> [    1.619553] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>> [    1.881699] mmc1: new ultra high speed SDR104 SDIO card at address 0001
>> [   25.681172] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>> [   25.691666] mwifiex: rx work enabled, cpus 4
>> [   26.827000] mwifiex_sdio mmc1:0001:1: info: FW download over, size 800344 bytes
>> [   27.561352] mwifiex_sdio mmc1:0001:1: WLAN FW is active
>> [   33.585165] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>> [   37.651344] mwifiex_sdio mmc1:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id = 0xa9, act = 0x0
>> [   37.660122] mwifiex_sdio mmc1:0001:1: num_data_h2c_failure = 0
>> [   37.665951] mwifiex_sdio mmc1:0001:1: num_cmd_h2c_failure = 0
>> [   37.671688] mwifiex_sdio mmc1:0001:1: is_cmd_timedout = 1
>> [   37.677076] mwifiex_sdio mmc1:0001:1: num_tx_timeout = 0
>> [   37.682380] mwifiex_sdio mmc1:0001:1: last_cmd_index = 1
>> [   37.687681] mwifiex_sdio mmc1:0001:1: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00
>> [   37.695066] mwifiex_sdio mmc1:0001:1: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
>> [   37.702536] mwifiex_sdio mmc1:0001:1: last_cmd_resp_index = 0
>> [   37.708269] mwifiex_sdio mmc1:0001:1: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00
>> [   37.716087] mwifiex_sdio mmc1:0001:1: last_event_index = 0
>> [   37.721564] mwifiex_sdio mmc1:0001:1: last_event: 00 00 00 00 00 00 00 00 00 00
>> [   37.728857] mwifiex_sdio mmc1:0001:1: data_sent=1 cmd_sent=0
>> [   37.734508] mwifiex_sdio mmc1:0001:1: ps_mode=0 ps_state=0
>> [   37.740016] mmc_host mmc1: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)
>> [   37.750268] mwifiex_sdio mmc1:0001:1: info: mwifiex_fw_dpc: unregister device
>
> This doesn't surprise me at all.  What surprises me, though, is that
> nobody else seems to be able to reproduce this.
>
> On veyron, WiFi is connected via SDIO.  For good speed, it uses SDIO
> Interrupts.  See this bit in the device tree:
>
>   cap-sdio-irq;
>

All of *my* boards use a side-band interrupt, so none of them set
"cap-sdio-irq".

> SDIO interrupts (in 4-bit mode) specifically need the card clock to be
> running all the time to work.  I can reproduce your regression (on
> veryron-jerry, which also has Marvell WiFi) and I can also find that
> the regression is "gone' if I take out the "cap-sdio-irq" in the
> veyron device tree.  Ah, interestingly enough, turning off SDIO
> interrupts has the side effect of sending enough (polling) traffic
> that we never seem to runtime suspend, either.  ;-P
>
> In general I'd question whether dw_mmc actually gets much power
> benefit from Runtime PM in Linux.  The dw_mmc IP blocks already have a
> feature in them to automatically stop and restart the card clock.  See
> SDMMC_CLKEN_LOW_PWR.  Maybe you're getting the benefit of turning off
> VMMC or VQMMC?  Is that really a lot of power?  Presumably those power
> savings would be for eMMC or normal SD cards (not SDIO).
>
> Maybe someone else on this thread knows how Runtime PM is supposed to
> work in general for SDIO?  I notice that in sdio.c the
> mmc_sdio_runtime_suspend() unconditionally calls mmc_power_off().
> That seems odd since the main mmc_sdio_suspend() _doesn't_ call it if
> mmc_card_keep_power().  Hrmmm...
>
> OK, so I just tried this on veyron-minnie.  On minnie we have Broadcom
> WiFi.  That actually works (!).  Presumably this is because
> brcmf_sdiod_host_fixup() calls pm_runtime_forbid().  Commenting that
> out breaks things.
>
> OK, and I can make Marvell work by adding
> "pm_runtime_forbid(func->card->host->parent);" to the end of
> mwifiex_sdio_probe().
>
> --
>
> So where does that leave us?
>
> A) Technically we can fix Marvell's driver to work like Broadcom's.
> One could possibly assert that this is the wrong fix because
> technically we could make Runtime PM work with SDIO with enough work.
> We could theoretically move into 1-bit mode and there (I think) you
> can get interrupts with the clock off.  ...or we could have a
> dedicated SDIO Interrupt pin (for the embedded case), which is talked
> about in the SDIO spec.
>
> B) Technically we could hack this in the dw_mmc code to disable
> Runtime PM if we see that an SDIO interrupt is used.  One advantage of
> doing it here is that if we ever add support in dw_mmc for the
> external SDIO interrupt we could allow Runtime PM in that case.  In
> theory the dw_mmc IP block has some basic support for a dedicated SDIO
> interrupt pin, but there's no code to support it.
>
> C) Technically we could add this into the MMC core.
>
> D) Technically we could remove Runtime PM support from dw_mmc for now
> until someone can address all these issues (and ideally show a real
> power savings).
>
>
> I'd tend to vote for D, but I've been pretty absent from dw_mmc for a
> long time, so probably my vote isn't worth that much...
>
> Shawn: I think you actually enabled runtime PM.  Did you really see
> power savings, or did it just seem like enabling Runtime PM would be a
> neat thing to do?

As Ulf pointed out, the genpd for the mmc IP on Rockchip platforms is
shared with others, so it's worth adding runtime PM.


>
>
> -Doug
>
>
>



* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
  2017-04-12  0:54   ` Shawn Lin
@ 2017-04-12 16:12     ` Doug Anderson
  2017-04-13  7:17       ` Ulf Hansson
  2017-04-13  8:28       ` Shawn Lin
  0 siblings, 2 replies; 18+ messages in thread
From: Doug Anderson @ 2017-04-12 16:12 UTC (permalink / raw)
  To: Shawn Lin
  Cc: Brian Norris, linux-mmc, open list:ARM/Rockchip SoC...,
	Heiko Stuebner, Alexandru M Stan, Ziyuan Xu, Jaehoon Chung,
	kevin

Shawn

On Tue, Apr 11, 2017 at 5:54 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>> This doesn't surprise me at all.  What surprises me, though, is that
>> nobody else seems to be able to reproduce this.
>>
>> On veyron, WiFi is connected via SDIO.  For good speed, it uses SDIO
>> Interrupts.  See this bit in the device tree:
>>
>>   cap-sdio-irq;
>>
>
> all of *my* boards are using side-band interrupt, so there are no
> "cap-sdio-irq".

They are all using side-band interrupt?  What WiFi device do you have connected?

If you're truly using a side-band interrupt using the dedicated SDIO
interrupt pin on your SoC, I'm pretty sure you still need to define
cap-sdio-irq in order for things to work properly.  If you don't do
that, you'll get "polling mode" for SDIO Interrupts.  See
sdio_irq_thread() where you can see that the kernel will poll your
device every 10 ms if MMC_CAP_SDIO_IRQ isn't set.

Maybe you should try defining cap-sdio-irq and see if you get a big
performance boost?
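
The polling decision described above can be modeled in userspace
roughly as follows. This is only a sketch: the MODEL_* constants and
the model_* function names are made up for illustration and are not
kernel identifiers. The 10 ms figure is the one quoted above; the real
logic lives in sdio_irq_thread() and may differ in detail.

```c
#include <assert.h>

/* Hypothetical stand-in for the host capability bit MMC_CAP_SDIO_IRQ. */
#define MODEL_CAP_SDIO_IRQ (1u << 0)

/* Sentinel meaning "sleep until the host signals an interrupt". */
#define MODEL_WAIT_FOREVER (-1)

/* Polling interval mentioned in the thread, in milliseconds. */
#define MODEL_POLL_MS 10

/*
 * How long the IRQ thread sleeps between checks of the card: forever
 * if the host can signal SDIO interrupts, else the polling interval.
 */
static int model_irq_wait_ms(unsigned int host_caps)
{
	if (host_caps & MODEL_CAP_SDIO_IRQ)
		return MODEL_WAIT_FOREVER;
	return MODEL_POLL_MS;
}

/* Rough number of times per second the card gets polled. */
static int model_polls_per_second(unsigned int host_caps)
{
	int wait = model_irq_wait_ms(host_caps);

	if (wait == MODEL_WAIT_FOREVER)
		return 0;
	assert(wait > 0);	/* a polling interval must be positive */
	return 1000 / wait;
}
```

With no capability bit set, this model polls the card about 100 times
a second, which is the overhead being asked about here.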


* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
  2017-04-12 16:12     ` Doug Anderson
@ 2017-04-13  7:17       ` Ulf Hansson
  2017-04-13 15:45         ` Doug Anderson
  2017-04-13  8:28       ` Shawn Lin
  1 sibling, 1 reply; 18+ messages in thread
From: Ulf Hansson @ 2017-04-13  7:17 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Shawn Lin, Brian Norris, linux-mmc, open list:ARM/Rockchip SoC...,
	Heiko Stuebner, Alexandru M Stan, Ziyuan Xu, Jaehoon Chung,
	kevin

On 12 April 2017 at 18:12, Doug Anderson <dianders@google.com> wrote:
> Shawn
>
> On Tue, Apr 11, 2017 at 5:54 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>>> This doesn't surprise me at all.  What surprises me, though, is that
>>> nobody else seems to be able to reproduce this.
>>>
>>> On veyron, WiFi is connected via SDIO.  For good speed, it uses SDIO
>>> Interrupts.  See this bit in the device tree:
>>>
>>>   cap-sdio-irq;
>>>
>>
>> all of *my* boards are using side-band interrupt, so there are no
>> "cap-sdio-irq".
>
> They are all using side-band interrupt?  What WiFi device do you have connected?
>
> If you're truly using a side-band interrupt using the dedicated SDIO
> interrupt pin on your SoC, I'm pretty sure you still need to define
> cap-sdio-irq in order for things to work properly.  If you don't do
> that, you'll get "polling mode" for SDIO Interrupts.  See
> sdio_irq_thread() where you can see that the kernel will poll your
> device every 10 ms if MMC_CAP_SDIO_IRQ isn't set.

In these cases I would expect the WiFi driver to deal with the SDIO
IRQ itself rather than requesting it by calling sdio_claim_irq().
Because of this, there should be no polling performed by the
sdio_irq_thread.
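
A rough userspace model of that point, assuming the IRQ thread is only
started when the first SDIO function claims the in-band interrupt
through the core. The struct and function names below are hypothetical
and only mirror the idea; they are not the kernel's own code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down view of the host state that matters here. */
struct model_host {
	int sdio_irqs;			/* functions that claimed the IRQ via the core */
	bool irq_thread_running;	/* the thread that would poll or wait */
};

/* Models a function driver claiming the in-band IRQ through the core. */
static void model_claim_irq(struct model_host *host)
{
	/* The first claim starts the IRQ thread. */
	if (host->sdio_irqs++ == 0)
		host->irq_thread_running = true;
}

/* Models the matching release; the last release stops the thread. */
static void model_release_irq(struct model_host *host)
{
	assert(host->sdio_irqs > 0);
	if (--host->sdio_irqs == 0)
		host->irq_thread_running = false;
}
```

A driver that services its own side-band GPIO interrupt never calls
the claim path in this model, so the thread is never started and
nothing polls the card.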

>
> Maybe you should try defining cap-sdio-irq and see if you get a big
> performance boost?

No, that seems like a bad idea. I think it would rather add overhead -
decreasing performance. Likely it will also make us wake up the mmc
host from its low power state, when it actually isn't needed.

Kind regards
Uffe


* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
  2017-04-12 16:12     ` Doug Anderson
  2017-04-13  7:17       ` Ulf Hansson
@ 2017-04-13  8:28       ` Shawn Lin
  1 sibling, 0 replies; 18+ messages in thread
From: Shawn Lin @ 2017-04-13  8:28 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Brian Norris, linux-mmc, open list:ARM/Rockchip SoC...,
	Heiko Stuebner, Alexandru M Stan, Ziyuan Xu, Jaehoon Chung,
	kevin

Hi,

On 2017/4/13 0:12, Doug Anderson wrote:
> Shawn
>
> On Tue, Apr 11, 2017 at 5:54 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>>> This doesn't surprise me at all.  What surprises me, though, is that
>>> nobody else seems to be able to reproduce this.
>>>
>>> On veyron, WiFi is connected via SDIO.  For good speed, it uses SDIO
>>> Interrupts.  See this bit in the device tree:
>>>
>>>   cap-sdio-irq;
>>>
>>
>> all of *my* boards are using side-band interrupt, so there are no
>> "cap-sdio-irq".
>
> They are all using side-band interrupt?  What WiFi device do you have connected?

I'm using brcm wifi with out-of-tree drivers.

>
> If you're truly using a side-band interrupt using the dedicated SDIO
> interrupt pin on your SoC, I'm pretty sure you still need to define

Not really. The intention of using a side-band interrupt is that we can
put the host into a low power mode (maybe with the pd off) while the
WiFi still works with the SoC. Mostly, we also don't need to keep the
controller on when in S3. The side-band IO can be registered as a GPIO
interrupt (wakeup source), and once the WiFi chip needs to communicate
with the SoC, it can wake up the system (the SDIO controller will of
course be alive by then). Also, when using a side-band interrupt, the
interrupt servicing and management should be done by the WiFi function
drivers. I'm pretty sure that the drivers I have at hand, for instance
brcm and realtek, actually do that.

> cap-sdio-irq in order for things to work properly.  If you don't do
> that, you'll get "polling mode" for SDIO Interrupts.  See
> sdio_irq_thread() where you can see that the kernel will poll your
> device every 10 ms if MMC_CAP_SDIO_IRQ isn't set.
>
> Maybe you should try defining cap-sdio-irq and see if you get a big
> performance boost?

Sorry, I didn't test the upstream WiFi drivers, but from testing my
out-of-tree WiFi drivers, there is not much difference.

>
>
>



* Re: [REGRESSION 4.10] dw_mmc: failures on Rockchip rk3288 veyron boards
  2017-04-13  7:17       ` Ulf Hansson
@ 2017-04-13 15:45         ` Doug Anderson
  0 siblings, 0 replies; 18+ messages in thread
From: Doug Anderson @ 2017-04-13 15:45 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Shawn Lin, Brian Norris, linux-mmc, open list:ARM/Rockchip SoC...,
	Heiko Stuebner, Alexandru M Stan, Ziyuan Xu, Jaehoon Chung,
	kevin

Hi,

On Thu, Apr 13, 2017 at 12:17 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 12 April 2017 at 18:12, Doug Anderson <dianders@google.com> wrote:
>> Shawn
>>
>> On Tue, Apr 11, 2017 at 5:54 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>>>> This doesn't surprise me at all.  What surprises me, though, is that
>>>> nobody else seems to be able to reproduce this.
>>>>
>>>> On veyron, WiFi is connected via SDIO.  For good speed, it uses SDIO
>>>> Interrupts.  See this bit in the device tree:
>>>>
>>>>   cap-sdio-irq;
>>>>
>>>
>>> all of *my* boards are using side-band interrupt, so there are no
>>> "cap-sdio-irq".
>>
>> They are all using side-band interrupt?  What WiFi device do you have connected?
>>
>> If you're truly using a side-band interrupt using the dedicated SDIO
>> interrupt pin on your SoC, I'm pretty sure you still need to define
>> cap-sdio-irq in order for things to work properly.  If you don't do
>> that, you'll get "polling mode" for SDIO Interrupts.  See
>> sdio_irq_thread() where you can see that the kernel will poll your
>> device every 10 ms if MMC_CAP_SDIO_IRQ isn't set.
>
> In these cases I would expect the WiFi driver to deal with the SDIO
> IRQ itself rather than requesting it by calling sdio_claim_irq().
> Because of this, there should be no polling performed by the
> sdio_irq_thread.

You're the boss here, but that's not how I envisioned it if I ever
found time to dig deeper.

My vision of the world is probably colored by the dw_mmc IP block,
though.  Both of the two SoC families that I've dealt with that have
dw_mmc (both Exynos and Rockchip) have always had a pin that could be
muxed as "SDIO Interrupt".  If you choose this pinmux, my
understanding is that it will assert the dw_mmc's normal SDIO
interrupt in the IP block.  The DesignWare datasheet talks about this
in terms of eSDIO.  It does talk a little bit about the fact that this
method of interrupting can happen even when the card clock is off.

Given that this concept seems generic, is directly supported by the
dw_mmc hardware, and is talked about in the dw_mmc datasheet, it seems
as if dw_mmc would be the place to deal with it.

If other controllers don't support this concept in a generic way, I
see no reason why we still couldn't handle it in a generic way (via a
GPIO) at the mmc level rather than forcing each WiFi driver to invent
this themselves.

...but obviously I haven't worked through all the details and have
never actually coded this up successfully.

>> Maybe you should try defining cap-sdio-irq and see if you get a big
>> performance boost?
>
> No, that seems like a bad idea. I think it would rather add overhead -
> decreasing performance. Likely it will also make us wake up the mmc
> host from its low power state, when it actually isn't needed.

Sounds like Shawn is using an out-of-tree driver, but if it's anything
like the in-tree driver then there's no Runtime PM anyway.  See the
pm_runtime_forbid() in brcmf_sdiod_probe().
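
To spell out why a forbid call means "no Runtime PM", here is a small
userspace model. It assumes that forbidding takes a usage-count
reference and resumes the device, and that a device with a non-zero
usage count is never runtime suspended. All names are hypothetical;
this is a sketch of the semantics, not the PM core's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down runtime PM state of one device. */
struct model_dev {
	int usage_count;	/* references holding the device awake */
	bool runtime_auto;	/* true while runtime PM may act on its own */
	bool suspended;
};

/* Models forbidding runtime PM for the device. */
static void model_forbid(struct model_dev *dev)
{
	if (!dev->runtime_auto)
		return;			/* already forbidden, nothing to do */
	dev->runtime_auto = false;
	dev->usage_count++;		/* hold a reference until allowed again */
	dev->suspended = false;		/* forbidding also resumes the device */
}

/* Models an idle-time suspend attempt; returns true if it suspended. */
static bool model_try_suspend(struct model_dev *dev)
{
	assert(dev->usage_count >= 0);
	if (dev->usage_count > 0)
		return false;
	dev->suspended = true;
	return true;
}
```

In this model, once the WiFi driver has forbidden runtime PM on the
host's parent device, every later suspend attempt fails, so the host
never enters the low power state in the first place.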

-Doug


