* Continuous SD IO causes hung task messages
@ 2019-08-27 13:43 Russell King - ARM Linux admin
2019-08-27 13:55 ` Ulf Hansson
0 siblings, 1 reply; 6+ messages in thread
From: Russell King - ARM Linux admin @ 2019-08-27 13:43 UTC (permalink / raw)
To: linux-arm-kernel, linux-kernel, linux-mmc
Hi,
While dd'ing the contents of a SD card, I get hung task timeout
messages as per below. However, the dd is making progress. Any
ideas?
Presumably, mmc_rescan doesn't get a look-in while IO is progressing
for the card?
ARM64 host, Macchiatobin, uSD card.
Thanks.
root@arm-d06300000000:~# dd if=/dev/mmcblk1 | md5sum
INFO: task kworker/2:1:52 blocked for more than 120 seconds.
Not tainted 5.2.0+ #309
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/2:1 D 0 52 2 0x00000028
Workqueue: events_freezable mmc_rescan
Call trace:
__switch_to+0xb0/0x198
__schedule+0x22c/0x604
schedule+0x38/0xc8
__mmc_claim_host+0xcc/0x1ec
mmc_get_card+0x30/0x3c
mmc_sd_detect+0x1c/0x78
mmc_rescan+0x1c4/0x35c
process_one_work+0x14c/0x408
worker_thread+0x140/0x3f4
kthread+0xfc/0x128
ret_from_fork+0x10/0x18
INFO: task kworker/2:1:52 blocked for more than 120 seconds.
Not tainted 5.2.0+ #309
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/2:1 D 0 52 2 0x00000028
Workqueue: events_freezable mmc_rescan
Call trace:
__switch_to+0xb0/0x198
__schedule+0x22c/0x604
schedule+0x38/0xc8
__mmc_claim_host+0xcc/0x1ec
mmc_get_card+0x30/0x3c
mmc_sd_detect+0x1c/0x78
mmc_rescan+0x1c4/0x35c
process_one_work+0x14c/0x408
worker_thread+0x140/0x3f4
kthread+0xfc/0x128
ret_from_fork+0x10/0x18
INFO: task kworker/2:1:52 blocked for more than 241 seconds.
Not tainted 5.2.0+ #309
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/2:1 D 0 52 2 0x00000028
Workqueue: events_freezable mmc_rescan
Call trace:
__switch_to+0xb0/0x198
__schedule+0x22c/0x604
schedule+0x38/0xc8
__mmc_claim_host+0xcc/0x1ec
mmc_get_card+0x30/0x3c
mmc_sd_detect+0x1c/0x78
mmc_rescan+0x1c4/0x35c
process_one_work+0x14c/0x408
worker_thread+0x140/0x3f4
kthread+0xfc/0x128
ret_from_fork+0x10/0x18
INFO: task kworker/2:1:52 blocked for more than 362 seconds.
Not tainted 5.2.0+ #309
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/2:1 D 0 52 2 0x00000028
Workqueue: events_freezable mmc_rescan
Call trace:
__switch_to+0xb0/0x198
__schedule+0x22c/0x604
schedule+0x38/0xc8
__mmc_claim_host+0xcc/0x1ec
mmc_get_card+0x30/0x3c
mmc_sd_detect+0x1c/0x78
mmc_rescan+0x1c4/0x35c
process_one_work+0x14c/0x408
worker_thread+0x140/0x3f4
kthread+0xfc/0x128
ret_from_fork+0x10/0x18
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Continuous SD IO causes hung task messages
2019-08-27 13:43 Continuous SD IO causes hung task messages Russell King - ARM Linux admin
@ 2019-08-27 13:55 ` Ulf Hansson
2019-08-27 14:36 ` Russell King - ARM Linux admin
0 siblings, 1 reply; 6+ messages in thread
From: Ulf Hansson @ 2019-08-27 13:55 UTC (permalink / raw)
To: Russell King - ARM Linux admin
Cc: Linux ARM, Linux Kernel Mailing List, linux-mmc
On Tue, 27 Aug 2019 at 15:43, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> Hi,
>
> While dd'ing the contents of a SD card, I get hung task timeout
> messages as per below. However, the dd is making progress. Any
> ideas?
>
> Presumably, mmc_rescan doesn't get a look-in while IO is progressing
> for the card?
Is it a regression?
There not much of recent mmc core and mmc block changes, that I can
think of at this point.
>
> ARM64 host, Macchiatobin, uSD card.
What mmc host driver is it? mmci?
Kind regards
Uffe
>
> Thanks.
>
> root@arm-d06300000000:~# dd if=/dev/mmcblk1 | md5sum
> INFO: task kworker/2:1:52 blocked for more than 120 seconds.
> Not tainted 5.2.0+ #309
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/2:1 D 0 52 2 0x00000028
> Workqueue: events_freezable mmc_rescan
> Call trace:
> __switch_to+0xb0/0x198
> __schedule+0x22c/0x604
> schedule+0x38/0xc8
> __mmc_claim_host+0xcc/0x1ec
> mmc_get_card+0x30/0x3c
> mmc_sd_detect+0x1c/0x78
> mmc_rescan+0x1c4/0x35c
> process_one_work+0x14c/0x408
> worker_thread+0x140/0x3f4
> kthread+0xfc/0x128
> ret_from_fork+0x10/0x18
> INFO: task kworker/2:1:52 blocked for more than 120 seconds.
> Not tainted 5.2.0+ #309
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/2:1 D 0 52 2 0x00000028
> Workqueue: events_freezable mmc_rescan
> Call trace:
> __switch_to+0xb0/0x198
> __schedule+0x22c/0x604
> schedule+0x38/0xc8
> __mmc_claim_host+0xcc/0x1ec
> mmc_get_card+0x30/0x3c
> mmc_sd_detect+0x1c/0x78
> mmc_rescan+0x1c4/0x35c
> process_one_work+0x14c/0x408
> worker_thread+0x140/0x3f4
> kthread+0xfc/0x128
> ret_from_fork+0x10/0x18
> INFO: task kworker/2:1:52 blocked for more than 241 seconds.
> Not tainted 5.2.0+ #309
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/2:1 D 0 52 2 0x00000028
> Workqueue: events_freezable mmc_rescan
> Call trace:
> __switch_to+0xb0/0x198
> __schedule+0x22c/0x604
> schedule+0x38/0xc8
> __mmc_claim_host+0xcc/0x1ec
> mmc_get_card+0x30/0x3c
> mmc_sd_detect+0x1c/0x78
> mmc_rescan+0x1c4/0x35c
> process_one_work+0x14c/0x408
> worker_thread+0x140/0x3f4
> kthread+0xfc/0x128
> ret_from_fork+0x10/0x18
> INFO: task kworker/2:1:52 blocked for more than 362 seconds.
> Not tainted 5.2.0+ #309
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/2:1 D 0 52 2 0x00000028
> Workqueue: events_freezable mmc_rescan
> Call trace:
> __switch_to+0xb0/0x198
> __schedule+0x22c/0x604
> schedule+0x38/0xc8
> __mmc_claim_host+0xcc/0x1ec
> mmc_get_card+0x30/0x3c
> mmc_sd_detect+0x1c/0x78
> mmc_rescan+0x1c4/0x35c
> process_one_work+0x14c/0x408
> worker_thread+0x140/0x3f4
> kthread+0xfc/0x128
> ret_from_fork+0x10/0x18
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Continuous SD IO causes hung task messages
2019-08-27 13:55 ` Ulf Hansson
@ 2019-08-27 14:36 ` Russell King - ARM Linux admin
2019-08-27 14:52 ` Russell King - ARM Linux admin
0 siblings, 1 reply; 6+ messages in thread
From: Russell King - ARM Linux admin @ 2019-08-27 14:36 UTC (permalink / raw)
To: Ulf Hansson; +Cc: linux-mmc, Linux Kernel Mailing List, Linux ARM
On Tue, Aug 27, 2019 at 03:55:23PM +0200, Ulf Hansson wrote:
> On Tue, 27 Aug 2019 at 15:43, Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> >
> > Hi,
> >
> > While dd'ing the contents of a SD card, I get hung task timeout
> > messages as per below. However, the dd is making progress. Any
> > ideas?
> >
> > Presumably, mmc_rescan doesn't get a look-in while IO is progressing
> > for the card?
>
> Is it a regression?
>
> There not much of recent mmc core and mmc block changes, that I can
> think of at this point.
No idea - I just repaired the SD socket after the D2 line became
disconnected, and decided to run that command as a test.
> > ARM64 host, Macchiatobin, uSD card.
>
> What mmc host driver is it? mmci?
sdhci-xenon.
I'm just trying with one CPU online, then I'll try with two. My
suspicion is that there's a problem in the ARM64 arch code where
unlocking a mutex doesn't get noticed on other CPUs.
Hmm, I thought I'd try bringing another CPU online, but it seems
like the ARM64 CPU hotplug code is broken:
[ 3552.029689] CPU1: shutdown
[ 3552.031099] psci: CPU1 killed.
[ 3949.835212] CPU1: failed to come online
[ 3949.837753] CPU1: failed in unknown state : 0x0
which means I can only take CPUs down, I can't bring them back
online without rebooting.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Continuous SD IO causes hung task messages
2019-08-27 14:36 ` Russell King - ARM Linux admin
@ 2019-08-27 14:52 ` Russell King - ARM Linux admin
2019-08-27 15:06 ` Russell King - ARM Linux admin
0 siblings, 1 reply; 6+ messages in thread
From: Russell King - ARM Linux admin @ 2019-08-27 14:52 UTC (permalink / raw)
To: Ulf Hansson; +Cc: linux-mmc, Linux Kernel Mailing List, Linux ARM
On Tue, Aug 27, 2019 at 03:36:34PM +0100, Russell King - ARM Linux admin wrote:
> On Tue, Aug 27, 2019 at 03:55:23PM +0200, Ulf Hansson wrote:
> > On Tue, 27 Aug 2019 at 15:43, Russell King - ARM Linux admin
> > <linux@armlinux.org.uk> wrote:
> > >
> > > Hi,
> > >
> > > While dd'ing the contents of a SD card, I get hung task timeout
> > > messages as per below. However, the dd is making progress. Any
> > > ideas?
> > >
> > > Presumably, mmc_rescan doesn't get a look-in while IO is progressing
> > > for the card?
> >
> > Is it a regression?
> >
> > There not much of recent mmc core and mmc block changes, that I can
> > think of at this point.
>
> No idea - I just repaired the SD socket after the D2 line became
> disconnected, and decided to run that command as a test.
>
> > > ARM64 host, Macchiatobin, uSD card.
> >
> > What mmc host driver is it? mmci?
>
> sdhci-xenon.
>
> I'm just trying with one CPU online, then I'll try with two. My
> suspicion is that there's a problem in the ARM64 arch code where
> unlocking a mutex doesn't get noticed on other CPUs.
>
> Hmm, I thought I'd try bringing another CPU online, but it seems
> like the ARM64 CPU hotplug code is broken:
>
> [ 3552.029689] CPU1: shutdown
> [ 3552.031099] psci: CPU1 killed.
> [ 3949.835212] CPU1: failed to come online
> [ 3949.837753] CPU1: failed in unknown state : 0x0
>
> which means I can only take CPUs down, I can't bring them back
> online without rebooting.
Okay, running on a single CPU shows no problems.
Running on four CPUs (as originally) shows that the kworker thread
_never_ gets scheduled, so the warning is not false.
With three CPUs, same problem.
root@arm-d06300000000:~# ps aux | grep ' D '
root 34 0.0 0.0 0 0 ? D 15:38 0:00 [kworker/1:1+events_freezable]
root@arm-d06300000000:~# cat /proc/34/sched
kworker/1:1 (34, #threads: 1)
-------------------------------------------------------------------
se.exec_start : 318689.992440
se.vruntime : 37750.882357
se.sum_exec_runtime : 9.421240
se.nr_migrations : 0
nr_switches : 1174
nr_voluntary_switches : 1171
nr_involuntary_switches : 3
se.load.weight : 1048576
se.runnable_weight : 1048576
se.avg.load_sum : 6
se.avg.runnable_load_sum : 6
se.avg.util_sum : 5170
se.avg.load_avg : 0
se.avg.runnable_load_avg : 0
se.avg.util_avg : 0
se.avg.last_update_time : 318689991680
se.avg.util_est.ewma : 10
se.avg.util_est.enqueued : 0
policy : 0
prio : 120
clock-delta : 0
The only thing that changes there is "clock-delta". When I kill the
dd, I get:
root@arm-d06300000000:~# cat /proc/34/sched
kworker/1:1 (34, #threads: 1)
-------------------------------------------------------------------
se.exec_start : 574025.791680
se.vruntime : 79996.657300
se.sum_exec_runtime : 10.916400
se.nr_migrations : 0
nr_switches : 1403
nr_voluntary_switches : 1400
nr_involuntary_switches : 3
se.load.weight : 1048576
se.runnable_weight : 1048576
se.avg.load_sum : 15
se.avg.runnable_load_sum : 15
se.avg.util_sum : 15007
se.avg.load_avg : 0
se.avg.runnable_load_avg : 0
se.avg.util_avg : 0
se.avg.last_update_time : 574025791488
se.avg.util_est.ewma : 10
se.avg.util_est.enqueued : 0
policy : 0
prio : 120
clock-delta : 40
so the thread makes forward progress.
Down to two CPUs:
root@arm-d06300000000:~# ps aux | grep ' D '
root 34 0.0 0.0 0 0 ? D 15:38 0:00 [kworker/1:1+events_freezable]
Same symptoms. dd and md5sum switch between CPU 0 and CPU1.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Continuous SD IO causes hung task messages
2019-08-27 14:52 ` Russell King - ARM Linux admin
@ 2019-08-27 15:06 ` Russell King - ARM Linux admin
2019-08-28 10:31 ` Ulf Hansson
0 siblings, 1 reply; 6+ messages in thread
From: Russell King - ARM Linux admin @ 2019-08-27 15:06 UTC (permalink / raw)
To: Ulf Hansson; +Cc: linux-mmc, Linux Kernel Mailing List, Linux ARM
On Tue, Aug 27, 2019 at 03:52:17PM +0100, Russell King - ARM Linux admin wrote:
> On Tue, Aug 27, 2019 at 03:36:34PM +0100, Russell King - ARM Linux admin wrote:
> > On Tue, Aug 27, 2019 at 03:55:23PM +0200, Ulf Hansson wrote:
> > > On Tue, 27 Aug 2019 at 15:43, Russell King - ARM Linux admin
> > > <linux@armlinux.org.uk> wrote:
> > > >
> > > > Hi,
> > > >
> > > > While dd'ing the contents of a SD card, I get hung task timeout
> > > > messages as per below. However, the dd is making progress. Any
> > > > ideas?
> > > >
> > > > Presumably, mmc_rescan doesn't get a look-in while IO is progressing
> > > > for the card?
> > >
> > > Is it a regression?
> > >
> > > There not much of recent mmc core and mmc block changes, that I can
> > > think of at this point.
> >
> > No idea - I just repaired the SD socket after the D2 line became
> > disconnected, and decided to run that command as a test.
> >
> > > > ARM64 host, Macchiatobin, uSD card.
> > >
> > > What mmc host driver is it? mmci?
> >
> > sdhci-xenon.
> >
> > I'm just trying with one CPU online, then I'll try with two. My
> > suspicion is that there's a problem in the ARM64 arch code where
> > unlocking a mutex doesn't get noticed on other CPUs.
> >
> > Hmm, I thought I'd try bringing another CPU online, but it seems
> > like the ARM64 CPU hotplug code is broken:
> >
> > [ 3552.029689] CPU1: shutdown
> > [ 3552.031099] psci: CPU1 killed.
> > [ 3949.835212] CPU1: failed to come online
> > [ 3949.837753] CPU1: failed in unknown state : 0x0
> >
> > which means I can only take CPUs down, I can't bring them back
> > online without rebooting.
>
> Okay, running on a single CPU shows no problems.
>
> Running on four CPUs (as originally) shows that the kworker thread
> _never_ gets scheduled, so the warning is not false.
>
> With three CPUs, same problem.
>
> root@arm-d06300000000:~# ps aux | grep ' D '
> root 34 0.0 0.0 0 0 ? D 15:38 0:00 [kworker/1:1+events_freezable]
> root@arm-d06300000000:~# cat /proc/34/sched
> kworker/1:1 (34, #threads: 1)
> -------------------------------------------------------------------
> se.exec_start : 318689.992440
> se.vruntime : 37750.882357
> se.sum_exec_runtime : 9.421240
> se.nr_migrations : 0
> nr_switches : 1174
> nr_voluntary_switches : 1171
> nr_involuntary_switches : 3
> se.load.weight : 1048576
> se.runnable_weight : 1048576
> se.avg.load_sum : 6
> se.avg.runnable_load_sum : 6
> se.avg.util_sum : 5170
> se.avg.load_avg : 0
> se.avg.runnable_load_avg : 0
> se.avg.util_avg : 0
> se.avg.last_update_time : 318689991680
> se.avg.util_est.ewma : 10
> se.avg.util_est.enqueued : 0
> policy : 0
> prio : 120
> clock-delta : 0
>
> The only thing that changes there is "clock-delta". When I kill the
> dd, I get:
>
> root@arm-d06300000000:~# cat /proc/34/sched
> kworker/1:1 (34, #threads: 1)
> -------------------------------------------------------------------
> se.exec_start : 574025.791680
> se.vruntime : 79996.657300
> se.sum_exec_runtime : 10.916400
> se.nr_migrations : 0
> nr_switches : 1403
> nr_voluntary_switches : 1400
> nr_involuntary_switches : 3
> se.load.weight : 1048576
> se.runnable_weight : 1048576
> se.avg.load_sum : 15
> se.avg.runnable_load_sum : 15
> se.avg.util_sum : 15007
> se.avg.load_avg : 0
> se.avg.runnable_load_avg : 0
> se.avg.util_avg : 0
> se.avg.last_update_time : 574025791488
> se.avg.util_est.ewma : 10
> se.avg.util_est.enqueued : 0
> policy : 0
> prio : 120
> clock-delta : 40
>
> so the thread makes forward progress.
>
> Down to two CPUs:
>
> root@arm-d06300000000:~# ps aux | grep ' D '
> root 34 0.0 0.0 0 0 ? D 15:38 0:00 [kworker/1:1+events_freezable]
>
> Same symptoms. dd and md5sum switch between CPU 0 and CPU1.
Hmm.
static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
mq->in_flight[issue_type] += 1;
get_card = (mmc_tot_in_flight(mq) == 1);
if (get_card)
mmc_get_card(card, &mq->ctx);
mmc_get_card() gets the host lock according to the card.
So, if we always have requests in flight (which is probably the case
here) we never drop the host lock, and mmc_rescan() never gets a look
in - hence blocking the kworker.
So this is a real issue with MMC, and not down to something in the
arch.
I suspect the reason that single-CPU doesn't show it is because it is
unable to keep multiple requests in flight.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Continuous SD IO causes hung task messages
2019-08-27 15:06 ` Russell King - ARM Linux admin
@ 2019-08-28 10:31 ` Ulf Hansson
0 siblings, 0 replies; 6+ messages in thread
From: Ulf Hansson @ 2019-08-28 10:31 UTC (permalink / raw)
To: Russell King - ARM Linux admin
Cc: linux-mmc, Linux Kernel Mailing List, Linux ARM
On Tue, 27 Aug 2019 at 17:06, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Tue, Aug 27, 2019 at 03:52:17PM +0100, Russell King - ARM Linux admin wrote:
> > On Tue, Aug 27, 2019 at 03:36:34PM +0100, Russell King - ARM Linux admin wrote:
> > > On Tue, Aug 27, 2019 at 03:55:23PM +0200, Ulf Hansson wrote:
> > > > On Tue, 27 Aug 2019 at 15:43, Russell King - ARM Linux admin
> > > > <linux@armlinux.org.uk> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > While dd'ing the contents of a SD card, I get hung task timeout
> > > > > messages as per below. However, the dd is making progress. Any
> > > > > ideas?
> > > > >
> > > > > Presumably, mmc_rescan doesn't get a look-in while IO is progressing
> > > > > for the card?
> > > >
> > > > Is it a regression?
> > > >
> > > > There not much of recent mmc core and mmc block changes, that I can
> > > > think of at this point.
> > >
> > > No idea - I just repaired the SD socket after the D2 line became
> > > disconnected, and decided to run that command as a test.
> > >
> > > > > ARM64 host, Macchiatobin, uSD card.
> > > >
> > > > What mmc host driver is it? mmci?
> > >
> > > sdhci-xenon.
> > >
> > > I'm just trying with one CPU online, then I'll try with two. My
> > > suspicion is that there's a problem in the ARM64 arch code where
> > > unlocking a mutex doesn't get noticed on other CPUs.
> > >
> > > Hmm, I thought I'd try bringing another CPU online, but it seems
> > > like the ARM64 CPU hotplug code is broken:
> > >
> > > [ 3552.029689] CPU1: shutdown
> > > [ 3552.031099] psci: CPU1 killed.
> > > [ 3949.835212] CPU1: failed to come online
> > > [ 3949.837753] CPU1: failed in unknown state : 0x0
> > >
> > > which means I can only take CPUs down, I can't bring them back
> > > online without rebooting.
> >
> > Okay, running on a single CPU shows no problems.
> >
> > Running on four CPUs (as originally) shows that the kworker thread
> > _never_ gets scheduled, so the warning is not false.
> >
> > With three CPUs, same problem.
> >
> > root@arm-d06300000000:~# ps aux | grep ' D '
> > root 34 0.0 0.0 0 0 ? D 15:38 0:00 [kworker/1:1+events_freezable]
> > root@arm-d06300000000:~# cat /proc/34/sched
> > kworker/1:1 (34, #threads: 1)
> > -------------------------------------------------------------------
> > se.exec_start : 318689.992440
> > se.vruntime : 37750.882357
> > se.sum_exec_runtime : 9.421240
> > se.nr_migrations : 0
> > nr_switches : 1174
> > nr_voluntary_switches : 1171
> > nr_involuntary_switches : 3
> > se.load.weight : 1048576
> > se.runnable_weight : 1048576
> > se.avg.load_sum : 6
> > se.avg.runnable_load_sum : 6
> > se.avg.util_sum : 5170
> > se.avg.load_avg : 0
> > se.avg.runnable_load_avg : 0
> > se.avg.util_avg : 0
> > se.avg.last_update_time : 318689991680
> > se.avg.util_est.ewma : 10
> > se.avg.util_est.enqueued : 0
> > policy : 0
> > prio : 120
> > clock-delta : 0
> >
> > The only thing that changes there is "clock-delta". When I kill the
> > dd, I get:
> >
> > root@arm-d06300000000:~# cat /proc/34/sched
> > kworker/1:1 (34, #threads: 1)
> > -------------------------------------------------------------------
> > se.exec_start : 574025.791680
> > se.vruntime : 79996.657300
> > se.sum_exec_runtime : 10.916400
> > se.nr_migrations : 0
> > nr_switches : 1403
> > nr_voluntary_switches : 1400
> > nr_involuntary_switches : 3
> > se.load.weight : 1048576
> > se.runnable_weight : 1048576
> > se.avg.load_sum : 15
> > se.avg.runnable_load_sum : 15
> > se.avg.util_sum : 15007
> > se.avg.load_avg : 0
> > se.avg.runnable_load_avg : 0
> > se.avg.util_avg : 0
> > se.avg.last_update_time : 574025791488
> > se.avg.util_est.ewma : 10
> > se.avg.util_est.enqueued : 0
> > policy : 0
> > prio : 120
> > clock-delta : 40
> >
> > so the thread makes forward progress.
> >
> > Down to two CPUs:
> >
> > root@arm-d06300000000:~# ps aux | grep ' D '
> > root 34 0.0 0.0 0 0 ? D 15:38 0:00 [kworker/1:1+events_freezable]
> >
> > Same symptoms. dd and md5sum switch between CPU 0 and CPU1.
>
> Hmm.
>
> static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
> const struct blk_mq_queue_data *bd)
>
> mq->in_flight[issue_type] += 1;
> get_card = (mmc_tot_in_flight(mq) == 1);
>
> if (get_card)
> mmc_get_card(card, &mq->ctx);
>
> mmc_get_card() gets the host lock according to the card.
>
> So, if we always have requests in flight (which is probably the case
> here) we never drop the host lock, and mmc_rescan() never gets a look
> in - hence blocking the kworker.
Ahh, you are right. However, this isn't a new problem I believe.
Even if we did some re-work of the locking mechanism while converting
to blk-mq, I still think the worker could starve the mmc_rescan work
before.
In practice this shouldn't be a problem though, unless I am
overlooking something. This is because it's not until there is an I/O
error, that causes the block worker to release the host, to it makes
sense to let mmc_rescan to claim the host to check for card removal.
>
> So this is a real issue with MMC, and not down to something in the
> arch.
Yep, thanks for running the test and providing more details!
>
> I suspect the reason that single-CPU doesn't show it is because it is
> unable to keep multiple requests in flight.
Yes, most likely.
Now, how to solve this problem I need to think more about....
FYI: The long term goal has been to try to remove the big fat host
lock altogether and slowly we have moved more an more things to be
executed as a part of the block worker, which is one of the needed
steps. Like the mmc ioctls for example...
Kind regards
Uffe
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2019-08-28 10:32 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-27 13:43 Continuous SD IO causes hung task messages Russell King - ARM Linux admin
2019-08-27 13:55 ` Ulf Hansson
2019-08-27 14:36 ` Russell King - ARM Linux admin
2019-08-27 14:52 ` Russell King - ARM Linux admin
2019-08-27 15:06 ` Russell King - ARM Linux admin
2019-08-28 10:31 ` Ulf Hansson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).