* [PATCH 0/1] dm-delay: fix hung task issue
@ 2024-04-26 7:21 Joel Colledge
2024-04-26 7:21 ` [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode Joel Colledge
0 siblings, 1 reply; 8+ messages in thread
From: Joel Colledge @ 2024-04-26 7:21 UTC (permalink / raw)
To: Mikulas Patocka; +Cc: Alasdair Kergon, Mike Snitzer, Christian Loehle, dm-devel
Hi,
The tests for DRBD encountered an issue with dm-delay when testing with
a 6.8 series kernel. Specifically on Ubuntu 24.04 "Noble". Here is a
minimal reproducer:
virter vm run --name dm-delay-test --id 10 --wait-ssh ubuntu-noble
virter vm ssh dm-delay-test
# truncate -s 100M /file
# loop_dev=$(losetup -f --show /file)
# echo "0 $(blockdev --getsz $loop_dev) delay $loop_dev 0 0" | dmsetup create delay-volume
After a few minutes, the following is printed to the kernel log:
[ 246.919123] INFO: task dm-delay-flush-:1256 blocked for more than 122 seconds.
[ 246.922543] Not tainted 6.8.0-31-generic #31-Ubuntu
[ 246.923753] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.924932] task:dm-delay-flush- state:D stack:0 pid:1256 tgid:1256 ppid:2 flags:0x00004000
[ 246.924940] Call Trace:
[ 246.924950] <TASK>
[ 246.924980] __schedule+0x27c/0x6b0
[ 246.925002] ? __pfx_flush_worker_fn+0x10/0x10 [dm_delay]
[ 246.925011] schedule+0x33/0x110
[ 246.925016] schedule_preempt_disabled+0x15/0x30
[ 246.925035] kthread+0xb1/0x120
...
This bug appears to have been introduced in Linux v6.7.
The following patch fixes the issue.
Best regards,
Joel
Joel Colledge (1):
dm-delay: fix hung task introduced by kthread mode
drivers/md/dm-delay.c | 1 +
1 file changed, 1 insertion(+)
--
2.34.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode
2024-04-26 7:21 [PATCH 0/1] dm-delay: fix hung task issue Joel Colledge
@ 2024-04-26 7:21 ` Joel Colledge
2024-04-30 14:28 ` Christian Loehle
0 siblings, 1 reply; 8+ messages in thread
From: Joel Colledge @ 2024-04-26 7:21 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Alasdair Kergon, Mike Snitzer, Christian Loehle, dm-devel, Joel Colledge
If the worker thread is not woken due to a bio, then it is not woken at
all. This causes the hung task check to trigger. For instance, when a
delay of 0 is configured, delay_bio() returns without waking the worker,
so this situation occurs.
Prevent the hung task check from triggering by waking up the newly
minted worker in delay_ctr().
Fixes: 70bbeb29fab0 ("dm delay: for short delays, use kthread instead of timers and wq")
Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
---
drivers/md/dm-delay.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
index 5eabdb06c649..003512bb5fea 100644
--- a/drivers/md/dm-delay.c
+++ b/drivers/md/dm-delay.c
@@ -274,6 +274,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
dc->worker = NULL;
goto bad;
}
+ wake_up_process(dc->worker);
} else {
timer_setup(&dc->delay_timer, handle_delayed_timer, 0);
INIT_WORK(&dc->flush_expired_bios, flush_expired_bios);
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode
2024-04-26 7:21 ` [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode Joel Colledge
@ 2024-04-30 14:28 ` Christian Loehle
2024-04-30 14:44 ` Joel Colledge
0 siblings, 1 reply; 8+ messages in thread
From: Christian Loehle @ 2024-04-30 14:28 UTC (permalink / raw)
To: Joel Colledge, Mikulas Patocka; +Cc: Alasdair Kergon, Mike Snitzer, dm-devel
On 26/04/2024 08:21, Joel Colledge wrote:
> If the worker thread is not woken due to a bio, then it is not woken at
> all. This causes the hung task check to trigger. For instance, when a
> delay of 0 is configured, delay_bio() returns without waking the worker,
> so this situation occurs.
>
> Prevent the hung task check from triggering by waking up the newly
> minted worker in delay_ctr().
>
> Fixes: 70bbeb29fab0 ("dm delay: for short delays, use kthread instead of timers and wq")
> Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
> ---
> drivers/md/dm-delay.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
> index 5eabdb06c649..003512bb5fea 100644
> --- a/drivers/md/dm-delay.c
> +++ b/drivers/md/dm-delay.c
> @@ -274,6 +274,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> dc->worker = NULL;
> goto bad;
> }
> + wake_up_process(dc->worker);
> } else {
> timer_setup(&dc->delay_timer, handle_delayed_timer, 0);
> INIT_WORK(&dc->flush_expired_bios, flush_expired_bios);
Is this an issue for delay > 0 too somehow?
Indeed if we don't have a delay the process will never be woken up,
but in that case, why create the worker in the first place?
You're missing lkml as recipient btw.
Kind Regards,
Christian
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode
2024-04-30 14:28 ` Christian Loehle
@ 2024-04-30 14:44 ` Joel Colledge
2024-04-30 15:26 ` Christian Loehle
0 siblings, 1 reply; 8+ messages in thread
From: Joel Colledge @ 2024-04-30 14:44 UTC (permalink / raw)
To: Christian Loehle; +Cc: Mikulas Patocka, Alasdair Kergon, Mike Snitzer, dm-devel
On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote:
> Is this an issue for delay > 0 too somehow?
I believe it is. If there is simply no IO to the delay device, then
nothing will wake the new thread and the same issue will occur.
I haven't yet reproduced this case, because the system I am testing on
submits reads on any new block device and I don't know offhand how to
disable them. I think the reads are from scans for multipathd,
bcachefs or similar.
> Indeed if we don't have a delay the process will never be woken up,
> but in that case, why create the worker in the first place?
I agree. If this were only an issue with delay == 0, it would make
more sense not to create the worker at all.
As mentioned above, I believe the issue can occur with delay > 0. So
not creating the worker in the delay == 0 case is an additional
optimization and out of the scope of this change.
> You're missing lkml as recipient btw.
Thanks for the pointer and thanks for your response!
Best regards,
Joel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode
2024-04-30 14:44 ` Joel Colledge
@ 2024-04-30 15:26 ` Christian Loehle
2024-04-30 16:12 ` Joel Colledge
0 siblings, 1 reply; 8+ messages in thread
From: Christian Loehle @ 2024-04-30 15:26 UTC (permalink / raw)
To: Joel Colledge; +Cc: Mikulas Patocka, Alasdair Kergon, Mike Snitzer, dm-devel
On 30/04/2024 15:44, Joel Colledge wrote:
> On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote:
>> Is this an issue for delay > 0 too somehow?
>
> I believe it is. If there is simply no IO to the delay device, then
> nothing will wake the new thread and the same issue will occur.
Yes, but might be better to just set_current_state(TASK_INTERRUPTIBLE);
instead of the wakeup.
>
> I haven't yet reproduced this case, because the system I am testing on
> submits reads on any new block device and I don't know offhand how to
> disable them. I think the reads are from scans for multipathd,
> bcachefs or similar.
>
>> Indeed if we don't have a delay the process will never be woken up,
>> but in that case, why create the worker in the first place?
>
> I agree. If this were only an issue with delay == 0, it would make
> more sense not to create the worker at all.
>
> As mentioned above, I believe the issue can occur with delay > 0. So
> not creating the worker in the delay == 0 case is an additional
> optimization and out of the scope of this change.
Ah right because of the delay_is_fast() definition that is slightly more
work.
Regards,
Christian.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode
2024-04-30 15:26 ` Christian Loehle
@ 2024-04-30 16:12 ` Joel Colledge
2024-05-01 8:26 ` Christian Loehle
2024-05-03 22:56 ` Benjamin Marzinski
0 siblings, 2 replies; 8+ messages in thread
From: Joel Colledge @ 2024-04-30 16:12 UTC (permalink / raw)
To: Christian Loehle; +Cc: Mikulas Patocka, Alasdair Kergon, Mike Snitzer, dm-devel
On Tue, 30 Apr 2024 at 17:27, Christian Loehle <christian.loehle@arm.com> wrote:
> On 30/04/2024 15:44, Joel Colledge wrote:
> > On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote:
> >> Is this an issue for delay > 0 too somehow?
> >
> > I believe it is. If there is simply no IO to the delay device, then
> > nothing will wake the new thread and the same issue will occur.
>
> Yes, but might be better to just set_current_state(TASK_INTERRUPTIBLE);
> instead of the wakeup.
I'm afraid I don't follow what you mean.
set_current_state(TASK_INTERRUPTIBLE) would need to be called from the
worker, but the worker isn't running yet, so it can't do that.
When preparing this patch, I checked how common kthread_create()
followed by wake_up_process() is. It is fairly common according to
this very approximate metric:
$ grep -r -I -A10 kthread_create | grep -c wake_up_process
49
I just looked through those matches in more detail and noticed that
there is a kthread_run() macro which does exactly what we need. I will
change my suggestion to use that instead.
I will wait for more comments before sending a new version of the patch.
Best regards,
Joel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode
2024-04-30 16:12 ` Joel Colledge
@ 2024-05-01 8:26 ` Christian Loehle
2024-05-03 22:56 ` Benjamin Marzinski
1 sibling, 0 replies; 8+ messages in thread
From: Christian Loehle @ 2024-05-01 8:26 UTC (permalink / raw)
To: Joel Colledge; +Cc: Mikulas Patocka, Alasdair Kergon, Mike Snitzer, dm-devel
On 30/04/2024 17:12, Joel Colledge wrote:
> On Tue, 30 Apr 2024 at 17:27, Christian Loehle <christian.loehle@arm.com> wrote:
>> On 30/04/2024 15:44, Joel Colledge wrote:
>>> On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote:
>>>> Is this an issue for delay > 0 too somehow?
>>>
>>> I believe it is. If there is simply no IO to the delay device, then
>>> nothing will wake the new thread and the same issue will occur.
>>
>> Yes, but might be better to just set_current_state(TASK_INTERRUPTIBLE);
>> instead of the wakeup.
>
> I'm afraid I don't follow what you mean.
> set_current_state(TASK_INTERRUPTIBLE) would need to be called from the
> worker, but the worker isn't running yet, so it can't do that.
Sorry about that, wasn't thinking :/
>
> When preparing this patch, I checked how common kthread_create()
> followed by wake_up_process() is. It is fairly common according to
> this very approximate metric:
> $ grep -r -I -A10 kthread_create | grep -c wake_up_process
> 49
>
> I just looked through those matches in more detail and noticed that
> there is a kthread_run() macro which does exactly what we need. I will
> change my suggestion to use that instead.
>
> I will wait for more comments before sending a new version of the patch.
>
> Best regards,
> Joel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode
2024-04-30 16:12 ` Joel Colledge
2024-05-01 8:26 ` Christian Loehle
@ 2024-05-03 22:56 ` Benjamin Marzinski
1 sibling, 0 replies; 8+ messages in thread
From: Benjamin Marzinski @ 2024-05-03 22:56 UTC (permalink / raw)
To: Joel Colledge
Cc: Christian Loehle, Mikulas Patocka, Alasdair Kergon, Mike Snitzer,
dm-devel
On Tue, Apr 30, 2024 at 06:12:37PM +0200, Joel Colledge wrote:
> On Tue, 30 Apr 2024 at 17:27, Christian Loehle <christian.loehle@arm.com> wrote:
> > On 30/04/2024 15:44, Joel Colledge wrote:
> > > On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote:
> > >> Is this an issue for delay > 0 too somehow?
> > >
> > > I believe it is. If there is simply no IO to the delay device, then
> > > nothing will wake the new thread and the same issue will occur.
> >
> > Yes, but might be better to just set_current_state(TASK_INTERRUPTIBLE);
> > instead of the wakeup.
>
> I'm afraid I don't follow what you mean.
> set_current_state(TASK_INTERRUPTIBLE) would need to be called from the
> worker, but the worker isn't running yet, so it can't do that.
>
> When preparing this patch, I checked how common kthread_create()
> followed by wake_up_process() is. It is fairly common according to
> this very approximate metric:
> $ grep -r -I -A10 kthread_create | grep -c wake_up_process
> 49
>
> I just looked through those matches in more detail and noticed that
> there is a kthread_run() macro which does exactly what we need. I will
> change my suggestion to use that instead.
>
> I will wait for more comments before sending a new version of the patch.
kthread_run() looks like the right solution here. There's no need to
wait for more initialization and the kthread will put itself to sleep
momentarily. Please send an updated patch.
-Ben
>
> Best regards,
> Joel
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-05-03 22:57 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-26 7:21 [PATCH 0/1] dm-delay: fix hung task issue Joel Colledge
2024-04-26 7:21 ` [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode Joel Colledge
2024-04-30 14:28 ` Christian Loehle
2024-04-30 14:44 ` Joel Colledge
2024-04-30 15:26 ` Christian Loehle
2024-04-30 16:12 ` Joel Colledge
2024-05-01 8:26 ` Christian Loehle
2024-05-03 22:56 ` Benjamin Marzinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).