dm-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] dm-delay: fix hung task issue
@ 2024-04-26  7:21 Joel Colledge
  2024-04-26  7:21 ` [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode Joel Colledge
  0 siblings, 1 reply; 8+ messages in thread
From: Joel Colledge @ 2024-04-26  7:21 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Alasdair Kergon, Mike Snitzer, Christian Loehle, dm-devel

Hi,

The tests for DRBD encountered an issue with dm-delay when testing with
a 6.8 series kernel. Specifically on Ubuntu 24.04 "Noble". Here is a
minimal reproducer:

virter vm run --name dm-delay-test --id 10 --wait-ssh ubuntu-noble
virter vm ssh dm-delay-test
# truncate -s 100M /file
# loop_dev=$(losetup -f --show /file)
# echo "0 $(blockdev --getsz $loop_dev) delay $loop_dev 0 0" | dmsetup create delay-volume

After a few minutes, the following is printed to the kernel log:

[  246.919123] INFO: task dm-delay-flush-:1256 blocked for more than 122 seconds.
[  246.922543]       Not tainted 6.8.0-31-generic #31-Ubuntu
[  246.923753] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.924932] task:dm-delay-flush- state:D stack:0     pid:1256  tgid:1256  ppid:2      flags:0x00004000
[  246.924940] Call Trace:
[  246.924950]  <TASK>
[  246.924980]  __schedule+0x27c/0x6b0
[  246.925002]  ? __pfx_flush_worker_fn+0x10/0x10 [dm_delay]
[  246.925011]  schedule+0x33/0x110
[  246.925016]  schedule_preempt_disabled+0x15/0x30
[  246.925035]  kthread+0xb1/0x120
...

This bug appears to have been introduced in Linux v6.7.

The following patch fixes the issue.

Best regards,
Joel

Joel Colledge (1):
  dm-delay: fix hung task introduced by kthread mode

 drivers/md/dm-delay.c | 1 +
 1 file changed, 1 insertion(+)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode
  2024-04-26  7:21 [PATCH 0/1] dm-delay: fix hung task issue Joel Colledge
@ 2024-04-26  7:21 ` Joel Colledge
  2024-04-30 14:28   ` Christian Loehle
  0 siblings, 1 reply; 8+ messages in thread
From: Joel Colledge @ 2024-04-26  7:21 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Alasdair Kergon, Mike Snitzer, Christian Loehle, dm-devel, Joel Colledge

If the worker thread is not woken due to a bio, then it is not woken at
all. This causes the hung task check to trigger. For instance, when a
delay of 0 is configured, delay_bio() returns without waking the worker,
so this situation occurs.

Prevent the hung task check from triggering by waking up the newly
minted worker in delay_ctr().

Fixes: 70bbeb29fab0 ("dm delay: for short delays, use kthread instead of timers and wq")
Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
---
 drivers/md/dm-delay.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
index 5eabdb06c649..003512bb5fea 100644
--- a/drivers/md/dm-delay.c
+++ b/drivers/md/dm-delay.c
@@ -274,6 +274,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 			dc->worker = NULL;
 			goto bad;
 		}
+		wake_up_process(dc->worker);
 	} else {
 		timer_setup(&dc->delay_timer, handle_delayed_timer, 0);
 		INIT_WORK(&dc->flush_expired_bios, flush_expired_bios);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode
  2024-04-26  7:21 ` [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode Joel Colledge
@ 2024-04-30 14:28   ` Christian Loehle
  2024-04-30 14:44     ` Joel Colledge
  0 siblings, 1 reply; 8+ messages in thread
From: Christian Loehle @ 2024-04-30 14:28 UTC (permalink / raw)
  To: Joel Colledge, Mikulas Patocka; +Cc: Alasdair Kergon, Mike Snitzer, dm-devel

On 26/04/2024 08:21, Joel Colledge wrote:
> If the worker thread is not woken due to a bio, then it is not woken at
> all. This causes the hung task check to trigger. For instance, when a
> delay of 0 is configured, delay_bio() returns without waking the worker,
> so this situation occurs.
> 
> Prevent the hung task check from triggering by waking up the newly
> minted worker in delay_ctr().
> 
> Fixes: 70bbeb29fab0 ("dm delay: for short delays, use kthread instead of timers and wq")
> Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
> ---
>  drivers/md/dm-delay.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
> index 5eabdb06c649..003512bb5fea 100644
> --- a/drivers/md/dm-delay.c
> +++ b/drivers/md/dm-delay.c
> @@ -274,6 +274,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  			dc->worker = NULL;
>  			goto bad;
>  		}
> +		wake_up_process(dc->worker);
>  	} else {
>  		timer_setup(&dc->delay_timer, handle_delayed_timer, 0);
>  		INIT_WORK(&dc->flush_expired_bios, flush_expired_bios);

Is this an issue for delay > 0 too somehow?
Indeed if we don't have a delay the process will never be woken up,
but in that case, why create the worker in the first place?

You're missing lkml as recipient btw.

Kind Regards,
Christian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode
  2024-04-30 14:28   ` Christian Loehle
@ 2024-04-30 14:44     ` Joel Colledge
  2024-04-30 15:26       ` Christian Loehle
  0 siblings, 1 reply; 8+ messages in thread
From: Joel Colledge @ 2024-04-30 14:44 UTC (permalink / raw)
  To: Christian Loehle; +Cc: Mikulas Patocka, Alasdair Kergon, Mike Snitzer, dm-devel

On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote:
> Is this an issue for delay > 0 too somehow?

I believe it is. If there is simply no IO to the delay device, then
nothing will wake the new thread and the same issue will occur.

I haven't yet reproduced this case, because the system I am testing on
submits reads on any new block device and I don't know offhand how to
disable them. I think the reads are from scans for multipathd,
bcachefs or similar.

> Indeed if we don't have a delay the process will never be woken up,
> but in that case, why create the worker in the first place?

I agree. If this were only an issue with delay == 0, it would make
more sense not to create the worker at all.

As mentioned above, I believe the issue can occur with delay > 0. So
not creating the worker in the delay == 0 case is an additional
optimization and out of the scope of this change.

> You're missing lkml as recipient btw.

Thanks for the pointer and thanks for your response!

Best regards,
Joel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode
  2024-04-30 14:44     ` Joel Colledge
@ 2024-04-30 15:26       ` Christian Loehle
  2024-04-30 16:12         ` Joel Colledge
  0 siblings, 1 reply; 8+ messages in thread
From: Christian Loehle @ 2024-04-30 15:26 UTC (permalink / raw)
  To: Joel Colledge; +Cc: Mikulas Patocka, Alasdair Kergon, Mike Snitzer, dm-devel

On 30/04/2024 15:44, Joel Colledge wrote:
> On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote:
>> Is this an issue for delay > 0 too somehow?
> 
> I believe it is. If there is simply no IO to the delay device, then
> nothing will wake the new thread and the same issue will occur.

Yes, but might be better to just set_current_state(TASK_INTERRUPTIBLE);
instead of the wakeup.

> 
> I haven't yet reproduced this case, because the system I am testing on
> submits reads on any new block device and I don't know offhand how to
> disable them. I think the reads are from scans for multipathd,
> bcachefs or similar.
> 
>> Indeed if we don't have a delay the process will never be woken up,
>> but in that case, why create the worker in the first place?
> 
> I agree. If this were only an issue with delay == 0, it would make
> more sense not to create the worker at all.
> 
> As mentioned above, I believe the issue can occur with delay > 0. So
> not creating the worker in the delay == 0 case is an additional
> optimization and out of the scope of this change.

Ah right because of the delay_is_fast() definition that is slightly more
work.

Regards,
Christian.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode
  2024-04-30 15:26       ` Christian Loehle
@ 2024-04-30 16:12         ` Joel Colledge
  2024-05-01  8:26           ` Christian Loehle
  2024-05-03 22:56           ` Benjamin Marzinski
  0 siblings, 2 replies; 8+ messages in thread
From: Joel Colledge @ 2024-04-30 16:12 UTC (permalink / raw)
  To: Christian Loehle; +Cc: Mikulas Patocka, Alasdair Kergon, Mike Snitzer, dm-devel

On Tue, 30 Apr 2024 at 17:27, Christian Loehle <christian.loehle@arm.com> wrote:
> On 30/04/2024 15:44, Joel Colledge wrote:
> > On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote:
> >> Is this an issue for delay > 0 too somehow?
> >
> > I believe it is. If there is simply no IO to the delay device, then
> > nothing will wake the new thread and the same issue will occur.
>
> Yes, but might be better to just set_current_state(TASK_INTERRUPTIBLE);
> instead of the wakeup.

I'm afraid I don't follow what you mean.
set_current_state(TASK_INTERRUPTIBLE) would need to be called from the
worker, but the worker isn't running yet, so it can't do that.

When preparing this patch, I checked how common kthread_create()
followed by wake_up_process() is. It is fairly common according to
this very approximate metric:
$ grep -r -I -A10 kthread_create | grep -c wake_up_process
49

I just looked through those matches in more detail and noticed that
there is a kthread_run() macro which does exactly what we need. I will
change my suggestion to use that instead.

I will wait for more comments before sending a new version of the patch.

Best regards,
Joel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode
  2024-04-30 16:12         ` Joel Colledge
@ 2024-05-01  8:26           ` Christian Loehle
  2024-05-03 22:56           ` Benjamin Marzinski
  1 sibling, 0 replies; 8+ messages in thread
From: Christian Loehle @ 2024-05-01  8:26 UTC (permalink / raw)
  To: Joel Colledge; +Cc: Mikulas Patocka, Alasdair Kergon, Mike Snitzer, dm-devel

On 30/04/2024 17:12, Joel Colledge wrote:
> On Tue, 30 Apr 2024 at 17:27, Christian Loehle <christian.loehle@arm.com> wrote:
>> On 30/04/2024 15:44, Joel Colledge wrote:
>>> On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote:
>>>> Is this an issue for delay > 0 too somehow?
>>>
>>> I believe it is. If there is simply no IO to the delay device, then
>>> nothing will wake the new thread and the same issue will occur.
>>
>> Yes, but might be better to just set_current_state(TASK_INTERRUPTIBLE);
>> instead of the wakeup.
> 
> I'm afraid I don't follow what you mean.
> set_current_state(TASK_INTERRUPTIBLE) would need to be called from the
> worker, but the worker isn't running yet, so it can't do that.

Sorry about that, wasn't thinking :/

> 
> When preparing this patch, I checked how common kthread_create()
> followed by wake_up_process() is. It is fairly common according to
> this very approximate metric:
> $ grep -r -I -A10 kthread_create | grep -c wake_up_process
> 49
> 
> I just looked through those matches in more detail and noticed that
> there is a kthread_run() macro which does exactly what we need. I will
> change my suggestion to use that instead.
> 
> I will wait for more comments before sending a new version of the patch.
> 
> Best regards,
> Joel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode
  2024-04-30 16:12         ` Joel Colledge
  2024-05-01  8:26           ` Christian Loehle
@ 2024-05-03 22:56           ` Benjamin Marzinski
  1 sibling, 0 replies; 8+ messages in thread
From: Benjamin Marzinski @ 2024-05-03 22:56 UTC (permalink / raw)
  To: Joel Colledge
  Cc: Christian Loehle, Mikulas Patocka, Alasdair Kergon, Mike Snitzer,
	dm-devel

On Tue, Apr 30, 2024 at 06:12:37PM +0200, Joel Colledge wrote:
> On Tue, 30 Apr 2024 at 17:27, Christian Loehle <christian.loehle@arm.com> wrote:
> > On 30/04/2024 15:44, Joel Colledge wrote:
> > > On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote:
> > >> Is this an issue for delay > 0 too somehow?
> > >
> > > I believe it is. If there is simply no IO to the delay device, then
> > > nothing will wake the new thread and the same issue will occur.
> >
> > Yes, but might be better to just set_current_state(TASK_INTERRUPTIBLE);
> > instead of the wakeup.
> 
> I'm afraid I don't follow what you mean.
> set_current_state(TASK_INTERRUPTIBLE) would need to be called from the
> worker, but the worker isn't running yet, so it can't do that.
> 
> When preparing this patch, I checked how common kthread_create()
> followed by wake_up_process() is. It is fairly common according to
> this very approximate metric:
> $ grep -r -I -A10 kthread_create | grep -c wake_up_process
> 49
> 
> I just looked through those matches in more detail and noticed that
> there is a kthread_run() macro which does exactly what we need. I will
> change my suggestion to use that instead.
> 
> I will wait for more comments before sending a new version of the patch.

kthread_run() looks like the right solution here. There's no need to
wait for more initialization and the kthread will put itself to sleep
momentarily.  Please send an updated patch.

-Ben

> 
> Best regards,
> Joel


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-05-03 22:57 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-26  7:21 [PATCH 0/1] dm-delay: fix hung task issue Joel Colledge
2024-04-26  7:21 ` [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode Joel Colledge
2024-04-30 14:28   ` Christian Loehle
2024-04-30 14:44     ` Joel Colledge
2024-04-30 15:26       ` Christian Loehle
2024-04-30 16:12         ` Joel Colledge
2024-05-01  8:26           ` Christian Loehle
2024-05-03 22:56           ` Benjamin Marzinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).