[PATCH] block: Prevent hung_check firing during long sync IO

linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] block: Prevent hung_check firing during long sync IO
@ 2020-02-20 12:05 Ming Lei
  2020-02-20 14:43 ` Jesse Barnes
  2020-02-21  2:49 ` Bart Van Assche
  0 siblings, 2 replies; 4+ messages in thread
From: Ming Lei @ 2020-02-20 12:05 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Salman Qazi, Jesse Barnes, Bart Van Assche

submit_bio_wait() can be called from ioctl(BLKSECDISCARD), which
may take long time to complete, as Salman mentioned, 4K BLKSECDISCARD
takes up to 100 second on some devices. Also any block I/O operation
that occurs after the BLKSECDISCARD is submitted will also potentially
be affected by the hung task timeouts.

So prevent hung_check from firing by taking same approach used
in blk_execute_rq(), and the wake-up interval is set as half the
hung_check timer period, which keeps overhead low enough.

Cc: Salman Qazi <sqazi@google.com>
Cc: Jesse Barnes <jsbarnes@google.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Link: https://lkml.org/lkml/2020/2/12/1193
Reported-by: Salman Qazi <sqazi@google.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bio.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index 94d697217887..c9ce19a86de7 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -17,6 +17,7 @@
 #include <linux/cgroup.h>
 #include <linux/blk-cgroup.h>
 #include <linux/highmem.h>
+#include <linux/sched/sysctl.h>
 
 #include <trace/events/block.h>
 #include "blk.h"
@@ -1019,12 +1020,19 @@ static void submit_bio_wait_endio(struct bio *bio)
 int submit_bio_wait(struct bio *bio)
 {
 	DECLARE_COMPLETION_ONSTACK_MAP(done, bio->bi_disk->lockdep_map);
+	unsigned long hang_check;
 
 	bio->bi_private = &done;
 	bio->bi_end_io = submit_bio_wait_endio;
 	bio->bi_opf |= REQ_SYNC;
 	submit_bio(bio);
-	wait_for_completion_io(&done);
+
+	/* Prevent hang_check timer from firing at us during very long I/O */
+	hang_check = sysctl_hung_task_timeout_secs;
+	if (hang_check)
+		while (!wait_for_completion_io_timeout(&done, hang_check * (HZ/2)));
+	else
+		wait_for_completion_io(&done);
 
 	return blk_status_to_errno(bio->bi_status);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] block: Prevent hung_check firing during long sync IO
  2020-02-20 12:05 [PATCH] block: Prevent hung_check firing during long sync IO Ming Lei
@ 2020-02-20 14:43 ` Jesse Barnes
  2020-02-20 19:04   ` Salman Qazi
  2020-02-21  2:49 ` Bart Van Assche
  1 sibling, 1 reply; 4+ messages in thread
From: Jesse Barnes @ 2020-02-20 14:43 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Salman Qazi, Bart Van Assche

Yeah looks good.

Reviewed-by: Jesse Barnes <jsbarnes@google.com>

A further change (just for readability) might be to factor out these
"don't trigger hangcheck" waits from here and blk_execute_rq() into a
small helper with a descriptive name.

Thanks,
Jesse

On Thu, Feb 20, 2020 at 5:05 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> submit_bio_wait() can be called from ioctl(BLKSECDISCARD), which
> may take long time to complete, as Salman mentioned, 4K BLKSECDISCARD
> takes up to 100 second on some devices. Also any block I/O operation
> that occurs after the BLKSECDISCARD is submitted will also potentially
> be affected by the hung task timeouts.
>
> So prevent hung_check from firing by taking same approach used
> in blk_execute_rq(), and the wake-up interval is set as half the
> hung_check timer period, which keeps overhead low enough.
>
> Cc: Salman Qazi <sqazi@google.com>
> Cc: Jesse Barnes <jsbarnes@google.com>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Link: https://lkml.org/lkml/2020/2/12/1193
> Reported-by: Salman Qazi <sqazi@google.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/bio.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index 94d697217887..c9ce19a86de7 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -17,6 +17,7 @@
>  #include <linux/cgroup.h>
>  #include <linux/blk-cgroup.h>
>  #include <linux/highmem.h>
> +#include <linux/sched/sysctl.h>
>
>  #include <trace/events/block.h>
>  #include "blk.h"
> @@ -1019,12 +1020,19 @@ static void submit_bio_wait_endio(struct bio *bio)
>  int submit_bio_wait(struct bio *bio)
>  {
>         DECLARE_COMPLETION_ONSTACK_MAP(done, bio->bi_disk->lockdep_map);
> +       unsigned long hang_check;
>
>         bio->bi_private = &done;
>         bio->bi_end_io = submit_bio_wait_endio;
>         bio->bi_opf |= REQ_SYNC;
>         submit_bio(bio);
> -       wait_for_completion_io(&done);
> +
> +       /* Prevent hang_check timer from firing at us during very long I/O */
> +       hang_check = sysctl_hung_task_timeout_secs;
> +       if (hang_check)
> +               while (!wait_for_completion_io_timeout(&done, hang_check * (HZ/2)));
> +       else
> +               wait_for_completion_io(&done);
>
>         return blk_status_to_errno(bio->bi_status);
>  }
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] block: Prevent hung_check firing during long sync IO
  2020-02-20 14:43 ` Jesse Barnes
@ 2020-02-20 19:04   ` Salman Qazi
  0 siblings, 0 replies; 4+ messages in thread
From: Salman Qazi @ 2020-02-20 19:04 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Ming Lei, Jens Axboe, linux-block, Bart Van Assche

Thank you for doing this.  The fix will help, but it is not a complete solution.

+1 to Jesse's comment about putting into a descriptive helper.  This
functionality might be useful elsewhere.

I'd like to reiterate that the problem is broader.  For instance, if I
hack the IOCTL path to be immune from the hung task watchdog from
firing there, the problem simply moves to another I/O that is behind
the IOCTL.  For instance if I have a dd command doing direct I/O
against the same device that has the IOCTL pending, the dd will hang
in uninterruptible sleep and eventually the hung task timer will fire.

[ 5645.712955] dd              D 634a749ef8dd8750     0  5126   4978
0x00000000 last_sleep: 5568344242277.  last_runnable: 5568342587878
[ 5645.712965]  ffff88005cfb2c00 ffff88007ab14940 ffff88007a6df080
ffff8800614a4b00
[ 5645.712973]  0000000000014940 ffff880076f77a00 ffffffffb1076a57
ffff880000000000
[ 5645.712981]  ffff8800614a4b00 7fffffffffffffff ffff880075402940
ffff880076f77a38
[ 5645.712988] Call Trace:
[ 5645.713000]  [<ffffffffb1076a57>] ? __schedule+0x37a/0x7a2
[ 5645.713005]  [<ffffffffb10766a7>] schedule+0x3f/0x75
[ 5645.713009]  [<ffffffffb1079410>] schedule_timeout+0x53/0x6af
[ 5645.713015]  [<ffffffffb1077425>] io_schedule_timeout+0x6b/0x91
[ 5645.713021]  [<ffffffffb09dd66a>] dio_await_one+0x94/0xdb
[ 5645.713025]  [<ffffffffb09dc668>] __blockdev_direct_IO+0x6d5/0x763
[ 5645.713029]  [<ffffffffb09dbde9>] ? blkdev_direct_IO+0x3b/0x3b
[ 5645.713034]  [<ffffffffb09dbde9>] ? blkdev_direct_IO+0x3b/0x3b
[ 5645.713039]  [<ffffffffb09dbde3>] blkdev_direct_IO+0x35/0x3b
[ 5645.713044]  [<ffffffffb095bf30>]
generic_file_direct_write+0xfb/0x16d
[ 5645.713050]  [<ffffffffb098c661>]
__generic_file_write_iter+0x2b0/0x3c1
[ 5645.713056]  [<ffffffffb0bffe9d>] ? write_port+0xdc/0xdc
[ 5645.713061]  [<ffffffffb0b506de>] ? __clear_user+0x41/0x66
[ 5645.713065]  [<ffffffffb09db78f>] blkdev_write_iter+0xd1/0x13e
[ 5645.713072]  [<ffffffffb0a66977>] SyS_write+0x22d/0x431
[ 5645.713077]  [<ffffffffb107b163>] entry_SYSCALL_64_fastpath+0x31/0xab

 Similarly, I am sure there are filesystem specific paths that can
potentially hang.  I suspect that the flushing code in ext4 might be
one such path.  These are just examples.
Also, my worry persists that we are creating more places for bugs to
hide by blanket immunizing I/O paths from the hung task timer.  Having
said all of that, I don't really have a significantly better in-kernel
solution.

Reviewed-by: Salman Qazi <sqazi@google.com>

On Thu, Feb 20, 2020 at 6:43 AM Jesse Barnes <jsbarnes@google.com> wrote:
>
> Yeah looks good.
>
> Reviewed-by: Jesse Barnes <jsbarnes@google.com>
>
> A further change (just for readability) might be to factor out these
> "don't trigger hangcheck" waits from here and blk_execute_rq() into a
> small helper with a descriptive name.
>
> Thanks,
> Jesse
>
> On Thu, Feb 20, 2020 at 5:05 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > submit_bio_wait() can be called from ioctl(BLKSECDISCARD), which
> > may take long time to complete, as Salman mentioned, 4K BLKSECDISCARD
> > takes up to 100 second on some devices. Also any block I/O operation
> > that occurs after the BLKSECDISCARD is submitted will also potentially
> > be affected by the hung task timeouts.
> >
> > So prevent hung_check from firing by taking same approach used
> > in blk_execute_rq(), and the wake-up interval is set as half the
> > hung_check timer period, which keeps overhead low enough.
> >
> > Cc: Salman Qazi <sqazi@google.com>
> > Cc: Jesse Barnes <jsbarnes@google.com>
> > Cc: Bart Van Assche <bvanassche@acm.org>
> > Link: https://lkml.org/lkml/2020/2/12/1193
> > Reported-by: Salman Qazi <sqazi@google.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  block/bio.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/bio.c b/block/bio.c
> > index 94d697217887..c9ce19a86de7 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -17,6 +17,7 @@
> >  #include <linux/cgroup.h>
> >  #include <linux/blk-cgroup.h>
> >  #include <linux/highmem.h>
> > +#include <linux/sched/sysctl.h>
> >
> >  #include <trace/events/block.h>
> >  #include "blk.h"
> > @@ -1019,12 +1020,19 @@ static void submit_bio_wait_endio(struct bio *bio)
> >  int submit_bio_wait(struct bio *bio)
> >  {
> >         DECLARE_COMPLETION_ONSTACK_MAP(done, bio->bi_disk->lockdep_map);
> > +       unsigned long hang_check;
> >
> >         bio->bi_private = &done;
> >         bio->bi_end_io = submit_bio_wait_endio;
> >         bio->bi_opf |= REQ_SYNC;
> >         submit_bio(bio);
> > -       wait_for_completion_io(&done);
> > +
> > +       /* Prevent hang_check timer from firing at us during very long I/O */
> > +       hang_check = sysctl_hung_task_timeout_secs;
> > +       if (hang_check)
> > +               while (!wait_for_completion_io_timeout(&done, hang_check * (HZ/2)));
> > +       else
> > +               wait_for_completion_io(&done);
> >
> >         return blk_status_to_errno(bio->bi_status);
> >  }
> > --
> > 2.20.1
> >

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] block: Prevent hung_check firing during long sync IO
  2020-02-20 12:05 [PATCH] block: Prevent hung_check firing during long sync IO Ming Lei
  2020-02-20 14:43 ` Jesse Barnes
@ 2020-02-21  2:49 ` Bart Van Assche
  1 sibling, 0 replies; 4+ messages in thread
From: Bart Van Assche @ 2020-02-21  2:49 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe; +Cc: linux-block, Salman Qazi, Jesse Barnes

On 2020-02-20 04:05, Ming Lei wrote:
>  	submit_bio(bio);
> -	wait_for_completion_io(&done);
> +
> +	/* Prevent hang_check timer from firing at us during very long I/O */
> +	hang_check = sysctl_hung_task_timeout_secs;
> +	if (hang_check)
> +		while (!wait_for_completion_io_timeout(&done, hang_check * (HZ/2)));
> +	else
> +		wait_for_completion_io(&done);
>  
>  	return blk_status_to_errno(bio->bi_status);
>  }

Please make this patch compliant with the kernel coding style.
Checkpatch reports the following about the above patch:

WARNING: line over 80 characters
#152: FILE: block/bio.c:1032:
+  while (!wait_for_completion_io_timeout(&done, hang_check * (HZ/2)));

ERROR: trailing statements should be on next line
#152: FILE: block/bio.c:1032:
+  while (!wait_for_completion_io_timeout(&done, hang_check * (HZ/2)));

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-02-21  2:49 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-20 12:05 [PATCH] block: Prevent hung_check firing during long sync IO Ming Lei
2020-02-20 14:43 ` Jesse Barnes
2020-02-20 19:04   ` Salman Qazi
2020-02-21  2:49 ` Bart Van Assche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).