From: Ming Lei <tom.leiming@gmail.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Jens Axboe <axboe@kernel.dk>,
	Bart Van Assche <Bart.VanAssche@wdc.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-block <linux-block@vger.kernel.org>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	alan.christopher.jenkins@gmail.com,
	syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Christoph Hellwig <hch@lst.de>,
	Oleksandr Natalenko <oleksandr@natalenko.name>,
	martin@lichtvoll.de, Hannes Reinecke <hare@suse.com>,
	syzkaller-bugs@googlegroups.com,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Keith Busch <keith.busch@intel.com>,
	"open list:EXT4 FILE SYSTEM" <linux-ext4@vger.kernel.org>
Subject: Re: INFO: task hung in blk_queue_enter
Date: Thu, 7 Jun 2018 11:29:32 +0800
Message-ID: <CACVXFVNcJ+tC6RUT+JUA3iw+STB+q4P_u+AvoOSrt04zEw8TZA@mail.gmail.com>
In-Reply-To: <20180605004128.GA28826@ming.t460p>

On Tue, Jun 5, 2018 at 8:41 AM, Ming Lei <ming.lei@redhat.com> wrote:
> On Tue, Jun 05, 2018 at 09:27:41AM +0900, Tetsuo Handa wrote:
>> Jens Axboe wrote:
>> > On 6/1/18 4:10 AM, Tetsuo Handa wrote:
>> > > Tetsuo Handa wrote:
>> > >> Since sum of percpu_count did not change after percpu_ref_kill(), this is
>> > >> not a race condition while folding percpu counter values into atomic counter
>> > >> value. That is, for some reason, someone who is responsible for calling
>> > >> percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is
>> > >> unable to call percpu_ref_put().
>> > >> But I don't know how to find someone who is failing to call percpu_ref_put()...
>> > >
>> > > I found the someone. It was already there in the backtrace...
>> > >
>> >
>> > Ahh, nicely spotted! One idea would be the one below. For this case,
>> > we're recursing, so we can either do a non-block queue enter, or we
>> > can just do a live enter.
>> >
>>
>> While "block: don't use blocking queue entered for recursive bio submits" was
>> already applied, syzbot is still reporting a hung task with same signature but
>> different trace.
>>
>> https://syzkaller.appspot.com/text?tag=CrashLog&x=1432cedf800000
>> ----------------------------------------
>> [  492.512243] INFO: task syz-executor1:20263 blocked for more than 120 seconds.
>> [  492.519604]       Not tainted 4.17.0+ #83
>> [  492.523793] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [  492.531787] syz-executor1   D23384 20263   4574 0x00000004
>> [  492.537443] Call Trace:
>> [  492.540041]  __schedule+0x801/0x1e30
>> [  492.580958]  schedule+0xef/0x430
>> [  492.610154]  blk_queue_enter+0x8da/0xdf0
>> [  492.716327]  generic_make_request+0x651/0x1790
>> [  492.765680]  submit_bio+0xba/0x460
>> [  492.793198]  submit_bio_wait+0x134/0x1e0
>> [  492.801891]  blkdev_issue_flush+0x204/0x300
>> [  492.806236]  blkdev_fsync+0x93/0xd0
>> [  492.813620]  vfs_fsync_range+0x140/0x220
>> [  492.817702]  vfs_fsync+0x29/0x30
>> [  492.821081]  __loop_update_dio+0x4de/0x6a0
>> [  492.825341]  lo_ioctl+0xd28/0x2190
>> [  492.833442]  blkdev_ioctl+0x9b6/0x2020
>> [  492.872146]  block_ioctl+0xee/0x130
>> [  492.880139]  do_vfs_ioctl+0x1cf/0x16a0
>> [  492.927550]  ksys_ioctl+0xa9/0xd0
>> [  492.931036]  __x64_sys_ioctl+0x73/0xb0
>> [  492.934952]  do_syscall_64+0x1b1/0x800
>> [  492.963624]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> [  493.212768] 1 lock held by syz-executor1/20263:
>> [  493.217448]  #0: 00000000956bf5a3 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8d/0x2190
>> ----------------------------------------
>>
>> Is it OK to call [__]loop_update_dio() between blk_mq_freeze_queue() and
>> blk_mq_unfreeze_queue(), for vfs_fsync() from __loop_update_dio() is calling
>> blk_queue_enter() after blk_mq_freeze_queue() started blocking blk_queue_enter()
>> by calling atomic_inc_return() and percpu_ref_kill()?
>>
>
> The vfs_fsync() isn't necessary in loop_update_dio() since both
> generic_file_write_iter() and generic_file_read_iter() can handle
> buffered io vs dio well.
>
> I will send a patch to remove the vfs_fsync() later.

Hi Tetsuo,

Removing this vfs_fsync() might fix the issue, but I'd like to
understand the reason behind the hang first, since vfs_fsync() shouldn't
have caused any IO to this loop queue.
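
To make the question above concrete, here is a minimal userspace sketch
of the suspected ordering. It is a toy model, not kernel code: every
toy_* name is hypothetical, and it assumes the flush issued by the fsync
lands on the same queue that was just frozen, which is exactly the part
that is still unclear. The enter is modelled as non-blocking so the
sketch can report the problem instead of hanging.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for a request queue; NOT the kernel's struct request_queue. */
struct toy_queue {
	int freeze_depth;	/* loosely models q->mq_freeze_depth */
	int usage;		/* loosely models q->q_usage_counter */
};

/* Models blk_mq_freeze_queue(): new enters are refused from here on. */
void toy_freeze(struct toy_queue *q)
{
	q->freeze_depth++;
}

/* Models blk_mq_unfreeze_queue(). */
void toy_unfreeze(struct toy_queue *q)
{
	assert(q->freeze_depth > 0);
	q->freeze_depth--;
}

/*
 * Models a non-blocking queue enter. Returns -EBUSY in the situation
 * where a blocking enter would sleep until the queue is unfrozen.
 */
int toy_enter_nowait(struct toy_queue *q)
{
	if (q->freeze_depth > 0)
		return -EBUSY;
	q->usage++;
	return 0;
}

/* Models blk_queue_exit(): drop the reference taken by the enter. */
void toy_exit(struct toy_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

/* Models a flush bio submitted to the queue (enter, "do IO", exit). */
int toy_flush(struct toy_queue *q)
{
	int ret = toy_enter_nowait(q);

	if (ret)
		return ret;
	toy_exit(q);
	return 0;
}

/*
 * Models the suspected call order: freeze, then an fsync whose flush
 * targets the frozen queue, then unfreeze. With a blocking enter the
 * unfreeze would never be reached, because the only task that could
 * unfreeze is the one sleeping in the enter.
 */
int toy_update_dio_while_frozen(struct toy_queue *q)
{
	int ret;

	toy_freeze(q);
	ret = toy_flush(q);
	toy_unfreeze(q);
	return ret;
}
```

On an unfrozen toy queue toy_flush() returns 0, while
toy_update_dio_while_frozen() returns -EBUSY. In the model, that -EBUSY
marks the point where the real, blocking blk_queue_enter() would sleep
with the task still holding the freeze.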

I also tried to reproduce it with the following C reproducer from syzbot,
but haven't been able to trigger it after running it for several hours.

https://syzkaller.appspot.com/x/repro.c?id=4727023951937536

Could you share how you reproduced it?

Thanks,
Ming Lei
