From: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
To: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>, fstests <fstests@vger.kernel.org>
Subject: Re: generic/269 hangs on latest upstream kernel
Date: Wed, 19 Feb 2020 18:09:37 +0800
Message-ID: <b0b82e0c-0e86-af5e-5a61-2fafb2ca85a0@cn.fujitsu.com>
In-Reply-To: <20200218110311.GI16121@quack2.suse.cz>
on 2020/02/18 19:03, Jan Kara wrote:
> On Tue 18-02-20 17:46:54, Yang Xu wrote:
>>
>> on 2020/02/18 16:24, Jan Kara wrote:
>>> On Tue 18-02-20 11:25:37, Yang Xu wrote:
>>>> on 2020/02/14 23:00, Jan Kara wrote:
>>>>> On Fri 14-02-20 18:24:50, Yang Xu wrote:
>>>>>> on 2020/02/14 5:10, Jan Kara wrote:
>>>>>>> On Thu 13-02-20 16:49:21, Yang Xu wrote:
>>>>>>>>>> When I test generic/269 (ext4) on a 5.6.0-rc1 kernel, it hangs.
>>>>>>>>>> ----------------------------------------------
>>>>>>>>>> dmesg as below:
>>>>>>>>>> [ 76.506753] run fstests generic/269 at 2020-02-11 05:53:44
>>>>>>>>>> [ 76.955667] EXT4-fs (sdc): mounted filesystem with ordered data mode. Opts: acl,user_xattr
>>>>>>>>>> [ 100.912511] device virbr0-nic left promiscuous mode
>>>>>>>>>> [ 100.912520] virbr0: port 1(virbr0-nic) entered disabled state
>>>>>>>>>> [ 246.801561] INFO: task dd:17284 blocked for more than 122 seconds.
>>>>>>>>>> [ 246.801564] Not tainted 5.6.0-rc1 #41
>>>>>>>>>> [ 246.801565] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>>> [ 246.801566] dd D 0 17284 16931 0x00000080
>>>>>>>>>> [ 246.801568] Call Trace:
>>>>>>>>>> [ 246.801584] ? __schedule+0x251/0x690
>>>>>>>>>> [ 246.801586] schedule+0x40/0xb0
>>>>>>>>>> [ 246.801588] wb_wait_for_completion+0x52/0x80
>>>>>>>>>> [ 246.801591] ? finish_wait+0x80/0x80
>>>>>>>>>> [ 246.801592] __writeback_inodes_sb_nr+0xaa/0xd0
>>>>>>>>>> [ 246.801593] try_to_writeback_inodes_sb+0x3c/0x50
>>>>>>>>>
>>>>>>>>> Interesting. Does the hang resolve eventually, or is the machine hung
>>>>>>>>> permanently? If the hang is permanent, can you do:
>>>>>>>>>
>>>>>>>>> echo w >/proc/sysrq-trigger
>>>>>>>>>
>>>>>>>>> and send us the stacktraces from dmesg? Thanks!
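>>>>>>>>>
>>>>>>>>> For example, a minimal capture sequence (the output file name here is
>>>>>>>>> just an example):
>>>>>>>>>
>>>>>>>>>   echo w > /proc/sysrq-trigger   # dump stacks of all blocked (D state) tasks
>>>>>>>>>   dmesg > hung-tasks.log         # save the kernel log holding the traces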
>>>>>>>> Yes, the hang is permanent; log below:
>>>>>> full dmesg attached
>>>>> ...
>>>>>
>>>>> Thanks! So the culprit seems to be:
>>>>>
>>>>>> [ 388.087799] kworker/u12:0 D 0 32 2 0x80004000
>>>>>> [ 388.087803] Workqueue: writeback wb_workfn (flush-8:32)
>>>>>> [ 388.087805] Call Trace:
>>>>>> [ 388.087810] ? __schedule+0x251/0x690
>>>>>> [ 388.087811] ? __switch_to_asm+0x34/0x70
>>>>>> [ 388.087812] ? __switch_to_asm+0x34/0x70
>>>>>> [ 388.087814] schedule+0x40/0xb0
>>>>>> [ 388.087816] schedule_timeout+0x20d/0x310
>>>>>> [ 388.087818] io_schedule_timeout+0x19/0x40
>>>>>> [ 388.087819] wait_for_completion_io+0x113/0x180
>>>>>> [ 388.087822] ? wake_up_q+0xa0/0xa0
>>>>>> [ 388.087824] submit_bio_wait+0x5b/0x80
>>>>>> [ 388.087827] blkdev_issue_flush+0x81/0xb0
>>>>>> [ 388.087834] jbd2_cleanup_journal_tail+0x80/0xa0 [jbd2]
>>>>>> [ 388.087837] jbd2_log_do_checkpoint+0xf4/0x3f0 [jbd2]
>>>>>> [ 388.087840] __jbd2_log_wait_for_space+0x66/0x190 [jbd2]
>>>>>> [ 388.087843] ? finish_wait+0x80/0x80
>>>>>> [ 388.087845] add_transaction_credits+0x27d/0x290 [jbd2]
>>>>>> [ 388.087847] ? blk_mq_make_request+0x289/0x5d0
>>>>>> [ 388.087849] start_this_handle+0x10a/0x510 [jbd2]
>>>>>> [ 388.087851] ? _cond_resched+0x15/0x30
>>>>>> [ 388.087853] jbd2__journal_start+0xea/0x1f0 [jbd2]
>>>>>> [ 388.087869] ? ext4_writepages+0x518/0xd90 [ext4]
>>>>>> [ 388.087875] __ext4_journal_start_sb+0x6e/0x130 [ext4]
>>>>>> [ 388.087883] ext4_writepages+0x518/0xd90 [ext4]
>>>>>> [ 388.087886] ? do_writepages+0x41/0xd0
>>>>>> [ 388.087893] ? ext4_mark_inode_dirty+0x1f0/0x1f0 [ext4]
>>>>>> [ 388.087894] do_writepages+0x41/0xd0
>>>>>> [ 388.087896] ? snprintf+0x49/0x60
>>>>>> [ 388.087898] __writeback_single_inode+0x3d/0x340
>>>>>> [ 388.087899] writeback_sb_inodes+0x1e5/0x480
>>>>>> [ 388.087901] wb_writeback+0xfb/0x2f0
>>>>>> [ 388.087902] wb_workfn+0xf0/0x430
>>>>>> [ 388.087903] ? __switch_to_asm+0x34/0x70
>>>>>> [ 388.087905] ? finish_task_switch+0x75/0x250
>>>>>> [ 388.087907] process_one_work+0x1a7/0x370
>>>>>> [ 388.087909] worker_thread+0x30/0x380
>>>>>> [ 388.087911] ? process_one_work+0x370/0x370
>>>>>> [ 388.087912] kthread+0x10c/0x130
>>>>>> [ 388.087913] ? kthread_park+0x80/0x80
>>>>>> [ 388.087914] ret_from_fork+0x35/0x40
>>>>>
>>>>> This process is actually waiting for IO to complete while holding the
>>>>> checkpoint_mutex, which holds up everybody else. The question is why the IO
>>>>> doesn't complete - that's definitely outside of the filesystem. Maybe a bug
>>>>> in the block layer, storage driver, or something like that... What does
>>>>> 'cat /sys/block/<device-with-xfstests>/inflight' show?
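>>>>>
>>>>> For example (the two columns are inflight reads and writes; sampling in a
>>>>> loop while the test runs is just a suggestion, and sdc is assumed here):
>>>>>
>>>>>   while sleep 1; do cat /sys/block/sdc/inflight; done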
>>>> Sorry for the late reply.
>>>> The value is 0, which suggests there is no inflight I/O (or could this be
>>>> an accounting bug or a storage driver bug?).
>>>> Also, it doesn't hang on my physical machine; it only hangs in a VM.
>>>
>>> Hum, curious. Just to make sure: did you check sdc (since that appears to
>>> be the stuck device)?
>> Yes, I checked sdc; its value is 0.
>> # cat /sys/block/sdc/inflight
>> 0 0
>
> OK, thanks!
>
>>>> So what should I do as the next step (change the storage disk format)?
>>>
>>> I'd try a couple of things:
>>>
>>> 1) If you mount ext4 with barrier=0 mount option, does the problem go away?
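>>>
>>> For example (the mount point is just an example; with fstests you could
>>> instead set MOUNT_OPTIONS="-o barrier=0" in your local config):
>>>
>>>   mount -o barrier=0 /dev/sdc /mnt/scratch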
>> Yes. With barrier=0, this test case doesn't hang.
>
> OK, so there's some problem with how the block layer is handling flush
> bios...
>
>>> 2) Can you run the test and at the same time run 'blktrace -d /dev/sdc' to
>>> gather traces? Once the machine is stuck, abort blktrace, process the
>>> resulting files with 'blkparse -i sdc', and send the compressed blkparse
>>> output here. We should be able to see what was happening with the stuck
>>> request in the trace, and maybe that will tell us something.
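>>>
>>> Roughly like this (file names are illustrative; blktrace writes per-CPU
>>> sdc.blktrace.* files into the current directory):
>>>
>>>   blktrace -d /dev/sdc &      # trace in the background while the test runs
>>>   # ... run generic/269 until the machine gets stuck, then:
>>>   kill -INT %1                # stop blktrace
>>>   blkparse -i sdc > sdc.txt   # merge and decode the per-CPU trace files
>>>   gzip sdc.txt                # compress before sending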
>> The log is too big (58M) and our email limit is 5M.
>
> OK, can you put the log somewhere for download? Alternatively, you could
> provide only the last, say, 20s of the trace, which should hopefully fit
> within the limit...
OK. I will use the split command and send the pieces to you privately to
avoid adding noise to the list.
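Something like this should keep each piece under the 5M limit (file names
are just examples, assuming the compressed blkparse output is sdc.txt.gz):

  split -b 4M sdc.txt.gz sdc.part.   # split into 4M chunks: sdc.part.aa, ab, ...
  # reassemble on your side with: cat sdc.part.* > sdc.txt.gz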
Best Regards
Yang Xu
>
> Honza
>