From: Kyeong Yoo <Kyeong.Yoo@alliedtelesis.co.nz>
To: Richard Weinberger <richard.weinberger@gmail.com>
Cc: "linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>
Subject: Re: [PATCH] jffs2: GC deadlock reading a page that is used in jffs2_write_begin()
Date: Tue, 18 Jul 2017 23:00:58 +0000 [thread overview]
Message-ID: <0e1399d9-a7ed-9838-7a03-0a1b18be2234@alliedtelesis.co.nz> (raw)
In-Reply-To: <CAFLxGvwAJ+xfN=1QGfe_6mNksmMr6cTjSSRKPqYGKMi-1zvU2Q@mail.gmail.com>
Hi Richard,
This patch was on top of v4.12 and still applies cleanly to the latest
mainline which has the following commit at the top:
commit 74cbd96bc2e00f5daa805e2ebf49e998f7045062 (origin/master, origin/HEAD)
Merge: bef85bd7db26 6409e84ec58f
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue Jul 18 11:51:08 2017 -0700
Merge tag 'md/4.13-rc2' of
git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
And I can see the commit you mentioned exists in my repo.
commit 157078f64b8a9cd7011b6b900b2f2498df850748
Author: Thomas Betker <thomas.betker@rohde-schwarz.com>
Date: Tue Nov 10 22:18:15 2015 +0100
Revert "jffs2: Fix lock acquisition order bug in jffs2_write_begin"
The logic of deadlock still exists with the above commit, that's what I
described in the patch commit message. Actual kernel version I saw the
issue is 4.4.6. But what I found is the logic regarding locking is same
in the current mainline.
For you information, attached the kernel panic info below.
Thanks,
Kyeong
<3>INFO: task jffs2_gcd_mtd3:190 blocked for more than 120 seconds.
<3> Tainted: P O 4.4.6-at1 #1
<3>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>jffs2_gcd_mtd3 D 00000000 0 190 2 0x00000800
<6>Call Trace:
<6>[a712faf0] [a4003180] 0xa4003180 (unreliable)
<6>[a712fbb0] [8068ea54] __schedule+0x1f4/0x5f0
<6>[a712fbf0] [8068ee84] schedule+0x34/0xb0
<6>[a712fc00] [80691b48] schedule_timeout+0x188/0x1d0
<6>[a712fc50] [8068e824] io_schedule_timeout+0x84/0xc0
<6>[a712fc70] [8068f7c0] bit_wait_io+0x20/0x70
<6>[a712fc80] [8068f584] __wait_on_bit_lock+0xa4/0x140
<6>[a712fcb0] [800b51bc] __lock_page+0x9c/0xb0
<6>[a712fce0] [800b58c8] do_read_cache_page+0x168/0x200
<6>[a712fd20] [8022b670] jffs2_gc_fetch_page+0x30/0xd0
<6>[a712fd30] [80694934] jffs2_garbage_collect_live+0x958/0xe64
<6>[a712fde0] [80228934] jffs2_garbage_collect_pass+0x454/0x7e0
<6>[a712fe40] [8022a290] jffs2_garbage_collect_thread+0xc0/0x1d0
<6>[a712fef0] [8003fb38] kthread+0xc8/0xe0
<6>[a712ff40] [8000eed8] ret_from_kernel_thread+0x5c/0x64
<3>INFO: task syslog-ng:483 blocked for more than 120 seconds.
<3> Tainted: P O 4.4.6-at1 #1
<3>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>syslog-ng D 0fc2101c 0 483 1 0x00000000
<6>Call Trace:
<6>[a663bb60] [a6346840] 0xa6346840 (unreliable)
<6>[a663bc20] [8068ea54] __schedule+0x1f4/0x5f0
<6>[a663bc60] [8068ee84] schedule+0x34/0xb0
<6>[a663bc70] [8068f230] schedule_preempt_disabled+0x10/0x20
<6>[a663bc80] [806908a0] __mutex_lock_slowpath+0xa0/0x140
<6>[a663bcc0] [80690994] mutex_lock+0x54/0x70
<6>[a663bcd0] [802237e0] jffs2_reserve_space+0x40/0x310
<6>[a663bd40] [802263b0] jffs2_write_inode_range+0xe0/0x2e0
<6>[a663bda0] [80220c04] jffs2_write_end+0xf4/0x2d0
<6>[a663bde0] [800b62b4] generic_perform_write+0x114/0x200
<6>[a663be40] [800b772c] __generic_file_write_iter+0x1bc/0x230
<6>[a663be80] [800b78b0] generic_file_write_iter+0x110/0x300
<6>[a663bea0] [80103cb0] __vfs_write+0xc0/0x130
<6>[a663bef0] [80104660] vfs_write+0xa0/0x1b0
<6>[a663bf10] [8010506c] SyS_write+0x4c/0xd0
<6>[a663bf40] [8000eda0] ret_from_syscall+0x0/0x3c
<6>--- interrupt: c01 at 0xfc2101c
<6> LR = 0x10023e94
<3>INFO: task scp:15190 blocked for more than 120 seconds.
<3> Tainted: P O 4.4.6-at1 #1
<3>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>scp D 1fdb3b4c 0 15190 15189 0x00000000
<6>Call Trace:
<6>[a468fb60] [0929effc] 0x929effc (unreliable)
<6>[a468fc20] [8068ea54] __schedule+0x1f4/0x5f0
<6>[a468fc60] [8068ee84] schedule+0x34/0xb0
<6>[a468fc70] [8068f230] schedule_preempt_disabled+0x10/0x20
<6>[a468fc80] [806908a0] __mutex_lock_slowpath+0xa0/0x140
<6>[a468fcc0] [80690994] mutex_lock+0x54/0x70
<6>[a468fcd0] [802237e0] jffs2_reserve_space+0x40/0x310
<6>[a468fd40] [802263b0] jffs2_write_inode_range+0xe0/0x2e0
<6>[a468fda0] [80220c04] jffs2_write_end+0xf4/0x2d0
<6>[a468fde0] [800b62b4] generic_perform_write+0x114/0x200
<6>[a468fe40] [800b772c] __generic_file_write_iter+0x1bc/0x230
<6>[a468fe80] [800b78b0] generic_file_write_iter+0x110/0x300
<6>[a468fea0] [80103cb0] __vfs_write+0xc0/0x130
<6>[a468fef0] [80104660] vfs_write+0xa0/0x1b0
<6>[a468ff10] [8010506c] SyS_write+0x4c/0xd0
<6>[a468ff40] [8000eda0] ret_from_syscall+0x0/0x3c
<6>--- interrupt: c01 at 0x1fdb3b4c
<6> LR = 0x2021adcc
On 18/07/17 19:17, Richard Weinberger wrote:
> Kyeong Yoo,
>
> On Mon, Jul 17, 2017 at 11:54 PM, Kyeong Yoo
> <Kyeong.Yoo@alliedtelesis.co.nz> wrote:
>> Hi,
>>
>> Can someone review this and give me any feedback?
>>
>> This patch is to prevent deadlock in jffs2 GC.
>>
>> Thanks,
>> Kyeong
>>
>>
>> On 04/07/17 16:22, Kyeong Yoo wrote:
>>> GC task can deadlock in read_cache_page() because it may attempt
>>> to release a page that is actually allocated by another task in
>>> jffs2_write_begin().
>>> The reason is that in jffs2_write_begin() there is a small window
>>> a cache page is allocated for use but not set Uptodate yet.
>>>
>>> This ends up with a deadlock between two tasks:
>>> 1) A task (e.g. file copy)
>>> - jffs2_write_begin() locks a cache page
>>> - jffs2_write_end() tries to lock "alloc_sem" from
>>> jffs2_reserve_space() <-- STUCK
>>> 2) GC task (jffs2_gcd_mtd3)
>>> - jffs2_garbage_collect_pass() locks "alloc_sem"
>>> - try to lock the same cache page in read_cache_page() <-- STUCK
>
> Your description does not match the code. Are you seeing this on a
> recent mainline kernel?
> Can t be that your kernel does not has this commit?
> commit 157078f64b8a9cd7011b6b900b2f2498df850748
> Author: Thomas Betker <thomas.betker@rohde-schwarz.com>
> Date: Tue Nov 10 22:18:15 2015 +0100
>
> Revert "jffs2: Fix lock acquisition order bug in jffs2_write_begin"
>
> This reverts commit 5ffd3412ae55
> ("jffs2: Fix lock acquisition order bug in jffs2_write_begin").
>
>
next prev parent reply other threads:[~2017-07-18 23:01 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-04 4:22 [PATCH] jffs2: GC deadlock reading a page that is used in jffs2_write_begin() Kyeong Yoo
2017-07-17 21:54 ` Kyeong Yoo
2017-07-18 7:17 ` Richard Weinberger
2017-07-18 23:00 ` Kyeong Yoo [this message]
2018-12-15 22:37 ` Richard Weinberger
2019-01-25 6:30 ` Hou Tao
2020-05-20 0:00 ` Hamish Martin
2020-05-25 19:45 ` Richard Weinberger
2020-06-02 21:00 ` Richard Weinberger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0e1399d9-a7ed-9838-7a03-0a1b18be2234@alliedtelesis.co.nz \
--to=kyeong.yoo@alliedtelesis.co.nz \
--cc=linux-mtd@lists.infradead.org \
--cc=richard.weinberger@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).