* Re: JFFS2 deadlock
[not found] ` <1453910781.20662.35.camel@infinera.com>
@ 2016-01-28 0:05 ` Brian Norris
2016-01-28 8:16 ` Thomas.Betker
0 siblings, 1 reply; 9+ messages in thread
From: Brian Norris @ 2016-01-28 0:05 UTC (permalink / raw)
To: Joakim Tjernlund, David Woodhouse
Cc: sztomi89, linux-mtd, linux-fsdevel, linux-kernel,
David Woodhouse, Artem Bityutskiy, Thomas Betker, Ming Liu,
Deng Chao, wangzaiwei, Alexander Viro
+ David (maintainer), linux-fsdevel, and others
On Wed, Jan 27, 2016 at 04:05:35PM +0000, Joakim Tjernlund wrote:
> On Wed, 2016-01-27 at 16:36 +0100, Szabó Tamás wrote:
> > Hello all,
> >
> > I work on an embedded system running Linux 3.10 and found a deadlock
> > situation between jffs2_readpage and jffs2_write.
> > The problem is present on the latest 4.4 kernel too and occurs when
> > two tasks want to access the same file, one reads and the other writes it.
> >
> > The kernel stack traces for writer and reader in deadlock:
> >
> > __switch_to+0x4c/0x98
> > sleep_on_page+0x10/0x24
> > __lock_page+0x8c/0x9c
> > find_lock_page+0x7c/0x94
> > grab_cache_page_write_begin+0x64/0xd8
> > jffs2_write_begin+0x6c/0x2ec
> > generic_file_buffered_write+0x188/0x258
> > __generic_file_aio_write+0x1e0/0x484
> > generic_file_aio_write+0x70/0xfc
> > do_sync_write+0x7c/0xd4
> > vfs_write+0xc8/0x1b0
> > SyS_write+0x4c/0xa8
> > ret_from_syscall+0x0/0x38
> >
> > __switch_to+0x4c/0x98
> > jffs2_readpage+0x28/0x5c
> > generic_file_aio_read+0x22c/0x7a0
> > do_sync_read+0x7c/0xd4
> > vfs_read+0xb0/0x170
> > SyS_read+0x4c/0xa8
> > ret_from_syscall+0x0/0x38
> >
> > The root cause here is the locking order of f->sem mutex and pagelock.
> > jffs2_readpage function gets the page in locked state and then locks
> > the f->sem mutex, while jffs2_write_begin does it in reverse order.
> >
> > I found a commit that brought in this bug.
> > That was a fix for another deadlock issue:
> > https://github.com/torvalds/linux/commit/5ffd3412ae5536a4c57469cb8ea31887121dcb2e
> >
> > According to this commit and my code inspections the lock orders may be
> > the following:
> > readpage: page lock, f->sem
> > writepage_begin: f->sem, page lock
> > writepage_end: page lock, f->sem
> > GC: f->sem, page lock
>
> I am not sure if this is the first time I hear this or if someone else has reported
> a similar issue.
No, I'm pretty sure this is not the first report. I think there have
even been patches. The problem is that JFFS2 is effectively
unmaintained, despite what MAINTAINERS has to say about it.
Previous reports:
Subject: Another JFFS2 deadlock, kernel 3.4.11
http://thread.gmane.org/gmane.linux.drivers.mtd/62523
Subject: [JFFS2] Revision "jffs2: Fix lock acquisition order bug in
jffs2_write_begin" introduces another dead lock.
http://thread.gmane.org/gmane.linux.drivers.mtd/47986
There are other reports of deadlocks in jffs2_readpage, but in my
limited scanning, they look slightly different, so I won't include them
in this list.
For reference: outstanding patches, waiting for a maintainer (I've been
keeping patchwork up-to-date, mostly, but I'm not touching JFFS2 myself,
for the most part):
http://patchwork.ozlabs.org/project/linux-mtd/list/?q=jffs2
I'm tempted to resurrect this patch, to mark JFFS2 as Orphaned /
Obsolete:
http://patchwork.ozlabs.org/patch/422160/
David, can you please clarify your role here? Are you maintaining JFFS2
or not? Or perhaps someone else should be added? I don't really know any
interested parties.
Maybe the MAINTAINERS entry should be directed to linux-fsdevel too?
Brian
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: JFFS2 deadlock
2016-01-28 0:05 ` JFFS2 deadlock Brian Norris
@ 2016-01-28 8:16 ` Thomas.Betker
2016-02-01 14:28 ` David Woodhouse
0 siblings, 1 reply; 9+ messages in thread
From: Thomas.Betker @ 2016-01-28 8:16 UTC (permalink / raw)
To: computersforpeace
Cc: Artem Bityutskiy, Deng Chao, David Woodhouse, Joakim Tjernlund,
linux-fsdevel, linux-kernel, linux-mtd, linux-mtd, Ming Liu,
sztomi89, Thomas Betker, Alexander Viro, wangzaiwei
Hello Brian:
> No, I'm pretty sure this is not the first report. I think there have
> even been patches. The problem is that JFFS2 is effectively
> unmaintained, despite what MAINTAINERS has to say about it.
>
> Previous reports:
>
> Subject: Another JFFS2 deadlock, kernel 3.4.11
> http://thread.gmane.org/gmane.linux.drivers.mtd/62523
>
> Subject: [JFFS2] Revision "jffs2: Fix lock acquisition order bug in
> jffs2_write_begin" introduces another dead lock.
> http://thread.gmane.org/gmane.linux.drivers.mtd/47986
> For reference: outstanding patches, waiting for a maintainer (I've been
> keeping patchwork up-to-date, mostly, but I'm not touching JFFS2 myself,
> for the most part):
> http://patchwork.ozlabs.org/project/linux-mtd/list/?q=jffs2
Subject: [PATCH] Revert "jffs2: Fix lock acquisition order bug in
jffs2_write_begin"
http://article.gmane.org/gmane.linux.drivers.mtd/62951
This is a patch revising my original patch, which I sent to linux-mtd on
10-Nov-2015. I didn't see a response yet, but it's one of the outstanding
patches above.
Best regards,
Thomas
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: JFFS2 deadlock
2016-01-28 8:16 ` Thomas.Betker
@ 2016-02-01 14:28 ` David Woodhouse
2016-02-01 18:54 ` Thomas.Betker
[not found] ` <OF2969B332.8B296F0B-ONC1257F4C.006307EB-C1257F4C.0067DE44@LocalDomain>
0 siblings, 2 replies; 9+ messages in thread
From: David Woodhouse @ 2016-02-01 14:28 UTC (permalink / raw)
To: Thomas.Betker, computersforpeace
Cc: linux-mtd, Artem Bityutskiy, Thomas Betker, linux-kernel,
Joakim Tjernlund, sztomi89, wangzaiwei, linux-mtd,
Alexander Viro, Ming Liu, linux-fsdevel, Deng Chao
[-- Attachment #1: Type: text/plain, Size: 3738 bytes --]
On Thu, 2016-01-28 at 09:16 +0100, Thomas.Betker@rohde-schwarz.com wrote:
>
> Subject: [PATCH] Revert "jffs2: Fix lock acquisition order bug in
> jffs2_write_begin"
> http://article.gmane.org/gmane.linux.drivers.mtd/62951
>
> This is a patch revising my original patch, which I sent to linux-mtd on
> 10-Nov-2015. I didn't see a response yet, but it's one of the outstanding
> patches above.
That looks necessary but not sufficient. I think we need this
(untested) patch on top of it, to ensure that we *always* take the page
lock before f->sem?
Please could you try what's in the tree at
http://git.infradead.org/users/dwmw2/jffs2-fixes.git
----
From: David Woodhouse <David.Woodhouse@intel.com>
Subject: [PATCH] jffs2: Fix page lock / f->sem deadlock
With this fix, all code paths should now be obtaining the page lock before
f->sem.
Reported-by: Szabó Tamás <sztomi89@gmail.com>
Reported-by: Thomas Betker <thomas.betker@rohde-schwarz.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
---
fs/jffs2/README.Locking | 5 +----
fs/jffs2/gc.c | 17 ++++++++++-------
2 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/fs/jffs2/README.Locking b/fs/jffs2/README.Locking
index 3ea3655..8918ac9 100644
--- a/fs/jffs2/README.Locking
+++ b/fs/jffs2/README.Locking
@@ -2,10 +2,6 @@
JFFS2 LOCKING DOCUMENTATION
---------------------------
-At least theoretically, JFFS2 does not require the Big Kernel Lock
-(BKL), which was always helpfully obtained for it by Linux 2.4 VFS
-code. It has its own locking, as described below.
-
This document attempts to describe the existing locking rules for
JFFS2. It is not expected to remain perfectly up to date, but ought to
be fairly close.
@@ -69,6 +65,7 @@ Ordering constraints:
any f->sem held.
2. Never attempt to lock two file mutexes in one thread.
No ordering rules have been made for doing so.
+ 3. Never lock a page cache page with f->sem held.
erase_completion_lock spinlock
diff --git a/fs/jffs2/gc.c b/fs/jffs2/gc.c
index 6fb0802..5919fef 100644
--- a/fs/jffs2/gc.c
+++ b/fs/jffs2/gc.c
@@ -1316,14 +1316,17 @@ static int jffs2_garbage_collect_dnode(struct jffs2_sb_info *c, struct jffs2_era
BUG_ON(start > orig_start);
}
- /* First, use readpage() to read the appropriate page into the page cache */
- /* Q: What happens if we actually try to GC the _same_ page for which commit_write()
- * triggered garbage collection in the first place?
- * A: I _think_ it's OK. read_cache_page shouldn't deadlock, we'll write out the
- * page OK. We'll actually write it out again in commit_write, which is a little
- * suboptimal, but at least we're correct.
- */
+ /* The rules state that we must obtain the page lock *before* f->sem, so
+ * drop f->sem temporarily. Since we also hold c->alloc_sem, nothing's
+ * actually going to *change* so we're safe; we only allow reading.
+ *
+ * It is important to note that jffs2_write_begin() will ensure that its
+ * page is marked Uptodate before allocating space. That means that if we
+ * end up here trying to GC the *same* page that jffs2_write_begin() is
+ * trying to write out, read_cache_page() will not deadlock. */
+ mutex_unlock(&f->sem);
pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
+ mutex_lock(&f->sem);
if (IS_ERR(pg_ptr)) {
pr_warn("read_cache_page() returned error: %ld\n",
--
2.5.0
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: JFFS2 deadlock
2016-02-01 14:28 ` David Woodhouse
@ 2016-02-01 18:54 ` Thomas.Betker
[not found] ` <OF2969B332.8B296F0B-ONC1257F4C.006307EB-C1257F4C.0067DE44@LocalDomain>
1 sibling, 0 replies; 9+ messages in thread
From: Thomas.Betker @ 2016-02-01 18:54 UTC (permalink / raw)
To: dwmw2
Cc: computersforpeace, Artem Bityutskiy, Deng Chao, Joakim Tjernlund,
linux-fsdevel, linux-kernel, linux-mtd, linux-mtd, Ming Liu,
sztomi89, Thomas Betker, Alexander Viro, wangzaiwei
Hello David:
> > Subject: [PATCH] Revert "jffs2: Fix lock acquisition order bug in
> > jffs2_write_begin"
> > http://article.gmane.org/gmane.linux.drivers.mtd/62951
> >
> > This is a patch revising my original patch, which I sent to linux-mtd
on
> > 10-Nov-2015. I didn't see a response yet, but it's one of the
outstanding
> > patches above.
>
> That looks necessary but not sufficient. I think we need this
> (untested) patch on top of it, to ensure that we *always* take the page
> lock before f->sem?
>
> Please could you try what's in the tree at
> http://git.infradead.org/users/dwmw2/jffs2-fixes.git
I have been using a variant of Deng Chao's patch here for a long time, so
that one has been tested quite a bit:
http://lists.infradead.org/pipermail/linux-mtd/2013-August/048352.html.
The problem with that patch was that it modified mm/filemap.c and
include/linux/pagemap.h, which we were not too happy about.
Your patch looks much simpler, and I will definitely test it. It may take
a few days, though, as I have to unearth the test scripts, and find a time
slot for testing.
Best regards,
Thomas Betker
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: JFFS2 deadlock
[not found] ` <OF2969B332.8B296F0B-ONC1257F4C.006307EB-C1257F4C.0067DE44@LocalDomain>
@ 2016-02-18 9:57 ` Thomas.Betker
2016-02-25 7:46 ` Joakim Tjernlund
0 siblings, 1 reply; 9+ messages in thread
From: Thomas.Betker @ 2016-02-18 9:57 UTC (permalink / raw)
To: David Woodhouse
Cc: computersforpeace, Artem Bityutskiy, Deng Chao, Joakim Tjernlund,
linux-fsdevel, linux-kernel, linux-mtd, linux-mtd, Ming Liu,
sztomi89, Alexander Viro, wangzaiwei
Hello David:
> > Please could you try what's in the tree at
> > http://git.infradead.org/users/dwmw2/jffs2-fixes.git
> Your patch looks much simpler, and I will definitely test it. It may
> take a few days, though, as I have to unearth the test scripts, and
> find a time slot for testing.
Here is what I did (sorry for the wait, things were piling up):
1) Removed Deng Chao's patch from my kernel, added your patch "jffs2: Fix
page lock / f->sem deadlock". I am still on linux-3.14, but jffs2 hasn't
changed much since then, so this shouldn't make a difference. Added a
printk() before mutex_unlock(&f->sem) to check if the prospective page was
locked, i.e. if the deadlock situation actually occurs.
2) On my target system, started wangzaiwei's test (with some fixes), plus
a loop copying a large file over and over (to get GC rolling, which
increases the chance of a deadlock).
3) After 24 hours, the system was still alive, and the printk() had been
hit 32 times.
So yes, I am confident that your patch avoids the deadlock, and if that's
good enough for you, please add my Tested-by:. However, I am going to run
some more stress tests here just to check that there weren't any
unexpected side effects. (Don't get me wrong -- I am sure the patch is
fine, but for me it's a case of "once bitten, twice shy" ...)
Best regards,
Thomas Betker
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: JFFS2 deadlock
2016-02-18 9:57 ` Thomas.Betker
@ 2016-02-25 7:46 ` Joakim Tjernlund
2016-02-25 9:57 ` Thomas.Betker
0 siblings, 1 reply; 9+ messages in thread
From: Joakim Tjernlund @ 2016-02-25 7:46 UTC (permalink / raw)
To: dwmw2, Thomas.Betker
Cc: linux-kernel, linux-mtd, sztomi89, wangzaiwei, deng.chao1,
liu.ming50, dedekind1, viro, linux-mtd-bounces, linux-fsdevel,
computersforpeace
On Thu, 2016-02-18 at 10:57 +0100, Thomas.Betker@rohde-schwarz.com wrote:
> Hello David:
>
> >
> > >
> > > Please could you try what's in the tree at
> > > http://git.infradead.org/users/dwmw2/jffs2-fixes.git
> >
> > Your patch looks much simpler, and I will definitely test it. It may
> > take a few days, though, as I have to unearth the test scripts, and
> > find a time slot for testing.
> Here is what I did (sorry for the wait, things were piling up):
>
> 1) Removed Deng Chao's patch from my kernel, added your patch "jffs2: Fix
> page lock / f->sem deadlock". I am still on linux-3.14, but jffs2 hasn't
> changed much since then, so this shouldn't make a difference. Added a
> printk() before mutex_unlock(&f->sem) to check if the prospective page was
> locked, i.e. if the deadlock situation actually occurs.
>
> 2) On my target system, started wangzaiwei's test (with some fixes), plus
> a loop copying a large file over and over (to get GC rolling, which
> increases the chance of a deadlock).
>
> 3) After 24 hours, the system was still alive, and the printk() had been
> hit 32 times.
>
> So yes, I am confident that your patch avoids the deadlock, and if that's
> good enough for you, please add my Tested-by:. However, I am going to run
> some more stress tests here just to check that there weren't any
> unexpected side effects. (Don't get me wrong -- I am sure the patch is
> fine, but for me it's a case of "once bitten, twice shy" ...)
Can we get this upstream before next release? I don't think there will much more
testing at this point.
Jocke
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: JFFS2 deadlock
2016-02-25 7:46 ` Joakim Tjernlund
@ 2016-02-25 9:57 ` Thomas.Betker
2016-02-25 11:22 ` David Woodhouse
0 siblings, 1 reply; 9+ messages in thread
From: Thomas.Betker @ 2016-02-25 9:57 UTC (permalink / raw)
To: Joakim.Tjernlund
Cc: computersforpeace, dedekind1, deng.chao1, dwmw2, linux-fsdevel,
linux-kernel, linux-mtd, linux-mtd-bounces, liu.ming50, sztomi89,
viro, wangzaiwei
Hello Joakim:
> Can we get this upstream before next release? I don't think there
> will much more
> testing at this point.
I would second this. Actually, I did some additional stress testing, but
didn't see any problems.
Best regards,
Thomas
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: JFFS2 deadlock
2016-02-25 9:57 ` Thomas.Betker
@ 2016-02-25 11:22 ` David Woodhouse
2016-02-25 17:57 ` Brian Norris
0 siblings, 1 reply; 9+ messages in thread
From: David Woodhouse @ 2016-02-25 11:22 UTC (permalink / raw)
To: Thomas.Betker, Joakim.Tjernlund
Cc: computersforpeace, dedekind1, deng.chao1, linux-fsdevel,
linux-kernel, linux-mtd, linux-mtd-bounces, liu.ming50, sztomi89,
viro, wangzaiwei
[-- Attachment #1: Type: text/plain, Size: 840 bytes --]
On Thu, 2016-02-25 at 10:57 +0100, Thomas.Betker@rohde-schwarz.com wrote:
> Hello Joakim:
>
> > Can we get this upstream before next release? I don't think there
> > will much more
> > testing at this point.
>
> I would second this. Actually, I did some additional stress testing, but
> didn't see any problems.
OK, of the four fixes I had in my branch, three are Cc:stable and I
have pushed them to linux-mtd.git. I'll give them a day or two in
linux-next and then send a pull request.
The fourth, which is just a mount speedup, I've put in l2-mtd.git.
Brian, was that the right way to do it? I feel like an impostor pushing
to my own repositories now :)
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: JFFS2 deadlock
2016-02-25 11:22 ` David Woodhouse
@ 2016-02-25 17:57 ` Brian Norris
0 siblings, 0 replies; 9+ messages in thread
From: Brian Norris @ 2016-02-25 17:57 UTC (permalink / raw)
To: David Woodhouse
Cc: Thomas.Betker, Joakim.Tjernlund, dedekind1, deng.chao1,
linux-fsdevel, linux-kernel, linux-mtd, linux-mtd-bounces,
liu.ming50, sztomi89, viro, wangzaiwei
On Thu, Feb 25, 2016 at 11:22:12AM +0000, David Woodhouse wrote:
> OK, of the four fixes I had in my branch, three are Cc:stable and I
> have pushed them to linux-mtd.git. I'll give them a day or two in
> linux-next and then send a pull request.
>
> The fourth, which is just a mount speedup, I've put in l2-mtd.git.
>
> Brian, was that the right way to do it? I feel like an impostor pushing
> to my own repositories now :)
Nope, that's perfect. Happy to have a JFFS2 maintainer again :)
(Feel free to send the MAINTAINERS update that's also in
linux-mtd.git/master. It was meant to ride along with whatever else
needed pushed to Linus.)
Brian
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2016-02-25 17:57 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CAKatumo13ZM2o3dK9_ouxzta-sxeiB5E236Qh5dfbnEBc9CuMw@mail.gmail.com>
[not found] ` <1453910781.20662.35.camel@infinera.com>
2016-01-28 0:05 ` JFFS2 deadlock Brian Norris
2016-01-28 8:16 ` Thomas.Betker
2016-02-01 14:28 ` David Woodhouse
2016-02-01 18:54 ` Thomas.Betker
[not found] ` <OF2969B332.8B296F0B-ONC1257F4C.006307EB-C1257F4C.0067DE44@LocalDomain>
2016-02-18 9:57 ` Thomas.Betker
2016-02-25 7:46 ` Joakim Tjernlund
2016-02-25 9:57 ` Thomas.Betker
2016-02-25 11:22 ` David Woodhouse
2016-02-25 17:57 ` Brian Norris
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).