linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: JFFS2 deadlock
       [not found] ` <1453910781.20662.35.camel@infinera.com>
@ 2016-01-28  0:05   ` Brian Norris
  2016-01-28  8:16     ` Thomas.Betker
  0 siblings, 1 reply; 9+ messages in thread
From: Brian Norris @ 2016-01-28  0:05 UTC (permalink / raw)
  To: Joakim Tjernlund, David Woodhouse
  Cc: sztomi89, linux-mtd, linux-fsdevel, linux-kernel,
	David Woodhouse, Artem Bityutskiy, Thomas Betker, Ming Liu,
	Deng Chao, wangzaiwei, Alexander Viro

+ David (maintainer), linux-fsdevel, and others

On Wed, Jan 27, 2016 at 04:05:35PM +0000, Joakim Tjernlund wrote:
> On Wed, 2016-01-27 at 16:36 +0100, Szabó Tamás wrote:
> > Hello all,
> > 
> > I work on an embedded system running Linux 3.10 and found a deadlock
> > situation between jffs2_readpage and jffs2_write.
> > The problem is present on the latest 4.4 kernel too and occurs when
> > two tasks want to access the same file, one reads and the other writes it.
> > 
> > The kernel stack traces for writer and reader in deadlock:
> > 
> > __switch_to+0x4c/0x98
> > sleep_on_page+0x10/0x24
> > __lock_page+0x8c/0x9c
> > find_lock_page+0x7c/0x94
> > grab_cache_page_write_begin+0x64/0xd8
> > jffs2_write_begin+0x6c/0x2ec
> > generic_file_buffered_write+0x188/0x258
> > __generic_file_aio_write+0x1e0/0x484
> > generic_file_aio_write+0x70/0xfc
> > do_sync_write+0x7c/0xd4
> > vfs_write+0xc8/0x1b0
> > SyS_write+0x4c/0xa8
> > ret_from_syscall+0x0/0x38
> > 
> > __switch_to+0x4c/0x98
> > jffs2_readpage+0x28/0x5c
> > generic_file_aio_read+0x22c/0x7a0
> > do_sync_read+0x7c/0xd4
> > vfs_read+0xb0/0x170
> > SyS_read+0x4c/0xa8
> > ret_from_syscall+0x0/0x38
> > 
> > The root cause here is the locking order of f->sem mutex and pagelock.
> > jffs2_readpage function gets the page in locked state and then locks
> > the f->sem mutex, while jffs2_write_begin does it in reverse order.
> > 
> > I found a commit that brought in this bug.
> > That was a fix for another deadlock issue:
> > https://github.com/torvalds/linux/commit/5ffd3412ae5536a4c57469cb8ea31887121dcb2e
> > 
> > According to this commit and my code inspections the lock orders may be
> > the following:
> > readpage: page lock, f->sem
> > writepage_begin: f->sem, page lock
> > writepage_end: page lock, f->sem
> > GC: f->sem, page lock
> 
> I am not sure if this is the first time I hear this or if someone else has reported
> a similar issue.

No, I'm pretty sure this is not the first report. I think there have
even been patches. The problem is that JFFS2 is effectively
unmaintained, despite what MAINTAINERS has to say about it.

Previous reports:

Subject: Another JFFS2 deadlock, kernel 3.4.11
http://thread.gmane.org/gmane.linux.drivers.mtd/62523

Subject: [JFFS2] Revision "jffs2: Fix lock acquisition order bug in
jffs2_write_begin" introduces another dead lock.
http://thread.gmane.org/gmane.linux.drivers.mtd/47986

There are other reports of deadlocks in jffs2_readpage, but in my
limited scanning, they look slightly different, so I won't include them
in this list.

For reference: outstanding patches, waiting for a maintainer (I've been
keeping patchwork up-to-date, mostly, but I'm not touching JFFS2 myself,
for the most part):
http://patchwork.ozlabs.org/project/linux-mtd/list/?q=jffs2

I'm tempted to resurrect this patch, to mark JFFS2 as Orphaned /
Obsolete:

http://patchwork.ozlabs.org/patch/422160/

David, can you please clarify your role here? Are you maintaining JFFS2
or not? Or perhaps someone else should be added? I don't really know any
interested parties.

Maybe the MAINTAINERS entry should be directed to linux-fsdevel too?

Brian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: JFFS2 deadlock
  2016-01-28  0:05   ` JFFS2 deadlock Brian Norris
@ 2016-01-28  8:16     ` Thomas.Betker
  2016-02-01 14:28       ` David Woodhouse
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas.Betker @ 2016-01-28  8:16 UTC (permalink / raw)
  To: computersforpeace
  Cc: Artem Bityutskiy, Deng Chao, David Woodhouse, Joakim Tjernlund,
	linux-fsdevel, linux-kernel, linux-mtd, linux-mtd, Ming Liu,
	sztomi89, Thomas Betker, Alexander Viro, wangzaiwei

Hello Brian:

> No, I'm pretty sure this is not the first report. I think there have
> even been patches. The problem is that JFFS2 is effectively
> unmaintained, despite what MAINTAINERS has to say about it.
> 
> Previous reports:
> 
> Subject: Another JFFS2 deadlock, kernel 3.4.11
> http://thread.gmane.org/gmane.linux.drivers.mtd/62523
> 
> Subject: [JFFS2] Revision "jffs2: Fix lock acquisition order bug in
> jffs2_write_begin" introduces another dead lock.
> http://thread.gmane.org/gmane.linux.drivers.mtd/47986

> For reference: outstanding patches, waiting for a maintainer (I've been
> keeping patchwork up-to-date, mostly, but I'm not touching JFFS2 myself,
> for the most part):
> http://patchwork.ozlabs.org/project/linux-mtd/list/?q=jffs2

Subject: [PATCH] Revert "jffs2: Fix lock acquisition order bug in 
jffs2_write_begin"
http://article.gmane.org/gmane.linux.drivers.mtd/62951

This is a patch revising my original patch, which I sent to linux-mtd on 
10-Nov-2015. I didn't see a response yet, but it's one of the outstanding 
patches above.

Best regards,
Thomas

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: JFFS2 deadlock
  2016-01-28  8:16     ` Thomas.Betker
@ 2016-02-01 14:28       ` David Woodhouse
  2016-02-01 18:54         ` Thomas.Betker
       [not found]         ` <OF2969B332.8B296F0B-ONC1257F4C.006307EB-C1257F4C.0067DE44@LocalDomain>
  0 siblings, 2 replies; 9+ messages in thread
From: David Woodhouse @ 2016-02-01 14:28 UTC (permalink / raw)
  To: Thomas.Betker, computersforpeace
  Cc: linux-mtd, Artem Bityutskiy, Thomas Betker, linux-kernel,
	Joakim Tjernlund, sztomi89, wangzaiwei, linux-mtd,
	Alexander Viro, Ming Liu, linux-fsdevel, Deng Chao

[-- Attachment #1: Type: text/plain, Size: 3738 bytes --]

On Thu, 2016-01-28 at 09:16 +0100, Thomas.Betker@rohde-schwarz.com wrote:
> 
> Subject: [PATCH] Revert "jffs2: Fix lock acquisition order bug in 
> jffs2_write_begin"
> http://article.gmane.org/gmane.linux.drivers.mtd/62951
> 
> This is a patch revising my original patch, which I sent to linux-mtd on 
> 10-Nov-2015. I didn't see a response yet, but it's one of the outstanding 
> patches above.

That looks necessary but not sufficient. I think we need this
(untested) patch on top of it, to ensure that we *always* take the page
lock before f->sem?

Please could you try what's in the tree at
http://git.infradead.org/users/dwmw2/jffs2-fixes.git

----
From: David Woodhouse <David.Woodhouse@intel.com>
Subject: [PATCH] jffs2: Fix page lock / f->sem deadlock

With this fix, all code paths should now be obtaining the page lock before
f->sem.

Reported-by: Szabó Tamás <sztomi89@gmail.com>
Reported-by: Thomas Betker <thomas.betker@rohde-schwarz.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
---
 fs/jffs2/README.Locking |  5 +----
 fs/jffs2/gc.c           | 17 ++++++++++-------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/jffs2/README.Locking b/fs/jffs2/README.Locking
index 3ea3655..8918ac9 100644
--- a/fs/jffs2/README.Locking
+++ b/fs/jffs2/README.Locking
@@ -2,10 +2,6 @@
 	JFFS2 LOCKING DOCUMENTATION
 	---------------------------
 
-At least theoretically, JFFS2 does not require the Big Kernel Lock
-(BKL), which was always helpfully obtained for it by Linux 2.4 VFS
-code. It has its own locking, as described below.
-
 This document attempts to describe the existing locking rules for
 JFFS2. It is not expected to remain perfectly up to date, but ought to
 be fairly close.
@@ -69,6 +65,7 @@ Ordering constraints:
 	   any f->sem held.
 	2. Never attempt to lock two file mutexes in one thread.
 	   No ordering rules have been made for doing so.
+	3. Never lock a page cache page with f->sem held.
 
 
 	erase_completion_lock spinlock
diff --git a/fs/jffs2/gc.c b/fs/jffs2/gc.c
index 6fb0802..5919fef 100644
--- a/fs/jffs2/gc.c
+++ b/fs/jffs2/gc.c
@@ -1316,14 +1316,17 @@ static int jffs2_garbage_collect_dnode(struct jffs2_sb_info *c, struct jffs2_era
 		BUG_ON(start > orig_start);
 	}
 
-	/* First, use readpage() to read the appropriate page into the page cache */
-	/* Q: What happens if we actually try to GC the _same_ page for which commit_write()
-	 *    triggered garbage collection in the first place?
-	 * A: I _think_ it's OK. read_cache_page shouldn't deadlock, we'll write out the
-	 *    page OK. We'll actually write it out again in commit_write, which is a little
-	 *    suboptimal, but at least we're correct.
-	 */
+	/* The rules state that we must obtain the page lock *before* f->sem, so
+	 * drop f->sem temporarily. Since we also hold c->alloc_sem, nothing's
+	 * actually going to *change* so we're safe; we only allow reading.
+	 *
+	 * It is important to note that jffs2_write_begin() will ensure that its
+	 * page is marked Uptodate before allocating space. That means that if we
+	 * end up here trying to GC the *same* page that jffs2_write_begin() is
+	 * trying to write out, read_cache_page() will not deadlock. */
+	mutex_unlock(&f->sem);
 	pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
+	mutex_lock(&f->sem);
 
 	if (IS_ERR(pg_ptr)) {
 		pr_warn("read_cache_page() returned error: %ld\n",
-- 
2.5.0

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: JFFS2 deadlock
  2016-02-01 14:28       ` David Woodhouse
@ 2016-02-01 18:54         ` Thomas.Betker
       [not found]         ` <OF2969B332.8B296F0B-ONC1257F4C.006307EB-C1257F4C.0067DE44@LocalDomain>
  1 sibling, 0 replies; 9+ messages in thread
From: Thomas.Betker @ 2016-02-01 18:54 UTC (permalink / raw)
  To: dwmw2
  Cc: computersforpeace, Artem Bityutskiy, Deng Chao, Joakim Tjernlund,
	linux-fsdevel, linux-kernel, linux-mtd, linux-mtd, Ming Liu,
	sztomi89, Thomas Betker, Alexander Viro, wangzaiwei

Hello David:

> > Subject: [PATCH] Revert "jffs2: Fix lock acquisition order bug in 
> > jffs2_write_begin"
> > http://article.gmane.org/gmane.linux.drivers.mtd/62951
> > 
> > This is a patch revising my original patch, which I sent to linux-mtd 
on 
> > 10-Nov-2015. I didn't see a response yet, but it's one of the 
outstanding 
> > patches above.
> 
> That looks necessary but not sufficient. I think we need this
> (untested) patch on top of it, to ensure that we *always* take the page
> lock before f->sem?
> 
> Please could you try what's in the tree at
> http://git.infradead.org/users/dwmw2/jffs2-fixes.git

I have been using a variant of Deng Chao's patch here for a long time, so 
that one has been tested quite a bit: 
http://lists.infradead.org/pipermail/linux-mtd/2013-August/048352.html. 
The problem with that patch was that it modified mm/filemap.c and 
include/linux/pagemap.h, which we were not too happy about.

Your patch looks much simpler, and I will definitely test it. It may take 
a few days, though, as I have to unearth the test scripts, and find a time 
slot for testing.

Best regards,
Thomas Betker

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: JFFS2 deadlock
       [not found]         ` <OF2969B332.8B296F0B-ONC1257F4C.006307EB-C1257F4C.0067DE44@LocalDomain>
@ 2016-02-18  9:57           ` Thomas.Betker
  2016-02-25  7:46             ` Joakim Tjernlund
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas.Betker @ 2016-02-18  9:57 UTC (permalink / raw)
  To: David Woodhouse
  Cc: computersforpeace, Artem Bityutskiy, Deng Chao, Joakim Tjernlund,
	linux-fsdevel, linux-kernel, linux-mtd, linux-mtd, Ming Liu,
	sztomi89, Alexander Viro, wangzaiwei

Hello David:

> > Please could you try what's in the tree at
> > http://git.infradead.org/users/dwmw2/jffs2-fixes.git

> Your patch looks much simpler, and I will definitely test it. It may
> take a few days, though, as I have to unearth the test scripts, and 
> find a time slot for testing.

Here is what I did (sorry for the wait, things were piling up):

1) Removed Deng Chao's patch from my kernel, added your patch "jffs2: Fix 
page lock / f->sem deadlock". I am still on linux-3.14, but jffs2 hasn't 
changed much since then, so this shouldn't make a difference. Added a 
printk() before mutex_unlock(&f->sem) to check if the prospective page was 
locked, i.e. if the deadlock situation actually occurs.

2) On my target system, started wangzaiwei's test (with some fixes), plus 
a loop copying a large file over and over (to get GC rolling, which 
increases the chance of a deadlock).

3) After 24 hours, the system was still alive, and the printk() had been 
hit 32 times.

So yes, I am confident that your patch avoids the deadlock, and if that's 
good enough for you, please add my Tested-by:. However, I am going to run 
some more stress tests here just to check that there weren't any 
unexpected side effects. (Don't get me wrong -- I am sure the patch is 
fine, but for me it's a case of "once bitten, twice shy" ...)

Best regards,
Thomas Betker

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: JFFS2 deadlock
  2016-02-18  9:57           ` Thomas.Betker
@ 2016-02-25  7:46             ` Joakim Tjernlund
  2016-02-25  9:57               ` Thomas.Betker
  0 siblings, 1 reply; 9+ messages in thread
From: Joakim Tjernlund @ 2016-02-25  7:46 UTC (permalink / raw)
  To: dwmw2, Thomas.Betker
  Cc: linux-kernel, linux-mtd, sztomi89, wangzaiwei, deng.chao1,
	liu.ming50, dedekind1, viro, linux-mtd-bounces, linux-fsdevel,
	computersforpeace

On Thu, 2016-02-18 at 10:57 +0100, Thomas.Betker@rohde-schwarz.com wrote:
> Hello David:
> 
> > 
> > > 
> > > Please could you try what's in the tree at
> > > http://git.infradead.org/users/dwmw2/jffs2-fixes.git
> > 
> > Your patch looks much simpler, and I will definitely test it. It may
> > take a few days, though, as I have to unearth the test scripts, and 
> > find a time slot for testing.
> Here is what I did (sorry for the wait, things were piling up):
> 
> 1) Removed Deng Chao's patch from my kernel, added your patch "jffs2: Fix 
> page lock / f->sem deadlock". I am still on linux-3.14, but jffs2 hasn't 
> changed much since then, so this shouldn't make a difference. Added a 
> printk() before mutex_unlock(&f->sem) to check if the prospective page was 
> locked, i.e. if the deadlock situation actually occurs.
> 
> 2) On my target system, started wangzaiwei's test (with some fixes), plus 
> a loop copying a large file over and over (to get GC rolling, which 
> increases the chance of a deadlock).
> 
> 3) After 24 hours, the system was still alive, and the printk() had been 
> hit 32 times.
> 
> So yes, I am confident that your patch avoids the deadlock, and if that's 
> good enough for you, please add my Tested-by:. However, I am going to run 
> some more stress tests here just to check that there weren't any 
> unexpected side effects. (Don't get me wrong -- I am sure the patch is 
> fine, but for me it's a case of "once bitten, twice shy" ...)

Can we get this upstream before next release? I don't think there will much more
testing at this point.

 Jocke

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: JFFS2 deadlock
  2016-02-25  7:46             ` Joakim Tjernlund
@ 2016-02-25  9:57               ` Thomas.Betker
  2016-02-25 11:22                 ` David Woodhouse
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas.Betker @ 2016-02-25  9:57 UTC (permalink / raw)
  To: Joakim.Tjernlund
  Cc: computersforpeace, dedekind1, deng.chao1, dwmw2, linux-fsdevel,
	linux-kernel, linux-mtd, linux-mtd-bounces, liu.ming50, sztomi89,
	viro, wangzaiwei

Hello Joakim:

> Can we get this upstream before next release? I don't think there 
> will much more
> testing at this point.

I would second this. Actually, I did some additional stress testing, but 
didn't see any problems.

Best regards,
Thomas

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: JFFS2 deadlock
  2016-02-25  9:57               ` Thomas.Betker
@ 2016-02-25 11:22                 ` David Woodhouse
  2016-02-25 17:57                   ` Brian Norris
  0 siblings, 1 reply; 9+ messages in thread
From: David Woodhouse @ 2016-02-25 11:22 UTC (permalink / raw)
  To: Thomas.Betker, Joakim.Tjernlund
  Cc: computersforpeace, dedekind1, deng.chao1, linux-fsdevel,
	linux-kernel, linux-mtd, linux-mtd-bounces, liu.ming50, sztomi89,
	viro, wangzaiwei

[-- Attachment #1: Type: text/plain, Size: 840 bytes --]

On Thu, 2016-02-25 at 10:57 +0100, Thomas.Betker@rohde-schwarz.com wrote:
> Hello Joakim:
> 
> > Can we get this upstream before next release? I don't think there 
> > will much more
> > testing at this point.
> 
> I would second this. Actually, I did some additional stress testing, but 
> didn't see any problems.

OK, of the four fixes I had in my branch, three are Cc:stable and I
have pushed them to linux-mtd.git. I'll give them a day or two in
linux-next and then send a pull request.

The fourth, which is just a mount speedup, I've put in l2-mtd.git.

Brian, was that the right way to do it? I feel like an impostor pushing
to my own repositories now :)

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: JFFS2 deadlock
  2016-02-25 11:22                 ` David Woodhouse
@ 2016-02-25 17:57                   ` Brian Norris
  0 siblings, 0 replies; 9+ messages in thread
From: Brian Norris @ 2016-02-25 17:57 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Thomas.Betker, Joakim.Tjernlund, dedekind1, deng.chao1,
	linux-fsdevel, linux-kernel, linux-mtd, linux-mtd-bounces,
	liu.ming50, sztomi89, viro, wangzaiwei

On Thu, Feb 25, 2016 at 11:22:12AM +0000, David Woodhouse wrote:
> OK, of the four fixes I had in my branch, three are Cc:stable and I
> have pushed them to linux-mtd.git. I'll give them a day or two in
> linux-next and then send a pull request.
> 
> The fourth, which is just a mount speedup, I've put in l2-mtd.git.
> 
> Brian, was that the right way to do it? I feel like an impostor pushing
> to my own repositories now :)

Nope, that's perfect. Happy to have a JFFS2 maintainer again :)

(Feel free to send the MAINTAINERS update that's also in
linux-mtd.git/master. It was meant to ride along with whatever else
needed pushed to Linus.)

Brian

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-02-25 17:57 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAKatumo13ZM2o3dK9_ouxzta-sxeiB5E236Qh5dfbnEBc9CuMw@mail.gmail.com>
     [not found] ` <1453910781.20662.35.camel@infinera.com>
2016-01-28  0:05   ` JFFS2 deadlock Brian Norris
2016-01-28  8:16     ` Thomas.Betker
2016-02-01 14:28       ` David Woodhouse
2016-02-01 18:54         ` Thomas.Betker
     [not found]         ` <OF2969B332.8B296F0B-ONC1257F4C.006307EB-C1257F4C.0067DE44@LocalDomain>
2016-02-18  9:57           ` Thomas.Betker
2016-02-25  7:46             ` Joakim Tjernlund
2016-02-25  9:57               ` Thomas.Betker
2016-02-25 11:22                 ` David Woodhouse
2016-02-25 17:57                   ` Brian Norris

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).