linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] jffs2: Fix lock acquisition order bug in gc path
@ 2012-02-09 18:50 Josh Cartwright
  2012-03-29 15:38 ` Brian Norris
  0 siblings, 1 reply; 5+ messages in thread
From: Josh Cartwright @ 2012-02-09 18:50 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd, linux-kernel

The locking policy is such that the erase_complete_block spinlock is
nested within the alloc_sem mutex.  This fixes a case in which the
acquisition order was erroneously reversed.  This issue was caught by
the following lockdep splat:

   =======================================================
   [ INFO: possible circular locking dependency detected ]
   3.0.5 #1
   -------------------------------------------------------
   jffs2_gcd_mtd6/299 is trying to acquire lock:
    (&c->alloc_sem){+.+.+.}, at: [<c01f7714>] jffs2_garbage_collect_pass+0x314/0x890

   but task is already holding lock:
    (&(&c->erase_completion_lock)->rlock){+.+...}, at: [<c01f7708>] jffs2_garbage_collect_pass+0x308/0x890

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #1 (&(&c->erase_completion_lock)->rlock){+.+...}:
          [<c008bec4>] validate_chain+0xe6c/0x10bc
          [<c008c660>] __lock_acquire+0x54c/0xba4
          [<c008d240>] lock_acquire+0xa4/0x114
          [<c046780c>] _raw_spin_lock+0x3c/0x4c
          [<c01f744c>] jffs2_garbage_collect_pass+0x4c/0x890
          [<c01f937c>] jffs2_garbage_collect_thread+0x1b4/0x1cc
          [<c0071a68>] kthread+0x98/0xa0
          [<c000f264>] kernel_thread_exit+0x0/0x8

   -> #0 (&c->alloc_sem){+.+.+.}:
          [<c008ad2c>] print_circular_bug+0x70/0x2c4
          [<c008c08c>] validate_chain+0x1034/0x10bc
          [<c008c660>] __lock_acquire+0x54c/0xba4
          [<c008d240>] lock_acquire+0xa4/0x114
          [<c0466628>] mutex_lock_nested+0x74/0x33c
          [<c01f7714>] jffs2_garbage_collect_pass+0x314/0x890
          [<c01f937c>] jffs2_garbage_collect_thread+0x1b4/0x1cc
          [<c0071a68>] kthread+0x98/0xa0
          [<c000f264>] kernel_thread_exit+0x0/0x8

   other info that might help us debug this:

    Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(&(&c->erase_completion_lock)->rlock);
                                  lock(&c->alloc_sem);
                                  lock(&(&c->erase_completion_lock)->rlock);
     lock(&c->alloc_sem);

    *** DEADLOCK ***

   1 lock held by jffs2_gcd_mtd6/299:
    #0:  (&(&c->erase_completion_lock)->rlock){+.+...}, at: [<c01f7708>] jffs2_garbage_collect_pass+0x308/0x890

   stack backtrace:
   [<c00155dc>] (unwind_backtrace+0x0/0x100) from [<c0463dc0>] (dump_stack+0x20/0x24)
   [<c0463dc0>] (dump_stack+0x20/0x24) from [<c008ae84>] (print_circular_bug+0x1c8/0x2c4)
   [<c008ae84>] (print_circular_bug+0x1c8/0x2c4) from [<c008c08c>] (validate_chain+0x1034/0x10bc)
   [<c008c08c>] (validate_chain+0x1034/0x10bc) from [<c008c660>] (__lock_acquire+0x54c/0xba4)
   [<c008c660>] (__lock_acquire+0x54c/0xba4) from [<c008d240>] (lock_acquire+0xa4/0x114)
   [<c008d240>] (lock_acquire+0xa4/0x114) from [<c0466628>] (mutex_lock_nested+0x74/0x33c)
   [<c0466628>] (mutex_lock_nested+0x74/0x33c) from [<c01f7714>] (jffs2_garbage_collect_pass+0x314/0x890)
   [<c01f7714>] (jffs2_garbage_collect_pass+0x314/0x890) from [<c01f937c>] (jffs2_garbage_collect_thread+0x1b4/0x1cc)
   [<c01f937c>] (jffs2_garbage_collect_thread+0x1b4/0x1cc) from [<c0071a68>] (kthread+0x98/0xa0)
   [<c0071a68>] (kthread+0x98/0xa0) from [<c000f264>] (kernel_thread_exit+0x0/0x8)

Signed-off-by: Josh Cartwright <joshc@linux.com>
---
 fs/jffs2/gc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/jffs2/gc.c b/fs/jffs2/gc.c
index 31dce61..4bbd521 100644
--- a/fs/jffs2/gc.c
+++ b/fs/jffs2/gc.c
@@ -225,8 +225,8 @@ int jffs2_garbage_collect_pass(struct jffs2_sb_info *c)
 			return 0;
 
 		D1(printk(KERN_DEBUG "No progress from erasing blocks; doing GC anyway\n"));
-		spin_lock(&c->erase_completion_lock);
 		mutex_lock(&c->alloc_sem);
+		spin_lock(&c->erase_completion_lock);
 	}
 
 	/* First, work out which block we're garbage-collecting */
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] jffs2: Fix lock acquisition order bug in gc path
  2012-02-09 18:50 [PATCH] jffs2: Fix lock acquisition order bug in gc path Josh Cartwright
@ 2012-03-29 15:38 ` Brian Norris
  2012-03-29 23:29   ` Josh Cartwright
  2012-03-29 23:34   ` [PATCH V2] " Josh Cartwright
  0 siblings, 2 replies; 5+ messages in thread
From: Brian Norris @ 2012-03-29 15:38 UTC (permalink / raw)
  To: Josh Cartwright
  Cc: David Woodhouse, linux-mtd, linux-kernel, Artem Bityutskiy

On Thu, Feb 9, 2012 at 10:50 AM, Josh Cartwright <joshc@linux.com> wrote:
> The locking policy is such that the erase_complete_block spinlock is
> nested within the alloc_sem mutex.  This fixes a case in which the
> acquisition order was erroneously reversed.  This issue was caught by
> the following lockdep splat:

(Bump.) Anybody interested in this patch? It seemed sane to me, and I
had it sitting in my inbox. It does need a trivial rebase, BTW.

Brian

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] jffs2: Fix lock acquisition order bug in gc path
  2012-03-29 15:38 ` Brian Norris
@ 2012-03-29 23:29   ` Josh Cartwright
  2012-03-29 23:34   ` [PATCH V2] " Josh Cartwright
  1 sibling, 0 replies; 5+ messages in thread
From: Josh Cartwright @ 2012-03-29 23:29 UTC (permalink / raw)
  To: Brian Norris; +Cc: David Woodhouse, linux-mtd, linux-kernel, Artem Bityutskiy

On Thu, Mar 29, 2012 at 08:38:41AM -0700, Brian Norris wrote:
> On Thu, Feb 9, 2012 at 10:50 AM, Josh Cartwright <joshc@linux.com> wrote:
> > The locking policy is such that the erase_complete_block spinlock is
> > nested within the alloc_sem mutex.  This fixes a case in which the
> > acquisition order was erroneously reversed.  This issue was caught by
> > the following lockdep splat:
> 
> (Bump.) Anybody interested in this patch? It seemed sane to me, and I
> had it sitting in my inbox. It does need a trivial rebase, BTW.

I'm certainly still interested in it, for what its worth :).  We've had
no problems here for the month and a half its been in our tree.

I'll send out a rebased V2.

Thanks!

-- 
                                            joshc


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH V2] jffs2: Fix lock acquisition order bug in gc path
  2012-03-29 15:38 ` Brian Norris
  2012-03-29 23:29   ` Josh Cartwright
@ 2012-03-29 23:34   ` Josh Cartwright
  2012-04-13 15:14     ` Artem Bityutskiy
  1 sibling, 1 reply; 5+ messages in thread
From: Josh Cartwright @ 2012-03-29 23:34 UTC (permalink / raw)
  To: Brian Norris; +Cc: David Woodhouse, linux-mtd, linux-kernel, Artem Bityutskiy

The locking policy is such that the erase_complete_block spinlock is
nested within the alloc_sem mutex.  This fixes a case in which the
acquisition order was erroneously reversed.  This issue was caught by
the following lockdep splat:

   =======================================================
   [ INFO: possible circular locking dependency detected ]
   3.0.5 #1
   -------------------------------------------------------
   jffs2_gcd_mtd6/299 is trying to acquire lock:
    (&c->alloc_sem){+.+.+.}, at: [<c01f7714>] jffs2_garbage_collect_pass+0x314/0x890

   but task is already holding lock:
    (&(&c->erase_completion_lock)->rlock){+.+...}, at: [<c01f7708>] jffs2_garbage_collect_pass+0x308/0x890

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #1 (&(&c->erase_completion_lock)->rlock){+.+...}:
          [<c008bec4>] validate_chain+0xe6c/0x10bc
          [<c008c660>] __lock_acquire+0x54c/0xba4
          [<c008d240>] lock_acquire+0xa4/0x114
          [<c046780c>] _raw_spin_lock+0x3c/0x4c
          [<c01f744c>] jffs2_garbage_collect_pass+0x4c/0x890
          [<c01f937c>] jffs2_garbage_collect_thread+0x1b4/0x1cc
          [<c0071a68>] kthread+0x98/0xa0
          [<c000f264>] kernel_thread_exit+0x0/0x8

   -> #0 (&c->alloc_sem){+.+.+.}:
          [<c008ad2c>] print_circular_bug+0x70/0x2c4
          [<c008c08c>] validate_chain+0x1034/0x10bc
          [<c008c660>] __lock_acquire+0x54c/0xba4
          [<c008d240>] lock_acquire+0xa4/0x114
          [<c0466628>] mutex_lock_nested+0x74/0x33c
          [<c01f7714>] jffs2_garbage_collect_pass+0x314/0x890
          [<c01f937c>] jffs2_garbage_collect_thread+0x1b4/0x1cc
          [<c0071a68>] kthread+0x98/0xa0
          [<c000f264>] kernel_thread_exit+0x0/0x8

   other info that might help us debug this:

    Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(&(&c->erase_completion_lock)->rlock);
                                  lock(&c->alloc_sem);
                                  lock(&(&c->erase_completion_lock)->rlock);
     lock(&c->alloc_sem);

    *** DEADLOCK ***

   1 lock held by jffs2_gcd_mtd6/299:
    #0:  (&(&c->erase_completion_lock)->rlock){+.+...}, at: [<c01f7708>] jffs2_garbage_collect_pass+0x308/0x890

   stack backtrace:
   [<c00155dc>] (unwind_backtrace+0x0/0x100) from [<c0463dc0>] (dump_stack+0x20/0x24)
   [<c0463dc0>] (dump_stack+0x20/0x24) from [<c008ae84>] (print_circular_bug+0x1c8/0x2c4)
   [<c008ae84>] (print_circular_bug+0x1c8/0x2c4) from [<c008c08c>] (validate_chain+0x1034/0x10bc)
   [<c008c08c>] (validate_chain+0x1034/0x10bc) from [<c008c660>] (__lock_acquire+0x54c/0xba4)
   [<c008c660>] (__lock_acquire+0x54c/0xba4) from [<c008d240>] (lock_acquire+0xa4/0x114)
   [<c008d240>] (lock_acquire+0xa4/0x114) from [<c0466628>] (mutex_lock_nested+0x74/0x33c)
   [<c0466628>] (mutex_lock_nested+0x74/0x33c) from [<c01f7714>] (jffs2_garbage_collect_pass+0x314/0x890)
   [<c01f7714>] (jffs2_garbage_collect_pass+0x314/0x890) from [<c01f937c>] (jffs2_garbage_collect_thread+0x1b4/0x1cc)
   [<c01f937c>] (jffs2_garbage_collect_thread+0x1b4/0x1cc) from [<c0071a68>] (kthread+0x98/0xa0)
   [<c0071a68>] (kthread+0x98/0xa0) from [<c000f264>] (kernel_thread_exit+0x0/0x8)

Signed-off-by: Josh Cartwright <joshc@linux.com>
---
 fs/jffs2/gc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/jffs2/gc.c b/fs/jffs2/gc.c
index ad271c7..5a2dec2 100644
--- a/fs/jffs2/gc.c
+++ b/fs/jffs2/gc.c
@@ -234,8 +234,8 @@ int jffs2_garbage_collect_pass(struct jffs2_sb_info *c)
 			return 0;
 
 		jffs2_dbg(1, "No progress from erasing block; doing GC anyway\n");
-		spin_lock(&c->erase_completion_lock);
 		mutex_lock(&c->alloc_sem);
+		spin_lock(&c->erase_completion_lock);
 	}
 
 	/* First, work out which block we're garbage-collecting */
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH V2] jffs2: Fix lock acquisition order bug in gc path
  2012-03-29 23:34   ` [PATCH V2] " Josh Cartwright
@ 2012-04-13 15:14     ` Artem Bityutskiy
  0 siblings, 0 replies; 5+ messages in thread
From: Artem Bityutskiy @ 2012-04-13 15:14 UTC (permalink / raw)
  To: Josh Cartwright, Joakim Tjernlund
  Cc: Brian Norris, David Woodhouse, linux-mtd, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 645 bytes --]

On Thu, 2012-03-29 at 19:34 -0400, Josh Cartwright wrote:
> The locking policy is such that the erase_complete_block spinlock is
> nested within the alloc_sem mutex.  This fixes a case in which the
> acquisition order was erroneously reversed.  This issue was caught by
> the following lockdep splat:

I've added the follwing:

This was introduce in '81cfc9f jffs2: Fix serious write stall due to
erase'.
    
Cc: stable@kernel.org [2.6.37+]

and pushed to l2-mtd.git, thanks! CC-in Joakim to let him know that his
patch has a bug and that he might want this fix in his system(s) as
well.

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-04-13 15:11 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-09 18:50 [PATCH] jffs2: Fix lock acquisition order bug in gc path Josh Cartwright
2012-03-29 15:38 ` Brian Norris
2012-03-29 23:29   ` Josh Cartwright
2012-03-29 23:34   ` [PATCH V2] " Josh Cartwright
2012-04-13 15:14     ` Artem Bityutskiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).