linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] probably bug in current ext3/jbd
@ 2003-05-18 17:21 Alex Tomas
  2003-05-19 20:34 ` [Ext2-devel] " Stephen C. Tweedie
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Tomas @ 2003-05-18 17:21 UTC (permalink / raw)
  To: linux-kernel; +Cc: ext2-devel, Alex Tomas


hi!

ext3/jbd use b_committed_data buffer in order to prevent
allocation of blocks which were freed in non-committed
transaction. I think there is bug in this code. look,

some thread                               commit thread
----------------------------------------------------------
get_undo_access(#1)
dirty_buffer(#1)
stop_journal()

                                           start commit


start_journal()
get_undo_access(#1):
   1) wait for #1 to be
      in t_forget_list

                                           write #1 to log
                                           put #1 onto t_forget_list


   2) b_commit_data exists,
      finish get_undo_access()


                                           for_each_bh_in_forget_list() {
                                              if (jh->b_committed_data) {
                                                  kfree(jh->b_committed_data);
                                                  jh->b_committed_data = NULL;
                                              }
                                           }

                                           
/* using of b_committed_data */

b_committed_data is NULL ?



with best regards, Alex


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Ext2-devel] [RFC] probably bug in current ext3/jbd
  2003-05-18 17:21 [RFC] probably bug in current ext3/jbd Alex Tomas
@ 2003-05-19 20:34 ` Stephen C. Tweedie
  2003-05-20  0:46   ` Alex Tomas
  0 siblings, 1 reply; 15+ messages in thread
From: Stephen C. Tweedie @ 2003-05-19 20:34 UTC (permalink / raw)
  To: Alex Tomas; +Cc: linux-kernel, ext2-devel, Stephen Tweedie

Hi,

On Sun, 2003-05-18 at 18:21, Alex Tomas wrote:

> ext3/jbd use b_committed_data buffer in order to prevent
> allocation of blocks which were freed in non-committed
> transaction. I think there is bug in this code. look,

You just found a rather subtle bug which was fixed around 2 or 3 years
ago. :-)

> some thread                               commit thread
> ----------------------------------------------------------

> start_journal()
> get_undo_access(#1):
>    1) wait for #1 to be
>       in t_forget_list

get_undo_access is a declaration of intention to modify the buffer. 
When that happens, it calls do_get_write_access() with the force_copy
flag set.  That means that it _always_ creates a new frozen_data copy of
the buffer the first time we get undo access to a bitmap buffer within
any given transaction.  That basically means that for bitmaps,
frozen_data always holds the version of the buffer as of the end of the
previously completed transaction.

>                                            for_each_bh_in_forget_list() {
>                                               if (jh->b_committed_data) {
>                                                   kfree(jh->b_committed_data);
>                                                   jh->b_committed_data = NULL;
>                                               }

Ah, but the *immediately* following lines are:

			if (jh->b_frozen_data) {
				jh->b_committed_data = jh->b_frozen_data;
				jh->b_frozen_data = NULL;
			}

so the frozen data that was preserved at get_undo_access() time has now
committed to disk and gets rotated into the b_committed_data version. 
This is exactly how we get the new version of the committed data when
the old transaction commits.

Cheers,
 Stephen


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Ext2-devel] [RFC] probably bug in current ext3/jbd
  2003-05-20  0:46   ` Alex Tomas
@ 2003-05-19 20:51     ` Stephen C. Tweedie
  2003-05-20  0:58       ` Alex Tomas
  2003-05-20 16:06       ` Alex Tomas
  0 siblings, 2 replies; 15+ messages in thread
From: Stephen C. Tweedie @ 2003-05-19 20:51 UTC (permalink / raw)
  To: Alex Tomas; +Cc: linux-kernel, ext2-devel, Stephen Tweedie

Hi,

On Tue, 2003-05-20 at 01:46, Alex Tomas wrote:

> please, look:
> 
>   thread A                          commit thread
> 
>                         	    if (jh->b_committed_data) {
>                                 	kfree(jh->b_committed_data);
>                                         jh->b_committed_data = NULL;
>                                     }
> access for
> b_committed_data == NULL ?

Not with BKL.  Without it, yes, that's definitely a risk, and you need
some locking for the access to b_committed_data.  Without that, even if
you keep the jh->b_committed_data field valid, you risk freeing the old
copy that another thread is using.

Cheers,
 Stephen


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Ext2-devel] [RFC] probably bug in current ext3/jbd
  2003-05-19 20:34 ` [Ext2-devel] " Stephen C. Tweedie
@ 2003-05-20  0:46   ` Alex Tomas
  2003-05-19 20:51     ` Stephen C. Tweedie
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Tomas @ 2003-05-20  0:46 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Alex Tomas, linux-kernel, ext2-devel


hi!

please, look:

  thread A                          commit thread

                        	    if (jh->b_committed_data) {
                                	kfree(jh->b_committed_data);
                                        jh->b_committed_data = NULL;
                                    }
access for
b_committed_data == NULL ?

                                         if (jh->b_frozen_data) {
                                            jh->b_committed_data = jh->b_frozen_data;
                                            jh->b_frozen_data = NULL;
                                          }


or I miss something subtle here?


>>>>> Stephen C Tweedie (SCT) writes:

 SCT> get_undo_access is a declaration of intention to modify the buffer. 
 SCT> When that happens, it calls do_get_write_access() with the force_copy
 SCT> flag set.  That means that it _always_ creates a new frozen_data copy of
 SCT> the buffer the first time we get undo access to a bitmap buffer within
 SCT> any given transaction.  That basically means that for bitmaps,
 SCT> frozen_data always holds the version of the buffer as of the end of the
 SCT> previously completed transaction.

 >> for_each_bh_in_forget_list() {
 >> if (jh->b_committed_data) {
 >> kfree(jh->b_committed_data);
 jh-> b_committed_data = NULL;
 >> }

 SCT> Ah, but the *immediately* following lines are:

 SCT> 			if (jh->b_frozen_data) {
 jh-> b_committed_data = jh->b_frozen_data;
 jh-> b_frozen_data = NULL;
 SCT> 			}

 SCT> so the frozen data that was preserved at get_undo_access() time has now
 SCT> committed to disk and gets rotated into the b_committed_data version. 
 SCT> This is exactly how we get the new version of the committed data when
 SCT> the old transaction commits.

 SCT> Cheers,
 SCT>  Stephen



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Ext2-devel] [RFC] probably bug in current ext3/jbd
  2003-05-19 20:51     ` Stephen C. Tweedie
@ 2003-05-20  0:58       ` Alex Tomas
  2003-05-20 16:06       ` Alex Tomas
  1 sibling, 0 replies; 15+ messages in thread
From: Alex Tomas @ 2003-05-20  0:58 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Alex Tomas, linux-kernel, ext2-devel


aha. now it's clear. thank you. I catched J_ASSERT(b_committed_data != NULL)
in ext3_free_blocks() with de-BKL'ed JBD. hence, my solution is to have a
tid journal_head indicating which transaction uses b_committed_data.

I don't want to look intrusive, but .. what do you think about new locking
schema I'm trying to implement?


>>>>> Stephen C Tweedie (SCT) writes:

 >> access for
 >> b_committed_data == NULL ?

 SCT> Not with BKL.  Without it, yes, that's definitely a risk, and you need
 SCT> some locking for the access to b_committed_data.  Without that, even if
 SCT> you keep the jh->b_committed_data field valid, you risk freeing the old
 SCT> copy that another thread is using.

 SCT> Cheers,
 SCT>  Stephen


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Ext2-devel] [RFC] probably bug in current ext3/jbd
  2003-05-19 20:51     ` Stephen C. Tweedie
  2003-05-20  0:58       ` Alex Tomas
@ 2003-05-20 16:06       ` Alex Tomas
  2003-05-21 16:38         ` Andrew Morton
  1 sibling, 1 reply; 15+ messages in thread
From: Alex Tomas @ 2003-05-20 16:06 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Andrew Morton, Alex Tomas, linux-kernel, ext2-devel


hmm.
looks Andrew Morton should return BKL in ext3_get_block_handle() in -mm tree?
this BKL protects ext3_alloc_branch() -> ext3_alloc_block() -> ext3_new_block()
call chain. or we may implement new protection schema where each jh has some
reference alike 'used by transaction N'


Andrew?

>>>>> Stephen C Tweedie (SCT) writes:

 SCT> Not with BKL.  Without it, yes, that's definitely a risk, and you need
 SCT> some locking for the access to b_committed_data.  Without that, even if
 SCT> you keep the jh->b_committed_data field valid, you risk freeing the old
 SCT> copy that another thread is using.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Ext2-devel] [RFC] probably bug in current ext3/jbd
  2003-05-20 16:06       ` Alex Tomas
@ 2003-05-21 16:38         ` Andrew Morton
  2003-05-21 20:45           ` Alex Tomas
  2003-05-23 11:20           ` [RFC] probably invalid accounting in jbd Alex Tomas
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2003-05-21 16:38 UTC (permalink / raw)
  To: Alex Tomas; +Cc: sct, bzzz, linux-kernel, ext2-devel

Alex Tomas <bzzz@tmi.comex.ru> wrote:
>
> looks Andrew Morton should return BKL in ext3_get_block_handle() in -mm tree?
>  this BKL protects ext3_alloc_branch() -> ext3_alloc_block() -> ext3_new_block()
>  call chain. or we may implement new protection schema where each jh has some
>  reference alike 'used by transaction N'

Can this be solved by spinlocking the relevant buffer_head, in a similar
way to your recent changes to journal_add_journal_head()?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Ext2-devel] [RFC] probably bug in current ext3/jbd
  2003-05-21 20:45           ` Alex Tomas
@ 2003-05-21 16:59             ` Andrew Morton
       [not found]               ` <m3brxwe2lr.fsf@lexa.home.net>
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2003-05-21 16:59 UTC (permalink / raw)
  To: Alex Tomas; +Cc: bzzz, sct, linux-kernel, ext2-devel

Alex Tomas <bzzz@tmi.comex.ru> wrote:
>
> yes, we may protect it by such lock, but this lock have to be held all time
>  ext3_new_block() uses some b_committed_data because last one may be freed
>  during current ext3_new_block(). I don't think it's good.

Take a closer look - I don't think it'll be too messy, and certainly the
hold times will not be a problem.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Ext2-devel] [RFC] probably bug in current ext3/jbd
  2003-05-21 16:38         ` Andrew Morton
@ 2003-05-21 20:45           ` Alex Tomas
  2003-05-21 16:59             ` Andrew Morton
  2003-05-23 11:20           ` [RFC] probably invalid accounting in jbd Alex Tomas
  1 sibling, 1 reply; 15+ messages in thread
From: Alex Tomas @ 2003-05-21 20:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alex Tomas, sct, linux-kernel, ext2-devel

>>>>> Andrew Morton (AM) writes:

 AM> Alex Tomas <bzzz@tmi.comex.ru> wrote:
 >> 
 >> looks Andrew Morton should return BKL in ext3_get_block_handle() in -mm tree?
 >> this BKL protects ext3_alloc_branch() -> ext3_alloc_block() -> ext3_new_block()
 >> call chain. or we may implement new protection schema where each jh has some
 >> reference alike 'used by transaction N'

 AM> Can this be solved by spinlocking the relevant buffer_head, in a similar
 AM> way to your recent changes to journal_add_journal_head()?

yes, we may protect it by such lock, but this lock have to be held all time
ext3_new_block() uses some b_committed_data because last one may be freed
during current ext3_new_block(). I don't think it's good.

probably, another solution is do not free b_committed_data if it's referenced
by new transaction and make copy from b_frozen instead of freeing it.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] probably invalid accounting in jbd
  2003-05-23 11:20           ` [RFC] probably invalid accounting in jbd Alex Tomas
@ 2003-05-23  8:26             ` Andrew Morton
  2003-05-23 16:02               ` [Ext2-devel] " Andreas Dilger
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2003-05-23  8:26 UTC (permalink / raw)
  To: Alex Tomas; +Cc: linux-kernel, ext2-devel, bzzz

Alex Tomas <bzzz@tmi.comex.ru> wrote:
>
> I think current journal_release_buffer() which is used by ext3 allocator
>  in -mm tree has a bug. look, it won't decrement credits if concurrent
>  thread already put buffer on metadata list. but this may ext3_try_to_allocate()
>  may overflow handle's credist.

Yes, that is so.

But some other handle has gained itself a free buffer.  That is actually
harmless, except for one thing: as the handles are closed down,
->t_outstanding_credits will be too small.  We could conceivably end up
overflowing the journal.


umm, one possible solution to that is to rework the t_outstanding_credits
logic so that we instead record:

	number of buffers attached to the transaction +
		sum of the initial size of all currently-running handles.

as each handle is closed off, we subtract its initial size from the above
metric.  Any buffers which that handle happened to add to the lists would
have already been accounted for, when they were added.

I _think_ that'll work...


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Ext2-devel] [RFC] probably bug in current ext3/jbd
  2003-05-23 11:08                           ` Alex Tomas
@ 2003-05-23  8:49                             ` Andrew Morton
  2003-05-23 12:49                               ` Alex Tomas
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2003-05-23  8:49 UTC (permalink / raw)
  To: Alex Tomas; +Cc: linux-kernel

Alex Tomas <bzzz@tmi.comex.ru> wrote:
>
> here is small patch that intented to fix race with b_committed_data.

Thanks, it looks good.

The balloc.c code is getting awfully convoluted and hard to follow,
but no obvious restructuring strategies are leaping out at me.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Ext2-devel] [RFC] probably bug in current ext3/jbd
       [not found]                         ` <20030521143140.3aaa86ba.akpm@digeo.com>
@ 2003-05-23 11:08                           ` Alex Tomas
  2003-05-23  8:49                             ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Tomas @ 2003-05-23 11:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Alex Tomas


hi!

here is small patch that intented to fix race with b_committed_data.
also, as Andrew asked I introduced new bit-based spinlock and
converted pte_chain_lock()/pte_chain_unlock() to use that one.
patch tested by dbench 1/2/4/6/8/16/24.

with best regards, Alex


diff -puNr linux-2.5.69-mm6/include/linux/rmap-locking.h b_commited_data-race/include/linux/rmap-locking.h
--- linux-2.5.69-mm6/include/linux/rmap-locking.h	Fri May 16 17:59:38 2003
+++ b_commited_data-race/include/linux/rmap-locking.h	Fri May 23 10:23:44 2003
@@ -10,32 +10,8 @@
 struct pte_chain;
 extern kmem_cache_t *pte_chain_cache;
 
-static inline void pte_chain_lock(struct page *page)
-{
-	/*
-	 * Assuming the lock is uncontended, this never enters
-	 * the body of the outer loop. If it is contended, then
-	 * within the inner loop a non-atomic test is used to
-	 * busywait with less bus contention for a good time to
-	 * attempt to acquire the lock bit.
-	 */
-	preempt_disable();
-#ifdef CONFIG_SMP
-	while (test_and_set_bit(PG_chainlock, &page->flags)) {
-		while (test_bit(PG_chainlock, &page->flags))
-			cpu_relax();
-	}
-#endif
-}
-
-static inline void pte_chain_unlock(struct page *page)
-{
-#ifdef CONFIG_SMP
-	smp_mb__before_clear_bit();
-	clear_bit(PG_chainlock, &page->flags);
-#endif
-	preempt_enable();
-}
+#define pte_chain_lock(page)	bb_spin_lock(PG_chainlock, &page->flags)
+#define pte_chain_unlock(page)	bb_spin_unlock(PG_chainlock, &page->flags)
 
 struct pte_chain *pte_chain_alloc(int gfp_flags);
 void __pte_chain_free(struct pte_chain *pte_chain);
diff -puNr linux-2.5.69-mm6/include/linux/jbd.h b_commited_data-race/include/linux/jbd.h
--- linux-2.5.69-mm6/include/linux/jbd.h	Fri May 16 17:59:41 2003
+++ b_commited_data-race/include/linux/jbd.h	Fri May 23 10:28:17 2003
@@ -292,6 +292,7 @@ enum jbd_state_bits {
 	BH_Revoked,		/* Has been revoked from the log */
 	BH_RevokeValid,		/* Revoked flag is valid */
 	BH_JBDDirty,		/* Is dirty but journaled */
+	BH_JLock,		/* Buffer is locked for short time */
 };
 
 BUFFER_FNS(JBD, jbd)
@@ -308,6 +309,9 @@ static inline struct journal_head *bh2jh
 {
 	return bh->b_private;
 }
+
+#define jbd_lock_bh(bh)		bb_spin_lock(BH_JLock, &bh->b_state)
+#define jbd_unlock_bh(bh)	bb_spin_unlock(BH_JLock, &bh->b_state)
 
 #define HAVE_JOURNAL_CALLBACK_STATUS
 /**
diff -puNr linux-2.5.69-mm6/fs/jbd/commit.c b_commited_data-race/fs/jbd/commit.c
--- linux-2.5.69-mm6/fs/jbd/commit.c	Fri May 16 18:03:20 2003
+++ b_commited_data-race/fs/jbd/commit.c	Fri May 23 10:29:20 2003
@@ -686,12 +686,14 @@ skip_commit: /* The journal should be un
 		 * Otherwise, we can just throw away the frozen data now.
 		 */
 		if (jh->b_committed_data) {
+			jbd_lock_bh(jh2bh(jh));
 			kfree(jh->b_committed_data);
 			jh->b_committed_data = NULL;
 			if (jh->b_frozen_data) {
 				jh->b_committed_data = jh->b_frozen_data;
 				jh->b_frozen_data = NULL;
 			}
+			jbd_unlock_bh(jh2bh(jh));
 		} else if (jh->b_frozen_data) {
 			kfree(jh->b_frozen_data);
 			jh->b_frozen_data = NULL;
diff -puNr linux-2.5.69-mm6/fs/ext3/balloc.c b_commited_data-race/fs/ext3/balloc.c
--- linux-2.5.69-mm6/fs/ext3/balloc.c	Sat May 17 17:52:42 2003
+++ b_commited_data-race/fs/ext3/balloc.c	Fri May 23 10:55:55 2003
@@ -185,11 +185,14 @@ do_more:
 	if (err)
 		goto error_return;
 
+	jbd_lock_bh(bitmap_bh);
+	
 	for (i = 0; i < count; i++) {
 		/*
 		 * An HJ special.  This is expensive...
 		 */
 #ifdef CONFIG_JBD_DEBUG
+		jbd_unlock_bh(bitmap_bh);
 		{
 			struct buffer_head *debug_bh;
 			debug_bh = sb_find_get_block(sb, block + i);
@@ -202,6 +205,7 @@ do_more:
 				__brelse(debug_bh);
 			}
 		}
+		jbd_lock_bh(bitmap_bh);
 #endif
 		/* @@@ This prevents newly-allocated data from being
 		 * freed and then reallocated within the same
@@ -243,6 +247,7 @@ do_more:
 			dquot_freed_blocks++;
 		}
 	}
+	jbd_unlock_bh(bitmap_bh);
 
 	spin_lock(sb_bgl_lock(sbi, block_group));
 	gdp->bg_free_blocks_count =
@@ -289,11 +294,12 @@ error_return:
  * data-writes at some point, and disable it for metadata allocations or
  * sync-data inodes.
  */
-static int ext3_test_allocatable(int nr, struct buffer_head *bh)
+static inline int ext3_test_allocatable(int nr, struct buffer_head *bh,
+					int have_access)
 {
 	if (ext3_test_bit(nr, bh->b_data))
 		return 0;
-	if (!buffer_jbd(bh) || !bh2jh(bh)->b_committed_data)
+	if (!have_access || !buffer_jbd(bh) || !bh2jh(bh)->b_committed_data)
 		return 1;
 	return !ext3_test_bit(nr, bh2jh(bh)->b_committed_data);
 }
@@ -305,8 +311,8 @@ static int ext3_test_allocatable(int nr,
  * the initial goal; then for a free byte somewhere in the bitmap; then
  * for any free bit in the bitmap.
  */
-static int find_next_usable_block(int start,
-			struct buffer_head *bh, int maxblocks)
+static int find_next_usable_block(int start, struct buffer_head *bh,
+				int maxblocks, int have_access)
 {
 	int here, next;
 	char *p, *r;
@@ -322,7 +328,8 @@ static int find_next_usable_block(int st
 		 */
 		int end_goal = (start + 63) & ~63;
 		here = ext3_find_next_zero_bit(bh->b_data, end_goal, start);
-		if (here < end_goal && ext3_test_allocatable(here, bh))
+		if (here < end_goal &&
+			ext3_test_allocatable(here, bh, have_access))
 			return here;
 		
 		ext3_debug ("Bit not found near goal\n");
@@ -345,7 +352,7 @@ static int find_next_usable_block(int st
 	r = memscan(p, 0, (maxblocks - here + 7) >> 3);
 	next = (r - ((char *) bh->b_data)) << 3;
 	
-	if (next < maxblocks && ext3_test_allocatable(next, bh))
+	if (next < maxblocks && ext3_test_allocatable(next, bh, have_access))
 		return next;
 	
 	/* The bitmap search --- search forward alternately
@@ -357,13 +364,13 @@ static int find_next_usable_block(int st
 						 maxblocks, here);
 		if (next >= maxblocks)
 			return -1;
-		if (ext3_test_allocatable(next, bh))
+		if (ext3_test_allocatable(next, bh, have_access))
 			return next;
 
-		J_ASSERT_BH(bh, bh2jh(bh)->b_committed_data);
-		here = ext3_find_next_zero_bit
-			((unsigned long *) bh2jh(bh)->b_committed_data, 
-			 maxblocks, next);
+		if (have_access)
+			here = ext3_find_next_zero_bit
+				((unsigned long *) bh2jh(bh)->b_committed_data, 
+			 	maxblocks, next);
 	}
 	return -1;
 }
@@ -402,17 +409,18 @@ ext3_try_to_allocate(struct super_block 
 
 	*errp = 0;
 
-	if (goal >= 0 && ext3_test_allocatable(goal, bitmap_bh))
+	if (goal >= 0 && ext3_test_allocatable(goal, bitmap_bh, 0))
 		goto got;
 
 repeat:
 	goal = find_next_usable_block(goal, bitmap_bh,
-				EXT3_BLOCKS_PER_GROUP(sb));
+				EXT3_BLOCKS_PER_GROUP(sb), have_access);
 	if (goal < 0)
 		goto fail;
 
 	for (i = 0;
-		i < 7 && goal > 0 && ext3_test_allocatable(goal - 1, bitmap_bh);
+		i < 7 && goal > 0 && 
+			ext3_test_allocatable(goal - 1, bitmap_bh, have_access);
 		i++, goal--);
 
 got:
@@ -429,6 +437,7 @@ got:
 			*errp = fatal;
 			goto fail;
 		}
+		jbd_lock_bh(bitmap_bh);
 		have_access = 1;
 	}
 
@@ -444,6 +453,7 @@ got:
 	}
 
 	BUFFER_TRACE(bitmap_bh, "journal_dirty_metadata for bitmap block");
+	jbd_unlock_bh(bitmap_bh);
 	fatal = ext3_journal_dirty_metadata(handle, bitmap_bh);
 	if (fatal) {
 		*errp = fatal;
@@ -454,6 +464,7 @@ got:
 fail:
 	if (have_access) {
 		BUFFER_TRACE(bitmap_bh, "journal_release_buffer");
+		jbd_unlock_bh(bitmap_bh);
 		ext3_journal_release_buffer(handle, bitmap_bh);
 	}
 	return -1;
diff -puNr linux-2.5.69-mm6/include/linux/spinlock.h b_commited_data-race/include/linux/spinlock.h
--- linux-2.5.69-mm6/include/linux/spinlock.h	Fri May 16 18:03:21 2003
+++ b_commited_data-race/include/linux/spinlock.h	Fri May 23 10:45:58 2003
@@ -540,4 +540,37 @@ do { \
 extern int atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);
 #endif
 
+/*
+ *  bit-based spin_lock()
+ */
+static inline void bb_spin_lock(int bitnum, unsigned long *addr)
+{
+	/*
+	 * Assuming the lock is uncontended, this never enters
+	 * the body of the outer loop. If it is contended, then
+	 * within the inner loop a non-atomic test is used to
+	 * busywait with less bus contention for a good time to
+	 * attempt to acquire the lock bit.
+	 */
+	preempt_disable();
+#ifdef CONFIG_SMP
+	while (test_and_set_bit(bitnum, addr)) {
+		while (test_bit(bitnum, addr))
+			cpu_relax();
+	}
+#endif
+}
+
+/*
+ *  bit-based spin_unlock()
+ */
+static inline void bb_spin_unlock(int bitnum, unsigned long *addr)
+{
+#ifdef CONFIG_SMP
+	smp_mb__before_clear_bit();
+	clear_bit(bitnum, addr);
+#endif
+	preempt_enable();
+}
+
 #endif /* __LINUX_SPINLOCK_H */


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC] probably invalid accounting in jbd
  2003-05-21 16:38         ` Andrew Morton
  2003-05-21 20:45           ` Alex Tomas
@ 2003-05-23 11:20           ` Alex Tomas
  2003-05-23  8:26             ` Andrew Morton
  1 sibling, 1 reply; 15+ messages in thread
From: Alex Tomas @ 2003-05-23 11:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, ext2-devel, Alex Tomas


hi!

I think current journal_release_buffer() which is used by ext3 allocator
in -mm tree has a bug. look, it won't decrement credits if concurrent
thread already put buffer on metadata list. but this may ext3_try_to_allocate()
may overflow handle's credist.
so, jbd will hit J_ASSERT_JH(jh, handle->h_buffer_credits > 0) somewhere


void journal_release_buffer (handle_t *handle, struct buffer_head *bh)
{
        .....
        if (jh->b_jlist == BJ_Reserved && jh->b_transaction == transaction &&
            !buffer_jdirty(jh2bh(jh))) {
                JBUFFER_TRACE(jh, "unused: refiling it");
                handle->h_buffer_credits++;
                __journal_refile_buffer(jh);
        }
        ....
}


here is the patch against 2.5.69-mm6:

diff -puNr linux-2.5.69-mm6/fs/jbd/transaction.c edited/fs/jbd/transaction.c
--- linux-2.5.69-mm6/fs/jbd/transaction.c	Fri May 16 18:03:20 2003
+++ b_commited_data-race/fs/jbd/transaction.c	Fri May 23 11:10:21 2003
@@ -1146,10 +1146,10 @@ void journal_release_buffer (handle_t *h
 	if (jh->b_jlist == BJ_Reserved && jh->b_transaction == transaction &&
 	    !buffer_jdirty(jh2bh(jh))) {
 		JBUFFER_TRACE(jh, "unused: refiling it");
-		handle->h_buffer_credits++;
 		__journal_refile_buffer(jh);
 	}
 	spin_unlock(&journal_datalist_lock);
+	handle->h_buffer_credits++;
 
 	JBUFFER_TRACE(jh, "exit");
 	unlock_journal(journal);


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Ext2-devel] [RFC] probably bug in current ext3/jbd
  2003-05-23  8:49                             ` Andrew Morton
@ 2003-05-23 12:49                               ` Alex Tomas
  0 siblings, 0 replies; 15+ messages in thread
From: Alex Tomas @ 2003-05-23 12:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alex Tomas, linux-kernel

>>>>> Andrew Morton (AM) writes:

 AM> The balloc.c code is getting awfully convoluted and hard to follow,
 AM> but no obvious restructuring strategies are leaping out at me.

agree. I'll try to figure out something more readable sometime



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Ext2-devel] Re: [RFC] probably invalid accounting in jbd
  2003-05-23  8:26             ` Andrew Morton
@ 2003-05-23 16:02               ` Andreas Dilger
  0 siblings, 0 replies; 15+ messages in thread
From: Andreas Dilger @ 2003-05-23 16:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alex Tomas, linux-kernel, ext2-devel

On May 23, 2003  01:26 -0700, Andrew Morton wrote:
> umm, one possible solution to that is to rework the t_outstanding_credits
> logic so that we instead record:
> 
> 	number of buffers attached to the transaction +
> 		sum of the initial size of all currently-running handles.
> 
> as each handle is closed off, we subtract its initial size from the above
> metric.  Any buffers which that handle happened to add to the lists would
> have already been accounted for, when they were added.

One other benefit of doing it that way is that having the original handle
size kept in the handle means that you can properly flag errors when a
nested transaction is "started" on an existing handle and the new start
wants more credits than were actually in the handle.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2003-05-23 15:50 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-05-18 17:21 [RFC] probably bug in current ext3/jbd Alex Tomas
2003-05-19 20:34 ` [Ext2-devel] " Stephen C. Tweedie
2003-05-20  0:46   ` Alex Tomas
2003-05-19 20:51     ` Stephen C. Tweedie
2003-05-20  0:58       ` Alex Tomas
2003-05-20 16:06       ` Alex Tomas
2003-05-21 16:38         ` Andrew Morton
2003-05-21 20:45           ` Alex Tomas
2003-05-21 16:59             ` Andrew Morton
     [not found]               ` <m3brxwe2lr.fsf@lexa.home.net>
     [not found]                 ` <20030521103737.52eddeb3.akpm@digeo.com>
     [not found]                   ` <87n0hgc6s6.fsf@gw.home.net>
     [not found]                     ` <20030521105011.2d316baf.akpm@digeo.com>
     [not found]                       ` <87k7ckc5z2.fsf@gw.home.net>
     [not found]                         ` <20030521143140.3aaa86ba.akpm@digeo.com>
2003-05-23 11:08                           ` Alex Tomas
2003-05-23  8:49                             ` Andrew Morton
2003-05-23 12:49                               ` Alex Tomas
2003-05-23 11:20           ` [RFC] probably invalid accounting in jbd Alex Tomas
2003-05-23  8:26             ` Andrew Morton
2003-05-23 16:02               ` [Ext2-devel] " Andreas Dilger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).