linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] JBD: journal_release_buffer()
@ 2005-01-19 15:32 Alex Tomas
  2005-01-24 22:05 ` [Ext2-devel] " Stephen C. Tweedie
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Tomas @ 2005-01-19 15:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: ext2-devel, akpm, alex


Good day,

journal_release_buffer() can cause journal overflow in some
(very rare) conditions. Please, take a look at possible fix.

Th journal_head structure holds number of users of the buffer
for current transaction. The routines do_get_write_access()
and journal_get_create_access() tracks this number:
1) resets it to zero if the block's becoming part of the current
   transaction
2) increments it

journal_release_buffer() decrements it and if it's 0, then the
blocks isn't member of the transaction.

The patch has been tested on UP with dbench and tool that
uses xattr very much. 


Signed-off-by: Alex Tomas <alex@clusterfs.com>

Index: linux-2.6.7/include/linux/journal-head.h
===================================================================
--- linux-2.6.7.orig/include/linux/journal-head.h	2003-06-24 18:05:26.000000000 +0400
+++ linux-2.6.7/include/linux/journal-head.h	2005-01-19 14:09:59.000000000 +0300
@@ -80,6 +80,11 @@
 	 * [j_list_lock]
 	 */
 	struct journal_head *b_cpnext, *b_cpprev;
+
+	/*
+	 * counter to track users of the buffer in current transaction
+	 */
+	int b_tcount;
 };
 
 #endif		/* JOURNAL_HEAD_H_INCLUDED */
Index: linux-2.6.7/fs/jbd/transaction.c
===================================================================
--- linux-2.6.7.orig/fs/jbd/transaction.c	2004-08-26 17:12:40.000000000 +0400
+++ linux-2.6.7/fs/jbd/transaction.c	2005-01-19 17:23:30.058160408 +0300
@@ -611,6 +611,10 @@
 		handle->h_buffer_credits--;
 		if (credits)
 			(*credits)++;
+
+		/* the block's becoming member of the trasaction -bzzz */
+		jh->b_tcount = 0;
+
 		goto done;
 	}
 
@@ -694,6 +698,9 @@
 	if (credits)
 		(*credits)++;
 
+	/* the block's becoming member of the trasaction -bzzz */
+	jh->b_tcount = 0;
+
 	/*
 	 * Finally, if the buffer is not journaled right now, we need to make
 	 * sure it doesn't get written to disk before the caller actually
@@ -723,6 +730,11 @@
 		memcpy(jh->b_frozen_data, source+offset, jh2bh(jh)->b_size);
 		kunmap_atomic(source, KM_USER0);
 	}
+
+	/* track all references to the block to be able to recognize the
+	 * situation when the buffer is not part of transaction -bzzz */
+	jh->b_tcount++;
+
 	jbd_unlock_bh_state(bh);
 
 	/*
@@ -822,11 +834,20 @@
 		jh->b_transaction = transaction;
 		JBUFFER_TRACE(jh, "file as BJ_Reserved");
 		__journal_file_buffer(jh, transaction, BJ_Reserved);
+		jh->b_tcount = 0;
 	} else if (jh->b_transaction == journal->j_committing_transaction) {
 		JBUFFER_TRACE(jh, "set next transaction");
 		jh->b_next_transaction = transaction;
+		jh->b_tcount = 0;
 	}
 	spin_unlock(&journal->j_list_lock);
+
+	/*
+	 * track all reference to the block to be able to recognize
+	 * the situation when the buffer is not part of transaction -bzzz
+	 */
+	jh->b_tcount++;
+
 	jbd_unlock_bh_state(bh);
 
 	/*
@@ -1178,8 +1199,40 @@
 void
 journal_release_buffer(handle_t *handle, struct buffer_head *bh, int credits)
 {
+	journal_t *journal = handle->h_transaction->t_journal;
+	struct journal_head *jh = bh2jh(bh);
+
 	BUFFER_TRACE(bh, "entry");
-	handle->h_buffer_credits += credits;
+
+	/* return credit back to the handle if it was really spent */
+	if (credits)
+		handle->h_buffer_credits++; 
+
+	jbd_lock_bh_state(bh);
+	J_ASSERT(jh->b_tcount > 0);
+
+	jh->b_tcount--;
+	if (jh->b_tcount == 0) {
+		/* we can drop it from the transaction -bzzz */
+		J_ASSERT(jh->b_transaction == handle->h_transaction ||
+				jh->b_next_transaction == handle->h_transaction);
+		if (jh->b_transaction == handle->h_transaction) {
+			spin_lock(&journal->j_list_lock);
+			__journal_unfile_buffer(jh);
+			spin_unlock(&journal->j_list_lock);
+		} else if(jh->b_next_transaction) {
+			jh->b_next_transaction = NULL;
+		}
+
+		/* 
+		 * this was last reference to the block from the current
+		 * transaction and we'd like to return credit to the
+		 * whole transaction -bzzz
+		 */
+		if (!credits)
+			handle->h_buffer_credits++; 
+	}
+	jbd_unlock_bh_state(bh);
 }
 
 /** 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()
  2005-01-19 15:32 [PATCH] JBD: journal_release_buffer() Alex Tomas
@ 2005-01-24 22:05 ` Stephen C. Tweedie
  2005-01-24 22:24   ` Alex Tomas
  0 siblings, 1 reply; 11+ messages in thread
From: Stephen C. Tweedie @ 2005-01-24 22:05 UTC (permalink / raw)
  To: Alex Tomas; +Cc: Stephen Tweedie, linux-kernel, ext2-devel, Andrew Morton

Hi,

On Wed, 2005-01-19 at 15:32, Alex Tomas wrote:

> @@ -1178,8 +1199,40 @@
>  void
>  journal_release_buffer(handle_t *handle, struct buffer_head *bh, int credits)
>  {
> +	/* return credit back to the handle if it was really spent */
> +	if (credits)
> +		handle->h_buffer_credits++; 

> +	jh->b_tcount--;
> +	if (jh->b_tcount == 0) {
> +		/* 
> +		 * this was last reference to the block from the current
> +		 * transaction and we'd like to return credit to the
> +		 * whole transaction -bzzz
> +		 */
> +		if (!credits)
> +			handle->h_buffer_credits++; 

I think there's a problem here.

What if:
  Process A gets write access, and is the first to do so (*credits=1)
  Processes B gets write access (*credits=0)
  B modifies the buffer and finishes
  A looks again, sees B's modifications, and releases the buffer because
it's no use any more.

Now, B's release didn't return credits.  The bh is part of the
transaction and was not released.

What does A do on journal_release_buffer()?  It can either:

1) return its credit.  Not legal: the bh is part of the transaction, A
had the only credit for it, it cannot be returned or else we risk
overflowing the journal.

2) Keep its credit.  Legal from jbd's perspective, but breaks ext3
callers: the whole point of journal_release_buffer() is to try to get a
credit back.  The caller has only a finite number of credits, and must
complete the handle within that limit.  If we don't return the credit,
then we've just lost a credit to the handle, and risk the handle running
out of credit before completing the operation (allocation or xattr
update.)

I looked into this in some depth the first time it came up in the
context of AG's xattr fixes.  The only solution I could see required us
to take a buffer credit during get_write_access in *ALL* cases where the
buffer is not already dirtied against the current transaction (ie. has
undergone journal_dirty_metadata() in addition to get_write_access()). 
(If it's dirty, then we don't have a problem, as the bh is guaranteed to
be used anyway.)  Only then can we safely do a release.

I coded up a patch for that, but came up against another difficulty. 
Trouble is, we don't always know if a bh is dirty against the
transaction.  If a bh is committing against one transaction, then a
get_write_access from a second transaction just sets b_next_transaction;
but the logic doesn't distinguish between that and a
journal_dirty_metadata() as far as the list operations are concerned, so
if we release the buffer in this state, we can't tell if it's dirty or
not.  

Your patch helps there: by checking b_tcount we will be able to tell
whether a buffer with b_next_transaction == j_running_transaction really
needs the credit or not.

Cheers,
 Stephen



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()
  2005-01-24 22:05 ` [Ext2-devel] " Stephen C. Tweedie
@ 2005-01-24 22:24   ` Alex Tomas
  2005-01-24 23:35     ` Stephen C. Tweedie
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Tomas @ 2005-01-24 22:24 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Alex Tomas, linux-kernel, ext2-devel, Andrew Morton

>>>>> Stephen C Tweedie (SCT) writes:

 >> +	/* return credit back to the handle if it was really spent */
 >> +	if (credits)
 >> +		handle->h_buffer_credits++; 

 >> +	jh->b_tcount--;
 >> +	if (jh->b_tcount == 0) {
 >> +		/* 
 >> +		 * this was last reference to the block from the current
 >> +		 * transaction and we'd like to return credit to the
 >> +		 * whole transaction -bzzz
 >> +		 */
 >> +		if (!credits)
 >> +			handle->h_buffer_credits++; 

 SCT> I think there's a problem here.

 SCT> What if:
 SCT>   Process A gets write access, and is the first to do so (*credits=1)
 SCT>   Processes B gets write access (*credits=0)
 SCT>   B modifies the buffer and finishes
 SCT>   A looks again, sees B's modifications, and releases the buffer because
 SCT> it's no use any more.

 SCT> Now, B's release didn't return credits.  The bh is part of the
 SCT> transaction and was not released.

hmmm. that's a good catch. so, with this patch A increments h_buffer_credits
and this one will go to the t_outstanding_credits while the buffer is still
part of the transaction. indeed, an imbalance.

probably something like the following would be enough?

 +	/* return credit back to the handle if it was really spent */
 +	if (credits) {
 +		handle->h_buffer_credits++; 
 +              spin_lock(&handle->h_transaction->t_handle_lock);
 +              handle->h_transaction->t_outstanding_credits++;
 +              spin_lock(&handle->h_transaction->t_handle_lock);
 +      }

thanks, Alex



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()
  2005-01-24 22:24   ` Alex Tomas
@ 2005-01-24 23:35     ` Stephen C. Tweedie
  2005-01-25  0:08       ` Alex Tomas
  2005-01-25 14:36       ` Alex Tomas
  0 siblings, 2 replies; 11+ messages in thread
From: Stephen C. Tweedie @ 2005-01-24 23:35 UTC (permalink / raw)
  To: Alex Tomas; +Cc: linux-kernel, ext2-devel, Andrew Morton, Stephen Tweedie

Hi,

On Mon, 2005-01-24 at 22:24, Alex Tomas wrote:
> hmmm. that's a good catch. so, with this patch A increments h_buffer_credits
> and this one will go to the t_outstanding_credits while the buffer is still
> part of the transaction. indeed, an imbalance.
> 
> probably something like the following would be enough?
> 
>  +	/* return credit back to the handle if it was really spent */
>  +	if (credits) {
>  +		handle->h_buffer_credits++; 
>  +              spin_lock(&handle->h_transaction->t_handle_lock);
>  +              handle->h_transaction->t_outstanding_credits++;
>  +              spin_lock(&handle->h_transaction->t_handle_lock);
>  +      }

That returns the credit to A (satisfying ext3), but you just grew
t_outstanding_credits, thus growing the journal commitments without
checking if it's safe to do so or being able to handle failure.  So it
just reintroduces the original bug.

--Stephen


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()
  2005-01-24 23:35     ` Stephen C. Tweedie
@ 2005-01-25  0:08       ` Alex Tomas
  2005-01-25 14:36       ` Alex Tomas
  1 sibling, 0 replies; 11+ messages in thread
From: Alex Tomas @ 2005-01-25  0:08 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Alex Tomas, linux-kernel, ext2-devel, Andrew Morton

>>>>> Stephen C Tweedie (SCT) writes:

 >> +	/* return credit back to the handle if it was really spent */
 >> +	if (credits) {
 >> +		handle->h_buffer_credits++; 
 >> +              spin_lock(&handle->h_transaction->t_handle_lock);
 >> +              handle->h_transaction->t_outstanding_credits++;
 >> +              spin_lock(&handle->h_transaction->t_handle_lock);
 >> +      }

 SCT> That returns the credit to A (satisfying ext3), but you just grew
 SCT> t_outstanding_credits, thus growing the journal commitments without
 SCT> checking if it's safe to do so or being able to handle failure.  So it
 SCT> just reintroduces the original bug.

incremented h_buffer_credits will be subtracted from incremented
t_outstanding_credits in journal_stop(). so, there is no imbalance
at this point. 

then, if (b_tcount == 0) we can correct t_outstanind_credits or
h_buffer_credits. wrong?

let's try another way ... we have two processes: P1 and P2. they
access block B.

the code is:
    if (credits != 0) {
       handle->h_buffer_credits++;
       transaction->t_outstanding_credits++;
    }
    if (jh->b_tcount == 0)
       transcation->t_outstanding_credits--;

case 1:
     P1 accesses B (*credits=1)
     P1 releases B

        (credits != 0) h1->h_buffer_credits++;
        (credits != 0) transaction->t_outstanding_credits++;
        (b_tcount == 0) transaction->t_outstanding_credits--;

    OUTPUT: transaction->t_outstanding_credits -= 1


case 2:
     P1 accesses B (*credits=1)
     P2 accesses B (*credits=0)
     P1 releases B
     P2 modifies B

        (credits != 0) h1->h_buffer_credits++;
        (credits != 0) transaction->t_outstainding_credits++;
        (b_tcount != 0)

     OUTPUT: journal_stop() will subtract incremented h_buffer_credits
             from incremented t_outstading_credits => no changes


case 3:
     P1 accesses B (*credits=1)
     P2 accesses B (*credits=0)
     P2 releases B
     P1 modifies B

        (credits != 0)
        (credits != 0)
        (b_tcount == 0)

     OUTPUT: no changes        


case 4:
     P1 accesses B (*credits=1)
     P2 accesses B (*credits=0)
     P2 releases B
     P1 releases B

        (credits != 0) 
        (credits != 0)
        (b_tcount == 0)

        (credits != 0) h1->h_buffer_credits++;
        (credits != 0) transaction->t_outstanding_credits++;
        (b_tcount == 0) transaction->t_outstanding_credits--;

     OUTPUT: P2 will change nothing, P1 will drop the buffer and
             correct t_outstanding_credits


case 5:
     P1 accesses B (*credits=1)
     P2 accesses B (*credits=0)
     P1 releases B
     P2 releases B

        (credits != 0) h1->h_buffer_credits++;
        (credits != 0) transaction->t_outstanding_credits++;
        (b_tcount == 0)

        (credits != 0)
        (credits != 0)
        (b_tcount == 0) transaction->t_outstanding_credits--;

     OUTPUT: P1 will change own credits, P2 will drop the buffer
             and correct t_outstanding_credits



thanks, Alex


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()
  2005-01-24 23:35     ` Stephen C. Tweedie
  2005-01-25  0:08       ` Alex Tomas
@ 2005-01-25 14:36       ` Alex Tomas
  2005-01-25 16:21         ` Stephen C. Tweedie
  1 sibling, 1 reply; 11+ messages in thread
From: Alex Tomas @ 2005-01-25 14:36 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Alex Tomas, linux-kernel, ext2-devel, Andrew Morton


Hi, could you review the following solution?


 t_outstanding_credits - number of _modified_ blocks in the transaction
 t_reserved - number of blocks all running handle reserved

 transaction size = t_outstanding_credits + t_reserved;

 



#define TSIZE(t)	((t)->t_outstanding_credits + (t)->t_reserved)

journal_start(blocks)
{
	if (TSIZE(transaction) + blocks > MAX)
		wait_for_space(journal);
		
	transaction->t_reserved += blocks;
	handle->h_buffer_credits = blocks;
}


journal_get_write_access(handle, bh)
{
	if (jh->b_tcount >= 0)
		jh->b_tcount++;
}

journal_dirty_metadata(handle, bh)
{
	transaction->t_reserved--;
	handle->h_buffer_credits--;
	if (jh->b_tcount > 0) {
                /* modifed, no need to track it any more */
		transaction->t_outstanding_credits++;
		jh->b_tcount = -1;
	}
}

journal_release_buffer(handle, bh)
{
	if (jh->b_tcount > 0) {
                /* it's not modified yet */
		jh->b_tcount--;
                if (jh->b_tcount == 0) {
                       /* remove from the transaction */
                }
        }
}

journal_stop(handle)
{
	transaction->t_outstanding_credits -= handle->h_buffer_credits;
}


thanks, Alex



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()
  2005-01-25 14:36       ` Alex Tomas
@ 2005-01-25 16:21         ` Stephen C. Tweedie
  2005-01-25 19:30           ` Alex Tomas
  0 siblings, 1 reply; 11+ messages in thread
From: Stephen C. Tweedie @ 2005-01-25 16:21 UTC (permalink / raw)
  To: Alex Tomas; +Cc: linux-kernel, ext2-devel, Andrew Morton, Stephen Tweedie

Hi,

On Tue, 2005-01-25 at 14:36, Alex Tomas wrote:
> Hi, could you review the following solution?
> 
> 
>  t_outstanding_credits - number of _modified_ blocks in the transaction
>  t_reserved - number of blocks all running handle reserved
>  transaction size = t_outstanding_credits + t_reserved;

Ah, a work of genius.

Yes, this appears to cover all of the cases.  It changes the semantics
of the jbd layer subtly: a handle isn't "charged" for its use of bh's
until it actually dirties them.  But the handle is required to reserve
enough space up front in any case, so it shouldn't matter much exactly
when it gets charged.  It _would_ affect journal_extend(), if that were
called between a journal_get_write_access() and a
journal_dirty_metadata(), but I don't think we do that anywhere, and the
change in behaviour ought to be merely a slight difference in when it's
legal to extend --- it shouldn't break the rules.

> journal_dirty_metadata(handle, bh)
> {
> 	transaction->t_reserved--;
> 	handle->h_buffer_credits--;
> 	if (jh->b_tcount > 0) {
>                 /* modifed, no need to track it any more */
> 		transaction->t_outstanding_credits++;
> 		jh->b_tcount = -1;
> 	}
> }

Actually, the whole thing can be wrapped in if (jh->b_tcount > 0) {}, I
think.  If we have already charged the transaction for this buffer, then
there's no need to charge the handle for it again.  That's going to be
particularly important for truncate(), where we are continually updating
the same blocks (eg. bitmap, indirect blocks), so we want to make sure
that after the first journal_dirty_metadata() call, no further charge is
made.

If we do that, do we in fact need t_reserved at all?

Very nice!

Cheers,
 Stephen


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()
  2005-01-25 16:21         ` Stephen C. Tweedie
@ 2005-01-25 19:30           ` Alex Tomas
  2005-01-26 17:12             ` Stephen C. Tweedie
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Tomas @ 2005-01-25 19:30 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Alex Tomas, linux-kernel, ext2-devel, Andrew Morton

>>>>> Stephen C Tweedie (SCT) writes:

 >> journal_dirty_metadata(handle, bh)
 >> {
 >>     transaction->t_reserved--;
 >>     handle->h_buffer_credits--;
 >>     if (jh->b_tcount > 0) {
 >>         /* modifed, no need to track it any more */
 >>          transaction-> t_outstanding_credits++;
 >>        jh-> b_tcount = -1;
 >>      }
 >> }

 SCT> Actually, the whole thing can be wrapped in if (jh->b_tcount > 0) {}, I
 SCT> think.  If we have already charged the transaction for this buffer, then
 SCT> there's no need to charge the handle for it again.  That's going to be
 SCT> particularly important for truncate(), where we are continually updating
 SCT> the same blocks (eg. bitmap, indirect blocks), so we want to make sure
 SCT> that after the first journal_dirty_metadata() call, no further charge is
 SCT> made.

the idea is:
1) the sooner we drop reservation, the higher probability to cover many
   changes by single transaction
1) having h_buffer_credits being decremented for each modification
   could help us to debug handle overflow situations

 SCT> If we do that, do we in fact need t_reserved at all?

hmm. if t_outstanding_credits holds number of modified buffers,
then we need sum of all running h_buffer_credits to protect
from transaction overflow. t_reserved is sum of h_buffer_credits.


thanks, Alex


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()
  2005-01-25 19:30           ` Alex Tomas
@ 2005-01-26 17:12             ` Stephen C. Tweedie
  2005-01-30 10:55               ` Alex Tomas
  0 siblings, 1 reply; 11+ messages in thread
From: Stephen C. Tweedie @ 2005-01-26 17:12 UTC (permalink / raw)
  To: Alex Tomas; +Cc: linux-kernel, ext2-devel, Andrew Morton, Stephen Tweedie

Hi,

On Tue, 2005-01-25 at 19:30, Alex Tomas wrote:

>  >> journal_dirty_metadata(handle, bh)
>  >> {
>  >>     transaction->t_reserved--;
>  >>     handle->h_buffer_credits--;
>  >>     if (jh->b_tcount > 0) {
>  >>         /* modifed, no need to track it any more */
>  >>          transaction-> t_outstanding_credits++;
>  >>        jh-> b_tcount = -1;
>  >>      }
>  >> }
> 
>  SCT> Actually, the whole thing can be wrapped in if (jh->b_tcount > 0) {}, I
>  SCT> think.

> the idea is:
> 1) the sooner we drop reservation, the higher probability to cover many
>    changes by single transaction

But that's exactly why we _don't_ want to do this.  You're dropping the
reservation, but remember, we return unused handle credits to the
transaction at the end of the handle's life.

So normally, when you are, for example, appending 4k at a time to a
large file, the first handle in a new transaction gets charged for the
indirect/bitmap updates.  But subsequent ones do not, so as the later
handles end, they return their credits to the transaction.  That allows
large numbers of handles to update the same buffers in the same
transaction.

But by reducing handle->h_buffer_credits early, even if the bh is
already modified in the current transaction, you are preventing the
return of those credits back to the transaction.  Effectively, each
modification of the same bh in the same transaction is being accounted
for separately, so you're charging the transaction multiple times for a
bh which is only going to be journaled once.  That's going to cause the
transaction to be closed early, as our accounting will tell us that the
transaction is full even when it has only a few buffers modified.

> 1) having h_buffer_credits being decremented for each modification
>    could help us to debug handle overflow situations
> 
>  SCT> If we do that, do we in fact need t_reserved at all?
> 
> hmm. if t_outstanding_credits holds number of modified buffers,
> then we need sum of all running h_buffer_credits to protect
> from transaction overflow. t_reserved is sum of h_buffer_credits.

Why can't we just maintain t_outstanding_credits for this?  Define that
as the sum of all credits either promised or consumed by any handles.

On journal_start(), t_outstanding_credits += blocks;

On journal_dirty_metadata(), 
	{
		if (jb->b_tcount > 0) {
			/* a promised credit has turned into a consumed one */
			handle->h_buffer_credits--;
			jh->b_tcount = -1;
		}
	}

On journal_stop(), any remaining handle credits are promised but
unconsumed; simply return them: 
	transaction->t_outstanding_credits -= handle->h_buffer_credits;

As before journal_release_buffer() doesn't have to do anything about
credits because we're only releasing buffers if they have not been
charged for yet.

--Stephen


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()
  2005-01-26 17:12             ` Stephen C. Tweedie
@ 2005-01-30 10:55               ` Alex Tomas
  2005-02-04 23:45                 ` Stephen C. Tweedie
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Tomas @ 2005-01-30 10:55 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Alex Tomas, linux-kernel, ext2-devel, Andrew Morton

>>>>> Stephen C Tweedie (SCT) writes:

 SCT> Hi,
 SCT> On Tue, 2005-01-25 at 19:30, Alex Tomas wrote:

 >> >> journal_dirty_metadata(handle, bh)
 >> >> {
 >> >>     transaction->t_reserved--;
 >> >>     handle->h_buffer_credits--;
 >> >>     if (jh->b_tcount > 0) {
 >> >>         /* modifed, no need to track it any more */
 >> >>          transaction-> t_outstanding_credits++;
 >> >>        jh-> b_tcount = -1;
 >> >>      }
 >> >> }
 >> 
 SCT> Actually, the whole thing can be wrapped in if (jh->b_tcount > 0) {}, I
 SCT> think.

 >> the idea is:
 >> 1) the sooner we drop reservation, the higher probability to cover many
 >> changes by single transaction

 SCT> But that's exactly why we _don't_ want to do this.  You're dropping the
 SCT> reservation, but remember, we return unused handle credits to the
 SCT> transaction at the end of the handle's life.

yup, you're right. so, here is an implementation of this.
tested on UP/SMP by dbench/fsx. Stephen, Andrew, could you
review the patch and give it a run? 

thanks, Alex


fix against credits leak in journal_release_buffer()

The idea is to charge a buffer in journal_dirty_metadata(),
not in journal_get_*_access()). Each buffer has flag call
journal_dirty_metadata() sets on the buffer.

Signed-off-by: Alex Tomas <alex@clusterfs.com>

Index: linux-2.6.10/include/linux/journal-head.h
===================================================================
--- linux-2.6.10.orig/include/linux/journal-head.h	2003-06-24 18:05:26.000000000 +0400
+++ linux-2.6.10/include/linux/journal-head.h	2005-01-29 03:20:05.000000000 +0300
@@ -32,6 +32,13 @@
 	unsigned b_jlist;
 
 	/*
+	 * This flag signals the buffer has been modified by
+	 * the currently running transaction
+	 * [jbd_lock_bh_state()]
+	 */
+	unsigned b_modified;
+
+	/*
 	 * Copy of the buffer data frozen for writing to the log.
 	 * [jbd_lock_bh_state()]
 	 */
Index: linux-2.6.10/include/linux/ext3_jbd.h
===================================================================
--- linux-2.6.10.orig/include/linux/ext3_jbd.h	2005-01-28 19:32:15.000000000 +0300
+++ linux-2.6.10/include/linux/ext3_jbd.h	2005-01-29 03:13:41.000000000 +0300
@@ -113,9 +113,9 @@
 
 static inline int
 __ext3_journal_get_undo_access(const char *where, handle_t *handle,
-				struct buffer_head *bh, int *credits)
+				struct buffer_head *bh)
 {
-	int err = journal_get_undo_access(handle, bh, credits);
+	int err = journal_get_undo_access(handle, bh);
 	if (err)
 		ext3_journal_abort_handle(where, __FUNCTION__, bh, handle,err);
 	return err;
@@ -123,19 +123,18 @@
 
 static inline int
 __ext3_journal_get_write_access(const char *where, handle_t *handle,
-				struct buffer_head *bh, int *credits)
+				struct buffer_head *bh)
 {
-	int err = journal_get_write_access(handle, bh, credits);
+	int err = journal_get_write_access(handle, bh);
 	if (err)
 		ext3_journal_abort_handle(where, __FUNCTION__, bh, handle,err);
 	return err;
 }
 
 static inline void
-ext3_journal_release_buffer(handle_t *handle, struct buffer_head *bh,
-				int credits)
+ext3_journal_release_buffer(handle_t *handle, struct buffer_head *bh)
 {
-	journal_release_buffer(handle, bh, credits);
+	journal_release_buffer(handle, bh);
 }
 
 static inline void
@@ -175,12 +174,10 @@
 }
 
 
-#define ext3_journal_get_undo_access(handle, bh, credits) \
-	__ext3_journal_get_undo_access(__FUNCTION__, (handle), (bh), (credits))
+#define ext3_journal_get_undo_access(handle, bh) \
+	__ext3_journal_get_undo_access(__FUNCTION__, (handle), (bh))
 #define ext3_journal_get_write_access(handle, bh) \
-	__ext3_journal_get_write_access(__FUNCTION__, (handle), (bh), NULL)
-#define ext3_journal_get_write_access_credits(handle, bh, credits) \
-	__ext3_journal_get_write_access(__FUNCTION__, (handle), (bh), (credits))
+	__ext3_journal_get_write_access(__FUNCTION__, (handle), (bh))
 #define ext3_journal_revoke(handle, blocknr, bh) \
 	__ext3_journal_revoke(__FUNCTION__, (handle), (blocknr), (bh))
 #define ext3_journal_get_create_access(handle, bh) \
Index: linux-2.6.10/include/linux/jbd.h
===================================================================
--- linux-2.6.10.orig/include/linux/jbd.h	2005-01-28 19:32:15.000000000 +0300
+++ linux-2.6.10/include/linux/jbd.h	2005-01-29 03:13:41.000000000 +0300
@@ -865,15 +865,12 @@
 extern handle_t *journal_start(journal_t *, int nblocks);
 extern int	 journal_restart (handle_t *, int nblocks);
 extern int	 journal_extend (handle_t *, int nblocks);
-extern int	 journal_get_write_access(handle_t *, struct buffer_head *,
-						int *credits);
+extern int	 journal_get_write_access(handle_t *, struct buffer_head *);
 extern int	 journal_get_create_access (handle_t *, struct buffer_head *);
-extern int	 journal_get_undo_access(handle_t *, struct buffer_head *,
-						int *credits);
+extern int	 journal_get_undo_access(handle_t *, struct buffer_head *);
 extern int	 journal_dirty_data (handle_t *, struct buffer_head *);
 extern int	 journal_dirty_metadata (handle_t *, struct buffer_head *);
-extern void	 journal_release_buffer (handle_t *, struct buffer_head *,
-						int credits);
+extern void	 journal_release_buffer (handle_t *, struct buffer_head *);
 extern void	 journal_forget (handle_t *, struct buffer_head *);
 extern void	 journal_sync_buffer (struct buffer_head *);
 extern int	 journal_invalidatepage(journal_t *,
Index: linux-2.6.10/fs/jbd/transaction.c
===================================================================
--- linux-2.6.10.orig/fs/jbd/transaction.c	2005-01-28 19:32:12.000000000 +0300
+++ linux-2.6.10/fs/jbd/transaction.c	2005-01-29 03:21:52.000000000 +0300
@@ -522,7 +522,7 @@
  */
 static int
 do_get_write_access(handle_t *handle, struct journal_head *jh,
-			int force_copy, int *credits) 
+			int force_copy) 
 {
 	struct buffer_head *bh;
 	transaction_t *transaction;
@@ -604,11 +604,6 @@
 		JBUFFER_TRACE(jh, "has frozen data");
 		J_ASSERT_JH(jh, jh->b_next_transaction == NULL);
 		jh->b_next_transaction = transaction;
-
-		J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
-		handle->h_buffer_credits--;
-		if (credits)
-			(*credits)++;
 		goto done;
 	}
 
@@ -688,10 +683,6 @@
 		jh->b_next_transaction = transaction;
 	}
 
-	J_ASSERT(handle->h_buffer_credits > 0);
-	handle->h_buffer_credits--;
-	if (credits)
-		(*credits)++;
 
 	/*
 	 * Finally, if the buffer is not journaled right now, we need to make
@@ -749,8 +740,7 @@
  * because we're write()ing a buffer which is also part of a shared mapping.
  */
 
-int journal_get_write_access(handle_t *handle,
-			struct buffer_head *bh, int *credits)
+int journal_get_write_access(handle_t *handle, struct buffer_head *bh)
 {
 	struct journal_head *jh = journal_add_journal_head(bh);
 	int rc;
@@ -758,7 +748,7 @@
 	/* We do not want to get caught playing with fields which the
 	 * log thread also manipulates.  Make sure that the buffer
 	 * completes any outstanding IO before proceeding. */
-	rc = do_get_write_access(handle, jh, 0, credits);
+	rc = do_get_write_access(handle, jh, 0);
 	journal_put_journal_head(jh);
 	return rc;
 }
@@ -814,9 +804,6 @@
 	J_ASSERT_JH(jh, jh->b_next_transaction == NULL);
 	J_ASSERT_JH(jh, buffer_locked(jh2bh(jh)));
 
-	J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
-	handle->h_buffer_credits--;
-
 	if (jh->b_transaction == NULL) {
 		jh->b_transaction = transaction;
 		JBUFFER_TRACE(jh, "file as BJ_Reserved");
@@ -869,8 +856,7 @@
  *
  * Returns error number or 0 on success.
  */
-int journal_get_undo_access(handle_t *handle, struct buffer_head *bh,
-				int *credits)
+int journal_get_undo_access(handle_t *handle, struct buffer_head *bh)
 {
 	int err;
 	struct journal_head *jh = journal_add_journal_head(bh);
@@ -883,7 +869,7 @@
 	 * make sure that obtaining the committed_data is done
 	 * atomically wrt. completion of any outstanding commits.
 	 */
-	err = do_get_write_access(handle, jh, 1, credits);
+	err = do_get_write_access(handle, jh, 1);
 	if (err)
 		goto out;
 
@@ -1111,6 +1097,17 @@
 
 	jbd_lock_bh_state(bh);
 
+	if (jh->b_modified == 0) {
+		/*
+		 * This buffer's got modified and becoming part
+		 * of the transaction. This needs to be done
+		 * once a transaction -bzzz
+		 */
+		jh->b_modified = 1;
+		J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
+		handle->h_buffer_credits--;
+	}
+
 	/*
 	 * fastpath, to avoid expensive locking.  If this buffer is already
 	 * on the running transaction's metadata list there is nothing to do.
@@ -1161,24 +1158,11 @@
  * journal_release_buffer: undo a get_write_access without any buffer
  * updates, if the update decided in the end that it didn't need access.
  *
- * The caller passes in the number of credits which should be put back for
- * this buffer (zero or one).
- *
- * We leave the buffer attached to t_reserved_list because even though this
- * handle doesn't want it, some other concurrent handle may want to journal
- * this buffer.  If that handle is curently in between get_write_access() and
- * journal_dirty_metadata() then it expects the buffer to be reserved.  If
- * we were to rip it off t_reserved_list here, the other handle will explode
- * when journal_dirty_metadata is presented with a non-reserved buffer.
- *
- * If nobody really wants to journal this buffer then it will be thrown
- * away at the start of commit.
  */
 void
-journal_release_buffer(handle_t *handle, struct buffer_head *bh, int credits)
+journal_release_buffer(handle_t *handle, struct buffer_head *bh)
 {
 	BUFFER_TRACE(bh, "entry");
-	handle->h_buffer_credits += credits;
 }
 
 /** 
@@ -1213,6 +1197,12 @@
 		goto not_jbd;
 	jh = bh2jh(bh);
 
+	/*
+	 * The buffer's going from the transaction, we must drop
+	 * all references -bzzz
+	 */
+	jh->b_modified = 0;
+
 	if (jh->b_transaction == handle->h_transaction) {
 		J_ASSERT_JH(jh, !jh->b_frozen_data);
 
Index: linux-2.6.10/fs/jbd/commit.c
===================================================================
--- linux-2.6.10.orig/fs/jbd/commit.c	2005-01-28 19:32:12.000000000 +0300
+++ linux-2.6.10/fs/jbd/commit.c	2005-01-29 03:32:24.000000000 +0300
@@ -229,6 +229,22 @@
 	jbd_debug (3, "JBD: commit phase 2\n");
 
 	/*
+	 * First, drop modified flag: all accesses to the buffers
+	 * will be tracked for a new trasaction only -bzzz
+	 */
+	spin_lock(&journal->j_list_lock);
+	if (commit_transaction->t_buffers) {
+		new_jh = jh = commit_transaction->t_buffers->b_tnext;
+		do {
+			J_ASSERT_JH(new_jh, new_jh->b_modified == 1 ||
+					new_jh->b_modified == 0);
+			new_jh->b_modified = 0;
+			new_jh = new_jh->b_tnext;
+		} while (new_jh != jh);
+	}
+	spin_unlock(&journal->j_list_lock);
+
+	/*
 	 * Now start flushing things to disk, in the order they appear
 	 * on the transaction lists.  Data blocks go first.
 	 */
Index: linux-2.6.10/fs/ext3/balloc.c
===================================================================
--- linux-2.6.10.orig/fs/ext3/balloc.c	2005-01-28 19:32:12.000000000 +0300
+++ linux-2.6.10/fs/ext3/balloc.c	2005-01-29 03:13:41.000000000 +0300
@@ -342,7 +342,7 @@
 	 */
 	/* @@@ check errors */
 	BUFFER_TRACE(bitmap_bh, "getting undo access");
-	err = ext3_journal_get_undo_access(handle, bitmap_bh, NULL);
+	err = ext3_journal_get_undo_access(handle, bitmap_bh);
 	if (err)
 		goto error_return;
 
@@ -986,7 +986,6 @@
 	unsigned long group_first_block;
 	int ret = 0;
 	int fatal;
-	int credits = 0;
 
 	*errp = 0;
 
@@ -996,7 +995,7 @@
 	 * if the buffer is in BJ_Forget state in the committing transaction.
 	 */
 	BUFFER_TRACE(bitmap_bh, "get undo access for new block");
-	fatal = ext3_journal_get_undo_access(handle, bitmap_bh, &credits);
+	fatal = ext3_journal_get_undo_access(handle, bitmap_bh);
 	if (fatal) {
 		*errp = fatal;
 		return -1;
@@ -1087,7 +1086,7 @@
 	}
 
 	BUFFER_TRACE(bitmap_bh, "journal_release_buffer");
-	ext3_journal_release_buffer(handle, bitmap_bh, credits);
+	ext3_journal_release_buffer(handle, bitmap_bh);
 	return ret;
 }
 
Index: linux-2.6.10/fs/ext3/ialloc.c
===================================================================
--- linux-2.6.10.orig/fs/ext3/ialloc.c	2005-01-28 19:32:12.000000000 +0300
+++ linux-2.6.10/fs/ext3/ialloc.c	2005-01-29 03:13:41.000000000 +0300
@@ -474,11 +474,9 @@
 		ino = ext3_find_next_zero_bit((unsigned long *)
 				bitmap_bh->b_data, EXT3_INODES_PER_GROUP(sb), ino);
 		if (ino < EXT3_INODES_PER_GROUP(sb)) {
-			int credits = 0;
 
 			BUFFER_TRACE(bitmap_bh, "get_write_access");
-			err = ext3_journal_get_write_access_credits(handle,
-							bitmap_bh, &credits);
+			err = ext3_journal_get_write_access(handle, bitmap_bh);
 			if (err)
 				goto fail;
 
@@ -494,7 +492,7 @@
 				goto got;
 			}
 			/* we lost it */
-			journal_release_buffer(handle, bitmap_bh, credits);
+			journal_release_buffer(handle, bitmap_bh);
 
 			if (++ino < EXT3_INODES_PER_GROUP(sb))
 				goto repeat_in_this_group;


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()
  2005-01-30 10:55               ` Alex Tomas
@ 2005-02-04 23:45                 ` Stephen C. Tweedie
  0 siblings, 0 replies; 11+ messages in thread
From: Stephen C. Tweedie @ 2005-02-04 23:45 UTC (permalink / raw)
  To: Alex Tomas; +Cc: linux-kernel, ext2-devel, Andrew Morton, Stephen Tweedie

Hi,

On Sun, 2005-01-30 at 10:55, Alex Tomas wrote:

> yup, you're right. so, here is an implementation of this.
> tested on UP/SMP by dbench/fsx. Stephen, Andrew, could you
> review the patch and give it a run? 

Just to say I haven't forgotten, just been battling this week against
all sorts of apparent hardware problems on some test boxes... I'll try
to get you some proper results next week.

Cheers,
 Stephen


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2005-02-04 23:47 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-01-19 15:32 [PATCH] JBD: journal_release_buffer() Alex Tomas
2005-01-24 22:05 ` [Ext2-devel] " Stephen C. Tweedie
2005-01-24 22:24   ` Alex Tomas
2005-01-24 23:35     ` Stephen C. Tweedie
2005-01-25  0:08       ` Alex Tomas
2005-01-25 14:36       ` Alex Tomas
2005-01-25 16:21         ` Stephen C. Tweedie
2005-01-25 19:30           ` Alex Tomas
2005-01-26 17:12             ` Stephen C. Tweedie
2005-01-30 10:55               ` Alex Tomas
2005-02-04 23:45                 ` Stephen C. Tweedie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).