All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Fix possible NULL pointer dereference in JBD2
@ 2009-02-05 14:52 Jan Kara
  2009-02-05 14:53   ` [Ocfs2-devel] " Jan Kara
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Jan Kara @ 2009-02-05 14:52 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andrew Morton, mfasheh


  Hi,

  the three patches below fix a bug in jbd2_journal_begin_ordered_truncate() that
could possibly lead to NULL pointer dereference. The question is how to merge
the fix as it changes the prototype of the function and both ext4 and ocfs2 use it.
Mark, Mingming, any suggestions?

                                                                                Honza

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] jbd2: Fix possible NULL pointer dereference in jbd2_journal_begin_ordered_truncate()
  2009-02-05 14:52 [PATCH 0/3] Fix possible NULL pointer dereference in JBD2 Jan Kara
@ 2009-02-05 14:53   ` Jan Kara
  2009-02-05 19:32 ` [PATCH 0/3] Fix possible NULL pointer dereference in JBD2 Mark Fasheh
  2009-02-05 21:37 ` Theodore Tso
  2 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2009-02-05 14:53 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andrew Morton, mfasheh, Jan Kara, ocfs2-devel, Dan Carpenter

If we race with commit code setting i_transaction to NULL, we could possibly
dereference it. Proper locking requires journal pointer (journal->j_list_lock)
we don't have. So we have to change the prototype of the function so that
filesystem passes us the journal pointer. Also add more detailed comment
about why function does what it does how it should be used.

Thanks to Dan Carpenter <error27@gmail.com> for pointing to the suspitious
code.

CC: linux-ext4@vger.kernel.org
CC: ocfs2-devel@oss.oracle.com
CC: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/jbd2/transaction.c |   41 ++++++++++++++++++++++++++++++-----------
 include/linux/jbd2.h  |    3 ++-
 2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 46b4e34..9f1f5f2 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2129,26 +2129,45 @@ done:
 }
 
 /*
- * This function must be called when inode is journaled in ordered mode
- * before truncation happens. It starts writeout of truncated part in
- * case it is in the committing transaction so that we stand to ordered
- * mode consistency guarantees.
+ * File truncate and transaction commit interact with each other in a
+ * non-trivial way. If a transaction writing data block A is committing,
+ * we cannot discard the data by truncate until we have written them.
+ * Otherwise if we crashed after the transaction with write has committed
+ * but before the transaction with truncate has committed, we could see
+ * stale data in block A. This function is a helper to solve this problem.
+ * It starts writeout of the truncated part in case it is in the committing
+ * transaction.
+ *
+ * Filesystem must call this function when inode is journaled in ordered mode
+ * before truncation happens and after the inode has been placed on orphan list
+ * with the new inode size. The second condition avoids the race that someone
+ * writes new data and we start committing the transaction after this function
+ * has been called but before a transaction for truncate is started (and
+ * furthermore it allows us to optimize the case where addition to orphan list
+ * happens in the same transaction as write - we don't have to write any data
+ * in such case).
  */
-int jbd2_journal_begin_ordered_truncate(struct jbd2_inode *inode,
+int jbd2_journal_begin_ordered_truncate(journal_t *journal,
+					struct jbd2_inode *jinode,
 					loff_t new_size)
 {
-	journal_t *journal;
-	transaction_t *commit_trans;
+	transaction_t *inode_trans, *commit_trans;
 	int ret = 0;
 
-	if (!inode->i_transaction && !inode->i_next_transaction)
+	/* This is a quick check to avoid locking if not necessary */
+	if (!jinode->i_transaction)
 		goto out;
-	journal = inode->i_transaction->t_journal;
+	/* Locks are here just to force reading of recent values, it is
+	 * enough that the transaction was not committing before we started
+	 * a transaction adding the inode to orphan list */
 	spin_lock(&journal->j_state_lock);
 	commit_trans = journal->j_committing_transaction;
 	spin_unlock(&journal->j_state_lock);
-	if (inode->i_transaction == commit_trans) {
-		ret = filemap_fdatawrite_range(inode->i_vfs_inode->i_mapping,
+	spin_lock(&journal->j_list_lock);
+	inode_trans = jinode->i_transaction;
+	spin_unlock(&journal->j_list_lock);
+	if (inode_trans == commit_trans) {
+		ret = filemap_fdatawrite_range(jinode->i_vfs_inode->i_mapping,
 			new_size, LLONG_MAX);
 		if (ret)
 			jbd2_journal_abort(journal, ret);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index b28b37e..4d248b3 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1150,7 +1150,8 @@ extern int	   jbd2_journal_clear_err  (journal_t *);
 extern int	   jbd2_journal_bmap(journal_t *, unsigned long, unsigned long long *);
 extern int	   jbd2_journal_force_commit(journal_t *);
 extern int	   jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *inode);
-extern int	   jbd2_journal_begin_ordered_truncate(struct jbd2_inode *inode, loff_t new_size);
+extern int	   jbd2_journal_begin_ordered_truncate(journal_t *journal,
+				struct jbd2_inode *inode, loff_t new_size);
 extern void	   jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode);
 extern void	   jbd2_journal_release_jbd_inode(journal_t *journal, struct jbd2_inode *jinode);
 
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Ocfs2-devel] [PATCH 1/3] jbd2: Fix possible NULL pointer dereference in jbd2_journal_begin_ordered_truncate()
@ 2009-02-05 14:53   ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2009-02-05 14:53 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andrew Morton, mfasheh, Jan Kara, ocfs2-devel, Dan Carpenter

If we race with commit code setting i_transaction to NULL, we could possibly
dereference it. Proper locking requires journal pointer (journal->j_list_lock)
we don't have. So we have to change the prototype of the function so that
filesystem passes us the journal pointer. Also add more detailed comment
about why function does what it does how it should be used.

Thanks to Dan Carpenter <error27@gmail.com> for pointing to the suspitious
code.

CC: linux-ext4 at vger.kernel.org
CC: ocfs2-devel at oss.oracle.com
CC: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/jbd2/transaction.c |   41 ++++++++++++++++++++++++++++++-----------
 include/linux/jbd2.h  |    3 ++-
 2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 46b4e34..9f1f5f2 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2129,26 +2129,45 @@ done:
 }
 
 /*
- * This function must be called when inode is journaled in ordered mode
- * before truncation happens. It starts writeout of truncated part in
- * case it is in the committing transaction so that we stand to ordered
- * mode consistency guarantees.
+ * File truncate and transaction commit interact with each other in a
+ * non-trivial way. If a transaction writing data block A is committing,
+ * we cannot discard the data by truncate until we have written them.
+ * Otherwise if we crashed after the transaction with write has committed
+ * but before the transaction with truncate has committed, we could see
+ * stale data in block A. This function is a helper to solve this problem.
+ * It starts writeout of the truncated part in case it is in the committing
+ * transaction.
+ *
+ * Filesystem must call this function when inode is journaled in ordered mode
+ * before truncation happens and after the inode has been placed on orphan list
+ * with the new inode size. The second condition avoids the race that someone
+ * writes new data and we start committing the transaction after this function
+ * has been called but before a transaction for truncate is started (and
+ * furthermore it allows us to optimize the case where addition to orphan list
+ * happens in the same transaction as write - we don't have to write any data
+ * in such case).
  */
-int jbd2_journal_begin_ordered_truncate(struct jbd2_inode *inode,
+int jbd2_journal_begin_ordered_truncate(journal_t *journal,
+					struct jbd2_inode *jinode,
 					loff_t new_size)
 {
-	journal_t *journal;
-	transaction_t *commit_trans;
+	transaction_t *inode_trans, *commit_trans;
 	int ret = 0;
 
-	if (!inode->i_transaction && !inode->i_next_transaction)
+	/* This is a quick check to avoid locking if not necessary */
+	if (!jinode->i_transaction)
 		goto out;
-	journal = inode->i_transaction->t_journal;
+	/* Locks are here just to force reading of recent values, it is
+	 * enough that the transaction was not committing before we started
+	 * a transaction adding the inode to orphan list */
 	spin_lock(&journal->j_state_lock);
 	commit_trans = journal->j_committing_transaction;
 	spin_unlock(&journal->j_state_lock);
-	if (inode->i_transaction == commit_trans) {
-		ret = filemap_fdatawrite_range(inode->i_vfs_inode->i_mapping,
+	spin_lock(&journal->j_list_lock);
+	inode_trans = jinode->i_transaction;
+	spin_unlock(&journal->j_list_lock);
+	if (inode_trans == commit_trans) {
+		ret = filemap_fdatawrite_range(jinode->i_vfs_inode->i_mapping,
 			new_size, LLONG_MAX);
 		if (ret)
 			jbd2_journal_abort(journal, ret);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index b28b37e..4d248b3 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1150,7 +1150,8 @@ extern int	   jbd2_journal_clear_err  (journal_t *);
 extern int	   jbd2_journal_bmap(journal_t *, unsigned long, unsigned long long *);
 extern int	   jbd2_journal_force_commit(journal_t *);
 extern int	   jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *inode);
-extern int	   jbd2_journal_begin_ordered_truncate(struct jbd2_inode *inode, loff_t new_size);
+extern int	   jbd2_journal_begin_ordered_truncate(journal_t *journal,
+				struct jbd2_inode *inode, loff_t new_size);
 extern void	   jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode);
 extern void	   jbd2_journal_release_jbd_inode(journal_t *journal, struct jbd2_inode *jinode);
 
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] ext4: Add journal parameter to jbd2_journal_begin_ordered_truncate()
  2009-02-05 14:53   ` [Ocfs2-devel] " Jan Kara
  (?)
@ 2009-02-05 14:53   ` Jan Kara
  2009-02-05 14:53       ` [Ocfs2-devel] " Jan Kara
  -1 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2009-02-05 14:53 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andrew Morton, mfasheh, Jan Kara

Add new parameter of the function.

CC: linux-ext4@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 03ba20b..658c4a7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -47,8 +47,10 @@
 static inline int ext4_begin_ordered_truncate(struct inode *inode,
 					      loff_t new_size)
 {
-	return jbd2_journal_begin_ordered_truncate(&EXT4_I(inode)->jinode,
-						   new_size);
+	return jbd2_journal_begin_ordered_truncate(
+					EXT4_SB(inode->i_sb)->s_journal,
+					&EXT4_I(inode)->jinode,
+					new_size);
 }
 
 static void ext4_invalidatepage(struct page *page, unsigned long offset);
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] ocfs2: Add journal parameter to jbd2_journal_begin_ordered_truncate()
  2009-02-05 14:53   ` [PATCH 2/3] ext4: Add journal parameter to jbd2_journal_begin_ordered_truncate() Jan Kara
@ 2009-02-05 14:53       ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2009-02-05 14:53 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andrew Morton, mfasheh, Jan Kara, ocfs2-devel, mfasheh

Add new parameter of the function.

CC: ocfs2-devel@oss.oracle.com
CC: mfasheh@suse.de
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ocfs2/journal.h |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index 3c3532e..172850a 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -513,8 +513,10 @@ static inline int ocfs2_jbd2_file_inode(handle_t *handle, struct inode *inode)
 static inline int ocfs2_begin_ordered_truncate(struct inode *inode,
 					       loff_t new_size)
 {
-	return jbd2_journal_begin_ordered_truncate(&OCFS2_I(inode)->ip_jinode,
-						   new_size);
+	return jbd2_journal_begin_ordered_truncate(
+				OCFS2_SB(inode->i_sb)->journal->j_journal,
+				&OCFS2_I(inode)->ip_jinode,
+				new_size);
 }
 
 #endif /* OCFS2_JOURNAL_H */
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Ocfs2-devel] [PATCH 3/3] ocfs2: Add journal parameter to jbd2_journal_begin_ordered_truncate()
@ 2009-02-05 14:53       ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2009-02-05 14:53 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andrew Morton, mfasheh, Jan Kara, ocfs2-devel, mfasheh

Add new parameter of the function.

CC: ocfs2-devel at oss.oracle.com
CC: mfasheh at suse.de
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ocfs2/journal.h |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index 3c3532e..172850a 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -513,8 +513,10 @@ static inline int ocfs2_jbd2_file_inode(handle_t *handle, struct inode *inode)
 static inline int ocfs2_begin_ordered_truncate(struct inode *inode,
 					       loff_t new_size)
 {
-	return jbd2_journal_begin_ordered_truncate(&OCFS2_I(inode)->ip_jinode,
-						   new_size);
+	return jbd2_journal_begin_ordered_truncate(
+				OCFS2_SB(inode->i_sb)->journal->j_journal,
+				&OCFS2_I(inode)->ip_jinode,
+				new_size);
 }
 
 #endif /* OCFS2_JOURNAL_H */
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] Fix possible NULL pointer dereference in JBD2
  2009-02-05 14:52 [PATCH 0/3] Fix possible NULL pointer dereference in JBD2 Jan Kara
  2009-02-05 14:53   ` [Ocfs2-devel] " Jan Kara
@ 2009-02-05 19:32 ` Mark Fasheh
  2009-02-05 21:37 ` Theodore Tso
  2 siblings, 0 replies; 17+ messages in thread
From: Mark Fasheh @ 2009-02-05 19:32 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, Andrew Morton

On Thu, Feb 05, 2009 at 03:52:59PM +0100, Jan Kara wrote:
> 
>   Hi,
> 
>   the three patches below fix a bug in jbd2_journal_begin_ordered_truncate() that
> could possibly lead to NULL pointer dereference. The question is how to merge
> the fix as it changes the prototype of the function and both ext4 and ocfs2 use it.
> Mark, Mingming, any suggestions?

This seems like it'd be easiest for Andrew to carry. The Ocfs2 patch is
unlikely to conflict with anything in my tree.

Btw, you can add:

Acked-by: Mark Fasheh <mfasheh@suse.com>

to the Ocfs2 patch.
	--Mark

--
Mark Fasheh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] Fix possible NULL pointer dereference in JBD2
  2009-02-05 14:52 [PATCH 0/3] Fix possible NULL pointer dereference in JBD2 Jan Kara
  2009-02-05 14:53   ` [Ocfs2-devel] " Jan Kara
  2009-02-05 19:32 ` [PATCH 0/3] Fix possible NULL pointer dereference in JBD2 Mark Fasheh
@ 2009-02-05 21:37 ` Theodore Tso
  2009-02-09 17:34   ` Jan Kara
  2 siblings, 1 reply; 17+ messages in thread
From: Theodore Tso @ 2009-02-05 21:37 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, Andrew Morton, mfasheh

On Thu, Feb 05, 2009 at 03:52:59PM +0100, Jan Kara wrote:
> 

>   the three patches below fix a bug in
> jbd2_journal_begin_ordered_truncate() that could possibly lead to
> NULL pointer dereference. The question is how to merge the fix as it
> changes the prototype of the function and both ext4 and ocfs2 use
> it.  Mark, Mingming, any suggestions?

Jan, how how easy is it to hit this?  Do you believe we should try to
get this in for 2.6.29?  I got some final things to push to Linus so
I can also push it via the ext4 tree.  I doubt it will conflict with
> anything in my tree, either.

						- Ted

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] Fix possible NULL pointer dereference in JBD2
  2009-02-05 21:37 ` Theodore Tso
@ 2009-02-09 17:34   ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2009-02-09 17:34 UTC (permalink / raw)
  To: Theodore Tso; +Cc: linux-ext4, Andrew Morton, mfasheh

On Thu 05-02-09 16:37:44, Theodore Tso wrote:
> On Thu, Feb 05, 2009 at 03:52:59PM +0100, Jan Kara wrote:
> > 
> 
> >   the three patches below fix a bug in
> > jbd2_journal_begin_ordered_truncate() that could possibly lead to
> > NULL pointer dereference. The question is how to merge the fix as it
> > changes the prototype of the function and both ext4 and ocfs2 use
> > it.  Mark, Mingming, any suggestions?
> 
> Jan, how how easy is it to hit this?  Do you believe we should try to
> get this in for 2.6.29?  I got some final things to push to Linus so
> I can also push it via the ext4 tree.  I doubt it will conflict with
> > anything in my tree, either.
  The possibility of hitting this is quite small. We check
inode->i_transaction for NULL just before dereferencing it so the commit
would have to set it to NULL just between those two instructions.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Ocfs2-devel] [PATCH 3/3] ocfs2: Add journal parameter to jbd2_journal_begin_ordered_truncate()
  2009-02-05 14:53       ` [Ocfs2-devel] " Jan Kara
@ 2009-02-10  3:19         ` Joel Becker
  -1 siblings, 0 replies; 17+ messages in thread
From: Joel Becker @ 2009-02-10  3:19 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, mfasheh, mfasheh, Andrew Morton, ocfs2-devel

On Thu, Feb 05, 2009 at 03:53:02PM +0100, Jan Kara wrote:
> Add new parameter of the function.
> 
> CC: ocfs2-devel@oss.oracle.com
> CC: mfasheh@suse.de
> Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Joel Becker <joel.becker@oracle.com>

> ---
>  fs/ocfs2/journal.h |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
> index 3c3532e..172850a 100644
> --- a/fs/ocfs2/journal.h
> +++ b/fs/ocfs2/journal.h
> @@ -513,8 +513,10 @@ static inline int ocfs2_jbd2_file_inode(handle_t *handle, struct inode *inode)
>  static inline int ocfs2_begin_ordered_truncate(struct inode *inode,
>  					       loff_t new_size)
>  {
> -	return jbd2_journal_begin_ordered_truncate(&OCFS2_I(inode)->ip_jinode,
> -						   new_size);
> +	return jbd2_journal_begin_ordered_truncate(
> +				OCFS2_SB(inode->i_sb)->journal->j_journal,
> +				&OCFS2_I(inode)->ip_jinode,
> +				new_size);
>  }
>  
>  #endif /* OCFS2_JOURNAL_H */
> -- 
> 1.6.0.2
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel

-- 

Life's Little Instruction Book #464

	"Don't miss the magic of the moment by focusing on what's
	 to come."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Ocfs2-devel] [PATCH 3/3] ocfs2: Add journal parameter to jbd2_journal_begin_ordered_truncate()
@ 2009-02-10  3:19         ` Joel Becker
  0 siblings, 0 replies; 17+ messages in thread
From: Joel Becker @ 2009-02-10  3:19 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, mfasheh, mfasheh, Andrew Morton, ocfs2-devel

On Thu, Feb 05, 2009 at 03:53:02PM +0100, Jan Kara wrote:
> Add new parameter of the function.
> 
> CC: ocfs2-devel at oss.oracle.com
> CC: mfasheh at suse.de
> Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Joel Becker <joel.becker@oracle.com>

> ---
>  fs/ocfs2/journal.h |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
> index 3c3532e..172850a 100644
> --- a/fs/ocfs2/journal.h
> +++ b/fs/ocfs2/journal.h
> @@ -513,8 +513,10 @@ static inline int ocfs2_jbd2_file_inode(handle_t *handle, struct inode *inode)
>  static inline int ocfs2_begin_ordered_truncate(struct inode *inode,
>  					       loff_t new_size)
>  {
> -	return jbd2_journal_begin_ordered_truncate(&OCFS2_I(inode)->ip_jinode,
> -						   new_size);
> +	return jbd2_journal_begin_ordered_truncate(
> +				OCFS2_SB(inode->i_sb)->journal->j_journal,
> +				&OCFS2_I(inode)->ip_jinode,
> +				new_size);
>  }
>  
>  #endif /* OCFS2_JOURNAL_H */
> -- 
> 1.6.0.2
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel

-- 

Life's Little Instruction Book #464

	"Don't miss the magic of the moment by focusing on what's
	 to come."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] ocfs2: Add journal parameter to jbd2_journal_begin_ordered_truncate()
  2009-02-05 14:53       ` [Ocfs2-devel] " Jan Kara
@ 2009-02-10 15:12         ` Theodore Tso
  -1 siblings, 0 replies; 17+ messages in thread
From: Theodore Tso @ 2009-02-10 15:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, Andrew Morton, mfasheh, ocfs2-devel, mfasheh

The problem with separating out the patch into three separate commits
is that it can potentially break a "git bisect".  To fix this, I
believe the right thing to do is to combine the three patches into a
single commit, and to push this via the ext4 tree; any objections?

						- Ted

jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate()

From: Jan Kara <jack@suse.cz>

If we race with commit code setting i_transaction to NULL, we could
possibly dereference it.  Proper locking requires the journal pointer
(to access journal->j_list_lock), which we don't have.  So we have to
change the prototype of the function so that filesystem passes us the
journal pointer.  Also add a more detailed comment about why the
function jbd2_journal_begin_ordered_truncate() does what it does and
how it should be used.

Thanks to Dan Carpenter <error27@gmail.com> for pointing to the
suspitious code.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Joel Becker <joel.becker@oracle.com>
CC: linux-ext4@vger.kernel.org
CC: ocfs2-devel@oss.oracle.com
CC: mfasheh@suse.de
CC: Dan Carpenter <error27@gmail.com>
---
 fs/ext4/inode.c       |    6 ++++--
 fs/jbd2/transaction.c |   42 +++++++++++++++++++++++++++++++-----------
 fs/ocfs2/journal.h    |    6 ++++--
 include/linux/jbd2.h  |    3 ++-
 4 files changed, 41 insertions(+), 16 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 03ba20b..658c4a7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -47,8 +47,10 @@
 static inline int ext4_begin_ordered_truncate(struct inode *inode,
 					      loff_t new_size)
 {
-	return jbd2_journal_begin_ordered_truncate(&EXT4_I(inode)->jinode,
-						   new_size);
+	return jbd2_journal_begin_ordered_truncate(
+					EXT4_SB(inode->i_sb)->s_journal,
+					&EXT4_I(inode)->jinode,
+					new_size);
 }
 
 static void ext4_invalidatepage(struct page *page, unsigned long offset);
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 46b4e34..28ce21d 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2129,26 +2129,46 @@ done:
 }
 
 /*
- * This function must be called when inode is journaled in ordered mode
- * before truncation happens. It starts writeout of truncated part in
- * case it is in the committing transaction so that we stand to ordered
- * mode consistency guarantees.
+ * File truncate and transaction commit interact with each other in a
+ * non-trivial way.  If a transaction writing data block A is
+ * committing, we cannot discard the data by truncate until we have
+ * written them.  Otherwise if we crashed after the transaction with
+ * write has committed but before the transaction with truncate has
+ * committed, we could see stale data in block A.  This function is a
+ * helper to solve this problem.  It starts writeout of the truncated
+ * part in case it is in the committing transaction.
+ *
+ * Filesystem code must call this function when inode is journaled in
+ * ordered mode before truncation happens and after the inode has been
+ * placed on orphan list with the new inode size. The second condition
+ * avoids the race that someone writes new data and we start
+ * committing the transaction after this function has been called but
+ * before a transaction for truncate is started (and furthermore it
+ * allows us to optimize the case where the addition to orphan list
+ * happens in the same transaction as write --- we don't have to write
+ * any data in such case).
  */
-int jbd2_journal_begin_ordered_truncate(struct jbd2_inode *inode,
+int jbd2_journal_begin_ordered_truncate(journal_t *journal,
+					struct jbd2_inode *jinode,
 					loff_t new_size)
 {
-	journal_t *journal;
-	transaction_t *commit_trans;
+	transaction_t *inode_trans, *commit_trans;
 	int ret = 0;
 
-	if (!inode->i_transaction && !inode->i_next_transaction)
+	/* This is a quick check to avoid locking if not necessary */
+	if (!jinode->i_transaction)
 		goto out;
-	journal = inode->i_transaction->t_journal;
+	/* Locks are here just to force reading of recent values, it is
+	 * enough that the transaction was not committing before we started
+	 * a transaction adding the inode to orphan list */
 	spin_lock(&journal->j_state_lock);
 	commit_trans = journal->j_committing_transaction;
 	spin_unlock(&journal->j_state_lock);
-	if (inode->i_transaction == commit_trans) {
-		ret = filemap_fdatawrite_range(inode->i_vfs_inode->i_mapping,
+	spin_lock(&journal->j_list_lock);
+	inode_trans = jinode->i_transaction;
+	spin_unlock(&journal->j_list_lock);
+	if (inode_trans == commit_trans) {
+		ret = filemap_fdatawrite_range(jinode->i_vfs_inode->i_mapping,
 			new_size, LLONG_MAX);
 		if (ret)
 			jbd2_journal_abort(journal, ret);
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index 3c3532e..172850a 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -513,8 +513,10 @@ static inline int ocfs2_jbd2_file_inode(handle_t *handle, struct inode *inode)
 static inline int ocfs2_begin_ordered_truncate(struct inode *inode,
 					       loff_t new_size)
 {
-	return jbd2_journal_begin_ordered_truncate(&OCFS2_I(inode)->ip_jinode,
-						   new_size);
+	return jbd2_journal_begin_ordered_truncate(
+				OCFS2_SB(inode->i_sb)->journal->j_journal,
+				&OCFS2_I(inode)->ip_jinode,
+				new_size);
 }
 
 #endif /* OCFS2_JOURNAL_H */
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index b28b37e..4d248b3 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1150,7 +1150,8 @@ extern int	   jbd2_journal_clear_err  (journal_t *);
 extern int	   jbd2_journal_bmap(journal_t *, unsigned long, unsigned long long *);
 extern int	   jbd2_journal_force_commit(journal_t *);
 extern int	   jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *inode);
-extern int	   jbd2_journal_begin_ordered_truncate(struct jbd2_inode *inode, loff_t new_size);
+extern int	   jbd2_journal_begin_ordered_truncate(journal_t *journal,
+				struct jbd2_inode *inode, loff_t new_size);
 extern void	   jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode);
 extern void	   jbd2_journal_release_jbd_inode(journal_t *journal, struct jbd2_inode *jinode);
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Ocfs2-devel] [PATCH 3/3] ocfs2: Add journal parameter to jbd2_journal_begin_ordered_truncate()
@ 2009-02-10 15:12         ` Theodore Tso
  0 siblings, 0 replies; 17+ messages in thread
From: Theodore Tso @ 2009-02-10 15:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, Andrew Morton, mfasheh, ocfs2-devel, mfasheh

The problem with separating out the patch into three separate commits
is that it can potentially break a "git bisect".  To fix this, I
believe the right thing to do is to combine the three patches into a
single commit, and to push this via the ext4 tree; any objections?

						- Ted

jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate()

From: Jan Kara <jack@suse.cz>

If we race with commit code setting i_transaction to NULL, we could
possibly dereference it.  Proper locking requires the journal pointer
(to access journal->j_list_lock), which we don't have.  So we have to
change the prototype of the function so that filesystem passes us the
journal pointer.  Also add a more detailed comment about why the
function jbd2_journal_begin_ordered_truncate() does what it does and
how it should be used.

Thanks to Dan Carpenter <error27@gmail.com> for pointing to the
suspitious code.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Joel Becker <joel.becker@oracle.com>
CC: linux-ext4 at vger.kernel.org
CC: ocfs2-devel at oss.oracle.com
CC: mfasheh at suse.de
CC: Dan Carpenter <error27@gmail.com>
---
 fs/ext4/inode.c       |    6 ++++--
 fs/jbd2/transaction.c |   42 +++++++++++++++++++++++++++++++-----------
 fs/ocfs2/journal.h    |    6 ++++--
 include/linux/jbd2.h  |    3 ++-
 4 files changed, 41 insertions(+), 16 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 03ba20b..658c4a7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -47,8 +47,10 @@
 static inline int ext4_begin_ordered_truncate(struct inode *inode,
 					      loff_t new_size)
 {
-	return jbd2_journal_begin_ordered_truncate(&EXT4_I(inode)->jinode,
-						   new_size);
+	return jbd2_journal_begin_ordered_truncate(
+					EXT4_SB(inode->i_sb)->s_journal,
+					&EXT4_I(inode)->jinode,
+					new_size);
 }
 
 static void ext4_invalidatepage(struct page *page, unsigned long offset);
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 46b4e34..28ce21d 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2129,26 +2129,46 @@ done:
 }
 
 /*
- * This function must be called when inode is journaled in ordered mode
- * before truncation happens. It starts writeout of truncated part in
- * case it is in the committing transaction so that we stand to ordered
- * mode consistency guarantees.
+ * File truncate and transaction commit interact with each other in a
+ * non-trivial way.  If a transaction writing data block A is
+ * committing, we cannot discard the data by truncate until we have
+ * written them.  Otherwise if we crashed after the transaction with
+ * write has committed but before the transaction with truncate has
+ * committed, we could see stale data in block A.  This function is a
+ * helper to solve this problem.  It starts writeout of the truncated
+ * part in case it is in the committing transaction.
+ *
+ * Filesystem code must call this function when inode is journaled in
+ * ordered mode before truncation happens and after the inode has been
+ * placed on orphan list with the new inode size. The second condition
+ * avoids the race that someone writes new data and we start
+ * committing the transaction after this function has been called but
+ * before a transaction for truncate is started (and furthermore it
+ * allows us to optimize the case where the addition to orphan list
+ * happens in the same transaction as write --- we don't have to write
+ * any data in such case).
  */
-int jbd2_journal_begin_ordered_truncate(struct jbd2_inode *inode,
+int jbd2_journal_begin_ordered_truncate(journal_t *journal,
+					struct jbd2_inode *jinode,
 					loff_t new_size)
 {
-	journal_t *journal;
-	transaction_t *commit_trans;
+	transaction_t *inode_trans, *commit_trans;
 	int ret = 0;
 
-	if (!inode->i_transaction && !inode->i_next_transaction)
+	/* This is a quick check to avoid locking if not necessary */
+	if (!jinode->i_transaction)
 		goto out;
-	journal = inode->i_transaction->t_journal;
+	/* Locks are here just to force reading of recent values, it is
+	 * enough that the transaction was not committing before we started
+	 * a transaction adding the inode to orphan list */
 	spin_lock(&journal->j_state_lock);
 	commit_trans = journal->j_committing_transaction;
 	spin_unlock(&journal->j_state_lock);
-	if (inode->i_transaction == commit_trans) {
-		ret = filemap_fdatawrite_range(inode->i_vfs_inode->i_mapping,
+	spin_lock(&journal->j_list_lock);
+	inode_trans = jinode->i_transaction;
+	spin_unlock(&journal->j_list_lock);
+	if (inode_trans == commit_trans) {
+		ret = filemap_fdatawrite_range(jinode->i_vfs_inode->i_mapping,
 			new_size, LLONG_MAX);
 		if (ret)
 			jbd2_journal_abort(journal, ret);
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index 3c3532e..172850a 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -513,8 +513,10 @@ static inline int ocfs2_jbd2_file_inode(handle_t *handle, struct inode *inode)
 static inline int ocfs2_begin_ordered_truncate(struct inode *inode,
 					       loff_t new_size)
 {
-	return jbd2_journal_begin_ordered_truncate(&OCFS2_I(inode)->ip_jinode,
-						   new_size);
+	return jbd2_journal_begin_ordered_truncate(
+				OCFS2_SB(inode->i_sb)->journal->j_journal,
+				&OCFS2_I(inode)->ip_jinode,
+				new_size);
 }
 
 #endif /* OCFS2_JOURNAL_H */
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index b28b37e..4d248b3 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1150,7 +1150,8 @@ extern int	   jbd2_journal_clear_err  (journal_t *);
 extern int	   jbd2_journal_bmap(journal_t *, unsigned long, unsigned long long *);
 extern int	   jbd2_journal_force_commit(journal_t *);
 extern int	   jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *inode);
-extern int	   jbd2_journal_begin_ordered_truncate(struct jbd2_inode *inode, loff_t new_size);
+extern int	   jbd2_journal_begin_ordered_truncate(journal_t *journal,
+				struct jbd2_inode *inode, loff_t new_size);
 extern void	   jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode);
 extern void	   jbd2_journal_release_jbd_inode(journal_t *journal, struct jbd2_inode *jinode);
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] ocfs2: Add journal parameter to jbd2_journal_begin_ordered_truncate()
  2009-02-10 15:12         ` [Ocfs2-devel] " Theodore Tso
@ 2009-02-11  9:25           ` Jan Kara
  -1 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2009-02-11  9:25 UTC (permalink / raw)
  To: Theodore Tso; +Cc: linux-ext4, Andrew Morton, mfasheh, ocfs2-devel, mfasheh

On Tue 10-02-09 10:12:55, Theodore Tso wrote:
> The problem with separating out the patch into three separate commits
> is that it can potentially break a "git bisect".  To fix this, I
> believe the right thing to do is to combine the three patches into a
> single commit, and to push this via the ext4 tree; any objections?
  On a second thought, this is probably the best for such simple change.
So please go ahead. Thanks.

								Honza

> jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate()
> 
> From: Jan Kara <jack@suse.cz>
> 
> If we race with commit code setting i_transaction to NULL, we could
> possibly dereference it.  Proper locking requires the journal pointer
> (to access journal->j_list_lock), which we don't have.  So we have to
> change the prototype of the function so that filesystem passes us the
> journal pointer.  Also add a more detailed comment about why the
> function jbd2_journal_begin_ordered_truncate() does what it does and
> how it should be used.
> 
> Thanks to Dan Carpenter <error27@gmail.com> for pointing to the
> suspitious code.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Acked-by: Joel Becker <joel.becker@oracle.com>
> CC: linux-ext4@vger.kernel.org
> CC: ocfs2-devel@oss.oracle.com
> CC: mfasheh@suse.de
> CC: Dan Carpenter <error27@gmail.com>
> ---
>  fs/ext4/inode.c       |    6 ++++--
>  fs/jbd2/transaction.c |   42 +++++++++++++++++++++++++++++++-----------
>  fs/ocfs2/journal.h    |    6 ++++--
>  include/linux/jbd2.h  |    3 ++-
>  4 files changed, 41 insertions(+), 16 deletions(-)
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 03ba20b..658c4a7 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -47,8 +47,10 @@
>  static inline int ext4_begin_ordered_truncate(struct inode *inode,
>  					      loff_t new_size)
>  {
> -	return jbd2_journal_begin_ordered_truncate(&EXT4_I(inode)->jinode,
> -						   new_size);
> +	return jbd2_journal_begin_ordered_truncate(
> +					EXT4_SB(inode->i_sb)->s_journal,
> +					&EXT4_I(inode)->jinode,
> +					new_size);
>  }
>  
>  static void ext4_invalidatepage(struct page *page, unsigned long offset);
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 46b4e34..28ce21d 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -2129,26 +2129,46 @@ done:
>  }
>  
>  /*
> - * This function must be called when inode is journaled in ordered mode
> - * before truncation happens. It starts writeout of truncated part in
> - * case it is in the committing transaction so that we stand to ordered
> - * mode consistency guarantees.
> + * File truncate and transaction commit interact with each other in a
> + * non-trivial way.  If a transaction writing data block A is
> + * committing, we cannot discard the data by truncate until we have
> + * written them.  Otherwise if we crashed after the transaction with
> + * write has committed but before the transaction with truncate has
> + * committed, we could see stale data in block A.  This function is a
> + * helper to solve this problem.  It starts writeout of the truncated
> + * part in case it is in the committing transaction.
> + *
> + * Filesystem code must call this function when inode is journaled in
> + * ordered mode before truncation happens and after the inode has been
> + * placed on orphan list with the new inode size. The second condition
> + * avoids the race that someone writes new data and we start
> + * committing the transaction after this function has been called but
> + * before a transaction for truncate is started (and furthermore it
> + * allows us to optimize the case where the addition to orphan list
> + * happens in the same transaction as write --- we don't have to write
> + * any data in such case).
>   */
> -int jbd2_journal_begin_ordered_truncate(struct jbd2_inode *inode,
> +int jbd2_journal_begin_ordered_truncate(journal_t *journal,
> +					struct jbd2_inode *jinode,
>  					loff_t new_size)
>  {
> -	journal_t *journal;
> -	transaction_t *commit_trans;
> +	transaction_t *inode_trans, *commit_trans;
>  	int ret = 0;
>  
> -	if (!inode->i_transaction && !inode->i_next_transaction)
> +	/* This is a quick check to avoid locking if not necessary */
> +	if (!jinode->i_transaction)
>  		goto out;
> -	journal = inode->i_transaction->t_journal;
> +	/* Locks are here just to force reading of recent values, it is
> +	 * enough that the transaction was not committing before we started
> +	 * a transaction adding the inode to orphan list */
>  	spin_lock(&journal->j_state_lock);
>  	commit_trans = journal->j_committing_transaction;
>  	spin_unlock(&journal->j_state_lock);
> -	if (inode->i_transaction == commit_trans) {
> -		ret = filemap_fdatawrite_range(inode->i_vfs_inode->i_mapping,
> +	spin_lock(&journal->j_list_lock);
> +	inode_trans = jinode->i_transaction;
> +	spin_unlock(&journal->j_list_lock);
> +	if (inode_trans == commit_trans) {
> +		ret = filemap_fdatawrite_range(jinode->i_vfs_inode->i_mapping,
>  			new_size, LLONG_MAX);
>  		if (ret)
>  			jbd2_journal_abort(journal, ret);
> diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
> index 3c3532e..172850a 100644
> --- a/fs/ocfs2/journal.h
> +++ b/fs/ocfs2/journal.h
> @@ -513,8 +513,10 @@ static inline int ocfs2_jbd2_file_inode(handle_t *handle, struct inode *inode)
>  static inline int ocfs2_begin_ordered_truncate(struct inode *inode,
>  					       loff_t new_size)
>  {
> -	return jbd2_journal_begin_ordered_truncate(&OCFS2_I(inode)->ip_jinode,
> -						   new_size);
> +	return jbd2_journal_begin_ordered_truncate(
> +				OCFS2_SB(inode->i_sb)->journal->j_journal,
> +				&OCFS2_I(inode)->ip_jinode,
> +				new_size);
>  }
>  
>  #endif /* OCFS2_JOURNAL_H */
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index b28b37e..4d248b3 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1150,7 +1150,8 @@ extern int	   jbd2_journal_clear_err  (journal_t *);
>  extern int	   jbd2_journal_bmap(journal_t *, unsigned long, unsigned long long *);
>  extern int	   jbd2_journal_force_commit(journal_t *);
>  extern int	   jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *inode);
> -extern int	   jbd2_journal_begin_ordered_truncate(struct jbd2_inode *inode, loff_t new_size);
> +extern int	   jbd2_journal_begin_ordered_truncate(journal_t *journal,
> +				struct jbd2_inode *inode, loff_t new_size);
>  extern void	   jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode);
>  extern void	   jbd2_journal_release_jbd_inode(journal_t *journal, struct jbd2_inode *jinode);
>  
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Ocfs2-devel] [PATCH 3/3] ocfs2: Add journal parameter to jbd2_journal_begin_ordered_truncate()
@ 2009-02-11  9:25           ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2009-02-11  9:25 UTC (permalink / raw)
  To: Theodore Tso; +Cc: linux-ext4, Andrew Morton, mfasheh, ocfs2-devel, mfasheh

On Tue 10-02-09 10:12:55, Theodore Tso wrote:
> The problem with separating out the patch into three separate commits
> is that it can potentially break a "git bisect".  To fix this, I
> believe the right thing to do is to combine the three patches into a
> single commit, and to push this via the ext4 tree; any objections?
  On a second thought, this is probably the best for such simple change.
So please go ahead. Thanks.

								Honza

> jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate()
> 
> From: Jan Kara <jack@suse.cz>
> 
> If we race with commit code setting i_transaction to NULL, we could
> possibly dereference it.  Proper locking requires the journal pointer
> (to access journal->j_list_lock), which we don't have.  So we have to
> change the prototype of the function so that filesystem passes us the
> journal pointer.  Also add a more detailed comment about why the
> function jbd2_journal_begin_ordered_truncate() does what it does and
> how it should be used.
> 
> Thanks to Dan Carpenter <error27@gmail.com> for pointing to the
> suspitious code.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Acked-by: Joel Becker <joel.becker@oracle.com>
> CC: linux-ext4 at vger.kernel.org
> CC: ocfs2-devel at oss.oracle.com
> CC: mfasheh at suse.de
> CC: Dan Carpenter <error27@gmail.com>
> ---
>  fs/ext4/inode.c       |    6 ++++--
>  fs/jbd2/transaction.c |   42 +++++++++++++++++++++++++++++++-----------
>  fs/ocfs2/journal.h    |    6 ++++--
>  include/linux/jbd2.h  |    3 ++-
>  4 files changed, 41 insertions(+), 16 deletions(-)
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 03ba20b..658c4a7 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -47,8 +47,10 @@
>  static inline int ext4_begin_ordered_truncate(struct inode *inode,
>  					      loff_t new_size)
>  {
> -	return jbd2_journal_begin_ordered_truncate(&EXT4_I(inode)->jinode,
> -						   new_size);
> +	return jbd2_journal_begin_ordered_truncate(
> +					EXT4_SB(inode->i_sb)->s_journal,
> +					&EXT4_I(inode)->jinode,
> +					new_size);
>  }
>  
>  static void ext4_invalidatepage(struct page *page, unsigned long offset);
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 46b4e34..28ce21d 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -2129,26 +2129,46 @@ done:
>  }
>  
>  /*
> - * This function must be called when inode is journaled in ordered mode
> - * before truncation happens. It starts writeout of truncated part in
> - * case it is in the committing transaction so that we stand to ordered
> - * mode consistency guarantees.
> + * File truncate and transaction commit interact with each other in a
> + * non-trivial way.  If a transaction writing data block A is
> + * committing, we cannot discard the data by truncate until we have
> + * written them.  Otherwise if we crashed after the transaction with
> + * write has committed but before the transaction with truncate has
> + * committed, we could see stale data in block A.  This function is a
> + * helper to solve this problem.  It starts writeout of the truncated
> + * part in case it is in the committing transaction.
> + *
> + * Filesystem code must call this function when inode is journaled in
> + * ordered mode before truncation happens and after the inode has been
> + * placed on orphan list with the new inode size. The second condition
> + * avoids the race that someone writes new data and we start
> + * committing the transaction after this function has been called but
> + * before a transaction for truncate is started (and furthermore it
> + * allows us to optimize the case where the addition to orphan list
> + * happens in the same transaction as write --- we don't have to write
> + * any data in such case).
>   */
> -int jbd2_journal_begin_ordered_truncate(struct jbd2_inode *inode,
> +int jbd2_journal_begin_ordered_truncate(journal_t *journal,
> +					struct jbd2_inode *jinode,
>  					loff_t new_size)
>  {
> -	journal_t *journal;
> -	transaction_t *commit_trans;
> +	transaction_t *inode_trans, *commit_trans;
>  	int ret = 0;
>  
> -	if (!inode->i_transaction && !inode->i_next_transaction)
> +	/* This is a quick check to avoid locking if not necessary */
> +	if (!jinode->i_transaction)
>  		goto out;
> -	journal = inode->i_transaction->t_journal;
> +	/* Locks are here just to force reading of recent values, it is
> +	 * enough that the transaction was not committing before we started
> +	 * a transaction adding the inode to orphan list */
>  	spin_lock(&journal->j_state_lock);
>  	commit_trans = journal->j_committing_transaction;
>  	spin_unlock(&journal->j_state_lock);
> -	if (inode->i_transaction == commit_trans) {
> -		ret = filemap_fdatawrite_range(inode->i_vfs_inode->i_mapping,
> +	spin_lock(&journal->j_list_lock);
> +	inode_trans = jinode->i_transaction;
> +	spin_unlock(&journal->j_list_lock);
> +	if (inode_trans == commit_trans) {
> +		ret = filemap_fdatawrite_range(jinode->i_vfs_inode->i_mapping,
>  			new_size, LLONG_MAX);
>  		if (ret)
>  			jbd2_journal_abort(journal, ret);
> diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
> index 3c3532e..172850a 100644
> --- a/fs/ocfs2/journal.h
> +++ b/fs/ocfs2/journal.h
> @@ -513,8 +513,10 @@ static inline int ocfs2_jbd2_file_inode(handle_t *handle, struct inode *inode)
>  static inline int ocfs2_begin_ordered_truncate(struct inode *inode,
>  					       loff_t new_size)
>  {
> -	return jbd2_journal_begin_ordered_truncate(&OCFS2_I(inode)->ip_jinode,
> -						   new_size);
> +	return jbd2_journal_begin_ordered_truncate(
> +				OCFS2_SB(inode->i_sb)->journal->j_journal,
> +				&OCFS2_I(inode)->ip_jinode,
> +				new_size);
>  }
>  
>  #endif /* OCFS2_JOURNAL_H */
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index b28b37e..4d248b3 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1150,7 +1150,8 @@ extern int	   jbd2_journal_clear_err  (journal_t *);
>  extern int	   jbd2_journal_bmap(journal_t *, unsigned long, unsigned long long *);
>  extern int	   jbd2_journal_force_commit(journal_t *);
>  extern int	   jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *inode);
> -extern int	   jbd2_journal_begin_ordered_truncate(struct jbd2_inode *inode, loff_t new_size);
> +extern int	   jbd2_journal_begin_ordered_truncate(journal_t *journal,
> +				struct jbd2_inode *inode, loff_t new_size);
>  extern void	   jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode);
>  extern void	   jbd2_journal_release_jbd_inode(journal_t *journal, struct jbd2_inode *jinode);
>  
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] ocfs2: Add journal parameter to jbd2_journal_begin_ordered_truncate()
  2009-02-10 15:12         ` [Ocfs2-devel] " Theodore Tso
@ 2009-02-11 21:01           ` Mark Fasheh
  -1 siblings, 0 replies; 17+ messages in thread
From: Mark Fasheh @ 2009-02-11 21:01 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jan Kara, linux-ext4, Andrew Morton, ocfs2-devel

On Tue, Feb 10, 2009 at 10:12:55AM -0500, Theodore Tso wrote:
> The problem with separating out the patch into three separate commits
> is that it can potentially break a "git bisect".  To fix this, I
> believe the right thing to do is to combine the three patches into a
> single commit, and to push this via the ext4 tree; any objections?

That's fine with me. Btw, you forgot my:

Acked-by: Mark Fasheh <mfasheh@suse.com>
	--Mark

--
Mark Fasheh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Ocfs2-devel] [PATCH 3/3] ocfs2: Add journal parameter to jbd2_journal_begin_ordered_truncate()
@ 2009-02-11 21:01           ` Mark Fasheh
  0 siblings, 0 replies; 17+ messages in thread
From: Mark Fasheh @ 2009-02-11 21:01 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jan Kara, linux-ext4, Andrew Morton, ocfs2-devel

On Tue, Feb 10, 2009 at 10:12:55AM -0500, Theodore Tso wrote:
> The problem with separating out the patch into three separate commits
> is that it can potentially break a "git bisect".  To fix this, I
> believe the right thing to do is to combine the three patches into a
> single commit, and to push this via the ext4 tree; any objections?

That's fine with me. Btw, you forgot my:

Acked-by: Mark Fasheh <mfasheh@suse.com>
	--Mark

--
Mark Fasheh

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2009-02-11 21:01 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-05 14:52 [PATCH 0/3] Fix possible NULL pointer dereference in JBD2 Jan Kara
2009-02-05 14:53 ` [PATCH 1/3] jbd2: Fix possible NULL pointer dereference in jbd2_journal_begin_ordered_truncate() Jan Kara
2009-02-05 14:53   ` [Ocfs2-devel] " Jan Kara
2009-02-05 14:53   ` [PATCH 2/3] ext4: Add journal parameter to jbd2_journal_begin_ordered_truncate() Jan Kara
2009-02-05 14:53     ` [PATCH 3/3] ocfs2: " Jan Kara
2009-02-05 14:53       ` [Ocfs2-devel] " Jan Kara
2009-02-10  3:19       ` Joel Becker
2009-02-10  3:19         ` Joel Becker
2009-02-10 15:12       ` Theodore Tso
2009-02-10 15:12         ` [Ocfs2-devel] " Theodore Tso
2009-02-11  9:25         ` Jan Kara
2009-02-11  9:25           ` [Ocfs2-devel] " Jan Kara
2009-02-11 21:01         ` Mark Fasheh
2009-02-11 21:01           ` [Ocfs2-devel] " Mark Fasheh
2009-02-05 19:32 ` [PATCH 0/3] Fix possible NULL pointer dereference in JBD2 Mark Fasheh
2009-02-05 21:37 ` Theodore Tso
2009-02-09 17:34   ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.