All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] ext3/4 data integrity fixes
@ 2013-05-28  9:18 Dmitry Monakhov
  2013-05-28  9:18 ` [PATCH 1/6] jbd2: optimize jbd2_journal_force_commit Dmitry Monakhov
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Dmitry Monakhov @ 2013-05-28  9:18 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, Dmitry Monakhov

Here is updated patch set for various data integrity issues.

TOC:

jbd2: optimize jbd2_journal_force_commit
ext4: fix data integrity for ext4_sync_fs
jbd: optimize journal_force_commit
ext3: fix data integrity for ext4_sync_fs
ext4: Fix fsync error handling after filesystem abort.
ext3: Fix fsync error handling after filesystem abort.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/6] jbd2: optimize jbd2_journal_force_commit
  2013-05-28  9:18 [PATCH 0/6] ext3/4 data integrity fixes Dmitry Monakhov
@ 2013-05-28  9:18 ` Dmitry Monakhov
  2013-05-28 21:22   ` Jan Kara
  2013-05-28  9:18 ` [PATCH 2/6] ext4: fix data integrity for ext4_sync_fs Dmitry Monakhov
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Dmitry Monakhov @ 2013-05-28  9:18 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, Dmitry Monakhov

Current implementation of jbd2_journal_force_commit() is suboptimal because
result in empty and useless commits. But callers just want to force and wait
any unfinished commits. We already has jbd2_journal_force_commit_nested()
which does exactly what we want, except we are guaranteed that we do not hold
journal transaction open.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/jbd2/journal.c     |   55 ++++++++++++++++++++++++++++++++++++------------
 fs/jbd2/transaction.c |   23 --------------------
 include/linux/jbd2.h  |    2 +-
 3 files changed, 42 insertions(+), 38 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 886ec2f..078e8ea 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -566,21 +566,19 @@ int jbd2_log_start_commit(journal_t *journal, tid_t tid)
 }
 
 /*
- * Force and wait upon a commit if the calling process is not within
- * transaction.  This is used for forcing out undo-protected data which contains
- * bitmaps, when the fs is running out of space.
- *
- * We can only force the running transaction if we don't have an active handle;
- * otherwise, we will deadlock.
- *
- * Returns true if a transaction was started.
+ * Force and wait any uncommitted transactions.  We can only force the running
+ * transaction if we don't have an active handle, otherwise, we will deadlock.
  */
-int jbd2_journal_force_commit_nested(journal_t *journal)
+int __jbd2_journal_force_commit(journal_t *journal, int nested, int *progress)
 {
 	transaction_t *transaction = NULL;
 	tid_t tid;
-	int need_to_start = 0;
+	int need_to_start = 0, ret = 0;
 
+	J_ASSERT(!current->journal_info || nested);
+
+	if (progress)
+		*progress = 0;
 	read_lock(&journal->j_state_lock);
 	if (journal->j_running_transaction && !current->journal_info) {
 		transaction = journal->j_running_transaction;
@@ -590,16 +588,45 @@ int jbd2_journal_force_commit_nested(journal_t *journal)
 		transaction = journal->j_committing_transaction;
 
 	if (!transaction) {
+		/* Nothing to commit */
 		read_unlock(&journal->j_state_lock);
-		return 0;	/* Nothing to retry */
+		return 0;
 	}
-
 	tid = transaction->t_tid;
 	read_unlock(&journal->j_state_lock);
 	if (need_to_start)
 		jbd2_log_start_commit(journal, tid);
-	jbd2_log_wait_commit(journal, tid);
-	return 1;
+	ret = jbd2_log_wait_commit(journal, tid);
+	if (!ret && progress)
+		*progress = 1;
+
+	return ret;
+}
+
+/**
+ * Force and wait upon a commit if the calling process is not within
+ * transaction.  This is used for forcing out undo-protected data which contains
+ * bitmaps, when the fs is running out of space.
+ *
+ * @journal: journal to force
+ * Returns true if progress was made.
+ */
+int jbd2_journal_force_commit_nested(journal_t *journal)
+{
+	int progress;
+
+	__jbd2_journal_force_commit(journal, 1, &progress);
+	return progress;
+}
+
+/**
+ * int journal_force_commit() - force any uncommitted transactions
+ * @journal: journal to force
+ *
+ */
+int jbd2_journal_force_commit(journal_t *journal)
+{
+	return __jbd2_journal_force_commit(journal, 0, NULL);
 }
 
 /*
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 10f524c..dae6b3d 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1522,29 +1522,6 @@ int jbd2_journal_stop(handle_t *handle)
 	return err;
 }
 
-/**
- * int jbd2_journal_force_commit() - force any uncommitted transactions
- * @journal: journal to force
- *
- * For synchronous operations: force any uncommitted transactions
- * to disk.  May seem kludgy, but it reuses all the handle batching
- * code in a very simple manner.
- */
-int jbd2_journal_force_commit(journal_t *journal)
-{
-	handle_t *handle;
-	int ret;
-
-	handle = jbd2_journal_start(journal, 1);
-	if (IS_ERR(handle)) {
-		ret = PTR_ERR(handle);
-	} else {
-		handle->h_sync = 1;
-		ret = jbd2_journal_stop(handle);
-	}
-	return ret;
-}
-
 /*
  *
  * List management code snippets: various functions for manipulating the
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 6e051f4..c9e1ab6 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1125,6 +1125,7 @@ extern void	   jbd2_journal_ack_err    (journal_t *);
 extern int	   jbd2_journal_clear_err  (journal_t *);
 extern int	   jbd2_journal_bmap(journal_t *, unsigned long, unsigned long long *);
 extern int	   jbd2_journal_force_commit(journal_t *);
+extern int	   jbd2_journal_force_commit_nested(journal_t *);
 extern int	   jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *inode);
 extern int	   jbd2_journal_begin_ordered_truncate(journal_t *journal,
 				struct jbd2_inode *inode, loff_t new_size);
@@ -1199,7 +1200,6 @@ int __jbd2_log_space_left(journal_t *); /* Called with journal locked */
 int jbd2_log_start_commit(journal_t *journal, tid_t tid);
 int __jbd2_log_start_commit(journal_t *journal, tid_t tid);
 int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
-int jbd2_journal_force_commit_nested(journal_t *journal);
 int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
 int jbd2_complete_transaction(journal_t *journal, tid_t tid);
 int jbd2_log_do_checkpoint(journal_t *journal);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/6] ext4: fix data integrity for ext4_sync_fs
  2013-05-28  9:18 [PATCH 0/6] ext3/4 data integrity fixes Dmitry Monakhov
  2013-05-28  9:18 ` [PATCH 1/6] jbd2: optimize jbd2_journal_force_commit Dmitry Monakhov
@ 2013-05-28  9:18 ` Dmitry Monakhov
  2013-05-28 21:51   ` Jan Kara
  2013-05-28  9:18 ` [PATCH 3/6] jbd: optimize journal_force_commit Dmitry Monakhov
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Dmitry Monakhov @ 2013-05-28  9:18 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, Dmitry Monakhov

Inode's data or non journaled quota may be written w/o jounral so we _must_
send a barrier at the end of ext4_sync_fs. But it can be skipped if journal
commit will do it for us.

Also fix data integrity for nojournal mode.
changes from v1:
 skip barrier for async mode

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/super.c      |   35 ++++++++++++++++++++++++++++++++++-
 include/linux/jbd2.h |   15 +++++++++++++++
 2 files changed, 49 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index dbc7c09..edc716d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -69,6 +69,7 @@ static void ext4_mark_recovery_complete(struct super_block *sb,
 static void ext4_clear_journal_err(struct super_block *sb,
 				   struct ext4_super_block *es);
 static int ext4_sync_fs(struct super_block *sb, int wait);
+static int ext4_sync_fs_nojournal(struct super_block *sb, int wait);
 static int ext4_remount(struct super_block *sb, int *flags, char *data);
 static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf);
 static int ext4_unfreeze(struct super_block *sb);
@@ -1096,6 +1097,7 @@ static const struct super_operations ext4_nojournal_sops = {
 	.dirty_inode	= ext4_dirty_inode,
 	.drop_inode	= ext4_drop_inode,
 	.evict_inode	= ext4_evict_inode,
+	.sync_fs	= ext4_sync_fs_nojournal,
 	.put_super	= ext4_put_super,
 	.statfs		= ext4_statfs,
 	.remount_fs	= ext4_remount,
@@ -4520,6 +4522,7 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
 {
 	int ret = 0;
 	tid_t target;
+	bool needs_barrier = false;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 
 	trace_ext4_sync_fs(sb, wait);
@@ -4529,10 +4532,40 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
 	 * no dirty dquots
 	 */
 	dquot_writeback_dquots(sb, -1);
+	/*
+	 * Data writeback is possible w/o journal transaction, so barrier must
+	 * being sent at the end of the function. But we can skip it if
+	 * transaction_commit will do it for us.
+	 */
+	target = jbd2_get_latest_transaction(sbi->s_journal);
+	if (wait && sbi->s_journal->j_flags & JBD2_BARRIER &&
+	    !jbd2_trans_will_send_data_barrier(sbi->s_journal, target))
+		needs_barrier = true;
+
 	if (jbd2_journal_start_commit(sbi->s_journal, &target)) {
 		if (wait)
-			jbd2_log_wait_commit(sbi->s_journal, target);
+			ret = jbd2_log_wait_commit(sbi->s_journal, target);
+	}
+	if (needs_barrier) {
+		int err;
+		err = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
+		if (!ret)
+			ret = err;
 	}
+
+	return ret;
+}
+
+static int ext4_sync_fs_nojournal(struct super_block *sb, int wait)
+{
+	int ret = 0;
+
+	trace_ext4_sync_fs(sb, wait);
+	flush_workqueue(EXT4_SB(sb)->dio_unwritten_wq);
+	dquot_writeback_dquots(sb, -1);
+	if (wait && test_opt(sb, BARRIER))
+		ret = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
+
 	return ret;
 }
 
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index c9e1ab6..ed974fd 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1319,6 +1319,21 @@ static inline u32 jbd2_chksum(journal_t *journal, u32 crc,
 	return *(u32 *)desc.ctx;
 }
 
+/* Return most recent uncommitted transaction */
+static inline tid_t  jbd2_get_latest_transaction(journal_t *journal)
+{
+	tid_t tid;
+
+	read_lock(&journal->j_state_lock);
+	tid = journal->j_commit_request;
+	if (journal->j_running_transaction) {
+		tid = journal->j_running_transaction->t_tid;
+	} else if (!journal->j_commit_sequence)
+		tid = journal->j_commit_sequence;
+	read_unlock(&journal->j_state_lock);
+	return tid;
+}
+
 #ifdef __KERNEL__
 
 #define buffer_trace_init(bh)	do {} while (0)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/6] jbd: optimize journal_force_commit
  2013-05-28  9:18 [PATCH 0/6] ext3/4 data integrity fixes Dmitry Monakhov
  2013-05-28  9:18 ` [PATCH 1/6] jbd2: optimize jbd2_journal_force_commit Dmitry Monakhov
  2013-05-28  9:18 ` [PATCH 2/6] ext4: fix data integrity for ext4_sync_fs Dmitry Monakhov
@ 2013-05-28  9:18 ` Dmitry Monakhov
  2013-05-28  9:18 ` [PATCH 4/6] ext3: fix data integrity for ext4_sync_fs Dmitry Monakhov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Dmitry Monakhov @ 2013-05-28  9:18 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, Dmitry Monakhov

Current implementation of journal_force_commit() is suboptimal because
result in empty and useless commits. But callers just want to force and wait
any unfinished commits. We already has journal_force_commit_nested()
which does exactly what we want, except we are guaranteed that we do not hold
journal transaction open.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/jbd/journal.c     |   50 +++++++++++++++++++++++++++++++++++++++-----------
 fs/jbd/transaction.c |   23 -----------------------
 2 files changed, 39 insertions(+), 34 deletions(-)

diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index 81cc7ea..dd7e851 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -482,19 +482,18 @@ int log_start_commit(journal_t *journal, tid_t tid)
 }
 
 /*
- * Force and wait upon a commit if the calling process is not within
- * transaction.  This is used for forcing out undo-protected data which contains
- * bitmaps, when the fs is running out of space.
- *
- * We can only force the running transaction if we don't have an active handle;
- * otherwise, we will deadlock.
- *
- * Returns true if a transaction was started.
+ * Force and wait any uncommitted transactions.  We can only force the running
+ * transaction if we don't have an active handle, otherwise, we will deadlock.
  */
-int journal_force_commit_nested(journal_t *journal)
+static int __journal_force_commit(journal_t *journal, int nested, int *progress)
 {
 	transaction_t *transaction = NULL;
 	tid_t tid;
+	int ret = 0;
+
+	J_ASSERT(!current->journal_info || nested);
+	if (progress)
+		*progress = 0;
 
 	spin_lock(&journal->j_state_lock);
 	if (journal->j_running_transaction && !current->journal_info) {
@@ -510,8 +509,37 @@ int journal_force_commit_nested(journal_t *journal)
 
 	tid = transaction->t_tid;
 	spin_unlock(&journal->j_state_lock);
-	log_wait_commit(journal, tid);
-	return 1;
+	ret = log_wait_commit(journal, tid);
+	if (!ret && progress)
+		*progress = 1;
+
+	return ret;
+}
+
+/**
+ * Force and wait upon a commit if the calling process is not within
+ * transaction.  This is used for forcing out undo-protected data which contains
+ * bitmaps, when the fs is running out of space.
+ *
+ * @journal: journal to force
+ * Returns true if progress was made.
+ */
+int journal_force_commit_nested(journal_t *journal)
+{
+	int progress;
+
+	__journal_force_commit(journal, 1, &progress);
+	return progress;
+}
+
+/**
+ * int journal_force_commit() - force any uncommitted transactions
+ * @journal: journal to force
+ *
+ */
+int journal_force_commit(journal_t *journal)
+{
+	return __journal_force_commit(journal, 0, NULL);
 }
 
 /*
diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 071d690..d3b9a13 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -1483,29 +1483,6 @@ int journal_stop(handle_t *handle)
 	return err;
 }
 
-/**
- * int journal_force_commit() - force any uncommitted transactions
- * @journal: journal to force
- *
- * For synchronous operations: force any uncommitted transactions
- * to disk.  May seem kludgy, but it reuses all the handle batching
- * code in a very simple manner.
- */
-int journal_force_commit(journal_t *journal)
-{
-	handle_t *handle;
-	int ret;
-
-	handle = journal_start(journal, 1);
-	if (IS_ERR(handle)) {
-		ret = PTR_ERR(handle);
-	} else {
-		handle->h_sync = 1;
-		ret = journal_stop(handle);
-	}
-	return ret;
-}

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/6] ext3: fix data integrity for ext4_sync_fs
  2013-05-28  9:18 [PATCH 0/6] ext3/4 data integrity fixes Dmitry Monakhov
                   ` (2 preceding siblings ...)
  2013-05-28  9:18 ` [PATCH 3/6] jbd: optimize journal_force_commit Dmitry Monakhov
@ 2013-05-28  9:18 ` Dmitry Monakhov
  2013-05-28 21:40   ` Jan Kara
  2013-05-28  9:19 ` [PATCH 5/6] ext4: Fix fsync error handling after filesystem abort Dmitry Monakhov
  2013-05-28  9:19 ` [PATCH 6/6] ext3: " Dmitry Monakhov
  5 siblings, 1 reply; 17+ messages in thread
From: Dmitry Monakhov @ 2013-05-28  9:18 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, Dmitry Monakhov

Inode's data or non journaled quota may be written w/o jounral so we must
send a barrier at the end of ext3_sync_fs. But it can be skipped if journal
commit will do it for us.

changes from v1:
 skip barrier for async mode

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext3/super.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index fb5120a..c8a4e17 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -2521,6 +2521,7 @@ int ext3_force_commit(struct super_block *sb)
 static int ext3_sync_fs(struct super_block *sb, int wait)
 {
 	tid_t target;
+	int ret = 0;
 
 	trace_ext3_sync_fs(sb, wait);
 	/*
@@ -2528,11 +2529,14 @@ static int ext3_sync_fs(struct super_block *sb, int wait)
 	 * no dirty dquots
 	 */
 	dquot_writeback_dquots(sb, -1);
-	if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
-		if (wait)
-			log_wait_commit(EXT3_SB(sb)->s_journal, target);
+	if (wait) {
+		if (journal_start_commit(EXT3_SB(sb)->s_journal, &target))
+			ret = log_wait_commit(EXT3_SB(sb)->s_journal, target);
+		else
+			ret = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
 	}
-	return 0;
+
+	return ret;
 }
 
 /*
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/6] ext4: Fix fsync error handling after filesystem abort.
  2013-05-28  9:18 [PATCH 0/6] ext3/4 data integrity fixes Dmitry Monakhov
                   ` (3 preceding siblings ...)
  2013-05-28  9:18 ` [PATCH 4/6] ext3: fix data integrity for ext4_sync_fs Dmitry Monakhov
@ 2013-05-28  9:19 ` Dmitry Monakhov
  2013-05-28 21:29   ` Jan Kara
  2013-05-28  9:19 ` [PATCH 6/6] ext3: " Dmitry Monakhov
  5 siblings, 1 reply; 17+ messages in thread
From: Dmitry Monakhov @ 2013-05-28  9:19 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, Dmitry Monakhov

If filesystem was aborted after inode's write back is complete
but before its metadata was updated we may return success
results in data loss.
In order to handle fs abort correctly we have to check
fs state once we discover that it is in MS_RDONLY state

Test case: http://patchwork.ozlabs.org/patch/244297

Changes from V1:
 - fix spelling

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/fsync.c |    8 ++++++--
 fs/ext4/super.c |   12 +++++++++++-
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index e0ba8a4..d7df2f1 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -129,9 +129,13 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		return ret;
 	mutex_lock(&inode->i_mutex);
 
-	if (inode->i_sb->s_flags & MS_RDONLY)
+	if (inode->i_sb->s_flags & MS_RDONLY) {
+		/* Make shure that we read updated s_mount_flags value */
+		smp_rmb();
+		if (EXT4_SB(inode->i_sb)->s_mount_flags & EXT4_MF_FS_ABORTED)
+			ret = -EROFS;
 		goto out;
-
+	}
 	ret = ext4_flush_unwritten_io(inode);
 	if (ret < 0)
 		goto out;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index edc716d..b4154d3 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -399,6 +399,11 @@ static void ext4_handle_error(struct super_block *sb)
 	}
 	if (test_opt(sb, ERRORS_RO)) {
 		ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only");
+		/*
+		 * Make shure updated value of ->s_mount_flags will be visiable
+		 * before ->s_flags update
+		 */
+		smp_wmb();
 		sb->s_flags |= MS_RDONLY;
 	}
 	if (test_opt(sb, ERRORS_PANIC))
@@ -571,8 +576,13 @@ void __ext4_abort(struct super_block *sb, const char *function,
 
 	if ((sb->s_flags & MS_RDONLY) == 0) {
 		ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only");
-		sb->s_flags |= MS_RDONLY;
 		EXT4_SB(sb)->s_mount_flags |= EXT4_MF_FS_ABORTED;
+		/*
+		 * Make shure updated value of ->s_mount_flags will be visiable
+		 * before ->s_flags update
+		 */
+		smp_wmb();
+		sb->s_flags |= MS_RDONLY;
 		if (EXT4_SB(sb)->s_journal)
 			jbd2_journal_abort(EXT4_SB(sb)->s_journal, -EIO);
 		save_error_info(sb, function, line);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/6] ext3: Fix fsync error handling after filesystem abort.
  2013-05-28  9:18 [PATCH 0/6] ext3/4 data integrity fixes Dmitry Monakhov
                   ` (4 preceding siblings ...)
  2013-05-28  9:19 ` [PATCH 5/6] ext4: Fix fsync error handling after filesystem abort Dmitry Monakhov
@ 2013-05-28  9:19 ` Dmitry Monakhov
  2013-05-28 21:33   ` Jan Kara
  5 siblings, 1 reply; 17+ messages in thread
From: Dmitry Monakhov @ 2013-05-28  9:19 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, Dmitry Monakhov

If filesystem was aborted we will return success
due to (sb->s_flags & MS_RDONLY) which is incorrect and
results in data loss.
In order to handle fs abort correctly we have to check
fs state once we discover that it is in MS_RDONLY state

Test case: http://patchwork.ozlabs.org/patch/244297/
Changes from V1:
 - fix spelling
 - fix smp_rmb()/debug order

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext3/fsync.c |    8 ++++++--
 fs/ext3/super.c |   13 ++++++++++++-
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/fs/ext3/fsync.c b/fs/ext3/fsync.c
index b31dbd4..5412916 100644
--- a/fs/ext3/fsync.c
+++ b/fs/ext3/fsync.c
@@ -48,9 +48,13 @@ int ext3_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 
 	trace_ext3_sync_file_enter(file, datasync);
 
-	if (inode->i_sb->s_flags & MS_RDONLY)
+	if (inode->i_sb->s_flags & MS_RDONLY) {
+		/* Make shure that we read updated state */
+		smp_rmb();
+		if (EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ERROR_FS)
+			return -EROFS;
 		return 0;
-
+	}
 	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
 	if (ret)
 		goto out;
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index c8a4e17..b788bae 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -174,6 +174,11 @@ static void ext3_handle_error(struct super_block *sb)
 	if (test_opt (sb, ERRORS_RO)) {
 		ext3_msg(sb, KERN_CRIT,
 			"error: remounting filesystem read-only");
+		/*
+		 * Make shure updated value of ->s_mount_state will be visiable
+		 * before ->s_flags update.
+		 */
+		smp_wmb();
 		sb->s_flags |= MS_RDONLY;
 	}
 	ext3_commit_super(sb, es, 1);
@@ -291,8 +296,14 @@ void ext3_abort(struct super_block *sb, const char *function,
 	ext3_msg(sb, KERN_CRIT,
 		"error: remounting filesystem read-only");
 	EXT3_SB(sb)->s_mount_state |= EXT3_ERROR_FS;
-	sb->s_flags |= MS_RDONLY;
 	set_opt(EXT3_SB(sb)->s_mount_opt, ABORT);
+	/*
+	 * Make shure updated value of ->s_mount_state will be visiable
+	 * before ->s_flags update.
+	 */
+	smp_wmb();
+	sb->s_flags |= MS_RDONLY;
+
 	if (EXT3_SB(sb)->s_journal)
 		journal_abort(EXT3_SB(sb)->s_journal, -EIO);
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6] jbd2: optimize jbd2_journal_force_commit
  2013-05-28  9:18 ` [PATCH 1/6] jbd2: optimize jbd2_journal_force_commit Dmitry Monakhov
@ 2013-05-28 21:22   ` Jan Kara
  2013-06-03 11:16     ` Dmitry Monakhov
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2013-05-28 21:22 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, jack

On Tue 28-05-13 13:18:56, Dmitry Monakhov wrote:
> Current implementation of jbd2_journal_force_commit() is suboptimal because
> result in empty and useless commits. But callers just want to force and wait
> any unfinished commits. We already has jbd2_journal_force_commit_nested()
> which does exactly what we want, except we are guaranteed that we do not hold
> journal transaction open.
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
  Hum, didn't I already review something like this?

> ---
>  fs/jbd2/journal.c     |   55 ++++++++++++++++++++++++++++++++++++------------
>  fs/jbd2/transaction.c |   23 --------------------
>  include/linux/jbd2.h  |    2 +-
>  3 files changed, 42 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 886ec2f..078e8ea 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -566,21 +566,19 @@ int jbd2_log_start_commit(journal_t *journal, tid_t tid)
>  }
>  
>  /*
> - * Force and wait upon a commit if the calling process is not within
> - * transaction.  This is used for forcing out undo-protected data which contains
> - * bitmaps, when the fs is running out of space.
> - *
> - * We can only force the running transaction if we don't have an active handle;
> - * otherwise, we will deadlock.
> - *
> - * Returns true if a transaction was started.
> + * Force and wait any uncommitted transactions.  We can only force the running
> + * transaction if we don't have an active handle, otherwise, we will deadlock.
>   */
> -int jbd2_journal_force_commit_nested(journal_t *journal)
> +int __jbd2_journal_force_commit(journal_t *journal, int nested, int *progress)
>  {
>  	transaction_t *transaction = NULL;
>  	tid_t tid;
> -	int need_to_start = 0;
> +	int need_to_start = 0, ret = 0;
>  
> +	J_ASSERT(!current->journal_info || nested);
> +
> +	if (progress)
> +		*progress = 0;
>  	read_lock(&journal->j_state_lock);
>  	if (journal->j_running_transaction && !current->journal_info) {
>  		transaction = journal->j_running_transaction;
> @@ -590,16 +588,45 @@ int jbd2_journal_force_commit_nested(journal_t *journal)
>  		transaction = journal->j_committing_transaction;
>  
>  	if (!transaction) {
> +		/* Nothing to commit */
>  		read_unlock(&journal->j_state_lock);
> -		return 0;	/* Nothing to retry */
> +		return 0;
>  	}
> -
>  	tid = transaction->t_tid;
>  	read_unlock(&journal->j_state_lock);
>  	if (need_to_start)
>  		jbd2_log_start_commit(journal, tid);
> -	jbd2_log_wait_commit(journal, tid);
> -	return 1;
> +	ret = jbd2_log_wait_commit(journal, tid);
> +	if (!ret && progress)
> +		*progress = 1;
> +
> +	return ret;
> +}
  Can't we just make this function return <0, 0, 1 and get rid of
'progress' argument? Caller jbd2_journal_force_commit() can then return 0
if __jbd2_journal_force_commit() returned >= 0...

> +
> +/**
> + * Force and wait upon a commit if the calling process is not within
> + * transaction.  This is used for forcing out undo-protected data which contains
> + * bitmaps, when the fs is running out of space.
> + *
> + * @journal: journal to force
> + * Returns true if progress was made.
> + */
> +int jbd2_journal_force_commit_nested(journal_t *journal)
> +{
> +	int progress;
> +
> +	__jbd2_journal_force_commit(journal, 1, &progress);
> +	return progress;
> +}
> +
> +/**
> + * int journal_force_commit() - force any uncommitted transactions
> + * @journal: journal to force
> + *
> + */
> +int jbd2_journal_force_commit(journal_t *journal)
> +{
> +	return __jbd2_journal_force_commit(journal, 0, NULL);
>  }
  Also if we move the WARN_ON(!current->journal_info) here, we can remove
the 'nested' argument.

								Honza

>  /*
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 10f524c..dae6b3d 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -1522,29 +1522,6 @@ int jbd2_journal_stop(handle_t *handle)
>  	return err;
>  }
>  
> -/**
> - * int jbd2_journal_force_commit() - force any uncommitted transactions
> - * @journal: journal to force
> - *
> - * For synchronous operations: force any uncommitted transactions
> - * to disk.  May seem kludgy, but it reuses all the handle batching
> - * code in a very simple manner.
> - */
> -int jbd2_journal_force_commit(journal_t *journal)
> -{
> -	handle_t *handle;
> -	int ret;
> -
> -	handle = jbd2_journal_start(journal, 1);
> -	if (IS_ERR(handle)) {
> -		ret = PTR_ERR(handle);
> -	} else {
> -		handle->h_sync = 1;
> -		ret = jbd2_journal_stop(handle);
> -	}
> -	return ret;
> -}
> -
>  /*
>   *
>   * List management code snippets: various functions for manipulating the
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 6e051f4..c9e1ab6 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1125,6 +1125,7 @@ extern void	   jbd2_journal_ack_err    (journal_t *);
>  extern int	   jbd2_journal_clear_err  (journal_t *);
>  extern int	   jbd2_journal_bmap(journal_t *, unsigned long, unsigned long long *);
>  extern int	   jbd2_journal_force_commit(journal_t *);
> +extern int	   jbd2_journal_force_commit_nested(journal_t *);
>  extern int	   jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *inode);
>  extern int	   jbd2_journal_begin_ordered_truncate(journal_t *journal,
>  				struct jbd2_inode *inode, loff_t new_size);
> @@ -1199,7 +1200,6 @@ int __jbd2_log_space_left(journal_t *); /* Called with journal locked */
>  int jbd2_log_start_commit(journal_t *journal, tid_t tid);
>  int __jbd2_log_start_commit(journal_t *journal, tid_t tid);
>  int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
> -int jbd2_journal_force_commit_nested(journal_t *journal);
>  int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
>  int jbd2_complete_transaction(journal_t *journal, tid_t tid);
>  int jbd2_log_do_checkpoint(journal_t *journal);
> -- 
> 1.7.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/6] ext4: Fix fsync error handling after filesystem abort.
  2013-05-28  9:19 ` [PATCH 5/6] ext4: Fix fsync error handling after filesystem abort Dmitry Monakhov
@ 2013-05-28 21:29   ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2013-05-28 21:29 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, jack

On Tue 28-05-13 13:19:00, Dmitry Monakhov wrote:
> If filesystem was aborted after inode's write back is complete
> but before its metadata was updated we may return success
> results in data loss.
> In order to handle fs abort correctly we have to check
> fs state once we discover that it is in MS_RDONLY state
> 
> Test case: http://patchwork.ozlabs.org/patch/244297
> 
> Changes from V1:
>  - fix spelling
  Still not quite ;) Search for 'shure' and 'visiable'. But feel free to
add
  Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext4/fsync.c |    8 ++++++--
>  fs/ext4/super.c |   12 +++++++++++-
>  2 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index e0ba8a4..d7df2f1 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -129,9 +129,13 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  		return ret;
>  	mutex_lock(&inode->i_mutex);
>  
> -	if (inode->i_sb->s_flags & MS_RDONLY)
> +	if (inode->i_sb->s_flags & MS_RDONLY) {
> +		/* Make shure that we read updated s_mount_flags value */
> +		smp_rmb();
> +		if (EXT4_SB(inode->i_sb)->s_mount_flags & EXT4_MF_FS_ABORTED)
> +			ret = -EROFS;
>  		goto out;
> -
> +	}
>  	ret = ext4_flush_unwritten_io(inode);
>  	if (ret < 0)
>  		goto out;
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index edc716d..b4154d3 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -399,6 +399,11 @@ static void ext4_handle_error(struct super_block *sb)
>  	}
>  	if (test_opt(sb, ERRORS_RO)) {
>  		ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only");
> +		/*
> +		 * Make shure updated value of ->s_mount_flags will be visiable
> +		 * before ->s_flags update
> +		 */
> +		smp_wmb();
>  		sb->s_flags |= MS_RDONLY;
>  	}
>  	if (test_opt(sb, ERRORS_PANIC))
> @@ -571,8 +576,13 @@ void __ext4_abort(struct super_block *sb, const char *function,
>  
>  	if ((sb->s_flags & MS_RDONLY) == 0) {
>  		ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only");
> -		sb->s_flags |= MS_RDONLY;
>  		EXT4_SB(sb)->s_mount_flags |= EXT4_MF_FS_ABORTED;
> +		/*
> +		 * Make shure updated value of ->s_mount_flags will be visiable
> +		 * before ->s_flags update
> +		 */
> +		smp_wmb();
> +		sb->s_flags |= MS_RDONLY;
>  		if (EXT4_SB(sb)->s_journal)
>  			jbd2_journal_abort(EXT4_SB(sb)->s_journal, -EIO);
>  		save_error_info(sb, function, line);
> -- 
> 1.7.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/6] ext3: Fix fsync error handling after filesystem abort.
  2013-05-28  9:19 ` [PATCH 6/6] ext3: " Dmitry Monakhov
@ 2013-05-28 21:33   ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2013-05-28 21:33 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, jack

On Tue 28-05-13 13:19:01, Dmitry Monakhov wrote:
> If filesystem was aborted we will return success
> due to (sb->s_flags & MS_RDONLY) which is incorrect and
> results in data loss.
> In order to handle fs abort correctly we have to check
> fs state once we discover that it is in MS_RDONLY state
> 
> Test case: http://patchwork.ozlabs.org/patch/244297/
> Changes from V1:
>  - fix spelling
>  - fix smp_rmb()/debug order
  I've taken this patch to my tree with some minor spelling fixes.

								Honza
  
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext3/fsync.c |    8 ++++++--
>  fs/ext3/super.c |   13 ++++++++++++-
>  2 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ext3/fsync.c b/fs/ext3/fsync.c
> index b31dbd4..5412916 100644
> --- a/fs/ext3/fsync.c
> +++ b/fs/ext3/fsync.c
> @@ -48,9 +48,13 @@ int ext3_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  
>  	trace_ext3_sync_file_enter(file, datasync);
>  
> -	if (inode->i_sb->s_flags & MS_RDONLY)
> +	if (inode->i_sb->s_flags & MS_RDONLY) {
> +		/* Make shure that we read updated state */
> +		smp_rmb();
> +		if (EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ERROR_FS)
> +			return -EROFS;
>  		return 0;
> -
> +	}
>  	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
>  	if (ret)
>  		goto out;
> diff --git a/fs/ext3/super.c b/fs/ext3/super.c
> index c8a4e17..b788bae 100644
> --- a/fs/ext3/super.c
> +++ b/fs/ext3/super.c
> @@ -174,6 +174,11 @@ static void ext3_handle_error(struct super_block *sb)
>  	if (test_opt (sb, ERRORS_RO)) {
>  		ext3_msg(sb, KERN_CRIT,
>  			"error: remounting filesystem read-only");
> +		/*
> +		 * Make shure updated value of ->s_mount_state will be visiable
> +		 * before ->s_flags update.
> +		 */
> +		smp_wmb();
>  		sb->s_flags |= MS_RDONLY;
>  	}
>  	ext3_commit_super(sb, es, 1);
> @@ -291,8 +296,14 @@ void ext3_abort(struct super_block *sb, const char *function,
>  	ext3_msg(sb, KERN_CRIT,
>  		"error: remounting filesystem read-only");
>  	EXT3_SB(sb)->s_mount_state |= EXT3_ERROR_FS;
> -	sb->s_flags |= MS_RDONLY;
>  	set_opt(EXT3_SB(sb)->s_mount_opt, ABORT);
> +	/*
> +	 * Make shure updated value of ->s_mount_state will be visiable
> +	 * before ->s_flags update.
> +	 */
> +	smp_wmb();
> +	sb->s_flags |= MS_RDONLY;
> +
>  	if (EXT3_SB(sb)->s_journal)
>  		journal_abort(EXT3_SB(sb)->s_journal, -EIO);
>  }
> -- 
> 1.7.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/6] ext3: fix data integrity for ext4_sync_fs
  2013-05-28  9:18 ` [PATCH 4/6] ext3: fix data integrity for ext4_sync_fs Dmitry Monakhov
@ 2013-05-28 21:40   ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2013-05-28 21:40 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, jack

On Tue 28-05-13 13:18:59, Dmitry Monakhov wrote:
> Inode's data or non journaled quota may be written w/o jounral so we must
> send a barrier at the end of ext3_sync_fs. But it can be skipped if journal
> commit will do it for us.
> 
> changes from v1:
>  skip barrier for async mode
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext3/super.c |   12 ++++++++----
>  1 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ext3/super.c b/fs/ext3/super.c
> index fb5120a..c8a4e17 100644
> --- a/fs/ext3/super.c
> +++ b/fs/ext3/super.c
> @@ -2521,6 +2521,7 @@ int ext3_force_commit(struct super_block *sb)
>  static int ext3_sync_fs(struct super_block *sb, int wait)
>  {
>  	tid_t target;
> +	int ret = 0;
>  
>  	trace_ext3_sync_fs(sb, wait);
>  	/*
> @@ -2528,11 +2529,14 @@ static int ext3_sync_fs(struct super_block *sb, int wait)
>  	 * no dirty dquots
>  	 */
>  	dquot_writeback_dquots(sb, -1);
> -	if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
> -		if (wait)
> -			log_wait_commit(EXT3_SB(sb)->s_journal, target);
> +	if (wait) {
> +		if (journal_start_commit(EXT3_SB(sb)->s_journal, &target))
> +			ret = log_wait_commit(EXT3_SB(sb)->s_journal, target);
> +		else
> +			ret = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
>  	}
> -	return 0;
> +
> +	return ret;
>  }
  This will issue a flush even if the filesystem is mounted with barrier=0.
That's not good. Also journal_start_commit() was deliberately called even
if wait == 0. This is so that journal commit is started and in a not so
rare case where second sync pass doesn't find anything to write, we can
just wait for an already running commit...

Finally I think it may be actually worth it and do a similar thing as you
did for ext4/jbd2 with the latest transaction in the journal. It's not very
intrusive change.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] ext4: fix data integrity for ext4_sync_fs
  2013-05-28  9:18 ` [PATCH 2/6] ext4: fix data integrity for ext4_sync_fs Dmitry Monakhov
@ 2013-05-28 21:51   ` Jan Kara
  2013-06-03 11:30     ` Dmitry Monakhov
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2013-05-28 21:51 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, jack

On Tue 28-05-13 13:18:57, Dmitry Monakhov wrote:
> Inode's data or non journaled quota may be written w/o jounral so we _must_
> send a barrier at the end of ext4_sync_fs. But it can be skipped if journal
> commit will do it for us.
> 
> Also fix data integrity for nojournal mode.
> changes from v1:
>  skip barrier for async mode
  One comment below:

> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index c9e1ab6..ed974fd 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1319,6 +1319,21 @@ static inline u32 jbd2_chksum(journal_t *journal, u32 crc,
>  	return *(u32 *)desc.ctx;
>  }
>  
> +/* Return most recent uncommitted transaction */
> +static inline tid_t  jbd2_get_latest_transaction(journal_t *journal)
> +{
> +	tid_t tid;
> +
> +	read_lock(&journal->j_state_lock);
> +	tid = journal->j_commit_request;
> +	if (journal->j_running_transaction) {
> +		tid = journal->j_running_transaction->t_tid;
> +	} else if (!journal->j_commit_sequence)
  I would expect here journal->j_committing_transaction and then
journal->j_commit_request below. And you can use j_commit_sequence as an
initial 'tid' value...

> +		tid = journal->j_commit_sequence;
> +	read_unlock(&journal->j_state_lock);
> +	return tid;
> +}
> +

								Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6] jbd2: optimize jbd2_journal_force_commit
  2013-05-28 21:22   ` Jan Kara
@ 2013-06-03 11:16     ` Dmitry Monakhov
  2013-06-10 13:07       ` Theodore Ts'o
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Monakhov @ 2013-06-03 11:16 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4

On Tue, 28 May 2013 23:22:05 +0200, Jan Kara <jack@suse.cz> wrote:
> On Tue 28-05-13 13:18:56, Dmitry Monakhov wrote:
> > Current implementation of jbd2_journal_force_commit() is suboptimal because
> > result in empty and useless commits. But callers just want to force and wait
> > any unfinished commits. We already has jbd2_journal_force_commit_nested()
> > which does exactly what we want, except we are guaranteed that we do not hold
> > journal transaction open.
> > 
> > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
>   Hum, didn't I already review something like this?
Off course you did. As result you have pointed an incorrect error value.
This was fixed in this version.
> 
> > ---
> >  fs/jbd2/journal.c     |   55 ++++++++++++++++++++++++++++++++++++------------
> >  fs/jbd2/transaction.c |   23 --------------------
> >  include/linux/jbd2.h  |    2 +-
> >  3 files changed, 42 insertions(+), 38 deletions(-)
> > 
> > diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> > index 886ec2f..078e8ea 100644
> > --- a/fs/jbd2/journal.c
> > +++ b/fs/jbd2/journal.c
> > @@ -566,21 +566,19 @@ int jbd2_log_start_commit(journal_t *journal, tid_t tid)
> >  }
> >  
> >  /*
> > - * Force and wait upon a commit if the calling process is not within
> > - * transaction.  This is used for forcing out undo-protected data which contains
> > - * bitmaps, when the fs is running out of space.
> > - *
> > - * We can only force the running transaction if we don't have an active handle;
> > - * otherwise, we will deadlock.
> > - *
> > - * Returns true if a transaction was started.
> > + * Force and wait any uncommitted transactions.  We can only force the running
> > + * transaction if we don't have an active handle, otherwise, we will deadlock.
> >   */
> > -int jbd2_journal_force_commit_nested(journal_t *journal)
> > +int __jbd2_journal_force_commit(journal_t *journal, int nested, int *progress)
> >  {
> >  	transaction_t *transaction = NULL;
> >  	tid_t tid;
> > -	int need_to_start = 0;
> > +	int need_to_start = 0, ret = 0;
> >  
> > +	J_ASSERT(!current->journal_info || nested);
> > +
> > +	if (progress)
> > +		*progress = 0;
> >  	read_lock(&journal->j_state_lock);
> >  	if (journal->j_running_transaction && !current->journal_info) {
> >  		transaction = journal->j_running_transaction;
> > @@ -590,16 +588,45 @@ int jbd2_journal_force_commit_nested(journal_t *journal)
> >  		transaction = journal->j_committing_transaction;
> >  
> >  	if (!transaction) {
> > +		/* Nothing to commit */
> >  		read_unlock(&journal->j_state_lock);
> > -		return 0;	/* Nothing to retry */
> > +		return 0;
> >  	}
> > -
> >  	tid = transaction->t_tid;
> >  	read_unlock(&journal->j_state_lock);
> >  	if (need_to_start)
> >  		jbd2_log_start_commit(journal, tid);
> > -	jbd2_log_wait_commit(journal, tid);
> > -	return 1;
> > +	ret = jbd2_log_wait_commit(journal, tid);
> > +	if (!ret && progress)
> > +		*progress = 1;
> > +
> > +	return ret;
> > +}
>   Can't we just make this function return <0, 0, 1 and get rid of
> 'progress' argument? Caller jbd2_journal_force_commit() can then return 0
> if __jbd2_journal_force_commit() returned >= 0...
Ok, sound reasonable. In fact __jbd2_journal_force_commit() should be
static so misuse is no likely to happen.
> 
> > +
> > +/**
> > + * Force and wait upon a commit if the calling process is not within
> > + * transaction.  This is used for forcing out undo-protected data which contains
> > + * bitmaps, when the fs is running out of space.
> > + *
> > + * @journal: journal to force
> > + * Returns true if progress was made.
> > + */
> > +int jbd2_journal_force_commit_nested(journal_t *journal)
> > +{
> > +	int progress;
> > +
> > +	__jbd2_journal_force_commit(journal, 1, &progress);
> > +	return progress;
> > +}
> > +
> > +/**
> > + * int journal_force_commit() - force any uncommitted transactions
> > + * @journal: journal to force
> > + *
> > + */
> > +int jbd2_journal_force_commit(journal_t *journal)
> > +{
> > +	return __jbd2_journal_force_commit(journal, 0, NULL);
> >  }
>   Also if we move the WARN_ON(!current->journal_info) here, we can remove
> the 'nested' argument.
Also agree. Will send you with updated version soon.
> 
> 								Honza
> 
> >  /*
> > diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> > index 10f524c..dae6b3d 100644
> > --- a/fs/jbd2/transaction.c
> > +++ b/fs/jbd2/transaction.c
> > @@ -1522,29 +1522,6 @@ int jbd2_journal_stop(handle_t *handle)
> >  	return err;
> >  }
> >  
> > -/**
> > - * int jbd2_journal_force_commit() - force any uncommitted transactions
> > - * @journal: journal to force
> > - *
> > - * For synchronous operations: force any uncommitted transactions
> > - * to disk.  May seem kludgy, but it reuses all the handle batching
> > - * code in a very simple manner.
> > - */
> > -int jbd2_journal_force_commit(journal_t *journal)
> > -{
> > -	handle_t *handle;
> > -	int ret;
> > -
> > -	handle = jbd2_journal_start(journal, 1);
> > -	if (IS_ERR(handle)) {
> > -		ret = PTR_ERR(handle);
> > -	} else {
> > -		handle->h_sync = 1;
> > -		ret = jbd2_journal_stop(handle);
> > -	}
> > -	return ret;
> > -}
> > -
> >  /*
> >   *
> >   * List management code snippets: various functions for manipulating the
> > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > index 6e051f4..c9e1ab6 100644
> > --- a/include/linux/jbd2.h
> > +++ b/include/linux/jbd2.h
> > @@ -1125,6 +1125,7 @@ extern void	   jbd2_journal_ack_err    (journal_t *);
> >  extern int	   jbd2_journal_clear_err  (journal_t *);
> >  extern int	   jbd2_journal_bmap(journal_t *, unsigned long, unsigned long long *);
> >  extern int	   jbd2_journal_force_commit(journal_t *);
> > +extern int	   jbd2_journal_force_commit_nested(journal_t *);
> >  extern int	   jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *inode);
> >  extern int	   jbd2_journal_begin_ordered_truncate(journal_t *journal,
> >  				struct jbd2_inode *inode, loff_t new_size);
> > @@ -1199,7 +1200,6 @@ int __jbd2_log_space_left(journal_t *); /* Called with journal locked */
> >  int jbd2_log_start_commit(journal_t *journal, tid_t tid);
> >  int __jbd2_log_start_commit(journal_t *journal, tid_t tid);
> >  int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
> > -int jbd2_journal_force_commit_nested(journal_t *journal);
> >  int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
> >  int jbd2_complete_transaction(journal_t *journal, tid_t tid);
> >  int jbd2_log_do_checkpoint(journal_t *journal);
> > -- 
> > 1.7.1
> > 
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] ext4: fix data integrity for ext4_sync_fs
  2013-05-28 21:51   ` Jan Kara
@ 2013-06-03 11:30     ` Dmitry Monakhov
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Monakhov @ 2013-06-03 11:30 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4

On Tue, 28 May 2013 23:51:12 +0200, Jan Kara <jack@suse.cz> wrote:
> On Tue 28-05-13 13:18:57, Dmitry Monakhov wrote:
> > Inode's data or non journaled quota may be written w/o jounral so we _must_
> > send a barrier at the end of ext4_sync_fs. But it can be skipped if journal
> > commit will do it for us.
> > 
> > Also fix data integrity for nojournal mode.
> > changes from v1:
> >  skip barrier for async mode
>   One comment below:
> 
> > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > index c9e1ab6..ed974fd 100644
> > --- a/include/linux/jbd2.h
> > +++ b/include/linux/jbd2.h
> > @@ -1319,6 +1319,21 @@ static inline u32 jbd2_chksum(journal_t *journal, u32 crc,
> >  	return *(u32 *)desc.ctx;
> >  }
> >  
> > +/* Return most recent uncommitted transaction */
> > +static inline tid_t  jbd2_get_latest_transaction(journal_t *journal)
> > +{
> > +	tid_t tid;
> > +
> > +	read_lock(&journal->j_state_lock);
> > +	tid = journal->j_commit_request;
> > +	if (journal->j_running_transaction) {
> > +		tid = journal->j_running_transaction->t_tid;
> > +	} else if (!journal->j_commit_sequence)
>   I would expect here journal->j_committing_transaction and then
> journal->j_commit_request below. And you can use j_commit_sequence as an
> initial 'tid' value...
I have to admit that my code is not correct because function should
return 'most recent uncommitted transaction', but I do not get your
idea. journal->j_commit_request >= jorunal->j_committing_transaction
and journal->j_commit_request >= journal->j_commit_sequence

So it is reasonable to define function like follows:

static inline tid_t  jbd2_get_latest_transaction(journal_t journal)
{
   tid_t tid;

   read_lock(&journal->j_state_lock);
   tid = journal->j_commit_request;
   if (journal->j_running_transaction)
           tid = journal->j_running_transaction->t_tid;
   read_unlock(&journal->j_state_lock);
   return tid;
}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6] jbd2: optimize jbd2_journal_force_commit
  2013-06-03 11:16     ` Dmitry Monakhov
@ 2013-06-10 13:07       ` Theodore Ts'o
  2013-06-10 13:18         ` Dmitry Monakhov
  0 siblings, 1 reply; 17+ messages in thread
From: Theodore Ts'o @ 2013-06-10 13:07 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Jan Kara, linux-ext4

On Mon, Jun 03, 2013 at 03:16:38PM +0400, Dmitry Monakhov wrote:
> Ok, sound reasonable. In fact __jbd2_journal_force_commit() should be
> static so misuse is no likely to happen.
>
> >   Also if we move the WARN_ON(!current->journal_info) here, we can remove
> > the 'nested' argument.
> Also agree. Will send you with updated version soon.

Hi Dmitry,

Have you had a chance to work on an updated version of this series
with Jan's requested changes?

Thanks,

					- Ted

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6] jbd2: optimize jbd2_journal_force_commit
  2013-06-10 13:07       ` Theodore Ts'o
@ 2013-06-10 13:18         ` Dmitry Monakhov
  2013-06-10 13:53           ` Theodore Ts'o
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Monakhov @ 2013-06-10 13:18 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Jan Kara, linux-ext4

On Mon, 10 Jun 2013 09:07:30 -0400, Theodore Ts'o <tytso@mit.edu> wrote:
> On Mon, Jun 03, 2013 at 03:16:38PM +0400, Dmitry Monakhov wrote:
> > Ok, sound reasonable. In fact __jbd2_journal_force_commit() should be
> > static so misuse is no likely to happen.
> >
> > >   Also if we move the WARN_ON(!current->journal_info) here, we can remove
> > > the 'nested' argument.
> > Also agree. Will send you with updated version soon.
> 
> Hi Dmitry,
> 
> Have you had a chance to work on an updated version of this series
> with Jan's requested changes?
Yes. It is ready, but since Jan is on vacations till 20'th I thought
it is useless to send it before that date. Am I right?
> 
> Thanks,
> 
> 					- Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6] jbd2: optimize jbd2_journal_force_commit
  2013-06-10 13:18         ` Dmitry Monakhov
@ 2013-06-10 13:53           ` Theodore Ts'o
  0 siblings, 0 replies; 17+ messages in thread
From: Theodore Ts'o @ 2013-06-10 13:53 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Jan Kara, linux-ext4

On Mon, Jun 10, 2013 at 05:18:19PM +0400, Dmitry Monakhov wrote:
> > Have you had a chance to work on an updated version of this series
> > with Jan's requested changes?
> Yes. It is ready, but since Jan is on vacations till 20'th I thought
> it is useless to send it before that date. Am I right?

Please send it now; other folks, including myself, can review it.
Since we are already at 3.10-rc5, I'd get the ext4/jbd2-related
patches reviewed and in the ext4 tree sooner rather than later.

Many thanks!!

					- Ted

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-06-10 13:54 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-28  9:18 [PATCH 0/6] ext3/4 data integrity fixes Dmitry Monakhov
2013-05-28  9:18 ` [PATCH 1/6] jbd2: optimize jbd2_journal_force_commit Dmitry Monakhov
2013-05-28 21:22   ` Jan Kara
2013-06-03 11:16     ` Dmitry Monakhov
2013-06-10 13:07       ` Theodore Ts'o
2013-06-10 13:18         ` Dmitry Monakhov
2013-06-10 13:53           ` Theodore Ts'o
2013-05-28  9:18 ` [PATCH 2/6] ext4: fix data integrity for ext4_sync_fs Dmitry Monakhov
2013-05-28 21:51   ` Jan Kara
2013-06-03 11:30     ` Dmitry Monakhov
2013-05-28  9:18 ` [PATCH 3/6] jbd: optimize journal_force_commit Dmitry Monakhov
2013-05-28  9:18 ` [PATCH 4/6] ext3: fix data integrity for ext4_sync_fs Dmitry Monakhov
2013-05-28 21:40   ` Jan Kara
2013-05-28  9:19 ` [PATCH 5/6] ext4: Fix fsync error handling after filesystem abort Dmitry Monakhov
2013-05-28 21:29   ` Jan Kara
2013-05-28  9:19 ` [PATCH 6/6] ext3: " Dmitry Monakhov
2013-05-28 21:33   ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.