linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/7] fix s_umount thaw/write and journal deadlock
@ 2011-12-08 18:04 Kamal Mostafa
  2011-12-08 18:04 ` [PATCH v2 1/7] Adding support to freeze and unfreeze a journal Kamal Mostafa
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Kamal Mostafa @ 2011-12-08 18:04 UTC (permalink / raw)
  To: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso
  Cc: linux-doc, linux-ext4, linux-fsdevel, linux-kernel,
	Surbhi Palande, Valerie Aurora, Kamal Mostafa,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka

Thanks everyone for the feedback.  Here's [PATCH v2], with these notable
changes since [PATCH 0/5 resend]:
   - made bisect-safe "Freeze and thaw the journal on ext4 freeze"
   - functional change: allow read-only quota subcmds when frozen
   - added "VFS: Rename and refactor writeback_inodes_sb_if_idle"
   - split-out "VFS: Avoid read-write deadlock in try_to_writeback_inodes_sb"
   - split-out "VFS: Document s_frozen state through freeze_super"
   - dropped "VFS: Rename vfs_check_frozen()"

-----

This set of filesystem freeze/thaw deadlock patches is submitted on
behalf of the authors, Kamal Mostafa, Surbhi Palande and Valerie Aurora.
The patches fix the bug:

BugLink: https://bugs.launchpad.net/bugs/897421

Kamal Mostafa (1):
  VFS: Rename and refactor writeback_inodes_sb_if_idle

Surbhi Palande (2):
  Adding support to freeze and unfreeze a journal
  Freeze and thaw the journal on ext4 freeze

Valerie Aurora (4):

  VFS: Fix s_umount thaw/write deadlock
  VFS: Avoid read-write deadlock in try_to_writeback_inodes_sb
  VFS: Document s_frozen state through freeze_super
  Documentation: Correct s_umount state for freeze_fs/unfreeze_fs

 Documentation/filesystems/Locking |    4 +-
 fs/btrfs/extent-tree.c            |    2 +-
 fs/ext4/inode.c                   |    2 +-
 fs/ext4/super.c                   |   13 +++++-----
 fs/fs-writeback.c                 |   46 +++++++++++++++++++++---------------
 fs/jbd2/journal.c                 |    1 +
 fs/jbd2/transaction.c             |   42 +++++++++++++++++++++++++++++++++
 fs/quota/quota.c                  |   21 ++++++++++++++++-
 fs/super.c                        |   26 +++++++++++++++++++++
 fs/sync.c                         |    4 +-
 include/linux/fs.h                |    7 +++++-
 include/linux/jbd2.h              |    7 +++++
 include/linux/writeback.h         |    4 +-
 13 files changed, 143 insertions(+), 36 deletions(-)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 1/7] Adding support to freeze and unfreeze a journal
  2011-12-08 18:04 [PATCH v2 0/7] fix s_umount thaw/write and journal deadlock Kamal Mostafa
@ 2011-12-08 18:04 ` Kamal Mostafa
  2012-01-10 20:20   ` Eric Sandeen
  2011-12-08 18:04 ` [PATCH v2 2/7] Freeze and thaw the journal on ext4 freeze Kamal Mostafa
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Kamal Mostafa @ 2011-12-08 18:04 UTC (permalink / raw)
  To: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso
  Cc: linux-doc, linux-ext4, linux-fsdevel, linux-kernel,
	Surbhi Palande, Valerie Aurora, Kamal Mostafa,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka,
	Surbhi Palande

From: Surbhi Palande <surbhi.palande@canonical.com>

The journal should be frozen when a filesystem freezes. What this means is
that until the filesystem is thawed again, no new transactions should be
accepted by the journal. When the filesystem thaws, inturn it should thaw the
journal and this should allow the journal to resume accepting new
transactions. While the the filesystem has frozen the journal, the clients of
the journal on calling jbd2_journal_start() will sleep on a wait queue.
Thawing the journal will wake up the sleeping clients and journalling can
progress normally.

An example of the race condition that can happen without this patch is as
follows:

Say the filesystem is thawed when we begin. Let tx be the time at unit x

P1: Process doing an aio write
t1) ext4_file_write()
  t2) generic_file_aio_write()
    t3) __generic_file_aio_write()
      // filesystem is not frozen, so we do not block in the next check.
      t4) vfs_check_frozen()
      t5) generic_write_checks()
----------------- Prempted------------------

P2: Process that does filesystem freeze

t6) freeze_super()
  t7) sync_filesystem()
  t8) sync_blockdev()
  t9) sb->s_op->freeze_fs() (= ext4_freeze)
    t10) jbd2_journal_lock_updates()
    t11) jbd2_journal_flush()
    // Need to unlock the journal before returning to user space.
    t12) jbd2_journal_unlock_updates()
    // Journal is unlocked and so we can start accepting new transactions now.

// freezing process completes execution. Page cache is now clean and should
// remain clean till the filesystem is frozen.
--------------------------------------------

P1: writing process gets the control back
t13) generic_file_buffered_write()
  t14) generic_perform_write()
    t15) a_ops->write_begin() (= ext4_write_begin)
      t16) ext4_journal_start()
	// New handle is started. We do not block here! Write continues
	// dirtying the page cache while the filesystem is frozen!

BugLink: https://bugs.launchpad.net/bugs/897421
Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>
Acked-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
---
 fs/jbd2/journal.c     |    1 +
 fs/jbd2/transaction.c |   42 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/jbd2.h  |    7 +++++++
 3 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 0fa0123..f0170cc 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -894,6 +894,7 @@ static journal_t * journal_init_common (void)
 	init_waitqueue_head(&journal->j_wait_checkpoint);
 	init_waitqueue_head(&journal->j_wait_commit);
 	init_waitqueue_head(&journal->j_wait_updates);
+	init_waitqueue_head(&journal->j_wait_frozen);
 	mutex_init(&journal->j_barrier);
 	mutex_init(&journal->j_checkpoint_mutex);
 	spin_lock_init(&journal->j_revoke_lock);
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index a0e41a4..340ee35 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -173,6 +173,17 @@ repeat:
 				journal->j_barrier_count == 0);
 		goto repeat;
 	}
+	/* Don't let a new handle start when a journal is frozen.
+	 * jbd2_journal_freeze calls jbd2_journal_unlock_updates() only after
+	 * the j_flags indicate that the journal is frozen. So if the
+	 * j_barrier_count is 0, then check if this was made 0 by the freezing
+	 * process
+	 */
+	if (journal->j_flags & JBD2_FROZEN) {
+		read_unlock(&journal->j_state_lock);
+		wait_event(journal->j_wait_frozen, (journal->j_flags & JBD2_FROZEN));
+		goto repeat;
+	}
 
 	if (!journal->j_running_transaction) {
 		read_unlock(&journal->j_state_lock);
@@ -492,6 +503,37 @@ int jbd2_journal_restart(handle_t *handle, int nblocks)
 }
 EXPORT_SYMBOL(jbd2_journal_restart);
 
+int jbd2_journal_freeze(journal_t *journal)
+{
+	int error = 0;
+	/* Now we set up the journal barrier. */
+	jbd2_journal_lock_updates(journal);
+
+	/*
+	 * Don't clear the needs_recovery flag if we failed to flush
+	 * the journal.
+	 */
+	error = jbd2_journal_flush(journal);
+	if (error >= 0) {
+		write_lock(&journal->j_state_lock);
+		journal->j_flags |= JBD2_FROZEN;
+		write_unlock(&journal->j_state_lock);
+	}
+	jbd2_journal_unlock_updates(journal);
+	return error;
+}
+EXPORT_SYMBOL(jbd2_journal_freeze);
+
+void jbd2_journal_thaw(journal_t * journal)
+{
+	write_lock(&journal->j_state_lock);
+	journal->j_flags &= ~JBD2_FROZEN;
+	write_unlock(&journal->j_state_lock);
+	wake_up(&journal->j_wait_frozen);
+}
+EXPORT_SYMBOL(jbd2_journal_thaw);
+
+
 /**
  * void jbd2_journal_lock_updates () - establish a transaction barrier.
  * @journal:  Journal to establish a barrier on.
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 2092ea2..bfa0752 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -658,6 +658,7 @@ jbd2_time_diff(unsigned long start, unsigned long end)
  * @j_wait_checkpoint:  Wait queue to trigger checkpointing
  * @j_wait_commit: Wait queue to trigger commit
  * @j_wait_updates: Wait queue to wait for updates to complete
+ * @j_wait_frozen: Wait queue to wait for journal to thaw
  * @j_checkpoint_mutex: Mutex for locking against concurrent checkpoints
  * @j_head: Journal head - identifies the first unused block in the journal
  * @j_tail: Journal tail - identifies the oldest still-used block in the
@@ -775,6 +776,9 @@ struct journal_s
 	/* Wait queue to wait for updates to complete */
 	wait_queue_head_t	j_wait_updates;
 
+	/* Wait queue to wait for journal to thaw*/
+	wait_queue_head_t	j_wait_frozen;
+
 	/* Semaphore for locking against concurrent checkpoints */
 	struct mutex		j_checkpoint_mutex;
 
@@ -953,6 +957,7 @@ struct journal_s
 #define JBD2_ABORT_ON_SYNCDATA_ERR	0x040	/* Abort the journal on file
 						 * data write error in ordered
 						 * mode */
+#define JBD2_FROZEN	0x080 /* Journal thread frozen along with filesystem */
 
 /*
  * Function declarations for the journaling transaction and buffer
@@ -1060,6 +1065,8 @@ extern void	 jbd2_journal_invalidatepage(journal_t *,
 				struct page *, unsigned long);
 extern int	 jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
 extern int	 jbd2_journal_stop(handle_t *);
+extern int	 jbd2_journal_freeze(journal_t *);
+extern void	 jbd2_journal_thaw(journal_t *);
 extern int	 jbd2_journal_flush (journal_t *);
 extern void	 jbd2_journal_lock_updates (journal_t *);
 extern void	 jbd2_journal_unlock_updates (journal_t *);
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 2/7] Freeze and thaw the journal on ext4 freeze
  2011-12-08 18:04 [PATCH v2 0/7] fix s_umount thaw/write and journal deadlock Kamal Mostafa
  2011-12-08 18:04 ` [PATCH v2 1/7] Adding support to freeze and unfreeze a journal Kamal Mostafa
@ 2011-12-08 18:04 ` Kamal Mostafa
  2012-01-06  0:32   ` Jan Kara
  2011-12-08 18:04 ` [PATCH v2 3/7] VFS: Fix s_umount thaw/write deadlock Kamal Mostafa
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Kamal Mostafa @ 2011-12-08 18:04 UTC (permalink / raw)
  To: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso
  Cc: linux-doc, linux-ext4, linux-fsdevel, linux-kernel,
	Surbhi Palande, Valerie Aurora, Kamal Mostafa,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka,
	Surbhi Palande

From: Surbhi Palande <surbhi.palande@canonical.com>

Freeze and thaw the journal when you freeze and thaw the filesystem.

BugLink: https://bugs.launchpad.net/bugs/897421
Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>
Cc: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
---
 fs/ext4/super.c |   13 ++++++-------
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3858767..751908b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4330,14 +4330,11 @@ static int ext4_freeze(struct super_block *sb)
 
 	journal = EXT4_SB(sb)->s_journal;
 
-	/* Now we set up the journal barrier. */
-	jbd2_journal_lock_updates(journal);
-
+	error = jbd2_journal_freeze(journal);
 	/*
-	 * Don't clear the needs_recovery flag if we failed to flush
+	 * Don't clear the needs_recovery flag if we failed to freeze
 	 * the journal.
 	 */
-	error = jbd2_journal_flush(journal);
 	if (error < 0)
 		goto out;
 
@@ -4345,8 +4342,6 @@ static int ext4_freeze(struct super_block *sb)
 	EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
 	error = ext4_commit_super(sb, 1);
 out:
-	/* we rely on s_frozen to stop further updates */
-	jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
 	return error;
 }
 
@@ -4356,10 +4351,14 @@ out:
  */
 static int ext4_unfreeze(struct super_block *sb)
 {
+	journal_t *journal;
 	if (sb->s_flags & MS_RDONLY)
 		return 0;
 
 	lock_super(sb);
+	journal = EXT4_SB(sb)->s_journal;
+
+	jbd2_journal_thaw(journal);
 	/* Reset the needs_recovery flag before the fs is unlocked. */
 	EXT4_SET_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
 	ext4_commit_super(sb, 1);
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 3/7] VFS: Fix s_umount thaw/write deadlock
  2011-12-08 18:04 [PATCH v2 0/7] fix s_umount thaw/write and journal deadlock Kamal Mostafa
  2011-12-08 18:04 ` [PATCH v2 1/7] Adding support to freeze and unfreeze a journal Kamal Mostafa
  2011-12-08 18:04 ` [PATCH v2 2/7] Freeze and thaw the journal on ext4 freeze Kamal Mostafa
@ 2011-12-08 18:04 ` Kamal Mostafa
  2012-01-06  1:50   ` Jan Kara
  2011-12-08 18:04 ` [PATCH v2 4/7] VFS: Rename and refactor writeback_inodes_sb_if_idle Kamal Mostafa
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Kamal Mostafa @ 2011-12-08 18:04 UTC (permalink / raw)
  To: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso
  Cc: linux-doc, linux-ext4, linux-fsdevel, linux-kernel,
	Surbhi Palande, Valerie Aurora, Kamal Mostafa,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka

From: Valerie Aurora <val@vaaconsulting.com>

File system freeze/thaw require the superblock's s_umount lock.
However, we write to the file system while holding the s_umount lock
in several places (e.g., writeback and quotas).  Any of these can
block on a frozen file system while holding s_umount, preventing any
later thaw from occurring, since thaw requires s_umount.

The solution is to add a check, vfs_is_frozen(), to all code paths
that write while holding sb->s_umount and return without writing if it
is true.

BugLink: https://bugs.launchpad.net/bugs/897421
Signed-off-by: Valerie Aurora <val@vaaconsulting.com>
Cc: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
[kamal@canonical.com: minor changes; patch restructure]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
---
 fs/fs-writeback.c  |    8 ++++++++
 fs/quota/quota.c   |   21 ++++++++++++++++++++-
 fs/super.c         |    8 ++++++++
 fs/sync.c          |    4 ++--
 include/linux/fs.h |    7 ++++++-
 5 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 73c3992..eee01cd 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -554,6 +554,14 @@ static long writeback_sb_inodes(struct super_block *sb,
 	while (!list_empty(&wb->b_io)) {
 		struct inode *inode = wb_inode(wb->b_io.prev);
 
+		if (inode->i_sb == sb && vfs_is_frozen(sb)) {
+			/*
+			 * Inode is on a frozen superblock; skip it for now.
+			 */
+			redirty_tail(inode, wb);
+			continue;
+		}
+
 		if (inode->i_sb != sb) {
 			if (work->sb) {
 				/*
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 35f4b0e..1d770f2 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -47,7 +47,7 @@ static int check_quotactl_permission(struct super_block *sb, int type, int cmd,
 
 static void quota_sync_one(struct super_block *sb, void *arg)
 {
-	if (sb->s_qcop && sb->s_qcop->quota_sync)
+	if (sb->s_qcop && sb->s_qcop->quota_sync && !vfs_is_frozen(sb))
 		sb->s_qcop->quota_sync(sb, *(int *)arg, 1);
 }
 
@@ -368,8 +368,27 @@ SYSCALL_DEFINE4(quotactl, unsigned int, cmd, const char __user *, special,
 		goto out;
 	}
 
+	/*
+	 * If the fs is frozen, only allow read-only quota subcmds.
+	 */
+	if (vfs_is_frozen(sb)) {
+		switch (cmd) {
+			case Q_GETFMT:
+			case Q_GETINFO:
+			case Q_XGETQSTAT:
+				ret = 0;
+				break;
+			default:
+				ret = -EBUSY;
+				break;
+		}
+		if ( ret != 0 )
+		    goto out_drop_super;
+	}
+
 	ret = do_quotactl(sb, type, cmds, id, addr, pathp);
 
+out_drop_super:
 	drop_super(sb);
 out:
 	if (pathp && !IS_ERR(pathp))
diff --git a/fs/super.c b/fs/super.c
index afd0f1a..5629d06 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -526,6 +526,10 @@ void sync_supers(void)
  *
  *	Scans the superblock list and calls given function, passing it
  *	locked superblock and given argument.
+ *
+ *	Note: If @f is going to write to the file system, it must first
+ *	check if the file system is frozen (via vfs_is_frozen(sb)) and
+ *	refuse to write if so.
  */
 void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
 {
@@ -595,6 +599,10 @@ EXPORT_SYMBOL(iterate_supers_type);
  *	
  *	Scans the superblock list and finds the superblock of the file system
  *	mounted on the device given. %NULL is returned if no match is found.
+ *
+ *	Note: If caller is going to write to the superblock, it must first
+ *	check if the file system is frozen (via vfs_is_frozen(sb)) and
+ *	refuse to write if so.
  */
 
 struct super_block *get_super(struct block_device *bdev)
diff --git a/fs/sync.c b/fs/sync.c
index 101b8ef..e8166db 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -68,7 +68,7 @@ int sync_filesystem(struct super_block *sb)
 	/*
 	 * No point in syncing out anything if the filesystem is read-only.
 	 */
-	if (sb->s_flags & MS_RDONLY)
+	if ((sb->s_flags & MS_RDONLY) || vfs_is_frozen(sb))
 		return 0;
 
 	ret = __sync_filesystem(sb, 0);
@@ -80,7 +80,7 @@ EXPORT_SYMBOL_GPL(sync_filesystem);
 
 static void sync_one_sb(struct super_block *sb, void *arg)
 {
-	if (!(sb->s_flags & MS_RDONLY))
+	if (!(sb->s_flags & MS_RDONLY) && !vfs_is_frozen(sb))
 		__sync_filesystem(sb, *(int *)arg);
 }
 /*
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 019dc55..ec33b4c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1490,7 +1490,7 @@ extern void prune_dcache_sb(struct super_block *sb, int nr_to_scan);
 extern struct timespec current_fs_time(struct super_block *sb);
 
 /*
- * Snapshotting support.
+ * Snapshotting support.  See freeze_super() for documentation.
  */
 enum {
 	SB_UNFROZEN = 0,
@@ -1501,6 +1501,11 @@ enum {
 #define vfs_check_frozen(sb, level) \
 	wait_event((sb)->s_wait_unfrozen, ((sb)->s_frozen < (level)))
 
+static inline int vfs_is_frozen(struct super_block *sb)
+{
+	return sb->s_frozen == SB_FREEZE_TRANS;
+}
+
 /*
  * until VFS tracks user namespaces for inodes, just make all files
  * belong to init_user_ns
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 4/7] VFS: Rename and refactor writeback_inodes_sb_if_idle
  2011-12-08 18:04 [PATCH v2 0/7] fix s_umount thaw/write and journal deadlock Kamal Mostafa
                   ` (2 preceding siblings ...)
  2011-12-08 18:04 ` [PATCH v2 3/7] VFS: Fix s_umount thaw/write deadlock Kamal Mostafa
@ 2011-12-08 18:04 ` Kamal Mostafa
  2011-12-13  3:34   ` Miao Xie
  2012-01-06  0:33   ` Jan Kara
  2011-12-08 18:04 ` [PATCH v2 5/7] VFS: Avoid read-write deadlock in try_to_writeback_inodes_sb Kamal Mostafa
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Kamal Mostafa @ 2011-12-08 18:04 UTC (permalink / raw)
  To: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso
  Cc: linux-doc, linux-ext4, linux-fsdevel, linux-kernel,
	Surbhi Palande, Valerie Aurora, Kamal Mostafa,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka

Rename writeback_inodes_sb{_nr}_if_idle to try_to_writeback_inodes_sb{_nr}
and refactor to avoid duplicating logic.

Signed-off-by: Kamal Mostafa <kamal@canonical.com>
---
 fs/btrfs/extent-tree.c    |    2 +-
 fs/ext4/inode.c           |    2 +-
 fs/fs-writeback.c         |   25 +++++++++++--------------
 include/linux/writeback.h |    4 ++--
 4 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f0d5718..6f6fe2b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3416,7 +3416,7 @@ static int shrink_delalloc(struct btrfs_root *root, u64 to_reclaim,
 		smp_mb();
 		nr_pages = min_t(unsigned long, nr_pages,
 		       root->fs_info->delalloc_bytes >> PAGE_CACHE_SHIFT);
-		writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages,
+		try_to_writeback_inodes_sb_nr(root->fs_info->sb, nr_pages,
 						WB_REASON_FS_FREE_SPACE);
 
 		spin_lock(&space_info->lock);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 848f436..0da75cd 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2373,7 +2373,7 @@ static int ext4_nonda_switch(struct super_block *sb)
 	 * start pushing delalloc when 1/2 of free blocks are dirty.
 	 */
 	if (free_blocks < 2 * dirty_blocks)
-		writeback_inodes_sb_if_idle(sb, WB_REASON_FS_FREE_SPACE);
+		try_to_writeback_inodes_sb(sb, WB_REASON_FS_FREE_SPACE);
 
 	return 0;
 }
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index eee01cd..ea89b3f 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1271,45 +1271,42 @@ void writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)
 EXPORT_SYMBOL(writeback_inodes_sb);
 
 /**
- * writeback_inodes_sb_if_idle	-	start writeback if none underway
+ * try_to_writeback_inodes_sb	-	start writeback if none underway
  * @sb: the superblock
  *
  * Invoke writeback_inodes_sb if no writeback is currently underway.
  * Returns 1 if writeback was started, 0 if not.
  */
-int writeback_inodes_sb_if_idle(struct super_block *sb, enum wb_reason reason)
+int try_to_writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)
 {
-	if (!writeback_in_progress(sb->s_bdi)) {
-		down_read(&sb->s_umount);
-		writeback_inodes_sb(sb, reason);
-		up_read(&sb->s_umount);
-		return 1;
-	} else
-		return 0;
+	return try_to_writeback_inodes_sb_nr(sb, 0, reason);
 }
-EXPORT_SYMBOL(writeback_inodes_sb_if_idle);
+EXPORT_SYMBOL(try_to_writeback_inodes_sb);
 
 /**
- * writeback_inodes_sb_if_idle	-	start writeback if none underway
+ * try_to_writeback_inodes_sb_nr	-	start writeback if none underway
  * @sb: the superblock
  * @nr: the number of pages to write
  *
  * Invoke writeback_inodes_sb if no writeback is currently underway.
  * Returns 1 if writeback was started, 0 if not.
  */
-int writeback_inodes_sb_nr_if_idle(struct super_block *sb,
+int try_to_writeback_inodes_sb_nr(struct super_block *sb,
 				   unsigned long nr,
 				   enum wb_reason reason)
 {
 	if (!writeback_in_progress(sb->s_bdi)) {
 		down_read(&sb->s_umount);
-		writeback_inodes_sb_nr(sb, nr, reason);
+		if (nr == 0)
+			writeback_inodes_sb(sb, reason);
+		else
+			writeback_inodes_sb_nr(sb, nr, reason);
 		up_read(&sb->s_umount);
 		return 1;
 	} else
 		return 0;
 }
-EXPORT_SYMBOL(writeback_inodes_sb_nr_if_idle);
+EXPORT_SYMBOL(try_to_writeback_inodes_sb_nr);
 
 /**
  * sync_inodes_sb	-	sync sb inode pages
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index a378c29..e824225 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -89,8 +89,8 @@ int inode_wait(void *);
 void writeback_inodes_sb(struct super_block *, enum wb_reason reason);
 void writeback_inodes_sb_nr(struct super_block *, unsigned long nr,
 							enum wb_reason reason);
-int writeback_inodes_sb_if_idle(struct super_block *, enum wb_reason reason);
-int writeback_inodes_sb_nr_if_idle(struct super_block *, unsigned long nr,
+int try_to_writeback_inodes_sb(struct super_block *, enum wb_reason reason);
+int try_to_writeback_inodes_sb_nr(struct super_block *, unsigned long nr,
 							enum wb_reason reason);
 void sync_inodes_sb(struct super_block *);
 long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages,
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 5/7] VFS: Avoid read-write deadlock in try_to_writeback_inodes_sb
  2011-12-08 18:04 [PATCH v2 0/7] fix s_umount thaw/write and journal deadlock Kamal Mostafa
                   ` (3 preceding siblings ...)
  2011-12-08 18:04 ` [PATCH v2 4/7] VFS: Rename and refactor writeback_inodes_sb_if_idle Kamal Mostafa
@ 2011-12-08 18:04 ` Kamal Mostafa
  2012-01-06  0:35   ` Jan Kara
  2011-12-08 18:04 ` [PATCH v2 6/7] VFS: Document s_frozen state through freeze_super Kamal Mostafa
  2011-12-08 18:04 ` [PATCH v2 7/7] Documentation: Correct s_umount state for freeze_fs/unfreeze_fs Kamal Mostafa
  6 siblings, 1 reply; 27+ messages in thread
From: Kamal Mostafa @ 2011-12-08 18:04 UTC (permalink / raw)
  To: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso
  Cc: linux-doc, linux-ext4, linux-fsdevel, linux-kernel,
	Surbhi Palande, Valerie Aurora, Kamal Mostafa,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka

From: Valerie Aurora <val@vaaconsulting.com>

Use trylock in try_to_writeback_inodes_sb to avoid read-write
deadlocks that could be triggered by freeze.

BugLink: https://bugs.launchpad.net/bugs/897421
Signed-off-by: Valerie Aurora <val@vaaconsulting.com>
Cc: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
[kamal@canonical.com: patch restructure]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
---
 fs/fs-writeback.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index ea89b3f..3a80f1b 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1274,8 +1274,9 @@ EXPORT_SYMBOL(writeback_inodes_sb);
  * try_to_writeback_inodes_sb	-	start writeback if none underway
  * @sb: the superblock
  *
- * Invoke writeback_inodes_sb if no writeback is currently underway.
- * Returns 1 if writeback was started, 0 if not.
+ * Invoke writeback_inodes_sb if no writeback is currently underway
+ * and no one else holds the s_umount lock.  Returns 1 if writeback
+ * was started, 0 if not.
  */
 int try_to_writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)
 {
@@ -1288,15 +1289,17 @@ EXPORT_SYMBOL(try_to_writeback_inodes_sb);
  * @sb: the superblock
  * @nr: the number of pages to write
  *
- * Invoke writeback_inodes_sb if no writeback is currently underway.
- * Returns 1 if writeback was started, 0 if not.
+ * Invoke writeback_inodes_sb if no writeback is currently underway
+ * and no one else holds the s_umount lock.  Returns 1 if writeback
+ * was started, 0 if not.
  */
 int try_to_writeback_inodes_sb_nr(struct super_block *sb,
 				   unsigned long nr,
 				   enum wb_reason reason)
 {
 	if (!writeback_in_progress(sb->s_bdi)) {
-		down_read(&sb->s_umount);
+		if (!down_read_trylock(&sb->s_umount))
+		    return 0;
 		if (nr == 0)
 			writeback_inodes_sb(sb, reason);
 		else
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 6/7] VFS: Document s_frozen state through freeze_super
  2011-12-08 18:04 [PATCH v2 0/7] fix s_umount thaw/write and journal deadlock Kamal Mostafa
                   ` (4 preceding siblings ...)
  2011-12-08 18:04 ` [PATCH v2 5/7] VFS: Avoid read-write deadlock in try_to_writeback_inodes_sb Kamal Mostafa
@ 2011-12-08 18:04 ` Kamal Mostafa
  2012-01-06  0:36   ` Jan Kara
  2011-12-08 18:04 ` [PATCH v2 7/7] Documentation: Correct s_umount state for freeze_fs/unfreeze_fs Kamal Mostafa
  6 siblings, 1 reply; 27+ messages in thread
From: Kamal Mostafa @ 2011-12-08 18:04 UTC (permalink / raw)
  To: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso
  Cc: linux-doc, linux-ext4, linux-fsdevel, linux-kernel,
	Surbhi Palande, Valerie Aurora, Kamal Mostafa,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka

From: Valerie Aurora <val@vaaconsulting.com>

BugLink: https://bugs.launchpad.net/bugs/897421
Signed-off-by: Valerie Aurora <val@vaaconsulting.com>
Cc: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
[kamal@canonical.com: patch restructure]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
---
 fs/super.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 5629d06..a56696b 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1140,6 +1140,24 @@ out:
  * Syncs the super to make sure the filesystem is consistent and calls the fs's
  * freeze_fs.  Subsequent calls to this without first thawing the fs will return
  * -EBUSY.
+ *
+ * During this function, sb->s_frozen goes through these values:
+ *
+ * SB_UNFROZEN: File system is normal, all writes progress as usual.
+ *
+ * SB_FREEZE_WRITE: The file system is in the process of being frozen
+ * and any remaining out-standing writes are being synced.  Writes
+ * that complete in-process writes should be permitted but new ones
+ * should be blocked.
+ *
+ * SB_FREEZE_TRANS: The file system is frozen.  The ->freeze_fs and
+ * ->unfreeze_fs ops are the only operations permitted to write to the
+ * file system in this state.
+ *
+ * sb->s_frozen is protected by sb->s_umount.  Additionally,
+ * SB_FREEZE_WRITE is only temporarily set during freeze/thaw while
+ * holding sb->s_umount for writing, so any other callers holding
+ * sb->s_umount will never see this state.
  */
 int freeze_super(struct super_block *sb)
 {
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 7/7] Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
  2011-12-08 18:04 [PATCH v2 0/7] fix s_umount thaw/write and journal deadlock Kamal Mostafa
                   ` (5 preceding siblings ...)
  2011-12-08 18:04 ` [PATCH v2 6/7] VFS: Document s_frozen state through freeze_super Kamal Mostafa
@ 2011-12-08 18:04 ` Kamal Mostafa
  2012-01-06  0:36   ` Jan Kara
  6 siblings, 1 reply; 27+ messages in thread
From: Kamal Mostafa @ 2011-12-08 18:04 UTC (permalink / raw)
  To: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso
  Cc: linux-doc, linux-ext4, linux-fsdevel, linux-kernel,
	Surbhi Palande, Valerie Aurora, Kamal Mostafa,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka

From: Valerie Aurora <val@vaaconsulting.com>

freeze_fs/unfreeze_fs ops are called with s_umount held for write, not read.

BugLink: https://bugs.launchpad.net/bugs/897421
Signed-off-by: Valerie Aurora <val@vaaconsulting.com>
Cc: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/filesystems/Locking |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index d819ba1..7e46a94 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -134,8 +134,8 @@ evict_inode:
 put_super:		write
 write_super:		read
 sync_fs:		read
-freeze_fs:		read
-unfreeze_fs:		read
+freeze_fs:		write
+unfreeze_fs:		write
 statfs:			maybe(read)	(see below)
 remount_fs:		write
 umount_begin:		no
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 4/7] VFS: Rename and refactor writeback_inodes_sb_if_idle
  2011-12-08 18:04 ` [PATCH v2 4/7] VFS: Rename and refactor writeback_inodes_sb_if_idle Kamal Mostafa
@ 2011-12-13  3:34   ` Miao Xie
  2011-12-15  7:10     ` Miao Xie
  2011-12-16 20:48     ` Kamal Mostafa
  2012-01-06  0:33   ` Jan Kara
  1 sibling, 2 replies; 27+ messages in thread
From: Miao Xie @ 2011-12-13  3:34 UTC (permalink / raw)
  To: Kamal Mostafa
  Cc: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso, linux-doc, linux-ext4, linux-fsdevel,
	linux-kernel, Surbhi Palande, Valerie Aurora,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka

On thu, 8 Dec 2011 10:04:34 -0800, Kamal Mostafa wrote:
>  /**
> - * writeback_inodes_sb_if_idle	-	start writeback if none underway
> + * try_to_writeback_inodes_sb_nr	-	start writeback if none underway
>   * @sb: the superblock
>   * @nr: the number of pages to write
>   *
>   * Invoke writeback_inodes_sb if no writeback is currently underway.
>   * Returns 1 if writeback was started, 0 if not.
>   */
> -int writeback_inodes_sb_nr_if_idle(struct super_block *sb,
> +int try_to_writeback_inodes_sb_nr(struct super_block *sb,
>  				   unsigned long nr,
>  				   enum wb_reason reason)
>  {
>  	if (!writeback_in_progress(sb->s_bdi)) {
>  		down_read(&sb->s_umount);
> -		writeback_inodes_sb_nr(sb, nr, reason);
> +		if (nr == 0)
> +			writeback_inodes_sb(sb, reason);
> +		else
> +			writeback_inodes_sb_nr(sb, nr, reason);
>  		up_read(&sb->s_umount);
>  		return 1;
>  	} else
>  		return 0;

The comment said "Returns 1 if writeback was started", so if writeback_in_progress()
return true, I think this function also should return 1.

BTW: Does anyone know when this patchset will be merged into the main tree?

Thanks
Miao

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 4/7] VFS: Rename and refactor writeback_inodes_sb_if_idle
  2011-12-13  3:34   ` Miao Xie
@ 2011-12-15  7:10     ` Miao Xie
  2011-12-16 20:48     ` Kamal Mostafa
  1 sibling, 0 replies; 27+ messages in thread
From: Miao Xie @ 2011-12-15  7:10 UTC (permalink / raw)
  To: Kamal Mostafa
  Cc: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso, linux-doc, linux-ext4, linux-fsdevel,
	linux-kernel, Surbhi Palande, Valerie Aurora,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka

Ping...

On tue, 13 Dec 2011 11:34:05 +0800, Miao Xie wrote:
> On thu, 8 Dec 2011 10:04:34 -0800, Kamal Mostafa wrote:
>>  /**
>> - * writeback_inodes_sb_if_idle	-	start writeback if none underway
>> + * try_to_writeback_inodes_sb_nr	-	start writeback if none underway
>>   * @sb: the superblock
>>   * @nr: the number of pages to write
>>   *
>>   * Invoke writeback_inodes_sb if no writeback is currently underway.
>>   * Returns 1 if writeback was started, 0 if not.
>>   */
>> -int writeback_inodes_sb_nr_if_idle(struct super_block *sb,
>> +int try_to_writeback_inodes_sb_nr(struct super_block *sb,
>>  				   unsigned long nr,
>>  				   enum wb_reason reason)
>>  {
>>  	if (!writeback_in_progress(sb->s_bdi)) {
>>  		down_read(&sb->s_umount);
>> -		writeback_inodes_sb_nr(sb, nr, reason);
>> +		if (nr == 0)
>> +			writeback_inodes_sb(sb, reason);
>> +		else
>> +			writeback_inodes_sb_nr(sb, nr, reason);
>>  		up_read(&sb->s_umount);
>>  		return 1;
>>  	} else
>>  		return 0;
> 
> The comment said "Returns 1 if writeback was started", so if writeback_in_progress()
> return true, I think this function also should return 1.
> 
> BTW: Does anyone know when this patchset will be merged into the main tree?
> 
> Thanks
> Miao
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 4/7] VFS: Rename and refactor writeback_inodes_sb_if_idle
  2011-12-13  3:34   ` Miao Xie
  2011-12-15  7:10     ` Miao Xie
@ 2011-12-16 20:48     ` Kamal Mostafa
  1 sibling, 0 replies; 27+ messages in thread
From: Kamal Mostafa @ 2011-12-16 20:48 UTC (permalink / raw)
  To: miaox
  Cc: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso, linux-doc, linux-ext4, linux-fsdevel,
	linux-kernel, Surbhi Palande, Valerie Aurora,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka

[-- Attachment #1: Type: text/plain, Size: 1575 bytes --]

On Tue, 2011-12-13 at 11:34 +0800, Miao Xie wrote:
> On thu, 8 Dec 2011 10:04:34 -0800, Kamal Mostafa wrote:
> >  /**
> > - * writeback_inodes_sb_if_idle	-	start writeback if none underway
> > + * try_to_writeback_inodes_sb_nr	-	start writeback if none underway
> >   * @sb: the superblock
> >   * @nr: the number of pages to write
> >   *
> >   * Invoke writeback_inodes_sb if no writeback is currently underway.
> >   * Returns 1 if writeback was started, 0 if not.
> >   */
> > -int writeback_inodes_sb_nr_if_idle(struct super_block *sb,
> > +int try_to_writeback_inodes_sb_nr(struct super_block *sb,
> >  				   unsigned long nr,
> >  				   enum wb_reason reason)
> >  {
> >  	if (!writeback_in_progress(sb->s_bdi)) {
> >  		down_read(&sb->s_umount);
> > -		writeback_inodes_sb_nr(sb, nr, reason);
> > +		if (nr == 0)
> > +			writeback_inodes_sb(sb, reason);
> > +		else
> > +			writeback_inodes_sb_nr(sb, nr, reason);
> >  		up_read(&sb->s_umount);
> >  		return 1;
> >  	} else
> >  		return 0;
> 
> The comment said "Returns 1 if writeback was started", so if writeback_in_progress()
> return true, I think this function also should return 1.

My interpretation of that comment is that it will return 1 only if this
call results in a new writeback being started (not if a writeback was
already in progress).

This patch [4/7] intentionally does not introduce any functional
changes.

> BTW: Does anyone know when this patchset will be merged into the main tree?

I also eagerly await the merge of this patch set.

 -Kamal


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/7] Freeze and thaw the journal on ext4 freeze
  2011-12-08 18:04 ` [PATCH v2 2/7] Freeze and thaw the journal on ext4 freeze Kamal Mostafa
@ 2012-01-06  0:32   ` Jan Kara
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2012-01-06  0:32 UTC (permalink / raw)
  To: Kamal Mostafa
  Cc: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso, linux-doc, linux-ext4, linux-fsdevel,
	linux-kernel, Surbhi Palande, Valerie Aurora,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka,
	Surbhi Palande

On Thu 08-12-11 10:04:32, Kamal Mostafa wrote:
> From: Surbhi Palande <surbhi.palande@canonical.com>
> 
> Freeze and thaw the journal when you freeze and thaw the filesystem.
> 
> BugLink: https://bugs.launchpad.net/bugs/897421
> Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>
> Cc: Kamal Mostafa <kamal@canonical.com>
> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
  The patch looks good. You can add:
Acked-by: Jan Kara <jack@suse.cz>

								Honza
> ---
>  fs/ext4/super.c |   13 ++++++-------
>  1 files changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 3858767..751908b 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4330,14 +4330,11 @@ static int ext4_freeze(struct super_block *sb)
>  
>  	journal = EXT4_SB(sb)->s_journal;
>  
> -	/* Now we set up the journal barrier. */
> -	jbd2_journal_lock_updates(journal);
> -
> +	error = jbd2_journal_freeze(journal);
>  	/*
> -	 * Don't clear the needs_recovery flag if we failed to flush
> +	 * Don't clear the needs_recovery flag if we failed to freeze
>  	 * the journal.
>  	 */
> -	error = jbd2_journal_flush(journal);
>  	if (error < 0)
>  		goto out;
>  
> @@ -4345,8 +4342,6 @@ static int ext4_freeze(struct super_block *sb)
>  	EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
>  	error = ext4_commit_super(sb, 1);
>  out:
> -	/* we rely on s_frozen to stop further updates */
> -	jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
>  	return error;
>  }
>  
> @@ -4356,10 +4351,14 @@ out:
>   */
>  static int ext4_unfreeze(struct super_block *sb)
>  {
> +	journal_t *journal;
>  	if (sb->s_flags & MS_RDONLY)
>  		return 0;
>  
>  	lock_super(sb);
> +	journal = EXT4_SB(sb)->s_journal;
> +
> +	jbd2_journal_thaw(journal);
>  	/* Reset the needs_recovery flag before the fs is unlocked. */
>  	EXT4_SET_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
>  	ext4_commit_super(sb, 1);
> -- 
> 1.7.5.4
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 4/7] VFS: Rename and refactor writeback_inodes_sb_if_idle
  2011-12-08 18:04 ` [PATCH v2 4/7] VFS: Rename and refactor writeback_inodes_sb_if_idle Kamal Mostafa
  2011-12-13  3:34   ` Miao Xie
@ 2012-01-06  0:33   ` Jan Kara
  1 sibling, 0 replies; 27+ messages in thread
From: Jan Kara @ 2012-01-06  0:33 UTC (permalink / raw)
  To: Kamal Mostafa
  Cc: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso, linux-doc, linux-ext4, linux-fsdevel,
	linux-kernel, Surbhi Palande, Valerie Aurora,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka

On Thu 08-12-11 10:04:34, Kamal Mostafa wrote:
> Rename writeback_inodes_sb{_nr}_if_idle to try_to_writeback_inodes_sb{_nr}
> and refactor to avoid duplicating logic.
  Looks good to me. You can add:
Acked-by: Jan Kara <jack@suse.cz>

								Honza

> 
> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
> ---
>  fs/btrfs/extent-tree.c    |    2 +-
>  fs/ext4/inode.c           |    2 +-
>  fs/fs-writeback.c         |   25 +++++++++++--------------
>  include/linux/writeback.h |    4 ++--
>  4 files changed, 15 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index f0d5718..6f6fe2b 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3416,7 +3416,7 @@ static int shrink_delalloc(struct btrfs_root *root, u64 to_reclaim,
>  		smp_mb();
>  		nr_pages = min_t(unsigned long, nr_pages,
>  		       root->fs_info->delalloc_bytes >> PAGE_CACHE_SHIFT);
> -		writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages,
> +		try_to_writeback_inodes_sb_nr(root->fs_info->sb, nr_pages,
>  						WB_REASON_FS_FREE_SPACE);
>  
>  		spin_lock(&space_info->lock);
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 848f436..0da75cd 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2373,7 +2373,7 @@ static int ext4_nonda_switch(struct super_block *sb)
>  	 * start pushing delalloc when 1/2 of free blocks are dirty.
>  	 */
>  	if (free_blocks < 2 * dirty_blocks)
> -		writeback_inodes_sb_if_idle(sb, WB_REASON_FS_FREE_SPACE);
> +		try_to_writeback_inodes_sb(sb, WB_REASON_FS_FREE_SPACE);
>  
>  	return 0;
>  }
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index eee01cd..ea89b3f 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1271,45 +1271,42 @@ void writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)
>  EXPORT_SYMBOL(writeback_inodes_sb);
>  
>  /**
> - * writeback_inodes_sb_if_idle	-	start writeback if none underway
> + * try_to_writeback_inodes_sb	-	start writeback if none underway
>   * @sb: the superblock
>   *
>   * Invoke writeback_inodes_sb if no writeback is currently underway.
>   * Returns 1 if writeback was started, 0 if not.
>   */
> -int writeback_inodes_sb_if_idle(struct super_block *sb, enum wb_reason reason)
> +int try_to_writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)
>  {
> -	if (!writeback_in_progress(sb->s_bdi)) {
> -		down_read(&sb->s_umount);
> -		writeback_inodes_sb(sb, reason);
> -		up_read(&sb->s_umount);
> -		return 1;
> -	} else
> -		return 0;
> +	return try_to_writeback_inodes_sb_nr(sb, 0, reason);
>  }
> -EXPORT_SYMBOL(writeback_inodes_sb_if_idle);
> +EXPORT_SYMBOL(try_to_writeback_inodes_sb);
>  
>  /**
> - * writeback_inodes_sb_if_idle	-	start writeback if none underway
> + * try_to_writeback_inodes_sb_nr	-	start writeback if none underway
>   * @sb: the superblock
>   * @nr: the number of pages to write
>   *
>   * Invoke writeback_inodes_sb if no writeback is currently underway.
>   * Returns 1 if writeback was started, 0 if not.
>   */
> -int writeback_inodes_sb_nr_if_idle(struct super_block *sb,
> +int try_to_writeback_inodes_sb_nr(struct super_block *sb,
>  				   unsigned long nr,
>  				   enum wb_reason reason)
>  {
>  	if (!writeback_in_progress(sb->s_bdi)) {
>  		down_read(&sb->s_umount);
> -		writeback_inodes_sb_nr(sb, nr, reason);
> +		if (nr == 0)
> +			writeback_inodes_sb(sb, reason);
> +		else
> +			writeback_inodes_sb_nr(sb, nr, reason);
>  		up_read(&sb->s_umount);
>  		return 1;
>  	} else
>  		return 0;
>  }
> -EXPORT_SYMBOL(writeback_inodes_sb_nr_if_idle);
> +EXPORT_SYMBOL(try_to_writeback_inodes_sb_nr);
>  
>  /**
>   * sync_inodes_sb	-	sync sb inode pages
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index a378c29..e824225 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -89,8 +89,8 @@ int inode_wait(void *);
>  void writeback_inodes_sb(struct super_block *, enum wb_reason reason);
>  void writeback_inodes_sb_nr(struct super_block *, unsigned long nr,
>  							enum wb_reason reason);
> -int writeback_inodes_sb_if_idle(struct super_block *, enum wb_reason reason);
> -int writeback_inodes_sb_nr_if_idle(struct super_block *, unsigned long nr,
> +int try_to_writeback_inodes_sb(struct super_block *, enum wb_reason reason);
> +int try_to_writeback_inodes_sb_nr(struct super_block *, unsigned long nr,
>  							enum wb_reason reason);
>  void sync_inodes_sb(struct super_block *);
>  long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages,
> -- 
> 1.7.5.4
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 5/7] VFS: Avoid read-write deadlock in try_to_writeback_inodes_sb
  2011-12-08 18:04 ` [PATCH v2 5/7] VFS: Avoid read-write deadlock in try_to_writeback_inodes_sb Kamal Mostafa
@ 2012-01-06  0:35   ` Jan Kara
  2012-01-11 20:29     ` Kamal Mostafa
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Kara @ 2012-01-06  0:35 UTC (permalink / raw)
  To: Kamal Mostafa
  Cc: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso, linux-doc, linux-ext4, linux-fsdevel,
	linux-kernel, Surbhi Palande, Valerie Aurora,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka

On Thu 08-12-11 10:04:35, Kamal Mostafa wrote:
> From: Valerie Aurora <val@vaaconsulting.com>
> 
> Use trylock in try_to_writeback_inodes_sb to avoid read-write
> deadlocks that could be triggered by freeze.
  Christoph asked about what is the exact deadlock this patch tries to fix.
I don't think you answered that. So can you elaborate please? Is it somehow
connected with the fact that ext4 calls try_to_writeback_inodes_sb() with
i_mutex held?

								Honza

> BugLink: https://bugs.launchpad.net/bugs/897421
> Signed-off-by: Valerie Aurora <val@vaaconsulting.com>
> Cc: Kamal Mostafa <kamal@canonical.com>
> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
> [kamal@canonical.com: patch restructure]
> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
> ---
>  fs/fs-writeback.c |   13 ++++++++-----
>  1 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index ea89b3f..3a80f1b 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1274,8 +1274,9 @@ EXPORT_SYMBOL(writeback_inodes_sb);
>   * try_to_writeback_inodes_sb	-	start writeback if none underway
>   * @sb: the superblock
>   *
> - * Invoke writeback_inodes_sb if no writeback is currently underway.
> - * Returns 1 if writeback was started, 0 if not.
> + * Invoke writeback_inodes_sb if no writeback is currently underway
> + * and no one else holds the s_umount lock.  Returns 1 if writeback
> + * was started, 0 if not.
>   */
>  int try_to_writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)
>  {
> @@ -1288,15 +1289,17 @@ EXPORT_SYMBOL(try_to_writeback_inodes_sb);
>   * @sb: the superblock
>   * @nr: the number of pages to write
>   *
> - * Invoke writeback_inodes_sb if no writeback is currently underway.
> - * Returns 1 if writeback was started, 0 if not.
> + * Invoke writeback_inodes_sb if no writeback is currently underway
> + * and no one else holds the s_umount lock.  Returns 1 if writeback
> + * was started, 0 if not.
>   */
>  int try_to_writeback_inodes_sb_nr(struct super_block *sb,
>  				   unsigned long nr,
>  				   enum wb_reason reason)
>  {
>  	if (!writeback_in_progress(sb->s_bdi)) {
> -		down_read(&sb->s_umount);
> +		if (!down_read_trylock(&sb->s_umount))
> +		    return 0;
>  		if (nr == 0)
>  			writeback_inodes_sb(sb, reason);
>  		else
> -- 
> 1.7.5.4
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 6/7] VFS: Document s_frozen state through freeze_super
  2011-12-08 18:04 ` [PATCH v2 6/7] VFS: Document s_frozen state through freeze_super Kamal Mostafa
@ 2012-01-06  0:36   ` Jan Kara
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2012-01-06  0:36 UTC (permalink / raw)
  To: Kamal Mostafa
  Cc: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso, linux-doc, linux-ext4, linux-fsdevel,
	linux-kernel, Surbhi Palande, Valerie Aurora,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka

On Thu 08-12-11 10:04:36, Kamal Mostafa wrote:
> From: Valerie Aurora <val@vaaconsulting.com>
  Looks good. You can add:
Acked-by: Jan Kara <jack@suse.cz>

								Honza

> BugLink: https://bugs.launchpad.net/bugs/897421
> Signed-off-by: Valerie Aurora <val@vaaconsulting.com>
> Cc: Kamal Mostafa <kamal@canonical.com>
> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
> [kamal@canonical.com: patch restructure]
> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
> ---
>  fs/super.c |   18 ++++++++++++++++++
>  1 files changed, 18 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index 5629d06..a56696b 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1140,6 +1140,24 @@ out:
>   * Syncs the super to make sure the filesystem is consistent and calls the fs's
>   * freeze_fs.  Subsequent calls to this without first thawing the fs will return
>   * -EBUSY.
> + *
> + * During this function, sb->s_frozen goes through these values:
> + *
> + * SB_UNFROZEN: File system is normal, all writes progress as usual.
> + *
> + * SB_FREEZE_WRITE: The file system is in the process of being frozen
> + * and any remaining out-standing writes are being synced.  Writes
> + * that complete in-process writes should be permitted but new ones
> + * should be blocked.
> + *
> + * SB_FREEZE_TRANS: The file system is frozen.  The ->freeze_fs and
> + * ->unfreeze_fs ops are the only operations permitted to write to the
> + * file system in this state.
> + *
> + * sb->s_frozen is protected by sb->s_umount.  Additionally,
> + * SB_FREEZE_WRITE is only temporarily set during freeze/thaw while
> + * holding sb->s_umount for writing, so any other callers holding
> + * sb->s_umount will never see this state.
>   */
>  int freeze_super(struct super_block *sb)
>  {
> -- 
> 1.7.5.4
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 7/7] Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
  2011-12-08 18:04 ` [PATCH v2 7/7] Documentation: Correct s_umount state for freeze_fs/unfreeze_fs Kamal Mostafa
@ 2012-01-06  0:36   ` Jan Kara
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2012-01-06  0:36 UTC (permalink / raw)
  To: Kamal Mostafa
  Cc: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso, linux-doc, linux-ext4, linux-fsdevel,
	linux-kernel, Surbhi Palande, Valerie Aurora,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka

On Thu 08-12-11 10:04:37, Kamal Mostafa wrote:
> From: Valerie Aurora <val@vaaconsulting.com>
> 
> freeze_fs/unfreeze_fs ops are called with s_umount held for write, not read.
  Looks good. You can add:
Acked-by: Jan Kara <jack@suse.cz>

								Honza
> 
> BugLink: https://bugs.launchpad.net/bugs/897421
> Signed-off-by: Valerie Aurora <val@vaaconsulting.com>
> Cc: Kamal Mostafa <kamal@canonical.com>
> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  Documentation/filesystems/Locking |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
> index d819ba1..7e46a94 100644
> --- a/Documentation/filesystems/Locking
> +++ b/Documentation/filesystems/Locking
> @@ -134,8 +134,8 @@ evict_inode:
>  put_super:		write
>  write_super:		read
>  sync_fs:		read
> -freeze_fs:		read
> -unfreeze_fs:		read
> +freeze_fs:		write
> +unfreeze_fs:		write
>  statfs:			maybe(read)	(see below)
>  remount_fs:		write
>  umount_begin:		no
> -- 
> 1.7.5.4
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 3/7] VFS: Fix s_umount thaw/write deadlock
  2011-12-08 18:04 ` [PATCH v2 3/7] VFS: Fix s_umount thaw/write deadlock Kamal Mostafa
@ 2012-01-06  1:50   ` Jan Kara
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2012-01-06  1:50 UTC (permalink / raw)
  To: Kamal Mostafa
  Cc: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso, linux-doc, linux-ext4, linux-fsdevel,
	linux-kernel, Surbhi Palande, Valerie Aurora,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka

  Hello,

On Thu 08-12-11 10:04:33, Kamal Mostafa wrote:
> From: Valerie Aurora <val@vaaconsulting.com>
> 
> File system freeze/thaw require the superblock's s_umount lock.
> However, we write to the file system while holding the s_umount lock
> in several places (e.g., writeback and quotas).  Any of these can
> block on a frozen file system while holding s_umount, preventing any
> later thaw from occurring, since thaw requires s_umount.
> 
> The solution is to add a check, vfs_is_frozen(), to all code paths
> that write while holding sb->s_umount and return without writing if it
> is true.
  Mikulas Patocka correctly pointed out (in thread starting by
https://lkml.org/lkml/2011/11/25/169) that skipping frozen filesystem is
currently actually wrong for filesystems such as ext2. These filesystems
cannot stop modifications from happening and thus sync must not skip them
even when they are marked as frozen. That complicates the solution of the
deadlock you observe.

Another issue is that even with ext4 current freezing code can leave dirty
data on frozen filesystem. Consider the following race:
  Thread 1				  Thread 2
freeze_super()				__generic_file_aio_write()
  ...					  vfs_check_frozen(sb)
  sb->s_frozen = SB_FREEZE_WRITE;	  ...
  sync_filesystem(sb);
					  now we go and write to the fs
  sb->s_frozen = SB_FREEZE_TRANS;

Your patches would just hide this race (which I can actually trigger rather
easily in my testing).

Above two issues make me think more about how to really avoid having any
dirty bits set on frozen filesystem. Then we shouldn't need this patch at
all. The trouble is that race like above can really happen with any
operation modifying the filesystem so it's not really easy to fix. I'll
write email about that tomorrow...

								Honza

> BugLink: https://bugs.launchpad.net/bugs/897421
> Signed-off-by: Valerie Aurora <val@vaaconsulting.com>
> Cc: Kamal Mostafa <kamal@canonical.com>
> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
> [kamal@canonical.com: minor changes; patch restructure]
> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
> ---
>  fs/fs-writeback.c  |    8 ++++++++
>  fs/quota/quota.c   |   21 ++++++++++++++++++++-
>  fs/super.c         |    8 ++++++++
>  fs/sync.c          |    4 ++--
>  include/linux/fs.h |    7 ++++++-
>  5 files changed, 44 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 73c3992..eee01cd 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -554,6 +554,14 @@ static long writeback_sb_inodes(struct super_block *sb,
>  	while (!list_empty(&wb->b_io)) {
>  		struct inode *inode = wb_inode(wb->b_io.prev);
>  
> +		if (inode->i_sb == sb && vfs_is_frozen(sb)) {
> +			/*
> +			 * Inode is on a frozen superblock; skip it for now.
> +			 */
> +			redirty_tail(inode, wb);
> +			continue;
> +		}
> +
>  		if (inode->i_sb != sb) {
>  			if (work->sb) {
>  				/*
> diff --git a/fs/quota/quota.c b/fs/quota/quota.c
> index 35f4b0e..1d770f2 100644
> --- a/fs/quota/quota.c
> +++ b/fs/quota/quota.c
> @@ -47,7 +47,7 @@ static int check_quotactl_permission(struct super_block *sb, int type, int cmd,
>  
>  static void quota_sync_one(struct super_block *sb, void *arg)
>  {
> -	if (sb->s_qcop && sb->s_qcop->quota_sync)
> +	if (sb->s_qcop && sb->s_qcop->quota_sync && !vfs_is_frozen(sb))
>  		sb->s_qcop->quota_sync(sb, *(int *)arg, 1);
>  }
>  
> @@ -368,8 +368,27 @@ SYSCALL_DEFINE4(quotactl, unsigned int, cmd, const char __user *, special,
>  		goto out;
>  	}
>  
> +	/*
> +	 * If the fs is frozen, only allow read-only quota subcmds.
> +	 */
> +	if (vfs_is_frozen(sb)) {
> +		switch (cmd) {
> +			case Q_GETFMT:
> +			case Q_GETINFO:
> +			case Q_XGETQSTAT:
> +				ret = 0;
> +				break;
> +			default:
> +				ret = -EBUSY;
> +				break;
> +		}
> +		if ( ret != 0 )
> +		    goto out_drop_super;
> +	}
> +
>  	ret = do_quotactl(sb, type, cmds, id, addr, pathp);
>  
> +out_drop_super:
>  	drop_super(sb);
>  out:
>  	if (pathp && !IS_ERR(pathp))
> diff --git a/fs/super.c b/fs/super.c
> index afd0f1a..5629d06 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -526,6 +526,10 @@ void sync_supers(void)
>   *
>   *	Scans the superblock list and calls given function, passing it
>   *	locked superblock and given argument.
> + *
> + *	Note: If @f is going to write to the file system, it must first
> + *	check if the file system is frozen (via vfs_is_frozen(sb)) and
> + *	refuse to write if so.
>   */
>  void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
>  {
> @@ -595,6 +599,10 @@ EXPORT_SYMBOL(iterate_supers_type);
>   *	
>   *	Scans the superblock list and finds the superblock of the file system
>   *	mounted on the device given. %NULL is returned if no match is found.
> + *
> + *	Note: If caller is going to write to the superblock, it must first
> + *	check if the file system is frozen (via vfs_is_frozen(sb)) and
> + *	refuse to write if so.
>   */
>  
>  struct super_block *get_super(struct block_device *bdev)
> diff --git a/fs/sync.c b/fs/sync.c
> index 101b8ef..e8166db 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -68,7 +68,7 @@ int sync_filesystem(struct super_block *sb)
>  	/*
>  	 * No point in syncing out anything if the filesystem is read-only.
>  	 */
> -	if (sb->s_flags & MS_RDONLY)
> +	if ((sb->s_flags & MS_RDONLY) || vfs_is_frozen(sb))
>  		return 0;
>  
>  	ret = __sync_filesystem(sb, 0);
> @@ -80,7 +80,7 @@ EXPORT_SYMBOL_GPL(sync_filesystem);
>  
>  static void sync_one_sb(struct super_block *sb, void *arg)
>  {
> -	if (!(sb->s_flags & MS_RDONLY))
> +	if (!(sb->s_flags & MS_RDONLY) && !vfs_is_frozen(sb))
>  		__sync_filesystem(sb, *(int *)arg);
>  }
>  /*
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 019dc55..ec33b4c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1490,7 +1490,7 @@ extern void prune_dcache_sb(struct super_block *sb, int nr_to_scan);
>  extern struct timespec current_fs_time(struct super_block *sb);
>  
>  /*
> - * Snapshotting support.
> + * Snapshotting support.  See freeze_super() for documentation.
>   */
>  enum {
>  	SB_UNFROZEN = 0,
> @@ -1501,6 +1501,11 @@ enum {
>  #define vfs_check_frozen(sb, level) \
>  	wait_event((sb)->s_wait_unfrozen, ((sb)->s_frozen < (level)))
>  
> +static inline int vfs_is_frozen(struct super_block *sb)
> +{
> +	return sb->s_frozen == SB_FREEZE_TRANS;
> +}
> +
>  /*
>   * until VFS tracks user namespaces for inodes, just make all files
>   * belong to init_user_ns
> -- 
> 1.7.5.4
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal
  2011-12-08 18:04 ` [PATCH v2 1/7] Adding support to freeze and unfreeze a journal Kamal Mostafa
@ 2012-01-10 20:20   ` Eric Sandeen
  2012-01-10 21:31     ` Jan Kara
  2012-01-10 22:00     ` Surbhi Palande
  0 siblings, 2 replies; 27+ messages in thread
From: Eric Sandeen @ 2012-01-10 20:20 UTC (permalink / raw)
  To: Kamal Mostafa
  Cc: Jan Kara, Alexander Viro, Andreas Dilger, Matthew Wilcox,
	Randy Dunlap, Theodore Tso, linux-doc, linux-ext4, linux-fsdevel,
	linux-kernel, Surbhi Palande, Valerie Aurora,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka,
	Surbhi Palande

On 12/8/11 12:04 PM, Kamal Mostafa wrote:
> From: Surbhi Palande <surbhi.palande@canonical.com>
> 
> The journal should be frozen when a filesystem freezes. What this means is
> that until the filesystem is thawed again, no new transactions should be
> accepted by the journal. When the filesystem thaws, inturn it should thaw the
> journal and this should allow the journal to resume accepting new
> transactions. While the the filesystem has frozen the journal, the clients of
> the journal on calling jbd2_journal_start() will sleep on a wait queue.
> Thawing the journal will wake up the sleeping clients and journalling can
> progress normally.
> 
> An example of the race condition that can happen without this patch is as
> follows:
> 
> Say the filesystem is thawed when we begin. Let tx be the time at unit x
> 
> P1: Process doing an aio write
> t1) ext4_file_write()
>   t2) generic_file_aio_write()
>     t3) __generic_file_aio_write()
>       // filesystem is not frozen, so we do not block in the next check.
>       t4) vfs_check_frozen()
>       t5) generic_write_checks()
> ----------------- Prempted------------------
> 
> P2: Process that does filesystem freeze
> 
> t6) freeze_super()
>   t7) sync_filesystem()
>   t8) sync_blockdev()
>   t9) sb->s_op->freeze_fs() (= ext4_freeze)
>     t10) jbd2_journal_lock_updates()
>     t11) jbd2_journal_flush()
>     // Need to unlock the journal before returning to user space.
>     t12) jbd2_journal_unlock_updates()
>     // Journal is unlocked and so we can start accepting new transactions now.
> 
> // freezing process completes execution. Page cache is now clean and should
> // remain clean till the filesystem is frozen.
> --------------------------------------------
> 
> P1: writing process gets the control back
> t13) generic_file_buffered_write()
>   t14) generic_perform_write()
>     t15) a_ops->write_begin() (= ext4_write_begin)
>       t16) ext4_journal_start()
> 	// New handle is started. We do not block here! Write continues
> 	// dirtying the page cache while the filesystem is frozen!

Hrm let me think through this a little more; we actually do:

t16) ext4_journal_start()
  t17) ext4_journal_start_sb()
    t18) handle = ext4_journal_current_handle();
    t19) if (!handle) vfs_check_frozen()
    t20) ... jbd2_journal_start()

So actually we *do* block new handles, but let *existing* ones
continue (see commits 6b0310fbf087ad6e9e3b8392adca97cd77184084
and be4f27d324e8ddd57cc0d4d604fe85ee0425cba9)

So your assertion that a new handle is started is incorrect
in general, isn't it?  So then does the fix seem necessary?
Or, at least, in the fashion below - maybe we need to just make
sure all started handles complete before the unlock_updates?
Or am I missing something...?

-Eric

> 
> BugLink: https://bugs.launchpad.net/bugs/897421
> Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>
> Acked-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
> Cc: Kamal Mostafa <kamal@canonical.com>
> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
> ---
>  fs/jbd2/journal.c     |    1 +
>  fs/jbd2/transaction.c |   42 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/jbd2.h  |    7 +++++++
>  3 files changed, 50 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 0fa0123..f0170cc 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -894,6 +894,7 @@ static journal_t * journal_init_common (void)
>  	init_waitqueue_head(&journal->j_wait_checkpoint);
>  	init_waitqueue_head(&journal->j_wait_commit);
>  	init_waitqueue_head(&journal->j_wait_updates);
> +	init_waitqueue_head(&journal->j_wait_frozen);
>  	mutex_init(&journal->j_barrier);
>  	mutex_init(&journal->j_checkpoint_mutex);
>  	spin_lock_init(&journal->j_revoke_lock);
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index a0e41a4..340ee35 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -173,6 +173,17 @@ repeat:
>  				journal->j_barrier_count == 0);
>  		goto repeat;
>  	}
> +	/* Don't let a new handle start when a journal is frozen.
> +	 * jbd2_journal_freeze calls jbd2_journal_unlock_updates() only after
> +	 * the j_flags indicate that the journal is frozen. So if the
> +	 * j_barrier_count is 0, then check if this was made 0 by the freezing
> +	 * process
> +	 */
> +	if (journal->j_flags & JBD2_FROZEN) {
> +		read_unlock(&journal->j_state_lock);
> +		wait_event(journal->j_wait_frozen, (journal->j_flags & JBD2_FROZEN));
> +		goto repeat;
> +	}
>  
>  	if (!journal->j_running_transaction) {
>  		read_unlock(&journal->j_state_lock);
> @@ -492,6 +503,37 @@ int jbd2_journal_restart(handle_t *handle, int nblocks)
>  }
>  EXPORT_SYMBOL(jbd2_journal_restart);
>  
> +int jbd2_journal_freeze(journal_t *journal)
> +{
> +	int error = 0;
> +	/* Now we set up the journal barrier. */
> +	jbd2_journal_lock_updates(journal);
> +
> +	/*
> +	 * Don't clear the needs_recovery flag if we failed to flush
> +	 * the journal.
> +	 */
> +	error = jbd2_journal_flush(journal);
> +	if (error >= 0) {
> +		write_lock(&journal->j_state_lock);
> +		journal->j_flags |= JBD2_FROZEN;
> +		write_unlock(&journal->j_state_lock);
> +	}
> +	jbd2_journal_unlock_updates(journal);
> +	return error;
> +}
> +EXPORT_SYMBOL(jbd2_journal_freeze);
> +
> +void jbd2_journal_thaw(journal_t * journal)
> +{
> +	write_lock(&journal->j_state_lock);
> +	journal->j_flags &= ~JBD2_FROZEN;
> +	write_unlock(&journal->j_state_lock);
> +	wake_up(&journal->j_wait_frozen);
> +}
> +EXPORT_SYMBOL(jbd2_journal_thaw);
> +
> +
>  /**
>   * void jbd2_journal_lock_updates () - establish a transaction barrier.
>   * @journal:  Journal to establish a barrier on.
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 2092ea2..bfa0752 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -658,6 +658,7 @@ jbd2_time_diff(unsigned long start, unsigned long end)
>   * @j_wait_checkpoint:  Wait queue to trigger checkpointing
>   * @j_wait_commit: Wait queue to trigger commit
>   * @j_wait_updates: Wait queue to wait for updates to complete
> + * @j_wait_frozen: Wait queue to wait for journal to thaw
>   * @j_checkpoint_mutex: Mutex for locking against concurrent checkpoints
>   * @j_head: Journal head - identifies the first unused block in the journal
>   * @j_tail: Journal tail - identifies the oldest still-used block in the
> @@ -775,6 +776,9 @@ struct journal_s
>  	/* Wait queue to wait for updates to complete */
>  	wait_queue_head_t	j_wait_updates;
>  
> +	/* Wait queue to wait for journal to thaw*/
> +	wait_queue_head_t	j_wait_frozen;
> +
>  	/* Semaphore for locking against concurrent checkpoints */
>  	struct mutex		j_checkpoint_mutex;
>  
> @@ -953,6 +957,7 @@ struct journal_s
>  #define JBD2_ABORT_ON_SYNCDATA_ERR	0x040	/* Abort the journal on file
>  						 * data write error in ordered
>  						 * mode */
> +#define JBD2_FROZEN	0x080 /* Journal thread frozen along with filesystem */
>  
>  /*
>   * Function declarations for the journaling transaction and buffer
> @@ -1060,6 +1065,8 @@ extern void	 jbd2_journal_invalidatepage(journal_t *,
>  				struct page *, unsigned long);
>  extern int	 jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
>  extern int	 jbd2_journal_stop(handle_t *);
> +extern int	 jbd2_journal_freeze(journal_t *);
> +extern void	 jbd2_journal_thaw(journal_t *);
>  extern int	 jbd2_journal_flush (journal_t *);
>  extern void	 jbd2_journal_lock_updates (journal_t *);
>  extern void	 jbd2_journal_unlock_updates (journal_t *);


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal
  2012-01-10 20:20   ` Eric Sandeen
@ 2012-01-10 21:31     ` Jan Kara
  2012-01-10 21:55       ` Surbhi Palande
                         ` (2 more replies)
  2012-01-10 22:00     ` Surbhi Palande
  1 sibling, 3 replies; 27+ messages in thread
From: Jan Kara @ 2012-01-10 21:31 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Kamal Mostafa, Jan Kara, Alexander Viro, Andreas Dilger,
	Matthew Wilcox, Randy Dunlap, Theodore Tso, linux-doc,
	linux-ext4, linux-fsdevel, linux-kernel, Surbhi Palande,
	Valerie Aurora, Christopher Chaltain, Peter M. Petrakis,
	Mikulas Patocka, Surbhi Palande

On Tue 10-01-12 14:20:23, Eric Sandeen wrote:
> On 12/8/11 12:04 PM, Kamal Mostafa wrote:
> > From: Surbhi Palande <surbhi.palande@canonical.com>
> > 
> > The journal should be frozen when a filesystem freezes. What this means is
> > that until the filesystem is thawed again, no new transactions should be
> > accepted by the journal. When the filesystem thaws, inturn it should thaw the
> > journal and this should allow the journal to resume accepting new
> > transactions. While the the filesystem has frozen the journal, the clients of
> > the journal on calling jbd2_journal_start() will sleep on a wait queue.
> > Thawing the journal will wake up the sleeping clients and journalling can
> > progress normally.
> > 
> > An example of the race condition that can happen without this patch is as
> > follows:
> > 
> > Say the filesystem is thawed when we begin. Let tx be the time at unit x
> > 
> > P1: Process doing an aio write
> > t1) ext4_file_write()
> >   t2) generic_file_aio_write()
> >     t3) __generic_file_aio_write()
> >       // filesystem is not frozen, so we do not block in the next check.
> >       t4) vfs_check_frozen()
> >       t5) generic_write_checks()
> > ----------------- Prempted------------------
> > 
> > P2: Process that does filesystem freeze
> > 
> > t6) freeze_super()
> >   t7) sync_filesystem()
> >   t8) sync_blockdev()
> >   t9) sb->s_op->freeze_fs() (= ext4_freeze)
> >     t10) jbd2_journal_lock_updates()
> >     t11) jbd2_journal_flush()
> >     // Need to unlock the journal before returning to user space.
> >     t12) jbd2_journal_unlock_updates()
> >     // Journal is unlocked and so we can start accepting new transactions now.
> > 
> > // freezing process completes execution. Page cache is now clean and should
> > // remain clean till the filesystem is frozen.
> > --------------------------------------------
> > 
> > P1: writing process gets the control back
> > t13) generic_file_buffered_write()
> >   t14) generic_perform_write()
> >     t15) a_ops->write_begin() (= ext4_write_begin)
> >       t16) ext4_journal_start()
> > 	// New handle is started. We do not block here! Write continues
> > 	// dirtying the page cache while the filesystem is frozen!
> 
> Hrm let me think through this a little more; we actually do:
> 
> t16) ext4_journal_start()
>   t17) ext4_journal_start_sb()
>     t18) handle = ext4_journal_current_handle();
>     t19) if (!handle) vfs_check_frozen()
>     t20) ... jbd2_journal_start()
  Ah, right. I forgot.

> So actually we *do* block new handles, but let *existing* ones
> continue (see commits 6b0310fbf087ad6e9e3b8392adca97cd77184084
> and be4f27d324e8ddd57cc0d4d604fe85ee0425cba9)
> 
> So your assertion that a new handle is started is incorrect
> in general, isn't it?  So then does the fix seem necessary?
> Or, at least, in the fashion below - maybe we need to just make
> sure all started handles complete before the unlock_updates?
> Or am I missing something...?
  Well, the problem with running operations and freezing is more
fundamental I believe. See my email
http://marc.info/?l=linux-fsdevel&m=132585911925796&w=2

So I believe we'll need some better exclusion mechanism already in VFS.

								Honza

> > BugLink: https://bugs.launchpad.net/bugs/897421
> > Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>
> > Acked-by: Jan Kara <jack@suse.cz>
> > Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
> > Cc: Kamal Mostafa <kamal@canonical.com>
> > Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
> > Signed-off-by: Kamal Mostafa <kamal@canonical.com>
> > ---
> >  fs/jbd2/journal.c     |    1 +
> >  fs/jbd2/transaction.c |   42 ++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/jbd2.h  |    7 +++++++
> >  3 files changed, 50 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> > index 0fa0123..f0170cc 100644
> > --- a/fs/jbd2/journal.c
> > +++ b/fs/jbd2/journal.c
> > @@ -894,6 +894,7 @@ static journal_t * journal_init_common (void)
> >  	init_waitqueue_head(&journal->j_wait_checkpoint);
> >  	init_waitqueue_head(&journal->j_wait_commit);
> >  	init_waitqueue_head(&journal->j_wait_updates);
> > +	init_waitqueue_head(&journal->j_wait_frozen);
> >  	mutex_init(&journal->j_barrier);
> >  	mutex_init(&journal->j_checkpoint_mutex);
> >  	spin_lock_init(&journal->j_revoke_lock);
> > diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> > index a0e41a4..340ee35 100644
> > --- a/fs/jbd2/transaction.c
> > +++ b/fs/jbd2/transaction.c
> > @@ -173,6 +173,17 @@ repeat:
> >  				journal->j_barrier_count == 0);
> >  		goto repeat;
> >  	}
> > +	/* Don't let a new handle start when a journal is frozen.
> > +	 * jbd2_journal_freeze calls jbd2_journal_unlock_updates() only after
> > +	 * the j_flags indicate that the journal is frozen. So if the
> > +	 * j_barrier_count is 0, then check if this was made 0 by the freezing
> > +	 * process
> > +	 */
> > +	if (journal->j_flags & JBD2_FROZEN) {
> > +		read_unlock(&journal->j_state_lock);
> > +		wait_event(journal->j_wait_frozen, (journal->j_flags & JBD2_FROZEN));
> > +		goto repeat;
> > +	}
> >  
> >  	if (!journal->j_running_transaction) {
> >  		read_unlock(&journal->j_state_lock);
> > @@ -492,6 +503,37 @@ int jbd2_journal_restart(handle_t *handle, int nblocks)
> >  }
> >  EXPORT_SYMBOL(jbd2_journal_restart);
> >  
> > +int jbd2_journal_freeze(journal_t *journal)
> > +{
> > +	int error = 0;
> > +	/* Now we set up the journal barrier. */
> > +	jbd2_journal_lock_updates(journal);
> > +
> > +	/*
> > +	 * Don't clear the needs_recovery flag if we failed to flush
> > +	 * the journal.
> > +	 */
> > +	error = jbd2_journal_flush(journal);
> > +	if (error >= 0) {
> > +		write_lock(&journal->j_state_lock);
> > +		journal->j_flags |= JBD2_FROZEN;
> > +		write_unlock(&journal->j_state_lock);
> > +	}
> > +	jbd2_journal_unlock_updates(journal);
> > +	return error;
> > +}
> > +EXPORT_SYMBOL(jbd2_journal_freeze);
> > +
> > +void jbd2_journal_thaw(journal_t * journal)
> > +{
> > +	write_lock(&journal->j_state_lock);
> > +	journal->j_flags &= ~JBD2_FROZEN;
> > +	write_unlock(&journal->j_state_lock);
> > +	wake_up(&journal->j_wait_frozen);
> > +}
> > +EXPORT_SYMBOL(jbd2_journal_thaw);
> > +
> > +
> >  /**
> >   * void jbd2_journal_lock_updates () - establish a transaction barrier.
> >   * @journal:  Journal to establish a barrier on.
> > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > index 2092ea2..bfa0752 100644
> > --- a/include/linux/jbd2.h
> > +++ b/include/linux/jbd2.h
> > @@ -658,6 +658,7 @@ jbd2_time_diff(unsigned long start, unsigned long end)
> >   * @j_wait_checkpoint:  Wait queue to trigger checkpointing
> >   * @j_wait_commit: Wait queue to trigger commit
> >   * @j_wait_updates: Wait queue to wait for updates to complete
> > + * @j_wait_frozen: Wait queue to wait for journal to thaw
> >   * @j_checkpoint_mutex: Mutex for locking against concurrent checkpoints
> >   * @j_head: Journal head - identifies the first unused block in the journal
> >   * @j_tail: Journal tail - identifies the oldest still-used block in the
> > @@ -775,6 +776,9 @@ struct journal_s
> >  	/* Wait queue to wait for updates to complete */
> >  	wait_queue_head_t	j_wait_updates;
> >  
> > +	/* Wait queue to wait for journal to thaw*/
> > +	wait_queue_head_t	j_wait_frozen;
> > +
> >  	/* Semaphore for locking against concurrent checkpoints */
> >  	struct mutex		j_checkpoint_mutex;
> >  
> > @@ -953,6 +957,7 @@ struct journal_s
> >  #define JBD2_ABORT_ON_SYNCDATA_ERR	0x040	/* Abort the journal on file
> >  						 * data write error in ordered
> >  						 * mode */
> > +#define JBD2_FROZEN	0x080 /* Journal thread frozen along with filesystem */
> >  
> >  /*
> >   * Function declarations for the journaling transaction and buffer
> > @@ -1060,6 +1065,8 @@ extern void	 jbd2_journal_invalidatepage(journal_t *,
> >  				struct page *, unsigned long);
> >  extern int	 jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
> >  extern int	 jbd2_journal_stop(handle_t *);
> > +extern int	 jbd2_journal_freeze(journal_t *);
> > +extern void	 jbd2_journal_thaw(journal_t *);
> >  extern int	 jbd2_journal_flush (journal_t *);
> >  extern void	 jbd2_journal_lock_updates (journal_t *);
> >  extern void	 jbd2_journal_unlock_updates (journal_t *);
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal
  2012-01-10 21:31     ` Jan Kara
@ 2012-01-10 21:55       ` Surbhi Palande
       [not found]       ` <CAMBkX3eVeKSmEzmYTe6Oe_D6kAMQTL5LYoi1-Axj7CcrM85Pow@mail.gmail.com>
  2012-01-11  3:08       ` Eric Sandeen
  2 siblings, 0 replies; 27+ messages in thread
From: Surbhi Palande @ 2012-01-10 21:55 UTC (permalink / raw)
  To: Jan Kara
  Cc: Eric Sandeen, Kamal Mostafa, Alexander Viro, Andreas Dilger,
	Matthew Wilcox, Randy Dunlap, Theodore Tso, linux-doc,
	linux-ext4, linux-fsdevel, linux-kernel, Valerie Aurora,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka,
	Surbhi Palande

> > Hrm let me think through this a little more; we actually do:
> >
> > t16) ext4_journal_start()
> >   t17) ext4_journal_start_sb()
> >     t18) handle = ext4_journal_current_handle();
> >     t19) if (!handle) vfs_check_frozen()
> >     t20) ... jbd2_journal_start()
>  Ah, right. I forgot.
>
> > So actually we *do* block new handles, but let *existing* ones
> > continue (see commits 6b0310fbf087ad6e9e3b8392adca97cd77184084
> > and be4f27d324e8ddd57cc0d4d604fe85ee0425cba9)
> >
> > So your assertion that a new handle is started is incorrect
> > in general, isn't it?  So then does the fix seem necessary?
> > Or, at least, in the fashion below - maybe we need to just make
> > sure all started handles complete before the unlock_updates?
> > Or am I missing something...?
>  Well, the problem with running operations and freezing is more
> fundamental I believe. See my email
> http://marc.info/?l=linux-fsdevel&m=132585911925796&w=2
>
> So I believe we'll need some better exclusion mechanism already in VFS.
>
>                                                                Honza
>

If all the write operations were journaled, then this patch would not
allow ext4 filesystem to have any dirty data after its frozen.
(as journal_start() would block).

 I think the only one candidate that creates dirty data without
calling ext4_journal_start() is mmapped?

Regards,
Surbhi.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal
  2012-01-10 20:20   ` Eric Sandeen
  2012-01-10 21:31     ` Jan Kara
@ 2012-01-10 22:00     ` Surbhi Palande
  1 sibling, 0 replies; 27+ messages in thread
From: Surbhi Palande @ 2012-01-10 22:00 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Kamal Mostafa, Jan Kara, Alexander Viro, Andreas Dilger,
	Matthew Wilcox, Randy Dunlap, Theodore Tso, linux-doc,
	linux-ext4, linux-fsdevel, linux-kernel, Valerie Aurora,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka,
	Surbhi Palande

> Hrm let me think through this a little more; we actually do:
>
> t16) ext4_journal_start()
>  t17) ext4_journal_start_sb()
>    t18) handle = ext4_journal_current_handle();
>    t19) if (!handle) vfs_check_frozen()
>    t20) ... jbd2_journal_start()
>
> So actually we *do* block new handles, but let *existing* ones
> continue (see commits 6b0310fbf087ad6e9e3b8392adca97cd77184084
> and be4f27d324e8ddd57cc0d4d604fe85ee0425cba9)
>
> So your assertion that a new handle is started is incorrect
> in general, isn't it?  So then does the fix seem necessary?
> Or, at least, in the fashion below - maybe we need to just make
> sure all started handles complete before the unlock_updates?
> Or am I missing something...?
>
> -Eric

This patch addresses the race that occurs between filesystem freeze
and jbd2_journal_start().

     Task1                                              Task2
t1) vfs_check_frozen()
t2) ---------------------------------------------------------------------
t3)                                                   filesystem is frozen.
t4) jbd2_journal_start()

Without this patch a new handle can be started because of the race.
With this patch, no new journaled transaction can start after the
journal is frozen and hence no new journaled writes can be made.

Regards,
Surbhi



>
>>
>> BugLink: https://bugs.launchpad.net/bugs/897421
>> Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>
>> Acked-by: Jan Kara <jack@suse.cz>
>> Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
>> Cc: Kamal Mostafa <kamal@canonical.com>
>> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
>> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
>> ---
>>  fs/jbd2/journal.c     |    1 +
>>  fs/jbd2/transaction.c |   42 ++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/jbd2.h  |    7 +++++++
>>  3 files changed, 50 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
>> index 0fa0123..f0170cc 100644
>> --- a/fs/jbd2/journal.c
>> +++ b/fs/jbd2/journal.c
>> @@ -894,6 +894,7 @@ static journal_t * journal_init_common (void)
>>       init_waitqueue_head(&journal->j_wait_checkpoint);
>>       init_waitqueue_head(&journal->j_wait_commit);
>>       init_waitqueue_head(&journal->j_wait_updates);
>> +     init_waitqueue_head(&journal->j_wait_frozen);
>>       mutex_init(&journal->j_barrier);
>>       mutex_init(&journal->j_checkpoint_mutex);
>>       spin_lock_init(&journal->j_revoke_lock);
>> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
>> index a0e41a4..340ee35 100644
>> --- a/fs/jbd2/transaction.c
>> +++ b/fs/jbd2/transaction.c
>> @@ -173,6 +173,17 @@ repeat:
>>                               journal->j_barrier_count == 0);
>>               goto repeat;
>>       }
>> +     /* Don't let a new handle start when a journal is frozen.
>> +      * jbd2_journal_freeze calls jbd2_journal_unlock_updates() only after
>> +      * the j_flags indicate that the journal is frozen. So if the
>> +      * j_barrier_count is 0, then check if this was made 0 by the freezing
>> +      * process
>> +      */
>> +     if (journal->j_flags & JBD2_FROZEN) {
>> +             read_unlock(&journal->j_state_lock);
>> +             wait_event(journal->j_wait_frozen, (journal->j_flags & JBD2_FROZEN));
>> +             goto repeat;
>> +     }
>>
>>       if (!journal->j_running_transaction) {
>>               read_unlock(&journal->j_state_lock);
>> @@ -492,6 +503,37 @@ int jbd2_journal_restart(handle_t *handle, int nblocks)
>>  }
>>  EXPORT_SYMBOL(jbd2_journal_restart);
>>
>> +int jbd2_journal_freeze(journal_t *journal)
>> +{
>> +     int error = 0;
>> +     /* Now we set up the journal barrier. */
>> +     jbd2_journal_lock_updates(journal);
>> +
>> +     /*
>> +      * Don't clear the needs_recovery flag if we failed to flush
>> +      * the journal.
>> +      */
>> +     error = jbd2_journal_flush(journal);
>> +     if (error >= 0) {
>> +             write_lock(&journal->j_state_lock);
>> +             journal->j_flags |= JBD2_FROZEN;
>> +             write_unlock(&journal->j_state_lock);
>> +     }
>> +     jbd2_journal_unlock_updates(journal);
>> +     return error;
>> +}
>> +EXPORT_SYMBOL(jbd2_journal_freeze);
>> +
>> +void jbd2_journal_thaw(journal_t * journal)
>> +{
>> +     write_lock(&journal->j_state_lock);
>> +     journal->j_flags &= ~JBD2_FROZEN;
>> +     write_unlock(&journal->j_state_lock);
>> +     wake_up(&journal->j_wait_frozen);
>> +}
>> +EXPORT_SYMBOL(jbd2_journal_thaw);
>> +
>> +
>>  /**
>>   * void jbd2_journal_lock_updates () - establish a transaction barrier.
>>   * @journal:  Journal to establish a barrier on.
>> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
>> index 2092ea2..bfa0752 100644
>> --- a/include/linux/jbd2.h
>> +++ b/include/linux/jbd2.h
>> @@ -658,6 +658,7 @@ jbd2_time_diff(unsigned long start, unsigned long end)
>>   * @j_wait_checkpoint:  Wait queue to trigger checkpointing
>>   * @j_wait_commit: Wait queue to trigger commit
>>   * @j_wait_updates: Wait queue to wait for updates to complete
>> + * @j_wait_frozen: Wait queue to wait for journal to thaw
>>   * @j_checkpoint_mutex: Mutex for locking against concurrent checkpoints
>>   * @j_head: Journal head - identifies the first unused block in the journal
>>   * @j_tail: Journal tail - identifies the oldest still-used block in the
>> @@ -775,6 +776,9 @@ struct journal_s
>>       /* Wait queue to wait for updates to complete */
>>       wait_queue_head_t       j_wait_updates;
>>
>> +     /* Wait queue to wait for journal to thaw*/
>> +     wait_queue_head_t       j_wait_frozen;
>> +
>>       /* Semaphore for locking against concurrent checkpoints */
>>       struct mutex            j_checkpoint_mutex;
>>
>> @@ -953,6 +957,7 @@ struct journal_s
>>  #define JBD2_ABORT_ON_SYNCDATA_ERR   0x040   /* Abort the journal on file
>>                                                * data write error in ordered
>>                                                * mode */
>> +#define JBD2_FROZEN  0x080 /* Journal thread frozen along with filesystem */
>>
>>  /*
>>   * Function declarations for the journaling transaction and buffer
>> @@ -1060,6 +1065,8 @@ extern void      jbd2_journal_invalidatepage(journal_t *,
>>                               struct page *, unsigned long);
>>  extern int    jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
>>  extern int    jbd2_journal_stop(handle_t *);
>> +extern int    jbd2_journal_freeze(journal_t *);
>> +extern void   jbd2_journal_thaw(journal_t *);
>>  extern int    jbd2_journal_flush (journal_t *);
>>  extern void   jbd2_journal_lock_updates (journal_t *);
>>  extern void   jbd2_journal_unlock_updates (journal_t *);
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal
       [not found]       ` <CAMBkX3eVeKSmEzmYTe6Oe_D6kAMQTL5LYoi1-Axj7CcrM85Pow@mail.gmail.com>
@ 2012-01-11  0:04         ` Jan Kara
  2012-01-11  0:13           ` Surbhi Palande
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Kara @ 2012-01-11  0:04 UTC (permalink / raw)
  To: Surbhi Palande
  Cc: Jan Kara, Eric Sandeen, Kamal Mostafa, Alexander Viro,
	Andreas Dilger, Matthew Wilcox, Randy Dunlap, Theodore Tso,
	linux-doc, linux-ext4, linux-fsdevel, linux-kernel,
	Valerie Aurora, Christopher Chaltain, Peter M. Petrakis,
	Mikulas Patocka, Surbhi Palande

On Tue 10-01-12 13:50:22, Surbhi Palande wrote:
> > > Hrm let me think through this a little more; we actually do:
> > >
> > > t16) ext4_journal_start()
> > >   t17) ext4_journal_start_sb()
> > >     t18) handle = ext4_journal_current_handle();
> > >     t19) if (!handle) vfs_check_frozen()
> > >     t20) ... jbd2_journal_start()
> >   Ah, right. I forgot.
> >
> > > So actually we *do* block new handles, but let *existing* ones
> > > continue (see commits 6b0310fbf087ad6e9e3b8392adca97cd77184084
> > > and be4f27d324e8ddd57cc0d4d604fe85ee0425cba9)
> > >
> > > So your assertion that a new handle is started is incorrect
> > > in general, isn't it?  So then does the fix seem necessary?
> > > Or, at least, in the fashion below - maybe we need to just make
> > > sure all started handles complete before the unlock_updates?
> > > Or am I missing something...?
> >   Well, the problem with running operations and freezing is more
> > fundamental I believe. See my email
> > http://marc.info/?l=linux-fsdevel&m=132585911925796&w=2
> >
> > So I believe we'll need some better exclusion mechanism already in VFS.
> >
> >                                                                Honza
> >
> 
> 
> If all the write operations were journaled, then this patch would not allow
> ext4 filesystem to have any dirty data after its frozen.
> (as journal_start() would block).
> 
>  I think the only one candidate that creates dirty data without calling
> ext4_journal_start() is mmapped?
  No, the problem is in any write path. The problem is with operations
that happen during the phase when s_frozen == SB_FREEZE_WRITE. These
operations dirty the filesystem but running sync may easily miss them.
During this phase journal is not frozen so that does not help you in any
way.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal
  2012-01-11  0:04         ` Jan Kara
@ 2012-01-11  0:13           ` Surbhi Palande
  2012-01-11  0:51             ` Jan Kara
  0 siblings, 1 reply; 27+ messages in thread
From: Surbhi Palande @ 2012-01-11  0:13 UTC (permalink / raw)
  To: Jan Kara
  Cc: Eric Sandeen, Kamal Mostafa, Alexander Viro, Andreas Dilger,
	Matthew Wilcox, Randy Dunlap, Theodore Tso, linux-doc,
	linux-ext4, linux-fsdevel, linux-kernel, Valerie Aurora,
	Christopher Chaltain, Peter M. Petrakis, Mikulas Patocka,
	Surbhi Palande

Hi Jan,

>>
>>
>> If all the write operations were journaled, then this patch would not allow
>> ext4 filesystem to have any dirty data after its frozen.
>> (as journal_start() would block).
>>
>>  I think the only one candidate that creates dirty data without calling
>> ext4_journal_start() is mmapped?
>  No, the problem is in any write path. The problem is with operations
> that happen during the phase when s_frozen == SB_FREEZE_WRITE. These
> operations dirty the filesystem but running sync may easily miss them.
> During this phase journal is not frozen so that does not help you in any
> way.
>
>                                                                Honza

Ok! No new transaction can really start after the journal is frozen.
But we can have dirty data after SB_FREEZE_WRITE and before
SB_FREEZE_TRANS.
I agree with you. However, can this be fixed by adding a
sync_filesystem() in freeze_super() after the sb->s_op->freeze_fs() is
over?


So then essentially, when freeze_super() returns, the page cache is clean?

I do definitely agree that the fix is to add a lock for mutual
exclusion between freeze filesystem and writes to a frozen filesystem.

Thanks!

Regards,
Surbhi.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal
  2012-01-11  0:13           ` Surbhi Palande
@ 2012-01-11  0:51             ` Jan Kara
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2012-01-11  0:51 UTC (permalink / raw)
  To: Surbhi Palande
  Cc: Jan Kara, Eric Sandeen, Kamal Mostafa, Alexander Viro,
	Andreas Dilger, Matthew Wilcox, Randy Dunlap, Theodore Tso,
	linux-doc, linux-ext4, linux-fsdevel, linux-kernel,
	Valerie Aurora, Christopher Chaltain, Peter M. Petrakis,
	Mikulas Patocka, Surbhi Palande

On Tue 10-01-12 16:13:46, Surbhi Palande wrote:
> >> If all the write operations were journaled, then this patch would not allow
> >> ext4 filesystem to have any dirty data after its frozen.
> >> (as journal_start() would block).
> >>
> >>  I think the only one candidate that creates dirty data without calling
> >> ext4_journal_start() is mmapped?
> >  No, the problem is in any write path. The problem is with operations
> > that happen during the phase when s_frozen == SB_FREEZE_WRITE. These
> > operations dirty the filesystem but running sync may easily miss them.
> > During this phase journal is not frozen so that does not help you in any
> > way.
> >
> >                                                                Honza
> 
> Ok! No new transaction can really start after the journal is frozen.
> But we can have dirty data after SB_FREEZE_WRITE and before
> SB_FREEZE_TRANS.
> I agree with you. However, can this be fixed by adding a
> sync_filesystem() in freeze_super() after the sb->s_op->freeze_fs() is
> over?
  The problem with this is that to do writeback you need to start a
transaction. So you would deadlock.

> So then essentially, when freeze_super() returns, the page cache is clean?
> 
> I do definitely agree that the fix is to add a lock for mutual
> exclusion between freeze filesystem and writes to a frozen filesystem.
  Yes, that would be the cleanest solution but it's quite some work.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal
  2012-01-10 21:31     ` Jan Kara
  2012-01-10 21:55       ` Surbhi Palande
       [not found]       ` <CAMBkX3eVeKSmEzmYTe6Oe_D6kAMQTL5LYoi1-Axj7CcrM85Pow@mail.gmail.com>
@ 2012-01-11  3:08       ` Eric Sandeen
  2 siblings, 0 replies; 27+ messages in thread
From: Eric Sandeen @ 2012-01-11  3:08 UTC (permalink / raw)
  To: Jan Kara
  Cc: Eric Sandeen, Kamal Mostafa, Alexander Viro, Andreas Dilger,
	Matthew Wilcox, Randy Dunlap, Theodore Tso, linux-doc,
	linux-ext4, linux-fsdevel, linux-kernel, Surbhi Palande,
	Valerie Aurora, Christopher Chaltain, Peter M. Petrakis,
	Mikulas Patocka, Surbhi Palande

On 1/10/12 3:31 PM, Jan Kara wrote:
> On Tue 10-01-12 14:20:23, Eric Sandeen wrote:

<snip>

>> Hrm let me think through this a little more; we actually do:
>>
>> t16) ext4_journal_start()
>>   t17) ext4_journal_start_sb()
>>     t18) handle = ext4_journal_current_handle();
>>     t19) if (!handle) vfs_check_frozen()
>>     t20) ... jbd2_journal_start()
>   Ah, right. I forgot.
> 
>> So actually we *do* block new handles, but let *existing* ones
>> continue (see commits 6b0310fbf087ad6e9e3b8392adca97cd77184084
>> and be4f27d324e8ddd57cc0d4d604fe85ee0425cba9)
>>
>> So your assertion that a new handle is started is incorrect
>> in general, isn't it?  So then does the fix seem necessary?
>> Or, at least, in the fashion below - maybe we need to just make
>> sure all started handles complete before the unlock_updates?
>> Or am I missing something...?
>   Well, the problem with running operations and freezing is more
> fundamental I believe. See my email
> http://marc.info/?l=linux-fsdevel&m=132585911925796&w=2
> 
> So I believe we'll need some better exclusion mechanism already in VFS.
> 
> 								Honza
> 

Yep, saw it, just wasn't sure if this patchset was still under active consideration.

Thanks,
-Eric


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 5/7] VFS: Avoid read-write deadlock in try_to_writeback_inodes_sb
  2012-01-06  0:35   ` Jan Kara
@ 2012-01-11 20:29     ` Kamal Mostafa
  2012-01-12 15:53       ` Mikulas Patocka
  0 siblings, 1 reply; 27+ messages in thread
From: Kamal Mostafa @ 2012-01-11 20:29 UTC (permalink / raw)
  To: Jan Kara, Christoph Hellwig
  Cc: Alexander Viro, Andreas Dilger, Matthew Wilcox, Randy Dunlap,
	Theodore Tso, linux-doc, linux-ext4, linux-fsdevel, linux-kernel,
	Surbhi Palande, Valerie Aurora, Christopher Chaltain,
	Peter M. Petrakis, Mikulas Patocka

[-- Attachment #1: Type: text/plain, Size: 3291 bytes --]

On Fri, 2012-01-06 at 01:35 +0100, Jan Kara wrote:
> On Thu 08-12-11 10:04:35, Kamal Mostafa wrote:
> > From: Valerie Aurora <val@vaaconsulting.com>
> > 
> > Use trylock in try_to_writeback_inodes_sb to avoid read-write
> > deadlocks that could be triggered by freeze.

>   Christoph asked about what is the exact deadlock this patch tries to fix.
> I don't think you answered that. So can you elaborate please? Is it somehow
> connected with the fact that ext4 calls try_to_writeback_inodes_sb() with
> i_mutex held?
> 
> 								Honza

This was discussed in the thread
http://www.spinics.net/lists/linux-fsdevel/msg48754.html
Summarizing...

Jan>   What's exactly the deadlock trylock protects from here?
Jan>   Or is it just an optimization?

Val>   The trylock is an optimization Dave Chinner suggested.  The first
Val>   version I wrote acquired the lock and then checked vfs_is_frozen().

Dave>  It's not so much an optimisation, but the general case of avoiding
Dave>  read-write deadlocks such that freezing can trigger. I think remount
Dave>  can trigger the same deadlock as freezing, so the trylock avoids
Dave>  both deadlock cases rather than just working around the freeze
Dave>  problem....

 -Kamal


> > BugLink: https://bugs.launchpad.net/bugs/897421
> > Signed-off-by: Valerie Aurora <val@vaaconsulting.com>
> > Cc: Kamal Mostafa <kamal@canonical.com>
> > Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
> > [kamal@canonical.com: patch restructure]
> > Signed-off-by: Kamal Mostafa <kamal@canonical.com>
> > ---
> >  fs/fs-writeback.c |   13 ++++++++-----
> >  1 files changed, 8 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index ea89b3f..3a80f1b 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -1274,8 +1274,9 @@ EXPORT_SYMBOL(writeback_inodes_sb);
> >   * try_to_writeback_inodes_sb	-	start writeback if none underway
> >   * @sb: the superblock
> >   *
> > - * Invoke writeback_inodes_sb if no writeback is currently underway.
> > - * Returns 1 if writeback was started, 0 if not.
> > + * Invoke writeback_inodes_sb if no writeback is currently underway
> > + * and no one else holds the s_umount lock.  Returns 1 if writeback
> > + * was started, 0 if not.
> >   */
> >  int try_to_writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)
> >  {
> > @@ -1288,15 +1289,17 @@ EXPORT_SYMBOL(try_to_writeback_inodes_sb);
> >   * @sb: the superblock
> >   * @nr: the number of pages to write
> >   *
> > - * Invoke writeback_inodes_sb if no writeback is currently underway.
> > - * Returns 1 if writeback was started, 0 if not.
> > + * Invoke writeback_inodes_sb if no writeback is currently underway
> > + * and no one else holds the s_umount lock.  Returns 1 if writeback
> > + * was started, 0 if not.
> >   */
> >  int try_to_writeback_inodes_sb_nr(struct super_block *sb,
> >  				   unsigned long nr,
> >  				   enum wb_reason reason)
> >  {
> >  	if (!writeback_in_progress(sb->s_bdi)) {
> > -		down_read(&sb->s_umount);
> > +		if (!down_read_trylock(&sb->s_umount))
> > +		    return 0;
> >  		if (nr == 0)
> >  			writeback_inodes_sb(sb, reason);
> >  		else
> > -- 
> > 1.7.5.4
> > 


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 5/7] VFS: Avoid read-write deadlock in try_to_writeback_inodes_sb
  2012-01-11 20:29     ` Kamal Mostafa
@ 2012-01-12 15:53       ` Mikulas Patocka
  0 siblings, 0 replies; 27+ messages in thread
From: Mikulas Patocka @ 2012-01-12 15:53 UTC (permalink / raw)
  To: Kamal Mostafa
  Cc: Jan Kara, Christoph Hellwig, Alexander Viro, Andreas Dilger,
	Matthew Wilcox, Randy Dunlap, Theodore Tso, linux-doc,
	linux-ext4, linux-fsdevel, linux-kernel, Surbhi Palande,
	Valerie Aurora, Christopher Chaltain, Peter M. Petrakis



On Wed, 11 Jan 2012, Kamal Mostafa wrote:

> On Fri, 2012-01-06 at 01:35 +0100, Jan Kara wrote:
> > On Thu 08-12-11 10:04:35, Kamal Mostafa wrote:
> > > From: Valerie Aurora <val@vaaconsulting.com>
> > > 
> > > Use trylock in try_to_writeback_inodes_sb to avoid read-write
> > > deadlocks that could be triggered by freeze.
> 
> >   Christoph asked about what is the exact deadlock this patch tries to fix.
> > I don't think you answered that. So can you elaborate please? Is it somehow
> > connected with the fact that ext4 calls try_to_writeback_inodes_sb() with
> > i_mutex held?
> > 
> > 								Honza
> 
> This was discussed in the thread
> http://www.spinics.net/lists/linux-fsdevel/msg48754.html
> Summarizing...
> 
> Jan>   What's exactly the deadlock trylock protects from here?
> Jan>   Or is it just an optimization?
> 
> Val>   The trylock is an optimization Dave Chinner suggested.  The first
> Val>   version I wrote acquired the lock and then checked vfs_is_frozen().
> 
> Dave>  It's not so much an optimisation, but the general case of avoiding
> Dave>  read-write deadlocks such that freezing can trigger. I think remount
> Dave>  can trigger the same deadlock as freezing, so the trylock avoids
> Dave>  both deadlock cases rather than just working around the freeze
> Dave>  problem....
> 
>  -Kamal

As I wrote in 
https://www.redhat.com/archives/dm-devel/2011-November/msg00151.html , 
down_read_trylock doesn't fix the freeze deadlock. Think of this sequence:

Process 1 (freezing)
down_write(&sb->s_umount);
set the filesystem to frozen state
up_write(&sb->s_umount);

Process 2 (executing the code from the patch)
down_read_trylock(&sb->s_umount); - succeeds, because s_umount is not held
writeback_inodes_sb(sb, reason); - waits, because the filesystem is frozen

Process 1 (unfreezing)
down_write(&sb->s_umount); - deadlock (process 1 is waiting for process 2 
to drop the lock; process 2 is waiting for process 1 to unfreeze).

See the patch at 
https://www.redhat.com/archives/dm-devel/2011-November/msg00151.html , it 
has a different approach and it avoids the mentined freeze deadlock.

Mikulas

> > > BugLink: https://bugs.launchpad.net/bugs/897421
> > > Signed-off-by: Valerie Aurora <val@vaaconsulting.com>
> > > Cc: Kamal Mostafa <kamal@canonical.com>
> > > Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
> > > [kamal@canonical.com: patch restructure]
> > > Signed-off-by: Kamal Mostafa <kamal@canonical.com>
> > > ---
> > >  fs/fs-writeback.c |   13 ++++++++-----
> > >  1 files changed, 8 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > > index ea89b3f..3a80f1b 100644
> > > --- a/fs/fs-writeback.c
> > > +++ b/fs/fs-writeback.c
> > > @@ -1274,8 +1274,9 @@ EXPORT_SYMBOL(writeback_inodes_sb);
> > >   * try_to_writeback_inodes_sb	-	start writeback if none underway
> > >   * @sb: the superblock
> > >   *
> > > - * Invoke writeback_inodes_sb if no writeback is currently underway.
> > > - * Returns 1 if writeback was started, 0 if not.
> > > + * Invoke writeback_inodes_sb if no writeback is currently underway
> > > + * and no one else holds the s_umount lock.  Returns 1 if writeback
> > > + * was started, 0 if not.
> > >   */
> > >  int try_to_writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)
> > >  {
> > > @@ -1288,15 +1289,17 @@ EXPORT_SYMBOL(try_to_writeback_inodes_sb);
> > >   * @sb: the superblock
> > >   * @nr: the number of pages to write
> > >   *
> > > - * Invoke writeback_inodes_sb if no writeback is currently underway.
> > > - * Returns 1 if writeback was started, 0 if not.
> > > + * Invoke writeback_inodes_sb if no writeback is currently underway
> > > + * and no one else holds the s_umount lock.  Returns 1 if writeback
> > > + * was started, 0 if not.
> > >   */
> > >  int try_to_writeback_inodes_sb_nr(struct super_block *sb,
> > >  				   unsigned long nr,
> > >  				   enum wb_reason reason)
> > >  {
> > >  	if (!writeback_in_progress(sb->s_bdi)) {
> > > -		down_read(&sb->s_umount);
> > > +		if (!down_read_trylock(&sb->s_umount))
> > > +		    return 0;
> > >  		if (nr == 0)
> > >  			writeback_inodes_sb(sb, reason);
> > >  		else
> > > -- 
> > > 1.7.5.4


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2012-01-12 15:53 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-08 18:04 [PATCH v2 0/7] fix s_umount thaw/write and journal deadlock Kamal Mostafa
2011-12-08 18:04 ` [PATCH v2 1/7] Adding support to freeze and unfreeze a journal Kamal Mostafa
2012-01-10 20:20   ` Eric Sandeen
2012-01-10 21:31     ` Jan Kara
2012-01-10 21:55       ` Surbhi Palande
     [not found]       ` <CAMBkX3eVeKSmEzmYTe6Oe_D6kAMQTL5LYoi1-Axj7CcrM85Pow@mail.gmail.com>
2012-01-11  0:04         ` Jan Kara
2012-01-11  0:13           ` Surbhi Palande
2012-01-11  0:51             ` Jan Kara
2012-01-11  3:08       ` Eric Sandeen
2012-01-10 22:00     ` Surbhi Palande
2011-12-08 18:04 ` [PATCH v2 2/7] Freeze and thaw the journal on ext4 freeze Kamal Mostafa
2012-01-06  0:32   ` Jan Kara
2011-12-08 18:04 ` [PATCH v2 3/7] VFS: Fix s_umount thaw/write deadlock Kamal Mostafa
2012-01-06  1:50   ` Jan Kara
2011-12-08 18:04 ` [PATCH v2 4/7] VFS: Rename and refactor writeback_inodes_sb_if_idle Kamal Mostafa
2011-12-13  3:34   ` Miao Xie
2011-12-15  7:10     ` Miao Xie
2011-12-16 20:48     ` Kamal Mostafa
2012-01-06  0:33   ` Jan Kara
2011-12-08 18:04 ` [PATCH v2 5/7] VFS: Avoid read-write deadlock in try_to_writeback_inodes_sb Kamal Mostafa
2012-01-06  0:35   ` Jan Kara
2012-01-11 20:29     ` Kamal Mostafa
2012-01-12 15:53       ` Mikulas Patocka
2011-12-08 18:04 ` [PATCH v2 6/7] VFS: Document s_frozen state through freeze_super Kamal Mostafa
2012-01-06  0:36   ` Jan Kara
2011-12-08 18:04 ` [PATCH v2 7/7] Documentation: Correct s_umount state for freeze_fs/unfreeze_fs Kamal Mostafa
2012-01-06  0:36   ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).