All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Fix sys_sync() bug and cleanup code (version 2)
@ 2009-04-23 14:47 Jan Kara
  2009-04-23 14:47 ` [PATCH 1/4] vfs: Fix sys_sync() and fsync_super() reliability " Jan Kara
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Jan Kara @ 2009-04-23 14:47 UTC (permalink / raw)
  To: LKML; +Cc: Christoph Hellwig, Trond Myklebust, linux-fsdevel, Andrew Morton

  Hi,

  this is a next version of my series of patches to fix sys_sync(). The bug
could result in not all data being safely on disk after sys_sync() returns. It
also fixes the same issue in fsync_super() which leads to assertion failure in
JBD with Jens's per-BDI writeback patches.
  The first patch fixes the data integrity problem, the other three patches are
various cleanups.

									Honza

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] vfs: Fix sys_sync() and fsync_super() reliability (version 2)
  2009-04-23 14:47 [PATCH 0/4] Fix sys_sync() bug and cleanup code (version 2) Jan Kara
@ 2009-04-23 14:47 ` Jan Kara
  2009-04-23 14:47 ` [PATCH 2/4] vfs: Call ->sync_fs() even if s_dirt is 0 " Jan Kara
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2009-04-23 14:47 UTC (permalink / raw)
  To: LKML
  Cc: Christoph Hellwig, Trond Myklebust, linux-fsdevel, Andrew Morton,
	Jan Kara

So far, do_sync() called:
  sync_inodes(0);
  sync_supers();
  sync_filesystems(0);
  sync_filesystems(1);
  sync_inodes(1);

This ordering makes it kind of hard for filesystems as sync_inodes(0) need not
submit all the IO (for example it skips inodes with I_SYNC set) so e.g. forcing
transaction to disk in ->sync_fs() is not really enough. Therefore sys_sync has
not been completely reliable on some filesystems (ext3, ext4, reiserfs, ocfs2
and others are hit by this) when racing e.g. with background writeback. A
similar problem hits also other filesystems (e.g. ext2) because of
write_supers() being called before the sync_inodes(1).

Change the ordering of calls in do_sync() - this requires a new function
sync_blkdevs() to preserve the property that block devices are always synced
after write_super() / sync_fs() call.

The same issue is fixed in __fsync_super() function used on umount /
remount read-only.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/super.c         |   27 ++++++++++++++++++++++++++-
 fs/sync.c          |    3 ++-
 include/linux/fs.h |    2 ++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 786fe7d..4826540 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -267,6 +267,7 @@ void __fsync_super(struct super_block *sb)
 {
 	sync_inodes_sb(sb, 0);
 	vfs_dq_sync(sb);
+	sync_inodes_sb(sb, 1);
 	lock_super(sb);
 	if (sb->s_dirt && sb->s_op->write_super)
 		sb->s_op->write_super(sb);
@@ -274,7 +275,6 @@ void __fsync_super(struct super_block *sb)
 	if (sb->s_op->sync_fs)
 		sb->s_op->sync_fs(sb, 1);
 	sync_blockdev(sb->s_bdev);
-	sync_inodes_sb(sb, 1);
 }
 
 /*
@@ -502,6 +502,31 @@ restart:
 	mutex_unlock(&mutex);
 }
 
+/*
+ *  Sync all block devices underlying some superblock
+ */
+void sync_blockdevs(void)
+{
+	struct super_block *sb;
+
+	spin_lock(&sb_lock);
+restart:
+	list_for_each_entry(sb, &super_blocks, s_list) {
+		if (!sb->s_bdev)
+			continue;
+		sb->s_count++;
+		spin_unlock(&sb_lock);
+		down_read(&sb->s_umount);
+		if (sb->s_root)
+			sync_blockdev(sb->s_bdev);
+		up_read(&sb->s_umount);
+		spin_lock(&sb_lock);
+		if (__put_super_and_need_restart(sb))
+			goto restart;
+	}
+	spin_unlock(&sb_lock);
+}
+
 /**
  *	get_super - get the superblock of a device
  *	@bdev: device to get the superblock for
diff --git a/fs/sync.c b/fs/sync.c
index 7abc65f..fa14e42 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -26,10 +26,11 @@ static void do_sync(unsigned long wait)
 	wakeup_pdflush(0);
 	sync_inodes(0);		/* All mappings, inodes and their blockdevs */
 	vfs_dq_sync(NULL);
+	sync_inodes(wait);	/* Mappings, inodes and blockdevs, again. */
 	sync_supers();		/* Write the superblocks */
 	sync_filesystems(0);	/* Start syncing the filesystems */
 	sync_filesystems(wait);	/* Waitingly sync the filesystems */
-	sync_inodes(wait);	/* Mappings, inodes and blockdevs, again. */
+	sync_blockdevs();
 	if (!wait)
 		printk("Emergency Sync complete\n");
 	if (unlikely(laptop_mode))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 5bed436..4bad02e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1942,6 +1942,7 @@ extern void bdput(struct block_device *);
 extern struct block_device *open_by_devnum(dev_t, fmode_t);
 extern void invalidate_bdev(struct block_device *);
 extern int sync_blockdev(struct block_device *bdev);
+extern void sync_blockdevs(void);
 extern struct super_block *freeze_bdev(struct block_device *);
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
@@ -1951,6 +1952,7 @@ extern int fsync_no_super(struct block_device *);
 #else
 static inline void bd_forget(struct inode *inode) {}
 static inline int sync_blockdev(struct block_device *bdev) { return 0; }
+static inline void sync_blockdevs(void) { }
 static inline void invalidate_bdev(struct block_device *bdev) {}
 
 static inline struct super_block *freeze_bdev(struct block_device *sb)
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] vfs: Call ->sync_fs() even if s_dirt is 0 (version 2)
  2009-04-23 14:47 [PATCH 0/4] Fix sys_sync() bug and cleanup code (version 2) Jan Kara
  2009-04-23 14:47 ` [PATCH 1/4] vfs: Fix sys_sync() and fsync_super() reliability " Jan Kara
@ 2009-04-23 14:47 ` Jan Kara
  2009-04-23 14:47 ` [PATCH 3/4] vfs: Make __fsync_super() a static function " Jan Kara
  2009-04-23 14:47 ` [PATCH 4/4] vfs: Make sys_sync() use fsync_super() " Jan Kara
  3 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2009-04-23 14:47 UTC (permalink / raw)
  To: LKML
  Cc: Christoph Hellwig, Trond Myklebust, linux-fsdevel, Andrew Morton,
	Jan Kara

sync_filesystems() has a condition that if wait == 0 and s_dirt == 0, then
->sync_fs() isn't called. This does not really make much sence since s_dirt is
generally used by a filesystem to mean that ->write_super() needs to be called.
But ->sync_fs() does different things. I even suspect that some filesystems
(btrfs?) sets s_dirt just to fool this logic.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/super.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 4826540..d9759e0 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -490,7 +490,7 @@ restart:
 		spin_unlock(&sb_lock);
 		down_read(&sb->s_umount);
 		async_synchronize_full_domain(&sb->s_async_list);
-		if (sb->s_root && (wait || sb->s_dirt))
+		if (sb->s_root)
 			sb->s_op->sync_fs(sb, wait);
 		up_read(&sb->s_umount);
 		/* restart only when sb is no longer on the list */
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] vfs: Make __fsync_super() a static function (version 2)
  2009-04-23 14:47 [PATCH 0/4] Fix sys_sync() bug and cleanup code (version 2) Jan Kara
  2009-04-23 14:47 ` [PATCH 1/4] vfs: Fix sys_sync() and fsync_super() reliability " Jan Kara
  2009-04-23 14:47 ` [PATCH 2/4] vfs: Call ->sync_fs() even if s_dirt is 0 " Jan Kara
@ 2009-04-23 14:47 ` Jan Kara
  2009-04-23 16:42   ` Trond Myklebust
  2009-04-23 16:52   ` Christoph Hellwig
  2009-04-23 14:47 ` [PATCH 4/4] vfs: Make sys_sync() use fsync_super() " Jan Kara
  3 siblings, 2 replies; 10+ messages in thread
From: Jan Kara @ 2009-04-23 14:47 UTC (permalink / raw)
  To: LKML
  Cc: Christoph Hellwig, Trond Myklebust, linux-fsdevel, Andrew Morton,
	Jan Kara

__fsync_super() does the same thing as fsync_super(). So change the only
caller to use fsync_super() and make __fsync_super() static. This removes
unnecessarily duplicated call to sync_blockdev() and prepares ground
for the changes to __fsync_super() in the following patches.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c     |    2 +-
 fs/super.c         |    5 ++---
 include/linux/fs.h |    1 -
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index f45dbc1..48d1290 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -240,7 +240,7 @@ struct super_block *freeze_bdev(struct block_device *bdev)
 		sb->s_frozen = SB_FREEZE_WRITE;
 		smp_wmb();
 
-		__fsync_super(sb);
+		fsync_super(sb);
 
 		sb->s_frozen = SB_FREEZE_TRANS;
 		smp_wmb();
diff --git a/fs/super.c b/fs/super.c
index d9759e0..4c92068 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -263,7 +263,7 @@ EXPORT_SYMBOL(unlock_super);
  * device.  Takes the superblock lock.  Requires a second blkdev
  * flush by the caller to complete the operation.
  */
-void __fsync_super(struct super_block *sb)
+static int __fsync_super(struct super_block *sb)
 {
 	sync_inodes_sb(sb, 0);
 	vfs_dq_sync(sb);
@@ -284,8 +284,7 @@ void __fsync_super(struct super_block *sb)
  */
 int fsync_super(struct super_block *sb)
 {
-	__fsync_super(sb);
-	return sync_blockdev(sb->s_bdev);
+	return __fsync_super(sb);
 }
 EXPORT_SYMBOL_GPL(fsync_super);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4bad02e..47a67c9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2084,7 +2084,6 @@ extern int filemap_fdatawrite_range(struct address_space *mapping,
 extern int vfs_fsync(struct file *file, struct dentry *dentry, int datasync);
 extern void sync_supers(void);
 extern void sync_filesystems(int wait);
-extern void __fsync_super(struct super_block *sb);
 extern void emergency_sync(void);
 extern void emergency_remount(void);
 extern int do_remount_sb(struct super_block *sb, int flags,
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] vfs: Make sys_sync() use fsync_super() (version 2)
  2009-04-23 14:47 [PATCH 0/4] Fix sys_sync() bug and cleanup code (version 2) Jan Kara
                   ` (2 preceding siblings ...)
  2009-04-23 14:47 ` [PATCH 3/4] vfs: Make __fsync_super() a static function " Jan Kara
@ 2009-04-23 14:47 ` Jan Kara
  2009-04-23 16:57   ` Christoph Hellwig
  3 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2009-04-23 14:47 UTC (permalink / raw)
  To: LKML
  Cc: Christoph Hellwig, Trond Myklebust, linux-fsdevel, Andrew Morton,
	Jan Kara

It is unnecessarily fragile to have two places (fsync_super() and do_sync())
doing data integrity sync of the filesystem. Alter __fsync_super() to
accommodate needs of both callers and use it. So after this patch
__fsync_super() is the only place where we gather all the calls needed to
properly send all data on a filesystem to disk.

Nice bonus is that we get a complete livelock avoidance and write_supers()
is now only used for periodic writeback of superblocks.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c            |   15 +++++++---
 fs/fs-writeback.c         |   49 ---------------------------------
 fs/super.c                |   66 +++++++++++---------------------------------
 fs/sync.c                 |   31 +++++++--------------
 include/linux/fs.h        |    4 +-
 include/linux/writeback.h |    1 -
 6 files changed, 39 insertions(+), 127 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 48d1290..2609cce 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -175,17 +175,22 @@ blkdev_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 				iov, offset, nr_segs, blkdev_get_blocks, NULL);
 }
 
+int __sync_blockdev(struct block_device *bdev, int wait)
+{
+	if (!bdev)
+		return 0;
+	if (!wait)
+		return filemap_flush(bdev->bd_inode->i_mapping);
+	return filemap_write_and_wait(bdev->bd_inode->i_mapping);
+}
+
 /*
  * Write out and wait upon all the dirty data associated with a block
  * device via its mapping.  Does not take the superblock lock.
  */
 int sync_blockdev(struct block_device *bdev)
 {
-	int ret = 0;
-
-	if (bdev)
-		ret = filemap_write_and_wait(bdev->bd_inode->i_mapping);
-	return ret;
+	return __sync_blockdev(bdev, 1);
 }
 EXPORT_SYMBOL(sync_blockdev);
 
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 91013ff..e0fb2e7 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -679,55 +679,6 @@ void sync_inodes_sb(struct super_block *sb, int wait)
 }
 
 /**
- * sync_inodes - writes all inodes to disk
- * @wait: wait for completion
- *
- * sync_inodes() goes through each super block's dirty inode list, writes the
- * inodes out, waits on the writeout and puts the inodes back on the normal
- * list.
- *
- * This is for sys_sync().  fsync_dev() uses the same algorithm.  The subtle
- * part of the sync functions is that the blockdev "superblock" is processed
- * last.  This is because the write_inode() function of a typical fs will
- * perform no I/O, but will mark buffers in the blockdev mapping as dirty.
- * What we want to do is to perform all that dirtying first, and then write
- * back all those inode blocks via the blockdev mapping in one sweep.  So the
- * additional (somewhat redundant) sync_blockdev() calls here are to make
- * sure that really happens.  Because if we call sync_inodes_sb(wait=1) with
- * outstanding dirty inodes, the writeback goes block-at-a-time within the
- * filesystem's write_inode().  This is extremely slow.
- */
-static void __sync_inodes(int wait)
-{
-	struct super_block *sb;
-
-	spin_lock(&sb_lock);
-restart:
-	list_for_each_entry(sb, &super_blocks, s_list) {
-		sb->s_count++;
-		spin_unlock(&sb_lock);
-		down_read(&sb->s_umount);
-		if (sb->s_root) {
-			sync_inodes_sb(sb, wait);
-			sync_blockdev(sb->s_bdev);
-		}
-		up_read(&sb->s_umount);
-		spin_lock(&sb_lock);
-		if (__put_super_and_need_restart(sb))
-			goto restart;
-	}
-	spin_unlock(&sb_lock);
-}
-
-void sync_inodes(int wait)
-{
-	__sync_inodes(0);
-
-	if (wait)
-		__sync_inodes(1);
-}
-
-/**
  * write_inode_now	-	write an inode to disk
  * @inode: inode to write to disk
  * @sync: whether the write should be synchronous or not
diff --git a/fs/super.c b/fs/super.c
index 4c92068..80d46c7 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -257,24 +257,17 @@ void unlock_super(struct super_block * sb)
 EXPORT_SYMBOL(lock_super);
 EXPORT_SYMBOL(unlock_super);
 
-/*
- * Write out and wait upon all dirty data associated with this
- * superblock.  Filesystem data as well as the underlying block
- * device.  Takes the superblock lock.  Requires a second blkdev
- * flush by the caller to complete the operation.
- */
-static int __fsync_super(struct super_block *sb)
+static int __fsync_super(struct super_block *sb, int wait)
 {
-	sync_inodes_sb(sb, 0);
 	vfs_dq_sync(sb);
-	sync_inodes_sb(sb, 1);
+	sync_inodes_sb(sb, wait);
 	lock_super(sb);
 	if (sb->s_dirt && sb->s_op->write_super)
 		sb->s_op->write_super(sb);
 	unlock_super(sb);
 	if (sb->s_op->sync_fs)
-		sb->s_op->sync_fs(sb, 1);
-	sync_blockdev(sb->s_bdev);
+		sb->s_op->sync_fs(sb, wait);
+	return __sync_blockdev(sb->s_bdev, wait);
 }
 
 /*
@@ -284,7 +277,12 @@ static int __fsync_super(struct super_block *sb)
  */
 int fsync_super(struct super_block *sb)
 {
-	return __fsync_super(sb);
+	int ret;
+
+	ret = __fsync_super(sb, 0);
+	if (ret < 0)
+		return ret;
+	return __fsync_super(sb, 1);
 }
 EXPORT_SYMBOL_GPL(fsync_super);
 
@@ -448,20 +446,17 @@ restart:
 }
 
 /*
- * Call the ->sync_fs super_op against all filesystems which are r/w and
- * which implement it.
+ * Sync all the data for all the filesystems (called by do_sync())
  *
  * This operation is careful to avoid the livelock which could easily happen
- * if two or more filesystems are being continuously dirtied.  s_need_sync_fs
+ * if two or more filesystems are being continuously dirtied.  s_need_sync
  * is used only here.  We set it against all filesystems and then clear it as
  * we sync them.  So redirtied filesystems are skipped.
  *
  * But if process A is currently running sync_filesystems and then process B
- * calls sync_filesystems as well, process B will set all the s_need_sync_fs
+ * calls sync_filesystems as well, process B will set all the s_need_sync
  * flags again, which will cause process A to resync everything.  Fix that with
  * a local mutex.
- *
- * (Fabian) Avoid sync_fs with clean fs & wait mode 0
  */
 void sync_filesystems(int wait)
 {
@@ -471,18 +466,16 @@ void sync_filesystems(int wait)
 	mutex_lock(&mutex);		/* Could be down_interruptible */
 	spin_lock(&sb_lock);
 	list_for_each_entry(sb, &super_blocks, s_list) {
-		if (!sb->s_op->sync_fs)
-			continue;
 		if (sb->s_flags & MS_RDONLY)
 			continue;
-		sb->s_need_sync_fs = 1;
+		sb->s_need_sync = 1;
 	}
 
 restart:
 	list_for_each_entry(sb, &super_blocks, s_list) {
-		if (!sb->s_need_sync_fs)
+		if (!sb->s_need_sync)
 			continue;
-		sb->s_need_sync_fs = 0;
+		sb->s_need_sync = 0;
 		if (sb->s_flags & MS_RDONLY)
 			continue;	/* hm.  Was remounted r/o meanwhile */
 		sb->s_count++;
@@ -490,7 +483,7 @@ restart:
 		down_read(&sb->s_umount);
 		async_synchronize_full_domain(&sb->s_async_list);
 		if (sb->s_root)
-			sb->s_op->sync_fs(sb, wait);
+			__fsync_super(sb, wait);
 		up_read(&sb->s_umount);
 		/* restart only when sb is no longer on the list */
 		spin_lock(&sb_lock);
@@ -501,31 +494,6 @@ restart:
 	mutex_unlock(&mutex);
 }
 
-/*
- *  Sync all block devices underlying some superblock
- */
-void sync_blockdevs(void)
-{
-	struct super_block *sb;
-
-	spin_lock(&sb_lock);
-restart:
-	list_for_each_entry(sb, &super_blocks, s_list) {
-		if (!sb->s_bdev)
-			continue;
-		sb->s_count++;
-		spin_unlock(&sb_lock);
-		down_read(&sb->s_umount);
-		if (sb->s_root)
-			sync_blockdev(sb->s_bdev);
-		up_read(&sb->s_umount);
-		spin_lock(&sb_lock);
-		if (__put_super_and_need_restart(sb))
-			goto restart;
-	}
-	spin_unlock(&sb_lock);
-}
-
 /**
  *	get_super - get the superblock of a device
  *	@bdev: device to get the superblock for
diff --git a/fs/sync.c b/fs/sync.c
index fa14e42..86c6a86 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -17,35 +17,24 @@
 #define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
 			SYNC_FILE_RANGE_WAIT_AFTER)
 
-/*
- * sync everything.  Start out by waking pdflush, because that writes back
- * all queues in parallel.
- */
-static void do_sync(unsigned long wait)
+SYSCALL_DEFINE0(sync)
 {
-	wakeup_pdflush(0);
-	sync_inodes(0);		/* All mappings, inodes and their blockdevs */
-	vfs_dq_sync(NULL);
-	sync_inodes(wait);	/* Mappings, inodes and blockdevs, again. */
-	sync_supers();		/* Write the superblocks */
-	sync_filesystems(0);	/* Start syncing the filesystems */
-	sync_filesystems(wait);	/* Waitingly sync the filesystems */
-	sync_blockdevs();
-	if (!wait)
-		printk("Emergency Sync complete\n");
+	sync_filesystems(0);
+	sync_filesystems(1);
 	if (unlikely(laptop_mode))
 		laptop_sync_completion();
-}
-
-SYSCALL_DEFINE0(sync)
-{
-	do_sync(1);
 	return 0;
 }
 
 static void do_sync_work(struct work_struct *work)
 {
-	do_sync(0);
+	/*
+	 * Sync twice to reduce the possibility we skipped some inodes / pages
+	 * because they were temporarily locked
+	 */
+	sync_filesystems(0);
+	sync_filesystems(0);
+	printk("Emergency Sync complete\n");
 	kfree(work);
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 47a67c9..be2be8d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1321,7 +1321,7 @@ struct super_block {
 	struct rw_semaphore	s_umount;
 	struct mutex		s_lock;
 	int			s_count;
-	int			s_need_sync_fs;
+	int			s_need_sync;
 	atomic_t		s_active;
 #ifdef CONFIG_SECURITY
 	void                    *s_security;
@@ -1942,7 +1942,7 @@ extern void bdput(struct block_device *);
 extern struct block_device *open_by_devnum(dev_t, fmode_t);
 extern void invalidate_bdev(struct block_device *);
 extern int sync_blockdev(struct block_device *bdev);
-extern void sync_blockdevs(void);
+extern int __sync_blockdev(struct block_device *bdev, int wait);
 extern struct super_block *freeze_bdev(struct block_device *);
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 9c1ed1f..943d1c9 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -79,7 +79,6 @@ struct writeback_control {
 void writeback_inodes(struct writeback_control *wbc);
 int inode_wait(void *);
 void sync_inodes_sb(struct super_block *, int wait);
-void sync_inodes(int wait);
 
 /* writeback.h requires fs.h; it, too, is not included from here. */
 static inline void wait_on_inode(struct inode *inode)
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] vfs: Make __fsync_super() a static function (version 2)
  2009-04-23 14:47 ` [PATCH 3/4] vfs: Make __fsync_super() a static function " Jan Kara
@ 2009-04-23 16:42   ` Trond Myklebust
  2009-04-23 16:52   ` Christoph Hellwig
  1 sibling, 0 replies; 10+ messages in thread
From: Trond Myklebust @ 2009-04-23 16:42 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, Christoph Hellwig, linux-fsdevel, Andrew Morton

On Thu, 2009-04-23 at 16:47 +0200, Jan Kara wrote:
> __fsync_super() does the same thing as fsync_super(). So change the only
> caller to use fsync_super() and make __fsync_super() static. This removes
> unnecessarily duplicated call to sync_blockdev() and prepares ground
> for the changes to __fsync_super() in the following patches.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/block_dev.c     |    2 +-
>  fs/super.c         |    5 ++---
>  include/linux/fs.h |    1 -
>  3 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index f45dbc1..48d1290 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -240,7 +240,7 @@ struct super_block *freeze_bdev(struct block_device *bdev)
>  		sb->s_frozen = SB_FREEZE_WRITE;
>  		smp_wmb();
>  
> -		__fsync_super(sb);
> +		fsync_super(sb);
>  
>  		sb->s_frozen = SB_FREEZE_TRANS;
>  		smp_wmb();
> diff --git a/fs/super.c b/fs/super.c
> index d9759e0..4c92068 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -263,7 +263,7 @@ EXPORT_SYMBOL(unlock_super);
>   * device.  Takes the superblock lock.  Requires a second blkdev
>   * flush by the caller to complete the operation.
>   */
> -void __fsync_super(struct super_block *sb)
> +static int __fsync_super(struct super_block *sb)
          ^^^ Typo?

>  {
>  	sync_inodes_sb(sb, 0);
>  	vfs_dq_sync(sb);
> @@ -284,8 +284,7 @@ void __fsync_super(struct super_block *sb)
>   */
>  int fsync_super(struct super_block *sb)
>  {
> -	__fsync_super(sb);
> -	return sync_blockdev(sb->s_bdev);
> +	return __fsync_super(sb);
>  }
>  EXPORT_SYMBOL_GPL(fsync_super);
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 4bad02e..47a67c9 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2084,7 +2084,6 @@ extern int filemap_fdatawrite_range(struct address_space *mapping,
>  extern int vfs_fsync(struct file *file, struct dentry *dentry, int datasync);
>  extern void sync_supers(void);
>  extern void sync_filesystems(int wait);
> -extern void __fsync_super(struct super_block *sb);
>  extern void emergency_sync(void);
>  extern void emergency_remount(void);
>  extern int do_remount_sb(struct super_block *sb, int flags,


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] vfs: Make __fsync_super() a static function (version 2)
  2009-04-23 14:47 ` [PATCH 3/4] vfs: Make __fsync_super() a static function " Jan Kara
  2009-04-23 16:42   ` Trond Myklebust
@ 2009-04-23 16:52   ` Christoph Hellwig
  2009-04-23 20:46     ` Jan Kara
  1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2009-04-23 16:52 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, Christoph Hellwig, Trond Myklebust, linux-fsdevel, Andrew Morton

On Thu, Apr 23, 2009 at 04:47:24PM +0200, Jan Kara wrote:
> -void __fsync_super(struct super_block *sb)
> +static int __fsync_super(struct super_block *sb)
>  {
>  	sync_inodes_sb(sb, 0);
>  	vfs_dq_sync(sb);
> @@ -284,8 +284,7 @@ void __fsync_super(struct super_block *sb)

I think you're missing a return somewhere :)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] vfs: Make sys_sync() use fsync_super() (version 2)
  2009-04-23 14:47 ` [PATCH 4/4] vfs: Make sys_sync() use fsync_super() " Jan Kara
@ 2009-04-23 16:57   ` Christoph Hellwig
  2009-04-23 21:29     ` Jan Kara
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2009-04-23 16:57 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, Christoph Hellwig, Trond Myklebust, linux-fsdevel, Andrew Morton

> +int __sync_blockdev(struct block_device *bdev, int wait)
> +{
> +	if (!bdev)
> +		return 0;
> +	if (!wait)
> +		return filemap_flush(bdev->bd_inode->i_mapping);
> +	return filemap_write_and_wait(bdev->bd_inode->i_mapping);
> +}

I wonder if the filemap_flush for the async case really buys us much,
all the async and then later sync writeback activities of the FS will
redirty lots of bits of the blockdev mapping that we then have to write
twice.

> @@ -284,7 +277,12 @@ static int __fsync_super(struct super_block *sb)
>   */
>  int fsync_super(struct super_block *sb)
>  {
> -	return __fsync_super(sb);
> +	int ret;
> +
> +	ret = __fsync_super(sb, 0);
> +	if (ret < 0)
> +		return ret;
> +	return __fsync_super(sb, 1);

This async first then wait approach does have some benefits when syncing
multiple filesystems, but I wonder if it isn't actually conta-productive
when syncing a single one.

Maybe this should be a separate patch ontop to allow for more
fine-grained benchmarking.

>  /*
> - * Call the ->sync_fs super_op against all filesystems which are r/w and
> - * which implement it.
> + * Sync all the data for all the filesystems (called by do_sync())

Your patch removes do_sync :)

>  static void do_sync_work(struct work_struct *work)
>  {
> -	do_sync(0);
> +	/*
> +	 * Sync twice to reduce the possibility we skipped some inodes / pages
> +	 * because they were temporarily locked
> +	 */
> +	sync_filesystems(0);
> +	sync_filesystems(0);
> +	printk("Emergency Sync complete\n");
>  	kfree(work);

Ah, nice.  Good to have this out of the sys_sync path.


The patch looks generally good but I'll need some heavy testing.  I'll
do some XFS testing with it ASAP.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] vfs: Make __fsync_super() a static function (version 2)
  2009-04-23 16:52   ` Christoph Hellwig
@ 2009-04-23 20:46     ` Jan Kara
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2009-04-23 20:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: LKML, Trond Myklebust, linux-fsdevel, Andrew Morton

On Thu 23-04-09 12:52:05, Christoph Hellwig wrote:
> On Thu, Apr 23, 2009 at 04:47:24PM +0200, Jan Kara wrote:
> > -void __fsync_super(struct super_block *sb)
> > +static int __fsync_super(struct super_block *sb)
> >  {
> >  	sync_inodes_sb(sb, 0);
> >  	vfs_dq_sync(sb);
> > @@ -284,8 +284,7 @@ void __fsync_super(struct super_block *sb)
> 
> I think you're missing a return somewhere :)
  Argh, forgot to commit this compile fix ;).

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] vfs: Make sys_sync() use fsync_super() (version 2)
  2009-04-23 16:57   ` Christoph Hellwig
@ 2009-04-23 21:29     ` Jan Kara
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2009-04-23 21:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: LKML, Trond Myklebust, linux-fsdevel, Andrew Morton

On Thu 23-04-09 12:57:36, Christoph Hellwig wrote:
> > +int __sync_blockdev(struct block_device *bdev, int wait)
> > +{
> > +	if (!bdev)
> > +		return 0;
> > +	if (!wait)
> > +		return filemap_flush(bdev->bd_inode->i_mapping);
> > +	return filemap_write_and_wait(bdev->bd_inode->i_mapping);
> > +}
> 
> I wonder if the filemap_flush for the async case really buys us much,
> all the async and then later sync writeback activities of the FS will
> redirty lots of bits of the blockdev mapping that we then have to write
> twice.
  Well, I think it does. Because if you call write_inode() with wait==1,
then filesystems usually do sync_dirty_buffer() for the buffer with inode.
But with wait==0 filesystems (e.g. ext2 and other "simple" filesystems)
just mark the buffer with inode dirty.  So sync_inodes_sb(sb, 0) just
dirties lots of buffers and then filemap_flush() submits all the IO more
effectively than doing sync_dirty_buffer() for each inode... But I guess it
deserves a comment.
  Also I'd think that async case submits most of the IO and later sync just
gathers last bits we might have skipped.
 
> > @@ -284,7 +277,12 @@ static int __fsync_super(struct super_block *sb)
> >   */
> >  int fsync_super(struct super_block *sb)
> >  {
> > -	return __fsync_super(sb);
> > +	int ret;
> > +
> > +	ret = __fsync_super(sb, 0);
> > +	if (ret < 0)
> > +		return ret;
> > +	return __fsync_super(sb, 1);
> 
> This async first then wait approach does have some benefits when syncing
> multiple filesystems, but I wonder if it isn't actually conta-productive
> when syncing a single one.
> 
> Maybe this should be a separate patch ontop to allow for more
> fine-grained benchmarking.
  But __fsync_super() previously did:
sync_inodes_sb(sb, 0);
...
sync_inodes_sb(sb, 1);
  So the change is only in calling write_super(), vfs_dq_sync() and
sync_fs() twice. I can certainly change the function to call vfs_dq_sync()
and write_super() only if wait == 1, so only sync_fs() would be called
twice. But then if someone uses __fsync_super() in future, he might get
surprised...

> >  /*
> > - * Call the ->sync_fs super_op against all filesystems which are r/w and
> > - * which implement it.
> > + * Sync all the data for all the filesystems (called by do_sync())
> 
> Your patch removes do_sync :)
  Yup, thanks.
 
> >  static void do_sync_work(struct work_struct *work)
> >  {
> > -	do_sync(0);
> > +	/*
> > +	 * Sync twice to reduce the possibility we skipped some inodes / pages
> > +	 * because they were temporarily locked
> > +	 */
> > +	sync_filesystems(0);
> > +	sync_filesystems(0);
> > +	printk("Emergency Sync complete\n");
> >  	kfree(work);
> 
> Ah, nice.  Good to have this out of the sys_sync path.
> 
> The patch looks generally good but I'll need some heavy testing.  I'll
> do some XFS testing with it ASAP.
  Great. I think the first (maybe first two) patch can go in quickly to fix
the bug and the others can go in after more serious testing.

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2009-04-23 21:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-23 14:47 [PATCH 0/4] Fix sys_sync() bug and cleanup code (version 2) Jan Kara
2009-04-23 14:47 ` [PATCH 1/4] vfs: Fix sys_sync() and fsync_super() reliability " Jan Kara
2009-04-23 14:47 ` [PATCH 2/4] vfs: Call ->sync_fs() even if s_dirt is 0 " Jan Kara
2009-04-23 14:47 ` [PATCH 3/4] vfs: Make __fsync_super() a static function " Jan Kara
2009-04-23 16:42   ` Trond Myklebust
2009-04-23 16:52   ` Christoph Hellwig
2009-04-23 20:46     ` Jan Kara
2009-04-23 14:47 ` [PATCH 4/4] vfs: Make sys_sync() use fsync_super() " Jan Kara
2009-04-23 16:57   ` Christoph Hellwig
2009-04-23 21:29     ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.