All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5 v2] Improve sync(2) handling
@ 2011-07-26 22:38 Jan Kara
  2011-07-26 22:38 ` [PATCH 1/5] vfs: Make sync(1) writeout also block device inodes Jan Kara
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Jan Kara @ 2011-07-26 22:38 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Curt Wohlgemuth, Al Viro


  Hello,

  this is a second iteration of my series improving handling of sync syscall.
Since previous submission I have incorporated Christoph's suggestions and
also measured some numbers.

	Unpatched kernel			Patched kernel
time for (( i = 0; i < 100; i++ )); do sync; done
	1.130400 +- 0.027369			1.073200 +- 0.040848
create 20000 4k files and sync (XFS)
	155.995600 +- 1.879084			155.942200 +- 1.843881
create 20000 4k files and sync (ext3)
	154.597200 +- 1.556965			153.109800 +- 1.339094
run while true; do time sync; sleep 5; done while creating 32 MB files in
  parallel (xfs)
	6.466727 +- 0.710062			5.716000 +- 0.574406
run while true; do time sync; sleep 5; done while creating 32 MB files in
  parallel (ext3)
	15.969909 +- 3.732143			7.662909 +- 2.268830

We see that when we are not writing in parallel to sync (the last test) there
are no big improvements. When writing while running sync, ext3 got a nice boost
from saving one pass over inodes. XFS got improved only slightly.

Comments are welcome.

								Honza

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/5] vfs: Make sync(1) writeout also block device inodes
  2011-07-26 22:38 [PATCH 0/5 v2] Improve sync(2) handling Jan Kara
@ 2011-07-26 22:38 ` Jan Kara
  2011-07-27  9:44   ` Christoph Hellwig
  2011-07-26 22:38 ` [PATCH 2/5] vfs: Move noop_backing_dev_info check from sync into writeback Jan Kara
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Jan Kara @ 2011-07-26 22:38 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Curt Wohlgemuth, Al Viro, Jan Kara

In case block device does not have filesystem mounted on it, sync(1) will just
ignore it and doesn't writeout dirty pages because it iterates over filesystems
with s_bdi != noop_backing_dev_info and thus it avoids blockdev_superblock.
Since it's unexpected that sync doesn't writeout dirty data for block devices
be nice to users and change the behavior to do so.

This requires a change to how syncing is done. We now first traverse all
superblocks with s_bdi != noop_backing_dev_info, writeout their inodes and
call sync_fs and when this is done, we traverse all block devices and sync
them.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/sync.c |   70 +++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index c38ec16..f8f21d9 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -23,20 +23,13 @@
 
 /*
  * Do the filesystem syncing work. For simple filesystems
- * writeback_inodes_sb(sb) just dirties buffers with inodes so we have to
- * submit IO for these buffers via __sync_blockdev(). This also speeds up the
- * wait == 1 case since in that case write_inode() functions do
+ * writeback_inodes_sb(sb) just dirties buffers with inodes so the caller has
+ * to additionally submit IO for these buffers via __sync_blockdev(). This also
+ * speeds up the wait == 1 case since in that case write_inode() functions do
  * sync_dirty_buffer() and thus effectively write one block at a time.
  */
-static int __sync_filesystem(struct super_block *sb, int wait)
+static void __sync_filesystem(struct super_block *sb, int wait)
 {
-	/*
-	 * This should be safe, as we require bdi backing to actually
-	 * write out data in the first place
-	 */
-	if (sb->s_bdi == &noop_backing_dev_info)
-		return 0;
-
 	if (sb->s_qcop && sb->s_qcop->quota_sync)
 		sb->s_qcop->quota_sync(sb, -1, wait);
 
@@ -47,7 +40,6 @@ static int __sync_filesystem(struct super_block *sb, int wait)
 
 	if (sb->s_op->sync_fs)
 		sb->s_op->sync_fs(sb, wait);
-	return __sync_blockdev(sb->s_bdev, wait);
 }
 
 /*
@@ -71,16 +63,26 @@ int sync_filesystem(struct super_block *sb)
 	if (sb->s_flags & MS_RDONLY)
 		return 0;
 
-	ret = __sync_filesystem(sb, 0);
+	/*
+	 * This should be safe, as we require bdi backing to actually
+	 * write out data in the first place.
+	 */
+	if (sb->s_bdi == &noop_backing_dev_info)
+		return 0;
+
+	__sync_filesystem(sb, 0);
+	ret = __sync_blockdev(sb->s_bdev, 0);
 	if (ret < 0)
 		return ret;
-	return __sync_filesystem(sb, 1);
+	__sync_filesystem(sb, 1);
+	return __sync_blockdev(sb->s_bdev, 1);
 }
 EXPORT_SYMBOL_GPL(sync_filesystem);
 
 static void sync_one_sb(struct super_block *sb, void *arg)
 {
-	if (!(sb->s_flags & MS_RDONLY))
+	/* Avoid read-only filesystems and filesystems without backing device */
+	if (!(sb->s_flags & MS_RDONLY) && sb->s_bdi != &noop_backing_dev_info)
 		__sync_filesystem(sb, *(int *)arg);
 }
 /*
@@ -92,6 +94,42 @@ static void sync_filesystems(int wait)
 	iterate_supers(sync_one_sb, &wait);
 }
 
+static void sync_all_bdevs(int wait)
+{
+	struct inode *inode, *old_inode = NULL;
+
+	spin_lock(&inode_sb_list_lock);
+	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
+		struct address_space *mapping = inode->i_mapping;
+
+		spin_lock(&inode->i_lock);
+		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW) ||
+		    mapping->nrpages == 0) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+		__iget(inode);
+		spin_unlock(&inode->i_lock);
+		spin_unlock(&inode_sb_list_lock);
+		/*
+		 * We hold a reference to 'inode' so it couldn't have been
+		 * removed from s_inodes list while we dropped the
+		 * inode_sb_list_lock.  We cannot iput the inode now as we can
+		 * be holding the last reference and we cannot iput it under
+		 * inode_sb_list_lock. So we keep the reference and iput it
+		 * later.
+		 */
+		iput(old_inode);
+		old_inode = inode;
+
+		__sync_blockdev(I_BDEV(inode), wait);
+
+		spin_lock(&inode_sb_list_lock);
+	}
+	spin_unlock(&inode_sb_list_lock);
+	iput(old_inode);
+}
+
 /*
  * sync everything.  Start out by waking pdflush, because that writes back
  * all queues in parallel.
@@ -101,6 +139,8 @@ SYSCALL_DEFINE0(sync)
 	wakeup_flusher_threads(0);
 	sync_filesystems(0);
 	sync_filesystems(1);
+	sync_all_bdevs(0);
+	sync_all_bdevs(1);
 	if (unlikely(laptop_mode))
 		laptop_sync_completion();
 	return 0;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/5] vfs: Move noop_backing_dev_info check from sync into writeback
  2011-07-26 22:38 [PATCH 0/5 v2] Improve sync(2) handling Jan Kara
  2011-07-26 22:38 ` [PATCH 1/5] vfs: Make sync(1) writeout also block device inodes Jan Kara
@ 2011-07-26 22:38 ` Jan Kara
  2011-07-27  9:44   ` Christoph Hellwig
  2011-07-26 22:38 ` [PATCH 3/5] quota: Split dquot_quota_sync() to writeback and cache flushing part Jan Kara
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Jan Kara @ 2011-07-26 22:38 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Curt Wohlgemuth, Al Viro, Jan Kara

In principle, a filesystem may want to have ->sync_fs() called during sync(1)
although it does not have a bdi (i.e. s_bdi is set to noop_backing_dev_info).
Only writeback code really needs bdi set to something reasonable. So move the
checks where they are more logical.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c |    6 ++++++
 fs/sync.c         |    9 +--------
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 0f015a0..82a876b 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1194,6 +1194,9 @@ void writeback_inodes_sb_nr(struct super_block *sb, unsigned long nr)
 		.nr_pages	= nr,
 	};
 
+	/* Nothing to do? */
+	if (sb->s_bdi == &noop_backing_dev_info)
+		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 	bdi_queue_work(sb->s_bdi, &work);
 	wait_for_completion(&done);
@@ -1272,6 +1275,9 @@ void sync_inodes_sb(struct super_block *sb)
 		.done		= &done,
 	};
 
+	/* Nothing to do? */
+	if (sb->s_bdi == &noop_backing_dev_info)
+		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 
 	bdi_queue_work(sb->s_bdi, &work);
diff --git a/fs/sync.c b/fs/sync.c
index f8f21d9..2f5bd52 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -63,13 +63,6 @@ int sync_filesystem(struct super_block *sb)
 	if (sb->s_flags & MS_RDONLY)
 		return 0;
 
-	/*
-	 * This should be safe, as we require bdi backing to actually
-	 * write out data in the first place.
-	 */
-	if (sb->s_bdi == &noop_backing_dev_info)
-		return 0;
-
 	__sync_filesystem(sb, 0);
 	ret = __sync_blockdev(sb->s_bdev, 0);
 	if (ret < 0)
@@ -82,7 +75,7 @@ EXPORT_SYMBOL_GPL(sync_filesystem);
 static void sync_one_sb(struct super_block *sb, void *arg)
 {
 	/* Avoid read-only filesystems and filesystems without backing device */
-	if (!(sb->s_flags & MS_RDONLY) && sb->s_bdi != &noop_backing_dev_info)
+	if (!(sb->s_flags & MS_RDONLY))
 		__sync_filesystem(sb, *(int *)arg);
 }
 /*
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/5] quota: Split dquot_quota_sync() to writeback and cache flushing part
  2011-07-26 22:38 [PATCH 0/5 v2] Improve sync(2) handling Jan Kara
  2011-07-26 22:38 ` [PATCH 1/5] vfs: Make sync(1) writeout also block device inodes Jan Kara
  2011-07-26 22:38 ` [PATCH 2/5] vfs: Move noop_backing_dev_info check from sync into writeback Jan Kara
@ 2011-07-26 22:38 ` Jan Kara
  2011-07-27  8:26   ` Steven Whitehouse
  2011-07-27  9:45   ` Christoph Hellwig
  2011-07-26 22:38 ` [PATCH 4/5] quota: Move quota syncing to ->sync_fs method Jan Kara
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 21+ messages in thread
From: Jan Kara @ 2011-07-26 22:38 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Curt Wohlgemuth, Al Viro, Jan Kara

Split off part of dquot_quota_sync() which writes dquots into a quota file
to a separate function. In the next patch we will use the function from
filesystems and we do not want to abuse ->quota_sync quotactl callback more
than necessary.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/gfs2/quota.c          |    4 ++--
 fs/gfs2/quota.h          |    2 +-
 fs/gfs2/super.c          |    2 +-
 fs/gfs2/sys.c            |    2 +-
 fs/quota/dquot.c         |   35 ++++++++++++++++++++++++++---------
 fs/quota/quota.c         |    4 ++--
 fs/sync.c                |    2 +-
 include/linux/quota.h    |    2 +-
 include/linux/quotaops.h |    3 ++-
 9 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 42e8d23..2ca3c6a 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1122,7 +1122,7 @@ void gfs2_quota_change(struct gfs2_inode *ip, s64 change,
 	}
 }
 
-int gfs2_quota_sync(struct super_block *sb, int type, int wait)
+int gfs2_quota_sync(struct super_block *sb, int type)
 {
 	struct gfs2_sbd *sdp = sb->s_fs_info;
 	struct gfs2_quota_data **qda;
@@ -1168,7 +1168,7 @@ int gfs2_quota_sync(struct super_block *sb, int type, int wait)
 
 static int gfs2_quota_sync_timeo(struct super_block *sb, int type)
 {
-	return gfs2_quota_sync(sb, type, 0);
+	return gfs2_quota_sync(sb, type);
 }
 
 int gfs2_quota_refresh(struct gfs2_sbd *sdp, int user, u32 id)
diff --git a/fs/gfs2/quota.h b/fs/gfs2/quota.h
index 90bf1c3..f25d98b 100644
--- a/fs/gfs2/quota.h
+++ b/fs/gfs2/quota.h
@@ -26,7 +26,7 @@ extern int gfs2_quota_check(struct gfs2_inode *ip, u32 uid, u32 gid);
 extern void gfs2_quota_change(struct gfs2_inode *ip, s64 change,
 			      u32 uid, u32 gid);
 
-extern int gfs2_quota_sync(struct super_block *sb, int type, int wait);
+extern int gfs2_quota_sync(struct super_block *sb, int type);
 extern int gfs2_quota_refresh(struct gfs2_sbd *sdp, int user, u32 id);
 
 extern int gfs2_quota_init(struct gfs2_sbd *sdp);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index ed540e7..e361610 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -807,7 +807,7 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
 	int error;
 
 	flush_workqueue(gfs2_delete_workqueue);
-	gfs2_quota_sync(sdp->sd_vfs, 0, 1);
+	gfs2_quota_sync(sdp->sd_vfs, 0);
 	gfs2_statfs_sync(sdp->sd_vfs, 0);
 
 	error = gfs2_glock_nq_init(sdp->sd_trans_gl, LM_ST_SHARED, GL_NOCACHE,
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index e20eab3..b595462 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -168,7 +168,7 @@ static ssize_t quota_sync_store(struct gfs2_sbd *sdp, const char *buf,
 	if (simple_strtol(buf, NULL, 0) != 1)
 		return -EINVAL;
 
-	gfs2_quota_sync(sdp->sd_vfs, 0, 1);
+	gfs2_quota_sync(sdp->sd_vfs, 0);
 	return len;
 }
 
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 5b572c8..35c4a47 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -595,12 +595,14 @@ out:
 }
 EXPORT_SYMBOL(dquot_scan_active);
 
-int dquot_quota_sync(struct super_block *sb, int type, int wait)
+/* Write all dquot structures to quota files */
+int dquot_writeback_dquots(struct super_block *sb, int type)
 {
 	struct list_head *dirty;
 	struct dquot *dquot;
 	struct quota_info *dqopt = sb_dqopt(sb);
 	int cnt;
+	int err, ret = 0;
 
 	mutex_lock(&dqopt->dqonoff_mutex);
 	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
@@ -624,7 +626,9 @@ int dquot_quota_sync(struct super_block *sb, int type, int wait)
 			atomic_inc(&dquot->dq_count);
 			spin_unlock(&dq_list_lock);
 			dqstats_inc(DQST_LOOKUPS);
-			sb->dq_op->write_dquot(dquot);
+			err = sb->dq_op->write_dquot(dquot);
+			if (!ret && err)
+				err = ret;
 			dqput(dquot);
 			spin_lock(&dq_list_lock);
 		}
@@ -638,7 +642,21 @@ int dquot_quota_sync(struct super_block *sb, int type, int wait)
 	dqstats_inc(DQST_SYNCS);
 	mutex_unlock(&dqopt->dqonoff_mutex);
 
-	if (!wait || (sb_dqopt(sb)->flags & DQUOT_QUOTA_SYS_FILE))
+	return ret;
+}
+EXPORT_SYMBOL(dquot_writeback_dquots);
+
+/* Write all dquot structures to disk and make them visible from userspace */
+int dquot_quota_sync(struct super_block *sb, int type)
+{
+	struct quota_info *dqopt = sb_dqopt(sb);
+	int cnt;
+	int ret;
+
+	ret = dquot_writeback_dquots(sb, type);
+	if (ret)
+		return ret;
+	if (dqopt->flags & DQUOT_QUOTA_SYS_FILE)
 		return 0;
 
 	/* This is not very clever (and fast) but currently I don't know about
@@ -652,18 +670,17 @@ int dquot_quota_sync(struct super_block *sb, int type, int wait)
 	 * Now when everything is written we can discard the pagecache so
 	 * that userspace sees the changes.
 	 */
-	mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
+	mutex_lock(&dqopt->dqonoff_mutex);
 	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
 		if (type != -1 && cnt != type)
 			continue;
 		if (!sb_has_quota_active(sb, cnt))
 			continue;
-		mutex_lock_nested(&sb_dqopt(sb)->files[cnt]->i_mutex,
-				  I_MUTEX_QUOTA);
-		truncate_inode_pages(&sb_dqopt(sb)->files[cnt]->i_data, 0);
-		mutex_unlock(&sb_dqopt(sb)->files[cnt]->i_mutex);
+		mutex_lock_nested(&dqopt->files[cnt]->i_mutex, I_MUTEX_QUOTA);
+		truncate_inode_pages(&dqopt->files[cnt]->i_data, 0);
+		mutex_unlock(&dqopt->files[cnt]->i_mutex);
 	}
-	mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
+	mutex_unlock(&dqopt->dqonoff_mutex);
 
 	return 0;
 }
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index b34bdb2..76b8012 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -48,7 +48,7 @@ static int check_quotactl_permission(struct super_block *sb, int type, int cmd,
 static void quota_sync_one(struct super_block *sb, void *arg)
 {
 	if (sb->s_qcop && sb->s_qcop->quota_sync)
-		sb->s_qcop->quota_sync(sb, *(int *)arg, 1);
+		sb->s_qcop->quota_sync(sb, *(int *)arg);
 }
 
 static int quota_sync_all(int type)
@@ -271,7 +271,7 @@ static int do_quotactl(struct super_block *sb, int type, int cmd, qid_t id,
 	case Q_SYNC:
 		if (!sb->s_qcop->quota_sync)
 			return -ENOSYS;
-		return sb->s_qcop->quota_sync(sb, type, 1);
+		return sb->s_qcop->quota_sync(sb, type);
 	case Q_XQUOTAON:
 	case Q_XQUOTAOFF:
 	case Q_XQUOTARM:
diff --git a/fs/sync.c b/fs/sync.c
index 2f5bd52..2b7d569 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -31,7 +31,7 @@
 static void __sync_filesystem(struct super_block *sb, int wait)
 {
 	if (sb->s_qcop && sb->s_qcop->quota_sync)
-		sb->s_qcop->quota_sync(sb, -1, wait);
+		sb->s_qcop->quota_sync(sb, -1);
 
 	if (wait)
 		sync_inodes_sb(sb);
diff --git a/include/linux/quota.h b/include/linux/quota.h
index 9a85412..86173d6 100644
--- a/include/linux/quota.h
+++ b/include/linux/quota.h
@@ -329,7 +329,7 @@ struct quotactl_ops {
 	int (*quota_on)(struct super_block *, int, int, struct path *);
 	int (*quota_on_meta)(struct super_block *, int, int);
 	int (*quota_off)(struct super_block *, int);
-	int (*quota_sync)(struct super_block *, int, int);
+	int (*quota_sync)(struct super_block *, int);
 	int (*get_info)(struct super_block *, int, struct if_dqinfo *);
 	int (*set_info)(struct super_block *, int, struct if_dqinfo *);
 	int (*get_dqblk)(struct super_block *, int, qid_t, struct fs_disk_quota *);
diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
index 26f9e36..aba7578 100644
--- a/include/linux/quotaops.h
+++ b/include/linux/quotaops.h
@@ -83,7 +83,8 @@ int dquot_quota_on(struct super_block *sb, int type, int format_id,
 int dquot_quota_on_mount(struct super_block *sb, char *qf_name,
  	int format_id, int type);
 int dquot_quota_off(struct super_block *sb, int type);
-int dquot_quota_sync(struct super_block *sb, int type, int wait);
+int dquot_writeback_dquots(struct super_block *sb, int type);
+int dquot_quota_sync(struct super_block *sb, int type);
 int dquot_get_dqinfo(struct super_block *sb, int type, struct if_dqinfo *ii);
 int dquot_set_dqinfo(struct super_block *sb, int type, struct if_dqinfo *ii);
 int dquot_get_dqblk(struct super_block *sb, int type, qid_t id,
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/5] quota: Move quota syncing to ->sync_fs method
  2011-07-26 22:38 [PATCH 0/5 v2] Improve sync(2) handling Jan Kara
                   ` (2 preceding siblings ...)
  2011-07-26 22:38 ` [PATCH 3/5] quota: Split dquot_quota_sync() to writeback and cache flushing part Jan Kara
@ 2011-07-26 22:38 ` Jan Kara
  2011-07-27  8:32   ` Steven Whitehouse
                     ` (2 more replies)
  2011-07-26 22:38 ` [PATCH 5/5] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sync(1) Jan Kara
  2011-09-28 15:00 ` [PATCH 0/5 v2] Improve sync(2) handling Christoph Hellwig
  5 siblings, 3 replies; 21+ messages in thread
From: Jan Kara @ 2011-07-26 22:38 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Curt Wohlgemuth, Al Viro, Jan Kara,
	Theodore Ts'o, Steven Whitehouse, Dave Kleikamp, Joel Becker,
	reiserfs-devel

Since the moment writes to quota files are using block device page cache and
space for quota structures is reserved at the moment they are first accessed we
have no reason to sync quota before inode writeback. In fact this order is now
only harmful since quota information can easily change during inode writeback
(either because conversion of delayed-allocated extents or simply because of
allocation of new blocks for simple filesystems not using page_mkwrite).

So move syncing of quota information after writeback of inodes into ->sync_fs
method. This way we do not have to use ->quota_sync callback which is primarily
intended for use by quotactl syscall anyway and we get rid of calling
->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
also support legacy non-journalled quotas) and thus there are no dirty quota
structures.

CC: "Theodore Ts'o" <tytso@mit.edu>
CC: Steven Whitehouse <swhiteho@redhat.com>
CC: Dave Kleikamp <shaggy@kernel.org>
CC: Joel Becker <jlbec@evilplan.org>
CC: reiserfs-devel@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext2/super.c     |    6 ++++++
 fs/ext3/super.c     |    5 +++++
 fs/ext4/super.c     |    5 +++++
 fs/gfs2/super.c     |    2 ++
 fs/jfs/super.c      |    5 +++++
 fs/reiserfs/super.c |    5 +++++
 fs/sync.c           |    3 ---
 7 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 1dd62ed..174782f 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1180,6 +1180,12 @@ static int ext2_sync_fs(struct super_block *sb, int wait)
 	struct ext2_sb_info *sbi = EXT2_SB(sb);
 	struct ext2_super_block *es = EXT2_SB(sb)->s_es;
 
+	/*
+	 * Write quota structures to quota file, sync_blockdev() will write
+	 * them to disk later
+	 */
+	dquot_writeback_dquots(sb, -1);
+
 	spin_lock(&sbi->s_lock);
 	if (es->s_state & cpu_to_le16(EXT2_VALID_FS)) {
 		ext2_debug("setting valid to 0\n");
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index aad153e..a91869c 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -2507,6 +2507,11 @@ static int ext3_sync_fs(struct super_block *sb, int wait)
 {
 	tid_t target;
 
+	/*
+	 * Writeback quota in non-journalled quota case - journalled quota has
+	 * no dirty dquots
+	 */
+	dquot_writeback_dquots(sb, -1);
 	if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
 		if (wait)
 			log_wait_commit(EXT3_SB(sb)->s_journal, target);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9ea71aa..c81ab5f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4199,6 +4199,11 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
 
 	trace_ext4_sync_fs(sb, wait);
 	flush_workqueue(sbi->dio_unwritten_wq);
+	/*
+	 * Writeback quota in non-journalled quota case - journalled quota has
+	 * no dirty dquots
+	 */
+	dquot_writeback_dquots(sb, -1);
 	if (jbd2_journal_start_commit(sbi->s_journal, &target)) {
 		if (wait)
 			jbd2_log_wait_commit(sbi->s_journal, target);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index e361610..faf6e83 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -921,6 +921,8 @@ restart:
 static int gfs2_sync_fs(struct super_block *sb, int wait)
 {
 	struct gfs2_sbd *sdp = sb->s_fs_info;
+
+	gfs2_quota_sync(sb, -1);
 	if (wait && sdp)
 		gfs2_log_flush(sdp, NULL);
 	return 0;
diff --git a/fs/jfs/super.c b/fs/jfs/super.c
index 06c8a67..a2c9cc1 100644
--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -603,6 +603,11 @@ static int jfs_sync_fs(struct super_block *sb, int wait)
 
 	/* log == NULL indicates read-only mount */
 	if (log) {
+		/*
+		 * Write quota structures to quota file, sync_blockdev() will
+		 * write them to disk later
+		 */
+		dquot_writeback_dquots(sb, -1);
 		jfs_flush_journal(log, wait);
 		jfs_syncpt(log, 0);
 	}
diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index aa91089..a2bfc97 100644
--- a/fs/reiserfs/super.c
+++ b/fs/reiserfs/super.c
@@ -66,6 +66,11 @@ static int reiserfs_sync_fs(struct super_block *s, int wait)
 {
 	struct reiserfs_transaction_handle th;
 
+	/*
+	 * Writeback quota in non-journalled quota case - journalled quota has
+	 * no dirty dquots
+	 */
+	dquot_writeback_dquots(s, -1);
 	reiserfs_write_lock(s);
 	if (!journal_begin(&th, s, 1))
 		if (!journal_end_sync(&th, s, 1))
diff --git a/fs/sync.c b/fs/sync.c
index 2b7d569..f07f991 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -30,9 +30,6 @@
  */
 static void __sync_filesystem(struct super_block *sb, int wait)
 {
-	if (sb->s_qcop && sb->s_qcop->quota_sync)
-		sb->s_qcop->quota_sync(sb, -1);
-
 	if (wait)
 		sync_inodes_sb(sb);
 	else
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/5] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sync(1)
  2011-07-26 22:38 [PATCH 0/5 v2] Improve sync(2) handling Jan Kara
                   ` (3 preceding siblings ...)
  2011-07-26 22:38 ` [PATCH 4/5] quota: Move quota syncing to ->sync_fs method Jan Kara
@ 2011-07-26 22:38 ` Jan Kara
  2011-07-27  9:52   ` Christoph Hellwig
  2011-09-28 15:00 ` [PATCH 0/5 v2] Improve sync(2) handling Christoph Hellwig
  5 siblings, 1 reply; 21+ messages in thread
From: Jan Kara @ 2011-07-26 22:38 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Curt Wohlgemuth, Al Viro, Jan Kara

wakeup_flusher_thread(0) will queue work doing complete writeback for
each flusher thread. Thus there is not much point in submitting another
work doing full inode WB_SYNC_NONE writeback by sync_filesystems(). So change
sync to do:
  wakeup_flusher_threads(0);
  for each filesystem
    WB_SYNC_ALL inode writeback
    sync_fs(wait=0)
  submit dirty buffers from all block devices
  for each filesystem
    sync_fs(wait=1)
  synchronous writeout of all block devices

  Note that this changes ordering of sync_fs() calls and inode writeback.
Previously we called sync_fs(wait=0) after WB_SYNC_NONE inode writeback and
before WB_SYNC_ALL inode writeback. Now we call it after WB_SYNC_ALL inode
writeback because there is no point in calling it while flusher threads woken
by wakeup_flusher_threads(0) are still writing out data and there is no easy
way to find out when work submitted by wakeup_flusher_threads() is finished.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/sync.c |   55 +++++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index f07f991..ca40cda 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -28,15 +28,16 @@
  * speeds up the wait == 1 case since in that case write_inode() functions do
  * sync_dirty_buffer() and thus effectively write one block at a time.
  */
-static void __sync_filesystem(struct super_block *sb, int wait)
+static void __sync_filesystem(struct super_block *sb, int emergency)
 {
-	if (wait)
-		sync_inodes_sb(sb);
-	else
+	/* In case of emergency sync we don't want to wait for locks and IO */
+	if (unlikely(emergency))
 		writeback_inodes_sb(sb);
+	else
+		sync_inodes_sb(sb);
 
 	if (sb->s_op->sync_fs)
-		sb->s_op->sync_fs(sb, wait);
+		sb->s_op->sync_fs(sb, 0);
 }
 
 /*
@@ -60,11 +61,23 @@ int sync_filesystem(struct super_block *sb)
 	if (sb->s_flags & MS_RDONLY)
 		return 0;
 
-	__sync_filesystem(sb, 0);
+	/* Asynchronous pass of sync to speed things up */
+	writeback_inodes_sb(sb);
+	if (sb->s_op->sync_fs) {
+		ret = sb->s_op->sync_fs(sb, 0);
+		if (ret)
+			return ret;
+	}
 	ret = __sync_blockdev(sb->s_bdev, 0);
 	if (ret < 0)
 		return ret;
-	__sync_filesystem(sb, 1);
+	/* Synchronous pass of sync to guarantee data integrity */
+	sync_inodes_sb(sb);
+	if (sb->s_op->sync_fs) {
+		ret = sb->s_op->sync_fs(sb, 1);
+		if (ret)
+			return ret;
+	}
 	return __sync_blockdev(sb->s_bdev, 1);
 }
 EXPORT_SYMBOL_GPL(sync_filesystem);
@@ -79,9 +92,9 @@ static void sync_one_sb(struct super_block *sb, void *arg)
  * Sync all the data for all the filesystems (called by sys_sync() and
  * emergency sync)
  */
-static void sync_filesystems(int wait)
+static void sync_filesystems(int emergency)
 {
-	iterate_supers(sync_one_sb, &wait);
+	iterate_supers(sync_one_sb, &emergency);
 }
 
 static void sync_all_bdevs(int wait)
@@ -120,16 +133,34 @@ static void sync_all_bdevs(int wait)
 	iput(old_inode);
 }
 
+static void sb_sync_fs(struct super_block *sb, void *arg)
+{
+	/* Avoid read-only filesystems */
+	if (sb->s_flags & MS_RDONLY)
+		return;
+	if (sb->s_op->sync_fs)
+		sb->s_op->sync_fs(sb, 1);
+}
+
 /*
  * sync everything.  Start out by waking pdflush, because that writes back
  * all queues in parallel.
  */
 SYSCALL_DEFINE0(sync)
 {
+	/* Start flushing on all devices */
 	wakeup_flusher_threads(0);
+	/*
+	 * Above call queued work doing complete writeout on each filesystem.
+	 * Now we queue work which guarantees data integrity of all inodes
+	 * - not much should be left for it to write. The WB_SYNC_ALL inode
+	 * writeback also guarantees that sync_fs() is called after inodes
+	 * are written out and thus it can do meaningful work.
+	 */
 	sync_filesystems(0);
-	sync_filesystems(1);
 	sync_all_bdevs(0);
+	/* Call blocking ->sync_fs() for each filesystem */
+	iterate_supers(sb_sync_fs, NULL);
 	sync_all_bdevs(1);
 	if (unlikely(laptop_mode))
 		laptop_sync_completion();
@@ -142,8 +173,8 @@ static void do_sync_work(struct work_struct *work)
 	 * Sync twice to reduce the possibility we skipped some inodes / pages
 	 * because they were temporarily locked
 	 */
-	sync_filesystems(0);
-	sync_filesystems(0);
+	sync_filesystems(1);
+	sync_filesystems(1);
 	printk("Emergency Sync complete\n");
 	kfree(work);
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/5] quota: Split dquot_quota_sync() to writeback and cache flushing part
  2011-07-26 22:38 ` [PATCH 3/5] quota: Split dquot_quota_sync() to writeback and cache flushing part Jan Kara
@ 2011-07-27  8:26   ` Steven Whitehouse
  2011-07-27  9:45   ` Christoph Hellwig
  1 sibling, 0 replies; 21+ messages in thread
From: Steven Whitehouse @ 2011-07-27  8:26 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Christoph Hellwig, Curt Wohlgemuth, Al Viro

Hi,

On Wed, 2011-07-27 at 00:38 +0200, Jan Kara wrote:
> Split off part of dquot_quota_sync() which writes dquots into a quota file
> to a separate function. In the next patch we will use the function from
> filesystems and we do not want to abuse ->quota_sync quotactl callback more
> than necessary.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>

Steve.

> ---
>  fs/gfs2/quota.c          |    4 ++--
>  fs/gfs2/quota.h          |    2 +-
>  fs/gfs2/super.c          |    2 +-
>  fs/gfs2/sys.c            |    2 +-
>  fs/quota/dquot.c         |   35 ++++++++++++++++++++++++++---------
>  fs/quota/quota.c         |    4 ++--
>  fs/sync.c                |    2 +-
>  include/linux/quota.h    |    2 +-
>  include/linux/quotaops.h |    3 ++-
>  9 files changed, 37 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
> index 42e8d23..2ca3c6a 100644
> --- a/fs/gfs2/quota.c
> +++ b/fs/gfs2/quota.c
> @@ -1122,7 +1122,7 @@ void gfs2_quota_change(struct gfs2_inode *ip, s64 change,
>  	}
>  }
>  
> -int gfs2_quota_sync(struct super_block *sb, int type, int wait)
> +int gfs2_quota_sync(struct super_block *sb, int type)
>  {
>  	struct gfs2_sbd *sdp = sb->s_fs_info;
>  	struct gfs2_quota_data **qda;
> @@ -1168,7 +1168,7 @@ int gfs2_quota_sync(struct super_block *sb, int type, int wait)
>  
>  static int gfs2_quota_sync_timeo(struct super_block *sb, int type)
>  {
> -	return gfs2_quota_sync(sb, type, 0);
> +	return gfs2_quota_sync(sb, type);
>  }
>  
>  int gfs2_quota_refresh(struct gfs2_sbd *sdp, int user, u32 id)
> diff --git a/fs/gfs2/quota.h b/fs/gfs2/quota.h
> index 90bf1c3..f25d98b 100644
> --- a/fs/gfs2/quota.h
> +++ b/fs/gfs2/quota.h
> @@ -26,7 +26,7 @@ extern int gfs2_quota_check(struct gfs2_inode *ip, u32 uid, u32 gid);
>  extern void gfs2_quota_change(struct gfs2_inode *ip, s64 change,
>  			      u32 uid, u32 gid);
>  
> -extern int gfs2_quota_sync(struct super_block *sb, int type, int wait);
> +extern int gfs2_quota_sync(struct super_block *sb, int type);
>  extern int gfs2_quota_refresh(struct gfs2_sbd *sdp, int user, u32 id);
>  
>  extern int gfs2_quota_init(struct gfs2_sbd *sdp);
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index ed540e7..e361610 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -807,7 +807,7 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
>  	int error;
>  
>  	flush_workqueue(gfs2_delete_workqueue);
> -	gfs2_quota_sync(sdp->sd_vfs, 0, 1);
> +	gfs2_quota_sync(sdp->sd_vfs, 0);
>  	gfs2_statfs_sync(sdp->sd_vfs, 0);
>  
>  	error = gfs2_glock_nq_init(sdp->sd_trans_gl, LM_ST_SHARED, GL_NOCACHE,
> diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
> index e20eab3..b595462 100644
> --- a/fs/gfs2/sys.c
> +++ b/fs/gfs2/sys.c
> @@ -168,7 +168,7 @@ static ssize_t quota_sync_store(struct gfs2_sbd *sdp, const char *buf,
>  	if (simple_strtol(buf, NULL, 0) != 1)
>  		return -EINVAL;
>  
> -	gfs2_quota_sync(sdp->sd_vfs, 0, 1);
> +	gfs2_quota_sync(sdp->sd_vfs, 0);
>  	return len;
>  }
>  
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index 5b572c8..35c4a47 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -595,12 +595,14 @@ out:
>  }
>  EXPORT_SYMBOL(dquot_scan_active);
>  
> -int dquot_quota_sync(struct super_block *sb, int type, int wait)
> +/* Write all dquot structures to quota files */
> +int dquot_writeback_dquots(struct super_block *sb, int type)
>  {
>  	struct list_head *dirty;
>  	struct dquot *dquot;
>  	struct quota_info *dqopt = sb_dqopt(sb);
>  	int cnt;
> +	int err, ret = 0;
>  
>  	mutex_lock(&dqopt->dqonoff_mutex);
>  	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> @@ -624,7 +626,9 @@ int dquot_quota_sync(struct super_block *sb, int type, int wait)
>  			atomic_inc(&dquot->dq_count);
>  			spin_unlock(&dq_list_lock);
>  			dqstats_inc(DQST_LOOKUPS);
> -			sb->dq_op->write_dquot(dquot);
> +			err = sb->dq_op->write_dquot(dquot);
> +			if (!ret && err)
> +				err = ret;
>  			dqput(dquot);
>  			spin_lock(&dq_list_lock);
>  		}
> @@ -638,7 +642,21 @@ int dquot_quota_sync(struct super_block *sb, int type, int wait)
>  	dqstats_inc(DQST_SYNCS);
>  	mutex_unlock(&dqopt->dqonoff_mutex);
>  
> -	if (!wait || (sb_dqopt(sb)->flags & DQUOT_QUOTA_SYS_FILE))
> +	return ret;
> +}
> +EXPORT_SYMBOL(dquot_writeback_dquots);
> +
> +/* Write all dquot structures to disk and make them visible from userspace */
> +int dquot_quota_sync(struct super_block *sb, int type)
> +{
> +	struct quota_info *dqopt = sb_dqopt(sb);
> +	int cnt;
> +	int ret;
> +
> +	ret = dquot_writeback_dquots(sb, type);
> +	if (ret)
> +		return ret;
> +	if (dqopt->flags & DQUOT_QUOTA_SYS_FILE)
>  		return 0;
>  
>  	/* This is not very clever (and fast) but currently I don't know about
> @@ -652,18 +670,17 @@ int dquot_quota_sync(struct super_block *sb, int type, int wait)
>  	 * Now when everything is written we can discard the pagecache so
>  	 * that userspace sees the changes.
>  	 */
> -	mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
> +	mutex_lock(&dqopt->dqonoff_mutex);
>  	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
>  		if (type != -1 && cnt != type)
>  			continue;
>  		if (!sb_has_quota_active(sb, cnt))
>  			continue;
> -		mutex_lock_nested(&sb_dqopt(sb)->files[cnt]->i_mutex,
> -				  I_MUTEX_QUOTA);
> -		truncate_inode_pages(&sb_dqopt(sb)->files[cnt]->i_data, 0);
> -		mutex_unlock(&sb_dqopt(sb)->files[cnt]->i_mutex);
> +		mutex_lock_nested(&dqopt->files[cnt]->i_mutex, I_MUTEX_QUOTA);
> +		truncate_inode_pages(&dqopt->files[cnt]->i_data, 0);
> +		mutex_unlock(&dqopt->files[cnt]->i_mutex);
>  	}
> -	mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
> +	mutex_unlock(&dqopt->dqonoff_mutex);
>  
>  	return 0;
>  }
> diff --git a/fs/quota/quota.c b/fs/quota/quota.c
> index b34bdb2..76b8012 100644
> --- a/fs/quota/quota.c
> +++ b/fs/quota/quota.c
> @@ -48,7 +48,7 @@ static int check_quotactl_permission(struct super_block *sb, int type, int cmd,
>  static void quota_sync_one(struct super_block *sb, void *arg)
>  {
>  	if (sb->s_qcop && sb->s_qcop->quota_sync)
> -		sb->s_qcop->quota_sync(sb, *(int *)arg, 1);
> +		sb->s_qcop->quota_sync(sb, *(int *)arg);
>  }
>  
>  static int quota_sync_all(int type)
> @@ -271,7 +271,7 @@ static int do_quotactl(struct super_block *sb, int type, int cmd, qid_t id,
>  	case Q_SYNC:
>  		if (!sb->s_qcop->quota_sync)
>  			return -ENOSYS;
> -		return sb->s_qcop->quota_sync(sb, type, 1);
> +		return sb->s_qcop->quota_sync(sb, type);
>  	case Q_XQUOTAON:
>  	case Q_XQUOTAOFF:
>  	case Q_XQUOTARM:
> diff --git a/fs/sync.c b/fs/sync.c
> index 2f5bd52..2b7d569 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -31,7 +31,7 @@
>  static void __sync_filesystem(struct super_block *sb, int wait)
>  {
>  	if (sb->s_qcop && sb->s_qcop->quota_sync)
> -		sb->s_qcop->quota_sync(sb, -1, wait);
> +		sb->s_qcop->quota_sync(sb, -1);
>  
>  	if (wait)
>  		sync_inodes_sb(sb);
> diff --git a/include/linux/quota.h b/include/linux/quota.h
> index 9a85412..86173d6 100644
> --- a/include/linux/quota.h
> +++ b/include/linux/quota.h
> @@ -329,7 +329,7 @@ struct quotactl_ops {
>  	int (*quota_on)(struct super_block *, int, int, struct path *);
>  	int (*quota_on_meta)(struct super_block *, int, int);
>  	int (*quota_off)(struct super_block *, int);
> -	int (*quota_sync)(struct super_block *, int, int);
> +	int (*quota_sync)(struct super_block *, int);
>  	int (*get_info)(struct super_block *, int, struct if_dqinfo *);
>  	int (*set_info)(struct super_block *, int, struct if_dqinfo *);
>  	int (*get_dqblk)(struct super_block *, int, qid_t, struct fs_disk_quota *);
> diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
> index 26f9e36..aba7578 100644
> --- a/include/linux/quotaops.h
> +++ b/include/linux/quotaops.h
> @@ -83,7 +83,8 @@ int dquot_quota_on(struct super_block *sb, int type, int format_id,
>  int dquot_quota_on_mount(struct super_block *sb, char *qf_name,
>   	int format_id, int type);
>  int dquot_quota_off(struct super_block *sb, int type);
> -int dquot_quota_sync(struct super_block *sb, int type, int wait);
> +int dquot_writeback_dquots(struct super_block *sb, int type);
> +int dquot_quota_sync(struct super_block *sb, int type);
>  int dquot_get_dqinfo(struct super_block *sb, int type, struct if_dqinfo *ii);
>  int dquot_set_dqinfo(struct super_block *sb, int type, struct if_dqinfo *ii);
>  int dquot_get_dqblk(struct super_block *sb, int type, qid_t id,



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] quota: Move quota syncing to ->sync_fs method
  2011-07-26 22:38 ` [PATCH 4/5] quota: Move quota syncing to ->sync_fs method Jan Kara
@ 2011-07-27  8:32   ` Steven Whitehouse
  2011-07-27  9:46   ` Christoph Hellwig
  2011-07-27 13:44   ` Dave Kleikamp
  2 siblings, 0 replies; 21+ messages in thread
From: Steven Whitehouse @ 2011-07-27  8:32 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Christoph Hellwig, Curt Wohlgemuth, Al Viro,
	Theodore Ts'o, Dave Kleikamp, Joel Becker, reiserfs-devel

Hi,

On Wed, 2011-07-27 at 00:38 +0200, Jan Kara wrote:
> Since the moment writes to quota files are using block device page cache and
> space for quota structures is reserved at the moment they are first accessed we
> have no reason to sync quota before inode writeback. In fact this order is now
> only harmful since quota information can easily change during inode writeback
> (either because conversion of delayed-allocated extents or simply because of
> allocation of new blocks for simple filesystems not using page_mkwrite).
> 
> So move syncing of quota information after writeback of inodes into ->sync_fs
> method. This way we do not have to use ->quota_sync callback which is primarily
> intended for use by quotactl syscall anyway and we get rid of calling
> ->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
> proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
> also support legacy non-journalled quotas) and thus there are no dirty quota
> structures.
> 
> CC: "Theodore Ts'o" <tytso@mit.edu>
> CC: Steven Whitehouse <swhiteho@redhat.com>
> CC: Dave Kleikamp <shaggy@kernel.org>
> CC: Joel Becker <jlbec@evilplan.org>
> CC: reiserfs-devel@vger.kernel.org
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Steven Whitehouse <swhiteho@redhat.com>

Steve.

> ---
>  fs/ext2/super.c     |    6 ++++++
>  fs/ext3/super.c     |    5 +++++
>  fs/ext4/super.c     |    5 +++++
>  fs/gfs2/super.c     |    2 ++
>  fs/jfs/super.c      |    5 +++++
>  fs/reiserfs/super.c |    5 +++++
>  fs/sync.c           |    3 ---
>  7 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ext2/super.c b/fs/ext2/super.c
> index 1dd62ed..174782f 100644
> --- a/fs/ext2/super.c
> +++ b/fs/ext2/super.c
> @@ -1180,6 +1180,12 @@ static int ext2_sync_fs(struct super_block *sb, int wait)
>  	struct ext2_sb_info *sbi = EXT2_SB(sb);
>  	struct ext2_super_block *es = EXT2_SB(sb)->s_es;
>  
> +	/*
> +	 * Write quota structures to quota file, sync_blockdev() will write
> +	 * them to disk later
> +	 */
> +	dquot_writeback_dquots(sb, -1);
> +
>  	spin_lock(&sbi->s_lock);
>  	if (es->s_state & cpu_to_le16(EXT2_VALID_FS)) {
>  		ext2_debug("setting valid to 0\n");
> diff --git a/fs/ext3/super.c b/fs/ext3/super.c
> index aad153e..a91869c 100644
> --- a/fs/ext3/super.c
> +++ b/fs/ext3/super.c
> @@ -2507,6 +2507,11 @@ static int ext3_sync_fs(struct super_block *sb, int wait)
>  {
>  	tid_t target;
>  
> +	/*
> +	 * Writeback quota in non-journalled quota case - journalled quota has
> +	 * no dirty dquots
> +	 */
> +	dquot_writeback_dquots(sb, -1);
>  	if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
>  		if (wait)
>  			log_wait_commit(EXT3_SB(sb)->s_journal, target);
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 9ea71aa..c81ab5f 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4199,6 +4199,11 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
>  
>  	trace_ext4_sync_fs(sb, wait);
>  	flush_workqueue(sbi->dio_unwritten_wq);
> +	/*
> +	 * Writeback quota in non-journalled quota case - journalled quota has
> +	 * no dirty dquots
> +	 */
> +	dquot_writeback_dquots(sb, -1);
>  	if (jbd2_journal_start_commit(sbi->s_journal, &target)) {
>  		if (wait)
>  			jbd2_log_wait_commit(sbi->s_journal, target);
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index e361610..faf6e83 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -921,6 +921,8 @@ restart:
>  static int gfs2_sync_fs(struct super_block *sb, int wait)
>  {
>  	struct gfs2_sbd *sdp = sb->s_fs_info;
> +
> +	gfs2_quota_sync(sb, -1);
>  	if (wait && sdp)
>  		gfs2_log_flush(sdp, NULL);
>  	return 0;
> diff --git a/fs/jfs/super.c b/fs/jfs/super.c
> index 06c8a67..a2c9cc1 100644
> --- a/fs/jfs/super.c
> +++ b/fs/jfs/super.c
> @@ -603,6 +603,11 @@ static int jfs_sync_fs(struct super_block *sb, int wait)
>  
>  	/* log == NULL indicates read-only mount */
>  	if (log) {
> +		/*
> +		 * Write quota structures to quota file, sync_blockdev() will
> +		 * write them to disk later
> +		 */
> +		dquot_writeback_dquots(sb, -1);
>  		jfs_flush_journal(log, wait);
>  		jfs_syncpt(log, 0);
>  	}
> diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
> index aa91089..a2bfc97 100644
> --- a/fs/reiserfs/super.c
> +++ b/fs/reiserfs/super.c
> @@ -66,6 +66,11 @@ static int reiserfs_sync_fs(struct super_block *s, int wait)
>  {
>  	struct reiserfs_transaction_handle th;
>  
> +	/*
> +	 * Writeback quota in non-journalled quota case - journalled quota has
> +	 * no dirty dquots
> +	 */
> +	dquot_writeback_dquots(s, -1);
>  	reiserfs_write_lock(s);
>  	if (!journal_begin(&th, s, 1))
>  		if (!journal_end_sync(&th, s, 1))
> diff --git a/fs/sync.c b/fs/sync.c
> index 2b7d569..f07f991 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -30,9 +30,6 @@
>   */
>  static void __sync_filesystem(struct super_block *sb, int wait)
>  {
> -	if (sb->s_qcop && sb->s_qcop->quota_sync)
> -		sb->s_qcop->quota_sync(sb, -1);
> -
>  	if (wait)
>  		sync_inodes_sb(sb);
>  	else



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] vfs: Make sync(1) writeout also block device inodes
  2011-07-26 22:38 ` [PATCH 1/5] vfs: Make sync(1) writeout also block device inodes Jan Kara
@ 2011-07-27  9:44   ` Christoph Hellwig
  2011-07-28 20:13     ` Jan Kara
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2011-07-27  9:44 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Christoph Hellwig, Curt Wohlgemuth, Al Viro

> -static int __sync_filesystem(struct super_block *sb, int wait)
> +static void __sync_filesystem(struct super_block *sb, int wait)
>  {
> -	/*
> -	 * This should be safe, as we require bdi backing to actually
> -	 * write out data in the first place
> -	 */
> -	if (sb->s_bdi == &noop_backing_dev_info)
> -		return 0;
> -

Moving this check is not related to the block device writeback, is it?
Furthermore it gets moved deeper into the stack later on anyway, so it
gets reverted.

I think the series would be much cleaner if this patch gets moved
towards the end of it.

> +static void sync_all_bdevs(int wait)
> +{

This function should at least have a comment explaining why we need it.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] vfs: Move noop_backing_dev_info check from sync into writeback
  2011-07-26 22:38 ` [PATCH 2/5] vfs: Move noop_backing_dev_info check from sync into writeback Jan Kara
@ 2011-07-27  9:44   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2011-07-27  9:44 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Christoph Hellwig, Curt Wohlgemuth, Al Viro

On Wed, Jul 27, 2011 at 12:38:03AM +0200, Jan Kara wrote:
> In principle, a filesystem may want to have ->sync_fs() called during sync(1)
> although it does not have a bdi (i.e. s_bdi is set to noop_backing_dev_info).
> Only writeback code really needs bdi set to something reasonable. So move the
> checks where they are more logical.

Looks good, although as mentioned this would be better done before patch
1.


Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/5] quota: Split dquot_quota_sync() to writeback and cache flushing part
  2011-07-26 22:38 ` [PATCH 3/5] quota: Split dquot_quota_sync() to writeback and cache flushing part Jan Kara
  2011-07-27  8:26   ` Steven Whitehouse
@ 2011-07-27  9:45   ` Christoph Hellwig
  1 sibling, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2011-07-27  9:45 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Christoph Hellwig, Curt Wohlgemuth, Al Viro

On Wed, Jul 27, 2011 at 12:38:04AM +0200, Jan Kara wrote:
> Split off part of dquot_quota_sync() which writes dquots into a quota file
> to a separate function. In the next patch we will use the function from
> filesystems and we do not want to abuse ->quota_sync quotactl callback more
> than necessary.

I don't know the generic quota code enough to verify that this actually is
safe, but the change at least looks good to me from the VFS POW.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] quota: Move quota syncing to ->sync_fs method
  2011-07-26 22:38 ` [PATCH 4/5] quota: Move quota syncing to ->sync_fs method Jan Kara
  2011-07-27  8:32   ` Steven Whitehouse
@ 2011-07-27  9:46   ` Christoph Hellwig
  2011-07-27 13:44   ` Dave Kleikamp
  2 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2011-07-27  9:46 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Christoph Hellwig, Curt Wohlgemuth, Al Viro,
	Theodore Ts'o, Steven Whitehouse, Dave Kleikamp, Joel Becker,
	reiserfs-devel

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sync(1)
  2011-07-26 22:38 ` [PATCH 5/5] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sync(1) Jan Kara
@ 2011-07-27  9:52   ` Christoph Hellwig
  2011-07-28 17:42     ` Jan Kara
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2011-07-27  9:52 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Christoph Hellwig, Curt Wohlgemuth, Al Viro

> +static void __sync_filesystem(struct super_block *sb, int emergency)
>  {
> +	/* In case of emergency sync we don't want to wait for locks and IO */
> +	if (unlikely(emergency))
>  		writeback_inodes_sb(sb);
> +	else
> +		sync_inodes_sb(sb);
>  
>  	if (sb->s_op->sync_fs)
> +		sb->s_op->sync_fs(sb, 0);
>  }

This function doesn't make sense to me in the current form.  Why would
be do a blocking sync_inodes_sb, but then a non-blocking ->sync_fs?

> -static void sync_filesystems(int wait)
> +static void sync_filesystems(int emergency)
>  {
> -	iterate_supers(sync_one_sb, &wait);
> +	iterate_supers(sync_one_sb, &emergency);
>  }


I'd just drop the sync_filesystems wrapper, and also use individual
callbacks for the two cases.

>  SYSCALL_DEFINE0(sync)
>  {
> +	/* Start flushing on all devices */
>  	wakeup_flusher_threads(0);
> +	/*
> +	 * Above call queued work doing complete writeout on each filesystem.
> +	 * Now we queue work which guarantees data integrity of all inodes
> +	 * - not much should be left for it to write. The WB_SYNC_ALL inode
> +	 * writeback also guarantees that sync_fs() is called after inodes
> +	 * are written out and thus it can do meaningful work.
> +	 */
>  	sync_filesystems(0);

This really should be an iteration over sb_sync_fs with wait == 0

>  	sync_all_bdevs(0);
> +	/* Call blocking ->sync_fs() for each filesystem */
> +	iterate_supers(sb_sync_fs, NULL);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] quota: Move quota syncing to ->sync_fs method
  2011-07-26 22:38 ` [PATCH 4/5] quota: Move quota syncing to ->sync_fs method Jan Kara
  2011-07-27  8:32   ` Steven Whitehouse
  2011-07-27  9:46   ` Christoph Hellwig
@ 2011-07-27 13:44   ` Dave Kleikamp
  2 siblings, 0 replies; 21+ messages in thread
From: Dave Kleikamp @ 2011-07-27 13:44 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Christoph Hellwig, Curt Wohlgemuth, Al Viro,
	Theodore Ts'o, Steven Whitehouse, Joel Becker,
	reiserfs-devel

On 07/26/2011 05:38 PM, Jan Kara wrote:
> Since the moment writes to quota files are using block device page cache and
> space for quota structures is reserved at the moment they are first accessed we
> have no reason to sync quota before inode writeback. In fact this order is now
> only harmful since quota information can easily change during inode writeback
> (either because conversion of delayed-allocated extents or simply because of
> allocation of new blocks for simple filesystems not using page_mkwrite).
> 
> So move syncing of quota information after writeback of inodes into ->sync_fs
> method. This way we do not have to use ->quota_sync callback which is primarily
> intended for use by quotactl syscall anyway and we get rid of calling
> ->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
> proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
> also support legacy non-journalled quotas) and thus there are no dirty quota
> structures.
> 
> CC: "Theodore Ts'o" <tytso@mit.edu>
> CC: Steven Whitehouse <swhiteho@redhat.com>
> CC: Dave Kleikamp <shaggy@kernel.org>
> CC: Joel Becker <jlbec@evilplan.org>
> CC: reiserfs-devel@vger.kernel.org
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>

> ---
>  fs/ext2/super.c     |    6 ++++++
>  fs/ext3/super.c     |    5 +++++
>  fs/ext4/super.c     |    5 +++++
>  fs/gfs2/super.c     |    2 ++
>  fs/jfs/super.c      |    5 +++++
>  fs/reiserfs/super.c |    5 +++++
>  fs/sync.c           |    3 ---
>  7 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ext2/super.c b/fs/ext2/super.c
> index 1dd62ed..174782f 100644
> --- a/fs/ext2/super.c
> +++ b/fs/ext2/super.c
> @@ -1180,6 +1180,12 @@ static int ext2_sync_fs(struct super_block *sb, int wait)
>  	struct ext2_sb_info *sbi = EXT2_SB(sb);
>  	struct ext2_super_block *es = EXT2_SB(sb)->s_es;
>  
> +	/*
> +	 * Write quota structures to quota file, sync_blockdev() will write
> +	 * them to disk later
> +	 */
> +	dquot_writeback_dquots(sb, -1);
> +
>  	spin_lock(&sbi->s_lock);
>  	if (es->s_state & cpu_to_le16(EXT2_VALID_FS)) {
>  		ext2_debug("setting valid to 0\n");
> diff --git a/fs/ext3/super.c b/fs/ext3/super.c
> index aad153e..a91869c 100644
> --- a/fs/ext3/super.c
> +++ b/fs/ext3/super.c
> @@ -2507,6 +2507,11 @@ static int ext3_sync_fs(struct super_block *sb, int wait)
>  {
>  	tid_t target;
>  
> +	/*
> +	 * Writeback quota in non-journalled quota case - journalled quota has
> +	 * no dirty dquots
> +	 */
> +	dquot_writeback_dquots(sb, -1);
>  	if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
>  		if (wait)
>  			log_wait_commit(EXT3_SB(sb)->s_journal, target);
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 9ea71aa..c81ab5f 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4199,6 +4199,11 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
>  
>  	trace_ext4_sync_fs(sb, wait);
>  	flush_workqueue(sbi->dio_unwritten_wq);
> +	/*
> +	 * Writeback quota in non-journalled quota case - journalled quota has
> +	 * no dirty dquots
> +	 */
> +	dquot_writeback_dquots(sb, -1);
>  	if (jbd2_journal_start_commit(sbi->s_journal, &target)) {
>  		if (wait)
>  			jbd2_log_wait_commit(sbi->s_journal, target);
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index e361610..faf6e83 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -921,6 +921,8 @@ restart:
>  static int gfs2_sync_fs(struct super_block *sb, int wait)
>  {
>  	struct gfs2_sbd *sdp = sb->s_fs_info;
> +
> +	gfs2_quota_sync(sb, -1);
>  	if (wait && sdp)
>  		gfs2_log_flush(sdp, NULL);
>  	return 0;
> diff --git a/fs/jfs/super.c b/fs/jfs/super.c
> index 06c8a67..a2c9cc1 100644
> --- a/fs/jfs/super.c
> +++ b/fs/jfs/super.c
> @@ -603,6 +603,11 @@ static int jfs_sync_fs(struct super_block *sb, int wait)
>  
>  	/* log == NULL indicates read-only mount */
>  	if (log) {
> +		/*
> +		 * Write quota structures to quota file, sync_blockdev() will
> +		 * write them to disk later
> +		 */
> +		dquot_writeback_dquots(sb, -1);
>  		jfs_flush_journal(log, wait);
>  		jfs_syncpt(log, 0);
>  	}
> diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
> index aa91089..a2bfc97 100644
> --- a/fs/reiserfs/super.c
> +++ b/fs/reiserfs/super.c
> @@ -66,6 +66,11 @@ static int reiserfs_sync_fs(struct super_block *s, int wait)
>  {
>  	struct reiserfs_transaction_handle th;
>  
> +	/*
> +	 * Writeback quota in non-journalled quota case - journalled quota has
> +	 * no dirty dquots
> +	 */
> +	dquot_writeback_dquots(s, -1);
>  	reiserfs_write_lock(s);
>  	if (!journal_begin(&th, s, 1))
>  		if (!journal_end_sync(&th, s, 1))
> diff --git a/fs/sync.c b/fs/sync.c
> index 2b7d569..f07f991 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -30,9 +30,6 @@
>   */
>  static void __sync_filesystem(struct super_block *sb, int wait)
>  {
> -	if (sb->s_qcop && sb->s_qcop->quota_sync)
> -		sb->s_qcop->quota_sync(sb, -1);
> -
>  	if (wait)
>  		sync_inodes_sb(sb);
>  	else

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sync(1)
  2011-07-27  9:52   ` Christoph Hellwig
@ 2011-07-28 17:42     ` Jan Kara
  2011-07-28 20:37       ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kara @ 2011-07-28 17:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, linux-fsdevel, Curt Wohlgemuth, Al Viro

On Wed 27-07-11 05:52:48, Christoph Hellwig wrote:
> > +static void __sync_filesystem(struct super_block *sb, int emergency)
> >  {
> > +	/* In case of emergency sync we don't want to wait for locks and IO */
> > +	if (unlikely(emergency))
> >  		writeback_inodes_sb(sb);
> > +	else
> > +		sync_inodes_sb(sb);
> >  
> >  	if (sb->s_op->sync_fs)
> > +		sb->s_op->sync_fs(sb, 0);
> >  }
> 
> This function doesn't make sense to me in the current form.  Why would
> be do a blocking sync_inodes_sb, but then a non-blocking ->sync_fs?
  I tried to explain this in the changelog but it seems I failed.
wakeup_flusher_threads(0) starts completely asynchronous writeback of all
bdis. We could do an iteration of sb_sync_fs with wait==0 over all
superblocks just after wakeup_flusher_threads(0) returns but at this
moment, almost nothing is written yet because flusher threads have just
started doing their work. So what useful work can ->sync_fs(0) do in such
case? E.g. ocfs2 could start processing orphan list but for example
starting quota writeback as done by XFS would be really fruitless as quota
is going to change heavily as a result of writeback I expect. Similarly
ext3 or ext4 cannot really do useful work while most inodes are not yet
written.

It would be nice to call ->sync_fs with wait == 0 when flusher thread is
done with the bdi but there's no easy way to do that. We could use the
completion for this but we'd have to track these for every bdi which isn't
very nice. That's why I chose to synchronize with flusher threads using
the synchronous inode writeback - when that is finished we know flusher
threads are done and so calling ->sync_fs() can be useful. So do you think
calling ->sync_fs() with wait == 0 that late is a problem?
  
> > -static void sync_filesystems(int wait)
> > +static void sync_filesystems(int emergency)
> >  {
> > -	iterate_supers(sync_one_sb, &wait);
> > +	iterate_supers(sync_one_sb, &emergency);
> >  }
> 
> 
> I'd just drop the sync_filesystems wrapper, and also use individual
> callbacks for the two cases.
  OK, will do.

> >  SYSCALL_DEFINE0(sync)
> >  {
> > +	/* Start flushing on all devices */
> >  	wakeup_flusher_threads(0);
> > +	/*
> > +	 * Above call queued work doing complete writeout on each filesystem.
> > +	 * Now we queue work which guarantees data integrity of all inodes
> > +	 * - not much should be left for it to write. The WB_SYNC_ALL inode
> > +	 * writeback also guarantees that sync_fs() is called after inodes
> > +	 * are written out and thus it can do meaningful work.
> > +	 */
> >  	sync_filesystems(0);
> 
> This really should be an iteration over sb_sync_fs with wait == 0
> 
> >  	sync_all_bdevs(0);
> > +	/* Call blocking ->sync_fs() for each filesystem */
> > +	iterate_supers(sb_sync_fs, NULL);

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] vfs: Make sync(1) writeout also block device inodes
  2011-07-27  9:44   ` Christoph Hellwig
@ 2011-07-28 20:13     ` Jan Kara
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Kara @ 2011-07-28 20:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, linux-fsdevel, Curt Wohlgemuth, Al Viro

On Wed 27-07-11 05:44:23, Christoph Hellwig wrote:
> > -static int __sync_filesystem(struct super_block *sb, int wait)
> > +static void __sync_filesystem(struct super_block *sb, int wait)
> >  {
> > -	/*
> > -	 * This should be safe, as we require bdi backing to actually
> > -	 * write out data in the first place
> > -	 */
> > -	if (sb->s_bdi == &noop_backing_dev_info)
> > -		return 0;
> > -
> 
> Moving this check is not related to the block device writeback, is it?
> Furthermore it gets moved deeper into the stack later on anyway, so it
> gets reverted.
  Good point, I'll swap patches 1 & 2.

> I think the series would be much cleaner if this patch gets moved
> towards the end of it.
>
> > +static void sync_all_bdevs(int wait)
> > +{
> 
> This function should at least have a comment explaining why we need it.
  Will do.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sync(1)
  2011-07-28 17:42     ` Jan Kara
@ 2011-07-28 20:37       ` Christoph Hellwig
  2011-07-28 21:20         ` Curt Wohlgemuth
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2011-07-28 20:37 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, linux-fsdevel, Curt Wohlgemuth, Al Viro

On Thu, Jul 28, 2011 at 07:42:03PM +0200, Jan Kara wrote:
> > be do a blocking sync_inodes_sb, but then a non-blocking ->sync_fs?
>   I tried to explain this in the changelog but it seems I failed.
> wakeup_flusher_threads(0) starts completely asynchronous writeback of all
> bdis. We could do an iteration of sb_sync_fs with wait==0 over all
> superblocks just after wakeup_flusher_threads(0) returns but at this
> moment, almost nothing is written yet because flusher threads have just
> started doing their work. So what useful work can ->sync_fs(0) do in such
> case? E.g. ocfs2 could start processing orphan list but for example
> starting quota writeback as done by XFS would be really fruitless as quota
> is going to change heavily as a result of writeback I expect. Similarly
> ext3 or ext4 cannot really do useful work while most inodes are not yet
> written.

the way xfs currently starts quota writeback actually seems pretty
useless to be already.  I don't think removing it will have a
performance impact, but I will have to benchmark it carefully.  Remember
that even right now the chances that all inodes are actually written
is fairly low on a busy systems, so we might just create a lot of
duplicate work by the two sync_fs passes.

> 
> It would be nice to call ->sync_fs with wait == 0 when flusher thread is
> done with the bdi but there's no easy way to do that. We could use the
> completion for this but we'd have to track these for every bdi which isn't
> very nice. That's why I chose to synchronize with flusher threads using
> the synchronous inode writeback - when that is finished we know flusher
> threads are done and so calling ->sync_fs() can be useful. So do you think
> calling ->sync_fs() with wait == 0 that late is a problem?

It's far too much subtile changes for one patch.  The current sync
sequence in pseudo-code is:


wakeup_flusher_threads()
for_each_sb {
	writeback_inodes_sb();
	sync_fs(nowait);
	__sync_blockdev(nowait);
}
for_each_sb {
	sync_inodes_sb();
	sync_fs(wait);
	__sync_blockdev(nowait);
}

and currently your patch changes it to:

wakeup_flusher_threads()
for_each_sb {
	sync_inodes_sb();
	sync_fs(nowait);
}
for_each_bdev 
	__sync_blockdev(nowait);
for_each_sb
	sync_fs(wait);
for_each_bdev
	__sync_blockdev(wait);


Let's do things bisectable, with one behaviour change and one goal at a
time.

That is first patch in the sequence splits the loops and moves to:

wakeup_flusher_threads()
for_each_sb
	writeback_inodes_sb();
for_each_sb
	sync_fs(nowait);
for_each_sb
	__sync_blockdev(nowait);
for_each_sb 
	sync_inodes_sb();
for_each_sb 
	sync_fs(wait);
for_each_sb 
	__sync_blockdev(nowait);

Then as next steps add one patch that moves to iterate over the block
devices for the __sync_blockdev pass, and one that drops the
writeback_inodes_sb call at which point we'll have

wakeup_flusher_threads()
for_each_sb
	sync_fs(nowait);
for_each_bdev
	__sync_blockdev(nowait);
for_each_sb 
	sync_inodes_sb();
for_each_sb 
	sync_fs(wait);
for_each_bdev
	__sync_blockdev(nowait);

And now we can thing about optimizations.  Randomly changing order
as done right now doesn't seem to helpful.  My theory would be that
we could simply drop the nowait versions of sync_fs entirely, as it
doesn't do a lot of work that gets invalidated by the inode writeback
later on, but we'd really have to validate that theory by benchmark
runs on the filesystems where people care about performance.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sync(1)
  2011-07-28 20:37       ` Christoph Hellwig
@ 2011-07-28 21:20         ` Curt Wohlgemuth
  2011-07-29 11:09           ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Curt Wohlgemuth @ 2011-07-28 21:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, linux-fsdevel, Al Viro

Hi Christoph:

On Thu, Jul 28, 2011 at 1:37 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Jul 28, 2011 at 07:42:03PM +0200, Jan Kara wrote:
>> > be do a blocking sync_inodes_sb, but then a non-blocking ->sync_fs?
>>   I tried to explain this in the changelog but it seems I failed.
>> wakeup_flusher_threads(0) starts completely asynchronous writeback of all
>> bdis. We could do an iteration of sb_sync_fs with wait==0 over all
>> superblocks just after wakeup_flusher_threads(0) returns but at this
>> moment, almost nothing is written yet because flusher threads have just
>> started doing their work. So what useful work can ->sync_fs(0) do in such
>> case? E.g. ocfs2 could start processing orphan list but for example
>> starting quota writeback as done by XFS would be really fruitless as quota
>> is going to change heavily as a result of writeback I expect. Similarly
>> ext3 or ext4 cannot really do useful work while most inodes are not yet
>> written.
>
> the way xfs currently starts quota writeback actually seems pretty
> useless to be already.  I don't think removing it will have a
> performance impact, but I will have to benchmark it carefully.  Remember
> that even right now the chances that all inodes are actually written
> is fairly low on a busy systems, so we might just create a lot of
> duplicate work by the two sync_fs passes.
>
>>
>> It would be nice to call ->sync_fs with wait == 0 when flusher thread is
>> done with the bdi but there's no easy way to do that. We could use the
>> completion for this but we'd have to track these for every bdi which isn't
>> very nice. That's why I chose to synchronize with flusher threads using
>> the synchronous inode writeback - when that is finished we know flusher
>> threads are done and so calling ->sync_fs() can be useful. So do you think
>> calling ->sync_fs() with wait == 0 that late is a problem?
>
> It's far too much subtile changes for one patch.  The current sync
> sequence in pseudo-code is:
>
>
> wakeup_flusher_threads()
> for_each_sb {
>        writeback_inodes_sb();
>        sync_fs(nowait);
>        __sync_blockdev(nowait);
> }
> for_each_sb {
>        sync_inodes_sb();
>        sync_fs(wait);
>        __sync_blockdev(nowait);

Do you mean "__sync_blockdev(wait)" here?  'Cause that's what
__sync_filesystem() does if wait=1.

I'd have just chalked it up to a typo, but you repeated it everywhere
below too, and wanted to see if I'm missing something.

Curt

> }
>
> and currently your patch changes it to:
>
> wakeup_flusher_threads()
> for_each_sb {
>        sync_inodes_sb();
>        sync_fs(nowait);
> }
> for_each_bdev
>        __sync_blockdev(nowait);
> for_each_sb
>        sync_fs(wait);
> for_each_bdev
>        __sync_blockdev(wait);
>
>
> Let's do things bisectable, with one behaviour change and one goal at a
> time.
>
> That is first patch in the sequence splits the loops and moves to:
>
> wakeup_flusher_threads()
> for_each_sb
>        writeback_inodes_sb();
> for_each_sb
>        sync_fs(nowait);
> for_each_sb
>        __sync_blockdev(nowait);
> for_each_sb
>        sync_inodes_sb();
> for_each_sb
>        sync_fs(wait);
> for_each_sb
>        __sync_blockdev(nowait);
>
> Then as next steps add one patch that moves to iterate over the block
> devices for the __sync_blockdev pass, and one that drops the
> writeback_inodes_sb call at which point we'll have
>
> wakeup_flusher_threads()
> for_each_sb
>        sync_fs(nowait);
> for_each_bdev
>        __sync_blockdev(nowait);
> for_each_sb
>        sync_inodes_sb();
> for_each_sb
>        sync_fs(wait);
> for_each_bdev
>        __sync_blockdev(nowait);
>
> And now we can thing about optimizations.  Randomly changing order
> as done right now doesn't seem to helpful.  My theory would be that
> we could simply drop the nowait versions of sync_fs entirely, as it
> doesn't do a lot of work that gets invalidated by the inode writeback
> later on, but we'd really have to validate that theory by benchmark
> runs on the filesystems where people care about performance.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sync(1)
  2011-07-28 21:20         ` Curt Wohlgemuth
@ 2011-07-29 11:09           ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2011-07-29 11:09 UTC (permalink / raw)
  To: Curt Wohlgemuth; +Cc: Christoph Hellwig, Jan Kara, linux-fsdevel, Al Viro

On Thu, Jul 28, 2011 at 02:20:38PM -0700, Curt Wohlgemuth wrote:
> > for_each_sb {
> > ? ? ? ?sync_inodes_sb();
> > ? ? ? ?sync_fs(wait);
> > ? ? ? ?__sync_blockdev(nowait);
> 
> Do you mean "__sync_blockdev(wait)" here?  'Cause that's what
> __sync_filesystem() does if wait=1.
> 
> I'd have just chalked it up to a typo, but you repeated it everywhere
> below too, and wanted to see if I'm missing something.

It is a typo, and the other errors below are copy & paste instances of
the typo above.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5 v2] Improve sync(2) handling
  2011-07-26 22:38 [PATCH 0/5 v2] Improve sync(2) handling Jan Kara
                   ` (4 preceding siblings ...)
  2011-07-26 22:38 ` [PATCH 5/5] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sync(1) Jan Kara
@ 2011-09-28 15:00 ` Christoph Hellwig
  2011-10-06 22:20   ` Jan Kara
  5 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2011-09-28 15:00 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Christoph Hellwig, Curt Wohlgemuth, Al Viro

Jan,

are there any plans to resend this series with the comments addressed?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/5 v2] Improve sync(2) handling
  2011-09-28 15:00 ` [PATCH 0/5 v2] Improve sync(2) handling Christoph Hellwig
@ 2011-10-06 22:20   ` Jan Kara
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Kara @ 2011-10-06 22:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, linux-fsdevel, Curt Wohlgemuth, Al Viro

On Wed 28-09-11 11:00:24, Christoph Hellwig wrote:
> Jan,
> 
> are there any plans to resend this series with the comments addressed?
  Yes, I have the patch series ready for a few weeks already but I've got
stuck trying to make some sense from the performance numbers I've
measured...  Anyway, I'll post tomorrow the series and the numbers I have
measured with some commentary.

								Honza		
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2011-10-06 22:20 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-26 22:38 [PATCH 0/5 v2] Improve sync(2) handling Jan Kara
2011-07-26 22:38 ` [PATCH 1/5] vfs: Make sync(1) writeout also block device inodes Jan Kara
2011-07-27  9:44   ` Christoph Hellwig
2011-07-28 20:13     ` Jan Kara
2011-07-26 22:38 ` [PATCH 2/5] vfs: Move noop_backing_dev_info check from sync into writeback Jan Kara
2011-07-27  9:44   ` Christoph Hellwig
2011-07-26 22:38 ` [PATCH 3/5] quota: Split dquot_quota_sync() to writeback and cache flushing part Jan Kara
2011-07-27  8:26   ` Steven Whitehouse
2011-07-27  9:45   ` Christoph Hellwig
2011-07-26 22:38 ` [PATCH 4/5] quota: Move quota syncing to ->sync_fs method Jan Kara
2011-07-27  8:32   ` Steven Whitehouse
2011-07-27  9:46   ` Christoph Hellwig
2011-07-27 13:44   ` Dave Kleikamp
2011-07-26 22:38 ` [PATCH 5/5] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sync(1) Jan Kara
2011-07-27  9:52   ` Christoph Hellwig
2011-07-28 17:42     ` Jan Kara
2011-07-28 20:37       ` Christoph Hellwig
2011-07-28 21:20         ` Curt Wohlgemuth
2011-07-29 11:09           ` Christoph Hellwig
2011-09-28 15:00 ` [PATCH 0/5 v2] Improve sync(2) handling Christoph Hellwig
2011-10-06 22:20   ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.