All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] Cleanup and improve sync (v3)
@ 2011-10-07 20:40 Jan Kara
  2011-10-07 20:40 ` [PATCH 1/6] vfs: Move noop_backing_dev_info check from sync into writeback Jan Kara
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Jan Kara @ 2011-10-07 20:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Curt Wohlgemuth, linux-fsdevel, Al Viro


  Hello,

  this is a third iteration of my series improving handling of sync syscall.
Since previous submission I have changed ordering of patches and split some
patches as Christoph suggested.

I have run three tests below to verify performance impact of the patch series.
Each test has been run with 1, 2, and 4 filesystems mounted; test with 2
filesystems was run with each filesystem on a different disk, test with 4
filesystems had 2 filesystems on the first disk and 2 filesystems on the second
disk.

Test 1: Run 200 times sync with filesystem mounted to verify overhead of
  sync when there are no data to write.
Test 2: For each filesystem run a process creating 40 KB files, sleep
  for 3 seconds, run sync.
Test 3: For each filesystem run a process creating 20 GB file, sleep for
  5 seconds, run sync.

I have performed 10 runs of each test for xfs, ext3, ext4, and btrfs
filesystems.

Results of test 1
-----------------
Numbers are time it took 200 syncs to complete.
Character in braces is + if the time increased with 2*STDDEV reliability,
- if it decreased with 2*STDDEV reliability, 0 otherwise.
		      BASE		      PATCHED
FS		AVG      STDDEV         AVG      STDDEV
xfs, 1 disks	4.189300 0.051525	2.141300 0.063389 (-)
xfs, 2 disks	4.820600 0.019096	4.611400 0.066322 (-)
xfs, 4 disks	6.518300 1.440362	6.435700 0.510641 (0)
ext4, 1 disks	4.085000 0.011375	1.689500 0.001360 (-)
ext4, 2 disks	4.088100 0.006488	1.705000 0.026359 (-)
ext4, 4 disks	4.107300 0.011934	1.702900 0.001814 (-)
ext3, 1 disks	4.080200 0.009527	1.703400 0.030559 (-)
ext3, 2 disks	4.138300 0.143909	1.694000 0.001414 (-)
ext3, 4 disks	4.107200 0.002482	1.702900 0.007778 (-)
btrfs, 1 disks	11.214600 0.086619	8.737200 0.081076 (-)
btrfs, 2 disks	32.910000 0.162089	30.673400 0.538820 (-)
btrfs, 4 disks	67.987700 1.655654	67.247100 1.971887 (0)

So we see nice improvements almost all over the board.

Results of test 2
-----------------
Numbers are time it took sync to complete.

		    BASE		    PATCHED
FS		AVG      STDDEV         AVG      STDDEV
xfs, 1 disks	0.436000 0.012000	0.506000 0.014283 (+)
xfs, 2 disks	1.105000 0.055543	1.274000 0.244426 (0)
xfs, 4 disks	5.880000 2.997135	4.837000 3.875448 (0)
ext4, 1 disks	0.791000 0.055579	0.853000 0.042438 (0)
ext4, 2 disks	18.232000 13.505638	17.254000 2.000506 (0)
ext4, 4 disks	491.790000 218.565229	696.783000 234.933562 (0)
ext3, 1 disks	15.315000 2.065465	1.900000 0.184662 (-)
ext3, 2 disks	128.524000 18.090519	55.278000 1.530554 (-)
ext3, 4 disks	221.202000 30.090432	232.849000 68.745423 (0)
btrfs, 1 disks	0.452000 0.026000	0.494000 0.023749 (0)
btrfs, 2 disks	5.156000 4.530852	4.083000 1.560519 (0)
btrfs, 4 disks	31.154000 11.220828	36.987000 17.334126 (0)

Except for ext3 which got a nice boost here and XFS which seems to be a tad bit
slower, there are no changes that would stand out of the noise.

Results of test 3
-----------------
Numbers are time it took sync to complete.

		    BASE		    PATCHED
FS		AVG      STDDEV         AVG      STDDEV
xfs, 1 disks	12.083000 0.058660	10.898000 0.285475 (-)
xfs, 2 disks	20.182000 0.549614	14.977000 0.351114 (-)
xfs, 4 disks	35.814000 5.318310	28.452000 3.332281 (0)
ext4, 1 disks	32.956000 5.753789	20.865000 3.892098 (0)
ext4, 2 disks	34.922000 3.051966	27.411000 2.752978 (0)
ext4, 4 disks	44.508000 6.829004	28.360000 2.561437 (0)
ext3, 1 disks	23.475000 1.288885	17.116000 0.319631 (-)
ext3, 2 disks	43.508000 4.998647	41.547000 2.597976 (0)
ext3, 4 disks	92.130000 11.344117	79.362000 9.891208 (0)
btrfs, 1 disks	12.478000 0.394304	12.847000 0.171117 (0)
btrfs, 2 disks	15.030000 0.777817	18.014000 2.011418 (0)
btrfs, 4 disks	32.395000 4.248859	38.411000 3.179939 (0)

Here we see XFS and ext3 had some improvements, ext4 likely as well although
the results are relatively noisy.

Out of curiosity, I also tried removing syncfs(sb, 0) call from the sync
sequence altogether as Christoph suggested. In the test 1, results end up being
even better, tests 2 and 3 end up roughly the same, sometimes slightly better.
I also performed tests where we write some amount of data to the filesystem
and then call sync - there were no changes in sync times that would stand out
of the noise. So this might be a worthwhile simplification of sync...

								Honza

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/6] vfs: Move noop_backing_dev_info check from sync into writeback
  2011-10-07 20:40 [PATCH 0/6] Cleanup and improve sync (v3) Jan Kara
@ 2011-10-07 20:40 ` Jan Kara
  2011-10-20  9:48   ` Christoph Hellwig
  2011-10-07 20:40 ` [PATCH 2/6] quota: Split dquot_quota_sync() to writeback and cache flushing part Jan Kara
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2011-10-07 20:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Curt Wohlgemuth, linux-fsdevel, Al Viro, Jan Kara

In principle, a filesystem may want to have ->sync_fs() called during sync(1)
although it does not have a bdi (i.e. s_bdi is set to noop_backing_dev_info).
Only writeback code really needs bdi set to something reasonable. So move the
checks where they are more logical.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c |    6 ++++++
 fs/sync.c         |    7 -------
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 04cf3b9..9794dfe 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1214,6 +1214,9 @@ void writeback_inodes_sb_nr(struct super_block *sb, unsigned long nr)
 		.nr_pages		= nr,
 	};
 
+	/* Nothing to do? */
+	if (sb->s_bdi == &noop_backing_dev_info)
+		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 	bdi_queue_work(sb->s_bdi, &work);
 	wait_for_completion(&done);
@@ -1292,6 +1295,9 @@ void sync_inodes_sb(struct super_block *sb)
 		.done		= &done,
 	};
 
+	/* Nothing to do? */
+	if (sb->s_bdi == &noop_backing_dev_info)
+		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 
 	bdi_queue_work(sb->s_bdi, &work);
diff --git a/fs/sync.c b/fs/sync.c
index c98a747..38e942b 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -30,13 +30,6 @@
  */
 static int __sync_filesystem(struct super_block *sb, int wait)
 {
-	/*
-	 * This should be safe, as we require bdi backing to actually
-	 * write out data in the first place
-	 */
-	if (sb->s_bdi == &noop_backing_dev_info)
-		return 0;
-
 	if (sb->s_qcop && sb->s_qcop->quota_sync)
 		sb->s_qcop->quota_sync(sb, -1, wait);
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/6] quota: Split dquot_quota_sync() to writeback and cache flushing part
  2011-10-07 20:40 [PATCH 0/6] Cleanup and improve sync (v3) Jan Kara
  2011-10-07 20:40 ` [PATCH 1/6] vfs: Move noop_backing_dev_info check from sync into writeback Jan Kara
@ 2011-10-07 20:40 ` Jan Kara
  2011-10-20  9:50   ` Christoph Hellwig
  2011-10-07 20:40 ` [PATCH 3/6] quota: Move quota syncing to ->sync_fs method Jan Kara
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2011-10-07 20:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Curt Wohlgemuth, linux-fsdevel, Al Viro, Jan Kara

Split off part of dquot_quota_sync() which writes dquots into a quota file
to a separate function. In the next patch we will use the function from
filesystems and we do not want to abuse ->quota_sync quotactl callback more
than necessary.

Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/gfs2/quota.c          |    4 ++--
 fs/gfs2/quota.h          |    2 +-
 fs/gfs2/super.c          |    2 +-
 fs/gfs2/sys.c            |    2 +-
 fs/quota/dquot.c         |   35 ++++++++++++++++++++++++++---------
 fs/quota/quota.c         |    4 ++--
 fs/sync.c                |    2 +-
 include/linux/quota.h    |    2 +-
 include/linux/quotaops.h |    3 ++-
 9 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 0e8bb13..e1ad6f7 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1122,7 +1122,7 @@ void gfs2_quota_change(struct gfs2_inode *ip, s64 change,
 	}
 }
 
-int gfs2_quota_sync(struct super_block *sb, int type, int wait)
+int gfs2_quota_sync(struct super_block *sb, int type)
 {
 	struct gfs2_sbd *sdp = sb->s_fs_info;
 	struct gfs2_quota_data **qda;
@@ -1168,7 +1168,7 @@ int gfs2_quota_sync(struct super_block *sb, int type, int wait)
 
 static int gfs2_quota_sync_timeo(struct super_block *sb, int type)
 {
-	return gfs2_quota_sync(sb, type, 0);
+	return gfs2_quota_sync(sb, type);
 }
 
 int gfs2_quota_refresh(struct gfs2_sbd *sdp, int user, u32 id)
diff --git a/fs/gfs2/quota.h b/fs/gfs2/quota.h
index 90bf1c3..f25d98b 100644
--- a/fs/gfs2/quota.h
+++ b/fs/gfs2/quota.h
@@ -26,7 +26,7 @@ extern int gfs2_quota_check(struct gfs2_inode *ip, u32 uid, u32 gid);
 extern void gfs2_quota_change(struct gfs2_inode *ip, s64 change,
 			      u32 uid, u32 gid);
 
-extern int gfs2_quota_sync(struct super_block *sb, int type, int wait);
+extern int gfs2_quota_sync(struct super_block *sb, int type);
 extern int gfs2_quota_refresh(struct gfs2_sbd *sdp, int user, u32 id);
 
 extern int gfs2_quota_init(struct gfs2_sbd *sdp);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index b7beadd..3870562 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -812,7 +812,7 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
 	int error;
 
 	flush_workqueue(gfs2_delete_workqueue);
-	gfs2_quota_sync(sdp->sd_vfs, 0, 1);
+	gfs2_quota_sync(sdp->sd_vfs, 0);
 	gfs2_statfs_sync(sdp->sd_vfs, 0);
 
 	error = gfs2_glock_nq_init(sdp->sd_trans_gl, LM_ST_SHARED, GL_NOCACHE,
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index 443cabc..4e06db2 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -168,7 +168,7 @@ static ssize_t quota_sync_store(struct gfs2_sbd *sdp, const char *buf,
 	if (simple_strtol(buf, NULL, 0) != 1)
 		return -EINVAL;
 
-	gfs2_quota_sync(sdp->sd_vfs, 0, 1);
+	gfs2_quota_sync(sdp->sd_vfs, 0);
 	return len;
 }
 
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 5b572c8..35c4a47 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -595,12 +595,14 @@ out:
 }
 EXPORT_SYMBOL(dquot_scan_active);
 
-int dquot_quota_sync(struct super_block *sb, int type, int wait)
+/* Write all dquot structures to quota files */
+int dquot_writeback_dquots(struct super_block *sb, int type)
 {
 	struct list_head *dirty;
 	struct dquot *dquot;
 	struct quota_info *dqopt = sb_dqopt(sb);
 	int cnt;
+	int err, ret = 0;
 
 	mutex_lock(&dqopt->dqonoff_mutex);
 	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
@@ -624,7 +626,9 @@ int dquot_quota_sync(struct super_block *sb, int type, int wait)
 			atomic_inc(&dquot->dq_count);
 			spin_unlock(&dq_list_lock);
 			dqstats_inc(DQST_LOOKUPS);
-			sb->dq_op->write_dquot(dquot);
+			err = sb->dq_op->write_dquot(dquot);
+			if (!ret && err)
+				err = ret;
 			dqput(dquot);
 			spin_lock(&dq_list_lock);
 		}
@@ -638,7 +642,21 @@ int dquot_quota_sync(struct super_block *sb, int type, int wait)
 	dqstats_inc(DQST_SYNCS);
 	mutex_unlock(&dqopt->dqonoff_mutex);
 
-	if (!wait || (sb_dqopt(sb)->flags & DQUOT_QUOTA_SYS_FILE))
+	return ret;
+}
+EXPORT_SYMBOL(dquot_writeback_dquots);
+
+/* Write all dquot structures to disk and make them visible from userspace */
+int dquot_quota_sync(struct super_block *sb, int type)
+{
+	struct quota_info *dqopt = sb_dqopt(sb);
+	int cnt;
+	int ret;
+
+	ret = dquot_writeback_dquots(sb, type);
+	if (ret)
+		return ret;
+	if (dqopt->flags & DQUOT_QUOTA_SYS_FILE)
 		return 0;
 
 	/* This is not very clever (and fast) but currently I don't know about
@@ -652,18 +670,17 @@ int dquot_quota_sync(struct super_block *sb, int type, int wait)
 	 * Now when everything is written we can discard the pagecache so
 	 * that userspace sees the changes.
 	 */
-	mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
+	mutex_lock(&dqopt->dqonoff_mutex);
 	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
 		if (type != -1 && cnt != type)
 			continue;
 		if (!sb_has_quota_active(sb, cnt))
 			continue;
-		mutex_lock_nested(&sb_dqopt(sb)->files[cnt]->i_mutex,
-				  I_MUTEX_QUOTA);
-		truncate_inode_pages(&sb_dqopt(sb)->files[cnt]->i_data, 0);
-		mutex_unlock(&sb_dqopt(sb)->files[cnt]->i_mutex);
+		mutex_lock_nested(&dqopt->files[cnt]->i_mutex, I_MUTEX_QUOTA);
+		truncate_inode_pages(&dqopt->files[cnt]->i_data, 0);
+		mutex_unlock(&dqopt->files[cnt]->i_mutex);
 	}
-	mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
+	mutex_unlock(&dqopt->dqonoff_mutex);
 
 	return 0;
 }
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 10b6be3..922b2d1 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -48,7 +48,7 @@ static int check_quotactl_permission(struct super_block *sb, int type, int cmd,
 static void quota_sync_one(struct super_block *sb, void *arg)
 {
 	if (sb->s_qcop && sb->s_qcop->quota_sync)
-		sb->s_qcop->quota_sync(sb, *(int *)arg, 1);
+		sb->s_qcop->quota_sync(sb, *(int *)arg);
 }
 
 static int quota_sync_all(int type)
@@ -271,7 +271,7 @@ static int do_quotactl(struct super_block *sb, int type, int cmd, qid_t id,
 	case Q_SYNC:
 		if (!sb->s_qcop->quota_sync)
 			return -ENOSYS;
-		return sb->s_qcop->quota_sync(sb, type, 1);
+		return sb->s_qcop->quota_sync(sb, type);
 	case Q_XQUOTAON:
 	case Q_XQUOTAOFF:
 	case Q_XQUOTARM:
diff --git a/fs/sync.c b/fs/sync.c
index 38e942b..7fbf283 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -31,7 +31,7 @@
 static int __sync_filesystem(struct super_block *sb, int wait)
 {
 	if (sb->s_qcop && sb->s_qcop->quota_sync)
-		sb->s_qcop->quota_sync(sb, -1, wait);
+		sb->s_qcop->quota_sync(sb, -1);
 
 	if (wait)
 		sync_inodes_sb(sb);
diff --git a/include/linux/quota.h b/include/linux/quota.h
index cb78556..8a43f08 100644
--- a/include/linux/quota.h
+++ b/include/linux/quota.h
@@ -329,7 +329,7 @@ struct quotactl_ops {
 	int (*quota_on)(struct super_block *, int, int, struct path *);
 	int (*quota_on_meta)(struct super_block *, int, int);
 	int (*quota_off)(struct super_block *, int);
-	int (*quota_sync)(struct super_block *, int, int);
+	int (*quota_sync)(struct super_block *, int);
 	int (*get_info)(struct super_block *, int, struct if_dqinfo *);
 	int (*set_info)(struct super_block *, int, struct if_dqinfo *);
 	int (*get_dqblk)(struct super_block *, int, qid_t, struct fs_disk_quota *);
diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
index 26f9e36..aba7578 100644
--- a/include/linux/quotaops.h
+++ b/include/linux/quotaops.h
@@ -83,7 +83,8 @@ int dquot_quota_on(struct super_block *sb, int type, int format_id,
 int dquot_quota_on_mount(struct super_block *sb, char *qf_name,
  	int format_id, int type);
 int dquot_quota_off(struct super_block *sb, int type);
-int dquot_quota_sync(struct super_block *sb, int type, int wait);
+int dquot_writeback_dquots(struct super_block *sb, int type);
+int dquot_quota_sync(struct super_block *sb, int type);
 int dquot_get_dqinfo(struct super_block *sb, int type, struct if_dqinfo *ii);
 int dquot_set_dqinfo(struct super_block *sb, int type, struct if_dqinfo *ii);
 int dquot_get_dqblk(struct super_block *sb, int type, qid_t id,
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/6] quota: Move quota syncing to ->sync_fs method
  2011-10-07 20:40 [PATCH 0/6] Cleanup and improve sync (v3) Jan Kara
  2011-10-07 20:40 ` [PATCH 1/6] vfs: Move noop_backing_dev_info check from sync into writeback Jan Kara
  2011-10-07 20:40 ` [PATCH 2/6] quota: Split dquot_quota_sync() to writeback and cache flushing part Jan Kara
@ 2011-10-07 20:40 ` Jan Kara
  2011-10-07 20:40 ` [PATCH 4/6] vfs: Reorder operations during sys_sync Jan Kara
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2011-10-07 20:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Curt Wohlgemuth, linux-fsdevel, Al Viro, Jan Kara,
	Theodore Ts'o, Joel Becker, reiserfs-devel

Since the moment writes to quota files are using block device page cache and
space for quota structures is reserved at the moment they are first accessed we
have no reason to sync quota before inode writeback. In fact this order is now
only harmful since quota information can easily change during inode writeback
(either because conversion of delayed-allocated extents or simply because of
allocation of new blocks for simple filesystems not using page_mkwrite).

So move syncing of quota information after writeback of inodes into ->sync_fs
method. This way we do not have to use ->quota_sync callback which is primarily
intended for use by quotactl syscall anyway and we get rid of calling
->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
also support legacy non-journalled quotas) and thus there are no dirty quota
structures.

CC: "Theodore Ts'o" <tytso@mit.edu>
CC: Joel Becker <jlbec@evilplan.org>
CC: reiserfs-devel@vger.kernel.org
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Dave Kleikamp <shaggy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext2/super.c     |    6 ++++++
 fs/ext3/super.c     |    5 +++++
 fs/ext4/super.c     |    5 +++++
 fs/gfs2/super.c     |    2 ++
 fs/jfs/super.c      |    5 +++++
 fs/reiserfs/super.c |    5 +++++
 fs/sync.c           |    3 ---
 7 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 1dd62ed..174782f 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1180,6 +1180,12 @@ static int ext2_sync_fs(struct super_block *sb, int wait)
 	struct ext2_sb_info *sbi = EXT2_SB(sb);
 	struct ext2_super_block *es = EXT2_SB(sb)->s_es;
 
+	/*
+	 * Write quota structures to quota file, sync_blockdev() will write
+	 * them to disk later
+	 */
+	dquot_writeback_dquots(sb, -1);
+
 	spin_lock(&sbi->s_lock);
 	if (es->s_state & cpu_to_le16(EXT2_VALID_FS)) {
 		ext2_debug("setting valid to 0\n");
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 7beb69a..5e1ca17 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -2522,6 +2522,11 @@ static int ext3_sync_fs(struct super_block *sb, int wait)
 	tid_t target;
 
 	trace_ext3_sync_fs(sb, wait);
+	/*
+	 * Writeback quota in non-journalled quota case - journalled quota has
+	 * no dirty dquots
+	 */
+	dquot_writeback_dquots(sb, -1);
 	if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
 		if (wait)
 			log_wait_commit(EXT3_SB(sb)->s_journal, target);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 44d0c8d..18dfff7 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4226,6 +4226,11 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
 
 	trace_ext4_sync_fs(sb, wait);
 	flush_workqueue(sbi->dio_unwritten_wq);
+	/*
+	 * Writeback quota in non-journalled quota case - journalled quota has
+	 * no dirty dquots
+	 */
+	dquot_writeback_dquots(sb, -1);
 	if (jbd2_journal_start_commit(sbi->s_journal, &target)) {
 		if (wait)
 			jbd2_log_wait_commit(sbi->s_journal, target);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 3870562..9f9a23d 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -926,6 +926,8 @@ restart:
 static int gfs2_sync_fs(struct super_block *sb, int wait)
 {
 	struct gfs2_sbd *sdp = sb->s_fs_info;
+
+	gfs2_quota_sync(sb, -1);
 	if (wait && sdp)
 		gfs2_log_flush(sdp, NULL);
 	return 0;
diff --git a/fs/jfs/super.c b/fs/jfs/super.c
index 06c8a67..a2c9cc1 100644
--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -603,6 +603,11 @@ static int jfs_sync_fs(struct super_block *sb, int wait)
 
 	/* log == NULL indicates read-only mount */
 	if (log) {
+		/*
+		 * Write quota structures to quota file, sync_blockdev() will
+		 * write them to disk later
+		 */
+		dquot_writeback_dquots(sb, -1);
 		jfs_flush_journal(log, wait);
 		jfs_syncpt(log, 0);
 	}
diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index 14363b9..dd7aec5 100644
--- a/fs/reiserfs/super.c
+++ b/fs/reiserfs/super.c
@@ -66,6 +66,11 @@ static int reiserfs_sync_fs(struct super_block *s, int wait)
 {
 	struct reiserfs_transaction_handle th;
 
+	/*
+	 * Writeback quota in non-journalled quota case - journalled quota has
+	 * no dirty dquots
+	 */
+	dquot_writeback_dquots(s, -1);
 	reiserfs_write_lock(s);
 	if (!journal_begin(&th, s, 1))
 		if (!journal_end_sync(&th, s, 1))
diff --git a/fs/sync.c b/fs/sync.c
index 7fbf283..3367d04 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -30,9 +30,6 @@
  */
 static int __sync_filesystem(struct super_block *sb, int wait)
 {
-	if (sb->s_qcop && sb->s_qcop->quota_sync)
-		sb->s_qcop->quota_sync(sb, -1);
-
 	if (wait)
 		sync_inodes_sb(sb);
 	else
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/6] vfs: Reorder operations during sys_sync
  2011-10-07 20:40 [PATCH 0/6] Cleanup and improve sync (v3) Jan Kara
                   ` (2 preceding siblings ...)
  2011-10-07 20:40 ` [PATCH 3/6] quota: Move quota syncing to ->sync_fs method Jan Kara
@ 2011-10-07 20:40 ` Jan Kara
  2011-10-20  9:53   ` Christoph Hellwig
  2011-10-07 20:40 ` [PATCH 5/6] vfs: Make sys_sync writeout also block device inodes Jan Kara
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2011-10-07 20:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Curt Wohlgemuth, linux-fsdevel, Al Viro, Jan Kara

Change the order of operations during sync from

for_each_sb {
        writeback_inodes_sb();
        sync_fs(nowait);
        __sync_blockdev(nowait);
}
for_each_sb {
        sync_inodes_sb();
        sync_fs(wait);
        __sync_blockdev(wait);
}

to

for_each_sb
        writeback_inodes_sb();
for_each_sb
        sync_fs(nowait);
for_each_sb
        __sync_blockdev(nowait);
for_each_sb
        sync_inodes_sb();
for_each_sb
        sync_fs(wait);
for_each_sb
        __sync_blockdev(wait);

This is a preparation for the following patches in this series.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/sync.c |   46 +++++++++++++++++++++++++++++++++-------------
 1 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 3367d04..5fbeee6 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -68,18 +68,26 @@ int sync_filesystem(struct super_block *sb)
 }
 EXPORT_SYMBOL_GPL(sync_filesystem);
 
-static void sync_one_sb(struct super_block *sb, void *arg)
+static void sync_inodes_one_sb(struct super_block *sb, void *arg)
 {
-	if (!(sb->s_flags & MS_RDONLY))
-		__sync_filesystem(sb, *(int *)arg);
+	if (!(sb->s_flags & MS_RDONLY)) {
+		if (!*(int *)arg)
+			writeback_inodes_sb(sb);
+		else
+			sync_inodes_sb(sb);
+	}
 }
-/*
- * Sync all the data for all the filesystems (called by sys_sync() and
- * emergency sync)
- */
-static void sync_filesystems(int wait)
+
+static void sync_fs_one_sb(struct super_block *sb, void *arg)
 {
-	iterate_supers(sync_one_sb, &wait);
+	if (!(sb->s_flags & MS_RDONLY) && sb->s_op->sync_fs)
+		sb->s_op->sync_fs(sb, *(int *)arg);
+}
+
+static void sync_blkdev_one_sb(struct super_block *sb, void *arg)
+{
+	if (!(sb->s_flags & MS_RDONLY))
+		__sync_blockdev(sb->s_bdev, *(int *)arg);
 }
 
 /*
@@ -88,9 +96,15 @@ static void sync_filesystems(int wait)
  */
 SYSCALL_DEFINE0(sync)
 {
+	int nowait = 0, wait = 1;
+
 	wakeup_flusher_threads(0);
-	sync_filesystems(0);
-	sync_filesystems(1);
+	iterate_supers(sync_inodes_one_sb, &nowait);
+	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers(sync_blkdev_one_sb, &nowait);
+	iterate_supers(sync_inodes_one_sb, &wait);
+	iterate_supers(sync_fs_one_sb, &wait);
+	iterate_supers(sync_blkdev_one_sb, &wait);
 	if (unlikely(laptop_mode))
 		laptop_sync_completion();
 	return 0;
@@ -98,12 +112,18 @@ SYSCALL_DEFINE0(sync)
 
 static void do_sync_work(struct work_struct *work)
 {
+	int nowait = 0;
+
 	/*
 	 * Sync twice to reduce the possibility we skipped some inodes / pages
 	 * because they were temporarily locked
 	 */
-	sync_filesystems(0);
-	sync_filesystems(0);
+	iterate_supers(sync_inodes_one_sb, &nowait);
+	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers(sync_blkdev_one_sb, &nowait);
+	iterate_supers(sync_inodes_one_sb, &nowait);
+	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers(sync_blkdev_one_sb, &nowait);
 	printk("Emergency Sync complete\n");
 	kfree(work);
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/6] vfs: Make sys_sync writeout also block device inodes
  2011-10-07 20:40 [PATCH 0/6] Cleanup and improve sync (v3) Jan Kara
                   ` (3 preceding siblings ...)
  2011-10-07 20:40 ` [PATCH 4/6] vfs: Reorder operations during sys_sync Jan Kara
@ 2011-10-07 20:40 ` Jan Kara
  2011-10-20  9:54   ` Christoph Hellwig
  2011-10-07 20:40 ` [PATCH 6/6] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes Jan Kara
  2011-10-18  1:07 ` [PATCH 0/6] Cleanup and improve sync (v3) Jan Kara
  6 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2011-10-07 20:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Curt Wohlgemuth, linux-fsdevel, Al Viro, Jan Kara

In case block device does not have filesystem mounted on it, sys_sync will just
ignore it and doesn't writeout its dirty pages. This is because writeback code
avoids writing inodes from superblock without backing device and
blockdev_superblock is such a superblock.  Since it's unexpected that sync
doesn't writeout dirty data for block devices be nice to users and change the
behavior to do so. So now we iterate over all block devices on blockdev_super
instead of iterating over all superblocks when syncing block devices.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/sync.c |   48 +++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 5fbeee6..7c55c5d 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -84,10 +84,44 @@ static void sync_fs_one_sb(struct super_block *sb, void *arg)
 		sb->s_op->sync_fs(sb, *(int *)arg);
 }
 
-static void sync_blkdev_one_sb(struct super_block *sb, void *arg)
+/*
+ * We go through all existing block devices so that even devices without
+ * filesystem mounted are synced.
+ */
+static void sync_all_bdevs(int wait)
 {
-	if (!(sb->s_flags & MS_RDONLY))
-		__sync_blockdev(sb->s_bdev, *(int *)arg);
+	struct inode *inode, *old_inode = NULL;
+
+	spin_lock(&inode_sb_list_lock);
+	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
+		struct address_space *mapping = inode->i_mapping;
+
+		spin_lock(&inode->i_lock);
+		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW) ||
+		    mapping->nrpages == 0) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+		__iget(inode);
+		spin_unlock(&inode->i_lock);
+		spin_unlock(&inode_sb_list_lock);
+		/*
+		 * We hold a reference to 'inode' so it couldn't have been
+		 * removed from s_inodes list while we dropped the
+		 * inode_sb_list_lock.  We cannot iput the inode now as we can
+		 * be holding the last reference and we cannot iput it under
+		 * inode_sb_list_lock. So we keep the reference and iput it
+		 * later.
+		 */
+		iput(old_inode);
+		old_inode = inode;
+
+		__sync_blockdev(I_BDEV(inode), wait);
+
+		spin_lock(&inode_sb_list_lock);
+	}
+	spin_unlock(&inode_sb_list_lock);
+	iput(old_inode);
 }
 
 /*
@@ -101,10 +135,10 @@ SYSCALL_DEFINE0(sync)
 	wakeup_flusher_threads(0);
 	iterate_supers(sync_inodes_one_sb, &nowait);
 	iterate_supers(sync_fs_one_sb, &nowait);
-	iterate_supers(sync_blkdev_one_sb, &nowait);
+	sync_all_bdevs(nowait);
 	iterate_supers(sync_inodes_one_sb, &wait);
 	iterate_supers(sync_fs_one_sb, &wait);
-	iterate_supers(sync_blkdev_one_sb, &wait);
+	sync_all_bdevs(wait);
 	if (unlikely(laptop_mode))
 		laptop_sync_completion();
 	return 0;
@@ -120,10 +154,10 @@ static void do_sync_work(struct work_struct *work)
 	 */
 	iterate_supers(sync_inodes_one_sb, &nowait);
 	iterate_supers(sync_fs_one_sb, &nowait);
-	iterate_supers(sync_blkdev_one_sb, &nowait);
+	sync_all_bdevs(nowait);
 	iterate_supers(sync_inodes_one_sb, &nowait);
 	iterate_supers(sync_fs_one_sb, &nowait);
-	iterate_supers(sync_blkdev_one_sb, &nowait);
+	sync_all_bdevs(nowait);
 	printk("Emergency Sync complete\n");
 	kfree(work);
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/6] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
  2011-10-07 20:40 [PATCH 0/6] Cleanup and improve sync (v3) Jan Kara
                   ` (4 preceding siblings ...)
  2011-10-07 20:40 ` [PATCH 5/6] vfs: Make sys_sync writeout also block device inodes Jan Kara
@ 2011-10-07 20:40 ` Jan Kara
  2011-10-20  9:57   ` Christoph Hellwig
  2011-10-18  1:07 ` [PATCH 0/6] Cleanup and improve sync (v3) Jan Kara
  6 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2011-10-07 20:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Curt Wohlgemuth, linux-fsdevel, Al Viro, Jan Kara

wakeup_flusher_threads(0) will queue work doing complete writeback for each
flusher thread. Thus there is not much point in submitting another work doing
full inode WB_SYNC_NONE writeback by writeback_inodes_sb().

After this change it does not make sense to call nonblocking ->sync_fs and
block device flush before calling sync_inodes_sb() because
wakeup_flusher_threads() is completely asynchronous and thus these functions
would be called in parallel with inode writeback running which will effectively
void any work they do. So we move sync_inodes_sb() call before these two
functions.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/sync.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 7c55c5d..02ba2a5 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -133,10 +133,9 @@ SYSCALL_DEFINE0(sync)
 	int nowait = 0, wait = 1;
 
 	wakeup_flusher_threads(0);
-	iterate_supers(sync_inodes_one_sb, &nowait);
+	iterate_supers(sync_inodes_one_sb, &wait);
 	iterate_supers(sync_fs_one_sb, &nowait);
 	sync_all_bdevs(nowait);
-	iterate_supers(sync_inodes_one_sb, &wait);
 	iterate_supers(sync_fs_one_sb, &wait);
 	sync_all_bdevs(wait);
 	if (unlikely(laptop_mode))
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/6] Cleanup and improve sync (v3)
  2011-10-07 20:40 [PATCH 0/6] Cleanup and improve sync (v3) Jan Kara
                   ` (5 preceding siblings ...)
  2011-10-07 20:40 ` [PATCH 6/6] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes Jan Kara
@ 2011-10-18  1:07 ` Jan Kara
  2011-10-18  6:45   ` Christoph Hellwig
  6 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2011-10-18  1:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Curt Wohlgemuth, linux-fsdevel, Al Viro

  Hello,

On Fri 07-10-11 22:40:49, Jan Kara wrote:
>   this is a third iteration of my series improving handling of sync syscall.
> Since previous submission I have changed ordering of patches and split some
> patches as Christoph suggested.
  Christoph, did you have a chance to look at this?

								Honza

> I have run three tests below to verify performance impact of the patch series.
> Each test has been run with 1, 2, and 4 filesystems mounted; test with 2
> filesystems was run with each filesystem on a different disk, test with 4
> filesystems had 2 filesystems on the first disk and 2 filesystems on the second
> disk.
> 
> Test 1: Run 200 times sync with filesystem mounted to verify overhead of
>   sync when there are no data to write.
> Test 2: For each filesystem run a process creating 40 KB files, sleep
>   for 3 seconds, run sync.
> Test 3: For each filesystem run a process creating 20 GB file, sleep for
>   5 seconds, run sync.
> 
> I have performed 10 runs of each test for xfs, ext3, ext4, and btrfs
> filesystems.
> 
> Results of test 1
> -----------------
> Numbers are time it took 200 syncs to complete.
> Character in braces is + if the time increased with 2*STDDEV reliability,
> - if it decreased with 2*STDDEV reliability, 0 otherwise.
> 		      BASE		      PATCHED
> FS		AVG      STDDEV         AVG      STDDEV
> xfs, 1 disks	4.189300 0.051525	2.141300 0.063389 (-)
> xfs, 2 disks	4.820600 0.019096	4.611400 0.066322 (-)
> xfs, 4 disks	6.518300 1.440362	6.435700 0.510641 (0)
> ext4, 1 disks	4.085000 0.011375	1.689500 0.001360 (-)
> ext4, 2 disks	4.088100 0.006488	1.705000 0.026359 (-)
> ext4, 4 disks	4.107300 0.011934	1.702900 0.001814 (-)
> ext3, 1 disks	4.080200 0.009527	1.703400 0.030559 (-)
> ext3, 2 disks	4.138300 0.143909	1.694000 0.001414 (-)
> ext3, 4 disks	4.107200 0.002482	1.702900 0.007778 (-)
> btrfs, 1 disks	11.214600 0.086619	8.737200 0.081076 (-)
> btrfs, 2 disks	32.910000 0.162089	30.673400 0.538820 (-)
> btrfs, 4 disks	67.987700 1.655654	67.247100 1.971887 (0)
> 
> So we see nice improvements almost all over the board.
> 
> Results of test 2
> -----------------
> Numbers are time it took sync to complete.
> 
> 		    BASE		    PATCHED
> FS		AVG      STDDEV         AVG      STDDEV
> xfs, 1 disks	0.436000 0.012000	0.506000 0.014283 (+)
> xfs, 2 disks	1.105000 0.055543	1.274000 0.244426 (0)
> xfs, 4 disks	5.880000 2.997135	4.837000 3.875448 (0)
> ext4, 1 disks	0.791000 0.055579	0.853000 0.042438 (0)
> ext4, 2 disks	18.232000 13.505638	17.254000 2.000506 (0)
> ext4, 4 disks	491.790000 218.565229	696.783000 234.933562 (0)
> ext3, 1 disks	15.315000 2.065465	1.900000 0.184662 (-)
> ext3, 2 disks	128.524000 18.090519	55.278000 1.530554 (-)
> ext3, 4 disks	221.202000 30.090432	232.849000 68.745423 (0)
> btrfs, 1 disks	0.452000 0.026000	0.494000 0.023749 (0)
> btrfs, 2 disks	5.156000 4.530852	4.083000 1.560519 (0)
> btrfs, 4 disks	31.154000 11.220828	36.987000 17.334126 (0)
> 
> Except for ext3 which got a nice boost here and XFS which seems to be a tad bit
> slower, there are no changes that would stand out of the noise.
> 
> Results of test 3
> -----------------
> Numbers are time it took sync to complete.
> 
> 		    BASE		    PATCHED
> FS		AVG      STDDEV         AVG      STDDEV
> xfs, 1 disks	12.083000 0.058660	10.898000 0.285475 (-)
> xfs, 2 disks	20.182000 0.549614	14.977000 0.351114 (-)
> xfs, 4 disks	35.814000 5.318310	28.452000 3.332281 (0)
> ext4, 1 disks	32.956000 5.753789	20.865000 3.892098 (0)
> ext4, 2 disks	34.922000 3.051966	27.411000 2.752978 (0)
> ext4, 4 disks	44.508000 6.829004	28.360000 2.561437 (0)
> ext3, 1 disks	23.475000 1.288885	17.116000 0.319631 (-)
> ext3, 2 disks	43.508000 4.998647	41.547000 2.597976 (0)
> ext3, 4 disks	92.130000 11.344117	79.362000 9.891208 (0)
> btrfs, 1 disks	12.478000 0.394304	12.847000 0.171117 (0)
> btrfs, 2 disks	15.030000 0.777817	18.014000 2.011418 (0)
> btrfs, 4 disks	32.395000 4.248859	38.411000 3.179939 (0)
> 
> Here we see XFS and ext3 had some improvements, ext4 likely as well although
> the results are relatively noisy.
> 
> Out of curiosity, I also tried removing syncfs(sb, 0) call from the sync
> sequence altogether as Christoph suggested. In the test 1, results end up being
> even better, tests 2 and 3 end up roughly the same, sometimes slightly better.
> I also performed tests where we write some amount of data to the filesystem
> and then call sync - there were no changes in sync times that would stand out
> of the noise. So this might be a worthwhile simplification of sync...
> 
> 								Honza
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/6] Cleanup and improve sync (v3)
  2011-10-18  1:07 ` [PATCH 0/6] Cleanup and improve sync (v3) Jan Kara
@ 2011-10-18  6:45   ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2011-10-18  6:45 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, Curt Wohlgemuth, linux-fsdevel, Al Viro

On Tue, Oct 18, 2011 at 03:07:26AM +0200, Jan Kara wrote:
>   Hello,
> 
> On Fri 07-10-11 22:40:49, Jan Kara wrote:
> >   this is a third iteration of my series improving handling of sync syscall.
> > Since previous submission I have changed ordering of patches and split some
> > patches as Christoph suggested.
>   Christoph, did you have a chance to look at this?

I had been a bit busy, but I'll get back to it soon.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/6] vfs: Move noop_backing_dev_info check from sync into writeback
  2011-10-07 20:40 ` [PATCH 1/6] vfs: Move noop_backing_dev_info check from sync into writeback Jan Kara
@ 2011-10-20  9:48   ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2011-10-20  9:48 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, Curt Wohlgemuth, linux-fsdevel, Al Viro

On Fri, Oct 07, 2011 at 10:40:50PM +0200, Jan Kara wrote:
> In principle, a filesystem may want to have ->sync_fs() called during sync(1)
> although it does not have a bdi (i.e. s_bdi is set to noop_backing_dev_info).
> Only writeback code really needs bdi set to something reasonable. So move the
> checks where they are more logical.

Looks good, but the comment above the check isn't too useful.  Either
remove it or write something more substancial.

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/6] quota: Split dquot_quota_sync() to writeback and cache flushing part
  2011-10-07 20:40 ` [PATCH 2/6] quota: Split dquot_quota_sync() to writeback and cache flushing part Jan Kara
@ 2011-10-20  9:50   ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2011-10-20  9:50 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, Curt Wohlgemuth, linux-fsdevel, Al Viro

On Fri, Oct 07, 2011 at 10:40:51PM +0200, Jan Kara wrote:
> Split off part of dquot_quota_sync() which writes dquots into a quota file
> to a separate function. In the next patch we will use the function from
> filesystems and we do not want to abuse ->quota_sync quotactl callback more
> than necessary.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/6] vfs: Reorder operations during sys_sync
  2011-10-07 20:40 ` [PATCH 4/6] vfs: Reorder operations during sys_sync Jan Kara
@ 2011-10-20  9:53   ` Christoph Hellwig
  2011-10-20 23:57     ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2011-10-20  9:53 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, Curt Wohlgemuth, linux-fsdevel, Al Viro

Looks good except for some minor comments below:


Reviewed-by: Christoph Hellwig <hch@lst.de>

> diff --git a/fs/sync.c b/fs/sync.c
> index 3367d04..5fbeee6 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -68,18 +68,26 @@ int sync_filesystem(struct super_block *sb)
>  }
>  EXPORT_SYMBOL_GPL(sync_filesystem);
>  
> -static void sync_one_sb(struct super_block *sb, void *arg)
> +static void sync_inodes_one_sb(struct super_block *sb, void *arg)
>  {
> -	if (!(sb->s_flags & MS_RDONLY))
> -		__sync_filesystem(sb, *(int *)arg);
> +	if (!(sb->s_flags & MS_RDONLY)) {
> +		if (!*(int *)arg)
> +			writeback_inodes_sb(sb);
> +		else
> +			sync_inodes_sb(sb);
> +	}

This would be a lot cleaner if you split it into two functions for
the writeback_inodes_sb and sync_inodes_sb cases.

>  }
> -/*
> - * Sync all the data for all the filesystems (called by sys_sync() and
> - * emergency sync)
> - */
> -static void sync_filesystems(int wait)
> +
> +static void sync_fs_one_sb(struct super_block *sb, void *arg)
>  {
> -	iterate_supers(sync_one_sb, &wait);
> +	if (!(sb->s_flags & MS_RDONLY) && sb->s_op->sync_fs)
> +		sb->s_op->sync_fs(sb, *(int *)arg);
> +}
> +
> +static void sync_blkdev_one_sb(struct super_block *sb, void *arg)
> +{
> +	if (!(sb->s_flags & MS_RDONLY))
> +		__sync_blockdev(sb->s_bdev, *(int *)arg);
>  }

Same here, not having these odd wait/nowait arguments whos address
is taken in the caller would make the thing a lot more readable.

It would also allow to kill of that nasty __sync_blockdev interface
eventually.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/6] vfs: Make sys_sync writeout also block device inodes
  2011-10-07 20:40 ` [PATCH 5/6] vfs: Make sys_sync writeout also block device inodes Jan Kara
@ 2011-10-20  9:54   ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2011-10-20  9:54 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, Curt Wohlgemuth, linux-fsdevel, Al Viro


Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 6/6] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
  2011-10-07 20:40 ` [PATCH 6/6] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes Jan Kara
@ 2011-10-20  9:57   ` Christoph Hellwig
  2011-10-24 13:14     ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2011-10-20  9:57 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, Curt Wohlgemuth, linux-fsdevel, Al Viro

On Fri, Oct 07, 2011 at 10:40:55PM +0200, Jan Kara wrote:
> wakeup_flusher_threads(0) will queue work doing complete writeback for each
> flusher thread. Thus there is not much point in submitting another work doing
> full inode WB_SYNC_NONE writeback by writeback_inodes_sb().
> 
> After this change it does not make sense to call nonblocking ->sync_fs and
> block device flush before calling sync_inodes_sb() because
> wakeup_flusher_threads() is completely asynchronous and thus these functions
> would be called in parallel with inode writeback running which will effectively
> void any work they do. So we move sync_inodes_sb() call before these two
> functions.

>  	wakeup_flusher_threads(0);
> -	iterate_supers(sync_inodes_one_sb, &nowait);
> +	iterate_supers(sync_inodes_one_sb, &wait);
>  	iterate_supers(sync_fs_one_sb, &nowait);
>  	sync_all_bdevs(nowait);
> -	iterate_supers(sync_inodes_one_sb, &wait);
>  	iterate_supers(sync_fs_one_sb, &wait);
>  	sync_all_bdevs(wait);
>  	if (unlikely(laptop_mode))

This looks a bit half-assed to me.  Why do we still do the non-blocking
sync_all_bdevs call?  This really only starts async writeback on the
block device inodes, which wakeup_flusher_threads already did.
Similarly I don't think the non-blocking ->sync_fs call really make
much sense anymore here.

Also we really need some good comment on why the order is
like the one chose here in this function.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/6] vfs: Reorder operations during sys_sync
  2011-10-20  9:53   ` Christoph Hellwig
@ 2011-10-20 23:57     ` Jan Kara
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2011-10-20 23:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, Curt Wohlgemuth, linux-fsdevel, Al Viro

On Thu 20-10-11 05:53:44, Christoph Hellwig wrote:
> > diff --git a/fs/sync.c b/fs/sync.c
> > index 3367d04..5fbeee6 100644
> > --- a/fs/sync.c
> > +++ b/fs/sync.c
> > @@ -68,18 +68,26 @@ int sync_filesystem(struct super_block *sb)
> >  }
> >  EXPORT_SYMBOL_GPL(sync_filesystem);
> >  
> > -static void sync_one_sb(struct super_block *sb, void *arg)
> > +static void sync_inodes_one_sb(struct super_block *sb, void *arg)
> >  {
> > -	if (!(sb->s_flags & MS_RDONLY))
> > -		__sync_filesystem(sb, *(int *)arg);
> > +	if (!(sb->s_flags & MS_RDONLY)) {
> > +		if (!*(int *)arg)
> > +			writeback_inodes_sb(sb);
> > +		else
> > +			sync_inodes_sb(sb);
> > +	}
> 
> This would be a lot cleaner if you split it into two functions for
> the writeback_inodes_sb and sync_inodes_sb cases.
  OK, split to writeback_inodes_one_sb() and sync_inodes_one_sb().

> > -/*
> > - * Sync all the data for all the filesystems (called by sys_sync() and
> > - * emergency sync)
> > - */
> > -static void sync_filesystems(int wait)
> > +
> > +static void sync_fs_one_sb(struct super_block *sb, void *arg)
> >  {
> > -	iterate_supers(sync_one_sb, &wait);
> > +	if (!(sb->s_flags & MS_RDONLY) && sb->s_op->sync_fs)
> > +		sb->s_op->sync_fs(sb, *(int *)arg);
> > +}
> > +
> > +static void sync_blkdev_one_sb(struct super_block *sb, void *arg)
> > +{
> > +	if (!(sb->s_flags & MS_RDONLY))
> > +		__sync_blockdev(sb->s_bdev, *(int *)arg);
> >  }
> 
> Same here, not having these odd wait/nowait arguments whos address
> is taken in the caller would make the thing a lot more readable.
> 
> It would also allow to kill of that nasty __sync_blockdev interface
> eventually.
  So you prefer I directly call:
	filemap_flush(bdev->bd_inode->i_mapping)
and
	filemap_write_and_wait(bdev->bd_inode->i_mapping)
respectively?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 6/6] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
  2011-10-20  9:57   ` Christoph Hellwig
@ 2011-10-24 13:14     ` Jan Kara
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2011-10-24 13:14 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, Curt Wohlgemuth, linux-fsdevel, Al Viro

On Thu 20-10-11 05:57:25, Christoph Hellwig wrote:
> On Fri, Oct 07, 2011 at 10:40:55PM +0200, Jan Kara wrote:
> > wakeup_flusher_threads(0) will queue work doing complete writeback for each
> > flusher thread. Thus there is not much point in submitting another work doing
> > full inode WB_SYNC_NONE writeback by writeback_inodes_sb().
> > 
> > After this change it does not make sense to call nonblocking ->sync_fs and
> > block device flush before calling sync_inodes_sb() because
> > wakeup_flusher_threads() is completely asynchronous and thus these functions
> > would be called in parallel with inode writeback running which will effectively
> > void any work they do. So we move sync_inodes_sb() call before these two
> > functions.
> 
> >  	wakeup_flusher_threads(0);
> > -	iterate_supers(sync_inodes_one_sb, &nowait);
> > +	iterate_supers(sync_inodes_one_sb, &wait);
> >  	iterate_supers(sync_fs_one_sb, &nowait);
> >  	sync_all_bdevs(nowait);
> > -	iterate_supers(sync_inodes_one_sb, &wait);
> >  	iterate_supers(sync_fs_one_sb, &wait);
> >  	sync_all_bdevs(wait);
> >  	if (unlikely(laptop_mode))
> 
> This looks a bit half-assed to me.  Why do we still do the non-blocking
> sync_all_bdevs call?  This really only starts async writeback on the
> block device inodes, which wakeup_flusher_threads already did.
  Because e.g. ext2 will dirty quite some bdev buffers while doing inode
writeback which runs in no particular order with the writeback of bdev
inode. So after sync_inodes_sb() finishes you can have too much dirty
buffers on a bdev for synchronous sync_all_bdevs() to be efficient.

But what might be the best is to do filemap_fdadawrite() on all bdevs and
then do filemap_fdatawait() on all bdevs which would solve the efficiency
issue and also don't do writeback twice unnecessarily. OK?

> Similarly I don't think the non-blocking ->sync_fs call really make
> much sense anymore here.
  Yes, so as I wrote in the introductory email, I did also measurements
where non-blocking ->sync_fs was removed and I didn't see any regression
with ext3, ext4, xfs, or btrfs. OTOH I can imagine *some* filesystem can do
an equivalent of filemap_fdatawrite() on some metadata for non-blocking
->sync_fs and filemap_fdatawrite_and_wait() on the blocking one and if
there are more such filesystems on different backing storages the
performance difference can be noticeable (actually, checking the
filesystems, JFS and Ceph seem to be doing something like this). So I
that's why I didn't include the change in the end...

> Also we really need some good comment on why the order is
> like the one chose here in this function.
  Good point, will add.

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2011-10-24 13:14 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-07 20:40 [PATCH 0/6] Cleanup and improve sync (v3) Jan Kara
2011-10-07 20:40 ` [PATCH 1/6] vfs: Move noop_backing_dev_info check from sync into writeback Jan Kara
2011-10-20  9:48   ` Christoph Hellwig
2011-10-07 20:40 ` [PATCH 2/6] quota: Split dquot_quota_sync() to writeback and cache flushing part Jan Kara
2011-10-20  9:50   ` Christoph Hellwig
2011-10-07 20:40 ` [PATCH 3/6] quota: Move quota syncing to ->sync_fs method Jan Kara
2011-10-07 20:40 ` [PATCH 4/6] vfs: Reorder operations during sys_sync Jan Kara
2011-10-20  9:53   ` Christoph Hellwig
2011-10-20 23:57     ` Jan Kara
2011-10-07 20:40 ` [PATCH 5/6] vfs: Make sys_sync writeout also block device inodes Jan Kara
2011-10-20  9:54   ` Christoph Hellwig
2011-10-07 20:40 ` [PATCH 6/6] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes Jan Kara
2011-10-20  9:57   ` Christoph Hellwig
2011-10-24 13:14     ` Jan Kara
2011-10-18  1:07 ` [PATCH 0/6] Cleanup and improve sync (v3) Jan Kara
2011-10-18  6:45   ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.