linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8 v4] Flush all block devices on sync(2) and cleanup the code
@ 2012-07-03 14:45 Jan Kara
  2012-07-03 14:45 ` [PATCH 1/8] vfs: Move noop_backing_dev_info check from sync into writeback Jan Kara
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Jan Kara @ 2012-07-03 14:45 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, LKML, Curt Wohlgemuth, Christoph Hellwig, Jan Kara


  Hello,

  this is a fourth iteration of my series improving handling of sync syscall.
Since previous submission I have slightly improved cleaned up iteration loops
so that we don't have to pass void * around. Christoph also asked about why
we do non-blocking ->sync_fs() pass. My answer to it was:

I did also measurements where non-blocking ->sync_fs was removed and I didn't
see any regression with ext3, ext4, xfs, or btrfs. OTOH I can imagine *some*
filesystem can do an equivalent of filemap_fdatawrite() on some metadata for
non-blocking ->sync_fs and filemap_fdatawrite_and_wait() on the blocking one
and if there are more such filesystems on different backing storages the
performance difference can be noticeable (actually, checking the filesystems,
JFS and Ceph seem to be doing something like this). So I that's why I didn't
include the change in the end...

So Christoph, if you think we should get rid of non-blocking ->sync_fs, I can
include the patch but personally I think it has some use. Arguably a cleaner
interface for the users will be something like two methods ->sync_fs_begin
and ->sync_fs_end. Filesystems that don't have much to optimize in ->sync_fs()
would just use one of these functions.

I have run three tests below to verify performance impact of the patch series.
Each test has been run with 1, 2, and 4 filesystems mounted; test with 2
filesystems was run with each filesystem on a different disk, test with 4
filesystems had 2 filesystems on the first disk and 2 filesystems on the second
disk.

Test 1: Run 200 times sync with filesystem mounted to verify overhead of
  sync when there are no data to write.
Test 2: For each filesystem run a process creating 40 KB files, sleep
  for 3 seconds, run sync.
Test 3: For each filesystem run a process creating 20 GB file, sleep for
  5 seconds, run sync.

I have performed 10 runs of each test for xfs, ext3, ext4, and btrfs
filesystems.

Results of test 1
-----------------
Numbers are time it took 200 syncs to complete.
Character in braces is + if the time increased with 2*STDDEV reliability,
- if it decreased with 2*STDDEV reliability, 0 otherwise.
		      BASE		      PATCHED
FS		AVG      STDDEV         AVG      STDDEV
Test xfs, 1 disks	0.783000 0.012689	1.628000 0.120316 (+)	
Test xfs, 2 disks	0.742000 0.011662	1.774000 0.135144 (+)	
Test xfs, 4 disks	0.823000 0.057280	1.034000 0.083690 (0)	
Test ext4, 1 disks	0.620000 0.000000	0.678000 0.004000 (+)	
Test ext4, 2 disks	0.629000 0.003000	0.672000 0.004000 (+)	
Test ext4, 4 disks	0.642000 0.004000	0.670000 0.004472 (+)	
Test ext3, 1 disks	0.625000 0.005000	0.662000 0.009798 (+)	
Test ext3, 2 disks	0.622000 0.004000	0.662000 0.004000 (+)	
Test ext3, 4 disks	0.639000 0.003000	0.661000 0.005385 (+)	
Test btrfs, 1 disks	7.901000 0.173807	7.635000 0.171712 (0)	
Test btrfs, 2 disks	19.690000 0.357379	18.630000 0.260000 (0)	
Test btrfs, 4 disks	42.113000 0.725438	41.440000 0.492016 (0)	

We see small increases in runtime, likely due to us having to process all block
devices in the system. XFS actually suffers a bit more, which has been caused
by the last patch dropping writeback_inodes_sb() and reordering ->sync_fs
calls. But still it seems to be OK for this no-so-important workload.

Results of test 2
-----------------
Numbers are time it took sync to complete.

		    BASE		    PATCHED
FS		AVG      STDDEV         AVG      STDDEV
Test xfs, 1 disks	0.391000 0.010440	0.408000 0.011662 (0)	
Test xfs, 2 disks	0.670000 0.014832	0.707000 0.038223 (0)	
Test xfs, 4 disks	2.800000 1.722202	1.818000 0.144900 (0)	
Test ext4, 1 disks	1.531000 0.778247	0.852000 0.109252 (0)	
Test ext4, 2 disks	9.313000 1.857375	10.671000 2.624806 (0)	
Test ext4, 4 disks	254.982000 88.016783	312.003000 30.387435 (0)	
Test ext3, 1 disks	11.751000 0.924472	1.855000 0.179736 (-)	
Test ext3, 2 disks	82.625000 12.903233	43.483000 0.493438 (-)	
Test ext3, 4 disks	79.826000 21.118762	91.593000 31.763338 (0)	
Test btrfs, 1 disks	0.407000 0.012689	0.423000 0.011874 (0)	
Test btrfs, 2 disks	0.790000 0.404252	1.387000 0.606829 (0)	
Test btrfs, 4 disks	2.069000 0.635460	2.273000 1.617641 (0)	

Changes are mostly in the (sometimes heavy) noise, only ext3 stands out with
some noticeable improvements.

Results of test 3
-----------------
Numbers are time it took sync to complete.

		    BASE		    PATCHED
FS		AVG      STDDEV         AVG      STDDEV
Test xfs, 1 disks	12.541000 1.875209	11.351000 0.824724 (0)	
Test xfs, 2 disks	14.858000 0.866162	12.114000 0.632743 (0)	
Test xfs, 4 disks	23.825000 2.020224	17.388000 1.641809 (0)	
Test ext4, 1 disks	39.697000 2.151465	14.987000 2.611670 (-)	
Test ext4, 2 disks	36.148000 1.231104	20.030000 0.656171 (-)	
Test ext4, 4 disks	33.326000 2.116559	19.864000 1.171829 (-)	
Test ext3, 1 disks	21.509000 1.944307	15.166000 0.115603 (-)	
Test ext3, 2 disks	26.694000 1.989750	21.465000 2.187219 (0)	
Test ext3, 4 disks	42.809000 5.220120	34.878000 5.011055 (0)	
Test btrfs, 1 disks	7.339000 2.299637	9.386000 0.631493 (0)	
Test btrfs, 2 disks	7.945000 3.100275	10.554000 0.073919 (0)	
Test btrfs, 4 disks	18.271000 2.669938	25.275000 2.682839 (0)	

Here we see ext3 & ext4 improved somewhat, XFS likely as well although it's
still in the noise. OTOH btrfs likely got slower although it's in the noise.
I didn't drill down into what caused this, I just now that it's not the last
patch.

								Honza

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/8] vfs: Move noop_backing_dev_info check from sync into writeback
  2012-07-03 14:45 [PATCH 0/8 v4] Flush all block devices on sync(2) and cleanup the code Jan Kara
@ 2012-07-03 14:45 ` Jan Kara
  2012-07-03 14:45 ` [PATCH 2/8] quota: Split dquot_quota_sync() to writeback and cache flushing part Jan Kara
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2012-07-03 14:45 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, LKML, Curt Wohlgemuth, Christoph Hellwig, Jan Kara

In principle, a filesystem may want to have ->sync_fs() called during sync(1)
although it does not have a bdi (i.e. s_bdi is set to noop_backing_dev_info).
Only writeback code really needs bdi set to something reasonable. So move the
checks where they are more logical.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c |    5 +++++
 fs/sync.c         |    7 -------
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 41a3ccf..8f660dd 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1315,6 +1315,8 @@ void writeback_inodes_sb_nr(struct super_block *sb,
 		.reason			= reason,
 	};
 
+	if (sb->s_bdi == &noop_backing_dev_info)
+		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 	bdi_queue_work(sb->s_bdi, &work);
 	wait_for_completion(&done);
@@ -1398,6 +1400,9 @@ void sync_inodes_sb(struct super_block *sb)
 		.reason		= WB_REASON_SYNC,
 	};
 
+	/* Nothing to do? */
+	if (sb->s_bdi == &noop_backing_dev_info)
+		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 
 	bdi_queue_work(sb->s_bdi, &work);
diff --git a/fs/sync.c b/fs/sync.c
index 11e3d1c..b3d2a00 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -29,13 +29,6 @@
  */
 static int __sync_filesystem(struct super_block *sb, int wait)
 {
-	/*
-	 * This should be safe, as we require bdi backing to actually
-	 * write out data in the first place
-	 */
-	if (sb->s_bdi == &noop_backing_dev_info)
-		return 0;
-
 	if (sb->s_qcop && sb->s_qcop->quota_sync)
 		sb->s_qcop->quota_sync(sb, -1, wait);
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/8] quota: Split dquot_quota_sync() to writeback and cache flushing part
  2012-07-03 14:45 [PATCH 0/8 v4] Flush all block devices on sync(2) and cleanup the code Jan Kara
  2012-07-03 14:45 ` [PATCH 1/8] vfs: Move noop_backing_dev_info check from sync into writeback Jan Kara
@ 2012-07-03 14:45 ` Jan Kara
  2012-07-03 14:45 ` [PATCH 3/8] quota: Move quota syncing to ->sync_fs method Jan Kara
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2012-07-03 14:45 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, LKML, Curt Wohlgemuth, Christoph Hellwig, Jan Kara

Split off part of dquot_quota_sync() which writes dquots into a quota file
to a separate function. In the next patch we will use the function from
filesystems and we do not want to abuse ->quota_sync quotactl callback more
than necessary.

Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/gfs2/quota.c          |    4 ++--
 fs/gfs2/quota.h          |    2 +-
 fs/gfs2/super.c          |    2 +-
 fs/gfs2/sys.c            |    2 +-
 fs/quota/dquot.c         |   24 +++++++++++++++++++++---
 fs/quota/quota.c         |    4 ++--
 fs/sync.c                |    2 +-
 include/linux/quota.h    |    2 +-
 include/linux/quotaops.h |    3 ++-
 9 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index b97178e..27b5cc7 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1108,7 +1108,7 @@ void gfs2_quota_change(struct gfs2_inode *ip, s64 change,
 	}
 }
 
-int gfs2_quota_sync(struct super_block *sb, int type, int wait)
+int gfs2_quota_sync(struct super_block *sb, int type)
 {
 	struct gfs2_sbd *sdp = sb->s_fs_info;
 	struct gfs2_quota_data **qda;
@@ -1154,7 +1154,7 @@ int gfs2_quota_sync(struct super_block *sb, int type, int wait)
 
 static int gfs2_quota_sync_timeo(struct super_block *sb, int type)
 {
-	return gfs2_quota_sync(sb, type, 0);
+	return gfs2_quota_sync(sb, type);
 }
 
 int gfs2_quota_refresh(struct gfs2_sbd *sdp, int user, u32 id)
diff --git a/fs/gfs2/quota.h b/fs/gfs2/quota.h
index 90bf1c3..f25d98b 100644
--- a/fs/gfs2/quota.h
+++ b/fs/gfs2/quota.h
@@ -26,7 +26,7 @@ extern int gfs2_quota_check(struct gfs2_inode *ip, u32 uid, u32 gid);
 extern void gfs2_quota_change(struct gfs2_inode *ip, s64 change,
 			      u32 uid, u32 gid);
 
-extern int gfs2_quota_sync(struct super_block *sb, int type, int wait);
+extern int gfs2_quota_sync(struct super_block *sb, int type);
 extern int gfs2_quota_refresh(struct gfs2_sbd *sdp, int user, u32 id);
 
 extern int gfs2_quota_init(struct gfs2_sbd *sdp);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 713e621..313c329 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -838,7 +838,7 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
 	int error;
 
 	flush_workqueue(gfs2_delete_workqueue);
-	gfs2_quota_sync(sdp->sd_vfs, 0, 1);
+	gfs2_quota_sync(sdp->sd_vfs, 0);
 	gfs2_statfs_sync(sdp->sd_vfs, 0);
 
 	error = gfs2_glock_nq_init(sdp->sd_trans_gl, LM_ST_SHARED, GL_NOCACHE,
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index 9c2592b..73ecc34 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -168,7 +168,7 @@ static ssize_t quota_sync_store(struct gfs2_sbd *sdp, const char *buf,
 	if (simple_strtol(buf, NULL, 0) != 1)
 		return -EINVAL;
 
-	gfs2_quota_sync(sdp->sd_vfs, 0, 1);
+	gfs2_quota_sync(sdp->sd_vfs, 0);
 	return len;
 }
 
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 10cbe84..d679fc4 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -595,12 +595,14 @@ out:
 }
 EXPORT_SYMBOL(dquot_scan_active);
 
-int dquot_quota_sync(struct super_block *sb, int type, int wait)
+/* Write all dquot structures to quota files */
+int dquot_writeback_dquots(struct super_block *sb, int type)
 {
 	struct list_head *dirty;
 	struct dquot *dquot;
 	struct quota_info *dqopt = sb_dqopt(sb);
 	int cnt;
+	int err, ret = 0;
 
 	mutex_lock(&dqopt->dqonoff_mutex);
 	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
@@ -624,7 +626,9 @@ int dquot_quota_sync(struct super_block *sb, int type, int wait)
 			atomic_inc(&dquot->dq_count);
 			spin_unlock(&dq_list_lock);
 			dqstats_inc(DQST_LOOKUPS);
-			sb->dq_op->write_dquot(dquot);
+			err = sb->dq_op->write_dquot(dquot);
+			if (!ret && err)
+				err = ret;
 			dqput(dquot);
 			spin_lock(&dq_list_lock);
 		}
@@ -638,7 +642,21 @@ int dquot_quota_sync(struct super_block *sb, int type, int wait)
 	dqstats_inc(DQST_SYNCS);
 	mutex_unlock(&dqopt->dqonoff_mutex);
 
-	if (!wait || (dqopt->flags & DQUOT_QUOTA_SYS_FILE))
+	return ret;
+}
+EXPORT_SYMBOL(dquot_writeback_dquots);
+
+/* Write all dquot structures to disk and make them visible from userspace */
+int dquot_quota_sync(struct super_block *sb, int type)
+{
+	struct quota_info *dqopt = sb_dqopt(sb);
+	int cnt;
+	int ret;
+
+	ret = dquot_writeback_dquots(sb, type);
+	if (ret)
+		return ret;
+	if (dqopt->flags & DQUOT_QUOTA_SYS_FILE)
 		return 0;
 
 	/* This is not very clever (and fast) but currently I don't know about
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 9a39120..c659f92 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -47,7 +47,7 @@ static int check_quotactl_permission(struct super_block *sb, int type, int cmd,
 static void quota_sync_one(struct super_block *sb, void *arg)
 {
 	if (sb->s_qcop && sb->s_qcop->quota_sync)
-		sb->s_qcop->quota_sync(sb, *(int *)arg, 1);
+		sb->s_qcop->quota_sync(sb, *(int *)arg);
 }
 
 static int quota_sync_all(int type)
@@ -270,7 +270,7 @@ static int do_quotactl(struct super_block *sb, int type, int cmd, qid_t id,
 	case Q_SYNC:
 		if (!sb->s_qcop->quota_sync)
 			return -ENOSYS;
-		return sb->s_qcop->quota_sync(sb, type, 1);
+		return sb->s_qcop->quota_sync(sb, type);
 	case Q_XQUOTAON:
 	case Q_XQUOTAOFF:
 	case Q_XQUOTARM:
diff --git a/fs/sync.c b/fs/sync.c
index b3d2a00..cae145d 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -30,7 +30,7 @@
 static int __sync_filesystem(struct super_block *sb, int wait)
 {
 	if (sb->s_qcop && sb->s_qcop->quota_sync)
-		sb->s_qcop->quota_sync(sb, -1, wait);
+		sb->s_qcop->quota_sync(sb, -1);
 
 	if (wait)
 		sync_inodes_sb(sb);
diff --git a/include/linux/quota.h b/include/linux/quota.h
index c09fa04..524ede8 100644
--- a/include/linux/quota.h
+++ b/include/linux/quota.h
@@ -333,7 +333,7 @@ struct quotactl_ops {
 	int (*quota_on)(struct super_block *, int, int, struct path *);
 	int (*quota_on_meta)(struct super_block *, int, int);
 	int (*quota_off)(struct super_block *, int);
-	int (*quota_sync)(struct super_block *, int, int);
+	int (*quota_sync)(struct super_block *, int);
 	int (*get_info)(struct super_block *, int, struct if_dqinfo *);
 	int (*set_info)(struct super_block *, int, struct if_dqinfo *);
 	int (*get_dqblk)(struct super_block *, int, qid_t, struct fs_disk_quota *);
diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
index 17b9773..778ba5b 100644
--- a/include/linux/quotaops.h
+++ b/include/linux/quotaops.h
@@ -83,7 +83,8 @@ int dquot_quota_on(struct super_block *sb, int type, int format_id,
 int dquot_quota_on_mount(struct super_block *sb, char *qf_name,
  	int format_id, int type);
 int dquot_quota_off(struct super_block *sb, int type);
-int dquot_quota_sync(struct super_block *sb, int type, int wait);
+int dquot_writeback_dquots(struct super_block *sb, int type);
+int dquot_quota_sync(struct super_block *sb, int type);
 int dquot_get_dqinfo(struct super_block *sb, int type, struct if_dqinfo *ii);
 int dquot_set_dqinfo(struct super_block *sb, int type, struct if_dqinfo *ii);
 int dquot_get_dqblk(struct super_block *sb, int type, qid_t id,
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/8] quota: Move quota syncing to ->sync_fs method
  2012-07-03 14:45 [PATCH 0/8 v4] Flush all block devices on sync(2) and cleanup the code Jan Kara
  2012-07-03 14:45 ` [PATCH 1/8] vfs: Move noop_backing_dev_info check from sync into writeback Jan Kara
  2012-07-03 14:45 ` [PATCH 2/8] quota: Split dquot_quota_sync() to writeback and cache flushing part Jan Kara
@ 2012-07-03 14:45 ` Jan Kara
  2012-07-03 14:45 ` [PATCH 4/8] vfs: Reorder operations during sys_sync Jan Kara
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2012-07-03 14:45 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, LKML, Curt Wohlgemuth, Christoph Hellwig,
	Jan Kara, Theodore Ts'o, Joel Becker, reiserfs-devel

Since the moment writes to quota files are using block device page cache and
space for quota structures is reserved at the moment they are first accessed we
have no reason to sync quota before inode writeback. In fact this order is now
only harmful since quota information can easily change during inode writeback
(either because conversion of delayed-allocated extents or simply because of
allocation of new blocks for simple filesystems not using page_mkwrite).

So move syncing of quota information after writeback of inodes into ->sync_fs
method. This way we do not have to use ->quota_sync callback which is primarily
intended for use by quotactl syscall anyway and we get rid of calling
->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
also support legacy non-journalled quotas) and thus there are no dirty quota
structures.

CC: "Theodore Ts'o" <tytso@mit.edu>
CC: Joel Becker <jlbec@evilplan.org>
CC: reiserfs-devel@vger.kernel.org
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Dave Kleikamp <shaggy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext2/super.c     |    6 ++++++
 fs/ext3/super.c     |    5 +++++
 fs/ext4/super.c     |    5 +++++
 fs/gfs2/super.c     |    2 ++
 fs/jfs/super.c      |    5 +++++
 fs/reiserfs/super.c |    5 +++++
 fs/sync.c           |    3 ---
 7 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index b3621cb..5df3d2d 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1184,6 +1184,12 @@ static int ext2_sync_fs(struct super_block *sb, int wait)
 	struct ext2_sb_info *sbi = EXT2_SB(sb);
 	struct ext2_super_block *es = EXT2_SB(sb)->s_es;
 
+	/*
+	 * Write quota structures to quota file, sync_blockdev() will write
+	 * them to disk later
+	 */
+	dquot_writeback_dquots(sb, -1);
+
 	spin_lock(&sbi->s_lock);
 	if (es->s_state & cpu_to_le16(EXT2_VALID_FS)) {
 		ext2_debug("setting valid to 0\n");
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 8c3a44b..4ac304c 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -2526,6 +2526,11 @@ static int ext3_sync_fs(struct super_block *sb, int wait)
 	tid_t target;
 
 	trace_ext3_sync_fs(sb, wait);
+	/*
+	 * Writeback quota in non-journalled quota case - journalled quota has
+	 * no dirty dquots
+	 */
+	dquot_writeback_dquots(sb, -1);
 	if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
 		if (wait)
 			log_wait_commit(EXT3_SB(sb)->s_journal, target);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index eb7aa3e..d875940 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4325,6 +4325,11 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
 
 	trace_ext4_sync_fs(sb, wait);
 	flush_workqueue(sbi->dio_unwritten_wq);
+	/*
+	 * Writeback quota in non-journalled quota case - journalled quota has
+	 * no dirty dquots
+	 */
+	dquot_writeback_dquots(sb, -1);
 	if (jbd2_journal_start_commit(sbi->s_journal, &target)) {
 		if (wait)
 			jbd2_log_wait_commit(sbi->s_journal, target);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 313c329..f3d6bbf 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -952,6 +952,8 @@ restart:
 static int gfs2_sync_fs(struct super_block *sb, int wait)
 {
 	struct gfs2_sbd *sdp = sb->s_fs_info;
+
+	gfs2_quota_sync(sb, -1);
 	if (wait && sdp)
 		gfs2_log_flush(sdp, NULL);
 	return 0;
diff --git a/fs/jfs/super.c b/fs/jfs/super.c
index 4a82950..c55c745 100644
--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -601,6 +601,11 @@ static int jfs_sync_fs(struct super_block *sb, int wait)
 
 	/* log == NULL indicates read-only mount */
 	if (log) {
+		/*
+		 * Write quota structures to quota file, sync_blockdev() will
+		 * write them to disk later
+		 */
+		dquot_writeback_dquots(sb, -1);
 		jfs_flush_journal(log, wait);
 		jfs_syncpt(log, 0);
 	}
diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index 651ce76..7a37dab 100644
--- a/fs/reiserfs/super.c
+++ b/fs/reiserfs/super.c
@@ -68,6 +68,11 @@ static int reiserfs_sync_fs(struct super_block *s, int wait)
 {
 	struct reiserfs_transaction_handle th;
 
+	/*
+	 * Writeback quota in non-journalled quota case - journalled quota has
+	 * no dirty dquots
+	 */
+	dquot_writeback_dquots(s, -1);
 	reiserfs_write_lock(s);
 	if (!journal_begin(&th, s, 1))
 		if (!journal_end_sync(&th, s, 1))
diff --git a/fs/sync.c b/fs/sync.c
index cae145d..66acd2b 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -29,9 +29,6 @@
  */
 static int __sync_filesystem(struct super_block *sb, int wait)
 {
-	if (sb->s_qcop && sb->s_qcop->quota_sync)
-		sb->s_qcop->quota_sync(sb, -1);
-
 	if (wait)
 		sync_inodes_sb(sb);
 	else
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/8] vfs: Reorder operations during sys_sync
  2012-07-03 14:45 [PATCH 0/8 v4] Flush all block devices on sync(2) and cleanup the code Jan Kara
                   ` (2 preceding siblings ...)
  2012-07-03 14:45 ` [PATCH 3/8] quota: Move quota syncing to ->sync_fs method Jan Kara
@ 2012-07-03 14:45 ` Jan Kara
  2012-07-03 14:45 ` [PATCH 5/8] vfs: Create function for iterating over block devices Jan Kara
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2012-07-03 14:45 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, LKML, Curt Wohlgemuth, Christoph Hellwig, Jan Kara

Change the order of operations during sync from

for_each_sb {
        writeback_inodes_sb();
        sync_fs(nowait);
        __sync_blockdev(nowait);
}
for_each_sb {
        sync_inodes_sb();
        sync_fs(wait);
        __sync_blockdev(wait);
}

to

for_each_sb
        writeback_inodes_sb();
for_each_sb
        sync_fs(nowait);
for_each_sb
        __sync_blockdev(nowait);
for_each_sb
        sync_inodes_sb();
for_each_sb
        sync_fs(wait);
for_each_sb
        __sync_blockdev(wait);

This is a preparation for the following patches in this series.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/sync.c |   46 ++++++++++++++++++++++++++++++++++------------
 1 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 66acd2b..867bf78 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -67,18 +67,28 @@ int sync_filesystem(struct super_block *sb)
 }
 EXPORT_SYMBOL_GPL(sync_filesystem);
 
-static void sync_one_sb(struct super_block *sb, void *arg)
+static void sync_inodes_one_sb(struct super_block *sb, void *arg)
 {
 	if (!(sb->s_flags & MS_RDONLY))
-		__sync_filesystem(sb, *(int *)arg);
+		sync_inodes_sb(sb);
 }
-/*
- * Sync all the data for all the filesystems (called by sys_sync() and
- * emergency sync)
- */
-static void sync_filesystems(int wait)
+
+static void writeback_inodes_one_sb(struct super_block *sb, void *arg)
+{
+	if (!(sb->s_flags & MS_RDONLY))
+		writeback_inodes_sb(sb);
+}
+
+static void sync_fs_one_sb(struct super_block *sb, void *arg)
 {
-	iterate_supers(sync_one_sb, &wait);
+	if (!(sb->s_flags & MS_RDONLY) && sb->s_op->sync_fs)
+		sb->s_op->sync_fs(sb, *(int *)arg);
+}
+
+static void sync_blkdev_one_sb(struct super_block *sb, void *arg)
+{
+	if (!(sb->s_flags & MS_RDONLY))
+		__sync_blockdev(sb->s_bdev, *(int *)arg);
 }
 
 /*
@@ -87,9 +97,15 @@ static void sync_filesystems(int wait)
  */
 SYSCALL_DEFINE0(sync)
 {
+	int nowait = 0, wait = 1;
+
 	wakeup_flusher_threads(0, WB_REASON_SYNC);
-	sync_filesystems(0);
-	sync_filesystems(1);
+	iterate_supers(writeback_inodes_one_sb, NULL);
+	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers(sync_blkdev_one_sb, &nowait);
+	iterate_supers(sync_inodes_one_sb, NULL);
+	iterate_supers(sync_fs_one_sb, &wait);
+	iterate_supers(sync_blkdev_one_sb, &wait);
 	if (unlikely(laptop_mode))
 		laptop_sync_completion();
 	return 0;
@@ -97,12 +113,18 @@ SYSCALL_DEFINE0(sync)
 
 static void do_sync_work(struct work_struct *work)
 {
+	int nowait = 0;
+
 	/*
 	 * Sync twice to reduce the possibility we skipped some inodes / pages
 	 * because they were temporarily locked
 	 */
-	sync_filesystems(0);
-	sync_filesystems(0);
+	iterate_supers(sync_inodes_one_sb, &nowait);
+	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers(sync_blkdev_one_sb, &nowait);
+	iterate_supers(sync_inodes_one_sb, &nowait);
+	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers(sync_blkdev_one_sb, &nowait);
 	printk("Emergency Sync complete\n");
 	kfree(work);
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/8] vfs: Create function for iterating over block devices
  2012-07-03 14:45 [PATCH 0/8 v4] Flush all block devices on sync(2) and cleanup the code Jan Kara
                   ` (3 preceding siblings ...)
  2012-07-03 14:45 ` [PATCH 4/8] vfs: Reorder operations during sys_sync Jan Kara
@ 2012-07-03 14:45 ` Jan Kara
  2012-07-03 14:45 ` [PATCH 6/8] vfs: Make sys_sync writeout also block device inodes Jan Kara
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2012-07-03 14:45 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, LKML, Curt Wohlgemuth, Christoph Hellwig, Jan Kara

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c     |   36 ++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |    1 +
 2 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index c2bbe1f..1e51919 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1710,3 +1710,39 @@ int __invalidate_device(struct block_device *bdev, bool kill_dirty)
 	return res;
 }
 EXPORT_SYMBOL(__invalidate_device);
+
+void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
+{
+	struct inode *inode, *old_inode = NULL;
+
+	spin_lock(&inode_sb_list_lock);
+	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
+		struct address_space *mapping = inode->i_mapping;
+
+		spin_lock(&inode->i_lock);
+		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW) ||
+		    mapping->nrpages == 0) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+		__iget(inode);
+		spin_unlock(&inode->i_lock);
+		spin_unlock(&inode_sb_list_lock);
+		/*
+		 * We hold a reference to 'inode' so it couldn't have been
+		 * removed from s_inodes list while we dropped the
+		 * inode_sb_list_lock.  We cannot iput the inode now as we can
+		 * be holding the last reference and we cannot iput it under
+		 * inode_sb_list_lock. So we keep the reference and iput it
+		 * later.
+		 */
+		iput(old_inode);
+		old_inode = inode;
+
+		func(I_BDEV(inode), arg);
+
+		spin_lock(&inode_sb_list_lock);
+	}
+	spin_unlock(&inode_sb_list_lock);
+	iput(old_inode);
+}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 17fd887..d0892e6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2091,6 +2091,7 @@ extern sector_t blkdev_max_block(struct block_device *bdev);
 extern void bd_forget(struct inode *inode);
 extern void bdput(struct block_device *);
 extern void invalidate_bdev(struct block_device *);
+extern void iterate_bdevs(void (*)(struct block_device *, void *), void *);
 extern int sync_blockdev(struct block_device *bdev);
 extern void kill_bdev(struct block_device *);
 extern struct super_block *freeze_bdev(struct block_device *);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/8] vfs: Make sys_sync writeout also block device inodes
  2012-07-03 14:45 [PATCH 0/8 v4] Flush all block devices on sync(2) and cleanup the code Jan Kara
                   ` (4 preceding siblings ...)
  2012-07-03 14:45 ` [PATCH 5/8] vfs: Create function for iterating over block devices Jan Kara
@ 2012-07-03 14:45 ` Jan Kara
  2012-07-17  6:27   ` Fengguang Wu
  2012-07-03 14:45 ` [PATCH 7/8] vfs: Remove unnecessary flushing of block devices Jan Kara
  2012-07-03 14:45 ` [PATCH 8/8] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes Jan Kara
  7 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2012-07-03 14:45 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, LKML, Curt Wohlgemuth, Christoph Hellwig, Jan Kara

In case block device does not have filesystem mounted on it, sys_sync will just
ignore it and doesn't writeout its dirty pages. This is because writeback code
avoids writing inodes from superblock without backing device and
blockdev_superblock is such a superblock.  Since it's unexpected that sync
doesn't writeout dirty data for block devices be nice to users and change the
behavior to do so. So now we iterate over all block devices on blockdev_super
instead of iterating over all superblocks when syncing block devices.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/sync.c |   18 +++++++++++-------
 1 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 867bf78..ea6e818 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -85,10 +85,14 @@ static void sync_fs_one_sb(struct super_block *sb, void *arg)
 		sb->s_op->sync_fs(sb, *(int *)arg);
 }
 
-static void sync_blkdev_one_sb(struct super_block *sb, void *arg)
+static void flush_one_bdev(struct block_device *bdev, void *arg)
 {
-	if (!(sb->s_flags & MS_RDONLY))
-		__sync_blockdev(sb->s_bdev, *(int *)arg);
+	__sync_blockdev(bdev, 0);
+}
+
+static void sync_one_bdev(struct block_device *bdev, void *arg)
+{
+	sync_blockdev(bdev);
 }
 
 /*
@@ -102,10 +106,10 @@ SYSCALL_DEFINE0(sync)
 	wakeup_flusher_threads(0, WB_REASON_SYNC);
 	iterate_supers(writeback_inodes_one_sb, NULL);
 	iterate_supers(sync_fs_one_sb, &nowait);
-	iterate_supers(sync_blkdev_one_sb, &nowait);
+	iterate_bdevs(flush_one_bdev, NULL);
 	iterate_supers(sync_inodes_one_sb, NULL);
 	iterate_supers(sync_fs_one_sb, &wait);
-	iterate_supers(sync_blkdev_one_sb, &wait);
+	iterate_bdevs(sync_one_bdev, NULL);
 	if (unlikely(laptop_mode))
 		laptop_sync_completion();
 	return 0;
@@ -121,10 +125,10 @@ static void do_sync_work(struct work_struct *work)
 	 */
 	iterate_supers(sync_inodes_one_sb, &nowait);
 	iterate_supers(sync_fs_one_sb, &nowait);
-	iterate_supers(sync_blkdev_one_sb, &nowait);
+	iterate_bdevs(flush_one_bdev, NULL);
 	iterate_supers(sync_inodes_one_sb, &nowait);
 	iterate_supers(sync_fs_one_sb, &nowait);
-	iterate_supers(sync_blkdev_one_sb, &nowait);
+	iterate_bdevs(flush_one_bdev, NULL);
 	printk("Emergency Sync complete\n");
 	kfree(work);
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 7/8] vfs: Remove unnecessary flushing of block devices
  2012-07-03 14:45 [PATCH 0/8 v4] Flush all block devices on sync(2) and cleanup the code Jan Kara
                   ` (5 preceding siblings ...)
  2012-07-03 14:45 ` [PATCH 6/8] vfs: Make sys_sync writeout also block device inodes Jan Kara
@ 2012-07-03 14:45 ` Jan Kara
  2012-07-03 14:45 ` [PATCH 8/8] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes Jan Kara
  7 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2012-07-03 14:45 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, LKML, Curt Wohlgemuth, Christoph Hellwig, Jan Kara

It is not necessary to write block devices twice. The reason why we first did
flush and then proper sync is that
  for_each_bdev() {
    write_bdev()
    wait_for_completion()
  }
is much slower than
  for_each_bdev()
    write_bdev()
  for_each_bdev()
    wait_for_completion()
when there is bigger amount of data. But as is seen in the above, there's no real
need to scan pages and submit them twice. We just need to separate the submission
and waiting part. This patch does that.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/sync.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index ea6e818..6d177d1 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -85,14 +85,14 @@ static void sync_fs_one_sb(struct super_block *sb, void *arg)
 		sb->s_op->sync_fs(sb, *(int *)arg);
 }
 
-static void flush_one_bdev(struct block_device *bdev, void *arg)
+static void fdatawrite_one_bdev(struct block_device *bdev, void *arg)
 {
-	__sync_blockdev(bdev, 0);
+	filemap_fdatawrite(bdev->bd_inode->i_mapping);
 }
 
-static void sync_one_bdev(struct block_device *bdev, void *arg)
+static void fdatawait_one_bdev(struct block_device *bdev, void *arg)
 {
-	sync_blockdev(bdev);
+	filemap_fdatawait(bdev->bd_inode->i_mapping);
 }
 
 /*
@@ -106,10 +106,10 @@ SYSCALL_DEFINE0(sync)
 	wakeup_flusher_threads(0, WB_REASON_SYNC);
 	iterate_supers(writeback_inodes_one_sb, NULL);
 	iterate_supers(sync_fs_one_sb, &nowait);
-	iterate_bdevs(flush_one_bdev, NULL);
 	iterate_supers(sync_inodes_one_sb, NULL);
 	iterate_supers(sync_fs_one_sb, &wait);
-	iterate_bdevs(sync_one_bdev, NULL);
+	iterate_bdevs(fdatawrite_one_bdev, NULL);
+	iterate_bdevs(fdatawait_one_bdev, NULL);
 	if (unlikely(laptop_mode))
 		laptop_sync_completion();
 	return 0;
@@ -125,10 +125,10 @@ static void do_sync_work(struct work_struct *work)
 	 */
 	iterate_supers(sync_inodes_one_sb, &nowait);
 	iterate_supers(sync_fs_one_sb, &nowait);
-	iterate_bdevs(flush_one_bdev, NULL);
+	iterate_bdevs(fdatawrite_one_bdev, NULL);
 	iterate_supers(sync_inodes_one_sb, &nowait);
 	iterate_supers(sync_fs_one_sb, &nowait);
-	iterate_bdevs(flush_one_bdev, NULL);
+	iterate_bdevs(fdatawrite_one_bdev, NULL);
 	printk("Emergency Sync complete\n");
 	kfree(work);
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 8/8] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
  2012-07-03 14:45 [PATCH 0/8 v4] Flush all block devices on sync(2) and cleanup the code Jan Kara
                   ` (6 preceding siblings ...)
  2012-07-03 14:45 ` [PATCH 7/8] vfs: Remove unnecessary flushing of block devices Jan Kara
@ 2012-07-03 14:45 ` Jan Kara
  7 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2012-07-03 14:45 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, LKML, Curt Wohlgemuth, Christoph Hellwig, Jan Kara

wakeup_flusher_threads(0) will queue work doing complete writeback for each
flusher thread. Thus there is not much point in submitting another work doing
full inode WB_SYNC_NONE writeback by writeback_inodes_sb().

After this change it does not make sense to call nonblocking ->sync_fs and
block device flush before calling sync_inodes_sb() because
wakeup_flusher_threads() is completely asynchronous and thus these functions
would be called in parallel with inode writeback running which will effectively
void any work they do. So we move sync_inodes_sb() call before these two
functions.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/sync.c |   19 +++++++++----------
 1 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 6d177d1..eb8722d 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -73,12 +73,6 @@ static void sync_inodes_one_sb(struct super_block *sb, void *arg)
 		sync_inodes_sb(sb);
 }
 
-static void writeback_inodes_one_sb(struct super_block *sb, void *arg)
-{
-	if (!(sb->s_flags & MS_RDONLY))
-		writeback_inodes_sb(sb);
-}
-
 static void sync_fs_one_sb(struct super_block *sb, void *arg)
 {
 	if (!(sb->s_flags & MS_RDONLY) && sb->s_op->sync_fs)
@@ -96,17 +90,22 @@ static void fdatawait_one_bdev(struct block_device *bdev, void *arg)
 }
 
 /*
- * sync everything.  Start out by waking pdflush, because that writes back
- * all queues in parallel.
+ * Sync everything. We start by waking flusher threads so that most of
+ * writeback runs on all devices in parallel. Then we sync all inodes reliably
+ * which effectively also waits for all flusher threads to finish doing
+ * writeback. At this point all data is on disk so metadata should be stable
+ * and we tell filesystems to sync their metadata via ->sync_fs() calls.
+ * Finally, we writeout all block devices because some filesystems (e.g. ext2)
+ * just write metadata (such as inodes or bitmaps) to block device page cache
+ * and do not sync it on their own in ->sync_fs().
  */
 SYSCALL_DEFINE0(sync)
 {
 	int nowait = 0, wait = 1;
 
 	wakeup_flusher_threads(0, WB_REASON_SYNC);
-	iterate_supers(writeback_inodes_one_sb, NULL);
-	iterate_supers(sync_fs_one_sb, &nowait);
 	iterate_supers(sync_inodes_one_sb, NULL);
+	iterate_supers(sync_fs_one_sb, &nowait);
 	iterate_supers(sync_fs_one_sb, &wait);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
 	iterate_bdevs(fdatawait_one_bdev, NULL);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 6/8] vfs: Make sys_sync writeout also block device inodes
  2012-07-03 14:45 ` [PATCH 6/8] vfs: Make sys_sync writeout also block device inodes Jan Kara
@ 2012-07-17  6:27   ` Fengguang Wu
  2012-07-20  2:21     ` Fengguang Wu
  0 siblings, 1 reply; 13+ messages in thread
From: Fengguang Wu @ 2012-07-17  6:27 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, linux-fsdevel, LKML, Curt Wohlgemuth, Christoph Hellwig

Hi Jan,

I got this error for !CONFIG_BLOCK builds:

fs/sync.c:110:2: error: implicit declaration of function 'iterate_bdevs' [-Werror=implicit-function-declaration]

Thanks,
Fengguang

On Tue, Jul 03, 2012 at 04:45:32PM +0200, Jan Kara wrote:
> In case block device does not have filesystem mounted on it, sys_sync will just
> ignore it and doesn't writeout its dirty pages. This is because writeback code
> avoids writing inodes from superblock without backing device and
> blockdev_superblock is such a superblock.  Since it's unexpected that sync
> doesn't writeout dirty data for block devices be nice to users and change the
> behavior to do so. So now we iterate over all block devices on blockdev_super
> instead of iterating over all superblocks when syncing block devices.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/sync.c |   18 +++++++++++-------
>  1 files changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/sync.c b/fs/sync.c
> index 867bf78..ea6e818 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -85,10 +85,14 @@ static void sync_fs_one_sb(struct super_block *sb, void *arg)
>  		sb->s_op->sync_fs(sb, *(int *)arg);
>  }
>  
> -static void sync_blkdev_one_sb(struct super_block *sb, void *arg)
> +static void flush_one_bdev(struct block_device *bdev, void *arg)
>  {
> -	if (!(sb->s_flags & MS_RDONLY))
> -		__sync_blockdev(sb->s_bdev, *(int *)arg);
> +	__sync_blockdev(bdev, 0);
> +}
> +
> +static void sync_one_bdev(struct block_device *bdev, void *arg)
> +{
> +	sync_blockdev(bdev);
>  }
>  
>  /*
> @@ -102,10 +106,10 @@ SYSCALL_DEFINE0(sync)
>  	wakeup_flusher_threads(0, WB_REASON_SYNC);
>  	iterate_supers(writeback_inodes_one_sb, NULL);
>  	iterate_supers(sync_fs_one_sb, &nowait);
> -	iterate_supers(sync_blkdev_one_sb, &nowait);
> +	iterate_bdevs(flush_one_bdev, NULL);
>  	iterate_supers(sync_inodes_one_sb, NULL);
>  	iterate_supers(sync_fs_one_sb, &wait);
> -	iterate_supers(sync_blkdev_one_sb, &wait);
> +	iterate_bdevs(sync_one_bdev, NULL);
>  	if (unlikely(laptop_mode))
>  		laptop_sync_completion();
>  	return 0;
> @@ -121,10 +125,10 @@ static void do_sync_work(struct work_struct *work)
>  	 */
>  	iterate_supers(sync_inodes_one_sb, &nowait);
>  	iterate_supers(sync_fs_one_sb, &nowait);
> -	iterate_supers(sync_blkdev_one_sb, &nowait);
> +	iterate_bdevs(flush_one_bdev, NULL);
>  	iterate_supers(sync_inodes_one_sb, &nowait);
>  	iterate_supers(sync_fs_one_sb, &nowait);
> -	iterate_supers(sync_blkdev_one_sb, &nowait);
> +	iterate_bdevs(flush_one_bdev, NULL);
>  	printk("Emergency Sync complete\n");
>  	kfree(work);
>  }
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 6/8] vfs: Make sys_sync writeout also block device inodes
  2012-07-17  6:27   ` Fengguang Wu
@ 2012-07-20  2:21     ` Fengguang Wu
  0 siblings, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2012-07-20  2:21 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, linux-fsdevel, LKML, Curt Wohlgemuth, Christoph Hellwig

Hi Al,

Would you please fold the below fix to commit 5c781a65fff ("vfs:
Create function for iterating over block devices")?

Thanks,
Fengguang
---
Subject: [PATCH] sync: fix build error on iterate_bdevs()

fs/sync.c:110:2: error: implicit declaration of function 'iterate_bdevs'

---
 include/linux/fs.h |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index ef839a7..b62af06 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2124,6 +2124,10 @@ static inline int thaw_bdev(struct block_device *bdev, struct super_block *sb)
 {
 	return 0;
 }
+
+static inline void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
+{
+}
 #endif
 extern int sync_filesystem(struct super_block *);
 extern const struct file_operations def_blk_fops;
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/8] quota: Move quota syncing to ->sync_fs method
  2012-01-05 23:46 [PATCH 0/8 RESEND] Cleanup and improve sync (v4) Jan Kara
@ 2012-01-05 23:46 ` Jan Kara
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2012-01-05 23:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Al Viro, Jan Kara, Theodore Ts'o,
	Joel Becker, reiserfs-devel

Since the moment writes to quota files are using block device page cache and
space for quota structures is reserved at the moment they are first accessed we
have no reason to sync quota before inode writeback. In fact this order is now
only harmful since quota information can easily change during inode writeback
(either because conversion of delayed-allocated extents or simply because of
allocation of new blocks for simple filesystems not using page_mkwrite).

So move syncing of quota information after writeback of inodes into ->sync_fs
method. This way we do not have to use ->quota_sync callback which is primarily
intended for use by quotactl syscall anyway and we get rid of calling
->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
also support legacy non-journalled quotas) and thus there are no dirty quota
structures.

CC: "Theodore Ts'o" <tytso@mit.edu>
CC: Joel Becker <jlbec@evilplan.org>
CC: reiserfs-devel@vger.kernel.org
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Dave Kleikamp <shaggy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext2/super.c     |    6 ++++++
 fs/ext3/super.c     |    5 +++++
 fs/ext4/super.c     |    5 +++++
 fs/gfs2/super.c     |    2 ++
 fs/jfs/super.c      |    5 +++++
 fs/reiserfs/super.c |    5 +++++
 fs/sync.c           |    3 ---
 7 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index bd8ac16..adf3cde 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1180,6 +1180,12 @@ static int ext2_sync_fs(struct super_block *sb, int wait)
 	struct ext2_sb_info *sbi = EXT2_SB(sb);
 	struct ext2_super_block *es = EXT2_SB(sb)->s_es;
 
+	/*
+	 * Write quota structures to quota file, sync_blockdev() will write
+	 * them to disk later
+	 */
+	dquot_writeback_dquots(sb, -1);
+
 	spin_lock(&sbi->s_lock);
 	if (es->s_state & cpu_to_le16(EXT2_VALID_FS)) {
 		ext2_debug("setting valid to 0\n");
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 922d289..b721140 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -2522,6 +2522,11 @@ static int ext3_sync_fs(struct super_block *sb, int wait)
 	tid_t target;
 
 	trace_ext3_sync_fs(sb, wait);
+	/*
+	 * Writeback quota in non-journalled quota case - journalled quota has
+	 * no dirty dquots
+	 */
+	dquot_writeback_dquots(sb, -1);
 	if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
 		if (wait)
 			log_wait_commit(EXT3_SB(sb)->s_journal, target);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9953d80..a2d9098 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4304,6 +4304,11 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
 
 	trace_ext4_sync_fs(sb, wait);
 	flush_workqueue(sbi->dio_unwritten_wq);
+	/*
+	 * Writeback quota in non-journalled quota case - journalled quota has
+	 * no dirty dquots
+	 */
+	dquot_writeback_dquots(sb, -1);
 	if (jbd2_journal_start_commit(sbi->s_journal, &target)) {
 		if (wait)
 			jbd2_log_wait_commit(sbi->s_journal, target);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index af40287..3757a9a 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -952,6 +952,8 @@ restart:
 static int gfs2_sync_fs(struct super_block *sb, int wait)
 {
 	struct gfs2_sbd *sdp = sb->s_fs_info;
+
+	gfs2_quota_sync(sb, -1);
 	if (wait && sdp)
 		gfs2_log_flush(sdp, NULL);
 	return 0;
diff --git a/fs/jfs/super.c b/fs/jfs/super.c
index a44eff0..b9e0fca 100644
--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -602,6 +602,11 @@ static int jfs_sync_fs(struct super_block *sb, int wait)
 
 	/* log == NULL indicates read-only mount */
 	if (log) {
+		/*
+		 * Write quota structures to quota file, sync_blockdev() will
+		 * write them to disk later
+		 */
+		dquot_writeback_dquots(sb, -1);
 		jfs_flush_journal(log, wait);
 		jfs_syncpt(log, 0);
 	}
diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index 14363b9..dd7aec5 100644
--- a/fs/reiserfs/super.c
+++ b/fs/reiserfs/super.c
@@ -66,6 +66,11 @@ static int reiserfs_sync_fs(struct super_block *s, int wait)
 {
 	struct reiserfs_transaction_handle th;
 
+	/*
+	 * Writeback quota in non-journalled quota case - journalled quota has
+	 * no dirty dquots
+	 */
+	dquot_writeback_dquots(s, -1);
 	reiserfs_write_lock(s);
 	if (!journal_begin(&th, s, 1))
 		if (!journal_end_sync(&th, s, 1))
diff --git a/fs/sync.c b/fs/sync.c
index faad2d8..cda435f 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -30,9 +30,6 @@
  */
 static int __sync_filesystem(struct super_block *sb, int wait)
 {
-	if (sb->s_qcop && sb->s_qcop->quota_sync)
-		sb->s_qcop->quota_sync(sb, -1);
-
 	if (wait)
 		sync_inodes_sb(sb);
 	else
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/8] quota: Move quota syncing to ->sync_fs method
  2011-11-09 17:44 [PATCH 0/8] Cleanup and improve sync (v4) Jan Kara
@ 2011-11-09 17:45 ` Jan Kara
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2011-11-09 17:45 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Christoph Hellwig, Al Viro, Jan Kara, Theodore Ts'o,
	Joel Becker, reiserfs-devel

Since the moment writes to quota files are using block device page cache and
space for quota structures is reserved at the moment they are first accessed we
have no reason to sync quota before inode writeback. In fact this order is now
only harmful since quota information can easily change during inode writeback
(either because conversion of delayed-allocated extents or simply because of
allocation of new blocks for simple filesystems not using page_mkwrite).

So move syncing of quota information after writeback of inodes into ->sync_fs
method. This way we do not have to use ->quota_sync callback which is primarily
intended for use by quotactl syscall anyway and we get rid of calling
->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
also support legacy non-journalled quotas) and thus there are no dirty quota
structures.

CC: "Theodore Ts'o" <tytso@mit.edu>
CC: Joel Becker <jlbec@evilplan.org>
CC: reiserfs-devel@vger.kernel.org
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Dave Kleikamp <shaggy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext2/super.c     |    6 ++++++
 fs/ext3/super.c     |    5 +++++
 fs/ext4/super.c     |    5 +++++
 fs/gfs2/super.c     |    2 ++
 fs/jfs/super.c      |    5 +++++
 fs/reiserfs/super.c |    5 +++++
 fs/sync.c           |    3 ---
 7 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index bd8ac16..adf3cde 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1180,6 +1180,12 @@ static int ext2_sync_fs(struct super_block *sb, int wait)
 	struct ext2_sb_info *sbi = EXT2_SB(sb);
 	struct ext2_super_block *es = EXT2_SB(sb)->s_es;
 
+	/*
+	 * Write quota structures to quota file, sync_blockdev() will write
+	 * them to disk later
+	 */
+	dquot_writeback_dquots(sb, -1);
+
 	spin_lock(&sbi->s_lock);
 	if (es->s_state & cpu_to_le16(EXT2_VALID_FS)) {
 		ext2_debug("setting valid to 0\n");
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 922d289..b721140 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -2522,6 +2522,11 @@ static int ext3_sync_fs(struct super_block *sb, int wait)
 	tid_t target;
 
 	trace_ext3_sync_fs(sb, wait);
+	/*
+	 * Writeback quota in non-journalled quota case - journalled quota has
+	 * no dirty dquots
+	 */
+	dquot_writeback_dquots(sb, -1);
 	if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
 		if (wait)
 			log_wait_commit(EXT3_SB(sb)->s_journal, target);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9953d80..a2d9098 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4304,6 +4304,11 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
 
 	trace_ext4_sync_fs(sb, wait);
 	flush_workqueue(sbi->dio_unwritten_wq);
+	/*
+	 * Writeback quota in non-journalled quota case - journalled quota has
+	 * no dirty dquots
+	 */
+	dquot_writeback_dquots(sb, -1);
 	if (jbd2_journal_start_commit(sbi->s_journal, &target)) {
 		if (wait)
 			jbd2_log_wait_commit(sbi->s_journal, target);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index af40287..3757a9a 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -952,6 +952,8 @@ restart:
 static int gfs2_sync_fs(struct super_block *sb, int wait)
 {
 	struct gfs2_sbd *sdp = sb->s_fs_info;
+
+	gfs2_quota_sync(sb, -1);
 	if (wait && sdp)
 		gfs2_log_flush(sdp, NULL);
 	return 0;
diff --git a/fs/jfs/super.c b/fs/jfs/super.c
index a44eff0..b9e0fca 100644
--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -602,6 +602,11 @@ static int jfs_sync_fs(struct super_block *sb, int wait)
 
 	/* log == NULL indicates read-only mount */
 	if (log) {
+		/*
+		 * Write quota structures to quota file, sync_blockdev() will
+		 * write them to disk later
+		 */
+		dquot_writeback_dquots(sb, -1);
 		jfs_flush_journal(log, wait);
 		jfs_syncpt(log, 0);
 	}
diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index 14363b9..dd7aec5 100644
--- a/fs/reiserfs/super.c
+++ b/fs/reiserfs/super.c
@@ -66,6 +66,11 @@ static int reiserfs_sync_fs(struct super_block *s, int wait)
 {
 	struct reiserfs_transaction_handle th;
 
+	/*
+	 * Writeback quota in non-journalled quota case - journalled quota has
+	 * no dirty dquots
+	 */
+	dquot_writeback_dquots(s, -1);
 	reiserfs_write_lock(s);
 	if (!journal_begin(&th, s, 1))
 		if (!journal_end_sync(&th, s, 1))
diff --git a/fs/sync.c b/fs/sync.c
index faad2d8..cda435f 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -30,9 +30,6 @@
  */
 static int __sync_filesystem(struct super_block *sb, int wait)
 {
-	if (sb->s_qcop && sb->s_qcop->quota_sync)
-		sb->s_qcop->quota_sync(sb, -1);
-
 	if (wait)
 		sync_inodes_sb(sb);
 	else
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-07-20  2:21 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-03 14:45 [PATCH 0/8 v4] Flush all block devices on sync(2) and cleanup the code Jan Kara
2012-07-03 14:45 ` [PATCH 1/8] vfs: Move noop_backing_dev_info check from sync into writeback Jan Kara
2012-07-03 14:45 ` [PATCH 2/8] quota: Split dquot_quota_sync() to writeback and cache flushing part Jan Kara
2012-07-03 14:45 ` [PATCH 3/8] quota: Move quota syncing to ->sync_fs method Jan Kara
2012-07-03 14:45 ` [PATCH 4/8] vfs: Reorder operations during sys_sync Jan Kara
2012-07-03 14:45 ` [PATCH 5/8] vfs: Create function for iterating over block devices Jan Kara
2012-07-03 14:45 ` [PATCH 6/8] vfs: Make sys_sync writeout also block device inodes Jan Kara
2012-07-17  6:27   ` Fengguang Wu
2012-07-20  2:21     ` Fengguang Wu
2012-07-03 14:45 ` [PATCH 7/8] vfs: Remove unnecessary flushing of block devices Jan Kara
2012-07-03 14:45 ` [PATCH 8/8] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes Jan Kara
  -- strict thread matches above, loose matches on Subject: below --
2012-01-05 23:46 [PATCH 0/8 RESEND] Cleanup and improve sync (v4) Jan Kara
2012-01-05 23:46 ` [PATCH 3/8] quota: Move quota syncing to ->sync_fs method Jan Kara
2011-11-09 17:44 [PATCH 0/8] Cleanup and improve sync (v4) Jan Kara
2011-11-09 17:45 ` [PATCH 3/8] quota: Move quota syncing to ->sync_fs method Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).