* [PATCH 00/13] Allow the VM to manage NFS unstable writes
@ 2010-02-10 17:03 Trond Myklebust
  2010-02-10 17:03 ` [PATCH 01/13] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2010-02-10 17:03 UTC (permalink / raw)
  To: linux-mm; +Cc: Trond Myklebust

Hi,

The following patch series applies on top of Al Viro's 'write_inode' branch
in git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git/
(which basically just adds a struct writeback_control * argument to the
superblock's 'write_inode' callback).

These patches are designed to give the VM better control over NFS
'unstable writes'. They should allow balance_dirty_pages() to manage the
unstable write page budget by giving it a way to tell the NFS client
when it needs to clear out unstable writes and, by implication, when it
can continue to cache them.

This patchset has already been posted to the linux-nfs and linux-kernel
mailing lists. I'm posting it here to get feedback from the VM community
(and possibly a few more Acks).

Apologies to those of you who have already received these patches through
the other mailing lists...

Cheers
  Trond

Peter Zijlstra (1):
  VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE

Trond Myklebust (12):
  VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices
  NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c
  NFS: Reduce the number of unnecessary COMMIT calls
  VM/NFS: The VM must tell the filesystem when to free reclaimable
    pages
  NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background
    is set
  NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has
    unstable pages
  NFS: Simplify nfs_wb_page_cancel()
  NFS: Replace __nfs_write_mapping with sync_inode()
  NFS: Simplify nfs_wb_page()
  NFS: Clean up nfs_sync_mapping
  NFS: Remove requirement for inode->i_mutex from
    nfs_invalidate_mapping
  NFS: Don't write out dirty pages in nfs_release_page()

 fs/nfs/client.c             |    1 +
 fs/nfs/dir.c                |    2 +-
 fs/nfs/file.c               |    7 ++
 fs/nfs/inode.c              |   82 ++-------------
 fs/nfs/symlink.c            |    2 +-
 fs/nfs/write.c              |  238 ++++++++++++-------------------------------
 include/linux/backing-dev.h |    9 ++-
 include/linux/nfs_fs.h      |   13 ---
 include/linux/writeback.h   |    5 +
 mm/backing-dev.c            |    6 +-
 mm/filemap.c                |    2 +-
 mm/page-writeback.c         |   30 +++++-
 mm/truncate.c               |    2 +-
 13 files changed, 130 insertions(+), 269 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [PATCH 01/13] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE
  2010-02-10 17:03 [PATCH 00/13] Allow the VM to manage NFS unstable writes Trond Myklebust
@ 2010-02-10 17:03 ` Trond Myklebust
  2010-02-10 17:03   ` [PATCH 02/13] VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2010-02-10 17:03 UTC (permalink / raw)
  To: linux-mm; +Cc: Peter Zijlstra, Trond Myklebust

From: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Jan Kara <jack@suse.cz>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/nfs/write.c              |    6 +++---
 include/linux/backing-dev.h |    3 ++-
 mm/backing-dev.c            |    6 ++++--
 mm/filemap.c                |    2 +-
 mm/page-writeback.c         |   16 ++++++++++------
 mm/truncate.c               |    2 +-
 6 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 7b54b8b..d5411e2 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -440,7 +440,7 @@ nfs_mark_request_commit(struct nfs_page *req)
 			NFS_PAGE_TAG_COMMIT);
 	spin_unlock(&inode->i_lock);
 	inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
-	inc_bdi_stat(req->wb_page->mapping->backing_dev_info, BDI_RECLAIMABLE);
+	inc_bdi_stat(req->wb_page->mapping->backing_dev_info, BDI_UNSTABLE);
 	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
 }
 
@@ -451,7 +451,7 @@ nfs_clear_request_commit(struct nfs_page *req)
 
 	if (test_and_clear_bit(PG_CLEAN, &(req)->wb_flags)) {
 		dec_zone_page_state(page, NR_UNSTABLE_NFS);
-		dec_bdi_stat(page->mapping->backing_dev_info, BDI_RECLAIMABLE);
+		dec_bdi_stat(page->mapping->backing_dev_info, BDI_UNSTABLE);
 		return 1;
 	}
 	return 0;
@@ -1322,7 +1322,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)
 		nfs_mark_request_commit(req);
 		dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
 		dec_bdi_stat(req->wb_page->mapping->backing_dev_info,
-				BDI_RECLAIMABLE);
+				BDI_UNSTABLE);
 		nfs_clear_page_tag_locked(req);
 	}
 	return -ENOMEM;
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index fcbc26a..42c3e2a 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -36,7 +36,8 @@ enum bdi_state {
 typedef int (congested_fn)(void *, int);
 
 enum bdi_stat_item {
-	BDI_RECLAIMABLE,
+	BDI_DIRTY,
+	BDI_UNSTABLE,
 	BDI_WRITEBACK,
 	NR_BDI_STAT_ITEMS
 };
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 0e8ca03..88f3655 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -88,7 +88,8 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 #define K(x) ((x) << (PAGE_SHIFT - 10))
 	seq_printf(m,
 		   "BdiWriteback:     %8lu kB\n"
-		   "BdiReclaimable:   %8lu kB\n"
+		   "BdiDirty:         %8lu kB\n"
+		   "BdiUnstable:      %8lu kB\n"
 		   "BdiDirtyThresh:   %8lu kB\n"
 		   "DirtyThresh:      %8lu kB\n"
 		   "BackgroundThresh: %8lu kB\n"
@@ -102,7 +103,8 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   "wb_list:          %8u\n"
 		   "wb_cnt:           %8u\n",
 		   (unsigned long) K(bdi_stat(bdi, BDI_WRITEBACK)),
-		   (unsigned long) K(bdi_stat(bdi, BDI_RECLAIMABLE)),
+		   (unsigned long) K(bdi_stat(bdi, BDI_DIRTY)),
+		   (unsigned long) K(bdi_stat(bdi, BDI_UNSTABLE)),
 		   K(bdi_thresh), K(dirty_thresh),
 		   K(background_thresh), nr_wb, nr_dirty, nr_io, nr_more_io,
 		   !list_empty(&bdi->bdi_list), bdi->state, bdi->wb_mask,
diff --git a/mm/filemap.c b/mm/filemap.c
index 698ea80..a016561 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -136,7 +136,7 @@ void __remove_from_page_cache(struct page *page)
 	 */
 	if (PageDirty(page) && mapping_cap_account_dirty(mapping)) {
 		dec_zone_page_state(page, NR_FILE_DIRTY);
-		dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
+		dec_bdi_stat(mapping->backing_dev_info, BDI_DIRTY);
 	}
 }
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0b19943..23d3fc6 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -272,7 +272,8 @@ static void clip_bdi_dirty_limit(struct backing_dev_info *bdi,
 	else
 		avail_dirty = 0;
 
-	avail_dirty += bdi_stat(bdi, BDI_RECLAIMABLE) +
+	avail_dirty += bdi_stat(bdi, BDI_DIRTY) +
+		bdi_stat(bdi, BDI_UNSTABLE) +
 		bdi_stat(bdi, BDI_WRITEBACK);
 
 	*pbdi_dirty = min(*pbdi_dirty, avail_dirty);
@@ -509,7 +510,8 @@ static void balance_dirty_pages(struct address_space *mapping,
 					global_page_state(NR_UNSTABLE_NFS);
 		nr_writeback = global_page_state(NR_WRITEBACK);
 
-		bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
+		bdi_nr_reclaimable = bdi_stat(bdi, BDI_DIRTY) +
+				     bdi_stat(bdi, BDI_UNSTABLE);
 		bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
 
 		if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
@@ -554,10 +556,12 @@ static void balance_dirty_pages(struct address_space *mapping,
 		 * deltas.
 		 */
 		if (bdi_thresh < 2*bdi_stat_error(bdi)) {
-			bdi_nr_reclaimable = bdi_stat_sum(bdi, BDI_RECLAIMABLE);
+			bdi_nr_reclaimable = bdi_stat_sum(bdi, BDI_DIRTY) +
+					     bdi_stat_sum(bdi, BDI_UNSTABLE);
 			bdi_nr_writeback = bdi_stat_sum(bdi, BDI_WRITEBACK);
 		} else if (bdi_nr_reclaimable) {
-			bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
+			bdi_nr_reclaimable = bdi_stat(bdi, BDI_DIRTY) +
+					     bdi_stat(bdi, BDI_UNSTABLE);
 			bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
 		}
 
@@ -1079,7 +1083,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 {
 	if (mapping_cap_account_dirty(mapping)) {
 		__inc_zone_page_state(page, NR_FILE_DIRTY);
-		__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
+		__inc_bdi_stat(mapping->backing_dev_info, BDI_DIRTY);
 		task_dirty_inc(current);
 		task_io_account_write(PAGE_CACHE_SIZE);
 	}
@@ -1255,7 +1259,7 @@ int clear_page_dirty_for_io(struct page *page)
 		if (TestClearPageDirty(page)) {
 			dec_zone_page_state(page, NR_FILE_DIRTY);
 			dec_bdi_stat(mapping->backing_dev_info,
-					BDI_RECLAIMABLE);
+					BDI_DIRTY);
 			return 1;
 		}
 		return 0;
diff --git a/mm/truncate.c b/mm/truncate.c
index e87e372..2466e0c 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -75,7 +75,7 @@ void cancel_dirty_page(struct page *page, unsigned int account_size)
 		if (mapping && mapping_cap_account_dirty(mapping)) {
 			dec_zone_page_state(page, NR_FILE_DIRTY);
 			dec_bdi_stat(mapping->backing_dev_info,
-					BDI_RECLAIMABLE);
+					BDI_DIRTY);
 			if (account_size)
 				task_io_account_cancelled_write(account_size);
 		}
-- 
1.6.6


* [PATCH 02/13] VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices
  2010-02-10 17:03 ` [PATCH 01/13] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE Trond Myklebust
@ 2010-02-10 17:03   ` Trond Myklebust
  2010-02-10 17:03     ` [PATCH 03/13] NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2010-02-10 17:03 UTC (permalink / raw)
  To: linux-mm; +Cc: Trond Myklebust

Speeds up the accounting in balance_dirty_pages() for non-nfs devices.
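
As a rough illustration of the idea (not the kernel code itself), the
capability check can be modelled in standalone C. Everything prefixed
with toy_ or TOY_ is a hypothetical stand-in for the real backing-dev
types, and the plain array stands in for the per-CPU counters behind
bdi_stat(); only the flag value 0x00000200 is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel definitions; illustration only. */
enum toy_bdi_stat_item {
	TOY_BDI_DIRTY,
	TOY_BDI_UNSTABLE,
	TOY_BDI_WRITEBACK,
	TOY_NR_BDI_STAT_ITEMS
};

/* Same bit value as BDI_CAP_ACCT_UNSTABLE in the patch. */
#define TOY_BDI_CAP_ACCT_UNSTABLE 0x00000200u

struct toy_bdi {
	unsigned int capabilities;
	long stat[TOY_NR_BDI_STAT_ITEMS];	/* assumed: simple counters */
};

/* Mirrors bdi_cap_account_unstable(): true only for devices that opted in. */
static bool toy_bdi_cap_account_unstable(const struct toy_bdi *bdi)
{
	return (bdi->capabilities & TOY_BDI_CAP_ACCT_UNSTABLE) != 0;
}

/*
 * Reclaimable pages for one device: dirty pages always count, unstable
 * pages are only read when the device accounts for them at all.
 */
static long toy_bdi_nr_reclaimable(const struct toy_bdi *bdi)
{
	long nr;

	assert(bdi != NULL);
	nr = bdi->stat[TOY_BDI_DIRTY];
	if (toy_bdi_cap_account_unstable(bdi))
		nr += bdi->stat[TOY_BDI_UNSTABLE];
	return nr;
}
```

In this model a device without the capability bit never touches the
unstable counter, which is the saving the patch is after for non-NFS
backing devices.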

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/nfs/client.c             |    1 +
 include/linux/backing-dev.h |    6 ++++++
 mm/page-writeback.c         |   16 +++++++++++-----
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index ee77713..d0b060a 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -890,6 +890,7 @@ static void nfs_server_set_fsinfo(struct nfs_server *server, struct nfs_fsinfo *
 
 	server->backing_dev_info.name = "nfs";
 	server->backing_dev_info.ra_pages = server->rpages * NFS_MAX_READAHEAD;
+	server->backing_dev_info.capabilities |= BDI_CAP_ACCT_UNSTABLE;
 
 	if (server->wsize > max_rpc_payload)
 		server->wsize = max_rpc_payload;
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 42c3e2a..8b45166 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -232,6 +232,7 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio);
 #define BDI_CAP_EXEC_MAP	0x00000040
 #define BDI_CAP_NO_ACCT_WB	0x00000080
 #define BDI_CAP_SWAP_BACKED	0x00000100
+#define BDI_CAP_ACCT_UNSTABLE	0x00000200
 
 #define BDI_CAP_VMFLAGS \
 	(BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP)
@@ -311,6 +312,11 @@ static inline bool bdi_cap_flush_forker(struct backing_dev_info *bdi)
 	return bdi == &default_backing_dev_info;
 }
 
+static inline bool bdi_cap_account_unstable(struct backing_dev_info *bdi)
+{
+	return bdi->capabilities & BDI_CAP_ACCT_UNSTABLE;
+}
+
 static inline bool mapping_cap_writeback_dirty(struct address_space *mapping)
 {
 	return bdi_cap_writeback_dirty(mapping->backing_dev_info);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 23d3fc6..c06739b 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -273,8 +273,9 @@ static void clip_bdi_dirty_limit(struct backing_dev_info *bdi,
 		avail_dirty = 0;
 
 	avail_dirty += bdi_stat(bdi, BDI_DIRTY) +
-		bdi_stat(bdi, BDI_UNSTABLE) +
 		bdi_stat(bdi, BDI_WRITEBACK);
+	if (bdi_cap_account_unstable(bdi))
+		avail_dirty += bdi_stat(bdi, BDI_UNSTABLE);
 
 	*pbdi_dirty = min(*pbdi_dirty, avail_dirty);
 }
@@ -510,8 +511,9 @@ static void balance_dirty_pages(struct address_space *mapping,
 					global_page_state(NR_UNSTABLE_NFS);
 		nr_writeback = global_page_state(NR_WRITEBACK);
 
-		bdi_nr_reclaimable = bdi_stat(bdi, BDI_DIRTY) +
-				     bdi_stat(bdi, BDI_UNSTABLE);
+		bdi_nr_reclaimable = bdi_stat(bdi, BDI_DIRTY);
+		if (bdi_cap_account_unstable(bdi))
+			bdi_nr_reclaimable += bdi_stat(bdi, BDI_UNSTABLE);
 		bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
 
 		if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
@@ -556,11 +558,15 @@ static void balance_dirty_pages(struct address_space *mapping,
 		 * deltas.
 		 */
 		if (bdi_thresh < 2*bdi_stat_error(bdi)) {
-			bdi_nr_reclaimable = bdi_stat_sum(bdi, BDI_DIRTY) +
+			bdi_nr_reclaimable = bdi_stat_sum(bdi, BDI_DIRTY);
+			if (bdi_cap_account_unstable(bdi))
+				bdi_nr_reclaimable +=
 					     bdi_stat_sum(bdi, BDI_UNSTABLE);
 			bdi_nr_writeback = bdi_stat_sum(bdi, BDI_WRITEBACK);
 		} else if (bdi_nr_reclaimable) {
-			bdi_nr_reclaimable = bdi_stat(bdi, BDI_DIRTY) +
+			bdi_nr_reclaimable = bdi_stat(bdi, BDI_DIRTY);
+			if (bdi_cap_account_unstable(bdi))
+				bdi_nr_reclaimable +=
 					     bdi_stat(bdi, BDI_UNSTABLE);
 			bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
 		}
-- 
1.6.6


* [PATCH 03/13] NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c
  2010-02-10 17:03   ` [PATCH 02/13] VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices Trond Myklebust
@ 2010-02-10 17:03     ` Trond Myklebust
  2010-02-10 17:03       ` [PATCH 04/13] NFS: Reduce the number of unnecessary COMMIT calls Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2010-02-10 17:03 UTC (permalink / raw)
  To: linux-mm; +Cc: Trond Myklebust

The sole purpose of nfs_write_inode is to commit unstable writes, so
move it into fs/nfs/write.c, and make nfs_commit_inode static.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/inode.c         |   12 ------------
 fs/nfs/write.c         |   24 +++++++++++++++++++++++-
 include/linux/nfs_fs.h |    7 -------
 3 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index df0d68e..8819ce2 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -97,18 +97,6 @@ u64 nfs_compat_user_ino64(u64 fileid)
 	return ino;
 }
 
-int nfs_write_inode(struct inode *inode, struct writeback_control *wbc)
-{
-	int ret;
-
-	ret = nfs_commit_inode(inode,
-			wbc->sync_mode == WB_SYNC_ALL ? FLUSH_SYNC : 0);
-	if (ret >= 0)
-		return 0;
-	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
-	return ret;
-}
-
 void nfs_clear_inode(struct inode *inode)
 {
 	/*
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index d5411e2..9e87612 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1391,7 +1391,7 @@ static const struct rpc_call_ops nfs_commit_ops = {
 	.rpc_release = nfs_commit_release,
 };
 
-int nfs_commit_inode(struct inode *inode, int how)
+static int nfs_commit_inode(struct inode *inode, int how)
 {
 	LIST_HEAD(head);
 	int res;
@@ -1406,13 +1406,35 @@ int nfs_commit_inode(struct inode *inode, int how)
 	}
 	return res;
 }
+
+static int nfs_commit_unstable_pages(struct inode *inode, struct writeback_control *wbc)
+{
+	int ret;
+
+	ret = nfs_commit_inode(inode,
+			wbc->sync_mode == WB_SYNC_ALL ? FLUSH_SYNC : 0);
+	if (ret >= 0)
+		return 0;
+	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
+	return ret;
+}
 #else
 static inline int nfs_commit_list(struct inode *inode, struct list_head *head, int how)
 {
 	return 0;
 }
+
+static int nfs_commit_unstable_pages(struct inode *inode, struct writeback_control *wbc)
+{
+	return 0;
+}
 #endif
 
+int nfs_write_inode(struct inode *inode, struct writeback_control *wbc)
+{
+	return nfs_commit_unstable_pages(inode, wbc);
+}
+
 long nfs_sync_mapping_wait(struct address_space *mapping, struct writeback_control *wbc, int how)
 {
 	struct inode *inode = mapping->host;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index d09db1b..384ea3e 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -483,15 +483,8 @@ extern int nfs_wb_nocommit(struct inode *inode);
 extern int nfs_wb_page(struct inode *inode, struct page* page);
 extern int nfs_wb_page_cancel(struct inode *inode, struct page* page);
 #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
-extern int  nfs_commit_inode(struct inode *, int);
 extern struct nfs_write_data *nfs_commitdata_alloc(void);
 extern void nfs_commit_free(struct nfs_write_data *wdata);
-#else
-static inline int
-nfs_commit_inode(struct inode *inode, int how)
-{
-	return 0;
-}
 #endif
 
 static inline int
-- 
1.6.6


* [PATCH 04/13] NFS: Reduce the number of unnecessary COMMIT calls
  2010-02-10 17:03     ` [PATCH 03/13] NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c Trond Myklebust
@ 2010-02-10 17:03       ` Trond Myklebust
  2010-02-10 17:03         ` [PATCH 05/13] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2010-02-10 17:03 UTC (permalink / raw)
  To: linux-mm; +Cc: Trond Myklebust

If the caller is doing a non-blocking flush, and there are still writebacks
pending on the wire, we can usually defer the COMMIT call until those
writes are done.

Also ensure that we honour the wbc->nonblocking flag.
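
The decision made in nfs_commit_unstable_pages() can be sketched as a
small standalone function. This is a simplified model under stated
assumptions: struct toy_wbc and the toy_commit_action values are
hypothetical, sync_all stands for wbc->sync_mode == WB_SYNC_ALL, and
writes_in_flight stands for the radix_tree_tagged() test on
NFS_PAGE_TAG_LOCKED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of the decision; not a kernel type. */
enum toy_commit_action {
	TOY_COMMIT_DEFER,	/* skip COMMIT, just re-mark the inode dirty */
	TOY_COMMIT_NOWAIT,	/* issue COMMIT with flags == 0 */
	TOY_COMMIT_WAIT		/* issue COMMIT with FLUSH_SYNC */
};

/* Hypothetical subset of struct writeback_control. */
struct toy_wbc {
	bool sync_all;		/* models sync_mode == WB_SYNC_ALL */
	bool nonblocking;	/* models wbc->nonblocking */
};

static enum toy_commit_action
toy_commit_decision(const struct toy_wbc *wbc, bool writes_in_flight)
{
	assert(wbc != NULL);

	/* Non-blocking flush with writes still on the wire: wait for them. */
	if (!wbc->sync_all && writes_in_flight)
		return TOY_COMMIT_DEFER;

	/* Honour the nonblocking flag by dropping FLUSH_SYNC. */
	if (wbc->nonblocking)
		return TOY_COMMIT_NOWAIT;

	return TOY_COMMIT_WAIT;
}
```

Note that in this model, as in the patch, a deferred COMMIT is not lost:
the inode is re-marked I_DIRTY_DATASYNC so a later writeback pass picks
it up again.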

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/write.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 9e87612..ed032c0 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1409,12 +1409,23 @@ static int nfs_commit_inode(struct inode *inode, int how)
 
 static int nfs_commit_unstable_pages(struct inode *inode, struct writeback_control *wbc)
 {
-	int ret;
+	int flags = FLUSH_SYNC;
+	int ret = 0;
 
-	ret = nfs_commit_inode(inode,
-			wbc->sync_mode == WB_SYNC_ALL ? FLUSH_SYNC : 0);
+	/* Don't commit yet if this is a non-blocking flush and there are
+	 * outstanding writes for this mapping.
+	 */
+	if (wbc->sync_mode != WB_SYNC_ALL &&
+	    radix_tree_tagged(&NFS_I(inode)->nfs_page_tree,
+		    NFS_PAGE_TAG_LOCKED))
+		goto out_mark_dirty;
+
+	if (wbc->nonblocking)
+		flags = 0;
+	ret = nfs_commit_inode(inode, flags);
 	if (ret >= 0)
 		return 0;
+out_mark_dirty:
 	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
 	return ret;
 }
-- 
1.6.6


* [PATCH 05/13] VM/NFS: The VM must tell the filesystem when to free reclaimable pages
  2010-02-10 17:03       ` [PATCH 04/13] NFS: Reduce the number of unnecessary COMMIT calls Trond Myklebust
@ 2010-02-10 17:03         ` Trond Myklebust
  2010-02-10 17:03           ` [PATCH 06/13] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
  2010-02-15  5:55           ` [PATCH 05/13] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Nick Piggin
  0 siblings, 2 replies; 17+ messages in thread
From: Trond Myklebust @ 2010-02-10 17:03 UTC (permalink / raw)
  To: linux-mm; +Cc: Trond Myklebust

balance_dirty_pages() should really tell the filesystem whether or not it
has an excess of actual dirty pages, or whether it would be more useful to
start freeing up the unstable writes.

Assume that if the number of unstable writes is more than 1/2 the number of
reclaimable pages, then we should force NFS to free up the former.
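
The heuristic can be written out as a standalone predicate. This is a
minimal sketch, assuming plain long counters in place of the bdi_stat()
reads; toy_force_commit_unstable() and its parameters are hypothetical
names, and the threshold test mirrors the enclosing
"bdi_nr_reclaimable > bdi_thresh" branch in balance_dirty_pages().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when the model would set wbc.force_commit_unstable:
 * the device is over its dirty threshold and more than half of its
 * reclaimable pages (dirty + unstable) are unstable writes.
 */
static bool toy_force_commit_unstable(long nr_dirty, long nr_unstable,
				      long bdi_thresh)
{
	long nr_reclaimable;

	assert(nr_dirty >= 0 && nr_unstable >= 0);
	nr_reclaimable = nr_dirty + nr_unstable;

	/* Below the threshold no writeback is kicked off at all. */
	if (nr_reclaimable <= bdi_thresh)
		return false;

	return nr_unstable > nr_reclaimable / 2;
}
```

For example, with 10 dirty and 30 unstable pages against a threshold of
20, the predicate is true (30 > 40 / 2); with 30 dirty and 10 unstable
it is false, since writing out dirty pages is the more useful step.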

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <peterz@infradead.org>
---
 fs/nfs/write.c            |    2 +-
 include/linux/writeback.h |    5 +++++
 mm/page-writeback.c       |   12 ++++++++++--
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ed032c0..2f1d9a6 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1415,7 +1415,7 @@ static int nfs_commit_unstable_pages(struct inode *inode, struct writeback_contr
 	/* Don't commit yet if this is a non-blocking flush and there are
 	 * outstanding writes for this mapping.
 	 */
-	if (wbc->sync_mode != WB_SYNC_ALL &&
+	if (!wbc->force_commit_unstable && wbc->sync_mode != WB_SYNC_ALL &&
 	    radix_tree_tagged(&NFS_I(inode)->nfs_page_tree,
 		    NFS_PAGE_TAG_LOCKED))
 		goto out_mark_dirty;
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 76e8903..8229139 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -62,6 +62,11 @@ struct writeback_control {
 	 * so we use a single control to update them
 	 */
 	unsigned no_nrwrite_index_update:1;
+	/*
+	 * The following is used by balance_dirty_pages() to
+	 * force NFS to commit unstable pages.
+	 */
+	unsigned force_commit_unstable:1;
 };
 
 /*
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index c06739b..6a0aec7 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -503,6 +503,7 @@ static void balance_dirty_pages(struct address_space *mapping,
 			.nr_to_write	= write_chunk,
 			.range_cyclic	= 1,
 		};
+		long bdi_nr_unstable = 0;
 
 		get_dirty_limits(&background_thresh, &dirty_thresh,
 				&bdi_thresh, bdi);
@@ -512,8 +513,10 @@ static void balance_dirty_pages(struct address_space *mapping,
 		nr_writeback = global_page_state(NR_WRITEBACK);
 
 		bdi_nr_reclaimable = bdi_stat(bdi, BDI_DIRTY);
-		if (bdi_cap_account_unstable(bdi))
-			bdi_nr_reclaimable += bdi_stat(bdi, BDI_UNSTABLE);
+		if (bdi_cap_account_unstable(bdi)) {
+			bdi_nr_unstable = bdi_stat(bdi, BDI_UNSTABLE);
+			bdi_nr_reclaimable += bdi_nr_unstable;
+		}
 		bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
 
 		if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
@@ -541,6 +544,11 @@ static void balance_dirty_pages(struct address_space *mapping,
 		 * up.
 		 */
 		if (bdi_nr_reclaimable > bdi_thresh) {
+			wbc.force_commit_unstable = 0;
+			/* Force NFS to also free up unstable writes. */
+			if (bdi_nr_unstable > bdi_nr_reclaimable / 2)
+				wbc.force_commit_unstable = 1;
+
 			writeback_inodes_wbc(&wbc);
 			pages_written += write_chunk - wbc.nr_to_write;
 			get_dirty_limits(&background_thresh, &dirty_thresh,
-- 
1.6.6


* [PATCH 06/13] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set
  2010-02-10 17:03         ` [PATCH 05/13] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
@ 2010-02-10 17:03           ` Trond Myklebust
  2010-02-10 17:03             ` [PATCH 07/13] NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pages Trond Myklebust
  2010-02-15  5:55           ` [PATCH 05/13] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Nick Piggin
  1 sibling, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2010-02-10 17:03 UTC (permalink / raw)
  To: linux-mm; +Cc: Trond Myklebust

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/nfs/write.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 2f1d9a6..8533a2f 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1420,7 +1420,7 @@ static int nfs_commit_unstable_pages(struct inode *inode, struct writeback_contr
 		    NFS_PAGE_TAG_LOCKED))
 		goto out_mark_dirty;
 
-	if (wbc->nonblocking)
+	if (wbc->nonblocking || wbc->for_background)
 		flags = 0;
 	ret = nfs_commit_inode(inode, flags);
 	if (ret >= 0)
-- 
1.6.6


* [PATCH 07/13] NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pages
  2010-02-10 17:03           ` [PATCH 06/13] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
@ 2010-02-10 17:03             ` Trond Myklebust
  2010-02-10 17:03               ` [PATCH 08/13] NFS: Simplify nfs_wb_page_cancel() Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2010-02-10 17:03 UTC (permalink / raw)
  To: linux-mm; +Cc: Trond Myklebust

Since nfs_scan_list() doesn't wait for locked pages, we have a race in
which it is possible to end up with an inode that needs to send a COMMIT,
but which does not have the I_DIRTY_DATASYNC flag set.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/write.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 8533a2f..e027f66 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -573,11 +573,15 @@ static int
 nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, unsigned int npages)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
+	int ret;
 
 	if (!nfs_need_commit(nfsi))
 		return 0;
 
-	return nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT);
+	ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT);
+	if (nfs_need_commit(NFS_I(inode)))
+		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
+	return ret;
 }
 #else
 static inline int nfs_need_commit(struct nfs_inode *nfsi)
-- 
1.6.6


* [PATCH 08/13] NFS: Simplify nfs_wb_page_cancel()
  2010-02-10 17:03             ` [PATCH 07/13] NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pages Trond Myklebust
@ 2010-02-10 17:03               ` Trond Myklebust
  2010-02-10 17:03                 ` [PATCH 09/13] NFS: Replace __nfs_write_mapping with sync_inode() Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2010-02-10 17:03 UTC (permalink / raw)
  To: linux-mm; +Cc: Trond Myklebust

In all cases we should be able to just remove the request and call
cancel_dirty_page().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/write.c         |   39 +--------------------------------------
 include/linux/nfs_fs.h |    2 --
 2 files changed, 1 insertions(+), 40 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index e027f66..1251555 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -539,19 +539,6 @@ static int nfs_wait_on_requests_locked(struct inode *inode, pgoff_t idx_start, u
 	return res;
 }
 
-static void nfs_cancel_commit_list(struct list_head *head)
-{
-	struct nfs_page *req;
-
-	while(!list_empty(head)) {
-		req = nfs_list_entry(head->next);
-		nfs_list_remove_request(req);
-		nfs_clear_request_commit(req);
-		nfs_inode_remove_request(req);
-		nfs_unlock_request(req);
-	}
-}
-
 #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
 static int
 nfs_need_commit(struct nfs_inode *nfsi)
@@ -1484,13 +1471,6 @@ long nfs_sync_mapping_wait(struct address_space *mapping, struct writeback_contr
 		pages = nfs_scan_commit(inode, &head, idx_start, npages);
 		if (pages == 0)
 			break;
-		if (how & FLUSH_INVALIDATE) {
-			spin_unlock(&inode->i_lock);
-			nfs_cancel_commit_list(&head);
-			ret = pages;
-			spin_lock(&inode->i_lock);
-			continue;
-		}
 		pages += nfs_scan_commit(inode, &head, 0, 0);
 		spin_unlock(&inode->i_lock);
 		ret = nfs_commit_list(inode, &head, how);
@@ -1547,26 +1527,13 @@ int nfs_wb_nocommit(struct inode *inode)
 int nfs_wb_page_cancel(struct inode *inode, struct page *page)
 {
 	struct nfs_page *req;
-	loff_t range_start = page_offset(page);
-	loff_t range_end = range_start + (loff_t)(PAGE_CACHE_SIZE - 1);
-	struct writeback_control wbc = {
-		.bdi = page->mapping->backing_dev_info,
-		.sync_mode = WB_SYNC_ALL,
-		.nr_to_write = LONG_MAX,
-		.range_start = range_start,
-		.range_end = range_end,
-	};
 	int ret = 0;
 
 	BUG_ON(!PageLocked(page));
 	for (;;) {
 		req = nfs_page_find_request(page);
 		if (req == NULL)
-			goto out;
-		if (test_bit(PG_CLEAN, &req->wb_flags)) {
-			nfs_release_request(req);
 			break;
-		}
 		if (nfs_lock_request_dontget(req)) {
 			nfs_inode_remove_request(req);
 			/*
@@ -1580,12 +1547,8 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
 		ret = nfs_wait_on_request(req);
 		nfs_release_request(req);
 		if (ret < 0)
-			goto out;
+			break;
 	}
-	if (!PagePrivate(page))
-		return 0;
-	ret = nfs_sync_mapping_wait(page->mapping, &wbc, FLUSH_INVALIDATE);
-out:
 	return ret;
 }
 
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 384ea3e..1eec414 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -34,8 +34,6 @@
 #define FLUSH_LOWPRI		8	/* low priority background flush */
 #define FLUSH_HIGHPRI		16	/* high priority memory reclaim flush */
 #define FLUSH_NOCOMMIT		32	/* Don't send the NFSv3/v4 COMMIT */
-#define FLUSH_INVALIDATE	64	/* Invalidate the page cache */
-#define FLUSH_NOWRITEPAGE	128	/* Don't call writepage() */
 
 #ifdef __KERNEL__
 
-- 
1.6.6


* [PATCH 09/13] NFS: Replace __nfs_write_mapping with sync_inode()
  2010-02-10 17:03               ` [PATCH 08/13] NFS: Simplify nfs_wb_page_cancel() Trond Myklebust
@ 2010-02-10 17:03                 ` Trond Myklebust
  2010-02-10 17:03                   ` [PATCH 10/13] NFS: Simplify nfs_wb_page() Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2010-02-10 17:03 UTC (permalink / raw)
  To: linux-mm; +Cc: Trond Myklebust

Now that we have correct COMMIT semantics in writeback_single_inode, we can
reduce and simplify nfs_wb_all(). Also replace nfs_wb_nocommit() with a
call to filemap_write_and_wait(), which doesn't need to hold the
inode->i_mutex.

With that done, we can eliminate nfs_write_mapping() altogether.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/inode.c         |   15 +++++----------
 fs/nfs/write.c         |   42 +++++-------------------------------------
 include/linux/nfs_fs.h |    2 --
 3 files changed, 10 insertions(+), 49 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 8819ce2..13fe0dc 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -495,17 +495,11 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 	int need_atime = NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATIME;
 	int err;
 
-	/*
-	 * Flush out writes to the server in order to update c/mtime.
-	 *
-	 * Hold the i_mutex to suspend application writes temporarily;
-	 * this prevents long-running writing applications from blocking
-	 * nfs_wb_nocommit.
-	 */
+	/* Flush out writes to the server in order to update c/mtime.  */
 	if (S_ISREG(inode->i_mode)) {
-		mutex_lock(&inode->i_mutex);
-		nfs_wb_nocommit(inode);
-		mutex_unlock(&inode->i_mutex);
+		err = filemap_write_and_wait(inode->i_mapping);
+		if (err)
+			goto out;
 	}
 
 	/*
@@ -529,6 +523,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 		generic_fillattr(inode, stat);
 		stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
 	}
+out:
 	return err;
 }
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 1251555..da7f0c4 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1443,7 +1443,6 @@ long nfs_sync_mapping_wait(struct address_space *mapping, struct writeback_contr
 	pgoff_t idx_start, idx_end;
 	unsigned int npages = 0;
 	LIST_HEAD(head);
-	int nocommit = how & FLUSH_NOCOMMIT;
 	long pages, ret;
 
 	/* FIXME */
@@ -1460,14 +1459,11 @@ long nfs_sync_mapping_wait(struct address_space *mapping, struct writeback_contr
 				npages = 0;
 		}
 	}
-	how &= ~FLUSH_NOCOMMIT;
 	spin_lock(&inode->i_lock);
 	do {
 		ret = nfs_wait_on_requests_locked(inode, idx_start, npages);
 		if (ret != 0)
 			continue;
-		if (nocommit)
-			break;
 		pages = nfs_scan_commit(inode, &head, idx_start, npages);
 		if (pages == 0)
 			break;
@@ -1481,47 +1477,19 @@ long nfs_sync_mapping_wait(struct address_space *mapping, struct writeback_contr
 	return ret;
 }
 
-static int __nfs_write_mapping(struct address_space *mapping, struct writeback_control *wbc, int how)
-{
-	int ret;
-
-	ret = nfs_writepages(mapping, wbc);
-	if (ret < 0)
-		goto out;
-	ret = nfs_sync_mapping_wait(mapping, wbc, how);
-	if (ret < 0)
-		goto out;
-	return 0;
-out:
-	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
-	return ret;
-}
-
-/* Two pass sync: first using WB_SYNC_NONE, then WB_SYNC_ALL */
-static int nfs_write_mapping(struct address_space *mapping, int how)
+/*
+ * flush the inode to disk.
+ */
+int nfs_wb_all(struct inode *inode)
 {
 	struct writeback_control wbc = {
-		.bdi = mapping->backing_dev_info,
 		.sync_mode = WB_SYNC_ALL,
 		.nr_to_write = LONG_MAX,
 		.range_start = 0,
 		.range_end = LLONG_MAX,
 	};
 
-	return __nfs_write_mapping(mapping, &wbc, how);
-}
-
-/*
- * flush the inode to disk.
- */
-int nfs_wb_all(struct inode *inode)
-{
-	return nfs_write_mapping(inode->i_mapping, 0);
-}
-
-int nfs_wb_nocommit(struct inode *inode)
-{
-	return nfs_write_mapping(inode->i_mapping, FLUSH_NOCOMMIT);
+	return sync_inode(inode, &wbc);
 }
 
 int nfs_wb_page_cancel(struct inode *inode, struct page *page)
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 1eec414..3383622 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -33,7 +33,6 @@
 #define FLUSH_STABLE		4	/* commit to stable storage */
 #define FLUSH_LOWPRI		8	/* low priority background flush */
 #define FLUSH_HIGHPRI		16	/* high priority memory reclaim flush */
-#define FLUSH_NOCOMMIT		32	/* Don't send the NFSv3/v4 COMMIT */
 
 #ifdef __KERNEL__
 
@@ -477,7 +476,6 @@ extern int nfs_writeback_done(struct rpc_task *, struct nfs_write_data *);
  */
 extern long nfs_sync_mapping_wait(struct address_space *, struct writeback_control *, int);
 extern int nfs_wb_all(struct inode *inode);
-extern int nfs_wb_nocommit(struct inode *inode);
 extern int nfs_wb_page(struct inode *inode, struct page* page);
 extern int nfs_wb_page_cancel(struct inode *inode, struct page* page);
 #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
-- 
1.6.6

* [PATCH 10/13] NFS: Simplify nfs_wb_page()
  2010-02-10 17:03                 ` [PATCH 09/13] NFS: Replace __nfs_write_mapping with sync_inode() Trond Myklebust
@ 2010-02-10 17:03                   ` Trond Myklebust
  2010-02-10 17:03                     ` [PATCH 11/13] NFS: Clean up nfs_sync_mapping Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2010-02-10 17:03 UTC (permalink / raw)
  To: linux-mm; +Cc: Trond Myklebust

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/write.c         |  120 +++++++++--------------------------------------
 include/linux/nfs_fs.h |    1 -
 2 files changed, 23 insertions(+), 98 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index da7f0c4..f438d55 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -501,44 +501,6 @@ int nfs_reschedule_unstable_write(struct nfs_page *req)
 }
 #endif
 
-/*
- * Wait for a request to complete.
- *
- * Interruptible by fatal signals only.
- */
-static int nfs_wait_on_requests_locked(struct inode *inode, pgoff_t idx_start, unsigned int npages)
-{
-	struct nfs_inode *nfsi = NFS_I(inode);
-	struct nfs_page *req;
-	pgoff_t idx_end, next;
-	unsigned int		res = 0;
-	int			error;
-
-	if (npages == 0)
-		idx_end = ~0;
-	else
-		idx_end = idx_start + npages - 1;
-
-	next = idx_start;
-	while (radix_tree_gang_lookup_tag(&nfsi->nfs_page_tree, (void **)&req, next, 1, NFS_PAGE_TAG_LOCKED)) {
-		if (req->wb_index > idx_end)
-			break;
-
-		next = req->wb_index + 1;
-		BUG_ON(!NFS_WBACK_BUSY(req));
-
-		kref_get(&req->wb_kref);
-		spin_unlock(&inode->i_lock);
-		error = nfs_wait_on_request(req);
-		nfs_release_request(req);
-		spin_lock(&inode->i_lock);
-		if (error < 0)
-			return error;
-		res++;
-	}
-	return res;
-}
-
 #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
 static int
 nfs_need_commit(struct nfs_inode *nfsi)
@@ -1421,7 +1383,7 @@ out_mark_dirty:
 	return ret;
 }
 #else
-static inline int nfs_commit_list(struct inode *inode, struct list_head *head, int how)
+static int nfs_commit_inode(struct inode *inode, int how)
 {
 	return 0;
 }
@@ -1437,46 +1399,6 @@ int nfs_write_inode(struct inode *inode, struct writeback_control *wbc)
 	return nfs_commit_unstable_pages(inode, wbc);
 }
 
-long nfs_sync_mapping_wait(struct address_space *mapping, struct writeback_control *wbc, int how)
-{
-	struct inode *inode = mapping->host;
-	pgoff_t idx_start, idx_end;
-	unsigned int npages = 0;
-	LIST_HEAD(head);
-	long pages, ret;
-
-	/* FIXME */
-	if (wbc->range_cyclic)
-		idx_start = 0;
-	else {
-		idx_start = wbc->range_start >> PAGE_CACHE_SHIFT;
-		idx_end = wbc->range_end >> PAGE_CACHE_SHIFT;
-		if (idx_end > idx_start) {
-			pgoff_t l_npages = 1 + idx_end - idx_start;
-			npages = l_npages;
-			if (sizeof(npages) != sizeof(l_npages) &&
-					(pgoff_t)npages != l_npages)
-				npages = 0;
-		}
-	}
-	spin_lock(&inode->i_lock);
-	do {
-		ret = nfs_wait_on_requests_locked(inode, idx_start, npages);
-		if (ret != 0)
-			continue;
-		pages = nfs_scan_commit(inode, &head, idx_start, npages);
-		if (pages == 0)
-			break;
-		pages += nfs_scan_commit(inode, &head, 0, 0);
-		spin_unlock(&inode->i_lock);
-		ret = nfs_commit_list(inode, &head, how);
-		spin_lock(&inode->i_lock);
-
-	} while (ret >= 0);
-	spin_unlock(&inode->i_lock);
-	return ret;
-}
-
 /*
  * flush the inode to disk.
  */
@@ -1520,45 +1442,49 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
 	return ret;
 }
 
-static int nfs_wb_page_priority(struct inode *inode, struct page *page,
-				int how)
+/*
+ * Write back all requests on one page - we do this before reading it.
+ */
+int nfs_wb_page(struct inode *inode, struct page *page)
 {
 	loff_t range_start = page_offset(page);
 	loff_t range_end = range_start + (loff_t)(PAGE_CACHE_SIZE - 1);
 	struct writeback_control wbc = {
-		.bdi = page->mapping->backing_dev_info,
 		.sync_mode = WB_SYNC_ALL,
-		.nr_to_write = LONG_MAX,
+		.nr_to_write = 0,
 		.range_start = range_start,
 		.range_end = range_end,
 	};
+	struct nfs_page *req;
+	int need_commit;
 	int ret;
 
-	do {
+	while(PagePrivate(page)) {
 		if (clear_page_dirty_for_io(page)) {
 			ret = nfs_writepage_locked(page, &wbc);
 			if (ret < 0)
 				goto out_error;
-		} else if (!PagePrivate(page))
+		}
+		req = nfs_find_and_lock_request(page);
+		if (!req)
 			break;
-		ret = nfs_sync_mapping_wait(page->mapping, &wbc, how);
-		if (ret < 0)
+		if (IS_ERR(req)) {
+			ret = PTR_ERR(req);
 			goto out_error;
-	} while (PagePrivate(page));
+		}
+		need_commit = test_bit(PG_CLEAN, &req->wb_flags);
+		nfs_clear_page_tag_locked(req);
+		if (need_commit) {
+			ret = nfs_commit_inode(inode, FLUSH_SYNC);
+			if (ret < 0)
+				goto out_error;
+		}
+	}
 	return 0;
 out_error:
-	__mark_inode_dirty(inode, I_DIRTY_PAGES);
 	return ret;
 }
 
-/*
- * Write back all requests on one page - we do this before reading it.
- */
-int nfs_wb_page(struct inode *inode, struct page* page)
-{
-	return nfs_wb_page_priority(inode, page, FLUSH_STABLE);
-}
-
 #ifdef CONFIG_MIGRATION
 int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
 		struct page *page)
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 3383622..b1e0877 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -474,7 +474,6 @@ extern int nfs_writeback_done(struct rpc_task *, struct nfs_write_data *);
  * Try to write back everything synchronously (but check the
  * return value!)
  */
-extern long nfs_sync_mapping_wait(struct address_space *, struct writeback_control *, int);
 extern int nfs_wb_all(struct inode *inode);
 extern int nfs_wb_page(struct inode *inode, struct page* page);
 extern int nfs_wb_page_cancel(struct inode *inode, struct page* page);
-- 
1.6.6

* [PATCH 11/13] NFS: Clean up nfs_sync_mapping
  2010-02-10 17:03                   ` [PATCH 10/13] NFS: Simplify nfs_wb_page() Trond Myklebust
@ 2010-02-10 17:03                     ` Trond Myklebust
  2010-02-10 17:03                       ` [PATCH 12/13] NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mapping Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2010-02-10 17:03 UTC (permalink / raw)
  To: linux-mm; +Cc: Trond Myklebust

Remove the redundant call to filemap_write_and_wait().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/inode.c |   16 ++++++----------
 1 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 13fe0dc..38e79e4 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -114,16 +114,12 @@ void nfs_clear_inode(struct inode *inode)
  */
 int nfs_sync_mapping(struct address_space *mapping)
 {
-	int ret;
+	int ret = 0;
 
-	if (mapping->nrpages == 0)
-		return 0;
-	unmap_mapping_range(mapping, 0, 0, 0);
-	ret = filemap_write_and_wait(mapping);
-	if (ret != 0)
-		goto out;
-	ret = nfs_wb_all(mapping->host);
-out:
+	if (mapping->nrpages != 0) {
+		unmap_mapping_range(mapping, 0, 0, 0);
+		ret = nfs_wb_all(mapping->host);
+	}
 	return ret;
 }
 
-- 
1.6.6

* [PATCH 12/13] NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mapping
  2010-02-10 17:03                     ` [PATCH 11/13] NFS: Clean up nfs_sync_mapping Trond Myklebust
@ 2010-02-10 17:03                       ` Trond Myklebust
  2010-02-10 17:03                         ` [PATCH 13/13] NFS: Don't write out dirty pages in nfs_release_page() Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2010-02-10 17:03 UTC (permalink / raw)
  To: linux-mm; +Cc: Trond Myklebust

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/dir.c           |    2 +-
 fs/nfs/inode.c         |   41 +----------------------------------------
 fs/nfs/symlink.c       |    2 +-
 include/linux/nfs_fs.h |    1 -
 4 files changed, 3 insertions(+), 43 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 3c7f03b..a1f6b44 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -560,7 +560,7 @@ static int nfs_readdir(struct file *filp, void *dirent, filldir_t filldir)
 	desc->entry = &my_entry;
 
 	nfs_block_sillyrename(dentry);
-	res = nfs_revalidate_mapping_nolock(inode, filp->f_mapping);
+	res = nfs_revalidate_mapping(inode, filp->f_mapping);
 	if (res < 0)
 		goto out;
 
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 38e79e4..f50ad09 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -754,7 +754,7 @@ int nfs_revalidate_inode(struct nfs_server *server, struct inode *inode)
 	return __nfs_revalidate_inode(server, inode);
 }
 
-static int nfs_invalidate_mapping_nolock(struct inode *inode, struct address_space *mapping)
+static int nfs_invalidate_mapping(struct inode *inode, struct address_space *mapping)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
 	
@@ -775,49 +775,10 @@ static int nfs_invalidate_mapping_nolock(struct inode *inode, struct address_spa
 	return 0;
 }
 
-static int nfs_invalidate_mapping(struct inode *inode, struct address_space *mapping)
-{
-	int ret = 0;
-
-	mutex_lock(&inode->i_mutex);
-	if (NFS_I(inode)->cache_validity & NFS_INO_INVALID_DATA) {
-		ret = nfs_sync_mapping(mapping);
-		if (ret == 0)
-			ret = nfs_invalidate_mapping_nolock(inode, mapping);
-	}
-	mutex_unlock(&inode->i_mutex);
-	return ret;
-}
-
-/**
- * nfs_revalidate_mapping_nolock - Revalidate the pagecache
- * @inode - pointer to host inode
- * @mapping - pointer to mapping
- */
-int nfs_revalidate_mapping_nolock(struct inode *inode, struct address_space *mapping)
-{
-	struct nfs_inode *nfsi = NFS_I(inode);
-	int ret = 0;
-
-	if ((nfsi->cache_validity & NFS_INO_REVAL_PAGECACHE)
-			|| nfs_attribute_timeout(inode) || NFS_STALE(inode)) {
-		ret = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
-		if (ret < 0)
-			goto out;
-	}
-	if (nfsi->cache_validity & NFS_INO_INVALID_DATA)
-		ret = nfs_invalidate_mapping_nolock(inode, mapping);
-out:
-	return ret;
-}
-
 /**
  * nfs_revalidate_mapping - Revalidate the pagecache
  * @inode - pointer to host inode
  * @mapping - pointer to mapping
- *
- * This version of the function will take the inode->i_mutex and attempt to
- * flush out all dirty data if it needs to invalidate the page cache.
  */
 int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping)
 {
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 412738d..2ea9e5c 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -50,7 +50,7 @@ static void *nfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	struct page *page;
 	void *err;
 
-	err = ERR_PTR(nfs_revalidate_mapping_nolock(inode, inode->i_mapping));
+	err = ERR_PTR(nfs_revalidate_mapping(inode, inode->i_mapping));
 	if (err)
 		goto read_failed;
 	page = read_cache_page(&inode->i_data, 0,
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index b1e0877..5af42eb 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -346,7 +346,6 @@ extern int nfs_attribute_timeout(struct inode *inode);
 extern int nfs_revalidate_inode(struct nfs_server *server, struct inode *inode);
 extern int __nfs_revalidate_inode(struct nfs_server *, struct inode *);
 extern int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping);
-extern int nfs_revalidate_mapping_nolock(struct inode *inode, struct address_space *mapping);
 extern int nfs_setattr(struct dentry *, struct iattr *);
 extern void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr);
 extern struct nfs_open_context *get_nfs_open_context(struct nfs_open_context *ctx);
-- 
1.6.6

* [PATCH 13/13] NFS: Don't write out dirty pages in nfs_release_page()
  2010-02-10 17:03                       ` [PATCH 12/13] NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mapping Trond Myklebust
@ 2010-02-10 17:03                         ` Trond Myklebust
  0 siblings, 0 replies; 17+ messages in thread
From: Trond Myklebust @ 2010-02-10 17:03 UTC (permalink / raw)
  To: linux-mm; +Cc: Trond Myklebust

Writing out dirty pages from nfs_release_page() causes too many commits
in shrink_page_list()...

Reported-by: Steve Rago <sar@nec-labs.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/file.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 63f2071..dcba521 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -486,6 +486,13 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
 {
 	dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
 
+	/* See comment in shrink_page_list(): although the VM may
+	 * call this function on a dirty page, we are not expected
+	 * to initiate writeback on it.
+	 */
+	if (PageDirty(page) || !page->mapping)
+		return 0;
+
 	if (gfp & __GFP_WAIT)
 		nfs_wb_page(page->mapping->host, page);
 	/* If PagePrivate() is set, then the page is not freeable */
-- 
1.6.6

* Re: [PATCH 05/13] VM/NFS: The VM must tell the filesystem when to free reclaimable pages
  2010-02-10 17:03         ` [PATCH 05/13] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
  2010-02-10 17:03           ` [PATCH 06/13] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
@ 2010-02-15  5:55           ` Nick Piggin
  2010-02-15 17:09             ` Trond Myklebust
  1 sibling, 1 reply; 17+ messages in thread
From: Nick Piggin @ 2010-02-15  5:55 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-mm

On Wed, Feb 10, 2010 at 12:03:25PM -0500, Trond Myklebust wrote:
> balance_dirty_pages() should really tell the filesystem whether or not it
> has an excess of actual dirty pages, or whether it would be more useful to
> start freeing up the unstable writes.
> 
> Assume that if the number of unstable writes is more than 1/2 the number of
> reclaimable pages, then we should force NFS to free up the former.
> 
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> Acked-by: Jan Kara <jack@suse.cz>
> Acked-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  fs/nfs/write.c            |    2 +-
>  include/linux/writeback.h |    5 +++++
>  mm/page-writeback.c       |   12 ++++++++++--
>  3 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index ed032c0..2f1d9a6 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1415,7 +1415,7 @@ static int nfs_commit_unstable_pages(struct inode *inode, struct writeback_contr
>  	/* Don't commit yet if this is a non-blocking flush and there are
>  	 * outstanding writes for this mapping.
>  	 */
> -	if (wbc->sync_mode != WB_SYNC_ALL &&
> +	if (!wbc->force_commit_unstable && wbc->sync_mode != WB_SYNC_ALL &&
>  	    radix_tree_tagged(&NFS_I(inode)->nfs_page_tree,
>  		    NFS_PAGE_TAG_LOCKED))
>  		goto out_mark_dirty;
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 76e8903..8229139 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -62,6 +62,11 @@ struct writeback_control {
>  	 * so we use a single control to update them
>  	 */
>  	unsigned no_nrwrite_index_update:1;
> +	/*
> +	 * The following is used by balance_dirty_pages() to
> +	 * force NFS to commit unstable pages.
> +	 */
> +	unsigned force_commit_unstable:1;
>  };
>  
>  /*
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index c06739b..6a0aec7 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -503,6 +503,7 @@ static void balance_dirty_pages(struct address_space *mapping,
>  			.nr_to_write	= write_chunk,
>  			.range_cyclic	= 1,
>  		};
> +		long bdi_nr_unstable = 0;
>  
>  		get_dirty_limits(&background_thresh, &dirty_thresh,
>  				&bdi_thresh, bdi);
> @@ -512,8 +513,10 @@ static void balance_dirty_pages(struct address_space *mapping,
>  		nr_writeback = global_page_state(NR_WRITEBACK);
>  
>  		bdi_nr_reclaimable = bdi_stat(bdi, BDI_DIRTY);
> -		if (bdi_cap_account_unstable(bdi))
> -			bdi_nr_reclaimable += bdi_stat(bdi, BDI_UNSTABLE);
> +		if (bdi_cap_account_unstable(bdi)) {
> +			bdi_nr_unstable = bdi_stat(bdi, BDI_UNSTABLE);
> +			bdi_nr_reclaimable += bdi_nr_unstable;
> +		}
>  		bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
>  
>  		if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
> @@ -541,6 +544,11 @@ static void balance_dirty_pages(struct address_space *mapping,
>  		 * up.
>  		 */
>  		if (bdi_nr_reclaimable > bdi_thresh) {
> +			wbc.force_commit_unstable = 0;
> +			/* Force NFS to also free up unstable writes. */
> +			if (bdi_nr_unstable > bdi_nr_reclaimable / 2)
> +				wbc.force_commit_unstable = 1;

This seems like it is putting NFS specific logic into the VM. OK,
we already have it because we have these unstable pages, but all
we really cared about before is that dirty+unstable ~= reclaimable.

Shouldn't NFS just work out its ratio of dirty and unstable pages
and just do the right thing in its writeback path?

* Re: [PATCH 05/13] VM/NFS: The VM must tell the filesystem when to free reclaimable pages
  2010-02-15  5:55           ` [PATCH 05/13] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Nick Piggin
@ 2010-02-15 17:09             ` Trond Myklebust
  2010-02-15 17:51               ` Nick Piggin
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2010-02-15 17:09 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-mm

On Mon, 2010-02-15 at 16:55 +1100, Nick Piggin wrote: 
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index c06739b..6a0aec7 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -503,6 +503,7 @@ static void balance_dirty_pages(struct address_space *mapping,
> >  			.nr_to_write	= write_chunk,
> >  			.range_cyclic	= 1,
> >  		};
> > +		long bdi_nr_unstable = 0;
> >  
> >  		get_dirty_limits(&background_thresh, &dirty_thresh,
> >  				&bdi_thresh, bdi);
> > @@ -512,8 +513,10 @@ static void balance_dirty_pages(struct address_space *mapping,
> >  		nr_writeback = global_page_state(NR_WRITEBACK);
> >  
> >  		bdi_nr_reclaimable = bdi_stat(bdi, BDI_DIRTY);
> > -		if (bdi_cap_account_unstable(bdi))
> > -			bdi_nr_reclaimable += bdi_stat(bdi, BDI_UNSTABLE);
> > +		if (bdi_cap_account_unstable(bdi)) {
> > +			bdi_nr_unstable = bdi_stat(bdi, BDI_UNSTABLE);
> > +			bdi_nr_reclaimable += bdi_nr_unstable;
> > +		}
> >  		bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
> >  
> >  		if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
> > @@ -541,6 +544,11 @@ static void balance_dirty_pages(struct address_space *mapping,
> >  		 * up.
> >  		 */
> >  		if (bdi_nr_reclaimable > bdi_thresh) {
> > +			wbc.force_commit_unstable = 0;
> > +			/* Force NFS to also free up unstable writes. */
> > +			if (bdi_nr_unstable > bdi_nr_reclaimable / 2)
> > +				wbc.force_commit_unstable = 1;
> 
> This seems like it is putting NFS specific logic into the VM. OK,
> we already have it because we have these unstable pages, but all
> we really cared about before is that dirty+unstable ~= reclaimable.
> 
> Shouldn't NFS just work out its ratio of dirty and unstable pages
> and just do the right thing in its writeback path?
> 

Part of the problem is that balance_dirty_pages is looking at per-bdi
statistics, whereas the NFS layer is being called back on an
inode-by-inode basis.
Doing a per-bdi calculation every time we get called back in write_inode
is possible, but can be very inefficient for workloads that involve
writing to several files in parallel. In contrast, all we're really
adding to the VM layer here is a single extra comparison.
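
For illustration, the extra comparison can be modelled in userspace C.
This is a minimal sketch, not the kernel code: struct bdi_counts and
want_force_commit() are hypothetical names, and the two counters stand in
for the bdi_stat(bdi, BDI_DIRTY) and bdi_stat(bdi, BDI_UNSTABLE) values
read in the quoted hunk.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-bdi counters read in balance_dirty_pages(). */
struct bdi_counts {
	long nr_dirty;     /* models bdi_stat(bdi, BDI_DIRTY) */
	long nr_unstable;  /* models bdi_stat(bdi, BDI_UNSTABLE) */
};

/*
 * Models the check from the quoted hunk: only once the reclaimable
 * total exceeds the bdi threshold do we ask for a forced commit, and
 * then only if unstable pages are more than half of that total.
 */
static bool want_force_commit(const struct bdi_counts *c, long bdi_thresh)
{
	long nr_reclaimable = c->nr_dirty + c->nr_unstable;

	if (nr_reclaimable <= bdi_thresh)
		return false;
	return c->nr_unstable > nr_reclaimable / 2;
}
```

With a threshold of 200 pages, 300 dirty plus 100 unstable pages would not
request a forced commit, whereas 100 dirty plus 300 unstable pages would.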

The issue here is that the VM wants to do non-blocking I/O using
WB_SYNC_NONE to write out the data. While that is well defined as far as
writeback of dirty pages is concerned, it is difficult to figure out a
strategy for handling repeated calls to write_inode().

If we send a 'commit' rpc call every time the balance_dirty_pages loop,
or the bdi_writeback thread triggers a call to write_inode(), then we
end up causing the server to sync its pagecache to disk when we've only
managed to send it a few dirty pages.
We've tried adding a heuristic in the NFS layer that says it should
issue a commit when it sees that there are no more writes in flight for
that inode. However when we do so, we see that balance_dirty_pages ends
up spinning while the last few writes are being sent off.

The point of the extra 'force_commit_unstable' knob is that it allows
the caller to tell the NFS layer that we've written out enough dirty
pages, and that as far as the VM is concerned we can best make progress
by attacking the pileup of unstable writes. As I said above, that can be
done using a single extra comparison in the balance_dirty_pages, instead
of redoing the entire calculation for each inode called by
writeback_inodes_wbc(). Furthermore, it means that the bdi_writeback
thread can continue to do opportunistic writebacks without triggering a
lot of unnecessary flushes on the server.
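
The NFS side of that decision can be sketched the same way. This is a
userspace model under stated assumptions, not the fs/nfs/write.c code:
struct wbc_model, enum sync_mode and should_defer_commit() are
hypothetical names, and writes_in_flight stands in for the
radix_tree_tagged(..., NFS_PAGE_TAG_LOCKED) test in the quoted hunk of
nfs_commit_unstable_pages().

```c
#include <stdbool.h>

/* Models WB_SYNC_NONE / WB_SYNC_ALL. */
enum sync_mode { SYNC_NONE, SYNC_ALL };

/* Hypothetical reduced view of struct writeback_control. */
struct wbc_model {
	enum sync_mode sync_mode;
	bool force_commit_unstable;
};

/*
 * Returns true when the COMMIT should be postponed and the inode left
 * dirty: the flush is non-blocking, the caller did not force a commit,
 * and writes for this inode are still outstanding.
 */
static bool should_defer_commit(const struct wbc_model *wbc,
				bool writes_in_flight)
{
	return !wbc->force_commit_unstable &&
	       wbc->sync_mode != SYNC_ALL &&
	       writes_in_flight;
}
```

In this model an opportunistic SYNC_NONE pass with writes still in flight
defers the commit, while setting force_commit_unstable or using SYNC_ALL
lets it go ahead.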

Cheers
  Trond

* Re: [PATCH 05/13] VM/NFS: The VM must tell the filesystem when to free reclaimable pages
  2010-02-15 17:09             ` Trond Myklebust
@ 2010-02-15 17:51               ` Nick Piggin
  0 siblings, 0 replies; 17+ messages in thread
From: Nick Piggin @ 2010-02-15 17:51 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-mm

On Mon, Feb 15, 2010 at 12:09:49PM -0500, Trond Myklebust wrote:
> On Mon, 2010-02-15 at 16:55 +1100, Nick Piggin wrote: 
> > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > index c06739b..6a0aec7 100644
> > > --- a/mm/page-writeback.c
> > > +++ b/mm/page-writeback.c
> > > @@ -503,6 +503,7 @@ static void balance_dirty_pages(struct address_space *mapping,
> > >  			.nr_to_write	= write_chunk,
> > >  			.range_cyclic	= 1,
> > >  		};
> > > +		long bdi_nr_unstable = 0;
> > >  
> > >  		get_dirty_limits(&background_thresh, &dirty_thresh,
> > >  				&bdi_thresh, bdi);
> > > @@ -512,8 +513,10 @@ static void balance_dirty_pages(struct address_space *mapping,
> > >  		nr_writeback = global_page_state(NR_WRITEBACK);
> > >  
> > >  		bdi_nr_reclaimable = bdi_stat(bdi, BDI_DIRTY);
> > > -		if (bdi_cap_account_unstable(bdi))
> > > -			bdi_nr_reclaimable += bdi_stat(bdi, BDI_UNSTABLE);
> > > +		if (bdi_cap_account_unstable(bdi)) {
> > > +			bdi_nr_unstable = bdi_stat(bdi, BDI_UNSTABLE);
> > > +			bdi_nr_reclaimable += bdi_nr_unstable;
> > > +		}
> > >  		bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
> > >  
> > >  		if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
> > > @@ -541,6 +544,11 @@ static void balance_dirty_pages(struct address_space *mapping,
> > >  		 * up.
> > >  		 */
> > >  		if (bdi_nr_reclaimable > bdi_thresh) {
> > > +			wbc.force_commit_unstable = 0;
> > > +			/* Force NFS to also free up unstable writes. */
> > > +			if (bdi_nr_unstable > bdi_nr_reclaimable / 2)
> > > +				wbc.force_commit_unstable = 1;
> > 
> > This seems like it is putting NFS specific logic into the VM. OK,
> > we already have it because we have these unstable pages, but all
> > we really cared about before is that dirty+unstable ~= reclaimable.
> > 
> > Shouldn't NFS just work out its ratio of dirty and unstable pages
> > and just do the right thing in its writeback path?
> > 
> 
> Part of the problem is that balance_dirty_pages is looking at per-bdi
> statistics, whereas the NFS layer is being called back on an
> inode-by-inode basis.
> Doing a per-bdi calculation every time we get called back in write_inode
> is possible, but can be very inefficient for workloads that involve

Well you can just cache the unstable number in the wbc?


> writing to several files in parallel. In contrast, all we're really
> adding to the VM layer here is a single extra comparison.

It's not the cost of it that I care about. Obviously it's quite
trivial. It's just that it is nicer if the VM doesn't know anything
beyond "must call the filesystem in order to make these pages
reclaimable". The bdi_nr_unstable > bdi_nr_reclaimable / 2
calculation doesn't belong in the VM.

Yes it's a pretty minor thing to pick on, but things should go
where they belong.
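
One way to read the suggestion above is that the VM would only publish
the counters and the filesystem would apply the policy. A minimal
userspace sketch, assuming hypothetical fields nr_reclaimable and
nr_unstable in a wbc-like struct (neither exists in struct
writeback_control in this series, and fs_wants_commit() is an
illustrative name):

```c
#include <stdbool.h>

/* Hypothetical: counters the VM would cache in the wbc before calling the fs. */
struct wbc_cached {
	long nr_reclaimable;  /* dirty + unstable pages for this bdi */
	long nr_unstable;     /* unstable pages for this bdi */
};

/*
 * Filesystem-side policy: the same "more than half" ratio as the
 * patch, but evaluated by the filesystem instead of by the VM.
 */
static bool fs_wants_commit(const struct wbc_cached *wbc)
{
	return wbc->nr_unstable > wbc->nr_reclaimable / 2;
}
```

The comparison itself is unchanged; only the place where it lives moves
from mm/page-writeback.c into the filesystem's write_inode() path.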

 
> The issue here is that the VM wants to do non-blocking I/O using
> WB_SYNC_NONE to write out the data. While that is well defined as far as
> writeback of dirty pages is concerned, it is difficult to figure out a
> strategy for handling repeated calls to write_inode().
> 
> If we send a 'commit' rpc call every time the balance_dirty_pages loop,
> or the bdi_writeback thread triggers a call to write_inode(), then we
> end up causing the server to sync its pagecache to disk when we've only
> managed to send it a few dirty pages.
> We've tried adding a heuristic in the NFS layer that says it should
> issue a commit when it sees that there are no more writes in flight for
> that inode. However when we do so, we see that balance_dirty_pages ends
> up spinning while the last few writes are being sent off.
> 
> The point of the extra 'force_commit_unstable' knob is that it allows
> the caller to tell the NFS layer that we've written out enough dirty
> pages, and that as far as the VM is concerned we can best make progress
> by attacking the pileup of unstable writes. As I said above, that can be
> done using a single extra comparison in the balance_dirty_pages, instead
> of redoing the entire calculation for each inode called by
> writeback_inodes_wbc(). Furthermore, it means that the bdi_writeback
> thread can continue to do opportunistic writebacks without triggering a
> lot of unnecessary flushes on the server.




Thread overview: 17+ messages
2010-02-10 17:03 [PATCH 00/13] Allow the VM to manage NFS unstable writes Trond Myklebust
2010-02-10 17:03 ` [PATCH 01/13] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE Trond Myklebust
2010-02-10 17:03   ` [PATCH 02/13] VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices Trond Myklebust
2010-02-10 17:03     ` [PATCH 03/13] NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c Trond Myklebust
2010-02-10 17:03       ` [PATCH 04/13] NFS: Reduce the number of unnecessary COMMIT calls Trond Myklebust
2010-02-10 17:03         ` [PATCH 05/13] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
2010-02-10 17:03           ` [PATCH 06/13] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
2010-02-10 17:03             ` [PATCH 07/13] NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pages Trond Myklebust
2010-02-10 17:03               ` [PATCH 08/13] NFS: Simplify nfs_wb_page_cancel() Trond Myklebust
2010-02-10 17:03                 ` [PATCH 09/13] NFS: Replace __nfs_write_mapping with sync_inode() Trond Myklebust
2010-02-10 17:03                   ` [PATCH 10/13] NFS: Simplify nfs_wb_page() Trond Myklebust
2010-02-10 17:03                     ` [PATCH 11/13] NFS: Clean up nfs_sync_mapping Trond Myklebust
2010-02-10 17:03                       ` [PATCH 12/13] NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mapping Trond Myklebust
2010-02-10 17:03                         ` [PATCH 13/13] NFS: Don't write out dirty pages in nfs_release_page() Trond Myklebust
2010-02-15  5:55           ` [PATCH 05/13] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Nick Piggin
2010-02-15 17:09             ` Trond Myklebust
2010-02-15 17:51               ` Nick Piggin
