[PATCH 0/4] [RFC] transfer vmscan pageout works to the flusher thread

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/4] [RFC] transfer vmscan pageout works to the flusher thread
@ 2010-09-13 12:31 Wu Fengguang
  2010-09-13 12:31 ` [PATCH 1/4] writeback: integrated background work Wu Fengguang
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Wu Fengguang @ 2010-09-13 12:31 UTC (permalink / raw)
  To: linux-mm
  Cc: Mel Gorman, Rik van Riel, Johannes Weiner, Minchan Kim,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Jan Kara, Andrew Morton,
	Wu Fengguang

Hi,

Some rather early work, for integration with Mel's patches :) 

Thanks,
Fengguang


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/4] writeback: integrated background work
  2010-09-13 12:31 [PATCH 0/4] [RFC] transfer vmscan pageout works to the flusher thread Wu Fengguang
@ 2010-09-13 12:31 ` Wu Fengguang
  2010-09-13 22:46   ` Minchan Kim
  2010-09-14 12:45   ` Jan Kara
  2010-09-13 12:31 ` [PATCH 2/4] writeback: quit background/periodic work when other works are enqueued Wu Fengguang
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 30+ messages in thread
From: Wu Fengguang @ 2010-09-13 12:31 UTC (permalink / raw)
  To: linux-mm
  Cc: Mel Gorman, Wu Fengguang, Rik van Riel, Johannes Weiner,
	Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Jan Kara,
	Andrew Morton

[-- Attachment #1: writeback-integrated-background-work.patch --]
[-- Type: text/plain, Size: 1482 bytes --]

Check background work whenever the flusher thread wakes up.  The page
reclaim code may lower the soft dirty limit immediately before sending
some work to the flusher thread.

This is also the prerequisite of next patch.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

--- linux-next.orig/fs/fs-writeback.c	2010-09-13 19:41:21.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2010-09-13 19:49:11.000000000 +0800
@@ -716,6 +716,23 @@ get_next_work_item(struct backing_dev_in
 	return work;
 }
 
+static long wb_check_background_flush(struct bdi_writeback *wb)
+{
+	if (over_bground_thresh()) {
+
+		struct wb_writeback_work work = {
+			.nr_pages	= LONG_MAX,
+			.sync_mode	= WB_SYNC_NONE,
+			.for_background	= 1,
+			.range_cyclic	= 1,
+		};
+
+		return wb_writeback(wb, &work);
+	}
+
+	return 0;
+}
+
 static long wb_check_old_data_flush(struct bdi_writeback *wb)
 {
 	unsigned long expired;
@@ -787,6 +804,7 @@ long wb_do_writeback(struct bdi_writebac
 	 * Check for periodic writeback, kupdated() style
 	 */
 	wrote += wb_check_old_data_flush(wb);
+	wrote += wb_check_background_flush(wb);
 	clear_bit(BDI_writeback_running, &wb->bdi->state);
 
 	return wrote;


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 2/4] writeback: quit background/periodic work when other works are enqueued
  2010-09-13 12:31 [PATCH 0/4] [RFC] transfer vmscan pageout works to the flusher thread Wu Fengguang
  2010-09-13 12:31 ` [PATCH 1/4] writeback: integrated background work Wu Fengguang
@ 2010-09-13 12:31 ` Wu Fengguang
  2010-09-14 12:40   ` Jan Kara
  2010-09-13 12:31 ` [PATCH 3/4] writeback: introduce bdi_start_inode_writeback() Wu Fengguang
  2010-09-13 12:31 ` [PATCH 4/4] vmscan: transfer async file writeback to the flusher Wu Fengguang
  3 siblings, 1 reply; 30+ messages in thread
From: Wu Fengguang @ 2010-09-13 12:31 UTC (permalink / raw)
  To: linux-mm
  Cc: Mel Gorman, Jan Kara, Wu Fengguang, Rik van Riel,
	Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Andrew Morton

[-- Attachment #1: mutt-wfg-t61-1000-16461-3447aada517764116f --]
[-- Type: text/plain, Size: 1403 bytes --]

 From: Jan Kara <jack@suse.cz>

Background writeback and kupdate-style writeback are easily livelockable
(from a definition of their target). This is inconvenient because it can
make sync(1) stall forever waiting on its queued work to be finished.
Fix the problem by interrupting background and kupdate writeback if there
is some other work to do. We can return to them after completing all the
queued work.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- linux-next.orig/fs/fs-writeback.c	2010-09-13 13:58:47.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2010-09-13 14:03:54.000000000 +0800
@@ -643,6 +643,14 @@ static long wb_writeback(struct bdi_writ
 			break;
 
 		/*
+		 * Background writeout and kupdate-style writeback are
+		 * easily livelockable. Stop them if there is other work
+		 * to do so that e.g. sync can proceed.
+		 */
+		if ((work->for_background || work->for_kupdate) &&
+		    !list_empty(&wb->bdi->work_list))
+			break;
+		/*
 		 * For background writeout, stop when we are below the
 		 * background dirty threshold
 		 */


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 3/4] writeback: introduce bdi_start_inode_writeback()
  2010-09-13 12:31 [PATCH 0/4] [RFC] transfer vmscan pageout works to the flusher thread Wu Fengguang
  2010-09-13 12:31 ` [PATCH 1/4] writeback: integrated background work Wu Fengguang
  2010-09-13 12:31 ` [PATCH 2/4] writeback: quit background/periodic work when other works are enqueued Wu Fengguang
@ 2010-09-13 12:31 ` Wu Fengguang
  2010-09-14 13:36   ` Jan Kara
  2010-09-13 12:31 ` [PATCH 4/4] vmscan: transfer async file writeback to the flusher Wu Fengguang
  3 siblings, 1 reply; 30+ messages in thread
From: Wu Fengguang @ 2010-09-13 12:31 UTC (permalink / raw)
  To: linux-mm
  Cc: Mel Gorman, Wu Fengguang, Rik van Riel, Johannes Weiner,
	Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Jan Kara,
	Andrew Morton

[-- Attachment #1: writeback-bdi_start_inode_writeback.patch --]
[-- Type: text/plain, Size: 6084 bytes --]

This is to transfer dirty pages encountered in page reclaim to the
flusher threads for writeback.

The flusher will piggy back more dirty pages for IO
- it's more IO efficient
- it helps clean more pages, a good number of them may sit in the same
  LRU list that is being scanned.

To avoid memory allocations at page reclaim, a mempool is created.

Background/periodic works will quit automatically, so as to clean the
pages under reclaim ASAP. However the sync work can still block us for
long time.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c           |  103 +++++++++++++++++++++++++++++++++-
 include/linux/backing-dev.h |    2 
 2 files changed, 102 insertions(+), 3 deletions(-)

--- linux-next.orig/fs/fs-writeback.c	2010-09-13 20:06:09.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2010-09-13 20:24:03.000000000 +0800
@@ -30,11 +30,20 @@
 #include "internal.h"
 
 /*
+ * When flushing an inode page (for page reclaim), try to piggy back more
+ * nearby pages for IO efficiency. These pages will have good opportunity
+ * to be in the same LRU list.
+ */
+#define WRITE_AROUND_PAGES	(1UL << (20 - PAGE_CACHE_SHIFT))
+
+/*
  * Passed into wb_writeback(), essentially a subset of writeback_control
  */
 struct wb_writeback_work {
 	long nr_pages;
 	struct super_block *sb;
+	struct inode *inode;
+	pgoff_t offset;
 	enum writeback_sync_modes sync_mode;
 	unsigned int for_kupdate:1;
 	unsigned int range_cyclic:1;
@@ -59,6 +68,27 @@ struct wb_writeback_work {
  */
 int nr_pdflush_threads;
 
+static mempool_t *wb_work_mempool;
+
+static void *wb_work_alloc(gfp_t gfp_mask, void *pool_data)
+{
+	/*
+	 * bdi_start_inode_writeback() may be called on page reclaim
+	 */
+	if (current->flags & PF_MEMALLOC)
+		return NULL;
+
+	return kmalloc(sizeof(struct wb_writeback_work), gfp_mask);
+}
+
+static __init int wb_work_init(void)
+{
+	wb_work_mempool = mempool_create(1024,
+					 wb_work_alloc, mempool_kfree, NULL);
+	return wb_work_mempool ? 0 : -ENOMEM;
+}
+fs_initcall(wb_work_init);
+
 /**
  * writeback_in_progress - determine whether there is writeback in progress
  * @bdi: the device's backing_dev_info structure.
@@ -101,7 +131,7 @@ __bdi_start_writeback(struct backing_dev
 	 * This is WB_SYNC_NONE writeback, so if allocation fails just
 	 * wakeup the thread for old dirty data writeback
 	 */
-	work = kzalloc(sizeof(*work), GFP_ATOMIC);
+	work = mempool_alloc(wb_work_mempool, GFP_NOWAIT);
 	if (!work) {
 		if (bdi->wb.task) {
 			trace_writeback_nowork(bdi);
@@ -110,6 +140,7 @@ __bdi_start_writeback(struct backing_dev
 		return;
 	}
 
+	memset(work, 0, sizeof(*work));
 	work->sync_mode	= WB_SYNC_NONE;
 	work->nr_pages	= nr_pages;
 	work->range_cyclic = range_cyclic;
@@ -148,6 +179,55 @@ void bdi_start_background_writeback(stru
 	__bdi_start_writeback(bdi, LONG_MAX, true, true);
 }
 
+int bdi_start_inode_writeback(struct backing_dev_info *bdi,
+			      struct inode *inode, pgoff_t offset)
+{
+	struct wb_writeback_work *work;
+
+	spin_lock_bh(&bdi->wb_lock);
+	list_for_each_entry_reverse(work, &bdi->work_list, list) {
+		unsigned long end;
+		if (work->inode != inode)
+			continue;
+		end = work->offset + work->nr_pages;
+		if (work->offset - offset < WRITE_AROUND_PAGES) {
+			work->nr_pages += work->offset - offset;
+			work->offset = offset;
+			inode = NULL;
+			break;
+		} else if (offset - end < WRITE_AROUND_PAGES) {
+			work->nr_pages += offset - end;
+			inode = NULL;
+			break;
+		} else if (offset > work->offset &&
+			   offset < end) {
+			inode = NULL;
+			break;
+		}
+	}
+	spin_unlock_bh(&bdi->wb_lock);
+
+	if (!inode)
+		return 0;
+
+	if (!igrab(inode))
+		return -ENOENT;
+
+	work = mempool_alloc(wb_work_mempool, GFP_NOWAIT);
+	if (!work)
+		return -ENOMEM;
+
+	memset(work, 0, sizeof(*work));
+	work->sync_mode		= WB_SYNC_NONE;
+	work->inode		= inode;
+	work->offset		= offset;
+	work->nr_pages		= 1;
+
+	bdi_queue_work(inode->i_sb->s_bdi, work);
+
+	return 0;
+}
+
 /*
  * Redirty an inode: set its when-it-was dirtied timestamp and move it to the
  * furthest end of its superblock's dirty-inode list.
@@ -724,6 +804,20 @@ get_next_work_item(struct backing_dev_in
 	return work;
 }
 
+static long wb_flush_inode(struct bdi_writeback *wb,
+			   struct wb_writeback_work *work)
+{
+	pgoff_t start = round_down(work->offset, WRITE_AROUND_PAGES);
+	pgoff_t end = round_up(work->offset + work->nr_pages,
+			       WRITE_AROUND_PAGES);
+	int wrote;
+
+	wrote = __filemap_fdatawrite_range(work->inode->i_mapping,
+					   start, end, WB_SYNC_NONE);
+	iput(work->inode);
+	return wrote;
+}
+
 static long wb_check_background_flush(struct bdi_writeback *wb)
 {
 	if (over_bground_thresh()) {
@@ -796,7 +890,10 @@ long wb_do_writeback(struct bdi_writebac
 
 		trace_writeback_exec(bdi, work);
 
-		wrote += wb_writeback(wb, work);
+		if (work->inode)
+			wrote += wb_flush_inode(wb, work);
+		else
+			wrote += wb_writeback(wb, work);
 
 		/*
 		 * Notify the caller of completion if this is a synchronous
@@ -805,7 +902,7 @@ long wb_do_writeback(struct bdi_writebac
 		if (work->done)
 			complete(work->done);
 		else
-			kfree(work);
+			mempool_free(work, wb_work_mempool);
 	}
 
 	/*
--- linux-next.orig/include/linux/backing-dev.h	2010-09-13 19:48:16.000000000 +0800
+++ linux-next/include/linux/backing-dev.h	2010-09-13 20:20:59.000000000 +0800
@@ -106,6 +106,8 @@ void bdi_unregister(struct backing_dev_i
 int bdi_setup_and_register(struct backing_dev_info *, char *, unsigned int);
 void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages);
 void bdi_start_background_writeback(struct backing_dev_info *bdi);
+int bdi_start_inode_writeback(struct backing_dev_info *bdi,
+			      struct inode *inode, pgoff_t offset);
 int bdi_writeback_thread(void *data);
 int bdi_has_dirty_io(struct backing_dev_info *bdi);
 void bdi_arm_supers_timer(void);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 4/4] vmscan: transfer async file writeback to the flusher
  2010-09-13 12:31 [PATCH 0/4] [RFC] transfer vmscan pageout works to the flusher thread Wu Fengguang
                   ` (2 preceding siblings ...)
  2010-09-13 12:31 ` [PATCH 3/4] writeback: introduce bdi_start_inode_writeback() Wu Fengguang
@ 2010-09-13 12:31 ` Wu Fengguang
  3 siblings, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2010-09-13 12:31 UTC (permalink / raw)
  To: linux-mm
  Cc: Mel Gorman, Wu Fengguang, Rik van Riel, Johannes Weiner,
	Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Jan Kara,
	Andrew Morton

[-- Attachment #1: vmscan-writeback-inode-page.patch --]
[-- Type: text/plain, Size: 2241 bytes --]

This relays all ASYNC file writeback IOs to the flusher threads.
The lesser SYNC pageout()s will work as before (as a last resort).

It's a minimal prototype implementation and barely runs without panic.
It potentially requires lots of more work to go stable.

TODO: avoid OOM if the LRU list is small and/or the storage is slow, so
that the flusher cannot clean enough pages before the LRU is full
scanned.  One simple way could be to do waits on dirty/writeback pages
when priority < 3 for even order 0 allocations.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/page-writeback.c |    2 +-
 mm/vmscan.c         |   13 +++++++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)

--- linux-next.orig/mm/vmscan.c	2010-09-13 19:48:16.000000000 +0800
+++ linux-next/mm/vmscan.c	2010-09-13 20:14:42.000000000 +0800
@@ -756,6 +756,19 @@ static unsigned long shrink_page_list(st
 				}
 			}
 
+			if (page_is_file_cache(page) && mapping &&
+			    sync_writeback == PAGEOUT_IO_ASYNC) {
+				if (!bdi_start_inode_writeback(
+					mapping->backing_dev_info,
+					mapping->host, page_index(page))) {
+					SetPageReclaim(page);
+					goto keep_locked;
+				} else if (!current_is_kswapd() &&
+					   printk_ratelimit()) {
+					printk(KERN_INFO "cannot pageout\n");
+				}
+			}
+
 			if (references == PAGEREF_RECLAIM_CLEAN)
 				goto keep_locked;
 			if (!may_enter_fs)
--- linux-next.orig/mm/page-writeback.c	2010-09-13 19:48:16.000000000 +0800
+++ linux-next/mm/page-writeback.c	2010-09-13 20:06:26.000000000 +0800
@@ -1232,6 +1232,7 @@ int set_page_dirty(struct page *page)
 {
 	struct address_space *mapping = page_mapping(page);
 
+	ClearPageReclaim(page);
 	if (likely(mapping)) {
 		int (*spd)(struct page *) = mapping->a_ops->set_page_dirty;
 #ifdef CONFIG_BLOCK
@@ -1289,7 +1290,6 @@ int clear_page_dirty_for_io(struct page 
 
 	BUG_ON(!PageLocked(page));
 
-	ClearPageReclaim(page);
 	if (mapping && mapping_cap_account_dirty(mapping)) {
 		/*
 		 * Yes, Virginia, this is indeed insane.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] writeback: integrated background work
  2010-09-13 12:31 ` [PATCH 1/4] writeback: integrated background work Wu Fengguang
@ 2010-09-13 22:46   ` Minchan Kim
  2010-09-14  0:53     ` Wu Fengguang
  2010-09-14 12:45   ` Jan Kara
  1 sibling, 1 reply; 30+ messages in thread
From: Minchan Kim @ 2010-09-13 22:46 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: linux-mm, Mel Gorman, Rik van Riel, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Jan Kara, Andrew Morton

Hi Wu,

On Mon, Sep 13, 2010 at 9:31 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> Check background work whenever the flusher thread wakes up.  The page
> reclaim code may lower the soft dirty limit immediately before sending
> some work to the flusher thread.

I looked over this series. First impression is the approach is good. :)
But let me have a question.
I can't find things about soft dirty limit.
Maybe it's a thing based on your another patch series.
But at least, could you explain it in this series if it is really
related to this series?

>
> This is also the prerequisite of next patch.

I can't understand why is the prerequisite of next patch.
Please specify it.

>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>




-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] writeback: integrated background work
  2010-09-13 22:46   ` Minchan Kim
@ 2010-09-14  0:53     ` Wu Fengguang
  0 siblings, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2010-09-14  0:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, Mel Gorman, Rik van Riel, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Jan Kara, Andrew Morton

On Tue, Sep 14, 2010 at 06:46:01AM +0800, Minchan Kim wrote:
> Hi Wu,
> 
> On Mon, Sep 13, 2010 at 9:31 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > Check background work whenever the flusher thread wakes up. A The page
> > reclaim code may lower the soft dirty limit immediately before sending
> > some work to the flusher thread.
> 
> I looked over this series. First impression is the approach is good. :)

Thanks :)

> But let me have a question.
> I can't find things about soft dirty limit.
> Maybe it's a thing based on your another patch series.

Yes, it's in the series "[RFC] soft and dynamic dirty throttling
limits", in particular the patch "[PATCH 15/17] mm: lower soft dirty
limits on memory pressure". https://patchwork.kernel.org/patch/173232/

> But at least, could you explain it in this series if it is really
> related to this series?

The above patch with URL has a chunk:

@@ -745,6 +745,16 @@  static unsigned long shrink_page_list(st
                }
 
                if (PageDirty(page)) {
+
+                       if (file && scanning_global_lru(sc)) {
+                               int dp = VM_DIRTY_PRESSURE >>
+                                       (DEF_PRIORITY + 1 - sc->priority);
+                               if (vm_dirty_pressure > dp) {
+                                       vm_dirty_pressure = dp;
+                                       vm_dirty_pressure_node = numa_node_id();
+                               }
+                       }
+

which lowers the soft dirty limits. It could explicitly check and
start background work, however doesn't do so because this patchset
will take care of it, by adding this chunk immediately after the above
code:

@@ -756,6 +756,19 @@ static unsigned long shrink_page_list(st
                                }
                        }

+                       if (page_is_file_cache(page) && mapping &&
+                           sync_writeback == PAGEOUT_IO_ASYNC) {
+                               if (!bdi_start_inode_writeback(
+                                       mapping->backing_dev_info,
+                                       mapping->host, page_index(page))) {
+                                       SetPageReclaim(page);
+                                       goto keep_locked;
+                               } else if (!current_is_kswapd() &&
+                                          printk_ratelimit()) {
+                                       printk(KERN_INFO "cannot pageout\n");
+                               }
+                       }
+                       

The bdi_start_inode_writeback() will wake up the flusher thread, which
will then check for background writeback (behavior added by this patch).

> >
> > This is also the prerequisite of next patch.
> 
> I can't understand why is the prerequisite of next patch.
> Please specify it.

Because the next patch breaks out of the background work (to serve
other works first). The code in this patch will resume the background
work when other works are done.

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] writeback: quit background/periodic work when other works are enqueued
  2010-09-13 12:31 ` [PATCH 2/4] writeback: quit background/periodic work when other works are enqueued Wu Fengguang
@ 2010-09-14 12:40   ` Jan Kara
  2010-11-01 12:07     ` Wu Fengguang
  2010-11-01 12:14       ` Wu Fengguang
  0 siblings, 2 replies; 30+ messages in thread
From: Jan Kara @ 2010-09-14 12:40 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: linux-mm, Mel Gorman, Jan Kara, Rik van Riel, Johannes Weiner,
	Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 2097 bytes --]

  Hi,

On Mon 13-09-10 20:31:12, Wu Fengguang wrote:
>  From: Jan Kara <jack@suse.cz>
> 
> Background writeback and kupdate-style writeback are easily livelockable
> (from a definition of their target). This is inconvenient because it can
> make sync(1) stall forever waiting on its queued work to be finished.
> Fix the problem by interrupting background and kupdate writeback if there
> is some other work to do. We can return to them after completing all the
> queued work.
  I actually have a slightly updated version with a better changelog:

Background writeback are easily livelockable (from a definition of their
target). This is inconvenient because it can make sync(1) stall forever waiting
on its queued work to be finished. Generally, when a flusher thread has
some work queued, someone submitted the work to achieve a goal more specific
than what background writeback does. So it makes sense to give it a priority
over a generic page cleaning.

Thus we interrupt background writeback if there is some other work to do. We
return to the background writeback after completing all the queued work.

  Could you please update it? Thanks.
								Honza

PS: I've also attached the full patch if that's more convenient for you.

> Signed-off-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  fs/fs-writeback.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> --- linux-next.orig/fs/fs-writeback.c	2010-09-13 13:58:47.000000000 +0800
> +++ linux-next/fs/fs-writeback.c	2010-09-13 14:03:54.000000000 +0800
> @@ -643,6 +643,14 @@ static long wb_writeback(struct bdi_writ
>  			break;
>  
>  		/*
> +		 * Background writeout and kupdate-style writeback are
> +		 * easily livelockable. Stop them if there is other work
> +		 * to do so that e.g. sync can proceed.
> +		 */
> +		if ((work->for_background || work->for_kupdate) &&
> +		    !list_empty(&wb->bdi->work_list))
> +			break;
> +		/*
>  		 * For background writeout, stop when we are below the
>  		 * background dirty threshold
>  		 */
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0002-mm-Stop-background-writeback-if-there-is-other-work-.patch --]
[-- Type: text/x-patch, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] writeback: integrated background work
  2010-09-13 12:31 ` [PATCH 1/4] writeback: integrated background work Wu Fengguang
  2010-09-13 22:46   ` Minchan Kim
@ 2010-09-14 12:45   ` Jan Kara
  1 sibling, 0 replies; 30+ messages in thread
From: Jan Kara @ 2010-09-14 12:45 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: linux-mm, Mel Gorman, Rik van Riel, Johannes Weiner, Minchan Kim,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Jan Kara, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1710 bytes --]

On Mon 13-09-10 20:31:11, Wu Fengguang wrote:
> Check background work whenever the flusher thread wakes up.  The page
> reclaim code may lower the soft dirty limit immediately before sending
> some work to the flusher thread.
> 
> This is also the prerequisite of next patch.
  I have a patch doing something functionally rather similar but it also
cleans up the code which isn't necessary after this patch. So could you
maybe consider using that one?
  BTW: What has happened with your patch which started writing back old
inodes?

								Honza
> 
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  fs/fs-writeback.c |   18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> --- linux-next.orig/fs/fs-writeback.c	2010-09-13 19:41:21.000000000 +0800
> +++ linux-next/fs/fs-writeback.c	2010-09-13 19:49:11.000000000 +0800
> @@ -716,6 +716,23 @@ get_next_work_item(struct backing_dev_in
>  	return work;
>  }
>  
> +static long wb_check_background_flush(struct bdi_writeback *wb)
> +{
> +	if (over_bground_thresh()) {
> +
> +		struct wb_writeback_work work = {
> +			.nr_pages	= LONG_MAX,
> +			.sync_mode	= WB_SYNC_NONE,
> +			.for_background	= 1,
> +			.range_cyclic	= 1,
> +		};
> +
> +		return wb_writeback(wb, &work);
> +	}
> +
> +	return 0;
> +}
> +
>  static long wb_check_old_data_flush(struct bdi_writeback *wb)
>  {
>  	unsigned long expired;
> @@ -787,6 +804,7 @@ long wb_do_writeback(struct bdi_writebac
>  	 * Check for periodic writeback, kupdated() style
>  	 */
>  	wrote += wb_check_old_data_flush(wb);
> +	wrote += wb_check_background_flush(wb);
>  	clear_bit(BDI_writeback_running, &wb->bdi->state);
>  
>  	return wrote;
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-mm-Check-whether-background-writeback-is-needed-afte.patch --]
[-- Type: text/x-patch, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/4] writeback: introduce bdi_start_inode_writeback()
  2010-09-13 12:31 ` [PATCH 3/4] writeback: introduce bdi_start_inode_writeback() Wu Fengguang
@ 2010-09-14 13:36   ` Jan Kara
  2010-11-01 12:35     ` Wu Fengguang
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2010-09-14 13:36 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: linux-mm, Mel Gorman, Rik van Riel, Johannes Weiner, Minchan Kim,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Jan Kara, Andrew Morton

On Mon 13-09-10 20:31:13, Wu Fengguang wrote:
> This is to transfer dirty pages encountered in page reclaim to the
> flusher threads for writeback.
> 
> The flusher will piggy back more dirty pages for IO
> - it's more IO efficient
> - it helps clean more pages, a good number of them may sit in the same
>   LRU list that is being scanned.
> 
> To avoid memory allocations at page reclaim, a mempool is created.
> 
> Background/periodic works will quit automatically, so as to clean the
> pages under reclaim ASAP. However the sync work can still block us for
> long time.
> 
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  fs/fs-writeback.c           |  103 +++++++++++++++++++++++++++++++++-
>  include/linux/backing-dev.h |    2 
>  2 files changed, 102 insertions(+), 3 deletions(-)
> 
...
> +int bdi_start_inode_writeback(struct backing_dev_info *bdi,
> +			      struct inode *inode, pgoff_t offset)
> +{
> +	struct wb_writeback_work *work;
> +
> +	spin_lock_bh(&bdi->wb_lock);
> +	list_for_each_entry_reverse(work, &bdi->work_list, list) {
> +		unsigned long end;
> +		if (work->inode != inode)
> +			continue;
  Hmm, this looks rather inefficient. I can imagine the list of work items
can grow rather large on memory stressed machine and the linear scan does
not play well with that (and contention on wb_lock would make it even
worse). I'm not sure how to best handle your set of intervals... RB tree
attached to an inode is an obvious choice but it seems too expensive
(memory spent for every inode) for such a rare use. Maybe you could have
a per-bdi mapping (hash table) from ino to it's tree of intervals for
reclaim... But before going for this, probably measuring how many intervals
are we going to have under memory pressure would be good.

> +		end = work->offset + work->nr_pages;
> +		if (work->offset - offset < WRITE_AROUND_PAGES) {
       It's slightly unclear what's intended here when offset >
work->offset. Could you make that explicit?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] writeback: quit background/periodic work when other works are enqueued
  2010-09-14 12:40   ` Jan Kara
@ 2010-11-01 12:07     ` Wu Fengguang
  2010-11-01 15:08       ` Jan Kara
  2010-11-01 15:13       ` Jan Kara
  2010-11-01 12:14       ` Wu Fengguang
  1 sibling, 2 replies; 30+ messages in thread
From: Wu Fengguang @ 2010-11-01 12:07 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-mm, Mel Gorman, Rik van Riel, Johannes Weiner, Minchan Kim,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton

Hi Jan,

Sorry for the long delay!

On Tue, Sep 14, 2010 at 08:40:33PM +0800, Jan Kara wrote:
>   Hi,
> 
> On Mon 13-09-10 20:31:12, Wu Fengguang wrote:
> >  From: Jan Kara <jack@suse.cz>
> > 
> > Background writeback and kupdate-style writeback are easily livelockable
> > (from a definition of their target). This is inconvenient because it can
> > make sync(1) stall forever waiting on its queued work to be finished.
> > Fix the problem by interrupting background and kupdate writeback if there
> > is some other work to do. We can return to them after completing all the
> > queued work.
>   I actually have a slightly updated version with a better changelog:
> 
> Background writeback are easily livelockable (from a definition of their
> target). This is inconvenient because it can make sync(1) stall forever waiting
> on its queued work to be finished. Generally, when a flusher thread has
> some work queued, someone submitted the work to achieve a goal more specific
> than what background writeback does. So it makes sense to give it a priority
> over a generic page cleaning.
> 
> Thus we interrupt background writeback if there is some other work to do. We
> return to the background writeback after completing all the queued work.
> 
>   Could you please update it? Thanks.
> 								Honza
> 
> PS: I've also attached the full patch if that's more convenient for you.

You patches are more complete than mine, so let's use them. However I
do prefer to have a standalone wb_check_background_flush() that is
called _after_ wb_check_old_data_flush(). This helps make the writeout
a bit more ordered and the separation itself looks a bit more clean to
me.

Followed are the slightly updated patches. IMHO they are straightforward
fixes that could be merged before other writeback changes.

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/2] writeback: integrated background writeback work
  2010-09-14 12:40   ` Jan Kara
@ 2010-11-01 12:14       ` Wu Fengguang
  2010-11-01 12:14       ` Wu Fengguang
  1 sibling, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2010-11-01 12:14 UTC (permalink / raw)
  To: Andrew Morton, Jan Kara
  Cc: linux-mm, Mel Gorman, Rik van Riel, Johannes Weiner, Minchan Kim,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Dave Chinner,
	Christoph Hellwig, linux-fsdevel

From: Jan Kara <jack@suse.cz>

Check whether background writeback is needed after finishing each
work.

When bdi flusher thread finishes doing some work check whether any
kind of background writeback needs to be done (either because
dirty_background_ratio is exceeded or because we need to start
flushing old inodes). If so, just do background write back.

This way, bdi_start_background_writeback() just needs to wake up the
flusher thread. It will do background writeback as soon as there is no
other work.

This is a preparatory patch for the next patch which stops background
writeback as soon as there is other work to do.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |   61 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 46 insertions(+), 15 deletions(-)

--- linux-next.orig/fs/fs-writeback.c	2010-10-31 19:03:58.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2010-11-01 19:31:54.000000000 +0800
@@ -79,13 +79,9 @@ static inline struct backing_dev_info *i
 	return sb->s_bdi;
 }
 
-static void bdi_queue_work(struct backing_dev_info *bdi,
-		struct wb_writeback_work *work)
+/* Wakeup flusher thread or forker thread to fork it. Requires bdi->wb_lock. */
+static void _bdi_wakeup_flusher(struct backing_dev_info *bdi)
 {
-	trace_writeback_queue(bdi, work);
-
-	spin_lock_bh(&bdi->wb_lock);
-	list_add_tail(&work->list, &bdi->work_list);
 	if (bdi->wb.task) {
 		wake_up_process(bdi->wb.task);
 	} else {
@@ -93,15 +89,26 @@ static void bdi_queue_work(struct backin
 		 * The bdi thread isn't there, wake up the forker thread which
 		 * will create and run it.
 		 */
-		trace_writeback_nothread(bdi, work);
 		wake_up_process(default_backing_dev_info.wb.task);
 	}
+}
+
+static void bdi_queue_work(struct backing_dev_info *bdi,
+			   struct wb_writeback_work *work)
+{
+	trace_writeback_queue(bdi, work);
+
+	spin_lock_bh(&bdi->wb_lock);
+	list_add_tail(&work->list, &bdi->work_list);
+	if (!bdi->wb.task)
+		trace_writeback_nothread(bdi, work);
+	_bdi_wakeup_flusher(bdi);
 	spin_unlock_bh(&bdi->wb_lock);
 }
 
 static void
 __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
-		bool range_cyclic, bool for_background)
+		      bool range_cyclic)
 {
 	struct wb_writeback_work *work;
 
@@ -121,7 +128,6 @@ __bdi_start_writeback(struct backing_dev
 	work->sync_mode	= WB_SYNC_NONE;
 	work->nr_pages	= nr_pages;
 	work->range_cyclic = range_cyclic;
-	work->for_background = for_background;
 
 	bdi_queue_work(bdi, work);
 }
@@ -139,7 +145,7 @@ __bdi_start_writeback(struct backing_dev
  */
 void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages)
 {
-	__bdi_start_writeback(bdi, nr_pages, true, false);
+	__bdi_start_writeback(bdi, nr_pages, true);
 }
 
 /**
@@ -147,13 +153,20 @@ void bdi_start_writeback(struct backing_
  * @bdi: the backing device to write from
  *
  * Description:
- *   This does WB_SYNC_NONE background writeback. The IO is only
- *   started when this function returns, we make no guarentees on
- *   completion. Caller need not hold sb s_umount semaphore.
+ *   This makes sure WB_SYNC_NONE background writeback happens. When
+ *   this function returns, it is only guaranteed that for given BDI
+ *   some IO is happening if we are over background dirty threshold.
+ *   Caller need not hold sb s_umount semaphore.
  */
 void bdi_start_background_writeback(struct backing_dev_info *bdi)
 {
-	__bdi_start_writeback(bdi, LONG_MAX, true, true);
+	/*
+	 * We just wake up the flusher thread. It will perform background
+	 * writeback as soon as there is no other work to do.
+	 */
+	spin_lock_bh(&bdi->wb_lock);
+	_bdi_wakeup_flusher(bdi);
+	spin_unlock_bh(&bdi->wb_lock);
 }
 
 /*
@@ -724,6 +737,23 @@ get_next_work_item(struct backing_dev_in
 	return work;
 }
 
+static long wb_check_background_flush(struct bdi_writeback *wb)
+{
+	if (over_bground_thresh()) {
+
+		struct wb_writeback_work work = {
+			.nr_pages	= LONG_MAX,
+			.sync_mode	= WB_SYNC_NONE,
+			.for_background	= 1,
+			.range_cyclic	= 1,
+		};
+
+		return wb_writeback(wb, &work);
+	}
+
+	return 0;
+}
+
 static long wb_check_old_data_flush(struct bdi_writeback *wb)
 {
 	unsigned long expired;
@@ -795,6 +825,7 @@ long wb_do_writeback(struct bdi_writebac
 	 * Check for periodic writeback, kupdated() style
 	 */
 	wrote += wb_check_old_data_flush(wb);
+	wrote += wb_check_background_flush(wb);
 	clear_bit(BDI_writeback_running, &wb->bdi->state);
 
 	return wrote;
@@ -881,7 +912,7 @@ void wakeup_flusher_threads(long nr_page
 	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) {
 		if (!bdi_has_dirty_io(bdi))
 			continue;
-		__bdi_start_writeback(bdi, nr_pages, false, false);
+		__bdi_start_writeback(bdi, nr_pages, false);
 	}
 	rcu_read_unlock();
 }

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/2] writeback: integrated background writeback work
@ 2010-11-01 12:14       ` Wu Fengguang
  0 siblings, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2010-11-01 12:14 UTC (permalink / raw)
  To: Andrew Morton, Jan Kara
  Cc: linux-mm, Mel Gorman, Rik van Riel, Johannes Weiner, Minchan Kim,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Dave Chinner,
	Christoph Hellwig, linux-fsdevel

From: Jan Kara <jack@suse.cz>

Check whether background writeback is needed after finishing each
work.

When bdi flusher thread finishes doing some work check whether any
kind of background writeback needs to be done (either because
dirty_background_ratio is exceeded or because we need to start
flushing old inodes). If so, just do background write back.

This way, bdi_start_background_writeback() just needs to wake up the
flusher thread. It will do background writeback as soon as there is no
other work.

This is a preparatory patch for the next patch which stops background
writeback as soon as there is other work to do.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |   61 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 46 insertions(+), 15 deletions(-)

--- linux-next.orig/fs/fs-writeback.c	2010-10-31 19:03:58.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2010-11-01 19:31:54.000000000 +0800
@@ -79,13 +79,9 @@ static inline struct backing_dev_info *i
 	return sb->s_bdi;
 }
 
-static void bdi_queue_work(struct backing_dev_info *bdi,
-		struct wb_writeback_work *work)
+/* Wakeup flusher thread or forker thread to fork it. Requires bdi->wb_lock. */
+static void _bdi_wakeup_flusher(struct backing_dev_info *bdi)
 {
-	trace_writeback_queue(bdi, work);
-
-	spin_lock_bh(&bdi->wb_lock);
-	list_add_tail(&work->list, &bdi->work_list);
 	if (bdi->wb.task) {
 		wake_up_process(bdi->wb.task);
 	} else {
@@ -93,15 +89,26 @@ static void bdi_queue_work(struct backin
 		 * The bdi thread isn't there, wake up the forker thread which
 		 * will create and run it.
 		 */
-		trace_writeback_nothread(bdi, work);
 		wake_up_process(default_backing_dev_info.wb.task);
 	}
+}
+
+static void bdi_queue_work(struct backing_dev_info *bdi,
+			   struct wb_writeback_work *work)
+{
+	trace_writeback_queue(bdi, work);
+
+	spin_lock_bh(&bdi->wb_lock);
+	list_add_tail(&work->list, &bdi->work_list);
+	if (!bdi->wb.task)
+		trace_writeback_nothread(bdi, work);
+	_bdi_wakeup_flusher(bdi);
 	spin_unlock_bh(&bdi->wb_lock);
 }
 
 static void
 __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
-		bool range_cyclic, bool for_background)
+		      bool range_cyclic)
 {
 	struct wb_writeback_work *work;
 
@@ -121,7 +128,6 @@ __bdi_start_writeback(struct backing_dev
 	work->sync_mode	= WB_SYNC_NONE;
 	work->nr_pages	= nr_pages;
 	work->range_cyclic = range_cyclic;
-	work->for_background = for_background;
 
 	bdi_queue_work(bdi, work);
 }
@@ -139,7 +145,7 @@ __bdi_start_writeback(struct backing_dev
  */
 void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages)
 {
-	__bdi_start_writeback(bdi, nr_pages, true, false);
+	__bdi_start_writeback(bdi, nr_pages, true);
 }
 
 /**
@@ -147,13 +153,20 @@ void bdi_start_writeback(struct backing_
  * @bdi: the backing device to write from
  *
  * Description:
- *   This does WB_SYNC_NONE background writeback. The IO is only
- *   started when this function returns, we make no guarentees on
- *   completion. Caller need not hold sb s_umount semaphore.
+ *   This makes sure WB_SYNC_NONE background writeback happens. When
+ *   this function returns, it is only guaranteed that for given BDI
+ *   some IO is happening if we are over background dirty threshold.
+ *   Caller need not hold sb s_umount semaphore.
  */
 void bdi_start_background_writeback(struct backing_dev_info *bdi)
 {
-	__bdi_start_writeback(bdi, LONG_MAX, true, true);
+	/*
+	 * We just wake up the flusher thread. It will perform background
+	 * writeback as soon as there is no other work to do.
+	 */
+	spin_lock_bh(&bdi->wb_lock);
+	_bdi_wakeup_flusher(bdi);
+	spin_unlock_bh(&bdi->wb_lock);
 }
 
 /*
@@ -724,6 +737,23 @@ get_next_work_item(struct backing_dev_in
 	return work;
 }
 
+static long wb_check_background_flush(struct bdi_writeback *wb)
+{
+	if (over_bground_thresh()) {
+
+		struct wb_writeback_work work = {
+			.nr_pages	= LONG_MAX,
+			.sync_mode	= WB_SYNC_NONE,
+			.for_background	= 1,
+			.range_cyclic	= 1,
+		};
+
+		return wb_writeback(wb, &work);
+	}
+
+	return 0;
+}
+
 static long wb_check_old_data_flush(struct bdi_writeback *wb)
 {
 	unsigned long expired;
@@ -795,6 +825,7 @@ long wb_do_writeback(struct bdi_writebac
 	 * Check for periodic writeback, kupdated() style
 	 */
 	wrote += wb_check_old_data_flush(wb);
+	wrote += wb_check_background_flush(wb);
 	clear_bit(BDI_writeback_running, &wb->bdi->state);
 
 	return wrote;
@@ -881,7 +912,7 @@ void wakeup_flusher_threads(long nr_page
 	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) {
 		if (!bdi_has_dirty_io(bdi))
 			continue;
-		__bdi_start_writeback(bdi, nr_pages, false, false);
+		__bdi_start_writeback(bdi, nr_pages, false);
 	}
 	rcu_read_unlock();
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 2/2] writeback: stop background/kupdate works from livelocking other works
  2010-11-01 12:14       ` Wu Fengguang
@ 2010-11-01 12:22         ` Wu Fengguang
  -1 siblings, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2010-11-01 12:22 UTC (permalink / raw)
  To: Andrew Morton, Jan Kara
  Cc: linux-mm, Mel Gorman, Rik van Riel, Johannes Weiner, Minchan Kim,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Dave Chinner,
	Christoph Hellwig, linux-fsdevel

From: Jan Kara <jack@suse.cz>

Background writeback are easily livelockable (from a definition of their
target). This is inconvenient because it can make sync(1) stall forever waiting
on its queued work to be finished. Generally, when a flusher thread has
some work queued, someone submitted the work to achieve a goal more specific
than what background writeback does. So it makes sense to give it a priority
over a generic page cleaning.

Thus we interrupt background writeback if there is some other work to do. We
return to the background writeback after completing all the queued work.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |    9 +++++++++
 1 file changed, 9 insertions(+)

--- linux-next.orig/fs/fs-writeback.c	2010-11-01 19:50:22.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2010-11-01 19:56:54.000000000 +0800
@@ -664,6 +664,15 @@ static long wb_writeback(struct bdi_writ
 			break;
 
 		/*
+		 * Background writeout and kupdate-style writeback are
+		 * easily livelockable. Stop them if there is other work
+		 * to do so that e.g. sync can proceed.
+		 */
+		if ((work->for_background || work->for_kupdate) &&
+		    !list_empty(&wb->bdi->work_list))
+			break;
+
+		/*
 		 * For background writeout, stop when we are below the
 		 * background dirty threshold
 		 */

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 2/2] writeback: stop background/kupdate works from livelocking other works
@ 2010-11-01 12:22         ` Wu Fengguang
  0 siblings, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2010-11-01 12:22 UTC (permalink / raw)
  To: Andrew Morton, Jan Kara
  Cc: linux-mm, Mel Gorman, Rik van Riel, Johannes Weiner, Minchan Kim,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Dave Chinner,
	Christoph Hellwig, linux-fsdevel

From: Jan Kara <jack@suse.cz>

Background writeback are easily livelockable (from a definition of their
target). This is inconvenient because it can make sync(1) stall forever waiting
on its queued work to be finished. Generally, when a flusher thread has
some work queued, someone submitted the work to achieve a goal more specific
than what background writeback does. So it makes sense to give it a priority
over a generic page cleaning.

Thus we interrupt background writeback if there is some other work to do. We
return to the background writeback after completing all the queued work.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |    9 +++++++++
 1 file changed, 9 insertions(+)

--- linux-next.orig/fs/fs-writeback.c	2010-11-01 19:50:22.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2010-11-01 19:56:54.000000000 +0800
@@ -664,6 +664,15 @@ static long wb_writeback(struct bdi_writ
 			break;
 
 		/*
+		 * Background writeout and kupdate-style writeback are
+		 * easily livelockable. Stop them if there is other work
+		 * to do so that e.g. sync can proceed.
+		 */
+		if ((work->for_background || work->for_kupdate) &&
+		    !list_empty(&wb->bdi->work_list))
+			break;
+
+		/*
 		 * For background writeout, stop when we are below the
 		 * background dirty threshold
 		 */

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/4] writeback: introduce bdi_start_inode_writeback()
  2010-09-14 13:36   ` Jan Kara
@ 2010-11-01 12:35     ` Wu Fengguang
  0 siblings, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2010-11-01 12:35 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-mm, Mel Gorman, Rik van Riel, Johannes Weiner, Minchan Kim,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton

On Tue, Sep 14, 2010 at 09:36:52PM +0800, Jan Kara wrote:
> On Mon 13-09-10 20:31:13, Wu Fengguang wrote:
> > This is to transfer dirty pages encountered in page reclaim to the
> > flusher threads for writeback.
> > 
> > The flusher will piggy back more dirty pages for IO
> > - it's more IO efficient
> > - it helps clean more pages, a good number of them may sit in the same
> >   LRU list that is being scanned.
> > 
> > To avoid memory allocations at page reclaim, a mempool is created.
> > 
> > Background/periodic works will quit automatically, so as to clean the
> > pages under reclaim ASAP. However the sync work can still block us for
> > long time.
> > 
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> >  fs/fs-writeback.c           |  103 +++++++++++++++++++++++++++++++++-
> >  include/linux/backing-dev.h |    2 
> >  2 files changed, 102 insertions(+), 3 deletions(-)
> > 
> ...
> > +int bdi_start_inode_writeback(struct backing_dev_info *bdi,
> > +			      struct inode *inode, pgoff_t offset)
> > +{
> > +	struct wb_writeback_work *work;
> > +
> > +	spin_lock_bh(&bdi->wb_lock);
> > +	list_for_each_entry_reverse(work, &bdi->work_list, list) {
> > +		unsigned long end;
> > +		if (work->inode != inode)
> > +			continue;
>   Hmm, this looks rather inefficient. I can imagine the list of work items
> can grow rather large on memory stressed machine and the linear scan does
> not play well with that (and contention on wb_lock would make it even
> worse). I'm not sure how to best handle your set of intervals... RB tree
> attached to an inode is an obvious choice but it seems too expensive
> (memory spent for every inode) for such a rare use. Maybe you could have
> a per-bdi mapping (hash table) from ino to it's tree of intervals for
> reclaim... But before going for this, probably measuring how many intervals
> are we going to have under memory pressure would be good.

Good point. For now I'll just limit the search to 100 recent works.
Could be further improved later.

> > +		end = work->offset + work->nr_pages;
> > +		if (work->offset - offset < WRITE_AROUND_PAGES) {
>        It's slightly unclear what's intended here when offset >
> work->offset. Could you make that explicit?

It implicitly takes advantage of the "unsigned" compare.  When offset
 > work->offset, "work->offset - offset" will normally be a
huge _positive_ value.  I added a comment for it in the following
updated version. Is this still way too hacky?

Thanks,
Fengguang
---
Subject: vmscan: transfer async file writeback to the flusher

This is to transfer dirty pages encountered in page reclaim to the
flusher threads for writeback.

Only ASYNC pageout() is relayed to the flusher threads, the less
frequent SYNC pageout()s will work as before as a last resort.
This helps to avoid OOM when the LRU list is small and/or the storage is
slow, and the flusher cannot clean enough pages before the LRU is
full scanned. 

The flusher will piggy back more dirty pages for IO
- it's more IO efficient
- it helps clean more pages, a good number of them may sit in the same
  LRU list that is being scanned.

To avoid memory allocations at page reclaim, a mempool is created.

Background/periodic works will quit automatically (as done in another
patch), so as to clean the pages under reclaim ASAP. However for now the
sync work can still block us for long time.

Jan Kara: limit the search scope.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c           |  118 +++++++++++++++++++++++++++++++++-
 include/linux/backing-dev.h |    2 
 2 files changed, 117 insertions(+), 3 deletions(-)

--- linux-next.orig/mm/vmscan.c	2010-11-01 04:10:37.000000000 +0800
+++ linux-next/mm/vmscan.c	2010-11-01 04:49:47.000000000 +0800
@@ -752,6 +754,16 @@ static unsigned long shrink_page_list(st
 				}
 			}
 
+			if (page_is_file_cache(page) && mapping &&
+			    sync_writeback == PAGEOUT_IO_ASYNC) {
+				if (!bdi_start_inode_writeback(
+					mapping->backing_dev_info,
+					mapping->host, page_index(page))) {
+					SetPageReclaim(page);
+					goto keep_locked;
+				}
+			}
+
 			if (references == PAGEREF_RECLAIM_CLEAN)
 				goto keep_locked;
 			if (!may_enter_fs)
--- linux-next.orig/fs/fs-writeback.c	2010-10-31 19:31:32.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2010-11-01 04:32:17.000000000 +0800
@@ -30,11 +30,20 @@
 #include "internal.h"
 
 /*
+ * When flushing an inode page (for page reclaim), try to piggy back more
+ * nearby pages for IO efficiency. These pages will have good opportunity
+ * to be in the same LRU list.
+ */
+#define WRITE_AROUND_PAGES	(1UL << (20 - PAGE_CACHE_SHIFT))
+
+/*
  * Passed into wb_writeback(), essentially a subset of writeback_control
  */
 struct wb_writeback_work {
 	long nr_pages;
 	struct super_block *sb;
+	struct inode *inode;
+	pgoff_t offset;
 	enum writeback_sync_modes sync_mode;
 	unsigned int for_kupdate:1;
 	unsigned int range_cyclic:1;
@@ -57,6 +66,27 @@ struct wb_writeback_work {
  */
 int nr_pdflush_threads;
 
+static mempool_t *wb_work_mempool;
+
+static void *wb_work_alloc(gfp_t gfp_mask, void *pool_data)
+{
+	/*
+	 * bdi_start_inode_writeback() may be called on page reclaim
+	 */
+	if (current->flags & PF_MEMALLOC)
+		return NULL;
+
+	return kmalloc(sizeof(struct wb_writeback_work), gfp_mask);
+}
+
+static __init int wb_work_init(void)
+{
+	wb_work_mempool = mempool_create(1024,
+					 wb_work_alloc, mempool_kfree, NULL);
+	return wb_work_mempool ? 0 : -ENOMEM;
+}
+fs_initcall(wb_work_init);
+
 /**
  * writeback_in_progress - determine whether there is writeback in progress
  * @bdi: the device's backing_dev_info structure.
@@ -116,7 +146,7 @@ __bdi_start_writeback(struct backing_dev
 	 * This is WB_SYNC_NONE writeback, so if allocation fails just
 	 * wakeup the thread for old dirty data writeback
 	 */
-	work = kzalloc(sizeof(*work), GFP_ATOMIC);
+	work = mempool_alloc(wb_work_mempool, GFP_NOWAIT);
 	if (!work) {
 		if (bdi->wb.task) {
 			trace_writeback_nowork(bdi);
@@ -125,6 +155,7 @@ __bdi_start_writeback(struct backing_dev
 		return;
 	}
 
+	memset(work, 0, sizeof(*work));
 	work->sync_mode	= WB_SYNC_NONE;
 	work->nr_pages	= nr_pages;
 	work->range_cyclic = range_cyclic;
@@ -169,6 +200,70 @@ void bdi_start_background_writeback(stru
 	spin_unlock_bh(&bdi->wb_lock);
 }
 
+static bool try_extend_writeback_range(struct wb_writeback_work *work,
+				       pgoff_t offset)
+{
+	pgoff_t end = work->offset + work->nr_pages;
+
+	if (offset > work->offset && offset < end)
+		return true;
+
+	/* the unsigned comparison helps eliminate one compare */
+	if (work->offset - offset < WRITE_AROUND_PAGES) {
+		work->nr_pages += work->offset - offset;
+		work->offset = offset;
+		return true;
+	}
+
+	if (offset - end < WRITE_AROUND_PAGES) {
+		work->nr_pages += offset - end;
+		return true;
+	}
+
+	return false;
+}
+
+int bdi_start_inode_writeback(struct backing_dev_info *bdi,
+			      struct inode *inode, pgoff_t offset)
+{
+	struct wb_writeback_work *work;
+	int i = 0;
+
+	spin_lock_bh(&bdi->wb_lock);
+	list_for_each_entry_reverse(work, &bdi->work_list, list) {
+		unsigned long end;
+		if (work->inode != inode)
+			continue;
+		if (try_extend_writeback_range(work, offset)) {
+			inode = NULL;
+			break;
+		}
+		if (i++ > 100)	/* do limited search */
+			break;
+	}
+	spin_unlock_bh(&bdi->wb_lock);
+
+	if (!inode)
+		return 0;
+
+	if (!igrab(inode))
+		return -ENOENT;
+
+	work = mempool_alloc(wb_work_mempool, GFP_NOWAIT);
+	if (!work)
+		return -ENOMEM;
+
+	memset(work, 0, sizeof(*work));
+	work->sync_mode		= WB_SYNC_NONE;
+	work->inode		= inode;
+	work->offset		= offset;
+	work->nr_pages		= 1;
+
+	bdi_queue_work(inode->i_sb->s_bdi, work);
+
+	return 0;
+}
+
 /*
  * Redirty an inode: set its when-it-was dirtied timestamp and move it to the
  * furthest end of its superblock's dirty-inode list.
@@ -745,6 +840,20 @@ get_next_work_item(struct backing_dev_in
 	return work;
 }
 
+static long wb_flush_inode(struct bdi_writeback *wb,
+			   struct wb_writeback_work *work)
+{
+	pgoff_t start = round_down(work->offset, WRITE_AROUND_PAGES);
+	pgoff_t end = round_up(work->offset + work->nr_pages,
+			       WRITE_AROUND_PAGES);
+	int wrote;
+
+	wrote = __filemap_fdatawrite_range(work->inode->i_mapping,
+					   start, end, WB_SYNC_NONE);
+	iput(work->inode);
+	return wrote;
+}
+
 static long wb_check_background_flush(struct bdi_writeback *wb)
 {
 	if (over_bground_thresh()) {
@@ -817,7 +926,10 @@ long wb_do_writeback(struct bdi_writebac
 
 		trace_writeback_exec(bdi, work);
 
-		wrote += wb_writeback(wb, work);
+		if (work->inode)
+			wrote += wb_flush_inode(wb, work);
+		else
+			wrote += wb_writeback(wb, work);
 
 		/*
 		 * Notify the caller of completion if this is a synchronous
@@ -826,7 +938,7 @@ long wb_do_writeback(struct bdi_writebac
 		if (work->done)
 			complete(work->done);
 		else
-			kfree(work);
+			mempool_free(work, wb_work_mempool);
 	}
 
 	/*
--- linux-next.orig/include/linux/backing-dev.h	2010-10-26 11:21:17.000000000 +0800
+++ linux-next/include/linux/backing-dev.h	2010-10-31 19:32:00.000000000 +0800
@@ -107,6 +107,8 @@ void bdi_unregister(struct backing_dev_i
 int bdi_setup_and_register(struct backing_dev_info *, char *, unsigned int);
 void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages);
 void bdi_start_background_writeback(struct backing_dev_info *bdi);
+int bdi_start_inode_writeback(struct backing_dev_info *bdi,
+			      struct inode *inode, pgoff_t offset);
 int bdi_writeback_thread(void *data);
 int bdi_has_dirty_io(struct backing_dev_info *bdi);
 void bdi_arm_supers_timer(void);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] writeback: quit background/periodic work when other works are enqueued
  2010-11-01 12:07     ` Wu Fengguang
@ 2010-11-01 15:08       ` Jan Kara
  2010-11-01 15:13       ` Jan Kara
  1 sibling, 0 replies; 30+ messages in thread
From: Jan Kara @ 2010-11-01 15:08 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Jan Kara, linux-mm, Mel Gorman, Rik van Riel, Johannes Weiner,
	Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton

On Mon 01-11-10 20:07:33, Wu Fengguang wrote:
> On Tue, Sep 14, 2010 at 08:40:33PM +0800, Jan Kara wrote:
> > Background writeback are easily livelockable (from a definition of their
> > target). This is inconvenient because it can make sync(1) stall forever waiting
> > on its queued work to be finished. Generally, when a flusher thread has
> > some work queued, someone submitted the work to achieve a goal more specific
> > than what background writeback does. So it makes sense to give it a priority
> > over a generic page cleaning.
> > 
> > Thus we interrupt background writeback if there is some other work to do. We
> > return to the background writeback after completing all the queued work.
> > 
> >   Could you please update it? Thanks.
> > 								Honza
> > 
> > PS: I've also attached the full patch if that's more convenient for you.
> 
> You patches are more complete than mine, so let's use them. However I
> do prefer to have a standalone wb_check_background_flush() that is
> called _after_ wb_check_old_data_flush(). This helps make the writeout
> a bit more ordered and the separation itself looks a bit more clean to
> me.
> 
> Followed are the slightly updated patches. IMHO they are straightforward
> fixes that could be merged before other writeback changes.
  Yes, the updated patch looks OK to me. Thanks for picking the patches up.

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] writeback: quit background/periodic work when other works are enqueued
  2010-11-01 12:07     ` Wu Fengguang
  2010-11-01 15:08       ` Jan Kara
@ 2010-11-01 15:13       ` Jan Kara
  1 sibling, 0 replies; 30+ messages in thread
From: Jan Kara @ 2010-11-01 15:13 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Jan Kara, linux-mm, Mel Gorman, Rik van Riel, Johannes Weiner,
	Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton

On Mon 01-11-10 20:07:33, Wu Fengguang wrote:
> You patches are more complete than mine, so let's use them. However I
> do prefer to have a standalone wb_check_background_flush() that is
> called _after_ wb_check_old_data_flush(). This helps make the writeout
> a bit more ordered and the separation itself looks a bit more clean to
> me.
> 
> Followed are the slightly updated patches. IMHO they are straightforward
> fixes that could be merged before other writeback changes.
  One more question - who's going to merge it with Linus? Would you send
the patches to Jens' to merge them via the block tree (note that Jens has
a new email address since he joined FusionIO)? Or maybe to Andrew?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] writeback: integrated background writeback work
  2010-11-01 12:14       ` Wu Fengguang
  (?)
  (?)
@ 2010-11-01 15:21       ` Christoph Hellwig
  2010-11-01 20:37           ` Wu Fengguang
  2010-11-01 20:39           ` Wu Fengguang
  -1 siblings, 2 replies; 30+ messages in thread
From: Christoph Hellwig @ 2010-11-01 15:21 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Jan Kara, linux-mm, Mel Gorman, Rik van Riel,
	Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Dave Chinner, Christoph Hellwig, linux-fsdevel

> +static void _bdi_wakeup_flusher(struct backing_dev_info *bdi)

Remove the leading underscore, please.

>  void bdi_start_background_writeback(struct backing_dev_info *bdi)
>  {
> -	__bdi_start_writeback(bdi, LONG_MAX, true, true);
> +	/*
> +	 * We just wake up the flusher thread. It will perform background
> +	 * writeback as soon as there is no other work to do.
> +	 */
> +	spin_lock_bh(&bdi->wb_lock);
> +	_bdi_wakeup_flusher(bdi);
> +	spin_unlock_bh(&bdi->wb_lock);

We probably want a trace point here, too.

Otherwise the patch looks good to me.  Thanks for bringing it up again.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] writeback: stop background/kupdate works from livelocking other works
  2010-11-01 12:22         ` Wu Fengguang
  (?)
@ 2010-11-01 15:22         ` Christoph Hellwig
  -1 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2010-11-01 15:22 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Jan Kara, linux-mm, Mel Gorman, Rik van Riel,
	Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Dave Chinner, Christoph Hellwig, linux-fsdevel


Looks good.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] writeback: integrated background writeback work
  2010-11-01 15:21       ` [PATCH 1/2] writeback: integrated background writeback work Christoph Hellwig
@ 2010-11-01 20:37           ` Wu Fengguang
  2010-11-01 20:39           ` Wu Fengguang
  1 sibling, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2010-11-01 20:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Jan Kara, linux-mm, Mel Gorman, Rik van Riel,
	Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Dave Chinner, linux-fsdevel

On Mon, Nov 01, 2010 at 11:21:50PM +0800, Christoph Hellwig wrote:
> > +static void _bdi_wakeup_flusher(struct backing_dev_info *bdi)
> 
> Remove the leading underscore, please.

OK, makes sense. The updated patch will follow.

> >  void bdi_start_background_writeback(struct backing_dev_info *bdi)
> >  {
> > -	__bdi_start_writeback(bdi, LONG_MAX, true, true);
> > +	/*
> > +	 * We just wake up the flusher thread. It will perform background
> > +	 * writeback as soon as there is no other work to do.
> > +	 */
> > +	spin_lock_bh(&bdi->wb_lock);
> > +	_bdi_wakeup_flusher(bdi);
> > +	spin_unlock_bh(&bdi->wb_lock);
> 
> We probably want a trace point here, too.
> 
> Otherwise the patch looks good to me.  Thanks for bringing it up again.

Thanks. It's trivial to add the trace point, here is the incremental
patch.

Thanks,
Fengguang

---
writeback: trace wakeup event for background writeback

This tracks when balance_dirty_pages() tries to wakeup the flusher
thread for background writeback (if it was not started already).

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c                |    1 +
 include/trace/events/writeback.h |    1 +
 2 files changed, 2 insertions(+)

--- linux-next.orig/include/trace/events/writeback.h	2010-11-02 04:17:26.000000000 +0800
+++ linux-next/include/trace/events/writeback.h	2010-11-02 04:21:02.000000000 +0800
@@ -81,6 +81,7 @@ DEFINE_EVENT(writeback_class, name, \
 	TP_ARGS(bdi))
 
 DEFINE_WRITEBACK_EVENT(writeback_nowork);
+DEFINE_WRITEBACK_EVENT(writeback_wake_background);
 DEFINE_WRITEBACK_EVENT(writeback_wake_thread);
 DEFINE_WRITEBACK_EVENT(writeback_wake_forker_thread);
 DEFINE_WRITEBACK_EVENT(writeback_bdi_register);
--- linux-next.orig/fs/fs-writeback.c	2010-11-02 04:22:17.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2010-11-02 04:22:33.000000000 +0800
@@ -164,6 +164,7 @@ void bdi_start_background_writeback(stru
 	 * We just wake up the flusher thread. It will perform background
 	 * writeback as soon as there is no other work to do.
 	 */
+	trace_writeback_wake_background(bdi);
 	spin_lock_bh(&bdi->wb_lock);
 	bdi_wakeup_flusher(bdi);
 	spin_unlock_bh(&bdi->wb_lock);

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] writeback: integrated background writeback work
@ 2010-11-01 20:37           ` Wu Fengguang
  0 siblings, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2010-11-01 20:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Jan Kara, linux-mm, Mel Gorman, Rik van Riel,
	Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Dave Chinner, linux-fsdevel

On Mon, Nov 01, 2010 at 11:21:50PM +0800, Christoph Hellwig wrote:
> > +static void _bdi_wakeup_flusher(struct backing_dev_info *bdi)
> 
> Remove the leading underscore, please.

OK, makes sense. The updated patch will follow.

> >  void bdi_start_background_writeback(struct backing_dev_info *bdi)
> >  {
> > -	__bdi_start_writeback(bdi, LONG_MAX, true, true);
> > +	/*
> > +	 * We just wake up the flusher thread. It will perform background
> > +	 * writeback as soon as there is no other work to do.
> > +	 */
> > +	spin_lock_bh(&bdi->wb_lock);
> > +	_bdi_wakeup_flusher(bdi);
> > +	spin_unlock_bh(&bdi->wb_lock);
> 
> We probably want a trace point here, too.
> 
> Otherwise the patch looks good to me.  Thanks for bringing it up again.

Thanks. It's trivial to add the trace point, here is the incremental
patch.

Thanks,
Fengguang

---
writeback: trace wakeup event for background writeback

This tracks when balance_dirty_pages() tries to wakeup the flusher
thread for background writeback (if it was not started already).

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c                |    1 +
 include/trace/events/writeback.h |    1 +
 2 files changed, 2 insertions(+)

--- linux-next.orig/include/trace/events/writeback.h	2010-11-02 04:17:26.000000000 +0800
+++ linux-next/include/trace/events/writeback.h	2010-11-02 04:21:02.000000000 +0800
@@ -81,6 +81,7 @@ DEFINE_EVENT(writeback_class, name, \
 	TP_ARGS(bdi))
 
 DEFINE_WRITEBACK_EVENT(writeback_nowork);
+DEFINE_WRITEBACK_EVENT(writeback_wake_background);
 DEFINE_WRITEBACK_EVENT(writeback_wake_thread);
 DEFINE_WRITEBACK_EVENT(writeback_wake_forker_thread);
 DEFINE_WRITEBACK_EVENT(writeback_bdi_register);
--- linux-next.orig/fs/fs-writeback.c	2010-11-02 04:22:17.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2010-11-02 04:22:33.000000000 +0800
@@ -164,6 +164,7 @@ void bdi_start_background_writeback(stru
 	 * We just wake up the flusher thread. It will perform background
 	 * writeback as soon as there is no other work to do.
 	 */
+	trace_writeback_wake_background(bdi);
 	spin_lock_bh(&bdi->wb_lock);
 	bdi_wakeup_flusher(bdi);
 	spin_unlock_bh(&bdi->wb_lock);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/2 v2] writeback: integrated background writeback work
  2010-11-01 15:21       ` [PATCH 1/2] writeback: integrated background writeback work Christoph Hellwig
@ 2010-11-01 20:39           ` Wu Fengguang
  2010-11-01 20:39           ` Wu Fengguang
  1 sibling, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2010-11-01 20:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Jan Kara, linux-mm, Mel Gorman, Rik van Riel,
	Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Dave Chinner, linux-fsdevel

From: Jan Kara <jack@suse.cz>

Check whether background writeback is needed after finishing each work.

When bdi flusher thread finishes doing some work check whether any kind
of background writeback needs to be done (either because
dirty_background_ratio is exceeded or because we need to start flushing
old inodes). If so, just do background write back.

This way, bdi_start_background_writeback() just needs to wake up the
flusher thread. It will do background writeback as soon as there is no
other work.

This is a preparatory patch for the next patch which stops background
writeback as soon as there is other work to do.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |   61 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 46 insertions(+), 15 deletions(-)

--- linux-next.orig/fs/fs-writeback.c	2010-11-02 04:37:39.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2010-11-02 04:37:45.000000000 +0800
@@ -79,13 +79,9 @@ static inline struct backing_dev_info *i
 	return sb->s_bdi;
 }
 
-static void bdi_queue_work(struct backing_dev_info *bdi,
-		struct wb_writeback_work *work)
+/* Wakeup flusher thread or forker thread to fork it. Requires bdi->wb_lock. */
+static void bdi_wakeup_flusher(struct backing_dev_info *bdi)
 {
-	trace_writeback_queue(bdi, work);
-
-	spin_lock_bh(&bdi->wb_lock);
-	list_add_tail(&work->list, &bdi->work_list);
 	if (bdi->wb.task) {
 		wake_up_process(bdi->wb.task);
 	} else {
@@ -93,15 +89,26 @@ static void bdi_queue_work(struct backin
 		 * The bdi thread isn't there, wake up the forker thread which
 		 * will create and run it.
 		 */
-		trace_writeback_nothread(bdi, work);
 		wake_up_process(default_backing_dev_info.wb.task);
 	}
+}
+
+static void bdi_queue_work(struct backing_dev_info *bdi,
+			   struct wb_writeback_work *work)
+{
+	trace_writeback_queue(bdi, work);
+
+	spin_lock_bh(&bdi->wb_lock);
+	list_add_tail(&work->list, &bdi->work_list);
+	if (!bdi->wb.task)
+		trace_writeback_nothread(bdi, work);
+	bdi_wakeup_flusher(bdi);
 	spin_unlock_bh(&bdi->wb_lock);
 }
 
 static void
 __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
-		bool range_cyclic, bool for_background)
+		      bool range_cyclic)
 {
 	struct wb_writeback_work *work;
 
@@ -121,7 +128,6 @@ __bdi_start_writeback(struct backing_dev
 	work->sync_mode	= WB_SYNC_NONE;
 	work->nr_pages	= nr_pages;
 	work->range_cyclic = range_cyclic;
-	work->for_background = for_background;
 
 	bdi_queue_work(bdi, work);
 }
@@ -139,7 +145,7 @@ __bdi_start_writeback(struct backing_dev
  */
 void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages)
 {
-	__bdi_start_writeback(bdi, nr_pages, true, false);
+	__bdi_start_writeback(bdi, nr_pages, true);
 }
 
 /**
@@ -147,13 +153,20 @@ void bdi_start_writeback(struct backing_
  * @bdi: the backing device to write from
  *
  * Description:
- *   This does WB_SYNC_NONE background writeback. The IO is only
- *   started when this function returns, we make no guarentees on
- *   completion. Caller need not hold sb s_umount semaphore.
+ *   This makes sure WB_SYNC_NONE background writeback happens. When
+ *   this function returns, it is only guaranteed that for given BDI
+ *   some IO is happening if we are over background dirty threshold.
+ *   Caller need not hold sb s_umount semaphore.
  */
 void bdi_start_background_writeback(struct backing_dev_info *bdi)
 {
-	__bdi_start_writeback(bdi, LONG_MAX, true, true);
+	/*
+	 * We just wake up the flusher thread. It will perform background
+	 * writeback as soon as there is no other work to do.
+	 */
+	spin_lock_bh(&bdi->wb_lock);
+	bdi_wakeup_flusher(bdi);
+	spin_unlock_bh(&bdi->wb_lock);
 }
 
 /*
@@ -724,6 +737,23 @@ get_next_work_item(struct backing_dev_in
 	return work;
 }
 
+static long wb_check_background_flush(struct bdi_writeback *wb)
+{
+	if (over_bground_thresh()) {
+
+		struct wb_writeback_work work = {
+			.nr_pages	= LONG_MAX,
+			.sync_mode	= WB_SYNC_NONE,
+			.for_background	= 1,
+			.range_cyclic	= 1,
+		};
+
+		return wb_writeback(wb, &work);
+	}
+
+	return 0;
+}
+
 static long wb_check_old_data_flush(struct bdi_writeback *wb)
 {
 	unsigned long expired;
@@ -795,6 +825,7 @@ long wb_do_writeback(struct bdi_writebac
 	 * Check for periodic writeback, kupdated() style
 	 */
 	wrote += wb_check_old_data_flush(wb);
+	wrote += wb_check_background_flush(wb);
 	clear_bit(BDI_writeback_running, &wb->bdi->state);
 
 	return wrote;
@@ -881,7 +912,7 @@ void wakeup_flusher_threads(long nr_page
 	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) {
 		if (!bdi_has_dirty_io(bdi))
 			continue;
-		__bdi_start_writeback(bdi, nr_pages, false, false);
+		__bdi_start_writeback(bdi, nr_pages, false);
 	}
 	rcu_read_unlock();
 }

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/2 v2] writeback: integrated background writeback work
@ 2010-11-01 20:39           ` Wu Fengguang
  0 siblings, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2010-11-01 20:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Jan Kara, linux-mm, Mel Gorman, Rik van Riel,
	Johannes Weiner, Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Dave Chinner, linux-fsdevel

From: Jan Kara <jack@suse.cz>

Check whether background writeback is needed after finishing each work.

When bdi flusher thread finishes doing some work check whether any kind
of background writeback needs to be done (either because
dirty_background_ratio is exceeded or because we need to start flushing
old inodes). If so, just do background write back.

This way, bdi_start_background_writeback() just needs to wake up the
flusher thread. It will do background writeback as soon as there is no
other work.

This is a preparatory patch for the next patch which stops background
writeback as soon as there is other work to do.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |   61 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 46 insertions(+), 15 deletions(-)

--- linux-next.orig/fs/fs-writeback.c	2010-11-02 04:37:39.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2010-11-02 04:37:45.000000000 +0800
@@ -79,13 +79,9 @@ static inline struct backing_dev_info *i
 	return sb->s_bdi;
 }
 
-static void bdi_queue_work(struct backing_dev_info *bdi,
-		struct wb_writeback_work *work)
+/* Wakeup flusher thread or forker thread to fork it. Requires bdi->wb_lock. */
+static void bdi_wakeup_flusher(struct backing_dev_info *bdi)
 {
-	trace_writeback_queue(bdi, work);
-
-	spin_lock_bh(&bdi->wb_lock);
-	list_add_tail(&work->list, &bdi->work_list);
 	if (bdi->wb.task) {
 		wake_up_process(bdi->wb.task);
 	} else {
@@ -93,15 +89,26 @@ static void bdi_queue_work(struct backin
 		 * The bdi thread isn't there, wake up the forker thread which
 		 * will create and run it.
 		 */
-		trace_writeback_nothread(bdi, work);
 		wake_up_process(default_backing_dev_info.wb.task);
 	}
+}
+
+static void bdi_queue_work(struct backing_dev_info *bdi,
+			   struct wb_writeback_work *work)
+{
+	trace_writeback_queue(bdi, work);
+
+	spin_lock_bh(&bdi->wb_lock);
+	list_add_tail(&work->list, &bdi->work_list);
+	if (!bdi->wb.task)
+		trace_writeback_nothread(bdi, work);
+	bdi_wakeup_flusher(bdi);
 	spin_unlock_bh(&bdi->wb_lock);
 }
 
 static void
 __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
-		bool range_cyclic, bool for_background)
+		      bool range_cyclic)
 {
 	struct wb_writeback_work *work;
 
@@ -121,7 +128,6 @@ __bdi_start_writeback(struct backing_dev
 	work->sync_mode	= WB_SYNC_NONE;
 	work->nr_pages	= nr_pages;
 	work->range_cyclic = range_cyclic;
-	work->for_background = for_background;
 
 	bdi_queue_work(bdi, work);
 }
@@ -139,7 +145,7 @@ __bdi_start_writeback(struct backing_dev
  */
 void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages)
 {
-	__bdi_start_writeback(bdi, nr_pages, true, false);
+	__bdi_start_writeback(bdi, nr_pages, true);
 }
 
 /**
@@ -147,13 +153,20 @@ void bdi_start_writeback(struct backing_
  * @bdi: the backing device to write from
  *
  * Description:
- *   This does WB_SYNC_NONE background writeback. The IO is only
- *   started when this function returns, we make no guarentees on
- *   completion. Caller need not hold sb s_umount semaphore.
+ *   This makes sure WB_SYNC_NONE background writeback happens. When
+ *   this function returns, it is only guaranteed that for given BDI
+ *   some IO is happening if we are over background dirty threshold.
+ *   Caller need not hold sb s_umount semaphore.
  */
 void bdi_start_background_writeback(struct backing_dev_info *bdi)
 {
-	__bdi_start_writeback(bdi, LONG_MAX, true, true);
+	/*
+	 * We just wake up the flusher thread. It will perform background
+	 * writeback as soon as there is no other work to do.
+	 */
+	spin_lock_bh(&bdi->wb_lock);
+	bdi_wakeup_flusher(bdi);
+	spin_unlock_bh(&bdi->wb_lock);
 }
 
 /*
@@ -724,6 +737,23 @@ get_next_work_item(struct backing_dev_in
 	return work;
 }
 
+static long wb_check_background_flush(struct bdi_writeback *wb)
+{
+	if (over_bground_thresh()) {
+
+		struct wb_writeback_work work = {
+			.nr_pages	= LONG_MAX,
+			.sync_mode	= WB_SYNC_NONE,
+			.for_background	= 1,
+			.range_cyclic	= 1,
+		};
+
+		return wb_writeback(wb, &work);
+	}
+
+	return 0;
+}
+
 static long wb_check_old_data_flush(struct bdi_writeback *wb)
 {
 	unsigned long expired;
@@ -795,6 +825,7 @@ long wb_do_writeback(struct bdi_writebac
 	 * Check for periodic writeback, kupdated() style
 	 */
 	wrote += wb_check_old_data_flush(wb);
+	wrote += wb_check_background_flush(wb);
 	clear_bit(BDI_writeback_running, &wb->bdi->state);
 
 	return wrote;
@@ -881,7 +912,7 @@ void wakeup_flusher_threads(long nr_page
 	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) {
 		if (!bdi_has_dirty_io(bdi))
 			continue;
-		__bdi_start_writeback(bdi, nr_pages, false, false);
+		__bdi_start_writeback(bdi, nr_pages, false);
 	}
 	rcu_read_unlock();
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2 v2] writeback: integrated background writeback work
  2010-11-01 20:39           ` Wu Fengguang
@ 2010-11-02  1:55             ` Minchan Kim
  -1 siblings, 0 replies; 30+ messages in thread
From: Minchan Kim @ 2010-11-02  1:55 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Christoph Hellwig, Andrew Morton, Jan Kara, linux-mm, Mel Gorman,
	Rik van Riel, Johannes Weiner, KAMEZAWA Hiroyuki,
	KOSAKI Motohiro, Dave Chinner, linux-fsdevel

On Tue, Nov 2, 2010 at 5:39 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> From: Jan Kara <jack@suse.cz>
>
> Check whether background writeback is needed after finishing each work.
>
> When bdi flusher thread finishes doing some work check whether any kind
> of background writeback needs to be done (either because
> dirty_background_ratio is exceeded or because we need to start flushing
> old inodes). If so, just do background write back.
>
> This way, bdi_start_background_writeback() just needs to wake up the
> flusher thread. It will do background writeback as soon as there is no
> other work.
>
> This is a preparatory patch for the next patch which stops background
> writeback as soon as there is other work to do.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2 v2] writeback: integrated background writeback work
@ 2010-11-02  1:55             ` Minchan Kim
  0 siblings, 0 replies; 30+ messages in thread
From: Minchan Kim @ 2010-11-02  1:55 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Christoph Hellwig, Andrew Morton, Jan Kara, linux-mm, Mel Gorman,
	Rik van Riel, Johannes Weiner, KAMEZAWA Hiroyuki,
	KOSAKI Motohiro, Dave Chinner, linux-fsdevel

On Tue, Nov 2, 2010 at 5:39 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> From: Jan Kara <jack@suse.cz>
>
> Check whether background writeback is needed after finishing each work.
>
> When bdi flusher thread finishes doing some work check whether any kind
> of background writeback needs to be done (either because
> dirty_background_ratio is exceeded or because we need to start flushing
> old inodes). If so, just do background write back.
>
> This way, bdi_start_background_writeback() just needs to wake up the
> flusher thread. It will do background writeback as soon as there is no
> other work.
>
> This is a preparatory patch for the next patch which stops background
> writeback as soon as there is other work to do.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] writeback: stop background/kupdate works from livelocking other works
  2010-11-01 12:22         ` Wu Fengguang
@ 2010-11-02  1:57           ` Minchan Kim
  -1 siblings, 0 replies; 30+ messages in thread
From: Minchan Kim @ 2010-11-02  1:57 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Jan Kara, linux-mm, Mel Gorman, Rik van Riel,
	Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Dave Chinner, Christoph Hellwig, linux-fsdevel

On Mon, Nov 1, 2010 at 9:22 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> From: Jan Kara <jack@suse.cz>
>
> Background writeback are easily livelockable (from a definition of their
> target). This is inconvenient because it can make sync(1) stall forever waiting
> on its queued work to be finished. Generally, when a flusher thread has
> some work queued, someone submitted the work to achieve a goal more specific
> than what background writeback does. So it makes sense to give it a priority
> over a generic page cleaning.
>
> Thus we interrupt background writeback if there is some other work to do. We
> return to the background writeback after completing all the queued work.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] writeback: stop background/kupdate works from livelocking other works
@ 2010-11-02  1:57           ` Minchan Kim
  0 siblings, 0 replies; 30+ messages in thread
From: Minchan Kim @ 2010-11-02  1:57 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Jan Kara, linux-mm, Mel Gorman, Rik van Riel,
	Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Dave Chinner, Christoph Hellwig, linux-fsdevel

On Mon, Nov 1, 2010 at 9:22 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> From: Jan Kara <jack@suse.cz>
>
> Background writeback are easily livelockable (from a definition of their
> target). This is inconvenient because it can make sync(1) stall forever waiting
> on its queued work to be finished. Generally, when a flusher thread has
> some work queued, someone submitted the work to achieve a goal more specific
> than what background writeback does. So it makes sense to give it a priority
> over a generic page cleaning.
>
> Thus we interrupt background writeback if there is some other work to do. We
> return to the background writeback after completing all the queued work.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>


-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2 v2] writeback: integrated background writeback work
  2010-11-01 20:39           ` Wu Fengguang
  (?)
  (?)
@ 2010-11-05 12:01           ` Johannes Weiner
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Weiner @ 2010-11-05 12:01 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Christoph Hellwig, Andrew Morton, Jan Kara, linux-mm, Mel Gorman,
	Rik van Riel, Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	Dave Chinner, linux-fsdevel

On Tue, Nov 02, 2010 at 04:39:47AM +0800, Wu Fengguang wrote:
> From: Jan Kara <jack@suse.cz>
> 
> Check whether background writeback is needed after finishing each work.
> 
> When bdi flusher thread finishes doing some work check whether any kind
> of background writeback needs to be done (either because
> dirty_background_ratio is exceeded or because we need to start flushing
> old inodes). If so, just do background write back.
> 
> This way, bdi_start_background_writeback() just needs to wake up the
> flusher thread. It will do background writeback as soon as there is no
> other work.
> 
> This is a preparatory patch for the next patch which stops background
> writeback as soon as there is other work to do.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>

Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] writeback: stop background/kupdate works from livelocking other works
  2010-11-01 12:22         ` Wu Fengguang
                           ` (2 preceding siblings ...)
  (?)
@ 2010-11-05 12:15         ` Johannes Weiner
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Weiner @ 2010-11-05 12:15 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Jan Kara, linux-mm, Mel Gorman, Rik van Riel,
	Minchan Kim, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Dave Chinner,
	Christoph Hellwig, linux-fsdevel

On Mon, Nov 01, 2010 at 08:22:52PM +0800, Wu Fengguang wrote:
> From: Jan Kara <jack@suse.cz>
> 
> Background writeback are easily livelockable (from a definition of their
> target). This is inconvenient because it can make sync(1) stall forever waiting
> on its queued work to be finished. Generally, when a flusher thread has
> some work queued, someone submitted the work to achieve a goal more specific
> than what background writeback does. So it makes sense to give it a priority
> over a generic page cleaning.
> 
> Thus we interrupt background writeback if there is some other work to do. We
> return to the background writeback after completing all the queued work.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>

Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2010-11-05 12:15 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-13 12:31 [PATCH 0/4] [RFC] transfer vmscan pageout works to the flusher thread Wu Fengguang
2010-09-13 12:31 ` [PATCH 1/4] writeback: integrated background work Wu Fengguang
2010-09-13 22:46   ` Minchan Kim
2010-09-14  0:53     ` Wu Fengguang
2010-09-14 12:45   ` Jan Kara
2010-09-13 12:31 ` [PATCH 2/4] writeback: quit background/periodic work when other works are enqueued Wu Fengguang
2010-09-14 12:40   ` Jan Kara
2010-11-01 12:07     ` Wu Fengguang
2010-11-01 15:08       ` Jan Kara
2010-11-01 15:13       ` Jan Kara
2010-11-01 12:14     ` [PATCH 1/2] writeback: integrated background writeback work Wu Fengguang
2010-11-01 12:14       ` Wu Fengguang
2010-11-01 12:22       ` [PATCH 2/2] writeback: stop background/kupdate works from livelocking other works Wu Fengguang
2010-11-01 12:22         ` Wu Fengguang
2010-11-01 15:22         ` Christoph Hellwig
2010-11-02  1:57         ` Minchan Kim
2010-11-02  1:57           ` Minchan Kim
2010-11-05 12:15         ` Johannes Weiner
2010-11-01 15:21       ` [PATCH 1/2] writeback: integrated background writeback work Christoph Hellwig
2010-11-01 20:37         ` Wu Fengguang
2010-11-01 20:37           ` Wu Fengguang
2010-11-01 20:39         ` [PATCH 1/2 v2] " Wu Fengguang
2010-11-01 20:39           ` Wu Fengguang
2010-11-02  1:55           ` Minchan Kim
2010-11-02  1:55             ` Minchan Kim
2010-11-05 12:01           ` Johannes Weiner
2010-09-13 12:31 ` [PATCH 3/4] writeback: introduce bdi_start_inode_writeback() Wu Fengguang
2010-09-14 13:36   ` Jan Kara
2010-11-01 12:35     ` Wu Fengguang
2010-09-13 12:31 ` [PATCH 4/4] vmscan: transfer async file writeback to the flusher Wu Fengguang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.