From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f200.google.com (mail-io0-f200.google.com [209.85.223.200]) by kanga.kvack.org (Postfix) with ESMTP id E86856B025F for ; Wed, 20 Sep 2017 11:33:17 -0400 (EDT) Received: by mail-io0-f200.google.com with SMTP id f72so4787003ioj.7 for ; Wed, 20 Sep 2017 08:33:17 -0700 (PDT) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. [209.85.220.41]) by mx.google.com with SMTPS id b10sor2013963itc.5.2017.09.20.08.33.15 for (Google Transport Security); Wed, 20 Sep 2017 08:33:15 -0700 (PDT) From: Jens Axboe Subject: [PATCH 0/7 v2] More graceful flusher thread memory reclaim wakeup Date: Wed, 20 Sep 2017 09:32:55 -0600 Message-Id: <1505921582-26709-1-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Cc: hannes@cmpxchg.org, clm@fb.com, jack@suse.cz We've had some issues with writeback in presence of memory reclaim at Facebook, and this patch set attempts to fix it up. The real functional change is the last patch in the series, the first 5 are prep and cleanup patches. The basic idea is that we have callers that call wakeup_flusher_threads() with nr_pages == 0. This means 'writeback everything'. For memory reclaim situations, we can end up queuing a TON of these kinds of writeback units. This can cause softlockups and further memory issues, since we allocate huge amounts of struct wb_writeback_work to handle this writeback. Handle this situation more gracefully. Changes since v1: - Rename WB_zero_pages to WB_start_all (Amir). - Remove a test_bit() for a condition where we always expect the bit to be set. - Remove 'nr_pages' from the wakeup flusher threads helpers, since everybody now passes in zero. Enables further cleanups in later patches too (Jan). - Fix a case where I forgot to clear WB_start_all if 'work' allocation failed. - Get rid of cond_resched() in the wb_do_writeback() loop. -- Jens Axboe -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f69.google.com (mail-it0-f69.google.com [209.85.214.69]) by kanga.kvack.org (Postfix) with ESMTP id 08DED6B0260 for ; Wed, 20 Sep 2017 11:33:21 -0400 (EDT) Received: by mail-it0-f69.google.com with SMTP id v140so4538756ita.3 for ; Wed, 20 Sep 2017 08:33:21 -0700 (PDT) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. [209.85.220.41]) by mx.google.com with SMTPS id c3sor884298iob.47.2017.09.20.08.33.19 for (Google Transport Security); Wed, 20 Sep 2017 08:33:20 -0700 (PDT) From: Jens Axboe Subject: [PATCH 1/7] buffer: cleanup free_more_memory() flusher wakeup Date: Wed, 20 Sep 2017 09:32:56 -0600 Message-Id: <1505921582-26709-2-git-send-email-axboe@kernel.dk> In-Reply-To: <1505921582-26709-1-git-send-email-axboe@kernel.dk> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Cc: hannes@cmpxchg.org, clm@fb.com, jack@suse.cz, Jens Axboe This whole function is... interesting. Change the wakeup call to the flusher threads to pass in nr_pages == 0, instead of some random number of pages. This matches more closely what similar cases do for memory shortage/reclaim. 
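For context, this is how a zero nr_pages is interpreted by wakeup_flusher_threads() at this point in the series, condensed from the hunks in patches 2 and 6 below (a sketch, not the verbatim kernel code):

	/* Sketch only: nr_pages == 0 means "write back everything dirty". */
	static unsigned long get_nr_dirty_pages(void)
	{
		/*
		 * Dirty pagecache plus dirty inodes, since writing an inode
		 * can dirty pagecache in the underlying blockdev.
		 */
		return global_node_page_state(NR_FILE_DIRTY) +
		       global_node_page_state(NR_UNSTABLE_NFS) +
		       get_nr_dirty_inodes();
	}

	void wakeup_flusher_threads(long nr_pages, enum wb_reason reason)
	{
		if (!nr_pages)
			nr_pages = get_nr_dirty_pages();
		/* ... fan the request out to every bdi's flusher threads ... */
	}
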
Acked-by: Johannes Weiner Tested-by: Chris Mason Reviewed-by: Jan Kara Signed-off-by: Jens Axboe --- fs/buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/buffer.c b/fs/buffer.c index 170df856bdb9..9471a445e370 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -260,7 +260,7 @@ static void free_more_memory(void) struct zoneref *z; int nid; - wakeup_flusher_threads(1024, WB_REASON_FREE_MORE_MEM); + wakeup_flusher_threads(0, WB_REASON_FREE_MORE_MEM); yield(); for_each_online_node(nid) { -- 2.7.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f199.google.com (mail-io0-f199.google.com [209.85.223.199]) by kanga.kvack.org (Postfix) with ESMTP id CC3866B0261 for ; Wed, 20 Sep 2017 11:33:22 -0400 (EDT) Received: by mail-io0-f199.google.com with SMTP id g32so4865878ioj.0 for ; Wed, 20 Sep 2017 08:33:22 -0700 (PDT) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. [209.85.220.41]) by mx.google.com with SMTPS id i85sor702839ioa.223.2017.09.20.08.33.21 for (Google Transport Security); Wed, 20 Sep 2017 08:33:21 -0700 (PDT) From: Jens Axboe Subject: [PATCH 2/7] fs: kill 'nr_pages' argument from wakeup_flusher_threads() Date: Wed, 20 Sep 2017 09:32:57 -0600 Message-Id: <1505921582-26709-3-git-send-email-axboe@kernel.dk> In-Reply-To: <1505921582-26709-1-git-send-email-axboe@kernel.dk> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Cc: hannes@cmpxchg.org, clm@fb.com, jack@suse.cz, Jens Axboe Everybody is passing in 0 now, let's get rid of the argument. Signed-off-by: Jens Axboe --- fs/buffer.c | 2 +- fs/fs-writeback.c | 9 ++++----- fs/sync.c | 2 +- include/linux/writeback.h | 2 +- mm/vmscan.c | 2 +- 5 files changed, 8 insertions(+), 9 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 9471a445e370..cf71926797d3 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -260,7 +260,7 @@ static void free_more_memory(void) struct zoneref *z; int nid; - wakeup_flusher_threads(0, WB_REASON_FREE_MORE_MEM); + wakeup_flusher_threads(WB_REASON_FREE_MORE_MEM); yield(); for_each_online_node(nid) { diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 245c430a2e41..bb6148dc6d24 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -1947,12 +1947,12 @@ void wb_workfn(struct work_struct *work) } /* - * Start writeback of `nr_pages' pages. If `nr_pages' is zero, write back - * the whole world. + * Wakeup the flusher threads to start writeback of all currently dirty pages */ -void wakeup_flusher_threads(long nr_pages, enum wb_reason reason) +void wakeup_flusher_threads(enum wb_reason reason) { struct backing_dev_info *bdi; + long nr_pages; /* * If we are expecting writeback progress we must submit plugged IO. 
@@ -1960,8 +1960,7 @@ void wakeup_flusher_threads(long nr_pages, enum wb_reason reason) if (blk_needs_flush_plug(current)) blk_schedule_flush_plug(current); - if (!nr_pages) - nr_pages = get_nr_dirty_pages(); + nr_pages = get_nr_dirty_pages(); rcu_read_lock(); list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) { diff --git a/fs/sync.c b/fs/sync.c index a576aa2e6b09..09f96a18dd93 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -108,7 +108,7 @@ SYSCALL_DEFINE0(sync) { int nowait = 0, wait = 1; - wakeup_flusher_threads(0, WB_REASON_SYNC); + wakeup_flusher_threads(WB_REASON_SYNC); iterate_supers(sync_inodes_one_sb, NULL); iterate_supers(sync_fs_one_sb, &nowait); iterate_supers(sync_fs_one_sb, &wait); diff --git a/include/linux/writeback.h b/include/linux/writeback.h index d5815794416c..1f9c6db5e29a 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -189,7 +189,7 @@ bool try_to_writeback_inodes_sb(struct super_block *, enum wb_reason reason); bool try_to_writeback_inodes_sb_nr(struct super_block *, unsigned long nr, enum wb_reason reason); void sync_inodes_sb(struct super_block *); -void wakeup_flusher_threads(long nr_pages, enum wb_reason reason); +void wakeup_flusher_threads(enum wb_reason reason); void inode_wait_for_writeback(struct inode *inode); /* writeback.h requires fs.h; it, too, is not included from here. */ diff --git a/mm/vmscan.c b/mm/vmscan.c index 13d711dd8776..42a7fdd52d87 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1867,7 +1867,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, * also allow kswapd to start writing pages during reclaim. */ if (stat.nr_unqueued_dirty == nr_taken) { - wakeup_flusher_threads(0, WB_REASON_VMSCAN); + wakeup_flusher_threads(WB_REASON_VMSCAN); set_bit(PGDAT_DIRTY, &pgdat->flags); } -- 2.7.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f197.google.com (mail-io0-f197.google.com [209.85.223.197]) by kanga.kvack.org (Postfix) with ESMTP id 3099C6B0266 for ; Wed, 20 Sep 2017 11:33:25 -0400 (EDT) Received: by mail-io0-f197.google.com with SMTP id k101so4853773iod.1 for ; Wed, 20 Sep 2017 08:33:25 -0700 (PDT) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. [209.85.220.41]) by mx.google.com with SMTPS id i185sor2405336ita.92.2017.09.20.08.33.23 for (Google Transport Security); Wed, 20 Sep 2017 08:33:24 -0700 (PDT) From: Jens Axboe Subject: [PATCH 3/7] fs-writeback: provide a wakeup_flusher_threads_bdi() Date: Wed, 20 Sep 2017 09:32:58 -0600 Message-Id: <1505921582-26709-4-git-send-email-axboe@kernel.dk> In-Reply-To: <1505921582-26709-1-git-send-email-axboe@kernel.dk> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Cc: hannes@cmpxchg.org, clm@fb.com, jack@suse.cz, Jens Axboe Similar to wakeup_flusher_threads(), except that we only wake up the flusher threads on the specified backing device. No functional changes in this patch. 
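For illustration, this is the sort of caller the new helper is meant for; patch 4 converts laptop mode to essentially this (shown here as a sketch, not the final hunk):

	void laptop_mode_timer_fn(unsigned long data)
	{
		struct request_queue *q = (struct request_queue *)data;

		/* flush everything currently dirty on this one device */
		wakeup_flusher_threads_bdi(q->backing_dev_info,
					   WB_REASON_LAPTOP_TIMER);
	}
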
Acked-by: Johannes Weiner Tested-by: Chris Mason Reviewed-by: Jan Kara Signed-off-by: Jens Axboe --- fs/fs-writeback.c | 39 +++++++++++++++++++++++++++++---------- include/linux/writeback.h | 2 ++ 2 files changed, 31 insertions(+), 10 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index bb6148dc6d24..c7f99fd2c7f0 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -1947,6 +1947,33 @@ void wb_workfn(struct work_struct *work) } /* + * Start writeback of `nr_pages' pages on this bdi. If `nr_pages' is zero, + * write back the whole world. + */ +static void __wakeup_flusher_threads_bdi(struct backing_dev_info *bdi, + long nr_pages, enum wb_reason reason) +{ + struct bdi_writeback *wb; + + if (!bdi_has_dirty_io(bdi)) + return; + + list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) + wb_start_writeback(wb, wb_split_bdi_pages(wb, nr_pages), + false, reason); +} + +void wakeup_flusher_threads_bdi(struct backing_dev_info *bdi, + enum wb_reason reason) +{ + long nr_pages = get_nr_dirty_pages(); + + rcu_read_lock(); + __wakeup_flusher_threads_bdi(bdi, nr_pages, reason); + rcu_read_unlock(); +} + +/* * Wakeup the flusher threads to start writeback of all currently dirty pages */ void wakeup_flusher_threads(enum wb_reason reason) @@ -1963,16 +1990,8 @@ void wakeup_flusher_threads(enum wb_reason reason) nr_pages = get_nr_dirty_pages(); rcu_read_lock(); - list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) { - struct bdi_writeback *wb; - - if (!bdi_has_dirty_io(bdi)) - continue; - - list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) - wb_start_writeback(wb, wb_split_bdi_pages(wb, nr_pages), - false, reason); - } + list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) + __wakeup_flusher_threads_bdi(bdi, nr_pages, reason); rcu_read_unlock(); } diff --git a/include/linux/writeback.h b/include/linux/writeback.h index 1f9c6db5e29a..9c0091678af4 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -190,6 +190,8 @@ bool try_to_writeback_inodes_sb_nr(struct super_block *, unsigned long nr, enum wb_reason reason); void sync_inodes_sb(struct super_block *); void wakeup_flusher_threads(enum wb_reason reason); +void wakeup_flusher_threads_bdi(struct backing_dev_info *bdi, + enum wb_reason reason); void inode_wait_for_writeback(struct inode *inode); /* writeback.h requires fs.h; it, too, is not included from here. */ -- 2.7.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f72.google.com (mail-it0-f72.google.com [209.85.214.72]) by kanga.kvack.org (Postfix) with ESMTP id E55856B0268 for ; Wed, 20 Sep 2017 11:33:26 -0400 (EDT) Received: by mail-it0-f72.google.com with SMTP id c195so4514889itb.5 for ; Wed, 20 Sep 2017 08:33:26 -0700 (PDT) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. 
[209.85.220.41]) by mx.google.com with SMTPS id p69sor1940627ite.91.2017.09.20.08.33.25 for (Google Transport Security); Wed, 20 Sep 2017 08:33:25 -0700 (PDT) From: Jens Axboe Subject: [PATCH 4/7] page-writeback: pass in '0' for nr_pages writeback in laptop mode Date: Wed, 20 Sep 2017 09:32:59 -0600 Message-Id: <1505921582-26709-5-git-send-email-axboe@kernel.dk> In-Reply-To: <1505921582-26709-1-git-send-email-axboe@kernel.dk> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Cc: hannes@cmpxchg.org, clm@fb.com, jack@suse.cz, Jens Axboe Laptop mode really wants to writeback the number of dirty pages and inodes. Instead of calculating this in the caller, just pass in 0 and let wakeup_flusher_threads() handle it. Use the new wakeup_flusher_threads_bdi() instead of rolling our own. This changes the writeback to not be range cyclic, but that should not matter for laptop mode flush-all semantics. Acked-by: Johannes Weiner Tested-by: Chris Mason Signed-off-by: Jens Axboe Reviewed-by: Jan Kara Signed-off-by: Jens Axboe --- mm/page-writeback.c | 17 +---------------- 1 file changed, 1 insertion(+), 16 deletions(-) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 0b9c5cbe8eba..8d1fc593bce8 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1980,23 +1980,8 @@ int dirty_writeback_centisecs_handler(struct ctl_table *table, int write, void laptop_mode_timer_fn(unsigned long data) { struct request_queue *q = (struct request_queue *)data; - int nr_pages = global_node_page_state(NR_FILE_DIRTY) + - global_node_page_state(NR_UNSTABLE_NFS); - struct bdi_writeback *wb; - /* - * We want to write everything out, not just down to the dirty - * threshold - */ - if (!bdi_has_dirty_io(q->backing_dev_info)) - return; - - rcu_read_lock(); - list_for_each_entry_rcu(wb, &q->backing_dev_info->wb_list, bdi_node) - if (wb_has_dirty_io(wb)) - wb_start_writeback(wb, nr_pages, true, - WB_REASON_LAPTOP_TIMER); - rcu_read_unlock(); + wakeup_flusher_threads_bdi(q->backing_dev_info, WB_REASON_LAPTOP_TIMER); } /* -- 2.7.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f69.google.com (mail-it0-f69.google.com [209.85.214.69]) by kanga.kvack.org (Postfix) with ESMTP id 488BA6B026A for ; Wed, 20 Sep 2017 11:33:29 -0400 (EDT) Received: by mail-it0-f69.google.com with SMTP id 85so4573314ith.0 for ; Wed, 20 Sep 2017 08:33:29 -0700 (PDT) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. [209.85.220.41]) by mx.google.com with SMTPS id u67sor1997233itc.42.2017.09.20.08.33.27 for (Google Transport Security); Wed, 20 Sep 2017 08:33:28 -0700 (PDT) From: Jens Axboe Subject: [PATCH 5/7] fs-writeback: make wb_start_writeback() static Date: Wed, 20 Sep 2017 09:33:00 -0600 Message-Id: <1505921582-26709-6-git-send-email-axboe@kernel.dk> In-Reply-To: <1505921582-26709-1-git-send-email-axboe@kernel.dk> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Cc: hannes@cmpxchg.org, clm@fb.com, jack@suse.cz, Jens Axboe We don't have any callers outside of fs-writeback.c anymore, make it private. 
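With patches 3 and 4 in place, every remaining path into wb_start_writeback() lives inside fs/fs-writeback.c, roughly as follows (a call-graph sketch based on this series, not code from the tree):

	sync(2), vmscan, free_more_memory()
	    -> wakeup_flusher_threads(reason)
	laptop_mode_timer_fn()
	    -> wakeup_flusher_threads_bdi(bdi, reason)

	both
	    -> __wakeup_flusher_threads_bdi(bdi, ...)
	        -> wb_start_writeback(wb, ...)        [now static]
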
Acked-by: Johannes Weiner Tested-by: Chris Mason Reviewed-by: Jan Kara Signed-off-by: Jens Axboe --- fs/fs-writeback.c | 4 ++-- include/linux/backing-dev.h | 2 -- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index c7f99fd2c7f0..ecbd26d1121d 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -933,8 +933,8 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi, #endif /* CONFIG_CGROUP_WRITEBACK */ -void wb_start_writeback(struct bdi_writeback *wb, long nr_pages, - bool range_cyclic, enum wb_reason reason) +static void wb_start_writeback(struct bdi_writeback *wb, long nr_pages, + bool range_cyclic, enum wb_reason reason) { struct wb_writeback_work *work; diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index 854e1bdd0b2a..157e950a70dc 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -38,8 +38,6 @@ static inline struct backing_dev_info *bdi_alloc(gfp_t gfp_mask) return bdi_alloc_node(gfp_mask, NUMA_NO_NODE); } -void wb_start_writeback(struct bdi_writeback *wb, long nr_pages, - bool range_cyclic, enum wb_reason reason); void wb_start_background_writeback(struct bdi_writeback *wb); void wb_workfn(struct work_struct *work); void wb_wakeup_delayed(struct bdi_writeback *wb); -- 2.7.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f70.google.com (mail-it0-f70.google.com [209.85.214.70]) by kanga.kvack.org (Postfix) with ESMTP id 5337C6B026D for ; Wed, 20 Sep 2017 11:33:30 -0400 (EDT) Received: by mail-it0-f70.google.com with SMTP id d6so4512200itc.6 for ; Wed, 20 Sep 2017 08:33:30 -0700 (PDT) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. [209.85.220.41]) by mx.google.com with SMTPS id k4sor2001279ith.17.2017.09.20.08.33.29 for (Google Transport Security); Wed, 20 Sep 2017 08:33:29 -0700 (PDT) From: Jens Axboe Subject: [PATCH 6/7] fs-writeback: move nr_pages == 0 logic to one location Date: Wed, 20 Sep 2017 09:33:01 -0600 Message-Id: <1505921582-26709-7-git-send-email-axboe@kernel.dk> In-Reply-To: <1505921582-26709-1-git-send-email-axboe@kernel.dk> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Cc: hannes@cmpxchg.org, clm@fb.com, jack@suse.cz, Jens Axboe Now that we have no external callers of wb_start_writeback(), we can shuffle the passing in of 'nr_pages'. Everybody passes in 0 at this point, so just kill the argument and move the dirty count retrieval to that function. Acked-by: Johannes Weiner Tested-by: Chris Mason Signed-off-by: Jens Axboe --- fs/fs-writeback.c | 42 ++++++++++++++++++------------------------ 1 file changed, 18 insertions(+), 24 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index ecbd26d1121d..3916ea2484ae 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -933,8 +933,19 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi, #endif /* CONFIG_CGROUP_WRITEBACK */ -static void wb_start_writeback(struct bdi_writeback *wb, long nr_pages, - bool range_cyclic, enum wb_reason reason) +/* + * Add in the number of potentially dirty inodes, because each inode + * write can dirty pagecache in the underlying blockdev. 
+ */ +static unsigned long get_nr_dirty_pages(void) +{ + return global_node_page_state(NR_FILE_DIRTY) + + global_node_page_state(NR_UNSTABLE_NFS) + + get_nr_dirty_inodes(); +} + +static void wb_start_writeback(struct bdi_writeback *wb, bool range_cyclic, + enum wb_reason reason) { struct wb_writeback_work *work; @@ -954,7 +965,7 @@ static void wb_start_writeback(struct bdi_writeback *wb, long nr_pages, } work->sync_mode = WB_SYNC_NONE; - work->nr_pages = nr_pages; + work->nr_pages = wb_split_bdi_pages(wb, get_nr_dirty_pages()); work->range_cyclic = range_cyclic; work->reason = reason; work->auto_free = 1; @@ -1814,17 +1825,6 @@ static struct wb_writeback_work *get_next_work_item(struct bdi_writeback *wb) return work; } -/* - * Add in the number of potentially dirty inodes, because each inode - * write can dirty pagecache in the underlying blockdev. - */ -static unsigned long get_nr_dirty_pages(void) -{ - return global_node_page_state(NR_FILE_DIRTY) + - global_node_page_state(NR_UNSTABLE_NFS) + - get_nr_dirty_inodes(); -} - static long wb_check_background_flush(struct bdi_writeback *wb) { if (wb_over_bg_thresh(wb)) { @@ -1951,7 +1951,7 @@ void wb_workfn(struct work_struct *work) * write back the whole world. */ static void __wakeup_flusher_threads_bdi(struct backing_dev_info *bdi, - long nr_pages, enum wb_reason reason) + enum wb_reason reason) { struct bdi_writeback *wb; @@ -1959,17 +1959,14 @@ static void __wakeup_flusher_threads_bdi(struct backing_dev_info *bdi, return; list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) - wb_start_writeback(wb, wb_split_bdi_pages(wb, nr_pages), - false, reason); + wb_start_writeback(wb, false, reason); } void wakeup_flusher_threads_bdi(struct backing_dev_info *bdi, enum wb_reason reason) { - long nr_pages = get_nr_dirty_pages(); - rcu_read_lock(); - __wakeup_flusher_threads_bdi(bdi, nr_pages, reason); + __wakeup_flusher_threads_bdi(bdi, reason); rcu_read_unlock(); } @@ -1979,7 +1976,6 @@ void wakeup_flusher_threads_bdi(struct backing_dev_info *bdi, void wakeup_flusher_threads(enum wb_reason reason) { struct backing_dev_info *bdi; - long nr_pages; /* * If we are expecting writeback progress we must submit plugged IO. @@ -1987,11 +1983,9 @@ void wakeup_flusher_threads(enum wb_reason reason) if (blk_needs_flush_plug(current)) blk_schedule_flush_plug(current); - nr_pages = get_nr_dirty_pages(); - rcu_read_lock(); list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) - __wakeup_flusher_threads_bdi(bdi, nr_pages, reason); + __wakeup_flusher_threads_bdi(bdi, reason); rcu_read_unlock(); } -- 2.7.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f199.google.com (mail-io0-f199.google.com [209.85.223.199]) by kanga.kvack.org (Postfix) with ESMTP id 7A0FF6B026D for ; Wed, 20 Sep 2017 11:33:31 -0400 (EDT) Received: by mail-io0-f199.google.com with SMTP id g32so4866386ioj.0 for ; Wed, 20 Sep 2017 08:33:31 -0700 (PDT) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. 
[209.85.220.41]) by mx.google.com with SMTPS id z2sor1619323iti.80.2017.09.20.08.33.29 for (Google Transport Security); Wed, 20 Sep 2017 08:33:30 -0700 (PDT) From: Jens Axboe Subject: [PATCH 7/7] fs-writeback: only allow one inflight and pending full flush Date: Wed, 20 Sep 2017 09:33:02 -0600 Message-Id: <1505921582-26709-8-git-send-email-axboe@kernel.dk> In-Reply-To: <1505921582-26709-1-git-send-email-axboe@kernel.dk> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Cc: hannes@cmpxchg.org, clm@fb.com, jack@suse.cz, Jens Axboe When someone calls wakeup_flusher_threads() or wakeup_flusher_threads_bdi(), they schedule writeback of all dirty pages in the system (or on that bdi). If we are tight on memory, we can get tons of these queued from kswapd/vmscan. This causes (at least) two problems: 1) We consume a ton of memory just allocating writeback work items. 2) We spend so much time processing these work items, that we introduce a softlockup in writeback processing. Fix this by adding a 'start_all' bit to the writeback structure, and set that when someone attempts to flush all dirty page. The bit is cleared when we start writeback on that work item. If the bit is already set when we attempt to queue !nr_pages writeback, then we simply ignore it. This provides us one full flush in flight, with one pending as well, and makes for more efficient handling of this type of writeback. Acked-by: Johannes Weiner Tested-by: Chris Mason Reviewed-by: Jan Kara Signed-off-by: Jens Axboe --- fs/fs-writeback.c | 24 ++++++++++++++++++++++++ include/linux/backing-dev-defs.h | 1 + 2 files changed, 25 insertions(+) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 3916ea2484ae..6205319d0c24 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -53,6 +53,7 @@ struct wb_writeback_work { unsigned int for_background:1; unsigned int for_sync:1; /* sync(2) WB_SYNC_ALL writeback */ unsigned int auto_free:1; /* free on completion */ + unsigned int start_all:1; /* nr_pages == 0 (all) writeback */ enum wb_reason reason; /* why was writeback initiated? */ struct list_head list; /* pending work list */ @@ -953,12 +954,26 @@ static void wb_start_writeback(struct bdi_writeback *wb, bool range_cyclic, return; /* + * All callers of this function want to start writeback of all + * dirty pages. Places like vmscan can call this at a very + * high frequency, causing pointless allocations of tons of + * work items and keeping the flusher threads busy retrieving + * that work. 
Ensure that we only allow one of them pending and + * inflight at the time + */ + if (test_bit(WB_start_all, &wb->state)) + return; + + set_bit(WB_start_all, &wb->state); + + /* * This is WB_SYNC_NONE writeback, so if allocation fails just * wakeup the thread for old dirty data writeback */ work = kzalloc(sizeof(*work), GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); if (!work) { + clear_bit(WB_start_all, &wb->state); trace_writeback_nowork(wb); wb_wakeup(wb); return; @@ -969,6 +984,7 @@ static void wb_start_writeback(struct bdi_writeback *wb, bool range_cyclic, work->range_cyclic = range_cyclic; work->reason = reason; work->auto_free = 1; + work->start_all = 1; wb_queue_work(wb, work); } @@ -1822,6 +1838,14 @@ static struct wb_writeback_work *get_next_work_item(struct bdi_writeback *wb) list_del_init(&work->list); } spin_unlock_bh(&wb->work_lock); + + /* + * Once we start processing a work item that had !nr_pages, + * clear the wb state bit for that so we can allow more. + */ + if (work && work->start_all) + clear_bit(WB_start_all, &wb->state); + return work; } diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h index 866c433e7d32..420de5c7c7f9 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -24,6 +24,7 @@ enum wb_state { WB_shutting_down, /* wb_shutdown() in progress */ WB_writeback_running, /* Writeback is in progress */ WB_has_dirty_io, /* Dirty inodes on ->b_{dirty|io|more_io} */ + WB_start_all, /* nr_pages == 0 (all) work pending */ }; enum wb_congested_state { -- 2.7.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f71.google.com (mail-pg0-f71.google.com [74.125.83.71]) by kanga.kvack.org (Postfix) with ESMTP id DCBEC6B0038 for ; Thu, 21 Sep 2017 10:55:31 -0400 (EDT) Received: by mail-pg0-f71.google.com with SMTP id 188so12068483pgb.3 for ; Thu, 21 Sep 2017 07:55:31 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org. [65.50.211.133]) by mx.google.com with ESMTPS id c8si1134281pfm.48.2017.09.21.07.55.29 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 21 Sep 2017 07:55:29 -0700 (PDT) Date: Thu, 21 Sep 2017 07:55:27 -0700 From: Christoph Hellwig Subject: Re: [PATCH 1/7] buffer: cleanup free_more_memory() flusher wakeup Message-ID: <20170921145527.GA8839@infradead.org> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-2-git-send-email-axboe@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1505921582-26709-2-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: Jens Axboe Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com, jack@suse.cz Looks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f70.google.com (mail-pg0-f70.google.com [74.125.83.70]) by kanga.kvack.org (Postfix) with ESMTP id 6B3DE6B0069 for ; Thu, 21 Sep 2017 10:55:45 -0400 (EDT) Received: by mail-pg0-f70.google.com with SMTP id i130so12053041pgc.5 for ; Thu, 21 Sep 2017 07:55:45 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org. [65.50.211.133]) by mx.google.com with ESMTPS id a13si1145817pgu.549.2017.09.21.07.55.44 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 21 Sep 2017 07:55:44 -0700 (PDT) Date: Thu, 21 Sep 2017 07:55:43 -0700 From: Christoph Hellwig Subject: Re: [PATCH 2/7] fs: kill 'nr_pages' argument from wakeup_flusher_threads() Message-ID: <20170921145543.GB8839@infradead.org> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-3-git-send-email-axboe@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1505921582-26709-3-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: Jens Axboe Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com, jack@suse.cz Looks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f69.google.com (mail-pg0-f69.google.com [74.125.83.69]) by kanga.kvack.org (Postfix) with ESMTP id EC44C6B0038 for ; Thu, 21 Sep 2017 10:56:36 -0400 (EDT) Received: by mail-pg0-f69.google.com with SMTP id 188so12073265pgb.3 for ; Thu, 21 Sep 2017 07:56:36 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org. [65.50.211.133]) by mx.google.com with ESMTPS id m13si1076783pgp.428.2017.09.21.07.56.36 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 21 Sep 2017 07:56:36 -0700 (PDT) Date: Thu, 21 Sep 2017 07:56:34 -0700 From: Christoph Hellwig Subject: Re: [PATCH 3/7] fs-writeback: provide a wakeup_flusher_threads_bdi() Message-ID: <20170921145634.GC8839@infradead.org> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-4-git-send-email-axboe@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1505921582-26709-4-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: Jens Axboe Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com, jack@suse.cz Looks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f71.google.com (mail-pg0-f71.google.com [74.125.83.71]) by kanga.kvack.org (Postfix) with ESMTP id 901EC6B0038 for ; Thu, 21 Sep 2017 10:59:33 -0400 (EDT) Received: by mail-pg0-f71.google.com with SMTP id p5so12049413pgn.7 for ; Thu, 21 Sep 2017 07:59:33 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org. 
[65.50.211.133]) by mx.google.com with ESMTPS id m197si1128071pga.97.2017.09.21.07.59.32 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 21 Sep 2017 07:59:32 -0700 (PDT) Date: Thu, 21 Sep 2017 07:59:30 -0700 From: Christoph Hellwig Subject: Re: [PATCH 4/7] page-writeback: pass in '0' for nr_pages writeback in laptop mode Message-ID: <20170921145929.GD8839@infradead.org> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-5-git-send-email-axboe@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1505921582-26709-5-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: Jens Axboe Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com, jack@suse.cz On Wed, Sep 20, 2017 at 09:32:59AM -0600, Jens Axboe wrote: > Laptop mode really wants to writeback the number of dirty > pages and inodes. Instead of calculating this in the caller, > just pass in 0 and let wakeup_flusher_threads() handle it. > > Use the new wakeup_flusher_threads_bdi() instead of rolling > our own. This changes the writeback to not be range cyclic, > but that should not matter for laptop mode flush-all > semantics. Looks good, Reviewed-by: Christoph Hellwig While we're at sorting out the laptop_mode_wb_timer mess: can we move initializing and deleting it from the block code to the backing-dev code given that it now doesn't assume anything about block devices any more? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f71.google.com (mail-pg0-f71.google.com [74.125.83.71]) by kanga.kvack.org (Postfix) with ESMTP id 3734E6B0038 for ; Thu, 21 Sep 2017 11:00:51 -0400 (EDT) Received: by mail-pg0-f71.google.com with SMTP id p5so12056095pgn.7 for ; Thu, 21 Sep 2017 08:00:51 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org. [65.50.211.133]) by mx.google.com with ESMTPS id a6si1268956plt.502.2017.09.21.08.00.48 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 21 Sep 2017 08:00:48 -0700 (PDT) Date: Thu, 21 Sep 2017 08:00:47 -0700 From: Christoph Hellwig Subject: Re: [PATCH 4/7] page-writeback: pass in '0' for nr_pages writeback in laptop mode Message-ID: <20170921150047.GE8839@infradead.org> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-5-git-send-email-axboe@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1505921582-26709-5-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: Jens Axboe Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com, jack@suse.cz On Wed, Sep 20, 2017 at 09:32:59AM -0600, Jens Axboe wrote: > Use the new wakeup_flusher_threads_bdi() instead of rolling > our own. This changes the writeback to not be range cyclic, > but that should not matter for laptop mode flush-all > semantics. Oh btw - I think this actually is the more important change and should probably be in the subject line. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f199.google.com (mail-pf0-f199.google.com [209.85.192.199]) by kanga.kvack.org (Postfix) with ESMTP id B203F6B0069 for ; Thu, 21 Sep 2017 11:01:03 -0400 (EDT) Received: by mail-pf0-f199.google.com with SMTP id y29so10594249pff.6 for ; Thu, 21 Sep 2017 08:01:03 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org. [65.50.211.133]) by mx.google.com with ESMTPS id 97si1128196ple.401.2017.09.21.08.01.02 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 21 Sep 2017 08:01:02 -0700 (PDT) Date: Thu, 21 Sep 2017 08:01:01 -0700 From: Christoph Hellwig Subject: Re: [PATCH 5/7] fs-writeback: make wb_start_writeback() static Message-ID: <20170921150101.GF8839@infradead.org> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-6-git-send-email-axboe@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1505921582-26709-6-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: Jens Axboe Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com, jack@suse.cz Looks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f199.google.com (mail-pf0-f199.google.com [209.85.192.199]) by kanga.kvack.org (Postfix) with ESMTP id 766486B0038 for ; Thu, 21 Sep 2017 11:02:15 -0400 (EDT) Received: by mail-pf0-f199.google.com with SMTP id a7so10626496pfj.3 for ; Thu, 21 Sep 2017 08:02:15 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org. [65.50.211.133]) by mx.google.com with ESMTPS id k184si1208256pga.79.2017.09.21.08.02.14 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 21 Sep 2017 08:02:14 -0700 (PDT) Date: Thu, 21 Sep 2017 08:02:13 -0700 From: Christoph Hellwig Subject: Re: [PATCH 6/7] fs-writeback: move nr_pages == 0 logic to one location Message-ID: <20170921150213.GG8839@infradead.org> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-7-git-send-email-axboe@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1505921582-26709-7-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: Jens Axboe Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com, jack@suse.cz Looks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f71.google.com (mail-pg0-f71.google.com [74.125.83.71]) by kanga.kvack.org (Postfix) with ESMTP id 9E7FB6B0038 for ; Thu, 21 Sep 2017 11:05:12 -0400 (EDT) Received: by mail-pg0-f71.google.com with SMTP id d8so12129314pgt.1 for ; Thu, 21 Sep 2017 08:05:12 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org. 
[65.50.211.133]) by mx.google.com with ESMTPS id r63si1137755plb.443.2017.09.21.08.05.11 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 21 Sep 2017 08:05:11 -0700 (PDT) Date: Thu, 21 Sep 2017 08:05:10 -0700 From: Christoph Hellwig Subject: Re: [PATCH 7/7] fs-writeback: only allow one inflight and pending full flush Message-ID: <20170921150510.GH8839@infradead.org> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-8-git-send-email-axboe@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1505921582-26709-8-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: Jens Axboe Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com, jack@suse.cz On Wed, Sep 20, 2017 at 09:33:02AM -0600, Jens Axboe wrote: > When someone calls wakeup_flusher_threads() or > wakeup_flusher_threads_bdi(), they schedule writeback of all dirty > pages in the system (or on that bdi). If we are tight on memory, we > can get tons of these queued from kswapd/vmscan. This causes (at > least) two problems: > > 1) We consume a ton of memory just allocating writeback work items. > 2) We spend so much time processing these work items, that we > introduce a softlockup in writeback processing. > > Fix this by adding a 'start_all' bit to the writeback structure, and > set that when someone attempts to flush all dirty page. The bit is > cleared when we start writeback on that work item. If the bit is > already set when we attempt to queue !nr_pages writeback, then we > simply ignore it. > > This provides us one full flush in flight, with one pending as well, > and makes for more efficient handling of this type of writeback. > > Acked-by: Johannes Weiner > Tested-by: Chris Mason > Reviewed-by: Jan Kara > Signed-off-by: Jens Axboe > --- > fs/fs-writeback.c | 24 ++++++++++++++++++++++++ > include/linux/backing-dev-defs.h | 1 + > 2 files changed, 25 insertions(+) > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > index 3916ea2484ae..6205319d0c24 100644 > --- a/fs/fs-writeback.c > +++ b/fs/fs-writeback.c > @@ -53,6 +53,7 @@ struct wb_writeback_work { > unsigned int for_background:1; > unsigned int for_sync:1; /* sync(2) WB_SYNC_ALL writeback */ > unsigned int auto_free:1; /* free on completion */ > + unsigned int start_all:1; /* nr_pages == 0 (all) writeback */ > enum wb_reason reason; /* why was writeback initiated? */ > > struct list_head list; /* pending work list */ > @@ -953,12 +954,26 @@ static void wb_start_writeback(struct bdi_writeback *wb, bool range_cyclic, > return; > > /* > + * All callers of this function want to start writeback of all > + * dirty pages. Places like vmscan can call this at a very > + * high frequency, causing pointless allocations of tons of > + * work items and keeping the flusher threads busy retrieving > + * that work. Ensure that we only allow one of them pending and > + * inflight at the time > + */ > + if (test_bit(WB_start_all, &wb->state)) > + return; > + > + set_bit(WB_start_all, &wb->state); This should be test_and_set_bit here.. But more importantly once we are not guaranteed that we only have a single global wb_writeback_work per bdi_writeback we should just embedd that into struct bdi_writeback instead of dynamically allocating it. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f197.google.com (mail-io0-f197.google.com [209.85.223.197]) by kanga.kvack.org (Postfix) with ESMTP id 1E3AC6B0038 for ; Thu, 21 Sep 2017 11:36:49 -0400 (EDT) Received: by mail-io0-f197.google.com with SMTP id m103so11016383iod.6 for ; Thu, 21 Sep 2017 08:36:49 -0700 (PDT) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. [209.85.220.41]) by mx.google.com with SMTPS id p144sor962030itc.105.2017.09.21.08.36.47 for (Google Transport Security); Thu, 21 Sep 2017 08:36:47 -0700 (PDT) Subject: Re: [PATCH 7/7] fs-writeback: only allow one inflight and pending full flush References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-8-git-send-email-axboe@kernel.dk> <20170921150510.GH8839@infradead.org> From: Jens Axboe Message-ID: <728d4141-8d73-97fb-de08-90671c2897da@kernel.dk> Date: Thu, 21 Sep 2017 09:36:45 -0600 MIME-Version: 1.0 In-Reply-To: <20170921150510.GH8839@infradead.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Christoph Hellwig Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com, jack@suse.cz On 09/21/2017 09:05 AM, Christoph Hellwig wrote: > On Wed, Sep 20, 2017 at 09:33:02AM -0600, Jens Axboe wrote: >> When someone calls wakeup_flusher_threads() or >> wakeup_flusher_threads_bdi(), they schedule writeback of all dirty >> pages in the system (or on that bdi). If we are tight on memory, we >> can get tons of these queued from kswapd/vmscan. This causes (at >> least) two problems: >> >> 1) We consume a ton of memory just allocating writeback work items. >> 2) We spend so much time processing these work items, that we >> introduce a softlockup in writeback processing. >> >> Fix this by adding a 'start_all' bit to the writeback structure, and >> set that when someone attempts to flush all dirty page. The bit is >> cleared when we start writeback on that work item. If the bit is >> already set when we attempt to queue !nr_pages writeback, then we >> simply ignore it. >> >> This provides us one full flush in flight, with one pending as well, >> and makes for more efficient handling of this type of writeback. >> >> Acked-by: Johannes Weiner >> Tested-by: Chris Mason >> Reviewed-by: Jan Kara >> Signed-off-by: Jens Axboe >> --- >> fs/fs-writeback.c | 24 ++++++++++++++++++++++++ >> include/linux/backing-dev-defs.h | 1 + >> 2 files changed, 25 insertions(+) >> >> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c >> index 3916ea2484ae..6205319d0c24 100644 >> --- a/fs/fs-writeback.c >> +++ b/fs/fs-writeback.c >> @@ -53,6 +53,7 @@ struct wb_writeback_work { >> unsigned int for_background:1; >> unsigned int for_sync:1; /* sync(2) WB_SYNC_ALL writeback */ >> unsigned int auto_free:1; /* free on completion */ >> + unsigned int start_all:1; /* nr_pages == 0 (all) writeback */ >> enum wb_reason reason; /* why was writeback initiated? */ >> >> struct list_head list; /* pending work list */ >> @@ -953,12 +954,26 @@ static void wb_start_writeback(struct bdi_writeback *wb, bool range_cyclic, >> return; >> >> /* >> + * All callers of this function want to start writeback of all >> + * dirty pages. Places like vmscan can call this at a very >> + * high frequency, causing pointless allocations of tons of >> + * work items and keeping the flusher threads busy retrieving >> + * that work. 
Ensure that we only allow one of them pending and >> + * inflight at the time >> + */ >> + if (test_bit(WB_start_all, &wb->state)) >> + return; >> + >> + set_bit(WB_start_all, &wb->state); > > This should be test_and_set_bit here.. That's on purpose, doesn't matter if we race here, and if we're being hammered with flusher thread wakeups, then we don't want to turn that unlocked test into a locked instruction. > But more importantly once we are not guaranteed that we only have > a single global wb_writeback_work per bdi_writeback we should just > embedd that into struct bdi_writeback instead of dynamically > allocating it. We could do this as a followup. But right now the logic is that we can have on started (inflight), and still have one new queued. -- Jens Axboe -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f198.google.com (mail-io0-f198.google.com [209.85.223.198]) by kanga.kvack.org (Postfix) with ESMTP id DCD7B6B0253 for ; Thu, 21 Sep 2017 12:05:49 -0400 (EDT) Received: by mail-io0-f198.google.com with SMTP id g32so11228886ioj.0 for ; Thu, 21 Sep 2017 09:05:49 -0700 (PDT) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. [209.85.220.41]) by mx.google.com with SMTPS id u24sor769025ioi.301.2017.09.21.09.05.48 for (Google Transport Security); Thu, 21 Sep 2017 09:05:48 -0700 (PDT) Subject: Re: [PATCH 7/7] fs-writeback: only allow one inflight and pending full flush From: Jens Axboe References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-8-git-send-email-axboe@kernel.dk> <20170921150510.GH8839@infradead.org> <728d4141-8d73-97fb-de08-90671c2897da@kernel.dk> Message-ID: <3682c4c2-6e8a-e883-9f62-455ea2944496@kernel.dk> Date: Thu, 21 Sep 2017 10:00:25 -0600 MIME-Version: 1.0 In-Reply-To: <728d4141-8d73-97fb-de08-90671c2897da@kernel.dk> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Christoph Hellwig Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com, jack@suse.cz On 09/21/2017 09:36 AM, Jens Axboe wrote: >> But more importantly once we are not guaranteed that we only have >> a single global wb_writeback_work per bdi_writeback we should just >> embedd that into struct bdi_writeback instead of dynamically >> allocating it. > > We could do this as a followup. But right now the logic is that we > can have on started (inflight), and still have one new queued. Something like the below would fit on top to do that. Gets rid of the allocation and embeds the work item for global start-all in the bdi_writeback structure. 
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 6205319d0c24..9f3872e28c3f 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -40,27 +40,6 @@ struct wb_completion { }; /* - * Passed into wb_writeback(), essentially a subset of writeback_control - */ -struct wb_writeback_work { - long nr_pages; - struct super_block *sb; - unsigned long *older_than_this; - enum writeback_sync_modes sync_mode; - unsigned int tagged_writepages:1; - unsigned int for_kupdate:1; - unsigned int range_cyclic:1; - unsigned int for_background:1; - unsigned int for_sync:1; /* sync(2) WB_SYNC_ALL writeback */ - unsigned int auto_free:1; /* free on completion */ - unsigned int start_all:1; /* nr_pages == 0 (all) writeback */ - enum wb_reason reason; /* why was writeback initiated? */ - - struct list_head list; /* pending work list */ - struct wb_completion *done; /* set if the caller waits */ -}; - -/* * If one wants to wait for one or more wb_writeback_works, each work's * ->done should be set to a wb_completion defined using the following * macro. Once all work items are issued with wb_queue_work(), the caller @@ -181,6 +160,8 @@ static void finish_writeback_work(struct bdi_writeback *wb, if (work->auto_free) kfree(work); + else if (work->start_all) + clear_bit(WB_start_all, &wb->state); if (done && atomic_dec_and_test(&done->cnt)) wake_up_all(&wb->bdi->wb_waitq); } @@ -945,8 +926,7 @@ static unsigned long get_nr_dirty_pages(void) get_nr_dirty_inodes(); } -static void wb_start_writeback(struct bdi_writeback *wb, bool range_cyclic, - enum wb_reason reason) +static void wb_start_writeback(struct bdi_writeback *wb, enum wb_reason reason) { struct wb_writeback_work *work; @@ -961,29 +941,16 @@ static void wb_start_writeback(struct bdi_writeback *wb, bool range_cyclic, * that work. Ensure that we only allow one of them pending and * inflight at the time */ - if (test_bit(WB_start_all, &wb->state)) - return; - - set_bit(WB_start_all, &wb->state); - - /* - * This is WB_SYNC_NONE writeback, so if allocation fails just - * wakeup the thread for old dirty data writeback - */ - work = kzalloc(sizeof(*work), - GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); - if (!work) { - clear_bit(WB_start_all, &wb->state); - trace_writeback_nowork(wb); - wb_wakeup(wb); + if (test_bit(WB_start_all, &wb->state) || + test_and_set_bit(WB_start_all, &wb->state)) return; - } + work = &wb->wb_all_work; + memset(work, 0, sizeof(*work)); work->sync_mode = WB_SYNC_NONE; work->nr_pages = wb_split_bdi_pages(wb, get_nr_dirty_pages()); - work->range_cyclic = range_cyclic; + work->range_cyclic = false; work->reason = reason; - work->auto_free = 1; work->start_all = 1; wb_queue_work(wb, work); @@ -1838,14 +1805,6 @@ static struct wb_writeback_work *get_next_work_item(struct bdi_writeback *wb) list_del_init(&work->list); } spin_unlock_bh(&wb->work_lock); - - /* - * Once we start processing a work item that had !nr_pages, - * clear the wb state bit for that so we can allow more. 
- */ - if (work && work->start_all) - clear_bit(WB_start_all, &wb->state); - return work; } @@ -1983,7 +1942,7 @@ static void __wakeup_flusher_threads_bdi(struct backing_dev_info *bdi, return; list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) - wb_start_writeback(wb, false, reason); + wb_start_writeback(wb, reason); } void wakeup_flusher_threads_bdi(struct backing_dev_info *bdi, diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h index 420de5c7c7f9..4e1146ce5584 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -24,7 +24,7 @@ enum wb_state { WB_shutting_down, /* wb_shutdown() in progress */ WB_writeback_running, /* Writeback is in progress */ WB_has_dirty_io, /* Dirty inodes on ->b_{dirty|io|more_io} */ - WB_start_all, /* nr_pages == 0 (all) work pending */ + WB_start_all, /* wb->wb_all_work queued/allocated */ }; enum wb_congested_state { @@ -65,6 +65,58 @@ struct bdi_writeback_congested { }; /* + * fs/fs-writeback.c + */ +enum writeback_sync_modes { + WB_SYNC_NONE, /* Don't wait on anything */ + WB_SYNC_ALL, /* Wait on every mapping */ +}; + +/* + * why some writeback work was initiated + */ +enum wb_reason { + WB_REASON_BACKGROUND, + WB_REASON_VMSCAN, + WB_REASON_SYNC, + WB_REASON_PERIODIC, + WB_REASON_LAPTOP_TIMER, + WB_REASON_FREE_MORE_MEM, + WB_REASON_FS_FREE_SPACE, + /* + * There is no bdi forker thread any more and works are done + * by emergency worker, however, this is TPs userland visible + * and we'll be exposing exactly the same information, + * so it has a mismatch name. + */ + WB_REASON_FORKER_THREAD, + + WB_REASON_MAX, +}; + +/* + * Passed into wb_writeback(), essentially a subset of writeback_control + */ +struct wb_completion; +struct wb_writeback_work { + long nr_pages; + struct super_block *sb; + unsigned long *older_than_this; + enum writeback_sync_modes sync_mode; + unsigned int tagged_writepages:1; + unsigned int for_kupdate:1; + unsigned int range_cyclic:1; + unsigned int for_background:1; + unsigned int for_sync:1; /* sync(2) WB_SYNC_ALL writeback */ + unsigned int auto_free:1; /* free on completion */ + unsigned int start_all:1; /* nr_pages == 0 (all) writeback */ + enum wb_reason reason; /* why was writeback initiated? */ + + struct list_head list; /* pending work list */ + struct wb_completion *done; /* set if the caller waits */ +}; + +/* * Each wb (bdi_writeback) can perform writeback operations, is measured * and throttled, independently. Without cgroup writeback, each bdi * (bdi_writeback) is served by its embedded bdi->wb. 
@@ -125,6 +177,8 @@ struct bdi_writeback { struct list_head bdi_node; /* anchored at bdi->wb_list */ + struct wb_writeback_work wb_all_work; + #ifdef CONFIG_CGROUP_WRITEBACK struct percpu_ref refcnt; /* used only for !root wb's */ struct fprop_local_percpu memcg_completions; diff --git a/include/linux/writeback.h b/include/linux/writeback.h index 9c0091678af4..f4b52ab328b2 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -34,36 +34,6 @@ DECLARE_PER_CPU(int, dirty_throttle_leaks); struct backing_dev_info; /* - * fs/fs-writeback.c - */ -enum writeback_sync_modes { - WB_SYNC_NONE, /* Don't wait on anything */ - WB_SYNC_ALL, /* Wait on every mapping */ -}; - -/* - * why some writeback work was initiated - */ -enum wb_reason { - WB_REASON_BACKGROUND, - WB_REASON_VMSCAN, - WB_REASON_SYNC, - WB_REASON_PERIODIC, - WB_REASON_LAPTOP_TIMER, - WB_REASON_FREE_MORE_MEM, - WB_REASON_FS_FREE_SPACE, - /* - * There is no bdi forker thread any more and works are done - * by emergency worker, however, this is TPs userland visible - * and we'll be exposing exactly the same information, - * so it has a mismatch name. - */ - WB_REASON_FORKER_THREAD, - - WB_REASON_MAX, -}; - -/* * A control structure which tells the writeback code what to do. These are * always on the stack, and hence need no locking. They are always initialised * in a manner such that unspecified fields are set to zero. -- Jens Axboe -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yw0-f198.google.com (mail-yw0-f198.google.com [209.85.161.198]) by kanga.kvack.org (Postfix) with ESMTP id 9AD846B0038 for ; Thu, 21 Sep 2017 13:33:46 -0400 (EDT) Received: by mail-yw0-f198.google.com with SMTP id s62so11802183ywg.3 for ; Thu, 21 Sep 2017 10:33:46 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org. [65.50.211.133]) by mx.google.com with ESMTPS id v127si415935ybe.128.2017.09.21.10.33.44 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 21 Sep 2017 10:33:44 -0700 (PDT) Date: Thu, 21 Sep 2017 10:33:39 -0700 From: Christoph Hellwig Subject: Re: [PATCH 7/7] fs-writeback: only allow one inflight and pending full flush Message-ID: <20170921173339.GA13384@infradead.org> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-8-git-send-email-axboe@kernel.dk> <20170921150510.GH8839@infradead.org> <728d4141-8d73-97fb-de08-90671c2897da@kernel.dk> <3682c4c2-6e8a-e883-9f62-455ea2944496@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3682c4c2-6e8a-e883-9f62-455ea2944496@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: Jens Axboe Cc: Christoph Hellwig , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com, jack@suse.cz On Thu, Sep 21, 2017 at 10:00:25AM -0600, Jens Axboe wrote: > Something like the below would fit on top to do that. Gets rid of the > allocation and embeds the work item for global start-all in the > bdi_writeback structure. Something like that. Although if we still kalloc the global wb we wouldn't have to expose all the details in the header. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f199.google.com (mail-wr0-f199.google.com [209.85.128.199]) by kanga.kvack.org (Postfix) with ESMTP id 898246B0033 for ; Fri, 22 Sep 2017 09:12:39 -0400 (EDT) Received: by mail-wr0-f199.google.com with SMTP id g50so1193447wra.4 for ; Fri, 22 Sep 2017 06:12:39 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id q2si1197546edg.302.2017.09.22.06.12.36 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 22 Sep 2017 06:12:37 -0700 (PDT) Date: Fri, 22 Sep 2017 15:12:32 +0200 From: Jan Kara Subject: Re: [PATCH 2/7] fs: kill 'nr_pages' argument from wakeup_flusher_threads() Message-ID: <20170922131232.GA22455@quack2.suse.cz> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-3-git-send-email-axboe@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1505921582-26709-3-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: Jens Axboe Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com, jack@suse.cz On Wed 20-09-17 09:32:57, Jens Axboe wrote: > Everybody is passing in 0 now, let's get rid of the argument. > > Signed-off-by: Jens Axboe Looks good. You can add: Reviewed-by: Jan Kara Honza > --- > fs/buffer.c | 2 +- > fs/fs-writeback.c | 9 ++++----- > fs/sync.c | 2 +- > include/linux/writeback.h | 2 +- > mm/vmscan.c | 2 +- > 5 files changed, 8 insertions(+), 9 deletions(-) > > diff --git a/fs/buffer.c b/fs/buffer.c > index 9471a445e370..cf71926797d3 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c > @@ -260,7 +260,7 @@ static void free_more_memory(void) > struct zoneref *z; > int nid; > > - wakeup_flusher_threads(0, WB_REASON_FREE_MORE_MEM); > + wakeup_flusher_threads(WB_REASON_FREE_MORE_MEM); > yield(); > > for_each_online_node(nid) { > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > index 245c430a2e41..bb6148dc6d24 100644 > --- a/fs/fs-writeback.c > +++ b/fs/fs-writeback.c > @@ -1947,12 +1947,12 @@ void wb_workfn(struct work_struct *work) > } > > /* > - * Start writeback of `nr_pages' pages. If `nr_pages' is zero, write back > - * the whole world. > + * Wakeup the flusher threads to start writeback of all currently dirty pages > */ > -void wakeup_flusher_threads(long nr_pages, enum wb_reason reason) > +void wakeup_flusher_threads(enum wb_reason reason) > { > struct backing_dev_info *bdi; > + long nr_pages; > > /* > * If we are expecting writeback progress we must submit plugged IO. 
> @@ -1960,8 +1960,7 @@ void wakeup_flusher_threads(long nr_pages, enum wb_reason reason) > if (blk_needs_flush_plug(current)) > blk_schedule_flush_plug(current); > > - if (!nr_pages) > - nr_pages = get_nr_dirty_pages(); > + nr_pages = get_nr_dirty_pages(); > > rcu_read_lock(); > list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) { > diff --git a/fs/sync.c b/fs/sync.c > index a576aa2e6b09..09f96a18dd93 100644 > --- a/fs/sync.c > +++ b/fs/sync.c > @@ -108,7 +108,7 @@ SYSCALL_DEFINE0(sync) > { > int nowait = 0, wait = 1; > > - wakeup_flusher_threads(0, WB_REASON_SYNC); > + wakeup_flusher_threads(WB_REASON_SYNC); > iterate_supers(sync_inodes_one_sb, NULL); > iterate_supers(sync_fs_one_sb, &nowait); > iterate_supers(sync_fs_one_sb, &wait); > diff --git a/include/linux/writeback.h b/include/linux/writeback.h > index d5815794416c..1f9c6db5e29a 100644 > --- a/include/linux/writeback.h > +++ b/include/linux/writeback.h > @@ -189,7 +189,7 @@ bool try_to_writeback_inodes_sb(struct super_block *, enum wb_reason reason); > bool try_to_writeback_inodes_sb_nr(struct super_block *, unsigned long nr, > enum wb_reason reason); > void sync_inodes_sb(struct super_block *); > -void wakeup_flusher_threads(long nr_pages, enum wb_reason reason); > +void wakeup_flusher_threads(enum wb_reason reason); > void inode_wait_for_writeback(struct inode *inode); > > /* writeback.h requires fs.h; it, too, is not included from here. */ > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 13d711dd8776..42a7fdd52d87 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1867,7 +1867,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, > * also allow kswapd to start writing pages during reclaim. > */ > if (stat.nr_unqueued_dirty == nr_taken) { > - wakeup_flusher_threads(0, WB_REASON_VMSCAN); > + wakeup_flusher_threads(WB_REASON_VMSCAN); > set_bit(PGDAT_DIRTY, &pgdat->flags); > } > > -- > 2.7.4 > -- Jan Kara SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f199.google.com (mail-wr0-f199.google.com [209.85.128.199]) by kanga.kvack.org (Postfix) with ESMTP id AF2FD6B0033 for ; Fri, 22 Sep 2017 09:14:43 -0400 (EDT) Received: by mail-wr0-f199.google.com with SMTP id 97so1211483wrb.1 for ; Fri, 22 Sep 2017 06:14:43 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id h9si3807778edk.169.2017.09.22.06.14.42 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 22 Sep 2017 06:14:42 -0700 (PDT) Date: Fri, 22 Sep 2017 15:14:38 +0200 From: Jan Kara Subject: Re: [PATCH 4/7] page-writeback: pass in '0' for nr_pages writeback in laptop mode Message-ID: <20170922131438.GB22455@quack2.suse.cz> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-5-git-send-email-axboe@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1505921582-26709-5-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: Jens Axboe Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com, jack@suse.cz On Wed 20-09-17 09:32:59, Jens Axboe wrote: > Laptop mode really wants to writeback the number of dirty > pages and inodes. 
Instead of calculating this in the caller, > just pass in 0 and let wakeup_flusher_threads() handle it. > > Use the new wakeup_flusher_threads_bdi() instead of rolling > our own. This changes the writeback to not be range cyclic, > but that should not matter for laptop mode flush-all > semantics. > > Acked-by: Johannes Weiner > Tested-by: Chris Mason > Signed-off-by: Jens Axboe > Reviewed-by: Jan Kara > Signed-off-by: Jens Axboe The subject of the patch looks stale. Honza -- Jan Kara SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f70.google.com (mail-wm0-f70.google.com [74.125.82.70]) by kanga.kvack.org (Postfix) with ESMTP id 64DD26B0033 for ; Fri, 22 Sep 2017 09:17:03 -0400 (EDT) Received: by mail-wm0-f70.google.com with SMTP id m127so1405072wmm.3 for ; Fri, 22 Sep 2017 06:17:03 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id 62si2520758edc.221.2017.09.22.06.17.02 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 22 Sep 2017 06:17:02 -0700 (PDT) Date: Fri, 22 Sep 2017 15:17:01 +0200 From: Jan Kara Subject: Re: [PATCH 6/7] fs-writeback: move nr_pages == 0 logic to one location Message-ID: <20170922131701.GC22455@quack2.suse.cz> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-7-git-send-email-axboe@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1505921582-26709-7-git-send-email-axboe@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: Jens Axboe Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com, jack@suse.cz On Wed 20-09-17 09:33:01, Jens Axboe wrote: > Now that we have no external callers of wb_start_writeback(), we > can shuffle the passing in of 'nr_pages'. Everybody passes in 0 > at this point, so just kill the argument and move the dirty > count retrieval to that function. > > Acked-by: Johannes Weiner > Tested-by: Chris Mason > Signed-off-by: Jens Axboe Looks good. You can add: Reviewed-by: Jan Kara Honza > --- > fs/fs-writeback.c | 42 ++++++++++++++++++------------------------ > 1 file changed, 18 insertions(+), 24 deletions(-) > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > index ecbd26d1121d..3916ea2484ae 100644 > --- a/fs/fs-writeback.c > +++ b/fs/fs-writeback.c > @@ -933,8 +933,19 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi, > > #endif /* CONFIG_CGROUP_WRITEBACK */ > > -static void wb_start_writeback(struct bdi_writeback *wb, long nr_pages, > - bool range_cyclic, enum wb_reason reason) > +/* > + * Add in the number of potentially dirty inodes, because each inode > + * write can dirty pagecache in the underlying blockdev. 
> + */ > +static unsigned long get_nr_dirty_pages(void) > +{ > + return global_node_page_state(NR_FILE_DIRTY) + > + global_node_page_state(NR_UNSTABLE_NFS) + > + get_nr_dirty_inodes(); > +} > + > +static void wb_start_writeback(struct bdi_writeback *wb, bool range_cyclic, > + enum wb_reason reason) > { > struct wb_writeback_work *work; > > @@ -954,7 +965,7 @@ static void wb_start_writeback(struct bdi_writeback *wb, long nr_pages, > } > > work->sync_mode = WB_SYNC_NONE; > - work->nr_pages = nr_pages; > + work->nr_pages = wb_split_bdi_pages(wb, get_nr_dirty_pages()); > work->range_cyclic = range_cyclic; > work->reason = reason; > work->auto_free = 1; > @@ -1814,17 +1825,6 @@ static struct wb_writeback_work *get_next_work_item(struct bdi_writeback *wb) > return work; > } > > -/* > - * Add in the number of potentially dirty inodes, because each inode > - * write can dirty pagecache in the underlying blockdev. > - */ > -static unsigned long get_nr_dirty_pages(void) > -{ > - return global_node_page_state(NR_FILE_DIRTY) + > - global_node_page_state(NR_UNSTABLE_NFS) + > - get_nr_dirty_inodes(); > -} > - > static long wb_check_background_flush(struct bdi_writeback *wb) > { > if (wb_over_bg_thresh(wb)) { > @@ -1951,7 +1951,7 @@ void wb_workfn(struct work_struct *work) > * write back the whole world. > */ > static void __wakeup_flusher_threads_bdi(struct backing_dev_info *bdi, > - long nr_pages, enum wb_reason reason) > + enum wb_reason reason) > { > struct bdi_writeback *wb; > > @@ -1959,17 +1959,14 @@ static void __wakeup_flusher_threads_bdi(struct backing_dev_info *bdi, > return; > > list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) > - wb_start_writeback(wb, wb_split_bdi_pages(wb, nr_pages), > - false, reason); > + wb_start_writeback(wb, false, reason); > } > > void wakeup_flusher_threads_bdi(struct backing_dev_info *bdi, > enum wb_reason reason) > { > - long nr_pages = get_nr_dirty_pages(); > - > rcu_read_lock(); > - __wakeup_flusher_threads_bdi(bdi, nr_pages, reason); > + __wakeup_flusher_threads_bdi(bdi, reason); > rcu_read_unlock(); > } > > @@ -1979,7 +1976,6 @@ void wakeup_flusher_threads_bdi(struct backing_dev_info *bdi, > void wakeup_flusher_threads(enum wb_reason reason) > { > struct backing_dev_info *bdi; > - long nr_pages; > > /* > * If we are expecting writeback progress we must submit plugged IO. > @@ -1987,11 +1983,9 @@ void wakeup_flusher_threads(enum wb_reason reason) > if (blk_needs_flush_plug(current)) > blk_schedule_flush_plug(current); > > - nr_pages = get_nr_dirty_pages(); > - > rcu_read_lock(); > list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) > - __wakeup_flusher_threads_bdi(bdi, nr_pages, reason); > + __wakeup_flusher_threads_bdi(bdi, reason); > rcu_read_unlock(); > } > > -- > 2.7.4 > -- Jan Kara SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f199.google.com (mail-pf0-f199.google.com [209.85.192.199]) by kanga.kvack.org (Postfix) with ESMTP id 80DA06B0038 for ; Mon, 25 Sep 2017 05:35:36 -0400 (EDT) Received: by mail-pf0-f199.google.com with SMTP id r83so12656385pfj.5 for ; Mon, 25 Sep 2017 02:35:36 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. 
[195.135.220.15]) by mx.google.com with ESMTPS id f69si3673324pfj.623.2017.09.25.02.35.35 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 25 Sep 2017 02:35:35 -0700 (PDT) Date: Mon, 25 Sep 2017 11:35:32 +0200 From: Jan Kara Subject: Re: [PATCH 7/7] fs-writeback: only allow one inflight and pending full flush Message-ID: <20170925093532.GC5741@quack2.suse.cz> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-8-git-send-email-axboe@kernel.dk> <20170921150510.GH8839@infradead.org> <728d4141-8d73-97fb-de08-90671c2897da@kernel.dk> <3682c4c2-6e8a-e883-9f62-455ea2944496@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3682c4c2-6e8a-e883-9f62-455ea2944496@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: Jens Axboe Cc: Christoph Hellwig , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com, jack@suse.cz On Thu 21-09-17 10:00:25, Jens Axboe wrote: > On 09/21/2017 09:36 AM, Jens Axboe wrote: > >> But more importantly once we are not guaranteed that we only have > >> a single global wb_writeback_work per bdi_writeback we should just > >> embedd that into struct bdi_writeback instead of dynamically > >> allocating it. > > > > We could do this as a followup. But right now the logic is that we > > can have on started (inflight), and still have one new queued. > > Something like the below would fit on top to do that. Gets rid of the > allocation and embeds the work item for global start-all in the > bdi_writeback structure. Hum, so when we consider stuff like embedded work item, I would somewhat prefer to handle this like we do for for_background and for_kupdate style writeback so that we don't have another special case. For these don't queue any item, we just queue writeback work into the workqueue (via wb_wakeup()). When flusher work gets processed wb_do_writeback() checks (after processing all normal writeback requests) whether conditions for these special writeback styles are met and if yes, it creates on-stack work item and processes it (see wb_check_old_data_flush() and wb_check_background_flush()). So in this case we would just set some flag in bdi_writeback when memory reclaim needs help and wb_do_writeback() would check for this flag and create and process writeback-all style writeback work. Granted this does not preserve ordering of requests (basically any specific request gets priority over writeback-whole-world request) but memory gets cleaned in either case so flusher should be doing what is needed. Honza -- Jan Kara SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f199.google.com (mail-io0-f199.google.com [209.85.223.199]) by kanga.kvack.org (Postfix) with ESMTP id 4AE3C6B0038 for ; Mon, 25 Sep 2017 10:48:49 -0400 (EDT) Received: by mail-io0-f199.google.com with SMTP id k101so13276777iod.1 for ; Mon, 25 Sep 2017 07:48:49 -0700 (PDT) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. 
[209.85.220.41]) by mx.google.com with SMTPS id f192sor2732182iof.296.2017.09.25.07.48.47 for (Google Transport Security); Mon, 25 Sep 2017 07:48:48 -0700 (PDT) Subject: Re: [PATCH 7/7] fs-writeback: only allow one inflight and pending full flush References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-8-git-send-email-axboe@kernel.dk> <20170921150510.GH8839@infradead.org> <728d4141-8d73-97fb-de08-90671c2897da@kernel.dk> <3682c4c2-6e8a-e883-9f62-455ea2944496@kernel.dk> <20170925093532.GC5741@quack2.suse.cz> From: Jens Axboe Message-ID: Date: Mon, 25 Sep 2017 08:48:46 -0600 MIME-Version: 1.0 In-Reply-To: <20170925093532.GC5741@quack2.suse.cz> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Jan Kara Cc: Christoph Hellwig , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com On 09/25/2017 03:35 AM, Jan Kara wrote: > On Thu 21-09-17 10:00:25, Jens Axboe wrote: >> On 09/21/2017 09:36 AM, Jens Axboe wrote: >>>> But more importantly once we are not guaranteed that we only have >>>> a single global wb_writeback_work per bdi_writeback we should just >>>> embedd that into struct bdi_writeback instead of dynamically >>>> allocating it. >>> >>> We could do this as a followup. But right now the logic is that we >>> can have on started (inflight), and still have one new queued. >> >> Something like the below would fit on top to do that. Gets rid of the >> allocation and embeds the work item for global start-all in the >> bdi_writeback structure. > > Hum, so when we consider stuff like embedded work item, I would somewhat > prefer to handle this like we do for for_background and for_kupdate style > writeback so that we don't have another special case. For these don't queue > any item, we just queue writeback work into the workqueue (via > wb_wakeup()). When flusher work gets processed wb_do_writeback() checks > (after processing all normal writeback requests) whether conditions for > these special writeback styles are met and if yes, it creates on-stack work > item and processes it (see wb_check_old_data_flush() and > wb_check_background_flush()). Thanks Jan, I think that's a really good suggestion and kills the special case completely. I'll rework the patch as a small series for 4.15. > So in this case we would just set some flag in bdi_writeback when memory > reclaim needs help and wb_do_writeback() would check for this flag and > create and process writeback-all style writeback work. Granted this does > not preserve ordering of requests (basically any specific request gets > priority over writeback-whole-world request) but memory gets cleaned in > either case so flusher should be doing what is needed. I don't think that matters, and we're already mostly there since we reject a request if one is pending. And at this point they are all identical "start all writeback" requests. -- Jens Axboe -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
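For context, the pre-existing pattern Jan refers to builds its work item on the stack only when the flusher thread actually runs; wb_check_background_flush() in fs/fs-writeback.c of that era reads roughly as follows (a simplified sketch, not a verbatim copy of the kernel source):

static long wb_check_background_flush(struct bdi_writeback *wb)
{
	if (wb_over_bg_thresh(wb)) {
		/* built on stack, only when the flusher thread decides to act */
		struct wb_writeback_work work = {
			.nr_pages	= LONG_MAX,
			.sync_mode	= WB_SYNC_NONE,
			.for_background	= 1,
			.range_cyclic	= 1,
			.reason		= WB_REASON_BACKGROUND,
		};

		return wb_writeback(wb, &work);
	}

	return 0;
}

wb_check_old_data_flush() has the same shape for periodic (kupdate-style) writeback, which is why neither of those paths needs to allocate a wb_writeback_work up front.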
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f72.google.com (mail-it0-f72.google.com [209.85.214.72]) by kanga.kvack.org (Postfix) with ESMTP id 5C8026B0038 for ; Mon, 25 Sep 2017 10:57:56 -0400 (EDT) Received: by mail-it0-f72.google.com with SMTP id 4so11358888itv.4 for ; Mon, 25 Sep 2017 07:57:56 -0700 (PDT) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. [209.85.220.41]) by mx.google.com with SMTPS id x6sor3492373itd.86.2017.09.25.07.57.55 for (Google Transport Security); Mon, 25 Sep 2017 07:57:55 -0700 (PDT) Subject: Re: [PATCH 4/7] page-writeback: pass in '0' for nr_pages writeback in laptop mode References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-5-git-send-email-axboe@kernel.dk> <20170921145929.GD8839@infradead.org> From: Jens Axboe Message-ID: Date: Mon, 25 Sep 2017 08:57:52 -0600 MIME-Version: 1.0 In-Reply-To: <20170921145929.GD8839@infradead.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Christoph Hellwig Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com, jack@suse.cz On 09/21/2017 08:59 AM, Christoph Hellwig wrote: > On Wed, Sep 20, 2017 at 09:32:59AM -0600, Jens Axboe wrote: >> Laptop mode really wants to writeback the number of dirty >> pages and inodes. Instead of calculating this in the caller, >> just pass in 0 and let wakeup_flusher_threads() handle it. >> >> Use the new wakeup_flusher_threads_bdi() instead of rolling >> our own. This changes the writeback to not be range cyclic, >> but that should not matter for laptop mode flush-all >> semantics. > > Looks good, > > Reviewed-by: Christoph Hellwig > > While we're at sorting out the laptop_mode_wb_timer mess: > can we move initializing and deleting it from the block code > to the backing-dev code given that it now doesn't assume anything > about block devices any more? Good point, I'll include that in a followup for 4.15. -- Jens Axboe -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f69.google.com (mail-pg0-f69.google.com [74.125.83.69]) by kanga.kvack.org (Postfix) with ESMTP id 6C8966B0038 for ; Thu, 28 Sep 2017 14:10:02 -0400 (EDT) Received: by mail-pg0-f69.google.com with SMTP id y17so2969778pgc.2 for ; Thu, 28 Sep 2017 11:10:02 -0700 (PDT) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. 
[209.85.220.41]) by mx.google.com with SMTPS id t2sor322324pgb.272.2017.09.28.11.10.00 for (Google Transport Security); Thu, 28 Sep 2017 11:10:00 -0700 (PDT) Subject: Re: [PATCH 7/7] fs-writeback: only allow one inflight and pending full flush References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-8-git-send-email-axboe@kernel.dk> <20170921150510.GH8839@infradead.org> <728d4141-8d73-97fb-de08-90671c2897da@kernel.dk> <3682c4c2-6e8a-e883-9f62-455ea2944496@kernel.dk> <20170925093532.GC5741@quack2.suse.cz> From: Jens Axboe Message-ID: <214d2bcb-0697-c051-0f36-20cd0d8702b0@kernel.dk> Date: Thu, 28 Sep 2017 20:09:50 +0200 MIME-Version: 1.0 In-Reply-To: <20170925093532.GC5741@quack2.suse.cz> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Jan Kara Cc: Christoph Hellwig , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com On 09/25/2017 11:35 AM, Jan Kara wrote: > On Thu 21-09-17 10:00:25, Jens Axboe wrote: >> On 09/21/2017 09:36 AM, Jens Axboe wrote: >>>> But more importantly once we are not guaranteed that we only have >>>> a single global wb_writeback_work per bdi_writeback we should just >>>> embedd that into struct bdi_writeback instead of dynamically >>>> allocating it. >>> >>> We could do this as a followup. But right now the logic is that we >>> can have on started (inflight), and still have one new queued. >> >> Something like the below would fit on top to do that. Gets rid of the >> allocation and embeds the work item for global start-all in the >> bdi_writeback structure. > > Hum, so when we consider stuff like embedded work item, I would somewhat > prefer to handle this like we do for for_background and for_kupdate style > writeback so that we don't have another special case. For these don't queue > any item, we just queue writeback work into the workqueue (via > wb_wakeup()). When flusher work gets processed wb_do_writeback() checks > (after processing all normal writeback requests) whether conditions for > these special writeback styles are met and if yes, it creates on-stack work > item and processes it (see wb_check_old_data_flush() and > wb_check_background_flush()). > > So in this case we would just set some flag in bdi_writeback when memory > reclaim needs help and wb_do_writeback() would check for this flag and > create and process writeback-all style writeback work. Granted this does > not preserve ordering of requests (basically any specific request gets > priority over writeback-whole-world request) but memory gets cleaned in > either case so flusher should be doing what is needed. How about something like the below? It's on top of the latest series, which is in my wb-start-all branch. It handles start_all like the background/kupdate style writeback, reusing the WB_start_all bit for that. On a plane, so untested, but it seems pretty straightforward. It changes the logic a little bit, as the WB_start_all bit isn't cleared until after we're done with a flush-all request. At this point it's truly one inflight at any point in time, not one inflight and one potentially queued. 
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 399619c97567..9e24d604c59c 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -53,7 +53,6 @@ struct wb_writeback_work { unsigned int for_background:1; unsigned int for_sync:1; /* sync(2) WB_SYNC_ALL writeback */ unsigned int auto_free:1; /* free on completion */ - unsigned int start_all:1; /* nr_pages == 0 (all) writeback */ enum wb_reason reason; /* why was writeback initiated? */ struct list_head list; /* pending work list */ @@ -947,8 +946,6 @@ static unsigned long get_nr_dirty_pages(void) static void wb_start_writeback(struct bdi_writeback *wb, enum wb_reason reason) { - struct wb_writeback_work *work; - if (!wb_has_dirty_io(wb)) return; @@ -958,35 +955,14 @@ static void wb_start_writeback(struct bdi_writeback *wb, enum wb_reason reason) * high frequency, causing pointless allocations of tons of * work items and keeping the flusher threads busy retrieving * that work. Ensure that we only allow one of them pending and - * inflight at the time. It doesn't matter if we race a little - * bit on this, so use the faster separate test/set bit variants. + * inflight at the time. */ - if (test_bit(WB_start_all, &wb->state)) + if (test_bit(WB_start_all, &wb->state) || + test_and_set_bit(WB_start_all, &wb->state)) return; - set_bit(WB_start_all, &wb->state); - - /* - * This is WB_SYNC_NONE writeback, so if allocation fails just - * wakeup the thread for old dirty data writeback - */ - work = kzalloc(sizeof(*work), - GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); - if (!work) { - clear_bit(WB_start_all, &wb->state); - trace_writeback_nowork(wb); - wb_wakeup(wb); - return; - } - - work->sync_mode = WB_SYNC_NONE; - work->nr_pages = wb_split_bdi_pages(wb, get_nr_dirty_pages()); - work->range_cyclic = 1; - work->reason = reason; - work->auto_free = 1; - work->start_all = 1; - - wb_queue_work(wb, work); + wb->start_all_reason = reason; + wb_wakeup(wb); } /** @@ -1838,14 +1814,6 @@ static struct wb_writeback_work *get_next_work_item(struct bdi_writeback *wb) list_del_init(&work->list); } spin_unlock_bh(&wb->work_lock); - - /* - * Once we start processing a work item that had !nr_pages, - * clear the wb state bit for that so we can allow more. 
- */ - if (work && work->start_all) - clear_bit(WB_start_all, &wb->state); - return work; } @@ -1901,6 +1869,30 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb) return 0; } +static long wb_check_start_all(struct bdi_writeback *wb) +{ + long nr_pages; + + if (!test_bit(WB_start_all, &wb->state)) + return 0; + + nr_pages = get_nr_dirty_pages(); + if (nr_pages) { + struct wb_writeback_work work = { + .nr_pages = wb_split_bdi_pages(wb, nr_pages), + .sync_mode = WB_SYNC_NONE, + .range_cyclic = 1, + .reason = wb->start_all_reason, + }; + + nr_pages = wb_writeback(wb, &work); + } + + clear_bit(WB_start_all, &wb->state); + return nr_pages; +} + + /* * Retrieve work items and do the writeback they describe */ @@ -1917,6 +1909,11 @@ static long wb_do_writeback(struct bdi_writeback *wb) } /* + * Check for a flush-everything request + */ + wrote += wb_check_start_all(wb); + + /* * Check for periodic writeback, kupdated() style */ wrote += wb_check_old_data_flush(wb); diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h index 420de5c7c7f9..f0f1df29d6b8 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -116,6 +116,7 @@ struct bdi_writeback { struct fprop_local_percpu completions; int dirty_exceeded; + int start_all_reason; spinlock_t work_lock; /* protects work_list & dwork scheduling */ struct list_head work_list; diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h index 9b57f014d79d..19a0ea08e098 100644 --- a/include/trace/events/writeback.h +++ b/include/trace/events/writeback.h @@ -286,7 +286,6 @@ DEFINE_EVENT(writeback_class, name, \ TP_PROTO(struct bdi_writeback *wb), \ TP_ARGS(wb)) -DEFINE_WRITEBACK_EVENT(writeback_nowork); DEFINE_WRITEBACK_EVENT(writeback_wake_background); TRACE_EVENT(writeback_bdi_register, -- Jens Axboe -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f69.google.com (mail-wm0-f69.google.com [74.125.82.69]) by kanga.kvack.org (Postfix) with ESMTP id BE8406B0038 for ; Fri, 29 Sep 2017 19:20:06 -0400 (EDT) Received: by mail-wm0-f69.google.com with SMTP id r74so694007wme.5 for ; Fri, 29 Sep 2017 16:20:06 -0700 (PDT) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. 
[209.85.220.41]) by mx.google.com with SMTPS id d21sor3010026edb.24.2017.09.29.16.20.04 for (Google Transport Security); Fri, 29 Sep 2017 16:20:04 -0700 (PDT) Subject: Re: [PATCH 7/7] fs-writeback: only allow one inflight and pending full flush From: Jens Axboe References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-8-git-send-email-axboe@kernel.dk> <20170921150510.GH8839@infradead.org> <728d4141-8d73-97fb-de08-90671c2897da@kernel.dk> <3682c4c2-6e8a-e883-9f62-455ea2944496@kernel.dk> <20170925093532.GC5741@quack2.suse.cz> <214d2bcb-0697-c051-0f36-20cd0d8702b0@kernel.dk> Message-ID: <028e9761-6188-a531-9b1e-32ad9353de13@kernel.dk> Date: Sat, 30 Sep 2017 01:20:02 +0200 MIME-Version: 1.0 In-Reply-To: <214d2bcb-0697-c051-0f36-20cd0d8702b0@kernel.dk> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Jan Kara Cc: Christoph Hellwig , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com On 09/28/2017 08:09 PM, Jens Axboe wrote: > On 09/25/2017 11:35 AM, Jan Kara wrote: >> On Thu 21-09-17 10:00:25, Jens Axboe wrote: >>> On 09/21/2017 09:36 AM, Jens Axboe wrote: >>>>> But more importantly once we are not guaranteed that we only have >>>>> a single global wb_writeback_work per bdi_writeback we should just >>>>> embedd that into struct bdi_writeback instead of dynamically >>>>> allocating it. >>>> >>>> We could do this as a followup. But right now the logic is that we >>>> can have on started (inflight), and still have one new queued. >>> >>> Something like the below would fit on top to do that. Gets rid of the >>> allocation and embeds the work item for global start-all in the >>> bdi_writeback structure. >> >> Hum, so when we consider stuff like embedded work item, I would somewhat >> prefer to handle this like we do for for_background and for_kupdate style >> writeback so that we don't have another special case. For these don't queue >> any item, we just queue writeback work into the workqueue (via >> wb_wakeup()). When flusher work gets processed wb_do_writeback() checks >> (after processing all normal writeback requests) whether conditions for >> these special writeback styles are met and if yes, it creates on-stack work >> item and processes it (see wb_check_old_data_flush() and >> wb_check_background_flush()). >> >> So in this case we would just set some flag in bdi_writeback when memory >> reclaim needs help and wb_do_writeback() would check for this flag and >> create and process writeback-all style writeback work. Granted this does >> not preserve ordering of requests (basically any specific request gets >> priority over writeback-whole-world request) but memory gets cleaned in >> either case so flusher should be doing what is needed. > > How about something like the below? It's on top of the latest series, > which is in my wb-start-all branch. It handles start_all like the > background/kupdate style writeback, reusing the WB_start_all bit for > that. > > On a plane, so untested, but it seems pretty straight forward. It > changes the logic a little bit, as the WB_start_all bit isn't cleared > until after we're done with a flush-all request. At this point it's > truly on inflight at any point in time, not one inflight and one > potentially queued. I tested it, with adding a patch that actually enables laptop completion triggers on blk-mq (not there before, an oversight, will send that out separately). 
It works fine for me, verified with tracing that we do trigger flushes with completions from laptop mode. -- Jens Axboe -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f200.google.com (mail-wr0-f200.google.com [209.85.128.200]) by kanga.kvack.org (Postfix) with ESMTP id 0A0C66B0033 for ; Mon, 2 Oct 2017 10:53:18 -0400 (EDT) Received: by mail-wr0-f200.google.com with SMTP id k15so3253702wrc.1 for ; Mon, 02 Oct 2017 07:53:17 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id q135si7978147wmb.46.2017.10.02.07.53.15 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 02 Oct 2017 07:53:15 -0700 (PDT) Date: Mon, 2 Oct 2017 16:53:12 +0200 From: Jan Kara Subject: Re: [PATCH 7/7] fs-writeback: only allow one inflight and pending full flush Message-ID: <20171002145312.GB11879@quack2.suse.cz> References: <1505921582-26709-1-git-send-email-axboe@kernel.dk> <1505921582-26709-8-git-send-email-axboe@kernel.dk> <20170921150510.GH8839@infradead.org> <728d4141-8d73-97fb-de08-90671c2897da@kernel.dk> <3682c4c2-6e8a-e883-9f62-455ea2944496@kernel.dk> <20170925093532.GC5741@quack2.suse.cz> <214d2bcb-0697-c051-0f36-20cd0d8702b0@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <214d2bcb-0697-c051-0f36-20cd0d8702b0@kernel.dk> Sender: owner-linux-mm@kvack.org List-ID: To: Jens Axboe Cc: Jan Kara , Christoph Hellwig , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com On Thu 28-09-17 20:09:50, Jens Axboe wrote: > On 09/25/2017 11:35 AM, Jan Kara wrote: > > On Thu 21-09-17 10:00:25, Jens Axboe wrote: > >> On 09/21/2017 09:36 AM, Jens Axboe wrote: > >>>> But more importantly once we are not guaranteed that we only have > >>>> a single global wb_writeback_work per bdi_writeback we should just > >>>> embedd that into struct bdi_writeback instead of dynamically > >>>> allocating it. > >>> > >>> We could do this as a followup. But right now the logic is that we > >>> can have on started (inflight), and still have one new queued. > >> > >> Something like the below would fit on top to do that. Gets rid of the > >> allocation and embeds the work item for global start-all in the > >> bdi_writeback structure. > > > > Hum, so when we consider stuff like embedded work item, I would somewhat > > prefer to handle this like we do for for_background and for_kupdate style > > writeback so that we don't have another special case. For these don't queue > > any item, we just queue writeback work into the workqueue (via > > wb_wakeup()). When flusher work gets processed wb_do_writeback() checks > > (after processing all normal writeback requests) whether conditions for > > these special writeback styles are met and if yes, it creates on-stack work > > item and processes it (see wb_check_old_data_flush() and > > wb_check_background_flush()). > > > > So in this case we would just set some flag in bdi_writeback when memory > > reclaim needs help and wb_do_writeback() would check for this flag and > > create and process writeback-all style writeback work. 
Granted this does > > not preserve ordering of requests (basically any specific request gets > > priority over writeback-whole-world request) but memory gets cleaned in > > either case so flusher should be doing what is needed. > > How about something like the below? It's on top of the latest series, > which is in my wb-start-all branch. It handles start_all like the > background/kupdate style writeback, reusing the WB_start_all bit for > that. > > On a plane, so untested, but it seems pretty straight forward. It > changes the logic a little bit, as the WB_start_all bit isn't cleared > until after we're done with a flush-all request. At this point it's > truly on inflight at any point in time, not one inflight and one > potentially queued. Thanks for looking into this! I like the change. Honza > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > index 399619c97567..9e24d604c59c 100644 > --- a/fs/fs-writeback.c > +++ b/fs/fs-writeback.c > @@ -53,7 +53,6 @@ struct wb_writeback_work { > unsigned int for_background:1; > unsigned int for_sync:1; /* sync(2) WB_SYNC_ALL writeback */ > unsigned int auto_free:1; /* free on completion */ > - unsigned int start_all:1; /* nr_pages == 0 (all) writeback */ > enum wb_reason reason; /* why was writeback initiated? */ > > struct list_head list; /* pending work list */ > @@ -947,8 +946,6 @@ static unsigned long get_nr_dirty_pages(void) > > static void wb_start_writeback(struct bdi_writeback *wb, enum wb_reason reason) > { > - struct wb_writeback_work *work; > - > if (!wb_has_dirty_io(wb)) > return; > > @@ -958,35 +955,14 @@ static void wb_start_writeback(struct bdi_writeback *wb, enum wb_reason reason) > * high frequency, causing pointless allocations of tons of > * work items and keeping the flusher threads busy retrieving > * that work. Ensure that we only allow one of them pending and > - * inflight at the time. It doesn't matter if we race a little > - * bit on this, so use the faster separate test/set bit variants. > + * inflight at the time. > */ > - if (test_bit(WB_start_all, &wb->state)) > + if (test_bit(WB_start_all, &wb->state) || > + test_and_set_bit(WB_start_all, &wb->state)) > return; > > - set_bit(WB_start_all, &wb->state); > - > - /* > - * This is WB_SYNC_NONE writeback, so if allocation fails just > - * wakeup the thread for old dirty data writeback > - */ > - work = kzalloc(sizeof(*work), > - GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); > - if (!work) { > - clear_bit(WB_start_all, &wb->state); > - trace_writeback_nowork(wb); > - wb_wakeup(wb); > - return; > - } > - > - work->sync_mode = WB_SYNC_NONE; > - work->nr_pages = wb_split_bdi_pages(wb, get_nr_dirty_pages()); > - work->range_cyclic = 1; > - work->reason = reason; > - work->auto_free = 1; > - work->start_all = 1; > - > - wb_queue_work(wb, work); > + wb->start_all_reason = reason; > + wb_wakeup(wb); > } > > /** > @@ -1838,14 +1814,6 @@ static struct wb_writeback_work *get_next_work_item(struct bdi_writeback *wb) > list_del_init(&work->list); > } > spin_unlock_bh(&wb->work_lock); > - > - /* > - * Once we start processing a work item that had !nr_pages, > - * clear the wb state bit for that so we can allow more. 
> - */ > - if (work && work->start_all) > - clear_bit(WB_start_all, &wb->state); > - > return work; > } > > @@ -1901,6 +1869,30 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb) > return 0; > } > > +static long wb_check_start_all(struct bdi_writeback *wb) > +{ > + long nr_pages; > + > + if (!test_bit(WB_start_all, &wb->state)) > + return 0; > + > + nr_pages = get_nr_dirty_pages(); > + if (nr_pages) { > + struct wb_writeback_work work = { > + .nr_pages = wb_split_bdi_pages(wb, nr_pages), > + .sync_mode = WB_SYNC_NONE, > + .range_cyclic = 1, > + .reason = wb->start_all_reason, > + }; > + > + nr_pages = wb_writeback(wb, &work); > + } > + > + clear_bit(WB_start_all, &wb->state); > + return nr_pages; > +} > + > + > /* > * Retrieve work items and do the writeback they describe > */ > @@ -1917,6 +1909,11 @@ static long wb_do_writeback(struct bdi_writeback *wb) > } > > /* > + * Check for a flush-everything request > + */ > + wrote += wb_check_start_all(wb); > + > + /* > * Check for periodic writeback, kupdated() style > */ > wrote += wb_check_old_data_flush(wb); > diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h > index 420de5c7c7f9..f0f1df29d6b8 100644 > --- a/include/linux/backing-dev-defs.h > +++ b/include/linux/backing-dev-defs.h > @@ -116,6 +116,7 @@ struct bdi_writeback { > > struct fprop_local_percpu completions; > int dirty_exceeded; > + int start_all_reason; > > spinlock_t work_lock; /* protects work_list & dwork scheduling */ > struct list_head work_list; > diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h > index 9b57f014d79d..19a0ea08e098 100644 > --- a/include/trace/events/writeback.h > +++ b/include/trace/events/writeback.h > @@ -286,7 +286,6 @@ DEFINE_EVENT(writeback_class, name, \ > TP_PROTO(struct bdi_writeback *wb), \ > TP_ARGS(wb)) > > -DEFINE_WRITEBACK_EVENT(writeback_nowork); > DEFINE_WRITEBACK_EVENT(writeback_wake_background); > > TRACE_EVENT(writeback_bdi_register, > > -- > Jens Axboe > -- Jan Kara SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org
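Taken together, the end state this thread converges on can be summarized as follows (a condensed sketch that simply juxtaposes the caller and flusher halves of the patch above; it does not add anything beyond what the patches already contain):

/* Caller side (sync, vmscan, laptop mode): no allocation, just a flag + wakeup. */
static void wb_start_writeback(struct bdi_writeback *wb, enum wb_reason reason)
{
	if (!wb_has_dirty_io(wb))
		return;

	/* Only one "write back everything" request per wb at a time */
	if (test_bit(WB_start_all, &wb->state) ||
	    test_and_set_bit(WB_start_all, &wb->state))
		return;

	wb->start_all_reason = reason;
	wb_wakeup(wb);
}

/* Flusher side: wb_do_writeback() calls this and builds the work item on stack. */
static long wb_check_start_all(struct bdi_writeback *wb)
{
	long nr_pages;

	if (!test_bit(WB_start_all, &wb->state))
		return 0;

	nr_pages = get_nr_dirty_pages();
	if (nr_pages) {
		struct wb_writeback_work work = {
			.nr_pages	= wb_split_bdi_pages(wb, nr_pages),
			.sync_mode	= WB_SYNC_NONE,
			.range_cyclic	= 1,
			.reason		= wb->start_all_reason,
		};

		nr_pages = wb_writeback(wb, &work);
	}

	clear_bit(WB_start_all, &wb->state);
	return nr_pages;
}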