[2/5] mm: vmscan: kick flushers when we encounter dirty pages on the LRU

Message ID 20170123181641.23938-3-hannes@cmpxchg.org
State New, archived
Series
  • mm: vmscan: fix kswapd writeback regression

Commit Message

Johannes Weiner Jan. 23, 2017, 6:16 p.m. UTC
Memory pressure can put dirty pages at the end of the LRU without
anybody running into dirty limits. Don't start writing individual
pages from kswapd while the flushers might be asleep.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/writeback.h        |  2 +-
 include/trace/events/writeback.h |  2 +-
 mm/vmscan.c                      | 18 +++++++++++++-----
 3 files changed, 15 insertions(+), 7 deletions(-)
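A minimal userspace sketch of the control flow this patch changes in shrink_inactive_list(). It is a simplified model, not kernel code: struct pgdat_sketch, its fields, and wake_flushers_stub() are hypothetical stand-ins for pgdat->flags, PGDAT_DIRTY and wakeup_flusher_threads(). When every page taken off the LRU was dirty but not yet queued for IO, the flushers are nudged first and the node is then flagged so kswapd may also write pages itself. The sketch assumes the caller has already returned early when nothing was isolated (nr_taken == 0).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-node state the patch touches. */
struct pgdat_sketch {
	bool dirty_flag;             /* models PGDAT_DIRTY in pgdat->flags */
	unsigned int flusher_kicks;  /* how often the flushers were woken */
	long last_nr_pages;          /* nr_pages passed on the last wakeup */
};

/* Stub for wakeup_flusher_threads(nr_pages, WB_REASON_VMSCAN). */
static void wake_flushers_stub(struct pgdat_sketch *pgdat, long nr_pages)
{
	pgdat->flusher_kicks++;
	pgdat->last_nr_pages = nr_pages;
}

/*
 * Models the patched check: if every isolated page was dirty but not
 * queued for IO, nudge the flushers, then let kswapd write pages too.
 */
static void check_unqueued_dirty(struct pgdat_sketch *pgdat,
				 unsigned long nr_unqueued_dirty,
				 unsigned long nr_taken)
{
	/* Assumption: the caller returned early if nothing was isolated. */
	assert(nr_taken > 0 && nr_unqueued_dirty <= nr_taken);

	if (nr_unqueued_dirty == nr_taken) {
		wake_flushers_stub(pgdat, 0);
		pgdat->dirty_flag = true;
	}
}
```

Before this patch only the flag would have been set; the point of the change is the extra wakeup, so that sleeping flushers get a chance to write the data in bulk instead of kswapd writing individual pages.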

Comments

Minchan Kim Jan. 26, 2017, 1:35 a.m. UTC | #1
On Mon, Jan 23, 2017 at 01:16:38PM -0500, Johannes Weiner wrote:
> Memory pressure can put dirty pages at the end of the LRU without
> anybody running into dirty limits. Don't start writing individual
> pages from kswapd while the flushers might be asleep.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Mel Gorman Jan. 26, 2017, 9:57 a.m. UTC | #2
On Mon, Jan 23, 2017 at 01:16:38PM -0500, Johannes Weiner wrote:
> Memory pressure can put dirty pages at the end of the LRU without
> anybody running into dirty limits. Don't start writing individual
> pages from kswapd while the flushers might be asleep.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

I don't understand the motivation for checking the wb_reason name. Maybe
it was easier to eyeball while reading ftraces. The comment about the
flusher not doing its job could also be as simple as the writes took
place and clean pages were reclaimed before dirty_expire was reached.
Not impossible if there was a light writer combined with a heavy reader
or a large number of anonymous faults.

Anyway;

Acked-by: Mel Gorman <mgorman@suse.de>
Michal Hocko Jan. 26, 2017, 1:16 p.m. UTC | #3
On Mon 23-01-17 13:16:38, Johannes Weiner wrote:
> Memory pressure can put dirty pages at the end of the LRU without
> anybody running into dirty limits. Don't start writing individual
> pages from kswapd while the flushers might be asleep.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Michal Hocko <mhocko@suse.com>

Johannes Weiner Jan. 26, 2017, 5:47 p.m. UTC | #4
On Thu, Jan 26, 2017 at 09:57:45AM +0000, Mel Gorman wrote:
> On Mon, Jan 23, 2017 at 01:16:38PM -0500, Johannes Weiner wrote:
> > Memory pressure can put dirty pages at the end of the LRU without
> > anybody running into dirty limits. Don't start writing individual
> > pages from kswapd while the flushers might be asleep.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> I don't understand the motivation for checking the wb_reason name. Maybe
> it was easier to eyeball while reading ftraces. The comment about the
> flusher not doing its job could also be as simple as the writes took
> place and clean pages were reclaimed before dirty_expire was reached.
> Not impossible if there was a light writer combined with a heavy reader
> or a large number of anonymous faults.

The name change was only because try_to_free_pages() wasn't the only
function doing this flusher wakeup anymore. I associate that name with
direct reclaim rather than reclaim in general, so I figured this makes
more sense. No strong feelings either way, but I doubt this will break
anything in userspace.
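The rename is visible to userspace only through the string the tracepoint prints for each wb_reason value. The trace header builds that mapping from a single EM() list (WB_WORK_REASON). The sketch below is a plain userspace X-macro illustrating the same one-list-two-tables pattern; it is not the kernel's tracing machinery, lists only the first five reasons, and wb_reason_name() and wb_reason_from_name() are hypothetical helpers.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical userspace X-macro modelled on WB_WORK_REASON. Only the
 * first five reasons are listed; the real enum has more members.
 */
#define WB_REASONS_SKETCH(X)				\
	X(WB_REASON_BACKGROUND,   "background")		\
	X(WB_REASON_VMSCAN,       "vmscan")		\
	X(WB_REASON_SYNC,         "sync")		\
	X(WB_REASON_PERIODIC,     "periodic")		\
	X(WB_REASON_LAPTOP_TIMER, "laptop_timer")

#define AS_ENUM(sym, str)   sym,
#define AS_STRING(sym, str) [sym] = str,

/* One list generates both the enum and its string table. */
enum wb_reason { WB_REASONS_SKETCH(AS_ENUM) WB_REASON_SKETCH_MAX };

static const char *const wb_reason_names[] = { WB_REASONS_SKETCH(AS_STRING) };

/* Enum value to the string a trace consumer would see. */
static const char *wb_reason_name(enum wb_reason r)
{
	if ((unsigned int)r < (unsigned int)WB_REASON_SKETCH_MAX)
		return wb_reason_names[r];
	return "unknown";
}

/* Reverse lookup, as a trace-parsing script might do; -1 if not found. */
static int wb_reason_from_name(const char *name)
{
	assert(name != NULL);
	for (int i = 0; i < WB_REASON_SKETCH_MAX; i++)
		if (strcmp(wb_reason_names[i], name) == 0)
			return i;
	return -1;
}
```

With this layout, renaming WB_REASON_TRY_TO_FREE_PAGES to WB_REASON_VMSCAN changes the printed string from "try_to_free_pages" to "vmscan", so a script that matches on the old name would no longer find it; that is the kind of userspace breakage weighed in this thread.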

The comment on dirty expiration is a good point. Let's add this to the
list of reasons why reclaim might run into dirty data. Fixlet below.

> Acked-by: Mel Gorman <mgorman@suse.de>

Thanks!

---

From 44c4289ab85c0af66cb06de6d1bb72a5c67fd755 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Thu, 26 Jan 2017 12:41:39 -0500
Subject: [PATCH] mm: vmscan: kick flushers when we encounter dirty pages on
 the LRU fix

Mention dirty expiration as a condition: we need dirty data that is
too recent for periodic flushing and not large enough for waking up
limit flushing. As per Mel.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/vmscan.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 56ea8d24041f..ccd4bf952cb3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1799,15 +1799,14 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		/*
 		 * If dirty pages are scanned that are not queued for IO, it
 		 * implies that flushers are not doing their job. This can
-		 * happen when memory pressure pushes dirty pages to the end
-		 * of the LRU without the dirty limits being breached. It can
-		 * also happen when the proportion of dirty pages grows not
-		 * through writes but through memory pressure reclaiming all
-		 * the clean cache. And in some cases, the flushers simply
-		 * cannot keep up with the allocation rate. Nudge the flusher
-		 * threads in case they are asleep, but also allow kswapd to
-		 * start writing pages during reclaim.
+		 * happen when memory pressure pushes dirty pages to the end of
+		 * the LRU before the dirty limits are breached and the dirty
+		 * data has expired. It can also happen when the proportion of
+		 * dirty pages grows not through writes but through memory
+		 * pressure reclaiming all the clean cache. And in some cases,
+		 * the flushers simply cannot keep up with the allocation
+		 * rate. Nudge the flusher threads in case they are asleep, but
+		 * also allow kswapd to start writing pages during reclaim.
 		 */
 		if (stat.nr_unqueued_dirty == nr_taken) {
 			wakeup_flusher_threads(0, WB_REASON_VMSCAN);
Mel Gorman Jan. 26, 2017, 6:47 p.m. UTC | #5
On Thu, Jan 26, 2017 at 12:47:39PM -0500, Johannes Weiner wrote:
> On Thu, Jan 26, 2017 at 09:57:45AM +0000, Mel Gorman wrote:
> > On Mon, Jan 23, 2017 at 01:16:38PM -0500, Johannes Weiner wrote:
> > > Memory pressure can put dirty pages at the end of the LRU without
> > > anybody running into dirty limits. Don't start writing individual
> > > pages from kswapd while the flushers might be asleep.
> > > 
> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > 
> > I don't understand the motivation for checking the wb_reason name. Maybe
> > it was easier to eyeball while reading ftraces. The comment about the
> > flusher not doing its job could also be as simple as the writes took
> > place and clean pages were reclaimed before dirty_expire was reached.
> > Not impossible if there was a light writer combined with a heavy reader
> > or a large number of anonymous faults.
> 
> The name change was only because try_to_free_pages() wasn't the only
> function doing this flusher wakeup anymore.

Ah, ok. I was thinking of it in terms of "we are trying to free pages"
and not the specific name of the direct reclaim function.

> I associate that name with
> direct reclaim rather than reclaim in general, so I figured this makes
> more sense. No strong feelings either way, but I doubt this will break
> anything in userspace.
> 

Doubtful, maybe some tracing analysis scripts but they routinely have
to adapt.

> The comment on dirty expiration is a good point. Let's add this to the
> list of reasons why reclaim might run into dirty data. Fixlet below.
> 

Looks good.

Patch

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 5527d910ba3d..a3c0cbd7c888 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -46,7 +46,7 @@  enum writeback_sync_modes {
  */
 enum wb_reason {
 	WB_REASON_BACKGROUND,
-	WB_REASON_TRY_TO_FREE_PAGES,
+	WB_REASON_VMSCAN,
 	WB_REASON_SYNC,
 	WB_REASON_PERIODIC,
 	WB_REASON_LAPTOP_TIMER,
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 2ccd9ccbf9ef..7bd8783a590f 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -31,7 +31,7 @@ 
 
 #define WB_WORK_REASON							\
 	EM( WB_REASON_BACKGROUND,		"background")		\
-	EM( WB_REASON_TRY_TO_FREE_PAGES,	"try_to_free_pages")	\
+	EM( WB_REASON_VMSCAN,			"vmscan")		\
 	EM( WB_REASON_SYNC,			"sync")			\
 	EM( WB_REASON_PERIODIC,			"periodic")		\
 	EM( WB_REASON_LAPTOP_TIMER,		"laptop_timer")		\
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0d05f7f3b532..56ea8d24041f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1798,12 +1798,20 @@  shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 
 		/*
 		 * If dirty pages are scanned that are not queued for IO, it
-		 * implies that flushers are not keeping up. In this case, flag
-		 * the pgdat PGDAT_DIRTY and kswapd will start writing pages from
-		 * reclaim context.
+		 * implies that flushers are not doing their job. This can
+		 * happen when memory pressure pushes dirty pages to the end
+		 * of the LRU without the dirty limits being breached. It can
+		 * also happen when the proportion of dirty pages grows not
+		 * through writes but through memory pressure reclaiming all
+		 * the clean cache. And in some cases, the flushers simply
+		 * cannot keep up with the allocation rate. Nudge the flusher
+		 * threads in case they are asleep, but also allow kswapd to
+		 * start writing pages during reclaim.
 		 */
-		if (stat.nr_unqueued_dirty == nr_taken)
+		if (stat.nr_unqueued_dirty == nr_taken) {
+			wakeup_flusher_threads(0, WB_REASON_VMSCAN);
 			set_bit(PGDAT_DIRTY, &pgdat->flags);
+		}
 
 		/*
 		 * If kswapd scans pages marked marked for immediate
@@ -2787,7 +2795,7 @@  static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 		writeback_threshold = sc->nr_to_reclaim + sc->nr_to_reclaim / 2;
 		if (total_scanned > writeback_threshold) {
 			wakeup_flusher_threads(laptop_mode ? 0 : total_scanned,
-						WB_REASON_TRY_TO_FREE_PAGES);
+						WB_REASON_VMSCAN);
 			sc->may_writepage = 1;
 		}
 	} while (--sc->priority >= 0);
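The unchanged threshold logic around the renamed reason in do_try_to_free_pages() is simple arithmetic: direct reclaim wakes the flushers once it has scanned more than 1.5 times its reclaim target, using integer division as in the original expression. The struct and function below are a hypothetical reduction, not kernel code. In this kernel version a nr_pages of 0 appears to mean "write back all dirty pages", which is also what the new call in shrink_inactive_list() passes; treat that reading as an assumption.

```c
#include <stdbool.h>

/* Hypothetical summary of what one reclaim iteration would decide. */
struct wb_decision {
	bool wake;              /* would wakeup_flusher_threads() be called? */
	unsigned long nr_pages; /* its first argument, if so */
	bool may_writepage;     /* would sc->may_writepage be set? */
};

static struct wb_decision direct_reclaim_decision(unsigned long nr_to_reclaim,
						  unsigned long total_scanned,
						  bool laptop_mode)
{
	/* 1.5x the reclaim target; integer division rounds the half down. */
	unsigned long writeback_threshold = nr_to_reclaim + nr_to_reclaim / 2;
	struct wb_decision d = { false, 0, false };

	if (total_scanned > writeback_threshold) {
		d.wake = true;
		/* Assumed: 0 asks for all dirty pages rather than a page count. */
		d.nr_pages = laptop_mode ? 0 : total_scanned;
		d.may_writepage = true;
	}
	return d;
}
```

For example, with nr_to_reclaim = 32 the threshold is 32 + 16 = 48, so the flushers are woken only once more than 48 pages have been scanned; with nr_to_reclaim = 33 the threshold is still 49 because 33 / 2 rounds down to 16.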