linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 -mm 0/3] kswapd vs compaction improvements
@ 2012-01-26 19:54 Rik van Riel
  2012-01-26 19:59 ` [PATCH v3 -mm 1/3] mm: reclaim at order 0 when compaction is enabled Rik van Riel
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Rik van Riel @ 2012-01-26 19:54 UTC (permalink / raw)
  To: linux-mm
  Cc: lkml, Andrea Arcangeli, Mel Gorman, Johannes Weiner,
	Andrew Morton, Minchan Kim, KOSAKI Motohiro

Running a kernel with compaction enabled under some memory pressure has
caused my system to run into all kinds of trouble.

One of the more obvious problems is that while kswapd does not try to 
free contiguous pages when CONFIG_COMPACTION is enabled, it does
continue reclaiming until enough contiguous pages have become available.
This can lead to enormous swap storms, where lots of memory is freed
and a fair amount of the working set can end up in swap.

A second problem is that memory compaction currently does nothing for
network allocations in the receive path, e.g. jumbo frames, because
those are done in interrupt context.  In the past we have tried to
have kswapd invoke memory compaction, but it used too much CPU time.
The second patch in this series has kswapd invoke compaction very
carefully, taking into account the desired page order, as well as
zone->compaction_deferred.

I have tested these patches on my system, and things seem to behave
well. Any tests and reviews would be appreciated.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 -mm 1/3] mm: reclaim at order 0 when compaction is enabled
  2012-01-26 19:54 [PATCH v3 -mm 0/3] kswapd vs compaction improvements Rik van Riel
@ 2012-01-26 19:59 ` Rik van Riel
  2012-01-27  9:13   ` Hillf Danton
                     ` (3 more replies)
  2012-01-26 19:59 ` [PATCH v3 -mm 2/3] mm: kswapd carefully call compaction Rik van Riel
  2012-01-26 20:01 ` [PATCH v3 -mm 3/3] mm: only defer compaction for failed order and higher Rik van Riel
  2 siblings, 4 replies; 12+ messages in thread
From: Rik van Riel @ 2012-01-26 19:59 UTC (permalink / raw)
  To: linux-mm
  Cc: lkml, Andrea Arcangeli, Mel Gorman, Johannes Weiner,
	Andrew Morton, Minchan Kim, KOSAKI Motohiro

When built with CONFIG_COMPACTION, kswapd should not try to free
contiguous pages, because it is not trying hard enough to have
a real chance at being successful, but still disrupts the LRU
enough to break other things.

Do not do higher order page isolation unless we really are in
lumpy reclaim mode.

Stop reclaiming pages once we have enough free pages that
compaction can deal with things, and we hit the normal order 0
watermarks used by kswapd.

Also remove a line of code that increments balanced right before
exiting the function.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
-v4: further cleanups suggested by Mel Gorman
 mm/vmscan.c |   43 +++++++++++++++++++++++++++++--------------
 1 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2880396..2e2e43d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1139,7 +1139,7 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
  * @mz:		The mem_cgroup_zone to pull pages from.
  * @dst:	The temp list to put pages on to.
  * @nr_scanned:	The number of pages that were scanned.
- * @order:	The caller's attempted allocation order
+ * @sc:		The scan_control struct for this reclaim session
  * @mode:	One of the LRU isolation modes
  * @active:	True [1] if isolating active pages
  * @file:	True [1] if isolating file [!anon] pages
@@ -1148,8 +1148,8 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
  */
 static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		struct mem_cgroup_zone *mz, struct list_head *dst,
-		unsigned long *nr_scanned, int order, isolate_mode_t mode,
-		int active, int file)
+		unsigned long *nr_scanned, struct scan_control *sc,
+		isolate_mode_t mode, int active, int file)
 {
 	struct lruvec *lruvec;
 	struct list_head *src;
@@ -1195,7 +1195,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			BUG();
 		}
 
-		if (!order)
+		if (!sc->order || !(sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM))
 			continue;
 
 		/*
@@ -1209,8 +1209,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		 */
 		zone_id = page_zone_id(page);
 		page_pfn = page_to_pfn(page);
-		pfn = page_pfn & ~((1 << order) - 1);
-		end_pfn = pfn + (1 << order);
+		pfn = page_pfn & ~((1 << sc->order) - 1);
+		end_pfn = pfn + (1 << sc->order);
 		for (; pfn < end_pfn; pfn++) {
 			struct page *cursor_page;
 
@@ -1276,7 +1276,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 
 	*nr_scanned = scan;
 
-	trace_mm_vmscan_lru_isolate(order,
+	trace_mm_vmscan_lru_isolate(sc->order,
 			nr_to_scan, scan,
 			nr_taken,
 			nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed,
@@ -1535,7 +1535,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 	spin_lock_irq(&zone->lru_lock);
 
 	nr_taken = isolate_lru_pages(nr_to_scan, mz, &page_list,
-				     &nr_scanned, sc->order,
+				     &nr_scanned, sc,
 				     reclaim_mode, 0, file);
 	if (global_reclaim(sc)) {
 		zone->pages_scanned += nr_scanned;
@@ -1713,7 +1713,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	spin_lock_irq(&zone->lru_lock);
 
 	nr_taken = isolate_lru_pages(nr_to_scan, mz, &l_hold,
-				     &nr_scanned, sc->order,
+				     &nr_scanned, sc,
 				     reclaim_mode, 1, file);
 	if (global_reclaim(sc))
 		zone->pages_scanned += nr_scanned;
@@ -2754,7 +2754,7 @@ loop_again:
 		 */
 		for (i = 0; i <= end_zone; i++) {
 			struct zone *zone = pgdat->node_zones + i;
-			int nr_slab;
+			int nr_slab, testorder;
 			unsigned long balance_gap;
 
 			if (!populated_zone(zone))
@@ -2787,7 +2787,20 @@ loop_again:
 				(zone->present_pages +
 					KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
 				KSWAPD_ZONE_BALANCE_GAP_RATIO);
-			if (!zone_watermark_ok_safe(zone, order,
+			/*
+			 * Kswapd reclaims only single pages with compaction
+			 * enabled. Trying too hard to reclaim until contiguous
+			 * free pages have become available can hurt performance
+			 * by evicting too much useful data from memory.
+			 * Do not reclaim more than needed for compaction.
+			 */
+			testorder = order;
+			if (COMPACTION_BUILD && order &&
+					compaction_suitable(zone, order) !=
+						COMPACT_SKIPPED)
+				testorder = 0;
+
+			if (!zone_watermark_ok_safe(zone, testorder,
 					high_wmark_pages(zone) + balance_gap,
 					end_zone, 0)) {
 				shrink_zone(priority, zone, &sc);
@@ -2816,7 +2829,7 @@ loop_again:
 				continue;
 			}
 
-			if (!zone_watermark_ok_safe(zone, order,
+			if (!zone_watermark_ok_safe(zone, testorder,
 					high_wmark_pages(zone), end_zone, 0)) {
 				all_zones_ok = 0;
 				/*
@@ -2913,6 +2926,10 @@ out:
 			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
 				continue;
 
+			/* Would compaction fail due to lack of free memory? */
+			if (compaction_suitable(zone, order) == COMPACT_SKIPPED)
+				goto loop_again;
+
 			/* Confirm the zone is balanced for order-0 */
 			if (!zone_watermark_ok(zone, 0,
 					high_wmark_pages(zone), 0, 0)) {
@@ -2922,8 +2939,6 @@ out:
 
 			/* If balanced, clear the congested flag */
 			zone_clear_flag(zone, ZONE_CONGESTED);
-			if (i <= *classzone_idx)
-				balanced += zone->present_pages;
 		}
 	}
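
To illustrate the effect of the testorder check above, here is a minimal
standalone C sketch of the decision (any C99 compiler will do).  The
zone_model struct and the watermark_ok() helper are simplified stand-ins
invented for this example, not the real mm/ interfaces; only the shape of
the decision follows the patch.

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins for the real zone state and helpers in mm/. */
struct zone_model {
	unsigned long free_pages;
	unsigned long high_wmark;	/* order-0 high watermark */
	bool compaction_possible;	/* compaction_suitable() != COMPACT_SKIPPED */
};

/*
 * Crude stand-in for zone_watermark_ok_safe(): demand exponentially more
 * free memory for higher orders (the real check walks per-order free lists).
 */
static bool watermark_ok(const struct zone_model *z, int order)
{
	return z->free_pages >= (z->high_wmark << order);
}

/*
 * Mirrors the testorder decision added to balance_pgdat(): when compaction
 * can take over, kswapd only needs the zone balanced at order 0 instead of
 * reclaiming until contiguous pages appear on their own.
 */
static bool kswapd_keeps_reclaiming(const struct zone_model *z, int order)
{
	int testorder = order;

	if (order && z->compaction_possible)
		testorder = 0;

	return !watermark_ok(z, testorder);
}

int main(void)
{
	struct zone_model z = {
		.free_pages = 1500,
		.high_wmark = 1024,
		.compaction_possible = true,
	};

	/* Order-3 wakeup, e.g. for a jumbo frame allocation. */
	printf("with compaction:    keep reclaiming? %d\n",
	       kswapd_keeps_reclaiming(&z, 3));

	z.compaction_possible = false;	/* old behaviour: keep reclaiming */
	printf("without compaction: keep reclaiming? %d\n",
	       kswapd_keeps_reclaiming(&z, 3));
	return 0;
}

With compaction available, the order-3 wakeup is satisfied by the order-0
watermark alone, which is what prevents the swap storms described in the
cover letter.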
 


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 -mm 2/3] mm: kswapd carefully call compaction
  2012-01-26 19:54 [PATCH v3 -mm 0/3] kswapd vs compaction improvements Rik van Riel
  2012-01-26 19:59 ` [PATCH v3 -mm 1/3] mm: reclaim at order 0 when compaction is enabled Rik van Riel
@ 2012-01-26 19:59 ` Rik van Riel
  2012-01-27 23:36   ` Andrew Morton
  2012-01-26 20:01 ` [PATCH v3 -mm 3/3] mm: only defer compaction for failed order and higher Rik van Riel
  2 siblings, 1 reply; 12+ messages in thread
From: Rik van Riel @ 2012-01-26 19:59 UTC (permalink / raw)
  To: linux-mm
  Cc: lkml, Andrea Arcangeli, Mel Gorman, Johannes Weiner,
	Andrew Morton, Minchan Kim, KOSAKI Motohiro

With CONFIG_COMPACTION enabled, kswapd does not try to free
contiguous free pages, even when it is woken for a higher order
request.

This could be bad for eg. jumbo frame network allocations, which
are done from interrupt context and cannot compact memory themselves.
Higher than before allocation failure rates in the network receive
path have been observed in kernels with compaction enabled.

Teach kswapd to defragment the memory zones in a node, but only
if required and compaction is not deferred in a zone.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/compaction.h |    6 +++++
 mm/compaction.c            |   53 +++++++++++++++++++++++++++++---------------
 mm/vmscan.c                |    9 +++++++
 3 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index bb2bbdb..7a9323a 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -23,6 +23,7 @@ extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
 			int order, gfp_t gfp_mask, nodemask_t *mask,
 			bool sync);
+extern int compact_pgdat(pg_data_t *pgdat, int order);
 extern unsigned long compaction_suitable(struct zone *zone, int order);
 
 /* Do not skip compaction more than 64 times */
@@ -62,6 +63,11 @@ static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	return COMPACT_CONTINUE;
 }
 
+static inline int compact_pgdat(pg_data_t *pgdat, int order)
+{
+	return COMPACT_CONTINUE;
+}
+
 static inline unsigned long compaction_suitable(struct zone *zone, int order)
 {
 	return COMPACT_SKIPPED;
diff --git a/mm/compaction.c b/mm/compaction.c
index 71a58f6..51ece75 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -653,44 +653,61 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 
 
 /* Compact all zones within a node */
-static int compact_node(int nid)
+static int __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
 {
 	int zoneid;
-	pg_data_t *pgdat;
 	struct zone *zone;
 
-	if (nid < 0 || nid >= nr_node_ids || !node_online(nid))
-		return -EINVAL;
-	pgdat = NODE_DATA(nid);
-
 	/* Flush pending updates to the LRU lists */
 	lru_add_drain_all();
 
 	for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
-		struct compact_control cc = {
-			.nr_freepages = 0,
-			.nr_migratepages = 0,
-			.order = -1,
-			.sync = true,
-		};
 
 		zone = &pgdat->node_zones[zoneid];
 		if (!populated_zone(zone))
 			continue;
 
-		cc.zone = zone;
-		INIT_LIST_HEAD(&cc.freepages);
-		INIT_LIST_HEAD(&cc.migratepages);
+		cc->nr_freepages = 0;
+		cc->nr_migratepages = 0;
+		cc->zone = zone;
+		INIT_LIST_HEAD(&cc->freepages);
+		INIT_LIST_HEAD(&cc->migratepages);
 
-		compact_zone(zone, &cc);
+		if (cc->order < 0 || !compaction_deferred(zone))
+			compact_zone(zone, cc);
 
-		VM_BUG_ON(!list_empty(&cc.freepages));
-		VM_BUG_ON(!list_empty(&cc.migratepages));
+		VM_BUG_ON(!list_empty(&cc->freepages));
+		VM_BUG_ON(!list_empty(&cc->migratepages));
 	}
 
 	return 0;
 }
 
+int compact_pgdat(pg_data_t *pgdat, int order)
+{
+	struct compact_control cc = {
+		.order = order,
+		.sync = false,
+	};
+
+	return __compact_pgdat(pgdat, &cc);
+}
+
+static int compact_node(int nid)
+{
+	pg_data_t *pgdat;
+	struct compact_control cc = {
+		.order = -1,
+		.sync = true,
+	};
+
+	if (nid < 0 || nid >= nr_node_ids || !node_online(nid))
+		return -EINVAL;
+	pgdat = NODE_DATA(nid);
+
+	return __compact_pgdat(pgdat, &cc);
+}
+
 /* Compact all nodes in the system */
 static int compact_nodes(void)
 {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0398fab..fa17794 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2673,6 +2673,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 	int priority;
 	int i;
 	int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
+	int zones_need_compaction = 1;
 	unsigned long total_scanned;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	unsigned long nr_soft_reclaimed;
@@ -2937,9 +2938,17 @@ out:
 				goto loop_again;
 			}
 
+			/* Check if the memory needs to be defragmented. */
+			if (zone_watermark_ok(zone, order,
+				    low_wmark_pages(zone), *classzone_idx, 0))
+				zones_need_compaction = 0;
+
 			/* If balanced, clear the congested flag */
 			zone_clear_flag(zone, ZONE_CONGESTED);
 		}
+
+		if (zones_need_compaction)
+			compact_pgdat(pgdat, order);
 	}
 
 	/*
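
As a rough illustration of when kswapd now calls compact_pgdat(), below is
a self-contained C sketch that condenses the zones_need_compaction check at
the end of balance_pgdat() and the per-zone loop of __compact_pgdat() into
one function.  The zone_model struct, its fields and compact_zone_model()
are simplified stand-ins for this example only.

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins; only the fields needed for the decision are kept. */
struct zone_model {
	const char *name;
	bool order_ok;			/* zone_watermark_ok(zone, order, low_wmark, ...) */
	bool compaction_deferred;	/* compaction_deferred(zone) */
};

/* Stand-in for compact_zone(): just report what would be compacted. */
static void compact_zone_model(const struct zone_model *z)
{
	printf("  async compaction of %s\n", z->name);
}

/*
 * Condenses the end of balance_pgdat() and the per-zone loop of
 * __compact_pgdat() into one function: once the node is balanced at
 * order 0, compact it only if no zone can already satisfy the original
 * order, and skip zones where compaction is currently deferred.
 */
static void maybe_compact_pgdat(const struct zone_model *zones, int nr, int order)
{
	bool zones_need_compaction = true;

	for (int i = 0; i < nr; i++)
		if (zones[i].order_ok)
			zones_need_compaction = false;

	if (!order || !zones_need_compaction)
		return;

	for (int i = 0; i < nr; i++)
		if (!zones[i].compaction_deferred)
			compact_zone_model(&zones[i]);
}

int main(void)
{
	struct zone_model zones[] = {
		{ "DMA32",  false, false },
		{ "Normal", false, true },	/* deferred: left alone */
	};

	/* Order-3 wakeup and no zone has a block that large yet. */
	maybe_compact_pgdat(zones, 2, 3);
	return 0;
}
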

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 -mm 3/3] mm: only defer compaction for failed order and higher
  2012-01-26 19:54 [PATCH v3 -mm 0/3] kswapd vs compaction improvements Rik van Riel
  2012-01-26 19:59 ` [PATCH v3 -mm 1/3] mm: reclaim at order 0 when compaction is enabled Rik van Riel
  2012-01-26 19:59 ` [PATCH v3 -mm 2/3] mm: kswapd carefully call compaction Rik van Riel
@ 2012-01-26 20:01 ` Rik van Riel
  2012-01-30 10:47   ` Mel Gorman
  2 siblings, 1 reply; 12+ messages in thread
From: Rik van Riel @ 2012-01-26 20:01 UTC (permalink / raw)
  To: linux-mm
  Cc: lkml, Andrea Arcangeli, Mel Gorman, Johannes Weiner,
	Andrew Morton, Minchan Kim, KOSAKI Motohiro

Currently a failed order-9 (transparent hugepage) compaction can
lead to memory compaction being temporarily disabled for a memory
zone.  Even if we only need compaction for an order 2 allocation,
eg. for jumbo frames networking.

The fix is relatively straightforward: keep track of the highest
order at which compaction is succeeding, and only defer compaction
for orders at which compaction is failing.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
-v3: add comment suggested by Mel Gorman

 include/linux/compaction.h |   14 ++++++++++----
 include/linux/mmzone.h     |    1 +
 mm/compaction.c            |   12 +++++++++++-
 mm/page_alloc.c            |    6 ++++--
 mm/vmscan.c                |    2 +-
 5 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 7a9323a..51a90b7 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -34,20 +34,26 @@ extern unsigned long compaction_suitable(struct zone *zone, int order);
  * allocation success. 1 << compact_defer_limit compactions are skipped up
  * to a limit of 1 << COMPACT_MAX_DEFER_SHIFT
  */
-static inline void defer_compaction(struct zone *zone)
+static inline void defer_compaction(struct zone *zone, int order)
 {
 	zone->compact_considered = 0;
 	zone->compact_defer_shift++;
 
+	if (order < zone->compact_order_failed)
+		zone->compact_order_failed = order;
+
 	if (zone->compact_defer_shift > COMPACT_MAX_DEFER_SHIFT)
 		zone->compact_defer_shift = COMPACT_MAX_DEFER_SHIFT;
 }
 
 /* Returns true if compaction should be skipped this time */
-static inline bool compaction_deferred(struct zone *zone)
+static inline bool compaction_deferred(struct zone *zone, int order)
 {
 	unsigned long defer_limit = 1UL << zone->compact_defer_shift;
 
+	if (order < zone->compact_order_failed)
+		return false;
+
 	/* Avoid possible overflow */
 	if (++zone->compact_considered > defer_limit)
 		zone->compact_considered = defer_limit;
@@ -73,11 +79,11 @@ static inline unsigned long compaction_suitable(struct zone *zone, int order)
 	return COMPACT_SKIPPED;
 }
 
-static inline void defer_compaction(struct zone *zone)
+static inline void defer_compaction(struct zone *zone, int order)
 {
 }
 
-static inline bool compaction_deferred(struct zone *zone)
+static inline bool compaction_deferred(struct zone *zone, int order)
 {
 	return 1;
 }
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 650ba2f..dff7115 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -365,6 +365,7 @@ struct zone {
 	 */
 	unsigned int		compact_considered;
 	unsigned int		compact_defer_shift;
+	int			compact_order_failed;
 #endif
 
 	ZONE_PADDING(_pad1_)
diff --git a/mm/compaction.c b/mm/compaction.c
index 51ece75..3bde234 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -673,9 +673,19 @@ static int __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
 		INIT_LIST_HEAD(&cc->freepages);
 		INIT_LIST_HEAD(&cc->migratepages);
 
-		if (cc->order < 0 || !compaction_deferred(zone))
+		if (cc->order < 0 || !compaction_deferred(zone, cc->order))
 			compact_zone(zone, cc);
 
+		if (cc->order > 0) {
+			int ok = zone_watermark_ok(zone, cc->order,
+						low_wmark_pages(zone), 0, 0);
+			if (ok && cc->order > zone->compact_order_failed)
+				zone->compact_order_failed = cc->order + 1;
+			/* Currently async compaction is never deferred. */
+			else if (!ok && cc->sync)
+				defer_compaction(zone, cc->order);
+		}
+
 		VM_BUG_ON(!list_empty(&cc->freepages));
 		VM_BUG_ON(!list_empty(&cc->migratepages));
 	}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0027d8f..cd617d9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1990,7 +1990,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	if (!order)
 		return NULL;
 
-	if (compaction_deferred(preferred_zone)) {
+	if (compaction_deferred(preferred_zone, order)) {
 		*deferred_compaction = true;
 		return NULL;
 	}
@@ -2012,6 +2012,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 		if (page) {
 			preferred_zone->compact_considered = 0;
 			preferred_zone->compact_defer_shift = 0;
+			if (order >= preferred_zone->compact_order_failed)
+				preferred_zone->compact_order_failed = order + 1;
 			count_vm_event(COMPACTSUCCESS);
 			return page;
 		}
@@ -2028,7 +2030,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 		 * defer if the failure was a sync compaction failure.
 		 */
 		if (sync_migration)
-			defer_compaction(preferred_zone);
+			defer_compaction(preferred_zone, order);
 
 		cond_resched();
 	}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0d7a9cd..021ca2f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2196,7 +2196,7 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 	 * If compaction is deferred, reclaim up to a point where
 	 * compaction will have a chance of success when re-enabled
 	 */
-	if (compaction_deferred(zone))
+	if (compaction_deferred(zone, sc->order))
 		return watermark_ok;
 
 	/* If compaction is not ready to start, keep reclaiming */
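
To see the per-order deferral in action, here is a self-contained C model
of the two helpers as modified by this patch.  defer_compaction() follows
the hunk above; the tail of compaction_deferred() is filled in from the
surrounding kernel code, and the zone_model struct plus the initial
compact_order_failed value are simplified assumptions for illustration.

#include <stdbool.h>
#include <stdio.h>

#define COMPACT_MAX_DEFER_SHIFT 6	/* do not skip more than 64 times */

/* Only the deferral-related fields of struct zone, for illustration. */
struct zone_model {
	unsigned int compact_considered;
	unsigned int compact_defer_shift;
	int compact_order_failed;
};

/* Models defer_compaction(zone, order) after this patch. */
static void defer_compaction(struct zone_model *zone, int order)
{
	zone->compact_considered = 0;
	zone->compact_defer_shift++;

	if (order < zone->compact_order_failed)
		zone->compact_order_failed = order;

	if (zone->compact_defer_shift > COMPACT_MAX_DEFER_SHIFT)
		zone->compact_defer_shift = COMPACT_MAX_DEFER_SHIFT;
}

/* Models compaction_deferred(zone, order) after this patch. */
static bool compaction_deferred(struct zone_model *zone, int order)
{
	unsigned long defer_limit = 1UL << zone->compact_defer_shift;

	if (order < zone->compact_order_failed)
		return false;

	/* Avoid possible overflow */
	if (++zone->compact_considered > defer_limit)
		zone->compact_considered = defer_limit;

	return zone->compact_considered < defer_limit;
}

int main(void)
{
	/* Assume nothing has failed yet; 11 (MAX_ORDER) chosen for illustration. */
	struct zone_model zone = { 0, 0, 11 };

	defer_compaction(&zone, 9);	/* a THP-sized compaction just failed */

	printf("order-9 deferred: %d\n", compaction_deferred(&zone, 9));	/* 1 */
	printf("order-2 deferred: %d\n", compaction_deferred(&zone, 2));	/* 0 */
	return 0;
}

A failed order-9 (THP) compaction now defers only order-9 and larger
requests; an order-2 jumbo-frame allocation can still trigger compaction
immediately.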


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 -mm 1/3] mm: reclaim at order 0 when compaction is enabled
  2012-01-26 19:59 ` [PATCH v3 -mm 1/3] mm: reclaim at order 0 when compaction is enabled Rik van Riel
@ 2012-01-27  9:13   ` Hillf Danton
  2012-01-27 16:35     ` Rik van Riel
  2012-01-27 23:31   ` Andrew Morton
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Hillf Danton @ 2012-01-27  9:13 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, lkml, Andrea Arcangeli, Mel Gorman, Johannes Weiner,
	Andrew Morton, Minchan Kim, Hillf Danton

Hi Rik

On Fri, Jan 27, 2012 at 3:59 AM, Rik van Riel <riel@redhat.com> wrote:
> When built with CONFIG_COMPACTION, kswapd should not try to free
> contiguous pages, because it is not trying hard enough to have
> a real chance at being successful, but still disrupts the LRU
> enough to break other things.
>
> Do not do higher order page isolation unless we really are in
> lumpy reclaim mode.
>
> Stop reclaiming pages once we have enough free pages that
> compaction can deal with things, and we hit the normal order 0
> watermarks used by kswapd.
>
> Also remove a line of code that increments balanced right before
> exiting the function.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
> -v4: further cleanups suggested by Mel Gorman
>  mm/vmscan.c |   43 +++++++++++++++++++++++++++++--------------
>  1 files changed, 29 insertions(+), 14 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 2880396..2e2e43d 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1139,7 +1139,7 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
>  * @mz:                The mem_cgroup_zone to pull pages from.
>  * @dst:       The temp list to put pages on to.
>  * @nr_scanned:        The number of pages that were scanned.
> - * @order:     The caller's attempted allocation order
> + * @sc:                The scan_control struct for this reclaim session
>  * @mode:      One of the LRU isolation modes
>  * @active:    True [1] if isolating active pages
>  * @file:      True [1] if isolating file [!anon] pages
> @@ -1148,8 +1148,8 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
>  */
>  static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>                struct mem_cgroup_zone *mz, struct list_head *dst,
> -               unsigned long *nr_scanned, int order, isolate_mode_t mode,
> -               int active, int file)
> +               unsigned long *nr_scanned, struct scan_control *sc,
> +               isolate_mode_t mode, int active, int file)
>  {
>        struct lruvec *lruvec;
>        struct list_head *src;
> @@ -1195,7 +1195,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>                        BUG();
>                }
>
> -               if (!order)
> +               if (!sc->order || !(sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM))
>                        continue;
>
Just a tiny advice 8-)

mind to move checking lumpy reclaim out of the loop,
something like

	unsigned long nr_lumpy_failed = 0;
	unsigned long scan;
	int lru = LRU_BASE;
+	int order = sc->order;
+
+	if (!(sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM))
+		order = 0;

	lruvec = mem_cgroup_zone_lruvec(mz->zone, mz->mem_cgroup);
	if (active)
		lru += LRU_ACTIVE;

with a line of comment?

Hillf

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 -mm 1/3] mm: reclaim at order 0 when compaction is enabled
  2012-01-27  9:13   ` Hillf Danton
@ 2012-01-27 16:35     ` Rik van Riel
  2012-01-30 10:26       ` Mel Gorman
  0 siblings, 1 reply; 12+ messages in thread
From: Rik van Riel @ 2012-01-27 16:35 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-mm, lkml, Andrea Arcangeli, Mel Gorman, Johannes Weiner,
	Andrew Morton, Minchan Kim

On 01/27/2012 04:13 AM, Hillf Danton wrote:

>> @@ -1195,7 +1195,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>>                         BUG();
>>                 }
>>
>> -               if (!order)
>> +               if (!sc->order || !(sc->reclaim_mode&  RECLAIM_MODE_LUMPYRECLAIM))
>>                         continue;
>>
> Just a tiny advice 8-)
>
> mind to move checking lumpy reclaim out of the loop,
> something like

Hehe, I made the change the way it is on request
of Mel Gorman :)

I don't particularly care either way and will be
happy to make the code whichever way people prefer.

Just let me know.

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 -mm 1/3] mm: reclaim at order 0 when compaction is enabled
  2012-01-26 19:59 ` [PATCH v3 -mm 1/3] mm: reclaim at order 0 when compaction is enabled Rik van Riel
  2012-01-27  9:13   ` Hillf Danton
@ 2012-01-27 23:31   ` Andrew Morton
  2012-01-29 13:25   ` Hillf Danton
  2012-01-30 10:36   ` Mel Gorman
  3 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2012-01-27 23:31 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, lkml, Andrea Arcangeli, Mel Gorman, Johannes Weiner,
	Minchan Kim, KOSAKI Motohiro

On Thu, 26 Jan 2012 14:59:14 -0500
Rik van Riel <riel@redhat.com> wrote:

> When built with CONFIG_COMPACTION, kswapd should not try to free
> contiguous pages, because it is not trying hard enough to have
> a real chance at being successful, but still disrupts the LRU
> enough to break other things.
> 
> Do not do higher order page isolation unless we really are in
> lumpy reclaim mode.
> 
> Stop reclaiming pages once we have enough free pages that
> compaction can deal with things, and we hit the normal order 0
> watermarks used by kswapd.
> 
> Also remove a line of code that increments balanced right before
> exiting the function.
> 
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1139,7 +1139,7 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
>   * @mz:		The mem_cgroup_zone to pull pages from.
>   * @dst:	The temp list to put pages on to.
>   * @nr_scanned:	The number of pages that were scanned.
> - * @order:	The caller's attempted allocation order
> + * @sc:		The scan_control struct for this reclaim session
>   * @mode:	One of the LRU isolation modes
>   * @active:	True [1] if isolating active pages
>   * @file:	True [1] if isolating file [!anon] pages
> @@ -1148,8 +1148,8 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
>   */
>  static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  		struct mem_cgroup_zone *mz, struct list_head *dst,
> -		unsigned long *nr_scanned, int order, isolate_mode_t mode,
> -		int active, int file)
> +		unsigned long *nr_scanned, struct scan_control *sc,
> +		isolate_mode_t mode, int active, int file)
>  {
>  	struct lruvec *lruvec;
>  	struct list_head *src;
> @@ -1195,7 +1195,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  			BUG();
>  		}
>  
> -		if (!order)
> +		if (!sc->order || !(sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM))

We should have a comment here explaining the reason for the code.

And the immediately following comment isn't very good: "Only take those
pages of the same active state as that tag page".  As is common with
poor comments, it tells us "what", but not "why".  Reclaiming active
_and_ inactive pages would make larger-page freeing more successful and
might be a good thing!  Apparently someone felt otherwise, but the
reader is kept in the dark...


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 -mm 2/3] mm: kswapd carefully call compaction
  2012-01-26 19:59 ` [PATCH v3 -mm 2/3] mm: kswapd carefully call compaction Rik van Riel
@ 2012-01-27 23:36   ` Andrew Morton
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2012-01-27 23:36 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, lkml, Andrea Arcangeli, Mel Gorman, Johannes Weiner,
	Minchan Kim, KOSAKI Motohiro

On Thu, 26 Jan 2012 14:59:58 -0500
Rik van Riel <riel@redhat.com> wrote:

> With CONFIG_COMPACTION enabled, kswapd does not try to free
> contiguous free pages, even when it is woken for a higher order
> request.
> 
> This could be bad for eg. jumbo frame network allocations, which
> are done from interrupt context and cannot compact memory themselves.
> Higher than before allocation failure rates in the network receive
> path have been observed in kernels with compaction enabled.
> 
> Teach kswapd to defragment the memory zones in a node, but only
> if required and compaction is not deferred in a zone.
> 
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2673,6 +2673,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
>  	int priority;
>  	int i;
>  	int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
> +	int zones_need_compaction = 1;
>  	unsigned long total_scanned;
>  	struct reclaim_state *reclaim_state = current->reclaim_state;
>  	unsigned long nr_soft_reclaimed;
> @@ -2937,9 +2938,17 @@ out:
>  				goto loop_again;
>  			}
>  
> +			/* Check if the memory needs to be defragmented. */
> +			if (zone_watermark_ok(zone, order,
> +				    low_wmark_pages(zone), *classzone_idx, 0))
> +				zones_need_compaction = 0;
> +
>  			/* If balanced, clear the congested flag */
>  			zone_clear_flag(zone, ZONE_CONGESTED);
>  		}
> +
> +		if (zones_need_compaction)
> +			compact_pgdat(pgdat, order);
>  	}

Nicer:

From: Andrew Morton <akpm@linux-foundation.org>
Subject: vmscan-kswapd-carefully-call-compaction-fix

reduce scope of zones_need_compaction

Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmscan.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/vmscan.c~vmscan-kswapd-carefully-call-compaction-fix
+++ a/mm/vmscan.c
@@ -2672,7 +2672,6 @@ static unsigned long balance_pgdat(pg_da
 	int priority;
 	int i;
 	int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
-	int zones_need_compaction = 1;
 	unsigned long total_scanned;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	unsigned long nr_soft_reclaimed;
@@ -2920,6 +2919,8 @@ out:
 	 * and it is potentially going to sleep here.
 	 */
 	if (order) {
+		int zones_need_compaction = 1;
+
 		for (i = 0; i <= end_zone; i++) {
 			struct zone *zone = pgdat->node_zones + i;
 
_

(could have given it type "bool", but that seems unnecessary when it
has "needs" in the name)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 -mm 1/3] mm: reclaim at order 0 when compaction is enabled
  2012-01-26 19:59 ` [PATCH v3 -mm 1/3] mm: reclaim at order 0 when compaction is enabled Rik van Riel
  2012-01-27  9:13   ` Hillf Danton
  2012-01-27 23:31   ` Andrew Morton
@ 2012-01-29 13:25   ` Hillf Danton
  2012-01-30 10:36   ` Mel Gorman
  3 siblings, 0 replies; 12+ messages in thread
From: Hillf Danton @ 2012-01-29 13:25 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, lkml, Andrea Arcangeli, Mel Gorman, Johannes Weiner,
	Andrew Morton, Minchan Kim, Hillf Danton

Hi Rik

On Fri, Jan 27, 2012 at 3:59 AM, Rik van Riel <riel@redhat.com> wrote:
[...]

> @@ -2754,7 +2754,7 @@ loop_again:
>                 */
>                for (i = 0; i <= end_zone; i++) {
>                        struct zone *zone = pgdat->node_zones + i;
> -                       int nr_slab;
> +                       int nr_slab, testorder;
>                        unsigned long balance_gap;
>
>                        if (!populated_zone(zone))
> @@ -2787,7 +2787,20 @@ loop_again:
>                                (zone->present_pages +
>                                        KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
>                                KSWAPD_ZONE_BALANCE_GAP_RATIO);
> -                       if (!zone_watermark_ok_safe(zone, order,
> +                       /*
> +                        * Kswapd reclaims only single pages with compaction
> +                        * enabled. Trying too hard to reclaim until contiguous
> +                        * free pages have become available can hurt performance
> +                        * by evicting too much useful data from memory.
> +                        * Do not reclaim more than needed for compaction.
> +                        */
> +                       testorder = order;
> +                       if (COMPACTION_BUILD && order &&
> +                                       compaction_suitable(zone, order) !=
> +                                               COMPACT_SKIPPED)
> +                               testorder = 0;
> +
> +                       if (!zone_watermark_ok_safe(zone, testorder,
>                                        high_wmark_pages(zone) + balance_gap,
>                                        end_zone, 0)) {
>                                shrink_zone(priority, zone, &sc);

It is hard to understand why the zone is still shrunk as hard as before,
when the watermark is now checked against the new order.  Could you give
a pointer, please?

Hillf

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 -mm 1/3] mm: reclaim at order 0 when compaction is enabled
  2012-01-27 16:35     ` Rik van Riel
@ 2012-01-30 10:26       ` Mel Gorman
  0 siblings, 0 replies; 12+ messages in thread
From: Mel Gorman @ 2012-01-30 10:26 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Hillf Danton, linux-mm, lkml, Andrea Arcangeli, Johannes Weiner,
	Andrew Morton, Minchan Kim

On Fri, Jan 27, 2012 at 11:35:02AM -0500, Rik van Riel wrote:
> On 01/27/2012 04:13 AM, Hillf Danton wrote:
> 
> >>@@ -1195,7 +1195,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> >>                        BUG();
> >>                }
> >>
> >>-               if (!order)
> >>+               if (!sc->order || !(sc->reclaim_mode&  RECLAIM_MODE_LUMPYRECLAIM))
> >>                        continue;
> >>
> >Just a tiny advice 8-)
> >
> >mind to move checking lumpy reclaim out of the loop,
> >something like
> 
> Hehe, I made the change the way it is on request
> of Mel Gorman :)
> 

Yes. I recognise that checking inside the loop like this results
in a tiny hit but it is hardly critical. By putting the check here,
it is absolutely clear that this is now a lumpy-reclaim only thing
where it used to be used by both lumpy reclaim and reclaim/compaction.
It'll make deleting lumpy reclaim a little bit easier in the future.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 -mm 1/3] mm: reclaim at order 0 when compaction is enabled
  2012-01-26 19:59 ` [PATCH v3 -mm 1/3] mm: reclaim at order 0 when compaction is enabled Rik van Riel
                     ` (2 preceding siblings ...)
  2012-01-29 13:25   ` Hillf Danton
@ 2012-01-30 10:36   ` Mel Gorman
  3 siblings, 0 replies; 12+ messages in thread
From: Mel Gorman @ 2012-01-30 10:36 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, lkml, Andrea Arcangeli, Johannes Weiner, Andrew Morton,
	Minchan Kim, KOSAKI Motohiro

On Thu, Jan 26, 2012 at 02:59:14PM -0500, Rik van Riel wrote:
> When built with CONFIG_COMPACTION, kswapd should not try to free
> contiguous pages, because it is not trying hard enough to have
> a real chance at being successful, but still disrupts the LRU
> enough to break other things.
> 
> Do not do higher order page isolation unless we really are in
> lumpy reclaim mode.
> 
> Stop reclaiming pages once we have enough free pages that
> compaction can deal with things, and we hit the normal order 0
> watermarks used by kswapd.
> 
> Also remove a line of code that increments balanced right before
> exiting the function.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>

Acked-by: Mel Gorman <mel@csn.ul.ie>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 -mm 3/3] mm: only defer compaction for failed order and higher
  2012-01-26 20:01 ` [PATCH v3 -mm 3/3] mm: only defer compaction for failed order and higher Rik van Riel
@ 2012-01-30 10:47   ` Mel Gorman
  0 siblings, 0 replies; 12+ messages in thread
From: Mel Gorman @ 2012-01-30 10:47 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, lkml, Andrea Arcangeli, Johannes Weiner, Andrew Morton,
	Minchan Kim, KOSAKI Motohiro

On Thu, Jan 26, 2012 at 03:01:02PM -0500, Rik van Riel wrote:
> Currently a failed order-9 (transparent hugepage) compaction can
> lead to memory compaction being temporarily disabled for a memory
> zone.  Even if we only need compaction for an order 2 allocation,
> eg. for jumbo frames networking.
> 
> The fix is relatively straightforward: keep track of the highest
> order at which compaction is succeeding, and only defer compaction
> for orders at which compaction is failing.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>

Acked-by: Mel Gorman <mel@csn.ul.ie>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-01-30 10:47 UTC | newest]

Thread overview: 12+ messages
2012-01-26 19:54 [PATCH v3 -mm 0/3] kswapd vs compaction improvements Rik van Riel
2012-01-26 19:59 ` [PATCH v3 -mm 1/3] mm: reclaim at order 0 when compaction is enabled Rik van Riel
2012-01-27  9:13   ` Hillf Danton
2012-01-27 16:35     ` Rik van Riel
2012-01-30 10:26       ` Mel Gorman
2012-01-27 23:31   ` Andrew Morton
2012-01-29 13:25   ` Hillf Danton
2012-01-30 10:36   ` Mel Gorman
2012-01-26 19:59 ` [PATCH v3 -mm 2/3] mm: kswapd carefully call compaction Rik van Riel
2012-01-27 23:36   ` Andrew Morton
2012-01-26 20:01 ` [PATCH v3 -mm 3/3] mm: only defer compaction for failed order and higher Rik van Riel
2012-01-30 10:47   ` Mel Gorman
