* Re: [PATCH 06/27] mm, vmscan: Make kswapd reclaim in terms of nodes
[not found] <02fe01d1c48b$c44e9e80$4cebdb80$@alibaba-inc.com>
@ 2016-06-12 9:33 ` Hillf Danton
2016-06-14 14:52 ` Mel Gorman
0 siblings, 1 reply; 9+ messages in thread
From: Hillf Danton @ 2016-06-12 9:33 UTC
To: Mel Gorman; +Cc: linux-kernel, linux-mm
>
> /*
> - * kswapd shrinks the zone by the number of pages required to reach
> - * the high watermark.
> + * kswapd shrinks a node of pages that are at or below the highest usable
> + * zone that is currently unbalanced.
> *
> * Returns true if kswapd scanned at least the requested number of pages to
> * reclaim or if the lack of progress was due to pages under writeback.
> * This is used to determine if the scanning priority needs to be raised.
> */
> -static bool kswapd_shrink_zone(struct zone *zone,
> +static bool kswapd_shrink_node(pg_data_t *pgdat,
> int classzone_idx,
> struct scan_control *sc)
> {
> - unsigned long balance_gap;
> - bool lowmem_pressure;
> - struct pglist_data *pgdat = zone->zone_pgdat;
> + struct zone *zone;
> + unsigned long nr_to_reclaim = 0;
> + int z;
>
> - /* Reclaim above the high watermark. */
> - sc->nr_to_reclaim = max(SWAP_CLUSTER_MAX, high_wmark_pages(zone));
> + /* Reclaim a number of pages proportional to the number of zones */
> + for (z = 0; z <= classzone_idx; z++) {
> + zone = pgdat->node_zones + z;
> + if (!populated_zone(zone))
> + continue;
>
> - /*
> - * We put equal pressure on every zone, unless one zone has way too
> - * many pages free already. The "too many pages" is defined as the
> - * high wmark plus a "gap" where the gap is either the low
> - * watermark or 1% of the zone, whichever is smaller.
> - */
> - balance_gap = min(low_wmark_pages(zone), DIV_ROUND_UP(
> - zone->managed_pages, KSWAPD_ZONE_BALANCE_GAP_RATIO));
> + nr_to_reclaim += max(high_wmark_pages(zone), SWAP_CLUSTER_MAX);
> + }

Missing sc->nr_to_reclaim = nr_to_reclaim; ?

>
> /*
> - * If there is no low memory pressure or the zone is balanced then no
> - * reclaim is necessary
> + * Historically care was taken to put equal pressure on all zones but
> + * now pressure is applied based on node LRU order.
> */
> - lowmem_pressure = (buffer_heads_over_limit && is_highmem(zone));
> - if (!lowmem_pressure && zone_balanced(zone, sc->order, false,
> - balance_gap, classzone_idx))
> - return true;
> -
> - shrink_node(zone->zone_pgdat, sc, classzone_idx);
> -
> - /* TODO: ANOMALY */
> - clear_bit(PGDAT_WRITEBACK, &pgdat->flags);
> + shrink_node(pgdat, sc, classzone_idx);
>
> /*
> - * If a zone reaches its high watermark, consider it to be no longer
> - * congested. It's possible there are dirty pages backed by congested
> - * BDIs but as pressure is relieved, speculatively avoid congestion
> - * waits.
> + * Fragmentation may mean that the system cannot be rebalanced for
> + * high-order allocations. If twice the allocation size has been
> + * reclaimed then recheck watermarks only at order-0 to prevent
> + * excessive reclaim. Assume that a process that requested a
> + * high-order allocation can direct reclaim/compact.
> */
> - if (pgdat_reclaimable(zone->zone_pgdat) &&
> - zone_balanced(zone, sc->order, false, 0, classzone_idx)) {
> - clear_bit(PGDAT_CONGESTED, &pgdat->flags);
> - clear_bit(PGDAT_DIRTY, &pgdat->flags);
> - }
> + if (sc->order && sc->nr_reclaimed >= 2UL << sc->order)
> + sc->order = 0;
>
> return sc->nr_scanned >= sc->nr_to_reclaim;
> }
* Re: [PATCH 06/27] mm, vmscan: Make kswapd reclaim in terms of nodes
2016-06-12 9:33 ` [PATCH 06/27] mm, vmscan: Make kswapd reclaim in terms of nodes Hillf Danton
@ 2016-06-14 14:52 ` Mel Gorman
0 siblings, 0 replies; 9+ messages in thread
From: Mel Gorman @ 2016-06-14 14:52 UTC
To: Hillf Danton; +Cc: linux-kernel, linux-mm
On Sun, Jun 12, 2016 at 05:33:24PM +0800, Hillf Danton wrote:
> > - /*
> > - * We put equal pressure on every zone, unless one zone has way too
> > - * many pages free already. The "too many pages" is defined as the
> > - * high wmark plus a "gap" where the gap is either the low
> > - * watermark or 1% of the zone, whichever is smaller.
> > - */
> > - balance_gap = min(low_wmark_pages(zone), DIV_ROUND_UP(
> > - zone->managed_pages, KSWAPD_ZONE_BALANCE_GAP_RATIO));
> > + nr_to_reclaim += max(high_wmark_pages(zone), SWAP_CLUSTER_MAX);
> > + }
>
> Missing sc->nr_to_reclaim = nr_to_reclaim; ?
>
Yes. It may explain why I saw lower than expected kswapd activity in more
detailed tests recently. Thanks.
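
For illustration, the fix under discussion amounts to storing the accumulated per-zone target back into the scan control before reclaim starts. The following is a minimal user-space sketch, not kernel code: `zone_model`, `scan_control_model`, `max_ul()` and `set_reclaim_target()` are hypothetical stand-ins, and the value of `SWAP_CLUSTER_MAX` is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value; the kernel defines its own SWAP_CLUSTER_MAX. */
#define SWAP_CLUSTER_MAX 32UL

/* Hypothetical, simplified stand-in for struct zone. */
struct zone_model {
	bool populated;            /* models populated_zone(zone) */
	unsigned long high_wmark;  /* models high_wmark_pages(zone) */
};

/* Hypothetical, simplified stand-in for struct scan_control. */
struct scan_control_model {
	unsigned long nr_to_reclaim;
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Sum the reclaim target of every populated zone at or below
 * classzone_idx, then publish it in sc. The final store is the
 * statement the review comment found to be missing.
 */
static void set_reclaim_target(const struct zone_model *zones,
			       int classzone_idx,
			       struct scan_control_model *sc)
{
	unsigned long nr_to_reclaim = 0;
	int z;

	assert(classzone_idx >= 0);

	for (z = 0; z <= classzone_idx; z++) {
		if (!zones[z].populated)
			continue;

		nr_to_reclaim += max_ul(zones[z].high_wmark, SWAP_CLUSTER_MAX);
	}

	sc->nr_to_reclaim = nr_to_reclaim;
}
```

Without the final store, the local sum is discarded and `sc->nr_to_reclaim` keeps whatever value it had before the call, which presumably makes the `nr_scanned >= nr_to_reclaim` comparison at the end of the function less meaningful.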
--
Mel Gorman
SUSE Labs
* Re: [PATCH 06/27] mm, vmscan: Make kswapd reclaim in terms of nodes
2016-06-22 8:42 ` Hillf Danton
@ 2016-06-23 11:31 ` Mel Gorman
0 siblings, 0 replies; 9+ messages in thread
From: Mel Gorman @ 2016-06-23 11:31 UTC
To: Hillf Danton; +Cc: Johannes Weiner, Vlastimil Babka, linux-kernel, linux-mm
On Wed, Jun 22, 2016 at 04:42:06PM +0800, Hillf Danton wrote:
> > /*
> > - * If a zone reaches its high watermark, consider it to be no longer
> > - * congested. It's possible there are dirty pages backed by congested
> > - * BDIs but as pressure is relieved, speculatively avoid congestion
> > - * waits.
> > + * Fragmentation may mean that the system cannot be rebalanced for
> > + * high-order allocations. If twice the allocation size has been
> > + * reclaimed then recheck watermarks only at order-0 to prevent
> > + * excessive reclaim. Assume that a process that requested a
> > + * high-order allocation can direct reclaim/compact.
> > */
> > - if (pgdat_reclaimable(zone->zone_pgdat) &&
> > - zone_balanced(zone, sc->order, false, 0, classzone_idx)) {
> > - clear_bit(PGDAT_CONGESTED, &pgdat->flags);
> > - clear_bit(PGDAT_DIRTY, &pgdat->flags);
> > - }
> > + if (sc->order && sc->nr_reclaimed >= 2UL << sc->order)
> > + sc->order = 0;
> >
>
> Reclaim order is changed here.
> Btw, I find no such change in current code.
>
It reintroduces a check removed by commit accf62422b3a ("mm, kswapd: replace
kswapd compaction with waking up kcompactd"). That patch had kswapd always
check at order-0 in pgdat_balanced() once it was awake, but kswapd would
still take at least one pass through reclaim so that kcompactd could
potentially make progress.

This patch removes pgdat_balanced() entirely, and zone_balanced() checks the
order it is asked to check, as it used to. Hence, it is necessary to reset
sc->order once progress is made; otherwise kswapd potentially stays awake
reclaiming pages until a high-order page is freed.
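
As a worked illustration of the reset, the threshold is `2UL << order` pages, so an order-3 request falls back to order-0 checks once 16 pages have been reclaimed. The sketch below is a user-space model only; `scan_control_model` and `maybe_reset_order()` are hypothetical names, and only the condition itself is taken from the patch.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for struct scan_control. */
struct scan_control_model {
	int order;                  /* allocation order kswapd was woken for */
	unsigned long nr_reclaimed; /* pages reclaimed so far */
};

/*
 * Once twice the allocation size has been reclaimed, fall back to
 * checking watermarks at order-0 so that a fragmented node does not
 * keep the reclaimer running indefinitely.
 */
static void maybe_reset_order(struct scan_control_model *sc)
{
	assert(sc->order >= 0);

	if (sc->order && sc->nr_reclaimed >= 2UL << sc->order)
		sc->order = 0;
}
```

For example, with `order = 3` the order is left alone at 15 reclaimed pages and reset to 0 at 16.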
--
Mel Gorman
SUSE Labs
* Re: [PATCH 06/27] mm, vmscan: Make kswapd reclaim in terms of nodes
[not found] <071801d1cc5c$245087d0$6cf19770$@alibaba-inc.com>
@ 2016-06-22 8:42 ` Hillf Danton
2016-06-23 11:31 ` Mel Gorman
0 siblings, 1 reply; 9+ messages in thread
From: Hillf Danton @ 2016-06-22 8:42 UTC
To: Mel Gorman; +Cc: Johannes Weiner, Vlastimil Babka, linux-kernel, linux-mm
> /*
> - * kswapd shrinks the zone by the number of pages required to reach
> - * the high watermark.
> + * kswapd shrinks a node of pages that are at or below the highest usable
> + * zone that is currently unbalanced.
> *
> * Returns true if kswapd scanned at least the requested number of pages to
> * reclaim or if the lack of progress was due to pages under writeback.
> * This is used to determine if the scanning priority needs to be raised.
> */
> -static bool kswapd_shrink_zone(struct zone *zone,
> +static bool kswapd_shrink_node(pg_data_t *pgdat,
> int classzone_idx,
> struct scan_control *sc)
> {
> - unsigned long balance_gap;
> - bool lowmem_pressure;
> - struct pglist_data *pgdat = zone->zone_pgdat;
> + struct zone *zone;
> + int z;
>
> - /* Reclaim above the high watermark. */
> - sc->nr_to_reclaim = max(SWAP_CLUSTER_MAX, high_wmark_pages(zone));
> + /* Reclaim a number of pages proportional to the number of zones */
> + sc->nr_to_reclaim = 0;
> + for (z = 0; z <= classzone_idx; z++) {
> + zone = pgdat->node_zones + z;
> + if (!populated_zone(zone))
> + continue;
>
> - /*
> - * We put equal pressure on every zone, unless one zone has way too
> - * many pages free already. The "too many pages" is defined as the
> - * high wmark plus a "gap" where the gap is either the low
> - * watermark or 1% of the zone, whichever is smaller.
> - */
> - balance_gap = min(low_wmark_pages(zone), DIV_ROUND_UP(
> - zone->managed_pages, KSWAPD_ZONE_BALANCE_GAP_RATIO));
> + sc->nr_to_reclaim += max(high_wmark_pages(zone), SWAP_CLUSTER_MAX);
> + }
>
> /*
> - * If there is no low memory pressure or the zone is balanced then no
> - * reclaim is necessary
> + * Historically care was taken to put equal pressure on all zones but
> + * now pressure is applied based on node LRU order.
> */
> - lowmem_pressure = (buffer_heads_over_limit && is_highmem(zone));
> - if (!lowmem_pressure && zone_balanced(zone, sc->order, false,
> - balance_gap, classzone_idx))
> - return true;
> -
> - shrink_node(zone->zone_pgdat, sc, classzone_idx);
> -
> - /* TODO: ANOMALY */
> - clear_bit(PGDAT_WRITEBACK, &pgdat->flags);
> + shrink_node(pgdat, sc, classzone_idx);
>
> /*
> - * If a zone reaches its high watermark, consider it to be no longer
> - * congested. It's possible there are dirty pages backed by congested
> - * BDIs but as pressure is relieved, speculatively avoid congestion
> - * waits.
> + * Fragmentation may mean that the system cannot be rebalanced for
> + * high-order allocations. If twice the allocation size has been
> + * reclaimed then recheck watermarks only at order-0 to prevent
> + * excessive reclaim. Assume that a process that requested a
> + * high-order allocation can direct reclaim/compact.
> */
> - if (pgdat_reclaimable(zone->zone_pgdat) &&
> - zone_balanced(zone, sc->order, false, 0, classzone_idx)) {
> - clear_bit(PGDAT_CONGESTED, &pgdat->flags);
> - clear_bit(PGDAT_DIRTY, &pgdat->flags);
> - }
> + if (sc->order && sc->nr_reclaimed >= 2UL << sc->order)
> + sc->order = 0;
>

Reclaim order is changed here.
Btw, I find no such change in current code.

> return sc->nr_scanned >= sc->nr_to_reclaim;
> }
>
> /*
> - * For kswapd, balance_pgdat() will work across all this node's zones until
> - * they are all at high_wmark_pages(zone).
> - *
> - * Returns the highest zone idx kswapd was reclaiming at
> + * For kswapd, balance_pgdat() will reclaim pages across a node from zones
> + * that are eligible for use by the caller until at least one zone is
> + * balanced.
> *
> - * There is special handling here for zones which are full of pinned pages.
> - * This can happen if the pages are all mlocked, or if they are all used by
> - * device drivers (say, ZONE_DMA). Or if they are all in use by hugetlb.
> - * What we do is to detect the case where all pages in the zone have been
> - * scanned twice and there has been zero successful reclaim. Mark the zone as
> - * dead and from now on, only perform a short scan. Basically we're polling
> - * the zone for when the problem goes away.
> + * Returns the order kswapd finished reclaiming at.
> *
> * kswapd scans the zones in the highmem->normal->dma direction. It skips
> * zones which have free_pages > high_wmark_pages(zone), but once a zone is
> - * found to have free_pages <= high_wmark_pages(zone), we scan that zone and the
> - * lower zones regardless of the number of free pages in the lower zones. This
> - * interoperates with the page allocator fallback scheme to ensure that aging
> - * of pages is balanced across the zones.
> + * found to have free_pages <= high_wmark_pages(zone), any page in that zone
> + * or lower is eligible for reclaim until at least one usable zone is
> + * balanced.
> */
> static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
> {
> int i;
> - int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */
> unsigned long nr_soft_reclaimed;
> unsigned long nr_soft_scanned;
> + struct zone *zone;
> struct scan_control sc = {
> .gfp_mask = GFP_KERNEL,
> - .reclaim_idx = MAX_NR_ZONES - 1,
> .order = order,
> .priority = DEF_PRIORITY,
> .may_writepage = !laptop_mode,
> .may_unmap = 1,
> .may_swap = 1,
> + .reclaim_idx = classzone_idx,
> };
> count_vm_event(PAGEOUTRUN);
>
> @@ -3203,21 +3125,10 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
>
> /* Scan from the highest requested zone to dma */
> for (i = classzone_idx; i >= 0; i--) {
> - struct zone *zone = pgdat->node_zones + i;
> -
> + zone = pgdat->node_zones + i;
> if (!populated_zone(zone))
> continue;
>
> - if (sc.priority != DEF_PRIORITY &&
> - !pgdat_reclaimable(zone->zone_pgdat))
> - continue;
> -
> - /*
> - * Do some background aging of the anon list, to give
> - * pages a chance to be referenced before reclaiming.
> - */
> - age_active_anon(zone, &sc);
> -
> /*
> * If the number of buffer_heads in the machine
> * exceeds the maximum allowed level and this node
> @@ -3225,19 +3136,17 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
> * it to relieve lowmem pressure.
> */
> if (buffer_heads_over_limit && is_highmem_idx(i)) {
> - end_zone = i;
> + classzone_idx = i;
> break;
> }
>
> - if (!zone_balanced(zone, order, false, 0, 0)) {
> - end_zone = i;
> + if (!zone_balanced(zone, order, 0, 0)) {

We need to sync order with the above change?

> + classzone_idx = i;
> break;
> } else {
> /*
> - * If balanced, clear the dirty and congested
> - * flags
> - *
> - * TODO: ANOMALY
> + * If any eligible zone is balanced then the
> + * node is not considered congested or dirty.
> */
> clear_bit(PGDAT_CONGESTED, &zone->zone_pgdat->flags);
> clear_bit(PGDAT_DIRTY, &zone->zone_pgdat->flags);
> @@ -3248,51 +3157,34 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
> goto out;
>
> /*
> + * Do some background aging of the anon list, to give
> + * pages a chance to be referenced before reclaiming. All
> + * pages are rotated regardless of classzone as this is
> + * about consistent aging.
> + */
> + age_active_anon(pgdat, &pgdat->node_zones[MAX_NR_ZONES - 1], &sc);
> +
> + /*
> * If we're getting trouble reclaiming, start doing writepage
> * even in laptop mode.
> */
> - if (sc.priority < DEF_PRIORITY - 2)
> + if (sc.priority < DEF_PRIORITY - 2 || !pgdat_reclaimable(pgdat))
> sc.may_writepage = 1;
>
> + /* Call soft limit reclaim before calling shrink_node. */
> + sc.nr_scanned = 0;
> + nr_soft_scanned = 0;
> + nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone, sc.order,
> + sc.gfp_mask, &nr_soft_scanned);
> + sc.nr_reclaimed += nr_soft_reclaimed;
> +
> /*
> - * Continue scanning in the highmem->dma direction stopping at
> - * the last zone which needs scanning. This may reclaim lowmem
> - * pages that are not necessary for zone balancing but it
> - * preserves LRU ordering. It is assumed that the bulk of
> - * allocation requests can use arbitrary zones with the
> - * possible exception of big highmem:lowmem configurations.
> + * There should be no need to raise the scanning priority if
> + * enough pages are already being scanned that the high
> + * watermark would be met at 100% efficiency.
> */
> - for (i = end_zone; i >= 0; i--) {
> - struct zone *zone = pgdat->node_zones + i;
> -
> - if (!populated_zone(zone))
> - continue;
> -
> - if (sc.priority != DEF_PRIORITY &&
> - !pgdat_reclaimable(zone->zone_pgdat))
> - continue;
> -
> - sc.nr_scanned = 0;
> - sc.reclaim_idx = i;
> -
> - nr_soft_scanned = 0;
> - /*
> - * Call soft limit reclaim before calling shrink_zone.
> - */
> - nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone,
> - order, sc.gfp_mask,
> - &nr_soft_scanned);
> - sc.nr_reclaimed += nr_soft_reclaimed;
> -
> - /*
> - * There should be no need to raise the scanning
> - * priority if enough pages are already being scanned
> - * that that high watermark would be met at 100%
> - * efficiency.
> - */
> - if (kswapd_shrink_zone(zone, end_zone, &sc))
> - raise_priority = false;
> - }
> + if (kswapd_shrink_node(pgdat, classzone_idx, &sc))
> + raise_priority = false;
>
> /*
> * If the low watermark is met there is no need for processes
> @@ -3308,20 +3200,37 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
> break;
>
> /*
> + * Stop reclaiming if any eligible zone is balanced and clear
> + * node writeback or congested.
> + */
> + for (i = 0; i <= classzone_idx; i++) {
> + zone = pgdat->node_zones + i;
> + if (!populated_zone(zone))
> + continue;
> +
> + if (zone_balanced(zone, sc.order, 0, classzone_idx)) {
> + clear_bit(PGDAT_CONGESTED, &pgdat->flags);
> + clear_bit(PGDAT_DIRTY, &pgdat->flags);
> + goto out;
> + }
> + }
> +
> + /*
> * Raise priority if scanning rate is too low or there was no
> * progress in reclaiming pages
> */
> if (raise_priority || !sc.nr_reclaimed)
> sc.priority--;
> - } while (sc.priority >= 1 &&
> - !pgdat_balanced(pgdat, order, classzone_idx));
> + } while (sc.priority >= 1);
>
> out:
> /*
> - * Return the highest zone idx we were reclaiming at so
> - * prepare_kswapd_sleep() makes the same decisions as here.
> + * Return the order kswapd stopped reclaiming at as
> + * prepare_kswapd_sleep() takes it into account. If another caller
> + * entered the allocator slow path while kswapd was awake, order will
> + * remain at the higher level.
> */
> - return end_zone;
> + return sc.order;
> }
>
* [PATCH 06/27] mm, vmscan: Make kswapd reclaim in terms of nodes
2016-06-21 14:15 [PATCH 00/27] Move LRU page reclaim from zones to nodes v7 Mel Gorman
@ 2016-06-21 14:15 ` Mel Gorman
0 siblings, 0 replies; 9+ messages in thread
From: Mel Gorman @ 2016-06-21 14:15 UTC
To: Andrew Morton, Linux-MM
Cc: Rik van Riel, Vlastimil Babka, Johannes Weiner, LKML, Mel Gorman
Patch "mm: vmscan: Begin reclaiming pages on a per-node basis" started
thinking of reclaim in terms of nodes but kswapd is still zone-centric. This
patch gets rid of many of the node-based versus zone-based decisions.
o A node is considered balanced when any eligible lower zone is balanced.
This eliminates one class of age-inversion problem because we avoid
reclaiming a newer page just because it's in the wrong zone
o pgdat_balanced disappears because we now only care about one zone being
balanced.
o Some anomalies related to writeback and congestion tracking being based on
zones disappear.
o kswapd no longer has to take care to reclaim zones in the reverse order
that the page allocator uses.
o Most importantly of all, reclaim from node 0 with multiple zones will
have similar aging and reclaiming characteristics as every
other node.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
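
The "any eligible lower zone is balanced" rule from the changelog can be sketched in user space as follows. This is an illustrative model under stated assumptions, not the kernel implementation: `zone_model`, `zone_balanced_model()` and `node_balanced_model()` are hypothetical, and the balance test is reduced to comparing free pages with the high watermark, whereas the real zone_balanced() goes through zone_watermark_ok_safe() and takes the allocation order into account.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct zone. */
struct zone_model {
	bool populated;            /* models populated_zone(zone) */
	unsigned long free_pages;  /* free pages in the zone */
	unsigned long high_wmark;  /* models high_wmark_pages(zone) */
};

/* Simplified balance test: free pages strictly above the high watermark. */
static bool zone_balanced_model(const struct zone_model *zone)
{
	return zone->free_pages > zone->high_wmark;
}

/*
 * A node counts as balanced as soon as one populated zone at or
 * below classzone_idx is balanced; unpopulated zones are skipped.
 */
static bool node_balanced_model(const struct zone_model *zones,
				int classzone_idx)
{
	int i;

	assert(classzone_idx >= 0);

	for (i = 0; i <= classzone_idx; i++) {
		if (!zones[i].populated)
			continue;

		if (zone_balanced_model(&zones[i]))
			return true;
	}

	return false;
}
```

Compared with the removed pgdat_balanced(), which required every zone to be balanced for order-0 and 25% of managed pages for higher orders, this single-zone test is what lets kswapd stop without reclaiming younger pages from a zone that merely happens to be below its watermark.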
mm/vmscan.c | 292 +++++++++++++++++++++---------------------------------------
1 file changed, 101 insertions(+), 191 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b5b355db97cb..5873f5003078 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2972,7 +2972,8 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
}
#endif
-static void age_active_anon(struct zone *zone, struct scan_control *sc)
+static void age_active_anon(struct pglist_data *pgdat,
+ struct zone *zone, struct scan_control *sc)
{
struct mem_cgroup *memcg;
@@ -2991,85 +2992,15 @@ static void age_active_anon(struct zone *zone, struct scan_control *sc)
} while (memcg);
}
-static bool zone_balanced(struct zone *zone, int order, bool highorder,
+static bool zone_balanced(struct zone *zone, int order,
unsigned long balance_gap, int classzone_idx)
{
unsigned long mark = high_wmark_pages(zone) + balance_gap;
- /*
- * When checking from pgdat_balanced(), kswapd should stop and sleep
- * when it reaches the high order-0 watermark and let kcompactd take
- * over. Other callers such as wakeup_kswapd() want to determine the
- * true high-order watermark.
- */
- if (IS_ENABLED(CONFIG_COMPACTION) && !highorder) {
- mark += (1UL << order);
- order = 0;
- }
-
return zone_watermark_ok_safe(zone, order, mark, classzone_idx);
}
/*
- * pgdat_balanced() is used when checking if a node is balanced.
- *
- * For order-0, all zones must be balanced!
- *
- * For high-order allocations only zones that meet watermarks and are in a
- * zone allowed by the callers classzone_idx are added to balanced_pages. The
- * total of balanced pages must be at least 25% of the zones allowed by
- * classzone_idx for the node to be considered balanced. Forcing all zones to
- * be balanced for high orders can cause excessive reclaim when there are
- * imbalanced zones.
- * The choice of 25% is due to
- * o a 16M DMA zone that is balanced will not balance a zone on any
- * reasonable sized machine
- * o On all other machines, the top zone must be at least a reasonable
- * percentage of the middle zones. For example, on 32-bit x86, highmem
- * would need to be at least 256M for it to be balance a whole node.
- * Similarly, on x86-64 the Normal zone would need to be at least 1G
- * to balance a node on its own. These seemed like reasonable ratios.
- */
-static bool pgdat_balanced(pg_data_t *pgdat, int order, int classzone_idx)
-{
- unsigned long managed_pages = 0;
- unsigned long balanced_pages = 0;
- int i;
-
- /* Check the watermark levels */
- for (i = 0; i <= classzone_idx; i++) {
- struct zone *zone = pgdat->node_zones + i;
-
- if (!populated_zone(zone))
- continue;
-
- managed_pages += zone->managed_pages;
-
- /*
- * A special case here:
- *
- * balance_pgdat() skips over all_unreclaimable after
- * DEF_PRIORITY. Effectively, it considers them balanced so
- * they must be considered balanced here as well!
- */
- if (!pgdat_reclaimable(zone->zone_pgdat)) {
- balanced_pages += zone->managed_pages;
- continue;
- }
-
- if (zone_balanced(zone, order, false, 0, i))
- balanced_pages += zone->managed_pages;
- else if (!order)
- return false;
- }
-
- if (order)
- return balanced_pages >= (managed_pages >> 2);
- else
- return true;
-}
-
-/*
* Prepare kswapd for sleeping. This verifies that there are no processes
* waiting in throttle_direct_reclaim() and that watermarks have been met.
*
@@ -3078,6 +3009,8 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int classzone_idx)
static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
int classzone_idx)
{
+ int i;
+
/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
if (remaining)
return false;
@@ -3098,101 +3031,90 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
if (waitqueue_active(&pgdat->pfmemalloc_wait))
wake_up_all(&pgdat->pfmemalloc_wait);
- return pgdat_balanced(pgdat, order, classzone_idx);
+ for (i = 0; i <= classzone_idx; i++) {
+ struct zone *zone = pgdat->node_zones + i;
+
+ if (!populated_zone(zone))
+ continue;
+
+ if (zone_balanced(zone, order, 0, classzone_idx))
+ return true;
+ }
+
+ return false;
}
/*
- * kswapd shrinks the zone by the number of pages required to reach
- * the high watermark.
+ * kswapd shrinks a node of pages that are at or below the highest usable
+ * zone that is currently unbalanced.
*
* Returns true if kswapd scanned at least the requested number of pages to
* reclaim or if the lack of progress was due to pages under writeback.
* This is used to determine if the scanning priority needs to be raised.
*/
-static bool kswapd_shrink_zone(struct zone *zone,
+static bool kswapd_shrink_node(pg_data_t *pgdat,
int classzone_idx,
struct scan_control *sc)
{
- unsigned long balance_gap;
- bool lowmem_pressure;
- struct pglist_data *pgdat = zone->zone_pgdat;
+ struct zone *zone;
+ int z;
- /* Reclaim above the high watermark. */
- sc->nr_to_reclaim = max(SWAP_CLUSTER_MAX, high_wmark_pages(zone));
+ /* Reclaim a number of pages proportional to the number of zones */
+ sc->nr_to_reclaim = 0;
+ for (z = 0; z <= classzone_idx; z++) {
+ zone = pgdat->node_zones + z;
+ if (!populated_zone(zone))
+ continue;
- /*
- * We put equal pressure on every zone, unless one zone has way too
- * many pages free already. The "too many pages" is defined as the
- * high wmark plus a "gap" where the gap is either the low
- * watermark or 1% of the zone, whichever is smaller.
- */
- balance_gap = min(low_wmark_pages(zone), DIV_ROUND_UP(
- zone->managed_pages, KSWAPD_ZONE_BALANCE_GAP_RATIO));
+ sc->nr_to_reclaim += max(high_wmark_pages(zone), SWAP_CLUSTER_MAX);
+ }
/*
- * If there is no low memory pressure or the zone is balanced then no
- * reclaim is necessary
+ * Historically care was taken to put equal pressure on all zones but
+ * now pressure is applied based on node LRU order.
*/
- lowmem_pressure = (buffer_heads_over_limit && is_highmem(zone));
- if (!lowmem_pressure && zone_balanced(zone, sc->order, false,
- balance_gap, classzone_idx))
- return true;
-
- shrink_node(zone->zone_pgdat, sc, classzone_idx);
-
- /* TODO: ANOMALY */
- clear_bit(PGDAT_WRITEBACK, &pgdat->flags);
+ shrink_node(pgdat, sc, classzone_idx);
/*
- * If a zone reaches its high watermark, consider it to be no longer
- * congested. It's possible there are dirty pages backed by congested
- * BDIs but as pressure is relieved, speculatively avoid congestion
- * waits.
+ * Fragmentation may mean that the system cannot be rebalanced for
+ * high-order allocations. If twice the allocation size has been
+ * reclaimed then recheck watermarks only at order-0 to prevent
+ * excessive reclaim. Assume that a process that requested a
+ * high-order allocation can direct reclaim/compact.
*/
- if (pgdat_reclaimable(zone->zone_pgdat) &&
- zone_balanced(zone, sc->order, false, 0, classzone_idx)) {
- clear_bit(PGDAT_CONGESTED, &pgdat->flags);
- clear_bit(PGDAT_DIRTY, &pgdat->flags);
- }
+ if (sc->order && sc->nr_reclaimed >= 2UL << sc->order)
+ sc->order = 0;
return sc->nr_scanned >= sc->nr_to_reclaim;
}
/*
- * For kswapd, balance_pgdat() will work across all this node's zones until
- * they are all at high_wmark_pages(zone).
- *
- * Returns the highest zone idx kswapd was reclaiming at
+ * For kswapd, balance_pgdat() will reclaim pages across a node from zones
+ * that are eligible for use by the caller until at least one zone is
+ * balanced.
*
- * There is special handling here for zones which are full of pinned pages.
- * This can happen if the pages are all mlocked, or if they are all used by
- * device drivers (say, ZONE_DMA). Or if they are all in use by hugetlb.
- * What we do is to detect the case where all pages in the zone have been
- * scanned twice and there has been zero successful reclaim. Mark the zone as
- * dead and from now on, only perform a short scan. Basically we're polling
- * the zone for when the problem goes away.
+ * Returns the order kswapd finished reclaiming at.
*
* kswapd scans the zones in the highmem->normal->dma direction. It skips
* zones which have free_pages > high_wmark_pages(zone), but once a zone is
- * found to have free_pages <= high_wmark_pages(zone), we scan that zone and the
- * lower zones regardless of the number of free pages in the lower zones. This
- * interoperates with the page allocator fallback scheme to ensure that aging
- * of pages is balanced across the zones.
+ * found to have free_pages <= high_wmark_pages(zone), any page in that zone
+ * or lower is eligible for reclaim until at least one usable zone is
+ * balanced.
*/
static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
{
int i;
- int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */
unsigned long nr_soft_reclaimed;
unsigned long nr_soft_scanned;
+ struct zone *zone;
struct scan_control sc = {
.gfp_mask = GFP_KERNEL,
- .reclaim_idx = MAX_NR_ZONES - 1,
.order = order,
.priority = DEF_PRIORITY,
.may_writepage = !laptop_mode,
.may_unmap = 1,
.may_swap = 1,
+ .reclaim_idx = classzone_idx,
};
count_vm_event(PAGEOUTRUN);
@@ -3203,21 +3125,10 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
/* Scan from the highest requested zone to dma */
for (i = classzone_idx; i >= 0; i--) {
- struct zone *zone = pgdat->node_zones + i;
-
+ zone = pgdat->node_zones + i;
if (!populated_zone(zone))
continue;
- if (sc.priority != DEF_PRIORITY &&
- !pgdat_reclaimable(zone->zone_pgdat))
- continue;
-
- /*
- * Do some background aging of the anon list, to give
- * pages a chance to be referenced before reclaiming.
- */
- age_active_anon(zone, &sc);
-
/*
* If the number of buffer_heads in the machine
* exceeds the maximum allowed level and this node
@@ -3225,19 +3136,17 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
* it to relieve lowmem pressure.
*/
if (buffer_heads_over_limit && is_highmem_idx(i)) {
- end_zone = i;
+ classzone_idx = i;
break;
}
- if (!zone_balanced(zone, order, false, 0, 0)) {
- end_zone = i;
+ if (!zone_balanced(zone, order, 0, 0)) {
+ classzone_idx = i;
break;
} else {
/*
- * If balanced, clear the dirty and congested
- * flags
- *
- * TODO: ANOMALY
+ * If any eligible zone is balanced then the
+ * node is not considered congested or dirty.
*/
clear_bit(PGDAT_CONGESTED, &zone->zone_pgdat->flags);
clear_bit(PGDAT_DIRTY, &zone->zone_pgdat->flags);
@@ -3248,51 +3157,34 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
goto out;
/*
+ * Do some background aging of the anon list, to give
+ * pages a chance to be referenced before reclaiming. All
+ * pages are rotated regardless of classzone as this is
+ * about consistent aging.
+ */
+ age_active_anon(pgdat, &pgdat->node_zones[MAX_NR_ZONES - 1], &sc);
+
+ /*
* If we're getting trouble reclaiming, start doing writepage
* even in laptop mode.
*/
- if (sc.priority < DEF_PRIORITY - 2)
+ if (sc.priority < DEF_PRIORITY - 2 || !pgdat_reclaimable(pgdat))
sc.may_writepage = 1;
+ /* Call soft limit reclaim before calling shrink_node. */
+ sc.nr_scanned = 0;
+ nr_soft_scanned = 0;
+ nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone, sc.order,
+ sc.gfp_mask, &nr_soft_scanned);
+ sc.nr_reclaimed += nr_soft_reclaimed;
+
/*
- * Continue scanning in the highmem->dma direction stopping at
- * the last zone which needs scanning. This may reclaim lowmem
- * pages that are not necessary for zone balancing but it
- * preserves LRU ordering. It is assumed that the bulk of
- * allocation requests can use arbitrary zones with the
- * possible exception of big highmem:lowmem configurations.
+ * There should be no need to raise the scanning priority if
+ * enough pages are already being scanned that the high
+ * watermark would be met at 100% efficiency.
*/
- for (i = end_zone; i >= 0; i--) {
- struct zone *zone = pgdat->node_zones + i;
-
- if (!populated_zone(zone))
- continue;
-
- if (sc.priority != DEF_PRIORITY &&
- !pgdat_reclaimable(zone->zone_pgdat))
- continue;
-
- sc.nr_scanned = 0;
- sc.reclaim_idx = i;
-
- nr_soft_scanned = 0;
- /*
- * Call soft limit reclaim before calling shrink_zone.
- */
- nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone,
- order, sc.gfp_mask,
- &nr_soft_scanned);
- sc.nr_reclaimed += nr_soft_reclaimed;
-
- /*
- * There should be no need to raise the scanning
- * priority if enough pages are already being scanned
- * that that high watermark would be met at 100%
- * efficiency.
- */
- if (kswapd_shrink_zone(zone, end_zone, &sc))
- raise_priority = false;
- }
+ if (kswapd_shrink_node(pgdat, classzone_idx, &sc))
+ raise_priority = false;
/*
* If the low watermark is met there is no need for processes
@@ -3308,20 +3200,37 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
break;
/*
+ * Stop reclaiming if any eligible zone is balanced and clear
+ * node writeback or congested.
+ */
+ for (i = 0; i <= classzone_idx; i++) {
+ zone = pgdat->node_zones + i;
+ if (!populated_zone(zone))
+ continue;
+
+ if (zone_balanced(zone, sc.order, 0, classzone_idx)) {
+ clear_bit(PGDAT_CONGESTED, &pgdat->flags);
+ clear_bit(PGDAT_DIRTY, &pgdat->flags);
+ goto out;
+ }
+ }
+
+ /*
* Raise priority if scanning rate is too low or there was no
* progress in reclaiming pages
*/
if (raise_priority || !sc.nr_reclaimed)
sc.priority--;
- } while (sc.priority >= 1 &&
- !pgdat_balanced(pgdat, order, classzone_idx));
+ } while (sc.priority >= 1);
out:
/*
- * Return the highest zone idx we were reclaiming at so
- * prepare_kswapd_sleep() makes the same decisions as here.
+ * Return the order kswapd stopped reclaiming at as
+ * prepare_kswapd_sleep() takes it into account. If another caller
+ * entered the allocator slow path while kswapd was awake, order will
+ * remain at the higher level.
*/
- return end_zone;
+ return sc.order;
}
static void kswapd_try_to_sleep(pg_data_t *pgdat, int order,
@@ -3478,8 +3387,9 @@ static int kswapd(void *p)
*/
if (!ret) {
trace_mm_vmscan_kswapd_wake(pgdat->node_id, order);
- balanced_classzone_idx = balance_pgdat(pgdat, order,
- classzone_idx);
+
+ /* return value ignored until next patch */
+ balance_pgdat(pgdat, order, classzone_idx);
}
}
@@ -3509,7 +3419,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
}
if (!waitqueue_active(&pgdat->kswapd_wait))
return;
- if (zone_balanced(zone, order, true, 0, 0))
+ if (zone_balanced(zone, order, 0, 0))
return;
trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
--
2.6.4
* Re: [PATCH 06/27] mm, vmscan: Make kswapd reclaim in terms of nodes
2016-06-09 18:04 ` [PATCH 06/27] mm, vmscan: Make kswapd reclaim in terms of nodes Mel Gorman
@ 2016-06-15 14:23 ` Vlastimil Babka
0 siblings, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2016-06-15 14:23 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton, Linux-MM; +Cc: Rik van Riel, Johannes Weiner, LKML
On 06/09/2016 08:04 PM, Mel Gorman wrote:
> Patch "mm: vmscan: Begin reclaiming pages on a per-node basis" started
> thinking of reclaim in terms of nodes but kswapd is still zone-centric. This
> patch gets rid of many of the node-based versus zone-based decisions.
>
> o A node is considered balanced when any eligible lower zone is balanced.
> This eliminates one class of age-inversion problem because we avoid
> reclaiming a newer page just because it's in the wrong zone
> o pgdat_balanced disappears because we now only care about one zone being
> balanced.
> o Some anomalies related to writeback and congestion tracking being based on
> zones disappear.
> o kswapd no longer has to take care to reclaim zones in the reverse order
> that the page allocator uses.
> o Most importantly of all, reclaim from node 0 with multiple zones will
> have similar aging and reclaiming characteristics as every
> other node.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
* [PATCH 06/27] mm, vmscan: Make kswapd reclaim in terms of nodes
2016-06-09 18:04 [PATCH 00/27] Move LRU page reclaim from zones to nodes v6 Mel Gorman
@ 2016-06-09 18:04 ` Mel Gorman
2016-06-15 14:23 ` Vlastimil Babka
0 siblings, 1 reply; 9+ messages in thread
From: Mel Gorman @ 2016-06-09 18:04 UTC (permalink / raw)
To: Andrew Morton, Linux-MM
Cc: Rik van Riel, Vlastimil Babka, Johannes Weiner, LKML, Mel Gorman
Patch "mm: vmscan: Begin reclaiming pages on a per-node basis" started
thinking of reclaim in terms of nodes but kswapd is still zone-centric. This
patch gets rid of many of the node-based versus zone-based decisions.
o A node is considered balanced when any eligible lower zone is balanced.
This eliminates one class of age-inversion problem because we avoid
reclaiming a newer page just because it's in the wrong zone
o pgdat_balanced disappears because we now only care about one zone being
balanced.
o Some anomalies related to writeback and congestion tracking being based on
zones disappear.
o kswapd no longer has to take care to reclaim zones in the reverse order
that the page allocator uses.
o Most importantly of all, reclaim from node 0 with multiple zones will
have similar aging and reclaiming characteristics as every
other node.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
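For illustration only (not part of the patch): a minimal user-space sketch
of the "any eligible zone is balanced" rule from the first bullet of the
changelog. The struct, the sketch_* names and SKETCH_MAX_ZONES are
hypothetical stand-ins, and the watermark test is reduced to a plain
free-pages comparison; the real zone_balanced() also accounts for the
allocation order and lowmem reserves.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_ZONES 4	/* hypothetical stand-in for MAX_NR_ZONES */

/* Hypothetical, simplified zone: only the fields this sketch needs. */
struct sketch_zone {
	unsigned long present_pages;	/* 0 means the zone is unpopulated */
	unsigned long free_pages;
	unsigned long high_wmark;
};

/* Simplified stand-in for zone_balanced(): free pages above the high mark. */
static bool sketch_zone_balanced(const struct sketch_zone *zone)
{
	return zone->free_pages > zone->high_wmark;
}

/*
 * The node counts as balanced as soon as any populated zone at or below
 * classzone_idx is balanced. Unpopulated zones are skipped.
 */
static bool sketch_node_balanced(const struct sketch_zone *zones,
				 int classzone_idx)
{
	int i;

	assert(classzone_idx >= 0 && classzone_idx < SKETCH_MAX_ZONES);

	for (i = 0; i <= classzone_idx; i++) {
		if (!zones[i].present_pages)
			continue;
		if (sketch_zone_balanced(&zones[i]))
			return true;
	}

	return false;
}
```

With this rule a single balanced lower zone is enough to let kswapd stop,
which is a much weaker condition than the old pgdat_balanced() requirement
that every zone be balanced for order-0.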
mm/vmscan.c | 292 +++++++++++++++++++++---------------------------------------
1 file changed, 101 insertions(+), 191 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0a619241c576..9368af4cfb06 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2942,7 +2942,8 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
}
#endif
-static void age_active_anon(struct zone *zone, struct scan_control *sc)
+static void age_active_anon(struct pglist_data *pgdat,
+ struct zone *zone, struct scan_control *sc)
{
struct mem_cgroup *memcg;
@@ -2961,85 +2962,15 @@ static void age_active_anon(struct zone *zone, struct scan_control *sc)
} while (memcg);
}
-static bool zone_balanced(struct zone *zone, int order, bool highorder,
+static bool zone_balanced(struct zone *zone, int order,
unsigned long balance_gap, int classzone_idx)
{
unsigned long mark = high_wmark_pages(zone) + balance_gap;
- /*
- * When checking from pgdat_balanced(), kswapd should stop and sleep
- * when it reaches the high order-0 watermark and let kcompactd take
- * over. Other callers such as wakeup_kswapd() want to determine the
- * true high-order watermark.
- */
- if (IS_ENABLED(CONFIG_COMPACTION) && !highorder) {
- mark += (1UL << order);
- order = 0;
- }
-
return zone_watermark_ok_safe(zone, order, mark, classzone_idx);
}
/*
- * pgdat_balanced() is used when checking if a node is balanced.
- *
- * For order-0, all zones must be balanced!
- *
- * For high-order allocations only zones that meet watermarks and are in a
- * zone allowed by the callers classzone_idx are added to balanced_pages. The
- * total of balanced pages must be at least 25% of the zones allowed by
- * classzone_idx for the node to be considered balanced. Forcing all zones to
- * be balanced for high orders can cause excessive reclaim when there are
- * imbalanced zones.
- * The choice of 25% is due to
- * o a 16M DMA zone that is balanced will not balance a zone on any
- * reasonable sized machine
- * o On all other machines, the top zone must be at least a reasonable
- * percentage of the middle zones. For example, on 32-bit x86, highmem
- * would need to be at least 256M for it to be balance a whole node.
- * Similarly, on x86-64 the Normal zone would need to be at least 1G
- * to balance a node on its own. These seemed like reasonable ratios.
- */
-static bool pgdat_balanced(pg_data_t *pgdat, int order, int classzone_idx)
-{
- unsigned long managed_pages = 0;
- unsigned long balanced_pages = 0;
- int i;
-
- /* Check the watermark levels */
- for (i = 0; i <= classzone_idx; i++) {
- struct zone *zone = pgdat->node_zones + i;
-
- if (!populated_zone(zone))
- continue;
-
- managed_pages += zone->managed_pages;
-
- /*
- * A special case here:
- *
- * balance_pgdat() skips over all_unreclaimable after
- * DEF_PRIORITY. Effectively, it considers them balanced so
- * they must be considered balanced here as well!
- */
- if (!pgdat_reclaimable(zone->zone_pgdat)) {
- balanced_pages += zone->managed_pages;
- continue;
- }
-
- if (zone_balanced(zone, order, false, 0, i))
- balanced_pages += zone->managed_pages;
- else if (!order)
- return false;
- }
-
- if (order)
- return balanced_pages >= (managed_pages >> 2);
- else
- return true;
-}
-
-/*
* Prepare kswapd for sleeping. This verifies that there are no processes
* waiting in throttle_direct_reclaim() and that watermarks have been met.
*
@@ -3048,6 +2979,8 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int classzone_idx)
static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
int classzone_idx)
{
+ int i;
+
/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
if (remaining)
return false;
@@ -3068,101 +3001,90 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
if (waitqueue_active(&pgdat->pfmemalloc_wait))
wake_up_all(&pgdat->pfmemalloc_wait);
- return pgdat_balanced(pgdat, order, classzone_idx);
+ for (i = 0; i <= classzone_idx; i++) {
+ struct zone *zone = pgdat->node_zones + i;
+
+ if (!populated_zone(zone))
+ continue;
+
+ if (zone_balanced(zone, order, 0, classzone_idx))
+ return true;
+ }
+
+ return false;
}
/*
- * kswapd shrinks the zone by the number of pages required to reach
- * the high watermark.
+ * kswapd shrinks a node of pages that are at or below the highest usable
+ * zone that is currently unbalanced.
*
* Returns true if kswapd scanned at least the requested number of pages to
* reclaim or if the lack of progress was due to pages under writeback.
* This is used to determine if the scanning priority needs to be raised.
*/
-static bool kswapd_shrink_zone(struct zone *zone,
+static bool kswapd_shrink_node(pg_data_t *pgdat,
int classzone_idx,
struct scan_control *sc)
{
- unsigned long balance_gap;
- bool lowmem_pressure;
- struct pglist_data *pgdat = zone->zone_pgdat;
+ struct zone *zone;
+ unsigned long nr_to_reclaim = 0;
+ int z;
- /* Reclaim above the high watermark. */
- sc->nr_to_reclaim = max(SWAP_CLUSTER_MAX, high_wmark_pages(zone));
+ /* Reclaim a number of pages proportional to the number of zones */
+ for (z = 0; z <= classzone_idx; z++) {
+ zone = pgdat->node_zones + z;
+ if (!populated_zone(zone))
+ continue;
- /*
- * We put equal pressure on every zone, unless one zone has way too
- * many pages free already. The "too many pages" is defined as the
- * high wmark plus a "gap" where the gap is either the low
- * watermark or 1% of the zone, whichever is smaller.
- */
- balance_gap = min(low_wmark_pages(zone), DIV_ROUND_UP(
- zone->managed_pages, KSWAPD_ZONE_BALANCE_GAP_RATIO));
+ nr_to_reclaim += max(high_wmark_pages(zone), SWAP_CLUSTER_MAX);
+ }
/*
- * If there is no low memory pressure or the zone is balanced then no
- * reclaim is necessary
+ * Historically care was taken to put equal pressure on all zones but
+ * now pressure is applied based on node LRU order.
*/
- lowmem_pressure = (buffer_heads_over_limit && is_highmem(zone));
- if (!lowmem_pressure && zone_balanced(zone, sc->order, false,
- balance_gap, classzone_idx))
- return true;
-
- shrink_node(zone->zone_pgdat, sc, classzone_idx);
-
- /* TODO: ANOMALY */
- clear_bit(PGDAT_WRITEBACK, &pgdat->flags);
+ shrink_node(pgdat, sc, classzone_idx);
/*
- * If a zone reaches its high watermark, consider it to be no longer
- * congested. It's possible there are dirty pages backed by congested
- * BDIs but as pressure is relieved, speculatively avoid congestion
- * waits.
+ * Fragmentation may mean that the system cannot be rebalanced for
+ * high-order allocations. If twice the allocation size has been
+ * reclaimed then recheck watermarks only at order-0 to prevent
+ * excessive reclaim. Assume that a process that requested a
+ * high-order allocation can direct reclaim/compact.
*/
- if (pgdat_reclaimable(zone->zone_pgdat) &&
- zone_balanced(zone, sc->order, false, 0, classzone_idx)) {
- clear_bit(PGDAT_CONGESTED, &pgdat->flags);
- clear_bit(PGDAT_DIRTY, &pgdat->flags);
- }
+ if (sc->order && sc->nr_reclaimed >= 2UL << sc->order)
+ sc->order = 0;
return sc->nr_scanned >= sc->nr_to_reclaim;
}
/*
- * For kswapd, balance_pgdat() will work across all this node's zones until
- * they are all at high_wmark_pages(zone).
- *
- * Returns the highest zone idx kswapd was reclaiming at
+ * For kswapd, balance_pgdat() will reclaim pages across a node from zones
+ * that are eligible for use by the caller until at least one zone is
+ * balanced.
*
- * There is special handling here for zones which are full of pinned pages.
- * This can happen if the pages are all mlocked, or if they are all used by
- * device drivers (say, ZONE_DMA). Or if they are all in use by hugetlb.
- * What we do is to detect the case where all pages in the zone have been
- * scanned twice and there has been zero successful reclaim. Mark the zone as
- * dead and from now on, only perform a short scan. Basically we're polling
- * the zone for when the problem goes away.
+ * Returns the order kswapd finished reclaiming at.
*
* kswapd scans the zones in the highmem->normal->dma direction. It skips
* zones which have free_pages > high_wmark_pages(zone), but once a zone is
- * found to have free_pages <= high_wmark_pages(zone), we scan that zone and the
- * lower zones regardless of the number of free pages in the lower zones. This
- * interoperates with the page allocator fallback scheme to ensure that aging
- * of pages is balanced across the zones.
+ * found to have free_pages <= high_wmark_pages(zone), any page in that zone
+ * or lower is eligible for reclaim until at least one usable zone is
+ * balanced.
*/
static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
{
int i;
- int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */
unsigned long nr_soft_reclaimed;
unsigned long nr_soft_scanned;
+ struct zone *zone;
struct scan_control sc = {
.gfp_mask = GFP_KERNEL,
- .reclaim_idx = MAX_NR_ZONES - 1,
.order = order,
.priority = DEF_PRIORITY,
.may_writepage = !laptop_mode,
.may_unmap = 1,
.may_swap = 1,
+ .reclaim_idx = classzone_idx,
};
count_vm_event(PAGEOUTRUN);
@@ -3173,21 +3095,10 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
/* Scan from the highest requested zone to dma */
for (i = classzone_idx; i >= 0; i--) {
- struct zone *zone = pgdat->node_zones + i;
-
+ zone = pgdat->node_zones + i;
if (!populated_zone(zone))
continue;
- if (sc.priority != DEF_PRIORITY &&
- !pgdat_reclaimable(zone->zone_pgdat))
- continue;
-
- /*
- * Do some background aging of the anon list, to give
- * pages a chance to be referenced before reclaiming.
- */
- age_active_anon(zone, &sc);
-
/*
* If the number of buffer_heads in the machine
* exceeds the maximum allowed level and this node
@@ -3195,19 +3106,17 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
* it to relieve lowmem pressure.
*/
if (buffer_heads_over_limit && is_highmem_idx(i)) {
- end_zone = i;
+ classzone_idx = i;
break;
}
- if (!zone_balanced(zone, order, false, 0, 0)) {
- end_zone = i;
+ if (!zone_balanced(zone, order, 0, 0)) {
+ classzone_idx = i;
break;
} else {
/*
- * If balanced, clear the dirty and congested
- * flags
- *
- * TODO: ANOMALY
+ * If any eligible zone is balanced then the
+ * node is not considered congested or dirty.
*/
clear_bit(PGDAT_CONGESTED, &zone->zone_pgdat->flags);
clear_bit(PGDAT_DIRTY, &zone->zone_pgdat->flags);
@@ -3218,51 +3127,34 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
goto out;
/*
+ * Do some background aging of the anon list, to give
+ * pages a chance to be referenced before reclaiming. All
+ * pages are rotated regardless of classzone as this is
+ * about consistent aging.
+ */
+ age_active_anon(pgdat, &pgdat->node_zones[MAX_NR_ZONES - 1], &sc);
+
+ /*
* If we're getting trouble reclaiming, start doing writepage
* even in laptop mode.
*/
- if (sc.priority < DEF_PRIORITY - 2)
+ if (sc.priority < DEF_PRIORITY - 2 || !pgdat_reclaimable(pgdat))
sc.may_writepage = 1;
+ /* Call soft limit reclaim before calling shrink_node. */
+ sc.nr_scanned = 0;
+ nr_soft_scanned = 0;
+ nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone, sc.order,
+ sc.gfp_mask, &nr_soft_scanned);
+ sc.nr_reclaimed += nr_soft_reclaimed;
+
/*
- * Continue scanning in the highmem->dma direction stopping at
- * the last zone which needs scanning. This may reclaim lowmem
- * pages that are not necessary for zone balancing but it
- * preserves LRU ordering. It is assumed that the bulk of
- * allocation requests can use arbitrary zones with the
- * possible exception of big highmem:lowmem configurations.
+ * There should be no need to raise the scanning priority if
+ * enough pages are already being scanned that the high
+ * watermark would be met at 100% efficiency.
*/
- for (i = end_zone; i >= end_zone; i--) {
- struct zone *zone = pgdat->node_zones + i;
-
- if (!populated_zone(zone))
- continue;
-
- if (sc.priority != DEF_PRIORITY &&
- !pgdat_reclaimable(zone->zone_pgdat))
- continue;
-
- sc.nr_scanned = 0;
- sc.reclaim_idx = i;
-
- nr_soft_scanned = 0;
- /*
- * Call soft limit reclaim before calling shrink_zone.
- */
- nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone,
- order, sc.gfp_mask,
- &nr_soft_scanned);
- sc.nr_reclaimed += nr_soft_reclaimed;
-
- /*
- * There should be no need to raise the scanning
- * priority if enough pages are already being scanned
- * that that high watermark would be met at 100%
- * efficiency.
- */
- if (kswapd_shrink_zone(zone, end_zone, &sc))
- raise_priority = false;
- }
+ if (kswapd_shrink_node(pgdat, classzone_idx, &sc))
+ raise_priority = false;
/*
* If the low watermark is met there is no need for processes
@@ -3278,20 +3170,37 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
break;
/*
+ * Stop reclaiming if any eligible zone is balanced and clear
+ * the node's congested and dirty flags.
+ */
+ for (i = 0; i <= classzone_idx; i++) {
+ zone = pgdat->node_zones + i;
+ if (!populated_zone(zone))
+ continue;
+
+ if (zone_balanced(zone, sc.order, 0, classzone_idx)) {
+ clear_bit(PGDAT_CONGESTED, &pgdat->flags);
+ clear_bit(PGDAT_DIRTY, &pgdat->flags);
+ goto out;
+ }
+ }
+
+ /*
* Raise priority if scanning rate is too low or there was no
* progress in reclaiming pages
*/
if (raise_priority || !sc.nr_reclaimed)
sc.priority--;
- } while (sc.priority >= 1 &&
- !pgdat_balanced(pgdat, order, classzone_idx));
+ } while (sc.priority >= 1);
out:
/*
- * Return the highest zone idx we were reclaiming at so
- * prepare_kswapd_sleep() makes the same decisions as here.
+ * Return the order kswapd stopped reclaiming at as
+ * prepare_kswapd_sleep() takes it into account. If another caller
+ * entered the allocator slow path while kswapd was awake, order will
+ * remain at the higher level.
*/
- return end_zone;
+ return sc.order;
}
static void kswapd_try_to_sleep(pg_data_t *pgdat, int order,
@@ -3448,8 +3357,9 @@ static int kswapd(void *p)
*/
if (!ret) {
trace_mm_vmscan_kswapd_wake(pgdat->node_id, order);
- balanced_classzone_idx = balance_pgdat(pgdat, order,
- classzone_idx);
+
+ /* return value ignored until next patch */
+ balance_pgdat(pgdat, order, classzone_idx);
}
}
@@ -3479,7 +3389,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
}
if (!waitqueue_active(&pgdat->kswapd_wait))
return;
- if (zone_balanced(zone, order, true, 0, 0))
+ if (zone_balanced(zone, order, 0, 0))
return;
trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
--
2.6.4
* Re: [PATCH 06/27] mm, vmscan: Make kswapd reclaim in terms of nodes
2016-04-15 9:13 ` [PATCH 06/27] mm, vmscan: Make kswapd reclaim in terms of nodes Mel Gorman
@ 2016-04-28 8:36 ` Vlastimil Babka
0 siblings, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2016-04-28 8:36 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton, Linux-MM
Cc: Rik van Riel, Johannes Weiner, Jesper Dangaard Brouer, LKML
On 04/15/2016 11:13 AM, Mel Gorman wrote:
> /*
> - * If a zone reaches its high watermark, consider it to be no longer
> - * congested. It's possible there are dirty pages backed by congested
> - * BDIs but as pressure is relieved, speculatively avoid congestion
> - * waits.
> + * Fragmentation may mean that the system cannot be rebalanced for
> + * high-order allocations. If twice the allocation size has been
> + * reclaimed then recheck watermarks only at order-0 to prevent
> + * excessive reclaim. Assume that a process requested a high-order
> + * can direct reclaim/compact.
Also kcompactd is woken up in this case...
> */
> - if (pgdat_reclaimable(zone->zone_pgdat) &&
> - zone_balanced(zone, sc->order, false, 0, classzone_idx)) {
> - clear_bit(PGDAT_CONGESTED, &pgdat->flags);
> - clear_bit(PGDAT_DIRTY, &pgdat->flags);
> - }
> + if (sc->order && sc->nr_reclaimed >= 2UL << sc->order)
> + sc->order = 0;
>
> return sc->nr_scanned >= sc->nr_to_reclaim;
This does look simpler than my earlier zone_balanced() modification that
you removed. However, I think there's still potential for overreclaim from
a stream of kswapd wakeups, each of which will have to reclaim 2UL <<
sc->order pages regardless of watermarks. These could be high-order
wakeups from GFP_ATOMIC context that have order-0 fallbacks but still
cause kswapd to keep reclaiming when kcompactd can't keep up due to
fragmentation...
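
For illustration only (not part of the original reply): a minimal
user-space sketch of the quoted check and of the arithmetic behind the
concern above. The struct and the sketch_* helpers are hypothetical; only
the "2UL << order" comparison is taken from the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of struct scan_control used by this sketch. */
struct sketch_scan_control {
	int order;
	unsigned long nr_reclaimed;
};

/* Pages that must be reclaimed before a high-order request drops to order-0. */
static unsigned long sketch_order_reset_threshold(int order)
{
	assert(order >= 0 && order < 31);
	return 2UL << order;
}

/* Mirrors the quoted check. Returns true if the order was reset to 0. */
static bool sketch_maybe_reset_order(struct sketch_scan_control *sc)
{
	if (sc->order &&
	    sc->nr_reclaimed >= sketch_order_reset_threshold(sc->order)) {
		sc->order = 0;
		return true;
	}

	return false;
}

/*
 * Pages reclaimed by nr_wakeups successive wakeups at the same order
 * (order > 0), assuming the scenario above where every wakeup has to
 * reach the threshold before watermarks are rechecked at order-0.
 */
static unsigned long sketch_reclaim_for_wakeups(int order,
						unsigned long nr_wakeups)
{
	return nr_wakeups * sketch_order_reset_threshold(order);
}
```

Under those assumptions, 100 order-3 wakeups would cost 100 * (2UL << 3) =
1600 pages of reclaim, whatever the order-0 watermarks say.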
* [PATCH 06/27] mm, vmscan: Make kswapd reclaim in terms of nodes
2016-04-15 9:13 [PATCH 00/27] Move LRU page reclaim from zones to nodes v5 Mel Gorman
@ 2016-04-15 9:13 ` Mel Gorman
2016-04-28 8:36 ` Vlastimil Babka
0 siblings, 1 reply; 9+ messages in thread
From: Mel Gorman @ 2016-04-15 9:13 UTC (permalink / raw)
To: Andrew Morton, Linux-MM
Cc: Rik van Riel, Vlastimil Babka, Johannes Weiner,
Jesper Dangaard Brouer, LKML, Mel Gorman
Patch "mm: vmscan: Begin reclaiming pages on a per-node basis" started
thinking of reclaim in terms of nodes but kswapd is still zone-centric. This
patch gets rid of many of the node-based versus zone-based decisions.
o A node is considered balanced when any eligible lower zone is balanced.
This eliminates one class of age-inversion problem because we avoid
reclaiming a newer page just because it's in the wrong zone
o pgdat_balanced disappears because we now only care about one zone being
balanced.
o Some anomalies related to writeback and congestion tracking being based on
zones disappear.
o kswapd no longer has to take care to reclaim zones in the reverse order
that the page allocator uses.
o Most importantly of all, reclaim from node 0 with multiple zones will
have similar aging and reclaiming characteristics as every
other node.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
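For illustration only (not part of the patch): a minimal user-space sketch
of how the per-node reclaim target in kswapd_shrink_node() is built up from
the eligible zones. The struct and the sketch_* names are hypothetical, and
SKETCH_SWAP_CLUSTER_MAX is assumed to match the kernel's SWAP_CLUSTER_MAX.
The hunk below accumulates the sum in a local while the function's return
value compares against sc->nr_to_reclaim; the sketch simply returns the sum
so that a caller could store it there, assuming that is the intent.

```c
#include <assert.h>

#define SKETCH_SWAP_CLUSTER_MAX 32UL	/* assumed value of SWAP_CLUSTER_MAX */

/* Hypothetical, simplified zone: only the fields this sketch needs. */
struct sketch_zone_wmark {
	unsigned long present_pages;	/* 0 means the zone is unpopulated */
	unsigned long high_wmark;
};

static unsigned long sketch_max(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Sum max(high watermark, SWAP_CLUSTER_MAX) over every populated zone at or
 * below classzone_idx, so the target grows with the number of eligible zones.
 */
static unsigned long
sketch_node_reclaim_target(const struct sketch_zone_wmark *zones,
			   int classzone_idx)
{
	unsigned long nr_to_reclaim = 0;
	int z;

	assert(classzone_idx >= 0);

	for (z = 0; z <= classzone_idx; z++) {
		if (!zones[z].present_pages)
			continue;

		nr_to_reclaim += sketch_max(zones[z].high_wmark,
					    SKETCH_SWAP_CLUSTER_MAX);
	}

	return nr_to_reclaim;
}
```

A zone with a tiny high watermark still contributes SWAP_CLUSTER_MAX pages,
so the target never falls below one reclaim batch per populated zone.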
mm/vmscan.c | 292 +++++++++++++++++++++---------------------------------------
1 file changed, 101 insertions(+), 191 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f2534e8f8527..c23d8f9722ad 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2979,7 +2979,8 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
}
#endif
-static void age_active_anon(struct zone *zone, struct scan_control *sc)
+static void age_active_anon(struct pglist_data *pgdat,
+ struct zone *zone, struct scan_control *sc)
{
struct mem_cgroup *memcg;
@@ -2998,85 +2999,15 @@ static void age_active_anon(struct zone *zone, struct scan_control *sc)
} while (memcg);
}
-static bool zone_balanced(struct zone *zone, int order, bool highorder,
+static bool zone_balanced(struct zone *zone, int order,
unsigned long balance_gap, int classzone_idx)
{
unsigned long mark = high_wmark_pages(zone) + balance_gap;
- /*
- * When checking from pgdat_balanced(), kswapd should stop and sleep
- * when it reaches the high order-0 watermark and let kcompactd take
- * over. Other callers such as wakeup_kswapd() want to determine the
- * true high-order watermark.
- */
- if (IS_ENABLED(CONFIG_COMPACTION) && !highorder) {
- mark += (1UL << order);
- order = 0;
- }
-
return zone_watermark_ok_safe(zone, order, mark, classzone_idx);
}
/*
- * pgdat_balanced() is used when checking if a node is balanced.
- *
- * For order-0, all zones must be balanced!
- *
- * For high-order allocations only zones that meet watermarks and are in a
- * zone allowed by the callers classzone_idx are added to balanced_pages. The
- * total of balanced pages must be at least 25% of the zones allowed by
- * classzone_idx for the node to be considered balanced. Forcing all zones to
- * be balanced for high orders can cause excessive reclaim when there are
- * imbalanced zones.
- * The choice of 25% is due to
- * o a 16M DMA zone that is balanced will not balance a zone on any
- * reasonable sized machine
- * o On all other machines, the top zone must be at least a reasonable
- * percentage of the middle zones. For example, on 32-bit x86, highmem
- * would need to be at least 256M for it to be balance a whole node.
- * Similarly, on x86-64 the Normal zone would need to be at least 1G
- * to balance a node on its own. These seemed like reasonable ratios.
- */
-static bool pgdat_balanced(pg_data_t *pgdat, int order, int classzone_idx)
-{
- unsigned long managed_pages = 0;
- unsigned long balanced_pages = 0;
- int i;
-
- /* Check the watermark levels */
- for (i = 0; i <= classzone_idx; i++) {
- struct zone *zone = pgdat->node_zones + i;
-
- if (!populated_zone(zone))
- continue;
-
- managed_pages += zone->managed_pages;
-
- /*
- * A special case here:
- *
- * balance_pgdat() skips over all_unreclaimable after
- * DEF_PRIORITY. Effectively, it considers them balanced so
- * they must be considered balanced here as well!
- */
- if (!pgdat_reclaimable(zone->zone_pgdat)) {
- balanced_pages += zone->managed_pages;
- continue;
- }
-
- if (zone_balanced(zone, order, false, 0, i))
- balanced_pages += zone->managed_pages;
- else if (!order)
- return false;
- }
-
- if (order)
- return balanced_pages >= (managed_pages >> 2);
- else
- return true;
-}
-
-/*
* Prepare kswapd for sleeping. This verifies that there are no processes
* waiting in throttle_direct_reclaim() and that watermarks have been met.
*
@@ -3085,6 +3016,8 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int classzone_idx)
static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
int classzone_idx)
{
+ int i;
+
/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
if (remaining)
return false;
@@ -3105,101 +3038,90 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
if (waitqueue_active(&pgdat->pfmemalloc_wait))
wake_up_all(&pgdat->pfmemalloc_wait);
- return pgdat_balanced(pgdat, order, classzone_idx);
+ for (i = 0; i <= classzone_idx; i++) {
+ struct zone *zone = pgdat->node_zones + i;
+
+ if (!populated_zone(zone))
+ continue;
+
+ if (zone_balanced(zone, order, 0, classzone_idx))
+ return true;
+ }
+
+ return false;
}
/*
- * kswapd shrinks the zone by the number of pages required to reach
- * the high watermark.
+ * kswapd shrinks a node of pages that are at or below the highest usable
+ * zone that is currently unbalanced.
*
* Returns true if kswapd scanned at least the requested number of pages to
* reclaim or if the lack of progress was due to pages under writeback.
* This is used to determine if the scanning priority needs to be raised.
*/
-static bool kswapd_shrink_zone(struct zone *zone,
+static bool kswapd_shrink_node(pg_data_t *pgdat,
int classzone_idx,
struct scan_control *sc)
{
- unsigned long balance_gap;
- bool lowmem_pressure;
- struct pglist_data *pgdat = zone->zone_pgdat;
+ struct zone *zone;
+ unsigned long nr_to_reclaim = 0;
+ int z;
- /* Reclaim above the high watermark. */
- sc->nr_to_reclaim = max(SWAP_CLUSTER_MAX, high_wmark_pages(zone));
+ /* Reclaim a number of pages proportional to the number of zones */
+ for (z = 0; z <= classzone_idx; z++) {
+ zone = pgdat->node_zones + z;
+ if (!populated_zone(zone))
+ continue;
- /*
- * We put equal pressure on every zone, unless one zone has way too
- * many pages free already. The "too many pages" is defined as the
- * high wmark plus a "gap" where the gap is either the low
- * watermark or 1% of the zone, whichever is smaller.
- */
- balance_gap = min(low_wmark_pages(zone), DIV_ROUND_UP(
- zone->managed_pages, KSWAPD_ZONE_BALANCE_GAP_RATIO));
+ nr_to_reclaim += max(high_wmark_pages(zone), SWAP_CLUSTER_MAX);
+ }
/*
- * If there is no low memory pressure or the zone is balanced then no
- * reclaim is necessary
+ * Historically care was taken to put equal pressure on all zones but
+ * now pressure is applied based on node LRU order.
*/
- lowmem_pressure = (buffer_heads_over_limit && is_highmem(zone));
- if (!lowmem_pressure && zone_balanced(zone, sc->order, false,
- balance_gap, classzone_idx))
- return true;
-
- shrink_node(zone->zone_pgdat, sc, classzone_idx);
-
- /* TODO: ANOMALY */
- clear_bit(PGDAT_WRITEBACK, &pgdat->flags);
+ shrink_node(pgdat, sc, classzone_idx);
/*
- * If a zone reaches its high watermark, consider it to be no longer
- * congested. It's possible there are dirty pages backed by congested
- * BDIs but as pressure is relieved, speculatively avoid congestion
- * waits.
+ * Fragmentation may mean that the system cannot be rebalanced for
+ * high-order allocations. If twice the allocation size has been
+ * reclaimed then recheck watermarks only at order-0 to prevent
+ * excessive reclaim. Assume that a process that requested a
+ * high-order allocation can direct reclaim/compact.
*/
- if (pgdat_reclaimable(zone->zone_pgdat) &&
- zone_balanced(zone, sc->order, false, 0, classzone_idx)) {
- clear_bit(PGDAT_CONGESTED, &pgdat->flags);
- clear_bit(PGDAT_DIRTY, &pgdat->flags);
- }
+ if (sc->order && sc->nr_reclaimed >= 2UL << sc->order)
+ sc->order = 0;
return sc->nr_scanned >= sc->nr_to_reclaim;
}
/*
- * For kswapd, balance_pgdat() will work across all this node's zones until
- * they are all at high_wmark_pages(zone).
- *
- * Returns the highest zone idx kswapd was reclaiming at
+ * For kswapd, balance_pgdat() will reclaim pages across a node from zones
+ * that are eligible for use by the caller until at least one zone is
+ * balanced.
*
- * There is special handling here for zones which are full of pinned pages.
- * This can happen if the pages are all mlocked, or if they are all used by
- * device drivers (say, ZONE_DMA). Or if they are all in use by hugetlb.
- * What we do is to detect the case where all pages in the zone have been
- * scanned twice and there has been zero successful reclaim. Mark the zone as
- * dead and from now on, only perform a short scan. Basically we're polling
- * the zone for when the problem goes away.
+ * Returns the order kswapd finished reclaiming at.
*
* kswapd scans the zones in the highmem->normal->dma direction. It skips
* zones which have free_pages > high_wmark_pages(zone), but once a zone is
- * found to have free_pages <= high_wmark_pages(zone), we scan that zone and the
- * lower zones regardless of the number of free pages in the lower zones. This
- * interoperates with the page allocator fallback scheme to ensure that aging
- * of pages is balanced across the zones.
+ * found to have free_pages <= high_wmark_pages(zone), any page in that zone
+ * or lower is eligible for reclaim until at least one usable zone is
+ * balanced.
*/
static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
{
int i;
- int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */
unsigned long nr_soft_reclaimed;
unsigned long nr_soft_scanned;
+ struct zone *zone;
struct scan_control sc = {
.gfp_mask = GFP_KERNEL,
- .reclaim_idx = MAX_NR_ZONES - 1,
.order = order,
.priority = DEF_PRIORITY,
.may_writepage = !laptop_mode,
.may_unmap = 1,
.may_swap = 1,
+ .reclaim_idx = classzone_idx,
};
count_vm_event(PAGEOUTRUN);
@@ -3210,21 +3132,10 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
/* Scan from the highest requested zone to dma */
for (i = classzone_idx; i >= 0; i--) {
- struct zone *zone = pgdat->node_zones + i;
-
+ zone = pgdat->node_zones + i;
if (!populated_zone(zone))
continue;
- if (sc.priority != DEF_PRIORITY &&
- !pgdat_reclaimable(zone->zone_pgdat))
- continue;
-
- /*
- * Do some background aging of the anon list, to give
- * pages a chance to be referenced before reclaiming.
- */
- age_active_anon(zone, &sc);
-
/*
* If the number of buffer_heads in the machine
* exceeds the maximum allowed level and this node
@@ -3232,19 +3143,17 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
* it to relieve lowmem pressure.
*/
if (buffer_heads_over_limit && is_highmem_idx(i)) {
- end_zone = i;
+ classzone_idx = i;
break;
}
- if (!zone_balanced(zone, order, false, 0, 0)) {
- end_zone = i;
+ if (!zone_balanced(zone, order, 0, 0)) {
+ classzone_idx = i;
break;
} else {
/*
- * If balanced, clear the dirty and congested
- * flags
- *
- * TODO: ANOMALY
+ * If any eligible zone is balanced then the
+ * node is not considered congested or dirty.
*/
clear_bit(PGDAT_CONGESTED, &zone->zone_pgdat->flags);
clear_bit(PGDAT_DIRTY, &zone->zone_pgdat->flags);
@@ -3255,51 +3164,34 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
goto out;
/*
+ * Do some background aging of the anon list, to give
+ * pages a chance to be referenced before reclaiming. All
+ * pages are rotated regardless of classzone as this is
+ * about consistent aging.
+ */
+ age_active_anon(pgdat, &pgdat->node_zones[MAX_NR_ZONES - 1], &sc);
+
+ /*
* If we're getting trouble reclaiming, start doing writepage
* even in laptop mode.
*/
- if (sc.priority < DEF_PRIORITY - 2)
+ if (sc.priority < DEF_PRIORITY - 2 || !pgdat_reclaimable(pgdat))
sc.may_writepage = 1;
+ /* Call soft limit reclaim before calling shrink_node. */
+ sc.nr_scanned = 0;
+ nr_soft_scanned = 0;
+ nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone, sc.order,
+ sc.gfp_mask, &nr_soft_scanned);
+ sc.nr_reclaimed += nr_soft_reclaimed;
+
/*
- * Continue scanning in the highmem->dma direction stopping at
- * the last zone which needs scanning. This may reclaim lowmem
- * pages that are not necessary for zone balancing but it
- * preserves LRU ordering. It is assumed that the bulk of
- * allocation requests can use arbitrary zones with the
- * possible exception of big highmem:lowmem configurations.
+ * There should be no need to raise the scanning priority if
+ * enough pages are already being scanned that the high
+ * watermark would be met at 100% efficiency.
*/
- for (i = end_zone; i >= end_zone; i--) {
- struct zone *zone = pgdat->node_zones + i;
-
- if (!populated_zone(zone))
- continue;
-
- if (sc.priority != DEF_PRIORITY &&
- !pgdat_reclaimable(zone->zone_pgdat))
- continue;
-
- sc.nr_scanned = 0;
- sc.reclaim_idx = i;
-
- nr_soft_scanned = 0;
- /*
- * Call soft limit reclaim before calling shrink_zone.
- */
- nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone,
- order, sc.gfp_mask,
- &nr_soft_scanned);
- sc.nr_reclaimed += nr_soft_reclaimed;
-
- /*
- * There should be no need to raise the scanning
- * priority if enough pages are already being scanned
- * that that high watermark would be met at 100%
- * efficiency.
- */
- if (kswapd_shrink_zone(zone, end_zone, &sc))
- raise_priority = false;
- }
+ if (kswapd_shrink_node(pgdat, classzone_idx, &sc))
+ raise_priority = false;
/*
* If the low watermark is met there is no need for processes
@@ -3315,20 +3207,37 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
break;
/*
+ * Stop reclaiming if any eligible zone is balanced and clear
+ * the node dirty and congested flags.
+ */
+ for (i = 0; i <= classzone_idx; i++) {
+ zone = pgdat->node_zones + i;
+ if (!populated_zone(zone))
+ continue;
+
+ if (zone_balanced(zone, sc.order, 0, classzone_idx)) {
+ clear_bit(PGDAT_CONGESTED, &pgdat->flags);
+ clear_bit(PGDAT_DIRTY, &pgdat->flags);
+ goto out;
+ }
+ }
+
+ /*
* Raise priority if scanning rate is too low or there was no
* progress in reclaiming pages
*/
if (raise_priority || !sc.nr_reclaimed)
sc.priority--;
- } while (sc.priority >= 1 &&
- !pgdat_balanced(pgdat, order, classzone_idx));
+ } while (sc.priority >= 1);
out:
/*
- * Return the highest zone idx we were reclaiming at so
- * prepare_kswapd_sleep() makes the same decisions as here.
+ * Return the order kswapd stopped reclaiming at as
+ * prepare_kswapd_sleep() takes it into account. If another caller
+ * entered the allocator slow path while kswapd was awake, order will
+ * remain at the higher level.
*/
- return end_zone;
+ return sc.order;
}
static void kswapd_try_to_sleep(pg_data_t *pgdat, int order,
@@ -3485,8 +3394,9 @@ static int kswapd(void *p)
*/
if (!ret) {
trace_mm_vmscan_kswapd_wake(pgdat->node_id, order);
- balanced_classzone_idx = balance_pgdat(pgdat, order,
- classzone_idx);
+
+ /* return value ignored until next patch */
+ balance_pgdat(pgdat, order, classzone_idx);
}
}
@@ -3516,7 +3426,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
}
if (!waitqueue_active(&pgdat->kswapd_wait))
return;
- if (zone_balanced(zone, order, true, 0, 0))
+ if (zone_balanced(zone, order, 0, 0))
return;
trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
--
2.6.4
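
For readers following the new exit condition in balance_pgdat(), here is a minimal userspace sketch of the "stop once any eligible zone is balanced" check. It is not kernel code: struct model_zone, model_zone_balanced(), model_node_balanced() and example_zones are hypothetical names made up for illustration, and the balance test is reduced to free_pages > high_wmark, whereas the real zone_balanced() also takes the order and classzone_idx into account.

```c
#include <stdbool.h>

#define MODEL_NR_ZONES 4

/* Hypothetical, simplified stand-in for struct zone. */
struct model_zone {
	unsigned long managed_pages;	/* 0 models an unpopulated zone */
	unsigned long free_pages;
	unsigned long high_wmark;
};

/*
 * Assumption: a zone counts as balanced when its free pages exceed the
 * high watermark. Order and lowmem reserves are deliberately ignored.
 */
static bool model_zone_balanced(const struct model_zone *zone)
{
	return zone->free_pages > zone->high_wmark;
}

/*
 * Mirrors the shape of the loop added by the patch: walk the zones from
 * index 0 up to and including classzone_idx, skip unpopulated zones, and
 * report the node as balanced as soon as one eligible zone is balanced.
 */
static bool model_node_balanced(const struct model_zone *zones,
				int classzone_idx)
{
	for (int i = 0; i <= classzone_idx; i++) {
		if (!zones[i].managed_pages)
			continue;
		if (model_zone_balanced(&zones[i]))
			return true;
	}
	return false;
}

/* Example node: zone 0 is short of pages, zone 1 is empty, zone 2 is fine. */
static const struct model_zone example_zones[MODEL_NR_ZONES] = {
	{ .managed_pages = 4096,   .free_pages = 10,   .high_wmark = 64 },
	{ .managed_pages = 0,      .free_pages = 0,    .high_wmark = 0 },
	{ .managed_pages = 262144, .free_pages = 2000, .high_wmark = 1024 },
	{ .managed_pages = 524288, .free_pages = 100,  .high_wmark = 2048 },
};
```

With this example data, a request limited to classzone_idx 0 or 1 finds no balanced zone and would keep reclaiming, while a request with classzone_idx 2 or 3 stops because zone 2 is above its high watermark, even though zones 0 and 3 are not.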
end of thread, other threads:[~2016-06-23 11:31 UTC | newest]
Thread overview: 9+ messages
[not found] <02fe01d1c48b$c44e9e80$4cebdb80$@alibaba-inc.com>
2016-06-12 9:33 ` [PATCH 06/27] mm, vmscan: Make kswapd reclaim in terms of nodes Hillf Danton
2016-06-14 14:52 ` Mel Gorman
[not found] <071801d1cc5c$245087d0$6cf19770$@alibaba-inc.com>
2016-06-22 8:42 ` Hillf Danton
2016-06-23 11:31 ` Mel Gorman
2016-06-21 14:15 [PATCH 00/27] Move LRU page reclaim from zones to nodes v7 Mel Gorman
2016-06-21 14:15 ` [PATCH 06/27] mm, vmscan: Make kswapd reclaim in terms of nodes Mel Gorman
-- strict thread matches above, loose matches on Subject: below --
2016-06-09 18:04 [PATCH 00/27] Move LRU page reclaim from zones to nodes v6 Mel Gorman
2016-06-09 18:04 ` [PATCH 06/27] mm, vmscan: Make kswapd reclaim in terms of nodes Mel Gorman
2016-06-15 14:23 ` Vlastimil Babka
2016-04-15 9:13 [PATCH 00/27] Move LRU page reclaim from zones to nodes v5 Mel Gorman
2016-04-15 9:13 ` [PATCH 06/27] mm, vmscan: Make kswapd reclaim in terms of nodes Mel Gorman
2016-04-28 8:36 ` Vlastimil Babka