From: Yang Shi <yang.shi@linux.alibaba.com>
To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com,
hannes@cmpxchg.org, akpm@linux-foundation.org,
dave.hansen@intel.com, keith.busch@intel.com,
dan.j.williams@intel.com, fengguang.wu@intel.com,
fan.du@intel.com, ying.huang@intel.com, ziy@nvidia.com
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: [v3 PATCH 7/9] mm: vmscan: check if the demote target node is contended or not
Date: Fri, 14 Jun 2019 07:29:35 +0800
Message-ID: <1560468577-101178-8-git-send-email-yang.shi@linux.alibaba.com>
In-Reply-To: <1560468577-101178-1-git-send-email-yang.shi@linux.alibaba.com>

When demoting to the migration target node, the target node may itself
be under memory pressure, and that pressure may cause migrate_pages()
to fail.

If the failure is caused by memory pressure (i.e. migrate_pages()
returns -ENOMEM), tag the target node with PGDAT_CONTENDED. The tag is
cleared once the target node is balanced again.

Before demoting, check whether the target node is PGDAT_CONTENDED; if
it is, skip demotion.

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
include/linux/mmzone.h | 3 +++
mm/vmscan.c | 37 +++++++++++++++++++++++++++++++++++++
2 files changed, 40 insertions(+)
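
The flag lifecycle described in the changelog can be sketched in
userspace as follows. This is only an illustrative model, not kernel
code: struct node_model, note_demote_result(), demote_ok() and
node_balanced() are hypothetical names, and the plain bit operations
stand in for the set_bit()/test_bit()/clear_bit() calls used in the
patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pgdat flag bit; not the kernel enum. */
enum { MODEL_PGDAT_CONTENDED = 0 };

/* Hypothetical model of the per-node state that carries the flag. */
struct node_model {
	unsigned long flags;
};

/* After a demotion attempt: only -ENOMEM marks the target contended. */
static void note_demote_result(struct node_model *target, int err)
{
	if (err == -ENOMEM)
		target->flags |= 1UL << MODEL_PGDAT_CONTENDED;
}

/* Before a demotion attempt: skip demotion while the flag is set. */
static bool demote_ok(const struct node_model *target)
{
	return !(target->flags & (1UL << MODEL_PGDAT_CONTENDED));
}

/* Once the target node is balanced again: drop the flag. */
static void node_balanced(struct node_model *target)
{
	target->flags &= ~(1UL << MODEL_PGDAT_CONTENDED);
}
```

In this model a failure other than -ENOMEM leaves the flag untouched, so
demotion is retried on the next reclaim pass; only a memory-pressure
failure suppresses it until the node is reported balanced.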
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 70394ca..d4e05c5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -573,6 +573,9 @@ enum pgdat_flags {
* many pages under writeback
*/
PGDAT_RECLAIM_LOCKED, /* prevents concurrent reclaim */
+ PGDAT_CONTENDED, /* the node does not have enough free
+ * memory available
+ */
};
enum zone_flags {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fb931ded..9ec55d7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1126,6 +1126,21 @@ static inline struct page *alloc_demote_page(struct page *page,
}
#endif
+static inline bool is_migration_target_contended(int nid)
+{
+ int node;
+ nodemask_t used_mask;
+
+
+ nodes_clear(used_mask);
+ node = find_next_best_node(nid, &used_mask, true);
+
+ if (test_bit(PGDAT_CONTENDED, &NODE_DATA(node)->flags))
+ return true;
+
+ return false;
+}
+
static inline bool is_demote_ok(int nid, struct scan_control *sc)
{
/* Just do demotion with migrate mode of node reclaim */
@@ -1144,6 +1159,10 @@ static inline bool is_demote_ok(int nid, struct scan_control *sc)
if (!has_migration_target_node_online())
return false;
+ /* Check if the demote target node is contended or not */
+ if (is_migration_target_contended(nid))
+ return false;
+
return true;
}
@@ -1564,6 +1583,10 @@ static unsigned long shrink_page_list(struct list_head *page_list,
nr_reclaimed += nr_succeeded;
if (err) {
+ if (err == -ENOMEM)
+ set_bit(PGDAT_CONTENDED,
+ &NODE_DATA(target_nid)->flags);
+
putback_movable_pages(&demote_pages);
list_splice(&ret_pages, &demote_pages);
@@ -2597,6 +2620,19 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
* scan target and the percentage scanning already complete
*/
lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
+
+ /*
+ * shrink_page_list() may find that the demote target node is
+ * contended; if so, it doesn't make sense to scan the anonymous
+ * LRU again.
+ *
+ * Also check whether swap is available, since demotion may
+ * happen on a swapless system.
+ */
+ if (!is_demote_ok(pgdat->node_id, sc) &&
+ (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0))
+ lru = LRU_FILE;
+
nr_scanned = targets[lru] - nr[lru];
nr[lru] = targets[lru] * (100 - percentage) / 100;
nr[lru] -= min(nr[lru], nr_scanned);
@@ -3447,6 +3483,7 @@ static void clear_pgdat_congested(pg_data_t *pgdat)
clear_bit(PGDAT_CONGESTED, &pgdat->flags);
clear_bit(PGDAT_DIRTY, &pgdat->flags);
clear_bit(PGDAT_WRITEBACK, &pgdat->flags);
+ clear_bit(PGDAT_CONTENDED, &pgdat->flags);
}
/*
--
1.8.3.1
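
The LRU choice made in the shrink_node_memcg() hunk above can be read as
a small pure function. The sketch below is a userspace model under
stated assumptions: enum lru_model and pick_remaining_lru() are
hypothetical names, and the three inputs stand in for is_demote_ok(),
sc->may_swap and mem_cgroup_get_nr_swap_pages() from the patch.

```c
#include <stdbool.h>

/* Hypothetical two-way model of the anon and file LRU sides. */
enum lru_model { LRU_MODEL_ANON, LRU_MODEL_FILE };

/*
 * Choose which LRU receives the remaining scan budget once one side
 * has been exhausted.
 *
 * By default the scan switches to the opposite side. If anonymous
 * pages can be neither demoted nor swapped out, scanning the anon
 * side cannot free anything, so the file side is forced instead.
 */
static enum lru_model pick_remaining_lru(enum lru_model exhausted,
					 bool demote_ok, bool may_swap,
					 long nr_swap_pages)
{
	enum lru_model lru;

	lru = (exhausted == LRU_MODEL_FILE) ? LRU_MODEL_ANON : LRU_MODEL_FILE;

	if (!demote_ok && (!may_swap || nr_swap_pages <= 0))
		lru = LRU_MODEL_FILE;

	return lru;
}
```

The override only matters when the file side was exhausted first; when
the anon side was exhausted, the default switch already selects the
file side.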