From: "Huang, Ying" <ying.huang@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Huang Ying <ying.huang@intel.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Michal Hocko <mhocko@suse.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Shaohua Li <shli@kernel.org>, Hugh Dickins <hughd@google.com>,
Minchan Kim <minchan@kernel.org>, Rik van Riel <riel@redhat.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Zi Yan <zi.yan@cs.rutgers.edu>,
Daniel Jordan <daniel.m.jordan@oracle.com>
Subject: [PATCH -mm -v4 12/21] mm, THP, swap: Support PMD swap mapping in swapoff
Date: Fri, 22 Jun 2018 11:51:42 +0800
Message-ID: <20180622035151.6676-13-ying.huang@intel.com>
In-Reply-To: <20180622035151.6676-1-ying.huang@intel.com>
From: Huang Ying <ying.huang@intel.com>
During swapoff, for a huge swap cluster, we need to allocate a THP,
read the cluster's contents into it, and unuse the PMD and PTE swap
mappings to the cluster.  If the THP allocation fails, the huge swap
cluster is split.

During unuse, if the swap cluster mapped by a PMD swap mapping is
found to have been split already, we split the PMD swap mapping and
unuse the PTEs.
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
---
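Not part of the patch: a minimal user-space sketch of the per-PMD
decision that the unuse_pmd_range() hunk adds, for readers who want the
control flow without the page-table details.  Every toy_* name, the
8-entry table, and the use of 0 as a "mapped" marker are assumptions
made for illustration.  Locking, memcg charging, rmap and swap count
accounting are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
#define TOY_PTES_PER_PMD 8      /* stands in for HPAGE_PMD_NR */
#define TOY_PTE_MAPPED   0UL    /* marker: PTE maps a page (swap entries are nonzero) */

enum toy_pmd_kind { TOY_PMD_NONE, TOY_PMD_SWAP, TOY_PMD_TABLE, TOY_PMD_HUGE };

struct toy_pmd {
	enum toy_pmd_kind kind;
	unsigned long swap_entry;             /* used when kind == TOY_PMD_SWAP */
	unsigned long pte[TOY_PTES_PER_PMD];  /* used when kind == TOY_PMD_TABLE */
};

/* Replace one PMD swap mapping with a table of PTE swap mappings. */
static void toy_split_swap_pmd(struct toy_pmd *pmd)
{
	assert(pmd->kind == TOY_PMD_SWAP);
	for (int i = 0; i < TOY_PTES_PER_PMD; i++)
		pmd->pte[i] = pmd->swap_entry + i;
	pmd->kind = TOY_PMD_TABLE;
}

/* Map every PTE that still refers to @entry; return 1 if any was found. */
static int toy_unuse_pte_range(struct toy_pmd *pmd, unsigned long entry)
{
	int found = 0;

	for (int i = 0; i < TOY_PTES_PER_PMD; i++) {
		if (pmd->pte[i] == entry) {
			pmd->pte[i] = TOY_PTE_MAPPED;
			found = 1;
		}
	}
	return found;
}

/* Per-PMD dispatch: returns 1 if a mapping was unused, 0 otherwise. */
static int toy_unuse_one_pmd(struct toy_pmd *pmd, unsigned long entry,
			     bool page_is_huge)
{
	if (pmd->kind == TOY_PMD_SWAP) {
		if (pmd->swap_entry != entry)
			return 0;               /* maps some other cluster */
		if (!page_is_huge) {
			/* Cluster was split already: fall back to PTEs. */
			toy_split_swap_pmd(pmd);
			return toy_unuse_pte_range(pmd, entry);
		}
		pmd->kind = TOY_PMD_HUGE;       /* map the THP in one piece */
		return 1;
	}
	if (pmd->kind != TOY_PMD_TABLE)
		return 0;                       /* nothing to scan */
	return toy_unuse_pte_range(pmd, entry);
}
```

Calling toy_unuse_one_pmd() on a swap PMD with page_is_huge set turns it
into a huge mapping.  With page_is_huge clear, the PMD becomes a table
and only the PTE matching the entry is mapped, which mirrors how the
remaining PTEs are picked up by later try_to_unuse() iterations.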
include/asm-generic/pgtable.h | 15 ++------
include/linux/huge_mm.h | 8 ++++
mm/huge_memory.c | 4 +-
mm/swapfile.c | 86 ++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 98 insertions(+), 15 deletions(-)
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index bb8354981a36..caa381962cd2 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -931,22 +931,13 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
barrier();
#endif
/*
- * !pmd_present() checks for pmd migration entries
- *
- * The complete check uses is_pmd_migration_entry() in linux/swapops.h
- * But using that requires moving current function and pmd_trans_unstable()
- * to linux/swapops.h to resovle dependency, which is too much code move.
- *
- * !pmd_present() is equivalent to is_pmd_migration_entry() currently,
- * because !pmd_present() pages can only be under migration not swapped
- * out.
- *
- * pmd_none() is preseved for future condition checks on pmd migration
+ * pmd_none() is preserved for future condition checks on pmd swap
* entries and not confusing with this function name, although it is
* redundant with !pmd_present().
*/
if (pmd_none(pmdval) || pmd_trans_huge(pmdval) ||
- (IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) && !pmd_present(pmdval)))
+ ((IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) ||
+ IS_ENABLED(CONFIG_THP_SWAP)) && !pmd_present(pmdval)))
return 1;
if (unlikely(pmd_bad(pmdval))) {
pmd_clear_bad(pmd);
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7931fa888f11..bc92c2944756 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -406,6 +406,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
#ifdef CONFIG_THP_SWAP
+extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+ unsigned long address, pmd_t orig_pmd);
extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
static inline bool transparent_hugepage_swapin_enabled(
@@ -431,6 +433,12 @@ static inline bool transparent_hugepage_swapin_enabled(
return false;
}
#else /* CONFIG_THP_SWAP */
+static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+ unsigned long address, pmd_t orig_pmd)
+{
+ return 0;
+}
+
static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
{
return 0;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index da42d1cdc26a..73fc77633642 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1663,8 +1663,8 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma,
pmd_populate(mm, pmd, pgtable);
}
-static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
- unsigned long address, pmd_t orig_pmd)
+int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+ unsigned long address, pmd_t orig_pmd)
{
struct mm_struct *mm = vma->vm_mm;
spinlock_t *ptl;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index e1e43654407c..34e64f3570c3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1933,6 +1933,11 @@ static inline int pte_same_as_swp(pte_t pte, pte_t swp_pte)
return pte_same(pte_swp_clear_soft_dirty(pte), swp_pte);
}
+static inline int pmd_same_as_swp(pmd_t pmd, pmd_t swp_pmd)
+{
+ return pmd_same(pmd_swp_clear_soft_dirty(pmd), swp_pmd);
+}
+
/*
* No need to decide whether this PTE shares the swap entry with others,
* just let do_wp_page work it out if a write is requested later - to
@@ -1994,6 +1999,57 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
return ret;
}
+#ifdef CONFIG_THP_SWAP
+static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+ unsigned long addr, swp_entry_t entry, struct page *page)
+{
+ struct mem_cgroup *memcg;
+ struct swap_info_struct *si;
+ spinlock_t *ptl;
+ int ret = 1;
+
+ if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL,
+ &memcg, true)) {
+ ret = -ENOMEM;
+ goto out_nolock;
+ }
+
+ ptl = pmd_lock(vma->vm_mm, pmd);
+ if (unlikely(!pmd_same_as_swp(*pmd, swp_entry_to_pmd(entry)))) {
+ mem_cgroup_cancel_charge(page, memcg, true);
+ ret = 0;
+ goto out;
+ }
+
+ add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+ add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+ get_page(page);
+ set_pmd_at(vma->vm_mm, addr, pmd,
+ pmd_mkold(mk_huge_pmd(page, vma->vm_page_prot)));
+ page_add_anon_rmap(page, vma, addr, true);
+ mem_cgroup_commit_charge(page, memcg, true, true);
+ si = _swap_info_get(entry);
+ if (si)
+ swap_free_cluster(si, entry);
+ /*
+ * Move the page to the active list so it is not
+ * immediately swapped out again after swapon.
+ */
+ activate_page(page);
+out:
+ spin_unlock(ptl);
+out_nolock:
+ return ret;
+}
+#else
+static inline int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+ unsigned long addr, swp_entry_t entry,
+ struct page *page)
+{
+ return 0;
+}
+#endif
+
static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end,
swp_entry_t entry, struct page *page)
@@ -2034,7 +2090,7 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud,
unsigned long addr, unsigned long end,
swp_entry_t entry, struct page *page)
{
- pmd_t *pmd;
+ pmd_t swp_pmd = swp_entry_to_pmd(entry), *pmd, orig_pmd;
unsigned long next;
int ret;
@@ -2042,6 +2098,24 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud,
do {
cond_resched();
next = pmd_addr_end(addr, end);
+ orig_pmd = *pmd;
+ if (thp_swap_supported() && is_swap_pmd(orig_pmd)) {
+ if (likely(!pmd_same_as_swp(orig_pmd, swp_pmd)))
+ continue;
+ /* Huge cluster has been split already */
+ if (!PageTransCompound(page)) {
+ ret = split_huge_swap_pmd(vma, pmd,
+ addr, orig_pmd);
+ if (ret)
+ return ret;
+ ret = unuse_pte_range(vma, pmd, addr,
+ next, entry, page);
+ } else
+ ret = unuse_pmd(vma, pmd, addr, entry, page);
+ if (ret)
+ return ret;
+ continue;
+ }
if (pmd_none_or_trans_huge_or_clear_bad(pmd))
continue;
ret = unuse_pte_range(vma, pmd, addr, next, entry, page);
@@ -2206,6 +2280,7 @@ int try_to_unuse(unsigned int type, bool frontswap,
* to prevent compiler doing
* something odd.
*/
+ struct swap_cluster_info *ci = NULL;
unsigned char swcount;
struct page *page;
swp_entry_t entry;
@@ -2235,6 +2310,7 @@ int try_to_unuse(unsigned int type, bool frontswap,
* there are races when an instance of an entry might be missed.
*/
while ((i = find_next_to_unuse(si, i, frontswap)) != 0) {
+retry:
if (signal_pending(current)) {
retval = -EINTR;
break;
@@ -2246,6 +2322,8 @@ int try_to_unuse(unsigned int type, bool frontswap,
* page and read the swap into it.
*/
swap_map = &si->swap_map[i];
+ if (si->cluster_info)
+ ci = si->cluster_info + i / SWAPFILE_CLUSTER;
entry = swp_entry(type, i);
page = read_swap_cache_async(entry,
GFP_HIGHUSER_MOVABLE, NULL, 0, false);
@@ -2266,6 +2344,12 @@ int try_to_unuse(unsigned int type, bool frontswap,
*/
if (!swcount || swcount == SWAP_MAP_BAD)
continue;
+ /* Split huge cluster if failed to allocate huge page */
+ if (thp_swap_supported() && cluster_is_huge(ci)) {
+ retval = split_swap_cluster(entry, false);
+ if (!retval || retval == -EEXIST)
+ goto retry;
+ }
retval = -ENOMEM;
break;
}
--
2.16.4
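
Not part of the patch: a small user-space sketch of the fallback that
the try_to_unuse() hunk adds, namely "if the huge page cannot be
allocated, split the huge swap cluster and retry with a normal page".
The toy_* names and the boolean "allocation succeeds" knobs are
hypothetical.  The sketch also assumes the split itself always
succeeds, whereas the patch bails out with -ENOMEM when
split_swap_cluster() returns an error other than -EEXIST.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_cluster {
	bool huge;      /* true while the swap cluster backs a whole THP */
};

/* Pretend to allocate a page and read the swap slot into it. */
static bool toy_read_swap_page(const struct toy_cluster *ci,
			       bool huge_alloc_ok, bool small_alloc_ok)
{
	return ci->huge ? huge_alloc_ok : small_alloc_ok;
}

/*
 * Unuse one swap slot.  Returns 0 on success or -ENOMEM, and stores the
 * number of read attempts in *attempts.
 */
static int toy_unuse_slot(struct toy_cluster *ci, bool huge_alloc_ok,
			  bool small_alloc_ok, int *attempts)
{
	*attempts = 0;
	for (;;) {
		(*attempts)++;
		if (toy_read_swap_page(ci, huge_alloc_ok, small_alloc_ok))
			return 0;
		if (ci->huge) {
			/* Stands in for split_swap_cluster() + goto retry. */
			ci->huge = false;
			continue;
		}
		return -ENOMEM;
	}
}
```

A huge cluster whose THP allocation fails is therefore tried twice: once
as a THP and once, after the split, as a normal page.  Only a failure of
the second attempt is reported as -ENOMEM.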