From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Huang Ying <ying.huang@intel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Michal Hocko <mhocko@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Shaohua Li <shli@kernel.org>, Hugh Dickins <hughd@google.com>,
	Minchan Kim <minchan@kernel.org>, Rik van Riel <riel@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Zi Yan <zi.yan@cs.rutgers.edu>,
	Daniel Jordan <daniel.m.jordan@oracle.com>
Subject: [PATCH -V8 12/21] swap: Support PMD swap mapping in swapoff
Date: Fri,  7 Dec 2018 13:41:12 +0800
Message-ID: <20181207054122.27822-13-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>

During swapoff, for each huge swap cluster we need to allocate a THP,
read the cluster's contents into the THP, and unuse the PMD and PTE
swap mappings to it.  If allocating the THP fails, the huge swap
cluster is split.

During unuse, if the swap cluster mapped by a PMD swap mapping is
found to have been split already, we split the PMD swap mapping and
unuse the PTEs instead.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
---
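For reviewers, the two decisions described in the changelog can be
sketched as a small user-space model.  This is only an illustration
under stated assumptions, not the patch itself: every name below
(choose_unuse_action(), read_entry_attempts(), struct model_cluster,
the enum values) is hypothetical, swap entries are modelled as plain
integers, and splitting a cluster is assumed to always succeed.

```c
/*
 * User-space model only.  Every name here is a hypothetical stand-in
 * used for illustration; none of this is kernel code.
 */
#include <assert.h>
#include <stdbool.h>

enum unuse_action {
	UNUSE_WALK_PTES,	/* not a PMD swap mapping: usual PTE walk */
	UNUSE_SKIP,		/* PMD swap mapping of some other entry */
	UNUSE_SPLIT_TO_PTES,	/* cluster already split: split PMD, unuse PTEs */
	UNUSE_WHOLE_PMD,	/* THP in hand: replace the PMD swap mapping */
};

/* Models the per-PMD choice made while walking a PMD range. */
static enum unuse_action choose_unuse_action(bool pmd_is_swap,
					     unsigned long pmd_entry,
					     unsigned long entry,
					     bool page_is_thp)
{
	if (!pmd_is_swap)
		return UNUSE_WALK_PTES;
	if (pmd_entry != entry)
		return UNUSE_SKIP;
	return page_is_thp ? UNUSE_WHOLE_PMD : UNUSE_SPLIT_TO_PTES;
}

struct model_cluster {
	bool huge;	/* true while the swap cluster is still huge */
};

/*
 * Models reading one swap entry during swapoff.  Returns the number of
 * read attempts on success, or -1 when the model gives up (the role
 * -ENOMEM plays in the changelog's description).
 */
static int read_entry_attempts(struct model_cluster *ci, bool thp_alloc_ok,
			       bool page_alloc_ok)
{
	int attempts = 0;

	assert(ci);
	for (;;) {
		attempts++;
		/* A huge cluster needs a THP, a split one a normal page. */
		if (ci->huge ? thp_alloc_ok : page_alloc_ok)
			return attempts;
		if (!ci->huge)
			return -1;
		/* Assumption: the split succeeds, so retry with a normal page. */
		ci->huge = false;
	}
}
```

In this model a failed THP allocation costs one extra read attempt and
leaves the cluster split, after which a matching PMD swap mapping
takes the UNUSE_SPLIT_TO_PTES path.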
 include/asm-generic/pgtable.h | 14 +-----
 include/linux/huge_mm.h       |  8 ++++
 mm/huge_memory.c              |  4 +-
 mm/swapfile.c                 | 86 ++++++++++++++++++++++++++++++++++-
 4 files changed, 97 insertions(+), 15 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 20aab7bfd487..5216124ba13c 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -931,22 +931,12 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
 	barrier();
 #endif
 	/*
-	 * !pmd_present() checks for pmd migration entries
-	 *
-	 * The complete check uses is_pmd_migration_entry() in linux/swapops.h
-	 * But using that requires moving current function and pmd_trans_unstable()
-	 * to linux/swapops.h to resovle dependency, which is too much code move.
-	 *
-	 * !pmd_present() is equivalent to is_pmd_migration_entry() currently,
-	 * because !pmd_present() pages can only be under migration not swapped
-	 * out.
-	 *
-	 * pmd_none() is preseved for future condition checks on pmd migration
+	 * pmd_none() is preserved for future condition checks on pmd swap
 	 * entries and not confusing with this function name, although it is
 	 * redundant with !pmd_present().
 	 */
 	if (pmd_none(pmdval) || pmd_trans_huge(pmdval) ||
-		(IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) && !pmd_present(pmdval)))
+	    (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) && !pmd_present(pmdval)))
 		return 1;
 	if (unlikely(pmd_bad(pmdval))) {
 		pmd_clear_bad(pmd);
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index ea4999a4b6cd..6236f8b1d04b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -376,6 +376,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #ifdef CONFIG_THP_SWAP
+extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			       unsigned long address, pmd_t orig_pmd);
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
 
 static inline bool transparent_hugepage_swapin_enabled(
@@ -401,6 +403,12 @@ static inline bool transparent_hugepage_swapin_enabled(
 	return false;
 }
 #else /* CONFIG_THP_SWAP */
+static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+				      unsigned long address, pmd_t orig_pmd)
+{
+	return 0;
+}
+
 static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 {
 	return 0;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0ae7f824dbeb..f3c0a9e8fb9a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1721,8 +1721,8 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma,
 }
 
 #ifdef CONFIG_THP_SWAP
-static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
-			       unsigned long address, pmd_t orig_pmd)
+int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			unsigned long address, pmd_t orig_pmd)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	spinlock_t *ptl;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index c22c11b4a879..b85ec810d941 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1931,6 +1931,11 @@ static inline int pte_same_as_swp(pte_t pte, pte_t swp_pte)
 	return pte_same(pte_swp_clear_soft_dirty(pte), swp_pte);
 }
 
+static inline int pmd_same_as_swp(pmd_t pmd, pmd_t swp_pmd)
+{
+	return pmd_same(pmd_swp_clear_soft_dirty(pmd), swp_pmd);
+}
+
 /*
  * No need to decide whether this PTE shares the swap entry with others,
  * just let do_wp_page work it out if a write is requested later - to
@@ -1992,6 +1997,53 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 	return ret;
 }
 
+#ifdef CONFIG_THP_SWAP
+static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+		     unsigned long addr, swp_entry_t entry, struct page *page)
+{
+	struct mem_cgroup *memcg;
+	spinlock_t *ptl;
+	int ret = 1;
+
+	if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL,
+				  &memcg, true)) {
+		ret = -ENOMEM;
+		goto out_nolock;
+	}
+
+	ptl = pmd_lock(vma->vm_mm, pmd);
+	if (unlikely(!pmd_same_as_swp(*pmd, swp_entry_to_pmd(entry)))) {
+		mem_cgroup_cancel_charge(page, memcg, true);
+		ret = 0;
+		goto out;
+	}
+
+	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+	get_page(page);
+	set_pmd_at(vma->vm_mm, addr, pmd,
+		   pmd_mkold(mk_huge_pmd(page, vma->vm_page_prot)));
+	page_add_anon_rmap(page, vma, addr, true);
+	mem_cgroup_commit_charge(page, memcg, true, true);
+	swap_free(entry, HPAGE_PMD_NR);
+	/*
+	 * Move the page to the active list so it is not
+	 * immediately swapped out again after swapon.
+	 */
+	activate_page(page);
+out:
+	spin_unlock(ptl);
+out_nolock:
+	return ret;
+}
+#else
+static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+		     unsigned long addr, swp_entry_t entry, struct page *page)
+{
+	return 0;
+}
+#endif
+
 static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
 				swp_entry_t entry, struct page *page)
@@ -2032,7 +2084,7 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud,
 				unsigned long addr, unsigned long end,
 				swp_entry_t entry, struct page *page)
 {
-	pmd_t *pmd;
+	pmd_t swp_pmd = swp_entry_to_pmd(entry), *pmd, orig_pmd;
 	unsigned long next;
 	int ret;
 
@@ -2040,6 +2092,27 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud,
 	do {
 		cond_resched();
 		next = pmd_addr_end(addr, end);
+		orig_pmd = *pmd;
+		if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(orig_pmd)) {
+			if (likely(!pmd_same_as_swp(orig_pmd, swp_pmd)))
+				continue;
+			/*
+			 * Huge cluster has been split already, split the
+			 * PMD swap mapping and fall back to unusing PTEs
+			 */
+			if (!PageTransCompound(page)) {
+				ret = split_huge_swap_pmd(vma, pmd,
+							  addr, orig_pmd);
+				if (ret)
+					return ret;
+				ret = unuse_pte_range(vma, pmd, addr,
+						      next, entry, page);
+			} else
+				ret = unuse_pmd(vma, pmd, addr, entry, page);
+			if (ret)
+				return ret;
+			continue;
+		}
 		if (pmd_none_or_trans_huge_or_clear_bad(pmd))
 			continue;
 		ret = unuse_pte_range(vma, pmd, addr, next, entry, page);
@@ -2233,6 +2306,7 @@ int try_to_unuse(unsigned int type, bool frontswap,
 	 * there are races when an instance of an entry might be missed.
 	 */
 	while ((i = find_next_to_unuse(si, i, frontswap)) != 0) {
+retry:
 		if (signal_pending(current)) {
 			retval = -EINTR;
 			break;
@@ -2248,6 +2322,8 @@ int try_to_unuse(unsigned int type, bool frontswap,
 		page = read_swap_cache_async(entry,
 					GFP_HIGHUSER_MOVABLE, NULL, 0, false);
 		if (!page) {
+			struct swap_cluster_info *ci = NULL;
+
 			/*
 			 * Either swap_duplicate() failed because entry
 			 * has been freed independently, and will not be
@@ -2264,6 +2340,14 @@ int try_to_unuse(unsigned int type, bool frontswap,
 			 */
 			if (!swcount || swcount == SWAP_MAP_BAD)
 				continue;
+			if (si->cluster_info)
+				ci = si->cluster_info + i / SWAPFILE_CLUSTER;
+			/* Split huge cluster if failed to allocate huge page */
+			if (cluster_is_huge(ci)) {
+				retval = split_swap_cluster(entry, 0);
+				if (!retval || retval == -EEXIST)
+					goto retry;
+			}
 			retval = -ENOMEM;
 			break;
 		}
-- 
2.18.1

