From: Hugh Dickins <hughd@google.com>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Ning Qu <quning@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andres Lagar-Cavilla <andreslc@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 24/24] kvm: teach kvm to map page teams as huge pages.
Date: Fri, 20 Feb 2015 20:31:38 -0800 (PST)
Message-ID: <alpine.LSU.2.11.1502202029340.14414@eggly.anvils>
In-Reply-To: <alpine.LSU.2.11.1502201941340.14414@eggly.anvils>

From: Andres Lagar-Cavilla <andreslc@google.com>

Include a small treatise on the locking rules around page teams.

Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 arch/x86/kvm/mmu.c         |  155 +++++++++++++++++++++++++++++------
 arch/x86/kvm/paging_tmpl.h |    3 
 2 files changed, 132 insertions(+), 26 deletions(-)
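
For illustration, a standalone userspace sketch (not part of the patch) of
the offset check that is_huge_tmpfs() in the diff below performs: the
faulting address's page offset within the pmd range must match the page's
offset within its team, otherwise the team cannot back that address with a
single huge mapping. The helper name offsets_match() and the PAGE_SHIFT /
HPAGE_PMD_* values (x86 with 4K pages) are assumptions made for the sketch.

#include <stdbool.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define HPAGE_PMD_ORDER	9
#define HPAGE_PMD_NR	(1UL << HPAGE_PMD_ORDER)		/* 512 pages */
#define HPAGE_PMD_SIZE	(1UL << (PAGE_SHIFT + HPAGE_PMD_ORDER))	/* 2MB */

/* the same test as in is_huge_tmpfs(), on plain integers */
static bool offsets_match(unsigned long address, unsigned long page_index)
{
	unsigned long addr_off = (address & (HPAGE_PMD_SIZE - 1)) >> PAGE_SHIFT;
	unsigned long team_off = page_index & (HPAGE_PMD_NR - 1);

	return addr_off == team_off;
}

int main(void)
{
	/* page->index 515 is page 3 of its team; so is this address */
	printf("%d\n", offsets_match(0x7f0000203000UL, 515));	/* 1: aligned */
	/* same page mapped one page further along: no pmd mapping possible */
	printf("%d\n", offsets_match(0x7f0000204000UL, 515));	/* 0: misaligned */
	return 0;
}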

--- thpfs.orig/arch/x86/kvm/mmu.c	2015-02-20 19:35:20.095835839 -0800
+++ thpfs/arch/x86/kvm/mmu.c	2015-02-20 19:35:24.775825138 -0800
@@ -32,6 +32,7 @@
 #include <linux/module.h>
 #include <linux/swap.h>
 #include <linux/hugetlb.h>
+#include <linux/pageteam.h>
 #include <linux/compiler.h>
 #include <linux/srcu.h>
 #include <linux/slab.h>
@@ -2723,7 +2724,106 @@ static int kvm_handle_bad_page(struct kv
 	return -EFAULT;
 }
 
+/*
+ * We are holding kvm->mmu_lock, serializing against mmu notifiers.
+ * We have a ref on page.
+ *
+ * A team of 512 tmpfs pages can be mapped as an integral hugepage as long as
+ * the team is not disbanded. The head page is !PageTeam once disbanded.
+ *
+ * Huge tmpfs pages are disbanded for page freeing, shrinking, or swap out.
+ *
+ * Freeing (punch hole, truncation):
+ *  shmem_undo_range
+ *     disband
+ *       lock head page
+ *       unmap_mapping_range
+ *         zap_page_range_single
+ *           mmu_notifier_invalidate_range_start
+ *           split_huge_page_pmd or zap_huge_pmd
+ *             remap_team_by_ptes
+ *           mmu_notifier_invalidate_range_end
+ *       unlock head page
+ *     pagevec_release
+ *        pages are freed
+ * If we race with disband, the MMU notifier (MMUN) will fix us up. The head
+ * page lock also serializes any gup() against resolving the page team.
+ *
+ * The shrinker also disbands, but once a page team is fully banded up it is
+ * no longer tagged as shrinkable in the radix tree and hence can't be shrunk.
+ *  shmem_shrink_hugehole
+ *     shmem_choose_hugehole
+ *        disband
+ *     migrate_pages
+ *        try_to_unmap
+ *           mmu_notifier_invalidate_page
+ * Double-indemnity: if we race with disband, MMUN will fix us up.
+ *
+ * Swap out:
+ *  shrink_page_list
+ *    try_to_unmap
+ *      unmap_team_by_pmd
+ *         mmu_notifier_invalidate_range
+ *    pageout
+ *      shmem_writepage
+ *         disband
+ *    free_hot_cold_page_list
+ *       pages are freed
+ * If we race with disband, no one will come to fix us up. So we check for a
+ * pmd mapping, serializing against the MMUN in unmap_team_by_pmd, which will
+ * break the pmd mapping if it runs before us (or invalidate our mapping if it
+ * runs after).
+ *
+ * N.B. migration requires further thought all around.
+ */
+static bool is_huge_tmpfs(struct mm_struct *mm, struct page *page,
+			  unsigned long address)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	struct page *head;
+
+	if (PageAnon(page) || !PageTeam(page))
+		return false;
+	/*
+	 * This strictly assumes PMD-level huge mappings,
+	 * which are the only kind KVM can handle here.
+	 * N.B. Assume (like everywhere else) PAGE_SIZE == PAGE_CACHE_SIZE.
+	 */
+	if (((address & (HPAGE_PMD_SIZE - 1)) >> PAGE_SHIFT) !=
+	    (page->index & (HPAGE_PMD_NR-1)))
+		return false;
+	head = team_head(page);
+	if (!PageTeam(head))
+		return false;
+	/*
+	 * Attempt an early discard. If the head races into becoming SwapCache,
+	 * and thus has a bogus team_usage, we'll find out for sure below.
+	 */
+	if (!team_hugely_mapped(head))
+		return false;
+	/*
+	 * Open code page_check_address_pmd, otherwise we'd have to make it
+	 * a module-visible symbol. Simplify it. No need for page table lock,
+	 * as mmu notifier serialization ensures we are on either side of
+	 * unmap_team_by_pmd or remap_team_by_ptes.
+	 */
+	address &= HPAGE_PMD_MASK;
+	pgd = pgd_offset(mm, address);
+	if (!pgd_present(*pgd))
+		return false;
+	pud = pud_offset(pgd, address);
+	if (!pud_present(*pud))
+		return false;
+	pmd = pmd_offset(pud, address);
+	if (!pmd_trans_huge(*pmd))
+		return false;
+	return pmd_page(*pmd) == head;
+}
+
 static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
+					unsigned long address,
 					gfn_t *gfnp, pfn_t *pfnp, int *levelp)
 {
 	pfn_t pfn = *pfnp;
@@ -2737,29 +2837,34 @@ static void transparent_hugepage_adjust(
 	 * here.
 	 */
 	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
-	    level == PT_PAGE_TABLE_LEVEL &&
-	    PageTransCompound(pfn_to_page(pfn)) &&
-	    !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) {
-		unsigned long mask;
-		/*
-		 * mmu_notifier_retry was successful and we hold the
-		 * mmu_lock here, so the pmd can't become splitting
-		 * from under us, and in turn
-		 * __split_huge_page_refcount() can't run from under
-		 * us and we can safely transfer the refcount from
-		 * PG_tail to PG_head as we switch the pfn to tail to
-		 * head.
-		 */
-		*levelp = level = PT_DIRECTORY_LEVEL;
-		mask = KVM_PAGES_PER_HPAGE(level) - 1;
-		VM_BUG_ON((gfn & mask) != (pfn & mask));
-		if (pfn & mask) {
-			gfn &= ~mask;
-			*gfnp = gfn;
-			kvm_release_pfn_clean(pfn);
-			pfn &= ~mask;
-			kvm_get_pfn(pfn);
-			*pfnp = pfn;
+	    level == PT_PAGE_TABLE_LEVEL) {
+		struct page *page = pfn_to_page(pfn);
+
+		if ((PageTransCompound(page) ||
+		     is_huge_tmpfs(vcpu->kvm->mm, page, address)) &&
+		    !has_wrprotected_page(vcpu->kvm, gfn,
+					  PT_DIRECTORY_LEVEL)) {
+			unsigned long mask;
+			/*
+			 * mmu_notifier_retry was successful and we hold the
+			 * mmu_lock here, so the pmd can't become splitting
+			 * from under us, and in turn
+			 * __split_huge_page_refcount() can't run from under
+			 * us and we can safely transfer the refcount from
+			 * PG_tail to PG_head as we switch the pfn to tail to
+			 * head.
+			 */
+			*levelp = level = PT_DIRECTORY_LEVEL;
+			mask = KVM_PAGES_PER_HPAGE(level) - 1;
+			VM_BUG_ON((gfn & mask) != (pfn & mask));
+			if (pfn & mask) {
+				gfn &= ~mask;
+				*gfnp = gfn;
+				kvm_release_pfn_clean(pfn);
+				pfn &= ~mask;
+				kvm_get_pfn(pfn);
+				*pfnp = pfn;
+			}
 		}
 	}
 }
@@ -2955,7 +3060,7 @@ static int nonpaging_map(struct kvm_vcpu
 		goto out_unlock;
 	make_mmu_pages_available(vcpu);
 	if (likely(!force_pt_level))
-		transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
+		transparent_hugepage_adjust(vcpu, hva, &gfn, &pfn, &level);
 	r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn,
 			 prefault);
 	spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3440,7 +3545,7 @@ static int tdp_page_fault(struct kvm_vcp
 		goto out_unlock;
 	make_mmu_pages_available(vcpu);
 	if (likely(!force_pt_level))
-		transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
+		transparent_hugepage_adjust(vcpu, hva, &gfn, &pfn, &level);
 	r = __direct_map(vcpu, gpa, write, map_writable,
 			 level, gfn, pfn, prefault);
 	spin_unlock(&vcpu->kvm->mmu_lock);
--- thpfs.orig/arch/x86/kvm/paging_tmpl.h	2015-02-20 19:35:20.095835839 -0800
+++ thpfs/arch/x86/kvm/paging_tmpl.h	2015-02-20 19:35:24.775825138 -0800
@@ -794,7 +794,8 @@ static int FNAME(page_fault)(struct kvm_
 	kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
 	make_mmu_pages_available(vcpu);
 	if (!force_pt_level)
-		transparent_hugepage_adjust(vcpu, &walker.gfn, &pfn, &level);
+		transparent_hugepage_adjust(vcpu, hva, &walker.gfn, &pfn,
+					    &level);
 	r = FNAME(fetch)(vcpu, addr, &walker, write_fault,
 			 level, pfn, map_writable, prefault);
 	++vcpu->stat.pf_fixed;
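
For completeness, a similar standalone sketch (again not part of the patch;
the helper name hugepage_adjust() and the constant are assumed, and the pfn
reference juggling is omitted) of the gfn/pfn realignment that
transparent_hugepage_adjust() performs once it promotes the mapping to
PT_DIRECTORY_LEVEL: both frame numbers are rounded down to the 512-frame
boundary, which is only safe because gfn and pfn are congruent modulo 512
for a huge-backed range, exactly what the VM_BUG_ON above asserts.

#include <assert.h>
#include <stdio.h>

#define PAGES_PER_HPAGE	512UL	/* KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) on x86 */

static void hugepage_adjust(unsigned long *gfnp, unsigned long *pfnp)
{
	unsigned long mask = PAGES_PER_HPAGE - 1;

	/* gfn and pfn must sit at the same offset within the huge page */
	assert((*gfnp & mask) == (*pfnp & mask));
	*gfnp &= ~mask;
	*pfnp &= ~mask;
}

int main(void)
{
	unsigned long gfn = 0x1203, pfn = 0x8a03;

	hugepage_adjust(&gfn, &pfn);
	printf("gfn=%#lx pfn=%#lx\n", gfn, pfn);	/* gfn=0x1200 pfn=0x8a00 */
	return 0;
}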

Thread overview: 38+ messages
2015-02-21  3:49 [PATCH 00/24] huge tmpfs: an alternative approach to THPageCache Hugh Dickins
2015-02-21  3:51 ` [PATCH 01/24] mm: update_lru_size warn and reset bad lru_size Hugh Dickins
2015-02-23  9:30   ` Kirill A. Shutemov
2015-03-23  2:44     ` Hugh Dickins
2015-02-21  3:54 ` [PATCH 02/24] mm: update_lru_size do the __mod_zone_page_state Hugh Dickins
2015-02-21  3:56 ` [PATCH 03/24] mm: use __SetPageSwapBacked and don't ClearPageSwapBacked Hugh Dickins
2015-02-25 10:53   ` Mel Gorman
2015-03-23  3:01     ` Hugh Dickins
2015-02-21  3:58 ` [PATCH 04/24] mm: make page migration's newpage handling more robust Hugh Dickins
2015-02-21  4:00 ` [PATCH 05/24] tmpfs: preliminary minor tidyups Hugh Dickins
2015-02-21  4:01 ` [PATCH 06/24] huge tmpfs: prepare counts in meminfo, vmstat and SysRq-m Hugh Dickins
2015-02-21  4:03 ` [PATCH 07/24] huge tmpfs: include shmem freeholes in available memory counts Hugh Dickins
2015-02-21  4:05 ` [PATCH 08/24] huge tmpfs: prepare huge=N mount option and /proc/sys/vm/shmem_huge Hugh Dickins
2015-02-21  4:06 ` [PATCH 09/24] huge tmpfs: try to allocate huge pages, split into a team Hugh Dickins
2015-02-21  4:07 ` [PATCH 10/24] huge tmpfs: avoid team pages in a few places Hugh Dickins
2015-02-21  4:09 ` [PATCH 11/24] huge tmpfs: shrinker to migrate and free underused holes Hugh Dickins
2015-03-19 16:56   ` Konstantin Khlebnikov
2015-03-23  4:40     ` Hugh Dickins
2015-03-23 12:50       ` Kirill A. Shutemov
2015-03-23 13:50         ` Kirill A. Shutemov
2015-03-24 12:57       ` Kirill A. Shutemov
2015-03-25  0:41         ` Hugh Dickins
2015-02-21  4:11 ` [PATCH 12/24] huge tmpfs: get_unmapped_area align and fault supply huge page Hugh Dickins
2015-02-21  4:12 ` [PATCH 13/24] huge tmpfs: extend get_user_pages_fast to shmem pmd Hugh Dickins
2015-02-21  4:13 ` [PATCH 14/24] huge tmpfs: extend vma_adjust_trans_huge " Hugh Dickins
2015-02-21  4:15 ` [PATCH 15/24] huge tmpfs: rework page_referenced_one and try_to_unmap_one Hugh Dickins
2015-02-21  4:16 ` [PATCH 16/24] huge tmpfs: fix problems from premature exposure of pagetable Hugh Dickins
2015-07-01 10:53   ` Kirill A. Shutemov
2015-02-21  4:18 ` [PATCH 17/24] huge tmpfs: map shmem by huge page pmd or by page team ptes Hugh Dickins
2015-02-21  4:20 ` [PATCH 18/24] huge tmpfs: mmap_sem is unlocked when truncation splits huge pmd Hugh Dickins
2015-02-21  4:22 ` [PATCH 19/24] huge tmpfs: disband split huge pmds on race or memory failure Hugh Dickins
2015-02-21  4:23 ` [PATCH 20/24] huge tmpfs: use Unevictable lru with variable hpage_nr_pages() Hugh Dickins
2015-02-21  4:25 ` [PATCH 21/24] huge tmpfs: fix Mlocked meminfo, tracking huge and unhuge mlocks Hugh Dickins
2015-02-21  4:27 ` [PATCH 22/24] huge tmpfs: fix Mapped meminfo, tracking huge and unhuge mappings Hugh Dickins
2015-02-21  4:29 ` [PATCH 23/24] kvm: plumb return of hva when resolving page fault Hugh Dickins
2015-02-21  4:31 ` Hugh Dickins [this message]
2015-02-23 13:48 ` [PATCH 00/24] huge tmpfs: an alternative approach to THPageCache Kirill A. Shutemov
2015-03-23  2:25   ` Hugh Dickins
