linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch V3 00/11] x86/mm/cpa: Improve large page preservation handling
@ 2018-09-17 14:29 Thomas Gleixner
  2018-09-17 14:29 ` [patch V3 01/11] x86/mm/init32: Mark text and rodata RO in one go Thomas Gleixner
                   ` (10 more replies)
  0 siblings, 11 replies; 41+ messages in thread
From: Thomas Gleixner @ 2018-09-17 14:29 UTC (permalink / raw)
  To: LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Dave Hansen, Mark Gross

This is the 3rd revision of this series. Previous versions are here:

  https://lkml.kernel.org/r/20180907150119.325866892@linutronix.de
  https://lkml.kernel.org/r/20180914130917.155416208@linutronix.de

Changes since v2:

  - Fix the print format so it works for 32bit(PAE) and 64bit

  - Make mark_rodata_do() on 32bit convert text and rodata in one go to
    prevent the 'invalid existing mapping' check to trigger

Bin reported that try_preserve_large_page() in the page attribute code
consumes an large amount of CPU time. His initial attempts of addressing
this made me look deeper into the code.

The logic in this code is not really intelligent. It requires to check a
large page in 4k steps for conflicts. That's insane as most operations do
not conflict at all.

The code also lacks sanity checks which allow to detect whether the
existing mapping is incorrect vs. the static protections.

Any form of debugging or statistics is missing as well.

The following series addresses this:

  - Clean up the code so it becomes extensible

  - Provide the ability to check a full range for conflicts

  - Add debug output and statistics to quantify the changes and to allow
    observation of the mechanism in the future.

  - Add a sanity check for existing mappings with a fixup for the 2M case
    and a warning for the 1G case. The 2M case is trivial to address, the
    1G case requires larger changes and is just warned about for now.

  - Avoid conflict checks for operations which clear the PRESENT bit

  - Utilize the range checks to detect conflicts in one operation

  - Drop the 4k wise checking which turned out to provide no extra large
    page preservation in testing. There might be corner cases where a page
    would be preserved, but that's overkill for the common cases.

Before:
 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  540
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               800770
 4K pages set-checked:             7668

After:
 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  538
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages set-checked:             7668

This gets rid of ~800000 checks whether a particular address is with a
static protection region. Each check tests against 4 different regions,
which adds up to several million instructions.

Thanks,

	tglx

8<---------------------
 Kconfig       |    8 
 mm/init_32.c  |   23 --
 mm/pageattr.c |  516 +++++++++++++++++++++++++++++++++++++++++++---------------
 3 files changed, 396 insertions(+), 151 deletions(-)


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [patch V3 01/11] x86/mm/init32: Mark text and rodata RO in one go
  2018-09-17 14:29 [patch V3 00/11] x86/mm/cpa: Improve large page preservation handling Thomas Gleixner
@ 2018-09-17 14:29 ` Thomas Gleixner
  2018-09-21 16:15   ` Dave Hansen
  2018-09-27 18:45   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  2018-09-17 14:29 ` [patch V3 02/11] x86/mm/cpa: Split, rename and clean up try_preserve_large_page() Thomas Gleixner
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 41+ messages in thread
From: Thomas Gleixner @ 2018-09-17 14:29 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Bin Yang, Dave Hansen, Mark Gross,
	kernel test robot

[-- Attachment #1: x86-mm-init32--Mark-text-and-rodata-RO-in-one-go.patch --]
[-- Type: text/plain, Size: 2268 bytes --]

The sequence of marking text and rodata read-only in 32bit init is:

  set_ro(text);
  kernel_set_to_readonly = 1;
  set_ro(rodata);

When kernel_set_to_readonly is 1 it enables the protection mechanism in CPA
for the read only regions. With the upcoming checks for existing mappings
this consequently triggers the warning about an existing mapping being
incorrect vs. static protections because rodata has not been converted yet.

There is no technical reason to split the two, so just combine the RO
protection to convert text and rodata in one go.

Convert the printks to pr_info while at it.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/init_32.c |   23 ++++-------------------
 1 file changed, 4 insertions(+), 19 deletions(-)

--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -923,34 +923,19 @@ static void mark_nxdata_nx(void)
 void mark_rodata_ro(void)
 {
 	unsigned long start = PFN_ALIGN(_text);
-	unsigned long size = PFN_ALIGN(_etext) - start;
+	unsigned long size = (unsigned long)__end_rodata - start;
 
 	set_pages_ro(virt_to_page(start), size >> PAGE_SHIFT);
-	printk(KERN_INFO "Write protecting the kernel text: %luk\n",
+	pr_info("Write protecting kernel text and read-only data: %luk\n",
 		size >> 10);
 
 	kernel_set_to_readonly = 1;
 
 #ifdef CONFIG_CPA_DEBUG
-	printk(KERN_INFO "Testing CPA: Reverting %lx-%lx\n",
-		start, start+size);
-	set_pages_rw(virt_to_page(start), size>>PAGE_SHIFT);
-
-	printk(KERN_INFO "Testing CPA: write protecting again\n");
-	set_pages_ro(virt_to_page(start), size>>PAGE_SHIFT);
-#endif
-
-	start += size;
-	size = (unsigned long)__end_rodata - start;
-	set_pages_ro(virt_to_page(start), size >> PAGE_SHIFT);
-	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
-		size >> 10);
-
-#ifdef CONFIG_CPA_DEBUG
-	printk(KERN_INFO "Testing CPA: undo %lx-%lx\n", start, start + size);
+	pr_info("Testing CPA: Reverting %lx-%lx\n", start, start + size);
 	set_pages_rw(virt_to_page(start), size >> PAGE_SHIFT);
 
-	printk(KERN_INFO "Testing CPA: write protecting again\n");
+	pr_info("Testing CPA: write protecting again\n");
 	set_pages_ro(virt_to_page(start), size >> PAGE_SHIFT);
 #endif
 	mark_nxdata_nx();



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [patch V3 02/11] x86/mm/cpa: Split, rename and clean up try_preserve_large_page()
  2018-09-17 14:29 [patch V3 00/11] x86/mm/cpa: Improve large page preservation handling Thomas Gleixner
  2018-09-17 14:29 ` [patch V3 01/11] x86/mm/init32: Mark text and rodata RO in one go Thomas Gleixner
@ 2018-09-17 14:29 ` Thomas Gleixner
  2018-09-18  7:03   ` Peter Zijlstra
                     ` (3 more replies)
  2018-09-17 14:29 ` [patch V3 03/11] x86/mm/cpa: Rework static_protections() Thomas Gleixner
                   ` (8 subsequent siblings)
  10 siblings, 4 replies; 41+ messages in thread
From: Thomas Gleixner @ 2018-09-17 14:29 UTC (permalink / raw)
  To: LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Dave Hansen, Mark Gross

[-- Attachment #1: x86-mm-cpa--Split-try_preserve_large_page--.patch --]
[-- Type: text/plain, Size: 7653 bytes --]

Avoid the extra variable and gotos by splitting the function into the
actual algorithm and a callable function which contains the lock
protection.

Rename it to should_split_large_page() while at it so the return values make
actually sense.

Clean up the code flow, comments and general whitespace damage while at it. No
functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/pageattr.c |  121 ++++++++++++++++++++++++-------------------------
 1 file changed, 60 insertions(+), 61 deletions(-)

--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -421,18 +421,18 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd,
  */
 pte_t *lookup_address(unsigned long address, unsigned int *level)
 {
-        return lookup_address_in_pgd(pgd_offset_k(address), address, level);
+	return lookup_address_in_pgd(pgd_offset_k(address), address, level);
 }
 EXPORT_SYMBOL_GPL(lookup_address);
 
 static pte_t *_lookup_address_cpa(struct cpa_data *cpa, unsigned long address,
 				  unsigned int *level)
 {
-        if (cpa->pgd)
+	if (cpa->pgd)
 		return lookup_address_in_pgd(cpa->pgd + pgd_index(address),
 					       address, level);
 
-        return lookup_address(address, level);
+	return lookup_address(address, level);
 }
 
 /*
@@ -549,27 +549,22 @@ static pgprot_t pgprot_clear_protnone_bi
 	return prot;
 }
 
-static int
-try_preserve_large_page(pte_t *kpte, unsigned long address,
-			struct cpa_data *cpa)
+static int __should_split_large_page(pte_t *kpte, unsigned long address,
+				     struct cpa_data *cpa)
 {
-	unsigned long nextpage_addr, numpages, pmask, psize, addr, pfn, old_pfn;
-	pte_t new_pte, old_pte, *tmp;
+	unsigned long numpages, pmask, psize, lpaddr, addr, pfn, old_pfn;
 	pgprot_t old_prot, new_prot, req_prot;
-	int i, do_split = 1;
+	pte_t new_pte, old_pte, *tmp;
 	enum pg_level level;
+	int i;
 
-	if (cpa->force_split)
-		return 1;
-
-	spin_lock(&pgd_lock);
 	/*
 	 * Check for races, another CPU might have split this page
 	 * up already:
 	 */
 	tmp = _lookup_address_cpa(cpa, address, &level);
 	if (tmp != kpte)
-		goto out_unlock;
+		return 1;
 
 	switch (level) {
 	case PG_LEVEL_2M:
@@ -581,8 +576,7 @@ try_preserve_large_page(pte_t *kpte, uns
 		old_pfn = pud_pfn(*(pud_t *)kpte);
 		break;
 	default:
-		do_split = -EINVAL;
-		goto out_unlock;
+		return -EINVAL;
 	}
 
 	psize = page_level_size(level);
@@ -592,8 +586,8 @@ try_preserve_large_page(pte_t *kpte, uns
 	 * Calculate the number of pages, which fit into this large
 	 * page starting at address:
 	 */
-	nextpage_addr = (address + psize) & pmask;
-	numpages = (nextpage_addr - address) >> PAGE_SHIFT;
+	lpaddr = (address + psize) & pmask;
+	numpages = (lpaddr - address) >> PAGE_SHIFT;
 	if (numpages < cpa->numpages)
 		cpa->numpages = numpages;
 
@@ -620,57 +614,62 @@ try_preserve_large_page(pte_t *kpte, uns
 		pgprot_val(req_prot) |= _PAGE_PSE;
 
 	/*
-	 * old_pfn points to the large page base pfn. So we need
-	 * to add the offset of the virtual address:
+	 * old_pfn points to the large page base pfn. So we need to add the
+	 * offset of the virtual address:
 	 */
 	pfn = old_pfn + ((address & (psize - 1)) >> PAGE_SHIFT);
 	cpa->pfn = pfn;
 
-	new_prot = static_protections(req_prot, address, pfn);
+	/*
+	 * Calculate the large page base address and the number of 4K pages
+	 * in the large page
+	 */
+	lpaddr = address & pmask;
+	numpages = psize >> PAGE_SHIFT;
 
 	/*
-	 * We need to check the full range, whether
-	 * static_protection() requires a different pgprot for one of
-	 * the pages in the range we try to preserve:
+	 * Make sure that the requested pgprot does not violate the static
+	 * protections. Check the full large page whether one of the pages
+	 * in it results in a different pgprot than the first one of the
+	 * requested range. If yes, then the page needs to be split.
 	 */
-	addr = address & pmask;
+	new_prot = static_protections(req_prot, address, pfn);
 	pfn = old_pfn;
-	for (i = 0; i < (psize >> PAGE_SHIFT); i++, addr += PAGE_SIZE, pfn++) {
+	for (i = 0, addr = lpaddr; i < numpages; i++, addr += PAGE_SIZE, pfn++) {
 		pgprot_t chk_prot = static_protections(req_prot, addr, pfn);
 
 		if (pgprot_val(chk_prot) != pgprot_val(new_prot))
-			goto out_unlock;
+			return 1;
 	}
 
-	/*
-	 * If there are no changes, return. maxpages has been updated
-	 * above:
-	 */
-	if (pgprot_val(new_prot) == pgprot_val(old_prot)) {
-		do_split = 0;
-		goto out_unlock;
-	}
+	/* If there are no changes, return. */
+	if (pgprot_val(new_prot) == pgprot_val(old_prot))
+		return 0;
 
 	/*
-	 * We need to change the attributes. Check, whether we can
-	 * change the large page in one go. We request a split, when
-	 * the address is not aligned and the number of pages is
-	 * smaller than the number of pages in the large page. Note
-	 * that we limited the number of possible pages already to
-	 * the number of pages in the large page.
+	 * Verify that the address is aligned and the number of pages
+	 * covers the full page.
 	 */
-	if (address == (address & pmask) && cpa->numpages == (psize >> PAGE_SHIFT)) {
-		/*
-		 * The address is aligned and the number of pages
-		 * covers the full page.
-		 */
-		new_pte = pfn_pte(old_pfn, new_prot);
-		__set_pmd_pte(kpte, address, new_pte);
-		cpa->flags |= CPA_FLUSHTLB;
-		do_split = 0;
-	}
+	if (address != lpaddr || cpa->numpages != numpages)
+		return 1;
+
+	/* All checks passed. Update the large page mapping. */
+	new_pte = pfn_pte(old_pfn, new_prot);
+	__set_pmd_pte(kpte, address, new_pte);
+	cpa->flags |= CPA_FLUSHTLB;
+	return 0;
+}
 
-out_unlock:
+static int should_split_large_page(pte_t *kpte, unsigned long address,
+				   struct cpa_data *cpa)
+{
+	int do_split;
+
+	if (cpa->force_split)
+		return 1;
+
+	spin_lock(&pgd_lock);
+	do_split = __should_split_large_page(kpte, address, cpa);
 	spin_unlock(&pgd_lock);
 
 	return do_split;
@@ -1273,7 +1272,7 @@ static int __change_page_attr(struct cpa
 	 * Check, whether we can keep the large page intact
 	 * and just change the pte:
 	 */
-	do_split = try_preserve_large_page(kpte, address, cpa);
+	do_split = should_split_large_page(kpte, address, cpa);
 	/*
 	 * When the range fits into the existing large page,
 	 * return. cp->numpages and cpa->tlbflush have been updated in
@@ -1288,23 +1287,23 @@ static int __change_page_attr(struct cpa
 	err = split_large_page(cpa, kpte, address);
 	if (!err) {
 		/*
-	 	 * Do a global flush tlb after splitting the large page
-	 	 * and before we do the actual change page attribute in the PTE.
-	 	 *
-	 	 * With out this, we violate the TLB application note, that says
-	 	 * "The TLBs may contain both ordinary and large-page
+		 * Do a global flush tlb after splitting the large page
+		 * and before we do the actual change page attribute in the PTE.
+		 *
+		 * With out this, we violate the TLB application note, that says
+		 * "The TLBs may contain both ordinary and large-page
 		 *  translations for a 4-KByte range of linear addresses. This
 		 *  may occur if software modifies the paging structures so that
 		 *  the page size used for the address range changes. If the two
 		 *  translations differ with respect to page frame or attributes
 		 *  (e.g., permissions), processor behavior is undefined and may
 		 *  be implementation-specific."
-	 	 *
-	 	 * We do this global tlb flush inside the cpa_lock, so that we
+		 *
+		 * We do this global tlb flush inside the cpa_lock, so that we
 		 * don't allow any other cpu, with stale tlb entries change the
 		 * page attribute in parallel, that also falls into the
 		 * just split large page entry.
-	 	 */
+		 */
 		flush_tlb_all();
 		goto repeat;
 	}



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [patch V3 03/11] x86/mm/cpa: Rework static_protections()
  2018-09-17 14:29 [patch V3 00/11] x86/mm/cpa: Improve large page preservation handling Thomas Gleixner
  2018-09-17 14:29 ` [patch V3 01/11] x86/mm/init32: Mark text and rodata RO in one go Thomas Gleixner
  2018-09-17 14:29 ` [patch V3 02/11] x86/mm/cpa: Split, rename and clean up try_preserve_large_page() Thomas Gleixner
@ 2018-09-17 14:29 ` Thomas Gleixner
  2018-09-21 16:33   ` Dave Hansen
  2018-09-27 18:46   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  2018-09-17 14:29 ` [patch V3 04/11] x86/mm/cpa: Allow range check for static protections Thomas Gleixner
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 41+ messages in thread
From: Thomas Gleixner @ 2018-09-17 14:29 UTC (permalink / raw)
  To: LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Dave Hansen, Mark Gross

[-- Attachment #1: x86-mm-cpa--Rework-static_protections--.patch --]
[-- Type: text/plain, Size: 6924 bytes --]

static_protections() is pretty unreadable. Split it up into separate checks
for each protection area.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/pageattr.c |  159 +++++++++++++++++++++++++++++--------------------
 1 file changed, 95 insertions(+), 64 deletions(-)

--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -286,84 +286,115 @@ static void cpa_flush_array(unsigned lon
 	}
 }
 
+#ifdef CONFIG_PCI_BIOS
 /*
- * Certain areas of memory on x86 require very specific protection flags,
- * for example the BIOS area or kernel text. Callers don't always get this
- * right (again, ioremap() on BIOS memory is not uncommon) so this function
- * checks and fixes these known static required protection bits.
+ * The BIOS area between 640k and 1Mb needs to be executable for PCI BIOS
+ * based config access (CONFIG_PCI_GOBIOS) support.
  */
-static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
-				   unsigned long pfn)
-{
-	pgprot_t forbidden = __pgprot(0);
+#define BIOS_PFN	PFN_DOWN(BIOS_BEGIN)
+#define BIOS_PFN_END	PFN_DOWN(BIOS_END)
 
-	/*
-	 * The BIOS area between 640k and 1Mb needs to be executable for
-	 * PCI BIOS based config access (CONFIG_PCI_GOBIOS) support.
-	 */
-#ifdef CONFIG_PCI_BIOS
-	if (pcibios_enabled && within(pfn, BIOS_BEGIN >> PAGE_SHIFT, BIOS_END >> PAGE_SHIFT))
-		pgprot_val(forbidden) |= _PAGE_NX;
+static pgprotval_t protect_pci_bios(unsigned long pfn)
+{
+	if (pcibios_enabled && within(pfn, BIOS_PFN, BIOS_PFN_END))
+		return _PAGE_NX;
+	return 0;
+}
+#else
+static pgprotval_t protect_pci_bios(unsigned long pfn)
+{
+	return 0;
+}
 #endif
 
-	/*
-	 * The kernel text needs to be executable for obvious reasons
-	 * Does not cover __inittext since that is gone later on. On
-	 * 64bit we do not enforce !NX on the low mapping
-	 */
-	if (within(address, (unsigned long)_text, (unsigned long)_etext))
-		pgprot_val(forbidden) |= _PAGE_NX;
+/*
+ * The .rodata section needs to be read-only. Using the pfn catches all
+ * aliases.  This also includes __ro_after_init, so do not enforce until
+ * kernel_set_to_readonly is true.
+ */
+static pgprotval_t protect_rodata(unsigned long pfn)
+{
+	unsigned long start_pfn = __pa_symbol(__start_rodata) >> PAGE_SHIFT;
+	unsigned long end_pfn = __pa_symbol(__end_rodata) >> PAGE_SHIFT;
 
-	/*
-	 * The .rodata section needs to be read-only. Using the pfn
-	 * catches all aliases.  This also includes __ro_after_init,
-	 * so do not enforce until kernel_set_to_readonly is true.
-	 */
-	if (kernel_set_to_readonly &&
-	    within(pfn, __pa_symbol(__start_rodata) >> PAGE_SHIFT,
-		   __pa_symbol(__end_rodata) >> PAGE_SHIFT))
-		pgprot_val(forbidden) |= _PAGE_RW;
+	if (kernel_set_to_readonly && within(pfn, start_pfn, end_pfn))
+		return _PAGE_RW;
+	return 0;
+}
+
+/*
+ * The kernel text needs to be executable for obvious reasons. This does
+ * not cover __inittext since that is gone after boot. On 64bit we do not
+ * enforce !NX on the low mapping
+ */
+static pgprotval_t protect_kernel_text(unsigned long address)
+{
+	if (within(address, (unsigned long)_text, (unsigned long)_etext))
+		return _PAGE_NX;
+	return 0;
+}
 
 #if defined(CONFIG_X86_64)
+/*
+ * Once the kernel maps the text as RO (kernel_set_to_readonly is set),
+ * kernel text mappings for the large page aligned text, rodata sections
+ * will be always read-only. For the kernel identity mappings covering the
+ * holes caused by this alignment can be anything that user asks.
+ *
+ * This will preserve the large page mappings for kernel text/data at no
+ * extra cost.
+ */
+static pgprotval_t protect_kernel_text_ro(unsigned long address)
+{
+	unsigned long end = (unsigned long)__end_rodata_hpage_align;
+	unsigned long start = (unsigned long)_text;
+	unsigned int level;
+
+	if (!kernel_set_to_readonly || !within(address, start, end))
+		return 0;
 	/*
-	 * Once the kernel maps the text as RO (kernel_set_to_readonly is set),
-	 * kernel text mappings for the large page aligned text, rodata sections
-	 * will be always read-only. For the kernel identity mappings covering
-	 * the holes caused by this alignment can be anything that user asks.
+	 * Don't enforce the !RW mapping for the kernel text mapping, if
+	 * the current mapping is already using small page mapping.  No
+	 * need to work hard to preserve large page mappings in this case.
 	 *
-	 * This will preserve the large page mappings for kernel text/data
-	 * at no extra cost.
+	 * This also fixes the Linux Xen paravirt guest boot failure caused
+	 * by unexpected read-only mappings for kernel identity
+	 * mappings. In this paravirt guest case, the kernel text mapping
+	 * and the kernel identity mapping share the same page-table pages,
+	 * so the protections for kernel text and identity mappings have to
+	 * be the same.
 	 */
-	if (kernel_set_to_readonly &&
-	    within(address, (unsigned long)_text,
-		   (unsigned long)__end_rodata_hpage_align)) {
-		unsigned int level;
-
-		/*
-		 * Don't enforce the !RW mapping for the kernel text mapping,
-		 * if the current mapping is already using small page mapping.
-		 * No need to work hard to preserve large page mappings in this
-		 * case.
-		 *
-		 * This also fixes the Linux Xen paravirt guest boot failure
-		 * (because of unexpected read-only mappings for kernel identity
-		 * mappings). In this paravirt guest case, the kernel text
-		 * mapping and the kernel identity mapping share the same
-		 * page-table pages. Thus we can't really use different
-		 * protections for the kernel text and identity mappings. Also,
-		 * these shared mappings are made of small page mappings.
-		 * Thus this don't enforce !RW mapping for small page kernel
-		 * text mapping logic will help Linux Xen parvirt guest boot
-		 * as well.
-		 */
-		if (lookup_address(address, &level) && (level != PG_LEVEL_4K))
-			pgprot_val(forbidden) |= _PAGE_RW;
-	}
+	if (lookup_address(address, &level) && (level != PG_LEVEL_4K))
+		return _PAGE_RW;
+	return 0;
+}
+#else
+static pgprotval_t protect_kernel_text_ro(unsigned long address)
+{
+	return 0;
+}
 #endif
 
-	prot = __pgprot(pgprot_val(prot) & ~pgprot_val(forbidden));
+/*
+ * Certain areas of memory on x86 require very specific protection flags,
+ * for example the BIOS area or kernel text. Callers don't always get this
+ * right (again, ioremap() on BIOS memory is not uncommon) so this function
+ * checks and fixes these known static required protection bits.
+ */
+static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
+					  unsigned long pfn)
+{
+	pgprotval_t forbidden;
+
+	/* Operate on the virtual address */
+	forbidden  = protect_kernel_text(address);
+	forbidden |= protect_kernel_text_ro(address);
+
+	/* Check the PFN directly */
+	forbidden |= protect_pci_bios(pfn);
+	forbidden |= protect_rodata(pfn);
 
-	return prot;
+	return __pgprot(pgprot_val(prot) & ~forbidden);
 }
 
 /*



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [patch V3 04/11] x86/mm/cpa: Allow range check for static protections
  2018-09-17 14:29 [patch V3 00/11] x86/mm/cpa: Improve large page preservation handling Thomas Gleixner
                   ` (2 preceding siblings ...)
  2018-09-17 14:29 ` [patch V3 03/11] x86/mm/cpa: Rework static_protections() Thomas Gleixner
@ 2018-09-17 14:29 ` Thomas Gleixner
  2018-09-21 16:36   ` Dave Hansen
  2018-09-27 18:47   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  2018-09-17 14:29 ` [patch V3 05/11] x86/mm/cpa: Add debug mechanism Thomas Gleixner
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 41+ messages in thread
From: Thomas Gleixner @ 2018-09-17 14:29 UTC (permalink / raw)
  To: LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Dave Hansen, Mark Gross

[-- Attachment #1: x86-mm-cpa--Allow-range-check-for-static-protections.patch --]
[-- Type: text/plain, Size: 6307 bytes --]

Checking static protections only page by page is slow especially for huge
pages. To allow quick checks over a complete range, add the ability to do
that.

Make the checks inclusive so the ranges can be directly used for debug output
later.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/pageattr.c |   69 +++++++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 25 deletions(-)

--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -286,22 +286,29 @@ static void cpa_flush_array(unsigned lon
 	}
 }
 
+static bool overlaps(unsigned long r1_start, unsigned long r1_end,
+		     unsigned long r2_start, unsigned long r2_end)
+{
+	return (r1_start <= r2_end && r1_end >= r2_start) ||
+		(r2_start <= r1_end && r2_end >= r1_start);
+}
+
 #ifdef CONFIG_PCI_BIOS
 /*
  * The BIOS area between 640k and 1Mb needs to be executable for PCI BIOS
  * based config access (CONFIG_PCI_GOBIOS) support.
  */
 #define BIOS_PFN	PFN_DOWN(BIOS_BEGIN)
-#define BIOS_PFN_END	PFN_DOWN(BIOS_END)
+#define BIOS_PFN_END	PFN_DOWN(BIOS_END - 1)
 
-static pgprotval_t protect_pci_bios(unsigned long pfn)
+static pgprotval_t protect_pci_bios(unsigned long spfn, unsigned long epfn)
 {
-	if (pcibios_enabled && within(pfn, BIOS_PFN, BIOS_PFN_END))
+	if (pcibios_enabled && overlaps(spfn, epfn, BIOS_PFN, BIOS_PFN_END))
 		return _PAGE_NX;
 	return 0;
 }
 #else
-static pgprotval_t protect_pci_bios(unsigned long pfn)
+static pgprotval_t protect_pci_bios(unsigned long spfn, unsigned long epfn)
 {
 	return 0;
 }
@@ -312,12 +319,17 @@ static pgprotval_t protect_pci_bios(unsi
  * aliases.  This also includes __ro_after_init, so do not enforce until
  * kernel_set_to_readonly is true.
  */
-static pgprotval_t protect_rodata(unsigned long pfn)
+static pgprotval_t protect_rodata(unsigned long spfn, unsigned long epfn)
 {
-	unsigned long start_pfn = __pa_symbol(__start_rodata) >> PAGE_SHIFT;
-	unsigned long end_pfn = __pa_symbol(__end_rodata) >> PAGE_SHIFT;
+	unsigned long epfn_ro, spfn_ro = PFN_DOWN(__pa_symbol(__start_rodata));
+
+	/*
+	 * Note: __end_rodata is at page aligned and not inclusive, so
+	 * subtract 1 to get the last enforced PFN in the rodata area.
+	 */
+	epfn_ro = PFN_DOWN(__pa_symbol(__end_rodata)) - 1;
 
-	if (kernel_set_to_readonly && within(pfn, start_pfn, end_pfn))
+	if (kernel_set_to_readonly && overlaps(spfn, epfn, spfn_ro, epfn_ro))
 		return _PAGE_RW;
 	return 0;
 }
@@ -327,9 +339,12 @@ static pgprotval_t protect_rodata(unsign
  * not cover __inittext since that is gone after boot. On 64bit we do not
  * enforce !NX on the low mapping
  */
-static pgprotval_t protect_kernel_text(unsigned long address)
+static pgprotval_t protect_kernel_text(unsigned long start, unsigned long end)
 {
-	if (within(address, (unsigned long)_text, (unsigned long)_etext))
+	unsigned long t_end = (unsigned long)_etext - 1;
+	unsigned long t_start = (unsigned long)_text;
+
+	if (overlaps(start, end, t_start, t_end))
 		return _PAGE_NX;
 	return 0;
 }
@@ -344,13 +359,14 @@ static pgprotval_t protect_kernel_text(u
  * This will preserve the large page mappings for kernel text/data at no
  * extra cost.
  */
-static pgprotval_t protect_kernel_text_ro(unsigned long address)
+static pgprotval_t protect_kernel_text_ro(unsigned long start,
+					  unsigned long end)
 {
-	unsigned long end = (unsigned long)__end_rodata_hpage_align;
-	unsigned long start = (unsigned long)_text;
+	unsigned long t_end = (unsigned long)__end_rodata_hpage_align - 1;
+	unsigned long t_start = (unsigned long)_text;
 	unsigned int level;
 
-	if (!kernel_set_to_readonly || !within(address, start, end))
+	if (!kernel_set_to_readonly || !overlaps(start, end, t_start, t_end))
 		return 0;
 	/*
 	 * Don't enforce the !RW mapping for the kernel text mapping, if
@@ -364,12 +380,13 @@ static pgprotval_t protect_kernel_text_r
 	 * so the protections for kernel text and identity mappings have to
 	 * be the same.
 	 */
-	if (lookup_address(address, &level) && (level != PG_LEVEL_4K))
+	if (lookup_address(start, &level) && (level != PG_LEVEL_4K))
 		return _PAGE_RW;
 	return 0;
 }
 #else
-static pgprotval_t protect_kernel_text_ro(unsigned long address)
+static pgprotval_t protect_kernel_text_ro(unsigned long start,
+					  unsigned long end)
 {
 	return 0;
 }
@@ -381,18 +398,20 @@ static pgprotval_t protect_kernel_text_r
  * right (again, ioremap() on BIOS memory is not uncommon) so this function
  * checks and fixes these known static required protection bits.
  */
-static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
-					  unsigned long pfn)
+static inline pgprot_t static_protections(pgprot_t prot, unsigned long start,
+					  unsigned long pfn, unsigned long npg)
 {
 	pgprotval_t forbidden;
+	unsigned long end;
 
 	/* Operate on the virtual address */
-	forbidden  = protect_kernel_text(address);
-	forbidden |= protect_kernel_text_ro(address);
+	end = start + npg * PAGE_SIZE - 1;
+	forbidden  = protect_kernel_text(start, end);
+	forbidden |= protect_kernel_text_ro(start, end);
 
 	/* Check the PFN directly */
-	forbidden |= protect_pci_bios(pfn);
-	forbidden |= protect_rodata(pfn);
+	forbidden |= protect_pci_bios(pfn, pfn + npg - 1);
+	forbidden |= protect_rodata(pfn, pfn + npg - 1);
 
 	return __pgprot(pgprot_val(prot) & ~forbidden);
 }
@@ -664,10 +683,10 @@ static int __should_split_large_page(pte
 	 * in it results in a different pgprot than the first one of the
 	 * requested range. If yes, then the page needs to be split.
 	 */
-	new_prot = static_protections(req_prot, address, pfn);
+	new_prot = static_protections(req_prot, address, pfn, 1);
 	pfn = old_pfn;
 	for (i = 0, addr = lpaddr; i < numpages; i++, addr += PAGE_SIZE, pfn++) {
-		pgprot_t chk_prot = static_protections(req_prot, addr, pfn);
+		pgprot_t chk_prot = static_protections(req_prot, addr, pfn, 1);
 
 		if (pgprot_val(chk_prot) != pgprot_val(new_prot))
 			return 1;
@@ -1277,7 +1296,7 @@ static int __change_page_attr(struct cpa
 		pgprot_val(new_prot) &= ~pgprot_val(cpa->mask_clr);
 		pgprot_val(new_prot) |= pgprot_val(cpa->mask_set);
 
-		new_prot = static_protections(new_prot, address, pfn);
+		new_prot = static_protections(new_prot, address, pfn, 1);
 
 		new_prot = pgprot_clear_protnone_bits(new_prot);
 



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [patch V3 05/11] x86/mm/cpa: Add debug mechanism
  2018-09-17 14:29 [patch V3 00/11] x86/mm/cpa: Improve large page preservation handling Thomas Gleixner
                   ` (3 preceding siblings ...)
  2018-09-17 14:29 ` [patch V3 04/11] x86/mm/cpa: Allow range check for static protections Thomas Gleixner
@ 2018-09-17 14:29 ` Thomas Gleixner
  2018-09-21 16:40   ` Dave Hansen
  2018-09-27 18:48   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  2018-09-17 14:29 ` [patch V3 06/11] x86/mm/cpa: Add large page preservation statistics Thomas Gleixner
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 41+ messages in thread
From: Thomas Gleixner @ 2018-09-17 14:29 UTC (permalink / raw)
  To: LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Dave Hansen, Mark Gross

[-- Attachment #1: x86-mm-cpa--Add-proper-debug-output.patch --]
[-- Type: text/plain, Size: 4180 bytes --]

The whole static protection magic is silently fixing up anything which is
handed in. That's just wrong. The offending call sites need to be fixed.

Add a debug mechanism which emits a warning if a requested mapping needs to be
fixed up. The DETECT debug mechanism is really not meant to be enabled except
for developers, so limit the output hard to the protection fixups.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/pageattr.c |   61 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 52 insertions(+), 9 deletions(-)

--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -42,6 +42,13 @@ struct cpa_data {
 	struct page	**pages;
 };
 
+enum cpa_warn {
+	CPA_PROTECT,
+	CPA_DETECT,
+};
+
+static const int cpa_warn_level = CPA_PROTECT;
+
 /*
  * Serialize cpa() (for !DEBUG_PAGEALLOC which uses large identity mappings)
  * using cpa_lock. So that we don't allow any other cpu, with stale large tlb
@@ -392,6 +399,28 @@ static pgprotval_t protect_kernel_text_r
 }
 #endif
 
+static inline bool conflicts(pgprot_t prot, pgprotval_t val)
+{
+	return (pgprot_val(prot) & ~val) != pgprot_val(prot);
+}
+
+static inline void check_conflict(int warnlvl, pgprot_t prot, pgprotval_t val,
+				  unsigned long start, unsigned long end,
+				  unsigned long pfn, const char *txt)
+{
+	static const char *lvltxt[] = {
+		[CPA_PROTECT]	= "protect",
+		[CPA_DETECT]	= "detect",
+	};
+
+	if (warnlvl > cpa_warn_level || !conflicts(prot, val))
+		return;
+
+	pr_warn("CPA %8s %10s: 0x%016lx - 0x%016lx PFN %lx req %016llx prevent %016llx\n",
+		lvltxt[warnlvl], txt, start, end, pfn, (unsigned long long)pgprot_val(prot),
+		(unsigned long long)val);
+}
+
 /*
  * Certain areas of memory on x86 require very specific protection flags,
  * for example the BIOS area or kernel text. Callers don't always get this
@@ -399,19 +428,31 @@ static pgprotval_t protect_kernel_text_r
  * checks and fixes these known static required protection bits.
  */
 static inline pgprot_t static_protections(pgprot_t prot, unsigned long start,
-					  unsigned long pfn, unsigned long npg)
+					  unsigned long pfn, unsigned long npg,
+					  int warnlvl)
 {
-	pgprotval_t forbidden;
+	pgprotval_t forbidden, res;
 	unsigned long end;
 
 	/* Operate on the virtual address */
 	end = start + npg * PAGE_SIZE - 1;
-	forbidden  = protect_kernel_text(start, end);
-	forbidden |= protect_kernel_text_ro(start, end);
+
+	res = protect_kernel_text(start, end);
+	check_conflict(warnlvl, prot, res, start, end, pfn, "Text NX");
+	forbidden = res;
+
+	res = protect_kernel_text_ro(start, end);
+	check_conflict(warnlvl, prot, res, start, end, pfn, "Text RO");
+	forbidden |= res;
 
 	/* Check the PFN directly */
-	forbidden |= protect_pci_bios(pfn, pfn + npg - 1);
-	forbidden |= protect_rodata(pfn, pfn + npg - 1);
+	res = protect_pci_bios(pfn, pfn + npg - 1);
+	check_conflict(warnlvl, prot, res, start, end, pfn, "PCIBIOS NX");
+	forbidden |= res;
+
+	res = protect_rodata(pfn, pfn + npg - 1);
+	check_conflict(warnlvl, prot, res, start, end, pfn, "Rodata RO");
+	forbidden |= res;
 
 	return __pgprot(pgprot_val(prot) & ~forbidden);
 }
@@ -683,10 +724,11 @@ static int __should_split_large_page(pte
 	 * in it results in a different pgprot than the first one of the
 	 * requested range. If yes, then the page needs to be split.
 	 */
-	new_prot = static_protections(req_prot, address, pfn, 1);
+	new_prot = static_protections(req_prot, address, pfn, 1, CPA_DETECT);
 	pfn = old_pfn;
 	for (i = 0, addr = lpaddr; i < numpages; i++, addr += PAGE_SIZE, pfn++) {
-		pgprot_t chk_prot = static_protections(req_prot, addr, pfn, 1);
+		pgprot_t chk_prot = static_protections(req_prot, addr, pfn, 1,
+						       CPA_DETECT);
 
 		if (pgprot_val(chk_prot) != pgprot_val(new_prot))
 			return 1;
@@ -1296,7 +1338,8 @@ static int __change_page_attr(struct cpa
 		pgprot_val(new_prot) &= ~pgprot_val(cpa->mask_clr);
 		pgprot_val(new_prot) |= pgprot_val(cpa->mask_set);
 
-		new_prot = static_protections(new_prot, address, pfn, 1);
+		new_prot = static_protections(new_prot, address, pfn, 1,
+					      CPA_PROTECT);
 
 		new_prot = pgprot_clear_protnone_bits(new_prot);
 



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [patch V3 06/11] x86/mm/cpa: Add large page preservation statistics
  2018-09-17 14:29 [patch V3 00/11] x86/mm/cpa: Improve large page preservation handling Thomas Gleixner
                   ` (4 preceding siblings ...)
  2018-09-17 14:29 ` [patch V3 05/11] x86/mm/cpa: Add debug mechanism Thomas Gleixner
@ 2018-09-17 14:29 ` Thomas Gleixner
  2018-09-21 19:59   ` Dave Hansen
  2018-09-27 18:48   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  2018-09-17 14:29 ` [patch V3 07/11] x86/mm/cpa: Avoid static protection checks on unmap Thomas Gleixner
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 41+ messages in thread
From: Thomas Gleixner @ 2018-09-17 14:29 UTC (permalink / raw)
  To: LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Dave Hansen, Mark Gross

[-- Attachment #1: x86-mm-cpa--Add-large-page-preservation-statistics.patch --]
[-- Type: text/plain, Size: 5338 bytes --]

The large page preservation mechanism is just magic and provides no
information at all. Add optional statistic output in debugfs so the magic can
be evaluated. Defaults is off.

Output:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  540
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               800770
 4K pages set-checked:             7668

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig       |    8 +++
 arch/x86/mm/pageattr.c |   99 ++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 105 insertions(+), 2 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1495,6 +1495,14 @@ config X86_DIRECT_GBPAGES
 	  supports them), so don't confuse the user by printing
 	  that we have them enabled.
 
+config X86_CPA_STATISTICS
+	bool "Enable statistic for Change Page Attribute"
+	depends on DEBUG_FS
+	---help---
+	  Expose statistics about the Change Page Attribute mechanims, which
+	  helps to determine the effectivness of preserving large and huge
+	  page mappings when mapping protections are changed.
+
 config ARCH_HAS_MEM_ENCRYPT
 	def_bool y
 
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -101,6 +101,95 @@ void arch_report_meminfo(struct seq_file
 static inline void split_page_count(int level) { }
 #endif
 
+#ifdef CONFIG_X86_CPA_STATISTICS
+
+static unsigned long cpa_1g_checked;
+static unsigned long cpa_1g_sameprot;
+static unsigned long cpa_1g_preserved;
+static unsigned long cpa_2m_checked;
+static unsigned long cpa_2m_sameprot;
+static unsigned long cpa_2m_preserved;
+static unsigned long cpa_4k_checked;
+static unsigned long cpa_4k_install;
+
+static inline void cpa_inc_1g_checked(void)
+{
+	cpa_1g_checked++;
+}
+
+static inline void cpa_inc_2m_checked(void)
+{
+	cpa_2m_checked++;
+}
+
+static inline void cpa_inc_4k_checked(void)
+{
+	cpa_4k_checked++;
+}
+
+static inline void cpa_inc_4k_install(void)
+{
+	cpa_4k_install++;
+}
+
+static inline void cpa_inc_lp_sameprot(int level)
+{
+	if (level == PG_LEVEL_1G)
+		cpa_1g_sameprot++;
+	else
+		cpa_2m_sameprot++;
+}
+
+static inline void cpa_inc_lp_preserved(int level)
+{
+	if (level == PG_LEVEL_1G)
+		cpa_1g_preserved++;
+	else
+		cpa_2m_preserved++;
+}
+
+static int cpastats_show(struct seq_file *m, void *p)
+{
+	seq_printf(m, "1G pages checked:     %16lu\n", cpa_1g_checked);
+	seq_printf(m, "1G pages sameprot:    %16lu\n", cpa_1g_sameprot);
+	seq_printf(m, "1G pages preserved:   %16lu\n", cpa_1g_preserved);
+	seq_printf(m, "2M pages checked:     %16lu\n", cpa_2m_checked);
+	seq_printf(m, "2M pages sameprot:    %16lu\n", cpa_2m_sameprot);
+	seq_printf(m, "2M pages preserved:   %16lu\n", cpa_2m_preserved);
+	seq_printf(m, "4K pages checked:     %16lu\n", cpa_4k_checked);
+	seq_printf(m, "4K pages set-checked: %16lu\n", cpa_4k_install);
+	return 0;
+}
+
+static int cpastats_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, cpastats_show, NULL);
+}
+
+static const struct file_operations cpastats_fops = {
+	.open		= cpastats_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static int __init cpa_stats_init(void)
+{
+	debugfs_create_file("cpa_stats", S_IRUSR, arch_debugfs_dir, NULL,
+			    &cpastats_fops);
+	return 0;
+}
+late_initcall(cpa_stats_init);
+#else
+static inline void cpa_inc_1g_checked(void) { }
+static inline void cpa_inc_2m_checked(void) { }
+static inline void cpa_inc_4k_checked(void) { }
+static inline void cpa_inc_4k_install(void) { }
+static inline void cpa_inc_lp_sameprot(int level) { }
+static inline void cpa_inc_lp_preserved(int level) { }
+#endif
+
+
 static inline int
 within(unsigned long addr, unsigned long start, unsigned long end)
 {
@@ -660,10 +749,12 @@ static int __should_split_large_page(pte
 	case PG_LEVEL_2M:
 		old_prot = pmd_pgprot(*(pmd_t *)kpte);
 		old_pfn = pmd_pfn(*(pmd_t *)kpte);
+		cpa_inc_2m_checked();
 		break;
 	case PG_LEVEL_1G:
 		old_prot = pud_pgprot(*(pud_t *)kpte);
 		old_pfn = pud_pfn(*(pud_t *)kpte);
+		cpa_inc_1g_checked();
 		break;
 	default:
 		return -EINVAL;
@@ -728,14 +819,16 @@ static int __should_split_large_page(pte
 	for (i = 0, addr = lpaddr; i < numpages; i++, addr += PAGE_SIZE, pfn++) {
 		pgprot_t chk_prot = static_protections(req_prot, addr, pfn, 1,
 						       CPA_DETECT);
-
+		cpa_inc_4k_checked();
 		if (pgprot_val(chk_prot) != pgprot_val(new_prot))
 			return 1;
 	}
 
 	/* If there are no changes, return. */
-	if (pgprot_val(new_prot) == pgprot_val(old_prot))
+	if (pgprot_val(new_prot) == pgprot_val(old_prot)) {
+		cpa_inc_lp_sameprot(level);
 		return 0;
+	}
 
 	/*
 	 * Verify that the address is aligned and the number of pages
@@ -748,6 +841,7 @@ static int __should_split_large_page(pte
 	new_pte = pfn_pte(old_pfn, new_prot);
 	__set_pmd_pte(kpte, address, new_pte);
 	cpa->flags |= CPA_FLUSHTLB;
+	cpa_inc_lp_preserved(level);
 	return 0;
 }
 
@@ -1337,6 +1431,7 @@ static int __change_page_attr(struct cpa
 		pgprot_val(new_prot) &= ~pgprot_val(cpa->mask_clr);
 		pgprot_val(new_prot) |= pgprot_val(cpa->mask_set);
 
+		cpa_inc_4k_install();
 		new_prot = static_protections(new_prot, address, pfn, 1,
 					      CPA_PROTECT);
 



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [patch V3 07/11] x86/mm/cpa: Avoid static protection checks on unmap
  2018-09-17 14:29 [patch V3 00/11] x86/mm/cpa: Improve large page preservation handling Thomas Gleixner
                   ` (5 preceding siblings ...)
  2018-09-17 14:29 ` [patch V3 06/11] x86/mm/cpa: Add large page preservation statistics Thomas Gleixner
@ 2018-09-17 14:29 ` Thomas Gleixner
  2018-09-21 20:01   ` Dave Hansen
  2018-09-27 18:49   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  2018-09-17 14:29 ` [patch V3 08/11] x86/mm/cpa: Add sanity check for existing mappings Thomas Gleixner
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 41+ messages in thread
From: Thomas Gleixner @ 2018-09-17 14:29 UTC (permalink / raw)
  To: LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Dave Hansen, Mark Gross

[-- Attachment #1: x86-mm-cpa--Avoid-static-protection-checks-on-unmap.patch --]
[-- Type: text/plain, Size: 1326 bytes --]

If the new pgprot has the PRESENT bit cleared, then conflicts vs. RW/NX are
completely irrelevant.

Before:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  540
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               800770
 4K pages set-checked:             7668

After:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  540
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               800709
 4K pages set-checked:             7668

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/pageattr.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -522,6 +522,13 @@ static inline pgprot_t static_protection
 	pgprotval_t forbidden, res;
 	unsigned long end;
 
+	/*
+	 * There is no point in checking RW/NX conflicts when the requested
+	 * mapping is setting the page !PRESENT.
+	 */
+	if (!(pgprot_val(prot) & _PAGE_PRESENT))
+		return prot;
+
 	/* Operate on the virtual address */
 	end = start + npg * PAGE_SIZE - 1;
 



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [patch V3 08/11] x86/mm/cpa: Add sanity check for existing mappings
  2018-09-17 14:29 [patch V3 00/11] x86/mm/cpa: Improve large page preservation handling Thomas Gleixner
                   ` (6 preceding siblings ...)
  2018-09-17 14:29 ` [patch V3 07/11] x86/mm/cpa: Avoid static protection checks on unmap Thomas Gleixner
@ 2018-09-17 14:29 ` Thomas Gleixner
  2018-09-18  7:14   ` Peter Zijlstra
                     ` (2 more replies)
  2018-09-17 14:29 ` [patch V3 09/11] x86/mm/cpa: Optimize same protection check Thomas Gleixner
                   ` (2 subsequent siblings)
  10 siblings, 3 replies; 41+ messages in thread
From: Thomas Gleixner @ 2018-09-17 14:29 UTC (permalink / raw)
  To: LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Dave Hansen, Mark Gross

[-- Attachment #1: x86-mm-cpa--Add-sanity-check-for-existing-mappings.patch --]
[-- Type: text/plain, Size: 5446 bytes --]

With the range check it is possible to do a quick verification that the
current mapping is correct vs. the static protection areas.

In case a incorrect mapping is detected a warning is emitted and the large
page is split up. If the large page is a 2M page, then the split code is
forced to check the static protections for the PTE entries to fix up the
incorrectness. For 1G pages this can't be done easily because that would
require to either find the offending 2M areas before the split or
afterwards. For now just warn about that case and revisit it when reported.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/pageattr.c |   77 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 67 insertions(+), 10 deletions(-)

--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -37,12 +37,14 @@ struct cpa_data {
 	unsigned long	numpages;
 	int		flags;
 	unsigned long	pfn;
-	unsigned	force_split : 1;
+	unsigned	force_split		: 1,
+			force_static_prot	: 1;
 	int		curpage;
 	struct page	**pages;
 };
 
 enum cpa_warn {
+	CPA_CONFLICT,
 	CPA_PROTECT,
 	CPA_DETECT,
 };
@@ -498,6 +500,7 @@ static inline void check_conflict(int wa
 				  unsigned long pfn, const char *txt)
 {
 	static const char *lvltxt[] = {
+		[CPA_CONFLICT]	= "conflict",
 		[CPA_PROTECT]	= "protect",
 		[CPA_DETECT]	= "detect",
 	};
@@ -739,7 +742,7 @@ static int __should_split_large_page(pte
 				     struct cpa_data *cpa)
 {
 	unsigned long numpages, pmask, psize, lpaddr, addr, pfn, old_pfn;
-	pgprot_t old_prot, new_prot, req_prot;
+	pgprot_t old_prot, new_prot, req_prot, chk_prot;
 	pte_t new_pte, old_pte, *tmp;
 	enum pg_level level;
 	int i;
@@ -816,6 +819,23 @@ static int __should_split_large_page(pte
 	numpages = psize >> PAGE_SHIFT;
 
 	/*
+	 * Sanity check that the existing mapping is correct versus the static
+	 * protections. static_protections() guards against !PRESENT, so no
+	 * extra conditional required here.
+	 */
+	chk_prot = static_protections(old_prot, lpaddr, old_pfn, numpages,
+				      CPA_CONFLICT);
+
+	if (WARN_ON_ONCE(pgprot_val(chk_prot) != pgprot_val(old_prot))) {
+		/*
+		 * Split the large page and tell the split code to
+		 * enforce static protections.
+		 */
+		cpa->force_static_prot = 1;
+		return 1;
+	}
+
+	/*
 	 * Make sure that the requested pgprot does not violate the static
 	 * protections. Check the full large page whether one of the pages
 	 * in it results in a different pgprot than the first one of the
@@ -824,8 +844,8 @@ static int __should_split_large_page(pte
 	new_prot = static_protections(req_prot, address, pfn, 1, CPA_DETECT);
 	pfn = old_pfn;
 	for (i = 0, addr = lpaddr; i < numpages; i++, addr += PAGE_SIZE, pfn++) {
-		pgprot_t chk_prot = static_protections(req_prot, addr, pfn, 1,
-						       CPA_DETECT);
+		chk_prot = static_protections(req_prot, addr, pfn, 1,
+					      CPA_DETECT);
 		cpa_inc_4k_checked();
 		if (pgprot_val(chk_prot) != pgprot_val(new_prot))
 			return 1;
@@ -867,15 +887,50 @@ static int should_split_large_page(pte_t
 	return do_split;
 }
 
+static void split_set_pte(struct cpa_data *cpa, pte_t *pte, unsigned long pfn,
+			  pgprot_t ref_prot, unsigned long address,
+			  unsigned long size)
+{
+	unsigned int npg = PFN_DOWN(size);
+	pgprot_t prot;
+
+	/*
+	 * If try_preserve_large_page() discovered an inconsistent mapping,
+	 * remove the invalid protection in the split mapping.
+	 */
+	if (!cpa->force_static_prot)
+		goto set;
+
+	prot = static_protections(ref_prot, address, pfn, npg, CPA_PROTECT);
+
+	if (pgprot_val(prot) == pgprot_val(ref_prot))
+		goto set;
+
+	/*
+	 * If this is splitting a PMD, fix it up. PUD splits cannot be
+	 * fixed trivially as that would require to rescan the newly
+	 * installed PMD mappings after returning from split_large_page()
+	 * so an eventual further split can allocate the necessary PTE
+	 * pages. Warn for now and revisit it in case this actually
+	 * happens.
+	 */
+	if (size == PAGE_SIZE)
+		ref_prot = prot;
+	else
+		pr_warn_once("CPA: Cannot fixup static protections for PUD split\n");
+set:
+	set_pte(pte, pfn_pte(pfn, ref_prot));
+}
+
 static int
 __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
 		   struct page *base)
 {
+	unsigned long lpaddr, lpinc, ref_pfn, pfn, pfninc = 1;
 	pte_t *pbase = (pte_t *)page_address(base);
-	unsigned long ref_pfn, pfn, pfninc = 1;
 	unsigned int i, level;
-	pte_t *tmp;
 	pgprot_t ref_prot;
+	pte_t *tmp;
 
 	spin_lock(&pgd_lock);
 	/*
@@ -898,15 +953,17 @@ static int
 		 * PAT bit to correct position.
 		 */
 		ref_prot = pgprot_large_2_4k(ref_prot);
-
 		ref_pfn = pmd_pfn(*(pmd_t *)kpte);
+		lpaddr = address & PMD_MASK;
+		lpinc = PAGE_SIZE;
 		break;
 
 	case PG_LEVEL_1G:
 		ref_prot = pud_pgprot(*(pud_t *)kpte);
 		ref_pfn = pud_pfn(*(pud_t *)kpte);
 		pfninc = PMD_PAGE_SIZE >> PAGE_SHIFT;
-
+		lpaddr = address & PUD_MASK;
+		lpinc = PMD_SIZE;
 		/*
 		 * Clear the PSE flags if the PRESENT flag is not set
 		 * otherwise pmd_present/pmd_huge will return true
@@ -927,8 +984,8 @@ static int
 	 * Get the target pfn from the original entry:
 	 */
 	pfn = ref_pfn;
-	for (i = 0; i < PTRS_PER_PTE; i++, pfn += pfninc)
-		set_pte(&pbase[i], pfn_pte(pfn, ref_prot));
+	for (i = 0; i < PTRS_PER_PTE; i++, pfn += pfninc, lpaddr += lpinc)
+		split_set_pte(cpa, pbase + i, pfn, ref_prot, lpaddr, lpinc);
 
 	if (virt_addr_valid(address)) {
 		unsigned long pfn = PFN_DOWN(__pa(address));



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [patch V3 09/11] x86/mm/cpa: Optimize same protection check
  2018-09-17 14:29 [patch V3 00/11] x86/mm/cpa: Improve large page preservation handling Thomas Gleixner
                   ` (7 preceding siblings ...)
  2018-09-17 14:29 ` [patch V3 08/11] x86/mm/cpa: Add sanity check for existing mappings Thomas Gleixner
@ 2018-09-17 14:29 ` Thomas Gleixner
  2018-09-21 20:12   ` Dave Hansen
  2018-09-27 18:50   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  2018-09-17 14:29 ` [patch V3 10/11] x86/mm/cpa: Do the range check early Thomas Gleixner
  2018-09-17 14:29 ` [patch V3 11/11] x86/mm/cpa: Avoid the 4k pages check completely Thomas Gleixner
  10 siblings, 2 replies; 41+ messages in thread
From: Thomas Gleixner @ 2018-09-17 14:29 UTC (permalink / raw)
  To: LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Dave Hansen, Mark Gross

[-- Attachment #1: x86-mm-cpa--Optimize-large-page-preservation-check.patch --]
[-- Type: text/plain, Size: 1959 bytes --]

When the existing mapping is correct and the new requested page protections
are the same as the existing ones, then further checks can be omitted and the
large page can be preserved. The slow path 4k wise check will not come up with
a different result.

Before:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  540
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               800709
 4K pages set-checked:             7668

After:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  538
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               560642
 4K pages set-checked:             7668

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/pageattr.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -836,6 +836,20 @@ static int __should_split_large_page(pte
 	}
 
 	/*
+	 * Optimization: If the requested pgprot is the same as the current
+	 * pgprot, then the large page can be preserved and no updates are
+	 * required independent of alignment and length of the requested
+	 * range. The above already established that the current pgprot is
+	 * correct, which in consequence makes the requested pgprot correct
+	 * as well if it is the same. The static protection scan below will
+	 * not come to a different conclusion.
+	 */
+	if (pgprot_val(req_prot) == pgprot_val(old_prot)) {
+		cpa_inc_lp_sameprot(level);
+		return 0;
+	}
+
+	/*
 	 * Make sure that the requested pgprot does not violate the static
 	 * protections. Check the full large page whether one of the pages
 	 * in it results in a different pgprot than the first one of the



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [patch V3 10/11] x86/mm/cpa: Do the range check early
  2018-09-17 14:29 [patch V3 00/11] x86/mm/cpa: Improve large page preservation handling Thomas Gleixner
                   ` (8 preceding siblings ...)
  2018-09-17 14:29 ` [patch V3 09/11] x86/mm/cpa: Optimize same protection check Thomas Gleixner
@ 2018-09-17 14:29 ` Thomas Gleixner
  2018-09-21 20:26   ` Dave Hansen
  2018-09-27 18:50   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  2018-09-17 14:29 ` [patch V3 11/11] x86/mm/cpa: Avoid the 4k pages check completely Thomas Gleixner
  10 siblings, 2 replies; 41+ messages in thread
From: Thomas Gleixner @ 2018-09-17 14:29 UTC (permalink / raw)
  To: LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Dave Hansen, Mark Gross

[-- Attachment #1: x86-mm-cpa--Do-the-range-check-early.patch --]
[-- Type: text/plain, Size: 2931 bytes --]

To avoid excessive 4k wise checks in the common case do a quick check first
whether the requested new page protections conflict with a static
protection area in the large page. If there is no conflict then the
decision whether to preserve or to split the page can be made immediately.

If the requested range covers the full large page, preserve it. Otherwise
split it up. No point in doing a slow crawl in 4k steps.

Before:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  538
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               560642
 4K pages set-checked:             7668

After:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  541
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:                  514
 4K pages set-checked:             7668

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/pageattr.c |   27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -850,10 +850,28 @@ static int __should_split_large_page(pte
 	}
 
 	/*
-	 * Make sure that the requested pgprot does not violate the static
-	 * protections. Check the full large page whether one of the pages
-	 * in it results in a different pgprot than the first one of the
-	 * requested range. If yes, then the page needs to be split.
+	 * Optimization: Check whether the requested pgprot is conflicting
+	 * with a static protection requirement in the large page. If not,
+	 * then checking whether the requested range is fully covering the
+	 * large page can be done right here.
+	 */
+	new_prot = static_protections(req_prot, lpaddr, old_pfn, numpages,
+				      CPA_DETECT);
+
+	if (pgprot_val(req_prot) == pgprot_val(new_prot)) {
+		if (address != lpaddr || cpa->numpages != numpages)
+			return 1;
+		goto setlp;
+	}
+
+	/*
+	 * Slow path. The full large page check above established that the
+	 * requested pgprot cannot be applied to the full large page due to
+	 * conflicting requirements of static protection regions. It might
+	 * turn out that the whole requested range is covered by the
+	 * modified protection of the first 4k segment at @address. This
+	 * might result in the ability to preserve the large page
+	 * nevertheless.
 	 */
 	new_prot = static_protections(req_prot, address, pfn, 1, CPA_DETECT);
 	pfn = old_pfn;
@@ -878,6 +896,7 @@ static int __should_split_large_page(pte
 	if (address != lpaddr || cpa->numpages != numpages)
 		return 1;
 
+setlp:
 	/* All checks passed. Update the large page mapping. */
 	new_pte = pfn_pte(old_pfn, new_prot);
 	__set_pmd_pte(kpte, address, new_pte);



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [patch V3 11/11] x86/mm/cpa: Avoid the 4k pages check completely
  2018-09-17 14:29 [patch V3 00/11] x86/mm/cpa: Improve large page preservation handling Thomas Gleixner
                   ` (9 preceding siblings ...)
  2018-09-17 14:29 ` [patch V3 10/11] x86/mm/cpa: Do the range check early Thomas Gleixner
@ 2018-09-17 14:29 ` Thomas Gleixner
  2018-09-21 20:32   ` Dave Hansen
  2018-09-27 18:51   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  10 siblings, 2 replies; 41+ messages in thread
From: Thomas Gleixner @ 2018-09-17 14:29 UTC (permalink / raw)
  To: LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Dave Hansen, Mark Gross

[-- Attachment #1: x86-mm-cpa--Avoid-the-4k-pages-check-completely.patch --]
[-- Type: text/plain, Size: 5632 bytes --]

The extra loop which tries hard to preserve large pages in case of conflicts
with static protection regions turns out to be not preserving anything, at
least not in the experiments which have been conducted.

There might be corner cases in which the code would be able to preserve a
large page oaccsionally, but it's really not worth the extra code and the
cycles wasted in the common case.

Before:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  541
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:                  514
 4K pages set-checked:             7668

After:
 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  538
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages set-checked:             7668

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/pageattr.c |   66 ++++++++++++-------------------------------------
 1 file changed, 17 insertions(+), 49 deletions(-)

--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -111,7 +111,6 @@ static unsigned long cpa_1g_preserved;
 static unsigned long cpa_2m_checked;
 static unsigned long cpa_2m_sameprot;
 static unsigned long cpa_2m_preserved;
-static unsigned long cpa_4k_checked;
 static unsigned long cpa_4k_install;
 
 static inline void cpa_inc_1g_checked(void)
@@ -124,11 +123,6 @@ static inline void cpa_inc_2m_checked(vo
 	cpa_2m_checked++;
 }
 
-static inline void cpa_inc_4k_checked(void)
-{
-	cpa_4k_checked++;
-}
-
 static inline void cpa_inc_4k_install(void)
 {
 	cpa_4k_install++;
@@ -158,7 +152,6 @@ static int cpastats_show(struct seq_file
 	seq_printf(m, "2M pages checked:     %16lu\n", cpa_2m_checked);
 	seq_printf(m, "2M pages sameprot:    %16lu\n", cpa_2m_sameprot);
 	seq_printf(m, "2M pages preserved:   %16lu\n", cpa_2m_preserved);
-	seq_printf(m, "4K pages checked:     %16lu\n", cpa_4k_checked);
 	seq_printf(m, "4K pages set-checked: %16lu\n", cpa_4k_install);
 	return 0;
 }
@@ -185,7 +178,6 @@ late_initcall(cpa_stats_init);
 #else
 static inline void cpa_inc_1g_checked(void) { }
 static inline void cpa_inc_2m_checked(void) { }
-static inline void cpa_inc_4k_checked(void) { }
 static inline void cpa_inc_4k_install(void) { }
 static inline void cpa_inc_lp_sameprot(int level) { }
 static inline void cpa_inc_lp_preserved(int level) { }
@@ -741,11 +733,10 @@ static pgprot_t pgprot_clear_protnone_bi
 static int __should_split_large_page(pte_t *kpte, unsigned long address,
 				     struct cpa_data *cpa)
 {
-	unsigned long numpages, pmask, psize, lpaddr, addr, pfn, old_pfn;
+	unsigned long numpages, pmask, psize, lpaddr, pfn, old_pfn;
 	pgprot_t old_prot, new_prot, req_prot, chk_prot;
 	pte_t new_pte, old_pte, *tmp;
 	enum pg_level level;
-	int i;
 
 	/*
 	 * Check for races, another CPU might have split this page
@@ -850,53 +841,30 @@ static int __should_split_large_page(pte
 	}
 
 	/*
-	 * Optimization: Check whether the requested pgprot is conflicting
-	 * with a static protection requirement in the large page. If not,
-	 * then checking whether the requested range is fully covering the
-	 * large page can be done right here.
+	 * If the requested range does not cover the full page, split it up
 	 */
-	new_prot = static_protections(req_prot, lpaddr, old_pfn, numpages,
-				      CPA_DETECT);
-
-	if (pgprot_val(req_prot) == pgprot_val(new_prot)) {
-		if (address != lpaddr || cpa->numpages != numpages)
-			return 1;
-		goto setlp;
-	}
+	if (address != lpaddr || cpa->numpages != numpages)
+		return 1;
 
 	/*
-	 * Slow path. The full large page check above established that the
-	 * requested pgprot cannot be applied to the full large page due to
-	 * conflicting requirements of static protection regions. It might
-	 * turn out that the whole requested range is covered by the
-	 * modified protection of the first 4k segment at @address. This
-	 * might result in the ability to preserve the large page
-	 * nevertheless.
-	 */
-	new_prot = static_protections(req_prot, address, pfn, 1, CPA_DETECT);
-	pfn = old_pfn;
-	for (i = 0, addr = lpaddr; i < numpages; i++, addr += PAGE_SIZE, pfn++) {
-		chk_prot = static_protections(req_prot, addr, pfn, 1,
-					      CPA_DETECT);
-		cpa_inc_4k_checked();
-		if (pgprot_val(chk_prot) != pgprot_val(new_prot))
-			return 1;
-	}
-
-	/* If there are no changes, return. */
-	if (pgprot_val(new_prot) == pgprot_val(old_prot)) {
-		cpa_inc_lp_sameprot(level);
-		return 0;
-	}
+	 * Check whether the requested pgprot is conflicting with a static
+	 * protection requirement in the large page.
+	 */
+	new_prot = static_protections(req_prot, lpaddr, old_pfn, numpages,
+				      CPA_DETECT);
 
 	/*
-	 * Verify that the address is aligned and the number of pages
-	 * covers the full page.
+	 * If there is a conflict, split the large page.
+	 *
+	 * There used to be a 4k wise evaluation trying really hard to
+	 * preserve the large pages, but experimentation has shown, that this
+	 * does not help at all. There might be corner cases which would
+	 * preserve one large page occasionally, but it's really not worth the
+	 * extra code and cycles for the common case.
 	 */
-	if (address != lpaddr || cpa->numpages != numpages)
+	if (pgprot_val(req_prot) != pgprot_val(new_prot))
 		return 1;
 
-setlp:
 	/* All checks passed. Update the large page mapping. */
 	new_pte = pfn_pte(old_pfn, new_prot);
 	__set_pmd_pte(kpte, address, new_pte);



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 02/11] x86/mm/cpa: Split, rename and clean up try_preserve_large_page()
  2018-09-17 14:29 ` [patch V3 02/11] x86/mm/cpa: Split, rename and clean up try_preserve_large_page() Thomas Gleixner
@ 2018-09-18  7:03   ` Peter Zijlstra
  2018-09-18  8:19   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 41+ messages in thread
From: Peter Zijlstra @ 2018-09-18  7:03 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Bin Yang, Dave Hansen, Mark Gross

On Mon, Sep 17, 2018 at 04:29:08PM +0200, Thomas Gleixner wrote:
> Avoid the extra variable and gotos by splitting the function into the
> actual algorithm and a callable function which contains the lock
> protection.
> 
> Rename it to should_split_large_page() while at it so the return values make
> actually sense.

Much thanks, that makes the whole thing much saner.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 08/11] x86/mm/cpa: Add sanity check for existing mappings
  2018-09-17 14:29 ` [patch V3 08/11] x86/mm/cpa: Add sanity check for existing mappings Thomas Gleixner
@ 2018-09-18  7:14   ` Peter Zijlstra
  2018-09-21 20:07   ` Dave Hansen
  2018-09-27 18:49   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  2 siblings, 0 replies; 41+ messages in thread
From: Peter Zijlstra @ 2018-09-18  7:14 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Bin Yang, Dave Hansen, Mark Gross

On Mon, Sep 17, 2018 at 04:29:14PM +0200, Thomas Gleixner wrote:
> +static void split_set_pte(struct cpa_data *cpa, pte_t *pte, unsigned long pfn,
> +			  pgprot_t ref_prot, unsigned long address,
> +			  unsigned long size)
> +{
> +	unsigned int npg = PFN_DOWN(size);
> +	pgprot_t prot;
> +
> +	/*
> +	 * If try_preserve_large_page() discovered an inconsistent mapping,

You just renamed that thing.. :-)

> +	 * remove the invalid protection in the split mapping.
> +	 */
> +	if (!cpa->force_static_prot)
> +		goto set;
> +
> +	prot = static_protections(ref_prot, address, pfn, npg, CPA_PROTECT);
> +
> +	if (pgprot_val(prot) == pgprot_val(ref_prot))
> +		goto set;
> +
> +	/*
> +	 * If this is splitting a PMD, fix it up. PUD splits cannot be
> +	 * fixed trivially as that would require to rescan the newly
> +	 * installed PMD mappings after returning from split_large_page()
> +	 * so an eventual further split can allocate the necessary PTE
> +	 * pages. Warn for now and revisit it in case this actually
> +	 * happens.
> +	 */
> +	if (size == PAGE_SIZE)
> +		ref_prot = prot;
> +	else
> +		pr_warn_once("CPA: Cannot fixup static protections for PUD split\n");
> +set:
> +	set_pte(pte, pfn_pte(pfn, ref_prot));
> +}

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 02/11] x86/mm/cpa: Split, rename and clean up try_preserve_large_page()
  2018-09-17 14:29 ` [patch V3 02/11] x86/mm/cpa: Split, rename and clean up try_preserve_large_page() Thomas Gleixner
  2018-09-18  7:03   ` Peter Zijlstra
@ 2018-09-18  8:19   ` Peter Zijlstra
  2018-09-18 12:14     ` Peter Zijlstra
  2018-09-21 16:22   ` Dave Hansen
  2018-09-27 18:46   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  3 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2018-09-18  8:19 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Bin Yang, Dave Hansen, Mark Gross

On Mon, Sep 17, 2018 at 04:29:08PM +0200, Thomas Gleixner wrote:
> @@ -1288,23 +1287,23 @@ static int __change_page_attr(struct cpa
>  	err = split_large_page(cpa, kpte, address);
>  	if (!err) {
>  		/*
> +		 * Do a global flush tlb after splitting the large page
> +		 * and before we do the actual change page attribute in the PTE.
> +		 *
> +		 * With out this, we violate the TLB application note, that says
> +		 * "The TLBs may contain both ordinary and large-page
>  		 *  translations for a 4-KByte range of linear addresses. This
>  		 *  may occur if software modifies the paging structures so that
>  		 *  the page size used for the address range changes. If the two
>  		 *  translations differ with respect to page frame or attributes
>  		 *  (e.g., permissions), processor behavior is undefined and may
>  		 *  be implementation-specific."
> +		 *
> +		 * We do this global tlb flush inside the cpa_lock, so that we
>  		 * don't allow any other cpu, with stale tlb entries change the
>  		 * page attribute in parallel, that also falls into the
>  		 * just split large page entry.
> +		 */
>  		flush_tlb_all();
>  		goto repeat;
>  	}

this made me look at the tlb invalidation of that thing again; do we
want something like the below?

---

--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -285,16 +285,6 @@ static void cpa_flush_all(unsigned long
 	on_each_cpu(__cpa_flush_all, (void *) cache, 1);
 }
 
-static void __cpa_flush_range(void *arg)
-{
-	/*
-	 * We could optimize that further and do individual per page
-	 * tlb invalidates for a low number of pages. Caveat: we must
-	 * flush the high aliases on 64bit as well.
-	 */
-	__flush_tlb_all();
-}
-
 static void cpa_flush_range(unsigned long start, int numpages, int cache)
 {
 	unsigned int i, level;
@@ -303,7 +293,7 @@ static void cpa_flush_range(unsigned lon
 	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
 	WARN_ON(PAGE_ALIGN(start) != start);
 
-	on_each_cpu(__cpa_flush_range, NULL, 1);
+	flush_tlb_all();
 
 	if (!cache)
 		return;
@@ -1006,14 +996,24 @@ __split_large_page(struct cpa_data *cpa,
 	__set_pmd_pte(kpte, address, mk_pte(base, __pgprot(_KERNPG_TABLE)));
 
 	/*
-	 * Intel Atom errata AAH41 workaround.
+	 * Do a global flush tlb after splitting the large page
+	 * and before we do the actual change page attribute in the PTE.
 	 *
-	 * The real fix should be in hw or in a microcode update, but
-	 * we also probabilistically try to reduce the window of having
-	 * a large TLB mixed with 4K TLBs while instruction fetches are
-	 * going on.
+	 * Without this, we violate the TLB application note, that says
+	 * "The TLBs may contain both ordinary and large-page
+	 *  translations for a 4-KByte range of linear addresses. This
+	 *  may occur if software modifies the paging structures so that
+	 *  the page size used for the address range changes. If the two
+	 *  translations differ with respect to page frame or attributes
+	 *  (e.g., permissions), processor behavior is undefined and may
+	 *  be implementation-specific."
+	 *
+	 * We do this global tlb flush inside the cpa_lock, so that we
+	 * don't allow any other cpu, with stale tlb entries change the
+	 * page attribute in parallel, that also falls into the
+	 * just split large page entry.
 	 */
-	__flush_tlb_all();
+	flush_tlb_all();
 	spin_unlock(&pgd_lock);
 
 	return 0;
@@ -1538,28 +1538,8 @@ static int __change_page_attr(struct cpa
 	 * We have to split the large page:
 	 */
 	err = split_large_page(cpa, kpte, address);
-	if (!err) {
-		/*
-		 * Do a global flush tlb after splitting the large page
-		 * and before we do the actual change page attribute in the PTE.
-		 *
-		 * With out this, we violate the TLB application note, that says
-		 * "The TLBs may contain both ordinary and large-page
-		 *  translations for a 4-KByte range of linear addresses. This
-		 *  may occur if software modifies the paging structures so that
-		 *  the page size used for the address range changes. If the two
-		 *  translations differ with respect to page frame or attributes
-		 *  (e.g., permissions), processor behavior is undefined and may
-		 *  be implementation-specific."
-		 *
-		 * We do this global tlb flush inside the cpa_lock, so that we
-		 * don't allow any other cpu, with stale tlb entries change the
-		 * page attribute in parallel, that also falls into the
-		 * just split large page entry.
-		 */
-		flush_tlb_all();
+	if (!err)
 		goto repeat;
-	}
 
 	return err;
 }

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 02/11] x86/mm/cpa: Split, rename and clean up try_preserve_large_page()
  2018-09-18  8:19   ` Peter Zijlstra
@ 2018-09-18 12:14     ` Peter Zijlstra
  2018-09-18 22:34       ` Thomas Gleixner
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2018-09-18 12:14 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Bin Yang, Dave Hansen, Mark Gross

On Tue, Sep 18, 2018 at 10:19:09AM +0200, Peter Zijlstra wrote:
> On Mon, Sep 17, 2018 at 04:29:08PM +0200, Thomas Gleixner wrote:
> > @@ -1288,23 +1287,23 @@ static int __change_page_attr(struct cpa
> >  	err = split_large_page(cpa, kpte, address);
> >  	if (!err) {
> >  		/*
> > +		 * Do a global flush tlb after splitting the large page
> > +		 * and before we do the actual change page attribute in the PTE.
> > +		 *
> > +		 * With out this, we violate the TLB application note, that says
> > +		 * "The TLBs may contain both ordinary and large-page
> >  		 *  translations for a 4-KByte range of linear addresses. This
> >  		 *  may occur if software modifies the paging structures so that
> >  		 *  the page size used for the address range changes. If the two
> >  		 *  translations differ with respect to page frame or attributes
> >  		 *  (e.g., permissions), processor behavior is undefined and may
> >  		 *  be implementation-specific."
> > +		 *
> > +		 * We do this global tlb flush inside the cpa_lock, so that we
> >  		 * don't allow any other cpu, with stale tlb entries change the
> >  		 * page attribute in parallel, that also falls into the
> >  		 * just split large page entry.
> > +		 */
> >  		flush_tlb_all();
> >  		goto repeat;
> >  	}
> 
> this made me look at the tlb invalidation of that thing again; do we
> want something like the below?
> 

Further cleanups are possible...

---
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -285,25 +285,20 @@ static void cpa_flush_all(unsigned long
 	on_each_cpu(__cpa_flush_all, (void *) cache, 1);
 }
 
-static void __cpa_flush_range(void *arg)
-{
-	/*
-	 * We could optimize that further and do individual per page
-	 * tlb invalidates for a low number of pages. Caveat: we must
-	 * flush the high aliases on 64bit as well.
-	 */
-	__flush_tlb_all();
-}
-
 static void cpa_flush_range(unsigned long start, int numpages, int cache)
 {
-	unsigned int i, level;
-	unsigned long addr;
+	unsigned long addr, end = start + PAGE_SIZE * numpages;
+	unsigned int level;
 
 	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
 	WARN_ON(PAGE_ALIGN(start) != start);
 
-	on_each_cpu(__cpa_flush_range, NULL, 1);
+	if (!static_cpu_has(X86_FEATURE_CLFLUSH)) {
+		cpa_flush_all(cache);
+		return;
+	}
+
+	flush_tlb_kernel_range(start, end);
 
 	if (!cache)
 		return;
@@ -314,7 +309,7 @@ static void cpa_flush_range(unsigned lon
 	 * will cause all other CPUs to flush the same
 	 * cachelines:
 	 */
-	for (i = 0, addr = start; i < numpages; i++, addr += PAGE_SIZE) {
+	for (addr = start; addr < end; addr += PAGE_SIZE) {
 		pte_t *pte = lookup_address(addr, &level);
 
 		/*
@@ -325,30 +320,22 @@ static void cpa_flush_range(unsigned lon
 	}
 }
 
-static void cpa_flush_array(unsigned long *start, int numpages, int cache,
+static void cpa_flush_array(unsigned long baddr, unsigned long *start, 
+			    int numpages, int cache,
 			    int in_flags, struct page **pages)
 {
 	unsigned int i, level;
-#ifdef CONFIG_PREEMPT
-	/*
-	 * Avoid wbinvd() because it causes latencies on all CPUs,
-	 * regardless of any CPU isolation that may be in effect.
-	 *
-	 * This should be extended for CAT enabled systems independent of
-	 * PREEMPT because wbinvd() does not respect the CAT partitions and
-	 * this is exposed to unpriviledged users through the graphics
-	 * subsystem.
-	 */
-	unsigned long do_wbinvd = 0;
-#else
-	unsigned long do_wbinvd = cache && numpages >= 1024; /* 4M threshold */
-#endif
 
 	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
 
-	on_each_cpu(__cpa_flush_all, (void *) do_wbinvd, 1);
+	if (!static_cpu_has(X86_FEATURE_CLFLUSH)) {
+		cpa_flush_all(cache);
+		return;
+	}
+
+	flush_tlb_kernel_range(baddr, baddr + PAGE_SIZE * numpages);
 
-	if (!cache || do_wbinvd)
+	if (!cache)
 		return;
 
 	/*
@@ -1006,14 +993,24 @@ __split_large_page(struct cpa_data *cpa,
 	__set_pmd_pte(kpte, address, mk_pte(base, __pgprot(_KERNPG_TABLE)));
 
 	/*
-	 * Intel Atom errata AAH41 workaround.
+	 * Do a global flush tlb after splitting the large page
+	 * and before we do the actual change page attribute in the PTE.
 	 *
-	 * The real fix should be in hw or in a microcode update, but
-	 * we also probabilistically try to reduce the window of having
-	 * a large TLB mixed with 4K TLBs while instruction fetches are
-	 * going on.
+	 * Without this, we violate the TLB application note, that says
+	 * "The TLBs may contain both ordinary and large-page
+	 *  translations for a 4-KByte range of linear addresses. This
+	 *  may occur if software modifies the paging structures so that
+	 *  the page size used for the address range changes. If the two
+	 *  translations differ with respect to page frame or attributes
+	 *  (e.g., permissions), processor behavior is undefined and may
+	 *  be implementation-specific."
+	 *
+	 * We do this global tlb flush inside the cpa_lock, so that we
+	 * don't allow any other cpu, with stale tlb entries change the
+	 * page attribute in parallel, that also falls into the
+	 * just split large page entry.
 	 */
-	__flush_tlb_all();
+	flush_tlb_all();
 	spin_unlock(&pgd_lock);
 
 	return 0;
@@ -1538,28 +1535,8 @@ static int __change_page_attr(struct cpa
 	 * We have to split the large page:
 	 */
 	err = split_large_page(cpa, kpte, address);
-	if (!err) {
-		/*
-		 * Do a global flush tlb after splitting the large page
-		 * and before we do the actual change page attribute in the PTE.
-		 *
-		 * With out this, we violate the TLB application note, that says
-		 * "The TLBs may contain both ordinary and large-page
-		 *  translations for a 4-KByte range of linear addresses. This
-		 *  may occur if software modifies the paging structures so that
-		 *  the page size used for the address range changes. If the two
-		 *  translations differ with respect to page frame or attributes
-		 *  (e.g., permissions), processor behavior is undefined and may
-		 *  be implementation-specific."
-		 *
-		 * We do this global tlb flush inside the cpa_lock, so that we
-		 * don't allow any other cpu, with stale tlb entries change the
-		 * page attribute in parallel, that also falls into the
-		 * just split large page entry.
-		 */
-		flush_tlb_all();
+	if (!err)
 		goto repeat;
-	}
 
 	return err;
 }
@@ -1786,9 +1763,9 @@ static int change_page_attr_set_clr(unsi
 	 * error case we fall back to cpa_flush_all (which uses
 	 * WBINVD):
 	 */
-	if (!ret && boot_cpu_has(X86_FEATURE_CLFLUSH)) {
+	if (!ret) {
 		if (cpa.flags & (CPA_PAGES_ARRAY | CPA_ARRAY)) {
-			cpa_flush_array(addr, numpages, cache,
+			cpa_flush_array(baddr, addr, numpages, cache,
 					cpa.flags, pages);
 		} else
 			cpa_flush_range(baddr, numpages, cache);
@@ -2108,10 +2085,7 @@ static int __set_memory_enc_dec(unsigned
 	/*
 	 * Before changing the encryption attribute, we need to flush caches.
 	 */
-	if (static_cpu_has(X86_FEATURE_CLFLUSH))
-		cpa_flush_range(start, numpages, 1);
-	else
-		cpa_flush_all(1);
+	cpa_flush_range(start, numpages, 1);
 
 	ret = __change_page_attr_set_clr(&cpa, 1);
 
@@ -2122,10 +2096,7 @@ static int __set_memory_enc_dec(unsigned
 	 * in case TLB flushing gets optimized in the cpa_flush_range()
 	 * path use the same logic as above.
 	 */
-	if (static_cpu_has(X86_FEATURE_CLFLUSH))
-		cpa_flush_range(start, numpages, 0);
-	else
-		cpa_flush_all(0);
+	cpa_flush_range(start, numpages, 0);
 
 	return ret;
 }

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 02/11] x86/mm/cpa: Split, rename and clean up try_preserve_large_page()
  2018-09-18 12:14     ` Peter Zijlstra
@ 2018-09-18 22:34       ` Thomas Gleixner
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Gleixner @ 2018-09-18 22:34 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, x86, Bin Yang, Dave Hansen, Mark Gross

On Tue, 18 Sep 2018, Peter Zijlstra wrote:
> On Tue, Sep 18, 2018 at 10:19:09AM +0200, Peter Zijlstra wrote:
> > On Mon, Sep 17, 2018 at 04:29:08PM +0200, Thomas Gleixner wrote:
> > > @@ -1288,23 +1287,23 @@ static int __change_page_attr(struct cpa
> > >  	err = split_large_page(cpa, kpte, address);
> > >  	if (!err) {
> > >  		/*
> > > +		 * Do a global flush tlb after splitting the large page
> > > +		 * and before we do the actual change page attribute in the PTE.
> > > +		 *
> > > +		 * With out this, we violate the TLB application note, that says
> > > +		 * "The TLBs may contain both ordinary and large-page
> > >  		 *  translations for a 4-KByte range of linear addresses. This
> > >  		 *  may occur if software modifies the paging structures so that
> > >  		 *  the page size used for the address range changes. If the two
> > >  		 *  translations differ with respect to page frame or attributes
> > >  		 *  (e.g., permissions), processor behavior is undefined and may
> > >  		 *  be implementation-specific."
> > > +		 *
> > > +		 * We do this global tlb flush inside the cpa_lock, so that we
> > >  		 * don't allow any other cpu, with stale tlb entries change the
> > >  		 * page attribute in parallel, that also falls into the
> > >  		 * just split large page entry.
> > > +		 */
> > >  		flush_tlb_all();
> > >  		goto repeat;
> > >  	}
> > 
> > this made me look at the tlb invalidation of that thing again; do we
> > want something like the below?
> > 
>
> Further cleanups are possible...

Yes please. Can you write up a changelog for that please?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 01/11] x86/mm/init32: Mark text and rodata RO in one go
  2018-09-17 14:29 ` [patch V3 01/11] x86/mm/init32: Mark text and rodata RO in one go Thomas Gleixner
@ 2018-09-21 16:15   ` Dave Hansen
  2018-09-27 18:45   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 41+ messages in thread
From: Dave Hansen @ 2018-09-21 16:15 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Peter Zijlstra, Bin Yang, Mark Gross, kernel test robot

On 09/17/2018 07:29 AM, Thomas Gleixner wrote:
> The sequence of marking text and rodata read-only in 32bit init is:
> 
>   set_ro(text);
>   kernel_set_to_readonly = 1;
>   set_ro(rodata);
> 
> When kernel_set_to_readonly is 1 it enables the protection mechanism in CPA
> for the read only regions. With the upcoming checks for existing mappings
> this consequently triggers the warning about an existing mapping being
> incorrect vs. static protections because rodata has not been converted yet.
> 
> There is no technical reason to split the two, so just combine the RO
> protection to convert text and rodata in one go.
> 
> Convert the printks to pr_info while at it.
> 
> Reported-by: kernel test robot <rong.a.chen@intel.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


Reviewed-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 02/11] x86/mm/cpa: Split, rename and clean up try_preserve_large_page()
  2018-09-17 14:29 ` [patch V3 02/11] x86/mm/cpa: Split, rename and clean up try_preserve_large_page() Thomas Gleixner
  2018-09-18  7:03   ` Peter Zijlstra
  2018-09-18  8:19   ` Peter Zijlstra
@ 2018-09-21 16:22   ` Dave Hansen
  2018-09-27 18:46   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  3 siblings, 0 replies; 41+ messages in thread
From: Dave Hansen @ 2018-09-21 16:22 UTC (permalink / raw)
  To: Thomas Gleixner, LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Mark Gross

On 09/17/2018 07:29 AM, Thomas Gleixner wrote:
> Avoid the extra variable and gotos by splitting the function into the
> actual algorithm and a callable function which contains the lock
> protection.
> 
> Rename it to should_split_large_page() while at it so the return values make
> actually sense.
> 
> Clean up the code flow, comments and general whitespace damage while at it. No
> functional change.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

This one is getting a wee bit hard to follow from the diff, but the
cleanup and rename looks nice.

Reviewed-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 03/11] x86/mm/cpa: Rework static_protections()
  2018-09-17 14:29 ` [patch V3 03/11] x86/mm/cpa: Rework static_protections() Thomas Gleixner
@ 2018-09-21 16:33   ` Dave Hansen
  2018-09-27 18:46   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 41+ messages in thread
From: Dave Hansen @ 2018-09-21 16:33 UTC (permalink / raw)
  To: Thomas Gleixner, LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Mark Gross

On 09/17/2018 07:29 AM, Thomas Gleixner wrote:
> +/*
> + * The kernel text needs to be executable for obvious reasons. This does
> + * not cover __inittext since that is gone after boot. On 64bit we do not
> + * enforce !NX on the low mapping
> + */
> +static pgprotval_t protect_kernel_text(unsigned long address)
> +{
> +	if (within(address, (unsigned long)_text, (unsigned long)_etext))
> +		return _PAGE_NX;
> +	return 0;
> +}

Minor nit: I was scratching my head about how why this works.  It
_reads_ like we are using _PAGE_NX to protect kernel text which doesn't
make any sense of course.

Could we make a connection between the protection and _PAGE_NX in the
comments:

	Protect kernel text against by forbidding _PAGE_NX.  This 	
	protects only the high kernel mapping (_text -> _etext) out of
	which we actually execute.  Do not protect the low mapping.

	This does not cover __inittext since that is gone after boot.

The static_protections() code looks fine because it's totally obvious
that it is dealing with "forbidden" bits, btw:

> +static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
> +					  unsigned long pfn)
> +{
> +	pgprotval_t forbidden;
> +
> +	/* Operate on the virtual address */
> +	forbidden  = protect_kernel_text(address);
> +	forbidden |= protect_kernel_text_ro(address);
> +
> +	/* Check the PFN directly */
> +	forbidden |= protect_pci_bios(pfn);
> +	forbidden |= protect_rodata(pfn);
>  
> -	return prot;
> +	return __pgprot(pgprot_val(prot) & ~forbidden);
>  }

This is more of a, "if you happen to respin these" comment, though, so:

Reviewed-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 04/11] x86/mm/cpa: Allow range check for static protections
  2018-09-17 14:29 ` [patch V3 04/11] x86/mm/cpa: Allow range check for static protections Thomas Gleixner
@ 2018-09-21 16:36   ` Dave Hansen
  2018-09-27 18:47   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 41+ messages in thread
From: Dave Hansen @ 2018-09-21 16:36 UTC (permalink / raw)
  To: Thomas Gleixner, LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Mark Gross

On 09/17/2018 07:29 AM, Thomas Gleixner wrote:
> Checking static protections only page by page is slow especially for huge
> pages. To allow quick checks over a complete range, add the ability to do
> that.
> 
> Make the checks inclusive so the ranges can be directly used for debug output
> later.
> 
> No functional change.

Reviewed-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 05/11] x86/mm/cpa: Add debug mechanism
  2018-09-17 14:29 ` [patch V3 05/11] x86/mm/cpa: Add debug mechanism Thomas Gleixner
@ 2018-09-21 16:40   ` Dave Hansen
  2018-09-22 10:33     ` Peter Zijlstra
  2018-09-27 18:48   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  1 sibling, 1 reply; 41+ messages in thread
From: Dave Hansen @ 2018-09-21 16:40 UTC (permalink / raw)
  To: Thomas Gleixner, LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Mark Gross

On 09/17/2018 07:29 AM, Thomas Gleixner wrote:
> The whole static protection magic is silently fixing up anything which is
> handed in. That's just wrong. The offending call sites need to be fixed.
> 
> Add a debug mechanism which emits a warning if a requested mapping needs to be
> fixed up. The DETECT debug mechanism is really not meant to be enabled except
> for developers, so limit the output hard to the protection fixups.
...
> +enum cpa_warn {
> +	CPA_PROTECT,
> +	CPA_DETECT,
> +};
> +
> +static const int cpa_warn_level = CPA_PROTECT;

Even if this is intended for developers only, should we also add some
config option here so things like 0day can still get warnings out of this?

Reviewed-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 06/11] x86/mm/cpa: Add large page preservation statistics
  2018-09-17 14:29 ` [patch V3 06/11] x86/mm/cpa: Add large page preservation statistics Thomas Gleixner
@ 2018-09-21 19:59   ` Dave Hansen
  2018-09-27 18:48   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 41+ messages in thread
From: Dave Hansen @ 2018-09-21 19:59 UTC (permalink / raw)
  To: Thomas Gleixner, LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Mark Gross

On 09/17/2018 07:29 AM, Thomas Gleixner wrote:
> The large page preservation mechanism is just magic and provides no
> information at all. Add optional statistic output in debugfs so the magic can
> be evaluated. Defaults is off.
> 
> Output:
> 
>  1G pages checked:                    2
>  1G pages sameprot:                   0
>  1G pages preserved:                  0
>  2M pages checked:                  540
>  2M pages sameprot:                 466
>  2M pages preserved:                 47
>  4K pages checked:               800770
>  4K pages set-checked:             7668

Very nice, of course.

We've all rigged one of these up as we've played with this code, and
this will certainly help the next issue that we have.

Reviewed-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 07/11] x86/mm/cpa: Avoid static protection checks on unmap
  2018-09-17 14:29 ` [patch V3 07/11] x86/mm/cpa: Avoid static protection checks on unmap Thomas Gleixner
@ 2018-09-21 20:01   ` Dave Hansen
  2018-09-27 18:49   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 41+ messages in thread
From: Dave Hansen @ 2018-09-21 20:01 UTC (permalink / raw)
  To: Thomas Gleixner, LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Mark Gross

On 09/17/2018 07:29 AM, Thomas Gleixner wrote:
> If the new pgprot has the PRESENT bit cleared, then conflicts vs. RW/NX are
> completely irrelevant.

Reviewed-by: Dave Hansen <dave.hansen@intel.com>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 08/11] x86/mm/cpa: Add sanity check for existing mappings
  2018-09-17 14:29 ` [patch V3 08/11] x86/mm/cpa: Add sanity check for existing mappings Thomas Gleixner
  2018-09-18  7:14   ` Peter Zijlstra
@ 2018-09-21 20:07   ` Dave Hansen
  2018-09-27 18:49   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  2 siblings, 0 replies; 41+ messages in thread
From: Dave Hansen @ 2018-09-21 20:07 UTC (permalink / raw)
  To: Thomas Gleixner, LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Mark Gross

On 09/17/2018 07:29 AM, Thomas Gleixner wrote:
> +	/*
> +	 * If this is splitting a PMD, fix it up. PUD splits cannot be
> +	 * fixed trivially as that would require to rescan the newly
> +	 * installed PMD mappings after returning from split_large_page()
> +	 * so an eventual further split can allocate the necessary PTE
> +	 * pages. Warn for now and revisit it in case this actually
> +	 * happens.
> +	 */
> +	if (size == PAGE_SIZE)
> +		ref_prot = prot;
> +	else
> +		pr_warn_once("CPA: Cannot fixup static protections for PUD split\n");
> +set:
> +	set_pte(pte, pfn_pte(pfn, ref_prot));
> +}

This looked a _little_ bit funky to me.  It talks about splitting up
PMDs and PUDs, but it wasn't immediately obvious why it never looks for
PMD or PUD sizes.

It's because split_set_pte()'s "size" is the size we are splitting *to*.
 IOW, a PMD split gets PAGE_SIZE and a PUD split gets PMD_SIZE.  It's
obvious with a bit more context, so it might be handy to include a blurb
in the comment about what 'size' is *of*.

Reviewed-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 09/11] x86/mm/cpa: Optimize same protection check
  2018-09-17 14:29 ` [patch V3 09/11] x86/mm/cpa: Optimize same protection check Thomas Gleixner
@ 2018-09-21 20:12   ` Dave Hansen
  2018-09-27 18:07     ` Thomas Gleixner
  2018-09-27 18:50   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  1 sibling, 1 reply; 41+ messages in thread
From: Dave Hansen @ 2018-09-21 20:12 UTC (permalink / raw)
  To: Thomas Gleixner, LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Mark Gross

On 09/17/2018 07:29 AM, Thomas Gleixner wrote:
> When the existing mapping is correct and the new requested page protections
> are the same as the existing ones, then further checks can be omitted and the
> large page can be preserved. The slow path 4k wise check will not come up with
> a different result.
> 
> Before:
> 
>  1G pages checked:                    2
>  1G pages sameprot:                   0
>  1G pages preserved:                  0
>  2M pages checked:                  540
>  2M pages sameprot:                 466
>  2M pages preserved:                 47
>  4K pages checked:               800709
>  4K pages set-checked:             7668
> 
> After:
> 
>  1G pages checked:                    2
>  1G pages sameprot:                   0
>  1G pages preserved:                  0
>  2M pages checked:                  538
>  2M pages sameprot:                 466
>  2M pages preserved:                 47
>  4K pages checked:               560642
>  4K pages set-checked:             7668

With this new path added, I would have expected "2M pages sameprot" or
"1G pages sameprot" to go up.  The "checked" pages go down, which makes
sense, but I don't see either of the "sameprot"s going up.

I did a quick look back at the stats patch and didn't see any obvious
buglets that can account for it.  Both that and this code look sane, but
the stats just don't seem to square for some reason.

>  	/*
> +	 * Optimization: If the requested pgprot is the same as the current
> +	 * pgprot, then the large page can be preserved and no updates are
> +	 * required independent of alignment and length of the requested
> +	 * range. The above already established that the current pgprot is
> +	 * correct, which in consequence makes the requested pgprot correct
> +	 * as well if it is the same. The static protection scan below will
> +	 * not come to a different conclusion.
> +	 */
> +	if (pgprot_val(req_prot) == pgprot_val(old_prot)) {
> +		cpa_inc_lp_sameprot(level);
> +		return 0;
> +	}

Patch *looks* fine though.  I'm going to assume that the pasted stats
were from the wrong run or something.

Reviewed-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 10/11] x86/mm/cpa: Do the range check early
  2018-09-17 14:29 ` [patch V3 10/11] x86/mm/cpa: Do the range check early Thomas Gleixner
@ 2018-09-21 20:26   ` Dave Hansen
  2018-09-27 18:50   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 41+ messages in thread
From: Dave Hansen @ 2018-09-21 20:26 UTC (permalink / raw)
  To: Thomas Gleixner, LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Mark Gross

On 09/17/2018 07:29 AM, Thomas Gleixner wrote:
> To avoid excessive 4k wise checks in the common case do a quick check first
> whether the requested new page protections conflict with a static
> protection area in the large page. If there is no conflict then the
> decision whether to preserve or to split the page can be made immediately.
> 
> If the requested range covers the full large page, preserve it. Otherwise
> split it up. No point in doing a slow crawl in 4k steps.

That's the big one...

Reviewed-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 11/11] x86/mm/cpa: Avoid the 4k pages check completely
  2018-09-17 14:29 ` [patch V3 11/11] x86/mm/cpa: Avoid the 4k pages check completely Thomas Gleixner
@ 2018-09-21 20:32   ` Dave Hansen
  2018-09-27 18:51   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 41+ messages in thread
From: Dave Hansen @ 2018-09-21 20:32 UTC (permalink / raw)
  To: Thomas Gleixner, LKML; +Cc: x86, Peter Zijlstra, Bin Yang, Mark Gross

On 09/17/2018 07:29 AM, Thomas Gleixner wrote:
> +	 * There used to be a 4k wise evaluation trying really hard to
> +	 * preserve the large pages, but experimentation has shown, that this
> +	 * does not help at all. There might be corner cases which would
> +	 * preserve one large page occasionally, but it's really not worth the
> +	 * extra code and cycles for the common case.

My only concern with this is that if this goes bad, it will be silently
bad and we'll just have an unnecessarily fractured direct map.

But, writing code to go verify the page tables and make sure they're
sane is a task for another day.

Reviewed-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 05/11] x86/mm/cpa: Add debug mechanism
  2018-09-21 16:40   ` Dave Hansen
@ 2018-09-22 10:33     ` Peter Zijlstra
  0 siblings, 0 replies; 41+ messages in thread
From: Peter Zijlstra @ 2018-09-22 10:33 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Thomas Gleixner, LKML, x86, Bin Yang, Mark Gross, Ingo Molnar

On Fri, Sep 21, 2018 at 09:40:36AM -0700, Dave Hansen wrote:
> On 09/17/2018 07:29 AM, Thomas Gleixner wrote:
> > The whole static protection magic is silently fixing up anything which is
> > handed in. That's just wrong. The offending call sites need to be fixed.
> > 
> > Add a debug mechanism which emits a warning if a requested mapping needs to be
> > fixed up. The DETECT debug mechanism is really not meant to be enabled except
> > for developers, so limit the output hard to the protection fixups.
> ...
> > +enum cpa_warn {
> > +	CPA_PROTECT,
> > +	CPA_DETECT,
> > +};
> > +
> > +static const int cpa_warn_level = CPA_PROTECT;
> 
> Even if this is intended for developers only, should we also add some
> config option here so things like 0day can still get warnings out of this?
> 
> Reviewed-by: Dave Hansen <dave.hansen@intel.com>

OTOH, I really wish there was something like: depends !RANDCONFIG
for some of those things, because I triggered
GCC_PLUGIN_STRUCTLEAK_VERBOSE in a randconfig the other day and thought
everything was busted due to the massive output.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch V3 09/11] x86/mm/cpa: Optimize same protection check
  2018-09-21 20:12   ` Dave Hansen
@ 2018-09-27 18:07     ` Thomas Gleixner
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Gleixner @ 2018-09-27 18:07 UTC (permalink / raw)
  To: Dave Hansen; +Cc: LKML, x86, Peter Zijlstra, Bin Yang, Mark Gross

On Fri, 21 Sep 2018, Dave Hansen wrote:
> On 09/17/2018 07:29 AM, Thomas Gleixner wrote:
> > When the existing mapping is correct and the new requested page protections
> > are the same as the existing ones, then further checks can be omitted and the
> > large page can be preserved. The slow path 4k wise check will not come up with
> > a different result.
> > 
> > Before:
> > 
> >  1G pages checked:                    2
> >  1G pages sameprot:                   0
> >  1G pages preserved:                  0
> >  2M pages checked:                  540
> >  2M pages sameprot:                 466
> >  2M pages preserved:                 47
> >  4K pages checked:               800709
> >  4K pages set-checked:             7668
> > 
> > After:
> > 
> >  1G pages checked:                    2
> >  1G pages sameprot:                   0
> >  1G pages preserved:                  0
> >  2M pages checked:                  538
> >  2M pages sameprot:                 466
> >  2M pages preserved:                 47
> >  4K pages checked:               560642
> >  4K pages set-checked:             7668
> 
> With this new path added, I would have expected "2M pages sameprot" or
> "1G pages sameprot" to go up.  The "checked" pages go down, which makes
> sense, but I don't see either of the "sameprot"s going up.
> 
> I did a quick look back at the stats patch and didn't see any obvious
> buglets that can account for it.  Both that and this code look sane, but
> the stats just don't seem to square for some reason.

sameprot does not go up, because the change merily leaves early when it
detected the same prot and full mapping size. So it really just avoids the
extra scanning of the large pages in 4k steps which obviously does not come
to a different conclusion. So quite the contrary if the same prot count
would go up then the early dropout would be wrong because the 4k loop finds
a violation.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tip:x86/mm] x86/mm/init32: Mark text and rodata RO in one go
  2018-09-17 14:29 ` [patch V3 01/11] x86/mm/init32: Mark text and rodata RO in one go Thomas Gleixner
  2018-09-21 16:15   ` Dave Hansen
@ 2018-09-27 18:45   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot for Thomas Gleixner @ 2018-09-27 18:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bin.yang, peterz, mark.gross, dave.hansen, linux-kernel, tglx,
	rong.a.chen, mingo, hpa

Commit-ID:  2a25dc7c79c92c6cba45c6218c49395173be80bf
Gitweb:     https://git.kernel.org/tip/2a25dc7c79c92c6cba45c6218c49395173be80bf
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 17 Sep 2018 16:29:07 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 27 Sep 2018 20:39:37 +0200

x86/mm/init32: Mark text and rodata RO in one go

The sequence of marking text and rodata read-only in 32bit init is:

  set_ro(text);
  kernel_set_to_readonly = 1;
  set_ro(rodata);

When kernel_set_to_readonly is 1 it enables the protection mechanism in CPA
for the read only regions. With the upcoming checks for existing mappings
this consequently triggers the warning about an existing mapping being
incorrect vs. static protections because rodata has not been converted yet.

There is no technical reason to split the two, so just combine the RO
protection to convert text and rodata in one go.

Convert the printks to pr_info while at it.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143545.731701535@linutronix.de

---
 arch/x86/mm/init_32.c | 23 ++++-------------------
 1 file changed, 4 insertions(+), 19 deletions(-)

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 979e0a02cbe1..142c7d9f89cc 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -923,34 +923,19 @@ static void mark_nxdata_nx(void)
 void mark_rodata_ro(void)
 {
 	unsigned long start = PFN_ALIGN(_text);
-	unsigned long size = PFN_ALIGN(_etext) - start;
+	unsigned long size = (unsigned long)__end_rodata - start;
 
 	set_pages_ro(virt_to_page(start), size >> PAGE_SHIFT);
-	printk(KERN_INFO "Write protecting the kernel text: %luk\n",
+	pr_info("Write protecting kernel text and read-only data: %luk\n",
 		size >> 10);
 
 	kernel_set_to_readonly = 1;
 
 #ifdef CONFIG_CPA_DEBUG
-	printk(KERN_INFO "Testing CPA: Reverting %lx-%lx\n",
-		start, start+size);
-	set_pages_rw(virt_to_page(start), size>>PAGE_SHIFT);
-
-	printk(KERN_INFO "Testing CPA: write protecting again\n");
-	set_pages_ro(virt_to_page(start), size>>PAGE_SHIFT);
-#endif
-
-	start += size;
-	size = (unsigned long)__end_rodata - start;
-	set_pages_ro(virt_to_page(start), size >> PAGE_SHIFT);
-	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
-		size >> 10);
-
-#ifdef CONFIG_CPA_DEBUG
-	printk(KERN_INFO "Testing CPA: undo %lx-%lx\n", start, start + size);
+	pr_info("Testing CPA: Reverting %lx-%lx\n", start, start + size);
 	set_pages_rw(virt_to_page(start), size >> PAGE_SHIFT);
 
-	printk(KERN_INFO "Testing CPA: write protecting again\n");
+	pr_info("Testing CPA: write protecting again\n");
 	set_pages_ro(virt_to_page(start), size >> PAGE_SHIFT);
 #endif
 	mark_nxdata_nx();

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tip:x86/mm] x86/mm/cpa: Split, rename and clean up try_preserve_large_page()
  2018-09-17 14:29 ` [patch V3 02/11] x86/mm/cpa: Split, rename and clean up try_preserve_large_page() Thomas Gleixner
                     ` (2 preceding siblings ...)
  2018-09-21 16:22   ` Dave Hansen
@ 2018-09-27 18:46   ` tip-bot for Thomas Gleixner
  3 siblings, 0 replies; 41+ messages in thread
From: tip-bot for Thomas Gleixner @ 2018-09-27 18:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, linux-kernel, mingo, tglx, mark.gross, bin.yang, peterz,
	dave.hansen

Commit-ID:  8679de0959e65ee7f78db6405a8d23e61665751d
Gitweb:     https://git.kernel.org/tip/8679de0959e65ee7f78db6405a8d23e61665751d
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 17 Sep 2018 16:29:08 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 27 Sep 2018 20:39:37 +0200

x86/mm/cpa: Split, rename and clean up try_preserve_large_page()

Avoid the extra variable and gotos by splitting the function into the
actual algorithm and a callable function which contains the lock
protection.

Rename it to should_split_large_page() while at it so the return values make
actually sense.

Clean up the code flow, comments and general whitespace damage while at it. No
functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143545.830507216@linutronix.de

---
 arch/x86/mm/pageattr.c | 121 ++++++++++++++++++++++++-------------------------
 1 file changed, 60 insertions(+), 61 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 51a5a69ecac9..36f5d711ddbe 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -421,18 +421,18 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
  */
 pte_t *lookup_address(unsigned long address, unsigned int *level)
 {
-        return lookup_address_in_pgd(pgd_offset_k(address), address, level);
+	return lookup_address_in_pgd(pgd_offset_k(address), address, level);
 }
 EXPORT_SYMBOL_GPL(lookup_address);
 
 static pte_t *_lookup_address_cpa(struct cpa_data *cpa, unsigned long address,
 				  unsigned int *level)
 {
-        if (cpa->pgd)
+	if (cpa->pgd)
 		return lookup_address_in_pgd(cpa->pgd + pgd_index(address),
 					       address, level);
 
-        return lookup_address(address, level);
+	return lookup_address(address, level);
 }
 
 /*
@@ -549,27 +549,22 @@ static pgprot_t pgprot_clear_protnone_bits(pgprot_t prot)
 	return prot;
 }
 
-static int
-try_preserve_large_page(pte_t *kpte, unsigned long address,
-			struct cpa_data *cpa)
+static int __should_split_large_page(pte_t *kpte, unsigned long address,
+				     struct cpa_data *cpa)
 {
-	unsigned long nextpage_addr, numpages, pmask, psize, addr, pfn, old_pfn;
-	pte_t new_pte, old_pte, *tmp;
+	unsigned long numpages, pmask, psize, lpaddr, addr, pfn, old_pfn;
 	pgprot_t old_prot, new_prot, req_prot;
-	int i, do_split = 1;
+	pte_t new_pte, old_pte, *tmp;
 	enum pg_level level;
+	int i;
 
-	if (cpa->force_split)
-		return 1;
-
-	spin_lock(&pgd_lock);
 	/*
 	 * Check for races, another CPU might have split this page
 	 * up already:
 	 */
 	tmp = _lookup_address_cpa(cpa, address, &level);
 	if (tmp != kpte)
-		goto out_unlock;
+		return 1;
 
 	switch (level) {
 	case PG_LEVEL_2M:
@@ -581,8 +576,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
 		old_pfn = pud_pfn(*(pud_t *)kpte);
 		break;
 	default:
-		do_split = -EINVAL;
-		goto out_unlock;
+		return -EINVAL;
 	}
 
 	psize = page_level_size(level);
@@ -592,8 +586,8 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
 	 * Calculate the number of pages, which fit into this large
 	 * page starting at address:
 	 */
-	nextpage_addr = (address + psize) & pmask;
-	numpages = (nextpage_addr - address) >> PAGE_SHIFT;
+	lpaddr = (address + psize) & pmask;
+	numpages = (lpaddr - address) >> PAGE_SHIFT;
 	if (numpages < cpa->numpages)
 		cpa->numpages = numpages;
 
@@ -620,57 +614,62 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
 		pgprot_val(req_prot) |= _PAGE_PSE;
 
 	/*
-	 * old_pfn points to the large page base pfn. So we need
-	 * to add the offset of the virtual address:
+	 * old_pfn points to the large page base pfn. So we need to add the
+	 * offset of the virtual address:
 	 */
 	pfn = old_pfn + ((address & (psize - 1)) >> PAGE_SHIFT);
 	cpa->pfn = pfn;
 
-	new_prot = static_protections(req_prot, address, pfn);
+	/*
+	 * Calculate the large page base address and the number of 4K pages
+	 * in the large page
+	 */
+	lpaddr = address & pmask;
+	numpages = psize >> PAGE_SHIFT;
 
 	/*
-	 * We need to check the full range, whether
-	 * static_protection() requires a different pgprot for one of
-	 * the pages in the range we try to preserve:
+	 * Make sure that the requested pgprot does not violate the static
+	 * protections. Check the full large page whether one of the pages
+	 * in it results in a different pgprot than the first one of the
+	 * requested range. If yes, then the page needs to be split.
 	 */
-	addr = address & pmask;
+	new_prot = static_protections(req_prot, address, pfn);
 	pfn = old_pfn;
-	for (i = 0; i < (psize >> PAGE_SHIFT); i++, addr += PAGE_SIZE, pfn++) {
+	for (i = 0, addr = lpaddr; i < numpages; i++, addr += PAGE_SIZE, pfn++) {
 		pgprot_t chk_prot = static_protections(req_prot, addr, pfn);
 
 		if (pgprot_val(chk_prot) != pgprot_val(new_prot))
-			goto out_unlock;
+			return 1;
 	}
 
-	/*
-	 * If there are no changes, return. maxpages has been updated
-	 * above:
-	 */
-	if (pgprot_val(new_prot) == pgprot_val(old_prot)) {
-		do_split = 0;
-		goto out_unlock;
-	}
+	/* If there are no changes, return. */
+	if (pgprot_val(new_prot) == pgprot_val(old_prot))
+		return 0;
 
 	/*
-	 * We need to change the attributes. Check, whether we can
-	 * change the large page in one go. We request a split, when
-	 * the address is not aligned and the number of pages is
-	 * smaller than the number of pages in the large page. Note
-	 * that we limited the number of possible pages already to
-	 * the number of pages in the large page.
+	 * Verify that the address is aligned and the number of pages
+	 * covers the full page.
 	 */
-	if (address == (address & pmask) && cpa->numpages == (psize >> PAGE_SHIFT)) {
-		/*
-		 * The address is aligned and the number of pages
-		 * covers the full page.
-		 */
-		new_pte = pfn_pte(old_pfn, new_prot);
-		__set_pmd_pte(kpte, address, new_pte);
-		cpa->flags |= CPA_FLUSHTLB;
-		do_split = 0;
-	}
+	if (address != lpaddr || cpa->numpages != numpages)
+		return 1;
+
+	/* All checks passed. Update the large page mapping. */
+	new_pte = pfn_pte(old_pfn, new_prot);
+	__set_pmd_pte(kpte, address, new_pte);
+	cpa->flags |= CPA_FLUSHTLB;
+	return 0;
+}
+
+static int should_split_large_page(pte_t *kpte, unsigned long address,
+				   struct cpa_data *cpa)
+{
+	int do_split;
+
+	if (cpa->force_split)
+		return 1;
 
-out_unlock:
+	spin_lock(&pgd_lock);
+	do_split = __should_split_large_page(kpte, address, cpa);
 	spin_unlock(&pgd_lock);
 
 	return do_split;
@@ -1273,7 +1272,7 @@ repeat:
 	 * Check, whether we can keep the large page intact
 	 * and just change the pte:
 	 */
-	do_split = try_preserve_large_page(kpte, address, cpa);
+	do_split = should_split_large_page(kpte, address, cpa);
 	/*
 	 * When the range fits into the existing large page,
 	 * return. cp->numpages and cpa->tlbflush have been updated in
@@ -1288,23 +1287,23 @@ repeat:
 	err = split_large_page(cpa, kpte, address);
 	if (!err) {
 		/*
-	 	 * Do a global flush tlb after splitting the large page
-	 	 * and before we do the actual change page attribute in the PTE.
-	 	 *
-	 	 * With out this, we violate the TLB application note, that says
-	 	 * "The TLBs may contain both ordinary and large-page
+		 * Do a global flush tlb after splitting the large page
+		 * and before we do the actual change page attribute in the PTE.
+		 *
+		 * With out this, we violate the TLB application note, that says
+		 * "The TLBs may contain both ordinary and large-page
 		 *  translations for a 4-KByte range of linear addresses. This
 		 *  may occur if software modifies the paging structures so that
 		 *  the page size used for the address range changes. If the two
 		 *  translations differ with respect to page frame or attributes
 		 *  (e.g., permissions), processor behavior is undefined and may
 		 *  be implementation-specific."
-	 	 *
-	 	 * We do this global tlb flush inside the cpa_lock, so that we
+		 *
+		 * We do this global tlb flush inside the cpa_lock, so that we
 		 * don't allow any other cpu, with stale tlb entries change the
 		 * page attribute in parallel, that also falls into the
 		 * just split large page entry.
-	 	 */
+		 */
 		flush_tlb_all();
 		goto repeat;
 	}

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tip:x86/mm] x86/mm/cpa: Rework static_protections()
  2018-09-17 14:29 ` [patch V3 03/11] x86/mm/cpa: Rework static_protections() Thomas Gleixner
  2018-09-21 16:33   ` Dave Hansen
@ 2018-09-27 18:46   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot for Thomas Gleixner @ 2018-09-27 18:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, mark.gross, peterz, dave.hansen, hpa, mingo, linux-kernel,
	bin.yang

Commit-ID:  afd7969a99e072e6aa0d88511176d4d2f3009fd9
Gitweb:     https://git.kernel.org/tip/afd7969a99e072e6aa0d88511176d4d2f3009fd9
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 17 Sep 2018 16:29:09 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 27 Sep 2018 20:39:37 +0200

x86/mm/cpa: Rework static_protections()

static_protections() is pretty unreadable. Split it up into separate checks
for each protection area.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143545.913005317@linutronix.de

---
 arch/x86/mm/pageattr.c | 162 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 98 insertions(+), 64 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 36f5d711ddbe..9731aeeebe71 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -286,84 +286,118 @@ static void cpa_flush_array(unsigned long *start, int numpages, int cache,
 	}
 }
 
+#ifdef CONFIG_PCI_BIOS
 /*
- * Certain areas of memory on x86 require very specific protection flags,
- * for example the BIOS area or kernel text. Callers don't always get this
- * right (again, ioremap() on BIOS memory is not uncommon) so this function
- * checks and fixes these known static required protection bits.
+ * The BIOS area between 640k and 1Mb needs to be executable for PCI BIOS
+ * based config access (CONFIG_PCI_GOBIOS) support.
  */
-static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
-				   unsigned long pfn)
-{
-	pgprot_t forbidden = __pgprot(0);
+#define BIOS_PFN	PFN_DOWN(BIOS_BEGIN)
+#define BIOS_PFN_END	PFN_DOWN(BIOS_END)
 
-	/*
-	 * The BIOS area between 640k and 1Mb needs to be executable for
-	 * PCI BIOS based config access (CONFIG_PCI_GOBIOS) support.
-	 */
-#ifdef CONFIG_PCI_BIOS
-	if (pcibios_enabled && within(pfn, BIOS_BEGIN >> PAGE_SHIFT, BIOS_END >> PAGE_SHIFT))
-		pgprot_val(forbidden) |= _PAGE_NX;
+static pgprotval_t protect_pci_bios(unsigned long pfn)
+{
+	if (pcibios_enabled && within(pfn, BIOS_PFN, BIOS_PFN_END))
+		return _PAGE_NX;
+	return 0;
+}
+#else
+static pgprotval_t protect_pci_bios(unsigned long pfn)
+{
+	return 0;
+}
 #endif
 
-	/*
-	 * The kernel text needs to be executable for obvious reasons
-	 * Does not cover __inittext since that is gone later on. On
-	 * 64bit we do not enforce !NX on the low mapping
-	 */
-	if (within(address, (unsigned long)_text, (unsigned long)_etext))
-		pgprot_val(forbidden) |= _PAGE_NX;
+/*
+ * The .rodata section needs to be read-only. Using the pfn catches all
+ * aliases.  This also includes __ro_after_init, so do not enforce until
+ * kernel_set_to_readonly is true.
+ */
+static pgprotval_t protect_rodata(unsigned long pfn)
+{
+	unsigned long start_pfn = __pa_symbol(__start_rodata) >> PAGE_SHIFT;
+	unsigned long end_pfn = __pa_symbol(__end_rodata) >> PAGE_SHIFT;
 
-	/*
-	 * The .rodata section needs to be read-only. Using the pfn
-	 * catches all aliases.  This also includes __ro_after_init,
-	 * so do not enforce until kernel_set_to_readonly is true.
-	 */
-	if (kernel_set_to_readonly &&
-	    within(pfn, __pa_symbol(__start_rodata) >> PAGE_SHIFT,
-		   __pa_symbol(__end_rodata) >> PAGE_SHIFT))
-		pgprot_val(forbidden) |= _PAGE_RW;
+	if (kernel_set_to_readonly && within(pfn, start_pfn, end_pfn))
+		return _PAGE_RW;
+	return 0;
+}
+
+/*
+ * Protect kernel text against becoming non executable by forbidding
+ * _PAGE_NX.  This protects only the high kernel mapping (_text -> _etext)
+ * out of which the kernel actually executes.  Do not protect the low
+ * mapping.
+ *
+ * This does not cover __inittext since that is gone after boot.
+ */
+static pgprotval_t protect_kernel_text(unsigned long address)
+{
+	if (within(address, (unsigned long)_text, (unsigned long)_etext))
+		return _PAGE_NX;
+	return 0;
+}
 
 #if defined(CONFIG_X86_64)
+/*
+ * Once the kernel maps the text as RO (kernel_set_to_readonly is set),
+ * kernel text mappings for the large page aligned text, rodata sections
+ * will be always read-only. For the kernel identity mappings covering the
+ * holes caused by this alignment can be anything that user asks.
+ *
+ * This will preserve the large page mappings for kernel text/data at no
+ * extra cost.
+ */
+static pgprotval_t protect_kernel_text_ro(unsigned long address)
+{
+	unsigned long end = (unsigned long)__end_rodata_hpage_align;
+	unsigned long start = (unsigned long)_text;
+	unsigned int level;
+
+	if (!kernel_set_to_readonly || !within(address, start, end))
+		return 0;
 	/*
-	 * Once the kernel maps the text as RO (kernel_set_to_readonly is set),
-	 * kernel text mappings for the large page aligned text, rodata sections
-	 * will be always read-only. For the kernel identity mappings covering
-	 * the holes caused by this alignment can be anything that user asks.
+	 * Don't enforce the !RW mapping for the kernel text mapping, if
+	 * the current mapping is already using small page mapping.  No
+	 * need to work hard to preserve large page mappings in this case.
 	 *
-	 * This will preserve the large page mappings for kernel text/data
-	 * at no extra cost.
+	 * This also fixes the Linux Xen paravirt guest boot failure caused
+	 * by unexpected read-only mappings for kernel identity
+	 * mappings. In this paravirt guest case, the kernel text mapping
+	 * and the kernel identity mapping share the same page-table pages,
+	 * so the protections for kernel text and identity mappings have to
+	 * be the same.
 	 */
-	if (kernel_set_to_readonly &&
-	    within(address, (unsigned long)_text,
-		   (unsigned long)__end_rodata_hpage_align)) {
-		unsigned int level;
-
-		/*
-		 * Don't enforce the !RW mapping for the kernel text mapping,
-		 * if the current mapping is already using small page mapping.
-		 * No need to work hard to preserve large page mappings in this
-		 * case.
-		 *
-		 * This also fixes the Linux Xen paravirt guest boot failure
-		 * (because of unexpected read-only mappings for kernel identity
-		 * mappings). In this paravirt guest case, the kernel text
-		 * mapping and the kernel identity mapping share the same
-		 * page-table pages. Thus we can't really use different
-		 * protections for the kernel text and identity mappings. Also,
-		 * these shared mappings are made of small page mappings.
-		 * Thus this don't enforce !RW mapping for small page kernel
-		 * text mapping logic will help Linux Xen parvirt guest boot
-		 * as well.
-		 */
-		if (lookup_address(address, &level) && (level != PG_LEVEL_4K))
-			pgprot_val(forbidden) |= _PAGE_RW;
-	}
+	if (lookup_address(address, &level) && (level != PG_LEVEL_4K))
+		return _PAGE_RW;
+	return 0;
+}
+#else
+static pgprotval_t protect_kernel_text_ro(unsigned long address)
+{
+	return 0;
+}
 #endif
 
-	prot = __pgprot(pgprot_val(prot) & ~pgprot_val(forbidden));
+/*
+ * Certain areas of memory on x86 require very specific protection flags,
+ * for example the BIOS area or kernel text. Callers don't always get this
+ * right (again, ioremap() on BIOS memory is not uncommon) so this function
+ * checks and fixes these known static required protection bits.
+ */
+static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
+					  unsigned long pfn)
+{
+	pgprotval_t forbidden;
+
+	/* Operate on the virtual address */
+	forbidden  = protect_kernel_text(address);
+	forbidden |= protect_kernel_text_ro(address);
 
-	return prot;
+	/* Check the PFN directly */
+	forbidden |= protect_pci_bios(pfn);
+	forbidden |= protect_rodata(pfn);
+
+	return __pgprot(pgprot_val(prot) & ~forbidden);
 }
 
 /*

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tip:x86/mm] x86/mm/cpa: Allow range check for static protections
  2018-09-17 14:29 ` [patch V3 04/11] x86/mm/cpa: Allow range check for static protections Thomas Gleixner
  2018-09-21 16:36   ` Dave Hansen
@ 2018-09-27 18:47   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot for Thomas Gleixner @ 2018-09-27 18:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.hansen, peterz, bin.yang, tglx, hpa, mingo, linux-kernel,
	mark.gross

Commit-ID:  91ee8f5c1f50a1f4096c178a93a8da46ce3f6cc8
Gitweb:     https://git.kernel.org/tip/91ee8f5c1f50a1f4096c178a93a8da46ce3f6cc8
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 17 Sep 2018 16:29:10 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 27 Sep 2018 20:39:37 +0200

x86/mm/cpa: Allow range check for static protections

Checking static protections only page by page is slow especially for huge
pages. To allow quick checks over a complete range, add the ability to do
that.

Make the checks inclusive so the ranges can be directly used for debug output
later.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143545.995734490@linutronix.de

---
 arch/x86/mm/pageattr.c | 69 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 25 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 9731aeeebe71..61ff27848452 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -286,22 +286,29 @@ static void cpa_flush_array(unsigned long *start, int numpages, int cache,
 	}
 }
 
+static bool overlaps(unsigned long r1_start, unsigned long r1_end,
+		     unsigned long r2_start, unsigned long r2_end)
+{
+	return (r1_start <= r2_end && r1_end >= r2_start) ||
+		(r2_start <= r1_end && r2_end >= r1_start);
+}
+
 #ifdef CONFIG_PCI_BIOS
 /*
  * The BIOS area between 640k and 1Mb needs to be executable for PCI BIOS
  * based config access (CONFIG_PCI_GOBIOS) support.
  */
 #define BIOS_PFN	PFN_DOWN(BIOS_BEGIN)
-#define BIOS_PFN_END	PFN_DOWN(BIOS_END)
+#define BIOS_PFN_END	PFN_DOWN(BIOS_END - 1)
 
-static pgprotval_t protect_pci_bios(unsigned long pfn)
+static pgprotval_t protect_pci_bios(unsigned long spfn, unsigned long epfn)
 {
-	if (pcibios_enabled && within(pfn, BIOS_PFN, BIOS_PFN_END))
+	if (pcibios_enabled && overlaps(spfn, epfn, BIOS_PFN, BIOS_PFN_END))
 		return _PAGE_NX;
 	return 0;
 }
 #else
-static pgprotval_t protect_pci_bios(unsigned long pfn)
+static pgprotval_t protect_pci_bios(unsigned long spfn, unsigned long epfn)
 {
 	return 0;
 }
@@ -312,12 +319,17 @@ static pgprotval_t protect_pci_bios(unsigned long pfn)
  * aliases.  This also includes __ro_after_init, so do not enforce until
  * kernel_set_to_readonly is true.
  */
-static pgprotval_t protect_rodata(unsigned long pfn)
+static pgprotval_t protect_rodata(unsigned long spfn, unsigned long epfn)
 {
-	unsigned long start_pfn = __pa_symbol(__start_rodata) >> PAGE_SHIFT;
-	unsigned long end_pfn = __pa_symbol(__end_rodata) >> PAGE_SHIFT;
+	unsigned long epfn_ro, spfn_ro = PFN_DOWN(__pa_symbol(__start_rodata));
+
+	/*
+	 * Note: __end_rodata is at page aligned and not inclusive, so
+	 * subtract 1 to get the last enforced PFN in the rodata area.
+	 */
+	epfn_ro = PFN_DOWN(__pa_symbol(__end_rodata)) - 1;
 
-	if (kernel_set_to_readonly && within(pfn, start_pfn, end_pfn))
+	if (kernel_set_to_readonly && overlaps(spfn, epfn, spfn_ro, epfn_ro))
 		return _PAGE_RW;
 	return 0;
 }
@@ -330,9 +342,12 @@ static pgprotval_t protect_rodata(unsigned long pfn)
  *
  * This does not cover __inittext since that is gone after boot.
  */
-static pgprotval_t protect_kernel_text(unsigned long address)
+static pgprotval_t protect_kernel_text(unsigned long start, unsigned long end)
 {
-	if (within(address, (unsigned long)_text, (unsigned long)_etext))
+	unsigned long t_end = (unsigned long)_etext - 1;
+	unsigned long t_start = (unsigned long)_text;
+
+	if (overlaps(start, end, t_start, t_end))
 		return _PAGE_NX;
 	return 0;
 }
@@ -347,13 +362,14 @@ static pgprotval_t protect_kernel_text(unsigned long address)
  * This will preserve the large page mappings for kernel text/data at no
  * extra cost.
  */
-static pgprotval_t protect_kernel_text_ro(unsigned long address)
+static pgprotval_t protect_kernel_text_ro(unsigned long start,
+					  unsigned long end)
 {
-	unsigned long end = (unsigned long)__end_rodata_hpage_align;
-	unsigned long start = (unsigned long)_text;
+	unsigned long t_end = (unsigned long)__end_rodata_hpage_align - 1;
+	unsigned long t_start = (unsigned long)_text;
 	unsigned int level;
 
-	if (!kernel_set_to_readonly || !within(address, start, end))
+	if (!kernel_set_to_readonly || !overlaps(start, end, t_start, t_end))
 		return 0;
 	/*
 	 * Don't enforce the !RW mapping for the kernel text mapping, if
@@ -367,12 +383,13 @@ static pgprotval_t protect_kernel_text_ro(unsigned long address)
 	 * so the protections for kernel text and identity mappings have to
 	 * be the same.
 	 */
-	if (lookup_address(address, &level) && (level != PG_LEVEL_4K))
+	if (lookup_address(start, &level) && (level != PG_LEVEL_4K))
 		return _PAGE_RW;
 	return 0;
 }
 #else
-static pgprotval_t protect_kernel_text_ro(unsigned long address)
+static pgprotval_t protect_kernel_text_ro(unsigned long start,
+					  unsigned long end)
 {
 	return 0;
 }
@@ -384,18 +401,20 @@ static pgprotval_t protect_kernel_text_ro(unsigned long address)
  * right (again, ioremap() on BIOS memory is not uncommon) so this function
  * checks and fixes these known static required protection bits.
  */
-static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
-					  unsigned long pfn)
+static inline pgprot_t static_protections(pgprot_t prot, unsigned long start,
+					  unsigned long pfn, unsigned long npg)
 {
 	pgprotval_t forbidden;
+	unsigned long end;
 
 	/* Operate on the virtual address */
-	forbidden  = protect_kernel_text(address);
-	forbidden |= protect_kernel_text_ro(address);
+	end = start + npg * PAGE_SIZE - 1;
+	forbidden  = protect_kernel_text(start, end);
+	forbidden |= protect_kernel_text_ro(start, end);
 
 	/* Check the PFN directly */
-	forbidden |= protect_pci_bios(pfn);
-	forbidden |= protect_rodata(pfn);
+	forbidden |= protect_pci_bios(pfn, pfn + npg - 1);
+	forbidden |= protect_rodata(pfn, pfn + npg - 1);
 
 	return __pgprot(pgprot_val(prot) & ~forbidden);
 }
@@ -667,10 +686,10 @@ static int __should_split_large_page(pte_t *kpte, unsigned long address,
 	 * in it results in a different pgprot than the first one of the
 	 * requested range. If yes, then the page needs to be split.
 	 */
-	new_prot = static_protections(req_prot, address, pfn);
+	new_prot = static_protections(req_prot, address, pfn, 1);
 	pfn = old_pfn;
 	for (i = 0, addr = lpaddr; i < numpages; i++, addr += PAGE_SIZE, pfn++) {
-		pgprot_t chk_prot = static_protections(req_prot, addr, pfn);
+		pgprot_t chk_prot = static_protections(req_prot, addr, pfn, 1);
 
 		if (pgprot_val(chk_prot) != pgprot_val(new_prot))
 			return 1;
@@ -1280,7 +1299,7 @@ repeat:
 		pgprot_val(new_prot) &= ~pgprot_val(cpa->mask_clr);
 		pgprot_val(new_prot) |= pgprot_val(cpa->mask_set);
 
-		new_prot = static_protections(new_prot, address, pfn);
+		new_prot = static_protections(new_prot, address, pfn, 1);
 
 		new_prot = pgprot_clear_protnone_bits(new_prot);
 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tip:x86/mm] x86/mm/cpa: Add debug mechanism
  2018-09-17 14:29 ` [patch V3 05/11] x86/mm/cpa: Add debug mechanism Thomas Gleixner
  2018-09-21 16:40   ` Dave Hansen
@ 2018-09-27 18:48   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot for Thomas Gleixner @ 2018-09-27 18:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, peterz, mingo, mark.gross, bin.yang, tglx, hpa,
	dave.hansen

Commit-ID:  4046460b867f8b041c81c26c09d3bcad6d6e814e
Gitweb:     https://git.kernel.org/tip/4046460b867f8b041c81c26c09d3bcad6d6e814e
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 17 Sep 2018 16:29:11 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 27 Sep 2018 20:39:38 +0200

x86/mm/cpa: Add debug mechanism

The whole static protection magic is silently fixing up anything which is
handed in. That's just wrong. The offending call sites need to be fixed.

Add a debug mechanism which emits a warning if a requested mapping needs to be
fixed up. The DETECT debug mechanism is really not meant to be enabled except
for developers, so limit the output hard to the protection fixups.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143546.078998733@linutronix.de

---
 arch/x86/mm/pageattr.c | 61 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 52 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 61ff27848452..1f2519055b30 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -42,6 +42,13 @@ struct cpa_data {
 	struct page	**pages;
 };
 
+enum cpa_warn {
+	CPA_PROTECT,
+	CPA_DETECT,
+};
+
+static const int cpa_warn_level = CPA_PROTECT;
+
 /*
  * Serialize cpa() (for !DEBUG_PAGEALLOC which uses large identity mappings)
  * using cpa_lock. So that we don't allow any other cpu, with stale large tlb
@@ -395,6 +402,28 @@ static pgprotval_t protect_kernel_text_ro(unsigned long start,
 }
 #endif
 
+static inline bool conflicts(pgprot_t prot, pgprotval_t val)
+{
+	return (pgprot_val(prot) & ~val) != pgprot_val(prot);
+}
+
+static inline void check_conflict(int warnlvl, pgprot_t prot, pgprotval_t val,
+				  unsigned long start, unsigned long end,
+				  unsigned long pfn, const char *txt)
+{
+	static const char *lvltxt[] = {
+		[CPA_PROTECT]	= "protect",
+		[CPA_DETECT]	= "detect",
+	};
+
+	if (warnlvl > cpa_warn_level || !conflicts(prot, val))
+		return;
+
+	pr_warn("CPA %8s %10s: 0x%016lx - 0x%016lx PFN %lx req %016llx prevent %016llx\n",
+		lvltxt[warnlvl], txt, start, end, pfn, (unsigned long long)pgprot_val(prot),
+		(unsigned long long)val);
+}
+
 /*
  * Certain areas of memory on x86 require very specific protection flags,
  * for example the BIOS area or kernel text. Callers don't always get this
@@ -402,19 +431,31 @@ static pgprotval_t protect_kernel_text_ro(unsigned long start,
  * checks and fixes these known static required protection bits.
  */
 static inline pgprot_t static_protections(pgprot_t prot, unsigned long start,
-					  unsigned long pfn, unsigned long npg)
+					  unsigned long pfn, unsigned long npg,
+					  int warnlvl)
 {
-	pgprotval_t forbidden;
+	pgprotval_t forbidden, res;
 	unsigned long end;
 
 	/* Operate on the virtual address */
 	end = start + npg * PAGE_SIZE - 1;
-	forbidden  = protect_kernel_text(start, end);
-	forbidden |= protect_kernel_text_ro(start, end);
+
+	res = protect_kernel_text(start, end);
+	check_conflict(warnlvl, prot, res, start, end, pfn, "Text NX");
+	forbidden = res;
+
+	res = protect_kernel_text_ro(start, end);
+	check_conflict(warnlvl, prot, res, start, end, pfn, "Text RO");
+	forbidden |= res;
 
 	/* Check the PFN directly */
-	forbidden |= protect_pci_bios(pfn, pfn + npg - 1);
-	forbidden |= protect_rodata(pfn, pfn + npg - 1);
+	res = protect_pci_bios(pfn, pfn + npg - 1);
+	check_conflict(warnlvl, prot, res, start, end, pfn, "PCIBIOS NX");
+	forbidden |= res;
+
+	res = protect_rodata(pfn, pfn + npg - 1);
+	check_conflict(warnlvl, prot, res, start, end, pfn, "Rodata RO");
+	forbidden |= res;
 
 	return __pgprot(pgprot_val(prot) & ~forbidden);
 }
@@ -686,10 +727,11 @@ static int __should_split_large_page(pte_t *kpte, unsigned long address,
 	 * in it results in a different pgprot than the first one of the
 	 * requested range. If yes, then the page needs to be split.
 	 */
-	new_prot = static_protections(req_prot, address, pfn, 1);
+	new_prot = static_protections(req_prot, address, pfn, 1, CPA_DETECT);
 	pfn = old_pfn;
 	for (i = 0, addr = lpaddr; i < numpages; i++, addr += PAGE_SIZE, pfn++) {
-		pgprot_t chk_prot = static_protections(req_prot, addr, pfn, 1);
+		pgprot_t chk_prot = static_protections(req_prot, addr, pfn, 1,
+						       CPA_DETECT);
 
 		if (pgprot_val(chk_prot) != pgprot_val(new_prot))
 			return 1;
@@ -1299,7 +1341,8 @@ repeat:
 		pgprot_val(new_prot) &= ~pgprot_val(cpa->mask_clr);
 		pgprot_val(new_prot) |= pgprot_val(cpa->mask_set);
 
-		new_prot = static_protections(new_prot, address, pfn, 1);
+		new_prot = static_protections(new_prot, address, pfn, 1,
+					      CPA_PROTECT);
 
 		new_prot = pgprot_clear_protnone_bits(new_prot);
 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tip:x86/mm] x86/mm/cpa: Add large page preservation statistics
  2018-09-17 14:29 ` [patch V3 06/11] x86/mm/cpa: Add large page preservation statistics Thomas Gleixner
  2018-09-21 19:59   ` Dave Hansen
@ 2018-09-27 18:48   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot for Thomas Gleixner @ 2018-09-27 18:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, tglx, linux-kernel, dave.hansen, bin.yang, mark.gross,
	hpa, mingo

Commit-ID:  5c280cf6081ff99078e28b51172d78359f194fd9
Gitweb:     https://git.kernel.org/tip/5c280cf6081ff99078e28b51172d78359f194fd9
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 17 Sep 2018 16:29:12 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 27 Sep 2018 20:39:38 +0200

x86/mm/cpa: Add large page preservation statistics

The large page preservation mechanism is just magic and provides no
information at all. Add optional statistic output in debugfs so the magic can
be evaluated. Defaults is off.

Output:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  540
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               800770
 4K pages set-checked:             7668

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143546.160867778@linutronix.de

---
 arch/x86/Kconfig       |  8 ++++
 arch/x86/mm/pageattr.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 105 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1a0be022f91d..65728bb1182c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1491,6 +1491,14 @@ config X86_DIRECT_GBPAGES
 	  supports them), so don't confuse the user by printing
 	  that we have them enabled.
 
+config X86_CPA_STATISTICS
+	bool "Enable statistic for Change Page Attribute"
+	depends on DEBUG_FS
+	---help---
+	  Expose statistics about the Change Page Attribute mechanims, which
+	  helps to determine the effectivness of preserving large and huge
+	  page mappings when mapping protections are changed.
+
 config ARCH_HAS_MEM_ENCRYPT
 	def_bool y
 
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 1f2519055b30..cdf52eb86f3a 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -101,6 +101,95 @@ void arch_report_meminfo(struct seq_file *m)
 static inline void split_page_count(int level) { }
 #endif
 
+#ifdef CONFIG_X86_CPA_STATISTICS
+
+static unsigned long cpa_1g_checked;
+static unsigned long cpa_1g_sameprot;
+static unsigned long cpa_1g_preserved;
+static unsigned long cpa_2m_checked;
+static unsigned long cpa_2m_sameprot;
+static unsigned long cpa_2m_preserved;
+static unsigned long cpa_4k_checked;
+static unsigned long cpa_4k_install;
+
+static inline void cpa_inc_1g_checked(void)
+{
+	cpa_1g_checked++;
+}
+
+static inline void cpa_inc_2m_checked(void)
+{
+	cpa_2m_checked++;
+}
+
+static inline void cpa_inc_4k_checked(void)
+{
+	cpa_4k_checked++;
+}
+
+static inline void cpa_inc_4k_install(void)
+{
+	cpa_4k_install++;
+}
+
+static inline void cpa_inc_lp_sameprot(int level)
+{
+	if (level == PG_LEVEL_1G)
+		cpa_1g_sameprot++;
+	else
+		cpa_2m_sameprot++;
+}
+
+static inline void cpa_inc_lp_preserved(int level)
+{
+	if (level == PG_LEVEL_1G)
+		cpa_1g_preserved++;
+	else
+		cpa_2m_preserved++;
+}
+
+static int cpastats_show(struct seq_file *m, void *p)
+{
+	seq_printf(m, "1G pages checked:     %16lu\n", cpa_1g_checked);
+	seq_printf(m, "1G pages sameprot:    %16lu\n", cpa_1g_sameprot);
+	seq_printf(m, "1G pages preserved:   %16lu\n", cpa_1g_preserved);
+	seq_printf(m, "2M pages checked:     %16lu\n", cpa_2m_checked);
+	seq_printf(m, "2M pages sameprot:    %16lu\n", cpa_2m_sameprot);
+	seq_printf(m, "2M pages preserved:   %16lu\n", cpa_2m_preserved);
+	seq_printf(m, "4K pages checked:     %16lu\n", cpa_4k_checked);
+	seq_printf(m, "4K pages set-checked: %16lu\n", cpa_4k_install);
+	return 0;
+}
+
+static int cpastats_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, cpastats_show, NULL);
+}
+
+static const struct file_operations cpastats_fops = {
+	.open		= cpastats_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static int __init cpa_stats_init(void)
+{
+	debugfs_create_file("cpa_stats", S_IRUSR, arch_debugfs_dir, NULL,
+			    &cpastats_fops);
+	return 0;
+}
+late_initcall(cpa_stats_init);
+#else
+static inline void cpa_inc_1g_checked(void) { }
+static inline void cpa_inc_2m_checked(void) { }
+static inline void cpa_inc_4k_checked(void) { }
+static inline void cpa_inc_4k_install(void) { }
+static inline void cpa_inc_lp_sameprot(int level) { }
+static inline void cpa_inc_lp_preserved(int level) { }
+#endif
+
+
 static inline int
 within(unsigned long addr, unsigned long start, unsigned long end)
 {
@@ -664,10 +753,12 @@ static int __should_split_large_page(pte_t *kpte, unsigned long address,
 	case PG_LEVEL_2M:
 		old_prot = pmd_pgprot(*(pmd_t *)kpte);
 		old_pfn = pmd_pfn(*(pmd_t *)kpte);
+		cpa_inc_2m_checked();
 		break;
 	case PG_LEVEL_1G:
 		old_prot = pud_pgprot(*(pud_t *)kpte);
 		old_pfn = pud_pfn(*(pud_t *)kpte);
+		cpa_inc_1g_checked();
 		break;
 	default:
 		return -EINVAL;
@@ -732,14 +823,16 @@ static int __should_split_large_page(pte_t *kpte, unsigned long address,
 	for (i = 0, addr = lpaddr; i < numpages; i++, addr += PAGE_SIZE, pfn++) {
 		pgprot_t chk_prot = static_protections(req_prot, addr, pfn, 1,
 						       CPA_DETECT);
-
+		cpa_inc_4k_checked();
 		if (pgprot_val(chk_prot) != pgprot_val(new_prot))
 			return 1;
 	}
 
 	/* If there are no changes, return. */
-	if (pgprot_val(new_prot) == pgprot_val(old_prot))
+	if (pgprot_val(new_prot) == pgprot_val(old_prot)) {
+		cpa_inc_lp_sameprot(level);
 		return 0;
+	}
 
 	/*
 	 * Verify that the address is aligned and the number of pages
@@ -752,6 +845,7 @@ static int __should_split_large_page(pte_t *kpte, unsigned long address,
 	new_pte = pfn_pte(old_pfn, new_prot);
 	__set_pmd_pte(kpte, address, new_pte);
 	cpa->flags |= CPA_FLUSHTLB;
+	cpa_inc_lp_preserved(level);
 	return 0;
 }
 
@@ -1341,6 +1435,7 @@ repeat:
 		pgprot_val(new_prot) &= ~pgprot_val(cpa->mask_clr);
 		pgprot_val(new_prot) |= pgprot_val(cpa->mask_set);
 
+		cpa_inc_4k_install();
 		new_prot = static_protections(new_prot, address, pfn, 1,
 					      CPA_PROTECT);
 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tip:x86/mm] x86/mm/cpa: Avoid static protection checks on unmap
  2018-09-17 14:29 ` [patch V3 07/11] x86/mm/cpa: Avoid static protection checks on unmap Thomas Gleixner
  2018-09-21 20:01   ` Dave Hansen
@ 2018-09-27 18:49   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot for Thomas Gleixner @ 2018-09-27 18:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, hpa, linux-kernel, dave.hansen, mingo, bin.yang,
	mark.gross, peterz

Commit-ID:  69c31e69df3d2efc4ad7f53d500fdd920d3865a4
Gitweb:     https://git.kernel.org/tip/69c31e69df3d2efc4ad7f53d500fdd920d3865a4
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 17 Sep 2018 16:29:13 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 27 Sep 2018 20:39:38 +0200

x86/mm/cpa: Avoid static protection checks on unmap

If the new pgprot has the PRESENT bit cleared, then conflicts vs. RW/NX are
completely irrelevant.

Before:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  540
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               800770
 4K pages set-checked:             7668

After:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  540
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               800709
 4K pages set-checked:             7668

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143546.245849757@linutronix.de

---
 arch/x86/mm/pageattr.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index cdf52eb86f3a..8f9083eb21ac 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -526,6 +526,13 @@ static inline pgprot_t static_protections(pgprot_t prot, unsigned long start,
 	pgprotval_t forbidden, res;
 	unsigned long end;
 
+	/*
+	 * There is no point in checking RW/NX conflicts when the requested
+	 * mapping is setting the page !PRESENT.
+	 */
+	if (!(pgprot_val(prot) & _PAGE_PRESENT))
+		return prot;
+
 	/* Operate on the virtual address */
 	end = start + npg * PAGE_SIZE - 1;
 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tip:x86/mm] x86/mm/cpa: Add sanity check for existing mappings
  2018-09-17 14:29 ` [patch V3 08/11] x86/mm/cpa: Add sanity check for existing mappings Thomas Gleixner
  2018-09-18  7:14   ` Peter Zijlstra
  2018-09-21 20:07   ` Dave Hansen
@ 2018-09-27 18:49   ` tip-bot for Thomas Gleixner
  2 siblings, 0 replies; 41+ messages in thread
From: tip-bot for Thomas Gleixner @ 2018-09-27 18:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mark.gross, tglx, linux-kernel, dave.hansen, bin.yang, peterz,
	mingo, hpa

Commit-ID:  f61c5ba2885eabc7bc4b0b2f5232f475216ba446
Gitweb:     https://git.kernel.org/tip/f61c5ba2885eabc7bc4b0b2f5232f475216ba446
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 17 Sep 2018 16:29:14 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 27 Sep 2018 20:39:39 +0200

x86/mm/cpa: Add sanity check for existing mappings

With the range check it is possible to do a quick verification that the
current mapping is correct vs. the static protection areas.

In case a incorrect mapping is detected a warning is emitted and the large
page is split up. If the large page is a 2M page, then the split code is
forced to check the static protections for the PTE entries to fix up the
incorrectness. For 1G pages this can't be done easily because that would
require to either find the offending 2M areas before the split or
afterwards. For now just warn about that case and revisit it when reported.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143546.331408643@linutronix.de

---
 arch/x86/mm/pageattr.c | 77 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 67 insertions(+), 10 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 8f9083eb21ac..19781b0ab4b4 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -37,12 +37,14 @@ struct cpa_data {
 	unsigned long	numpages;
 	int		flags;
 	unsigned long	pfn;
-	unsigned	force_split : 1;
+	unsigned	force_split		: 1,
+			force_static_prot	: 1;
 	int		curpage;
 	struct page	**pages;
 };
 
 enum cpa_warn {
+	CPA_CONFLICT,
 	CPA_PROTECT,
 	CPA_DETECT,
 };
@@ -501,6 +503,7 @@ static inline void check_conflict(int warnlvl, pgprot_t prot, pgprotval_t val,
 				  unsigned long pfn, const char *txt)
 {
 	static const char *lvltxt[] = {
+		[CPA_CONFLICT]	= "conflict",
 		[CPA_PROTECT]	= "protect",
 		[CPA_DETECT]	= "detect",
 	};
@@ -743,7 +746,7 @@ static int __should_split_large_page(pte_t *kpte, unsigned long address,
 				     struct cpa_data *cpa)
 {
 	unsigned long numpages, pmask, psize, lpaddr, addr, pfn, old_pfn;
-	pgprot_t old_prot, new_prot, req_prot;
+	pgprot_t old_prot, new_prot, req_prot, chk_prot;
 	pte_t new_pte, old_pte, *tmp;
 	enum pg_level level;
 	int i;
@@ -819,6 +822,23 @@ static int __should_split_large_page(pte_t *kpte, unsigned long address,
 	lpaddr = address & pmask;
 	numpages = psize >> PAGE_SHIFT;
 
+	/*
+	 * Sanity check that the existing mapping is correct versus the static
+	 * protections. static_protections() guards against !PRESENT, so no
+	 * extra conditional required here.
+	 */
+	chk_prot = static_protections(old_prot, lpaddr, old_pfn, numpages,
+				      CPA_CONFLICT);
+
+	if (WARN_ON_ONCE(pgprot_val(chk_prot) != pgprot_val(old_prot))) {
+		/*
+		 * Split the large page and tell the split code to
+		 * enforce static protections.
+		 */
+		cpa->force_static_prot = 1;
+		return 1;
+	}
+
 	/*
 	 * Make sure that the requested pgprot does not violate the static
 	 * protections. Check the full large page whether one of the pages
@@ -828,8 +848,8 @@ static int __should_split_large_page(pte_t *kpte, unsigned long address,
 	new_prot = static_protections(req_prot, address, pfn, 1, CPA_DETECT);
 	pfn = old_pfn;
 	for (i = 0, addr = lpaddr; i < numpages; i++, addr += PAGE_SIZE, pfn++) {
-		pgprot_t chk_prot = static_protections(req_prot, addr, pfn, 1,
-						       CPA_DETECT);
+		chk_prot = static_protections(req_prot, addr, pfn, 1,
+					      CPA_DETECT);
 		cpa_inc_4k_checked();
 		if (pgprot_val(chk_prot) != pgprot_val(new_prot))
 			return 1;
@@ -871,15 +891,50 @@ static int should_split_large_page(pte_t *kpte, unsigned long address,
 	return do_split;
 }
 
+static void split_set_pte(struct cpa_data *cpa, pte_t *pte, unsigned long pfn,
+			  pgprot_t ref_prot, unsigned long address,
+			  unsigned long size)
+{
+	unsigned int npg = PFN_DOWN(size);
+	pgprot_t prot;
+
+	/*
+	 * If should_split_large_page() discovered an inconsistent mapping,
+	 * remove the invalid protection in the split mapping.
+	 */
+	if (!cpa->force_static_prot)
+		goto set;
+
+	prot = static_protections(ref_prot, address, pfn, npg, CPA_PROTECT);
+
+	if (pgprot_val(prot) == pgprot_val(ref_prot))
+		goto set;
+
+	/*
+	 * If this is splitting a PMD, fix it up. PUD splits cannot be
+	 * fixed trivially as that would require to rescan the newly
+	 * installed PMD mappings after returning from split_large_page()
+	 * so an eventual further split can allocate the necessary PTE
+	 * pages. Warn for now and revisit it in case this actually
+	 * happens.
+	 */
+	if (size == PAGE_SIZE)
+		ref_prot = prot;
+	else
+		pr_warn_once("CPA: Cannot fixup static protections for PUD split\n");
+set:
+	set_pte(pte, pfn_pte(pfn, ref_prot));
+}
+
 static int
 __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
 		   struct page *base)
 {
+	unsigned long lpaddr, lpinc, ref_pfn, pfn, pfninc = 1;
 	pte_t *pbase = (pte_t *)page_address(base);
-	unsigned long ref_pfn, pfn, pfninc = 1;
 	unsigned int i, level;
-	pte_t *tmp;
 	pgprot_t ref_prot;
+	pte_t *tmp;
 
 	spin_lock(&pgd_lock);
 	/*
@@ -902,15 +957,17 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
 		 * PAT bit to correct position.
 		 */
 		ref_prot = pgprot_large_2_4k(ref_prot);
-
 		ref_pfn = pmd_pfn(*(pmd_t *)kpte);
+		lpaddr = address & PMD_MASK;
+		lpinc = PAGE_SIZE;
 		break;
 
 	case PG_LEVEL_1G:
 		ref_prot = pud_pgprot(*(pud_t *)kpte);
 		ref_pfn = pud_pfn(*(pud_t *)kpte);
 		pfninc = PMD_PAGE_SIZE >> PAGE_SHIFT;
-
+		lpaddr = address & PUD_MASK;
+		lpinc = PMD_SIZE;
 		/*
 		 * Clear the PSE flags if the PRESENT flag is not set
 		 * otherwise pmd_present/pmd_huge will return true
@@ -931,8 +988,8 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
 	 * Get the target pfn from the original entry:
 	 */
 	pfn = ref_pfn;
-	for (i = 0; i < PTRS_PER_PTE; i++, pfn += pfninc)
-		set_pte(&pbase[i], pfn_pte(pfn, ref_prot));
+	for (i = 0; i < PTRS_PER_PTE; i++, pfn += pfninc, lpaddr += lpinc)
+		split_set_pte(cpa, pbase + i, pfn, ref_prot, lpaddr, lpinc);
 
 	if (virt_addr_valid(address)) {
 		unsigned long pfn = PFN_DOWN(__pa(address));

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tip:x86/mm] x86/mm/cpa: Optimize same protection check
  2018-09-17 14:29 ` [patch V3 09/11] x86/mm/cpa: Optimize same protection check Thomas Gleixner
  2018-09-21 20:12   ` Dave Hansen
@ 2018-09-27 18:50   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot for Thomas Gleixner @ 2018-09-27 18:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.hansen, hpa, mingo, linux-kernel, peterz, mark.gross, tglx,
	bin.yang

Commit-ID:  1c4b406ee89c2c4210f2e19b97d39215f445c316
Gitweb:     https://git.kernel.org/tip/1c4b406ee89c2c4210f2e19b97d39215f445c316
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 17 Sep 2018 16:29:15 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 27 Sep 2018 20:39:39 +0200

x86/mm/cpa: Optimize same protection check

When the existing mapping is correct and the new requested page protections
are the same as the existing ones, then further checks can be omitted and the
large page can be preserved. The slow path 4k wise check will not come up with
a different result.

Before:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  540
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               800709
 4K pages set-checked:             7668

After:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  538
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               560642
 4K pages set-checked:             7668

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143546.424477581@linutronix.de

---
 arch/x86/mm/pageattr.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 19781b0ab4b4..5160334f9095 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -839,6 +839,20 @@ static int __should_split_large_page(pte_t *kpte, unsigned long address,
 		return 1;
 	}
 
+	/*
+	 * Optimization: If the requested pgprot is the same as the current
+	 * pgprot, then the large page can be preserved and no updates are
+	 * required independent of alignment and length of the requested
+	 * range. The above already established that the current pgprot is
+	 * correct, which in consequence makes the requested pgprot correct
+	 * as well if it is the same. The static protection scan below will
+	 * not come to a different conclusion.
+	 */
+	if (pgprot_val(req_prot) == pgprot_val(old_prot)) {
+		cpa_inc_lp_sameprot(level);
+		return 0;
+	}
+
 	/*
 	 * Make sure that the requested pgprot does not violate the static
 	 * protections. Check the full large page whether one of the pages

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tip:x86/mm] x86/mm/cpa: Do the range check early
  2018-09-17 14:29 ` [patch V3 10/11] x86/mm/cpa: Do the range check early Thomas Gleixner
  2018-09-21 20:26   ` Dave Hansen
@ 2018-09-27 18:50   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot for Thomas Gleixner @ 2018-09-27 18:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bin.yang, mingo, hpa, peterz, mark.gross, tglx, dave.hansen,
	linux-kernel

Commit-ID:  9cc9f17a5a0a8564b41b7c5c460e7f078c42d712
Gitweb:     https://git.kernel.org/tip/9cc9f17a5a0a8564b41b7c5c460e7f078c42d712
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 17 Sep 2018 16:29:16 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 27 Sep 2018 20:39:39 +0200

x86/mm/cpa: Do the range check early

To avoid excessive 4k wise checks in the common case do a quick check first
whether the requested new page protections conflict with a static
protection area in the large page. If there is no conflict then the
decision whether to preserve or to split the page can be made immediately.

If the requested range covers the full large page, preserve it. Otherwise
split it up. No point in doing a slow crawl in 4k steps.

Before:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  538
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               560642
 4K pages set-checked:             7668

After:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  541
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:                  514
 4K pages set-checked:             7668

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143546.507259989@linutronix.de

---
 arch/x86/mm/pageattr.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 5160334f9095..bbc5eb5cbc9d 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -854,10 +854,28 @@ static int __should_split_large_page(pte_t *kpte, unsigned long address,
 	}
 
 	/*
-	 * Make sure that the requested pgprot does not violate the static
-	 * protections. Check the full large page whether one of the pages
-	 * in it results in a different pgprot than the first one of the
-	 * requested range. If yes, then the page needs to be split.
+	 * Optimization: Check whether the requested pgprot is conflicting
+	 * with a static protection requirement in the large page. If not,
+	 * then checking whether the requested range is fully covering the
+	 * large page can be done right here.
+	 */
+	new_prot = static_protections(req_prot, lpaddr, old_pfn, numpages,
+				      CPA_DETECT);
+
+	if (pgprot_val(req_prot) == pgprot_val(new_prot)) {
+		if (address != lpaddr || cpa->numpages != numpages)
+			return 1;
+		goto setlp;
+	}
+
+	/*
+	 * Slow path. The full large page check above established that the
+	 * requested pgprot cannot be applied to the full large page due to
+	 * conflicting requirements of static protection regions. It might
+	 * turn out that the whole requested range is covered by the
+	 * modified protection of the first 4k segment at @address. This
+	 * might result in the ability to preserve the large page
+	 * nevertheless.
 	 */
 	new_prot = static_protections(req_prot, address, pfn, 1, CPA_DETECT);
 	pfn = old_pfn;
@@ -882,6 +900,7 @@ static int __should_split_large_page(pte_t *kpte, unsigned long address,
 	if (address != lpaddr || cpa->numpages != numpages)
 		return 1;
 
+setlp:
 	/* All checks passed. Update the large page mapping. */
 	new_pte = pfn_pte(old_pfn, new_prot);
 	__set_pmd_pte(kpte, address, new_pte);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tip:x86/mm] x86/mm/cpa: Avoid the 4k pages check completely
  2018-09-17 14:29 ` [patch V3 11/11] x86/mm/cpa: Avoid the 4k pages check completely Thomas Gleixner
  2018-09-21 20:32   ` Dave Hansen
@ 2018-09-27 18:51   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot for Thomas Gleixner @ 2018-09-27 18:51 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, dave.hansen, mark.gross, mingo, peterz, bin.yang, hpa,
	linux-kernel

Commit-ID:  585948f4f695b07204702cfee0f828424af32aa7
Gitweb:     https://git.kernel.org/tip/585948f4f695b07204702cfee0f828424af32aa7
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 17 Sep 2018 16:29:17 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 27 Sep 2018 20:39:39 +0200

x86/mm/cpa: Avoid the 4k pages check completely

The extra loop which tries hard to preserve large pages in case of conflicts
with static protection regions turns out to be not preserving anything, at
least not in the experiments which have been conducted.

There might be corner cases in which the code would be able to preserve a
large page oaccsionally, but it's really not worth the extra code and the
cycles wasted in the common case.

Before:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  541
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:                  514
 4K pages set-checked:             7668

After:
 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  538
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages set-checked:             7668

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143546.589642503@linutronix.de

---
 arch/x86/mm/pageattr.c | 64 +++++++++++++-------------------------------------
 1 file changed, 16 insertions(+), 48 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index bbc5eb5cbc9d..4e55ded01be5 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -111,7 +111,6 @@ static unsigned long cpa_1g_preserved;
 static unsigned long cpa_2m_checked;
 static unsigned long cpa_2m_sameprot;
 static unsigned long cpa_2m_preserved;
-static unsigned long cpa_4k_checked;
 static unsigned long cpa_4k_install;
 
 static inline void cpa_inc_1g_checked(void)
@@ -124,11 +123,6 @@ static inline void cpa_inc_2m_checked(void)
 	cpa_2m_checked++;
 }
 
-static inline void cpa_inc_4k_checked(void)
-{
-	cpa_4k_checked++;
-}
-
 static inline void cpa_inc_4k_install(void)
 {
 	cpa_4k_install++;
@@ -158,7 +152,6 @@ static int cpastats_show(struct seq_file *m, void *p)
 	seq_printf(m, "2M pages checked:     %16lu\n", cpa_2m_checked);
 	seq_printf(m, "2M pages sameprot:    %16lu\n", cpa_2m_sameprot);
 	seq_printf(m, "2M pages preserved:   %16lu\n", cpa_2m_preserved);
-	seq_printf(m, "4K pages checked:     %16lu\n", cpa_4k_checked);
 	seq_printf(m, "4K pages set-checked: %16lu\n", cpa_4k_install);
 	return 0;
 }
@@ -185,7 +178,6 @@ late_initcall(cpa_stats_init);
 #else
 static inline void cpa_inc_1g_checked(void) { }
 static inline void cpa_inc_2m_checked(void) { }
-static inline void cpa_inc_4k_checked(void) { }
 static inline void cpa_inc_4k_install(void) { }
 static inline void cpa_inc_lp_sameprot(int level) { }
 static inline void cpa_inc_lp_preserved(int level) { }
@@ -745,11 +737,10 @@ static pgprot_t pgprot_clear_protnone_bits(pgprot_t prot)
 static int __should_split_large_page(pte_t *kpte, unsigned long address,
 				     struct cpa_data *cpa)
 {
-	unsigned long numpages, pmask, psize, lpaddr, addr, pfn, old_pfn;
+	unsigned long numpages, pmask, psize, lpaddr, pfn, old_pfn;
 	pgprot_t old_prot, new_prot, req_prot, chk_prot;
 	pte_t new_pte, old_pte, *tmp;
 	enum pg_level level;
-	int i;
 
 	/*
 	 * Check for races, another CPU might have split this page
@@ -854,53 +845,30 @@ static int __should_split_large_page(pte_t *kpte, unsigned long address,
 	}
 
 	/*
-	 * Optimization: Check whether the requested pgprot is conflicting
-	 * with a static protection requirement in the large page. If not,
-	 * then checking whether the requested range is fully covering the
-	 * large page can be done right here.
+	 * If the requested range does not cover the full page, split it up
 	 */
-	new_prot = static_protections(req_prot, lpaddr, old_pfn, numpages,
-				      CPA_DETECT);
-
-	if (pgprot_val(req_prot) == pgprot_val(new_prot)) {
-		if (address != lpaddr || cpa->numpages != numpages)
-			return 1;
-		goto setlp;
-	}
+	if (address != lpaddr || cpa->numpages != numpages)
+		return 1;
 
 	/*
-	 * Slow path. The full large page check above established that the
-	 * requested pgprot cannot be applied to the full large page due to
-	 * conflicting requirements of static protection regions. It might
-	 * turn out that the whole requested range is covered by the
-	 * modified protection of the first 4k segment at @address. This
-	 * might result in the ability to preserve the large page
-	 * nevertheless.
+	 * Check whether the requested pgprot is conflicting with a static
+	 * protection requirement in the large page.
 	 */
-	new_prot = static_protections(req_prot, address, pfn, 1, CPA_DETECT);
-	pfn = old_pfn;
-	for (i = 0, addr = lpaddr; i < numpages; i++, addr += PAGE_SIZE, pfn++) {
-		chk_prot = static_protections(req_prot, addr, pfn, 1,
-					      CPA_DETECT);
-		cpa_inc_4k_checked();
-		if (pgprot_val(chk_prot) != pgprot_val(new_prot))
-			return 1;
-	}
-
-	/* If there are no changes, return. */
-	if (pgprot_val(new_prot) == pgprot_val(old_prot)) {
-		cpa_inc_lp_sameprot(level);
-		return 0;
-	}
+	new_prot = static_protections(req_prot, lpaddr, old_pfn, numpages,
+				      CPA_DETECT);
 
 	/*
-	 * Verify that the address is aligned and the number of pages
-	 * covers the full page.
+	 * If there is a conflict, split the large page.
+	 *
+	 * There used to be a 4k wise evaluation trying really hard to
+	 * preserve the large pages, but experimentation has shown, that this
+	 * does not help at all. There might be corner cases which would
+	 * preserve one large page occasionally, but it's really not worth the
+	 * extra code and cycles for the common case.
 	 */
-	if (address != lpaddr || cpa->numpages != numpages)
+	if (pgprot_val(req_prot) != pgprot_val(new_prot))
 		return 1;
 
-setlp:
 	/* All checks passed. Update the large page mapping. */
 	new_pte = pfn_pte(old_pfn, new_prot);
 	__set_pmd_pte(kpte, address, new_pte);

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2018-09-27 18:52 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-17 14:29 [patch V3 00/11] x86/mm/cpa: Improve large page preservation handling Thomas Gleixner
2018-09-17 14:29 ` [patch V3 01/11] x86/mm/init32: Mark text and rodata RO in one go Thomas Gleixner
2018-09-21 16:15   ` Dave Hansen
2018-09-27 18:45   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
2018-09-17 14:29 ` [patch V3 02/11] x86/mm/cpa: Split, rename and clean up try_preserve_large_page() Thomas Gleixner
2018-09-18  7:03   ` Peter Zijlstra
2018-09-18  8:19   ` Peter Zijlstra
2018-09-18 12:14     ` Peter Zijlstra
2018-09-18 22:34       ` Thomas Gleixner
2018-09-21 16:22   ` Dave Hansen
2018-09-27 18:46   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
2018-09-17 14:29 ` [patch V3 03/11] x86/mm/cpa: Rework static_protections() Thomas Gleixner
2018-09-21 16:33   ` Dave Hansen
2018-09-27 18:46   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
2018-09-17 14:29 ` [patch V3 04/11] x86/mm/cpa: Allow range check for static protections Thomas Gleixner
2018-09-21 16:36   ` Dave Hansen
2018-09-27 18:47   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
2018-09-17 14:29 ` [patch V3 05/11] x86/mm/cpa: Add debug mechanism Thomas Gleixner
2018-09-21 16:40   ` Dave Hansen
2018-09-22 10:33     ` Peter Zijlstra
2018-09-27 18:48   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
2018-09-17 14:29 ` [patch V3 06/11] x86/mm/cpa: Add large page preservation statistics Thomas Gleixner
2018-09-21 19:59   ` Dave Hansen
2018-09-27 18:48   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
2018-09-17 14:29 ` [patch V3 07/11] x86/mm/cpa: Avoid static protection checks on unmap Thomas Gleixner
2018-09-21 20:01   ` Dave Hansen
2018-09-27 18:49   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
2018-09-17 14:29 ` [patch V3 08/11] x86/mm/cpa: Add sanity check for existing mappings Thomas Gleixner
2018-09-18  7:14   ` Peter Zijlstra
2018-09-21 20:07   ` Dave Hansen
2018-09-27 18:49   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
2018-09-17 14:29 ` [patch V3 09/11] x86/mm/cpa: Optimize same protection check Thomas Gleixner
2018-09-21 20:12   ` Dave Hansen
2018-09-27 18:07     ` Thomas Gleixner
2018-09-27 18:50   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
2018-09-17 14:29 ` [patch V3 10/11] x86/mm/cpa: Do the range check early Thomas Gleixner
2018-09-21 20:26   ` Dave Hansen
2018-09-27 18:50   ` [tip:x86/mm] " tip-bot for Thomas Gleixner
2018-09-17 14:29 ` [patch V3 11/11] x86/mm/cpa: Avoid the 4k pages check completely Thomas Gleixner
2018-09-21 20:32   ` Dave Hansen
2018-09-27 18:51   ` [tip:x86/mm] " tip-bot for Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).