linux-kernel.vger.kernel.org archive mirror
* [PATCH] [0/31] Great change_page_attr patch series v2
@ 2008-01-14 22:16 Andi Kleen
  2008-01-14 22:16 ` [PATCH] [1/31] Shrink __PAGE_KERNEL/__PAGE_KERNEL_EXEC on non PAE kernels Andi Kleen
                   ` (31 more replies)
  0 siblings, 32 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Lots of improvements to change_page_attr(). Make it a lot more
efficient and fix various bugs.

Changes against the earlier version:

- Fixed some checkpatch.pl complaints
- Improved self test suite
- Fixed more reference bugs 
- Fix NX handling on 32bit
- Remove some useless code there
- Few other random fixes
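
For context, the interface this series reworks is used roughly like this
(illustrative sketch only; "addr" and "numpages" here are made-up placeholders,
not values from any patch):

	/* Change attributes of pages in the kernel linear mapping, then
	 * flush.  global_flush_tlb() must follow change_page_attr() calls. */
	struct page *pg = virt_to_page(addr);

	change_page_attr(pg, numpages, PAGE_KERNEL_RO);	/* e.g. write protect */
	global_flush_tlb();

	change_page_attr(pg, numpages, PAGE_KERNEL);	/* back to default */
	global_flush_tlb();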

-Andi


* [PATCH] [1/31] Shrink __PAGE_KERNEL/__PAGE_KERNEL_EXEC on non PAE kernels
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [2/31] CPA: Do a simple self test at boot Andi Kleen
                   ` (30 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


No need to make them 64bit on non-PAE kernels.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/init_32.c     |    4 ++--
 include/asm-x86/pgtable.h |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

Index: linux/arch/x86/mm/init_32.c
===================================================================
--- linux.orig/arch/x86/mm/init_32.c
+++ linux/arch/x86/mm/init_32.c
@@ -326,9 +326,9 @@ static void __init set_highmem_pages_ini
 #define set_highmem_pages_init(bad_ppro) do { } while (0)
 #endif /* CONFIG_HIGHMEM */
 
-unsigned long long __PAGE_KERNEL = _PAGE_KERNEL;
+pteval_t __PAGE_KERNEL = _PAGE_KERNEL;
 EXPORT_SYMBOL(__PAGE_KERNEL);
-unsigned long long __PAGE_KERNEL_EXEC = _PAGE_KERNEL_EXEC;
+pteval_t __PAGE_KERNEL_EXEC = _PAGE_KERNEL_EXEC;
 
 #ifdef CONFIG_NUMA
 extern void __init remap_numa_kva(void);
Index: linux/include/asm-x86/pgtable.h
===================================================================
--- linux.orig/include/asm-x86/pgtable.h
+++ linux/include/asm-x86/pgtable.h
@@ -64,7 +64,7 @@
 #define _PAGE_KERNEL (_PAGE_KERNEL_EXEC | _PAGE_NX)
 
 #ifndef __ASSEMBLY__
-extern unsigned long long __PAGE_KERNEL, __PAGE_KERNEL_EXEC;
+extern pteval_t __PAGE_KERNEL, __PAGE_KERNEL_EXEC;
 #endif	/* __ASSEMBLY__ */
 #else
 #define __PAGE_KERNEL_EXEC						\


* [PATCH] [2/31] CPA: Do a simple self test at boot
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
  2008-01-14 22:16 ` [PATCH] [1/31] Shrink __PAGE_KERNEL/__PAGE_KERNEL_EXEC on non PAE kernels Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-15  8:47   ` Harvey Harrison
  2008-01-14 22:16 ` [PATCH] [3/31] Add pte accessors for the global bit Andi Kleen
                   ` (29 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


When CONFIG_DEBUG_RODATA is enabled, undo the RO mapping and then redo it again.
This gives some simple test coverage for change_page_attr().

Optional patch, but I find it useful.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/Kconfig.debug |    5 +++++
 arch/x86/mm/init_32.c  |   26 ++++++++++++++++++++++++++
 arch/x86/mm/init_64.c  |   10 ++++++++++
 3 files changed, 41 insertions(+)

Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -603,6 +603,16 @@ void mark_rodata_ro(void)
 	 * of who is the culprit.
 	 */
 	global_flush_tlb();
+
+#ifdef CONFIG_CPA_DEBUG
+	printk("Testing CPA: undo %lx-%lx\n", start, end);
+	change_page_attr_addr(start, (end - start) >> PAGE_SHIFT, PAGE_KERNEL);
+	global_flush_tlb();
+
+	printk("Testing CPA: again\n");
+	change_page_attr_addr(start, (end - start) >> PAGE_SHIFT, PAGE_KERNEL_RO);
+	global_flush_tlb();
+#endif
 }
 #endif
 
Index: linux/arch/x86/mm/init_32.c
===================================================================
--- linux.orig/arch/x86/mm/init_32.c
+++ linux/arch/x86/mm/init_32.c
@@ -793,6 +793,20 @@ void mark_rodata_ro(void)
 		change_page_attr(virt_to_page(start),
 		                 size >> PAGE_SHIFT, PAGE_KERNEL_RX);
 		printk("Write protecting the kernel text: %luk\n", size >> 10);
+
+#ifdef CONFIG_CPA_DEBUG
+		global_flush_tlb();
+
+		printk("Testing CPA: Reverting %lx-%lx\n", start, start+size);
+		change_page_attr(virt_to_page(start), size>>PAGE_SHIFT,
+				 PAGE_KERNEL_EXEC);
+		global_flush_tlb();
+
+		printk("Testing CPA: write protecting again\n");
+		change_page_attr(virt_to_page(start), size>>PAGE_SHIFT,
+				PAGE_KERNEL_RX);
+		global_flush_tlb();
+#endif
 	}
 #endif
 	start += size;
@@ -809,6 +823,18 @@ void mark_rodata_ro(void)
 	 * of who is the culprit.
 	 */
 	global_flush_tlb();
+
+#ifdef CONFIG_CPA_DEBUG
+	printk("Testing CPA: undo %lx-%lx\n", start, start + size);
+	change_page_attr(virt_to_page(start), size >> PAGE_SHIFT,
+				PAGE_KERNEL);
+	global_flush_tlb();
+
+	printk("Testing CPA: write protecting again\n");
+	change_page_attr(virt_to_page(start), size >> PAGE_SHIFT,
+				PAGE_KERNEL_RO);
+	global_flush_tlb();
+#endif
 }
 #endif
 
Index: linux/arch/x86/Kconfig.debug
===================================================================
--- linux.orig/arch/x86/Kconfig.debug
+++ linux/arch/x86/Kconfig.debug
@@ -192,4 +192,9 @@ config DEBUG_BOOT_PARAMS
 	help
 	  This option will cause struct boot_params to be exported via debugfs.
 
+config CPA_DEBUG
+	bool "CPA self test code"
+	help
+	  Do change_page_attr self tests at boot.
+
 endmenu


* [PATCH] [3/31] Add pte accessors for the global bit
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
  2008-01-14 22:16 ` [PATCH] [1/31] Shrink __PAGE_KERNEL/__PAGE_KERNEL_EXEC on non PAE kernels Andi Kleen
  2008-01-14 22:16 ` [PATCH] [2/31] CPA: Do a simple self test at boot Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [4/31] Add pte_pgprot on i386 Andi Kleen
                   ` (28 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Needed for some test code.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 include/asm-x86/pgtable.h |    3 +++
 1 file changed, 3 insertions(+)

Index: linux/include/asm-x86/pgtable.h
===================================================================
--- linux.orig/include/asm-x86/pgtable.h
+++ linux/include/asm-x86/pgtable.h
@@ -134,6 +134,7 @@ static inline int pte_young(pte_t pte)		
 static inline int pte_write(pte_t pte)		{ return pte_val(pte) & _PAGE_RW; }
 static inline int pte_file(pte_t pte)		{ return pte_val(pte) & _PAGE_FILE; }
 static inline int pte_huge(pte_t pte)		{ return pte_val(pte) & _PAGE_PSE; }
+static inline int pte_global(pte_t pte) 	{ return pte_val(pte) & _PAGE_GLOBAL; }
 
 static inline int pmd_large(pmd_t pte) {
 	return (pmd_val(pte) & (_PAGE_PSE|_PAGE_PRESENT)) ==
@@ -149,6 +150,8 @@ static inline pte_t pte_mkyoung(pte_t pt
 static inline pte_t pte_mkwrite(pte_t pte)	{ return __pte(pte_val(pte) | _PAGE_RW); }
 static inline pte_t pte_mkhuge(pte_t pte)	{ return __pte(pte_val(pte) | _PAGE_PSE); }
 static inline pte_t pte_clrhuge(pte_t pte)	{ return __pte(pte_val(pte) & ~_PAGE_PSE); }
+static inline pte_t pte_mkglobal(pte_t pte)	{ return __pte(pte_val(pte) | _PAGE_GLOBAL); }
+static inline pte_t pte_clrglobal(pte_t pte)	{ return __pte(pte_val(pte) & ~_PAGE_GLOBAL); }
 
 extern pteval_t __supported_pte_mask;
 


* [PATCH] [4/31] Add pte_pgprot on i386
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (2 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [3/31] Add pte accessors for the global bit Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-15 13:00   ` Johannes Weiner
  2008-01-14 22:16 ` [PATCH] [5/31] Don't drop NX bit in pte modifier functions for 32bit Andi Kleen
                   ` (27 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


64bit already had it.

Needed for later patches.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 include/asm-x86/pgtable-2level.h |    2 ++
 include/asm-x86/pgtable-3level.h |    2 ++
 2 files changed, 4 insertions(+)

Index: linux/include/asm-x86/pgtable-2level.h
===================================================================
--- linux.orig/include/asm-x86/pgtable-2level.h
+++ linux/include/asm-x86/pgtable-2level.h
@@ -75,6 +75,8 @@ static inline int pte_exec_kernel(pte_t 
 #define pgoff_to_pte(off) \
 	((pte_t) { .pte_low = (((off) & 0x1f) << 1) + (((off) >> 5) << 8) + _PAGE_FILE })
 
+#define pte_pgprot(x) __pgprot((x).pte_low & 0xfff)
+
 /* Encode and de-code a swap entry */
 #define __swp_type(x)			(((x).val >> 1) & 0x1f)
 #define __swp_offset(x)			((x).val >> 8)
Index: linux/include/asm-x86/pgtable-3level.h
===================================================================
--- linux.orig/include/asm-x86/pgtable-3level.h
+++ linux/include/asm-x86/pgtable-3level.h
@@ -142,6 +142,8 @@ static inline unsigned long pte_pfn(pte_
 	return pte_val(pte) >> PAGE_SHIFT;
 }
 
+#define pte_pgprot(x) __pgprot(pte_val(x) & (0xfff | _PAGE_NX))
+
 /*
  * Bits 0, 6 and 7 are taken in the low part of the pte,
  * put the 32 bits of offset into the high part.


* [PATCH] [5/31] Don't drop NX bit in pte modifier functions for 32bit
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (3 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [4/31] Add pte_pgprot on i386 Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [6/31] CPA: Undo white space changes Andi Kleen
                   ` (26 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


The pte_* modifier functions that clear bits dropped the NX bit on 32bit
PAE because their bit arithmetic was done as int, but NX is in bit 63. Fix that
by adding appropriate casts so that the arithmetic happens as long long
on PAE kernels.

I decided to just use 64bit arithmetic instead of open coding it the way
pte_modify() does, because gcc should generate good enough code for that now.

While in theory this looks like a .24 candidate, it might trigger
some subtle latent bugs, so it's better to delay it to .25 for more
testing.
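
A standalone sketch of the promotion problem (illustrative only; the 32-bit
unsigned long and the example constant are assumptions of the sketch, not
taken from the real headers):

	typedef unsigned long long pteval_t;	/* 64 bits with PAE */

	#define _EXAMPLE_RW  0x002UL		/* 32-bit constant on i386 */

	pteval_t wrprotect_buggy(pteval_t val)
	{
		/* ~_EXAMPLE_RW is computed in 32 bits and zero-extended,
		 * so bits 32-63 -- including NX in bit 63 -- get cleared. */
		return val & ~_EXAMPLE_RW;
	}

	pteval_t wrprotect_fixed(pteval_t val)
	{
		/* Widening the constant first keeps the upper bits intact. */
		return val & ~(pteval_t)_EXAMPLE_RW;
	}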

Signed-off-by: Andi Kleen <ak@suse.de>

---
 include/asm-x86/pgtable.h |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux/include/asm-x86/pgtable.h
===================================================================
--- linux.orig/include/asm-x86/pgtable.h
+++ linux/include/asm-x86/pgtable.h
@@ -141,17 +141,17 @@ static inline int pmd_large(pmd_t pte) {
 		(_PAGE_PSE|_PAGE_PRESENT);
 }
 
-static inline pte_t pte_mkclean(pte_t pte)	{ return __pte(pte_val(pte) & ~_PAGE_DIRTY); }
-static inline pte_t pte_mkold(pte_t pte)	{ return __pte(pte_val(pte) & ~_PAGE_ACCESSED); }
-static inline pte_t pte_wrprotect(pte_t pte)	{ return __pte(pte_val(pte) & ~_PAGE_RW); }
-static inline pte_t pte_mkexec(pte_t pte)	{ return __pte(pte_val(pte) & ~_PAGE_NX); }
+static inline pte_t pte_mkclean(pte_t pte)	{ return __pte(pte_val(pte) & ~(pteval_t)_PAGE_DIRTY); }
+static inline pte_t pte_mkold(pte_t pte)	{ return __pte(pte_val(pte) & ~(pteval_t)_PAGE_ACCESSED); }
+static inline pte_t pte_wrprotect(pte_t pte)	{ return __pte(pte_val(pte) & ~(pteval_t)_PAGE_RW); }
+static inline pte_t pte_mkexec(pte_t pte)	{ return __pte(pte_val(pte) & ~(pteval_t)_PAGE_NX); }
 static inline pte_t pte_mkdirty(pte_t pte)	{ return __pte(pte_val(pte) | _PAGE_DIRTY); }
 static inline pte_t pte_mkyoung(pte_t pte)	{ return __pte(pte_val(pte) | _PAGE_ACCESSED); }
 static inline pte_t pte_mkwrite(pte_t pte)	{ return __pte(pte_val(pte) | _PAGE_RW); }
 static inline pte_t pte_mkhuge(pte_t pte)	{ return __pte(pte_val(pte) | _PAGE_PSE); }
-static inline pte_t pte_clrhuge(pte_t pte)	{ return __pte(pte_val(pte) & ~_PAGE_PSE); }
+static inline pte_t pte_clrhuge(pte_t pte)	{ return __pte(pte_val(pte) & ~(pteval_t)_PAGE_PSE); }
 static inline pte_t pte_mkglobal(pte_t pte)	{ return __pte(pte_val(pte) | _PAGE_GLOBAL); }
-static inline pte_t pte_clrglobal(pte_t pte)	{ return __pte(pte_val(pte) & ~_PAGE_GLOBAL); }
+static inline pte_t pte_clrglobal(pte_t pte)	{ return __pte(pte_val(pte) & ~(pteval_t)_PAGE_GLOBAL); }
 
 extern pteval_t __supported_pte_mask;
 


* [PATCH] [6/31] CPA: Undo white space changes
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (4 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [5/31] Don't drop NX bit in pte modifier functions for 32bit Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [7/31] Extract page table dumping code from i386 fault handler into dump_pagetable() Andi Kleen
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Undo random white space changes. This reverts commit ddb53b5735793a19dc17bcd98b050f672f28f1ea.

I simply don't have the nerve to port a 20+ patch series to the
reformatted version.  Besides, the patch series changes most of these lines
anyway and drops the trailing white space there.

And since that commit was a no-op, losing it for now isn't a problem.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_32.c |  149 ++++++++++++++++++++--------------------------
 arch/x86/mm/pageattr_64.c |  137 ++++++++++++++++++------------------------
 2 files changed, 126 insertions(+), 160 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -1,29 +1,28 @@
-/*
- * Copyright 2002 Andi Kleen, SuSE Labs.
+/* 
+ * Copyright 2002 Andi Kleen, SuSE Labs. 
  * Thanks to Ben LaHaise for precious feedback.
- */
+ */ 
 
+#include <linux/mm.h>
+#include <linux/sched.h>
 #include <linux/highmem.h>
 #include <linux/module.h>
-#include <linux/sched.h>
 #include <linux/slab.h>
-#include <linux/mm.h>
-
+#include <asm/uaccess.h>
 #include <asm/processor.h>
 #include <asm/tlbflush.h>
-#include <asm/sections.h>
-#include <asm/uaccess.h>
 #include <asm/pgalloc.h>
+#include <asm/sections.h>
 
 static DEFINE_SPINLOCK(cpa_lock);
 static struct list_head df_list = LIST_HEAD_INIT(df_list);
 
-pte_t *lookup_address(unsigned long address)
-{
+
+pte_t *lookup_address(unsigned long address) 
+{ 
 	pgd_t *pgd = pgd_offset_k(address);
 	pud_t *pud;
 	pmd_t *pmd;
-
 	if (pgd_none(*pgd))
 		return NULL;
 	pud = pud_offset(pgd, address);
@@ -34,22 +33,21 @@ pte_t *lookup_address(unsigned long addr
 		return NULL;
 	if (pmd_large(*pmd))
 		return (pte_t *)pmd;
-
 	return pte_offset_kernel(pmd, address);
-}
+} 
 
-static struct page *
-split_large_page(unsigned long address, pgprot_t prot, pgprot_t ref_prot)
-{
+static struct page *split_large_page(unsigned long address, pgprot_t prot,
+					pgprot_t ref_prot)
+{ 
+	int i; 
 	unsigned long addr;
 	struct page *base;
 	pte_t *pbase;
-	int i;
 
 	spin_unlock_irq(&cpa_lock);
 	base = alloc_pages(GFP_KERNEL, 0);
 	spin_lock_irq(&cpa_lock);
-	if (!base)
+	if (!base) 
 		return NULL;
 
 	/*
@@ -60,24 +58,22 @@ split_large_page(unsigned long address, 
 	page_private(base) = 0;
 
 	address = __pa(address);
-	addr = address & LARGE_PAGE_MASK;
+	addr = address & LARGE_PAGE_MASK; 
 	pbase = (pte_t *)page_address(base);
 	paravirt_alloc_pt(&init_mm, page_to_pfn(base));
-
 	for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
-		set_pte(&pbase[i], pfn_pte(addr >> PAGE_SHIFT,
-					   addr == address ? prot : ref_prot));
+	       set_pte(&pbase[i], pfn_pte(addr >> PAGE_SHIFT,
+					  addr == address ? prot : ref_prot));
 	}
 	return base;
-}
+} 
 
 static void cache_flush_page(struct page *p)
-{
-	void *addr = page_address(p);
+{ 
+	void *adr = page_address(p);
 	int i;
-
 	for (i = 0; i < PAGE_SIZE; i += boot_cpu_data.x86_clflush_size)
-		clflush(addr + i);
+		clflush(adr+i);
 }
 
 static void flush_kernel_map(void *arg)
@@ -87,27 +83,23 @@ static void flush_kernel_map(void *arg)
 
 	/* High level code is not ready for clflush yet */
 	if (0 && cpu_has_clflush) {
-		list_for_each_entry(p, lh, lru)
+		list_for_each_entry (p, lh, lru)
 			cache_flush_page(p);
-	} else {
-		if (boot_cpu_data.x86_model >= 4)
-			wbinvd();
-	}
+	} else if (boot_cpu_data.x86_model >= 4)
+		wbinvd();
 
-	/*
-	 * Flush all to work around Errata in early athlons regarding
-	 * large page flushing.
+	/* Flush all to work around Errata in early athlons regarding 
+	 * large page flushing. 
 	 */
-	__flush_tlb_all();
+	__flush_tlb_all();	
 }
 
-static void set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte)
-{
-	unsigned long flags;
+static void set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte) 
+{ 
 	struct page *page;
+	unsigned long flags;
 
-	/* change init_mm */
-	set_pte_atomic(kpte, pte);
+	set_pte_atomic(kpte, pte);	/* change init_mm */
 	if (SHARED_KERNEL_PMD)
 		return;
 
@@ -116,7 +108,6 @@ static void set_pmd_pte(pte_t *kpte, uns
 		pgd_t *pgd;
 		pud_t *pud;
 		pmd_t *pmd;
-
 		pgd = (pgd_t *)page_address(page) + pgd_index(address);
 		pud = pud_offset(pgd, address);
 		pmd = pmd_offset(pud, address);
@@ -125,9 +116,9 @@ static void set_pmd_pte(pte_t *kpte, uns
 	spin_unlock_irqrestore(&pgd_lock, flags);
 }
 
-/*
- * No more special protections in this 2/4MB area - revert to a large
- * page again.
+/* 
+ * No more special protections in this 2/4MB area - revert to a
+ * large page again. 
  */
 static inline void revert_page(struct page *kpte_page, unsigned long address)
 {
@@ -151,11 +142,12 @@ static inline void save_page(struct page
 		list_add(&kpte_page->lru, &df_list);
 }
 
-static int __change_page_attr(struct page *page, pgprot_t prot)
-{
-	struct page *kpte_page;
+static int
+__change_page_attr(struct page *page, pgprot_t prot)
+{ 
+	pte_t *kpte; 
 	unsigned long address;
-	pte_t *kpte;
+	struct page *kpte_page;
 
 	BUG_ON(PageHighMem(page));
 	address = (unsigned long)page_address(page);
@@ -163,17 +155,16 @@ static int __change_page_attr(struct pag
 	kpte = lookup_address(address);
 	if (!kpte)
 		return -EINVAL;
-
 	kpte_page = virt_to_page(kpte);
 	BUG_ON(PageLRU(kpte_page));
 	BUG_ON(PageCompound(kpte_page));
 
-	if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) {
+	if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) { 
 		if (!pte_huge(*kpte)) {
-			set_pte_atomic(kpte, mk_pte(page, prot));
+			set_pte_atomic(kpte, mk_pte(page, prot)); 
 		} else {
-			struct page *split;
 			pgprot_t ref_prot;
+			struct page *split;
 
 			ref_prot =
 			((address & LARGE_PAGE_MASK) < (unsigned long)&_etext)
@@ -181,19 +172,16 @@ static int __change_page_attr(struct pag
 			split = split_large_page(address, prot, ref_prot);
 			if (!split)
 				return -ENOMEM;
-
-			set_pmd_pte(kpte, address, mk_pte(split, ref_prot));
+			set_pmd_pte(kpte,address,mk_pte(split, ref_prot));
 			kpte_page = split;
 		}
 		page_private(kpte_page)++;
-	} else {
-		if (!pte_huge(*kpte)) {
-			set_pte_atomic(kpte, mk_pte(page, PAGE_KERNEL));
-			BUG_ON(page_private(kpte_page) == 0);
-			page_private(kpte_page)--;
-		} else
-			BUG();
-	}
+	} else if (!pte_huge(*kpte)) {
+		set_pte_atomic(kpte, mk_pte(page, PAGE_KERNEL));
+		BUG_ON(page_private(kpte_page) == 0);
+		page_private(kpte_page)--;
+	} else
+		BUG();
 
 	/*
 	 * If the pte was reserved, it means it was created at boot
@@ -209,7 +197,7 @@ static int __change_page_attr(struct pag
 		}
 	}
 	return 0;
-}
+} 
 
 static inline void flush_map(struct list_head *l)
 {
@@ -223,33 +211,32 @@ static inline void flush_map(struct list
  * than write-back somewhere - some CPUs do not like it when mappings with
  * different caching policies exist. This changes the page attributes of the
  * in kernel linear mapping too.
- *
+ * 
  * The caller needs to ensure that there are no conflicting mappings elsewhere.
  * This function only deals with the kernel linear map.
- *
+ * 
  * Caller must call global_flush_tlb() after this.
  */
 int change_page_attr(struct page *page, int numpages, pgprot_t prot)
 {
+	int err = 0; 
+	int i; 
 	unsigned long flags;
-	int err = 0, i;
 
 	spin_lock_irqsave(&cpa_lock, flags);
-	for (i = 0; i < numpages; i++, page++) {
+	for (i = 0; i < numpages; i++, page++) { 
 		err = __change_page_attr(page, prot);
-		if (err)
-			break;
-	}
+		if (err) 
+			break; 
+	}	
 	spin_unlock_irqrestore(&cpa_lock, flags);
-
 	return err;
 }
-EXPORT_SYMBOL(change_page_attr);
 
 void global_flush_tlb(void)
 {
-	struct page *pg, *next;
 	struct list_head l;
+	struct page *pg, *next;
 
 	BUG_ON(irqs_disabled());
 
@@ -266,28 +253,26 @@ void global_flush_tlb(void)
 		__free_page(pg);
 	}
 }
-EXPORT_SYMBOL(global_flush_tlb);
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
 void kernel_map_pages(struct page *page, int numpages, int enable)
 {
 	if (PageHighMem(page))
 		return;
-	if (!enable) {
+	if (!enable)
 		debug_check_no_locks_freed(page_address(page),
 					   numpages * PAGE_SIZE);
-	}
 
-	/*
-	 * the return value is ignored - the calls cannot fail,
+	/* the return value is ignored - the calls cannot fail,
 	 * large pages are disabled at boot time.
 	 */
 	change_page_attr(page, numpages, enable ? PAGE_KERNEL : __pgprot(0));
-
-	/*
-	 * we should perform an IPI and flush all tlbs,
+	/* we should perform an IPI and flush all tlbs,
 	 * but that can deadlock->flush only current cpu.
 	 */
 	__flush_tlb_all();
 }
 #endif
+
+EXPORT_SYMBOL(change_page_attr);
+EXPORT_SYMBOL(global_flush_tlb);
Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -1,54 +1,48 @@
-/*
- * Copyright 2002 Andi Kleen, SuSE Labs.
+/* 
+ * Copyright 2002 Andi Kleen, SuSE Labs. 
  * Thanks to Ben LaHaise for precious feedback.
- */
+ */ 
 
+#include <linux/mm.h>
+#include <linux/sched.h>
 #include <linux/highmem.h>
 #include <linux/module.h>
-#include <linux/sched.h>
 #include <linux/slab.h>
-#include <linux/mm.h>
-
+#include <asm/uaccess.h>
 #include <asm/processor.h>
 #include <asm/tlbflush.h>
-#include <asm/uaccess.h>
 #include <asm/io.h>
 
 pte_t *lookup_address(unsigned long address)
-{
+{ 
 	pgd_t *pgd = pgd_offset_k(address);
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
-
 	if (pgd_none(*pgd))
 		return NULL;
 	pud = pud_offset(pgd, address);
 	if (!pud_present(*pud))
-		return NULL;
+		return NULL; 
 	pmd = pmd_offset(pud, address);
 	if (!pmd_present(*pmd))
-		return NULL;
+		return NULL; 
 	if (pmd_large(*pmd))
 		return (pte_t *)pmd;
-
 	pte = pte_offset_kernel(pmd, address);
 	if (pte && !pte_present(*pte))
-		pte = NULL;
-
+		pte = NULL; 
 	return pte;
-}
+} 
 
-static struct page *
-split_large_page(unsigned long address, pgprot_t prot, pgprot_t ref_prot)
-{
+static struct page *split_large_page(unsigned long address, pgprot_t prot,
+				     pgprot_t ref_prot)
+{ 
+	int i; 
 	unsigned long addr;
-	struct page *base;
+	struct page *base = alloc_pages(GFP_KERNEL, 0);
 	pte_t *pbase;
-	int i;
-
-	base = alloc_pages(GFP_KERNEL, 0);
-	if (!base)
+	if (!base) 
 		return NULL;
 	/*
 	 * page_private is used to track the number of entries in
@@ -58,21 +52,20 @@ split_large_page(unsigned long address, 
 	page_private(base) = 0;
 
 	address = __pa(address);
-	addr = address & LARGE_PAGE_MASK;
+	addr = address & LARGE_PAGE_MASK; 
 	pbase = (pte_t *)page_address(base);
 	for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
-		pbase[i] = pfn_pte(addr >> PAGE_SHIFT,
+		pbase[i] = pfn_pte(addr >> PAGE_SHIFT, 
 				   addr == address ? prot : ref_prot);
 	}
 	return base;
-}
+} 
 
-void clflush_cache_range(void *addr, int size)
+void clflush_cache_range(void *adr, int size)
 {
 	int i;
-
 	for (i = 0; i < size; i += boot_cpu_data.x86_clflush_size)
-		clflush(addr+i);
+		clflush(adr+i);
 }
 
 static void flush_kernel_map(void *arg)
@@ -83,20 +76,17 @@ static void flush_kernel_map(void *arg)
 	/* When clflush is available always use it because it is
 	   much cheaper than WBINVD. */
 	/* clflush is still broken. Disable for now. */
-	if (1 || !cpu_has_clflush) {
+	if (1 || !cpu_has_clflush)
 		asm volatile("wbinvd" ::: "memory");
-	} else {
-		list_for_each_entry(pg, l, lru) {
-			void *addr = page_address(pg);
-
-			clflush_cache_range(addr, PAGE_SIZE);
-		}
+	else list_for_each_entry(pg, l, lru) {
+		void *adr = page_address(pg);
+		clflush_cache_range(adr, PAGE_SIZE);
 	}
 	__flush_tlb_all();
 }
 
 static inline void flush_map(struct list_head *l)
-{
+{	
 	on_each_cpu(flush_kernel_map, l, 1, 1);
 }
 
@@ -108,47 +98,44 @@ static inline void save_page(struct page
 		list_add(&fpage->lru, &deferred_pages);
 }
 
-/*
+/* 
  * No more special protections in this 2/4MB area - revert to a
- * large page again.
+ * large page again. 
  */
 static void revert_page(unsigned long address, pgprot_t ref_prot)
 {
-	unsigned long pfn;
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t large_pte;
+	unsigned long pfn;
 
 	pgd = pgd_offset_k(address);
 	BUG_ON(pgd_none(*pgd));
-	pud = pud_offset(pgd, address);
+	pud = pud_offset(pgd,address);
 	BUG_ON(pud_none(*pud));
 	pmd = pmd_offset(pud, address);
 	BUG_ON(pmd_val(*pmd) & _PAGE_PSE);
 	pfn = (__pa(address) & LARGE_PAGE_MASK) >> PAGE_SHIFT;
 	large_pte = pfn_pte(pfn, ref_prot);
 	large_pte = pte_mkhuge(large_pte);
-
 	set_pte((pte_t *)pmd, large_pte);
-}
+}      
 
 static int
 __change_page_attr(unsigned long address, unsigned long pfn, pgprot_t prot,
-		   pgprot_t ref_prot)
-{
+				   pgprot_t ref_prot)
+{ 
+	pte_t *kpte; 
 	struct page *kpte_page;
 	pgprot_t ref_prot2;
-	pte_t *kpte;
 
 	kpte = lookup_address(address);
-	if (!kpte)
-		return 0;
-
+	if (!kpte) return 0;
 	kpte_page = virt_to_page(((unsigned long)kpte) & PAGE_MASK);
 	BUG_ON(PageLRU(kpte_page));
 	BUG_ON(PageCompound(kpte_page));
-	if (pgprot_val(prot) != pgprot_val(ref_prot)) {
+	if (pgprot_val(prot) != pgprot_val(ref_prot)) { 
 		if (!pte_huge(*kpte)) {
 			set_pte(kpte, pfn_pte(pfn, prot));
 		} else {
@@ -157,7 +144,6 @@ __change_page_attr(unsigned long address
 			 * change_page_attr on the split page.
 			 */
 			struct page *split;
-
 			ref_prot2 = pte_pgprot(pte_clrhuge(*kpte));
 			split = split_large_page(address, prot, ref_prot2);
 			if (!split)
@@ -167,14 +153,12 @@ __change_page_attr(unsigned long address
 			kpte_page = split;
 		}
 		page_private(kpte_page)++;
-	} else {
-		if (!pte_huge(*kpte)) {
-			set_pte(kpte, pfn_pte(pfn, ref_prot));
-			BUG_ON(page_private(kpte_page) == 0);
-			page_private(kpte_page)--;
-		} else
-			BUG();
-	}
+	} else if (!pte_huge(*kpte)) {
+		set_pte(kpte, pfn_pte(pfn, ref_prot));
+		BUG_ON(page_private(kpte_page) == 0);
+		page_private(kpte_page)--;
+	} else
+		BUG();
 
 	/* on x86-64 the direct mapping set at boot is not using 4k pages */
 	BUG_ON(PageReserved(kpte_page));
@@ -183,7 +167,7 @@ __change_page_attr(unsigned long address
 	if (page_private(kpte_page) == 0)
 		revert_page(address, ref_prot);
 	return 0;
-}
+} 
 
 /*
  * Change the page attributes of an page in the linear mapping.
@@ -192,19 +176,19 @@ __change_page_attr(unsigned long address
  * than write-back somewhere - some CPUs do not like it when mappings with
  * different caching policies exist. This changes the page attributes of the
  * in kernel linear mapping too.
- *
+ * 
  * The caller needs to ensure that there are no conflicting mappings elsewhere.
  * This function only deals with the kernel linear map.
- *
+ * 
  * Caller must call global_flush_tlb() after this.
  */
 int change_page_attr_addr(unsigned long address, int numpages, pgprot_t prot)
 {
-	int err = 0, kernel_map = 0, i;
-
-	if (address >= __START_KERNEL_map &&
-			address < __START_KERNEL_map + KERNEL_TEXT_SIZE) {
+	int err = 0, kernel_map = 0;
+	int i; 
 
+	if (address >= __START_KERNEL_map
+	    && address < __START_KERNEL_map + KERNEL_TEXT_SIZE) {
 		address = (unsigned long)__va(__pa(address));
 		kernel_map = 1;
 	}
@@ -214,8 +198,7 @@ int change_page_attr_addr(unsigned long 
 		unsigned long pfn = __pa(address) >> PAGE_SHIFT;
 
 		if (!kernel_map || pte_present(pfn_pte(0, prot))) {
-			err = __change_page_attr(address, pfn, prot,
-						PAGE_KERNEL);
+			err = __change_page_attr(address, pfn, prot, PAGE_KERNEL);
 			if (err)
 				break;
 		}
@@ -224,16 +207,14 @@ int change_page_attr_addr(unsigned long 
 		if (__pa(address) < KERNEL_TEXT_SIZE) {
 			unsigned long addr2;
 			pgprot_t prot2;
-
 			addr2 = __START_KERNEL_map + __pa(address);
 			/* Make sure the kernel mappings stay executable */
 			prot2 = pte_pgprot(pte_mkexec(pfn_pte(0, prot)));
 			err = __change_page_attr(addr2, pfn, prot2,
 						 PAGE_KERNEL_EXEC);
-		}
-	}
-	up_write(&init_mm.mmap_sem);
-
+		} 
+	}	
+	up_write(&init_mm.mmap_sem); 
 	return err;
 }
 
@@ -241,13 +222,11 @@ int change_page_attr_addr(unsigned long 
 int change_page_attr(struct page *page, int numpages, pgprot_t prot)
 {
 	unsigned long addr = (unsigned long)page_address(page);
-
 	return change_page_attr_addr(addr, numpages, prot);
 }
-EXPORT_SYMBOL(change_page_attr);
 
 void global_flush_tlb(void)
-{
+{ 
 	struct page *pg, *next;
 	struct list_head l;
 
@@ -269,6 +248,8 @@ void global_flush_tlb(void)
 			continue;
 		ClearPagePrivate(pg);
 		__free_page(pg);
-	}
-}
+	} 
+} 
+
+EXPORT_SYMBOL(change_page_attr);
 EXPORT_SYMBOL(global_flush_tlb);


* [PATCH] [7/31] Extract page table dumping code from i386 fault handler into dump_pagetable()
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (5 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [6/31] CPA: Undo white space changes Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-15  8:56   ` Harvey Harrison
  2008-01-14 22:16 ` [PATCH] [8/31] CPA: Return the page table level in lookup_address() Andi Kleen
                   ` (24 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Similar to x86-64. This is useful in other situations where we want
the page table dumped too.

Besides, anything that makes the i386 do_page_fault() shorter is good.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/fault_32.c |   72 ++++++++++++++++++++++++++-----------------------
 1 file changed, 39 insertions(+), 33 deletions(-)

Index: linux/arch/x86/mm/fault_32.c
===================================================================
--- linux.orig/arch/x86/mm/fault_32.c
+++ linux/arch/x86/mm/fault_32.c
@@ -28,6 +28,44 @@
 #include <asm/desc.h>
 #include <asm/segment.h>
 
+void dump_pagetable(unsigned long address)
+{
+	typeof(pte_val(__pte(0))) page;
+
+	page = read_cr3();
+	page = ((__typeof__(page) *) __va(page))[address >> PGDIR_SHIFT];
+#ifdef CONFIG_X86_PAE
+	printk("*pdpt = %016Lx ", page);
+	if ((page >> PAGE_SHIFT) < max_low_pfn
+	    && page & _PAGE_PRESENT) {
+		page &= PAGE_MASK;
+		page = ((__typeof__(page) *) __va(page))[(address >> PMD_SHIFT)
+							 & (PTRS_PER_PMD - 1)];
+		printk(KERN_CONT "*pde = %016Lx ", page);
+		page &= ~_PAGE_NX;
+	}
+#else
+	printk("*pde = %08lx ", page);
+#endif
+
+	/*
+	 * We must not directly access the pte in the highpte
+	 * case if the page table is located in highmem.
+	 * And let's rather not kmap-atomic the pte, just in case
+	 * it's allocated already.
+	 */
+	if ((page >> PAGE_SHIFT) < max_low_pfn
+	    && (page & _PAGE_PRESENT)
+	    && !(page & _PAGE_PSE)) {
+		page &= PAGE_MASK;
+		page = ((__typeof__(page) *) __va(page))[(address >> PAGE_SHIFT)
+							 & (PTRS_PER_PTE - 1)];
+		printk("*pte = %0*Lx ", sizeof(page)*2, (u64)page);
+	}
+
+	printk("\n");
+}
+
 /*
  * Page fault error code bits
  *	bit 0 == 0 means no page found, 1 means protection fault
@@ -530,7 +568,6 @@ no_context:
 	bust_spinlocks(1);
 
 	if (oops_may_print()) {
-		__typeof__(pte_val(__pte(0))) page;
 
 #ifdef CONFIG_X86_PAE
 		if (error_code & PF_INSTR) {
@@ -551,38 +588,7 @@ no_context:
 		printk(" at virtual address %08lx\n", address);
 		printk(KERN_ALERT "printing ip: %08lx ", regs->ip);
 
-		page = read_cr3();
-		page = ((__typeof__(page) *) __va(page))[address >> PGDIR_SHIFT];
-#ifdef CONFIG_X86_PAE
-		printk("*pdpt = %016Lx ", page);
-		if ((page >> PAGE_SHIFT) < max_low_pfn
-		    && page & _PAGE_PRESENT) {
-			page &= PAGE_MASK;
-			page = ((__typeof__(page) *) __va(page))[(address >> PMD_SHIFT)
-			                                         & (PTRS_PER_PMD - 1)];
-			printk(KERN_CONT "*pde = %016Lx ", page);
-			page &= ~_PAGE_NX;
-		}
-#else
-		printk("*pde = %08lx ", page);
-#endif
-
-		/*
-		 * We must not directly access the pte in the highpte
-		 * case if the page table is located in highmem.
-		 * And let's rather not kmap-atomic the pte, just in case
-		 * it's allocated already.
-		 */
-		if ((page >> PAGE_SHIFT) < max_low_pfn
-		    && (page & _PAGE_PRESENT)
-		    && !(page & _PAGE_PSE)) {
-			page &= PAGE_MASK;
-			page = ((__typeof__(page) *) __va(page))[(address >> PAGE_SHIFT)
-			                                         & (PTRS_PER_PTE - 1)];
-			printk("*pte = %0*Lx ", sizeof(page)*2, (u64)page);
-		}
-
-		printk("\n");
+		dump_pagetable(address);
 	}
 
 	tsk->thread.cr2 = address;


* [PATCH] [8/31] CPA: Return the page table level in lookup_address()
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (6 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [7/31] Extract page table dumping code from i386 fault handler into dump_pagetable() Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [9/31] CPA: Add simple self test at boot Andi Kleen
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Needed for the next change.

Also change all the callers.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/fault_32.c       |    3 ++-
 arch/x86/mm/init_32.c        |    3 ++-
 arch/x86/mm/pageattr_32.c    |    7 +++++--
 arch/x86/mm/pageattr_64.c    |    7 +++++--
 arch/x86/xen/mmu.c           |    9 ++++++---
 include/asm-x86/pgtable_32.h |    2 +-
 include/asm-x86/pgtable_64.h |    2 +-
 7 files changed, 22 insertions(+), 11 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -13,7 +13,7 @@
 #include <asm/tlbflush.h>
 #include <asm/io.h>
 
-pte_t *lookup_address(unsigned long address)
+pte_t *lookup_address(unsigned long address, int *level)
 { 
 	pgd_t *pgd = pgd_offset_k(address);
 	pud_t *pud;
@@ -27,8 +27,10 @@ pte_t *lookup_address(unsigned long addr
 	pmd = pmd_offset(pud, address);
 	if (!pmd_present(*pmd))
 		return NULL; 
+	*level = 3;
 	if (pmd_large(*pmd))
 		return (pte_t *)pmd;
+	*level = 4;
 	pte = pte_offset_kernel(pmd, address);
 	if (pte && !pte_present(*pte))
 		pte = NULL; 
@@ -129,8 +131,9 @@ __change_page_attr(unsigned long address
 	pte_t *kpte; 
 	struct page *kpte_page;
 	pgprot_t ref_prot2;
+	int level;
 
-	kpte = lookup_address(address);
+	kpte = lookup_address(address, &level);
 	if (!kpte) return 0;
 	kpte_page = virt_to_page(((unsigned long)kpte) & PAGE_MASK);
 	BUG_ON(PageLRU(kpte_page));
Index: linux/include/asm-x86/pgtable_64.h
===================================================================
--- linux.orig/include/asm-x86/pgtable_64.h
+++ linux/include/asm-x86/pgtable_64.h
@@ -240,7 +240,7 @@ extern struct list_head pgd_list;
 
 extern int kern_addr_valid(unsigned long addr); 
 
-pte_t *lookup_address(unsigned long addr);
+pte_t *lookup_address(unsigned long addr, int *level);
 
 #define io_remap_pfn_range(vma, vaddr, pfn, size, prot)		\
 		remap_pfn_range(vma, vaddr, pfn, size, prot)
Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -18,7 +18,7 @@ static DEFINE_SPINLOCK(cpa_lock);
 static struct list_head df_list = LIST_HEAD_INIT(df_list);
 
 
-pte_t *lookup_address(unsigned long address) 
+pte_t *lookup_address(unsigned long address, int *level)
 { 
 	pgd_t *pgd = pgd_offset_k(address);
 	pud_t *pud;
@@ -31,8 +31,10 @@ pte_t *lookup_address(unsigned long addr
 	pmd = pmd_offset(pud, address);
 	if (pmd_none(*pmd))
 		return NULL;
+	*level = 2;
 	if (pmd_large(*pmd))
 		return (pte_t *)pmd;
+	*level = 3;
 	return pte_offset_kernel(pmd, address);
 } 
 
@@ -148,11 +150,12 @@ __change_page_attr(struct page *page, pg
 	pte_t *kpte; 
 	unsigned long address;
 	struct page *kpte_page;
+	int level;
 
 	BUG_ON(PageHighMem(page));
 	address = (unsigned long)page_address(page);
 
-	kpte = lookup_address(address);
+	kpte = lookup_address(address, &level);
 	if (!kpte)
 		return -EINVAL;
 	kpte_page = virt_to_page(kpte);
Index: linux/include/asm-x86/pgtable_32.h
===================================================================
--- linux.orig/include/asm-x86/pgtable_32.h
+++ linux/include/asm-x86/pgtable_32.h
@@ -182,7 +182,7 @@ static inline void clone_pgd_range(pgd_t
  * NOTE: the return type is pte_t but if the pmd is PSE then we return it
  * as a pte too.
  */
-extern pte_t *lookup_address(unsigned long address);
+extern pte_t *lookup_address(unsigned long address, int *level);
 
 /*
  * Make a given kernel text page executable/non-executable.
Index: linux/arch/x86/mm/fault_32.c
===================================================================
--- linux.orig/arch/x86/mm/fault_32.c
+++ linux/arch/x86/mm/fault_32.c
@@ -571,7 +571,8 @@ no_context:
 
 #ifdef CONFIG_X86_PAE
 		if (error_code & PF_INSTR) {
-			pte_t *pte = lookup_address(address);
+			int level;
+			pte_t *pte = lookup_address(address, &level);
 
 			if (pte && pte_present(*pte) && !pte_exec_kernel(*pte))
 				printk(KERN_CRIT "kernel tried to execute "
Index: linux/arch/x86/mm/init_32.c
===================================================================
--- linux.orig/arch/x86/mm/init_32.c
+++ linux/arch/x86/mm/init_32.c
@@ -529,11 +529,12 @@ int __init set_kernel_exec(unsigned long
 {
 	pte_t *pte;
 	int ret = 1;
+	int level;
 
 	if (!nx_enabled)
 		goto out;
 
-	pte = lookup_address(vaddr);
+	pte = lookup_address(vaddr, &level);
 	BUG_ON(!pte);
 
 	if (!pte_exec_kernel(*pte))
Index: linux/arch/x86/xen/mmu.c
===================================================================
--- linux.orig/arch/x86/xen/mmu.c
+++ linux/arch/x86/xen/mmu.c
@@ -58,7 +58,8 @@
 
 xmaddr_t arbitrary_virt_to_machine(unsigned long address)
 {
-	pte_t *pte = lookup_address(address);
+	int level;
+	pte_t *pte = lookup_address(address, &level);
 	unsigned offset = address & PAGE_MASK;
 
 	BUG_ON(pte == NULL);
@@ -70,8 +71,9 @@ void make_lowmem_page_readonly(void *vad
 {
 	pte_t *pte, ptev;
 	unsigned long address = (unsigned long)vaddr;
+	int level;
 
-	pte = lookup_address(address);
+	pte = lookup_address(address, &level);
 	BUG_ON(pte == NULL);
 
 	ptev = pte_wrprotect(*pte);
@@ -84,8 +86,9 @@ void make_lowmem_page_readwrite(void *va
 {
 	pte_t *pte, ptev;
 	unsigned long address = (unsigned long)vaddr;
+	int level;
 
-	pte = lookup_address(address);
+	pte = lookup_address(address, &level);
 	BUG_ON(pte == NULL);
 
 	ptev = pte_mkwrite(*pte);


* [PATCH] [9/31] CPA: Add simple self test at boot
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (7 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [8/31] CPA: Return the page table level in lookup_address() Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-15 10:37   ` Harvey Harrison
  2008-01-14 22:16 ` [PATCH] [10/31] CPA: Change kernel_map_pages to not use c_p_a() Andi Kleen
                   ` (22 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Since change_page_attr() is tricky code, it is good to have some regression
test coverage. This patch maps and unmaps some random pages in the direct mapping
at boot, then dumps the state and does some simple sanity checks.

Add it with a CONFIG option.

Optional patch, but I find it useful.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/Makefile_32     |    1 
 arch/x86/mm/Makefile_64     |    1 
 arch/x86/mm/pageattr-test.c |  233 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 235 insertions(+)

Index: linux/arch/x86/mm/Makefile_64
===================================================================
--- linux.orig/arch/x86/mm/Makefile_64
+++ linux/arch/x86/mm/Makefile_64
@@ -7,3 +7,4 @@ obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpag
 obj-$(CONFIG_NUMA) += numa_64.o
 obj-$(CONFIG_K8_NUMA) += k8topology_64.o
 obj-$(CONFIG_ACPI_NUMA) += srat_64.o
+obj-$(CONFIG_CPA_DEBUG) += pageattr-test.o
Index: linux/arch/x86/mm/pageattr-test.c
===================================================================
--- /dev/null
+++ linux/arch/x86/mm/pageattr-test.c
@@ -0,0 +1,233 @@
+/*
+ * self test for change_page_attr.
+ *
+ * Clears the global bit on random pages in the direct mapping, then reverts
+ * and compares page tables forwards and afterwards.
+ */
+
+#include <linux/mm.h>
+#include <linux/random.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <asm/cacheflush.h>
+#include <asm/pgtable.h>
+#include <asm/kdebug.h>
+
+enum {
+	NTEST = 400,
+#ifdef CONFIG_X86_64
+	LOWEST_LEVEL = 4,
+	LPS = (1 << PMD_SHIFT),
+#elif defined(CONFIG_X86_PAE)
+	LOWEST_LEVEL = 3,
+	LPS = (1 << PMD_SHIFT),
+#else
+	LOWEST_LEVEL = 3, /* lookup_address lies here */
+	LPS = (1 << 22),
+#endif
+	GPS = (1<<30)
+};
+
+#ifdef CONFIG_X86_64
+#include <asm/proto.h>
+#define max_mapped end_pfn_map
+#else
+#define max_mapped max_low_pfn
+#endif
+
+struct split_state {
+	long lpg, gpg, spg, exec;
+	long min_exec, max_exec;
+};
+
+static __init int print_split(struct split_state *s)
+{
+	int printed = 0;
+	long i, expected, missed = 0;
+	int err = 0;
+
+	s->lpg = s->gpg = s->spg = s->exec = 0;
+	s->min_exec = ~0UL;
+	s->max_exec = 0;
+	for (i = 0; i < max_mapped; ) {
+		int level;
+		pte_t *pte;
+		unsigned long adr = (unsigned long)__va(i << PAGE_SHIFT);
+
+		pte = lookup_address(adr, &level);
+		if (!pte) {
+			if (!printed) {
+				dump_pagetable(adr);
+				printk("CPA %lx no pte level %d\n", adr, level);
+				printed = 1;
+			}
+			missed++;
+			i++;
+			continue;
+		}
+
+		if (level == 2 && sizeof(long) == 8) {
+			s->gpg++;
+			i += GPS/PAGE_SIZE;
+		} else if (level != LOWEST_LEVEL) {
+			if (!(pte_val(*pte) & _PAGE_PSE)) {
+				printk("%lx level %d but not PSE %Lx\n",
+					adr, level, (u64)pte_val(*pte));
+				err = 1;
+			}
+			s->lpg++;
+			i += LPS/PAGE_SIZE;
+		} else {
+			s->spg++;
+			i++;
+		}
+		if (!(pte_val(*pte) & _PAGE_NX)) {
+			s->exec++;
+			if (adr < s->min_exec)
+				s->min_exec = adr;
+			if (adr > s->max_exec)
+				s->max_exec = adr;
+		}
+	}
+	printk("CPA mapping 4k %lu large %lu gb %lu x %lu[%lx-%lx] miss %lu\n",
+		s->spg, s->lpg, s->gpg, s->exec,
+		s->min_exec != ~0UL ? s->min_exec : 0, s->max_exec, missed);
+	expected = (s->gpg*GPS + s->lpg*LPS)/PAGE_SIZE + s->spg + missed;
+	if (expected != i) {
+		printk("CPA max_mapped %lu but expected %lu\n",
+			max_mapped, expected);
+		return 1;
+	}
+	return err;
+}
+
+static __init int state_same(struct split_state *a, struct split_state *b)
+{
+	return a->lpg == b->lpg && a->gpg == b->gpg && a->spg == b->spg &&
+			a->exec == b->exec;
+}
+
+static unsigned long addr[NTEST] __initdata;
+static unsigned len[NTEST] __initdata;
+
+/* Change the global bit on random pages in the direct mapping */
+static __init int exercise_pageattr(void)
+{
+	int i, k;
+	pte_t *pte, pte0;
+	int level;
+	int err;
+	struct split_state sa, sb, sc;
+	int failed = 0;
+	unsigned long *bm;
+
+	printk("CPA exercising pageattr\n");
+
+	bm = vmalloc((max_mapped + 7) / 8);
+	if (!bm) {
+		printk("CPA Cannot vmalloc bitmap\n");
+		return -ENOMEM;
+	}
+	memset(bm, 0, (max_mapped + 7) / 8);
+
+	failed += print_split(&sa);
+	srandom32(100);
+	for (i = 0; i < NTEST; i++) {
+		unsigned long pfn = random32() % max_mapped;
+		addr[i] = (unsigned long)__va(pfn << PAGE_SHIFT);
+		len[i] = random32() % 100;
+		len[i] = min_t(unsigned long, len[i], max_mapped - pfn - 1);
+		if (len[i] == 0)
+			len[i] = 1;
+
+		pte = NULL;
+		pte0 = pfn_pte(0, __pgprot(0)); /* shut gcc up */
+		for (k = 0; k < len[i]; k++) {
+			pte = lookup_address(addr[i] + k*PAGE_SIZE, &level);
+			if (!pte || pgprot_val(pte_pgprot(*pte)) == 0) {
+				addr[i] = 0;
+				break;
+			}
+			if (k == 0)
+				pte0 = *pte;
+			else if (pgprot_val(pte_pgprot(*pte)) !=
+					pgprot_val(pte_pgprot(pte0))) {
+				len[i] = k;
+				break;
+			}
+			if (test_bit(pfn + k, bm)) {
+				len[i] = k;
+				break;
+			}
+			__set_bit(pfn + k, bm);
+		}
+		if (!addr[i] || !pte || !k) {
+			addr[i] = 0;
+			continue;
+		}
+
+		err = change_page_attr(virt_to_page(addr[i]), len[i],
+			    pte_pgprot(pte_clrhuge(pte_clrglobal(pte0))));
+		if (err < 0) {
+			printk("CPA %d failed %d\n", i, err);
+			failed++;
+		}
+
+		pte = lookup_address(addr[i], &level);
+		if (!pte || pte_global(*pte) || pte_huge(*pte)) {
+			printk("CPA %lx: bad pte %Lx\n", addr[i],
+				pte ? (u64)pte_val(*pte) : 0ULL);
+			failed++;
+		}
+		if (level != LOWEST_LEVEL) {
+			printk("CPA %lx: unexpected level %d\n", addr[i],
+					level);
+			failed++;
+		}
+
+	}
+	vfree(bm);
+	global_flush_tlb();
+
+	failed += print_split(&sb);
+
+	printk("CPA reverting everything\n");
+	for (i = 0; i < NTEST; i++) {
+		if (!addr[i])
+			continue;
+		pte = lookup_address(addr[i], &level);
+		if (!pte) {
+			printk("CPA lookup of %lx failed\n", addr[i]);
+			failed++;
+			continue;
+		}
+		err = change_page_attr(virt_to_page(addr[i]), len[i],
+					  pte_pgprot(pte_mkglobal(*pte)));
+		if (err < 0) {
+			printk("CPA reverting failed: %d\n", err);
+			failed++;
+		}
+		pte = lookup_address(addr[i], &level);
+		if (!pte || !pte_global(*pte)) {
+			printk("CPA %lx: bad pte after revert %Lx\n", addr[i],
+			       pte ? (u64)pte_val(*pte) : 0ULL);
+			failed++;
+		}
+
+	}
+	global_flush_tlb();
+
+	failed += print_split(&sc);
+	if (!state_same(&sa, &sc))
+		failed++;
+
+	if (failed)
+		printk("CPA selftests NOT PASSED. Please report.\n");
+	else
+		printk("CPA selftests PASSED\n");
+
+	return 0;
+}
+
+module_init(exercise_pageattr);
Index: linux/arch/x86/mm/Makefile_32
===================================================================
--- linux.orig/arch/x86/mm/Makefile_32
+++ linux/arch/x86/mm/Makefile_32
@@ -8,3 +8,4 @@ obj-$(CONFIG_NUMA) += discontig_32.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_HIGHMEM) += highmem_32.o
 obj-$(CONFIG_BOOT_IOREMAP) += boot_ioremap_32.o
+obj-$(CONFIG_CPA_DEBUG) += pageattr-test.o


* [PATCH] [10/31] CPA: Change kernel_map_pages to not use c_p_a()
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (8 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [9/31] CPA: Add simple self test at boot Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [11/31] CPA: Change 32bit back to init_mm semaphore locking Andi Kleen
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


CONFIG_DEBUG_PAGEALLOC uses change_page_attr() to map and unmap pages in order to
catch stray kernel mappings. But the standard c_p_a() does a lot of unnecessary work
for this simple case, where the mappings are already pre-split.

Change kernel_map_pages to just access the page table directly which
is simpler and faster.

I also fixed it to use INVLPG if available.

This is required for later changes to c_p_a() that make it use kmalloc. Without
this we would risk infinite recursion. Also, in general, things are easier when
sleeping is allowed.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_32.c |   34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -258,22 +258,36 @@ void global_flush_tlb(void)
 }
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
+/* Map or unmap pages in the kernel direct mapping for kernel debugging. */
 void kernel_map_pages(struct page *page, int numpages, int enable)
 {
+	unsigned long addr;
+	int i;
+
 	if (PageHighMem(page))
 		return;
+	addr = (unsigned long)page_address(page);
 	if (!enable)
-		debug_check_no_locks_freed(page_address(page),
-					   numpages * PAGE_SIZE);
+		debug_check_no_locks_freed((void *)addr, numpages * PAGE_SIZE);
+
+	/* Bootup has forced 4K pages so this is very simple */
+
+	for (i = 0; i < numpages; i++, addr += PAGE_SIZE, page++) {
+		int level;
+		pte_t *pte = lookup_address(addr, &level);
 
-	/* the return value is ignored - the calls cannot fail,
-	 * large pages are disabled at boot time.
-	 */
-	change_page_attr(page, numpages, enable ? PAGE_KERNEL : __pgprot(0));
-	/* we should perform an IPI and flush all tlbs,
-	 * but that can deadlock->flush only current cpu.
-	 */
-	__flush_tlb_all();
+		BUG_ON(level != 3);
+		if (enable) {
+			set_pte_atomic(pte, mk_pte(page, PAGE_KERNEL));
+			/*
+			 * We should perform an IPI and flush all tlbs,
+			 * but that can deadlock->flush only current cpu.
+			 */
+			__flush_tlb_one(addr);
+		} else {
+			kpte_clear_flush(pte, addr);
+		}
+	}
 }
 #endif
 


* [PATCH] [11/31] CPA: Change 32bit back to init_mm semaphore locking
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (9 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [10/31] CPA: Change kernel_map_pages to not use c_p_a() Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [12/31] CPA: CLFLUSH support in change_page_attr() Andi Kleen
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Now that debug pagealloc uses a separate function, it is better to change the
standard change_page_attr() back to init_mm semaphore locking, like 64bit.
Various things are simpler when sleeping is allowed.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_32.c |   14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -14,10 +14,9 @@
 #include <asm/pgalloc.h>
 #include <asm/sections.h>
 
-static DEFINE_SPINLOCK(cpa_lock);
+/* Protected by init_mm.mmap_sem */
 static struct list_head df_list = LIST_HEAD_INIT(df_list);
 
-
 pte_t *lookup_address(unsigned long address, int *level)
 { 
 	pgd_t *pgd = pgd_offset_k(address);
@@ -46,9 +45,7 @@ static struct page *split_large_page(uns
 	struct page *base;
 	pte_t *pbase;
 
-	spin_unlock_irq(&cpa_lock);
 	base = alloc_pages(GFP_KERNEL, 0);
-	spin_lock_irq(&cpa_lock);
 	if (!base) 
 		return NULL;
 
@@ -224,15 +221,14 @@ int change_page_attr(struct page *page, 
 {
 	int err = 0; 
 	int i; 
-	unsigned long flags;
 
-	spin_lock_irqsave(&cpa_lock, flags);
+	down_write(&init_mm.mmap_sem);
 	for (i = 0; i < numpages; i++, page++) { 
 		err = __change_page_attr(page, prot);
 		if (err) 
 			break; 
 	}	
-	spin_unlock_irqrestore(&cpa_lock, flags);
+	up_write(&init_mm.mmap_sem);
 	return err;
 }
 
@@ -243,9 +239,9 @@ void global_flush_tlb(void)
 
 	BUG_ON(irqs_disabled());
 
-	spin_lock_irq(&cpa_lock);
+	down_write(&init_mm.mmap_sem);
 	list_replace_init(&df_list, &l);
-	spin_unlock_irq(&cpa_lock);
+	up_write(&init_mm.mmap_sem);
 	flush_map(&l);
 	list_for_each_entry_safe(pg, next, &l, lru) {
 		list_del(&pg->lru);


* [PATCH] [12/31] CPA: CLFLUSH support in change_page_attr()
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (10 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [11/31] CPA: Change 32bit back to init_mm semaphore locking Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-15  8:40   ` Jan Beulich
  2008-01-14 22:16 ` [PATCH] [13/31] CPA: Use macros to modify the PG_arch_1 page flags in change_page_attr Andi Kleen
                   ` (19 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Queue individual data pages for flushing with CLFLUSH in change_page_attr(),
instead of doing global WBINVDs. WBINVD is a very painful operation
for the CPU (it can take milliseconds) and quite slow too.  Worse, it is not interruptible
and can cause long latencies on hypervisors on older Intel VT systems.

CLFLUSH, on the other hand, only flushes the cache lines that actually need to be
flushed, and since it works in smaller chunks it is more preemptible.

To do this, c_p_a() needs to save the addresses to be flushed for global_flush_tlb()
later.  This is done using a separate data structure, not struct page,
because page->lru is often already in use, or not there at all for memory holes.

Also, the flushes are now done in FIFO order, not LIFO.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_32.c |   78 ++++++++++++++++++++++++++++++++++------------
 arch/x86/mm/pageattr_64.c |   77 ++++++++++++++++++++++++++++++++++-----------
 2 files changed, 118 insertions(+), 37 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -13,6 +13,11 @@
 #include <asm/tlbflush.h>
 #include <asm/io.h>
 
+struct flush {
+	struct list_head l;
+	unsigned long addr;
+};
+
 pte_t *lookup_address(unsigned long address, int *level)
 { 
 	pgd_t *pgd = pgd_offset_k(address);
@@ -63,6 +68,11 @@ static struct page *split_large_page(uns
 	return base;
 } 
 
+struct flush_arg {
+	int full_flush;
+	struct list_head l;
+};
+
 void clflush_cache_range(void *adr, int size)
 {
 	int i;
@@ -72,27 +82,27 @@ void clflush_cache_range(void *adr, int 
 
 static void flush_kernel_map(void *arg)
 {
-	struct list_head *l = (struct list_head *)arg;
-	struct page *pg;
+	struct flush_arg *a = (struct flush_arg *)arg;
+	struct flush *f;
+
+	if (!cpu_has_clflush)
+		a->full_flush = 1;
 
 	/* When clflush is available always use it because it is
 	   much cheaper than WBINVD. */
-	/* clflush is still broken. Disable for now. */
-	if (1 || !cpu_has_clflush)
+	if (a->full_flush)
 		asm volatile("wbinvd" ::: "memory");
-	else list_for_each_entry(pg, l, lru) {
-		void *adr = page_address(pg);
-		clflush_cache_range(adr, PAGE_SIZE);
+	list_for_each_entry(f, &a->l, l) {
+		if (!a->full_flush)
+			clflush_cache_range((void *)f->addr, PAGE_SIZE);
 	}
 	__flush_tlb_all();
 }
 
-static inline void flush_map(struct list_head *l)
-{	
-	on_each_cpu(flush_kernel_map, l, 1, 1);
-}
-
-static LIST_HEAD(deferred_pages); /* protected by init_mm.mmap_sem */
+/* both protected by init_mm.mmap_sem */
+static int full_flush;
+static LIST_HEAD(deferred_pages);
+static LIST_HEAD(flush_pages);
 
 static inline void save_page(struct page *fpage)
 {
@@ -124,6 +134,25 @@ static void revert_page(unsigned long ad
 	set_pte((pte_t *)pmd, large_pte);
 }      
 
+/*
+ * Mark the address for flushing later in global_tlb_flush().
+ *
+ * Other parts of the kernel are already in a feeding frenzy over the various
+ * struct page fields. Instead of trying to compete allocate a separate
+ * data structure to keep track of the flush. This has the added bonus that
+ * it will work for MMIO holes without mem_map too.
+ */
+static void set_tlb_flush(unsigned long address)
+{
+	struct flush *f = kmalloc(sizeof(struct flush), GFP_KERNEL);
+	if (!f) {
+		full_flush = 1;
+		return;
+	}
+	f->addr = address;
+	list_add_tail(&f->l, &flush_pages);
+}
+
 static int
 __change_page_attr(unsigned long address, unsigned long pfn, pgprot_t prot,
 				   pgprot_t ref_prot)
@@ -136,8 +165,11 @@ __change_page_attr(unsigned long address
 	kpte = lookup_address(address, &level);
 	if (!kpte) return 0;
 	kpte_page = virt_to_page(((unsigned long)kpte) & PAGE_MASK);
-	BUG_ON(PageLRU(kpte_page));
 	BUG_ON(PageCompound(kpte_page));
+	BUG_ON(PageLRU(kpte_page));
+
+	set_tlb_flush(address);
+
 	if (pgprot_val(prot) != pgprot_val(ref_prot)) { 
 		if (!pte_huge(*kpte)) {
 			set_pte(kpte, pfn_pte(pfn, prot));
@@ -231,7 +263,9 @@ int change_page_attr(struct page *page, 
 void global_flush_tlb(void)
 { 
 	struct page *pg, *next;
-	struct list_head l;
+	struct flush *f, *fnext;
+	struct flush_arg arg;
+	struct list_head free_pages;
 
 	/*
 	 * Write-protect the semaphore, to exclude two contexts
@@ -239,12 +273,19 @@ void global_flush_tlb(void)
 	 * exclude new additions to the deferred_pages list:
 	 */
 	down_write(&init_mm.mmap_sem);
-	list_replace_init(&deferred_pages, &l);
+	arg.full_flush = full_flush;
+	full_flush = 0;
+	list_replace_init(&flush_pages, &arg.l);
+	list_replace_init(&deferred_pages, &free_pages);
 	up_write(&init_mm.mmap_sem);
 
-	flush_map(&l);
+	on_each_cpu(flush_kernel_map, &arg, 1, 1);
+
+	list_for_each_entry_safe(f, fnext, &arg.l, l) {
+		kfree(f);
+	}
 
-	list_for_each_entry_safe(pg, next, &l, lru) {
+	list_for_each_entry_safe(pg, next, &free_pages, lru) {
 		list_del(&pg->lru);
 		clear_bit(PG_arch_1, &pg->flags);
 		if (page_private(pg) != 0)
Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -15,8 +15,15 @@
 #include <asm/sections.h>
 
 /* Protected by init_mm.mmap_sem */
+/* Variables protected by cpa_lock */
+static int full_flush;
 static struct list_head df_list = LIST_HEAD_INIT(df_list);
+static LIST_HEAD(flush_pages);
 
+struct flush {
+	struct list_head l;
+	unsigned long addr;
+};
 pte_t *lookup_address(unsigned long address, int *level)
 { 
 	pgd_t *pgd = pgd_offset_k(address);
@@ -67,25 +74,31 @@ static struct page *split_large_page(uns
 	return base;
 } 
 
-static void cache_flush_page(struct page *p)
+struct flush_arg {
+	int full_flush;
+	struct list_head l;
+};
+
+void clflush_cache_range(void *adr, int size)
 { 
-	void *adr = page_address(p);
 	int i;
-	for (i = 0; i < PAGE_SIZE; i += boot_cpu_data.x86_clflush_size)
+	for (i = 0; i < size; i += boot_cpu_data.x86_clflush_size)
 		clflush(adr+i);
 }
 
 static void flush_kernel_map(void *arg)
 {
-	struct list_head *lh = (struct list_head *)arg;
-	struct page *p;
+	struct flush_arg *a = (struct flush_arg *)arg;
+	struct flush *f;
 
-	/* High level code is not ready for clflush yet */
-	if (0 && cpu_has_clflush) {
-		list_for_each_entry (p, lh, lru)
-			cache_flush_page(p);
-	} else if (boot_cpu_data.x86_model >= 4)
+	if (!cpu_has_clflush)
+		a->full_flush = 1;
+	if (a->full_flush && boot_cpu_data.x86_model >= 4)
 		wbinvd();
+	list_for_each_entry(f, &a->l, l) {
+		if (!a->full_flush)
+			clflush_cache_range((void *)f->addr, PAGE_SIZE);
+	}
 
 	/* Flush all to work around Errata in early athlons regarding 
 	 * large page flushing. 
@@ -141,6 +154,25 @@ static inline void save_page(struct page
 		list_add(&kpte_page->lru, &df_list);
 }
 
+/*
+ * Mark the address for flushing later in global_tlb_flush().
+ *
+ * Other parts of the kernel are already in a feeding frenzy over the various
+ * struct page fields. Instead of trying to compete allocate a separate
+ * data structure to keep track of the flush. This has the added bonus that
+ * it will work for MMIO holes without mem_map too.
+ */
+static void set_tlb_flush(unsigned long address)
+{
+	struct flush *f = kmalloc(sizeof(struct flush), GFP_KERNEL);
+	if (!f) {
+		full_flush = 1;
+		return;
+	}
+	f->addr = address;
+	list_add_tail(&f->l, &flush_pages);
+}
+
 static int
 __change_page_attr(struct page *page, pgprot_t prot)
 { 
@@ -159,6 +191,8 @@ __change_page_attr(struct page *page, pg
 	BUG_ON(PageLRU(kpte_page));
 	BUG_ON(PageCompound(kpte_page));
 
+	set_tlb_flush(address);
+
 	if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) { 
 		if (!pte_huge(*kpte)) {
 			set_pte_atomic(kpte, mk_pte(page, prot)); 
@@ -199,11 +233,6 @@ __change_page_attr(struct page *page, pg
 	return 0;
 } 
 
-static inline void flush_map(struct list_head *l)
-{
-	on_each_cpu(flush_kernel_map, l, 1, 1);
-}
-
 /*
  * Change the page attributes of an page in the linear mapping.
  *
@@ -234,16 +263,27 @@ int change_page_attr(struct page *page, 
 
 void global_flush_tlb(void)
 {
-	struct list_head l;
+	struct flush_arg arg;
 	struct page *pg, *next;
+	struct flush *f, *fnext;
+	struct list_head free_pages;
 
 	BUG_ON(irqs_disabled());
 
 	down_write(&init_mm.mmap_sem);
-	list_replace_init(&df_list, &l);
+	arg.full_flush = full_flush;
+	full_flush = 0;
+	list_replace_init(&flush_pages, &arg.l);
+	list_replace_init(&df_list, &free_pages);
 	up_write(&init_mm.mmap_sem);
-	flush_map(&l);
-	list_for_each_entry_safe(pg, next, &l, lru) {
+
+	on_each_cpu(flush_kernel_map, &arg, 1, 1);
+
+	list_for_each_entry_safe(f, fnext, &arg.l, l) {
+		kfree(f);
+	}
+
+	list_for_each_entry_safe(pg, next, &free_pages, lru) {
 		list_del(&pg->lru);
 		clear_bit(PG_arch_1, &pg->flags);
 		if (PageReserved(pg) || !cpu_has_pse || page_private(pg) != 0)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [13/31] CPA: Use macros to modify the PG_arch_1 page flags in change_page_attr
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (11 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [12/31] CPA: CLFLUSH support in change_page_attr() Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-15  9:29   ` Harvey Harrison
  2008-01-14 22:16 ` [PATCH] [14/31] CPA: Use page granularity TLB flushing " Andi Kleen
                   ` (18 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Instead of open coding the bit accesses, use standard-style
*PageDeferred* macros.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_32.c |   14 ++++++++++----
 arch/x86/mm/pageattr_64.c |   11 +++++++++--
 2 files changed, 19 insertions(+), 6 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -18,6 +18,13 @@ struct flush {
 	unsigned long addr;
 };
 
+#define PG_deferred PG_arch_1
+
+#define PageDeferred(p) test_bit(PG_deferred, &(p)->flags)
+#define SetPageDeferred(p) set_bit(PG_deferred, &(p)->flags)
+#define ClearPageDeferred(p) clear_bit(PG_deferred, &(p)->flags)
+#define TestSetPageDeferred(p) test_and_set_bit(PG_deferred, &(p)->flags)
+
 pte_t *lookup_address(unsigned long address, int *level)
 { 
 	pgd_t *pgd = pgd_offset_k(address);
@@ -106,7 +113,7 @@ static LIST_HEAD(flush_pages);
 
 static inline void save_page(struct page *fpage)
 {
-	if (!test_and_set_bit(PG_arch_1, &fpage->flags))
+	if (!TestSetPageDeferred(fpage))
 		list_add(&fpage->lru, &deferred_pages);
 }
 
@@ -287,7 +294,7 @@ void global_flush_tlb(void)
 
 	list_for_each_entry_safe(pg, next, &free_pages, lru) {
 		list_del(&pg->lru);
-		clear_bit(PG_arch_1, &pg->flags);
+		ClearPageDeferred(pg);
 		if (page_private(pg) != 0)
 			continue;
 		ClearPagePrivate(pg);
Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -14,8 +14,14 @@
 #include <asm/pgalloc.h>
 #include <asm/sections.h>
 
-/* Protected by init_mm.mmap_sem */
-/* Variables protected by cpa_lock */
+#define PG_deferred PG_arch_1
+
+#define PageDeferred(p) test_bit(PG_deferred, &(p)->flags)
+#define SetPageDeferred(p) set_bit(PG_deferred, &(p)->flags)
+#define ClearPageDeferred(p) clear_bit(PG_deferred, &(p)->flags)
+#define TestSetPageDeferred(p) test_and_set_bit(PG_deferred, &(p)->flags)
+
+/* Variables protected by init_mm.mmap_sem */
 static int full_flush;
 static struct list_head df_list = LIST_HEAD_INIT(df_list);
 static LIST_HEAD(flush_pages);
@@ -150,7 +156,7 @@ static inline void revert_page(struct pa
 
 static inline void save_page(struct page *kpte_page)
 {
-	if (!test_and_set_bit(PG_arch_1, &kpte_page->flags))
+	if (!TestSetPageDeferred(kpte_page))
 		list_add(&kpte_page->lru, &df_list);
 }
 
@@ -285,7 +291,7 @@ void global_flush_tlb(void)
 
 	list_for_each_entry_safe(pg, next, &free_pages, lru) {
 		list_del(&pg->lru);
-		clear_bit(PG_arch_1, &pg->flags);
+		ClearPageDeferred(pg);
 		if (PageReserved(pg) || !cpu_has_pse || page_private(pg) != 0)
 			continue;
 		ClearPagePrivate(pg);

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [14/31] CPA: Use page granularity TLB flushing in change_page_attr
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (12 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [13/31] CPA: Use macros to modify the PG_arch_1 page flags in change_page_attr Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [15/31] CPA: Don't flush the caches when the CPU supports self-snoop Andi Kleen
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


With the infrastructure added for CLFLUSH it is possible to flush only the
TLB entries of the pages actually changed in change_page_attr().

Take care of the old Athlon K7 erratum in the 32bit version.
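
In a simplified sketch (using only helpers already visible in the diff, not
the literal patch code), the flush loop then does:

	list_for_each_entry(f, &a->l, l) {
		if (!a->full_flush)
			__flush_tlb_one(f->addr);	/* INVLPG one page */
	}
	if (a->full_flush)
		__flush_tlb_all();	/* fall back to a global TLB flush */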

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_32.c |   15 ++++++++-------
 arch/x86/mm/pageattr_64.c |   10 +++++-----
 2 files changed, 13 insertions(+), 12 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -97,19 +97,20 @@ static void flush_kernel_map(void *arg)
 	struct flush_arg *a = (struct flush_arg *)arg;
 	struct flush *f;
 
-	if (!cpu_has_clflush)
-		a->full_flush = 1;
-	if (a->full_flush && boot_cpu_data.x86_model >= 4)
+	if ((!cpu_has_clflush || a->full_flush) && boot_cpu_data.x86_model >= 4)
 		wbinvd();
 	list_for_each_entry(f, &a->l, l) {
 		if (!a->full_flush)
 			clflush_cache_range((void *)f->addr, PAGE_SIZE);
+		if (!a->full_flush)
+			__flush_tlb_one(f->addr);
 	}
 
-	/* Flush all to work around Errata in early athlons regarding 
-	 * large page flushing. 
-	 */
-	__flush_tlb_all();	
+	/* Handle errata with flushing large pages in early Athlons */
+	if (a->full_flush ||
+	    (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+	     boot_cpu_data.x86 == 7))
+		__flush_tlb_all();
 }
 
 static void set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte) 
Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -92,18 +92,18 @@ static void flush_kernel_map(void *arg)
 	struct flush_arg *a = (struct flush_arg *)arg;
 	struct flush *f;
 
-	if (!cpu_has_clflush)
-		a->full_flush = 1;
-
 	/* When clflush is available always use it because it is
 	   much cheaper than WBINVD. */
-	if (a->full_flush)
+	if (a->full_flush || !cpu_has_clflush)
 		asm volatile("wbinvd" ::: "memory");
 	list_for_each_entry(f, &a->l, l) {
 		if (!a->full_flush)
 			clflush_cache_range((void *)f->addr, PAGE_SIZE);
+		if (!a->full_flush)
+			__flush_tlb_one(f->addr);
 	}
-	__flush_tlb_all();
+	if (a->full_flush)
+		__flush_tlb_all();
 }
 
 /* both protected by init_mm.mmap_sem */

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [15/31] CPA: Don't flush the caches when the CPU supports self-snoop
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (13 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [14/31] CPA: Use page granularity TLB flushing " Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [16/31] CPA: Use wbinvd() macro instead of inline assembly in 64bit c_p_a() Andi Kleen
                   ` (16 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


When the self-snoop CPUID bit is set, change_page_attr() only needs to flush
the TLBs, not the caches.

The description of self-snoop in the Intel manuals is a bit vague
but I got confirmation that this is what SS really means.

This should improve c_p_a() performance significantly on newer
Intel CPUs.
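
The resulting gating is essentially (simplified sketch, not the literal
patch code):

	/* A self-snooping CPU keeps its caches coherent across attribute
	   changes, so only the TLB flush is still needed */
	if (!a->full_flush && !cpu_has_ss)
		clflush_cache_range((void *)f->addr, PAGE_SIZE);
	if (!a->full_flush)
		__flush_tlb_one(f->addr);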

Note: the line longer than 80 characters will be modified again in a
followup patch.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_32.c    |    5 +++--
 arch/x86/mm/pageattr_64.c    |    4 ++--
 include/asm-x86/cpufeature.h |    1 +
 3 files changed, 6 insertions(+), 4 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -97,10 +97,11 @@ static void flush_kernel_map(void *arg)
 	struct flush_arg *a = (struct flush_arg *)arg;
 	struct flush *f;
 
-	if ((!cpu_has_clflush || a->full_flush) && boot_cpu_data.x86_model >= 4)
+	if ((!cpu_has_clflush || a->full_flush) && boot_cpu_data.x86_model >= 4 &&
+		!cpu_has_ss)
 		wbinvd();
 	list_for_each_entry(f, &a->l, l) {
-		if (!a->full_flush)
+		if (!a->full_flush && !cpu_has_ss)
 			clflush_cache_range((void *)f->addr, PAGE_SIZE);
 		if (!a->full_flush)
 			__flush_tlb_one(f->addr);
Index: linux/include/asm-x86/cpufeature.h
===================================================================
--- linux.orig/include/asm-x86/cpufeature.h
+++ linux/include/asm-x86/cpufeature.h
@@ -167,6 +167,7 @@
 #define cpu_has_pebs		boot_cpu_has(X86_FEATURE_PEBS)
 #define cpu_has_clflush		boot_cpu_has(X86_FEATURE_CLFLSH)
 #define cpu_has_bts		boot_cpu_has(X86_FEATURE_BTS)
+#define cpu_has_ss		boot_cpu_has(X86_FEATURE_SELFSNOOP)
 
 #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
 # define cpu_has_invlpg		1
Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -94,10 +94,10 @@ static void flush_kernel_map(void *arg)
 
 	/* When clflush is available always use it because it is
 	   much cheaper than WBINVD. */
-	if (a->full_flush || !cpu_has_clflush)
+	if ((a->full_flush || !cpu_has_clflush) && !cpu_has_ss)
 		asm volatile("wbinvd" ::: "memory");
 	list_for_each_entry(f, &a->l, l) {
-		if (!a->full_flush)
+		if (!a->full_flush && !cpu_has_ss)
 			clflush_cache_range((void *)f->addr, PAGE_SIZE);
 		if (!a->full_flush)
 			__flush_tlb_one(f->addr);

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [16/31] CPA: Use wbinvd() macro instead of inline assembly in 64bit c_p_a()
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (14 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [15/31] CPA: Don't flush the caches when the CPU supports self-snoop Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [17/31] CPA: Reorder TLB / cache flushes to follow Intel recommendation Andi Kleen
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_64.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -95,7 +95,7 @@ static void flush_kernel_map(void *arg)
 	/* When clflush is available always use it because it is
 	   much cheaper than WBINVD. */
 	if ((a->full_flush || !cpu_has_clflush) && !cpu_has_ss)
-		asm volatile("wbinvd" ::: "memory");
+		wbinvd();
 	list_for_each_entry(f, &a->l, l) {
 		if (!a->full_flush && !cpu_has_ss)
 			clflush_cache_range((void *)f->addr, PAGE_SIZE);

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [17/31] CPA: Reorder TLB / cache flushes to follow Intel recommendation
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (15 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [16/31] CPA: Use wbinvd() macro instead of inline assembly in 64bit c_p_a() Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [18/31] CPA: Make change_page_attr() more robust against use of PAT bits Andi Kleen
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Intel recommends flushing the TLBs first and then the caches on caching
attribute changes. c_p_a() previously did it the other way round. Reorder
that.

The procedure is still not fully compliant with the Intel documentation,
because Intel recommends an all-CPU synchronization step between the TLB
flushes and the cache flushes.

However, on all newer Intel CPUs this is moot anyway because they support
Self-Snoop and can skip the cache flush step entirely.
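
The per-CPU order then becomes roughly (simplified sketch, not the literal
patch code):

	/* 1. TLB: invalidate the changed pages (or everything) */
	list_for_each_entry(f, &a->l, l)
		if (!a->full_flush)
			__flush_tlb_one(f->addr);
	if (a->full_flush)
		__flush_tlb_all();

	/* 2. Cache: write back and invalidate, unless CLFLUSH already
	   covered it or self-snoop makes it unnecessary */
	if ((!cpu_has_clflush || a->full_flush) && !cpu_has_ss)
		wbinvd();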

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_32.c |   13 ++++++++++---
 arch/x86/mm/pageattr_64.c |   14 ++++++++++----
 2 files changed, 20 insertions(+), 7 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -97,9 +97,6 @@ static void flush_kernel_map(void *arg)
 	struct flush_arg *a = (struct flush_arg *)arg;
 	struct flush *f;
 
-	if ((!cpu_has_clflush || a->full_flush) && boot_cpu_data.x86_model >= 4 &&
-		!cpu_has_ss)
-		wbinvd();
 	list_for_each_entry(f, &a->l, l) {
 		if (!a->full_flush && !cpu_has_ss)
 			clflush_cache_range((void *)f->addr, PAGE_SIZE);
@@ -112,6 +109,16 @@ static void flush_kernel_map(void *arg)
 	    (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
 	     boot_cpu_data.x86 == 7))
 		__flush_tlb_all();
+
+	/*
+	 * RED-PEN: Intel documentation ask for a CPU synchronization step
+	 * here and in the loop. But it is moot on Self-Snoop CPUs anyways.
+	 */
+
+	if ((!cpu_has_clflush || a->full_flush) &&
+	    !cpu_has_ss && boot_cpu_data.x86_model >= 4)
+		wbinvd();
+
 }
 
 static void set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte) 
Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -94,16 +94,22 @@ static void flush_kernel_map(void *arg)
 
 	/* When clflush is available always use it because it is
 	   much cheaper than WBINVD. */
-	if ((a->full_flush || !cpu_has_clflush) && !cpu_has_ss)
-		wbinvd();
 	list_for_each_entry(f, &a->l, l) {
-		if (!a->full_flush && !cpu_has_ss)
-			clflush_cache_range((void *)f->addr, PAGE_SIZE);
 		if (!a->full_flush)
 			__flush_tlb_one(f->addr);
+		if (!a->full_flush && !cpu_has_ss)
+			clflush_cache_range((void *)f->addr, PAGE_SIZE);
 	}
 	if (a->full_flush)
 		__flush_tlb_all();
+
+	/*
+	 * RED-PEN: Intel documentation ask for a CPU synchronization step
+	 * here and in the loop. But it is moot on Self-Snoop CPUs anyways.
+	 */
+
+	if ((!cpu_has_clflush || a->full_flush) && !cpu_has_ss)
+		wbinvd();
 }
 
 /* both protected by init_mm.mmap_sem */

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [18/31] CPA: Make change_page_attr() more robust against use of PAT bits
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (16 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [17/31] CPA: Reorder TLB / cache flushes to follow Intel recommendation Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [19/31] CPA: Limit cache flushing to pages that really change caching Andi Kleen
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Use the page table level instead of the PSE bit to check whether the PTE is
for a 4K page or not. This makes the code more robust when the PAT bit is
changed, because on 4K pages the PAT bit is in the same position as the PSE
bit.
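
As a simplified sketch of the 32bit case (not the literal patch code):

	/* Bit 7 is PSE in a PMD entry, but PAT in a 4K PTE, so pte_huge()
	   could misread a 4K PTE with PAT set as a large page.  The level
	   returned by lookup_address() is unambiguous: 3 is the PTE level. */
	if (level == 3)
		set_pte_atomic(kpte, mk_pte(page, prot));	/* 4K page */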

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_32.c |    4 ++--
 arch/x86/mm/pageattr_64.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -209,7 +209,7 @@ __change_page_attr(struct page *page, pg
 	set_tlb_flush(address);
 
 	if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) { 
-		if (!pte_huge(*kpte)) {
+		if (level == 3) {
 			set_pte_atomic(kpte, mk_pte(page, prot)); 
 		} else {
 			pgprot_t ref_prot;
@@ -225,7 +225,7 @@ __change_page_attr(struct page *page, pg
 			kpte_page = split;
 		}
 		page_private(kpte_page)++;
-	} else if (!pte_huge(*kpte)) {
+	} else if (level == 3) {
 		set_pte_atomic(kpte, mk_pte(page, PAGE_KERNEL));
 		BUG_ON(page_private(kpte_page) == 0);
 		page_private(kpte_page)--;
Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -184,7 +184,7 @@ __change_page_attr(unsigned long address
 	set_tlb_flush(address);
 
 	if (pgprot_val(prot) != pgprot_val(ref_prot)) { 
-		if (!pte_huge(*kpte)) {
+		if (level == 4) {
 			set_pte(kpte, pfn_pte(pfn, prot));
 		} else {
 			/*
@@ -201,7 +201,7 @@ __change_page_attr(unsigned long address
 			kpte_page = split;
 		}
 		page_private(kpte_page)++;
-	} else if (!pte_huge(*kpte)) {
+	} else if (level == 4) {
 		set_pte(kpte, pfn_pte(pfn, ref_prot));
 		BUG_ON(page_private(kpte_page) == 0);
 		page_private(kpte_page)--;

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [19/31] CPA: Limit cache flushing to pages that really change caching
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (17 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [18/31] CPA: Make change_page_attr() more robust against use of PAT bits Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-15  8:46   ` Jan Beulich
  2008-01-14 22:16 ` [PATCH] [20/31] CPA: Fix inaccurate comments in 64bit change_page_attr() Andi Kleen
                   ` (12 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Previously change_page_attr() always flushed caches, even for pages that only
change a non-caching-related attribute (like RO for read/write protection).

This changes the flush code to only flush the cache when the caching
attributes actually change.

I made some effort to already handle reprogrammed PAT bits, although this is
not strictly needed by the core kernel right now (but that will probably
change soon).

This will only make a difference on AMD CPUs or older Intel CPUs, because all
newer Intel CPUs support "self-snoop" and do not require this cache flushing
anyway.

Another advantage of this patch is that it prevents recursive slab calls with
slab debugging.
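
The core of the check is the cache_attr_changed() helper added below; as a
simplified sketch (the PAT bit sits at bit 7 in 4K PTEs but at bit 12 in
large pages, hence the per-level lookup table):

	static int cache_attr_changed(pte_t pte, pgprot_t prot, int level)
	{
		/* caching-relevant bits of the current PTE */
		unsigned long a = pte_val(pte) & (_PAGE_PCD|_PAGE_PWT);

		/* fold in PAT, correcting for its position on large pages */
		if (pte_val(pte) & pat_bit[level])
			a |= _PAGE_PAT;
		return a != (pgprot_val(prot) & _PAGE_CACHE);
	}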

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_32.c |   43 +++++++++++++++++++++++++++++++++----------
 arch/x86/mm/pageattr_64.c |   42 ++++++++++++++++++++++++++++++++++--------
 include/asm-x86/pgtable.h |    8 ++++++++
 3 files changed, 75 insertions(+), 18 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -13,7 +13,10 @@
 #include <asm/tlbflush.h>
 #include <asm/io.h>
 
+enum flush_mode { FLUSH_NONE, FLUSH_CACHE, FLUSH_TLB };
+
 struct flush {
+	enum flush_mode mode;
 	struct list_head l;
 	unsigned long addr;
 };
@@ -76,7 +79,7 @@ static struct page *split_large_page(uns
 } 
 
 struct flush_arg {
-	int full_flush;
+	enum flush_mode full_flush;
 	struct list_head l;
 };
 
@@ -91,14 +94,19 @@ static void flush_kernel_map(void *arg)
 {
 	struct flush_arg *a = (struct flush_arg *)arg;
 	struct flush *f;
+	int cache_flush = a->full_flush == FLUSH_CACHE;
 
 	/* When clflush is available always use it because it is
 	   much cheaper than WBINVD. */
 	list_for_each_entry(f, &a->l, l) {
 		if (!a->full_flush)
 			__flush_tlb_one(f->addr);
-		if (!a->full_flush && !cpu_has_ss)
-			clflush_cache_range((void *)f->addr, PAGE_SIZE);
+		if (f->mode == FLUSH_CACHE && !cpu_has_ss) {
+			if (cpu_has_clflush)
+				clflush_cache_range((void *)f->addr, PAGE_SIZE);
+			else
+				cache_flush++;
+		}
 	}
 	if (a->full_flush)
 		__flush_tlb_all();
@@ -108,12 +116,12 @@ static void flush_kernel_map(void *arg)
 	 * here and in the loop. But it is moot on Self-Snoop CPUs anyways.
 	 */
 
-	if ((!cpu_has_clflush || a->full_flush) && !cpu_has_ss)
+	if (cache_flush > 0 && !cpu_has_ss)
 		wbinvd();
 }
 
 /* both protected by init_mm.mmap_sem */
-static int full_flush;
+static enum flush_mode full_flush;
 static LIST_HEAD(deferred_pages);
 static LIST_HEAD(flush_pages);
 
@@ -155,17 +163,35 @@ static void revert_page(unsigned long ad
  * data structure to keep track of the flush. This has the added bonus that
  * it will work for MMIO holes without mem_map too.
  */
-static void set_tlb_flush(unsigned long address)
+static void set_tlb_flush(unsigned long address, int cache)
 {
+	enum flush_mode mode = cache ? FLUSH_CACHE : FLUSH_TLB;
 	struct flush *f = kmalloc(sizeof(struct flush), GFP_KERNEL);
 	if (!f) {
-		full_flush = 1;
+		if (full_flush < mode)
+			full_flush = mode;
 		return;
 	}
+
 	f->addr = address;
+	f->mode = mode;
 	list_add_tail(&f->l, &flush_pages);
 }
 
+static unsigned short pat_bit[5] = {
+	[4] = _PAGE_PAT,
+	[3] = _PAGE_PAT_LARGE,
+};
+
+static int cache_attr_changed(pte_t pte, pgprot_t prot, int level)
+{
+	unsigned a = pte_val(pte) & (_PAGE_PCD|_PAGE_PWT);
+	/* Correct for PAT bit being in different position on large pages */
+	if (pte_val(pte) & pat_bit[level])
+		a |= _PAGE_PAT;
+	return a != (pgprot_val(prot) & _PAGE_CACHE);
+}
+
 static int
 __change_page_attr(unsigned long address, unsigned long pfn, pgprot_t prot,
 				   pgprot_t ref_prot)
@@ -181,7 +207,7 @@ __change_page_attr(unsigned long address
 	BUG_ON(PageCompound(kpte_page));
 	BUG_ON(PageLRU(kpte_page));
 
-	set_tlb_flush(address);
+	set_tlb_flush(address, cache_attr_changed(*kpte, prot, level));
 
 	if (pgprot_val(prot) != pgprot_val(ref_prot)) { 
 		if (level == 4) {
Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -21,12 +21,15 @@
 #define ClearPageDeferred(p) clear_bit(PG_deferred, &(p)->flags)
 #define TestSetPageDeferred(p) test_and_set_bit(PG_deferred, &(p)->flags)
 
+enum flush_mode { FLUSH_NONE, FLUSH_CACHE, FLUSH_TLB };
+
 /* Variables protected by init_mm.mmap_sem */
-static int full_flush;
+static enum flush_mode full_flush;
 static struct list_head df_list = LIST_HEAD_INIT(df_list);
 static LIST_HEAD(flush_pages);
 
 struct flush {
+	enum flush_mode mode;
 	struct list_head l;
 	unsigned long addr;
 };
@@ -81,7 +84,7 @@ static struct page *split_large_page(uns
 } 
 
 struct flush_arg {
-	int full_flush;
+	enum flush_mode full_flush;
 	struct list_head l;
 };
 
@@ -96,12 +99,17 @@ static void flush_kernel_map(void *arg)
 {
 	struct flush_arg *a = (struct flush_arg *)arg;
 	struct flush *f;
+	int cache_flush = a->full_flush == FLUSH_CACHE;
 
 	list_for_each_entry(f, &a->l, l) {
-		if (!a->full_flush && !cpu_has_ss)
-			clflush_cache_range((void *)f->addr, PAGE_SIZE);
 		if (!a->full_flush)
 			__flush_tlb_one(f->addr);
+		if (f->mode == FLUSH_CACHE && !cpu_has_ss) {
+			if (cpu_has_clflush)
+				clflush_cache_range((void *)f->addr, PAGE_SIZE);
+			else
+				cache_flush++;
+		}
 	}
 
 	/* Handle errata with flushing large pages in early Athlons */
@@ -115,10 +123,8 @@ static void flush_kernel_map(void *arg)
 	 * here and in the loop. But it is moot on Self-Snoop CPUs anyways.
 	 */
 
-	if ((!cpu_has_clflush || a->full_flush) &&
-	    !cpu_has_ss && boot_cpu_data.x86_model >= 4)
+	if (cache_flush > 0 && !cpu_has_ss && boot_cpu_data.x86_model >= 4)
 		wbinvd();
-
 }
 
 static void set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte) 
@@ -169,6 +175,20 @@ static inline void save_page(struct page
 		list_add(&kpte_page->lru, &df_list);
 }
 
+static unsigned short pat_bit[4] = {
+	[3] = _PAGE_PAT,
+	[2] = _PAGE_PAT_LARGE,
+};
+
+static int cache_attr_changed(pte_t pte, pgprot_t prot, int level)
+{
+	unsigned long a = pte_val(pte) & (_PAGE_PCD|_PAGE_PWT);
+	/* Correct for PAT bit being in different position on large pages */
+	if (pte_val(pte) & pat_bit[level])
+		a |= _PAGE_PAT;
+	return a != (pgprot_val(prot) & _PAGE_CACHE);
+}
+
 /*
  * Mark the address for flushing later in global_tlb_flush().
  *
@@ -177,14 +197,17 @@ static inline void save_page(struct page
  * data structure to keep track of the flush. This has the added bonus that
  * it will work for MMIO holes without mem_map too.
  */
-static void set_tlb_flush(unsigned long address)
+static void set_tlb_flush(unsigned long address, int cache)
 {
+	enum flush_mode mode = cache ? FLUSH_CACHE : FLUSH_TLB;
 	struct flush *f = kmalloc(sizeof(struct flush), GFP_KERNEL);
 	if (!f) {
-		full_flush = 1;
+		if (full_flush < mode)
+			full_flush = mode;
 		return;
 	}
 	f->addr = address;
+	f->mode = mode;
 	list_add_tail(&f->l, &flush_pages);
 }
 
@@ -206,7 +229,7 @@ __change_page_attr(struct page *page, pg
 	BUG_ON(PageLRU(kpte_page));
 	BUG_ON(PageCompound(kpte_page));
 
-	set_tlb_flush(address);
+	set_tlb_flush(address, cache_attr_changed(*kpte, prot, level));
 
 	if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) { 
 		if (level == 3) {
Index: linux/include/asm-x86/pgtable.h
===================================================================
--- linux.orig/include/asm-x86/pgtable.h
+++ linux/include/asm-x86/pgtable.h
@@ -13,12 +13,17 @@
 #define _PAGE_BIT_DIRTY		6
 #define _PAGE_BIT_FILE		6
 #define _PAGE_BIT_PSE		7	/* 4 MB (or 2MB) page */
+#define _PAGE_BIT_PAT		7	/* Only on 4kb pages */
+#define _PAGE_BIT_PAT_LARGE    12	/* Only on large pages */
 #define _PAGE_BIT_GLOBAL	8	/* Global TLB entry PPro+ */
 #define _PAGE_BIT_UNUSED1	9	/* available for programmer */
 #define _PAGE_BIT_UNUSED2	10
 #define _PAGE_BIT_UNUSED3	11
 #define _PAGE_BIT_NX           63       /* No execute: only valid after cpuid check */
 
+/* Needs special handling for large pages */
+#define _PAGE_CACHE	 (_PAGE_PCD|_PAGE_PWT|_PAGE_PAT)
+
 #define _PAGE_PRESENT	(_AC(1, UL)<<_PAGE_BIT_PRESENT)
 #define _PAGE_RW	(_AC(1, UL)<<_PAGE_BIT_RW)
 #define _PAGE_USER	(_AC(1, UL)<<_PAGE_BIT_USER)
@@ -32,6 +37,9 @@
 #define _PAGE_UNUSED2	(_AC(1, UL)<<_PAGE_BIT_UNUSED2)
 #define _PAGE_UNUSED3	(_AC(1, UL)<<_PAGE_BIT_UNUSED3)
 
+#define _PAGE_PAT	(_AC(1, UL) << _PAGE_BIT_PAT)
+#define _PAGE_PAT_LARGE (_AC(1, UL) << _PAGE_BIT_PAT_LARGE)
+
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
 #define _PAGE_NX	(_AC(1, ULL) << _PAGE_BIT_NX)
 #else

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [20/31] CPA: Fix inaccurate comments in 64bit change_page_attr()
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (18 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [19/31] CPA: Limit cache flushing to pages that really change caching Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [21/31] CPA: Dump pagetable when inconsistency is detected Andi Kleen
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


No code changes.
Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_64.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -120,7 +120,7 @@ static void flush_kernel_map(void *arg)
 		wbinvd();
 }
 
-/* both protected by init_mm.mmap_sem */
+/* All protected by init_mm.mmap_sem */
 static enum flush_mode full_flush;
 static LIST_HEAD(deferred_pages);
 static LIST_HEAD(flush_pages);
@@ -132,7 +132,7 @@ static inline void save_page(struct page
 }
 
 /* 
- * No more special protections in this 2/4MB area - revert to a
+ * No more special protections in this 2MB area - revert to a
  * large page again. 
  */
 static void revert_page(unsigned long address, pgprot_t ref_prot)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [21/31] CPA: Dump pagetable when inconsistency is detected
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (19 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [20/31] CPA: Fix inaccurate comments in 64bit change_page_attr() Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [22/31] CPA: Only queue actually unused page table pages for freeing Andi Kleen
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


When c_p_a() detects an inconsistency in the kernel page tables it BUGs.
When this happens, dump the page table first to save one bug reporting
round trip.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_32.c |   11 ++++++++++-
 arch/x86/mm/pageattr_64.c |   11 ++++++++++-
 2 files changed, 20 insertions(+), 2 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -12,6 +12,7 @@
 #include <asm/processor.h>
 #include <asm/tlbflush.h>
 #include <asm/io.h>
+#include <asm/kdebug.h>
 
 enum flush_mode { FLUSH_NONE, FLUSH_CACHE, FLUSH_TLB };
 
@@ -231,8 +232,16 @@ __change_page_attr(unsigned long address
 		set_pte(kpte, pfn_pte(pfn, ref_prot));
 		BUG_ON(page_private(kpte_page) == 0);
 		page_private(kpte_page)--;
-	} else
+	} else {
+		/*
+		 * When you're here you either set the same page to PAGE_KERNEL
+		 * two times in a row or the page table reference counting is
+		 * broken again. To catch the later bug for now (sorry)
+		 */
+		printk(KERN_ERR "address %lx\n", address);
+		dump_pagetable(address);
 		BUG();
+	}
 
 	/* on x86-64 the direct mapping set at boot is not using 4k pages */
 	BUG_ON(PageReserved(kpte_page));
Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -13,6 +13,7 @@
 #include <asm/tlbflush.h>
 #include <asm/pgalloc.h>
 #include <asm/sections.h>
+#include <asm/kdebug.h>
 
 #define PG_deferred PG_arch_1
 
@@ -252,8 +253,16 @@ __change_page_attr(struct page *page, pg
 		set_pte_atomic(kpte, mk_pte(page, PAGE_KERNEL));
 		BUG_ON(page_private(kpte_page) == 0);
 		page_private(kpte_page)--;
-	} else
+	} else {
+		/*
+		 * When you're here you either set the same page to PAGE_KERNEL
+		 * two times in a row or the page table reference counting is
+		 * broken again. To catch the later bug for now (sorry)
+		 */
+		printk(KERN_ERR "address %lx\n", address);
+		dump_pagetable(address);
 		BUG();
+	}
 
 	/*
 	 * If the pte was reserved, it means it was created at boot

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [22/31] CPA: Only queue actually unused page table pages for freeing
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (20 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [21/31] CPA: Dump pagetable when inconsistency is detected Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [23/31] CPA: Remove unnecessary masking of address Andi Kleen
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


With the separate data structure added for flushing earlier, it is now only
necessary to call save_page() on pte pages that have already been reverted.

Also move all freeing checks into the caller.
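
The caller then does all the checks before queueing the pte page for
freeing, roughly (sketch of the resulting 32bit code path, not the literal
patch):

	if (!PageReserved(kpte_page)) {
		if (cpu_has_pse && page_private(kpte_page) == 0) {
			/* queue the now unused pte page for freeing */
			save_page(kpte_page);
			paravirt_release_pt(page_to_pfn(kpte_page));
			revert_page(kpte_page, address);
		}
	}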

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_32.c |    4 +---
 arch/x86/mm/pageattr_64.c |    7 +++----
 2 files changed, 4 insertions(+), 7 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -270,9 +270,9 @@ __change_page_attr(struct page *page, pg
 	 * replace it with a largepage.
 	 */
 
-	save_page(kpte_page);
 	if (!PageReserved(kpte_page)) {
 		if (cpu_has_pse && (page_private(kpte_page) == 0)) {
+			save_page(kpte_page);
 			paravirt_release_pt(page_to_pfn(kpte_page));
 			revert_page(kpte_page, address);
 		}
@@ -333,8 +333,6 @@ void global_flush_tlb(void)
 	list_for_each_entry_safe(pg, next, &free_pages, lru) {
 		list_del(&pg->lru);
 		ClearPageDeferred(pg);
-		if (PageReserved(pg) || !cpu_has_pse || page_private(pg) != 0)
-			continue;
 		ClearPagePrivate(pg);
 		__free_page(pg);
 	}
Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -246,9 +246,10 @@ __change_page_attr(unsigned long address
 	/* on x86-64 the direct mapping set at boot is not using 4k pages */
 	BUG_ON(PageReserved(kpte_page));
 
-	save_page(kpte_page);
-	if (page_private(kpte_page) == 0)
+	if (page_private(kpte_page) == 0) {
+		save_page(kpte_page);
 		revert_page(address, ref_prot);
+	}
 	return 0;
 } 
 
@@ -336,8 +337,6 @@ void global_flush_tlb(void)
 	list_for_each_entry_safe(pg, next, &free_pages, lru) {
 		list_del(&pg->lru);
 		ClearPageDeferred(pg);
-		if (page_private(pg) != 0)
-			continue;
 		ClearPagePrivate(pg);
 		__free_page(pg);
 	} 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [23/31] CPA: Remove unnecessary masking of address
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (21 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [22/31] CPA: Only queue actually unused page table pages for freeing Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [24/31] CPA: Only unmap kernel init pages in text mapping when CONFIG_DEBUG_RODATA is set Andi Kleen
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


virt_to_page() does not care about the bits below page granularity,
so don't mask them off.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_64.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -204,7 +204,7 @@ __change_page_attr(unsigned long address
 
 	kpte = lookup_address(address, &level);
 	if (!kpte) return 0;
-	kpte_page = virt_to_page(((unsigned long)kpte) & PAGE_MASK);
+	kpte_page = virt_to_page(kpte);
 	BUG_ON(PageCompound(kpte_page));
 	BUG_ON(PageLRU(kpte_page));
 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [24/31] CPA: Only unmap kernel init pages in text mapping when CONFIG_DEBUG_RODATA is set
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (22 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [23/31] CPA: Remove unnecessary masking of address Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [25/31] CPA: Always do full TLB flush when splitting large pages Andi Kleen
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Otherwise the kernel will likely always run with 4K pages instead of 2MB
pages, which is costly in terms of TLB entries.

Also optimize it a little bit by using only a single change_page_attr() call.
This is particularly useful if debugging is enabled inside it, because it
spams the logs much less.
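
The whole freed init range is now changed in one call under
CONFIG_DEBUG_RODATA, roughly (sketch of the idea, not the literal patch
code):

#ifdef CONFIG_DEBUG_RODATA
	/* one call for the whole range instead of one per freed page */
	if (begin >= __START_KERNEL_map) {
		change_page_attr_addr(begin, (end - begin) / PAGE_SIZE,
					__pgprot(0));
		global_flush_tlb();
	}
#endif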

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/init_64.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -553,13 +553,16 @@ void free_init_pages(char *what, unsigne
 		init_page_count(virt_to_page(addr));
 		memset((void *)(addr & ~(PAGE_SIZE-1)),
 			POISON_FREE_INITMEM, PAGE_SIZE);
-		if (addr >= __START_KERNEL_map)
-			change_page_attr_addr(addr, 1, __pgprot(0));
 		free_page(addr);
 		totalram_pages++;
 	}
-	if (addr > __START_KERNEL_map)
+#ifdef CONFIG_DEBUG_RODATA
+	if (begin >= __START_KERNEL_map) {
+		change_page_attr_addr(begin, (end - begin)/PAGE_SIZE,
+					__pgprot(0));
 		global_flush_tlb();
+	}
+#endif
 }
 
 void free_initmem(void)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [25/31] CPA: Always do full TLB flush when splitting large pages
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (23 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [24/31] CPA: Only unmap kernel init pages in text mapping when CONFIG_DEBUG_RODATA is set Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-14 22:16 ` [PATCH] [26/31] CPA: Fix reference counting when changing already changed pages Andi Kleen
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Various CPUs have errata when using INVLPG to flush large pages; this
includes Intel Penryn (AV2) and AMD K7 (#16 in Athlon 4). While these happen
only in specific circumstances, it is still a little risky and it's not clear
the kernel can avoid them all.

Avoid this can of worms by always flushing the full TLB (but not the full
cache) when splitting a large page. This should not be that expensive anyway,
and initial splitting should hopefully be infrequent.

This also removes the previously hard-coded workaround for the K7 Athlon on
32bit.
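
In the sketch below (using the identifiers from the diff, not the literal
patch code), splitting simply escalates the pending flush for the whole
batch:

	/* Splitting a large page: dodge the large-page INVLPG errata by
	   forcing a full TLB flush in global_flush_tlb() */
	if (large && full_flush < FLUSH_TLB)
		full_flush = FLUSH_TLB;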

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_32.c |   17 +++++++++++------
 arch/x86/mm/pageattr_64.c |   12 ++++++++++--
 2 files changed, 21 insertions(+), 8 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -113,10 +113,7 @@ static void flush_kernel_map(void *arg)
 		}
 	}
 
-	/* Handle errata with flushing large pages in early Athlons */
-	if (a->full_flush ||
-	    (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
-	     boot_cpu_data.x86 == 7))
+	if (a->full_flush)
 		__flush_tlb_all();
 
 	/*
@@ -198,7 +195,7 @@ static int cache_attr_changed(pte_t pte,
  * data structure to keep track of the flush. This has the added bonus that
  * it will work for MMIO holes without mem_map too.
  */
-static void set_tlb_flush(unsigned long address, int cache)
+static void set_tlb_flush(unsigned long address, int cache, int large)
 {
 	enum flush_mode mode = cache ? FLUSH_CACHE : FLUSH_TLB;
 	struct flush *f = kmalloc(sizeof(struct flush), GFP_KERNEL);
@@ -210,6 +207,13 @@ static void set_tlb_flush(unsigned long 
 	f->addr = address;
 	f->mode = mode;
 	list_add_tail(&f->l, &flush_pages);
+
+	/*
+	 * Work around large page INVLPG bugs in early K7 and in Penryn.
+	 * When we split a large page always do a full TLB flush.
+	 */
+	if (large && full_flush < FLUSH_TLB)
+		full_flush = FLUSH_TLB;
 }
 
 static int
@@ -230,7 +234,8 @@ __change_page_attr(struct page *page, pg
 	BUG_ON(PageLRU(kpte_page));
 	BUG_ON(PageCompound(kpte_page));
 
-	set_tlb_flush(address, cache_attr_changed(*kpte, prot, level));
+	set_tlb_flush(address, cache_attr_changed(*kpte, prot, level),
+			level < 3);
 
 	if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) { 
 		if (level == 3) {
Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -164,7 +164,7 @@ static void revert_page(unsigned long ad
  * data structure to keep track of the flush. This has the added bonus that
  * it will work for MMIO holes without mem_map too.
  */
-static void set_tlb_flush(unsigned long address, int cache)
+static void set_tlb_flush(unsigned long address, int cache, int large)
 {
 	enum flush_mode mode = cache ? FLUSH_CACHE : FLUSH_TLB;
 	struct flush *f = kmalloc(sizeof(struct flush), GFP_KERNEL);
@@ -177,6 +177,13 @@ static void set_tlb_flush(unsigned long 
 	f->addr = address;
 	f->mode = mode;
 	list_add_tail(&f->l, &flush_pages);
+
+	/*
+	 * Work around large page INVLPG bugs in early K7 and in Penryn.
+	 * When we split a large page always do a full TLB flush.
+	 */
+	if (large && full_flush < FLUSH_TLB)
+		full_flush = FLUSH_TLB;
 }
 
 static unsigned short pat_bit[5] = {
@@ -208,7 +215,8 @@ __change_page_attr(unsigned long address
 	BUG_ON(PageCompound(kpte_page));
 	BUG_ON(PageLRU(kpte_page));
 
-	set_tlb_flush(address, cache_attr_changed(*kpte, prot, level));
+	set_tlb_flush(address, cache_attr_changed(*kpte, prot, level),
+			level < 4);
 
 	if (pgprot_val(prot) != pgprot_val(ref_prot)) { 
 		if (level == 4) {

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [26/31] CPA: Fix reference counting when changing already changed pages
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (24 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [25/31] CPA: Always do full TLB flush when splitting large pages Andi Kleen
@ 2008-01-14 22:16 ` Andi Kleen
  2008-01-15  9:05   ` Jan Beulich
  2008-01-14 22:17 ` [PATCH] [27/31] CPA: Change comments of external interfaces to kerneldoc format Andi Kleen
                   ` (5 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:16 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


When changing a page that has already been modified to non-standard
attributes, don't bump the reference count again. And when changing a page
back, only decrease the ref count if the old attributes were non-standard.
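
The intent is that page_private() of the pte page counts how many 4K entries
currently deviate from ref_prot; as a simplified sketch (not the literal
patch code):

	if (pgprot_val(prot) != pgprot_val(ref_prot)) {
		/* count only the standard -> non-standard transition */
		if (pgprot_val(oldprot) == pgprot_val(ref_prot))
			page_private(kpte_page)++;
	} else {
		/* uncount only if the entry really was non-standard before */
		if (pgprot_val(oldprot) != pgprot_val(ref_prot))
			page_private(kpte_page)--;
	}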

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_32.c |   44 +++++++++++++++++++++++++-------------------
 arch/x86/mm/pageattr_64.c |   16 ++++++++++++----
 include/asm-x86/pgtable.h |    2 ++
 3 files changed, 39 insertions(+), 23 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -206,20 +206,26 @@ __change_page_attr(unsigned long address
 { 
 	pte_t *kpte; 
 	struct page *kpte_page;
-	pgprot_t ref_prot2;
+	pgprot_t ref_prot2, oldprot;
 	int level;
 
 	kpte = lookup_address(address, &level);
 	if (!kpte) return 0;
 	kpte_page = virt_to_page(kpte);
+	oldprot = pte_pgprot(*kpte);
 	BUG_ON(PageCompound(kpte_page));
 	BUG_ON(PageLRU(kpte_page));
 
 	set_tlb_flush(address, cache_attr_changed(*kpte, prot, level),
 			level < 4);
 
+	ref_prot = canon_pgprot(ref_prot);
+	prot = canon_pgprot(prot);
+
 	if (pgprot_val(prot) != pgprot_val(ref_prot)) { 
 		if (level == 4) {
+			if (pgprot_val(oldprot) == pgprot_val(ref_prot))
+				page_private(kpte_page)++;
 			set_pte(kpte, pfn_pte(pfn, prot));
 		} else {
 			/*
@@ -234,12 +240,14 @@ __change_page_attr(unsigned long address
 			pgprot_val(ref_prot2) &= ~_PAGE_NX;
 			set_pte(kpte, mk_pte(split, ref_prot2));
 			kpte_page = split;
+			page_private(kpte_page)++;
 		}
-		page_private(kpte_page)++;
 	} else if (level == 4) {
+		if (pgprot_val(oldprot) != pgprot_val(ref_prot)) {
+			BUG_ON(page_private(kpte_page) <= 0);
+			page_private(kpte_page)--;
+		}
 		set_pte(kpte, pfn_pte(pfn, ref_prot));
-		BUG_ON(page_private(kpte_page) == 0);
-		page_private(kpte_page)--;
 	} else {
 		/*
 		 * When you're here you either set the same page to PAGE_KERNEL
Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -151,20 +151,17 @@ static void set_pmd_pte(pte_t *kpte, uns
  * No more special protections in this 2/4MB area - revert to a
  * large page again. 
  */
-static inline void revert_page(struct page *kpte_page, unsigned long address)
+static void
+revert_page(struct page *kpte_page, unsigned long address, pgprot_t ref_prot)
 {
-	pgprot_t ref_prot;
 	pte_t *linear;
 
-	ref_prot =
-	((address & LARGE_PAGE_MASK) < (unsigned long)&_etext)
-		? PAGE_KERNEL_LARGE_EXEC : PAGE_KERNEL_LARGE;
-
 	linear = (pte_t *)
 		pmd_offset(pud_offset(pgd_offset_k(address), address), address);
 	set_pmd_pte(linear,  address,
-		    pfn_pte((__pa(address) & LARGE_PAGE_MASK) >> PAGE_SHIFT,
-			    ref_prot));
+		    pte_mkhuge(pfn_pte((__pa(address) & LARGE_PAGE_MASK)
+					>> PAGE_SHIFT,
+			    ref_prot)));
 }
 
 static inline void save_page(struct page *kpte_page)
@@ -223,6 +220,8 @@ __change_page_attr(struct page *page, pg
 	unsigned long address;
 	struct page *kpte_page;
 	int level;
+	pgprot_t oldprot;
+	pgprot_t ref_prot = PAGE_KERNEL;
 
 	BUG_ON(PageHighMem(page));
 	address = (unsigned long)page_address(page);
@@ -230,6 +229,8 @@ __change_page_attr(struct page *page, pg
 	kpte = lookup_address(address, &level);
 	if (!kpte)
 		return -EINVAL;
+
+	oldprot = pte_pgprot(*kpte);
 	kpte_page = virt_to_page(kpte);
 	BUG_ON(PageLRU(kpte_page));
 	BUG_ON(PageCompound(kpte_page));
@@ -237,27 +238,32 @@ __change_page_attr(struct page *page, pg
 	set_tlb_flush(address, cache_attr_changed(*kpte, prot, level),
 			level < 3);
 
-	if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) { 
+	if ((address & LARGE_PAGE_MASK) < (unsigned long)&_etext)
+		ref_prot = PAGE_KERNEL_EXEC;
+
+	ref_prot = canon_pgprot(ref_prot);
+	prot = canon_pgprot(prot);
+
+	if (pgprot_val(prot) != pgprot_val(ref_prot)) {
 		if (level == 3) {
+			if (pgprot_val(oldprot) == pgprot_val(ref_prot))
+				page_private(kpte_page)++;
 			set_pte_atomic(kpte, mk_pte(page, prot)); 
 		} else {
-			pgprot_t ref_prot;
 			struct page *split;
-
-			ref_prot =
-			((address & LARGE_PAGE_MASK) < (unsigned long)&_etext)
-				? PAGE_KERNEL_EXEC : PAGE_KERNEL;
 			split = split_large_page(address, prot, ref_prot);
 			if (!split)
 				return -ENOMEM;
 			set_pmd_pte(kpte,address,mk_pte(split, ref_prot));
 			kpte_page = split;
+			page_private(kpte_page)++;
 		}
-		page_private(kpte_page)++;
 	} else if (level == 3) {
-		set_pte_atomic(kpte, mk_pte(page, PAGE_KERNEL));
-		BUG_ON(page_private(kpte_page) == 0);
-		page_private(kpte_page)--;
+		if (pgprot_val(oldprot) != pgprot_val(ref_prot)) {
+			BUG_ON(page_private(kpte_page) <= 0);
+			page_private(kpte_page)--;
+		}
+		set_pte_atomic(kpte, mk_pte(page, ref_prot));
 	} else {
 		/*
 		 * When you're here you either set the same page to PAGE_KERNEL
@@ -279,7 +285,7 @@ __change_page_attr(struct page *page, pg
 		if (cpu_has_pse && (page_private(kpte_page) == 0)) {
 			save_page(kpte_page);
 			paravirt_release_pt(page_to_pfn(kpte_page));
-			revert_page(kpte_page, address);
+			revert_page(kpte_page, address, ref_prot);
 		}
 	}
 	return 0;
Index: linux/include/asm-x86/pgtable.h
===================================================================
--- linux.orig/include/asm-x86/pgtable.h
+++ linux/include/asm-x86/pgtable.h
@@ -192,6 +192,8 @@ static inline pte_t pte_modify(pte_t pte
 	return __pte(val);
 }
 
+#define canon_pgprot(p) __pgprot(pgprot_val(p) & __supported_pte_mask)
+
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
 #else  /* !CONFIG_PARAVIRT */

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [27/31] CPA: Change comments of external interfaces to kerneldoc format
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (25 preceding siblings ...)
  2008-01-14 22:16 ` [PATCH] [26/31] CPA: Fix reference counting when changing already changed pages Andi Kleen
@ 2008-01-14 22:17 ` Andi Kleen
  2008-01-14 22:50   ` Randy Dunlap
  2008-01-14 22:17 ` [PATCH] [28/31] CPA: Make kernel_text test match boot mapping initialization Andi Kleen
                   ` (4 subsequent siblings)
  31 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:17 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Also clarify the descriptions a bit.

Only done for 64bit, but the interfaces are identical for 32bit, so kerneldoc
should merge them (?)

Signed-off-by: Andi Kleen <ak@suse.de>

---
 Documentation/DocBook/kernel-api.tmpl |    8 +++++
 arch/x86/mm/pageattr_64.c             |   46 +++++++++++++++++++++++++---------
 2 files changed, 42 insertions(+), 12 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -269,19 +269,19 @@ __change_page_attr(unsigned long address
 	return 0;
 } 
 
-/*
- * Change the page attributes of an page in the linear mapping.
- *
- * This should be used when a page is mapped with a different caching policy
- * than write-back somewhere - some CPUs do not like it when mappings with
- * different caching policies exist. This changes the page attributes of the
- * in kernel linear mapping too.
+/**
+ * change_page_attr_addr - Change page table attributes in linear mapping
+ * @address: Virtual address in linear mapping.
+ * @numpages: Number of pages to change
+ * @prot:    New page table attribute (PAGE_*)
  * 
- * The caller needs to ensure that there are no conflicting mappings elsewhere.
- * This function only deals with the kernel linear map.
- * 
- * Caller must call global_flush_tlb() after this.
+ * Change page attributes of a page in the direct mapping. This is a variant
+ * of change_page_attr() that also works on memory holes that do not have
+ * mem_map entry (pfn_valid() is false).
+ *
+ * See change_page_attr() documentation for more details.
  */
+
 int change_page_attr_addr(unsigned long address, int numpages, pgprot_t prot)
 {
 	int err = 0, kernel_map = 0;
@@ -318,13 +318,35 @@ int change_page_attr_addr(unsigned long 
 	return err;
 }
 
-/* Don't call this for MMIO areas that may not have a mem_map entry */
+/**
+ * change_page_attr - Change page table attributes in the linear mapping.
+ * @page: First page to change
+ * @numpages: Number of pages to change
+ * @prot: New protection/caching type (PAGE_*)
+ *
+ * Returns 0 on success, otherwise a negated errno.
+ *
+ * This should be used when a page is mapped with a different caching policy
+ * than write-back somewhere - some CPUs do not like it when mappings with
+ * different caching policies exist. This changes the page attributes of the
+ * in kernel linear mapping too.
+ *
+ * Caller must call global_flush_tlb() later to make the changes active.
+ *
+ * The caller needs to ensure that there are no conflicting mappings elsewhere
+ * (e.g. in user space). This function only deals with the kernel linear map.
+ *
+ * For MMIO areas without mem_map use change_page_attr_addr() instead.
+ */
 int change_page_attr(struct page *page, int numpages, pgprot_t prot)
 {
 	unsigned long addr = (unsigned long)page_address(page);
 	return change_page_attr_addr(addr, numpages, prot);
 }
 
+/**
+ * global_flush_tlb - Flush linear mappings changed by change_page_attr()
+ */
 void global_flush_tlb(void)
 { 
 	struct page *pg, *next;
Index: linux/Documentation/DocBook/kernel-api.tmpl
===================================================================
--- linux.orig/Documentation/DocBook/kernel-api.tmpl
+++ linux/Documentation/DocBook/kernel-api.tmpl
@@ -726,4 +726,12 @@ X!Idrivers/video/console/fonts.c
 !Ffs/pipe.c
   </chapter>
 
+  <chapter id="pageattr">
+	<title>Kernel direct mapping paging attributes</title>
+  <para>
+	Changing the paging attributes of kernel direct mapping pages
+	on x86.
+  </para>
+!Farch/x86/mm/pageattr_64.c
+  </chapter>
 </book>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [28/31] CPA: Make kernel_text test match boot mapping initialization
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (26 preceding siblings ...)
  2008-01-14 22:17 ` [PATCH] [27/31] CPA: Change comments of external interfaces to kerneldoc format Andi Kleen
@ 2008-01-14 22:17 ` Andi Kleen
  2008-01-14 22:17 ` [PATCH] [29/31] CPA: Add a BUG_ON checking for someone setting the kernel text NX Andi Kleen
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:17 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


The boot-time direct mapping initialization used a different test than
c_p_a() to check whether a page is part of the kernel mapping. Make them
use a common function.

Also round up to the large page size to be safe, and check against the start
of the kernel text (rather than PAGE_OFFSET) to better handle kernels loaded
at high addresses.

This is a small semantic change: NX is now always applied at 2MB granularity
on !PSE && NX systems, but that's an obscure case even considering
DEBUG_PAGEALLOC.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/init_32.c        |   16 ++--------------
 arch/x86/mm/pageattr_32.c    |    9 ++++++++-
 include/asm-x86/pgtable_32.h |    2 +-
 3 files changed, 11 insertions(+), 16 deletions(-)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -184,6 +184,13 @@ static int cache_attr_changed(pte_t pte,
 	return a != (pgprot_val(prot) & _PAGE_CACHE);
 }
 
+int text_address(unsigned long addr)
+{
+	unsigned long start = ((unsigned long)&_text) & LARGE_PAGE_MASK;
+	unsigned long end = ((unsigned long)&__init_end) & LARGE_PAGE_MASK;
+	return addr >= start && addr < end + LARGE_PAGE_SIZE;
+}
+
 /*
  * Mark the address for flushing later in global_tlb_flush().
  *
@@ -238,7 +245,7 @@ __change_page_attr(struct page *page, pg
 	set_tlb_flush(address, cache_attr_changed(*kpte, prot, level),
 			level < 3);
 
-	if ((address & LARGE_PAGE_MASK) < (unsigned long)&_etext)
+	if (text_address(address))
 		ref_prot = PAGE_KERNEL_EXEC;
 
 	ref_prot = canon_pgprot(ref_prot);
Index: linux/arch/x86/mm/init_32.c
===================================================================
--- linux.orig/arch/x86/mm/init_32.c
+++ linux/arch/x86/mm/init_32.c
@@ -136,13 +136,6 @@ static void __init page_table_range_init
 	}
 }
 
-static inline int is_kernel_text(unsigned long addr)
-{
-	if (addr >= PAGE_OFFSET && addr <= (unsigned long)__init_end)
-		return 1;
-	return 0;
-}
-
 /*
  * This maps the physical memory to kernel virtual address space, a total 
  * of max_low_pfn pages, by creating page tables starting from address 
@@ -172,14 +165,9 @@ static void __init kernel_physical_mappi
 			/* Map with big pages if possible, otherwise
 			   create normal page tables. */
 			if (cpu_has_pse) {
-				unsigned int address2;
 				pgprot_t prot = PAGE_KERNEL_LARGE;
 
-				address2 = (pfn + PTRS_PER_PTE - 1) * PAGE_SIZE +
-					PAGE_OFFSET + PAGE_SIZE-1;
-
-				if (is_kernel_text(address) ||
-				    is_kernel_text(address2))
+				if (text_address(address))
 					prot = PAGE_KERNEL_LARGE_EXEC;
 
 				set_pmd(pmd, pfn_pmd(pfn, prot));
@@ -193,7 +181,7 @@ static void __init kernel_physical_mappi
 				     pte++, pfn++, pte_ofs++, address += PAGE_SIZE) {
 					pgprot_t prot = PAGE_KERNEL;
 
-					if (is_kernel_text(address))
+					if (text_address(address))
 						prot = PAGE_KERNEL_EXEC;
 
 					set_pte(pte, pfn_pte(pfn, prot));
Index: linux/include/asm-x86/pgtable_32.h
===================================================================
--- linux.orig/include/asm-x86/pgtable_32.h
+++ linux/include/asm-x86/pgtable_32.h
@@ -34,7 +34,7 @@ void check_pgt_cache(void);
 void pmd_ctor(struct kmem_cache *, void *);
 void pgtable_cache_init(void);
 void paging_init(void);
-
+int text_address(unsigned long);
 
 /*
  * The Linux x86 paging architecture is 'compile-time dual-mode', it

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [29/31] CPA: Add a BUG_ON checking for someone setting the kernel text NX
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (27 preceding siblings ...)
  2008-01-14 22:17 ` [PATCH] [28/31] CPA: Make kernel_text test match boot mapping initialization Andi Kleen
@ 2008-01-14 22:17 ` Andi Kleen
  2008-01-14 22:17 ` [PATCH] [30/31] Remove set_kernel_exec Andi Kleen
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:17 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


Someone setting NX on the kernel text tends to result in nasty failures
and triple faults, so BUG_ON early for that.

Does not cover __inittext.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_32.c |    8 ++++++++
 1 file changed, 8 insertions(+)

Index: linux/arch/x86/mm/pageattr_32.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_32.c
+++ linux/arch/x86/mm/pageattr_32.c
@@ -242,6 +242,14 @@ __change_page_attr(struct page *page, pg
 	BUG_ON(PageLRU(kpte_page));
 	BUG_ON(PageCompound(kpte_page));
 
+	/*
+	 * Better fail early if someone sets the kernel text to NX.
+	 * Does not cover __inittext
+	 */
+	BUG_ON(address >= (unsigned long)&_text &&
+		address < (unsigned long)&_etext &&
+	       (pgprot_val(prot) & _PAGE_NX));
+
 	set_tlb_flush(address, cache_attr_changed(*kpte, prot, level),
 			level < 3);
 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [30/31] Remove set_kernel_exec
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (28 preceding siblings ...)
  2008-01-14 22:17 ` [PATCH] [29/31] CPA: Add a BUG_ON checking for someone setting the kernel text NX Andi Kleen
@ 2008-01-14 22:17 ` Andi Kleen
  2008-01-14 22:17 ` [PATCH] [31/31] Clean up pte_exec Andi Kleen
  2008-01-15  9:11 ` [PATCH] [0/31] Great change_page_attr patch series v2 Jan Beulich
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:17 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


The SMP trampoline always runs in real mode, so making it executable
in the page tables doesn't make much sense because it executes
before the page tables are set up. That was the only user of
set_kernel_exec(), so remove it.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/kernel/smpboot_32.c |   11 -----------
 arch/x86/mm/init_32.c        |   30 ------------------------------
 include/asm-x86/pgtable_32.h |   12 ------------
 3 files changed, 53 deletions(-)

Index: linux/arch/x86/mm/init_32.c
===================================================================
--- linux.orig/arch/x86/mm/init_32.c
+++ linux/arch/x86/mm/init_32.c
@@ -508,36 +508,6 @@ static void __init set_nx(void)
 		}
 	}
 }
-
-/*
- * Enables/disables executability of a given kernel page and
- * returns the previous setting.
- */
-int __init set_kernel_exec(unsigned long vaddr, int enable)
-{
-	pte_t *pte;
-	int ret = 1;
-	int level;
-
-	if (!nx_enabled)
-		goto out;
-
-	pte = lookup_address(vaddr, &level);
-	BUG_ON(!pte);
-
-	if (!pte_exec_kernel(*pte))
-		ret = 0;
-
-	if (enable)
-		pte->pte_high &= ~(1 << (_PAGE_BIT_NX - 32));
-	else
-		pte->pte_high |= 1 << (_PAGE_BIT_NX - 32);
-	pte_update_defer(&init_mm, vaddr, pte);
-	__flush_tlb_all();
-out:
-	return ret;
-}
-
 #endif
 
 /*
Index: linux/include/asm-x86/pgtable_32.h
===================================================================
--- linux.orig/include/asm-x86/pgtable_32.h
+++ linux/include/asm-x86/pgtable_32.h
@@ -184,18 +184,6 @@ static inline void clone_pgd_range(pgd_t
  */
 extern pte_t *lookup_address(unsigned long address, int *level);
 
-/*
- * Make a given kernel text page executable/non-executable.
- * Returns the previous executability setting of that page (which
- * is used to restore the previous state). Used by the SMP bootup code.
- * NOTE: this is an __init function for security reasons.
- */
-#ifdef CONFIG_X86_PAE
- extern int set_kernel_exec(unsigned long vaddr, int enable);
-#else
- static inline int set_kernel_exec(unsigned long vaddr, int enable) { return 0;}
-#endif
-
 #if defined(CONFIG_HIGHPTE)
 #define pte_offset_map(dir, address) \
 	((pte_t *)kmap_atomic_pte(pmd_page(*(dir)),KM_PTE0) + pte_index(address))
Index: linux/arch/x86/kernel/smpboot_32.c
===================================================================
--- linux.orig/arch/x86/kernel/smpboot_32.c
+++ linux/arch/x86/kernel/smpboot_32.c
@@ -112,7 +112,6 @@ u8 apicid_2_node[MAX_APICID];
 extern const unsigned char trampoline_data [];
 extern const unsigned char trampoline_end  [];
 static unsigned char *trampoline_base;
-static int trampoline_exec;
 
 static void map_cpu_to_logical_apicid(void);
 
@@ -144,10 +143,6 @@ void __init smp_alloc_memory(void)
 	 */
 	if (__pa(trampoline_base) >= 0x9F000)
 		BUG();
-	/*
-	 * Make the SMP trampoline executable:
-	 */
-	trampoline_exec = set_kernel_exec((unsigned long)trampoline_base, 1);
 }
 
 /*
@@ -1295,12 +1290,6 @@ void __init native_smp_cpus_done(unsigne
 	setup_ioapic_dest();
 #endif
 	zap_low_mappings();
-#ifndef CONFIG_HOTPLUG_CPU
-	/*
-	 * Disable executability of the SMP trampoline:
-	 */
-	set_kernel_exec((unsigned long)trampoline_base, trampoline_exec);
-#endif
 }
 
 void __init smp_intr_init(void)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] [31/31] Clean up pte_exec
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (29 preceding siblings ...)
  2008-01-14 22:17 ` [PATCH] [30/31] Remove set_kernel_exec Andi Kleen
@ 2008-01-14 22:17 ` Andi Kleen
  2008-01-15  9:11 ` [PATCH] [0/31] Great change_page_attr patch series v2 Jan Beulich
  31 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-14 22:17 UTC (permalink / raw)
  To: linux-kernel, jbeulich, mingo, tglx


- Rename it from pte_exec_kernel() to pte_exec(). There is nothing
kernel-specific in there.
- Move it into the common file because _PAGE_NX is 0 on !PAE, so there
pte_exec() will always evaluate to true.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/fault_32.c           |    2 +-
 include/asm-x86/pgtable-2level.h |    8 --------
 include/asm-x86/pgtable-3level.h |    8 --------
 include/asm-x86/pgtable.h        |    1 +
 4 files changed, 2 insertions(+), 17 deletions(-)

Index: linux/include/asm-x86/pgtable-2level.h
===================================================================
--- linux.orig/include/asm-x86/pgtable-2level.h
+++ linux/include/asm-x86/pgtable-2level.h
@@ -56,14 +56,6 @@ static inline pte_t native_ptep_get_and_
 #define pte_pfn(x)		(pte_val(x) >> PAGE_SHIFT)
 
 /*
- * All present pages are kernel-executable:
- */
-static inline int pte_exec_kernel(pte_t pte)
-{
-	return 1;
-}
-
-/*
  * Bits 0, 6 and 7 are taken, split up the 29 bits of offset
  * into this range:
  */
Index: linux/include/asm-x86/pgtable.h
===================================================================
--- linux.orig/include/asm-x86/pgtable.h
+++ linux/include/asm-x86/pgtable.h
@@ -143,6 +143,7 @@ static inline int pte_write(pte_t pte)		
 static inline int pte_file(pte_t pte)		{ return pte_val(pte) & _PAGE_FILE; }
 static inline int pte_huge(pte_t pte)		{ return pte_val(pte) & _PAGE_PSE; }
 static inline int pte_global(pte_t pte) 	{ return pte_val(pte) & _PAGE_GLOBAL; }
+static inline int pte_exec(pte_t pte)		{ return !(pte_val(pte) & _PAGE_NX); }
 
 static inline int pmd_large(pmd_t pte) {
 	return (pmd_val(pte) & (_PAGE_PSE|_PAGE_PRESENT)) ==
Index: linux/include/asm-x86/pgtable-3level.h
===================================================================
--- linux.orig/include/asm-x86/pgtable-3level.h
+++ linux/include/asm-x86/pgtable-3level.h
@@ -19,14 +19,6 @@
 #define pud_bad(pud)				0
 #define pud_present(pud)			1
 
-/*
- * All present pages with !NX bit are kernel-executable:
- */
-static inline int pte_exec_kernel(pte_t pte)
-{
-	return !(pte_val(pte) & _PAGE_NX);
-}
-
 /* Rules for using set_pte: the pte being assigned *must* be
  * either not present or in a state where the hardware will
  * not attempt to update the pte.  In places where this is
Index: linux/arch/x86/mm/fault_32.c
===================================================================
--- linux.orig/arch/x86/mm/fault_32.c
+++ linux/arch/x86/mm/fault_32.c
@@ -574,7 +574,7 @@ no_context:
 			int level;
 			pte_t *pte = lookup_address(address, &level);
 
-			if (pte && pte_present(*pte) && !pte_exec_kernel(*pte))
+			if (pte && pte_present(*pte) && !pte_exec(*pte))
 				printk(KERN_CRIT "kernel tried to execute "
 					"NX-protected page - exploit attempt? "
 					"(uid: %d)\n", current->uid);

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [27/31] CPA: Change comments of external interfaces to kerneldoc format
  2008-01-14 22:17 ` [PATCH] [27/31] CPA: Change comments of external interfaces to kerneldoc format Andi Kleen
@ 2008-01-14 22:50   ` Randy Dunlap
  2008-01-15  0:49     ` Andi Kleen
  0 siblings, 1 reply; 56+ messages in thread
From: Randy Dunlap @ 2008-01-14 22:50 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, jbeulich, mingo, tglx

On Mon, 14 Jan 2008 23:17:00 +0100 (CET) Andi Kleen wrote:

> 
> And clarify description a bit.
> 
> Only for 64bit, but the interfaces are identical for 32bit and kerneldoc should
> merge them (?) 

I doubt that kernel-doc will merge them.

Normally we just import (like with !I, !E, !F, etc.) one file that
contains the kernel-doc API notations.
Would that work for this case?


> Signed-off-by: Andi Kleen <ak@suse.de>
> 
> ---
>  Documentation/DocBook/kernel-api.tmpl |    8 +++++
>  arch/x86/mm/pageattr_64.c             |   46 +++++++++++++++++++++++++---------
>  2 files changed, 42 insertions(+), 12 deletions(-)
> 
> Index: linux/arch/x86/mm/pageattr_64.c
> ===================================================================
> --- linux.orig/arch/x86/mm/pageattr_64.c
> +++ linux/arch/x86/mm/pageattr_64.c
> @@ -269,19 +269,19 @@ __change_page_attr(unsigned long address
>  	return 0;
>  } 
>  
> -/*
> - * Change the page attributes of an page in the linear mapping.
> - *
> - * This should be used when a page is mapped with a different caching policy
> - * than write-back somewhere - some CPUs do not like it when mappings with
> - * different caching policies exist. This changes the page attributes of the
> - * in kernel linear mapping too.
> +/**
> + * change_page_attr_addr - Change page table attributes in linear mapping
> + * @address: Virtual address in linear mapping.
> + * @numpages: Number of pages to change
> + * @prot:    New page table attribute (PAGE_*)
>   * 
> - * The caller needs to ensure that there are no conflicting mappings elsewhere.
> - * This function only deals with the kernel linear map.
> - * 
> - * Caller must call global_flush_tlb() after this.
> + * Change page attributes of a page in the direct mapping. This is a variant
> + * of change_page_attr() that also works on memory holes that do not have
> + * mem_map entry (pfn_valid() is false).
> + *
> + * See change_page_attr() documentation for more details.
>   */
> +
>  int change_page_attr_addr(unsigned long address, int numpages, pgprot_t prot)
>  {
>  	int err = 0, kernel_map = 0;
> @@ -318,13 +318,35 @@ int change_page_attr_addr(unsigned long 
>  	return err;
>  }
>  
> -/* Don't call this for MMIO areas that may not have a mem_map entry */
> +/**
> + * change_page_attr - Change page table attributes in the linear mapping.
> + * @page: First page to change
> + * @numpages: Number of pages to change
> + * @prot: New protection/caching type (PAGE_*)
> + *
> + * Returns 0 on success, otherwise a negated errno.
> + *
> + * This should be used when a page is mapped with a different caching policy
> + * than write-back somewhere - some CPUs do not like it when mappings with
> + * different caching policies exist. This changes the page attributes of the
> + * in kernel linear mapping too.
> + *
> + * Caller must call global_flush_tlb() later to make the changes active.
> + *
> + * The caller needs to ensure that there are no conflicting mappings elsewhere
> + * (e.g. in user space). This function only deals with the kernel linear map.
> + *
> + * For MMIO areas without mem_map use change_page_attr_addr() instead.
> + */
>  int change_page_attr(struct page *page, int numpages, pgprot_t prot)
>  {
>  	unsigned long addr = (unsigned long)page_address(page);
>  	return change_page_attr_addr(addr, numpages, prot);
>  }
>  
> +/**
> + * global_flush_tlb - Flush linear mappings changed by change_page_attr()
> + */
>  void global_flush_tlb(void)
>  { 
>  	struct page *pg, *next;
> Index: linux/Documentation/DocBook/kernel-api.tmpl
> ===================================================================
> --- linux.orig/Documentation/DocBook/kernel-api.tmpl
> +++ linux/Documentation/DocBook/kernel-api.tmpl
> @@ -726,4 +726,12 @@ X!Idrivers/video/console/fonts.c
>  !Ffs/pipe.c
>    </chapter>
>  
> +  <chapter id="pageattr">
> +	<title>Kernel direct mapping paging attributes</title>
> +  <para>
> +	Changing the paging attributes of kernel direct mapping pages
> +	on x86.
> +  </para>
> +!Farch/x86/mm/pageattr_64.c
> +  </chapter>
>  </book>
> --

---
~Randy

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [27/31] CPA: Change comments of external interfaces to kerneldoc format
  2008-01-14 22:50   ` Randy Dunlap
@ 2008-01-15  0:49     ` Andi Kleen
  0 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-15  0:49 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: linux-kernel, jbeulich, mingo, tglx

On Monday 14 January 2008 23:50:54 Randy Dunlap wrote:
> On Mon, 14 Jan 2008 23:17:00 +0100 (CET) Andi Kleen wrote:
> 
> > 
> > And clarify description a bit.
> > 
> > Only for 64bit, but the interfaces are identical for 32bit and kerneldoc should
> > merge them (?) 
> 
> I doubt that kernel-doc will merge them.
> 
> Normally we just import (like with !I, !E, !F, etc.) one file that
> contains the kernel-doc API notations.
> Would that work for this case?

Yes, it should. The interface is identical except that 64bit has
one more entry point (but Venki was planning to add that one to 32bit too).

-Andi

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [12/31] CPA: CLFLUSH support in change_page_attr()
  2008-01-14 22:16 ` [PATCH] [12/31] CPA: CLFLUSH support in change_page_attr() Andi Kleen
@ 2008-01-15  8:40   ` Jan Beulich
  2008-01-15  9:57     ` Andi Kleen
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2008-01-15  8:40 UTC (permalink / raw)
  To: ak; +Cc: mingo, tglx, linux-kernel

>-	/* clflush is still broken. Disable for now. */
>-	if (1 || !cpu_has_clflush)
>+	if (a->full_flush)
> 		asm volatile("wbinvd" ::: "memory");
>-	else list_for_each_entry(pg, l, lru) {
>-		void *adr = page_address(pg);
>-		clflush_cache_range(adr, PAGE_SIZE);
>+	list_for_each_entry(f, &a->l, l) {
>+		if (!a->full_flush)

This if() looks redundant (could also be avoided in the 32-bit variant, but
isn't redundant there at present). Also, is there no
wbinvd() on 64bit?

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [19/31] CPA: Limit cache flushing to pages that really change caching
  2008-01-14 22:16 ` [PATCH] [19/31] CPA: Limit cache flushing to pages that really change caching Andi Kleen
@ 2008-01-15  8:46   ` Jan Beulich
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Beulich @ 2008-01-15  8:46 UTC (permalink / raw)
  To: ak; +Cc: mingo, tglx, linux-kernel

>+static unsigned short pat_bit[5] = {

Could be const and perhaps even local to cache_attr_changed()...

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [2/31] CPA: Do a simple self test at boot
  2008-01-14 22:16 ` [PATCH] [2/31] CPA: Do a simple self test at boot Andi Kleen
@ 2008-01-15  8:47   ` Harvey Harrison
  2008-01-15  9:59     ` Andi Kleen
  0 siblings, 1 reply; 56+ messages in thread
From: Harvey Harrison @ 2008-01-15  8:47 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, jbeulich, mingo, tglx

On Mon, 2008-01-14 at 23:16 +0100, Andi Kleen wrote:
> When CONFIG_DEBUG_RODATA is enabled undo the ro mapping and redo it again.
> This gives some simple testing for change_page_attr()
> 
> Optional patch, but I find it useful.
> 
> Signed-off-by: Andi Kleen <ak@suse.de>
> 
> ---
>  arch/x86/Kconfig.debug |    5 +++++
>  arch/x86/mm/init_32.c  |   26 ++++++++++++++++++++++++++
>  arch/x86/mm/init_64.c  |   10 ++++++++++
>  3 files changed, 41 insertions(+)
> 
> Index: linux/arch/x86/mm/init_64.c
> ===================================================================
> --- linux.orig/arch/x86/mm/init_64.c
> +++ linux/arch/x86/mm/init_64.c
> @@ -603,6 +603,16 @@ void mark_rodata_ro(void)
>  	 * of who is the culprit.
>  	 */
>  	global_flush_tlb();

global_flush_tlb outside CONFIG_CPA_DEBUG here.

> +
> +#ifdef CONFIG_CPA_DEBUG
> +	printk("Testing CPA: undo %lx-%lx\n", start, end);
> +	change_page_attr_addr(start, (end - start) >> PAGE_SHIFT, PAGE_KERNEL);
> +	global_flush_tlb();
> +
> +	printk("Testing CPA: again\n");
> +	change_page_attr_addr(start, (end - start) >> PAGE_SHIFT, PAGE_KERNEL_RO);
> +	global_flush_tlb();
> +#endif
>  }
>  #endif
>  
> Index: linux/arch/x86/mm/init_32.c
> ===================================================================
> --- linux.orig/arch/x86/mm/init_32.c
> +++ linux/arch/x86/mm/init_32.c
> @@ -793,6 +793,20 @@ void mark_rodata_ro(void)
>  		change_page_attr(virt_to_page(start),
>  		                 size >> PAGE_SHIFT, PAGE_KERNEL_RX);
>  		printk("Write protecting the kernel text: %luk\n", size >> 10);
> +
> +#ifdef CONFIG_CPA_DEBUG
> +		global_flush_tlb();
> +

global_flush_tlb inside CONFIG_CPA_DEBUG here.

> +		printk("Testing CPA: Reverting %lx-%lx\n", start, start+size);
> +		change_page_attr(virt_to_page(start), size>>PAGE_SHIFT,
> +				 PAGE_KERNEL_EXEC);
> +		global_flush_tlb();
> +
> +		printk("Testing CPA: write protecting again\n");
> +		change_page_attr(virt_to_page(start), size>>PAGE_SHIFT,
> +				PAGE_KERNEL_RX);
> +		global_flush_tlb();
> +#endif
>  	}
>  #endif
>  	start += size;
> @@ -809,6 +823,18 @@ void mark_rodata_ro(void)
>  	 * of who is the culprit.
>  	 */
>  	global_flush_tlb();

global_flush_tlb outside CONFIG_CPA_DEBUG here.  Intentional?

> +
> +#ifdef CONFIG_CPA_DEBUG
> +	printk("Testing CPA: undo %lx-%lx\n", start, start + size);
> +	change_page_attr(virt_to_page(start), size >> PAGE_SHIFT,
> +				PAGE_KERNEL);
> +	global_flush_tlb();
> +
> +	printk("Testing CPA: write protecting again\n");
> +	change_page_attr(virt_to_page(start), size >> PAGE_SHIFT,
> +				PAGE_KERNEL_RO);
> +	global_flush_tlb();
> +#endif

Cheers,

Harvey


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [7/31] Extract page table dumping code from i386 fault handler into dump_pagetable()
  2008-01-14 22:16 ` [PATCH] [7/31] Extract page table dumping code from i386 fault handler into dump_pagetable() Andi Kleen
@ 2008-01-15  8:56   ` Harvey Harrison
  2008-01-15 10:00     ` Andi Kleen
  0 siblings, 1 reply; 56+ messages in thread
From: Harvey Harrison @ 2008-01-15  8:56 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, jbeulich, mingo, tglx

On Mon, 2008-01-14 at 23:16 +0100, Andi Kleen wrote:
> Similar to x86-64. This is useful in other situations where we want
> the page table dumped too.
> 
> Besides anything that makes i386 do_page_fault shorter is good.
> 
> Signed-off-by: Andi Kleen <ak@suse.de>
> 
> ---
>  arch/x86/mm/fault_32.c |   72 ++++++++++++++++++++++++++-----------------------
>  1 file changed, 39 insertions(+), 33 deletions(-)
> 
> Index: linux/arch/x86/mm/fault_32.c
> ===================================================================
> --- linux.orig/arch/x86/mm/fault_32.c
> +++ linux/arch/x86/mm/fault_32.c
> @@ -28,6 +28,44 @@
>  #include <asm/desc.h>
>  #include <asm/segment.h>
>  
> +void dump_pagetable(unsigned long address)

static?

> +{
> +	typeof(pte_val(__pte(0))) page;

Is there any type that could be picked that would be nicer than
sprinkling ((__typeof__(page) *), typeof(pte_val(__pte(0))) etc
through here? I know it's just moving the code out to another
function, just wondering if you had any better ideas that someone
could follow up on.

Maybe another helper, printk_page(), could help here.

> +
> +	page = read_cr3();
> +	page = ((__typeof__(page) *) __va(page))[address >> PGDIR_SHIFT];
> +#ifdef CONFIG_X86_PAE
> +	printk("*pdpt = %016Lx ", page);
> +	if ((page >> PAGE_SHIFT) < max_low_pfn
> +	    && page & _PAGE_PRESENT) {
> +		page &= PAGE_MASK;
> +		page = ((__typeof__(page) *) __va(page))[(address >> PMD_SHIFT)
> +							 & (PTRS_PER_PMD - 1)];
> +		printk(KERN_CONT "*pde = %016Lx ", page);
> +		page &= ~_PAGE_NX;
> +	}
> +#else
> +	printk("*pde = %08lx ", page);
> +#endif
> +
> +	/*
> +	 * We must not directly access the pte in the highpte
> +	 * case if the page table is located in highmem.
> +	 * And let's rather not kmap-atomic the pte, just in case
> +	 * it's allocated already.
> +	 */
> +	if ((page >> PAGE_SHIFT) < max_low_pfn
> +	    && (page & _PAGE_PRESENT)
> +	    && !(page & _PAGE_PSE)) {
> +		page &= PAGE_MASK;
> +		page = ((__typeof__(page) *) __va(page))[(address >> PAGE_SHIFT)
> +							 & (PTRS_PER_PTE - 1)];
> +		printk("*pte = %0*Lx ", sizeof(page)*2, (u64)page);
> +	}
> +
> +	printk("\n");
> +}
> +

Cheers,

Harvey


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [26/31] CPA: Fix reference counting when changing already changed pages
  2008-01-14 22:16 ` [PATCH] [26/31] CPA: Fix reference counting when changing already changed pages Andi Kleen
@ 2008-01-15  9:05   ` Jan Beulich
  2008-01-15 10:04     ` Andi Kleen
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2008-01-15  9:05 UTC (permalink / raw)
  To: ak; +Cc: mingo, tglx, linux-kernel

>+	ref_prot = canon_pgprot(ref_prot);
>+	prot = canon_pgprot(prot);
>+
> 	if (pgprot_val(prot) != pgprot_val(ref_prot)) { 
>...
> 	} else if (level == 4) {
>...
> 	} else {
> 		/*
> 		 * When you're here you either set the same page to PAGE_KERNEL

Doesn't this change require modifying the BUG() here into a BUG_ON() so
that it doesn't trigger if pgprot_val(prot) == pgprot_val(ref_prot) and
level != 4?
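
A sketch of the BUG_ON form being suggested (an assumption about the intent,
not a hunk from this series):

	/* tolerate re-setting an identical protection instead of hitting BUG() */
	BUG_ON(pgprot_val(prot) != pgprot_val(ref_prot));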

>+#define canon_pgprot(p) __pgprot(pgprot_val(p) & __supported_pte_mask)

While I remember you stated the inverse, I continue to think that it'd be
safer to mask out the accessed and dirty flags for the comparisons this
macro is being used for.

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [0/31] Great change_page_attr patch series v2
  2008-01-14 22:16 [PATCH] [0/31] Great change_page_attr patch series v2 Andi Kleen
                   ` (30 preceding siblings ...)
  2008-01-14 22:17 ` [PATCH] [31/31] Clean up pte_exec Andi Kleen
@ 2008-01-15  9:11 ` Jan Beulich
  2008-01-15 10:06   ` Andi Kleen
  31 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2008-01-15  9:11 UTC (permalink / raw)
  To: ak; +Cc: mingo, tglx, linux-kernel

>>> Andi Kleen <ak@suse.de> 14.01.08 23:16 >>>
>
>Lots of improvements to change_page_attr(). Make it a lot more
>efficient and fix various bugs.
>
>Changes against earlier version
>
>- Fixed some checkpatch.pl complaints
>- Improved self test suite
>- Fixed more reference bugs 
>- Fix NX handling on 32bit
>- Remove some useless code there
>- Few other random fixes

The one concept that I'm missing (but that I can easily produce a follow-up
patch for, as I had this in my c_p_a() changes) is the tracking and adjusting
of the reference protection for a large page range that got fully converted
to another type (namely relevant for .rodata if it exceeds 2/4 Mb), allowing
to use a large page mapping in this case even for non-default mappings.

Apart from the other comments, the whole series

Acked-by: Jan Beulich <jbeulich@novell.com>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [13/31] CPA: Use macros to modify the PG_arch_1 page flags in change_page_attr
  2008-01-14 22:16 ` [PATCH] [13/31] CPA: Use macros to modify the PG_arch_1 page flags in change_page_attr Andi Kleen
@ 2008-01-15  9:29   ` Harvey Harrison
  2008-01-15 10:06     ` Andi Kleen
  0 siblings, 1 reply; 56+ messages in thread
From: Harvey Harrison @ 2008-01-15  9:29 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, jbeulich, mingo, tglx

On Mon, 2008-01-14 at 23:16 +0100, Andi Kleen wrote:
> Instead of open coding the bit accesses, use standard style
> *PageDeferred* macros. 
> 

Any reason these couldn't be static inlines in a shared header?

asm/page.h perhaps?

Cheers,

Harvey



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [12/31] CPA: CLFLUSH support in change_page_attr()
  2008-01-15  8:40   ` Jan Beulich
@ 2008-01-15  9:57     ` Andi Kleen
  0 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-15  9:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: mingo, tglx, linux-kernel

On Tuesday 15 January 2008 09:40:27 Jan Beulich wrote:
> >-	/* clflush is still broken. Disable for now. */
> >-	if (1 || !cpu_has_clflush)
> >+	if (a->full_flush)
> > 		asm volatile("wbinvd" ::: "memory");
> >-	else list_for_each_entry(pg, l, lru) {
> >-		void *adr = page_address(pg);
> >-		clflush_cache_range(adr, PAGE_SIZE);
> >+	list_for_each_entry(f, &a->l, l) {
> >+		if (!a->full_flush)
> 
> This if() looks redundant (could also be avoided in the 32-bit variant, but
> isn't redundant there at present). Also, is there no
> wbinvd() on 64bit?

That's all done in a later patch.

The transformation steps are not always ideal, but in the end the code
is ok I think.

-Andi

The final result of the series is for 32bit & flush_kernel_map:

 static void flush_kernel_map(void *arg)
 {
-       struct list_head *lh = (struct list_head *)arg;
-       struct page *p;
+       struct flush_arg *a = (struct flush_arg *)arg;
+       struct flush *f;
+       int cache_flush = a->full_flush == FLUSH_CACHE;
+
+       list_for_each_entry(f, &a->l, l) {
+               if (!a->full_flush)
+                       __flush_tlb_one(f->addr);
+               if (f->mode == FLUSH_CACHE && !cpu_has_ss) {
+                       if (cpu_has_clflush)
+                               clflush_cache_range((void *)f->addr, PAGE_SIZE);
+                       else
+                               cache_flush++;
+               }
+       }
 
-       /* High level code is not ready for clflush yet */
-       if (0 && cpu_has_clflush) {
-               list_for_each_entry (p, lh, lru)
-                       cache_flush_page(p);
-       } else if (boot_cpu_data.x86_model >= 4)
-               wbinvd();
+       if (a->full_flush)
+               __flush_tlb_all();
 
-       /* Flush all to work around Errata in early athlons regarding 
-        * large page flushing. 
+       /*
+        * RED-PEN: Intel documentation asks for a CPU synchronization step
+        * here and in the loop. But it is moot on Self-Snoop CPUs anyways.
         */
-       __flush_tlb_all();      
+
+       if (cache_flush > 0 && !cpu_has_ss && boot_cpu_data.x86_model >= 4)
+               wbinvd();
 }


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [2/31] CPA: Do a simple self test at boot
  2008-01-15  8:47   ` Harvey Harrison
@ 2008-01-15  9:59     ` Andi Kleen
  2008-01-15 10:07       ` Harvey Harrison
  0 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2008-01-15  9:59 UTC (permalink / raw)
  To: Harvey Harrison; +Cc: linux-kernel, jbeulich, mingo, tglx


> global_flush_tlb outside CONFIG_CPA_DEBUG here.  Intentional?

Yes, that's all intentional. Without CPA_DEBUG there is only a single flush;
with CPA_DEBUG we flush more often simply to catch bugs.

-Andi

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [7/31] Extract page table dumping code from i386 fault handler into dump_pagetable()
  2008-01-15  8:56   ` Harvey Harrison
@ 2008-01-15 10:00     ` Andi Kleen
  2008-01-15 10:05       ` Harvey Harrison
  0 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2008-01-15 10:00 UTC (permalink / raw)
  To: Harvey Harrison; +Cc: linux-kernel, jbeulich, mingo, tglx

 
> > +void dump_pagetable(unsigned long address)
> 
> static?

This is used by other files too with future patches. Also,
in general it's useful for debugging stuff - I often put calls
to it into debugging patches.

> > +{
> > +	typeof(pte_val(__pte(0))) page;
> 
> Is there any type that could be picked that would be nicer than
> sprinkling ((__typeof__(page) *), typeof(pte_val(__pte(0))) etc
> through here? I know it's just moving the code out to another
> function, just wondering if you had any better ideas that someone
> could follow up on.

It could be pteval_t now which Jeremy recently introduced. But for pure
code movement I don't want to do any improvements.
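
For illustration only, roughly what the declaration would look like with
pteval_t (a sketch assuming Jeremy's type; not something done in this series):

	pteval_t page;

	page = read_cr3();
	page = ((pteval_t *)__va(page))[address >> PGDIR_SHIFT];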

-Andi

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [26/31] CPA: Fix reference counting when changing already changed pages
  2008-01-15  9:05   ` Jan Beulich
@ 2008-01-15 10:04     ` Andi Kleen
  2008-01-15 12:00       ` Jan Beulich
  0 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2008-01-15 10:04 UTC (permalink / raw)
  To: Jan Beulich; +Cc: mingo, tglx, linux-kernel

On Tuesday 15 January 2008 10:05:44 Jan Beulich wrote:
> >+	ref_prot = canon_pgprot(ref_prot);
> >+	prot = canon_pgprot(prot);
> >+
> > 	if (pgprot_val(prot) != pgprot_val(ref_prot)) { 
> >...
> > 	} else if (level == 4) {
> >...
> > 	} else {
> > 		/*
> > 		 * When you're here you either set the same page to PAGE_KERNEL
> 
> Doesn't this change require modifying the BUG() here into a BUG_ON() so
> that it doesn't trigger if pgprot_val(prot) == pgprot_val(ref_prot) and
> level != 4?

I addressed this in the comment

+               /*
+                * When you're here you either set the same page to PAGE_KERNEL
+                * two times in a row or the page table reference counting is
+                * broken again. To catch the latter bug for now (sorry)
+                */

Do you think it's important to handle?  The function already has too many
special cases and setting something several times in a row to PAGE_KERNEL
is usually a bug in the caller anyways (or a cpa bug)

> 
> >+#define canon_pgprot(p) __pgprot(pgprot_val(p) & __supported_pte_mask)
> 
> While I remember you stated the inverse, 

I changed my mind regarding not doing this. It was true when I said it originally.

> I continue to think that it'd be 
> safer to mask out the accessed and dirty flags for the comparisons this
> macro is being used for.

A and D should always be set for _PAGE_KERNEL and for kernel mappings in general.
The problem would only occur if someone made up custom page flags without them.
I didn't think that usage was worth supporting. Perhaps it would be a good idea
to add a WARN_ON for this case, though.
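
Something along these lines, perhaps (a hypothetical sketch, not part of the
series):

	/* Warn if kernel page protections come in without Accessed/Dirty set,
	   since the canon_pgprot() comparisons assume they always are. */
	WARN_ON((pgprot_val(prot) & (_PAGE_ACCESSED | _PAGE_DIRTY)) !=
		(_PAGE_ACCESSED | _PAGE_DIRTY));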

Thanks for the review.

-Andi

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [7/31] Extract page table dumping code from i386 fault handler into dump_pagetable()
  2008-01-15 10:00     ` Andi Kleen
@ 2008-01-15 10:05       ` Harvey Harrison
  0 siblings, 0 replies; 56+ messages in thread
From: Harvey Harrison @ 2008-01-15 10:05 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, jbeulich, mingo, tglx

On Tue, 2008-01-15 at 11:00 +0100, Andi Kleen wrote:
> > > +void dump_pagetable(unsigned long address)
> > 
> > static?
> 
> This is used by other files too with future patches. Also
> in general it's useful for debugging stuff - i often put calls 
> to it into debugging patches.
> 
> > > +{
> > > +	typeof(pte_val(__pte(0))) page;
> > 
> > Is there any type that could be picked that would be nicer than
> > sprinkling ((__typeof__(page) *), typeof(pte_val(__pte(0))) etc
> > through here? I know it's just moving the code out to another
> > function, just wondering if you had any better ideas that someone
> > could follow up on.
> 
> It could be pteval_t now which Jeremy recently introduced. But for pure
> code movement I don't want to do any improvements.

Sure, I had a very similar patch but was looking for a better way to
address this. I'll keep an eye on this and Jeremy's patch and follow
up when his stuff has settled a bit more.

Harvey


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [0/31] Great change_page_attr patch series v2
  2008-01-15  9:11 ` [PATCH] [0/31] Great change_page_attr patch series v2 Jan Beulich
@ 2008-01-15 10:06   ` Andi Kleen
  2008-01-15 11:55     ` Jan Beulich
  0 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2008-01-15 10:06 UTC (permalink / raw)
  To: Jan Beulich; +Cc: mingo, tglx, linux-kernel

On Tuesday 15 January 2008 10:11:44 Jan Beulich wrote:
> >>> Andi Kleen <ak@suse.de> 14.01.08 23:16 >>>
> >
> >Lots of improvements to change_page_attr(). Make it a lot more
> >efficient and fix various bugs.
> >
> >Changes against earlier version
> >
> >- Fixed some checkpatch.pl complaints
> >- Improved self test suite
> >- Fixed more reference bugs 
> >- Fix NX handling on 32bit
> >- Remove some useless code there
> >- Few other random fixes
> 
> The one concept that I'm missing (but that I can easily produce a follow-up
> patch for, as I had this in my c_p_a() changes) is the tracking and adjusting
> of the reference protection for a large page range that got fully converted
> to another type (namely relevant for .rodata if it exceeds 2/4 Mb), allowing
> to use a large page mapping in this case even for non-default mappings.

Ah -- I got rid of that by changing the rodata code to not do this
except for the debugging case:

>>
CPA: Only unmap kernel init pages in text mapping when CONFIG_DEBUG_RODATA is set

Otherwise the kernel will likely always run with 4K pages instead of 2MB pages,
which is costly in terms of TLBs.
<<

But you're right, that would be a useful feature. But wouldn't it require
aligning rodata to 2MB in the vmlinux to be really effective?

> 
> Apart from the other comments, the whole series
> 
> Acked-by: Jan Beulich <jbeulich@novell.com>

Thanks.

-Andi




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [13/31] CPA: Use macros to modify the PG_arch_1 page flags in change_page_attr
  2008-01-15  9:29   ` Harvey Harrison
@ 2008-01-15 10:06     ` Andi Kleen
  2008-01-15 10:15       ` Harvey Harrison
  0 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2008-01-15 10:06 UTC (permalink / raw)
  To: Harvey Harrison; +Cc: linux-kernel, jbeulich, mingo, tglx

On Tuesday 15 January 2008 10:29:51 Harvey Harrison wrote:
> On Mon, 2008-01-14 at 23:16 +0100, Andi Kleen wrote:
> > Instead of open coding the bit accesses uses standard style
> > *PageDeferred* macros. 
> > 
> 
> Any reason these couldn't be static inlines in a shared header?

The whole usage of that bit is private to the file

-Andi


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [2/31] CPA: Do a simple self test at boot
  2008-01-15  9:59     ` Andi Kleen
@ 2008-01-15 10:07       ` Harvey Harrison
  0 siblings, 0 replies; 56+ messages in thread
From: Harvey Harrison @ 2008-01-15 10:07 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, jbeulich, mingo, tglx

On Tue, 2008-01-15 at 10:59 +0100, Andi Kleen wrote:
> > global_flush_tlb outside CONFIG_CPA_DEBUG here.  Intentional?
> 
> Yes that's all intentional. Without CPA_DEBUG there is only a single flush,
> with CPA_DEBUG we flush more often simply to catch bugs.
> 

OK, it just looked a bit fishy with the different grouping, just making
sure.

Harvey


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [13/31] CPA: Use macros to modify the PG_arch_1 page flags in change_page_attr
  2008-01-15 10:06     ` Andi Kleen
@ 2008-01-15 10:15       ` Harvey Harrison
  2008-01-15 10:25         ` Andi Kleen
  0 siblings, 1 reply; 56+ messages in thread
From: Harvey Harrison @ 2008-01-15 10:15 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, jbeulich, mingo, tglx

On Tue, 2008-01-15 at 11:06 +0100, Andi Kleen wrote:
> On Tuesday 15 January 2008 10:29:51 Harvey Harrison wrote:
> > On Mon, 2008-01-14 at 23:16 +0100, Andi Kleen wrote:
> > > Instead of open coding the bit accesses, use standard style
> > > *PageDeferred* macros. 
> > > 
> > 
> > Any reason these couldn't be static inlines in a shared header?
> 
> The whole usage of that bit is private to the file

True if pageattr_32|64.c get merged, not one I have looked at yet,
but wouldn't a static inline in the files then be better for the
typechecking?
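
For comparison, the static inline form would look roughly like this (a sketch
only, not a proposed patch):

	static inline int PageDeferred(struct page *page)
	{
		return test_bit(PG_arch_1, &page->flags);
	}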

Harvey


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [13/31] CPA: Use macros to modify the PG_arch_1 page flags in change_page_attr
  2008-01-15 10:15       ` Harvey Harrison
@ 2008-01-15 10:25         ` Andi Kleen
  0 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-15 10:25 UTC (permalink / raw)
  To: Harvey Harrison; +Cc: linux-kernel, jbeulich, mingo, tglx

On Tuesday 15 January 2008 11:15:02 Harvey Harrison wrote:
> On Tue, 2008-01-15 at 11:06 +0100, Andi Kleen wrote:
> > On Tuesday 15 January 2008 10:29:51 Harvey Harrison wrote:
> > > On Mon, 2008-01-14 at 23:16 +0100, Andi Kleen wrote:
> > > > Instead of open coding the bit accesses, use standard style
> > > > *PageDeferred* macros. 
> > > > 
> > > 
> > > Any reason these couldn't be static inlines in a shared header?
> > 
> > The whole usage of that bit is private to the file
> 
> True if pageattr_32|64.c get merged, not one I have looked at yet,
> but wouldn't a static inline in the files then be better for the
> typechecking?

If you pass in something other than a struct page * the compiler
will already complain. Also, all the other Page* accessors in page-flags.h are macros.
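
For reference, the page-flags.h style being referred to looks roughly like
this (a sketch of the convention, not the exact hunk from patch 13):

	#define PageDeferred(page)	test_bit(PG_arch_1, &(page)->flags)
	#define SetPageDeferred(page)	set_bit(PG_arch_1, &(page)->flags)
	#define ClearPageDeferred(page)	clear_bit(PG_arch_1, &(page)->flags)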

-Andi

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [9/31] CPA: Add simple self test at boot
  2008-01-14 22:16 ` [PATCH] [9/31] CPA: Add simple self test at boot Andi Kleen
@ 2008-01-15 10:37   ` Harvey Harrison
  0 siblings, 0 replies; 56+ messages in thread
From: Harvey Harrison @ 2008-01-15 10:37 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, jbeulich, mingo, tglx

On Mon, 2008-01-14 at 23:16 +0100, Andi Kleen wrote:
> Since change_page_attr() is tricky code it is good to have some regression
> test code. This patch maps and unmaps some random pages in the direct mapping
> at boot and then dumps the state and does some simple sanity checks.
> 
> Add it with a CONFIG option.
> 
> Optional patch, but I find it useful.

Has it been decided if a new /test directory is going to be
created for this kind of smoketest?

Harvey


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [0/31] Great change_page_attr patch series v2
  2008-01-15 10:06   ` Andi Kleen
@ 2008-01-15 11:55     ` Jan Beulich
  2008-01-15 12:43       ` Andi Kleen
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2008-01-15 11:55 UTC (permalink / raw)
  To: Andi Kleen; +Cc: mingo, tglx, linux-kernel

>> The one concept that I'm missing (but that I can easily produce a follow-up
>> patch for, as I had this in my c_p_a() changes) is the tracking and adjusting
>> of the reference protection for a large page range that got fully converted
>> to another type (namely relevant for .rodata if it exceeds 2/4 Mb), allowing
>> to use a large page mapping in this case even for non-default mappings.
>
>Ah -- i got rid of that by changing the rodata code to not do this
>except for the debugging case
>
>>>
>CPA: Only unmap kernel init pages in text mapping when CONFIG_DEBUG_RODATA is set
>
>Otherwise the kernel will likely always run with 4K pages instead of 2MB pages,
>which is costly in terms of TLBs.
><<
>
>But you're right, that would be a useful feature. But wouldn't it require
>aligning rodata to 2MB in the vmlinux to be really effective?

Yes, that would be desirable then (and .data should be at a 2/4 Mb
boundary for this, too).

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [26/31] CPA: Fix reference counting when changing already changed pages
  2008-01-15 10:04     ` Andi Kleen
@ 2008-01-15 12:00       ` Jan Beulich
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Beulich @ 2008-01-15 12:00 UTC (permalink / raw)
  To: Andi Kleen; +Cc: mingo, tglx, linux-kernel

>>> Andi Kleen <ak@suse.de> 15.01.08 11:04 >>>
>On Tuesday 15 January 2008 10:05:44 Jan Beulich wrote:
>> >+	ref_prot = canon_pgprot(ref_prot);
>> >+	prot = canon_pgprot(prot);
>> >+
>> > 	if (pgprot_val(prot) != pgprot_val(ref_prot)) { 
>> >...
>> > 	} else if (level == 4) {
>> >...
>> > 	} else {
>> > 		/*
>> > 		 * When you're here you either set the same page to PAGE_KERNEL
>> 
>> Doesn't this change require modifying the BUG() here into a BUG_ON() so
>> that it doesn't trigger if pgprot_val(prot) == pgprot_val(ref_prot) and
>> level != 4?
>
>I addressed this in the comment
>
>+               /*
>+                * When you're here you either set the same page to PAGE_KERNEL
>+                * two times in a row or the page table reference counting is
>+                * broken again. To catch the latter bug for now (sorry)
>+                */
>
>Do you think it's important to handle?  The function already has too many
>special cases and setting something several times in a row to PAGE_KERNEL
>is usually a bug in the caller anyways (or a cpa bug)

It definitely is when making ref_prot variable (as discussed in an earlier
reply regarding a different patch), but I think it's even inconsistent given
the possible presence/absence of _PAGE_NX.

Jan


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [0/31] Great change_page_attr patch series v2
  2008-01-15 11:55     ` Jan Beulich
@ 2008-01-15 12:43       ` Andi Kleen
  0 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2008-01-15 12:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: mingo, tglx, linux-kernel


> >But you're right that would be an useful feature. But wouldn't it require
> >aligning rodata to 2MB in the vmlinux to be really effective?
>
> Yes, that would be desirable then (and .data should be at a 2/4 Mb
> boundary for this, too).

Yes, rather .rodata/.text together and .data/.bss separately 
aligned to 2MB.

But the problem is it would be quite wasteful -- even on a big 
defconfig kernel data+bss is far less than 2MB and with 4MB
pages it would be even worse.

I'm not sure the advantage of write protection would be worth it for
non-debugging kernels, and for debugging kernels just using 4K pages is fine.

There's also the issue of where to put the initcode/initdata.

-Andi

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] [4/31] Add pte_pgprot on i386
  2008-01-14 22:16 ` [PATCH] [4/31] Add pte_pgprot on i386 Andi Kleen
@ 2008-01-15 13:00   ` Johannes Weiner
  0 siblings, 0 replies; 56+ messages in thread
From: Johannes Weiner @ 2008-01-15 13:00 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, jbeulich, mingo, tglx

Hi,

Andi Kleen <ak@suse.de> writes:

[...]
> --- linux.orig/include/asm-x86/pgtable-2level.h
> +++ linux/include/asm-x86/pgtable-2level.h
> @@ -75,6 +75,8 @@ static inline int pte_exec_kernel(pte_t 
>  #define pgoff_to_pte(off) \
>  	((pte_t) { .pte_low = (((off) & 0x1f) << 1) + (((off) >> 5) << 8) + _PAGE_FILE })
>  
> +#define pte_pgprot(x) __pgprot((x).pte_low & 0xfff)

[...]

> --- linux.orig/include/asm-x86/pgtable-3level.h
> +++ linux/include/asm-x86/pgtable-3level.h
> @@ -142,6 +142,8 @@ static inline unsigned long pte_pfn(pte_
>  	return pte_val(pte) >> PAGE_SHIFT;
>  }
>  
> +#define pte_pgprot(x) __pgprot(pte_val(x) & (0xfff | _PAGE_NX))

Could we use pte_val() in both cases?
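
For illustration, a sketch of the unified form being asked about (an
assumption, not an actual follow-up patch):

	/* pte_val() works for both layouts; _PAGE_NX is 0 without PAE */
	#define pte_pgprot(x) __pgprot(pte_val(x) & (0xfff | _PAGE_NX))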

	Hannes

^ permalink raw reply	[flat|nested] 56+ messages in thread
