* [PATCH 0/29] Page Table Interface Explanation
@ 2007-01-13  2:45 Paul Davies
  2007-01-13  2:45 ` [PATCH 1/29] Abstract current page table implementation Paul Davies
                   ` (49 more replies)
  0 siblings, 50 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:45 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 00

        A CLEAN PAGE TABLE INTERFACE FOR LINUX by Gelato@UNSW

Linux currently uses the same page table format regardless of
architecture.  Access to the page table is open-coded in a variety of
places.  Architectures that walk a different page table format in
hardware set up a hardware-walkable cache in their native format, which
then has to be kept in step with Linux's page table.

The first step towards allowing different page table formats is to split
the page table implementation out of the code that uses it, into separate
files.  This patch series abstracts the page table implementation, and
cleans it up, so that:
   1.  All page table operations are in one place, making future
       maintenance easier
   2.  Generic code no longer knows what format the page table is in,
       opening the way to experimentation with different
       page table formats.

The interface is separated into two parts, the first architecture-independent
and the second architecture-dependent.  All architectures must go through
the architecture-independent part, regardless of whether they can, or will
ever want to, change page table formats.

This patch series provides:
   1. An architecture-independent page table interface, tried out on i386
      and IA64
   2. An architecture-dependent page table interface for IA64
   3. An alternative page table implementation, the guarded page table
      (GPT), for IA64

   The GPT is a compile-time alternative to the current page table on IA64.
   It is still rough at the moment, and is intended as an example of an
   alternative page table running under the PTI.

Benchmarking results for the full page table interface on IA64
 * Testing so far shows negligible regression across the board.
 For benchmarks, see the link below.

Benchmarking results for the arch independent PTI running on i386
 * Testing so far shows negligible regression across the board.
 For benchmarks, see the link below.

Benchmarking of the GPT shows poor performance; it is intended as a
demonstration of an alternative implementation under the PTI, and is still
"under construction".  For benchmarks, see the link below.

INSTRUCTIONS, BENCHMARKS and further information are at the site below:
 
http://www.gelato.unsw.edu.au/IA64wiki/PageTableInterface/PTI-LCA

                PAGE TABLE INTERFACE

int create_user_page_table(struct mm_struct *mm);

void destroy_user_page_table(struct mm_struct *mm);

pte_t *build_page_table(struct mm_struct *mm, unsigned long address,
		pt_path_t *pt_path);

pte_t *lookup_page_table(struct mm_struct *mm, unsigned long address,
		pt_path_t *pt_path);

void free_pt_range(struct mmu_gather **tlb, unsigned long addr,
		unsigned long end, unsigned long floor, unsigned long ceiling);

int copy_dual_iterator(struct mm_struct *dst_mm, struct mm_struct *src_mm,
		unsigned long addr, unsigned long end, struct vm_area_struct *vma);

unsigned long unmap_page_range_iterator(struct mmu_gather *tlb,
        struct vm_area_struct *vma, unsigned long addr, unsigned long end,
        long *zap_work, struct zap_details *details);

int zeromap_build_iterator(struct mm_struct *mm,
		unsigned long addr, unsigned long end, pgprot_t prot);

int remap_build_iterator(struct mm_struct *mm,
		unsigned long addr, unsigned long end, unsigned long pfn,
		pgprot_t prot);

void change_protection_read_iterator(struct vm_area_struct *vma,
		unsigned long addr, unsigned long end, pgprot_t newprot,
		int dirty_accountable);

void vunmap_read_iterator(unsigned long addr, unsigned long end);

int vmap_build_iterator(unsigned long addr,
		unsigned long end, pgprot_t prot, struct page ***pages);

int unuse_vma_read_iterator(struct vm_area_struct *vma,
		unsigned long addr, unsigned long end, swp_entry_t entry, struct page *page);

void smaps_read_iterator(struct vm_area_struct *vma,
		unsigned long addr, unsigned long end, struct mem_size_stats *mss);

int check_policy_read_iterator(struct vm_area_struct *vma,
		unsigned long addr, unsigned long end, const nodemask_t *nodes,
		unsigned long flags, void *private);

unsigned long move_page_tables(struct vm_area_struct *vma,
		unsigned long old_addr, struct vm_area_struct *new_vma,
		unsigned long new_addr, unsigned long len);
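
As an illustration of the calling convention, here is a minimal sketch
(not part of the patch series) of how a caller drives the build/lock half
of this interface; it follows the calling pattern the later patches
introduce in exec.c and memory.c, and install_one_pte() is a made-up name:

static int install_one_pte(struct mm_struct *mm, struct vm_area_struct *vma,
		unsigned long address, struct page *page)
{
	pt_path_t pt_path;
	pte_t *pte;

	/* Walk the page table, allocating any missing levels on the way. */
	pte = build_page_table(mm, address, &pt_path);
	if (!pte)
		return -ENOMEM;

	/* pt_path records where the pte lives, so the implementation can
	 * take the right lock without re-walking the tree. */
	lock_pte(mm, pt_path);
	if (pte_none(*pte))
		set_pte_at(mm, address, pte, mk_pte(page, vma->vm_page_prot));
	unlock_pte(mm, pt_path);
	pte_unmap(pte);
	return 0;
}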


Paul Davies
Gelato@UNSW

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 0 files changed

Please send questions regarding the PTI to pauld@cse.unsw.edu.au.

UNSW PhD student Adam Wiggins is the author of the GPT; questions regarding
his GPT model should be addressed to awiggins@cse.unsw.edu.au.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>


* [PATCH 1/29] Abstract current page table implementation
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
@ 2007-01-13  2:45 ` Paul Davies
  2007-01-13  2:45 ` [PATCH 2/29] " Paul Davies
                   ` (48 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:45 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 01
 * Creates mm/pt-default.c to hold the implementation of
 the default page table.
 * Adjusts mm/Makefile to compile the default page table.
 * Starts moving the default page table implementation from memory.c to
 pt-default.c.
   * moves across pgd/pud/pmd_clear_bad
   * moves across the pt alloc functions: __pte_alloc, __pte_alloc_kernel,
   __pud_alloc and __pmd_alloc

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 Makefile     |    2 
 memory.c     |  121 --------------------------------------------------
 pt-default.c |  141 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 142 insertions(+), 122 deletions(-)
Index: linux-2.6.20-rc1/mm/Makefile
===================================================================
--- linux-2.6.20-rc1.orig/mm/Makefile	2006-12-21 11:41:06.410940000 +1100
+++ linux-2.6.20-rc1/mm/Makefile	2006-12-21 13:33:22.215470000 +1100
@@ -5,7 +5,7 @@
 mmu-y			:= nommu.o
 mmu-$(CONFIG_MMU)	:= fremap.o highmem.o madvise.o memory.o mincore.o \
 			   mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
-			   vmalloc.o
+			   vmalloc.o pt-default.o
 
 obj-y			:= bootmem.o filemap.o mempool.o oom_kill.o fadvise.o \
 			   page_alloc.o page-writeback.o pdflush.o \
Index: linux-2.6.20-rc1/mm/memory.c
===================================================================
--- linux-2.6.20-rc1.orig/mm/memory.c	2006-12-21 11:41:06.410940000 +1100
+++ linux-2.6.20-rc1/mm/memory.c	2006-12-21 13:47:45.554231000 +1100
@@ -93,31 +93,6 @@
 }
 __setup("norandmaps", disable_randmaps);
 
-
-/*
- * If a p?d_bad entry is found while walking page tables, report
- * the error, before resetting entry to p?d_none.  Usually (but
- * very seldom) called out from the p?d_none_or_clear_bad macros.
- */
-
-void pgd_clear_bad(pgd_t *pgd)
-{
-	pgd_ERROR(*pgd);
-	pgd_clear(pgd);
-}
-
-void pud_clear_bad(pud_t *pud)
-{
-	pud_ERROR(*pud);
-	pud_clear(pud);
-}
-
-void pmd_clear_bad(pmd_t *pmd)
-{
-	pmd_ERROR(*pmd);
-	pmd_clear(pmd);
-}
-
 /*
  * Note: this doesn't free the actual pages themselves. That
  * has been handled earlier when unmapping all the memory regions.
@@ -300,41 +275,6 @@
 	}
 }
 
-int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long address)
-{
-	struct page *new = pte_alloc_one(mm, address);
-	if (!new)
-		return -ENOMEM;
-
-	pte_lock_init(new);
-	spin_lock(&mm->page_table_lock);
-	if (pmd_present(*pmd)) {	/* Another has populated it */
-		pte_lock_deinit(new);
-		pte_free(new);
-	} else {
-		mm->nr_ptes++;
-		inc_zone_page_state(new, NR_PAGETABLE);
-		pmd_populate(mm, pmd, new);
-	}
-	spin_unlock(&mm->page_table_lock);
-	return 0;
-}
-
-int __pte_alloc_kernel(pmd_t *pmd, unsigned long address)
-{
-	pte_t *new = pte_alloc_one_kernel(&init_mm, address);
-	if (!new)
-		return -ENOMEM;
-
-	spin_lock(&init_mm.page_table_lock);
-	if (pmd_present(*pmd))		/* Another has populated it */
-		pte_free_kernel(new);
-	else
-		pmd_populate_kernel(&init_mm, pmd, new);
-	spin_unlock(&init_mm.page_table_lock);
-	return 0;
-}
-
 static inline void add_mm_rss(struct mm_struct *mm, int file_rss, int anon_rss)
 {
 	if (file_rss)
@@ -2476,67 +2416,6 @@
 
 EXPORT_SYMBOL_GPL(__handle_mm_fault);
 
-#ifndef __PAGETABLE_PUD_FOLDED
-/*
- * Allocate page upper directory.
- * We've already handled the fast-path in-line.
- */
-int __pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
-{
-	pud_t *new = pud_alloc_one(mm, address);
-	if (!new)
-		return -ENOMEM;
-
-	spin_lock(&mm->page_table_lock);
-	if (pgd_present(*pgd))		/* Another has populated it */
-		pud_free(new);
-	else
-		pgd_populate(mm, pgd, new);
-	spin_unlock(&mm->page_table_lock);
-	return 0;
-}
-#else
-/* Workaround for gcc 2.96 */
-int __pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
-{
-	return 0;
-}
-#endif /* __PAGETABLE_PUD_FOLDED */
-
-#ifndef __PAGETABLE_PMD_FOLDED
-/*
- * Allocate page middle directory.
- * We've already handled the fast-path in-line.
- */
-int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
-{
-	pmd_t *new = pmd_alloc_one(mm, address);
-	if (!new)
-		return -ENOMEM;
-
-	spin_lock(&mm->page_table_lock);
-#ifndef __ARCH_HAS_4LEVEL_HACK
-	if (pud_present(*pud))		/* Another has populated it */
-		pmd_free(new);
-	else
-		pud_populate(mm, pud, new);
-#else
-	if (pgd_present(*pud))		/* Another has populated it */
-		pmd_free(new);
-	else
-		pgd_populate(mm, pud, new);
-#endif /* __ARCH_HAS_4LEVEL_HACK */
-	spin_unlock(&mm->page_table_lock);
-	return 0;
-}
-#else
-/* Workaround for gcc 2.96 */
-int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
-{
-	return 0;
-}
-#endif /* __PAGETABLE_PMD_FOLDED */
-
 int make_pages_present(unsigned long addr, unsigned long end)
 {
 	int ret, len, write;
Index: linux-2.6.20-rc1/mm/pt-default.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc1/mm/pt-default.c	2006-12-21 13:46:59.270231000 +1100
@@ -0,0 +1,141 @@
+#include <linux/kernel_stat.h>
+#include <linux/mm.h>
+#include <linux/hugetlb.h>
+#include <linux/mman.h>
+#include <linux/swap.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+#include <linux/rmap.h>
+#include <linux/module.h>
+#include <linux/delayacct.h>
+#include <linux/init.h>
+#include <linux/writeback.h>
+
+#include <asm/pgalloc.h>
+#include <asm/uaccess.h>
+#include <asm/tlb.h>
+#include <asm/tlbflush.h>
+#include <asm/pgtable.h>
+
+#include <linux/swapops.h>
+#include <linux/elf.h>
+
+/*
+ * If a p?d_bad entry is found while walking page tables, report
+ * the error, before resetting entry to p?d_none.  Usually (but
+ * very seldom) called out from the p?d_none_or_clear_bad macros.
+ */
+
+void pgd_clear_bad(pgd_t *pgd)
+{
+	pgd_ERROR(*pgd);
+	pgd_clear(pgd);
+}
+
+void pud_clear_bad(pud_t *pud)
+{
+	pud_ERROR(*pud);
+	pud_clear(pud);
+}
+
+void pmd_clear_bad(pmd_t *pmd)
+{
+	pmd_ERROR(*pmd);
+	pmd_clear(pmd);
+}
+
+int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long address)
+{
+	struct page *new = pte_alloc_one(mm, address);
+	if (!new)
+		return -ENOMEM;
+
+	pte_lock_init(new);
+	spin_lock(&mm->page_table_lock);
+	if (pmd_present(*pmd)) {	/* Another has populated it */
+		pte_lock_deinit(new);
+		pte_free(new);
+	} else {
+		mm->nr_ptes++;
+		inc_zone_page_state(new, NR_PAGETABLE);
+		pmd_populate(mm, pmd, new);
+	}
+	spin_unlock(&mm->page_table_lock);
+	return 0;
+}
+
+int __pte_alloc_kernel(pmd_t *pmd, unsigned long address)
+{
+	pte_t *new = pte_alloc_one_kernel(&init_mm, address);
+	if (!new)
+		return -ENOMEM;
+
+	spin_lock(&init_mm.page_table_lock);
+	if (pmd_present(*pmd))		/* Another has populated it */
+		pte_free_kernel(new);
+	else
+		pmd_populate_kernel(&init_mm, pmd, new);
+	spin_unlock(&init_mm.page_table_lock);
+	return 0;
+}
+
+#ifndef __PAGETABLE_PUD_FOLDED
+/*
+ * Allocate page upper directory.
+ * We've already handled the fast-path in-line.
+ */
+int __pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
+{
+	pud_t *new = pud_alloc_one(mm, address);
+	if (!new)
+		return -ENOMEM;
+
+	spin_lock(&mm->page_table_lock);
+	if (pgd_present(*pgd))		/* Another has populated it */
+		pud_free(new);
+	else
+		pgd_populate(mm, pgd, new);
+	spin_unlock(&mm->page_table_lock);
+	return 0;
+}
+#else
+/* Workaround for gcc 2.96 */
+int __pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
+{
+	return 0;
+}
+#endif /* __PAGETABLE_PUD_FOLDED */
+
+#ifndef __PAGETABLE_PMD_FOLDED
+/*
+ * Allocate page middle directory.
+ * We've already handled the fast-path in-line.
+ */
+int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
+{
+	pmd_t *new = pmd_alloc_one(mm, address);
+	if (!new)
+		return -ENOMEM;
+
+	spin_lock(&mm->page_table_lock);
+#ifndef __ARCH_HAS_4LEVEL_HACK
+	if (pud_present(*pud))		/* Another has populated it */
+		pmd_free(new);
+	else
+		pud_populate(mm, pud, new);
+#else
+	if (pgd_present(*pud))		/* Another has populated it */
+		pmd_free(new);
+	else
+		pgd_populate(mm, pud, new);
+#endif /* __ARCH_HAS_4LEVEL_HACK */
+	spin_unlock(&mm->page_table_lock);
+	return 0;
+}
+#else
+/* Workaround for gcc 2.96 */
+int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
+{
+	return 0;
+}
+#endif /* __PAGETABLE_PMD_FOLDED */


* [PATCH 2/29] Abstract current page table implementation
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
  2007-01-13  2:45 ` [PATCH 1/29] Abstract current page table implementation Paul Davies
@ 2007-01-13  2:45 ` Paul Davies
  2007-01-13  2:45 ` [PATCH 3/29] " Paul Davies
                   ` (47 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:45 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 02
 * Creates include/asm-generic/pgtable-default.h to contain the default
 page table implementation abstracted from include/asm-generic/pgtable.h.
 * Creates include/linux/pt-default-mm.h to contain the default page table
 implementation abstracted from include/linux/mm.h.
 * Starts moving the default page table implementation from mm.h to
 pt-default-mm.h
   * moves the function prototypes for __pud_alloc, __pmd_alloc etc.
   * moves the inline implementations of pud_alloc and pmd_alloc
 * NB: All arches are intended to have CONFIG_PT_DEFAULT defined by default
 (which is done in later patches for i386 and IA64 only).

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 asm-generic/pgtable-default.h |   79 ++++++++++++++++++++++++++++++++++++++++++
 asm-generic/pgtable.h         |   68 +-----------------------------------
 linux/mm.h                    |   25 -------------
 linux/pt-default-mm.h         |   27 ++++++++++++++
 4 files changed, 110 insertions(+), 89 deletions(-)
Index: linux-2.6.20-rc4/include/asm-generic/pgtable-default.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/include/asm-generic/pgtable-default.h	2007-01-11 13:09:08.147868000 +1100
@@ -0,0 +1,79 @@
+#ifndef _ASM_GENERIC_PGTABLE_DEFAULT_H
+#define _ASM_GENERIC_PGTABLE_DEFAULT_H
+
+#ifndef __ASSEMBLY__
+
+#ifndef __HAVE_ARCH_PGD_OFFSET_GATE
+#define pgd_offset_gate(mm, addr)	pgd_offset(mm, addr)
+#endif
+
+/*
+ * When walking page tables, get the address of the next boundary,
+ * or the end address of the range if that comes earlier.  Although no
+ * vma end wraps to 0, rounded up __boundary may wrap to 0 throughout.
+ */
+
+#define pgd_addr_end(addr, end)						\
+({	unsigned long __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;	\
+	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
+})
+
+#ifndef pud_addr_end
+#define pud_addr_end(addr, end)						\
+({	unsigned long __boundary = ((addr) + PUD_SIZE) & PUD_MASK;	\
+	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
+})
+#endif
+
+#ifndef pmd_addr_end
+#define pmd_addr_end(addr, end)						\
+({	unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK;	\
+	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
+})
+#endif
+
+/*
+ * When walking page tables, we usually want to skip any p?d_none entries;
+ * and any p?d_bad entries - reporting the error before resetting to none.
+ * Do the tests inline, but report and clear the bad entry in mm/memory.c.
+ */
+void pgd_clear_bad(pgd_t *);
+void pud_clear_bad(pud_t *);
+void pmd_clear_bad(pmd_t *);
+
+static inline int pgd_none_or_clear_bad(pgd_t *pgd)
+{
+	if (pgd_none(*pgd))
+		return 1;
+	if (unlikely(pgd_bad(*pgd))) {
+		pgd_clear_bad(pgd);
+		return 1;
+	}
+	return 0;
+}
+
+static inline int pud_none_or_clear_bad(pud_t *pud)
+{
+	if (pud_none(*pud))
+		return 1;
+	if (unlikely(pud_bad(*pud))) {
+		pud_clear_bad(pud);
+		return 1;
+	}
+	return 0;
+}
+
+static inline int pmd_none_or_clear_bad(pmd_t *pmd)
+{
+	if (pmd_none(*pmd))
+		return 1;
+	if (unlikely(pmd_bad(*pmd))) {
+		pmd_clear_bad(pmd);
+		return 1;
+	}
+	return 0;
+}
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* _ASM_GENERIC_PGTABLE_DEFAULT_H */
Index: linux-2.6.20-rc4/include/asm-generic/pgtable.h
===================================================================
--- linux-2.6.20-rc4.orig/include/asm-generic/pgtable.h	2007-01-11 13:09:04.215868000 +1100
+++ linux-2.6.20-rc4/include/asm-generic/pgtable.h	2007-01-11 13:09:08.147868000 +1100
@@ -182,72 +182,10 @@
 #define arch_leave_lazy_mmu_mode()	do {} while (0)
 #endif
 
-/*
- * When walking page tables, get the address of the next boundary,
- * or the end address of the range if that comes earlier.  Although no
- * vma end wraps to 0, rounded up __boundary may wrap to 0 throughout.
- */
-
-#define pgd_addr_end(addr, end)						\
-({	unsigned long __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;	\
-	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
-})
-
-#ifndef pud_addr_end
-#define pud_addr_end(addr, end)						\
-({	unsigned long __boundary = ((addr) + PUD_SIZE) & PUD_MASK;	\
-	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
-})
-#endif
+#endif /* !__ASSEMBLY__ */
 
-#ifndef pmd_addr_end
-#define pmd_addr_end(addr, end)						\
-({	unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK;	\
-	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
-})
+#ifdef CONFIG_PT_DEFAULT
+#include <asm-generic/pgtable-default.h>
 #endif
 
-/*
- * When walking page tables, we usually want to skip any p?d_none entries;
- * and any p?d_bad entries - reporting the error before resetting to none.
- * Do the tests inline, but report and clear the bad entry in mm/memory.c.
- */
-void pgd_clear_bad(pgd_t *);
-void pud_clear_bad(pud_t *);
-void pmd_clear_bad(pmd_t *);
-
-static inline int pgd_none_or_clear_bad(pgd_t *pgd)
-{
-	if (pgd_none(*pgd))
-		return 1;
-	if (unlikely(pgd_bad(*pgd))) {
-		pgd_clear_bad(pgd);
-		return 1;
-	}
-	return 0;
-}
-
-static inline int pud_none_or_clear_bad(pud_t *pud)
-{
-	if (pud_none(*pud))
-		return 1;
-	if (unlikely(pud_bad(*pud))) {
-		pud_clear_bad(pud);
-		return 1;
-	}
-	return 0;
-}
-
-static inline int pmd_none_or_clear_bad(pmd_t *pmd)
-{
-	if (pmd_none(*pmd))
-		return 1;
-	if (unlikely(pmd_bad(*pmd))) {
-		pmd_clear_bad(pmd);
-		return 1;
-	}
-	return 0;
-}
-#endif /* !__ASSEMBLY__ */
-
 #endif /* _ASM_GENERIC_PGTABLE_H */
Index: linux-2.6.20-rc4/include/linux/mm.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/mm.h	2007-01-11 13:09:04.215868000 +1100
+++ linux-2.6.20-rc4/include/linux/mm.h	2007-01-11 13:11:05.387868000 +1100
@@ -729,8 +729,6 @@
 		struct vm_area_struct *start_vma, unsigned long start_addr,
 		unsigned long end_addr, unsigned long *nr_accounted,
 		struct zap_details *);
-void free_pgd_range(struct mmu_gather **tlb, unsigned long addr,
-		unsigned long end, unsigned long floor, unsigned long ceiling);
 void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
 int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
@@ -853,28 +851,7 @@
 
 extern pte_t *FASTCALL(get_locked_pte(struct mm_struct *mm, unsigned long addr, spinlock_t **ptl));
 
-int __pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address);
-int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address);
-int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long address);
-int __pte_alloc_kernel(pmd_t *pmd, unsigned long address);
-
-/*
- * The following ifdef needed to get the 4level-fixup.h header to work.
- * Remove it when 4level-fixup.h has been removed.
- */
-#if defined(CONFIG_MMU) && !defined(__ARCH_HAS_4LEVEL_HACK)
-static inline pud_t *pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
-{
-	return (unlikely(pgd_none(*pgd)) && __pud_alloc(mm, pgd, address))?
-		NULL: pud_offset(pgd, address);
-}
-
-static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
-{
-	return (unlikely(pud_none(*pud)) && __pmd_alloc(mm, pud, address))?
-		NULL: pmd_offset(pud, address);
-}
-#endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */
+#include <linux/pt-default-mm.h>
 
 #if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
 /*
Index: linux-2.6.20-rc4/include/linux/pt-default-mm.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/include/linux/pt-default-mm.h	2007-01-11 13:09:08.155868000 +1100
@@ -0,0 +1,27 @@
+#ifndef _LINUX_PT_DEFAULT_MM_H
+#define _LINUX_PT_DEFAULT_MM_H
+
+int __pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address);
+int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address);
+int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long address);
+int __pte_alloc_kernel(pmd_t *pmd, unsigned long address);
+
+/*
+ * The following ifdef needed to get the 4level-fixup.h header to work.
+ * Remove it when 4level-fixup.h has been removed.
+ */
+#if defined(CONFIG_MMU) && !defined(__ARCH_HAS_4LEVEL_HACK)
+static inline pud_t *pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
+{
+	return (unlikely(pgd_none(*pgd)) && __pud_alloc(mm, pgd, address))?
+		NULL: pud_offset(pgd, address);
+}
+
+static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
+{
+	return (unlikely(pud_none(*pud)) && __pmd_alloc(mm, pud, address))?
+		NULL: pmd_offset(pud, address);
+}
+#endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */
+
+#endif


* [PATCH 3/29] Abstract current page table implementation
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
  2007-01-13  2:45 ` [PATCH 1/29] Abstract current page table implementation Paul Davies
  2007-01-13  2:45 ` [PATCH 2/29] " Paul Davies
@ 2007-01-13  2:45 ` Paul Davies
  2007-01-16 18:55   ` Christoph Lameter
  2007-01-13  2:46 ` [PATCH 4/29] Introduce Page Table Interface (PTI) Paul Davies
                   ` (46 subsequent siblings)
  49 siblings, 1 reply; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:45 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 03
 * Continues to move default page table functionality from mm.h to
 pt-default-mm.h.
   * moves macros, e.g. pte_offset_map_lock, pte_alloc_map, etc.

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 mm.h            |   50 ++------------------------------------------------
 pt-default-mm.h |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 48 deletions(-)
Index: linux-2.6.20-rc4/include/linux/mm.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/mm.h	2007-01-11 13:11:05.387868000 +1100
+++ linux-2.6.20-rc4/include/linux/mm.h	2007-01-11 13:11:10.387868000 +1100
@@ -851,55 +851,9 @@
 
 extern pte_t *FASTCALL(get_locked_pte(struct mm_struct *mm, unsigned long addr, spinlock_t **ptl));
 
+#ifdef CONFIG_PT_DEFAULT
 #include <linux/pt-default-mm.h>
-
-#if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
-/*
- * We tuck a spinlock to guard each pagetable page into its struct page,
- * at page->private, with BUILD_BUG_ON to make sure that this will not
- * overflow into the next struct page (as it might with DEBUG_SPINLOCK).
- * When freeing, reset page->mapping so free_pages_check won't complain.
- */
-#define __pte_lockptr(page)	&((page)->ptl)
-#define pte_lock_init(_page)	do {					\
-	spin_lock_init(__pte_lockptr(_page));				\
-} while (0)
-#define pte_lock_deinit(page)	((page)->mapping = NULL)
-#define pte_lockptr(mm, pmd)	({(void)(mm); __pte_lockptr(pmd_page(*(pmd)));})
-#else
-/*
- * We use mm->page_table_lock to guard all pagetable pages of the mm.
- */
-#define pte_lock_init(page)	do {} while (0)
-#define pte_lock_deinit(page)	do {} while (0)
-#define pte_lockptr(mm, pmd)	({(void)(pmd); &(mm)->page_table_lock;})
-#endif /* NR_CPUS < CONFIG_SPLIT_PTLOCK_CPUS */
-
-#define pte_offset_map_lock(mm, pmd, address, ptlp)	\
-({							\
-	spinlock_t *__ptl = pte_lockptr(mm, pmd);	\
-	pte_t *__pte = pte_offset_map(pmd, address);	\
-	*(ptlp) = __ptl;				\
-	spin_lock(__ptl);				\
-	__pte;						\
-})
-
-#define pte_unmap_unlock(pte, ptl)	do {		\
-	spin_unlock(ptl);				\
-	pte_unmap(pte);					\
-} while (0)
-
-#define pte_alloc_map(mm, pmd, address)			\
-	((unlikely(!pmd_present(*(pmd))) && __pte_alloc(mm, pmd, address))? \
-		NULL: pte_offset_map(pmd, address))
-
-#define pte_alloc_map_lock(mm, pmd, address, ptlp)	\
-	((unlikely(!pmd_present(*(pmd))) && __pte_alloc(mm, pmd, address))? \
-		NULL: pte_offset_map_lock(mm, pmd, address, ptlp))
-
-#define pte_alloc_kernel(pmd, address)			\
-	((unlikely(!pmd_present(*(pmd))) && __pte_alloc_kernel(pmd, address))? \
-		NULL: pte_offset_kernel(pmd, address))
+#endif
 
 extern void free_area_init(unsigned long * zones_size);
 extern void free_area_init_node(int nid, pg_data_t *pgdat,
Index: linux-2.6.20-rc4/include/linux/pt-default-mm.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/pt-default-mm.h	2007-01-11 13:09:08.155868000 +1100
+++ linux-2.6.20-rc4/include/linux/pt-default-mm.h	2007-01-11 13:11:10.391868000 +1100
@@ -24,4 +24,53 @@
 }
 #endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */
 
+#if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
+/*
+ * We tuck a spinlock to guard each pagetable page into its struct page,
+ * at page->private, with BUILD_BUG_ON to make sure that this will not
+ * overflow into the next struct page (as it might with DEBUG_SPINLOCK).
+ * When freeing, reset page->mapping so free_pages_check won't complain.
+ */
+#define __pte_lockptr(page)	&((page)->ptl)
+#define pte_lock_init(_page)	do {					\
+	spin_lock_init(__pte_lockptr(_page));				\
+} while (0)
+#define pte_lock_deinit(page)	((page)->mapping = NULL)
+#define pte_lockptr(mm, pmd)	({(void)(mm); __pte_lockptr(pmd_page(*(pmd)));})
+#else
+/*
+ * We use mm->page_table_lock to guard all pagetable pages of the mm.
+ */
+#define pte_lock_init(page)	do {} while (0)
+#define pte_lock_deinit(page)	do {} while (0)
+#define pte_lockptr(mm, pmd)	({(void)(pmd); &(mm)->page_table_lock;})
+#endif /* NR_CPUS < CONFIG_SPLIT_PTLOCK_CPUS */
+
+#define pte_offset_map_lock(mm, pmd, address, ptlp)	\
+({							\
+	spinlock_t *__ptl = pte_lockptr(mm, pmd);	\
+	pte_t *__pte = pte_offset_map(pmd, address);	\
+	*(ptlp) = __ptl;				\
+	spin_lock(__ptl);				\
+	__pte;						\
+})
+
+#define pte_unmap_unlock(pte, ptl)	do {		\
+	spin_unlock(ptl);				\
+	pte_unmap(pte);					\
+} while (0)
+
+#define pte_alloc_map(mm, pmd, address)			\
+	((unlikely(!pmd_present(*(pmd))) && __pte_alloc(mm, pmd, address))? \
+		NULL: pte_offset_map(pmd, address))
+
+#define pte_alloc_map_lock(mm, pmd, address, ptlp)	\
+	((unlikely(!pmd_present(*(pmd))) && __pte_alloc(mm, pmd, address))? \
+		NULL: pte_offset_map_lock(mm, pmd, address, ptlp))
+
+#define pte_alloc_kernel(pmd, address)			\
+	((unlikely(!pmd_present(*(pmd))) && __pte_alloc_kernel(pmd, address))? \
+		NULL: pte_offset_kernel(pmd, address))
+
+
 #endif


* [PATCH 4/29] Introduce Page Table Interface (PTI)
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (2 preceding siblings ...)
  2007-01-13  2:45 ` [PATCH 3/29] " Paul Davies
@ 2007-01-13  2:46 ` Paul Davies
  2007-01-16 19:02   ` Christoph Lameter
  2007-01-13  2:46 ` [PATCH 5/29] Start calling simple PTI functions Paul Davies
                   ` (45 subsequent siblings)
  49 siblings, 1 reply; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:46 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 04
 * Creates include/linux/pt.h and defines the clean page table interface
 there.  This file includes the chosen page table implementation
 (at the moment, only the default implementation).
 * Creates include/linux/pt-default.h to hold a small subset of
 the default page table implementation (for performance reasons).
   * It keeps lookup_page_table and build_page_table as static inlines.
   * Locking stays inside the implementation; see the sketch below.
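
Because the locking helpers never hand a spinlock_t back to the caller, a
pte re-validation under the PTI reads as below.  This is a sketch only,
not code from the series, and revalidate_pte() is a made-up name:

static int revalidate_pte(struct mm_struct *mm, unsigned long address,
		pte_t orig_pte)
{
	pt_path_t pt_path;
	pte_t *pte;

	pte = lookup_page_table(mm, address, &pt_path);
	if (!pte)
		return 0;
	/* atomic_pte_same() takes and drops the pte lock internally via
	 * pt_path -- the caller never names the lock. */
	if (!atomic_pte_same(mm, pte, orig_pte, pt_path)) {
		pte_unmap(pte);
		return 0;	/* the entry changed under us */
	}
	pte_unmap(pte);
	return 1;
}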

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 pt-default.h |  152 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 pt.h         |   60 +++++++++++++++++++++++
 2 files changed, 212 insertions(+)
Index: linux-2.6.20-rc4/include/linux/pt-default.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/include/linux/pt-default.h	2007-01-11 13:01:45.067046000 +1100
@@ -0,0 +1,152 @@
+#ifndef _LINUX_PT_DEFAULT_H
+#define _LINUX_PT_DEFAULT_H
+
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+
+#include <linux/hugetlb.h>
+#include <linux/highmem.h>
+
+typedef struct pt_struct { pmd_t *pmd; } pt_path_t;
+
+static inline int create_user_page_table(struct mm_struct *mm)
+{
+	mm->page_table.pgd = pgd_alloc(NULL);
+
+	if (unlikely(!mm->page_table.pgd))
+		return -ENOMEM;
+	return 0;
+}
+
+static inline void destroy_user_page_table(struct mm_struct *mm)
+{
+	pgd_free(mm->page_table.pgd);
+}
+
+static inline pte_t *lookup_page_table(struct mm_struct *mm,
+		unsigned long address, pt_path_t *pt_path)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	if (mm!=&init_mm) { /* Look up user page table */
+		pgd = pgd_offset(mm, address);
+		if (pgd_none_or_clear_bad(pgd))
+			return NULL;
+	} else { /* Look up kernel page table */
+		pgd = pgd_offset_k(address);
+		if (pgd_none_or_clear_bad(pgd))
+			return NULL;
+	}
+
+	pud = pud_offset(pgd, address);
+	if (pud_none_or_clear_bad(pud)) {
+		return NULL;
+	}
+
+	pmd = pmd_offset(pud, address);
+	if (pmd_none_or_clear_bad(pmd)) {
+		return NULL;
+	}
+
+	if(pt_path)
+		pt_path->pmd = pmd;
+
+	return pte_offset_map(pmd, address);
+}
+
+static inline pte_t *build_page_table(struct mm_struct *mm,
+		unsigned long address, pt_path_t *pt_path)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pgd = pgd_offset(mm, address);
+	pud = pud_alloc(mm, pgd, address);
+	if (!pud)
+		return NULL;
+	pmd = pmd_alloc(mm, pud, address);
+	if (!pmd)
+		return NULL;
+
+	pt_path->pmd = pmd;
+	return pte_alloc_map(mm, pmd, address);
+}
+
+#define INIT_PT .page_table.pgd	= swapper_pg_dir,
+
+#define lock_pte(mm, pt_path) \
+	({ spin_lock(pte_lockptr(mm, pt_path.pmd));})
+
+#define unlock_pte(mm, pt_path) \
+	({ spin_unlock(pte_lockptr(mm, pt_path.pmd)); })
+
+#define lookup_page_table_lock(mm, pt_path, address)	\
+({							\
+	spinlock_t *__ptl = pte_lockptr(mm, pt_path.pmd);	\
+	pte_t *__pte = pte_offset_map(pt_path.pmd, address);	\
+	spin_lock(__ptl);				\
+	__pte;						\
+})
+
+#define atomic_pte_same(mm, pte, orig_pte, pt_path) \
+({ \
+	spinlock_t *ptl = pte_lockptr(mm, pt_path.pmd); \
+	int __same; \
+	spin_lock(ptl); \
+	__same = pte_same(*pte, orig_pte); \
+	spin_unlock(ptl); \
+	__same; \
+})
+
+#define is_huge_page(mm, address, pt_path, flags, page) \
+({ \
+	int __ret=0; \
+	if(pmd_huge(*pt_path.pmd)) { \
+		BUG_ON(flags & FOLL_GET); \
+		page = follow_huge_pmd(mm, address, pt_path.pmd, flags & FOLL_WRITE); \
+		__ret = 1; \
+	} \
+  	__ret; \
+})
+
+#define set_pt_path(pt_path, ppt_path) (*(ppt_path)= (pt_path)) /* fix this */
+
+#define CLUSTER_SIZE	min(32*PAGE_SIZE, PMD_SIZE)
+
+static inline pte_t *lookup_gate_area(struct mm_struct *mm,
+			unsigned long pg)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	if (pg > TASK_SIZE)
+		pgd = pgd_offset_k(pg);
+	else
+		pgd = pgd_offset_gate(mm, pg);
+	BUG_ON(pgd_none(*pgd));
+	pud = pud_offset(pgd, pg);
+	BUG_ON(pud_none(*pud));
+	pmd = pmd_offset(pud, pg);
+	if (pmd_none(*pmd))
+		return NULL;
+	pte = pte_offset_map(pmd, pg);
+	return pte;
+}
+
+#define vma_optimization \
+({ \
+	while (next && next->vm_start <= vma->vm_end + PMD_SIZE \
+	  	  && !is_vm_hugetlb_page(next)) { \
+		vma = next; \
+		next = vma->vm_next; \
+		anon_vma_unlink(vma); \
+		unlink_file_vma(vma); \
+	} \
+})
+
+#endif
Index: linux-2.6.20-rc4/include/linux/pt.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/include/linux/pt.h	2007-01-11 13:04:06.307868000 +1100
@@ -0,0 +1,60 @@
+#ifndef _LINUX_PT_H
+#define _LINUX_PT_H
+
+#include <linux/swap.h>
+
+#ifdef CONFIG_PT_DEFAULT
+#include <linux/pt-default.h>
+#endif
+
+int create_user_page_table(struct mm_struct *mm);
+
+void destroy_user_page_table(struct mm_struct *mm);
+
+pte_t *build_page_table(struct mm_struct *mm, unsigned long address,
+		pt_path_t *pt_path);
+
+pte_t *lookup_page_table(struct mm_struct *mm, unsigned long address,
+		pt_path_t *pt_path);
+
+void free_pt_range(struct mmu_gather **tlb, unsigned long addr,
+		unsigned long end, unsigned long floor, unsigned long ceiling);
+
+int copy_dual_iterator(struct mm_struct *dst_mm, struct mm_struct *src_mm,
+		unsigned long addr, unsigned long end, struct vm_area_struct *vma);
+
+unsigned long unmap_page_range_iterator(struct mmu_gather *tlb,
+        struct vm_area_struct *vma, unsigned long addr, unsigned long end,
+        long *zap_work, struct zap_details *details);
+
+int zeromap_build_iterator(struct mm_struct *mm,
+		unsigned long addr, unsigned long end, pgprot_t prot);
+
+int remap_build_iterator(struct mm_struct *mm,
+		unsigned long addr, unsigned long end, unsigned long pfn,
+		pgprot_t prot);
+
+void change_protection_read_iterator(struct vm_area_struct *vma,
+		unsigned long addr, unsigned long end, pgprot_t newprot,
+		int dirty_accountable);
+
+void vunmap_read_iterator(unsigned long addr, unsigned long end);
+
+int vmap_build_iterator(unsigned long addr,
+		unsigned long end, pgprot_t prot, struct page ***pages);
+
+int unuse_vma_read_iterator(struct vm_area_struct *vma,
+		unsigned long addr, unsigned long end, swp_entry_t entry, struct page *page);
+
+/*void smaps_read_iterator(struct vm_area_struct *vma,
+  unsigned long addr, unsigned long end, struct mem_size_stats *mss);*/
+
+int check_policy_read_iterator(struct vm_area_struct *vma,
+		unsigned long addr, unsigned long end, const nodemask_t *nodes,
+		unsigned long flags, void *private);
+
+unsigned long move_page_tables(struct vm_area_struct *vma,
+		unsigned long old_addr, struct vm_area_struct *new_vma,
+		unsigned long new_addr, unsigned long len);
+
+#endif


* [PATCH 5/29] Start calling simple PTI functions
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (3 preceding siblings ...)
  2007-01-13  2:46 ` [PATCH 4/29] Introduce Page Table Interface (PTI) Paul Davies
@ 2007-01-13  2:46 ` Paul Davies
  2007-01-16 19:04   ` Christoph Lameter
  2007-01-13  2:46 ` [PATCH 6/29] Tweak IA64 arch dependent files to work with PTI Paul Davies
                   ` (44 subsequent siblings)
  49 siblings, 1 reply; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:46 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 05
 * Creates include/linux/pt-type.h for holding the different page table types.
  * Gives the default page table a type, and adjusts include/linux/sched.h
  to point to the generic page table type (as opposed to the pgd); see the
  illustration below.
 * Removes implementation-dependent calls from fork.c and replaces them
 with calls from the interface in pt.h (create_user_page_table etc. are
 called instead).
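
The point of the generic type is that code outside the implementation never
looks inside it.  As an illustration, pt-type.h can grow one root type per
implementation; the CONFIG_PT_GPT branch below is hypothetical, invented
here to show the intent -- only the CONFIG_PT_DEFAULT branch exists in this
patch:

#ifdef CONFIG_PT_DEFAULT
typedef struct { pgd_t *pgd; } pt_t;	/* root of the default tree */
#endif
#ifdef CONFIG_PT_GPT			/* hypothetical alternative */
typedef struct { void *root; } pt_t;	/* whatever root the GPT needs */
#endif

Only the chosen implementation ever dereferences mm->page_table.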

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 include/linux/init_task.h |    2 +-
 include/linux/pt-type.h   |    8 ++++++++
 include/linux/sched.h     |    4 +++-
 kernel/fork.c             |   25 +++++++------------------
 4 files changed, 19 insertions(+), 20 deletions(-)
Index: linux-2.6.20-rc1/kernel/fork.c
===================================================================
--- linux-2.6.20-rc1.orig/kernel/fork.c	2006-12-23 14:54:16.573929000 +1100
+++ linux-2.6.20-rc1/kernel/fork.c	2006-12-23 14:55:07.173929000 +1100
@@ -49,6 +49,7 @@
 #include <linux/delayacct.h>
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
+#include <linux/pt.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -300,22 +301,10 @@
 	goto out;
 }
 
-static inline int mm_alloc_pgd(struct mm_struct * mm)
-{
-	mm->pgd = pgd_alloc(mm);
-	if (unlikely(!mm->pgd))
-		return -ENOMEM;
-	return 0;
-}
-
-static inline void mm_free_pgd(struct mm_struct * mm)
-{
-	pgd_free(mm->pgd);
-}
 #else
 #define dup_mmap(mm, oldmm)	(0)
-#define mm_alloc_pgd(mm)	(0)
-#define mm_free_pgd(mm)
+#define create_user_page_table(mm)	(0)
+#define destroy_user_page_table(mm)
 #endif /* CONFIG_MMU */
 
  __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
@@ -340,11 +329,11 @@
 	mm->ioctx_list = NULL;
 	mm->free_area_cache = TASK_UNMAPPED_BASE;
 	mm->cached_hole_size = ~0UL;
-
-	if (likely(!mm_alloc_pgd(mm))) {
+	if (likely(!create_user_page_table(mm))) {
 		mm->def_flags = 0;
 		return mm;
 	}
+
 	free_mm(mm);
 	return NULL;
 }
@@ -372,7 +361,7 @@
 void fastcall __mmdrop(struct mm_struct *mm)
 {
 	BUG_ON(mm == &init_mm);
-	mm_free_pgd(mm);
+	destroy_user_page_table(mm);
 	destroy_context(mm);
 	free_mm(mm);
 }
@@ -519,7 +508,7 @@
 	 * If init_new_context() failed, we cannot use mmput() to free the mm
 	 * because it calls destroy_context()
 	 */
-	mm_free_pgd(mm);
+	destroy_user_page_table(mm);
 	free_mm(mm);
 	return NULL;
 }
Index: linux-2.6.20-rc1/include/linux/pt-type.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc1/include/linux/pt-type.h	2006-12-23 14:55:39.021929000 +1100
@@ -0,0 +1,8 @@
+#ifndef _LINUX_PT_TYPE_H
+#define _LINUX_PT_TYPE_H
+
+#ifdef CONFIG_PT_DEFAULT
+typedef struct { pgd_t *pgd; } pt_t;
+#endif
+
+#endif
Index: linux-2.6.20-rc1/include/linux/sched.h
===================================================================
--- linux-2.6.20-rc1.orig/include/linux/sched.h	2006-12-23 14:54:16.581929000 +1100
+++ linux-2.6.20-rc1/include/linux/sched.h	2006-12-23 14:55:07.173929000 +1100
@@ -83,6 +83,7 @@
 #include <linux/timer.h>
 #include <linux/hrtimer.h>
 #include <linux/task_io_accounting.h>
+#include <linux/pt-type.h>
 
 #include <asm/processor.h>
 
@@ -308,6 +309,7 @@
 } while (0)
 
 struct mm_struct {
+	pt_t page_table;					/* Page table */
 	struct vm_area_struct * mmap;		/* list of VMAs */
 	struct rb_root mm_rb;
 	struct vm_area_struct * mmap_cache;	/* last find_vma result */
@@ -319,7 +321,7 @@
 	unsigned long task_size;		/* size of task vm space */
 	unsigned long cached_hole_size;         /* if non-zero, the largest hole below free_area_cache */
 	unsigned long free_area_cache;		/* first hole of size cached_hole_size or larger */
-	pgd_t * pgd;
+		/*pgd_t * pgd;*/
 	atomic_t mm_users;			/* How many users with user space? */
 	atomic_t mm_count;			/* How many references to "struct mm_struct" (users count as 1) */
 	int map_count;				/* number of VMAs */
Index: linux-2.6.20-rc1/include/linux/init_task.h
===================================================================
--- linux-2.6.20-rc1.orig/include/linux/init_task.h	2006-12-23 14:54:16.581929000 +1100
+++ linux-2.6.20-rc1/include/linux/init_task.h	2006-12-23 14:55:07.177929000 +1100
@@ -47,7 +47,7 @@
 #define INIT_MM(name) \
 {			 					\
 	.mm_rb		= RB_ROOT,				\
-	.pgd		= swapper_pg_dir, 			\
+	INIT_PT 			\
 	.mm_users	= ATOMIC_INIT(2), 			\
 	.mm_count	= ATOMIC_INIT(1), 			\
 	.mmap_sem	= __RWSEM_INITIALIZER(name.mmap_sem),	\


* [PATCH 6/29] Tweak IA64 arch dependent files to work with PTI
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (4 preceding siblings ...)
  2007-01-13  2:46 ` [PATCH 5/29] Start calling simple PTI functions Paul Davies
@ 2007-01-13  2:46 ` Paul Davies
  2007-01-16 19:05   ` Christoph Lameter
  2007-01-13  2:46 ` [PATCH 7/29] Continue calling simple PTI functions Paul Davies
                   ` (43 subsequent siblings)
  49 siblings, 1 reply; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:46 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 06 ia64
 * Defines the default page table config option PT_DEFAULT in Kconfig.debug,
 so that it appears under "Kernel hacking" (see the sketch below).
 * Adjusts arch-dependent files that refer to the pgd in the mm_struct
 to do so via the new generic page table type (there is no pgd in the
 mm_struct anymore).
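
With this patch the page table becomes a configure-time choice on IA64,
selectable under "Kernel hacking".  A .config built with the (currently
only) default selection would contain a single new line, sketched here:

CONFIG_PT_DEFAULT=y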

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 arch/ia64/Kconfig.debug        |    9 +++++++++
 arch/ia64/kernel/init_task.c   |    2 +-
 arch/ia64/mm/hugetlbpage.c     |    4 ++--
 include/asm-ia64/mmu_context.h |    2 +-
 include/asm-ia64/pgtable.h     |    4 ++--
 5 files changed, 15 insertions(+), 6 deletions(-)
Index: linux-2.6.20-rc4/include/asm-ia64/mmu_context.h
===================================================================
--- linux-2.6.20-rc4.orig/include/asm-ia64/mmu_context.h	2007-01-11 13:15:05.228780000 +1100
+++ linux-2.6.20-rc4/include/asm-ia64/mmu_context.h	2007-01-11 13:15:36.184250000 +1100
@@ -191,7 +191,7 @@
 	 * We may get interrupts here, but that's OK because interrupt
 	 * handlers cannot touch user-space.
 	 */
-	ia64_set_kr(IA64_KR_PT_BASE, __pa(next->pgd));
+	ia64_set_kr(IA64_KR_PT_BASE, __pa(next->page_table.pgd));
 	activate_context(next);
 }
 
Index: linux-2.6.20-rc4/include/asm-ia64/pgtable.h
===================================================================
--- linux-2.6.20-rc4.orig/include/asm-ia64/pgtable.h	2007-01-11 13:15:05.232782000 +1100
+++ linux-2.6.20-rc4/include/asm-ia64/pgtable.h	2007-01-11 13:15:36.184250000 +1100
@@ -346,13 +346,13 @@
 static inline pgd_t*
 pgd_offset (struct mm_struct *mm, unsigned long address)
 {
-	return mm->pgd + pgd_index(address);
+	return mm->page_table.pgd + pgd_index(address);
 }
 
 /* In the kernel's mapped region we completely ignore the region number
    (since we know it's in region number 5). */
 #define pgd_offset_k(addr) \
-	(init_mm.pgd + (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)))
+	(init_mm.page_table.pgd + (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)))
 
 /* Look up a pgd entry in the gate area.  On IA-64, the gate-area
    resides in the kernel-mapped segment, hence we use pgd_offset_k()
Index: linux-2.6.20-rc4/arch/ia64/kernel/init_task.c
===================================================================
--- linux-2.6.20-rc4.orig/arch/ia64/kernel/init_task.c	2007-01-11 13:15:05.232782000 +1100
+++ linux-2.6.20-rc4/arch/ia64/kernel/init_task.c	2007-01-11 13:15:36.188252000 +1100
@@ -12,9 +12,9 @@
 #include <linux/sched.h>
 #include <linux/init_task.h>
 #include <linux/mqueue.h>
+#include <linux/pt.h>
 
 #include <asm/uaccess.h>
-#include <asm/pgtable.h>
 
 static struct fs_struct init_fs = INIT_FS;
 static struct files_struct init_files = INIT_FILES;
Index: linux-2.6.20-rc4/arch/ia64/mm/hugetlbpage.c
===================================================================
--- linux-2.6.20-rc4.orig/arch/ia64/mm/hugetlbpage.c	2007-01-11 13:15:05.232782000 +1100
+++ linux-2.6.20-rc4/arch/ia64/mm/hugetlbpage.c	2007-01-11 13:16:53.160812000 +1100
@@ -16,8 +16,8 @@
 #include <linux/smp_lock.h>
 #include <linux/slab.h>
 #include <linux/sysctl.h>
+#include <linux/pt.h>
 #include <asm/mman.h>
-#include <asm/pgalloc.h>
 #include <asm/tlb.h>
 #include <asm/tlbflush.h>
 
@@ -136,7 +136,7 @@
 	if (REGION_NUMBER(ceiling) == RGN_HPAGE)
 		ceiling = htlbpage_to_page(ceiling);
 
-	free_pgd_range(tlb, addr, end, floor, ceiling);
+	free_pt_range(tlb, addr, end, floor, ceiling);
 }
 
 unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
Index: linux-2.6.20-rc4/arch/ia64/Kconfig.debug
===================================================================
--- linux-2.6.20-rc4.orig/arch/ia64/Kconfig.debug	2007-01-11 13:15:05.232782000 +1100
+++ linux-2.6.20-rc4/arch/ia64/Kconfig.debug	2007-01-11 13:15:36.192254000 +1100
@@ -3,6 +3,15 @@
 source "lib/Kconfig.debug"
 
 choice
+	prompt "Page table selection"
+	default PT_DEFAULT
+
+config  PT_DEFAULT
+	bool "PT_DEFAULT"
+
+endchoice
+
+choice
 	prompt "Physical memory granularity"
 	default IA64_GRANULE_64MB
 


* [PATCH 7/29] Continue calling simple PTI functions
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (5 preceding siblings ...)
  2007-01-13  2:46 ` [PATCH 6/29] Tweak IA64 arch dependent files to work with PTI Paul Davies
@ 2007-01-13  2:46 ` Paul Davies
  2007-01-16 19:08   ` Christoph Lameter
  2007-01-13  2:46 ` [PATCH 8/29] Clean up page fault handlers Paul Davies
                   ` (42 subsequent siblings)
  49 siblings, 1 reply; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:46 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 07
 * get_locked_pte is removed from memory.c (it is now absorbed into the
 default page table implementation).
 * removes the prototype for get_locked_pte from mm.h
 * Goes through the kernel code that builds page table entries in exec.c,
 fremap.c and memory.c, calling build_page_table and the new macros to
 lock and unlock ptes.

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 fs/exec.c          |   15 ++++++++++-----
 include/linux/mm.h |    2 --
 mm/fremap.c        |   20 ++++++++++++++------
 mm/memory.c        |   40 ++++++++++++----------------------------
 4 files changed, 36 insertions(+), 41 deletions(-)
Index: linux-2.6.20-rc4/fs/exec.c
===================================================================
--- linux-2.6.20-rc4.orig/fs/exec.c	2007-01-11 13:09:03.951868000 +1100
+++ linux-2.6.20-rc4/fs/exec.c	2007-01-11 13:11:42.147868000 +1100
@@ -50,6 +50,7 @@
 #include <linux/tsacct_kern.h>
 #include <linux/cn_proc.h>
 #include <linux/audit.h>
+#include <linux/pt.h>
 
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
@@ -308,17 +309,21 @@
 {
 	struct mm_struct *mm = vma->vm_mm;
 	pte_t * pte;
-	spinlock_t *ptl;
+	pt_path_t pt_path;
 
 	if (unlikely(anon_vma_prepare(vma)))
 		goto out;
 
 	flush_dcache_page(page);
-	pte = get_locked_pte(mm, address, &ptl);
+
+	pte = build_page_table(mm, address, &pt_path);
+	lock_pte(mm, pt_path);
+
 	if (!pte)
 		goto out;
 	if (!pte_none(*pte)) {
-		pte_unmap_unlock(pte, ptl);
+		unlock_pte(mm, pt_path);
+		pte_unmap(pte);
 		goto out;
 	}
 	inc_mm_counter(mm, anon_rss);
@@ -326,8 +331,8 @@
 	set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte(
 					page, vma->vm_page_prot))));
 	page_add_new_anon_rmap(page, vma, address);
-	pte_unmap_unlock(pte, ptl);
-
+	unlock_pte(mm, pt_path);
+	pte_unmap(pte);
 	/* no need for flush_tlb */
 	return;
 out:
Index: linux-2.6.20-rc4/mm/fremap.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/fremap.c	2007-01-11 13:09:03.951868000 +1100
+++ linux-2.6.20-rc4/mm/fremap.c	2007-01-11 13:11:42.227868000 +1100
@@ -15,6 +15,7 @@
 #include <linux/rmap.h>
 #include <linux/module.h>
 #include <linux/syscalls.h>
+#include <linux/pt.h>
 
 #include <asm/mmu_context.h>
 #include <asm/cacheflush.h>
@@ -56,9 +57,11 @@
 	int err = -ENOMEM;
 	pte_t *pte;
 	pte_t pte_val;
-	spinlock_t *ptl;
+	pt_path_t pt_path;
+
+	pte = build_page_table(mm, addr, &pt_path);
+	lock_pte(mm, pt_path);
 
-	pte = get_locked_pte(mm, addr, &ptl);
 	if (!pte)
 		goto out;
 
@@ -86,7 +89,8 @@
 	lazy_mmu_prot_update(pte_val);
 	err = 0;
 unlock:
-	pte_unmap_unlock(pte, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(pte);
 out:
 	return err;
 }
@@ -101,9 +105,11 @@
 {
 	int err = -ENOMEM;
 	pte_t *pte;
-	spinlock_t *ptl;
+ 	pt_path_t pt_path;
+
+ 	pte = build_page_table(mm, addr, &pt_path);
+ 	lock_pte(mm, pt_path);
 
-	pte = get_locked_pte(mm, addr, &ptl);
 	if (!pte)
 		goto out;
 
@@ -120,7 +126,9 @@
 	 * be mapped there when there's a fault (in a non-linear vma where
 	 * that's not obvious).
 	 */
-	pte_unmap_unlock(pte, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(pte);
+
 	err = 0;
 out:
 	return err;
Index: linux-2.6.20-rc4/mm/memory.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/memory.c	2007-01-11 13:11:37.315868000 +1100
+++ linux-2.6.20-rc4/mm/memory.c	2007-01-11 13:11:42.227868000 +1100
@@ -50,6 +50,7 @@
 #include <linux/delayacct.h>
 #include <linux/init.h>
 #include <linux/writeback.h>
+#include <linux/pt.h>
 
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
@@ -1134,18 +1135,6 @@
 	return err;
 }
 
-pte_t * fastcall get_locked_pte(struct mm_struct *mm, unsigned long addr, spinlock_t **ptl)
-{
-	pgd_t * pgd = pgd_offset(mm, addr);
-	pud_t * pud = pud_alloc(mm, pgd, addr);
-	if (pud) {
-		pmd_t * pmd = pmd_alloc(mm, pud, addr);
-		if (pmd)
-			return pte_alloc_map_lock(mm, pmd, addr, ptl);
-	}
-	return NULL;
-}
-
 /*
  * This is the old fallback for page remapping.
  *
@@ -1157,14 +1146,17 @@
 {
 	int retval;
 	pte_t *pte;
-	spinlock_t *ptl;  
+	pt_path_t pt_path;
 
 	retval = -EINVAL;
 	if (PageAnon(page))
 		goto out;
 	retval = -ENOMEM;
 	flush_dcache_page(page);
-	pte = get_locked_pte(mm, addr, &ptl);
+
+	pte = build_page_table(mm, addr, &pt_path);
+	lock_pte(mm, pt_path);
+
 	if (!pte)
 		goto out;
 	retval = -EBUSY;
@@ -1179,7 +1171,8 @@
 
 	retval = 0;
 out_unlock:
-	pte_unmap_unlock(pte, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(pte);
 out:
 	return retval;
 }
@@ -2385,13 +2378,12 @@
 /*
  * By the time we get here, we already hold the mm semaphore
  */
+
 int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		unsigned long address, int write_access)
 {
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
 	pte_t *pte;
+	pt_path_t pt_path;
 
 	__set_current_state(TASK_RUNNING);
 
@@ -2400,20 +2392,12 @@
 	if (unlikely(is_vm_hugetlb_page(vma)))
 		return hugetlb_fault(mm, vma, address, write_access);
 
-	pgd = pgd_offset(mm, address);
-	pud = pud_alloc(mm, pgd, address);
-	if (!pud)
-		return VM_FAULT_OOM;
-	pmd = pmd_alloc(mm, pud, address);
-	if (!pmd)
-		return VM_FAULT_OOM;
-	pte = pte_alloc_map(mm, pmd, address);
+	pte = build_page_table(mm, address, &pt_path);
 	if (!pte)
 		return VM_FAULT_OOM;
 
-	return handle_pte_fault(mm, vma, address, pte, pmd, write_access);
+	return handle_pte_fault(mm, vma, address, pte, pt_path.pmd, write_access);
 }
-
 EXPORT_SYMBOL_GPL(__handle_mm_fault);
 
 int make_pages_present(unsigned long addr, unsigned long end)
Index: linux-2.6.20-rc4/include/linux/mm.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/mm.h	2007-01-11 13:11:38.999868000 +1100
+++ linux-2.6.20-rc4/include/linux/mm.h	2007-01-11 13:11:42.231868000 +1100
@@ -849,8 +849,6 @@
 		mapping_cap_account_dirty(vma->vm_file->f_mapping);
 }
 
-extern pte_t *FASTCALL(get_locked_pte(struct mm_struct *mm, unsigned long addr, spinlock_t **ptl));
-
 #ifdef CONFIG_PT_DEFAULT
 #include <linux/pt-default-mm.h>
 #endif

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 8/29] Clean up page fault handlers
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (6 preceding siblings ...)
  2007-01-13  2:46 ` [PATCH 7/29] Continue calling simple PTI functions Paul Davies
@ 2007-01-13  2:46 ` Paul Davies
  2007-01-13  2:46 ` [PATCH 9/29] Clean up page fault handlers Paul Davies
                   ` (41 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:46 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 08
 * Goes through the page fault handler functions in memory.c and abstracts
 the implementation-dependent page table lookups, replacing them with calls
 to the page table interface.
 * The abstraction is arranged so that the fault handler functions remain
 as undisturbed as possible for the default page table implementation.
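
Only the .pmd member of pt_path_t is visible in these diffs, so for the
default page table the path type presumably just records where the walk
ended, and lock_pte()/unlock_pte() reduce to the old pte_lockptr() dance.
A minimal sketch, assuming that layout (not taken from this series):

typedef struct {
	pmd_t *pmd;	/* assumed: the only member used in these diffs */
} pt_path_t;

static inline void lock_pte(struct mm_struct *mm, pt_path_t pt_path)
{
	spin_lock(pte_lockptr(mm, pt_path.pmd));
}

static inline void unlock_pte(struct mm_struct *mm, pt_path_t pt_path)
{
	spin_unlock(pte_lockptr(mm, pt_path.pmd));
}

lookup_page_table_lock() would similarly be a lookup plus lock_pte() in
one call.  An alternative implementation is free to put whatever it
needs in pt_path_t; the fault handlers mostly just pass it through
(this patch still peeks at .pmd in handle_pte_fault; patch 9 removes
even that).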

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 memory.c |   44 ++++++++++++++++++++++++--------------------
 1 file changed, 24 insertions(+), 20 deletions(-)
Index: linux-2.6.20-rc1/mm/memory.c
===================================================================
--- linux-2.6.20-rc1.orig/mm/memory.c	2006-12-21 16:50:32.334023000 +1100
+++ linux-2.6.20-rc1/mm/memory.c	2006-12-21 16:54:57.202023000 +1100
@@ -2014,11 +2014,10 @@
  * We return with mmap_sem still held, but pte unmapped and unlocked.
  */
 static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pte_t *page_table, pmd_t *pmd,
+		unsigned long address, pte_t *page_table, pt_path_t pt_path,
 		int write_access)
 {
 	struct page *page;
-	spinlock_t *ptl;
 	pte_t entry;
 
 	if (write_access) {
@@ -2034,7 +2033,7 @@
 		entry = mk_pte(page, vma->vm_page_prot);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 
-		page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
+		page_table = lookup_page_table_lock(mm, pt_path, address);
 		if (!pte_none(*page_table))
 			goto release;
 		inc_mm_counter(mm, anon_rss);
@@ -2046,8 +2045,8 @@
 		page_cache_get(page);
 		entry = mk_pte(page, vma->vm_page_prot);
 
-		ptl = pte_lockptr(mm, pmd);
-		spin_lock(ptl);
+		lock_pte(mm, pt_path);
+
 		if (!pte_none(*page_table))
 			goto release;
 		inc_mm_counter(mm, file_rss);
@@ -2060,7 +2059,8 @@
 	update_mmu_cache(vma, address, entry);
 	lazy_mmu_prot_update(entry);
 unlock:
-	pte_unmap_unlock(page_table, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(page_table);
 	return VM_FAULT_MINOR;
 release:
 	page_cache_release(page);
@@ -2083,10 +2083,9 @@
  * We return with mmap_sem still held, but pte unmapped and unlocked.
  */
 static int do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pte_t *page_table, pmd_t *pmd,
+		unsigned long address, pte_t *page_table, pt_path_t pt_path,
 		int write_access)
 {
-	spinlock_t *ptl;
 	struct page *new_page;
 	struct address_space *mapping = NULL;
 	pte_t entry;
@@ -2151,14 +2150,16 @@
 		}
 	}
 
-	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
+	page_table = lookup_page_table_lock(mm, pt_path, address);
+
 	/*
 	 * For a file-backed vma, someone could have truncated or otherwise
 	 * invalidated this page.  If unmap_mapping_range got called,
 	 * retry getting the page.
 	 */
 	if (mapping && unlikely(sequence != mapping->truncate_count)) {
-		pte_unmap_unlock(page_table, ptl);
+		unlock_pte(mm, pt_path);
+		pte_unmap(page_table);
 		page_cache_release(new_page);
 		cond_resched();
 		sequence = mapping->truncate_count;
@@ -2205,7 +2206,8 @@
 	update_mmu_cache(vma, address, entry);
 	lazy_mmu_prot_update(entry);
 unlock:
-	pte_unmap_unlock(page_table, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(page_table);
 	if (dirty_page) {
 		set_page_dirty_balance(dirty_page);
 		put_page(dirty_page);
@@ -2233,10 +2235,9 @@
  * Mark this `noinline' to prevent it from bloating the main pagefault code.
  */
 static noinline int do_no_pfn(struct mm_struct *mm, struct vm_area_struct *vma,
-		     unsigned long address, pte_t *page_table, pmd_t *pmd,
+		     unsigned long address, pte_t *page_table, pt_path_t pt_path,
 		     int write_access)
 {
-	spinlock_t *ptl;
 	pte_t entry;
 	unsigned long pfn;
 	int ret = VM_FAULT_MINOR;
@@ -2251,7 +2252,7 @@
 	if (pfn == NOPFN_SIGBUS)
 		return VM_FAULT_SIGBUS;
 
-	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
+	page_table = lookup_page_table_lock(mm, pt_path, address);
 
 	/* Only go through if we didn't race with anybody else... */
 	if (pte_none(*page_table)) {
@@ -2260,7 +2261,8 @@
 			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 		set_pte_at(mm, address, page_table, entry);
 	}
-	pte_unmap_unlock(page_table, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(page_table);
 	return ret;
 }
 
@@ -2317,26 +2319,28 @@
  */
 static inline int handle_pte_fault(struct mm_struct *mm,
 		struct vm_area_struct *vma, unsigned long address,
-		pte_t *pte, pmd_t *pmd, int write_access)
+		pte_t *pte, pt_path_t pt_path, int write_access)
 {
 	pte_t entry;
 	pte_t old_entry;
 	spinlock_t *ptl;
 
+	pmd_t *pmd = pt_path.pmd;
+
 	old_entry = entry = *pte;
 	if (!pte_present(entry)) {
 		if (pte_none(entry)) {
 			if (vma->vm_ops) {
 				if (vma->vm_ops->nopage)
 					return do_no_page(mm, vma, address,
-							  pte, pmd,
+							  pte, pt_path,
 							  write_access);
 				if (unlikely(vma->vm_ops->nopfn))
 					return do_no_pfn(mm, vma, address, pte,
-							 pmd, write_access);
+							 pt_path, write_access);
 			}
 			return do_anonymous_page(mm, vma, address,
-						 pte, pmd, write_access);
+						 pte, pt_path, write_access);
 		}
 		if (pte_file(entry))
 			return do_file_page(mm, vma, address,
@@ -2396,7 +2400,7 @@
 	if (!pte)
 		return VM_FAULT_OOM;
 
-	return handle_pte_fault(mm, vma, address, pte, pt_path.pmd, write_access);
+	return handle_pte_fault(mm, vma, address, pte, pt_path, write_access);
 }
 EXPORT_SYMBOL_GPL(__handle_mm_fault);
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 9/29] Clean up page fault handlers
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (7 preceding siblings ...)
  2007-01-13  2:46 ` [PATCH 8/29] Clean up page fault handlers Paul Davies
@ 2007-01-13  2:46 ` Paul Davies
  2007-01-13  2:46 ` [PATCH 10/29] Call simple PTI functions Paul Davies
                   ` (40 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:46 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 09
 * Finish abstracting implementation-dependent code from the
 page fault handler functions.
 * Abstract the page migration wait function to call
 lookup_page_table_lock from the page table interface.
 * Align the migration_entry_wait prototype in swapops.h with the
 page table interface.
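
The open-coded lock/compare/unlock that pte_unmap_same() loses below
becomes atomic_pte_same().  For the default page table it would
presumably be the removed code repackaged; a sketch, assuming pt_path
carries the pmd as elsewhere in the series:

static inline int atomic_pte_same(struct mm_struct *mm, pte_t *page_table,
				pte_t orig_pte, pt_path_t pt_path)
{
	spinlock_t *ptl = pte_lockptr(mm, pt_path.pmd);
	int same;

	spin_lock(ptl);
	same = pte_same(*page_table, orig_pte);
	spin_unlock(ptl);
	return same;
}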

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 include/linux/swapops.h |    6 ++--
 mm/memory.c             |   70 +++++++++++++++++++++++-------------------------
 mm/migrate.c            |   12 ++++----
 3 files changed, 44 insertions(+), 44 deletions(-)
Index: linux-2.6.20-rc3/mm/memory.c
===================================================================
--- linux-2.6.20-rc3.orig/mm/memory.c	2007-01-04 20:49:40.550922000 +1100
+++ linux-2.6.20-rc3/mm/memory.c	2007-01-04 20:49:46.340026000 +1100
@@ -1345,22 +1345,17 @@
  * (but do_wp_page is only called after already making such a check;
  * and do_anonymous_page and do_no_page can safely check later on).
  */
-static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
+static inline int pte_unmap_same(struct mm_struct *mm, pt_path_t pt_path,
 				pte_t *page_table, pte_t orig_pte)
 {
 	int same = 1;
 #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
-	if (sizeof(pte_t) > sizeof(unsigned long)) {
-		spinlock_t *ptl = pte_lockptr(mm, pmd);
-		spin_lock(ptl);
-		same = pte_same(*page_table, orig_pte);
-		spin_unlock(ptl);
-	}
+	if (sizeof(pte_t) > sizeof(unsigned long))
+		same = atomic_pte_same(mm, page_table, orig_pte, pt_path);
 #endif
 	pte_unmap(page_table);
 	return same;
 }
-
 /*
  * Do pte_mkwrite, but only if the vma says VM_WRITE.  We do this when
  * servicing faults for write access.  In the normal case, do always want
@@ -1421,8 +1416,8 @@
  * We return with mmap_sem still held, but pte unmapped and unlocked.
  */
 static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pte_t *page_table, pmd_t *pmd,
-		spinlock_t *ptl, pte_t orig_pte)
+		unsigned long address, pte_t *page_table, pt_path_t pt_path,
+		pte_t orig_pte)
 {
 	struct page *old_page, *new_page;
 	pte_t entry;
@@ -1459,7 +1454,8 @@
 			 * sleep if it needs to.
 			 */
 			page_cache_get(old_page);
-			pte_unmap_unlock(page_table, ptl);
+			unlock_pte(mm, pt_path);
+			pte_unmap(page_table);
 
 			if (vma->vm_ops->page_mkwrite(vma, old_page) < 0)
 				goto unwritable_page;
@@ -1472,8 +1468,7 @@
 			 * they did, we just return, as we can count on the
 			 * MMU to tell us if they didn't also make it writable.
 			 */
-			page_table = pte_offset_map_lock(mm, pmd, address,
-							 &ptl);
+			page_table = lookup_page_table_lock(mm, pt_path, address);
 			if (!pte_same(*page_table, orig_pte))
 				goto unlock;
 		}
@@ -1498,7 +1493,8 @@
 	 */
 	page_cache_get(old_page);
 gotten:
-	pte_unmap_unlock(page_table, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(page_table);
 
 	if (unlikely(anon_vma_prepare(vma)))
 		goto oom;
@@ -1516,7 +1512,8 @@
 	/*
 	 * Re-check the pte - we dropped the lock
 	 */
-	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
+	page_table = lookup_page_table_lock(mm, pt_path, address);
+
 	if (likely(pte_same(*page_table, orig_pte))) {
 		if (old_page) {
 			page_remove_rmap(old_page, vma);
@@ -1551,7 +1548,8 @@
 	if (old_page)
 		page_cache_release(old_page);
 unlock:
-	pte_unmap_unlock(page_table, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(page_table);
 	if (dirty_page) {
 		set_page_dirty_balance(dirty_page);
 		put_page(dirty_page);
@@ -1913,21 +1911,20 @@
  * We return with mmap_sem still held, but pte unmapped and unlocked.
  */
 static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pte_t *page_table, pmd_t *pmd,
+		unsigned long address, pte_t *page_table, pt_path_t pt_path,
 		int write_access, pte_t orig_pte)
 {
-	spinlock_t *ptl;
 	struct page *page;
 	swp_entry_t entry;
 	pte_t pte;
 	int ret = VM_FAULT_MINOR;
 
-	if (!pte_unmap_same(mm, pmd, page_table, orig_pte))
+	if (!pte_unmap_same(mm, pt_path, page_table, orig_pte))
 		goto out;
 
 	entry = pte_to_swp_entry(orig_pte);
 	if (is_migration_entry(entry)) {
-		migration_entry_wait(mm, pmd, address);
+		migration_entry_wait(mm, pt_path, address);
 		goto out;
 	}
 	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
@@ -1941,7 +1938,7 @@
 			 * Back out if somebody else faulted in this pte
 			 * while we released the pte lock.
 			 */
-			page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
+			page_table = lookup_page_table_lock(mm, pt_path, address);
 			if (likely(pte_same(*page_table, orig_pte)))
 				ret = VM_FAULT_OOM;
 			delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
@@ -1960,7 +1957,8 @@
 	/*
 	 * Back out if somebody else already faulted in this pte.
 	 */
-	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
+	page_table = lookup_page_table_lock(mm, pt_path, address);
+
 	if (unlikely(!pte_same(*page_table, orig_pte)))
 		goto out_nomap;
 
@@ -1989,7 +1987,7 @@
 
 	if (write_access) {
 		if (do_wp_page(mm, vma, address,
-				page_table, pmd, ptl, pte) == VM_FAULT_OOM)
+				page_table, pt_path, pte) == VM_FAULT_OOM)
 			ret = VM_FAULT_OOM;
 		goto out;
 	}
@@ -1998,11 +1996,13 @@
 	update_mmu_cache(vma, address, pte);
 	lazy_mmu_prot_update(pte);
 unlock:
-	pte_unmap_unlock(page_table, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(page_table);
 out:
 	return ret;
 out_nomap:
-	pte_unmap_unlock(page_table, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(page_table);
 	unlock_page(page);
 	page_cache_release(page);
 	return ret;
@@ -2276,13 +2276,13 @@
  * We return with mmap_sem still held, but pte unmapped and unlocked.
  */
 static int do_file_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pte_t *page_table, pmd_t *pmd,
+		unsigned long address, pte_t *page_table, pt_path_t pt_path,
 		int write_access, pte_t orig_pte)
 {
 	pgoff_t pgoff;
 	int err;
 
-	if (!pte_unmap_same(mm, pmd, page_table, orig_pte))
+	if (!pte_unmap_same(mm, pt_path, page_table, orig_pte))
 		return VM_FAULT_MINOR;
 
 	if (unlikely(!(vma->vm_flags & VM_NONLINEAR))) {
@@ -2323,9 +2323,6 @@
 {
 	pte_t entry;
 	pte_t old_entry;
-	spinlock_t *ptl;
-
-	pmd_t *pmd = pt_path.pmd;
 
 	old_entry = entry = *pte;
 	if (!pte_present(entry)) {
@@ -2344,19 +2341,19 @@
 		}
 		if (pte_file(entry))
 			return do_file_page(mm, vma, address,
-					pte, pmd, write_access, entry);
+					pte, pt_path, write_access, entry);
 		return do_swap_page(mm, vma, address,
-					pte, pmd, write_access, entry);
+					pte, pt_path, write_access, entry);
 	}
 
-	ptl = pte_lockptr(mm, pmd);
-	spin_lock(ptl);
+	lock_pte(mm, pt_path);
+
 	if (unlikely(!pte_same(*pte, entry)))
 		goto unlock;
 	if (write_access) {
 		if (!pte_write(entry))
 			return do_wp_page(mm, vma, address,
-					pte, pmd, ptl, entry);
+					pte, pt_path, entry);
 		entry = pte_mkdirty(entry);
 	}
 	entry = pte_mkyoung(entry);
@@ -2375,7 +2372,8 @@
 			flush_tlb_page(vma, address);
 	}
 unlock:
-	pte_unmap_unlock(pte, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(pte);
 	return VM_FAULT_MINOR;
 }
 
Index: linux-2.6.20-rc3/include/linux/swapops.h
===================================================================
--- linux-2.6.20-rc3.orig/include/linux/swapops.h	2007-01-01 11:53:20.000000000 +1100
+++ linux-2.6.20-rc3/include/linux/swapops.h	2007-01-04 20:49:46.344024000 +1100
@@ -68,6 +68,8 @@
 	return __swp_entry_to_pte(arch_entry);
 }
 
+#include <linux/pt.h>
+
 #ifdef CONFIG_MIGRATION
 static inline swp_entry_t make_migration_entry(struct page *page, int write)
 {
@@ -103,7 +105,7 @@
 	*entry = swp_entry(SWP_MIGRATION_READ, swp_offset(*entry));
 }
 
-extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
+extern void migration_entry_wait(struct mm_struct *mm, pt_path_t pt_path,
 					unsigned long address);
 #else
 
@@ -111,7 +113,7 @@
 #define is_migration_entry(swp) 0
 #define migration_entry_to_page(swp) NULL
 static inline void make_migration_entry_read(swp_entry_t *entryp) { }
-static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
+static inline void migration_entry_wait(struct mm_struct *mm, pt_path_t pt_path,
 					 unsigned long address) { }
 static inline int is_write_migration_entry(swp_entry_t entry)
 {
Index: linux-2.6.20-rc3/mm/migrate.c
===================================================================
--- linux-2.6.20-rc3.orig/mm/migrate.c	2007-01-01 11:53:20.000000000 +1100
+++ linux-2.6.20-rc3/mm/migrate.c	2007-01-04 20:49:46.344024000 +1100
@@ -255,15 +255,14 @@
  *
  * This function is called from do_swap_page().
  */
-void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
+void migration_entry_wait(struct mm_struct *mm, pt_path_t pt_path,
 				unsigned long address)
 {
 	pte_t *ptep, pte;
-	spinlock_t *ptl;
 	swp_entry_t entry;
 	struct page *page;
 
-	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
+	ptep = lookup_page_table_lock(mm, pt_path, address);
 	pte = *ptep;
 	if (!is_swap_pte(pte))
 		goto out;
@@ -275,14 +274,15 @@
 	page = migration_entry_to_page(entry);
 
 	get_page(page);
-	pte_unmap_unlock(ptep, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(ptep);
 	wait_on_page_locked(page);
 	put_page(page);
 	return;
 out:
-	pte_unmap_unlock(ptep, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(ptep);
 }
-
 /*
  * Replace the page in the mapping.
  *

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 10/29] Call simple PTI functions
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (8 preceding siblings ...)
  2007-01-13  2:46 ` [PATCH 9/29] Clean up page fault handlers Paul Davies
@ 2007-01-13  2:46 ` Paul Davies
  2007-01-13  2:46 ` [PATCH 11/29] Call simple PTI functions cont Paul Davies
                   ` (39 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:46 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 10 summary:
 * Remove implementation-dependent page table lookup code from memory.c,
 rmap.c and filemap_xip.c and replace it with the interface-defined
 lookup_page_table.
   * Leaves hugetlb undisturbed for the default page table implementation.
 * Adjust the page_check_address prototype in rmap.h to align with the
 page table interface.
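
lookup_page_table() is the non-allocating walk.  For the default format
it would presumably be the familiar descent, returning NULL on any hole
and recording the path for a later lock_pte(); a sketch, assuming the
pt_path layout used above (note that vmalloc_to_page() below passes a
NULL pt_path, hence the guard):

pte_t *lookup_page_table(struct mm_struct *mm, unsigned long address,
		pt_path_t *pt_path)
{
	pgd_t *pgd = pgd_offset(mm, address);
	pud_t *pud;
	pmd_t *pmd;

	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
		return NULL;
	pud = pud_offset(pgd, address);
	if (pud_none(*pud) || unlikely(pud_bad(*pud)))
		return NULL;
	pmd = pmd_offset(pud, address);
	if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
		return NULL;
	if (pt_path)
		pt_path->pmd = pmd;
	return pte_offset_map(pmd, address);
}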

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 include/linux/rmap.h |    3 +-
 mm/filemap_xip.c     |    7 +++--
 mm/memory.c          |   61 +++++++++++++++------------------------------------
 mm/rmap.c            |   52 ++++++++++++++++++-------------------------
 4 files changed, 47 insertions(+), 76 deletions(-)
Index: linux-2.6.20-rc3/mm/memory.c
===================================================================
--- linux-2.6.20-rc3.orig/mm/memory.c	2007-01-09 12:17:31.550257000 +1100
+++ linux-2.6.20-rc3/mm/memory.c	2007-01-09 12:18:09.418257000 +1100
@@ -844,11 +844,8 @@
 struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 			unsigned int flags)
 {
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
 	pte_t *ptep, pte;
-	spinlock_t *ptl;
+	pt_path_t pt_path;
 	struct page *page;
 	struct mm_struct *mm = vma->vm_mm;
 
@@ -859,25 +856,14 @@
 	}
 
 	page = NULL;
-	pgd = pgd_offset(mm, address);
-	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
-		goto no_page_table;
-
-	pud = pud_offset(pgd, address);
-	if (pud_none(*pud) || unlikely(pud_bad(*pud)))
-		goto no_page_table;
-	
-	pmd = pmd_offset(pud, address);
-	if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
+	ptep = lookup_page_table(mm, address, &pt_path);
+	if (!ptep)
 		goto no_page_table;
 
-	if (pmd_huge(*pmd)) {
-		BUG_ON(flags & FOLL_GET);
-		page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
+	if (is_huge_page(mm, address, pt_path, flags, page))
 		goto out;
-	}
 
-	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
+	lock_pte(mm, pt_path);
 	if (!ptep)
 		goto out;
 
@@ -899,7 +885,8 @@
 		mark_page_accessed(page);
 	}
 unlock:
-	pte_unmap_unlock(ptep, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(ptep);
 out:
 	return page;
 
@@ -2426,29 +2413,19 @@
  */
 struct page * vmalloc_to_page(void * vmalloc_addr)
 {
-	unsigned long addr = (unsigned long) vmalloc_addr;
-	struct page *page = NULL;
-	pgd_t *pgd = pgd_offset_k(addr);
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *ptep, pte;
-  
-	if (!pgd_none(*pgd)) {
-		pud = pud_offset(pgd, addr);
-		if (!pud_none(*pud)) {
-			pmd = pmd_offset(pud, addr);
-			if (!pmd_none(*pmd)) {
-				ptep = pte_offset_map(pmd, addr);
-				pte = *ptep;
-				if (pte_present(pte))
-					page = pte_page(pte);
-				pte_unmap(ptep);
-			}
-		}
-	}
-	return page;
+	unsigned long addr = (unsigned long) vmalloc_addr;
+	struct page *page = NULL;
+	pte_t *ptep, pte;
+
+	ptep = lookup_page_table(&init_mm, addr, NULL);
+	if (ptep) {
+		pte = *ptep;
+		if (pte_present(pte))
+			page = pte_page(pte);
+		pte_unmap(ptep);
+	}
+	return page;
 }
-
 EXPORT_SYMBOL(vmalloc_to_page);
 
 /*
Index: linux-2.6.20-rc3/mm/rmap.c
===================================================================
--- linux-2.6.20-rc3.orig/mm/rmap.c	2007-01-09 12:15:41.563902000 +1100
+++ linux-2.6.20-rc3/mm/rmap.c	2007-01-09 12:17:44.614257000 +1100
@@ -48,6 +48,7 @@
 #include <linux/rcupdate.h>
 #include <linux/module.h>
 #include <linux/kallsyms.h>
+#include <linux/pt.h>
 
 #include <asm/tlbflush.h>
 
@@ -243,43 +244,30 @@
  * On success returns with pte mapped and locked.
  */
 pte_t *page_check_address(struct page *page, struct mm_struct *mm,
-			  unsigned long address, spinlock_t **ptlp)
+			  unsigned long address, pt_path_t *ppt_path)
 {
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
 	pte_t *pte;
-	spinlock_t *ptl;
-
-	pgd = pgd_offset(mm, address);
-	if (!pgd_present(*pgd))
-		return NULL;
+	pt_path_t pt_path;
 
-	pud = pud_offset(pgd, address);
-	if (!pud_present(*pud))
+	pte = lookup_page_table(mm, address, &pt_path);
+	if (!pte)
 		return NULL;
 
-	pmd = pmd_offset(pud, address);
-	if (!pmd_present(*pmd))
-		return NULL;
-
-	pte = pte_offset_map(pmd, address);
 	/* Make a quick check before getting the lock */
 	if (!pte_present(*pte)) {
 		pte_unmap(pte);
 		return NULL;
 	}
 
-	ptl = pte_lockptr(mm, pmd);
-	spin_lock(ptl);
+	lock_pte(mm, pt_path);
 	if (pte_present(*pte) && page_to_pfn(page) == pte_pfn(*pte)) {
-		*ptlp = ptl;
+		set_pt_path(pt_path, ppt_path);
 		return pte;
 	}
-	pte_unmap_unlock(pte, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(pte);
 	return NULL;
 }
-
 /*
  * Subfunctions of page_referenced: page_referenced_one called
  * repeatedly from either page_referenced_anon or page_referenced_file.
@@ -290,14 +278,14 @@
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
 	pte_t *pte;
-	spinlock_t *ptl;
 	int referenced = 0;
+	pt_path_t pt_path;
 
 	address = vma_address(page, vma);
 	if (address == -EFAULT)
 		goto out;
 
-	pte = page_check_address(page, mm, address, &ptl);
+	pte = page_check_address(page, mm, address, &pt_path);
 	if (!pte)
 		goto out;
 
@@ -311,7 +299,8 @@
 		referenced++;
 
 	(*mapcount)--;
-	pte_unmap_unlock(pte, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(pte);
 out:
 	return referenced;
 }
@@ -434,14 +423,14 @@
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
 	pte_t *pte;
-	spinlock_t *ptl;
+	pt_path_t pt_path;
 	int ret = 0;
 
 	address = vma_address(page, vma);
 	if (address == -EFAULT)
 		goto out;
 
-	pte = page_check_address(page, mm, address, &ptl);
+	pte = page_check_address(page, mm, address, &pt_path);
 	if (!pte)
 		goto out;
 
@@ -457,7 +446,9 @@
 		ret = 1;
 	}
 
-	pte_unmap_unlock(pte, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(pte);
+
 out:
 	return ret;
 }
@@ -615,14 +606,14 @@
 	unsigned long address;
 	pte_t *pte;
 	pte_t pteval;
-	spinlock_t *ptl;
+	pt_path_t pt_path;
 	int ret = SWAP_AGAIN;
 
 	address = vma_address(page, vma);
 	if (address == -EFAULT)
 		goto out;
 
-	pte = page_check_address(page, mm, address, &ptl);
+	pte = page_check_address(page, mm, address, &pt_path);
 	if (!pte)
 		goto out;
 
@@ -693,7 +684,8 @@
 	page_cache_release(page);
 
 out_unmap:
-	pte_unmap_unlock(pte, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(pte);
 out:
 	return ret;
 }
Index: linux-2.6.20-rc3/include/linux/rmap.h
===================================================================
--- linux-2.6.20-rc3.orig/include/linux/rmap.h	2007-01-09 12:15:41.563902000 +1100
+++ linux-2.6.20-rc3/include/linux/rmap.h	2007-01-09 12:17:44.618257000 +1100
@@ -8,6 +8,7 @@
 #include <linux/slab.h>
 #include <linux/mm.h>
 #include <linux/spinlock.h>
+#include <linux/pt.h>
 
 /*
  * The anon_vma heads a list of private "related" vmas, to scan if
@@ -96,7 +97,7 @@
  * Called from mm/filemap_xip.c to unmap empty zero page
  */
 pte_t *page_check_address(struct page *, struct mm_struct *,
-				unsigned long, spinlock_t **);
+				unsigned long, pt_path_t *);
 
 /*
  * Used by swapoff to help locate where page is expected in vma.
Index: linux-2.6.20-rc3/mm/filemap_xip.c
===================================================================
--- linux-2.6.20-rc3.orig/mm/filemap_xip.c	2007-01-09 12:15:41.563902000 +1100
+++ linux-2.6.20-rc3/mm/filemap_xip.c	2007-01-09 12:17:44.618257000 +1100
@@ -174,7 +174,7 @@
 	unsigned long address;
 	pte_t *pte;
 	pte_t pteval;
-	spinlock_t *ptl;
+	pt_path_t pt_path;
 	struct page *page;
 
 	spin_lock(&mapping->i_mmap_lock);
@@ -184,7 +184,7 @@
 			((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 		BUG_ON(address < vma->vm_start || address >= vma->vm_end);
 		page = ZERO_PAGE(address);
-		pte = page_check_address(page, mm, address, &ptl);
+		pte = page_check_address(page, mm, address, &pt_path);
 		if (pte) {
 			/* Nuke the page table entry. */
 			flush_cache_page(vma, address, pte_pfn(*pte));
@@ -192,7 +192,8 @@
 			page_remove_rmap(page, vma);
 			dec_mm_counter(mm, file_rss);
 			BUG_ON(pte_dirty(pteval));
-			pte_unmap_unlock(pte, ptl);
+			unlock_pte(mm, pt_path);
+			pte_unmap(pte);
 			page_cache_release(page);
 		}
 	}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 11/29] Call simple PTI functions cont...
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (9 preceding siblings ...)
  2007-01-13  2:46 ` [PATCH 10/29] Call simple PTI functions Paul Davies
@ 2007-01-13  2:46 ` Paul Davies
  2007-01-13  2:46 ` [PATCH 12/29] Abstract page table tear down Paul Davies
                   ` (38 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:46 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 11
 * Abstract the implementation-dependent page table lookup from
 get_user_pages and call it through the page table interface
 (lookup_gate_area; a sketch follows below).
 * Abstract implementation-dependent page table lookups from rmap.c
 and migrate.c.  Move CLUSTER_SIZE to pt-default.h, since it is
 implementation dependent.
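
A sketch of what the default lookup_gate_area() would presumably look
like, lifted from the code deleted from get_user_pages() below.  One
assumption to flag: the old code returned -EFAULT to the caller on
pmd_none(), but the new caller no longer checks for NULL, so the case
is asserted away here:

static inline pte_t *lookup_gate_area(struct mm_struct *mm, unsigned long pg)
{
	pgd_t *pgd;
	pud_t *pud;
	pmd_t *pmd;

	if (pg > TASK_SIZE)
		pgd = pgd_offset_k(pg);
	else
		pgd = pgd_offset_gate(mm, pg);
	BUG_ON(pgd_none(*pgd));
	pud = pud_offset(pgd, pg);
	BUG_ON(pud_none(*pud));
	pmd = pmd_offset(pud, pg);
	BUG_ON(pmd_none(*pmd));	/* assumption: was "return -EFAULT" */
	return pte_offset_map(pmd, pg);
}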

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 memory.c  |   15 +--------------
 migrate.c |   27 +++++++--------------------
 rmap.c    |   25 ++++++-------------------
 3 files changed, 14 insertions(+), 53 deletions(-)
Index: linux-2.6.20-rc3/mm/rmap.c
===================================================================
--- linux-2.6.20-rc3.orig/mm/rmap.c	2007-01-06 16:35:41.714144000 +1100
+++ linux-2.6.20-rc3/mm/rmap.c	2007-01-06 16:35:52.362144000 +1100
@@ -709,19 +709,16 @@
  * there there won't be many ptes located within the scan cluster.  In this case
  * maybe we could scan further - to the end of the pte page, perhaps.
  */
-#define CLUSTER_SIZE	min(32*PAGE_SIZE, PMD_SIZE)
+
 #define CLUSTER_MASK	(~(CLUSTER_SIZE - 1))
 
 static void try_to_unmap_cluster(unsigned long cursor,
 	unsigned int *mapcount, struct vm_area_struct *vma)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
 	pte_t *pte;
 	pte_t pteval;
-	spinlock_t *ptl;
+	pt_path_t pt_path;
 	struct page *page;
 	unsigned long address;
 	unsigned long end;
@@ -733,19 +730,10 @@
 	if (end > vma->vm_end)
 		end = vma->vm_end;
 
-	pgd = pgd_offset(mm, address);
-	if (!pgd_present(*pgd))
-		return;
-
-	pud = pud_offset(pgd, address);
-	if (!pud_present(*pud))
-		return;
-
-	pmd = pmd_offset(pud, address);
-	if (!pmd_present(*pmd))
-		return;
-
-	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
+	pte = lookup_page_table(mm, address, &pt_path);
+	if (!pte)
+		return;
+	lock_pte(mm, pt_path);
 
 	/* Update high watermark before we lower rss */
 	update_hiwater_rss(mm);
@@ -776,7 +766,8 @@
 		dec_mm_counter(mm, file_rss);
 		(*mapcount)--;
 	}
-	pte_unmap_unlock(pte - 1, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(pte - 1);
 }
 
 static int try_to_unmap_anon(struct page *page, int migration)
Index: linux-2.6.20-rc3/mm/migrate.c
===================================================================
--- linux-2.6.20-rc3.orig/mm/migrate.c	2007-01-06 13:01:46.930183000 +1100
+++ linux-2.6.20-rc3/mm/migrate.c	2007-01-06 16:35:52.366144000 +1100
@@ -28,6 +28,7 @@
 #include <linux/mempolicy.h>
 #include <linux/vmalloc.h>
 #include <linux/security.h>
+#include <linux/pt.h>
 
 #include "internal.h"
 
@@ -128,37 +129,24 @@
 {
 	struct mm_struct *mm = vma->vm_mm;
 	swp_entry_t entry;
- 	pgd_t *pgd;
- 	pud_t *pud;
- 	pmd_t *pmd;
+	pt_path_t pt_path;
 	pte_t *ptep, pte;
- 	spinlock_t *ptl;
 	unsigned long addr = page_address_in_vma(new, vma);
 
 	if (addr == -EFAULT)
 		return;
 
- 	pgd = pgd_offset(mm, addr);
-	if (!pgd_present(*pgd))
-                return;
-
-	pud = pud_offset(pgd, addr);
-	if (!pud_present(*pud))
-                return;
-
-	pmd = pmd_offset(pud, addr);
-	if (!pmd_present(*pmd))
-		return;
-
-	ptep = pte_offset_map(pmd, addr);
+	ptep = lookup_page_table(mm, addr, &pt_path);
+	if (!ptep)
+		return;
 
 	if (!is_swap_pte(*ptep)) {
 		pte_unmap(ptep);
  		return;
  	}
 
- 	ptl = pte_lockptr(mm, pmd);
- 	spin_lock(ptl);
+	lock_pte(mm, pt_path);
+
 	pte = *ptep;
 	if (!is_swap_pte(pte))
 		goto out;
@@ -184,7 +172,8 @@
 	lazy_mmu_prot_update(pte);
 
 out:
-	pte_unmap_unlock(ptep, ptl);
+	unlock_pte(mm, pt_path);
+	pte_unmap(ptep);
 }
 
 /*
Index: linux-2.6.20-rc3/mm/memory.c
===================================================================
--- linux-2.6.20-rc3.orig/mm/memory.c	2007-01-06 16:34:55.534144000 +1100
+++ linux-2.6.20-rc3/mm/memory.c	2007-01-06 16:35:52.366144000 +1100
@@ -927,23 +927,10 @@
 		if (!vma && in_gate_area(tsk, start)) {
 			unsigned long pg = start & PAGE_MASK;
 			struct vm_area_struct *gate_vma = get_gate_vma(tsk);
-			pgd_t *pgd;
-			pud_t *pud;
-			pmd_t *pmd;
 			pte_t *pte;
 			if (write) /* user gate pages are read-only */
 				return i ? : -EFAULT;
-			if (pg > TASK_SIZE)
-				pgd = pgd_offset_k(pg);
-			else
-				pgd = pgd_offset_gate(mm, pg);
-			BUG_ON(pgd_none(*pgd));
-			pud = pud_offset(pgd, pg);
-			BUG_ON(pud_none(*pud));
-			pmd = pmd_offset(pud, pg);
-			if (pmd_none(*pmd))
-				return i ? : -EFAULT;
-			pte = pte_offset_map(pmd, pg);
+			pte = lookup_gate_area(mm, pg);
 			if (pte_none(*pte)) {
 				pte_unmap(pte);
 				return i ? : -EFAULT;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 12/29] Abstract page table tear down
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (10 preceding siblings ...)
  2007-01-13  2:46 ` [PATCH 11/29] Call simple PTI functions cont Paul Davies
@ 2007-01-13  2:46 ` Paul Davies
  2007-01-13  2:46 ` [PATCH 13/29] Finish abstracting " Paul Davies
                   ` (37 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:46 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 12
 * Moves the default page table tear-down iterator from memory.c to
 pt-default.c.

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 memory.c     |  148 ----------------------------------------------------------
 pt-default.c |  149 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 149 insertions(+), 148 deletions(-)
Index: linux-2.6.20-rc1/mm/pt-default.c
===================================================================
--- linux-2.6.20-rc1.orig/mm/pt-default.c	2006-12-21 16:50:32.218023000 +1100
+++ linux-2.6.20-rc1/mm/pt-default.c	2006-12-21 17:34:52.106463000 +1100
@@ -139,3 +139,152 @@
 	return 0;
 }
 #endif /* __PAGETABLE_PMD_FOLDED */
+
+
+/*
+ * Note: this doesn't free the actual pages themselves. That
+ * has been handled earlier when unmapping all the memory regions.
+ */
+static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd)
+{
+	struct page *page = pmd_page(*pmd);
+	pmd_clear(pmd);
+	pte_lock_deinit(page);
+	pte_free_tlb(tlb, page);
+	dec_zone_page_state(page, NR_PAGETABLE);
+	tlb->mm->nr_ptes--;
+}
+
+static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
+				unsigned long addr, unsigned long end,
+				unsigned long floor, unsigned long ceiling)
+{
+	pmd_t *pmd;
+	unsigned long next;
+	unsigned long start;
+
+	start = addr;
+	pmd = pmd_offset(pud, addr);
+	do {
+		next = pmd_addr_end(addr, end);
+		if (pmd_none_or_clear_bad(pmd))
+			continue;
+		free_pte_range(tlb, pmd);
+	} while (pmd++, addr = next, addr != end);
+
+	start &= PUD_MASK;
+	if (start < floor)
+		return;
+	if (ceiling) {
+		ceiling &= PUD_MASK;
+		if (!ceiling)
+			return;
+	}
+	if (end - 1 > ceiling - 1)
+		return;
+
+	pmd = pmd_offset(pud, start);
+	pud_clear(pud);
+	pmd_free_tlb(tlb, pmd);
+}
+
+static inline void free_pud_range(struct mmu_gather *tlb, pgd_t *pgd,
+				unsigned long addr, unsigned long end,
+				unsigned long floor, unsigned long ceiling)
+{
+	pud_t *pud;
+	unsigned long next;
+	unsigned long start;
+
+	start = addr;
+	pud = pud_offset(pgd, addr);
+	do {
+		next = pud_addr_end(addr, end);
+		if (pud_none_or_clear_bad(pud))
+			continue;
+		free_pmd_range(tlb, pud, addr, next, floor, ceiling);
+	} while (pud++, addr = next, addr != end);
+
+	start &= PGDIR_MASK;
+	if (start < floor)
+		return;
+	if (ceiling) {
+		ceiling &= PGDIR_MASK;
+		if (!ceiling)
+			return;
+	}
+	if (end - 1 > ceiling - 1)
+		return;
+
+	pud = pud_offset(pgd, start);
+	pgd_clear(pgd);
+	pud_free_tlb(tlb, pud);
+}
+
+/*
+ * This function frees user-level page tables of a process.
+ *
+ * Must be called with pagetable lock held.
+ */
+void free_pgd_range(struct mmu_gather **tlb,
+			unsigned long addr, unsigned long end,
+			unsigned long floor, unsigned long ceiling)
+{
+	pgd_t *pgd;
+	unsigned long next;
+	unsigned long start;
+
+	/*
+	 * The next few lines have given us lots of grief...
+	 *
+	 * Why are we testing PMD* at this top level?  Because often
+	 * there will be no work to do at all, and we'd prefer not to
+	 * go all the way down to the bottom just to discover that.
+	 *
+	 * Why all these "- 1"s?  Because 0 represents both the bottom
+	 * of the address space and the top of it (using -1 for the
+	 * top wouldn't help much: the masks would do the wrong thing).
+	 * The rule is that addr 0 and floor 0 refer to the bottom of
+	 * the address space, but end 0 and ceiling 0 refer to the top
+	 * Comparisons need to use "end - 1" and "ceiling - 1" (though
+	 * that end 0 case should be mythical).
+	 *
+	 * Wherever addr is brought up or ceiling brought down, we must
+	 * be careful to reject "the opposite 0" before it confuses the
+	 * subsequent tests.  But what about where end is brought down
+	 * by PMD_SIZE below? no, end can't go down to 0 there.
+	 *
+	 * Whereas we round start (addr) and ceiling down, by different
+	 * masks at different levels, in order to test whether a table
+	 * now has no other vmas using it, so can be freed, we don't
+	 * bother to round floor or end up - the tests don't need that.
+	 */
+
+	addr &= PMD_MASK;
+	if (addr < floor) {
+		addr += PMD_SIZE;
+		if (!addr)
+			return;
+	}
+	if (ceiling) {
+		ceiling &= PMD_MASK;
+		if (!ceiling)
+			return;
+	}
+	if (end - 1 > ceiling - 1)
+		end -= PMD_SIZE;
+	if (addr > end - 1)
+		return;
+
+	start = addr;
+	pgd = pgd_offset((*tlb)->mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none_or_clear_bad(pgd))
+			continue;
+		free_pud_range(*tlb, pgd, addr, next, floor, ceiling);
+	} while (pgd++, addr = next, addr != end);
+
+	if (!(*tlb)->fullmm)
+		flush_tlb_pgtables((*tlb)->mm, start, end);
+}
Index: linux-2.6.20-rc1/mm/memory.c
===================================================================
--- linux-2.6.20-rc1.orig/mm/memory.c	2006-12-21 17:23:31.325774000 +1100
+++ linux-2.6.20-rc1/mm/memory.c	2006-12-21 17:34:48.882463000 +1100
@@ -94,154 +94,6 @@
 }
 __setup("norandmaps", disable_randmaps);
 
-/*
- * Note: this doesn't free the actual pages themselves. That
- * has been handled earlier when unmapping all the memory regions.
- */
-static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd)
-{
-	struct page *page = pmd_page(*pmd);
-	pmd_clear(pmd);
-	pte_lock_deinit(page);
-	pte_free_tlb(tlb, page);
-	dec_zone_page_state(page, NR_PAGETABLE);
-	tlb->mm->nr_ptes--;
-}
-
-static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
-				unsigned long addr, unsigned long end,
-				unsigned long floor, unsigned long ceiling)
-{
-	pmd_t *pmd;
-	unsigned long next;
-	unsigned long start;
-
-	start = addr;
-	pmd = pmd_offset(pud, addr);
-	do {
-		next = pmd_addr_end(addr, end);
-		if (pmd_none_or_clear_bad(pmd))
-			continue;
-		free_pte_range(tlb, pmd);
-	} while (pmd++, addr = next, addr != end);
-
-	start &= PUD_MASK;
-	if (start < floor)
-		return;
-	if (ceiling) {
-		ceiling &= PUD_MASK;
-		if (!ceiling)
-			return;
-	}
-	if (end - 1 > ceiling - 1)
-		return;
-
-	pmd = pmd_offset(pud, start);
-	pud_clear(pud);
-	pmd_free_tlb(tlb, pmd);
-}
-
-static inline void free_pud_range(struct mmu_gather *tlb, pgd_t *pgd,
-				unsigned long addr, unsigned long end,
-				unsigned long floor, unsigned long ceiling)
-{
-	pud_t *pud;
-	unsigned long next;
-	unsigned long start;
-
-	start = addr;
-	pud = pud_offset(pgd, addr);
-	do {
-		next = pud_addr_end(addr, end);
-		if (pud_none_or_clear_bad(pud))
-			continue;
-		free_pmd_range(tlb, pud, addr, next, floor, ceiling);
-	} while (pud++, addr = next, addr != end);
-
-	start &= PGDIR_MASK;
-	if (start < floor)
-		return;
-	if (ceiling) {
-		ceiling &= PGDIR_MASK;
-		if (!ceiling)
-			return;
-	}
-	if (end - 1 > ceiling - 1)
-		return;
-
-	pud = pud_offset(pgd, start);
-	pgd_clear(pgd);
-	pud_free_tlb(tlb, pud);
-}
-
-/*
- * This function frees user-level page tables of a process.
- *
- * Must be called with pagetable lock held.
- */
-void free_pgd_range(struct mmu_gather **tlb,
-			unsigned long addr, unsigned long end,
-			unsigned long floor, unsigned long ceiling)
-{
-	pgd_t *pgd;
-	unsigned long next;
-	unsigned long start;
-
-	/*
-	 * The next few lines have given us lots of grief...
-	 *
-	 * Why are we testing PMD* at this top level?  Because often
-	 * there will be no work to do at all, and we'd prefer not to
-	 * go all the way down to the bottom just to discover that.
-	 *
-	 * Why all these "- 1"s?  Because 0 represents both the bottom
-	 * of the address space and the top of it (using -1 for the
-	 * top wouldn't help much: the masks would do the wrong thing).
-	 * The rule is that addr 0 and floor 0 refer to the bottom of
-	 * the address space, but end 0 and ceiling 0 refer to the top
-	 * Comparisons need to use "end - 1" and "ceiling - 1" (though
-	 * that end 0 case should be mythical).
-	 *
-	 * Wherever addr is brought up or ceiling brought down, we must
-	 * be careful to reject "the opposite 0" before it confuses the
-	 * subsequent tests.  But what about where end is brought down
-	 * by PMD_SIZE below? no, end can't go down to 0 there.
-	 *
-	 * Whereas we round start (addr) and ceiling down, by different
-	 * masks at different levels, in order to test whether a table
-	 * now has no other vmas using it, so can be freed, we don't
-	 * bother to round floor or end up - the tests don't need that.
-	 */
-
-	addr &= PMD_MASK;
-	if (addr < floor) {
-		addr += PMD_SIZE;
-		if (!addr)
-			return;
-	}
-	if (ceiling) {
-		ceiling &= PMD_MASK;
-		if (!ceiling)
-			return;
-	}
-	if (end - 1 > ceiling - 1)
-		end -= PMD_SIZE;
-	if (addr > end - 1)
-		return;
-
-	start = addr;
-	pgd = pgd_offset((*tlb)->mm, addr);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_none_or_clear_bad(pgd))
-			continue;
-		free_pud_range(*tlb, pgd, addr, next, floor, ceiling);
-	} while (pgd++, addr = next, addr != end);
-
-	if (!(*tlb)->fullmm)
-		flush_tlb_pgtables((*tlb)->mm, start, end);
-}
-
 void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma,
 		unsigned long floor, unsigned long ceiling)
 {

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 13/29] Finish abstracting tear down
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (11 preceding siblings ...)
  2007-01-13  2:46 ` [PATCH 12/29] Abstract page table tear down Paul Davies
@ 2007-01-13  2:46 ` Paul Davies
  2007-01-13  2:46 ` [PATCH 14/29] Abstract copy page range iterator Paul Davies
                   ` (36 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:46 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 13
 * Adjust hugetlb.h to refer to free_pt_range.
 * Put the vma-gathering optimization code in free_pgtables into a
 macro in pt-default.h, since it is implementation dependent (see the
 sketch below).  Call the interface-defined free_pt_range from
 free_pgtables.
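
A sketch of the macro assumed to land in pt-default.h: it is the
vma-gathering loop lifted verbatim from free_pgtables().  It has to be
a macro because it mutates the caller's vma/next locals, and it is
implementation dependent because PMD_SIZE means nothing to a page
table format without pmds:

#define vma_optimization					\
	while (next && next->vm_start <= vma->vm_end + PMD_SIZE \
	       && !is_vm_hugetlb_page(next)) {			\
		vma = next;					\
		next = vma->vm_next;				\
		anon_vma_unlink(vma);				\
		unlink_file_vma(vma);				\
	}

The bare "vma_optimization;" statement in the hunk below reads oddly
for a macro; a vma_optimization(vma, next) taking its locals
explicitly would be the more conventional shape.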

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 include/linux/hugetlb.h |    2 +-
 mm/memory.c             |   12 +++---------
 mm/pt-default.c         |    2 +-
 3 files changed, 5 insertions(+), 11 deletions(-)
Index: linux-2.6.20-rc4/mm/memory.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/memory.c	2007-01-11 13:11:54.675868000 +1100
+++ linux-2.6.20-rc4/mm/memory.c	2007-01-11 13:12:14.767868000 +1100
@@ -112,16 +112,10 @@
 				floor, next? next->vm_start: ceiling);
 		} else {
 			/*
-			 * Optimization: gather nearby vmas into one call down
+			 * Optimization: gather nearby vmas into one call down for default page table
 			 */
-			while (next && next->vm_start <= vma->vm_end + PMD_SIZE
-			       && !is_vm_hugetlb_page(next)) {
-				vma = next;
-				next = vma->vm_next;
-				anon_vma_unlink(vma);
-				unlink_file_vma(vma);
-			}
-			free_pgd_range(tlb, addr, vma->vm_end,
+			vma_optimization;
+			free_pt_range(tlb, addr, vma->vm_end,
 				floor, next? next->vm_start: ceiling);
 		}
 		vma = next;
Index: linux-2.6.20-rc4/mm/pt-default.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/pt-default.c	2007-01-11 13:11:54.671868000 +1100
+++ linux-2.6.20-rc4/mm/pt-default.c	2007-01-11 13:12:14.771868000 +1100
@@ -226,7 +226,7 @@
  *
  * Must be called with pagetable lock held.
  */
-void free_pgd_range(struct mmu_gather **tlb,
+void free_pt_range(struct mmu_gather **tlb,
 			unsigned long addr, unsigned long end,
 			unsigned long floor, unsigned long ceiling)
 {
Index: linux-2.6.20-rc4/include/linux/hugetlb.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/hugetlb.h	2007-01-11 13:00:53.680752000 +1100
+++ linux-2.6.20-rc4/include/linux/hugetlb.h	2007-01-11 13:12:14.775868000 +1100
@@ -49,7 +49,7 @@
 #endif
 
 #ifndef ARCH_HAS_HUGETLB_FREE_PGD_RANGE
-#define hugetlb_free_pgd_range	free_pgd_range
+#define hugetlb_free_pgd_range	free_pt_range
 #else
 void hugetlb_free_pgd_range(struct mmu_gather **tlb, unsigned long addr,
 			    unsigned long end, unsigned long floor,

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 14/29] Abstract copy page range iterator
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (12 preceding siblings ...)
  2007-01-13  2:46 ` [PATCH 13/29] Finish abstracting " Paul Davies
@ 2007-01-13  2:46 ` Paul Davies
  2007-01-13  2:46 ` [PATCH 15/29] Finish abstracting copy page range Paul Davies
                   ` (35 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:46 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 14
 * Move the implementation of the copy iterator from memory.c to
 pt-default.c.
 * Adjust copy_page_range to call the interface-defined copy iterator,
 copy_dual_iterator.

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 memory.c     |  108 ------------------------------------------------------
 pt-default.c |  116 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 117 insertions(+), 107 deletions(-)
Index: linux-2.6.20-rc3/mm/pt-default.c
===================================================================
--- linux-2.6.20-rc3.orig/mm/pt-default.c	2007-01-09 16:02:57.748363000 +1100
+++ linux-2.6.20-rc3/mm/pt-default.c	2007-01-09 16:03:58.968363000 +1100
@@ -288,3 +288,119 @@
 	if (!(*tlb)->fullmm)
 		flush_tlb_pgtables((*tlb)->mm, start, end);
 }
+
+static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
+		pmd_t *dst_pmd, pmd_t *src_pmd, struct vm_area_struct *vma,
+		unsigned long addr, unsigned long end)
+{
+	pte_t *src_pte, *dst_pte;
+	spinlock_t *src_ptl, *dst_ptl;
+	int progress = 0;
+	int rss[2];
+
+again:
+	rss[1] = rss[0] = 0;
+	dst_pte = pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl);
+	if (!dst_pte)
+		return -ENOMEM;
+	src_pte = pte_offset_map_nested(src_pmd, addr);
+	src_ptl = pte_lockptr(src_mm, src_pmd);
+	spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
+	arch_enter_lazy_mmu_mode();
+
+	do {
+		/*
+		 * We are holding two locks at this point - either of them
+		 * could generate latencies in another task on another CPU.
+		 */
+		if (progress >= 32) {
+			progress = 0;
+			if (need_resched() ||
+			    need_lockbreak(src_ptl) ||
+			    need_lockbreak(dst_ptl))
+				break;
+		}
+		if (pte_none(*src_pte)) {
+			progress++;
+			continue;
+		}
+		copy_one_pte(dst_mm, src_mm, dst_pte, src_pte, vma, addr, rss);
+		progress += 8;
+	} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
+
+	arch_leave_lazy_mmu_mode();
+	spin_unlock(src_ptl);
+	pte_unmap_nested(src_pte - 1);
+	add_mm_rss(dst_mm, rss[0], rss[1]);
+	pte_unmap_unlock(dst_pte - 1, dst_ptl);
+	cond_resched();
+	if (addr != end)
+		goto again;
+	return 0;
+}
+
+static inline int copy_pmd_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
+		pud_t *dst_pud, pud_t *src_pud, struct vm_area_struct *vma,
+		unsigned long addr, unsigned long end)
+{
+	pmd_t *src_pmd, *dst_pmd;
+	unsigned long next;
+
+	dst_pmd = pmd_alloc(dst_mm, dst_pud, addr);
+	if (!dst_pmd)
+		return -ENOMEM;
+	src_pmd = pmd_offset(src_pud, addr);
+	do {
+		next = pmd_addr_end(addr, end);
+		if (pmd_none_or_clear_bad(src_pmd))
+			continue;
+		if (copy_pte_range(dst_mm, src_mm, dst_pmd, src_pmd,
+						vma, addr, next))
+			return -ENOMEM;
+	} while (dst_pmd++, src_pmd++, addr = next, addr != end);
+	return 0;
+}
+
+static inline int copy_pud_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
+		pgd_t *dst_pgd, pgd_t *src_pgd, struct vm_area_struct *vma,
+		unsigned long addr, unsigned long end)
+{
+	pud_t *src_pud, *dst_pud;
+	unsigned long next;
+
+	dst_pud = pud_alloc(dst_mm, dst_pgd, addr);
+	if (!dst_pud)
+		return -ENOMEM;
+	src_pud = pud_offset(src_pgd, addr);
+	do {
+		next = pud_addr_end(addr, end);
+		if (pud_none_or_clear_bad(src_pud))
+			continue;
+		if (copy_pmd_range(dst_mm, src_mm, dst_pud, src_pud,
+						vma, addr, next))
+			return -ENOMEM;
+	} while (dst_pud++, src_pud++, addr = next, addr != end);
+	return 0;
+}
+
+int copy_dual_iterator(struct mm_struct *dst_mm, struct mm_struct *src_mm,
+		unsigned long addr, unsigned long end, struct vm_area_struct *vma)
+{
+	pgd_t *src_pgd;
+	pgd_t *dst_pgd;
+	unsigned long next;
+
+	dst_pgd = pgd_offset(dst_mm, addr);
+	src_pgd = pgd_offset(src_mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none_or_clear_bad(src_pgd))
+			continue;
+
+		if (copy_pud_range(dst_mm, src_mm, dst_pgd,
+			src_pgd, vma, addr, next))
+			return -ENOMEM;
+
+	} while (dst_pgd++, src_pgd++, addr = next, addr != end);
+	return 0;
+}
Index: linux-2.6.20-rc3/mm/memory.c
===================================================================
--- linux-2.6.20-rc3.orig/mm/memory.c	2007-01-09 16:02:57.748363000 +1100
+++ linux-2.6.20-rc3/mm/memory.c	2007-01-09 16:03:25.940363000 +1100
@@ -276,105 +276,9 @@
 	set_pte_at(dst_mm, addr, dst_pte, pte);
 }
 
-static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
-		pmd_t *dst_pmd, pmd_t *src_pmd, struct vm_area_struct *vma,
-		unsigned long addr, unsigned long end)
-{
-	pte_t *src_pte, *dst_pte;
-	spinlock_t *src_ptl, *dst_ptl;
-	int progress = 0;
-	int rss[2];
-
-again:
-	rss[1] = rss[0] = 0;
-	dst_pte = pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl);
-	if (!dst_pte)
-		return -ENOMEM;
-	src_pte = pte_offset_map_nested(src_pmd, addr);
-	src_ptl = pte_lockptr(src_mm, src_pmd);
-	spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
-	arch_enter_lazy_mmu_mode();
-
-	do {
-		/*
-		 * We are holding two locks at this point - either of them
-		 * could generate latencies in another task on another CPU.
-		 */
-		if (progress >= 32) {
-			progress = 0;
-			if (need_resched() ||
-			    need_lockbreak(src_ptl) ||
-			    need_lockbreak(dst_ptl))
-				break;
-		}
-		if (pte_none(*src_pte)) {
-			progress++;
-			continue;
-		}
-		copy_one_pte(dst_mm, src_mm, dst_pte, src_pte, vma, addr, rss);
-		progress += 8;
-	} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
-
-	arch_leave_lazy_mmu_mode();
-	spin_unlock(src_ptl);
-	pte_unmap_nested(src_pte - 1);
-	add_mm_rss(dst_mm, rss[0], rss[1]);
-	pte_unmap_unlock(dst_pte - 1, dst_ptl);
-	cond_resched();
-	if (addr != end)
-		goto again;
-	return 0;
-}
-
-static inline int copy_pmd_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
-		pud_t *dst_pud, pud_t *src_pud, struct vm_area_struct *vma,
-		unsigned long addr, unsigned long end)
-{
-	pmd_t *src_pmd, *dst_pmd;
-	unsigned long next;
-
-	dst_pmd = pmd_alloc(dst_mm, dst_pud, addr);
-	if (!dst_pmd)
-		return -ENOMEM;
-	src_pmd = pmd_offset(src_pud, addr);
-	do {
-		next = pmd_addr_end(addr, end);
-		if (pmd_none_or_clear_bad(src_pmd))
-			continue;
-		if (copy_pte_range(dst_mm, src_mm, dst_pmd, src_pmd,
-						vma, addr, next))
-			return -ENOMEM;
-	} while (dst_pmd++, src_pmd++, addr = next, addr != end);
-	return 0;
-}
-
-static inline int copy_pud_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
-		pgd_t *dst_pgd, pgd_t *src_pgd, struct vm_area_struct *vma,
-		unsigned long addr, unsigned long end)
-{
-	pud_t *src_pud, *dst_pud;
-	unsigned long next;
-
-	dst_pud = pud_alloc(dst_mm, dst_pgd, addr);
-	if (!dst_pud)
-		return -ENOMEM;
-	src_pud = pud_offset(src_pgd, addr);
-	do {
-		next = pud_addr_end(addr, end);
-		if (pud_none_or_clear_bad(src_pud))
-			continue;
-		if (copy_pmd_range(dst_mm, src_mm, dst_pud, src_pud,
-						vma, addr, next))
-			return -ENOMEM;
-	} while (dst_pud++, src_pud++, addr = next, addr != end);
-	return 0;
-}
-
 int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		struct vm_area_struct *vma)
 {
-	pgd_t *src_pgd, *dst_pgd;
-	unsigned long next;
 	unsigned long addr = vma->vm_start;
 	unsigned long end = vma->vm_end;
 
@@ -392,17 +296,7 @@
 	if (is_vm_hugetlb_page(vma))
 		return copy_hugetlb_page_range(dst_mm, src_mm, vma);
 
-	dst_pgd = pgd_offset(dst_mm, addr);
-	src_pgd = pgd_offset(src_mm, addr);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_none_or_clear_bad(src_pgd))
-			continue;
-		if (copy_pud_range(dst_mm, src_mm, dst_pgd, src_pgd,
-						vma, addr, next))
-			return -ENOMEM;
-	} while (dst_pgd++, src_pgd++, addr = next, addr != end);
-	return 0;
+	return copy_dual_iterator(dst_mm, src_mm, addr, end, vma);
 }
 
 static unsigned long zap_pte_range(struct mmu_gather *tlb,

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 15/29] Finish abstracting copy page range
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (13 preceding siblings ...)
  2007-01-13  2:46 ` [PATCH 14/29] Abstract copy page range iterator Paul Davies
@ 2007-01-13  2:46 ` Paul Davies
  2007-01-13  2:47 ` [PATCH 16/29] Abstract unmap page range iterator Paul Davies
                   ` (34 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:46 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 15
 * Creates a file called pt-iterator-ops.h to contain the
 implementations of the pte-level operator functions called by the
 various iterators.
   * Moves copy_one_pte in here from memory.c.
   * Puts add_mm_rss into pt-iterator-ops.h.
   * Moves is_cow_mapping into mm.h.
 * Include pt-iterator-ops.h from pt-default.c so the default page
 table iterator implementations can call their operator functions
 (an illustrative sketch follows below).
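
Nothing in pt-iterator-ops.h knows about pgds, puds or pmds, which is
the point: an alternative page table implementation reuses the same
operators under its own walk order.  Purely illustrative (locking and
lock-break elided; pt_lookup_slot() is a hypothetical per-format
primitive, not part of this series):

/* pt_lookup_slot() is hypothetical; each format supplies its own. */
int copy_dual_iterator(struct mm_struct *dst_mm, struct mm_struct *src_mm,
		unsigned long addr, unsigned long end,
		struct vm_area_struct *vma)
{
	int rss[2] = { 0, 0 };

	for (; addr != end; addr += PAGE_SIZE) {
		pte_t *src_pte = pt_lookup_slot(src_mm, addr);
		pte_t *dst_pte = pt_lookup_slot(dst_mm, addr);

		if (!src_pte || pte_none(*src_pte))
			continue;
		if (!dst_pte)
			return -ENOMEM;
		copy_one_pte(dst_mm, src_mm, dst_pte, src_pte, vma, addr, rss);
	}
	add_mm_rss(dst_mm, rss[0], rss[1]);
	return 0;
}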

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 include/linux/mm.h              |    5 ++
 include/linux/pt-iterator-ops.h |   79 ++++++++++++++++++++++++++++++++++++++++
 mm/memory.c                     |   76 --------------------------------------
 mm/pt-default.c                 |    1 
 4 files changed, 85 insertions(+), 76 deletions(-)
Index: linux-2.6.20-rc4/include/linux/pt-iterator-ops.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/include/linux/pt-iterator-ops.h	2007-01-11 13:34:22.720438000 +1100
@@ -0,0 +1,79 @@
+
+static inline void add_mm_rss(struct mm_struct *mm, int file_rss, int anon_rss)
+{
+	if (file_rss)
+		add_mm_counter(mm, file_rss, file_rss);
+	if (anon_rss)
+		add_mm_counter(mm, anon_rss, anon_rss);
+}
+
+/*
+ * copy one vm_area from one task to the other. Assumes the page tables
+ * already present in the new task to be cleared in the whole range
+ * covered by this vma.
+ */
+
+static inline void
+copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
+		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
+		unsigned long addr, int *rss)
+{
+	unsigned long vm_flags = vma->vm_flags;
+	pte_t pte = *src_pte;
+	struct page *page;
+
+	/* pte contains position in swap or file, so copy. */
+	if (unlikely(!pte_present(pte))) {
+		if (!pte_file(pte)) {
+			swp_entry_t entry = pte_to_swp_entry(pte);
+
+			swap_duplicate(entry);
+			/* make sure dst_mm is on swapoff's mmlist. */
+			if (unlikely(list_empty(&dst_mm->mmlist))) {
+				spin_lock(&mmlist_lock);
+				if (list_empty(&dst_mm->mmlist))
+					list_add(&dst_mm->mmlist,
+						 &src_mm->mmlist);
+				spin_unlock(&mmlist_lock);
+			}
+			if (is_write_migration_entry(entry) &&
+					is_cow_mapping(vm_flags)) {
+				/*
+				 * COW mappings require pages in both parent
+				 * and child to be set to read.
+				 */
+				make_migration_entry_read(&entry);
+				pte = swp_entry_to_pte(entry);
+				set_pte_at(src_mm, addr, src_pte, pte);
+			}
+		}
+		goto out_set_pte;
+	}
+
+	/*
+	 * If it's a COW mapping, write protect it both
+	 * in the parent and the child
+	 */
+	if (is_cow_mapping(vm_flags)) {
+		ptep_set_wrprotect(src_mm, addr, src_pte);
+		pte = pte_wrprotect(pte);
+	}
+
+	/*
+	 * If it's a shared mapping, mark it clean in
+	 * the child
+	 */
+	if (vm_flags & VM_SHARED)
+		pte = pte_mkclean(pte);
+	pte = pte_mkold(pte);
+
+	page = vm_normal_page(vma, addr, pte);
+	if (page) {
+		get_page(page);
+		page_dup_rmap(page);
+		rss[!!PageAnon(page)]++;
+	}
+
+out_set_pte:
+	set_pte_at(dst_mm, addr, dst_pte, pte);
+}
Index: linux-2.6.20-rc4/mm/memory.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/memory.c	2007-01-11 13:32:59.516438000 +1100
+++ linux-2.6.20-rc4/mm/memory.c	2007-01-11 13:36:05.280438000 +1100
@@ -147,11 +147,6 @@
 	dump_stack();
 }
 
-static inline int is_cow_mapping(unsigned int flags)
-{
-	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
-}
-
 /*
  * This function gets the "struct page" associated with a pte.
  *
@@ -205,77 +200,6 @@
 	return pfn_to_page(pfn);
 }
 
-/*
- * copy one vm_area from one task to the other. Assumes the page tables
- * already present in the new task to be cleared in the whole range
- * covered by this vma.
- */
-
-static inline void
-copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
-		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
-		unsigned long addr, int *rss)
-{
-	unsigned long vm_flags = vma->vm_flags;
-	pte_t pte = *src_pte;
-	struct page *page;
-
-	/* pte contains position in swap or file, so copy. */
-	if (unlikely(!pte_present(pte))) {
-		if (!pte_file(pte)) {
-			swp_entry_t entry = pte_to_swp_entry(pte);
-
-			swap_duplicate(entry);
-			/* make sure dst_mm is on swapoff's mmlist. */
-			if (unlikely(list_empty(&dst_mm->mmlist))) {
-				spin_lock(&mmlist_lock);
-				if (list_empty(&dst_mm->mmlist))
-					list_add(&dst_mm->mmlist,
-						 &src_mm->mmlist);
-				spin_unlock(&mmlist_lock);
-			}
-			if (is_write_migration_entry(entry) &&
-					is_cow_mapping(vm_flags)) {
-				/*
-				 * COW mappings require pages in both parent
-				 * and child to be set to read.
-				 */
-				make_migration_entry_read(&entry);
-				pte = swp_entry_to_pte(entry);
-				set_pte_at(src_mm, addr, src_pte, pte);
-			}
-		}
-		goto out_set_pte;
-	}
-
-	/*
-	 * If it's a COW mapping, write protect it both
-	 * in the parent and the child
-	 */
-	if (is_cow_mapping(vm_flags)) {
-		ptep_set_wrprotect(src_mm, addr, src_pte);
-		pte = pte_wrprotect(pte);
-	}
-
-	/*
-	 * If it's a shared mapping, mark it clean in
-	 * the child
-	 */
-	if (vm_flags & VM_SHARED)
-		pte = pte_mkclean(pte);
-	pte = pte_mkold(pte);
-
-	page = vm_normal_page(vma, addr, pte);
-	if (page) {
-		get_page(page);
-		page_dup_rmap(page);
-		rss[!!PageAnon(page)]++;
-	}
-
-out_set_pte:
-	set_pte_at(dst_mm, addr, dst_pte, pte);
-}
-
 int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		struct vm_area_struct *vma)
 {
Index: linux-2.6.20-rc4/mm/pt-default.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/pt-default.c	2007-01-11 13:32:59.512438000 +1100
+++ linux-2.6.20-rc4/mm/pt-default.c	2007-01-11 13:33:00.300438000 +1100
@@ -19,6 +19,7 @@
 
 #include <linux/swapops.h>
 #include <linux/elf.h>
+#include <linux/pt-iterator-ops.h>
 
 /*
  * If a p?d_bad entry is found while walking page tables, report
Index: linux-2.6.20-rc4/include/linux/mm.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/mm.h	2007-01-11 13:31:31.440438000 +1100
+++ linux-2.6.20-rc4/include/linux/mm.h	2007-01-11 13:35:32.024438000 +1100
@@ -722,6 +722,11 @@
 	unsigned long truncate_count;		/* Compare vm_truncate_count */
 };
 
+static inline int is_cow_mapping(unsigned int flags)
+{
+	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
+}
+
 struct page *vm_normal_page(struct vm_area_struct *, unsigned long, pte_t);
 unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
 		unsigned long size, struct zap_details *);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 16/29] Abstract unmap page range iterator
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (14 preceding siblings ...)
  2007-01-13  2:46 ` [PATCH 15/29] Finish abstracting copy page range Paul Davies
@ 2007-01-13  2:47 ` Paul Davies
  2007-01-13  2:47 ` [PATCH 17/29] Finish abstracting unmap page range Paul Davies
                   ` (33 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:47 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 16
 * Removes add_mm_rss from memory.c (it is now provided by
 pt-iterator-ops.h, where the iterators need it)
 * Starts shifting the default unmap page range iterator implementation
 into pt-default.c: unmap_page_range now calls unmap_page_range_iterator
 through the interface (see the driving-loop sketch below).
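
The zap_work budget is what lets a caller bound how much unmapping a single
call does.  A hedged sketch of the driving loop; in memory.c, unmap_vmas
drives unmap_page_range (now a thin wrapper over the iterator) in essentially
this shape:

/* Illustrative caller: ZAP_BLOCK_SIZE is the batch budget from memory.c;
 * tlb flushing is elided. */
static void drive_unmap(struct mmu_gather *tlb, struct vm_area_struct *vma,
		unsigned long addr, unsigned long end,
		struct zap_details *details)
{
	long zap_work = ZAP_BLOCK_SIZE;

	while (addr != end) {
		addr = unmap_page_range_iterator(tlb, vma, addr, end,
						&zap_work, details);
		if (zap_work <= 0) {
			/* budget spent: reschedule, then refill */
			cond_resched();
			zap_work = ZAP_BLOCK_SIZE;
		}
	}
}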

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 memory.c |  149 ---------------------------------------------------------------
 1 file changed, 1 insertion(+), 148 deletions(-)
Index: linux-2.6.20-rc4/mm/memory.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/memory.c	2007-01-11 13:37:23.140438000 +1100
+++ linux-2.6.20-rc4/mm/memory.c	2007-01-11 13:37:23.960438000 +1100
@@ -122,14 +122,6 @@
 	}
 }
 
-static inline void add_mm_rss(struct mm_struct *mm, int file_rss, int anon_rss)
-{
-	if (file_rss)
-		add_mm_counter(mm, file_rss, file_rss);
-	if (anon_rss)
-		add_mm_counter(mm, anon_rss, anon_rss);
-}
-
 /*
  * This function is called to print an error when a bad pte
  * is found. For example, we might have a PFN-mapped pte in
@@ -223,158 +215,19 @@
 	return copy_dual_iterator(dst_mm, src_mm, addr, end, vma);
 }
 
-static unsigned long zap_pte_range(struct mmu_gather *tlb,
-				struct vm_area_struct *vma, pmd_t *pmd,
-				unsigned long addr, unsigned long end,
-				long *zap_work, struct zap_details *details)
-{
-	struct mm_struct *mm = tlb->mm;
-	pte_t *pte;
-	spinlock_t *ptl;
-	int file_rss = 0;
-	int anon_rss = 0;
-
-	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
-	arch_enter_lazy_mmu_mode();
-	do {
-		pte_t ptent = *pte;
-		if (pte_none(ptent)) {
-			(*zap_work)--;
-			continue;
-		}
-
-		(*zap_work) -= PAGE_SIZE;
-
-		if (pte_present(ptent)) {
-			struct page *page;
-
-			page = vm_normal_page(vma, addr, ptent);
-			if (unlikely(details) && page) {
-				/*
-				 * unmap_shared_mapping_pages() wants to
-				 * invalidate cache without truncating:
-				 * unmap shared but keep private pages.
-				 */
-				if (details->check_mapping &&
-				    details->check_mapping != page->mapping)
-					continue;
-				/*
-				 * Each page->index must be checked when
-				 * invalidating or truncating nonlinear.
-				 */
-				if (details->nonlinear_vma &&
-				    (page->index < details->first_index ||
-				     page->index > details->last_index))
-					continue;
-			}
-			ptent = ptep_get_and_clear_full(mm, addr, pte,
-							tlb->fullmm);
-			tlb_remove_tlb_entry(tlb, pte, addr);
-			if (unlikely(!page))
-				continue;
-			if (unlikely(details) && details->nonlinear_vma
-			    && linear_page_index(details->nonlinear_vma,
-						addr) != page->index)
-				set_pte_at(mm, addr, pte,
-					   pgoff_to_pte(page->index));
-			if (PageAnon(page))
-				anon_rss--;
-			else {
-				if (pte_dirty(ptent))
-					set_page_dirty(page);
-				if (pte_young(ptent))
-					mark_page_accessed(page);
-				file_rss--;
-			}
-			page_remove_rmap(page, vma);
-			tlb_remove_page(tlb, page);
-			continue;
-		}
-		/*
-		 * If details->check_mapping, we leave swap entries;
-		 * if details->nonlinear_vma, we leave file entries.
-		 */
-		if (unlikely(details))
-			continue;
-		if (!pte_file(ptent))
-			free_swap_and_cache(pte_to_swp_entry(ptent));
-		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
-	} while (pte++, addr += PAGE_SIZE, (addr != end && *zap_work > 0));
-
-	add_mm_rss(mm, file_rss, anon_rss);
-	arch_leave_lazy_mmu_mode();
-	pte_unmap_unlock(pte - 1, ptl);
-
-	return addr;
-}
-
-static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
-				struct vm_area_struct *vma, pud_t *pud,
-				unsigned long addr, unsigned long end,
-				long *zap_work, struct zap_details *details)
-{
-	pmd_t *pmd;
-	unsigned long next;
-
-	pmd = pmd_offset(pud, addr);
-	do {
-		next = pmd_addr_end(addr, end);
-		if (pmd_none_or_clear_bad(pmd)) {
-			(*zap_work)--;
-			continue;
-		}
-		next = zap_pte_range(tlb, vma, pmd, addr, next,
-						zap_work, details);
-	} while (pmd++, addr = next, (addr != end && *zap_work > 0));
-
-	return addr;
-}
-
-static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
-				struct vm_area_struct *vma, pgd_t *pgd,
-				unsigned long addr, unsigned long end,
-				long *zap_work, struct zap_details *details)
-{
-	pud_t *pud;
-	unsigned long next;
-
-	pud = pud_offset(pgd, addr);
-	do {
-		next = pud_addr_end(addr, end);
-		if (pud_none_or_clear_bad(pud)) {
-			(*zap_work)--;
-			continue;
-		}
-		next = zap_pmd_range(tlb, vma, pud, addr, next,
-						zap_work, details);
-	} while (pud++, addr = next, (addr != end && *zap_work > 0));
 
-	return addr;
-}
 
 static unsigned long unmap_page_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma,
 				unsigned long addr, unsigned long end,
 				long *zap_work, struct zap_details *details)
 {
-	pgd_t *pgd;
-	unsigned long next;
-
 	if (details && !details->check_mapping && !details->nonlinear_vma)
 		details = NULL;
 
 	BUG_ON(addr >= end);
 	tlb_start_vma(tlb, vma);
-	pgd = pgd_offset(vma->vm_mm, addr);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_none_or_clear_bad(pgd)) {
-			(*zap_work)--;
-			continue;
-		}
-		next = zap_pud_range(tlb, vma, pgd, addr, next,
-						zap_work, details);
-	} while (pgd++, addr = next, (addr != end && *zap_work > 0));
+	addr = unmap_page_range_iterator(tlb, vma, addr, end, zap_work, details);
 	tlb_end_vma(tlb, vma);
 
 	return addr;

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 17/29] Finish abstracting unmap page range
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (15 preceding siblings ...)
  2007-01-13  2:47 ` [PATCH 16/29] Abstract unmap page range iterator Paul Davies
@ 2007-01-13  2:47 ` Paul Davies
  2007-01-13  2:47 ` [PATCH 18/29] Abstract zeromap " Paul Davies
                   ` (32 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:47 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 17
 * Abstracts a function called zap_one_pte out of the unmap_page_range
 iterator and puts it into pt-iterator-ops.h
 * Puts the implementation of the unmap_page_range iterator for the
 default page table into pt-default.c (a hypothetical alternative
 implementation is sketched below).
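
The payoff of the split is that zap_one_pte can be reused by any page table
format.  A hypothetical sketch (lookup_next_pte is an invented helper, not
part of this series; pte locking is omitted):

/* Hypothetical non-hierarchical implementation: any structure that can
 * enumerate its ptes over [addr, end) reuses zap_one_pte unchanged. */
unsigned long unmap_page_range_iterator(struct mmu_gather *tlb,
		struct vm_area_struct *vma, unsigned long addr,
		unsigned long end, long *zap_work, struct zap_details *details)
{
	struct mm_struct *mm = tlb->mm;
	int file_rss = 0, anon_rss = 0;

	for (; addr != end && *zap_work > 0; addr += PAGE_SIZE) {
		pte_t *pte = lookup_next_pte(mm, addr);	/* invented helper */

		if (!pte) {
			(*zap_work)--;
			continue;
		}
		zap_one_pte(pte, mm, addr, vma, zap_work, details,
				tlb, &anon_rss, &file_rss);
	}
	add_mm_rss(mm, file_rss, anon_rss);
	return addr;
}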

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 include/linux/pt-iterator-ops.h |   68 ++++++++++++++++++++++++++++++
 mm/pt-default.c                 |   89 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 157 insertions(+)
Index: linux-2.6.20-rc4/include/linux/pt-iterator-ops.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/pt-iterator-ops.h	2007-01-11 13:37:23.136438000 +1100
+++ linux-2.6.20-rc4/include/linux/pt-iterator-ops.h	2007-01-11 13:37:35.728438000 +1100
@@ -77,3 +77,71 @@
 out_set_pte:
 	set_pte_at(dst_mm, addr, dst_pte, pte);
 }
+
+static inline void zap_one_pte(pte_t *pte, struct mm_struct *mm, unsigned long addr,
+		struct vm_area_struct *vma, long *zap_work, struct zap_details *details,
+		struct mmu_gather *tlb, int *anon_rss, int *file_rss)
+{
+	pte_t ptent = *pte;
+	if (pte_none(ptent)) {
+		(*zap_work)--;
+		return;
+	}
+
+	(*zap_work) -= PAGE_SIZE;
+
+	if (pte_present(ptent)) {
+		struct page *page;
+
+		page = vm_normal_page(vma, addr, ptent);
+		if (unlikely(details) && page) {
+			/*
+			 * unmap_shared_mapping_pages() wants to
+			 * invalidate cache without truncating:
+			 * unmap shared but keep private pages.
+			 */
+			if (details->check_mapping &&
+			    details->check_mapping != page->mapping)
+				return;
+			/*
+			 * Each page->index must be checked when
+			 * invalidating or truncating nonlinear.
+			 */
+			if (details->nonlinear_vma &&
+			    (page->index < details->first_index ||
+			     page->index > details->last_index))
+				return;
+		}
+		ptent = ptep_get_and_clear_full(mm, addr, pte,
+						tlb->fullmm);
+		tlb_remove_tlb_entry(tlb, pte, addr);
+		if (unlikely(!page))
+			return;
+		if (unlikely(details) && details->nonlinear_vma
+		    && linear_page_index(details->nonlinear_vma,
+					addr) != page->index)
+			set_pte_at(mm, addr, pte,
+				   pgoff_to_pte(page->index));
+		if (PageAnon(page))
+			(*anon_rss)--;
+		else {
+			if (pte_dirty(ptent))
+				set_page_dirty(page);
+			if (pte_young(ptent))
+				mark_page_accessed(page);
+			(*file_rss)--;
+		}
+		page_remove_rmap(page,vma);
+		tlb_remove_page(tlb, page);
+		return;
+	}
+	/*
+	 * If details->check_mapping, we leave swap entries;
+	 * if details->nonlinear_vma, we leave file entries.
+	 */
+	if (unlikely(details))
+		return;
+	if (!pte_file(ptent))
+		free_swap_and_cache(pte_to_swp_entry(ptent));
+	pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+}
Index: linux-2.6.20-rc4/mm/pt-default.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/pt-default.c	2007-01-11 13:37:23.140438000 +1100
+++ linux-2.6.20-rc4/mm/pt-default.c	2007-01-11 13:37:35.728438000 +1100
@@ -405,3 +405,92 @@
 	} while (dst_pgd++, src_pgd++, addr = next, addr != end);
 	return 0;
 }
+
+static inline unsigned long zap_pte_range(struct mmu_gather *tlb,
+				struct vm_area_struct *vma, pmd_t *pmd,
+				unsigned long addr, unsigned long end,
+				long *zap_work, struct zap_details *details)
+{
+	struct mm_struct *mm = tlb->mm;
+	pte_t *pte;
+	spinlock_t *ptl;
+	int file_rss = 0;
+	int anon_rss = 0;
+
+	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+	arch_enter_lazy_mmu_mode();
+	do {
+		zap_one_pte(pte, mm, addr, vma, zap_work, details, tlb, &anon_rss, &file_rss);
+	} while (pte++, addr += PAGE_SIZE, (addr != end && *zap_work > 0));
+
+	add_mm_rss(mm, file_rss, anon_rss);
+	arch_leave_lazy_mmu_mode();
+	pte_unmap_unlock(pte - 1, ptl);
+
+	return addr;
+}
+
+static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
+				struct vm_area_struct *vma, pud_t *pud,
+				unsigned long addr, unsigned long end,
+				long *zap_work, struct zap_details *details)
+{
+	pmd_t *pmd;
+	unsigned long next;
+
+	pmd = pmd_offset(pud, addr);
+	do {
+		next = pmd_addr_end(addr, end);
+		if (pmd_none_or_clear_bad(pmd)) {
+			(*zap_work)--;
+			continue;
+		}
+		next = zap_pte_range(tlb, vma, pmd, addr, next,
+						zap_work, details);
+	} while (pmd++, addr = next, (addr != end && *zap_work > 0));
+
+	return addr;
+}
+
+static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
+				struct vm_area_struct *vma, pgd_t *pgd,
+				unsigned long addr, unsigned long end,
+				long *zap_work, struct zap_details *details)
+{
+	pud_t *pud;
+	unsigned long next;
+
+	pud = pud_offset(pgd, addr);
+	do {
+		next = pud_addr_end(addr, end);
+		if (pud_none_or_clear_bad(pud)) {
+			(*zap_work)--;
+			continue;
+		}
+		next = zap_pmd_range(tlb, vma, pud, addr, next,
+						zap_work, details);
+	} while (pud++, addr = next, (addr != end && *zap_work > 0));
+
+	return addr;
+}
+
+unsigned long unmap_page_range_iterator(struct mmu_gather *tlb,
+		struct vm_area_struct *vma, unsigned long addr, unsigned long end,
+		long *zap_work, struct zap_details *details)
+{
+	pgd_t *pgd;
+	unsigned long next;
+
+	pgd = pgd_offset(vma->vm_mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none_or_clear_bad(pgd)) {
+			(*zap_work)--;
+			continue;
+		}
+		next = zap_pud_range(tlb, vma, pgd, addr, next, zap_work,
+							 details);
+	} while (pgd++, addr = next, (addr != end && *zap_work > 0));
+
+	return addr;
+}

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 18/29] Abstract zeromap page range
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (16 preceding siblings ...)
  2007-01-13  2:47 ` [PATCH 17/29] Finish abstracting unmap page range Paul Davies
@ 2007-01-13  2:47 ` Paul Davies
  2007-01-13  2:47 ` [PATCH 19/29] Abstract remap pfn range Paul Davies
                   ` (31 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:47 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 18
 * Moves the zeromap_page_range iterator implementation from memory.c to
 pt-default.c
 * Abstracts an operator function, zeromap_one_pte, from the iterator in
 the process and puts it into pt-iterator-ops.h (note the error-handling
 change discussed below).
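
The move also changes the error semantics visible in the diff: the old
memory.c loop returned -EAGAIN when pte allocation failed and -EEXIST on an
already-populated pte, whereas the new iterator returns -ENOMEM and
zeromap_one_pte BUG()s on a populated pte.  A hypothetical operator variant
that keeps the old soft failure (the iterator would then need to propagate
its return value):

static inline int
zeromap_one_pte_checked(struct mm_struct *mm, pte_t *pte,
		unsigned long addr, pgprot_t prot)
{
	struct page *page = ZERO_PAGE(addr);

	if (unlikely(!pte_none(*pte)))
		return -EEXIST;	/* old memory.c behaviour */
	page_cache_get(page);
	page_add_file_rmap(page);
	inc_mm_counter(mm, file_rss);
	set_pte_at(mm, addr, pte, pte_wrprotect(mk_pte(page, prot)));
	return 0;
}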

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 include/linux/pt-iterator-ops.h |   12 ++++++
 include/linux/pt.h              |    1 +
 mm/memory.c                     |   78 ----------------------------------------
 mm/pt-default.c                 |   67 ++++++++++++++++++++++++++++++++++
 4 files changed, 81 insertions(+), 77 deletions(-)
Index: linux-2.6.20-rc4/mm/pt-default.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/pt-default.c	2007-01-11 13:37:35.728438000 +1100
+++ linux-2.6.20-rc4/mm/pt-default.c	2007-01-11 13:37:46.828438000 +1100
@@ -494,3 +494,70 @@
 
 	return addr;
 }
+
+static int zeromap_pte_range(struct mm_struct *mm, pmd_t *pmd,
+			unsigned long addr, unsigned long end, pgprot_t prot)
+{
+	pte_t *pte;
+	spinlock_t *ptl;
+
+	pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
+	if (!pte)
+		return -ENOMEM;
+	arch_enter_lazy_mmu_mode();
+	do {
+		zeromap_one_pte(mm, pte, addr, prot);
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+	arch_leave_lazy_mmu_mode();
+	pte_unmap_unlock(pte - 1, ptl);
+	return 0;
+}
+
+static inline int zeromap_pmd_range(struct mm_struct *mm, pud_t *pud,
+			unsigned long addr, unsigned long end, pgprot_t prot)
+{
+	pmd_t *pmd;
+	unsigned long next;
+
+	pmd = pmd_alloc(mm, pud, addr);
+	if (!pmd)
+		return -ENOMEM;
+	do {
+		next = pmd_addr_end(addr, end);
+		if (zeromap_pte_range(mm, pmd, addr, next, prot))
+			return -ENOMEM;
+	} while (pmd++, addr = next, addr != end);
+	return 0;
+}
+
+static inline int zeromap_pud_range(struct mm_struct *mm, pgd_t *pgd,
+			unsigned long addr, unsigned long end, pgprot_t prot)
+{
+	pud_t *pud;
+	unsigned long next;
+
+	pud = pud_alloc(mm, pgd, addr);
+	if (!pud)
+		return -ENOMEM;
+	do {
+		next = pud_addr_end(addr, end);
+		if (zeromap_pmd_range(mm, pud, addr, next, prot))
+			return -ENOMEM;
+	} while (pud++, addr = next, addr != end);
+	return 0;
+}
+
+int zeromap_build_iterator(struct mm_struct *mm,
+			unsigned long addr, unsigned long end, pgprot_t prot)
+{
+	unsigned long next;
+	pgd_t *pgd;
+
+	pgd = pgd_offset(mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (zeromap_pud_range(mm, pgd, addr, next, prot))
+			return -ENOMEM;
+	} while (pgd++, addr = next, addr != end);
+	return 0;
+}
Index: linux-2.6.20-rc4/include/linux/pt-iterator-ops.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/pt-iterator-ops.h	2007-01-11 13:37:35.728438000 +1100
+++ linux-2.6.20-rc4/include/linux/pt-iterator-ops.h	2007-01-11 13:37:46.832438000 +1100
@@ -145,3 +145,15 @@
 		free_swap_and_cache(pte_to_swp_entry(ptent));
 	pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 }
+
+static inline void
+zeromap_one_pte(struct mm_struct *mm, pte_t *pte, unsigned long addr, pgprot_t prot)
+{
+	struct page *page = ZERO_PAGE(addr);
+	pte_t zero_pte = pte_wrprotect(mk_pte(page, prot));
+	page_cache_get(page);
+	page_add_file_rmap(page);
+	inc_mm_counter(mm, file_rss);
+	BUG_ON(!pte_none(*pte));
+	set_pte_at(mm, addr, pte, zero_pte);
+}
Index: linux-2.6.20-rc4/include/linux/pt.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/pt.h	2007-01-11 13:37:17.200438000 +1100
+++ linux-2.6.20-rc4/include/linux/pt.h	2007-01-11 13:37:46.832438000 +1100
@@ -20,6 +20,7 @@
 void free_pt_range(struct mmu_gather **tlb, unsigned long addr,
 		unsigned long end, unsigned long floor, unsigned long ceiling);
 
+/* Iterators for memory.c */
 int copy_dual_iterator(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		unsigned long addr, unsigned long end, struct vm_area_struct *vma);
 
Index: linux-2.6.20-rc4/mm/memory.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/memory.c	2007-01-11 13:37:23.960438000 +1100
+++ linux-2.6.20-rc4/mm/memory.c	2007-01-11 13:37:46.832438000 +1100
@@ -537,92 +537,16 @@
 }
 EXPORT_SYMBOL(get_user_pages);
 
-static int zeromap_pte_range(struct mm_struct *mm, pmd_t *pmd,
-			unsigned long addr, unsigned long end, pgprot_t prot)
-{
-	pte_t *pte;
-	spinlock_t *ptl;
-	int err = 0;
-
-	pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
-	if (!pte)
-		return -EAGAIN;
-	arch_enter_lazy_mmu_mode();
-	do {
-		struct page *page = ZERO_PAGE(addr);
-		pte_t zero_pte = pte_wrprotect(mk_pte(page, prot));
-
-		if (unlikely(!pte_none(*pte))) {
-			err = -EEXIST;
-			pte++;
-			break;
-		}
-		page_cache_get(page);
-		page_add_file_rmap(page);
-		inc_mm_counter(mm, file_rss);
-		set_pte_at(mm, addr, pte, zero_pte);
-	} while (pte++, addr += PAGE_SIZE, addr != end);
-	arch_leave_lazy_mmu_mode();
-	pte_unmap_unlock(pte - 1, ptl);
-	return err;
-}
-
-static inline int zeromap_pmd_range(struct mm_struct *mm, pud_t *pud,
-			unsigned long addr, unsigned long end, pgprot_t prot)
-{
-	pmd_t *pmd;
-	unsigned long next;
-	int err;
-
-	pmd = pmd_alloc(mm, pud, addr);
-	if (!pmd)
-		return -EAGAIN;
-	do {
-		next = pmd_addr_end(addr, end);
-		err = zeromap_pte_range(mm, pmd, addr, next, prot);
-		if (err)
-			break;
-	} while (pmd++, addr = next, addr != end);
-	return err;
-}
-
-static inline int zeromap_pud_range(struct mm_struct *mm, pgd_t *pgd,
-			unsigned long addr, unsigned long end, pgprot_t prot)
-{
-	pud_t *pud;
-	unsigned long next;
-	int err;
-
-	pud = pud_alloc(mm, pgd, addr);
-	if (!pud)
-		return -EAGAIN;
-	do {
-		next = pud_addr_end(addr, end);
-		err = zeromap_pmd_range(mm, pud, addr, next, prot);
-		if (err)
-			break;
-	} while (pud++, addr = next, addr != end);
-	return err;
-}
-
 int zeromap_page_range(struct vm_area_struct *vma,
 			unsigned long addr, unsigned long size, pgprot_t prot)
 {
-	pgd_t *pgd;
-	unsigned long next;
 	unsigned long end = addr + size;
 	struct mm_struct *mm = vma->vm_mm;
 	int err;
 
 	BUG_ON(addr >= end);
-	pgd = pgd_offset(mm, addr);
 	flush_cache_range(vma, addr, end);
-	do {
-		next = pgd_addr_end(addr, end);
-		err = zeromap_pud_range(mm, pgd, addr, next, prot);
-		if (err)
-			break;
-	} while (pgd++, addr = next, addr != end);
+	err = zeromap_build_iterator(mm, addr, end, prot);
 	return err;
 }
 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 19/29] Abstract remap pfn range
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (17 preceding siblings ...)
  2007-01-13  2:47 ` [PATCH 18/29] Abstract zeromap " Paul Davies
@ 2007-01-13  2:47 ` Paul Davies
  2007-01-13  2:47 ` [PATCH 20/29] Abstract change protection iterator Paul Davies
                   ` (30 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:47 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 19
 * Moves the remap_pfn_range iterator implementation from memory.c to
 pt-default.c
 * Abstracts an operator function, remap_one_pte, from the iterator and
 puts it into pt-iterator-ops.h (the pfn bookkeeping is spelled out below).
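
The pfn arithmetic threaded through these iterators is easy to misread, so
the invariant is worth spelling out (for a mapping whose first address is A
and first pfn is P):

/*
 * remap_pfn_range and each remap_*_range level maintain
 *	pfn_arg == P - (A >> PAGE_SHIFT)
 * i.e. pfn_arg carries a fixed bias, so for any addr in the range
 *	pfn_arg + (addr >> PAGE_SHIFT) == P + ((addr - A) >> PAGE_SHIFT)
 * which is exactly the pfn that must back addr.  remap_pte_range is
 * handed the true pfn for its sub-range start and simply increments
 * it once per PAGE_SIZE step.
 */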

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 include/linux/pt-iterator-ops.h |   11 ++++-
 mm/memory.c                     |   79 --------------------------------------
 mm/pt-default.c                 |   83 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 94 insertions(+), 79 deletions(-)
Index: linux-2.6.20-rc4/mm/memory.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/memory.c	2007-01-11 13:37:46.832438000 +1100
+++ linux-2.6.20-rc4/mm/memory.c	2007-01-11 13:37:53.600438000 +1100
@@ -625,72 +625,6 @@
 }
 EXPORT_SYMBOL(vm_insert_page);
 
-/*
- * maps a range of physical memory into the requested pages. the old
- * mappings are removed. any references to nonexistent pages results
- * in null mappings (currently treated as "copy-on-access")
- */
-static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
-			unsigned long addr, unsigned long end,
-			unsigned long pfn, pgprot_t prot)
-{
-	pte_t *pte;
-	spinlock_t *ptl;
-
-	pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
-	if (!pte)
-		return -ENOMEM;
-	arch_enter_lazy_mmu_mode();
-	do {
-		BUG_ON(!pte_none(*pte));
-		set_pte_at(mm, addr, pte, pfn_pte(pfn, prot));
-		pfn++;
-	} while (pte++, addr += PAGE_SIZE, addr != end);
-	arch_leave_lazy_mmu_mode();
-	pte_unmap_unlock(pte - 1, ptl);
-	return 0;
-}
-
-static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
-			unsigned long addr, unsigned long end,
-			unsigned long pfn, pgprot_t prot)
-{
-	pmd_t *pmd;
-	unsigned long next;
-
-	pfn -= addr >> PAGE_SHIFT;
-	pmd = pmd_alloc(mm, pud, addr);
-	if (!pmd)
-		return -ENOMEM;
-	do {
-		next = pmd_addr_end(addr, end);
-		if (remap_pte_range(mm, pmd, addr, next,
-				pfn + (addr >> PAGE_SHIFT), prot))
-			return -ENOMEM;
-	} while (pmd++, addr = next, addr != end);
-	return 0;
-}
-
-static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd,
-			unsigned long addr, unsigned long end,
-			unsigned long pfn, pgprot_t prot)
-{
-	pud_t *pud;
-	unsigned long next;
-
-	pfn -= addr >> PAGE_SHIFT;
-	pud = pud_alloc(mm, pgd, addr);
-	if (!pud)
-		return -ENOMEM;
-	do {
-		next = pud_addr_end(addr, end);
-		if (remap_pmd_range(mm, pud, addr, next,
-				pfn + (addr >> PAGE_SHIFT), prot))
-			return -ENOMEM;
-	} while (pud++, addr = next, addr != end);
-	return 0;
-}
-
 /**
  * remap_pfn_range - remap kernel memory to userspace
  * @vma: user vma to map to
@@ -704,11 +638,8 @@
 int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
 		    unsigned long pfn, unsigned long size, pgprot_t prot)
 {
-	pgd_t *pgd;
-	unsigned long next;
 	unsigned long end = addr + PAGE_ALIGN(size);
 	struct mm_struct *mm = vma->vm_mm;
-	int err;
 
 	/*
 	 * Physically remapped pages are special. Tell the
@@ -738,16 +669,8 @@
 
 	BUG_ON(addr >= end);
 	pfn -= addr >> PAGE_SHIFT;
-	pgd = pgd_offset(mm, addr);
 	flush_cache_range(vma, addr, end);
-	do {
-		next = pgd_addr_end(addr, end);
-		err = remap_pud_range(mm, pgd, addr, next,
-				pfn + (addr >> PAGE_SHIFT), prot);
-		if (err)
-			break;
-	} while (pgd++, addr = next, addr != end);
-	return err;
+	return remap_build_iterator(mm, addr, end, pfn, prot);
 }
 EXPORT_SYMBOL(remap_pfn_range);
 
Index: linux-2.6.20-rc4/mm/pt-default.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/pt-default.c	2007-01-11 13:37:46.828438000 +1100
+++ linux-2.6.20-rc4/mm/pt-default.c	2007-01-11 13:37:53.600438000 +1100
@@ -561,3 +561,86 @@
 	} while (pgd++, addr = next, addr != end);
 	return 0;
 }
+
+/*
+ * maps a range of physical memory into the requested pages. the old
+ * mappings are removed. any references to nonexistent pages results
+ * in null mappings (currently treated as "copy-on-access")
+ */
+static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
+			unsigned long addr, unsigned long end,
+			unsigned long pfn, pgprot_t prot)
+{
+	pte_t *pte;
+	spinlock_t *ptl;
+
+	pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
+	if (!pte)
+		return -ENOMEM;
+	arch_enter_lazy_mmu_mode();
+	do {
+		remap_one_pte(mm, pte, addr, pfn++, prot);
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+	arch_leave_lazy_mmu_mode();
+	pte_unmap_unlock(pte - 1, ptl);
+	return 0;
+}
+
+static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
+			unsigned long addr, unsigned long end,
+			unsigned long pfn, pgprot_t prot)
+{
+	pmd_t *pmd;
+	unsigned long next;
+
+	pfn -= addr >> PAGE_SHIFT;
+	pmd = pmd_alloc(mm, pud, addr);
+	if (!pmd)
+		return -ENOMEM;
+	do {
+		next = pmd_addr_end(addr, end);
+		if (remap_pte_range(mm, pmd, addr, next,
+				pfn + (addr >> PAGE_SHIFT), prot))
+			return -ENOMEM;
+	} while (pmd++, addr = next, addr != end);
+	return 0;
+}
+
+static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd,
+			unsigned long addr, unsigned long end,
+			unsigned long pfn, pgprot_t prot)
+{
+	pud_t *pud;
+	unsigned long next;
+
+	pfn -= addr >> PAGE_SHIFT;
+	pud = pud_alloc(mm, pgd, addr);
+	if (!pud)
+		return -ENOMEM;
+	do {
+		next = pud_addr_end(addr, end);
+		if (remap_pmd_range(mm, pud, addr, next,
+				pfn + (addr >> PAGE_SHIFT), prot))
+			return -ENOMEM;
+	} while (pud++, addr = next, addr != end);
+	return 0;
+}
+
+int remap_build_iterator(struct mm_struct *mm,
+		unsigned long addr, unsigned long end, unsigned long pfn,
+		pgprot_t prot)
+{
+	pgd_t *pgd;
+	unsigned long next;
+	int err;
+
+	pgd = pgd_offset(mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		err = remap_pud_range(mm, pgd, addr, next,
+				pfn + (addr >> PAGE_SHIFT), prot);
+		if (err)
+			break;
+	} while (pgd++, addr = next, addr != end);
+	return err;
+}
Index: linux-2.6.20-rc4/include/linux/pt-iterator-ops.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/pt-iterator-ops.h	2007-01-11 13:37:46.832438000 +1100
+++ linux-2.6.20-rc4/include/linux/pt-iterator-ops.h	2007-01-11 13:37:53.612438000 +1100
@@ -78,7 +78,8 @@
 	set_pte_at(dst_mm, addr, dst_pte, pte);
 }
 
-static inline void zap_one_pte(pte_t *pte, struct mm_struct *mm, unsigned long addr,
+static inline void
+zap_one_pte(pte_t *pte, struct mm_struct *mm, unsigned long addr,
 		struct vm_area_struct *vma, long *zap_work, struct zap_details *details,
 		struct mmu_gather *tlb, int *anon_rss, int *file_rss)
 {
@@ -157,3 +158,11 @@
 	BUG_ON(!pte_none(*pte));
 	set_pte_at(mm, addr, pte, zero_pte);
 }
+
+static inline void
+remap_one_pte(struct mm_struct *mm, pte_t *pte, unsigned long addr,
+			   unsigned long pfn, pgprot_t prot)
+{
+	BUG_ON(!pte_none(*pte));
+	set_pte_at(mm, addr, pte, pfn_pte(pfn, prot));
+}

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 20/29] Abstract change protection iterator
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (18 preceding siblings ...)
  2007-01-13  2:47 ` [PATCH 19/29] Abstract remap pfn range Paul Davies
@ 2007-01-13  2:47 ` Paul Davies
  2007-01-13  2:47 ` [PATCH 21/29] Abstract unmap vm area Paul Davies
                   ` (29 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:47 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 20
 * Moves the default change_protection iterator implementation from
 mprotect.c to pt-default.c
 * Abstracts an operator function, change_prot_pte, and places it into
 pt-iterator-ops.h (the assumed pt.h declaration is sketched below).
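
The hunk adding the declaration to include/linux/pt.h is not shown in this
message; inferred from the definition in pt-default.c below, it is presumably
of this shape:

/* include/linux/pt.h (assumed) */
void change_protection_read_iterator(struct vm_area_struct *vma,
		unsigned long addr, unsigned long end, pgprot_t newprot,
		int dirty_accountable);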

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 include/linux/pt-iterator-ops.h |   44 ++++++++++++++++++
 mm/mprotect.c                   |   94 +---------------------------------------
 mm/pt-default.c                 |   66 ++++++++++++++++++++++++++++
 3 files changed, 113 insertions(+), 91 deletions(-)
Index: linux-2.6.20-rc4/mm/mprotect.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/mprotect.c	2007-01-11 13:30:52.468438000 +1100
+++ linux-2.6.20-rc4/mm/mprotect.c	2007-01-11 13:37:59.472438000 +1100
@@ -21,110 +21,22 @@
 #include <linux/syscalls.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
+#include <linux/pt.h>
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
 
-static void change_pte_range(struct mm_struct *mm, pmd_t *pmd,
-		unsigned long addr, unsigned long end, pgprot_t newprot,
-		int dirty_accountable)
-{
-	pte_t *pte, oldpte;
-	spinlock_t *ptl;
-
-	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
-	arch_enter_lazy_mmu_mode();
-	do {
-		oldpte = *pte;
-		if (pte_present(oldpte)) {
-			pte_t ptent;
-
-			/* Avoid an SMP race with hardware updated dirty/clean
-			 * bits by wiping the pte and then setting the new pte
-			 * into place.
-			 */
-			ptent = ptep_get_and_clear(mm, addr, pte);
-			ptent = pte_modify(ptent, newprot);
-			/*
-			 * Avoid taking write faults for pages we know to be
-			 * dirty.
-			 */
-			if (dirty_accountable && pte_dirty(ptent))
-				ptent = pte_mkwrite(ptent);
-			set_pte_at(mm, addr, pte, ptent);
-			lazy_mmu_prot_update(ptent);
-#ifdef CONFIG_MIGRATION
-		} else if (!pte_file(oldpte)) {
-			swp_entry_t entry = pte_to_swp_entry(oldpte);
-
-			if (is_write_migration_entry(entry)) {
-				/*
-				 * A protection check is difficult so
-				 * just be safe and disable write
-				 */
-				make_migration_entry_read(&entry);
-				set_pte_at(mm, addr, pte,
-					swp_entry_to_pte(entry));
-			}
-#endif
-		}
-
-	} while (pte++, addr += PAGE_SIZE, addr != end);
-	arch_leave_lazy_mmu_mode();
-	pte_unmap_unlock(pte - 1, ptl);
-}
-
-static inline void change_pmd_range(struct mm_struct *mm, pud_t *pud,
-		unsigned long addr, unsigned long end, pgprot_t newprot,
-		int dirty_accountable)
-{
-	pmd_t *pmd;
-	unsigned long next;
-
-	pmd = pmd_offset(pud, addr);
-	do {
-		next = pmd_addr_end(addr, end);
-		if (pmd_none_or_clear_bad(pmd))
-			continue;
-		change_pte_range(mm, pmd, addr, next, newprot, dirty_accountable);
-	} while (pmd++, addr = next, addr != end);
-}
-
-static inline void change_pud_range(struct mm_struct *mm, pgd_t *pgd,
-		unsigned long addr, unsigned long end, pgprot_t newprot,
-		int dirty_accountable)
-{
-	pud_t *pud;
-	unsigned long next;
-
-	pud = pud_offset(pgd, addr);
-	do {
-		next = pud_addr_end(addr, end);
-		if (pud_none_or_clear_bad(pud))
-			continue;
-		change_pmd_range(mm, pud, addr, next, newprot, dirty_accountable);
-	} while (pud++, addr = next, addr != end);
-}
-
 static void change_protection(struct vm_area_struct *vma,
 		unsigned long addr, unsigned long end, pgprot_t newprot,
 		int dirty_accountable)
 {
-	struct mm_struct *mm = vma->vm_mm;
-	pgd_t *pgd;
-	unsigned long next;
 	unsigned long start = addr;
 
 	BUG_ON(addr >= end);
-	pgd = pgd_offset(mm, addr);
+
 	flush_cache_range(vma, addr, end);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_none_or_clear_bad(pgd))
-			continue;
-		change_pud_range(mm, pgd, addr, next, newprot, dirty_accountable);
-	} while (pgd++, addr = next, addr != end);
+	change_protection_read_iterator(vma, addr, end, newprot, dirty_accountable);
 	flush_tlb_range(vma, start, end);
 }
 
Index: linux-2.6.20-rc4/include/linux/pt-iterator-ops.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/pt-iterator-ops.h	2007-01-11 13:37:53.612438000 +1100
+++ linux-2.6.20-rc4/include/linux/pt-iterator-ops.h	2007-01-11 13:37:59.476438000 +1100
@@ -1,3 +1,5 @@
+#include <linux/rmap.h>
+#include <asm/tlb.h>
 
 static inline void add_mm_rss(struct mm_struct *mm, int file_rss, int anon_rss)
 {
@@ -166,3 +168,45 @@
 	BUG_ON(!pte_none(*pte));
 	set_pte_at(mm, addr, pte, pfn_pte(pfn, prot));
 }
+
+static inline void
+change_prot_pte(struct mm_struct *mm, pte_t *pte,
+	unsigned long addr, pgprot_t newprot, int dirty_accountable)
+
+{
+	pte_t oldpte;
+	oldpte = *pte;
+
+	if (pte_present(oldpte)) {
+		pte_t ptent;
+
+		/* Avoid an SMP race with hardware updated dirty/clean
+		 * bits by wiping the pte and then setting the new pte
+		 * into place.
+		 */
+		ptent = ptep_get_and_clear(mm, addr, pte);
+		ptent = pte_modify(ptent, newprot);
+		/*
+		 * Avoid taking write faults for pages we know to be
+		 * dirty.
+		 */
+		if (dirty_accountable && pte_dirty(ptent))
+			ptent = pte_mkwrite(ptent);
+		set_pte_at(mm, addr, pte, ptent);
+		lazy_mmu_prot_update(ptent);
+#ifdef CONFIG_MIGRATION
+	} else if (!pte_file(oldpte)) {
+		swp_entry_t entry = pte_to_swp_entry(oldpte);
+
+		if (is_write_migration_entry(entry)) {
+			/*
+			 * A protection check is difficult so
+			 * just be safe and disable write
+			 */
+			make_migration_entry_read(&entry);
+			set_pte_at(mm, addr, pte,
+				swp_entry_to_pte(entry));
+		}
+#endif
+	}
+}
Index: linux-2.6.20-rc4/mm/pt-default.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/pt-default.c	2007-01-11 13:37:53.600438000 +1100
+++ linux-2.6.20-rc4/mm/pt-default.c	2007-01-11 13:37:59.476438000 +1100
@@ -644,3 +644,69 @@
 	} while (pgd++, addr = next, addr != end);
 	return 0;
 }
+
+static void change_pte_range(struct mm_struct *mm, pmd_t *pmd,
+		unsigned long addr, unsigned long end, pgprot_t newprot,
+		int dirty_accountable)
+{
+	pte_t *pte;
+	spinlock_t *ptl;
+
+	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+	arch_enter_lazy_mmu_mode();
+	do {
+		change_prot_pte(mm, pte, addr, newprot, dirty_accountable);
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+	arch_leave_lazy_mmu_mode();
+	pte_unmap_unlock(pte - 1, ptl);
+}
+
+static inline void change_pmd_range(struct mm_struct *mm, pud_t *pud,
+		unsigned long addr, unsigned long end, pgprot_t newprot,
+		int dirty_accountable)
+{
+	pmd_t *pmd;
+	unsigned long next;
+
+	pmd = pmd_offset(pud, addr);
+	do {
+		next = pmd_addr_end(addr, end);
+		if (pmd_none_or_clear_bad(pmd))
+			continue;
+		change_pte_range(mm, pmd, addr, next, newprot, dirty_accountable);
+	} while (pmd++, addr = next, addr != end);
+}
+
+static inline void change_pud_range(struct mm_struct *mm, pgd_t *pgd,
+		unsigned long addr, unsigned long end, pgprot_t newprot,
+		int dirty_accountable)
+{
+	pud_t *pud;
+	unsigned long next;
+
+	pud = pud_offset(pgd, addr);
+	do {
+		next = pud_addr_end(addr, end);
+		if (pud_none_or_clear_bad(pud))
+			continue;
+		change_pmd_range(mm, pud, addr, next, newprot, dirty_accountable);
+	} while (pud++, addr = next, addr != end);
+}
+
+void change_protection_read_iterator(struct vm_area_struct *vma,
+		unsigned long addr, unsigned long end, pgprot_t newprot,
+		int dirty_accountable)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pgd_t *pgd;
+	unsigned long next;
+
+	pgd = pgd_offset(mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none_or_clear_bad(pgd)) {
+			continue;
+		}
+		change_pud_range(mm, pgd, addr, next, newprot, dirty_accountable);
+	} while (pgd++, addr = next, addr != end);
+}

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 21/29] Abstract unmap vm area
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (19 preceding siblings ...)
  2007-01-13  2:47 ` [PATCH 20/29] Abstract change protection iterator Paul Davies
@ 2007-01-13  2:47 ` Paul Davies
  2007-01-13  2:47 ` [PATCH 22/29] Abstract map " Paul Davies
                   ` (28 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:47 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 21
 * Moves the default page table iterator, vunmap_read_iterator, from
 vmalloc.c to pt-default.c
 * Abstracts the operation performed by this iterator into vunmap_one_pte
 and puts it in pt-iterator-ops.h (see the note on iterator naming below)
 * Adds include guards to swapops.h.
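
The read/build naming distinguishes whether a walk may allocate page table
levels: read iterators skip missing levels (pgd_none_or_clear_bad and
friends), while build iterators allocate them (pud_alloc, pmd_alloc,
pte_alloc_kernel).  The pt.h declaration assumed for this patch is
presumably:

/* include/linux/pt.h (assumed): a read iterator never allocates levels */
void vunmap_read_iterator(unsigned long addr, unsigned long end);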

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 include/linux/pt-iterator-ops.h |    9 ++++++
 include/linux/swapops.h         |    4 ++
 mm/pt-default.c                 |   55 ++++++++++++++++++++++++++++++++++++++++
 mm/vmalloc.c                    |   52 +------------------------------------
 4 files changed, 70 insertions(+), 50 deletions(-)
Index: linux-2.6.20-rc4/include/linux/pt-iterator-ops.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/pt-iterator-ops.h	2007-01-11 13:37:59.476438000 +1100
+++ linux-2.6.20-rc4/include/linux/pt-iterator-ops.h	2007-01-11 13:38:30.348438000 +1100
@@ -1,4 +1,6 @@
 #include <linux/rmap.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
 #include <asm/tlb.h>
 
 static inline void add_mm_rss(struct mm_struct *mm, int file_rss, int anon_rss)
@@ -210,3 +212,10 @@
 #endif
 	}
 }
+
+static inline void
+vunmap_one_pte(pte_t *pte, unsigned long address)
+{
+	pte_t ptent = ptep_get_and_clear(&init_mm, address, pte);
+	WARN_ON(!pte_none(ptent) && !pte_present(ptent));
+}
Index: linux-2.6.20-rc4/mm/pt-default.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/pt-default.c	2007-01-11 13:37:59.476438000 +1100
+++ linux-2.6.20-rc4/mm/pt-default.c	2007-01-11 13:38:30.352438000 +1100
@@ -710,3 +710,58 @@
 		change_pud_range(mm, pgd, addr, next, newprot, dirty_accountable);
 	} while (pgd++, addr = next, addr != end);
 }
+
+static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
+{
+	pte_t *pte;
+
+	pte = pte_offset_kernel(pmd, addr);
+	do {
+		vunmap_one_pte(pte, addr);
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+}
+
+static inline void vunmap_pmd_range(pud_t *pud, unsigned long addr,
+						unsigned long end)
+{
+	pmd_t *pmd;
+	unsigned long next;
+
+	pmd = pmd_offset(pud, addr);
+	do {
+		next = pmd_addr_end(addr, end);
+		if (pmd_none_or_clear_bad(pmd))
+			continue;
+		vunmap_pte_range(pmd, addr, next);
+	} while (pmd++, addr = next, addr != end);
+}
+
+static inline void vunmap_pud_range(pgd_t *pgd, unsigned long addr,
+						unsigned long end)
+{
+	pud_t *pud;
+	unsigned long next;
+
+	pud = pud_offset(pgd, addr);
+	do {
+		next = pud_addr_end(addr, end);
+		if (pud_none_or_clear_bad(pud))
+			continue;
+		vunmap_pmd_range(pud, addr, next);
+	} while (pud++, addr = next, addr != end);
+}
+
+void vunmap_read_iterator(unsigned long addr, unsigned long end)
+{
+	pgd_t *pgd;
+	unsigned long next;
+
+	pgd = pgd_offset_k(addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none_or_clear_bad(pgd))
+			continue;
+		vunmap_pud_range(pgd, addr, next);
+	} while (pgd++, addr = next, addr != end);
+}
+
Index: linux-2.6.20-rc4/mm/vmalloc.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/vmalloc.c	2007-01-11 13:30:52.416438000 +1100
+++ linux-2.6.20-rc4/mm/vmalloc.c	2007-01-11 13:38:30.352438000 +1100
@@ -16,6 +16,7 @@
 #include <linux/interrupt.h>
 
 #include <linux/vmalloc.h>
+#include <linux/pt.h>
 
 #include <asm/uaccess.h>
 #include <asm/tlbflush.h>
@@ -27,63 +28,14 @@
 static void *__vmalloc_node(unsigned long size, gfp_t gfp_mask, pgprot_t prot,
 			    int node);
 
-static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
-{
-	pte_t *pte;
-
-	pte = pte_offset_kernel(pmd, addr);
-	do {
-		pte_t ptent = ptep_get_and_clear(&init_mm, addr, pte);
-		WARN_ON(!pte_none(ptent) && !pte_present(ptent));
-	} while (pte++, addr += PAGE_SIZE, addr != end);
-}
-
-static inline void vunmap_pmd_range(pud_t *pud, unsigned long addr,
-						unsigned long end)
-{
-	pmd_t *pmd;
-	unsigned long next;
-
-	pmd = pmd_offset(pud, addr);
-	do {
-		next = pmd_addr_end(addr, end);
-		if (pmd_none_or_clear_bad(pmd))
-			continue;
-		vunmap_pte_range(pmd, addr, next);
-	} while (pmd++, addr = next, addr != end);
-}
-
-static inline void vunmap_pud_range(pgd_t *pgd, unsigned long addr,
-						unsigned long end)
-{
-	pud_t *pud;
-	unsigned long next;
-
-	pud = pud_offset(pgd, addr);
-	do {
-		next = pud_addr_end(addr, end);
-		if (pud_none_or_clear_bad(pud))
-			continue;
-		vunmap_pmd_range(pud, addr, next);
-	} while (pud++, addr = next, addr != end);
-}
-
 void unmap_vm_area(struct vm_struct *area)
 {
-	pgd_t *pgd;
-	unsigned long next;
 	unsigned long addr = (unsigned long) area->addr;
 	unsigned long end = addr + area->size;
 
 	BUG_ON(addr >= end);
-	pgd = pgd_offset_k(addr);
 	flush_cache_vunmap(addr, end);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_none_or_clear_bad(pgd))
-			continue;
-		vunmap_pud_range(pgd, addr, next);
-	} while (pgd++, addr = next, addr != end);
+	vunmap_read_iterator(addr, end);
 	flush_tlb_kernel_range((unsigned long) area->addr, end);
 }
 
Index: linux-2.6.20-rc4/include/linux/swapops.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/swapops.h	2007-01-11 13:37:20.448438000 +1100
+++ linux-2.6.20-rc4/include/linux/swapops.h	2007-01-11 13:38:30.352438000 +1100
@@ -1,3 +1,6 @@
+#ifndef _LINUX_SWAPOPS_H
+#define _LINUX_SWAPOPS_H
+
 /*
  * swapcache pages are stored in the swapper_space radix tree.  We want to
  * get good packing density in that tree, so the index should be dense in
@@ -122,3 +125,4 @@
 
 #endif
 
+#endif

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 22/29] Abstract map vm area
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (20 preceding siblings ...)
  2007-01-13  2:47 ` [PATCH 21/29] Abstract unmap vm area Paul Davies
@ 2007-01-13  2:47 ` Paul Davies
  2007-01-13  2:47 ` [PATCH 23/29] Abstract unuse_vma Paul Davies
                   ` (27 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:47 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 22
 * Moves the default page table iterator behind map_vm_area from vmalloc.c
 to pt-default.c as vmap_build_iterator
 * Abstracts the operation performed by this iterator into vmap_one_pte
 and puts it in pt-iterator-ops.h (the page-array cursor is illustrated
 below).
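
The struct page *** parameter exists because vmap_one_pte advances a cursor
over the caller's page array.  A usage sketch, condensed from the
__vmalloc_area_node call path in vmalloc.c:

/* Condensed sketch: map the pages backing a vm_struct. */
static void *vmalloc_map_sketch(struct vm_struct *area, pgprot_t prot)
{
	struct page **pages = area->pages;	/* caller-owned array */

	/* Each successfully mapped pte consumes one entry and advances
	 * the cursor, hence the extra level of indirection. */
	if (map_vm_area(area, prot, &pages))
		return NULL;
	return area->addr;
}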

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 include/linux/pt-iterator-ops.h |   14 ++++++++
 mm/pt-default.c                 |   67 ++++++++++++++++++++++++++++++++++++++++
 mm/vmalloc.c                    |   63 -------------------------------------
 3 files changed, 82 insertions(+), 62 deletions(-)
Index: linux-2.6.20-rc4/mm/pt-default.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/pt-default.c	2007-01-11 13:38:30.352438000 +1100
+++ linux-2.6.20-rc4/mm/pt-default.c	2007-01-11 13:38:47.456438000 +1100
@@ -765,3 +765,70 @@
 	} while (pgd++, addr = next, addr != end);
 }
 
+static int vmap_pte_range(pmd_t *pmd, unsigned long addr,
+			unsigned long end, pgprot_t prot, struct page ***pages)
+{
+	pte_t *pte;
+	int err;
+
+	pte = pte_alloc_kernel(pmd, addr);
+	if (!pte)
+		return -ENOMEM;
+	do {
+		err = vmap_one_pte(pte, addr, pages, prot);
+		if (err)
+			return err;
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+	return 0;
+}
+
+static inline int vmap_pmd_range(pud_t *pud, unsigned long addr,
+			unsigned long end, pgprot_t prot, struct page ***pages)
+{
+	pmd_t *pmd;
+	unsigned long next;
+
+	pmd = pmd_alloc(&init_mm, pud, addr);
+	if (!pmd)
+		return -ENOMEM;
+	do {
+		next = pmd_addr_end(addr, end);
+		if (vmap_pte_range(pmd, addr, next, prot, pages))
+			return -ENOMEM;
+	} while (pmd++, addr = next, addr != end);
+	return 0;
+}
+
+static inline int vmap_pud_range(pgd_t *pgd, unsigned long addr,
+			unsigned long end, pgprot_t prot, struct page ***pages)
+{
+	pud_t *pud;
+	unsigned long next;
+
+	pud = pud_alloc(&init_mm, pgd, addr);
+	if (!pud)
+		return -ENOMEM;
+	do {
+		next = pud_addr_end(addr, end);
+		if (vmap_pmd_range(pud, addr, next, prot, pages))
+			return -ENOMEM;
+	} while (pud++, addr = next, addr != end);
+	return 0;
+}
+
+int vmap_build_iterator(unsigned long addr,
+			unsigned long end, pgprot_t prot, struct page ***pages)
+{
+	pgd_t *pgd;
+	unsigned long next;
+	int err;
+
+	pgd = pgd_offset_k(addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		err = vmap_pud_range(pgd, addr, next, prot, pages);
+		if (err)
+			break;
+	} while (pgd++, addr = next, addr != end);
+	return err;
+}
Index: linux-2.6.20-rc4/mm/vmalloc.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/vmalloc.c	2007-01-11 13:38:30.352438000 +1100
+++ linux-2.6.20-rc4/mm/vmalloc.c	2007-01-11 13:38:47.456438000 +1100
@@ -39,75 +39,14 @@
 	flush_tlb_kernel_range((unsigned long) area->addr, end);
 }
 
-static int vmap_pte_range(pmd_t *pmd, unsigned long addr,
-			unsigned long end, pgprot_t prot, struct page ***pages)
-{
-	pte_t *pte;
-
-	pte = pte_alloc_kernel(pmd, addr);
-	if (!pte)
-		return -ENOMEM;
-	do {
-		struct page *page = **pages;
-		WARN_ON(!pte_none(*pte));
-		if (!page)
-			return -ENOMEM;
-		set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
-		(*pages)++;
-	} while (pte++, addr += PAGE_SIZE, addr != end);
-	return 0;
-}
-
-static inline int vmap_pmd_range(pud_t *pud, unsigned long addr,
-			unsigned long end, pgprot_t prot, struct page ***pages)
-{
-	pmd_t *pmd;
-	unsigned long next;
-
-	pmd = pmd_alloc(&init_mm, pud, addr);
-	if (!pmd)
-		return -ENOMEM;
-	do {
-		next = pmd_addr_end(addr, end);
-		if (vmap_pte_range(pmd, addr, next, prot, pages))
-			return -ENOMEM;
-	} while (pmd++, addr = next, addr != end);
-	return 0;
-}
-
-static inline int vmap_pud_range(pgd_t *pgd, unsigned long addr,
-			unsigned long end, pgprot_t prot, struct page ***pages)
-{
-	pud_t *pud;
-	unsigned long next;
-
-	pud = pud_alloc(&init_mm, pgd, addr);
-	if (!pud)
-		return -ENOMEM;
-	do {
-		next = pud_addr_end(addr, end);
-		if (vmap_pmd_range(pud, addr, next, prot, pages))
-			return -ENOMEM;
-	} while (pud++, addr = next, addr != end);
-	return 0;
-}
-
 int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages)
 {
-	pgd_t *pgd;
-	unsigned long next;
 	unsigned long addr = (unsigned long) area->addr;
 	unsigned long end = addr + area->size - PAGE_SIZE;
 	int err;
 
 	BUG_ON(addr >= end);
-	pgd = pgd_offset_k(addr);
-	do {
-		next = pgd_addr_end(addr, end);
-		err = vmap_pud_range(pgd, addr, next, prot, pages);
-		if (err)
-			break;
-	} while (pgd++, addr = next, addr != end);
+	err = vmap_build_iterator(addr, end, prot, pages);
 	flush_cache_vmap((unsigned long) area->addr, end);
 	return err;
 }
Index: linux-2.6.20-rc4/include/linux/pt-iterator-ops.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/pt-iterator-ops.h	2007-01-11 13:38:30.348438000 +1100
+++ linux-2.6.20-rc4/include/linux/pt-iterator-ops.h	2007-01-11 13:38:47.456438000 +1100
@@ -219,3 +219,17 @@
 	pte_t ptent = ptep_get_and_clear(&init_mm, address, pte);
 	WARN_ON(!pte_none(ptent) && !pte_present(ptent));
 }
+
+static inline int
+vmap_one_pte(pte_t *pte, unsigned long addr,
+			struct page ***pages, pgprot_t prot)
+{
+	struct page *page = **pages;
+
+	WARN_ON(!pte_none(*pte));
+	if (!page)
+		return -ENOMEM;
+	set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
+	(*pages)++;
+	return 0;
+}

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 23/29] Abstract unuse_vma
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (21 preceding siblings ...)
  2007-01-13  2:47 ` [PATCH 22/29] Abstract map " Paul Davies
@ 2007-01-13  2:47 ` Paul Davies
  2007-01-13  2:47 ` [PATCH 24/29] Abstract smaps iterator Paul Davies
                   ` (26 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:47 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 23
 * Moves the default page table iterator behind unuse_vma from swapfile.c
 to pt-default.c as unuse_vma_read_iterator (its early-exit return
 contract is sketched below)
 * Moves unuse_pte into pt-iterator-ops.h.
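
The iterator's return value encodes "entry found and replaced":
unuse_pte_range sets found = 1 on a matching pte, and every level up to
unuse_vma_read_iterator returns 1 to stop the walk early.  The consuming
loop in swapfile.c has essentially this shape (locking elided):

static int unuse_mm_sketch(struct mm_struct *mm, swp_entry_t entry,
		struct page *page)
{
	struct vm_area_struct *vma;

	for (vma = mm->mmap; vma; vma = vma->vm_next) {
		/* unuse_vma() returns 1 as soon as one pte matched */
		if (vma->anon_vma && unuse_vma(vma, entry, page))
			return 1;
	}
	return 0;
}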

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 include/linux/pt-iterator-ops.h |   21 ++++++++
 mm/pt-default.c                 |   79 ++++++++++++++++++++++++++++++++
 mm/swapfile.c                   |   97 +---------------------------------------
 3 files changed, 104 insertions(+), 93 deletions(-)
Index: linux-2.6.20-rc4/mm/pt-default.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/pt-default.c	2007-01-11 13:38:47.456438000 +1100
+++ linux-2.6.20-rc4/mm/pt-default.c	2007-01-11 13:38:51.872438000 +1100
@@ -832,3 +832,82 @@
 	} while (pgd++, addr = next, addr != end);
 	return 0;
 }
+
+static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
+				unsigned long addr, unsigned long end,
+				swp_entry_t entry, struct page *page)
+{
+	pte_t swp_pte = swp_entry_to_pte(entry);
+	pte_t *pte;
+	spinlock_t *ptl;
+	int found = 0;
+
+	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	do {
+		/*
+		 * swapoff spends a _lot_ of time in this loop!
+		 * Test inline before going to call unuse_pte.
+		 */
+		if (unlikely(pte_same(*pte, swp_pte))) {
+			unuse_pte(vma, pte++, addr, entry, page);
+			found = 1;
+			break;
+		}
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+	pte_unmap_unlock(pte - 1, ptl);
+	return found;
+}
+
+static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud,
+				unsigned long addr, unsigned long end,
+				swp_entry_t entry, struct page *page)
+{
+	pmd_t *pmd;
+	unsigned long next;
+
+	pmd = pmd_offset(pud, addr);
+	do {
+		next = pmd_addr_end(addr, end);
+		if (pmd_none_or_clear_bad(pmd))
+			continue;
+		if (unuse_pte_range(vma, pmd, addr, next, entry, page))
+			return 1;
+	} while (pmd++, addr = next, addr != end);
+	return 0;
+}
+
+static inline int unuse_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
+				unsigned long addr, unsigned long end,
+				swp_entry_t entry, struct page *page)
+{
+	pud_t *pud;
+	unsigned long next;
+
+	pud = pud_offset(pgd, addr);
+	do {
+		next = pud_addr_end(addr, end);
+		if (pud_none_or_clear_bad(pud))
+			continue;
+		if (unuse_pmd_range(vma, pud, addr, next, entry, page))
+			return 1;
+	} while (pud++, addr = next, addr != end);
+	return 0;
+}
+
+int unuse_vma_read_iterator(struct vm_area_struct *vma,
+				unsigned long addr, unsigned long end, swp_entry_t entry,
+				struct page *page)
+{
+	pgd_t *pgd;
+	unsigned long next;
+
+	pgd = pgd_offset(vma->vm_mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none_or_clear_bad(pgd))
+			continue;
+		if (unuse_pud_range(vma, pgd, addr, next, entry, page))
+			return 1;
+	} while (pgd++, addr = next, addr != end);
+	return 0;
+}
Index: linux-2.6.20-rc4/mm/swapfile.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/swapfile.c	2007-01-11 13:30:52.300438000 +1100
+++ linux-2.6.20-rc4/mm/swapfile.c	2007-01-11 13:38:51.876438000 +1100
@@ -27,6 +27,7 @@
 #include <linux/mutex.h>
 #include <linux/capability.h>
 #include <linux/syscalls.h>
+#include <linux/pt.h>
 
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
@@ -501,93 +502,10 @@
 }
 #endif
 
-/*
- * No need to decide whether this PTE shares the swap entry with others,
- * just let do_wp_page work it out if a write is requested later - to
- * force COW, vm_page_prot omits write permission from any private vma.
- */
-static void unuse_pte(struct vm_area_struct *vma, pte_t *pte,
-		unsigned long addr, swp_entry_t entry, struct page *page)
-{
-	inc_mm_counter(vma->vm_mm, anon_rss);
-	get_page(page);
-	set_pte_at(vma->vm_mm, addr, pte,
-		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
-	page_add_anon_rmap(page, vma, addr);
-	swap_free(entry);
-	/*
-	 * Move the page to the active list so it is not
-	 * immediately swapped out again after swapon.
-	 */
-	activate_page(page);
-}
-
-static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
-				unsigned long addr, unsigned long end,
-				swp_entry_t entry, struct page *page)
-{
-	pte_t swp_pte = swp_entry_to_pte(entry);
-	pte_t *pte;
-	spinlock_t *ptl;
-	int found = 0;
-
-	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
-	do {
-		/*
-		 * swapoff spends a _lot_ of time in this loop!
-		 * Test inline before going to call unuse_pte.
-		 */
-		if (unlikely(pte_same(*pte, swp_pte))) {
-			unuse_pte(vma, pte++, addr, entry, page);
-			found = 1;
-			break;
-		}
-	} while (pte++, addr += PAGE_SIZE, addr != end);
-	pte_unmap_unlock(pte - 1, ptl);
-	return found;
-}
-
-static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud,
-				unsigned long addr, unsigned long end,
-				swp_entry_t entry, struct page *page)
-{
-	pmd_t *pmd;
-	unsigned long next;
-
-	pmd = pmd_offset(pud, addr);
-	do {
-		next = pmd_addr_end(addr, end);
-		if (pmd_none_or_clear_bad(pmd))
-			continue;
-		if (unuse_pte_range(vma, pmd, addr, next, entry, page))
-			return 1;
-	} while (pmd++, addr = next, addr != end);
-	return 0;
-}
-
-static inline int unuse_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
-				unsigned long addr, unsigned long end,
-				swp_entry_t entry, struct page *page)
-{
-	pud_t *pud;
-	unsigned long next;
-
-	pud = pud_offset(pgd, addr);
-	do {
-		next = pud_addr_end(addr, end);
-		if (pud_none_or_clear_bad(pud))
-			continue;
-		if (unuse_pmd_range(vma, pud, addr, next, entry, page))
-			return 1;
-	} while (pud++, addr = next, addr != end);
-	return 0;
-}
-
 static int unuse_vma(struct vm_area_struct *vma,
 				swp_entry_t entry, struct page *page)
 {
-	pgd_t *pgd;
-	unsigned long addr, end, next;
+	unsigned long addr, end;
 
 	if (page->mapping) {
 		addr = page_address_in_vma(page, vma);
@@ -600,15 +518,8 @@
 		end = vma->vm_end;
 	}
 
-	pgd = pgd_offset(vma->vm_mm, addr);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_none_or_clear_bad(pgd))
-			continue;
-		if (unuse_pud_range(vma, pgd, addr, next, entry, page))
-			return 1;
-	} while (pgd++, addr = next, addr != end);
-	return 0;
+	return unuse_vma_read_iterator(vma, addr, end, entry, page);
 }
 
 static int unuse_mm(struct mm_struct *mm,
Index: linux-2.6.20-rc4/include/linux/pt-iterator-ops.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/pt-iterator-ops.h	2007-01-11 13:38:47.456438000 +1100
+++ linux-2.6.20-rc4/include/linux/pt-iterator-ops.h	2007-01-11 13:38:51.876438000 +1100
@@ -233,3 +233,24 @@
 	(*pages)++;
 	return 0;
 }
+
+/*
+ * No need to decide whether this PTE shares the swap entry with others,
+ * just let do_wp_page work it out if a write is requested later - to
+ * force COW, vm_page_prot omits write permission from any private vma.
+ */
+static void unuse_pte(struct vm_area_struct *vma, pte_t *pte,
+		unsigned long addr, swp_entry_t entry, struct page *page)
+{
+	inc_mm_counter(vma->vm_mm, anon_rss);
+	get_page(page);
+	set_pte_at(vma->vm_mm, addr, pte,
+		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
+	page_add_anon_rmap(page, vma, addr);
+	swap_free(entry);
+	/*
+	 * Move the page to the active list so it is not
+	 * immediately swapped out again after swapon.
+	 */
+	activate_page(page);
+}


* [PATCH 24/29] Abstract smaps iterator
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (22 preceding siblings ...)
  2007-01-13  2:47 ` [PATCH 23/29] Abstract unuse_vma Paul Davies
@ 2007-01-13  2:47 ` Paul Davies
  2007-01-13  2:47 ` [PATCH 25/29] Abstact mempolicy iterator Paul Davies
                   ` (25 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:47 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 24
 * move the smaps iterator from the default page table implementation
 in task_mmu.c to pt-default.c
 * relocate the mem_size_stats struct from task_mmu.c to mm.h
 * abstract smaps_one_pte from the iterator and place it in pt-iterator-ops.h
   (a usage sketch follows below)
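
For illustration, a minimal usage sketch of what generic code sees after
this patch (it mirrors the show_smap() call site in the diff below;
nothing here is new API beyond that call site):

	struct mem_size_stats mss;

	memset(&mss, 0, sizeof(mss));
	/* The format of the page table walked behind this call is now
	 * an implementation detail of the PTI. */
	if (vma->vm_mm && !is_vm_hugetlb_page(vma))
		smaps_read_iterator(vma, vma->vm_start, vma->vm_end, &mss);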

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 fs/proc/task_mmu.c              |  107 +++-------------------------------------
 include/linux/mm.h              |    9 +++
 include/linux/pt-iterator-ops.h |   32 +++++++++++
 include/linux/pt.h              |    4 -
 mm/pt-default.c                 |   63 +++++++++++++++++++++++
 5 files changed, 114 insertions(+), 101 deletions(-)
Index: linux-2.6.20-rc4/fs/proc/task_mmu.c
===================================================================
--- linux-2.6.20-rc4.orig/fs/proc/task_mmu.c	2007-01-11 13:30:52.244438000 +1100
+++ linux-2.6.20-rc4/fs/proc/task_mmu.c	2007-01-11 13:38:55.480438000 +1100
@@ -5,6 +5,7 @@
 #include <linux/highmem.h>
 #include <linux/pagemap.h>
 #include <linux/mempolicy.h>
+#include <linux/pt.h>
 
 #include <asm/elf.h>
 #include <asm/uaccess.h>
@@ -42,16 +43,19 @@
 		"VmData:\t%8lu kB\n"
 		"VmStk:\t%8lu kB\n"
 		"VmExe:\t%8lu kB\n"
-		"VmLib:\t%8lu kB\n"
-		"VmPTE:\t%8lu kB\n",
+		"VmLib:\t%8lu kB\n",
 		hiwater_vm << (PAGE_SHIFT-10),
 		(total_vm - mm->reserved_vm) << (PAGE_SHIFT-10),
 		mm->locked_vm << (PAGE_SHIFT-10),
 		hiwater_rss << (PAGE_SHIFT-10),
 		total_rss << (PAGE_SHIFT-10),
 		data << (PAGE_SHIFT-10),
-		mm->stack_vm << (PAGE_SHIFT-10), text, lib,
+		mm->stack_vm << (PAGE_SHIFT-10), text, lib);
+#ifdef CONFIG_PT_DEFAULT
+	buffer += sprintf(buffer,
+		"VmPTE:\t%8lu kB\n",
 		(PTRS_PER_PTE*sizeof(pte_t)*mm->nr_ptes) >> 10);
+#endif
 	return buffer;
 }
 
@@ -113,15 +117,6 @@
 	seq_printf(m, "%*c", len, ' ');
 }
 
-struct mem_size_stats
-{
-	unsigned long resident;
-	unsigned long shared_clean;
-	unsigned long shared_dirty;
-	unsigned long private_clean;
-	unsigned long private_dirty;
-};
-
 static int show_map_internal(struct seq_file *m, void *v, struct mem_size_stats *mss)
 {
 	struct proc_maps_private *priv = m->private;
@@ -204,90 +199,6 @@
 	return show_map_internal(m, v, NULL);
 }
 
-static void smaps_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
-				unsigned long addr, unsigned long end,
-				struct mem_size_stats *mss)
-{
-	pte_t *pte, ptent;
-	spinlock_t *ptl;
-	struct page *page;
-
-	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
-	do {
-		ptent = *pte;
-		if (!pte_present(ptent))
-			continue;
-
-		mss->resident += PAGE_SIZE;
-
-		page = vm_normal_page(vma, addr, ptent);
-		if (!page)
-			continue;
-
-		if (page_mapcount(page) >= 2) {
-			if (pte_dirty(ptent))
-				mss->shared_dirty += PAGE_SIZE;
-			else
-				mss->shared_clean += PAGE_SIZE;
-		} else {
-			if (pte_dirty(ptent))
-				mss->private_dirty += PAGE_SIZE;
-			else
-				mss->private_clean += PAGE_SIZE;
-		}
-	} while (pte++, addr += PAGE_SIZE, addr != end);
-	pte_unmap_unlock(pte - 1, ptl);
-	cond_resched();
-}
-
-static inline void smaps_pmd_range(struct vm_area_struct *vma, pud_t *pud,
-				unsigned long addr, unsigned long end,
-				struct mem_size_stats *mss)
-{
-	pmd_t *pmd;
-	unsigned long next;
-
-	pmd = pmd_offset(pud, addr);
-	do {
-		next = pmd_addr_end(addr, end);
-		if (pmd_none_or_clear_bad(pmd))
-			continue;
-		smaps_pte_range(vma, pmd, addr, next, mss);
-	} while (pmd++, addr = next, addr != end);
-}
-
-static inline void smaps_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
-				unsigned long addr, unsigned long end,
-				struct mem_size_stats *mss)
-{
-	pud_t *pud;
-	unsigned long next;
-
-	pud = pud_offset(pgd, addr);
-	do {
-		next = pud_addr_end(addr, end);
-		if (pud_none_or_clear_bad(pud))
-			continue;
-		smaps_pmd_range(vma, pud, addr, next, mss);
-	} while (pud++, addr = next, addr != end);
-}
-
-static inline void smaps_pgd_range(struct vm_area_struct *vma,
-				unsigned long addr, unsigned long end,
-				struct mem_size_stats *mss)
-{
-	pgd_t *pgd;
-	unsigned long next;
-
-	pgd = pgd_offset(vma->vm_mm, addr);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_none_or_clear_bad(pgd))
-			continue;
-		smaps_pud_range(vma, pgd, addr, next, mss);
-	} while (pgd++, addr = next, addr != end);
-}
-
 static int show_smap(struct seq_file *m, void *v)
 {
 	struct vm_area_struct *vma = v;
@@ -295,10 +206,10 @@
 
 	memset(&mss, 0, sizeof mss);
 	if (vma->vm_mm && !is_vm_hugetlb_page(vma))
-		smaps_pgd_range(vma, vma->vm_start, vma->vm_end, &mss);
+		smaps_read_iterator(vma, vma->vm_start, vma->vm_end, &mss);
 	return show_map_internal(m, v, &mss);
 }
 static void *m_start(struct seq_file *m, loff_t *pos)
 {
 	struct proc_maps_private *priv = m->private;
Index: linux-2.6.20-rc4/include/linux/mm.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/mm.h	2007-01-11 13:37:23.144438000 +1100
+++ linux-2.6.20-rc4/include/linux/mm.h	2007-01-11 13:38:55.484438000 +1100
@@ -858,6 +858,15 @@
 #include <linux/pt-default-mm.h>
 #endif
 
+struct mem_size_stats
+{
+	unsigned long resident;
+	unsigned long shared_clean;
+	unsigned long shared_dirty;
+	unsigned long private_clean;
+	unsigned long private_dirty;
+};
+
 extern void free_area_init(unsigned long * zones_size);
 extern void free_area_init_node(int nid, pg_data_t *pgdat,
 	unsigned long * zones_size, unsigned long zone_start_pfn, 
Index: linux-2.6.20-rc4/include/linux/pt.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/pt.h	2007-01-11 13:37:46.832438000 +1100
+++ linux-2.6.20-rc4/include/linux/pt.h	2007-01-11 13:38:55.484438000 +1100
@@ -47,8 +47,8 @@
 int unuse_vma_read_iterator(struct vm_area_struct *vma,
 		unsigned long addr, unsigned long end, swp_entry_t entry, struct page *page);
 
-/*void smaps_read_iterator(struct vm_area_struct *vma,
-  unsigned long addr, unsigned long end, struct mem_size_stats *mss);*/
+void smaps_read_iterator(struct vm_area_struct *vma,
+  unsigned long addr, unsigned long end, struct mem_size_stats *mss);
 
 int check_policy_read_iterator(struct vm_area_struct *vma,
 		unsigned long addr, unsigned long end, const nodemask_t *nodes,
Index: linux-2.6.20-rc4/mm/pt-default.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/pt-default.c	2007-01-11 13:38:51.872438000 +1100
+++ linux-2.6.20-rc4/mm/pt-default.c	2007-01-11 13:38:55.484438000 +1100
@@ -911,3 +911,66 @@
 	} while (pgd++, addr = next, addr != end);
 	return 0;
 }
+
+static void smaps_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
+				unsigned long addr, unsigned long end,
+				struct mem_size_stats *mss)
+{
+	pte_t *pte;
+	spinlock_t *ptl;
+
+	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	do {
+		smaps_one_pte(vma, addr, pte, mss);
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+	pte_unmap_unlock(pte - 1, ptl);
+	cond_resched();
+}
+
+static inline void smaps_pmd_range(struct vm_area_struct *vma, pud_t *pud,
+				unsigned long addr, unsigned long end,
+				struct mem_size_stats *mss)
+{
+	pmd_t *pmd;
+	unsigned long next;
+
+	pmd = pmd_offset(pud, addr);
+	do {
+		next = pmd_addr_end(addr, end);
+		if (pmd_none_or_clear_bad(pmd))
+			continue;
+		smaps_pte_range(vma, pmd, addr, next, mss);
+	} while (pmd++, addr = next, addr != end);
+}
+
+static inline void smaps_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
+				unsigned long addr, unsigned long end,
+				struct mem_size_stats *mss)
+{
+	pud_t *pud;
+	unsigned long next;
+
+	pud = pud_offset(pgd, addr);
+	do {
+		next = pud_addr_end(addr, end);
+		if (pud_none_or_clear_bad(pud))
+			continue;
+		smaps_pmd_range(vma, pud, addr, next, mss);
+	} while (pud++, addr = next, addr != end);
+}
+
+void smaps_read_iterator(struct vm_area_struct *vma,
+				unsigned long addr, unsigned long end,
+				struct mem_size_stats *mss)
+{
+	pgd_t *pgd;
+	unsigned long next;
+
+	pgd = pgd_offset(vma->vm_mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none_or_clear_bad(pgd))
+			continue;
+		smaps_pud_range(vma, pgd, addr, next, mss);
+	} while (pgd++, addr = next, addr != end);
+}
Index: linux-2.6.20-rc4/include/linux/pt-iterator-ops.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/pt-iterator-ops.h	2007-01-11 13:38:51.876438000 +1100
+++ linux-2.6.20-rc4/include/linux/pt-iterator-ops.h	2007-01-11 13:38:55.488438000 +1100
@@ -239,7 +239,7 @@
  * just let do_wp_page work it out if a write is requested later - to
  * force COW, vm_page_prot omits write permission from any private vma.
  */
-static void unuse_pte(struct vm_area_struct *vma, pte_t *pte,
+static inline void unuse_pte(struct vm_area_struct *vma, pte_t *pte,
 		unsigned long addr, swp_entry_t entry, struct page *page)
 {
 	inc_mm_counter(vma->vm_mm, anon_rss);
@@ -254,3 +254,33 @@
 	 */
 	activate_page(page);
 }
+
+static inline void smaps_one_pte(struct vm_area_struct *vma, unsigned long addr, pte_t *pte,
+			   struct mem_size_stats *mss)
+{
+	pte_t ptent;
+	struct page *page;
+
+	ptent = *pte;
+	if (!pte_present(ptent))
+		return;
+
+	mss->resident += PAGE_SIZE;
+
+	page = vm_normal_page(vma, addr, ptent);
+	if (!page)
+		return;
+
+	if (page_mapcount(page) >= 2) {
+		if (pte_dirty(ptent))
+			mss->shared_dirty += PAGE_SIZE;
+		else
+			mss->shared_clean += PAGE_SIZE;
+	} else {
+		if (pte_dirty(ptent))
+			mss->private_dirty += PAGE_SIZE;
+		else
+			mss->private_clean += PAGE_SIZE;
+	}
+}
+


* [PATCH 25/29] Abstract mempolicy iterator
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (23 preceding siblings ...)
  2007-01-13  2:47 ` [PATCH 24/29] Abstract smaps iterator Paul Davies
@ 2007-01-13  2:47 ` Paul Davies
  2007-01-13  2:47 ` [PATCH 26/29] Abstract mempolicy iterator cont Paul Davies
                   ` (24 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:47 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 25
 * Start moving the mempolicy iterator from mempolicy.c to pt-default.c
   (see the call-site sketch below)
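
Once the move is complete (continued in the next patch), the generic call
site inside check_range()'s vma loop reduces to the following sketch, with
start, endvma, nodes, flags and private as in mempolicy.c:

	/* Generic code no longer walks the pgd/pud/pmd levels itself. */
	err = check_policy_read_iterator(vma, start, endvma, nodes,
				flags, private);
	if (err) {
		first = ERR_PTR(err);
		break;
	}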

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 mempolicy.c  |  108 -----------------------------------------------------------
 pt-default.c |   84 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+), 108 deletions(-)
Index: linux-2.6.20-rc3/mm/mempolicy.c
===================================================================
--- linux-2.6.20-rc3.orig/mm/mempolicy.c	2007-01-09 16:01:20.604363000 +1100
+++ linux-2.6.20-rc3/mm/mempolicy.c	2007-01-09 16:05:35.496363000 +1100
@@ -208,114 +208,6 @@
 static void migrate_page_add(struct page *page, struct list_head *pagelist,
 				unsigned long flags);
 
-/* Scan through pages checking if pages follow certain conditions. */
-static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
-		unsigned long addr, unsigned long end,
-		const nodemask_t *nodes, unsigned long flags,
-		void *private)
-{
-	pte_t *orig_pte;
-	pte_t *pte;
-	spinlock_t *ptl;
-
-	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
-	do {
-		struct page *page;
-		int nid;
-
-		if (!pte_present(*pte))
-			continue;
-		page = vm_normal_page(vma, addr, *pte);
-		if (!page)
-			continue;
-		/*
-		 * The check for PageReserved here is important to avoid
-		 * handling zero pages and other pages that may have been
-		 * marked special by the system.
-		 *
-		 * If the PageReserved would not be checked here then f.e.
-		 * the location of the zero page could have an influence
-		 * on MPOL_MF_STRICT, zero pages would be counted for
-		 * the per node stats, and there would be useless attempts
-		 * to put zero pages on the migration list.
-		 */
-		if (PageReserved(page))
-			continue;
-		nid = page_to_nid(page);
-		if (node_isset(nid, *nodes) == !!(flags & MPOL_MF_INVERT))
-			continue;
-
-		if (flags & MPOL_MF_STATS)
-			gather_stats(page, private, pte_dirty(*pte));
-		else if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
-			migrate_page_add(page, private, flags);
-		else
-			break;
-	} while (pte++, addr += PAGE_SIZE, addr != end);
-	pte_unmap_unlock(orig_pte, ptl);
-	return addr != end;
-}
-
-static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
-		unsigned long addr, unsigned long end,
-		const nodemask_t *nodes, unsigned long flags,
-		void *private)
-{
-	pmd_t *pmd;
-	unsigned long next;
-
-	pmd = pmd_offset(pud, addr);
-	do {
-		next = pmd_addr_end(addr, end);
-		if (pmd_none_or_clear_bad(pmd))
-			continue;
-		if (check_pte_range(vma, pmd, addr, next, nodes,
-				    flags, private))
-			return -EIO;
-	} while (pmd++, addr = next, addr != end);
-	return 0;
-}
-
-static inline int check_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
-		unsigned long addr, unsigned long end,
-		const nodemask_t *nodes, unsigned long flags,
-		void *private)
-{
-	pud_t *pud;
-	unsigned long next;
-
-	pud = pud_offset(pgd, addr);
-	do {
-		next = pud_addr_end(addr, end);
-		if (pud_none_or_clear_bad(pud))
-			continue;
-		if (check_pmd_range(vma, pud, addr, next, nodes,
-				    flags, private))
-			return -EIO;
-	} while (pud++, addr = next, addr != end);
-	return 0;
-}
-
-static inline int check_pgd_range(struct vm_area_struct *vma,
-		unsigned long addr, unsigned long end,
-		const nodemask_t *nodes, unsigned long flags,
-		void *private)
-{
-	pgd_t *pgd;
-	unsigned long next;
-
-	pgd = pgd_offset(vma->vm_mm, addr);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_none_or_clear_bad(pgd))
-			continue;
-		if (check_pud_range(vma, pgd, addr, next, nodes,
-				    flags, private))
-			return -EIO;
-	} while (pgd++, addr = next, addr != end);
-	return 0;
-}
-
 /* Check if a vma is migratable */
 static inline int vma_migratable(struct vm_area_struct *vma)
 {
Index: linux-2.6.20-rc3/mm/pt-default.c
===================================================================
--- linux-2.6.20-rc3.orig/mm/pt-default.c	2007-01-09 16:05:30.932363000 +1100
+++ linux-2.6.20-rc3/mm/pt-default.c	2007-01-09 16:05:35.496363000 +1100
@@ -974,3 +974,87 @@
 		smaps_pud_range(vma, pgd, addr, next, mss);
 	} while (pgd++, addr = next, addr != end);
 }
+
+#ifdef CONFIG_NUMA
+/* Scan through pages checking if pages follow certain conditions. */
+static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
+		unsigned long addr, unsigned long end,
+		const nodemask_t *nodes, unsigned long flags,
+		void *private)
+{
+	pte_t *orig_pte;
+	pte_t *pte;
+	spinlock_t *ptl;
+	int ret;
+
+	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	do {
+		ret = mempolicy_check_one_pte(vma, addr, pte, nodes, flags, private);
+		if (ret)
+			break;
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+	pte_unmap_unlock(orig_pte, ptl);
+	return addr != end;
+}
+
+static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
+		unsigned long addr, unsigned long end,
+		const nodemask_t *nodes, unsigned long flags,
+		void *private)
+{
+	pmd_t *pmd;
+	unsigned long next;
+
+	pmd = pmd_offset(pud, addr);
+	do {
+		next = pmd_addr_end(addr, end);
+		if (pmd_none_or_clear_bad(pmd))
+			continue;
+		if (check_pte_range(vma, pmd, addr, next, nodes,
+				    flags, private))
+			return -EIO;
+	} while (pmd++, addr = next, addr != end);
+	return 0;
+}
+
+static inline int check_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
+		unsigned long addr, unsigned long end,
+		const nodemask_t *nodes, unsigned long flags,
+		void *private)
+{
+	pud_t *pud;
+	unsigned long next;
+
+	pud = pud_offset(pgd, addr);
+	do {
+		next = pud_addr_end(addr, end);
+		if (pud_none_or_clear_bad(pud))
+			continue;
+		if (check_pmd_range(vma, pud, addr, next, nodes,
+				    flags, private))
+			return -EIO;
+	} while (pud++, addr = next, addr != end);
+	return 0;
+}
+
+int check_policy_read_iterator(struct vm_area_struct *vma,
+		unsigned long addr, unsigned long end,
+		const nodemask_t *nodes, unsigned long flags,
+		void *private)
+{
+	pgd_t *pgd;
+	unsigned long next;
+
+	pgd = pgd_offset(vma->vm_mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none_or_clear_bad(pgd))
+			continue;
+		if (check_pud_range(vma, pgd, addr, next, nodes,
+				    flags, private))
+			return -EIO;
+	} while (pgd++, addr = next, addr != end);
+	return 0;
+}
+
+#endif


* [PATCH 26/29] Abstract mempolicy iterator cont...
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (24 preceding siblings ...)
  2007-01-13  2:47 ` [PATCH 25/29] Abstract mempolicy iterator Paul Davies
@ 2007-01-13  2:47 ` Paul Davies
  2007-01-13  2:48 ` [PATCH 27/29] Abstract implementation dependent code for mremap Paul Davies
                   ` (23 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:47 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 26
 * Continue moving the default page table mempolicy iterator implementation
 to pt-default.c.
 * abstract mempolicy_check_one_pte and place it in pt-iterator-ops.h
   (driven as in the sketch below)
   * move some macros from mempolicy.c to mempolicy.h to make this possible.
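
The split leaves the locking and the leaf loop in the implementation and
the policy logic in the shared header; a minimal sketch of how the per-PTE
op is driven, mirroring check_pte_range() in the diff below:

	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
	do {
		/* Implementation-neutral policy check, one PTE at a time;
		 * a non-zero return stops the walk early. */
		if (mempolicy_check_one_pte(vma, addr, pte, nodes,
				flags, private))
			break;
	} while (pte++, addr += PAGE_SIZE, addr != end);
	pte_unmap_unlock(orig_pte, ptl);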

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 include/linux/mempolicy.h       |   10 +++++++++
 include/linux/pt-iterator-ops.h |   43 ++++++++++++++++++++++++++++++++++++++++
 mm/mempolicy.c                  |   20 +++++++-----------
 3 files changed, 61 insertions(+), 12 deletions(-)
Index: linux-2.6.20-rc4/include/linux/pt-iterator-ops.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/pt-iterator-ops.h	2007-01-11 13:38:55.488438000 +1100
+++ linux-2.6.20-rc4/include/linux/pt-iterator-ops.h	2007-01-11 13:39:01.788438000 +1100
@@ -1,6 +1,7 @@
 #include <linux/rmap.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
+#include <linux/mempolicy.h>
 #include <asm/tlb.h>
 
 static inline void add_mm_rss(struct mm_struct *mm, int file_rss, int anon_rss)
@@ -284,3 +285,45 @@
 	}
 }
 
+#ifdef CONFIG_NUMA
+static inline int mempolicy_check_one_pte(struct vm_area_struct *vma, unsigned long addr,
+				pte_t *pte, const nodemask_t *nodes, unsigned long flags,
+				void *private)
+{
+	struct page *page;
+	unsigned int nid;
+
+	if (!pte_present(*pte))
+		return 0;
+	page = vm_normal_page(vma, addr, *pte);
+	if (!page)
+		return 0;
+	/*
+	 * The check for PageReserved here is important to avoid
+	 * handling zero pages and other pages that may have been
+	 * marked special by the system.
+	 *
+	 * If the PageReserved would not be checked here then f.e.
+	 * the location of the zero page could have an influence
+	 * on MPOL_MF_STRICT, zero pages would be counted for
+	 * the per node stats, and there would be useless attempts
+	 * to put zero pages on the migration list.
+	 */
+	if (PageReserved(page))
+		return 0;
+	nid = page_to_nid(page);
+	if (node_isset(nid, *nodes) == !!(flags & MPOL_MF_INVERT))
+		return 0;
+
+	if (flags & MPOL_MF_STATS)
+		gather_stats(page, private, pte_dirty(*pte));
+	else if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
+		migrate_page_add(page, private, flags);
+	else
+		return 1;
+
+	return 0;
+}
+#endif
Index: linux-2.6.20-rc4/include/linux/mempolicy.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/mempolicy.h	2007-01-11 13:30:52.128438000 +1100
+++ linux-2.6.20-rc4/include/linux/mempolicy.h	2007-01-11 13:39:01.788438000 +1100
@@ -59,6 +59,16 @@
  * Copying policy objects:
  * For MPOL_BIND the zonelist must be always duplicated. mpol_clone() does this.
  */
+
+/* Internal flags */
+#define MPOL_MF_DISCONTIG_OK (MPOL_MF_INTERNAL << 0)	/* Skip checks for continuous vmas */
+#define MPOL_MF_INVERT (MPOL_MF_INTERNAL << 1)		/* Invert check for nodemask */
+#define MPOL_MF_STATS (MPOL_MF_INTERNAL << 2)		/* Gather statistics */
+
+void gather_stats(struct page *, void *, int pte_dirty);
+void migrate_page_add(struct page *page, struct list_head *pagelist,
+				unsigned long flags);
+
 struct mempolicy {
 	atomic_t refcnt;
 	short policy; 	/* See MPOL_* above */
Index: linux-2.6.20-rc4/mm/mempolicy.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/mempolicy.c	2007-01-11 13:39:00.152438000 +1100
+++ linux-2.6.20-rc4/mm/mempolicy.c	2007-01-11 13:39:01.792438000 +1100
@@ -89,15 +89,11 @@
 #include <linux/migrate.h>
 #include <linux/rmap.h>
 #include <linux/security.h>
+#include <linux/pt.h>
 
 #include <asm/tlbflush.h>
 #include <asm/uaccess.h>
 
-/* Internal flags */
-#define MPOL_MF_DISCONTIG_OK (MPOL_MF_INTERNAL << 0)	/* Skip checks for continuous vmas */
-#define MPOL_MF_INVERT (MPOL_MF_INTERNAL << 1)		/* Invert check for nodemask */
-#define MPOL_MF_STATS (MPOL_MF_INTERNAL << 2)		/* Gather statistics */
-
 static struct kmem_cache *policy_cache;
 static struct kmem_cache *sn_cache;
 
@@ -204,8 +200,8 @@
 	return policy;
 }
 
-static void gather_stats(struct page *, void *, int pte_dirty);
-static void migrate_page_add(struct page *page, struct list_head *pagelist,
+void gather_stats(struct page *, void *, int pte_dirty);
+void migrate_page_add(struct page *page, struct list_head *pagelist,
 				unsigned long flags);
 
 /* Check if a vma is migratable */
@@ -257,7 +253,7 @@
 				endvma = end;
 			if (vma->vm_start > start)
 				start = vma->vm_start;
-			err = check_pgd_range(vma, start, endvma, nodes,
+			err = check_policy_read_iterator(vma, start, endvma, nodes,
 						flags, private);
 			if (err) {
 				first = ERR_PTR(err);
@@ -478,7 +474,7 @@
 /*
  * page migration
  */
-static void migrate_page_add(struct page *page, struct list_head *pagelist,
+void migrate_page_add(struct page *page, struct list_head *pagelist,
 				unsigned long flags)
 {
 	/*
@@ -610,7 +606,7 @@
 }
 #else
 
-static void migrate_page_add(struct page *page, struct list_head *pagelist,
+void migrate_page_add(struct page *page, struct list_head *pagelist,
 				unsigned long flags)
 {
 }
@@ -1664,7 +1660,7 @@
 	unsigned long node[MAX_NUMNODES];
 };
 
-static void gather_stats(struct page *page, void *private, int pte_dirty)
+void gather_stats(struct page *page, void *private, int pte_dirty)
 {
 	struct numa_maps *md = private;
 	int count = page_mapcount(page);
@@ -1761,7 +1757,7 @@
 		check_huge_range(vma, vma->vm_start, vma->vm_end, md);
 		seq_printf(m, " huge");
 	} else {
-		check_pgd_range(vma, vma->vm_start, vma->vm_end,
+		check_policy_read_iterator(vma, vma->vm_start, vma->vm_end,
 				&node_online_map, MPOL_MF_STATS, md);
 	}
 


* [PATCH 27/29] Abstract implementation dependent code for mremap
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (25 preceding siblings ...)
  2007-01-13  2:47 ` [PATCH 26/29] Abstract mempolicy iterator cont Paul Davies
@ 2007-01-13  2:48 ` Paul Davies
  2007-01-13  2:48 ` [PATCH 28/29] Abstract ioremap iterator Paul Davies
                   ` (22 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:48 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 27
 * Move implementation dependent page table code from mremap.c to
 pt-default.c.  move_page_tables is now part of the page table interface.
   * Add partial page table lookup functions to pt-default-mm.h to
   facilitate the abstraction of the page table dependent code
   (their contract is sketched below).
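
A sketch of the two helpers' contract inside move_page_tables()'s copy
loop, assuming the pt-default implementation (the comments are editorial,
not from the patch):

	old_pmd = lookup_pmd(vma->vm_mm, old_addr);
	if (!old_pmd)		/* lookup never allocates: NULL if unmapped */
		continue;
	new_pmd = build_pmd(vma->vm_mm, new_addr);
	if (!new_pmd)		/* build allocates pud/pmd/pte as needed */
		break;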

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 include/linux/pt-default-mm.h |   49 +++++++++++++++
 mm/mremap.c                   |  133 ------------------------------------------
 mm/pt-default.c               |   90 ++++++++++++++++++++++++++++
 3 files changed, 140 insertions(+), 132 deletions(-)
Index: linux-2.6.20-rc4/mm/mremap.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/mremap.c	2007-01-11 12:40:58.728788000 +1100
+++ linux-2.6.20-rc4/mm/mremap.c	2007-01-11 12:41:42.240788000 +1100
@@ -18,143 +18,12 @@
 #include <linux/highmem.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
+#include <linux/pt.h>
 
 #include <asm/uaccess.h>
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
 
-static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr)
-{
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
-
-	pgd = pgd_offset(mm, addr);
-	if (pgd_none_or_clear_bad(pgd))
-		return NULL;
-
-	pud = pud_offset(pgd, addr);
-	if (pud_none_or_clear_bad(pud))
-		return NULL;
-
-	pmd = pmd_offset(pud, addr);
-	if (pmd_none_or_clear_bad(pmd))
-		return NULL;
-
-	return pmd;
-}
-
-static pmd_t *alloc_new_pmd(struct mm_struct *mm, unsigned long addr)
-{
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
-
-	pgd = pgd_offset(mm, addr);
-	pud = pud_alloc(mm, pgd, addr);
-	if (!pud)
-		return NULL;
-
-	pmd = pmd_alloc(mm, pud, addr);
-	if (!pmd)
-		return NULL;
-
-	if (!pmd_present(*pmd) && __pte_alloc(mm, pmd, addr))
-		return NULL;
-
-	return pmd;
-}
-
-static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
-		unsigned long old_addr, unsigned long old_end,
-		struct vm_area_struct *new_vma, pmd_t *new_pmd,
-		unsigned long new_addr)
-{
-	struct address_space *mapping = NULL;
-	struct mm_struct *mm = vma->vm_mm;
-	pte_t *old_pte, *new_pte, pte;
-	spinlock_t *old_ptl, *new_ptl;
-
-	if (vma->vm_file) {
-		/*
-		 * Subtle point from Rajesh Venkatasubramanian: before
-		 * moving file-based ptes, we must lock vmtruncate out,
-		 * since it might clean the dst vma before the src vma,
-		 * and we propagate stale pages into the dst afterward.
-		 */
-		mapping = vma->vm_file->f_mapping;
-		spin_lock(&mapping->i_mmap_lock);
-		if (new_vma->vm_truncate_count &&
-		    new_vma->vm_truncate_count != vma->vm_truncate_count)
-			new_vma->vm_truncate_count = 0;
-	}
-
-	/*
-	 * We don't have to worry about the ordering of src and dst
-	 * pte locks because exclusive mmap_sem prevents deadlock.
-	 */
-	old_pte = pte_offset_map_lock(mm, old_pmd, old_addr, &old_ptl);
- 	new_pte = pte_offset_map_nested(new_pmd, new_addr);
-	new_ptl = pte_lockptr(mm, new_pmd);
-	if (new_ptl != old_ptl)
-		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
-	arch_enter_lazy_mmu_mode();
-
-	for (; old_addr < old_end; old_pte++, old_addr += PAGE_SIZE,
-				   new_pte++, new_addr += PAGE_SIZE) {
-		if (pte_none(*old_pte))
-			continue;
-		pte = ptep_clear_flush(vma, old_addr, old_pte);
-		/* ZERO_PAGE can be dependant on virtual addr */
-		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
-		set_pte_at(mm, new_addr, new_pte, pte);
-	}
-
-	arch_leave_lazy_mmu_mode();
-	if (new_ptl != old_ptl)
-		spin_unlock(new_ptl);
-	pte_unmap_nested(new_pte - 1);
-	pte_unmap_unlock(old_pte - 1, old_ptl);
-	if (mapping)
-		spin_unlock(&mapping->i_mmap_lock);
-}
-
-#define LATENCY_LIMIT	(64 * PAGE_SIZE)
-
-static unsigned long move_page_tables(struct vm_area_struct *vma,
-		unsigned long old_addr, struct vm_area_struct *new_vma,
-		unsigned long new_addr, unsigned long len)
-{
-	unsigned long extent, next, old_end;
-	pmd_t *old_pmd, *new_pmd;
-
-	old_end = old_addr + len;
-	flush_cache_range(vma, old_addr, old_end);
-
-	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
-		cond_resched();
-		next = (old_addr + PMD_SIZE) & PMD_MASK;
-		if (next - 1 > old_end)
-			next = old_end;
-		extent = next - old_addr;
-		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
-		if (!old_pmd)
-			continue;
-		new_pmd = alloc_new_pmd(vma->vm_mm, new_addr);
-		if (!new_pmd)
-			break;
-		next = (new_addr + PMD_SIZE) & PMD_MASK;
-		if (extent > next - new_addr)
-			extent = next - new_addr;
-		if (extent > LATENCY_LIMIT)
-			extent = LATENCY_LIMIT;
-		move_ptes(vma, old_pmd, old_addr, old_addr + extent,
-				new_vma, new_pmd, new_addr);
-	}
-
-	return len + old_addr - old_end;	/* how much done */
-}
-
 static unsigned long move_vma(struct vm_area_struct *vma,
 		unsigned long old_addr, unsigned long old_len,
 		unsigned long new_len, unsigned long new_addr)
Index: linux-2.6.20-rc4/mm/pt-default.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/pt-default.c	2007-01-11 12:40:58.728788000 +1100
+++ linux-2.6.20-rc4/mm/pt-default.c	2007-01-11 12:41:42.240788000 +1100
@@ -1058,3 +1058,93 @@
 }
 
 #endif
+
+static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
+		unsigned long old_addr, unsigned long old_end,
+		struct vm_area_struct *new_vma, pmd_t *new_pmd,
+		unsigned long new_addr)
+{
+	struct address_space *mapping = NULL;
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t *old_pte, *new_pte, pte;
+	spinlock_t *old_ptl, *new_ptl;
+
+	if (vma->vm_file) {
+		/*
+		 * Subtle point from Rajesh Venkatasubramanian: before
+		 * moving file-based ptes, we must lock vmtruncate out,
+		 * since it might clean the dst vma before the src vma,
+		 * and we propagate stale pages into the dst afterward.
+		 */
+		mapping = vma->vm_file->f_mapping;
+		spin_lock(&mapping->i_mmap_lock);
+		if (new_vma->vm_truncate_count &&
+		    new_vma->vm_truncate_count != vma->vm_truncate_count)
+			new_vma->vm_truncate_count = 0;
+	}
+
+	/*
+	 * We don't have to worry about the ordering of src and dst
+	 * pte locks because exclusive mmap_sem prevents deadlock.
+	 */
+	old_pte = pte_offset_map_lock(mm, old_pmd, old_addr, &old_ptl);
+ 	new_pte = pte_offset_map_nested(new_pmd, new_addr);
+	new_ptl = pte_lockptr(mm, new_pmd);
+	if (new_ptl != old_ptl)
+		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
+	arch_enter_lazy_mmu_mode();
+
+	for (; old_addr < old_end; old_pte++, old_addr += PAGE_SIZE,
+				   new_pte++, new_addr += PAGE_SIZE) {
+		if (pte_none(*old_pte))
+			continue;
+		pte = ptep_clear_flush(vma, old_addr, old_pte);
+		/* ZERO_PAGE can be dependant on virtual addr */
+		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
+		set_pte_at(mm, new_addr, new_pte, pte);
+	}
+
+	arch_leave_lazy_mmu_mode();
+	if (new_ptl != old_ptl)
+		spin_unlock(new_ptl);
+	pte_unmap_nested(new_pte - 1);
+	pte_unmap_unlock(old_pte - 1, old_ptl);
+	if (mapping)
+		spin_unlock(&mapping->i_mmap_lock);
+}
+
+#define LATENCY_LIMIT	(64 * PAGE_SIZE)
+
+unsigned long move_page_tables(struct vm_area_struct *vma,
+		unsigned long old_addr, struct vm_area_struct *new_vma,
+		unsigned long new_addr, unsigned long len)
+{
+	unsigned long extent, next, old_end;
+	pmd_t *old_pmd, *new_pmd;
+
+	old_end = old_addr + len;
+	flush_cache_range(vma, old_addr, old_end);
+
+	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
+		cond_resched();
+		next = (old_addr + PMD_SIZE) & PMD_MASK;
+		if (next - 1 > old_end)
+			next = old_end;
+		extent = next - old_addr;
+		old_pmd = lookup_pmd(vma->vm_mm, old_addr);
+		if (!old_pmd)
+			continue;
+		new_pmd = build_pmd(vma->vm_mm, new_addr);
+		if (!new_pmd)
+			break;
+		next = (new_addr + PMD_SIZE) & PMD_MASK;
+		if (extent > next - new_addr)
+			extent = next - new_addr;
+		if (extent > LATENCY_LIMIT)
+			extent = LATENCY_LIMIT;
+		move_ptes(vma, old_pmd, old_addr, old_addr + extent,
+				new_vma, new_pmd, new_addr);
+	}
+
+	return len + old_addr - old_end;	/* how much done */
+}
Index: linux-2.6.20-rc4/include/linux/pt-default-mm.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/pt-default-mm.h	2007-01-11 12:40:58.752788000 +1100
+++ linux-2.6.20-rc4/include/linux/pt-default-mm.h	2007-01-11 12:41:42.268788000 +1100
@@ -72,5 +72,54 @@
 	((unlikely(!pmd_present(*(pmd))) && __pte_alloc_kernel(pmd, address))? \
 		NULL: pte_offset_kernel(pmd, address))
 
+static inline pmd_t *lookup_pmd(struct mm_struct *mm, unsigned long address)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	if (mm != &init_mm)	/* look up user page table */
+		pgd = pgd_offset(mm, address);
+	else			/* look up kernel page table */
+		pgd = pgd_offset_k(address);
+	if (pgd_none_or_clear_bad(pgd))
+		return NULL;
+
+	pud = pud_offset(pgd, address);
+	if (pud_none_or_clear_bad(pud))
+		return NULL;
+
+	pmd = pmd_offset(pud, address);
+	if (pmd_none_or_clear_bad(pmd))
+		return NULL;
+
+	return pmd;
+}
+
+static inline pmd_t *build_pmd(struct mm_struct *mm, unsigned long addr)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pgd = pgd_offset(mm, addr);
+	pud = pud_alloc(mm, pgd, addr);
+	if (!pud)
+		return NULL;
+
+	pmd = pmd_alloc(mm, pud, addr);
+	if (!pmd)
+		return NULL;
+
+	if (!pmd_present(*pmd) && __pte_alloc(mm, pmd, addr))
+		return NULL;
+
+	return pmd;
+}
 
 #endif


* [PATCH 28/29] Abstract ioremap iterator
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (26 preceding siblings ...)
  2007-01-13  2:48 ` [PATCH 27/29] Abstract implementation dependent code for mremap Paul Davies
@ 2007-01-13  2:48 ` Paul Davies
  2007-01-13  2:48 ` [PATCH 29/29] Tweak i386 arch dependent files to work with PTI Paul Davies
                   ` (21 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:48 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 28
 * Move the ioremap iterator from /lib/ioremap.c to pt-default.c
 * Abstract ioremap_one_pte from the iterator and put it in pt-iterator-ops.h
   (see the leaf-loop sketch below)
 * Remove ioremap.c and update /lib/Makefile
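
After the move, the leaf level reduces to driving the shared per-PTE op;
a minimal sketch of the resulting loop, mirroring ioremap_pte_range() in
the diff below:

	pte = pte_alloc_kernel(pmd, addr);
	if (!pte)
		return -ENOMEM;
	do {
		/* Shared per-PTE op from pt-iterator-ops.h: checks that
		 * the slot is empty, then installs the mapping. */
		ioremap_one_pte(pte, addr, pfn++, prot);
	} while (pte++, addr += PAGE_SIZE, addr != end);
	return 0;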

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 lib/ioremap.c                                    |   91 -----------------------
 linux-2.6.20-rc4/include/linux/pt-iterator-ops.h |    8 ++
 linux-2.6.20-rc4/lib/Makefile                    |    1 
 linux-2.6.20-rc4/mm/mmap.c                       |    2 
 linux-2.6.20-rc4/mm/pt-default.c                 |   82 ++++++++++++++++++++
 5 files changed, 92 insertions(+), 92 deletions(-)
Index: linux-2.6.20-rc4/lib/Makefile
===================================================================
--- linux-2.6.20-rc4.orig/lib/Makefile	2007-01-11 13:30:52.020438000 +1100
+++ linux-2.6.20-rc4/lib/Makefile	2007-01-11 13:39:06.924438000 +1100
@@ -7,7 +7,6 @@
 	 idr.o div64.o int_sqrt.o bitmap.o extable.o prio_tree.o \
 	 sha1.o irq_regs.o reciprocal_div.o
 
-lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
 
 lib-y	+= kobject.o kref.o kobject_uevent.o klist.o
Index: linux-2.6.20-rc4/mm/pt-default.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/pt-default.c	2007-01-11 13:39:05.324438000 +1100
+++ linux-2.6.20-rc4/mm/pt-default.c	2007-01-11 13:39:06.928438000 +1100
@@ -1059,6 +1059,88 @@
 
 #endif
 
+static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
+		unsigned long end, unsigned long phys_addr, pgprot_t prot)
+{
+	pte_t *pte;
+	unsigned long pfn;
+
+	pfn = phys_addr >> PAGE_SHIFT;
+	pte = pte_alloc_kernel(pmd, addr);
+	if (!pte)
+		return -ENOMEM;
+	do {
+		ioremap_one_pte(pte, addr, pfn++, prot);
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+	return 0;
+}
+
+static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
+		unsigned long end, unsigned long phys_addr, pgprot_t prot)
+{
+	pmd_t *pmd;
+	unsigned long next;
+
+	phys_addr -= addr;
+	pmd = pmd_alloc(&init_mm, pud, addr);
+	if (!pmd)
+		return -ENOMEM;
+	do {
+		next = pmd_addr_end(addr, end);
+		if (ioremap_pte_range(pmd, addr, next, phys_addr + addr, prot))
+			return -ENOMEM;
+	} while (pmd++, addr = next, addr != end);
+	return 0;
+}
+
+static inline int ioremap_pud_range(pgd_t *pgd, unsigned long addr,
+		unsigned long end, unsigned long phys_addr, pgprot_t prot)
+{
+	pud_t *pud;
+	unsigned long next;
+
+	phys_addr -= addr;
+	pud = pud_alloc(&init_mm, pgd, addr);
+	if (!pud)
+		return -ENOMEM;
+	do {
+		next = pud_addr_end(addr, end);
+		if (ioremap_pmd_range(pud, addr, next, phys_addr + addr, prot))
+			return -ENOMEM;
+	} while (pud++, addr = next, addr != end);
+	return 0;
+}
+
+int ioremap_page_range(unsigned long addr,
+		       unsigned long end, unsigned long phys_addr, pgprot_t prot)
+{
+	pgd_t *pgd;
+	unsigned long start;
+	unsigned long next;
+	int err;
+
+	BUG_ON(addr >= end);
+
+	start = addr;
+	phys_addr -= addr;
+	pgd = pgd_offset_k(addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		err = ioremap_pud_range(pgd, addr, next, phys_addr+addr, prot);
+		if (err)
+			break;
+	} while (pgd++, addr = next, addr != end);
+
+	flush_cache_vmap(start, end);
+
+	return err;
+}
+
 static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		unsigned long old_addr, unsigned long old_end,
 		struct vm_area_struct *new_vma, pmd_t *new_pmd,
Index: linux-2.6.20-rc4/mm/mmap.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/mmap.c	2007-01-11 13:30:52.020438000 +1100
+++ linux-2.6.20-rc4/mm/mmap.c	2007-01-11 13:39:06.928438000 +1100
@@ -1987,7 +1987,9 @@
 	while (vma)
 		vma = remove_vma(vma);
 
+#ifdef CONFIG_PT_DEFAULT
 	BUG_ON(mm->nr_ptes > (FIRST_USER_ADDRESS+PMD_SIZE-1)>>PMD_SHIFT);
+#endif
 }
 
 /* Insert vm structure into process list sorted by address
Index: linux-2.6.20-rc4/lib/ioremap.c
===================================================================
--- linux-2.6.20-rc4.orig/lib/ioremap.c	2007-01-11 13:30:52.020438000 +1100
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,91 +0,0 @@
-/*
- * Re-map IO memory to kernel address space so that we can access it.
- * This is needed for high PCI addresses that aren't mapped in the
- * 640k-1MB IO memory area on PC's
- *
- * (C) Copyright 1995 1996 Linus Torvalds
- */
-#include <linux/vmalloc.h>
-#include <linux/mm.h>
-
-#include <asm/cacheflush.h>
-#include <asm/pgtable.h>
-
-static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
-		unsigned long end, unsigned long phys_addr, pgprot_t prot)
-{
-	pte_t *pte;
-	unsigned long pfn;
-
-	pfn = phys_addr >> PAGE_SHIFT;
-	pte = pte_alloc_kernel(pmd, addr);
-	if (!pte)
-		return -ENOMEM;
-	do {
-		BUG_ON(!pte_none(*pte));
-		set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot));
-		pfn++;
-	} while (pte++, addr += PAGE_SIZE, addr != end);
-	return 0;
-}
-
-static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
-		unsigned long end, unsigned long phys_addr, pgprot_t prot)
-{
-	pmd_t *pmd;
-	unsigned long next;
-
-	phys_addr -= addr;
-	pmd = pmd_alloc(&init_mm, pud, addr);
-	if (!pmd)
-		return -ENOMEM;
-	do {
-		next = pmd_addr_end(addr, end);
-		if (ioremap_pte_range(pmd, addr, next, phys_addr + addr, prot))
-			return -ENOMEM;
-	} while (pmd++, addr = next, addr != end);
-	return 0;
-}
-
-static inline int ioremap_pud_range(pgd_t *pgd, unsigned long addr,
-		unsigned long end, unsigned long phys_addr, pgprot_t prot)
-{
-	pud_t *pud;
-	unsigned long next;
-
-	phys_addr -= addr;
-	pud = pud_alloc(&init_mm, pgd, addr);
-	if (!pud)
-		return -ENOMEM;
-	do {
-		next = pud_addr_end(addr, end);
-		if (ioremap_pmd_range(pud, addr, next, phys_addr + addr, prot))
-			return -ENOMEM;
-	} while (pud++, addr = next, addr != end);
-	return 0;
-}
-
-int ioremap_page_range(unsigned long addr,
-		       unsigned long end, unsigned long phys_addr, pgprot_t prot)
-{
-	pgd_t *pgd;
-	unsigned long start;
-	unsigned long next;
-	int err;
-
-	BUG_ON(addr >= end);
-
-	start = addr;
-	phys_addr -= addr;
-	pgd = pgd_offset_k(addr);
-	do {
-		next = pgd_addr_end(addr, end);
-		err = ioremap_pud_range(pgd, addr, next, phys_addr+addr, prot);
-		if (err)
-			break;
-	} while (pgd++, addr = next, addr != end);
-
-	flush_cache_vmap(start, end);
-
-	return err;
-}
Index: linux-2.6.20-rc4/include/linux/pt-iterator-ops.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/pt-iterator-ops.h	2007-01-11 13:39:01.788438000 +1100
+++ linux-2.6.20-rc4/include/linux/pt-iterator-ops.h	2007-01-11 13:39:06.932438000 +1100
@@ -327,3 +327,11 @@
 	return 0;
 }
 #endif
+
+static inline void
+ioremap_one_pte(pte_t *pte, unsigned long addr, unsigned long pfn,
+				pgprot_t prot)
+{
+	BUG_ON(!pte_none(*pte));
+	set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot));
+}


* [PATCH 29/29] Tweak i386 arch dependent files to work with PTI
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (27 preceding siblings ...)
  2007-01-13  2:48 ` [PATCH 28/29] Abstract ioremap iterator Paul Davies
@ 2007-01-13  2:48 ` Paul Davies
  2007-01-13  2:48 ` [PATCH 1/5] Introduce IA64 page table interface Paul Davies
                   ` (20 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:48 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH 29 i386
 * Defines the default page table config option PT_DEFAULT in Kconfig.debug
 so that it appears under "Kernel hacking".
 * Adjusts arch dependent files that refer to the pgd in the mm_struct to
 reach it via the new generic page table type (there is no pgd in mm_struct
 any more); the type this assumes is sketched below.
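
A hypothetical sketch of the generic type these hunks assume (editorial
reconstruction only; the real definition is introduced with the PTI core
earlier in the series, not in this patch):

	/* Assumed shape: mm_struct embeds a generic page table object,
	 * which for the default implementation just wraps the pgd. */
	struct page_table {
		pgd_t *pgd;
	};

With that, e.g. load_cr3(next->pgd) becomes load_cr3(next->page_table.pgd).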

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 arch/i386/Kconfig.debug        |    9 +++++++++
 arch/i386/kernel/init_task.c   |    2 +-
 arch/i386/mm/fault.c           |    2 +-
 arch/i386/mm/pageattr.c        |    3 ++-
 include/asm-i386/mmu_context.h |    4 ++--
 include/asm-i386/pgtable.h     |    2 +-
 6 files changed, 16 insertions(+), 6 deletions(-)

Index: linux-2.6.20-rc3/include/asm-i386/mmu_context.h
===================================================================
--- linux-2.6.20-rc3.orig/include/asm-i386/mmu_context.h	2007-01-01 11:53:20.000000000 +1100
+++ linux-2.6.20-rc3/include/asm-i386/mmu_context.h	2007-01-06 02:53:38.000000000 +1100
@@ -38,7 +38,7 @@
 		cpu_set(cpu, next->cpu_vm_mask);
 
 		/* Re-load page tables */
-		load_cr3(next->pgd);
+		load_cr3(next->page_table.pgd);
 
 		/*
 		 * load the LDT, if the LDT is different:
@@ -55,7 +55,7 @@
 			/* We were in lazy tlb mode and leave_mm disabled 
 			 * tlb flush IPI delivery. We must reload %cr3.
 			 */
-			load_cr3(next->pgd);
+			load_cr3(next->page_table.pgd);
 			load_LDT_nolock(&next->context);
 		}
 	}
Index: linux-2.6.20-rc3/arch/i386/mm/pageattr.c
===================================================================
--- linux-2.6.20-rc3.orig/arch/i386/mm/pageattr.c	2007-01-01 11:53:20.000000000 +1100
+++ linux-2.6.20-rc3/arch/i386/mm/pageattr.c	2007-01-06 02:53:38.000000000 +1100
@@ -8,10 +8,11 @@
 #include <linux/highmem.h>
 #include <linux/module.h>
 #include <linux/slab.h>
+#include <linux/pt.h>
 #include <asm/uaccess.h>
 #include <asm/processor.h>
 #include <asm/tlbflush.h>
-#include <asm/pgalloc.h>
 #include <asm/sections.h>
 
 static DEFINE_SPINLOCK(cpa_lock);
Index: linux-2.6.20-rc3/include/asm-i386/pgtable.h
===================================================================
--- linux-2.6.20-rc3.orig/include/asm-i386/pgtable.h	2007-01-01 11:53:20.000000000 +1100
+++ linux-2.6.20-rc3/include/asm-i386/pgtable.h	2007-01-06 02:53:38.000000000 +1100
@@ -415,7 +415,7 @@
  * pgd_offset() returns a (pgd_t *)
  * pgd_index() is used get the offset into the pgd page's array of pgd_t's;
  */
-#define pgd_offset(mm, address) ((mm)->pgd+pgd_index(address))
+#define pgd_offset(mm, address) ((mm)->page_table.pgd+pgd_index(address))
 
 /*
  * a shortcut which implies the use of the kernel's pgd, instead
Index: linux-2.6.20-rc3/arch/i386/kernel/init_task.c
===================================================================
--- linux-2.6.20-rc3.orig/arch/i386/kernel/init_task.c	2007-01-01 11:53:20.000000000 +1100
+++ linux-2.6.20-rc3/arch/i386/kernel/init_task.c	2007-01-06 02:53:38.000000000 +1100
@@ -5,9 +5,9 @@
 #include <linux/init_task.h>
 #include <linux/fs.h>
 #include <linux/mqueue.h>
+#include <linux/pt.h>
 
 #include <asm/uaccess.h>
-#include <asm/pgtable.h>
 #include <asm/desc.h>
 
 static struct fs_struct init_fs = INIT_FS;
Index: linux-2.6.20-rc3/arch/i386/Kconfig.debug
===================================================================
--- linux-2.6.20-rc3.orig/arch/i386/Kconfig.debug	2007-01-01 11:53:20.000000000 +1100
+++ linux-2.6.20-rc3/arch/i386/Kconfig.debug	2007-01-06 02:53:38.000000000 +1100
@@ -6,6 +6,15 @@
 
 source "lib/Kconfig.debug"
 
+choice
+	prompt "Page table selection"
+	default PT_DEFAULT
+
+config PT_DEFAULT
+	bool "PT_DEFAULT"
+
+endchoice
+
 config EARLY_PRINTK
 	bool "Early printk" if EMBEDDED && DEBUG_KERNEL
 	default y
Index: linux-2.6.20-rc3/arch/i386/mm/fault.c
===================================================================
--- linux-2.6.20-rc3.orig/arch/i386/mm/fault.c	2007-01-06 05:02:32.000000000 +1100
+++ linux-2.6.20-rc3/arch/i386/mm/fault.c	2007-01-06 05:03:24.000000000 +1100
@@ -254,7 +254,7 @@
 	pmd_t *pmd, *pmd_k;
 
 	pgd += index;
-	pgd_k = init_mm.pgd + index;
+	pgd_k = init_mm.page_table.pgd + index;
 
 	if (!pgd_present(*pgd_k))
 		return NULL;


* [PATCH 1/5] Introduce IA64 page table interface
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (28 preceding siblings ...)
  2007-01-13  2:48 ` [PATCH 29/29] Tweak i386 arch dependent files to work with PTI Paul Davies
@ 2007-01-13  2:48 ` Paul Davies
  2007-01-13  2:48 ` [PATCH 2/5] Abstract pgtable Paul Davies
                   ` (19 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:48 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH IA64 01
 * Create /include/asm-ia64/pt.h and define the IA64 page table interface.
   * This file is for including various implementations of the page table.
   At the moment, just the default page table.
 * Create /include/asm-ia64/pt-default.h and place the abstracted page
 table dependent functions (for IA64) in there.
 * Call create_kernel_page_table in arch/ia64/kernel/setup.c (which does
 nothing for the current page table implementation, but may do for others).
 * Make an implementation independent call to look up the kernel page table
 in /arch/ia64/mm/fault.c (sketched below).
 * Call implementation independent build and lookup functions in
 /arch/ia64/mm/init.c.
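
A minimal sketch of the fault-path change, mirroring the fault.c hunk
below:

	pte_t *ptep, pte;

	/* One interface call replaces the open-coded pgd/pud/pmd walk. */
	ptep = lookup_page_table_k(address);
	if (!ptep)
		return 0;
	pte = *ptep;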

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 arch/ia64/kernel/setup.c      |    2 
 arch/ia64/mm/fault.c          |   19 +------
 arch/ia64/mm/init.c           |   59 ++--------------------
 include/asm-ia64/pt-default.h |  112 ++++++++++++++++++++++++++++++++++++++++++
 include/asm-ia64/pt.h         |   16 ++++++
 5 files changed, 140 insertions(+), 68 deletions(-)
Index: linux-2.6.20-rc1/include/asm-ia64/pt.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc1/include/asm-ia64/pt.h	2006-12-23 20:55:49.287909000 +1100
@@ -0,0 +1,16 @@
+#ifndef _ASM_IA64_PT_H
+#define _ASM_IA64_PT_H 1
+
+#ifdef CONFIG_PT_DEFAULT
+#include <asm/pt-default.h>
+#endif
+
+#ifndef CONFIG_PT_DEFAULT
+/* Implementations that do not provide these as inlines must supply
+ * out-of-line definitions. */
+void create_kernel_page_table(void);
+
+pte_t *build_page_table_k(unsigned long address);
+
+pte_t *build_page_table_k_bootmem(unsigned long address, int _node);
+
+pte_t *lookup_page_table_k(unsigned long address);
+#endif
+
+#endif
Index: linux-2.6.20-rc1/include/asm-ia64/pt-default.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc1/include/asm-ia64/pt-default.h	2006-12-23 20:55:19.759909000 +1100
@@ -0,0 +1,112 @@
+#ifndef _ASM_IA64_PT_DEFAULT_H
+#define _ASM_IA64_PT_DEFAULT_H 1
+
+#include <linux/bootmem.h>
+#include <asm/pgalloc.h>
+
+
+/* Create kernel page table */
+static inline void create_kernel_page_table(void) {}
+
+/* Lookup the kernel page table */
+static inline pte_t *lookup_page_table_k(unsigned long address)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *ptep;
+
+	pgd = pgd_offset_k(address);
+	if (pgd_none(*pgd) || pgd_bad(*pgd))
+		return NULL;
+
+	pud = pud_offset(pgd, address);
+	if (pud_none(*pud) || pud_bad(*pud))
+		return NULL;
+
+	pmd = pmd_offset(pud, address);
+	if (pmd_none(*pmd) || pmd_bad(*pmd))
+		return NULL;
+
+	ptep = pte_offset_kernel(pmd, address);
+
+	return ptep;
+}
+
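+/* Like lookup_page_table_k(), but on a missing level advances
+ * *end_address past that level and returns NULL so the caller can
+ * skip ahead. */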
+static inline pte_t *lookup_page_table_k2(unsigned long *end_address)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pgd = pgd_offset_k(*end_address);
+	if (pgd_none(*pgd)) {
+		*end_address += PGDIR_SIZE;
+		return NULL;
+	}
+
+	pud = pud_offset(pgd, *end_address);
+	if (pud_none(*pud)) {
+		*end_address += PUD_SIZE;
+		return NULL;
+	}
+
+	pmd = pmd_offset(pud, *end_address);
+	if (pmd_none(*pmd)) {
+		*end_address += PMD_SIZE;
+		return NULL;
+	}
+
+	return pte_offset_kernel(pmd, *end_address);
+}
+
+/* Build the kernel page table */
+static inline pte_t *build_page_table_k(unsigned long address)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pgd = pgd_offset_k(address);		/* note: this is NOT pgd_offset()! */
+
+	pud = pud_alloc(&init_mm, pgd, address);
+	if (!pud)
+		return NULL;
+	pmd = pmd_alloc(&init_mm, pud, address);
+	if (!pmd)
+		return NULL;
+
+	return  pte_alloc_kernel(pmd, address);
+}
+
+/* Builds the kernel page table from bootmem (before kernel memory allocation
+ * comes on line) */
+static inline pte_t *build_page_table_k_bootmem(unsigned long address, int _node)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	int node = _node;
+
+	pgd = pgd_offset_k(address);
+	if (pgd_none(*pgd))
+		pgd_populate(&init_mm, pgd,
+			     alloc_bootmem_pages_node(
+				     NODE_DATA(node), PAGE_SIZE));
+	pud = pud_offset(pgd, address);
+
+	if (pud_none(*pud))
+		pud_populate(&init_mm, pud,
+			     alloc_bootmem_pages_node(
+				     NODE_DATA(node), PAGE_SIZE));
+	pmd = pmd_offset(pud, address);
+
+	if (pmd_none(*pmd))
+		pmd_populate_kernel(&init_mm, pmd,
+				    alloc_bootmem_pages_node(
+					    NODE_DATA(node), PAGE_SIZE));
+	return pte_offset_kernel(pmd, address);
+}
+
+
+#endif /* _ASM_IA64_PT_DEFAULT_H */
Index: linux-2.6.20-rc1/arch/ia64/kernel/setup.c
===================================================================
--- linux-2.6.20-rc1.orig/arch/ia64/kernel/setup.c	2006-12-23 20:55:06.603909000 +1100
+++ linux-2.6.20-rc1/arch/ia64/kernel/setup.c	2006-12-23 20:55:19.763909000 +1100
@@ -61,6 +61,7 @@
 #include <asm/system.h>
 #include <asm/unistd.h>
 #include <asm/system.h>
+#include <asm/pt.h>
 
 #if defined(CONFIG_SMP) && (IA64_CPU_SIZE > PAGE_SIZE)
 # error "struct cpuinfo_ia64 too big!"
@@ -545,6 +546,7 @@
 		ia64_mca_init();
 
 	platform_setup(cmdline_p);
+	create_kernel_page_table();
 	paging_init();
 }
 
Index: linux-2.6.20-rc1/arch/ia64/mm/fault.c
===================================================================
--- linux-2.6.20-rc1.orig/arch/ia64/mm/fault.c	2006-12-23 20:55:06.603909000 +1100
+++ linux-2.6.20-rc1/arch/ia64/mm/fault.c	2006-12-23 20:55:19.763909000 +1100
@@ -16,6 +16,7 @@
 #include <asm/system.h>
 #include <asm/uaccess.h>
 #include <asm/kdebug.h>
+#include <asm/pt.h>
 
 extern void die (char *, struct pt_regs *, long);
 
@@ -57,27 +58,13 @@
  * Return TRUE if ADDRESS points at a page in the kernel's mapped segment
  * (inside region 5, on ia64) and that page is present.
  */
+
 static int
 mapped_kernel_page_is_present (unsigned long address)
 {
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
 	pte_t *ptep, pte;
 
-	pgd = pgd_offset_k(address);
-	if (pgd_none(*pgd) || pgd_bad(*pgd))
-		return 0;
-
-	pud = pud_offset(pgd, address);
-	if (pud_none(*pud) || pud_bad(*pud))
-		return 0;
-
-	pmd = pmd_offset(pud, address);
-	if (pmd_none(*pmd) || pmd_bad(*pmd))
-		return 0;
-
-	ptep = pte_offset_kernel(pmd, address);
+	ptep = lookup_page_table_k(address);
 	if (!ptep)
 		return 0;
 
Index: linux-2.6.20-rc1/arch/ia64/mm/init.c
===================================================================
--- linux-2.6.20-rc1.orig/arch/ia64/mm/init.c	2006-12-23 20:55:06.603909000 +1100
+++ linux-2.6.20-rc1/arch/ia64/mm/init.c	2006-12-23 20:55:19.763909000 +1100
@@ -35,6 +35,7 @@
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 #include <asm/mca.h>
+#include <asm/pt.h>
 
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 
@@ -269,25 +270,14 @@
 static struct page * __init
 put_kernel_page (struct page *page, unsigned long address, pgprot_t pgprot)
 {
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
 	pte_t *pte;
 
 	if (!PageReserved(page))
 		printk(KERN_ERR "put_kernel_page: page at 0x%p not in reserved memory\n",
 		       page_address(page));
 
-	pgd = pgd_offset_k(address);		/* note: this is NOT pgd_offset()! */
-
 	{
-		pud = pud_alloc(&init_mm, pgd, address);
-		if (!pud)
-			goto out;
-		pmd = pmd_alloc(&init_mm, pud, address);
-		if (!pmd)
-			goto out;
-		pte = pte_alloc_kernel(pmd, address);
+		pte = build_page_table_k(address);
 		if (!pte)
 			goto out;
 		if (!pte_none(*pte))
@@ -428,30 +418,11 @@
 		pgdat->node_start_pfn + pgdat->node_spanned_pages];
 
 	do {
-		pgd_t *pgd;
-		pud_t *pud;
-		pmd_t *pmd;
 		pte_t *pte;
 
-		pgd = pgd_offset_k(end_address);
-		if (pgd_none(*pgd)) {
-			end_address += PGDIR_SIZE;
-			continue;
-		}
-
-		pud = pud_offset(pgd, end_address);
-		if (pud_none(*pud)) {
-			end_address += PUD_SIZE;
-			continue;
-		}
-
-		pmd = pmd_offset(pud, end_address);
-		if (pmd_none(*pmd)) {
-			end_address += PMD_SIZE;
-			continue;
-		}
-
-		pte = pte_offset_kernel(pmd, end_address);
+		pte = lookup_page_table_k2(&end_address);
+		if (!pte)
+			continue;
 retry_pte:
 		if (pte_none(*pte)) {
 			end_address += PAGE_SIZE;
@@ -477,9 +448,6 @@
 	unsigned long address, start_page, end_page;
 	struct page *map_start, *map_end;
 	int node;
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
 	pte_t *pte;
 
 	map_start = vmem_map + (__pa(start) >> PAGE_SHIFT);
@@ -489,23 +457,11 @@
 	end_page = PAGE_ALIGN((unsigned long) map_end);
 	node = paddr_to_nid(__pa(start));
 
+	printk("MEMMAP\n");
 	for (address = start_page; address < end_page; address += PAGE_SIZE) {
-		pgd = pgd_offset_k(address);
-		if (pgd_none(*pgd))
-			pgd_populate(&init_mm, pgd, alloc_bootmem_pages_node(NODE_DATA(node), PAGE_SIZE));
-		pud = pud_offset(pgd, address);
-
-		if (pud_none(*pud))
-			pud_populate(&init_mm, pud, alloc_bootmem_pages_node(NODE_DATA(node), PAGE_SIZE));
-		pmd = pmd_offset(pud, address);
-
-		if (pmd_none(*pmd))
-			pmd_populate_kernel(&init_mm, pmd, alloc_bootmem_pages_node(NODE_DATA(node), PAGE_SIZE));
-		pte = pte_offset_kernel(pmd, address);
-
+		pte = build_page_table_k_bootmem(address, node);
 		if (pte_none(*pte))
-			set_pte(pte, pfn_pte(__pa(alloc_bootmem_pages_node(NODE_DATA(node), PAGE_SIZE)) >> PAGE_SHIFT,
-					     PAGE_KERNEL));
+			set_pte(pte, pfn_pte(__pa(alloc_bootmem_pages_node(NODE_DATA(node),
+				PAGE_SIZE)) >> PAGE_SHIFT, PAGE_KERNEL));
 	}
 	return 0;
 }


* [PATCH 2/5] Abstract pgtable
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (29 preceding siblings ...)
  2007-01-13  2:48 ` [PATCH 1/5] Introduce IA64 page table interface Paul Davies
@ 2007-01-13  2:48 ` Paul Davies
  2007-01-13  2:48 ` [PATCH 3/5] Abstract pgtable continued Paul Davies
                   ` (18 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:48 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH IA64 02
 * Create page-default.h and move implementation-dependent code from
 page.h into it.
 * Create pgtable-default.h and move implementation-dependent code from
 pgtable.h into it.
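
 For illustration only -- not part of the patch: with STRICT_MM_TYPECHECKS
 the moved typedefs make each page table level a distinct struct type, so
 mixing levels up is a compile error and conversions must go through the
 accessors:

	pgd_t pgd;
	pmd_t pmd = { 0 };

	pgd = pmd;			/* compile error: distinct types */
	pgd_val(pgd) = pmd_val(pmd);	/* explicit, visible conversion  */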

Signed-off-by: Paul Davies <pauld@gelato.unsw.edu.au>

---

 page-default.h    |   38 +++++++++++++++++++++++++++++++++++++
 page.h            |   20 ++++---------------
 pgtable-default.h |   54 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 pgtable.h         |   55 +++++-------------------------------------------------
 4 files changed, 103 insertions(+), 64 deletions(-)
Index: linux-2.6.20-rc1/include/asm-ia64/page-default.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc1/include/asm-ia64/page-default.h	2006-12-23 20:24:57.787909000 +1100
@@ -0,0 +1,38 @@
+#ifndef _ASM_IA64_PAGE_DEFAULT_H
+#define _ASM_IA64_PAGE_DEFAULT_H
+
+#ifdef __KERNEL__
+
+#ifdef STRICT_MM_TYPECHECKS
+  /*
+   * These are used to make use of C type-checking..
+   */
+  typedef struct { unsigned long pmd; } pmd_t;
+#ifdef CONFIG_PGTABLE_4
+  typedef struct { unsigned long pud; } pud_t;
+#endif
+  typedef struct { unsigned long pgd; } pgd_t;
+
+# define pmd_val(x)	((x).pmd)
+#ifdef CONFIG_PGTABLE_4
+# define pud_val(x)	((x).pud)
+#endif
+# define pgd_val(x)	((x).pgd)
+
+#else /* !STRICT_MM_TYPECHECKS */
+  /*
+   * .. while these make it easier on the compiler
+   */
+# ifndef __ASSEMBLY__
+    typedef unsigned long pmd_t;
+    typedef unsigned long pgd_t;
+# endif
+
+# define pmd_val(x)	(x)
+# define pgd_val(x)	(x)
+
+# define __pgd(x)	(x)
+#endif /* !STRICT_MM_TYPECHECKS */
+
+#endif /* __KERNEL__ */
+#endif /* _ASM_IA64_PAGE_DEFAULT_H */
Index: linux-2.6.20-rc1/include/asm-ia64/page.h
===================================================================
--- linux-2.6.20-rc1.orig/include/asm-ia64/page.h	2006-12-23 20:24:51.003909000 +1100
+++ linux-2.6.20-rc1/include/asm-ia64/page.h	2006-12-23 20:25:39.879909000 +1100
@@ -180,19 +180,9 @@
    * These are used to make use of C type-checking..
    */
   typedef struct { unsigned long pte; } pte_t;
-  typedef struct { unsigned long pmd; } pmd_t;
-#ifdef CONFIG_PGTABLE_4
-  typedef struct { unsigned long pud; } pud_t;
-#endif
-  typedef struct { unsigned long pgd; } pgd_t;
   typedef struct { unsigned long pgprot; } pgprot_t;
 
 # define pte_val(x)	((x).pte)
-# define pmd_val(x)	((x).pmd)
-#ifdef CONFIG_PGTABLE_4
-# define pud_val(x)	((x).pud)
-#endif
-# define pgd_val(x)	((x).pgd)
 # define pgprot_val(x)	((x).pgprot)
 
 # define __pte(x)	((pte_t) { (x) } )
@@ -204,18 +194,13 @@
    */
 # ifndef __ASSEMBLY__
     typedef unsigned long pte_t;
-    typedef unsigned long pmd_t;
-    typedef unsigned long pgd_t;
     typedef unsigned long pgprot_t;
 # endif
 
 # define pte_val(x)	(x)
-# define pmd_val(x)	(x)
-# define pgd_val(x)	(x)
 # define pgprot_val(x)	(x)
 
 # define __pte(x)	(x)
-# define __pgd(x)	(x)
 # define __pgprot(x)	(x)
 #endif /* !STRICT_MM_TYPECHECKS */
 
@@ -227,4 +212,9 @@
 					  ? VM_EXEC : 0))
 
 # endif /* __KERNEL__ */
+
+#ifdef CONFIG_PT_DEFAULT
+#include <asm/page-default.h>
+#endif
+
 #endif /* _ASM_IA64_PAGE_H */
Index: linux-2.6.20-rc1/include/asm-ia64/pgtable-default.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc1/include/asm-ia64/pgtable-default.h	2006-12-23 20:24:57.791909000 +1100
@@ -0,0 +1,54 @@
+#ifndef _ASM_IA64_PGTABLE_DEFAULT_H
+#define _ASM_IA64_PGTABLE_DEFAULT_H
+
+/*
+ * How many pointers will a page table level hold expressed in shift
+ */
+#define PTRS_PER_PTD_SHIFT	(PAGE_SHIFT-3)
+
+/*
+ * Definitions for fourth level:
+ */
+#define PTRS_PER_PTE	(__IA64_UL(1) << (PTRS_PER_PTD_SHIFT))
+
+/*
+ * Definitions for third level:
+ *
+ * PMD_SHIFT determines the size of the area a third-level page table
+ * can map.
+ */
+#define PMD_SHIFT	(PAGE_SHIFT + (PTRS_PER_PTD_SHIFT))
+#define PMD_SIZE	(1UL << PMD_SHIFT)
+#define PMD_MASK	(~(PMD_SIZE-1))
+#define PTRS_PER_PMD	(1UL << (PTRS_PER_PTD_SHIFT))
+
+#ifdef CONFIG_PGTABLE_4
+/*
+ * Definitions for second level:
+ *
+ * PUD_SHIFT determines the size of the area a second-level page table
+ * can map.
+ */
+#define PUD_SHIFT	(PMD_SHIFT + (PTRS_PER_PTD_SHIFT))
+#define PUD_SIZE	(1UL << PUD_SHIFT)
+#define PUD_MASK	(~(PUD_SIZE-1))
+#define PTRS_PER_PUD	(1UL << (PTRS_PER_PTD_SHIFT))
+
+#endif
+/*
+ * Definitions for first level:
+ *
+ * PGDIR_SHIFT determines what a first-level page table entry can map.
+ */
+#ifdef CONFIG_PGTABLE_4
+#define PGDIR_SHIFT		(PUD_SHIFT + (PTRS_PER_PTD_SHIFT))
+#else
+#define PGDIR_SHIFT		(PMD_SHIFT + (PTRS_PER_PTD_SHIFT))
+#endif
+#define PGDIR_SIZE		(__IA64_UL(1) << PGDIR_SHIFT)
+#define PGDIR_MASK		(~(PGDIR_SIZE-1))
+#define PTRS_PER_PGD_SHIFT	PTRS_PER_PTD_SHIFT
+#define PTRS_PER_PGD		(1UL << PTRS_PER_PGD_SHIFT)
+#define USER_PTRS_PER_PGD	(5*PTRS_PER_PGD/8)	/* regions 0-4 are user regions */
+
+#endif
Index: linux-2.6.20-rc1/include/asm-ia64/pgtable.h
===================================================================
--- linux-2.6.20-rc1.orig/include/asm-ia64/pgtable.h	2006-12-23 20:24:51.003909000 +1100
+++ linux-2.6.20-rc1/include/asm-ia64/pgtable.h	2006-12-23 20:26:16.243909000 +1100
@@ -19,6 +19,10 @@
 #include <asm/system.h>
 #include <asm/types.h>
 
+#ifdef CONFIG_PT_DEFAULT
+#include <asm/pgtable-default.h>
+#endif
+
 #define IA64_MAX_PHYS_BITS	50	/* max. number of physical address bits (architected) */
 
 /*
@@ -82,55 +86,6 @@
 #define __DIRTY_BITS_NO_ED	_PAGE_A | _PAGE_P | _PAGE_D | _PAGE_MA_WB
 #define __DIRTY_BITS		_PAGE_ED | __DIRTY_BITS_NO_ED
 
-/*
- * How many pointers will a page table level hold expressed in shift
- */
-#define PTRS_PER_PTD_SHIFT	(PAGE_SHIFT-3)
-
-/*
- * Definitions for fourth level:
- */
-#define PTRS_PER_PTE	(__IA64_UL(1) << (PTRS_PER_PTD_SHIFT))
-
-/*
- * Definitions for third level:
- *
- * PMD_SHIFT determines the size of the area a third-level page table
- * can map.
- */
-#define PMD_SHIFT	(PAGE_SHIFT + (PTRS_PER_PTD_SHIFT))
-#define PMD_SIZE	(1UL << PMD_SHIFT)
-#define PMD_MASK	(~(PMD_SIZE-1))
-#define PTRS_PER_PMD	(1UL << (PTRS_PER_PTD_SHIFT))
-
-#ifdef CONFIG_PGTABLE_4
-/*
- * Definitions for second level:
- *
- * PUD_SHIFT determines the size of the area a second-level page table
- * can map.
- */
-#define PUD_SHIFT	(PMD_SHIFT + (PTRS_PER_PTD_SHIFT))
-#define PUD_SIZE	(1UL << PUD_SHIFT)
-#define PUD_MASK	(~(PUD_SIZE-1))
-#define PTRS_PER_PUD	(1UL << (PTRS_PER_PTD_SHIFT))
-#endif
-
-/*
- * Definitions for first level:
- *
- * PGDIR_SHIFT determines what a first-level page table entry can map.
- */
-#ifdef CONFIG_PGTABLE_4
-#define PGDIR_SHIFT		(PUD_SHIFT + (PTRS_PER_PTD_SHIFT))
-#else
-#define PGDIR_SHIFT		(PMD_SHIFT + (PTRS_PER_PTD_SHIFT))
-#endif
-#define PGDIR_SIZE		(__IA64_UL(1) << PGDIR_SHIFT)
-#define PGDIR_MASK		(~(PGDIR_SIZE-1))
-#define PTRS_PER_PGD_SHIFT	PTRS_PER_PTD_SHIFT
-#define PTRS_PER_PGD		(1UL << PTRS_PER_PGD_SHIFT)
-#define USER_PTRS_PER_PGD	(5*PTRS_PER_PGD/8)	/* regions 0-4 are user regions */
 #define FIRST_USER_ADDRESS	0
 
 /*
@@ -595,9 +550,11 @@
 #define __HAVE_ARCH_PGD_OFFSET_GATE
 #define __HAVE_ARCH_LAZY_MMU_PROT_UPDATE
 
+#ifdef CONFIG_PT_DEFAULT
 #ifndef CONFIG_PGTABLE_4
 #include <asm-generic/pgtable-nopud.h>
 #endif
 #include <asm-generic/pgtable.h>
+#endif
 
 #endif /* _ASM_IA64_PGTABLE_H */


* [PATCH 3/5] Abstract pgtable continued.
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (30 preceding siblings ...)
  2007-01-13  2:48 ` [PATCH 2/5] Abstract pgtable Paul Davies
@ 2007-01-13  2:48 ` Paul Davies
  2007-01-13  2:48 ` [PATCH 4/5] Abstract assembler lookup Paul Davies
                   ` (17 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:48 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH IA64 03
 * Continue abstracting implementation-dependent pgtable.h code into
 pgtable-default.h.
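
 For reference -- a sketch, not part of the patch: the offset macros being
 moved implement the classic multi-level walk, which only makes sense for
 the default tree-structured page table (none/bad checks omitted):

	pgd_t *pgd = pgd_offset(mm, addr);
	pud_t *pud = pud_offset(pgd, addr);	/* folded without CONFIG_PGTABLE_4 */
	pmd_t *pmd = pmd_offset(pud, addr);
	pte_t *pte = pte_offset_map(pmd, addr);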

Signed-off-by: Paul Davies <pauld@gelato.unsw.edu.au>

---

 arch/ia64/mm/init.c                |    2 
 include/asm-ia64/pgtable-default.h |   93 +++++++++++++++++++++++++++++++++++++
 include/asm-ia64/pgtable.h         |   84 ++-------------------------------
 3 files changed, 101 insertions(+), 78 deletions(-)
Index: linux-2.6.20-rc1/include/asm-ia64/pgtable-default.h
===================================================================
--- linux-2.6.20-rc1.orig/include/asm-ia64/pgtable-default.h	2006-12-23 20:24:57.791909000 +1100
+++ linux-2.6.20-rc1/include/asm-ia64/pgtable-default.h	2006-12-23 20:26:33.219909000 +1100
@@ -51,4 +51,97 @@
 #define PTRS_PER_PGD		(1UL << PTRS_PER_PGD_SHIFT)
 #define USER_PTRS_PER_PGD	(5*PTRS_PER_PGD/8)	/* regions 0-4 are user regions */
 
+# ifndef __ASSEMBLY__
+
+#include <linux/sched.h>	/* for mm_struct */
+#include <asm/bitops.h>
+#include <asm/cacheflush.h>
+#include <asm/mmu_context.h>
+#include <asm/processor.h>
+
+
+#define pgd_ERROR(e)	printk("%s:%d: bad pgd %016lx.\n", __FILE__, __LINE__, pgd_val(e))
+#ifdef CONFIG_PGTABLE_4
+#define pud_ERROR(e)	printk("%s:%d: bad pud %016lx.\n", __FILE__, __LINE__, pud_val(e))
+#endif
+#define pmd_ERROR(e)	printk("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))
+
+
+#define pmd_none(pmd)			(!pmd_val(pmd))
+#define pmd_bad(pmd)			(!ia64_phys_addr_valid(pmd_val(pmd)))
+#define pmd_present(pmd)		(pmd_val(pmd) != 0UL)
+#define pmd_clear(pmdp)			(pmd_val(*(pmdp)) = 0UL)
+#define pmd_page_vaddr(pmd)		((unsigned long) __va(pmd_val(pmd) & _PFN_MASK))
+#define pmd_page(pmd)			virt_to_page((pmd_val(pmd) + PAGE_OFFSET))
+
+#define pud_none(pud)			(!pud_val(pud))
+#define pud_bad(pud)			(!ia64_phys_addr_valid(pud_val(pud)))
+#define pud_present(pud)		(pud_val(pud) != 0UL)
+#define pud_clear(pudp)			(pud_val(*(pudp)) = 0UL)
+#define pud_page_vaddr(pud)		((unsigned long) __va(pud_val(pud) & _PFN_MASK))
+#define pud_page(pud)			virt_to_page((pud_val(pud) + PAGE_OFFSET))
+
+#ifdef CONFIG_PGTABLE_4
+#define pgd_none(pgd)			(!pgd_val(pgd))
+#define pgd_bad(pgd)			(!ia64_phys_addr_valid(pgd_val(pgd)))
+#define pgd_present(pgd)		(pgd_val(pgd) != 0UL)
+#define pgd_clear(pgdp)			(pgd_val(*(pgdp)) = 0UL)
+#define pgd_page_vaddr(pgd)		((unsigned long) __va(pgd_val(pgd) & _PFN_MASK))
+#define pgd_page(pgd)			virt_to_page((pgd_val(pgd) + PAGE_OFFSET))
+#endif
+
+static inline unsigned long
+pgd_index (unsigned long address)
+{
+	unsigned long region = address >> 61;
+	unsigned long l1index = (address >> PGDIR_SHIFT) & ((PTRS_PER_PGD >> 3) - 1);
+
+	return (region << (PAGE_SHIFT - 6)) | l1index;
+}
+
+/* The offset in the 1-level directory is given by the 3 region bits
+   (61..63) and the level-1 bits.  */
+static inline pgd_t*
+pgd_offset (struct mm_struct *mm, unsigned long address)
+{
+	return mm->page_table.pgd + pgd_index(address);
+}
+
+/* In the kernel's mapped region we completely ignore the region number
+   (since we know it's in region number 5). */
+#define pgd_offset_k(addr) \
+	(init_mm.page_table.pgd + (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)))
+
+/* Look up a pgd entry in the gate area.  On IA-64, the gate-area
+   resides in the kernel-mapped segment, hence we use pgd_offset_k()
+   here.  */
+#define pgd_offset_gate(mm, addr)	pgd_offset_k(addr)
+
+#ifdef CONFIG_PGTABLE_4
+/* Find an entry in the second-level page table.. */
+#define pud_offset(dir,addr) \
+	((pud_t *) pgd_page_vaddr(*(dir)) + (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1)))
+#endif
+
+/* Find an entry in the third-level page table.. */
+#define pmd_offset(dir,addr) \
+	((pmd_t *) pud_page_vaddr(*(dir)) + (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1)))
+
+/*
+ * Find an entry in the third-level page table.  This looks more complicated than it
+ * should be because some platforms place page tables in high memory.
+ */
+#define pte_index(addr)	 	(((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
+#define pte_offset_kernel(dir,addr)	((pte_t *) pmd_page_vaddr(*(dir)) + pte_index(addr))
+#define pte_offset_map(dir,addr)	pte_offset_kernel(dir, addr)
+#define pte_offset_map_nested(dir,addr)	pte_offset_map(dir, addr)
+
+extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
+
+#define RGN_MAP_SHIFT (PGDIR_SHIFT + PTRS_PER_PGD_SHIFT - 3)
+
+#define ALIGNVAL (1UL << PMD_SHIFT)
+
+# endif /* !__ASSEMBLY__ */
+
 #endif
Index: linux-2.6.20-rc1/include/asm-ia64/pgtable.h
===================================================================
--- linux-2.6.20-rc1.orig/include/asm-ia64/pgtable.h	2006-12-23 20:26:16.243909000 +1100
+++ linux-2.6.20-rc1/include/asm-ia64/pgtable.h	2006-12-23 20:32:51.431909000 +1100
@@ -23,6 +23,10 @@
 #include <asm/pgtable-default.h>
 #endif
 
+#ifdef CONFIG_PT_GPT
+#include <asm/pgtable-gpt.h>
+#endif
+
 #define IA64_MAX_PHYS_BITS	50	/* max. number of physical address bits (architected) */
 
 /*
@@ -137,14 +141,8 @@
 #define __S110	__pgprot(__ACCESS_BITS | _PAGE_PL_3 | _PAGE_AR_RWX)
 #define __S111	__pgprot(__ACCESS_BITS | _PAGE_PL_3 | _PAGE_AR_RWX)
 
-#define pgd_ERROR(e)	printk("%s:%d: bad pgd %016lx.\n", __FILE__, __LINE__, pgd_val(e))
-#ifdef CONFIG_PGTABLE_4
-#define pud_ERROR(e)	printk("%s:%d: bad pud %016lx.\n", __FILE__, __LINE__, pud_val(e))
-#endif
-#define pmd_ERROR(e)	printk("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))
 #define pte_ERROR(e)	printk("%s:%d: bad pte %016lx.\n", __FILE__, __LINE__, pte_val(e))
 
-
 /*
  * Some definitions to translate between mem_map, PTEs, and page addresses:
  */
@@ -198,7 +196,6 @@
 #define	kc_vaddr_to_offset(v) ((v) - RGN_BASE(RGN_GATE))
 #define	kc_offset_to_vaddr(o) ((o) + RGN_BASE(RGN_GATE))
 
-#define RGN_MAP_SHIFT (PGDIR_SHIFT + PTRS_PER_PGD_SHIFT - 3)
 #define RGN_MAP_LIMIT	((1UL << RGN_MAP_SHIFT) - PAGE_SIZE)	/* per region addr limit */
 
 /*
@@ -226,29 +223,6 @@
 /* pte_page() returns the "struct page *" corresponding to the PTE: */
 #define pte_page(pte)			virt_to_page(((pte_val(pte) & _PFN_MASK) + PAGE_OFFSET))
 
-#define pmd_none(pmd)			(!pmd_val(pmd))
-#define pmd_bad(pmd)			(!ia64_phys_addr_valid(pmd_val(pmd)))
-#define pmd_present(pmd)		(pmd_val(pmd) != 0UL)
-#define pmd_clear(pmdp)			(pmd_val(*(pmdp)) = 0UL)
-#define pmd_page_vaddr(pmd)		((unsigned long) __va(pmd_val(pmd) & _PFN_MASK))
-#define pmd_page(pmd)			virt_to_page((pmd_val(pmd) + PAGE_OFFSET))
-
-#define pud_none(pud)			(!pud_val(pud))
-#define pud_bad(pud)			(!ia64_phys_addr_valid(pud_val(pud)))
-#define pud_present(pud)		(pud_val(pud) != 0UL)
-#define pud_clear(pudp)			(pud_val(*(pudp)) = 0UL)
-#define pud_page_vaddr(pud)		((unsigned long) __va(pud_val(pud) & _PFN_MASK))
-#define pud_page(pud)			virt_to_page((pud_val(pud) + PAGE_OFFSET))
-
-#ifdef CONFIG_PGTABLE_4
-#define pgd_none(pgd)			(!pgd_val(pgd))
-#define pgd_bad(pgd)			(!ia64_phys_addr_valid(pgd_val(pgd)))
-#define pgd_present(pgd)		(pgd_val(pgd) != 0UL)
-#define pgd_clear(pgdp)			(pgd_val(*(pgdp)) = 0UL)
-#define pgd_page_vaddr(pgd)		((unsigned long) __va(pgd_val(pgd) & _PFN_MASK))
-#define pgd_page(pgd)			virt_to_page((pgd_val(pgd) + PAGE_OFFSET))
-#endif
-
 /*
  * The following have defined behavior only work if pte_present() is true.
  */
@@ -287,51 +261,6 @@
 				     unsigned long size, pgprot_t vma_prot);
 #define __HAVE_PHYS_MEM_ACCESS_PROT
 
-static inline unsigned long
-pgd_index (unsigned long address)
-{
-	unsigned long region = address >> 61;
-	unsigned long l1index = (address >> PGDIR_SHIFT) & ((PTRS_PER_PGD >> 3) - 1);
-
-	return (region << (PAGE_SHIFT - 6)) | l1index;
-}
-
-/* The offset in the 1-level directory is given by the 3 region bits
-   (61..63) and the level-1 bits.  */
-static inline pgd_t*
-pgd_offset (struct mm_struct *mm, unsigned long address)
-{
-	return mm->page_table.pgd + pgd_index(address);
-}
-
-/* In the kernel's mapped region we completely ignore the region number
-   (since we know it's in region number 5). */
-#define pgd_offset_k(addr) \
-	(init_mm.page_table.pgd + (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)))
-
-/* Look up a pgd entry in the gate area.  On IA-64, the gate-area
-   resides in the kernel-mapped segment, hence we use pgd_offset_k()
-   here.  */
-#define pgd_offset_gate(mm, addr)	pgd_offset_k(addr)
-
-#ifdef CONFIG_PGTABLE_4
-/* Find an entry in the second-level page table.. */
-#define pud_offset(dir,addr) \
-	((pud_t *) pgd_page_vaddr(*(dir)) + (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1)))
-#endif
-
-/* Find an entry in the third-level page table.. */
-#define pmd_offset(dir,addr) \
-	((pmd_t *) pud_page_vaddr(*(dir)) + (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1)))
-
-/*
- * Find an entry in the third-level page table.  This looks more complicated than it
- * should be because some platforms place page tables in high memory.
- */
-#define pte_index(addr)	 	(((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
-#define pte_offset_kernel(dir,addr)	((pte_t *) pmd_page_vaddr(*(dir)) + pte_index(addr))
-#define pte_offset_map(dir,addr)	pte_offset_kernel(dir, addr)
-#define pte_offset_map_nested(dir,addr)	pte_offset_map(dir, addr)
 #define pte_unmap(pte)			do { } while (0)
 #define pte_unmap_nested(pte)		do { } while (0)
 
@@ -405,7 +334,6 @@
 
 #define update_mmu_cache(vma, address, pte) do { } while (0)
 
-extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
 extern void paging_init (void);
 
 /*
@@ -554,7 +482,9 @@
 #ifndef CONFIG_PGTABLE_4
 #include <asm-generic/pgtable-nopud.h>
 #endif
-#include <asm-generic/pgtable.h>
 #endif
 
+#include <asm-generic/pgtable.h>
+
+
 #endif /* _ASM_IA64_PGTABLE_H */
Index: linux-2.6.20-rc1/arch/ia64/mm/init.c
===================================================================
--- linux-2.6.20-rc1.orig/arch/ia64/mm/init.c	2006-12-23 20:24:49.187909000 +1100
+++ linux-2.6.20-rc1/arch/ia64/mm/init.c	2006-12-23 20:33:20.435909000 +1100
@@ -426,7 +426,7 @@
 			end_address += PAGE_SIZE;
 			pte++;
 			if ((end_address < stop_address) &&
-			    (end_address != ALIGN(end_address, 1UL << PMD_SHIFT)))
+			    (end_address != ALIGN(end_address, ALIGNVAL)))
 				goto retry_pte;
 			continue;
 		}


* [PATCH 4/5] Abstract assembler lookup
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (31 preceding siblings ...)
  2007-01-13  2:48 ` [PATCH 3/5] Abstract pgtable continued Paul Davies
@ 2007-01-13  2:48 ` Paul Davies
  2007-01-13  2:48 ` [PATCH 5/5] Abstract pgalloc Paul Davies
                   ` (16 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:48 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH IA64 04
 * Create ivt.h to hold the page table assembler lookup macro.
 * Abstract the implementation-dependent assembler.
 NB: the find_pte .macro defined here is not actually invoked for the
 default page table yet; the existing open-coded walk is simply wrapped
 in #ifdef CONFIG_PT_DEFAULT for now, and the duplicate will probably be
 removed later.

Signed-off-by: Paul Davies <pauld@gelato.unsw.edu.au>

---

 arch/ia64/kernel/ivt.S         |    7 +++++
 arch/ia64/mm/init.c            |    2 +
 include/asm-ia64/ivt.h         |   56 +++++++++++++++++++++++++++++++++++++++++
 include/asm-ia64/mmu_context.h |    2 +
 4 files changed, 67 insertions(+)
Index: linux-2.6.20-rc1/arch/ia64/kernel/ivt.S
===================================================================
--- linux-2.6.20-rc1.orig/arch/ia64/kernel/ivt.S	2006-12-23 21:02:16.115531000 +1100
+++ linux-2.6.20-rc1/arch/ia64/kernel/ivt.S	2006-12-23 21:05:07.849355000 +1100
@@ -51,6 +51,7 @@
 #include <asm/thread_info.h>
 #include <asm/unistd.h>
 #include <asm/errno.h>
+#include <asm/ivt.h>
 
 #if 1
 # define PSR_DEFAULT_BITS	psr.ac
@@ -102,12 +103,14 @@
 	 *	- the faulting virtual address uses unimplemented address bits
 	 *	- the faulting virtual address has no valid page table mapping
 	 */
+
 	mov r16=cr.ifa				// get address that caused the TLB miss
 #ifdef CONFIG_HUGETLB_PAGE
 	movl r18=PAGE_SHIFT
 	mov r25=cr.itir
 #endif
 	;;
+#ifdef CONFIG_PT_DEFAULT
 	rsm psr.dt				// use physical addressing for data
 	mov r31=pr				// save the predicate registers
 	mov r19=IA64_KR(PT_BASE)		// get page table base address
@@ -166,6 +169,7 @@
 	;;
 (p7)	cmp.eq.or.andcm p6,p7=r20,r0		// was pmd_present(*pmd) == NULL?
 	dep r21=r19,r20,3,(PAGE_SHIFT-3)	// r21=pte_offset(pmd,addr)
+#endif
 	;;
 (p7)	ld8 r18=[r21]				// read *pte
 	mov r19=cr.isr				// cr.isr bit 32 tells us if this is an insn miss
@@ -435,6 +439,7 @@
 	 *
 	 * Clobbered:	b0, r18, r19, r21, r22, psr.dt (cleared)
 	 */
+#ifdef CONFIG_PT_DEFAULT
 	rsm psr.dt				// switch to using physical data addressing
 	mov r19=IA64_KR(PT_BASE)		// get the page table base address
 	shl r21=r16,3				// shift bit 60 into sign bit
@@ -485,6 +490,8 @@
 	;;
 (p7)	cmp.eq.or.andcm p6,p7=r17,r0		// was pmd_present(*pmd) == NULL?
 	dep r17=r19,r17,3,(PAGE_SHIFT-3)	// r17=pte_offset(pmd,addr);
+#endif
+	/* find_pte r16,r17,p6,p7 */
 (p6)	br.cond.spnt page_fault
 	mov b0=r30
 	br.sptk.many b0				// return to continuation point
Index: linux-2.6.20-rc1/include/asm-ia64/ivt.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc1/include/asm-ia64/ivt.h	2006-12-23 21:05:07.849355000 +1100
@@ -0,0 +1,56 @@
+#ifdef CONFIG_PT_DEFAULT
+
+.macro find_pte va, ppte, p1, p2
+	rsm psr.dt				// switch to using physical data addressing
+	mov r19=IA64_KR(PT_BASE)		// get the page table base address
+	shl r21=\va,3				// shift bit 60 into sign bit
+	mov r18=cr.itir
+	;;
+	shr.u \ppte=\va,61			// get the region number into ppte
+	extr.u r18=r18,2,6			// get the faulting page size
+	;;
+	cmp.eq \p1,\p2=5,\ppte			// is faulting address in region 5?
+	add r22=-PAGE_SHIFT,r18			// adjustment for hugetlb address
+	add r18=PGDIR_SHIFT-PAGE_SHIFT,r18
+	;;
+	shr.u r22=\va,r22
+	shr.u r18=\va,r18
+(\p2)	dep \ppte=\ppte,r19,(PAGE_SHIFT-3),3	// put region number bits in place
+
+	srlz.d
+	LOAD_PHYSICAL(\p1, r19, swapper_pg_dir)	// region 5 is rooted at swapper_pg_dir
+
+	.pred.rel "mutex", \p1, \p2
+(\p1)	shr.u r21=r21,PGDIR_SHIFT+PAGE_SHIFT
+(\p2)	shr.u r21=r21,PGDIR_SHIFT+PAGE_SHIFT-3
+	;;
+(\p1)	dep \ppte=r18,r19,3,(PAGE_SHIFT-3)	// ppte=pgd_offset for region 5
+(\p2)	dep \ppte=r18,\ppte,3,(PAGE_SHIFT-3)-3	// ppte=pgd_offset for region[0-4]
+	cmp.eq \p2,\p1=0,r21			// unused address bits all zeroes?
+#ifdef CONFIG_PGTABLE_4
+	shr.u r18=r22,PUD_SHIFT			// shift pud index into position
+#else
+	shr.u r18=r22,PMD_SHIFT			// shift pmd index into position
+#endif
+	;;
+	ld8 \ppte=[\ppte]			// get *pgd (may be 0)
+	;;
+(\p2)	cmp.eq \p1,\p2=\ppte,r0			// was pgd_present(*pgd) == NULL?
+	dep \ppte=r18,\ppte,3,(PAGE_SHIFT-3)	// ppte=p[u|m]d_offset(pgd,addr)
+	;;
+#ifdef CONFIG_PGTABLE_4
+(\p2)	ld8 \ppte=[\ppte]			// get *pud (may be 0)
+	shr.u r18=r22,PMD_SHIFT			// shift pmd index into position
+	;;
+(\p2)	cmp.eq.or.andcm \p1,\p2=\ppte,r0	// was pud_present(*pud) == NULL?
+	dep \ppte=r18,\ppte,3,(PAGE_SHIFT-3)	// ppte=pmd_offset(pud,addr)
+	;;
+#endif
+(\p2)	ld8 \ppte=[\ppte]			// get *pmd (may be 0)
+	shr.u r19=r22,PAGE_SHIFT		// shift pte index into position
+	;;
+(\p2)	cmp.eq.or.andcm \p1,\p2=\ppte,r0	// was pmd_present(*pmd) == NULL?
+	dep \ppte=r19,\ppte,3,(PAGE_SHIFT-3)	// ppte=pte_offset(pmd,addr)
+.endm
+
+#endif
Index: linux-2.6.20-rc1/arch/ia64/mm/init.c
===================================================================
--- linux-2.6.20-rc1.orig/arch/ia64/mm/init.c	2006-12-23 21:05:07.437149000 +1100
+++ linux-2.6.20-rc1/arch/ia64/mm/init.c	2006-12-23 21:05:07.853357000 +1100
@@ -597,9 +597,11 @@
 	int i;
 	static struct kcore_list kcore_mem, kcore_vmem, kcore_kernel;
 
+#ifdef CONFIG_PT_DEFAULT
 	BUG_ON(PTRS_PER_PGD * sizeof(pgd_t) != PAGE_SIZE);
 	BUG_ON(PTRS_PER_PMD * sizeof(pmd_t) != PAGE_SIZE);
 	BUG_ON(PTRS_PER_PTE * sizeof(pte_t) != PAGE_SIZE);
+#endif
 
 #ifdef CONFIG_PCI
 	/*
Index: linux-2.6.20-rc1/include/asm-ia64/mmu_context.h
===================================================================
--- linux-2.6.20-rc1.orig/include/asm-ia64/mmu_context.h	2006-12-23 21:04:57.420143000 +1100
+++ linux-2.6.20-rc1/include/asm-ia64/mmu_context.h	2006-12-23 21:05:07.857359000 +1100
@@ -191,7 +191,9 @@
 	 * We may get interrupts here, but that's OK because interrupt
 	 * handlers cannot touch user-space.
 	 */
+#ifdef CONFIG_PT_DEFAULT
 	ia64_set_kr(IA64_KR_PT_BASE, __pa(next->page_table.pgd));
+#endif
 	activate_context(next);
 }
 


* [PATCH 5/5] Abstract pgalloc
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (32 preceding siblings ...)
  2007-01-13  2:48 ` [PATCH 4/5] Abstract assembler lookup Paul Davies
@ 2007-01-13  2:48 ` Paul Davies
  2007-01-13  2:48 ` [PATCH 1/12] Alternate page table implementation (GPT) Paul Davies
                   ` (15 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:48 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH IA64 05
 * Abstract the implementation-dependent memory allocators from
 pgalloc.h into pgalloc-default.h.
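
 For illustration -- a sketch under the default implementation, not part
 of the patch: each level's allocator pairs with a populate of the parent
 entry when a build path grows the tree (pud assumed already in hand):

	pmd_t *pmd = pmd_alloc_one(mm, addr);	/* page-sized level off the quicklist */
	if (pmd)
		pud_populate(mm, pud, pmd);	/* parent entry := __pa(pmd) */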

Signed-off-by: Paul Davies <pauld@gelato.unsw.edu.au>

---

 pgalloc-default.h |   87 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 pgalloc.h         |   87 ++----------------------------------------------------
 2 files changed, 91 insertions(+), 83 deletions(-)
Index: linux-2.6.20-rc1/include/asm-ia64/pgalloc-default.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc1/include/asm-ia64/pgalloc-default.h	2006-12-23 21:18:48.054043000 +1100
@@ -0,0 +1,87 @@
+#ifndef _ASM_IA64_PGALLOC_DEFAULT_H
+#define _ASM_IA64_PGALLOC_DEFAULT_H
+
+static inline pgd_t *pgd_alloc(struct mm_struct *mm)
+{
+	return pgtable_quicklist_alloc();
+}
+
+static inline void pgd_free(pgd_t * pgd)
+{
+	pgtable_quicklist_free(pgd);
+}
+
+#ifdef CONFIG_PGTABLE_4
+static inline void
+pgd_populate(struct mm_struct *mm, pgd_t * pgd_entry, pud_t * pud)
+{
+	pgd_val(*pgd_entry) = __pa(pud);
+}
+
+static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+	return pgtable_quicklist_alloc();
+}
+
+static inline void pud_free(pud_t * pud)
+{
+	pgtable_quicklist_free(pud);
+}
+#define __pud_free_tlb(tlb, pud)	pud_free(pud)
+#endif /* CONFIG_PGTABLE_4 */
+
+static inline void
+pud_populate(struct mm_struct *mm, pud_t * pud_entry, pmd_t * pmd)
+{
+	pud_val(*pud_entry) = __pa(pmd);
+}
+
+static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+	return pgtable_quicklist_alloc();
+}
+
+static inline void pmd_free(pmd_t * pmd)
+{
+	pgtable_quicklist_free(pmd);
+}
+
+#define __pmd_free_tlb(tlb, pmd)	pmd_free(pmd)
+
+static inline void
+pmd_populate(struct mm_struct *mm, pmd_t * pmd_entry, struct page *pte)
+{
+	pmd_val(*pmd_entry) = page_to_phys(pte);
+}
+
+static inline void
+pmd_populate_kernel(struct mm_struct *mm, pmd_t * pmd_entry, pte_t * pte)
+{
+	pmd_val(*pmd_entry) = __pa(pte);
+}
+
+static inline struct page *pte_alloc_one(struct mm_struct *mm,
+					 unsigned long addr)
+{
+	return virt_to_page(pgtable_quicklist_alloc());
+}
+
+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
+					  unsigned long addr)
+{
+	return pgtable_quicklist_alloc();
+}
+
+static inline void pte_free(struct page *pte)
+{
+	pgtable_quicklist_free(page_address(pte));
+}
+
+static inline void pte_free_kernel(pte_t * pte)
+{
+	pgtable_quicklist_free(pte);
+}
+
+#define __pte_free_tlb(tlb, pte)	pte_free(pte)
+
+#endif
Index: linux-2.6.20-rc1/include/asm-ia64/pgalloc.h
===================================================================
--- linux-2.6.20-rc1.orig/include/asm-ia64/pgalloc.h	2006-12-21 11:32:12.430004000 +1100
+++ linux-2.6.20-rc1/include/asm-ia64/pgalloc.h	2006-12-23 21:20:42.258043000 +1100
@@ -75,89 +75,10 @@
 	preempt_enable();
 }
 
-static inline pgd_t *pgd_alloc(struct mm_struct *mm)
-{
-	return pgtable_quicklist_alloc();
-}
-
-static inline void pgd_free(pgd_t * pgd)
-{
-	pgtable_quicklist_free(pgd);
-}
-
-#ifdef CONFIG_PGTABLE_4
-static inline void
-pgd_populate(struct mm_struct *mm, pgd_t * pgd_entry, pud_t * pud)
-{
-	pgd_val(*pgd_entry) = __pa(pud);
-}
-
-static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
-{
-	return pgtable_quicklist_alloc();
-}
-
-static inline void pud_free(pud_t * pud)
-{
-	pgtable_quicklist_free(pud);
-}
-#define __pud_free_tlb(tlb, pud)	pud_free(pud)
-#endif /* CONFIG_PGTABLE_4 */
-
-static inline void
-pud_populate(struct mm_struct *mm, pud_t * pud_entry, pmd_t * pmd)
-{
-	pud_val(*pud_entry) = __pa(pmd);
-}
-
-static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
-{
-	return pgtable_quicklist_alloc();
-}
-
-static inline void pmd_free(pmd_t * pmd)
-{
-	pgtable_quicklist_free(pmd);
-}
-
-#define __pmd_free_tlb(tlb, pmd)	pmd_free(pmd)
-
-static inline void
-pmd_populate(struct mm_struct *mm, pmd_t * pmd_entry, struct page *pte)
-{
-	pmd_val(*pmd_entry) = page_to_phys(pte);
-}
-
-static inline void
-pmd_populate_kernel(struct mm_struct *mm, pmd_t * pmd_entry, pte_t * pte)
-{
-	pmd_val(*pmd_entry) = __pa(pte);
-}
-
-static inline struct page *pte_alloc_one(struct mm_struct *mm,
-					 unsigned long addr)
-{
-	return virt_to_page(pgtable_quicklist_alloc());
-}
-
-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
-					  unsigned long addr)
-{
-	return pgtable_quicklist_alloc();
-}
-
-static inline void pte_free(struct page *pte)
-{
-	pgtable_quicklist_free(page_address(pte));
-}
-
-static inline void pte_free_kernel(pte_t * pte)
-{
-	pgtable_quicklist_free(pte);
-}
-
-#define __pte_free_tlb(tlb, pte)	pte_free(pte)
-
 extern void check_pgt_cache(void);
 
+#ifdef CONFIG_PT_DEFAULT
+#include <asm/pgalloc-default.h>
+#endif
+
 #endif				/* _ASM_IA64_PGALLOC_H */


* [PATCH 1/12] Alternate page table implementation (GPT)
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (33 preceding siblings ...)
  2007-01-13  2:48 ` [PATCH 5/5] Abstract pgalloc Paul Davies
@ 2007-01-13  2:48 ` Paul Davies
  2007-01-13  2:48 ` [PATCH 2/12] Alternate page table implementation cont Paul Davies
                   ` (14 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:48 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH GPT 01
 * The GPT itself is not commented on in these patches; they only show
 how to fit this page table implementation in under the interface,
 alongside the default page table.
   * Any queries regarding GPTs are best directed to
   awiggins@cse.unsw.edu.au
 * Add the GPT option as an alternative to the default page table on IA64.
 * Create include/asm-ia64/pgtable-gpt.h for GPT-specific pgtable.h
 code.
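
 For illustration only -- not part of the patch: the key helpers compose,
 so cutting the low-order bits off a key and merging them back on yields
 the original key (values here are arbitrary):

	gpt_key_t key = gpt_key_init(0xabcde, 20);	/* 20-bit key */
	gpt_key_t low = gpt_key_cut_LSB2(8, &key);	/* low=(0xde,8), key=(0xabc,12) */
	key = gpt_keys_merge_MSB(low, key);		/* back to (0xabcde,20) */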

Signed-off-by: Paul Davies <pauld@gelato.unsw.edu.au>

---

 arch/ia64/Kconfig.debug        |    3 
 include/asm-ia64/pgtable-gpt.h |  157 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 160 insertions(+)
Index: linux-2.6.20-rc4/arch/ia64/Kconfig.debug
===================================================================
--- linux-2.6.20-rc4.orig/arch/ia64/Kconfig.debug	2007-01-11 16:46:47.662747000 +1100
+++ linux-2.6.20-rc4/arch/ia64/Kconfig.debug	2007-01-11 16:58:15.245390000 +1100
@@ -9,6 +9,9 @@
 config  PT_DEFAULT
 	bool "PT_DEFAULT"
 
+config  GPT
+	bool "GPT"
+
 endchoice
 
 choice
Index: linux-2.6.20-rc4/include/asm-ia64/pgtable-gpt.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/include/asm-ia64/pgtable-gpt.h	2007-01-11 18:57:09.215823000 +1100
@@ -0,0 +1,157 @@
+/**
+ *  include/asm-ia64/pgtable-gpt.h
+ *
+ *  Copyright (C) 2005 - 2006 University of New South Wales, Australia
+ *      Adam 'WeirdArms' Wiggins <awiggins@cse.unsw.edu.au>,
+ */
+
+#ifndef _ASM_IA64_PGTABLE_GPT_H
+#define _ASM_IA64_PGTABLE_GPT_H
+
+#ifndef __ASSEMBLY__
+
+#include <linux/types.h>
+
+#define RGN_MAP_SHIFT 55
+#define ALIGNVAL (1UL << 25)
+
+typedef uint64_t gpt_key_value_t;
+
+typedef struct {
+	uint64_t _pad:    6;
+	uint64_t length:  6;
+	uint64_t value:  52;
+} gpt_key_t;
+
+static inline gpt_key_t
+gpt_key_init(gpt_key_value_t value, int8_t length)
+{
+	gpt_key_t key;
+
+	key.value = value;
+	key.length = length;
+	key._pad = 0;
+
+	return key;
+}
+
+static inline gpt_key_t
+gpt_key_null(void)
+{
+	return gpt_key_init(0, 0);
+}
+
+static inline gpt_key_value_t
+gpt_key_read_value(gpt_key_t key)
+{
+	return key.value;
+}
+
+static inline int8_t
+gpt_key_read_length(gpt_key_t key)
+{
+	return key.length;
+}
+
+static inline gpt_key_value_t
+gpt_key_value_mask(int8_t coverage)
+{
+	return (coverage < GPT_KEY_LENGTH_MAX) ?
+		~(((gpt_key_value_t)1 << coverage) - (gpt_key_value_t)1) : 0;
+}
+
+static inline int
+gpt_key_compare_null(gpt_key_t key)
+{
+	return gpt_key_read_length(key) == 0;
+}
+
+static inline gpt_key_t
+gpt_key_cut_LSB(int8_t length_lsb, gpt_key_t key)
+{
+	int8_t length;
+	gpt_key_value_t value;
+
+	value = gpt_key_read_value(key);
+	length = gpt_key_read_length(key);
+	if(length_lsb > length) {
+		return gpt_key_null();
+	}
+	length -= length_lsb;
+	value >>= length_lsb;
+	return gpt_key_init(value, length);
+}
+
+static inline gpt_key_t
+gpt_key_cut_LSB2(int8_t length_lsb, gpt_key_t* key_u)
+{
+	int8_t length;
+	gpt_key_value_t value;
+
+	value = gpt_key_read_value(*key_u);
+	length = gpt_key_read_length(*key_u);
+	if(length_lsb > length) {
+		length_lsb = length;
+	}
+	length -= length_lsb;
+	*key_u = ((length == 0) ? gpt_key_null() :
+			  gpt_key_init(value >> length_lsb, length));
+	return gpt_key_init(value & ~gpt_key_value_mask(length_lsb),
+			length_lsb);
+}
+
+static inline gpt_key_t
+gpt_keys_merge_MSB(gpt_key_t key_lsb, gpt_key_t key)
+{
+	int8_t length, length_lsb;
+	gpt_key_value_t value;
+
+	value = gpt_key_read_value(key);
+	length = gpt_key_read_length(key);
+	length_lsb = gpt_key_read_length(key_lsb);
+	value = (value << length_lsb) + gpt_key_read_value(key_lsb);
+	length += length_lsb;
+	return gpt_key_init(value, length);
+}
+
+static inline gpt_key_t
+gpt_keys_merge_LSB(gpt_key_t key_msb, gpt_key_t key)
+{
+    int8_t length;
+	gpt_key_value_t value;
+
+	value = gpt_key_read_value(key);
+	length = gpt_key_read_length(key);
+	value = (gpt_key_read_value(key_msb) << length) + value;
+	length += gpt_key_read_length(key_msb);
+	return gpt_key_init(value, length);
+}
+
+static inline int
+gptKeysCompareEqual(gpt_key_t key1, gpt_key_t key2)
+{
+	return ((gpt_key_read_length(key1) == gpt_key_read_length(key2)) &&
+            (gpt_key_read_value(key1) == gpt_key_read_value(key2)));
+}
+
+
+/* awiggins (2006-06-23): Massage in a little better, also optimise for ia64. */
+#define WORD_BIT 64
+static inline size_t
+gpt_ctlz(uint64_t n, int8_t msb)
+{
+	int8_t i;
+
+	if (msb >= WORD_BIT) msb = WORD_BIT - 1;	/* avoid shifting by >= 64 */
+	for (i = 0; i <= msb; i++) {
+		/* Check most significant bit is zero. */
+		if(n & (uint64_t)1 << msb) break;
+		/* Shift to the next test bit. */
+		n <<= 1;
+	}
+	return i;
+}
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* !_ASM_PGTABLE_GPT_H */


* [PATCH 2/12] Alternate page table implementation cont...
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (34 preceding siblings ...)
  2007-01-13  2:48 ` [PATCH 1/12] Alternate page table implementation (GPT) Paul Davies
@ 2007-01-13  2:48 ` Paul Davies
  2007-01-13  2:48 ` [PATCH 3/12] " Paul Davies
                   ` (13 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:48 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH GPT 02
 * Create include/asm-ia64/page-gpt.h for GPT-specific page.h requirements
 and include it from page.h (similar to page-default.h).
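
 For illustration only -- not part of the patch: a leaf node wraps a pte,
 and a guard (the key prefix the walker must match on the way down)
 round-trips through the node accessors:

	gpt_node_t node = gpt_node_leaf_init(pte);	/* pte assumed in hand */
	node = gpt_node_init_guard(node, gpt_key_init(0x5, 3));
	/* gpt_node_read_guard(node) now returns the same (0x5, 3) key and
	 * gpt_node_leaf_read_ptep(&node) points at the stored pte. */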

Signed-off-by: Paul Davies <pauld@gelato.unsw.edu.au>

---

 page-gpt.h |  238 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 page.h     |    4 +
 2 files changed, 242 insertions(+)
Index: linux-2.6.20-rc1/include/asm-ia64/page-gpt.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc1/include/asm-ia64/page-gpt.h	2007-01-03 12:09:27.559871000 +1100
@@ -0,0 +1,238 @@
+/**
+ *  include/asm-ia64/page-gpt.h
+ *
+ *  Copyright (C) 2005 - 2006 University of New South Wales, Australia
+ *      Adam 'WeirdArms' Wiggins <awiggins@cse.unsw.edu.au>.
+ */
+
+#ifndef _ASM_IA64_PAGE_GPT_H
+#define _ASM_IA64_PAGE_GPT_H
+
+
+#define GPT_NODE_LOG2BYTES   4 /* 128 bit == 16 (2^4) bytes. */
+#define GPT_KEY_LENGTH_MAX   (64 - PAGE_SHIFT)
+#define GPT_KEY_LENGTH_STORE 52 /* 64 - 12 (Smallest page size). */
+
+#define GPT_LEVEL_ORDER_MAX 16 /* 2^4 == 16 */
+
+#define GPT_NODE_TERM_BIT 0
+#define GPT_NODE_MODE_BIT 1
+#define GPT_NODE_TYPE(t,m) ((t) << GPT_NODE_TERM_BIT | (m) << GPT_NODE_MODE_BIT)
+
+#define GPT_NODE_TYPE_INVALID  GPT_NODE_TYPE(0,0)
+#define GPT_NODE_TYPE_LEAF     GPT_NODE_TYPE(0,1)
+#define GPT_NODE_TYPE_INTERNAL GPT_NODE_TYPE(1,0)
+#define GPT_NODE_TYPE_SHARED   GPT_NODE_TYPE(1,1) // Shared sub-trees unimpl.
+
+#ifndef __ASSEMBLY__
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <asm/pgtable-gpt.h>
+
+/* awiggins (2006-06-27): Replace union/structs with defines/macro's for asm. */
+typedef union {
+	struct {
+		uint64_t guard_length:  6;
+		uint64_t type:          2;
+		uint64_t order:         4; /* Remove. */
+		uint64_t guard_value:  52; /* 64 - 12 (minimum page size) */
+		union {
+			struct {
+				uint64_t ptr/*:      60*/;
+				//uint64_t order:  4;
+			} level;
+			pte_t pte;
+		} entry;
+	} node ;
+        struct {
+			uint64_t guard;
+			uint64_t entry;
+        } raw;
+} gpt_node_t;
+
+/* awiggins (2006-07-06): Next 2 functions are placeholders, should be atomic.*/
+static inline gpt_node_t
+gpt_node_get(gpt_node_t* node_p)
+{
+	return *node_p;
+}
+
+static inline void
+gpt_node_set(gpt_node_t* node_p, gpt_node_t node)
+{
+	/* Invalidate entry to mark node as being updated. */
+	node_p->raw.entry = 0;
+	/* Update node. */
+	((gpt_node_t volatile *)node_p)->raw.guard = node.raw.guard;
+	((gpt_node_t volatile *)node_p)->raw.entry = node.raw.entry;
+
+	/** awiggins 2006-08-02: The volatile typecasts should preserve
+	 *  the ordering of the operations by tagging those stores as
+	 *  releases.
+	 */
+}
+
+static inline int
+gpt_node_type(gpt_node_t node)
+{
+	return node.node.type;
+}
+
+static inline int
+gpt_node_valid(gpt_node_t node)
+{
+	switch(gpt_node_type(node)) {
+	case GPT_NODE_TYPE_INTERNAL:
+	case GPT_NODE_TYPE_LEAF:
+		return 1;
+	default:
+		return 0;
+	}
+}
+
+static inline gpt_node_t
+gpt_node_invalid_init(void)
+{
+	gpt_node_t invalid;
+
+	invalid.raw.guard = 0;
+	invalid.raw.entry = 0;
+//invalid.node.type = GPT_NODE_TYPE_INVALID;
+	return invalid;
+}
+
+static inline gpt_node_t
+gpt_node_leaf_init(pte_t pte)
+{
+	gpt_node_t leaf;
+
+	leaf.node.type = GPT_NODE_TYPE_LEAF;
+	leaf.node.entry.pte = pte;
+	leaf.node.order = 0; // awiggins (2006-07-07): Should be set in pte.
+
+	return leaf;
+}
+
+static inline int8_t
+gpt_node_leaf_read_coverage(gpt_node_t node)
+{
+	return node.node.order;
+}
+
+static inline pte_t*
+gpt_node_leaf_read_ptep(gpt_node_t* node_p)
+{
+	return &(node_p->node.entry.pte);
+}
+
+static inline gpt_node_t
+gpt_node_internal_init(gpt_node_t* level, int8_t order)
+{
+	gpt_node_t internal;
+
+	internal.node.type = GPT_NODE_TYPE_INTERNAL;
+	internal.node.entry.level.ptr =
+			__pa((uint64_t)level) /*> GPT_NODE_LOG2BYTES*/;
+	internal.node.order = order;
+	return internal;
+}
+
+static inline gpt_node_t*
+gpt_node_internal_read_ptr(gpt_node_t node)
+{
+	return (gpt_node_t*)__va(node.node.entry.level.ptr
+							 /*< GPT_NODE_LOG2BYTES*/);
+}
+
+static inline int8_t
+gpt_node_internal_read_order(gpt_node_t node)
+{
+	return (int8_t)node.node.order;
+}
+
+/* Current node structure does not store the number of valid children. */
+static inline gpt_node_t
+gpt_node_internal_dec_children(gpt_node_t node)
+{
+	/* awiggins (2006-07-14): Decrement node's valid children count. */
+	return node;
+}
+
+static inline gpt_node_t
+gpt_node_internal_inc_children(gpt_node_t node)
+{
+	/* awiggins (2006-07-14): Increment node's valid children count. */
+	return node;
+}
+
+static inline gpt_key_value_t
+gpt_node_internal_count_children(gpt_node_t node)
+{
+	int8_t order;
+	gpt_node_t* level;
+	gpt_key_value_t index, valid;
+
+	level = gpt_node_internal_read_ptr(node);
+	order = gpt_node_internal_read_order(node);
+	for(index = valid = 0; index < (1 << order); index++) {
+		if(gpt_node_valid(level[index])) {
+			valid++;
+		}
+	}
+	return valid;
+}
+
+static inline gpt_key_value_t
+gpt_node_internal_first_child(gpt_node_t node)
+{
+	gpt_key_value_t index;
+	int8_t order;
+	gpt_node_t* level;
+
+	level = gpt_node_internal_read_ptr(node);
+	order = gpt_node_internal_read_order(node);
+	for(index = 0; index < (1 << order); index++) {
+		if(gpt_node_valid(level[index])) {
+			return index;
+		}
+	}
+	panic("Should empty level encountered!");
+}
+
+static inline int
+gpt_node_internal_elongation(gpt_node_t node)
+{
+	/* Elongations are unit sized levels. */
+	return gpt_node_internal_read_order(node) == 0;
+}
+
+static inline gpt_node_t
+gpt_node_init_guard(gpt_node_t node, gpt_key_t guard)
+{
+	int8_t length, shift;
+
+	length = gpt_key_read_length(guard);
+	node.node.guard_length = gpt_key_read_length(guard);
+	/* Store guard value MSB aligned for assembly walker. */
+	shift = GPT_KEY_LENGTH_STORE - length;
+	node.node.guard_value = gpt_key_read_value(guard) << shift;
+
+	return node;
+}
+
+static inline gpt_key_t
+gpt_node_read_guard(gpt_node_t node)
+{
+	int8_t length, shift;
+
+	length = node.node.guard_length;
+	shift = GPT_KEY_LENGTH_STORE - length;
+	/* Need to LSB align guard before returning. */
+	return gpt_key_init(node.node.guard_value >> shift, length);
+}
+
+#endif /* !__ASSEMBLY__ */
+
+
+#endif /* !_ASM_IA64_PAGE_GPT_H */
Index: linux-2.6.20-rc1/include/asm-ia64/page.h
===================================================================
--- linux-2.6.20-rc1.orig/include/asm-ia64/page.h	2007-01-03 11:51:37.343593000 +1100
+++ linux-2.6.20-rc1/include/asm-ia64/page.h	2007-01-03 11:58:00.191871000 +1100
@@ -217,4 +217,8 @@
 #include <asm/page-default.h>
 #endif
 
+#ifdef CONFIG_GPT
+#include <asm/page-gpt.h>
+#endif
+
 #endif /* _ASM_IA64_PAGE_H */


* [PATCH 3/12] Alternate page table implementation cont...
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (35 preceding siblings ...)
  2007-01-13  2:48 ` [PATCH 2/12] Alternate page table implementation cont Paul Davies
@ 2007-01-13  2:48 ` Paul Davies
  2007-01-13  2:49 ` [PATCH 4/12] " Paul Davies
                   ` (12 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:48 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH GPT 03
 * Add the GPT as a page table type.
 * Include the GPT headers from include/linux/pt.h.
 * Add part of the GPT implementation in pt-gpt.h
 and include it from pt.h.
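
 For illustration -- a hypothetical caller, not part of the patch: generic
 code stays page table agnostic by going through the interface:

	pt_path_t path;
	pte_t *ptep;

	ptep = lookup_page_table_lock(mm, &path, address);
	if (ptep && pte_present(*ptep))
		; /* operate on *ptep under the lock */
	unlock_pte(mm, &path);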

Signed-off-by: Paul Davies <pauld@gelato.unsw.edu.au>

---

 gpt.h     |  120 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 pt-gpt.h  |  115 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 pt-type.h |    5 ++
 pt.h      |    6 ++-
 4 files changed, 245 insertions(+), 1 deletion(-)
Index: linux-2.6.20-rc4/include/linux/pt-type.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/pt-type.h	2007-01-11 16:46:47.518747000 +1100
+++ linux-2.6.20-rc4/include/linux/pt-type.h	2007-01-11 16:58:19.345390000 +1100
@@ -5,4 +5,9 @@
 typedef struct { pgd_t *pgd; } pt_t;
 #endif
 
+#ifdef CONFIG_GPT
+#include <linux/gpt.h>
+typedef gpt_t pt_t;
+#endif
+
 #endif
Index: linux-2.6.20-rc4/include/linux/pt-gpt.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/include/linux/pt-gpt.h	2007-01-11 16:58:19.345390000 +1100
@@ -0,0 +1,115 @@
+#ifndef _LINUX_PT_GPT_H
+#define _LINUX_PT_GPT_H
+
+#include <asm/pgtable.h>
+
+#include <linux/hugetlb.h>
+#include <linux/gpt.h>
+
+typedef struct pt_struct { } pt_path_t;
+
+static inline int create_user_page_table(struct mm_struct *mm)
+{
+	mm->page_table = gpt_node_invalid_init();
+
+	return 0;
+}
+
+static inline void destroy_user_page_table(struct mm_struct *mm)
+{
+
+}
+
+static inline pte_t *lookup_page_table(struct mm_struct *mm,
+		unsigned long address, pt_path_t *pt_path)
+{
+	gpt_thunk_t thunk;
+
+	thunk.key = gpt_key_init(extract_key(address), GPT_KEY_LENGTH_MAX);
+	thunk.node_p = &(mm->page_table);
+	if (!gpt_node_inspect_find(&thunk) ||
+	    (gpt_node_type(*thunk.node_p) != GPT_NODE_TYPE_LEAF)) {
+		return NULL;
+	}
+	return gpt_node_leaf_read_ptep(thunk.node_p);
+}
+
+static inline pte_t *build_page_table(struct mm_struct *mm,
+		unsigned long address, pt_path_t *pt_path)
+{
+	int is_root;
+	pte_t pte;
+	gpt_thunk_t update_thunk;
+	gpt_node_t leaf;
+
+	update_thunk.key = gpt_key_init(extract_key(address),
+					GPT_KEY_LENGTH_MAX);
+	pte_clear(mm, address, &pte); /* Should set coverage/page-size here. */
+	leaf = gpt_node_leaf_init(pte);
+
+	update_thunk.node_p = (gpt_node_t *)&mm->page_table;
+	is_root = gpt_node_update_find(&update_thunk);
+	if (gptLevelRestructureInsert(is_root, &update_thunk) < GPT_OK) {
+		return gpt_node_leaf_read_ptep(update_thunk.node_p);
+	}
+	if (gpt_node_insert(leaf, update_thunk) < GPT_OK) {
+		return NULL;
+	}
+	gpt_node_internal_traverse(&update_thunk);
+	return gpt_node_leaf_read_ptep(update_thunk.node_p);
+}
+
+#define INIT_PT
+
+#define lock_pte(mm, pt_path) \
+	({ spin_lock(&mm->page_table_lock);})
+
+/*
+ * Unlocks the ptes notionally pointed to by the
+ * page table path.
+ */
+#define unlock_pte(mm, pt_path) \
+	({ spin_unlock(&mm->page_table_lock);})
+
+/*
+ * Looks up a page table from a saved path.  It also
+ * locks the page table.
+ */
+#define lookup_page_table_lock(mm, pt_path, address) \
+	({ pte_t *__pte = lookup_page_table(mm, address, NULL);\
+	   spin_lock(&mm->page_table_lock); \
+	   __pte; })
+
+/*
+ * Check that the original pte hasn't change.
+ */
+
+#define atomic_pte_same(mm, pte, orig_pte, pt_path) \
+({ \
+	int __same; \
+	spin_lock(&mm->page_table_lock); \
+	__same = pte_same(*pte, orig_pte); \
+	spin_unlock(&mm->page_table_lock); \
+	__same; \
+})
+
+#define is_huge_page(mm, address, pt_path, flags, page) \
+({ \
+	int __ret=0; \
+  	__ret; \
+})
+
+#define set_pt_path(pt_path, ppt_path) ((pt_path) = *(ppt_path))
+
+#define CLUSTER_SIZE	min(32*PAGE_SIZE, 32*PAGE_SIZE)
+
+static inline pte_t *lookup_gate_area(struct mm_struct *mm,
+			unsigned long pg)
+{
+	panic("Implement\n");
+	return NULL;
+}
+
+#define vma_optimization do {} while(0)
+
+#endif
Index: linux-2.6.20-rc4/include/linux/pt.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/pt.h	2007-01-11 16:46:48.246747000 +1100
+++ linux-2.6.20-rc4/include/linux/pt.h	2007-01-11 16:58:19.345390000 +1100
@@ -1,6 +1,10 @@
 #ifndef _LINUX_PT_H
 #define _LINUX_PT_H
 
+#ifdef CONFIG_GPT
+#include <linux/pt-gpt.h>
+#endif
+
 #include <linux/swap.h>
 
 #ifdef CONFIG_PT_DEFAULT
@@ -48,7 +52,7 @@
 		unsigned long addr, unsigned long end, swp_entry_t entry, struct page *page);
 
 void smaps_read_iterator(struct vm_area_struct *vma,
-  unsigned long addr, unsigned long end, struct mem_size_stats *mss);
+		unsigned long addr, unsigned long end, struct mem_size_stats *mss);
 
 int check_policy_read_iterator(struct vm_area_struct *vma,
 		unsigned long addr, unsigned long end, const nodemask_t *nodes,
Index: linux-2.6.20-rc4/include/linux/gpt.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/include/linux/gpt.h	2007-01-11 16:58:19.349390000 +1100
@@ -0,0 +1,120 @@
+/**
+ *  include/linux/gpt.h
+ *
+ *  Copyright (C) 2005 - 2006 University of New South Wales, Australia
+ *      Adam 'WeirdArms' Wiggins <awiggins@cse.unsw.edu.au>,
+ *      Paul Davies <pauld@cse.unsw.edu.au>.
+ */
+
+#ifndef _LINUX_GPT_H
+#define _LINUX_GPT_H
+
+#include <asm/pgtable-gpt.h>
+
+#define GPT_SPECIAL 1
+#define GPT_NORMAL  2
+
+#define GPT_ORDER (PAGE_SHIFT - GPT_NODE_LOG2BYTES) // Main levels page-sized.
+
+#define GPT_KEY_VALUE_MAX ((gpt_key_value_t)((1 << GPT_KEY_LENGTH_MAX) - 1))
+
+typedef gpt_node_t gpt_t;
+
+typedef struct {
+	gpt_key_t key;
+	gpt_node_t *node_p;
+} gpt_thunk_t;
+
+typedef enum {GPT_TRAVERSED_FULL = 0, GPT_TRAVERSED_GUARD,
+              GPT_TRAVERSED_MISMATCH, GPT_TRAVERSED_NONE
+} gpt_traversed_t;
+
+#define GPT_ITERATE_INVALIDS  (1 << 0)
+#define GPT_ITERATE_LEAVES    (1 << 1)
+#define GPT_ITERATE_INTERNALS (1 << 2)
+
+#define GPT_ITERATOR_STACK_SIZE (((GPT_KEY_LENGTH_MAX - 1)/GPT_ORDER) + 1)
+
+typedef struct {
+	int8_t flags, coverage, depth, finished;
+	gpt_key_value_t start, limit;
+	gpt_key_t key;
+	gpt_node_t* node_p;
+	gpt_node_t* stack[GPT_ITERATOR_STACK_SIZE];
+} gpt_iterator_t;
+
+/****************
+* Return codes. *
+****************/
+
+#define GPT_OK            0
+#define GPT_FAILED       -1
+#define GPT_INVALID      -2
+#define GPT_NOT_FOUND    -3
+#define GPT_OCCUPIED     -4
+#define GPT_OVERLAP      -5
+#define GPT_ALLOC_FAILED -6
+
+static inline unsigned long extract_key(unsigned long address)
+{
+	address >>= PAGE_SHIFT;
+
+	return address;
+}
+
+static inline unsigned long get_real_address(unsigned long pos_value)
+{
+	pos_value <<= PAGE_SHIFT;
+
+	return pos_value;
+}
+
+int gpt_node_inspect_find(gpt_thunk_t* inspect_thunk_u);
+int gpt_node_update_find(gpt_thunk_t* update_thunk_u);
+int gpt_node_delete(int is_root, gpt_thunk_t update_thunk);
+int gpt_node_insert(gpt_node_t new_node, gpt_thunk_t update_thunk);
+gpt_traversed_t gpt_node_internal_traverse(gpt_thunk_t* thunk_u);
+void gpt_node_restructure_delete(int is_root, int8_t update_coverage,
+                                 gpt_node_t* update_node_u);
+int gpt_node_restructure_insert(int is_root, gpt_thunk_t* update_thunk_u);
+
+gpt_node_t* gpt_level_allocate(int8_t order);
+void gpt_level_deallocate(gpt_node_t* level, int8_t order);
+
+int gpt_iterator_inspect(gpt_iterator_t* iterator, gpt_key_t* key_r,
+                         gpt_node_t** node_p_r);
+
+gpt_node_t gpt_node_get(gpt_node_t* node_p);
+void gpt_node_set(gpt_node_t* node_p, gpt_node_t node);
+int gpt_node_type(gpt_node_t node);
+int gpt_node_valid(gpt_node_t node);
+gpt_node_t gpt_node_invalid_init(void);
+gpt_node_t gpt_node_leaf_init(pte_t pte);
+int8_t gpt_node_leaf_read_coverage(gpt_node_t node);
+pte_t* gpt_node_leaf_read_ptep(gpt_node_t* node_p);
+gpt_node_t gpt_node_internal_init(gpt_node_t* level, int8_t order);
+gpt_node_t gpt_node_internal_dec_children(gpt_node_t node);
+gpt_node_t gpt_node_internal_inc_children(gpt_node_t node);
+gpt_key_value_t gpt_node_internal_count_children(gpt_node_t node);
+gpt_key_value_t gpt_node_internal_first_child(gpt_node_t node);
+int gpt_node_internal_elongation(gpt_node_t node);
+gpt_node_t* gpt_node_internal_read_ptr(gpt_node_t node);
+int8_t gpt_node_internal_read_order(gpt_node_t node);
+gpt_node_t gpt_node_init_guard(gpt_node_t node, gpt_key_t guard);
+gpt_key_t gpt_node_read_guard(gpt_node_t node);
+
+int gptLevelRestructureInsert(int is_root, gpt_thunk_t* update_thunk_u);
+
+int8_t gptNodeReplication(gpt_node_t node, int8_t coverage);
+
+gpt_key_t gpt_key_null(void);
+gpt_key_t gpt_key_init(gpt_key_value_t value, int8_t length);
+gpt_key_value_t gpt_key_read_value(gpt_key_t key);
+int8_t gpt_key_read_length(gpt_key_t key);
+
+void gptKeyCutMSB(int8_t length_msb, gpt_key_t* key_u, gpt_key_t* key_msb_r);
+void gptKeyCutLSB(int8_t length_lsb, gpt_key_t* key_u, gpt_key_t* key_lsb_r);
+void gptKeysMergeLSB(gpt_key_t key_msb, gpt_key_t* key_u);
+int8_t gptKeysCompareStripPrefix(gpt_key_t* key1_u, gpt_key_t* key2_u);
+
+#endif /* !_LINUX_GPT_H */
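
For readers new to guarded page tables: a key here is a (value, length-in-bits)
slice of the page number, and the Cut/Merge helpers declared above peel guards
off its most- or least-significant end.  The following stands alone in user
space as a model of that arithmetic; it assumes gpt_key_t behaves as the
accessors suggest, and is illustrative only, not the kernel implementation:

#include <assert.h>

/* Model of gpt_key_t as an explicit (value, length-in-bits) pair. */
struct key { unsigned long value; int length; };

/* Model of gptKeyCutMSB(): strip the length_msb most-significant
 * bits off *k and return them as a key of their own. */
static struct key key_cut_msb(struct key *k, int length_msb)
{
	struct key msb;

	if (length_msb > k->length)
		length_msb = k->length;
	k->length -= length_msb;
	msb.value = k->value >> k->length;
	msb.length = length_msb;
	k->value &= (1UL << k->length) - 1;	/* assumes length < 64 */
	return msb;
}

int main(void)
{
	/* Page number of address 0x12345000 (PAGE_SHIFT of 12 assumed),
	 * treated as a 52-bit key, as extract_key() above would produce. */
	struct key k = { 0x12345000UL >> 12, 52 };
	struct key guard = key_cut_msb(&k, 8);	/* peel an 8-bit guard */

	assert(guard.length == 8 && k.length == 44);
	assert(k.value == 0x12345);	/* low bits survive the cut */
	return 0;
}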


* [PATCH 4/12] Alternate page table implementation cont...
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (36 preceding siblings ...)
  2007-01-13  2:48 ` [PATCH 3/12] " Paul Davies
@ 2007-01-13  2:49 ` Paul Davies
  2007-01-13  2:49 ` [PATCH 5/12] " Paul Davies
                   ` (11 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:49 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH GPT 04
 * Add C files for GPT implementation and update Makefile

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 Makefile             |    7 +
 pt-gpt-alloc.c       |   38 +++++++++
 pt-gpt-restructure.c |  195 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 239 insertions(+), 1 deletion(-)
Index: linux-2.6.20-rc1/mm/Makefile
===================================================================
--- linux-2.6.20-rc1.orig/mm/Makefile	2007-01-03 12:30:42.879871000 +1100
+++ linux-2.6.20-rc1/mm/Makefile	2007-01-03 12:35:13.756007000 +1100
@@ -5,7 +5,12 @@
 mmu-y			:= nommu.o
 mmu-$(CONFIG_MMU)	:= fremap.o highmem.o madvise.o memory.o mincore.o \
 			   mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
-			   vmalloc.o pt-default.o
+			   vmalloc.o
+
+ifdef CONFIG_MMU
+mmu-$(CONFIG_PT_DEFAULT)+= pt-default.o
+mmu-$(CONFIG_GPT) += pt-gpt-core.o pt-gpt-restructure.o pt-gpt-alloc.o
+endif
 
 obj-y			:= bootmem.o filemap.o mempool.o oom_kill.o fadvise.o \
 			   page_alloc.o page-writeback.o pdflush.o \
Index: linux-2.6.20-rc1/mm/pt-gpt-alloc.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc1/mm/pt-gpt-alloc.c	2007-01-03 12:30:46.159871000 +1100
@@ -0,0 +1,38 @@
+/**
+ *  mm/pt-gpt-alloc.c
+ *
+ *  Copyright (C) 2005 - 2006 University of New South Wales, Australia
+ *      Adam 'WeirdArms' Wiggins <awiggins@cse.unsw.edu.au>,
+ *      Paul Davies <pauld@cse.unsw.edu.au>.
+ */
+
+#include <linux/types.h>
+#include <linux/bootmem.h>
+#include <linux/gpt.h>
+
+#include <asm/pgalloc.h>
+
+int gpt_memsrc = GPT_SPECIAL;
+
+/* awiggins (2006-07-17): Currently ignores the size and allocates a page. */
+gpt_node_t*
+gpt_level_allocate(int8_t order)
+{
+	gpt_node_t* level;
+
+	if(gpt_memsrc == GPT_SPECIAL) {
+		level = (gpt_node_t*)alloc_bootmem_pages(PAGE_SIZE);
+	} else {
+		level = (gpt_node_t*)pgtable_quicklist_alloc();
+	}
+	if(!level) {
+		panic("GPT level allocation failed!\n");
+	}
+	return level;
+}
+
+void
+gpt_level_deallocate(gpt_node_t* level, int8_t order)
+{
+	pgtable_quicklist_free((void*)level);
+}
Index: linux-2.6.20-rc1/mm/pt-gpt-restructure.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc1/mm/pt-gpt-restructure.c	2007-01-03 12:40:30.841030000 +1100
@@ -0,0 +1,195 @@
+/**
+ *  mm/pt-gpt-restructure.c
+ *
+ *  Copyright (C) 2005 - 2006 University of New South Wales, Australia
+ *      Adam 'WeirdArms' Wiggins <awiggins@cse.unsw.edu.au>,
+ *      Paul Davies <pauld@cse.unsw.edu.au>.
+ */
+
+#include <linux/types.h>
+#include <linux/bootmem.h>
+#include <linux/gpt.h>
+
+/****************************
+* Local function prototypes *
+****************************/
+
+static void gpt_node_restructure_merge(gpt_key_value_t merge_index,
+                                       int8_t coverage,
+                                       gpt_node_t* update_node_u);
+static int gpt_node_restructure_cut(int8_t cut_length,
+                                    gpt_node_t* update_node_u);
+
+/*********************
+* Exported functions *
+*********************/
+
+void
+gpt_node_restructure_delete(int is_root, int8_t coverage,
+                            gpt_node_t* update_node_u)
+{
+	gpt_node_t update_node;
+	gpt_key_value_t index_value;
+
+	/* If deletion window is the root node, no restructuring possible. */
+	update_node = gpt_node_get(update_node_u);
+	if(!is_root && (gpt_node_internal_count_children(update_node) == 1)) {
+		index_value = gpt_node_internal_first_child(update_node);
+		gpt_node_restructure_merge(index_value, coverage,
+									   update_node_u);
+	}
+}
+
+int
+gpt_node_restructure_insert(int is_root, gpt_thunk_t* update_thunk_u)
+{
+	int traversed = 0;
+	int8_t match_length;
+	gpt_key_t key, guard;
+	gpt_node_t node;
+	gpt_thunk_t temp_thunk;
+
+	/* If required, traverse to the node covering the insertion window. */
+	temp_thunk = *update_thunk_u;
+	if(!is_root && !gpt_node_internal_traverse(&temp_thunk)) {
+		traversed = 1;
+	} else {
+		temp_thunk.key = update_thunk_u->key;
+	}
+	/* Check whether the insertion window lies on a guard; if so, restructure. */
+	node = gpt_node_get(temp_thunk.node_p);
+	if(!gpt_node_valid(node)) {
+		return GPT_OK;
+	}
+	key = temp_thunk.key;
+	guard = gpt_node_read_guard(node);
+	match_length = gptKeysCompareStripPrefix(&key, &guard);
+	if(gpt_key_compare_null(key)) {
+		return GPT_OVERLAP;
+	}
+	if(gpt_key_compare_null(guard)) {
+		switch(gpt_node_type(node)) {
+		case GPT_NODE_TYPE_LEAF:
+			return GPT_OVERLAP;
+		case GPT_NODE_TYPE_INTERNAL:
+			return GPT_OK;
+		default:
+			panic("Invalid GPT node type!");
+		}
+	}
+	/* Traverse to cut node and return it cut. */
+	if(traversed) {
+		*update_thunk_u = temp_thunk;
+	}
+	return gpt_node_restructure_cut(match_length, update_thunk_u->node_p);
+}
+
+int
+gptLevelRestructureInsert(int is_root, gpt_thunk_t* update_thunk_u)
+{
+	int8_t match_length;
+	gpt_key_t key, guard;
+	gpt_node_t node;
+	gpt_thunk_t temp_thunk;
+
+	/* If required, try traversing to node covering the insert point. */
+	key = update_thunk_u->key;
+	temp_thunk = *update_thunk_u;
+	if(!is_root &&
+	   (gpt_node_internal_traverse(&temp_thunk) == GPT_TRAVERSED_FULL)) {
+			update_thunk_u->key = key = temp_thunk.key;
+	}
+	update_thunk_u->node_p = temp_thunk.node_p;
+	/* Already at the insertion point. */
+	node = gpt_node_get(update_thunk_u->node_p);
+	if(!gpt_node_valid(node)) {
+		return GPT_OK;
+	}
+	guard = gpt_node_read_guard(node);
+	match_length = gptKeysCompareStripPrefix(&key, &guard);
+	if(gpt_key_compare_null(key)) {
+		return GPT_OVERLAP;
+	}
+	if(gpt_key_compare_null(guard)) {
+		switch(gpt_node_type(node)) {
+		case GPT_NODE_TYPE_LEAF:
+			return GPT_OVERLAP;
+		case GPT_NODE_TYPE_INTERNAL:
+			return GPT_OK;
+		default:
+			panic("Should never get here\n");
+		}
+	}
+	return gpt_node_restructure_cut(match_length, update_thunk_u->node_p);
+}
+
+/******************
+* Local functions *
+******************/
+
+static void
+gpt_node_restructure_merge(gpt_key_value_t merge_index, int8_t coverage,
+                           gpt_node_t* update_node_u)
+{
+	int8_t level_order, guard_length, replication;
+	gpt_key_t guard_top, guard, index;
+	gpt_node_t temp_node, update_node;
+	gpt_node_t* level;
+
+	/* Find the merge-node, guards and index for merging. */
+	update_node = gpt_node_get(update_node_u);
+	level = gpt_node_internal_read_ptr(update_node);
+	level_order = gpt_node_internal_read_order(update_node);
+	guard_top = gpt_node_read_guard(update_node);
+	temp_node = level[merge_index];
+	guard = gpt_node_read_guard(temp_node);
+
+	/* Merge guards and index into a single node. */
+	guard_length = gpt_key_read_length(guard_top);
+	coverage -= (level_order + guard_length);
+	replication = gptNodeReplication(temp_node, coverage);
+	index = gpt_key_init(merge_index >> replication,
+						 level_order - replication);
+	gptKeysMergeLSB(guard_top, &index);
+	gptKeysMergeLSB(index, &guard);
+	gpt_node_set(update_node_u, gpt_node_init_guard(temp_node, guard));
+	gpt_level_deallocate(level, level_order);
+}
+
+static int
+gpt_node_restructure_cut(int8_t cut_length, gpt_node_t* update_node_u)
+{
+	int error;
+	int8_t index_length, coverage, order = GPT_ORDER;
+	gpt_key_t guard_top, guard_bottom, index;
+	gpt_node_t node, update_node;
+	gpt_node_t* level;
+	gpt_thunk_t thunk;
+
+	/* Preserve the update-node's guard. */
+	update_node = gpt_node_get(update_node_u);
+	thunk.key = gpt_node_read_guard(update_node);
+	guard_bottom = thunk.key;
+
+	/* Separate the node in two. */
+	cut_length -= (cut_length % order); /* Top guard must be a multiple of trie's order. */
+	gptKeyCutMSB(cut_length, &guard_bottom, &guard_top);
+	index_length = gpt_key_read_length(guard_bottom);
+	if(gpt_node_type(update_node) == GPT_NODE_TYPE_LEAF) {
+		coverage = gpt_node_leaf_read_coverage(update_node);
+	} else {
+		coverage = 0;
+	}
+	index_length = (order > index_length + coverage) ?
+			index_length + coverage : order;
+	gptKeyCutMSB(index_length, &guard_bottom, &index);
+	level = gpt_level_allocate(index_length);
+	if(!level) {
+		return GPT_ALLOC_FAILED;
+	}
+	node = gpt_node_internal_init(level, index_length);
+	node = gpt_node_init_guard(node, guard_top);
+	thunk.node_p = &node;
+	error = gpt_node_insert(update_node, thunk);
+	if(error < 0) {
+		return error;
+	}
+	gpt_node_set(update_node_u, node);
+	return GPT_OK;
+}
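
To make the cut arithmetic above concrete: suppose the update-node carries a
12-bit guard 0xABC and the new key diverges after 6 matching bits.  The cut
length is rounded down to the trie order, the top bits become the new internal
node's guard, the next bits index its level, and the remainder guards the
demoted node.  A user-space illustration, with an order of 4 chosen for
readability rather than the kernel's GPT_ORDER:

#include <stdio.h>

int main(void)
{
	unsigned guard = 0xABC;		/* update-node's guard */
	int length = 12, order = 4, cut = 6;

	cut -= cut % order;	/* top guard a multiple of the order: 4 */
	printf("guard_top    = 0x%x\n", guard >> (length - cut));   /* 0xA */
	printf("level index  = 0x%x\n",
	       (guard >> (length - cut - order)) & 0xF);	    /* 0xB */
	printf("guard_bottom = 0x%x\n", guard & 0xF);		    /* 0xC */
	return 0;
}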


* [PATCH 5/12] Alternate page table implementation cont...
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (37 preceding siblings ...)
  2007-01-13  2:49 ` [PATCH 4/12] " Paul Davies
@ 2007-01-13  2:49 ` Paul Davies
  2007-01-13  2:49 ` [PATCH 6/12] " Paul Davies
                   ` (10 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:49 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH GPT 05
 * Adds IA64 GPT assembler lookup into ivt.h
 * Adds other arch-dependent implementation for GPT (parallels how
 it's done for the default page table).

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 arch/ia64/kernel/ivt.S         |    4 +-
 arch/ia64/kernel/setup.c       |    4 ++
 include/asm-ia64/ivt.h         |   77 +++++++++++++++++++++++++++++++++++++++++
 include/asm-ia64/mmu_context.h |    3 +
 include/asm-ia64/pgalloc-gpt.h |   18 +++++++++
 include/asm-ia64/pgalloc.h     |    4 ++
 include/asm-ia64/pt-gpt.h      |   40 +++++++++++++++++++++
 include/asm-ia64/pt.h          |    4 ++
 8 files changed, 153 insertions(+), 1 deletion(-)
Index: linux-2.6.20-rc1/include/asm-ia64/pt-gpt.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc1/include/asm-ia64/pt-gpt.h	2007-01-03 12:43:01.693030000 +1100
@@ -0,0 +1,40 @@
+#ifndef _ASM_IA64_GPT_H
+#define _ASM_IA64_GPT_H 1
+
+#include <linux/bootmem.h>
+#include <linux/pt.h>
+
+/* Create kernel page table */
+static inline void create_kernel_page_table(void)
+{
+	init_mm.page_table = gpt_node_invalid_init();
+}
+
+/* Lookup the kernel page table */
+static inline pte_t *lookup_page_table_k(unsigned long address)
+{
+	return lookup_page_table(&init_mm, address, NULL);
+}
+
+/* Lookup the kernel page table */
+static inline pte_t *lookup_page_table_k2(unsigned long *address)
+{
+	panic("Unimplemented");
+	return NULL;
+}
+
+/* Build the kernel page table */
+static inline pte_t *build_page_table_k(unsigned long address)
+{
+	return build_page_table(&init_mm, address, NULL);
+}
+
+/* Builds the kernel page table from bootmem (before the kernel memory
+ * allocator comes online) */
+static inline pte_t *build_page_table_k_bootmem(unsigned long address, int _node)
+{
+	return build_page_table(&init_mm, address, NULL);
+}
+
+
+#endif
Index: linux-2.6.20-rc1/include/asm-ia64/pt.h
===================================================================
--- linux-2.6.20-rc1.orig/include/asm-ia64/pt.h	2007-01-03 12:35:16.310729000 +1100
+++ linux-2.6.20-rc1/include/asm-ia64/pt.h	2007-01-03 12:40:56.437030000 +1100
@@ -5,6 +5,10 @@
 #include <asm/pt-default.h>
 #endif
 
+#ifdef CONFIG_GPT
+#include <asm/pt-gpt.h>
+#endif
+
 void create_kernel_page_table(void);
 
 pte_t *build_page_table_k(unsigned long address);
Index: linux-2.6.20-rc1/include/asm-ia64/pgalloc-gpt.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc1/include/asm-ia64/pgalloc-gpt.h	2007-01-03 12:40:56.441030000 +1100
@@ -0,0 +1,18 @@
+/**
+ *  include/asm-ia64/pgalloc-gpt.h
+ *
+ *  Copyright (C) 2005 - 2006 University of New South Wales, Australia
+ *      Adam 'WeirdArms' Wiggins <awiggins@cse.unsw.edu.au>,
+ *      Paul Davies <pauld@cse.unsw.edu.au>.
+ */
+
+#ifndef _ASM_IA64_PGALLOC_GPT_H
+#define _ASM_IA64_PGALLOC_GPT_H
+
+#define GPT_SPECIAL 1
+#define GPT_NORMAL  2
+
+extern int gpt_memsrc;
+
+
+#endif /* !_ASM_IA64_PGALLOC_GPT_H */
Index: linux-2.6.20-rc1/include/asm-ia64/pgalloc.h
===================================================================
--- linux-2.6.20-rc1.orig/include/asm-ia64/pgalloc.h	2007-01-03 12:35:16.310729000 +1100
+++ linux-2.6.20-rc1/include/asm-ia64/pgalloc.h	2007-01-03 12:40:56.441030000 +1100
@@ -81,4 +81,8 @@
 #include <asm/pgalloc-default.h>
 #endif
 
+#ifdef CONFIG_GPT
+#include <asm/pgalloc-gpt.h>
+#endif
+
 #endif				/* _ASM_IA64_PGALLOC_H */
Index: linux-2.6.20-rc1/arch/ia64/kernel/setup.c
===================================================================
--- linux-2.6.20-rc1.orig/arch/ia64/kernel/setup.c	2007-01-03 12:35:16.310729000 +1100
+++ linux-2.6.20-rc1/arch/ia64/kernel/setup.c	2007-01-03 12:40:56.441030000 +1100
@@ -53,6 +53,7 @@
 #include <asm/page.h>
 #include <asm/patch.h>
 #include <asm/pgtable.h>
+#include <asm/pgalloc.h>
 #include <asm/processor.h>
 #include <asm/sal.h>
 #include <asm/sections.h>
@@ -548,6 +549,9 @@
 	platform_setup(cmdline_p);
 	create_kernel_page_table();
 	paging_init();
+#ifdef CONFIG_GPT
+	gpt_memsrc = GPT_NORMAL;
+#endif
 }
 
 /*
Index: linux-2.6.20-rc1/include/asm-ia64/ivt.h
===================================================================
--- linux-2.6.20-rc1.orig/include/asm-ia64/ivt.h	2007-01-03 12:35:16.310729000 +1100
+++ linux-2.6.20-rc1/include/asm-ia64/ivt.h	2007-01-03 12:40:56.441030000 +1100
@@ -54,3 +54,80 @@
 .endm
 
 #endif
+
+#ifdef CONFIG_GPT
+
+/*
+ * FIND_PTE
+ * Walks the page table to find a PTE
+ * @va,		register holding virtual address
+ * @ppte, 	register with pointer to page table entry
+ * @ok,		predicate set if found
+ * @fail,      	predicate set if !found
+ */
+
+#define tmp             r20     /* tmp val to work out key */
+#define pnode           \ppte   /* pointer to node         */
+#define guard           r20     /* lower node word         */
+#define key             r21     /* lookup key              */
+#define size            r22     /* size of node level      */
+#define multiplier      r23
+#define inc             r23     /* inc = multiplier        */
+#define cmp_value       r24     /* cmp val with guard      */
+#define length          r25     /* guard length            */
+#define shift           r26     /* justify guard shift     */
+#define type            r27     /* node type               */
+#define level           r27     /* higher node word.       */
+#define guard2          r18
+#define internal        p8      /* Internal node           */
+#define recurse         p9
+
+.macro find_pte va, ppte, fail, ok
+	;;
+        rsm psr.dt              /* switch to using physical data addressing. */
+        mov pnode = IA64_KR(CURRENT_MM)  /* Load pointer to task's GPT root. */
+        shr.u tmp = \va, 61                       /* Pull out region number. */
+        ;;
+        cmp.eq \ok, p0=5, tmp  /* Compare if region number is kernel region. */
+        ;;
+        srlz.d                             /* Don't remove, clarify purpose. */
+        LOAD_PHYSICAL(\ok, pnode, init_mm) /* If kernel region, use init_mm. */
+        mov key = \va
+        ;;
+.F1:
+/*0 M */ld8.acq guard = [pnode], 8               /* Load first word of node. */
+        ;;
+/*1 M */xor cmp_value = guard, key          /* Compare guard and key's MSBs. */
+/*1 I0*/extr.u size = guard, 8, 4                      /* Extract level size */
+/*1 M */and length = 63, guard                      /* Extract guard length. */
+/*1 I */tbit.nz internal, p0 = guard, 6           /* Test for internal node. */
+        ;;
+/*2 I */(internal) sub multiplier = 64, size /* Prep key for level indexing. */
+/*2 M */sub shift = 64, length    /* Prep guard/key shift from guard length. */
+/*2 M */(internal) ld8.acq level = [pnode], -8 /* Get pointer to next level. */
+/*2 I */(internal) shl key = key, length             /* Strip guard from key */
+        ;;
+/*3 M */(internal) ld8 guard2 = [pnode]   /* Load guard to check for update. */
+/*3 I */(internal) shr.u inc = key, multiplier     /* Calculate level index. */
+/*3 I */shr.u cmp_value = cmp_value, shift      /* Clear out non-guard bits. */
+/*3 M */(internal) cmp.ne.unc recurse, \fail = level, r0   /* Level updated? */
+        ;;
+/*4 M */(internal) cmp.eq.and.orcm recurse, \fail = guard, guard2/* Changed? */
+/*4 I */(internal) shladd pnode = inc, 4, level       /* Point to next node. */
+/*4 M */(internal) cmp.eq.and.orcm recurse, \fail = cmp_value, r0  /* Match? */
+/*4 I */(internal) shl key = key, size               /* strip level from va. */
+/*4 B */(recurse) br.cond.dptk .F1            /* Get next node or exit loop. */
+        ;;
+.F2:
+        extr.u type = guard, 6, 2                       /* Extract node type */
+        cmp.eq \ok, p0 = r0, r0
+        ;;
+        (\fail) cmp.eq p0, \ok = r0, r0
+        ;;
+        (\ok) cmp.eq \ok, \fail = 2, type /* Did we terminate on a leaf node? */
+        ;;
+        (\ok) cmp.eq \ok, \fail = cmp_value, r0 /* FIX! */   /* Check guard. */
+        ;;
+.endm
+
+#endif
Index: linux-2.6.20-rc1/arch/ia64/kernel/ivt.S
===================================================================
--- linux-2.6.20-rc1.orig/arch/ia64/kernel/ivt.S	2007-01-03 12:35:16.310729000 +1100
+++ linux-2.6.20-rc1/arch/ia64/kernel/ivt.S	2007-01-03 12:40:56.445030000 +1100
@@ -490,8 +490,10 @@
 	;;
 (p7)	cmp.eq.or.andcm p6,p7=r17,r0		// was pmd_present(*pmd) == NULL?
 	dep r17=r19,r17,3,(PAGE_SHIFT-3)	// r17=pte_offset(pmd,addr);
+#endif
+#ifdef CONFIG_GPT
+	find_pte r16,r17,p6,p7
 #endif
-	/* find_pte r16,r17,p6,p7 */
 (p6)	br.cond.spnt page_fault
 	mov b0=r30
 	br.sptk.many b0				// return to continuation point
Index: linux-2.6.20-rc1/include/asm-ia64/mmu_context.h
===================================================================
--- linux-2.6.20-rc1.orig/include/asm-ia64/mmu_context.h	2007-01-03 12:35:16.310729000 +1100
+++ linux-2.6.20-rc1/include/asm-ia64/mmu_context.h	2007-01-03 12:40:56.445030000 +1100
@@ -194,6 +194,9 @@
 #ifdef CONFIG_PT_DEFAULT
 	ia64_set_kr(IA64_KR_PT_BASE, __pa(next->page_table.pgd));
 #endif
+#ifdef CONFIG_GPT
+	ia64_set_kr(IA64_KR_CURRENT_MM, __pa(next));
+#endif
 	activate_context(next);
 }
 

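For reference while reading the assembler: the walk FIND_PTE performs is,
in C terms, roughly the sketch below, built from the gpt.h API introduced
earlier in this series.  It elides what the assembler must also handle
(kernel-region selection, racing guard updates, physical addressing), and it
assumes the page-number key is GPT_KEY_LENGTH_MAX bits and that the GPT root
lives in mm->page_table, as the CONFIG_GPT hunks suggest.  A sketch, not
kernel code:

#include <linux/gpt.h>
#include <linux/pt.h>

static pte_t *gpt_lookup_sketch(struct mm_struct *mm, unsigned long address)
{
	gpt_thunk_t thunk;

	thunk.key = gpt_key_init(extract_key(address), GPT_KEY_LENGTH_MAX);
	thunk.node_p = &mm->page_table;

	/* Descend while whole guards match and the key indexes a level. */
	while (gpt_node_internal_traverse(&thunk) == GPT_TRAVERSED_FULL)
		;
	/* Only a leaf terminates the lookup; anything else is a miss and
	 * the fault path takes over, as the branch to page_fault does in
	 * the ivt.S hunk above. */
	if (gpt_node_type(gpt_node_get(thunk.node_p)) != GPT_NODE_TYPE_LEAF)
		return NULL;
	return gpt_node_leaf_read_ptep(thunk.node_p);
}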

* [PATCH 6/12] Alternate page table implementation cont...
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (38 preceding siblings ...)
  2007-01-13  2:49 ` [PATCH 5/12] " Paul Davies
@ 2007-01-13  2:49 ` Paul Davies
  2007-01-13  2:49 ` [PATCH 7/12] " Paul Davies
                   ` (9 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:49 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH GPT 06
 * Adds more GPT implementation.

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 include/asm-ia64/kregs.h       |    4 +
 include/asm-ia64/mmu_context.h |    1 
 mm/pt-gpt-core.c               |  101 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 106 insertions(+)
Index: linux-2.6.20-rc1/include/asm-ia64/kregs.h
===================================================================
--- linux-2.6.20-rc1.orig/include/asm-ia64/kregs.h	2007-01-03 15:34:29.855180000 +1100
+++ linux-2.6.20-rc1/include/asm-ia64/kregs.h	2007-01-03 15:34:36.303180000 +1100
@@ -20,6 +20,10 @@
 #define IA64_KR_CURRENT		6	/* ar.k6: "current" task pointer */
 #define IA64_KR_PT_BASE		7	/* ar.k7: page table base address (physical) */
 
+#ifdef CONFIG_GPT
+#define IA64_KR_CURRENT_MM	7	/* ar.k7: mm pointer for the GPT walker (reuses the PT_BASE slot) */
+#endif
+
 #define _IA64_KR_PASTE(x,y)	x##y
 #define _IA64_KR_PREFIX(n)	_IA64_KR_PASTE(ar.k, n)
 #define IA64_KR(n)		_IA64_KR_PREFIX(IA64_KR_##n)
Index: linux-2.6.20-rc1/include/asm-ia64/mmu_context.h
===================================================================
--- linux-2.6.20-rc1.orig/include/asm-ia64/mmu_context.h	2007-01-03 15:34:29.859180000 +1100
+++ linux-2.6.20-rc1/include/asm-ia64/mmu_context.h	2007-01-03 15:34:36.303180000 +1100
@@ -197,6 +197,7 @@
 #ifdef CONFIG_GPT
 	ia64_set_kr(IA64_KR_CURRENT_MM, __pa(next));
 #endif
+
 	activate_context(next);
 }
 
Index: linux-2.6.20-rc1/mm/pt-gpt-core.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc1/mm/pt-gpt-core.c	2007-01-03 15:35:25.427180000 +1100
@@ -0,0 +1,101 @@
+/**
+ *  mm/pt-gpt-core.c
+ *
+ *  Copyright (C) 2005 - 2006 University of New South Wales, Australia
+ *      Adam 'WeirdArms' Wiggins <awiggins@cse.unsw.edu.au>,
+ *      Paul Davies <pauld@cse.unsw.edu.au>.
+ */
+
+#include <linux/types.h>
+#include <linux/bootmem.h>
+#include <linux/gpt.h>
+
+#include <linux/mm.h>
+#include <linux/hugetlb.h>
+#include <linux/mman.h>
+#include <linux/swap.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+#include <linux/rmap.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/pt.h>
+
+#include <asm/uaccess.h>
+#include <asm/tlb.h>
+#include <asm/tlbflush.h>
+#include <asm/pgtable.h>
+
+#include <linux/swapops.h>
+#include <linux/elf.h>
+#include <linux/pt-iterator-ops.h>
+
+#define GPT_ITERATOR_INRANGE 0
+#define GPT_ITERATOR_START   1
+#define GPT_ITERATOR_LIMIT   (-1)
+
+/*******************************************************************************
+* Local function prototypes.                                                   *
+*******************************************************************************/
+
+static inline void gpt_iterator_inspect_init_all(gpt_iterator_t* iterator_r,
+                                                 gpt_t* trie_p);
+static inline void gpt_iterator_inspect_init_range(gpt_iterator_t* iterator_r,
+                                                   gpt_t* trie_p,
+                                                   unsigned long addr,
+                                                   unsigned long end);
+static inline int gpt_iterator_free_pgtables(gpt_iterator_t* iterator_u,
+                                             gpt_node_t** node_p_r,
+                                             unsigned long floor,
+                                             unsigned long ceiling);
+static inline int gpt_iterator_inspect_internals_all(gpt_iterator_t* iterator,
+                                                     gpt_key_t* key_r,
+                                                     gpt_node_t** node_p_r);
+static inline int gpt_iterator_inspect_leaves_range(gpt_iterator_t* iterator_u,
+                                                    gpt_key_t* key_r,
+                                                    gpt_node_t** node_p_r);
+static inline int gpt_iterator_leaf_visit_range(gpt_iterator_t* iterator_u,
+                                                gpt_key_t* key_r,
+                                                gpt_node_t** node_p_r);
+static inline int gpt_iterator_internal_free_pgtables(gpt_iterator_t* iterator_u,
+                                                      gpt_node_t**node_p_r,
+                                                      unsigned long floor,
+                                                      unsigned long ceiling);
+static inline int gpt_iterator_internal_visit_range(gpt_iterator_t* iterator_u,
+                                                    gpt_node_t** node_p_r);
+static inline int gpt_iterator_internal_visit_all(gpt_iterator_t* iterator_u,
+                                                  gpt_key_t* key_r,
+                                                  gpt_node_t** node_p_r);
+static inline void gpt_iterator_terminal_free_pgtables(gpt_iterator_t* iterator_u);
+static inline void gpt_iterator_terminal_skip_range(gpt_iterator_t* iterator_u);
+static inline void gpt_iterator_terminal_skip_all(gpt_iterator_t* iterator_u);
+static inline void gpt_iterator_internal_skip_range(gpt_iterator_t* iterator_u);
+static inline int gpt_iterator_check_bounds(gpt_iterator_t* iterator_u,
+                                            int8_t* replication_r);
+static inline void gpt_iterator_inspect_push_range(gpt_iterator_t* iterator_u);
+static inline void gpt_iterator_inspect_push_all(gpt_iterator_t* iterator_u);
+static inline void gpt_iterator_inspect_next(gpt_iterator_t* iterator_u,
+                                              int8_t replication);
+
+static int gpt_iterator_internal(gpt_iterator_t* iterator_u, gpt_key_t* key_r,
+                                 gpt_node_t** node_p_r);
+static int gpt_iterator_leaf(gpt_iterator_t* iterator_u, gpt_key_t* key_r,
+                             gpt_node_t** node_p_r);
+static inline int gpt_iterator_invalid(gpt_iterator_t* iterator_u,
+                                       gpt_key_t* key_r, gpt_node_t** node_p_r);
+static inline void gpt_iterator_inspect_pop(gpt_iterator_t* iterator_u);
+static inline gpt_node_t* gpt_iterator_parent(gpt_iterator_t iterator);
+static inline void gpt_iterator_return(gpt_iterator_t iterator,
+                                       gpt_key_t* key_r, gpt_node_t** node_p_r);
+
+static inline int gpt_node_delete_single(gpt_thunk_t delete_thunk,
+                                         gpt_node_t delete_node);
+static int gpt_node_delete_replicate(gpt_thunk_t delete_thunk,
+                                     gpt_node_t delete_node);
+static inline int gpt_node_internal_delete(gpt_thunk_t delete_thunk,
+                                           gpt_node_t delete_node);
+static inline void gpt_node_insert_single(gpt_node_t new_node,
+                                          gpt_thunk_t insert_thunk);
+static int gpt_node_insert_replicate(gpt_node_t new_node,
+                                     gpt_thunk_t insert_thunk,
+                                     gpt_node_t insert_node);


* [PATCH 7/12] Alternate page table implementation cont...
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (39 preceding siblings ...)
  2007-01-13  2:49 ` [PATCH 6/12] " Paul Davies
@ 2007-01-13  2:49 ` Paul Davies
  2007-01-13  2:49 ` [PATCH 8/12] " Paul Davies
                   ` (8 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:49 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH GPT 07
 * Adding GPT implementation

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 pt-gpt-core.c |  180 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 180 insertions(+)
Index: linux-2.6.20-rc1/mm/pt-gpt-core.c
===================================================================
--- linux-2.6.20-rc1.orig/mm/pt-gpt-core.c	2007-01-03 15:35:25.427180000 +1100
+++ linux-2.6.20-rc1/mm/pt-gpt-core.c	2007-01-03 15:42:21.973271000 +1100
@@ -99,3 +99,183 @@
 static int gpt_node_insert_replicate(gpt_node_t new_node,
                                      gpt_thunk_t insert_thunk,
                                      gpt_node_t insert_node);
+
+/*******************************************************************************
+ * Exported functions.                                                         *
+ *******************************************************************************/
+
+void
+gptKeyCutMSB(int8_t length_msb, gpt_key_t* key_u, gpt_key_t* key_msb_r)
+{
+	int8_t length;
+	gpt_key_value_t value;
+
+	value = gpt_key_read_value(*key_u);
+	length = gpt_key_read_length(*key_u);
+	if(length_msb > length) length_msb = length;
+	length -= length_msb;
+	if(key_msb_r) {
+		*key_msb_r = ((length_msb == 0) ? gpt_key_null() :
+					  gpt_key_init(value >> length, length_msb));
+	}
+	if(length == GPT_KEY_LENGTH_MAX) return;
+	*key_u = gpt_key_init(value & (((gpt_key_value_t)1 << length) - 1),
+						  length);
+}
+
+void
+gptKeyCutLSB(int8_t length_lsb, gpt_key_t* key_u, gpt_key_t* key_lsb_r)
+{
+	int8_t length;
+	gpt_key_value_t value;
+
+	value = gpt_key_read_value(*key_u);
+	length = gpt_key_read_length(*key_u);
+	if(length_lsb > length) length_lsb = length;
+	length -= length_lsb;
+	if(key_lsb_r) {
+		*key_lsb_r = gpt_key_init(value & ~gpt_key_value_mask(length_lsb),
+								  length_lsb);
+	}
+	*key_u = ((length == 0) ? gpt_key_null() :
+			  gpt_key_init(value >> length_lsb, length));
+}
+
+void
+gptKeysMergeMSB(gpt_key_t key_lsb, gpt_key_t* key_u)
+{
+	int8_t length, length_lsb;
+	gpt_key_value_t value;
+
+	value = gpt_key_read_value(*key_u);
+	length = gpt_key_read_length(*key_u);
+	length_lsb = gpt_key_read_length(key_lsb);
+	value = (value << length_lsb) + gpt_key_read_value(key_lsb);
+	length += length_lsb;
+	*key_u = gpt_key_init(value, length);
+}
+
+void
+gptKeysMergeLSB(gpt_key_t key_msb, gpt_key_t* key_u)
+{
+	gpt_key_value_t value;
+	int8_t length;
+
+	value = gpt_key_read_value(*key_u);
+	length = gpt_key_read_length(*key_u);
+	value = (gpt_key_read_value(key_msb) << length) + value;
+	length += gpt_key_read_length(key_msb);
+	*key_u = gpt_key_init(value, length);
+}
+
+/* awiggins (2006-02-07): I'd like to simplify this function if possible. */
+int8_t
+gptKeysCompareStripPrefix(gpt_key_t* key1_u, gpt_key_t* key2_u)
+{
+	int8_t length, key1_length, key2_length;
+	gpt_key_value_t value1, value2;
+
+	key1_length = key1_u->length; key2_length = key2_u->length;
+	if(key1_length < key2_length) {
+		length = key1_length;
+		value1 = key1_u->value;
+		value2 = key2_u->value >> (key2_length - length);
+	} else {
+		length = key2_length;
+		value1 = key2_u->value;
+		value2 = key1_u->value >> (key1_length - length);
+	}
+	if(length == 0) return 0;
+	length = gpt_ctlz(value1 ^ value2, length - 1);
+
+	/* Strip matching prefix from keys. */
+	gptKeyCutMSB(length, key1_u, NULL);
+	gptKeyCutMSB(length, key2_u, NULL);
+
+	return length;
+}
+
+int8_t
+gptNodeReplication(gpt_node_t node, int8_t coverage)
+{
+	gpt_key_t guard;
+	int8_t leaf_coverage, guard_length;
+
+	switch(gpt_node_type(node)) {
+	case GPT_NODE_TYPE_INTERNAL:
+	case GPT_NODE_TYPE_INVALID:
+		return 0; /* These nodes are never replicated. */
+	case GPT_NODE_TYPE_LEAF:
+		guard = gpt_node_read_guard(node);
+		leaf_coverage = gpt_node_leaf_read_coverage(node);
+		guard_length = gpt_key_read_length(guard);
+		coverage -= guard_length;
+		return leaf_coverage - coverage;
+	default: panic("Invalid GPT node encountered\n");
+	}
+}
+
+int
+gpt_node_inspect_find(gpt_thunk_t* inspect_thunk_u)
+{
+	int8_t guard_length, key_length;
+	gpt_key_t guard, key, temp_key;
+	gpt_node_t node;
+	gpt_thunk_t temp_thunk = *inspect_thunk_u;
+
+	/* Traverse to inspection node. */
+	while(gpt_node_internal_traverse(&temp_thunk) == GPT_TRAVERSED_FULL) {
+		inspect_thunk_u->key = temp_thunk.key;
+	}
+	key = inspect_thunk_u->key;
+	node = gpt_node_get(temp_thunk.node_p);
+	/* Only guardable entries are valid nodes. */
+	if(!gpt_node_valid(node)) {
+		return 0;
+	}
+	guard = gpt_node_read_guard(node);
+	inspect_thunk_u->node_p = temp_thunk.node_p;
+	key_length = gpt_key_read_length(key);
+	guard_length = gpt_key_read_length(guard);
+	/* Split the larger key's MSBs off for comparison. */
+	if(key_length < guard_length) {
+		/* Cut the guard for checking. */
+		gptKeyCutMSB(key_length, &guard, &temp_key);
+		return gptKeysCompareEqual(key, temp_key);
+	} else if(key_length > guard_length) {
+		/* Cut the key for checking. */
+		gptKeyCutMSB(guard_length, &key, &temp_key);
+		return gptKeysCompareEqual(temp_key, guard);
+	} else {
+		return gptKeysCompareEqual(key, guard);
+	}
+}
+
+int
+gpt_node_update_find(gpt_thunk_t* update_thunk_u)
+{
+	gpt_traversed_t traversed;
+	gpt_thunk_t temp_thunk1, temp_thunk2;
+
+	/* Check if the deletion/insertion window covers the root node. */
+	temp_thunk1 = *update_thunk_u;
+	traversed = gpt_node_internal_traverse(&temp_thunk1);
+	if(traversed == GPT_TRAVERSED_NONE ||
+	   traversed == GPT_TRAVERSED_MISMATCH) {
+		return 1;
+	} else if (traversed == GPT_TRAVERSED_GUARD) {
+		return 0;
+	}
+	/* Traverse to update-node. */
+	temp_thunk2 = temp_thunk1;
+	while((traversed = gpt_node_internal_traverse(&temp_thunk2)) ==
+		  GPT_TRAVERSED_FULL) {
+		*update_thunk_u = temp_thunk1;
+		temp_thunk1 = temp_thunk2;
+	}
+	if(traversed == GPT_TRAVERSED_GUARD) {
+		*update_thunk_u = temp_thunk1;
+	}
+	return 0; /* Deletion/insertion window does not cover root node. */
+}
+
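
A plain-integer illustration of what gptKeysCompareStripPrefix() above
computes: the length of the common most-significant prefix, with that prefix
stripped from both keys.  This is a user-space model only; gpt_ctlz() is
stood in for by the bit loop:

#include <stdio.h>

int main(void)
{
	unsigned v1 = 0xB /* 1011 */, v2 = 0x9 /* 1001 */;
	int length = 4, prefix = 0;

	/* Count matching leading bits, as gpt_ctlz(v1 ^ v2, ...) does. */
	while (prefix < length &&
	       ((v1 >> (length - 1 - prefix)) & 1) ==
	       ((v2 >> (length - 1 - prefix)) & 1))
		prefix++;
	printf("matching prefix: %d bits\n", prefix);		/* 2 */
	printf("stripped keys:   0x%x 0x%x\n",			/* 0x3 0x1 */
	       v1 & ((1u << (length - prefix)) - 1),
	       v2 & ((1u << (length - prefix)) - 1));
	return 0;
}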


* [PATCH 8/12] Alternate page table implementation cont...
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (40 preceding siblings ...)
  2007-01-13  2:49 ` [PATCH 7/12] " Paul Davies
@ 2007-01-13  2:49 ` Paul Davies
  2007-01-13  2:49 ` [PATCH 9/12] " Paul Davies
                   ` (7 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:49 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH GPT 08
 * Continue adding GPT implementation.

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 pt-gpt-core.c |  150 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 150 insertions(+)
Index: linux-2.6.20-rc1/mm/pt-gpt-core.c
===================================================================
--- linux-2.6.20-rc1.orig/mm/pt-gpt-core.c	2007-01-03 15:42:21.973271000 +1100
+++ linux-2.6.20-rc1/mm/pt-gpt-core.c	2007-01-03 15:46:46.657271000 +1100
@@ -279,3 +279,153 @@
 	return 0; /* Deletion/insertion window does not cover root node. */
 }
 
+int
+gpt_node_delete(int is_root, gpt_thunk_t update_thunk)
+{
+	int error;
+	int8_t coverage;
+	gpt_node_t delete_node, update_node;
+	gpt_thunk_t delete_thunk = update_thunk;
+
+	/* Traverse update-node to delete-node. */
+	if(gpt_node_internal_traverse(&delete_thunk) != GPT_TRAVERSED_FULL) {
+		delete_thunk.key = update_thunk.key;
+	}
+	/* Is the delete-key inside a super-keyed leaf? */
+	delete_node = gpt_node_get(delete_thunk.node_p);
+	if((gpt_node_type(delete_node) == GPT_NODE_TYPE_LEAF) &&
+	   ((coverage = gpt_node_leaf_read_coverage(delete_node)) > 0)) {
+		/* Adjust lookup key, and retry traverse to delete-node. */
+		gptKeyCutLSB(coverage, &(update_thunk.key), NULL);
+		delete_thunk = update_thunk;
+		if(gpt_node_internal_traverse(&delete_thunk) !=
+		   GPT_TRAVERSED_FULL) {
+			delete_thunk.key = update_thunk.key;
+		}
+		delete_node = gpt_node_get(delete_thunk.node_p);
+	}
+	/* Delete node. */
+	switch(gpt_node_type(delete_node)) {
+	case GPT_NODE_TYPE_INVALID:
+		return GPT_NOT_FOUND;
+	case GPT_NODE_TYPE_LEAF:
+		error = gpt_node_delete_single(delete_thunk, delete_node);
+		break;
+	case GPT_NODE_TYPE_INTERNAL:
+		error = gpt_node_internal_delete(delete_thunk, delete_node);
+		break;
+	default: return GPT_NOT_FOUND;
+	}
+	if(error < 0) {
+		return error;
+	}
+	/* Decrement update-node's valid children count and return coverage. */
+	if(!is_root) {
+		update_node = gpt_node_get(update_thunk.node_p);
+		gpt_node_set(update_thunk.node_p,
+					 gpt_node_internal_dec_children(update_node));
+	}
+	return error;
+}
+
+int
+gpt_node_insert(gpt_node_t new_node, gpt_thunk_t update_thunk)
+{
+	int error;
+	gpt_node_t update_node, insert_node;
+	gpt_thunk_t insert_thunk = update_thunk;
+
+	/* Traverse to insertion point. */
+	if(gpt_node_internal_traverse(&insert_thunk) != GPT_TRAVERSED_FULL)
+		insert_thunk.key = update_thunk.key;
+
+	/* Insert new node. */
+	insert_node = gpt_node_get(insert_thunk.node_p);
+	switch(gpt_node_type(insert_node)) {
+	case GPT_NODE_TYPE_INVALID:
+		gpt_node_insert_single(new_node, insert_thunk);
+		break;
+	case GPT_NODE_TYPE_LEAF:
+		return GPT_OCCUPIED;
+	case GPT_NODE_TYPE_INTERNAL:
+		error = gpt_node_insert_replicate(new_node, insert_thunk,
+                                                  insert_node);
+		if(error < 0) {
+			return error;
+		}
+		break;
+	default:
+		return GPT_FAILED;
+	}
+
+	/* Increment update-node's valid children count and return. */
+	update_node = gpt_node_get(update_thunk.node_p);
+	if(gpt_node_type(update_node) == GPT_NODE_TYPE_INTERNAL) {
+		gpt_node_set(update_thunk.node_p,
+					 gpt_node_internal_inc_children(update_node));
+	}
+	return GPT_OK;
+}
+
+gpt_traversed_t
+gpt_node_internal_traverse(gpt_thunk_t* thunk_u)
+{
+	gpt_key_t guard, key_msb;
+	gpt_node_t node;
+	gpt_node_t* level;
+	gpt_key_value_t level_index;
+	int8_t guard_length, key_length, level_order;
+
+	/* Check for an internal node; match its guard, stripping it from the key. */
+	node = gpt_node_get(thunk_u->node_p);
+	if(gpt_node_type(node) != GPT_NODE_TYPE_INTERNAL) {
+		return GPT_TRAVERSED_NONE;
+	}
+	guard = gpt_node_read_guard(node);
+	level = gpt_node_internal_read_ptr(node);
+	level_order = gpt_node_internal_read_order(node);
+	guard_length = gpt_key_read_length(guard);
+	gptKeyCutMSB(guard_length, &(thunk_u->key), &key_msb);
+	if(!gptKeysCompareEqual(guard, key_msb)) {
+		return GPT_TRAVERSED_MISMATCH;
+	}
+	/* Index the internal node's level, stripping the index from the key. */
+	gptKeyCutMSB(level_order, &(thunk_u->key), &key_msb);
+	level_index = gpt_key_read_value(key_msb);
+	key_length = gpt_key_read_length(key_msb);
+	if(key_length != level_order) {
+		return GPT_TRAVERSED_GUARD;
+	}
+	/* Return next node. */
+	thunk_u->node_p = level + level_index;
+	return GPT_TRAVERSED_FULL;
+}
+
+int
+gpt_iterator_inspect(gpt_iterator_t* iterator_u, gpt_key_t* key_r,
+                     gpt_node_t** node_p_r)
+{
+	int found = 0;
+	gpt_node_t node;
+
+	/* Find the next node. */
+	while(!found && iterator_u->node_p) {
+		node = gpt_node_get(iterator_u->node_p);
+		switch(gpt_node_type(node)) {
+		case GPT_NODE_TYPE_INTERNAL:
+			found = gpt_iterator_internal(iterator_u, key_r,
+										  node_p_r);
+			break;
+		case GPT_NODE_TYPE_LEAF:
+			found = gpt_iterator_leaf(iterator_u, key_r, node_p_r);
+			break;
+		case GPT_NODE_TYPE_INVALID:
+			found = gpt_iterator_invalid(iterator_u, key_r,
+										 node_p_r);
+			break;
+		default:
+			panic("Should never get here!");
+		}
+	}
+	return found;
+}
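
Taken together, update-find, restructure-insert and insert form the write
path.  A hedged sketch of how a build_page_table() implementation might drive
them follows; error handling is simplified, and the is_root calling
convention is inferred from gpt_node_update_find()'s return value above:

static int gpt_insert_pte_sketch(gpt_t *root, unsigned long address, pte_t pte)
{
	gpt_thunk_t thunk;
	int is_root, error;

	thunk.key = gpt_key_init(extract_key(address), GPT_KEY_LENGTH_MAX);
	thunk.node_p = root;
	is_root = gpt_node_update_find(&thunk);	/* window covers root? */

	/* Restructure first so the insertion window sits on a free slot. */
	error = gpt_node_restructure_insert(is_root, &thunk);
	if (error < 0)
		return error;
	return gpt_node_insert(gpt_node_leaf_init(pte), thunk);
}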


* [PATCH 9/12] Alternate page table implementation cont...
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (41 preceding siblings ...)
  2007-01-13  2:49 ` [PATCH 8/12] " Paul Davies
@ 2007-01-13  2:49 ` Paul Davies
  2007-01-13  2:49 ` [PATCH 10/12] " Paul Davies
                   ` (6 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:49 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH GPT 09
 * Continue adding GPT implementation

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 pt-gpt-core.c |  171 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 171 insertions(+)
Index: linux-2.6.20-rc1/mm/pt-gpt-core.c
===================================================================
--- linux-2.6.20-rc1.orig/mm/pt-gpt-core.c	2007-01-03 15:46:46.657271000 +1100
+++ linux-2.6.20-rc1/mm/pt-gpt-core.c	2007-01-03 15:57:35.539309000 +1100
@@ -429,3 +429,174 @@
 	}
 	return found;
 }
+
+/******************
+* Local functions *
+******************/
+
+static inline void
+gpt_iterator_inspect_init_all(gpt_iterator_t* iterator_r, gpt_t* trie_p)
+{
+	iterator_r->coverage = GPT_KEY_LENGTH_MAX;
+	iterator_r->depth = 0;
+	iterator_r->finished = 0;
+	iterator_r->key = gpt_key_null();
+	iterator_r->node_p = trie_p;
+}
+
+static inline void
+gpt_iterator_inspect_init_range(gpt_iterator_t* iterator_r, gpt_t* trie_p,
+                                unsigned long addr, unsigned long end)
+{
+	gpt_iterator_inspect_init_all(iterator_r, trie_p);
+	iterator_r->start = extract_key(addr);
+	iterator_r->limit = extract_key(end-1);
+}
+
+static inline int
+gpt_iterator_inspect_leaves_range(gpt_iterator_t* iterator_u, gpt_key_t* key_r,
+                                  gpt_node_t** node_p_r)
+{
+	int found = 0;
+	gpt_node_t node;
+
+	/* Find the next node. */
+	while(!found && iterator_u->node_p) {
+		node = gpt_node_get(iterator_u->node_p);
+		switch(gpt_node_type(node)) {
+		case GPT_NODE_TYPE_INTERNAL:
+			gpt_iterator_internal_skip_range(iterator_u);
+			break;
+		case GPT_NODE_TYPE_LEAF:
+			found = gpt_iterator_leaf_visit_range(iterator_u, key_r,
+												  node_p_r);
+			break;
+		case GPT_NODE_TYPE_INVALID:
+			gpt_iterator_terminal_skip_range(iterator_u);
+			break;
+		default:
+			panic("Should never get here!");
+		}
+	}
+	return found;
+}
+
+static inline int
+gpt_iterator_free_pgtables(gpt_iterator_t* iterator_u, gpt_node_t** node_p_r,
+                           unsigned long floor, unsigned long ceiling)
+{
+	int found = 0;
+	gpt_node_t node;
+
+	while(!found && iterator_u->node_p) {
+		node = gpt_node_get(iterator_u->node_p);
+		switch(gpt_node_type(node)) {
+		case GPT_NODE_TYPE_INTERNAL:
+			found = gpt_iterator_internal_free_pgtables(iterator_u,
+								node_p_r, floor, ceiling);
+			break;
+		case GPT_NODE_TYPE_LEAF:
+		case GPT_NODE_TYPE_INVALID:
+			gpt_iterator_terminal_free_pgtables(iterator_u);
+			break;
+		default:
+			panic("Should never get here!");
+		}
+	}
+	return found;
+}
+
+/** awiggins (2006-09-18): Two problems to deal with in this implementation:
+ *    - Doesn't properly determine when an internal level should be freed.
+ *    - We should skip visiting leaf nodes, i.e. when we have levels that
+ *      can ONLY contain leaf nodes don't traverse any further.
+ */
+static inline int
+gpt_iterator_inspect_internals_all(gpt_iterator_t* iterator_u, gpt_key_t* key_r,
+                                   gpt_node_t** node_p_r)
+{
+	int found = 0;
+	gpt_node_t node;
+
+	/* Find the next node. */
+	while(!found && iterator_u->node_p) {
+		node = gpt_node_get(iterator_u->node_p);
+		switch(gpt_node_type(node)) {
+		case GPT_NODE_TYPE_INTERNAL:
+			gpt_iterator_internal_visit_all(iterator_u, key_r,
+											node_p_r);
+			break;
+		case GPT_NODE_TYPE_LEAF:
+		case GPT_NODE_TYPE_INVALID:
+			gpt_iterator_terminal_skip_all(iterator_u);
+			break;
+		default:
+			panic("Should never get here!");
+		}
+	}
+	return found;
+}
+
+static inline int
+gpt_iterator_leaf_visit_range(gpt_iterator_t* iterator_u, gpt_key_t* key_r,
+                              gpt_node_t** node_p_r)
+{
+	int found = 0;
+	int8_t replication;
+
+	switch(gpt_iterator_check_bounds(iterator_u, &replication)) {
+	case GPT_ITERATOR_INRANGE:
+		gpt_iterator_return(*iterator_u, key_r, node_p_r);
+		found = 1;
+		/* Fall through to update current. */
+	case GPT_ITERATOR_START:
+		gpt_iterator_inspect_next(iterator_u, replication);
+		break;
+	case GPT_ITERATOR_LIMIT:
+		iterator_u->node_p = NULL;
+		break;
+	default:
+		panic("Should never get here!");
+	}
+	return found;
+}
+
+static inline int
+gpt_iterator_internal_free_pgtables(gpt_iterator_t* iterator_u,
+                                    gpt_node_t**node_p_r,
+                                    unsigned long floor, unsigned long ceiling)
+{
+	int8_t replication;
+	gpt_key_t key, guard;
+	gpt_node_t* node_temp_p;
+
+	/* Process node once children have been processed. */
+	// DEBUG [
+	if(iterator_u->depth == 0) {
+		printk("Root");
+	}
+	gpt_iterator_return(*iterator_u, &key, &node_temp_p);
+	guard = gpt_node_read_guard(gpt_node_get(node_temp_p));
+	printk("\tinternal node (0x%lx, %d) guard (0x%lx, %d)",
+		   gpt_key_read_value(key), gpt_key_read_length(key),
+		   gpt_key_read_value(guard), gpt_key_read_length(guard));
+	printk((iterator_u->finished) ? "U\n" : "D\n");
+	// DEBUG ]
+	if(iterator_u->finished) {
+		// gpt_iterator_return(*iterator_u, &key, node_p_r);
+		//if(ceiling-1 /* inside internal using key*/) {
+		//        iterator_u->node_p = NULL; // Finished.
+		//        return 0;
+		//}
+		gpt_iterator_inspect_next(iterator_u, 0);
+		//return (floor /* inside internal using key*/) {
+		//        return 0;
+		//}
+		return 0; // Ignore them for now while debugging.
+	}
+	// add code to skip over levels containing only leaves.
+	gpt_iterator_check_bounds(iterator_u, &replication); // updates key.
+	/* If the guard is in range, process the child nodes. */
+	gpt_iterator_inspect_push_range(iterator_u);
+	return 0;
+}
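
For reading the bounds checks here and in the next patch:
gpt_iterator_check_bounds() compares a node's search-key against the
iterator's [start, limit] page-number range after shifting both down to the
node's coverage.  A standalone model of just that comparison (the real
function also folds guards into the key and accounts for replication):

#include <stdio.h>

enum { START = 1, INRANGE = 0, LIMIT = -1 };	/* GPT_ITERATOR_* values */

static int check_bounds(unsigned long key_value, int coverage,
			unsigned long start, unsigned long limit)
{
	if (key_value < (start >> coverage))
		return START;	/* node lies wholly before the range */
	if (key_value > (limit >> coverage))
		return LIMIT;	/* node lies wholly after the range */
	return INRANGE;
}

int main(void)
{
	/* Range is page numbers 0x100-0x1ff; each node covers 2^8 pages. */
	printf("%d\n", check_bounds(0x1, 8, 0x100, 0x1ff));	/*  0 */
	printf("%d\n", check_bounds(0x0, 8, 0x100, 0x1ff));	/*  1 */
	printf("%d\n", check_bounds(0x2, 8, 0x100, 0x1ff));	/* -1 */
	return 0;
}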


* [PATCH 10/12] Alternate page table implementation cont...
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (42 preceding siblings ...)
  2007-01-13  2:49 ` [PATCH 9/12] " Paul Davies
@ 2007-01-13  2:49 ` Paul Davies
  2007-01-13  2:49 ` [PATCH 11/12] " Paul Davies
                   ` (5 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:49 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH GPT 10
 * Continue adding GPT implementation

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 pt-gpt-core.c |  219 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 219 insertions(+)
Index: linux-2.6.20-rc1/mm/pt-gpt-core.c
===================================================================
--- linux-2.6.20-rc1.orig/mm/pt-gpt-core.c	2007-01-03 15:57:35.539309000 +1100
+++ linux-2.6.20-rc1/mm/pt-gpt-core.c	2007-01-03 16:07:14.297584000 +1100
@@ -600,3 +600,222 @@
 	gpt_iterator_inspect_push_range(iterator_u);
 	return 0;
 }
+
+static inline void
+gpt_iterator_terminal_free_pgtables(gpt_iterator_t* iterator_u)
+{
+	int8_t replication;
+
+	if(gpt_iterator_check_bounds(iterator_u, &replication) ==
+	   GPT_ITERATOR_LIMIT) {
+		if(iterator_u->depth == 0) {
+			iterator_u->node_p = NULL;
+			return;
+		}
+		iterator_u->finished = 1;
+		gpt_iterator_inspect_pop(iterator_u);
+	} else {
+		gpt_iterator_inspect_next(iterator_u, replication);
+	}
+}
+
+static inline int
+gpt_iterator_internal_visit_range(gpt_iterator_t* iterator_u,
+                                                    gpt_node_t** node_p_r)
+{
+	//int8_t replication;
+
+	panic("Unimplemented!");
+	if(iterator_u->finished) {
+		gpt_iterator_return(*iterator_u, NULL, node_p_r);
+		gpt_iterator_inspect_next(iterator_u, 0);
+		return 1;
+	}
+	//gpt_iterator_check_bounds(iterator_u,
+	return 0; /* Unreached: the panic() above never returns. */
+}
+
+static inline int
+gpt_iterator_internal_visit_all(gpt_iterator_t* iterator_u, gpt_key_t* key_r,
+                                gpt_node_t** node_p_r)
+{
+	int8_t replication;
+
+	/* Process node once children have been processed. */
+	if(iterator_u->finished) {
+		gpt_iterator_return(*iterator_u, key_r, node_p_r);
+		gpt_iterator_inspect_next(iterator_u, 0);
+		return 1;
+	}
+	gpt_iterator_check_bounds(iterator_u, &replication); // updates key.
+	/* If the guard is in range, process the child nodes. */
+	gpt_iterator_inspect_push_range(iterator_u);
+	return 0;
+}
+
+static inline void
+gpt_iterator_internal_skip_range(gpt_iterator_t* iterator_u)
+{
+	int8_t replication;
+
+	/* Process node once children have been processed. */
+	if(iterator_u->finished) {
+		gpt_iterator_inspect_next(iterator_u, 0);
+		return;
+	}
+	/* If the guard is in range, process the child nodes. */
+	switch(gpt_iterator_check_bounds(iterator_u, &replication)) {
+	case GPT_ITERATOR_INRANGE:
+		gpt_iterator_inspect_push_range(iterator_u);
+		break;
+	case GPT_ITERATOR_START:
+		BUG_ON(replication != 0); // Internal nodes must not be replicated.
+		gpt_iterator_inspect_next(iterator_u, 0);
+		break;
+	case GPT_ITERATOR_LIMIT:
+		iterator_u->node_p = NULL;
+		break;
+	default:
+		panic("Should never get here!");
+	}
+}
+
+static inline void
+gpt_iterator_terminal_skip_range(gpt_iterator_t* iterator_u)
+{
+	int8_t replication;
+
+	switch(gpt_iterator_check_bounds(iterator_u, &replication)) {
+	case GPT_ITERATOR_INRANGE:
+	case GPT_ITERATOR_START:
+		gpt_iterator_inspect_next(iterator_u, replication);
+		break;
+	case GPT_ITERATOR_LIMIT:
+		iterator_u->node_p = NULL;
+		break;
+	default:
+		panic("Should never get here!");
+	}
+}
+
+static inline void
+gpt_iterator_terminal_skip_all(gpt_iterator_t* iterator_u)
+{
+	int8_t replication;
+
+	// awiggins (2006-09-14) Should actually check replication of leaves.
+	gpt_iterator_check_bounds(iterator_u, &replication); // updates key.
+	gpt_iterator_inspect_next(iterator_u, replication);
+}
+
+static inline int
+gpt_iterator_check_bounds(gpt_iterator_t* iterator_u, int8_t* replication_r)
+{
+	int8_t coverage;
+	gpt_key_t key;
+	gpt_key_value_t key_value;
+
+	/* Construct the search-key for the current node. */
+	coverage = iterator_u->coverage - gpt_key_read_length(iterator_u->key);
+	*replication_r = gptNodeReplication(*iterator_u->node_p, coverage);
+	key = gpt_node_read_guard(gpt_node_get(iterator_u->node_p));
+	key = gpt_keys_merge_LSB(iterator_u->key, key);
+	iterator_u->key = key = gpt_key_cut_LSB(*replication_r, key);
+	/* Compare the current node's search-key to the iterator's range. */
+	key_value = gpt_key_read_value(key);
+	coverage = iterator_u->coverage - gpt_key_read_length(key);
+	if(key_value < (iterator_u->start >> coverage)) {
+		return GPT_ITERATOR_START;
+	}
+	if(key_value > (iterator_u->limit >> coverage)) {
+		return GPT_ITERATOR_LIMIT;
+	}
+	return GPT_ITERATOR_INRANGE;
+}
+
+static inline void
+gpt_iterator_inspect_push_range(gpt_iterator_t* iterator_u)
+{
+	int8_t key_length, coverage, level_order;
+	gpt_key_t index;
+	gpt_node_t node, *level;
+	gpt_key_value_t i, key_value;
+
+	/* Get details of next level. */
+	node = gpt_node_get(iterator_u->node_p);
+	level = gpt_node_internal_read_ptr(node);
+	level_order = gpt_node_internal_read_order(node);
+	/* Find index into next level. */
+	key_value = gpt_key_read_value(iterator_u->key);
+	key_length = gpt_key_read_length(iterator_u->key);
+	coverage = iterator_u->coverage - key_length;
+	i = iterator_u->start >> (coverage - level_order);
+	if((i >> level_order) == (key_value)) {
+		i &= ~gpt_key_value_mask(level_order);
+	} else {
+		i = 0;
+	}
+	index = gpt_key_init(i, level_order);
+	/* Update iterator position. */
+	iterator_u->stack[(iterator_u->depth)++] = iterator_u->node_p;
+	iterator_u->node_p = level + i;
+	iterator_u->key = gpt_keys_merge_MSB(index, iterator_u->key);
+}
+
+static inline void
+gpt_iterator_inspect_push_all(gpt_iterator_t* iterator_u)
+{
+	int8_t level_order;
+	gpt_key_t index;
+	gpt_node_t node, *level;
+
+	/* Get details of next level. */
+	node = gpt_node_get(iterator_u->node_p);
+	level = gpt_node_internal_read_ptr(node);
+	level_order = gpt_node_internal_read_order(node);
+	/* Update iterator position. */
+	index = gpt_key_init(0, level_order);
+	iterator_u->stack[(iterator_u->depth)++] = iterator_u->node_p;
+	iterator_u->node_p = level;
+	iterator_u->key = gpt_keys_merge_MSB(index, iterator_u->key);
+}
+
+static inline void
+gpt_iterator_inspect_next(gpt_iterator_t* iterator_u, int8_t replication)
+{
+	int8_t level_order, key_length;
+	gpt_key_t guard, index;
+	gpt_node_t node, *level;
+	gpt_key_value_t i, step;
+
+	/* The root node has no siblings, iteration complete. */
+	if(iterator_u->depth == 0) {
+		iterator_u->node_p = NULL;
+		return;
+	}
+	/* Strip the current node's guard bits from the key. */
+	node = gpt_node_get(iterator_u->node_p);
+	if(gpt_node_valid(node)) {
+		guard = gpt_node_read_guard(node);
+		key_length = gpt_key_read_length(guard);
+		iterator_u->key = gpt_key_cut_LSB(key_length, iterator_u->key);
+	}
+	/* Update index and either get next sibling or return to parent. */
+	node = gpt_node_get(gpt_iterator_parent(*iterator_u));
+	level = gpt_node_internal_read_ptr(node);
+	level_order = gpt_node_internal_read_order(node);
+	index = gpt_key_cut_LSB2(level_order - replication, &(iterator_u->key));
+	i = gpt_key_read_value(index);
+	key_length = gpt_key_read_length(index);
+	step = 1 << replication;
+	if(i < ((1 << (level_order - replication)) - 1)) {
+		i = (i + 1) << replication;
+		index = gpt_key_init(i, level_order);
+		iterator_u->key = gpt_keys_merge_MSB(index, iterator_u->key);
+		iterator_u->node_p = level + i;
+		iterator_u->finished = 0;
+	} else {
+		gpt_iterator_inspect_pop(iterator_u);
+	}
+}
+
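
From the outside, the push/next/pop machinery above is driven through
gpt_iterator_inspect() alone.  A sketch of how a range iterator-op could
consume it; the initialisation mirrors the static init helpers in this file
rather than calling them, and is illustrative only:

static void gpt_visit_leaves_sketch(gpt_t *root, unsigned long addr,
				    unsigned long end)
{
	gpt_iterator_t it = {
		.flags    = GPT_ITERATE_LEAVES,
		.coverage = GPT_KEY_LENGTH_MAX,
		.depth    = 0,
		.finished = 0,
		.start    = extract_key(addr),
		.limit    = extract_key(end - 1),
		.key      = gpt_key_null(),
		.node_p   = root,
	};
	gpt_key_t key;
	gpt_node_t *node_p;

	/* Each hit is a leaf in [addr, end); its pte is reachable via
	 * gpt_node_leaf_read_ptep(node_p). */
	while (gpt_iterator_inspect(&it, &key, &node_p)) {
		/* ... operate on the pte ... */
	}
}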


* [PATCH 11/12] Alternate page table implementation cont...
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (43 preceding siblings ...)
  2007-01-13  2:49 ` [PATCH 10/12] " Paul Davies
@ 2007-01-13  2:49 ` Paul Davies
  2007-01-13  2:49 ` [PATCH 12/12] " Paul Davies
                   ` (4 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:49 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH GPT 11
 * Continue adding GPT implementation

Signed-Off-By: Paul Davies <pauld@gelato.unsw.edu.au>

---

 pt-gpt-core.c |  205 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 205 insertions(+)
Index: linux-2.6.20-rc1/mm/pt-gpt-core.c
===================================================================
--- linux-2.6.20-rc1.orig/mm/pt-gpt-core.c	2007-01-03 16:07:14.297584000 +1100
+++ linux-2.6.20-rc1/mm/pt-gpt-core.c	2007-01-03 16:11:35.005312000 +1100
@@ -819,3 +819,208 @@
 	}
 }
 
+static inline void
+gpt_iterator_inspect_pop(gpt_iterator_t* iterator_u)
+{
+	BUG_ON(iterator_u->depth <= 0);
+	iterator_u->node_p = iterator_u->stack[--(iterator_u->depth)];
+	iterator_u->finished = 1;
+}
+
+static inline gpt_node_t*
+gpt_iterator_parent(gpt_iterator_t iterator)
+{
+	return iterator.stack[iterator.depth - 1];
+}
+
+static inline void
+gpt_iterator_return(gpt_iterator_t iterator,
+                    gpt_key_t* key_r, gpt_node_t** node_p_r)
+{
+	if(key_r) {
+		*key_r = iterator.key;
+	}
+	if(node_p_r) {
+		*node_p_r = iterator.node_p;
+	}
+}
+
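+/*
+ * Visit the current internal node.  In-range children are pushed and
+ * processed first; the node itself is reported on the way back up when
+ * GPT_ITERATE_INTERNALS is set.  Returns 1 if a node was reported.
+ */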
+static int
+gpt_iterator_internal(gpt_iterator_t* iterator_u, gpt_key_t* key_r,
+                      gpt_node_t** node_p_r)
+{
+	int found = 0;
+	int8_t replication;
+
+	/* Process node once children have been processed. */
+	if(iterator_u->finished) {
+		if(iterator_u->flags & GPT_ITERATE_INTERNALS) {
+			gpt_iterator_return(*iterator_u, key_r, node_p_r);
+			found = 1;
+		}
+		gpt_iterator_inspect_next(iterator_u, 0);
+		return found;
+	}
+	/* If the guard is in range, descend into the child nodes. */
+	switch(gpt_iterator_check_bounds(iterator_u, &replication)) {
+	case GPT_ITERATOR_INRANGE:
+		gpt_iterator_inspect_push_range(iterator_u);
+		break;
+	case GPT_ITERATOR_START:
+		if(replication != 0) {
+			panic("Internal nodes should not be replicated!");
+		}
+		gpt_iterator_inspect_next(iterator_u, 0);
+		break;
+	case GPT_ITERATOR_LIMIT:
+		iterator_u->node_p = NULL;
+		break;
+	default:
+		panic("Should never get here!");
+	}
+	return 0;
+}
+
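+/*
+ * Visit a leaf node: report it when GPT_ITERATE_LEAVES is set and it
+ * falls inside the requested range, then step past it, skipping the
+ * remaining slots of a replicated leaf.
+ */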
+static int
+gpt_iterator_leaf(gpt_iterator_t* iterator_u, gpt_key_t* key_r,
+                  gpt_node_t** node_p_r)
+{
+	int found = 0;
+	int8_t replication;
+
+	switch(gpt_iterator_check_bounds(iterator_u, &replication)) {
+	case GPT_ITERATOR_INRANGE:
+		if(iterator_u->flags & GPT_ITERATE_LEAVES) {
+			gpt_iterator_return(*iterator_u, key_r, node_p_r);
+			found = 1;
+		}
+		/* Fall through to update current. */
+	case GPT_ITERATOR_START:
+		gpt_iterator_inspect_next(iterator_u, replication);
+		break;
+	case GPT_ITERATOR_LIMIT:
+		iterator_u->node_p = NULL;
+		break;
+	default:
+		panic("Should never get here!");
+	}
+	return found;
+}
+
+static inline int
+gpt_iterator_invalid(gpt_iterator_t* iterator_u, gpt_key_t* key_r,
+                     gpt_node_t** node_p_r)
+{
+	int found = 0;
+
+	if(iterator_u->flags & GPT_ITERATE_INVALIDS) {
+		gpt_iterator_return(*iterator_u, key_r, node_p_r);
+		found = 1;
+	}
+	gpt_iterator_inspect_next(iterator_u, 0);
+	return found;
+}
+
+static inline int
+gpt_node_delete_single(gpt_thunk_t delete_thunk, gpt_node_t delete_node)
+{
+	int8_t coverage;
+	gpt_key_t guard, key = delete_thunk.key;
+
+	guard = gpt_node_read_guard(delete_node);
+	coverage = gpt_node_leaf_read_coverage(delete_node);
+	/* Check the key matches the guard and coverage of the leaf node. */
+	gptKeysCompareStripPrefix(&key, &guard);
+	if(!gpt_key_compare_null(guard)) {
+		return GPT_NOT_FOUND;
+	}
+	gpt_node_set(delete_thunk.node_p, gpt_node_invalid_init());
+	return coverage;
+}
+
+/* awiggins (2006-07-18): Review the use of gpt_node_delete_single, redundant? */
+static int
+gpt_node_delete_replicate(gpt_thunk_t delete_thunk, gpt_node_t delete_node)
+{
+	int i;
+	int8_t level_order, key_length, delete_coverage;
+	gpt_key_t guard, key = delete_thunk.key;
+	gpt_node_t* level;
+	gpt_key_value_t key_value;
+
+	level = gpt_node_internal_read_ptr(delete_node);
+	level_order = gpt_node_internal_read_order(delete_node);
+	guard = gpt_node_read_guard(delete_node);
+	gptKeysCompareStripPrefix(&guard, &key);
+	key_value = gpt_key_read_value(key);
+	key_length = gpt_key_read_length(key);
+	key_length = level_order - key_length;
+	key_value <<= key_length;
+	delete_thunk.key = gpt_key_null();
+	for(i = key_value; i < key_value + (1 << key_length); i++) {
+		delete_thunk.node_p = level + i;
+		delete_node = gpt_node_get(delete_thunk.node_p);
+		delete_coverage =
+				gpt_node_delete_single(delete_thunk, delete_node);
+	}
+	return delete_coverage;
+}
+
+static int
+gpt_node_internal_delete(gpt_thunk_t delete_thunk, gpt_node_t delete_node)
+{
+	if(!gpt_node_internal_elongation(delete_node)) {
+		return gpt_node_delete_replicate(delete_thunk, delete_node);
+	}
+	panic("Fix me! Currently don't handle elongations");
+}
+
+static inline void
+gpt_node_insert_single(gpt_node_t new_node, gpt_thunk_t insert_thunk)
+{
+	new_node = gpt_node_init_guard(new_node, insert_thunk.key);
+	gpt_node_set(insert_thunk.node_p, new_node);
+}
+
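+/*
+ * Write new_node into each of the 2^(level_order - key_length) slots of
+ * its replication interval.  If an occupied slot is found, the slots
+ * already written are rolled back and GPT_OVERLAP is returned.
+ */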
+static int
+gpt_node_insert_replicate(gpt_node_t new_node, gpt_thunk_t insert_thunk,
+                          gpt_node_t insert_node)
+{
+	int i;
+	int8_t key_length, guard_length, level_order, log2replication;
+	gpt_key_t guard, key_temp, key = insert_thunk.key;
+	gpt_node_t* level;
+	gpt_key_value_t key_value;
+	unsigned long long interval;
+
+	level = gpt_node_internal_read_ptr(insert_node);
+	level_order = gpt_node_internal_read_order(insert_node);
+	guard = gpt_node_read_guard(insert_node);
+	guard_length = gpt_key_read_length(guard);
+	gptKeyCutMSB(guard_length, &key, &key_temp);
+	key_value = gpt_key_read_value(key);
+	key_length = gpt_key_read_length(key);
+	/* The split of key and guard should match. */
+	//assert(gptKeysCompareEqual(guard, key_temp));
+	/* Insert the new replicated node. */
+	key_temp = gpt_key_null();
+	log2replication = level_order - key_length;
+	interval = 1ULL << log2replication;
+	level = level + (interval * key_value);
+	insert_thunk.key = key_temp;
+	for(i = 0; i < interval; i++) {
+		insert_thunk.node_p = level + i;
+		/* Check for overlap. */
+		insert_node = gpt_node_get(insert_thunk.node_p);
+		if(gpt_node_type(insert_node) != GPT_NODE_TYPE_INVALID) {
+			/* Clean up the entries that we set. */
+			for(i--; i >= 0; i--) {
+				gpt_node_set(level + i,
+					     gpt_node_invalid_init());
+			}
+			return GPT_OVERLAP;
+		} else {
+			gpt_node_insert_single(new_node, insert_thunk);
+		}
+	}
+	return GPT_OK;
+}


* [PATCH 12/12] Alternate page table implementation cont...
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (44 preceding siblings ...)
  2007-01-13  2:49 ` [PATCH 11/12] " Paul Davies
@ 2007-01-13  2:49 ` Paul Davies
  2007-01-13 19:29 ` [PATCH 0/29] Page Table Interface Explanation Peter Zijlstra
                   ` (3 subsequent siblings)
  49 siblings, 0 replies; 60+ messages in thread
From: Paul Davies @ 2007-01-13  2:49 UTC (permalink / raw)
  To: linux-mm; +Cc: Paul Davies

PATCH GPT 12
 * Adds the iterator implementations necessary to boot the GPT and run LTP
 and lmbench without bringing down the machine (provided you have plenty of
 memory :))
   * There are problems freeing the GPT at the moment, so I have commented
   that code out for now :(

Signed-off-by: Paul Davies <pauld@gelato.unsw.edu.au>

---

 pt-gpt-core.c |  210 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 205 insertions(+), 5 deletions(-)
Index: linux-2.6.20-rc4/mm/pt-gpt-core.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/pt-gpt-core.c	2007-01-11 19:00:49.115823000 +1100
+++ linux-2.6.20-rc4/mm/pt-gpt-core.c	2007-01-11 19:13:08.783823000 +1100
@@ -573,14 +573,14 @@
 	/* Process node once children have been processed. */
 	// DEBUG [
 	if(iterator_u->depth == 0) {
-		printk("Root");
+//		printk("Root"); //pauld
 	}
 	gpt_iterator_return(*iterator_u, &key, &node_temp_p);
 	guard = gpt_node_read_guard(gpt_node_get(node_temp_p));
-	printk("\tinternal node (0x%lx, %d) guard (0x%lx, %d)",
-		   gpt_key_read_value(key), gpt_key_read_length(key),
-		   gpt_key_read_value(guard), gpt_key_read_length(guard));
-	printk((iterator_u->finished) ? "U\n" : "D\n");
+//	printk("\tinternal node (0x%lx, %d) guard (0x%lx, %d)", //pauld
+//		   gpt_key_read_value(key), gpt_key_read_length(key),
+//		   gpt_key_read_value(guard), gpt_key_read_length(guard));
+//	printk((iterator_u->finished) ? "U\n" : "D\n");
 	// DEBUG ]
 	if(iterator_u->finished) {
 		// gpt_iterator_return(*iterator_u, &key, node_p_r);
@@ -1024,3 +1024,203 @@
 	}
 	return GPT_OK;
 }
+
+/*
+ * This function frees user-level page tables of a process.
+ *
+ * Must be called with the page table lock held.
+ */
+void free_pt_range(struct mmu_gather **tlb,
+			unsigned long addr, unsigned long end,
+			unsigned long floor, unsigned long ceiling)
+{
+	/* Buggy somewhere, so turned off temporarily. */
+	/*
+	gpt_iterator_t iterator;
+	gpt_node_t* node_p;
+
+	gpt_iterator_inspect_init_range(&iterator, &(((*tlb)->mm)->page_table),
+					addr, end);
+	gpt_iterator_free_pgtables(&iterator, &node_p, floor, ceiling);
+	*/
+}
+
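+/*
+ * Walk the leaf ptes of src_mm in [addr, end) and copy each one into
+ * dst_mm, building the destination page table path as required.
+ */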
+int copy_dual_iterator(struct mm_struct *dst_mm, struct mm_struct *src_mm,
+                       unsigned long addr, unsigned long end,
+                       struct vm_area_struct *vma)
+{
+	unsigned long i;
+	pte_t* src_pte_p;
+	pte_t* dst_pte_p;
+	int rss[2] = {0, 0};
+	gpt_key_t key;
+	gpt_node_t* node_p;
+	gpt_iterator_t iterator;
+
+	gpt_iterator_inspect_init_range(&iterator, &(src_mm->page_table),
+					addr, end);
+	spin_lock(&src_mm->page_table_lock);
+	while(gpt_iterator_inspect_leaves_range(&iterator, &key, &node_p)) {
+		BUG_ON(gpt_key_read_length(key) != GPT_KEY_LENGTH_MAX);
+		i = get_real_address(gpt_key_read_value(key));
+		src_pte_p = gpt_node_leaf_read_ptep(node_p);
+		spin_lock(&dst_mm->page_table_lock);
+		dst_pte_p = build_page_table(dst_mm, i, NULL);
+		BUG_ON(!dst_pte_p); /* TODO: fail cleanly on allocation failure. */
+		spin_unlock(&dst_mm->page_table_lock);
+		/* Pass the leaf's own address, not the range start. */
+		copy_one_pte(dst_mm, src_mm, dst_pte_p, src_pte_p, vma, i,
+			     rss);
+	}
+	add_mm_rss(dst_mm, rss[0], rss[1]);
+	spin_unlock(&src_mm->page_table_lock);
+	return 0;
+}
+
+unsigned long unmap_page_range_iterator(struct mmu_gather *tlb,
+		struct vm_area_struct *vma, unsigned long addr, unsigned long end,
+		long *zap_work, struct zap_details *details)
+{
+	unsigned long i;
+	pte_t* pte_p;
+	int file_rss = 0, anon_rss = 0;
+	gpt_key_t key;
+	gpt_node_t* node_p;
+	gpt_iterator_t iterator;
+	struct mm_struct *mm = vma->vm_mm;
+
+	gpt_iterator_inspect_init_range(&iterator, &(mm->page_table),
+					addr, end);
+	spin_lock(&mm->page_table_lock);
+	while(gpt_iterator_inspect_leaves_range(&iterator, &key, &node_p)) {
+		i = get_real_address(gpt_key_read_value(key));
+		pte_p = gpt_node_leaf_read_ptep(node_p);
+		node_p->raw.guard = 0; /* zap doesn't clear the gpt guard field. */
+		zap_one_pte(pte_p, mm, i, vma, zap_work, details, tlb,
+			    &anon_rss, &file_rss);
+	}
+	add_mm_rss(mm, file_rss, anon_rss);
+	spin_unlock(&mm->page_table_lock);
+	return end;
+}
+
+int zeromap_build_iterator(struct mm_struct *mm,
+			unsigned long addr, unsigned long end, pgprot_t prot)
+{
+	unsigned long i;
+	pte_t *pte;
+
+	spin_lock(&mm->page_table_lock);
+	for(i = addr; i < end; i += PAGE_SIZE) {
+		if((pte = build_page_table(mm, i, NULL))) {
+			zeromap_one_pte(mm, pte, i, prot);
+		}
+	}
+	spin_unlock(&mm->page_table_lock);
+	return 0;
+}
+
+int remap_build_iterator(struct mm_struct *mm,
+		unsigned long addr, unsigned long end, unsigned long pfn,
+		pgprot_t prot)
+{
+	panic("TODO rebuild iterator\n");
+	return 0;
+}
+
+void change_protection_read_iterator(struct vm_area_struct *vma,
+		unsigned long addr, unsigned long end, pgprot_t newprot,
+		int dirty_accountable)
+{
+	unsigned long i;
+	pte_t* pte_p;
+	gpt_key_t key;
+	gpt_node_t* node_p;
+	gpt_iterator_t iterator;
+	struct mm_struct* mm = vma->vm_mm;
+
+	gpt_iterator_inspect_init_range(&iterator, &(mm->page_table),
+                                        addr, end);
+	spin_lock(&mm->page_table_lock);
+	while(gpt_iterator_inspect_leaves_range(&iterator, &key, &node_p)) {
+		BUG_ON(gpt_key_read_length(key) != GPT_KEY_LENGTH_MAX);
+		i = get_real_address(gpt_key_read_value(key));
+		pte_p = gpt_node_leaf_read_ptep(node_p);
+		change_prot_pte(mm, pte_p, i, newprot, dirty_accountable);
+	}
+	spin_unlock(&mm->page_table_lock);
+}
+
+void vunmap_read_iterator(unsigned long addr, unsigned long end)
+{
+	unsigned long i;
+	pte_t* pte_p;
+    gpt_key_t key;
+    gpt_node_t* node_p;
+    gpt_iterator_t iterator;
+
+    gpt_iterator_inspect_init_range(&iterator, &(init_mm.page_table),
+                                                     addr, end);
+    while(gpt_iterator_inspect_leaves_range(&iterator, &key, &node_p)) {
+         pte_p = gpt_node_leaf_read_ptep(node_p);
+         BUG_ON(gpt_key_read_length(key) != GPT_KEY_LENGTH_MAX);
+         i = get_real_address(gpt_key_read_value(key));
+         vunmap_one_pte(pte_p, i);
+    }
+}
+
+int vmap_build_iterator(unsigned long addr,
+			unsigned long end, pgprot_t prot, struct page ***pages)
+{
+	unsigned long i;
+	pte_t *pte;
+	int err;
+
+	for(i = addr; i < end; i += PAGE_SIZE) {
+		if((pte = build_page_table(&init_mm, i, NULL))) {
+			err = vmap_one_pte(pte, i, pages, prot);
+			if(err)
+				return err;
+		}
+	}
+	return 0;
+}
+
+int unuse_vma_read_iterator(struct vm_area_struct *vma,
+				unsigned long addr, unsigned long end, swp_entry_t entry,
+				struct page *page)
+{
+	panic("TODO: unuse vma iterator\n");
+	return 0;
+}
+
+void smaps_read_iterator(struct vm_area_struct *vma,
+				unsigned long addr, unsigned long end,
+				struct mem_size_stats *mss)
+{
+	panic("TODO: smaps read iterator\n");
+}
+
+#ifdef CONFIG_NUMA
+
+int check_policy_read_iterator(struct vm_area_struct *vma,
+		unsigned long addr, unsigned long end,
+		const nodemask_t *nodes, unsigned long flags,
+		void *private)
+{
+	panic("TODO: check policy iterator");
+	return 0;
+}
+#endif
+
+int ioremap_page_range(unsigned long addr,
+		       unsigned long end, unsigned long phys_addr, pgprot_t prot)
+{
+	panic("TODO: ioremap iterator");
+	return 0;
+}
+
+unsigned long move_page_tables(struct vm_area_struct *vma,
+		unsigned long old_addr, struct vm_area_struct *new_vma,
+		unsigned long new_addr, unsigned long len)
+{
+	panic("TODO: move page tables\n");
+	return 0;
+}


* Re: [PATCH 0/29] Page Table Interface Explanation
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (45 preceding siblings ...)
  2007-01-13  2:49 ` [PATCH 12/12] " Paul Davies
@ 2007-01-13 19:29 ` Peter Zijlstra
  2007-01-14 10:06   ` Paul Cameron Davies
  2007-01-16 18:49 ` Christoph Lameter
                   ` (2 subsequent siblings)
  49 siblings, 1 reply; 60+ messages in thread
From: Peter Zijlstra @ 2007-01-13 19:29 UTC (permalink / raw)
  To: Paul Davies; +Cc: linux-mm

>                 PAGE TABLE INTERFACE
> 
> int create_user_page_table(struct mm_struct *mm);
> 
> void destroy_user_page_table(struct mm_struct *mm);
> 
> pte_t *build_page_table(struct mm_struct *mm, unsigned long address,
> 		pt_path_t *pt_path);
> 
> pte_t *lookup_page_table(struct mm_struct *mm, unsigned long address,
> 		pt_path_t *pt_path);



> void free_pt_range(struct mmu_gather **tlb, unsigned long addr,
> 		unsigned long end, unsigned long floor, unsigned long ceiling);
> 
> int copy_dual_iterator(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> 		unsigned long addr, unsigned long end, struct vm_area_struct *vma);
> 
> unsigned long unmap_page_range_iterator(struct mmu_gather *tlb,
>         struct vm_area_struct *vma, unsigned long addr, unsigned long end,
>         long *zap_work, struct zap_details *details);
> 
> int zeromap_build_iterator(struct mm_struct *mm,
> 		unsigned long addr, unsigned long end, pgprot_t prot);
> 
> int remap_build_iterator(struct mm_struct *mm,
> 		unsigned long addr, unsigned long end, unsigned long pfn,
> 		pgprot_t prot);
> 
> void change_protection_read_iterator(struct vm_area_struct *vma,
> 		unsigned long addr, unsigned long end, pgprot_t newprot,
> 		int dirty_accountable);
> 
> void vunmap_read_iterator(unsigned long addr, unsigned long end);
> 
> int vmap_build_iterator(unsigned long addr,
> 		unsigned long end, pgprot_t prot, struct page ***pages);
> 
> int unuse_vma_read_iterator(struct vm_area_struct *vma,
> 		unsigned long addr, unsigned long end, swp_entry_t entry, struct page *page);
> 
> void smaps_read_iterator(struct vm_area_struct *vma,
> 		unsigned long addr, unsigned long end, struct mem_size_stats *mss);
> 
> int check_policy_read_iterator(struct vm_area_struct *vma,
> 		unsigned long addr, unsigned long end, const nodemask_t *nodes,
> 		unsigned long flags, void *private);
> 
> unsigned long move_page_tables(struct vm_area_struct *vma,
> 		unsigned long old_addr, struct vm_area_struct *new_vma,
> 		unsigned long new_addr, unsigned long len);
> 

Weird naming; these functions are not iterators.  If named after what
they do, they should be *_iteration.

But still, I would have expected an iterator based interface; something
along the lines of:

typedef struct pti_struct {
  struct mm_struct *mm;
  pgd_t *pgd;
  pud_t *pud;
  pmd_t *pmd;
  pte_t *pte;
  spinlock_t *ptl;
  unsigned long address;
} pti_t;

with accessors like:

#define pti_address(pti) (pti).address
#define pti_pte(pti) (pti).pte

and methods like:

bool pti_valid(pti_t *pti);
pti_t pti_lookup(struct mm_struct *mm, unsigned long address);
pti_t pti_acquire(struct mm_struct *mm, unsigned long address);
void pti_release(pti_t *pti);

bool pti_next(pti_t *pti);

so that you could write the typical loops like:

  int ret = 0;

  pti_t pti = pti_lookup(mm, start);
  do_for_each_pti_range(pti, end) {
    if (per_pte_op(pti_pte(pti))) {
      ret = -EFOO;
      break;
    }
  } while_for_each_pti_range(pti, end);
  pti_release(&pti);

  return ret;

where do_for_each_pti_range() and while_for_each_pti_range() look
something like:

#define do_for_each_pti_range(pti, end) \
  if (pti_valid(pti) && pti_address(pti) < end) do

#define while_for_each_pti_range(pti, end) \
  while (pti_next(pti) && pti_valid(pti) && pti_address(pti) < end)
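
For the current 4-level layout, pti_next() could be a cheap increment in
the common case, re-walking the upper levels only when a boundary is
crossed.  A rough sketch (locking and huge pages ignored, helpers as
above):

bool pti_next(pti_t *pti)
{
	pti->address += PAGE_SIZE;
	if (pti->address & ~PMD_MASK) {
		/* Still within the same pte page. */
		pti->pte++;
		return true;
	}
	/* Crossed a pmd boundary; re-walk from the top. */
	*pti = pti_lookup(pti->mm, pti->address);
	return pti_valid(pti);
}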




* Re: [PATCH 0/29] Page Table Interface Explanation
  2007-01-13 19:29 ` [PATCH 0/29] Page Table Interface Explanation Peter Zijlstra
@ 2007-01-14 10:06   ` Paul Cameron Davies
  0 siblings, 0 replies; 60+ messages in thread
From: Paul Cameron Davies @ 2007-01-14 10:06 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Paul Davies, linux-mm

Hi Peter

> weird naming, functions are not iterators, if named after what they do
> it should be *_iteration.

Sorry.  I had genuine "iterators" in a previous attempt at a PTI and
never changed the naming convention.

> But still, I would have expected an iterator based interface; something
> along the lines of:
>
> typedef struct pti_struct {
>  struct mm_struct *mm;
>  pgd_t *pgd;
>  pud_t *pud;
>  pmd_t *pmd;
>  pte_t *pte;
>  spinlock_t *ptl;
>  unsigned long address;
> } pti_t
>
> with accessors like:
>
> #define pti_address(pti) (pti).address
> #define pti_pte(pti) (pti).pte
>
> and methods like:
>
> bool pti_valid(pti_t *pti);
> pti_t pti_lookup(struct mm_struct *mm, unsigned long address);
> pti_t pti_acquire(struct mm_struct *mm, unsigned long address);
> void pti_release(pti_t *pti);
>
> bool pti_next(pti_t *pti);
>
> so that you could write the typical loops like:
>
>  int ret = 0;
>
>  pti_t pti = pti_lookup(mm, start);
>  do_for_each_pti_range(pti, end) {
>    if (per_pte_op(pti_pte(pti))) {
>      ret = -EFOO;
>      break;
>    }
>  } while_for_each_pti_range(pti, end);
>  pti_release(&pti);
>
>  return ret;
>
> where do_for_each_pti_range() and while_for_each_pti_range() look
> something like:
>
> #define do_for_each_pti_range(pti, end) \
>  if (pti_valid(pti) && pti_address(pti) < end) do
>
> #define while_for_each_pti_range(pti, end) \
>  while (pti_next(pti) && pti_valid(pti) && pti_address(pti) < end)

Excellent.

After LCA, I will take what you have given me and do a version based on
what you would have expected to see.  I hope you will be able to find
the time to have a quick look at it :)

Cheers

Paul Davies


* Re: [PATCH 0/29] Page Table Interface Explanation
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (46 preceding siblings ...)
  2007-01-13 19:29 ` [PATCH 0/29] Page Table Interface Explanation Peter Zijlstra
@ 2007-01-16 18:49 ` Christoph Lameter
  2007-01-18  6:22   ` Paul Cameron Davies
  2007-01-16 18:51 ` Christoph Lameter
  2007-01-16 19:14 ` Christoph Lameter
  49 siblings, 1 reply; 60+ messages in thread
From: Christoph Lameter @ 2007-01-16 18:49 UTC (permalink / raw)
  To: Paul Davies; +Cc: linux-mm

I am glad to see that this endeavor is still going forward.

> int copy_dual_iterator(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> 		unsigned long addr, unsigned long end, struct vm_area_struct *vma);
> 
> unsigned long unmap_page_range_iterator(struct mmu_gather *tlb,
>         struct vm_area_struct *vma, unsigned long addr, unsigned long end,
>         long *zap_work, struct zap_details *details);
> 
> int zeromap_build_iterator(struct mm_struct *mm,
> 		unsigned long addr, unsigned long end, pgprot_t prot);
> 
> int remap_build_iterator(struct mm_struct *mm,
> 		unsigned long addr, unsigned long end, unsigned long pfn,
> 		pgprot_t prot);
> 
> void change_protection_read_iterator(struct vm_area_struct *vma,
> 		unsigned long addr, unsigned long end, pgprot_t newprot,
> 		int dirty_accountable);
> 
> void vunmap_read_iterator(unsigned long addr, unsigned long end);
> 
> int vmap_build_iterator(unsigned long addr,
> 		unsigned long end, pgprot_t prot, struct page ***pages);
> 
> int unuse_vma_read_iterator(struct vm_area_struct *vma,
> 		unsigned long addr, unsigned long end, swp_entry_t entry, struct page *page);
> 
> void smaps_read_iterator(struct vm_area_struct *vma,
> 		unsigned long addr, unsigned long end, struct mem_size_stats *mss);
> 
> int check_policy_read_iterator(struct vm_area_struct *vma,
> 		unsigned long addr, unsigned long end, const nodemask_t *nodes,
> 		unsigned long flags, void *private);
> 
> unsigned long move_page_tables(struct vm_area_struct *vma,
> 		unsigned long old_addr, struct vm_area_struct *new_vma,
> 		unsigned long new_addr, unsigned long len);

Why do we need so many individual specialized iterators? Isn't there some
way to have a common iterator function?


* Re: [PATCH 0/29] Page Table Interface Explanation
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (47 preceding siblings ...)
  2007-01-16 18:49 ` Christoph Lameter
@ 2007-01-16 18:51 ` Christoph Lameter
  2007-01-18  6:53   ` Paul Cameron Davies
  2007-01-16 19:14 ` Christoph Lameter
  49 siblings, 1 reply; 60+ messages in thread
From: Christoph Lameter @ 2007-01-16 18:51 UTC (permalink / raw)
  To: Paul Davies; +Cc: linux-mm

On Sat, 13 Jan 2007, Paul Davies wrote:

> INSTRUCTIONS,BENCHMARKS and further information at the site below:

The benchmarks seem to be a mixed bag: mostly the same speed, with some
minor improvements in some operations and some minor regressions in
others.  If we cannot find any major regressions on other platforms then
I would think that the patchset is acceptable on that ground.


* Re: [PATCH 3/29] Abstract current page table implementation
  2007-01-13  2:45 ` [PATCH 3/29] " Paul Davies
@ 2007-01-16 18:55   ` Christoph Lameter
  0 siblings, 0 replies; 60+ messages in thread
From: Christoph Lameter @ 2007-01-16 18:55 UTC (permalink / raw)
  To: Paul Davies; +Cc: linux-mm

I think the last 3 patches could stand alone as a cleanup of the existing
API.  Could you describe them differently?  If they are all doing the
same thing, then maybe make the 3 patches one patch?



* Re: [PATCH 4/29] Introduce Page Table Interface (PTI)
  2007-01-13  2:46 ` [PATCH 4/29] Introduce Page Table Interface (PTI) Paul Davies
@ 2007-01-16 19:02   ` Christoph Lameter
  0 siblings, 0 replies; 60+ messages in thread
From: Christoph Lameter @ 2007-01-16 19:02 UTC (permalink / raw)
  To: Paul Davies; +Cc: linux-mm

On Sat, 13 Jan 2007, Paul Davies wrote:

> +	if (mm!=&init_mm) { /* Look up user page table */

Missing blanks. Comment on a separate line please.
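
i.e. something like:

	/* Look up the user page table. */
	if (mm != &init_mm) {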

> +#define lookup_page_table_lock(mm, pt_path, address)	\

We need the complete path to the pte here?


* Re: [PATCH 5/29] Start calling simple PTI functions
  2007-01-13  2:46 ` [PATCH 5/29] Start calling simple PTI functions Paul Davies
@ 2007-01-16 19:04   ` Christoph Lameter
  2007-01-18  6:43     ` Paul Cameron Davies
  0 siblings, 1 reply; 60+ messages in thread
From: Christoph Lameter @ 2007-01-16 19:04 UTC (permalink / raw)
  To: Paul Davies; +Cc: linux-mm

On Sat, 13 Jan 2007, Paul Davies wrote:

> @@ -308,6 +309,7 @@
>  } while (0)
>  
>  struct mm_struct {
> +	pt_t page_table;					/* Page table */
>  	struct vm_area_struct * mmap;		/* list of VMAs */

Why are you changing the location of the page table pointer in mm struct?


* Re: [PATCH 6/29] Tweak IA64 arch dependent files to work with PTI
  2007-01-13  2:46 ` [PATCH 6/29] Tweak IA64 arch dependent files to work with PTI Paul Davies
@ 2007-01-16 19:05   ` Christoph Lameter
  0 siblings, 0 replies; 60+ messages in thread
From: Christoph Lameter @ 2007-01-16 19:05 UTC (permalink / raw)
  To: Paul Davies; +Cc: linux-mm

On Sat, 13 Jan 2007, Paul Davies wrote:

>  	 * We may get interrupts here, but that's OK because interrupt
>  	 * handlers cannot touch user-space.
>  	 */
> -	ia64_set_kr(IA64_KR_PT_BASE, __pa(next->pgd));
> +	ia64_set_kr(IA64_KR_PT_BASE, __pa(next->page_table.pgd));
>  	activate_context(next);

Argh... The requirement is that the kernel compiles after each patch.
It looks as if the last patch broke the compile.


* Re: [PATCH 7/29] Continue calling simple PTI functions
  2007-01-13  2:46 ` [PATCH 7/29] Continue calling simple PTI functions Paul Davies
@ 2007-01-16 19:08   ` Christoph Lameter
  0 siblings, 0 replies; 60+ messages in thread
From: Christoph Lameter @ 2007-01-16 19:08 UTC (permalink / raw)
  To: Paul Davies; +Cc: linux-mm

On Sat, 13 Jan 2007, Paul Davies wrote:

> -	pte = pte_alloc_map(mm, pmd, address);
> +	pte = build_page_table(mm, address, &pt_path);

build_page_table as a name for a function whose role is mainly to look up
a pte?  Yes, it adds entries as required.  Maybe something like

lookup_and_add_page_table()
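
which could even keep the old name reachable through a trivial wrapper
while callers are converted over, e.g. (sketch only):

static inline pte_t *lookup_and_add_page_table(struct mm_struct *mm,
		unsigned long address, pt_path_t *pt_path)
{
	return build_page_table(mm, address, pt_path);
}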



* Re: [PATCH 0/29] Page Table Interface Explanation
  2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
                   ` (48 preceding siblings ...)
  2007-01-16 18:51 ` Christoph Lameter
@ 2007-01-16 19:14 ` Christoph Lameter
  49 siblings, 0 replies; 60+ messages in thread
From: Christoph Lameter @ 2007-01-16 19:14 UTC (permalink / raw)
  To: Paul Davies; +Cc: linux-mm, akpm

One important thing to note here is that the abstraction of the page table
goes way beyond the needs of other page table formats.

Think about virtualization technologies such as VMware, Xen and KVM. If 
those can implement an alternate page table update mechanism then we do 
not need many of the hooks that are currently being proposed. There is the 
potential that we can come up with forms of page tables that avoid the 
current issues with shadow page tables.



* Re: [PATCH 0/29] Page Table Interface Explanation
  2007-01-16 18:49 ` Christoph Lameter
@ 2007-01-18  6:22   ` Paul Cameron Davies
  0 siblings, 0 replies; 60+ messages in thread
From: Paul Cameron Davies @ 2007-01-18  6:22 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Paul Davies, linux-mm

On Tue, 16 Jan 2007, Christoph Lameter wrote:

> I am glad to see that this endeavor is still going forward.

I will be working hard to make this happen over the coming period.  I will
take your feedback, talk to my colleagues, and come up with a new version
after LCA.

>> 		unsigned long new_addr, unsigned long len);
>
> Why do we need so many individual specialized iterators? Isn't there some
> way to have a common iterator function?

Yes - and this is the intention.  However, I thought that it might
be easier to get the page table interface into the kernel by doing
it in stages.

I was worried a common iterator function represented too much change
too quickly.
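
Roughly, such a common function might look like the following (names
hypothetical, built on the same GPT iterator primitives the specialised
functions already share):

typedef int (*pte_op_t)(pte_t *pte, unsigned long addr, void *private);

int pt_iterate_range(struct mm_struct *mm, unsigned long addr,
		unsigned long end, pte_op_t op, void *private)
{
	gpt_key_t key;
	gpt_node_t *node_p;
	gpt_iterator_t iterator;
	int err = 0;

	gpt_iterator_inspect_init_range(&iterator, &(mm->page_table),
					addr, end);
	spin_lock(&mm->page_table_lock);
	while(gpt_iterator_inspect_leaves_range(&iterator, &key, &node_p)) {
		err = op(gpt_node_leaf_read_ptep(node_p),
			 get_real_address(gpt_key_read_value(key)), private);
		if(err)
			break;
	}
	spin_unlock(&mm->page_table_lock);
	return err;
}

Each of the specialised *_iterator functions above would then reduce to
a per-pte callback plus a call to pt_iterate_range().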

Cheers

Paul


* Re: [PATCH 5/29] Start calling simple PTI functions
  2007-01-16 19:04   ` Christoph Lameter
@ 2007-01-18  6:43     ` Paul Cameron Davies
  0 siblings, 0 replies; 60+ messages in thread
From: Paul Cameron Davies @ 2007-01-18  6:43 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Paul Davies, linux-mm

On Tue, 16 Jan 2007, Christoph Lameter wrote:

> On Sat, 13 Jan 2007, Paul Davies wrote:
>
>> @@ -308,6 +309,7 @@
>>  } while (0)
>>
>>  struct mm_struct {
>> +	pt_t page_table;					/* Page table */
>>  	struct vm_area_struct * mmap;		/* list of VMAs */
>
> Why are you changing the location of the page table pointer in mm struct?

This was part of an ugly, temporary hack to get our alternative page
table (a guarded page table) going.  I wrote the hack to get the GPT
lookup working on my machine, then passed it on to a PhD student to deal
with (it still requires further work).  The lookup depended on the
position of the field in the struct.

It will be moved back next time I push the patches out.

Cheers

Paul


* Re: [PATCH 0/29] Page Table Interface Explanation
  2007-01-16 18:51 ` Christoph Lameter
@ 2007-01-18  6:53   ` Paul Cameron Davies
  0 siblings, 0 replies; 60+ messages in thread
From: Paul Cameron Davies @ 2007-01-18  6:53 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Paul Davies, linux-mm

On Tue, 16 Jan 2007, Christoph Lameter wrote:

> On Sat, 13 Jan 2007, Paul Davies wrote:
>
>> INSTRUCTIONS,BENCHMARKS and further information at the site below:
>
> The benchmarks seem to be a mixed bag. Mostly up to the same speed, some
> minor improvements in some operations some minor regressions in others. If
> we cannot find any major regressions on other platforms then I would
> think that the patchset is acceptable on that ground.

I will expand the PTI testing to other architectures after LCA.  The
results will be placed on our wiki and I will notify linux-mm when I have
gathered some more interesting results.

Cheers

Paul



Thread overview: 60+ messages
2007-01-13  2:45 [PATCH 0/29] Page Table Interface Explanation Paul Davies
2007-01-13  2:45 ` [PATCH 1/29] Abstract current page table implementation Paul Davies
2007-01-13  2:45 ` [PATCH 2/29] " Paul Davies
2007-01-13  2:45 ` [PATCH 3/29] " Paul Davies
2007-01-16 18:55   ` Christoph Lameter
2007-01-13  2:46 ` [PATCH 4/29] Introduce Page Table Interface (PTI) Paul Davies
2007-01-16 19:02   ` Christoph Lameter
2007-01-13  2:46 ` [PATCH 5/29] Start calling simple PTI functions Paul Davies
2007-01-16 19:04   ` Christoph Lameter
2007-01-18  6:43     ` Paul Cameron Davies
2007-01-13  2:46 ` [PATCH 6/29] Tweak IA64 arch dependent files to work with PTI Paul Davies
2007-01-16 19:05   ` Christoph Lameter
2007-01-13  2:46 ` [PATCH 7/29] Continue calling simple PTI functions Paul Davies
2007-01-16 19:08   ` Christoph Lameter
2007-01-13  2:46 ` [PATCH 8/29] Clean up page fault handlers Paul Davies
2007-01-13  2:46 ` [PATCH 9/29] Clean up page fault handlers Paul Davies
2007-01-13  2:46 ` [PATCH 10/29] Call simple PTI functions Paul Davies
2007-01-13  2:46 ` [PATCH 11/29] Call simple PTI functions cont Paul Davies
2007-01-13  2:46 ` [PATCH 12/29] Abstract page table tear down Paul Davies
2007-01-13  2:46 ` [PATCH 13/29] Finish abstracting " Paul Davies
2007-01-13  2:46 ` [PATCH 14/29] Abstract copy page range iterator Paul Davies
2007-01-13  2:46 ` [PATCH 15/29] Finish abstracting copy page range Paul Davies
2007-01-13  2:47 ` [PATCH 16/29] Abstract unmap page range iterator Paul Davies
2007-01-13  2:47 ` [PATCH 17/29] Finish abstracting unmap page range Paul Davies
2007-01-13  2:47 ` [PATCH 18/29] Abstract zeromap " Paul Davies
2007-01-13  2:47 ` [PATCH 19/29] Abstract remap pfn range Paul Davies
2007-01-13  2:47 ` [PATCH 20/29] Abstract change protection iterator Paul Davies
2007-01-13  2:47 ` [PATCH 21/29] Abstract unmap vm area Paul Davies
2007-01-13  2:47 ` [PATCH 22/29] Abstract map " Paul Davies
2007-01-13  2:47 ` [PATCH 23/29] Abstract unuse_vma Paul Davies
2007-01-13  2:47 ` [PATCH 24/29] Abstract smaps iterator Paul Davies
2007-01-13  2:47 ` [PATCH 25/29] Abstract mempolicy iterator Paul Davies
2007-01-13  2:47 ` [PATCH 26/29] Abstract mempolicy iterator cont Paul Davies
2007-01-13  2:48 ` [PATCH 27/29] Abstract implementation dependent code for mremap Paul Davies
2007-01-13  2:48 ` [PATCH 28/29] Abstract ioremap iterator Paul Davies
2007-01-13  2:48 ` [PATCH 29/29] Tweak i386 arch dependent files to work with PTI Paul Davies
2007-01-13  2:48 ` [PATCH 1/5] Introduce IA64 page table interface Paul Davies
2007-01-13  2:48 ` [PATCH 2/5] Abstract pgtable Paul Davies
2007-01-13  2:48 ` [PATCH 3/5] Abstract pgtable continued Paul Davies
2007-01-13  2:48 ` [PATCH 4/5] Abstract assembler lookup Paul Davies
2007-01-13  2:48 ` [PATCH 5/5] Abstract pgalloc Paul Davies
2007-01-13  2:48 ` [PATCH 1/12] Alternate page table implementation (GPT) Paul Davies
2007-01-13  2:48 ` [PATCH 2/12] Alternate page table implementation cont Paul Davies
2007-01-13  2:48 ` [PATCH 3/12] " Paul Davies
2007-01-13  2:49 ` [PATCH 4/12] " Paul Davies
2007-01-13  2:49 ` [PATCH 5/12] " Paul Davies
2007-01-13  2:49 ` [PATCH 6/12] " Paul Davies
2007-01-13  2:49 ` [PATCH 7/12] " Paul Davies
2007-01-13  2:49 ` [PATCH 8/12] " Paul Davies
2007-01-13  2:49 ` [PATCH 9/12] " Paul Davies
2007-01-13  2:49 ` [PATCH 10/12] " Paul Davies
2007-01-13  2:49 ` [PATCH 11/12] " Paul Davies
2007-01-13  2:49 ` [PATCH 12/12] " Paul Davies
2007-01-13 19:29 ` [PATCH 0/29] Page Table Interface Explanation Peter Zijlstra
2007-01-14 10:06   ` Paul Cameron Davies
2007-01-16 18:49 ` Christoph Lameter
2007-01-18  6:22   ` Paul Cameron Davies
2007-01-16 18:51 ` Christoph Lameter
2007-01-18  6:53   ` Paul Cameron Davies
2007-01-16 19:14 ` Christoph Lameter
