linux-kernel.vger.kernel.org archive mirror
* [PATCH,RFC] Transparent SuperPage Support for 2.5.44
@ 2002-10-28  1:58 Naohiko Shimizu
  2002-10-28  2:40 ` Wiedemeier, Jeff
  2002-11-03  7:40 ` Pavel Machek
  0 siblings, 2 replies; 5+ messages in thread
From: Naohiko Shimizu @ 2002-10-28  1:58 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1076 bytes --]

This is a transparent superpage support patch for 2.5.44.
The big difference between this patch and the 2.4.19 patch is
the elimination of automatic dynamic downgrade for superpages.
Instead, I place a page-size adjustment routine where it is required.
I hope this change minimizes the overhead for conventional
programs which do not use superpages.
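
The page-size selection itself can be sketched in plain user-space C. This is not kernel code: the constants below are the Alpha values from this patch (8KB base pages, orders {0,3,6,9}), and the function name `super_page_align_mask` is my own stand-in for the loop the patch adds to `arch_get_unmapped_area()`:

```c
/* A user-space sketch (not kernel code) of the page-size selection this
 * patch adds to arch_get_unmapped_area(): pick the largest superpage
 * that fits in the requested length and derive the alignment mask.
 * Constants are the Alpha values from the patch (8KB base pages,
 * orders {0,3,6,9}); the function name is mine, not the kernel's. */
#define SP_PAGE_SIZE 8192UL

static const int super_page_nr = 4;
static const int super_page_order[4] = {0, 3, 6, 9};   /* 8K,64K,512K,4MB */

static unsigned long super_page_align_mask(unsigned long len)
{
    int i;
    for (i = super_page_nr - 1; i > 0; i--)
        if (len > (SP_PAGE_SIZE << super_page_order[i]))
            return (SP_PAGE_SIZE << super_page_order[i]) - 1;
    return 0;   /* too small for any superpage: atomic pages only */
}
```

Requests smaller than the smallest superpage fall through to ordinary (atomic) pages, which is why short-lived small mappings see no change in behaviour.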

The Linux SuperPage patch is transparent to user applications.
It will automatically allocate superpages of an appropriate size
when possible.
It does not allocate real storage unless the application
actually accesses that area, and it does not allocate more memory
than the application requests.
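
To illustrate the transparency claim: the program below is ordinary POSIX code with no superpage awareness at all. Under the patched kernel, a large `MAP_PRIVATE|MAP_ANONYMOUS` region like this is the kind of area that would be superpage-backed, and storage is still only allocated when pages are first touched. The helper name `touch_anon` is mine:

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>

/* Map a large anonymous private region, touch every byte so the
 * backing storage is actually allocated, then release it.
 * Returns 0 on success. Nothing here is specific to the patch. */
static int touch_anon(size_t len)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 1, len);            /* first touch allocates storage */
    int ok = (p[0] == 1 && p[len - 1] == 1);
    munmap(p, len);
    return ok ? 0 : -1;
}
```

The same binary runs unchanged on a stock kernel; only the TLB behaviour differs.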

This patch includes i386, alpha and sparc64 ports.
However, I could not compile for alpha even with plain 2.5.44, and
I don't have a sparc64 machine to test on, so only i386 has been tested so far.

Any comments are welcome.

I am not on this list, so please CC me on any comments.
-- 
Naohiko Shimizu
Dept. Communications Engineering/Tokai University
1117 Kitakaname Hiratsuka 259-1292 Japan
TEL.+81-463-58-1211(ext. 4084) FAX.+81-463-58-8320
http://shimizu-lab.dt.u-tokai.ac.jp/


[-- Attachment #2: super_page-2.5.44_021028.patch --]
[-- Type: application/octet-stream, Size: 47463 bytes --]

diff -urN linux-2.5.44/Documentation/super_page.txt linux-2.5.44-superpage/Documentation/super_page.txt
--- linux-2.5.44/Documentation/super_page.txt	Thu Jan  1 09:00:00 1970
+++ linux-2.5.44-superpage/Documentation/super_page.txt	Sun Oct 27 17:26:17 2002
@@ -0,0 +1,75 @@
+The super page is a feature which greatly extends the TLB coverage.
+Some modern processors have this feature, and Linux Super Page
+activates it, which improves performance for programs that
+miss the TLB extensively.
+For example a large matrix transpose benchmark runs about 4 times
+faster with super pages on Alpha, 2 to 3 times faster on UltraSparcII,
+and 2 to 6 times faster on Pentium4 than with a normal kernel.
+
+Linux Super Page is transparent to user applications and turns on
+superpages for the 'brk', 'bss' and 'anonymous private mmap' areas.
+
+The current target architectures of this patch are
+1) Alpha
+2) Sparc64
+3) i386
+
+However, I don't have any Sparc64 machine, so I have not tested it myself.
+The i386 port is under development.
+
+The super page patch provides some convenient features.
+
+/proc/super_page: 
+
+	current nr: The maximum super page index. Alpha and Sparc64 support 1 
+through 4, and i386 supports 1 or 2.
+	current vm_align: The current super page align flag. 
+1 means that anonymous mmap requests will be aligned to the 
+appropriate super page boundary: the requested area is aligned on 
+the boundary of the largest superpage that fits within the requested length.
+	current bitmask: The value of the super page order mask.
+	order reserve  allocate  fail: The super page order, reservation count,
+allocation count and allocation failure count, respectively.
+
+sysctl: Provides the following variables.
+
+on: A non-zero value means superpages are used.
+nr: The maximum super page index.
+vm_align: The super page align flag.
+bitmask: The super page order mask. A super page index whose corresponding
+mask bit is 0 will not be used; allocation falls back to the atomic page.
+logreset: When 1, the allocation counters will be reset by reading them.
+
+The Alpha architecture defines Granularity Hint (GH) bits in the
+Page Table Entry (PTE). If these bits are set to a non-zero value,
+they supply a hint to translation buffer implementations that
+a block of pages can be treated as a single larger page.
+Sparc64 has almost the same feature: a field in the PTE defines the
+page size which is mapped by that PTE. The restrictions on its use
+and the page size variations are also the same as on Alpha.
+X86 defines a PageSize (PS) bit in the PGD or PMD, depending on PAE mode.
+When the PS bit is on, the PGD/PMD entry points to a 4MB (or 2MB) page
+and no PTE is used.
+
+For HPC applications with large working sets, the performance
+degradation caused by translation misses should be avoided.
+So if we can use this feature, many HPC applications will
+benefit from it.
+
+There is a configuration option to support this
+feature. You can turn it on by setting
+the CONFIG_SUPER_PAGE variable. 
+When it is on, the kernel will transparently use the largest
+possible super page for the bss, brk and anonymous mmap areas.
+At an address boundary which does not fit any of
+the superpages, the kernel will use atomic-size pages.
+Likewise, at the tail of the requested area, if the remaining
+length is not large enough for any of the superpages,
+the kernel will also use atomic-size pages.
+The allocated page size will vary depending on the
+requested length and address. Your working set
+will stay exactly the same as with a normal kernel.
+
+Naohiko Shimizu<nshimizu@keyaki.cc.u-tokai.ac.jp>
+URL: http://shimizu-lab.dt.u-tokai.ac.jp/lsp.html
+
diff -urN linux-2.5.44/arch/alpha/config.in linux-2.5.44-superpage/arch/alpha/config.in
--- linux-2.5.44/arch/alpha/config.in	Sat Oct 19 13:01:18 2002
+++ linux-2.5.44-superpage/arch/alpha/config.in	Thu Oct 24 13:52:53 2002
@@ -246,6 +246,8 @@
 # LARGE_VMALLOC is racy, if you *really* need it then fix it first
 define_bool CONFIG_ALPHA_LARGE_VMALLOC n
 
+bool 'Transparent SuperPage Support' CONFIG_SUPER_PAGE
+
 source drivers/pci/Config.in
 
 bool 'Support for hot-pluggable devices' CONFIG_HOTPLUG
diff -urN linux-2.5.44/arch/alpha/kernel/osf_sys.c linux-2.5.44-superpage/arch/alpha/kernel/osf_sys.c
--- linux-2.5.44/arch/alpha/kernel/osf_sys.c	Sat Oct 19 13:00:42 2002
+++ linux-2.5.44-superpage/arch/alpha/kernel/osf_sys.c	Thu Oct 24 13:49:21 2002
@@ -1361,14 +1361,31 @@
 		         unsigned long limit)
 {
 	struct vm_area_struct *vma = find_vma(current->mm, addr);
+#if CONFIG_SUPER_PAGE
+	extern int super_page_vm_align;
+	unsigned long super_page_mask=0;
 
+	if(super_page_vm_align&&addr==PAGE_ALIGN(TASK_UNMAPPED_BASE)) {
+		int i;
+		for(i=super_page_nr-1; i>0; i--) {
+			if(len>(PAGE_SIZE << super_page_order[i])) {
+				super_page_mask = (PAGE_SIZE << super_page_order[i]) -1;
+				break;
+			}
+		}
+	}
+#endif
 	while (1) {
 		/* At this point:  (!vma || addr < vma->vm_end). */
 		if (limit - len < addr)
 			return -ENOMEM;
 		if (!vma || addr + len <= vma->vm_start)
 			return addr;
+#if CONFIG_SUPER_PAGE
+		addr = (vma->vm_end + super_page_mask)&~super_page_mask;
+#else
 		addr = vma->vm_end;
+#endif
 		vma = vma->vm_next;
 	}
 }
diff -urN linux-2.5.44/arch/alpha/mm/init.c linux-2.5.44-superpage/arch/alpha/mm/init.c
--- linux-2.5.44/arch/alpha/mm/init.c	Sat Oct 19 13:02:30 2002
+++ linux-2.5.44-superpage/arch/alpha/mm/init.c	Thu Oct 24 13:51:38 2002
@@ -34,6 +34,12 @@
 #include <asm/console.h>
 #include <asm/tlb.h>
 
+#ifdef CONFIG_SUPER_PAGE
+int super_page_order[SUPER_PAGE_NR] = {0,3,6,9};
+pgprot_t super_page_prot[SUPER_PAGE_NR] = 
+  {__pgprot(0x0000),__pgprot(0x0020),__pgprot(0x0040),__pgprot(0x0060)};
+#endif
+
 mmu_gather_t mmu_gathers[NR_CPUS];
 
 extern void die_if_kernel(char *,struct pt_regs *,long);
diff -urN linux-2.5.44/arch/i386/config.in linux-2.5.44-superpage/arch/i386/config.in
--- linux-2.5.44/arch/i386/config.in	Sat Oct 19 13:01:21 2002
+++ linux-2.5.44-superpage/arch/i386/config.in	Sun Oct 27 18:20:16 2002
@@ -234,6 +234,14 @@
    define_bool CONFIG_X86_PAE y
 fi
 
+bool 'Transparent SuperPage Support' CONFIG_SUPER_PAGE
+if [ "$CONFIG_SUPER_PAGE" = "y" ]; then
+   define_int  CONFIG_FORCE_MAX_ZONEORDER 11
+  if [ "$CONFIG_MK7" = "y" ]; then
+     bool 'Athlon 4MB TLB BUG workaround' CONFIG_X86_K7_INVLPG_BUG
+  fi
+fi
+
 if [ "$CONFIG_HIGHMEM4G" = "y" -o "$CONFIG_HIGHMEM64G" = "y" ]; then
    bool 'Allocate 3rd-level pagetables from highmem' CONFIG_HIGHPTE
 fi
diff -urN linux-2.5.44/arch/i386/kernel/head.S linux-2.5.44-superpage/arch/i386/kernel/head.S
--- linux-2.5.44/arch/i386/kernel/head.S	Sat Oct 19 13:01:10 2002
+++ linux-2.5.44-superpage/arch/i386/kernel/head.S	Sun Oct 27 17:19:22 2002
@@ -93,7 +93,11 @@
  * Enable paging
  */
 3:
+#if CONFIG_SUPER_PAGE
+	movl $swapper_tlb_dir-__PAGE_OFFSET,%eax
+#else
 	movl $swapper_pg_dir-__PAGE_OFFSET,%eax
+#endif
 	movl %eax,%cr3		/* set the page table pointer.. */
 	movl %cr0,%eax
 	orl $0x80000000,%eax
@@ -363,6 +367,27 @@
  */
 .org 0x1000
 ENTRY(swapper_pg_dir)
+#if CONFIG_SUPER_PAGE
+       .long 0x00103007
+       .long 0x00104007
+       .fill BOOT_USER_PGD_PTRS-2,4,0
+       /* default: 766 entries */
+       .long 0x00103007
+       .long 0x00104007
+       /* default: 254 entries */
+       .fill BOOT_KERNEL_PGD_PTRS-2,4,0
+/* we copy the table for the tlb */
+swapper_tlb_dir:
+       .long 0x00103007
+       .long 0x00104007
+       .fill BOOT_USER_PGD_PTRS-2,4,0
+       /* default: 766 entries */
+       .long 0x00103007
+       .long 0x00104007
+       /* default: 254 entries */
+       .fill BOOT_KERNEL_PGD_PTRS-2,4,0
+#define SUPER_OFFSET +0x1000
+#else
 	.long 0x00102007
 	.long 0x00103007
 	.fill BOOT_USER_PGD_PTRS-2,4,0
@@ -371,15 +396,17 @@
 	.long 0x00103007
 	/* default: 254 entries */
 	.fill BOOT_KERNEL_PGD_PTRS-2,4,0
+#define SUPER_OFFSET
+#endif
 
 /*
  * The page tables are initialized to only 8MB here - the final page
  * tables are set up later depending on memory size.
  */
-.org 0x2000
+.org 0x2000 SUPER_OFFSET
 ENTRY(pg0)
 
-.org 0x3000
+.org 0x3000 SUPER_OFFSET
 ENTRY(pg1)
 
 /*
@@ -387,10 +414,10 @@
  * initialization loop counts until empty_zero_page)
  */
 
-.org 0x4000
+.org 0x4000 SUPER_OFFSET
 ENTRY(empty_zero_page)
 
-.org 0x5000
+.org 0x5000 SUPER_OFFSET
 
 /*
  * Real beginning of normal "text" segment
diff -urN linux-2.5.44/arch/i386/mm/fault.c linux-2.5.44-superpage/arch/i386/mm/fault.c
--- linux-2.5.44/arch/i386/mm/fault.c	Sat Oct 19 13:00:42 2002
+++ linux-2.5.44-superpage/arch/i386/mm/fault.c	Thu Oct 24 14:01:46 2002
@@ -392,7 +392,11 @@
 		pte_t *pte_k;
 
 		asm("movl %%cr3,%0":"=r" (pgd));
+#if CONFIG_SUPER_PAGE
+		pgd = offset + (pgd_t *)__va(pgd) - PTRS_PER_PGD;
+#else
 		pgd = offset + (pgd_t *)__va(pgd);
+#endif
 		pgd_k = init_mm.pgd + offset;
 
 		if (!pgd_present(*pgd_k))
diff -urN linux-2.5.44/arch/i386/mm/init.c linux-2.5.44-superpage/arch/i386/mm/init.c
--- linux-2.5.44/arch/i386/mm/init.c	Sat Oct 19 13:02:27 2002
+++ linux-2.5.44-superpage/arch/i386/mm/init.c	Sun Oct 27 18:13:07 2002
@@ -42,6 +42,14 @@
 
 mmu_gather_t mmu_gathers[NR_CPUS];
 unsigned long highstart_pfn, highend_pfn;
+#if CONFIG_SUPER_PAGE
+#if CONFIG_X86_PAE
+int super_page_order[SUPER_PAGE_NR] = {0,9};
+#else
+int super_page_order[SUPER_PAGE_NR] = {0,10};
+#endif
+pgprot_t super_page_prot[SUPER_PAGE_NR] = {__pgprot(0),__pgprot(_PAGE_SUPER)};
+#endif
 
 /*
  * Creates a middle page table and puts a pointer to it in the
@@ -352,7 +360,11 @@
 {
 	pagetable_init();
 
+#if CONFIG_SUPER_PAGE
+	load_cr3(swapper_pg_dir+PTRS_PER_PGD);  
+#else
 	load_cr3(swapper_pg_dir);
+#endif
 
 #if CONFIG_X86_PAE
 	/*
@@ -542,8 +554,14 @@
         /*
          * PAE pgds must be 16-byte aligned:
          */
+#if CONFIG_SUPER_PAGE
+	pae_pgd_cachep = kmem_cache_create("pae_pgd",
+		PTRS_PER_PGD*sizeof(pgd_t)*2, 0,
+		SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN, NULL, NULL);
+#else
         pae_pgd_cachep = kmem_cache_create("pae_pgd", 32, 0,
                 SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN, NULL, NULL);
+#endif
         if (!pae_pgd_cachep)
                 panic("init_pae(): Cannot alloc pae_pgd SLAB cache");
 }
diff -urN linux-2.5.44/arch/i386/mm/pgtable.c linux-2.5.44-superpage/arch/i386/mm/pgtable.c
--- linux-2.5.44/arch/i386/mm/pgtable.c	Sat Oct 19 13:02:59 2002
+++ linux-2.5.44-superpage/arch/i386/mm/pgtable.c	Sun Oct 27 22:12:14 2002
@@ -175,20 +175,41 @@
 
 	if (pgd) {
 		for (i = 0; i < USER_PTRS_PER_PGD; i++) {
+#if CONFIG_SUPER_PAGE
+			unsigned long pmd = __get_free_pages(GFP_KERNEL,1);
+			if (!pmd)
+				goto out_oom;
+			clear_page(pmd + (PTRS_PER_PMD*sizeof(pmd_t)));
+#else
 			unsigned long pmd = __get_free_page(GFP_KERNEL);
 			if (!pmd)
 				goto out_oom;
+#endif
 			clear_page(pmd);
 			set_pgd(pgd + i, __pgd(1 + __pa(pmd)));
+#if CONFIG_SUPER_PAGE
+			set_pgd_raw(pgd + i + PTRS_PER_PGD,
+			__pgd(1 + __pa(pmd + (PTRS_PER_PMD*sizeof(pmd_t)))));
+#endif
+
 		}
 		memcpy(pgd + USER_PTRS_PER_PGD,
 			swapper_pg_dir + USER_PTRS_PER_PGD,
 			(PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
+#if CONFIG_SUPER_PAGE
+		memcpy(pgd + PTRS_PER_PGD + USER_PTRS_PER_PGD,
+			swapper_pg_dir + USER_PTRS_PER_PGD,
+			(PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
+#endif
 	}
 	return pgd;
 out_oom:
 	for (i--; i >= 0; i--)
+#if CONFIG_SUPER_PAGE
+		free_pages((unsigned long)__va(pgd_val(pgd[i])-1), 1);
+#else
 		free_page((unsigned long)__va(pgd_val(pgd[i])-1));
+#endif
 	kmem_cache_free(pae_pgd_cachep, pgd);
 	return NULL;
 }
@@ -198,7 +219,11 @@
 	int i;
 
 	for (i = 0; i < USER_PTRS_PER_PGD; i++)
+#if CONFIG_SUPER_PAGE
+		free_pages((unsigned long)__va(pgd_val(pgd[i])-1), 1);
+#else
 		free_page((unsigned long)__va(pgd_val(pgd[i])-1));
+#endif
 	kmem_cache_free(pae_pgd_cachep, pgd);
 }
 
@@ -206,6 +231,21 @@
 
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
+#if CONFIG_SUPER_PAGE
+	pgd_t *pgd = (pgd_t *)__get_free_pages(GFP_KERNEL,1);
+
+	if (pgd) {
+		memset(pgd, 0, USER_PTRS_PER_PGD * sizeof(pgd_t));
+		memcpy(pgd + USER_PTRS_PER_PGD,
+			swapper_pg_dir + USER_PTRS_PER_PGD,
+			(PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
+		memset(pgd + PTRS_PER_PGD,
+			0, USER_PTRS_PER_PGD * sizeof(pgd_t));
+		memcpy(pgd + PTRS_PER_PGD + USER_PTRS_PER_PGD,
+			swapper_pg_dir + USER_PTRS_PER_PGD,
+			(PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
+	}
+#else
 	pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL);
 
 	if (pgd) {
@@ -214,12 +254,17 @@
 			swapper_pg_dir + USER_PTRS_PER_PGD,
 			(PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
 	}
+#endif
 	return pgd;
 }
 
 void pgd_free(pgd_t *pgd)
 {
+#if CONFIG_SUPER_PAGE
+	free_pages((unsigned long)pgd, 1);
+#else
 	free_page((unsigned long)pgd);
+#endif
 }
 
 #endif /* CONFIG_X86_PAE */
diff -urN linux-2.5.44/arch/sparc64/config.in linux-2.5.44-superpage/arch/sparc64/config.in
--- linux-2.5.44/arch/sparc64/config.in	Sat Oct 19 13:01:20 2002
+++ linux-2.5.44-superpage/arch/sparc64/config.in	Thu Oct 24 14:07:37 2002
@@ -27,6 +27,8 @@
 # Identify this as a Sparc64 build
 define_bool CONFIG_SPARC64 y
 
+bool 'Transparent SuperPage Support' CONFIG_SUPER_PAGE
+
 bool 'Support for hot-pluggable devices' CONFIG_HOTPLUG
 
 # Global things across all Sun machines.
diff -urN linux-2.5.44/arch/sparc64/kernel/sys_sparc.c linux-2.5.44-superpage/arch/sparc64/kernel/sys_sparc.c
--- linux-2.5.44/arch/sparc64/kernel/sys_sparc.c	Sat Oct 19 13:01:57 2002
+++ linux-2.5.44-superpage/arch/sparc64/kernel/sys_sparc.c	Thu Oct 24 14:14:23 2002
@@ -44,11 +44,16 @@
 	((((addr)+SHMLBA-1)&~(SHMLBA-1)) +	\
 	 (((pgoff)<<PAGE_SHIFT) & (SHMLBA-1)))
 
+
 unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags)
 {
 	struct vm_area_struct * vmm;
 	unsigned long task_size = TASK_SIZE;
 	int do_color_align;
+#if CONFIG_SUPER_PAGE
+	extern int super_page_vm_align;
+	unsigned long super_page_mask=0;
+#endif
 
 	if (flags & MAP_FIXED) {
 		/* We do not accept a shared mapping if it would violate
@@ -63,8 +68,20 @@
 		task_size = 0xf0000000UL;
 	if (len > task_size || len > -PAGE_OFFSET)
 		return -ENOMEM;
-	if (!addr)
+	if (!addr) {
 		addr = TASK_UNMAPPED_BASE;
+#if CONFIG_SUPER_PAGE
+		if(super_page_vm_align) {
+			int i;
+			for(i=super_page_nr-1; i>0; i--) {
+				if(len>(PAGE_SIZE << super_page_order[i])) {
+					super_page_mask = (PAGE_SIZE << super_page_order[i])-1;
+					break;
+				}
+			}
+		}
+#endif
+	}
 
 	do_color_align = 0;
 	if (filp || (flags & MAP_SHARED))
@@ -89,6 +106,10 @@
 		addr = vmm->vm_end;
 		if (do_color_align)
 			addr = COLOUR_ALIGN(addr, pgoff);
+#if CONFIG_SUPER_PAGE
+		else
+			addr = (addr + super_page_mask)&~super_page_mask;
+#endif
 	}
 }
 
diff -urN linux-2.5.44/arch/sparc64/mm/init.c linux-2.5.44-superpage/arch/sparc64/mm/init.c
--- linux-2.5.44/arch/sparc64/mm/init.c	Sat Oct 19 13:01:48 2002
+++ linux-2.5.44-superpage/arch/sparc64/mm/init.c	Thu Oct 24 14:23:42 2002
@@ -1141,6 +1141,12 @@
 #ifndef CONFIG_SMP
 struct pgtable_cache_struct pgt_quicklists;
 #endif
+#ifdef CONFIG_SUPER_PAGE
+int super_page_order[SUPER_PAGE_NR] = {0,3,6,9};
+pgprot_t super_page_prot[SUPER_PAGE_NR] = 
+	{__pgprot(_PAGE_SZ8K),__pgprot(_PAGE_SZ64K),
+	__pgprot(_PAGE_SZ512K),__pgprot(_PAGE_SZ4MB)};
+#endif
 
 /* OK, we have to color these pages. The page tables are accessed
  * by non-Dcache enabled mapping in the VPTE area by the dtlb_backend.S
diff -urN linux-2.5.44/include/asm-alpha/pgtable.h linux-2.5.44-superpage/include/asm-alpha/pgtable.h
--- linux-2.5.44/include/asm-alpha/pgtable.h	Sat Oct 19 13:01:19 2002
+++ linux-2.5.44-superpage/include/asm-alpha/pgtable.h	Sun Oct 27 17:30:43 2002
@@ -257,9 +257,7 @@
 extern inline unsigned long pgd_page(pgd_t pgd)
 { return PAGE_OFFSET + ((pgd_val(pgd) & _PFN_MASK) >> (32-PAGE_SHIFT)); }
 
-extern inline int pte_none(pte_t pte)		{ return !pte_val(pte); }
 extern inline int pte_present(pte_t pte)	{ return pte_val(pte) & _PAGE_VALID; }
-extern inline void pte_clear(pte_t *ptep)	{ pte_val(*ptep) = 0; }
 
 extern inline int pmd_none(pmd_t pmd)		{ return !pmd_val(pmd); }
 extern inline int pmd_bad(pmd_t pmd)		{ return (pmd_val(pmd) & ~_PFN_MASK) != _PAGE_TABLE; }
@@ -271,6 +269,30 @@
 extern inline int pgd_present(pgd_t pgd)	{ return pgd_val(pgd) & _PAGE_VALID; }
 extern inline void pgd_clear(pgd_t * pgdp)	{ pgd_val(*pgdp) = 0; }
 
+#if CONFIG_SUPER_PAGE
+#define clear_pmd_sp(pmd) do {} while(0)
+#define super_page_populate(mm, adr, page, prot, index) do {} while (0)
+#define SUPER_PAGE_MASK 0x0060
+#define SUPER_PAGE_MASK_SHIFT 5
+#define SUPER_PAGE_NR 4
+#define SIZEOF_PTE_LOG2 SIZEOF_PTR_LOG2
+void down_pte_sp(pte_t *pteptr, int index);
+void clear_pte_sp(pte_t *pteptr, int index);
+extern int super_page_order[];
+extern pgprot_t super_page_prot[];
+extern inline int pte_none(pte_t pte)\
+	{ return !(pte_val(pte) & ~SUPER_PAGE_MASK); }
+#define pte_to_sp_index(x)\
+	((pte_val(x) & SUPER_PAGE_MASK) >> SUPER_PAGE_MASK_SHIFT)
+extern inline pte_t mk_pte_sp_clean(pte_t pte)	\
+	{pte_val(pte) &= ~SUPER_PAGE_MASK; return pte;} 
+extern inline void pte_clear(pte_t *ptep)	\
+	{ pte_t pte; pte_val(pte)=0; set_pte(ptep, pte); }
+#else
+extern inline int pte_none(pte_t pte)          { return !(pte_val(pte)); }
+extern inline void pte_clear(pte_t *ptep)      { pte_val(*ptep)=0; }
+#endif
+
 /*
  * The following only work if pte_present() is true.
  * Undefined behaviour if not..
diff -urN linux-2.5.44/include/asm-i386/cpufeature.h linux-2.5.44-superpage/include/asm-i386/cpufeature.h
--- linux-2.5.44/include/asm-i386/cpufeature.h	Sat Oct 19 13:01:17 2002
+++ linux-2.5.44-superpage/include/asm-i386/cpufeature.h	Thu Oct 24 17:34:34 2002
@@ -69,7 +69,7 @@
 #define cpu_has_fpu		boot_cpu_has(X86_FEATURE_FPU)
 #define cpu_has_vme		boot_cpu_has(X86_FEATURE_VME)
 #define cpu_has_de		boot_cpu_has(X86_FEATURE_DE)
-#define cpu_has_pse		boot_cpu_has(X86_FEATURE_PSE)
+#define cpu_has_pse		(boot_cpu_has(X86_FEATURE_PSE)|boot_cpu_has(X86_FEATURE_PSE36))
 #define cpu_has_tsc		boot_cpu_has(X86_FEATURE_TSC)
 #define cpu_has_pae		boot_cpu_has(X86_FEATURE_PAE)
 #define cpu_has_pge		boot_cpu_has(X86_FEATURE_PGE)
diff -urN linux-2.5.44/include/asm-i386/mmu_context.h linux-2.5.44-superpage/include/asm-i386/mmu_context.h
--- linux-2.5.44/include/asm-i386/mmu_context.h	Sat Oct 19 13:00:42 2002
+++ linux-2.5.44-superpage/include/asm-i386/mmu_context.h	Thu Oct 24 14:51:06 2002
@@ -38,7 +38,11 @@
 		set_bit(cpu, &next->cpu_vm_mask);
 
 		/* Re-load page tables */
+#if CONFIG_SUPER_PAGE
+		load_cr3(next->pgd+PTRS_PER_PGD);
+#else
 		load_cr3(next->pgd);
+#endif
 
 		/*
 		 * load the LDT, if the LDT is different:
@@ -55,7 +59,11 @@
 			/* We were in lazy tlb mode and leave_mm disabled 
 			 * tlb flush IPI delivery. We must reload %cr3.
 			 */
+#if CONFIG_SUPER_PAGE
+			load_cr3(next->pgd+PTRS_PER_PGD);
+#else
 			load_cr3(next->pgd);
+#endif
 			load_LDT_nolock(&next->context, cpu);
 		}
 	}
diff -urN linux-2.5.44/include/asm-i386/pgalloc.h linux-2.5.44-superpage/include/asm-i386/pgalloc.h
--- linux-2.5.44/include/asm-i386/pgalloc.h	Sat Oct 19 13:01:09 2002
+++ linux-2.5.44-superpage/include/asm-i386/pgalloc.h	Thu Oct 24 15:09:21 2002
@@ -10,6 +10,26 @@
 #define pmd_populate_kernel(mm, pmd, pte) \
 		set_pmd(pmd, __pmd(_PAGE_TABLE + __pa(pte)))
 
+#if CONFIG_SUPER_PAGE
+static inline void super_page_populate(struct mm_struct *mm, unsigned long address, struct page *page, pgprot_t prot, int spindex) {
+	pgd_t *pgd;
+	pmd_t *pmd;
+	union {
+		pmd_t entry;
+		pte_t pte_entry;
+	} x;
+	pgd = pgd_offset(mm, address);
+	pmd = pmd_offset(pgd, address);
+	x.pte_entry = mk_pte(page, __pgprot(pgprot_val(prot)|_PAGE_PSE)) ;
+	x.pte_entry = pte_mkwrite(pte_mkdirty(x.pte_entry));
+#if defined (CONFIG_X86_PAE)
+	set_pmd_raw(pmd + PTRS_PER_PMD, x.entry);
+#else
+	set_pmd_raw(pmd + PTRS_PER_PGD, x.entry);
+#endif
+}
+#endif
+
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd, struct page *pte)
 {
 	set_pmd(pmd, __pmd(_PAGE_TABLE +
diff -urN linux-2.5.44/include/asm-i386/pgtable-2level.h linux-2.5.44-superpage/include/asm-i386/pgtable-2level.h
--- linux-2.5.44/include/asm-i386/pgtable-2level.h	Sat Oct 19 13:01:56 2002
+++ linux-2.5.44-superpage/include/asm-i386/pgtable-2level.h	Thu Oct 24 17:30:17 2002
@@ -45,8 +45,21 @@
  * (pmds are folded into pgds so this doesnt get actually called,
  * but the define is needed for a generic inline function.)
  */
+#if CONFIG_SUPER_PAGE
+#define set_pmd_raw(pmdptr, pmdval) (*(pmdptr) = pmdval)
+#define set_pgd_raw(pgdptr, pgdval) (*(pgdptr) = pgdval)
+#define set_pmd(pmdptr, pmdval) do {\
+	set_pmd_raw(pmdptr, pmdval);\
+	set_pmd_raw(pmdptr+PTRS_PER_PGD, pmdval);\
+	} while (0)
+#define set_pgd(pgdptr, pgdval) do {\
+	set_pgd_raw(pgdptr, pgdval);\
+	set_pgd_raw(pgdptr+PTRS_PER_PGD, pgdval);\
+	} while (0)
+#else
 #define set_pmd(pmdptr, pmdval) (*(pmdptr) = pmdval)
 #define set_pgd(pgdptr, pgdval) (*(pgdptr) = pgdval)
+#endif
 
 #define pgd_page(pgd) \
 ((unsigned long) __va(pgd_val(pgd) & PAGE_MASK))
@@ -58,7 +71,11 @@
 #define ptep_get_and_clear(xp)	__pte(xchg(&(xp)->pte_low, 0))
 #define pte_same(a, b)		((a).pte_low == (b).pte_low)
 #define pte_page(x)		pfn_to_page(pte_pfn(x))
+#if CONFIG_SUPER_PAGE
+#define pte_none(x)		(!((x).pte_low&~SUPER_PAGE_MASK))
+#else
 #define pte_none(x)		(!(x).pte_low)
+#endif
 #define pte_pfn(x)		((unsigned long)(((x).pte_low >> PAGE_SHIFT)))
 #define pfn_pte(pfn, prot)	__pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
 #define pfn_pmd(pfn, prot)	__pmd(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
diff -urN linux-2.5.44/include/asm-i386/pgtable-3level.h linux-2.5.44-superpage/include/asm-i386/pgtable-3level.h
--- linux-2.5.44/include/asm-i386/pgtable-3level.h	Sat Oct 19 13:02:28 2002
+++ linux-2.5.44-superpage/include/asm-i386/pgtable-3level.h	Thu Oct 24 17:29:52 2002
@@ -51,11 +51,26 @@
 }
 #define set_pte_atomic(pteptr,pteval) \
 		set_64bit((unsigned long long *)(pteptr),pte_val(pteval))
+#if CONFIG_SUPER_PAGE
+#define set_pgd_raw(pgdptr,pgdval) \
+	set_64bit((unsigned long long *)(pgdptr),pgd_val(pgdval))
+#define set_pmd_raw(pmdptr,pmdval) \
+	set_64bit((unsigned long long *)(pmdptr),pmd_val(pmdval))
+#define set_pmd(pmdptr,pmdval) do {\
+	set_pmd_raw(pmdptr,pmdval);\
+	set_pmd_raw(pmdptr+PTRS_PER_PMD,pmdval);\
+	} while(0)
+#define set_pgd(pgdptr,pgdval) do {\
+	set_pgd_raw(pgdptr,pgdval);\
+	set_pgd_raw(pgdptr+PTRS_PER_PGD,pgdval);\
+	} while(0)
+#else
 #define set_pmd(pmdptr,pmdval) \
 		set_64bit((unsigned long long *)(pmdptr),pmd_val(pmdval))
 #define set_pgd(pgdptr,pgdval) \
 		set_64bit((unsigned long long *)(pgdptr),pgd_val(pgdval))
 
+#endif
 /*
  * Pentium-II erratum A13: in PAE mode we explicitly have to flush
  * the TLB via cr3 if the top-level pgd is changed...
@@ -89,7 +104,11 @@
 }
 
 #define pte_page(x)	pfn_to_page(pte_pfn(x))
+#if CONFIG_SUPER_PAGE
+#define pte_none(x)	(!((x).pte_low&~SUPER_PAGE_MASK) && !(x).pte_high)
+#else
 #define pte_none(x)	(!(x).pte_low && !(x).pte_high)
+#endif
 #define pte_pfn(x)	(((x).pte_low >> PAGE_SHIFT) | ((x).pte_high << (32 - PAGE_SHIFT)))
 
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
diff -urN linux-2.5.44/include/asm-i386/pgtable.h linux-2.5.44-superpage/include/asm-i386/pgtable.h
--- linux-2.5.44/include/asm-i386/pgtable.h	Sat Oct 19 13:02:27 2002
+++ linux-2.5.44-superpage/include/asm-i386/pgtable.h	Sun Oct 27 13:59:04 2002
@@ -120,6 +120,12 @@
 #define _PAGE_DIRTY	0x040
 #define _PAGE_PSE	0x080	/* 4 MB (or 2MB) page, Pentium+, if present.. */
 #define _PAGE_GLOBAL	0x100	/* Global TLB entry PPro+ */
+#if CONFIG_SUPER_PAGE
+#define _PAGE_SUPER	0x200   /* SuperPage candidated page */
+#define SUPER_PAGE_MASK	_PAGE_SUPER
+#define SUPER_PAGE_MASK_SHIFT	9
+#define SUPER_PAGE_NR	2
+#endif
 
 #define _PAGE_PROTNONE	0x080	/* If not present */
 
@@ -215,6 +221,27 @@
 static inline void ptep_set_wrprotect(pte_t *ptep)		{ clear_bit(_PAGE_BIT_RW, &ptep->pte_low); }
 static inline void ptep_mkdirty(pte_t *ptep)			{ set_bit(_PAGE_BIT_DIRTY, &ptep->pte_low); }
 
+#if CONFIG_SUPER_PAGE
+extern pgprot_t super_page_prot[];
+extern int super_page_order[];
+#define pte_to_sp_index(x)  (((x).pte_low & SUPER_PAGE_MASK) >> SUPER_PAGE_MASK_SHIFT)
+static inline pte_t mk_pte_sp_clean(pte_t pte)
+	{(pte).pte_low &= ~SUPER_PAGE_MASK; return pte;} 
+#if CONFIG_X86_PAE
+#define SIZEOF_PTE_LOG2 3
+static inline void clear_pmd_sp(pmd_t *pmd) {
+	set_pmd_raw((pmd+PTRS_PER_PMD), *pmd);
+}
+#else
+#define SIZEOF_PTE_LOG2 2
+static inline void clear_pmd_sp(pmd_t *pmd) {
+	set_pmd_raw((pmd+PTRS_PER_PGD), *pmd);
+}
+#endif 
+void down_pte_sp(pte_t *pteptr, int index);
+void clear_pte_sp(pte_t *pteptr, int index);
+#define set_pte_raw(pteptr, pteval) set_pte(pteptr, pteval) 
+#endif
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
@@ -295,8 +322,13 @@
 
 /* Encode and de-code a swap entry */
 #define __swp_type(x)			(((x).val >> 1) & 0x3f)
+#if CONFIG_SUPER_PAGE
+#define __swp_offset(x)			((x).val >> 10)
+#define __swp_entry(type, offset)	((swp_entry_t) { ((type) << 1) | ((offset) << 10) })
+#else
 #define __swp_offset(x)			((x).val >> 8)
 #define __swp_entry(type, offset)	((swp_entry_t) { ((type) << 1) | ((offset) << 8) })
+#endif
 #define __pte_to_swp_entry(pte)		((swp_entry_t) { (pte).pte_low })
 #define __swp_entry_to_pte(x)		((pte_t) { (x).val })
 
diff -urN linux-2.5.44/include/asm-i386/tlbflush.h linux-2.5.44-superpage/include/asm-i386/tlbflush.h
--- linux-2.5.44/include/asm-i386/tlbflush.h	Sat Oct 19 13:01:52 2002
+++ linux-2.5.44-superpage/include/asm-i386/tlbflush.h	Thu Oct 24 16:50:19 2002
@@ -50,6 +50,9 @@
 #define __flush_tlb_single(addr) \
 	__asm__ __volatile__("invlpg %0": :"m" (*(char *) addr))
 
+#if CONFIG_X86_K7_INVLPG_BUG
+#define __flush_tlb_one(addr) __flush_tlb()
+#else
 #ifdef CONFIG_X86_INVLPG
 # define __flush_tlb_one(addr) __flush_tlb_single(addr)
 #else
@@ -61,6 +64,7 @@
 			__flush_tlb();					\
 	} while (0)
 #endif
+#endif
 
 /*
  * TLB flushing:
diff -urN linux-2.5.44/include/asm-sparc64/pgtable.h linux-2.5.44-superpage/include/asm-sparc64/pgtable.h
--- linux-2.5.44/include/asm-sparc64/pgtable.h	Sat Oct 19 13:01:08 2002
+++ linux-2.5.44-superpage/include/asm-sparc64/pgtable.h	Sun Oct 27 13:28:46 2002
@@ -54,6 +54,9 @@
 #define PMD_SIZE	(1UL << PMD_SHIFT)
 #define PMD_MASK	(~(PMD_SIZE-1))
 #define PMD_BITS	11
+#if CONFIG_SUPER_PAGE
+#define SIZEOF_PTE_LOG2 3
+#endif
 
 /* PGDIR_SHIFT determines what a third-level page table entry can map */
 #define PGDIR_SHIFT	(PAGE_SHIFT + (PAGE_SHIFT-3) + PMD_BITS)
@@ -136,7 +139,7 @@
 #elif PAGE_SHIFT == 19
 #define _PAGE_SZBITS	_PAGE_SZ512K
 #elif PAGE_SHIFT == 22
-#define _PAGE_SZBITS	_PAGE_SZ4M
+#define _PAGE_SZBITS	_PAGE_SZ4MB
 #else
 #error Wrong PAGE_SHIFT specified
 #endif
@@ -228,9 +231,7 @@
 #define __pmd_page(pmd)			((unsigned long) __va((pmd_val(pmd)<<11UL)))
 #define pmd_page(pmd) 			virt_to_page((void *)__pmd_page(pmd))
 #define pgd_page(pgd)			((unsigned long) __va((pgd_val(pgd)<<11UL)))
-#define pte_none(pte) 			(!pte_val(pte))
 #define pte_present(pte)		(pte_val(pte) & _PAGE_PRESENT)
-#define pte_clear(pte)			(pte_val(*(pte)) = 0UL)
 #define pmd_none(pmd)			(!pmd_val(pmd))
 #define pmd_bad(pmd)			(0)
 #define pmd_present(pmd)		(pmd_val(pmd) != 0UL)
@@ -239,7 +240,24 @@
 #define pgd_bad(pgd)			(0)
 #define pgd_present(pgd)		(pgd_val(pgd) != 0UL)
 #define pgd_clear(pgdp)			(pgd_val(*(pgdp)) = 0UL)
-
+#if CONFIG_SUPER_PAGE
+#define clear_pmd_sp(pmd) do {} while(0)
+#define super_page_populate(mm, adr, page, prot, index) do {} while (0)
+#define SUPER_PAGE_MASK (_PAGE_SZ64K|_PAGE_SZ512K|_PAGE_SZ4MB)
+#define SUPER_PAGE_MASK_SHIFT 61
+#define SUPER_PAGE_NR 4
+extern int super_page_order[];
+extern pgprot_t super_page_prot[];
+extern inline int pte_none(pte_t pte)	{ return !(pte_val(pte) & ~SUPER_PAGE_MASK); }
+#define pte_to_sp_index(x)	((pte_val(x) & SUPER_PAGE_MASK) >> SUPER_PAGE_MASK_SHIFT)
+extern inline pte_t mk_pte_sp_clean(pte_t pte) {pte_val(pte) &= ~SUPER_PAGE_MASK; return pte;} 
+void down_pte_sp(pte_t *pteptr, int index);
+void clear_pte_sp(pte_t *pteptr, int index);
+extern inline void pte_clear(pte_t *ptep)      { pte_t pte; pte_val(pte)=0; set_pte(ptep, pte); }
+#else
+#define pte_none(pte)	(!pte_val(pte))
+#define pte_clear(pte)	(pte_val(*(pte)) = 0UL)
+#endif
 /* The following only work if pte_present() is true.
  * Undefined behaviour if not..
  */
diff -urN linux-2.5.44/include/linux/mm.h linux-2.5.44-superpage/include/linux/mm.h
--- linux-2.5.44/include/linux/mm.h	Sat Oct 19 13:01:08 2002
+++ linux-2.5.44-superpage/include/linux/mm.h	Sun Oct 27 14:18:27 2002
@@ -552,6 +552,20 @@
 extern unsigned long get_page_cache_size(void);
 extern unsigned int nr_used_zone_pages(void);
 
+#if CONFIG_SUPER_PAGE
+extern int super_page_on;
+extern int super_page_nr;
+extern unsigned long super_page_reserve[];
+extern unsigned long super_page_allocate[];
+extern unsigned long super_page_downgrade[];
+void super_page_init(void);
+int make_ptes_present(unsigned long addr, unsigned long end);
+void __break_area (struct page *page, unsigned long order);
+void adj_sp_range(struct mm_struct *mm,
+	int zap,unsigned long address, unsigned long end);
+#define break_area(page, order) __break_area(page, order)
+#endif
+
 #endif /* __KERNEL__ */
 
 #endif
diff -urN linux-2.5.44/kernel/ksyms.c linux-2.5.44-superpage/kernel/ksyms.c
--- linux-2.5.44/kernel/ksyms.c	Sat Oct 19 13:01:08 2002
+++ linux-2.5.44-superpage/kernel/ksyms.c	Thu Oct 24 19:59:20 2002
@@ -599,5 +599,10 @@
 EXPORT_SYMBOL(__per_cpu_offset);
 #endif
 
+#if CONFIG_SUPER_PAGE
+EXPORT_SYMBOL(super_page_order);
+EXPORT_SYMBOL(super_page_prot);
+#endif
+
 /* debug */
 EXPORT_SYMBOL(dump_stack);
diff -urN linux-2.5.44/kernel/sysctl.c linux-2.5.44-superpage/kernel/sysctl.c
--- linux-2.5.44/kernel/sysctl.c	Sat Oct 19 13:01:11 2002
+++ linux-2.5.44-superpage/kernel/sysctl.c	Sun Oct 27 12:46:04 2002
@@ -147,6 +147,10 @@
 static void unregister_proc_table(ctl_table *, struct proc_dir_entry *);
 #endif
 
+#if CONFIG_SUPER_PAGE
+void super_page_init(void);
+#endif
+
 /* The default sysctl tables: */
 
 static ctl_table root_table[] = {
@@ -367,6 +371,10 @@
 	register_proc_table(root_table, proc_sys_root);
 	init_irq_proc();
 #endif
+#ifdef CONFIG_SUPER_PAGE
+	super_page_init();
+#endif
+
 }
 
 int do_sysctl(int *name, int nlen, void *oldval, size_t *oldlenp,
diff -urN linux-2.5.44/mm/Makefile linux-2.5.44-superpage/mm/Makefile
--- linux-2.5.44/mm/Makefile	Sat Oct 19 13:02:00 2002
+++ linux-2.5.44-superpage/mm/Makefile	Thu Oct 24 20:03:26 2002
@@ -11,4 +11,6 @@
 	    pdflush.o page-writeback.o rmap.o madvise.o vcache.o \
 	    truncate.o
 
+obj-$(CONFIG_SUPER_PAGE) += super_page.o
+
 include $(TOPDIR)/Rules.make
diff -urN linux-2.5.44/mm/memory.c linux-2.5.44-superpage/mm/memory.c
--- linux-2.5.44/mm/memory.c	Sat Oct 19 13:01:52 2002
+++ linux-2.5.44-superpage/mm/memory.c	Mon Oct 28 10:05:28 2002
@@ -212,6 +212,14 @@
 
 	if (is_vm_hugetlb_page(vma))
 		return copy_hugetlb_page_range(dst, src, vma);
+#if CONFIG_SUPER_PAGE
+	if(cow) {
+		spin_lock(&src->page_table_lock);                  
+		adj_sp_range(src, 1, address, end);
+		spin_unlock(&src->page_table_lock);                        
+	}
+#endif
+
 
 	src_pgd = pgd_offset(src, address)-1;
 	dst_pgd = pgd_offset(dst, address)-1;
@@ -402,6 +410,9 @@
 	dir = pgd_offset(vma->vm_mm, address);
 	tlb_start_vma(tlb, vma);
 	do {
+#if CONFIG_SUPER_PAGE
+		adj_sp_range(vma->vm_mm, 1, address, end);
+#endif
 		zap_pmd_range(tlb, dir, address, end - address);
 		address = (address + PGDIR_SIZE) & PGDIR_MASK;
 		dir++;
@@ -641,6 +652,9 @@
 		BUG();
 
 	spin_lock(&mm->page_table_lock);
+#if CONFIG_SUPER_PAGE
+	adj_sp_range(mm, 1, address, end);
+#endif
 	do {
 		pmd_t *pmd = pmd_alloc(mm, dir, address);
 		error = -ENOMEM;
@@ -722,6 +736,9 @@
 		BUG();
 
 	spin_lock(&mm->page_table_lock);
+#if CONFIG_SUPER_PAGE
+	adj_sp_range(mm, 1, beg, end);
+#endif
 	do {
 		pmd_t *pmd = pmd_alloc(mm, dir, from);
 		error = -ENOMEM;
@@ -1053,21 +1070,78 @@
 {
 	pte_t entry;
 	struct page * page = ZERO_PAGE(addr);
+#if CONFIG_SUPER_PAGE
+	int i, order;
+	unsigned long spaddr;
+	pte_t oldpte, *wktable;
+#endif
 
 	/* Read-only mapping of ZERO_PAGE. */
 	entry = pte_wrprotect(mk_pte(ZERO_PAGE(addr), vma->vm_page_prot));
 
 	/* ..except if it's a write access */
 	if (write_access) {
+#if CONFIG_SUPER_PAGE
+retry:
+		oldpte = *page_table;
+		order = super_page_order[pte_to_sp_index(oldpte)];
+		wktable =
+			(pte_t *)((unsigned long)page_table & 
+			~((1UL << (order + SIZEOF_PTE_LOG2)) -1));
+		for (i=0; i < 1 << order; i++) {
+			if(!pte_none(*(wktable+i))) {
+				down_pte_sp(page_table, pte_to_sp_index(oldpte));
+				goto retry;
+			}
+		}
+#endif
 		/* Allocate our own private page. */
 		pte_unmap(page_table);
 		spin_unlock(&mm->page_table_lock);
-
+#if CONFIG_SUPER_PAGE
+		page = alloc_pages(GFP_HIGHUSER, order);
+		if (!page) {
+			if (order) {
+				spin_lock(&mm->page_table_lock);
+				super_page_downgrade[pte_to_sp_index(oldpte)]++;
+				down_pte_sp(page_table, pte_to_sp_index(oldpte));
+				goto retry;
+			} 
+		else goto no_mem;
+		}
+		spaddr = addr & ~((PAGE_SIZE << order) - 1);
+		for (i=0; i < 1 << order; i++) {
+			clear_user_highpage(page+i, spaddr);
+			spaddr += PAGE_SIZE;
+		}
+		if (order) {
+			break_area(page, order);
+			super_page_allocate[pte_to_sp_index(oldpte)]++;
+			spin_lock(&mm->page_table_lock);
+			spaddr = addr & ~((PAGE_SIZE << order) - 1);
+			super_page_populate(mm, spaddr, page, vma->vm_page_prot,
+			pte_to_sp_index(oldpte));
+			for (i=0; i < 1 << order; i++) {
+				entry = pte_mkwrite(pte_mkdirty(
+				mk_pte(page+i, __pgprot(pgprot_val(vma->vm_page_prot)|
+				pgprot_val(super_page_prot[pte_to_sp_index(oldpte)])))));
+       				mm->rss++;
+				flush_page_to_ram(page+i);
+				lru_cache_add(page+i);
+				mark_page_accessed(page+i);
+				set_pte_raw(wktable+i, entry);
+				page_add_rmap(page+i, wktable+i);
+				pte_unmap(wktable+i);
+				spaddr += PAGE_SIZE;
+			}
+		} else{
+#else
 		page = alloc_page(GFP_HIGHUSER);
 		if (!page)
 			goto no_mem;
 		clear_user_highpage(page, addr);
 
+#endif
 		spin_lock(&mm->page_table_lock);
 		page_table = pte_offset_map(pmd, addr);
 
@@ -1082,11 +1156,23 @@
 		entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
 		lru_cache_add(page);
 		mark_page_accessed(page);
+#if CONFIG_SUPER_PAGE
+		set_pte(page_table, entry);
+		page_add_rmap(page, page_table); /* ignores ZERO_PAGE */
+		pte_unmap(page_table);
+		}
+	} else {
+		set_pte(page_table, entry);
+		page_add_rmap(page, page_table); /* ignores ZERO_PAGE */
+		pte_unmap(page_table);
+	}
+#else
 	}
 
 	set_pte(page_table, entry);
 	page_add_rmap(page, page_table); /* ignores ZERO_PAGE */
 	pte_unmap(page_table);
+#endif
 
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache(vma, addr, entry);
diff -urN linux-2.5.44/mm/mmap.c linux-2.5.44-superpage/mm/mmap.c
--- linux-2.5.44/mm/mmap.c	Sat Oct 19 13:02:00 2002
+++ linux-2.5.44-superpage/mm/mmap.c	Sun Oct 27 16:57:38 2002
@@ -602,6 +602,11 @@
 		mm->locked_vm += len >> PAGE_SHIFT;
 		make_pages_present(addr, addr + len);
 	}
+#ifdef CONFIG_SUPER_PAGE
+	if (super_page_on && len >= (PAGE_SIZE << super_page_order[1])) {
+		make_ptes_present(addr, addr + len);
+	}
+#endif
 	return addr;
 
 unmap_and_free_vma:
@@ -632,9 +637,15 @@
  * This function "knows" that -ENOMEM has the bits set.
  */
 #ifndef HAVE_ARCH_UNMAPPED_AREA
+#if CONFIG_SUPER_PAGE
+extern int super_page_vm_align;
+#endif
 static inline unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags)
 {
 	struct vm_area_struct *vma;
+#if CONFIG_SUPER_PAGE
+	unsigned long super_page_mask=0;
+#endif
 
 	if (len > TASK_SIZE)
 		return -ENOMEM;
@@ -648,13 +659,28 @@
 	}
 	addr = PAGE_ALIGN(TASK_UNMAPPED_BASE);
 
+#if CONFIG_SUPER_PAGE
+	if(super_page_vm_align) {
+		int i;
+		for(i=super_page_nr-1; i>0; i--) {
+			if(len>(PAGE_SIZE << super_page_order[i])) {
+				super_page_mask = (PAGE_SIZE << super_page_order[i]) -1;
+				break;
+			}
+		}
+	}
+#endif
 	for (vma = find_vma(current->mm, addr); ; vma = vma->vm_next) {
 		/* At this point:  (!vma || addr < vma->vm_end). */
 		if (TASK_SIZE - len < addr)
 			return -ENOMEM;
 		if (!vma || addr + len <= vma->vm_start)
 			return addr;
+#if CONFIG_SUPER_PAGE
+		addr = (vma->vm_end + super_page_mask)&~super_page_mask;
+#else
 		addr = vma->vm_end;
+#endif
 	}
 }
 #else
@@ -1233,6 +1259,11 @@
 		mm->locked_vm += len >> PAGE_SHIFT;
 		make_pages_present(addr, addr + len);
 	}
+#ifdef CONFIG_SUPER_PAGE
+	if (super_page_on && len >= (PAGE_SIZE << super_page_order[1])) {
+		make_ptes_present(addr, addr + len);
+	}
+#endif
 	return addr;
 }
 
diff -urN linux-2.5.44/mm/mprotect.c linux-2.5.44-superpage/mm/mprotect.c
--- linux-2.5.44/mm/mprotect.c	Sat Oct 19 13:01:49 2002
+++ linux-2.5.44-superpage/mm/mprotect.c	Sun Oct 27 16:59:20 2002
@@ -96,6 +96,9 @@
 	if (start >= end)
 		BUG();
 	spin_lock(&current->mm->page_table_lock);
+#if CONFIG_SUPER_PAGE
+	adj_sp_range(current->mm, 1, start, end);
+#endif
 	do {
 		change_pmd_range(dir, start, end - start, newprot);
 		start = (start + PGDIR_SIZE) & PGDIR_MASK;
diff -urN linux-2.5.44/mm/super_page.c linux-2.5.44-superpage/mm/super_page.c
--- linux-2.5.44/mm/super_page.c	Thu Jan  1 09:00:00 1970
+++ linux-2.5.44-superpage/mm/super_page.c	Mon Oct 28 10:19:17 2002
@@ -0,0 +1,298 @@
+/*
+  Linux Super Page internal functions.
+*/
+
+#define SUPER_PAGE_DEBUG 0
+
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/swap.h>
+#include <linux/smp_lock.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+
+#include <asm/pgalloc.h>
+#include <asm/uaccess.h>
+#include <asm/tlb.h>
+#include <linux/proc_fs.h>
+#include <linux/sysctl.h>
+
+/* We use arbitrary high number for the sysctl. You may have to change it.*/
+#define CTL_SUPER_PAGE 4558
+
+#define CTL_SET_ON 1
+#define CTL_SET_NR 2
+#define CTL_SET_ALIGN 3
+#define CTL_SET_BITMASK 4
+#define CTL_SET_LOGRES 5
+
+int super_page_on = 0; /* We start without super_page at first. */
+int super_page_nr = SUPER_PAGE_NR;
+int super_page_vm_align = 0; /* We start without super_page align at first. */
+int super_page_tail_align = 0; /* We start without super_page tail align at first. */
+int super_page_bitmask = (1<<SUPER_PAGE_NR)-1; /* To control each order of the reservation. */
+int super_page_logreset = 0; /* If 1 then reset counter when dumped */
+
+unsigned long super_page_reserve[SUPER_PAGE_NR];
+unsigned long super_page_allocate[SUPER_PAGE_NR];
+unsigned long super_page_downgrade[SUPER_PAGE_NR];
+#if CONFIG_SYSCTL
+static ctl_table super_page_table[] = {
+        {CTL_SET_ON, "on", &super_page_on, sizeof(int),
+         0644, NULL, &proc_dointvec},
+        {CTL_SET_NR, "nr", &super_page_nr, sizeof(int),
+         0644, NULL, &proc_dointvec},
+        {CTL_SET_ALIGN, "vm_align", &super_page_vm_align, sizeof(int),
+         0644, NULL, &proc_dointvec},
+        {CTL_SET_ALIGN, "tail_align", &super_page_tail_align, sizeof(int),
+         0644, NULL, &proc_dointvec},
+        {CTL_SET_BITMASK, "bitmask", &super_page_bitmask, sizeof(int),
+         0644, NULL, &proc_dointvec},
+        {CTL_SET_LOGRES, "logreset", &super_page_logreset, sizeof(int),
+         0644, NULL, &proc_dointvec},
+	{0}
+};
+static ctl_table sys_table[] = {
+	{CTL_SUPER_PAGE, "super_page", NULL, 0, 0555, super_page_table},
+	{0}
+};
+
+#endif
+
+#if CONFIG_PROC_FS
+int super_page_getinfo(char *buf, char **start, off_t fpos, int length)
+{
+      int i;
+      char *p = buf;
+
+      p += sprintf(p, "current on: %d\n", super_page_on);
+      p += sprintf(p, "current nr: %d\n", super_page_nr);
+      p += sprintf(p, "current bitmask: %d\n", super_page_bitmask);
+      p += sprintf(p, "current vm_align: %d\n", super_page_vm_align);
+      p += sprintf(p, "order\treserve\tallocate\tfail \n");
+      for(i=1;i<SUPER_PAGE_NR;i++) {
+      p += sprintf(p, "%d:\t%ld\t%ld\t%ld\n",
+               super_page_order[i],
+               super_page_reserve[i],
+               super_page_allocate[i],
+               super_page_downgrade[i]
+               );
+      }
+      if(super_page_logreset)
+      for(i=1;i<SUPER_PAGE_NR;i++) {
+               super_page_reserve[i] = 0;
+               super_page_allocate[i] = 0;
+               super_page_downgrade[i] = 0;
+      }
+      return p - buf;
+ }
+#endif
+
+void super_page_init() {
+  int i;
+#if SUPER_PAGE_DEBUG
+	super_page_on = 0;
+	printk("super_page_init\n");
+#else
+	super_page_on = 1;
+#endif
+#if CONFIG_PROC_FS
+	for(i=0;i<SUPER_PAGE_NR;i++) {
+		super_page_reserve[i] = 0;
+		super_page_allocate[i] = 0;
+		super_page_downgrade[i] = 0;
+	}
+	create_proc_info_entry("super_page", 0, NULL, super_page_getinfo);
+#endif
+#if CONFIG_SYSCTL
+	register_sysctl_table(sys_table,0);
+#endif
+}
+
+  /*
+   * Allocating PTEs for future fault handling.
+   */
+  
+void set_sp_range(unsigned long address, int order, pgprot_t prot)
+  {
+        int i;
+        pgd_t * dir;
+        pmd_t *pmd;
+        pte_t * pte;
+	struct mm_struct *mm = current->mm;
+  
+	spin_lock(&mm->page_table_lock);
+        dir = pgd_offset(mm, address);
+        pmd = pmd_alloc(mm, dir, address);
+        if (!pmd) goto out;
+        address &= ~PGDIR_MASK;
+        pte = pte_alloc_map(mm, pmd, address);
+        if (!pte)
+                goto out;
+        for (i = 0; i < 1<<order; i ++)
+                if(!pte_none(*(pte+i))) goto out;
+        for (i = 0; i < 1<<order; i ++) {
+                set_pte_raw(pte, pte_modify(*pte, prot));
+                pte++;
+        }
+out:
+	spin_unlock(&mm->page_table_lock);
+        return;
+  }
+  
+  /*
+   * Simplistic new page table allocation for sys_brk..
+   * Only GH bit != 0 tables will be allocated.
+   * At this time, we will not allocate real storage, it remains
+   * for the page_fault handler.
+   */
+int make_ptes_present(unsigned long addr, unsigned long end)
+  {
+        int i;
+        unsigned long rem;
+        if (addr >= end)
+                BUG();
+#if SUPER_PAGE_DEBUG
+	printk("make_ptes_present\n");
+#endif
+  /*
+   * The first order(i=0) is the ordinary pte (1page), then we skip to
+   * allocate the pte.
+   */
+        for (i = 0; i < super_page_nr - 1; i++) {
+           rem = (~addr + 1) & ((PAGE_SIZE << super_page_order[i+1]) - 1);
+           while (rem &&
+                  (addr & ((PAGE_SIZE << super_page_order[i]) - 1)) == 0UL &&
+                  ((end - addr ) >= (PAGE_SIZE << super_page_order[i]))) {
+                  if(i&&super_page_bitmask & (1<<i)) {
+                   super_page_reserve[i]++;
+                   set_sp_range(addr, super_page_order[i], super_page_prot[i]);
+                  }
+                        addr += PAGE_SIZE << super_page_order[i];
+                        rem  -= PAGE_SIZE << super_page_order[i];
+                }
+        }
+        for (i = super_page_nr - 1; i > 0; i--) {
+                while (
+                    (addr & ((PAGE_SIZE << super_page_order[i]) - 1)) == 0UL &&
+                    ((end - addr ) >= (PAGE_SIZE << super_page_order[i]))) {
+                     if(super_page_bitmask & (1<<i)) {
+                        super_page_reserve[i]++;
+                        set_sp_range(addr, super_page_order[i], super_page_prot[i]);
+                     }
+                        addr += PAGE_SIZE << super_page_order[i];
+                }
+        }
+        return 0;
+  }
+
+void adj_sp_pte(struct mm_struct *mm, int zap,
+               unsigned long address, int order)
+  {
+        int i,downgrade;
+        pgd_t * dir;
+        pmd_t *pmd;
+        pte_t * pte;
+  
+        downgrade=0;
+        dir = pgd_offset(mm, address);
+        pmd = pmd_offset(dir, address);
+        if (!pmd) return;
+        address &= ~PGDIR_MASK;
+        pte = pte_offset_map(pmd, address);
+        if (!pte) return;
+/*
+We assume that the largest super page is less or equal to the
+mapped area by the pmd. Then following code does not take
+the pmd_offset again. If you want to use a larger super page,
+you need to check the code.
+*/
+        for (i = 0; i < 1<<order; i ++) {
+ retry:
+           if(super_page_order[pte_to_sp_index(*(pte+i))]>order) {
+                down_pte_sp(pte+i,pte_to_sp_index(*(pte+i)));
+                downgrade=1;
+                goto retry;
+               }
+        }
+        if(zap) clear_pte_sp(pte, pte_to_sp_index(*pte));
+        if(zap||downgrade) clear_pmd_sp(pmd);
+        return;
+  }
+  
+void adj_sp_range(struct mm_struct *mm, int zap, 
+                  unsigned long addr, unsigned long end)
+  {
+        int i;
+        unsigned long rem;
+        if (addr >= end)
+                BUG();
+
+#if SUPER_PAGE_DEBUG
+	printk("adj_sp_range mm:%p, addr:%0lx, end:%0lx\n",mm,addr,end);
+#endif
+        for (i = 0; i < super_page_nr - 1; i++) {
+           rem = (~addr + 1) & ((PAGE_SIZE << super_page_order[i+1]) - 1);
+           while (rem &&
+                  (addr & ((PAGE_SIZE << super_page_order[i]) - 1)) == 0UL &&
+                  ((end - addr ) >= (PAGE_SIZE << super_page_order[i]))) {
+                        adj_sp_pte(mm, zap, addr, super_page_order[i]);
+                        addr += PAGE_SIZE << super_page_order[i];
+                        rem  -= PAGE_SIZE << super_page_order[i];
+                };
+        }
+        for (i = super_page_nr - 1; i >= 0; i--) {
+                while (
+                    (addr & ((PAGE_SIZE << super_page_order[i]) - 1)) == 0UL &&
+                    ((end - addr ) >= (PAGE_SIZE << super_page_order[i]))) {
+                        adj_sp_pte(mm, zap, addr, super_page_order[i]);
+                        addr += PAGE_SIZE << super_page_order[i];
+                };
+        }
+#if SUPER_PAGE_DEBUG
+	printk("adj_sp_range return\n");
+#endif
+        return;
+  }
+
+void __break_area (struct page *page, unsigned long order) {
+	int i;
+	unsigned long size = 1 << order;
+
+	for ( i = 0; i < size; i++ ) {
+		set_page_count(page + i, 1);
+	}
+	return;
+}
+
+void down_pte_sp(pte_t *pteptr, int index) {
+	int i,order;
+	pte_t *addr;
+	order = super_page_order[index];
+	addr = (pte_t *)((unsigned long) pteptr & 
+		~((1UL<<(order + SIZEOF_PTE_LOG2)) - 1));
+	for ( i=0; i < 1<<order; i++) {
+#if CONFIG_X86
+		(*(addr+i)).pte_low = 
+			((*(addr+i)).pte_low & ~SUPER_PAGE_MASK) |
+			pgprot_val(super_page_prot[index -1]);
+#else
+		pte_val(*(addr+i)) = (pte_val(*(addr+i)) & ~SUPER_PAGE_MASK) |
+					 pgprot_val(super_page_prot[index -1]);
+#endif
+	}
+}
+void clear_pte_sp(pte_t *pteptr, int index) {
+	int i,order;
+	pte_t *addr;
+	order = super_page_order[index];
+	addr = (pte_t *)((unsigned long) pteptr &
+		 ~((1UL<<(order + SIZEOF_PTE_LOG2)) - 1));
+	for ( i=0; i < 1<<order; i++) {
+#if CONFIG_X86
+		(*(addr+i)).pte_low &=  ~SUPER_PAGE_MASK;
+#else
+		pte_val(*(addr+i)) &=  ~SUPER_PAGE_MASK;
+#endif
+	}
+}
diff -urN linux-2.5.44/mm/swapfile.c linux-2.5.44-superpage/mm/swapfile.c
--- linux-2.5.44/mm/swapfile.c	Sat Oct 19 13:01:17 2002
+++ linux-2.5.44-superpage/mm/swapfile.c	Sun Oct 27 17:02:06 2002
@@ -454,6 +454,9 @@
 
 	if (start >= end)
 		BUG();
+#if CONFIG_SUPER_PAGE
+	adj_sp_range(vma->vm_mm, 1, start, end);
+#endif
 	do {
 		unuse_pgd(vma, pgdir, start, end - start, entry, page);
 		start = (start + PGDIR_SIZE) & PGDIR_MASK;

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH,RFC] Transparent SuperPage Support for 2.5.44
  2002-10-28  1:58 [PATCH,RFC] Transparent SuperPage Support for 2.5.44 Naohiko Shimizu
@ 2002-10-28  2:40 ` Wiedemeier, Jeff
  2002-10-28  3:00   ` Naohiko Shimizu
  2002-11-03  7:40 ` Pavel Machek
  1 sibling, 1 reply; 5+ messages in thread
From: Wiedemeier, Jeff @ 2002-10-28  2:40 UTC (permalink / raw)
  To: Naohiko Shimizu; +Cc: linux-kernel, Jeff Wiedemeier

On Mon, Oct 28, 2002 at 10:58:49AM +0900, Naohiko Shimizu wrote:
> This patch includes i386, alpha, sparc64 ports.
> But I could not compile for alpha even with plain 2.5.44, and

Here's a small patch on top of yours to fix a typo and add set_pte_raw
for Alpha. 

/jeff

----
diff -Nuar superpage/include/asm-alpha/pgtable.h superpage.alpha/include/asm-alpha/pgtable.h
--- superpage/include/asm-alpha/pgtable.h	Sun Oct 27 21:16:02 2002
+++ superpage.alpha/include/asm-alpha/pgtable.h	Sun Oct 27 21:20:45 2002
@@ -275,7 +275,7 @@
 #define SUPER_PAGE_MASK 0x0060
 #define SUPER_PAGE_MASK_SHIFT 5
 #define SUPER_PAGE_NR 4
-#define SIZEOF_PTE_LOG2 SEZEOF_PTR_LOG2
+#define SIZEOF_PTE_LOG2 SIZEOF_PTR_LOG2
 void down_pte_sp(pte_t *pteptr, int index);
 void clear_pte_sp(pte_t *pteptr, int index);
 extern int super_page_order[];
@@ -288,6 +288,7 @@
 	{pte_val(pte) &= ~SUPER_PAGE_MASK; return pte;} 
 extern inline void pte_clear(pte_t *ptep)	\
 	{ pte_t pte; pte_val(pte)=0; set_pte(ptep, pte); }
+#define set_pte_raw(pteptr, pteval) set_pte(pteptr, pteval)
 #else
 extern inline int pte_none(pte_t pte)          { return !(pte_val(pte)); }
 extern inline void pte_clear(pte_t *ptep)      { pte_val(*ptep)=0; }



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH,RFC] Transparent SuperPage Support for 2.5.44
  2002-10-28  2:40 ` Wiedemeier, Jeff
@ 2002-10-28  3:00   ` Naohiko Shimizu
  0 siblings, 0 replies; 5+ messages in thread
From: Naohiko Shimizu @ 2002-10-28  3:00 UTC (permalink / raw)
  To: Wiedemeier, Jeff; +Cc: linux-kernel

Thank you Jeff for the fix.
I also needed to add set_pte_raw for sparc64 port.
Revised patch is placed at:

http://shimizu-lab.dt.u-tokai.ac.jp/lsp/super_page-2.5.44_021028-1.patch

-- 
Naohiko Shimizu
Dept. Communications Engineering/Tokai University
1117 Kitakaname Hiratsuka 259-1292 Japan
TEL.+81-463-58-1211(ext. 4084) FAX.+81-463-58-8320
http://shimizu-lab.dt.u-tokai.ac.jp/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH,RFC] Transparent SuperPage Support for 2.5.44
  2002-10-28  1:58 [PATCH,RFC] Transparent SuperPage Support for 2.5.44 Naohiko Shimizu
  2002-10-28  2:40 ` Wiedemeier, Jeff
@ 2002-11-03  7:40 ` Pavel Machek
  2002-11-04  5:34   ` Naohiko Shimizu
  1 sibling, 1 reply; 5+ messages in thread
From: Pavel Machek @ 2002-11-03  7:40 UTC (permalink / raw)
  To: Naohiko Shimizu; +Cc: linux-kernel

Hi!

> This is a transparent superpage support patch for 2.5.44.
> Big difference between this patch and 2.4.19 patch is
> eliminating of automatic dynamic downgrade for superpages.
> Instead, I place pagesize adjust routine where required.
> I hope this change minimize the overhead for conventional
> programs which does not use superpages.
> 
> Linux SuperPage patch is transparent for user applications.
> It will automatically allocate appropriate size of superpages
> if possible.
> It does not allocate real strage unless the application
> really access that area. And it does not allocate memory
> larger than the application requests.
> 
> This patch includes i386, alpha, sparc64 ports.
> But I could not compile for alpha even with plain 2.5.44, and
> I don't have sparc64 to test, then only i386 was tested now.

How do you swap these 4mb beasts?
And you need 4mb, physically continuous
area for this to work, right? How do you
get that?
			Pavel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH,RFC] Transparent SuperPage Support for 2.5.44
  2002-11-03  7:40 ` Pavel Machek
@ 2002-11-04  5:34   ` Naohiko Shimizu
  0 siblings, 0 replies; 5+ messages in thread
From: Naohiko Shimizu @ 2002-11-04  5:34 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel

Hi,

On Sun, 3 Nov 2002 08:40:10 +0100
Pavel Machek <pavel@ucw.cz> wrote:

> Hi!
> 
> > This is a transparent superpage support patch for 2.5.44.
> > Big difference between this patch and 2.4.19 patch is
> > eliminating of automatic dynamic downgrade for superpages.
> > Instead, I place pagesize adjust routine where required.
> > I hope this change minimize the overhead for conventional
> > programs which does not use superpages.
> > 
> > Linux SuperPage patch is transparent for user applications.
> > It will automatically allocate appropriate size of superpages
> > if possible.
> > It does not allocate real strage unless the application
> > really access that area. And it does not allocate memory
> > larger than the application requests.
> > 
> > This patch includes i386, alpha, sparc64 ports.
> > But I could not compile for alpha even with plain 2.5.44, and
> > I don't have sparc64 to test, then only i386 was tested now.
> 

> How do you swap these 4mb beasts?

With this feature, when the kernel needs to swap out a superpage,
it first downgrades it to basic pages before swapping.
That way the swap path never has to handle a 4MB page.
I think that if we need to swap pages out, we should not
keep the superpage anyway.

> And you need 4mb, physically continuous
> area for this to work, right? How do you
> get that?

I use the buddy allocator of Linux. The only thing I modified in the
conventional buddy system is the max order for x86, because the
normal Linux buddy system only handles blocks up to 2^9 pages,
and I need 2^10 for the 4MB page on x86.
If the buddy system does not have a free 4MB block when one is
required, we downgrade the request to a 4KB page on x86, or to a
512KB page on Alpha or sparc64. We recursively downgrade the
requested page size down to the basic page size, or to the largest
size that the buddy allocator can satisfy.

-- 
Naohiko Shimizu
Dept. Communications Engineering/Tokai University
1117 Kitakaname Hiratsuka 259-1292 Japan
TEL.+81-463-58-1211(ext. 4084) FAX.+81-463-58-8320
http://shimizu-lab.dt.u-tokai.ac.jp/

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2002-11-04  5:27 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-10-28  1:58 [PATCH,RFC] Transparent SuperPage Support for 2.5.44 Naohiko Shimizu
2002-10-28  2:40 ` Wiedemeier, Jeff
2002-10-28  3:00   ` Naohiko Shimizu
2002-11-03  7:40 ` Pavel Machek
2002-11-04  5:34   ` Naohiko Shimizu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).