linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [QUICKLIST 0/6] Arch independent quicklists V1
@ 2007-03-11  2:09 Christoph Lameter
  2007-03-11  2:09 ` [QUICKLIST 1/6] Extract quicklist implementation from IA64 Christoph Lameter
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Christoph Lameter @ 2007-03-11  2:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: ak, holt, linux-ia64, Christoph Lameter, mpm

This patchset introduces an arch independent framework to handle lists
of recently used page table pages.

Page table pages have the characteristics that they are typically zero
or in a known state when they are freed. This is usually the exactly
same state as needed after allocation. So it makes sense to build a list
of freed page table pages and then consume the pages already in use
first. Those pages have already been initialized correctly (thus no
need to zero them) and are likely already cached in such a way that
the MMU can use them most effectively.

Such an implementation already exits for ia64. If I remember correctly
it was done by Robin Holt. Howver, that implementation did not support
constructors and destructors as needed by i386 / x86_64. It also only
supported a single quicklist. The implementation here has constructor
and destructor support as well as the ability for an arch to specify
how many quicklists are needed.

Quicklists are defined by an arch defining the necessary number
of quicklists in arch/<arch>/Kconfig. F.e. i386 needs two and thus
has 

config NR_QUICK
	int
	default 2

If an arch has requested quicklist support then pages can be allocated
from the quicklist (or from the page allocator if the quicklist is
empty) via:

quicklist_alloc(<quicklist-nr>, <gfpflags>, <constructor>)

Page table pages can be freed using:

quicklist_free(<quicklist-nr>, <destructor>, <page>)

Pages must have a definite state after allocation and before
they are freed. If no constructor is specified then pages
will be zeroed on allocation and must be zeroed before they are
freed.

If a constructor is used then the constructor will establish
a definite page state. F.e. the i386 and x86_64 pgd constructors
establish certain mappings.

Constructors and destructors can also be used to track the pages.
i386 and x86_64 use a list of pgds in order to be able to dynamically
update standard mappings.

6 patches follow this message:

[QUICKLIST 1/6] Extract quicklist implementation from IA64
[QUICKLIST 2/6] i386: quicklist support
[QUICKLIST 3/6] i386: Use standard list manipulators for pgd_list
[QUICKLIST 4/6] x86_64: Single quicklist
[QUICKLIST 5/6] x86_64: Separate quicklist for pgds
[QUICKLIST 6/6] slub: remove special casing for PAGE_SIZE slabs


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [QUICKLIST 1/6] Extract quicklist implementation from IA64
  2007-03-11  2:09 [QUICKLIST 0/6] Arch independent quicklists V1 Christoph Lameter
@ 2007-03-11  2:09 ` Christoph Lameter
  2007-03-11  2:09 ` [QUICKLIST 2/6] i386: quicklist support Christoph Lameter
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Christoph Lameter @ 2007-03-11  2:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: ak, mpm, linux-ia64, holt, Christoph Lameter

Abstract quicklist from the IA64 implementation

Extract the quicklist implementation for IA64, clean it up
and generalize it to:

1. Allow multiple quicklists

2. Add support for constructors and destructors..

Quicklist allocation and frees occur inline. The support
for constructors / destructors and multiple quicklists
can therefore be optimized out of the final code for an
arch.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.21-rc3/arch/ia64/mm/init.c
===================================================================
--- linux-2.6.21-rc3.orig/arch/ia64/mm/init.c	2007-03-10 11:34:00.000000000 -0800
+++ linux-2.6.21-rc3/arch/ia64/mm/init.c	2007-03-10 11:50:46.000000000 -0800
@@ -39,9 +39,6 @@
 
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 
-DEFINE_PER_CPU(unsigned long *, __pgtable_quicklist);
-DEFINE_PER_CPU(long, __pgtable_quicklist_size);
-
 extern void ia64_tlb_init (void);
 
 unsigned long MAX_DMA_ADDRESS = PAGE_OFFSET + 0x100000000UL;
@@ -56,54 +53,6 @@ EXPORT_SYMBOL(vmem_map);
 struct page *zero_page_memmap_ptr;	/* map entry for zero page */
 EXPORT_SYMBOL(zero_page_memmap_ptr);
 
-#define MIN_PGT_PAGES			25UL
-#define MAX_PGT_FREES_PER_PASS		16L
-#define PGT_FRACTION_OF_NODE_MEM	16
-
-static inline long
-max_pgt_pages(void)
-{
-	u64 node_free_pages, max_pgt_pages;
-
-#ifndef	CONFIG_NUMA
-	node_free_pages = nr_free_pages();
-#else
-	node_free_pages = node_page_state(numa_node_id(), NR_FREE_PAGES);
-#endif
-	max_pgt_pages = node_free_pages / PGT_FRACTION_OF_NODE_MEM;
-	max_pgt_pages = max(max_pgt_pages, MIN_PGT_PAGES);
-	return max_pgt_pages;
-}
-
-static inline long
-min_pages_to_free(void)
-{
-	long pages_to_free;
-
-	pages_to_free = pgtable_quicklist_size - max_pgt_pages();
-	pages_to_free = min(pages_to_free, MAX_PGT_FREES_PER_PASS);
-	return pages_to_free;
-}
-
-void
-check_pgt_cache(void)
-{
-	long pages_to_free;
-
-	if (unlikely(pgtable_quicklist_size <= MIN_PGT_PAGES))
-		return;
-
-	preempt_disable();
-	while (unlikely((pages_to_free = min_pages_to_free()) > 0)) {
-		while (pages_to_free--) {
-			free_page((unsigned long)pgtable_quicklist_alloc());
-		}
-		preempt_enable();
-		preempt_disable();
-	}
-	preempt_enable();
-}
-
 void
 lazy_mmu_prot_update (pte_t pte)
 {
Index: linux-2.6.21-rc3/include/asm-ia64/pgalloc.h
===================================================================
--- linux-2.6.21-rc3.orig/include/asm-ia64/pgalloc.h	2007-03-10 11:34:00.000000000 -0800
+++ linux-2.6.21-rc3/include/asm-ia64/pgalloc.h	2007-03-10 12:37:56.000000000 -0800
@@ -18,71 +18,18 @@
 #include <linux/mm.h>
 #include <linux/page-flags.h>
 #include <linux/threads.h>
+#include <linux/quicklist.h>
 
 #include <asm/mmu_context.h>
 
-DECLARE_PER_CPU(unsigned long *, __pgtable_quicklist);
-#define pgtable_quicklist __ia64_per_cpu_var(__pgtable_quicklist)
-DECLARE_PER_CPU(long, __pgtable_quicklist_size);
-#define pgtable_quicklist_size __ia64_per_cpu_var(__pgtable_quicklist_size)
-
-static inline long pgtable_quicklist_total_size(void)
-{
-	long ql_size = 0;
-	int cpuid;
-
-	for_each_online_cpu(cpuid) {
-		ql_size += per_cpu(__pgtable_quicklist_size, cpuid);
-	}
-	return ql_size;
-}
-
-static inline void *pgtable_quicklist_alloc(void)
-{
-	unsigned long *ret = NULL;
-
-	preempt_disable();
-
-	ret = pgtable_quicklist;
-	if (likely(ret != NULL)) {
-		pgtable_quicklist = (unsigned long *)(*ret);
-		ret[0] = 0;
-		--pgtable_quicklist_size;
-		preempt_enable();
-	} else {
-		preempt_enable();
-		ret = (unsigned long *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
-	}
-
-	return ret;
-}
-
-static inline void pgtable_quicklist_free(void *pgtable_entry)
-{
-#ifdef CONFIG_NUMA
-	int nid = page_to_nid(virt_to_page(pgtable_entry));
-
-	if (unlikely(nid != numa_node_id())) {
-		free_page((unsigned long)pgtable_entry);
-		return;
-	}
-#endif
-
-	preempt_disable();
-	*(unsigned long *)pgtable_entry = (unsigned long)pgtable_quicklist;
-	pgtable_quicklist = (unsigned long *)pgtable_entry;
-	++pgtable_quicklist_size;
-	preempt_enable();
-}
-
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-	return pgtable_quicklist_alloc();
+	return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pgd_free(pgd_t * pgd)
 {
-	pgtable_quicklist_free(pgd);
+	quicklist_free(0, NULL, pgd);
 }
 
 #ifdef CONFIG_PGTABLE_4
@@ -94,12 +41,12 @@ pgd_populate(struct mm_struct *mm, pgd_t
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return pgtable_quicklist_alloc();
+	return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pud_free(pud_t * pud)
 {
-	pgtable_quicklist_free(pud);
+	quicklist_free(0, NULL, pud);
 }
 #define __pud_free_tlb(tlb, pud)	pud_free(pud)
 #endif /* CONFIG_PGTABLE_4 */
@@ -112,12 +59,12 @@ pud_populate(struct mm_struct *mm, pud_t
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return pgtable_quicklist_alloc();
+	return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pmd_free(pmd_t * pmd)
 {
-	pgtable_quicklist_free(pmd);
+	quicklist_free(0, NULL, pmd);
 }
 
 #define __pmd_free_tlb(tlb, pmd)	pmd_free(pmd)
@@ -137,28 +84,31 @@ pmd_populate_kernel(struct mm_struct *mm
 static inline struct page *pte_alloc_one(struct mm_struct *mm,
 					 unsigned long addr)
 {
-	void *pg = pgtable_quicklist_alloc();
+	void *pg = quicklist_alloc(0, GFP_KERNEL, NULL);
 	return pg ? virt_to_page(pg) : NULL;
 }
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					  unsigned long addr)
 {
-	return pgtable_quicklist_alloc();
+	return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pte_free(struct page *pte)
 {
-	pgtable_quicklist_free(page_address(pte));
+	quicklist_free(0, NULL, page_address(pte));
 }
 
 static inline void pte_free_kernel(pte_t * pte)
 {
-	pgtable_quicklist_free(pte);
+	quicklist_free(0, NULL, pte);
 }
 
-#define __pte_free_tlb(tlb, pte)	pte_free(pte)
+static inline void check_pgt_cache(void)
+{
+	quicklist_check(0, NULL);
+}
 
-extern void check_pgt_cache(void);
+#define __pte_free_tlb(tlb, pte)	pte_free(pte)
 
 #endif				/* _ASM_IA64_PGALLOC_H */
Index: linux-2.6.21-rc3/include/linux/quicklist.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.21-rc3/include/linux/quicklist.h	2007-03-10 12:38:44.000000000 -0800
@@ -0,0 +1,79 @@
+#ifndef LINUX_QUICKLIST_H
+#define LINUX_QUICKLIST_H
+/*
+ * Fast allocations and disposal of pages. Pages must be in the condition
+ * as needed after allocation when they are freed. Per cpu lists of pages
+ * are kept that only contain node local pages.
+ *
+ * (C) 2007, SGI. Christoph Lameter <clameter@sgi.com>
+ */
+#include <linux/kernel.h>
+
+#ifdef CONFIG_NR_QUICK
+
+struct quicklist {
+	void *page;
+	int nr_pages;
+};
+
+DECLARE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
+
+static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
+{
+	struct quicklist *q;
+	void **p = NULL;
+
+	q =&get_cpu_var(quicklist)[nr];
+	p = q->page;
+	if (likely(p)) {
+		q->page = p[0];
+		p[0] = NULL;
+		q->nr_pages--;
+	}
+	put_cpu_var(quicklist);
+	if (likely(p))
+		return p;
+
+	p = (void *)__get_free_page(flags | __GFP_ZERO);
+	if (ctor && p)
+		ctor(p);
+	return p;
+}
+
+static inline void quicklist_free(int nr, void (*dtor)(void *), void *pp)
+{
+	struct quicklist *q;
+	void **p = pp;
+	struct page *page = virt_to_page(p);
+	int nid = page_to_nid(page);
+
+	if (unlikely(nid != numa_node_id())) {
+		if (dtor)
+			dtor(p);
+		free_page((unsigned long)p);
+		return;
+	}
+
+	q = &get_cpu_var(quicklist)[nr];
+	p[0] = q->page;
+	q->page = p;
+	q->nr_pages++;
+	put_cpu_var(quicklist);
+}
+
+void quicklist_check(int nr, void (*dtor)(void *));
+unsigned long quicklist_total_size(void);
+
+#else
+void quicklist_check(int nr, void (*dtor)(void *))
+{
+}
+
+unsigned long quicklist_total_size(void)
+{
+	return 0;
+}
+#endif
+
+#endif /* LINUX_QUICKLIST_H */
+
Index: linux-2.6.21-rc3/mm/Makefile
===================================================================
--- linux-2.6.21-rc3.orig/mm/Makefile	2007-03-10 11:34:00.000000000 -0800
+++ linux-2.6.21-rc3/mm/Makefile	2007-03-10 11:50:46.000000000 -0800
@@ -30,3 +30,5 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
+obj-$(CONFIG_QUICKLIST) += quicklist.o
+
Index: linux-2.6.21-rc3/mm/quicklist.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.21-rc3/mm/quicklist.c	2007-03-10 12:40:27.000000000 -0800
@@ -0,0 +1,74 @@
+/*
+ * Quicklist support.
+ *
+ * Copyright (C) 2005-2007 SGI,
+ * 	Robin Holt <holt@sgi.com>
+ * 	Christoph Lameter <clameter@sgi.com>
+ */
+#include <linux/kernel.h>
+
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/module.h>
+#include <linux/quicklist.h>
+
+DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
+
+#define MIN_PAGES		25
+#define MAX_FREES_PER_PASS	16
+#define FRACTION_OF_NODE_MEM	16
+
+static unsigned long max_pages(void)
+{
+	unsigned long node_free_pages, max;
+
+	node_free_pages = node_page_state(numa_node_id(),
+			NR_FREE_PAGES);
+	max = node_free_pages / FRACTION_OF_NODE_MEM;
+	return max(max, (unsigned long)MIN_PAGES);
+}
+
+static long min_pages_to_free(struct quicklist *q)
+{
+	long pages_to_free;
+
+	pages_to_free = q->nr_pages - max_pages();
+
+	return min(pages_to_free, (long)MAX_FREES_PER_PASS);
+}
+
+void quicklist_check(int nr, void (*dtor)(void *))
+{
+	long pages_to_free;
+	struct quicklist *q;
+
+	q = &get_cpu_var(quicklist)[nr];
+	if (q->nr_pages > MIN_PAGES) {
+		pages_to_free = min_pages_to_free(q);
+
+		while (pages_to_free > 0) {
+			void *p = quicklist_alloc(nr, 0, NULL);
+
+			if (dtor)
+				dtor(p);
+			free_page((unsigned long)p);
+			pages_to_free--;
+		}
+	}
+	put_cpu_var(quicklist);
+}
+
+unsigned long quicklist_total_size(void)
+{
+	unsigned long count = 0;
+	int cpu;
+	struct quicklist *ql, *q;
+
+	for_each_online_cpu(cpu) {
+		ql = per_cpu(quicklist, cpu);
+		for (q = ql; q < ql + CONFIG_NR_QUICK; q++)
+			count += q->nr_pages;
+	}
+	return count;
+}
+
Index: linux-2.6.21-rc3/arch/ia64/mm/contig.c
===================================================================
--- linux-2.6.21-rc3.orig/arch/ia64/mm/contig.c	2007-03-10 11:34:00.000000000 -0800
+++ linux-2.6.21-rc3/arch/ia64/mm/contig.c	2007-03-10 11:50:46.000000000 -0800
@@ -88,7 +88,7 @@ void show_mem(void)
 	printk(KERN_INFO "%d pages shared\n", total_shared);
 	printk(KERN_INFO "%d pages swap cached\n", total_cached);
 	printk(KERN_INFO "Total of %ld pages in page table cache\n",
-	       pgtable_quicklist_total_size());
+	       quicklist_total_size());
 	printk(KERN_INFO "%d free buffer pages\n", nr_free_buffer_pages());
 }
 
Index: linux-2.6.21-rc3/arch/ia64/mm/discontig.c
===================================================================
--- linux-2.6.21-rc3.orig/arch/ia64/mm/discontig.c	2007-03-10 11:34:00.000000000 -0800
+++ linux-2.6.21-rc3/arch/ia64/mm/discontig.c	2007-03-10 11:50:46.000000000 -0800
@@ -563,7 +563,7 @@ void show_mem(void)
 	printk(KERN_INFO "%d pages shared\n", total_shared);
 	printk(KERN_INFO "%d pages swap cached\n", total_cached);
 	printk(KERN_INFO "Total of %ld pages in page table cache\n",
-	       pgtable_quicklist_total_size());
+	       quicklist_total_size());
 	printk(KERN_INFO "%d free buffer pages\n", nr_free_buffer_pages());
 }
 
Index: linux-2.6.21-rc3/arch/ia64/Kconfig
===================================================================
--- linux-2.6.21-rc3.orig/arch/ia64/Kconfig	2007-03-10 11:34:00.000000000 -0800
+++ linux-2.6.21-rc3/arch/ia64/Kconfig	2007-03-10 11:50:46.000000000 -0800
@@ -29,6 +29,10 @@ config ZONE_DMA
 	def_bool y
 	depends on !IA64_SGI_SN2
 
+config NR_QUICK
+	int
+	default 1
+
 config MMU
 	bool
 	default y
Index: linux-2.6.21-rc3/mm/Kconfig
===================================================================
--- linux-2.6.21-rc3.orig/mm/Kconfig	2007-03-10 11:34:00.000000000 -0800
+++ linux-2.6.21-rc3/mm/Kconfig	2007-03-10 12:55:15.000000000 -0800
@@ -163,3 +163,8 @@ config ZONE_DMA_FLAG
 	default "0" if !ZONE_DMA
 	default "1"
 
+config QUICKLIST
+	bool
+	default y if NR_QUICK != 0
+
+

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [QUICKLIST 2/6] i386: quicklist support
  2007-03-11  2:09 [QUICKLIST 0/6] Arch independent quicklists V1 Christoph Lameter
  2007-03-11  2:09 ` [QUICKLIST 1/6] Extract quicklist implementation from IA64 Christoph Lameter
@ 2007-03-11  2:09 ` Christoph Lameter
  2007-03-11  3:22   ` William Lee Irwin III
  2007-03-11  2:09 ` [QUICKLIST 3/6] i386: Use standard list manipulators for pgd_list Christoph Lameter
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Christoph Lameter @ 2007-03-11  2:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: ak, linux-ia64, holt, Christoph Lameter, mpm

i386: Convert to quicklists

Implement the i386 management of pgd and pmds using quicklists.

The i386 management of page table pages currently uses page sized slabs.
The page state is therefore mainly determined by the slab code. However,
i386 also uses its own fields in the page struct to mark special pages
and to build a list of pgds using the ->private and ->index field (yuck!).
This has been finely tuned to work right with SLAB but SLUB needs more
control over the page struct. Currently the only way for SLUB to support
these slabs is through special casing PAGE_SIZE slabs.

If we use quicklists instead then we can avoid the mess, and also the
overhead of manipulating page sized objects through slab.

It also allows us to use standard list manipulation macros for the
pgd list using page->lru thereby simplifying the code.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.21-rc3/arch/i386/mm/init.c
===================================================================
--- linux-2.6.21-rc3.orig/arch/i386/mm/init.c	2007-03-10 13:13:32.000000000 -0800
+++ linux-2.6.21-rc3/arch/i386/mm/init.c	2007-03-10 13:39:23.000000000 -0800
@@ -695,31 +695,6 @@ int remove_memory(u64 start, u64 size)
 EXPORT_SYMBOL_GPL(remove_memory);
 #endif
 
-struct kmem_cache *pgd_cache;
-struct kmem_cache *pmd_cache;
-
-void __init pgtable_cache_init(void)
-{
-	if (PTRS_PER_PMD > 1) {
-		pmd_cache = kmem_cache_create("pmd",
-					PTRS_PER_PMD*sizeof(pmd_t),
-					PTRS_PER_PMD*sizeof(pmd_t),
-					0,
-					pmd_ctor,
-					NULL);
-		if (!pmd_cache)
-			panic("pgtable_cache_init(): cannot create pmd cache");
-	}
-	pgd_cache = kmem_cache_create("pgd",
-				PTRS_PER_PGD*sizeof(pgd_t),
-				PTRS_PER_PGD*sizeof(pgd_t),
-				0,
-				pgd_ctor,
-				PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
-	if (!pgd_cache)
-		panic("pgtable_cache_init(): Cannot create pgd cache");
-}
-
 /*
  * This function cannot be __init, since exceptions don't work in that
  * section.  Put this after the callers, so that it cannot be inlined.
Index: linux-2.6.21-rc3/arch/i386/mm/pgtable.c
===================================================================
--- linux-2.6.21-rc3.orig/arch/i386/mm/pgtable.c	2007-03-10 13:13:32.000000000 -0800
+++ linux-2.6.21-rc3/arch/i386/mm/pgtable.c	2007-03-10 13:43:39.000000000 -0800
@@ -13,6 +13,7 @@
 #include <linux/pagemap.h>
 #include <linux/spinlock.h>
 #include <linux/module.h>
+#include <linux/quicklist.h>
 
 #include <asm/system.h>
 #include <asm/pgtable.h>
@@ -181,9 +182,12 @@ void reserve_top_address(unsigned long r
 #endif
 }
 
+#define QUICK_PGD 0
+#define QUICK_PT 1
+
 pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-	return (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+	return (pte_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL, NULL);
 }
 
 struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
@@ -198,11 +202,6 @@ struct page *pte_alloc_one(struct mm_str
 	return pte;
 }
 
-void pmd_ctor(void *pmd, struct kmem_cache *cache, unsigned long flags)
-{
-	memset(pmd, 0, PTRS_PER_PMD*sizeof(pmd_t));
-}
-
 /*
  * List of all pgd's needed for non-PAE so it can invalidate entries
  * in both cached and uncached pgd's; not needed for PAE since the
@@ -211,8 +210,6 @@ void pmd_ctor(void *pmd, struct kmem_cac
  * against pageattr.c; it is the unique case in which a valid change
  * of kernel pagetables can't be lazily synchronized by vmalloc faults.
  * vmalloc faults work because attached pagetables are never freed.
- * The locking scheme was chosen on the basis of manfred's
- * recommendations and having no core impact whatsoever.
  * -- wli
  */
 DEFINE_SPINLOCK(pgd_lock);
@@ -238,7 +235,7 @@ static inline void pgd_list_del(pgd_t *p
 		set_page_private(next, (unsigned long)pprev);
 }
 
-void pgd_ctor(void *pgd, struct kmem_cache *cache, unsigned long unused)
+void pgd_ctor(void *pgd)
 {
 	unsigned long flags;
 
@@ -264,7 +261,7 @@ void pgd_ctor(void *pgd, struct kmem_cac
 }
 
 /* never called when PTRS_PER_PMD > 1 */
-void pgd_dtor(void *pgd, struct kmem_cache *cache, unsigned long unused)
+void pgd_dtor(void *pgd)
 {
 	unsigned long flags; /* can be called from interrupt context */
 
@@ -277,13 +274,13 @@ void pgd_dtor(void *pgd, struct kmem_cac
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	int i;
-	pgd_t *pgd = kmem_cache_alloc(pgd_cache, GFP_KERNEL);
+	pgd_t *pgd = quicklist_alloc(QUICK_PGD, GFP_KERNEL, pgd_ctor);
 
 	if (PTRS_PER_PMD == 1 || !pgd)
 		return pgd;
 
 	for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
-		pmd_t *pmd = kmem_cache_alloc(pmd_cache, GFP_KERNEL);
+		pmd_t *pmd = quicklist_alloc(QUICK_PT, GFP_KERNEL, NULL);
 		if (!pmd)
 			goto out_oom;
 		paravirt_alloc_pd(__pa(pmd) >> PAGE_SHIFT);
@@ -296,9 +293,9 @@ out_oom:
 		pgd_t pgdent = pgd[i];
 		void* pmd = (void *)__va(pgd_val(pgdent)-1);
 		paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
-		kmem_cache_free(pmd_cache, pmd);
+		quicklist_free(QUICK_PT, NULL, pmd);
 	}
-	kmem_cache_free(pgd_cache, pgd);
+	quicklist_free(QUICK_PGD, pgd_dtor, pgd);
 	return NULL;
 }
 
@@ -312,8 +309,14 @@ void pgd_free(pgd_t *pgd)
 			pgd_t pgdent = pgd[i];
 			void* pmd = (void *)__va(pgd_val(pgdent)-1);
 			paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
-			kmem_cache_free(pmd_cache, pmd);
+			quicklist_free(QUICK_PT, NULL, pmd);
 		}
 	/* in the non-PAE case, free_pgtables() clears user pgd entries */
-	kmem_cache_free(pgd_cache, pgd);
+	quicklist_free(QUICK_PGD, pgd_ctor, pgd);
+}
+
+void check_pgt_cache(void)
+{
+	quicklist_check(QUICK_PGD, pgd_dtor);
+	quicklist_check(QUICK_PT, NULL);
 }
Index: linux-2.6.21-rc3/arch/i386/Kconfig
===================================================================
--- linux-2.6.21-rc3.orig/arch/i386/Kconfig	2007-03-10 13:13:32.000000000 -0800
+++ linux-2.6.21-rc3/arch/i386/Kconfig	2007-03-10 13:16:22.000000000 -0800
@@ -55,6 +55,10 @@ config ZONE_DMA
 	bool
 	default y
 
+config NR_QUICK
+	int
+	default 2
+
 config SBUS
 	bool
 
Index: linux-2.6.21-rc3/include/asm-i386/pgtable.h
===================================================================
--- linux-2.6.21-rc3.orig/include/asm-i386/pgtable.h	2007-03-10 13:13:32.000000000 -0800
+++ linux-2.6.21-rc3/include/asm-i386/pgtable.h	2007-03-10 13:38:30.000000000 -0800
@@ -35,15 +35,12 @@ struct vm_area_struct;
 #define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
 extern unsigned long empty_zero_page[1024];
 extern pgd_t swapper_pg_dir[1024];
-extern struct kmem_cache *pgd_cache;
-extern struct kmem_cache *pmd_cache;
+
+void check_pgt_cache(void);
+
 extern spinlock_t pgd_lock;
 extern struct page *pgd_list;
-
-void pmd_ctor(void *, struct kmem_cache *, unsigned long);
-void pgd_ctor(void *, struct kmem_cache *, unsigned long);
-void pgd_dtor(void *, struct kmem_cache *, unsigned long);
-void pgtable_cache_init(void);
+static inline void pgtable_cache_init(void) {}
 void paging_init(void);
 
 /*
Index: linux-2.6.21-rc3/arch/i386/kernel/smp.c
===================================================================
--- linux-2.6.21-rc3.orig/arch/i386/kernel/smp.c	2007-03-10 13:13:32.000000000 -0800
+++ linux-2.6.21-rc3/arch/i386/kernel/smp.c	2007-03-10 13:16:22.000000000 -0800
@@ -437,7 +437,7 @@ void flush_tlb_mm (struct mm_struct * mm
 	}
 	if (!cpus_empty(cpu_mask))
 		flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
-
+	check_pgt_cache();
 	preempt_enable();
 }
 
Index: linux-2.6.21-rc3/arch/i386/kernel/process.c
===================================================================
--- linux-2.6.21-rc3.orig/arch/i386/kernel/process.c	2007-03-10 13:13:32.000000000 -0800
+++ linux-2.6.21-rc3/arch/i386/kernel/process.c	2007-03-10 13:16:22.000000000 -0800
@@ -181,6 +181,7 @@ void cpu_idle(void)
 			if (__get_cpu_var(cpu_idle_state))
 				__get_cpu_var(cpu_idle_state) = 0;
 
+			check_pgt_cache();
 			rmb();
 			idle = pm_idle;
 
Index: linux-2.6.21-rc3/include/asm-i386/pgalloc.h
===================================================================
--- linux-2.6.21-rc3.orig/include/asm-i386/pgalloc.h	2007-03-10 13:44:28.000000000 -0800
+++ linux-2.6.21-rc3/include/asm-i386/pgalloc.h	2007-03-10 13:45:50.000000000 -0800
@@ -66,6 +66,6 @@ do {									\
 #define pud_populate(mm, pmd, pte)	BUG()
 #endif
 
-#define check_pgt_cache()	do { } while (0)
+extern void check_pgt_cache(void);
 
 #endif /* _I386_PGALLOC_H */

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [QUICKLIST 3/6] i386: Use standard list manipulators for pgd_list
  2007-03-11  2:09 [QUICKLIST 0/6] Arch independent quicklists V1 Christoph Lameter
  2007-03-11  2:09 ` [QUICKLIST 1/6] Extract quicklist implementation from IA64 Christoph Lameter
  2007-03-11  2:09 ` [QUICKLIST 2/6] i386: quicklist support Christoph Lameter
@ 2007-03-11  2:09 ` Christoph Lameter
  2007-03-11  2:09 ` [QUICKLIST 4/6] x86_64: Single Quicklist Christoph Lameter
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Christoph Lameter @ 2007-03-11  2:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: ak, mpm, linux-ia64, holt, Christoph Lameter

i386: Use standard list macros.

Get rid of generating a list via page->index and page->private. Use
page->lru instead.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.21-rc3/arch/i386/mm/pgtable.c
===================================================================
--- linux-2.6.21-rc3.orig/arch/i386/mm/pgtable.c	2007-03-10 17:42:08.000000000 -0800
+++ linux-2.6.21-rc3/arch/i386/mm/pgtable.c	2007-03-10 17:44:23.000000000 -0800
@@ -213,31 +213,12 @@ struct page *pte_alloc_one(struct mm_str
  * -- wli
  */
 DEFINE_SPINLOCK(pgd_lock);
-struct page *pgd_list;
-
-static inline void pgd_list_add(pgd_t *pgd)
-{
-	struct page *page = virt_to_page(pgd);
-	page->index = (unsigned long)pgd_list;
-	if (pgd_list)
-		set_page_private(pgd_list, (unsigned long)&page->index);
-	pgd_list = page;
-	set_page_private(page, (unsigned long)&pgd_list);
-}
-
-static inline void pgd_list_del(pgd_t *pgd)
-{
-	struct page *next, **pprev, *page = virt_to_page(pgd);
-	next = (struct page *)page->index;
-	pprev = (struct page **)page_private(page);
-	*pprev = next;
-	if (next)
-		set_page_private(next, (unsigned long)pprev);
-}
+LIST_HEAD(pgd_list);
 
 void pgd_ctor(void *pgd)
 {
 	unsigned long flags;
+	struct page *page = virt_to_page(pgd);
 
 	if (PTRS_PER_PMD == 1) {
 		memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
@@ -256,7 +237,7 @@ void pgd_ctor(void *pgd)
 			__pa(swapper_pg_dir) >> PAGE_SHIFT,
 			USER_PTRS_PER_PGD, PTRS_PER_PGD - USER_PTRS_PER_PGD);
 
-	pgd_list_add(pgd);
+	list_add(&page->lru, &pgd_list);
 	spin_unlock_irqrestore(&pgd_lock, flags);
 }
 
@@ -264,10 +245,11 @@ void pgd_ctor(void *pgd)
 void pgd_dtor(void *pgd)
 {
 	unsigned long flags; /* can be called from interrupt context */
+	struct page *page = virt_to_page(pgd);
 
 	paravirt_release_pd(__pa(pgd) >> PAGE_SHIFT);
 	spin_lock_irqsave(&pgd_lock, flags);
-	pgd_list_del(pgd);
+	list_del(&page->lru);
 	spin_unlock_irqrestore(&pgd_lock, flags);
 }
 
Index: linux-2.6.21-rc3/include/asm-i386/pgtable.h
===================================================================
--- linux-2.6.21-rc3.orig/include/asm-i386/pgtable.h	2007-03-10 17:41:48.000000000 -0800
+++ linux-2.6.21-rc3/include/asm-i386/pgtable.h	2007-03-10 17:42:00.000000000 -0800
@@ -39,7 +39,7 @@ extern pgd_t swapper_pg_dir[1024];
 void check_pgt_cache(void);
 
 extern spinlock_t pgd_lock;
-extern struct page *pgd_list;
+extern struct list_head pgd_list;
 static inline void pgtable_cache_init(void) {};
 void paging_init(void);
 
Index: linux-2.6.21-rc3/arch/i386/mm/fault.c
===================================================================
--- linux-2.6.21-rc3.orig/arch/i386/mm/fault.c	2007-03-10 17:48:04.000000000 -0800
+++ linux-2.6.21-rc3/arch/i386/mm/fault.c	2007-03-10 17:49:30.000000000 -0800
@@ -608,11 +608,10 @@ void vmalloc_sync_all(void)
 			struct page *page;
 
 			spin_lock_irqsave(&pgd_lock, flags);
-			for (page = pgd_list; page; page =
-					(struct page *)page->index)
+			list_for_each_entry(page, &pgd_list, lru)
 				if (!vmalloc_sync_one(page_address(page),
 								address)) {
-					BUG_ON(page != pgd_list);
+					BUG();
 					break;
 				}
 			spin_unlock_irqrestore(&pgd_lock, flags);
Index: linux-2.6.21-rc3/arch/i386/mm/pageattr.c
===================================================================
--- linux-2.6.21-rc3.orig/arch/i386/mm/pageattr.c	2007-03-10 17:49:44.000000000 -0800
+++ linux-2.6.21-rc3/arch/i386/mm/pageattr.c	2007-03-10 17:50:14.000000000 -0800
@@ -95,7 +95,7 @@ static void set_pmd_pte(pte_t *kpte, uns
 		return;
 
 	spin_lock_irqsave(&pgd_lock, flags);
-	for (page = pgd_list; page; page = (struct page *)page->index) {
+	list_for_each_entry(page, &pgd_list, lru) {
 		pgd_t *pgd;
 		pud_t *pud;
 		pmd_t *pmd;

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [QUICKLIST 4/6] x86_64: Single Quicklist
  2007-03-11  2:09 [QUICKLIST 0/6] Arch independent quicklists V1 Christoph Lameter
                   ` (2 preceding siblings ...)
  2007-03-11  2:09 ` [QUICKLIST 3/6] i386: Use standard list manipulators for pgd_list Christoph Lameter
@ 2007-03-11  2:09 ` Christoph Lameter
  2007-03-11  7:54   ` Andi Kleen
  2007-03-11  2:09 ` [QUICKLIST 5/6] x86_64: Separate quicklist for pgds Christoph Lameter
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Christoph Lameter @ 2007-03-11  2:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: ak, linux-ia64, holt, Christoph Lameter, mpm

x86_64: Convert to use a single quicklists

This adds caching of pgds and puds, pmds, pte. That way we can
avoid costly zeroing and initialization of special mappings in the
pgd.

The first patch just adds a simple implementation using a single
quicklist. As a consequence we need to zero a pgd before returning
it to the pool.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.21-rc3/arch/x86_64/Kconfig
===================================================================
--- linux-2.6.21-rc3.orig/arch/x86_64/Kconfig	2007-03-10 10:45:38.000000000 -0800
+++ linux-2.6.21-rc3/arch/x86_64/Kconfig	2007-03-10 12:50:47.000000000 -0800
@@ -56,6 +56,10 @@ config ZONE_DMA
 	bool
 	default y
 
+config NR_QUICK
+	int
+	default 1
+
 config ISA
 	bool
 
Index: linux-2.6.21-rc3/include/asm-x86_64/pgalloc.h
===================================================================
--- linux-2.6.21-rc3.orig/include/asm-x86_64/pgalloc.h	2007-03-10 10:45:39.000000000 -0800
+++ linux-2.6.21-rc3/include/asm-x86_64/pgalloc.h	2007-03-10 12:52:14.000000000 -0800
@@ -5,6 +5,7 @@
 #include <asm/pda.h>
 #include <linux/threads.h>
 #include <linux/mm.h>
+#include <linux/quicklist.h>
 
 #define pmd_populate_kernel(mm, pmd, pte) \
 		set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)))
@@ -21,23 +22,23 @@ static inline void pmd_populate(struct m
 static inline void pmd_free(pmd_t *pmd)
 {
 	BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
-	free_page((unsigned long)pmd);
+	quicklist_free(0, NULL, pmd);
 }
 
 static inline pmd_t *pmd_alloc_one (struct mm_struct *mm, unsigned long addr)
 {
-	return (pmd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+	return (pmd_t *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return (pud_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+	return (pud_t *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline void pud_free (pud_t *pud)
 {
 	BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
-	free_page((unsigned long)pud);
+	quicklist_free(0, NULL, pud);
 }
 
 static inline void pgd_list_add(pgd_t *pgd)
@@ -69,9 +70,10 @@ static inline void pgd_list_del(pgd_t *p
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	unsigned boundary;
-	pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
+	pgd_t *pgd = (pgd_t *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
 	if (!pgd)
 		return NULL;
+
 	pgd_list_add(pgd);
 	/*
 	 * Copy kernel pointers in from init.
@@ -90,17 +92,18 @@ static inline void pgd_free(pgd_t *pgd)
 {
 	BUG_ON((unsigned long)pgd & (PAGE_SIZE-1));
 	pgd_list_del(pgd);
-	free_page((unsigned long)pgd);
+	memset(pgd, 0, PAGE_SIZE);
+	quicklist_free(0, NULL, pgd);
 }
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-	return (pte_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+	return (pte_t *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
 {
-	void *p = (void *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+	void *p = (void *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
 	if (!p)
 		return NULL;
 	return virt_to_page(p);
@@ -112,17 +115,21 @@ static inline struct page *pte_alloc_one
 static inline void pte_free_kernel(pte_t *pte)
 {
 	BUG_ON((unsigned long)pte & (PAGE_SIZE-1));
-	free_page((unsigned long)pte); 
+	quicklist_free(0, NULL, pte);
 }
 
 static inline void pte_free(struct page *pte)
 {
 	__free_page(pte);
-} 
+}
 
 #define __pte_free_tlb(tlb,pte) tlb_remove_page((tlb),(pte))
 
 #define __pmd_free_tlb(tlb,x)   tlb_remove_page((tlb),virt_to_page(x))
 #define __pud_free_tlb(tlb,x)   tlb_remove_page((tlb),virt_to_page(x))
 
+static inline void check_pgt_cache(void)
+{
+	quicklist_check(0, NULL);
+}
 #endif /* _X86_64_PGALLOC_H */
Index: linux-2.6.21-rc3/mm/Kconfig
===================================================================
--- linux-2.6.21-rc3.orig/mm/Kconfig	2007-03-10 11:50:46.000000000 -0800
+++ linux-2.6.21-rc3/mm/Kconfig	2007-03-10 12:50:47.000000000 -0800
@@ -168,3 +168,8 @@ config QUICKLIST
 	default y if NR_QUICK != 0
 
 
+config QUICKLIST
+	bool
+	default y if NR_QUICK != 0
+
+
Index: linux-2.6.21-rc3/arch/x86_64/kernel/process.c
===================================================================
--- linux-2.6.21-rc3.orig/arch/x86_64/kernel/process.c	2007-03-10 10:45:38.000000000 -0800
+++ linux-2.6.21-rc3/arch/x86_64/kernel/process.c	2007-03-10 12:52:46.000000000 -0800
@@ -207,6 +207,7 @@ void cpu_idle (void)
 			if (__get_cpu_var(cpu_idle_state))
 				__get_cpu_var(cpu_idle_state) = 0;
 
+			check_pgt_cache();
 			rmb();
 			idle = pm_idle;
 			if (!idle)
Index: linux-2.6.21-rc3/arch/x86_64/kernel/smp.c
===================================================================
--- linux-2.6.21-rc3.orig/arch/x86_64/kernel/smp.c	2007-03-10 10:45:38.000000000 -0800
+++ linux-2.6.21-rc3/arch/x86_64/kernel/smp.c	2007-03-10 12:54:44.000000000 -0800
@@ -242,7 +242,7 @@ void flush_tlb_mm (struct mm_struct * mm
 	}
 	if (!cpus_empty(cpu_mask))
 		flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
-
+	check_pgt_cache();
 	preempt_enable();
 }
 EXPORT_SYMBOL(flush_tlb_mm);

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [QUICKLIST 5/6] x86_64: Separate quicklist for pgds
  2007-03-11  2:09 [QUICKLIST 0/6] Arch independent quicklists V1 Christoph Lameter
                   ` (3 preceding siblings ...)
  2007-03-11  2:09 ` [QUICKLIST 4/6] x86_64: Single Quicklist Christoph Lameter
@ 2007-03-11  2:09 ` Christoph Lameter
  2007-03-11  2:09 ` [QUICKLIST 6/6] slub: remove special casing for PAGE_SIZE slabs Christoph Lameter
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Christoph Lameter @ 2007-03-11  2:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: ak, mpm, linux-ia64, holt, Christoph Lameter

x86_64: Add quicklist for pgd.

A second quicklist is useful to separate out PGD handling. We can carry
the initialized pgds over to the next process needing them. This
avoids the zeroing of the pgds on free that we had to introduce
in the last patch.

Also clean up the pgd_list handling to use regular list macros.
There is no need anymore to avoid the lru field.

Move the add/removal of the pgds to the pgdlist into the
constructor / destructor. That way the implementation is
congruent with i386.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.21-rc3/arch/x86_64/Kconfig
===================================================================
--- linux-2.6.21-rc3.orig/arch/x86_64/Kconfig	2007-03-10 14:00:52.000000000 -0800
+++ linux-2.6.21-rc3/arch/x86_64/Kconfig	2007-03-10 14:00:53.000000000 -0800
@@ -58,7 +58,7 @@
 
 config NR_QUICK
 	int
-	default 1
+	default 2
 
 config ISA
 	bool
Index: linux-2.6.21-rc3/arch/x86_64/mm/fault.c
===================================================================
--- linux-2.6.21-rc3.orig/arch/x86_64/mm/fault.c	2007-03-10 14:00:29.000000000 -0800
+++ linux-2.6.21-rc3/arch/x86_64/mm/fault.c	2007-03-10 14:00:53.000000000 -0800
@@ -585,7 +585,7 @@
 }
 
 DEFINE_SPINLOCK(pgd_lock);
-struct page *pgd_list;
+LIST_HEAD(pgd_list);
 
 void vmalloc_sync_all(void)
 {
@@ -605,8 +605,7 @@
 			if (pgd_none(*pgd_ref))
 				continue;
 			spin_lock(&pgd_lock);
-			for (page = pgd_list; page;
-			     page = (struct page *)page->index) {
+			list_for_each_entry(page, &pgd_list, lru) {
 				pgd_t *pgd;
 				pgd = (pgd_t *)page_address(page) + pgd_index(address);
 				if (pgd_none(*pgd))
Index: linux-2.6.21-rc3/include/asm-x86_64/pgalloc.h
===================================================================
--- linux-2.6.21-rc3.orig/include/asm-x86_64/pgalloc.h	2007-03-10 14:00:52.000000000 -0800
+++ linux-2.6.21-rc3/include/asm-x86_64/pgalloc.h	2007-03-10 14:00:53.000000000 -0800
@@ -7,6 +7,9 @@
 #include <linux/mm.h>
 #include <linux/quicklist.h>
 
+#define QUICK_PGD 0	/* We preserve special mappings over free */
+#define QUICK_PT 1	/* Other page table pages that are zero on free */
+
 #define pmd_populate_kernel(mm, pmd, pte) \
 		set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)))
 #define pud_populate(mm, pud, pmd) \
@@ -22,88 +25,77 @@
 static inline void pmd_free(pmd_t *pmd)
 {
 	BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
-	quicklist_free(0, NULL, pmd);
+	quicklist_free(QUICK_PT, NULL, pmd);
 }
 
 static inline pmd_t *pmd_alloc_one (struct mm_struct *mm, unsigned long addr)
 {
-	return (pmd_t *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
+	return (pmd_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return (pud_t *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
+	return (pud_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline void pud_free (pud_t *pud)
 {
 	BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
-	quicklist_free(0, NULL, pud);
+	quicklist_free(QUICK_PT, NULL, pud);
 }
 
-static inline void pgd_list_add(pgd_t *pgd)
+static inline void pgd_ctor(void *x)
 {
+	unsigned boundary;
+	pgd_t *pgd = x;
 	struct page *page = virt_to_page(pgd);
 
+	/*
+	 * Copy kernel pointers in from init.
+	 */
+	boundary = pgd_index(__PAGE_OFFSET);
+	memcpy(pgd + boundary,
+		init_level4_pgt + boundary,
+		(PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+
 	spin_lock(&pgd_lock);
-	page->index = (pgoff_t)pgd_list;
-	if (pgd_list)
-		pgd_list->private = (unsigned long)&page->index;
-	pgd_list = page;
-	page->private = (unsigned long)&pgd_list;
+	list_add(&page->lru, &pgd_list);
 	spin_unlock(&pgd_lock);
 }
 
-static inline void pgd_list_del(pgd_t *pgd)
+static inline void pgd_dtor(void *x)
 {
-	struct page *next, **pprev, *page = virt_to_page(pgd);
+	pgd_t *pgd = x;
+	struct page *page = virt_to_page(pgd);
 
 	spin_lock(&pgd_lock);
-	next = (struct page *)page->index;
-	pprev = (struct page **)page->private;
-	*pprev = next;
-	if (next)
-		next->private = (unsigned long)pprev;
+	list_del(&page->lru);
 	spin_unlock(&pgd_lock);
 }
 
+
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-	unsigned boundary;
-	pgd_t *pgd = (pgd_t *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
-	if (!pgd)
-		return NULL;
+	pgd_t *pgd = (pgd_t *)quicklist_alloc(QUICK_PGD,
+			 GFP_KERNEL|__GFP_REPEAT, pgd_ctor);
 
-	pgd_list_add(pgd);
-	/*
-	 * Copy kernel pointers in from init.
-	 * Could keep a freelist or slab cache of those because the kernel
-	 * part never changes.
-	 */
-	boundary = pgd_index(__PAGE_OFFSET);
-	memset(pgd, 0, boundary * sizeof(pgd_t));
-	memcpy(pgd + boundary,
-	       init_level4_pgt + boundary,
-	       (PTRS_PER_PGD - boundary) * sizeof(pgd_t));
 	return pgd;
 }
 
 static inline void pgd_free(pgd_t *pgd)
 {
 	BUG_ON((unsigned long)pgd & (PAGE_SIZE-1));
-	pgd_list_del(pgd);
-	memset(pgd, 0, PAGE_SIZE);
-	quicklist_free(0, NULL, pgd);
+	quicklist_free(QUICK_PGD, pgd_dtor, pgd);
 }
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-	return (pte_t *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
+	return (pte_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
 {
-	void *p = (void *)quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
+	void *p = (void *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 	if (!p)
 		return NULL;
 	return virt_to_page(p);
@@ -115,7 +107,7 @@
 static inline void pte_free_kernel(pte_t *pte)
 {
 	BUG_ON((unsigned long)pte & (PAGE_SIZE-1));
-	quicklist_free(0, NULL, pte);
+	quicklist_free(QUICK_PT, NULL, pte);
 }
 
 static inline void pte_free(struct page *pte)
@@ -130,6 +122,7 @@
 
 static inline void check_pgt_cache(void)
 {
-	quicklist_check(0, NULL);
+	quicklist_check(QUICK_PGD, pgd_dtor);
+	quicklist_check(QUICK_PT, NULL);
 }
 #endif /* _X86_64_PGALLOC_H */
Index: linux-2.6.21-rc3/include/asm-x86_64/pgtable.h
===================================================================
--- linux-2.6.21-rc3.orig/include/asm-x86_64/pgtable.h	2007-03-10 14:00:29.000000000 -0800
+++ linux-2.6.21-rc3/include/asm-x86_64/pgtable.h	2007-03-10 14:02:18.000000000 -0800
@@ -403,7 +403,7 @@
 #define __swp_entry_to_pte(x)		((pte_t) { (x).val })
 
 extern spinlock_t pgd_lock;
-extern struct page *pgd_list;
+extern struct list_head pgd_list;
 void vmalloc_sync_all(void);
 
 #endif /* !__ASSEMBLY__ */
@@ -420,7 +420,6 @@
 #define HAVE_ARCH_UNMAPPED_AREA
 
 #define pgtable_cache_init()   do { } while (0)
-#define check_pgt_cache()      do { } while (0)
 
 #define PAGE_AGP    PAGE_KERNEL_NOCACHE
 #define HAVE_PAGE_AGP 1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [QUICKLIST 6/6] slub: remove special casing for PAGE_SIZE slabs
  2007-03-11  2:09 [QUICKLIST 0/6] Arch independent quicklists V1 Christoph Lameter
                   ` (4 preceding siblings ...)
  2007-03-11  2:09 ` [QUICKLIST 5/6] x86_64: Separate quicklist for pgds Christoph Lameter
@ 2007-03-11  2:09 ` Christoph Lameter
  2007-03-11 20:59 ` [QUICKLIST 0/6] Arch independent quicklists V1 David Miller
  2007-03-12 22:51 ` David Miller
  7 siblings, 0 replies; 24+ messages in thread
From: Christoph Lameter @ 2007-03-11  2:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: ak, linux-ia64, holt, Christoph Lameter, mpm

Slub: Remove special casing for page sized slabs

After we have used quicklist so that arches can avoid using the slab
allocator to manage page table pages we can now remove the special
casing from slub.

This is against SLUB V5

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.21-rc3/mm/slub.c
===================================================================
--- linux-2.6.21-rc3.orig/mm/slub.c	2007-03-09 21:23:39.000000000 -0800
+++ linux-2.6.21-rc3/mm/slub.c	2007-03-09 21:24:23.000000000 -0800
@@ -1236,16 +1236,6 @@
 	int order;
 	int rem;
 
-	/*
-	 * If this is an order 0 page then there are no issues with
-	 * fragmentation. We can then create a slab with a single object.
-	 * We need this to support the i386 arch code that uses our
-	 * freelist field (index field) for a list pointer. We neveri
-	 * touch the freelist pointer if we just have one object
-	 */
-	if (size == PAGE_SIZE)
-		return 0;
-
 	for (order = max(slub_min_order, fls(size - 1) - PAGE_SHIFT);
 			order < MAX_ORDER; order++) {
 		unsigned long slab_size = PAGE_SIZE << order;
@@ -1386,15 +1376,6 @@
 
 	tentative_size = ALIGN(size, calculate_alignment(align, flags));
 
-	/*
-	 * PAGE_SIZE slabs are special in that they are passed through
-	 * to the page allocator. Do not do any debugging in order to avoid
-	 * increasing the size of the object.
-	 */
-	if (size == PAGE_SIZE)
-		flags &= ~(SLAB_RED_ZONE| SLAB_DEBUG_FREE | \
-			SLAB_STORE_USER | SLAB_POISON | __OBJECT_POISON);
-
 	s->name = name;
 	s->ctor = ctor;
 	s->dtor = dtor;

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 2/6] i386: quicklist support
  2007-03-11  2:09 ` [QUICKLIST 2/6] i386: quicklist support Christoph Lameter
@ 2007-03-11  3:22   ` William Lee Irwin III
  0 siblings, 0 replies; 24+ messages in thread
From: William Lee Irwin III @ 2007-03-11  3:22 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel, ak, linux-ia64, holt, mpm

On Sat, Mar 10, 2007 at 06:09:34PM -0800, Christoph Lameter wrote:
> i386: Convert to quicklists
> Implement the i386 management of pgd and pmds using quicklists.

I approve, though it would be nice if ptes had an interface operating
on struct page * to use.


On Sat, Mar 10, 2007 at 06:09:34PM -0800, Christoph Lameter wrote:
> The i386 management of page table pages currently uses page sized slabs.
> The page state is therefore mainly determined by the slab code. However,
> i386 also uses its own fields in the page struct to mark special pages
> and to build a list of pgds using the ->private and ->index field (yuck!).
> This has been finely tuned to work right with SLAB but SLUB needs more
> control over the page struct. Currently the only way for SLUB to support
> these slabs is through special casing PAGE_SIZE slabs.
> If we use quicklists instead then we can avoid the mess, and also the
> overhead of manipulating page sized objects through slab.

Hey! I did quite well given the constraints under which I was operating.


-- wli

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 4/6] x86_64: Single Quicklist
  2007-03-11  2:09 ` [QUICKLIST 4/6] x86_64: Single Quicklist Christoph Lameter
@ 2007-03-11  7:54   ` Andi Kleen
  2007-03-11 16:44     ` Christoph Lameter
  0 siblings, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2007-03-11  7:54 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel, ak, linux-ia64, holt, mpm

On Sunday 11 March 2007 03:09, Christoph Lameter wrote:
> x86_64: Convert to use a single quicklists
> 
> This adds caching of pgds and puds, pmds, pte. That way we can
> avoid costly zeroing and initialization of special mappings in the
> pgd.
> 
> The first patch just adds a simple implementation using a single
> quicklist. As a consequence we need to zero a pgd before returning
> it to the pool.


This and i386 version are ok to me, although it might be better to just
finish __GFP_ZERO support to do this.

-Andi

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 4/6] x86_64: Single Quicklist
  2007-03-11  7:54   ` Andi Kleen
@ 2007-03-11 16:44     ` Christoph Lameter
  2007-03-14 19:49       ` Mel Gorman
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Lameter @ 2007-03-11 16:44 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, ak, linux-ia64, holt, mpm

On Sun, 11 Mar 2007, Andi Kleen wrote:

> This and i386 version are ok to me, although it might be better to just
> finish __GFP_ZERO support to do this.

This would not work for pgds on i386 and x86_64

GFP_ZERO support the way I have done it in the past would mean another set 
of buddy lists in the page allocator and another issue with fragmentation. 
So I have stayed away from it although patches exist in my archives (See 
my ftp.kernel.org archive).

Maybe we could implemento limited GFP_ZERO support by just keeping an 
additional per cpu list of pages? The issue with that one is that a page
may grow cold on that list. One usually want the page to be hot in the 
cache when it is allocated. This is different for page table pages. Page 
table pages are typically sparsely accessed.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 0/6] Arch independent quicklists V1
  2007-03-11  2:09 [QUICKLIST 0/6] Arch independent quicklists V1 Christoph Lameter
                   ` (5 preceding siblings ...)
  2007-03-11  2:09 ` [QUICKLIST 6/6] slub: remove special casing for PAGE_SIZE slabs Christoph Lameter
@ 2007-03-11 20:59 ` David Miller
  2007-03-12 11:12   ` Christoph Lameter
  2007-03-12 22:51 ` David Miller
  7 siblings, 1 reply; 24+ messages in thread
From: David Miller @ 2007-03-11 20:59 UTC (permalink / raw)
  To: clameter; +Cc: linux-kernel, ak, holt, linux-ia64, mpm

From: Christoph Lameter <clameter@sgi.com>
Date: Sat, 10 Mar 2007 18:09:23 -0800 (PST)

> Page table pages have the characteristics that they are typically zero
> or in a known state when they are freed. This is usually the exactly
> same state as needed after allocation. So it makes sense to build a list
> of freed page table pages and then consume the pages already in use
> first. Those pages have already been initialized correctly (thus no
> need to zero them) and are likely already cached in such a way that
> the MMU can use them most effectively.

I'm going to make the radical declaration that it be perhaps often
better to always initialize page table chunks to all zeros on
allocation.

The reason is that every time I've monitored the allocation patterns
of these things on SMP, the page table chunks always get released on a
different cpu than where they were initialized.

It's precisely suboptimal for a workload that forks off a lot of
very short lived jobs, watch what happens during runs of lmbench's
lat_proc for example.

The allocator side just does nothing but emit L2 cache line ownership
transactions as the pte page is touched.  Especially on chips like
PowerPC where zero initialization is absurdly cheap, we can avoid all
of the cache line transfers if we just initialize it at allocation
time.

Look, I like this trick too, sparc and sparc64 were the first Linux
platforms to implement page table caching about 8 years ago, but
I'm wondering whether it really makes sense any more.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 0/6] Arch independent quicklists V1
  2007-03-11 20:59 ` [QUICKLIST 0/6] Arch independent quicklists V1 David Miller
@ 2007-03-12 11:12   ` Christoph Lameter
  2007-03-12 11:23     ` David Miller
  2007-03-12 15:52     ` Robin Holt
  0 siblings, 2 replies; 24+ messages in thread
From: Christoph Lameter @ 2007-03-12 11:12 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, ak, holt, linux-ia64, mpm

On Sun, 11 Mar 2007, David Miller wrote:

> I'm going to make the radical declaration that it be perhaps often
> better to always initialize page table chunks to all zeros on
> allocation.

That is the case if most of the page is going to be used soon. If we have 
sparse access patterns then not zeroing can avoid uselessly bringing 
cachelines in.

> The reason is that every time I've monitored the allocation patterns
> of these things on SMP, the page table chunks always get released on a
> different cpu than where they were initialized.

But its even advantageous in that case for sparse allocs.
 
> The allocator side just does nothing but emit L2 cache line ownership
> transactions as the pte page is touched.  Especially on chips like
> PowerPC where zero initialization is absurdly cheap, we can avoid all
> of the cache line transfers if we just initialize it at allocation
> time.

We have no need to touch all the cache lines w/o initialization if we 
alloc from the quicklist. And that is a performance benefit.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 0/6] Arch independent quicklists V1
  2007-03-12 11:12   ` Christoph Lameter
@ 2007-03-12 11:23     ` David Miller
  2007-03-12 15:52     ` Robin Holt
  1 sibling, 0 replies; 24+ messages in thread
From: David Miller @ 2007-03-12 11:23 UTC (permalink / raw)
  To: clameter; +Cc: linux-kernel, ak, holt, linux-ia64, mpm

From: Christoph Lameter <clameter@sgi.com>
Date: Mon, 12 Mar 2007 04:12:32 -0700 (PDT)

> On Sun, 11 Mar 2007, David Miller wrote:
> 
> > I'm going to make the radical declaration that it be perhaps often
> > better to always initialize page table chunks to all zeros on
> > allocation.
> 
> That is the case if most of the page is going to be used soon. If we have 
> sparse access patterns then not zeroing can avoid uselessly bringing 
> cachelines in.

Good point.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 0/6] Arch independent quicklists V1
  2007-03-12 11:12   ` Christoph Lameter
  2007-03-12 11:23     ` David Miller
@ 2007-03-12 15:52     ` Robin Holt
  1 sibling, 0 replies; 24+ messages in thread
From: Robin Holt @ 2007-03-12 15:52 UTC (permalink / raw)
  To: Christoph Lameter, David Miller; +Cc: linux-kernel, ak, holt, linux-ia64, mpm

On Mon, Mar 12, 2007 at 04:12:32AM -0700, Christoph Lameter wrote:
> On Sun, 11 Mar 2007, David Miller wrote:
> > The reason is that every time I've monitored the allocation patterns
> > of these things on SMP, the page table chunks always get released on a
> > different cpu than where they were initialized.
> 
> But its even advantageous in that case for sparse allocs.

We have a written a little LD_PRELOAD library which intercepts fork()
et. al. and migrates the task to a different cpu, pins itself to only
that cpu, then does the user requested call, upon return from the call,
the task is migrated back to the original cpu and the original allowed
cpus is restored.  With that library, we find a few advantages, mainly,
the task_struct and page tables are allocated node local to the task.
As long as the task does not change its allowed cpus mask, they are also
freed on the same cpu which allocated them.

Thanks,
Robin

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 0/6] Arch independent quicklists V1
  2007-03-11  2:09 [QUICKLIST 0/6] Arch independent quicklists V1 Christoph Lameter
                   ` (6 preceding siblings ...)
  2007-03-11 20:59 ` [QUICKLIST 0/6] Arch independent quicklists V1 David Miller
@ 2007-03-12 22:51 ` David Miller
  2007-03-13  0:37   ` Paul Mackerras
  2007-03-14  0:41   ` William Lee Irwin III
  7 siblings, 2 replies; 24+ messages in thread
From: David Miller @ 2007-03-12 22:51 UTC (permalink / raw)
  To: clameter; +Cc: linux-kernel, ak, holt, linux-ia64, mpm

From: Christoph Lameter <clameter@sgi.com>
Date: Sat, 10 Mar 2007 18:09:23 -0800 (PST)

> 6 patches follow this message:
> 
> [QUICKLIST 1/6] Extract quicklist implementation from IA64
> [QUICKLIST 2/6] i386: quicklist support
> [QUICKLIST 3/6] i386: Use standard list manipulators for pgd_list
> [QUICKLIST 4/6] x86_64: Single quicklist
> [QUICKLIST 5/6] x86_64: Separate quicklist for pgds
> [QUICKLIST 6/6] slub: remove special casing for PAGE_SIZE slabs

I ported this to sparc64 as per the patch below, tested on
UP SunBlade1500 and 24 cpu Niagara T1000.

And for fun I checked everything except the SLUB patch into
a cloned Linus tree at:

	kernel.org:/pub/scm/linux/kernel/git/davem/quicklist-2.6.git

Someone with some extreme patience could do the sparc 32-bit port too,
in fact it's lacking the cached PGD update logic that x86 et al. have
so it would even end up being a bug fix :-)  This lack is why sparc32
pre-initializes the vmalloc/module area PGDs with static page tables
at boot time, FWIW.

commit 3f5d1a8772b300d401ff88065c3a6def0c5011d0
Author: David S. Miller <davem@sunset.davemloft.net>
Date:   Mon Mar 12 15:20:14 2007 -0700

    [QUICKLIST]: Add sparc64 quicklist support.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/arch/sparc64/Kconfig b/arch/sparc64/Kconfig
index f75a686..40371db 100644
--- a/arch/sparc64/Kconfig
+++ b/arch/sparc64/Kconfig
@@ -26,6 +26,10 @@ config MMU
 	bool
 	default y
 
+config NR_QUICK
+	int
+	default 1
+
 config STACKTRACE_SUPPORT
 	bool
 	default y
diff --git a/arch/sparc64/mm/init.c b/arch/sparc64/mm/init.c
index b1a1ee0..e399537 100644
--- a/arch/sparc64/mm/init.c
+++ b/arch/sparc64/mm/init.c
@@ -176,30 +176,6 @@ unsigned long sparc64_kern_sec_context __read_mostly;
 
 int bigkernel = 0;
 
-struct kmem_cache *pgtable_cache __read_mostly;
-
-static void zero_ctor(void *addr, struct kmem_cache *cache, unsigned long flags)
-{
-	clear_page(addr);
-}
-
-extern void tsb_cache_init(void);
-
-void pgtable_cache_init(void)
-{
-	pgtable_cache = kmem_cache_create("pgtable_cache",
-					  PAGE_SIZE, PAGE_SIZE,
-					  SLAB_HWCACHE_ALIGN |
-					  SLAB_MUST_HWCACHE_ALIGN,
-					  zero_ctor,
-					  NULL);
-	if (!pgtable_cache) {
-		prom_printf("Could not create pgtable_cache\n");
-		prom_halt();
-	}
-	tsb_cache_init();
-}
-
 #ifdef CONFIG_DEBUG_DCFLUSH
 atomic_t dcpage_flushes = ATOMIC_INIT(0);
 #ifdef CONFIG_SMP
diff --git a/arch/sparc64/mm/tsb.c b/arch/sparc64/mm/tsb.c
index 236d02f..57eb302 100644
--- a/arch/sparc64/mm/tsb.c
+++ b/arch/sparc64/mm/tsb.c
@@ -252,7 +252,7 @@ static const char *tsb_cache_names[8] = {
 	"tsb_1MB",
 };
 
-void __init tsb_cache_init(void)
+void __init pgtable_cache_init(void)
 {
 	unsigned long i;
 
diff --git a/include/asm-sparc64/pgalloc.h b/include/asm-sparc64/pgalloc.h
index 5891ff7..c7cb5d5 100644
--- a/include/asm-sparc64/pgalloc.h
+++ b/include/asm-sparc64/pgalloc.h
@@ -6,6 +6,7 @@
 #include <linux/sched.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
+#include <linux/quicklist.h>
 
 #include <asm/spitfire.h>
 #include <asm/cpudata.h>
@@ -13,52 +14,50 @@
 #include <asm/page.h>
 
 /* Page table allocation/freeing. */
-extern struct kmem_cache *pgtable_cache;
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-	return kmem_cache_alloc(pgtable_cache, GFP_KERNEL);
+	return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pgd_free(pgd_t *pgd)
 {
-	kmem_cache_free(pgtable_cache, pgd);
+	quicklist_free(0, NULL, pgd);
 }
 
 #define pud_populate(MM, PUD, PMD)	pud_set(PUD, PMD)
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return kmem_cache_alloc(pgtable_cache,
-				GFP_KERNEL|__GFP_REPEAT);
+	return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pmd_free(pmd_t *pmd)
 {
-	kmem_cache_free(pgtable_cache, pmd);
+	quicklist_free(0, NULL, pmd);
 }
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					  unsigned long address)
 {
-	return kmem_cache_alloc(pgtable_cache,
-				GFP_KERNEL|__GFP_REPEAT);
+	return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline struct page *pte_alloc_one(struct mm_struct *mm,
 					 unsigned long address)
 {
-	return virt_to_page(pte_alloc_one_kernel(mm, address));
+	void *pg = quicklist_alloc(0, GFP_KERNEL, NULL);
+	return pg ? virt_to_page(pg) : NULL;
 }
 		
 static inline void pte_free_kernel(pte_t *pte)
 {
-	kmem_cache_free(pgtable_cache, pte);
+	quicklist_free(0, NULL, pte);
 }
 
 static inline void pte_free(struct page *ptepage)
 {
-	pte_free_kernel(page_address(ptepage));
+	quicklist_free(0, NULL, page_address(ptepage));
 }
 
 
@@ -66,6 +65,9 @@ static inline void pte_free(struct page *ptepage)
 #define pmd_populate(MM,PMD,PTE_PAGE)		\
 	pmd_populate_kernel(MM,PMD,page_address(PTE_PAGE))
 
-#define check_pgt_cache()	do { } while (0)
+static inline void check_pgt_cache(void)
+{
+	quicklist_check(0, NULL);
+}
 
 #endif /* _SPARC64_PGALLOC_H */

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 0/6] Arch independent quicklists V1
  2007-03-12 22:51 ` David Miller
@ 2007-03-13  0:37   ` Paul Mackerras
  2007-03-13  1:38     ` Christoph Lameter
  2007-03-13  2:26     ` David Miller
  2007-03-14  0:41   ` William Lee Irwin III
  1 sibling, 2 replies; 24+ messages in thread
From: Paul Mackerras @ 2007-03-13  0:37 UTC (permalink / raw)
  To: David Miller; +Cc: clameter, linux-kernel, ak, holt, linux-ia64, mpm

David Miller writes:

> I ported this to sparc64 as per the patch below, tested on
> UP SunBlade1500 and 24 cpu Niagara T1000.

Did you see any performance improvement?  We used to have quicklists
on ppc, but I remain to be convinced that they actually help.

Also, I didn't understand why we have to do quicklists to take
advantage of the fact that the pages are in a pristine state when they
are freed.  I thought the whole point of the slab allocator was to be
able to take advantage of that...

Paul.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 0/6] Arch independent quicklists V1
  2007-03-13  0:37   ` Paul Mackerras
@ 2007-03-13  1:38     ` Christoph Lameter
  2007-03-13  2:26     ` David Miller
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Lameter @ 2007-03-13  1:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: David Miller, linux-kernel, ak, holt, linux-ia64, mpm

On Tue, 13 Mar 2007, Paul Mackerras wrote:

> Also, I didn't understand why we have to do quicklists to take
> advantage of the fact that the pages are in a pristine state when they
> are freed.  I thought the whole point of the slab allocator was to be
> able to take advantage of that...

It used to be the case that initializating objects was better in the past. 
Today it is better to initialize the objects immediately before they are 
used. That will move them into the cpu caches and keep them there. 
Initializing them earlier may cause the cachelines of the object to be 
evicted from the cpu cache and then those have to be refetched. The 
benefit of this approach diminishes the larger objects get and the sparser 
the access to the cachelines of the object. In the case of page sized 
objects that are sparsely accessed (the PAGE_SIZE caches covered by 
quicklists) it makes sense to attempt to avoid having to touch all 
cachelines of the page on alloc.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 0/6] Arch independent quicklists V1
  2007-03-13  0:37   ` Paul Mackerras
  2007-03-13  1:38     ` Christoph Lameter
@ 2007-03-13  2:26     ` David Miller
  2007-03-13  2:32       ` David Miller
  1 sibling, 1 reply; 24+ messages in thread
From: David Miller @ 2007-03-13  2:26 UTC (permalink / raw)
  To: paulus; +Cc: clameter, linux-kernel, ak, holt, linux-ia64, mpm

From: Paul Mackerras <paulus@samba.org>
Date: Tue, 13 Mar 2007 11:37:32 +1100

> David Miller writes:
> 
> > I ported this to sparc64 as per the patch below, tested on
> > UP SunBlade1500 and 24 cpu Niagara T1000.
> 
> Did you see any performance improvement?  We used to have quicklists
> on ppc, but I remain to be convinced that they actually help.

It shaved about 3 or 4 seconds consistently off of my kernel
build on Niagara which usually clocks in just over 4 minutes
on this 24 thread machine.

> Also, I didn't understand why we have to do quicklists to take
> advantage of the fact that the pages are in a pristine state when they
> are freed.  I thought the whole point of the slab allocator was to be
> able to take advantage of that...

He just wants to side-step the issue in SLUB, which arguably
is an attempt to simplify SLUB at the expense of functionality.

I don't agree with that, but I'm merely preemptively testing his
patches and porting them to sparc64 so it does not break when/if his
code is merged in.  After being bitten by stuff like this in the past,
I've decided to become more proactive :)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 0/6] Arch independent quicklists V1
  2007-03-13  2:26     ` David Miller
@ 2007-03-13  2:32       ` David Miller
  2007-03-15  8:22         ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: David Miller @ 2007-03-13  2:32 UTC (permalink / raw)
  To: paulus; +Cc: clameter, linux-kernel, ak, holt, linux-ia64, mpm

From: David Miller <davem@davemloft.net>
Date: Mon, 12 Mar 2007 19:26:16 -0700 (PDT)

> From: Paul Mackerras <paulus@samba.org>
> Date: Tue, 13 Mar 2007 11:37:32 +1100
> 
> > David Miller writes:
> > 
> > > I ported this to sparc64 as per the patch below, tested on
> > > UP SunBlade1500 and 24 cpu Niagara T1000.
> > 
> > Did you see any performance improvement?  We used to have quicklists
> > on ppc, but I remain to be convinced that they actually help.
> 
> It shaved about 3 or 4 seconds consistently off of my kernel
> build on Niagara which usually clocks in just over 4 minutes
> on this 24 thread machine.

I want to quantify this with the fact that all the cache false sharing
issues are irrelevant in this test because the L2 cache is shared
between all of the cpu threads on Niagara.

It was fast just because the quicklists were lighter weight than the
SLAB stuff.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 0/6] Arch independent quicklists V1
  2007-03-12 22:51 ` David Miller
  2007-03-13  0:37   ` Paul Mackerras
@ 2007-03-14  0:41   ` William Lee Irwin III
  1 sibling, 0 replies; 24+ messages in thread
From: William Lee Irwin III @ 2007-03-14  0:41 UTC (permalink / raw)
  To: David Miller; +Cc: clameter, linux-kernel, ak, holt, linux-ia64, mpm

On Mon, Mar 12, 2007 at 03:51:57PM -0700, David Miller wrote:
> Someone with some extreme patience could do the sparc 32-bit port too,
> in fact it's lacking the cached PGD update logic that x86 et al. have
> so it would even end up being a bug fix :-)  This lack is why sparc32
> pre-initializes the vmalloc/module area PGDs with static page tables
> at boot time, FWIW.

I'll spare everyone the details and let code if/when it appears stand
in for promises on the sparc32 front.


-- wli

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 4/6] x86_64: Single Quicklist
  2007-03-11 16:44     ` Christoph Lameter
@ 2007-03-14 19:49       ` Mel Gorman
  0 siblings, 0 replies; 24+ messages in thread
From: Mel Gorman @ 2007-03-14 19:49 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andi Kleen, linux-kernel, ak, linux-ia64, holt, mpm

On (11/03/07 09:44), Christoph Lameter didst pronounce:
> On Sun, 11 Mar 2007, Andi Kleen wrote:
> 
> > This and i386 version are ok to me, although it might be better to just
> > finish __GFP_ZERO support to do this.
> 
> This would not work for pgds on i386 and x86_64
> 
> GFP_ZERO support the way I have done it in the past would mean another set 
> of buddy lists in the page allocator and another issue with fragmentation. 
> So I have stayed away from it although patches exist in my archives (See 
> my ftp.kernel.org archive).

I haven't checked this in a while but when we experimented with keeping
zero'd pages on separate lists before, the performance sucked. I haven't
looked at it in a *long* time though.

> 
> Maybe we could implemento limited GFP_ZERO support by just keeping an 
> additional per cpu list of pages?

I imagine that adding an additional per-cpu list will not be welcome.

> The issue with that one is that a page
> may grow cold on that list.

And that growing cold appeared to hurt before. It could be checked out
again though. The anti-fragmentation breaks out the buddy lists already
and has the ability to search the per-cpu lists for pages of an
appropriate type.

I'll try and find an hour or two to hack something together to see what
it looks like but I suspect it'll still be a performance loss. At least
then though, we can see if quicklists are a better plan or not.

> One usually want the page to be hot in the 
> cache when it is allocated. This is different for page table pages. Page 
> table pages are typically sparsely accessed.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 0/6] Arch independent quicklists V1
  2007-03-15  8:22         ` Andrew Morton
@ 2007-03-15  7:31           ` David Miller
  2007-03-15  8:39             ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: David Miller @ 2007-03-15  7:31 UTC (permalink / raw)
  To: akpm; +Cc: paulus, clameter, linux-kernel, ak, holt, linux-ia64, mpm

From: Andrew Morton <akpm@linux-foundation.org>
Date: Thu, 15 Mar 2007 00:22:49 -0800

> So...  what would happen if sparc64 were to use neither quicklists nor
> slab?  Just grab these pages from the page allocator and clear them?

The page table allocator is heavier weight than the quicklists,
although obviously not as heavy weight as SLAB.

I know special purpose allocation lists suck, but they really help in
this case in my opinion.

And for the x86 cases it's not going to help to have GFP_ZERO stuff
via the page allocator for page tables, they have to have specific
bits in the pre-initialized areas for the kernel PGDs, not just zero.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 0/6] Arch independent quicklists V1
  2007-03-13  2:32       ` David Miller
@ 2007-03-15  8:22         ` Andrew Morton
  2007-03-15  7:31           ` David Miller
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2007-03-15  8:22 UTC (permalink / raw)
  To: David Miller; +Cc: paulus, clameter, linux-kernel, ak, holt, linux-ia64, mpm

> On Mon, 12 Mar 2007 19:32:11 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
> From: David Miller <davem@davemloft.net>
> Date: Mon, 12 Mar 2007 19:26:16 -0700 (PDT)
> 
> > From: Paul Mackerras <paulus@samba.org>
> > Date: Tue, 13 Mar 2007 11:37:32 +1100
> > 
> > > David Miller writes:
> > > 
> > > > I ported this to sparc64 as per the patch below, tested on
> > > > UP SunBlade1500 and 24 cpu Niagara T1000.
> > > 
> > > Did you see any performance improvement?  We used to have quicklists
> > > on ppc, but I remain to be convinced that they actually help.
> > 
> > It shaved about 3 or 4 seconds consistently off of my kernel
> > build on Niagara which usually clocks in just over 4 minutes
> > on this 24 thread machine.
> 
> I want to quantify this with the fact that all the cache false sharing
> issues are irrelevant in this test because the L2 cache is shared
> between all of the cpu threads on Niagara.
> 
> It was fast just because the quicklists were lighter weight than the
> SLAB stuff.

So...  what would happen if sparc64 were to use neither quicklists nor
slab?  Just grab these pages from the page allocator and clear them?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [QUICKLIST 0/6] Arch independent quicklists V1
  2007-03-15  7:31           ` David Miller
@ 2007-03-15  8:39             ` Andrew Morton
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2007-03-15  8:39 UTC (permalink / raw)
  To: David Miller; +Cc: paulus, clameter, linux-kernel, ak, holt, linux-ia64, mpm

> On Thu, 15 Mar 2007 00:31:18 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Thu, 15 Mar 2007 00:22:49 -0800
> 
> > So...  what would happen if sparc64 were to use neither quicklists nor
> > slab?  Just grab these pages from the page allocator and clear them?
> 
> The page table allocator is heavier weight than the quicklists,
> although obviously not as heavy weight as SLAB.

Spose so, although only in the case where we go into the buddy lists, I hope.

otoh, releasing a cache-hot page into the page allocator makes it available for
other use.

It'd be nice if you could run the numbers sometime, please - if it's OK then
we can remove code...

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2007-03-15  7:39 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-11  2:09 [QUICKLIST 0/6] Arch independent quicklists V1 Christoph Lameter
2007-03-11  2:09 ` [QUICKLIST 1/6] Extract quicklist implementation from IA64 Christoph Lameter
2007-03-11  2:09 ` [QUICKLIST 2/6] i386: quicklist support Christoph Lameter
2007-03-11  3:22   ` William Lee Irwin III
2007-03-11  2:09 ` [QUICKLIST 3/6] i386: Use standard list manipulators for pgd_list Christoph Lameter
2007-03-11  2:09 ` [QUICKLIST 4/6] x86_64: Single Quicklist Christoph Lameter
2007-03-11  7:54   ` Andi Kleen
2007-03-11 16:44     ` Christoph Lameter
2007-03-14 19:49       ` Mel Gorman
2007-03-11  2:09 ` [QUICKLIST 5/6] x86_64: Separate quicklist for pgds Christoph Lameter
2007-03-11  2:09 ` [QUICKLIST 6/6] slub: remove special casing for PAGE_SIZE slabs Christoph Lameter
2007-03-11 20:59 ` [QUICKLIST 0/6] Arch independent quicklists V1 David Miller
2007-03-12 11:12   ` Christoph Lameter
2007-03-12 11:23     ` David Miller
2007-03-12 15:52     ` Robin Holt
2007-03-12 22:51 ` David Miller
2007-03-13  0:37   ` Paul Mackerras
2007-03-13  1:38     ` Christoph Lameter
2007-03-13  2:26     ` David Miller
2007-03-13  2:32       ` David Miller
2007-03-15  8:22         ` Andrew Morton
2007-03-15  7:31           ` David Miller
2007-03-15  8:39             ` Andrew Morton
2007-03-14  0:41   ` William Lee Irwin III

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).