* [00/18] Virtual Compound Page Support V2
@ 2007-10-04  3:59 ` Christoph Lameter
  0 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel

Allocations of larger pages are not reliable in Linux. If larger
pages have to be allocated then one faces the choice of either
allowing graceful fallback or using vmalloc, with a performance
penalty due to the use of a page table. Virtual Compound pages are
a simple way out of this dilemma. If an allocation specifies
GFP_VFALLBACK then the page allocator will first attempt to satisfy
the request with physically contiguous memory. If that is not possible
then the page allocator will create a virtually contiguous memory
area for the caller. That way large allocations may be considered
"reliable" independent of the memory fragmentation situation.

This means that memory with optimal performance is used when available.
We are gradually introducing methods to reduce memory fragmentation.
The better these methods become, the lower the chance that fallback
will occur.

Fallback is rare, in particular on machines with contemporary memory
sizes of 1G or more. It seems to take special load situations that
pin a lot of memory, or systems with little memory, to fragment
system memory badly enough that the fallback scheme must kick in.

There is therefore a compile time option to switch on fallback for
testing purposes. Virtually mapped memory may behave differently,
and the CONFIG_FALLBACK_ALWAYS option ensures that the code is
tested to deal with virtual memory.
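
For instance, the allocator could honor the option as follows (a
sketch only; the exact hook point is an assumption, and
alloc_vcompound() is introduced in patch [08/18]):

	#ifdef CONFIG_FALLBACK_ALWAYS
		/* Testing aid: always take the virtual mapping path */
		if (order && (gfp_mask & __GFP_VFALLBACK))
			return alloc_vcompound(gfp_mask, order,
						zonelist, alloc_flags);
	#endif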

The patchset then addresses a series of issues in the current code
through the use of fallbacks:

- Fallback for x86_64 stack allocations. The default stack size
  is 8k, which requires an order 1 allocation.

- Removes the manual fallback to vmalloc for sparsemem
  through the use of GFP_VFALLBACK.

- Uses a compound page for the wait table in the zone thereby
  avoiding having to go through a page table to get to the
  data structures used for waiting on events in pages.

- Allows fallback for the order 2 allocation in the crypto
  subsystem.

- Allows fallback for the caller table used by SLUB when determining
  the call sites for slab caches for sysfs output.

- Allows a configurable stack size on x86_64 (up to 32k).

More uses are possible by simply adding GFP_VFALLBACK to the gfp
flags of an allocation or by converting vmalloc calls to regular
page allocator calls.
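
For example, a vmalloc() user could be converted like this (a sketch;
get_order() is the usual size-to-order idiom):

	/* Before: always virtually mapped, always pays the TLB cost */
	table = vmalloc(size);
	/* ... */
	vfree(table);

	/* After: physically contiguous when possible, vmap on fallback */
	table = (void *)__get_free_pages(GFP_VFALLBACK, get_order(size));
	/* ... */
	free_pages((unsigned long)table, get_order(size));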

It is likely that we have had to avoid the use of larger memory areas
because of these reliability issues. The patchset may simplify future
handling of large memory areas because the reliability issues are
taken care of by the page allocator. For HPC uses we constantly have
to deal with demands for larger and larger memory areas to speed up
various loads.

Additional patches exist to enable SLUB and the Large Blocksize Patchset
to use these fallbacks.

The patchset is also available via git from the largeblock git tree via

git pull
  git://git.kernel.org/pub/scm/linux/kernel/git/christoph/largeblocksize.git
    vcompound

V1->V2
- Remove some cleanup patches and the SLUB patches from this set.
- Transparent vcompound support through page_address() and
  virt_to_head_page().
- Additional use cases.
- Factor the code better for an easier read.
- Add configurable stack size.
- Follow up on various suggestions made for V1.

RFC->V1
- Complete support for all compound functions for virtual compound pages
  (including the compound_nth_page() necessary for LBS mmap support)
- Fix various bugs
- Fix i386 build

-- 

* [01/18] vmalloc: clean up page array indexing
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel

[-- Attachment #1: vcompound_array_indexes --]
[-- Type: text/plain, Size: 1425 bytes --]

The page array is repeatedly indexed in both __vunmap() and
__vmalloc_area_node(). Add a temporary variable to make the code
easier to read (and easier to patch later).

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/vmalloc.c |   16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

Index: linux-2.6/mm/vmalloc.c
===================================================================
--- linux-2.6.orig/mm/vmalloc.c	2007-10-02 09:26:16.000000000 -0700
+++ linux-2.6/mm/vmalloc.c	2007-10-02 21:35:34.000000000 -0700
@@ -345,8 +345,10 @@ static void __vunmap(void *addr, int dea
 		int i;
 
 		for (i = 0; i < area->nr_pages; i++) {
-			BUG_ON(!area->pages[i]);
-			__free_page(area->pages[i]);
+			struct page *page = area->pages[i];
+
+			BUG_ON(!page);
+			__free_page(page);
 		}
 
 		if (area->flags & VM_VPAGES)
@@ -450,15 +452,19 @@ void *__vmalloc_area_node(struct vm_stru
 	}
 
 	for (i = 0; i < area->nr_pages; i++) {
+		struct page *page;
+
 		if (node < 0)
-			area->pages[i] = alloc_page(gfp_mask);
+			page = alloc_page(gfp_mask);
 		else
-			area->pages[i] = alloc_pages_node(node, gfp_mask, 0);
-		if (unlikely(!area->pages[i])) {
+			page = alloc_pages_node(node, gfp_mask, 0);
+
+		if (unlikely(!page)) {
 			/* Successfully allocated i pages, free them in __vunmap() */
 			area->nr_pages = i;
 			goto fail;
 		}
+		area->pages[i] = page;
 	}
 
 	if (map_vm_area(area, prot, &pages))

-- 

* [02/18] vunmap: return page array passed on vmap()
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel

[-- Attachment #1: vcompound_vunmap_returns_pages --]
[-- Type: text/plain, Size: 4235 bytes --]

Make vunmap() return the page array that was used at vmap(). This is
useful if one has no structures to track the page array but simply
stores the virtual address somewhere. The disposition of the page
array can be decided upon after vunmap. vfree() may now also be used
instead of vunmap; it will release the page array after unmapping it.

As noted by Kamezawa: The same subsystem that provides the page array
to vmap() must use its own method to dispose of the page array.

If vfree() is called to free the page array then the page array must either
be

1. Allocated via the slab allocator

2. Allocated via vmalloc but then VM_VPAGES must have been passed at
   vunmap to specify that a vfree is needed.
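
A usage sketch under these rules (hypothetical caller; the array was
kmalloc'ed, so case 1 applies):

	/* pages[] holds count page pointers and was kmalloc'ed */
	void *addr = vmap(pages, count, VM_MAP, PAGE_KERNEL);

	/* ... later, with only addr still known ... */
	pages = vunmap(addr);		/* recover the array */
	for (i = 0; i < count; i++)
		__free_page(pages[i]);
	kfree(pages);			/* case 1: slab allocated */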

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/vmalloc.h |    2 +-
 mm/vmalloc.c            |   32 ++++++++++++++++++++++----------
 2 files changed, 23 insertions(+), 11 deletions(-)

Index: linux-2.6/include/linux/vmalloc.h
===================================================================
--- linux-2.6.orig/include/linux/vmalloc.h	2007-10-03 16:19:29.000000000 -0700
+++ linux-2.6/include/linux/vmalloc.h	2007-10-03 16:19:41.000000000 -0700
@@ -49,7 +49,7 @@ extern void vfree(void *addr);
 
 extern void *vmap(struct page **pages, unsigned int count,
 			unsigned long flags, pgprot_t prot);
-extern void vunmap(void *addr);
+extern struct page **vunmap(void *addr);
 
 extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
 							unsigned long pgoff);
Index: linux-2.6/mm/vmalloc.c
===================================================================
--- linux-2.6.orig/mm/vmalloc.c	2007-10-03 16:19:35.000000000 -0700
+++ linux-2.6/mm/vmalloc.c	2007-10-03 16:20:15.000000000 -0700
@@ -152,6 +152,7 @@ int map_vm_area(struct vm_struct *area, 
 	unsigned long addr = (unsigned long) area->addr;
 	unsigned long end = addr + area->size - PAGE_SIZE;
 	int err;
+	area->pages = *pages;
 
 	BUG_ON(addr >= end);
 	pgd = pgd_offset_k(addr);
@@ -162,6 +163,8 @@ int map_vm_area(struct vm_struct *area, 
 			break;
 	} while (pgd++, addr = next, addr != end);
 	flush_cache_vmap((unsigned long) area->addr, end);
+
+	area->nr_pages = *pages - area->pages;
 	return err;
 }
 EXPORT_SYMBOL_GPL(map_vm_area);
@@ -318,17 +321,18 @@ struct vm_struct *remove_vm_area(void *a
 	return v;
 }
 
-static void __vunmap(void *addr, int deallocate_pages)
+static struct page **__vunmap(void *addr, int deallocate_pages)
 {
 	struct vm_struct *area;
+	struct page **pages;
 
 	if (!addr)
-		return;
+		return NULL;
 
 	if ((PAGE_SIZE-1) & (unsigned long)addr) {
 		printk(KERN_ERR "Trying to vfree() bad address (%p)\n", addr);
 		WARN_ON(1);
-		return;
+		return NULL;
 	}
 
 	area = remove_vm_area(addr);
@@ -336,29 +340,30 @@ static void __vunmap(void *addr, int dea
 		printk(KERN_ERR "Trying to vfree() nonexistent vm area (%p)\n",
 				addr);
 		WARN_ON(1);
-		return;
+		return NULL;
 	}
 
+	pages = area->pages;
 	debug_check_no_locks_freed(addr, area->size);
 
 	if (deallocate_pages) {
 		int i;
 
 		for (i = 0; i < area->nr_pages; i++) {
-			struct page *page = area->pages[i];
+			struct page *page = pages[i];
 
 			BUG_ON(!page);
 			__free_page(page);
 		}
 
 		if (area->flags & VM_VPAGES)
-			vfree(area->pages);
+			vfree(pages);
 		else
-			kfree(area->pages);
+			kfree(pages);
 	}
 
 	kfree(area);
-	return;
+	return pages;
 }
 
 /**
@@ -387,10 +392,10 @@ EXPORT_SYMBOL(vfree);
  *
  *	Must not be called in interrupt context.
  */
-void vunmap(void *addr)
+struct page **vunmap(void *addr)
 {
 	BUG_ON(in_interrupt());
-	__vunmap(addr, 0);
+	return __vunmap(addr, 0);
 }
 EXPORT_SYMBOL(vunmap);
 
@@ -403,6 +408,13 @@ EXPORT_SYMBOL(vunmap);
  *
  *	Maps @count pages from @pages into contiguous kernel virtual
  *	space.
+ *
+ *	The page array may be freed via vfree() on the virtual address
+ *	returned. In that case the page array must be allocated via
+ *	the slab allocator. If the page array was allocated via
+ *	vmalloc then VM_VPAGES must be specified in the flags. There is
+ *	no support for vfree() to free a page array allocated via the
+ *	page allocator.
  */
 void *vmap(struct page **pages, unsigned int count,
 		unsigned long flags, pgprot_t prot)

-- 

* [03/18] vmalloc_address(): Determine vmalloc address from page struct
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel

[-- Attachment #1: vcompound_vmalloc_address --]
[-- Type: text/plain, Size: 3102 bytes --]

Sometimes we need to figure out which vmalloc address is in use
for a certain page struct. There is no easy way to figure out
the vmalloc address from the page struct. Simply search through
the kernel page tables to find the address. Use sparingly.
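
A usage sketch (the walk is linear over the vmalloc area, so this is
for debugging and slow paths only):

	/* Recover the mapped address of a page known to be vmalloc'ed */
	void *addr = vmalloc_address(page);

	if (addr)
		printk(KERN_DEBUG "page %p mapped at %p\n", page, addr);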

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/mm.h |    2 +
 mm/vmalloc.c       |   79 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+)

Index: linux-2.6/mm/vmalloc.c
===================================================================
--- linux-2.6.orig/mm/vmalloc.c	2007-10-03 16:20:15.000000000 -0700
+++ linux-2.6/mm/vmalloc.c	2007-10-03 16:20:48.000000000 -0700
@@ -840,3 +840,82 @@ void free_vm_area(struct vm_struct *area
 	kfree(area);
 }
 EXPORT_SYMBOL_GPL(free_vm_area);
+
+
+/*
+ * Determine vmalloc address from a page struct.
+ *
+ * Linear search through all ptes of the vmalloc area.
+ */
+static unsigned long vaddr_pte_range(pmd_t *pmd, unsigned long addr,
+		unsigned long end, unsigned long pfn)
+{
+	pte_t *pte;
+
+	pte = pte_offset_kernel(pmd, addr);
+	do {
+		pte_t ptent = *pte;
+		if (pte_present(ptent) && pte_pfn(ptent) == pfn)
+			return addr;
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+	return 0;
+}
+
+static inline unsigned long vaddr_pmd_range(pud_t *pud, unsigned long addr,
+		unsigned long end, unsigned long pfn)
+{
+	pmd_t *pmd;
+	unsigned long next;
+	unsigned long n;
+
+	pmd = pmd_offset(pud, addr);
+	do {
+		next = pmd_addr_end(addr, end);
+		if (pmd_none_or_clear_bad(pmd))
+			continue;
+		n = vaddr_pte_range(pmd, addr, next, pfn);
+		if (n)
+			return n;
+	} while (pmd++, addr = next, addr != end);
+	return 0;
+}
+
+static inline unsigned long vaddr_pud_range(pgd_t *pgd, unsigned long addr,
+		unsigned long end, unsigned long pfn)
+{
+	pud_t *pud;
+	unsigned long next;
+	unsigned long n;
+
+	pud = pud_offset(pgd, addr);
+	do {
+		next = pud_addr_end(addr, end);
+		if (pud_none_or_clear_bad(pud))
+			continue;
+		n = vaddr_pmd_range(pud, addr, next, pfn);
+		if (n)
+			return n;
+	} while (pud++, addr = next, addr != end);
+	return 0;
+}
+
+void *vmalloc_address(struct page *page)
+{
+	pgd_t *pgd;
+	unsigned long next, n;
+	unsigned long addr = VMALLOC_START;
+	unsigned long pfn = page_to_pfn(page);
+
+	pgd = pgd_offset_k(VMALLOC_START);
+	do {
+		next = pgd_addr_end(addr, VMALLOC_END);
+		if (pgd_none_or_clear_bad(pgd))
+			continue;
+		n = vaddr_pud_range(pgd, addr, next, pfn);
+		if (n)
+			return (void *)n;
+	} while (pgd++, addr = next, addr < VMALLOC_END);
+	return NULL;
+}
+EXPORT_SYMBOL(vmalloc_address);
+
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2007-10-03 16:19:27.000000000 -0700
+++ linux-2.6/include/linux/mm.h	2007-10-03 16:20:48.000000000 -0700
@@ -294,6 +294,8 @@ static inline int get_page_unless_zero(s
 	return atomic_inc_not_zero(&page->_count);
 }
 
+void *vmalloc_address(struct page *);
+
 static inline struct page *compound_head(struct page *page)
 {
 	if (unlikely(PageTail(page)))

-- 

* [04/18] Vcompound: Smart up virt_to_head_page()
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel

[-- Attachment #1: vcompound_virt_to_head_page --]
[-- Type: text/plain, Size: 2209 bytes --]

The determination of a page struct for an address in a compound page
will need some more smarts in order to deal with virtual addresses.

We need to use the evil constants VMALLOC_START and VMALLOC_END for
this, and they are notorious for referencing various arch header files
or may even be variables. Uninline the function to avoid trouble.
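
A usage sketch (objp is a hypothetical kernel pointer; this works
whether the backing compound page is physically contiguous or a
vmalloc fallback):

	/* Find the head page and order for an arbitrary kernel address */
	struct page *head = virt_to_head_page(objp);
	int order = compound_order(head);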

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/mm.h |    6 +-----
 mm/page_alloc.c    |   23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+), 5 deletions(-)

Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2007-10-03 19:21:50.000000000 -0700
+++ linux-2.6/include/linux/mm.h	2007-10-03 19:23:08.000000000 -0700
@@ -315,11 +315,7 @@ static inline void get_page(struct page 
 	atomic_inc(&page->_count);
 }
 
-static inline struct page *virt_to_head_page(const void *x)
-{
-	struct page *page = virt_to_page(x);
-	return compound_head(page);
-}
+struct page *virt_to_head_page(const void *x);
 
 /*
  * Setup the page count before being freed into the page allocator for
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2007-10-03 19:21:50.000000000 -0700
+++ linux-2.6/mm/page_alloc.c	2007-10-03 19:23:08.000000000 -0700
@@ -150,6 +150,29 @@ int nr_node_ids __read_mostly = MAX_NUMN
 EXPORT_SYMBOL(nr_node_ids);
 #endif
 
+/*
+ * Determine the appropriate page struct given a virtual address
+ * (including vmalloced areas).
+ *
+ * Return the head page if this is a compound page.
+ *
+ * Cannot be inlined since VMALLOC_START and VMALLOC_END may contain
+ * complex calculations that depend on multiple arch includes or
+ * even variables.
+ */
+struct page *virt_to_head_page(const void *x)
+{
+	unsigned long addr = (unsigned long)x;
+	struct page *page;
+
+	if (unlikely(addr >= VMALLOC_START && addr < VMALLOC_END))
+		page = vmalloc_to_page((void *)addr);
+	else
+		page = virt_to_page(addr);
+
+	return compound_head(page);
+}
+
 #ifdef CONFIG_DEBUG_VM
 static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
 {

-- 

* [05/18] Page flags: Add PageVcompound()
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel

[-- Attachment #1: vcompound_pagevcompound --]
[-- Type: text/plain, Size: 1588 bytes --]

Add another page flag that can be used to figure out if a compound
page is virtually mapped. The mark is necessary since, when freeing
pages, we have to know whether a virtual mapping must be destroyed.
No additional flag is consumed: PG_swapcache is used in combination
with PG_compound (similar to PageHead() and PageTail()).
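
A sketch of how the free path uses the flag (mirroring
free_compound_page() as modified in patch [08/18]):

	/* Distinguish the two kinds of compound pages when freeing */
	if (PageVcompound(page))
		free_vcompound(page_address(page));	/* tears down the vmap */
	else
		__free_pages_ok(page, compound_order(page));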

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/page-flags.h |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h	2007-10-03 19:31:51.000000000 -0700
+++ linux-2.6/include/linux/page-flags.h	2007-10-03 19:34:37.000000000 -0700
@@ -248,6 +248,24 @@ static inline void __ClearPageTail(struc
 #define __SetPageHead(page)	__SetPageCompound(page)
 #define __ClearPageHead(page)	__ClearPageCompound(page)
 
+/*
+ * PG_swapcache is used in combination with PG_compound to indicate
+ * that a compound page was allocated via vmalloc.
+ */
+#define PG_vcompound_mask ((1L << PG_compound) | (1L << PG_swapcache))
+#define PageVcompound(page)	((page->flags & PG_vcompound_mask) \
+					== PG_vcompound_mask)
+
+static inline void __SetPageVcompound(struct page *page)
+{
+	page->flags |= PG_vcompound_mask;
+}
+
+static inline void __ClearPageVcompound(struct page *page)
+{
+	page->flags &= ~PG_vcompound_mask;
+}
+
 #ifdef CONFIG_SWAP
 #define PageSwapCache(page)	test_bit(PG_swapcache, &(page)->flags)
 #define SetPageSwapCache(page)	set_bit(PG_swapcache, &(page)->flags)

-- 

* [06/18] Vcompound: Update page address determination
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel, apw

[-- Attachment #1: vcompound_page_address --]
[-- Type: text/plain, Size: 2226 bytes --]

Make page_address() correctly determine the address of a potentially
virtually mapped compound page.

There are 3 cases to consider:

1. !HASHED_PAGE_VIRTUAL && !WANT_PAGE_VIRTUAL

Call vmalloc_address() directly from the page_address function
defined in mm.h.

2. HASHED_PAGE_VIRTUAL

Modify page_address() in highmem.c to call vmalloc_address().

3. WANT_PAGE_VIRTUAL

set_page_address() is used to set up the virtual addresses of
all pages that are part of the virtual compound.

Cc: apw@shadowen.org
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/mm.h |    9 ++++++++-
 mm/highmem.c       |   10 ++++++++--
 2 files changed, 16 insertions(+), 3 deletions(-)

Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2007-10-03 19:39:52.000000000 -0700
+++ linux-2.6/include/linux/mm.h	2007-10-03 19:40:29.000000000 -0700
@@ -605,7 +605,14 @@ void page_address_init(void);
 #endif
 
 #if !defined(HASHED_PAGE_VIRTUAL) && !defined(WANT_PAGE_VIRTUAL)
-#define page_address(page) lowmem_page_address(page)
+
+static inline void *page_address(struct page *page)
+{
+	if (unlikely(PageVcompound(page)))
+		return vmalloc_address(page);
+	return lowmem_page_address(page);
+}
+
 #define set_page_address(page, address)  do { } while(0)
 #define page_address_init()  do { } while(0)
 #endif
Index: linux-2.6/mm/highmem.c
===================================================================
--- linux-2.6.orig/mm/highmem.c	2007-10-03 19:39:25.000000000 -0700
+++ linux-2.6/mm/highmem.c	2007-10-03 19:40:29.000000000 -0700
@@ -265,8 +265,11 @@ void *page_address(struct page *page)
 	void *ret;
 	struct page_address_slot *pas;
 
-	if (!PageHighMem(page))
+	if (!PageHighMem(page)) {
+		if (PageVcompound(page))
+			return vmalloc_address(page);
 		return lowmem_page_address(page);
+	}
 
 	pas = page_slot(page);
 	ret = NULL;
@@ -294,7 +297,10 @@ void set_page_address(struct page *page,
 	struct page_address_slot *pas;
 	struct page_address_map *pam;
 
-	BUG_ON(!PageHighMem(page));
+	if (!PageHighMem(page)) {
+		BUG_ON(!PageVcompound(page));
+		return;
+	}
 
 	pas = page_slot(page);
 	if (virtual) {		/* Add */

-- 

* [07/18] Vcompound: Add compound_nth_page() to determine nth base page
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel

[-- Attachment #1: vcompound_compound_nth_page --]
[-- Type: text/plain, Size: 4137 bytes --]

Add two new functions

        compound_nth_page(page, n)
        vmalloc_nth_page(page, n)

to find the nth page of a compound page. For real compound pages
this simply reduces to page + n. For virtual compound pages we need
to consult the page tables to figure out the nth page from the one
specified.

Update all references to page[1] to use compound_nth_page() instead.
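
A sketch of iterating over the base pages of a possibly virtual
compound page (head is assumed to be the compound's head page):

	int i;

	for (i = 0; i < (1 << compound_order(head)); i++) {
		/* page + i would be wrong for a virtually mapped compound */
		struct page *p = compound_nth_page(head, i);
		/* ... operate on base page p ... */
	}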

---
 include/linux/mm.h |   17 +++++++++++++----
 mm/page_alloc.c    |   16 +++++++++++-----
 mm/vmalloc.c       |   10 ++++++++++
 3 files changed, 34 insertions(+), 9 deletions(-)

Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2007-10-03 19:31:45.000000000 -0700
+++ linux-2.6/include/linux/mm.h	2007-10-03 19:31:51.000000000 -0700
@@ -295,6 +295,8 @@ static inline int get_page_unless_zero(s
 }
 
 void *vmalloc_address(struct page *);
+struct page *vmalloc_to_page(void *addr);
+struct page *vmalloc_nth_page(struct page *page, int n);
 
 static inline struct page *compound_head(struct page *page)
 {
@@ -338,27 +340,34 @@ void split_page(struct page *page, unsig
  */
 typedef void compound_page_dtor(struct page *);
 
+static inline struct page *compound_nth_page(struct page *page, int n)
+{
+	if (likely(!PageVcompound(page)))
+		return page + n;
+	return vmalloc_nth_page(page, n);
+}
+
 static inline void set_compound_page_dtor(struct page *page,
 						compound_page_dtor *dtor)
 {
-	page[1].lru.next = (void *)dtor;
+	compound_nth_page(page, 1)->lru.next = (void *)dtor;
 }
 
 static inline compound_page_dtor *get_compound_page_dtor(struct page *page)
 {
-	return (compound_page_dtor *)page[1].lru.next;
+	return (compound_page_dtor *)compound_nth_page(page, 1)->lru.next;
 }
 
 static inline int compound_order(struct page *page)
 {
 	if (!PageHead(page))
 		return 0;
-	return (unsigned long)page[1].lru.prev;
+	return (unsigned long)compound_nth_page(page, 1)->lru.prev;
 }
 
 static inline void set_compound_order(struct page *page, unsigned long order)
 {
-	page[1].lru.prev = (void *)order;
+	compound_nth_page(page, 1)->lru.prev = (void *)order;
 }
 
 /*
Index: linux-2.6/mm/vmalloc.c
===================================================================
--- linux-2.6.orig/mm/vmalloc.c	2007-10-03 19:31:45.000000000 -0700
+++ linux-2.6/mm/vmalloc.c	2007-10-03 19:31:51.000000000 -0700
@@ -541,6 +541,16 @@ void *vmalloc(unsigned long size)
 }
 EXPORT_SYMBOL(vmalloc);
 
+/*
+ * Given a pointer to the first page struct:
+ * Determine a pointer to the nth page.
+ */
+struct page *vmalloc_nth_page(struct page *page, int n)
+{
+	return vmalloc_to_page(page_address(page) + n * PAGE_SIZE);
+}
+EXPORT_SYMBOL(vmalloc_nth_page);
+
 /**
  * vmalloc_user - allocate zeroed virtually contiguous memory for userspace
  * @size: allocation size
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2007-10-03 19:31:51.000000000 -0700
+++ linux-2.6/mm/page_alloc.c	2007-10-03 19:32:45.000000000 -0700
@@ -274,7 +274,7 @@ static void prep_compound_page(struct pa
 	set_compound_order(page, order);
 	__SetPageHead(page);
 	for (i = 1; i < nr_pages; i++) {
-		struct page *p = page + i;
+		struct page *p = compound_nth_page(page, i);
 
 		__SetPageTail(p);
 		p->first_page = page;
@@ -289,17 +289,23 @@ static void destroy_compound_page(struct
 	if (unlikely(compound_order(page) != order))
 		bad_page(page);
 
-	if (unlikely(!PageHead(page)))
-			bad_page(page);
-	__ClearPageHead(page);
 	for (i = 1; i < nr_pages; i++) {
-		struct page *p = page + i;
+		struct page *p = compound_nth_page(page,  i);
 
 		if (unlikely(!PageTail(p) |
 				(p->first_page != page)))
 			bad_page(page);
 		__ClearPageTail(p);
 	}
+
+	/*
+	 * The PageHead is important since it determines how operations on
+	 * a compound page have to be performed. We can only tear the head
+	 * down after all the tail pages are done.
+	 */
+	if (unlikely(!PageHead(page)))
+			bad_page(page);
+	__ClearPageHead(page);
 }
 
 static inline void prep_zero_page(struct page *page, int order, gfp_t gfp_flags)

-- 

* [08/18] GFP_VFALLBACK: Allow fallback of compound pages to virtual mappings
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel

[-- Attachment #1: vcompound_core --]
[-- Type: text/plain, Size: 7133 bytes --]

Add a new gfp flag

	__GFP_VFALLBACK

If specified during a higher order allocation then the system will fall
back to vmap if no physically contiguous pages can be found. This will
create a virtually contiguous area instead of a physically contiguous area.
In many cases the virtually contiguous area can stand in for the physically
contiguous area (with some loss of performance).

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/gfp.h |    5 +
 mm/page_alloc.c     |  139 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 139 insertions(+), 5 deletions(-)

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2007-10-03 19:44:07.000000000 -0700
+++ linux-2.6/mm/page_alloc.c	2007-10-03 19:44:08.000000000 -0700
@@ -60,6 +60,9 @@ long nr_swap_pages;
 int percpu_pagelist_fraction;
 
 static void __free_pages_ok(struct page *page, unsigned int order);
+static struct page *alloc_vcompound(gfp_t, int,
+					struct zonelist *, unsigned long);
+static void destroy_compound_page(struct page *page, unsigned long order);
 
 /*
  * results with 256, 32 in the lowmem_reserve sysctl:
@@ -260,9 +263,51 @@ static void bad_page(struct page *page)
  * This usage means that zero-order pages may not be compound.
  */
 
+static void __free_vcompound(void *addr)
+{
+	struct page **pages;
+	int i;
+	struct page *page = vmalloc_to_page(addr);
+	int order = compound_order(page);
+	int nr_pages = 1 << order;
+
+	if (!PageVcompound(page) || !PageHead(page)) {
+		bad_page(page);
+		return;
+	}
+	destroy_compound_page(page, order);
+	pages = vunmap(addr);
+	/*
+	 * First page will have zero refcount since it maintains state
+	 * for the compound and was decremented before we got here.
+	 */
+	set_page_address(page, NULL);
+	__ClearPageVcompound(page);
+	free_hot_page(page);
+
+	for (i = 1; i < nr_pages; i++) {
+		page = pages[i];
+		set_page_address(page, NULL);
+		__ClearPageVcompound(page);
+		__free_page(page);
+	}
+	kfree(pages);
+}
+
+
+static void free_vcompound(void *addr)
+{
+	__free_vcompound(addr);
+}
+
 static void free_compound_page(struct page *page)
 {
-	__free_pages_ok(page, compound_order(page));
+	if (PageVcompound(page))
+		free_vcompound(page_address(page));
+	else {
+		destroy_compound_page(page, compound_order(page));
+		__free_pages_ok(page, compound_order(page));
+	}
 }
 
 static void prep_compound_page(struct page *page, unsigned long order)
@@ -1259,6 +1304,67 @@ try_next_zone:
 }
 
 /*
+ * Virtual Compound Page support.
+ *
+ * Virtual Compound Pages are used to fall back to order 0 allocations if large
+ * linear mappings are not available and __GFP_VFALLBACK is set. They are
+ * formatted according to compound page conventions. I.e. following
+ * page->first_page if PageTail(page) is set can be used to determine the
+ * head page.
+ */
+static noinline struct page *alloc_vcompound(gfp_t gfp_mask, int order,
+		struct zonelist *zonelist, unsigned long alloc_flags)
+{
+	struct page *page;
+	int i;
+	struct vm_struct *vm;
+	int nr_pages = 1 << order;
+	struct page **pages = kmalloc(nr_pages * sizeof(struct page *),
+						gfp_mask & GFP_LEVEL_MASK);
+	struct page **pages2;
+
+	if (!pages)
+		return NULL;
+
+	gfp_mask &= ~(__GFP_COMP | __GFP_VFALLBACK);
+	for (i = 0; i < nr_pages; i++) {
+		page = get_page_from_freelist(gfp_mask, 0, zonelist,
+							alloc_flags);
+		if (!page)
+			goto abort;
+
+		/* Sets PageCompound which makes PageHead(page) true */
+		__SetPageVcompound(page);
+		pages[i] = page;
+	}
+
+	vm = get_vm_area_node(nr_pages << PAGE_SHIFT, VM_MAP,
+			zone_to_nid(zonelist->zones[0]), gfp_mask);
+	pages2 = pages;
+	if (!vm || map_vm_area(vm, PAGE_KERNEL, &pages2))
+		goto abort;
+
+	prep_compound_page(pages[0], order);
+
+	for (i = 0; i < nr_pages; i++)
+		set_page_address(pages[i], vm->addr + (i << PAGE_SHIFT));
+
+	return pages[0];
+
+abort:
+	while (i-- > 0) {
+		page = pages[i];
+		if (!page)
+			continue;
+		set_page_address(page, NULL);
+		__ClearPageVcompound(page);
+		__free_page(page);
+	}
+	kfree(pages);
+	return NULL;
+}
+
+/*
  * This is the 'heart' of the zoned buddy allocator.
  */
 struct page * fastcall
@@ -1353,12 +1459,12 @@ nofail_alloc:
 				goto nofail_alloc;
 			}
 		}
-		goto nopage;
+		goto try_vcompound;
 	}
 
 	/* Atomic allocations - we can't balance anything */
 	if (!wait)
-		goto nopage;
+		goto try_vcompound;
 
 	cond_resched();
 
@@ -1389,6 +1495,11 @@ nofail_alloc:
 		 */
 		page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order,
 				zonelist, ALLOC_WMARK_HIGH|ALLOC_CPUSET);
+
+		if (!page && order && (gfp_mask & __GFP_VFALLBACK))
+			page = alloc_vcompound(gfp_mask, order,
+					zonelist, alloc_flags);
+
 		if (page)
 			goto got_pg;
 
@@ -1420,6 +1531,14 @@ nofail_alloc:
 		goto rebalance;
 	}
 
+try_vcompound:
+	/* Last chance before failing the allocation */
+	if (order && (gfp_mask & __GFP_VFALLBACK)) {
+		page = alloc_vcompound(gfp_mask, order,
+					zonelist, alloc_flags);
+		if (page)
+			goto got_pg;
+	}
 nopage:
 	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
 		printk(KERN_WARNING "%s: page allocation failure."
@@ -1480,6 +1599,9 @@ fastcall void __free_pages(struct page *
 		if (order == 0)
 			free_hot_page(page);
 		else
+		if (unlikely(PageHead(page)))
+			free_compound_page(page);
+		else
 			__free_pages_ok(page, order);
 	}
 }
@@ -1489,8 +1611,15 @@ EXPORT_SYMBOL(__free_pages);
 fastcall void free_pages(unsigned long addr, unsigned int order)
 {
 	if (addr != 0) {
-		VM_BUG_ON(!virt_addr_valid((void *)addr));
-		__free_pages(virt_to_page((void *)addr), order);
+		struct page *page;
+
+		if (unlikely(addr >= VMALLOC_START && addr < VMALLOC_END))
+			page = vmalloc_to_page((void *)addr);
+		else {
+			VM_BUG_ON(!virt_addr_valid((void *)addr));
+			page = virt_to_page((void *)addr);
+		}
+		__free_pages(page, order);
 	}
 }
 
Index: linux-2.6/include/linux/gfp.h
===================================================================
--- linux-2.6.orig/include/linux/gfp.h	2007-10-03 19:44:07.000000000 -0700
+++ linux-2.6/include/linux/gfp.h	2007-10-03 19:44:08.000000000 -0700
@@ -43,6 +43,7 @@ struct vm_area_struct;
 #define __GFP_REPEAT	((__force gfp_t)0x400u)	/* Retry the allocation.  Might fail */
 #define __GFP_NOFAIL	((__force gfp_t)0x800u)	/* Retry for ever.  Cannot fail */
 #define __GFP_NORETRY	((__force gfp_t)0x1000u)/* Do not retry.  Might fail */
+#define __GFP_VFALLBACK	((__force gfp_t)0x2000u)/* Permit fallback to vmalloc */
 #define __GFP_COMP	((__force gfp_t)0x4000u)/* Add compound page metadata */
 #define __GFP_ZERO	((__force gfp_t)0x8000u)/* Return zeroed page on success */
 #define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
@@ -86,6 +87,10 @@ struct vm_area_struct;
 #define GFP_THISNODE	((__force gfp_t)0)
 #endif
 
+/*
+ * Allocate large page but allow fallback to a virtually mapped page
+ */
+#define GFP_VFALLBACK	(GFP_KERNEL | __GFP_VFALLBACK)
 
 /* Flag - indicates that the buffer will be suitable for DMA.  Ignored on some
    platforms, used as appropriate on others */

-- 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [09/18] Vcompound: GFP_VFALLBACK debugging aid
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel

[-- Attachment #1: vcompound_debugging_aid --]
[-- Type: text/plain, Size: 1881 bytes --]

Virtual fallbacks are rare and thus subtle bugs may creep in if we do not
test the fallback paths. CONFIG_VFALLBACK_ALWAYS makes all GFP_VFALLBACK
allocations fall back to virtual mappings.
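
A quick way to verify from a test that the forced fallback is in effect
(a sketch; assumes vmapped areas lie between VMALLOC_START and
VMALLOC_END):

	void *p = (void *)__get_free_pages(GFP_VFALLBACK, 2);

	if (p) {
		/* Holds once the system is up; boot-time allocations
		 * are exempt (see the SYSTEM_RUNNING check below). */
		WARN_ON((unsigned long)p < VMALLOC_START ||
			(unsigned long)p >= VMALLOC_END);
		free_pages((unsigned long)p, 2);
	}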

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 lib/Kconfig.debug |   11 +++++++++++
 mm/page_alloc.c   |    6 ++++++
 2 files changed, 17 insertions(+)

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2007-10-03 18:04:33.000000000 -0700
+++ linux-2.6/mm/page_alloc.c	2007-10-03 18:07:16.000000000 -0700
@@ -1257,6 +1257,12 @@ zonelist_scan:
 			}
 		}
 
+#ifdef CONFIG_VFALLBACK_ALWAYS
+		if ((gfp_mask & __GFP_VFALLBACK) &&
+				system_state == SYSTEM_RUNNING)
+			return alloc_vcompound(gfp_mask, order,
+					zonelist, alloc_flags);
+#endif
 		page = buffered_rmqueue(zonelist, zone, order, gfp_mask);
 		if (page)
 			break;
Index: linux-2.6/lib/Kconfig.debug
===================================================================
--- linux-2.6.orig/lib/Kconfig.debug	2007-10-03 18:04:29.000000000 -0700
+++ linux-2.6/lib/Kconfig.debug	2007-10-03 18:07:16.000000000 -0700
@@ -105,6 +105,17 @@ config DETECT_SOFTLOCKUP
 	   can be detected via the NMI-watchdog, on platforms that
 	   support it.)
 
+config VFALLBACK_ALWAYS
+	bool "Always fall back to Virtual Compound pages"
+	default y
+	help
+	  Virtual compound pages are only allocated if there is no linear
+	  memory available. They are a fallback and errors created by the
+	  use of virtual mappings instead of linear ones may not surface
+	  because of their infrequent use. This option makes every
+	  allocation that allows a fallback to a virtual mapping use
+	  the virtual mapping. May have a significant performance impact.
+
 config SCHED_DEBUG
 	bool "Collect scheduler debugging info"
 	depends on DEBUG_KERNEL && PROC_FS

-- 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [10/18] Sparsemem: Use fallback for the memmap.
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel, apw

[-- Attachment #1: vcompound_sparse_gfp_vfallback --]
[-- Type: text/plain, Size: 1661 bytes --]

Sparsemem currently attempts a physically contiguous mapping first and
then falls back to vmalloc. The same thing can now be accomplished
using GFP_VFALLBACK.

Cc: apw@shadowen.org
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/sparse.c |   33 +++------------------------------
 1 file changed, 3 insertions(+), 30 deletions(-)

Index: linux-2.6/mm/sparse.c
===================================================================
--- linux-2.6.orig/mm/sparse.c	2007-10-02 22:02:58.000000000 -0700
+++ linux-2.6/mm/sparse.c	2007-10-02 22:19:58.000000000 -0700
@@ -269,40 +269,13 @@ void __init sparse_init(void)
 #ifdef CONFIG_MEMORY_HOTPLUG
 static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
 {
-	struct page *page, *ret;
-	unsigned long memmap_size = sizeof(struct page) * nr_pages;
-
-	page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
-	if (page)
-		goto got_map_page;
-
-	ret = vmalloc(memmap_size);
-	if (ret)
-		goto got_map_ptr;
-
-	return NULL;
-got_map_page:
-	ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
-got_map_ptr:
-	memset(ret, 0, memmap_size);
-
-	return ret;
-}
-
-static int vaddr_in_vmalloc_area(void *addr)
-{
-	if (addr >= (void *)VMALLOC_START &&
-	    addr < (void *)VMALLOC_END)
-		return 1;
-	return 0;
+	return (struct page *)__get_free_pages(GFP_VFALLBACK | __GFP_ZERO,
+			get_order(sizeof(struct page) * nr_pages));
 }
 
 static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
 {
-	if (vaddr_in_vmalloc_area(memmap))
-		vfree(memmap);
-	else
-		free_pages((unsigned long)memmap,
+	free_pages((unsigned long)memmap,
 			   get_order(sizeof(struct page) * nr_pages));
 }
 

-- 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [11/18] Page allocator: Use a higher order allocation for the zone wait table.
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel

[-- Attachment #1: vcompound_wait_table_no_vmalloc --]
[-- Type: text/plain, Size: 1117 bytes --]

Currently vmalloc is used for the zone wait table, which means the MMU
has to consult the vmalloc page tables to access the wait table. We can
now use GFP_VFALLBACK to attempt a physically contiguous allocation that
can then be covered by the large kernel TLB entries.

Drawback: The zone wait table allocation is rounded up to the next power
of two, which may cost some memory. A table needing 9k, for example, now
takes an order 2 (16k) allocation where vmalloc would have used 12k.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/page_alloc.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2007-10-03 18:07:16.000000000 -0700
+++ linux-2.6/mm/page_alloc.c	2007-10-03 18:07:20.000000000 -0700
@@ -2585,7 +2585,9 @@ int zone_wait_table_init(struct zone *zo
 		 * To use this new node's memory, further consideration will be
 		 * necessary.
 		 */
-		zone->wait_table = (wait_queue_head_t *)vmalloc(alloc_size);
+		zone->wait_table = (wait_queue_head_t *)
+			__get_free_pages(GFP_VFALLBACK,
+					get_order(alloc_size));
 	}
 	if (!zone->wait_table)
 		return -ENOMEM;

-- 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [12/18] Wait: Allow bit_waitqueue to wait on a bit in a virtual compound page
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel

[-- Attachment #1: vcompound_wait_on_virtually_mapped_object --]
[-- Type: text/plain, Size: 887 bytes --]

If bit_waitqueue() is passed an address in a virtually mapped area then
it must use virt_to_head_page() instead of virt_to_page(), which only
works for the linear kernel mapping.
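
The difference matters because a vmapped address cannot be resolved by
address arithmetic alone. A sketch of the lookup this relies on (assumed
shape; the vcompound-aware virt_to_head_page() comes from an earlier
patch in this series):

	static inline struct page *virt_to_head_page(const void *addr)
	{
		unsigned long a = (unsigned long)addr;
		struct page *page;

		if (a >= VMALLOC_START && a < VMALLOC_END)
			page = vmalloc_to_page(addr);	/* page table walk */
		else
			page = virt_to_page(addr);	/* linear mapping */

		return PageTail(page) ? page->first_page : page;
	}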

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 kernel/wait.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/kernel/wait.c
===================================================================
--- linux-2.6.orig/kernel/wait.c	2007-10-03 17:44:21.000000000 -0700
+++ linux-2.6/kernel/wait.c	2007-10-03 17:53:07.000000000 -0700
@@ -245,7 +245,7 @@ EXPORT_SYMBOL(wake_up_bit);
 fastcall wait_queue_head_t *bit_waitqueue(void *word, int bit)
 {
 	const int shift = BITS_PER_LONG == 32 ? 5 : 6;
-	const struct zone *zone = page_zone(virt_to_page(word));
+	const struct zone *zone = page_zone(virt_to_head_page(word));
 	unsigned long val = (unsigned long)word << shift | bit;
 
 	return &zone->wait_table[hash_long(val, zone->wait_table_bits)];

-- 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [13/18] x86_64: Allow fallback for the stack
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel, ak, travis

[-- Attachment #1: vcompound_x86_64_stack_fallback --]
[-- Type: text/plain, Size: 1708 bytes --]

Peter Zijlstra has recently demonstrated that we can have order 1 allocation
failures under memory pressure with small memory configurations. The
x86_64 stack has a size of 8k and thus requires an order 1 allocation.

This patch adds a virtual fallback capability for the stack. The system may
continue even in extreme situations and we may be able to increase the stack
size if necessary (see next patch).

Cc: ak@suse.de
Cc: travis@sgi.com
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/asm-x86_64/thread_info.h |   16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

Index: linux-2.6/include/asm-x86_64/thread_info.h
===================================================================
--- linux-2.6.orig/include/asm-x86_64/thread_info.h	2007-10-03 14:49:48.000000000 -0700
+++ linux-2.6/include/asm-x86_64/thread_info.h	2007-10-03 14:51:00.000000000 -0700
@@ -74,20 +74,14 @@ static inline struct thread_info *stack_
 
 /* thread information allocation */
 #ifdef CONFIG_DEBUG_STACK_USAGE
-#define alloc_thread_info(tsk)					\
-    ({								\
-	struct thread_info *ret;				\
-								\
-	ret = ((struct thread_info *) __get_free_pages(GFP_KERNEL,THREAD_ORDER)); \
-	if (ret)						\
-		memset(ret, 0, THREAD_SIZE);			\
-	ret;							\
-    })
+#define THREAD_FLAGS (GFP_VFALLBACK | __GFP_ZERO)
 #else
-#define alloc_thread_info(tsk) \
-	((struct thread_info *) __get_free_pages(GFP_KERNEL,THREAD_ORDER))
+#define THREAD_FLAGS GFP_VFALLBACK
 #endif
 
+#define alloc_thread_info(tsk) \
+	((struct thread_info *) __get_free_pages(THREAD_FLAGS, THREAD_ORDER))
+
 #define free_thread_info(ti) free_pages((unsigned long) (ti), THREAD_ORDER)
 
 #else /* !__ASSEMBLY__ */

-- 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [14/18] Configure stack size
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel, ak, travis

[-- Attachment #1: vcompound_x86_64_config_stack_size --]
[-- Type: text/plain, Size: 2506 bytes --]

Make the stack size configurable now that we can fall back to vmalloc if
necessary. SGI NUMA configurations may need more stack because cpumasks
and nodemasks are at times kept on the stack. With the coming 16k cpu
support the mask alone will take 2k. This patch allows running with 16k
or 32k kernel stacks on x86_64.
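
For example, a .config fragment for 16k stacks (values illustrative):

	# THREAD_SIZE = PAGE_SIZE << 2 = 16k
	CONFIG_THREAD_ORDER=2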

Cc: ak@suse.de
Cc: travis@sgi.com
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 arch/x86_64/Kconfig              |    6 ++++++
 include/asm-x86_64/page.h        |    3 +--
 include/asm-x86_64/thread_info.h |    4 ++--
 3 files changed, 9 insertions(+), 4 deletions(-)

Index: linux-2.6/arch/x86_64/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86_64/Kconfig	2007-10-03 18:11:20.000000000 -0700
+++ linux-2.6/arch/x86_64/Kconfig	2007-10-03 18:12:13.000000000 -0700
@@ -363,6 +363,12 @@ config NODES_SHIFT
 	default "6"
 	depends on NEED_MULTIPLE_NODES
 
+config THREAD_ORDER
+	int "Kernel stack size (in page order)"
+	default "1"
+	help
+	  Page order for the thread stack.
+
 # Dummy CONFIG option to select ACPI_NUMA from drivers/acpi/Kconfig.
 
 config X86_64_ACPI_NUMA
Index: linux-2.6/include/asm-x86_64/page.h
===================================================================
--- linux-2.6.orig/include/asm-x86_64/page.h	2007-10-03 18:11:20.000000000 -0700
+++ linux-2.6/include/asm-x86_64/page.h	2007-10-03 18:12:13.000000000 -0700
@@ -9,8 +9,7 @@
 #define PAGE_MASK	(~(PAGE_SIZE-1))
 #define PHYSICAL_PAGE_MASK	(~(PAGE_SIZE-1) & __PHYSICAL_MASK)
 
-#define THREAD_ORDER 1 
-#define THREAD_SIZE  (PAGE_SIZE << THREAD_ORDER)
+#define THREAD_SIZE  (PAGE_SIZE << CONFIG_THREAD_ORDER)
 #define CURRENT_MASK (~(THREAD_SIZE-1))
 
 #define EXCEPTION_STACK_ORDER 0
Index: linux-2.6/include/asm-x86_64/thread_info.h
===================================================================
--- linux-2.6.orig/include/asm-x86_64/thread_info.h	2007-10-03 18:12:13.000000000 -0700
+++ linux-2.6/include/asm-x86_64/thread_info.h	2007-10-03 18:12:13.000000000 -0700
@@ -80,9 +80,9 @@ static inline struct thread_info *stack_
 #endif
 
 #define alloc_thread_info(tsk) \
-	((struct thread_info *) __get_free_pages(THREAD_FLAGS, THREAD_ORDER))
+	((struct thread_info *) __get_free_pages(THREAD_FLAGS, CONFIG_THREAD_ORDER))
 
-#define free_thread_info(ti) free_pages((unsigned long) (ti), THREAD_ORDER)
+#define free_thread_info(ti) free_pages((unsigned long) (ti), CONFIG_THREAD_ORDER)
 
 #else /* !__ASSEMBLY__ */
 

-- 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [15/18] Fallback for temporary order 2 allocation
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel, Dan Williams

[-- Attachment #1: vcompound_crypto --]
[-- Type: text/plain, Size: 837 bytes --]

The crypto subsystem needs an order 2 allocation. This is a temporary buffer
for xoring data, so we can safely allow fallback.

Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 crypto/xor.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/crypto/xor.c
===================================================================
--- linux-2.6.orig/crypto/xor.c	2007-10-03 18:11:20.000000000 -0700
+++ linux-2.6/crypto/xor.c	2007-10-03 18:12:14.000000000 -0700
@@ -101,7 +101,7 @@ calibrate_xor_blocks(void)
 	void *b1, *b2;
 	struct xor_block_template *f, *fastest;
 
-	b1 = (void *) __get_free_pages(GFP_KERNEL, 2);
+	b1 = (void *) __get_free_pages(GFP_VFALLBACK, 2);
 	if (!b1) {
 		printk(KERN_WARNING "xor: Yikes!  No memory available.\n");
 		return -ENOMEM;

-- 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [16/18] Virtual Compound page allocation from interrupt context.
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel

[-- Attachment #1: vcompound_interrupt_alloc --]
[-- Type: text/plain, Size: 1527 bytes --]

In an interrupt context we cannot wait for the vmlist_lock in
__get_vm_area_node(). So use a trylock instead. If the trylock fails
then the atomic allocation will fail and subsequently be retried.

This only works because flush_cache_vunmap(), used on the allocation
path, never performs any IPIs, in contrast to the flush_tlb_... functions
used when freeing. flush_cache_vunmap() is only needed on architectures
with a virtually mapped cache (xtensa, pa-risc).

[Note: Nick Piggin is working on a scheme to make this simpler by
no longer requiring flushes]
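
For illustration, the retry is up to the caller of the atomic allocation
(a hypothetical caller, not part of the patch):

	/* In interrupt context: may now fail if the trylock misses. */
	page = alloc_pages(GFP_ATOMIC | __GFP_VFALLBACK, order);
	if (!page)
		return -ENOMEM;	/* caller retries later from process context */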

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/vmalloc.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/vmalloc.c
===================================================================
--- linux-2.6.orig/mm/vmalloc.c	2007-10-03 16:21:10.000000000 -0700
+++ linux-2.6/mm/vmalloc.c	2007-10-03 16:25:17.000000000 -0700
@@ -177,7 +177,6 @@ static struct vm_struct *__get_vm_area_n
 	unsigned long align = 1;
 	unsigned long addr;
 
-	BUG_ON(in_interrupt());
 	if (flags & VM_IOREMAP) {
 		int bit = fls(size);
 
@@ -202,7 +201,14 @@ static struct vm_struct *__get_vm_area_n
 	 */
 	size += PAGE_SIZE;
 
-	write_lock(&vmlist_lock);
+	if (gfp_mask & __GFP_WAIT)
+		write_lock(&vmlist_lock);
+	else {
+		if (!write_trylock(&vmlist_lock)) {
+			kfree(area);
+			return NULL;
+		}
+	}
 	for (p = &vmlist; (tmp = *p) != NULL ;p = &tmp->next) {
 		if ((unsigned long)tmp->addr < addr) {
 			if((unsigned long)tmp->addr + tmp->size >= addr)

-- 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [17/18] Virtual compound page freeing in interrupt context
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel

[-- Attachment #1: vcompound_interrupt_free --]
[-- Type: text/plain, Size: 1108 bytes --]

If we are in an interrupt context then simply defer the free via a workqueue.

Removing a virtual mapping *must* be done with interrupts enabled
since the tlb_xx functions that get called rely on interrupts for
processor-to-processor communication.
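
The hunk below reuses the start of the vmapped area itself as the work
item; this is safe because the area stays mapped until __free_vcompound()
runs from the workqueue. In outline (a sketch of the idea, assuming the
area is at least one page):

	struct work_struct *w = addr;	/* borrow the doomed memory */

	BUILD_BUG_ON(sizeof(struct work_struct) > PAGE_SIZE);
	INIT_WORK(w, vcompound_free_work);	/* handler frees addr */
	schedule_work(w);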

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/page_alloc.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2007-10-03 20:00:37.000000000 -0700
+++ linux-2.6/mm/page_alloc.c	2007-10-03 20:01:09.000000000 -0700
@@ -294,10 +294,20 @@ static void __free_vcompound(void *addr)
 	kfree(pages);
 }
 
+static void vcompound_free_work(struct work_struct *w)
+{
+	__free_vcompound((void *)w);
+}
 
 static void free_vcompound(void *addr)
 {
-	__free_vcompound(addr);
+	struct work_struct *w = addr;
+
+	if (irqs_disabled() || in_interrupt()) {
+		INIT_WORK(w, vcompound_free_work);
+		schedule_work(w);
+	} else
+		__free_vcompound(w);
 }
 
 static void free_compound_page(struct page *page)

-- 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [18/18] SLUB: Use fallback for table of callers/freers of a slab cache
  2007-10-04  3:59 ` Christoph Lameter
@ 2007-10-04  3:59   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04  3:59 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel

[-- Attachment #1: vcompound_slub_safe --]
[-- Type: text/plain, Size: 870 bytes --]

The caller table can get quite large if there are many call sites for a
particular slab. Adding __GFP_VFALLBACK allows falling back to a virtual
mapping in case the caller table gets too big and memory is fragmented.
Currently we would simply fail the operation.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/slub.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2007-10-03 20:00:23.000000000 -0700
+++ linux-2.6/mm/slub.c	2007-10-03 20:01:12.000000000 -0700
@@ -3003,7 +3003,8 @@ static int alloc_loc_track(struct loc_tr
 
 	order = get_order(sizeof(struct location) * max);
 
-	l = (void *)__get_free_pages(flags, order);
+	l = (void *)__get_free_pages(flags | __GFP_COMP | __GFP_VFALLBACK,
+								order);
 	if (!l)
 		return 0;
 

-- 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [14/18] Configure stack size
  2007-10-04  3:59   ` Christoph Lameter
@ 2007-10-04  4:36     ` Arjan van de Ven
  -1 siblings, 0 replies; 85+ messages in thread
From: Arjan van de Ven @ 2007-10-04  4:36 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, linux-kernel, ak, travis

On Wed, 03 Oct 2007 20:59:49 -0700
Christoph Lameter <clameter@sgi.com> wrote:

> Make the stack size configurable now that we can fallback to vmalloc
> if necessary. SGI NUMA configurations may need more stack because
> cpumasks and nodemasks are at times kept on the stack. With the
> coming 16k cpu support this is going to be 2k just for the mask. This
> patch allows to run with 16k or 32k kernel stacks on x86_74.

there is still code that does DMA from and to the stack...
how would this work with a virtually allocated stack?

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [14/18] Configure stack size
  2007-10-04  4:36     ` Arjan van de Ven
@ 2007-10-04  4:43       ` David Miller
  -1 siblings, 0 replies; 85+ messages in thread
From: David Miller @ 2007-10-04  4:43 UTC (permalink / raw)
  To: arjan; +Cc: clameter, akpm, linux-mm, linux-kernel, ak, travis

From: Arjan van de Ven <arjan@infradead.org>
Date: Wed, 3 Oct 2007 21:36:31 -0700

> there is still code that does DMA from and to the stack....
> how would this work with virtual allocated stack?

That's a bug and must be fixed.

There honestly shouldn't be that many examples around.

FWIW, there are platforms using a virtually allocated kernel stack
already.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [14/18] Configure stack size
  2007-10-04  3:59   ` Christoph Lameter
@ 2007-10-04  9:11     ` Andi Kleen
  -1 siblings, 0 replies; 85+ messages in thread
From: Andi Kleen @ 2007-10-04  9:11 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, linux-kernel, travis

On Thursday 04 October 2007 05:59, Christoph Lameter wrote:
> Make the stack size configurable now that we can fallback to vmalloc if
> necessary. SGI NUMA configurations may need more stack because cpumasks
> and nodemasks are at times kept on the stack.  With the coming 16k cpu 
> support 

Hmm, I was told 512-byte cpumasks for x86 earlier. Why is this suddenly 2K?

2K is too much imho. If you really want to go that big you have
to look into allocating them all separately. But messing
with the stack TLB entries and risking more TLB misses
is not a good idea.

-Andi

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [13/18] x86_64: Allow fallback for the stack
  2007-10-04  3:59   ` Christoph Lameter
@ 2007-10-04 11:56     ` Andi Kleen
  -1 siblings, 0 replies; 85+ messages in thread
From: Andi Kleen @ 2007-10-04 11:56 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, linux-kernel, travis

On Thursday 04 October 2007 05:59:48 Christoph Lameter wrote:
> Peter Zijlstra has recently demonstrated that we can have order 1 allocation
> failures under memory pressure with small memory configurations. The
> x86_64 stack has a size of 8k and thus requires a order 1 allocation.

We've known for ages that it is possible. But it has always been so rare
that it was ignored.

Is there any evidence this is more common now than it used to be?

-Andi

* Re: [13/18] x86_64: Allow fallback for the stack
  2007-10-04 11:56     ` Andi Kleen
@ 2007-10-04 12:08       ` Peter Zijlstra
  -1 siblings, 0 replies; 85+ messages in thread
From: Peter Zijlstra @ 2007-10-04 12:08 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Christoph Lameter, akpm, linux-mm, linux-kernel, travis

On Thu, 2007-10-04 at 13:56 +0200, Andi Kleen wrote:
> On Thursday 04 October 2007 05:59:48 Christoph Lameter wrote:
> > Peter Zijlstra has recently demonstrated that we can have order-1 allocation
> > failures under memory pressure with small memory configurations. The
> > x86_64 stack has a size of 8k and thus requires an order-1 allocation.
> 
> We've known for ages that it is possible. But it has always been so rare
> that it was ignored.
> 
> Is there any evidence this is more common now than it used to be?

The order-1 allocation failures were GFP_ATOMIC, because SLUB uses !0
order for everything. Kernel stack allocation is GFP_KERNEL, I presume.
Also, I use 4k stacks on all my machines.

Maybe the cpumask thing needs an extended API, one that falls back to
kmalloc if NR_CPUS >> sane.

That way, that cannot be an argument to inflate stacks.
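
Roughly, a minimal sketch of such an API in userspace terms (the names,
including cpumask_var_t, and the SANE_CPUS cutoff are made up here; in
the kernel the off-stack path would use kmalloc/kfree):

#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS   4096
#define SANE_CPUS 512   /* above this, a mask is too big for the stack */
#define MASK_LONGS \
        ((NR_CPUS + 8 * sizeof(unsigned long) - 1) / (8 * sizeof(unsigned long)))

#if NR_CPUS > SANE_CPUS
/* Huge config: the "variable" is really a pointer to heap storage. */
typedef unsigned long *cpumask_var_t;

static bool alloc_cpumask_var(cpumask_var_t *mask)
{
        *mask = calloc(MASK_LONGS, sizeof(unsigned long));  /* kmalloc */
        return *mask != NULL;
}

static void free_cpumask_var(cpumask_var_t mask)
{
        free(mask);                                         /* kfree */
}
#else
/* Small config: a plain on-stack array; alloc and free compile away. */
typedef unsigned long cpumask_var_t[MASK_LONGS];

static bool alloc_cpumask_var(cpumask_var_t *mask) { (void)mask; return true; }
static void free_cpumask_var(unsigned long *mask)  { (void)mask; }
#endif

A caller declares cpumask_var_t mask; and brackets its use with
alloc_cpumask_var(&mask) / free_cpumask_var(mask), so its stack frame
never scales with NR_CPUS.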


* Re: [13/18] x86_64: Allow fallback for the stack
  2007-10-04 12:08       ` Peter Zijlstra
@ 2007-10-04 12:25         ` Andi Kleen
  -1 siblings, 0 replies; 85+ messages in thread
From: Andi Kleen @ 2007-10-04 12:25 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Christoph Lameter, akpm, linux-mm, linux-kernel, travis


> The order-1 allocation failures were GFP_ATOMIC, because SLUB uses !0
> order for everything.

slub is wrong then. Can it be fixed?

> Kernel stack allocation is GFP_KERNEL I presume. 

Of course.

> Also, I use 4k stacks on all my machines.

You don't have any x86-64 machines?

-Andi

* Re: [13/18] x86_64: Allow fallback for the stack
  2007-10-04 12:25         ` Andi Kleen
@ 2007-10-04 12:30           ` Peter Zijlstra
  -1 siblings, 0 replies; 85+ messages in thread
From: Peter Zijlstra @ 2007-10-04 12:30 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Christoph Lameter, akpm, linux-mm, linux-kernel, travis

On Thu, 2007-10-04 at 14:25 +0200, Andi Kleen wrote:
> > The order-1 allocation failures were GFP_ATOMIC, because SLUB uses !0
> > order for everything.
> 
> slub is wrong then. Can it be fixed?

I think mainline slub doesn't do this, just -mm.

See DEFAULT_MAX_ORDER in mm/slub.c

> > Kernel stack allocation is GFP_KERNEL I presume. 
> 
> Of course.
> 
> > Also, I use 4k stacks on all my machines.
> 
> You don't have any x86-64 machines?

Ah, my bad, yes I do, but I (wrongly) thought they had that option too.


* Re: [13/18] x86_64: Allow fallback for the stack
  2007-10-04 12:25         ` Andi Kleen
@ 2007-10-04 17:40           ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04 17:40 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Peter Zijlstra, akpm, linux-mm, linux-kernel, travis

On Thu, 4 Oct 2007, Andi Kleen wrote:

> > The order-1 allocation failures were GFP_ATOMIC, because SLUB uses !0
> > order for everything.
> 
> slub is wrong then. Can it be fixed?

SLUB in -mm kernels has been using higher-order allocations for some slabs
for the last 6 months or so. That is not true for upstream.


* Re: [13/18] x86_64: Allow fallback for the stack
  2007-10-04 11:56     ` Andi Kleen
@ 2007-10-04 19:20       ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04 19:20 UTC (permalink / raw)
  To: Andi Kleen; +Cc: akpm, linux-mm, linux-kernel, travis

On Thu, 4 Oct 2007, Andi Kleen wrote:

> We've known for ages that it is possible. But it has always been so rare
> that it was ignored.

Well we can now address the rarity. That is the whole point of the 
patchset.

> Is there any evidence this is more common now than it used to be?

It will be more common if the stack size is increased beyond 8k.



* Re: [14/18] Configure stack size
  2007-10-04  9:11     ` Andi Kleen
@ 2007-10-04 19:26       ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04 19:26 UTC (permalink / raw)
  To: Andi Kleen; +Cc: akpm, linux-mm, linux-kernel, travis

On Thu, 4 Oct 2007, Andi Kleen wrote:

> On Thursday 04 October 2007 05:59, Christoph Lameter wrote:
> > Make the stack size configurable now that we can fall back to vmalloc if
> > necessary. SGI NUMA configurations may need more stack because cpumasks
> > and nodemasks are at times kept on the stack.  With the coming 16k cpu 
> > support 
> 
> Hmm, I was told 512-byte cpumasks for x86 earlier. Why is this suddenly 2K?

512 bytes is for the default 4k cpu configuration, which should be enough
for most purposes. The hardware maximum is 16k, and we need at least a
kernel config option that covers the potential stack size issues.
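
(For reference, a cpumask holds one bit per possible cpu: 4096 cpus need
4096 / 8 = 512 bytes, while the 16k hardware maximum needs 16384 / 8 =
2048 bytes, which is where the 2K figure comes from.)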

> 2K is too much imho. If you really want to go that big you have
> to look into allocating them all separately. But messing
> with the stack TLB entries and risking more TLB misses
> is not a good idea.

These machines have very large amounts of memory (up to the maximum 
addressable memory of an x86_64 cpu). The fallback is as good as 
impossible. If you get into fallback then we are likely already swapping 
and doing other bad placement things. We typically tune the loads to avoid 
this.

* Re: [14/18] Configure stack size
  2007-10-04  4:43       ` David Miller, Arjan van de Ven
@ 2007-10-04 19:34         ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04 19:34 UTC (permalink / raw)
  To: David Miller; +Cc: arjan, akpm, linux-mm, linux-kernel, ak, travis

On Wed, 3 Oct 2007, David Miller wrote:

> > there is still code that does DMA from and to the stack....
> > how would this work with a virtually allocated stack?
> 
> That's a bug and must be fixed.
> 
> There honestly shouldn't be that many examples around.
> 
> FWIW, there are platforms using a virtually allocated kernel stack
> already.

There would be a way to address this: check in the DMA layer for a
virtually mapped page and then segment the I/O at the page boundaries into
the individual pages. We may need that anyway for large block sizes.
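
Very roughly, that splitting step in userspace terms, assuming 4k pages
(the function names are made up; in the kernel each per-page piece would
be resolved with something like vmalloc_to_page() and queued as its own
scatterlist entry):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096UL

/*
 * A virtually contiguous buffer is physically contiguous only within
 * a page, so split [buf, buf + len) at page boundaries and hand each
 * per-page piece to the segment handler on its own.
 */
static void segment_virtual_io(const char *buf, size_t len,
                               void (*seg)(const char *p, size_t n))
{
        while (len) {
                uintptr_t off = (uintptr_t)buf & (PAGE_SIZE - 1);
                size_t n = PAGE_SIZE - off;     /* bytes left in this page */

                if (n > len)
                        n = len;
                seg(buf, n);                    /* one DMA segment per page */
                buf += n;
                len -= n;
        }
}

static void show(const char *p, size_t n)
{
        printf("segment at %p, %zu bytes\n", (const void *)p, n);
}

int main(void)
{
        static char area[3 * PAGE_SIZE];

        /* An unaligned span, normally crossing two page boundaries. */
        segment_virtual_io(area + 100, 2 * PAGE_SIZE, show);
        return 0;
}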




* Re: [13/18] x86_64: Allow fallback for the stack
  2007-10-04 19:20       ` Christoph Lameter
@ 2007-10-04 19:39         ` Rik van Riel
  -1 siblings, 0 replies; 85+ messages in thread
From: Rik van Riel @ 2007-10-04 19:39 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andi Kleen, akpm, linux-mm, linux-kernel, travis

On Thu, 4 Oct 2007 12:20:50 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> On Thu, 4 Oct 2007, Andi Kleen wrote:
> 
> > We've known for ages that it is possible. But it has always been so
> > rare that it was ignored.
> 
> Well we can now address the rarity. That is the whole point of the 
> patchset.

Introducing complexity to fight a very rare problem with a good
fallback (refusing to fork more tasks, as well as lumpy reclaim)
somehow does not seem like a good tradeoff.
 
> > Is there any evidence this is more common now than it used to be?
> 
> It will be more common if the stack size is increased beyond 8k.

Why would we want to do such a thing?

8kB stacks are large enough...

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

* Re: [13/18] x86_64: Allow fallback for the stack
  2007-10-04 19:39         ` Rik van Riel
@ 2007-10-04 21:20           ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-04 21:20 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Andi Kleen, akpm, linux-mm, linux-kernel, travis

On Thu, 4 Oct 2007, Rik van Riel wrote:

> > Well we can now address the rarity. That is the whole point of the 
> > patchset.
> 
> Introducing complexity to fight a very rare problem with a good
> fallback (refusing to fork more tasks, as well as lumpy reclaim)
> somehow does not seem like a good tradeoff.

The problem can become non-rare on special low memory machines doing wild 
swapping things though.

> > It will be more common if the stack size is increased beyond 8k.
> 
> Why would we want to do such a thing?

Because NUMA requires more stack space. In particular support for very 
large cpu configurations of 16k may require 2k cpumasks on the stack.
 
> 8kB stacks are large enough...

For many things yes. I just want to have the compile time option to 
increase it.
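
Something like this Kconfig fragment is all that is meant (the symbol
name is illustrative, not necessarily what the patch uses):

config KERNEL_STACK_ORDER
        int "Kernel stack size as a page order"
        range 1 3
        default 1
        help
          Order 1 is the traditional 8k stack on x86_64, order 2 gives
          16k and order 3 gives 32k. Bigger stacks only make sense on
          very large cpu-count NUMA configurations.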

* Re: [13/18] x86_64: Allow fallback for the stack
  2007-10-04 19:39         ` Rik van Riel
@ 2007-10-06 18:53           ` Bill Davidsen
  -1 siblings, 0 replies; 85+ messages in thread
From: Bill Davidsen @ 2007-10-06 18:53 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Christoph Lameter, Andi Kleen, akpm, linux-mm, linux-kernel, travis

Rik van Riel wrote:
> On Thu, 4 Oct 2007 12:20:50 -0700 (PDT)
> Christoph Lameter <clameter@sgi.com> wrote:
> 
>> On Thu, 4 Oct 2007, Andi Kleen wrote:
>>
>>> We've known for ages that it is possible. But it has always been so
>>> rare that it was ignored.
>> Well we can now address the rarity. That is the whole point of the 
>> patchset.
> 
> Introducing complexity to fight a very rare problem with a good
> fallback (refusing to fork more tasks, as well as lumpy reclaim)
> somehow does not seem like a good tradeoff.
>  
>>> Is there any evidence this is more common now than it used to be?
>> It will be more common if the stack size is increased beyond 8k.
> 
> Why would we want to do such a thing?
> 
> 8kB stacks are large enough...
> 
Why would anyone need more than 640k... In addition to NUMA, who can 
tell what some future hardware might do, given that the size of memory 
is expanding as if it were governed by Moore's Law. As memory sizes 
increase, someone will bump the page size again. Better to let people 
make it as large as they feel they need and warn at build time that 
performance may suck.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

* Re: [13/18] x86_64: Allow fallback for the stack
  2007-10-04 21:20           ` Christoph Lameter
@ 2007-10-07  7:35             ` Nick Piggin
  -1 siblings, 0 replies; 85+ messages in thread
From: Nick Piggin @ 2007-10-07  7:35 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rik van Riel, Andi Kleen, akpm, linux-mm, linux-kernel, travis

On Friday 05 October 2007 07:20, Christoph Lameter wrote:
> On Thu, 4 Oct 2007, Rik van Riel wrote:
> > > Well we can now address the rarity. That is the whole point of the
> > > patchset.
> >
> > Introducing complexity to fight a very rare problem with a good
> > fallback (refusing to fork more tasks, as well as lumpy reclaim)
> > somehow does not seem like a good tradeoff.
>
> The problem can become non-rare on special low memory machines doing wild
> swapping things though.

But only your huge systems will be using huge stacks?

* Re: [13/18] x86_64: Allow fallback for the stack
  2007-10-08 17:36               ` Christoph Lameter
@ 2007-10-08 12:55                 ` Nick Piggin
  -1 siblings, 0 replies; 85+ messages in thread
From: Nick Piggin @ 2007-10-08 12:55 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rik van Riel, Andi Kleen, akpm, linux-mm, linux-kernel, travis

On Tuesday 09 October 2007 03:36, Christoph Lameter wrote:
> On Sun, 7 Oct 2007, Nick Piggin wrote:
> > > The problem can become non-rare on special low memory machines doing
> > > wild swapping things though.
> >
> > But only your huge systems will be using huge stacks?
>
> I have no idea who else would be using such a feature. Relaxing the tight
> memory restrictions on stack use may allow placing larger structures on
> the stack in general.

The tight memory restrictions on stack usage do not come about because
of the difficulty in increasing the stack size :) It is because we want to
keep stack sizes small!

Increasing the stack size 4K uses another 4MB of memory for every 1000
threads you have, right?

It would take a lot of good reason to move away from the general direction
we've been taking over the past years that 4/8K stacks are a good idea for
regular 32 and 64 bit builds in general.


> I have some concerns about the medium NUMA systems (a few dozen nodes)
> also running out of stack since more data is placed on the stack through
> the policy layer and since we may end up with a couple of stacked
> filesystems. Most of the current NUMA systems on x86_64 are basically
> two nodes on one motherboard. The use of NUMA controls is likely
> limited there and the complexity of the filesystems is also not high.

The solution has until now always been to fix the problems so they don't
use so much stack. Maybe a bigger stack is OK for you for 1024+ CPU
systems, but I don't think you'd be able to make that assumption for most
normal systems.

* Re: [13/18] x86_64: Allow fallback for the stack
  2007-10-07  7:35             ` Nick Piggin
@ 2007-10-08 17:36               ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-08 17:36 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Rik van Riel, Andi Kleen, akpm, linux-mm, linux-kernel, travis

On Sun, 7 Oct 2007, Nick Piggin wrote:

> > The problem can become non-rare on special low memory machines doing wild
> > swapping things though.
> 
> But only your huge systems will be using huge stacks?

I have no idea who else would be using such a feature. Relaxing the tight 
memory restrictions on stack use may allow placing larger structures on 
the stack in general.

I have some concerns about the medium NUMA systems (a few dozen nodes) 
also running out of stack since more data is placed on the stack through 
the policy layer and since we may end up with a couple of stacked 
filesystems. Most of the current NUMA systems on x86_64 are basically 
two nodes on one motherboard. The use of NUMA controls is likely 
limited there and the complexity of the filesystems is also not high.



* Re: [13/18] x86_64: Allow fallback for the stack
  2007-10-09 18:39                   ` Christoph Lameter
@ 2007-10-09  8:46                     ` Nick Piggin
  -1 siblings, 0 replies; 85+ messages in thread
From: Nick Piggin @ 2007-10-09  8:46 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rik van Riel, Andi Kleen, akpm, linux-mm, linux-kernel, travis

On Wednesday 10 October 2007 04:39, Christoph Lameter wrote:
> On Mon, 8 Oct 2007, Nick Piggin wrote:
> > The tight memory restrictions on stack usage do not come about because
> > of the difficulty in increasing the stack size :) It is because we want
> > to keep stack sizes small!
> >
> > Increasing the stack size 4K uses another 4MB of memory for every 1000
> > threads you have, right?
> >
> > It would take a lot of good reason to move away from the general
> > direction we've been taking over the past years that 4/8K stacks are a
> > good idea for regular 32 and 64 bit builds in general.
>
> We already use 32k stacks on IA64. So the memory argument fails there.

I'm talking about generic code.


> > > I have some concerns about the medium NUMA systems (a few dozen
> > > nodes) also running out of stack since more data is placed on the stack
> > > through the policy layer and since we may end up with a couple of
> > > stacked filesystems. Most of the current NUMA systems on x86_64 are
> > > basically two nodes on one motherboard. The use of NUMA controls is
> > > likely limited there and the complexity of the filesystems is also not
> > > high.
> >
> > The solution has until now always been to fix the problems so they don't
> > use so much stack. Maybe a bigger stack is OK for you for 1024+ CPU
> > systems, but I don't think you'd be able to make that assumption for most
> > normal systems.
>
> Yes that is why I made the stack size configurable.

Fine. I just don't see why you need this fallback.

* Re: [13/18] x86_64: Allow fallback for the stack
  2007-10-10  1:26                       ` Christoph Lameter
@ 2007-10-09  9:56                         ` Nick Piggin
  -1 siblings, 0 replies; 85+ messages in thread
From: Nick Piggin @ 2007-10-09  9:56 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rik van Riel, Andi Kleen, akpm, linux-mm, linux-kernel, travis

On Wednesday 10 October 2007 11:26, Christoph Lameter wrote:
> On Tue, 9 Oct 2007, Nick Piggin wrote:
> > > We already use 32k stacks on IA64. So the memory argument fails there.
> >
> > I'm talking about generic code.
>
> The stack size is set in arch code not in generic code.

Generic code must assume a 4K stack on 32-bit, in general (modulo
huge cpumasks and such, I guess).


> > > > The solution has until now always been to fix the problems so they
> > > > don't use so much stack. Maybe a bigger stack is OK for you for 1024+
> > > > CPU systems, but I don't think you'd be able to make that assumption
> > > > for most normal systems.
> > >
> > > Yes that is why I made the stack size configurable.
> >
> > Fine. I just don't see why you need this fallback.
>
> So you would be ok with submitting the configurable stacksize patches
> separately without the fallback?

Sure. It's already configurable on other architectures.

* Re: [13/18] x86_64: Allow fallback for the stack
  2007-10-08 12:55                 ` Nick Piggin
@ 2007-10-09 18:39                   ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-09 18:39 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Rik van Riel, Andi Kleen, akpm, linux-mm, linux-kernel, travis

On Mon, 8 Oct 2007, Nick Piggin wrote:

> The tight memory restrictions on stack usage do not come about because
> of the difficulty in increasing the stack size :) It is because we want to
> keep stack sizes small!
> 
> Increasing the stack size 4K uses another 4MB of memory for every 1000
> threads you have, right?
> 
> It would take a lot of good reason to move away from the general direction
> we've been taking over the past years that 4/8K stacks are a good idea for
> regular 32 and 64 bit builds in general.

We already use 32k stacks on IA64. So the memory argument fails there.

> > I have some concerns about the medium NUMA systems (a few dozen nodes)
> > also running out of stack since more data is placed on the stack through
> > the policy layer and since we may end up with a couple of stacked
> > filesystems. Most of the current NUMA systems on x86_64 are basically
> > two nodes on one motherboard. The use of NUMA controls is likely
> > limited there and the complexity of the filesystems is also not high.
> 
> The solution has until now always been to fix the problems so they don't
> use so much stack. Maybe a bigger stack is OK for you for 1024+ CPU
> systems, but I don't think you'd be able to make that assumption for most
> normal systems.

Yes that is why I made the stack size configurable.


* Re: [13/18] x86_64: Allow fallback for the stack
  2007-10-09  8:46                     ` Nick Piggin
@ 2007-10-10  1:26                       ` Christoph Lameter
  -1 siblings, 0 replies; 85+ messages in thread
From: Christoph Lameter @ 2007-10-10  1:26 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Rik van Riel, Andi Kleen, akpm, linux-mm, linux-kernel, travis

On Tue, 9 Oct 2007, Nick Piggin wrote:

> > We already use 32k stacks on IA64. So the memory argument fails there.
> 
> I'm talking about generic code.

The stack size is set in arch code not in generic code.

> > > The solution has until now always been to fix the problems so they don't
> > > use so much stack. Maybe a bigger stack is OK for you for 1024+ CPU
> > > systems, but I don't think you'd be able to make that assumption for most
> > > normal systems.
> >
> > Yes that is why I made the stack size configurable.
> 
> Fine. I just don't see why you need this fallback.

So you would be ok with submitting the configurable stacksize patches 
separately without the fallback? 

* where to get ZONE_MOVABLE patches?
  2007-10-09  9:56                         ` Nick Piggin
  (?)
@ 2007-10-10  3:36                         ` Jacky(GuangXiang  Lee)
  2007-10-10 10:32                           ` Mel Gorman
  -1 siblings, 1 reply; 85+ messages in thread
From: Jacky(GuangXiang  Lee) @ 2007-10-10  3:36 UTC (permalink / raw)
  To: linux-mm

hi list,

I am looking for Mel's patch about ZONE_MOVABLE
http://kerneltrap.org/mailarchive/linux-kernel/2007/1/25/48006


Who can tell me where to download those patches?
Thanks a lot.
Jacky





* Re: where to get ZONE_MOVABLE patches?
  2007-10-10  3:36                         ` where to get ZONE_MOVABLE patches? Jacky(GuangXiang  Lee)
@ 2007-10-10 10:32                           ` Mel Gorman
  0 siblings, 0 replies; 85+ messages in thread
From: Mel Gorman @ 2007-10-10 10:32 UTC (permalink / raw)
  To: Jacky(GuangXiang  Lee); +Cc: linux-mm

On (10/10/07 11:36), Jacky(GuangXiang  Lee) didst pronounce:
> hi list,
> 
> I am looking for Mel's patch about ZONE_MOVABLE
> http://kerneltrap.org/mailarchive/linux-kernel/2007/1/25/48006
> 
> Who can tell me where to download those patches?

They are already in mainline. Check out the latest kernel and look at
Documentation/kernel-parameters.txt for the kernelcore= and movablecore=
parameters.
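
(For example, booting with kernelcore=512M reserves 512M for regular
unmovable allocations and puts the rest of memory into ZONE_MOVABLE;
movablecore= specifies it the other way around, i.e. how much memory
should be movable.)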

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Compiling kernel 2.6.23.1 HELP
  2007-10-04  3:59   ` Christoph Lameter
  (?)
@ 2007-10-13 16:46   ` animesh saxena
  2007-10-13 17:12     ` Willy Tarreau
  2007-10-14  2:35     ` Kristoffer Ericson
  -1 siblings, 2 replies; 85+ messages in thread
From: animesh saxena @ 2007-10-13 16:46 UTC (permalink / raw)
  To: linux-kernel

Hi,
   I am a kernel newbie, so I need some help trying to compile it.
I have Fedora 7 installed.

While compiling it, I get errors as follows.....

 HOSTCC  scripts/basic/fixdep
scripts/basic/fixdep.c:107:23: error: sys/types.h: No such file or
directory
scripts/basic/fixdep.c:108:22: error: sys/stat.h: No such file or
directory
scripts/basic/fixdep.c:109:22: error: sys/mman.h: No such file or
directory
scripts/basic/fixdep.c:110:20: error: unistd.h: No such file or
directory
scripts/basic/fixdep.c:111:19: error: fcntl.h: No such file or directory
scripts/basic/fixdep.c:112:20: error: string.h: No such file or
directory
scripts/basic/fixdep.c:113:20: error: stdlib.h: No such file or
directory
scripts/basic/fixdep.c:114:19: error: stdio.h: No such file or directory
In file included
from /usr/lib/gcc/x86_64-redhat-linux/4.1.2/include/syslimits.h:7,

from /usr/lib/gcc/x86_64-redhat-linux/4.1.2/include/limits.h:11,
                 from scripts/basic/fixdep.c:115:
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/include/limits.h:122:61: error:
limits.h: No such file or directory
scripts/basic/fixdep.c:116:19: error: ctype.h: No such file or directory
scripts/basic/fixdep.c:117:23: error: arpa/inet.h: No such file or
directory
scripts/basic/fixdep.c: In function ‘usage’:
scripts/basic/fixdep.c:131: warning: implicit declaration of function
‘fprintf’
scripts/basic/fixdep.c:131: warning: incompatible implicit declaration
of built-in function ‘fprintf’
scripts/basic/fixdep.c:131: error: ‘stderr’ undeclared (first use in
this function)
scripts/basic/fixdep.c:131: error: (Each undeclared identifier is
reported only once
scripts/basic/fixdep.c:131: error: for each function it appears in.)
scripts/basic/fixdep.c:132: warning: implicit declaration of function
‘exit’
scripts/basic/fixdep.c:132: warning: incompatible implicit declaration
of built-in function ‘exit’
scripts/basic/fixdep.c: In function ‘print_cmdline’:
scripts/basic/fixdep.c:140: warning: implicit declaration of function
‘printf’
scripts/basic/fixdep.c:140: warning: incompatible implicit declaration
of built-in function ‘printf’
scripts/basic/fixdep.c: At top level:
scripts/basic/fixdep.c:143: error: ‘NULL’ undeclared here (not in a
function)
scripts/basic/fixdep.c: In function ‘grow_config’:
scripts/basic/fixdep.c:156: warning: implicit declaration of function
‘realloc’
scripts/basic/fixdep.c:156: warning: assignment makes pointer from
integer without a cast
scripts/basic/fixdep.c:158: warning: implicit declaration of function
‘perror’
scripts/basic/fixdep.c:158: warning: incompatible implicit declaration
of built-in function ‘exit’
scripts/basic/fixdep.c: In function ‘is_defined_config’:
scripts/basic/fixdep.c:174: warning: implicit declaration of function
‘memcmp’
scripts/basic/fixdep.c: In function ‘define_config’:
scripts/basic/fixdep.c:187: warning: implicit declaration of function
‘memcpy’
scripts/basic/fixdep.c:187: warning: incompatible implicit declaration
of built-in function ‘memcpy’
scripts/basic/fixdep.c: In function ‘use_config’:
scripts/basic/fixdep.c:206: error: ‘PATH_MAX’ undeclared (first use in
this function)
scripts/basic/fixdep.c:214: warning: incompatible implicit declaration
of built-in function ‘memcpy’
scripts/basic/fixdep.c:220: warning: implicit declaration of function
‘tolower’
scripts/basic/fixdep.c:222: warning: incompatible implicit declaration
of built-in function ‘printf’
scripts/basic/fixdep.c:206: warning: unused variable ‘s’
scripts/basic/fixdep.c: At top level:
scripts/basic/fixdep.c:225: error: expected declaration specifiers or
‘...’ before ‘size_t’
scripts/basic/fixdep.c: In function ‘parse_config_file’:
scripts/basic/fixdep.c:227: error: ‘len’ undeclared (first use in this
function)
scripts/basic/fixdep.c:233: warning: implicit declaration of function
‘ntohl’
scripts/basic/fixdep.c:244: warning: implicit declaration of function
‘isalnum’
scripts/basic/fixdep.c: In function ‘strrcmp’:
scripts/basic/fixdep.c:261: warning: implicit declaration of function
‘strlen’
scripts/basic/fixdep.c:261: warning: incompatible implicit declaration
of built-in function ‘strlen’
scripts/basic/fixdep.c: In function ‘do_config_file’:
scripts/basic/fixdep.c:272: error: storage size of ‘st’ isn’t known
scripts/basic/fixdep.c:276: warning: implicit declaration of function
‘open’
scripts/basic/fixdep.c:276: error: ‘O_RDONLY’ undeclared (first use in
this function)
scripts/basic/fixdep.c:278: warning: incompatible implicit declaration
of built-in function ‘fprintf’
scripts/basic/fixdep.c:278: error: ‘stderr’ undeclared (first use in
this function)

On many forums it said I need to have 2.4 kernel headers. I can't
understand why that is so. Am I missing something?

I used git for getting the latest kernel. 

git-clone
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

Is there something similar for getting kernel headers? Please help. 

Thanks,
Animesh Saxena

* Re: Compiling kernel 2.6.23.1 HELP
  2007-10-13 16:46   ` Compiling kernel 2.6.23.1 HELP animesh saxena
@ 2007-10-13 17:12     ` Willy Tarreau
  2007-10-14  2:35     ` Kristoffer Ericson
  1 sibling, 0 replies; 85+ messages in thread
From: Willy Tarreau @ 2007-10-13 17:12 UTC (permalink / raw)
  To: animesh saxena; +Cc: linux-kernel

On Sat, Oct 13, 2007 at 10:16:08PM +0530, animesh saxena wrote:
> Hi,
>    I am a kernel newbie, so I need some help trying to compile it.
> I have Fedora 7 installed.
> 
> While compiling it, I get errors as follows.....
> 
>  HOSTCC  scripts/basic/fixdep
> scripts/basic/fixdep.c:107:23: error: sys/types.h: No such file or
> directory
> scripts/basic/fixdep.c:108:22: error: sys/stat.h: No such file or
> directory
> scripts/basic/fixdep.c:109:22: error: sys/mman.h: No such file or
> directory
> scripts/basic/fixdep.c:110:20: error: unistd.h: No such file or
> directory
> scripts/basic/fixdep.c:111:19: error: fcntl.h: No such file or directory
> scripts/basic/fixdep.c:112:20: error: string.h: No such file or
> directory
> scripts/basic/fixdep.c:113:20: error: stdlib.h: No such file or
> directory
> scripts/basic/fixdep.c:114:19: error: stdio.h: No such file or directory

It seems to me that you do not have glibc-devel (or whatever the
equivalent package is called on your distro) installed. It's very likely
that after that you will be missing a few other development packages, but
you'll quickly find them one at a time.
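
(On Fedora 7 the package should be glibc-devel; installing it, e.g. with
"yum install glibc-devel" as root, ought to bring back the missing
/usr/include headers like sys/types.h and stdio.h.)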

(...)
> On many forums it said I need to have 2.4 kernel headers. I can't
> understand why that is so. Am I missing something?

Possibly you will also need some random kernel-header-xxxx package,
I don't know how it's packaged. Install the libc headers first to see.

> I used git for getting the latest kernel. 
> 
> git-clone
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

If you're a newbie, you're not encouraged to try building GIT kernels.
They are *very* likely to break during build with nasty errors. You'd
better get the official releases in tar.bz2 form, which are known to build
and work for most people.
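
(For the 2.6.23.1 in your subject that would be, for example,
http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.1.tar.bz2.)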

> Is there something similar for getting kernel headers? Please help. 

Not to my knowledge.

Regards,
Willy



* Re: Compiling kernel 2.6.23.1 HELP
  2007-10-13 16:46   ` Compiling kernel 2.6.23.1 HELP animesh saxena
  2007-10-13 17:12     ` Willy Tarreau
@ 2007-10-14  2:35     ` Kristoffer Ericson
  1 sibling, 0 replies; 85+ messages in thread
From: Kristoffer Ericson @ 2007-10-14  2:35 UTC (permalink / raw)
  To: animesh.saxena; +Cc: linux-kernel

Greetings,

This is not the proper forum for compilation help (if nothing is broken,
of course). You would probably get good help asking on a Fedora forum (or
simply searching there first). Looking at your log, it seems to reflect
missing headers rather than anything wrong with the kernel itself.

If you fail to find anything useful, I can help you, but please send me
mail directly so we don't put additional weight on this mailing list.

Best wishes
Kristoffer Ericson

On Sat, 13 Oct 2007 22:16:08 +0530
animesh saxena <animesh.saxena@gmail.com> wrote:

> (...)


-- 
Kristoffer Ericson <Kristoffer.Ericson@Gmail.com>


end of thread

Thread overview: 85+ messages
2007-10-04  3:59 [00/18] Virtual Compound Page Support V2 Christoph Lameter
2007-10-04  3:59 ` Christoph Lameter
2007-10-04  3:59 ` [01/18] vmalloc: clean up page array indexing Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-04  3:59 ` [02/18] vunmap: return page array passed on vmap() Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-04  3:59 ` [03/18] vmalloc_address(): Determine vmalloc address from page struct Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-04  3:59 ` [04/18] Vcompound: Smart up virt_to_head_page() Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-04  3:59 ` [05/18] Page flags: Add PageVcompound() Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-04  3:59 ` [06/18] Vcompound: Update page address determination Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-04  3:59 ` [07/18] Vcompound: Add compound_nth_page() to determine nth base page Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-13 16:46   ` Compiling kernel 2.6.23.1 HELP animesh saxena
2007-10-13 17:12     ` Willy Tarreau
2007-10-14  2:35     ` Kristoffer Ericson
2007-10-04  3:59 ` [08/18] GFP_VFALLBACK: Allow fallback of compound pages to virtual mappings Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-04  3:59 ` [09/18] Vcompound: GFP_VFALLBACK debugging aid Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-04  3:59 ` [10/18] Sparsemem: Use fallback for the memmap Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-04  3:59 ` [11/18] Page allocator: Use a higher order allocation for the zone wait table Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-04  3:59 ` [12/18] Wait: Allow bit_waitqueue to wait on a bit in a virtual compound page Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-04  3:59 ` [13/18] x86_64: Allow fallback for the stack Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-04 11:56   ` Andi Kleen
2007-10-04 11:56     ` Andi Kleen
2007-10-04 12:08     ` Peter Zijlstra
2007-10-04 12:08       ` Peter Zijlstra
2007-10-04 12:25       ` Andi Kleen
2007-10-04 12:25         ` Andi Kleen
2007-10-04 12:30         ` Peter Zijlstra
2007-10-04 12:30           ` Peter Zijlstra
2007-10-04 17:40         ` Christoph Lameter
2007-10-04 17:40           ` Christoph Lameter
2007-10-04 19:20     ` Christoph Lameter
2007-10-04 19:20       ` Christoph Lameter
2007-10-04 19:39       ` Rik van Riel
2007-10-04 19:39         ` Rik van Riel
2007-10-04 21:20         ` Christoph Lameter
2007-10-04 21:20           ` Christoph Lameter
2007-10-07  7:35           ` Nick Piggin
2007-10-07  7:35             ` Nick Piggin
2007-10-08 17:36             ` Christoph Lameter
2007-10-08 17:36               ` Christoph Lameter
2007-10-08 12:55               ` Nick Piggin
2007-10-08 12:55                 ` Nick Piggin
2007-10-09 18:39                 ` Christoph Lameter
2007-10-09 18:39                   ` Christoph Lameter
2007-10-09  8:46                   ` Nick Piggin
2007-10-09  8:46                     ` Nick Piggin
2007-10-10  1:26                     ` Christoph Lameter
2007-10-10  1:26                       ` Christoph Lameter
2007-10-09  9:56                       ` Nick Piggin
2007-10-09  9:56                         ` Nick Piggin
2007-10-10  3:36                         ` where to get ZONE_MOVABLE pathces? Jacky(GuangXiang  Lee)
2007-10-10 10:32                           ` Mel Gorman
2007-10-06 18:53         ` [13/18] x86_64: Allow fallback for the stack Bill Davidsen
2007-10-06 18:53           ` Bill Davidsen
2007-10-04  3:59 ` [14/18] Configure stack size Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-04  4:36   ` Arjan van de Ven
2007-10-04  4:36     ` Arjan van de Ven
2007-10-04  4:43     ` David Miller
2007-10-04  4:43       ` David Miller, Arjan van de Ven
2007-10-04 19:34       ` Christoph Lameter
2007-10-04 19:34         ` Christoph Lameter
2007-10-04  9:11   ` Andi Kleen
2007-10-04  9:11     ` Andi Kleen
2007-10-04 19:26     ` Christoph Lameter
2007-10-04 19:26       ` Christoph Lameter
2007-10-04  3:59 ` [15/18] Fallback for temporary order 2 allocation Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-04  3:59 ` [16/18] Virtual Compound page allocation from interrupt context Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-04  3:59 ` [17/18] Virtual compound page freeing in " Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
2007-10-04  3:59 ` [18/18] SLUB: Use fallback for table of callers/freers of a slab cache Christoph Lameter
2007-10-04  3:59   ` Christoph Lameter
