linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH 2/5] NOMMU: High-order page management overhaul
       [not found]   ` <200412082012.iB8KCTBK010123@warthog.cambridge.redhat.com>
@ 2004-12-10 15:45     ` David Howells
  2004-12-10 21:01       ` Andrew Morton
  2004-12-13 16:32       ` David Howells
  0 siblings, 2 replies; 4+ messages in thread
From: David Howells @ 2004-12-10 15:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davidm, gerg%snapgear.com.wli, linux-kernel, uclinux-dev

Andrew Morton <akpm@osdl.org> wrote:

> > The attached patch overhauls high-order page handling.
>
> This patch (which is actually twelve patches)

How did you work that one out? Just because there're twelve points in my list
doesn't mean the patch can be split twelve ways. If you really want it
dissociated into sub-patches, I'm sure I can do that, but not all the
intermediate stages would be compilable and testable.

> seems to be taking out old code and replacing it with new code for no
> apparent reason.

 (1) I've been moaned at by a lot of people for:

     (a) #ifdefs in page_alloc.c... This gets rid of some of them, even if I
       	 didn't add them.

     (b) The way page_alloc.c was handling page refcounting differently under
     	 nommu conditions. All I did was to fix it, but it seems it's my
     	 fault:-/ This fixes it to use compound pages "as [I] should've done
     	 in the first place".

 (2) Splitting high-order pages has to be done differently on MMU vs
     NOMMU. Part of this makes it simpler by providing convenience functions
     for the job.

 (3) More robust nommu high-order page handling. I'm wary of the current way
     the individual secondary pages of a high-order page are handled in nommu
     conditions. I can see ways it can go wrong all too easily (the existence
     of the whole thing is contingent on the count on the first page, but
     pinning the secondary pages doesn't affect that).

 (4) Making it easier to debug problems with compound pages (bad_page
     changes).

 (5) Abstraction of some compound page related functions, including a way to
     make it more efficient to access the first page (PG_compound_slave).
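
     To illustrate the idea (just a sketch, not necessarily the exact code in
     the patch; the helper names here are placeholders):

	/* Sketch only: with PG_compound_slave set on the 2nd+ pages and
	 * their ->private pointing back at the first page, a get/put on a
	 * secondary page can be redirected so that the whole block lives
	 * and dies on the head page's count. */
	static inline struct page *first_page_of(struct page *page)
	{
		if (unlikely(PageCompoundSlave(page)))
			page = (struct page *) page->private;
		return page;
	}

	static inline void get_any_page(struct page *page)
	{
		atomic_inc(&first_page_of(page)->_count);
	}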

> I mean, what is the *objective* of doing all of this stuff?  What problems
> does it cause if the patch is simply dropped???

Objectives? Well:

 (1) More robust high-order page handling in nommu conditions.

 (2) Use compound pages to achieve (1) as per the numerous suggestions.

 (3) Remove #ifdefs as per the numerous suggestions.

I think the drivers need a good audit too. A lot of them allocate
high-order pages for various uses, some for use as single units, and some for
use as arrays of pages.

David


* Re: [PATCH 2/5] NOMMU: High-order page management overhaul
  2004-12-10 15:45     ` [PATCH 2/5] NOMMU: High-order page management overhaul David Howells
@ 2004-12-10 21:01       ` Andrew Morton
  2004-12-13 16:32       ` David Howells
  1 sibling, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2004-12-10 21:01 UTC (permalink / raw)
  To: David Howells; +Cc: davidm, gerg%snapgear.com.wli, linux-kernel, uclinux-dev

David Howells <dhowells@redhat.com> wrote:
>
> Andrew Morton <akpm@osdl.org> wrote:
> 
> > > The attached patch overhauls high-order page handling.
> >
> > This patch (which is actually twelve patches)
> 
> How did you work that one out? Just because there're twelve points in my list
> doesn't mean the patch can be split twelve ways. If you really want it
> dissociated into sub-patches, I'm sure I can do that, but not all the
> intermediate stages would be compilable and testable.

Of course, splitting the work into one-concept-per-patch would be a big help.

> > seems to be taking out old code and replacing it with new code for no
> > apparent reason.
> 
>  (1) I've been moaned at by a lot of people for:
> 
>      (a) #ifdefs in page_alloc.c... This gets rid of some of them, even if I
>        	 didn't add them.
> 
>      (b) The way page_alloc.c was handling page refcounting differently under
>      	 nommu conditions. All I did was to fix it, but it seems it's my
>      	 fault:-/ This fixes it to use compound pages "as [I] should've done
>      	 in the first place".

I think I was the original "use compound pages" culprit.  But when I
realised that nommu needs access to fields in the sub-pages which are
currently used for compound page metadata I withdrew into the "if what's
there now works, stick with it" camp.

>  (2) Splitting high-order pages has to be done differently on MMU vs
>      NOMMU.

Oh.  Why?

> Part of this makes it simpler by providing convenience functions
>      for the job.
> 
>  (3) More robust nommu high-order page handling. I'm wary of the current way
>      the individual secondary pages of a high-order page are handled in nommu
>      conditions. I can see ways it can go wrong all too easily (the existence
>      of the whole thing is contingent on the count on the first page, but
>      pinning the secondary pages doesn't affect that).

The current code (which pins each subpage individually) seems robust
enough.  I assume that nommu will thenceforth simply treat the region as an
array of zero-order pages.

>  (4) Making it easier to debug problems with compound pages (bad_page
>      changes).
> 
>  (5) Abstraction of some compound page related functions, including a way to
>      make it more efficient to access the first page (PG_compound_slave).

If there is any way at all in which we can avoid consuming another page
flag then we should do so.  There are various concepts (many zones,
advanced page aging algorithms) which would be unfeasible if there are not
several more bits available in ->flags.   And they continue to dribble away.

> > I mean, what is the *objective* of doing all of this stuff?  What problems
> > does it cause if the patch is simply dropped???
> 
> Objectives? Well:
> 
>  (1) More robust high-order page handling in nommu conditions.
> 
>  (2) Use compound pages to achieve (1) as per the numerous suggestions.
> 
>  (3) Remove #ifdefs as per the numerous suggestions.

But there's nothing actually *essential* here, is there?  No bugs are
fixed?

> I think the drivers need a good audit too. A lot of them allocate
> high-order pages for various uses, some for use as single units, and some for
> use as arrays of pages.

I think an ARM driver is freeing zero-order pages within a higher-order
page.  But as long as the driver didn't set __GFP_COMP then the higher
order page is not compound, and that splitting treatment is appropriate.
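
Something like this hypothetical snippet (illustrative only, not the actual
ARM driver):

	/* Hypothetical driver pattern, for illustration: allocate an order-2
	 * block without __GFP_COMP and hand the sub-pages back one at a
	 * time.  That only works if each sub-page carries its own reference
	 * count, which is what the nommu set_page_refs() loop arranges. */
	struct page *pages = alloc_pages(GFP_KERNEL, 2);	/* 4 pages */
	int i;

	if (pages) {
		/* ... use the four pages individually ... */
		for (i = 0; i < 4; i++)
			__free_page(pages + i);
	}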



* Re: [PATCH 2/5] NOMMU: High-order page management overhaul
  2004-12-10 15:45     ` [PATCH 2/5] NOMMU: High-order page management overhaul David Howells
  2004-12-10 21:01       ` Andrew Morton
@ 2004-12-13 16:32       ` David Howells
  1 sibling, 0 replies; 4+ messages in thread
From: David Howells @ 2004-12-13 16:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davidm, gerg%snapgear.com.wli, linux-kernel, uclinux-dev


Andrew Morton <akpm@osdl.org> wrote:

> I think I was the original "use compound pages" culprit.

You were, but several other people have chimed in since.

> But when I realised that nommu needs access to fields in the sub-pages which
> are currently used for compound page metadata I withdrew into the "if what's
> there now works, stick with it" camp.

The nommu stuff only needs access to a flag or two (PG_compound or
PG_compound_slave) and the refcount. I don't believe that any of the stuff
that pins secondary pages for userspace's benefit cares about anything else.

And, apart from that, as far as kernel side code is concerned, high-order
pages should be dealt with as high-order pages, or they should be properly
split and used as arrays of pages.

> >  (2) Splitting high-order pages has to be done differently on MMU vs
> >      NOMMU.
> 
> Oh.  Why?

There are three cases of splitting that I can think of:

 (1) Split down to zero-order pages. I think this can be handled the same in
     both cases, since _every_ secondary page needs reinitialisation.

     Note that I'm ignoring the case of a secondary page already being
     pinned. That is one case where the old way is superior _ASSUMING_ the
     counts on the secondary pages are incremented, not just set to 1.

     However, if a high-order page is being split after being exposed to
     userspace, the driver writer probably deserves everything they get:-)

 (2) Split down to smaller high-order pages. If a driver doing this just
     reinitialises the first page of every chunk, it'll probably be okay,
     _provided_ it doesn't touch the secondary pages. If it does do that (say,
     by initialising the size to zero), the whole thing is likely to explode.

 (3) Splitting compound pages. Obviously, if a driver requests a compound
     page, it should be able to handle dissociation into lower-order compound
     pages or zero-order pages. I'd argue that the core kernel should provide
     a function to do this (a rough sketch follows below).

So, case (2) is potentially problematical.
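
For case (3), a rough sketch of the sort of core helper I have in mind
(illustrative only; it assumes the metadata layout described above and is not
the exact function from my patch):

	/* Rough sketch: split an order old_order block into chunks of order
	 * new_order, making each chunk look like a fresh allocation. */
	void sketch_split_page(struct page *page, unsigned new_order,
			       unsigned old_order)
	{
		unsigned chunk, i;

		BUG_ON(new_order > old_order);

		for (chunk = 0; chunk < (1U << old_order); chunk += (1U << new_order)) {
			struct page *head = page + chunk;

			ClearPageCompoundSlave(head);	/* new chunk head */
			set_page_count(head, 1);	/* own refcount per chunk */
			if (new_order > 0) {
				head[1].index = new_order;
				for (i = 1; i < (1U << new_order); i++)
					head[i].private = (unsigned long) head;
			}
		}
	}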

> The current code (which pins each subpage individually) seems robust
> enough.

Maybe.

> I assume that nommu will thenceforth simply treat the region as an
> array of zero-order pages.

That depends what you mean by "nommu". It's actually the common bits that
thenceforth treat high-order pages as individual pages, be they compound pages
from hugetlbfs, single pages from the page cache or high-order pages from the
slab allocator or alloc_pages().

> >  (5) Abstraction of some compound page related functions, including a way to
> >      make it more efficient to access the first page (PG_compound_slave).
> 
> If there is any way at all in which we can avoid consuming another page
> flag then we should do so.  There are various concepts (many zones,
> advanced page aging algorithms) which would be unfeasible if there are not
> several more bits available in ->flags.   And they continue to dribble away.

There is. We can move the current occupant of the compound-second struct
page's mapping into page[1].lru and stick a unique magic value in there.

	[mm/page_alloc.c]
	const char compound_page_slave_magic[4];

	[include/linux/mm.h]
	extern const char compound_page_slave_magic[];
	#define COMPOUND_PAGE_SLAVE_MAGIC \
		((struct address_space *) &compound_page_slave_magic[3])

	#define PageCompoundSlave(page) \
		((page)->mapping == COMPOUND_PAGE_SLAVE_MAGIC)

	#define SetPageCompoundSlave(page) \
	do { \
		BUG_ON((page)->mapping); \
		(page)->mapping = COMPOUND_PAGE_SLAVE_MAGIC; \
	} while(0)

	#define ClearPageCompoundSlave(page) \
	do { \
		BUG_ON(!PageCompoundSlave(page)); \
		(page)->mapping = NULL; \
	} while(0)

This would have a useful property of causing a misalignment exception
(assuming it's not the i386 arch) if someone tries to access the mapping.
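
For illustration, the head-page lookup would then key off the magic value
rather than a spare flag bit; something along these lines (the same shape as
the lookup helper in my patch):

	static inline struct page *page_head(struct page *page)
	{
		/* PageCompoundSlave() is now the ->mapping comparison above,
		 * so no page flag is consumed */
		if (unlikely(PageCompoundSlave(page)))
			page = (struct page *) page->private;
		return page;
	}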

Andrew Morton <akpm@osdl.org> wrote:

> But there's nothing actually *essential* here, is there?  No bugs are
> fixed?

Well, I feel it's more robust. I can't say that it _definitely_ fixes any
bugs, but I can see how they could happen.

> > I think the drivers need a good audit too. A lot of them allocate
> > high-order pages for various uses, some for use as single units, and some
> > for use as arrays of pages.
> 
> I think an ARM driver is freeing zero-order pages within a higher-order
> page.  But as long as the driver didn't set __GFP_COMP then the higher
> order page is not compound, and that splitting treatment is appropriate.

I'd changed my patch to honour __GFP_COMP. However, such a driver should
probably be changed to call a splitting function in mm/page_alloc.c. This sort
of thing is definitely the territory of the master mm routines.

It might be worth adding a new allocator routine that takes arguments along
the lines of calloc() - so that you ask for 2^N pages of 2^M size. This would
allow the allocator to initialise everything correctly up front.
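
As a purely hypothetical sketch of such an interface (the name and signature
are made up for illustration; nothing like this exists yet):

	/* Hypothetical: allocate 2^count_order chunks, each of 2^chunk_order
	 * pages, with every chunk initialised up front as an independent
	 * allocation. */
	struct page *alloc_page_array(unsigned int gfp_mask,
				      unsigned int count_order,
				      unsigned int chunk_order)
	{
		struct page *block;
		unsigned int i;

		block = alloc_pages(gfp_mask, count_order + chunk_order);
		if (!block)
			return NULL;

		/* give each chunk its own reference count so that it can be
		 * used and freed independently of its neighbours */
		for (i = 1; i < (1U << count_order); i++)
			set_page_count(block + (i << chunk_order), 1);

		return block;
	}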

David


* [PATCH 2/5] NOMMU: High-order page management overhaul
  2004-12-09 15:08 [PATCH 1/5] NOMMU: MM cleanups dhowells
@ 2004-12-09 15:08 ` dhowells
  0 siblings, 0 replies; 4+ messages in thread
From: dhowells @ 2004-12-09 15:08 UTC (permalink / raw)
  To: akpm, davidm, gerg, wli; +Cc: linux-kernel, uclinux-dev

The attached patch overhauls high-order page handling.

 (1) A new bit flag PG_compound_slave has been added. This is used to mark the
     second+ subpages of a compound page, so that get_page() and put_page()
     can quickly determine whether they need to perform weird stuff.

     This could be changed to do horrible things with the page count or to
     abuse the page->lru member instead of eating another page flag.

 (2) Compound page metadata is now always set on compound pages when allocating
     and checked when freeing. This metadata is mostly as it was before:

	- PG_compound is set on all subpages
	- PG_compound_slave is set on all but the first subpage <--- [1]
	- page[1].index holds the compound page order
	- page[1...N-1].private points to page[0]. <--- [2]
	- page[1].mapping may hold a destructor function for put_page()

     This is now done in prep_new_page().

     [1] New metadata addition
     [2] Page private is no longer modified on page[0]

 (3) __page_first() is now provided to find the first page of any page set
     (even single page sets).

 (4) A new config option ENHANCED_COMPOUND_PAGES is now available. This is
     only set on !MMU or HUGETLB_PAGE. It causes __page_first() to dereference
     page->private if PG_compound_slave is set.

 (5) __GFP_COMP is required to request a compound page. This is asserted by the
     slab allocator when it allocates a page. The flag is ignored for any
     single-page allocation.

 (6) compound_page_order() is now available. This will indicate the order of a
     compound page. It says that high-order arrays of single pages are order 0.

     Since it is now trivial to work out the order of any page, free_pages()
     and co could all lose their order arguments.

 (7) bad_page() now prints more information, including information about more
     pages in the case of a compound page.

 (8) prep_compound_page() and destroy_compound_page() have been absorbed.

 (9) A lot more unlikely() clauses have been inserted in the free page
     checking functions.

(10) The !MMU bits have all gone from page_alloc.c.

(11) There's now a page destructor prototype and a function to set the
     destructor on compound pages.

(12) Two functions are now provided in page_alloc.c to dissociate high-order or
     compound pages into pages of a smaller order.

Note: I've changed my patch such that high-order pages aren't always marked
compound now. This has reverted to being contingent on the __GFP_COMP flag
being passed to __alloc_pages(). The slab allocator now always supplies this
flag.
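
For illustration, a hypothetical caller of the new interface might look like
this (not part of the patch):

	/* Hypothetical usage sketch: allocate an order-4 compound block,
	 * then carve it into four order-2 compound pages with the new
	 * splitting helper. */
	struct page *page = alloc_pages(GFP_KERNEL | __GFP_COMP, 4);

	if (page) {
		BUG_ON(compound_page_order(page) != 4);
		split_compound_page(page, 2);
		/* page+0, page+4, page+8 and page+12 are now independent
		 * order-2 compound pages */
	}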

Signed-Off-By: dhowells@redhat.com
---
diffstat compound-2610rc2mm3-3.diff
 include/linux/mm.h         |   69 ++++++--
 include/linux/page-flags.h |    6 
 init/Kconfig               |   13 +
 mm/hugetlb.c               |    4 
 mm/page_alloc.c            |  388 ++++++++++++++++++++++++++++-----------------
 mm/slab.c                  |    2 
 6 files changed, 323 insertions(+), 159 deletions(-)

diff -uNrp linux-2.6.10-rc2-mm3-mmcleanup/include/linux/mm.h linux-2.6.10-rc2-mm3-shmem/include/linux/mm.h
--- linux-2.6.10-rc2-mm3-mmcleanup/include/linux/mm.h	2004-11-22 10:54:16.000000000 +0000
+++ linux-2.6.10-rc2-mm3-shmem/include/linux/mm.h	2004-12-08 16:52:24.000000000 +0000
@@ -227,6 +227,12 @@ typedef unsigned long page_flags_t;
  * it to keep track of whatever it is we are using the page for at the
  * moment. Note that we have no way to track which tasks are using
  * a page.
+ *
+ * Any high-order page allocation has all the pages marked PG_compound. The
+ * first page of such a block holds the block's usage count and control
+ * data. The second page holds the order in its index member and a destructor
+ * function pointer in its mapping member. In enhanced compound page mode, the
+ * second+ pages have their private pointers pointing at the first page.
  */
 struct page {
 	page_flags_t flags;		/* Atomic flags, some possibly
@@ -314,45 +320,76 @@ struct page {
  */
 #define get_page_testone(p)	atomic_inc_and_test(&(p)->_count)
 
-#define set_page_count(p,v) 	atomic_set(&(p)->_count, v - 1)
+#define set_page_count(p,v) 	atomic_set(&(p)->_count, (v) - 1)
 #define __put_page(p)		atomic_dec(&(p)->_count)
 
 extern void FASTCALL(__page_cache_release(struct page *));
 
-#ifdef CONFIG_HUGETLB_PAGE
-
-static inline int page_count(struct page *p)
+static inline struct page *page_head(struct page *page)
 {
-	if (PageCompound(p))
-		p = (struct page *)p->private;
-	return atomic_read(&(p)->_count) + 1;
+#ifdef CONFIG_ENHANCED_COMPOUND_PAGES
+	if (unlikely(PageCompoundSlave(page)))
+		page = (struct page *) page->private;
+#endif
+	return page;
 }
 
-static inline void get_page(struct page *page)
+static inline unsigned compound_page_order(struct page *page)
 {
-	if (unlikely(PageCompound(page)))
-		page = (struct page *)page->private;
-	atomic_inc(&page->_count);
+	unsigned order = 0;
+
+	if (unlikely(PageCompound(page))) {
+		page = page_head(page);
+		order = page[1].index;
+	}
+	return order;
 }
 
-void put_page(struct page *page);
+extern void split_compound_page(struct page *page, unsigned new_order);
+extern void split_highorder_page(struct page *page, unsigned new_order,
+				 unsigned old_order);
 
-#else		/* CONFIG_HUGETLB_PAGE */
+typedef void (*page_dtor_t)(struct page *);
 
-#define page_count(p)		(atomic_read(&(p)->_count) + 1)
+static inline page_dtor_t page_dtor(struct page *page)
+{
+	page_dtor_t dtor = NULL;
+
+	if (unlikely(PageCompound(page))) {
+		page = page_head(page);
+		dtor = (page_dtor_t) page[1].mapping;
+	}
+	return dtor;
+}
+
+static inline void set_page_dtor(struct page *page, page_dtor_t dtor)
+{
+	BUG_ON(!PageCompound(page));
+	BUG_ON(PageCompoundSlave(page));
+	page[1].mapping = (void *) dtor;
+}
+
+static inline int page_count(struct page *page)
+{
+	page = page_head(page);
+	return atomic_read(&page->_count) + 1;
+}
 
 static inline void get_page(struct page *page)
 {
+	page = page_head(page);
 	atomic_inc(&page->_count);
 }
 
+#ifdef CONFIG_ENHANCED_COMPOUND_PAGES
+extern fastcall void put_page(struct page *page);
+#else
 static inline void put_page(struct page *page)
 {
 	if (!PageReserved(page) && put_page_testzero(page))
 		__page_cache_release(page);
 }
-
-#endif		/* CONFIG_HUGETLB_PAGE */
+#endif /* CONFIG_ENHANCED_COMPOUND_PAGES */
 
 /*
  * Multiple processes may "see" the same page. E.g. for untouched
diff -uNrp linux-2.6.10-rc2-mm3-mmcleanup/include/linux/page-flags.h linux-2.6.10-rc2-mm3-shmem/include/linux/page-flags.h
--- linux-2.6.10-rc2-mm3-mmcleanup/include/linux/page-flags.h	2004-11-22 10:54:16.000000000 +0000
+++ linux-2.6.10-rc2-mm3-shmem/include/linux/page-flags.h	2004-11-22 11:45:09.000000000 +0000
@@ -78,6 +78,7 @@
 #define PG_sharedpolicy         19      /* Page was allocated for a file
 					   mapping using a shared_policy */
 
+#define PG_compound_slave	20	/* second+ page of a compound page */
 
 /*
  * Global page accounting.  One instance per CPU.  Only unsigned longs are
@@ -294,6 +295,11 @@ extern unsigned long __read_page_state(u
 #define PageCompound(page)	test_bit(PG_compound, &(page)->flags)
 #define SetPageCompound(page)	set_bit(PG_compound, &(page)->flags)
 #define ClearPageCompound(page)	clear_bit(PG_compound, &(page)->flags)
+#define __ClearPageCompound(page)	__clear_bit(PG_compound, &(page)->flags)
+
+#define PageCompoundSlave(page)		test_bit(PG_compound_slave, &(page)->flags)
+#define SetPageCompoundSlave(page)	set_bit(PG_compound_slave, &(page)->flags)
+#define ClearPageCompoundSlave(page)	clear_bit(PG_compound_slave, &(page)->flags)
 
 #define PageSharedPolicy(page)      test_bit(PG_sharedpolicy, &(page)->flags)
 #define SetPageSharedPolicy(page)   set_bit(PG_sharedpolicy, &(page)->flags)
diff -uNrp linux-2.6.10-rc2-mm3-mmcleanup/init/Kconfig linux-2.6.10-rc2-mm3-shmem/init/Kconfig
--- linux-2.6.10-rc2-mm3-mmcleanup/init/Kconfig	2004-11-22 10:54:17.000000000 +0000
+++ linux-2.6.10-rc2-mm3-shmem/init/Kconfig	2004-12-01 17:07:36.000000000 +0000
@@ -380,6 +380,19 @@ config TINY_SHMEM
 	default !SHMEM
 	bool
 
+config ENHANCED_COMPOUND_PAGES
+	bool
+	default HUGETLB_PAGE || !MMU
+	help
+
+	  Enhance management of high-order pages by pointing the 2nd+ pages at
+	  the first. get_page() and put_page() then use the usage count on the
+	  first page to manage all the pages in the block.
+
+	  This is used when it might be necessary to access the intermediate
+	  pages of a block, such as ptrace() might under nommu or hugetlb
+	  conditions.
+
 menu "Loadable module support"
 
 config MODULES
diff -uNrp linux-2.6.10-rc2-mm3-mmcleanup/mm/hugetlb.c linux-2.6.10-rc2-mm3-shmem/mm/hugetlb.c
--- linux-2.6.10-rc2-mm3-mmcleanup/mm/hugetlb.c	2004-11-22 10:54:18.000000000 +0000
+++ linux-2.6.10-rc2-mm3-shmem/mm/hugetlb.c	2004-12-01 15:37:59.000000000 +0000
@@ -67,7 +67,7 @@ void free_huge_page(struct page *page)
 	BUG_ON(page_count(page));
 
 	INIT_LIST_HEAD(&page->lru);
-	page[1].mapping = NULL;
+	set_page_dtor(page, NULL);
 
 	spin_lock(&hugetlb_lock);
 	enqueue_huge_page(page);
@@ -87,7 +87,7 @@ struct page *alloc_huge_page(void)
 	}
 	spin_unlock(&hugetlb_lock);
 	set_page_count(page, 1);
-	page[1].mapping = (void *)free_huge_page;
+	set_page_dtor(page, free_huge_page);
 	for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i)
 		clear_highpage(&page[i]);
 	return page;
diff -uNrp linux-2.6.10-rc2-mm3-mmcleanup/mm/page_alloc.c linux-2.6.10-rc2-mm3-shmem/mm/page_alloc.c
--- linux-2.6.10-rc2-mm3-mmcleanup/mm/page_alloc.c	2004-11-23 16:13:04.000000000 +0000
+++ linux-2.6.10-rc2-mm3-shmem/mm/page_alloc.c	2004-12-02 14:02:37.000000000 +0000
@@ -80,15 +80,61 @@ static int bad_range(struct zone *zone, 
 	return 0;
 }
 
-static void bad_page(const char *function, struct page *page)
+static inline void __bad_page(struct page *page)
 {
-	printk(KERN_EMERG "Bad page state at %s (in process '%s', page %p)\n",
-		function, current->comm, page);
-	printk(KERN_EMERG "flags:0x%0*lx mapping:%p mapcount:%d count:%d\n",
-		(int)(2*sizeof(page_flags_t)), (unsigned long)page->flags,
-		page->mapping, page_mapcount(page), page_count(page));
+	const char *fmt;
+
+	if (sizeof(void *) == 4)
+		fmt = KERN_EMERG "%08lx %p %08x %p %8x %8x %8lx %8lx\n";
+	else
+		fmt = KERN_EMERG "%016lx %p %08x %p %8x %8x %16lx %16lx\n";
+
+	printk(fmt,
+	       page_to_pfn(page),
+	       page,
+	       (unsigned) page->flags,
+	       page->mapping, page_mapcount(page), page_count(page),
+	       page->index, page->private);
+}
+
+static void bad_page(const char *function, struct page *page,
+		     struct page *page0, int order)
+{
+	printk(KERN_EMERG "\n");
+	printk(KERN_EMERG
+	       "Bad page state at %s (in process '%s', order %d)\n",
+	       function, current->comm, order);
+
+	if (sizeof(void *) == 4) {
+		printk(KERN_EMERG
+		       "PFN      PAGE*    FLAGS    MAPPING  MAPCOUNT COUNT    INDEX    PRIVATE\n");
+		printk(KERN_EMERG
+		       "======== ======== ======== ======== ======== ======== ======== ========\n");
+	}
+	else {
+		printk(KERN_EMERG
+		       "PFN              PAGE*            FLAGS    MAPPING          MAPCOUNT COUNT    INDEX            PRIVATE\n");
+		printk(KERN_EMERG
+		       "================ ================ ======== ================ ======== ======== ================ ================\n");
+	}
+
+	/* print extra details on a compound page */
+	if (PageCompound(page0)) {
+		__bad_page(page0);
+		__bad_page(page0 + 1);
+
+		if (page > page0 + 1) {
+			if (page > page0 + 2)
+				printk(KERN_EMERG "...\n");
+			__bad_page(page);
+		}
+	} else {
+		__bad_page(page);
+	}
+
 	printk(KERN_EMERG "Backtrace:\n");
 	dump_stack();
+
 	printk(KERN_EMERG "Trying to fix it up, but a reboot is needed\n");
 	page->flags &= ~(1 << PG_private	|
 			1 << PG_locked	|
@@ -103,82 +149,6 @@ static void bad_page(const char *functio
 	tainted |= TAINT_BAD_PAGE;
 }
 
-void set_page_refs(struct page *page, int order)
-{
-#ifdef CONFIG_MMU
-	set_page_count(page, 1);
-#else
-	int i;
-
-	/*
-	 * We need to reference all the pages for this order, otherwise if
-	 * anyone accesses one of the pages with (get/put) it will be freed.
-	 * - eg: access_process_vm()
-	 */
-	for (i = 0; i < (1 << order); i++)
-		set_page_count(page + i, 1);
-#endif /* CONFIG_MMU */
-}
-
-#ifndef CONFIG_HUGETLB_PAGE
-#define prep_compound_page(page, order) do { } while (0)
-#define destroy_compound_page(page, order) do { } while (0)
-#else
-/*
- * Higher-order pages are called "compound pages".  They are structured thusly:
- *
- * The first PAGE_SIZE page is called the "head page".
- *
- * The remaining PAGE_SIZE pages are called "tail pages".
- *
- * All pages have PG_compound set.  All pages have their ->private pointing at
- * the head page (even the head page has this).
- *
- * The first tail page's ->mapping, if non-zero, holds the address of the
- * compound page's put_page() function.
- *
- * The order of the allocation is stored in the first tail page's ->index
- * This is only for debug at present.  This usage means that zero-order pages
- * may not be compound.
- */
-static void prep_compound_page(struct page *page, unsigned long order)
-{
-	int i;
-	int nr_pages = 1 << order;
-
-	page[1].mapping = NULL;
-	page[1].index = order;
-	for (i = 0; i < nr_pages; i++) {
-		struct page *p = page + i;
-
-		SetPageCompound(p);
-		p->private = (unsigned long)page;
-	}
-}
-
-static void destroy_compound_page(struct page *page, unsigned long order)
-{
-	int i;
-	int nr_pages = 1 << order;
-
-	if (!PageCompound(page))
-		return;
-
-	if (page[1].index != order)
-		bad_page(__FUNCTION__, page);
-
-	for (i = 0; i < nr_pages; i++) {
-		struct page *p = page + i;
-
-		if (!PageCompound(p))
-			bad_page(__FUNCTION__, page);
-		if (p->private != (unsigned long)page)
-			bad_page(__FUNCTION__, page);
-		ClearPageCompound(p);
-	}
-}
-#endif		/* CONFIG_HUGETLB_PAGE */
-
 /*
  * function for dealing with page's order in buddy system.
  * zone->lock is already acquired when we use these.
@@ -201,6 +171,11 @@ static inline void rmv_page_order(struct
 	page->private = 0;
 }
 
+static inline void set_page_refs(struct page *page, int order)
+{
+	set_page_count(page, 1);
+}
+
 /*
  * This function checks whether a page is free && is the buddy
  * we can do coalesce a page and its buddy if
@@ -221,6 +196,93 @@ static inline int page_is_buddy(struct p
 }
 
 /*
+ * validate a page that's being handed back for recycling
+ */
+static
+void free_pages_check_compound(const char *function, struct page *page, int order)
+{
+	struct page *xpage;
+	int i;
+
+	xpage = page;
+
+	if (unlikely(order == 0 ||
+		     PageCompoundSlave(page)
+		     ))
+		goto badpage;
+
+	xpage++;
+	if (unlikely(xpage->index != order))
+		goto badpage;
+
+	for (i = (1 << order) - 1; i > 0; i--) {
+		if (unlikely(!PageCompound(xpage) ||
+			     !PageCompoundSlave(xpage) ||
+			     (xpage->flags & (
+				     1 << PG_lru	|
+				     1 << PG_private	|
+				     1 << PG_locked	|
+				     1 << PG_active	|
+				     1 << PG_reclaim	|
+				     1 << PG_slab	|
+				     1 << PG_swapcache	|
+				     1 << PG_writeback
+				     )) ||
+			     page_count(xpage) != 0 ||
+			     page_mapped(xpage) ||
+			     xpage->mapping != NULL ||
+			     xpage->private != (unsigned long) page
+			     ))
+			goto badpage;
+
+		if (PageDirty(xpage))
+			ClearPageDirty(xpage);
+		xpage++;
+	}
+
+	return;
+
+ badpage:
+	bad_page(function, xpage, page, order);
+	return;
+}
+
+static inline
+void free_pages_check(const char *function, struct page *page, int order)
+{
+	if (unlikely(
+		page_mapped(page) ||
+		page->mapping != NULL ||
+		page_count(page) != 0 ||
+		(page->flags & (
+			1 << PG_lru	|
+			1 << PG_private |
+			1 << PG_locked	|
+			1 << PG_active	|
+			1 << PG_reclaim	|
+			1 << PG_slab	|
+			1 << PG_swapcache |
+			1 << PG_writeback ))
+		))
+		goto badpage;
+
+	/* check that compound pages are correctly assembled */
+	if (unlikely(PageCompound(page)))
+		free_pages_check_compound(function, page, order);
+	else if (unlikely(order > 0))
+		goto badpage;
+
+	if (PageDirty(page))
+		ClearPageDirty(page);
+
+	return;
+
+ badpage:
+	bad_page(function, page, page, order);
+	return;
+}
+
+/*
  * Freeing function for a buddy system allocator.
  *
  * The concept of a buddy system is to maintain direct-mapped table
@@ -251,8 +313,14 @@ static inline void __free_pages_bulk (st
 	struct page *coalesced;
 	int order_size = 1 << order;
 
-	if (unlikely(order))
-		destroy_compound_page(page, order);
+	if (unlikely(PageCompound(page))) {
+		struct page *xpage = page;
+		int i;
+
+		for (i = (1 << order); i > 0; i--)
+			(xpage++)->flags &=
+				~(1 << PG_compound | 1 << PG_compound_slave);
+	}
 
 	page_idx = page - base;
 
@@ -285,25 +353,6 @@ static inline void __free_pages_bulk (st
 	zone->free_area[order].nr_free++;
 }
 
-static inline void free_pages_check(const char *function, struct page *page)
-{
-	if (	page_mapped(page) ||
-		page->mapping != NULL ||
-		page_count(page) != 0 ||
-		(page->flags & (
-			1 << PG_lru	|
-			1 << PG_private |
-			1 << PG_locked	|
-			1 << PG_active	|
-			1 << PG_reclaim	|
-			1 << PG_slab	|
-			1 << PG_swapcache |
-			1 << PG_writeback )))
-		bad_page(function, page);
-	if (PageDirty(page))
-		ClearPageDirty(page);
-}
-
 /*
  * Frees a list of pages.
  * Assumes all pages on list are in same zone, and of same order.
@@ -341,20 +390,12 @@ free_pages_bulk(struct zone *zone, int c
 void __free_pages_ok(struct page *page, unsigned int order)
 {
 	LIST_HEAD(list);
-	int i;
 
 	arch_free_page(page, order);
 
 	mod_page_state(pgfree, 1 << order);
 
-#ifndef CONFIG_MMU
-	if (order > 0)
-		for (i = 1 ; i < (1 << order) ; ++i)
-			__put_page(page + i);
-#endif
-
-	for (i = 0 ; i < (1 << order) ; ++i)
-		free_pages_check(__FUNCTION__, page + i);
+	free_pages_check(__FUNCTION__, page, order);
 	list_add(&page->lru, &list);
 	kernel_map_pages(page, 1 << order, 0);
 	free_pages_bulk(page_zone(page), 1, &list, order);
@@ -419,25 +460,57 @@ expand(struct zone *zone, struct page *p
 /*
  * This page is about to be returned from the page allocator
  */
-static void prep_new_page(struct page *page, int order)
+static void prep_new_page(struct page *page, unsigned int gfp_mask, int order,
+			  int check)
 {
-	if (page->mapping || page_mapped(page) ||
-	    (page->flags & (
-			1 << PG_private	|
-			1 << PG_locked	|
-			1 << PG_lru	|
-			1 << PG_active	|
-			1 << PG_dirty	|
-			1 << PG_reclaim	|
-			1 << PG_swapcache |
-			1 << PG_writeback )))
-		bad_page(__FUNCTION__, page);
+	page_flags_t pgflags = page->flags;
+
+	/* check the struct page hasn't become corrupted */
+	if (check) {
+		if (page->mapping || page_mapped(page) ||
+		    (pgflags & (
+			    1 << PG_private	|
+			    1 << PG_locked	|
+			    1 << PG_lru	|
+			    1 << PG_active	|
+			    1 << PG_dirty	|
+			    1 << PG_reclaim	|
+			    1 << PG_swapcache |
+			    1 << PG_writeback |
+			    1 << PG_compound |
+			    1 << PG_compound_slave)))
+			bad_page(__FUNCTION__, page, page, order);
+	}
+
+	pgflags &= ~(1 << PG_uptodate | 1 << PG_error |
+		     1 << PG_referenced | 1 << PG_arch_1 |
+		     1 << PG_checked | 1 << PG_mappedtodisk);
 
-	page->flags &= ~(1 << PG_uptodate | 1 << PG_error |
-			1 << PG_referenced | 1 << PG_arch_1 |
-			1 << PG_checked | 1 << PG_mappedtodisk);
 	page->private = 0;
+
+	/* set the refcount on the page */
 	set_page_refs(page, order);
+
+	/* if requested, mark a high-order allocation as being a compound page
+	 * and store high-order page metadata on the second page */
+	if (order > 0 && gfp_mask & __GFP_COMP) {
+		struct page *xpage;
+		int i;
+
+		pgflags |= 1 << PG_compound;
+
+		page[1].index = order;
+		page[1].mapping = NULL; /* no destructor yet */
+
+		xpage = page + 1;
+		for (i = (1 << order) - 1; i > 0; i--) {
+			xpage->flags |= 1 << PG_compound | 1 << PG_compound_slave;
+			xpage->private = (unsigned long) page;
+			xpage++;
+		}
+	}
+
+	page->flags = pgflags;
 }
 
 /*
@@ -589,7 +662,7 @@ void fastcall free_hot_cold_page(struct 
 	inc_page_state(pgfree);
 	if (PageAnon(page))
 		page->mapping = NULL;
-	free_pages_check(__FUNCTION__, page);
+	free_pages_check(__FUNCTION__, page, 0);
 	pcp = &zone->pageset[get_cpu()].pcp[cold];
 	local_irq_save(flags);
 	if (pcp->count >= pcp->high)
@@ -708,11 +781,11 @@ perthread_pages_alloc(void)
  */
 
 static struct page *
-buffered_rmqueue(struct zone *zone, int order, int gfp_flags)
+buffered_rmqueue(struct zone *zone, int order, unsigned int gfp_mask)
 {
 	unsigned long flags;
 	struct page *page = NULL;
-	int cold = !!(gfp_flags & __GFP_COLD);
+	int cold = !!(gfp_mask & __GFP_COLD);
 
 	if (order == 0) {
 		struct per_cpu_pages *pcp;
@@ -740,9 +813,7 @@ buffered_rmqueue(struct zone *zone, int 
 	if (page != NULL) {
 		BUG_ON(bad_range(zone, page));
 		mod_page_state_zone(zone, pgalloc, 1 << order);
-		prep_new_page(page, order);
-		if (order && (gfp_flags & __GFP_COMP))
-			prep_compound_page(page, order);
+		prep_new_page(page, gfp_mask, order, 1);
 	}
 	return page;
 }
@@ -1003,23 +1074,24 @@ fastcall void free_pages(unsigned long a
 
 EXPORT_SYMBOL(free_pages);
 
-#ifdef CONFIG_HUGETLB_PAGE
-
-void put_page(struct page *page)
+#ifdef CONFIG_ENHANCED_COMPOUND_PAGES
+fastcall void put_page(struct page *page)
 {
 	if (unlikely(PageCompound(page))) {
-		page = (struct page *)page->private;
+		page = (struct page *) page->private;
 		if (put_page_testzero(page)) {
-			void (*dtor)(struct page *page);
+			page_dtor_t dtor;
 
-			dtor = (void (*)(struct page *))page[1].mapping;
+			dtor = (page_dtor_t) page[1].mapping;
 			(*dtor)(page);
 		}
 		return;
 	}
-	if (!PageReserved(page) && put_page_testzero(page))
+
+	if (likely(!PageReserved(page)) && put_page_testzero(page))
 		__page_cache_release(page);
 }
+
 EXPORT_SYMBOL(put_page);
 #endif
 
@@ -2258,3 +2330,39 @@ void *__init alloc_large_system_hash(con
 
 	return table;
 }
+
+/*
+ * split a compound page into an array of smaller chunks of a given order
+ */
+void split_compound_page(struct page *page, unsigned new_order)
+{
+	unsigned old_order, loop, stop, step;
+
+	old_order = compound_page_order(page);
+	if (old_order != new_order) {
+		BUG_ON(old_order < new_order);
+
+		stop = 1 << old_order;
+		step = 1 << new_order;
+		for (loop = 0; loop < stop; loop += step)
+			prep_new_page(page + loop, __GFP_COMP, new_order, 0);
+	}
+}
+
+/*
+ * split a high-order page into an array of smaller chunks of a given order
+ */
+void split_highorder_page(struct page *page, unsigned new_order,
+			  unsigned old_order)
+{
+	unsigned loop, stop, step;
+
+	if (old_order != new_order) {
+		BUG_ON(old_order < new_order);
+
+		stop = 1 << old_order;
+		step = 1 << new_order;
+		for (loop = 0; loop < stop; loop += step)
+			prep_new_page(page + loop, 0, new_order, 0);
+	}
+}
diff -uNrp linux-2.6.10-rc2-mm3-mmcleanup/mm/slab.c linux-2.6.10-rc2-mm3-shmem/mm/slab.c
--- linux-2.6.10-rc2-mm3-mmcleanup/mm/slab.c	2004-11-22 10:54:18.000000000 +0000
+++ linux-2.6.10-rc2-mm3-shmem/mm/slab.c	2004-12-01 15:49:28.000000000 +0000
@@ -873,7 +873,7 @@ static void *kmem_getpages(kmem_cache_t 
 	void *addr;
 	int i;
 
-	flags |= cachep->gfpflags;
+	flags |= cachep->gfpflags | __GFP_COMP;
 	if (likely(nodeid == -1)) {
 		addr = (void*)__get_free_pages(flags, cachep->gfporder);
 		if (!addr)

