* Hugepages for shm page cache (defrag)
@ 2011-07-06 19:31 Radosław Smogura
  2011-07-07  5:28 ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: Radosław Smogura @ 2011-07-06 19:31 UTC (permalink / raw)
  To: linux-mm

[-- Attachment #1: Type: Text/Plain, Size: 1124 bytes --]

Hello,

This is my first try at a Linux patch, so please do not blame me too much.
I actually started with the small idea of adding MAP_HUGETLB for /dev/shm,
but it grew into something more like support for huge pages in the page
cache; and since the documentation says to submit alpha-stage work too, I
decided to send this.
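
(For context, that small starting idea corresponds roughly to the following
userspace usage - a minimal sketch, assuming huge pages have been reserved
via /proc/sys/vm/nr_hugepages:)

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000	/* x86 value; older glibc lacks the define */
#endif

#define LEN (2UL * 1024 * 1024)	/* one 2MB huge page on x86 */

int main(void)
{
	void *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap(MAP_HUGETLB)");	/* fails without reserved huge pages */
		return 1;
	}
	memset(p, 0, LEN);	/* touch the whole huge page */
	munmap(p, LEN);
	return 0;
}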

-------------------
The idea is that I created a defrag method which tries to defragment the page
cache into a huge page. I added one flag for such a compound page, as I want
to treat it a little differently: for example, any put/get (and the
unless-zero/test-zero variants) changes only that page's own count, so the
compound page is freed once all of its "sub-pages" have been freed. On the
other hand, each tail page may have its own LRU (why, I don't know).

There is of course much work left to do in shm to make it hugepage aware, map
the pte (pde), etc.

At this stage I don't hit any BUGs, and I can create a 2MB shm area, read it,
and delete it (see the smoke test sketched just below this description).
Signed-off-by: Radosław Smogura <mail@smogura.eu>
-------------------
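
(A userspace smoke test in that spirit - a minimal sketch; the segment name
"/defrag-test" is arbitrary, and whether the area actually gets huge-page
backing depends on the patch:)

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define LEN (2UL * 1024 * 1024)	/* the 2MB area mentioned above */

int main(void)	/* link with -lrt on older glibc */
{
	int fd = shm_open("/defrag-test", O_CREAT | O_RDWR, 0600);
	char *p;

	if (fd < 0 || ftruncate(fd, LEN) != 0) {
		perror("shm_open/ftruncate");
		return 1;
	}
	p = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(p, 0xaa, LEN);				/* write the area... */
	printf("first byte: %#x\n", (unsigned char)p[0]);	/* ...read it... */
	munmap(p, LEN);
	close(fd);
	shm_unlink("/defrag-test");			/* ...and delete it */
	return 0;
}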

Any suggestions are welcome.

If this is too-early work, I want to say sorry. I hope I fulfilled all the
code/submission conventions.

Regards,
Radosław Smogura



[-- Attachment #2: defrag_page_cache_01.patch.gz --]
[-- Type: application/x-gzip, Size: 5064 bytes --]


* Re: Hugepages for shm page cache (defrag)
  2011-07-06 19:31 Hugepages for shm page cache (defrag) Radosław Smogura
@ 2011-07-07  5:28 ` Andi Kleen
  2011-07-07 12:02   ` mail
  2011-08-07 17:03   ` Hugepages for shm page cache (defrag) Radosław Smogura
  0 siblings, 2 replies; 8+ messages in thread
From: Andi Kleen @ 2011-07-07  5:28 UTC (permalink / raw)
  To: Radosław Smogura; +Cc: linux-mm, aarcange

Radosław Smogura <mail@smogura.eu> writes:

> Hello,
>
> This is my first try at a Linux patch, so please do not blame me too much.
> I actually started with the small idea of adding MAP_HUGETLB for /dev/shm,
> but it grew into something more like support for huge pages in the page
> cache; and since the documentation says to submit alpha-stage work too, I
> decided to send this.

Shouldn't this rather be integrated with the normal transparent huge
pages? It seems odd to develop parallel infrastructure.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only



* Re: Hugepages for shm page cache (defrag)
  2011-07-07  5:28 ` Andi Kleen
@ 2011-07-07 12:02   ` mail
  2011-07-08  0:07     ` Hugh Dickins
  2011-08-07 17:03   ` Hugepages for shm page cache (defrag) Radosław Smogura
  1 sibling, 1 reply; 8+ messages in thread
From: mail @ 2011-07-07 12:02 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Radosław Smogura, linux-mm, aarcange

On Wed, 06 Jul 2011 22:28:59 -0700, Andi Kleen wrote:
> Radosław Smogura <mail@smogura.eu> writes:
>
>> Hello,
>>
>> This is my first try at a Linux patch, so please do not blame me too
>> much. I actually started with the small idea of adding MAP_HUGETLB for
>> /dev/shm, but it grew into something more like support for huge pages
>> in the page cache; and since the documentation says to submit
>> alpha-stage work too, I decided to send this.
>
> Shouldn't this rather be integrated with the normal transparent huge
> pages? It seems odd to develop parallel infrastructure.
>
> -Andi
I'm not the best person to ask about this, as I'm a beginning hacker, but I
think it should be treated as a counterpart for the page cache, and I
actually hit a few "collisions" with THP.

The high-level design will probably be the same (e.g. I use defrag_ where
THP uses collapse_ for creating a huge page), but in contrast I try to
operate on the page cache, so the file system must in some way be huge-page
aware (shm fs is not, as it can move a page from the page cache to the swap
cache - it may silently fragment de-fragmented areas).

I set some requirements for the work, e.g. mapping a file as huge should not
affect previous or future non-huge mappings, even fancy ones; both callers
should succeed and get what they asked for.

Of course I am thinking about how to make it more "transparent", without the
need for file system support, but I suspect that may be a dead end.

I still want to emphasise that it's really an alpha version.

Regards,
Radek



* Re: Hugepages for shm page cache (defrag)
  2011-07-07 12:02   ` mail
@ 2011-07-08  0:07     ` Hugh Dickins
  2011-07-14 20:15       ` mail
  0 siblings, 1 reply; 8+ messages in thread
From: Hugh Dickins @ 2011-07-08  0:07 UTC (permalink / raw)
  To: Radoslaw Smogura; +Cc: Radislaw Smogura, Andi Kleen, linux-mm, aarcange

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2488 bytes --]

On Thu, 7 Jul 2011, mail@rsmogura.net wrote:
> On Wed, 06 Jul 2011 22:28:59 -0700, Andi Kleen wrote:
> > Radosław Smogura <mail@smogura.eu> writes:
> > 
> > > Hello,
> > > 
> > > This is my first try at a Linux patch, so please do not blame me too
> > > much. I actually started with the small idea of adding MAP_HUGETLB for
> > > /dev/shm, but it grew into something more like support for huge pages
> > > in the page cache; and since the documentation says to submit
> > > alpha-stage work too, I decided to send this.
> > 
> > Shouldn't this rather be integrated with the normal transparent huge
> > pages? It seems odd to develop parallel infrastructure.
> > 
> > -Andi

Although Andi's sig says "Speaking for myself only",
he is very much speaking for me on this too ;)

There is definitely interest in extending Transparent Huge Pages to tmpfs;
though so far as I know, nobody has yet had time to think through just
what that will entail.

Correspondingly, I'm afraid there would be little interest in adding yet
another variant of hugepages into the kernel - enough ugliness already!

> I'm not the best person to ask about this, as I'm a beginning hacker, but I
> think it should be treated as a counterpart for the page cache, and I
> actually hit a few "collisions" with THP.
> 
> The high-level design will probably be the same (e.g. I use defrag_ where
> THP uses collapse_ for creating a huge page), but in contrast I try to
> operate on the page cache, so the file system must in some way be huge-page
> aware (shm fs is not, as it can move a page from the page cache to the swap
> cache - it may silently fragment de-fragmented areas).
> 
> I set some requirements for the work, e.g. mapping a file as huge should not
> affect previous or future non-huge mappings, even fancy ones; both callers
> should succeed and get what they asked for.
> 
> Of course I am thinking about how to make it more "transparent", without the
> need for file system support, but I suspect that may be a dead end.
> 
> I still want to emphasise that it's really an alpha version.

I barely looked at it, but did notice that scripts/checkpatch.pl reports
127 errors and 111 warnings, plus it seems to be significantly incomplete
(an extern declaration of defragPageCache() but not the function itself).

And it serves no purpose without the pte work you mention (there
is no point to a shmem hugepage unless it is mapped in that way).

Sorry to be discouraging, but extending THP is likely to be the way to go.

Hugh


* Re: Hugepages for shm page cache (defrag)
  2011-07-08  0:07     ` Hugh Dickins
@ 2011-07-14 20:15       ` mail
  2011-07-15 19:50         ` Hugh Dickins
  0 siblings, 1 reply; 8+ messages in thread
From: mail @ 2011-07-14 20:15 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Radislaw Smogura, Andi Kleen, linux-mm, aarcange

On Thu, 7 Jul 2011 17:07:47 -0700 (PDT), Hugh Dickins wrote:
> On Thu, 7 Jul 2011, mail@rsmogura.net wrote:
>> On Wed, 06 Jul 2011 22:28:59 -0700, Andi Kleen wrote:
>> > Radosław Smogura <mail@smogura.eu> writes:
>> >
>> > > Hello,
>> > >
>> > > This is my first try at a Linux patch, so please do not blame me
>> > > too much. I actually started with the small idea of adding
>> > > MAP_HUGETLB for /dev/shm, but it grew into something more like
>> > > support for huge pages in the page cache; and since the
>> > > documentation says to submit alpha-stage work too, I decided to
>> > > send this.
>> >
>> > Shouldn't this rather be integrated with the normal transparent
>> > huge pages? It seems odd to develop parallel infrastructure.
>> >
>> > -Andi
>
> Although Andi's sig says "Speaking for myself only",
> he is very much speaking for me on this too ;)
>
> There is definitely interest in extending Transparent Huge Pages to
> tmpfs; though so far as I know, nobody has yet had time to think
> through just what that will entail.
>
> Correspondingly, I'm afraid there would be little interest in adding
> yet another variant of hugepages into the kernel - enough ugliness
> already!
>
>> I'm not the best person to ask about this, as I'm a beginning hacker,
>> but I think it should be treated as a counterpart for the page cache,
>> and I actually hit a few "collisions" with THP.
>>
>> The high-level design will probably be the same (e.g. I use defrag_
>> where THP uses collapse_ for creating a huge page), but in contrast I
>> try to operate on the page cache, so the file system must in some way
>> be huge-page aware (shm fs is not, as it can move a page from the page
>> cache to the swap cache - it may silently fragment de-fragmented
>> areas).
>>
>> I set some requirements for the work, e.g. mapping a file as huge
>> should not affect previous or future non-huge mappings, even fancy
>> ones; both callers should succeed and get what they asked for.
>>
>> Of course I am thinking about how to make it more "transparent",
>> without the need for file system support, but I suspect that may be a
>> dead end.
>>
>> I still want to emphasise that it's really an alpha version.
>
> I barely looked at it, but did notice that scripts/checkpatch.pl
> reports 127 errors and 111 warnings, plus it seems to be significantly
> incomplete (an extern declaration of defragPageCache() but not the
> function itself).
>
> And it serves no purpose without the pte work you mention (there
> is no point to a shmem hugepage unless it is mapped in that way).
>
> Sorry to be discouraging, but extending THP is likely to be the way
> to go.
>
> Hugh
Hi,
I am working on removing the errors from the patch, and I have integrated it
with the current THP infrastructure a little bit, but I want to ask whether
I may do the following - it's about get_page, put_page,
get_page_unless_zero, put_page_testzero.

I want the following logic, which I think may be better (on x86; a toy
userspace model follows the list):
1) Each THP page will start with a refcount of 512 on the head (itself + 511
tails).
2) Each get/put will change the usage count of that page only, and the test
variants will do the same (currently they do not, so splitting is broken).
3) On a compound put, put_page will call put_page_testzero; if true, it will
take the compound lock, check again whether the count is 0, and if so
decrease the refcount of the head; if that falls to zero, the compound page
will be freed (double-checked locking).
4) The compound lock is what a caller will need to take if it needs to
operate on the transparent huge page as a whole.
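
A minimal userspace model of points 1-4 (purely illustrative: a toy "page"
with only a count and a head pointer, and a pthread mutex standing in for
the compound lock - none of these names are the kernel API):

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_page {
	atomic_int count;
	struct toy_page *head;		/* == self for the head page */
};

static pthread_mutex_t toy_compound_lock = PTHREAD_MUTEX_INITIALIZER;

/* Point 1: the head starts at 2^order (itself + each tail's initial unit). */
static struct toy_page *toy_alloc_compound(int order)
{
	int i, n = 1 << order;
	struct toy_page *p = calloc(n, sizeof(*p));

	for (i = 0; i < n; i++) {
		p[i].head = &p[0];
		atomic_init(&p[i].count, i == 0 ? n : 1);
	}
	return p;
}

/* Points 2-4: put touches only this page's count; the last put on any
 * sub-page re-checks under the compound lock, then drops one head ref. */
static void toy_put(struct toy_page *page)
{
	struct toy_page *head = page->head;
	bool free_compound = false;

	if (atomic_fetch_sub(&page->count, 1) != 1)
		return;				/* point 2: this page only */

	pthread_mutex_lock(&toy_compound_lock);		/* point 4 */
	if (atomic_load(&page->count) == 0) {		/* double check */
		if (page == head)
			free_compound = true;	/* all tails already gone */
		else if (atomic_fetch_sub(&head->count, 1) == 1)
			free_compound = true;	/* last tail dropped the head */
	}
	pthread_mutex_unlock(&toy_compound_lock);

	if (free_compound)
		free(head);	/* head and tails were a single allocation */
}

The same double-check shape is what the [PATCH] later in this thread
implements with compound_lock_irqsave().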

Motivation:
I operate on the page cache, where there are many assumptions about
concurrent calls of put_page/get_page, and using them as-is causes memory
leaks, faults, dangling pointers, etc. when I go to split a compound page.

Is this acceptable?

Regards,
Radek



* Re: Hugepages for shm page cache (defrag)
  2011-07-14 20:15       ` mail
@ 2011-07-15 19:50         ` Hugh Dickins
  2011-07-30 16:15           ` [PATCH] Change how ref-counts of compound pages are managed Radosław Smogura
  0 siblings, 1 reply; 8+ messages in thread
From: Hugh Dickins @ 2011-07-15 19:50 UTC (permalink / raw)
  To: mail; +Cc: Andrea Arcangeli, Radislaw Smogura, Andi Kleen, linux-mm

On Thu, 14 Jul 2011, mail@rsmogura.net wrote:
> I am working on removing the errors from the patch, and I have integrated
> it with the current THP infrastructure a little bit,

Great, thank you.

> but I want to ask whether I may do the following - it's about get_page,
> put_page, get_page_unless_zero, put_page_testzero.
> 
> I want the following logic, which I think may be better (on x86):
> 1) Each THP page will start with a refcount of 512 on the head (itself +
> 511 tails).
> 2) Each get/put will change the usage count of that page only, and the
> test variants will do the same (currently they do not, so splitting is
> broken).
> 3) On a compound put, put_page will call put_page_testzero; if true, it
> will take the compound lock, check again whether the count is 0, and if so
> decrease the refcount of the head; if that falls to zero, the compound
> page will be freed (double-checked locking).
> 4) The compound lock is what a caller will need to take if it needs to
> operate on the transparent huge page as a whole.
> 
> Motivation:
> I operate on the page cache, where there are many assumptions about
> concurrent calls of put_page/get_page, and using them as-is causes memory
> leaks, faults, dangling pointers, etc. when I go to split a compound page.
> 
> Is this acceptable?

Sounds plausible, but I really don't know.

I do remember that refcounting compounds by head or by tail always raises
questions (and access via get_user_pages() is an easily-overlooked path
that needs to be kept in mind).  But where THP stands today, and how it
needs to be changed for this, I have no idea - whereas Andrea, perhaps,
will recognize some of your points above and have a more useful response.

It's clear that you have much more of a grip on these details than I
have at present, so just be guided by the principle of not slowing
down the common paths.

Hugh



* [PATCH] Change how ref-counts of compound pages are managed.
  2011-07-15 19:50         ` Hugh Dickins
@ 2011-07-30 16:15           ` Radosław Smogura
  0 siblings, 0 replies; 8+ messages in thread
From: Radosław Smogura @ 2011-07-30 16:15 UTC (permalink / raw)
  To: Andrea Arcangeli, Radislaw Smogura, Andi Kleen, linux-mm
  Cc: Radosław Smogura

WARNING: this change alone may at least break THP splitting.
WARNING: read to the end.

Each compound page is made from a head page and tail pages. A compound huge
page starts with a ref-count of 2^order on the head and a ref-count of 1 on
each tail, but it is currently the caller's responsibility to ensure this.

If the ref-count of a tail goes to 0, the ref-count of the head is decreased
by 1. This changes the previous behaviour, which managed (only partially)
the head ref-count as the head's individual count plus the sum of the tails'
ref-counts.

To make this consistent, page_count no longer returns the count of the head,
but only the count of the given page. Changes have been made to
put_page_testzero and get_page_unless_zero.

Freeing a compound page is protected by the compound lock on the head, with
double-checked locking.

This:
- makes page splitting, for transparent anonymous huge pages and for
transparent huge pages in the page cache (WIP), safer and simpler: both of
the above can fix up the head ref-count by subtracting values inside a
compound-lock block, and only if get_page_unless_zero returned true;
- removes some atomic_incs and dangling pointers.

WARNING:
This doesn't cover situations where a page is split from a larger compound
page into smaller compound pages. If that ever happens (currently nothing
does this), double-checked locking is not enough. Imagine a page of order 2:
you put the last tail page (the 4th), someone splits the compound into two
order-1 compound pages, and the head of the 4th page changes from the 1st
page to the 3rd. If splitting into smaller compound pages ever becomes
possible, rechecking that the head hasn't changed, and relocking the head,
will be required.

This patch doesn't come with the changes for the other paths (see warning),
and is made for preview.
Signed-off-by: Radosław Smogura <mail@smogura.eu>
---
 include/linux/mm.h |  101 +++++++++++++++++++++++++-----------
 mm/swap.c          |  142 +++++++++++++++++++++++++---------------------------
 2 files changed, 139 insertions(+), 104 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3172a1c..e51c4f8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -267,18 +267,51 @@ struct inode;
  * routine so they can be sure the page doesn't go away from under them.
  */
 
+
+/** Decreases the refcount of the page's head. <b>The refcount of
+ * {@code page} must be {@code 0}</b>.
+ * You may pass a page head as {@code page} - in this case, if the
+ * {@code page} refcount is {@code 0}, the compound page will be freed.
+ *
+ * @see get_page
+ */
+void put_compound_head_of_page(struct page *page);
+
+static inline struct page *compound_head(struct page *page)
+{
+	if (unlikely(PageTail(page)))
+		return page->first_page;
+	return page;
+}
+
 /*
  * Drop a ref, return true if the refcount fell to zero (the page has no users)
  */
 static inline int put_page_testzero(struct page *page)
 {
+	int result;
 	VM_BUG_ON(atomic_read(&page->_count) == 0);
-	return atomic_dec_and_test(&page->_count);
+
+	result = atomic_dec_and_test(&page->_count);
+	if (unlikely(PageTail(page) && result)) {
+		/* If the BUG below fires, or a NULL dereference occurs, the
+		 * page was split from under us (but it should not be).
+		 */
+		VM_BUG_ON(atomic_read(&page->first_page->_count) == 0);
+		put_compound_head_of_page(page);
+	} else {
+		/* TODO Free the page? - originally the page was not freed here.
+		 * The page can be the head of a compound or a "stand-alone" page.
+		 */
+	}
+
+	return result;
 }
 
 /*
  * Try to grab a ref unless the page has a refcount of zero, return false if
  * that is the case.
+ * @see get_page
  */
 static inline int get_page_unless_zero(struct page *page)
 {
@@ -349,43 +382,49 @@ static inline void compound_unlock_irqrestore(struct page *page,
 #endif
 }
 
-static inline struct page *compound_head(struct page *page)
-{
-	if (unlikely(PageTail(page)))
-		return page->first_page;
-	return page;
-}
-
+/** Returns the usage count of the given page.
+ * @see get_page.
+ */
 static inline int page_count(struct page *page)
 {
-	return atomic_read(&compound_head(page)->_count);
-}
-
+	return atomic_read(&page->_count);
+}
+
+/** Increases the refcount of a page.
+ * <h1>Logic of refcounting pages</h1>
+ * <h2>Normal pages (not compound)</h2>
+ * Getting or putting a page increases or decreases the refcount of that
+ * page; when the refcount falls to {@code 0} the page is freed.
+ * <h2>Compound pages (transparent huge pages)</h2>
+ * Each compound page is made from {@code 2^order} pages; the 1st page is the
+ * <b>head</b>, the rest are <b>tail</b> pages. A compound huge page
+ * starts (when allocated) with refcount {@code 2^order} on the head and
+ * refcount {@code 1} on each tail (the developer may need to ensure this).<br/>
+ * Getting or putting any page (head or tail) will increase / decrease the
+ * refcount of that page <b>only</b>, with the following <b>exceptions</b>:
+ * <ol>
+ * <li>
+ *      If the refcount of a tail page falls to {@code 0}, the compound lock is
+ *      acquired on the page head (to prevent splitting by others); if the page
+ *      is still compound, the refcount of the head is decremented by {@code 1}.
+ * </li>
+ * <li>
+ *      If the refcount of the head falls to {@code 0}, the compound lock on
+ *      the head is acquired (to prevent background splitting); if the page is
+ *      still compound, it will be freed as a compound.
+ * </li>
+ * </ol>
+ * This function will not free pages that were split before the head refcount
+ * was decremented (in particular, if a split occurs, the splitter is
+ * responsible for freeing the pages).
+ */
 static inline void get_page(struct page *page)
 {
-	/*
-	 * Getting a normal page or the head of a compound page
-	 * requires to already have an elevated page->_count. Only if
-	 * we're getting a tail page, the elevated page->_count is
-	 * required only in the head page, so for tail pages the
-	 * bugcheck only verifies that the page->_count isn't
-	 * negative.
-	 */
-	VM_BUG_ON(atomic_read(&page->_count) < !PageTail(page));
+	/* Disallow getting any page (even a tail) if its refcount has
+	 * fallen to zero */
+	VM_BUG_ON(atomic_read(&page->_count) <= 0);
+
 	atomic_inc(&page->_count);
-	/*
-	 * Getting a tail page will elevate both the head and tail
-	 * page->_count(s).
-	 */
-	if (unlikely(PageTail(page))) {
-		/*
-		 * This is safe only because
-		 * __split_huge_page_refcount can't run under
-		 * get_page().
-		 */
-		VM_BUG_ON(atomic_read(&page->first_page->_count) <= 0);
-		atomic_inc(&page->first_page->_count);
-	}
 }
 
 static inline struct page *virt_to_head_page(const void *x)
diff --git a/mm/swap.c b/mm/swap.c
index 3a442f1..a25f096 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -76,90 +76,86 @@ static void __put_compound_page(struct page *page)
 
 static void put_compound_page(struct page *page)
 {
-	if (unlikely(PageTail(page))) {
-		/* __split_huge_page_refcount can run under us */
-		struct page *page_head = page->first_page;
-		smp_rmb();
-		/*
-		 * If PageTail is still set after smp_rmb() we can be sure
-		 * that the page->first_page we read wasn't a dangling pointer.
-		 * See __split_huge_page_refcount() smp_wmb().
+	if (atomic_dec_and_test(&page->_count)) {
+		/* Some ref fell to zero. It doesn't matter whether it was the
+		 * head or a tail: put_compound_head_of_page deals with
+		 * both cases.
 		 */
-		if (likely(PageTail(page) && get_page_unless_zero(page_head))) {
-			unsigned long flags;
-			/*
-			 * Verify that our page_head wasn't converted
-			 * to a a regular page before we got a
-			 * reference on it.
-			 */
-			if (unlikely(!PageHead(page_head))) {
-				/* PageHead is cleared after PageTail */
-				smp_rmb();
-				VM_BUG_ON(PageTail(page));
-				goto out_put_head;
-			}
-			/*
-			 * Only run compound_lock on a valid PageHead,
-			 * after having it pinned with
-			 * get_page_unless_zero() above.
-			 */
-			smp_mb();
-			/* page_head wasn't a dangling pointer */
-			flags = compound_lock_irqsave(page_head);
-			if (unlikely(!PageTail(page))) {
-				/* __split_huge_page_refcount run before us */
-				compound_unlock_irqrestore(page_head, flags);
-				VM_BUG_ON(PageHead(page_head));
-			out_put_head:
-				if (put_page_testzero(page_head))
-					__put_single_page(page_head);
-			out_put_single:
-				if (put_page_testzero(page))
-					__put_single_page(page);
-				return;
-			}
-			VM_BUG_ON(page_head != page->first_page);
-			/*
-			 * We can release the refcount taken by
-			 * get_page_unless_zero now that
-			 * split_huge_page_refcount is blocked on the
-			 * compound_lock.
-			 */
-			if (put_page_testzero(page_head))
-				VM_BUG_ON(1);
-			/* __split_huge_page_refcount will wait now */
-			VM_BUG_ON(atomic_read(&page->_count) <= 0);
-			atomic_dec(&page->_count);
-			VM_BUG_ON(atomic_read(&page_head->_count) <= 0);
-			compound_unlock_irqrestore(page_head, flags);
-			if (put_page_testzero(page_head)) {
-				if (PageHead(page_head))
-					__put_compound_page(page_head);
-				else
-					__put_single_page(page_head);
-			}
-		} else {
-			/* page_head is a dangling pointer */
-			VM_BUG_ON(PageTail(page));
-			goto out_put_single;
-		}
-	} else if (put_page_testzero(page)) {
-		if (PageHead(page))
-			__put_compound_page(page);
-		else
-			__put_single_page(page);
+		put_compound_head_of_page(page);
 	}
 }
 
 void put_page(struct page *page)
 {
-	if (unlikely(PageCompound(page)))
+	if (unlikely(PageCompound(page))) {
 		put_compound_page(page);
-	else if (put_page_testzero(page))
+	} else if (put_page_testzero(page)) {
 		__put_single_page(page);
+	}
 }
 EXPORT_SYMBOL(put_page);
 
+static inline void free_compound_page(struct page *head)
+{
+	VM_BUG_ON(PageTail(head));
+	VM_BUG_ON(!PageCompound(head));
+	/* Debug check that every sub-page has a zero refcount - we do not
+	 * hold the lock, but our refcount is zero, so no one should split us!
+	 */
+#ifdef CONFIG_DEBUG_VM
+	do {
+		unsigned long nr_pages = 1UL << compound_order(head);
+		unsigned long i;
+		for (i = 0; i < nr_pages; i++)
+			VM_BUG_ON(atomic_read(&head[i]._count));
+	} while (0);
+#endif
+	__put_compound_page(head);
+}
+
+void put_compound_head_of_page(struct page *page)
+{
+	struct page *head = compound_head(page);
+	unsigned long flags;
+
+	VM_BUG_ON(atomic_read(&page->_count));
+
+	/* Lock page to prevent splitting */
+	flags = compound_lock_irqsave(head);
+
+	/* Splitting is impossible right now; is the page still a tail? It may
+	 * have been split. The desired behaviour is that the splitter of a
+	 * compound page (and decreaser of the head refcount) conditionally
+	 * acquires each tail page (get_page_unless_zero) and, before putting
+	 * the tail, decreases the head refcount. This SHOULD prevent problems
+	 * with "under us" operations.
+	 *
+	 * The splitter is responsible for freeing head & tails that fell to zero.
+	 */
+	if (PageCompound(page)) {
+		int shouldFree;
+		head = compound_head(page);
+		if (head == page) {
+			/* It's the head; by the call requirements its refcount is 0. */
+			shouldFree = 1;
+		} else if (atomic_dec_and_test(&head->_count)) {
+			/* It's last used tail. */
+			shouldFree = 1;
+		} else {
+			/* It's a tail and the head refcount is still positive. */
+			shouldFree = 0;
+		}
+
+		compound_unlock_irqrestore(head, flags);
+		if (shouldFree)
+			free_compound_page(head);
+	} else {
+		/* The page is no longer compound: the splitter decreased the head
+		 * refcount and freed head and tails if needed. No-op.
+		 */
+	}
+}
+EXPORT_SYMBOL(put_compound_head_of_page);
 /**
  * put_pages_list() - release a list of pages
  * @pages: list of pages threaded on page->lru
-- 
1.7.3.4



* Re: Hugepages for shm page cache (defrag)
  2011-07-07  5:28 ` Andi Kleen
  2011-07-07 12:02   ` mail
@ 2011-08-07 17:03   ` Radosław Smogura
  1 sibling, 0 replies; 8+ messages in thread
From: Radosław Smogura @ 2011-08-07 17:03 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-mm, Andrea Arcangeli, Radosław Smogura

[-- Attachment #1: Type: Text/Plain, Size: 2382 bytes --]

On Thursday, 7 July 2011, 07:28:59, Andi Kleen <andi@firstfloor.org> wrote:
> Radosław Smogura <mail@smogura.eu> writes:
> > Hello,
> > 
> > This is my first try at a Linux patch, so please do not blame me too
> > much. I actually started with the small idea of adding MAP_HUGETLB for
> > /dev/shm, but it grew into something more like support for huge pages
> > in the page cache; and since the documentation says to submit
> > alpha-stage work too, I decided to send this.
> 
> Shouldn't this rather be integrated with the normal transparent huge
> pages? It seems odd to develop parallel infrastructure.
> 
> -Andi
Hello,

I am sending as an attachment a preview of the work. Lately I have put a lot
of work into making the code less messy, etc., but it is still messy and in
an alpha stage.

=== Info about the changes and how it works ===
The current work is focused on providing support for huge pages for tmpfs,
but I designed it for private mappings too (my dream would be to be able to
map the executable portions of glibc as huge). I think it will take no big
change to expose THP to general file systems.

I put some effort into not creating a new type of huge page, so I no longer
use a special flag for managing this.

The PMD is established (for historical reasons) from a pte fault, so I unmap
the ptes from the pmd (I think I have a memory leak here).

Splitting of a huge page for the page cache still needs some work, and it
follows a different concept. Currently I mainly use the compound_lock to
prohibit concurrent splitting, etc.: splitting acquires the compound lock,
then unmaps all pmds, and then the split occurs (a rough sketch follows
below).
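
A rough sketch of that sequence (the two unmap/split helpers here are
placeholders, not real kernel symbols; only compound_lock_irqsave,
compound_unlock_irqrestore and PageCompound are the real interfaces already
used in the patch above):

static int split_page_cache_compound(struct page *head)
{
	unsigned long flags;

	flags = compound_lock_irqsave(head);	/* forbid concurrent splits */
	if (!PageCompound(head)) {
		/* Someone split it before we took the lock. */
		compound_unlock_irqrestore(head, flags);
		return 0;
	}
	unmap_huge_pmd_mappings(head);	/* placeholder: drop the huge PMDs */
	split_into_base_pages(head);	/* placeholder: fix flags and counts */
	compound_unlock_irqrestore(head, flags);
	return 0;
}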

Splitting uses the new ref-counting. I may be wrong, but otherwise we would
need to manage a ref-count bit for compound pages (what is included here
/mm.h, swap.c/ is slightly different from my previous send; I am still
wondering whether I should use the irqsave lock for compounds). I think the
included construct removes all "under us" risks.

I don't want to waste your time reviewing this work, but I will be glad if
you can find some time for it. One thing I am worried about is pde locking:
is it needed to do pmd locking, as for ptes (or any other kind of locking? I
have some ideas)?

I hope this time there are no missing files.
Regards,
Radosław Smogura
P.S. In one of the mails my e-mail address and name are put wrongly in CC;
please remove it to avoid an unknown recipient.

[-- Attachment #2: defrag-preview-20110807.patch.gz --]
[-- Type: application/x-gzip, Size: 20948 bytes --]


end of thread

Thread overview: 8 messages
2011-07-06 19:31 Hugepages for shm page cache (defrag) Radosław Smogura
2011-07-07  5:28 ` Andi Kleen
2011-07-07 12:02   ` mail
2011-07-08  0:07     ` Hugh Dickins
2011-07-14 20:15       ` mail
2011-07-15 19:50         ` Hugh Dickins
2011-07-30 16:15           ` [PATCH] Change how ref-counts of compound pages are managed Radosław Smogura
2011-08-07 17:03   ` Hugepages for shm page cache (defrag) Radosław Smogura
