All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch] simplify shmem_aops.set_page_dirty method
@ 2007-01-31  4:06 Ken Chen
  2007-01-31 17:17 ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Ken Chen @ 2007-01-31  4:06 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton; +Cc: linux-mm

shmem backed file does not have page write back, nor it participates in
BDI_CAP_NO_ACCT_DIRTY or BDI_CAP_NO_WRITEBACK accounting. So using generic
__set_page_dirty_nobuffers() for its .set_page_dirty aops method is a bit
overkill.  It unnecessarily prolonged shm unmap latency.

For example, on a densely populated large shm segment (sevearl GBs), the
unmapping operation becomes painfully long. Because at unmap, kernel
transfers dirty bit in PTE into page struct and to the radix tree tag. The
operation of tagging the radix tree is particularlly expensive because it
has to traverse the tree from the root to the leaf node on every dirty page.
What's bothering is that radix tree tag is used for page write back. However,
shmem is memory backed and there is no page write back for such file system.
And in the end, we spend all that time tagging radix tree and none of that
fancy tagging will be used.  So let's simplify it by introduce a new aops
__set_page_dirty_no_write_back and this will speed up shm unmap.


Signed-off-by: Ken Chen <kenchen@google.com>

---
Hugh, would you please kindly review this patch?


diff -Nurp linux-2.6.20-rc6/include/linux/mm.h
linux-2.6.20-rc6.unmap/include/linux/mm.h
--- linux-2.6.20-rc6/include/linux/mm.h	2007-01-30 19:23:44.000000000 -0800
+++ linux-2.6.20-rc6.unmap/include/linux/mm.h	2007-01-30
19:25:06.000000000 -0800
@@ -785,6 +785,7 @@ extern int try_to_release_page(struct pa
 extern void do_invalidatepage(struct page *page, unsigned long offset);

 int __set_page_dirty_nobuffers(struct page *page);
+int __set_page_dirty_no_write_back(struct page *page);
 int redirty_page_for_writepage(struct writeback_control *wbc,
 				struct page *page);
 int FASTCALL(set_page_dirty(struct page *page));
diff -Nurp linux-2.6.20-rc6/mm/page-writeback.c
linux-2.6.20-rc6.unmap/mm/page-writeback.c
--- linux-2.6.20-rc6/mm/page-writeback.c	2007-01-30 19:23:45.000000000 -0800
+++ linux-2.6.20-rc6.unmap/mm/page-writeback.c	2007-01-30
19:58:46.000000000 -0800
@@ -742,6 +742,21 @@ int write_one_page(struct page *page, in
 EXPORT_SYMBOL(write_one_page);

 /*
+ * For address_spaces which do not use buffers nor page write back.
+ */
+int __set_page_dirty_no_write_back(struct page *page)
+{
+	if (!TestSetPageDirty(page)) {
+		struct address_space *mapping = page_mapping(page);
+		if (mapping && mapping->host) {
+			/* !PageAnon && !swapper_space */
+			__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+		}
+	}
+	return 0;
+}
+
+/*
  * For address_spaces which do not use buffers.  Just tag the page as dirty in
  * its radix tree.
  *
diff -Nurp linux-2.6.20-rc6/mm/shmem.c linux-2.6.20-rc6.unmap/mm/shmem.c
--- linux-2.6.20-rc6/mm/shmem.c	2007-01-30 19:23:45.000000000 -0800
+++ linux-2.6.20-rc6.unmap/mm/shmem.c	2007-01-30 19:38:26.000000000 -0800
@@ -2316,7 +2316,7 @@ static void destroy_inodecache(void)

 static const struct address_space_operations shmem_aops = {
 	.writepage	= shmem_writepage,
-	.set_page_dirty	= __set_page_dirty_nobuffers,
+	.set_page_dirty	= __set_page_dirty_no_write_back,
 #ifdef CONFIG_TMPFS
 	.prepare_write	= shmem_prepare_write,
 	.commit_write	= simple_commit_write,

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] simplify shmem_aops.set_page_dirty method
  2007-01-31  4:06 [patch] simplify shmem_aops.set_page_dirty method Ken Chen
@ 2007-01-31 17:17 ` Hugh Dickins
  2007-01-31 19:11   ` Andrew Morton
  2007-01-31 19:14   ` Hugh Dickins
  0 siblings, 2 replies; 6+ messages in thread
From: Hugh Dickins @ 2007-01-31 17:17 UTC (permalink / raw)
  To: Ken Chen; +Cc: Andrew Morton, linux-mm

On Tue, 30 Jan 2007, Ken Chen wrote:

> shmem backed file does not have page write back, nor it participates in
> BDI_CAP_NO_ACCT_DIRTY or BDI_CAP_NO_WRITEBACK accounting. So using generic
> __set_page_dirty_nobuffers() for its .set_page_dirty aops method is a bit
> overkill.  It unnecessarily prolonged shm unmap latency.
> 
> For example, on a densely populated large shm segment (sevearl GBs), the
> unmapping operation becomes painfully long. Because at unmap, kernel
> transfers dirty bit in PTE into page struct and to the radix tree tag. The
> operation of tagging the radix tree is particularlly expensive because it
> has to traverse the tree from the root to the leaf node on every dirty page.
> What's bothering is that radix tree tag is used for page write back. However,
> shmem is memory backed and there is no page write back for such file system.
> And in the end, we spend all that time tagging radix tree and none of that
> fancy tagging will be used.  So let's simplify it by introduce a new aops
> __set_page_dirty_no_write_back and this will speed up shm unmap.
> 
> 
> Signed-off-by: Ken Chen <kenchen@google.com>
> 
> ---
> Hugh, would you please kindly review this patch?

Sure.  Thanks for doing this, Ken: I certainly approve of your intention
here, I remember having a patch doing much the same when set_page_dirty
first came in.  I think it was part of some series which got rejected or
abandoned for other reasons; and lacking the numbers to justify it, I
just let it go and forgot.  You've now seen the improvement, great,
please go ahead - with a few changes.

1.  Would you mind changing the name to either
__set_page_dirty_no_writeback or __set_page_dirty_nowriteback?
I would prefer the former (we speak of "writeback" not "write back"
elsewhere), except __set_page_dirty_nobuffers has set a precedent
for the latter.

2.  Please remind me what good __mark_inode_dirty will do for shmem:
in my patch the equivalent function did nothing beyond SetPageDirty
(your TestSetPageDirty looks better, less redirtying the cacheline).
The world may have moved on and __mark_inode_dirty now be important,
but I suspect still not - I think it just puts the inode on some
hashlist which serves no good purpose for nowriteback mappings.

3.  There's some other places which should benefit from it too:
ramfs (which will cover tiny-shmem) for one, swap_aops for another.
Change those over at the same time?  Or leave them to another patch?
Up to you.

Some of the baroqueness (e.g. mapping2) in __set_page_dirty_nobuffers
reflects how tmpfs pages used to come there; but I think leave it as is.

Hugh

> 
> 
> diff -Nurp linux-2.6.20-rc6/include/linux/mm.h
> linux-2.6.20-rc6.unmap/include/linux/mm.h
> --- linux-2.6.20-rc6/include/linux/mm.h	2007-01-30 19:23:44.000000000 -0800
> +++ linux-2.6.20-rc6.unmap/include/linux/mm.h	2007-01-30
> 19:25:06.000000000 -0800
> @@ -785,6 +785,7 @@ extern int try_to_release_page(struct pa
> extern void do_invalidatepage(struct page *page, unsigned long offset);
> 
> int __set_page_dirty_nobuffers(struct page *page);
> +int __set_page_dirty_no_write_back(struct page *page);
> int redirty_page_for_writepage(struct writeback_control *wbc,
> 				struct page *page);
> int FASTCALL(set_page_dirty(struct page *page));
> diff -Nurp linux-2.6.20-rc6/mm/page-writeback.c
> linux-2.6.20-rc6.unmap/mm/page-writeback.c
> --- linux-2.6.20-rc6/mm/page-writeback.c	2007-01-30 19:23:45.000000000
> -0800
> +++ linux-2.6.20-rc6.unmap/mm/page-writeback.c	2007-01-30
> 19:58:46.000000000 -0800
> @@ -742,6 +742,21 @@ int write_one_page(struct page *page, in
> EXPORT_SYMBOL(write_one_page);
> 
> /*
> + * For address_spaces which do not use buffers nor page write back.
> + */
> +int __set_page_dirty_no_write_back(struct page *page)
> +{
> +	if (!TestSetPageDirty(page)) {
> +		struct address_space *mapping = page_mapping(page);
> +		if (mapping && mapping->host) {
> +			/* !PageAnon && !swapper_space */
> +			__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> +		}
> +	}
> +	return 0;
> +}
> +
> +/*
>  * For address_spaces which do not use buffers.  Just tag the page as dirty in
>  * its radix tree.
>  *
> diff -Nurp linux-2.6.20-rc6/mm/shmem.c linux-2.6.20-rc6.unmap/mm/shmem.c
> --- linux-2.6.20-rc6/mm/shmem.c	2007-01-30 19:23:45.000000000 -0800
> +++ linux-2.6.20-rc6.unmap/mm/shmem.c	2007-01-30 19:38:26.000000000 -0800
> @@ -2316,7 +2316,7 @@ static void destroy_inodecache(void)
> 
> static const struct address_space_operations shmem_aops = {
> 	.writepage	= shmem_writepage,
> -	.set_page_dirty	= __set_page_dirty_nobuffers,
> +	.set_page_dirty	= __set_page_dirty_no_write_back,
> #ifdef CONFIG_TMPFS
> 	.prepare_write	= shmem_prepare_write,
> 	.commit_write	= simple_commit_write,
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] simplify shmem_aops.set_page_dirty method
  2007-01-31 17:17 ` Hugh Dickins
@ 2007-01-31 19:11   ` Andrew Morton
  2007-01-31 19:17     ` Hugh Dickins
  2007-01-31 19:14   ` Hugh Dickins
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2007-01-31 19:11 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Ken Chen, linux-mm

On Wed, 31 Jan 2007 17:17:10 +0000 (GMT) Hugh Dickins <hugh@veritas.com> wrote:

> 2.  Please remind me what good __mark_inode_dirty will do for shmem:

None that I can think of - tmpfs inodes don't get written back to swap (do
they?)

> in my patch the equivalent function did nothing beyond SetPageDirty
> (your TestSetPageDirty looks better, less redirtying the cacheline).

Will test_and_set_bit() avoid dirtying the cacheline?  I guess it _could_
do this, and perhaps this depends upon the architecture.  Perhaps

	if (!PageDirty(page))
		SetPageDirty(page);

would be better here.

> The world may have moved on and __mark_inode_dirty now be important,
> but I suspect still not - I think it just puts the inode on some
> hashlist which serves no good purpose for nowriteback mappings.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] simplify shmem_aops.set_page_dirty method
  2007-01-31 17:17 ` Hugh Dickins
  2007-01-31 19:11   ` Andrew Morton
@ 2007-01-31 19:14   ` Hugh Dickins
  1 sibling, 0 replies; 6+ messages in thread
From: Hugh Dickins @ 2007-01-31 19:14 UTC (permalink / raw)
  To: Ken Chen; +Cc: Andrew Morton, linux-mm

On Wed, 31 Jan 2007, Hugh Dickins wrote:
> in my patch the equivalent function did nothing beyond SetPageDirty
> (your TestSetPageDirty looks better, less redirtying the cacheline).

You've probably been wondering what I meant by that cacheline remark:
I was fantasizing TestSetPageDirty(page) as
	if (PageDirty(page))
		return 1;
	SetPageDirty(page);
	return 0;
whereas of course it's something atomic.  Perhaps I had a point,
and the less atomic version above is advantageous here: not sure.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] simplify shmem_aops.set_page_dirty method
  2007-01-31 19:11   ` Andrew Morton
@ 2007-01-31 19:17     ` Hugh Dickins
  2007-01-31 21:23       ` Ken Chen
  0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2007-01-31 19:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ken Chen, linux-mm

On Wed, 31 Jan 2007, Andrew Morton wrote:
> On Wed, 31 Jan 2007 17:17:10 +0000 (GMT) Hugh Dickins <hugh@veritas.com> wrote:
> 
> > 2.  Please remind me what good __mark_inode_dirty will do for shmem:
> 
> None that I can think of - tmpfs inodes don't get written back to swap (do
> they?)

That's right, tmpfs inodes are only in RAM, only the data can go to swap.

> 
> > in my patch the equivalent function did nothing beyond SetPageDirty
> > (your TestSetPageDirty looks better, less redirtying the cacheline).
> 
> Will test_and_set_bit() avoid dirtying the cacheline?  I guess it _could_
> do this, and perhaps this depends upon the architecture.  Perhaps
> 
> 	if (!PageDirty(page))
> 		SetPageDirty(page);
> 
> would be better here.

Synchronicity or telepathy?  Our mails on that crossed.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] simplify shmem_aops.set_page_dirty method
  2007-01-31 19:17     ` Hugh Dickins
@ 2007-01-31 21:23       ` Ken Chen
  0 siblings, 0 replies; 6+ messages in thread
From: Ken Chen @ 2007-01-31 21:23 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, linux-mm

On 1/31/07, Hugh Dickins <hugh@veritas.com> wrote:
> > > 2.  Please remind me what good __mark_inode_dirty will do for shmem:
> >
> > None that I can think of - tmpfs inodes don't get written back to swap (do
> > they?)
>
> That's right, tmpfs inodes are only in RAM, only the data can go to swap.
>
> > Will test_and_set_bit() avoid dirtying the cacheline?  I guess it _could_
> > do this, and perhaps this depends upon the architecture.  Perhaps
> >
> >       if (!PageDirty(page))
> >               SetPageDirty(page);
> >
> > would be better here.


Thank you for reviewing.  Here is a patch with comments incorporated:


shmem backed file does not have page writeback, nor it participates in
backing device's dirty or writeback accounting.  So using generic
__set_page_dirty_nobuffers() for its .set_page_dirty aops method is a bit
overkill.  It unnecessarily prolongs shm unmap latency.

For example, on a densely populated large shm segment (sevearl GBs), the
unmapping operation becomes painfully long. Because at unmap, kernel
transfers dirty bit in PTE into page struct and to the radix tree tag. The
operation of tagging the radix tree is particularly expensive because it
has to traverse the tree from the root to the leaf node on every dirty page.
What's bothering is that radix tree tag is used for page write back. However,
shmem is memory backed and there is no page write back for such file system.
And in the end, we spend all that time tagging radix tree and none of that
fancy tagging will be used.  So let's simplify it by introduce a new aops
__set_page_dirty_no_writeback and this will speed up shm unmap.


Signed-off-by: Ken Chen <kenchen@google.com>

---
Hugh, If you are OK with this, would you please sign off with your s-o-b?

diff -Nurp linux-2.6.20-rc6/include/linux/mm.h
linux-2.6.20-rc6.unmap/include/linux/mm.h
--- linux-2.6.20-rc6/include/linux/mm.h	2007-01-30 19:23:44.000000000 -0800
+++ linux-2.6.20-rc6.unmap/include/linux/mm.h	2007-01-31
11:22:23.000000000 -0800
@@ -785,6 +785,7 @@ extern int try_to_release_page(struct pa
 extern void do_invalidatepage(struct page *page, unsigned long offset);

 int __set_page_dirty_nobuffers(struct page *page);
+int __set_page_dirty_no_writeback(struct page *page);
 int redirty_page_for_writepage(struct writeback_control *wbc,
 				struct page *page);
 int FASTCALL(set_page_dirty(struct page *page));
diff -Nurp linux-2.6.20-rc6/mm/memory.c linux-2.6.20-rc6.unmap/mm/memory.c
--- linux-2.6.20-rc6/mm/memory.c	2007-01-30 19:23:45.000000000 -0800
+++ linux-2.6.20-rc6.unmap/mm/memory.c	2007-01-31 12:47:19.000000000 -0800
@@ -678,7 +678,7 @@ static unsigned long zap_pte_range(struc
 				if (pte_dirty(ptent))
 					set_page_dirty(page);
 				if (pte_young(ptent))
-					mark_page_accessed(page);
+					SetPageReferenced(page);
 				file_rss--;
 			}
 			page_remove_rmap(page, vma);
diff -Nurp linux-2.6.20-rc6/mm/page-writeback.c
linux-2.6.20-rc6.unmap/mm/page-writeback.c
--- linux-2.6.20-rc6/mm/page-writeback.c	2007-01-30 19:23:45.000000000 -0800
+++ linux-2.6.20-rc6.unmap/mm/page-writeback.c	2007-01-31
12:36:46.000000000 -0800
@@ -742,6 +742,16 @@ int write_one_page(struct page *page, in
 EXPORT_SYMBOL(write_one_page);

 /*
+ * For address_spaces which do not use buffers nor write back.
+ */
+int __set_page_dirty_no_writeback(struct page *page)
+{
+	if (!PageDirty(page))
+		SetPageDirty(page);
+	return 0;
+}
+
+/*
  * For address_spaces which do not use buffers.  Just tag the page as dirty in
  * its radix tree.
  *
diff -Nurp linux-2.6.20-rc6/mm/shmem.c linux-2.6.20-rc6.unmap/mm/shmem.c
--- linux-2.6.20-rc6/mm/shmem.c	2007-01-30 19:23:45.000000000 -0800
+++ linux-2.6.20-rc6.unmap/mm/shmem.c	2007-01-31 11:23:27.000000000 -0800
@@ -2316,7 +2316,7 @@ static void destroy_inodecache(void)

 static const struct address_space_operations shmem_aops = {
 	.writepage	= shmem_writepage,
-	.set_page_dirty	= __set_page_dirty_nobuffers,
+	.set_page_dirty	= __set_page_dirty_no_writeback,
 #ifdef CONFIG_TMPFS
 	.prepare_write	= shmem_prepare_write,
 	.commit_write	= simple_commit_write,

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2007-01-31 21:23 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-01-31  4:06 [patch] simplify shmem_aops.set_page_dirty method Ken Chen
2007-01-31 17:17 ` Hugh Dickins
2007-01-31 19:11   ` Andrew Morton
2007-01-31 19:17     ` Hugh Dickins
2007-01-31 21:23       ` Ken Chen
2007-01-31 19:14   ` Hugh Dickins

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.