* [v4][PATCH 0/6] mm: vmscan: Batch page reclamation under shrink_page_list
From: Dave Hansen @ 2013-05-31 18:38 UTC
  To: linux-mm; +Cc: linux-kernel, akpm, mgorman, tim.c.chen, Dave Hansen

These patches are an update of Tim Chen's earlier work:

	http://lkml.kernel.org/r/1347293960.9977.70.camel@schen9-DESK

I broke the patches up a bit more, and tried to incorporate
changes based on feedback from Mel and Andrew.

Changes for v4:
 * generated on top of linux-next-20130530, plus Mel's vmscan
   fixes:
	http://lkml.kernel.org/r/1369659778-6772-2-git-send-email-mgorman@suse.de
 * added some proper vmscan/swap: prefixes to the subjects

Changes for v3:
 * Add batch draining before congestion_wait()
 * resolved minor merge conflicts with Mel's vmscan work

Changes for v2:
 * use the page_mapping() accessor instead of direct access
   to page->mapping (direct access could cause crashes when
   running into swap cache pages)
 * group the batch function's introduction patch with
   its first use
 * rename a few functions as suggested by Mel
 * Ran some single-threaded tests to look for regressions
   caused by the batching.  If there is overhead, it is only
   in the worst-case scenarios, and then only in hundredths of
   a percent of CPU time.

If you're curious how effective the batching is, I have a quick
and dirty patch to keep some stats:

	https://www.sr71.net/~dave/intel/rmb-stats-only.patch

--

To do page reclamation in the shrink_page_list() function, two
locks are taken on a page-by-page basis.  One is the tree_lock
protecting the radix tree of the page's mapping, and the other is
mapping->i_mmap_mutex, which protects the mapped pages.  This set
deals only with mapping->tree_lock.
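
To make the intent concrete, the locking pattern before and after
the series looks roughly like this (an illustrative sketch, not the
literal mm/vmscan.c code; the real loop also has to handle dirty
pages, writeback, and so on):

	/* today: __remove_mapping() takes and drops tree_lock per page */
	list_for_each_entry(page, page_list, lru)
		__remove_mapping(mapping, page);	/* locks internally */

	/* with this set: one lock round trip per batch of same-mapping pages */
	spin_lock_irq(&mapping->tree_lock);
	list_for_each_entry(page, &batch_for_mapping_rm, lru)
		__remove_mapping(mapping, page);	/* caller holds the lock */
	spin_unlock_irq(&mapping->tree_lock);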

Tim measured a 14% throughput improvement with a workload that puts
heavy pressure on the page cache by reading many large mmap'd files
simultaneously on an 8-socket Westmere server.

I've been testing these by running large parallel kernel compiles
on systems that are under memory pressure.  During development, I
caught quite a few races on smaller setups, and the series has been
quite stable on a large (160 logical CPU / 1TB) system.

* [v4][PATCH 1/6] mm: swap: defer clearing of page_private() for swap cache pages
From: Dave Hansen @ 2013-05-31 18:38 UTC
  To: linux-mm; +Cc: linux-kernel, akpm, mgorman, tim.c.chen, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

This patch defers the destruction of swapcache-specific data in
'struct page'.  This simplifies the code because we do not have
to keep extra copies of the data during the removal of a page
from the swap cache.

There are only two callers of swapcache_free() which actually
pass in a non-NULL 'struct page'.  Both of them (__remove_mapping()
and delete_from_swap_cache()) create a temporary on-stack
'swp_entry_t' and set entry.val to page_private().

They need to do this since __delete_from_swap_cache() does
set_page_private(page, 0) and destroys the information.
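
Concretely, both of those callers do roughly the following today
(abridged from the current __remove_mapping()):

	swp_entry_t swap = { .val = page_private(page) };

	__delete_from_swap_cache(page);		/* zeroes page_private() */
	spin_unlock_irq(&mapping->tree_lock);
	swapcache_free(swap, page);		/* relies on the saved copy */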

However, I'd like to batch a few of these operations on several
pages together in a new version of __remove_mapping(), and I
would prefer not to have to allocate temporary storage for each
page.  The code is pretty ugly, and it's a bit silly to create
these on-stack 'swp_entry_t's when it is so easy to just keep the
information around in 'struct page'.

There should not be any danger in doing this since we are
absolutely on the path of freeing these pages.  There is no
turning back, and no other references can be obtained after a
page comes out of the radix tree.

Note: This patch is separate from the next one since it
introduces the behavior change.  I've seen issues with this patch
by itself in various forms and I think having it separate like
this aids bisection.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
---

 linux.git-davehans/mm/swap_state.c |    4 ++--
 linux.git-davehans/mm/vmscan.c     |    2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff -puN mm/swap_state.c~__delete_from_swap_cache-dont-clear-page-private mm/swap_state.c
--- linux.git/mm/swap_state.c~__delete_from_swap_cache-dont-clear-page-private	2013-05-30 16:07:50.630079404 -0700
+++ linux.git-davehans/mm/swap_state.c	2013-05-30 16:07:50.635079624 -0700
@@ -148,8 +148,6 @@ void __delete_from_swap_cache(struct pag
 	entry.val = page_private(page);
 	address_space = swap_address_space(entry);
 	radix_tree_delete(&address_space->page_tree, page_private(page));
-	set_page_private(page, 0);
-	ClearPageSwapCache(page);
 	address_space->nrpages--;
 	__dec_zone_page_state(page, NR_FILE_PAGES);
 	INC_CACHE_INFO(del_total);
@@ -226,6 +224,8 @@ void delete_from_swap_cache(struct page
 	spin_unlock_irq(&address_space->tree_lock);
 
 	swapcache_free(entry, page);
+	set_page_private(page, 0);
+	ClearPageSwapCache(page);
 	page_cache_release(page);
 }
 
diff -puN mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private mm/vmscan.c
--- linux.git/mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private	2013-05-30 16:07:50.632079492 -0700
+++ linux.git-davehans/mm/vmscan.c	2013-05-30 16:07:50.637079712 -0700
@@ -494,6 +494,8 @@ static int __remove_mapping(struct addre
 		__delete_from_swap_cache(page);
 		spin_unlock_irq(&mapping->tree_lock);
 		swapcache_free(swap, page);
+		set_page_private(page, 0);
+		ClearPageSwapCache(page);
 	} else {
 		void (*freepage)(struct page *);
 
_

* [v4][PATCH 2/6] mm: swap: make 'struct page' and swp_entry_t variants of swapcache_free().
From: Dave Hansen @ 2013-05-31 18:38 UTC
  To: linux-mm; +Cc: linux-kernel, akpm, mgorman, tim.c.chen, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

swapcache_free() takes two arguments:

	void swapcache_free(swp_entry_t entry, struct page *page)

Most of its callers (5 of 7) are error-handling paths that have
not even instantiated a page, so they pass page=NULL.  Both of
the callers that do pass in a 'struct page' create and pass in a
temporary swp_entry_t.

Now that we are deferring clearing page_private() until after
swapcache_free() has been called, we can just create a variant
that takes a 'struct page' and builds the temporary swp_entry_t
inside the helper.

That leaves all the other callers doing

	swapcache_free(entry, NULL)

so create another helper for them that makes it clear that they
need only pass in a swp_entry_t.

One downside here is that delete_from_swap_cache() now does
an extra swap_address_space() call.  But, those are pretty
cheap (just some array index arithmetic).
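
For reference, swap_address_space() is approximately the following
in this tree (check your own tree for the exact definition):

	/* include/linux/swap.h, approximately */
	extern struct address_space swapper_spaces[];
	#define swap_address_space(entry) \
		(&swapper_spaces[swp_type(entry)])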

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/drivers/staging/zcache/zcache-main.c |    2 +-
 linux.git-davehans/include/linux/swap.h                 |    3 ++-
 linux.git-davehans/mm/shmem.c                           |    2 +-
 linux.git-davehans/mm/swap_state.c                      |   15 +++++----------
 linux.git-davehans/mm/swapfile.c                        |   15 ++++++++++++++-
 linux.git-davehans/mm/vmscan.c                          |    5 +----
 6 files changed, 24 insertions(+), 18 deletions(-)

diff -puN drivers/staging/zcache/zcache-main.c~make-page-and-swp_entry_t-variants drivers/staging/zcache/zcache-main.c
--- linux.git/drivers/staging/zcache/zcache-main.c~make-page-and-swp_entry_t-variants	2013-05-30 16:07:50.898091196 -0700
+++ linux.git-davehans/drivers/staging/zcache/zcache-main.c	2013-05-30 16:07:50.910091724 -0700
@@ -961,7 +961,7 @@ static int zcache_get_swap_cache_page(in
 		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
 		 * clear SWAP_HAS_CACHE flag.
 		 */
-		swapcache_free(entry, NULL);
+		swapcache_free_entry(entry);
 		/* FIXME: is it possible to get here without err==-ENOMEM?
 		 * If not, we can dispense with the do loop, use goto retry */
 	} while (err != -ENOMEM);
diff -puN include/linux/swap.h~make-page-and-swp_entry_t-variants include/linux/swap.h
--- linux.git/include/linux/swap.h~make-page-and-swp_entry_t-variants	2013-05-30 16:07:50.900091284 -0700
+++ linux.git-davehans/include/linux/swap.h	2013-05-30 16:07:50.911091768 -0700
@@ -385,7 +385,8 @@ extern void swap_shmem_alloc(swp_entry_t
 extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
 extern void swap_free(swp_entry_t);
-extern void swapcache_free(swp_entry_t, struct page *page);
+extern void swapcache_free_entry(swp_entry_t entry);
+extern void swapcache_free_page_entry(struct page *page);
 extern int free_swap_and_cache(swp_entry_t);
 extern int swap_type_of(dev_t, sector_t, struct block_device **);
 extern unsigned int count_swap_pages(int, int);
diff -puN mm/shmem.c~make-page-and-swp_entry_t-variants mm/shmem.c
--- linux.git/mm/shmem.c~make-page-and-swp_entry_t-variants	2013-05-30 16:07:50.902091372 -0700
+++ linux.git-davehans/mm/shmem.c	2013-05-30 16:07:50.912091812 -0700
@@ -872,7 +872,7 @@ static int shmem_writepage(struct page *
 	}
 
 	mutex_unlock(&shmem_swaplist_mutex);
-	swapcache_free(swap, NULL);
+	swapcache_free_entry(swap);
 redirty:
 	set_page_dirty(page);
 	if (wbc->for_reclaim)
diff -puN mm/swapfile.c~make-page-and-swp_entry_t-variants mm/swapfile.c
--- linux.git/mm/swapfile.c~make-page-and-swp_entry_t-variants	2013-05-30 16:07:50.904091460 -0700
+++ linux.git-davehans/mm/swapfile.c	2013-05-30 16:07:50.913091856 -0700
@@ -637,7 +637,7 @@ void swap_free(swp_entry_t entry)
 /*
  * Called after dropping swapcache to decrease refcnt to swap entries.
  */
-void swapcache_free(swp_entry_t entry, struct page *page)
+static void __swapcache_free(swp_entry_t entry, struct page *page)
 {
 	struct swap_info_struct *p;
 	unsigned char count;
@@ -651,6 +651,19 @@ void swapcache_free(swp_entry_t entry, s
 	}
 }
 
+void swapcache_free_entry(swp_entry_t entry)
+{
+	__swapcache_free(entry, NULL);
+}
+
+void swapcache_free_page_entry(struct page *page)
+{
+	swp_entry_t entry = { .val = page_private(page) };
+	__swapcache_free(entry, page);
+	set_page_private(page, 0);
+	ClearPageSwapCache(page);
+}
+
 /*
  * How many references to page are currently swapped out?
  * This does not give an exact answer when swap count is continued,
diff -puN mm/swap_state.c~make-page-and-swp_entry_t-variants mm/swap_state.c
--- linux.git/mm/swap_state.c~make-page-and-swp_entry_t-variants	2013-05-30 16:07:50.905091504 -0700
+++ linux.git-davehans/mm/swap_state.c	2013-05-30 16:07:50.914091900 -0700
@@ -174,7 +174,7 @@ int add_to_swap(struct page *page, struc
 
 	if (unlikely(PageTransHuge(page)))
 		if (unlikely(split_huge_page_to_list(page, list))) {
-			swapcache_free(entry, NULL);
+			swapcache_free_entry(entry);
 			return 0;
 		}
 
@@ -200,7 +200,7 @@ int add_to_swap(struct page *page, struc
 		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
 		 * clear SWAP_HAS_CACHE flag.
 		 */
-		swapcache_free(entry, NULL);
+		swapcache_free_entry(entry);
 		return 0;
 	}
 }
@@ -213,19 +213,14 @@ int add_to_swap(struct page *page, struc
  */
 void delete_from_swap_cache(struct page *page)
 {
-	swp_entry_t entry;
 	struct address_space *address_space;
 
-	entry.val = page_private(page);
-
-	address_space = swap_address_space(entry);
+	address_space = page_mapping(page);
 	spin_lock_irq(&address_space->tree_lock);
 	__delete_from_swap_cache(page);
 	spin_unlock_irq(&address_space->tree_lock);
 
-	swapcache_free(entry, page);
-	set_page_private(page, 0);
-	ClearPageSwapCache(page);
+	swapcache_free_page_entry(page);
 	page_cache_release(page);
 }
 
@@ -370,7 +365,7 @@ struct page *read_swap_cache_async(swp_e
 		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
 		 * clear SWAP_HAS_CACHE flag.
 		 */
-		swapcache_free(entry, NULL);
+		swapcache_free_entry(entry);
 	} while (err != -ENOMEM);
 
 	if (new_page)
diff -puN mm/vmscan.c~make-page-and-swp_entry_t-variants mm/vmscan.c
--- linux.git/mm/vmscan.c~make-page-and-swp_entry_t-variants	2013-05-30 16:07:50.907091592 -0700
+++ linux.git-davehans/mm/vmscan.c	2013-05-30 16:07:50.915091944 -0700
@@ -490,12 +490,9 @@ static int __remove_mapping(struct addre
 	}
 
 	if (PageSwapCache(page)) {
-		swp_entry_t swap = { .val = page_private(page) };
 		__delete_from_swap_cache(page);
 		spin_unlock_irq(&mapping->tree_lock);
-		swapcache_free(swap, page);
-		set_page_private(page, 0);
-		ClearPageSwapCache(page);
+		swapcache_free_page_entry(page);
 	} else {
 		void (*freepage)(struct page *);
 
_

* [v4][PATCH 3/6] mm: vmscan: break up __remove_mapping()
From: Dave Hansen @ 2013-05-31 18:38 UTC
  To: linux-mm; +Cc: linux-kernel, akpm, mgorman, tim.c.chen, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

Our goal here is to eventually reduce the number of repetitive
acquire/release operations on mapping->tree_lock.

Logically, this patch has two steps:
1. rename __remove_mapping() to lock_remove_mapping() since
   "__" usually means "this is the unlocked version".
2. Recreate __remove_mapping() to _be_ lock_remove_mapping(),
   but without the locks.

I think this actually makes the code flow around the locking
_much_ more straightforward since the locking just becomes:

	spin_lock_irq(&mapping->tree_lock);
	ret = __remove_mapping(mapping, page);
	spin_unlock_irq(&mapping->tree_lock);

One non-obvious part of this patch: the

	freepage = mapping->a_ops->freepage;

used to happen under the mapping->tree_lock, but this patch
moves it to outside of the lock.  All of the other
a_ops->freepage users do it outside the lock, and we only
assign it when we create inodes, so that makes it safe.
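
After this patch, that tail ends up in lock_remove_mapping() and
looks roughly like this (condensed from the diff below; the
swap-cache branch is omitted):

	spin_lock_irq(&mapping->tree_lock);
	ret = __remove_mapping(mapping, page);	/* radix tree work only */
	spin_unlock_irq(&mapping->tree_lock);

	if (ret && !PageSwapCache(page)) {
		void (*freepage)(struct page *);

		freepage = mapping->a_ops->freepage;	/* read after unlocking */
		mem_cgroup_uncharge_cache_page(page);
		if (freepage != NULL)
			freepage(page);
	}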

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>

---

 linux.git-davehans/mm/vmscan.c |   43 ++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 17 deletions(-)

diff -puN mm/vmscan.c~make-remove-mapping-without-locks mm/vmscan.c
--- linux.git/mm/vmscan.c~make-remove-mapping-without-locks	2013-05-30 16:07:51.210104924 -0700
+++ linux.git-davehans/mm/vmscan.c	2013-05-30 16:07:51.214105100 -0700
@@ -450,12 +450,12 @@ static pageout_t pageout(struct page *pa
  * Same as remove_mapping, but if the page is removed from the mapping, it
  * gets returned with a refcount of 0.
  */
-static int __remove_mapping(struct address_space *mapping, struct page *page)
+static int __remove_mapping(struct address_space *mapping,
+			    struct page *page)
 {
 	BUG_ON(!PageLocked(page));
 	BUG_ON(mapping != page_mapping(page));
 
-	spin_lock_irq(&mapping->tree_lock);
 	/*
 	 * The non racy check for a busy page.
 	 *
@@ -482,35 +482,44 @@ static int __remove_mapping(struct addre
 	 * and thus under tree_lock, then this ordering is not required.
 	 */
 	if (!page_freeze_refs(page, 2))
-		goto cannot_free;
+		return 0;
 	/* note: atomic_cmpxchg in page_freeze_refs provides the smp_rmb */
 	if (unlikely(PageDirty(page))) {
 		page_unfreeze_refs(page, 2);
-		goto cannot_free;
+		return 0;
 	}
 
 	if (PageSwapCache(page)) {
 		__delete_from_swap_cache(page);
-		spin_unlock_irq(&mapping->tree_lock);
+	} else {
+		__delete_from_page_cache(page);
+	}
+	return 1;
+}
+
+static int lock_remove_mapping(struct address_space *mapping, struct page *page)
+{
+	int ret;
+	BUG_ON(!PageLocked(page));
+
+	spin_lock_irq(&mapping->tree_lock);
+	ret = __remove_mapping(mapping, page);
+	spin_unlock_irq(&mapping->tree_lock);
+
+	/* unable to free */
+	if (!ret)
+		return 0;
+
+	if (PageSwapCache(page)) {
 		swapcache_free_page_entry(page);
 	} else {
 		void (*freepage)(struct page *);
-
 		freepage = mapping->a_ops->freepage;
-
-		__delete_from_page_cache(page);
-		spin_unlock_irq(&mapping->tree_lock);
 		mem_cgroup_uncharge_cache_page(page);
-
 		if (freepage != NULL)
 			freepage(page);
 	}
-
-	return 1;
-
-cannot_free:
-	spin_unlock_irq(&mapping->tree_lock);
-	return 0;
+	return ret;
 }
 
 /*
@@ -521,7 +530,7 @@ cannot_free:
  */
 int remove_mapping(struct address_space *mapping, struct page *page)
 {
-	if (__remove_mapping(mapping, page)) {
+	if (lock_remove_mapping(mapping, page)) {
 		/*
 		 * Unfreezing the refcount with 1 rather than 2 effectively
 		 * drops the pagecache ref for us without requiring another
_

* [v4][PATCH 4/6] mm: vmscan: break out mapping "freepage" code
From: Dave Hansen @ 2013-05-31 18:39 UTC
  To: linux-mm; +Cc: linux-kernel, akpm, mgorman, tim.c.chen, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

__remove_mapping() only deals with pages with mappings, meaning
page cache and swap cache.

At this point, the page has been removed from the mapping's radix
tree, and we need to ensure that any fs-specific (or swap-
specific) resources are freed up.

We will be using this function from a second location in a
following patch.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
---

 linux.git-davehans/mm/vmscan.c |   28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff -puN mm/vmscan.c~free_mapping_page mm/vmscan.c
--- linux.git/mm/vmscan.c~free_mapping_page	2013-05-30 16:07:51.461115968 -0700
+++ linux.git-davehans/mm/vmscan.c	2013-05-30 16:07:51.465116144 -0700
@@ -497,6 +497,24 @@ static int __remove_mapping(struct addre
 	return 1;
 }
 
+/*
+ * Release any resources the mapping had tied up in
+ * the page.
+ */
+static void mapping_release_page(struct address_space *mapping,
+				 struct page *page)
+{
+	if (PageSwapCache(page)) {
+		swapcache_free_page_entry(page);
+	} else {
+		void (*freepage)(struct page *);
+		freepage = mapping->a_ops->freepage;
+		mem_cgroup_uncharge_cache_page(page);
+		if (freepage != NULL)
+			freepage(page);
+	}
+}
+
 static int lock_remove_mapping(struct address_space *mapping, struct page *page)
 {
 	int ret;
@@ -510,15 +528,7 @@ static int lock_remove_mapping(struct ad
 	if (!ret)
 		return 0;
 
-	if (PageSwapCache(page)) {
-		swapcache_free_page_entry(page);
-	} else {
-		void (*freepage)(struct page *);
-		freepage = mapping->a_ops->freepage;
-		mem_cgroup_uncharge_cache_page(page);
-		if (freepage != NULL)
-			freepage(page);
-	}
+	mapping_release_page(mapping, page);
 	return ret;
 }
 
_

* [v4][PATCH 5/6] mm: vmscan: batch shrink_page_list() locking operations
From: Dave Hansen @ 2013-05-31 18:39 UTC
  To: linux-mm; +Cc: linux-kernel, akpm, mgorman, tim.c.chen, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

Changes for v2:
 * remove the batch_has_same_mapping() helper.  A local variable
   makes the check cheaper and cleaner
 * Move batch draining later to where we already know
   page_mapping().  This probably fixes a truncation race anyway
 * rename batch_for_mapping_removal -> batch_for_mapping_rm.  It
   caused a line over 80 chars and needed shortening anyway.
 * Note: we only set 'batch_mapping' when there are pages in the
   batch_for_mapping_rm list

--

We batch like this so that several pages can be freed with a
single mapping->tree_lock acquisition/release pair.  This reduces
the number of atomic operations and ensures that we do not bounce
cachelines around.

Tim Chen's earlier version of these patches just unconditionally
created large batches of pages, even if they did not share a
page_mapping().  This is a bit suboptimal for a few reasons:
1. if we can not consolidate lock acquisitions, it makes little
   sense to batch
2. The page locks are held for long periods of time, so we only
   want to do this when we are sure that we will gain a
   substantial throughput improvement because we pay a latency
   cost by holding the locks.

This patch makes sure to only batch when all the pages on
'batch_for_mapping_rm' continue to share a page_mapping().
This only happens in practice in cases where pages in the same
file are close to each other on the LRU.  That seems like a
reasonable assumption.
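
In shrink_page_list(), the batching then boils down to something
like this (a heavily condensed sketch; the complete logic, including
the dirty/writeback handling in between, is in the diff below):

	mapping = page_mapping(page);
	/* drain the batch whenever the mapping changes */
	if (batch_mapping && (batch_mapping != mapping)) {
		nr_reclaimed += __remove_mapping_batch(&batch_for_mapping_rm,
						       &ret_pages, &free_pages);
		batch_mapping = NULL;
	}

	/* ... mapped/dirty/writeback handling elided ... */

	/* queue the page for batched removal instead of removing it here */
	list_add(&page->lru, &batch_for_mapping_rm);
	batch_mapping = mapping;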

In a 128MB virtual machine doing kernel compiles, the average
batch size when calling __remove_mapping_batch() is around 5,
so this does seem to do some good in practice.

On a 160-cpu system doing kernel compiles, I still saw an
average batch length of about 2.8.  One promising feature:
as the memory pressure went up, the average batches seem to
have gotten larger.

It has shown some substantial performance benefits on
microbenchmarks.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/mm/vmscan.c |   95 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 86 insertions(+), 9 deletions(-)

diff -puN mm/vmscan.c~create-remove_mapping_batch mm/vmscan.c
--- linux.git/mm/vmscan.c~create-remove_mapping_batch	2013-05-30 16:07:51.711126969 -0700
+++ linux.git-davehans/mm/vmscan.c	2013-05-30 16:07:51.716127189 -0700
@@ -552,6 +552,61 @@ int remove_mapping(struct address_space
 	return 0;
 }
 
+/*
+ * pages come in here (via remove_list) locked and leave unlocked
+ * (on either ret_pages or free_pages)
+ *
+ * We do this batching so that we free batches of pages with a
+ * single mapping->tree_lock acquisition/release.  This optimization
+ * only makes sense when the pages on remove_list all share a
+ * page_mapping().  If this is violated you will BUG_ON().
+ */
+static int __remove_mapping_batch(struct list_head *remove_list,
+				  struct list_head *ret_pages,
+				  struct list_head *free_pages)
+{
+	int nr_reclaimed = 0;
+	struct address_space *mapping;
+	struct page *page;
+	LIST_HEAD(need_free_mapping);
+
+	if (list_empty(remove_list))
+		return 0;
+
+	mapping = page_mapping(lru_to_page(remove_list));
+	spin_lock_irq(&mapping->tree_lock);
+	while (!list_empty(remove_list)) {
+		page = lru_to_page(remove_list);
+		BUG_ON(!PageLocked(page));
+		BUG_ON(page_mapping(page) != mapping);
+		list_del(&page->lru);
+
+		if (!__remove_mapping(mapping, page)) {
+			unlock_page(page);
+			list_add(&page->lru, ret_pages);
+			continue;
+		}
+		list_add(&page->lru, &need_free_mapping);
+	}
+	spin_unlock_irq(&mapping->tree_lock);
+
+	while (!list_empty(&need_free_mapping)) {
+		page = lru_to_page(&need_free_mapping);
+		list_move(&page->lru, free_pages);
+		mapping_release_page(mapping, page);
+		/*
+		 * At this point, we have no other references and there is
+		 * no way to pick any more up (removed from LRU, removed
+		 * from pagecache). Can use non-atomic bitops now (and
+		 * we obviously don't have to worry about waking up a process
+		 * waiting on the page lock, because there are no references.
+		 */
+		__clear_page_locked(page);
+		nr_reclaimed++;
+	}
+	return nr_reclaimed;
+}
+
 /**
  * putback_lru_page - put previously isolated page onto appropriate LRU list
  * @page: page to be put back to appropriate lru list
@@ -730,6 +785,8 @@ static unsigned long shrink_page_list(st
 {
 	LIST_HEAD(ret_pages);
 	LIST_HEAD(free_pages);
+	LIST_HEAD(batch_for_mapping_rm);
+	struct address_space *batch_mapping = NULL;
 	int pgactivate = 0;
 	unsigned long nr_unqueued_dirty = 0;
 	unsigned long nr_dirty = 0;
@@ -751,6 +808,7 @@ static unsigned long shrink_page_list(st
 		cond_resched();
 
 		page = lru_to_page(page_list);
+
 		list_del(&page->lru);
 
 		if (!trylock_page(page))
@@ -853,6 +911,10 @@ static unsigned long shrink_page_list(st
 
 			/* Case 3 above */
 			} else {
+				/*
+				 * batch_for_mapping_rm could be drained here
+				 * if its lock_page()s hurt latency elsewhere.
+				 */
 				wait_on_page_writeback(page);
 			}
 		}
@@ -883,6 +945,18 @@ static unsigned long shrink_page_list(st
 		}
 
 		mapping = page_mapping(page);
+		/*
+		 * batching only makes sense when we can save lock
+		 * acquisitions, so drain the previously-batched
+		 * pages when we move over to a different mapping
+		 */
+		if (batch_mapping && (batch_mapping != mapping)) {
+			nr_reclaimed +=
+				__remove_mapping_batch(&batch_for_mapping_rm,
+							&ret_pages,
+							&free_pages);
+			batch_mapping = NULL;
+		}
 
 		/*
 		 * The page is mapped into the page tables of one or more
@@ -998,17 +1072,18 @@ static unsigned long shrink_page_list(st
 			}
 		}
 
-		if (!mapping || !__remove_mapping(mapping, page))
+		if (!mapping)
 			goto keep_locked;
-
 		/*
-		 * At this point, we have no other references and there is
-		 * no way to pick any more up (removed from LRU, removed
-		 * from pagecache). Can use non-atomic bitops now (and
-		 * we obviously don't have to worry about waking up a process
-		 * waiting on the page lock, because there are no references.
+		 * This list contains pages all in the same mapping, but
+		 * in effectively random order and we hold lock_page()
+		 * on *all* of them.  This can potentially cause lock
+		 * ordering issues, but the reclaim code only trylocks
+		 * them which saves us.
 		 */
-		__clear_page_locked(page);
+		list_add(&page->lru, &batch_for_mapping_rm);
+		batch_mapping = mapping;
+		continue;
 free_it:
 		nr_reclaimed++;
 
@@ -1039,7 +1114,9 @@ keep:
 		list_add(&page->lru, &ret_pages);
 		VM_BUG_ON(PageLRU(page) || PageUnevictable(page));
 	}
-
+	nr_reclaimed += __remove_mapping_batch(&batch_for_mapping_rm,
+						&ret_pages,
+						&free_pages);
 	/*
 	 * Tag a zone as congested if all the dirty pages encountered were
 	 * backed by a congested BDI. In this case, reclaimers should just
_

* [v4][PATCH 6/6] mm: vmscan: drain batch list during long operations
From: Dave Hansen @ 2013-05-31 18:39 UTC
  To: linux-mm; +Cc: linux-kernel, akpm, mgorman, tim.c.chen, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

This was a suggestion from Mel:

	http://lkml.kernel.org/r/20120914085634.GM11157@csn.ul.ie

Any pages we collect on 'batch_for_mapping_removal' will have
their lock_page() held for the duration of their stay on the
list.  If some other user is trying to get at them during this
time, they might end up having to wait.

This ensures that we drain the batch if we are about to perform a
pageout() or congestion_wait(), either of which will take some
time.  We expect this to help mitigate the worst of the latency
increase that the batching could cause.

I added some statistics to the __remove_mapping_batch() code to
track how large the lists are that we pass in to it.  With this
patch, the average list length drops about 10% (from about 4.1 to
3.8).  The workload here was a make -j4 kernel compile on a VM
with 200MB of RAM.

I've still got the statistics patch around if anyone is
interested.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/mm/vmscan.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff -puN mm/vmscan.c~drain-batch-list-during-long-operations mm/vmscan.c
--- linux.git/mm/vmscan.c~drain-batch-list-during-long-operations	2013-05-30 16:07:51.962138013 -0700
+++ linux.git-davehans/mm/vmscan.c	2013-05-30 16:07:51.966138189 -0700
@@ -1003,6 +1003,16 @@ static unsigned long shrink_page_list(st
 			if (!sc->may_writepage)
 				goto keep_locked;
 
+			/*
+			 * We hold a bunch of page locks on the batch.
+			 * pageout() can take a while, so drain the
+			 * batch before we perform pageout.
+			 */
+			nr_reclaimed +=
+		               __remove_mapping_batch(&batch_for_mapping_rm,
+		                                      &ret_pages,
+		                                      &free_pages);
+
 			/* Page is dirty, try to write it out here */
 			switch (pageout(page, mapping, sc)) {
 			case PAGE_KEEP:
_

* Re: [v4][PATCH 1/6] mm: swap: defer clearing of page_private() for swap cache pages
  2013-05-31 18:38   ` Dave Hansen
@ 2013-06-03  5:40     ` Minchan Kim
  -1 siblings, 0 replies; 28+ messages in thread
From: Minchan Kim @ 2013-06-03  5:40 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, mgorman, tim.c.chen

On Fri, May 31, 2013 at 11:38:56AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> This patch defers the destruction of swapcache-specific data in
> 'struct page'.  This simplifies the code because we do not have
> to keep extra copies of the data during the removal of a page
> from the swap cache.
> 
> There are only two callers of swapcache_free() which actually
> pass in a non-NULL 'struct page'.  Both of them (__remove_mapping
> and delete_from_swap_cache())  create a temporary on-stack
> 'swp_entry_t' and set entry.val to page_private().
> 
> They need to do this since __delete_from_swap_cache() does
> set_page_private(page, 0) and destroys the information.
> 
> However, I'd like to batch a few of these operations on several
> pages together in a new version of __remove_mapping(), and I
> would prefer not to have to allocate temporary storage for each
> page.  The code is pretty ugly, and it's a bit silly to create
> these on-stack 'swp_entry_t's when it is so easy to just keep the
> information around in 'struct page'.
> 
> There should not be any danger in doing this since we are
> absolutely on the path of freeing these pages.  There is no
> turning back, and no other references can be obtained after a
> page comes out of the radix tree.
> 
> Note: This patch is separate from the next one since it
> introduces the behavior change.  I've seen issues with this patch
> by itself in various forms and I think having it separate like
> this aids bisection.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Acked-by: Mel Gorman <mgorman@suse.de>

Reviewed-by: Minchan Kim <minchan@kernel.org>

Please see the nitpick below.

> ---
> 
>  linux.git-davehans/mm/swap_state.c |    4 ++--
>  linux.git-davehans/mm/vmscan.c     |    2 ++
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff -puN mm/swap_state.c~__delete_from_swap_cache-dont-clear-page-private mm/swap_state.c
> --- linux.git/mm/swap_state.c~__delete_from_swap_cache-dont-clear-page-private	2013-05-30 16:07:50.630079404 -0700
> +++ linux.git-davehans/mm/swap_state.c	2013-05-30 16:07:50.635079624 -0700
> @@ -148,8 +148,6 @@ void __delete_from_swap_cache(struct pag
>  	entry.val = page_private(page);
>  	address_space = swap_address_space(entry);
>  	radix_tree_delete(&address_space->page_tree, page_private(page));
> -	set_page_private(page, 0);
> -	ClearPageSwapCache(page);
>  	address_space->nrpages--;
>  	__dec_zone_page_state(page, NR_FILE_PAGES);
>  	INC_CACHE_INFO(del_total);
> @@ -226,6 +224,8 @@ void delete_from_swap_cache(struct page
>  	spin_unlock_irq(&address_space->tree_lock);
>  
>  	swapcache_free(entry, page);
> +	set_page_private(page, 0);
> +	ClearPageSwapCache(page);
>  	page_cache_release(page);
>  }
>  
> diff -puN mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private mm/vmscan.c
> --- linux.git/mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private	2013-05-30 16:07:50.632079492 -0700
> +++ linux.git-davehans/mm/vmscan.c	2013-05-30 16:07:50.637079712 -0700
> @@ -494,6 +494,8 @@ static int __remove_mapping(struct addre
>  		__delete_from_swap_cache(page);
>  		spin_unlock_irq(&mapping->tree_lock);
>  		swapcache_free(swap, page);
> +		set_page_private(page, 0);
> +		ClearPageSwapCache(page);

Is it worth supporting a non-atomic version of ClearPageSwapCache?

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [v4][PATCH 2/6] mm: swap: make 'struct page' and swp_entry_t variants of swapcache_free().
  2013-05-31 18:38   ` Dave Hansen
@ 2013-06-03  6:13     ` Minchan Kim
  -1 siblings, 0 replies; 28+ messages in thread
From: Minchan Kim @ 2013-06-03  6:13 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, mgorman, tim.c.chen

Hello Dave,

On Fri, May 31, 2013 at 11:38:58AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> swapcache_free() takes two arguments:
> 
> 	void swapcache_free(swp_entry_t entry, struct page *page)
> 
> Most of its callers (5/7) are from error handling paths that haven't even
> instantiated a page, so they pass page=NULL.  Both of the callers
> that call in with a 'struct page' create and pass in a temporary
> swp_entry_t.
> 
> Now that we are deferring clearing page_private() until after
> swapcache_free() has been called, we can just create a variant
> that takes a 'struct page' and does the temporary variable in
> the helper.
> 
> That leaves all the other callers doing
> 
> 	swapcache_free(entry, NULL)
> 
> so create another helper for them that makes it clear that they
> need only pass in a swp_entry_t.
> 
> One downside here is that delete_from_swap_cache() now does
> an extra swap_address_space() call.  But, those are pretty
> cheap (just some array index arithmetic).

I got lost in this description.

Old behavior

delete_from_swap_cache
        swap_address_space
        __delete_from_swap_cache
                swap_address_space


New behavior

delete_from_swap_cache
        __delete_from_swap_cache
                swap_address_space
                
So you remove a swap_address_space() call rather than adding an extra one.
Am I missing something?

> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>

Otherwise, looks good to me
Reviewed-by: Minchan Kim <minchan@kernel.org>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [v4][PATCH 3/6] mm: vmscan: break up __remove_mapping()
  2013-05-31 18:38   ` Dave Hansen
@ 2013-06-03  8:28     ` Minchan Kim
  -1 siblings, 0 replies; 28+ messages in thread
From: Minchan Kim @ 2013-06-03  8:28 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, mgorman, tim.c.chen

On Fri, May 31, 2013 at 11:38:59AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> Our goal here is to eventually reduce the number of repetitive
> acquire/release operations on mapping->tree_lock.
> 
> Logically, this patch has two steps:
> 1. rename __remove_mapping() to lock_remove_mapping(), since
>    "__" usually means "this is the unlocked version".
> 2. Recreate __remove_mapping() to _be_ the lock_remove_mapping()
>    but without the locks.
> 
> I think this actually makes the code flow around the locking
> _much_ more straightforward, since the locking just becomes:
> 
> 	spin_lock_irq(&mapping->tree_lock);
> 	ret = __remove_mapping(mapping, page);
> 	spin_unlock_irq(&mapping->tree_lock);
> 
> One non-obvious part of this patch: the
> 
> 	freepage = mapping->a_ops->freepage;
> 
> used to happen under the mapping->tree_lock, but this patch
> moves it to outside of the lock.  All of the other
> a_ops->freepage users do it outside the lock, and we only
> assign it when we create inodes, so that makes it safe.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Acked-by: Mel Gorman <mgorman@suse.de>

Reviewed-by: Minchan Kim <minchan@kernel.org>

Just a nitpick below.

> 
> ---
> 
>  linux.git-davehans/mm/vmscan.c |   43 ++++++++++++++++++++++++-----------------
>  1 file changed, 26 insertions(+), 17 deletions(-)
> 
> diff -puN mm/vmscan.c~make-remove-mapping-without-locks mm/vmscan.c
> --- linux.git/mm/vmscan.c~make-remove-mapping-without-locks	2013-05-30 16:07:51.210104924 -0700
> +++ linux.git-davehans/mm/vmscan.c	2013-05-30 16:07:51.214105100 -0700
> @@ -450,12 +450,12 @@ static pageout_t pageout(struct page *pa
>   * Same as remove_mapping, but if the page is removed from the mapping, it
>   * gets returned with a refcount of 0.
>   */
> -static int __remove_mapping(struct address_space *mapping, struct page *page)
> +static int __remove_mapping(struct address_space *mapping,
> +			    struct page *page)

Unnecessary change.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [v4][PATCH 4/6] mm: vmscan: break out mapping "freepage" code
  2013-05-31 18:39   ` Dave Hansen
@ 2013-06-03  8:35     ` Minchan Kim
  -1 siblings, 0 replies; 28+ messages in thread
From: Minchan Kim @ 2013-06-03  8:35 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, mgorman, tim.c.chen

On Fri, May 31, 2013 at 11:39:01AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> __remove_mapping() only deals with pages with mappings, meaning
> page cache and swap cache.
> 
> At this point, the page has been removed from the mapping's radix
> tree, and we need to ensure that any fs-specific (or swap-
> specific) resources are freed up.
> 
> We will be using this function from a second location in a
> following patch.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Acked-by: Mel Gorman <mgorman@suse.de>

Reviewed-by: Minchan Kim <minchan@kernel.org>

Again, a nitpick. Sorry.

> ---
> 
>  linux.git-davehans/mm/vmscan.c |   28 +++++++++++++++++++---------
>  1 file changed, 19 insertions(+), 9 deletions(-)
> 
> diff -puN mm/vmscan.c~free_mapping_page mm/vmscan.c
> --- linux.git/mm/vmscan.c~free_mapping_page	2013-05-30 16:07:51.461115968 -0700
> +++ linux.git-davehans/mm/vmscan.c	2013-05-30 16:07:51.465116144 -0700
> @@ -497,6 +497,24 @@ static int __remove_mapping(struct addre
>  	return 1;
>  }
>  
> +/*
> + * Release any resources the mapping had tied up in
> + * the page.

This could fit on one line.


> + */
> +static void mapping_release_page(struct address_space *mapping,
> +				 struct page *page)
> +{
> +	if (PageSwapCache(page)) {
> +		swapcache_free_page_entry(page);
> +	} else {
> +		void (*freepage)(struct page *);
> +		freepage = mapping->a_ops->freepage;
> +		mem_cgroup_uncharge_cache_page(page);
> +		if (freepage != NULL)
> +			freepage(page);
> +	}
> +}
> +
>  static int lock_remove_mapping(struct address_space *mapping, struct page *page)
>  {
>  	int ret;
> @@ -510,15 +528,7 @@ static int lock_remove_mapping(struct ad
>  	if (!ret)
>  		return 0;
>  
> -	if (PageSwapCache(page)) {
> -		swapcache_free_page_entry(page);
> -	} else {
> -		void (*freepage)(struct page *);
> -		freepage = mapping->a_ops->freepage;
> -		mem_cgroup_uncharge_cache_page(page);
> -		if (freepage != NULL)
> -			freepage(page);
> -	}
> +	mapping_release_page(mapping, page);
>  	return ret;
>  }
>  
> _
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [v4][PATCH 1/6] mm: swap: defer clearing of page_private() for swap cache pages
  2013-06-03  5:40     ` Minchan Kim
@ 2013-06-03 14:53       ` Dave Hansen
  -1 siblings, 0 replies; 28+ messages in thread
From: Dave Hansen @ 2013-06-03 14:53 UTC (permalink / raw)
  To: Minchan Kim; +Cc: linux-mm, linux-kernel, akpm, mgorman, tim.c.chen

On 06/02/2013 10:40 PM, Minchan Kim wrote:
>> > diff -puN mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private mm/vmscan.c
>> > --- linux.git/mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private	2013-05-30 16:07:50.632079492 -0700
>> > +++ linux.git-davehans/mm/vmscan.c	2013-05-30 16:07:50.637079712 -0700
>> > @@ -494,6 +494,8 @@ static int __remove_mapping(struct addre
>> >  		__delete_from_swap_cache(page);
>> >  		spin_unlock_irq(&mapping->tree_lock);
>> >  		swapcache_free(swap, page);
>> > +		set_page_private(page, 0);
>> > +		ClearPageSwapCache(page);
> Is it worth supporting a non-atomic version of ClearPageSwapCache?

Just for this, probably not.

It does look like a site where it would be theoretically safe to use
non-atomic flag operations since the page is on a one-way trip to the
allocator at this point and the __clear_page_locked() now happens _just_
after this code.

But, personally, I'm happy to leave it as-is.  The non-atomic flag
operations look to me like a micro-optimization that we should only
use when we _know_ there will be some tangible benefit.  Otherwise,
they're just something extra for developers to trip over and cause
very subtle bugs.
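
For context, the atomic page-flag helpers come down to atomic bit
operations on page->flags, while the non-atomic "__" variants are
plain read-modify-writes.  Roughly, and only as an illustration
(assuming a 'struct page *page' in scope; this is not the actual
PageSwapCache implementation):

	/* Atomic: safe even if other CPUs update page->flags concurrently. */
	clear_bit(PG_swapcache, &page->flags);

	/*
	 * Non-atomic: only safe when nobody else can touch page->flags,
	 * e.g. a page headed back to the allocator with no references left.
	 */
	__clear_bit(PG_swapcache, &page->flags);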

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [v4][PATCH 2/6] mm: swap: make 'struct page' and swp_entry_t variants of swapcache_free().
  2013-06-03  6:13     ` Minchan Kim
@ 2013-06-03 15:55       ` Dave Hansen
  -1 siblings, 0 replies; 28+ messages in thread
From: Dave Hansen @ 2013-06-03 15:55 UTC (permalink / raw)
  To: Minchan Kim; +Cc: linux-mm, linux-kernel, akpm, mgorman, tim.c.chen

On 06/02/2013 11:13 PM, Minchan Kim wrote:
> I lost from this description.
> 
> Old behavior
> 
> delete_from_swap_cache
>         swap_address_space
>         __delete_from_swap_cache
>                 swap_address_space
> 
> 
> New behavior
> 
> delete_from_swap_cache
>         __delete_from_swap_cache
>                 swap_address_space
>                 
> So you removes a swap_address_space, not adding a extra call.
> Am I missing something?

I think I got the page->swp_entry_t lookup confused with the
page->swap_address_space lookup when I was writing the description.  The
bit that you missed is that I _added_ a page_mapping() call, which calls
swap_address_space() internally:

Old behavior:

delete_from_swap_cache
        swap_address_space
        __delete_from_swap_cache
                swap_address_space

New behavior:

delete_from_swap_cache
	page_mapping
		swap_address_space
        __delete_from_swap_cache
                swap_address_space
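
For context, page_mapping() reaches swap_address_space() for swap
cache pages roughly like this (a simplified sketch of the mainline
helper with debugging checks trimmed; it is not part of this series):

	struct address_space *page_mapping(struct page *page)
	{
		struct address_space *mapping = page->mapping;

		if (unlikely(PageSwapCache(page))) {
			swp_entry_t entry;

			entry.val = page_private(page);
			mapping = swap_address_space(entry);
		} else if ((unsigned long)mapping & PAGE_MAPPING_ANON)
			mapping = NULL;

		return mapping;
	}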

--

New description (last paragraph changed).  Andrew, I'll resend the
series since there are a few of these cleanups.

From: Dave Hansen <dave.hansen@linux.intel.com>

swapcache_free() takes two arguments:

	void swapcache_free(swp_entry_t entry, struct page *page)

Most of its callers (5/7) are from error handling paths that haven't even
instantiated a page, so they pass page=NULL.  Both of the callers
that call in with a 'struct page' create and pass in a temporary
swp_entry_t.

Now that we are deferring clearing page_private() until after
swapcache_free() has been called, we can just create a variant
that takes a 'struct page' and does the temporary variable in
the helper.

That leaves all the other callers doing

	swapcache_free(entry, NULL)

so create another helper for them that makes it clear that they
need only pass in a swp_entry_t.

One downside here is that delete_from_swap_cache() now calls
swap_address_space() via page_mapping() instead of calling
swap_address_space() directly.  In doing so, it removes one more
case of the swap cache code being special-cased, which is a good
thing in my book.
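
The resulting helpers end up shaped roughly like this.  This is a
sketch only: swapcache_free_page_entry() is the name used elsewhere
in the series, but the entry-only helper's name and the exact
placement of the page_private()/PageSwapCache clearing are
illustrative rather than lifted from the patch:

	static inline void swapcache_free_entry(swp_entry_t entry)
	{
		/* error paths never instantiated a page */
		swapcache_free(entry, NULL);
	}

	static inline void swapcache_free_page_entry(struct page *page)
	{
		swp_entry_t entry;

		entry.val = page_private(page);
		swapcache_free(entry, page);
		set_page_private(page, 0);
		ClearPageSwapCache(page);
	}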


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [v4][PATCH 1/6] mm: swap: defer clearing of page_private() for swap cache pages
  2013-06-03 14:53       ` Dave Hansen
@ 2013-06-04  4:41         ` Minchan Kim
  -1 siblings, 0 replies; 28+ messages in thread
From: Minchan Kim @ 2013-06-04  4:41 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, linux-kernel, akpm, mgorman, tim.c.chen

On Mon, Jun 03, 2013 at 07:53:01AM -0700, Dave Hansen wrote:
> On 06/02/2013 10:40 PM, Minchan Kim wrote:
> >> > diff -puN mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private mm/vmscan.c
> >> > --- linux.git/mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private	2013-05-30 16:07:50.632079492 -0700
> >> > +++ linux.git-davehans/mm/vmscan.c	2013-05-30 16:07:50.637079712 -0700
> >> > @@ -494,6 +494,8 @@ static int __remove_mapping(struct addre
> >> >  		__delete_from_swap_cache(page);
> >> >  		spin_unlock_irq(&mapping->tree_lock);
> >> >  		swapcache_free(swap, page);
> >> > +		set_page_private(page, 0);
> >> > +		ClearPageSwapCache(page);
> > Is it worth supporting a non-atomic version of ClearPageSwapCache?
> 
> Just for this, probably not.
> 
> It does look like a site where it would be theoretically safe to use
> non-atomic flag operations since the page is on a one-way trip to the
> allocator at this point and the __clear_page_locked() now happens _just_
> after this code.

True.

> 
> But, personally, I'm happy to leave it as-is.  The atomic vs. non-atomic
> flags look to me like a micro-optimization that we should use when we
> _know_ there will be some tangible benefit.  Otherwise, they're just
> something extra for developers to trip over and cause very subtle bugs.

I only asked because, when I read the patch set's description, I felt
you were very sensitive to atomic operations on large systems with
several sockets.  Anyway, if you don't want it, I don't mind either.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2013-06-04  4:41 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-31 18:38 [v4][PATCH 0/6] mm: vmscan: Batch page reclamation under shink_page_list Dave Hansen
2013-05-31 18:38 ` Dave Hansen
2013-05-31 18:38 ` [v4][PATCH 1/6] mm: swap: defer clearing of page_private() for swap cache pages Dave Hansen
2013-05-31 18:38   ` Dave Hansen
2013-06-03  5:40   ` Minchan Kim
2013-06-03  5:40     ` Minchan Kim
2013-06-03 14:53     ` Dave Hansen
2013-06-03 14:53       ` Dave Hansen
2013-06-04  4:41       ` Minchan Kim
2013-06-04  4:41         ` Minchan Kim
2013-05-31 18:38 ` [v4][PATCH 2/6] mm: swap: make 'struct page' and swp_entry_t variants of swapcache_free() Dave Hansen
2013-05-31 18:38   ` Dave Hansen
2013-06-03  6:13   ` Minchan Kim
2013-06-03  6:13     ` Minchan Kim
2013-06-03 15:55     ` Dave Hansen
2013-06-03 15:55       ` Dave Hansen
2013-05-31 18:38 ` [v4][PATCH 3/6] mm: vmscan: break up __remove_mapping() Dave Hansen
2013-05-31 18:38   ` Dave Hansen
2013-06-03  8:28   ` Minchan Kim
2013-06-03  8:28     ` Minchan Kim
2013-05-31 18:39 ` [v4][PATCH 4/6] mm: vmscan: break out mapping "freepage" code Dave Hansen
2013-05-31 18:39   ` Dave Hansen
2013-06-03  8:35   ` Minchan Kim
2013-06-03  8:35     ` Minchan Kim
2013-05-31 18:39 ` [v4][PATCH 5/6] mm: vmscan: batch shrink_page_list() locking operations Dave Hansen
2013-05-31 18:39   ` Dave Hansen
2013-05-31 18:39 ` [v4][PATCH 6/6] mm: vmscan: drain batch list during long operations Dave Hansen
2013-05-31 18:39   ` Dave Hansen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.