* [RFC] mm: Don't use radix tree writeback tags for pages in swap cache
@ 2016-08-09 16:17 ` Huang, Ying
From: Huang, Ying @ 2016-08-09 16:17 UTC
  To: Andrew Morton
  Cc: tim.c.chen, dave.hansen, andi.kleen, aaron.lu, linux-mm,
	linux-kernel, Huang Ying, Hugh Dickins, Shaohua Li, Minchan Kim,
	Rik van Riel, Mel Gorman, Tejun Heo, Wu Fengguang

From: Huang Ying <ying.huang@intel.com>

File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK) to
accelerate finding the pages with a specific tag in the radix tree
during inode writeback.  But for anonymous pages in the swap cache,
there is no inode-based writeback.  So there is no need to find the
pages with writeback tags in the radix tree, and it is not necessary to
touch the radix tree writeback tags for pages in the swap cache.
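
The tag mechanism described above can be illustrated with a toy model. The sketch below assumes a hypothetical two-level tree. The names toy_tree, toy_tag_set() and the rest are illustrative and are not kernel API. Each node keeps one bitmap per tag, and a tagged lookup skips any leaf whose summary bit in the root is clear. The real lib/radix-tree.c is multi-level and RCU-aware. In mm/page-writeback.c these tag updates are made while holding the mapping's tree lock, which is the lock whose contention the benchmark below measures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy two-level tree: a root over FANOUT leaves of FANOUT slots each.
 * Leaf bit i set: slot i carries the tag.
 * Root bit i set: leaf i has at least one slot carrying the tag. */
#define FANOUT 8
enum toy_tag { TAG_DIRTY, TAG_TOWRITE, TAG_WRITEBACK, NR_TAGS };

struct leaf {
	unsigned char tags[NR_TAGS];
};

struct toy_tree {
	unsigned char tags[NR_TAGS];
	struct leaf leaves[FANOUT];
};

/* Tag slot idx and propagate the tag to the root summary bitmap. */
static void toy_tag_set(struct toy_tree *t, unsigned idx, enum toy_tag tag)
{
	assert(idx < FANOUT * FANOUT);
	t->leaves[idx / FANOUT].tags[tag] |= 1u << (idx % FANOUT);
	t->tags[tag] |= 1u << (idx / FANOUT);
}

/* Untag slot idx.  Clear the root summary bit only when no slot in
 * that leaf still carries the tag. */
static void toy_tag_clear(struct toy_tree *t, unsigned idx, enum toy_tag tag)
{
	struct leaf *l;

	assert(idx < FANOUT * FANOUT);
	l = &t->leaves[idx / FANOUT];
	l->tags[tag] &= ~(1u << (idx % FANOUT));
	if (l->tags[tag] == 0)
		t->tags[tag] &= ~(1u << (idx / FANOUT));
}

/* Collect up to max tagged indices in ascending order.  Whole leaves
 * whose root summary bit is clear are skipped.  *visited counts the
 * leaves that were actually examined. */
static size_t toy_lookup_tag(const struct toy_tree *t, enum toy_tag tag,
			     unsigned *out, size_t max, size_t *visited)
{
	size_t n = 0;

	*visited = 0;
	for (unsigned i = 0; i < FANOUT && n < max; i++) {
		if (!(t->tags[tag] & (1u << i)))
			continue;	/* nothing tagged below: skip leaf */
		(*visited)++;
		for (unsigned j = 0; j < FANOUT && n < max; j++)
			if (t->leaves[i].tags[tag] & (1u << j))
				out[n++] = i * FANOUT + j;
	}
	return n;
}
```

With two tagged pages among 64 slots, a WRITEBACK lookup examines only the two leaves that hold them. The cost is that every set or clear must also maintain the summary bits.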

With this patch, the swap-out bandwidth improved by 22.3% in the
vm-scalability swap-w-seq test case with 8 processes on a Xeon E5 v3
system, because of reduced contention on the swap cache radix tree lock.
To test sequential swap-out, the test case uses 8 processes that
sequentially allocate and write to anonymous pages until RAM and part of
the swap device are used up.

Details of the comparison are as follows:

            base                 base+patch
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
   2506952 ±  2%     +28.1%    3212076 ±  7%  vm-scalability.throughput
   1207402 ±  7%     +22.3%    1476578 ±  6%  vmstat.swap.so
     10.86 ± 12%     -23.4%       8.31 ± 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
     10.82 ± 13%     -33.1%       7.24 ± 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
     10.36 ± 11%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
     10.52 ± 12%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page
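
Each %change value is the relative difference between the two means, (patched - base) / base * 100. The small helper below recomputes it. Applied to the rounded means printed in the table, it reproduces +28.1%, +22.3% and -33.1%. Rows with small, rounded means (for example 10.86 to 8.31) can come out one digit off in the last place, presumably because the report computes the change before rounding.

```c
#include <assert.h>

/* Relative change of `patched` against `base`, in percent. */
static double pct_change(double base, double patched)
{
	assert(base != 0.0);	/* a zero baseline has no relative change */
	return (patched - base) / base * 100.0;
}
```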

Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
---
 mm/page-writeback.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index f4cd7d8..ebfecb7 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2758,7 +2758,7 @@ int test_clear_page_writeback(struct page *page)
 	int ret;
 
 	lock_page_memcg(page);
-	if (mapping) {
+	if (mapping && !PageSwapCache(page)) {
 		struct inode *inode = mapping->host;
 		struct backing_dev_info *bdi = inode_to_bdi(inode);
 		unsigned long flags;
@@ -2801,7 +2801,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 	int ret;
 
 	lock_page_memcg(page);
-	if (mapping) {
+	if (mapping && !PageSwapCache(page)) {
 		struct inode *inode = mapping->host;
 		struct backing_dev_info *bdi = inode_to_bdi(inode);
 		unsigned long flags;
-- 
2.8.1

* Re: [RFC] mm: Don't use radix tree writeback tags for pages in swap cache
  2016-08-09 16:17 ` Huang, Ying
@ 2016-08-09 16:34   ` Dave Hansen
From: Dave Hansen @ 2016-08-09 16:34 UTC
  To: Huang, Ying, Andrew Morton
  Cc: tim.c.chen, andi.kleen, aaron.lu, linux-mm, linux-kernel,
	Hugh Dickins, Shaohua Li, Minchan Kim, Rik van Riel, Mel Gorman,
	Tejun Heo, Wu Fengguang

On 08/09/2016 09:17 AM, Huang, Ying wrote:
> File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK) to
> accelerate finding the pages with a specific tag in the radix tree
> during inode writeback.  But for anonymous pages in the swap cache,
> there is no inode-based writeback.  So there is no need to find the
> pages with writeback tags in the radix tree, and it is not necessary to
> touch the radix tree writeback tags for pages in the swap cache.

Seems simple enough.  Do we do any of this unnecessary work for the
other radix tree tags?  If so, maybe we should just fix this once and
for all.  Could we, for instance, WARN_ONCE() in radix_tree_tag_set() if
it sees a swap mapping get handed in there?

In any case, I think the new !PageSwapCache(page) check either needs
commenting, or a common helper for the two sites that you can comment.
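
One way to meet the comment request is a single, documented predicate shared by both call sites. The userspace model below sketches that idea under stated assumptions. page_uses_writeback_tags(), struct page_model and struct address_space_model are hypothetical stand-ins, not kernel API. The counters only demonstrate one point: a swap cache page still has its writeback state updated, but it never takes the tree lock or touches a tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct address_space and struct page. */
struct address_space_model {
	unsigned long nr_tagged;	/* entries carrying the WRITEBACK tag */
	unsigned long lock_taken;	/* times the tree lock was acquired */
};

struct page_model {
	struct address_space_model *mapping;
	bool swapcache;			/* models PageSwapCache(page) */
	bool writeback;			/* models PageWriteback(page) */
};

/*
 * Hypothetical helper shared by both call sites.  Anonymous pages in the
 * swap cache are never written back by inode, so nobody looks up their
 * writeback tags, and the tag (and lock) work can be skipped.
 */
static bool page_uses_writeback_tags(const struct page_model *page)
{
	return page->mapping != NULL && !page->swapcache;
}

/* Models __test_set_page_writeback(): returns the old writeback state. */
static bool model_test_set_writeback(struct page_model *page)
{
	bool was = page->writeback;

	page->writeback = true;
	if (page_uses_writeback_tags(page)) {
		page->mapping->lock_taken++;
		if (!was)
			page->mapping->nr_tagged++;
	}
	return was;
}

/* Models test_clear_page_writeback(): returns the old writeback state. */
static bool model_test_clear_writeback(struct page_model *page)
{
	bool was = page->writeback;

	page->writeback = false;
	if (page_uses_writeback_tags(page)) {
		page->mapping->lock_taken++;
		if (was) {
			assert(page->mapping->nr_tagged > 0);
			page->mapping->nr_tagged--;
		}
	}
	return was;
}
```

Keeping the condition in one function means the explanatory comment lives in one place. Any later change to which mappings use tags is then made once for both paths.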

> With this patch, the swap-out bandwidth improved by 22.3% in the
> vm-scalability swap-w-seq test case with 8 processes on a Xeon E5 v3
> system, because of reduced contention on the swap cache radix tree lock.
> To test sequential swap-out, the test case uses 8 processes that
> sequentially allocate and write to anonymous pages until RAM and part of
> the swap device are used up.

What was the swap device here, btw?  What is the actual bandwidth
increase you are seeing?  Is it 1MB/s -> 1.223MB/s? :)

* Re: [RFC] mm: Don't use radix tree writeback tags for pages in swap cache
  2016-08-09 16:34   ` Dave Hansen
@ 2016-08-09 17:00     ` Huang, Ying
From: Huang, Ying @ 2016-08-09 17:00 UTC
  To: Dave Hansen
  Cc: Huang, Ying, Andrew Morton, tim.c.chen, andi.kleen, aaron.lu,
	linux-mm, linux-kernel, Hugh Dickins, Shaohua Li, Minchan Kim,
	Rik van Riel, Mel Gorman, Tejun Heo, Wu Fengguang

Hi, Dave,

Dave Hansen <dave.hansen@intel.com> writes:

> On 08/09/2016 09:17 AM, Huang, Ying wrote:
>> File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK) to
>> accelerate finding the pages with a specific tag in the radix tree
>> during inode writeback.  But for anonymous pages in the swap cache,
>> there is no inode-based writeback.  So there is no need to find the
>> pages with writeback tags in the radix tree, and it is not necessary to
>> touch the radix tree writeback tags for pages in the swap cache.
>
> Seems simple enough.  Do we do any of this unnecessary work for the
> other radix tree tags?  If so, maybe we should just fix this once and
> for all.  Could we, for instance, WARN_ONCE() in radix_tree_tag_set() if
> it sees a swap mapping get handed in there?

Good idea!  I will do that and try to catch other places if any.

> In any case, I think the new !PageSwapCache(page) check either needs
> commenting, or a common helper for the two sites that you can comment.

Sure.  I will add that.

>> With this patch, the swap-out bandwidth improved by 22.3% in the
>> vm-scalability swap-w-seq test case with 8 processes on a Xeon E5 v3
>> system, because of reduced contention on the swap cache radix tree lock.
>> To test sequential swap-out, the test case uses 8 processes that
>> sequentially allocate and write to anonymous pages until RAM and part of
>> the swap device are used up.
>
> What was the swap device here, btw?  What is the actual bandwidth
> increase you are seeing?  Is it 1MB/s -> 1.223MB/s? :)

The swap device here is a DRAM-simulated persistent memory block device
(pmem).

   1207402 ±  7%     +22.3%    1476578 ±  6%  vmstat.swap.so

The actual bandwidth increased from 1.21 GB/s to 1.48 GB/s.  This is
lower than the bandwidth of an NVMe disk, so the bottleneck is in the swap
subsystem rather than in the block subsystem or the device.
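
The GB/s figures follow from the vmstat.swap.so row above. The sketch below assumes, as the quoted numbers imply, that the column counts kilobytes per second and that 1 GB is taken as 10^6 of those units. If the column comes from vmstat(8), whose default unit is 1024 bytes, a strictly binary conversion would give slightly different values.

```c
#include <assert.h>

/* Convert a swap-out rate in kilobytes per second to gigabytes per
 * second, with 1 GB = 10^6 kB as in the quoted figures (assumption). */
static double kb_per_s_to_gb_per_s(double kb_per_s)
{
	assert(kb_per_s >= 0.0);	/* rates are never negative */
	return kb_per_s / 1e6;
}
```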

Best Regards,
Huang, Ying

* Re: [RFC] mm: Don't use radix tree writeback tags for pages in swap cache
  2016-08-09 17:00     ` Huang, Ying
@ 2016-08-24 16:12       ` Huang, Ying
From: Huang, Ying @ 2016-08-24 16:12 UTC
  To: Dave Hansen
  Cc: Andrew Morton, tim.c.chen, andi.kleen, aaron.lu, linux-mm,
	linux-kernel, Hugh Dickins, Shaohua Li, Minchan Kim,
	Rik van Riel, Mel Gorman, Tejun Heo, Wu Fengguang, Huang, Ying

"Huang, Ying" <ying.huang@intel.com> writes:

> Hi, Dave,
>
> Dave Hansen <dave.hansen@intel.com> writes:
>
>> On 08/09/2016 09:17 AM, Huang, Ying wrote:
>>> File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK) to
>>> accelerate finding the pages with a specific tag in the radix tree
>>> during inode writeback.  But for anonymous pages in the swap cache,
>>> there is no inode-based writeback.  So there is no need to find the
>>> pages with writeback tags in the radix tree, and it is not necessary to
>>> touch the radix tree writeback tags for pages in the swap cache.
>>
>> Seems simple enough.  Do we do any of this unnecessary work for the
>> other radix tree tags?  If so, maybe we should just fix this once and
>> for all.  Could we, for instance, WARN_ONCE() in radix_tree_tag_set() if
>> it sees a swap mapping get handed in there?
>
> Good idea!  I will do that and try to catch other places if any.

I tested all 18 anonymous-page-related test cases in vm-scalability
with a debug patch that does WARN_ON_ONCE() on every tag operation on a
swap mapping.  No other tag operations on the swap mapping were caught.
Below is the patch I used for debugging.

Best Regards,
Huang, Ying

----------------------------------------->
    dbg: find all tag operations for swap cache

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 4c45105..9a239ec 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -106,16 +106,24 @@ struct radix_tree_node {
 
 /* root tags are stored in gfp_mask, shifted by __GFP_BITS_SHIFT */
 struct radix_tree_root {
+	bool			swap;
 	gfp_t			gfp_mask;
 	struct radix_tree_node	__rcu *rnode;
 };
 
 #define RADIX_TREE_INIT(mask)	{					\
+	.swap = false,							\
 	.gfp_mask = (mask),						\
 	.rnode = NULL,							\
 }
 
-#define RADIX_TREE(name, mask) \
+#define RADIX_TREE_INIT_SWAP(mask)	{				\
+	.swap = true,							\
+	.gfp_mask = (mask),						\
+	.rnode = NULL,							\
+}
+
+#define RADIX_TREE(name, mask)					\
 	struct radix_tree_root name = RADIX_TREE_INIT(mask)
 
 #define INIT_RADIX_TREE(root, mask)					\
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 1b7bf73..51677bf 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -765,6 +765,8 @@ void *radix_tree_tag_set(struct radix_tree_root *root,
 	struct radix_tree_node *node, *parent;
 	unsigned long maxindex;
 
+	WARN_ON_ONCE(root->swap);
+
 	radix_tree_load_root(root, &node, &maxindex);
 	BUG_ON(index > maxindex);
 
@@ -828,6 +830,8 @@ void *radix_tree_tag_clear(struct radix_tree_root *root,
 	unsigned long maxindex;
 	int uninitialized_var(offset);
 
+	WARN_ON_ONCE(root->swap);
+
 	radix_tree_load_root(root, &node, &maxindex);
 	if (index > maxindex)
 		return NULL;
@@ -867,6 +871,8 @@ int radix_tree_tag_get(struct radix_tree_root *root,
 	struct radix_tree_node *node, *parent;
 	unsigned long maxindex;
 
+	WARN_ON_ONCE(root->swap);
+
 	if (!root_tag_get(root, tag))
 		return 0;
 
@@ -1050,6 +1056,8 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
 	unsigned long tagged = 0;
 	unsigned long index = *first_indexp;
 
+	WARN_ON_ONCE(root->swap);
+
 	radix_tree_load_root(root, &child, &maxindex);
 	last_index = min(last_index, maxindex);
 	if (index > last_index)
@@ -1240,6 +1248,8 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results,
 	void **slot;
 	unsigned int ret = 0;
 
+	WARN_ON_ONCE(root->swap);
+
 	if (unlikely(!max_items))
 		return 0;
 
@@ -1281,6 +1291,8 @@ radix_tree_gang_lookup_tag_slot(struct radix_tree_root *root, void ***results,
 	void **slot;
 	unsigned int ret = 0;
 
+	WARN_ON_ONCE(root->swap);
+
 	if (unlikely(!max_items))
 		return 0;
 
@@ -1590,6 +1602,8 @@ struct radix_tree_node *radix_tree_replace_clear_tags(
 	struct radix_tree_node *node;
 	void **slot;
 
+	WARN_ON_ONCE(root->swap);
+
 	__radix_tree_lookup(root, index, &node, &slot);
 
 	if (node) {
diff --git a/mm/swap_state.c b/mm/swap_state.c
index c8310a3..0059653 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -34,7 +34,7 @@ static const struct address_space_operations swap_aops = {
 
 struct address_space swapper_spaces[MAX_SWAPFILES] = {
 	[0 ... MAX_SWAPFILES - 1] = {
-		.page_tree	= RADIX_TREE_INIT(GFP_ATOMIC|__GFP_NOWARN),
+		.page_tree	= RADIX_TREE_INIT_SWAP(GFP_ATOMIC|__GFP_NOWARN),
 		.i_mmap_writable = ATOMIC_INIT(0),
 		.a_ops		= &swap_aops,
 	}
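
The debug patch above marks the swapper_spaces trees at initialization and warns once per tag function when such a tree is touched. The userspace model below shows the same pattern. struct root_model, model_tag_set(), model_tag_clear() and warn_once_if() are hypothetical names. The per-function static flag imitates the once-per-call-site behaviour of the kernel's WARN_ON_ONCE() rather than reproducing its implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct radix_tree_root with the debug
 * patch's extra flag; tags are a plain bitmap here. */
struct root_model {
	bool swap;		/* set only for swap cache trees */
	unsigned int tags;
};

static unsigned int warn_count;	/* warnings emitted so far */

/* Warn the first time cond is true for the call site owning *warned. */
static void warn_once_if(bool cond, bool *warned, const char *fn)
{
	if (cond && !*warned) {
		*warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: tag operation on swap tree in %s()\n", fn);
	}
}

static void model_tag_set(struct root_model *root, unsigned int tag)
{
	static bool warned;	/* one flag per function, like WARN_ON_ONCE() */

	assert(tag < 32);
	warn_once_if(root->swap, &warned, __func__);
	root->tags |= 1u << tag;
}

static void model_tag_clear(struct root_model *root, unsigned int tag)
{
	static bool warned;

	assert(tag < 32);
	warn_once_if(root->swap, &warned, __func__);
	root->tags &= ~(1u << tag);
}
```

As in the debug patch, the warning does not block the tag operation. It only records that an unexpected caller exists, which keeps the instrumented kernel usable while the test cases run.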
