All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
@ 2016-08-30 17:28 ` Huang, Ying
  0 siblings, 0 replies; 35+ messages in thread
From: Huang, Ying @ 2016-08-30 17:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: tim.c.chen, dave.hansen, andi.kleen, aaron.lu, linux-mm,
	linux-kernel, Huang Ying, Hugh Dickins, Shaohua Li, Minchan Kim,
	Rik van Riel, Mel Gorman, Tejun Heo, Wu Fengguang

From: Huang Ying <ying.huang@intel.com>

File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK,
etc.) to accelerate finding the pages with a specific tag in the radix
tree during inode writeback.  But for anonymous pages in the swap
cache, there is no inode writeback.  So there is no need to find the
pages with some writeback tags in the radix tree.  It is not necessary
to touch radix tree writeback tags for pages in the swap cache.

Per Rik van Riel's suggestion, a new flag AS_NO_WRITEBACK_TAGS is
introduced for address spaces which don't need to update the writeback
tags.  The flag is set for swap caches.  It may be used for DAX file
systems, etc.

With this patch, the swap out bandwidth improved 22.3% (from ~1.2GB/s to
~ 1.48GBps) in the vm-scalability swap-w-seq test case with 8 processes.
The test is done on a Xeon E5 v3 system.  The swap device used is a RAM
simulated PMEM (persistent memory) device.  The improvement comes from
the reduced contention on the swap cache radix tree lock.  To test
sequential swapping out, the test case uses 8 processes, which
sequentially allocate and write to the anonymous pages until RAM and
part of the swap device is used up.

Details of comparison is as follow,

base             base+patch
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
   2506952 ±  2%     +28.1%    3212076 ±  7%  vm-scalability.throughput
   1207402 ±  7%     +22.3%    1476578 ±  6%  vmstat.swap.so
     10.86 ± 12%     -23.4%       8.31 ± 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
     10.82 ± 13%     -33.1%       7.24 ± 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
     10.36 ± 11%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
     10.52 ± 12%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page

Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
---
 include/linux/pagemap.h | 12 ++++++++++++
 mm/page-writeback.c     |  4 ++--
 mm/swap_state.c         |  2 ++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 66a1260..2f5a65dd 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -25,6 +25,8 @@ enum mapping_flags {
 	AS_MM_ALL_LOCKS	= __GFP_BITS_SHIFT + 2,	/* under mm_take_all_locks() */
 	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
 	AS_EXITING	= __GFP_BITS_SHIFT + 4, /* final truncate in progress */
+	/* writeback related tags are not used */
+	AS_NO_WRITEBACK_TAGS = __GFP_BITS_SHIFT + 5,
 };
 
 static inline void mapping_set_error(struct address_space *mapping, int error)
@@ -64,6 +66,16 @@ static inline int mapping_exiting(struct address_space *mapping)
 	return test_bit(AS_EXITING, &mapping->flags);
 }
 
+static inline void mapping_set_no_writeback_tags(struct address_space *mapping)
+{
+	set_bit(AS_NO_WRITEBACK_TAGS, &mapping->flags);
+}
+
+static inline int mapping_use_writeback_tags(struct address_space *mapping)
+{
+	return !test_bit(AS_NO_WRITEBACK_TAGS, &mapping->flags);
+}
+
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
 	return (__force gfp_t)mapping->flags & __GFP_BITS_MASK;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 82e7252..67b7c8b 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2728,7 +2728,7 @@ int test_clear_page_writeback(struct page *page)
 	int ret;
 
 	lock_page_memcg(page);
-	if (mapping) {
+	if (mapping && mapping_use_writeback_tags(mapping)) {
 		struct inode *inode = mapping->host;
 		struct backing_dev_info *bdi = inode_to_bdi(inode);
 		unsigned long flags;
@@ -2771,7 +2771,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 	int ret;
 
 	lock_page_memcg(page);
-	if (mapping) {
+	if (mapping && mapping_use_writeback_tags(mapping)) {
 		struct inode *inode = mapping->host;
 		struct backing_dev_info *bdi = inode_to_bdi(inode);
 		unsigned long flags;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index c8310a3..268b819 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -37,6 +37,8 @@ struct address_space swapper_spaces[MAX_SWAPFILES] = {
 		.page_tree	= RADIX_TREE_INIT(GFP_ATOMIC|__GFP_NOWARN),
 		.i_mmap_writable = ATOMIC_INIT(0),
 		.a_ops		= &swap_aops,
+		/* swap cache doesn't use writeback related tags */
+		.flags		= 1 << AS_NO_WRITEBACK_TAGS,
 	}
 };
 
-- 
2.8.1

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
@ 2016-08-30 17:28 ` Huang, Ying
  0 siblings, 0 replies; 35+ messages in thread
From: Huang, Ying @ 2016-08-30 17:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: tim.c.chen, dave.hansen, andi.kleen, aaron.lu, linux-mm,
	linux-kernel, Huang Ying, Hugh Dickins, Shaohua Li, Minchan Kim,
	Rik van Riel, Mel Gorman, Tejun Heo, Wu Fengguang

From: Huang Ying <ying.huang@intel.com>

File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK,
etc.) to accelerate finding the pages with a specific tag in the radix
tree during inode writeback.  But for anonymous pages in the swap
cache, there is no inode writeback.  So there is no need to find the
pages with some writeback tags in the radix tree.  It is not necessary
to touch radix tree writeback tags for pages in the swap cache.

Per Rik van Riel's suggestion, a new flag AS_NO_WRITEBACK_TAGS is
introduced for address spaces which don't need to update the writeback
tags.  The flag is set for swap caches.  It may be used for DAX file
systems, etc.

With this patch, the swap out bandwidth improved 22.3% (from ~1.2GB/s to
~ 1.48GBps) in the vm-scalability swap-w-seq test case with 8 processes.
The test is done on a Xeon E5 v3 system.  The swap device used is a RAM
simulated PMEM (persistent memory) device.  The improvement comes from
the reduced contention on the swap cache radix tree lock.  To test
sequential swapping out, the test case uses 8 processes, which
sequentially allocate and write to the anonymous pages until RAM and
part of the swap device is used up.

Details of comparison is as follow,

base             base+patch
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
   2506952 A+-  2%     +28.1%    3212076 A+-  7%  vm-scalability.throughput
   1207402 A+-  7%     +22.3%    1476578 A+-  6%  vmstat.swap.so
     10.86 A+- 12%     -23.4%       8.31 A+- 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
     10.82 A+- 13%     -33.1%       7.24 A+- 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
     10.36 A+- 11%    -100.0%       0.00 A+- -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
     10.52 A+- 12%    -100.0%       0.00 A+- -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page

Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
---
 include/linux/pagemap.h | 12 ++++++++++++
 mm/page-writeback.c     |  4 ++--
 mm/swap_state.c         |  2 ++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 66a1260..2f5a65dd 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -25,6 +25,8 @@ enum mapping_flags {
 	AS_MM_ALL_LOCKS	= __GFP_BITS_SHIFT + 2,	/* under mm_take_all_locks() */
 	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
 	AS_EXITING	= __GFP_BITS_SHIFT + 4, /* final truncate in progress */
+	/* writeback related tags are not used */
+	AS_NO_WRITEBACK_TAGS = __GFP_BITS_SHIFT + 5,
 };
 
 static inline void mapping_set_error(struct address_space *mapping, int error)
@@ -64,6 +66,16 @@ static inline int mapping_exiting(struct address_space *mapping)
 	return test_bit(AS_EXITING, &mapping->flags);
 }
 
+static inline void mapping_set_no_writeback_tags(struct address_space *mapping)
+{
+	set_bit(AS_NO_WRITEBACK_TAGS, &mapping->flags);
+}
+
+static inline int mapping_use_writeback_tags(struct address_space *mapping)
+{
+	return !test_bit(AS_NO_WRITEBACK_TAGS, &mapping->flags);
+}
+
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
 	return (__force gfp_t)mapping->flags & __GFP_BITS_MASK;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 82e7252..67b7c8b 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2728,7 +2728,7 @@ int test_clear_page_writeback(struct page *page)
 	int ret;
 
 	lock_page_memcg(page);
-	if (mapping) {
+	if (mapping && mapping_use_writeback_tags(mapping)) {
 		struct inode *inode = mapping->host;
 		struct backing_dev_info *bdi = inode_to_bdi(inode);
 		unsigned long flags;
@@ -2771,7 +2771,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 	int ret;
 
 	lock_page_memcg(page);
-	if (mapping) {
+	if (mapping && mapping_use_writeback_tags(mapping)) {
 		struct inode *inode = mapping->host;
 		struct backing_dev_info *bdi = inode_to_bdi(inode);
 		unsigned long flags;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index c8310a3..268b819 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -37,6 +37,8 @@ struct address_space swapper_spaces[MAX_SWAPFILES] = {
 		.page_tree	= RADIX_TREE_INIT(GFP_ATOMIC|__GFP_NOWARN),
 		.i_mmap_writable = ATOMIC_INIT(0),
 		.a_ops		= &swap_aops,
+		/* swap cache doesn't use writeback related tags */
+		.flags		= 1 << AS_NO_WRITEBACK_TAGS,
 	}
 };
 
-- 
2.8.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
  2016-08-30 17:28 ` Huang, Ying
  (?)
@ 2016-08-30 18:29 ` Rik van Riel
  -1 siblings, 0 replies; 35+ messages in thread
From: Rik van Riel @ 2016-08-30 18:29 UTC (permalink / raw)
  To: Huang, Ying, Andrew Morton
  Cc: tim.c.chen, dave.hansen, andi.kleen, aaron.lu, linux-mm,
	linux-kernel, Hugh Dickins, Shaohua Li, Minchan Kim, Mel Gorman,
	Tejun Heo, Wu Fengguang

[-- Attachment #1: Type: text/plain, Size: 3059 bytes --]

On Tue, 2016-08-30 at 10:28 -0700, Huang, Ying wrote:
> From: Huang Ying <ying.huang@intel.com>
> 
> File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK,
> etc.) to accelerate finding the pages with a specific tag in the
> radix
> tree during inode writeback.  But for anonymous pages in the swap
> cache, there is no inode writeback.  So there is no need to find the
> pages with some writeback tags in the radix tree.  It is not
> necessary
> to touch radix tree writeback tags for pages in the swap cache.
> 
> Per Rik van Riel's suggestion, a new flag AS_NO_WRITEBACK_TAGS is
> introduced for address spaces which don't need to update the
> writeback
> tags.  The flag is set for swap caches.  It may be used for DAX file
> systems, etc.
> 
> With this patch, the swap out bandwidth improved 22.3% (from ~1.2GB/s
> to
> ~ 1.48GBps) in the vm-scalability swap-w-seq test case with 8
> processes.
> The test is done on a Xeon E5 v3 system.  The swap device used is a
> RAM
> simulated PMEM (persistent memory) device.  The improvement comes
> from
> the reduced contention on the swap cache radix tree lock.  To test
> sequential swapping out, the test case uses 8 processes, which
> sequentially allocate and write to the anonymous pages until RAM and
> part of the swap device is used up.
> 
> Details of comparison is as follow,
> 
> base             base+patch
> ---------------- --------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    2506952 ±  2%     +28.1%    3212076 ±  7%  vm-
> scalability.throughput
>    1207402 ±  7%     +22.3%    1476578 ±  6%  vmstat.swap.so
>      10.86 ± 12%     -23.4%       8.31 ± 16%  perf-profile.cycles-
> pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_sw
> ap.shrink_page_list
>      10.82 ± 13%     -33.1%       7.24 ± 14%  perf-profile.cycles-
> pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_in
> active_list.shrink_zone_memcg
>      10.36 ± 11%    -100.0%       0.00 ± -1%  perf-profile.cycles-
> pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page._
> _swap_writepage.swap_writepage
>      10.52 ± 12%    -100.0%       0.00 ± -1%  perf-profile.cycles-
> pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writebac
> k.page_endio.pmem_rw_page
> 
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> 
Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
  2016-08-30 17:28 ` Huang, Ying
@ 2016-08-31  9:14   ` Mel Gorman
  -1 siblings, 0 replies; 35+ messages in thread
From: Mel Gorman @ 2016-08-31  9:14 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Andrew Morton, tim.c.chen, dave.hansen, andi.kleen, aaron.lu,
	linux-mm, linux-kernel, Hugh Dickins, Shaohua Li, Minchan Kim,
	Rik van Riel, Tejun Heo, Wu Fengguang

On Tue, Aug 30, 2016 at 10:28:09AM -0700, Huang, Ying wrote:
> From: Huang Ying <ying.huang@intel.com>
> 
> File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK,
> etc.) to accelerate finding the pages with a specific tag in the radix
> tree during inode writeback.  But for anonymous pages in the swap
> cache, there is no inode writeback.  So there is no need to find the
> pages with some writeback tags in the radix tree.  It is not necessary
> to touch radix tree writeback tags for pages in the swap cache.
> 
> Per Rik van Riel's suggestion, a new flag AS_NO_WRITEBACK_TAGS is
> introduced for address spaces which don't need to update the writeback
> tags.  The flag is set for swap caches.  It may be used for DAX file
> systems, etc.
> 
> With this patch, the swap out bandwidth improved 22.3% (from ~1.2GB/s to
> ~ 1.48GBps) in the vm-scalability swap-w-seq test case with 8 processes.
> The test is done on a Xeon E5 v3 system.  The swap device used is a RAM
> simulated PMEM (persistent memory) device.  The improvement comes from
> the reduced contention on the swap cache radix tree lock.  To test
> sequential swapping out, the test case uses 8 processes, which
> sequentially allocate and write to the anonymous pages until RAM and
> part of the swap device is used up.
> 
> Details of comparison is as follow,
> 
> base             base+patch
> ---------------- --------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    2506952 ±  2%     +28.1%    3212076 ±  7%  vm-scalability.throughput
>    1207402 ±  7%     +22.3%    1476578 ±  6%  vmstat.swap.so
>      10.86 ± 12%     -23.4%       8.31 ± 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
>      10.82 ± 13%     -33.1%       7.24 ± 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
>      10.36 ± 11%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
>      10.52 ± 12%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page
> 

I didn't see anything wrong with the patch but it's worth highlighting
that this hunk means we are now out of GFP bits.

> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 66a1260..2f5a65dd 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -25,6 +25,8 @@ enum mapping_flags {
>  	AS_MM_ALL_LOCKS	= __GFP_BITS_SHIFT + 2,	/* under mm_take_all_locks() */
>  	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
>  	AS_EXITING	= __GFP_BITS_SHIFT + 4, /* final truncate in progress */
> +	/* writeback related tags are not used */
> +	AS_NO_WRITEBACK_TAGS = __GFP_BITS_SHIFT + 5,
>  };
>  

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
@ 2016-08-31  9:14   ` Mel Gorman
  0 siblings, 0 replies; 35+ messages in thread
From: Mel Gorman @ 2016-08-31  9:14 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Andrew Morton, tim.c.chen, dave.hansen, andi.kleen, aaron.lu,
	linux-mm, linux-kernel, Hugh Dickins, Shaohua Li, Minchan Kim,
	Rik van Riel, Tejun Heo, Wu Fengguang

On Tue, Aug 30, 2016 at 10:28:09AM -0700, Huang, Ying wrote:
> From: Huang Ying <ying.huang@intel.com>
> 
> File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK,
> etc.) to accelerate finding the pages with a specific tag in the radix
> tree during inode writeback.  But for anonymous pages in the swap
> cache, there is no inode writeback.  So there is no need to find the
> pages with some writeback tags in the radix tree.  It is not necessary
> to touch radix tree writeback tags for pages in the swap cache.
> 
> Per Rik van Riel's suggestion, a new flag AS_NO_WRITEBACK_TAGS is
> introduced for address spaces which don't need to update the writeback
> tags.  The flag is set for swap caches.  It may be used for DAX file
> systems, etc.
> 
> With this patch, the swap out bandwidth improved 22.3% (from ~1.2GB/s to
> ~ 1.48GBps) in the vm-scalability swap-w-seq test case with 8 processes.
> The test is done on a Xeon E5 v3 system.  The swap device used is a RAM
> simulated PMEM (persistent memory) device.  The improvement comes from
> the reduced contention on the swap cache radix tree lock.  To test
> sequential swapping out, the test case uses 8 processes, which
> sequentially allocate and write to the anonymous pages until RAM and
> part of the swap device is used up.
> 
> Details of comparison is as follow,
> 
> base             base+patch
> ---------------- --------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    2506952 +-  2%     +28.1%    3212076 +-  7%  vm-scalability.throughput
>    1207402 +-  7%     +22.3%    1476578 +-  6%  vmstat.swap.so
>      10.86 +- 12%     -23.4%       8.31 +- 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
>      10.82 +- 13%     -33.1%       7.24 +- 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
>      10.36 +- 11%    -100.0%       0.00 +- -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
>      10.52 +- 12%    -100.0%       0.00 +- -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page
> 

I didn't see anything wrong with the patch but it's worth highlighting
that this hunk means we are now out of GFP bits.

> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 66a1260..2f5a65dd 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -25,6 +25,8 @@ enum mapping_flags {
>  	AS_MM_ALL_LOCKS	= __GFP_BITS_SHIFT + 2,	/* under mm_take_all_locks() */
>  	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
>  	AS_EXITING	= __GFP_BITS_SHIFT + 4, /* final truncate in progress */
> +	/* writeback related tags are not used */
> +	AS_NO_WRITEBACK_TAGS = __GFP_BITS_SHIFT + 5,
>  };
>  

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
  2016-08-31  9:14   ` Mel Gorman
@ 2016-08-31 15:17     ` Huang, Ying
  -1 siblings, 0 replies; 35+ messages in thread
From: Huang, Ying @ 2016-08-31 15:17 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Huang, Ying, Andrew Morton, tim.c.chen, dave.hansen, andi.kleen,
	aaron.lu, linux-mm, linux-kernel, Hugh Dickins, Shaohua Li,
	Minchan Kim, Rik van Riel, Tejun Heo, Wu Fengguang

Mel Gorman <mgorman@techsingularity.net> writes:

> On Tue, Aug 30, 2016 at 10:28:09AM -0700, Huang, Ying wrote:
>> From: Huang Ying <ying.huang@intel.com>
>> 
>> File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK,
>> etc.) to accelerate finding the pages with a specific tag in the radix
>> tree during inode writeback.  But for anonymous pages in the swap
>> cache, there is no inode writeback.  So there is no need to find the
>> pages with some writeback tags in the radix tree.  It is not necessary
>> to touch radix tree writeback tags for pages in the swap cache.
>> 
>> Per Rik van Riel's suggestion, a new flag AS_NO_WRITEBACK_TAGS is
>> introduced for address spaces which don't need to update the writeback
>> tags.  The flag is set for swap caches.  It may be used for DAX file
>> systems, etc.
>> 
>> With this patch, the swap out bandwidth improved 22.3% (from ~1.2GB/s to
>> ~ 1.48GBps) in the vm-scalability swap-w-seq test case with 8 processes.
>> The test is done on a Xeon E5 v3 system.  The swap device used is a RAM
>> simulated PMEM (persistent memory) device.  The improvement comes from
>> the reduced contention on the swap cache radix tree lock.  To test
>> sequential swapping out, the test case uses 8 processes, which
>> sequentially allocate and write to the anonymous pages until RAM and
>> part of the swap device is used up.
>> 
>> Details of comparison is as follow,
>> 
>> base             base+patch
>> ---------------- --------------------------
>>          %stddev     %change         %stddev
>>              \          |                \
>>    2506952 ±  2%     +28.1%    3212076 ±  7%  vm-scalability.throughput
>>    1207402 ±  7%     +22.3%    1476578 ±  6%  vmstat.swap.so
>>      10.86 ± 12%     -23.4%       8.31 ± 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
>>      10.82 ± 13%     -33.1%       7.24 ± 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
>>      10.36 ± 11%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
>>      10.52 ± 12%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page
>> 
>
> I didn't see anything wrong with the patch but it's worth highlighting
> that this hunk means we are now out of GFP bits.

Sorry, I don't know whether I understand your words.  It is something
about,

__GFP_BITS_SHIFT == 26

So remainning bits in mapping_flags is 6.  And now the latest bit is
used for the flag introduced in the patch?

Best Regards,
Huang, Ying

>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>> index 66a1260..2f5a65dd 100644
>> --- a/include/linux/pagemap.h
>> +++ b/include/linux/pagemap.h
>> @@ -25,6 +25,8 @@ enum mapping_flags {
>>  	AS_MM_ALL_LOCKS	= __GFP_BITS_SHIFT + 2,	/* under mm_take_all_locks() */
>>  	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
>>  	AS_EXITING	= __GFP_BITS_SHIFT + 4, /* final truncate in progress */
>> +	/* writeback related tags are not used */
>> +	AS_NO_WRITEBACK_TAGS = __GFP_BITS_SHIFT + 5,
>>  };
>>  

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
@ 2016-08-31 15:17     ` Huang, Ying
  0 siblings, 0 replies; 35+ messages in thread
From: Huang, Ying @ 2016-08-31 15:17 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Huang, Ying, Andrew Morton, tim.c.chen, dave.hansen, andi.kleen,
	aaron.lu, linux-mm, linux-kernel, Hugh Dickins, Shaohua Li,
	Minchan Kim, Rik van Riel, Tejun Heo, Wu Fengguang

Mel Gorman <mgorman@techsingularity.net> writes:

> On Tue, Aug 30, 2016 at 10:28:09AM -0700, Huang, Ying wrote:
>> From: Huang Ying <ying.huang@intel.com>
>> 
>> File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK,
>> etc.) to accelerate finding the pages with a specific tag in the radix
>> tree during inode writeback.  But for anonymous pages in the swap
>> cache, there is no inode writeback.  So there is no need to find the
>> pages with some writeback tags in the radix tree.  It is not necessary
>> to touch radix tree writeback tags for pages in the swap cache.
>> 
>> Per Rik van Riel's suggestion, a new flag AS_NO_WRITEBACK_TAGS is
>> introduced for address spaces which don't need to update the writeback
>> tags.  The flag is set for swap caches.  It may be used for DAX file
>> systems, etc.
>> 
>> With this patch, the swap out bandwidth improved 22.3% (from ~1.2GB/s to
>> ~ 1.48GBps) in the vm-scalability swap-w-seq test case with 8 processes.
>> The test is done on a Xeon E5 v3 system.  The swap device used is a RAM
>> simulated PMEM (persistent memory) device.  The improvement comes from
>> the reduced contention on the swap cache radix tree lock.  To test
>> sequential swapping out, the test case uses 8 processes, which
>> sequentially allocate and write to the anonymous pages until RAM and
>> part of the swap device is used up.
>> 
>> Details of comparison is as follow,
>> 
>> base             base+patch
>> ---------------- --------------------------
>>          %stddev     %change         %stddev
>>              \          |                \
>>    2506952 A+-  2%     +28.1%    3212076 A+-  7%  vm-scalability.throughput
>>    1207402 A+-  7%     +22.3%    1476578 A+-  6%  vmstat.swap.so
>>      10.86 A+- 12%     -23.4%       8.31 A+- 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
>>      10.82 A+- 13%     -33.1%       7.24 A+- 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
>>      10.36 A+- 11%    -100.0%       0.00 A+- -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
>>      10.52 A+- 12%    -100.0%       0.00 A+- -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page
>> 
>
> I didn't see anything wrong with the patch but it's worth highlighting
> that this hunk means we are now out of GFP bits.

Sorry, I don't know whether I understand your words.  It is something
about,

__GFP_BITS_SHIFT == 26

So remainning bits in mapping_flags is 6.  And now the latest bit is
used for the flag introduced in the patch?

Best Regards,
Huang, Ying

>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>> index 66a1260..2f5a65dd 100644
>> --- a/include/linux/pagemap.h
>> +++ b/include/linux/pagemap.h
>> @@ -25,6 +25,8 @@ enum mapping_flags {
>>  	AS_MM_ALL_LOCKS	= __GFP_BITS_SHIFT + 2,	/* under mm_take_all_locks() */
>>  	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
>>  	AS_EXITING	= __GFP_BITS_SHIFT + 4, /* final truncate in progress */
>> +	/* writeback related tags are not used */
>> +	AS_NO_WRITEBACK_TAGS = __GFP_BITS_SHIFT + 5,
>>  };
>>  

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
  2016-08-31 15:17     ` Huang, Ying
@ 2016-08-31 15:39       ` Mel Gorman
  -1 siblings, 0 replies; 35+ messages in thread
From: Mel Gorman @ 2016-08-31 15:39 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Andrew Morton, tim.c.chen, dave.hansen, andi.kleen, aaron.lu,
	linux-mm, linux-kernel, Hugh Dickins, Shaohua Li, Minchan Kim,
	Rik van Riel, Tejun Heo, Wu Fengguang

On Wed, Aug 31, 2016 at 08:17:24AM -0700, Huang, Ying wrote:
> Mel Gorman <mgorman@techsingularity.net> writes:
> 
> > On Tue, Aug 30, 2016 at 10:28:09AM -0700, Huang, Ying wrote:
> >> From: Huang Ying <ying.huang@intel.com>
> >> 
> >> File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK,
> >> etc.) to accelerate finding the pages with a specific tag in the radix
> >> tree during inode writeback.  But for anonymous pages in the swap
> >> cache, there is no inode writeback.  So there is no need to find the
> >> pages with some writeback tags in the radix tree.  It is not necessary
> >> to touch radix tree writeback tags for pages in the swap cache.
> >> 
> >> Per Rik van Riel's suggestion, a new flag AS_NO_WRITEBACK_TAGS is
> >> introduced for address spaces which don't need to update the writeback
> >> tags.  The flag is set for swap caches.  It may be used for DAX file
> >> systems, etc.
> >> 
> >> With this patch, the swap out bandwidth improved 22.3% (from ~1.2GB/s to
> >> ~ 1.48GBps) in the vm-scalability swap-w-seq test case with 8 processes.
> >> The test is done on a Xeon E5 v3 system.  The swap device used is a RAM
> >> simulated PMEM (persistent memory) device.  The improvement comes from
> >> the reduced contention on the swap cache radix tree lock.  To test
> >> sequential swapping out, the test case uses 8 processes, which
> >> sequentially allocate and write to the anonymous pages until RAM and
> >> part of the swap device is used up.
> >> 
> >> Details of comparison is as follow,
> >> 
> >> base             base+patch
> >> ---------------- --------------------------
> >>          %stddev     %change         %stddev
> >>              \          |                \
> >>    2506952 ±  2%     +28.1%    3212076 ±  7%  vm-scalability.throughput
> >>    1207402 ±  7%     +22.3%    1476578 ±  6%  vmstat.swap.so
> >>      10.86 ± 12%     -23.4%       8.31 ± 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
> >>      10.82 ± 13%     -33.1%       7.24 ± 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
> >>      10.36 ± 11%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
> >>      10.52 ± 12%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page
> >> 
> >
> > I didn't see anything wrong with the patch but it's worth highlighting
> > that this hunk means we are now out of GFP bits.
> 
> Sorry, I don't know whether I understand your words.  It is something
> about,
> 
> __GFP_BITS_SHIFT == 26
> 
> So remainning bits in mapping_flags is 6.  And now the latest bit is
> used for the flag introduced in the patch?
> 

__GFP_BITS_SHIFT + 5 (AS_NO_WRITEBACK_TAGS) = 31

mapping->flags is a combination of AS and GFP flags so increasing
__GFP_BITS_SHIFT overflows mapping->flags on 32-bit as gfp_t is an
unsigned int.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
@ 2016-08-31 15:39       ` Mel Gorman
  0 siblings, 0 replies; 35+ messages in thread
From: Mel Gorman @ 2016-08-31 15:39 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Andrew Morton, tim.c.chen, dave.hansen, andi.kleen, aaron.lu,
	linux-mm, linux-kernel, Hugh Dickins, Shaohua Li, Minchan Kim,
	Rik van Riel, Tejun Heo, Wu Fengguang

On Wed, Aug 31, 2016 at 08:17:24AM -0700, Huang, Ying wrote:
> Mel Gorman <mgorman@techsingularity.net> writes:
> 
> > On Tue, Aug 30, 2016 at 10:28:09AM -0700, Huang, Ying wrote:
> >> From: Huang Ying <ying.huang@intel.com>
> >> 
> >> File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK,
> >> etc.) to accelerate finding the pages with a specific tag in the radix
> >> tree during inode writeback.  But for anonymous pages in the swap
> >> cache, there is no inode writeback.  So there is no need to find the
> >> pages with some writeback tags in the radix tree.  It is not necessary
> >> to touch radix tree writeback tags for pages in the swap cache.
> >> 
> >> Per Rik van Riel's suggestion, a new flag AS_NO_WRITEBACK_TAGS is
> >> introduced for address spaces which don't need to update the writeback
> >> tags.  The flag is set for swap caches.  It may be used for DAX file
> >> systems, etc.
> >> 
> >> With this patch, the swap out bandwidth improved 22.3% (from ~1.2GB/s to
> >> ~ 1.48GBps) in the vm-scalability swap-w-seq test case with 8 processes.
> >> The test is done on a Xeon E5 v3 system.  The swap device used is a RAM
> >> simulated PMEM (persistent memory) device.  The improvement comes from
> >> the reduced contention on the swap cache radix tree lock.  To test
> >> sequential swapping out, the test case uses 8 processes, which
> >> sequentially allocate and write to the anonymous pages until RAM and
> >> part of the swap device is used up.
> >> 
> >> Details of comparison is as follow,
> >> 
> >> base             base+patch
> >> ---------------- --------------------------
> >>          %stddev     %change         %stddev
> >>              \          |                \
> >>    2506952 +-  2%     +28.1%    3212076 +-  7%  vm-scalability.throughput
> >>    1207402 +-  7%     +22.3%    1476578 +-  6%  vmstat.swap.so
> >>      10.86 +- 12%     -23.4%       8.31 +- 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
> >>      10.82 +- 13%     -33.1%       7.24 +- 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
> >>      10.36 +- 11%    -100.0%       0.00 +- -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
> >>      10.52 +- 12%    -100.0%       0.00 +- -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page
> >> 
> >
> > I didn't see anything wrong with the patch but it's worth highlighting
> > that this hunk means we are now out of GFP bits.
> 
> Sorry, I don't know whether I understand your words.  It is something
> about,
> 
> __GFP_BITS_SHIFT == 26
> 
> So remainning bits in mapping_flags is 6.  And now the latest bit is
> used for the flag introduced in the patch?
> 

__GFP_BITS_SHIFT + 5 (AS_NO_WRITEBACK_TAGS) = 31

mapping->flags is a combination of AS and GFP flags so increasing
__GFP_BITS_SHIFT overflows mapping->flags on 32-bit as gfp_t is an
unsigned int.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
  2016-08-31 15:39       ` Mel Gorman
@ 2016-08-31 15:44         ` Huang, Ying
  -1 siblings, 0 replies; 35+ messages in thread
From: Huang, Ying @ 2016-08-31 15:44 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Huang, Ying, Andrew Morton, tim.c.chen, dave.hansen, andi.kleen,
	aaron.lu, linux-mm, linux-kernel, Hugh Dickins, Shaohua Li,
	Minchan Kim, Rik van Riel, Tejun Heo, Wu Fengguang

Mel Gorman <mgorman@techsingularity.net> writes:

> On Wed, Aug 31, 2016 at 08:17:24AM -0700, Huang, Ying wrote:
>> Mel Gorman <mgorman@techsingularity.net> writes:
>> 
>> > On Tue, Aug 30, 2016 at 10:28:09AM -0700, Huang, Ying wrote:
>> >> From: Huang Ying <ying.huang@intel.com>
>> >> 
>> >> File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK,
>> >> etc.) to accelerate finding the pages with a specific tag in the radix
>> >> tree during inode writeback.  But for anonymous pages in the swap
>> >> cache, there is no inode writeback.  So there is no need to find the
>> >> pages with some writeback tags in the radix tree.  It is not necessary
>> >> to touch radix tree writeback tags for pages in the swap cache.
>> >> 
>> >> Per Rik van Riel's suggestion, a new flag AS_NO_WRITEBACK_TAGS is
>> >> introduced for address spaces which don't need to update the writeback
>> >> tags.  The flag is set for swap caches.  It may be used for DAX file
>> >> systems, etc.
>> >> 
>> >> With this patch, the swap out bandwidth improved 22.3% (from ~1.2GB/s to
>> >> ~ 1.48GBps) in the vm-scalability swap-w-seq test case with 8 processes.
>> >> The test is done on a Xeon E5 v3 system.  The swap device used is a RAM
>> >> simulated PMEM (persistent memory) device.  The improvement comes from
>> >> the reduced contention on the swap cache radix tree lock.  To test
>> >> sequential swapping out, the test case uses 8 processes, which
>> >> sequentially allocate and write to the anonymous pages until RAM and
>> >> part of the swap device is used up.
>> >> 
>> >> Details of comparison is as follow,
>> >> 
>> >> base             base+patch
>> >> ---------------- --------------------------
>> >>          %stddev     %change         %stddev
>> >>              \          |                \
>> >>    2506952 ±  2%     +28.1%    3212076 ±  7%  vm-scalability.throughput
>> >>    1207402 ±  7%     +22.3%    1476578 ±  6%  vmstat.swap.so
>> >>      10.86 ± 12%     -23.4%       8.31 ± 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
>> >>      10.82 ± 13%     -33.1%       7.24 ± 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
>> >>      10.36 ± 11%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
>> >>      10.52 ± 12%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page
>> >> 
>> >
>> > I didn't see anything wrong with the patch but it's worth highlighting
>> > that this hunk means we are now out of GFP bits.
>> 
>> Sorry, I don't know whether I understand your words.  It is something
>> about,
>> 
>> __GFP_BITS_SHIFT == 26
>> 
>> So remainning bits in mapping_flags is 6.  And now the latest bit is
>> used for the flag introduced in the patch?
>> 
>
> __GFP_BITS_SHIFT + 5 (AS_NO_WRITEBACK_TAGS) = 31
>
> mapping->flags is a combination of AS and GFP flags so increasing
> __GFP_BITS_SHIFT overflows mapping->flags on 32-bit as gfp_t is an
> unsigned int.

Got it!  Thanks a lot!

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
@ 2016-08-31 15:44         ` Huang, Ying
  0 siblings, 0 replies; 35+ messages in thread
From: Huang, Ying @ 2016-08-31 15:44 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Huang, Ying, Andrew Morton, tim.c.chen, dave.hansen, andi.kleen,
	aaron.lu, linux-mm, linux-kernel, Hugh Dickins, Shaohua Li,
	Minchan Kim, Rik van Riel, Tejun Heo, Wu Fengguang

Mel Gorman <mgorman@techsingularity.net> writes:

> On Wed, Aug 31, 2016 at 08:17:24AM -0700, Huang, Ying wrote:
>> Mel Gorman <mgorman@techsingularity.net> writes:
>> 
>> > On Tue, Aug 30, 2016 at 10:28:09AM -0700, Huang, Ying wrote:
>> >> From: Huang Ying <ying.huang@intel.com>
>> >> 
>> >> File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK,
>> >> etc.) to accelerate finding the pages with a specific tag in the radix
>> >> tree during inode writeback.  But for anonymous pages in the swap
>> >> cache, there is no inode writeback.  So there is no need to find the
>> >> pages with some writeback tags in the radix tree.  It is not necessary
>> >> to touch radix tree writeback tags for pages in the swap cache.
>> >> 
>> >> Per Rik van Riel's suggestion, a new flag AS_NO_WRITEBACK_TAGS is
>> >> introduced for address spaces which don't need to update the writeback
>> >> tags.  The flag is set for swap caches.  It may be used for DAX file
>> >> systems, etc.
>> >> 
>> >> With this patch, the swap out bandwidth improved 22.3% (from ~1.2GB/s to
>> >> ~ 1.48GBps) in the vm-scalability swap-w-seq test case with 8 processes.
>> >> The test is done on a Xeon E5 v3 system.  The swap device used is a RAM
>> >> simulated PMEM (persistent memory) device.  The improvement comes from
>> >> the reduced contention on the swap cache radix tree lock.  To test
>> >> sequential swapping out, the test case uses 8 processes, which
>> >> sequentially allocate and write to the anonymous pages until RAM and
>> >> part of the swap device is used up.
>> >> 
>> >> Details of comparison is as follow,
>> >> 
>> >> base             base+patch
>> >> ---------------- --------------------------
>> >>          %stddev     %change         %stddev
>> >>              \          |                \
>> >>    2506952 A+-  2%     +28.1%    3212076 A+-  7%  vm-scalability.throughput
>> >>    1207402 A+-  7%     +22.3%    1476578 A+-  6%  vmstat.swap.so
>> >>      10.86 A+- 12%     -23.4%       8.31 A+- 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
>> >>      10.82 A+- 13%     -33.1%       7.24 A+- 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
>> >>      10.36 A+- 11%    -100.0%       0.00 A+- -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
>> >>      10.52 A+- 12%    -100.0%       0.00 A+- -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page
>> >> 
>> >
>> > I didn't see anything wrong with the patch but it's worth highlighting
>> > that this hunk means we are now out of GFP bits.
>> 
>> Sorry, I don't know whether I understand your words.  It is something
>> about,
>> 
>> __GFP_BITS_SHIFT == 26
>> 
>> So remainning bits in mapping_flags is 6.  And now the latest bit is
>> used for the flag introduced in the patch?
>> 
>
> __GFP_BITS_SHIFT + 5 (AS_NO_WRITEBACK_TAGS) = 31
>
> mapping->flags is a combination of AS and GFP flags so increasing
> __GFP_BITS_SHIFT overflows mapping->flags on 32-bit as gfp_t is an
> unsigned int.

Got it!  Thanks a lot!

Best Regards,
Huang, Ying

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
  2016-08-31  9:14   ` Mel Gorman
@ 2016-08-31 21:30     ` Andrew Morton
  -1 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2016-08-31 21:30 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Huang, Ying, tim.c.chen, dave.hansen, andi.kleen, aaron.lu,
	linux-mm, linux-kernel, Hugh Dickins, Shaohua Li, Minchan Kim,
	Rik van Riel, Tejun Heo, Wu Fengguang

On Wed, 31 Aug 2016 10:14:59 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:

> >    2506952 __  2%     +28.1%    3212076 __  7%  vm-scalability.throughput
> >    1207402 __  7%     +22.3%    1476578 __  6%  vmstat.swap.so
> >      10.86 __ 12%     -23.4%       8.31 __ 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
> >      10.82 __ 13%     -33.1%       7.24 __ 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
> >      10.36 __ 11%    -100.0%       0.00 __ -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
> >      10.52 __ 12%    -100.0%       0.00 __ -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page
> > 
> 
> I didn't see anything wrong with the patch but it's worth highlighting
> that this hunk means we are now out of GFP bits.

Well ugh.  What are we to do about that?

Sigh.  This?


From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm: check that we haven't used more than 32 bits in address_space.flags

After "mm: don't use radix tree writeback tags for pages in swap cache",
all the flags are now used up on 32-bit builds.

Add a build-time assertion to prevent 64-bit developers from accidentally
breaking things.

Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/pagemap.h |    2 ++
 init/main.c             |    4 ++++
 2 files changed, 6 insertions(+)

diff -puN include/linux/pagemap.h~mm-check-that-we-havent-used-more-than-32-bits-in-address_spaceflags include/linux/pagemap.h
--- a/include/linux/pagemap.h~mm-check-that-we-havent-used-more-than-32-bits-in-address_spaceflags
+++ a/include/linux/pagemap.h
@@ -27,6 +27,8 @@ enum mapping_flags {
 	AS_EXITING	= __GFP_BITS_SHIFT + 4, /* final truncate in progress */
 	/* writeback related tags are not used */
 	AS_NO_WRITEBACK_TAGS = __GFP_BITS_SHIFT + 5,
+
+	AS_LAST_FLAG,
 };
 
 static inline void mapping_set_error(struct address_space *mapping, int error)
diff -puN init/main.c~mm-check-that-we-havent-used-more-than-32-bits-in-address_spaceflags init/main.c
--- a/init/main.c~mm-check-that-we-havent-used-more-than-32-bits-in-address_spaceflags
+++ a/init/main.c
@@ -59,6 +59,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/device.h>
 #include <linux/kthread.h>
+#include <linux/pagemap.h>
 #include <linux/sched.h>
 #include <linux/signal.h>
 #include <linux/idr.h>
@@ -463,6 +464,9 @@ void __init __weak thread_stack_cache_in
  */
 static void __init mm_init(void)
 {
+	/* Does address_space.flags still fit into a 32-bit ulong? */
+	BUILD_BUG_ON(AS_LAST_FLAG > 32);
+
 	/*
 	 * page_ext requires contiguous pages,
 	 * bigger than MAX_ORDER unless SPARSEMEM.
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
@ 2016-08-31 21:30     ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2016-08-31 21:30 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Huang, Ying, tim.c.chen, dave.hansen, andi.kleen, aaron.lu,
	linux-mm, linux-kernel, Hugh Dickins, Shaohua Li, Minchan Kim,
	Rik van Riel, Tejun Heo, Wu Fengguang

On Wed, 31 Aug 2016 10:14:59 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:

> >    2506952 __  2%     +28.1%    3212076 __  7%  vm-scalability.throughput
> >    1207402 __  7%     +22.3%    1476578 __  6%  vmstat.swap.so
> >      10.86 __ 12%     -23.4%       8.31 __ 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
> >      10.82 __ 13%     -33.1%       7.24 __ 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
> >      10.36 __ 11%    -100.0%       0.00 __ -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
> >      10.52 __ 12%    -100.0%       0.00 __ -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page
> > 
> 
> I didn't see anything wrong with the patch but it's worth highlighting
> that this hunk means we are now out of GFP bits.

Well ugh.  What are we to do about that?

Sigh.  This?


From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm: check that we haven't used more than 32 bits in address_space.flags

After "mm: don't use radix tree writeback tags for pages in swap cache",
all the flags are now used up on 32-bit builds.

Add a build-time assertion to prevent 64-bit developers from accidentally
breaking things.

Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/pagemap.h |    2 ++
 init/main.c             |    4 ++++
 2 files changed, 6 insertions(+)

diff -puN include/linux/pagemap.h~mm-check-that-we-havent-used-more-than-32-bits-in-address_spaceflags include/linux/pagemap.h
--- a/include/linux/pagemap.h~mm-check-that-we-havent-used-more-than-32-bits-in-address_spaceflags
+++ a/include/linux/pagemap.h
@@ -27,6 +27,8 @@ enum mapping_flags {
 	AS_EXITING	= __GFP_BITS_SHIFT + 4, /* final truncate in progress */
 	/* writeback related tags are not used */
 	AS_NO_WRITEBACK_TAGS = __GFP_BITS_SHIFT + 5,
+
+	AS_LAST_FLAG,
 };
 
 static inline void mapping_set_error(struct address_space *mapping, int error)
diff -puN init/main.c~mm-check-that-we-havent-used-more-than-32-bits-in-address_spaceflags init/main.c
--- a/init/main.c~mm-check-that-we-havent-used-more-than-32-bits-in-address_spaceflags
+++ a/init/main.c
@@ -59,6 +59,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/device.h>
 #include <linux/kthread.h>
+#include <linux/pagemap.h>
 #include <linux/sched.h>
 #include <linux/signal.h>
 #include <linux/idr.h>
@@ -463,6 +464,9 @@ void __init __weak thread_stack_cache_in
  */
 static void __init mm_init(void)
 {
+	/* Does address_space.flags still fit into a 32-bit ulong? */
+	BUILD_BUG_ON(AS_LAST_FLAG > 32);
+
 	/*
 	 * page_ext requires contiguous pages,
 	 * bigger than MAX_ORDER unless SPARSEMEM.
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
  2016-08-31 15:39       ` Mel Gorman
@ 2016-08-31 21:35         ` Andi Kleen
  -1 siblings, 0 replies; 35+ messages in thread
From: Andi Kleen @ 2016-08-31 21:35 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Huang, Ying, Andrew Morton, tim.c.chen, dave.hansen, andi.kleen,
	aaron.lu, linux-mm, linux-kernel, Hugh Dickins, Shaohua Li,
	Minchan Kim, Rik van Riel, Tejun Heo, Wu Fengguang

Mel Gorman <mgorman@techsingularity.net> writes:
>
> __GFP_BITS_SHIFT + 5 (AS_NO_WRITEBACK_TAGS) = 31
>
> mapping->flags is a combination of AS and GFP flags so increasing
> __GFP_BITS_SHIFT overflows mapping->flags on 32-bit as gfp_t is an
> unsigned int.

Couldn't we just split mapping->flags into two fields?
I'm sure more GFP bits will be needed eventually.

-Andi

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
@ 2016-08-31 21:35         ` Andi Kleen
  0 siblings, 0 replies; 35+ messages in thread
From: Andi Kleen @ 2016-08-31 21:35 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Huang, Ying, Andrew Morton, tim.c.chen, dave.hansen, andi.kleen,
	aaron.lu, linux-mm, linux-kernel, Hugh Dickins, Shaohua Li,
	Minchan Kim, Rik van Riel, Tejun Heo, Wu Fengguang

Mel Gorman <mgorman@techsingularity.net> writes:
>
> __GFP_BITS_SHIFT + 5 (AS_NO_WRITEBACK_TAGS) = 31
>
> mapping->flags is a combination of AS and GFP flags so increasing
> __GFP_BITS_SHIFT overflows mapping->flags on 32-bit as gfp_t is an
> unsigned int.

Couldn't we just split mapping->flags into two fields?
I'm sure more GFP bits will be needed eventually.

-Andi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
  2016-08-31 21:30     ` Andrew Morton
@ 2016-09-01  8:51       ` Mel Gorman
  -1 siblings, 0 replies; 35+ messages in thread
From: Mel Gorman @ 2016-09-01  8:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Huang, Ying, tim.c.chen, dave.hansen, andi.kleen, aaron.lu,
	linux-mm, linux-kernel, Hugh Dickins, Shaohua Li, Minchan Kim,
	Rik van Riel, Tejun Heo, Wu Fengguang

On Wed, Aug 31, 2016 at 02:30:31PM -0700, Andrew Morton wrote:
> On Wed, 31 Aug 2016 10:14:59 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> > >    2506952 __  2%     +28.1%    3212076 __  7%  vm-scalability.throughput
> > >    1207402 __  7%     +22.3%    1476578 __  6%  vmstat.swap.so
> > >      10.86 __ 12%     -23.4%       8.31 __ 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
> > >      10.82 __ 13%     -33.1%       7.24 __ 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
> > >      10.36 __ 11%    -100.0%       0.00 __ -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
> > >      10.52 __ 12%    -100.0%       0.00 __ -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page
> > > 
> > 
> > I didn't see anything wrong with the patch but it's worth highlighting
> > that this hunk means we are now out of GFP bits.
> 
> Well ugh.  What are we to do about that?
> 

It'll stop silent breakage so

Acked-by: Mel Gorman <mgorman@techsingularity.net>

Whoever hits it will need to take similar steps we had to with page->flags
by making some 64-bit only, removing flags or inferring the flag values
from other sources.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
@ 2016-09-01  8:51       ` Mel Gorman
  0 siblings, 0 replies; 35+ messages in thread
From: Mel Gorman @ 2016-09-01  8:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Huang, Ying, tim.c.chen, dave.hansen, andi.kleen, aaron.lu,
	linux-mm, linux-kernel, Hugh Dickins, Shaohua Li, Minchan Kim,
	Rik van Riel, Tejun Heo, Wu Fengguang

On Wed, Aug 31, 2016 at 02:30:31PM -0700, Andrew Morton wrote:
> On Wed, 31 Aug 2016 10:14:59 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> > >    2506952 __  2%     +28.1%    3212076 __  7%  vm-scalability.throughput
> > >    1207402 __  7%     +22.3%    1476578 __  6%  vmstat.swap.so
> > >      10.86 __ 12%     -23.4%       8.31 __ 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
> > >      10.82 __ 13%     -33.1%       7.24 __ 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
> > >      10.36 __ 11%    -100.0%       0.00 __ -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
> > >      10.52 __ 12%    -100.0%       0.00 __ -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page
> > > 
> > 
> > I didn't see anything wrong with the patch but it's worth highlighting
> > that this hunk means we are now out of GFP bits.
> 
> Well ugh.  What are we to do about that?
> 

It'll stop silent breakage so

Acked-by: Mel Gorman <mgorman@techsingularity.net>

Whoever hits it will need to take similar steps we had to with page->flags
by making some 64-bit only, removing flags or inferring the flag values
from other sources.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
  2016-08-31 21:30     ` Andrew Morton
@ 2016-09-01  9:13       ` Michal Hocko
  -1 siblings, 0 replies; 35+ messages in thread
From: Michal Hocko @ 2016-09-01  9:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Huang, Ying, tim.c.chen, dave.hansen, andi.kleen,
	aaron.lu, linux-mm, linux-kernel, Hugh Dickins, Shaohua Li,
	Minchan Kim, Rik van Riel, Tejun Heo, Wu Fengguang

On Wed 31-08-16 14:30:31, Andrew Morton wrote:
> On Wed, 31 Aug 2016 10:14:59 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> > >    2506952 __  2%     +28.1%    3212076 __  7%  vm-scalability.throughput
> > >    1207402 __  7%     +22.3%    1476578 __  6%  vmstat.swap.so
> > >      10.86 __ 12%     -23.4%       8.31 __ 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
> > >      10.82 __ 13%     -33.1%       7.24 __ 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
> > >      10.36 __ 11%    -100.0%       0.00 __ -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
> > >      10.52 __ 12%    -100.0%       0.00 __ -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page
> > > 
> > 
> > I didn't see anything wrong with the patch but it's worth highlighting
> > that this hunk means we are now out of GFP bits.
> 
> Well ugh.  What are we to do about that?

Can we simply give these AS_ flags their own word in mapping rather than
squash them together with gfp flags and impose the restriction on the
number of gfp flags. There was some demand for new gfp flags already and
mapping flags were in the way.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache
@ 2016-09-01  9:13       ` Michal Hocko
  0 siblings, 0 replies; 35+ messages in thread
From: Michal Hocko @ 2016-09-01  9:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Huang, Ying, tim.c.chen, dave.hansen, andi.kleen,
	aaron.lu, linux-mm, linux-kernel, Hugh Dickins, Shaohua Li,
	Minchan Kim, Rik van Riel, Tejun Heo, Wu Fengguang

On Wed 31-08-16 14:30:31, Andrew Morton wrote:
> On Wed, 31 Aug 2016 10:14:59 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> > >    2506952 __  2%     +28.1%    3212076 __  7%  vm-scalability.throughput
> > >    1207402 __  7%     +22.3%    1476578 __  6%  vmstat.swap.so
> > >      10.86 __ 12%     -23.4%       8.31 __ 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
> > >      10.82 __ 13%     -33.1%       7.24 __ 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
> > >      10.36 __ 11%    -100.0%       0.00 __ -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
> > >      10.52 __ 12%    -100.0%       0.00 __ -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page
> > > 
> > 
> > I didn't see anything wrong with the patch but it's worth highlighting
> > that this hunk means we are now out of GFP bits.
> 
> Well ugh.  What are we to do about that?

Can we simply give these AS_ flags their own word in mapping rather than
squash them together with gfp flags and impose the restriction on the
number of gfp flags. There was some demand for new gfp flags already and
mapping flags were in the way.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 0/2] do not squash mapping flags and gfp_mask together (was: Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in)
  2016-09-01  9:13       ` Michal Hocko
@ 2016-09-12 11:16         ` Michal Hocko
  -1 siblings, 0 replies; 35+ messages in thread
From: Michal Hocko @ 2016-09-12 11:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Huang, Ying, tim.c.chen, dave.hansen, andi.kleen,
	aaron.lu, Hugh Dickins, Shaohua Li, Minchan Kim, Rik van Riel,
	Tejun Heo, Wu Fengguang

On Thu 01-09-16 11:13:47, Michal Hocko wrote:
> On Wed 31-08-16 14:30:31, Andrew Morton wrote:
> > On Wed, 31 Aug 2016 10:14:59 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:
[...]
> > > I didn't see anything wrong with the patch but it's worth highlighting
> > > that this hunk means we are now out of GFP bits.
> > 
> > Well ugh.  What are we to do about that?
> 
> Can we simply give these AS_ flags their own word in mapping rather than
> squash them together with gfp flags and impose the restriction on the
> number of gfp flags. There was some demand for new gfp flags already and
> mapping flags were in the way.

OK, it seems this got unnoticed. What do you think about the following
two patches? I have only compile tested them and git grep suggests
nobody else should be relying on storing gfp_mask into flags directly.
So either I my grep-foo fools me or this should be safe. The two patches
will come as a reply to this email.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 0/2] do not squash mapping flags and gfp_mask together (was: Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in)
@ 2016-09-12 11:16         ` Michal Hocko
  0 siblings, 0 replies; 35+ messages in thread
From: Michal Hocko @ 2016-09-12 11:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Huang, Ying, tim.c.chen, dave.hansen, andi.kleen,
	aaron.lu, Hugh Dickins, Shaohua Li, Minchan Kim, Rik van Riel,
	Tejun Heo, Wu Fengguang

On Thu 01-09-16 11:13:47, Michal Hocko wrote:
> On Wed 31-08-16 14:30:31, Andrew Morton wrote:
> > On Wed, 31 Aug 2016 10:14:59 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:
[...]
> > > I didn't see anything wrong with the patch but it's worth highlighting
> > > that this hunk means we are now out of GFP bits.
> > 
> > Well ugh.  What are we to do about that?
> 
> Can we simply give these AS_ flags their own word in mapping rather than
> squash them together with gfp flags and impose the restriction on the
> number of gfp flags. There was some demand for new gfp flags already and
> mapping flags were in the way.

OK, it seems this got unnoticed. What do you think about the following
two patches? I have only compile tested them and git grep suggests
nobody else should be relying on storing gfp_mask into flags directly.
So either I my grep-foo fools me or this should be safe. The two patches
will come as a reply to this email.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/2] fs: use mapping_set_error instead of opencoded set_bit
  2016-09-12 11:16         ` Michal Hocko
@ 2016-09-12 11:16           ` Michal Hocko
  -1 siblings, 0 replies; 35+ messages in thread
From: Michal Hocko @ 2016-09-12 11:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

mapping_set_error helper sets the correct AS_ flag for the mapping so
there is no reason to open code it. Use the helper directly.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 drivers/staging/lustre/lustre/llite/vvp_page.c | 5 +----
 fs/afs/write.c                                 | 5 ++---
 fs/buffer.c                                    | 4 ++--
 fs/exofs/inode.c                               | 2 +-
 fs/ext4/page-io.c                              | 2 +-
 fs/f2fs/data.c                                 | 2 +-
 fs/jbd2/commit.c                               | 3 +--
 7 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/vvp_page.c b/drivers/staging/lustre/lustre/llite/vvp_page.c
index 6cd2af7a958f..96194b6f118e 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_page.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_page.c
@@ -247,10 +247,7 @@ static void vvp_vmpage_error(struct inode *inode, struct page *vmpage, int ioret
 		obj->vob_discard_page_warned = 0;
 	} else {
 		SetPageError(vmpage);
-		if (ioret == -ENOSPC)
-			set_bit(AS_ENOSPC, &inode->i_mapping->flags);
-		else
-			set_bit(AS_EIO, &inode->i_mapping->flags);
+		mapping_set_error(inode->i_mapping, ioret);
 
 		if ((ioret == -ESHUTDOWN || ioret == -EINTR) &&
 		     obj->vob_discard_page_warned == 0) {
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 14d506efd1aa..20ed04ab833c 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -398,8 +398,7 @@ static int afs_write_back_from_locked_page(struct afs_writeback *wb,
 		switch (ret) {
 		case -EDQUOT:
 		case -ENOSPC:
-			set_bit(AS_ENOSPC,
-				&wb->vnode->vfs_inode.i_mapping->flags);
+			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENOSPC);
 			break;
 		case -EROFS:
 		case -EIO:
@@ -409,7 +408,7 @@ static int afs_write_back_from_locked_page(struct afs_writeback *wb,
 		case -ENOMEDIUM:
 		case -ENXIO:
 			afs_kill_pages(wb->vnode, true, first, last);
-			set_bit(AS_EIO, &wb->vnode->vfs_inode.i_mapping->flags);
+			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENXIO);
 			break;
 		case -EACCES:
 		case -EPERM:
diff --git a/fs/buffer.c b/fs/buffer.c
index 754813a6962b..467e1ac3fac6 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -350,7 +350,7 @@ void end_buffer_async_write(struct buffer_head *bh, int uptodate)
 		set_buffer_uptodate(bh);
 	} else {
 		buffer_io_error(bh, ", lost async page write");
-		set_bit(AS_EIO, &page->mapping->flags);
+		mapping_set_error(page->mapping, -EIO);
 		set_buffer_write_io_error(bh);
 		clear_buffer_uptodate(bh);
 		SetPageError(page);
@@ -3180,7 +3180,7 @@ drop_buffers(struct page *page, struct buffer_head **buffers_to_free)
 	bh = head;
 	do {
 		if (buffer_write_io_error(bh) && page->mapping)
-			set_bit(AS_EIO, &page->mapping->flags);
+			mapping_set_error(page->mapping, -EIO);
 		if (buffer_busy(bh))
 			goto failed;
 		bh = bh->b_this_page;
diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
index 9dc4c6dbf3c9..a405db82e060 100644
--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -778,7 +778,7 @@ static int writepage_strip(struct page *page,
 fail:
 	EXOFS_DBGMSG("Error: writepage_strip(0x%lx, 0x%lx)=>%d\n",
 		     inode->i_ino, page->index, ret);
-	set_bit(AS_EIO, &page->mapping->flags);
+	mapping_set_error(page->mapping, -EIO);
 	unlock_page(page);
 	return ret;
 }
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 2a01df9cc1c3..8073d63e37a7 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -89,7 +89,7 @@ static void ext4_finish_bio(struct bio *bio)
 
 		if (bio->bi_error) {
 			SetPageError(page);
-			set_bit(AS_EIO, &page->mapping->flags);
+			mapping_set_error(page->mapping, -EIO);
 		}
 		bh = head = page_buffers(page);
 		/*
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index c80dda4bdff8..b728d284778e 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -67,7 +67,7 @@ static void f2fs_write_end_io(struct bio *bio)
 		fscrypt_pullback_bio_page(&page, true);
 
 		if (unlikely(bio->bi_error)) {
-			set_bit(AS_EIO, &page->mapping->flags);
+			mapping_set_error(page->mapping, -EIO);
 			f2fs_stop_checkpoint(sbi, true);
 		}
 		end_page_writeback(page);
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 70078096117d..f3d5746f2446 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -269,8 +269,7 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
 			 * filemap_fdatawait_range(), set it again so
 			 * that user process can get -EIO from fsync().
 			 */
-			set_bit(AS_EIO,
-				&jinode->i_vfs_inode->i_mapping->flags);
+			mapping_set_error(jinode->i_vfs_inode->i_mapping, -EIO);
 
 			if (!ret)
 				ret = err;
-- 
2.9.3

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/2] fs: use mapping_set_error instead of opencoded set_bit
@ 2016-09-12 11:16           ` Michal Hocko
  0 siblings, 0 replies; 35+ messages in thread
From: Michal Hocko @ 2016-09-12 11:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

mapping_set_error helper sets the correct AS_ flag for the mapping so
there is no reason to open code it. Use the helper directly.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 drivers/staging/lustre/lustre/llite/vvp_page.c | 5 +----
 fs/afs/write.c                                 | 5 ++---
 fs/buffer.c                                    | 4 ++--
 fs/exofs/inode.c                               | 2 +-
 fs/ext4/page-io.c                              | 2 +-
 fs/f2fs/data.c                                 | 2 +-
 fs/jbd2/commit.c                               | 3 +--
 7 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/vvp_page.c b/drivers/staging/lustre/lustre/llite/vvp_page.c
index 6cd2af7a958f..96194b6f118e 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_page.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_page.c
@@ -247,10 +247,7 @@ static void vvp_vmpage_error(struct inode *inode, struct page *vmpage, int ioret
 		obj->vob_discard_page_warned = 0;
 	} else {
 		SetPageError(vmpage);
-		if (ioret == -ENOSPC)
-			set_bit(AS_ENOSPC, &inode->i_mapping->flags);
-		else
-			set_bit(AS_EIO, &inode->i_mapping->flags);
+		mapping_set_error(inode->i_mapping, ioret);
 
 		if ((ioret == -ESHUTDOWN || ioret == -EINTR) &&
 		     obj->vob_discard_page_warned == 0) {
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 14d506efd1aa..20ed04ab833c 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -398,8 +398,7 @@ static int afs_write_back_from_locked_page(struct afs_writeback *wb,
 		switch (ret) {
 		case -EDQUOT:
 		case -ENOSPC:
-			set_bit(AS_ENOSPC,
-				&wb->vnode->vfs_inode.i_mapping->flags);
+			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENOSPC);
 			break;
 		case -EROFS:
 		case -EIO:
@@ -409,7 +408,7 @@ static int afs_write_back_from_locked_page(struct afs_writeback *wb,
 		case -ENOMEDIUM:
 		case -ENXIO:
 			afs_kill_pages(wb->vnode, true, first, last);
-			set_bit(AS_EIO, &wb->vnode->vfs_inode.i_mapping->flags);
+			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENXIO);
 			break;
 		case -EACCES:
 		case -EPERM:
diff --git a/fs/buffer.c b/fs/buffer.c
index 754813a6962b..467e1ac3fac6 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -350,7 +350,7 @@ void end_buffer_async_write(struct buffer_head *bh, int uptodate)
 		set_buffer_uptodate(bh);
 	} else {
 		buffer_io_error(bh, ", lost async page write");
-		set_bit(AS_EIO, &page->mapping->flags);
+		mapping_set_error(page->mapping, -EIO);
 		set_buffer_write_io_error(bh);
 		clear_buffer_uptodate(bh);
 		SetPageError(page);
@@ -3180,7 +3180,7 @@ drop_buffers(struct page *page, struct buffer_head **buffers_to_free)
 	bh = head;
 	do {
 		if (buffer_write_io_error(bh) && page->mapping)
-			set_bit(AS_EIO, &page->mapping->flags);
+			mapping_set_error(page->mapping, -EIO);
 		if (buffer_busy(bh))
 			goto failed;
 		bh = bh->b_this_page;
diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
index 9dc4c6dbf3c9..a405db82e060 100644
--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -778,7 +778,7 @@ static int writepage_strip(struct page *page,
 fail:
 	EXOFS_DBGMSG("Error: writepage_strip(0x%lx, 0x%lx)=>%d\n",
 		     inode->i_ino, page->index, ret);
-	set_bit(AS_EIO, &page->mapping->flags);
+	mapping_set_error(page->mapping, -EIO);
 	unlock_page(page);
 	return ret;
 }
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 2a01df9cc1c3..8073d63e37a7 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -89,7 +89,7 @@ static void ext4_finish_bio(struct bio *bio)
 
 		if (bio->bi_error) {
 			SetPageError(page);
-			set_bit(AS_EIO, &page->mapping->flags);
+			mapping_set_error(page->mapping, -EIO);
 		}
 		bh = head = page_buffers(page);
 		/*
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index c80dda4bdff8..b728d284778e 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -67,7 +67,7 @@ static void f2fs_write_end_io(struct bio *bio)
 		fscrypt_pullback_bio_page(&page, true);
 
 		if (unlikely(bio->bi_error)) {
-			set_bit(AS_EIO, &page->mapping->flags);
+			mapping_set_error(page->mapping, -EIO);
 			f2fs_stop_checkpoint(sbi, true);
 		}
 		end_page_writeback(page);
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 70078096117d..f3d5746f2446 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -269,8 +269,7 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
 			 * filemap_fdatawait_range(), set it again so
 			 * that user process can get -EIO from fsync().
 			 */
-			set_bit(AS_EIO,
-				&jinode->i_vfs_inode->i_mapping->flags);
+			mapping_set_error(jinode->i_vfs_inode->i_mapping, -EIO);
 
 			if (!ret)
 				ret = err;
-- 
2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 2/2] mm: split gfp_mask and mapping flags into separate fields
  2016-09-12 11:16         ` Michal Hocko
@ 2016-09-12 11:16           ` Michal Hocko
  -1 siblings, 0 replies; 35+ messages in thread
From: Michal Hocko @ 2016-09-12 11:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

mapping->flags currently encodes two different things into a single
flag. It contains sticky gfp_mask for page cache allocations and AS_
codes used to report errors/enospace and other states which are mapping
specific. Condensing the two semantically unrelated things saves few
bytes but it also complicates other things. For one thing the gfp flags
space is reduced and in fact we are already running out of available
bits. It can be assumed that more gfp flags will be necessary later on.

To not introduce the address_space grow (at least on x86_64) we can
stick it right after private_lock because we have a hole there.

struct address_space {
        struct inode *             host;                 /*     0     8 */
        struct radix_tree_root     page_tree;            /*     8    16 */
        spinlock_t                 tree_lock;            /*    24     4 */
        atomic_t                   i_mmap_writable;      /*    28     4 */
        struct rb_root             i_mmap;               /*    32     8 */
        struct rw_semaphore        i_mmap_rwsem;         /*    40    40 */
        /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
        long unsigned int          nrpages;              /*    80     8 */
        long unsigned int          nrexceptional;        /*    88     8 */
        long unsigned int          writeback_index;      /*    96     8 */
        const struct address_space_operations  * a_ops;  /*   104     8 */
        long unsigned int          flags;                /*   112     8 */
        spinlock_t                 private_lock;         /*   120     4 */

        /* XXX 4 bytes hole, try to pack */

        /* --- cacheline 2 boundary (128 bytes) --- */
        struct list_head           private_list;         /*   128    16 */
        void *                     private_data;         /*   144     8 */

        /* size: 152, cachelines: 3, members: 14 */
        /* sum members: 148, holes: 1, sum holes: 4 */
        /* last cacheline: 24 bytes */
};

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/fs.h      |  3 ++-
 include/linux/pagemap.h | 20 +++++++++-----------
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index cd8a5e1d5580..41d7213946af 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -443,7 +443,8 @@ struct address_space {
 	unsigned long		nrexceptional;
 	pgoff_t			writeback_index;/* writeback starts here */
 	const struct address_space_operations *a_ops;	/* methods */
-	unsigned long		flags;		/* error bits/gfp mask */
+	unsigned long		flags;		/* error bits */
+	gfp_t			gfp_mask;	/* implicit gfp mask for allocations */
 	spinlock_t		private_lock;	/* for use by the address_space */
 	struct list_head	private_list;	/* ditto */
 	void			*private_data;	/* ditto */
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 76f151ab4f62..0385a954465c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -16,17 +16,16 @@
 #include <linux/hugetlb_inline.h>
 
 /*
- * Bits in mapping->flags.  The lower __GFP_BITS_SHIFT bits are the page
- * allocation mode flags.
+ * Bits in mapping->flags.
  */
 enum mapping_flags {
-	AS_EIO		= __GFP_BITS_SHIFT + 0,	/* IO error on async write */
-	AS_ENOSPC	= __GFP_BITS_SHIFT + 1,	/* ENOSPC on async write */
-	AS_MM_ALL_LOCKS	= __GFP_BITS_SHIFT + 2,	/* under mm_take_all_locks() */
-	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
-	AS_EXITING	= __GFP_BITS_SHIFT + 4, /* final truncate in progress */
+	AS_EIO		= 0,	/* IO error on async write */
+	AS_ENOSPC	= 1,	/* ENOSPC on async write */
+	AS_MM_ALL_LOCKS	= 2,	/* under mm_take_all_locks() */
+	AS_UNEVICTABLE	= 3,	/* e.g., ramdisk, SHM_LOCK */
+	AS_EXITING	= 4, 	/* final truncate in progress */
 	/* writeback related tags are not used */
-	AS_NO_WRITEBACK_TAGS = __GFP_BITS_SHIFT + 5,
+	AS_NO_WRITEBACK_TAGS = 5,
 
 	AS_LAST_FLAG,
 };
@@ -80,7 +79,7 @@ static inline int mapping_use_writeback_tags(struct address_space *mapping)
 
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
-	return (__force gfp_t)mapping->flags & __GFP_BITS_MASK;
+	return mapping->gfp_mask;
 }
 
 /* Restricts the given gfp_mask to what the mapping allows. */
@@ -96,8 +95,7 @@ static inline gfp_t mapping_gfp_constraint(struct address_space *mapping,
  */
 static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
 {
-	m->flags = (m->flags & ~(__force unsigned long)__GFP_BITS_MASK) |
-				(__force unsigned long)mask;
+	m->gfp_mask = mask;
 }
 
 void release_pages(struct page **pages, int nr, bool cold);
-- 
2.9.3

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 2/2] mm: split gfp_mask and mapping flags into separate fields
@ 2016-09-12 11:16           ` Michal Hocko
  0 siblings, 0 replies; 35+ messages in thread
From: Michal Hocko @ 2016-09-12 11:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

mapping->flags currently encodes two different things into a single
flag. It contains sticky gfp_mask for page cache allocations and AS_
codes used to report errors/enospace and other states which are mapping
specific. Condensing the two semantically unrelated things saves few
bytes but it also complicates other things. For one thing the gfp flags
space is reduced and in fact we are already running out of available
bits. It can be assumed that more gfp flags will be necessary later on.

To not introduce the address_space grow (at least on x86_64) we can
stick it right after private_lock because we have a hole there.

struct address_space {
        struct inode *             host;                 /*     0     8 */
        struct radix_tree_root     page_tree;            /*     8    16 */
        spinlock_t                 tree_lock;            /*    24     4 */
        atomic_t                   i_mmap_writable;      /*    28     4 */
        struct rb_root             i_mmap;               /*    32     8 */
        struct rw_semaphore        i_mmap_rwsem;         /*    40    40 */
        /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
        long unsigned int          nrpages;              /*    80     8 */
        long unsigned int          nrexceptional;        /*    88     8 */
        long unsigned int          writeback_index;      /*    96     8 */
        const struct address_space_operations  * a_ops;  /*   104     8 */
        long unsigned int          flags;                /*   112     8 */
        spinlock_t                 private_lock;         /*   120     4 */

        /* XXX 4 bytes hole, try to pack */

        /* --- cacheline 2 boundary (128 bytes) --- */
        struct list_head           private_list;         /*   128    16 */
        void *                     private_data;         /*   144     8 */

        /* size: 152, cachelines: 3, members: 14 */
        /* sum members: 148, holes: 1, sum holes: 4 */
        /* last cacheline: 24 bytes */
};

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/fs.h      |  3 ++-
 include/linux/pagemap.h | 20 +++++++++-----------
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index cd8a5e1d5580..41d7213946af 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -443,7 +443,8 @@ struct address_space {
 	unsigned long		nrexceptional;
 	pgoff_t			writeback_index;/* writeback starts here */
 	const struct address_space_operations *a_ops;	/* methods */
-	unsigned long		flags;		/* error bits/gfp mask */
+	unsigned long		flags;		/* error bits */
+	gfp_t			gfp_mask;	/* implicit gfp mask for allocations */
 	spinlock_t		private_lock;	/* for use by the address_space */
 	struct list_head	private_list;	/* ditto */
 	void			*private_data;	/* ditto */
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 76f151ab4f62..0385a954465c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -16,17 +16,16 @@
 #include <linux/hugetlb_inline.h>
 
 /*
- * Bits in mapping->flags.  The lower __GFP_BITS_SHIFT bits are the page
- * allocation mode flags.
+ * Bits in mapping->flags.
  */
 enum mapping_flags {
-	AS_EIO		= __GFP_BITS_SHIFT + 0,	/* IO error on async write */
-	AS_ENOSPC	= __GFP_BITS_SHIFT + 1,	/* ENOSPC on async write */
-	AS_MM_ALL_LOCKS	= __GFP_BITS_SHIFT + 2,	/* under mm_take_all_locks() */
-	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
-	AS_EXITING	= __GFP_BITS_SHIFT + 4, /* final truncate in progress */
+	AS_EIO		= 0,	/* IO error on async write */
+	AS_ENOSPC	= 1,	/* ENOSPC on async write */
+	AS_MM_ALL_LOCKS	= 2,	/* under mm_take_all_locks() */
+	AS_UNEVICTABLE	= 3,	/* e.g., ramdisk, SHM_LOCK */
+	AS_EXITING	= 4, 	/* final truncate in progress */
 	/* writeback related tags are not used */
-	AS_NO_WRITEBACK_TAGS = __GFP_BITS_SHIFT + 5,
+	AS_NO_WRITEBACK_TAGS = 5,
 
 	AS_LAST_FLAG,
 };
@@ -80,7 +79,7 @@ static inline int mapping_use_writeback_tags(struct address_space *mapping)
 
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
-	return (__force gfp_t)mapping->flags & __GFP_BITS_MASK;
+	return mapping->gfp_mask;
 }
 
 /* Restricts the given gfp_mask to what the mapping allows. */
@@ -96,8 +95,7 @@ static inline gfp_t mapping_gfp_constraint(struct address_space *mapping,
  */
 static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
 {
-	m->flags = (m->flags & ~(__force unsigned long)__GFP_BITS_MASK) |
-				(__force unsigned long)mask;
+	m->gfp_mask = mask;
 }
 
 void release_pages(struct page **pages, int nr, bool cold);
-- 
2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] mm: split gfp_mask and mapping flags into separate fields
  2016-09-12 11:16           ` Michal Hocko
@ 2016-09-12 11:48             ` Michal Hocko
  -1 siblings, 0 replies; 35+ messages in thread
From: Michal Hocko @ 2016-09-12 11:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, LKML

Errr, the gfp_mask move behind private_lock didn't make it into the
commit. Here is the updated patch. Btw. with this patch we can drop
mm-check-that-we-havent-used-more-than-32-bits-in-address_spaceflags.patch
---
>From a8200e0de375886bbb41bbae4df7fa65ec619d05 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Mon, 12 Sep 2016 12:50:09 +0200
Subject: [PATCH] mm: split gfp_mask and mapping flags into separate fields

mapping->flags currently encodes two different things into a single
flag. It contains sticky gfp_mask for page cache allocations and AS_
codes used to report errors/enospace and other states which are mapping
specific. Condensing the two semantically unrelated things saves few
bytes but it also complicates other things. For one thing the gfp flags
space is reduced and in fact we are already running out of available
bits. It can be assumed that more gfp flags will be necessary later on.

To not introduce the address_space grow (at least on x86_64) we can
stick it right after private_lock because we have a hole there.

struct address_space {
        struct inode *             host;                 /*     0     8 */
        struct radix_tree_root     page_tree;            /*     8    16 */
        spinlock_t                 tree_lock;            /*    24     4 */
        atomic_t                   i_mmap_writable;      /*    28     4 */
        struct rb_root             i_mmap;               /*    32     8 */
        struct rw_semaphore        i_mmap_rwsem;         /*    40    40 */
        /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
        long unsigned int          nrpages;              /*    80     8 */
        long unsigned int          nrexceptional;        /*    88     8 */
        long unsigned int          writeback_index;      /*    96     8 */
        const struct address_space_operations  * a_ops;  /*   104     8 */
        long unsigned int          flags;                /*   112     8 */
        spinlock_t                 private_lock;         /*   120     4 */

        /* XXX 4 bytes hole, try to pack */

        /* --- cacheline 2 boundary (128 bytes) --- */
        struct list_head           private_list;         /*   128    16 */
        void *                     private_data;         /*   144     8 */

        /* size: 152, cachelines: 3, members: 14 */
        /* sum members: 148, holes: 1, sum holes: 4 */
        /* last cacheline: 24 bytes */
};

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/fs.h      |  3 ++-
 include/linux/pagemap.h | 20 +++++++++-----------
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index cd8a5e1d5580..34b9d88cb788 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -443,8 +443,9 @@ struct address_space {
 	unsigned long		nrexceptional;
 	pgoff_t			writeback_index;/* writeback starts here */
 	const struct address_space_operations *a_ops;	/* methods */
-	unsigned long		flags;		/* error bits/gfp mask */
+	unsigned long		flags;		/* error bits */
 	spinlock_t		private_lock;	/* for use by the address_space */
+	gfp_t			gfp_mask;	/* implicit gfp mask for allocations */
 	struct list_head	private_list;	/* ditto */
 	void			*private_data;	/* ditto */
 } __attribute__((aligned(sizeof(long))));
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 76f151ab4f62..0385a954465c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -16,17 +16,16 @@
 #include <linux/hugetlb_inline.h>
 
 /*
- * Bits in mapping->flags.  The lower __GFP_BITS_SHIFT bits are the page
- * allocation mode flags.
+ * Bits in mapping->flags.
  */
 enum mapping_flags {
-	AS_EIO		= __GFP_BITS_SHIFT + 0,	/* IO error on async write */
-	AS_ENOSPC	= __GFP_BITS_SHIFT + 1,	/* ENOSPC on async write */
-	AS_MM_ALL_LOCKS	= __GFP_BITS_SHIFT + 2,	/* under mm_take_all_locks() */
-	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
-	AS_EXITING	= __GFP_BITS_SHIFT + 4, /* final truncate in progress */
+	AS_EIO		= 0,	/* IO error on async write */
+	AS_ENOSPC	= 1,	/* ENOSPC on async write */
+	AS_MM_ALL_LOCKS	= 2,	/* under mm_take_all_locks() */
+	AS_UNEVICTABLE	= 3,	/* e.g., ramdisk, SHM_LOCK */
+	AS_EXITING	= 4, 	/* final truncate in progress */
 	/* writeback related tags are not used */
-	AS_NO_WRITEBACK_TAGS = __GFP_BITS_SHIFT + 5,
+	AS_NO_WRITEBACK_TAGS = 5,
 
 	AS_LAST_FLAG,
 };
@@ -80,7 +79,7 @@ static inline int mapping_use_writeback_tags(struct address_space *mapping)
 
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
-	return (__force gfp_t)mapping->flags & __GFP_BITS_MASK;
+	return mapping->gfp_mask;
 }
 
 /* Restricts the given gfp_mask to what the mapping allows. */
@@ -96,8 +95,7 @@ static inline gfp_t mapping_gfp_constraint(struct address_space *mapping,
  */
 static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
 {
-	m->flags = (m->flags & ~(__force unsigned long)__GFP_BITS_MASK) |
-				(__force unsigned long)mask;
+	m->gfp_mask = mask;
 }
 
 void release_pages(struct page **pages, int nr, bool cold);
-- 
2.9.3

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] mm: split gfp_mask and mapping flags into separate fields
@ 2016-09-12 11:48             ` Michal Hocko
  0 siblings, 0 replies; 35+ messages in thread
From: Michal Hocko @ 2016-09-12 11:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, LKML

Errr, the gfp_mask move behind private_lock didn't make it into the
commit. Here is the updated patch. Btw. with this patch we can drop
mm-check-that-we-havent-used-more-than-32-bits-in-address_spaceflags.patch
---

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] fs: use mapping_set_error instead of opencoded set_bit
  2016-09-12 11:16           ` Michal Hocko
@ 2016-09-12 22:11             ` Andrew Morton
  -1 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2016-09-12 22:11 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, LKML, Michal Hocko

On Mon, 12 Sep 2016 13:16:07 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> From: Michal Hocko <mhocko@suse.com>
> 
> mapping_set_error helper sets the correct AS_ flag for the mapping so
> there is no reason to open code it. Use the helper directly.
> 
> ...
>
> --- a/drivers/staging/lustre/lustre/llite/vvp_page.c
> +++ b/drivers/staging/lustre/lustre/llite/vvp_page.c
> @@ -247,10 +247,7 @@ static void vvp_vmpage_error(struct inode *inode, struct page *vmpage, int ioret
>  		obj->vob_discard_page_warned = 0;
>  	} else {
>  		SetPageError(vmpage);
> -		if (ioret == -ENOSPC)
> -			set_bit(AS_ENOSPC, &inode->i_mapping->flags);
> -		else
> -			set_bit(AS_EIO, &inode->i_mapping->flags);
> +		mapping_set_error(inode->i_mapping, ioret);
>  
>  		if ((ioret == -ESHUTDOWN || ioret == -EINTR) &&
>  		     obj->vob_discard_page_warned == 0) {
> diff --git a/fs/afs/write.c b/fs/afs/write.c
> index 14d506efd1aa..20ed04ab833c 100644
> --- a/fs/afs/write.c
> +++ b/fs/afs/write.c
> @@ -398,8 +398,7 @@ static int afs_write_back_from_locked_page(struct afs_writeback *wb,
>  		switch (ret) {
>  		case -EDQUOT:
>  		case -ENOSPC:
> -			set_bit(AS_ENOSPC,
> -				&wb->vnode->vfs_inode.i_mapping->flags);
> +			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENOSPC);
>  			break;
>  		case -EROFS:
>  		case -EIO:
> @@ -409,7 +408,7 @@ static int afs_write_back_from_locked_page(struct afs_writeback *wb,
>  		case -ENOMEDIUM:
>  		case -ENXIO:
>  			afs_kill_pages(wb->vnode, true, first, last);
> -			set_bit(AS_EIO, &wb->vnode->vfs_inode.i_mapping->flags);
> +			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENXIO);

This one is a functional change: mapping_set_error() will rewrite
-ENXIO into -EIO.  Doesn't seem at all important though.

> ...

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] fs: use mapping_set_error instead of opencoded set_bit
@ 2016-09-12 22:11             ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2016-09-12 22:11 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, LKML, Michal Hocko

On Mon, 12 Sep 2016 13:16:07 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> From: Michal Hocko <mhocko@suse.com>
> 
> mapping_set_error helper sets the correct AS_ flag for the mapping so
> there is no reason to open code it. Use the helper directly.
> 
> ...
>
> --- a/drivers/staging/lustre/lustre/llite/vvp_page.c
> +++ b/drivers/staging/lustre/lustre/llite/vvp_page.c
> @@ -247,10 +247,7 @@ static void vvp_vmpage_error(struct inode *inode, struct page *vmpage, int ioret
>  		obj->vob_discard_page_warned = 0;
>  	} else {
>  		SetPageError(vmpage);
> -		if (ioret == -ENOSPC)
> -			set_bit(AS_ENOSPC, &inode->i_mapping->flags);
> -		else
> -			set_bit(AS_EIO, &inode->i_mapping->flags);
> +		mapping_set_error(inode->i_mapping, ioret);
>  
>  		if ((ioret == -ESHUTDOWN || ioret == -EINTR) &&
>  		     obj->vob_discard_page_warned == 0) {
> diff --git a/fs/afs/write.c b/fs/afs/write.c
> index 14d506efd1aa..20ed04ab833c 100644
> --- a/fs/afs/write.c
> +++ b/fs/afs/write.c
> @@ -398,8 +398,7 @@ static int afs_write_back_from_locked_page(struct afs_writeback *wb,
>  		switch (ret) {
>  		case -EDQUOT:
>  		case -ENOSPC:
> -			set_bit(AS_ENOSPC,
> -				&wb->vnode->vfs_inode.i_mapping->flags);
> +			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENOSPC);
>  			break;
>  		case -EROFS:
>  		case -EIO:
> @@ -409,7 +408,7 @@ static int afs_write_back_from_locked_page(struct afs_writeback *wb,
>  		case -ENOMEDIUM:
>  		case -ENXIO:
>  			afs_kill_pages(wb->vnode, true, first, last);
> -			set_bit(AS_EIO, &wb->vnode->vfs_inode.i_mapping->flags);
> +			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENXIO);

This one is a functional change: mapping_set_error() will rewrite
-ENXIO into -EIO.  Doesn't seem at all important though.

> ...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] fs: use mapping_set_error instead of opencoded set_bit
  2016-09-12 22:11             ` Andrew Morton
@ 2016-09-12 22:18               ` Andrew Morton
  -1 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2016-09-12 22:18 UTC (permalink / raw)
  To: Michal Hocko, linux-mm, LKML, Michal Hocko

On Mon, 12 Sep 2016 15:11:46 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> > @@ -409,7 +408,7 @@ static int afs_write_back_from_locked_page(struct afs_writeback *wb,
> >  		case -ENOMEDIUM:
> >  		case -ENXIO:
> >  			afs_kill_pages(wb->vnode, true, first, last);
> > -			set_bit(AS_EIO, &wb->vnode->vfs_inode.i_mapping->flags);
> > +			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENXIO);
> 
> This one is a functional change: mapping_set_error() will rewrite
> -ENXIO into -EIO.  Doesn't seem at all important though.

hm, OK, it's not a functional change - the code was already doing
s/ENXIO/EIO/.

Let's make it look more truthful?

--- a/fs/afs/write.c~fs-use-mapping_set_error-instead-of-opencoded-set_bit-fix
+++ a/fs/afs/write.c
@@ -408,7 +408,7 @@ no_more:
 		case -ENOMEDIUM:
 		case -ENXIO:
 			afs_kill_pages(wb->vnode, true, first, last);
-			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENXIO);
+			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -EIO);
 			break;
 		case -EACCES:
 		case -EPERM:
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] fs: use mapping_set_error instead of opencoded set_bit
@ 2016-09-12 22:18               ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2016-09-12 22:18 UTC (permalink / raw)
  To: Michal Hocko, linux-mm, LKML, Michal Hocko

On Mon, 12 Sep 2016 15:11:46 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> > @@ -409,7 +408,7 @@ static int afs_write_back_from_locked_page(struct afs_writeback *wb,
> >  		case -ENOMEDIUM:
> >  		case -ENXIO:
> >  			afs_kill_pages(wb->vnode, true, first, last);
> > -			set_bit(AS_EIO, &wb->vnode->vfs_inode.i_mapping->flags);
> > +			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENXIO);
> 
> This one is a functional change: mapping_set_error() will rewrite
> -ENXIO into -EIO.  Doesn't seem at all important though.

hm, OK, it's not a functional change - the code was already doing
s/ENXIO/EIO/.

Let's make it look more truthful?

--- a/fs/afs/write.c~fs-use-mapping_set_error-instead-of-opencoded-set_bit-fix
+++ a/fs/afs/write.c
@@ -408,7 +408,7 @@ no_more:
 		case -ENOMEDIUM:
 		case -ENXIO:
 			afs_kill_pages(wb->vnode, true, first, last);
-			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENXIO);
+			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -EIO);
 			break;
 		case -EACCES:
 		case -EPERM:
_


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] fs: use mapping_set_error instead of opencoded set_bit
  2016-09-12 22:18               ` Andrew Morton
@ 2016-09-13  6:53                 ` Michal Hocko
  -1 siblings, 0 replies; 35+ messages in thread
From: Michal Hocko @ 2016-09-13  6:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, LKML

On Mon 12-09-16 15:18:23, Andrew Morton wrote:
> On Mon, 12 Sep 2016 15:11:46 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > > @@ -409,7 +408,7 @@ static int afs_write_back_from_locked_page(struct afs_writeback *wb,
> > >  		case -ENOMEDIUM:
> > >  		case -ENXIO:
> > >  			afs_kill_pages(wb->vnode, true, first, last);
> > > -			set_bit(AS_EIO, &wb->vnode->vfs_inode.i_mapping->flags);
> > > +			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENXIO);
> > 
> > This one is a functional change: mapping_set_error() will rewrite
> > -ENXIO into -EIO.  Doesn't seem at all important though.
> 
> hm, OK, it's not a functional change - the code was already doing
> s/ENXIO/EIO/.

Yes the rewrite is silent but I've decided to keep the current errno
because I have no idea whether this can change in future. It doesn't
sound probable but it also sounds safer to do an overwrite at a single
place rather than all over the place /me thinks.

> Let's make it look more truthful?
> 
> --- a/fs/afs/write.c~fs-use-mapping_set_error-instead-of-opencoded-set_bit-fix
> +++ a/fs/afs/write.c
> @@ -408,7 +408,7 @@ no_more:
>  		case -ENOMEDIUM:
>  		case -ENXIO:
>  			afs_kill_pages(wb->vnode, true, first, last);
> -			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENXIO);
> +			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -EIO);
>  			break;
>  		case -EACCES:
>  		case -EPERM:
> _
> 
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] fs: use mapping_set_error instead of opencoded set_bit
@ 2016-09-13  6:53                 ` Michal Hocko
  0 siblings, 0 replies; 35+ messages in thread
From: Michal Hocko @ 2016-09-13  6:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, LKML

On Mon 12-09-16 15:18:23, Andrew Morton wrote:
> On Mon, 12 Sep 2016 15:11:46 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > > @@ -409,7 +408,7 @@ static int afs_write_back_from_locked_page(struct afs_writeback *wb,
> > >  		case -ENOMEDIUM:
> > >  		case -ENXIO:
> > >  			afs_kill_pages(wb->vnode, true, first, last);
> > > -			set_bit(AS_EIO, &wb->vnode->vfs_inode.i_mapping->flags);
> > > +			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENXIO);
> > 
> > This one is a functional change: mapping_set_error() will rewrite
> > -ENXIO into -EIO.  Doesn't seem at all important though.
> 
> hm, OK, it's not a functional change - the code was already doing
> s/ENXIO/EIO/.

Yes the rewrite is silent but I've decided to keep the current errno
because I have no idea whether this can change in future. It doesn't
sound probable but it also sounds safer to do an overwrite at a single
place rather than all over the place /me thinks.

> Let's make it look more truthful?
> 
> --- a/fs/afs/write.c~fs-use-mapping_set_error-instead-of-opencoded-set_bit-fix
> +++ a/fs/afs/write.c
> @@ -408,7 +408,7 @@ no_more:
>  		case -ENOMEDIUM:
>  		case -ENXIO:
>  			afs_kill_pages(wb->vnode, true, first, last);
> -			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENXIO);
> +			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -EIO);
>  			break;
>  		case -EACCES:
>  		case -EPERM:
> _
> 
> 

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] fs: use mapping_set_error instead of opencoded set_bit
  2016-09-13  6:53                 ` Michal Hocko
@ 2016-09-13 21:29                   ` Andrew Morton
  -1 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2016-09-13 21:29 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, LKML

On Tue, 13 Sep 2016 08:53:01 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> On Mon 12-09-16 15:18:23, Andrew Morton wrote:
> > On Mon, 12 Sep 2016 15:11:46 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > > @@ -409,7 +408,7 @@ static int afs_write_back_from_locked_page(struct afs_writeback *wb,
> > > >  		case -ENOMEDIUM:
> > > >  		case -ENXIO:
> > > >  			afs_kill_pages(wb->vnode, true, first, last);
> > > > -			set_bit(AS_EIO, &wb->vnode->vfs_inode.i_mapping->flags);
> > > > +			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENXIO);
> > > 
> > > This one is a functional change: mapping_set_error() will rewrite
> > > -ENXIO into -EIO.  Doesn't seem at all important though.
> > 
> > hm, OK, it's not a functional change - the code was already doing
> > s/ENXIO/EIO/.
> 
> Yes the rewrite is silent but I've decided to keep the current errno
> because I have no idea whether this can change in future. It doesn't
> sound probable but it also sounds safer to do an overwrite at a single
> place rather than all over the place /me thinks.

Well, this is the only place in the kernel where we attempt to set
anything other than EIO.  I do think it's better to be honest about
what's happening, right here at the callsite.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] fs: use mapping_set_error instead of opencoded set_bit
@ 2016-09-13 21:29                   ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2016-09-13 21:29 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, LKML

On Tue, 13 Sep 2016 08:53:01 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> On Mon 12-09-16 15:18:23, Andrew Morton wrote:
> > On Mon, 12 Sep 2016 15:11:46 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > > @@ -409,7 +408,7 @@ static int afs_write_back_from_locked_page(struct afs_writeback *wb,
> > > >  		case -ENOMEDIUM:
> > > >  		case -ENXIO:
> > > >  			afs_kill_pages(wb->vnode, true, first, last);
> > > > -			set_bit(AS_EIO, &wb->vnode->vfs_inode.i_mapping->flags);
> > > > +			mapping_set_error(wb->vnode->vfs_inode.i_mapping, -ENXIO);
> > > 
> > > This one is a functional change: mapping_set_error() will rewrite
> > > -ENXIO into -EIO.  Doesn't seem at all important though.
> > 
> > hm, OK, it's not a functional change - the code was already doing
> > s/ENXIO/EIO/.
> 
> Yes the rewrite is silent but I've decided to keep the current errno
> because I have no idea whether this can change in future. It doesn't
> sound probable but it also sounds safer to do an overwrite at a single
> place rather than all over the place /me thinks.

Well, this is the only place in the kernel where we attempt to set
anything other than EIO.  I do think it's better to be honest about
what's happening, right here at the callsite.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2016-09-13 21:32 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-30 17:28 [PATCH -v2] mm: Don't use radix tree writeback tags for pages in swap cache Huang, Ying
2016-08-30 17:28 ` Huang, Ying
2016-08-30 18:29 ` Rik van Riel
2016-08-31  9:14 ` Mel Gorman
2016-08-31  9:14   ` Mel Gorman
2016-08-31 15:17   ` Huang, Ying
2016-08-31 15:17     ` Huang, Ying
2016-08-31 15:39     ` Mel Gorman
2016-08-31 15:39       ` Mel Gorman
2016-08-31 15:44       ` Huang, Ying
2016-08-31 15:44         ` Huang, Ying
2016-08-31 21:35       ` Andi Kleen
2016-08-31 21:35         ` Andi Kleen
2016-08-31 21:30   ` Andrew Morton
2016-08-31 21:30     ` Andrew Morton
2016-09-01  8:51     ` Mel Gorman
2016-09-01  8:51       ` Mel Gorman
2016-09-01  9:13     ` Michal Hocko
2016-09-01  9:13       ` Michal Hocko
2016-09-12 11:16       ` [PATCH 0/2] do not squash mapping flags and gfp_mask together (was: Re: [PATCH -v2] mm: Don't use radix tree writeback tags for pages in) Michal Hocko
2016-09-12 11:16         ` Michal Hocko
2016-09-12 11:16         ` [PATCH 1/2] fs: use mapping_set_error instead of opencoded set_bit Michal Hocko
2016-09-12 11:16           ` Michal Hocko
2016-09-12 22:11           ` Andrew Morton
2016-09-12 22:11             ` Andrew Morton
2016-09-12 22:18             ` Andrew Morton
2016-09-12 22:18               ` Andrew Morton
2016-09-13  6:53               ` Michal Hocko
2016-09-13  6:53                 ` Michal Hocko
2016-09-13 21:29                 ` Andrew Morton
2016-09-13 21:29                   ` Andrew Morton
2016-09-12 11:16         ` [PATCH 2/2] mm: split gfp_mask and mapping flags into separate fields Michal Hocko
2016-09-12 11:16           ` Michal Hocko
2016-09-12 11:48           ` Michal Hocko
2016-09-12 11:48             ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.